How To Choose the Best Hardware from the VMware HCL

Comparing Hardware for vSphere

Weighing the Difference: Tips and Best Practices

A best practice is to start by choosing servers and storage from the VMware HCL Guide, especially if it’s your first time building a vSphere Cloud.

1. Is Your Hardware on the Compatibility List?

Always make sure anything you purchase for your environment is on the VMware HCL (hardware compatibility list), because you don’t want to end up with a key vSphere feature that isn’t supported. This tip matters even more if you are buying used or refurbished equipment, or upgrading to a new ESXi version.

New hardware will likely be supported by the newest vSphere release, but I’ve seen a single major-version upgrade, from ESXi 4.x to 5.x, drop still-good hardware (less than 3 to 5 years old and still under warranty) from the HCL.
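If you want this check to be repeatable before every purchase or upgrade, one option is to keep your inventory and the models you have confirmed on the HCL side by side and compare them. The sketch below is a minimal example under stated assumptions: it does not query VMware at all, the model names and version keys are hypothetical placeholders, and you would fill `CONFIRMED_ON_HCL` in yourself from the VMware Compatibility Guide.

```python
from dataclasses import dataclass


@dataclass
class Server:
    hostname: str
    model: str


# Hypothetical data: models you have personally confirmed on the HCL for each
# ESXi release. Fill this in by hand from the VMware Compatibility Guide.
CONFIRMED_ON_HCL = {
    "4.1": {"Model-A", "Model-B", "Model-C"},
    "5.0": {"Model-B", "Model-C"},
}


def unsupported_hosts(servers, target_version, confirmed=CONFIRMED_ON_HCL):
    """Return hostnames whose model is not confirmed for target_version."""
    supported_models = confirmed.get(target_version, set())
    return [s.hostname for s in servers if s.model not in supported_models]


if __name__ == "__main__":
    fleet = [
        Server("esx01", "Model-A"),
        Server("esx02", "Model-B"),
        Server("esx03", "Model-C"),
    ]
    # Before upgrading from 4.1 to 5.0, list the hosts that would fall off the HCL.
    print(unsupported_hosts(fleet, "5.0"))  # ['esx01']
```

An unknown version returns every host, which errs on the side of "go check the HCL first".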

2. Choose Servers for vSphere within Your Own Compatibility Standards

I’ve sat through many vendor presentations comparing Dell servers with HP servers, and HP blades with Cisco UCS. They all have their advantages and disadvantages. Even whitebox servers work, but the main things to keep in mind are ease of management, installation and configuration, scaling, and monitoring.

Here’s what I mean about having your own standards:

Managing – How easy is it to update firmware, replace failed components and get support from the vendor? Does your staff need training on anything new? Are these tasks handled ad hoc, or does your Ops team have a standard for doing them?
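Having a standard for firmware updates also makes it easy to spot hosts that have drifted from it. Here is a minimal sketch, assuming you can export each host’s component versions into a dictionary by whatever means your vendor tools provide; the component names and version strings are hypothetical.

```python
# Hypothetical baseline: the firmware versions your Ops team has approved.
BASELINE = {"bios": "2.4.3", "raid_controller": "25.5.0", "nic": "18.8.9"}


def firmware_drift(hosts, baseline=BASELINE):
    """Return {hostname: {component: (found, expected)}} for every mismatch.

    A component missing from a host's inventory shows up as (None, expected).
    """
    drift = {}
    for hostname, installed in hosts.items():
        mismatches = {
            component: (installed.get(component), expected)
            for component, expected in baseline.items()
            if installed.get(component) != expected
        }
        if mismatches:
            drift[hostname] = mismatches
    return drift


if __name__ == "__main__":
    inventory = {
        "esx01": {"bios": "2.4.3", "raid_controller": "25.5.0", "nic": "18.8.9"},
        "esx02": {"bios": "2.1.0", "raid_controller": "25.5.0"},
    }
    for host, issues in firmware_drift(inventory).items():
        print(host, issues)
```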

Configuration and Installation – Every server demo makes it look so easy, but once the servers are unboxed, racked, cabled and powered on in your data center, will they work the same way? Racking the servers is the easy part; having a provisioning system that installs and configures ESXi is the challenge.

Do you have a PXE boot deployment solution and scripts that build every host the same way to avoid human error? Or, at the very least, a server build checklist that everyone follows to ensure all your servers are set up and cabled the same way?
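One way to get "every host built the same way" is to generate each host’s scripted-install (kickstart) file from a single template, so only the per-host values differ. The sketch below assumes a simple static-IP build; the commands in the template (`vmaccepteula`, `install`, `rootpw`, `network`, `reboot`) are standard ESXi kickstart commands, but verify the options against the documentation for your ESXi version. The addresses, hostnames and password handling are hypothetical placeholders.

```python
from string import Template

# Hypothetical ESXi kickstart template; check each option against your ESXi
# version's scripted-install documentation before using it.
KS_TEMPLATE = Template(
    "vmaccepteula\n"
    "install --firstdisk --overwritevmfs\n"
    "rootpw $root_password\n"
    "network --bootproto=static --device=vmnic0 --ip=$ip "
    "--netmask=$netmask --gateway=$gateway --hostname=$hostname "
    "--nameserver=$dns\n"
    "reboot\n"
)

# Hypothetical site settings shared by every host.
SITE = {"netmask": "255.255.255.0", "gateway": "192.168.10.1", "dns": "192.168.10.2"}


def render_kickstarts(hosts, root_password, site=SITE):
    """Return {hostname: kickstart text}, built from one template for every host."""
    return {
        name: KS_TEMPLATE.substitute(
            site, hostname=name, ip=ip, root_password=root_password
        )
        for name, ip in hosts.items()
    }


if __name__ == "__main__":
    configs = render_kickstarts(
        {"esx01.example.com": "192.168.10.11", "esx02.example.com": "192.168.10.12"},
        root_password="ChangeMe",  # placeholder; pull from a secrets store in practice
    )
    for name, text in configs.items():
        print(f"# {name}\n{text}")
```

The design point is that the template is the standard: if a setting needs to change, it changes once and every host picks it up on its next build.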

Scaling to add more capacity – Many of the servers on the market today are amazingly huge boxes of CPU and memory. Just a couple of these beefy servers can probably provide enough compute and memory for most small environments.

As much as I like these monster servers, my rule of thumb is smaller, more manageable units of capacity (20 to 25 VMs per host is my limit). This usually puts my servers at 2 sockets, 24 cores and 192 GB of memory, with the main differences being the PCI requirements for HBAs, GBICs, RAID controllers and NICs.
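To see what that rule of thumb means in practice, here is a small worked sketch. It assumes a hypothetical 200-VM environment, one spare host for failover (N+1), and the 2-socket, 24-core, 192 GB host described above.

```python
import math


def hosts_needed(total_vms, vms_per_host=25, spare_hosts=1):
    """Hosts needed to stay at or under the per-host VM limit, plus failover spares."""
    return math.ceil(total_vms / vms_per_host) + spare_hosts


def per_vm_budget(cores=24, memory_gb=192, vms_per_host=25):
    """Average physical cores and GB of memory per VM on a fully loaded host."""
    return cores / vms_per_host, memory_gb / vms_per_host


if __name__ == "__main__":
    print(hosts_needed(200))  # 8 hosts at 25 VMs each, plus 1 spare: 9
    print(per_vm_budget())    # (0.96, 7.68)
```

At 25 VMs per host, each VM averages roughly one physical core and 7.68 GB of memory, which is a comfortable fit for typical general-purpose workloads.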

You may be asking why only 20 to 25 VMs per host, when you can build hosts that handle 50 to 100 and save on licensing and hardware costs?

My quick and easy answer: hardware fails, and 100 VMs failing over at an unpredictable time (normally during rush hour) is too much risk. It is also hard to plan maintenance for that many VMs on one host (remember, each VM is a service, and every service owner needs to agree to the maintenance window).
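The "blast radius" argument is easy to put into numbers. This sketch compares one host failure at 25 versus 100 VMs per host for a hypothetical 200-VM environment, assuming one spare host's worth of capacity is kept free (N+1) so the surviving hosts can restart the failed host's VMs.

```python
import math


def failure_impact(total_vms, vms_per_host, spare_hosts=1):
    """Compare what one host failure costs at a given consolidation ratio."""
    hosts = math.ceil(total_vms / vms_per_host)
    return {
        "hosts": hosts,
        # Worst case: the failed host was fully loaded.
        "vms_hit_by_one_failure": min(vms_per_host, total_vms),
        # Share of total cluster capacity held free for failover.
        "n_plus_1_reserve": spare_hosts / (hosts + spare_hosts),
    }


if __name__ == "__main__":
    for ratio in (25, 100):
        impact = failure_impact(200, ratio)
        print(ratio, impact["vms_hit_by_one_failure"], f"{impact['n_plus_1_reserve']:.0%}")
```

At 25 VMs per host, one failure hits 25 VMs and about 11% of capacity sits in reserve; at 100 VMs per host, one failure hits 100 VMs and about 33% of capacity sits in reserve.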

Most Dell, HP or Cisco sales reps have never been up at 2 AM upgrading ESX or dealing with a crash on a monster server. I have, and it’s not a pretty sight, especially when VPs and CIOs are onsite wanting real-time status updates while the outage is being resolved.

Big servers are perfect when you need BIG VMs, but for most normal workloads that scale wide, they are the wrong choice. Just my opinion!

Plan for server hardware failures, they happen to the best of us!

3. Choose Storage for Your vSphere Environment That Maintains Your Compatibility Standards

Even more confusing than server hardware is storage hardware! I like to narrow the decision to 4 factors: Cost per GB, Criteria, Performance and Support Staff!

Cost per GB – The first factor is the cost per GB, compared between storage options with similar performance. Not everything needs tier 1, fast disk or SSD, which could mean the difference between $8 per GB and $30 per GB.
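Here is a quick worked sketch of why tiering matters, using the $8 and $30 per GB figures above. The 10 TB total and the 20/80 split between fast and slower storage are hypothetical.

```python
def blended_cost(capacity_gb_by_tier, price_per_gb):
    """Total cost of a storage design, given GB per tier and $/GB per tier."""
    return sum(gb * price_per_gb[tier] for tier, gb in capacity_gb_by_tier.items())


# $/GB figures from the paragraph above; tier names are hypothetical.
PRICES = {"tier1_ssd": 30, "tier2": 8}

if __name__ == "__main__":
    all_tier1 = blended_cost({"tier1_ssd": 10_000}, PRICES)
    mixed = blended_cost({"tier1_ssd": 2_000, "tier2": 8_000}, PRICES)
    print(all_tier1, mixed)  # 300000 124000
```

Putting only the 20% of data that needs fast disk on tier 1 cuts the bill from $300,000 to $124,000 in this example.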

Performance! Performance! Performance! Don’t underestimate how important storage performance is for vSphere…

In my experience, storage has been the root cause of the biggest performance issues I’ve dealt with, followed by the network (oversubscribed switches).

Undersizing your storage to save money puts your vSphere environment at risk of chaos. When storage is overloaded, every VM using it suffers.

Work closely with your vendor and support staff to figure out your IOPS requirements for your current workloads and expected growth. Don’t assume your new storage will handle everything thrown at it.
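A common back-of-the-envelope method for turning front-end IOPS into what the disks must actually deliver is to multiply writes by the RAID write penalty. The sketch below uses the usual rule-of-thumb penalties (RAID 10 = 2, RAID 5 = 4, RAID 6 = 6); the 5,000 IOPS, 70/30 read/write mix and 20% growth are hypothetical. Array caching and vendor-specific RAID implementations change these numbers, so confirm them with your vendor.

```python
# Rule-of-thumb write penalties: back-end disk I/Os per front-end write.
RAID_WRITE_PENALTY = {"raid10": 2, "raid5": 4, "raid6": 6}


def backend_iops(frontend_iops, read_pct, raid_level, growth_pct=0):
    """Estimate back-end IOPS the array must deliver, including expected growth."""
    reads = frontend_iops * read_pct // 100
    writes = frontend_iops - reads
    base = reads + writes * RAID_WRITE_PENALTY[raid_level]
    # Integer ceiling division, so the growth allowance never rounds down.
    return -(-base * (100 + growth_pct) // 100)


if __name__ == "__main__":
    print(backend_iops(5000, 70, "raid5"))                  # 3500 + 1500 * 4 = 9500
    print(backend_iops(5000, 70, "raid5", growth_pct=20))   # 11400
    print(backend_iops(5000, 70, "raid10", growth_pct=20))  # 7800
```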

To avoid storage contention, I like dedicated storage for virtualization. Sharing disks with physical servers’ iSCSI or NFS mounts and CIFS shares is risky and requires someone qualified to manage it properly.

Storage Criteria Check List for vSphere

This is an example I created recently:

Manufacturer: ____________

Requirement | Score (1 to 10) | Comments (if applicable)
Ease of Management: How easy is it to create LUNs? (vCenter plugins?) | |
Unified Management: Can you manage all storage from one pane of glass? | |
Monitoring: How easy is it to find heavy-hitting VMs? | |
10GbE: Does it support your existing network? | |
Usability: Is it dedicated, or can it be shared? | |
Redundancy: Active/Passive or Active/Active? What are your needs? | |
Deduplication: Is this required? | |
Snapshots: Is this required? | |
Space: Capacity in GB? | |
Virtual: Is it VMware-aware? | |
Physical: iSCSI, NFS or FC? | |
Scalable: How does it scale when more storage is needed? | |
Price per GB: Important to know how much a GB costs! | |
Replication: Is this required? | |
Backup: Is this required? | |
Performance (IOPS): Very important! | |
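Once each manufacturer has been scored on the checklist, a weighted average makes them easier to compare side by side. This is a minimal sketch; the weights and the two vendors’ scores are hypothetical, and items without an explicit weight count as 1.

```python
# Hypothetical weights: how much each checklist item matters to your team.
# Items not listed here get a weight of 1.
WEIGHTS = {
    "Ease of Management": 2,
    "Monitoring": 2,
    "Performance (IOPS)": 3,
    "Price per GB": 3,
}


def weighted_score(scores, weights=WEIGHTS):
    """Weighted average of 1-10 checklist scores for one manufacturer."""
    for item, score in scores.items():
        if not 1 <= score <= 10:
            raise ValueError(f"{item}: score {score} is outside 1 to 10")
    total_weight = sum(weights.get(item, 1) for item in scores)
    weighted_sum = sum(score * weights.get(item, 1) for item, score in scores.items())
    return weighted_sum / total_weight


if __name__ == "__main__":
    vendor_a = {"Ease of Management": 8, "Monitoring": 6, "Performance (IOPS)": 9,
                "Price per GB": 5, "Replication": 7}
    vendor_b = {"Ease of Management": 6, "Monitoring": 9, "Performance (IOPS)": 7,
                "Price per GB": 8, "Replication": 5}
    for name, scores in (("Vendor A", vendor_a), ("Vendor B", vendor_b)):
        print(name, round(weighted_score(scores), 2))
```

The weights are where your own standards come in: a shop that lives and dies by performance might weight IOPS at 5, while a budget-constrained one might weight Price per GB highest.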

Qualified Storage Support Staff for vSphere – Not everyone is a storage expert, and even fewer people have experience carving storage for virtualization. All too often you find out too late that your storage wasn’t set up correctly or completely!

Keep in mind that the more features a storage unit has, the greater the chance it will not be set up properly. The same goes for any hardware or software product.

For example, you buy a shiny new storage array, and the vendor is all too happy to charge you for installation and configuration, except that what you paid for was only the basic configuration.

After the setup is done, the storage is turned over to your system admins, who are all too happy to start carving LUNs. This goes on for 6 to 12 months, or until the performance problems begin.

Rule of thumb: don’t overestimate your system admins. If they don’t have storage engineering experience, you will end up with a “Best Effort” setup and have to solve the problems later.

Don’t be caught off guard: get your storage admins trained and send them to user groups, where they can learn from others the best practices needed to keep your storage at peak performance.

4. Choose Hardware from the HCL That Works with Your Monitoring Solution

There isn’t much more to say here except to ask your vendor lots of questions, because monitoring is normally an afterthought. A big one: how do you monitor storage IOPS, by LUN or by individual VM?

Example: What if there are 100 VMs on a LUN and only one of them is the heavy hitter impacting the other 99? How does your storage alert you to this?
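If your storage can only report per-LUN IOPS, you may have to find the heavy hitter yourself from per-VM numbers, however you collect them (for example, from vCenter performance data or esxtop). Here is a minimal sketch of one way to flag it, assuming you already have average IOPS per VM over the same sampling window; the thresholds are hypothetical and should be tuned to your environment.

```python
from statistics import median


def heavy_hitters(vm_iops, share_threshold=0.25, median_multiple=5):
    """Flag VMs on one LUN whose IOPS dominate the LUN or dwarf their neighbors.

    vm_iops maps VM name -> average IOPS over the same sampling window.
    Note: on a LUN with only a handful of VMs, the share threshold is easy to
    trip, so tune it for small LUNs.
    """
    total = sum(vm_iops.values())
    typical = median(vm_iops.values())
    flagged = []
    for vm, iops in vm_iops.items():
        if total and iops / total >= share_threshold:
            flagged.append(vm)  # one VM is a large share of all LUN I/O
        elif typical and iops >= median_multiple * typical:
            flagged.append(vm)  # far busier than a typical VM on this LUN
    return sorted(flagged)


if __name__ == "__main__":
    # 99 quiet VMs and one noisy neighbor, as in the example above.
    lun = {f"vm{i:02d}": 20 for i in range(99)}
    lun["vm-heavy"] = 3000
    print(heavy_hitters(lun))  # ['vm-heavy']
```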

Conclusion:

I know this covered much more than just the VMware HCL, but you should have your own hardware compatibility standards and best practices. Price and brand shouldn’t be the only things that influence the purchase of new hardware.

Far too often, something new causes problems because nobody was looking at the bigger picture; all they saw was something shiny and new.

Your Turn:

Do you have a hardware best practice you want to share? Please comment!
