Does your VMware vSphere Environment require First Aid?
Rescuing A Poorly Performing VMware vSphere Virtual Infrastructure
About 4 years ago I was hired by a firm to help rescue their VMware infrastructure. Though they had everything any top-performing VMware environment needed for success, including a VMware TAM, it just wasn’t meeting customer requirements for performance.
The job began with me interviewing frustrated developers who demanded I move them off slow VMs and back onto physical servers ASAP: “Get me off this VMware crap, now!” It was my job to calm them down and reassure them we were going to fix the slow VM problems.
Help’s On The Way!
After about a week the problems were getting clear to me: mismatched combinations of server hardware, poorly configured storage, an oversubscribed network with backups running on the same switches (VLANs), poor VM configuration, and a lack of knowledge and expertise.
Basically it was the typical story I’ve seen many times when the free version of ESXi turns deadly.
VMware is really an awesome technology if it’s set up right.
Unfortunately, for years most companies have been deploying VMware into an already badly designed infrastructure and then expecting it to somehow improve server performance while saving money at the same time. Adding VMware or any other hypervisor product will only highlight the issues that already exist: bad storage, network, design and build practices. An unwillingness to change the way things are done in storage and network management will doom your virtual infrastructure real fast.
- First, get help from someone with expertise in the virtual infrastructure space. Normally this will be a third-party VMware engineer who is not defending their own bad practices.
- Do an inventory and technical analysis of everything that VMware leverages: how networks are set up, VLAN configuration, firewalls, backup traffic, storage types and configuration, spindle count, drive sizes, RAID configurations, LUN sizes, Brocade switches, and what shares the same storage and network equipment. Then start on server hardware: model, type, brand, CPUs, memory, blade or not blade, firmware, HBAs, NICs. Finally, note which VMware ESX/ESXi versions are being used: patch level, VMware Tools version, VM configuration, etc.
- Now decide what the long-term expectations are for this VI environment. For example, do you plan to virtualize everything possible? If so, then it’s time to think about dedicated resources that are only used for virtualization. No more shared networks and storage environments.
- Redesign for the future and start migrating to a new, tested, validated and standardized virtual infrastructure that uses best practices for virtualization. Note: Ignorance is the enemy, and allowing system, network or storage administrators to do foolish things will cause poor performance, outages and downtime.
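To make the inventory step a little more concrete, here is a rough sketch of how you might flag mismatched hosts once the inventory has been collected. It assumes you have already gathered the data by hand or exported it from your own tooling; the `HostRecord` fields, the cluster names and the sample values are all made up for illustration and do not come from any VMware API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class HostRecord:
    """One row of a hand-collected host inventory (all fields are illustrative)."""
    name: str
    cluster: str
    model: str
    cpu_count: int
    memory_gb: int
    esxi_build: str
    hba_firmware: str


# Attributes that should be identical for every host in the same cluster.
CHECKED_FIELDS = ("model", "cpu_count", "memory_gb", "esxi_build", "hba_firmware")


def find_mismatches(hosts):
    """Return {cluster: {field: sorted distinct values}} for fields that differ."""
    seen = defaultdict(lambda: defaultdict(set))
    for host in hosts:
        for field in CHECKED_FIELDS:
            seen[host.cluster][field].add(getattr(host, field))

    report = {}
    for cluster, fields in seen.items():
        differing = {
            field: sorted(values, key=str)
            for field, values in fields.items()
            if len(values) > 1
        }
        if differing:
            report[cluster] = differing
    return report


# Hypothetical sample data: one odd host hiding in an otherwise uniform cluster.
inventory = [
    HostRecord("esx01", "prod-a", "ModelX", 2, 256, "build-100", "fw-1.2"),
    HostRecord("esx02", "prod-a", "ModelX", 2, 256, "build-100", "fw-1.2"),
    HostRecord("esx03", "prod-a", "ModelY", 2, 128, "build-090", "fw-1.2"),
    HostRecord("esx04", "prod-b", "ModelX", 2, 256, "build-100", "fw-1.2"),
]

for cluster, fields in find_mismatches(inventory).items():
    for field, values in fields.items():
        print(f"{cluster}: {field} differs across hosts: {values}")
```

The same idea extends to LUN sizes, VLAN assignments or VMware Tools versions: pick the attributes that should match within a group, then report every group where they don't.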
A well-performing virtual infrastructure is standardized and, in my experience, scales by adding more of the same blocks of capacity (example only: a cluster of 4 hosts using 4 terabytes, with identical networks and VM configurations that follow a documented and tested design). Note: Try to stay away from one-off requests, if possible.
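The building-block idea comes down to simple arithmetic, which the sketch below tries to show. The 4 hosts and 4 terabytes echo the example-only block above; the `vms_per_host` figure and the demand numbers are pure assumptions picked for illustration, so substitute whatever your own tested design supports.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """One standard unit of capacity; hosts and storage echo the example block."""
    hosts: int = 4
    storage_tb: float = 4.0
    vms_per_host: int = 20  # assumption for illustration only


def blocks_needed(vm_count, storage_tb, block=Block()):
    """Whole blocks required to satisfy both the VM count and the storage need."""
    by_vms = math.ceil(vm_count / (block.hosts * block.vms_per_host))
    by_storage = math.ceil(storage_tb / block.storage_tb)
    # Always deploy at least one block, and never a partial one.
    return max(by_vms, by_storage, 1)


# Hypothetical demand: 150 VMs needing 10 TB in total.
n = blocks_needed(vm_count=150, storage_tb=10)
print(f"{n} blocks -> {n * Block().hosts} hosts, {n * Block().storage_tb} TB")
```

With these made-up numbers storage is the limiting factor, so the answer is three identical blocks rather than two blocks plus a one-off storage add-on.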
Diagnosing Slow VMware VMs
The worst environments are normally a result of leaders and teams doing dumb things requested by ignorant managers or executives. Here’s my suggestion on how to solve this.
- Get your vendor partners to come in and do lunch-and-learns or training sessions to help educate staff on their products.
- Make sure you ask them plenty of well-thought-out questions about the risks or disadvantages of doing things that are questionable.
Hopefully your vendors will be honest when answering your questions (some may hold back because they sense what you are doing).
SOW for a Health Check
If worst comes to worst, call VMware and get a statement of work (SOW) worked out to have someone do a health check on your vSphere environment. I never understood this, but executives find bad news more believable when it comes from a third party.
Your turn: share your tips and comments on how to tune and tweak a poorly performing VMware environment and get rid of slow VMs.