Managers! Is it time for a VMware health check to make sure your virtualization investment is not wasted?
Optimize TCO and ROI by Keeping vSphere in Shape
I know infrastructure budgets are tight, and I also know how difficult it can be to balance CAPEX and OPEX from month to month.
Over the years, I’ve had the pleasure of creating reports for ROI and TCO as well as calculating monthly staff and environment utilization down to the kilowatt.
That’s why a regular VMware health check is important: it keeps VMware cost of ownership low by exposing waste and giving you the opportunity to take proactive action.
Barring some of the finer details, here’s what I look for:
1. Start by going vZombie Hunting!
The biggest culprit of a poorly utilized virtual environment is VM sprawl. Sprawl happens when virtual machines get deployed for a project and are then left unused, when old services get upgraded to new servers but the original VMs never get decommissioned, or when a service gets shut down but its servers never get turned off.
Stop VM Sprawl Before It’s Too Late!
In these cases you end up with valuable resources burning cycles on storage, server and network hardware for nothing – thus they are vZombies (BTW – I started using this term years ago) and candidates for the VMware health check report.
Hunting vZombie servers isn’t easy unless you have a tool such as VMware OpsManager or vKernel. The other way is to create a custom VMware health check PowerShell script that checks and logs when a VM shows no CPU, memory or network activity. Normally a flatline on all three is a good indicator of a zombie.
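Here is a minimal sketch of the flatline check in Python, just to show the logic. It assumes you have already exported per-VM performance samples from your monitoring tool; the `VmSamples` structure, the function names and the threshold values are all hypothetical, so tune them to your own environment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VmSamples:
    """Performance samples for one VM over the review window (hypothetical structure)."""
    name: str
    cpu_mhz: List[float]
    mem_active_mb: List[float]
    net_kbps: List[float]


def is_flatlined(samples: List[float], threshold: float) -> bool:
    """True when there is data and every sample is at or below the threshold."""
    return bool(samples) and max(samples) <= threshold


def find_zombie_candidates(vms: List[VmSamples],
                           cpu_mhz: float = 50.0,    # assumed threshold
                           mem_mb: float = 64.0,     # assumed threshold
                           net_kbps: float = 1.0) -> List[str]:  # assumed threshold
    """Return names of VMs that flatline on CPU, memory and network together."""
    return [
        vm.name
        for vm in vms
        if is_flatlined(vm.cpu_mhz, cpu_mhz)
        and is_flatlined(vm.mem_active_mb, mem_mb)
        and is_flatlined(vm.net_kbps, net_kbps)
    ]
```

A VM with no samples at all is deliberately not flagged, since missing data is not the same thing as a flatline. Treat the output as a list of candidates to review with the service owner, not a kill list.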
Once you track down the vZombies and check with the service owner to get the go-ahead, turn them off, back them up, and delete these VMs from your vCenter inventory.
Also, don’t forget that some server IP addresses have firewall rules and VIPs associated with them, so clean those up, too!
2. Retire Old Server Hardware that is OOW and EOL
No VMware health check would be complete without retiring Out of Warranty (OOW) and End of Life (EOL) server hardware.
Read my lips: this hardware is wasting your ESX licenses, because you cannot get enough memory or CPU cores into these systems to leverage your “per socket” ESX license efficiently. Old servers are inefficient for VMware!
For example: one loaded HP 380G8 or Dell R420 can handle more memory and CPU cores than 4 to 6 old servers and will still use only 2 ESX licenses. Consolidating on new servers also reduces rack U, uses less power and cooling, lowers the port count on switches, lowers warranty renewals, reduces management overhead because there are fewer physical servers, and cuts downtime from failures of tired junk.
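To put numbers on the licensing argument, here is a small Python sketch of the per-socket arithmetic. It assumes the old servers are two-socket boxes, which the example above does not state, and the function name is made up for illustration.

```python
def consolidation_savings(old_hosts: int, old_sockets_per_host: int,
                          new_hosts: int, new_sockets_per_host: int) -> dict:
    """Compare per-socket license counts before and after consolidation."""
    old_licenses = old_hosts * old_sockets_per_host
    new_licenses = new_hosts * new_sockets_per_host
    return {
        "old_licenses": old_licenses,
        "new_licenses": new_licenses,
        "licenses_freed": old_licenses - new_licenses,
    }


# Four old two-socket servers (assumed) replaced by one new two-socket server.
print(consolidation_savings(4, 2, 1, 2))
# {'old_licenses': 8, 'new_licenses': 2, 'licenses_freed': 6}
```

The freed licenses are only a real saving if the new host truly has the memory and cores to absorb the old workloads, so check capacity before you count on them.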
Another best practice is to physically get rid of OOW and EOL servers so your admins can’t reuse them. I’ve seen too many junk servers pulled from the bone yard and put back into service because they were available. I repeat… Old servers are inefficient for VMware! If it’s not on the current VMware HCL, it should be disposed of…
3. Standardize Configuration for Good VMware Health
A little bit of HP EVA, and a little bit of NetApp, and a little bit of local disk might make for a good song lyric, but they add up to a vSphere that is hard to manage, optimize and keep efficient.
The same goes for mismatched servers of all makes and models, as well as mismatched memory and CPU configurations, such as servers with 32GB, others with 64GB, and still others with 192GB, all in the same vSphere ESX cluster…
A good practice to follow is to take inventory of your equipment and enforce standard hardware configurations. This is key to optimizing your VMware investment, because one-off environments are trouble and need to be on the VMware health check report so they can be dealt with!
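Once you have the inventory, spotting the one-offs is easy to automate. This is a rough Python sketch that assumes your inventory is a list of `(cluster, host, model, memory_gb)` rows exported from wherever you keep it; that layout and the function name are my own invention.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

HostRow = Tuple[str, str, str, int]  # (cluster, host, model, memory_gb), assumed layout


def nonstandard_clusters(hosts: Iterable[HostRow]) -> Dict[str, List[Tuple[str, int]]]:
    """Return clusters that contain more than one distinct (model, memory) config."""
    configs = defaultdict(set)
    for cluster, _host, model, memory_gb in hosts:
        configs[cluster].add((model, memory_gb))
    # A standardized cluster has exactly one configuration across all hosts.
    return {c: sorted(v) for c, v in configs.items() if len(v) > 1}
```

Any cluster that comes back from this function is a candidate for the standardization section of the report.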
4. Report Bloated Virtual Servers with Too Much Memory, CPU and Disk
Finally, a thorough report will include resource waste such as VMs that were created with too much memory, CPU and disk space.
Overusing valuable vSphere resources is common in some environments because engineers and developers are used to ordering servers based on physical criteria and have never been shown proof that their servers are using only 20% of the resources they were provisioned with.
The unfortunate thing here is that, depending on the scale, you may not be able to clean up existing systems because too much work may be involved, but you can start to trim back resources on newly deployed VMs.
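Showing people the proof is mostly a matter of comparing what was provisioned against what is actually used. Here is a minimal Python sketch, assuming you can export each VM's provisioned memory and its peak usage over the review period; the 50% cutoff and the tuple layout are assumptions to adjust.

```python
from typing import Iterable, List, Tuple

VmRow = Tuple[str, float, float]  # (name, provisioned_gb, peak_used_gb), assumed layout


def oversized_vms(vms: Iterable[VmRow],
                  max_utilization: float = 0.5) -> List[Tuple[str, int]]:
    """Return (name, utilization %) for VMs whose peak use is below the cutoff."""
    report = []
    for name, provisioned_gb, peak_used_gb in vms:
        if provisioned_gb <= 0:
            continue  # skip rows with bad or missing provisioning data
        utilization = peak_used_gb / provisioned_gb
        if utilization < max_utilization:
            report.append((name, round(utilization * 100)))
    return report
```

The same comparison works for CPU and disk; I used memory here only to keep the example short.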
Small, Medium, Large Virtual Servers
A good practice is to come up with standard VM sizes, such as small, medium and large, each with a fixed memory, CPU and disk configuration. This also makes capacity management easier, since you now have a set block to calculate capacity from. This is not uncommon; most cloud hosts use standard sizes for their VMs.
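To show how a set block simplifies capacity math, here is a short Python sketch. The three size definitions are placeholder numbers I picked for illustration, not a recommendation, and the function name is hypothetical.

```python
# (vCPU, memory GB, disk GB) per standard size; values are illustrative assumptions.
SIZES = {
    "small": (1, 2, 40),
    "medium": (2, 4, 80),
    "large": (4, 8, 160),
}


def vms_that_fit(size: str, free_vcpu: int, free_mem_gb: int, free_disk_gb: int) -> int:
    """How many more VMs of the given size fit into the remaining capacity."""
    vcpu, mem_gb, disk_gb = SIZES[size]
    # The scarcest resource decides how many VMs can still be deployed.
    return min(free_vcpu // vcpu, free_mem_gb // mem_gb, free_disk_gb // disk_gb)


print(vms_that_fit("medium", free_vcpu=64, free_mem_gb=96, free_disk_gb=4000))
# 24
```

In that example memory runs out first (96 / 4 = 24), which is exactly the kind of answer that is hard to get when every VM is a custom size.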
5. Taking Action after the VMware Health Check is Complete
A good VMware health check documents everything that violates these best practices. Once the report is completed, you will want an action plan that maps out:
- How zombie VMs are decommissioned to get rid of sprawl.
- How old hardware is replaced and disposed of to get rid of inefficient server hardware.
- How storage, server and network systems are standardized and consolidated to reduce overall CAPEX and OPEX.
- How VM configurations will be standardized and right-sized so they are more efficient.
Overall, this VMware health check focuses on cleanup, standardization and consolidation tasks that will make your virtualization more efficient and help increase the return on your investment.
There are other things you can add to the check, such as VMware Tools and firmware updates and ESXi upgrades, but I have listed the key areas your VMware health check report should include to reduce TCO and improve ROI.
VMware Health Check Scripts and Services:
- Order a VMware Professional Service Engagement
- Old ESX Health Check Script
- Another Script with Report