VMware Health Check (Improve Uptime & Efficiency)


Managers! Is it time for a VMware health check to make sure your virtualization investment is not wasted?

Improve Uptime by Keeping vSphere in Shape

I know infrastructure budgets are tight, and I also know how difficult it can be to balance CAPEX and OPEX from month to month.

Over the years, I’ve had the pleasure of creating SLA uptime reports as well as calculating monthly staff and environment utilization down to the kilowatt.

That’s why I think a regular VMware health check is so important: it helps keep your VMware total cost of ownership low by exposing waste and giving you opportunities to take proactive action.

Barring some of the finer details, here’s what I look for:

1. Start by going vZombie Hunting!

The biggest culprit of a poorly utilized vSphere environment is VM sprawl.

What is sprawl?

Sprawl is when virtual machines get deployed for a project and then are left unused, or when old services get upgraded onto new virtual servers…

…but the old VMs never get decommissioned. Or when a service gets shut down but its servers never get turned off…


Stop VM Sprawl Before It’s Too Late!

When sprawl happens, you end up with valuable resources burning cycles on storage, servers, and network hardware for nothing. These VMs are vZombies (BTW, I started using this term years ago) and perfect candidates for decommissioning.

Hunting vZombie servers isn’t easy unless you have a tool such as VMware vCenter Operations Manager or vKernel.

The other way is to write a custom VMware health check PowerShell script (for example, with VMware PowerCLI) that checks and logs VMs showing no CPU, memory, or network activity.

A flatline across all three over a sustained period is usually a good indicator of a zombie.
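If you go the script route, here’s a minimal Python sketch of the flatline test. It assumes you’ve already exported average CPU, active memory, and network figures per VM for your review window (for example with PowerCLI’s Get-Stat cmdlet). The VmStats record and the threshold values are hypothetical; tune them for your environment.

```python
from dataclasses import dataclass


@dataclass
class VmStats:
    """Hypothetical per-VM averages over the review window (e.g. 30 days)."""
    name: str
    cpu_mhz: float        # average CPU usage, MHz
    mem_active_mb: float  # average active memory, MB
    net_kbps: float       # average network throughput, KBps


# Hypothetical "flatline" thresholds: adjust for your environment.
CPU_FLAT_MHZ = 50.0
MEM_FLAT_MB = 100.0
NET_FLAT_KBPS = 1.0


def is_vzombie(vm: VmStats) -> bool:
    """Zombie candidate only if CPU, memory AND network are all flat."""
    return (
        vm.cpu_mhz < CPU_FLAT_MHZ
        and vm.mem_active_mb < MEM_FLAT_MB
        and vm.net_kbps < NET_FLAT_KBPS
    )


def find_vzombies(stats: list[VmStats]) -> list[str]:
    """Names of all zombie candidates, sorted for a stable report."""
    return sorted(vm.name for vm in stats if is_vzombie(vm))


if __name__ == "__main__":
    sample = [
        VmStats("app01", 850.0, 2048.0, 120.0),
        VmStats("old-proj-db", 12.0, 40.0, 0.0),
        VmStats("batch01", 5.0, 30.0, 25.0),  # quiet CPU, but still on the network
    ]
    print(find_vzombies(sample))  # ['old-proj-db']
```

Requiring all three metrics to be flat keeps false positives down: a VM with a quiet CPU that’s still serving traffic isn’t a zombie. Treat the output as a list of candidates to confirm with the service owner, not a delete list.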

Once you track down the vZombies and get the go-ahead from the service owner, turn them off, back them up, and delete them from your vCenter inventory. (Note: follow your standard decommissioning process.)

Also, don’t forget that some server IP addresses have firewall rules and VIPs associated with them, so clean those up, too!

2. Retire Old Server Hardware that is OOW and EOL

No VMware health check would be complete without retiring Out of Warranty (OOW) and End of Life (EOL) server hardware.

Read my lips: this hardware is wasting your ESXi licenses, because you can’t fit enough memory or CPU cores into these systems to use your “per socket” ESXi license efficiently.

Old servers are inefficient for VMware, and OOW and EOL server hardware should be retired ASAP!

For example, one loaded HP DL380p Gen8 or Dell R420 can handle more memory and CPU cores than 4 – 6 old servers and will still use only 2 ESXi licenses. Consolidating onto new servers also:

  • Takes up fewer rack units (U)
  • Uses less power and cooling
  • Lowers the port count on your switches
  • Cuts warranty renewals
  • Reduces management overhead, since there are fewer physical servers
  • Lowers downtime from failures of tired junk
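To make the licensing math concrete, here’s a quick Python sketch. The fleet specs are hypothetical (five old 2-socket hosts with 4 cores and 16 GB each, replaced by one 2-socket host with 20 cores and 256 GB), and it compares raw cores and memory only, ignoring overcommit ratios and N+1 spare capacity.

```python
import math


def hosts_needed(total_cores: int, total_mem_gb: int,
                 host_cores: int, host_mem_gb: int) -> int:
    """Fewest new hosts that cover both the core and the memory totals."""
    return max(math.ceil(total_cores / host_cores),
               math.ceil(total_mem_gb / host_mem_gb))


def socket_licenses(hosts: int, sockets_per_host: int) -> int:
    """Per-socket licensing: one license per populated CPU socket."""
    return hosts * sockets_per_host


if __name__ == "__main__":
    # Hypothetical old fleet: five 2-socket hosts, 4 cores and 16 GB each.
    old_hosts, old_sockets, old_cores, old_mem_gb = 5, 2, 4, 16
    # Hypothetical new 2-socket host: 20 cores and 256 GB.
    new_sockets, new_cores, new_mem_gb = 2, 20, 256

    n_new = hosts_needed(old_hosts * old_cores, old_hosts * old_mem_gb,
                         new_cores, new_mem_gb)
    before = socket_licenses(old_hosts, old_sockets)
    after = socket_licenses(n_new, new_sockets)
    print(f"{n_new} new host(s); socket licenses {before} -> {after}")
    # 1 new host(s); socket licenses 10 -> 2
```

Taking the max of the two ceilings means whichever resource runs out first (cores or memory) decides the host count.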

Another best practice is to physically get rid of OOW and EOL servers, so your admins can’t put them back into service.

I’ve seen too many junk servers pulled from the boneyard and put back into service just because they were available.

I repeat…Old servers are inefficient for VMware!

If it’s not on the current VMware HCL, it should be disposed of…
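A retirement list can come straight out of your hardware inventory. This Python sketch flags hosts that are out of warranty or whose model isn’t on your approved HCL list. The inventory format, the model names, and the HCL set are hypothetical placeholders; the real lookup is done against the VMware Compatibility Guide for the ESXi release you run.

```python
from datetime import date

# Hypothetical list of models you've verified on the VMware HCL for your
# ESXi release. The model names here are placeholders, not real HCL data.
HCL_MODELS = {"ExampleCorp X200"}


def retirement_reasons(model: str, warranty_end: date, today: date) -> list[str]:
    """Why a host should be retired; an empty list means keep it."""
    reasons = []
    if warranty_end < today:
        reasons.append("OOW")
    if model not in HCL_MODELS:
        reasons.append("not on HCL")
    return reasons


if __name__ == "__main__":
    # Hypothetical inventory: (host, model, warranty end date)
    inventory = [
        ("esx01", "ExampleCorp X200", date(2017, 6, 30)),
        ("esx02", "ExampleCorp X50", date(2012, 3, 31)),
    ]
    review_date = date(2015, 1, 1)  # fixed date so the report is reproducible
    for host, model, warranty_end in inventory:
        reasons = retirement_reasons(model, warranty_end, review_date)
        if reasons:
            print(host, ", ".join(reasons))  # esx02 OOW, not on HCL
```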

3. Standardize Configuration for Good VMware Health

A little bit of HP EVA, a little bit of NetApp, and a little bit of local disk might make for a good song lyric, but together they add up to a vSphere environment that is hard to manage, optimize, and keep efficient.

The same goes for mismatched servers of all makes and models. Mixing memory configurations and CPU types in the same cluster is a no-no!

Some hosts with 32GB, others with 64GB, and still others with 192GB, all in the same vSphere ESXi cluster… this is a recipe for data loss and poor uptime, because if a 192GB host fails, the smaller hosts may not have enough free memory to restart all of its VMs.

A good best practice to follow is taking inventory of your equipment and enforcing standardization of hardware configurations.

This is key to optimizing your VMware investment because one-off environments are trouble and need to be on the VMware health check report so they can be dealt with!
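Once you have an inventory export, a short Python sketch like this one can flag clusters that mix hardware configurations. The second function is a rough N+1 memory check: with the same total memory, uneven hosts can leave too little room to restart VMs after the largest host fails. The tuple format and CPU model labels are hypothetical, and the check ignores per-VM memory overhead and reservations.

```python
from collections import defaultdict


def mixed_clusters(hosts):
    """hosts: iterable of (cluster, host, cpu_model, mem_gb) tuples.

    Returns {cluster: sorted distinct (cpu_model, mem_gb) configs} for every
    cluster that has more than one hardware configuration.
    """
    configs = defaultdict(set)
    for cluster, _host, cpu_model, mem_gb in hosts:
        configs[cluster].add((cpu_model, mem_gb))
    return {c: sorted(cfg) for c, cfg in configs.items() if len(cfg) > 1}


def survives_largest_host_failure(host_mem_gb: list[int], demand_gb: int) -> bool:
    """Rough N+1 check: can the other hosts hold demand_gb if the biggest host fails?

    Ignores per-VM overhead and reservations, so treat it as a sanity check only.
    """
    return sum(host_mem_gb) - max(host_mem_gb) >= demand_gb


if __name__ == "__main__":
    inventory = [  # hypothetical inventory export
        ("prod", "esx01", "CPU-A", 32),
        ("prod", "esx02", "CPU-A", 64),
        ("prod", "esx03", "CPU-B", 192),
        ("dev", "esx04", "CPU-A", 96),
        ("dev", "esx05", "CPU-A", 96),
    ]
    print(mixed_clusters(inventory))
    # {'prod': [('CPU-A', 32), ('CPU-A', 64), ('CPU-B', 192)]}

    # Same 288 GB total, same 150 GB of VM memory demand:
    print(survives_largest_host_failure([32, 64, 192], 150))  # False
    print(survives_largest_host_failure([96, 96, 96], 150))   # True
```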

4. Report Bloated Virtual Servers with Too Much Memory, CPU and Disk

Finally, a thorough report will include resource waste such as VMs that were created with too much memory, CPU, and disk space (aka overprovisioned).

Overusing valuable vSphere resources is common in some environments because engineers and developers are used to ordering servers based on physical-server criteria. They have never been shown proof that their servers use only 20% of the resources provisioned for them (another reason for a good tool).

The unfortunate thing here – depending on the scale – is you may not be able to clean up existing systems because too much work may be involved. But you can start to trim back resources on newly deployed VMs.
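Here’s a minimal Python sketch of the overprovisioning report, using the 20% figure from the paragraph above as the cutoff. The VmUsage fields, the 1.5x headroom factor, and the sample numbers are assumptions for illustration; base it on peak rather than average usage so you don’t trim a VM that only gets busy at month end.

```python
import math
from typing import NamedTuple


class VmUsage(NamedTuple):
    """Hypothetical per-VM sizing data for the review window."""
    name: str
    mem_gb: int          # provisioned memory
    mem_peak_gb: float   # peak memory actually used
    cpu_peak_pct: float  # peak CPU use as % of the provisioned vCPUs


def bloated_resources(vm: VmUsage, threshold: float = 0.20) -> list[str]:
    """Resources whose peak use is at or below `threshold` of what's provisioned."""
    found = []
    if vm.mem_peak_gb / vm.mem_gb <= threshold:
        found.append("memory")
    if vm.cpu_peak_pct / 100.0 <= threshold:
        found.append("cpu")
    return found


def suggested_mem_gb(vm: VmUsage, headroom: float = 1.5) -> int:
    """Peak use plus headroom, rounded up to a whole GB, never above current size."""
    return min(vm.mem_gb, max(1, math.ceil(vm.mem_peak_gb * headroom)))


if __name__ == "__main__":
    vms = [
        VmUsage("web01", 16, 3.0, 12.0),
        VmUsage("db01", 32, 28.0, 70.0),
    ]
    for vm in vms:
        waste = bloated_resources(vm)
        if waste:
            print(vm.name, waste, "suggested memory:", suggested_mem_gb(vm), "GB")
    # web01 ['memory', 'cpu'] suggested memory: 5 GB
```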

Small, Medium, Large Virtual Servers

A good best practice is to come up with standard VM size configurations, such as small, medium, and large, each with its own memory, CPU, and disk allocation.

This also makes capacity management easier, since you now have a fixed block size to calculate capacity from.

This approach is common; most cloud services offer standard sizes for their VMs.
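With fixed sizes, both request handling and capacity planning reduce to simple arithmetic. Here’s a Python sketch, assuming hypothetical small/medium/large values and that the free vCPU figure already reflects whatever vCPU-to-core overcommit ratio you allow.

```python
# Hypothetical standard sizes; pick values that fit your own workloads.
SIZES = {
    "small": {"vcpus": 1, "mem_gb": 2, "disk_gb": 40},
    "medium": {"vcpus": 2, "mem_gb": 8, "disk_gb": 80},
    "large": {"vcpus": 4, "mem_gb": 16, "disk_gb": 160},
}


def pick_size(vcpus: int, mem_gb: int, disk_gb: int) -> str:
    """Smallest standard size that satisfies a request (dicts keep insertion order)."""
    for name, s in SIZES.items():
        if s["vcpus"] >= vcpus and s["mem_gb"] >= mem_gb and s["disk_gb"] >= disk_gb:
            return name
    raise ValueError("request exceeds the largest standard size")


def capacity_in_blocks(size: str, free_vcpus: int, free_mem_gb: int,
                       free_disk_gb: int) -> int:
    """How many VMs of one size still fit; the scarcest resource sets the limit."""
    s = SIZES[size]
    return min(free_vcpus // s["vcpus"],
               free_mem_gb // s["mem_gb"],
               free_disk_gb // s["disk_gb"])


if __name__ == "__main__":
    print(pick_size(2, 6, 50))  # medium
    # Hypothetical free cluster capacity: 96 vCPUs, 512 GB RAM, 4000 GB disk
    print(capacity_in_blocks("medium", 96, 512, 4000))  # 48
```

In that example, vCPUs limit the cluster to 48 medium VMs even though memory alone would allow 64, which is exactly the kind of answer a fixed block size makes easy to get.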

5. Take Action ASAP…

A good VMware health check documents and lists every violation of these best practices. Once the report is complete, you’ll want an action plan that lays out a roadmap for:

  • How zombie VMs are to be decommissioned to get rid of sprawl and reclaim resources.
  • How old hardware will be replaced and disposed of to get rid of inefficient server hardware and improve TCO and uptime.
  • How storage, servers, and network systems will be standardized and consolidated to reduce overall CAPEX and OPEX and make them easier to manage.
  • How VM configurations will be standardized and made more efficient.

In Conclusion:

Overall, this VMware health check focuses on cleanup, standardization, and consolidation tasks that will make your vSphere more efficient, increase the return on your investment, and reduce downtime.

There are other tasks you can add to the check such as:

  • VMware Tools and firmware updating
  • ESXi upgrading

But these are the key areas that should be included in your standard VMware health check report.

VMware Health Check Scripts and Services: 

Do you have a recommendation to add?
