vSphere Monitoring for Health & Performance

Managing and supporting various vSphere environments over the years has given me a close-up view of which vSphere monitoring checks are best for keeping ESXi hosts healthy and VMs at peak performance.

And now that most IT departments have realized that cramming as many VMs as possible onto a host is not a best practice, it’s easier to justify prioritizing health and performance over virtual machine density when weighing return on investment.

Think of your vSphere as a STACK

From the bottom up, you have (physical stack):

  • network switches, VLANs, interfaces and connections
  • storage units, LUNs, and spindles
  • server hardware, memory, CPUs, hard drives
  • ESXi
  • vCenter & client
  • databases
  • monitoring tools

Then there are virtual resources such as (virtual stack):

  • vSwitches
  • datastores
  • clusters
  • virtual hardware
  • resource pools
  • vCloud Director

And finally on top of that are (platform stack):

  • VMs
  • OS
  • Middleware
  • vApps and Apps

Most health and performance problems originate in the physical stack (in networking, storage, or server hardware) and then affect the virtual stack.

For example:

Poor network performance on all the VMs sharing a vSwitch that is connected via a NIC to an oversubscribed network switch.

or

Poor performance on all the VMs that share the same oversubscribed storage LUN.

or

Poor performance on all the VMs that share the same oversubscribed server hardware.
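To make "oversubscribed" concrete for the server hardware case, here is a minimal Python sketch that computes allocation ratios for a single host. The `Host` record, the sample figures, and the 4:1 vCPU and 1.0 memory limits are assumptions for illustration only, not VMware guidance; the inventory numbers would come from whatever reporting tool you already use.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical inventory record for one ESXi host."""
    name: str
    physical_cores: int
    physical_mem_gb: int
    vcpus_allocated: int      # sum of vCPUs across powered-on VMs
    vm_mem_allocated_gb: int  # sum of configured VM memory


def oversubscription(host: Host) -> dict:
    """Return the vCPU-to-core and memory allocation ratios for a host."""
    return {
        "cpu_ratio": host.vcpus_allocated / host.physical_cores,
        "mem_ratio": host.vm_mem_allocated_gb / host.physical_mem_gb,
    }


def is_oversubscribed(host: Host, max_cpu_ratio: float = 4.0,
                      max_mem_ratio: float = 1.0) -> bool:
    """Flag a host whose ratios exceed the assumed limits."""
    ratios = oversubscription(host)
    return (ratios["cpu_ratio"] > max_cpu_ratio
            or ratios["mem_ratio"] > max_mem_ratio)


if __name__ == "__main__":
    # Sample host with made-up numbers.
    esx01 = Host("esx01", physical_cores=16, physical_mem_gb=256,
                 vcpus_allocated=96, vm_mem_allocated_gb=192)
    print(oversubscription(esx01))
    print(is_oversubscribed(esx01))
```

The right limits depend on the workload, so treat the defaults as placeholders to tune against what your own VMs actually need.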

7 vSphere Monitoring Recommendations

  1. Monitoring networking to ensure there are no switches or ports that are oversubscribed due to too many VMs sharing a single port or VLAN. This used to be a common health problem before 10G came along.
  2. Monitoring storage for available spare capacity in datastores and to ensure you are not queuing during peak business hours. Queuing happens when your storage unit cannot process I/O as fast as it arrives, so reads and writes back up (queue). This used to be a big health problem when disk drive capacity grew but spindle counts shrank. SSDs and caching have improved this, but it is still important to ensure queuing is not happening during peak hours, when top performance is critical.
  3. Monitoring for server hardware alerts and failures is another important daily health check. This can include firmware and drivers for blades, chassis, controllers, NICs, and various other hardware devices.
  4. Monitoring resource thresholds so that more hardware resources are added before your warning thresholds are breached. Allowing an ESXi host to be oversubscribed puts all the VMs at risk of poor performance, since it only takes one heavily used VM (a bully) to cause performance issues across all the VMs sharing the same host. I’ve found that having at least N+1 ESXi hosts per cluster allows DRS and HA to safely protect your vSphere.
  5. Monitoring for orphan and zombie VMs that waste valuable physical and virtual hardware resources, as well as for snapshots and overprovisioning of memory and storage resources during VM deployments. Note: Reducing assigned resources after the VM deployment is almost impossible in most cases, regardless of what capacity tools you have in place.
  6. Monitoring ESXi patches and VMware Tools upgrades to ensure important security and system updates are applied regularly.
  7. Monitoring that your VM OS deployment templates are up to date to ensure best practices are consistent after ESXi updates and upgrades are rolled out, and that new OS patches and revisions are rolled up into new templates.
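The N+1 idea in recommendation 4 can be sanity-checked with simple arithmetic. The sketch below is a simplification under stated assumptions: it looks only at memory, it assumes the largest host is the one that fails, and the function name and sample numbers are hypothetical. It is not how HA admission control calculates capacity; it only illustrates the headroom question.

```python
def survives_host_failure(host_mem_gb, vm_mem_demand_gb, failures=1):
    """Return True if the cluster can still hold the VM memory demand
    after losing the `failures` largest hosts (worst case)."""
    if failures >= len(host_mem_gb):
        # No hosts would be left to run anything.
        return False
    # Keep the smallest hosts, i.e. drop the largest ones.
    remaining = sorted(host_mem_gb)[:len(host_mem_gb) - failures]
    return sum(remaining) >= vm_mem_demand_gb


if __name__ == "__main__":
    cluster = [256, 256, 256]  # GB of RAM per host, made-up values
    print(survives_host_failure(cluster, vm_mem_demand_gb=500))
    print(survives_host_failure(cluster, vm_mem_demand_gb=600))
```

A fuller check would repeat the same test for CPU and account for reservations, but even this rough version shows when a cluster has quietly grown past its N+1 headroom.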

There are health check scripts that you can find online, but these 7 monitoring recommendations will help ensure your vSphere is healthy and VMs are at top performance.
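As a rough idea of what such a health check script might contain, here is a small Python sketch covering two of the checks above: datastore spare capacity and old snapshots. The dictionary layout, the 20% free-space floor, and the 3-day snapshot age limit are assumptions for illustration; a real script would pull this data from vCenter with the tooling of your choice.

```python
from datetime import datetime, timedelta


def low_space_datastores(datastores, min_free_pct=20.0):
    """Return names of datastores whose free space is below min_free_pct."""
    flagged = []
    for ds in datastores:
        free_pct = 100.0 * ds["free_gb"] / ds["capacity_gb"]
        if free_pct < min_free_pct:
            flagged.append(ds["name"])
    return flagged


def stale_snapshots(snapshots, now, max_age_days=3):
    """Return (vm, snapshot name) pairs older than max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    return [(s["vm"], s["name"]) for s in snapshots if s["created"] < cutoff]


if __name__ == "__main__":
    # Made-up sample data standing in for a vCenter inventory export.
    datastores = [
        {"name": "ds-prod-01", "capacity_gb": 1000, "free_gb": 150},
        {"name": "ds-prod-02", "capacity_gb": 1000, "free_gb": 400},
    ]
    snapshots = [
        {"vm": "web01", "name": "pre-patch", "created": datetime(2024, 1, 1)},
        {"vm": "db01", "name": "pre-upgrade", "created": datetime(2024, 1, 9)},
    ]
    print(low_space_datastores(datastores))
    print(stale_snapshots(snapshots, now=datetime(2024, 1, 10)))
```

Passing `now` in explicitly keeps the snapshot check easy to test and to run against historical exports.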

Conclusion

Never fall into the trap of letting density overrule performance, especially if the environment you are supporting runs business-critical applications that affect your company’s customers.

This only covers vSphere monitoring checks that relate to the infrastructure stack (IaaS).

What is not covered are the checks and monitoring for the VM OS and application stacks (PaaS and SaaS).

Those checks are generally assigned to another team or group (DevOps or System Engineers).

Other recommended reading: vSphere 5 Monitoring and Performance Guide
