Managing Virtual Infrastructure 101


To begin, this post has nothing to do with the nuts and bolts of installing, configuring or supporting VMware or any other virtual infrastructure.

It’s about managing virtual infrastructure and building a solid team of admins and engineers to keep it alive.

Back in 2009, I wrote 10 Biggies to Help Managers and Admins Avoid Virtualization Pit-Falls, and since then not much has changed or gotten better. Managing virtual infrastructure is not for the faint of heart. It takes creativity, determination and a kick-ass team…

For the IT manager who has just been handed management of virtual infrastructure, it is a great opportunity. Once the honeymoon is over, it is a lot of work, especially if you are taking over after there were problems…

6 Keys to Managing Virtual Infrastructure Like a Pro!

1. Detailed Audit

Start with a detailed audit of all the hardware and software the VI is built from (also known as currency). I mean what servers the ESX/ESXi hosts are running on (make, model, warranty dates), the storage provisioned to them (make, model, design), how the network is designed (1/10G, fiber, flex, Xsigo, FCoE, copper, bandwidth, firewalls) and the versions of ESX/ESXi and vCenter. It’s part of the “Big” picture!
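One way to keep the audit usable is to record each host as a structured record rather than in scattered notes. Below is a minimal Python sketch, assuming a hand-maintained inventory. The `HostRecord` fields, the sample hosts, and the `flag_hosts` helper are all hypothetical and are not tied to any VMware API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class HostRecord:
    """One audited host. All field names are illustrative assumptions."""
    name: str
    make: str
    model: str
    warranty_end: date
    hypervisor_version: str
    nic_speed_gbit: int


def flag_hosts(hosts, today, standard_version):
    """Return {host name: [issues]} for hosts that are out of warranty
    or not on the agreed standard hypervisor version."""
    findings = {}
    for host in hosts:
        issues = []
        if host.warranty_end < today:
            issues.append("warranty expired")
        if host.hypervisor_version != standard_version:
            issues.append("non-standard version " + host.hypervisor_version)
        if issues:
            findings[host.name] = issues
    return findings


# Made-up sample data for illustration only.
inventory = [
    HostRecord("esx01", "VendorA", "ModelX", date(2016, 1, 31), "5.5", 10),
    HostRecord("esx02", "VendorA", "ModelX", date(2019, 6, 30), "6.0", 10),
    HostRecord("esx03", "VendorB", "ModelY", date(2015, 3, 15), "6.0", 1),
]

report = flag_hosts(inventory, today=date(2017, 1, 1), standard_version="6.0")
for name, issues in sorted(report.items()):
    print(name + ": " + ", ".join(issues))
```

The same records can later feed the health check and the road map, since warranty dates and version drift are exactly what a remediation plan needs to prioritize.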

2. Health Check

While collecting all the information for your audit, you’ll also need to find out what’s going on: what’s working and what’s not. You may have multiple data centers with different cluster designs that perform differently because of how they were set up. They may be using the same hardware and ESX version, but because of the way networking or storage is set up they behave totally differently. Another part of the “Big” picture.

Note: If you have the budget, VMware provides a Professional Service to do this.

3. Team Dynamics

Not every System Admin is cut out to be a virtualization engineer, because there’s more to building a solid virtual infrastructure than building servers. In my opinion, a virtualization engineer is someone who knows storage, networks, security, hardware, Windows/Linux, and scripting. Check out my blog on How to Apply for a VMware Engineer Job for what to look for when staffing your team.

A good team has a lead designer who knows the environment and can define clear requirements, and installers who can install and configure ESX/ESXi and vCenter according to the design. These two roles can be the same person and focus mainly on the back-end. Then there are the front-end guys who will handle the day-to-day operations of creating templates, provisioning VMs, customizing VMs, P2Vs, and patching. (OS management and support should be handled by SysAdmins.) They should also handle installing and updating VMware Tools.

Most importantly, you’ll need directors or executive leadership above you who will let you do the job right and who understand that virtual infrastructure is not server technology, it’s infrastructure.

Good team dynamics provide a wide set of job roles to handle virtual infrastructure as a convergence of technologies; it should be treated as a data center, not as a server. Far too often VI is treated like servers, which causes issues for the person managing the VI and for the team.

4. Road Map and Remediation

Once you have your audit, health check and team, you’ll need a remediation strategy and a road map for future growth. My suggestion for remediation is to standardize all your hardware, ESX installs, and storage and network designs. Try to keep your building blocks to standard configurations.

Avoid letting every Tom, Dick and Mary deviate from Best Practices because (s)he wants to try something new, even if they are the smartest on your team. New things should be planned, tested, and then added to the road map, not just thrown into production ad hoc.

I’d bet a dollar that most completed health checks and audits will turn up a slew of assorted storage, hardware and configurations left over from someone’s test or POC environment that became production and then got forgotten. I’ve done, and been involved in, enough POCs in the last 15 years to know how quickly someone’s POC can turn into production.

Somewhere above I mentioned a road map. In a nutshell, a road map is a sort of life-cycle management plan to remove old stuff and update to new stuff. As I stated, you will need to be determined to stay on track with your road map. Don’t be too aggressive here, because as environments grow it takes a long time to properly phase the old out and the new in, not to mention you will be juggling this with everything else going on. One more thing: don’t forget to celebrate these accomplishments and report them to your boss. Achieving a milestone is important news.

5. Performance and Health Monitoring (Check Engine Lights)


I like to think of problem areas as check engine lights. Do not ignore check engine lights or your VMs will crash – correction – most likely lots of VMs will crash at the same time and services will be offline until the issues are fixed. Then the blame game starts and someone will say to you, “I knew about that problem but I didn’t have time to fix it because I was too busy supporting projects.” At this point you will go crazy!

Whether you use 3rd party tools or custom scripts for monitoring performance and health, a good monitoring strategy is an absolute must-have. Nothing is worse for a VI manager than calls at 2AM and support tickets about poorly performing VMs. You and your team need to stay ahead of problems or else they will spread like fire and impact all the VMs running in your vDC (virtual data center).

The problem I’ve found is that most tools try to do too much and are normally bad at reporting what is important. Alerts and information overload about every VM, vDisk, vCPU, memory, ESX host or cluster will soon get ignored (out of the 5000 emails sent, which one is important, right?). Not to mention how much work it takes to customize some tools to work right or give you what you need. Also, a lot of tools will require a full-time resource from your team to focus on fixing all the issues they find, and believe me, even the healthiest environment will generate loads of alerts.
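To cut the flood down to what matters, one approach is to collapse repeated alerts and rank them by severity before anyone sees them. The sketch below assumes alerts arrive as simple (source, metric, severity) records. The severity names, the sample alerts, and the `summarize_alerts` helper are hypothetical and do not reflect the output format of any particular monitoring tool.

```python
from collections import Counter

# Assumed severity ranking; a lower number means more urgent.
SEVERITY_RANK = {"critical": 0, "warning": 1, "info": 2}


def summarize_alerts(alerts, min_severity="warning"):
    """Collapse duplicate alerts and keep only those at or above min_severity.

    alerts: iterable of (source, metric, severity) tuples.
    Returns a list of (source, metric, severity, count), most urgent first,
    then by how often the alert repeated.
    """
    cutoff = SEVERITY_RANK[min_severity]
    counts = Counter(a for a in alerts if SEVERITY_RANK[a[2]] <= cutoff)
    rows = [(src, metric, sev, n) for (src, metric, sev), n in counts.items()]
    rows.sort(key=lambda r: (SEVERITY_RANK[r[2]], -r[3], r[0], r[1]))
    return rows


# Made-up sample alerts for illustration only.
raw_alerts = [
    ("cluster-a", "datastore_free_space", "critical"),
    ("esx01", "cpu_ready", "warning"),
    ("esx01", "cpu_ready", "warning"),
    ("vm-web-17", "snapshot_age", "info"),
    ("cluster-a", "datastore_free_space", "critical"),
    ("esx02", "memory_ballooning", "warning"),
    ("esx01", "cpu_ready", "warning"),
]

digest = summarize_alerts(raw_alerts)
for source, metric, severity, count in digest:
    print(f"{severity.upper():8} {source:10} {metric} (x{count})")
```

Seven raw alerts become three lines here, with the critical one on top. The same idea scales to the 5000-email case: one short digest that someone will actually read.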

6. Capacity Management

Building too much capacity or not enough can be costly. The trick here is to build a set, known amount of capacity and then allow that capacity to be consumed. Then, on a set schedule, build the capacity back up. Trying to build capacity “Just in Time” will always keep you behind the curve and reactive, whereas building a pool of capacity is more like having “Capacity as a Service” and lets you control when you build and how much you need to rebuild. Both work; one is just reactive and the other is proactive. This item is a huge topic of its own and I will try to cover it another day.
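The pool approach comes down to simple arithmetic: build a known amount, track how fast it is consumed, and top it back up on a schedule. Here is a minimal sketch with made-up numbers. The function name, the pool size, and the constant consumption rate are assumptions for illustration; real planning would use measured demand.

```python
def simulate_pool(pool_size, weekly_demand, rebuild_every, weeks):
    """Simulate a capacity pool that is refilled on a fixed schedule.

    pool_size: VM slots available right after a rebuild.
    weekly_demand: VM slots consumed per week (assumed constant here).
    rebuild_every: rebuild the pool back to pool_size every N weeks.
    Returns the free slots at the end of each week; a negative value
    means demand outran the pool before the next scheduled rebuild.
    """
    free = pool_size
    history = []
    for week in range(1, weeks + 1):
        free -= weekly_demand
        history.append(free)
        if week % rebuild_every == 0:
            free = pool_size  # scheduled rebuild restores the pool
    return history


# Made-up numbers: 120 slots, 10 consumed per week, rebuilt every 8 weeks.
history = simulate_pool(pool_size=120, weekly_demand=10, rebuild_every=8, weeks=12)
print(history)
print("lowest free slots before a rebuild:", min(history))
```

If the lowest value stays comfortably above zero, the rebuild schedule is keeping you proactive. If it dips below zero, either the pool is too small or the rebuild interval is too long.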


To summarize, Managing Virtual Infrastructure 101 is about managing virtual infrastructure from a broad scope with a planned set of practices, rather than as an ever-evolving, dysfunctional sprawl of hardware, storage, network, VMs, and finally guest OSes thrown together as required.

Learn to see the big picture and avoid anything temporary because once it’s in production it will not be easy to undo.

Note: this is for large environments with multiple team members; smaller shops may have one or two people doing everything from landing hardware to installing networks. The two are different and have their own sets of challenges. This blog focuses on the larger shop with dedicated teams.
