Managing Virtual Infrastructure 101


To begin, this post has nothing to do with the nuts and bolts of installing, configuring or supporting VMware or any other virtual infrastructure.

It’s about managing virtual infrastructure and building a solid team of admins and engineers to keep it alive.

Back in 2009, I wrote 10 Biggies to Help Managers and Admins Avoid Virtualization Pit-Falls and since then not much has gotten better or changed. Managing virtual infrastructure is not for the faint of heart. It takes creativity, determination and a kick-ass team…

For the IT manager who has just been handed a virtual infrastructure to manage, it is a great opportunity. Once the honeymoon is over, though, it is a lot of work, especially if you are taking over after there were problems…

6 Keys to Managing Virtual Infrastructure Like a Pro!

1. Detailed Audit

Start with a detailed audit of all the hardware and software the VI is built from (also known as currency). I mean what the ESX/ESXi hosts are running on (server make, model, warranty dates), the storage provisioned to them (make, model, design), how the network is designed (1/10G, fiber, flex, Xsigo, FCoE, copper, bandwidth, firewalls) and the versions of ESX/ESXi and vCenter. This is part of the “Big” picture!
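To make the audit concrete, here is a minimal sketch of how the results could be kept as structured records instead of a loose spreadsheet. The record fields, host names, vendors, versions and the 180-day warranty window are all made up for illustration; use whatever your own audit actually collects.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class HostRecord:
    """One audited ESX/ESXi host. Field names are illustrative assumptions."""
    name: str
    make: str
    model: str
    esx_version: str
    warranty_end: date


def warranty_expiring(hosts, today, within_days=180):
    """Return hosts whose warranty has lapsed or lapses within the window."""
    cutoff = today + timedelta(days=within_days)
    return [h for h in hosts if h.warranty_end <= cutoff]


def versions_in_use(hosts):
    """Count hosts per ESX version to show how mixed the estate is."""
    counts = {}
    for h in hosts:
        counts[h.esx_version] = counts.get(h.esx_version, 0) + 1
    return counts


# Hypothetical sample data, not taken from any real environment.
hosts = [
    HostRecord("esx01", "VendorA", "ModelX", "4.1", date(2012, 3, 1)),
    HostRecord("esx02", "VendorA", "ModelX", "4.1", date(2014, 6, 1)),
    HostRecord("esx03", "VendorB", "ModelY", "5.0", date(2012, 1, 15)),
]

expiring = warranty_expiring(hosts, today=date(2011, 12, 1))
print("Warranty attention needed:", [h.name for h in expiring])
print("Hosts per ESX version:", versions_in_use(hosts))
```

Even a small structure like this answers the first two questions a new VI manager gets asked: what is about to fall out of support, and how many different versions are we really running?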

2. Health Check

While collecting all the information for your audit you’ll also need to find out what’s going on: what’s working and what’s not. You may have multiple data centers with different cluster designs that perform differently because of how they were set up. They may be using the same hardware and ESX version, but because of the way networking or storage is set up they behave totally differently. This is another part of the “Big” picture.

Note: If you have the budget, VMware provides a Professional Service to do this.

3. Team Dynamics

Not every System Admin is cut out to be a virtualization engineer, because there’s more to building a solid virtual infrastructure than building servers. In my opinion, a virtualization engineer is someone who knows storage, networks, security, hardware, Windows/Linux, and scripting. Check out my blog on How to Apply for a VMware Engineer Job for what to look for when staffing your team.

A good team has a lead designer who knows the environment and can define clear requirements, and installers who can install and configure ESX/ESXi and vCenter according to the design. These two roles can be the same person and focus mainly on the back end. Then there are the front-end guys who handle the day-to-day operations of creating templates, provisioning VMs, customizing VMs, P2Vs, and patching. They should also handle installing and updating VMware Tools. (OS management and support should be handled by SysAdmins.)

Most importantly, you’ll need directors or executive leadership above you who will let you do the job right and who understand that virtual infrastructure is not server technology, it’s infrastructure.

Good team dynamics provide a wide set of job roles to handle virtual infrastructure as a convergence of technologies, which should be treated as a data center, not as a server. Far too often VI is treated like servers, and that causes issues for the VI manager and the team.

4. Road Map and Remediation

Once you have your audit, health check and team, you’ll need a remediation strategy and a road map for future growth. My suggestion for remediation is to standardize all your hardware, ESX installs, and storage and network designs. Try to keep your building blocks to standard configurations.

Avoid letting every Tom, Dick and Mary deviate from Best Practices because (s)he wants to try something new, even if they are the smartest on your team. New things should be planned, tested, and then added to the road map, not just thrown into production ad hoc.
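One way to keep deviations visible is to write the standard building block down as data and compare every cluster against it. The sketch below is only an illustration: the setting names, values and cluster names are hypothetical, and in practice the cluster settings would come from your audit rather than being typed in by hand.

```python
# The agreed standard building block (illustrative settings only).
STANDARD = {"esx_version": "5.0", "nic_speed_gb": 10, "storage": "FC"}


def find_drift(standard, cluster):
    """Return {setting: (expected, actual)} for every setting that differs."""
    return {
        key: (expected, cluster.get(key))
        for key, expected in standard.items()
        if cluster.get(key) != expected
    }


# Hypothetical clusters; dc2-prod was built differently from the standard.
clusters = {
    "dc1-prod": {"esx_version": "5.0", "nic_speed_gb": 10, "storage": "FC"},
    "dc2-prod": {"esx_version": "5.0", "nic_speed_gb": 1, "storage": "NFS"},
}

for name, config in clusters.items():
    drift = find_drift(STANDARD, config)
    print(name, "matches the standard" if not drift else f"deviates: {drift}")
```

Anything that shows up as a deviation is either a remediation item or a planned change that belongs on the road map.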

I’d bet a dollar that most completed health checks and audits will turn up a slew of assorted storage, hardware and configurations from someone’s test or POC environment that became production and then got forgotten. I’ve done and been involved in enough POCs in the last 15 years to know how quickly someone’s POC can turn into production.

Above I mentioned a road map. In a nutshell, a road map is a sort of life-cycle management plan to remove old stuff and update to new stuff. As I stated, you will need to be determined to stay on track with your road map. Don’t be too aggressive here, because as environments grow it takes a long time to properly phase the old out and the new in. Not to mention you will be juggling this with everything else going on. One more thing: don’t forget to celebrate these accomplishments and report them to your boss. Achieving a milestone is important news.

5. Performance and Health Monitoring (Check Engine Lights)


I like to think of problem areas as check engine lights. Do not ignore check engine lights or your VMs will crash. Correction: most likely lots of VMs will crash at the same time and services will be offline until the issues are fixed. Then the blame game starts and someone will say to you, “I knew about that problem but I didn’t have time to fix it because I was too busy supporting projects.” At this point you will go crazy!

Whether you use 3rd party tools or custom scripts for monitoring performance and health, a good monitoring strategy is an absolute must-have. Nothing is worse for a VI manager than calls at 2AM and support tickets related to poor VM performance. You and your team need to stay ahead of problems or else they will spread like fire and impact all the VMs running in your vDC (virtual data center).

The problem I’ve found is that most tools try to do too much and are normally bad at reporting what is important. Alerts about every VM, vDisk, vCPU, memory, ESX host or cluster become information overload and soon get ignored (out of the 5000 emails sent, which one is important, right?). Not to mention how much work it takes to customize some tools to work right or give you what you need. Also, a lot of tools will require a full-time resource from your team to focus on fixing all the issues they find, and believe me, even the healthiest environment will generate loads of alerts.
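If you go the custom-script route, the overload problem can be reduced by collapsing repeats and dropping low-severity noise before anything reaches an inbox. This is a rough sketch, not how any particular monitoring product works. The alert format, severity levels and sample alerts are my own assumptions.

```python
from collections import Counter

# Assumed severity scale; higher number means more urgent.
SEVERITY = {"info": 0, "warning": 1, "critical": 2}


def triage(alerts, min_severity="warning"):
    """Collapse duplicate alerts and drop those below min_severity.

    Returns a list of (severity, source, message, count), most severe first,
    then most frequent first.
    """
    threshold = SEVERITY[min_severity]
    counts = Counter(
        (a["severity"], a["source"], a["message"])
        for a in alerts
        if SEVERITY[a["severity"]] >= threshold
    )
    return sorted(
        ((sev, src, msg, n) for (sev, src, msg), n in counts.items()),
        key=lambda row: (-SEVERITY[row[0]], -row[3], row[1]),
    )


# Hypothetical raw alert feed.
alerts = [
    {"severity": "info", "source": "vm-101", "message": "snapshot created"},
    {"severity": "warning", "source": "esx02", "message": "memory usage high"},
    {"severity": "critical", "source": "datastore-7", "message": "free space low"},
    {"severity": "warning", "source": "esx02", "message": "memory usage high"},
    {"severity": "critical", "source": "datastore-7", "message": "free space low"},
    {"severity": "critical", "source": "datastore-7", "message": "free space low"},
]

summary = triage(alerts)
for severity, source, message, count in summary:
    print(f"{severity.upper():8} {source:12} {message} (x{count})")
```

Six raw alerts become two lines, with the datastore problem on top. The same idea applies whether the feed is 6 alerts or 5000 emails.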

6. Capacity Management

Building too much capacity or not enough can be costly. The trick here is to build a set, known amount of capacity and then allow it to be consumed. Then, on a set schedule, rebuild the capacity back. Trying to build capacity “Just in Time” will always keep you behind the curve and reactive. Building a pool of capacity, however, is more like having “Capacity as a Service” and lets you control when you build and how much you need to rebuild. Both work; one is just reactive and the other is proactive. This item is a huge topic of its own and I will try to cover it another day.
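Here is a toy model of the pool approach, just to show the arithmetic. Capacity is counted in “VM slots”, a unit I am making up for the example, and the weekly demand figures and the four-week rebuild schedule are invented too. A negative number means requests had to wait for the next rebuild.

```python
def simulate_pool(pool_size, weekly_demand, rebuild_every):
    """Track free capacity week by week for a pool rebuilt on a schedule.

    Returns the free capacity left at the end of each week, measured
    before any scheduled rebuild tops the pool back up to pool_size.
    """
    free = pool_size
    low_points = []
    for week, demand in enumerate(weekly_demand, start=1):
        free -= demand
        low_points.append(free)
        if week % rebuild_every == 0:
            free = pool_size  # scheduled rebuild back to the known amount
    return low_points


# Hypothetical demand in VM slots per week.
demand = [10, 15, 5, 20, 10, 25, 15, 10]

right_sized = simulate_pool(pool_size=60, weekly_demand=demand, rebuild_every=4)
too_small = simulate_pool(pool_size=40, weekly_demand=demand, rebuild_every=4)

print("Pool of 60:", right_sized, "lowest point:", min(right_sized))
print("Pool of 40:", too_small, "lowest point:", min(too_small))
```

With these numbers a pool of 60 never runs dry between rebuilds, while a pool of 40 goes negative twice. Running your own history through something like this is a cheap way to pick the pool size and the rebuild schedule.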

Summary

To summarize, Managing Virtual Infrastructure 101 is about managing virtual infrastructure from a broad scope with a planned set of practices, rather than as an ever-evolving, dysfunctional sprawl of hardware, storage, network, VMs, and guest OSes thrown together as required.

Learn to see the big picture and avoid anything temporary because once it’s in production it will not be easy to undo.

Note: this is for large environments with multiple team members; smaller shops may have one or two people doing everything from landing hardware to installing networks. The two are different and will have their own set of challenges. This blog focuses on the larger shop with dedicated teams.
