Have Zombies and Bullies Overrun Your Cloud Capacity?

A Lesson on Cloud Resource Management

Isn’t it interesting how technology mimics life (and, in this case, the afterlife)?

If you’re like me, you probably get 10 invites or spam emails a week about new storage products or some new software that will help you manage your cloud better. Obviously marketers know we have cloud capacity and operations problems, or they wouldn’t be sending us these messages so often.

Managing a cloud isn’t easy, even with tools, and the larger the cloud, the more likely it is that zombies and bullies will become a problem.

Are you wondering what I’m talking about? Well, hang in there and I’ll explain.

First, let’s discuss what a zombie is.

In the movies, zombies are normally flesh-eating dead people that really have no purpose except to feed on you. The zombies I’m talking about are virtual machines (servers or desktops) that are left running “unused” in your cloud, feeding on cloud resources.

For example:

Maybe once upon a time there was a project that required building a bunch of servers for a new product, and 50 VMs were spun up, each with 2 vCPUs, 4 GB of memory, and 40 GB of disk space. All of the VMs were joined to the network, DNS names were reserved, and firewall rules and a VIP were created.

Then, as things normally happen in IT, the plan changed or the project was cancelled and everyone went in new directions. Unfortunately, nobody on the infrastructure team was asked to decommission these project VMs, so today (5 weeks, 5 months, or a year later) they are still running and using up valuable resources: 100 vCPUs, 200 GB of memory, 2 TB of storage, and 50+ IP addresses.
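To make the math concrete, here is a quick tally of that example. It is only a sketch: the `VmSpec` class and `fleet_footprint` function are names I made up for illustration, and it counts one IP address per VM (the VIP would be extra).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmSpec:
    """Per-VM sizing from the example project (hypothetical helper type)."""
    vcpus: int
    memory_gb: int
    disk_gb: int


def fleet_footprint(spec: VmSpec, count: int) -> dict:
    """Total resources held by `count` identical VMs."""
    return {
        "vcpus": spec.vcpus * count,
        "memory_gb": spec.memory_gb * count,
        "disk_tb": spec.disk_gb * count / 1000,  # decimal TB
        "ip_addresses": count,                   # assumes one IP per VM
    }


footprint = fleet_footprint(VmSpec(vcpus=2, memory_gb=4, disk_gb=40), count=50)
print(footprint)
# {'vcpus': 100, 'memory_gb': 200, 'disk_tb': 2.0, 'ip_addresses': 50}
```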

In the example I just gave, the VMs were accidentally left running. Another case is when they are intentionally kept (just in case) because someone might decide to restart the project somewhere in the future. Really! Two years have passed and you still have 50 powered-off VMs in your vCenter or OpenStack that you keep around just in case.

These are Zombies feeding on your cloud!

Do you have “Zombie VMs” eating up expensive system resources and producing no value? Hunt them down and get rid of them. Left unchecked, this will happen over and over until a large percentage of your cloud is wasted.
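One way to start the hunt is to export usage numbers from whatever monitoring you already have and flag the VMs that look dead. The sketch below assumes you can get that data into simple records; the `VmUsage` fields, the thresholds, and the VM names are all hypothetical, and nothing here calls a real vCenter or OpenStack API. Treat the output as a list of candidates to ask the owners about, not a kill list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmUsage:
    """One row of exported monitoring data (hypothetical schema)."""
    name: str
    powered_on: bool
    avg_cpu_pct: float    # average CPU utilization over the review window
    avg_net_kbps: float   # average network throughput over the review window
    days_in_state: int    # days since the VM was created or last changed power state


def zombie_candidates(vms, cpu_pct=2.0, net_kbps=5.0, min_days=30):
    """Return names of VMs that have been powered off or idle for at least min_days."""
    candidates = []
    for vm in vms:
        if vm.days_in_state < min_days:
            continue  # too recent to judge, skip it
        idle = vm.avg_cpu_pct < cpu_pct and vm.avg_net_kbps < net_kbps
        if not vm.powered_on or idle:
            candidates.append(vm.name)
    return candidates


# Made-up inventory for illustration only.
inventory = [
    VmUsage("proj-web-01", True, 0.8, 1.2, 150),    # running but idle for months
    VmUsage("proj-web-02", False, 0.0, 0.0, 730),   # powered off for two years
    VmUsage("payroll-db", True, 35.0, 900.0, 400),  # busy, clearly in use
    VmUsage("new-build-07", True, 0.5, 0.3, 3),     # idle, but only 3 days old
]

print(zombie_candidates(inventory))
```

The thresholds are guesses; tune them to your environment, and check with the project owner before you delete anything.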

BTW, Amazon, Azure and Rackspace love charging you monthly for Zombies!

What are Bullies? You guessed it, VMs.

Bully VMs are very different from Zombie VMs, and most of the time they actually have a purpose other than to eat your flesh.

Here’s what we commonly know about bullies.

  • They generally pick on the smaller guy.
  • They take away lunch money.
  • And they make fun of people.

Well, in the cloud, bullies tend to do the same stuff but in a slightly different way. They take away all the system resources from smaller VMs sharing the same host, network, and storage.

For example:

Imagine you have an MSSQL DB VM sharing a host, storage, and network with other App VMs. The majority of the time everything is fine, but every “month end” this DB VM becomes a Raging Hulk and starts hammering the shared system resources: IOPS, CPU, and bandwidth.

The person on the front end running month-end reports isn’t aware this is happening, but other departments are feeling it because all of a sudden their application is sluggish, takes much longer to load, or is crashing.

There’s a Bully in your cloud!

In a public cloud, bullies are less of a problem (per se) because a provider such as AWS manages the underlying capacity and can usually absorb the additional demand so other VMs are not starved of resources. But in a private cloud this can go on for months before anyone realizes what is causing the slowness, especially if multiple bullies are invading at the same time.
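A simple way to spot a bully is to look at how much of a host’s total load one VM is responsible for at a given moment. The sketch below assumes you can pull per-VM IOPS samples for a single host from your monitoring tool; the `find_bullies` function, the 60% threshold, and the sample numbers are all made up for illustration. The same idea works for CPU or bandwidth.

```python
def find_bullies(host_samples, share_threshold=0.6):
    """Return VMs whose share of the host's total IOPS meets the threshold.

    host_samples: {vm_name: iops} for one host at one sampling interval.
    """
    total = sum(host_samples.values())
    if total == 0:
        return []  # nothing is running, so nobody is a bully
    return sorted(
        name
        for name, iops in host_samples.items()
        if iops / total >= share_threshold
    )


# Made-up samples: a normal day versus month end on the same host.
normal_day = {"mssql-db": 800, "app-01": 600, "app-02": 700}
month_end = {"mssql-db": 9000, "app-01": 500, "app-02": 500}

print(find_bullies(normal_day))
print(find_bullies(month_end))
```

On the normal day no VM crosses the threshold, so the list is empty. At month end the DB VM accounts for 90% of the host’s IOPS and gets flagged.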

Another common cloud capacity management problem is VM sprawl and you can find out more about it in my health check lesson.

Wrap up.

We covered zombie VMs and bully VMs. Both of these types of virtual machines impact the capacity and health of a cloud. And both need to be hunted down and dealt with before they cause untimely downtime and impact your server or application availability SLA.

A well-documented decommissioning process that people actually follow is the best way to minimize zombies, and a capacity strategy for managing workloads will help isolate bullies. VMware DRS and Storage DRS (SDRS) also help.

Thanks for your time and interest!
