Have Zombies and Bullies Overrun Your Cloud Capacity?

A Lesson on Cloud Resource Management

Isn’t it interesting how technology mimics life (and, in this case, the afterlife)?

If you’re like me, you probably get 10 invites or spam emails a week about new storage products or some new software that will help you manage your cloud better. Obviously marketers know we have cloud capacity and operations problems, or they wouldn’t be sending us these messages so often.

Managing a cloud isn’t easy, even with tools, and the larger the cloud, the more likely it is that zombies and bullies will become a problem.

Are you wondering what I’m talking about? Well, hang in there and I’ll explain.

First, let’s discuss what a zombie is.

In the movies, zombies are normally flesh-eating dead people that really have no purpose except to feed on you. The zombies I’m talking about are virtual machines (servers or desktops) that are left running, unused, in your cloud, feeding on cloud resources.

For example:

Maybe once upon a time there was a project that required building a bunch of servers for a new product, and 50 VMs were spun up, each with 2 vCPUs, 4 GB of memory, and 40 GB of disk space. All of the VMs were joined to the network, DNS entries were reserved, and firewall rules and a VIP were created.

Then, as normally happens in IT, the plan changed or the project was cancelled, and everyone went in new directions. Unfortunately, nobody on the infrastructure team was asked to decommission these project VMs, so today (5 weeks, 5 months, or a year later) they are still running and using up valuable resources: 100 vCPUs, 200 GB of memory, 2 TB of storage, and 50+ IP addresses.
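
To make the math in this example concrete, here is a minimal Python sketch that multiplies the per-VM spec by the number of VMs. The `VmSpec` class and `total_footprint` function are names I made up for illustration, and I'm assuming decimal terabytes and one IP address per VM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmSpec:
    """Per-VM sizing from the example project (hypothetical helper type)."""
    vcpus: int
    memory_gb: int
    disk_gb: int


def total_footprint(spec: VmSpec, count: int) -> dict:
    """Multiply a per-VM spec by the number of identical VMs."""
    return {
        "vcpus": spec.vcpus * count,
        "memory_gb": spec.memory_gb * count,
        "disk_tb": spec.disk_gb * count / 1000,  # assumption: decimal TB
        "ip_addresses": count,  # assumption: one IP per VM (VIP not counted)
    }


project_vm = VmSpec(vcpus=2, memory_gb=4, disk_gb=40)
footprint = total_footprint(project_vm, count=50)
print(footprint)
# {'vcpus': 100, 'memory_gb': 200, 'disk_tb': 2.0, 'ip_addresses': 50}
```

Plug in your own project sizes and the waste adds up quickly.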

In the example I just gave, the VMs were accidentally left running. Another example is when they are intentionally kept, just in case someone decides to restart the project somewhere in the future. Really! Two years have passed and you have 50 VMs powered off in your vCenter or OpenStack, still holding their disk space, that you keep around just in case.

These are Zombies feeding on your cloud!

Do you have “Zombie VMs” eating up expensive system resources and producing no value? Hunt them down and get rid of them. Left unchecked, this will happen over and over until a large percentage of your cloud is wasted.
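
One way to start the hunt is to run a simple filter over an inventory export. The sketch below is only an illustration: the `VmRecord` fields, the sample VM names, and the thresholds (2% average CPU, 90 days without a login) are all assumptions of mine, not values from any particular tool. Treat the result as a list of candidates to review with the VM owners, not as a delete list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmRecord:
    """One row of a hypothetical inventory export."""
    name: str
    powered_on: bool
    avg_cpu_pct_30d: float      # assumption: mean CPU use over the last 30 days
    days_since_last_login: int  # assumption: taken from your own audit data


def find_zombie_candidates(vms, cpu_threshold_pct=2.0, idle_days=90):
    """Return the names of VMs that look abandoned.

    A VM is a candidate if nobody has logged in for `idle_days` and it is
    either powered off or running with almost no CPU activity.
    """
    candidates = []
    for vm in vms:
        idle = vm.days_since_last_login >= idle_days
        quiet = (not vm.powered_on) or vm.avg_cpu_pct_30d < cpu_threshold_pct
        if idle and quiet:
            candidates.append(vm.name)
    return candidates


# Made-up sample data for illustration only.
inventory = [
    VmRecord("proj-web-01", True, 0.8, 210),
    VmRecord("proj-web-02", False, 0.0, 730),
    VmRecord("erp-db-01", True, 35.0, 1),
    VmRecord("batch-01", True, 1.0, 3),
]
zombies = find_zombie_candidates(inventory)
print(zombies)  # ['proj-web-01', 'proj-web-02']
```

Notice that `batch-01` is not flagged even though its CPU use is low, because someone logged in recently. Requiring both signals keeps quiet but legitimate VMs off the list.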

BTW, Amazon, Azure and Rackspace love charging you monthly for Zombies!

What are Bullies? You guessed it, VMs.

Bully VMs are very different from Zombie VMs, and most of the time they actually have a purpose other than to eat your flesh.

Here’s what we commonly know about bullies.

  • They generally pick on the smaller guy.
  • They take away lunch money.
  • And they make fun of people.

Well, in the cloud, bullies tend to do the same stuff but in a slightly different way. They take away all the system resources from smaller VMs sharing the same host, network, and storage.

For example:

Imagine you have an MSSQL DB VM sharing a host, storage, and network with other App VMs. The majority of the time everything is fine, but every “month end” this DB VM becomes a Raging Hulk and starts hammering all the system resources for IOPS, CPU, and bandwidth.

The person on the front end running month-end reports isn’t aware this is happening, but other departments are feeling it because all of a sudden their application is sluggish, takes much longer to load, or crashes.

There’s a Bully in your cloud!

In a public cloud, bullies aren’t a problem (per se) because the provider (AWS, for example) is expected to handle the additional demand so that other VMs are not starved of resources. But in a private cloud this can go on for months before anyone realizes what is causing the slowness, especially if there are multiple bullies invading at the same time.
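
If you already collect per-VM metrics, spotting a bully can be as simple as checking which VM takes an outsized share of a host's resources for several intervals in a row. Here is a rough Python sketch of that idea using IOPS. The `find_bullies` function, the sample numbers, and the thresholds (60% of host IOPS in at least 3 intervals) are my own assumptions for illustration; real monitoring tools expose this data in their own formats.

```python
from collections import defaultdict


def find_bullies(samples, share_threshold=0.6, min_intervals=3):
    """Flag VMs that take an outsized share of one host's IOPS.

    `samples` is a list of (interval, vm_name, iops) tuples for a single host.
    A VM is flagged when its share of the host total is at least
    `share_threshold` in at least `min_intervals` intervals.
    """
    # Group the readings by interval so each VM can be compared to its peers.
    per_interval = defaultdict(dict)
    for interval, vm, iops in samples:
        per_interval[interval][vm] = iops

    hits = defaultdict(int)
    for usage in per_interval.values():
        total = sum(usage.values())
        if total == 0:
            continue  # idle host, nothing to compare
        for vm, iops in usage.items():
            if iops / total >= share_threshold:
                hits[vm] += 1

    return sorted(vm for vm, count in hits.items() if count >= min_intervals)


# Made-up readings: interval 0 is a normal day, intervals 1-3 are month end.
samples = [
    (0, "mssql-db", 500), (0, "app-01", 400), (0, "app-02", 450),
    (1, "mssql-db", 9000), (1, "app-01", 300), (1, "app-02", 200),
    (2, "mssql-db", 9500), (2, "app-01", 250), (2, "app-02", 150),
    (3, "mssql-db", 8800), (3, "app-01", 280), (3, "app-02", 220),
]
bullies = find_bullies(samples)
print(bullies)  # ['mssql-db']
```

The same check works for CPU or network bandwidth if you swap in those readings. Requiring several intervals keeps a single short spike from being labeled a bully.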

Another common cloud capacity management problem is VM sprawl and you can find out more about it in my health check lesson.

Wrap up.

We covered zombie VMs and bully VMs. Both of these types of virtual machines impact the capacity and health of a cloud. And both need to be hunted down and dealt with before they cause untimely downtime and impact your server or application availability SLA.

A well-documented decommissioning process that people actually follow is the best way to minimize zombies, and a capacity strategy for managing workloads will help isolate bullies. DRS and SDRS also help.

Thanks for your time and interest!
