Virtual Machines Running or Registered on Multiple ESX Servers

Does your vCenter flicker when browsing clusters? There could be a problem…

Over the last 3 years I’ve seen this happen twice, and both times it was not good.

Both cases were caused by an interrupted HA event, which left multiple VMs registered on more than one host. Fortunately, the VM stays running and the fix does not cause an outage, but it is intimidating to have to “KILL” VM processes.

HA being HA might have caused it, and the KB below lists more causes along with the solution.

HA is a good thing to have enabled, but if your NOC is monitoring your VMs and sees an alert that VMs are powering off, its admins will log into the VIC and start powering them back on. No downtime, right? That’s one of the main causes of this problem, and VMware admins need to educate their NOC admins to let HA do its job and power the VMs back on.

Now, the devil’s advocate in me says that sounds good, but how does a NOC know it’s a VM, or a bunch of VMs? And don’t we tell them to treat VMs just like any other server? The devil’s advocate has a good question, and I will ask for help answering it. Can I get feedback on how to avoid this issue when an event happens that might cause “VM jumpers”?

Here’s a must-know process for every VMware admin on how to fix this problem…

VMware KB Link:

Here’s how VMware describes it:

  • After one of the following events, a virtual machine appears to be registered on two ESX Servers:
    • A VMotion fails to complete correctly or times out in VirtualCenter
    • A DRS issue where virtual machines are VMotioned automatically in quick succession
    • A virtual machine is powered on during a VMware HA failover
    • The Service Console on an ESX host is low on memory, starving the vpxa process
  • In VirtualCenter, you see the virtual machine appear on one ESX Server for a few seconds and then on the other.
    The virtual machine may appear to jump back and forth among different ESX hosts.
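The “jumping” symptom is something a monitoring script could flag, which touches on the NOC question above. The following is a minimal sketch and not part of the VMware KB: it assumes you already collect hypothetical (timestamp, VM name, host name) samples each time the inventory is polled, and it counts how often each VM’s reported host changes between consecutive samples. The default threshold of two switches is an arbitrary choice meant to skip a single legitimate VMotion; you would likely need to tune it for clusters where DRS moves VMs often.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def find_jumpers(samples: Iterable[Tuple[float, str, str]],
                 min_switches: int = 2) -> Dict[str, int]:
    """Return {vm_name: switch_count} for VMs whose reported host changed
    at least min_switches times between consecutive polled samples.

    samples: hypothetical (timestamp, vm_name, host_name) tuples gathered by
    whatever polling you already run; the data source is an assumption.
    """
    # Group (timestamp, host) observations per VM.
    by_vm = defaultdict(list)
    for ts, vm, host in samples:
        by_vm[vm].append((ts, host))

    jumpers = {}
    for vm, obs in by_vm.items():
        # Order each VM's observations by timestamp only.
        obs.sort(key=lambda o: o[0])
        # Count how often the host differs from the previous sample.
        switches = sum(1 for (_, prev), (_, cur) in zip(obs, obs[1:])
                       if prev != cur)
        if switches >= min_switches:
            jumpers[vm] = switches
    return jumpers


if __name__ == "__main__":
    # Hypothetical samples: web01 flips between hosts, app01 moves once.
    samples = [
        (0, "web01", "esx01"), (5, "web01", "esx02"),
        (10, "web01", "esx01"), (15, "web01", "esx02"),
        (0, "app01", "esx01"), (5, "app01", "esx02"), (10, "app01", "esx02"),
        (0, "db01", "esx01"), (5, "db01", "esx01"),
    ]
    print(find_jumpers(samples))  # {'web01': 3}
```

A VM that moved once (app01 here) is not reported, which is the point of the threshold; lowering `min_switches` to 1 would flag every migration.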
To correct this misconfiguration:
  1. Click Inventory in the navigation bar. Expand the inventory as needed and click the appropriate managed host.
  2. Click the Virtual Machines tab.
  3. Note the virtual machine that disappears every few seconds.
  4. Log in as root with SSH to both affected ESX hosts.
  5. Run the vmware-cmd -l command to display the names of the virtual machines registered on this host.
    Run the vm-support -x command to show which virtual machines are currently running on the ESX host.

    Compare the results of these commands to determine which ESX host has the virtual machine registered but is not running it. Once you know which host that is, unregister the virtual machine from it.

  6. Run the following command to unregister the virtual machine from the ESX host:
    vmware-cmd -s unregister <path_to_vm>.vmx
  7. If the virtual machine has a process (PID) associated with it, ESX may not allow you to unregister it, and the command fails with the error:

    VMControl error -999: Unknown error: SoapError: ServerFaultCode(0): (The attempted operation cannot be performed in the current state (Powered On).)

    If you see this error and are unable to unregister the virtual machine:
    • Kill the process for the virtual machine in the Service Console with the following two commands:
      • ps -auwwwxx | grep -i <VM name>
      • kill -9 <PID of the process returned from the above command>
    • Unregister the virtual machine from the ESX host again with the command:
      vmware-cmd -s unregister <path_to_vm>.vmx
    • Run the following command to stop the hostd process:
      service mgmt-vmware stop
    • Use a text editor to open the /etc/vmware/hostd/vmInventory.xml file.
    • Locate the entry for the virtual machine you want to remove.
    • Remove all of the information between the <ConfigEntry> tags for the affected virtual machine.
    • Run the following command to start the hostd process:
      service mgmt-vmware start
  8. Log in to all of your ESX Servers directly using VI Client.

    You see the virtual machine on both ESX hosts with a Powered-on status. One host, however, does not display any details (VMware Tools, IP address, etc.) in the Summary tab.

  9. Click the virtual machine on the host that does not display any details in the Summary tab.
  10. Right-click the virtual machine, and click Power Off.
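The comparison in step 5 boils down to set arithmetic: a VM registered on more than one host should be unregistered from the host(s) where it is registered but not running. The sketch below is a hypothetical helper, not a VMware tool. It assumes `vmware-cmd -l` prints one .vmx path per line, and that you have already mapped the entries from `vm-support -x` to .vmx paths by hand, because that output is not parsed here. The datastore paths in the example are made up.

```python
from collections import Counter
from typing import Dict, Set, Tuple


def parse_registered(vmware_cmd_output: str) -> Set[str]:
    """Collect .vmx paths from `vmware-cmd -l` output.

    Assumption: the command prints one .vmx path per line.
    """
    return {line.strip() for line in vmware_cmd_output.splitlines()
            if line.strip().endswith(".vmx")}


def stale_registrations(
        hosts: Dict[str, Tuple[Set[str], Set[str]]]) -> Dict[str, Set[str]]:
    """Map each host to the VMs it has registered but is NOT running,
    considering only VMs that are registered on more than one host.

    hosts: host name -> (registered .vmx paths, running .vmx paths)
    """
    # How many hosts claim each .vmx path?
    counts = Counter(path for registered, _ in hosts.values()
                     for path in registered)
    duplicated = {path for path, n in counts.items() if n > 1}

    stale = {}
    for host, (registered, running) in hosts.items():
        # Duplicated registrations this host is not actually running.
        candidates = (registered & duplicated) - running
        if candidates:
            stale[host] = candidates
    return stale


if __name__ == "__main__":
    esx01_list = ("/vmfs/volumes/ds1/web01/web01.vmx\n"
                  "/vmfs/volumes/ds1/db01/db01.vmx\n")
    esx02_list = "/vmfs/volumes/ds1/web01/web01.vmx\n"
    hosts = {
        "esx01": (parse_registered(esx01_list),
                  {"/vmfs/volumes/ds1/db01/db01.vmx"}),
        "esx02": (parse_registered(esx02_list),
                  {"/vmfs/volumes/ds1/web01/web01.vmx"}),
    }
    print(stale_registrations(hosts))
    # {'esx01': {'/vmfs/volumes/ds1/web01/web01.vmx'}}
```

Here web01 is registered on both hosts but only running on esx02, so esx01 is the host to unregister it from in step 6. Restricting the check to duplicated paths keeps ordinary powered-off VMs out of the result.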
Note: VMware recommends restarting the mgmt-vmware and vmware-vpxa processes on any hosts on which you have changed registered machines from the command line. For more information, see Restarting the Management agents on an ESX Server (1003490).
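The manual vmInventory.xml edit in step 7 can also be scripted against a copy of the file. This is a minimal sketch under stated assumptions: it assumes the file has a root element containing one <ConfigEntry> per VM, each with a <vmxCfgPath> child holding the .vmx path (the element names other than ConfigEntry are assumptions, so check them against your own file first). It reads the KB’s “remove the information between the <ConfigEntry> tags” as dropping that whole entry.

```python
import xml.etree.ElementTree as ET
from typing import Tuple


def remove_config_entry(xml_text: str, vmx_path: str) -> Tuple[str, int]:
    """Drop every <ConfigEntry> whose <vmxCfgPath> equals vmx_path.

    Assumption: entries look like
    <ConfigEntry id="..."><objID>..</objID><vmxCfgPath>..</vmxCfgPath></ConfigEntry>
    Returns the rewritten XML text and the number of entries removed.
    """
    root = ET.fromstring(xml_text)
    removed = 0
    # Iterate over a snapshot of the matches because we remove while looping.
    for entry in list(root.findall("ConfigEntry")):
        if entry.findtext("vmxCfgPath", default="").strip() == vmx_path:
            root.remove(entry)
            removed += 1
    return ET.tostring(root, encoding="unicode"), removed


if __name__ == "__main__":
    # Hypothetical inventory with two VMs.
    sample = """<ConfigRoot>
  <ConfigEntry id="0000">
    <objID>16</objID>
    <vmxCfgPath>/vmfs/volumes/ds1/web01/web01.vmx</vmxCfgPath>
  </ConfigEntry>
  <ConfigEntry id="0001">
    <objID>32</objID>
    <vmxCfgPath>/vmfs/volumes/ds1/db01/db01.vmx</vmxCfgPath>
  </ConfigEntry>
</ConfigRoot>"""
    new_xml, count = remove_config_entry(sample, "/vmfs/volumes/ds1/web01/web01.vmx")
    print(count)  # 1
    print(new_xml)
```

Run something like this only while mgmt-vmware is stopped and after backing up the original file. ElementTree does not keep an XML declaration or comments, so diff the result against the backup before copying it into place.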
