I was doing my daily round on the VMTN Forums and noticed this topic on VMs flickering between ESX hosts. I’ve personally never witnessed this and didn’t even know it was a known issue. Luckily Troy Clavell pointed the topic starter to a KB article covering this exact issue. Apparently it is caused by the VM being registered on two hosts at the same time.
Symptoms:
- After one of the following, a Virtual Machine appears as being registered on two ESX Servers:
  - A VMotion fails to complete correctly or times out in VirtualCenter
  - A DRS issue where virtual machines are VMotioned automatically in quick succession
  - When a machine is powered on during VMware HA failover.
  - The Service Console on an ESX host is low on memory starving the vpxa process
- In VirtualCenter, you see the virtual machine as appearing on one ESX Server for a few seconds, then it seems to be on the other.
- The virtual machine may appear to jump back and forth among different ESX hosts.
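Since the root cause is a VM being registered on two hosts at once, a quick way to confirm it is to compare the list of registered VMs per host. The snippet below is only a minimal sketch of that comparison: the host names, the .vmx paths and the `find_duplicate_registrations` helper are all made up for illustration, and it assumes you have already collected the registered VM paths for each host yourself (for example from the output of `vmware-cmd -l` on each Service Console).

```python
from collections import defaultdict


def find_duplicate_registrations(inventory):
    """Return {vmx_path: [hosts]} for every VM registered on more than one host.

    `inventory` is a hypothetical mapping of host name -> list of registered
    .vmx paths, gathered by whatever means you prefer.
    """
    hosts_by_vm = defaultdict(set)
    for host, vmx_paths in inventory.items():
        for vmx_path in vmx_paths:
            hosts_by_vm[vmx_path].add(host)
    # Keep only the VMs that show up on two or more hosts.
    return {
        vmx_path: sorted(hosts)
        for vmx_path, hosts in hosts_by_vm.items()
        if len(hosts) > 1
    }


# Example data only: these hosts and paths do not come from the KB article.
inventory = {
    "esx01": [
        "/vmfs/volumes/san01/web01/web01.vmx",
        "/vmfs/volumes/san01/db01/db01.vmx",
    ],
    "esx02": [
        "/vmfs/volumes/san01/web01/web01.vmx",
    ],
}

duplicates = find_duplicate_registrations(inventory)
for vmx_path, hosts in duplicates.items():
    print(f"{vmx_path} is registered on: {', '.join(hosts)}")
```

This only tells you which VMs are registered twice; how to clean up the stale registration is what the KB article is for.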
I’m not going to copy/paste the solution because the KB article will probably change over time, but it’s most definitely worth looking into… it does sound like something that can happen to all of us.
latoga says
One of my clients witnessed this as a recurring event last year. In the end, one of the main culprits was the fact that their servers and storage were not all kept at consistent version/patch levels. Once they got all their ESX hosts and SAN frames patched to the most recent patch level, the issue went away.
The lesson to be learned is that more work with fewer people is still no reason to get lax on basic sysadmin housekeeping. It can take much longer to track down small items like this than to just do the work and keep your systems patched at the same current level.
Troy Clavell says
Thanks for the mention Duncan! I read your blog daily and am honored to get a mention
Troy Clavell says
FYI… your links to the KB are broken
adsouthpaw says
I noticed that one of the causes is listed as “The Service Console on an ESX host is low on memory starving the vpxa process.” This made me think about a recommendation made during a recent support incident to increase Service Console memory to 800MB whenever using IP-based storage. Is this a widely published recommendation that we missed?
Duncan says
It’s a best practice to always up the service console to 800MB in my opinion. No matter what kind of environment you are running!
alak says
Duncan, I think this can happen during a host isolation situation, where several hosts try to power on the VMs on the “isolated” host. I have two questions for you:
1) Is there any way we can tweak the behavior of HA during host isolation events, perhaps so that it does nothing but send a trap/email to the admin?
2) Also, if having HA try to power on the VMs is the only option, can we tune HA so that it gives up trying to power on the VMs after a set number of tries?
Thanks,
Alak
Rudolf says
Hi Duncan,
I recently registered a service call with VMware regarding the same problem. They pointed me to the following KB article: http://kb.vmware.com/kb/1006936.