Today on the community forums someone mentioned he had shut down his host and expected vSphere HA to restart his virtual machines. For whatever reason he had ended up in a situation where all of his VMs were still running but he could no longer do much with them, so he wanted to kill the host so that HA could safely restart the virtual machines. However, when he shut down his host nothing happened: the VMs remained powered off. Why did this happen?
I had seen this before, but it never really sank in until I saw the question from this customer. I figured I would test it just to see what happened and whether I could spot a difference in the vSphere HA logs. I powered on a VM on one of my hosts and moved off all other VMs. I then went to the DCUI of the host and initiated a “shutdown” using F12. I tailed the FDM log on one of my hosts and spotted the following log message:
2014-04-04T11:41:54.882Z [688C2B70 info 'Invt' opID=SWI-24c018b] [VmStateChange::SavePowerChange] vm /vmfs/volumes/4ece24c4-3f1ca80e-9cd8-984be1047b14/New Virtual Machine/New Virtual Machine.vmx curPwrState=unknown curPowerOnCount=0 newPwrState=powered off clnPwrOff=true hostReporting=host-113
In the above scenario the virtual machine was not restarted even though the host was shut down. I did the exact same exercise again, only this time I initiated the shutdown from the vCenter Web Client. After I witnessed the VM being restarted I also noticed a difference in the FDM log:
2014-04-04T12:12:06.515Z [68040B70 info 'Invt' opID=SWI-1aad525b] [VmStateChange::SavePowerChange] vm /vmfs/volumes/4ece24c4-3f1ca80e-9cd8-984be1047b14/New Virtual Machine/New Virtual Machine.vmx curPwrState=unknown curPowerOnCount=0 newPwrState=powered on clnPwrOff=false hostReporting=host-113
The difference is the power-off state that is reported by vSphere HA. In the first scenario the virtual machine is marked as “clnPwrOff=true”, which basically tells vSphere HA that an administrator has powered off the virtual machine. This is what happened when “shutdown” was initiated through the DCUI, and hence no restart took place. (It seems that ESXi initiates a shutdown of all running virtual machines.) In the second scenario vSphere HA reported that the VM was not cleanly powered off (“clnPwrOff=false”), and as such it restarted the virtual machine, as it assumed something bad had happened to it.
So what did we learn? If you, for whatever reason, want vSphere HA to restart the virtual machines that are currently running on a host that you want to shut down, make sure that you use the vCenter Web Client instead of the DCUI!
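If you want to check your own FDM log for the same flag, the comparison above can be scripted. The following is a minimal sketch in Python, not an official tool: the regular expression is inferred only from the two log lines quoted in this post, and the names parse_power_change, DCUI_LINE and WEB_CLIENT_LINE are made up for illustration. Other builds may format these lines differently.

```python
import re

# Field layout inferred from the two SavePowerChange lines quoted in this
# post. This is an assumption, not a documented FDM log format.
POWER_CHANGE = re.compile(
    r"\[VmStateChange::SavePowerChange\] vm (?P<vmx>.+?\.vmx) "
    r".*?newPwrState=(?P<new_state>.+?) "
    r"clnPwrOff=(?P<clean>true|false) "
    r"hostReporting=(?P<host>\S+)"
)


def parse_power_change(line):
    """Return the power-change fields of an FDM log line, or None if the
    line is not a SavePowerChange entry in the expected layout."""
    match = POWER_CHANGE.search(line)
    if match is None:
        return None
    return {
        "vmx": match.group("vmx"),
        "new_state": match.group("new_state"),
        # True means HA was told the VM was powered off cleanly.
        "clean_power_off": match.group("clean") == "true",
        "host": match.group("host"),
    }


# The two log lines from this post (DCUI shutdown, then Web Client shutdown).
DCUI_LINE = (
    "2014-04-04T11:41:54.882Z [688C2B70 info 'Invt' opID=SWI-24c018b] "
    "[VmStateChange::SavePowerChange] vm /vmfs/volumes/"
    "4ece24c4-3f1ca80e-9cd8-984be1047b14/New Virtual Machine/"
    "New Virtual Machine.vmx curPwrState=unknown curPowerOnCount=0 "
    "newPwrState=powered off clnPwrOff=true hostReporting=host-113"
)
WEB_CLIENT_LINE = (
    "2014-04-04T12:12:06.515Z [68040B70 info 'Invt' opID=SWI-1aad525b] "
    "[VmStateChange::SavePowerChange] vm /vmfs/volumes/"
    "4ece24c4-3f1ca80e-9cd8-984be1047b14/New Virtual Machine/"
    "New Virtual Machine.vmx curPwrState=unknown curPowerOnCount=0 "
    "newPwrState=powered on clnPwrOff=false hostReporting=host-113"
)

if __name__ == "__main__":
    for label, line in (("DCUI", DCUI_LINE), ("Web Client", WEB_CLIENT_LINE)):
        info = parse_power_change(line)
        print(label, "clean power-off:", info["clean_power_off"])
```

In my test, the entry with a clean power-off was the one where HA did not restart the VM, so filtering a log for entries where clean_power_off is True should point at the VMs HA will leave alone.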
Disclaimer: my tests were conducted using vSphere 5.5 Update 1. I believe that at some point in the past “shutdown” via the DCUI would also allow HA to restart the VMs. I am now investigating why this has changed and when. When I find out I will update this post.
raphael schitz (@hypervisor_fr) says
Hi Duncan, very interesting behaviour. Could that be related to the guest shutdown call during ESXi shutdown?
Duncan Epping says
Yeah I guess so, but it is strange that there is a difference in behavior for these two.
Thanks for sharing, Duncan. It’s not clear to me why shutting down a host (via the DCUI or the Web Client) should cause the VMs to restart. I think the right way is simply to vMotion them instead of restarting them.
I typically resort to powering off the host via out of band (CIMC, DRAC, iLO, etc.). HA tends to respond immediately then.
Rob Taylor says
I would be curious to know if VMs without VMware Tools and without any ACPI support would restart, since I’m guessing that ESXi wouldn’t be able to get those VMs to shut down cleanly.
Nah it isn’t about “guest initiated” or not. This could simply be a “power off VM” task, which it appears to be.
I have run into this before. One of my ESXi hosts lost communication with my vCenter server. Basically I needed to restart the management services, but could not, as both the local and remote shell were disabled. My only option was to F12 restart it through the DCUI. After doing this, the guests were not restarted on other hosts as I would have assumed. This was on ESXi 5.0, probably Update 1.
In response to Lee’s note above, this had to be performed through iLO and Cleriston, I did not have the option to vMotion due to the failure of management services.
Master or slave host? There is a known issue with master host vCenter shutdown in 5.5.
It has got nothing to do with that. This is a change of behavior in ESXi.