I received this question and figured I would write a quick post about it, as it comes up occasionally. Why does vSphere HA not power on VMs when a cluster is brought back online after a full cluster shutdown? In this case, the customer had a power outage, and because the backup power unit was running out of power, an administrator cleanly powered off all VMs and hosts. Unfortunately, this happens more frequently than you would think.
When VMs are powered off by an administrator, or by anyone or anything else (PowerCLI, etc.) that has permission to power off VMs, vCenter Server marks these VMs as “cleanly powered off”. vSphere HA also tracks the state of the VMs, so if a host is powered off, HA knows whether each VM was powered on or powered off at the time the host went missing.
Now, when the host (or hosts) returns for duty, vSphere HA will of course verify what the last known state of the cluster was. It will read the list of all the VMs that were powered on, and it will restart those that were powered on and are configured for HA. It will also look at a VM property called “runtime.cleanPowerOff”. This property indicates whether the VM was cleanly powered off by an admin or a script, or whether the VM was, for instance, powered off by vSphere HA itself (PDL response, etc.). Depending on the value of the property, the VM will or will not be restarted.
Having said all of that, when you power off a VM manually through the UI, or via a script, then the VM will be marked as being “cleanly powered off”. This means that HA has no reason to restart it, as the powered-off VM is not the result of a host, network, or storage failure.
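The restart decision described above can be sketched as a small model. This is a simplified illustration under stated assumptions, not vSphere HA's actual implementation: the `VmState` class, its field names, and the `should_restart` function are hypothetical, and only the `clean_power_off` field is meant to mirror the “runtime.cleanPowerOff” property mentioned above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VmState:
    """Hypothetical snapshot of what HA knows about a VM (illustrative only)."""
    name: str
    ha_protected: bool                      # VM is configured for HA restart
    was_powered_on: bool                    # last known power state when the host went missing
    clean_power_off: Optional[bool] = None  # mirrors runtime.cleanPowerOff; None if never powered off


def should_restart(vm: VmState) -> bool:
    """Simplified model: restart protected VMs that did not go down cleanly."""
    if not vm.ha_protected:
        return False
    if vm.was_powered_on:
        # The VM was running when the host disappeared, so it is restarted.
        return True
    # The VM was already off: restart only if the power-off was NOT clean
    # (for example, HA itself powered it off in response to a failure).
    return vm.clean_power_off is False


if __name__ == "__main__":
    vms = [
        VmState("db01", ha_protected=True, was_powered_on=True),
        VmState("web01", ha_protected=True, was_powered_on=False, clean_power_off=True),
        VmState("app01", ha_protected=True, was_powered_on=False, clean_power_off=False),
    ]
    for vm in vms:
        print(vm.name, "restart" if should_restart(vm) else "leave powered off")
```

In this model, the administrator's clean shutdown from the scenario above corresponds to the `web01` case: the VM is protected, but it was off and marked as cleanly powered off, so nothing restarts it.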
Does (or will) the API provide the means to manually set the value in the runtime.cleanPowerOff property? I can definitely think of some use cases if it does!
Duncan Epping says
It is a property of the VM settings as far as I know. Not sure how easy or difficult it is to set it after a VM is powered off, as I have never tried it myself.
So the use case here is an unexpected power event (as you’ve indicated). You’ve got an integration between your UPS and vCenter, so it’ll issue a graceful shutdown command to the VMs.
On a standalone host, the host will read the VM startup rules and power on critical VMs regardless of whether they shut down cleanly or just stopped executing in an unclean fashion. HA will only power on machines that shut down uncleanly.
What people are struggling with is that this exact functionality exists for a standalone host, so they’re left with a choice: let their VMs die in a crash-consistent state on power loss so HA will power them up when power returns, or gracefully shut them down and depend on some other automation, or on paging a human, to restore services when power returns. It seems part and parcel of high availability to be able to specify that HA should make sure a particular machine is *ALWAYS* running if possible.
James Edmonds says
Yes, I think this probably impacts smaller businesses and deployments more, such as our own.
I’d love to see the HA agent developed so that from vCenter, you can centrally control startup rules for VMs. When a VM is then registered to a host, it will generate a startup entry on that host.
When the power is restored, the host can evaluate its own startup list for registered VMs, then evaluate the last running state of the remaining VMs to power on those that HA would normally power on.
Maybe it’s not possible, but that’s where I’d get some benefit, as it gives coverage for recovery after both graceful and non-graceful shutdowns.
No hibernate function?
Ronny Løken says
Even if you hibernate the VM, it will not be considered an HA event, and as such the VM will not be powered on by HA.
So how do you get guests to power on automatically once the Host is powered back on?
Duncan Epping says
– script it
– do it manually
unfortunately there’s no other way.
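As a rough illustration of the “script it” option, the sketch below powers on every VM that is currently off, in a chosen order. Everything here is an assumption for illustration: `VmRecord`, its `priority` field, and `power_on_in_order` are hypothetical names, and the `power_on` callback stands in for whatever tooling actually starts a VM in your environment (PowerCLI or an SDK call, for example).

```python
from typing import Callable, Iterable, List, NamedTuple


class VmRecord(NamedTuple):
    """Hypothetical inventory entry gathered after the hosts are back online."""
    name: str
    powered_on: bool
    priority: int  # lower numbers start first (an assumed convention)


def power_on_in_order(vms: Iterable[VmRecord],
                      power_on: Callable[[str], None]) -> List[str]:
    """Call power_on for each powered-off VM, ordered by priority, then name."""
    pending = sorted(
        (vm for vm in vms if not vm.powered_on),
        key=lambda vm: (vm.priority, vm.name),
    )
    started = []
    for vm in pending:
        power_on(vm.name)  # placeholder for the real power-on call
        started.append(vm.name)
    return started


if __name__ == "__main__":
    inventory = [
        VmRecord("web01", powered_on=False, priority=2),
        VmRecord("dc01", powered_on=False, priority=1),
        VmRecord("vcenter", powered_on=True, priority=1),
    ]
    power_on_in_order(inventory, power_on=lambda name: print("powering on", name))
```

Passing the power-on action in as a callback keeps the ordering logic testable without a live cluster. A real script would also need to wait for hosts and shared storage to be available before running.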
I was the customer referenced in this article, initially helped in the VMware community forums.
Whilst this is a great write-up, and good info for this scenario, it doesn’t quite cover what I was trying to account for.
In my case, it was NOT a clean shutdown during a power outage, but a total power loss to the hosts. I am trying to account for the scenario of a power outage in the middle of the night, whereby the hosts lose power upon the UPS running out of battery, and how to get the cluster to start back up once power is restored.
We will have no IT person available to do a clean shutdown/startup, but as production continues to run during the night, we need automated recovery.
When power is restored, the hosts will resume their last power state (on), so we just need to get vCenter to start, which should then consider all VMs as requiring a restart because of HA?
Before anyone says we should use automation to shut down VMs/hosts during a power outage, this would actually work against us because of what is described in this article. Nothing would recover itself when power is restored, as it was all shut down cleanly. With no night-time IT cover, it actually works in our favour to leave the UPS to run dry and cut power to the hosts, so they recover themselves.
Duncan Epping says
In the case of an “unclean shutdown” where vCenter is also impacted, all VMs should be automatically restarted by default. That is how vSphere HA is designed. It does not rely on vCenter to power on the VMs. So when this situation occurs and the hosts and shared storage come back online, HA will try to restart the VMs automatically.
Thanks Duncan. As per our discussion in the VMware forums, this is how I expected it to operate, as the HA agent is running on the host.
I have sent this back to VMware technical support and asked them to double-check their answer to me, and will also look at testing in house.