I wrote an article about the scenario where all hosts fail, due to a power outage for instance, and how HA responds to it. I was asked today whether this is still valid with vSphere 5.0. I figured it wouldn’t hurt to describe the steps that vSphere 5.0 takes.
- Power Outage, all hosts down
- Power on hosts
- Election process will be kicked off. Master will be elected.
- Master reads protected list
- Master initiates restarts for those VMs which were listed as protected but not running
Now the one thing I want to point out is that with vSphere 5.0 we also track whether the VM was cleanly powered off, as in initiated by the admin, or powered off due to a failure/isolation. VMs that were cleanly powered off will not be restarted, but in this scenario of course they were not cleanly powered off, and as such the VMs will be powered on. The great thing about vSphere 5.0 is that you no longer need to know which hosts were your primary nodes so you can power these on first to ensure quick recovery… No, you can power on any host and HA will sort it out for you.
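To make the restart decision concrete, here is a minimal sketch of the logic described above. It is a simplified model, not HA's actual implementation: the `VM` class, its fields, and the `vms_to_restart` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    protected: bool        # assumed: VM appears in the master's protected list
    running: bool          # assumed: VM is currently powered on
    clean_power_off: bool  # assumed: VM was powered off by an admin, not by a failure


def vms_to_restart(vms):
    """Return the names of VMs the master would restart in this simplified model."""
    return [
        vm.name
        for vm in vms
        if vm.protected and not vm.running and not vm.clean_power_off
    ]


# Hypothetical inventory after a full power outage.
inventory = [
    VM("web01", protected=True, running=False, clean_power_off=False),
    VM("test01", protected=True, running=False, clean_power_off=True),
    VM("db01", protected=True, running=True, clean_power_off=False),
    VM("lab01", protected=False, running=False, clean_power_off=False),
]

print(vms_to_restart(inventory))
```

In this model only `web01` qualifies: it was protected, it is not running, and it was not cleanly powered off.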
** Disclaimer: This article contains references to the words master and/or slave. I recognize these as exclusionary words. The words are used in this article for consistency because they are currently the words that appear in the software, in the UI, and in the log files. When the software is updated to remove the words, this article will be updated to be in alignment. **
Trever Jackson says
Awesome! Can you remind me an easy way to see the v4 primary hosts? I can’t recall.
Duncan Epping says
PowerCLI or go to the command line and use Cli:
I was wondering, what would be your recommended way of automatically powering up VMs that were cleanly powered off? On ESXi hosts you have the option to select auto-power-on with the host, but after you enable HA this option gets disabled.
Duncan, thanks for all the posts.
My question is: after a power outage with all hosts down, or just after maintenance where I cleanly powered off the VMs and the hosts, how is it possible to restart the clustered VMs in a specific order (first AD controller, second vCenter server, third DB server ….)?
It is not, you can set “low / medium / high” priority, but that is it.
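As a rough illustration of what priority-only ordering means, the sketch below sorts VMs by a restart priority of high, medium, or low. The VM names and the `restart_order` helper are hypothetical; this only models the idea, not how HA schedules restarts internally.

```python
# Lower number sorts first: high-priority VMs are restarted before medium, then low.
PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}


def restart_order(vms):
    """Sort (name, priority) pairs by restart priority only."""
    return sorted(vms, key=lambda vm: PRIORITY_ORDER[vm[1]])


# Hypothetical cluster inventory.
vms = [
    ("db01", "medium"),
    ("ad01", "high"),
    ("vcenter01", "high"),
    ("test01", "low"),
]

print([name for name, _ in restart_order(vms)])
```

Note that `ad01` and `vcenter01` share the same priority. In this sketch they keep their input order, but that is a property of Python's stable sort, not an ordering guarantee within a priority level.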
I just wanted to share an experience I had and where you should think twice when preparing for a power outage.
At the time we had several hundred VMs running in different clusters.
“All” hosts were configured to stay powered off after a power outage.
This let us handle the startup of the hosts in a controlled fashion,
waiting for the SAN, disk arrays, network and so on to become available and verified first.
Now the unplanned power outage came.
The UPS failed as well.
All VMs and hosts went down as expected 🙂
After some time the power came back.
Unfortunately one of the hosts was misconfigured.
It started up as soon as the power came back.
This was a host in a cluster running some 400 VMs (no VDI).
The host got network and storage connectivity.
It was a primary host and thought that,
– Hey, my buddies (the other hosts in the cluster) are down, I better start to power on those VMs…
I was impressed, the host had started and was running more than 100 VMs by the time I realized what was happening.
Of course the host and the powered-on VMs were unusable and unresponsive because of the load.
HA doesn’t care about how many resources you have in the cluster when VMs are powered on. HA just starts them as instructed.
HA doesn’t take HA Admission control into account during startup of VMs in this scenario,
HA only uses Admission control when powering on VMs in a cluster that is already up and running.
Make sure you start all the hosts at the same time.
Hope this can help someone out there 🙂
Well, HA does care how many resources are available, specifically unreserved resources, as it needs those to be able to power on your VMs 🙂
You are right of course.
We didn’t use reservations.
Would that explain that the host started more VMs than it could handle?
Sorry, I mean it needs unreserved capacity for the memory overhead to allow a VM to start. That says nothing about performance: if needed, it will indeed start more VMs than the host can handle performance-wise.
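A small sketch may help show why a single host without reservations can end up running far more VMs than it can serve well. It assumes a simplified model in which a VM may power on whenever its memory reservation plus memory overhead fits in the host's remaining unreserved memory. The function name and all numbers (64 GB host, 150 MB overhead per VM, 4 GB reservations) are made up for illustration.

```python
def admit_power_ons(unreserved_mb, vms):
    """Greedy model: start each VM whose reservation plus overhead still fits."""
    started = []
    for name, reservation_mb, overhead_mb in vms:
        needed = reservation_mb + overhead_mb
        if needed <= unreserved_mb:
            unreserved_mb -= needed
            started.append(name)
    return started


HOST_UNRESERVED_MB = 64 * 1024  # hypothetical host with 64 GB unreserved memory

# 120 VMs without reservations: only the (assumed) 150 MB overhead counts.
no_reservations = [(f"vm{i:03d}", 0, 150) for i in range(120)]

# The same VMs with a 4 GB reservation each.
with_reservations = [(f"vm{i:03d}", 4096, 150) for i in range(120)]

print(len(admit_power_ons(HOST_UNRESERVED_MB, no_reservations)))
print(len(admit_power_ons(HOST_UNRESERVED_MB, with_reservations)))
```

Without reservations every VM in this model is admitted, because the overhead alone is small compared to the host's memory. With reservations the same check stops after a handful of VMs.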