I wrote an article about the scenario where all host fail, due to for instance a power outage, and how HA responds to it. I had a question today if this was still valid with vSphere 5.0. I figured it wouldn’t hurt to describe the steps that vSphere 5.0 takes.
- Power Outage, all hosts down
- Power on hosts
- Election process will be kicked off. Master will be elected.
- Master reads protected list
- Master initiates restarts for those VMs which were listed as protected but not running
Now the one thing I want to point out is that with vSphere 5.0 we will also track if the VM was cleanly powered off, as in initiated by the admin, or powered-off due to a failure/isolation. In the case they are cleanly powered off they will not be restarted, but in this scenario of course they are not cleanly powered off and as such the VMs will be powered on. The great thing about vSphere 5.0 is that you no longer need to know which hosts where your primary nodes so you can power these on first to ensure quick recovery… No, you can power on any host and HA will sort it out for you.
** Disclaimer: This article contains references to the words master and/or slave. I recognize these as exclusionary words. The words are used in this article for consistency because it’s currently the words that appear in the software, in the UI, and in the log files. When the software is updated to remove the words, this article will be updated to be in alignment. **