I received a question today from someone who wanted to know how isolation detection differs between vSphere 5.0 and 5.1. I described this in our book, but I figured I would share it here as well. Note that this is an outtake from the book.
The isolation detection mechanism has changed substantially compared to previous versions of vSphere. The main difference is that HA triggers a master election process before it declares a host isolated. In the timelines below, “s” refers to seconds. The following is the timeline for a vSphere 5.0 host:
- T0 – Isolation of the host (slave)
- T10s – Slave enters “election state”
- T25s – Slave elects itself as master
- T25s – Slave pings “isolation addresses”
- T30s – Slave declares itself isolated and “triggers” isolation response
For a vSphere 5.1 host the timeline differs slightly due to the insertion of a minimum 30s delay between the moment the host declares itself isolated and the moment it applies the configured isolation response. This delay can be increased using the advanced option das.config.fdm.isolationPolicyDelaySec.
- T0 – Isolation of the host (slave)
- T10s – Slave enters “election state”
- T25s – Slave elects itself as master
- T25s – Slave pings “isolation addresses”
- T30s – Slave declares itself isolated
- T60s – Slave “triggers” isolation response
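To make the difference between the two timelines easier to compare, here is a minimal Python sketch that reproduces them. It is only an illustration of the numbers above, not anything HA itself runs. The function name `isolation_timeline` and its `policy_delay_s` parameter (standing in for das.config.fdm.isolationPolicyDelaySec) are made up for this example, and the assumption that a configured value below 30 is treated as 30 is mine, based on the "minimum 30s" wording.

```python
from typing import List, Optional, Tuple

# Minimum delay on 5.1 between declaring isolation and applying the response.
MIN_ISOLATION_POLICY_DELAY_S = 30


def isolation_timeline(version: str,
                       policy_delay_s: Optional[int] = None) -> List[Tuple[int, str]]:
    """Return (seconds after isolation, event) pairs for an isolated slave.

    policy_delay_s is a hypothetical stand-in for the advanced option
    das.config.fdm.isolationPolicyDelaySec and only applies to 5.1.
    """
    events = [
        (0, "isolation of the host"),
        (10, "enters election state"),
        (25, "elects itself as master"),
        (25, "pings isolation addresses"),
        (30, "declares itself isolated"),
    ]
    if version == "5.0":
        delay = 0  # response is triggered as soon as isolation is declared
    elif version == "5.1":
        configured = (MIN_ISOLATION_POLICY_DELAY_S
                      if policy_delay_s is None else policy_delay_s)
        # Assumption: values below the minimum are treated as the minimum.
        delay = max(MIN_ISOLATION_POLICY_DELAY_S, configured)
    else:
        raise ValueError(f"unsupported version: {version}")
    events.append((30 + delay, "triggers isolation response"))
    return events


if __name__ == "__main__":
    for version in ("5.0", "5.1"):
        print(f"vSphere {version}")
        for seconds, event in isolation_timeline(version):
            print(f"  T{seconds}s: {event}")
```

Calling `isolation_timeline("5.1", policy_delay_s=90)` moves the last event to T120s while leaving the detection steps untouched, which is the point: the option only delays the response, not the detection.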
Or as Frank would say euuuh show:
When the isolation response is triggered, in both 5.0 and 5.1, HA creates a “power-off” file for every virtual machine it is about to power off whose home datastore is accessible. Next it powers off (or shuts down) the virtual machine and updates the host’s poweron file. The power-off file records that HA powered off the virtual machine and that HA should therefore restart it. These power-off files are deleted when the virtual machine is powered back on or when HA is disabled.
After the completion of this sequence, the master will learn the slave was isolated through the “poweron” file as mentioned earlier, and will restart virtual machines based on the information provided by the slave.
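The hand-off between the isolated host and the master can be sketched as a toy model in Python. To be clear, this is not how FDM stores anything on disk: the `Datastores` class, the function names, and the host and VM names are all hypothetical, and the files are represented as plain in-memory sets. It only mirrors the order of the steps described above, for virtual machines whose home datastore is accessible.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Datastores:
    """Hypothetical in-memory stand-in for the files HA keeps on datastores."""
    power_off_files: Set[str] = field(default_factory=set)            # one per VM
    poweron_files: Dict[str, Set[str]] = field(default_factory=dict)  # host -> running VMs


def trigger_isolation_response(host: str, vms: List[str], ds: Datastores) -> None:
    """Isolated host: record the power-off, then power off each VM."""
    running = ds.poweron_files.setdefault(host, set())
    for vm in vms:
        ds.power_off_files.add(vm)  # 1. note that HA powered this VM off
        running.discard(vm)         # 2. power off and update the host's poweron file


def master_restart(target_host: str, ds: Datastores) -> List[str]:
    """Master: restart every VM that HA powered off, then clean up."""
    to_restart = sorted(ds.power_off_files)
    for vm in to_restart:
        ds.poweron_files.setdefault(target_host, set()).add(vm)
        ds.power_off_files.discard(vm)  # deleted once the VM is powered back on
    return to_restart


if __name__ == "__main__":
    ds = Datastores(poweron_files={"esx01": {"vm-a", "vm-b"}})
    trigger_isolation_response("esx01", ["vm-a", "vm-b"], ds)
    print(master_restart("esx02", ds))
```

The ordering is the part worth noticing: the power-off file is written before the virtual machine goes down, so there is always a record that tells the master the power-off was HA's doing and not an administrator's.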
** Disclaimer: This article contains references to the words master and/or slave. I recognize these as exclusionary words. The words are used in this article for consistency because they are currently the words that appear in the software, in the UI, and in the log files. When the software is updated to remove the words, this article will be updated to be in alignment. **