Although this is a corner case, I did want to discuss it to make sure people are aware of this change. Prior to vSphere 5.0 Update 1, a virtual machine would be restarted by HA when the master detected that the state of the virtual machine had changed compared to the “protectedlist” file. In other words, a master would filter the VMs it thought had failed before trying to restart any. Prior to Update 1, a master used only the protection state it read from the protectedlist. If the master did not know the on-disk protection state for the VM, the master did not try to restart it. Keep in mind that only one master can open the protectedlist file in exclusive mode.
In Update 1 this logic has changed slightly. HA can now retrieve the state information from either the protectedlist stored on the datastore or from vCenter Server. So now multiple masters, for instance one in each partition, could try to restart a VM. If one of those restarts fails, for instance because a “partition” does not have sufficient resources, the master in the other partition might be able to restart it. Although these scenarios are highly unlikely, this behavior change was introduced as a safety net!
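To make the difference a bit more concrete, here is a minimal sketch of the filtering decision described above. This is purely a conceptual model and not actual HA code: the function name, the parameters, and the choice to check the on-disk state before the vCenter Server state are all my own assumptions for illustration.

```python
from typing import Optional


def should_attempt_restart(on_disk_state: Optional[bool],
                           vcenter_state: Optional[bool],
                           update1: bool) -> bool:
    """Conceptual model of whether a master considers restarting a failed VM.

    on_disk_state: protection state read from the protectedlist file, or None
        when this master does not know it (for example, another master holds
        the file in exclusive mode).
    vcenter_state: protection state reported by vCenter Server, or None when
        it is unknown to this master.
    update1: True models the vSphere 5.0 Update 1 behavior, False the
        behavior before Update 1.

    All names here are hypothetical; the order of the checks is an assumption.
    """
    # Both before and after Update 1: a known on-disk state is used.
    if on_disk_state is not None:
        return on_disk_state
    # Update 1 only: fall back to the state known through vCenter Server.
    if update1 and vcenter_state is not None:
        return vcenter_state
    # Protection state unknown: the master does not try to restart the VM.
    return False


# A master in a second partition that cannot read the protectedlist:
before = should_attempt_restart(None, True, update1=False)
after = should_attempt_restart(None, True, update1=True)
```

In this model the master that cannot read the protectedlist skips the VM before Update 1, and considers it for restart with Update 1 when vCenter Server reported it as protected. That is the safety net: a second master gets a chance when the first restart attempt fails.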
** Disclaimer: This article contains references to the words master and/or slave. I recognize these as exclusionary words. The words are used in this article for consistency because they are currently the words that appear in the software, in the UI, and in the log files. When the software is updated to remove the words, this article will be updated to be in alignment. **