During the Dutch VMUG someone walked up to me and asked a question about High Availability. He had read my article on Primary and Secondary nodes and was wondering who decides where and when a VM will be restarted.
Let’s start with a short recap of the “primary/secondary” article: “The first five servers that join the cluster will become a primary node, and the others that will join will become a secondary node. Secondary nodes send their state info to primary nodes and also contact the primary nodes for their heartbeat notification. Primary nodes replicate their data with the other primary nodes and also send their heartbeat to other primary nodes.”
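To make the recap concrete, here is a minimal sketch of the role assignment it describes. It only models the "first five hosts to join become primaries" rule; the host names and the `assign_roles` helper are made up for illustration and are not part of any VMware API.

```python
MAX_PRIMARIES = 5  # from the recap: the first five hosts to join become primaries


def assign_roles(join_order):
    """Map each host to 'primary' or 'secondary' based on the order it joined."""
    return {
        host: "primary" if index < MAX_PRIMARIES else "secondary"
        for index, host in enumerate(join_order)
    }


# Hypothetical eight-host cluster, listed in join order.
roles = assign_roles([f"esx{n:02d}" for n in range(1, 9)])
```

In this example `esx01` through `esx05` end up as primaries and `esx06` through `esx08` as secondaries.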
The question was: when a fail-over needs to take place because an isolation occurred, who decides on which host a specific VM will be restarted? The obvious answer is one of the primaries. One of the primaries will be selected as the “fail-over coordinator”. The fail-over coordinator coordinates the restart of virtual machines on the remaining hosts and takes restart priorities into account. Keep in mind that when two hosts fail at the same time, it will handle the restarts sequentially. In other words, it restarts the VMs of the first failed host (taking restart priorities into account) and then restarts the VMs of the host that failed second (again taking restart priorities into account). If the fail-over coordinator fails, one of the other primaries will take over.
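The ordering described above can be sketched roughly as follows. This is an illustration of the sequencing only, not how HA is actually implemented: the `VM` class, the `restart_plan` helper, the host and VM names, and the high/medium/low priority labels are assumptions for the example, and the choice of target host for each VM is not modelled at all.

```python
from dataclasses import dataclass

# Assumed priority labels; a lower number means the VM is restarted earlier.
PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}


@dataclass
class VM:
    name: str
    priority: str


def restart_plan(failed_hosts):
    """Build the restart order for a list of (host, vms) pairs.

    failed_hosts is given in the order in which the hosts failed. Hosts are
    handled one after the other, and within each host the VMs are ordered
    by restart priority.
    """
    plan = []
    for host, vms in failed_hosts:
        for vm in sorted(vms, key=lambda v: PRIORITY_ORDER[v.priority]):
            plan.append((host, vm.name))
    return plan


# Two hypothetical hosts that failed at (nearly) the same time.
failed = [
    ("esx01", [VM("web", "low"), VM("db", "high")]),
    ("esx02", [VM("mail", "medium"), VM("dc", "high")]),
]
plan = restart_plan(failed)
```

Note that all VMs of `esx01` come before any VM of `esx02`, even though `dc` has a higher priority than `web`. That is the sequential, per-host handling described above.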
By the way, this is another reason why you can only account for four host failures. You need at least one primary, because that primary will be the fail-over coordinator. When the last primary dies….
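The arithmetic behind that limit is simply five primaries minus the one that has to survive. A small sketch, assuming the set of primaries stays fixed while hosts are failing and that the take-over simply goes to the first surviving primary (the real selection rule is not covered here, and `pick_coordinator` is a hypothetical helper):

```python
def pick_coordinator(primaries, failed):
    """Return a surviving primary to act as coordinator, or None if none is left."""
    survivors = [p for p in primaries if p not in failed]
    return survivors[0] if survivors else None


primaries = ["esx01", "esx02", "esx03", "esx04", "esx05"]

# Four primaries down: one is left to coordinate the restarts.
after_four = pick_coordinator(primaries, {"esx01", "esx02", "esx03", "esx04"})

# All five down: nobody is left to coordinate.
after_five = pick_coordinator(primaries, set(primaries))
```

With four failed primaries there is still a coordinator (`esx05` in this example); with five, `pick_coordinator` returns `None` and no restarts can be coordinated.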