I had this question this week around what happens to VMs when a cluster is partitioned. Funny thing is that with questions like these it seems like everyone is thinking the same thing at the same time. I had the question on the same day from a customer running traditional storage and had a network failure across racks and from a customer running Virtual SAN who just wanted to know how this situation was handled. The question boils down to this, what happens to the VM in “Partition 1” when the VM is restarted in Partition 2?
The same can be asked for a traditional environment, only difference being that you wouldn’t see those “disk groups” in the bottom but a single datastore. In that case a VM can be restarted when a disk lock is lost… What happens to the VM in partition 1 that has lost access to its disk? Does the isolation response kick in? Well if you have vSphere 6.0 then potentially VMCP can help because if you have a single datastore and you’ve lost access to it (APD) then the APD response can be triggered. But if you don’t have vSphere 6.0 or don’t have VMCP configured, or if you have VSAN, what would happen? Well first of all, it is a partition scenario and not an isolation scenario. On both sides of the partition HA will have a master and hosts will be able to ping each other so there is absolutely no reason to invoke the “isolation response” as far as HA is concerned. The VM will be restarted in partition 2 and you will have it running in Partition 1, you will either need to kill it manually in Partition 1, or you will need to wait until the partition is lifted. When the partition is lifted the kernel will realize it no longer holds the lock (as it is lost it to another host) and it will kill the impacted VMs instantly.