I had a question from one of my colleagues last week about Isolation Response and IP Storage. His customer has an iSCSI storage infrastructure and recently implemented a new vSphere environment. When one of the hosts was isolated, virtual machines were restarted and users started reporting strange problems.
What happened was that the HA Isolation Response was configured to “Leave Powered On”, and because both the Management Network and the iSCSI Network were isolated there was no “datastore heartbeating” and no “network heartbeating”. Because the datastores were unavailable, the lock on the VMDKs expired and HA restarted the VMs. HA will power off the “ghosted VM” that lost the lock, but only once it detects that the lock cannot be re-acquired. This means that from the moment the restart happens until the isolation is resolved and the ghosted VM is powered off, the IP address and MAC address of the VM can pop up on the network more than once, and as you can imagine this is not desired.
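
To make the scenarios easier to compare, here is a minimal sketch in Python of the behaviour described above. It is a toy model only: the names `Outcome` and `isolation_outcome` and the response strings are made up for illustration and are not part of any vSphere API. It also assumes the host is already isolated on the management network, and it treats “shutdown” like “power off” even though a guest shutdown is not instantaneous.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    restarted_elsewhere: bool     # HA restarts the VM on another host
    original_still_running: bool  # the VM on the isolated host keeps running

    @property
    def duplicate_risk(self) -> bool:
        # Two copies of the same VM (same IP and MAC) can exist at once.
        return self.restarted_elsewhere and self.original_still_running


def isolation_outcome(isolation_response: str, storage_reachable: bool) -> Outcome:
    """Toy model of what happens to one VM on an isolated host (hypothetical)."""
    if isolation_response in ("power_off", "shutdown"):
        # The isolated host stops the VM itself, so only the restarted copy remains.
        # Simplification: a guest shutdown can take up to 5 minutes by default.
        return Outcome(restarted_elsewhere=True, original_still_running=False)
    if isolation_response != "leave_powered_on":
        raise ValueError(f"unknown isolation response: {isolation_response}")
    if storage_reachable:
        # The running VM still holds the VMDK lock, so it cannot be restarted.
        return Outcome(restarted_elsewhere=False, original_still_running=True)
    # Storage is isolated too: the lock expires and HA restarts the VM, while
    # the original keeps running until the lock re-acquire is seen to fail.
    return Outcome(restarted_elsewhere=True, original_still_running=True)


if __name__ == "__main__":
    for response in ("leave_powered_on", "power_off", "shutdown"):
        for reachable in (True, False):
            result = isolation_outcome(response, reachable)
            print(response, "storage reachable:", reachable,
                  "duplicate risk:", result.duplicate_risk)
```

In this model the only combination that flags a duplicate risk is “Leave Powered On” with the storage network isolated as well, which is the situation this customer ran into.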

How long is the host expected to take, after being reconnected to the network, to find out that someone else has “taken” the VMDKs?
Would it be reasonable to change the Isolation Response back to Shutdown when there is only IP storage and no FC?
The “power off” is a matter of seconds. And yes, you can select “shutdown”, but keep in mind that it could take up to 5 minutes before a VM is actually down in that case.
So the network confusion with multiple MAC and IP addresses is just a few seconds maximum?
And yes, the “shutdown” has a 5-minute default before a hard power off, so if the network comes back before that it won’t help.
It would be from the time the second VM is powered on until the first VM is powered off. If “leave powered on” is selected, this will be when it is detected that the lock cannot be reclaimed, and that could take a while…
Hi Duncan
How is HA handled in the case of NFS? Is the datastore heartbeat available for NFS datastores too, or is it VMFS only?
Thanks
I guess in this case a power off/shut down isolation response would have been a bad thing. The environment would have powered itself down. It’s not like “leave powered on” worked out great either, but in the end, as long as we know what happened, it’s great. Thanks for sharing this, Duncan.
I’m still somewhat confused. Why would there be duplicate IPs and MACs? What is your recommendation for IP-based storage (NFS or iSCSI)?
What would happen if the host was isolated but the datastore was not, and the isolation response is “leave powered on”? It seems that the VM would continue to function as normal, but which host would “own” the VM at that point?
@forbsy: there would be duplicate IPs because both VMs would be on the network. If the datastore is not isolated, the VM cannot be restarted as the VMDK would still be locked.
@SATINDER SHARMA: NFS is the same concept.