There was a question on VMTN this week about using the management IPs of a “smaller” cluster as the isolation addresses for vSphere HA. The plan was to disable the default isolation address (the default gateway) and then add every management IP as an isolation address. In this case, 5 or 6 IPs would be added. I had to think this through, so I went through the steps of what happens in the case of an isolation event:
- there is no traffic between the slave and the master, or between the master and the slaves (depending on whether the master or one of the slaves is isolated)
- if it is a slave that is potentially isolated, then the slave will start a “master election process”
- if it is the master that is potentially isolated, then the master will try to ping the isolation addresses
- if it is a slave and there is no response to the election process, then the slave will ping the isolation addresses after it has elected itself as master
- if there is no response to any of the pings (these happen in parallel), then isolation is declared and the isolation response is triggered
Now the question is: will there be a response when the host tries to ping itself while it is isolated? You need to add all the IP addresses to the “isolation address” options for this setup to make sense, so every host ends up with its own management IP in the list. And that is what I tested. The host will ping all isolation addresses. All but one will fail; the one that succeeds is the management IP address of the isolated host itself. (You can still ping your own IP even when the NICs are disconnected.) As a result, the VMs are left running, because one of the isolation addresses responded.
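To make the outcome of that test a bit more tangible, here is a minimal Python sketch of the decision described above. It is a toy model and not how vSphere HA is implemented: the function names, the IP addresses, and the `ping` stand-in are all made up for illustration, and the only rule it encodes is "isolation is declared when none of the isolation addresses respond".

```python
from typing import Callable, Iterable


def isolation_declared(isolation_addresses: Iterable[str],
                       ping: Callable[[str], bool]) -> bool:
    """Isolation is declared only when no isolation address responds.

    HA sends the pings in parallel; checking them one by one here
    gives the same yes/no answer.
    """
    return not any(ping(addr) for addr in isolation_addresses)


def make_isolated_ping(own_ip: str) -> Callable[[str], bool]:
    """Hypothetical ping for a host whose NICs are disconnected:
    the only address that still answers is the host's own IP."""
    return lambda addr: addr == own_ip


# Hypothetical addresses, chosen purely for illustration.
mgmt_ips = ["192.168.1.11", "192.168.1.12", "192.168.1.13",
            "192.168.1.14", "192.168.1.15"]
gateway = "192.168.1.1"

# The third host loses its network connection.
ping = make_isolated_ping(own_ip="192.168.1.13")

# All management IPs as isolation addresses: the host's own IP answers.
with_mgmt_ips = isolation_declared(mgmt_ips, ping)

# An address outside the host as isolation address: nothing answers.
with_gateway = isolation_declared([gateway], ping)

print("all management IPs ->", with_mgmt_ips)
print("external address   ->", with_gateway)
```

With every management IP in the list, the isolated host never declares isolation, which matches what I saw in the test. With an address outside the host, it does.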
In other words, don’t do this. The isolation address should be a reliable address outside of the ESXi host, preferably on the same network as the management network.
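If you want to sanity-check a list of candidate isolation addresses against that rule of thumb, something along these lines could work. This is only a sketch using Python's standard `ipaddress` module; the function name, the warning texts, and the example addresses are hypothetical, and you would have to supply the host management IPs and the management subnet yourself.

```python
import ipaddress
from typing import List


def check_isolation_address(candidate: str,
                            host_mgmt_ips: List[str],
                            mgmt_network: str) -> List[str]:
    """Return a list of warnings for a candidate isolation address.

    An empty list means the candidate is outside the ESXi hosts and
    sits on the management network.
    """
    addr = ipaddress.ip_address(candidate)
    warnings = []
    # The address should not belong to one of the hosts themselves.
    if addr in {ipaddress.ip_address(ip) for ip in host_mgmt_ips}:
        warnings.append("address belongs to an ESXi host")
    # Preferably it lives on the same network as management.
    if addr not in ipaddress.ip_network(mgmt_network):
        warnings.append("address is not on the management network")
    return warnings


# Hypothetical values, for illustration only.
hosts = ["192.168.1.11", "192.168.1.12", "192.168.1.13"]
network = "192.168.1.0/24"

for candidate in ["192.168.1.13", "192.168.1.1", "10.0.0.1"]:
    print(candidate, check_isolation_address(candidate, hosts, network))
```

Keep in mind that this only checks where an address lives. Whether the device behind it is actually reliable is something you still need to judge yourself.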
** Disclaimer: This article contains references to the words master and slave. I recognize these as exclusionary words. The words are used in this article for consistency because they are currently the words that appear in the software, in the UI, and in the log files. When the software is updated to remove the words, this article will be updated to be in alignment. **
Cool, this is the first time I have seen the whole picture. It seems that having the default gateway as the isolation address might not be a good idea when you are using a technology that implements local default gateways (distributed routers?), because even when your host is isolated, the default gateway will respond and the isolation response will not be triggered.
Also, this is the first time I have seen that a master election takes place when you are alone…