I clearly don’t know much about Citrix’s version of HA, but I do know a thing or two about VMware’s version of HA.
The following are excerpts from the article over at DABCC:
VMware’s HA is heavily dependent on DNS or, alternatively, on hosts entries being in place. The VMware implementation is based on Legato Automated Availability Management (AAM); in fact, some of us will recall that it used to place its logs in /opt/LGTOaam512/logs/ (since 3.5 this has moved to /opt/vmware/aam).
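Since name resolution matters this much, it is worth checking it before enabling HA. The following is a minimal sketch, not a VMware tool: the function name `unresolved_hosts` and the host names are hypothetical, and it only tests forward resolution from the machine it runs on.

```python
import socket


def unresolved_hosts(hostnames, resolve=socket.gethostbyname):
    """Return the host names that fail to resolve.

    `resolve` defaults to the system resolver, which consults DNS and
    the local hosts file; it can be swapped out for testing.
    """
    failed = []
    for name in hostnames:
        try:
            resolve(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            failed.append(name)
    return failed


if __name__ == "__main__":
    # Hypothetical ESX host names; replace with your own.
    cluster = ["esx01.example.local", "esx02.example.local"]
    for name in unresolved_hosts(cluster):
        print(f"cannot resolve {name}")
```

Running it on each host (or from the management station) gives a quick list of names that still need a DNS record or a hosts entry.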
VMware’s HA uses the network to establish a heartbeat between all the participating ESX hosts. So practically, what does this mean to the poor bloke who has to support the servers? If your network has a bit of a flap (personally I always blame the network guys :-), your servers will implement an “isolation response”. The default response will shut down your virtual machine to release the shared storage locks, which allows the machine to be restarted on another host. This of course may not be desirable if the server is busy doing something, i.e. you may cause corruption or other issues with the application/database. In other words, it won’t perform a clean shutdown. This is configurable such that you can keep the machines powered on, but that isn’t recommended in the case of NAS or iSCSI (as they are also network dependent), and you may end up with a split-brain situation.
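To make the reasoning above concrete, here is a toy model of the decision. It is only a sketch of the logic described in this post, not how AAM is implemented: the names `is_isolated`, `suggested_response` and `IsolationResponse`, and the timeout value in the usage lines, are all assumptions made for illustration.

```python
from enum import Enum


class IsolationResponse(Enum):
    POWER_OFF = "power off"
    LEAVE_POWERED_ON = "leave powered on"


def is_isolated(last_heartbeat_s: float, now_s: float, timeout_s: float) -> bool:
    """True when no heartbeat has been seen for longer than timeout_s."""
    return (now_s - last_heartbeat_s) > timeout_s


def suggested_response(storage_uses_network: bool) -> IsolationResponse:
    """Pick a response following the rule of thumb in the text.

    With NAS or iSCSI the storage path depends on the network too, so
    leaving VMs powered on risks split-brain; power off instead.
    """
    if storage_uses_network:
        return IsolationResponse.POWER_OFF
    return IsolationResponse.LEAVE_POWERED_ON


if __name__ == "__main__":
    # Hypothetical numbers: last heartbeat at t=100s, now t=120s, 15s timeout.
    if is_isolated(last_heartbeat_s=100.0, now_s=120.0, timeout_s=15.0):
        print(suggested_response(storage_uses_network=True).value)
```

The point of the split is that detection (a missed heartbeat) and reaction (what to do with the VMs) are separate choices, and only the second depends on how your storage is attached.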
There is now also experimental support for component-level HA, i.e. if a virtual machine fails, VMware will try to restart it.
- As of ESX 3.5 U2, High Availability doesn’t lean heavily on DNS anymore; it gets its host name and IP info from VirtualCenter.
- ESX 3.5 U2 gives you the possibility to shut down a VM cleanly in case of an isolation.
- Normally one would indeed provide the Service Console (SC) with redundancy, preferably via two separate switches, to avoid the problems you are describing.
- Virtual Machine High Availability isn’t experimental anymore as of ESX 3.5 U2.