So I’ve been collecting some HA best practices lately, and I wanted to have them all in one place so I can use them myself on the VMTN forum and with customers. The first two are obvious in my opinion but still often overlooked:
- Your ESX host names should be lowercase and use FQDNs
- Provide Service Console redundancy
- If you add an isolation validation address with “das.isolationaddress”, add an additional 5000 (milliseconds) to “das.failuredetectiontime”
- If your Service Console network is set up with “active / standby” redundancy, set “das.failuredetectiontime” to 60000
- If you provide Service Console redundancy by adding a secondary Service Console, set “das.failuredetectiontime” to 20000 and set up an additional “das.isolationaddress”
- If you set up a secondary Service Console, use a different subnet and vSwitch than your primary
- If you don’t want to use your default gateway as an isolation validation address, or can’t because it’s a non-pingable device, disable its use by setting “das.usedefaultisolationaddress” to false and add a pingable “das.isolationaddress”
- Change the default isolation response to “power off vm” and set restart priorities for your AD/DNS/VC/SQL servers
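The rules of thumb above can be captured in a small script, for example to sanity-check a cluster’s HA advanced options before entering them. This is a minimal Python sketch, not a VMware tool: the function names `check_hostname` and `recommended_ha_options`, the mode strings, and the assumption that 15000 ms is the default “das.failuredetectiontime” are all hypothetical. Values are in milliseconds and are returned as strings, since HA advanced options are entered as text key/value pairs.

```python
from typing import Dict, Optional

# Assumption: 15000 ms is the default das.failuredetectiontime.
DEFAULT_FAILURE_DETECTION_MS = 15000
EXTRA_ISOLATION_ADDRESS_MS = 5000


def check_hostname(name: str) -> bool:
    """Hypothetical check for rule 1: host name is lowercase and fully qualified."""
    return name == name.lower() and "." in name.strip(".")


def recommended_ha_options(
    sc_redundancy: str,
    isolation_address: Optional[str] = None,
    use_default_gateway: bool = True,
) -> Dict[str, str]:
    """Hypothetical helper mapping the rules of thumb to HA advanced options.

    sc_redundancy is one of "none", "active_standby" or "secondary_sc".
    """
    if sc_redundancy not in ("none", "active_standby", "secondary_sc"):
        raise ValueError(f"unknown Service Console setup: {sc_redundancy!r}")
    if sc_redundancy == "secondary_sc" and isolation_address is None:
        raise ValueError("a secondary Service Console needs an additional das.isolationaddress")
    if not use_default_gateway and isolation_address is None:
        raise ValueError("only disable the default gateway with a pingable das.isolationaddress")

    options: Dict[str, str] = {}
    if sc_redundancy == "active_standby":
        # The list gives a flat 60000 ms here; no extra 5000 ms is added.
        detection_ms = 60000
    elif sc_redundancy == "secondary_sc":
        # 20000 ms, with the required extra isolation address set below.
        detection_ms = 20000
    else:
        # Start from the default and add 5000 ms for an extra isolation address.
        detection_ms = DEFAULT_FAILURE_DETECTION_MS
        if isolation_address is not None:
            detection_ms += EXTRA_ISOLATION_ADDRESS_MS
    options["das.failuredetectiontime"] = str(detection_ms)

    if isolation_address is not None:
        options["das.isolationaddress"] = isolation_address
    if not use_default_gateway:
        options["das.usedefaultisolationaddress"] = "false"
    return options
```

For a host with a secondary Service Console and 192.0.2.1 as its extra isolation address, `recommended_ha_options("secondary_sc", "192.0.2.1")` returns `{'das.failuredetectiontime': '20000', 'das.isolationaddress': '192.0.2.1'}`.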
So if you’ve got more, add them in the comments and I will update the list!
Thomas Weyell says
Hi,
I tell my customers that they must maintain their hosts files.
Thomas
Duncan Epping says
Well, as of 3.5 U2 HA doesn’t rely on hosts files or DNS anymore but gets its name resolution information from VirtualCenter.
Jay says
Is there any reason to choose the following?
“Change default isolation response to ‘power off vm’ and set restart priorities for your AD/DNS/VC/SQL servers”
I think leaving all VMs powered on is a better idea, because all VMs can then be scheduled to come down and be brought up gracefully on different hosts. The above would just power off the VMs without shutting the servers down gracefully, which could be a problem with DB servers. Troubleshooting can also be done on the isolated host to see if it can be fixed before the VMs are shut down.
Duncan Epping says
Well, I don’t like to use a server that’s degraded in any way. If you know what you are doing and have a valid argument to leave them powered on, go ahead. But then you should be monitoring your system 24×7 in my opinion, or at least trigger an action on host degradation…
It would be nice, though, if you could have a “heartbeat”-only network on your VM vSwitch; that way you’d know whether or not you need to switch over in case of an isolation.
Fred says
Hello,
“Well as of 3.5 U2 HA doesn’t rely on host files or dns anymore but get’s its name resolving information from VirtualCenter.”
So what about a VirtualCenter running in a virtual machine? If that VM is stopped by HA, does it mean that name resolution will not work?
I usually fill in /etc/hosts on each ESX 3.5 U2 host; is that the right way to configure HA properly?
Thanks for your answer!
Duncan Epping says
No, that’s not necessary anymore. HA gets the IP addresses and host names from VC and caches this info in FT_HOSTS, so there’s no need to maintain hosts files manually anymore!
justme says
@ Jay – If the host loses all network access, how do you plan to log in and cleanly shut the guests down? If you lose all redundancy on your Service Console, you most likely lose it on your virtual machine networks as well.
Without powering them down, the VMs keep their file locks, which means you can’t restart them on other hosts.
Stan says
If the secondary SC is on the same subnet, do you still need to set any advanced options? If the primary SC fails, HA should use its default gateway as the isolation address and the ping should succeed via the secondary SC, correct?
Andy Simmons says
Would you mind elaborating as to why registering the ESX hosts using their FQDNs is important? I’ve come across some installations where the hosts are registered by IP address, but vCenter is still aware of each host’s FQDN and short name. Can this still cause problems for HA, DRS, or any other vCenter functionality?
Obviously FQDNs are preferable from an administrative standpoint, but if each host’s FT_HOSTS still contains all three entries (IP address, FQDN and short name) for every server, is this still a problem from an operational standpoint?
Thanks,
Andy