A couple of days ago an ex-colleague phoned me about a weird problem with enabling HA in an ESXi cluster. The following errors occurred:
- Configuration of host IP address is inconsistent on host : address resolved to Host misconfigured. IP address of not found on local interfaces
- cmd addnode failed for primary node: Internal AAM Error – agent could not start
The first error was reported by esxhost01 and the second by esxhost02.
Let’s start with esxhost01.
This customer had a VMotion and a Management portgroup on two separate vSwitches. The error seems to indicate that HA is using the VMotion portgroup during configuration. These hosts had been added to VC with the management portgroup IP (IP and name are also in DNS). So how do I make sure that HA isn't using the VMotion network? It's easy: go to your cluster, open up the advanced options for HA and add the following key with the value false:
In other words, don’t use the VMotion network for the HA heartbeat. The weird thing in this case is that it shouldn’t use the VMotion network by default so there seems to be a glitch here…
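To make the first error a bit more concrete, here is a minimal Python sketch of the kind of consistency check the message describes: the address a host name resolves to has to show up on one of the interfaces being considered. This is only an illustration and not how the AAM agent is actually implemented; the function name, portgroup names and IP addresses are all made up for the example.

```python
import ipaddress


def find_owning_interface(resolved_ip, local_interfaces):
    """Return the name of the interface that carries resolved_ip, or None.

    resolved_ip: the address the host name resolved to (string).
    local_interfaces: dict of interface/portgroup name -> IP address (string).
    """
    target = ipaddress.ip_address(resolved_ip)
    for name, addr in local_interfaces.items():
        if ipaddress.ip_address(addr) == target:
            return name
    return None


# Hypothetical addresses, for illustration only.
all_interfaces = {"Management": "192.168.1.11", "VMotion": "10.0.0.11"}
vmotion_only = {"VMotion": "10.0.0.11"}

# The host was added with its management IP, so the name resolves to that.
found = find_owning_interface("192.168.1.11", all_interfaces)

# If only the VMotion network is considered, the same address is "not found".
not_found = find_owning_interface("192.168.1.11", vmotion_only)
```

With both portgroups in the picture the management address is matched; when only the VMotion network is looked at, the lookup comes back empty, which is roughly the situation the error message is complaining about.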
So now for the second problem.
The HA (AAM) agent could not start. Just to make sure that the USB key wasn't corrupt, the key was recreated, but the error still occurred. As some of you might know, if you want to use HA with a diskless server you will need to create a userworld swap on the SAN. (Read this KB for more info on that one…) So, to make sure that the swap wasn't causing this problem, the directory was cleaned out and HA was reconfigured. Once the directory was emptied, the HA agent installed without any problem at all…
- When reinstalling ESXi, or when strange HA errors occur, clean up the userworld swap!
Thanks go out to Remco for providing me with the additional details!