Replaced certificates and get vSphere HA Agent unreachable? I have heard this multiple times in the last couple of weeks. I started looking into it, and in many of these scenarios the common issue turned out to be the thumbprints. The log files typically give a lot of hints that look like this:
[29904B90 verbose 'Cluster' opID=SWI-d0de06e1] [ClusterManagerImpl::IsBadIP] <ip of the ha primary> is bad ip
Also, note that the UI will state “vSphere HA agent unreachable” in many of these cases. Yes, I know, these error messages could certainly be improved.
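If you want to check quickly whether a log shows this symptom, something like the following could help. It is only a minimal sketch: the regular expression is based solely on the sample line quoted above, and the function name `find_bad_ips` is my own, not part of any VMware tooling.

```python
import re

# Pattern derived from the sample log line above: the address is whatever
# token sits between "[ClusterManagerImpl::IsBadIP]" and "is bad ip".
BAD_IP_RE = re.compile(r"\[ClusterManagerImpl::IsBadIP\]\s+(\S+)\s+is bad ip")


def find_bad_ips(lines):
    """Return the distinct addresses flagged as bad, in first-seen order."""
    seen = []
    for line in lines:
        match = BAD_IP_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen
```

You would pass it the lines of whichever log file you are inspecting, for example `find_bad_ips(open(path))`, where `path` is wherever you copied the log to.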
You can simply solve this by disconnecting and reconnecting the hosts. Yes, it really is as simple as that, and you can do this without any downtime. There is no need to even move the VMs off; just right-click the host and disconnect it. Then, when the disconnect task has finished, reconnect it.
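If you have many hosts to go through, the same two steps can be scripted. The sketch below assumes `host` is a pyVmomi `vim.HostSystem` object and that you pass in something like `pyVim.task.WaitForTask` to block on each task; the function name `disconnect_and_reconnect` is hypothetical. I have not covered the case where vCenter refuses the host's new certificate on reconnect; in that situation the reconnect call may need a connect spec carrying the new thumbprint.

```python
def disconnect_and_reconnect(host, wait_for_task):
    """Disconnect a host from vCenter, wait, then reconnect it.

    host          -- assumed to be a pyVmomi vim.HostSystem object
    wait_for_task -- callable that blocks until a task completes,
                     for example pyVim.task.WaitForTask
    """
    # Same as right-click > Disconnect in the client; VMs keep running.
    wait_for_task(host.DisconnectHost_Task())
    # Only reconnect once the disconnect task has finished.
    wait_for_task(host.ReconnectHost_Task())
```

Looping over the hosts of a cluster and calling this one host at a time mirrors what you would do by hand in the UI.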
That’s not our experience of the problem. Disconnecting and reconnecting the hosts doesn’t make a difference, as vCenter (both 5.0U1 and 5.0U2) still has the wrong thumbprint in the dbo.VPX_HOST table in the database. The only way for us to clear the errors is to remove the host completely and then re-add it, which means we lose the folder structure and the historical performance data for the VMs on that host. I’d be interested to know if anyone else sees this (KB2006210 mentions that it’s fixed in vCenter 5.0U1, but I expect this scenario is somehow different).
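For anyone wanting to compare what a host actually presents with what vCenter has stored, the sketch below computes the SHA-1 thumbprint of the certificate a host serves. It uses only the Python standard library; the function names and the default port of 443 are assumptions for illustration, and it says nothing about how the value is stored in the vCenter database.

```python
import hashlib
import ssl


def format_thumbprint(der_bytes):
    """Return the SHA-1 digest of DER bytes as colon-separated uppercase hex."""
    digest = hashlib.sha1(der_bytes).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def host_thumbprint(hostname, port=443):
    """Fetch the certificate a host presents and return its SHA-1 thumbprint.

    The certificate is retrieved without validation, which is what we want
    here because the goal is only to see what the host currently serves.
    """
    pem = ssl.get_server_certificate((hostname, port))
    return format_thumbprint(ssl.PEM_cert_to_DER_cert(pem))
```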
Thanks Duncan, that fixed my problem. I was tearing my hair out with this one.