Replaced certificates and get vSphere HA Agent unreachable? I have heard this multiple times in the last couple of weeks. I started looking in to it and it seems that in many of these scenarios the common issue was the thumbprints. The log files typically give a lot of hints that look like this:
[29904B90 verbose 'Cluster' opID=SWI-d0de06e1] [ClusterManagerImpl::IsBadIP] <ip of the ha master> is bad ip
Also note that the UI will state “vSphere HA agent unreachable” in many of these cases. Yes I know, these error messages can be improved for sure.
You can simply solve this by disconnecting and reconnecting the hosts. Yes it really is as simple as that, and you can do this without any downtime. No need to move the VMs off even, just right click the host and disconnect it. Then when the disconnect task is finished reconnect it.
** Disclaimer: This article contains references to the words master and/or slave. I recognize these as exclusionary words. The words are used in this article for consistency because it’s currently the words that appear in the software, in the UI, and in the log files. When the software is updated to remove the words, this article will be updated to be in alignment. **