I was just cleaning up our Cloud Lab and noticed HA wasn’t enabled. I enabled it and immediately it threw the following error at me:
Error Message: Configuration Issues – HA agent on esx4.mgm.local in cluster ams-hadrs-01 in Lab2 has an error : Error while running health check script.
When I run into HA configuration issues, there are a few steps I usually take to fix them:
- Click “Reconfigure for VMware HA” and check whether the issue persists. If it does:
- Is DNS configured, and does it actually work? If not, fix it and reconfigure for HA.
- Is the gateway reachable? If not, fix it and reconfigure for HA.
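The DNS and gateway checks in the list above can be run from a shell before reconfiguring HA. This is a minimal sketch, assuming `getent` and `ping` are available (they are on typical Linux systems; the tool set in the ESXi shell may differ). The function names and the gateway address in the example are hypothetical.

```shell
#!/bin/sh
# Pre-flight checks before clicking "Reconfigure for VMware HA".
# Assumption: getent and ping exist on the system running this.

# check_dns NAME: succeed if NAME resolves to at least one address.
check_dns() {
    if getent hosts "$1" >/dev/null 2>&1; then
        echo "DNS OK: $1"
        return 0
    fi
    echo "DNS FAIL: $1 does not resolve"
    return 1
}

# check_gateway ADDR: succeed if ADDR answers a single ping.
check_gateway() {
    if ping -c 1 "$1" >/dev/null 2>&1; then
        echo "GATEWAY OK: $1"
        return 0
    fi
    echo "GATEWAY FAIL: $1 is unreachable"
    return 1
}

# Example (esx4.mgm.local is the host from the error above;
# 192.168.1.1 is a hypothetical gateway address):
#   check_dns esx4.mgm.local && check_gateway 192.168.1.1
```

Run it against every host in the cluster, not just the one reporting the error, since HA agents need to resolve each other.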
In my experience this solves about 75% of the issues. If it doesn’t, the next step I usually take is unloading the agent and restarting the management services. Although it is a fairly rigorous approach, it is the fastest way to fix HA issues. In my case I am using ESXi, and this is what I needed to do to clean up the host:
- Disable HA on the cluster
- /opt/vmware/aam/VMware-aam-ha-uninstall.sh
- /sbin/services.sh restart
- Enable HA on the cluster
This solved the issue I had with HA.
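The host-side part of that procedure (the uninstall script followed by the services restart) can be wrapped in a small script. This is a sketch, not a VMware tool: it assumes a shell on the ESXi host, it does not disable or re-enable HA in vCenter (do that in the client before and after), and `run`, `ha_cleanup` and `DRY_RUN` are hypothetical helpers added for illustration.

```shell
#!/bin/sh
# Host-side HA agent cleanup. Disable HA on the cluster first and
# re-enable it afterwards; those two steps are done in vCenter.
# Set DRY_RUN=1 to print the commands instead of executing them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

ha_cleanup() {
    # Remove the HA (AAM) agent from the host; stop if this fails.
    run /opt/vmware/aam/VMware-aam-ha-uninstall.sh || return 1
    # Restart the management services on the host.
    run /sbin/services.sh restart
}
```

Calling `DRY_RUN=1 ha_cleanup` first shows exactly what would be executed before you run it for real.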
Tim says
I’ve had a few of these recently, including one today whilst patching my cluster.
I managed to get away with just disabling HA for the duration of the patching and then re-enabling when it was done. It’s still a bit of a pain though.
Methone says
Hello,
For me this appears when you upgrade from ESX 4.0 to 4.1 and the HA agents are not actually upgraded to 4.1. In the meantime there is an existing KB article for that.
Best regards
Duncan Epping says
My lab wasn’t upgraded; for some reason it just didn’t work.
AFidel says
At our DR site I had to use das.isolationaddress because the gateway was a VPN device that dropped ICMP.
Habibablby says
Hello,
Yes, that’s why I always instruct my people to make sure name resolution works properly before going ahead and enabling VMware HA.
In my setup, to get rid of this issue, I always place the Service Console on a separate segment behind a firewall, put vCenter next to the Service Console on the same vSwitch, and configure the vCenter server itself as a DNS server for a zone such as esx.local. I then register all the ESX hosts in DNS, pointing them at the vCenter server. That way name resolution always works :)
Thanks for your post Duncan:)
Kalyan says
Hi Duncan,
As you said, you hit this problem on ESXi servers. How did you execute “/opt/vmware/aam/VMware-aam-ha-uninstall.sh”? Was it in unsupported mode?
Duncan Epping says
Indeed, you can temporarily enable it through the DCUI.
Fred Peterson says
@Habibablby
While that seems sensible and gives you a single place to make sure DNS is functional… what if it isn’t?
The better option is to use host file entries on each host. While slightly more cumbersome to keep updated, it absolutely ensures no outside DNS can poison or otherwise disrupt operations that require proper name resolution.
Plus, the hosts file doesn’t have to be unique per host, so just edit one and then scp it to the others.
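Fred’s “edit one and scp it to the others” approach might look like the loop below. This is a sketch under stated assumptions: root SSH access to each host is enabled, the host names are hypothetical, and `push_hosts_file`, `HOSTS_FILE` and `DRY_RUN` are illustrative names rather than part of any VMware tooling. Duncan argues against hosts files in the next comment.

```shell
#!/bin/sh
# Copy one hosts file to every ESX host (hypothetical host names).
# Set DRY_RUN=1 to print the scp commands instead of running them.

HOSTS_FILE=${HOSTS_FILE:-/etc/hosts}

push_hosts_file() {
    for host in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "scp $HOSTS_FILE root@$host:/etc/hosts"
        else
            # This overwrites /etc/hosts on the target host.
            scp "$HOSTS_FILE" "root@$host:/etc/hosts" || return 1
        fi
    done
}

# Example: push_hosts_file esx1.example.local esx2.example.local
```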
Duncan Epping says
Sorry Fred, but I disagree. I have seen many issues caused by hosts files. People make mistakes when using custom hosts files, and it is definitely not recommended from VMware’s perspective. DNS is the way to go.
habibalby says
Hello Fred, I’m doing what VMware recommends, and that’s why VMware introduced the new feature to join the host to the domain: the host points to a real DNS server so that nslookup works correctly.
Hugo Strydom says
@ Habibalby/Fred
Just to make sure there is no misunderstanding here.
First, DNS. You need to configure DNS so that it works 100%: enter the correct IP, mask, gateway, DNS servers (usually two of them) and the correct DNS domain name. You also need to manually create the ESX host entries in the correct DNS namespace. Once this is done, HA will work. As such, I don’t agree at all with using hosts entries. That is what DNS is supposed to do, so make sure DNS is configured correctly. (For this I have configured DHCP to hand out IPs to ESX hosts based on the kernel MAC address and then have DHCP update DNS.)
Then there is the “Join Domain” function that is new in 4.1. If your DNS is not configured correctly, this function will not work. Joining the ESX host to a domain will not configure the DNS settings on the host for you (that is just how I am reading your sentence). There are a few things that need to be in AD for this to work (a group called “ESX Admins”).
Brandon says
@Kalyan
As of ESXi 4.1 Tech Support Mode is officially supported. You don’t even have to type “unsupported” to get a login :).
Kelly O says
You do, however, get the nasty unremovable alarm in vCenter when it is enabled.
Jonathan says
I tried all the suggestions here for a problem with health checks, with no luck. For us it turned out to be a custom health check we had enabled on our clusters that stopped working after upgrading vCenter to 4.1.
Eric says
You, sir, are a gentleman and a scholar. I spent several hours a few months ago on this issue with VMware support. They pored over a few sets of logs over the course of two weeks. No one mentioned uninstalling the HA agent (or even that we could!). The eventual “resolution” was to rebuild the two affected hosts out of the three-host cluster.
My client thanks you. Worked like a charm.
Ionut Nica says
Between running the uninstall script and restarting the services, you can also add:
“rm -f /etc/opt/vmware/aam/*”
In my case, the uninstall script could not delete the backup files so I deleted them with the line above.
And yes, this post was quite a lifesaver; I almost spent two full days troubleshooting it.
VP says
Please advise.
I have a cluster of four ESX 4.0 hosts. I upgraded one ESX 4.0 host to ESXi 4.1 and then added it back into the same cluster where the other ESX 4.0 hosts reside.
1) Is it acceptable to have one ESXi-4.1 and other ESX-4.0 in the same cluster?
2) If yes, I am getting the error on ESXi 4.1… “Cannot complete the configuration of the HA agent on the host. other HA configuration error.”
spox says
Just wanting to double-check: will running /sbin/services.sh restart drop network connectivity for the guests running on the host?
TGordon says
Thank you very much!! This was a very helpful find during tonight’s outage.