On Monday evening a colleague changed the IP addresses of three VMware ESX hosts. He followed the standard VMware procedure, which usually works like a charm. This time, however, HA stopped working after the IP addresses were changed. Disabling and re-enabling HA resulted in the following error: “Configuration of host IP address is inconsistent on host …”
After closer inspection, the following error turned up in /var/log/vmware/vpx-rupgrade.log:
VMwareerrortext=ft_gethostbyname and hostname -i return different addresses: 10.21.10.81, 10.21.5.12 and 10.21.1.21
The command “hostname -i” resulted in the following:
[root@bla-01 /var/log/vmware]# hostname -i
10.21.1.21
The command “ft_gethostbyname” returned the following:
[root@bla-01 /opt/vmware/aam/bin]# ./ft_gethostbyname
10.21.10.81 bla-01
10.21.5.12 bla-01
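The comparison behind the log message can be approximated in the service console. This is a minimal sketch, assuming the HA agent only accepts the host when the address printed by `hostname -i` appears in the `ft_gethostbyname` output; the real AAM logic is not documented here, and `ha_addr_check` is a hypothetical helper name.

```bash
# Hypothetical helper approximating the check behind the vpx-rupgrade.log
# error. Assumption: HA only accepts the host if the address printed by
# `hostname -i` is among the addresses printed by ft_gethostbyname.
#   $1     address reported by `hostname -i`
#   stdin  ft_gethostbyname output, one "address name" pair per line
ha_addr_check() {
  local want="$1" addr name
  # "name" absorbs the rest of each line so "addr" holds only the address.
  while read -r addr name; do
    if [ "$addr" = "$want" ]; then
      echo "OK: $want is listed"
      return 0
    fi
  done
  echo "MISMATCH: $want is not listed"
  return 1
}
```

On bla-01, `/opt/vmware/aam/bin/ft_gethostbyname | ha_addr_check "$(hostname -i)"` would have reported a mismatch, because 10.21.1.21 is neither 10.21.10.81 nor 10.21.5.12.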
So for some reason ESX resolved the wrong address. The hosts file wasn’t the problem; FT_HOSTS, which is generated automatically by the AAM client (High Availability), was:
[root@bla-01 /etc]# more FT_HOSTS
# Auto-generated FT_HOSTS file. Timestamp: Mon Jun 2 19:05:09 2008
10.21.10.81 bla-01
10.21.5.12 bla-01
10.21.10.82 bla-02
10.21.5.14 bla-02
10.21.10.83 bla-03
10.21.5.16 bla-03
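Since /etc/hosts was still correct, comparing it with FT_HOSTS name by name exposes every stale entry in the file, not only the local one. A minimal sketch, assuming /etc/hosts lists every cluster node (possibly with aliases) and that extra FT_HOSTS addresses such as the 10.21.5.x ones are legitimate; a name is flagged only when none of its FT_HOSTS addresses match /etc/hosts. `ft_hosts_stale` is a hypothetical name.

```bash
# Hypothetical helper: report host names whose FT_HOSTS entries do not include
# the address the hosts file gives for that name. Assumes the hosts file lists
# every cluster node and is not empty; extra addresses are tolerated.
#   $1  FT_HOSTS file (e.g. /etc/FT_HOSTS)
#   $2  hosts file    (e.g. /etc/hosts)
ft_hosts_stale() {
  awk '
    { sub(/#.*/, "") }                 # drop comments, including the timestamp
    NF < 2 { next }                    # skip blank or incomplete lines
    FNR == NR {                        # first file: the hosts file
      for (i = 2; i <= NF; i++) known[$i] = known[$i] " " $1
      next
    }
    {                                  # second file: FT_HOSTS
      ft[$2] = ft[$2] " " $1
      if (($2 in known) && index(known[$2] " ", " " $1 " ")) ok[$2] = 1
    }
    END {
      for (n in ft)
        if ((n in known) && !(n in ok))
          printf "%s: FT_HOSTS has%s, hosts file has%s\n", n, ft[n], known[n]
    }
  ' "$2" "$1"
}
```

Running `ft_hosts_stale /etc/FT_HOSTS /etc/hosts` on one node would list every host whose FT_HOSTS entries still point at an old address.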
So I moved FT_HOSTS to FT_HOSTS.BAK:
[root@bla-01 /etc]# mv FT_HOSTS FT_HOSTS.BAK
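The manual `mv` step can be wrapped so that a second attempt does not overwrite the first backup. A minimal sketch; the timestamped backup name and the `ft_hosts_backup` function are hypothetical, and the HA reconfiguration itself still has to be started from the VI Client afterwards.

```bash
# Hypothetical helper wrapping the manual `mv FT_HOSTS FT_HOSTS.BAK` step.
# A timestamp is appended so that a second attempt does not overwrite the
# first backup. Prints the backup path on success.
#   $1  directory holding FT_HOSTS (defaults to /etc)
ft_hosts_backup() {
  local dir="${1:-/etc}"
  local src="$dir/FT_HOSTS"
  local dst
  dst="$dir/FT_HOSTS.BAK.$(date +%Y%m%d%H%M%S)"
  if [ ! -f "$src" ]; then
    echo "$src not found, nothing to move" >&2
    return 1
  fi
  # Move rather than copy: in the post, FT_HOSTS was only regenerated
  # by "Reconfigure for HA" once the old file was out of the way.
  mv "$src" "$dst" && echo "$dst"
}
```

Run it as root on the affected host, then use “Reconfigure for HA” as usual.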
After reconfiguring the cluster for HA, everything works as expected again, and FT_HOSTS has been regenerated with the new addresses:
[root@bla-01 /etc]# more FT_HOSTS
# Auto-generated FT_HOSTS file. Timestamp: Wed Jun 4 10:39:52 2008
10.21.1.21 bla-01
10.21.5.12 bla-01
10.21.1.22 bla-02
10.21.5.14 bla-02
10.21.1.23 bla-03
10.21.5.16 bla-03
Deleting the cluster, removing the hosts from the cluster and/or reconfiguring HA never once updated the FT_HOSTS file. I would expect every “Reconfigure for HA” action to update, or at least check, the FT_HOSTS file.
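Until HA does such a check itself, comparing the backup with the regenerated file shows exactly which entries changed. A minimal sketch, assuming bash (for process substitution) and the file names used in this post; `ft_hosts_changes` is a hypothetical name, and comment lines are filtered out so the timestamp does not show up as a change.

```bash
# Hypothetical helper: show which FT_HOSTS entries changed between the backup
# and the regenerated file. Comment lines (including the auto-generated
# timestamp) are ignored and entries are sorted so only real changes remain.
# Requires bash for process substitution. Like diff, returns 1 when the
# files differ.
ft_hosts_changes() {
  diff <(grep -v '^#' "$1" | LC_ALL=C sort) <(grep -v '^#' "$2" | LC_ALL=C sort)
}
```

For the two files above, `ft_hosts_changes /etc/FT_HOSTS.BAK /etc/FT_HOSTS` lists the three 10.21.10.x entries as removed (`<`) and the three 10.21.1.x entries as added (`>`), while the 10.21.5.x entries stay unchanged.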
Aaron Delp says
Great article, I’m adding it to my bag of tricks. Thank you!
Aaron
Steffen Özcan says
Hi Duncan, I read your post this morning and said to myself: “nice, that could be quite useful someday”. Someday turned out to be very soon.
This afternoon one of my hosts reported an HA error. A “reconfigure HA” on that host didn’t work; the error I got was “Command ‘hostname -s’ on host wevhx001 failed or returned incorrect name format”. When I logged into the host, it told me that its hostname was now “i”. Great. The /opt/vmware/aam/bin/ft_gethostbyname command didn’t work, and all the other hosts in this cluster (4 nodes) reported the IP and hostname of the failing host when I ran the command on them. It turned out that, for whatever reason, the hostname had changed only at runtime, in /proc/sys/kernel/hostname. I still need to investigate this. In the meantime, every now and then one of the other hosts reported HA errors too. Very strange. After I corrected the hostname in /proc/sys/kernel/hostname, a cluster-wide HA reset solved the issues for the moment.
Thanks for your great posts. Without them it would definitely have taken a lot more time to solve this problem and a lot of others!
Best regards
Steffen
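A runtime hostname that no longer matches the configured one, as in Steffen’s case, can be detected before HA trips over it. A minimal sketch, assuming the ESX 3.x service console, being Red Hat based, keeps its configured name in the `HOSTNAME=` line of /etc/sysconfig/network; `hostname_drift` is a hypothetical name, and both paths are parameters so it can be tried on copies.

```bash
# Hypothetical helper: compare the runtime hostname with the configured one.
# Assumption: the ESX 3.x service console, being Red Hat based, stores the
# configured name in the HOSTNAME= line of /etc/sysconfig/network.
#   $1  runtime hostname file (defaults to /proc/sys/kernel/hostname)
#   $2  network config file   (defaults to /etc/sysconfig/network)
hostname_drift() {
  local proc="${1:-/proc/sys/kernel/hostname}"
  local conf="${2:-/etc/sysconfig/network}"
  local runtime configured
  runtime=$(cat "$proc") || return 2
  # Use the last HOSTNAME= line and strip optional quotes around the value.
  configured=$(sed -n 's/^HOSTNAME=//p' "$conf" | tail -n 1 | tr -d "\"'")
  if [ -z "$configured" ]; then
    echo "no HOSTNAME= line in $conf" >&2
    return 2
  fi
  if [ "$runtime" != "$configured" ]; then
    echo "DRIFT: runtime '$runtime', configured '$configured'"
    return 1
  fi
  echo "OK: $runtime"
}
```

When it reports drift, the fix Steffen describes is to write the configured name back as root; `hostname <name>` and writing the name to /proc/sys/kernel/hostname both set the same kernel value.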
Jan Ivar Beddari says
Out of debugging interest: how do you resolve names on your VC server, and on your hosts? Do they all point to a common DNS server, or do you use a static hosts-file setup?
I’ve been using static entries in /etc/hosts and in system32\drivers\etc\hosts on the VC server. So far I haven’t had any trouble with this approach, but I wonder how the DNS logic for HA works.
Duncan says
I would go for DNS. I’ve used hosts files a lot and still see people make plenty of mistakes with them, which can cause huge and weird problems. I’ve seen hosts files on 10 servers that were all different.
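If hosts files are used anyway, the “10 servers, 10 different files” situation is easy to catch. A minimal sketch that compares copies of hosts files collected from several servers (how they are collected is left open) and prints every name that maps to different addresses in different files; only the first address per name per file is considered, and `hosts_conflicts` is a hypothetical name.

```bash
# Hypothetical helper: compare hosts files collected from several servers and
# print every name that resolves to different addresses in different files.
# Only the first address per name per file counts, as in a typical
# single-address lookup. Names missing from a file are not flagged.
#   $@  hosts files to compare
hosts_conflicts() {
  awk '
    FNR == 1 { file[++nfiles] = FILENAME }   # remember argument order
    { sub(/#.*/, "") }                       # drop comments
    NF < 2 { next }
    {
      for (i = 2; i <= NF; i++) {
        if ((FILENAME, $i) in first) continue
        first[FILENAME, $i] = $1
        names[$i] = 1
      }
    }
    END {
      for (n in names) {
        out = ""; distinct = 0
        for (f = 1; f <= nfiles; f++) {
          if (!((file[f], n) in first)) continue
          a = first[file[f], n]
          if (!((n, a) in seen)) { seen[n, a] = 1; distinct++ }
          out = out " " file[f] "=" a
        }
        if (distinct > 1) print n ":" out
      }
    }
  ' "$@"
}
```

Each output line names the host followed by the address every file gives for it, so the odd one out is visible at a glance.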
santhosh says
That’s a great chunk of information.
Can you please tell me whether this is the same for ESX 3i? We are limited there because of the service console.
Ricky says
The solution to our design problem is to divide whatever class of IP address we are assigned into a number of smaller networks with fewer hosts per network.