Changing the IP-address of an ESX host and HA

Duncan Epping · Jun 4, 2008 ·

Monday evening a colleague changed the IP address of three VMware ESX hosts. He followed the standard VMware procedure, which usually works like a charm. In this case, HA stopped working after the IP address was changed. Disabling and re-enabling HA resulted in the following error: “Configuration of host IP address is inconsistent on host …”

After a close inspection the following error was found in /var/log/vmware/vpx-rupgrade.log:

VMwareerrortext=ft_gethostbyname and hostname -i return different addresses: 10.21.10.81, 10.21.5.12 and 10.21.1.21

The command “hostname -i” resulted in the following:

[root@bla-01 /var/log/vmware]# hostname -i
10.21.1.21

The command “ft_gethostbyname” returned the following:

[root@bla-01 /opt/vmware/aam/bin]# ./ft_gethostbyname
10.21.10.81 bla-01
10.21.5.12 bla-01

So for some reason ESX resolved the wrong address. The hosts file wasn’t the problem. The culprit was FT_HOSTS, which is automatically generated by the AAM client (High Availability):

[root@bla-01 /etc]# more FT_HOSTS
# Auto-generated FT_HOSTS file. Timestamp: Mon Jun 2 19:05:09 2008
10.21.10.81 bla-01
10.21.5.12 bla-01
10.21.10.82 bla-02
10.21.5.14 bla-02
10.21.10.83 bla-03
10.21.5.16 bla-03

So I moved the FT_HOSTS to FT_HOSTS.BAK:

[root@bla-01 /etc]# mv FT_HOSTS FT_HOSTS.BAK

I then reconfigured the cluster for HA, and everything works as expected again:

[root@bla-01 /etc]# more FT_HOSTS
# Auto-generated FT_HOSTS file. Timestamp: Wed Jun 4 10:39:52 2008
10.21.1.21 bla-01
10.21.5.12 bla-01
10.21.1.22 bla-02
10.21.5.14 bla-02
10.21.1.23 bla-03
10.21.5.16 bla-03

Deleting the cluster, removing the hosts from the cluster and/or reconfiguring HA never once updated the FT_HOSTS file. I would expect every “reconfigure for HA” action to update, or at least check, the FT_HOSTS file.
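The mismatch described above can be spotted with a small check. The following is only a sketch, assuming FT_HOSTS keeps the "address hostname" layout shown in the listings. The function names `parse_ft_hosts` and `is_stale` are made up for this example, and the file content is pasted in as a string instead of being read from the host.

```python
def parse_ft_hosts(text):
    """Return {hostname: [addresses]} from FT_HOSTS-style content."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the auto-generated comment header.
        if not line or line.startswith("#"):
            continue
        address, hostname = line.split()[:2]
        entries.setdefault(hostname, []).append(address)
    return entries


def is_stale(ft_hosts_text, hostname, current_address):
    """True when current_address (e.g. from `hostname -i`) is not listed for hostname."""
    return current_address not in parse_ft_hosts(ft_hosts_text).get(hostname, [])


# Content copied from the two listings in this post (bla-01 entries only).
OLD = """# Auto-generated FT_HOSTS file. Timestamp: Mon Jun 2 19:05:09 2008
10.21.10.81 bla-01
10.21.5.12 bla-01
"""
NEW = """# Auto-generated FT_HOSTS file. Timestamp: Wed Jun 4 10:39:52 2008
10.21.1.21 bla-01
10.21.5.12 bla-01
"""

print(is_stale(OLD, "bla-01", "10.21.1.21"))  # True: old file lacks the new address
print(is_stale(NEW, "bla-01", "10.21.1.21"))  # False: regenerated file has it
```

If the check reports a stale file, moving FT_HOSTS aside and reconfiguring HA, as done above, is what regenerated it in this case.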

Good read: how many vm’s on 1 ESX host

Duncan Epping · May 25, 2008 ·

Check out this topic on the VMTN forum by Gabrie. It’s a good read about how many vm’s one would dare to run on an ESX host.

TexiWill:
This really depends. I know companies that are doing no more than a 10:1 or 20:1 compression, but there are other companies with 50+ VMs running on one box (at the time it was a DL760 with 8 CPUs and 64GBs of memory). I do know that the max vCPUs you can put on a system is still 8 * pCores and the largest box I have seen is the DL580G4 with 4 quad cores (16 cores) and 512GBs of memory….. So maximally 128 vCPUs…..

Ken.Cline:
I make this decision based on a couple things:

* How important are the VMs in question?
  * If they’re truly “mission critical”, then I keep the number small – on the order of 10:1
  * If they’re “important”, then let’s look at 20:1
  * If they’re “who cares if they’re up”, then load ’em up!

* How large is the environment? I like to deploy a minimum of two hosts (three makes me happier)
  * 20 systems @ 2 hosts = 10:1, @ 3 hosts = 7:1
  * 100 systems @ 2 hosts = I wouldn’t do it, @ 3 hosts = 34:1
  * 1,000 systems – now you’re talking! @ 20 hosts = 50:1, @ 30 hosts = 34:1, @ 10 hosts = 100:1
  * 10,000 systems – you can bet I’m going to have a few hosts with 50 to 60 (or more) VMs and some hosts with 10 (or less) VMs!

So, there’s no single “right” answer (other than “it depends”)
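The ratios in Ken's list are simply the number of systems divided by the number of hosts, rounded up. A minimal sketch of that arithmetic follows; the function name `consolidation_ratio` is made up for this example, and it ignores any headroom you would keep free for failover.

```python
def consolidation_ratio(systems, hosts):
    """VMs per host, rounded up, when systems are spread evenly over hosts."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-systems // hosts)


# The system/host combinations quoted above.
for systems, hosts in [(20, 2), (20, 3), (100, 3), (1000, 20), (1000, 30), (1000, 10)]:
    print(f"{systems} systems @ {hosts} hosts = {consolidation_ratio(systems, hosts)}:1")
```

Running it reproduces the 10:1, 7:1, 34:1, 50:1, 34:1 and 100:1 figures from the quote.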

VC 2.5 HA constraints

Duncan Epping · May 20, 2008 ·

VMTN user “ian4563” recently posted a thread about problems with the HA constraints. The error that was pulled from the log files:

Das admission check failed. Configured failover: 2, Expected new failover: 0

And the solution according to VMTN user “eziskind”, who also is a VMware employee:

Looks like you have some 4-cpu vms in the clusters too. That will really skew things. You’re being hit by the combination of 2 new things in the HA admission control for VC 2.5:

1) If no reservation is set for a vm (or it is set to 0), use default of 256MHz, 256MB. (these values can be changed using HA advanced options: das.vmMemoryMinMB, das.vmCpuMinMHz)
2) For the cpu component of the slot, use (max MHz per virtual cpu) * (max number of vcpu’s per vm)

The HA admission control algorithm is overly conservative in non-homogenous clusters, i.e. ones with vms which have different reservations and/or vcpu number. #2 above makes it more conservative. Given these limitations, it’s best to try to keep the cluster as homogenous as possible. Is it possible to put the 4-cpu vms in a separate cluster? If not, you can try setting the default vm resources to 0 (using the advanced options in #1). This is how things worked in VC 2.0.
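To see why a single 4-cpu vm skews things, here is a rough sketch of the two rules quoted above. It is not VMware's actual algorithm. It assumes the CPU reservation is treated as a per-vCPU MHz figure and that the memory component is simply the largest memory reservation (the quote does not spell that part out, and memory overhead is ignored). The function name `slot_size` and the tuple layout are made up for this example.

```python
DEFAULT_CPU_MHZ = 256  # default from the quote (das.vmCpuMinMHz)
DEFAULT_MEM_MB = 256   # default from the quote (das.vmMemoryMinMB)


def slot_size(vms, default_cpu_mhz=DEFAULT_CPU_MHZ, default_mem_mb=DEFAULT_MEM_MB):
    """Return (cpu_mhz, mem_mb) for one slot.

    vms is a list of (cpu_reservation_mhz, mem_reservation_mb, vcpus) tuples.
    A reservation of 0 means "not set" and falls back to the default (rule 1).
    """
    max_cpu_per_vcpu = max(cpu or default_cpu_mhz for cpu, _, _ in vms)
    max_mem = max(mem or default_mem_mb for _, mem, _ in vms)
    max_vcpus = max(vcpus for _, _, vcpus in vms)
    # Rule 2: (max MHz per virtual cpu) * (max number of vcpus per vm).
    return max_cpu_per_vcpu * max_vcpus, max_mem


uniform = [(0, 0, 1), (0, 0, 1), (0, 0, 1)]  # three 1-vCPU vms, no reservations
mixed = uniform + [(0, 0, 4)]                # add one 4-vCPU vm

print(slot_size(uniform))      # (256, 256)
print(slot_size(mixed))        # (1024, 256)
print(slot_size(mixed, 0, 0))  # (0, 0): defaults set to 0, as suggested in the quote
```

Under these assumptions, one 4-cpu vm quadruples the CPU component of every slot in the cluster, which is why far fewer slots fit on each host and the expected failover capacity drops.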

Thanks go out to my colleague Remco for pointing this topic out.

Microsoft’s Virtualization ROI/TCO Calculator gets a failing grade

Duncan Epping · May 17, 2008 ·

Microsoft’s marketing department is definitely the king when it comes to twisting the facts in such a way that the average reader doesn’t notice it. It’s nice to see that VMware pointed out a couple of MS’s screw-ups. Funny thing is that I’ve been trying to find out what Hyper-V plus management tools would cost me and compare this to VMware, which was hard because you can’t easily find the right prices on their website.

Anyway, I think the document definitely is a good read which clears up a lot of unanswered questions. Check it out.

Corrupt cluster or VirtualCenter database

Duncan Epping · May 15, 2008 ·

Today I witnessed something weird. For some reason VirtualCenter was totally lost. There were 3 ESX 3.5 hosts in a cluster. One of them failed and it seemed that all the VMs failed over to the other two. This could be confirmed in VirtualCenter because all VMs were registered on either the first or the second host. I could not double-check it on the third host because it was impossible to run “vmware-cmd -l” or contact it via the VI Client.

This also meant that I did not have the opportunity to put the host in maintenance mode, because it was also disconnected. Seeing all these symptoms one would expect that the host was completely empty, so I decided to reboot the host. Well, I guess that was a big mistake, because around 15 VMs got shut down. Although according to VirtualCenter they were running on a different ESX host, the third host decided to kill them.

When I restarted the machine VirtualCenter still showed me wrong information. So I decided to kill the cluster and recreate it. When I added the ESX hosts to the new cluster everything functioned like it should. Anyway, it’s really tough troubleshooting when you can’t seem to rely on the management tools. Hope this is something VMware fixes soon, or that they create a workaround like a “forced database update”….