Yellow Bricks

What if my VirtualCenter server crashes?

Duncan Epping · Aug 5, 2008 ·

Seva, a VMware Technical Account Manager, put together a cool table with the implications of a VirtualCenter crash. This is a follow-up to my blog post about VirtualCenter getting more important by the minute. I think the most important thing to remember is that the VMs keep running whatever happens to your VC Server, and HA will still work if VC fails, except of course for adding hosts to the cluster. Reinstalling the VirtualCenter server and re-adding the hosts is still possible, but in my opinion not recommended, especially when you’ve got complex Resource Pools and Folder structures set up.

Open this link to the PDF or click on the picture below.

Health status not showing..

Duncan Epping · Aug 5, 2008 ·

I’ve seen this one on the VMTN forum a couple of times. When the Health Status isn’t showing you could do the following to fix it:

Restart VirtualCenter service on the VC Server
Restart mgmt-vmware service on the hosts that are affected (service mgmt-vmware restart)
Restart vmware-vpxa on the hosts that are affected (service vmware-vpxa restart)

If the above did not fix the issue:

Disconnect the affected Host from Inventory on VC
Reconnect the affected Host in the Inventory on VC

And if that doesn’t work this is also a possible solution:

Restart the Pegasus service (service pegasus restart)
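
To make the escalation order explicit, here is a minimal Python sketch that only prints the host-side commands quoted above. The grouping into stages and the function name plan_host_commands are my own, purely for illustration. Restarting the VirtualCenter service and the disconnect/reconnect step happen on the VC side and are not covered here.

```python
# Host-side commands quoted in the post, in the order they are suggested.
# Grouping them into "stages" is an assumption made for this sketch.
ESCALATION_STAGES = [
    ("restart management agents", [
        "service mgmt-vmware restart",
        "service vmware-vpxa restart",
    ]),
    ("restart Pegasus", [
        "service pegasus restart",
    ]),
]


def plan_host_commands(max_stage):
    """Return the service console commands up to and including max_stage (0-based)."""
    commands = []
    for _name, stage_commands in ESCALATION_STAGES[: max_stage + 1]:
        commands.extend(stage_commands)
    return commands


if __name__ == "__main__":
    for command in plan_host_commands(max_stage=1):
        print(command)  # dry run: print only, nothing is executed
```

Check whether the Health Status is back after each stage before moving on to the next one.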

HA configuration and incompatible networks

Duncan Epping · Aug 1, 2008 ·

There seems to be a lot of fuss about HA not being reconfigured when Update 2 is installed.

The error message that appears:

“HA Agent on <hostname> in cluster <clustername> in <datacenter> has an error Incompatible HA Networks: Host has network(s) that don’t exist on cluster members: <ip address>: Cluster has network(s) missing on host: <ip address>: Consider using Advanced Cluster Settings das.allowNetwork to control network usage”

Pre-Update 2 environments would accept incompatible networks between hosts in a cluster and just install/reconfigure. As of Update 2 this clearly isn’t the case any more. There are a couple of misunderstandings that I want to clear up:

If you have redundant service consoles set up, they don’t need to be on the same subnet. (Better said, they should not be on the same subnet because of a bug described in this blog!) But the subnets do need to be the same on every host. In other words, you can’t mix up subnets; this will not work:

Host A – Service Console – 192.168.1.10
Host B – Service Console – 10.0.0.10

In this case you will need to change the IP address of Host B, or add an additional Service Console named “Service Console HA” to both hosts and filter out the first. You can filter out the first by restricting HA to a specific portgroup:

das.allowNetwork0 “Service Console HA”
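
The compatibility rule described above can be checked before reconfiguring HA. The following is a rough Python sketch, not how HA itself performs the check: it compares the set of service console subnets of each host against the first host in the cluster. The function names and the /24 prefix lengths are assumptions; the example only gives the addresses.

```python
import ipaddress


def ha_networks(interfaces):
    """Map a host's service console addresses (CIDR strings) to their networks."""
    return {ipaddress.ip_interface(addr).network for addr in interfaces}


def incompatible_hosts(cluster):
    """Return hosts whose set of networks differs from the first host's set.

    `cluster` maps a host name to a list of service console addresses.
    """
    hosts = list(cluster)
    if not hosts:
        return {}
    reference = ha_networks(cluster[hosts[0]])
    report = {}
    for host in hosts[1:]:
        networks = ha_networks(cluster[host])
        if networks != reference:
            report[host] = {
                "missing_on_host": sorted(str(n) for n in reference - networks),
                "not_on_cluster": sorted(str(n) for n in networks - reference),
            }
    return report


if __name__ == "__main__":
    # The /24 prefix lengths are assumed; the post only gives the addresses.
    cluster = {
        "Host A": ["192.168.1.10/24"],
        "Host B": ["10.0.0.10/24"],
    }
    print(incompatible_hosts(cluster))
```

Two hosts that each have one service console in 192.168.1.0/24 and one in 10.0.0.0/24 produce an empty report, which matches the point that redundant service consoles may sit on different subnets as long as every host has the same ones.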

For more info read this topic, especially the reply that msevigny posted. The knowledge base article Marc points to in his post is an internal one; as soon as it’s officially released I will let you guys know.

Update: HA Advanced Options

Duncan Epping · Aug 1, 2008 ·

A while back I wrote down all the HA advanced options. With ESX 3.5 Update 2 VMware added a couple of extra advanced options; this is the complete list:

das.failuredetectiontime – Number of milliseconds before the isolation response action is triggered (default 15000 milliseconds).

das.isolationaddress[x] – IP address the ESX host pings for its isolation heartbeat, where [x] = 0-9. The default gateway is used by default.

das.usedefaultisolationaddress – Value can be true or false. Set it to false when the default gateway, which is the default isolation address, shouldn’t be used for this purpose.

das.poweroffonisolation – Value can be true or false; this sets the isolation response. By default a VM will be powered off.

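das.vmMemoryMinMB – Minimum memory per VM, in MB, used in the failover capacity calculation. Higher values will reserve more capacity for failovers.

das.vmCpuMinMHz – Minimum CPU per VM, in MHz, used in the failover capacity calculation. Higher values will reserve more capacity for failovers.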

das.defaultfailoverhost – Value is a hostname, this host will be the primary failover host.

The new ones:

das.failuredetectioninterval – Changes the heartbeat interval among HA hosts. By default, this occurs every second (1000 milliseconds).

das.allowVmotionNetworks – Allows a NIC that is used for VMotion networks to be considered for VMware HA usage. This permits a host to have only one NIC configured for management and VMotion combined.

das.allowNetwork[x] – Enables the use of port group names to control the networks used for VMware HA, where [x] = 0 – ?. You can set the value to be "Service Console 2" or "Management Network" to use (only) the networks associated with those port group names in the networking configuration.

das.isolationShutdownTimeout – Shutdown timeout for the isolation response “Shutdown VM”; default is 300 seconds. In other words, if a VM hasn’t shut down cleanly after the isolation response occurred, it is powered off after 300 seconds.
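
Since these options are typed in by hand, a typo is easy to miss. Here is a small Python sketch of a sanity check built only from the names and defaults listed above. The helper names are mine, matching the exact spelling and case is an assumption, and because the upper bound for das.allowNetwork[x] isn't stated, any non-negative index is accepted.

```python
import re

# Defaults quoted in the list above; options without a stated default are omitted.
DEFAULTS = {
    "das.failuredetectiontime": 15000,     # milliseconds
    "das.failuredetectioninterval": 1000,  # milliseconds
    "das.isolationShutdownTimeout": 300,   # seconds
}

# Option names that take no index.
FIXED_NAMES = set(DEFAULTS) | {
    "das.usedefaultisolationaddress",
    "das.poweroffonisolation",
    "das.vmMemoryMinMB",
    "das.vmCpuMinMHz",
    "das.defaultfailoverhost",
    "das.allowVmotionNetworks",
}

ISOLATION_ADDRESS = re.compile(r"das\.isolationaddress[0-9]")  # [x] = 0-9
ALLOW_NETWORK = re.compile(r"das\.allowNetwork[0-9]+")         # upper bound unknown


def unknown_options(settings):
    """Return the option names that do not appear in the list above."""
    unknown = []
    for name in settings:
        if name in FIXED_NAMES:
            continue
        if ISOLATION_ADDRESS.fullmatch(name) or ALLOW_NETWORK.fullmatch(name):
            continue
        unknown.append(name)
    return unknown


def effective_settings(settings):
    """Overlay the given settings on the documented defaults."""
    merged = dict(DEFAULTS)
    merged.update(settings)
    return merged


if __name__ == "__main__":
    settings = {
        "das.allowNetwork0": "Service Console HA",
        "das.isolationaddres0": "192.168.1.1",  # deliberate typo
    }
    print(unknown_options(settings))
```

With the defaults, das.failuredetectiontime spans 15 heartbeat intervals (15000 / 1000), which is worth keeping in mind when changing either value.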

Follow Up: HA Change (isolation response)

Duncan Epping · Jul 31, 2008 ·

I’ve been asking around why the default isolation response has been changed from “power off” to “leave powered on”. It seems this was done because a lot of customers had VMs being powered off unnecessarily. This happened because the service console or physical switches weren’t set up redundantly, which caused HA to kick in. In other words, if you have complete redundancy, in both switches and NICs, change the default back to “power off” or use the new option “Shutdown VM”.

Shutdown VM requires VMware Tools to be installed. If HA is unable to shut down the VM within 5 minutes, it will be powered off. I would prefer this option, especially when you have virtualized services like Exchange, SQL, Oracle etc.