I was just reading an excellent weekly technical digest by VMware’s Michael White and noticed the mention of a KB article on SRM. This KB article has the following describtion:
With VMware Site Recovery Manager 1.0 Update 1, recovery of a VM might take a long time. The recovery time during a test or real recovery will be longer when more VM’s are involved. The Change Network Settings task might time out during the test or real failover. This is due to the serial fashion in which Site Recovery Manager waits until a guest heartbeat is seen prior to customizing the VM.
This problem can be encountered when running the following ESX versions:
- ESX 3.5 Update 2 and Update 3
- ESX 3.0.2 and 3.0.3
In other words, the behaviour of ESX has changed and it might be useful and beneficial for SRM to change this behaviour again. We are talking about a 5 minute delay, that’s 5 minutes for each VM. You can imagine that running a recovery plan can and will take a long time when this setting isn’t changed. Here’s the solution which has also been outlined in the KB article.
Set hostd heartbeat delay to 40.
Disconnect the host from VC (Right click on host in VI Client and select “Disconnect” )
Login as root to the ESX Server with SSH.
Using a text editor such as nano or vi , edit the file /etc/vmware/hostd/config.xml
Set the “heartbeatDelayInSecs” tag under “vmsvc” to 40 seconds as shown here:
<vmsvc>
<heartbeatDelayInSecs>40</heartbeatDelayInSecs>
<enabled>true</enabled>
</vmsvc>
Restart the management agents for this change to take effect. See Restarting the Management agents on an ESX Server (1003490).
Reconnect the host in VC ( Right click on host in VI Client and select “Connect” )