Virtual Machine High Availability

Posted by Duncan Epping on January 3rd, 2008
Published in Server

I’ve been testing the experimental feature Virtual Machine High Availability (aka VM Failure Monitoring) for a couple of days now. I must say it does just what VMware claims in the PDF: it resets a VM within the configured time when the heartbeat is lost. But one thing that struck me is that there’s hardly any evidence that HA did its job; in other words, no events are logged in VirtualCenter as far as I can see. Well, there was one error indicating something was wrong: “Remote console on w2k3-001 disconnected”. I checked several log files but could not find any useful entries until I checked the file /var/log/vmware/hostd.log. I know the PDF about this feature states “In this experimental version of Virtual Machine Failure Monitoring, no explicit notification is sent to the administrator.”, but I would at least expect some sort of error.

The following lines in the log /var/log/vmware/hostd.log indicated that VMware initiated the reset of the VM:

  • Task Created : haTask-112-vim.VirtualMachine.reset-1098
  • Event 61 : w2k3-001 on ESX02.esxdemo.local in ha-datacenter is reset
  • State Transition (VM_STATE_ON -> VM_STATE_RESETTING)
  • w2k3-001 on ESX02.esxdemo.local in ha-datacenter is powered on
  • State Transition (VM_STATE_RESETTING -> VM_STATE_ON)
  • Task Completed : haTask-112-vim.VirtualMachine.reset-1098
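To pull entries like these out of the log without reading the whole file, a small filter helps. The following is a minimal Python sketch, assuming the log is plain text and that reset-related lines contain the fragments quoted above. The marker list and the names `find_reset_lines` and `scan_log` are made up for illustration and are not part of ESX.

```python
# Fragments taken from the hostd.log excerpts quoted in this post.
# Nothing is assumed about the rest of the line layout (timestamps etc.).
RESET_MARKERS = (
    "vim.VirtualMachine.reset",
    "VM_STATE_RESETTING",
    "is reset",
)


def find_reset_lines(lines):
    """Return (line_number, text) pairs for lines that mention a VM reset."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(marker in line for marker in RESET_MARKERS):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log(path="/var/log/vmware/hostd.log"):
    """Read the log at `path` and return its reset-related lines."""
    with open(path, errors="replace") as handle:
        return find_reset_lines(handle)
```

Calling `scan_log()` on the service console would return the matching lines with their line numbers. Plain substring matching is a deliberate choice here, since the exact format of the other fields in hostd.log is not something this sketch relies on.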

VM HA was configured with the following parameters:

  • das.FailureInterval = 30 (If no heartbeat is received within 30 seconds, initiate a restart)
  • das.minUptime = 120 (VM has to be up for at least 120 seconds before HA kicks in; don’t set it too short, because the VM needs this time to stabilize the heartbeat)
  • das.maxFailures = 40 (Maximum number of resets within the das.maxFailureWindow; normally I would never set this above 3, but for testing I’ve set it to 40)
  • das.maxFailureWindow = 86400 (86400 seconds is 1 day, see das.maxFailures)
  • das.vmFailoverEnabled = true (Enable VM HA)
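How these settings interact can be sketched in a few lines of Python. This is only my reading of the descriptions above, not VMware’s implementation: the class `VmHaSettings`, the function `should_reset`, and the exact boundary handling (strictly less than versus less than or equal) are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class VmHaSettings:
    enabled: bool = True             # das.vmFailoverEnabled
    failure_interval: int = 30       # das.FailureInterval, seconds
    min_uptime: int = 120            # das.minUptime, seconds
    max_failures: int = 40           # das.maxFailures
    max_failure_window: int = 86400  # das.maxFailureWindow, seconds


def should_reset(now, powered_on_at, last_heartbeat_at, past_resets, settings):
    """Decide whether a reset would be due under the assumed rules.

    All times are seconds on the same clock; `past_resets` is a list of
    timestamps of earlier resets of this VM.
    """
    if not settings.enabled:
        return False
    # The VM must have been up long enough for the heartbeat to stabilize.
    if now - powered_on_at < settings.min_uptime:
        return False
    # The heartbeat must have been silent for the whole failure interval.
    if now - last_heartbeat_at < settings.failure_interval:
        return False
    # Stop resetting once the reset budget for the window is used up.
    recent = [t for t in past_resets if now - t < settings.max_failure_window]
    return len(recent) < settings.max_failures
```

With the values listed above, a VM that has been up for 200 seconds and silent for 40 seconds would qualify for a reset, while the same silence 100 seconds after power-on would not.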

By the way I used the following Microsoft “hidden feature” to force a BSOD:
To enable this feature, add the following value to the registry key HKLM\System\CurrentControlSet\Services\i8042prt\Parameters

  • Name: CrashOnCtrlScroll
    Data Type: REG_DWORD
    Value: 1

Exit Registry Editor, and then restart the computer. When you hold down the right Ctrl key and press Scroll Lock twice, Windows will generate a BSOD, and if you have set up VM HA correctly the VM will be reset within the das.FailureInterval time.
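If you need the same value on several test VMs, a .reg file saves some clicking. Here is a hedged Python sketch that only builds the text of such a file; the function name `build_reg_file` is made up for illustration, and the hive is spelled out as HKEY_LOCAL_MACHINE because that is the form .reg files use.

```python
# Registry key from the steps above, with the hive name written in full.
KEY_PATH = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\i8042prt\Parameters"


def build_reg_file(value_name="CrashOnCtrlScroll", data=1):
    """Return .reg file text that sets a REG_DWORD value under KEY_PATH."""
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{KEY_PATH}]",
        # DWORD values are written as eight hexadecimal digits.
        f'"{value_name}"=dword:{data:08x}',
        "",
    ]
    # Join with CRLF, the usual Windows line ending.
    return "\r\n".join(lines)
```

Writing the returned text to a file and importing it on the guest should have the same effect as adding the value by hand; the restart is still required afterwards.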


2 user comments or pingbacks in this post

1. Rich said,

Good to see someone experimenting with the experimental!

2. Rubens said,

It is exactly what I looking for! Cheers
