Today I noticed that a lot of people end up on my blog by searching for an error related to HA heartbeat datastores. Heartbeat datastores were introduced in vSphere 5.0 (vCenter 5.0 actually, as that is where the HA agent comes from!) and I described what they are and where they come into play in my HA deepdive section. I just wanted to make the error message that pops up when the minimum heartbeat datastore requirement is not met easier to google… This is the error that is shown when you only have 1 shared datastore available to your hosts in an HA cluster:
The number of vSphere HA heartbeat datastores for this host is 1 which is less than required 2
Or the other common error, when there are no shared datastores at all:
The number of vSphere HA heartbeat datastores for this host is 0 which is less than required 2
You can either add a datastore or you can simply add an advanced option in your vSphere HA cluster settings. This advanced option is the following:
das.ignoreInsufficientHbDatastore = true
This advanced option suppresses the host config alarm that is raised when the number of heartbeat datastores is less than the configured das.heartbeatDsPerHost. By default it is set to “false”; in this example it is set to “true”.
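To make the effect of the option concrete, here is a minimal Python sketch of the check. It is only an illustration, not VMware's code: the function name `heartbeat_ds_warning` and its parameters are hypothetical, and the default of 2 required datastores is taken from the error messages above.

```python
from typing import Optional


def heartbeat_ds_warning(
    available: int,
    required: int = 2,                  # stands in for das.heartbeatDsPerHost
    ignore_insufficient: bool = False,  # stands in for das.ignoreInsufficientHbDatastore
) -> Optional[str]:
    """Return the warning text for a host, or None if no warning applies.

    Hypothetical model of the alarm logic, for illustration only.
    """
    # The advanced option suppresses the warning regardless of the count.
    if ignore_insufficient:
        return None
    # Enough heartbeat datastores: nothing to report.
    if available >= required:
        return None
    return (
        "The number of vSphere HA heartbeat datastores for this host is "
        f"{available} which is less than required {required}"
    )


if __name__ == "__main__":
    print(heartbeat_ds_warning(1))
    print(heartbeat_ds_warning(1, ignore_insufficient=True))
```

Note that in this sketch the option only hides the warning; it does not give the host any additional heartbeat datastores.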
“Heartbeat datastores were introduced in vSphere 5”, are you sure Duncan? My 4.1 clusters use HB datastores to great effect.
I haven’t had a chance to test the advanced option you mention – maybe what you reference is only relevant to v5 of vSphere?
But let’s not get caught up in assuming everyone is using v5 of vSphere, we are not… is the setting and behaviour the same in v4.x?
Yes, I am 100% certain it was introduced with vSphere 5; to be more precise… vCenter Server 5.0, as that is where HA comes from.
The setting and behavior do not exist in 4.1.
My ESX 4.1 cluster uses data stores. Though I do use vCenter 5.0 with it. I refuse to use vSphere 5 due to the licensing. I chose vCenter 5 just for the hell of it (after checking that it was on the supported list of course)
One thing I wasn’t expecting with regards to heartbeats is that the service console is what is used for network heartbeats. My servers have 4x10GbE ports (2x for VMs, and 2x for storage), 2x4Gb FC for storage and 2x1GbE for service consoles.
My 10GbE network cards have been really flaky (in the process of getting them replaced with a newer manufacturing run); in one case both of the 10GbE cards in the same server failed at the same time, stranding all of the VMs. From a vSphere perspective the health was good since the data stores were all good and the service console (1GbE) was good. I couldn’t vMotion anything, so I had to turn off the VMs and manually move them.
I looked in the DRS deepdive book (thanks for writing that btw) and saw the ability to add a 2nd service console interface, which I plan to do soon. Certainly I was not expecting both of the 10GbE NICs to fail (all 4 ports) at the same time.
Fortunately I built the system so that I could at least still manage the systems if the 10GbE network crapped out. I’m not really sure what I would do if I had to power off VMs from an iLO remote console interface; that would be annoying.
My next major upgrade off of ESX (not ESXi! I like ESX) will be to RHEV I believe.
(vmware customer since ’99 pre 1.0 baby)
Duncan Epping says
Hold on Nate. There are two things as the 5.0 book explains:
1) Network Heartbeats (been there since 3.0)
2) Datastore Heartbeats (introduced in 5.0)
Network heartbeats are what HA uses to detect if a host is isolated from the network. Datastore heartbeats are what the HA master node uses to differentiate between a failure of a host and an isolation of a host in its cluster. vCenter 5.0 installs the new HA agent…
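The distinction between the two heartbeat types can be sketched in a few lines of Python. This is a simplified illustration of the reasoning described above, not the actual HA agent logic; the function name `classify_host` and the state labels are hypothetical.

```python
def classify_host(network_heartbeat: bool, datastore_heartbeat: bool) -> str:
    """Classify a host from the master's point of view.

    Simplified, hypothetical model: the real HA agent performs
    additional checks that are not shown here.
    """
    # Network heartbeats still arrive: the host is reachable and healthy.
    if network_heartbeat:
        return "alive"
    # No network heartbeat, but the host still updates its heartbeat
    # datastore: it is running but cut off from the network.
    if datastore_heartbeat:
        return "isolated"
    # Neither heartbeat is seen: treat the host as failed.
    return "failed"


if __name__ == "__main__":
    for net, ds in [(True, True), (False, True), (False, False)]:
        print(net, ds, classify_host(net, ds))
```

Without any heartbeat datastore the second check is simply not available, so in this model a host that stops sending network heartbeats can no longer be told apart from a failed one.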
ahh ok didn’t know vcenter 5 used a new agent. I wasn’t referring to the 5.0 book (I have that too but since I opted for ESX 4.1 on my new clusters I haven’t dug into the 5 book much at all). thanks for the clarification.
Is this setting related to the number of datastores or hosts? If you have only one host and 2, 3, 4, 5 or 6 datastores, then why would a heartbeat matter? The error still pops up, though.
Here is my environment…
4 node cluster (ESXi4.1 U2)
2 iSCSI datastores
Recently we created a new ESXi5 cluster with 2 nodes, followed by upgrading our Virtual Center to 5. Our 4.1 U2 environment threw a weird HA error today stating that HA was changed to Uninitialized. It seems that the host lost its heartbeat to a datastore, then picked another to use for its heartbeat, but HA never initialized.
After reviewing the SAN logs, no disconnects were logged, and the same goes for VMware.
I am pretty certain no storage was disconnected, nor did any VMs fail over to other hosts.
You may want to update the article to include the necessary step of turning HA off and then on again on the cluster to get the warnings to disappear. Otherwise this was very helpful.
Will vSphere HA still work without the datastore heartbeat?
Yes it will work… just without the second layer of validation in the case of a network isolation.
Matthew Lee says
You need to disable and enable HA on the cluster for the warning to be cleared.