Because I’ve been looking into HA myself, I wanted to clear things up, for you guys and for myself… writing is a good way of getting the facts straight. I see and get a lot of questions regarding HA, so I bundled a bunch of the questions I received over the last couple of months…
How does a primary and / or secondary get selected?
- The first 5 hosts that join the VMware HA cluster are automatically selected as “primary nodes”
- All the others are automatically selected as “secondary nodes”
- When you do a reconfigure for HA, the primary and secondary nodes are selected again; this selection is random
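To make the selection rule concrete, here is a small Python sketch of it. It is a toy model of the behaviour described above, not VMware's actual code; the host names and the function names `assign_roles` and `reconfigure` are made up for illustration.

```python
import random

MAX_PRIMARIES = 5  # per the list above: the first 5 hosts to join become primaries

def assign_roles(join_order):
    """Map each host to 'primary' or 'secondary' based on the order it joined."""
    return {
        host: "primary" if index < MAX_PRIMARIES else "secondary"
        for index, host in enumerate(join_order)
    }

def reconfigure(hosts, rng=random):
    """Model a 'reconfigure for HA': roles are handed out again in random order."""
    shuffled = list(hosts)
    rng.shuffle(shuffled)
    return assign_roles(shuffled)

hosts = [f"esx{n:02d}" for n in range(1, 9)]  # hypothetical host names
roles = assign_roles(hosts)
print(sorted(h for h, r in roles.items() if r == "primary"))
print(sorted(h for h, r in roles.items() if r == "secondary"))
```

Note that in this model a cluster with 5 hosts or fewer ends up with primaries only, since every host falls within the first five.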
What’s up with these primaries and secondaries?
- Primary nodes hold cluster settings and all node states which are synced between primaries
- Secondary nodes send their state info (resource occupation) to the primary nodes
- Nodes send heartbeats to each other: primary nodes send heartbeats to primary nodes only, and secondary nodes also send them to primary nodes only. They do this every second, which is a changeable value: das.failuredetectioninterval
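The heartbeat rule above boils down to "everyone heartbeats the primaries, nobody heartbeats a secondary". A minimal Python sketch of that rule, assuming a simple host-to-role mapping (the `heartbeat_targets` helper and the host names are hypothetical, not part of HA itself):

```python
HEARTBEAT_INTERVAL_SECONDS = 1  # default per the list above; das.failuredetectioninterval changes it

def heartbeat_targets(sender, roles):
    """Return the hosts that `sender` heartbeats: every primary except itself.

    The rule is the same for primaries and secondaries, so the sender's own
    role does not matter here.
    """
    return sorted(
        host for host, role in roles.items()
        if role == "primary" and host != sender
    )

roles = {"esx01": "primary", "esx02": "primary", "esx03": "secondary"}
for host in roles:
    print(host, "->", heartbeat_targets(host, roles))
```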
So what if a primary node fails, will a secondary be promoted?
- No, there will only be a new primary appointed when the failed one is removed from the cluster. A secondary will be promoted to primary at random.
But what if all my primary nodes fail?
- This is an unaddressed issue, and it’s the reason why you can only account for 4 host failures within a cluster: with 5 primaries, there needs to be at least one primary left!
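The two answers above (no promotion on failure, promotion only on removal, and the limit of 4 host failures) can be played through in a few lines of Python. Again this is only a sketch of the behaviour as described in this post; `live_primaries`, `remove_from_cluster` and the 8-host cluster are assumptions for illustration.

```python
import random

def live_primaries(roles, failed):
    """Primaries that are still up (a failed host keeps its role, nobody is promoted)."""
    return sorted(h for h, r in roles.items() if r == "primary" and h not in failed)

def remove_from_cluster(roles, host, rng=random):
    """Remove a host; if it was a primary, promote one random secondary."""
    remaining = {h: r for h, r in roles.items() if h != host}
    secondaries = sorted(h for h, r in remaining.items() if r == "secondary")
    if roles[host] == "primary" and secondaries:
        remaining[rng.choice(secondaries)] = "primary"
    return remaining

# Hypothetical 8-host cluster: 5 primaries, 3 secondaries.
roles = {f"esx{n:02d}": "primary" if n <= 5 else "secondary" for n in range(1, 9)}

# Worst case: the failures all hit primaries.
four_down = {"esx01", "esx02", "esx03", "esx04"}
print(live_primaries(roles, four_down))                # one primary left
print(live_primaries(roles, four_down | {"esx05"}))    # no primary left

# Removing a primary from the cluster is what triggers a promotion.
after_removal = remove_from_cluster(roles, "esx01")
print(sum(r == "primary" for r in after_removal.values()))
```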
So when does the gateway come into play?
- Actually the gateway, which is the default “isolation address”, will only be used when an isolation has occurred. So when the AAM client thinks it’s isolated it will check the isolation addresses.
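As a rough Python sketch of that order of events: the isolation addresses are only consulted once heartbeats have stopped. The rule that a host counts as isolated only when none of the addresses respond is my reading of the behaviour, not something spelled out above, and `is_isolated`, `can_reach` and the gateway address are hypothetical.

```python
def is_isolated(heartbeats_received, isolation_addresses, can_reach):
    """Decide whether a host should consider itself isolated.

    heartbeats_received: True while the host still hears from other nodes.
    isolation_addresses: addresses to check, by default just the gateway.
    can_reach: callable(address) -> bool, e.g. a ping wrapper (assumed here).
    """
    if heartbeats_received:
        return False  # still hearing from the cluster: the gateway is never checked
    # Heartbeats stopped: isolated only if no isolation address answers (assumption).
    return not any(can_reach(address) for address in isolation_addresses)

gateway = "192.0.2.1"  # hypothetical default gateway
print(is_isolated(True, [gateway], lambda addr: False))
print(is_isolated(False, [gateway], lambda addr: True))
print(is_isolated(False, [gateway], lambda addr: False))
```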
So if anyone has a question just drop it here and I’ll try to answer it and update the above list…
John says
HA question –
What has a bigger performance impact on VC:
Number of cluster nodes or the number of machines? I can pretty much bury my VC at will when I put a node into maintenance mode.
When I turn off HA the impact seems to be less (still a huge drag) but less. Expected behavior?
I ended up turning HA off.
Also, I try to keep my clusters at 5-6 nodes. I have two 4 node clusters now (due to IP range restrictions). Is there any way to make the cluster aware of the different IP networks (beyond creating different VLAN names)?
Chris says
Two Questions:
1) Is it better to try and keep clusters to only a 4-5 node system with the nodes spread across two sites for DR purposes or keep them spread out and just have one large cluster?
2) How do you determine which nodes are primary and which are secondary?
Duncan Epping says
@ john:
1st question: I would suppose the number of nodes in a cluster, because all the resource state data needs to go around.
2nd question: this is odd behavior that I haven’t seen personally. You should definitely check your log files for any weird messages regarding this. I would recommend posting a question on VMTN and phoning VMware support. I heard of a case like this once and it had to do with zoning on the SAN.
3rd question: it’s currently not possible to have different IP networks for the SC in the same cluster
Duncan Epping says
@chris:
1) that totally depends on the infrastructural design, stuff like the network link. I’ve set up both variations and actually don’t have one that I prefer more… I always have the feeling that it’s safer to have 2 clusters, one at each site.
2) try:
“more /opt/LGTOaam512/log/aam_config_util_listnodes.log”
or
“more /var/log/vmware/aam/aam_config_util_listnodes.log”
Alastair says
I had understood that the number of primary nodes was the configured host failure number plus one, up to a maximum of five.
i.e. in a ten node cluster with configured failure level of 1 there would be only two primaries.
Does this differ from your understanding?
Bouke Groenescheij says
As always, excellent and spot-on info! Great job Duncan.
Duncan Epping says
@ Alastair, every node up to 5 becomes primary!
@ Bouke, that’s a great compliment from the Guru that got me enthusiastic about VMware!
John says
Duncan, after upgrading to 2.5 U2 and dropping all the old performance data that wasn’t rolling up, life seems to be much better. I was about ready to open an SR around it, but how often do you really put hosts in maintenance mode?
Thanks for the feedback!
Duncan Epping says
well around once every two months or so…
LucD says
Do you happen to know where (in which log) you can see if a re-election has taken place?
My idea was to use the ReconfigureComputeResource method to force a re-election after one or more host failures.
This could be done like this (PowerShell-VITK):
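# Get a view of the cluster; Get-Cluster is called without a name here, so this assumes it returns a single cluster
$clus = Get-View -Id (Get-Cluster ).Id
# Build a spec that simply copies the current HA (DAS), DPM and DRS settings
$clsConfigInfoEx = New-Object VMware.Vim.ClusterConfigSpecEx
$clsConfigInfoEx.DasConfig = $clus.ConfigurationEx.DasConfig
$clsConfigInfoEx.DpmConfig = $clus.ConfigurationEx.DpmConfig
$clsConfigInfoEx.DrsConfig = $clus.ConfigurationEx.DrsConfig
# Apply the unchanged spec; the second argument ($true) is the "modify" flag
$clus.ReconfigureComputeResource($clsConfigInfoEx, $true)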
Santosh Kumar says
Thanks for the useful information !!
I have one question. HA can hold a maximum of 5 primary nodes, even in a cluster that has more than 10-12 ESX hosts added to it.
But in a cluster with at most 3 or 4 ESX hosts, the first one added to the cluster becomes a primary; will the second one added automatically be considered a primary or a secondary? In this scenario, how many primaries and secondaries will there be in the cluster?
Can anyone clarify this? I would be so grateful.
Senthil says
This post cleared up the primary/secondary concept of HA for me.