Yellow Bricks

by Duncan Epping

2-node

Nested Fault Domains on a 2-Node vSAN Stretched Cluster, is it supported?

Duncan Epping · Jun 20, 2022

I spotted a question this week on VMTN, and it was fairly basic: are nested fault domains supported on a 2-node vSAN stretched cluster? It sounds basic, but unfortunately it is not documented anywhere, probably because stretched 2-node configurations are not very common. For those who don’t know: with a nested fault domain on a two-node cluster you provide an additional layer of resiliency by also replicating an object within each host. In the VM Storage Policy you configure this by selecting “Host Mirroring – 2 Node Cluster” for site disaster tolerance and then picking a secondary level of failures to tolerate.

This does mean that you need a minimum of three fault domains within the host as well, which in turn means a minimum of three disk groups in each of the two hosts. Or better said, when you configure Host Mirroring and then select the secondary failures to tolerate option, the following list shows the minimum number of disk groups per host you need:

  • Host Mirroring – 2 Node Cluster
    • No Data Redundancy – 1 disk group
    • 1 Failure – RAID1 – 3 disk groups
    • 1 Failure – RAID5 – 4 disk groups
    • 2 Failures – RAID1 – 5 disk groups
    • 2 Failures – RAID6 – 6 disk groups
    • 3 Failures – RAID1 – 7 disk groups
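
To make the arithmetic behind that list explicit: a nested RAID-1 mirror that tolerates n failures needs 2n+1 fault domains (disk groups) inside the host, while nested RAID-5 and RAID-6 always need 4 and 6 respectively. Below is a minimal Python sketch of that rule; the function name and structure are purely illustrative, not anything VMware ships.

def min_disk_groups(failures_to_tolerate: int, raid: str) -> int:
    """Minimum disk groups per host for a nested fault domain policy."""
    if failures_to_tolerate == 0:
        return 1                                # No data redundancy
    if raid.upper() in ("RAID1", "RAID-1"):
        return 2 * failures_to_tolerate + 1     # mirrors plus witnesses need 2n+1 fault domains
    if raid.upper() in ("RAID5", "RAID-5"):
        return 4                                # 3 data + 1 parity, tolerates 1 failure
    if raid.upper() in ("RAID6", "RAID-6"):
        return 6                                # 4 data + 2 parity, tolerates 2 failures
    raise ValueError("unknown RAID level")

for ftt, raid in [(0, "RAID1"), (1, "RAID1"), (1, "RAID5"),
                  (2, "RAID1"), (2, "RAID6"), (3, "RAID1")]:
    print(f"{ftt} failure(s), {raid}: {min_disk_groups(ftt, raid)} disk group(s)")

Running it prints exactly the numbers in the list above (1, 3, 4, 5, 6 and 7 disk groups).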

If you look at the list, you can imagine that additional resiliency definitely comes at a cost. But anyway, back to the question: is this supported when your 2-node configuration happens to be stretched across locations? The answer is yes, VMware supports this.

Is a crossover cable needed for vSAN 2-Node Direct Connect?

Duncan Epping · Jan 8, 2021

I had a question last week about vSAN 2-node direct connect: is a crossover cable still required, or can a regular CAT6 cable (CAT 5E works as well) be used? I knew the answer and figured it would be documented somewhere, but it doesn’t appear to be. To be honest, many websites talking about the need for crossover cables are blatantly wrong. And yes, I also spotted some incorrect recommendations in VMware’s own documentation, so I requested those entries to be updated. Just to be clear: with vSAN 2-Node Direct Connect, or vMotion, or any other service for that matter, you can use a regular CAT6 cable combined with 10GbE (or faster) NICs, which gives you great performance without the cost of a 10GbE (or faster) switch. I can’t recall having seen a NIC in the past 10 years that does not implement Auto MDI/MDI-X, even though it was an optional feature in the 1000BASE-T standard. In other words, there is no need to buy a crossover cable, or make one; just use a regular cable.

2 node direct connect vSAN and error “vSphere HA agent on this host could not reach isolation address”

Duncan Epping · Oct 7, 2019

I’ve had this question over a dozen times now, so I figured I would add a quick pointer to my blog. What causes the error “vSphere HA agent on this host could not reach isolation address” to pop up on a 2-node direct connect vSAN cluster? The answer is simple: when vSAN is enabled, HA uses the vSAN network for communication. With a 2-node Direct Connect configuration the vSAN network is not connected to a switch, and there are no reachable IP addresses other than those of the vSAN VMkernel interfaces.

When HA tests whether the isolation address (by default the gateway of the management interface) is reachable, the ping will fail as a result. You can solve this simply by disabling the isolation response, as described in the next post below.

Isolation Address in a 2-node direct connect vSAN environment?

Duncan Epping · Nov 22, 2017

As most of you know by now, when vSAN is enabled vSphere HA uses the vSAN network for heartbeating. I recently wrote an article about the isolation address and its relationship with heartbeat datastores. In the comment section, Johann asked what the settings should be for 2-Node Direct Connect with vSAN. A very valid question, as an isolation is still possible, although not as likely as with a stretched cluster, considering you do not have a network switch for vSAN in this configuration. Anyway, you would still like the VMs that are impacted by the isolation to be powered off, and you would like the other, remaining host to power them back on.

So the question remains: which IP address do you select? Well, there is no IP address to select in this particular case. As it is “direct connect” there are probably only two IP addresses on that segment (one for host 1 and another for host 2). You cannot use the default gateway either, as that is the gateway for the management interface, which is the wrong network. So this is what I recommend:

  • Disable the Isolation Response >> set it to “Leave powered on” or “Disabled” (depends on the version used)
  • Disable the use of the default gateway by setting the following HA advanced setting:
    • das.usedefaultisolationaddress = false
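
If you prefer to script these two changes rather than click through the UI, the sketch below shows one way to do it with pyVmomi. This is a minimal sketch, not an official procedure: the vCenter address, credentials and the cluster name ("2node-cluster") are placeholders, and you could achieve the same with PowerCLI or the client.

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Connect to vCenter (placeholder address/credentials, certificate checks disabled for a lab).
si = SmartConnect(host="vcenter.local", user="administrator@vsphere.local",
                  pwd="VMware1!", sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()

# Locate the 2-node cluster by name (placeholder name).
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == "2node-cluster")

spec = vim.cluster.ConfigSpecEx()
spec.dasConfig = vim.cluster.DasConfigInfo()

# 1) Default isolation response "Leave powered on" ("none" in the API).
spec.dasConfig.defaultVmSettings = vim.cluster.DasVmSettings(isolationResponse="none")

# 2) HA advanced option: do not use the default gateway as the isolation address.
spec.dasConfig.option = [vim.OptionValue(key="das.usedefaultisolationaddress", value="false")]

# Apply as a partial (modify=True) reconfiguration; wait for the task in your own code.
cluster.ReconfigureComputeResource_Task(spec, modify=True)
Disconnect(si)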

That probably makes you wonder what will happen when a host is isolated from the rest of the cluster (the other host and the witness). Well, when this happens the VMs are still killed, but not as a result of the isolation response kicking in; rather, it is vSAN kicking in. Here’s the process:

  • Heartbeats are not received
  • Host elects itself primary
  • Host pings the isolation address
    • If the host can’t ping the gateway of the management interface then the host declares itself isolated
    • If the host can ping the gateway of the management interface then the host doesn’t declare itself isolated
  • Either way, the isolation response is not triggered as it is set to “Leave powered on”
  • vSAN will now automatically kill all VMs which have lost access to their components
    • The isolated host will lose quorum
    • vSAN objects will become isolated
    • The advanced setting “VSAN.AutoTerminateGhostVm=1” allows vSAN to kill the “ghosted” VMs (with all components inaccessible).

In other words, don’t worry about the isolation address in a 2-node configuration; vSAN has this situation covered! Note that “VSAN.AutoTerminateGhostVm=1” only works for 2-node and stretched vSAN configurations at this time.
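
If you want to double check that this advanced setting is present on your hosts, the short pyVmomi sketch below queries it for every host in the cluster. Again, the connection details and the cluster name are placeholders, and it assumes the option is exposed as a regular host advanced setting on your build.

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.local", user="administrator@vsphere.local",
                  pwd="VMware1!", sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == "2node-cluster")

# Query the host advanced option on each host in the cluster; the expected value is 1.
for host in cluster.host:
    for opt in host.configManager.advancedOption.QueryOptions("VSAN.AutoTerminateGhostVm"):
        print(f"{host.name}: {opt.key} = {opt.value}")

Disconnect(si)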

UPDATE:

I triggered a failure in my lab (which is 2-node, but not direct connect), and for those who are wondering, this is what you should be seeing in your syslog.log:

syslog.log:2017-11-29T13:45:28Z killInaccessibleVms.py [INFO]: Following VMs are powered on and HA protected in this host.
syslog.log:2017-11-29T13:45:28Z killInaccessibleVms.py [INFO]: * ['vm-01', 'vm-03', 'vm-04']
syslog.log:2017-11-29T13:45:32Z killInaccessibleVms.py [INFO]: List inaccessible VMs at round 1
syslog.log:2017-11-29T13:45:32Z killInaccessibleVms.py [INFO]: * ['vim.VirtualMachine:1', 'vim.VirtualMachine:2', 'vim.VirtualMachine:3']
syslog.log:2017-11-29T13:46:06Z killInaccessibleVms.py [INFO]: List inaccessible VMs at round 2
syslog.log:2017-11-29T13:46:06Z killInaccessibleVms.py [INFO]: * ['vim.VirtualMachine:1', 'vim.VirtualMachine:2', 'vim.VirtualMachine:3']
syslog.log:2017-11-29T13:46:06Z killInaccessibleVms.py [INFO]: Following VMs are found to have all objects inaccessible, and will be terminated.
syslog.log:2017-11-29T13:46:06Z killInaccessibleVms.py [INFO]: * ['vim.VirtualMachine:1', 'vim.VirtualMachine:2', 'vim.VirtualMachine:3']
syslog.log:2017-11-29T13:46:06Z killInaccessibleVms.py [INFO]: Start terminating VMs.
syslog.log:2017-11-29T13:46:06Z killInaccessibleVms.py [INFO]: Successfully terminated inaccessible VM: vm-01
syslog.log:2017-11-29T13:46:06Z killInaccessibleVms.py [INFO]: Successfully terminated inaccessible VM: vm-03
syslog.log:2017-11-29T13:46:06Z killInaccessibleVms.py [INFO]: Successfully terminated inaccessible VM: vm-04
syslog.log:2017-11-29T13:46:06Z killInaccessibleVms.py [INFO]: Finished killing the ghost vms
