
Yellow Bricks

by Duncan Epping


stretched

What does Datastore Sharing/HCI Mesh/vSAN Max support when stretched?

Duncan Epping · Oct 31, 2023 ·

This question has come up a few times now: what does Datastore Sharing/HCI Mesh/vSAN Max support when stretched? It keeps coming up, and I personally had some challenges finding the statements in our documentation as well. Now that I have found the statement, I want to point people to it and clarify it so there is no question. If I am using Datastore Sharing / HCI Mesh, or will be using vSAN Max, and my vSAN datastore is stretched, what does VMware support, and what does it not support?

We have multiple potential combinations. Let me list them and add whether each is supported or not. Note that this reflects the currently available version at the time of writing (vSAN 8.0 U2).

  • vSAN Stretched Cluster datastore shared with vSAN Stretched Cluster –> Supported
  • vSAN Stretched Cluster datastore shared with vSAN Cluster (not stretched) –> Supported
  • vSAN Stretched Cluster datastore shared with Compute Only Cluster (not stretched) –> Supported
  • vSAN Stretched Cluster datastore shared with Compute Only Cluster (stretched, symmetric) –> Supported
  • vSAN Stretched Cluster datastore shared with Compute Only Cluster (stretched, asymmetric) –> Not Supported

So what is the difference between symmetric and asymmetric? The image below, which comes from the vSAN stretched cluster documentation, explains it best. Asymmetric is the most likely configuration in this case, so if you are running a stretched vSAN cluster with a stretched compute-only cluster, it most likely is not supported.

This also applies to vSAN Max, by the way. I hope that helps. Oh, and before anyone asks: if the “server side” is not stretched, it can be connected to a stretched environment and that is supported.
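If you want to check the above programmatically, here is a minimal Python sketch that simply encodes the same support statements. The function name and the string labels are my own illustration, not an official API; the rules themselves come straight from the list above (vSAN 8.0 U2).

```python
# Support matrix for sharing a stretched vSAN / vSAN Max datastore (vSAN 8.0 U2).
# Illustrative only: the function name and labels are made up for this sketch.

def remote_mount_supported(client_cluster_type: str) -> bool:
    """Return True if a stretched vSAN datastore may be shared with this client cluster type."""
    supported = {
        "vsan-stretched",                      # vSAN Stretched Cluster
        "vsan-standard",                       # vSAN Cluster (not stretched)
        "compute-only-standard",               # Compute Only Cluster (not stretched)
        "compute-only-stretched-symmetric",    # Compute Only Cluster (stretched, symmetric)
    }
    not_supported = {
        "compute-only-stretched-asymmetric",   # Compute Only Cluster (stretched, asymmetric)
    }
    if client_cluster_type in supported:
        return True
    if client_cluster_type in not_supported:
        return False
    raise ValueError(f"Unknown client cluster type: {client_cluster_type}")

print(remote_mount_supported("compute-only-stretched-asymmetric"))  # False
```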


Do I need 2 isolation addresses with a (vSAN) stretched cluster for vSphere HA?

Duncan Epping · Sep 27, 2023 ·

This question has come up multiple times now, so I figured I would write a quick post about it: do you need two isolation addresses with a (vSAN) stretched cluster for vSphere HA? The question comes up because the documentation contains best practices around the configuration of HA isolation addresses for stretched clusters. The documentation (both for vSAN and for traditional stretched storage) states that you need two reliable addresses, one in each location.

Now, I have had the above question multiple times because some folks have mentioned that they can use a gateway address with Cisco ACI which would still be accessible in both locations even if there’s a partition due to, for instance, an ISL failure. If that is the case, and the IP address is indeed available in both locations during those types of failure scenarios, then a single IP address would suffice as your isolation address.

You will, however, need to make sure that the IP address is reachable over the vSAN network when using vSAN as your stretched storage platform. (When vSAN is enabled, vSphere HA uses the vSAN network for its communications.) If it is reachable, you can simply define the isolation address by setting the advanced setting “das.isolationaddress0”. It is also recommended to disable the use of the default gateway of the management network by setting “das.usedefaultisolationaddress” to false for vSAN-based environments.
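For those who prefer to script this, here is a minimal pyVmomi sketch that sets both advanced options on a cluster. The vCenter hostname, credentials, cluster name, and IP address are placeholders; the option keys are the ones mentioned above.

```python
# Minimal pyVmomi sketch: set the vSphere HA isolation address advanced options.
# Hostname, credentials, cluster name, and the isolation IP below are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="password", sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()

# Find the cluster object by name (placeholder name).
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == "Stretched-Cluster")
view.Destroy()

# Build a partial cluster spec that only touches the HA (das) advanced options.
spec = vim.cluster.ConfigSpecEx()
spec.dasConfig = vim.cluster.DasConfigInfo()
spec.dasConfig.option = [
    vim.option.OptionValue(key="das.isolationaddress0", value="192.168.1.1"),
    vim.option.OptionValue(key="das.usedefaultisolationaddress", value="false"),
]

# modify=True applies the spec incrementally instead of replacing the full config.
cluster.ReconfigureComputeResource_Task(spec, True)

Disconnect(si)
```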

I have requested the vSAN stretched clustering documentation to be updated to reflect this.

vSAN Stretched Cluster failure matrix

Duncan Epping · May 30, 2023 ·

The last couple of weeks I was involved internally in a discussion around the different vSAN stretched cluster failure scenarios. I wrote a lengthy email about how vSAN and HA would respond in certain scenarios. I have documented many of these over the years on my blog already, but never really published them as a whole.

In some of the scenarios below I discuss a “partition”. A partition is a scenario where, for one of the locations, both the L3 connection to the witness and the inter-site / inter-switch link to the other site are down. So in the diagram above, for instance, if I say that Site B is partitioned, it means that Site A can still communicate with the witness, but Site B cannot communicate with the witness and cannot communicate with Site A either.

For all of the scenarios below the following applies: Site A is the preferred location and Site B is the secondary location. In the table, the first two columns refer to the policy settings for the VM as shown in the screenshot below. The third column refers to the location where the VM runs from a compute perspective, the fourth lists the type of failure, and the fifth and sixth columns describe the vSAN and HA behavior witnessed.

Time to list the various scenarios. No, it does not include every failure that could occur, but it should cover most scenarios that are important for a stretched cluster configuration. Do note, the behavior described below will only be witnessed when the best practices, as documented here and here, are followed. Also note that close to 30 scenarios are described! If there are any questions, feel free to leave a comment, and if you feel a failure scenario is missing, please leave a comment as well.

| Site Disaster Tolerance | Failures to Tolerate | VM Location | Failure | vSAN behavior | HA behavior |
|---|---|---|---|---|---|
| None (Preferred) | No data redundancy | Site A or B | Host failure Site A | Objects are inaccessible if failed host contained one or more components of objects | VM cannot be restarted as object is inaccessible |
| None (Preferred) | RAID-1/5/6 | Site A or B | Host failure Site A | Objects are accessible as there's site local resiliency | VM does not need to be restarted, unless VM was running on failed host |
| None (Preferred) | No data redundancy / RAID-1/5/6 | Site A | Full failure Site A | Objects are inaccessible as full site failed | VM cannot be restarted in Site B, as all objects reside in Site A |
| None (Preferred) | No data redundancy / RAID-1/5/6 | Site B | Full failure Site B | Objects are accessible, as only Site A contains objects | VM can be restarted in Site A, as that is where all objects reside |
| None (Preferred) | No data redundancy / RAID-1/5/6 | Site A | Partition Site A | Objects are accessible as all objects reside in Site A | VM does not need to be restarted |
| None (Preferred) | No data redundancy / RAID-1/5/6 | Site B | Partition Site B | Objects are accessible in Site A, objects are not accessible in Site B as network is down | VM is restarted in Site A, and killed by vSAN in Site B |
| None (Secondary) | No data redundancy / RAID-1/5/6 | Site B | Partition Site B | Objects are accessible in Site B | VM resides in Site B, does not need to be restarted |
| None (Preferred) | No data redundancy / RAID-1/5/6 | Site A | Witness Host Failure | No impact, witness host is not used as data is not replicated | No impact |
| None (Secondary) | No data redundancy / RAID-1/5/6 | Site B | Witness Host Failure | No impact, witness host is not used as data is not replicated | No impact |
| Site Mirroring | No data redundancy | Site A or B | Host failure Site A or B | Components on failed hosts inaccessible, read and write IO across ISL as no redundancy locally, rebuild across ISL | VM does not need to be restarted, unless VM was running on failed host |
| Site Mirroring | RAID-1/5/6 | Site A or B | Host failure Site A or B | Components on failed hosts inaccessible, read IO locally due to RAID, rebuild locally | VM does not need to be restarted, unless VM was running on failed host |
| Site Mirroring | No data redundancy / RAID-1/5/6 | Site A | Full failure Site A | Objects are inaccessible in Site A as full site failed | VM restarted in Site B |
| Site Mirroring | No data redundancy / RAID-1/5/6 | Site A | Partition Site A | Objects are inaccessible in Site A as full site is partitioned and quorum is lost | VM restarted in Site B |
| Site Mirroring | No data redundancy / RAID-1/5/6 | Site A | Witness Host Failure | Witness object inaccessible, VM remains accessible | VM does not need to be restarted |
| Site Mirroring | No data redundancy / RAID-1/5/6 | Site B | Full failure Site A | Objects are inaccessible in Site A as full site failed | VM does not need to be restarted as it resides in Site B |
| Site Mirroring | No data redundancy / RAID-1/5/6 | Site B | Partition Site A | Objects are inaccessible in Site A as full site is partitioned and quorum is lost | VM does not need to be restarted as it resides in Site B |
| Site Mirroring | No data redundancy / RAID-1/5/6 | Site B | Witness Host Failure | Witness object inaccessible, VM remains accessible | VM does not need to be restarted |
| Site Mirroring | No data redundancy / RAID-1/5/6 | Site A | Network failure between Site A and B (ISL down) | Site A binds with witness, objects in Site B become inaccessible | VM does not need to be restarted |
| Site Mirroring | No data redundancy / RAID-1/5/6 | Site B | Network failure between Site A and B (ISL down) | Site A binds with witness, objects in Site B become inaccessible | VM restarted in Site A |
| Site Mirroring | No data redundancy / RAID-1/5/6 | Site A or Site B | Network failure between Witness and Site A (or B) | Witness object absent, VM remains accessible | VM does not need to be restarted |
| Site Mirroring | No data redundancy / RAID-1/5/6 | Site A | Full failure Site A, and simultaneous Witness Host Failure | Objects are inaccessible in Site A and Site B due to quorum being lost | VM cannot be restarted |
| Site Mirroring | No data redundancy / RAID-1/5/6 | Site A | Full failure Site A, followed by Witness Host Failure a few minutes later | Pre vSAN 7.0 U3: Objects are inaccessible in Site A and Site B due to quorum being lost | VM cannot be restarted |
| Site Mirroring | No data redundancy / RAID-1/5/6 | Site A | Full failure Site A, followed by Witness Host Failure a few minutes later | Post vSAN 7.0 U3: Objects are inaccessible in Site A, but accessible in Site B as votes have been recounted | VM restarted in Site B |
| Site Mirroring | No data redundancy / RAID-1/5/6 | Site B | Full failure Site B, followed by Witness Host Failure a few minutes later | Post vSAN 7.0 U3: Objects are inaccessible in Site B, but accessible in Site A as votes have been recounted | VM restarted in Site A |
| Site Mirroring | No data redundancy | Site A | Full failure Site A, and simultaneous host failure in Site B | Objects are inaccessible in Site A, if components reside on failed host then object is inaccessible in Site B | VM cannot be restarted |
| Site Mirroring | No data redundancy | Site A | Full failure Site A, and simultaneous host failure in Site B | Objects are inaccessible in Site A, if components do not reside on failed host then object is accessible in Site B | VM restarted in Site B |
| Site Mirroring | RAID-1/5/6 | Site A | Full failure Site A, and simultaneous host failure in Site B | Objects are inaccessible in Site A, accessible in Site B as there's site local resiliency | VM restarted in Site B |

How to convert a standard cluster to a stretched cluster while expanding it!

Duncan Epping · Sep 27, 2022 ·

On VMTN a question was asked about how you could convert a 5-node standard cluster to a stretched cluster. It is not covered in our regular documentation, probably because the process is pretty straightforward, so I figured I would write it down. When you create a stretched cluster you will need a Witness Appliance in a third location first, so I would recommend deploying that Witness Appliance before doing anything else.

After you have deployed the Witness Appliance, add the additional hosts to vCenter Server. Do NOT add them to the cluster yet though! First, configure each host separately. After you have configured a host, place it into maintenance mode. Once the host is in maintenance mode, move it into the cluster and do not take it out of maintenance mode!

Now, when all hosts are part of the cluster, you can create the Stretched Cluster. This process is simple: you pick the hosts that belong to each location, and then you select the witness. After the cluster has been created, you simply take the hosts out of maintenance mode and you should be good! Note that you take the hosts out of maintenance mode after the Stretched Cluster has been created to ensure that no rebalancing happens while you are creating the stretched cluster, simply avoiding unneeded resyncs from occurring.
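If you want to script the host-preparation part of this, here is a minimal pyVmomi sketch of the “maintenance mode first, then move into the cluster” steps. The vCenter, cluster, and host names are placeholders, and the actual stretched cluster creation (fault domains plus witness) is still done as described above.

```python
# Minimal pyVmomi sketch: place newly added hosts into maintenance mode and
# move them into the cluster that will be stretched. Names are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="password", sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()

view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ClusterComputeResource, vim.HostSystem], True)
objs = list(view.view)
view.Destroy()

cluster = next(o for o in objs if isinstance(o, vim.ClusterComputeResource)
               and o.name == "vSAN-Cluster")
new_hosts = [o for o in objs if isinstance(o, vim.HostSystem)
             and o.name in {"esxi-05.example.com", "esxi-06.example.com"}]

# Enter maintenance mode first, then move the hosts into the cluster.
for host in new_hosts:
    if not host.runtime.inMaintenanceMode:
        WaitForTask(host.EnterMaintenanceMode_Task(timeout=0))

WaitForTask(cluster.MoveInto_Task(host=new_hosts))

# Take the hosts out of maintenance mode only AFTER the stretched cluster
# (fault domains + witness) has been configured, to avoid unneeded resyncs.
Disconnect(si)
```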

Do note, all VMs will have the same storage policy assigned still, so you will need to change that policy to ensure that the vSAN objects are placed and replicated according to your requirements! (RAID1 across locations and RAID-1/5/6 within a location for instance.)

Nested Fault Domains on a 2-Node vSAN Stretched Cluster, is it supported?

Duncan Epping · Jun 20, 2022 ·

I spotted a question on VMTN this week. The question was fairly basic: are nested fault domains supported on a 2-node vSAN Stretched Cluster? It sounds basic, but unfortunately it is not documented anywhere, probably because stretched 2-node configurations are not very common. For those who don’t know, with a nested fault domain on a two-node cluster you basically provide an additional layer of resiliency by also replicating an object within a host. A VM Storage Policy for a configuration like that will look as follows.

This does mean, however, that you need a minimum of 3 fault domains within each host as well, which translates to a minimum of 3 disk groups in each of the two hosts. Or better said, when you configure Host Mirroring and then select the secondary level of failures to tolerate, the following list shows the minimum number of disk groups you need per host:

  • Host Mirroring – 2 Node Cluster
    • No Data Redundancy – 1 disk group
    • 1 Failure – RAID1 – 3 disk groups
    • 1 Failure – RAID5 – 4 disk groups
    • 2 Failures – RAID1 – 5 disk groups
    • 2 Failures – RAID6 – 6 disk groups
    • 3 Failures – RAID1 – 7 disk groups

If you look at the list, you can imagine that additional resiliency definitely comes at a cost. (For convenience, the list is also captured in the small sketch below.) But anyway, back to the question: is it supported when your 2-node configuration happens to be stretched across locations? The answer is yes, VMware supports this.
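Here is a minimal Python sketch encoding that minimum disk group list. The helper name and the dictionary keys are my own illustration; the numbers come straight from the list above.

```python
# Minimum number of disk groups per host for a 2-node cluster with nested
# fault domains (Host Mirroring). Values taken from the list above;
# the function and key names are illustrative only.

MIN_DISK_GROUPS = {
    ("No data redundancy", 0): 1,
    ("RAID-1", 1): 3,
    ("RAID-5", 1): 4,
    ("RAID-1", 2): 5,
    ("RAID-6", 2): 6,
    ("RAID-1", 3): 7,
}

def min_disk_groups(raid_level: str, failures_to_tolerate: int) -> int:
    """Return the minimum disk groups per host for the given nested policy."""
    try:
        return MIN_DISK_GROUPS[(raid_level, failures_to_tolerate)]
    except KeyError:
        raise ValueError("Unsupported nested fault domain policy") from None

print(min_disk_groups("RAID-5", 1))  # 4 disk groups per host
```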

