stretched cluster

vSAN 7.0 U3 enhanced stretched cluster resiliency, what is it?

Duncan Epping · Oct 4, 2021 ·

I briefly discussed the enhanced stretched cluster resiliency capability in my vSAN 7.0 U3 overview blog. Of course, immediately questions started popping up. I didn’t want to go too deep in that post as I figured I would do a separate post on the topic sooner or later. What does this functionality add, and in which particular scenario?

In short, this enhancement to stretched clusters prevents downtime for workloads in a particular failure scenario. So the question then is, what failure scenario? Let’s take a look at this diagram first of a typical stretched vSAN cluster deployment.

If you look at the diagram you see the following: Datacenter A, Datacenter B, Witness. One of the situations customers have found themselves in is that Datacenter A would go down (unplanned). This of course would lead to the VMs in Datacenter A being restarted in Datacenter B. Unfortunately, when things go wrong, they sometimes go wrong badly, and in some cases the Witness would fail or disappear next. Why? Bad luck, networking issues, etc. Bad things just happen. If and when this happens, there would only be one location left, which is Datacenter B.

Now you may think that because Datacenter B typically has a full RAID set of the VMs, those VMs will remain running, but that is not true. vSAN looks at the quorum of the top layer, so if 2 out of 3 locations disappear, all impacted objects become inaccessible simply because quorum is lost! Makes sense, right? And we are not just talking about failures. It could also be that Datacenter A has to go offline for maintenance (planned downtime) and at some point the Witness fails for whatever reason. This would result in the exact same situation: objects inaccessible.

Starting with 7.0 U3 this behavior has changed. If Datacenter A fails, and a few (let’s say 5) minutes later the witness disappears, all replicated objects will still be available! So why is this? Well, in this scenario, when Datacenter A fails, vSAN creates a new votes layout for each of the impacted objects. It basically assumes that the witness can fail and gives all components on the witness 0 votes. On top of that, it gives the components in the active site additional votes so that we can survive that second failure. If the witness then fails, the objects do not become inaccessible, as quorum is not lost.
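To make the quorum reasoning a bit more concrete, here is a minimal sketch in Python. It is a toy model, not how vSAN is implemented: the per-location vote counts (1/1/1) and the function names are made up for illustration, and real objects have votes per component rather than per location.

```python
def has_quorum(votes, reachable):
    """An object stays accessible only while the reachable locations
    hold strictly more than half of all votes."""
    total = sum(votes.values())
    held = sum(count for location, count in votes.items() if location in reachable)
    return held * 2 > total


def adjust_votes(votes, surviving_site, witness="witness"):
    """Sketch of the 7.0 U3 idea: the witness drops to 0 votes and the
    surviving data site receives the votes the witness used to hold."""
    adjusted = dict(votes)
    adjusted[surviving_site] += adjusted[witness]
    adjusted[witness] = 0
    return adjusted


# Hypothetical vote totals per location for a single object.
before = {"datacenter_a": 1, "datacenter_b": 1, "witness": 1}

# Old behavior: Datacenter A is gone, then the witness fails as well.
old_result = has_quorum(before, {"datacenter_b"})

# New behavior: votes are adjusted after Datacenter A fails,
# so losing the witness afterwards no longer costs quorum.
after = adjust_votes(before, surviving_site="datacenter_b")
new_result = has_quorum(after, {"datacenter_b"})

print(old_result, new_result)
```

With the original layout Datacenter B holds only 1 of 3 votes on its own, while with the adjusted layout it holds 2 of 3, which is why the second failure can be survived.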

Now, do note, when a failure occurs and Datacenter A is gone, vSAN will have to create a new votes layout for each object. Typically this takes a few seconds per object, and it is done per object, so if you have a lot of VMs (and a VM consists of various objects) it will take some time. How long? Well, it could be five minutes. If the witness goes down in the meantime, not all objects may have been processed, which would result in downtime for the VMs whose objects still have the old votes layout, as quorum would be lost for those objects.
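As a rough back-of-the-envelope illustration of that window, the sketch below estimates how many objects would still have the old votes layout if the witness fails partway through. The numbers (100 objects, 3 seconds each) are hypothetical, and the code assumes objects are processed strictly one after another, which is a simplification.

```python
def unprotected_objects(object_count, seconds_per_object, witness_fails_after_s):
    """Return how many objects have not yet received a new votes layout
    when the witness fails, assuming strictly sequential processing."""
    processed = min(object_count, int(witness_fails_after_s // seconds_per_object))
    return object_count - processed


# Hypothetical example: 100 objects at 3 seconds each take 300 seconds (five minutes).
total_seconds = 100 * 3

# If the witness fails 120 seconds after Datacenter A, only part of the objects are done.
remaining = unprotected_objects(100, 3, witness_fails_after_s=120)

print(total_seconds, remaining)
```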

What happens if Datacenter A (and the Witness) return for duty? Well at that point the votes would be restored for the objects across locations and the witness.

Pretty cool right?!

Can I make a host in a cluster the vSphere HA primary / master host?

Duncan Epping · May 21, 2021 ·

There was an interesting question on the VMware VMTN Community this week. Although I wrote about this in 2016, I figured I would do a short write-up again, as the procedure has changed since 7.0 U1. The question was whether it is possible to make a particular host in a cluster the vSphere HA primary (or master, as it was called previously) host. The use case was pretty straightforward: the customer had a stretched cluster configuration with vSAN and wanted to make sure that the vSphere HA primary host was located in the “preferred” site, as this could potentially speed up the restart of VMs. Now, mind you, when I say “speed up” we are talking about a 2-3 second difference at most, but for some folks this may be crucial. I personally would not recommend making configuration changes, but if you do want to do this, vSphere does have the option to do so.

When it comes to vSphere HA, there’s no UI option or anything like that to assign the “primary/master” host role. However, there’s the option to specify an advanced setting on a host level to indicate that a certain host needs to be favored during the primary/master election. Again, this is not very common for customers to configure, but if you desire to do so, it is possible. The advanced setting is called “fdm.nodeGoodness” and depending on which version you use, you will need to configure it either via the fdm.cfg file, or via the configstorecli. You can read about this process in-depth here.

Of course, I tested whether this works in my lab. Here’s what I did. I first listed the currently configured advanced options for vSphere HA using configstorecli:

configstorecli config current get -g cluster -c ha -k fdm
{
   "mem_reservation_MB": 200,
   "memory_checker_time_in_secs": 0
}

Next, I set the “node_goodness” for my host. The value needs to be positive; in my case I set it to 10000000. I first dumped the current config into a JSON file:

configstorecli config current get -g cluster -c ha -k fdm > test.json

Next, I edited the file and added the setting “node_goodness” with a value of 10000000, so that it looks as follows:

{ 
    "mem_reservation_MB": 200, 
    "memory_checker_time_in_secs": 0,
    "node_goodness": 10000000
}
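If you would rather not edit the file by hand, the same change can be scripted. The snippet below is a minimal sketch using Python’s standard json module; the function name is my own, and it works on the exported JSON text rather than calling configstorecli itself.

```python
import json


def add_node_goodness(config_text, value):
    """Return the exported fdm config as JSON text with node_goodness set.
    The value has to be positive, as noted above."""
    if value <= 0:
        raise ValueError("node_goodness must be a positive value")
    config = json.loads(config_text)
    config["node_goodness"] = value
    return json.dumps(config, indent=4)


# The text below mirrors the output of the "get" command shown earlier.
exported = '{"mem_reservation_MB": 200, "memory_checker_time_in_secs": 0}'
updated = add_node_goodness(exported, 10000000)
print(updated)
```

In practice you would read the exported text from test.json and write the result back to that file before importing it.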

I then imported the file:

configstorecli config current set -g cluster -c ha -k fdm -infile test.json

After importing the file and reconfiguring for HA on one of my hosts, you can see in the screenshots below that the master role moved from 1507 to 1505.

I also created a quick demo, for those who prefer video content:

Does vSAN Enhanced Durability work when you have a limited number of hosts?

Duncan Epping · Apr 19, 2021 ·

Last week I had a question about how vSAN Enhanced Durability works when you have a limited number of hosts. In this case, the customer had a 3+3+1 stretched cluster configuration, and they wondered what would happen when they would place a host into maintenance mode. Although I was pretty sure I knew what would happen, I figured I would test it in the lab anyway. Let’s start with a high-level diagram of what the environment looks like. Note I use a single VM as an example, just to keep the scenario easy to follow.

In the diagram, we see a virtual disk that is configured to be stretched across locations, and protected by RAID-1 within each location. As a result, you will have two RAID-1 trees each with two components and a witness, and of course, you would have a witness component in the witness location. Now the question is, what happens when you place esxi-host-1 into maintenance mode? In this scenario, vSAN Enhanced Durability will want to create a “durability component”. This durability component is used to commit all new write IO to. This will allow vSAN to resync fast after maintenance mode, and enhances durability as we would still have 2 copies of the (new) data.

However, in the scenario above we only have 3 hosts per location. The question then is, where is this durability component created? Normally with maintenance mode you would need a 4th host to move data to. Well, it is simple: in this case vSAN creates the “durability component” on the host where the witness component resides, within the same location of course. Let me show you in a diagram, as that makes it clear instantly.

By adding the durability component next to the witness on esxi-host-3, vSAN enhances durability even in this stretched cluster situation, as it provides an additional local copy of new writes. Now, of course I tested this in my lab. So for those who prefer to see a demo, check out the YouTube video below.
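The placement described here can be sketched as a simple selection rule. Note that this is only a toy model of the scenario in this post, not vSAN’s actual placement logic: the function name and the “data”/“witness” labels are assumptions for illustration.

```python
def durability_host(site_hosts, placement, maintenance_host):
    """Pick a host in the same location that is not entering maintenance
    mode and does not already hold a data component of the object.
    Returns None when no such host exists."""
    candidates = [
        host for host in site_hosts
        if host != maintenance_host and placement.get(host) != "data"
    ]
    return candidates[0] if candidates else None


# One location of the 3+3+1 cluster: two data components and a witness component.
site_hosts = ["esxi-host-1", "esxi-host-2", "esxi-host-3"]
placement = {
    "esxi-host-1": "data",
    "esxi-host-2": "data",
    "esxi-host-3": "witness",
}

target = durability_host(site_hosts, placement, maintenance_host="esxi-host-1")
print(target)
```

With only three hosts per location, the host holding the witness component is the only candidate left, which matches what I saw in the lab.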

vSAN File Services and Stretched Clusters!

Duncan Epping · Mar 29, 2021 ·

As most of you probably know, vSAN File Services is not supported on a stretched cluster with vSAN 7.0 or 7.0U1. However, starting with vSAN 7.0 U2 we now fully support the use of vSAN File Services on a stretched cluster configuration! Why is that?

In 7.0 U2, you can specify during the configuration of vSAN File Services to which site certain IP addresses belong. In other words, you can specify the “site affinity” of your file services. This is shown in the screenshot below. Now I do want to note, this is a soft affinity rule. This means that if the hosts or VMs on which these file services containers are running fail, the container could be restarted in the opposite location. Again, a soft rule, not a hard rule!

Of course, that is not the end of the story. You also need to be able to specify for each share which location it has affinity with. Again, you can do this during configuration (or edit it afterward if desired), and this sets the affinity for the file share to a location. Or said differently, it ensures that when you connect to a file share, one of the file servers in the specified site will be used. Again, this is a soft rule, meaning that if none of the file servers are available in that site, you will still be able to use vSAN File Services, just not with the optimized data path you defined.
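To illustrate what “soft” means here, below is a small sketch of the selection behavior. It is a simplified model, not the actual vSAN File Services code: the server names, site labels, and the function itself are hypothetical.

```python
def pick_file_server(servers, share_site):
    """servers is a list of (name, site, is_available) tuples.
    Prefer an available server in the share's site; fall back to any
    available server in the other site (soft affinity)."""
    available = [server for server in servers if server[2]]
    if not available:
        return None
    local = [server for server in available if server[1] == share_site]
    return (local or available)[0][0]


# Hypothetical file servers, one per site.
servers = [
    ("fs-preferred-1", "preferred", True),
    ("fs-secondary-1", "secondary", True),
]
normal = pick_file_server(servers, share_site="preferred")

# The server in the preferred site is down: the share is still served, from the other site.
degraded = [("fs-preferred-1", "preferred", False), ("fs-secondary-1", "secondary", True)]
fallback = pick_file_server(degraded, share_site="preferred")

print(normal, fallback)
```

A hard rule would return nothing in the second case; the soft rule keeps the share reachable at the cost of a longer data path.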

Hopefully, that gives a quick overview of how you can use vSAN File Services in combination with a vSAN Stretched Cluster. I created a video to demonstrate these new capabilities, you can watch it below.

vSAN 7.0 U2 now integrates with vSphere DRS

Duncan Epping · Mar 24, 2021 ·

One of the features our team requested a while back was integration between DRS and vSAN. The key use case we had was for stretched clusters. Especially in scenarios where a failure has occurred, it would be useful if DRS would understand what vSAN is doing. What do I mean by that?

Today when customers create a stretched cluster they have two locations. Using vSAN terminology these locations are referred to as the Preferred Fault Domain and the Secondary Fault Domain. Typically when VMs are then deployed, customers will create VM-to-Host Affinity Rules which state that VMs should reside in a particular location. When these rules are created DRS will do its best to ensure that the defined rule is adhered to. What is the problem?

Well, if you are running a stretched cluster and, let’s say, one of the sites goes down, then the following happens when the failed location returns for duty:

vSAN detects the missing components are available again
vSAN will start the resynchronization of the components
DRS runs every minute and rebalances and will move VMs based on the DRS rules

This means that the VMs for which rules are defined will move back to their respective location, even though vSAN is potentially still resynchronizing the data. First of all, the migration will interfere with the replication traffic. Secondly, for as long as the resync has not completed, I/O will traverse the network between the two locations. This will not only interfere with resync traffic, it will also increase latency for those workloads. So, how does vSAN 7.0 U2 solve this?

Starting with vSAN 7.0 U2 and vSphere 7.0 U2 we now have DRS and vSAN communicating. DRS will verify with vSAN what the state of the environment is, and it will not migrate the VMs back until the VMs are healthy again. When the VMs are healthy and the resync has completed, you will see the rules being applied and the VMs automatically migrating back (when DRS is configured to Fully Automated, that is).
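The decision DRS now makes can be sketched roughly as follows. This is purely illustrative: the VmState fields and the function are my own invention, and the real integration lives inside DRS and vSAN rather than in anything you would script yourself.

```python
from dataclasses import dataclass


@dataclass
class VmState:
    name: str
    current_site: str
    rule_site: str          # site the VM-to-Host rule says the VM should run in
    resync_complete: bool   # True once vSAN has finished resyncing the VM's objects


def vms_to_migrate_back(vms, fully_automated=True):
    """Return the names of VMs that may be moved back to their rule site:
    only VMs that are out of place AND whose resync has completed."""
    if not fully_automated:
        return []
    return [
        vm.name for vm in vms
        if vm.current_site != vm.rule_site and vm.resync_complete
    ]


# Hypothetical state shortly after the failed site has returned.
vms = [
    VmState("vm-01", current_site="secondary", rule_site="preferred", resync_complete=True),
    VmState("vm-02", current_site="secondary", rule_site="preferred", resync_complete=False),
    VmState("vm-03", current_site="secondary", rule_site="secondary", resync_complete=True),
]

print(vms_to_migrate_back(vms))
```

Only vm-01 qualifies: vm-02 is still resyncing and vm-03 is already where its rule wants it.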

I can’t really show it with a screenshot or anything, as this is a change in the vSAN/DRS architecture, but to make sure it worked I recorded a quick demo which I published on YouTube. Make sure to watch the video!