
Yellow Bricks

by Duncan Epping


BC-DR

vSAN Stretched Cluster vs Fault Domains in a “campus” setting?

Duncan Epping · Sep 25, 2025 · 2 Comments

I got this question internally recently: should we create a vSAN Stretched Cluster configuration or a vSAN Fault Domains configuration when we have multiple datacenters in close proximity on our campus? In this case, we are talking about less than 1ms latency RTT between buildings, maybe a few hundred meters at most. I think it is a very valid question, and I guess it kind of depends on what you are looking to get out of the infrastructure. I wrote down the pros and cons, and wanted to share those with the rest of the world as well, as they may be useful for some of you out there (a rough decision sketch follows the two lists below). If anyone has additional pros and cons, feel free to share those in the comments!

vSAN Stretched Clusters:

  • Pro: You can replicate across fault domains AND protect additionally within a fault domain with R1/R5/R6 if required.
  • Pro: You can decide whether VMs should be stretched across Fault Domains or not, or just protected within a fault domain/site
  • Pro: Requires less than 5ms RTT latency, which is easily achievable in this scenario
  • Con/Pro: You probably also need to think about DRS/HA groups (VM-to-Host)
  • Con: From an operational perspective, it also introduces a witness host and sites, which may complicate things, and at the very least requires a bit more thinking
  • Con: Witness needs to be hosted somewhere
  • Con: Limited to 3 Fault Domains (2x data + 1x witness)
  • Con: Limited to 20+20+1 configuration

vSAN Fault Domains:

  • Pro: No real considerations around VM-to-host rules usually, although you can still use them to ensure certain VMs are spread across buildings
  • Pro: No Witness Appliance to manage, update or upgrade. No overhead of running a witness somewhere
  • Pro: No design considerations around “dedicated” witness sites and “data site”, each site has the same function
  • Pro: Can also be used with more than 3 Fault Domains or Datacenters, so could even be 6 Fault Domains, for instance
  • Pro: Theoretically can go up to 64 hosts
  • Con: No ability to protect additionally within a fault domain
  • Con: No ability to specify that you don’t want to replicate VMs across Fault Domains
  • Con/Pro: Requires sub-1ms RTT latency at all times, which is low, but usually achievable in a campus cluster
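
To make the trade-off a bit more concrete, below is a minimal decision sketch in Python. The thresholds and limits (5ms and sub-1ms RTT, 3 fault domains, 20+20+1 hosts) come straight from the lists above; the function itself and its parameter names are purely illustrative, not an official sizing tool.

```python
# Rough decision helper based purely on the pros and cons listed above.
# The function and parameter names are illustrative; the thresholds
# (5 ms / sub-1 ms RTT, 3 fault domains, 20+20+1 hosts) come from the lists.

def suggest_vsan_topology(rtt_ms: float, buildings: int,
                          need_protection_within_site: bool,
                          need_site_local_vms: bool) -> str:
    # Fault domains: every building is an equal data site, no witness appliance to run,
    # more than 3 domains are possible, but sub-1 ms RTT is required at all times and
    # there is no extra protection within a domain, nor non-replicated (site-local) VMs.
    fault_domains_fit = (rtt_ms < 1.0 and buildings >= 3
                         and not need_protection_within_site
                         and not need_site_local_vms)
    # Stretched cluster: two data sites plus a witness (3 fault domains, 20+20+1 hosts),
    # less than 5 ms RTT, optional R1/R5/R6 protection within a site, and a per-VM
    # choice whether to stretch across sites or keep the VM within one site.
    stretched_fit = rtt_ms < 5.0 and buildings >= 2

    if fault_domains_fit:
        return "vSAN Fault Domains"
    if stretched_fit:
        return "vSAN Stretched Cluster (2 data sites + witness)"
    return "RTT too high (or too few buildings) for a single vSAN cluster"


# Example: three buildings on a campus, 0.3 ms RTT, no extra requirements.
print(suggest_vsan_topology(0.3, 3, False, False))  # -> vSAN Fault Domains
```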

Where is the vSAN Snapmanager Appliance with 9.0?

Duncan Epping · Jul 7, 2025 · 4 Comments

I was talking to my colleague Paudie and he mentioned various folks were having problems finding the vSAN Snapmanager Appliance for vSAN / VCF / vSphere 9.0. The appliance used to be stored on the Broadcom Support portal under VMware vSAN >> Drivers & Tools, but it is no longer there.

This is not by mistake. Some may have heard about this, others may have skipped over it, but VMware Live Recovery, vSphere Replication, and vSAN Data Protection (which includes the Snapmanager Appliance) have all converged into a single appliance to make your life easier! This means that if you want to enable vSAN Data Protection, you now need to download the VMware Live Recovery Appliance, specifically version 9.0.3.0 or later.


vSphere HA restart times, how long does it actually take?

Duncan Epping · Mar 13, 2025 · Leave a Comment

I had a question today, and it was based on material I wrote years ago for the Clustering Deepdive. (read it here) The material talks about the sequence HA goes through when a failure has occurred. If you look, for instance, at the sequence where a “secondary” host has failed, it looks as follows:

  • T0 – Secondary host failure.
  • T3s – Primary host begins monitoring datastore heartbeats for 15 seconds.
  • T10s – The secondary host is declared unreachable and the primary will ping the management network of the failed secondary host. This is a continuous ping for 5 seconds.
  • T15s – If no heartbeat datastores are configured, the secondary host will be declared dead if there is no reply to the ping.
  • T18s – If heartbeat datastores are configured, the secondary host will be declared dead if there’s no reply to the ping and the heartbeat file has not been updated or the lock was lost.

So, depending on whether you have heartbeat datastores or not, this sequence takes either 15 or 18 seconds. Does that mean the VMs are then instantly restarted, and if so, how long does that take? Well no, they won’t instantly restart, because at the end of this sequence the failed secondary host has only just been declared dead. Now it will need to be verified whether the potentially impacted VMs have actually failed, a list of “to be restarted” VMs will need to be created, and a placement request will need to be made.

The placement request will either go to DRS, or will be handled by HA itself, depending on whether DRS is enabled and vCenter Server is available. After placement has been determined, the primary host will then request the individual hosts to restart the VMs which should be restarted. After a host has received the list of VMs it needs to restart, it will do this in batches of 32, and of course restart priority/order will be applied. The whole aforementioned process can easily take 10-15 seconds (if not longer), which means that in a perfect world, the restart of the VM occurs after about 30 seconds. Now, this is when the restart of the VM is initiated; that does not mean that the VM, or the services it is hosting, will be available after 30 seconds. The power-on sequence of the VM can take anywhere from seconds to minutes, depending of course on the size of the VM and the services that need to be started during the power-on sequence.
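
As a quick back-of-the-napkin check, the numbers above add up as follows. The detection values (15 or 18 seconds) and the 10-15 second window for verification, placement, and restart initiation are taken from the sequence described above; the VM boot time is just a placeholder you would replace with a realistic value for your own workloads.

```python
# Rough estimate of time-to-restart for vSphere HA after a secondary host failure.
# Detection values (15 s / 18 s) and the 10-15 s placement window come from the
# sequence above; the VM boot time is workload-specific and purely a placeholder.

def estimated_restart_timing(heartbeat_datastores: bool,
                             placement_seconds: float = 12.5,
                             vm_boot_seconds: float = 60.0) -> dict:
    detection = 18 if heartbeat_datastores else 15            # host declared dead
    restart_initiated = detection + placement_seconds         # verify VMs, build list, place, batch restart
    services_available = restart_initiated + vm_boot_seconds  # guest OS and services start
    return {
        "host declared dead (s)": detection,
        "VM restart initiated (s)": round(restart_initiated, 1),
        "services available (s, example boot time)": round(services_available, 1),
    }


print(estimated_restart_timing(heartbeat_datastores=True))
# {'host declared dead (s)': 18, 'VM restart initiated (s)': 30.5,
#  'services available (s, example boot time)': 90.5}
```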

So, although it only takes 15 to 18 seconds for vSphere HA to detect and declare a failure, there’s much more to it. Hopefully this post provides a better understanding of all that is involved.

Unexplored Territory Episode 088 – Stretching VMware Cloud Foundation featuring Paudie O’Riordan

Duncan Epping · Jan 13, 2025 · Leave a Comment

The first episode of 2025 features one of my favorite colleagues, Paudie O’Riordan. Paudie works on the same team as I do, and although we’ve both roamed around a lot, somehow we always ended up either on the same team or in very close proximity. Paudie is a storage guru; over the last years he has helped many customers with their VCF (or vSAN) proof of concept, and on top of that he has helped countless customers understand difficult failure scenarios in a stretched environment when things went south. In Episode 088 Paudie discusses the many dos and don’ts! This is an episode you cannot miss out on!

VCF-9 announcements at Explore Barcelona – vSAN Site Takeover and vSAN Site Maintenance

Duncan Epping · Nov 15, 2024 · Leave a Comment

At Explore in Barcelona we had several announcements and showed several roadmap items which we did not reveal in Las Vegas. As the sessions were not recorded in Barcelona, I wanted to share with you the features I spoke about at Explore which are currently planned for VCF 9. Please note, I don’t know when these features will be generally available, and there’s always a chance they are not released at all.

I created a video of the features we discussed, as I also wanted to share the demos with you. Now, for those who don’t watch videos, the functionality that we are working on for VCF-9 is the following. I am just going to give a brief description, as we have not made a full public announcement about this, and I don’t want to get into trouble.

vSAN Site Maintenance

In a vSAN stretched cluster environment, when you want to do site maintenance today, you need to place every host into maintenance mode one by one. Not only is this an administrative/operational burden, it also increases the chances of placing the wrong hosts into maintenance mode. On top of that, as you need to do this sequentially, it could also be that the data stored on host-1 in site A differs from host-2 in site A, meaning that there’s an inconsistent set of data in a site. Normally this is not a problem, as the environment will resync when it comes back online, but if the other data site fails, that existing (inconsistent) data set cannot be used to recover. With Site Maintenance we not only make it easier to place a full site into maintenance mode, we also remove that risk of data inconsistency, as vSAN coordinates the maintenance and ensures that the data set is consistent within the site. Fantastic, right?!
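
To illustrate the operational difference, here is a minimal sketch contrasting today’s host-by-host workflow with a coordinated site-level operation. The functions below are stubs I made up for illustration, not a real vSphere or vSAN API; the point is only the sequential-versus-coordinated contrast described above.

```python
# Conceptual contrast only: these stub functions stand in for maintenance operations;
# they are not a real vSphere/vSAN API.

def enter_maintenance_mode(host: str) -> None:
    """Stub: pretend to place a single host into maintenance mode."""
    print(f"{host}: entering maintenance mode")


def enter_site_maintenance(site: str, hosts: list) -> None:
    """Stub: pretend to place a whole site into maintenance in one coordinated step."""
    print(f"{site}: coordinated site maintenance for {len(hosts)} hosts")


site_a_hosts = ["esx01", "esx02", "esx03", "esx04"]

# Today: host by host, sequentially. While this runs, the data on the hosts in the
# site can drift apart, so if the other data site fails mid-way, the surviving
# copies in this site may be inconsistent with each other.
for host in site_a_hosts:
    enter_maintenance_mode(host)

# Planned VCF 9 Site Maintenance: one coordinated operation for the whole site,
# with vSAN ensuring the data set within the site stays consistent.
enter_site_maintenance("Site A", site_a_hosts)
```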

vSAN Site Takeover

One of the features I felt we were lacking for the longest time was the ability to promote a site when 2 out of the 3 sites have failed simultaneously. This is where Site Takeover comes into play. If you end up in a situation where both the witness site and a data site go down at the same time, you want to be able to still recover, especially as it is very likely that you will have healthy objects for each VM in that second site. This is what vSAN Site Takeover will help you with. It will allow you to manually (through the UI or a script) inform vSAN that, even though quorum is lost, it should make the local RAID set for each of the impacted VMs accessible again. After which, of course, vSphere HA would instruct the hosts to power on those VMs.
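
A tiny sketch of the quorum logic may help to show why this matters. The vote counting below is a simplification of how a stretched cluster decides object accessibility (two data sites plus the witness), and the takeover flag only illustrates the planned manual promotion; none of this is a real API.

```python
# Simplified illustration of why Site Takeover matters in a stretched cluster.
# Vote counting is simplified; the 'takeover' flag only illustrates the planned
# manual promotion and is not a real API.

def objects_accessible(site_a_up: bool, site_b_up: bool, witness_up: bool,
                       takeover_on_surviving_site: bool = False) -> bool:
    votes = sum([site_a_up, site_b_up, witness_up])
    if votes >= 2:                      # quorum: a majority of the three sites is available
        return True
    if votes == 1 and takeover_on_surviving_site:
        # Planned Site Takeover: an admin explicitly tells vSAN to make the healthy
        # local copies on the one surviving site accessible despite lost quorum,
        # after which vSphere HA can restart the impacted VMs there.
        return True
    return False


# Witness and one data site fail at the same time: today the objects stay
# inaccessible; with Site Takeover the surviving site can be promoted manually.
print(objects_accessible(site_a_up=False, site_b_up=True, witness_up=False))   # False
print(objects_accessible(site_a_up=False, site_b_up=True, witness_up=False,
                         takeover_on_surviving_site=True))                      # True
```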

If you have any feedback on the demos, and the planned functionality, feel free to leave a comment!

