
Yellow Bricks

by Duncan Epping


vSAN 7.0 U3 enhanced stretched cluster resiliency, what is it?

Duncan Epping · Oct 4, 2021 · 9 Comments

I briefly discussed the enhanced stretched cluster resiliency capability in my vSAN 7.0 U3 overview blog. Of course, questions started popping up immediately. I didn’t want to go too deep in that post, as I figured I would do a separate post on the topic sooner or later. What does this functionality add, and in which particular scenario?

In short, this enhancement to stretched clusters prevents downtime for workloads in a particular failure scenario. So the question then is, what failure scenario? Let’s take a look at this diagram first of a typical stretched vSAN cluster deployment.

If you look at the diagram you see the following: Datacenter A, Datacenter B, and the Witness. One of the situations customers have found themselves in is that Datacenter A goes down (unplanned). This of course leads to the VMs in Datacenter A being restarted in Datacenter B. Unfortunately, sometimes when things go wrong, they go wrong badly, and in some cases the Witness fails or disappears next. Why? Bad luck, networking issues, etc. Bad things just happen. If and when this happens, there is only one location left, which is Datacenter B.

Now you may think that because Datacenter B typically has a full RAID set of the VMs, those VMs will remain running, but that is not true. vSAN looks at the quorum of the top layer, so if 2 out of 3 datacenters disappear, all impacted objects become inaccessible, simply because quorum is lost! Makes sense, right? And we are not just talking about failures. It could also be that Datacenter A has to go offline for maintenance (planned downtime) and at some point the Witness fails for whatever reason. This results in the exact same situation: objects inaccessible.
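To make the quorum argument a bit more concrete, here is a minimal sketch in Python. It is only a toy model: the function name `has_quorum` and the assumption that each location holds exactly one vote are mine, for illustration, and are not how vSAN actually represents votes internally.

```python
def has_quorum(votes, alive):
    """Return True if the reachable locations hold more than half of all votes."""
    total = sum(votes.values())
    reachable = sum(v for site, v in votes.items() if site in alive)
    return reachable * 2 > total


# Assumption for illustration: one vote per location.
votes = {"datacenter_a": 1, "datacenter_b": 1, "witness": 1}

# Datacenter A is down, the Witness is still there: 2 of 3 votes reachable.
print(has_quorum(votes, {"datacenter_b", "witness"}))

# The Witness disappears as well: only 1 of 3 votes reachable.
print(has_quorum(votes, {"datacenter_b"}))
```

In this model the first check passes and the second does not, even though Datacenter B still holds a full copy of the data.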

Starting with 7.0 U3 this behavior has changed. If Datacenter A fails, and a few (let’s say 5) minutes later the witness disappears, all replicated objects would still be available! So why is this? Well, in this scenario, if Datacenter A fails, vSAN will create a new votes layout for each of the impacted objects. It basically assumes that the witness can fail and gives all components on the witness 0 votes. On top of that, it gives the components in the active site additional votes so that we can survive that second failure. If the witness then fails, the objects do not become inaccessible, as quorum is not lost.
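The new votes layout described above can be sketched with the same kind of toy model. Again, this is an illustration under assumptions: the helper `relayout_after_site_failure` is hypothetical, and the number of additional votes (just enough to exceed half of the total) is my choice for the example, not a number taken from vSAN itself.

```python
def has_quorum(votes, alive):
    """Return True if the reachable locations hold more than half of all votes."""
    total = sum(votes.values())
    reachable = sum(v for site, v in votes.items() if site in alive)
    return reachable * 2 > total


def relayout_after_site_failure(votes, surviving_site, witness="witness"):
    """Hypothetical re-vote: witness gets 0 votes, surviving site gets extra votes."""
    new_votes = dict(votes)
    new_votes[witness] = 0
    # Votes held by everything except the surviving site (failed site + witness).
    others = sum(v for site, v in new_votes.items() if site != surviving_site)
    # Give the surviving site just enough votes to hold a majority on its own.
    new_votes[surviving_site] = max(votes[surviving_site], others + 1)
    return new_votes


before = {"datacenter_a": 1, "datacenter_b": 1, "witness": 1}
after = relayout_after_site_failure(before, "datacenter_b")

print(after)
print(has_quorum(before, {"datacenter_b"}))  # old layout, only Datacenter B left
print(has_quorum(after, {"datacenter_b"}))   # new layout, only Datacenter B left
```

With the old layout, Datacenter B on its own does not have quorum. With the adjusted layout it does, which is why the second failure (the witness) can be survived.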

Now, do note, when a failure occurs and Datacenter A is gone, vSAN has to create a new votes layout for each object. Typically this takes a few seconds per object, and it is done object by object, so if you have a lot of VMs (and a VM consists of various objects) it will take some time. How long? Well, it could be five minutes. So if the witness goes down in the meantime, not all objects may have been processed yet, which would result in downtime for the VMs whose objects still have the old votes layout, as for those objects quorum would be lost.
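As a rough back-of-the-envelope for that window, the sketch below assumes that objects are processed strictly one after another and that each takes the same amount of time. Both are simplifications, and the numbers (100 objects, 3 seconds each) are made up for the example.

```python
def relayout_duration(num_objects, seconds_per_object):
    """Total time to re-vote all objects, assuming sequential processing."""
    return num_objects * seconds_per_object


def objects_protected(num_objects, seconds_per_object, witness_fails_after_s):
    """How many objects already have the new votes layout when the witness fails."""
    processed = int(witness_fails_after_s // seconds_per_object)
    return min(num_objects, max(processed, 0))


# Illustrative numbers only: 100 objects at 3 seconds each.
print(relayout_duration(100, 3.0))        # seconds needed for all objects
print(objects_protected(100, 3.0, 120))   # witness fails 2 minutes in
print(objects_protected(100, 3.0, 300))   # witness fails 5 minutes in
```

In this example the full pass takes 300 seconds, so a witness failure two minutes in would leave part of the objects on the old votes layout, while a failure at the five-minute mark would find all of them processed.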

What happens if Datacenter A (and the Witness) return for duty? Well at that point the votes would be restored for the objects across locations and the witness.

Pretty cool right?!


Server, Storage, vSAN 7.0, 7.0 u3, stretched, stretched cluster, u3, VMware, vsan

Reader Interactions

Comments

  1. tntteam says

    7 October, 2021 at 12:04

    Thanks for the info, this is a nice improvement indeed!

  2. Michael Schroeder says

    17 March, 2022 at 17:17

    Hello Duncan. This is really a cool improvement for stretched clusters. Today one of my students asked a smart question: What if the witness fails first? Will there be a shift in the vote distribution too? What will happen if one of the sites goes down a couple of minutes AFTER the witness site? Will the VMs on the last surviving site remain online, or is that just bad karma? 😉

    • Duncan Epping says

      18 March, 2022 at 09:01

      Very valid question. Unfortunately, that is indeed not happening today. You can imagine that that is rather complex to work through. The Witness is the quorum and we can safely drop the witness as we only have 1 full raid tree available. If we have 2 full raid trees it becomes much more complex as we don’t want to end up in a situation where both locations can write to the same object. I do have some ideas around how we can solve this, but it would require the implementation of another feature before it is really effective.

      • Manuel Dal Bianco says

        15 June, 2022 at 18:21

        What about a manual trigger to switch to single site mode? It would be better than nothing and should be simple to implement

  3. Marcos Ortiz says

    22 November, 2022 at 10:44

    Hi Duncan, this is a very awesome new feature, but I have a simple question. Is it activated by default once you upgrade to U3? Or do you need to do something to activate it? I mean, upgrade disk versions or anything else…

    Thanks a lot

    • Patrick Haan says

      23 November, 2022 at 18:04

      Good question – especially whether a “vSAN object format” upgrade is needed too?

      • Duncan Epping says

        24 November, 2022 at 17:14

        AFAIK you don’t, but I have personally never done an upgrade without an object level upgrade, especially not in the last few versions, as those upgrades are typically metadata only.

        • Patrick Haan says

          24 November, 2022 at 17:58

          Disk Format Upgrade – Fully agree.

          But also upgrading “object format level”?
          – Because vSAN expects to resync a bunch of files, which could be a problem in some (hybrid) cases (latency-critical production environments, etc.)

      • Duncan Epping says

        25 November, 2022 at 08:36

        No, it should not require an object format upgrade. This is just metadata and “accounting”.



About the author

Duncan Epping is a Chief Technologist in the Office of CTO of the Cloud Platform BU at VMware. He is a VCDX (# 007), the author of the "vSAN Deep Dive", the “vSphere Clustering Technical Deep Dive” series, and the host of the "Unexplored Territory" podcast.


Copyright Yellow-Bricks.com © 2023