• Skip to primary navigation
  • Skip to main content
  • Skip to primary sidebar

Yellow Bricks

by Duncan Epping

  • Home
  • ESXTOP
  • Stickers/Shirts
  • Privacy Policy
  • About
  • Show Search
Hide Search

Stretched cluster witness failure resilience in vSAN 7.0

Duncan Epping · Mar 17, 2022 · Leave a Comment

Cormac and I have been busy the past couple of weeks updating the vSAN Deep Dive to 7.0 U3. Yes, there is a lot to update and add, but we are actually going through it at a surprisingly rapid pace. I guess it helps that we had already written dozens of blog posts on the various topics we need to update or add. One of those topics is “witness failure resilience” which was introduced in vSAN 7.0 U3. I have discussed it before on this blog (here and here) but I wanted to share some of the findings with you folks as well before the book is published. (No, I do not know when the book will be available on Amazon just yet!)

In the scenario below, we failed the secondary site of our stretched cluster completely. We can examine the impact of this failure through RVC on vCenter Server. This will provide us with a better understanding of the situation and how the witness failure resilience mechanism actually works. Note that the below output has been truncated for readability reasons. Let’s take a look at the output of RVC for our VM directly after the failure.

VM R1-R1:
Disk backing:
[vsanDatastore] 0b013262-0c30-a8c4-a043-005056968de9/R1-R1.vmx
RAID_1
RAID_1
Component: 0b013262-c2da-84c5-1eee-005056968de9 , host: 10.202.25.221
votes: 1, usage: 0.1 GB, proxy component: false)
Component: 0b013262-3acf-88c5-a7ff-005056968de9 , host: 10.202.25.201
votes: 1, usage: 0.1 GB, proxy component: false)
RAID_1
Component: 0b013262-a687-8bc5-7d63-005056968de9 , host: 10.202.25.238
votes: 1, usage: 0.1 GB, proxy component: true)
Component: 0b013262-3cef-8dc5-9cc1-005056968de9 , host: 10.202.25.236
votes: 1, usage: 0.1 GB, proxy component: true)
Witness: 0b013262-4aa2-90c5-9504-005056968de9 , host: 10.202.25.231
votes: 3, usage: 0.0 GB, proxy component: false)
Witness: 47123362-c8ae-5aa4-dd53-005056962c93 , host: 10.202.25.214
votes: 1, usage: 0.0 GB, proxy component: false)
Witness: 0b013262-5616-95c5-8b52-005056968de9 , host: 10.202.25.228
votes: 1, usage: 0.0 GB, proxy component: false)

As can be seen, the witness component holds 3 votes, the components on the failed site (secondary) hold 2 votes, and the components on the surviving data site (preferred) hold 2 votes. After the full site failure has been detected, the votes are recalculated to ensure that a witness host failure does not impact the availability of the VMs. Below shows the output of RVC once again.

VM R1-R1:
Disk backing:
[vsanDatastore] 0b013262-0c30-a8c4-a043-005056968de9/R1-R1.vmx
RAID_1
RAID_1
Component: 0b013262-c2da-84c5-1eee-005056968de9 , host: 10.202.25.221
votes: 3, usage: 0.1 GB, proxy component: false)
Component: 0b013262-3acf-88c5-a7ff-005056968de9 , host: 10.202.25.201
votes: 3, usage: 0.1 GB, proxy component: false)
RAID_1
Component: 0b013262-a687-8bc5-7d63-005056968de9 , host: 10.202.25.238
votes: 1, usage: 0.1 GB, proxy component: false)
Component: 0b013262-3cef-8dc5-9cc1-005056968de9 , host: 10.202.25.236
votes: 1, usage: 0.1 GB, proxy component: false)
Witness: 0b013262-4aa2-90c5-9504-005056968de9 , host: 10.202.25.231
votes: 1, usage: 0.0 GB, proxy component: false)
Witness: 47123362-c8ae-5aa4-dd53-005056962c93 , host: 10.202.25.214
votes: 3, usage: 0.0 GB, proxy component: false)

As can be seen, the votes for the various components have changed, the data site now has 3 votes per component instead of 1, the witness on the witness host went from 3 votes to 1, and on top of that, the witness that is stored in the surviving fault domain now also has 3 votes instead of 1 vote. This now results in a situation where quorum would not be lost even if the witness component on the witness host is impacted by a failure. A very useful enhancement to vSAN 7.0 Update 3 for stretched cluster configurations if you ask me.

Share it:

  • Tweet

Related

Server 7.0 u3, stretched cluster, update 3, vsan

Reader Interactions

Leave a Reply Cancel reply

Primary Sidebar

About the author

Duncan Epping is a Chief Technologist in the Office of CTO of the Cloud Platform BU at VMware. He is a VCDX (# 007), the author of the "vSAN Deep Dive" and the “vSphere Clustering Technical Deep Dive” series, and he is the host of the "In de aap gelogeerd" (Dutch) and "unexplored territory" (English) podcasts.

Upcoming Events

09-06-2022 – VMUG Belgium
16-06-2022 – VMUG Sweden

Recommended Reads

Sponsors

Want to support Yellow-Bricks? Buy an advert!

Advertisements

Copyright Yellow-Bricks.com © 2022 · Log in