Yellow Bricks

by Duncan Epping


Doing network/ISL maintenance in a vSAN stretched cluster configuration!

Duncan Epping · Nov 21, 2023 · 3 Comments

I got a question earlier about the maintenance of an ISL in a vSAN Stretched Cluster configuration which had me thinking for a while. The question was: what do you do with your workload during maintenance? The easiest option, of course, is to power off all VMs and simply shut down the cluster, for which vSAN has a UI option, and there's a KB you can follow. Now, of course, there could also be a situation where the VMs need to remain running. But how does this work when you end up losing the connection between all three locations? Normally this would lead to a situation where all VMs become "inaccessible", as you end up losing quorum.

As said, this had me thinking: you could take advantage of the "vSAN Witness Resiliency" mechanism, which was introduced in vSAN 7.0 U3. How would this work?

Well, it is actually pretty straightforward: if all hosts of one site are in maintenance mode, failed, or powered off, the votes of the witness object for each VM/object will be recalculated within 3 minutes. When this recalculation has completed, the witness can go down without having any impact on the VMs. We introduced this capability to increase resiliency in a double-failure scenario, but we can also (ab)use this functionality during maintenance. Of course I had to test this, so the first step I took was placing all hosts in one location into maintenance mode (no data evacuation). This resulted in all my VMs being vMotioned to the other site.
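For those who prefer the command line over the UI, a minimal sketch of what entering maintenance mode with the "No data evacuation" option looks like per host is shown below. This is purely an illustration, the host you run it on is a placeholder, and doing the same through the vSphere Client (as I did) works just as well.

    # Run on each ESXi host in the site undergoing maintenance.
    # "noAction" corresponds to the "No data evacuation" option in the UI;
    # with DRS in fully automated mode the running VMs should be migrated
    # to the other site as the host goes into maintenance mode.
    esxcli system maintenanceMode set --enable true --vsanmode noAction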


Next, I checked with RVC whether my votes had been recalculated or not. As stated, depending on the number of VMs this can take around 3 minutes in total, but it will usually be quicker. After the recalculation had completed, I powered off the witness, and as shown below, all VMs were still running.

Of course, I had to double-check on the command line using RVC (you can use the command "vsan.vm_object_info" to check a particular object, for instance) to ensure that the components of those VMs were indeed still "ACTIVE" instead of "ABSENT", and there you go!
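As a rough sketch of that check (the datacenter and VM names below are placeholders), running the command from within RVC looks something like this:

    /localhost> vsan.vm_object_info /localhost/MyDC/vms/myvm-01

The output lists every component of the VM's objects together with its state and vote count, similar to the examples shown further down this page.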

Now when maintenance has been completed, you simply do the reverse: you power on the witness, and then you power on the hosts in the other location. After the "resync" has completed, the VMs will be rebalanced again by DRS. Note that DRS rebalancing (or the application of "should" rules) will only happen when the resync of the VM has been completed.
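If you want to keep an eye on that resync before expecting DRS to move anything back, RVC can show the progress as well. A sketch, again with placeholder datacenter and cluster names:

    /localhost> vsan.resync_dashboard /localhost/MyDC/computers/MyCluster

Once the dashboard reports that nothing is left to resync, DRS (or the "should" rules) can rebalance the VMs again.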

Call to action: Help cancer patients enjoy life when still possible!

Duncan Epping · Nov 16, 2023 · Leave a Comment

CALL TO ACTION: Everyone knows someone with cancer, who passed away as a result of cancer, or maybe even won the battle against cancer. I have seen too many family members and friends fight the battle and lose. For this reason, I decided to participate in the Roparun this year. The Roparun non-profit organization aims to (financially) help organizations in Holland that provide palliative care for cancer patients.

I will join Team 243 to run from Paris to Rotterdam in a relay fashion next year (May 2024). We will be running with two groups of four people; each group will run in a relay fashion for 4 hours and will then have a 4-hour break, and each runner will do 2 kilometers per repetition until the 4 hours have passed. This means it will take roughly 3 days to get from Paris back to Rotterdam. It is my goal to collect as many donations as possible to help cancer patients enjoy life when still possible; every contribution, small or large, is appreciated. I know I have helped many people over the years through my blogs, product feedback, videos, demos, books, and podcasts, and many folks have asked in the past if they could return the favor. This is your chance to do so. Again, small or large, it does not matter, all help is appreciated!

All proceeds will go to charity! Please share, repost, and donate, all help is appreciated!
Folks from the Netherlands please use: https://www.roparun.nl/fundraisers/duncan-epping
Folks outside of the Netherlands please use: https://www.gofundme.com/f/living-life-despite-cancer


What does Datastore Sharing/HCI Mesh/vSAN Max support when stretched?

Duncan Epping · Oct 31, 2023 · 9 Comments

This question has come up a few times now: what does Datastore Sharing/HCI Mesh/vSAN Max support when stretched? It is a question which keeps coming up somehow, and I personally had some challenges finding the statements in our documentation as well. I just found the statement, and I wanted to first of all point people to it, and then also clarify it so there is no question. If I am using Datastore Sharing / HCI Mesh, or will be using vSAN Max, and my vSAN Datastore is stretched, what does VMware support (and what does it not)?

We have multiple potential combinations; let me list them and add whether each is supported or not. Note that this is at the time of writing, with the currently available version (vSAN 8.0 U2).

  • vSAN Stretched Cluster datastore shared with vSAN Stretched Cluster –> Supported
  • vSAN Stretched Cluster datastore shared with vSAN Cluster (not stretched) –> Supported
  • vSAN Stretched Cluster datastore shared with Compute Only Cluster (not stretched) –> Supported
  • vSAN Stretched Cluster datastore shared with Compute Only Cluster (stretched, symmetric) –> Supported
  • vSAN Stretched Cluster datastore shared with Compute Only Cluster (stretched, asymmetric) –> Not Supported

So what is the difference between symmetric and asymmetric? The below image, which comes from the vSAN Stretched Configuration documentation, explains it best. I think the asymmetric scenario is the most likely one here, so if you are running a stretched vSAN cluster and a stretched compute-only cluster, it most likely is not supported.

[Image: symmetric vs. asymmetric stretched compute-only cluster configurations]

This also applies to vSAN Max by the way. I hope that helps. Oh and before anyone asks, if the “server side” is not stretched it can be connected to a stretched environment and is supported.

 

Unexplored Territory episode 59: Introducing vSAN Max!

Duncan Epping · Oct 23, 2023 · Leave a Comment

Two months ago VMware introduced vSAN Max at VMware Explore. I wrote about it in this blog. Last week I had a conversation with Kalyan Krishnaswamy on the topic of vSAN Max, for which Kalyan is the Product Manager. I figured I would share the episode via my blog as well for those who are not subscribed to the Unexplored Territory podcast just yet. Note, you can either listen to it below, or just listen via Spotify, Apple, or anywhere else you get your podcasts.

Witness resiliency feature with a 2-node cluster

Duncan Epping · Oct 9, 2023 · Leave a Comment

A few weeks ago I had a conversation with a customer about a large vSAN ESA 2-node deployment they were planning for. One of the questions they had was whether, with a 2-node configuration with nested fault domains, they would be able to tolerate a witness failure after one of the nodes had gone down. I had tested this for a stretched cluster, but I hadn't tested it with a 2-node configuration. Will we actually see the votes be recalculated after a host failure, and will the VM remain up and running when the witness fails after the votes have been recalculated?

Let's just test it and look in RVC at what happens in each case. Let's look at the healthy output first, then we will look at a host failure, followed by the witness failure:

Healthy

    DOM Object: 71c32365-667e-0195-1521-0200ab157625 
      RAID_1
        Concatenation
          Component: 71c32365-b063-df99-2b04-0200ab157625 
            votes: 2, usage: 0.0 GB, proxy component: true
          RAID_0
            Component: 71c32365-f49e-e599-06aa-0200ab157625 
              votes: 1, usage: 0.0 GB, proxy component: true
            Component: 71c32365-681e-e799-168d-0200ab157625 
              votes: 1, usage: 0.0 GB, proxy component: true
            Component: 71c32365-06d3-e899-b3b2-0200ab157625 
              votes: 1, usage: 0.0 GB, proxy component: true
        Concatenation
          Component: 71c32365-e0cb-ea99-9c44-0200ab157625 
            votes: 1, usage: 0.0 GB, proxy component: false
          RAID_0
            Component: 71c32365-6ac2-ee99-1f6d-0200ab157625 
               votes: 1, usage: 0.0 GB, proxy component: false
            Component: 71c32365-e03f-f099-eb12-0200ab157625 
               votes: 1, usage: 0.0 GB, proxy component: false
            Component: 71c32365-6ad0-f199-a021-0200ab157625 
               votes: 1, usage: 0.0 GB, proxy component: false
      Witness: 71c32365-8c61-f399-48c9-0200ab157625 
        votes: 4, usage: 0.0 GB, proxy component: false

With 1 host down, as you can see, the votes for the witness changed, and of course the state also changed from "active" to "absent".

    DOM Object: 71c32365-667e-0195-1521-0200ab157625 
      RAID_1
        Concatenation (state: ABSENT (6))
          Component: 71c32365-b063-df99-2b04-0200ab157625 
            votes: 1, proxy component: false
          RAID_0
            Component: 71c32365-f49e-e599-06aa-0200ab157625 
              votes: 1, proxy component: false
            Component: 71c32365-681e-e799-168d-0200ab157625 
              votes: 1, proxy component: false
            Component: 71c32365-06d3-e899-b3b2-0200ab157625 
              votes: 1, proxy component: false
        Concatenation
          Component: 71c32365-e0cb-ea99-9c44-0200ab157625 
             votes: 2, usage: 0.0 GB, proxy component: false
          RAID_0
            Component: 71c32365-6ac2-ee99-1f6d-0200ab157625 
              votes: 1, usage: 0.0 GB, proxy component: false
            Component: 71c32365-e03f-f099-eb12-0200ab157625
              votes: 1, usage: 0.0 GB, proxy component: false
            Component: 71c32365-6ad0-f199-a021-0200ab157625 
              votes: 1, usage: 0.0 GB, proxy component: false
      Witness: 71c32365-8c61-f399-48c9-0200ab157625 
        votes: 1, usage: 0.0 GB, proxy component: false

And after I failed the witness, of course I had to check whether the VM was still running and didn't show up as inaccessible in the UI, and indeed it did not. vSAN and the Witness Resiliency feature worked as I expected. (Yes, I double-checked it through RVC as well, and the VM was "active".)
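If you want to do a similar check across the whole cluster rather than per VM, RVC's object status report is a quick way to confirm that nothing has gone inaccessible. A sketch, with placeholder datacenter and cluster names:

    /localhost> vsan.obj_status_report /localhost/MyDC/computers/MyCluster

The report summarizes the health of all vSAN objects in the cluster, so any inaccessible or orphaned objects would show up there.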


