
Yellow Bricks

by Duncan Epping


vSAN

Witness resiliency feature with a 2-node cluster

Duncan Epping · Oct 9, 2023 ·

A few weeks ago I had a conversation with a customer about a large vSAN ESA 2-node deployment they were planning. One of the questions they had was whether, with a 2-node configuration with nested fault domains, they would be able to tolerate a witness failure after one of the nodes had gone down. I had tested this for a stretched cluster, but not yet for a 2-node configuration. Will we actually see the votes be recalculated after a host failure, and will the VM remain up and running when the witness fails after the votes have been recalculated?

Let’s just test it and look in RVC at what happens in each case. Let’s look at the healthy output first, then at a host failure, followed by the witness failure:

Healthy

    DOM Object: 71c32365-667e-0195-1521-0200ab157625 
      RAID_1
        Concatenation
          Component: 71c32365-b063-df99-2b04-0200ab157625 
            votes: 2, usage: 0.0 GB, proxy component: true
          RAID_0
            Component: 71c32365-f49e-e599-06aa-0200ab157625 
              votes: 1, usage: 0.0 GB, proxy component: true
            Component: 71c32365-681e-e799-168d-0200ab157625 
              votes: 1, usage: 0.0 GB, proxy component: true
            Component: 71c32365-06d3-e899-b3b2-0200ab157625 
              votes: 1, usage: 0.0 GB, proxy component: true
        Concatenation
          Component: 71c32365-e0cb-ea99-9c44-0200ab157625 
            votes: 1, usage: 0.0 GB, proxy component: false
          RAID_0
            Component: 71c32365-6ac2-ee99-1f6d-0200ab157625 
               votes: 1, usage: 0.0 GB, proxy component: false
            Component: 71c32365-e03f-f099-eb12-0200ab157625 
               votes: 1, usage: 0.0 GB, proxy component: false
            Component: 71c32365-6ad0-f199-a021-0200ab157625 
               votes: 1, usage: 0.0 GB, proxy component: false
      Witness: 71c32365-8c61-f399-48c9-0200ab157625 
        votes: 4, usage: 0.0 GB, proxy component: false
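A quick way to compare the two states is to tally the per-component votes in an RVC listing with a throwaway parse; a minimal sketch, using the vote lines from the healthy output above:

```python
import re

# The nine "votes:" lines from the healthy RVC listing above, abbreviated.
rvc_listing = """
votes: 2, proxy component: true
votes: 1, proxy component: true
votes: 1, proxy component: true
votes: 1, proxy component: true
votes: 1, proxy component: false
votes: 1, proxy component: false
votes: 1, proxy component: false
votes: 1, proxy component: false
votes: 4, proxy component: false
"""

# Pull every vote count out of the listing and sum them.
votes = [int(v) for v in re.findall(r"votes: (\d+)", rvc_listing)]
print(len(votes), sum(votes))  # 9 13
```

Running the same parse against the post-failure listing makes the recalculation easy to spot at a glance.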

With 1 host down, as you can see, the votes for the witness changed, and of course the state of the impacted components also changed from “active” to “absent”.

    DOM Object: 71c32365-667e-0195-1521-0200ab157625 
      RAID_1
        Concatenation (state: ABSENT (6))
          Component: 71c32365-b063-df99-2b04-0200ab157625 
            votes: 1, proxy component: false
          RAID_0
            Component: 71c32365-f49e-e599-06aa-0200ab157625 
              votes: 1, proxy component: false
            Component: 71c32365-681e-e799-168d-0200ab157625 
              votes: 1, proxy component: false
            Component: 71c32365-06d3-e899-b3b2-0200ab157625 
              votes: 1, proxy component: false
        Concatenation
          Component: 71c32365-e0cb-ea99-9c44-0200ab157625 
             votes: 2, usage: 0.0 GB, proxy component: false
          RAID_0
            Component: 71c32365-6ac2-ee99-1f6d-0200ab157625 
              votes: 1, usage: 0.0 GB, proxy component: false
            Component: 71c32365-e03f-f099-eb12-0200ab157625
              votes: 1, usage: 0.0 GB, proxy component: false
            Component: 71c32365-6ad0-f199-a021-0200ab157625 
              votes: 1, usage: 0.0 GB, proxy component: false
      Witness: 71c32365-8c61-f399-48c9-0200ab157625 
        votes: 1, usage: 0.0 GB, proxy component: false

And after I failed the witness, of course, we had to check whether the VM was still running and didn’t show up as inaccessible in the UI, and indeed it remained accessible. vSAN and the Witness Resiliency feature worked as I expected. (Yes, I double-checked through RVC as well, and the VM was “active”.)

Do I need 2 isolation addresses with a (vSAN) stretched cluster for vSphere HA?

Duncan Epping · Sep 27, 2023 ·

This question has come up multiple times now, so I figured I would write a quick post about it: do you need 2 isolation addresses with a (vSAN) stretched cluster for vSphere HA? The question comes up because the documentation has best practices around the configuration of HA isolation addresses for stretched clusters. The documentation (both for vSAN as well as traditional stretched storage) states that you need two reliable addresses, one in each location.

Now, I have had the above question multiple times, as some folks have mentioned that they can use a Gateway Address with Cisco ACI, which would still be accessible in both locations even if there’s a partition due to, for instance, an ISL failure. If that is the case, and the IP address is indeed available in both locations during those types of failure scenarios, then a single IP address as your isolation address would suffice.

You will, however, need to make sure that the IP address is reachable over the vSAN network when using vSAN as your stretched storage platform. (When vSAN is enabled, vSphere HA uses the vSAN network for its communications.) If it is reachable, you can simply define the isolation address by setting the advanced setting “das.isolationaddress0”. It is also recommended to disable the use of the default gateway of the management network by setting “das.usedefaultisolationaddress” to false for vSAN-based environments.
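As a sketch, the two vSphere HA advanced settings would look like this (the gateway IP is a placeholder for an address that is reachable over the vSAN network in both locations):

```
das.isolationaddress0 = 192.168.10.1
das.usedefaultisolationaddress = false
```

Both are set per cluster under the vSphere HA advanced options.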

I have requested the vSAN stretched clustering documentation to be updated to reflect this.

vSAN ReadyNode emulated configurations? What are those?

Duncan Epping · Sep 26, 2023 ·

Last week Pete Koehler dropped a bomb on us when he blogged about vSAN ReadyNode emulated configurations. Since then I have had a few folks asking what this exactly is. It is fairly simple: some vendors have special SKUs for ReadyNodes, which doesn’t always make it easy to configure a ReadyNode to the desired specifications based on the minimum requirements for vSAN ESA and the supported components. SAY WHAT?

Well, just imagine you are a Dell shop and you want to use the R750. You simply check whether the R750 is listed on the VCG, you check the minimum CPU spec, and you go from there based on the minimum (and maximum) specifications for vSAN ESA and on your workload profile. Just as an example, the minimum specifications for vSAN ESA are now as follows with the introduction of the vSAN AF-0 ReadyNode configuration:

  • Minimum of 16 cores Intel or AMD
    • For example: 2 x Intel Xeon® Gold 6334 3.6G, 8 cores
    • Or: 1 x AMD EPYC 9124 16C 200W 3.0GHz Processor
  • Minimum of 128GB memory
  • Minimum of 10GbE
  • Minimum of 2 NVMe Devices (as listed on vSAN VCG) and 3.2TB per host

Now that we know what those minimums are, I could simply go to the Dell website and spec a Dell R750 Server as desired. This server could have for instance:

  • 2 x Intel® Xeon Gold 6342 2.8G, 24 cores
  • 256GB memory
  • 25GbE networking
  • 6 x Dell Ent NVMe CM6 RI 3.84TB

Even though this exact configuration is not on the list as a ReadyNode configuration, it would be supported, as all the components are certified, the server itself is certified as a vSAN ReadyNode platform, and we are following the guidelines as documented in the vSAN ESA RN KB.
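The check described above can be sketched in a few lines of Python; the field names and the MINIMUMS table are my own shorthand for the bullet list, not an official sizing tool:

```python
# Sketch: validate a host build against the vSAN ESA minimums listed above.
# The keys and thresholds simply mirror the bullet list in this post.
MINIMUMS = {
    "cpu_cores": 16,        # Intel or AMD
    "memory_gb": 128,
    "nic_speed_gbe": 10,
    "nvme_devices": 2,      # as listed on the vSAN VCG
    "nvme_capacity_tb": 3.2,
}

def check_esa_minimums(host: dict) -> list:
    """Return a list of violations; an empty list means the build qualifies."""
    return [
        f"{key}: {host.get(key, 0)} is below the minimum of {minimum}"
        for key, minimum in MINIMUMS.items()
        if host.get(key, 0) < minimum
    ]

# The example Dell R750 build from the post:
r750 = {
    "cpu_cores": 2 * 24,           # 2 x Intel Xeon Gold 6342, 24 cores each
    "memory_gb": 256,
    "nic_speed_gbe": 25,
    "nvme_devices": 6,
    "nvme_capacity_tb": 6 * 3.84,  # 6 x 3.84TB NVMe
}
print(check_esa_minimums(r750))  # [] -> every minimum is met
```

Always verify the individual components against the VCG as well; this sketch only covers the headline numbers.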

I hope this helps those who are going through the process of procuring hardware for vSAN ESA.

vSphere 8.0 U2 and vSAN 8.0 U2 just shipped, learn all about it here!

Duncan Epping · Sep 22, 2023 ·

vSphere 8.0 U2 and vSAN 8.0 U2 just shipped, and of course the Unexplored Territory Podcast has already covered this. If you want to learn all about it make sure to listen to the episode below. Or of course read the release notes (vCenter, ESXi, vSAN).

You can find the vSAN 8.0 U2 episode on Spotify (https://bit.ly/3QNjpFk) and Apple (https://bit.ly/3QPt7XL), as well as in any other podcast app, or simply listen via the embedded player below!

You can find the vSphere 8.0 U2 episode on Spotify (https://bit.ly/3snOh5l) and Apple (https://bit.ly/45lRK2Q), as well as in any other podcast app, or simply listen via the embedded player below!

Scalable Snapshots demo with the vSAN 8.0 Express Storage Architecture

Duncan Epping · Sep 5, 2023 ·

Starting with vSAN 8 a brand new architecture was introduced called “Express Storage Architecture”. Over the last year or so a lot of information has been shared about ESA and the benefits of ESA. One of the things which ESA introduces is much-improved snapshot scalability.

With vSAN OSA, and with VMFS, when you create a snapshot you typically immediately see a performance degradation. This is because both VMFS and vSAN OSA still operate using the redo-log based snapshot mechanism. This means that with vSAN OSA when you create a snapshot a new object is created and writes are re-directed. It also means that reads will be coming from various files, if you have one or more snapshots. This mechanism is, unfortunately, not very effective. Let me borrow a diagram that is part of a post John Nicholson wrote to demonstrate that old logic.

With vSAN 8 ESA the mechanism has changed and no longer does vSAN, or vSphere for that matter, create an additional object. vSAN ESA handles this on a meta-data level. In other words, instead of redirecting writes and traversing files for reads, vSAN now leverages a highly efficient B-Tree structure and pointers to keep track of which block is associated with which snapshot.
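The difference can be illustrated with a toy model (my own sketch, not vSAN ESA's actual implementation): taking a snapshot only copies block pointers, a write simply lands in the active map, and a read is a single lookup rather than a walk through a chain of redo-log files.

```python
# Toy model of metadata/pointer-based snapshots (illustration only).
class PointerSnapVolume:
    def __init__(self):
        self.active = {}      # block number -> data for the live volume
        self.snapshots = []   # one frozen block map per snapshot

    def write(self, block, data):
        self.active[block] = data          # writes are never redirected

    def read(self, block, snapshot=None):
        table = self.active if snapshot is None else self.snapshots[snapshot]
        return table.get(block)            # one lookup, no chain traversal

    def take_snapshot(self):
        # Copy the block map (pointers), not the data itself; the cost is
        # metadata-only, which is why creation and deletion stay cheap.
        self.snapshots.append(dict(self.active))

vol = PointerSnapVolume()
vol.write(0, "v1")
vol.take_snapshot()
vol.write(0, "v2")                            # overwrite after the snapshot
print(vol.read(0), vol.read(0, snapshot=0))   # v2 v1
```

Contrast this with a redo log, where the overwrite would land in a new delta file and every read would have to probe each delta in turn.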

Not only is this more efficient from a capacity perspective, but more importantly, it is very efficient from a performance standpoint. I ran half a dozen tests in my lab, and what I saw was a performance impact below 2% between a VM without a snapshot and a VM with one or multiple snapshots. I could NOT see a significant difference between the first and the fifth snapshot. I do want to point out that my lab is not officially certified to run vSAN ESA; nevertheless, I was very impressed with the results.

During the last run, I actually recorded the whole exercise. In this demo, I show the creation of one snapshot while the VM is running a benchmark (HCIBench). During the testing, I created not one but various snapshots, and of course, I deleted all of them as well. You have probably all experienced extensive stun times during the deletion of a snapshot, and this is where vSAN ESA shines. Stun times have been reduced by a factor of 100, and that is something I am sure each of you will appreciate. Why have they been reduced so drastically? Simply because we no longer have to copy data from one vSAN object to another. This makes a huge difference, not just for stun times, but also for performance in general (latency, IOPS, throughput). If you are interested, have a look at the demo!



About the Author

Duncan Epping is a Chief Technologist and Distinguished Engineering Architect at Broadcom. Besides writing on Yellow-Bricks, Duncan is the co-author of the vSAN Deep Dive and the vSphere Clustering Deep Dive book series. Duncan is also the host of the Unexplored Territory Podcast.


Copyright Yellow-Bricks.com © 2026