Yellow Bricks

by Duncan Epping


vSAN Stretched: Why is the witness not part of the cluster when the link between a data site and the witness fails?

Duncan Epping · Jun 25, 2024 · Leave a Comment

Last week I received a question about vSAN Stretched that had me wondering for a while what on earth was going on. The person who asked it was running through several failure scenarios, some of which I have also documented in the past here. The question was: what is supposed to happen in the scenario shown in the diagram, where the link between the preferred site (Site A) and the witness fails?

The answer, or at least what I thought the answer was, is simple: all VMs remain running, or said differently, there’s no impact on vSAN. When I ran the test, the outcome was indeed the same as what I had documented, and as what is documented in the Stretched Clustering Guide and the PoC Guide: the VMs remain running. However, one thing that was noticed is that when this situation occurs and the connection between Site A and the witness is lost, the witness is somehow no longer part of the cluster, which is not what I would expect. The reason I would not expect this is that if a second failure occurred, for instance the ISL between Site A and Site B going down, it would directly impact all VMs. At least, that is what I assumed.

However, when I triggered that second failure and disconnected the ISL between Site A and Site B, I saw the witness re-appear immediately, I saw the witness objects go from “absent” to “active”, and, more importantly, all VMs remained running. The reason this happens is fairly straightforward: in a configuration like this, vSAN has a “leader” and a “backup”, and they each run in a separate fault domain. Both the leader and the backup need to be able to communicate with the witness for the witness to function correctly. If the connection between Site A and the witness is gone, then either the leader or the backup can no longer communicate with the witness, and the witness is taken out of the cluster.

So why does the witness return for duty when the second failure is triggered? Well, when the second failure is triggered, the leader is restarted in Site B (as Site A is deemed lost), and the backup is already running in Site B. As both the leader and the backup can communicate with the witness again, the witness returns for duty, and so do all of the witness components, automatically and instantly. This means that even though the ISL between Site A and Site B failed after the witness was taken out of the cluster, all VMs remain accessible, as the witness is reintroduced instantly to ensure availability of the workloads. Pretty cool! (Thanks to vSAN engineering for providing these insights on why this happens!)
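
For those who like to reason about this in code, below is a minimal, purely illustrative sketch (in Python) of the membership logic described above: the witness only counts as part of the cluster when both the leader and the backup fault domains can reach it. The names and structure are mine for illustration and do not reflect vSAN’s actual implementation.

```python
# Purely illustrative model of the behavior described above; the class and
# function names are hypothetical and do not reflect vSAN's actual code.

class Site:
    PREFERRED = "Site A"
    SECONDARY = "Site B"
    WITNESS = "Witness"

def witness_in_cluster(links, leader_site, backup_site):
    """The witness is only part of the cluster when BOTH the leader and
    the backup fault domains can reach it."""
    return (links.get((leader_site, Site.WITNESS), False) and
            links.get((backup_site, Site.WITNESS), False))

# Initial state: leader in Site A, backup in Site B, all links up.
links = {
    (Site.PREFERRED, Site.WITNESS): True,
    (Site.SECONDARY, Site.WITNESS): True,
    (Site.PREFERRED, Site.SECONDARY): True,   # the ISL
}
leader, backup = Site.PREFERRED, Site.SECONDARY
print(witness_in_cluster(links, leader, backup))   # True: witness is in the cluster

# Failure 1: the link between Site A and the witness goes down.
links[(Site.PREFERRED, Site.WITNESS)] = False
print(witness_in_cluster(links, leader, backup))   # False: witness leaves the cluster

# Failure 2: the ISL fails, Site A is deemed lost, the leader restarts in Site B.
links[(Site.PREFERRED, Site.SECONDARY)] = False
leader = Site.SECONDARY
print(witness_in_cluster(links, leader, backup))   # True: witness rejoins, VMs stay up
```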

VMware Explore 2024 Las Vegas, my top sessions!

Duncan Epping · Jun 19, 2024 · 1 Comment

I’ve been doing this for as long as I can remember: creating a list of my personal favorite sessions for our yearly event, VMware Explore. Of course, I created one for 2024 in Las Vegas as well. Typically these are the sessions I will try to attend personally, that is, if they are not full. Yes, the list is focused on areas I am interested in and is not necessarily a representation of the broad diversity of topics discussed at the event. But I am sure a few others will produce a similar list at some point, which may (or may not) be more balanced.

Normally I would have a list of sessions on my page, but the cool thing is that on the Explore website I was now able to create a “targeted agenda”, so you can simply go there and favorite those sessions from your account straight away. This makes things a lot easier than in previous years, when I actually had to copy and paste all the details, create the links, and then have you go back and forth between the various pages to favorite what you also find interesting. So if you are going to Explore, and you are a bit geeky like I am, the sessions in the following link may be of interest to you!

https://myevents.vmware.com/widget/vmware/explore2024lv/1718781547715001Ezh2

Memory Tiering… Say what?!

Duncan Epping · Jun 14, 2024 · 1 Comment

Recently I presented a keynote at the Belgium VMUG. The topic was Innovation at VMware by Broadcom, although I guess I should say Innovation at Broadcom to be more accurate. During the keynote I briefly went over the process and the various types of innovation and what this can lead to. During the session, I discussed three projects, namely vSAN ESA, the Distributed Services Engine, and something that is being worked on called Memory Tiering.

Memory Tiering is a very interesting concept that was first publicly discussed at Explore (or VMworld, as I guess it was still called) a few years ago as a potential future feature. You may ask yourself why anyone would want to tier memory, as the impact from a performance standpoint can be significant. There are various reasons to do so, one of them being the cost of memory. Another problem the industry is facing is that memory capacity (and performance) has not grown at the same rate as CPU capacity, which has resulted in many environments being memory-bound; said differently, the imbalance between CPU and memory has increased substantially. That’s why VMware started Project Capitola.

When Project Capitola was discussed, most of the focus was on Intel Optane, and most of us know what happened to that. I guess some assumed that this would also result in Project Capitola, or memory tiering and memory pooling technology in general, being scrapped. That is most definitely not the case: VMware has gone full steam ahead and has been discussing the progress in public, although you need to know where to look. If you listen to that session, it is clear that there are various efforts that would allow customers to tier memory in various ways, one of them of course being the various CXL-based solutions that are coming to market now/soon.

One of these is memory tiering via a CXL accelerator card, basically an FPGA whose sole purpose is to increase memory capacity, offload memory tiering, and accelerate certain functionality where memory is crucial, like for instance vMotion. As mentioned in the SNIA session, using an accelerator card can lead to a 30% reduction in migration times. An accelerator card like this will also open up other opportunities, like pooling memory, which is something customers have been asking for since we created the concept of a cluster: being able to share compute resources across hosts. Just imagine, your VM could use memory capacity available on another host without having to move the VM. Yes, before anyone comments on this, I do realize that this could potentially have a significant performance impact.

That is of course where the VMware logic comes into play. At VMworld in 2021, when Project Capitola was presented, the team also shared the performance results of recent tests, which showed that the performance degradation was around 10% when memory was split 50% DRAM and 50% Optane. I was watching the SNIA session, and its demo shows the true power of VMware vSphere, memory tiering, and acceleration (Project Peaberry, as it is called). On average the performance degradation was around 10%, yet roughly 40% of virtual memory was accessed via the Peaberry accelerator. Do note that the tiering is completely transparent to the application; this works for all the different types of workloads out there. The crucial part to understand here is that because the hypervisor is already responsible for memory management, it knows which pages are hot and which pages are cold, which also means it can determine which pages it can move to a different tier while maintaining performance.
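
To make the hot/cold idea a bit more concrete, here is a toy sketch of how a hypervisor-level tiering policy could classify pages and demote the cold ones to a slower CXL-backed tier. This is a conceptual illustration under my own assumptions (the Page class, the access counts, and the threshold are all hypothetical), not how vSphere actually implements memory tiering.

```python
# Conceptual sketch only: a toy hot/cold page classifier and tier placement,
# loosely inspired by the idea that the hypervisor tracks page access and can
# demote cold pages to a slower tier. Names and thresholds are hypothetical.

from dataclasses import dataclass

@dataclass
class Page:
    page_id: int
    accesses_last_interval: int   # e.g. derived from access-bit scanning
    tier: str = "DRAM"

HOT_THRESHOLD = 8   # accesses per sampling interval; arbitrary for this example

def retier(pages, dram_capacity):
    """Keep the hottest pages in DRAM up to its capacity; demote the rest
    to the slower CXL-backed tier."""
    ranked = sorted(pages, key=lambda p: p.accesses_last_interval, reverse=True)
    for i, page in enumerate(ranked):
        hot_enough = page.accesses_last_interval >= HOT_THRESHOLD
        page.tier = "DRAM" if i < dram_capacity and hot_enough else "CXL"
    return pages

# Example: 6 pages, room for 3 in DRAM.
pages = [Page(i, accesses) for i, accesses in enumerate([50, 2, 30, 0, 12, 1])]
for p in retier(pages, dram_capacity=3):
    print(p.page_id, p.tier)   # pages 0, 2, 4 stay in DRAM; the cold ones move to CXL
```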

Anyway, I cannot reveal too much about what may, or may not, be coming in the future. What I can promise though is that I will make sure to write a blog as soon as I am allowed to talk about more details publicly, and I will probably also record a podcast with the product manager(s) when the time is there, so stay tuned!

Thanks for your support!

Duncan Epping · May 23, 2024 · 1 Comment

About 7-8 months ago I shared with you that I would be participating in the ROPA RUN for charity. As explained, the ROPA RUN is a charity relay running event where you start near Paris and run back to Rotterdam. Each team that participates has 8 runners, who are divided into 2 groups, and each group runs 4-5 hrs in a relay fashion.

Last weekend I participated in the event, and what an experience that was. From a running perspective, it definitely was something I had never experienced before. The relay mechanism is what made things more challenging than expected: after every 1-2 km you switch runners and you get ~15 minutes of rest, but this also means you cool down. Although I didn’t run a huge number of kilometers, between 17 and 22 km per “shift” of 4-5 hrs, when you have to sit down after every 1-2 km it gets more challenging to get started every single time it is your turn.

What probably was the biggest challenge for me, though, was the lack of sleep. As we had to travel between locations while the other group was running, and we also had to freshen up, eat, and hydrate, it resulted in around 1.5 hrs of sleep combined over 2 nights. Unfortunately for me, I also had a bad night of sleep on Friday (3 hrs), which definitely didn’t help either. This was the most challenging aspect of the whole event… My body appreciates sleep. But I already knew that, I guess. I knew I would get a splitting headache if I were sleep-deprived, and I knew that running with a headache caused by lack of sleep would be very unpleasant.

Why did I sign up knowing that it would be unpleasant? Well, first and foremost because it is a charity event, and I feel everyone should aim to give back in some shape or form when they can. The other reason of course is that I like to challenge myself; sometimes you need to do things that are far out of your comfort zone, things you may not enjoy while you are in the moment. The third reason was that I would be able to hang out with friends for three days straight. Anyway, that wasn’t why I wanted to write this post. I simply wanted to thank everyone who supported me by reposting my request for donations, and especially those who donated. I personally raised over 2700 euros, and every single cent of that went to charity! Thanks everyone, I truly appreciate it!

Using vSAN Datastore Sharing aka HCI Mesh to connect OSA with ESA, is that supported?

Duncan Epping · Apr 10, 2024 · 3 Comments

I’ve seen a few questions around this: is it possible, or supported, to use vSAN Datastore Sharing aka HCI Mesh to connect OSA with ESA, or of course the other way around? I can be brief about this: no, it is not supported, and it isn’t possible either. vSAN HCI Mesh, or Datastore Sharing, uses the vSAN proprietary protocol called RDT. vSAN OSA uses a different version of RDT than vSAN ESA, and these versions are unfortunately not compatible at the moment. As a result, you cannot use vSAN Datastore Sharing to share your OSA capacity with an ESA cluster, or the other way around. Hope that clarifies things.
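
Just to illustrate the version-mismatch point, here is a tiny, hypothetical sketch: datastore sharing only works when both clusters speak the same RDT protocol version. The version identifiers and the function below are made up for the example and are not vSAN’s actual implementation.

```python
# Toy illustration only: an incompatible protocol version prevents two clusters
# from establishing a datastore-sharing session. The version numbers and the
# function are hypothetical, not vSAN's actual RDT implementation.

def can_share_datastore(server_rdt_version: int, client_rdt_version: int) -> bool:
    """Sharing is only possible when both sides speak the same RDT version."""
    return server_rdt_version == client_rdt_version

OSA_RDT, ESA_RDT = 1, 2                         # hypothetical version identifiers
print(can_share_datastore(OSA_RDT, OSA_RDT))    # True:  OSA <-> OSA works
print(can_share_datastore(OSA_RDT, ESA_RDT))    # False: OSA <-> ESA is not supported
```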
