
Yellow Bricks

by Duncan Epping


Memory Tiering… Say what?!

Duncan Epping · Jun 14, 2024 · 1 Comment

Recently I presented a keynote at the Belgium VMUG; the topic was Innovation at VMware by Broadcom, although I guess I should say Innovation at Broadcom to be more accurate. During the keynote I briefly went over the innovation process, the various types of innovation, and what they can lead to. During the session I discussed three projects: vSAN ESA, the Distributed Services Engine, and something that is still being worked on called Memory Tiering.

Memory Tiering is a very interesting concept that was first publicly discussed at Explore (or VMworld, as I guess it was still called) a few years ago as a potential future feature. You may ask yourself why anyone would want to tier memory, as the impact from a performance standpoint can be significant. There are various reasons to do so, one of them being the cost of memory. Another problem the industry is facing is that memory capacity (and performance) has not grown at the same rate as CPU capacity, which has resulted in many environments being memory-bound; put differently, the imbalance between CPU and memory has increased substantially. That’s why VMware started Project Capitola.

When Project Capitola was discussed, most of the focus was on Intel Optane, and most of us know what happened to that. I guess some assumed this would also result in Project Capitola, or memory tiering and memory pooling technology, being scrapped. That is most definitely not the case: VMware has gone full steam ahead and has been discussing the progress in public, although you need to know where to look. If you listen to that session, it is clear that there are various efforts that would allow customers to tier memory in different ways, one of them of course being the various CXL-based solutions that are coming to market now or soon.

One of these is memory tiering via a CXL accelerator card, basically an FPGA whose sole purpose is to increase memory capacity, offload memory tiering, and accelerate certain functionality where memory is crucial, like for instance vMotion. As mentioned in the SNIA session, using an accelerator card can lead to a 30% reduction in migration times. An accelerator card like this also opens up other opportunities, like pooling memory, which is something customers have been asking for since we created the concept of a cluster: being able to share compute resources across hosts. Just imagine, your VM could use memory capacity available on another host without having to move the VM. Yes, before anyone comments on this, I do realize that this could potentially have a significant performance impact.

That is of course where the VMware logic comes into play. At VMworld in 2021, when Project Capitola was presented, the team also shared the results of recent performance tests, which showed that the performance degradation was around 10% with a configuration of 50% DRAM and 50% Optane memory. The demo in the SNIA session shows the true power of vSphere memory tiering and acceleration (Project Peaberry, as it is called): on average the performance degradation was around 10%, yet roughly 40% of virtual memory was accessed via the Peaberry accelerator. Do note that the tiering is completely transparent to the application, so this works for all different types of workloads out there. The crucial part to understand is that because the hypervisor is already responsible for memory management, it knows which pages are hot and which pages are cold, which also means it can determine which pages it can move to a different tier while maintaining performance.
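To make the hot/cold idea a bit more concrete, here is a purely illustrative Python sketch of the general principle (my simplification, not VMware's implementation): track when each page was last touched, treat recently touched pages as hot, and demote pages that have gone cold to a slower tier such as CXL-attached memory. The threshold and tier names are made up for the example.

```python
import time

COLD_AFTER_SECONDS = 30  # assumption: a page untouched for 30s is considered "cold"

class TieredPageTable:
    """Toy model of access-based memory tiering, for illustration only."""

    def __init__(self):
        self.last_access = {}  # page number -> timestamp of last access
        self.tier = {}         # page number -> "dram" (fast) or "cxl" (slower)

    def touch(self, page):
        # Every access makes the page hot again; keep (or promote) it in DRAM.
        self.last_access[page] = time.monotonic()
        self.tier[page] = "dram"

    def demote_cold_pages(self):
        # Periodically scan for pages that have not been touched recently
        # and move them to the slower tier, freeing up DRAM for hot pages.
        now = time.monotonic()
        for page, last in self.last_access.items():
            if self.tier[page] == "dram" and now - last > COLD_AFTER_SECONDS:
                self.tier[page] = "cxl"
```

A real memory scheduler obviously has far more information to work with (access frequency, working-set estimates, and so on), but the principle is the same: the component that already manages memory is in the best position to decide which pages can safely live in a slower tier.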

Anyway, I cannot reveal too much about what may, or may not, be coming in the future. What I can promise, though, is that I will make sure to write a blog as soon as I am allowed to share more details publicly, and I will probably also record a podcast with the product manager(s) when the time comes, so stay tuned!

Thanks for your support!

Duncan Epping · May 23, 2024 · 1 Comment

About 7-8 months ago I shared with you that I would be participating in the ROPA RUN for charity. As explained, the ROPA RUN is a charity relay running event where you start near Paris and run back to Rotterdam. Each team that participates has 8 runners who are divided into 2 groups, and each group runs 4-5 hrs in a relay fashion.

Last weekend I participated in the event, and what an experience that was. From a running perspective, it definitely was something I had never experienced before. The relay mechanism is what made things more challenging than expected: after every 1-2 KM you switch runners and get ~15 minutes of rest, but this also means you cool down. Although I didn’t run a huge number of kilometers, between 17 and 22 KM per “shift” of 4-5 hrs, when you have to sit down after every 1-2 KM it gets more challenging to get started every single time it is your turn.

What probably was the biggest challenge for me, though, was the lack of sleep. As we had to travel between locations while the other group was running, and also had to freshen up, eat, and hydrate, I ended up with around 1.5 hrs of sleep combined over 2 nights. Unfortunately, I also had a bad night of sleep on Friday (3 hrs), which definitely didn’t help either. This was the most challenging aspect of the whole event… My body appreciates sleep, but I already knew that I guess. I knew I would get a splitting headache if I were sleep-deprived, and I knew that running with that headache would be very unpleasant.

Why did I sign up knowing that it would be unpleasant? Well, first and foremost because it is a charity event; I feel everyone should aim to give back in some shape or form when they can. The second reason, of course, is that I like to challenge myself; sometimes you need to do things that are far out of your comfort zone, things you may not enjoy while you are in the moment. The third reason was that I would be able to hang out with friends for three days straight. Anyway, that wasn’t why I wanted to write this post. I simply wanted to thank everyone who supported me by reposting my request for donations, and especially those who donated. I personally raised over 2700 euros, and every single cent of that went to charity! Thanks everyone, I truly appreciate it!

How to stop vCLS VMs from running on a vSphere HA Failover Host?

Duncan Epping · Mar 18, 2024 · 14 Comments

I’ve had this question twice in about a week, which means it is time to write a quick post: how do you stop vCLS VMs from running on a vSphere HA Failover Host? For those who don’t know, a vSphere HA Failover Host is a host that is used when a failure has occurred and vSphere HA needs to restart VMs. In some cases customers (but usually cloud partners) want to prevent any workload from using these hosts, as it could impact the cost of running the platform.

Unfortunately, within the UI you cannot specify that vCLS VMs should not run on specific hosts; you can restrict which VMs the vCLS VMs run next to, but not which hosts they run on. There is, however, an option to specify which datastores the vCLS VMs can be stored on, and this is a potential way of limiting which hosts they can run on as well. How? Well, if you create a datastore that is not presented to the designated vSphere HA Failover Host, then the vCLS VMs cannot run on that host, as the host cannot access the datastore. It is a workaround for the problem; you can find out more about the datastore placement mechanism for vCLS in this document here. Do note, as stated, that those vCLS VMs won’t be able to run on that host, so if the rest of the cluster fails and only the Failover Host is left, the vCLS VMs will not be powered on. This means that DRS will not function while those VMs are unavailable.
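If you want to verify which datastores are visible to which hosts before picking one, the pyVmomi sketch below may help. It simply dumps datastore visibility per host; the vCenter address, cluster name, and failover host name are placeholders, and configuring the actual vCLS datastore still happens in the vSphere Client.

```python
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Placeholder connection details
si = SmartConnect(host="vcenter.example.com",
                  user="administrator@vsphere.local",
                  pwd="********",
                  sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()

# Find the cluster by name (placeholder name)
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == "MyCluster")
view.Destroy()

failover_host = "esxi-failover-01.example.com"  # the designated HA failover host

# Print the datastores each host can access; any datastore that every host
# EXCEPT the failover host can see is a candidate vCLS datastore.
for host in cluster.host:
    marker = "  <-- failover host" if host.name == failover_host else ""
    ds_names = ", ".join(sorted(ds.name for ds in host.datastore))
    print(f"{host.name}{marker}: {ds_names}")

Disconnect(si)
```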

I’ve contacted vSphere HA/vCLS product management to see if we can get this handled more elegantly in the product, and it is being worked on.

Happy Holidays: vSAN 8.0 U1 Book Discounted to less than 5 USD for the ebook!

Duncan Epping · Dec 8, 2023

Holidays are coming, so we figured it was time to lower the price of the vSAN ESA 8.0 U1 book! Haven’t bought it yet? The paper edition is around $19.99 and the ebook is $4.99 (or less, depending on where you live).

It’s a lot of great content for a really low price, and as said, we made sure the ebook is priced extremely aggressively, as we prefer technical books not to be printed. Pick it up; the price will return to normal (29.99 and 9.99 USD) after the holiday season!

  • paper – 19.99 USD
  • ebook – 4.99 USD

Of course, we also have the links to other major Amazon stores:

  • United Kingdom – ebook – paper
  • Germany – ebook – paper
  • Netherlands – ebook – paper
  • Canada – ebook – paper
  • France – ebook – paper
  • Spain – ebook – paper
  • India – ebook
  • Japan – ebook – paper
  • Italy – ebook – paper
  • Mexico – ebook
  • Australia – ebook – paper
  • Brazil – ebook
  • Or just do a search in your local amazon store!

Doing network/ISL maintenance in a vSAN stretched cluster configuration!

Duncan Epping · Nov 21, 2023

I got a question earlier about maintenance of the ISL (inter-site link) in a vSAN Stretched Cluster configuration, which had me thinking for a while. The question was what you would do with your workload during the maintenance. The easiest option, of course, is to power off all VMs and simply shut down the cluster, for which vSAN has a UI option, and there’s a KB you can follow. Now, of course, there could also be a situation where the VMs need to remain running. But how does this work when you end up losing the connection between all three locations? Normally this would lead to a situation where all VMs become “inaccessible”, as you end up losing quorum.

As said, this had me thinking: you could take advantage of the “vSAN Witness Resiliency” mechanism that was introduced in vSAN 7.0 U3. How would this work?

Well, it is actually pretty straightforward: if all hosts of one site are in maintenance mode, failed, or powered off, the votes of the witness object for each VM/object will be recalculated within 3 minutes. When this recalculation has completed, the witness can go down without having any impact on the VMs. We introduced this capability to increase resiliency in a double-failure scenario, but we can (ab)use this functionality during maintenance as well. Of course I had to test this, so the first step I took was placing all hosts in one location into maintenance mode (no data evacuation). This resulted in all my VMs being vMotioned to the other site.
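If you prefer to script that first step rather than click through the UI, a minimal pyVmomi sketch like the one below could do it (the vCenter address and host names are placeholders, and this is just a sketch of the idea, not an official procedure): it puts all hosts of one site into maintenance mode with the vSAN “No data evacuation” option.

```python
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com",
                  user="administrator@vsphere.local",
                  pwd="********",
                  sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()

# Placeholder names for all hosts in the site that will undergo ISL maintenance
site_a_hosts = {"esxi-a-01.example.com", "esxi-a-02.example.com"}

# vSAN decommission mode "noAction" corresponds to "No data evacuation" in the UI
spec = vim.host.MaintenanceSpec(
    vsanMode=vim.vsan.host.DecommissionMode(objectAction="noAction"))

view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.HostSystem], True)
for host in view.view:
    if host.name in site_a_hosts:
        print(f"Entering maintenance mode: {host.name}")
        # With DRS in fully automated mode, running VMs are vMotioned away first
        WaitForTask(host.EnterMaintenanceMode_Task(timeout=0, maintenanceSpec=spec))
view.Destroy()
Disconnect(si)
```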

Next, I checked with RVC whether my votes had been recalculated. As stated, depending on the number of VMs this can take around 3 minutes in total, but it will usually be quicker. After the recalculation had completed, I powered off the Witness, and the result was that all VMs were still running.

Of course, I had to double-check on the command line using RVC (you can, for instance, use the command “vsan.vm_object_info” to check a particular object) to ensure that the components of those VMs were indeed still “ACTIVE” instead of “ABSENT”, and there you go!

Now, when the maintenance has been completed, you simply do the reverse: you power on the witness, and then you power on the hosts in the other location. After the “resync” has been completed, the VMs will be rebalanced again by DRS. Note that DRS rebalancing (or the application of “should” rules) will only happen after the resync of the VMs has been completed.
