vSAN ESA is using more CPU cycles than vSAN OSA?

Duncan Epping · Feb 1, 2023 · 7 Comments

Over the last couple of weeks, I’ve had conversations with customers and partners who have been running performance benchmarks against both vSAN ESA and vSAN OSA. As you can imagine, people want to compare version 8 of OSA against version 8 of ESA, and that is completely fair. What I noticed, though, is that some of those customers came back with comments about the CPU usage of vSAN ESA compared to OSA. The general comment we get is that vSAN ESA is using more CPU cycles than vSAN OSA.

When you look at the total number of CPU cycles consumed, it is very likely you will see vSAN ESA using more cycles than vSAN OSA. The question that typically follows is why that is the case, as VMware (the vSAN team) has been claiming that vSAN ESA is much more efficient than vSAN OSA. To be fair, it is much more efficient. For instance, data services like checksumming, encryption, and compression have moved to the top of the stack (as shown below). As a result, we don’t have to compress/encrypt data 3/4/5/6 times, but can do it once at the source and then send it over the network to the destination.

Still, that leaves the question: why is more CPU capacity used? The answer is simple: you are pushing much more IO. We’ve seen customers easily reach 4x the number of IOPS with ESA compared to OSA. Even though ESA is more efficient, if you are pushing 4x (or more) the amount of IO, remember that those additional IOs also come at a cost, and that cost is the CPU cycles needed to process them. So when you make a comparison, please compare apples to apples, not apples to oranges: look at the CPU cost per IO, or at CPU usage when driving an equal amount of IO.
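To make the apples-to-apples point concrete, here is a minimal Python sketch that normalizes CPU usage by the IO actually delivered. The `BenchmarkRun` type, the `cycles_per_io` helper, and all of the numbers are hypothetical and only there for illustration; they are not measurements from any vSAN cluster.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkRun:
    """One benchmark result. All values used below are hypothetical."""
    name: str
    iops: float     # IOs per second achieved during the run
    cpu_ghz: float  # total CPU consumed during the run, in GHz


def cycles_per_io(run: BenchmarkRun) -> float:
    """CPU cycles spent per IO: cycles per second divided by IOs per second."""
    if run.iops <= 0:
        raise ValueError("iops must be positive")
    return run.cpu_ghz * 1e9 / run.iops


# Made-up numbers for illustration only: ESA pushes 4x the IOPS
# while consuming 2x the total CPU.
osa = BenchmarkRun("OSA", iops=100_000, cpu_ghz=20.0)
esa = BenchmarkRun("ESA", iops=400_000, cpu_ghz=40.0)

for run in (osa, esa):
    print(f"{run.name}: {run.cpu_ghz:.0f} GHz total, "
          f"{cycles_per_io(run):,.0f} cycles per IO")
```

With these assumed inputs, the ESA run consumes more CPU in total but fewer cycles per IO. Total CPU alone is the apples-to-oranges number; cycles per IO is the one to compare.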

The last thing I want to add, and hopefully I can share some data in the future, is that the use of RDMA with vSAN 8 ESA seems to have a significant impact on CPU usage: it lowers the amount of CPU required to produce the same (or better) results. So it is definitely worth considering RDMA when adopting vSAN 8 ESA!

Comments

Matt Mancini says

2 February, 2023 at 00:05

100% on RDMA, it’s the way to go!

Christophe Husson says

3 February, 2023 at 12:36

Do you know when RDMA will be supported for stretched clusters?

- Duncan Epping says
  
  6 February, 2023 at 14:24
  
  I can’t comment on that unfortunately.
  
- Jordi Benet says
  
  20 May, 2023 at 19:31
  
  It won’t be supported anytime soon, because for vSAN you use Ethernet, and for RDMA you would use the RoCE protocol, which requires a lossless network. Because Ethernet is a lossy network, you need to activate multiple features on your switches, and one of them is PFC. PFC, on today’s switches, limits the distance between switches to around 800 meters (depending on the quality of your DCI dark fiber). Unless your 2 sites are less than 800m apart… you will not be able to… That’s the main reason the industry is moving to NVMe over TCP, where you don’t require a lossless network, so you can use it for stretched cluster deployments.
  
Dag Kvello says

6 February, 2023 at 13:53

I believe that the “much” higher HW cost and the demands on CPU/cores/memory have been undercommunicated, to say the least.

For example, the AMD-AF-4 Series has a 16-core, 128GB RAM requirement, while the vSAN-ESA-AF-4 has a minimum requirement of 40 cores and 512GB RAM per node.
I’m not bothering with the Network or NVMe costs in this case as they’ll be the same for OSA and ESA when using NVMe modules.

Besides the obvious 3x HW costs, there’s also the much higher vSphere(+) and vSAN(+) licensing cost per node.

- Duncan Epping says
  
  6 February, 2023 at 14:23
  
  But that is the point: the demands on CPU or cores aren’t higher per IO. As long as you drive an equal amount of IO for ESA and OSA, then ESA will be more efficient. However, the specs are created so that you can drive far more IOs, as that is what customers have asked for. If you don’t require those amounts, then OSA can be used.
  
  - Dag Kvello says
    
    6 February, 2023 at 14:50
    
    I can agree that it’s much more efficient in how many IOPS etc. it can deliver per CPU cycle, but that in no way changes the minimum cost picture when it comes to licensing and HW cost.
    In reality (and I’ve been working on several three- and four-node vSAN bids lately), this makes a classic three-layer All-Flash (32Gb FC, NVMeoF) look cheap and flexible in comparison.
    The vSAN licensing cost alone for a supported OSA solution is higher than purchasing two IBM FS5200 in a HyperSwap config with 32Gb FC switches and 5 years of support.
    
