On VMTN I noticed someone asking why vCenter Server was trying to access assets.contentstack.io, and why there were so many DNS requests for assets.contentstack.io. It took me a while to figure it out, but there's a plugin for the VMware Cloud Provider Services, and this plugin is hosted on contentstack.io. That is the reason you see vCenter Server trying to connect to that URL and why you are seeing DNS requests for assets.contentstack.io. You can prevent this from happening by simply selecting the plugin and then removing it. That is, of course, if you are not planning on using these services.
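If you want to double-check which plugins are registered with your vCenter Server before removing anything, a quick pyVmomi script can list them. This is just a minimal sketch: the hostname, credentials, and the commented-out extension key are placeholders/assumptions, and removing the plugin through the vSphere Client UI as described above is the easier route.

```python
# Minimal sketch: list the extensions registered with vCenter Server so you
# can spot the Cloud Provider Services plugin before removing it via the UI.
import ssl
from pyVim.connect import SmartConnect, Disconnect

ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE  # lab only: skip certificate verification

si = SmartConnect(host="vcenter.lab.local", user="administrator@vsphere.local",
                  pwd="VMware1!", sslContext=ctx)
try:
    ext_mgr = si.RetrieveContent().extensionManager
    for ext in ext_mgr.extensionList:
        # Each extension has a unique key and a human-readable label
        print(f"{ext.key:50} {ext.description.label}")
    # Removing a plugin programmatically would look like the call below,
    # but the extension key shown here is an assumption, so verify it first:
    # ext_mgr.UnregisterExtension("com.vmware.example.cloudproviderservices")
finally:
    Disconnect(si)
```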
vSAN ESA is using more CPU cycles than vSAN OSA?
Over the last couple of weeks, I’ve had conversations with customers and partners who have been running performance benchmarks against both vSAN ESA and vSAN OSA. As you can imagine, people want to compare version 8 of OSA against version 8 of ESA, and that is completely fair. What I noticed though is that some of those customers came back with comments around CPU usage of vSAN OSA against ESA. The general comment we get is that vSAN ESA is using more CPU cycles than vSAN OSA.
When looking at it from a total number point of view, or CPU cycles consumed, it is very likely you will see vSAN ESA using more cycles than vSAN OSA. The question then typically arises why that is the case, as VMware (the vSAN team) has been claiming that vSAN ESA is much more efficient than vSAN OSA. To be fair, it is much more efficient. For instance, data services like checksumming, encryption, and compression have moved to the top of the stack (as shown below), which means we don't have to compress/encrypt data 3/4/5/6 times but can do it once at the source and then send it over the network to the destination.
Still, it leaves the question, why is more CPU capacity used? The answer is simple, you are pushing much more IO. We’ve seen customers easily reaching 4x the number of IOPS with ESA than with OSA. Even though ESA is more efficient, if you are pushing 4x (or more) the amount of IO then you will need to remember that those additional IOs also come at a cost, and that cost is CPU cycles to process them. So when you make a comparison, please compare apples to apples, and not apples to oranges.
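To make the apples-to-apples point concrete, here's a quick back-of-the-envelope calculation in Python. The cycles-per-IO numbers are purely made up for illustration and are not benchmark results.

```python
# Back-of-the-envelope sketch with made-up numbers (assumptions, not benchmark data):
# even if ESA needs fewer CPU cycles per IO than OSA, pushing 4x the IOPS
# can still add up to higher total CPU consumption.
osa_iops = 100_000
osa_cycles_per_io = 20_000          # hypothetical cycles spent per IO on OSA

esa_iops = 4 * osa_iops             # "easily 4x the number of IOPS"
esa_cycles_per_io = 12_000          # hypothetical, lower thanks to ESA efficiency

osa_total = osa_iops * osa_cycles_per_io
esa_total = esa_iops * esa_cycles_per_io

print(f"OSA: {osa_total:.2e} cycles/s at {osa_cycles_per_io} cycles per IO")
print(f"ESA: {esa_total:.2e} cycles/s at {esa_cycles_per_io} cycles per IO")
# ESA burns more cycles in total (4.8e9 vs 2.0e9) while being cheaper per IO,
# which is exactly the apples-to-oranges trap when comparing total CPU usage.
```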
The last thing I want to add, and hopefully I can share some data in the future, is that the use of RDMA with vSAN 8 ESA seems to have a significant impact on CPU usage, as in it lowers the amount of CPU required to produce the same (or better) results. So it is definitely worth considering RDMA when adopting vSAN 8 ESA!
What can I change about a vSAN ESA Ready Node?
I’ve had half a dozen people asking about this over the past weeks; it really seems more and more people are at the point of adopting vSAN ESA (Express Storage Architecture). When they look at the various vSAN ESA Ready Node configurations (https://vmwa.re/vsanesahcl), what stands out is that the current list is limited in terms of server models and configurations.
The list is being updated every week; last week, for instance, Supermicro popped up as a server vendor. Of course, Dell, HPE, and Lenovo have been on the list since day 1. When you select the vendor, the Ready Node type, and the model, you will then have the option to select a number of things, but in most cases, you seem to be limited to “Storage Device” and “Number of Storage Devices”. This however does not mean you cannot change anything. A knowledge base article has been released which describes what you can and cannot change when it comes to these configurations! The KB article is listed on the vSAN ESA VMware Compatibility Guide list, but somehow it seems people don’t always notice the link. (Yes, I have asked the team to make the link more obvious somehow.)
Now when you look at the KB, it lists what you can change and what the rules are when it comes to making changes. For instance, you can change the CPU, but only for the same or higher core count and the same or higher base clock speed. For memory, you can increase the amount, and the same applies to storage capacity. For storage it is even a bit more specific: originally you needed to use the same make/model, so if the Ready Node configuration lists a 1.6TB P5600, you could swap it for a 3.2TB P5600. We recently (May 20th, 2023) had a change in support, though, and we now support changing the device make/model as well, as long as you follow the other guidelines mentioned in the KB. For instance, you can swap an Intel device for a Samsung, but that Samsung would need to be supported by the OEM vendor and needs to be of the same (or a higher) performance and endurance class. And of course the device needs to be certified for vSAN ESA: http://vmwa.re/vsanesahclc. Anyway, if you are configuring a Ready Node for ESA, make sure to check the KB so that you make supported changes!
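To give an idea of how those rules combine, below is a small Python sketch that checks a proposed configuration against the guidelines as I've paraphrased them above. The field names and class values are my own assumptions, so always refer to the KB itself for the authoritative rules.

```python
# Rough sketch of the substitution rules described in the KB (my own paraphrase,
# not an official tool): field names and class values here are assumptions.
from dataclasses import dataclass

@dataclass
class NodeConfig:
    cpu_cores: int
    cpu_base_ghz: float
    memory_gb: int
    device_capacity_tb: float
    device_perf_class: int       # higher = better performance class
    device_endurance_class: int  # higher = better endurance class
    device_esa_certified: bool

def change_is_supported(ready_node: NodeConfig, proposed: NodeConfig) -> bool:
    """Return True if the proposed config only deviates in supported ways."""
    return (
        proposed.cpu_cores >= ready_node.cpu_cores                # same or higher core count
        and proposed.cpu_base_ghz >= ready_node.cpu_base_ghz      # same or higher base clock
        and proposed.memory_gb >= ready_node.memory_gb            # memory may only grow
        and proposed.device_capacity_tb >= ready_node.device_capacity_tb
        and proposed.device_perf_class >= ready_node.device_perf_class
        and proposed.device_endurance_class >= ready_node.device_endurance_class
        and proposed.device_esa_certified                         # must be on the ESA HCL
    )

# Example: swapping a 1.6TB device for a 3.2TB one of the same class passes.
baseline = NodeConfig(32, 2.6, 512, 1.6, 3, 3, True)
bigger   = NodeConfig(32, 2.6, 512, 3.2, 3, 3, True)
print(change_is_supported(baseline, bigger))  # True
```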
The following VIBs on the host are missing from the image and will be removed from the host during remediation: vmware-fdm
I’ve seen a few people being confused about a message which is shown when upgrading ESXi. The message is: The following VIBs on the host are missing from the image and will be removed from the host during remediation: vmware-fdm (version number + build number). This happens when you use vLCM (Lifecycle Manager) to upgrade from one version of ESXi to the next. The reason for it is simple: the vSphere HA VIB (vmware-fdm) is never included in the image.
If it is not included, how do the hosts get the VIB? The VIB is pushed by vCenter Server to the hosts when required! (When you enable HA on a cluster, for instance.) This is also the case after an upgrade. After the VIB is removed, it will simply be replaced by the latest version by vCenter Server. So no need to worry, HA will work perfectly fine after the upgrade!
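If you still want to verify things after the upgrade, a small pyVmomi script can report the HA (FDM) agent state per host. This is just a sketch, assuming pyVmomi is installed and using placeholder credentials; the dasHostState property is empty when HA is not (yet) enabled on the cluster.

```python
# Optional sanity check after remediation (a sketch, placeholder credentials):
# confirm the HA (FDM) agent is back and reporting on every host.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab only
si = SmartConnect(host="vcenter.lab.local", user="administrator@vsphere.local",
                  pwd="VMware1!", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(content.rootFolder,
                                                   [vim.HostSystem], True)
    for host in view.view:
        das = host.runtime.dasHostState  # None when HA is not (yet) enabled
        state = das.state if das else "HA agent not reporting"
        print(f"{host.name:30} {state}")
    view.DestroyView()
finally:
    Disconnect(si)
```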
vSAN 8.0 ESA – Dude, where’s my vSAN disk group?
Last week I was talking to a customer who mentioned that he had deployed vSAN 8.0 in his lab and was shocked to notice, when he wanted to define disk groups, that they don’t exist anymore. Well, not in vSAN 8.0 ESA (Express Storage Architecture), that is. They do still exist in the Original Storage Architecture! The big change with vSAN 8.0 ESA is that the “bottleneck” in the previous architecture has been removed. No longer do you select a single device for caching for a particular disk group, and no longer do you designate devices purely for capacity.
With vSAN 8.0 ESA all your devices will be part of a single storage pool, and all those devices will contribute to both storage capacity as well as storage performance! The added benefit of course is the fact that writes and reads will be distributed across all devices, removing a potential choke point, and also removing a single point of failure. Why? Well, with vSAN OSA, when the caching device fails, the whole disk group becomes unavailable. With ESA that is no longer the case, as there’s no caching device!
So how does vSAN ESA provide both optimal efficiency for capacity as well as optimal performance? Well, it does this by introducing additional layers. The idea is that vSAN will provide write performance at the level of RAID-1 but space efficiency at the level of RAID-5 or RAID-6. That would be the best of both worlds. It needs to do this, however, while taking into consideration that we are also dealing with different types of flash devices than you normally would with vSAN OSA. In other words, writes will also need to be optimized for the types of devices used (TLC), and it will also need to be future-proof for devices that may be supported later on (QLC).
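To show why that combination matters, here's a simple capacity calculation. The stripe widths I'm using for RAID-5 (4+1) and RAID-6 (4+2) are assumptions for the sake of the math; your actual layout depends on cluster size and policy.

```python
# Quick illustration of why "RAID-1 write performance with RAID-5/6 space
# efficiency" is the best of both worlds. Stripe widths below (4+1 RAID-5,
# 4+2 RAID-6) are assumptions for the sake of the math.
usable_tb = 10.0

overheads = {
    "RAID-1 (mirror, FTT=1)": 2.0,        # every block stored twice
    "RAID-5 (4 data + 1 parity)": 5 / 4,  # 1.25x raw per usable
    "RAID-6 (4 data + 2 parity)": 6 / 4,  # 1.5x raw per usable
}

for layout, factor in overheads.items():
    print(f"{layout:28} needs {usable_tb * factor:>5.1f} TB raw for {usable_tb} TB usable")
```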
One of the key elements in this new architecture is the introduction of the “log-structured filesystem” and the “durable log”. Let’s look at the below diagram first.
With vSAN ESA, all data is first written to the durable log in the log-structured file system. This ensures that data is persistently stored. This is what the “performance leg” provides. The performance leg literally stores the writes first. That could be 4KB blocks, or 32KB blocks, or whatever. It stores the data first, collects a full stripe write (512KB), and then writes the data to the capacity leg. Why these two layers? Well, the performance leg is a RAID-1 configuration, so it is optimal for write performance, while in general the capacity leg will be RAID-5 or RAID-6, which is optimal for space efficiency. By creating this small performance leg component that holds the durable log, vSAN is capable of immediately acknowledging the writes, as the data is persisted in the log, and then, when there’s a full stripe, writing it out efficiently as RAID-5 or RAID-6.
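To illustrate the flow, below is a toy Python model of that two-leg write path. The names, sizes, and logic are simplifications of what I described above, not actual vSAN internals.

```python
# A toy model of the two-leg write path (sizes and names are assumptions for
# illustration, not vSAN internals): small writes land in the mirrored durable
# log and are acknowledged immediately; once a full 512KB stripe has
# accumulated, it is written out to the capacity leg as a full-stripe write.
FULL_STRIPE = 512 * 1024  # bytes

class ToyWritePath:
    def __init__(self):
        self.log_bytes = 0         # data sitting in the RAID-1 performance leg
        self.capacity_writes = []  # full stripes flushed to the RAID-5/6 capacity leg

    def write(self, size: int) -> str:
        self.log_bytes += size
        ack = "ack (persisted in durable log)"
        while self.log_bytes >= FULL_STRIPE:
            # Enough data for a full stripe: parity can be computed over the
            # whole stripe in one go, no read-modify-write needed.
            self.capacity_writes.append(FULL_STRIPE)
            self.log_bytes -= FULL_STRIPE
        return ack

path = ToyWritePath()
for _ in range(20):
    path.write(32 * 1024)  # twenty 32KB guest writes
print(f"Full stripes written to capacity leg: {len(path.capacity_writes)}")
print(f"Bytes still waiting in the durable log: {path.log_bytes}")
```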
Now of course, in the UI you will be able to see those new performance leg components and the capacity leg components. They are not marked as “performance” or “capacity” but they are easily recognizable. I created a quick demo that talks you through the above. If you are interested, check it out!