For the last episode of 2024, I invited Deji Akomolafe to discuss running Microsoft workloads on top of VCF. I’ve known Deji for a long time, and if anyone is passionate about VMware and Microsoft technology, it is him. Deji went over the many caveats and best practices when it comes to running, for instance, SQL Server on top of VMware VCF (or vSphere, for that matter). NUMA, CPU scheduling, latency-sensitive settings, power settings, virtual disk controllers: just some of the things you can expect in this episode. You can listen to the episode on Spotify, Apple, or via the embedded player below.
Storage
Storage IO Control, aka SIOC, deprecation notice with 8.0 U3!
Recently it was announced that SIOC is being deprecated, and that SDRS IO Load Balancing will be deprecated as a result. The following was mentioned in the release notes of 8.0 Update 3:
Deprecation of Storage DRS Load Balancer and Storage I/O Control (SIOC): The Storage DRS (SDRS) I/O Load Balancer, SDRS I/O Reservations-based load balancer, and vSphere Storage I/O Control Components will be deprecated in a future vSphere release. Existing 8.x and 7.x releases will continue to support this functionality. The deprecation affects I/O latency-based load balancing and I/O reservations-based load balancing among datastores within a Storage DRS datastore cluster. In addition, enabling of SIOC on a datastore and setting of Reservations and Shares by using SPBM Storage policies are also being deprecated. Storage DRS Initial placement and load balancing based on space constraints and SPBM Storage Policy settings for limits are not affected by the deprecation.
So why do I bring this up if this was announced a while back? Well, apparently, not everyone had seen the announcement, and not everyone fully understands the impact. For Storage DRS (SDRS), this means that essentially ‘capacity balancing’ remains available, but anything related to performance will not be available in a future major release. Noisy neighbor handling through SIOC, with shares and, for instance, IO reservations, will also no longer be available.
Some of you may also have noticed that the ability to specify an IOPS limit at the per-VM level has disappeared from the UI. So what does this mean for IOPS limits in general? Well, that functionality will remain available through Storage Policy Based Management (SPBM) as it is today. So, if you set IOPS limits on a per-VM basis in vSphere 7 and you upgrade to vSphere 8, you will need to use the SPBM policy option! This IOPS limit option in SPBM will remain available, and even though in the UI it shows up under “SIOC”, it is actually applied through the disk scheduler on a per-host basis.
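To make that concrete, here is a minimal pyVmomi sketch of attaching a storage policy (one that carries an IOPS limit rule) to a VM’s disk. This is a sketch under assumptions: it presumes you already have the SPBM profile ID of the policy (looking it up through the PBM endpoint is omitted here), and the hostname, VM name, and profile ID are made up.

```python
# Minimal sketch: attach an existing SPBM storage policy (e.g. one with an
# IOPS limit rule) to the first virtual disk of a VM. Assumes you already
# looked up the policy's profile ID via the PBM endpoint.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

def apply_policy_to_first_disk(si, vm_name, profile_id):
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    vm = next(v for v in view.view if v.name == vm_name)
    view.Destroy()

    # Grab the first virtual disk on the VM
    disk = next(d for d in vm.config.hardware.device
                if isinstance(d, vim.vm.device.VirtualDisk))

    # Edit the disk and attach the storage policy by its profile ID
    disk_spec = vim.vm.device.VirtualDeviceSpec(
        operation=vim.vm.device.VirtualDeviceSpec.Operation.edit,
        device=disk,
        profile=[vim.vm.DefinedProfileSpec(profileId=profile_id)])
    return vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(deviceChange=[disk_spec]))

ctx = ssl._create_unverified_context()  # lab only, skips cert verification
si = SmartConnect(host="vcenter.lab.local", user="administrator@vsphere.local",
                  pwd="password", sslContext=ctx)
try:
    apply_policy_to_first_disk(si, "sql-vm-01", "my-iops-limit-profile-id")
finally:
    Disconnect(si)
```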
VCF-9 Vision for a Federated Storage View and vSAN (stretched cluster) visualizations!
As mentioned last week, the sessions at Explore Barcelona were not recorded. I still wanted to share with you what we are working on, so I decided to record and share a few demos, along with some of the slides we presented. In this video, I show our vision for a Federated Storage View for both vSAN and more traditional storage systems. This federated view will not only provide insight into capacity and performance capabilities, it will also provide you with a visualization of a stretched cluster configuration. This is something that I have been asking for for a while now, and it looks like it will become a reality in VCF 9 at some point. As this all revolves around visualization, I would urge you to watch the video below. And as always, if you have feedback, please leave a comment!
Why vSAN Max aka disaggregated storage?
At VMware Explore 2024 in Las Vegas I had many customer meetings. Last year my calendar was also swamped, and one of the things we spent a lot of time on was explaining to customers where vSAN Max fits into the picture. vSAN Max was originally positioned as a “storage only vSAN platform for petabyte scale use cases”. I guess this still somewhat applies, but a lot has changed since then.
First, the vSAN Max ReadyNode configurations have changed substantially, and you can start at a much smaller capacity scale than when vSAN Max originally launched. We start at 20TB for an XS ReadyNode configuration, which means that with a 4-node minimum you have 80TB. That is something completely different than the petabytes we originally discussed. The other big difference is the networking requirements: depending on the capacity needs, those have also come down substantially.
Now, as said, originally the platform was positioned as a solution for customers running at petabyte scale. The reason I wanted to write a quick blog is that this is not the main argument customers have today for adopting vSAN Max, or for considering vSAN Max in their environment. The real reason is something I personally did not expect to hear, but it is all about operations, and sometimes politics.
In a traditional environment, of course, depending on the size, you typically see a separation of duties. You have virtualization admins, networking admins, and storage admins. We have worked hard over the past decade to try to create these full-stack engineers, but the reality is that many companies still have these silos, and they will likely still exist 20 years from now.
This is where vSAN Max can help. The HCI model typically means that the virtualization administrator takes on the storage responsibilities when they implement vSAN, but with vSAN Max this doesn’t necessarily need to be the case. As various customers mentioned last week, with vSAN Max you could have a fully separated storage environment that is managed by a different team. Funny how often this was brought up as a great use case for vSAN Max. Especially with the amount of vSAN capacity included in VCF, this makes more and more sense!
You could even have a different authentication service connected to the vCenter Server that manages your vSAN Max clusters! You could have other types of hosts, cluster sizes, best practices, naming schemes, etc. This will all be up to the team managing that particular, euuh, silo. I know, sometimes a silo is seen as something negative, but for a lot of organizations this is how they operate, and prefer to operate, for the foreseeable future. If so, vSAN Max can cater for that use case as well!
vSAN Data Protection, what can you do with it today?
Last week I posted a blog about how to get vSAN Data Protection up and running, but I never explained why you may want to do this in the first place. Before we dive into it, it is probably smart to also mention that vSAN Data Protection leverages the new snapshotting capability that is part of the vSAN Express Storage Architecture (vSAN ESA). vSAN ESA has a snapshotting capability that scales dramatically better; it is literally 100x better than the snapshotting mechanism that is part of vSAN OSA. The reason for this is simple: with vSAN OSA (and VMFS, for that matter), when you create a snapshot, a new object is created and IO is redirected to it. With vSAN ESA we basically copy the metadata structure and keep using the existing objects for IO, which means we don’t need to traverse multiple files for reads, for instance, and when we delete a snapshot we don’t need to copy data, as it is all about metadata changes and tracking.
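To illustrate the difference conceptually, here is a tiny Python sketch. This is not actual vSAN code, just a model of the two approaches: a redirect-on-write chain where reads may traverse multiple delta objects, versus a metadata copy where a snapshot is just another block map pointing at shared data.

```python
# Conceptual model only, not actual vSAN code.

# vSAN OSA / VMFS style: each snapshot adds a delta object, and a read may
# have to walk the snapshot chain to find the most recent copy of a block.
class ChainedDisk:
    def __init__(self, parent=None):
        self.blocks = {}      # block number -> data written at this level
        self.parent = parent  # older snapshot level

    def read(self, block):
        level = self
        while level is not None:          # traversal cost grows with chain depth
            if block in level.blocks:
                return level.blocks[block]
            level = level.parent
        return None

# vSAN ESA style: a snapshot is a copy of the metadata (the block map);
# reads are a single lookup, and deleting a snapshot just drops a map,
# so no data needs to be copied or consolidated.
class MappedDisk:
    def __init__(self, block_map=None):
        self.block_map = dict(block_map or {})  # data objects are shared

    def snapshot(self):
        return MappedDisk(self.block_map)       # metadata copy only

    def read(self, block):
        return self.block_map.get(block)        # one lookup, no chain
```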
Now, in vSphere / vSAN 8.0 Update 3 we introduce a new capability called vSAN Data Protection. This provides you with the ability to schedule snapshots, create immutable snapshots, clone snapshots, restore snapshots, etc. All of this is done through the vSphere Client interface, of which you can see an example below.
Now, in the above screenshot you see that half of the VMs are protected and the other half is not. Why is that? Well, simply because I decided to only protect half of my VMs when I created my snapshot schedule. This is an “opt-in” mechanism: you create a schedule, and you decide which VMs are part of that schedule and which are not! Let’s take a look at the scheduling mechanism. You can find it at the cluster level under “Configuration -> vSAN -> Data Protection”. When you go there, you see the above screen, and you can click on “Protection groups” and then “Create protection group”.
This will then present you with a screen where you can define the name of the protection group, enable “immutability mode”, and select the “Membership” of the protection group. Personally, I like the Dynamic VM Patterns option, as it just makes things easy. In this example I used “*de*” as the pattern, which means that all VMs with “de” in their name will be snapshotted whenever the schedule triggers a snapshot.
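To show what such a pattern would select, here is a tiny sketch. The glob-style matching via Python’s fnmatch is an assumption made purely for illustration, and the VM names are made up.

```python
# Illustration only: which VM names a dynamic VM pattern like "*de*" would
# select, assuming glob-style matching. The VM names below are made up.
from fnmatch import fnmatch

vms = ["demo-01", "sql-prod-17", "vdesktop-03", "web-frontend"]
pattern = "*de*"

protected = [vm for vm in vms if fnmatch(vm, pattern)]
print(protected)  # ['demo-01', 'vdesktop-03']
```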
As mentioned, vSAN Data Protection includes ALL the virtual machines that match the pattern at the moment the snapshot is taken. So if you create a schedule like, for instance, the following, it could take close to an hour before a VM is snapshotted for the very first time. If you feel this is too long, there’s of course also an option to manually snapshot the VM.
The schedules you can create can be really complex (up to 10 schedules). I just created an example of what is possible, but there are many different options and combinations you can create. Note that a VM can be part of three different protection groups at most, and I also want to point out that a protection group is not a consistency group. Also note that you can have at most 200 snapshots per object/VM, so that is something to take into consideration as well; a quick way to sanity-check a design against these limits is sketched below.
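Here is a minimal sketch (not a product API, and all numbers in the example input are made up) that estimates how many snapshots a set of schedules would retain for a member VM and checks that against the limits mentioned above.

```python
# Minimal sketch, not a product API: estimate how many snapshots a set of
# schedules retains per VM and check it against the documented limits
# (10 schedules per protection group, 200 snapshots per object/VM).
MAX_SCHEDULES_PER_GROUP = 10
MAX_SNAPSHOTS_PER_VM = 200

def retained_snapshots(schedules):
    """schedules: list of (snapshots_per_day, retention_days) tuples."""
    if len(schedules) > MAX_SCHEDULES_PER_GROUP:
        raise ValueError("a protection group supports at most 10 schedules")
    return sum(per_day * days for per_day, days in schedules)

# Example: hourly snapshots kept for 1 day, plus daily snapshots kept 30 days
count = retained_snapshots([(24, 1), (1, 30)])
print(count)                          # 54
print(count <= MAX_SNAPSHOTS_PER_VM)  # True, within the 200-snapshot cap
```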
Now, when the VMs are snapshotted, you get a few extra options: you have “snapshot management” at the individual VM level, and of course you can also manage things at the protection group level. You will see options like “restore” and “clone”, and this is where it gets interesting, as this allows you to go back to a particular point in time if needed, and also to clone the VM from a particular snapshot. Why would you clone it? Well, if you want to test something against a specific dataset, for instance, or if you want to do some type of digital forensics and analyze a VM for whatever reason.
One thing that is also good to mention is that with vSAN Data Protection, you can also recover VMs that have been deleted from vCenter Server. This is one of those must-have features in my opinion, as this is one of those things that does occasionally happen, unfortunately. Power off VM, delete … oops, wrong one! When it comes to the recovery of a VM, the process is straightforward. You select the snapshot and click “restore VM” or “clone VM”, depending on what you expect as the outcome. Restore means the current instance is powered off, and the snapshot is the restore point when you power it on again. When you do a clone, you simply get a second instance of that VM. Note that when you create a clone, it is a linked clone, so it is extremely fast to instantiate.
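Conceptually (again, a model, not product code), the difference between the two operations looks like this; the clone only copies metadata, which is why a linked clone instantiates so quickly.

```python
# Conceptual model only: "restore" vs "clone" from a snapshot.
class VM:
    def __init__(self, name, block_map):
        self.name = name
        self.powered_on = True
        self.block_map = block_map        # metadata: block -> data object

def restore(vm, snapshot_map):
    vm.powered_on = False                 # current instance is powered off
    vm.block_map = dict(snapshot_map)     # snapshot becomes the restore point
    return vm

def clone(snapshot_map, new_name):
    # Linked clone: only metadata is copied; the underlying data objects are
    # shared with the snapshot, so no data movement is needed.
    return VM(new_name, dict(snapshot_map))
```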
The last thing I want to discuss is the immutability mode. I want to first caution people: this is exactly what you think it is… it is IMMUTABLE! This means that you cannot delete the snapshots, change the schedule, etc. Let me quote the documentation:
Enable immutability mode on a protection group for additional security. You cannot edit or delete this protection group, change the VM membership, edit or delete snapshots. An immutable snapshot is a read-only copy of data that cannot be modified or deleted, even by an attacker with administrative privileges.
That is why I wanted to caution people: if you create an immutable protection group with hundreds of VMs by accident… Yes, the snapshots will be immutable; you cannot delete the protection group or the snapshots, or change the snapshot schedule. No, not even as an administrator. Do note, you can delete the VM itself if you want…
Anyway, I hope this gave a brief overview of what you can do with vSAN Data Protection at the moment. I’ll play around with it some more, and report back when there’s more to share 🙂