
Yellow Bricks

by Duncan Epping



New vCLS architecture with vSphere 8.0 Update 3

Duncan Epping · Sep 24, 2024

Some of you may have seen this, others may not have, but as I had a question today around vCLS retreat mode with 8.0 U3, I figured I would quickly write something on the topic. Starting with vSphere 8.0 Update 3 we introduced a new architecture for vCLS, aka vSphere Cluster Services. Pre-vSphere 8.0 Update 3, the vCLS architecture was based on virtual machines running Photon OS. These VMs primarily assisted in enabling and disabling DRS. If something was wrong with these VMs, then DRS would also be unable to function normally. In the past, many of you have probably experienced situations where you had to kill and delete the vCLS VMs to restore DRS functionality; for that, VMware introduced a feature called “retreat mode”, which basically killed and deleted the VMs for you. There were some other challenges with the vCLS VMs as well, and as a result the team decided to create a new design for vCLS.

Starting with vSphere 8.0 Update 3, vCLS is implemented as what I would call a container runtime, sometimes referred to as a Pod VM or PodCRX. In other words, when you upgrade to vSphere 8.0 Update 3, you will see your current vCLS VMs being deleted, and new shiny vCLS VMs popping up. How do you know these VMs are created using a different mechanism? You can simply see that in the UI, as demonstrated below. See the “CRX” mention in the UI?
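If you want to check this from the API side as well, a minimal pyVmomi sketch like the one below lists the vCLS VMs and where they run. Treat it as a sketch: the connection details are placeholders, and the “vCLS” name prefix plus the EAM managedBy key are assumptions based on what I see in my lab.

# List vCLS VMs and the hosts they run on. Sketch only; the hostname,
# credentials, and the "vCLS" naming convention are placeholders/assumptions.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab only
si = SmartConnect(host="vcenter.lab.local", user="administrator@vsphere.local",
                  pwd="VMware1!", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    for vm in view.view:
        managed_by = vm.config.managedBy if vm.config else None
        if vm.name.startswith("vCLS") and managed_by:
            # vCLS VMs are deployed and owned by the ESX Agent Manager (EAM)
            host = vm.runtime.host.name if vm.runtime.host else "-"
            print(vm.name, managed_by.extensionKey, host)
    view.Destroy()
finally:
    Disconnect(si)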

[Screenshot: a vCLS VM in the vSphere Client showing the “CRX” indicator]

So you may ask yourself, why should I even care? Well, the thing is, you shouldn’t. The new vCLS architecture uses fewer resources per VM, fewer vCLS VMs are deployed to begin with (two instead of three), and they are more resilient. Also, when a host that has a vCLS VM running is placed into maintenance mode, that vCLS instance is deleted and recreated elsewhere. Considering the VMs are stateless and tiny, that is much more efficient than trying to vMotion them. Note that vMotion and Storage vMotion of these new (Embedded, as they are called) vCLS VMs aren’t even supported to begin with.

Normally, vCLS retreat mode shouldn’t be needed anymore, but if you do end up in a situation where you need to clean up these instances, Retreat Mode is still fully supported with 8.0 U3. You can find the Retreat Mode option in the same place as before, on your cluster object under “Configure –> vSphere Cluster Services –> General –> Edit vCLS Mode”. Simply select “Retreat Mode” and the cleanup should happen automatically. When you want the VMs to be recreated, simply go back to the same UI and select “System managed”; this should then lead to the vCLS VMs being recreated.
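Before this UI option existed, retreat mode was commonly toggled through a per-cluster vCenter advanced setting (config.vcls.clusters.domain-c<id>.enabled, see VMware KB 80472). If you want to script it, a hedged pyVmomi sketch could look like the following; the cluster name is a placeholder, and the exact behavior of the option on your build is something to verify against that KB.

# Toggle vCLS retreat mode via the per-cluster vCenter advanced option from
# VMware KB 80472. Sketch only; verify the option key on your vCenter build.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab only
si = SmartConnect(host="vcenter.lab.local", user="administrator@vsphere.local",
                  pwd="VMware1!", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    cluster = next(c for c in view.view if c.name == "MyCluster")  # placeholder
    view.Destroy()
    key = "config.vcls.clusters.%s.enabled" % cluster._moId  # e.g. domain-c8
    # "false" = Retreat Mode (vCLS VMs get cleaned up),
    # "true"  = System managed (vCLS VMs get recreated)
    content.setting.UpdateOptions(
        changedValue=[vim.option.OptionValue(key=key, value="false")])
finally:
    Disconnect(si)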

I hope this helps,

vSphere 7.0 U3 contains two great vCLS enhancements

Duncan Epping · Sep 28, 2021

I have written about vCLS a few times, so I am not going to explain what it is or what it does (detailed blog here). I do want to talk about what is part of vSphere 7.0 U3 specifically though, as I feel these features are probably what most folks have been waiting for. Starting with vSphere 7.0 U3 it is now possible to configure the following for vCLS VMs:

  1. Preferred Datastores for vCLS VMs
  2. Anti-Affinity for vCLS VMs with specific other VMs

I created a quick demo for those who prefer to watch videos to learn these things; if you don’t, skip to the text below. Oh, and before I forget, a bonus enhancement is that the vCLS VMs now have a unique name; this was a very common request from customers.

Why would you need the above functionality? Let’s begin with the “preferred datastore” feature. This allows you to specify where the vCLS VMs should be provisioned from a storage point of view. This is useful in a scenario where you have a number of datastores you would prefer to avoid. Examples could be datastores that are replicated, a datastore that is only intended to be used for ISOs and templates, or maybe you prefer to provision on hybrid storage versus flash storage.

So how do you configure this? Well, it is simple: you click on your cluster object, then on “Configure”, and on “Datastores” under “vSphere Cluster Services”. You will now see “VCLS Allowed”; if you click “ADD”, you will be able to select the datastores to which these vCLS VMs should be provisioned.
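The same setting also surfaces in the cluster’s configuration through the vSphere API. Below is a deliberately defensive pyVmomi sketch that just reads it back; the systemVMsConfig and allowedDatastores property names are my assumptions from the 7.0 U3 API reference, which is why it uses getattr rather than direct attribute access (reuse a cluster object obtained as in any standard pyVmomi connection example).

# Read back the datastores a cluster allows for vCLS placement.
# The systemVMsConfig/allowedDatastores property names are assumptions
# from the 7.0 U3 API reference, hence the defensive getattr calls.
from pyVmomi import vim

def print_vcls_allowed_datastores(cluster: vim.ClusterComputeResource) -> None:
    svm_config = getattr(cluster.configurationEx, "systemVMsConfig", None)
    if svm_config is None:
        print("No vCLS datastore configuration exposed on this build")
        return
    for ds in getattr(svm_config, "allowedDatastores", None) or []:
        print("vCLS allowed datastore:", ds.name)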

Next up, Anti-Affinity for vCLS. You would use this feature for situations where a single workload needs to be able to run solely on a host, something like SAP for instance. To achieve this, you can use anti-affinity rules. We are not talking about regular anti-affinity rules though; this is the very first time a brand new mechanism is used on-premises. I am talking about Compute Policies. Compute Policies have been available for VMware Cloud on AWS customers for a while, but now also appear to be coming to on-prem customers. What does it do? It enables you to create “anti-affinity” rules for vCLS VMs and specific other VMs in your environment by creating Compute Policies and using Tags!

How does this work? Well, you go to “Policies and Profiles” and then click “Compute Policies”. Now you can click “ADD” and create a policy. You select “Anti Affinity with vSphere Cluster Services (vCLS) VMs”, then you select the Tag you created for the VMs that should not run on the same hosts as the vCLS VMs, and you click create. The vCLS VM scheduler will then ensure that the vCLS VMs do not run on the same hosts as the tagged VMs. If there’s a conflict, the vCLS scheduler will move the vCLS VMs to other hosts within the cluster. Let’s reiterate that: the vCLS VMs will be vMotioned to another host in your cluster, the tagged VMs will not be moved!

Hope that helps!

Issue adding tags to the vCLS VMs with vCenter Server 7.0 U2b

Duncan Epping · Jun 1, 2021

Today I was talking to one of our field folks, and he asked if there was an issue with Tags in combination with vCLS VMs in 7.0 U2b specifically. I had tested assigning tags to vCLS VMs before, and it worked just fine. With 7.0 U2b, unfortunately, this has stopped working. The error displayed in the vSphere Client is the following:

(vmodl.fault.SecurityError) {
faultCause = null,
faultMessage = null
}
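For reference, assigning a tag programmatically goes through the vSphere Automation SDK for Python, and the same attach call against a vCLS VM runs into this error on 7.0 U2b. A minimal sketch, with hypothetical IDs:

# Attach an existing tag to a VM with the vSphere Automation SDK for Python.
# All IDs and credentials below are hypothetical placeholders.
import requests
from vmware.vapi.vsphere.client import create_vsphere_client
from com.vmware.vapi.std_client import DynamicID

session = requests.session()
session.verify = False  # lab only
client = create_vsphere_client(server="vcenter.lab.local",
                               username="administrator@vsphere.local",
                               password="VMware1!", session=session)
# Hypothetical tag URN and VM moref; on 7.0 U2b this fails with
# vmodl.fault.SecurityError when the target is a vCLS VM.
tag_id = "urn:vmomi:InventoryServiceTag:11111111-2222-3333-4444-555555555555:GLOBAL"
client.tagging.TagAssociation.attach(
    tag_id=tag_id, object_id=DynamicID(type="VirtualMachine", id="vm-1234"))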


So what can you do about it? Well, unfortunately not much right now. I filed a bug and uploaded the logs; engineers are looking at it as we speak, and hopefully I will have an answer soon for those who need to use tags.

UPDATE: Engineering has found a workaround, customers who can’t wait for the fix can contact GSS to get the workaround implemented!

Can I make a host in a cluster the vSphere HA primary / master host?

Duncan Epping · May 21, 2021

There was an interesting question on the VMware VMTN Community this week; although I wrote about this in 2016, I figured I would do a short write-up again, as the procedure changed since 7.0 U1. The question was whether it is possible to make a particular host in a cluster the vSphere HA primary (or master, as it was previously called) host. The use case was pretty straightforward: the customer had a stretched cluster configuration with vSAN, and they wanted to make sure that the vSphere HA primary host was located in the “preferred” site, as this could potentially speed up the restart of VMs. Mind you, when I say “speed up” we are talking about a 2-3 second difference at most, but for some folks this may be crucial. I personally would not recommend making configuration changes, but if you do want to do this, vSphere does have the option.

When it comes to vSphere HA, there’s no UI option or anything like that to assign the “primary/master” host role. However, there is the option to specify an advanced setting at the host level to indicate that a certain host needs to be favored during the primary/master election. Again, this is not something customers commonly configure, but if you desire to do so, it is possible. The advanced setting is called “fdm.nodeGoodness”, and depending on which version you use, you will need to configure it either via the fdm.cfg file or via configstorecli. You can read about this process in-depth here.

Of course, I did test whether this worked in my lab. Here’s what I did: I first listed the currently configured advanced options for vSphere HA using configstorecli:

configstorecli config current get -g cluster -c ha -k fdm
{
   "mem_reservation_MB": 200,
   "memory_checker_time_in_secs": 0
}

Next, I will set the “node_goodness” for my host. When setting this, it will need to be a positive value; in my case I am setting it to 10000000. I first dumped the current config to a JSON file:

configstorecli config current get -g cluster -c ha -k fdm > test.json

Next, I edited the file and added the setting “node_goodness” with a value of 10000000, so that it looks as follows:

{
    "mem_reservation_MB": 200,
    "memory_checker_time_in_secs": 0,
    "node_goodness": 10000000
}
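If you prefer to script that edit step instead of changing the file by hand, the JSON manipulation itself is trivial; a quick sketch using nothing but the Python standard library and the same test.json from above:

# Add the node_goodness key to the dumped FDM config before re-importing it.
import json

with open("test.json") as f:
    cfg = json.load(f)

cfg["node_goodness"] = 10000000  # must be a positive value

with open("test.json", "w") as f:
    json.dump(cfg, f, indent=4)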

I then imported the file:

configstorecli config current set -g cluster -c ha -k fdm -infile test.json

After importing the file and reconfiguring for HA on one of my hosts, you can see in the screenshots below that the master role moved from 1507 to 1505.
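The “reconfigure for HA” step can also be triggered through the API instead of the UI. A minimal pyVmomi sketch, assuming an existing connection and with the host name as a placeholder:

# Trigger "Reconfigure for vSphere HA" on a host so it picks up the new
# FDM configuration. Assumes an existing pyVmomi connection (content).
from pyVmomi import vim
from pyVim.task import WaitForTask

def reconfigure_ha(content, hostname):
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    host = next(h for h in view.view if h.name == hostname)  # placeholder name
    view.Destroy()
    WaitForTask(host.ReconfigureHostForDAS())  # same as the UI action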


I also created a quick demo for those who prefer video content.

How long does it take before a host is declared failed?

Duncan Epping · Jan 26, 2021 ·

I had a question this week around the failure of a host. The question was how long it takes before a host is declared failed. Now let’s be clear, failed means “dead” in this case, not isolated or partitioned. It could be that the power has failed, the host has gone completely unresponsive, or anything else where there’s absolutely no response from the host whatsoever. In that scenario, how long does it take before HA declares the host dead? Note that the below timeline applies to a traditional infrastructure, and that it is theoretical, assuming everything is optimal.

  • T0 – Secondary Host failure.
  • T3s – The Primary Host begins monitoring datastore heartbeats for 15 seconds.
  • T10s – The host is declared unreachable and the Primary will ping the management network of the failed host.
    • This is a continuous ping for 5 seconds.
  • T15s – If no heartbeat datastores are configured, the host will be declared dead.
  • T18s – If heartbeat datastores are configured and there have been no heartbeats, the host will be declared dead, restarts will be initiated.

Now, when a Primary Host fails the timeline looks a bit different. This is mainly because first, a new Primary Host will need to be elected. Also, we need to ensure that the new primary has received the latest state of all secondary hosts.

  • T0 – Primary Host failure.
  • T10s – Primary election process initiated.
  • T25s – New primary elected and reads the protectedlist.
    • New primary waits for secondary hosts to report running VMs
  • T35s – Old primary declared unreachable.
  • T50s – Old primary declared dead, new primary initiates restarts for all VMs on the protectedlist which are not running.
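To make the arithmetic above concrete, here is the same timeline expressed as a toy Python function. It only encodes the theoretical best-case numbers from the two lists; the real FDM logic is obviously more involved.

# Toy model of the HA declaration timeline described above. Returns the
# theoretical number of seconds from failure until restarts are initiated.
def seconds_until_restarts(primary_failed, heartbeat_datastores):
    if primary_failed:
        # election starts at T10s, new primary at T25s, old primary
        # unreachable at T35s and declared dead at T50s
        return 50
    # secondary failure: declared dead at T15s without heartbeat
    # datastores, at T18s when they are configured but silent
    return 18 if heartbeat_datastores else 15

assert seconds_until_restarts(False, True) == 18
assert seconds_until_restarts(True, False) == 50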

Keep in mind, this does not mean that VMs will be restarted within 18 seconds, or 35 seconds for that matter. When the host is declared dead, or a new primary is elected, the restart process starts. The VMs that need to be restarted will first need to be placed, and once placed, they will need to be restarted. All of these steps take time. On top of that, depending on the operating system and the apps running within the VM, the time it takes before the restart is fully completed can vary a lot between VMs. In other words, although the state is declared rather fast, the actual total time it takes to restart can vary and is definitely not an exact science.
