In the vSphere 7.0 Update 1 release, VMware introduced a new service called the VMware vSphere Cluster Services (vCLS). vCLS provides a mechanism that allows VMware to decouple both vSphere DRS and vSphere HA from vCenter Server. Niels Hagoort wrote a lengthy article on this topic here. You may wonder why VMware introduced this. As Niels states, by decoupling the clustering services (DRS and HA) from vCenter Server via vCLS, we ensure the availability of critical services even when vCenter Server is impacted by a failure.
vCLS is a collection of multiple VMs which, over time, will be the backbone for all clustering services. In the 7.0 U1 release a subset of DRS functionality is enabled through vCLS. Over the past week(s) I have seen many questions coming in and I wanted to create a blog with answers to these questions. When new questions or considerations come up, I will add these to the list below.
- Do I need to maintain/update/manage the vCLS VMs?
No, the VMs are managed by the ESX Agent Manager (EAM). You, as an administrator, should not need to manage them!
- How many vCLS VMs will there be running?
At most, there will be 3 VMs running. If you have 1 host you will have 1 VM, with 2 hosts 2 VMs and with 3 or more hosts you will have 3 VMs.
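The sizing rule above is simply "one VM per host, capped at three". As a minimal sketch (the function name is made up for illustration, it is not part of any VMware tooling):

```python
def expected_vcls_vm_count(host_count: int) -> int:
    """Expected number of vCLS VMs for a cluster, per the rule in this FAQ.

    Hypothetical helper: one VM per host, with a maximum of 3.
    """
    if host_count < 0:
        raise ValueError("host_count cannot be negative")
    return min(host_count, 3)


if __name__ == "__main__":
    for hosts in (1, 2, 3, 8):
        print(f"{hosts} host(s): {expected_vcls_vm_count(hosts)} vCLS VM(s)")
```

This can be handy in a report that compares the number of vCLS VMs you actually see against what you would expect for the cluster size.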
- Why do I see vCLS VMs with a number higher than 3? (for example “vCLS (5)” or “vCLS (28)”)
During maintenance ESX Agent Manager (EAM) can delete and re-provision the VMs when deemed needed. When a new VM is provisioned the counter will go up.
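If you want to pull that counter out of the VM name in a script, something like the sketch below would do. It assumes the VMs are named exactly like the examples in the question, “vCLS (5)” or “vCLS (28)”. The helper name is hypothetical, and the pattern may not hold if the naming scheme changes in a later release.

```python
import re
from typing import Optional

# Assumption: names follow the "vCLS (<number>)" form shown in the question.
VCLS_NAME = re.compile(r"^vCLS \((\d+)\)$")


def vcls_counter(vm_name: str) -> Optional[int]:
    """Return the counter from a vCLS VM name, or None if the name does not match."""
    match = VCLS_NAME.match(vm_name.strip())
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    for name in ("vCLS (5)", "vCLS (28)", "web-server-01"):
        print(name, "->", vcls_counter(name))
```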
- What is the resource overhead of the vCLS VMs?
Each VM has 1 vCPU, 128MB of memory, a 2GB thin disk, and no NIC.
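To put those numbers in perspective, here is a small sketch that multiplies the per-VM figures by the number of vCLS VMs in a cluster. The class and function names are made up for illustration, and the disk figure is the provisioned size of the thin disk, not what is actually consumed on the datastore.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Footprint:
    vcpus: int
    memory_mb: int
    thin_disk_gb: int  # provisioned size; a thin disk usually consumes less


# Per-VM figures as listed in this FAQ.
PER_VM = Footprint(vcpus=1, memory_mb=128, thin_disk_gb=2)


def cluster_footprint(vm_count: int) -> Footprint:
    """Total configured resources for vm_count vCLS VMs (hypothetical helper)."""
    if vm_count < 0:
        raise ValueError("vm_count cannot be negative")
    return Footprint(
        vcpus=PER_VM.vcpus * vm_count,
        memory_mb=PER_VM.memory_mb * vm_count,
        thin_disk_gb=PER_VM.thin_disk_gb * vm_count,
    )


if __name__ == "__main__":
    print(cluster_footprint(3))
```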
- If the vCLS has no NIC how does it communicate?
vCLS leverages a VMCI/vSOCKET interface to communicate with the hypervisor.
- On which datastore will the vCLS VMs be provisioned?
They will be provisioned on shared storage if it is available during provisioning; otherwise, they will go to local VMFS.
- Can I specify which datastores should be considered for vCLS VMs?
No, unfortunately today it is not possible to specify which datastores should be used for provisioning the vCLS VMs.
- If one, or more, vCLS VMs are provisioned on the wrong datastore, can I SvMotion it?
Yes, you are allowed to SvMotion the vCLS VMs to a datastore of choice. Preferably, this should be a datastore which is presented to all hosts in the cluster!
- Can I use Storage DRS (SDRS) to place a datastore which holds vCLS VMs into maintenance mode?
SDRS does not consider vCLS VMs for migration. Right now, vCLS VMs need to be migrated manually, even when using SDRS.
- Why do I have multiple vCLS VMs running on the same host, and can I vMotion them?
After maintenance mode you can end up in a situation where multiple (or all) vCLS VMs are running on the same host. In that case you will need to manually vMotion the vCLS VMs to different hosts in your cluster. The development team is aware of this issue and is aiming to fix it in an upcoming release.
- Why aren’t the vCLS VMs deleted when I disable DRS?
The vCLS VMs are linked to the “cluster object” and not directly to the DRS functionality. Disabling DRS doesn’t impact the vCLS VMs.
- Can I disable the provisioning of the vCLS VMs?
Yes, you can, but keep in mind that vCLS is required for DRS to function. This means that if you disable the provisioning of the VMs, DRS will no longer work, which also means that HA can’t leverage DRS for failover placement and will need to resort to the simple placement mechanism.
- I can’t find the advanced setting mentioned in the documentation, what is the setting for disabling vCLS?
Go to your vCenter Server object, go to the Configure tab, then go to “Advanced Settings”, add the key “config.vcls.clusters.domain-c<identifier>.enabled” and set it to false. The “domain-c” number for your cluster can be found in the URL when you click on the cluster in the HTML5 interface. It should look something like the following, where “domain-c22” is the important bit: https://vcsa-06.rainpole.com/ui/app/cluster;nav=h/urn:vmomi:ClusterComputeResource:domain-c22:4df0badc-1655-40de-9181-3422d6c36a3e/summary.
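If you have to do this for a number of clusters, you can derive the key from the URL instead of typing it by hand. The sketch below only builds the key string and does not talk to vCenter Server. The function name is made up, and it assumes the URL contains “ClusterComputeResource:domain-c<number>” as in the example above.

```python
import re


def vcls_setting_key(cluster_url: str) -> str:
    """Build the advanced setting key from a cluster URL (hypothetical helper)."""
    # Assumption: the URL embeds the managed object ID as
    # "ClusterComputeResource:domain-c<number>".
    match = re.search(r"ClusterComputeResource:(domain-c\d+)", cluster_url)
    if match is None:
        raise ValueError("no domain-c identifier found in URL")
    return f"config.vcls.clusters.{match.group(1)}.enabled"


if __name__ == "__main__":
    url = (
        "https://vcsa-06.rainpole.com/ui/app/cluster;nav=h/"
        "urn:vmomi:ClusterComputeResource:domain-c22:"
        "4df0badc-1655-40de-9181-3422d6c36a3e/summary"
    )
    print(vcls_setting_key(url))  # prints config.vcls.clusters.domain-c22.enabled
```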
- When I enable “retreat” mode and the vCLS VMs are deleted, will that also delete my resource pools?
No, resource pools are not deleted when vCLS is disabled. Only DRS load balancing is impacted!
- Can I use the API/PowerCLI to enable Retreat Mode?
Yes, an example of this can be found in this VMware Community Thread.
- Should I create anti-affinity rules for the vCLS VMs?
No, starting with vSphere 7.0 Update 2, the vCLS VMs have anti-affinity defined within the system!
- Can I create a custom naming scheme for the vCLS VMs?
Today it is not possible to create a custom naming scheme for the vCLS VMs. This has been filed as a feature request and is being considered for a future release. We also discourage renaming the VMs at this point in time.
- Can I rename the folder in which the vCLS VMs are stored in the VMs & Templates view?
No, we discourage changing the name of the folder as this could lead to issues when EAM needs to delete VMs.
- Can I login to the vCLS VMs?
Yes, but this is only intended for troubleshooting. This should not be needed during normal operations as these VMs are managed by EAM. You need to SSH into vCenter Server and then run the following command to retrieve the password. This will then allow you to login with “root” into the vCLS VMs, although I personally have not found a reason to do so.
- Does vCLS require Kubernetes for vSphere to be configured?
No, it does not.
- Do I need to back up or replicate the vCLS VMs?
No, there’s no need to create a backup or replicate the VMs. If the VMs are impacted by an outage, then HA will restart them or EAM will recreate them automatically.
- Does vCLS work with ESXi on ARM?
No, in the current release vCLS VMs cannot be provisioned to a cluster using ESXi on ARM. You can disable the creation of the VMs following this article.
- If I need to power off my cluster, what do I do with these VMs?
These VMs are migrated by DRS to the next host until the last host needs to go into maintenance mode and then they are automatically powered off by EAM.
- Which data evacuation option should I use when going into maintenance mode cluster-wide?
I use the “no data migration” option after I have powered off all normal VMs.
- Is vCLS compatible with Site Recovery Manager (SRM)?
SRM is supported with vCLS starting with vCenter Server 7.0 U1a.
- Can my vCenter Server instance be 7.0 U1 while my hosts are vSphere 6.5 or 6.7?
Yes, this is fully supported. Just check the vCenter/ESXi compatibility matrix. Everything that is listed is also supported for DRS/vCLS.
- Do I need to do anything for a stretched cluster?
No, that is not required. We would, however, recommend ensuring there’s at least 1 VM in each location from a compute perspective.
- Can I identify these VMs as “special VMs” when automating certain tasks or creating reports through my automation tools?
Yes, you can easily identify them. Niels has listed the properties at the bottom of this blog article.
- If one of the VMs fails, is there a warning?
Yes, you will see a warning triggered on the cluster level. This is the result of a new Skyline Health Check that was also introduced as part of 7.0 U1. I created a demo that shows this, which you can watch here.
- I have upgraded my vCenter Server and the vCLS VMs are not getting provisioned, what can I do?
We have seen some situations where an error is logged (eam.log) which states “Can’t provision VM for ClusterAgent” and “due to lack of suitable datastore” and “Couldn’t acquire token due to: Signature validation failed”. In this case, the customer had two linked vCenter Server instances.
If you see this error, reset the STS certificate and restart the vCenter STS service and see if the VMs are now being provisioned (Thanks to Mina Botros(GSS) for providing this tip!). More details on how to stop/start a service can be found here.
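If you want to check quickly whether you are hitting this, you could scan eam.log for the messages quoted above. Below is a minimal sketch. The function name is made up, the sample lines are invented for illustration and do not come from a real log, and the markers leave out the apostrophe in “Can’t” so it does not matter which quote character the log uses.

```python
from typing import Iterable, List

# Substrings of the messages quoted in this FAQ.
EAM_ERROR_MARKERS = (
    "provision VM for ClusterAgent",
    "due to lack of suitable datastore",
    "Signature validation failed",
)


def find_vcls_provisioning_errors(lines: Iterable[str]) -> List[str]:
    """Return the log lines that contain any of the known error markers."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(marker in line for marker in EAM_ERROR_MARKERS)
    ]


if __name__ == "__main__":
    # Made-up sample input; with a real log you would pass an open file object.
    sample = [
        "INFO  agent scan completed\n",
        "ERROR Can't provision VM for ClusterAgent due to lack of suitable datastore\n",
        "ERROR Couldn't acquire token due to: Signature validation failed\n",
    ]
    for hit in find_vcls_provisioning_errors(sample):
        print(hit)
```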
- I am getting an error stating “insufficient resources” on the vCLS VM power on event, how can I fix this problem?
We are not sure what is causing this problem right now, but it can be fixed by changing the per-VM EVC to disabled. This blog describes how to do this.
- I am getting an error stating the following “Feature ‘MWAIT’ was absent but must be present” when the vCLS VM is powered on.
Please enable the MONITOR/MWAIT flag in the bios of the hosts in the cluster, as documented here.
- I am getting the following error, how do I solve it? “Feature ‘bad_requirement.hv.capable’ was 0, but must be at least 1. Failed to start the virtual machine. Module ‘FeatureCompatLate’ power on failed.”
You most likely have the setting vhv.enable = “TRUE” configured on each host in “/etc/vmware/config”. If you set it to false the vCLS VMs should start.
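To check a host for this setting, you could copy the contents of “/etc/vmware/config” and run them through a small check like the one below. This is a sketch with a made-up function name. It assumes the setting appears on its own line in the form vhv.enable = "TRUE", and the sample config text is invented for illustration.

```python
import re

# Matches a line such as: vhv.enable = "TRUE" (quotes optional, case-insensitive).
VHV_TRUE = re.compile(r'^\s*vhv\.enable\s*=\s*"?true"?\s*$', re.IGNORECASE | re.MULTILINE)


def vhv_enabled(config_text: str) -> bool:
    """Return True if the config text sets vhv.enable to TRUE on any line."""
    return VHV_TRUE.search(config_text) is not None


if __name__ == "__main__":
    # Made-up sample; on a real host this would be the contents of /etc/vmware/config.
    sample = 'some.other.setting = "value"\nvhv.enable = "TRUE"\n'
    if vhv_enabled(sample):
        print("vhv.enable is TRUE on this host, vCLS VMs may fail to power on")
```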