vSphere

Demo Time: How to delete the vCLS VMs

Duncan Epping · Oct 27, 2020 ·

As I have received a bunch of questions about how you can delete the vSphere Cluster Service VMs (vCLS VMs), I figured I would create a quick demo. It is pretty straightforward, and it should only be used when people are doing some kind of full cluster maintenance. This demo shows you how to get the VMs deleted by leveraging a vCenter Server level advanced setting (config.vcls.clusters.domain-c<identifier>.enabled). I have also written a post that has a bunch of requirements, Q&A, and considerations for the vCLS VMs; if you are interested in that, read it here.

Here’s the summary of how to delete the VMs: go to your vCenter Server object, go to the Configure tab, then go to “Advanced Settings”, add the key “config.vcls.clusters.domain-c<identifier>.enabled” and set it to “false”. The “domain-c<number>” identifier for your cluster can be found in the URL when you click on the cluster. It should look something like the following, where the “domain-c22” part is the important bit: https://vcsa-06.rainpole.com/ui/app/cluster;nav=h/urn:vmomi:ClusterComputeResource:domain-c22:4df0badc-1655-40de-9181-3422d6c36a3e/summary. If you want to recreate the VMs, simply set the value to “true” once the deletion task has completed.
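To make the key derivation concrete, here is a minimal Python sketch that pulls the “domain-c<number>” identifier out of a cluster URL and builds the advanced setting key from it. The function name and the regular expression are my own illustration, not part of any VMware tooling; only the key format and the URL shape come from the steps above.

```python
import re

# Matches the cluster identifier (for example "domain-c22") in a vSphere Client cluster URL.
# The pattern is an assumption based on the example URL shown in the post.
_CLUSTER_ID = re.compile(r"ClusterComputeResource:(domain-c\d+)")


def vcls_retreat_mode_key(cluster_url: str) -> str:
    """Build the vCenter advanced setting key for the cluster in the given URL."""
    match = _CLUSTER_ID.search(cluster_url)
    if match is None:
        raise ValueError("no cluster identifier (domain-c<number>) found in URL")
    return f"config.vcls.clusters.{match.group(1)}.enabled"


example_url = (
    "https://vcsa-06.rainpole.com/ui/app/cluster;nav=h/"
    "urn:vmomi:ClusterComputeResource:domain-c22:"
    "4df0badc-1655-40de-9181-3422d6c36a3e/summary"
)
print(vcls_retreat_mode_key(example_url))
# prints: config.vcls.clusters.domain-c22.enabled
```

The resulting key is what you would add under “Advanced Settings” with the value “false” to delete the VMs, and “true” to recreate them.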

Note, if you have a resource pool configuration, enabling “retreat mode” (disabling vCLS) doesn’t impact resource pools in any shape or form; it just impacts DRS load balancing. Anyway, I hope you find the demo useful.

VMware vSphere Cluster Services (vCLS) considerations, questions and answers.

Duncan Epping · Oct 9, 2020 ·

In the vSphere 7.0 Update 1 release VMware introduced a new service called the VMware vSphere Cluster Services (vCLS). vCLS provides a mechanism that allows VMware to decouple both vSphere DRS and vSphere HA from vCenter Server. Niels Hagoort wrote a lengthy article on this topic here. You may wonder why VMware introduces this. Well, as Niels states, by decoupling the clustering services (DRS and HA) from vCenter Server via vCLS we ensure the availability of critical services even when vCenter Server is impacted by a failure.

vCLS is a collection of multiple VMs which, over time, will be the backbone for all clustering services. In the 7.0 U1 release a subset of DRS functionality is enabled through vCLS. Over the past week(s) I have seen many questions coming in and I wanted to create a blog with answers to these questions. When new questions or considerations come up, I will add these to the list below.

Announcing VMware Cloud Disaster Recovery! (VCDR)

Duncan Epping · Sep 30, 2020 ·

Most of you probably saw the announcements around the acquisition of Datrium not too long ago. One of the major drivers for that acquisition was the Disaster Recovery solution which Datrium developed. This week at VMworld this service was announced as a new VMware disaster recovery option. The service is named VMware Cloud Disaster Recovery, and it provides the ability to replicate workloads from on-prem into cloud storage, and recover from cloud storage into VMware Cloud on AWS! The three key pillars of the service are ease of use, fast recovery, cloud economics.

The solution is extensively covered in three VMworld sessions (HCI2876, HCI2886, HCI2865). I have watched all three and will provide a short summary here. What capabilities does VMware Cloud DR (VCDR) provide and why is VMware heading into this space?

The why was well explained by Mark Chuang in HCI2876, customers are saying that:

“DR is very complex and expensive to manage, and I can’t add IT Headcount”
“Our data grows 10-15% every year, with physical DR it is hard to accommodate the growth in the datacenter to meet the needs”
“We only test full DR once a year because it is disruptive. Any time there is a major change, how can we know it still works? It is a huge issue!”

I guess that makes it clear why VMware is interested in this space: it is a huge problem for customers and the solution typically comes at a high cost. VMware has always been in the business of solving complex problems in preferably a simple way, and that is exactly what VMware Cloud Disaster Recovery delivers: a simple solution at a relatively low cost.

So what does it bring from a feature/functionality stance?

It all starts with cloud economics, to which ease-of-use also contributes, in my opinion. VMware Cloud Disaster Recovery is super simple to configure and it replicates data to “cheap and deep” cloud storage. This ensures that the cost can be kept low, and note that all of the typical costs that come with cloud storage (network etc.) are included in the service offering by VMware. The challenge with cloud storage, however, is typically that it is relatively slow when it comes to restoring, but this is where the “on-demand” capabilities come into play. VMware Cloud DR provides the ability to instantly power on workloads through a live mount option, without the need to convert the stored data back to a VM format.

When configuring the VMware Cloud DR solution you will need to install/configure a DRaaS Connector on-prem. This on-prem Connector connects you to the SaaS platform and will provide the required details to the SaaS Orchestrator. Note that you can have multiple DRaaS Connectors for resiliency and performance reasons. When the connection is configured you will then be able to create “Protection Groups” and “DR Plans”. Those who have worked with Site Recovery Manager will recognize the terms. For those who haven’t:

Protection Groups – These groups list the workloads which will be protected by VMware Cloud DR. Of course you can define the protection schedule, basically how many snapshots need to be shipped to remote cloud storage per day/week/month.
DR Plans – These plans list workloads that would need to be failed over when the plan is triggered, and for instance, include the order in which the workloads need to be powered on. Also, if workloads need to get a different IP address in the cloud, then you can specify this here also.
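To illustrate how the two concepts relate, here is a purely hypothetical Python model of a protection group and a DR plan. None of the class or field names come from VMware Cloud DR; they are assumptions made up for this sketch, which only mirrors the description above (a schedule per group, and a power-on order plus optional cloud IP per workload in a plan).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProtectionGroup:
    """Hypothetical model: workloads plus how often snapshots are shipped."""
    name: str
    workloads: List[str]
    snapshots_per_day: int


@dataclass
class PlanStep:
    """Hypothetical model: one workload in a DR plan."""
    workload: str
    power_on_order: int
    cloud_ip: Optional[str] = None  # only set when the workload needs a different IP


@dataclass
class DRPlan:
    """Hypothetical model: workloads to fail over when the plan is triggered."""
    name: str
    steps: List[PlanStep] = field(default_factory=list)

    def failover_sequence(self) -> List[str]:
        # Workloads are powered on in ascending power_on_order.
        ordered = sorted(self.steps, key=lambda step: step.power_on_order)
        return [step.workload for step in ordered]


group = ProtectionGroup("tier-1", ["db-01", "app-01", "web-01"], snapshots_per_day=4)
plan = DRPlan(
    "tier-1-failover",
    [
        PlanStep("web-01", power_on_order=3),
        PlanStep("db-01", power_on_order=1, cloud_ip="10.0.0.10"),
        PlanStep("app-01", power_on_order=2),
    ],
)
print(plan.failover_sequence())
# prints: ['db-01', 'app-01', 'web-01']
```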

Of course, besides creating protection groups and DR plans, you have the ability to test and fail over the workloads in those plans, again, very similar to what Site Recovery Manager offers. Before I forget, you will of course have the option to select the snapshot you want to recover from, so you can go back to any point in time. What is unique here is that VMs are powered on without (initially) moving data from cloud storage to your VMware Cloud on AWS. It basically mounts an NFS share from the SaaS platform, and the scale-out file system ensures that the VMs can instantly be powered on. After you have tested the recovery you can then decide to migrate the VMs to your SDDC, or you can of course also discard the workloads if that is something you desire. Last but not least, you also have the ability to replicate back to on-prem, so that you can bring your workloads back whenever you have recovered your environment from the disaster that occurred and you are ready to run those workloads on-prem again.

Now there are many more details, but I am not going to share those in this post, I may do a couple of additional blogs at a future time. I hope the above gives a good overview of what the offering will provide. For more details, I would recommend watching the VMworld sessions on this topic (HCI2876, HCI2886, HCI2865). The last thing I want to share though is where the solution will be available, or at least what is being planned. As shown below, the offering should be available in multiple regions soon.

Host in vSAN cluster with 0 components while other hosts are almost full?

Duncan Epping · Sep 3, 2020 ·

Internally someone just bumped into an issue where a single host in a cluster wasn’t storing any of the created vSAN Components / Objects. It was to the point where every single host in the cluster was close to the maximum of 9000 components, but that one host had 0 components. After some quick back and forth the following message stood out in the UI:

vSAN node decommission state

What does this mean? Well basically it means that from a vSAN stance this host is in maintenance mode. For whatever reason, the host itself from a hypervisor stance was not in maintenance mode, which means that the two were not in sync. This can simply be resolved by SSH’ing into the respective host and running the following command:

localcli vsan maintenancemode cancel

One thing to consider of course is to trigger a rebalance of the cluster after taking the host out of the decommissioned state when the environment is 6.7 U2 or lower, as that would result in a more equally balanced environment. Starting with 6.7 U3 this process is initiated automatically, when configured. There is a KB that describes how to trigger and/or configure the rebalance, which can be found here.
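The symptom described in this post (one host with 0 components while the rest are close to the 9000 limit) is easy to spot if you already have per-host component counts. Below is a minimal Python sketch of such a check. The host names, the function name, and the 90% “near full” threshold are assumptions for illustration, and the sketch does not collect the counts from vSAN itself.

```python
MAX_COMPONENTS_PER_HOST = 9000  # per-host component maximum mentioned in the post


def find_suspect_hosts(component_counts, near_full_ratio=0.9):
    """Return hosts holding 0 components while every other host is near the limit.

    component_counts maps host name -> number of vSAN components on that host.
    near_full_ratio is an arbitrary illustrative threshold, not a vSAN setting.
    """
    threshold = MAX_COMPONENTS_PER_HOST * near_full_ratio
    empty_hosts = [host for host, count in component_counts.items() if count == 0]
    other_counts = [count for count in component_counts.values() if count > 0]
    if other_counts and all(count >= threshold for count in other_counts):
        return sorted(empty_hosts)
    return []


counts = {"esxi-01": 8900, "esxi-02": 8950, "esxi-03": 0, "esxi-04": 8870}
print(find_suspect_hosts(counts))
# prints: ['esxi-03']
```

A host flagged this way is a candidate for checking the vSAN node decommission state, as described above.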

vSphere FT and vVols/SPBM an unsupported config? Why?

Duncan Epping · Aug 20, 2020 ·

A customer (thanks Johan) pointed out to me that vSphere FT is not supported when using SPBM on non-vSAN based storage systems. You may wonder why this is; at least I did. I figured it would be a testing constraint of some sort, but after emailing product management, engineering, and our quality engineering team I now understand why it is. Before I explain it, note that the constraint is documented here. Let me quote the section for you:

Virtual Volume datastores.
Storage-based policy management. Storage policies are supported for vSAN storage.

So why is this, and why would vSAN be supported when it also uses SPBM? Well, the difference is in the implementation. For vVols there’s a dependency on vCenter Server being available when creating new VMs. This is essentially what happens when an FT instance needs to be restarted: we need to associate an SPBM policy with it, and we can only retrieve that policy via vCenter Server. With vSAN, FT/HA can also retrieve the needed info via the ESXi host. This is why FT and vSAN are a supported configuration, and vVols and FT, unfortunately, are not at the moment. Hopefully, though, this will change in the future. (Yes, I filed a feature request before anyone asks.)