I had this question today, probably sparked by my earlier post where I showed a single-node cluster being able to leverage HCI Mesh to mount a remote vSAN Datastore. The question was whether this also works with 2-node vSAN or with a vSAN Stretched Cluster. Unfortunately, the answer is no, a 2-node cluster or a vSAN Stretched Cluster is not supported with HCI Mesh today. Yes, this is a hard limit, meaning that the health check which runs before mounting the datastore will report the issue and not allow you to proceed. You can imagine that this is the result of the latency and bandwidth/throughput requirements that are in place for vSAN HCI Mesh today. This may, or may not, change over time.
Does vSAN Enhanced Durability work when you have a limited number of hosts?
Last week I had a question about how vSAN Enhanced Durability works when you have a limited number of hosts. In this case, the customer had a 3+3+1 stretched cluster configuration, and they wondered what would happen when they placed a host into maintenance mode. Although I was pretty sure I knew what would happen, I figured I would test it in the lab anyway. Let’s start with a high-level diagram of what the environment looks like. Note that I use a single VM as an example, just to keep the scenario easy to follow.
In the diagram, we see a virtual disk that is configured to be stretched across locations and protected by RAID-1 within each location. As a result, you will have two RAID-1 trees, each with two components and a witness, and of course, you would have a witness component in the witness location. Now the question is, what happens when you place esxi-host-1 into maintenance mode? In this scenario, vSAN Enhanced Durability will want to create a “durability component”. All new write I/O is committed to this durability component. This allows vSAN to resynchronize quickly after maintenance mode, and it enhances durability as we still have two copies of the (new) data.
However, in the scenario above we only have 3 hosts per location, so where is this durability component created? Normally, with maintenance mode, you would need a 4th host to move the data to. Well, it is simple: in this case vSAN creates the durability component on the host where the witness resides, within the same location of course. Let me show you in a diagram, as that makes it clear instantly.
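For those who want to reproduce this in a lab, below is a minimal pyVmomi sketch of the action that triggers the behavior: placing the host into maintenance mode with the “Ensure accessibility” option, which is when vSAN builds the durability component. The vCenter address, credentials, and host name are assumptions, purely for illustration.

```python
# Minimal pyVmomi sketch: put esxi-host-1 into maintenance mode with the
# "Ensure accessibility" vSAN decommission mode, the action that triggers
# the creation of the durability component described above.
# Hostnames and credentials below are assumptions (lab values).
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim
import ssl

ctx = ssl._create_unverified_context()  # lab only, skip certificate checks
si = SmartConnect(host="vcsa.lab.local", user="administrator@vsphere.local",
                  pwd="VMware1!", sslContext=ctx)

# Find the host we want to place into maintenance mode (name is an assumption)
view = si.content.viewManager.CreateContainerView(
    si.content.rootFolder, [vim.HostSystem], True)
host = next(h for h in view.view if h.name == "esxi-host-1.lab.local")

# vSAN decommission mode: ensureObjectAccessibility keeps objects accessible
# without a full data evacuation; vSAN then commits new writes to durability
# components while the host is down for maintenance.
spec = vim.host.MaintenanceSpec(
    vsanMode=vim.vsan.host.DecommissionMode(
        objectAction="ensureObjectAccessibility"))

WaitForTask(host.EnterMaintenanceMode_Task(timeout=0, maintenanceSpec=spec))
print("Host is in maintenance mode; check the durability components in the UI")

Disconnect(si)
```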
By adding the durability component next to the witness on esxi-host-3, vSAN enhances durability even in this stretched cluster situation, as it provides an additional local copy of new writes. Now, of course, I tested this in my lab. So for those who prefer to see a demo, check out the YouTube video below.
Using HCI Mesh with a stand-alone vSphere host?
Last week at the French VMUG we received a great question. The question was whether you can use HCI Mesh (datastore sharing) with a stand-alone vSphere host. The answer is simple: no, you cannot. VMware does not support enabling vSAN, and HCI Mesh, on a single stand-alone host. However, if you still want to mount a vSAN Datastore from a single vSphere host, there is a way around this limitation.
First, let’s list the requirements:
- The host needs to be managed by the same vCenter Server as the vSAN Cluster
- The host needs to be under the same virtual datacenter as the vSAN Cluster
- Low latency, high bandwidth connection between the host and the vSAN Cluster
If you meet these requirements, then you can mount the vSAN Datastore to a single host as follows (a scripted sketch of the first two steps is included after the list):
- Create a cluster without any services enabled
- Add the stand-alone host to the cluster
- Enable vSAN, select “vSAN HCI Mesh Compute Cluster”
- Mount the datastore
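For those who prefer scripting, here is a minimal pyVmomi sketch of the first two steps: creating an empty cluster and moving the already-managed stand-alone host into it. Enabling “vSAN HCI Mesh Compute Cluster” and mounting the datastore would then be done through the H5 UI (or the vSAN Management SDK). The vCenter address, credentials, datacenter name, cluster name, and host name are assumptions.

```python
# Minimal pyVmomi sketch of the first two steps in the list above:
# 1) create a cluster without any services enabled, 2) move the stand-alone
# host (already managed by this vCenter Server) into it.
# All names and credentials below are assumptions (lab values).
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim
import ssl

ctx = ssl._create_unverified_context()  # lab only
si = SmartConnect(host="vcsa.lab.local", user="administrator@vsphere.local",
                  pwd="VMware1!", sslContext=ctx)
content = si.content

# The stand-alone host and the vSAN cluster must live under the same datacenter
dc = next(e for e in content.rootFolder.childEntity
          if isinstance(e, vim.Datacenter) and e.name == "Datacenter")

# Step 1: create a cluster without any services enabled
cluster = dc.hostFolder.CreateClusterEx("hci-mesh-compute",
                                        vim.cluster.ConfigSpecEx())

# Step 2: move the existing stand-alone host into the new cluster
view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.HostSystem], True)
host = next(h for h in view.view if h.name == "esxi-standalone.lab.local")
WaitForTask(cluster.MoveInto_Task(host=[host]))

print("Cluster created and host moved; now enable HCI Mesh compute and mount the datastore")
Disconnect(si)
```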
Note that when you create a cluster and add a host, vCenter/EAM will try to provision the vCLS VM. Of course, this VM is not really needed, as HA and DRS are not useful in a single-host cluster. So what you can do is enable “retreat mode”. For those who don’t know how to do this, or those who want to know more about vCLS, read this article.
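As a reference, retreat mode comes down to setting the vCenter advanced setting config.vcls.clusters.&lt;cluster-domain-id&gt;.enabled to false. Below is a hedged pyVmomi sketch of that change; the cluster name and credentials are assumptions, and as always, verify the exact procedure in the official documentation or the article linked above.

```python
# Sketch: enable vCLS "retreat mode" for the single-host cluster by setting
# the vCenter advanced setting config.vcls.clusters.<domain-id>.enabled to
# "false". Cluster name and credentials are assumptions (lab values).
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim
import ssl

ctx = ssl._create_unverified_context()  # lab only
si = SmartConnect(host="vcsa.lab.local", user="administrator@vsphere.local",
                  pwd="VMware1!", sslContext=ctx)

# Find the single-host cluster and derive its domain ID (e.g. "domain-c123")
view = si.content.viewManager.CreateContainerView(
    si.content.rootFolder, [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == "hci-mesh-compute")
domain_id = cluster._moId  # managed object ID, e.g. "domain-c123"

# Flip the retreat mode switch in the vCenter Server advanced settings
key = "config.vcls.clusters.%s.enabled" % domain_id
si.content.setting.UpdateOptions(
    changedValue=[vim.option.OptionValue(key=key, value="false")])
print("Retreat mode enabled for %s; the vCLS VM will be cleaned up" % cluster.name)

Disconnect(si)
```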
As I had to test the above in my lab, I also created a short video demonstrating the workflow, watch it below.
vSphere Native Key Provider backup not working
I had three people ask this question over the past few weeks: they were trying to configure the vSphere Native Key Provider so that they could enable vSAN Encryption, but the backup function wasn’t working. If you have not seen the Native Key Provider in action yet, just watch the video below.
As demonstrated in the video, when you configure the vSphere Native Key Provider, you need to back up the key first before you can use it. Now, as mentioned, I had a few folks asking over the past weeks why they couldn’t back up the key. The reason for it is simple: when you configure the Native Key Provider and want to back it up, you need to access the vSphere UI via the fully qualified domain name. In other words, when you access the H5 UI via the IP address of the vCenter Server, the backup function won’t work. Also, when you have multiple vCenter Server instances in Linked Mode, you need to make sure you access the correct vCenter Server, that is, the instance on which the Native Key Provider is enabled. Isn’t all of this documented? Yes, it is! But who reads documentation these days, right?
vSAN 7.0 U2 now integrates with vSphere DRS
One of the features our team requested a while back was integration between DRS and vSAN. The key use case we had was for stretched clusters. Especially in scenarios where a failure has occurred, it would be useful if DRS would understand what vSAN is doing. What do I mean by that?
Today, when customers create a stretched cluster they have two locations. Using vSAN terminology, these locations are referred to as the Preferred Fault Domain and the Secondary Fault Domain. Typically, when VMs are then deployed, customers will create VM-to-Host affinity rules which state that VMs should reside in a particular location. When these rules are created, DRS will do its best to ensure that the defined rule is adhered to. A programmatic sketch of such a rule is included below.
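To make the scenario concrete, here is a minimal pyVmomi sketch that builds a host group for the preferred site, a VM group, and a non-mandatory (“should”) VM-to-Host rule of the kind described above. All names (cluster, hosts, VMs, groups) are assumptions for illustration.

```python
# Minimal pyVmomi sketch of a VM-to-Host "should" affinity rule: a host group
# for the preferred site, a VM group, and a non-mandatory rule that keeps
# those VMs in that site. All names below are assumptions (lab values).
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim
import ssl

ctx = ssl._create_unverified_context()  # lab only
si = SmartConnect(host="vcsa.lab.local", user="administrator@vsphere.local",
                  pwd="VMware1!", sslContext=ctx)

view = si.content.viewManager.CreateContainerView(
    si.content.rootFolder, [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == "stretched-cluster")

# Hosts in the preferred fault domain, and the VMs that should run there
site_a_hosts = [h for h in cluster.host if h.name.startswith("esxi-site-a")]
vm_view = si.content.viewManager.CreateContainerView(
    cluster, [vim.VirtualMachine], True)
site_a_vms = [vm for vm in vm_view.view if vm.name.startswith("app-a")]

spec = vim.cluster.ConfigSpecEx(
    groupSpec=[
        vim.cluster.GroupSpec(operation="add",
            info=vim.cluster.HostGroup(name="site-a-hosts", host=site_a_hosts)),
        vim.cluster.GroupSpec(operation="add",
            info=vim.cluster.VmGroup(name="site-a-vms", vm=site_a_vms)),
    ],
    rulesSpec=[
        vim.cluster.RuleSpec(operation="add",
            info=vim.cluster.VmHostRuleInfo(
                name="site-a-vms-should-run-in-site-a",
                enabled=True,
                mandatory=False,  # "should" rule, not "must"
                vmGroupName="site-a-vms",
                affineHostGroupName="site-a-hosts")),
    ])

WaitForTask(cluster.ReconfigureComputeResource_Task(spec, modify=True))
Disconnect(si)
```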
So what is the problem? Well, if you are running a stretched cluster and, let’s say, one of the sites goes down, then what happens when the failed location returns to duty is the following:
- vSAN detects the missing components are available again
- vSAN will start the resynchronization of the components
- DRS runs every minute, rebalances, and will move VMs back based on the defined DRS rules
This means that the VMs for which rules are defined will move back to their respective location, even though vSAN is potentially still resynchronizing the data. First of all, the migration will interfere with the replication traffic. Secondly, for as long as the resync has not completed, I/O will traverse the network between the two locations; this will not only interfere with resync traffic, it will also increase latency for those workloads. So, how does vSAN 7.0 U2 solve this?
Starting with vSAN 7.0 U2 and vSphere 7.0 U2, DRS and vSAN now communicate. DRS will verify with vSAN what the state of the environment is, and it will not migrate the VMs back until the VMs are healthy again. When the VMs are healthy and the resync has completed, you will see the rules being applied and the VMs automatically migrating back (when DRS is configured as Fully Automated, that is).
I can’t really show it with a screenshot or anything, as this is a change in the vSAN/DRS architecture, but to make sure it worked I recorded a quick demo which I published on YouTube. Make sure to watch the video!