Yellow Bricks

by Duncan Epping

Software Defined

HCI Mesh error: Failed to run the remote datastore mount pre-checks

Duncan Epping · Apr 21, 2021 ·

I had two comments on my HCI Mesh compute-only blog post, both reporting the same error when trying to mount a remote datastore. The error that popped up was the following:

Failed to run the remote datastore mount pre-checks.

I tried to reproduce it in my lab. As both had upgraded from 7.0 to U2, I did the same, but that didn’t result in the same error. The error doesn’t provide any details about why the pre-check fails, as shown in the screenshot below. After some digging, I found out that the solution is simple though: you need to make sure IPv6 is enabled on your hosts. Yes, even when you are not using IPv6, it still needs to be enabled to pass the pre-checks. Thanks, Jiří and Reza, for raising the issue!

[Screenshot: the “Failed to run the remote datastore mount pre-checks” error]
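If you want to verify this across your hosts before retrying the mount, something like the rough Python sketch below can do it: it runs esxcli over SSH to report the IPv6 state and, where needed, re-enables it. The host names are placeholders, the exact output of "esxcli network ip get" may differ per build, and enabling IPv6 requires a host reboot.

```python
# Rough sketch: check IPv6 on a list of ESXi hosts over SSH using esxcli,
# and re-enable it where it was turned off. Host names are placeholders.
import subprocess

HOSTS = ["esxi-host-1.lab.local", "esxi-host-2.lab.local"]  # placeholder names

def esxcli(host: str, command: str) -> str:
    """Run an esxcli command on an ESXi host over SSH and return its output."""
    result = subprocess.run(
        ["ssh", f"root@{host}", f"esxcli {command}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

for host in HOSTS:
    ip_config = esxcli(host, "network ip get")
    print(f"{host}:\n{ip_config}")
    # Exact output format may vary per build; adjust the check accordingly.
    if "ipv6 enabled: false" in ip_config.lower():
        esxcli(host, "network ip set --ipv6-enabled=true")
        print(f"{host}: IPv6 enabled, reboot the host for the change to take effect")
```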

Does vSAN Enhanced Durability work when you have a limited number of hosts?

Duncan Epping · Apr 19, 2021 ·

Last week I had a question about how vSAN Enhanced Durability works when you have a limited number of hosts. In this case, the customer had a 3+3+1 stretched cluster configuration, and they wondered what would happen when they would place a host into maintenance mode. Although I was pretty sure I knew what would happen, I figured I would test it in the lab anyway. Let’s start with a high-level diagram of what the environment looks like. Note I use a single VM as an example, just to keep the scenario easy to follow.

In the diagram, we see a virtual disk that is configured to be stretched across locations and protected by RAID-1 within each location. As a result, you will have two RAID-1 trees, each with two components and a witness, and of course, you would have a witness component in the witness location. Now the question is, what happens when you place esxi-host-1 into maintenance mode? In this scenario, vSAN Enhanced Durability will want to create a “durability component”, to which all new write I/O is committed. This allows vSAN to resync quickly after maintenance mode and enhances durability, as we would still have two copies of the (new) data.

However, in the scenario above we only have 3 hosts per location. The question then is, where is this delta component created? Normally, with maintenance mode, you would need a 4th host to move data to. Well, it is simple: in this case, vSAN creates the “durability component” on the host where the witness resides, within the same location of course. Let me show you in a diagram, as that makes it clear instantly.

By adding the durability component next to the witness on esxi-host-3, vSAN enhances durability even in this stretched cluster situation, as it provides an additional local copy of new writes. Now, of course, I tested this in my lab, so for those who prefer to see a demo, check out the YouTube video below.
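For those who prefer code over a diagram, below is a small toy model of the placement decision described above. This is purely illustrative and not vSAN’s actual placement logic; the host names simply match the diagram.

```python
# Toy model (illustration only, not vSAN code) of the scenario above:
# a 3-host data site where esxi-host-1 goes into maintenance mode.

# Which components of this object live on each host in the preferred site.
preferred_site = {
    "esxi-host-1": ["data-component"],   # host entering maintenance mode
    "esxi-host-2": ["data-component"],
    "esxi-host-3": ["site-witness"],
}

def place_durability_component(site_hosts: dict, mm_host: str) -> str:
    """Pick a host in the same site to receive the durability component.

    Prefer a host that holds no component of this object; with only three
    hosts per site there is none, so fall back to the host that holds the
    site-local witness, which is where the durability component ends up.
    """
    candidates = {h: comps for h, comps in site_hosts.items() if h != mm_host}
    empty_hosts = [h for h, comps in candidates.items() if not comps]
    if empty_hosts:
        return empty_hosts[0]
    return next(h for h, comps in candidates.items() if "site-witness" in comps)

print(place_durability_component(preferred_site, "esxi-host-1"))  # -> esxi-host-3
```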

vSAN File Services and Stretched Clusters!

Duncan Epping · Mar 29, 2021 ·

As most of you probably know, vSAN File Services is not supported on a stretched cluster with vSAN 7.0 or 7.0 U1. However, starting with vSAN 7.0 U2, we now fully support the use of vSAN File Services on a stretched cluster configuration! Why is that?

In 7.0 U2, you now have the ability to specify, during the configuration of vSAN File Services, to which site certain IP addresses belong. In other words, you can specify the “site affinity” of your File Service services. This is shown in the screenshot below. Now I do want to note that this is a soft affinity rule, meaning that if the hosts or VMs on which these file services containers are running fail, the container could be restarted in the opposite location. Again, a soft rule, not a hard rule!

Of course, that is not the end of the story. You also need to be able to specify for each share with which location it has affinity. Again, you can do this during configuration (or edit it afterward if desired), and this then sets the affinity for the file share to a location. Or, said differently, it ensures that when you connect to the file share, one of the file servers in the specified site will be used. Again, this is a soft rule, meaning that if none of the file servers are available on that site, you will still be able to use vSAN File Services, just not with the optimized data path you defined.
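To make the “soft rule” behavior a bit more concrete, here is a minimal, made-up sketch of the selection logic: prefer a file server in the share’s affinity site, and fall back to any available file server if that site has none left. This is purely illustrative and not how vSAN File Services implements it internally; the names and IP addresses are invented.

```python
# Illustrative sketch of the "soft" site affinity described above; this is
# not the actual vSAN File Services implementation. Names/IPs are invented.
FILE_SERVERS = [
    {"ip": "192.168.1.11", "site": "preferred", "available": True},
    {"ip": "192.168.1.12", "site": "preferred", "available": True},
    {"ip": "192.168.2.11", "site": "secondary", "available": True},
]

def pick_file_server(share_affinity: str, servers: list) -> dict:
    """Prefer a file server in the share's affinity site; if none is
    available there, fall back to any available server (soft rule)."""
    local = [s for s in servers if s["site"] == share_affinity and s["available"]]
    if local:
        return local[0]
    return next(s for s in servers if s["available"])

print(pick_file_server("preferred", FILE_SERVERS))  # served from the preferred site

# Simulate the preferred site going down: the soft rule falls back gracefully.
for s in FILE_SERVERS:
    if s["site"] == "preferred":
        s["available"] = False
print(pick_file_server("preferred", FILE_SERVERS))  # served from the secondary site
```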

Hopefully, that gives a quick overview of how you can use vSAN File Services in combination with a vSAN Stretched Cluster. I created a video to demonstrate these new capabilities; you can watch it below.

vSAN 7.0 U2 now integrates with vSphere DRS

Duncan Epping · Mar 24, 2021 ·

One of the features our team requested a while back was integration between DRS and vSAN. The key use case we had was for stretched clusters. Especially in scenarios where a failure has occurred, it would be useful if DRS would understand what vSAN is doing. What do I mean by that?

Today when customers create a stretched cluster they have two locations. Using vSAN terminology these locations are referred to as the Preferred Fault Domain and the Secondary Fault Domain. Typically when VMs are then deployed, customers will create VM-to-Host Affinity Rules which state that VMs should reside in a particular location. When these rules are created DRS will do its best to ensure that the defined rule is adhered to. What is the problem?

Well, if you are running a stretched cluster and, let’s say, one of the sites goes down, then what happens when the failed location returns for duty is the following:

  • vSAN detects the missing components are available again
  • vSAN will start the resynchronization of the components
  • DRS runs every minute, rebalances the cluster, and moves VMs based on the DRS rules

This means that the VMs for which rules are defined will move back to their respective location, even though vSAN is potentially still resynchronizing the data. First of all, the migration will interfere with the replication traffic. Secondly, for as long as the resync has not completed, I/O will traverse the network between the two locations, which will not only interfere with resync traffic, but also increase latency for those workloads. So, how does vSAN 7.0 U2 solve this?

Starting with vSAN 7.0 U2 and vSphere 7.0 U2, DRS and vSAN now communicate. DRS will verify with vSAN what the state of the environment is, and it will not migrate the VMs back until the VMs are healthy again. When the VMs are healthy and the resync has completed, you will see the rules being applied and the VMs automatically migrating back (when DRS is configured as Fully Automated, that is).
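To illustrate what that means in practice, the decision effectively boils down to something like the conceptual sketch below. Note that this logic lives inside DRS and vSAN in 7.0 U2; it is not something you configure or script, and the objects in the sketch are made-up stand-ins.

```python
# Conceptual sketch only: in 7.0 U2 this logic lives inside DRS/vSAN and is
# not user-configurable. The VM object below is a made-up stand-in.
from dataclasses import dataclass

@dataclass
class VM:
    name: str
    preferred_site: str    # site defined by the VM-to-Host affinity rule
    current_site: str      # site the VM is currently running in
    objects_healthy: bool  # vSAN object health, i.e. resync completed

def should_migrate_back(vm: VM) -> bool:
    """Only move a VM back to its rule-defined site once vSAN reports its
    objects healthy (resynchronization has completed)."""
    return vm.current_site != vm.preferred_site and vm.objects_healthy

vm = VM("app01", preferred_site="preferred", current_site="secondary", objects_healthy=False)
print(should_migrate_back(vm))  # False: resync still running, DRS leaves the VM alone
vm.objects_healthy = True
print(should_migrate_back(vm))  # True: rule is applied, the VM migrates back
```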

I can’t really show it with a screenshot or anything, as this is a change in the vSAN/DRS architecture, but to make sure it worked I recorded a quick demo, which I published on YouTube. Make sure to watch the video!

vSAN 7.0 U2 Durability Components?

Duncan Epping · Mar 22, 2021 ·

Last week I published a new demo on my YouTube channel (at the bottom of this post) which discussed an enhanced feature called Durability Components. Some may know these as “delta components” as well. These durability components were introduced in vSAN 7.0 Update 1 and provided a mechanism to maintain the required availability for VMs while doing maintenance. That means that when you place a host into maintenance mode, new “durability components” are created for the components which were stored on that host. This allows all new VM I/O to be committed to the existing component, as well as to the durability component.

Now, starting with vSAN 7.0 Update 2, vSAN also uses these durability components in situations where a host failure has occurred. So if a host has failed, durability components will be created to ensure we still maintain the availability level specified within the policy, as shown in the diagram above. The great thing is that if a second host fails in an FTT=1 scenario and you are able to recover the first failed host, we can still merge the durability component with the data on the first failed host! So not only are these durability components great for improving resync times, they also provide a higher level of availability to vSAN! To summarize:

  1. Host fails
  2. Durability components are created for all impacted objects
  3. New writes are committed to existing components and the new durability components
  4. Host recovers
  5. Durability components are merged with the previously failed components
  6. Durability components are deleted when resync has completed
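To make those six steps a bit more tangible, here is a purely illustrative walk-through in code, using lists of blocks as stand-ins for components. This is obviously not how vSAN stores data; it just shows why only the delta needs to be resynced.

```python
# Conceptual walk-through of the six steps above (illustration only, not
# vSAN code): a mirror goes absent, a durability component absorbs new
# writes, and the two are merged once the host returns.
component = ["A", "B", "C"]       # data already on the failed host's mirror
active_mirror = ["A", "B", "C"]   # healthy copy on a surviving host

# 1-2. Host fails, a durability component is created for the impacted object.
durability = []

# 3. New writes go to the active mirror *and* the durability component.
for block in ["D", "E"]:
    active_mirror.append(block)
    durability.append(block)

# 4-5. Host recovers: merge the durability component with the stale mirror
#      instead of resyncing the full object.
component.extend(durability)

# 6. The durability component is deleted once the resync completes.
durability.clear()

assert component == active_mirror == ["A", "B", "C", "D", "E"]
print("stale mirror caught up using only the delta:", component)
```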

I hope that helps provide a better understanding of how these durability components help improve availability/resiliency in your environment with vSAN 7.0 Update 2.

I can understand that some of you may not want to test durability components in your own environment, which is why I recorded a quick demo and published it on my YouTube channel. Check out the video below, as it also shows you how durability components are represented in the UI.
