One of the features our team requested a while back was integration between DRS and vSAN. The key use case we had was for stretched clusters. Especially in scenarios where a failure has occurred, it would be useful if DRS would understand what vSAN is doing. What do I mean by that?
Today when customers create a stretched cluster they have two locations. Using vSAN terminology these locations are referred to as the Preferred Fault Domain and the Secondary Fault Domain. Typically when VMs are then deployed, customers will create VM-to-Host Affinity Rules which state that VMs should reside in a particular location. When these rules are created DRS will do its best to ensure that the defined rule is adhered to. What is the problem?
Well, if you are running a stretched cluster and, let’s say, one of the sites goes down, then the following happens when the failed location returns for duty:
- vSAN detects the missing components are available again
- vSAN will start the resynchronization of the components
- DRS runs every minute, rebalances the cluster, and moves VMs based on the DRS rules
This means that the VMs for which rules are defined will move back to their respective location, even though vSAN is potentially still resynchronizing the data. First of all, the migration will interfere with the replication traffic. Secondly, for as long as the resync has not completed, I/O will traverse the network between the two locations. This will not only interfere with resync traffic, it will also increase latency for those workloads. So, how does vSAN 7.0 U2 solve this?
Starting with vSAN 7.0 U2 and vSphere 7.0 U2, DRS and vSAN now communicate with each other. DRS will verify the state of the environment with vSAN, and it will not migrate the VMs back until they are healthy again. When the VMs are healthy and the resync has completed, you will see the rules being applied and the VMs automatically migrate back (when DRS is configured to Fully Automated, that is).
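To make the difference concrete, here is a small toy model of the decision in Python. This is purely illustrative: it is not VMware code and it does not call any vSphere or vSAN API. The `VM` class, its fields, and the `vms_to_migrate` function are hypothetical names I made up for the sketch, and tracking the resync state per VM is an assumption on my part.

```python
from dataclasses import dataclass


@dataclass
class VM:
    """Hypothetical stand-in for a VM in a stretched cluster (not a real API object)."""
    name: str
    current_site: str     # site where the VM is running right now
    preferred_site: str   # site named by the VM-to-Host affinity rule
    resync_pending: bool  # assumption: True while vSAN still resyncs this VM's data


def vms_to_migrate(vms, vsan_aware):
    """Return the names of VMs a DRS pass would move back to their preferred site.

    vsan_aware=False models the old behavior: only the affinity rule matters.
    vsan_aware=True models the 7.0 U2 behavior: a VM stays put until its
    resync has completed.
    """
    moves = []
    for vm in vms:
        if vm.current_site == vm.preferred_site:
            continue  # rule already satisfied, nothing to do
        if vsan_aware and vm.resync_pending:
            continue  # hold the VM back while data is still resynchronizing
        moves.append(vm.name)
    return moves


# Example state shortly after the failed site has come back online.
vms = [
    VM("app01", "secondary", "preferred", resync_pending=True),
    VM("app02", "secondary", "preferred", resync_pending=False),
    VM("db01", "secondary", "secondary", resync_pending=False),
]

print(vms_to_migrate(vms, vsan_aware=False))  # ['app01', 'app02']
print(vms_to_migrate(vms, vsan_aware=True))   # ['app02']
```

In the model, `app01` only becomes a migration candidate once its `resync_pending` flag flips to `False`, which mirrors the idea that the rules are applied again after the resync has completed.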
I can’t really show it with a screenshot or anything, as this is a change in the vSAN/DRS architecture, but to make sure it worked I recorded a quick demo, which I published on YouTube. Make sure to watch the video!