I was talking to a partner and customer last week at a VMUG. They were running a two-node (direct connect) vSAN configuration and had some issues during maintenance which were, to them, not easy to explain. What they did was place the host which was in the “preferred fault domain” into maintenance mode. After they placed that host into maintenance mode, the link between the two hosts failed for whatever reason. After they rebooted the host in the preferred fault domain, it connected back to the witness, but at that point the connection between the two hosts had not yet returned. This confused vSAN and resulted in the VMs in the secondary fault domain being powered off. As you can imagine, an undesired effect.
This issue will be solved in the near future in a new version of vSAN, but for those who need to do maintenance on a two-node (direct connect) configuration (or full site maintenance in a stretched environment) I would highly recommend the following simple procedure. It needs to be followed when doing maintenance on the host which is in the “preferred fault domain”:
- Change the preferred fault domain
- Place the host into maintenance mode
- Do your maintenance
Fairly straightforward, but important to remember…
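If you want to bake the ordering into a runbook or pre-check script, the logic can be sketched in a few lines of Python. Note that this is only an illustrative model: `TwoNodeCluster`, `maintenance_plan`, and the host and fault domain names are all made up for this example and are not part of any VMware API or CLI. It only works out which steps to take and in which order; actually changing the preferred fault domain and entering maintenance mode would still be done with whatever tooling you normally use.

```python
from dataclasses import dataclass


@dataclass
class TwoNodeCluster:
    """Hypothetical in-memory model of a two-node vSAN cluster (not a VMware API)."""
    fault_domains: dict  # fault domain name -> host name (one host per fault domain)
    preferred: str       # name of the current preferred fault domain

    def fault_domain_of(self, host: str) -> str:
        """Return the name of the fault domain that contains `host`."""
        for name, member in self.fault_domains.items():
            if member == host:
                return name
        raise ValueError(f"unknown host: {host}")


def maintenance_plan(cluster: TwoNodeCluster, host: str) -> list:
    """Return the ordered steps for doing maintenance on `host`."""
    steps = []
    current = cluster.fault_domain_of(host)
    if current == cluster.preferred:
        # The host sits in the preferred fault domain, so move the
        # "preferred" role to the other fault domain first.
        other = next(name for name in cluster.fault_domains if name != current)
        steps.append(f"change preferred fault domain from {current} to {other}")
    steps.append(f"place {host} into maintenance mode")
    steps.append(f"do maintenance on {host}")
    return steps


# Example with made-up names: esxi-01 is in the preferred fault domain.
cluster = TwoNodeCluster(
    fault_domains={"Preferred": "esxi-01", "Secondary": "esxi-02"},
    preferred="Preferred",
)
for step in maintenance_plan(cluster, "esxi-01"):
    print(step)
```

For a host that is not in the preferred fault domain, the plan simply skips the first step, which matches the procedure above: the extra step only matters for the host in the “preferred fault domain”.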