After writing the article that “4 is the minimum number of hosts for VSAN” I received a lot of questions via email and on twitter etc about the cost associated with it and if this was a must. Let me start with saying that I wrote this article to get people thinking about Sizing their VSAN environment. When it comes to it, Virtual SAN and maintenance windows can be a difficult topic.
I guess there are a couple of things to consider here. Even in a regular storage environment you typically do upgrades in a rolling fashion meaning that if you have two controllers one will be upgraded while they other handles IO. In that case you are also at risk. The thing is though, as a virtualization administrator you have a bit more flexibility, and you expect certain features to work as expected like for instance vSphere HA. You need to ask yourself what is the level of risk I am willing to take, the level of risk I can take?
When it comes to placing a host in to Maintenance Mode, from a VSAN point of view you will need to ask yourself:
- Do I want to move data from one host to another to maintain availability levels?
- Do I just want to ensure data accessibility and take the risk of potential downtime during maintenance?
I guess there is something to say for either. When you move data from one node to another, to maintain availability levels, your “maintenance window” could be stretched extremely long. As you would potentially be copying TBs over the network from host to host it could take hours to complete. If your ESXi upgrade including a host reboot takes about 20 minutes, is it acceptable to wait for hours for the data to be migrated? Or do you take the risk, inform your users about the potential downtime, and as such do the maintenance with a higher risk but complete it in minutes rather than hours? After those 20 minutes VSAN would sync up again automatically, so no data loss etc.
It is impossible for me to give you advice on this one to be honest, I would highly recommend to also sit down with your storage team. Look at what their current procedures are today, what they have included in their SLA to the business (if there is one), and how they handle upgrades / periodic maintenance.