After writing the article stating that “4 is the minimum number of hosts for VSAN”, I received a lot of questions via email, on Twitter, and elsewhere about the cost associated with it and whether this was a must. Let me start by saying that I wrote this article to get people thinking about sizing their VSAN environment. When it comes down to it, Virtual SAN and maintenance windows can be a difficult topic.
I guess there are a couple of things to consider here. Even in a regular storage environment you typically do upgrades in a rolling fashion, meaning that if you have two controllers, one will be upgraded while the other handles IO. In that case you are also at risk. The thing is though, as a virtualization administrator you have a bit more flexibility, and you expect certain features, like vSphere HA for instance, to work as expected. You need to ask yourself: what level of risk am I willing to take, and what level of risk can I take?
When it comes to placing a host into Maintenance Mode, from a VSAN point of view you will need to ask yourself:
- Do I want to move data from one host to another to maintain availability levels?
- Do I just want to ensure data accessibility and take the risk of potential downtime during maintenance?
I guess there is something to be said for either. When you move data from one node to another to maintain availability levels, your “maintenance window” could be stretched extremely long. As you would potentially be copying TBs over the network from host to host, it could take hours to complete. If your ESXi upgrade, including a host reboot, takes about 20 minutes, is it acceptable to wait hours for the data to be migrated? Or do you take the risk, inform your users about the potential downtime, and do the maintenance at a higher risk but complete it in minutes rather than hours? After those 20 minutes VSAN would sync up again automatically, so there is no data loss.
It is impossible for me to give you advice on this one, to be honest. I would highly recommend that you also sit down with your storage team. Look at what their current procedures are, what they have included in their SLA to the business (if there is one), and how they handle upgrades / periodic maintenance.
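To make the trade-off a bit more tangible, here is a rough sketch that compares the total rolling-upgrade window for both approaches. The cluster size, the amount of data per host, and the usable bandwidth are assumptions picked for illustration only; the sketch also ignores protocol overhead and the resync that happens after a host comes back, so treat the output as an order-of-magnitude estimate rather than a prediction.

```python
def rolling_window_minutes(hosts, reboot_min, data_tb_per_host,
                           usable_gbps, move_data):
    """Estimate the total maintenance window for a rolling upgrade.

    Assumptions (not from any VSAN sizing tool): 1 TB = 1024 GB,
    1 GB = 8 gigabits, hosts are upgraded one at a time, and the copy
    runs at a constant usable_gbps.
    """
    per_host = reboot_min
    if move_data:
        # Time to copy this host's data to the other hosts first.
        gigabits = data_tb_per_host * 1024 * 8
        per_host += gigabits / usable_gbps / 60
    return hosts * per_host


# Hypothetical cluster: 4 hosts, 20-minute upgrade, 3 TB per host, 5 Gbps usable.
with_move = rolling_window_minutes(4, 20, 3, 5, move_data=True)
without_move = rolling_window_minutes(4, 20, 3, 5, move_data=False)

print(f"Moving data first:   {with_move / 60:.1f} hours")
print(f"Accessibility only:  {without_move / 60:.1f} hours")
```

Under these assumptions, moving the data first turns a window of a bit over an hour into most of a working day, which is exactly the kind of number to take to your storage team.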
Thank you for writing this…I am puzzled by this phrase: “Or do you acknowledge the risk and ensure users are aware and take the shortest path?”
What are you thinking of in this sentence??
For myself, I could see announcing that servers will not be available for 1/2 hour for maintenance, for example. And then one could shut down servers for that time period. Or just move a subset of servers. Various aspects of this approach would work in many SMBs.
But I think you mean something else with this idea of taking the shortest path??
Thank you, Tom
Duncan Epping says
What I was trying to say was the following:
Do you acknowledge the risk of not moving data,
Inform your users of the risk and potential downtime,
And as such take the “short path” in terms of time it takes to upgrade all hosts.
Hello, thank you for explaining. My head is spinning already!! 🙂 🙂 🙂
iwan rahabok says
Good point as always. Could you give an example of the “copying TBs over the network from host to host it could take hours to complete”? I’m thinking we normally use 2x 10 GE (per ESXi) when we have VSAN. Assuming 5 GE is available for the VSAN copy, copying 1 TB would take ~30 minutes (27.3 minutes mathematically). So if the host to be placed in maintenance has 3 TB, it would take about 1.5 hours. 3 TB seems reasonable for 20 VMs (150 GB per VM), since VSAN 1.0 use cases do not yet cover VMs with large storage needs.
Alternatively, we can do a “pre-copy”. Basically, we do the copying earlier, by increasing the number of copies for the VMs that have data on that host. This needs some clever scripting so we don’t have to check every VM one by one 🙂
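The arithmetic above can be checked with a few lines of Python. This is only a sketch: it assumes 1 TB = 1024 GB, 8 gigabits per GB, and a constant 5 Gbps with no protocol overhead, which is the combination that reproduces the 27.3-minute figure.

```python
def transfer_minutes(data_tb, usable_gbps):
    """Minutes needed to copy data_tb terabytes at usable_gbps gigabits/s.

    Assumes 1 TB = 1024 GB and 8 gigabits per GB, with no overhead.
    """
    gigabits = data_tb * 1024 * 8
    return gigabits / usable_gbps / 60


one_tb = transfer_minutes(1, 5)
three_tb = transfer_minutes(3, 5)

print(f"1 TB at 5 Gbps: {one_tb:.1f} minutes")    # 27.3 minutes
print(f"3 TB at 5 Gbps: {three_tb:.1f} minutes")  # 81.9 minutes
```

So 3 TB works out to roughly 82 minutes, or about 1.5 hours once rounded up to allow for real-world overhead.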
Thanks from the little red dot Duncan. Really hope to have you in Singapore one fine day.
Thank you for this great writeup!
With all the excitement about Virtual SAN I’m doing my “homework” and trying to evaluate whether it makes sense for us. This post is a real eye opener.
It got me also thinking whether Virtual SAN is the right solution for small businesses from a financial point of view.
We now use an “Essentials Plus” bundle with 3 hosts and a traditional shared storage, exactly for the reasons you mention in your article. The 3rd machine only makes sense in this strategy as we don’t need the capacity for virtual machines.
Moving to 4 hosts would mean we have to go to another (read: far more expensive) vSphere licensing model.
In our particular case this means:
– extra vSphere licensing costs
– Virtual SAN licensing costs (don’t know pricing yet 😉 )
– cost of hardware adjustments (SSD, higher capacity SAS drives, …)
– cost of an extra host
… have to beat the costs of a traditional shared storage.
The same is also true for a solution based on Nutanix of course.
I hope VMware will address this a bit by altering the bundles.
@Guy @depping Very simple solution. Bag the unnecessary vCenter Foundation 3-host limit for every single person/company that runs VMware, no matter when, where, or how they bought VMware. It’s a stupid, arbitrary restriction designed to force more sales of overpriced VMware despite VMware’s rock-solidness. Then companies and people can better make up their minds what to do. Another thing to do would be to give away vCenter and only charge for the hosts’ CPUs. It will take a while, but XenServer and Hyper-V will eventually take over from VMware if the prices don’t go down for initial purchases and for ongoing support…
I really love VMware products for making stuff that’s actually pretty complicated so easy to manage.
I talked to VMware sales guys at several VMware and Dell events, notably after the release of Windows 2012 and the progress of Hyper-V. From those talks, I understand their focus is primarily on big clients. I’m not an expert but I figure this business model is sound.
It’s a shame VMware doesn’t seem interested in bringing technologies like DRS, storage vMotion and now Virtual SAN to their small clients. They still can make a lot of sense to us.
@Guy Totally agree. VMware likes the one percenters better. 🙂 🙂