<Update 1-oct-15>Make sure to also read this article, as it is based on Virtual SAN 6.1, which is currently the latest version.</update>
What is the minimum number of hosts for VSAN? This is one of those discussions which is difficult… I mean, what is the minimum number of hosts for vSphere HA, for instance? If you ask anyone that question, most people will say: the minimum number for HA is 2. However, when you think about why you are using vSphere HA, you will realize pretty quickly that the actual minimum number is 3.
Why is that? Well, you can imagine that when you need to upgrade your hosts you also want some form of resiliency for your virtual machines. Guess what, if you have only 2 hosts and you are upgrading 1 of them and the other fails… where would your virtual machines be restarted? I can give you the answer: nowhere. The only host you had left is in maintenance mode and undergoing an upgrade. So in that case you are… euhm, screwed.
Now let’s look at VSAN. In order to comply with a “number of failures to tolerate = 1” policy you will need 3 hosts at a minimum at all times. Even if 1 host fails miserably you can still access your data, because with 3 hosts, 2 mirror copies and a witness you will still have > 50% of your components (mirror copies plus witness) available. But what happens when you place one of those hosts in maintenance mode?
Well, I guess when both remaining hosts keep on functioning as expected then all VMs will just keep on running. However, if one fails… then… then you have a challenge. So think about the number of hosts you want to have supporting your VSAN datastore!
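The majority rule described above can be sketched in a few lines of Python. This is a simplified model and not how VSAN is implemented: it assumes each of the three components (two mirror copies and one witness) sits on its own host and counts equally, and that the host in maintenance mode did not move its data elsewhere. The function name `object_accessible` is made up for illustration.

```python
def object_accessible(total_components: int, available_components: int) -> bool:
    """Return True when strictly more than 50% of the components are available."""
    return available_components * 2 > total_components


# FTT = 1 on 3 hosts: 2 mirror copies + 1 witness, one component per host (assumption).
TOTAL_COMPONENTS = 3

# One host fails: 2 of 3 components remain available.
one_failure = object_accessible(TOTAL_COMPONENTS, TOTAL_COMPONENTS - 1)

# One host is in maintenance mode (data not moved) and a second host fails:
# only 1 of 3 components remains available.
maintenance_plus_failure = object_accessible(TOTAL_COMPONENTS, TOTAL_COMPONENTS - 2)

print("one host failed:", one_failure)
print("maintenance + one host failed:", maintenance_plus_failure)
```

With only 3 hosts, the second case is the “challenge”: there is no longer a majority of components left.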
I guess the question then arises: with this “number of failures to tolerate” policy, how many hosts do I need at a minimum? How many mirror copies will be created and how many witnesses? Also, how many hosts will I need when I want to take “maintenance mode” into consideration?
Number of Failures to Tolerate | Mirror copies | Witnesses | Min. Hosts | Hosts + Maintenance |
0 | 1 | 0 | 1 host | n/a |
1 | 2 | 1 | 3 hosts | 4 hosts |
2 | 3 | 2 | 5 hosts | 6 hosts |
3 | 4 | 3 | 7 hosts | 8 hosts |
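The rows of the table follow a simple pattern, which can be written down as a small Python sketch. The formulas here are read off the table itself (mirror copies = failures + 1, witnesses = failures, minimum hosts = 2 × failures + 1, plus one extra host for maintenance); the names `HostRequirements` and `host_requirements` are hypothetical, and the function deliberately refuses values outside the 0 to 3 range the table covers rather than extrapolating.

```python
from typing import NamedTuple, Optional


class HostRequirements(NamedTuple):
    failures_to_tolerate: int
    mirror_copies: int
    witnesses: int
    min_hosts: int
    hosts_with_maintenance: Optional[int]  # None where the table says "n/a"


def host_requirements(ftt: int) -> HostRequirements:
    """Reproduce one row of the table for a given number of failures to tolerate."""
    if not 0 <= ftt <= 3:
        raise ValueError("the table only covers 0 to 3 failures to tolerate")
    min_hosts = 2 * ftt + 1
    # One spare host so a host can be in maintenance mode without dropping
    # below the minimum; the table lists "n/a" for 0 failures.
    hosts_with_maintenance = min_hosts + 1 if ftt > 0 else None
    return HostRequirements(ftt, ftt + 1, ftt, min_hosts, hosts_with_maintenance)


for ftt in range(4):
    print(host_requirements(ftt))
```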
I hope that helps you make the right decision…
Marko says
Very good read! Maybe you should promote VSAN as a RAID6 solution, needs 4 “data hosts”, works fine with 3 and with reduced performance also with 2 “data hosts”.
Unfortunately, from my experience, the fiasco begins earlier. Only one A/C per data center, bad UPS support,…
Duncan Epping says
2 hosts = reduced availability… performance could be roughly the same.
Marko says
Will you post some numbers & figures after GA of VSAN? Would be great!
Duncan says
Performance numbers?
Marko says
Yes, to give us an impression of what “performance could be roughly the same” means in numbers. It’s always good to know how much performance a degraded system gives to your application(s).
Duncan Epping says
So many variables there; that will depend on the number of hosts in your cluster, number of failures, objects impacted, and blocks impacted and accessed.
Marko says
Duncan, sometimes it takes a while until it dawns on me. If I’m right, your table shows that VSAN doesn’t work as a vMSC solution. Or is there a way to create a “witness vm” which only acts as a witness (like the Lefthand FOM)?
Duncan Epping says
VSAN is not a vMSC solution. Not designed to be one, for now that is. You cannot control where the witness resides. But you can see the potential already right 🙂
Marko says
Imho VSAN could be the vMSC solution for a lot of companies! Looking forward to seeing VSAN 2.0 with this feature! 😀
Tom says
Hello,
Thank you for writing this particular article.
What about this idea?? –> Many companies with only 3 hosts might only ever put a host into maintenance mode, 1 at a time, for performing remediation/updates via VUM, which usually requires a reboot. This often lasts 15-30 minutes per host. If one did one host per day or one host at a time, this would mitigate the risk, provided other things were taken care of, such as redundant power supplies, UPS, backup power generator, etc.
Thoughts, anyone??
Thank you, Tom
Duncan Epping says
That is up to the customer of course, I only share what I know… can’t assess how you deal with certain risks 🙂
Carlos says
Hmm, I don’t buy 🙂
Your +1 for maintenance does not provide for real tolerance. If you take one of the copies for maintenance and your only working copy fails, then you are kaput even if you have 4 hosts, because you would not have had time to regain redundancy.
Duncan Epping says
I think you misunderstood the concept:
N+1 = 3 hosts needed (2 components for each object and a witness)
If I have 4 hosts, place one of them into maintenance mode and tell it to move data, then the remaining 3 hosts will hold your components and a witness.
So even if a host were to fail in that scenario during maintenance mode… it would work fine.
Bob says
Hey Duncan – sorry for resurrecting an old thread, but when you say “and you tell it to move data” do you mean the “Ensure accessibility” or “Full data migration”?
Duncan Epping says
full data migration is the only way to “move all” data out.