At the end of 2010 I wrote an article about cluster sizes… ever since it has been a popular article and I figured that it was time to update it. vSphere 5 changed the game when it comes to sizing/scaling of your clusters, and this is an excellent opportunity to emphasize that. The key take-away of my 2010 article was the following:
I am not advocating to go big…. but neither am I advocating to have a limited cluster size for reasons that might not even apply to your environment. Write down the requirements of your customer or your environment and don’t limit yourself to design considerations around Compute alone. Think about storage, networking, update management, max config limits, DRS & DPM, HA, resource and operational overhead.
We all know that HA used to be a constraint for your cluster size… However, those times are long gone. I still occasionally see people referring to old “max config limits” around the number of VMs per cluster when exceeding 8 hosts… This is not a concern anymore. I also still see people referring to the max 5 primary node limit… Again, not a concern anymore. Generalizing things, and applying the 2010 article to vSphere 5, I think we can come to the following conclusions:
- HA does not limit the number of hosts in a cluster anymore! Using more hosts in a cluster results in less overhead. (N+1 for 8 hosts vs N+1 for 32 hosts)
- DRS loves big clusters! More hosts equals more scheduling opportunities.
- SCSI Locking? Hopefully all of you are using VAAI capable arrays by now… This should not be a concern. Even if you are not using VAAI, optimistic locking should have relieved this for almost all environments!
- Max number of hosts accessing a file = 8! This is a constraint in an environment using linked clones, like VMware View.
- Max values in general (256 LUNs, 1024 Paths, 512 VMs per host, 3000 VMs per cluster)
Once again, I am not advocating to scale-up or scale-out. I am merely showing that there are hardly any limiting factors anymore at this point in time. One of the few constraints that is still valid is the max of 8 hosts in a cluster using linked clones. Or better said, a max of 8 hosts accessing a file concurrently. (Yes, we are working on fixing this…)
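The failover-overhead point from the list above can be put in simple numbers. Here is a quick back-of-the-envelope sketch in Python (my own illustration, not a VMware tool; the function name is made up):

```python
# With an N+1 admission control policy, one host's worth of capacity is
# reserved for failover. The reserved share shrinks as the cluster grows.
def ha_overhead(hosts: int, spares: int = 1) -> float:
    """Fraction of total cluster capacity reserved for host failures."""
    return spares / hosts

print(f"N+1 with  8 hosts: {ha_overhead(8):.1%} reserved")   # 12.5%
print(f"N+1 with 32 hosts: {ha_overhead(32):.1%} reserved")  # 3.1%
```

Same N+1 guarantee, a quarter of the relative overhead.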
I would like to know from you guys what cluster sizes you are using, and if you are constrained somehow… what those constraints are… chip in!
Brandon Riley says
My biggest clusters are both 22 nodes. I can’t tell you how excited I was to get rid of that HA limitation! I have some smaller ones for vCloud implementations, and stuff that needs to be separate. But if I can’t see a really good reason to separate it out with vSphere 5, I go for one giant cluster.
Gabrie van Zanten says
With optimistic locking, would you say you could have many VMs on a datastore and not worry about locking issues anymore? That would of course make it sensible to use very large datastores.
Bilal Hashmi says
Duncan, great post. I have noticed that in spite of the new capabilities people tend to go with several small clusters versus fewer bigger clusters. I think most of this is due to lack of awareness, and I guess more posts like this will get folks to start feeling comfortable with the idea of bigger clusters.
Again, I am not suggesting one should go big either, it really depends on the need. Another good case for a bigger cluster would be the ability to provision bigger vDCs in vCloud for customers. As a vDC is really a resource pool in a vSphere cluster, one could end up in a sticky situation if the customer wants a bigger vDC than what the underlying HA/DRS cluster is capable of. This may only apply to service providers, but nonetheless I think this would be another reason to go big. Again, big or small should only be dictated by the environment now; the number 8 has little importance in vSphere 5 🙂
Great post for showing to presales and customers. How about a similar post about datastore limits and concerns? With VMFS5 and VAAI many of the old limits are gone, but looks like many people still design the old way.
Frank C says
I’m always a fan of going bigger! In my mind I want the software to manage availability and resource distribution, and having one large cluster instead of two smaller ones meets that need. It reduces operational overhead when it’s time to create a new virtual machine and determine which cluster should get the next virtual machine.
Our latest production cluster is a 12-node one, made up of IBM X3850 X5’s (4-way, 512GB RAM). That gives us 6TB of RAM and 384 processors available. Storage comes from multiple IBM XIVs and some IBM DS8700s. Previously we had 6-node clusters…
How many VM guests do you have running?
How many VLANs do you have configured in VMware?
Where are you seeing issues, if any?
Bigger clusters are cheaper so any attempts to have multiple smaller clusters will result in some resistance.
2 x 4-node clusters require 2 nodes’ worth of spare capacity in case of failure, because the extra capacity is tied to each cluster. This costs you in terms of hardware, ESXi licensing, and likely guest licensing. A single 8-node cluster requires 1 spare node.
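That spare-capacity trade-off is easy to check with a small sketch (my own illustration; `spare_hosts` is a made-up helper):

```python
# Each cluster needs its own spare capacity, so splitting a cluster in two
# doubles the number of hosts you hold in reserve for the same N+1 policy.
def spare_hosts(cluster_sizes: list[int], spares_per_cluster: int = 1) -> int:
    """Total hosts held in reserve across all clusters."""
    return spares_per_cluster * len(cluster_sizes)

print(spare_hosts([4, 4]))  # 2 spares for two 4-node clusters
print(spare_hosts([8]))     # 1 spare for a single 8-node cluster
```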
Our biggest is an 11-node cluster.
We discussed adding four new hosts to this cluster, but we didn’t want to use EVC and lose the new CPU features (HW virtualization assist). That’s why we decided to create a new cluster.
Maybe this is a different topic, but it is worth considering if you buy new hosts with new CPU generations and keep the ‘old’ ones. I assume we are not the only ones who keep the ‘old’ hosts 😉
The question which arises from that is: what’s better – mixing different CPU generations in a big heterogeneous cluster, or building separate, smaller homogeneous clusters?
My biggest cluster is 17 hosts. I’ve also got one at 12 and another at 11. They’re all at ESXi 5.0 and I have not yet seen any issues that are tied to the size of the cluster. We’re primarily using NFS datastores but I do have a 9-node cluster that’s both NFS and FC SAN.
Duncan says he hopes everyone is using VAAI arrays now. Not on the NFS side of things… NetApp is still a long way behind – they don’t even have the release of Data OnTap out yet that we’re looking for and we sure won’t be one of the first to run it.
Sid Brydon says
We are using 8 node clusters in our blade chassis production environment, spread across 4 c-Class chassis in an N+2 setup, the design is to cater for that very low chance of chassis failure! All using FC
In non-production we have clusters going up to 14 nodes using rackmount servers.
Travis Wood says
I am going bigger with clusters now than I did with vSphere 4, but some limiting factors I’ve encountered are :
A) Licensing – when you need to license all CPUs in a cluster.
B) Patching – ideally all hosts in a cluster should be updated in the same change window. The time to update a cluster depends on the cluster size, the amount of time to evacuate a host, the number of resources available to do concurrent host updates, etc.
C) LUNs – I haven’t encountered it in vSphere 5, but theoretically you could hit the maximum number of LUNs if you have a big cluster needing lots of storage. Though with the ability to have 64TB datastores, this is less of an issue if you scale them up.
I’ve been breaking my clusters up by OS for licensing reasons so that all my Linux guests are isolated from my Windows Server guests. Doing this saved me well over $50K in license/subscription costs.
I also try to split my clusters across blade chassis as much as I can, like Sid has done. And yes, we have had at least 2 chassis outages – one for a scheduled midplane replacement and another because somebody pulled the wrong power plugs (when we were going to do the scheduled work).
What manufacturer of blade chassis?
Richard Busby says
The 8-hosts-concurrently-accessing-a-VM limit only applies for block storage though, right? So if you’re using NFS you can scale all the way to 32 nodes and still use linked clones… assuming your NFS storage can handle the required throughput, of course.
The max number of hosts in a cluster is still 32 – is that right?
Why is the cluster size limited to 32? And what’s the bottleneck?
Thank you. 🙂
Great article Duncan!
We have an 8-node cluster and a 4-node cluster in each of our two (physical) datacenters. The reason for not putting all hosts into one big cluster is licensing costs (Oracle etc.).
Julian Wood says
I’ve always been a fan of bigger cluster sizes. I see even less benefit nowadays in segregating workloads when you have the power of DRS and resource pools to manage resources if you need to protect one group of VMs against another. Yes, you needed to split clusters for SCSI locking, and still for linked clones, but if you are using NFS your locking problem goes away.
Always go for simplicity if you can.
I worked on a VDI implementation which started with segregating into multiple 8 host clusters but we collapsed it into a 32 host cluster all with NFS datastores.
Just gave us so much flexibility and removed another provisioning question of where to place the VM.
@Gabrie: Yes with optimistic locking (non-VAAI) or VAAI the problems that people faced in the past around scsi reservation conflicts are gone. VAAI is preferred of course!
@Ed: The VAAI reference was mainly around “SCSI reservations” aka locking; this is totally different in NFS and was never really a problem to begin with.
@Jcylinder: I believe that the limit here is DRS and the amount of computations. If you double the VMs and hosts, just imagine how busy DRS will be. This will be addressed in the future though, cannot comment on when unfortunately.
Zach Milleson says
Our biggest cluster is 26 hosts. We have been running with this many hosts since 4.0. It reduces the amount of administrative interaction as our ops teams are routinely adding/removing VMs and we don’t need to instruct them what cluster to add the VM to since there is one large cluster.
We ran into the 1024-path limit with our old SAN, but we reduced the number of paths per LUN from 8 to 4 when we moved to 3PAR. We also tripled the size of our datastores to 1.5TB. Our path count is hovering around 300 currently.
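The path arithmetic here is simply the number of LUNs times paths per LUN, checked against the per-host maximum of 1024 paths. A small sketch (my own illustration, not from the comment):

```python
# Total paths seen by a host = number of LUNs x paths per LUN.
# vSphere 5 allows at most 1024 paths (and 256 LUNs) per host.
def total_paths(luns: int, paths_per_lun: int) -> int:
    return luns * paths_per_lun

# With 8 paths per LUN, the 1024-path ceiling is reached at only 128 LUNs:
print(total_paths(128, 8))  # 1024
# Halving to 4 paths per LUN doubles the LUN headroom; e.g. 75 LUNs:
print(total_paths(75, 4))   # 300, roughly the figure mentioned above
```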
We are in the process of refreshing our hosts in the next year. We will be doubling the amount of physical RAM with room to double again. This will allow our cluster to grow from our current 750+ VM count to the max 3000 if needed.
Our standards were set in 4.1 and nothing has changed in 5.0. Performance was just increased in multiple areas.
What are the specs of your current 26 hosts? Are these older 32-bit boxes with a 32GB RAM limit?
In response to keeping old hosts, we just move those old hosts into our testing environment and dispose of whatever had previously been in test.
Craig Risinger says
@Travis Wood: +1/”Like”/Yay! All good points, esp. patching. Well, the licensing thing is bunk, but that’s outside our control. 😉
Don’t have only one of anything, including clusters. You can’t always predict when or what testing or maintenance or upgrading you’ll need to do. It’s useful to have a test area and/or limit the scope of problem consequences. That said, I do still like large clusters.
Re HA scalability, just remember that eventually you need to assume more than one host will fail at a time. Take the example to extremes to examine the idea: if you had a cluster of 100,000 hosts, the odds that more than 1 host would fail simultaneously would be appreciable. In reality, when should you use N+2 instead of N+1? I don’t know, maybe at 32 hosts? Usually you have other capacity considerations that obviate this point, like keeping X% available to provide DR. And IMO, capacity planning is best done with better tools and processes than the HA settings. Capacity planning makes your VMs run well; HA settings just make your VMs run. But that’s a whole other conversation. 🙂
Julian Wood says
I agree with your point on HA scalability. I never recommend thinking only in terms of N+x (x being a fixed number). This is particularly important with blade environments, where you may have to factor in an entire chassis going offline.
Pick a percentage failure level and scale with that.
So if you are happy with the availability of N+1 at 8 hosts (12.5% reserved for availability), then go with N+4 for 32 hosts.
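That percentage-based rule is straightforward arithmetic; here is a sketch (my own illustration, the function name is made up):

```python
import math

# Reserve the same *fraction* of the cluster regardless of its size,
# rounding up to whole hosts.
def spares_for_percentage(hosts: int, reserve_fraction: float) -> int:
    return math.ceil(hosts * reserve_fraction)

print(spares_for_percentage(8, 0.125))   # 1  -> N+1
print(spares_for_percentage(32, 0.125))  # 4  -> N+4
```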
Garret Black says
Our largest cluster is currently 16 hosts with 42 2TB datastores on vSphere 4.1. We are planning our upgrade to vSphere 5 now, and I was planning to have larger datastores, but right now we are running into a few performance issues with LUN queue depth on our dedicated AMS2500 SAN, so larger datastores wouldn’t help that at all. It seems whenever you increase capacity in one area there is something you can miss that causes issues in another area. It’s always great to see these posts and see what everyone else is doing.
Eduardo Rocha says
Hey Duncan, you mentioned the max number of hosts accessing a file = 8.
Would that be applicable to RDMs?
thanks a lot.
No. Only for Linked Clones.
Greg W. Stuart says
Great read, and yes, I’m glad there are not many cluster constraints when it comes to hosts in HA. Right now, the biggest cluster I’ve seen with my current client is 15 hosts and that’s their production cluster. I personally haven’t seen many clusters that have more than 15 hosts.
great article – end of the day
to me – it’s about risk vs limitations (none)
steven zhu says
While searching for the best way to deal with my issue, I came across this post. Great article.
I did hit the max-paths-per-host limit on my 10-node cluster (with almost 100 VMs on each host). I have 4 FC connections per host and each LUN is presented with 2 paths. One day, the admin tells me he can’t see newly provisioned LUNs.
I am thinking about splitting the cluster into two, and presenting half of the LUNs to each cluster.
Does anyone have a better idea to resolve the issue?