A while back I had a discussion with someone and he asked me if it was possible to limit the number of eggs in a single basket, in other words to limit the number of VMs per host. The reason this customer wanted to do this was to limit the impact of a failure. They had roughly 1500 VMs in their cluster, and some hosts carried 50 VMs while others had 20 or 80. This is the nature of DRS though and totally expected.
If one of these hosts were to fail, and let’s say it had 80 VMs, the impact would be substantial. To minimize the risk they wanted to limit the number of VMs per host. I had thought about this before and had already asked the HA and DRS teams if they could do anything around this. The DRS team started looking into it and, to my surprise, they managed to get it in quickly.
In the VMworld 2012 session “VSP2825: DRS: Advanced Concepts, Best Practices and Future Directions” by Ajay Gulati and Aashish Parikh, a solution is presented. (You can watch this session for free on YouTube, highly recommended!) This solution is a new vSphere DRS advanced setting, LimitVMsPerESXHost, which is introduced in vSphere 5.1.
Note that when you configure this setting it might impact the performance of your virtual machines, as it could limit the load balancing mechanism of your cluster. If you have no requirement to limit the number of VMs per ESXi host, don’t do it. When this setting is configured, vSphere DRS will not allow migrations to a host which has reached the threshold and will also not admit new VMs to that host.
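To make the behaviour a bit more concrete, here is a minimal Python sketch of the rule as described above. It is a toy model only, not the DRS algorithm or any VMware API: the `Host` class, `can_accept`, `eligible_targets` and the host names are all hypothetical, and it assumes a host counts as having "reached the threshold" once its VM count equals the limit.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical stand-in for an ESXi host; only tracks VM names."""
    name: str
    vms: list = field(default_factory=list)


def can_accept(host: Host, limit: int) -> bool:
    # Assumption: a host that already holds `limit` VMs has reached the
    # threshold and is closed for migrations and for new power-ons.
    return len(host.vms) < limit


def eligible_targets(hosts: list, limit: int) -> list:
    # Names of hosts that could still receive a VM under the limit.
    return [h.name for h in hosts if can_accept(h, limit)]


# Three hosts carrying 80, 20 and 50 VMs, as in the cluster described earlier.
hosts = [
    Host("esx01", [f"vm{i}" for i in range(80)]),
    Host("esx02", [f"vm{i}" for i in range(80, 100)]),
    Host("esx03", [f"vm{i}" for i in range(100, 150)]),
]

print(eligible_targets(hosts, limit=60))
```

With a limit of 60, only the two less populated hosts remain valid targets, which illustrates why the setting can shrink the set of moves available for load balancing.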
Domenico Viggiani says
And is it a good way to spread VMs between hosts more equally than DRS does?
No, that is not what it is for. It is not for load balancing, just for restricting the number of VMs per host, and that is it. You will hit certain restrictions, such as fewer load balancing options in your cluster, when you have an imbalance and reach your defined limit.
I wouldn’t typically recommend using this to be honest.
I don’t imagine it to be useful either. If you hit a resource limit, DRS should rebalance the cluster. If you want to separate VMs, again, use DRS rules. Why limit the _number_ of VMs per host without saying _which_ VMs are not welcome on specific host(s)…
Nice to see real conversations end up as features in VMware products. Keep listening to customers and act when it makes sense.
that still limits DRS a lot I would say…
Can DRS do some math? I mean, you see this every day… For instance, the last time I was configuring a cluster of 7 nodes with 120 GB RAM each, I had 6 VMs on one node and 2 other VMs on 2 other nodes. I know this is DRS working, but can DRS do something like: this host has a lot of resources in use, but I would like to spread the VMs to other nodes since they are not overcommitted…
Duncan Epping says
Keep in mind that DRS doesn’t move VMs when there is no need because there is a “cost” and “risk” factor to every move. So moving them just to balance stuff with no gain wouldn’t really make sense.
But of course if the logic is added DRS should be able to do that. If you check the YouTube video they actually talk about that at the end of the session, more or less anyway.
As I interpret it, it’s a hard limit which can’t be violated by DRS, and it could lead to serious performance issues because DRS is bound by the limit, especially in environments with small to medium-sized clusters and/or environments with big differences in size and resource demands among the VMs. That makes it a dangerous option to use for balancing, in my opinion.
A very simple example:
3 hosts in a cluster
6 VMs per host
LimitVMsPerESXHost is set to 9
If one host dies and HA (I assume HA totally ignores this value) restarts the 6 VMs evenly on the remaining hosts, it gives DRS zero room for any balancing because both hosts have reached the limit. Or, if more VMs are booted on one of the two hosts, DRS can do only a tiny bit of balancing until each host has 9 VMs.
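The arithmetic of this example can be sketched in a few lines of Python. This is only an illustration under the assumptions stated above (HA ignores the limit and spreads the restarted VMs as evenly as possible); the function name `headroom_after_failure` is made up for this sketch.

```python
def headroom_after_failure(hosts: int, vms_per_host: int, limit: int,
                           failed: int = 1) -> list:
    """Return, per surviving host, how many more VMs fit under the limit.

    Assumes the VMs of the failed hosts are restarted as evenly as
    possible on the survivors, regardless of the limit.
    """
    survivors = hosts - failed
    if survivors <= 0:
        raise ValueError("at least one host must survive")
    displaced = failed * vms_per_host
    share, remainder = divmod(displaced, survivors)
    # The first `remainder` survivors take one extra VM each.
    counts = [vms_per_host + share + (1 if i < remainder else 0)
              for i in range(survivors)]
    # Negative values would mean the host ended up above the limit.
    return [limit - count for count in counts]


# 3 hosts, 6 VMs per host, limit of 9, one host fails.
print(headroom_after_failure(hosts=3, vms_per_host=6, limit=9))
```

For the numbers above both surviving hosts end up with a headroom of 0, so no host can receive a migrated VM.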
As far as my knowledge goes, DRS won’t make balancing choices like: if I move VM 1 to host B, then I can move VM 2 to host A to make a more balanced cluster. So in the above case, if by bad luck all the big or resource-demanding VMs end up on one host, you could experience serious performance issues because DRS can’t do a damn thing about the host that gets hammered.
There is another trap lying in wait if this limit is not exceptionally well documented, because it’s a very typical setting that you set and then forget, since you only have to do it once. When you start upgrading or replacing the hosts, the limit could hit you smack in the face as the number of VMs on each host grows. If you’re lucky, it happens under normal conditions when you try to start a new VM and DRS has no room available because of the limit. But knowing Murphy, it happens after a host failure while you are trying to figure out why DRS isn’t doing its magic.
Based on this information, I wouldn’t recommend it to any VMware customer. Only if it were changed to a soft limit (which is much harder to program), in which DRS abides by the limit but can temporarily ignore it to fix a big imbalance in the cluster, would I consider putting it into production. But when is the imbalance big enough to allow DRS to override the limit? As I see very little practical use for this limit, I don’t think it’s worth the time to change it to a soft limit.
Instead of a number of VMs, a percentage would be more practical, I think.
Nice, I was checking the video when I wrote the percentage thing… A few seconds later they talked about LimitVMsPerESXHostPercent.
I think that is more flexible than LimitVMsPerESXHost.
Or did I not get this: do you have to use both,
or just one?
That is not in the current release! It is proposed for a next release.
Suppose a cluster has 4 nodes with 40 VMs on each host and the limit is set to 20 VMs per host. Does this make sense? I guess not!
Must the limit be greater than the number of VMs residing on each host?
Also a detailed description about how to calculate the limit or any document if available would be appreciated.
A scenario would be more helpful.
Mo Jamal says
Great article DE. I can sense the hidden hint (and agree with DE): do not use it unless you have a very good reason. Unless all VMs in the cluster are uniformly sized (CPU, memory, workload, etc.), as in a VDI cluster, why would you think of restricting VMs per host? If you do your maths, DRS grey matter should prevail. Even in the case of VDI, you may have differing sizes of compute workload.
Maybe limiting on a per-host percentage basis… but that wouldn’t necessarily result in an equal number of VMs per host.