"Death to false myths" probably sounds a bit, euhm, well, Dutch, or "direct" as others would label it. Lately I have seen some statements floating around which are either false or misused. One of them is around Admission Control and how it impacts consolidation ratio even when you are not using reservations. I have had multiple questions about this in the last couple of weeks and noticed this thread on VMTN.
The thread referred to is all about which Admission Control policy to use, as the selected policy potentially impacts the number of virtual machines you can run on a cluster. Now let's take a look at the example in this VMTN thread; I have rounded some of the numbers to simplify things:
- 7 host cluster
- 512 GB of memory
- 132 GHz of CPU resources
- 217 MB of memory overhead per virtual machine (no reservations used)
So if you do the quick math, according to Admission Control (using the Host Failures policy) you can power on roughly 2,500 virtual machines. That is without taking N-1 resiliency into account. When I take out the largest host we are still talking about roughly 1,800 virtual machines that can be powered on. Yes, that is about 700 slots/virtual machines fewer due to the N-1: admission control needs to be able to guarantee that even if the largest host fails, all virtual machines can be restarted.
Considering we have 512 GB in total, that means that if those 1,800 virtual machines actively use more than roughly 280 MB each on average (512 GB / 1,800 VMs), we will see TPS / swapping / ballooning / compression kick in. Clearly you want to avoid most of these, swapping / ballooning / compression that is, especially considering most VMs are typically provisioned with 2 GB of memory or more.
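To make the arithmetic above concrete, here is a minimal sketch of the slot calculation using the rounded figures from the VMTN thread. It assumes the default slot size when no reservations are set (32 MHz for CPU and the per-VM memory overhead, 217 MB here); HA's real calculation has more moving parts, so this only illustrates the order of magnitude, not the exact algorithm.

```python
def slot_count(total_mem_mb, total_cpu_mhz, slot_mem_mb=217, slot_cpu_mhz=32):
    """Number of slots is bounded by the most constrained resource."""
    return min(total_mem_mb // slot_mem_mb, total_cpu_mhz // slot_cpu_mhz)

cluster_mem_mb = 512 * 1024   # 512 GB across the 7 hosts
cluster_cpu_mhz = 132 * 1000  # 132 GHz

# Total slots before N-1 is taken into account:
print(slot_count(cluster_mem_mb, cluster_cpu_mhz))  # -> 2416, the "~2500" ballpark

# Average physically backed memory per VM if ~1800 VMs (the N-1 figure)
# are powered on:
print(cluster_mem_mb / 1800)  # -> ~291 MB, the "~280 MB average" above
```

Memory is the constrained resource here (the CPU side would allow over 4,000 slots at 32 MHz each), which is why the per-VM memory figure is the one that matters.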
So what does that mean, or what did we learn? Two things:
- Admission Control is about guaranteeing virtual machine restarts
- If you set no reservation you can power-on an insane amount of virtual machines
Let me re-emphasize the last bullet: you can power on an INSANE number of virtual machines on just a couple of hosts when no reservations are used. In this case HA would allow 1,800 virtual machines to be powered on before it starts screaming that it is out of resources. Is that going to work in real life? Would your virtual machines be happy with the amount of resources they are getting? I don't think so… I don't believe that 280 MB of physically backed memory is sufficient for most workloads. Yes, maybe TPS can help a bit, but the chances of hitting the swap file are substantial.
Let it be clear: admission control is not a resource management solution. It only guarantees that virtual machines can be restarted, and if you have no reservations set then the numbers you will see are probably not realistic, at least not from a user experience perspective. I bet your users / customers would like to have a bit more resources available than just the bare minimum required to power on a virtual machine! So don't let these numbers fool you.
Eric Sloof says
I’ve created a nice presentation about this subject with the title “Mythbusting-VMware-HA-works-out-of-the-box”
I’ve always found the vSphere Client’s information related to Admission Control to be insufficient and quite confusing. The text makes it seem like it IS a resource management solution, in stating that it will reserve resources and limit the number of virtual machines that can be run. As you stated, unless you are setting reservations on your VMs, Admission Control won’t be doing too much, considering the 32 MHz for CPU and 0 MB + overhead for memory.
Duncan Epping says
I agree, vCenter does kinda give the wrong impression.
Irfan Ahmad says
I agree that vSphere Client doesn’t make this analysis easy.
But honestly, HA is quite complicated under the hood. As Duncan says in this post, even in simplified form there are gotchas:
“(1) Admission Control is about guaranteeing virtual machine restarts
(2) If you set no reservation you can power-on an insane amount of virtual machines”
The problem is that many people do feel pressure to put in reservations for memory or CPU once in a while for some VMs. Having a blanket rule of NO RESERVATIONS is too onerous and too simplistic. Such a simple rule forces people into a mode where they either use no reservations and the only guaranteed policy (number of host failures), OR switch over to one of the non-guaranteed policies, or turn off admission control entirely.
Yet another key issue is that large VMs create an effective “reservation” due to their large memory overhead, which again limits the number of VMs that can be powered on. This is another tricky calculation.
These are some of the reasons why CloudPhysics has created two HA-related cards on our site. One helps analyze HA cluster health, whereas the second interactively simulates HA cluster resource management policies.
If you are using HA, I highly recommend simulating different HA policies and resource management settings using the CloudPhysics HA simulator before making any further changes to your environment. Visit http://www.cloudphysics.com and you’ll be simulating within minutes. I find that most customers are up and going in less than 10 minutes.
Phil says
Thanks, Irfan. I saw this at VMworld and have been meaning to give it a look. Signed up and will give it a whirl!
Irfan Ahmad says
Phil: see my other comment above.
Most people want to hear a single answer, but every environment is unique, so the ideal answer depends on that. I think when Duncan is asked about a single policy that achieves the right trade-offs, that’s not a fair question to begin with. I’ve had that discussion with him, and we built the HA simulator as a result of it. I have seen plenty of customers unnecessarily take on additional risk by moving away from the guaranteed “host failures” policy. If they had just used the HA simulator, they would have been able to see what is best for their cluster.
We are in the process of improving the HA Simulator so feedback welcome.