Admission Control is always a difficult topic when I talk to customers. It seems that many people still don’t fully grasp the concept, or simply misunderstand how it works. To be honest, I can’t blame them. It doesn’t always make sense when you think things through. Most recently for Admission Control we introduced a mechanism in which you can specify what the “tolerated performance loss” should be for any given VM. This isn’t really admission control unfortunately as it doesn’t stop you from powering on new VMs, it does, however, warn you if you reach the threshold where a host failure would lead to the specified performance degradation.
After various discussions with the HA team over the past couple of years, we are now exploring what we can change about Admission Control to give you more options as a user to ensure VMs are not only restarted but also receive the resources you expect them to receive. As such, the HA team is proposing 3 different ways of doing Admission Control, and we would like to have your feedback on this potential change:
- Admission Control based on reserved resources and VM overheads
This is what you have today, nothing changes here. We use the static reservations and ensure that all VMs can be powered on!
- Admission Control based on consumed resources
This is similar to the “performance degradation tolerated” option. We will look at the average consumed CPU and memory resources (let’s say over the past 24 hours) and base our admission control calculations on that. This will allow you to guarantee that the performance of workloads is similar after a failure.
- Admission Control based on configured resources
This is a static way of doing admission control similar to the first. The only difference is that here Admission Control will do the calculations based on the resources configured. So if you configured a VM with 24GB of memory, then we will do the math with 24GB of memory for that VM. The big advantage, of course, is that the VMs will always be able to claim the resources they have assigned.
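To make the difference between the three proposals concrete, here is a minimal sketch in Python. It is not how vSphere HA is implemented; it only looks at memory, it assumes the worst case is losing the largest hosts, and the names `VM`, `demand_mb` and `fits_after_failure` are made up for this illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class VM:
    name: str
    configured_mb: int              # memory configured on the VM
    reserved_mb: int                # static memory reservation
    overhead_mb: int                # per-VM memory overhead
    consumed_mb_samples: List[int]  # consumed memory samples, e.g. past 24 hours


def demand_mb(vm: VM, policy: str) -> float:
    """Memory that admission control would count for one VM (illustrative)."""
    if policy == "reserved":
        return vm.reserved_mb + vm.overhead_mb
    if policy == "consumed":
        return mean(vm.consumed_mb_samples)
    if policy == "configured":
        return vm.configured_mb
    raise ValueError(f"unknown policy: {policy}")


def fits_after_failure(vms, host_capacity_mb, host_failures, policy) -> bool:
    """True if all VMs still fit after losing the largest hosts (assumption)."""
    keep = max(0, len(host_capacity_mb) - host_failures)
    surviving = sorted(host_capacity_mb)[:keep]
    return sum(demand_mb(vm, policy) for vm in vms) <= sum(surviving)


# Four VMs of 24GB each, 4GB reserved, on three 32GB hosts, one failure tolerated.
vms = [VM(f"vm{i}", 24576, 4096, 128, [8192, 12288]) for i in range(4)]
hosts = [32768, 32768, 32768]
for policy in ("reserved", "consumed", "configured"):
    print(policy, fits_after_failure(vms, hosts, 1, policy))
```

In this example the same cluster passes the check under the reserved and consumed policies but fails it under the configured policy, which shows how much more restrictive the third option can be.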
In our opinion, adding these options should help to ensure that VMs will receive the resources you (or your customers) would expect them to get. Please help us by leaving a comment/providing feedback. If you agree that this would be helpful then let us know, if you have serious concerns then we would also like to know. Please help shape the future of HA!
Don Bessee says
Thanks for asking!
I favor the third option with a caveat: allow the overprovisioning level for CPU and memory (separately) that the cluster will tolerate to be set as part of the HA settings. This would allow for the most control and transparency. The overprovisioning level would have to be applied to the level of resource failure selected (number of hosts or percentage of cluster resources) for this to work right. If the message refusing a power-on, vMotion, etc. also specified which resource was exceeded, it would create a very positive control for the admin as well.
This would have the further advantage of increasing clarity around VM performance, which can be a somewhat murky subject.
Duncan Epping says
Note that the third option would also be fairly restrictive, as it would be based on the configured resources for the VM. Something to take into consideration.
From a memory perspective:
1- Reservations – Consider that overhead is the minimum amount of resources an ESXi host needs to power on a VM. For customers that design for fixed resource allocation, admission control based on slots is a more conservative way to deal with fragmentation. It is rock solid. This is essential. Keep percentages too.
2- Shares – This is where overcommitment and ROI games start. memfreepkt will govern severe reclamation techniques. Here admission control based on consumed resources makes sense. For customers that design for cluster memory consumption spikes, admission control based on consumed resources will be optimal. My feedback is to track the last 8 hours.
3- Limits – When configuring an artificial ceiling, for example 24GB for a VM, it makes sense that this becomes available on occasion. This form of admission control does not make sense to me, but it is an option that vSphere should certainly make available. Some customers design with limits, especially providers that sell compute allocations.
My feedback is to track memory demand for each powered-on VM over the last 8 hours, take the average, and use this value to govern admission control based on consumed resources.
One more piece of feedback: a graphical runtime admission control view tab 😉
By the way, I have a lot more feedback 😉
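The 8-hour averaging suggested in this comment could look roughly like the following Python sketch. The sample format (timestamp, demand in MB) and the function name `average_demand_mb` are assumptions for illustration only; nothing here reflects an actual vSphere API.

```python
from datetime import datetime, timedelta


def average_demand_mb(samples, now, window=timedelta(hours=8)):
    """Average memory demand over the trailing window.

    samples: iterable of (timestamp, demand_mb) tuples (hypothetical format).
    Returns None when no sample falls inside the window.
    """
    recent = [mb for ts, mb in samples if now - window <= ts <= now]
    if not recent:
        return None
    return sum(recent) / len(recent)


now = datetime(2019, 1, 1, 12, 0)
samples = [
    (now - timedelta(hours=9), 1000),  # older than 8 hours, ignored
    (now - timedelta(hours=4), 2000),
    (now - timedelta(hours=1), 4000),
]
print(average_demand_mb(samples, now))
```

A shorter window reacts faster to changes in demand, while a longer one (such as the 24 hours mentioned in the post) smooths out spikes.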
I don’t know if I understand it correctly, but the difference of #3 is that you can decouple Admission Control from reservations. But I would like reservations to be changed in the case of an issue, and I don’t see how that could be done.
In fact, I don’t understand your last point: “The big advantage, is that the VMs will always be able to claim the resources they have assigned.” I thought reservations were exactly for that. It sounds like soft reservations, where memory will be available in the pool but can be reclaimed even if the VM is under the configured amount?
Duncan Epping says
Not sure what you are asking Carlos. In this case Admission Control would be done based on the actual configured resources for the VM, so a VM with 8GB would have admission control calculations done with 8GB. This also means that if a host fails, that 8GB can be guaranteed theoretically.
I don’t see the difference between #1 and #3, if you substitute “reservation” for “configured resources”.
Duncan Epping says
I am not sure what your point is, but thanks for the comment.
This is odd. #1: configure reservation, admission control ensures you will have that available when needed, kernel does not take those away for others.
#3: configure resources, admission control ensures those are available .
Is that so ?
Duncan Epping says
Yes, that is what I mention above. Today admission control takes reservations into account. The question is whether it is useful to take the configured resources into account as an alternative to the already available option.
Then I restate what I tried to propose as an alternative: create a “percentage of reduction to tolerate” and reduce the reservations in case of failure. That should be optional on a per VM case though. When everything is ok, you get full reservation, but in case of trouble, kernel is allowed to reclaim some resources to compensate.
Duncan Epping says
Thanks, I will provide that suggestion to the engineering team.
Implement both, but prioritize option two. Thanks.
Duncan Epping says
Suresh Siwach says
First of all, thanks for inviting us to give feedback.
The improvements you mention in your post will shape VMware HA for future requirements, and I agree with the changes you suggested.
It took me some time to give feedback because you have covered almost everything. However, I have tried to dig deeper to find an improvement.
Problem: In my experience, once a VM claims memory from ESXi, that memory is never released back to the host (apart from when the VM is migrated to another host or powered off). This is the reason an admission control policy is required. CPU congestion is a rare case.
Resolution: DRS has a scale from most conservative to most aggressive. In the most conservative mode, VMs migrate only when we place an ESXi host in Maintenance Mode (MM). So next to the most conservative setting we should have an additional setting for host failure. In this situation DRS should check and initiate the migration (according to priority and respecting affinity or anti-affinity rules) of the VMs that have a high ratio (for example 100GB:40GB) between memory claimed from ESXi and active memory used by the VM, and a very low ratio (for example 100GB:90GB) between memory claimed from ESXi and memory assigned to the VM. When a VM migrates to another host it will claim only the memory it requires, and we will have more free memory.
I trust that the VMware development team can achieve almost anything.
I am looking forward to your comments on my feedback.
Thank you for giving us the opportunity to give feedback here. Admission Control is one of the features that currently does not offer an easy solution for my requirements.
Typically, my customers have clusters of hosts of the same size. The majority of VMs have a comparable size, but in each environment there are some machines that are several times larger. Reservations are only made for the most important VMs. Often, the largest VMs are the most important.
For me this means the following problems with the existing algorithms:
1. Slot policy
– Slot size based on the largest reservation does not fit the smaller VMs and wastes resources
– Manual slot size is a complex balancing act between capacity utilization and availability (hard to calculate)
2. Cluster resource Percentage
– Only considers reservations (and ignores 90% of VMs)
– A good value for das.vmmemoryminmb and das.vmcpuminmhz is hard to find (like manual slot size)
3. Dedicated failover host
– Poor fit due to the concept, since a smaller distribution affects a larger number of VMs in the event of a failure
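The slot-size problem from the first point can be illustrated with a small Python sketch. It is a simplification under stated assumptions: memory only, overhead ignored, every VM takes one slot, and the largest hosts are the ones that fail. The function name `slots_available` and the numbers are made up for this example.

```python
def slots_available(host_capacity_gb, reservations_gb, host_failures):
    """Return (slot_size, usable_slots) for a simplified slot policy.

    The slot size is the largest reservation; each host offers
    capacity // slot_size slots; the hosts with the most slots are
    assumed to fail (worst case).
    """
    slot_size = max(reservations_gb)
    slots_per_host = sorted(cap // slot_size for cap in host_capacity_gb)
    keep = max(0, len(slots_per_host) - host_failures)
    return slot_size, sum(slots_per_host[:keep])


hosts_gb = [256, 256, 256]
reservations_gb = [64] + [4] * 20   # one large VM, twenty small ones

slot_size, usable = slots_available(hosts_gb, reservations_gb, host_failures=1)
print(f"slot size: {slot_size} GB, usable slots: {usable}, VMs: {len(reservations_gb)}")
print(f"total reserved: {sum(reservations_gb)} GB of {sum(hosts_gb[:2])} GB surviving")
```

With a single 64GB reservation setting the slot size, the two surviving hosts offer only 8 slots for 21 VMs, even though the reservations add up to 144GB against 512GB of surviving capacity.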
In summary, a workaround could be found for this. However, manual adaptation to changing circumstances is necessary on a regular basis.
I would therefore like an automatic, simple and flexible solution. For example, an algorithm with a combination of the following options:
1. Host failures to tolerate: X Hosts
2. Metric for calculation
– Consumed resources (average of last x hours)
– Configured resources
3. Performance degradation to tolerate: X percent (for example if set to 10%, calculate with 90% of the metric above)
4. Additional failover resources: X percent (add these resources to the sum of running VMs – in case 110% of a host should be reserved for failover use 10% here. This should avoid setting a custom value which will be wrong after adding/removing a host to the cluster)
– Soft (just issue a warning but allow to power on additional VMs)
– Normal (like the current behavior)
– Force (make sure – maybe in combination with DRS – that the largest VM can be failed over)
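One possible reading of the combined algorithm proposed above, as a rough Python sketch. Several things are assumptions made for illustration: the failover reserve is taken as the capacity of the X largest hosts increased by the "additional failover resources" percentage, the per-VM demand is whatever the chosen metric yields (consumed or configured), and only the Soft and Normal modes are modelled because Force would depend on DRS placement. The names `Mode` and `check_power_on` are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    SOFT = "soft"      # warn only, still allow the power-on
    NORMAL = "normal"  # refuse the power-on, like the current behavior


def check_power_on(running_demand, new_vm_demand, host_capacity,
                   host_failures, degradation_pct, extra_failover_pct, mode):
    """Return (allowed, message) for powering on one more VM."""
    # Reserve the largest hosts plus the extra percentage (assumed reading).
    largest = sorted(host_capacity, reverse=True)[:host_failures]
    reserve = sum(largest) * (1 + extra_failover_pct / 100)
    usable = sum(host_capacity) - reserve

    # With 10% tolerated degradation, calculate with 90% of the metric.
    required = (sum(running_demand) + new_vm_demand) * (1 - degradation_pct / 100)

    if required <= usable:
        return True, "ok"
    if mode is Mode.SOFT:
        return True, "warning: failover capacity would be insufficient"
    return False, "denied: failover capacity would be insufficient"


hosts = [100, 100, 100, 100]   # capacity per host, arbitrary units
running = [90, 90, 90]         # demand of running VMs per the chosen metric

print(check_power_on(running, 40, hosts, 1, 10, 10, Mode.NORMAL))
print(check_power_on(running, 40, hosts, 1, 0, 10, Mode.NORMAL))
print(check_power_on(running, 40, hosts, 1, 0, 10, Mode.SOFT))
```

Because the reserve is derived from the hosts themselves rather than from a fixed cluster percentage, it would not need to be adjusted after adding or removing a host, which is the point made in option 4.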
Johan van Amersfoort says
Thanks for sharing!
From the options you shared, I personally like the second one.
Also, would it be possible to manage a priority parameter through policy-based management?
So let’s say you have 50 mission critical VMs and 50 “normal” VMs. With a policy you would be able to prioritize the mission critical VMs over other VMs to ensure a reservation on resources.
Duncan Epping says
Not possible today, but as Frank has discussed for VMC on AWS, we are most definitely moving towards compute policies as well: http://frankdenneman.nl/2018/10/19/compute-policy-in-vmware-cloud-on-aws/
Thanks for the opportunity to provide feedback.
What I would like to do is define admission control on a resource pool/pools within a cluster.
The use case would be for large vMSC where we are losing 50% of the resource by enabling admission control.
Would it be possible to use the 50% that is reserved for site failover to run dev/test workloads and in the event of a site failure, HA would restart at the opposing site? Then depending on cluster utilisation they would start the production VMs at the penalty of powering off dev/test VMs. If there was sufficient resource for failover then the resource pools would just have their share values defined so as per normal. I guess it might be a conditional boolean style logic that would assist this.
Christian Lugo says
This is mostly what I’m thinking, but I guess you could just lower the percentage today to reserve 40% and you will be covered.
My vote would be to have a percentage for CPU and Memory so we can see how far we are from the recommended 50%.
I know that on the cluster we can see the current usage but something more visual will be great, like a summary for the cluster.
There could be also a “morituri” class for machines that are allowed to use failover resources but condemned to be shut down in the event of a failure to allow for primary load recovery.
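The idea of expendable workloads running on failover capacity, as raised in the comments above, could be sketched like this in Python. This is only an illustration of the logic, not an existing HA feature: the function name `plan_failover`, the choice to stop the largest expendable VMs first, and the decision to stop only as many as needed are all assumptions.

```python
def plan_failover(surviving_capacity, running_protected, restart_demand, expendable):
    """Pick expendable VMs to power off so protected VMs can restart.

    expendable: dict of VM name -> resources in use (hypothetical input).
    Returns the names to power off, largest first, stopping as few as needed.
    Raises RuntimeError if even stopping all of them is not enough.
    """
    free = surviving_capacity - running_protected - sum(expendable.values())
    to_stop = []
    for name, used in sorted(expendable.items(), key=lambda kv: kv[1], reverse=True):
        if free >= restart_demand:
            break
        to_stop.append(name)
        free += used
    if free < restart_demand:
        raise RuntimeError("not enough capacity even without expendable VMs")
    return to_stop


dev_test = {"dev1": 30, "test1": 20, "dev2": 10}
print(plan_failover(surviving_capacity=100, running_protected=40,
                    restart_demand=35, expendable=dev_test))
```

When there is already enough free capacity the returned list is empty, so dev/test workloads keep running and only share values decide who gets what.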
David Pasek says
First of all, thank you Duncan for opening this topic again.
I’m personally OK with the existing “Admission Control based on reserved resources and VM overheads”. It is important to know that per-VM reservations have to be used if one would like to use Admission Control for some capacity/performance guarantee after ESXi host failures. A lot of people have been told during vSphere ICM trainings not to set reservations per VM but per Resource Pool, but with that approach admission control does not take the RP reservations into consideration.
The only problem I have with VM reservations is manageability. Nowadays, VM reservations have to be set statically per VM, which is difficult to manage at larger scale. Home-made automation (for example PowerCLI) can be used, but policy-based management supported by VMware would IMHO be a much better approach to design admission control based on customer requirements. So I personally vote for #1, but introduce VM policy-based management to simplify VM resource reservation management. Probably something like what Frank Denneman and Maarten Wiggers talked about in Tech Preview: The Road to a Declarative Compute Control Plane [VIN2256BU].
Option 3 would be the best imho for clusters where all VMs need a reservation. This is a reality, especially for memory reservations, since SQL or SAP perform better with their own memory management, hence the fixed memory for all VMs in one cluster. The only options now are Specify Failover Hosts or Percentage of Cluster Resources Reserved. The downside of the last option is that it does not take very large VMs into account. It would be nice if DRS first made room so the VM can start up.
I think these options would be great additions to Admission Control. The performance degradation option in 6.5 was a very welcome improvement and these options would greatly improve on that. I’ve found that over the years many colleagues I have spoken to about HA already believe admission control behaves like option 2 or 3 and are surprised when I tell them otherwise.
I’ve seen many cases of over-provisioned clusters where a host failure has unexpectedly impacted performance and I think these additional options will be very useful in preventing this from happening without having to worry about manually checking performance stats or setting reservations.