In vSphere 5.5 a couple of new pieces of functionality have been added to DRS. The first one is around the maximum number of VMs on a single host that DRS should allow. I guess some of you will say: hey, didn't we introduce that last year with the advanced setting called “LimitVMsPerESXHost“? Yes, that is correct, but the DRS team found this too restrictive. They've added an extra setting called LimitVMsPerESXHostPercent. A bit smarter, and less restrictive… so how does it work?
Basically, LimitVMsPerESXHostPercent is a more dynamic version of LimitVMsPerESXHost, as it automatically adjusts the limit. Say you set LimitVMsPerESXHostPercent to 50 in a 4-host cluster that is already running 20 VMs, and you want to power on 12 new VMs. How many VMs can a single host run?
32 total VMs, 4 hosts --> mean: 8 VMs per host
We set the percentage to 50 so the new limit is 8 + (50% * 8) = 12
So if host 1 was only running 2 VMs, it can now take on an additional 10 (up to the limit of 12) without the need for you to constantly change LimitVMsPerESXHost when you introduce new VMs. LimitVMsPerESXHostPercent does this for you.
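The arithmetic above can be sketched in a few lines of Python. This is purely illustrative (not VMware code), just modeling the calculation as described in this post: mean VMs per host plus the configured percentage as headroom.

```python
# Illustrative sketch of the LimitVMsPerESXHostPercent calculation
# as described in the post -- NOT actual VMware/DRS code.

def per_host_limit(total_vms: int, num_hosts: int, percent: int) -> int:
    """Dynamic per-host VM limit: mean VMs per host plus percent headroom."""
    mean = total_vms / num_hosts              # average VMs per host
    return int(mean + mean * percent / 100)   # add the configured headroom

# 20 existing VMs + 12 new VMs across 4 hosts, percentage set to 50:
print(per_host_limit(32, 4, 50))  # 12
```

As the cluster grows, the mean grows with it, so the effective limit adjusts automatically instead of staying a fixed number.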
Latency Sensitive Workloads
As of vSphere 5.5 DRS recognizes VMs marked as latency sensitive (a vSphere Web Client option). With 5.1 it could occur that latency-sensitive VMs were moved around by DRS, and as you can imagine, when a VM migrates this impacts whichever application is running in it. Although the impact is typically small, for a latency-sensitive workload even “small” could be disastrous. So in order to avoid this unwanted impact, DRS treats latency-sensitive VMs as if they have soft affinity to the host they are running on. But what if there is an absolute need to migrate such a VM? Well, as mentioned, it is “soft” affinity, so it is treated like a “should” rule, which means the VM can still be moved when needed.
Do note that within the DRS UI you don't see this affinity anywhere; this is handled within DRS itself. Awesome, and needed if you ask me!
Last but not least, there is another new advanced option, titled “AggressiveCPUActive“. When you set it to “1”, DRS will be more aggressive in balancing VMs when %RDY is impacting them. This can be useful in environments where %RDY shows very spiky behaviour: AggressiveCPUActive avoids averaging out the bursts and allows DRS to balance your virtual infrastructure more aggressively. (Official explanation: AggressiveCPUActive, when set to 1, causes DRS to use the 80th percentile of the last five 1-minute average values of CPU activity (in other words, the second highest) to predict the CPU demand of virtual machines in the DRS cluster, instead of using the 5-minute average value (which is the default behavior). This more aggressive DRS behavior can better detect spikes in CPU ready time and thus better predict CPU demand in some deployments.)
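To make the difference concrete, here is a small Python sketch comparing the two demand estimates described in the official explanation above: the default 5-minute average versus the 80th percentile (second highest) of the last five 1-minute samples. The sample values are hypothetical, and this is an illustration of the statistic, not VMware's implementation.

```python
# Illustrative comparison of the two CPU-demand estimates described
# in the post -- NOT actual DRS code. Sample values are hypothetical.

def default_demand(samples: list) -> float:
    """Default behavior: plain average of the five 1-minute samples."""
    return sum(samples) / len(samples)

def aggressive_demand(samples: list) -> float:
    """AggressiveCPUActive=1: 80th percentile of five samples,
    i.e. the second-highest value."""
    return sorted(samples)[-2]

# A spiky workload: mostly idle with two short bursts (MHz, made up):
samples = [200, 1800, 250, 1900, 300]
print(default_demand(samples))     # 890.0  -- the bursts get averaged away
print(aggressive_demand(samples))  # 1800   -- the bursts remain visible
```

For a flat workload the two estimates are nearly identical; it is only on spiky workloads like the one above that the aggressive estimate diverges, which is why the setting mainly matters in those environments.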
DISCLAIMER: I do not recommend using advanced settings unless there is an absolute need for them. I can see why you would use “LimitVMsPerESXHostPercent“, but be careful with “AggressiveCPUActive“.
Duncan, can you give us more detail on what the latency-sensitive option actually does to virtual machines? I’ve seen the options pop up in the Resource Management document (http://pubs.vmware.com/vsphere-51/topic/com.vmware.vsphere.resmgmt.doc/GUID-9F4FD589-A73B-454A-A5A5-FED4C0F918C3.html) for 5.1, but this really lacks detail on what it does to the virtual machine to “optimize” it – and what sort of impact this has on the host the guest resides on, other guests on the same host, or even the cluster as a whole now that DRS is aware of it.
Hi, would you mind putting together a comparison table of HA/DRS across 4.1, 5.0, 5.1, and 5.5? TQ
So if host 1 was only running 2 VMs, it can now take on an additional 12
shouldn’t it be
So if host 1 was only running 2 VMs, it can now take on an additional 10
since 10 new + 2 existing = 12 --> the 50% limit???!!!!
Tim Cooke says
My big bugbear about DRS in 4.x, and from the look of it in 5.x as well, is that it doesn’t take into account the number of VMs affected should a host fail. For example, take a two-host cluster with 20 VMs. Put one host in maintenance mode and the VMs are migrated onto the other host. Take that host out of maintenance mode again and, unless DRS detects a CPU performance improvement, all 20 VMs will stay running on the first host. If that host then fails, all 20 VMs go down. I’d really like to see a function that takes into account the number of VMs as well as their performance needs.
Along the same lines, DRS doesn’t account for memory utilization on the host, so you could have a number of VMs with large memory balloons due to contention for RAM, but DRS won’t move them onto a host with more free memory because it only looks at CPU. Should a VM with a large balloon suddenly wake up, it then takes a while to deflate the balloon (in turn taking time inflating it in other VMs as needed) to get the physical memory it needs. Surely it would be better to balance the VMs across hosts based on their memory needs too.
If there is a way to do either of the above with 4.x or 5.x already, I would be interested to know how to configure it.
Regarding memory utilization, DRS looks only at active memory. But in 5.5 there is a new setting, PercentIdleMBInMemDemand, which defaults to 25 (%), and it adds this percentage on top of the active count. This means DRS becomes more aggressive: DRS now considers a VM that previously had 10% active memory to have 35% active memory.
If you want to further increase the aggressiveness, you increase that value.
This was brought up in “VSVC5280 DRS: New Features, Best Practices and Future Directions” at VMworld 2013.
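The arithmetic in that reply can be sketched in a couple of lines. This is a toy model of the description in the comment (the setting simply padding the active-memory percentage), not VMware's actual demand calculation:

```python
# Toy model of the PercentIdleMBInMemDemand arithmetic as described
# in the comment above -- NOT actual DRS code.

def estimated_demand_pct(active_pct: float, idle_setting: float = 25.0) -> float:
    """Active memory % plus the PercentIdleMBInMemDemand padding (default 25)."""
    return active_pct + idle_setting

print(estimated_demand_pct(10))      # 35.0 -- with the default of 25
print(estimated_demand_pct(10, 50))  # 60.0 -- raising the setting raises demand
```

Raising the setting raises the estimated demand of every VM, which is why increasing the value makes DRS behave more aggressively.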
With the addition of this option, doesn’t DRS/DPM become less aggressive? As per your comment, DRS/DPM becomes more aggressive when we increase the value.
Eric Hoy says
I have to agree with Tim…