Recently I’ve been hearing this comment more and more: “DRS is just a load balancing solution.” It seems that some folks spread this FUD to diminish what DRS really is and does. Let me start by saying that DRS is not a load balancing solution. The ultimate goal of DRS is to ensure all workloads receive the resources they demand. Frank Denneman has a great post on this topic, as this has led to some confusion in the past. I would advise reading it if you want to understand why exactly VMs are not moved while the cluster seems imbalanced. In short: why balance VMs when the VMs are not constrained? In other words, DRS has a VM-centric view of the virtual world and not a host-centric one. In the end, it is all about your applications and how they perform, and not necessarily about the infrastructure they are hosted on; DRS cares about VM/application happiness. Also, keep in mind that there is a risk and a cost involved with every move you make.
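To make the "VM-centric" idea a bit more concrete, here is a minimal Python sketch. It is a toy model and not the actual DRS algorithm: the `VM` class, the MHz figures and the `should_migrate` rule are all assumptions made up for illustration. It only shows the reasoning described above: a VM that already receives what it demands is left alone, and a constrained VM is moved only when the expected gain outweighs the cost of the move.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    demand_mhz: int     # what the VM is asking for (hypothetical unit)
    delivered_mhz: int  # what the VM currently receives


def constrained_vms(vms):
    """Return the VMs that receive less than they demand."""
    return [vm for vm in vms if vm.delivered_mhz < vm.demand_mhz]


def should_migrate(vm, expected_gain_mhz, migration_cost_mhz):
    """Toy rule: move only a constrained VM, and only if the gain beats the cost."""
    shortfall = vm.demand_mhz - vm.delivered_mhz
    if shortfall <= 0:
        # The VM is happy; an imbalanced host alone is no reason to move it.
        return False
    # A move can never help more than the VM's actual shortfall.
    useful_gain = min(expected_gain_mhz, shortfall)
    return useful_gain > migration_cost_mhz


# Example data (made up): web01 gets everything it asks for, db01 does not.
web01 = VM("web01", demand_mhz=2000, delivered_mhz=2000)
db01 = VM("db01", demand_mhz=4000, delivered_mhz=3000)
```

With this example data, only db01 counts as constrained, and it would be moved when the move is cheap (cost 100) but not when the move costs more than it gains (cost 1500).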
Of course there is a lot of functionality that you leverage without thinking about it and take for granted: things like Resource Pools (limits / reservations / shares), DRS Maintenance Mode (fully automated), VM Placement, Admission Control (yes, DRS has one) and last but not least the various types of (anti-)affinity rules. Also, before anyone starts shouting about active memory vs consumed memory (PercentIdleMBInMemDemand solves this) or whether %RDY is taken into account… DRS has many knobs you can twist.
But besides that, there is more. Something not a lot of people realize is that, for instance, HA and DRS are loosely coupled but tightly integrated. When you have both enabled on your cluster, HA will be able to call upon DRS to make the right placement decision and to defragment resources when needed. What does that mean? Well, let’s assume for a second that you are running at full (or almost full) capacity and a host fails, and that you have taken a host failure into account by leveraging HA admission control. When the host fails, HA will need to restart your VMs, but what if at some point there is not enough spare capacity left on any given host to restart a VM? Well, in that case HA will call upon DRS to make space available so that these VMs can be restarted. That is nice, right?! And there is more smartness coming with regards to HA / DRS admission control; hopefully I can tell you all about it soon.
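As a rough illustration of what "defragmenting resources" means, consider the following Python sketch. It is a simplified toy and not how HA or DRS are implemented: the host names, the memory sizes and the `find_host_for` and `defragment` helpers are hypothetical. The cluster as a whole has enough free memory for the VM that needs a restart, but no single host does, so one running VM is moved to create a large enough gap.

```python
def find_host_for(vm_mb, free_mb):
    """Return the first host with enough free memory for the VM, or None."""
    for host, free in free_mb.items():
        if free >= vm_mb:
            return host
    return None


def defragment(vm_mb, free_mb, running):
    """Toy defragmentation: find one migration that frees enough room.

    free_mb: dict of host -> free memory in MB
    running: dict of host -> list of (vm_name, size_mb) tuples
    Returns (vm_name, source_host, target_host) or None.
    """
    for source, vms in running.items():
        for name, size in vms:
            if free_mb[source] + size < vm_mb:
                continue  # moving this VM would still not free enough room
            for target, free in free_mb.items():
                if target != source and free >= size:
                    return (name, source, target)
    return None


# Example data (made up): 8 GB free in total, but split over two hosts.
free_mb = {"esx01": 4096, "esx02": 4096}
running = {"esx01": [("small01", 4096)], "esx02": [("small02", 2048)]}
vm_to_restart_mb = 8192

placement = find_host_for(vm_to_restart_mb, free_mb)
move = defragment(vm_to_restart_mb, free_mb, running)
```

In this example no host can take the 8 GB VM directly, but moving small01 from esx01 to esx02 frees up 8 GB on esx01, after which the restart can succeed.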
Then of course there is also the case where resource pools are implemented. vSphere HA and DRS work in conjunction to ensure that when VMs are failed over, shares are flattened to avoid strange prioritisation during times of contention. HA and DRS do this because VMs always fail over to the root resource pool of a host, but of course DRS will place the VMs back where they belong when it runs the first time after the failover has occurred. This is especially important when you have set shares on VMs individually in a resource pool model.
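To show why flattening matters, here is a small Python sketch with made-up numbers. It illustrates the idea only and is not the actual vSphere implementation: the pool names, share values and both helper functions are assumptions. If VMs landed in the root resource pool with their original per-VM shares, a VM from a low-priority pool could suddenly outrank VMs from a high-priority pool. Scaling each VM's shares by its pool's shares keeps the original ratios intact.

```python
def raw_fractions(pools):
    """Resource fractions if every VM sat in the root pool with its own shares."""
    vm_shares = {vm: s for pool in pools.values() for vm, s in pool["vms"].items()}
    total = sum(vm_shares.values())
    return {vm: s / total for vm, s in vm_shares.items()}


def flattened_shares(pools):
    """Per-VM shares scaled so root-level ratios match the pool hierarchy."""
    flat = {}
    for pool in pools.values():
        pool_total = sum(pool["vms"].values())
        for vm, s in pool["vms"].items():
            # The VM's slice of its pool, expressed in the pool's own shares.
            flat[vm] = pool["shares"] * s / pool_total
    return flat


# Example data (made up): "prod" should get 80% under contention, "test" 20%.
pools = {
    "prod": {"shares": 8000, "vms": {"prod01": 1000, "prod02": 1000}},
    "test": {"shares": 2000, "vms": {"test01": 2000}},
}

unflattened = raw_fractions(pools)
flattened = flattened_shares(pools)
```

Without flattening, test01 would be entitled to half of the contended resources, simply because its individual share value is higher than that of the production VMs. With flattened values (4000, 4000 and 2000) it is back at 20%, as the pool hierarchy intended.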
So when someone says DRS is just a simple load balancing solution take their story with a grain of salt…
Looking forward to seeing how the new IO/Net features may play into DRS. Still, our shop is a pretty big fan of the VMTurbo product.
David Chung (@dchung615) says
I think VMTurbo has a much “smarter” version of a DRS implementation. I think VMware should take a second look at integrating vCPU Ready into the DRS calculation. I know Duncan talked about it in another blog post but I don’t know what came out of it. http://www.yellow-bricks.com/2013/05/09/drs-not-taking-cpu-ready-time-in-to-account-need-your-help/
Duncan Epping says
As I stated above David, CPU Ready is already considered and you can even make DRS more aggressive with regards to this if you want to by using the mentioned advanced setting.
Just to note: in every investigation, the weird scenarios around %RDY being high that are described in most of those comments turned out to be the result of incorrectly configured power savings settings.
I like what VMTurbo has done, but personally I prefer to have direct integration with components like vSphere HA and NetIOC in the near future. To each his own I guess.
David Chung (@dchung615) says
Yes – I would love it if all those features are baked in to DRS/HA combo. 🙂
What has changed about DRS since this post which makes it seem more like a load balancing solution? http://www.yellow-bricks.com/drs-deepdive/
[email protected] says
That post is based on vSphere 4.0 and only describes a part of what I mention above. For a more detailed description of what DRS is and how it works I prefer to refer to the Clustering Deepdive book as it is more up to date.
michael stump (@_stump) says
It’s a perception issue, because DRS appears to balance load as a result of ensuring VM resource entitlement. And the concept of a “load balancer” is easy to understand, while understanding VM resource entitlements requires more knowledge of virtualization in general.
Personally, I don’t see why anyone would spend $100k to replace a built-in feature of vSphere, but that’s just me.
Brian (or anyone else with experience to comment) – If you see this, please comment on the benefits of VMTurbo. Their sales guys have been after us for a long time and I just don’t see the cost vs. benefits for our small-to-medium sized shop. We could actually over-buy hardware for less money than their licensing.
Have you done a full demo of the product at all yet? Even if you can afford to over-buy hosts, storage, etc. instead of the software, that doesn’t necessarily mean your environment is running optimally, right? There are plenty of benefits I gain in my environment that may not benefit you. Don’t think of this as a DRS alternative, as the commenter above is suggesting I assume, because DRS does not come close to doing what this product does. I’m more than happy to swap emails with you offline if you like. I can give you more information on my environment and how we are using the product etc. I also had put off talking to sales for some time, but when we came up on our ELA renewal I finally checked them out and was impressed with what I saw.
Brian – We got a sales demo, and maybe a virtual appliance, a couple years ago and management at the time was not impressed for the price tag. Of course, since then we get our regular and frequent sales calls! How do we securely exchange e-mails?
See if you can hit me up on LinkedIn…
Raffaele Giordanelli (@jordanelliraf) says
Interesting post Duncan!
Just wondering how this behavior changes in stretched clusters.
Are risk and cost computed differently?