I saw an interesting question today: how do I disable DRS for a single host in a cluster? I thought about it, and you cannot do this within the UI; at least, there is no "disable DRS" option at the host level. You can enable/disable it at the cluster level, but that is it. There are, however, ways to ensure a host is not considered by DRS:
- Place the host in maintenance mode
This will result in the host not being used by DRS. However, it also means the host won't be used by HA and you cannot run any workloads on it.
- Create "VM/Host" affinity rules and exclude the host that needs DRS disabled. That way all current workloads will not run, or be considered to run, on that particular host. If you create "must" rules this is guaranteed; if you create "should" rules then HA can at least still use the host for restarts, but unless there is severe memory pressure or you hit 100% CPU utilization it will not be used by DRS either.
- Disable the vMotion VMkernel interface
This will result in not being able to vMotion any VMs to the host (or from it). However, HA will still consider it for restarts, you can run workloads on the host, and the host will be considered for "initial placement" during a power-on of a VM.
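For what it's worth, the "VM/Host" affinity-rule approach can be scripted. Below is a minimal PowerCLI sketch, assuming PowerCLI 6.0 or later and an existing vCenter connection; the cluster, host, and group names are placeholders:

```powershell
# Connect to vCenter first (server name is a placeholder)
Connect-VIServer -Server vcenter.example.local

# Host group containing the host DRS should avoid
New-DrsClusterGroup -Name "ExcludedHost" -Cluster "Cluster01" `
    -VMHost (Get-VMHost "esxi01.example.local")

# VM group containing all VMs currently in the cluster
New-DrsClusterGroup -Name "AllVMs" -Cluster "Cluster01" `
    -VM (Get-Cluster "Cluster01" | Get-VM)

# "Should not run on" rule keeping those VMs off the host;
# use -Type MustNotRunOn instead for a hard guarantee
New-DrsVMHostRule -Name "KeepOffExcludedHost" -Cluster "Cluster01" `
    -VMGroup "AllVMs" -VMHostGroup "ExcludedHost" `
    -Type ShouldNotRunOn -Enabled $true
```

Note the caveat that comes up in the comments below as well: the VM group is static, so newly deployed VMs have to be added to it (Set-DrsClusterGroup) or they won't be covered by the rule.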
I will file a feature request for a "disable DRS" option on a particular host in the UI; I guess it could be useful for some in certain scenarios.
Do you have an example of a scenario? I can’t think of one.
Duncan Epping says
I don’t know why they wanted to do this, but they did… So I figured I would share.
We had a scenario where we needed to consolidate a data center. We used Zerto for replication and had to pre-seed 105 TB of data (application servers and databases). That data center only had a 30 Mbps uplink, so it would have taken almost a year to replicate 105 TB. Instead, we used a standalone host, pre-seeded the data on it, and then shipped it to our data center. Rather than creating affinity rules, I created a separate cluster and added this host to it. In the background we used the same DvSwitch, so once the VMs came up we were able to do compute and Storage vMotion.
Don Woodward says
Would be useful for isolating a host when adding new storage, so you can test the new storage before adding it to all hosts. And if you have resource pools, disabling DRS at the cluster level deletes the resource pools.
In my cluster I have a host that perfectly fits a critical VM (number of cores and memory), but DRS keeps migrating VMs onto this host, pushing it to 90% memory and increasing the vCPU-to-core ratio, even when there are plenty of free resources in the cluster.
– I don't want to create a VM group with all the other VMs for a "should not run on" rule because it's not dynamic, and it's dirty.
– I don't want to disable the vMotion VMkernel interface; I want to keep that possibility and spare change management an extra headache.
– I don’t want to remove it from the cluster because I need HA.
So this is a use case where this feature would be nice to have. I'm actually surprised it's not already in, the same way we can set a VM's DRS automation level to Manual.
Cheers for filing a feature request.
Sebastian Barylo (@sbarylo42) says
The most common scenario I know was when some (not very virtualization-friendly) application developers wanted proof that "all those other VMs the host is running" do not steal capacity from their precious system (as long as the host is not in a "High" utilization state or above, that is).
Disabling the vMotion interface was the tactic I typically employed, but if you do that then, in my experience with 4.1 up to 5.1 at least, DRS still takes the resources of such an "excluded" host into account for load-balancing calculations.
As a result (especially in small or heavily loaded clusters) there is an increase in DRS recommendations (and subsequent vMotions among the remaining hosts) in the cluster, which may not be a desirable side effect.
Duncan Epping says
Definitely not preferred indeed.
Exactly what we see. Even with vMotion disabled and the hosts not having access to the production storage, DRS includes the compute of the DR hosts in its calculations, concludes the cluster is imbalanced, and moves VMs around on the production hosts in an attempt to balance.
Eric Schwinger says
That's an interesting request. Would using the "Specify failover hosts" HA option accomplish the same thing?
Todd Scalzott says
What about for testing changes that were made while in maintenance mode? Would disabling DRS not allow one to exit maintenance mode and then vMotion a non-critical workload or two?
We have primary hosts in one DC and standby DR hosts (in maintenance mode) in a different DC that participate in the same cluster. In the event of a DC outage we promote replica SAN resources onto the DR hosts, exit maintenance mode, and HA restarts the VMs. If we take these hosts out of maintenance mode, DRS constantly (every 20-30 minutes) moves VMs around on the primary hosts trying to rebalance, so these hosts stay in maintenance mode. If we could exclude/disable DRS on these hosts, we could run non-critical workloads on them and make use of the otherwise idle compute.
Duncan Epping says
Thanks for that use case, always helpful!
r bousie says
Good call Duncan, I can think of multiple scenarios where this would be helpful.
That's great! I would use it in the following scenario: I have a seven-host cluster. One host will be used for elastic (syslog) VMs that have high CPU load and a pre-calculated 1:1 pCPU-to-vCPU ratio. I don't want other VMs on this host, and I don't want these VMs on the other hosts either. I don't need HA for these elastic VMs, but I do need to move them in case of maintenance or upgrades.
But you have to remember to update the "VM/Host" affinity rules when you deploy new VMs :-/
My use case: I'm gradually migrating VMs from an old cluster to a new cluster. I only have a limited number of Veeam licenses for backup, so I have to place some hosts in maintenance mode so no VMs get moved there (as they won't get backed up). But when one or more hosts in a cluster are in maintenance mode, for some reason you can't deploy a new VM from a template into that cluster. So disabling DRS would be much better for me than maintenance mode.
Fred Peterson says
Oracle licensing since Oracle is still a piece of @#$! company.
You can run things in a cluster as long as you can guarantee they won't migrate on demand. Disabling DRS while still providing host-failure failover would be one reason.
I have a client who just underwent a Microsoft licensing audit. One of the things MS flagged is that Express editions of SQL (such as those installed with applications like Skype for Business Servers and Anti-Virus central servers) are only valid so long as they are not in a cluster.
MS flags DRS as being "cluster capable", so in order for them to accept Express as not clustered, they require you to disable DRS for the hosts running SQL Express.
Seems a peculiar requirement but here we are.
[email protected] says
Never heard anything like that before; that is very strange.
Ronald Martens says
For support issues it could come in handy to be able to disable DRS on a specific host. Recently I had an issue with Exchange, and MS asked me to isolate 2 specific servers because of sizing recommendations. To avoid reconfiguring all the DRS groups and rules, it would be easier to create a new DRS group for the 2 servers and lock them to the DRS-disabled host. Or is there another way?
Tek Thapa says
Here is another, similar use case: I have been troubleshooting an MSSQL Server database for some time now. The GreenWay application uses this DB server. As soon as all users hit the network, the DB server's CPU utilization climbs above 90%. I have 8 Cisco UCS blades in the cluster. GreenWay is asking us to disable HT on the host where this SQL Server is running, so I started looking for a solution and found this thread! My goal is to dedicate one blade for this test without jumping through multiple hoops, but it seems there are not many options. I have built one extra blade in this same cluster with HT disabled, and I see the pros and cons now. I will move this SQL VM to that host, disable vMotion, and test. If GreenWay's claim turns out to be valid, I'll have to find a long-term solution. Any suggestions?
Nathan Neulinger says
Another scenario is when you have a cluster-wide automation level of "Manual" but overrides on individual machines. Trying to get that host into maintenance mode, when some machines override Manual and not all machines are vMotion-able, requires some serious jumping through hoops.
Some sort of "do not move anything here" flag, or a resource reservation on the host, so I can say "yes, this host is DRS enabled, but I've reserved X CPU and X memory on it before DRS even gets to see it".
Nathan Neulinger says
Another related situation is when you want to slowly evacuate a host over a few days. I'd like to prevent it from being selected/migrated to, but not necessarily put it in maintenance mode, since that won't complete until it's fully evacuated, which might take days.
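A slow drain like that can be approximated with a scheduled script rather than maintenance mode. A rough PowerCLI sketch, assuming an existing vCenter connection; the host name and batch size are placeholders, and this only covers moving VMs off the host, not preventing new placements onto it:

```powershell
# Host being drained (name is a placeholder)
$drainHost = Get-VMHost "esxi01.example.local"

# Candidate destinations: other connected hosts in the same cluster
$targets = Get-Cluster -VMHost $drainHost | Get-VMHost |
    Where-Object { $_.Name -ne $drainHost.Name -and $_.ConnectionState -eq "Connected" }

# Move a small batch of VMs per run; schedule this every few hours
# to spread the evacuation over days instead of blocking on
# maintenance mode completing
Get-VM -Location $drainHost | Select-Object -First 2 | ForEach-Object {
    Move-VM -VM $_ -Destination ($targets | Get-Random)
}
```

Picking a random destination keeps the sketch simple; in practice you would probably pick the least-loaded target, or just let DRS rebalance after each batch lands.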
As another option, you can create a separate cluster and drag and drop this host into that separate cluster object so other VMs will not use this host.