Clearing up a misunderstanding around CPU throttling with vMotion

I was reading a nice article by Michael Webster on multi-NIC vMotion. In the comment section Josh Attwell refers to a tweet by Eric Siebert about how CPUs are throttled when many VMs are vMotioned simultaneously. This is the tweet:

Heard interesting vMotion tidbit today, more simultaneous vMotions are made possible by throttling the clock speed of VMs to slow them down

— Eric Siebert (@ericsiebert) June 6, 2012

I want to make sure that everyone understands that this is not exactly the case. There is a vMotion enhancement in 5.0 called SDPS, aka “Slow Down During Page Send”. I wrote an article about this feature when vSphere 5.0 was released, but I guess it doesn’t hurt to repeat this, as the blogosphere was swamped with info around the 5.0 release.

SDPS kicks in when the rate at which pages are changed (dirtied) exceeds the rate at which the pages can be transferred to the other host. In other words, if your virtual machines are not extremely memory active, then the chance of SDPS ever kicking in is small, very very small. If it does kick in, it does so to prevent the vMotion process from failing for this particular VM. Note that by default SDPS is not doing anything: normally your VMs will not be throttled by vMotion, and a VM will only be throttled when there is a requirement to do so.
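To make the "dirty rate versus transfer rate" condition concrete, here is a minimal toy model of an iterative memory pre-copy. It is not VMware's implementation: the function name `precopy_passes`, the `switchover_pages` threshold, and the assumption that both rates are constant are all made up for illustration. Each pass sends the pages that are still outstanding, and while that pass runs the guest dirties new pages that must be sent again.

```python
def precopy_passes(total_pages, dirty_rate, tx_rate,
                   switchover_pages=1000, max_passes=50):
    """Toy model: number of copy passes until the remaining dirty set is
    small enough to switch over, or None if it never gets there.

    All parameters are hypothetical. Rates are in pages per second and
    are assumed to be constant for the whole migration.
    """
    remaining = total_pages
    for passes in range(1, max_passes + 1):
        # Time needed to send everything that is currently outstanding.
        send_time = remaining / tx_rate
        # Pages dirtied while sending; cannot exceed the VM's memory size.
        remaining = min(total_pages, dirty_rate * send_time)
        if remaining <= switchover_pages:
            return passes
    return None  # dirty set never shrank enough: the copy does not converge


# Dirty rate below the transfer rate: the outstanding set halves each pass.
print(precopy_passes(1_000_000, dirty_rate=50_000, tx_rate=100_000))
# Dirty rate above the transfer rate: the copy never catches up.
print(precopy_passes(1_000_000, dirty_rate=150_000, tx_rate=100_000))
```

In this model the outstanding set shrinks by a factor of `dirty_rate / tx_rate` per pass, so it only converges when that ratio is below 1. The second call is the situation in which SDPS would step in.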

I have quoted my original article on this subject below to provide the details:

Simply said, vMotion will track the rate at which the guest pages are changed, or as the engineers prefer to call it, “dirtied”. The rate at which this occurs is compared to the vMotion transmission rate. If the rate at which the pages are dirtied exceeds the transmission rate, the source vCPUs will be placed in a sleep state to decrease the rate at which pages are dirtied and to allow the vMotion process to complete. It is good to know that the vCPUs will only be put to sleep for a few milliseconds at a time at most. SDPS injects frequent, tiny sleeps, disrupting the virtual machine’s workload just enough to guarantee vMotion can keep up with the memory page change rate to allow for a successful and non-disruptive completion of the process. You could say that, thanks to SDPS, you can vMotion any type of workload regardless of how aggressive it is.

It is important to realize that SDPS only slows down a virtual machine in the cases where the memory page change rate would have previously caused a vMotion to fail.
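As a rough sketch of the throttling idea described above, the snippet below estimates what fraction of the time the source vCPUs would need to sleep for the dirty rate to drop below the transmission rate. The real mechanism is not publicly documented, so everything here is an assumption: the function name `sleep_fraction`, the `headroom` factor, and the simplification that the dirty rate scales linearly with the time the vCPUs are allowed to run.

```python
def sleep_fraction(dirty_rate, tx_rate, headroom=0.9):
    """Fraction of time (0.0 to 1.0) the vCPUs would have to sleep so the
    effective dirty rate stays at or below headroom * tx_rate.

    Hypothetical model: dirty rate is assumed proportional to vCPU run time.
    """
    target = headroom * tx_rate
    if dirty_rate <= target:
        # vMotion already keeps up, so no throttling is needed at all.
        return 0.0
    # Running (1 - f) of the time gives dirty_rate * (1 - f) == target.
    return 1.0 - target / dirty_rate


# A VM dirtying pages at half the transfer rate is never slowed down.
print(sleep_fraction(dirty_rate=50_000, tx_rate=100_000))
# A very memory-active VM is slowed just enough for the copy to converge.
print(sleep_fraction(dirty_rate=180_000, tx_rate=100_000))
```

The point of the sketch is the first branch: in this model a VM whose dirty rate is already below the transfer rate gets a sleep fraction of zero, which matches the statement that SDPS only slows down VMs whose vMotion would otherwise have failed.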

This technology is also what enables the increase in accepted latency for long distance vMotion. Pre-vSphere 5.0, the maximum supported latency for vMotion was 5ms. As you can imagine, this restricted many customers from enabling cross-site clusters. As of vSphere 5.0, the maximum supported latency has been doubled to 10ms for environments using Enterprise Plus. This should allow more customers to enable DRS between sites when all the required infrastructure components are available like, for instance, shared storage.

Comments

Andrey says

16 July, 2012 at 16:30

Thanks for the quick article Duncan! I believe you also included this in the vSphere 5.0 Clustering technical deepdive if I recall correctly.
Sudhish Ahuja says

17 July, 2012 at 21:36

If I understood correctly, SDPS is altering the vCPU C-state. It’s great news; in the future we might see more power savings from VMware by enabling virtual machines to control CPU power saving based on utilization.
Bill Griffith says

31 July, 2012 at 21:00

Will DRS ever cause this to happen? For instance, if DRS sees the need to migrate vm’s, will it choose a vm which might fall into this state or will it pass by the vm with rapidly changing pages and move a vm with more slowly changing pages?

Bill G.
Frank Denneman says

1 August, 2012 at 15:55

Good question Bill,

I will post an article soon describing the way DRS chooses a virtual machine and will highlight the interoperability with SDPS.
Mark White says

13 August, 2012 at 20:15

So I understand how SDPS introduces small sleep processes into the vCPU to slow down the VM’s response to requests – inherently slowing down the Page Dirty Rate and transmitting the dirtied pages more efficiently. When it comes down to it though there is some code somewhere which tells the vCPU to sleep OR reconfigures the vCPU’s priority on the scheduler. Here are my questions: 1. how are the sleeps introduced i.e. which process (e.g. vMotion)? 2. are the sleeps pushed onto the scheduler directly from some vMotion process or are they pulled from the vCPU having been dropped there? The core of my query is whether SDPS is managed by the VM being migrated or centrally. Any clarification would be very much appreciated 🙂
Duncan Epping says

13 August, 2012 at 20:59

That is NDA info, Mark. Unfortunately I cannot go into these details.
Mark White says

13 August, 2012 at 21:22

That makes sense Duncan – probably why I can’t find anything on it online – no matter – will be meeting with Joe Baguley in a couple of weeks – maybe he can direct me in the right direction 😉 Thanks anyway…
