Clearing up a misunderstanding around CPU throttling with vMotion

I was reading a nice article by Michael Webster on multi-NIC vMotion. In the comment section Josh Attwell refers to a tweet by Eric Siebert about how CPUs are throttled when many VMs are vMotioned simultaneously. This is the tweet:

I want to make sure that everyone understands that this is not exactly the case. There is a vMotion enhancement in vSphere 5.0 called SDPS, aka “Stun During Page Send”. I wrote an article about this feature when vSphere 5.0 was released, but I guess it doesn’t hurt to repeat it, as the blogosphere was literally swamped with info around the 5.0 release.

SDPS kicks in when the rate at which pages are changed (dirtied) exceeds the rate at which those pages can be transferred to the other host. In other words, if your virtual machines are not extremely memory active, the chance of SDPS ever kicking in is small, very very small. If it does kick in, it does so to prevent the vMotion process from failing for that particular VM. Note that by default SDPS does nothing; normally your VMs will not be throttled by vMotion. A VM is only throttled when there is a requirement to do so.

I quoted my original article on this subject below to provide you the details:

Simply said, vMotion will track the rate at which the guest pages are changed, or as the engineers prefer to call it, “dirtied”. This rate is compared to the vMotion transmission rate. If the rate at which the pages are dirtied exceeds the transmission rate, the source vCPUs will be placed in a sleep state to decrease the rate at which pages are dirtied and to allow the vMotion process to complete. It is good to know that the vCPUs will only be put to sleep for at most a few milliseconds at a time. SDPS injects frequent, tiny sleeps, disrupting the virtual machine’s workload just enough to guarantee that vMotion can keep up with the memory page change rate, allowing a successful and non-disruptive completion of the process. You could say that, thanks to SDPS, you can vMotion any type of workload regardless of how aggressive it is.
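The mechanism above can be sketched with a back-of-the-envelope model. To be clear: this is an illustrative simplification, not VMware’s actual implementation; the function name and the linear-scaling assumption are mine.

```python
# Conceptual sketch of the SDPS decision: if the guest dirties pages
# faster than vMotion can send them, slow the vCPUs down just enough
# that the pre-copy phase can converge.

def sdps_sleep_fraction(dirty_rate_mbps: float, transmit_rate_mbps: float) -> float:
    """Fraction of vCPU time to steal so the dirty rate <= transmit rate.

    Assumes the dirty rate scales linearly with the time the vCPUs run,
    which is a simplification of real workload behavior.
    """
    if dirty_rate_mbps <= transmit_rate_mbps:
        return 0.0  # pre-copy converges on its own; SDPS stays idle
    # Slow the guest just enough that the effective dirty rate
    # equals the transmit rate.
    return 1.0 - transmit_rate_mbps / dirty_rate_mbps

# A guest dirtying 2000 MB/s over a link that moves 1000 MB/s needs
# its vCPUs slept roughly half the time; a calmer guest is untouched.
print(sdps_sleep_fraction(2000, 1000))  # -> 0.5
print(sdps_sleep_fraction(500, 1000))   # -> 0.0
```

Note how the second call returns 0.0: as the article says, SDPS does nothing at all while the transmission rate keeps up with the page change rate.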

It is important to realize that SDPS only slows down a virtual machine in the cases where the memory page change rate would have previously caused a vMotion to fail.

This technology is also what enables the increase in accepted latency for long-distance vMotion. Pre-vSphere 5.0, the maximum supported latency for vMotion was 5ms. As you can imagine, this restricted many customers from enabling cross-site clusters. As of vSphere 5.0, the maximum supported latency has been doubled to 10ms for environments using Enterprise Plus. This should allow more customers to enable DRS between sites when all the required infrastructure components, such as shared storage, are available.

vSphere 5.0 U1a was just released, vDS/SvMotion bug fixed!

Many of you who hit the SvMotion / vDS / HA problem requested the hot-patch that was available for it. Now that Update 1a has been released with a permanent fix, how do you go about installing it? This is the recommended procedure:

  1. Backup your vCenter Database
  2. Uninstall the vCenter hot-patch
  3. Install the new version by pointing it to the database

The reason for this is that the hot-patch increased the build number, which could conflict with later versions.

And for those who have been waiting for it: the vCenter Appliance has also been updated to Update 1 and now includes a vPostgres database by default instead of DB2!

LUN sizes in the new Storage World

I am on holiday and catching up on some articles I had saved that I still wanted to read. I stumbled on an article by Ravi Venkat (Pure Storage) about sizing VMFS volumes on flash-based arrays. I must say that Ravi makes a couple of excellent arguments about the operational and architectural simplicity of these new types of arrays, and I do strongly believe that they make the world a lot easier.

IOPS requirements? Indeed, forget about them when you have thousands at your disposal… And indeed, your RAID penalty doesn’t really matter anymore either, especially as many of these new storage arrays also have new types of RAID levels. Great, right?

Yes, in most cases this is great news! One thing to watch out for, though, is the failure domain. If you create a large 32TB volume with hundreds of virtual machines, the impact would be huge if that volume, for whatever reason, blows up. Not only the impact of the failure itself, but the RTO aka “Recovery Time Objective” would also be substantially longer. Yes, the array might be lightning fast, but you will be, and probably are, limited by your backup solution. How long will it take to restore those 32TBs? Have you ever done the math?

It isn’t too complicated to do the math, but I would strongly suggest testing it! When I was an admin we had a clearly defined RTO and RPO. We tested these every once in a while, and even though we were already using tapeless backups, it still took a long time to restore 2TB.
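As a rough illustration of that math (the throughput figure below is a made-up assumption; plug in the sustained restore rate you have actually measured for your own backup solution):

```python
# Back-of-the-envelope restore-time math: how long does it take to
# restore a full datastore at a given sustained restore throughput?

def restore_hours(datastore_tb: float, throughput_mb_s: float) -> float:
    """Hours needed to restore `datastore_tb` terabytes at a
    sustained throughput of `throughput_mb_s` MB/s."""
    total_mb = datastore_tb * 1024 * 1024  # TB -> MB
    return total_mb / throughput_mb_s / 3600  # seconds -> hours

# Restoring a 32 TB volume at an assumed sustained 500 MB/s:
print(round(restore_hours(32, 500), 1))  # -> 18.6 hours
```

Even with a generous sustained rate, a 32TB restore runs the better part of a day, which is exactly why the failure domain of a very large volume deserves attention.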

Nevertheless, I do feel that Ravi points out the “hidden value” of these types of storage architectures. Definitely something to take into account when you are looking for new storage… I am wondering how many of you are already using flash-based solutions, and how you do your sizing.

Answering some admission control questions

I received a bunch of questions on HA admission control and figured I would answer them in a blog post so that everyone would be able to find and read them. This was the original set of questions:

There are 4 ESXi Hosts in the network and 4 VMs (same CPU and RAM reservation for all VMs) on each Host. The Admission Control policy is set to ‘Host failures cluster tolerates = 1’. All 12 available slots have been used by the powered-on VMs, except the 4 slots reserved for failover.
1) What happens if 2 ESXi Hosts fail now? (2 × 4 VMs need to fail over.) Will HA restart only 4 VMs as it has only 4 slots available? And will the restart of the remaining 4 VMs fail?
Same scenario, but the policy is set to ‘% of cluster resources reserved = 25%’. All of the available 75% of resources have been utilized by the 16 VMs, except the 25% reserved for failover.
2) What happens if 2 ESXi Hosts fail now? (2 × 4 VMs need to fail over.) Will HA restart only 4 VMs as that consumes the 25% of reserved resources? And will the restart of the other 4 VMs fail?
3) Does HA check the VM reservation (or any other factor) at the time of restart?
4) Does HA only restart a VM if the host can guarantee the reserved resources, or does the restart fail?
5) What if no reservations are set at the VM level?
6) What does HA take into consideration when it has to restart VMs which have no reservation?
7) Will it guarantee the configured resources for each VM?
8) If not, how can HA restart 8 VMs (as per our example) when it only has reserved resources for just 4 VMs?
9) Will it share the reserved resources across the 8 VMs and not care about the resource crunch, or is it first come, first served?
10) Does Admission Control have any role at all in the event of an HA failover?

Let me tackle these questions one by one:

  1. In this scenario 4 VMs will be restarted and the other 4 VMs might be restarted! Note that the “slot size” policy is used and that it is based on a worst-case scenario. So if your slot is 1GB and 2GHz but your VMs require far less than that to power on, it could be that all VMs are restarted. However, HA guarantees the restart of only 4 VMs. Keep in mind that this scenario doesn’t happen often, as you would be overcommitting to the extreme here. As said, HA will restart every VM it can; it just needs to be able to satisfy the resource reservations on memory and CPU!
  2. Again, in this scenario HA will do its best to restart all VMs. It can restart VMs until all “unreserved capacity” is used. As HA only needs to guarantee reserved resources, the chance of hitting this limit is very slim; since most people don’t use reservations at the VM level, hitting it would mean you are overcommitting extremely.
  3. Yes, it will validate that there is a host which can back the resource reservations before it attempts the restart.
  4. Yes, it will only restart the VM when this can be guaranteed. If it cannot be, HA can ask DRS to defragment resources for this VM.
  5. If there are no reservations, HA will only look at the “memory overhead” in order to place the VM.
  6. HA ensures the portgroup and datastore are available on the host.
  7. It will not guarantee configured resources. HA is about restarting virtual machines, not about resource management; DRS is about resource management and guaranteeing access to resources.
  8. HA will only be able to restart a VM if there are unreserved resources available to satisfy the VM’s request.
  9. All resources required for a virtual machine need to be available on a single host! Yes, resources will be shared on that host, as long as no reservations are defined.
  10. No, Admission Control doesn’t have any role in an HA failover. Admission Control happens at the vCenter level; HA failovers happen at the ESX(i) level.
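The slot math from answer 1 can be sketched as follows. This is a deliberately simplified model assuming identical hosts; real HA slot calculations also account for per-VM memory overhead, changing slot-size defaults, and resource fragmentation across hosts.

```python
# Simplified "Host failures cluster tolerates" slot math: how many
# slots remain usable after reserving capacity for N host failures?

def slots_available(hosts: int, slots_per_host: int, host_failures_tolerated: int) -> int:
    """Usable slots after reserving whole hosts' worth of capacity.

    Assumes identical hosts, so tolerating N failures removes
    N * slots_per_host slots from the usable pool.
    """
    usable_hosts = hosts - host_failures_tolerated
    return usable_hosts * slots_per_host

# The scenario from the questions: 4 hosts, 4 slots each, tolerate 1 failure.
print(slots_available(4, 4, 1))  # -> 12 usable slots (4 reserved)
```

With 16 VMs and only 12 usable slots, the guaranteed restarts after a double host failure are limited to what the 4 reserved slots cover; anything beyond that depends on how conservative the slot size was versus what the VMs actually need, which is exactly the point of answer 1.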

Storage DRS interoperability white paper released

I just noticed I never blogged about a white paper Frank Denneman and I co-authored. The white paper deals with interoperability between Storage DRS and various other products and features. I highly recommend reading it if you are planning on implementing Storage DRS or want to get a better understanding of how Storage DRS interacts with other components of your infrastructure.

Storage DRS interoperability
This document presents an overview of best practices for customers considering the implementation of VMware vSphere Storage DRS in combination with advanced storage device features or other VMware products.