**disclaimer: this article is an out-take of our book: vSphere 5 Clustering Technical Deepdive**
There are some fundamental changes when it comes to vMotion scalability and performance in vSphere 5.0. Most of these changes have one common goal: being able to vMotion ANY type of workload. With the following enhancements, it no longer matters if you have a virtual machine with 32GB of memory that is rapidly changing its memory pages:
- Multi-NIC vMotion support
- Stun During Page Send (SDPS)
Multi-NIC vMotion Support
One of the most substantial and visible changes is the multi-NIC vMotion capability. vMotion is now capable of using multiple NICs concurrently to decrease the amount of time a vMotion takes. That means that even a single vMotion can leverage all of the configured vMotion NICs. Prior to vSphere 5.0, only a single NIC was used by a vMotion-enabled VMkernel interface. Enabling multiple NICs for vMotion removes some of the bandwidth/throughput constraints associated with large and memory-active virtual machines. The following list shows the currently supported maximum number of NICs for multi-NIC vMotion:
- 1GbE – 16 NICs supported
- 10GbE – 4 NICs supported
It is important to realize that in the case of 10GbE interfaces, it is only possible to use the full bandwidth when the server is equipped with the latest PCI Express buses. Ensure that your server hardware is capable of taking full advantage of these capabilities when this is a requirement.
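To put the bandwidth gains in perspective, here is a rough back-of-the-envelope calculation of how long a single copy of a 32GB virtual machine's memory would take over various vMotion NIC configurations. This is a simplified model of my own: it ignores protocol overhead, the memory page change rate, and any PCIe bus limitations.

```python
def precopy_seconds(memory_gb, nic_gbps, nic_count):
    """Rough time to copy a VM's memory once, assuming the aggregate
    link speed is the bottleneck (ignores protocol overhead and any
    pages re-dirtied while the copy is in flight)."""
    memory_gigabits = memory_gb * 8
    return memory_gigabits / (nic_gbps * nic_count)

# 32GB VM over a single 1GbE NIC vs. the vSphere 5.0 multi-NIC maximums
print(precopy_seconds(32, 1, 1))    # → 256.0 (single 1GbE)
print(precopy_seconds(32, 1, 16))   # → 16.0  (16 x 1GbE)
print(precopy_seconds(32, 10, 4))   # → 6.4   (4 x 10GbE)
```

Even under these idealized assumptions the difference is dramatic, which is exactly why multi-NIC support matters for large, memory-active virtual machines.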
Stun During Page Send
A couple of months back I described the cool vSphere 4.1 vMotion enhancement called Quick Resume. In vSphere 5.0 it has been replaced with Stun During Page Send (SDPS), also often referred to as “Slowdown During Page Send”, a feature that slows down the vCPUs of the virtual machine that is being vMotioned. Simply said, vMotion tracks the rate at which the guest's memory pages are changed, or as the engineers prefer to call it, “dirtied”. This rate is compared to the vMotion transmission rate. If the rate at which pages are dirtied exceeds the transmission rate, the source vCPUs will be placed in a sleep state to decrease the rate at which pages are dirtied and to allow the vMotion process to complete. It is good to know that the vCPUs will only be put to sleep for a few milliseconds at a time at most. SDPS injects frequent, tiny sleeps, disrupting the virtual machine’s workload just enough to guarantee that vMotion can keep up with the memory page change rate, allowing a successful and non-disruptive completion of the process. You could say that, thanks to SDPS, you can vMotion any type of workload regardless of how aggressive it is.
It is important to realize that SDPS only slows down a virtual machine in the cases where the memory page change rate would have previously caused a vMotion to fail.
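The feedback loop described above can be illustrated with a small simulation. This is purely a conceptual sketch of mine, not VMware's actual implementation: the page counts, rates, and the choice to halve the dirty rate per throttling step are arbitrary modeling assumptions.

```python
def vmotion_precopy(dirty_rate, transmit_rate, memory_pages, max_passes=50):
    """Toy model of iterative pre-copy with SDPS-style throttling.

    dirty_rate and transmit_rate are in pages per second. Whenever the
    guest dirties pages faster than vMotion can send them, the vCPUs
    are briefly 'stunned', modeled here as halving the effective dirty
    rate for the next pass. Returns the pass on which the remaining
    dirty set is small enough to switch over, or None if it never is.
    """
    remaining = memory_pages
    for pass_no in range(1, max_passes + 1):
        send_time = remaining / transmit_rate
        # Pages dirtied while this pass was being transmitted
        remaining = dirty_rate * send_time
        if remaining < transmit_rate * 0.01:
            return pass_no  # dirty set small enough to stun-and-switch
        if dirty_rate >= transmit_rate:
            dirty_rate /= 2  # SDPS: inject vCPU sleeps to slow dirtying
    return None

# Aggressive workload: pages dirtied twice as fast as they can be sent.
# Without throttling the copy would never converge; with SDPS it does.
print(vmotion_precopy(dirty_rate=2000, transmit_rate=1000,
                      memory_pages=100_000))
```

The key behavior the sketch captures is that the throttling only kicks in when the dirty rate exceeds the transmission rate, matching the point made above: well-behaved workloads are never slowed down.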
This technology is also what enables the increase in accepted latency for long-distance vMotion. Pre-vSphere 5.0, the maximum supported latency for vMotion was 5ms. As you can imagine, this restricted many customers from enabling cross-site clusters. As of vSphere 5.0, the maximum supported latency has been doubled to 10ms for environments using Enterprise Plus. This should allow more customers to enable DRS between sites when all the required infrastructure components, such as shared storage, are available.