I was reading up on vMotion today, stumbled on this excellent article by my colleague Kyle Gleed, and noticed something that hardly anyone has blogged about: Quick Resume. Quick Resume is a feature that allows you to vMotion a virtual machine which has a high memory page change rate. Basically, when the change rate of your memory pages exceeds the capabilities of your network infrastructure, you could end up in a scenario where vMotioning a virtual machine fails because the change rate makes a switch-over impossible. With Quick Resume this has changed.
Quick Resume enables the source virtual machine to be stunned and the destination virtual machine to be started before all pages have been copied. However, as the virtual machine is already running at the destination, it could attempt to touch (read or write) a page which hasn’t been copied yet. In that case Quick Resume requests the page from the source so that the guest can complete the action, while it continues copying the remaining memory pages until all pages are migrated.
But what if the network fails at that point? Wouldn’t you end up with a destination virtual machine which can no longer access certain memory pages because they are “living” remotely? Just like Storage IO Control, vMotion leverages shared storage. When Quick Resume is used, a special file is created which basically serves as a backup buffer. If the network fails, this file allows the migration to complete. The file is typically on the order of just a couple of MB. Besides being used as a buffer for transferring the memory pages, it also enables bi-directional communication between the two hosts, allowing the vMotion to complete as though the network hadn’t failed. Is that cool or what?
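The demand-fetch behaviour described above can be illustrated with a toy model. This is only a sketch of the general idea (run at the destination, fetch a missing page from the source when the guest touches it, keep copying in the background). The class, the page numbering and the copy order are assumptions made up for illustration and say nothing about how ESX implements it; the shared-storage fallback is not modelled.

```python
# Toy model only: the class, page numbers and copy order are illustrative
# assumptions, not how ESX represents or migrates memory.

class DestinationVM:
    def __init__(self, source_pages):
        self.source = source_pages   # pages still held by the source host
        self.local = {}              # pages already present at the destination
        self.remote_fetches = 0      # times the guest had to wait on the source

    def background_copy(self, count):
        """Copy up to `count` not-yet-migrated pages, in page-number order."""
        pending = sorted(p for p in self.source if p not in self.local)
        for page in pending[:count]:
            self.local[page] = self.source[page]

    def touch(self, page):
        """Guest accesses a page; fetch it from the source first if missing."""
        if page not in self.local:
            self.remote_fetches += 1
            self.local[page] = self.source[page]
        return self.local[page]

    def migrated(self):
        return len(self.local) == len(self.source)


source = {n: f"data-{n}" for n in range(8)}
vm = DestinationVM(source)
vm.background_copy(3)     # pages 0..2 arrive before the guest resumes
vm.touch(1)               # already local, no extra latency
vm.touch(6)               # not copied yet, fetched on demand
while not vm.migrated():
    vm.background_copy(2)  # remaining pages keep streaming in
print(vm.remote_fetches)   # 1
```

Only the access to a page that had not arrived yet pays the extra round trip, which is the temporary memory access penalty discussed in the next paragraph.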
The question that typically arises immediately is whether this will impact performance. It is good to realize that without Quick Resume, vMotioning large, memory-active virtual machines would be difficult. The switch-over time could potentially be too long and lead to a temporary loss of connection with the virtual machine. Although Quick Resume will impact performance when pages that have not been copied yet are accessed, the benefits of being able to vMotion very large virtual machines with minimal impact by far outweigh this temporary increase in memory access time.
There are so many cool features and enhancements in vSphere that I just keep being amazed.
Sean says
Very cool tech. I haven’t come across any VMs in our environment that have a high enough memory page change rate to warrant using this feature yet, but if I do I’ll be watching the datastore to see if I can spot the backup quick resume file buffer during a vMotion 🙂
Louw Pretorius says
Is there any more documentation about Quick Resume I can have a look at?
Louw
Duncan Epping says
No there is no official documentation as it is just a feature of vMotion itself.
Jason Boche says
Without documentation, I’m glad you brought it out into the open and I’m not surprised it wasn’t widely known up to this point.
Duncan Epping says
Well, these are the type of features that get addressed in a VMworld presentation, and if you are lucky someone picks it up and blogs about it. Apparently that didn’t happen this time.
Doug says
I can definitely see this being useful for distance vMotions — where the latency of the distance link is too high for even ‘normal’ VM memory page change rates to be accommodated.
Derek B. Moore says
Should enhance the ability to move vms between datacenters like what we are testing at http://downtowncolo.com
CianoKuraz says
I suppose this Quick Resume is the same one used by sVMotion, paired with Quick Suspend, just after the pre-copy of the VM disks and swap file from source to destination?
Duncan Epping says
No, what you are referring to is FSR: Fast Suspend and Resume. It is what enables the switch-over from the source to the “ghost” destination VM.
This enables the vMotion of large virtual machines with a high memory page change rate.
Andy Kitzke says
I always wondered what would happen in this situation. It’s good to get a better understanding of how vmotion can still occur in these situations.
Michael Nauen says
There is a more in-depth article on the VMware Uptime blog.
http://blogs.vmware.com/uptime/2011/02/vmotion-whats-going-on-under-the-covers.html
• You should be able to vMotion any workload as long as it is dirtying memory pages at a rate that is less than your vMotion network transmit rate.
• When the preCopy cannot converge, vMotion needs to decide whether to fail the vMotion or to proceed with switchover to the destination anyway. It makes this decision by estimating the time required to transmit all the remaining outstanding pages. By default, if this time is below 100 seconds vMotion will proceed with the switchover.
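The two quoted points can be sketched in a few lines. This is a deliberately simplified model, not vMotion's actual algorithm: it assumes each pre-copy pass sends everything outstanding at a constant rate, and that whatever the guest dirties during that pass is what remains for the next one. The function names and the sample numbers are made up for illustration; only the 100-second default comes from the quote above.

```python
SWITCHOVER_LIMIT_S = 100  # default threshold quoted above


def precopy_remaining(memory_mb, dirty_mb_per_s, transmit_mb_per_s, passes):
    """Outstanding MB after each pre-copy pass in a simplified model.

    Each pass transmits everything outstanding; pages dirtied during the
    pass become the outstanding set for the next pass (capped at VM size).
    """
    remaining = float(memory_mb)
    history = []
    for _ in range(passes):
        pass_seconds = remaining / transmit_mb_per_s
        remaining = min(float(memory_mb), dirty_mb_per_s * pass_seconds)
        history.append(remaining)
    return history


def should_switch_over(remaining_mb, transmit_mb_per_s,
                       limit_s=SWITCHOVER_LIMIT_S):
    """Proceed if the estimated time to send what is left is under the limit."""
    return remaining_mb / transmit_mb_per_s < limit_s


# Dirty rate below transmit rate: the outstanding set shrinks every pass.
print(precopy_remaining(8192, 64, 128, 3))    # [4096.0, 2048.0, 1024.0]
# Dirty rate above transmit rate: pre-copy never converges.
print(precopy_remaining(8192, 256, 128, 3))   # [8192.0, 8192.0, 8192.0]
# Non-converging, but the rest fits in the window (64 s), so proceed.
print(should_switch_over(8192, 128))          # True
# 128 s of outstanding pages exceeds the window.
print(should_switch_over(16384, 128))         # False
```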
So most have a 1 Gigabit network, as is recommended in the VMware documentation. To me that means 1000 Mbit/s = 125 MB/s is the maximum transmit rate.
So if it needs to be under 100 seconds, it can transmit at most 12500 MB in 100 seconds, so 12.5 GB in 100 seconds is the limit per VM.
With a 10 Gigabit network the limit is 125 GB in 100 seconds.
With 40 Gigabit InfiniBand it is 500 GB in 100 seconds.
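The arithmetic above can be checked with a short calculation. It assumes the theoretical line rate with no protocol overhead and decimal units (1 GB = 1000 MB), as in the figures above; the function name is made up for illustration. Read against the quoted text, the result would bound the pages still outstanding at switchover rather than the total memory of the VM.

```python
def max_outstanding_gb(link_gbit_per_s, window_s=100):
    """GB transferable within the window at theoretical line rate.

    Assumes no protocol overhead and decimal units (1 GB = 1000 MB).
    """
    mb_per_s = link_gbit_per_s * 1000 / 8   # Gbit/s -> MB/s
    return mb_per_s * window_s / 1000       # MB in window -> GB


for link in (1, 10, 40):
    print(link, max_outstanding_gb(link))
# 1 12.5
# 10 125.0
# 40 500.0
```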
Hopefully 40 and 100 Gigabit Cards will come soon.
The term “remaining outstanding pages” is unclear to me.
Michael B. says
Obviously size of the VM and network are the 2 main factors but is there any decrease in performance when “Quick Resume” is being utilized?
Also, the blog article linked by Michael Nauen states ESX 4.1 as the version for Quick Resume. Is 4.1 the first version to utilize Quick Resume?
The blog article also shows how to check the vmkernel log to see whether network throughput is sufficient. Will the vmkernel log contain an entry stating that Quick Resume was used?
Thanks,
Mike
Duncan Epping says
Isn’t that what my last paragraph states? If memory pages still reside on the source node when requested by the guest OS, they will need to be fetched across the network first, which means “latency”.
4.1 is the first version indeed.
Vivekrai says
Hi Duncan Epping,
“A special file would be created in the case Quick Resume is used and this file is basically used as a backup buffer.” Is it a bitmap file or something else?