
Yellow Bricks

by Duncan Epping


storage vmotion

vSphere 5.0: Storage vMotion and the Mirror Driver

Duncan Epping · Jul 14, 2011 ·

**disclaimer: this article is an out-take of our book: vSphere 5 Clustering Technical Deepdive**

Storage vMotion in vSphere 5.0 includes a cool and exciting new feature called Mirror Mode, which enables faster and more efficient Storage vMotion operations. But what is it exactly, and what does it replace?

Prior to vSphere 5.0 we used a mechanism called Changed Block Tracking (CBT) to ensure that blocks which changed after they had already been copied to the destination were marked and copied again during the next iteration. Although CBT was efficient compared to legacy mechanisms (snapshots), the Storage vMotion engineers came up with an even more elegant and efficient solution, which is called Mirror Mode. Mirror Mode does exactly what you would expect it to do; it mirrors the I/O. In other words, when a virtual machine that is being Storage vMotioned writes to disk, the write is committed to both the source and the destination disk. The write is only acknowledged to the virtual machine when both the source and the destination have acknowledged it. Because of this, re-iterative copy passes are unnecessary and the Storage vMotion process completes faster than ever before.
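To make that concrete, here is a toy sketch in Python (purely illustrative, not VMware code) of what synchronous write mirroring boils down to: a write is issued to both the source and the destination, and only reported back to the “VM” once both copies have it.

import io

# Toy illustration of Mirror Mode semantics (not VMware code). Every write is
# committed to both backing stores and only acknowledged once both have it;
# reads are simply served from the source disk.
class MirroredDisk:
    def __init__(self, source, destination):
        self.source = source
        self.destination = destination

    def write(self, offset, data):
        for disk in (self.source, self.destination):  # commit to source AND destination
            disk.seek(offset)
            disk.write(data)
        return len(data)                              # only now is the write acknowledged

    def read(self, offset, length):
        self.source.seek(offset)                      # reads come from the source disk
        return self.source.read(length)

# Two in-memory buffers stand in for the source and destination VMDK.
disk = MirroredDisk(io.BytesIO(bytearray(64)), io.BytesIO(bytearray(64)))
disk.write(0, b"guest write issued during the migration")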

The questions remain: How does this work? Where does Mirror Mode reside? Is this something that happens inside or outside of the guest? A diagram will make this more obvious.

By leveraging DISKLIB, the Mirror Driver can be enabled for the virtual machine that needs to be Storage vMotioned. Before this driver can be enabled, the virtual machine needs to be stunned, and of course unstunned after it has been enabled. The new driver leverages the datamover to do a single-pass block copy of the source disk to the destination disk, while the Mirror Driver mirrors writes between the two disks. Not only has efficiency increased, but so has migration time predictability, making it easier to plan migrations. I have seen data where the “down time” associated with the final copy pass was virtually eliminated (from 13 seconds down to 0.22 seconds) in the case of rapidly changing disks, and where the total migration time went from 2900 seconds down to 1900 seconds. Check this great paper by Ali Mashtizadeh for more details.

The Storage vMotion process is fairly straightforward and not as complex as one might expect; a small simulation after the list below shows why a single copy pass is enough.

  1. The virtual machine working directory is copied by VPXA to the destination datastore.
  2. A “shadow” virtual machine is started on the destination datastore using the copied files. The “shadow” virtual machine idles, waiting for the copying of the virtual machine disk file(s) to complete.
  3. Storage vMotion enables the Storage vMotion Mirror driver to mirror writes of already copied blocks to the destination.
  4. In a single pass, a copy of the virtual machine disk file(s) is completed to the target datastore while mirroring I/O.
  5. Storage vMotion invokes a Fast Suspend and Resume of the virtual machine (similar to vMotion) to transfer the running virtual machine over to the idling shadow virtual machine.
  6. After the Fast Suspend and Resume completes, the old home directory and VM disk files are deleted from the source datastore.
    1. It should be noted that the shadow virtual machine is only created when the virtual machine home directory is moved. If it is a “disks-only” Storage vMotion, the virtual machine will simply be stunned and unstunned.
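As promised above, here is a toy, self-contained simulation (again my own sketch, not VMware code) of why a single copy pass suffices once writes are mirrored: a block dirtied during the copy lands on both “disks”, so the destination is already consistent when the pass finishes and no CBT-style re-iteration is needed.

# Toy simulation: a single-pass copy with mirrored writes needs no second pass.
BLOCK = 4                              # toy block size in bytes
source = bytearray(b"AAAA" * 8)        # 8 blocks on the source "disk"
destination = bytearray(len(source))   # empty destination "disk"

def mirrored_write(block_nr, data):
    """A guest write during the migration: committed to BOTH disks."""
    start = block_nr * BLOCK
    source[start:start + BLOCK] = data
    destination[start:start + BLOCK] = data

for block_nr in range(len(source) // BLOCK):   # the datamover's single copy pass
    if block_nr == 5:                          # guest writes arrive mid-copy...
        mirrored_write(2, b"BBBB")             # ...to a block that was already copied
        mirrored_write(6, b"CCCC")             # ...and to a block still to be copied
    start = block_nr * BLOCK
    destination[start:start + BLOCK] = source[start:start + BLOCK]

print("destination identical after one pass:", destination == source)   # True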

Of course I tested it, as I wanted to make sure Mirror Mode was actually enabled when doing a Storage vMotion. I opened up the VM’s log files and this is what I dug up:

2011-06-03T07:10:13.934Z| vcpu-0| DISKLIB-LIB   : Opening mirror node /vmfs/devices/svm/ad746a-1100be4-svmmirror
2011-06-03T07:10:47.986Z| vcpu-0| HBACommon: First write on scsi0:0.fileName='/vmfs/volumes/4d884a16-0382fb1e-c6c0-0025b500020d/VM_01/VM_01.vmdk'
2011-06-03T07:10:47.986Z| vcpu-0| DISKLIB-DDB   : "longContentID" = "68f263d7f6fddfebc2a13fb60560e8e7" (was "dcbd5c17ac7e86a46681af33ef8049e5")
2011-06-03T07:10:48.060Z| vcpu-0| DISKLIB-CHAIN : DiskChainUpdateContentID: old=0xef8049e5, new=0x560e8e7 (68f263d7f6fddfebc2a13fb60560e8e7)
2011-06-03T07:11:29.773Z| Worker#0| Disk copy done for scsi1:0.
2011-06-03T07:15:16.218Z| Worker#0| Disk copy done for scsi0:0.
2011-06-03T07:15:16.218Z| Worker#0| SVMotionMirroredMode: Disk copy phase completed

Is that cool or what? One can only imagine what kind of new features can be introduced in the future using this new mirror mode driver. (FT enabled VMs across multiple physical datacenters and storage arrays anyone? Just guessing by the way…)

Storage vMotion performance difference?

Duncan Epping · Feb 24, 2011 ·

Last week I wrote about the different datamovers that are used when a Storage vMotion is initiated and the destination VMFS volume has a different blocksize than the source VMFS volume. Not only does it make a difference in terms of reclaiming zero space, but as mentioned it also makes a difference in performance. The question that always arises is: how much difference does it make? Well, this week there was a question on the VMTN community about a Storage vMotion from FC to FATA and its slow performance. Of course within a second FATA was blamed, but that wasn’t actually the cause of the problem. The FATA disks were formatted with a different blocksize, and that caused the legacy datamover to be used. I asked Paul, who started the thread, if he could check what the difference would be when equal blocksizes were used. Today Paul did his tests and blogged about it here, but I also copied the table with the details that show you what performance improvement the fs3dm brought (please note that VAAI is not used… this is purely a different datamover):

From                            To                              Duration in minutes
FC datastore, 1MB blocksize     FATA datastore, 4MB blocksize   08:01
FATA datastore, 4MB blocksize   FC datastore, 1MB blocksize     12:49
FC datastore, 4MB blocksize     FATA datastore, 4MB blocksize   02:36
FATA datastore, 4MB blocksize   FC datastore, 4MB blocksize     02:24

As I explained in my article about the datamover, the difference is caused by the fact that the data doesn’t travel all the way up the stack… and yes the difference is huge!
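To put a number on “huge”, a quick calculation with the durations from Paul’s table shows the fs3dm datamover finishing roughly three to five times faster than the legacy datamover:

# Speedup of fs3dm (equal blocksizes) versus the legacy datamover (mismatched
# blocksizes), using the durations from the table above.
def minutes(mm_ss):
    m, s = mm_ss.split(":")
    return int(m) + int(s) / 60

print("FC -> FATA:", round(minutes("08:01") / minutes("02:36"), 1), "x faster")   # ~3.1x
print("FATA -> FC:", round(minutes("12:49") / minutes("02:24"), 1), "x faster")   # ~5.3x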

Storage IO Control and Storage vMotion?

Duncan Epping · Jan 14, 2011 ·

I received a very good question this week to which I did not have the answer; I had a feeling, but that is not enough. The question was whether Storage vMotion would be “throttled” by Storage IO Control. As I happened to have a couple of meetings scheduled this week with the actual engineers, I asked the question and this was their answer:

Storage IO Control can throttle Storage vMotion when the latency threshold is exceeded. The reason for this is that Storage vMotion is “billed” to the virtual machine.

This basically means that if you initiate a Storage vMotion, the “process” belongs to the VM, and as such, if the host is throttled, the Storage vMotion process might be throttled as well by the local scheduler (SFQ), depending on the amount of shares originally allocated to this virtual machine. That is definitely something to keep in mind when doing a Storage vMotion of a large virtual machine, as it could potentially increase the amount of time it takes for the Storage vMotion to complete. Don’t get me wrong, that is not necessarily a negative thing, because at the same time it prevents that particular Storage vMotion from consuming all available bandwidth.
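As a back-of-the-envelope illustration (my own simplification, not how the SFQ scheduler is actually implemented), proportional-share throttling means the VM, and therefore the Storage vMotion billed to it, only gets the slice of datastore throughput its shares entitle it to:

# Rough proportional-share illustration: my own simplification, not the actual
# SFQ implementation. Storage vMotion traffic is billed against the same slice
# of throughput that the VM's shares entitle it to.
def throttled_share(datastore_throughput_mbps, vm_shares, total_active_shares):
    return datastore_throughput_mbps * vm_shares / total_active_shares

# A VM with default "Normal" shares (1000) among ten similar active VMs on a
# congested datastore pushing 400 MB/s:
print(throttled_share(400, 1000, 10000), "MB/s for the VM, Storage vMotion included")   # 40.0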

Storage Masking?

Duncan Epping · Feb 5, 2010 ·

I received a bunch of questions around storage masking over the last couple of weeks. One of them was about VMware’s best practice to mask LUNs on a per-cluster basis. This best practice has been around for years and basically exists to reduce conflicts. More hosts accessing the same LUNs means more overhead; just to give you an example, every 5 minutes a rescan of both HBAs takes place automatically to check for dead storage paths. You can imagine that there is a difference between 64 hosts accessing your storage and limiting it to, for instance, 16 hosts. Also think about the failure domain you are introducing: what if an APD condition occurs? That would not just impact one cluster… it could impact all of them.

For vSphere 5.1 read this revision…

The obvious next question is: won’t I lose a lot of flexibility? Well, in a way you do, as a simple VMotion to another cluster will not work anymore. But of course there is always a way to move a VM to a different cluster. In my designs I usually propose a so-called “Transfer Volume”. This volume (NFS or VMFS) can be used to transfer VMs to a different cluster. Yes, there is a slight operational overhead here, but it also reduces the amount of traffic per LUN and decreases the chance of SCSI reservation conflicts, etc.

Here’s the process (a small sketch in code follows the list):

  1. Storage VMotion the VM from LUN on Array 1 to Transfer LUN
  2. VMotion VM from Cluster A to Cluster B
  3. Storage VMotion the VM from Transfer LUN to LUN on Array 2

Of course these don’t necessarily need to be two separate arrays; it could just as easily be a single array with a group of LUNs masked to a particular cluster.

SVMotion and disk space

Duncan Epping · Nov 19, 2008 ·

I received this question a couple of times and there’s no real definitive answer written anywhere…

“Does storage vmotion require additional disk space on the source volume?”

The answer is: yes, it does. Storage VMotion uses snapshot technology to release the lock on the source disk, and this snapshot is placed on the source volume. In other words, all changes that take place during a Storage VMotion are written to the delta file, and this delta file can and will grow fast.
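As a rough rule of thumb (my own simplification, not an official formula), the free space you need on the source volume is at least the VM’s write rate multiplied by the time the Storage VMotion takes:

# Back-of-the-envelope estimate (my own simplification) of the snapshot delta
# that piles up on the SOURCE volume while the Storage VMotion runs.
def estimated_delta_gb(write_rate_mb_per_s, migration_minutes, overhead_factor=1.2):
    # overhead_factor is a hypothetical fudge factor for snapshot grain overhead
    return write_rate_mb_per_s * migration_minutes * 60 * overhead_factor / 1024

# A VM writing 5 MB/s during a 30-minute migration already needs roughly 10 GB free:
print(round(estimated_delta_gb(5, 30), 1), "GB")   # ~10.5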

So keep this in mind if you need to Storage VMotion a VM because the VMFS volume is running out of disk space… it might run out of disk space sooner than you think.


