Write-Same vs XCopy when using Storage vMotion

I had a question last week about Storage vMotion and when Write-same vs XCopy is used. I was confident I knew the answer, but I figured I would do some testing. So what exactly was the question, and what scenario did I test?

Imagine you have a virtual machine with a “lazy zero thick” disk and an “eager zero thick” disk. When initiating a Storage vMotion while preserving the disk format, would the pre-initialized blocks in the “eager zero thick” disk be copied through XCopy, or would “write-same” (aka zero out) be used?

So that is what I tested. I created a virtual machine with two disks, one “lazy zero thick” and about half filled, the other “eager zero thick”. I did a Storage vMotion to a different datastore (keeping the same format as the source) and checked esxtop while the migration was ongoing:

CLONE_WR = 21943
ZERO = 2

In other words, when the disk format is preserved, the hypervisor issues the “XCopy” command (CLONE_WR). The reason is that when doing an SvMotion and keeping the disk format the same, the copy command is issued to the array per chunk without the hypervisor reading the blocks first. As a result, the hypervisor doesn’t know that these blocks in the “eager zero thick” disk are zeroes and simply offloads the copy to the array.

Of course it would be interesting to see what happens if, during the migration, I specify that all disks need to become “eager zero thick”; remember, one of the disks was “lazy zero thick”:

CLONE_WR = 21928
ZERO = 35247

It is clear that in this case the blocks are zeroed out (ZERO). As there is a range of blocks that isn’t used by the virtual machine yet, the hypervisor ensures these blocks are zeroed so they can be used immediately when the virtual machine needs them… which is exactly what the admin requested: “eager zero thick”, aka pre-zeroed.

For those who want to play around with this, check esxtop and then the VAAI stats. I described how to do that in this article.
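If you want to reproduce the test itself, below is a minimal PowerCLI sketch. All names are made up, and I’m assuming the -DiskStorageFormat parameter of Move-VM is available in your PowerCLI version, so double-check before using it:

# Hedged sketch: reproduce both tests with PowerCLI (hypothetical names).
Connect-VIServer -Server "vcenter.lab.local"

$vm     = Get-VM -Name "TestVM"
$target = Get-Datastore -Name "Datastore-02"

# Test 1: preserve the current disk formats; expect CLONE_WR (XCopy) in the VAAI stats.
Move-VM -VM $vm -Datastore $target -Confirm:$false

# Test 2: convert all disks to eager zero thick during the migration;
# expect ZERO (write-same) for the blocks the VM hasn't written to yet.
Move-VM -VM $vm -Datastore (Get-Datastore -Name "Datastore-01") `
    -DiskStorageFormat EagerZeroedThick -Confirm:$false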

Storage vMotion does not rename files?

A while back I posted that 5.0 U2 re-introduced the renaming behavior for VM file names. I was just informed by our excellent Support Team that unfortunately the release notes missed something crucial: Storage vMotion does not rename files by default. In order to get the renaming behavior you will have to set an advanced setting within vCenter. This is how you do it:

  • Go to “Administration”
  • Click on “vCenter Server Settings”
  • Click “Advanced Settings”
  • Add the key “provisioning.relocate.enableRename” with value “true” and click “add”
  • Restart vCenter service or vCenter Server

Now the renaming of the files during the SvMotion process should work again!
All of you who need this functionality, please make sure to add this advanced setting.
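If you prefer PowerCLI over the client, something along these lines should do the trick — a minimal sketch, assuming New-AdvancedSetting accepts the vCenter Server connection as its entity in your PowerCLI version (names are made up):

# Hedged sketch: add the vCenter advanced setting through PowerCLI.
$vc = Connect-VIServer -Server "vcenter.lab.local"

New-AdvancedSetting -Entity $vc -Name "provisioning.relocate.enableRename" -Value "true" -Confirm:$false

# Verify the key is there.
Get-AdvancedSetting -Entity $vc -Name "provisioning.relocate.enableRename"

# As noted above, a restart of the vCenter service is still required afterwards.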


Renaming virtual machine files using SvMotion back in 5.0 U2

Together with Frank Denneman I have been pushing for this heavily internally, and it pleases me to say that it is finally back… you can rename your virtual machine files again using Storage vMotion as of 5.0 U2.

vSphere 5 Storage vMotion is unable to rename virtual machine files on completing migration
In vCenter Server, when you rename a virtual machine in the vSphere Client, the vmdk disks are not renamed following a successful Storage vMotion task. When you perform a Storage vMotion of the virtual machine to have its folder and associated files renamed to match the new name, the virtual machine folder name changes, but the virtual machine file names do not change.

This issue is resolved in this release.

src: https://www.vmware.com/support/vsphere5/doc/vsp_vc50_u2_rel_notes.html#resolvedissues

For those who want to know what else is fixed, the full release notes of both ESXi 5.0 U2 and vCenter Server 5.0 U2 can be found here:

** do note that this fix is not part of 5.1 yet **
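For completeness, the workflow itself in PowerCLI would look roughly like this — a hedged sketch with hypothetical VM and datastore names, nothing more:

# Rename the VM first; at this point the files on the datastore keep their old names.
Set-VM -VM (Get-VM -Name "OldName") -Name "NewName" -Confirm:$false

# Then Storage vMotion the VM; on 5.0 U2 the folder and files
# should be renamed to match "NewName" as part of the migration.
Move-VM -VM (Get-VM -Name "NewName") -Datastore (Get-Datastore -Name "Datastore-02") -Confirm:$false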

Scripts release for Storage vMotion / HA problem

Last week, when the Storage vMotion / HA problem went public, I asked both William Lam and Alan Renouf if they could write a script to detect the problem. I want to thank both of them for their quick response and turnaround; they cranked the scripts out in literally hours. The scripts were validated multiple times in a VDS environment and worked flawlessly. Note that these scripts can detect the problem in environments using either a regular Distributed vSwitch or a Nexus 1000v; they can only mitigate the problem in a regular Distributed vSwitch environment, though. Here are the links to the scripts:

Once again thanks guys!

Clarifying the SvMotion / VDS problem

<Update>I asked William Lam if he could write a script to detect this problem and possibly even mitigate it. William worked on it over the weekend and just posted the result! Head over to his blog for the script! Thanks, William, for cranking it out this quickly! For those who prefer PowerCLI… Alan Renouf just posted his version of the script! Both scripts provide the same functionality though!</Update>

I think there is some confusion around the SvMotion / VDS problem I described a couple of days back. Let me try to clarify it in a couple of simple steps.

First of all, this only applies to virtual machines that have been Storage vMotioned by vCenter 5.0 and are connected to a Distributed vSwitch. This could have been done either manually or by Storage DRS. So what is the exact problem?

  • When a VM is attached to a dvPortgroup it is connected to a port. This information is stored locally on the host and on the VMFS volume the VM is stored on.
  • This volume will contain a file which is named after the port number of this VM.
  • When the VM is Storage vMotioned to a different datastore, this file is not created on the destination datastore.
  • When the host on which the Storage vMotioned VM resides fails, HA will attempt to restart that VM.
  • In order for HA to restart the VM and connect it to the dvPortgroup, this file is required.
  • As the file is not available, the restart fails.

You can simply resolve this by temporarily connecting the impacted VMs to a different dvPortgroup and then reconnecting them to the original portgroup. As soon as you’ve done that, the file will be created on the datastore. For now this is a manual task, but I am sure some of my team members are working on a scripted solution as we speak… right Alan / William? :)
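The scripts mentioned in the update above do this properly; purely as an illustration of the manual workaround, a rough PowerCLI sketch could look like this (hypothetical portgroup names, and I’m assuming a PowerCLI build that includes the distributed switch cmdlets such as Get-VDPortgroup):

# Rough sketch of the manual workaround: move the impacted VM's NICs to a
# temporary dvPortgroup and back, which recreates the dvPort file on the datastore.
$vm       = Get-VM -Name "ImpactedVM"
$temp     = Get-VDPortgroup -Name "Temp-PG"
$original = Get-VDPortgroup -Name "Production-PG"

# Move to the temporary portgroup...
Get-NetworkAdapter -VM $vm | Set-NetworkAdapter -Portgroup $temp -Confirm:$false

# ...and back to the original one.
Get-NetworkAdapter -VM $vm | Set-NetworkAdapter -Portgroup $original -Confirm:$false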

vSphere 5.0: Storage vMotion and the Mirror Driver

**disclaimer: this article is an out-take of our book: vSphere 5 Clustering Technical Deepdive**

There’s a cool and exciting new feature as part of Storage vMotion in vSphere 5.0. This new feature is called Mirror Mode and it enables faster and highly efficient Storage vMotion processes. But what is it exactly, and what does it replace?

Prior to vSphere 5.0 we used a mechanism called Changed Block Tracking (CBT) to ensure that blocks which had already been copied to the destination, but were changed afterwards, were marked as changed and copied again in the next iteration. Although CBT was efficient compared to legacy mechanisms (snapshots), the Storage vMotion engineers came up with an even more elegant and efficient solution which is called Mirror Mode. Mirror Mode does exactly what you would expect it to do; it mirrors the I/O. In other words, when a virtual machine that is being Storage vMotioned writes to disk, the write will be committed to both the source and the destination disk. The write will only be acknowledged to the virtual machine when both the source and the destination have acknowledged the write. Because of this, it is unnecessary to do re-iterative copies and the Storage vMotion process will complete faster than ever before.
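To make that write path concrete, here is a tiny conceptual sketch in plain PowerShell. This is purely an illustration of “acknowledge only after both sides have committed the write” and has nothing to do with the actual ESXi implementation:

# Conceptual illustration only: a mirrored write completes ("is acknowledged")
# only after both the source and the destination copy have committed it.
function Write-Mirrored {
    param(
        [System.IO.FileStream] $Source,
        [System.IO.FileStream] $Destination,
        [long]   $Offset,
        [byte[]] $Data
    )

    foreach ($disk in @($Source, $Destination)) {
        [void] $disk.Seek($Offset, [System.IO.SeekOrigin]::Begin)
        $disk.Write($Data, 0, $Data.Length)
        $disk.Flush()   # wait until this copy has committed the write
    }

    # Only at this point would the write be acknowledged to the guest.
}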

The questions remain: How does this work? Where does Mirror Mode reside? Is this something that happens inside or outside of the guest? A diagram will make this more obvious.

By leveraging DISKLIB, the Mirror Driver can be enabled for the virtual machine that needs to be Storage vMotioned. Before this driver can be enabled, the virtual machine will need to be stunned, and of course unstunned after it has been enabled. The new driver leverages the datamover to do a single-pass block copy of the source disk to the destination disk. Additionally, the Mirror Driver will mirror writes between the two disks. Not only has efficiency increased, but so has migration time predictability, making it easier to plan migrations. I’ve seen data where the “down time” associated with the final copy pass was virtually eliminated (from 13 seconds down to 0.22 seconds) in the case of rapidly changing disks, and where the migration time went from 2900 seconds down to 1900 seconds. Check this great paper by Ali Mashtizadeh for more details.

The Storage vMotion process is fairly straight forward and not as complex as one might expect.

  1. The virtual machine working directory is copied by VPXA to the destination datastore.
  2. A “shadow” virtual machine is started on the destination datastore using the copied files. The “shadow” virtual machine idles, waiting for the copying of the virtual machine disk file(s) to complete.
  3. Storage vMotion enables the Storage vMotion Mirror driver to mirror writes of already copied blocks to the destination.
  4. In a single pass, a copy of the virtual machine disk file(s) is completed to the target datastore while mirroring I/O.
  5. Storage vMotion invokes a Fast Suspend and Resume of the virtual machine (similar to vMotion) to transfer the running virtual machine over to the idling shadow virtual machine.
  6. After the Fast Suspend and Resume completes, the old home directory and VM disk files are deleted from the source datastore.
    1. It should be noted that the shadow virtual machine is only created in the case that the virtual machine home directory is moved. If and when it is a “disks-only” Storage vMotion, the virtual machine will simply be stunned and unstunned.

Of course I tested it, as I wanted to make sure Mirror Mode was actually enabled when doing a Storage vMotion. I opened up the VM’s log file and this is what I dug up:

2011-06-03T07:10:13.934Z| vcpu-0| DISKLIB-LIB   : Opening mirror node /vmfs/devices/svm/ad746a-1100be4-svmmirror
2011-06-03T07:10:47.986Z| vcpu-0| HBACommon: First write on scsi0:0.fileName='/vmfs/volumes/4d884a16-0382fb1e-c6c0-0025b500020d/VM_01/VM_01.vmdk'
2011-06-03T07:10:47.986Z| vcpu-0| DISKLIB-DDB   : "longContentID" = "68f263d7f6fddfebc2a13fb60560e8e7" (was "dcbd5c17ac7e86a46681af33ef8049e5")
2011-06-03T07:10:48.060Z| vcpu-0| DISKLIB-CHAIN : DiskChainUpdateContentID: old=0xef8049e5, new=0x560e8e7 (68f263d7f6fddfebc2a13fb60560e8e7)
2011-06-03T07:11:29.773Z| Worker#0| Disk copy done for scsi1:0.
2011-06-03T07:15:16.218Z| Worker#0| Disk copy done for scsi0:0.
2011-06-03T07:15:16.218Z| Worker#0| SVMotionMirroredMode: Disk copy phase completed

Is that cool or what? One can only imagine what kind of new features can be introduced in the future using this new mirror mode driver. (FT-enabled VMs across multiple physical datacenters and storage arrays, anyone? Just guessing by the way…)

Storage vMotion performance difference?

Last week I wrote about the different datamovers being used when a Storage vMotion is initiated and the destination VMFS volume has a different blocksize than the source VMFS volume. Not only does it make a difference in terms of reclaiming zero space, but as mentioned it also makes a difference in performance. The question that always arises is: how much difference does it make? Well, this week there was a question on the VMTN community regarding an SvMotion from FC to FATA and its slow performance. Of course within a second FATA was blamed, but that wasn’t actually the cause of the problem. The FATA disks were formatted with a different blocksize, and that caused the legacy datamover to be used. I asked Paul, who started the thread, if he could check what the difference would be when equal blocksizes were used. Today Paul did his tests and blogged about it here, but I copied the table which contains the details that show you what performance improvement the fs3dm datamover brought (please note that VAAI is not used… this is purely a different datamover):

From                             To                               Duration in minutes
FC datastore (1 MB blocksize)    FATA datastore (4 MB blocksize)  08:01
FATA datastore (4 MB blocksize)  FC datastore (1 MB blocksize)    12:49
FC datastore (4 MB blocksize)    FATA datastore (4 MB blocksize)  02:36
FATA datastore (4 MB blocksize)  FC datastore (4 MB blocksize)    02:24

As I explained in my article about the datamover, the difference is caused by the fact that the data doesn’t travel all the way up the stack… and yes the difference is huge!
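If you want to check up front which datamover you will end up with, a quick way is to compare the VMFS block sizes of the source and destination datastores. A hedged PowerCLI sketch (it reads the BlockSizeMb property the vSphere API exposes for VMFS volumes; output formatting is my own):

# List every VMFS datastore with its block size; matching block sizes between
# source and destination mean the faster fs3dm datamover can be used.
Get-Datastore |
    Where-Object { $_.Type -eq "VMFS" } |
    Select-Object Name, @{ Name = "BlockSizeMB"; Expression = { $_.ExtensionData.Info.Vmfs.BlockSizeMb } }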