Storage Migrations?

On an internal mailing list we had a very useful discussion about storage migrations when a SAN is replaced or a migration needs to take place to a different set of disks. Many customers face this at some point, and the question usually is: what is the best approach, SAN replication or Storage vMotion? Both have their pros and cons.

SAN Replication:

  • Can utilize Array based copy mechanisms for fast replication (+)
  • Per LUN migration, high level of concurrency (+)
  • Old volumes still available (+)
  • Need to resignature or mount the volume again (-)
    • A resignature also means you will need to reregister the VM! (-)
  • Downtime for the VM during the cutover (-)

Storage vMotion:

  • No downtime for your VMs (+)
  • Fast Storage vMotion when your Array supports VAAI (+)
    • If your array doesn’t support VAAI, migrations can be slow (-)
    • Induced cost if VAAI isn’t supported (-)
    • Only intra-array, not across arrays (-)
  • No resignaturing or re-registering needed (+)
  • Per VM migration (-)
    • Limited concurrency (2 per host, 8 per vmfs volume) (-)
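Those concurrency limits are what make a bulk migration a scheduling exercise. A minimal sketch of that throttling logic (plain Python; the limits mirror the numbers above, but the function and names are illustrative, not the vSphere API):

```python
from collections import defaultdict

MAX_PER_HOST = 2       # concurrent Storage vMotions per host (per the limits above)
MAX_PER_DATASTORE = 8  # concurrent operations per VMFS volume

def schedule(migrations, host_busy=None, ds_busy=None):
    """Pick the subset of pending (vm, host, datastore) migrations that
    can start now without exceeding the per-host/per-datastore limits."""
    host_busy = defaultdict(int, host_busy or {})
    ds_busy = defaultdict(int, ds_busy or {})
    started = []
    for vm, host, ds in migrations:
        if host_busy[host] < MAX_PER_HOST and ds_busy[ds] < MAX_PER_DATASTORE:
            host_busy[host] += 1
            ds_busy[ds] += 1
            started.append(vm)
    return started
```

With three VMs queued on the same host and datastore, only the first two start; the third has to wait for a slot to free up. Scripted migrations (several commenters below describe exactly this) end up looping this kind of check until the queue drains.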

As you can see, both have their pros and cons, and it boils down to the following questions:

How much down time can you afford?
How much time do you have for the migration?

    Comments

    1. Duco Jaspars says

      VAAI is not supported across arrays AFAIK, so when you talk about replacing SANs it should not be in the list of arguments, agreed?

    2. says

      For us, having to reregister the VMs would kill that option. We use Veeam for VM backups, and if the machines get new IDs Veeam would see them as new machines and lose the old backups and dedupe.

    3. says

      Ben points out a big thing: reregistering VMs may create backup/restore issues. I think Storage vMotion is the ideal option when given a large allotted amount of time, and replication should only be used as a last-ditch effort if other options aren’t available. Of course, different engineers with different environments might warrant different approaches.

    4. says

      I’m in the process of migrating 4TB from MSA1500 to EVA4100, and another project alongside to migrate 15TB from EVA4100 to EVA4100.

      Storage vMotion over the EVA links, which have multiple paths, is reasonably fast, but the single link to the MSA has resulted in 5 days of SvMotion time.

      On a busy school network 100% uptime is critical… hence our reasons for SvMotion.

      The fastest hardware storage replication product I found was from Vicom – http://www.vicom.com/products/products.htm

    5. says

      Probably the Holy Grail for storage migrations will be the VAAI-Accelerated Storage VMotion, which in my opinion, is the best of both worlds.
      SAN replication is great to achieve Continuous Data Protection but it’s kind of “expensive” in terms of downtime and disk space, in fact, I more often find myself wanting to migrate only part of a datastore rather than everything inside it.

      Just my two cents,
      Fabio

    6. says

      Ongoing V2Vs (migration of an existing VM from one infrastructure to another) can quickly become the bane of a VI admin’s existence. Reasons for the V2V can be many:

      1. ESX host infrastructure implemented 3-4 years ago faces a tech refresh in the form of new clusters being built with new hardware. All the VMs need to be migrated from the old cluster to the new cluster, which may or may not share the same storage allocation. Utilization of EVC may not be available for technical, political, or design reasons.

      2. Tech refresh, re-tiering, or restructuring of back-end storage will force a storage migration of each VM. Lack of front-end storage virtualization (i.e. Hitachi USP V) places ownership of the migration pieces on the VI admin.

      3. Restructuring, repurposing of existing VI will cause a displacement of VMs = more V2Vs.

      Migrations utilizing storage offload won’t always be available. Storage vMotion won’t always be available either – unfortunately, not all organizations trust hot Storage vMotion yet. It isn’t 100% reliable and there is no 100% guarantee which can be made to the business to perform mid-day Storage vMotion. Recovery from a failed Storage vMotion can be ugly. The same could be said for vMotion, but I view vMotion as more mature and reliable than Storage vMotion.

      Lack of these two technologies leaves cold migrations of a powered-down VM during a maintenance window, which will typically be during the middle of the night. This is a mind-numbing task for a VI admin when dealing with large quantities of VMs. Consider the evacuation of one cluster to be several hundred or maybe even thousands of VMs. Now repeat this process over and over for many clusters throughout the year in a large environment – this activity is not sustainable.

      We need more tools, we need 100% reliable tools, and the business lines need to trust the technology if we are going to have the mobility we really need for virtual machines. Mobility has got to be transparent. Of course, if one is struggling with their VI implementation, one cannot ignore the design either.

    7. says

      Excellent summary. Used both methods on a project, the experience with re-registering of VMs meant I would take Storage vMotion anytime again in future despite the slower migration times.

      One thing you might want to add to the article is how to deal with moving VMs with RDMs to a new storage array.

    8. says

      Jason,
      I agree with your points, but Storage VMotion has matured a lot since the 3.5 days and I think that the VAAI acceleration will be the key, letting the storage do its thing is the way to go IMO, when VMware will develop a cross-storage VAAI accelerated SVM we will probably have a 100% bulletproof solution for VM storage movement.

    9. says

      At my shop, we just completed an array migration and went with the storage vMotion method for the 500 or so VMs that needed to be moved. We were going from a 3PAR S400 to a 3PAR T800 where replication works beautifully, but in the end it came down to a trade off between migration time and uptime. Uptime won, storage vMotion worked great, no downtime.
      In order to avoid hundreds of hours of pointing and clicking, it was automated via a script that used the VMware Perl API and three of William Lam’s excellent scripts – getVMsPerDatastore.pl, whichClusterIsMyVMIn.pl, and getHostViaGuest.pl (http://engineering.ucsb.edu/~duonglt/vmware). We used the ‘something is going on anyway’ opportunity to include a bit of cluster reorganization in there too, using the transfer volume trick (http://www.yellow-bricks.com/2010/02/05/storage-masking/), and it all worked out well.

    10. says

      I would also opt for the Storage vMotion Approach – If you have the option of no downtime – why would you not use it?

    11. says

      If you’re in a larger environment and want to forklift everything quickly SAN Replication seems like a good play. If you don’t have the luxury of having SAN replication (or it’s too expensive) then storage vMotion is a good option if you have the connectivity and you’re not being asked to work at ridiculous speeds.

      I led a project relocating a datacenter where SAN replication (and therefore SRM) was not an option, and Storage vMotion was not an option due to distance (800 miles), so we ended up using a 3rd option.

      We used Vizioncore’s vReplicator product over the WAN. While this forced us to move smaller groups of VMs at a time, this was actually a blessing because we had to re-IP the applications during the move. The re-IP was the biggest challenge of the datacenter move and it was good that we only had to focus on a few application groups at a time.

      The vReplicator solution could also be used in a LAN capacity, but I’m not sure why anyone would choose this over Storage vMotion when working within the same datacenter.

    12. Nathan says

      Having recently completed a SAN migration project I am familiar with this issue.

      The hardware vendor wanted to replicate at the SAN level and simply switch across to the new SAN; when I posed the question regarding LUN signatures and downtime, the blank look on his face spoke volumes!

      I chose to Storage vMotion over a couple of weeks instead, using a few hosts as the migration platform and swinging the other hosts across in a controlled fashion.

      All worked well: we had no downtime and kept the business happy, that is until the vendor decided to reboot the array!

      Sometimes you just can’t get good help…

    13. Ole Andre Schistad says

      This is a very interesting topic, and I think you’ll often find that people tend to take the hammers and nails approach – a VMware admin will often prefer to use storage vmotion because that is their comfort zone, whereas a storage admin will probably never even consider anything but SAN mirroring.

      To be honest, I find that this is an area where the VMware LVM introduces a lot of headache. I totally get why VMFS is so paranoid about volume signatures – sometimes that is the only defence against total data corruption – but in the specific case of a SAN migration it works against the user.

      True; the new force mount in ESX 4 helps a lot but you are still left with a problem waiting to happen sometime in the future. What I mean by this is that, since the information about the forced mount lives in the esx configuration, a newly added or reinstalled server will refuse to mount these volumes until an admin explicitly force mounts them again. So while it can be a good way of completing a SAN migration with minimal downtime, I would not recommend anyone to just forcemount the new LUNs on their clusters and then just leave it be – at some point they should either bring down all the affected VMs and resignature the volume, or provision new LUNs and storage vmotion the VMs over in a controlled fashion.

      Maybe we need a third option here? In addition to forced mount of an inconsistently labeled VMFS, or a resignature which generates a new UUID, maybe we also need the option to update the metadata on the volume to its now correct values – typically the storage system ID – without changing the actual volume ID? This would be a double-warning, “yes I know what I am doing” case but imho it would solve many issues if used correctly.

      Like several other posters I have also been in the situation where someone else planned a storage replacement, and failed to consider the effect of VMFS volume signatures and different storage system IDs on the replica. My initial response was to strongly recommend using storage vmotion in that case, but the customer insisted that this would take too long and that the move had to be completed in one operation.

      So my next recommendation was that they immediately go ahead with their long-overdue upgrade to vSphere. This, of course, was ignored as well. So a couple of days before the migration I had to come in and do a proof of concept of how to enable snapshot LUN mounting without resignaturing on an ESX 3.5 server, with the strict instruction to never never never present a mirror LUN to an ESX without first triple-checking that said ESX was configured with EnableResignature=0. Which was also forgotten, so half-way through the move they powered on an ESX which promptly proceeded to resignature every volume, thus disconnecting everything.

      I had foreseen this circumstance and explained how to do a “for vm in `find /vmfs/volumes/ -name \*.vmx`; do vmware-cmd -s register $vm;done” so total disaster was avoided, but still – the whole project became a big mess.
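That shell one-liner can also be sketched in Python for readability; this version just walks the volumes and builds the re-register commands instead of executing them (the `vmware-cmd -s register` invocation is as in the comment above; the helper itself and its name are illustrative):

```python
import os

def reregister_commands(root="/vmfs/volumes"):
    """Find every .vmx file under root and build the corresponding
    vmware-cmd re-register call, mirroring the shell loop above."""
    cmds = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.endswith(".vmx"):
                cmds.append("vmware-cmd -s register %s" % os.path.join(dirpath, name))
    return cmds
```

Printing the commands first and running them in a second pass is a small safety net the one-liner doesn’t give you when half a cluster has just been disconnected.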

    14. AndyMcM says

      Going to have to say Storage vMotion is the way forward, and use scripting to automate it. A couple of things that nobody has mentioned:

      1) What if you are going between two different vendors? NetApp to EMC, or EqualLogic to HP? You are not going to be able to use replication unless there is a management layer between the storage and the hosts.

      2) VAAI would be great to use but, following on from the above point, will VAAI work between different vendors? Also, if you are migrating you are most likely going from old hardware to new hardware, so VAAI might not be supported on the source SAN.

    15. says

      Hi Andy. As for using disparate storage vendors, this could be handled by the replication engine in many cases. Many storage vendors offer replication engines that support a finite list of SANs from other vendors. The question here, I think, is: does my SAN replication engine support both SAN A and SAN B? In addition, you’ve got products like the EMC VPLEX and IBM SVC that can sit in front of multiple storage platforms. In fact, that’s another option….

      Use a VPLEX or IBM SVC to move the storage if you have one of these.

      As for VAAI I am not sure this can be used across different SANs, but only within one SAN.

    16. Duco Jaspars says

      Thinking about this a little more, I would say it should technically be possible to do VAAI across arrays. It would be a nice selling point for a storage vendor, and a lock-in to stay with a specific vendor when migrating/upgrading, since multi-vendor, multi-array VAAI sounds too good to be true…

      Unless the switch fabric gets more intelligent of course: VAAI in the fabric…

    17. says

      I’ve just posted a script I put together to hot-SVMotion around 100 VMs for a datacentre migration. We did not have a single issue at either end (a total of over 200 SVMotions, and around 3TB of VMs) – http://virtualiseeverything.blogspot.com/2010/07/multi-vm-storage-migration-powershell.html

      I’m all for array replication, but SVMotion has been proven to work for us, and as a number of commenters have already said: uptime always wins over migration time.

    18. says

      Disclosure: EMCer here…

      One thing to poke at a bit – we are exploring HOW we could use VAAI in conjunction with the open SAN replication technologies we have (Open Replicator, SAN Copy). It **MIGHT** be possible. Stay tuned.

      But to be clear, for now – VAAI is “one array only”. Also working on things to help automated vmotion at large scale. Stay tuned.