BC-DR

EMC adds automated failback to SRM

Duncan Epping · Feb 23, 2009 ·

EMC just announced that they are adding automated failbacks to SRM for their Celerra family via a vCenter plugin. I hope I can see a demo here at VMworld:

VMware Site Recovery Manager Automated Failback via a VMware vCenter plug-in helps Celerra customers coordinate a “failback” to the original virtual infrastructure, including all the process steps once VMware Site Recovery Manager performs a failover. EMC offers the only solution on the market today that arms customers with end-to-end disaster recovery at the simple push of a button.

SRM and rescanning your storage twice

Duncan Epping · Feb 19, 2009 ·

I just got off the phone with a former colleague. He was implementing SRM at a customer site and couldn’t get it working correctly because the VMFS volumes weren’t discovered at the recovery site. As most of you know, sometimes you need to rescan your HBAs twice before the LUNs and/or VMFS volumes are discovered. When using SRM, the rescan only occurs once by default. Fortunately, this is a setting that can be changed in the vmware-dr.xml file:

To enable the additional rescan, edit the vmware-dr.xml file at both the protected and recovery sites to add a <hostRescanRepeatCnt> element within the <SanProvider> element. Set the value of <hostRescanRepeatCnt> to 2, as shown in the following example:
<SanProvider>
.
.
.
<hostRescanRepeatCnt>2</hostRescanRepeatCnt>
</SanProvider>

If you are doing SRM implementations, it might be useful to write this one down… especially when combined with HP EVAs.
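
If you have to make this change on several SRM servers, the edit could be scripted. Below is a minimal Python sketch of the idea, not an official VMware tool. The function name set_rescan_count and the <Config> root element in the sample are assumptions for illustration; the only names taken from the quoted instructions are <SanProvider> and <hostRescanRepeatCnt>. It also assumes the file uses no XML namespaces, and note that ElementTree drops XML comments when it parses, so back up the real file before writing anything back.

```python
import xml.etree.ElementTree as ET


def set_rescan_count(xml_text: str, count: int = 2) -> str:
    """Return xml_text with <hostRescanRepeatCnt> set inside <SanProvider>."""
    root = ET.fromstring(xml_text)

    # <SanProvider> may be the root of a fragment or nested deeper in the file.
    provider = root if root.tag == "SanProvider" else root.find(".//SanProvider")
    if provider is None:
        raise ValueError("no <SanProvider> element found")

    # Reuse the element if it already exists, otherwise append it.
    node = provider.find("hostRescanRepeatCnt")
    if node is None:
        node = ET.SubElement(provider, "hostRescanRepeatCnt")
    node.text = str(count)

    return ET.tostring(root, encoding="unicode")


# Hypothetical, heavily trimmed stand-in for vmware-dr.xml.
sample = "<Config><SanProvider><other>1</other></SanProvider></Config>"
print(set_rescan_count(sample))
```

Running the function a second time updates the existing element instead of adding a duplicate, so it should be safe to apply at both the protected and the recovery site.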

VMware HA or VMware SRM, what should I use?

Duncan Epping · Feb 12, 2009 ·

I was just reading up on VMTN and noticed this great topic. For some reason, a lot of people don’t see the difference between HA and SRM. I suggest reading the full topic, especially the replies from Jay Judkowitz and Smoggy; both are subject matter experts on SRM and explained to the topic starter what the differences are and when to use which. Here’s an excerpt from the discussion that captures the essence of the answer, in my opinion:

With SRM, you get a much more well defined failover.
- The VMs start in a specified order
- You can set some VMs to be started serially with others starting in parallel
- You can designate VMs at the recovery site to suspend to make room for recovery VMs
- You can have callout scripts and predefined breakpoints to make sure that critical non-VMware activity is done at the right time and place
- You can set the resource pool at the remote site (with the same size or different as the source resource pool) so that you get a predictable and defined QOS on CPU and memory
Once you have that well defined failover plan, you can test it and audit the results
- Testing will automatically snap the recovery LUNs so you can power on the recovery VMs without interrupting replication
- You can specify a test network at the second site that SRM will automatically put the recovery VMs on during a test so that they do not interfere with the running VMs
- You can therefore do non-disruptive DR testing any time without warning. The recovery plan executes the same as for failover, but in a “test bubble” where storage and network IO are safely segregated away from production work.
- There is a test results page for the recovery plan which lists all test runs, how long they took and how successful they were. From this page, you can drill down to each test run and see exactly what steps succeeded and failed and how long they took to run.
- With the history page, you can grade your organization over time. With the detailed reports, you can troubleshoot specific runs.

If you’re looking into Business Continuity / Disaster Recovery and you’ve got questions on the what/where/when/how of SRM, I suggest you visit the VMTN forums… these guys really know what they are talking about and can really help you understand what BC/DR is about.

New vCenter add-on announced: vCenter Heartbeat

Duncan Epping · Feb 9, 2009 ·

VMware just announced a new add-on for vCenter:

Source
As customers expand their use of VMware, maintaining a highly available management infrastructure is quickly becoming a key requirement. Learn about VMware’s new availability solution for vCenter Server and how it expands infrastructure and services sales opportunities.

The title of the webex is “Introducing VMware vCenter Server Heartbeat”

In other words, there’s an add-on coming up that provides high availability for your vCenter Server. We will have to wait until tomorrow before more details are announced!

I also just noticed that there will be a session at VMworld Europe on vCenter Server Heartbeat:

DC10 – Choosing a Solution for vCenter Server Availability (by David Friedlander, VMware Product Manager)

SRM Failback?

Duncan Epping · Feb 4, 2009 ·

I get this question a lot: Does SRM have failback capabilities? The answer is short but not simple: yes, it does. Keep in mind that there’s no big red button labeled “Failback”, which is the “not simple” part of the answer. Luckily for us, the VMware Uptime Blog Team wrote an extensive article on how to do a failback with the current version of Site Recovery Manager. In short, this is what one needs to do to fail back:

Reverse the replication direction in the storage layer to be from Site B to Site A
Clean up the shadow virtual machines and protection groups on Site A
Clean up the Recovery Plans configured on Site B
Configure the protection group(s) on Site B
Configure the Recovery Plans on Site A
Test recovery from Site B to Site A
Perform the recovery from Site B to Site A
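
Because the order of these steps matters, it can help to track them as a checklist that refuses to skip ahead. The following is only a sketch of such a runbook tracker in Python; it does not talk to SRM or to any storage array, and the class name FailbackChecklist and its methods are made up for illustration.

```python
from dataclasses import dataclass

# The manual failback steps, in the order listed above.
FAILBACK_STEPS = [
    "Reverse the replication direction in the storage layer (Site B to Site A)",
    "Clean up the shadow virtual machines and protection groups on Site A",
    "Clean up the Recovery Plans configured on Site B",
    "Configure the protection group(s) on Site B",
    "Configure the Recovery Plans on Site A",
    "Test recovery from Site B to Site A",
    "Perform the recovery from Site B to Site A",
]


@dataclass
class FailbackChecklist:
    done: int = 0  # number of steps completed so far

    def next_step(self):
        """Return the description of the next pending step, or None when finished."""
        if self.done < len(FAILBACK_STEPS):
            return FAILBACK_STEPS[self.done]
        return None

    def complete(self, step_number: int) -> None:
        """Mark a 1-based step as done; reject steps taken out of order."""
        expected = self.done + 1
        if expected > len(FAILBACK_STEPS):
            raise ValueError("all steps are already complete")
        if step_number != expected:
            raise ValueError(f"step {step_number} is out of order; next is step {expected}")
        self.done = step_number


checklist = FailbackChecklist()
checklist.complete(1)
print(checklist.next_step())
```

Trying to call complete(6) at this point raises a ValueError, which mirrors the point that you should not test or run the recovery before replication is reversed and the protection groups and recovery plans are rebuilt.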

Read the complete article on the Uptime Blog for all the details, and show the article to your manager. It includes a table with the estimated amount of time a failback would normally take, manual vs. SRM.