This week Site Recovery Manager 6.1 was announced. There are many enhancements in SRM 6.1, like the integration with NSX and policy-driven protection, but personally I feel that support for stretched storage is huge. When I say stretched storage I am referring to solutions like EMC VPLEX, Hitachi Virtual Storage Platform and IBM SAN Volume Controller (etc.). In the past, and you still can today, when you had these solutions deployed you would have a single vCenter Server with a single cluster, and you would move VMs around manually when needed or let HA take care of restarts in failure scenarios.
As of SRM 6.1, running these types of stretched configurations is now also supported. So how does that work, what does it allow you to do, and what does it look like? Well, in contrast to a vSphere Metro Storage Cluster solution, with SRM 6.1 you will be using two vCenter Server instances. Each of these vCenter Server instances will have an SRM server attached to it, which will use a Storage Replication Adapter (SRA) to communicate with the array.
But why would you want this? Why not just stretch the compute cluster as well? Many have deployed these stretched configurations for disaster avoidance purposes. The problem, however, is that there is no form of orchestration whatsoever. This means that after a failure all workloads will typically come up in a random order. In some cases the application knows how to recover from situations like that, in most cases it does not, leaving you with a lot of work: after a failure you will need to restart services, or VMs, in the right order. This is where SRM comes in. This is the strength of SRM: orchestration.
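To make the ordering problem a bit more concrete, here is a minimal sketch in Python of what "restart in the right order" means. This is not how SRM implements recovery plans and it does not call any SRM or vSphere API. The VM names and the dependency map are hypothetical; the only thing it shows is how a set of dependencies turns into restart waves.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical VMs and dependencies (assumption, not from SRM):
# each key maps to the set of VMs that must be running before it starts.
DEPENDS_ON = {
    "db01": set(),
    "app01": {"db01"},
    "app02": {"db01"},
    "web01": {"app01", "app02"},
}


def restart_waves(depends_on):
    """Group VMs into waves; a VM only depends on VMs in earlier waves."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        # Everything returned here has all of its dependencies satisfied.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(restart_waves(DEPENDS_ON), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

With a plain HA restart nothing guarantees that db01 is up before web01. An orchestrated recovery walks through waves like these one at a time.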
Besides orchestrating a full failover, what SRM can also do in the 6.1 release is evacuate a datacenter using vMotion in an orchestrated / automated way. If a disaster is about to happen, you can now use the SRM interface to move virtual machines from one datacenter to another with just a couple of clicks. This is called a planned migration, as can be seen in the screenshot above.
Personally I think this is a great step forward for stretched storage and SRM. I am very excited about this release!
Steve says
Does the integration with NSX finally migrate the DFW rules to the recovery site?
Vicks says
Hi Duncan, when you say orchestration, meaning which VM to restart based on the application, that is there in HA as well, though it is just a restart priority. Perhaps here it is more granular? I do agree on the vMotion orchestration part, it is very helpful. In short, very good features.
How about licensing for vCenter and SRM, don't you think it means additional licenses?
Regards
Vicks
Mark Burgess says
Hi Duncan,
Doesn’t this just highlight that HA should have orchestration features that control start-up order?
This must be something that people have been asking for since the early days of HA.
This is not just a problem in stretched clusters, it could also be a (smaller) problem with local HA.
Also vCenter should be site aware in a stretched cluster and have the ability to evacuate a site by vMotioning all the VMs to the other site.
Doesn’t adding SRM just make things more complex and expensive?
Best regards
Mark
Duncan Epping says
And it is something that is on the roadmap, but when we are talking about single host failures, partial cluster failures and double failures, it is not as easy to develop as one may think.
http://www.yellow-bricks.com/2013/09/13/ha-futures-restart-order/
Mark says
I agree with Mark, some of this functionality should be built into vSphere rather than having to deploy SRM, especially in stretched cluster environments.
mavatko says
+1, forcing you to buy SRM for something which should IMHO be a basic feature…. I believe evacuation by vMotion can be solved with a simple vCO workflow. However, what I would really love to see is dependency-aware HA, also with the possibility to force a restart of the dependent application. Something which should have been there a long time ago.
Duncan Epping says
See my reply above, it is something that is being worked on. But for HA it is completely different than with SRM, hence it takes time.
mavatko says
Yep, I actually remember it from the time when you published it. I have been awaiting improvements with each release, but nothing for two years. Although +1 for component protection.
3434@hat.com says
And maybe add evacuating a datacenter using SVmotion.
3434@hat.com says
Add mirror policy and mirror schedule, also NetApp.