I was reading this article by Chad Sakac on vSphere DR / HA, or in other words SRM versus Stretched (vMSC) solutions. I have presented on vSphere Metro Storage Cluster solutions at VMworld together with Lee Dilworth, wrote a white paper on this topic a while back, and have written various blog posts since. I agree with Chad that too many people are misinformed about the benefits of both solutions. I have been on calls with customers where people were indeed saying SRM is a legacy solution and the next big thing is “Active / Active”. Funny thing is that in a way I agree when they say SRM has been around for a long time and the world is slowly changing. I do not agree with the term “legacy” though.
I guess it depends on how you look at it. Yes, SRM has been around for a long time, but it is also a proven solution that does what it says it does: it is an orchestration solution for Disaster Recovery. Think about a disaster recovery scenario for a second and then read those last two sentences again. When you are planning for DR, isn’t it nice to use a solution that does what it says it does? Although I am a big believer in “active / active” solutions, there is a time and place for them; in many of the discussions I have been in, a stretched cluster solution was just not what people were looking for. On top of that, Stretched Cluster solutions aren’t always easy to operate. That is, I guess, what Chad was also referring to in his post. Don’t get me wrong, a stretched cluster is a perfectly viable solution when your organization is mature enough and you are looking for a disaster avoidance and workload mobility solution.
If you are at the point of making a decision around SRM vs Stretched Cluster, make sure to think about your requirements / goals first. Hopefully all of you have read this excellent white paper by Ken Werneburg. Ken describes the pros and cons of each of these solutions perfectly. Read it carefully and then make your decision based on your business requirements.
So, a short recap for those who are interested but don’t have time to read the full paper. Make time though… really do!
Where does SRM shine:
- Disaster Recovery
- Disaster Avoidance (will incur downtime when VMs fail over to the other site)
Where does a Stretched Cluster solution shine:
- Workload mobility
- Cross-site automated load balancing
- Enhanced downtime avoidance
- Disaster Avoidance (VMs can be vMotioned, no downtime incurred!)
Hans De Leenheer says
there’s only a certain amount of stretching you can do 😉
good remarks, couldn’t agree more
Anil Sedha says
There is one more thing for which stretched clusters work best: increased utilization of recovery site hardware. A lot of organizations invest in DR hardware that remains unused. An active-active setup helps better utilize that infrastructure and gives a bigger bang for the buck.
Besides, EMC products like vPlex when used with RecoverPoint are a killer combination. I am a RecoverPoint customer and it is the most robust and feature-rich replication appliance that I have seen out there today. And the real benefit comes when you move from a Metro Cluster to a Geo Cluster setup.
However, as you and Chad (@sakacc) both indicate, an ‘active/active’ cluster is not for everyone. But those who can refine their architecture can not only better utilize their infrastructure but also gain other side benefits: performance for mission critical workloads, failover benefits, high availability and so on.
It really depends on the use case.
Steve Flanders says
I’m not sure I agree. While stretched clusters do allow you to leverage hardware in multiple locations, you still need to have enough hardware in either site to support a DR scenario. The initial idea behind stretched clusters is active site balancing, not disaster recovery. If you are able to perform true DR with stretched clusters, then you are likely leveraging an equal amount of infrastructure in each datacenter and have an equal amount of DR infrastructure that remains idle. One way I have seen companies leverage DR infrastructure is for non-production environments (e.g. development, performance, etc). The idea is that in a true DR scenario the non-production environment running on the DR infrastructure can be turned down and the production environment can be brought back up.
Iwan 'e1' Rahabok says
Indeed there is a lot of confusion. I have had hours of discussions, mostly with global banks, and I documented my thoughts in http://communities.vmware.com/docs/DOC-19992. Hope you find it useful.
Jacint Juhasz says
I speak to customers almost every day about the differences, and I do not use the term ‘legacy’ for the SRM based solution (or any solution where replication happens). I say ‘classic DR’.
I would disagree with the recap part. For me the main difference between the two solutions is that vMSC is a true Disaster Avoidance solution, so you literally don’t let disaster happen, you avoid it. If you have an SRM based solution, you first let disaster happen and THEN you react. I would not list Disaster Avoidance in the shine section of SRM; it is Disaster Recovery.
Don’t get me wrong, both are great solutions and it’s our responsibility to help customers select the one which is better for them.
Hi Duncan, great article and very useful, thanks.
We have an SME sized environment and I chuckle at the classification of SRM as ‘legacy’. It makes me wonder what we should call our DR strategy: ‘legacy legacy’ DR or ‘fossilized’ DR perhaps! I’m talking single server and application level backups!
Finally though, after much convincing, we are about to move to an SRM based model using VR.
I know that CBT/snapshot based products like Veeam can cause a performance hit. I just wondered if anyone knows whether hypervisor based replication (SRM and VR) has the same impact?