My colleague Ken Werneburg, also known as “@vmKen”, just published a new white paper. (Follow him if you aren’t yet!) This white paper covers both SRM and stretched cluster solutions and explains the advantages and disadvantages of each. In my opinion it provides a great overview of when a stretched cluster should be implemented and when SRM makes more sense. Various goals and concepts are discussed, and I think this is a must-read for everyone exploring implementing a stretched cluster or SRM.
http://www.vmware.com/resources/techresources/10262
This paper is intended to clarify concepts involved with choosing solutions for vSphere site availability, and to help understand the use cases for availability solutions for the virtualized infrastructure. Specific guidance is given around the intended use of DR solutions like VMware vCenter Site Recovery Manager and contrasted with the intended use of geographically stretched clusters spanning multiple datacenters. While both solutions excel at their primary use case, their strengths lie in different areas which are explored within.
Duncan says
Yes, DR but no HA.. chalk and cheese. L2 stretched clusters provide HA across the entire lifecycle, including OS patching.
I would not recommend confusing the two capabilities. SRM does not provide a complete HA solution.
Marko says
Duncan,
thank you for sharing the link to this whitepaper. Could you explain a little bit more why a stretched cluster and SRM couldn’t be used together?
Thanks, Marko
Duncan says
Because a stretched cluster means you only have 1 vCenter Server. SRM requires two vCenter Server instances 🙂
Marko says
Okay, let’s assume there are 3 (three) datacenters, DC A, DC B and DC C. A stretched cluster between A and B is configured so I can’t use SRM at the same time for A and B.
If there is a stretched cluster between A and B it should be possible to use SRM to replicate (AB) to DC C. Right?
Dave Gogerly says
Hi,
Has anybody configured the above solution with VPLEX, please? E.g. Site A and B being a stretched cluster, and Site C being the DR site for both sites A & B. In case either site A or B fails, the failed site can be recovered at Site C through SRM. If so, can someone please give me a breakdown of the configuration?
Many thanks,
Dave
Doug says
I think the key discussion points I have, once we get past explaining the requirements and differences, are some limitations of the ‘magic’ with stretched HA/DRS clusters:
1) Not site aware (a VM and its storage may be at opposite sites. You potentially take the latency hit for EACH I/O)
2) Will not automatically keep workloads at their appropriate/optimal site (a majority of your users may be accessing the VM from site A, but DRS could relocate the VM to site B to balance load. Now, your users get to take the latency hit… makes troubleshooting ‘slowness’ a lot of fun!)
3) Cannot handle dependencies between virtual machines (consider a 3-tier application/service. What happens when the web tier and database tier are at site A and the app tier is at site B? What if the storage for the DB tier is also at site B?)
Fun stuff, especially considering lack of integration with the underlying storage.
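To make the three points above concrete, here is a minimal Python sketch of the 3-tier example. It is only an illustration: the `VM` record, the site labels and the helper names are hypothetical and not part of any VMware API. It flags VMs whose compute and storage sit at opposite sites, and applications whose tiers are split across sites.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    """Hypothetical inventory record, not a vSphere object."""
    name: str
    tier: str
    compute_site: str  # site of the host currently running the VM
    storage_site: str  # site where the VM's datastore is read/write


def cross_site_io(vms):
    """Names of VMs that pay inter-site latency on each I/O (point 1)."""
    return [vm.name for vm in vms if vm.compute_site != vm.storage_site]


def split_application(vms):
    """True when the tiers of one application run at more than one site (point 3)."""
    return len({vm.compute_site for vm in vms}) > 1


# The scenario from the comment: web and DB tiers at site A, app tier at
# site B, and the DB tier's storage also at site B.
app = [
    VM("web01", "web", "A", "A"),
    VM("app01", "app", "B", "B"),
    VM("db01", "db", "A", "B"),
]

print(cross_site_io(app))      # ['db01']
print(split_application(app))  # True
```

A check like this only reports the situation; keeping VMs at the right site is a placement problem.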
Duncan Epping says
Yes that should be possible Marko
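The reasoning in this exchange can be written down as a tiny Python check. This is a sketch under the single assumption stated earlier in the thread (SRM needs two vCenter Server instances, while a stretched cluster sits under one); the function name and the vCenter labels are made up for illustration.

```python
def srm_can_pair(site_a, site_b, vcenter_of):
    """Assumed rule from the thread: SRM pairs two sites only if each one
    is managed by a different vCenter Server instance."""
    return vcenter_of[site_a] != vcenter_of[site_b]


# DC A and DC B form one stretched cluster under a single vCenter Server;
# DC C has its own. The labels are hypothetical.
vcenter_of = {
    "DC A": "vc-stretched",
    "DC B": "vc-stretched",
    "DC C": "vc-dr",
}

print(srm_can_pair("DC A", "DC B", vcenter_of))  # False
print(srm_can_pair("DC A", "DC C", vcenter_of))  # True
print(srm_can_pair("DC B", "DC C", vcenter_of))  # True
```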
Marko says
@Duncan
Thank you, now it’s clear to me.
@Doug
I know, but sometimes you need to accept some discomfort to get a bigger solution running.
Imho 1), 2) and 3) should be manageable by a VMware solution. Maybe we will see such features in the future? Duncan?
Duncan says
All of the issues mentioned can be worked around by simple use of DRS Affinity Rules and Datastore Clusters. I am writing a whitepaper on the topic as we speak which will give architectural and operational guidance. Hopefully out in a couple of weeks.
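As a rough illustration of the affinity idea, the Python sketch below models a "should run on hosts in this site" preference. It is not DRS itself and does not call any vSphere API; `pick_host`, the host names and the site map are hypothetical. It assumes a soft rule: the VM stays at its home site while hosts there are available, and may still be placed at the other site when the home site is down.

```python
def pick_host(vm, home_site, hosts, failed_sites=frozenset()):
    """Pick a host for vm, preferring its home site (soft affinity).

    home_site: dict mapping VM name -> preferred site
    hosts:     dict mapping host name -> site
    Returns a host name, or None if no host is alive.
    """
    alive = {h: s for h, s in hosts.items() if s not in failed_sites}
    preferred = sorted(h for h, s in alive.items() if s == home_site[vm])
    if preferred:
        return preferred[0]
    # Soft rule: fall back to any surviving host instead of refusing placement.
    fallback = sorted(alive)
    return fallback[0] if fallback else None


hosts = {"esx-a1": "A", "esx-a2": "A", "esx-b1": "B"}
home_site = {"db01": "A"}

print(pick_host("db01", home_site, hosts))                      # esx-a1
print(pick_host("db01", home_site, hosts, failed_sites={"A"}))  # esx-b1
```

The soft fallback is the design choice that matters here: a hard "must" rule would keep the VM pinned to site A and leave it down if that site fails.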
Manuel says
@ Duncan
As usual, your posts are very interesting and helpful. Thanks a lot.
I am designing a new vSphere 5 environment and evaluating the failover options: stretched clusters for short-distance failover and SRM for long-distance failover, as Marko mentioned. So I am highly interested in the white paper you mentioned.
Please let us know as soon as your work is done.
Duncan Epping says
I will let you guys know for sure. Working on editing the docs right now.
Bob Greenway says
While I do have reservations regarding stretched clusters (mainly down to the I/O latency/LUN location already mentioned), DRS groups are a way to effectively allow HA of a site while still maintaining site affinity for specific servers.
And having implemented both, as the WP says, they each have their pros and cons.
Justin says
I’m actually looking to do this for a DR strategy until VPLEX and SRM are supported together.
I plan on using VPLEX Metro for replication and using the stretched cluster to manage the hosts/VMs at the prod and DR locations.
Disclaimer: it shouldn’t be called DR since the locations are 15 miles apart 🙂 More like TR
Munishpal says
If I am not wrong, with the latest version of VPLEX (5.1), both stretched vSphere clusters and DR with SRM can be achieved.
http://virtualgeek.typepad.com/virtual_geek/2012/05/stretched-vsphere-clusters-dr-with-srm-why-not-both.html
Munishpal says
I have also started open discussion about the same. Please feel free to post your comments
http://vcommunique.blogspot.in/2012/05/emc-vplex-srm.html
Duncan Epping says
Yes latest VPLEX release will support that…
Ratnadeep Bhattacharya says
Hi Duncan,
I have one question. It may be silly but I don’t have the infrastructure at my disposal currently to work this one out.
Let’s say I create a stretched cluster. Would it be possible to set up SRM on this cluster for failover to my DR site which is, say, a third vCenter?
Would be interesting to find out.
Regards,
Deep
Duncan says
@Ratnadeep Bhattacharya: Yes that should be possible.