During VMworld I received multiple questions about support for vSphere Storage DRS with vSphere Site Recovery Manager (SRM). We even got this question during our session, and my answer was "yes, it does." During some of the other sessions presenters stated that it was unsupported, and Scott Lowe also recalls it being mentioned somewhere down the line as unsupported. Now, although the Resource Management Guide for vSphere 5.0 currently says on page 91 that it is supported, it is not. Yes, I know I stated it was supported, but unfortunately the document is incorrect and the information provided to me was outdated. Although I verified the facts, I was not informed about this change. Hopefully this will not happen again, and my apologies for that.
Now let's give the raw facts first: SRM does not support Storage vMotion, and SRM does not support Storage DRS. The reason SRM does not support Storage vMotion (and consequently Storage DRS) is that it changes the location of the virtual machine without SRM being aware of it. After the location of the virtual machine has changed, the VM that was originally protected by SRM is no longer protected, which can have an impact on your RTO. These are the raw facts. I have asked the SRM team to document this in a KB article to make sure everyone understands the reason and the impact.
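To make the failure mode concrete, here is a minimal Python sketch of the protection check that breaks; the datastore names and the function are hypothetical and this is not the SRM API:

```python
# Hypothetical inventory data; in a real environment this would come from
# the vCenter and SRM APIs. All names here are made up for illustration.
SRM_PROTECTED_DATASTORES = {"ds-replicated-01", "ds-replicated-02"}

def vm_still_protected(vm_datastores):
    """A VM is only recoverable if every datastore backing it is
    replicated and part of an SRM protection group."""
    return all(ds in SRM_PROTECTED_DATASTORES for ds in vm_datastores)

# Before the move: the VM lives on a replicated datastore.
print(vm_still_protected({"ds-replicated-01"}))  # True

# Storage DRS / Storage vMotion relocates the VM without telling SRM;
# if the target datastore is not replicated, protection is silently lost.
print(vm_still_protected({"ds-local-03"}))       # False
```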
The question of course is… will it work? My colleague Cormac has tested it and you can read his observations here.
This statement is documented in the SRM release notes: http://www.vmware.com/support/srm/srm_releasenotes_5_0_0.html
Interoperability with Storage vMotion and Storage DRS:
Due to some specific and limited cases where recoverability can be compromised during storage movement, Site Recovery Manager 5.0 is not supported for use with Storage vMotion (SVmotion) and is not supported for use with the Storage Distributed Resource Scheduler (SDRS) including the use of datastore clusters.
** update: I followed the documentation, which apparently was incorrect. A documentation bug has been filed; it should be updated in the near future. **
** update: Link to the SRM release notes with the statement added. **
So, if I'm putting Storage DRS into manual mode and applying the constraints outlined above, is SRM *really* supporting SDRS, or just *tolerating* it?
Splitting hairs, but I believe the attendees’ questions were intended to uncover whether one can fully utilize SDRS with SRM. In that case, it appears that the answer is, as you mention, “sort of.”
I would say SRM is supported with SDRS but not aware of it, and as such any issues caused by applying recommendations need to be resolved manually. Tolerating it is the best description.
However, I just received an internal email about this article, and I will come back to this later.
So I get the logic behind the statement "All datastores that are part of a datastore cluster should belong to the same consistency group," but am I the only one who thinks this throws a kink into array-based replication? I think consistency groups could quickly double or triple in the number of member LUNs to get the true benefits of SDRS. The tradeoff can be more bandwidth usage for replication and shorter protection windows. My hope is that storage vendors see this as a push to start managing consistency at the VM level.
I'd be more concerned with change rates flying through the roof when you apply a recommendation to move, say, a VM with 30GB of data from LUN X to LUN Y.
My default recommendation, until the array is aware of these sorts of moves, is going to be not to use Storage DRS with replicated volumes.
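To put a number on that change rate, a quick back-of-the-envelope calculation; the link speed and efficiency figures below are purely assumptions for illustration:

```python
# How long does the array need to re-replicate a VM that Storage DRS just
# moved from LUN X to LUN Y? All figures are assumptions for illustration.
vm_size_gb = 30       # data moved by the Storage vMotion
link_mbps = 100       # replication link bandwidth, megabits per second
efficiency = 0.7      # assume ~70% effective throughput on the link

seconds = (vm_size_gb * 8 * 1024) / (link_mbps * efficiency)
print(f"~{seconds / 3600:.1f} hours of extra replication traffic")
# ~1.0 hours for 30GB on this link; a 500GB VM would take roughly
# 16 hours, throwing the replication schedule (and the RPO) way off.
```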
A couple of things here:
1) SDRS is not only load balancing; it also provides affinity rules, maintenance mode, and, most importantly, initial placement.
2) Yes, you can decide not to use SDRS, but SDRS was created to prevent issues from occurring. These issues are either performance related or disk-space related. You can always prefer downtime for a VM due to a filled-up VMFS volume, or slow performance due to high latency, but the question is whether your customer agrees and would rather take the impact.
Hi Duncan,
Can you be more specific about which replication method in SRM? Are you talking about array-based replication or SRM's new replication method, which VMware calls vSphere Replication?
As far as I know, with array-based replication, if you use SDRS to move the VM to an unprotected volume/LUN then you will lose the protection.
If you move the VM to a protected volume/LUN, the issue is replication: the whole VM has to be replicated to your DR site again. (Good luck if you have a VM with 500GB of data on it.)
So in order to be protected all the time, you should create a datastore cluster in which all the datastores are protected.
vSphere Replication is new for me and I am not sure how it works.
Peace.
Yes, I was specifically talking about array-based replication, as this is the most commonly used form of replication today. For vSphere Replication, check the link in the article.
Yes, all datastores in your datastore cluster should of course be replicated. This is the only way to form a datastore cluster: the members should always have the same availability characteristics.
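A minimal sketch of what such a sanity check could look like, assuming you can pull consistency-group membership from the array; the data structures and names here are hypothetical:

```python
# Every datastore in a datastore cluster should belong to the same
# consistency group; mixed membership breaks recoverability. The data
# below is hypothetical; in practice it comes from the array and vCenter.
datastore_cluster = ["ds-gold-01", "ds-gold-02", "ds-gold-03"]
consistency_group_of = {
    "ds-gold-01": "cg-prod",
    "ds-gold-02": "cg-prod",
    "ds-gold-03": "cg-dev",  # misconfigured member
}

groups = {consistency_group_of.get(ds) for ds in datastore_cluster}
if len(groups) != 1 or None in groups:
    print(f"WARNING: mixed or missing consistency groups: {groups}")
```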
As stated in the comment above, you have the choice: don't move the VM and incur downtime because your VMFS volume filled up, or experience performance issues because of high latency… or incur the cost of replication. If I were a customer, I know what I would pick!
And I think it would be better to create affinity rules for the guests with larger disks (like the 500GB example mentioned above) to reduce the cost of replication anyhow. This allows the smaller VMDKs to be moved instead. Chances are that moving a couple of smaller VMDKs with a decent amount of I/O will mitigate any space or performance issue and be a worthwhile trade-off. Also, I don't mind performing an SRM reconfigure as a standard practice; it's a good habit to be in anyhow.
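As a rough illustration of that selection logic (the sizes and the threshold below are made-up sample data, and in practice the pinning would be done through SDRS rules or per-VM automation overrides):

```python
# Pin large VMDKs in place so only smaller ones are migration candidates,
# keeping the replication cost of an SDRS move bounded. Sample data only.
vmdk_sizes_gb = {"db-server.vmdk": 500, "web-01.vmdk": 40, "web-02.vmdk": 60}
PIN_THRESHOLD_GB = 200  # assumed cut-off above which re-replication hurts

pin_in_place = [n for n, s in vmdk_sizes_gb.items() if s >= PIN_THRESHOLD_GB]
may_migrate = [n for n, s in vmdk_sizes_gb.items() if s < PIN_THRESHOLD_GB]
print("pin in place:", pin_in_place)  # ['db-server.vmdk']
print("may migrate:", may_migrate)    # ['web-01.vmdk', 'web-02.vmdk']
```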
"2) Yes, you can decide not to use SDRS, but SDRS was created to prevent issues from occurring. These issues are either performance related or disk-space related. You can always prefer downtime for a VM due to a filled-up VMFS volume, or slow performance due to high latency, but the question is whether your customer agrees and would rather take the impact."
When I look at my larger customers that have the bandwidth to support this sort of thing, they aren't really equipped to support thin-provisioned disks or overcommitment on replicated volumes anyway. The odds of them running out of performance or capacity are slim when there are multitudes of processes and tools to prevent that from being an issue. Latency hasn't been an issue in well-planned environments for a good chunk of my customers for a while.
Smaller customers (which have an easier time of it, with fewer silos or outsourced IT) might not have enough bandwidth to handle suddenly dumping 500GB of data on the wire. In some cases this could impact other applications and, as mentioned, will mean the VM isn't protected until it can re-sync.
Dodging downtime doesn't really help me if I throw my replication schedule off to such a point that I'm shipping tapes with my SnapMirrors off to my DR site to re-initialize.
As a customer I’d take the pragmatic approach of ensuring my DR plan is always supportable and meets the expectations of the business. You never know when some event could happen (man-made or natural) to force you to enact your plan.
I'd say if I were to apply the 80/20 rule, 80% of our customers would be better off not using SDRS on replicated volumes, while 20% will be able to make the best of it. This could be due to infrastructure issues, operational readiness issues, or just a lack of tangible benefit.
Storage DRS is a great thing, but if you're depending on it to avoid running out of capacity or to fight performance problems, then you need to rethink your overall datacenter design.
I fully agree with you, but keep in mind that storage performance problems are more common than anything else. I have done countless analyses of environments where performance was below expectations, and 99% of the time it was caused by storage. Regardless of the type of array, at some point the disks backing your datastore will need to be able to digest the load; not enough spindles or a heavy RAID penalty will cause issues eventually.
Yes, I fully agree that it is best to avoid these problems completely. Design and implement accordingly and you should never have these problems to begin with. But then again, designing and implementing based on historical averages and peak values is in no way a guarantee.
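For what it's worth, the classic back-end IOPS estimate shows why too few spindles or a heavy RAID penalty bites; the workload figures below are assumptions for illustration:

```python
# Estimate the spindle count a workload needs once the RAID write
# penalty is applied. All workload figures are assumptions.
read_iops, write_iops = 2000, 1000
raid_penalty = 4        # e.g. RAID-5: four back-end I/Os per write
iops_per_disk = 180     # e.g. a single 15K SAS spindle

backend_iops = read_iops + write_iops * raid_penalty
spindles_needed = -(-backend_iops // iops_per_disk)  # ceiling division
print(backend_iops, "back-end IOPS ->", spindles_needed, "spindles")
# 6000 back-end IOPS -> 34 spindles; size below this and latency climbs.
```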
I caught that in your session Monday morning and sent you a tweet about it. Thank you for the follow-up clarification.
I still don’t see why SRM wouldn’t support Storage DRS for Initial Placement. As you mention, that is one of the cool features that people often overlook, and I can definitely see value there.
How does that play out when using vSphere Replication instead of storage-based replication?
Not supported for now.
If we align the datastore cluster (in SDRS) with the datastore group (in SRM), will this work or be supported?
No, it is not supported.
Thanks for the clarification!
Hi, if we are using vSphere Replication in SRM, is it recommended and a best practice to use a physical server as the SRM server instead of a VM?