Today someone asked for a Storage DRS intro. I wrote one for our book a year ago and figured I would share it with the world. I still feel that Storage DRS is one of the coolest features in vSphere 5.0 and I think that everyone should be using this! I know there are some caveats (1, 2) when you are using specific array functionality or for instance SRM, but nevertheless… this is one of those features that will make an admin’s life that much easier! If you are not using it today, I highly suggest evaluating this cool feature.
*** excerpt from the vSphere 5.0 Clustering Deepdive ***
vSphere 5.0 introduces many great new features, but everyone will probably agree with us that vSphere Storage DRS is the most exciting new feature. vSphere Storage DRS helps resolve some of the operational challenges associated with virtual machine provisioning, migration and cloning. Historically, monitoring datastore capacity and I/O load has proven to be very difficult. As a result, it is often neglected, leading to hot spots and over- or underutilized datastores. Storage I/O Control (SIOC) in vSphere 4.1 solved part of this problem by introducing a datastore-wide disk scheduler that allows for allocation of I/O resources to virtual machines based on their respective shares during times of contention.
Storage DRS (SDRS) brings this to a whole new level by providing smart virtual machine placement and load balancing mechanisms based on space and I/O capacity. In other words, where SIOC reactively throttles hosts and virtual machines to ensure fairness, SDRS proactively makes recommendations to prevent imbalances from both a space utilization and latency perspective. More simply, SDRS does for storage what DRS does for compute resources.
There are five key features that SDRS offers:
- Resource Aggregation
- Initial Placement
- Load Balancing
- Datastore Maintenance Mode
- Affinity Rules
Resource aggregation enables grouping of multiple datastores into a single, flexible pool of storage called a Datastore Cluster. Administrators can dynamically populate Datastore Clusters with datastores. The flexibility of separating the physical from the logical greatly simplifies storage management by allowing datastores to be efficiently and dynamically added to or removed from a Datastore Cluster to deal with maintenance or out-of-space conditions. The load balancer will take care of initial placement as well as future migrations based on actual workload measurements and space utilization.
The goal of Initial Placement is to speed up the provisioning process by automating the selection of an individual datastore and leaving the user with the much smaller-scale decision of selecting a Datastore Cluster. SDRS selects a particular datastore within a Datastore Cluster based on space utilization and I/O capacity. In an environment with multiple seemingly identical datastores, initial placement can be a difficult and time-consuming task for the administrator. Not only will the datastore with the most available disk space need to be identified, but it is also crucial to ensure that the addition of this new virtual machine does not result in I/O bottlenecks. SDRS takes care of all of this and substantially lowers the amount of operational effort required to provision virtual machines; that is the true value of SDRS.
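To make the idea concrete, here is a minimal sketch of how an initial-placement decision might rank datastores on free space and observed latency. To be clear, this is an illustrative model and not VMware's actual algorithm; the real placement logic weighs far more metrics, and all names here are made up.

```python
# Hypothetical initial-placement ranking, NOT the actual SDRS algorithm:
# pick the datastore that keeps projected space utilization (and then
# observed latency) lowest after adding the new virtual machine.

from dataclasses import dataclass

@dataclass
class Datastore:
    name: str
    capacity_gb: float
    used_gb: float
    avg_latency_ms: float  # observed I/O latency

def place_vm(datastores, vm_size_gb):
    """Return the name of the best datastore for a new VM,
    or None if no datastore in the cluster can hold it."""
    candidates = [d for d in datastores
                  if d.capacity_gb - d.used_gb >= vm_size_gb]
    if not candidates:
        return None
    def score(d):
        projected_util = (d.used_gb + vm_size_gb) / d.capacity_gb
        return (projected_util, d.avg_latency_ms)
    return min(candidates, key=score).name
```

The point of the sketch: the administrator only picks the Datastore Cluster (the list passed in), and the placement logic picks the individual datastore.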
However, it is probably safe to assume that many of you are most excited about the load balancing capabilities SDRS offers. SDRS can operate in two distinct modes: No Automation (manual mode) or Fully Automated. Where initial placement reduces complexity in the provisioning process, load balancing addresses imbalances within a datastore cluster. Prior to vSphere 5.0, placement of virtual machines was often based on current space consumption or the number of virtual machines on each datastore. I/O capacity monitoring and space utilization trending was often regarded as too time-consuming. Over the years, we have seen this lead to performance problems in many environments and, in some cases, even result in downtime because a datastore ran out of space. SDRS load balancing helps prevent these all-too-common scenarios by making placement recommendations based on both space utilization and I/O capacity when the configured thresholds are exceeded. Depending on the selected automation level, these recommendations will be applied automatically by SDRS or will need to be applied by the administrator.
Although we see load balancing as a single feature of SDRS, it actually consists of two separately configurable options. When either of the configured thresholds for Utilized Space (80% by default) or I/O Latency (15 milliseconds by default) is exceeded, SDRS will make recommendations to prevent problems and resolve the imbalance in the datastore cluster. I/O load balancing can even be disabled explicitly if desired.
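As a rough illustration of how those two thresholds combine, the check below flags a datastore when either configured limit is exceeded, with the latency check independently switchable. This is just a sketch of the trigger condition, not the real SDRS cost-benefit engine; function and parameter names are my own.

```python
# Illustrative threshold check, not the real SDRS engine. Defaults
# match the out-of-the-box thresholds described above: 80% utilized
# space and 15 ms I/O latency.

def needs_rebalancing(used_pct, latency_ms,
                      space_threshold=80.0, latency_threshold=15.0,
                      io_balancing_enabled=True):
    """Return the list of reasons a datastore exceeds its thresholds."""
    reasons = []
    if used_pct > space_threshold:
        reasons.append("space")
    # The I/O side can be disabled explicitly, mirroring the SDRS option.
    if io_balancing_enabled and latency_ms > latency_threshold:
        reasons.append("latency")
    return reasons
```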
Before anyone asks: SDRS can be enabled on fully populated datastores and environments. It is also possible to add fully populated datastores to existing datastore clusters. This makes it a great way to solve actual or potential bottlenecks in any environment with minimal effort or risk.
Datastore Maintenance Mode is one of those features that you will typically not use often, but you will appreciate it when you need it. Datastore Maintenance Mode can be compared to Host Maintenance Mode: when a datastore is placed in Maintenance Mode, all registered virtual machines on that datastore are migrated to the other datastores in the datastore cluster. Typical use cases are data migration to a new storage array or maintenance on a LUN, such as migration to another RAID group.
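Conceptually, entering maintenance mode boils down to evacuating every VM to the remaining datastores in the cluster. A naive sketch of that assignment follows; the real SDRS evacuation also weighs I/O load and affinity rules, and all names here are hypothetical.

```python
# Naive evacuation sketch: assign each VM on the datastore entering
# maintenance mode to whichever remaining datastore currently has the
# most free space. Illustrative model only, not the actual SDRS logic.

def evacuate(vms, targets):
    """vms: {vm_name: size_gb}; targets: {ds_name: free_gb}.
    Returns a migration plan {vm_name: ds_name}, or raises
    if some VM cannot fit anywhere."""
    free = dict(targets)
    plan = {}
    # Place the largest VMs first to reduce fragmentation.
    for vm, size in sorted(vms.items(), key=lambda kv: -kv[1]):
        dest = max(free, key=free.get)
        if free[dest] < size:
            raise RuntimeError(f"no room for {vm}")
        plan[vm] = dest
        free[dest] -= size
    return plan
```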
Affinity Rules enable control over which virtual disks should or should not be placed on the same datastore within a datastore cluster in accordance with your best practices and/or availability requirements. By default, a virtual machine’s virtual disks are kept together on the same datastore.
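A toy model of the two behaviors just described: keeping a VM's disks together (the default) versus keeping them apart under an anti-affinity rule. The function name and structure are mine, purely for illustration.

```python
# Toy model of SDRS-style intra-VM disk affinity checks;
# illustrative only, not a VMware API.

def violates_affinity(disk_placement, keep_together=True):
    """disk_placement: {disk_name: datastore_name} for one VM.
    keep_together=True (the default behavior described above) requires
    all disks on the same datastore; keep_together=False models an
    anti-affinity rule requiring every disk on a different datastore."""
    datastores = list(disk_placement.values())
    if keep_together:
        return len(set(datastores)) > 1
    return len(set(datastores)) < len(datastores)
```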
For those who want more details, Frank Denneman wrote an excellent series about Datastore Clusters which might interest you:
Part 1: Architecture and design of datastore clusters.
Part 2: Partially connected datastore clusters.
Part 3: Impact of load balancing on datastore cluster configuration.
Part 4: Storage DRS and Multi-extents datastores.
Part 5: Connecting multiple DRS clusters to a single Storage DRS datastore cluster.
Part 6: Aggregating datastores from multiple storage arrays into one Storage DRS datastore cluster.
Some other articles that might be of use:
- SDRS and Auto-Tiering solutions – The Injector (Duncan)
- Storage DRS Load Balance Frequency (Frank)
- SDRS Out-Of-Space avoidance (Frank)
- Storage vMotion and the mirror-mode driver (Duncan)
The following video gives an overview of the features mentioned above… worth checking out.
Bob Greenway says
I can’t say I am a big fan of Storage DRS; it seems to me aimed at very small businesses that have a limited storage budget. Why put more I/O on a host when the storage vendors (3PAR/IBM etc.) already do this tiered migration in the background? Plus, with scary KB articles such as http://kb.vmware.com/kb/2013639 (posted on this site), why put a production system at risk?
Duncan Epping says
Please re-read the post. Storage DRS is more than I/O load balancing. I think Initial Placement is an excellent feature and there is no storage vendor out there that has solved it. Even if you do not turn on I/O load balancing and set it to manual there is a lot of value.
Also, I want to point out that the KB you are referring to is not a Storage DRS bug but a Storage vMotion bug. I agree that this needs to be fixed.
Austin says
Duncan, I am a new VCP and I have been following your blog for about 2 months now. Just wanted to say thanks for all of your posts, they have been great!
Steve says
Duncan-
Great post (great book, really). The thing that has bugged me about Storage DRS from the moment I first saw it is still the fact that, during initial placement, based on how provisioning works in vCenter… unless the template you deploy from contains all of the correct parameters from a capacity configuration perspective (vRAM set right, additional drives, etc.)… it’s still making a decision based on incomplete information (that is, unless you let it decide where to put drives randomly, which I do not; I want it to store them together, generally).
In our environment, templates are generally a standard operating system build that gets provisioned; then memory gets upped to whatever that VM needs, additional drives are added, etc. When that happens after the provisioning request is already done, though… well, you can see the conundrum (hopefully).
jodi shely says
Nice explanation. Thanks!
Hemanth says
Duncan,
What would be the best way to remove a datastore from a Storage DRS Cluster?
There is no option in the GUI to remove a datastore from a Storage DRS Cluster. I want to remove the datastore from the SDRS Cluster and then re-add it to the SDRS Cluster.
James says
Good question, Hemanth. I was wondering the same thing. I guess you need to remove SDRS completely and start from scratch? This would surprise me, though: what about all your config settings? You would have to put them all back, which is fine I suppose, but still. Anyone?
James says
I guess no one knows, or the solution is that you need to start from scratch. This could be improved. Also, can someone please explain to me what the difference is between the Storage DRS I/O latency threshold (15ms by default) and the Storage I/O Control congestion threshold (30ms by default)? I am confused.
Russell Ford says
Hey guys, I also sifted through hordes of settings and menus before figuring it out… simply drag and drop the volume out of the cluster to remove it.