storage drs

SDRS and Auto-Tiering solutions – The Injector

Duncan Epping · Aug 5, 2011 ·

A couple of weeks ago I wrote an article about Storage DRS (hereafter SDRS) interoperability in which I mentioned that using SDRS with auto-tiering solutions should work… The truth is slightly different, and as I noticed some people started throwing huge exclamation marks around SDRS I wanted to make a statement. Many have discussed why SDRS would not be supported with auto-tiering solutions, and the common idea is that SDRS could initiate a migration to a different datastore and as such “reset” the tiered VM back to default. Although this is correct, there is a different reason why VMware recommends following the guidelines provided by the storage vendor. The guideline, by the way, is to use Space Balancing but not to enable the I/O metric. Those who were part of the beta, or have read the documentation or our book, might recall this: when creating datastore clusters, select datastores which have similar performance characteristics. In other words, do not mix an SSD backed datastore with a SATA backed datastore; mixing SATA with SAS, however, is okay. Before we explain why, let’s repeat the basics around SDRS:

SDRS allows the aggregation of multiple datastores into a single object called a datastore cluster. SDRS will make recommendations to balance virtual machines or disks based on I/O and space utilization, and during virtual machine or virtual disk provisioning it will make recommendations for placement. SDRS can be set to fully automated or manual mode. In manual mode SDRS will only make recommendations; in fully automated mode these recommendations will be applied by SDRS as well. When balancing recommendations are applied, Storage vMotion (SvMotion) is used to move the virtual machine.

So what about auto-tiering solutions? Auto-tiering solutions move “blocks” around based on hotspots. Yes, again, when SvMotion migrates the virtual machine or virtual disk this process is reset. In other words, the full disk will land on the same tier and the array will need to decide at some point what belongs where… but is this an issue? In my opinion it probably isn’t, but it will depend on why SDRS decides to move the virtual machine, as it might lead to a temporary decrease in performance for specific chunks of data within the VM. As auto-tiering solutions help prevent performance issues by moving blocks around, you might not want to have SDRS making performance recommendations, but why… what is the technical reason for this?

As stated, SDRS uses I/O and space utilization for balancing… Space makes sense I guess, but what about I/O… what does SDRS use, how does it know where to place a virtual machine or disk? Many people seem to be under the impression that SDRS simply uses average latency, but would that work in a greenfield deployment where no virtual machines are deployed yet? It wouldn’t, and it would also not say much about the performance capabilities of the datastore. No, in order to ensure the correct datastore is selected SDRS needs to know what the datastore is capable of. It will need to characterize the datastore, and in order to do so it uses Storage IO Control (hereafter SIOC), more specifically what we call “the injector”. The injector is part of SIOC and is a mechanism which is used to characterize each of the datastores by injecting random (read) I/O. Before you get worried, the injector only injects I/O when the datastore is idle. Even when the injector is busy and it notices other activity on the datastore it will back down and retry later.

Now in order to characterize the datastore the injector uses different numbers of outstanding I/Os and measures the latency for these I/Os. For example, it starts with 1 outstanding I/O and gets a response within 3 milliseconds. When 3 outstanding I/Os are used the average latency for these I/Os is 3.8 milliseconds. With 5 I/Os the average latency is 4.3 milliseconds, and so on and so forth. For each device the outcome can be plotted as shown in the screenshot below, and the slope of the graph indicates the performance capabilities of the datastore. The steeper the line, the lower the performance capabilities. The graph shows a test in which a multitude of datastores are characterized, each being backed by a different number of spindles. As clearly shown, there is a relationship between the steepness and the number of spindles used.
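To make the slope idea concrete, here is a minimal sketch that fits a straight line through the three example measurements mentioned above (1, 3 and 5 outstanding I/Os). The least-squares fit and the function name `latency_slope` are assumptions made for illustration; how the injector actually derives its model is not described here.

```python
def latency_slope(samples):
    """Least-squares slope of latency (ms) versus number of outstanding I/Os.

    `samples` is a list of (outstanding_ios, average_latency_ms) pairs.
    This is an illustrative fit, not the injector's actual algorithm.
    """
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    return sxy / sxx


# The example measurements from the text: (outstanding I/Os, latency in ms).
samples = [(1, 3.0), (3, 3.8), (5, 4.3)]
slope = latency_slope(samples)
print(f"{slope:.3f} ms per additional outstanding I/O")
```

A datastore whose latency climbs faster as outstanding I/Os increase gets a larger slope, which is the “steeper line, lower performance capabilities” relationship described above.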

So why does SDRS care? Well, in order to ensure the correct recommendations are made each of the datastores will be characterized. In other words, a datastore backed by 16 spindles will be a more logical choice than a datastore backed by 4 spindles. So what is the problem with auto-tiering solutions? Well, think about it for a second… when a datastore has many hotspots an auto-tiering solution will move chunks around. Although this is great for the virtual machine, it also means that when the injector characterizes the datastore it could potentially read from the SSD backed chunks or from the SATA backed chunks. This will lead to unexpected results in terms of average latency, and as you can imagine this will be confusing to SDRS and possibly lead to incorrect recommendations. Now, this is typically one of those scenarios which requires extensive testing, and hence the reason VMware refers to the storage vendor for their recommendation around using SDRS in combination with auto-tiering solutions. My opinion: use SDRS Space Balancing, as this will help prevent downtime related to “out of space” scenarios and also help speed up the provisioning process. On top of that you will get Datastore Maintenance Mode and Affinity Rules.
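The effect can be illustrated with a small sketch. All latency numbers below are hypothetical and chosen only to show the shape of the problem, and the linear blending of SSD and SATA latencies is a simplifying assumption, not a description of how any array or the injector behaves.

```python
# Hypothetical average latencies (ms) at 1 and 5 outstanding I/Os.
# Illustrative values only, not measurements.
SSD_CHUNKS = {1: 0.5, 5: 0.9}
SATA_CHUNKS = {1: 8.0, 5: 14.0}


def blended_slope(ssd_fraction):
    """Slope of latency vs outstanding I/Os when `ssd_fraction` of the
    injected reads happen to land on SSD backed chunks."""
    def latency(oio):
        return (ssd_fraction * SSD_CHUNKS[oio]
                + (1 - ssd_fraction) * SATA_CHUNKS[oio])
    return (latency(5) - latency(1)) / (5 - 1)


# Same datastore, three characterization runs hitting different chunks.
slopes = {f: blended_slope(f) for f in (0.0, 0.5, 1.0)}
for fraction, slope in slopes.items():
    print(f"{fraction:.0%} of reads on SSD -> slope {slope:.2f} ms per I/O")
```

Under these assumed numbers the very same datastore looks like three different devices depending on which chunks the random reads hit, which is why the characterization cannot be trusted on an auto-tiered datastore.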

Storage DRS interoperability

Duncan Epping · Jul 15, 2011 ·

I was asked about this a couple of times over the last few days so I figured it might be an interesting topic. This is described in our book as well, in the Datastore Cluster chapter, but I decided to rewrite it and add some of it into a table to make it easier to digest. Let’s start off with the table and then explain why/where/what… Keep in mind that this is my opinion and not necessarily the best practice or recommendation of your storage vendor. When you implement Storage DRS make sure to validate this against their recommendations. I have marked the areas where I feel caution needs to be taken with (*).

Capability	Mode	Space	I/O Metric
Thin Provisioning	Manual	Yes (*)	Yes
Deduplication	Manual	Yes (*)	Yes
Replication	Manual (*)	Yes	Yes
Auto-tiering	Manual	Yes	No (*)

Yes, you are reading that correctly: Storage DRS enabled with all of them, and even with the I/O metric enabled except for auto-tiering. Now although I said “Manual” for all of them, I even believe that in some of these cases Fully Automated mode would be perfectly fine. As it will of course depend on the environment, I would suggest starting out in Manual mode if any of these 4 storage capabilities are used, to see what the impact is after applying a recommendation.
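For those who prefer to see the table as data, here is a small sketch that encodes it and combines the rows for an array that uses several capabilities at once. The names `INTEROP` and `suggested_settings` are made up for this example, and combining rows by taking the most conservative value is an assumption on my part, not a vendor rule.

```python
# The table above encoded as data. "caution" records which column
# carries the (*) mark for that capability.
INTEROP = {
    "thin provisioning": {"space": True, "io_metric": True, "caution": "space"},
    "deduplication": {"space": True, "io_metric": True, "caution": "space"},
    "replication": {"space": True, "io_metric": True, "caution": "mode"},
    "auto-tiering": {"space": True, "io_metric": False, "caution": "io_metric"},
}


def suggested_settings(capabilities):
    """Combine table rows conservatively: a setting is enabled only if
    every capability in use allows it. Mode is Manual in every row."""
    rows = [INTEROP[c] for c in capabilities]
    return {
        "mode": "manual",
        "space": all(r["space"] for r in rows),
        "io_metric": all(r["io_metric"] for r in rows),
        "cautions": sorted({r["caution"] for r in rows}),
    }


settings = suggested_settings(["thin provisioning", "auto-tiering"])
print(settings)
```

For a thin provisioned, auto-tiered array this yields space balancing on and the I/O metric off, with caution flags on both columns.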

First of all “Manual Mode”… What is it? Manual Mode basically means that Storage DRS will make recommendations when the configured thresholds for latency or space utilization have been exceeded. It will also provide recommendations for placement during the provisioning process of a virtual machine or a virtual disk. In other words, when setting Storage DRS to manual you will still benefit from it, as it will monitor your environment for you and based on that recommend where to place or migrate virtual disks.

In the case of Thin Provisioning I would like to expand. Before migrating virtual machines, I would recommend making sure that the “dead space” that will be left behind on the source datastore after the migration can be reclaimed through the unmap primitive that is part of VAAI.

Deduplication is a difficult one. The question is, will the “deduplication” process be as efficient after the migration as it was before the migration? Will it be able to deduplicate the same amount of data? There is always a chance that this is not the case… But then again, do you really care all that much about it when you are running out of disk space on your datastore or are exceeding your latency threshold? Those are very valid reasons to move a virtual disk, as both can lead to degradation of service.

In an environment where replication is used care should be taken when balancing recommendations are applied. The reason for this being that the full virtual disk that is migrated will need to be replicated after the migration. This temporarily leads to an “unprotected state” and as such it is recommended to only migrate virtual disks which are protected during scheduled maintenance windows.

Auto-tiering arrays have been a hot debate lately. Not many seem to agree with my stance, but up until today no one has managed to give me a great argument or explain to me exactly why I would not want to enable Storage DRS on auto-tiering solutions. Yes, I fully understand that when I move a virtual machine from datastore A to datastore B the virtual machine will more than likely end up on relatively slow storage and the auto-tiering solution will need to optimize the placement again. However, when you are running out of disk space what would you prefer, downtime or a temporary slowdown? In the case of “I/O” balancing this is different, and in a follow-up post I will explain why this is not supported.

** This article is based on vSphere 5.0 information **

Thanks!!

Duncan Epping · Jul 13, 2011 ·

** Update: Available now: paperback full |paperback black & white **

I’ve seen a lot of crazy things, but when I clicked the amazon link for our book yesterday I literally jumped up and started cheering… Number 1 in “Computers & Internet”. These are the kind of things that make it all worth it! ~~PS: We asked amazon/createspace to get the printed copy up asap and they are looking in to it as it should have been ready by now.~~

What’s new for storage whitepaper and videos

Duncan Epping · Jul 12, 2011 ·

Just noticed that the collateral I have been working on is available for download today as well. Check the “What’s new for Storage” whitepaper, the Storage DRS video and the Profile-Driven Storage video.

vSphere 5.0: Storage DRS introduction

Duncan Epping · Jul 12, 2011 ·

Storage DRS is a brand new feature of vSphere 5.0. It has been one of my focus areas for the last 6 months and is probably one of the coolest features of vSphere 5.0. Storage DRS enables you to aggregate datastores into a single object, called a datastore cluster. This new object is what you will be managing from now on. Storage DRS enables smart placement of virtual machines based on utilized disk space, latency and LUN performance capabilities. In other words, when you create a new virtual machine you will select a datastore cluster instead of a datastore, and Storage DRS will place the virtual machine on one of the datastores in that datastore cluster. This is where the strength of Storage DRS lies: reducing the operational effort associated with provisioning of virtual machines…

But that’s not all there is, Storage DRS is a lot more than just initial placement… let’s sum up the core functionality of Storage DRS:

Initial Placement
Migration Recommendations (Manual / Fully Automated)
Affinity Rules
Maintenance Mode

These in my opinion are the 4 core pieces of functionality that Storage DRS provides. Initial placement, as stated, will reduce the amount of operational effort required to provision virtual machines. Storage DRS will figure out which datastore the virtual machine should be placed on; there is no need anymore to manually monitor each datastore and figure out which one has the most available disk space and relatively low latency. On top of that SDRS also provides Migration Recommendations if and when thresholds are exceeded. It can generate them (manual mode) or generate and apply them (fully automated mode). These thresholds are utilized disk space (80%) and latency (15ms). This helps prevent bottlenecks in terms of disk space and hot spots in terms of latency.
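As a rough sketch of the threshold logic, the snippet below checks a datastore against the two thresholds mentioned above and shows the difference between the two modes. The function names, the mode strings and the returned dictionary are hypothetical and only meant to illustrate the idea; they are not part of any vSphere API.

```python
SPACE_THRESHOLD = 0.80        # utilized disk space threshold (80%)
LATENCY_THRESHOLD_MS = 15.0   # latency threshold (15ms)


def threshold_violations(used_gb, capacity_gb, latency_ms):
    """Return which thresholds a datastore currently exceeds."""
    violations = []
    if used_gb / capacity_gb > SPACE_THRESHOLD:
        violations.append("space")
    if latency_ms > LATENCY_THRESHOLD_MS:
        violations.append("latency")
    return violations


def plan(used_gb, capacity_gb, latency_ms, mode):
    """Manual mode only generates recommendations; fully automated
    mode generates and applies them."""
    violations = threshold_violations(used_gb, capacity_gb, latency_ms)
    return {
        "recommend_migration_for": violations,
        "apply_automatically": mode == "fully_automated" and bool(violations),
    }


print(plan(850, 1000, 10.0, mode="manual"))
print(plan(500, 1000, 20.0, mode="fully_automated"))
```

In both modes the same recommendations are generated; the mode only decides whether somebody has to click “apply”.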

Affinity Rules and Maintenance Mode are very similar to what DRS offers today. You have the ability to split disks and virtual machines with Affinity Rules, or keep them together. With Maintenance Mode it will be very easy to migrate to new LUNs or to do planned maintenance on a volume: a couple of clicks and all VMs will be moved off.

Once again I would like to stress that although the Migration Recommendations (especially in Fully Automated mode) sound really sexy, and they are, it will more than likely be the Initial Placement recommendations where you will benefit the most. More technical information will follow soon here and on frankdenneman.nl