vSphere 5.1 Storage DRS Interoperability

A while back I did this article on Storage DRS Interoperability. I had questions last week about this so I figured I would write a new article which reflects the current state (vSphere 5.1). I also included some details that are part of the interoperability white paper Frank and I did so that we have a fairly complete picture. This white paper is on 5.0, it will probably be updated at some point in the future.

The first column describes the feature or functionality, the second column the recommended or supported automation mode and the third and fourth column show which type of balancing is supported.

Capability Automation Mode Space Balancing I/O Metric Balancing
Array-based Snapshots Manual Yes Yes
Array-based Deduplication Manual Yes Yes
Array-based Thin provisioning Manual Yes Yes
Array-based Auto-Tiering Manual Yes No
Array-based Replication Manual Yes Yes
vSphere Raw Device Mappings Fully Automated Yes Yes
vSphere Replication Fully Automated Yes Yes
vSphere Snapshots Fully Automated Yes Yes
vSphere Thin provisioned disks Fully Automated Yes Yes
vSphere Linked Clones Fully Automated (*) Yes Yes
vSphere Storage Metro Clustering Manual Yes Yes
vSphere Site Recovery Manager Not supported n/a n/a
VMware vCloud Director Fully Automated (*) Yes Yes
VMware View (Linked Clones) Not Supported n/a n/a
VMware View (Full Clones) Fully Automated Yes Yes

(*) = Change from 5.0

Death to false myths: Storage IO Control = Storage DRS IO load balancing

I often hear people making comments around Storage IO Control and Storage DRS IO Load Balancing being one and the same thing. It has been one of those myths that has been floating around for a long time now, and with this article I am going to try to stop it.

I guess where this myth comes from is that when you create a Datastore Cluster and you enable Storage DRS IO Load Balancing then it configures Storage IO Control for you automatically on all datastores which are part of that particular Datastore Cluster. This seems to give people the impression that they are the same thing.

I have heard people making these claims especially around interoperability discussions. For example, one of the common made mistakes is that you should not enable Storage IO Control on a datastore which has auto-tiering (like EMC FAST for instance) enabled. Now the thing is that in the Storage DRS Interop white paper it is listed that when using an auto-tiering array you should disable IO Load Balancing when using Storage DRS. However, let is be clear Storage IO Control and Storage DRS Load Balancing are not one and the same thing and Storage IO Control is supported in those scenarios!

Storage DRS uses Storage IO Control to retrieve the IO metrics required to create load balancing recommendations. So lets repeat that, Storage DRS leverages Storage IO Control. Storage IO Control works perfectly fine without Storage DRS. Storage IO Control is all about handling queues and limiting the impact of short IO spikes. Storage DRS is about sustained latency and moving virtual machines around to balance out the environment.

I guess I can summarize this article in just one sentence:
Storage IO Control != Storage DRS IO Load Balancing

 

Should I use many small LUNs or a couple large LUNs for Storage DRS?

At several VMUGs I presented a question that always came up was the following: “Should I use many small LUNs or a couple of large LUNs for Storage DRS? What are the benefits of either?”

I posted about VMFS-5 LUN sizing a while ago and I suggest reading that first if you haven’t yet, just to get some idea around some of the considerations taken when sizing datastores. I guess that article already more or less answers the question… I personally prefer many “small LUNs” than a couple of large LUNs, but let me explain why. As an example, lets say you need 128TB of storage in total. What are your options?

You could create 2x 64TB LUNs, 4x 32TB LUNs, 16x 8TB LUNs or 32x 4TB LUNs. What would be easiest? Well I guess 2x 64TB LUNs would be easiest right. You only need to request 2 LUNs and adding them to a datastore cluster will be easy. Same goes for the 4x 32TB LUNs… but with 16x 8TB and 32x 4TB the amount of effort increases.

However, that is just a one-time effort. You format them with VMFS, add the to the datastore cluster and you are done. Yes, it seems like a lot of work but in reality it might take you 20-30 minutes to do this for 32 LUNs. Now if you take a step back and think about it for a second… why did I wanted to use Storage DRS in the first place?

Storage DRS (and Storage IO Control for that matter) is all about minimizing risk. In storage, two big risks are hitting an “out of space” scenario or extremely degraded performance. Those happen to be the two pain points that Storage DRS targets. In order to prevent these problems from occurring Storage DRS will try to balance the environment, when a certain threshold is reached that is. You can imagine that things will be “easier” for Storage DRS when it has multiple options to balance. When you have one option (2 datastores – source datastore) you won’t get very far. However, when you have 31 options (32 datastores – source datastore) that increases the chances of finding the right fit for your virtual machine or virtual disk while minimizing the impact on your environment.

I already dropped the name, Storage IO Control (SIOC), this is another feature to take in to account. Storage IO Control is all about managing your queues, you don’t want to do that yourself. Believe me it is complex and no one likes queues right. (If you have Enterprise Plus, enable SIOC!) Reality is though, there are many queues in between the application and the spindles your data sits on. The question is would you prefer to have 2 device queues with many workloads potentially queuing up, or would you prefer to have 32 device queues? Look at the impact that this could have.

Please don’t get me wrong… I am not advocating to go really small and create many small LUNs. Neither am I saying you should create a couple of really large LUNs. Try to find the the sweetspot for your environment by taking failure domain (backup restore time), IOps, queues (SIOC) and load balancing options for Storage DRS in to account.

VMware vSphere 5.1 Clustering Deepdive available on Amazon now!

Frank and I published the book this morning and Amazon was extremely fast with getting it up on the website. It is available now:

VMware vSphere 5.1 Clustering Deepdive available at VMworld!

Frank and I had been talking about this for a couple of months, but without mentioning what it was we were working on. The last couple of months we’ve spent our spare time on updating the 5.0 Clustering Deepdive to 5.1.

Although this “just” an update to 5.1, we’ve added a section about stretched clustering to the book and the Storage DRS section has been completely overhauled.  Several new paragraphs were added to the vSphere HA section and we had to do some minor tweaks to the vSphere DRS section. On top of we added a great foreword by Raghu Raghuram!

In the upcoming week the book will be available on Amazon (paper – kindle) and in the Apple iBooks store. As we needed to be careful with publishing it at a certain time/date in some cases it might take a couple of days before it shows up in your “local” online bookstore. If you really can’t wait, it is available now on Createspace.

Again, we have kept the prices low… The e-book will sell for only $ 7.49 (note a surcharge might be added based on location) and the paper copies sells for $ 24.95. It is a bargain if I say so myself. Note that even the paper copy will be available directly from European Amazon stores and so will the ebook.

For those at VMworld, there are copies available at the VMworld store on Tuesday, or maybe even Monday afternoon. Note that there is a limited amount available… if you want a copy I would recommend picking it up soon! If you see Frank or myself walking around and would love to have your book signed, don’t hesitate it is our pleasure! We had the honor of presenting the book to Carl Eschenbach yesterday, I can tell you Carl was thrilled and so are we… P i c k i t u p!

Storage DRS interoperability white paper released

I just noticed I never blogged about a white paper Frank Denneman and I co-authored. The white paper deals about interoperability between Storage DRS and various other products and features. I highly recommend reading it if you are planning on implementing Storage DRS or want to get a better understanding of how Storage DRS interacts with other components of your infrastructure.

Storage DRS interoperability
This document presents an overview of best practices for customers considering the implementation of VMware vSphere Storage DRS in combination with advanced storage device features or other VMware products.

http://www.vmware.com/resources/techresources/10286

An introduction to Storage DRS

Today someone asked for a Storage DRS intro, I wrote one for our book a year ago and figured I would share it with the world. I still feel that Storage DRS is one of the coolest features in vSphere 5.0 and I think that everyone should be using this! I know there are some caveats (1, 2) when you are using specific array functionality or for instance SRM, but nevertheless… this is one of those features that will make an admin’s life that much easier! If you are not using it today, I highly suggest evaluating this cool feature.

*** out take from the vSphere 5.0 Clustering Deepdive ***

vSphere 5.0 introduces many great new features, but everyone will probably agree with us that vSphere Storage DRS is most the exciting new feature. vSphere Storage DRS helps resolve some of the operational challenges associated with virtual machine provisioning, migration and cloning. Historically, monitoring datastore capacity and I/O load has proven to be very difficult. As a result, it is often neglected, leading to hot spots and over- or underutilized datastores. Storage I/O Control (SIOC) in vSphere 4.1 solved part of this problem by introducing a datastore-wide disk-scheduler that allows for allocation of I/O resources to virtual machines based on their respective shares during times of contention.

Storage DRS (SDRS) brings this to a whole new level by providing smart virtual machine placement and load balancing mechanisms based on space and I/O capacity. In other words, where SIOC reactively throttles hosts and virtual machines to ensure fairness, SDRS proactively makes recommendations to prevent imbalances from both a space utilization and latency perspective. More simply, SDRS does for storage what DRS does for compute resources.

There are five key features that SDRS offers:

  • Resource aggregation
  • Initial Placement
  • Load Balancing
  • Datastore Maintenance Mode
  • Affinity Rules

Resource aggregation enables grouping of multiple datastores, into a single, flexible pool of storage called a Datastore Cluster. Administrators can dynamically populate Datastore Clusters with datastores. The flexibility of separating the physical from the logical greatly simplifies storage management by allowing datastores to be efficiently and dynamically added or removed from a Datastore Cluster to deal with maintenance or out of space conditions. The load balancer will take care of initial placement as well as future migrations based on actual workload measurements and space utilization.

The goal of Initial Placement is to speed up the provisioning process by automating the selection of an individual datastore and leaving the user with the much smaller-scale decision of selecting a Datastore Cluster. SDRS selects a particular datastore within a Datastore Cluster based on space utilization and I/O capacity. In an environment with multiple seemingly identical datastores, initial placement can be a difficult and time-consuming task for the administrator. Not only will the datastore with the most available disk space need to be identified, but it is also crucial to ensure that the addition of this new virtual machine does not result in I/O bottlenecks. SDRS takes care of all of this and substantially lowers the amount of operational effort required to provision virtual machines; that is the true value of SDRS.

However, it is probably safe to assume that many of you are most excited about the load balancing capabilities SDRS offers. SDRS can operate in two distinct modes: No Automation (manual mode) or Fully Automated. Where initial placement reduces complexity in the provisioning process, load balancing addresses imbalances within a datastore cluster. Prior to vSphere 5.0, placement of virtual machines was often based on current space consumption or the number of virtual machines on each datastore. I/O capacity monitoring and space utilization trending was often regarded as too time consuming Over the years, we have seen this lead to performance problems in many environments, and in some cases, even result in down time because a datastore ran out of space. SDRS load balancing helps prevent these, unfortunately, common scenarios by making placement recommendations based on both space utilization and I/O capacity when the configured thresholds are exceeded. Depending on the selected automation level, these recommendations will be automatically applied by SDRS or will need to be applied by the administrator.

Although we see load balancing as a single feature of SDRS, it actually consists of two separately-configurable options. When either of the configured thresholds for Utilized Space (80% by default) or I/O Latency (15 milliseconds by default) are exceeded, SDRS will make recommendations to prevent problems and resolve the imbalance in the datastore cluster. In the case of I/O capacity load balancing, it can even be explicitly disabled.

Before anyone forgets, SDRS can be enabled on fully populated datastores and environments. It is also possible to add fully populated datastores to existing datastore clusters. It is a great way to solve actual or potential bottlenecks in any environment with minimal required effort or risk.

Datastore Maintenance Mode is one of those features that you will typically not use often; you will appreciate it when you need. Datastore Maintenance Mode can be compared to Host Maintenance Mode: when a datastore is placed in Maintenance Mode all registered virtual machines, on that datastore, are migrated to the other datastores in the datastore cluster. Typical use cases are data migration to a new storage array or maintenance on a LUN, such as migration to another RAID group.

Affinity Rules enable control over which virtual disks should or should not be placed on the same datastore within a datastore cluster in accordance with your best practices and/or availability requirements. By default, a virtual machine’s virtual disks are kept together on the same datastore.

For those who want more details, Frank Denneman wrote an excellent series about Datastore Clusters which might interest you:

Part 1: Architecture and design of datastore clusters.
Part 2: Partially connected datastore clusters.
Part 3: Impact of load balancing on datastore cluster configuration.
Part 4: Storage DRS and Multi-extents datastores.
Part 5: Connecting multiple DRS clusters to a single Storage DRS datastore cluster.
Part 6: Aggregating datastores from multiple storage arrays into one Storage DRS datastore cluster.

Some other articles that might be of use:

The following video will give an overview of the above mentioned features… worth checking.