
Yellow Bricks

by Duncan Epping


Storage

Disk.SchedulerWithReservation aka mClock

Duncan Epping · Jan 23, 2013 ·

A long time ago, when playing around in my lab with vSphere 5.1, I stumbled across an advanced setting called Disk.SchedulerWithReservation. I started digging to see what it was about and what I could do with it… if I could do anything with it at all.
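
If you want to see whether the setting is actually exposed on your own host, you can query it from the ESXi Shell. A minimal sketch, assuming the option lives under the /Disk/ node as the name suggests (output and default value may differ per build):

# List the advanced option and its current value
esxcli system settings advanced list -o /Disk/SchedulerWithReservation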

The description was kind of vague, but it did reveal what this disk scheduler was: it mentioned “mClock”. For those who, unlike me, don’t collect academic papers for night-time reading, mClock is a new disk scheduler being researched by VMware and partners. In contrast to the current scheduler, SFQ, this disk scheduler will allow you to do some more advanced things.

For instance, mClock will allow you to set an IOps reservation on a VM. In other words, when you have a virtual machine that needs 500 IOps guaranteed, mClock will let you do exactly that. Now, I have been digging and asking around, and unfortunately this logic to set reservations has not been implemented in 5.1.

If you are interested in mClock and its benefits, I would recommend reading this academic paper by my colleague Ajay Gulati (one of the leads on DRS, Storage DRS, and SIOC). I find it very interesting and hope it will be fully available sometime soon. And before you ask: no, I don’t know when, or even if, this will ever be available.

Using ESXTOP to check VAAI primitive stats

Duncan Epping · Dec 20, 2012 ·

Yesterday a comment was made about a VAAI primitive on my article about virtual disk types and performance. In this case “write same” was mentioned, and the comment was about how it would not be used when expanding a thin disk or a lazy-zero thick disk. The nice thing is that with ESXTOP you can actually see VAAI primitive stats. For instance “ATS” (locking) can be seen, but also… write same, or “ZERO” as ESXTOP calls it.

If you open up ESXTOP and do the following you will see these VAAI primitive stats:

  • esxtop
  • press “u” (switch to the disk device view)
  • press “f” (change which fields are displayed)
  • press “o” (add the VAAI stats fields)
  • press “enter” (return to the stats view)

The screenshot below shows what that should look like. Nice, right? In this case, 732 blocks were zeroed out using the write-same / ZERO VAAI primitive.

VAAI primitive stats
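
If you want to capture these counters over time rather than watch them interactively, esxtop batch mode can do that, and esxcli can tell you whether a device supports the VAAI primitives in the first place. A rough sketch, assuming you run this from the ESXi Shell (the file name vaai-stats.csv is just an example):

# Capture 12 samples at a 5-second interval in batch mode for later analysis
esxtop -b -d 5 -n 12 > vaai-stats.csv

# Check per-device VAAI primitive support (ATS, Clone, Zero, Delete)
esxcli storage core device vaai status get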

Should I use many small LUNs or a couple large LUNs for Storage DRS?

Duncan Epping · Dec 6, 2012 ·

At several VMUGs where I presented, a question that always came up was the following: “Should I use many small LUNs or a couple of large LUNs for Storage DRS? What are the benefits of either?”

I posted about VMFS-5 LUN sizing a while ago, and I suggest reading that first if you haven’t yet, just to get some idea of the considerations that go into sizing datastores. I guess that article already more or less answers the question… I personally prefer many “small LUNs” over a couple of large LUNs, but let me explain why. As an example, let’s say you need 128TB of storage in total. What are your options?

You could create 2x 64TB LUNs, 4x 32TB LUNs, 16x 8TB LUNs or 32x 4TB LUNs. What would be easiest? Well, I guess 2x 64TB LUNs would be easiest, right? You only need to request 2 LUNs, and adding them to a datastore cluster will be easy. The same goes for the 4x 32TB LUNs… but with 16x 8TB and 32x 4TB the amount of effort increases.

However, that is just a one-time effort. You format them with VMFS, add them to the datastore cluster and you are done. Yes, it seems like a lot of work, but in reality it might take you 20-30 minutes to do this for 32 LUNs. Now, if you take a step back and think about it for a second… why did I want to use Storage DRS in the first place?

Storage DRS (and Storage IO Control for that matter) is all about minimizing risk. In storage, two big risks are hitting an “out of space” scenario or extremely degraded performance. Those happen to be the two pain points that Storage DRS targets. In order to prevent these problems from occurring, Storage DRS will try to balance the environment, when a certain threshold is reached that is. You can imagine that things will be “easier” for Storage DRS when it has multiple options to balance. When you have one option (2 datastores minus the source datastore) you won’t get very far. However, when you have 31 options (32 datastores minus the source datastore), the chances increase of finding the right fit for your virtual machine or virtual disk while minimizing the impact on your environment.

I already dropped the name Storage IO Control (SIOC); this is another feature to take into account. Storage IO Control is all about managing your queues, and you don’t want to do that yourself. Believe me, it is complex, and no one likes queues, right? (If you have Enterprise Plus, enable SIOC!) The reality is, though, that there are many queues between the application and the spindles your data sits on. The question is: would you prefer to have 2 device queues with many workloads potentially queuing up, or would you prefer to have 32 device queues? Look at the impact this could have.
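
To put some very rough numbers on that, here is a back-of-the-envelope sketch, assuming a per-device queue depth of 32 (the actual value depends on your HBA driver and device, so check it on your own host; the naa ID below is just a placeholder):

# Show the maximum queue depth for a given device
esxcli storage core device list -d naa.xxxxxxxxxxxxxxxx | grep -i "queue depth"

# 2 LUNs  x 32 slots =   64 outstanding I/Os towards the array
# 32 LUNs x 32 slots = 1024 outstanding I/Os towards the array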

Please don’t get me wrong… I am not advocating going really small and creating many small LUNs. Neither am I saying you should create a couple of really large LUNs. Try to find the sweet spot for your environment by taking the failure domain (backup/restore time), IOps, queues (SIOC) and load-balancing options for Storage DRS into account.

vSphere Metro Storage Cluster – Uniform vs Non-Uniform

Duncan Epping · Nov 13, 2012 ·

Last week I presented in Belgium at the quarterly VMUG event in Brussels. We did a Q&A and got some excellent questions. One of them was about vSphere Metro Storage Cluster (vMSC) solutions, and more specifically about Uniform vs Non-Uniform architectures. I have written extensively about this in the vSphere Metro Storage Cluster white paper, but realized I never blogged that part. So although this is largely a repeat of what I wrote in the white paper, I hope it is still useful for some of you.

<update>As of 2013 the official required bandwidth is 250Mbps per concurrent vMotion</update>

Uniform Versus Nonuniform Configurations

VMware vMSC solutions are classified in two distinct categories, based on a fundamental difference in how hosts access storage. It is important to understand the different types of stretched storage solutions, because this will impact your design and operational considerations. Most storage vendors have a preference for one of these solutions, so depending on your preferred vendor it could be that you have no choice. The two main categories, as described on the VMware Hardware Compatibility List, are:

  • Uniform host access configuration – When ESXi hosts from both sites are all connected to a storage node in the storage cluster across all sites. Paths presented to ESXi hosts are stretched across distance.
  • Nonuniform host access configuration – ESXi hosts in each site are connected only to storage node(s) in the same site. Paths presented to ESXi hosts from storage nodes are limited to the local site.

We will describe the two categories in depth to fully clarify what both mean from an architecture/implementation perspective.

With the Uniform Configuration, hosts in Datacenter A and Datacenter B have access to the storage systems in both datacenters. In effect, the storage-area network is stretched between the sites, and all hosts can access all LUNs. NetApp MetroCluster is an example of this. In this configuration, read/write access to a LUN takes place on one of the two arrays, and a synchronous mirror is maintained in a hidden, read-only state on the second array. For example, if a LUN containing a datastore is read/write on the array at Datacenter A, all ESXi hosts access that datastore via the array in Datacenter A. For ESXi hosts in Datacenter A, this is local access. ESXi hosts in Datacenter B that are running virtual machines hosted on this datastore send read/write traffic across the network between datacenters. In case of an outage, or operator-controlled shift of control of the LUN to Datacenter B, all ESXi hosts continue to detect the identical LUN being presented, except that it is now accessed via the array in Datacenter B.

The notion of “site affinity”—sometimes referred to as “site bias” or “LUN locality”—for a virtual machine is dictated by the read/write copy of the datastore. For example, when a virtual machine has site affinity with Datacenter A, its read/write copy of the datastore is located in Datacenter A.

The ideal situation is one in which virtual machines access a datastore that is controlled (read/write) by the array in the same datacenter. This minimizes traffic between datacenters and avoids the performance impact of reads going across the interconnect. It also minimizes unnecessary downtime in case of a network outage between sites: if your virtual machine is hosted in Datacenter B but its storage is in Datacenter A, you can imagine the virtual machine won’t be able to do I/O when there is a site partition.

With the Non-uniform Configuration, hosts in Datacenter A have access only to the array in Datacenter A. Nonuniform configurations typically leverage the concept of a “virtual LUN.” This enables ESXi hosts in each datacenter to read and write to the same datastore/LUN. The clustering solution maintains the cache state on each array, so an ESXi host in either datacenter detects the LUN as local. Even when two virtual machines reside on the same datastore but are located in different datacenters, they write locally without any performance impact on either of them.

Note that even in this configuration each of the LUNs/datastores has “site affinity” defined. In other words, if anything happens to the link between the sites, the storage system on the preferred site for a given datastore is the only remaining one that has read/write access to it, thereby preventing any data corruption in the case of a failure scenario. This also means that it is recommended to align virtual machine – host affinity with datastore affinity to avoid any unnecessary disruption caused by a site isolation.

I hope this helps in understanding the differences between Uniform and Non-Uniform configurations. Many more details about vSphere Metro Storage Cluster solutions, including design and operational considerations, can be found in the vSphere Metro Storage Cluster white paper. Make sure to read it if you are considering, or have implemented, a stretched storage solution!

VMFS File Sharing Limits increased to 32

Duncan Epping · Nov 6, 2012 ·

I was reading this white paper about VMware View 5.1 and VMFS file locking today. It mentions the 8-host cluster limitation for VMware View with regard to linked clones and points to VMFS file sharing limits as the cause. While this is true in a way (VMware View 5.1 is indeed limited to 8-host clusters for linked clones on VMFS datastores), the explanation doesn’t cover all the details or reflect the current state of vSphere / VMFS. (Although there is a fair bit of detail in there about VMFS prior to vSphere 5.1.)

What the paper doesn’t mention is that in vSphere 5.1 this “file sharing limit” has been increased from 8 to 32 for VMFS datastores. Cormac Hogan wrote about this a while ago. So, to be clear: VMFS today is fully capable of sharing a file with 32 hosts in a cluster. VMware View doesn’t support that yet unfortunately, but VMware vCloud Director 5.1, for instance, does support it today.

I still suggest reading the white paper, as it does help you get a better understanding of VMFS and View internals!

