VMFS Deepdive Paper

I received this question yesterday about material around VMFS. Of course I pointed them to the white paper Cormac recently published, which is an awesome read. But I also had this other link laying around which I forgot to share with the world. This paper is written by Satyam Vaghani (former VMware Principal Engineer, now CTO of PernixData) and is a true deepdive on VMFS and topics like locking / metadata. Do note that this paper was written in the VMFS 3.x timeframe, and as such some of the concepts might have changed.

Nevertheless, a very interesting read if you ask me! Make sure to pick up a copy here.

Should I use many small LUNs or a couple large LUNs for Storage DRS?

At several VMUGs I presented a question that always came up was the following: “Should I use many small LUNs or a couple of large LUNs for Storage DRS? What are the benefits of either?”

I posted about VMFS-5 LUN sizing a while ago and I suggest reading that first if you haven’t yet, just to get some idea around some of the considerations taken when sizing datastores. I guess that article already more or less answers the question… I personally prefer many “small LUNs” than a couple of large LUNs, but let me explain why. As an example, lets say you need 128TB of storage in total. What are your options?

You could create 2x 64TB LUNs, 4x 32TB LUNs, 16x 8TB LUNs or 32x 4TB LUNs. What would be easiest? Well I guess 2x 64TB LUNs would be easiest right. You only need to request 2 LUNs and adding them to a datastore cluster will be easy. Same goes for the 4x 32TB LUNs… but with 16x 8TB and 32x 4TB the amount of effort increases.

However, that is just a one-time effort. You format them with VMFS, add the to the datastore cluster and you are done. Yes, it seems like a lot of work but in reality it might take you 20-30 minutes to do this for 32 LUNs. Now if you take a step back and think about it for a second… why did I wanted to use Storage DRS in the first place?

Storage DRS (and Storage IO Control for that matter) is all about minimizing risk. In storage, two big risks are hitting an “out of space” scenario or extremely degraded performance. Those happen to be the two pain points that Storage DRS targets. In order to prevent these problems from occurring Storage DRS will try to balance the environment, when a certain threshold is reached that is. You can imagine that things will be “easier” for Storage DRS when it has multiple options to balance. When you have one option (2 datastores – source datastore) you won’t get very far. However, when you have 31 options (32 datastores – source datastore) that increases the chances of finding the right fit for your virtual machine or virtual disk while minimizing the impact on your environment.

I already dropped the name, Storage IO Control (SIOC), this is another feature to take in to account. Storage IO Control is all about managing your queues, you don’t want to do that yourself. Believe me it is complex and no one likes queues right. (If you have Enterprise Plus, enable SIOC!) Reality is though, there are many queues in between the application and the spindles your data sits on. The question is would you prefer to have 2 device queues with many workloads potentially queuing up, or would you prefer to have 32 device queues? Look at the impact that this could have.

Please don’t get me wrong… I am not advocating to go really small and create many small LUNs. Neither am I saying you should create a couple of really large LUNs. Try to find the the sweetspot for your environment by taking failure domain (backup restore time), IOps, queues (SIOC) and load balancing options for Storage DRS in to account.

VMFS File Sharing Limits increased to 32

I was reading this white paper about VMware View 5.1 and VMFS File Locking today. It mentions the 8 host cluster limitation for VMware View with regards to linked clones and points to VMFS file sharing limits as the cause for this. While this is true in a way, VMware View 5.1 is limited to 8 host clusters for linked clones on VMFS Datastores, the explanation doesn’t cover all details or reflect the current state of vSphere / VMFS. (Although there is a fair bit of details in there about VMFS prior to vSphere 5.1.)

What the paper doesn’t mention is that in vSphere 5.1 this “file sharing limit” has been increased from 8 to 32 for VMFS Datastores. Cormac Hogan wrote about this a while ago. So to be clear, VMFS is fully capable today of sharing a file with 32 hosts in a cluster. VMware View doesn’t support that yet unfortunately, but for instance VMware vCloud Director 5.1 does support it today.

I still suggest reading the white paper, as it does help getting a better understanding of VMFS and View internals!

VMware Storage APIs for VM and Application Granular Data Management

Last year at VMworld there was a lot of talk about VMware vStorage APIs for VM and Application Granular Data Management aka Virtual Volumes aka VVOL / VVOLs. The video of this session was just posted up on youtube and I am guessing people will have questions about it after watching it. What I wanted to do in this article is explain what VMware is trying to solve and how VMware intends to solve it. I tried to keep this article as close to the information provided during the session as possible. Note that this session was a Technology Preview, in no shape or form did VMware commit to ever delivering this solution let alone mention a timeline. Before we go any further… if you want to hear more and are attending VMworld, sign up for this session by Vijay Ramachandra and Tom Phelan!

INF-STO2223 – Tech Preview: vSphere Integration with Existing Storage Infrastructure


The Storage Integration effort started with the vSphere API for Array Integration, also known as VAAI. VAAI was aimed to offload data operations to the array to reduce the load and overhead on the hypervisor but more importantly to allow for greater scale and better performance. In vSphere 5.0 the vSphere Storage APIs for Storage Awareness (aka VASA) were introduced which allowed for an out-of-band communications channel to discover storage characteristics. For those who are interested in VASA, I would like to recommend reading Cormac’s excellent article where he explains what it is and he shows how VMware partners have implemented it.

Although these APIs have bridged a huge gap they do not solve all of the problems customers are facing today.

What is VMware trying to solve?

In general VMware is trying to increase agility and flexibility of the VMware storage stack through providing a general framework where any current and future data operations can be implemented with minimal effort for both VMware and partners. Customers have asked for a solution which allows them to differentiate services to their customers on a per application level. Currently when provisioning LUNs, typically large LUNs, this is impossible.

Another area of improvement is granularity. For instance, it is desired to have per VM level fail-over or for instance to allow deduplication on a per VMDK level. This is currently impossible with VMFS. A VMFS volume is usually a LUN and data management happens at a LUN / Volume granularity. In other words a LUN is the level at which you operate from a storage perspective but this is shared by many VMDKs or VMs which might have different requirements.

As stated in mentioned in last years VMworld presentation the currently wish list is:

  1. Ability to offload to storage system on a per VMDK level
  2. Snapshots / cloning / replication / deduplication etc
  3. A framework where any current or future storage system operation can be leveraged
  4. No disruption to the existing VM creation workflows
  5. Highly scalable

These 4 should maximize your ROI on hardware investment, reduce operational effort associated with storage management and virtual machine deployment. It will also allow you to enforce application level SLAs by specifying policies on a VMDK or VM level instead of a datastore level. The granularity that it will allow for is in my opinion the most important part here!

How does VMware intend to solve it?

During the research phase many different options were looked at. Many of these however did not take full advantage of the capabilities of the storage system and they introduced more complexity around data layout. The best way of solving this problem is leveraging well known objects… volumes / LUNs.

These objects are referred to as VM Volumes, but also sometimes referred to as vVOLs. A VM Volume is a VMDK (or it derivative) stored natively inside a storage system. Each VM Volume will have a representative on the storage system. By creating a volume for each VMDK you can set policies on the lowest possible level. Not only that, the SAN vs NAS debate is over. This however does implies that when every VMDK is a storage object there could be thousands of VM Volumes. Will this require a complete redesign of storage systems to allow for this kind of scalability? Just think about the current 256 LUNs per host limit for instance. Will this limit the amount of VMs per host/cluster?

In order to solve this potential problem a new concept is introduced which is called an “IO De-multiplexer” or “IO Demux”. This is one single device which will exist on a storage system and it represents a logical I/O channel from the ESXi hosts to the entire storage system. Multi-pathing and path policies will be defined on a per IO Demux basis, which typically would be once. Behind this IO Demux device there could be 1000s of VM volumes.

This however introduces a new challenge. Where in the past the storage administrator was in control, now the VM administrator could possible create hundreds of large disks without discussing it with the storage admin. To solve this problem a new concept called Capacity Pools is introduced. A Capacity Pool is an allocation of physical storage space and a set of allowed services for any part of that storage space. Services could be replication, cloning, backup etc. This would be allowed until the set threshold is exceeded. It is the intention to allow Capacity Pools to span multiple storage systems and even across physical sites.

In order to allow to set specific QoS parameters another new concept is introduced called Profiles. Profiles are a set of QoS parameters (performance and data services) which apply to a VM Volume, or even a Capacity Pool. The storage administrator can create these profiles and assign these to the Capacity Pools which will allow the tenant of this pool to assign these policies to his VM Volumes.

As you can imagine this shifts responsibilities within the organization between teams, however it will allow for greater granularity, scale, flexibility and most importantly business agility.


Many customers have found it difficult to manage storage in virtualized environments. VMFS volumes typically contain dozens of virtual machines and VMDKs making differentiation on a per application level very difficult. VM Volumes will allow for more granular data management by leveraging the strength of a storage system, the volume manager. VM Volumes will simplify data and virtual infrastructure management by shifting responsibilities between teams and removing multiple layers of complexity.

LUN sizes in the new Storage World

I am on a holiday and catching up on some articles I had saved that I still wanted to read. I stumbled on an article about sizing VMFS volumes by  Ravi Venkat (Pure Storage) on flash based arrays. I must say that Ravi has a couple of excellent arguments around the operational and architectural simplicity of these new types of arrays and I do strongly believe that indeed it makes the world a lot easier.

IOps requirements, indeed forget about them when you have thousands at your disposal… And indeed your raid penalty also doesn’t really matter anymore, especially as many of these new storage arrays also have new types of raid-levels. Great right?

Yes in most cases this is great news! One thing to watch out for though is the failure domain. Meaning that if you create a large 32TB volume with hundreds of virtual machines the impact would be huge if this volume for whatever reason blows up. Not only the impact of the failure itself but also the RTO aka “Recovery Time Objective” would be substantially longer. Yes the array might be lightning fast, but your will and probably are limited by your backup solution. How long will it take to restore those 32TBs? Have you ever done the math?

It isn’t too complicated to do the math, but I would strongly suggest to test it! When I was an admin we had a clearly defined RTO and RPO. We tested these every once in a while, and even though we were already using tapeless backups, it still took a long time to restore 2TB.

Nevertheless, I do feel that Ravi points out the “hidden value” of these types of storage architectures. Definitely something to take in to account when you are looking for new storage… I am wondering how many of you are already using flash based solutions, and how you do your sizing.