At several VMUGs a question that always came up was the following: “Should I use many small LUNs or a couple of large LUNs for Storage DRS? What are the benefits of either?”
I posted about VMFS-5 LUN sizing a while ago and I suggest reading that first if you haven’t yet, just to get an idea of the considerations that go into sizing datastores. I guess that article already more or less answers the question… I personally prefer many “small LUNs” over a couple of large LUNs, but let me explain why. As an example, let’s say you need 128TB of storage in total. What are your options?
You could create 2x 64TB LUNs, 4x 32TB LUNs, 16x 8TB LUNs or 32x 4TB LUNs. What would be easiest? Well, 2x 64TB LUNs, I guess: you only need to request 2 LUNs, and adding them to a datastore cluster will be easy. The same goes for the 4x 32TB LUNs… but with 16x 8TB and 32x 4TB the amount of effort increases.
However, that is just a one-time effort. You format them with VMFS, add them to the datastore cluster and you are done. Yes, it seems like a lot of work, but in reality it might take you 20-30 minutes to do this for 32 LUNs. Now take a step back and think about it for a second… why did I want to use Storage DRS in the first place?
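The carve-up arithmetic above is trivial, but a quick sketch makes the trade-off easy to eyeball for any capacity. This is purely illustrative Python; the function name and values are hypothetical, not anything vSphere-specific:

```python
# Illustrative only: ways to carve a fixed capacity into equally sized LUNs.

def lun_options(total_tb, lun_counts):
    """Return {lun_count: size_per_lun_in_tb} for a given total capacity."""
    return {n: total_tb / n for n in lun_counts}

options = lun_options(128, [2, 4, 16, 32])
for count, size in options.items():
    print(f"{count} LUNs x {size:g}TB")
# 2 LUNs x 64TB, 4 LUNs x 32TB, 16 LUNs x 8TB, 32 LUNs x 4TB
```

Same total capacity every time; what changes is how many balancing targets and device queues you end up with, which is the point of the rest of this article.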
Storage DRS (and Storage IO Control for that matter) is all about minimizing risk. In storage, two big risks are hitting an “out of space” scenario or severely degraded performance. Those happen to be the two pain points that Storage DRS targets. To prevent these problems from occurring, Storage DRS will try to balance the environment, when a certain threshold is reached that is. You can imagine that things will be “easier” for Storage DRS when it has multiple options to balance. When you have one option (2 datastores minus the source datastore) you won’t get very far. However, when you have 31 options (32 datastores minus the source datastore), that increases the chances of finding the right fit for your virtual machine or virtual disk while minimizing the impact on your environment.
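To make the “more options” point concrete, here is a grossly simplified sketch of picking a destination from all datastores except the source. Real Storage DRS weighs both space utilization and IO metrics; this hypothetical helper only looks at free space, and all names and numbers are made up:

```python
# Oversimplified illustration: more candidate datastores = better odds of a good fit.

def candidate_targets(datastores, source):
    """Every datastore except the one we are moving from."""
    return [ds for ds in datastores if ds != source]

def best_fit(free_space_gb, source):
    """free_space_gb: {datastore_name: free GB}. Pick the non-source datastore
    with the most free space. (Storage DRS also considers IO load; this doesn't.)"""
    candidates = candidate_targets(list(free_space_gb), source)
    return max(candidates, key=free_space_gb.get)

pool = {"ds01": 120, "ds02": 800, "ds03": 450}
print(best_fit(pool, "ds01"))  # -> ds02
```

With 2 datastores there is exactly 1 candidate, take it or leave it; with 32 datastores there are 31, so a placement that fits well is far more likely to exist.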
I already dropped the name Storage IO Control (SIOC); this is another feature to take into account. Storage IO Control is all about managing your queues, and you don’t want to do that yourself. Believe me, it is complex, and no one likes queues, right? (If you have Enterprise Plus, enable SIOC!) The reality is, though, that there are many queues between the application and the spindles your data sits on. The question is: would you prefer to have 2 device queues with many workloads potentially queuing up, or would you prefer to have 32 device queues? Look at the impact that this could have.
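The queue argument can be made concrete with some back-of-the-envelope arithmetic. Assuming, purely as an example, a per-device queue depth of 32 (actual defaults vary per HBA, driver and protocol, so check your own environment), the number of IOs the host can have outstanding scales with the number of LUNs:

```python
# Illustrative only: aggregate device queue slots across LUNs.
# The per-device depth of 32 is an assumed example value, NOT a universal
# default; real values depend on HBA/driver settings and storage protocol.

PER_DEVICE_QUEUE_DEPTH = 32  # assumption for this sketch

def aggregate_queue_slots(num_luns, depth=PER_DEVICE_QUEUE_DEPTH):
    """Total outstanding IO slots across all device queues."""
    return num_luns * depth

for luns in (2, 32):
    print(f"{luns} LUNs -> {aggregate_queue_slots(luns)} outstanding IO slots")
# 2 LUNs -> 64 slots, 32 LUNs -> 1024 slots
```

Under that assumption, 2 large LUNs give you 64 slots for all workloads to contend over, while 32 smaller LUNs give you 1024; the same aggregate capacity behaves very differently under load.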
Please don’t get me wrong… I am not advocating going really small and creating many tiny LUNs. Neither am I saying you should create a couple of really large LUNs. Try to find the sweet spot for your environment by taking the failure domain (backup/restore time), IOps, queues (SIOC) and load-balancing options for Storage DRS into account.
Rob Bergin says
I always have such a different take on features (one of the great things about VMware is that folks can do different things with it, it’s flexible).
I always envisioned Storage DRS as being about different arrays, different controllers or different costs of storage. I guess the same thing can be said for different LUNs, but I felt Storage DRS allowed the ability to have different storage targets (and yes, LUNs are different): an iSCSI/SATA pool, an iSCSI/SAS pool, a 4/8 Gb FC/SAS pool, a 10 GbE NFS/SAS pool and an SSD pool or two.
And then, based on business use, cost of the storage and IO requirements, the VM workloads could move around, and the business could avoid spending money on expensive storage for VMs that don’t need it, or fit the storage to the purpose of the VM.
Duncan Epping says
I am not sure I am following you.
Storage DRS is not designed to handle various tiers in a single datastore cluster. In other words, if you place various tiers (sata / fc / ssd) in a single datastore cluster it is difficult to control what ends up where.
Julian Wood says
Hi Duncan, good discussion. I think the question that also needs to be asked is what pool of physical disks the LUNs ultimately sit on, and whether this is part of the same performance aggregate or pool. There is far less benefit in using Storage IO Control to load balance IO across LUNs ultimately backed by the same physical disks than across separate physical storage pools. And when you are thin provisioning the LUNs on the same disk pool behind the scenes, even initial placement means less when it is all the same storage.
Duncan Epping says
I agree that in a scenario like this there is more to be discussed. But I am not sure I am following you with regards to SIOC and multiple spindles.
Even if you have 1 large pool of disks shared by 10 LUNs, SIOC will find the workload causing the problems, when it is a virtual workload on a SIOC-enabled volume, and throttle it. SIOC is not about load balancing; it is about throttling and fairness of scheduling.
If none of the virtual workloads is causing this latency, then SIOC will simply back off. If one of your virtual workloads is causing the latency, then SIOC will throttle it and ensure that at least your virtual workloads get the fair share they deserve.
Julian Wood says
Hey Duncan, yes you are right (of course), and SIOC works on a queue-based mechanism regardless of the underlying physical layout. Glad I’m learning more today! Although it is best to turn it on for all pools and associated datastores backed by the same disks, it measures latency and will apply the throttling to the VMs even if datastores not under SIOC are backed by the same disks.
Thanks for the discussion!
Jonathan Meier says
I agree with Duncan that it is all about finding the sweet spot. Julian’s comment about the storage backend is extremely valid. I find the additional overhead of having more LUNs to be cost-beneficial to the storage backend.
Frank Denneman says
My point of view: SIOC on datastores backed by a single datapool vmwa.re/1a8
Frank Denneman says
I agree with Duncan, even if the datastores are backed by the same spindles, you want to intelligently manage your queue-depths and the virtual machine workload. With SIOC you can prioritize workload running on the datastore across hosts. I posted a more extensive explanation on my own blog: SIOC on datastores backed by a single datapool: http://frankdenneman.nl/sioc/sioc-on-datastores-backed-by-a-single-datapool/
Angelo says
Thanks Duncan, however how does SIOC come into play with devices like a VNX with FAST? My understanding is you shouldn’t enable SIOC in this case, and SDRS should be set to manual, which kind of defeats the purpose.
Duncan Epping says
Not sure what SIOC has got to do with FAST? It is fully supported by EMC (http://virtualgeek.typepad.com/virtual_geek/2010/07/vsphere-41-sioc-and-array-auto-tiering.html). SIOC is all about managing queues for short-term bursts! FAST is all about handling hot blocks, more long term.
So don’t worry in the case of FAST, just enable it.
Angelo says
Sorry Duncan, I was referring to the VMware vSphere Storage DRS™ Interoperability doc, pg. 6, Array-Based Auto-Tiering: “VMware recommends configuring Storage DRS in manual mode with I/O metric disabled.”
Duncan says
Which doesn’t mean SIOC doesn’t work, just that moving VMs based on IO metrics might be counterproductive if FAST also moves blocks.
Angelo says
Understood, but if we are trying to get the best performance, then SIOC should be disabled in this type of scenario? If not, then there has got to be a better guideline. The reason I’m asking is that both the VMware and EMC VNX docs just say disable it and set it to manual, which to me is not the way I want to go.
Duncan says
Not sure which doc you are referring to, but I have never seen the recommendation to disable SIOC. I have seen the recommendation to disable IO load balancing for Storage DRS; these are two different things!
Patrick says
SIOC works well until you use an HP LeftHand array, which can’t separate VMware LUNs from raw iSCSI LUNs.
Drew Henning says
Duncan, thanks for sharing your thoughts on datastore sizing. I’ve been thinking about this lately in my environment.
Just curious if you have the same feelings for NAS vs block?
I agree with taking into account the pool of disks the LUN/volume resides on. But it seems to me there are fewer queues for vSphere to account for with NFS. (Maybe I’m wrong.)
Also, with NetApp, the datastore/volume is the de-dupe boundary. Datastore sizing can have an effect on de-dupe rates and data inflating/deflating as you move between datastores/volumes.