A question that pops up on the VMTN Community almost every day is: what size VMFS datastore should I create? The answers always vary; one person says “500GB”, another says “1TB”. The real answer, of course, is: it depends.
In my opinion, most companies can use a simple formula. First, answer these questions:
- What’s the maximum number of VMs you’ve set per VMFS volume?
- What’s the average size of a VM in your environment? (First remove the really large VMs that typically get an RDM.)
If you don’t know what the maximum number of VMs should be, just use a safe number, anywhere between 10 and 15. Here’s the formula I always use:
round((maxVMs * avgSize) + 20%)
I usually use increments of 25GB; this is where the round comes into play. If you end up with 380GB, round it up to 400GB, and if you end up with 321GB, round it up to 325GB. Let’s assume your average VM size is 30GB and your maximum number of VMs per VMFS volume is 10:
(10 * 30) + 20% = 300 + 60 = 360
360 rounded up -> 375GB
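The formula above, including the round-up to 25GB increments, can be expressed as a small Python function. This is a minimal sketch: the function names and the use of an integer percentage for the 20% headroom are illustrative choices, not part of the original formula.

```python
import math

def round_up_gb(size_gb: float, increment_gb: int = 25) -> int:
    # Round *up* to the next multiple of the increment (25GB by default).
    return math.ceil(size_gb / increment_gb) * increment_gb

def datastore_size_gb(max_vms: int, avg_vm_gb: float,
                      headroom_pct: int = 20, increment_gb: int = 25) -> int:
    # Raw capacity needed for the VMs themselves.
    base_gb = max_vms * avg_vm_gb
    # Add the headroom (20% by default); an integer percentage keeps the
    # arithmetic exact for whole-GB inputs.
    with_headroom_gb = base_gb * (100 + headroom_pct) / 100
    return round_up_gb(with_headroom_gb, increment_gb)

print(round_up_gb(380))           # 400
print(round_up_gb(321))           # 325
print(datastore_size_gb(10, 30))  # 375
```

The rounding examples and the 10 x 30GB example reproduce the numbers given in the post.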
Nice post… I also find you need to factor in the I/O of the disk subsystem when working out VMs per LUN. Slower/older subsystems don’t perform as well and should have a lower VM limit (hence lower I/O requirements) than faster/newer disk subsystems. IOMeter comes in handy here to benchmark new systems for a baseline.
Basically, as you say, it’s a black art: some people go with 1 VM per LUN, others go with 2TB LUNs and pile them up… We work in the 500-600GB range per LUN, with some ‘bigger’ LUNs for the larger guests.
Tom Howarth says
On the whole I agree with your statements; however, you have missed another vital component of your equation: IOPS. Without this number your formula will always be a “finger in the air”, albeit a better one than the “500GB or 1TB” stab-in-the-dark argument.
That’s why I wrote “most companies”. I’ve never personally seen an environment where IOPS was the constraint. Of course I normally take this into account, but those are the exceptions, which I usually don’t place on a mixed VMFS volume anyway.
Hany Michael says
Besides I/O, what about SCSI reservations and the effect they can have on a large number of VMs residing on a single LUN shared with many ESX hosts?
Imagine heavy work being done on a number of VMs simultaneously (snapshotting, VMotion, etc.) and ESX locking the hell out of the LUN all the time.
The bigger the LUN, the more VMs it can host… storage is not cheap, and this is a fact we live with every day. I will never leave a precious 300GB of free space out of a 1TB LUN if I feel the LUN is overwhelmed with VM I/O or SCSI conflicts, but I can definitely leave 100GB out of 500-600GB LUNs, which are the sizes I always use in my environment.
Thanks for the simple and straightforward answer to this long-running question. Any thoughts on providing an addendum that addresses datastore sizing over NFS?
Duncan Epping says
Good question, Vaughn. To be honest, I hardly ever touch NFS; I need to think about that.
@Hany: SCSI reservations are less of a concern with vSphere. The average time a SCSI reservation takes has decreased, meaning it’s less of a constraint than it used to be.
I will try to focus on the maximum number of VMs per volume next time, but I will need some time to do some decent research and/or testing. (And our test environment still needs an update, so if HP/Dell/IBM/Fujitsu/SUN is looking for a cool blog to sponsor… hint hint hint) 🙂
Interesting conversation – as we (EMC) have been refreshing some of the best practice guides for vSphere.
We had a lot of debate on this topic (the previous versions recommended ~300-500GB as a max, with a baseline along the lines of the formula you recommend, Duncan).
In the end, capacity is only one of the planning vectors. Those IOPS-limited cases absolutely happen (it depends on the type of VM). SCSI locking/reservation is generally not an issue during steady state, but, as Hany pointed out, it can be when there are many ESX snapshots occurring simultaneously. I’ve done posts on this topic, and Duncan is also right that reservations are MUCH more transient than they used to be (and will continue to improve). Host-side LUN queues can also be a limiter on the number of VMs (I’ve done posts on this one too).
While 10-16 VMs per datastore is safe, it’s also WAY low. The limit follows a certain logic, not only of performance but also of aggregated risk.
So – our current thinking is – a capacity-oriented recommendation is almost impossible to give.
The other thing is that between ESX snapshots and vswap – the extra space is really difficult to predict.
In general, both on performance and capacity, my guidance is:
– spend less time on planning up front than you would on a physical deployment (people are used to storage reconfiguration being DEADLY disruptive)
– plan using standardized building blocks (so, hey, if you want n VMs per datastore and m GB per datastore, fine; just know you can go bigger with things like spanned VMFS)
– monitor both capacity and performance; the new managed datastore objects and storage view reports are nice here.
– use Storage VMotion (svmotion), along with dynamic/non-disruptive array reconfiguration, liberally to resolve point issues.
With all that, are we (am I) making this more complicated than it needs to be?
Exactly my point. But people keep asking the same question, which is why I tried to simplify it. I tend to say a max of 25 VMs per datastore with a mixed I/O pattern. I use the formula to cut down on the overhead that a lot of my customers see. The formula isn’t based on IOPS; hell, why would it be, if that’s usually not the problem? I’ve seen environments running 50 VMs on a 1TB datastore without any issues at all.
Snapshotting is a whole different story. If you’ve got more than 5 snapshots running on one datastore, you need to ask yourself “why?” Am I using the right feature? Should I maybe clone instead of snapshot? It’s not a backup!
Building blocks, yes yellow-bricks… building blocks for virtualization. 🙂
Anyway, I agree: keep it simple. I’ll try to come back to the maximum number of VMs per volume topic later (again, if I can find the time, because I also need to be billable, unfortunately).
“First remove the really large VM’s that typically get an RDM.”
What size VMs would typically get an RDM, and why?
Any VM that has a disk larger than 800GB would qualify for an RDM or a dedicated VMFS volume.
Our team has standardized on 100GB LUNs, and if larger volumes are needed they bind them together in the OS (we use VSFW on Windows).
Everywhere I’ve been reading, people use LUNs much larger than 100GB. Is this the overall volume size or the size of the individual LUNs themselves?
I don’t touch the SAN and don’t really know the pros and cons of LUN sizing. I’m wondering if there are any good resources to learn about this.
I’ve personally only seen LUNs between 300GB and 500GB; only the really large ones are RDMs or single-VMDK VMFS LUNs. Using 100GB LUNs in most of the environments I work for would mean hitting the 256 LUN limit.
I’d agree on the LUN sizes; the most common I see are 300-700GB, and this works for most companies. It’s the old 80/20 rule: 80% (or really more like 90-95%) of the workloads you run in VMs are low to moderate I/O, and they fit into the general design. You can get away with bigger LUNs (1TB+) and more VMs per LUN (30+), but at some point the size of the VMs and overall capacity start to dictate LUN size. My recommendation to most customers is:
– tier your workloads into 2 or 3 tiers
– for the low to moderate VMs that make up 80-90%, a 300-700GB LUN with 10-30 VMs each is usually a safe, happy zone with the easiest management. The size you arrive at here will partly depend on the VM sizes.
– for the tier 1 VMs with higher I/O, go case by case. In this scenario many of the same rules and best practices from the physical world apply to the virtual world. This is where reference architectures and validation test reports for specific apps on specific platforms really come in handy.
Massimo Re Ferre' says
Is there any reason why you suggest that LUNs larger than 800GB be RDMs? Is it for performance? Management? Or something else?
I have always thought/said that from a performance perspective RDM buys you little vs. VMFS, since the overhead is in the virtualization driver/kernel stack rather than in the file system you use (VMFS vs. raw). And from a management perspective I am a big fan of encapsulation.
I would only see RDMs as useful in cases where you have a physical server with tons of data on the SAN that you (for some good reason) don’t want to convert/encapsulate into a VMDK.
Duncan Epping says
It doesn’t need to be an RDM; it can be a single VMFS/VMDK volume as well. I like to isolate the large, heavy-I/O VMDKs to avoid issues and keep as much flexibility as possible. It’s not a requirement…
Massimo Re Ferre' says
However, a big LUN/VMDK doesn’t always mean high I/O activity (think of a rarely accessed backup area, etc.).
Well, I guess that, quoting you, “Now the real answer should be, it depends.”
It depends indeed. There are several reasons for including or excluding these, I would say Capacity Planner input is one of those 🙂
How do people design their VMFS layout at the VM level? That is, do you put all OS disks in one datastore, page files in another and data in another, or do you just keep each VM’s disks together?
At the moment we split our VMs’ VMDKs out, and I’m starting to think we should go back to keeping the disks together for the majority of VMs; it would make admin/scripting/backups/replication so much easier.
I like to keep it together.
1) it’s easier
2) mixed I/O profile
3) in case of SRM/DR, one location for a single VM
Is there any performance loss or gain from having a large VMFS datastore spanning multiple smaller LUNs?
Now that vSphere can virtualize almost any application, more VMs with larger amounts of RAM could impact VMFS design. That means either more memory reservations or more disk space for larger VM swap files.
Assuming most admins will either forget or decide not to make a reservation, couldn’t your formula be adjusted to round(((maxVMs * avgSize) + 20%) + total vRAM)? I know the 20% has accounted for the vRAM until now, but I’d leave it in the formula “just in case.”
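The adjusted formula from this comment can be sketched the same way. This assumes no memory reservations, so each VM’s swap file equals its configured vRAM (on ESX the .vswp file is sized at configured memory minus the reservation); the `avg_vram_gb` input and the function name are hypothetical, not part of the original post.

```python
import math

def datastore_size_with_swap_gb(max_vms: int, avg_vm_gb: float, avg_vram_gb: float,
                                headroom_pct: int = 20, increment_gb: int = 25) -> int:
    # Original formula: VM disk capacity plus 20% headroom.
    with_headroom_gb = max_vms * avg_vm_gb * (100 + headroom_pct) / 100
    # Assumption: no memory reservations, so swap space per VM == configured vRAM.
    total_vram_gb = max_vms * avg_vram_gb
    total_gb = with_headroom_gb + total_vram_gb
    # Round up to the next increment, as in the original formula.
    return math.ceil(total_gb / increment_gb) * increment_gb

# 10 VMs, 30GB average disk, 4GB average vRAM: 360GB + 40GB = 400GB
print(datastore_size_with_swap_gb(10, 30, 4))  # 400
```

If memory reservations are used, the swap term shrinks accordingly, which is why the comment treats it as a “just in case” addition.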
For my systems I use 1TB LUNs, striped across 200 SATA disks in my storage array (VMFS volumes make up less than 1% of my overall storage in terms of both disk space and IOPS). I have 12 VMFS data volumes today (3 for Prod OPS, 5 for QA, 4 for Corp/IT), plus one swap VMFS volume for each environment (primarily to control space provisioning and monitor swap IOPS). All in all, 14.1TB is configured; with thin provisioning, 2.2TB is in use right now. We have about 110 VMs currently. Databases primarily use RDMs, for array-based snapshots/replication between systems. I have decided to mark my first two production VMFS volumes “full”; VMware has provisioned about 1TB between the two, and the extra space is room to grow for existing VMs. Since it is thin provisioned, no space is really wasted; I don’t care if a volume has 400GB of unused capacity, because it really is unused capacity.
IOPS-wise, our production operations VMFS volumes as a whole average 170 IOPS (servers total 40 cores/283GB RAM). The Corp/QA volumes average 248 IOPS (servers total 128 cores/512GB RAM).
I’ve grouped the volumes on the array into storage pools for QA and for corp, so I can monitor/alert on their space usage as a group rather than have to monitor the volume levels individually.
I’m also in the process of migrating all systems to a boot-from-SAN configuration. I have 6 systems booting from SAN today (Fibre Channel); the rest will be converted in the next month or two as we upgrade to vSphere. ESXi v4 is taking 1.5GB of space (so much for that 70MB number they toss around; I hate misleading things, and that number is horribly misleading), and ESX v4 is taking 3.5GB (space is allocated in 16kB increments on the fly as data is written).
Everything is fiber connected.
Whoops, minor correction: 14.1TB configured, 1.82TB used (before RAID), 2.2TB used (after RAID; QA+Corp are RAID 5 5+1, Prod operations RAID 5 3+1).
First of all, sorry for commenting on a really old article (we all hate necromancy), but I just have to ask:
Everyone keeps talking about the need for several LUNs. What if I have a 2TB RAID10 array (4x2TB SATA)? Am I supposed to create 4x500GB LUNs (or whatever LUN size I choose)? Does this matter at all when every LUN is hitting the same spindles anyway?
I guess it’s not a typical scenario, since many use SAS/FC disks, but I’m not in that budget range.
It depends on the queue depth of the array, but in general you will have a per-LUN queue depth, which in this case will make a difference. Also think about SCSI reservations: the more VMs on a single volume, the bigger the chance of SCSI reservation conflicts. So there is a good reason to keep the LUN size down.
Ah! Of course, I didn’t think about SCSI reservations. I’ll contact my array vendor about queue depth settings. Thanks for the help, Duncan!
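As a rough illustration of why splitting the same spindles into several LUNs can matter: each LUN gets its own device queue on the host, so the number of I/Os the host can have outstanding to the array grows with the LUN count. This simplified sketch assumes a per-LUN queue depth of 32 (a placeholder value, not taken from the comment) and ignores the HBA-wide and array-port limits that cap it in practice; check your HBA and array vendor for the real values.

```python
def host_outstanding_io_slots(num_luns: int, per_lun_queue_depth: int) -> int:
    # Each LUN has its own queue on the host, so slots add up per LUN.
    # Simplification: HBA-wide and array target-port queue limits are ignored.
    return num_luns * per_lun_queue_depth

QUEUE_DEPTH = 32  # assumed per-LUN queue depth; a placeholder, not a vendor value

print(host_outstanding_io_slots(1, QUEUE_DEPTH))  # one 2TB LUN: 32
print(host_outstanding_io_slots(4, QUEUE_DEPTH))  # four 500GB LUNs: 128
```

All VMs on a LUN share that LUN’s queue, so the same model also shows why packing more VMs onto one volume leaves fewer queue slots per VM.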
This is an old article, but I thought I would throw in a question. We currently run several ESX clusters, each attached to its own SAN. We are starting to build out a new 15-node ESX cluster with its own dedicated SAN of about 150TB.
With this much space on the back end, I really have to throw all my other standards out the door, i.e. 500GB LUNs and multiple LUNs dedicated to guest OS paging, applications, databases, etc. There is just too much space, and only 255 LUNs I can create.
With probably about 400-500 VMs going into this cluster, I am a bit confused about how to carve up the space. I liked the idea of keeping my guest page files on separate LUNs before, and I like it even more now, because it separates paging I/O from the LUNs the VMs are on, which get snapped daily and expire weekly. I just don’t know if I can do that anymore with this much space and only 255 LUNs to carve.
I am considering whether I can simply put the paging for about 100 VMs (Linux and Windows guests) onto a single LUN and be OK. I am assuming SCSI reservations will be nearly non-existent on a LUN that holds only page files.
So, in summary, with a very large SAN it is sometimes very tough to follow best practices.
I would also welcome any comments on how best to carve up 150TB into fewer than 255 LUNs. Our basic VMs are about 50GB, I suppose; some are smaller, some much larger.
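The arithmetic behind this question can be sketched as follows: the smallest LUN size that covers all capacity within the LUN limit, and how many 50GB VMs such a LUN holds when the 20% headroom from the original formula is inverted. A minimal sketch, assuming binary units (1TB = 1024GB), equally sized LUNs, 25GB rounding and the 255-LUN figure from the comment; the function names are hypothetical.

```python
import math

def min_lun_size_gb(total_gb: float, max_luns: int, increment_gb: int = 25) -> int:
    # Smallest equal LUN size, rounded up to the increment, such that
    # max_luns LUNs together cover total_gb.
    return math.ceil(total_gb / max_luns / increment_gb) * increment_gb

def vms_per_lun(lun_gb: float, avg_vm_gb: float, headroom_pct: int = 20) -> int:
    # Invert the sizing formula: each VM needs avg_vm_gb plus the headroom.
    return math.floor(lun_gb * 100 / (avg_vm_gb * (100 + headroom_pct)))

total_gb = 150 * 1024                    # 150TB in GB (binary units assumed)
lun_gb = min_lun_size_gb(total_gb, 255)  # 625
per_lun = vms_per_lun(lun_gb, 50)        # 10
print(lun_gb, per_lun, math.ceil(500 / per_lun))  # 625 10 50
```

Under these assumptions, 500 VMs of about 50GB would fit on roughly 50 such LUNs, well under the limit.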
Brian Laws says
Hi, Duncan! I know this is an old thread, but I have an updated question on it. How do these recommendations change with VAAI? From what I’ve heard and read, since VAAI replaces SCSI reservations with hardware-assisted locking, LUN locking is no longer an issue. This supposedly eliminates the max-VMs-per-LUN concern. Do you find that this is true? If so, I think I’d rather go with fewer, larger LUNs. I’m also wondering how the new VMFS 5 will impact this discussion.
BTW, we’re on an IBM XIV, so IOPS are spread equally throughout the entire frame. So for me, the amount of IOPS on a single LUN isn’t (supposedly) a concern.
Duncan Epping says
Let me revise this article at some point with a new formula instead of updating it.
Anyway, yes a LUN can host more VMs these days due to the offloading of locking and the fact that the locking mechanism has been improved over time. So what was once relevant in 2009 isn’t relevant today.
Have you revised your formula and placed it somewhere else? Excellent discussion by the way.
Any thoughts on the NFS sizing? Would be interesting to read!
We are implementing NetApp for storage consolidation. We are currently carving out 1TB of space, which will comfortably hold around 30 VMs with good performance. With NetApp dedupe we are seeing a 3:1 ratio, so instead of 30 VMs on 1TB we should be able to place 90 VMs on 1TB. If we did that, how would it affect I/O and SCSI reservation locks? We are using vCenter 4.1. Should we go with smaller LUN sizes? We estimate that 500GB would put us at around 45 VMs per LUN. Would we see any problems at all running 90 Windows 7 desktop VMs per LUN, given that I/O would be reduced by deduplication and by common blocks being served from memory?
I have a VM running Red Hat 6 that is used for an IBM DB2 database. It has 20 VMDK disks of 1TB each, and each 1TB VMDK (Type: Thick Provision Lazy Zeroed) is on a 1TB datastore.
Do you know how much free space should be left on a datastore? If I’m not mistaken, you have to leave 20% free space…