I had a question about my old VMFS LUN sizing article from back in 2009… The question was how valid the formula and values used there still are in today's environment, especially considering VMFS-5 is around the corner. It is a very valid question, so I decided to take my previous article and rewrite it. One thing to keep in mind though is that I tried to make it usable for generic consumption; you will still need to figure some things out yourself, as I simply don't have all the information needed to make it cookie-cutter, but I guess this is as close as it can get.
Parameters:
MinSize = 1.2GB
MaxVMs = 40
SlackSpace = 20%
AvgSizeVMDK = 30GB
AvgDisksVMs = 2
AvgMemSize = 3GB
Before I drop the formula I want to explain the MaxVMs parameter. You will need to figure out how many IOps your LUN can handle first; for a hint, check this article. But besides IOps you will also need to take burst headroom into account, and of course the RTO defined for this environment:
((IOpsPerLUN – 20%) / AVGIOpsPerVM) ≤ (MaxVMsWithinRTO)
Keep in mind that the article I pointed to a second ago is geared towards worst-case numbers, so no cache or other benefits are taken into account. Secondly, I subtracted 20%, which is room for bursting. This is by no means a best practice, and this number will need to be tweaked based on the size of your LUN and the total amount of IOps your LUN can handle. For instance, when you are using 8 SATA spindles that 20% might only be 80 IOps, depending on the RAID level used; in the case of SAS it could be 280 IOps with just 8 spindles, and that is a huge difference. Anyway, I leave that up to you to decide, but I used 20% headroom for both disk space (for snapshots and the memory overhead swap files) and performance, just to keep it simple. The second part of this one is MaxVMsWithinRTO. In short, make sure that you can recover the number of VMs on the datastore within the defined recovery time objective (RTO). You don't want to find yourself in a situation where the RTO is 4 hours but the total time needed for the restore is 24 hours.
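Expressed as a quick Python sketch, just to make the calculation concrete (all input values below are made-up examples, not recommendations, so plug in your own numbers):

# Minimal sketch of the MaxVMs calculation described above.
iops_per_lun = 1250        # worst-case IOps the LUN can deliver (example value)
burst_headroom = 0.20      # the 20% reserved for bursting, tune per array/spindles
avg_iops_per_vm = 25       # average IOps generated per VM (example value)
max_vms_within_rto = 45    # max number of VMs you can restore within the RTO (example value)

max_vms_by_iops = int((iops_per_lun * (1 - burst_headroom)) / avg_iops_per_vm)
max_vms = min(max_vms_by_iops, max_vms_within_rto)
print(max_vms)             # 40 in this example: limited by IOps, within the RTO cap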
Formula, aaahhh yes, here we go. Note that I did not take the traditional constraints around "SCSI reservation conflicts" into account, as with VMFS-5 and VAAI SCSI Locking Offload these are lifted. If you have an array which doesn't support the ATS primitive, make sure you take this into account as well. Although the SCSI locking mechanism has been improved over the last few years, it could still limit you when you have a lot of power-on events, vMotion events, and so on.
(((MaxVMs * AvgDisksVMs) * AvgSizeVMDK) + (MaxVMs * AvgMemSize)) + SlackSpace ≥ MinSize
Let's use the numbers defined in the parameters above and do the math:
(((40 * 2) * 30GB) + (40 * 3GB)) + 20% = (2400GB + 120GB) * 1.2 = 3024 GB
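The same math as a quick Python sketch, using nothing but the example parameters listed above:

# Minimal sketch of the datastore sizing formula with the example parameters from this post.
max_vms = 40
avg_disks_per_vm = 2
avg_vmdk_size_gb = 30
avg_mem_size_gb = 3
slack_space = 0.20         # 20% headroom for snapshots and swap files
min_size_gb = 1.2          # VMFS metadata minimum

datastore_gb = ((max_vms * avg_disks_per_vm * avg_vmdk_size_gb)
                + (max_vms * avg_mem_size_gb)) * (1 + slack_space)
datastore_gb = max(datastore_gb, min_size_gb)
print(datastore_gb)        # 3024.0 GB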
I hope this helps you make your storage design decisions. One thing to keep in mind, of course, is that most storage arrays have optimal configurations for LUN sizes in terms of performance. Depending on your IOps requirements you might want to make sure these align.
Paul says
What is the MinSize parameter?
Gabrie van Zanten says
I think that is the 1.2GB, which is the minimum size of a VMFS volume. The formula just checks that you are not below that minimum.
Duncan Epping says
I should have explained that indeed. VMFS requires a minimum size of 1.2GB for the metadata.
Jason Langdon says
Hi Duncan,
Has there been any update as to when your new book "vSphere 5.0 Clustering Technical Deepdive" will be available in ebook formats other than the Kindle?
Thanks,
Jason
Duncan says
No, there has been no update. I don't know if we will ever release other formats. We experienced a lot of layout issues and I don't have the bandwidth to make the required changes.
Jason Boche says
Good stuff as usual Duncan. Have a great weekend Dutchie!
Dr.Satyanarayana says
greencloudcomputingtechnologies.weebly.com/
Manoray says
Nice Mustache Dr. Satyanarayana!
Garret Black says
The new vSphere 5 size limits are going to make things interesting for enterprise SANs where you are carving your datastore LUNs from a disk pool dedicated to the VMware cluster. The only reason I can see to make LUNs smaller than the 64TB limit is to spread the load onto additional controller processors. With that being said, it would make sense (unless I'm missing something) to divide your disk pool capacity by the number of controller processors to get your datastore size.
example: (disk pool capacity – 10% free space) / total processors = datastore size
Real-world example (AMS2500 with a RAID5 DP of 185 15K SAS disks): (77.3TB – 7.73TB) / 4 = 17.39TB. So in this case I would likely make the datastores 16TB to keep some additional free space in the disk pool.
Does that make sense?
Clustor says
That depends, I think.
I can imagine that although the VAAI features would allow for more VMs per datastore, you still have to take the queue depth limits for those LUNs into account, right?
I have a lot of customers who could fit their entire environment in a single 5-10TB LUN. That would not be that good for performance, queue-depth wise. That has not changed with VMFS-5, I think.
The more hosts in a cluster, the less of an issue this will be for the host-side queues, but array-side queues at the LUN level are present on most arrays as well.