After the post on thin-provisioned disks I had a discussion on block sizes with some of my colleagues. For those who did not read that post, here's a short recap:
If you create a thin-provisioned disk on a datastore with a 1MB block size, the thin-provisioned disk will grow in increments of 1MB. Hopefully you can see where I'm going: a thin-provisioned disk on a datastore with an 8MB block size will grow in 8MB increments. Each time the thin-provisioned disk grows, a SCSI reservation takes place because of metadata changes. As you can imagine, an 8MB block size decreases the number of metadata changes needed, which means fewer SCSI reservations. Fewer SCSI reservations equals better performance in my book.
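To put some numbers on this, here's a quick back-of-the-envelope sketch. This is a simplified illustration of the growth arithmetic only (the function name is mine, and real VMFS allocation behavior is more involved than a straight block count):

```python
# Back-of-the-envelope: each time a thin-provisioned VMDK grows, VMFS
# allocates a new block, and the metadata update involves a SCSI
# reservation. This simplified model just counts block allocations.

def allocation_events(growth_mb: int, block_size_mb: int) -> int:
    """Blocks allocated (rounded up) to grow a thin disk by growth_mb."""
    return -(-growth_mb // block_size_mb)  # ceiling division

# Growing a thin disk by 1GB:
print(allocation_events(1024, 1))  # 1024 allocations on a 1MB volume
print(allocation_events(1024, 8))  # 128 allocations on an 8MB volume
```

Under this model the 8MB volume needs an eighth of the allocation events for the same growth, which is the whole argument in one line.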
As some of you know, the locking mechanism has been improved with vSphere; there's a good reason why they call it "optimistic locking". In other words, why bother increasing your block size if the locking mechanism has improved?
Although the mechanism behaves differently, that does not mean locking no longer needs to occur. In my opinion it's still better to have 1 lock vs 8 locks when a VMDK needs to grow. But there's another good reason: vSphere introduces growable VMFS volumes. You might start with a 500GB VMFS volume and a 1MB block size, but when you expand the volume that block size might not be sufficient for the VMs you create afterwards. Keep in mind that you can't modify the block size, while you just might have given people the option to create disks beyond the limit of the block size. (Mind you: you will receive an error; it's not possible.)
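For reference, on VMFS-3 the block size chosen at creation caps the maximum file size on the volume; a tiny lookup sketch (the 8MB figure is really 2TB minus 512 bytes, rounded here for clarity, and the function name is mine):

```python
# VMFS-3: the block size chosen at datastore creation caps the maximum
# file (VMDK) size on that volume. Values in GB, rounded; the 2048 entry
# is actually 2TB minus 512 bytes.
MAX_FILE_GB = {1: 256, 2: 512, 4: 1024, 8: 2048}

def max_vmdk_gb(block_size_mb: int) -> int:
    """Largest VMDK (in GB) a VMFS-3 volume with this block size allows."""
    return MAX_FILE_GB[block_size_mb]

print(max_vmdk_gb(1))  # 256 -> a 300GB disk request on this volume fails
```

This is why expanding a 1MB-block-size volume doesn't help you: the 256GB per-file ceiling stays, no matter how big the volume gets.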
So what about overhead? Will my 1KB log files all be created in 8MB blocks? Because that would mean a large overhead, and it might be a valid reason to use a 1MB block size!
No, it will not. VMFS-3 solves this issue by offering a sub-block allocator: small files use a sub-block to reduce overhead. A sub-block on a volume with a 1MB block size is 1/16th the size of the block; on an 8MB block size volume it's 1/128th. In other words, the sub-blocks are 64KB in both cases, and thus the overhead is the same in both cases as well.
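The sub-block arithmetic checks out either way; a one-liner to verify (the fractions are from the paragraph above, the function name is mine):

```python
def sub_block_kb(block_size_mb: int) -> int:
    """Sub-block size in KB: 1/16th of a 1MB block, 1/128th of an 8MB block."""
    fraction = {1: 16, 8: 128}[block_size_mb]
    return block_size_mb * 1024 // fraction

print(sub_block_kb(1))  # 64
print(sub_block_kb(8))  # 64 -- same 64KB sub-block, same small-file overhead
```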
Now my question to you guys: what do you think? Would it make sense to always use an 8MB block size? I think it would.
Ken Cline says
I’ve always felt that larger block sizes were more appropriate. VMFS deals with (mostly) large files, so take advantage of that and allocate in large chunks 🙂
Bouke Groenescheij says
The option should be removed from the interface and VMware should always allocate 8MB blocks! I've seen too many cases where customers wanted to grow their VMDK but were limited to 256GB. Totally agree with you!
Roger Lund says
Agreed, I always use the 8 MB block size for my datastore.
Justin says
We used 2MB block size when we upgraded to 3.5. I may consider using 8MB in our next upgrade.
glnsize says
Until I understood how VMFS-3 utilized sub-blocks I always used 1MB. Once it was explained to me how it all worked I switched to 8MB. Sub-block allocation negated my only concern with using the larger block size.
Sid Smith says
I agree; with the changes made as part of vSphere, using an 8MB block size is the logical choice.
jl says
We will wait to see how other people experience thin-provisioned disks performance-wise and whether the SCSI reservations are going to cause problems. Just saying that vSphere has optimized locking technology is not enough; we need a detailed white paper that describes why it is better.
Duncan Epping says
I don’t think you really understood my post… Or I am really confused about what you are trying to say.
craig says
I am just a little curious here: if an 8MB block size is the right setting to go for, why does the block size recommendation from VMware in the administration console always tie a certain block size to a specific datastore size? Any idea? Just to clarify here.
Duncan Epping says
Good question and I honestly don’t know. That’s one of the reasons I asked you guys to chip in…
Sharninder says
8MB all the way. There aren't many files smaller than 64KB on a VMFS volume anyway, and the sub-blocks solve the internal fragmentation problem to a large extent. 8MB sounds like the logical choice for vSphere.
Although, I'm not sure how thin-provisioned disks will behave compared to thick disks in terms of performance.
Satyam Vaghani says
I am a VMware employee and I wrote VMFS with a few cronies, but the following is a personal opinion:
Forget about locking. Period. Yes, SCSI reservations do happen (and I am not trying to defend that here) and there will be some minor differences in performance, but the suggestion in the (very well written) blog post goes against the mission of VMFS, which is to simplify storage virtualization.
Here's a counter-example: if you have a nearly full 8MB VMFS volume and a less full 1MB VMFS volume, you'll still encounter less IO overhead allocating blocks on the 1MB VMFS volume compared to the 8MB volume, because the resource allocator will sweat more trying to find a free block in the nearly full volume. This is just one scenario, but my point is that there are tons of things to consider if one wants to account for overheads in a holistic manner, and the VMFS engineers don't want you to bother with these "tons" of things. Let us handle all that for you.
So in summary, block sizes and thin provisioning should be treated orthogonally. Since thin provisioning is an official feature, the thing for users to know is that it will work "well" on all VMFS block size configurations that we support. Thinking about reservations or the number of IOs the resource manager does, queue sizes on a host vs. the block size, etc. will confuse the user with assertions that are not valid all the time.
I like the post in that it explains blocks vs. sub-blocks. It also appeals to power users, so that's great too. But reservation vs. thin provisioning considerations should be academic only. I can tell you about things like non-blocking retries, optimistic IO (not optimistic locking) and tons of other things that we have done under the covers to make sure reservations and thin provisioning don't belong in the same sentence with vSphere 4. But conversely, I challenge any user to prove that 1MB incurs a significant overhead compared to 8MB with thin provisioning 🙂
Duncan says
This is honestly one of the best replies I ever had on my blog. Thank you very much for your insights! I really appreciate it and I know for sure that all my readers will appreciate it.
canalha says
A valuable discussion, guys.
Question on top of it: won’t VMFS-3 also need a lock to allocate the sub-block?
Tom says
It seems that block size might be important only when one has a lot of I/O issues to think about (big companies, big apps, etc.). The majority of SMBs implementing ESX/vSphere are not going to have gigantic I/O issues, and for them the block size is just one more thing to remember. It makes more sense for everyone to set up their VMs aligned. I created my gold templates on aligned C:\ drives because I was creating new ones anyway, and I align any new partitions I create. I really liked Satyam's comment. It convinced me that I don't have too much to worry about by going with the defaults and making the effort to align disks.
pwallace says
I am not sure I see how Satyam's comments help dismiss disk aligning, but I am new to this, so any explanation would be appreciated.
Anders Olsson says
According to http://kb.vmware.com/kb/1003565 sub blocks don’t exist:
“With a block size of 1MB, every file that is created uses at least 1MB of space on the storage, regardless of its actual size. With an 8MB block size, a 1KB file still occupies 8MB of space. The unused space in that block is wasted. The larger block size is only required when a file is so large that it requires an extended addressing space. Being aware of the intended use helps with your planning and efficient use of space on the data store.”
Golddiggie says
Is the overhead with thin provisioning the reason why some VMs just run better if the drives are thick-provisioned?
I've moved from having all VM drives thin-provisioned to making the C: drive (where the OS is installed, and not much else) thick, with the other drives mostly thin (at least for now). I'm actually going to make my SQL server thick on both drives now, so that there's less overhead and (possibly) better performance on the VM. It would explain why, on slower storage, thin provisioning may not be such a good idea.
Where there’s been plenty of storage, I’ve typically used the block size, on each LUN, that made sense for what the size of the virtual disk would be on the VM’s. If we’re looking at small drives, that easily fit under the 256GB size limit for the 1MB block size, then I use that. I have had VM’s that have (occasionally) needed larger drives, so larger block sizes were used on those LUNs.
I see it really coming down to proper planning for deciding what block size to use. Have a baseline that will work for the majority of the VM’s, but still have some LUNs that are different for when you need to deviate from the standard. It also goes along with the size you make the LUNs… If you’re using 500GB (or less) LUNs, then 1MB (or 2MB) block sizes make a lot more sense than 8MB blocks. If you’re using 2TB LUNs, then the larger block sizes probably make more sense. I would also not make a single LUN the only one of it’s size, with that block size.
Ken says
It looks like the KB (http://kb.vmware.com/kb/1003565) has been fixed. It now reads:
“VMFS3 uses sub blocks for directories and small files with size smaller than 1 MB. When the VMFS uses all the sub block (4096 sub blocks of 64 KB each), file blocks will be used. For files of 1 MB or higher, file blocks are used. The size of the file block depends on the block size you selected when the Datastore was created.”
urs weber says
I’ve one more question concerning block size:
I prefer bigger block sizes too, but how is the changed block tracking feature affected?
Is the changed block tracking also using the VMFS block size?
If yes, bigger blocks means less granularity and so may lead to less compression and less dedup percentage.
Are my suggestions right or is the block size not affecting changed block tracking?
Regards
Urs
David says
I know this is an older article but found it very interesting as I was searching for articles on VMFS versions and block sizes.
We recently moved to ESXi 4.1 from ESX 4.0 and began having issues with backups using vRanger. For performance gains, vRanger used the console in ESX to process and proxy the data before sending it across the network to the repository. In ESXi the console is removed, so this local proxying could not be done. To compensate for the lack of a console, the vRanger folks now recommend running vRanger in a VM and using a technology called Hot Add, where that VM acts as the proxy before sending data across the network to the repository.
Now this is where the block size comes in. It appears that there is a limitation in the VDDK that states the following:
“Hot Add limitation when VMFS block sizes are mismatched
Hot Add cannot be used if the VMFS block size of the datastore containing the virtual machine folder for the target virtual machine does not match the VMFS block size of the datastore containing the proxy virtual machine. For example, if you back up virtual disk on a datastore with 1MB blocks, the proxy must also be on a datastore with 1MB blocks.”
This article can be found in the VDDK release notes
http://www.vmware.com/support/developer/vddk/VDDK-1.2.1-Relnotes.html
This became a big issue when trying to back up larger VMs that were on datastores with different block sizes than the one where the vRanger proxy was running. Hot Add reduced the backup time by a factor of 2-3: a backup that normally would take over 24 hours now finishes in under 10 hours.
Duncan Epping says
Thanks, great info
Sir Mix-A-lot says
“I like big blocks, and I cannot lie”