I’m an outspoken person, as most of you will have noticed by now, but I’m also open to discussion, and that’s why I particularly like blogging. Every now and then a good discussion starts based on one of my blog articles. (Or on any other blogger’s article, for that matter.) These discussions usually start in the form of a comment on an article, but also via email or Twitter; even within VMware some of my articles have been discussed extensively.
A couple of weeks ago I voiced my opinion about VMFS block sizes and growing your VMFS. Growing your VMFS is a new feature introduced with vSphere. In the article I stated that a large block size, 8MB, would be preferable because it would result in less locking when using thin provisioned disks.
If you create a thin provisioned disk on a datastore with a 1MB block size, the thin provisioned disk will grow in increments of 1MB. Hopefully you can see where I’m going: a thin provisioned disk on a datastore with an 8MB block size will grow in 8MB increments. Each time the thin provisioned disk grows, a SCSI reservation takes place because of metadata changes. As you can imagine, an 8MB block size will decrease the number of metadata changes needed, which means fewer SCSI reservations. Fewer SCSI reservations equals better performance in my book.
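To put some rough numbers on that reasoning, here’s a back-of-the-envelope sketch in plain Python (nothing VMware ships, and the 10GB growth figure is just an example I picked):

```python
# Back-of-the-envelope sketch (not a VMware tool): how many allocation
# events, each triggering a metadata update, does it take to grow a
# thin provisioned disk by 10GB at each supported block size?

GROWTH_GB = 10  # hypothetical growth, purely for illustration

for block_size_mb in (1, 2, 4, 8):
    allocations = GROWTH_GB * 1024 // block_size_mb
    print(f"{block_size_mb}MB block size: {allocations} allocations")

# Output:
# 1MB block size: 10240 allocations
# 2MB block size: 5120 allocations
# 4MB block size: 2560 allocations
# 8MB block size: 1280 allocations
```

Eight times fewer allocation events with an 8MB block size, which is where my original reasoning came from.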
As a consultant I get a lot of questions on VMFS locking, and with the understanding I had at the time I assumed that a larger block size would be beneficial in terms of performance. I’m no scientist or developer; I rely on the information I find on the internet, in manuals, in course material and on the occasional internal mailing list… In this case that information wasn’t correct, or rather, it hadn’t yet been updated for the changes that vSphere introduced. Luckily for me, and for you guys, one of my colleagues jumped in to give us some good insights:
I am a VMware employee and I wrote VMFS with a few cronies, but the following is a personal opinion:
Forget about locking. Period. Yes, SCSI reservations do happen (and I am not trying to defend that here) and there will be some minor differences in performance, but the suggestion in the (very well written) blog post goes against the mission of VMFS, which is to simplify storage virtualization.
Here’s a counter example: if you have a nearly full 8MB VMFS volume and a less full 1MB VMFS volume, you’ll still encounter lower IO overhead allocating blocks on the 1MB VMFS volume compared to the 8MB volume, because the resource allocator will sweat more trying to find a free block in the nearly full volume. This is just one scenario, but my point is that there are tons of things to consider if one wants to account for overheads in a holistic manner, and the VMFS engineers don’t want you to bother with these “tons” of things. Let us handle all that for you.
So in summary, block sizes and thin provisioning should be treated orthogonally. Since thin provisioning is an official feature, the thing for users to know is that it will work “well” on all VMFS block size configurations that we support. Thinking about reservations, the number of IOs the resource manager does, queue sizes on a host vs. the block size, etc. will confuse the user with assertions that are not valid all the time.
I like the post in that it explains blocks vs. sub-blocks. It also appeals to power users, so that’s great too. But reservation vs. thin provisioning considerations should be academic only. I can tell you about things like non-blocking retries, optimistic IO (not optimistic locking) and tons of other things that we have done under the covers to make sure reservations and thin provisioning don’t belong in the same sentence with vSphere 4. But conversely, I challenge any user to prove that 1MB incurs a significant overhead compared to 8MB with thin provisioning.
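To make Satyam’s counter-example a bit more tangible, here’s a toy model I threw together. It uses naive random probing, which is nothing like the real VMFS resource allocator, so treat it as illustration only:

```python
import random

# Toy model of the counter-example above (NOT how the real VMFS
# allocator works): with naive random probing, how many probes does
# it take on average to land on a free block at a given fullness?

def average_probes(fullness, trials=10_000):
    probes = 0
    for _ in range(trials):
        while random.random() < fullness:  # probe hit an allocated block
            probes += 1
        probes += 1  # the final probe that found a free block
    return probes / trials

for fullness in (0.50, 0.90, 0.99):
    print(f"{fullness:.0%} full: ~{average_probes(fullness):.0f} probes per allocation")

# Roughly: 50% full: ~2 probes, 90% full: ~10, 99% full: ~100
```

Even in this grossly simplified model, how full the volume is dominates the per-allocation cost; the block size doesn’t even appear in the equation.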
Does this mean that I would not pick an 8MB block size over a 1MB block size any more?
Not exactly, but it will depend on the customer’s specific situation. My other reason for picking an 8MB block size was growing VMFS volumes. If you grow a VMFS volume, the reason is probably that you need to grow a VMDK. If the VMDK needs to grow beyond the maximum file size, which is dictated by the chosen block size, you would need to move the VMDK (with Storage VMotion or a cold migration) to a different datastore. But if you had selected an 8MB block size when you created the VMFS volume, you would not be in this position. In other words, I would still prefer a larger block size, but this is based on flexibility in terms of administration and not on performance or possible locking issues.
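For reference, these are the well-known VMFS-3 maximum file sizes per block size, wrapped in a quick helper of my own (the numbers are the documented limits; the function itself is just my sketch):

```python
# My own quick helper, not a VMware tool: the VMFS-3 maximum file
# size per block size, and the smallest block size that still fits
# a given VMDK. (The 8MB maximum is actually 2TB minus 512 bytes.)

MAX_FILE_SIZE_GB = {1: 256, 2: 512, 4: 1024, 8: 2048}

def minimum_block_size_mb(vmdk_size_gb):
    for block_mb, max_gb in sorted(MAX_FILE_SIZE_GB.items()):
        if vmdk_size_gb <= max_gb:
            return block_mb
    raise ValueError("VMDK does not fit on any VMFS-3 block size")

print(minimum_block_size_mb(300))   # 2 -> a 300GB VMDK needs at least a 2MB block size
print(minimum_block_size_mb(1500))  # 8 -> growing past 1TB forces an 8MB block size
```

Pick a 1MB block size today and a VMDK that later needs to grow past 256GB forces a migration; pick 8MB up front and that ceiling effectively disappears.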
I want to thank Satyam for his very useful comment. Thanks for chipping in!