I’ve seen this myth floating around from time to time, and as I have never publicly written about it I figured it was time for an article to debunk it. The question that is often posed is whether thin disks will hurt performance due to fragmentation of the blocks allocated on the VMFS volume. I guess we need to rehash some basics first around Thin Disks and VMFS volumes (do a search on VMFS for more info)…
When you format a VMFS volume you can select the blocksize (1MB, 2MB, 4MB or 8MB). This blocksize is used when the hypervisor allocates storage for VMDKs. So when you create a VMDK on an 8MB formatted VMFS volume it will build that VMDK out of 8MB blocks, and yes, in the case of a 1MB formatted VMFS volume it will use 1MB blocks. Now this blocksize also happens to be the size of the extent that is used for Thin Disks. In other words, every time your thin disk needs to expand it will grow in extents equal to the blocksize, so 1MB in that example. (Related to that, with a lazy-zeroed thick disk the zero-out also uses the blocksize. So when something needs to be written to an untouched part of the VMDK, it will zero out a region the size of the VMFS blocksize.)
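To put some numbers on that, here is a quick back-of-the-envelope sketch in Python (purely illustrative, not anything VMware ships) of how many extent allocations a thin VMDK would need to absorb the same amount of new data at different blocksizes:

```python
# Rough illustration (not VMware code): how many extent allocations a thin
# VMDK needs to absorb a given amount of new data, per VMFS blocksize.
def extent_allocations(new_data_mb, vmfs_block_mb):
    """A thin disk grows one VMFS-block-sized extent at a time."""
    return -(-new_data_mb // vmfs_block_mb)  # ceiling division

for block_mb in (1, 2, 4, 8):
    print(f"{block_mb}MB blocksize: {extent_allocations(500, block_mb)} "
          f"extent allocations to grow a thin VMDK by 500MB")
```

Smaller blocksizes simply mean more, smaller growth operations for the same amount of new data, which is where the fragmentation concern comes from.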
So does using a thin disk in combination with a small blocksize cause more fragmentation? Yes, quite possibly it does. However, the real question is whether it will hurt your performance. The answer to that is: no, it won’t. The reason is that the VMFS blocksize is totally irrelevant when it comes to Guest OS I/O. So let’s assume you have a regular Windows VM and this VM is issuing 8KB reads and writes against a volume formatted with a 1MB blocksize. The hypervisor won’t fetch 1MB, as that could cause substantial overhead; it requests from the array exactly what the Guest OS requested, and the array serves it up however it is configured to. I guess what people are worried about the most is sequential I/O, but think about that for a second or two. How sequential is your I/O when you are looking at it from the array’s perspective? You have multiple hosts running dozens of VMs accessing who knows how many volumes and subsequently who knows how many spindles. That sequential I/O isn’t as sequential anymore all of a sudden, is it?!
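To make that a bit more tangible, here is a tiny hypothetical sketch (made-up VM names and block offsets, nothing more than a thought experiment) of how per-VM “sequential” streams look once they are interleaved at the array:

```python
# Hypothetical sketch: three VMs each read "sequentially" from their own VMDK,
# but the array sees the interleaved stream, which is no longer sequential.
vm_streams = {
    "vm1": [0, 1, 2, 3],          # block offsets within vm1's VMDK
    "vm2": [100, 101, 102, 103],  # vm2 is also sequential, elsewhere on the LUN
    "vm3": [900, 901, 902, 903],
}

# Round-robin interleave, a crude stand-in for many hosts hitting one array.
array_view = [blk for group in zip(*vm_streams.values()) for blk in group]
print(array_view)  # [0, 100, 900, 1, 101, 901, ...] -- anything but sequential
```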
<edit> As pointed out in the comments, many arrays recognize sequential I/O and prefetch, which is correct. This doesn’t mean that using contiguous blocks is automatically faster, though, as fragmented blocks can also mean more spindles serving the data, etc. </edit>
I guess the main takeaway here is: stop worrying about VMFS, it is rock solid and it will get the job done.
Conrad says
Great Post! I was actually just reading about this the other day.
Christian says
Does the expanding of the vmdk file issue some locking operation that would hurt performance?
Duncan Epping says
Yes, it does incur a reservation, and no, it doesn’t impact performance due to the way we do locking in 4.x; optimistic locking is what it is called.
Afidel says
Duncan, I thought there was in fact some potential performance impact in 4.0 and 4.1 without VAAI but with VAAI there was zero performance impact.
Duncan Epping says
There’s a whitepaper that shows the performance difference between thin and lazy thick… check it:
http://www.vmware.com/pdf/vsp_4_thinprov_perf.pdf
Vaughn Stewart says
Duncan,
I believe the tests run in the paper cited do not show the performance differences between a thin and a lazy-zeroed disk, as the load generation tool (IOMeter) creates the files used in the testing in advance of IO load being generated. As such, the thin VMDK does not grow while the tests are running, locking does not occur, and performance is solid.
While technically the test did measure performance capabilities of a thin and a lazy zero thick VMDK, I would suggest that due to the creation of the testbed the thin VMDK truly behaved like a lazy zero disk.
I would suggest a better test would be to compare how quickly a thin and a lazy thick VMDK could grow by adding net new data into the VMDK. In this design the VMDK would have to expand, incur locks, and potentially incur a performance issue.
That’s my $0.02,
Vaughn
Robert says
“That sequential I/O isn’t as sequential anymore all of a sudden, is it?!”
Taking a slight shortcut here imo..
Think about the really smart caching algorithms of arrays. As soon as they see that “something” is requesting adjacent tracks, it’s sequential IO as far as the array is concerned, and it starts prestaging blocks into cache with a larger blocksize for a much better chance of read cache hits. Now what if the blocks aren’t adjacent anymore…? Bye-bye smart algorithms. So yes, it does hurt performance.
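To make that concrete, here is a toy sketch (purely illustrative, not any vendor’s actual algorithm) of the kind of adjacency check such a prefetch engine might use:

```python
# Toy sketch, not any vendor's algorithm: a prefetch engine that only kicks in
# when the most recent block addresses are consecutive.
def should_prefetch(history, window=3):
    """Return True if the last `window` requested block addresses were adjacent."""
    recent = history[-window:]
    return len(recent) == window and all(
        b == a + 1 for a, b in zip(recent, recent[1:]))

print(should_prefetch([10, 11, 12]))     # True  -> array prestages 13, 14, ...
print(should_prefetch([10, 480, 2051]))  # False -> fragmented reads, no read-ahead
```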
Duncan Epping says
Yes, that was a shortcut indeed. However, just remember that even if a disk is thick provisioned, that doesn’t mean its blocks are placed contiguously on the same disks, and using the same disks doesn’t necessarily result in better performance either, as multiple spindles can serve I/O faster.
I should have expanded on it a bit more maybe.
Robert says
I see where you’re coming from, Duncan. And I agree this is not something that will only happen with thin disks, not at all. But you also know that even with wide striping nowadays the algorithms don’t care whether a block is adjacent on the same spindle. They just see whether blocks are adjacent on a (sub)LUN basis, and know which array/spindle(s) to prefetch the next block from. Let alone when the storage is virtualised. Heck, with virtualised arrays it’s even possible the frontend storage is prestaging sequential reads even over multiple boxes and arrays 🙂
I shouldn’t have used the word “tracks”, I was a bit off there, it’s not what I meant.
Just wanted to point out it was a bit of a shortcut, and while you might know it, lots of your readers don’t, and will take it for granted. But I see you updated your article, thanks. 😉
Saturnous says
There’s no difference between thin and thick, but there is between thin & thick and eager-zeroed thick. I suggest eager-zeroed thick disks for database transaction logs and AD servers (as they try to use forced unit access). It is more of a rule-of-thumb suggestion, but I see more risk of an I/O-intensive thin disk filling up the datastore.
NetApp promises that a stream of blocks pushed through NVRAM will end up next to each other on the spindles.
Duncan Epping says
Eager-zeroed disks could indeed help performance, as the blocks are already zeroed.
Scott Drummonds says
Guest IOs are penalized by VMFS fragmentation when a single guest contiguous IO spans two VMFS blocks. An example would be an 8k read on a 4k guest file system when one 4k block occurs at the end of one VMFS block and the second 4k block occurs at the beginning of the next. Due to VMFS fragmentation, one IO becomes two.
But the key thing to remember in this situation is that the VMFS blocks are so much larger than the guest blocks that this splitting is very rare. The net impact to performance is immeasurably small. As you already pointed out, the thin disk paper we published a couple years back (http://www.vmware.com/pdf/vsp_4_thinprov_perf.pdf) discusses this on page 7.
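To put a rough number on how rare, here is a hypothetical back-of-the-envelope sketch (assuming 4KB-aligned guest IO on a 1MB blocksize volume; the figures are illustrative only):

```python
# Back-of-the-envelope estimate (assumes 4KB-aligned guest IO; illustrative only):
# how often does an 8KB guest IO straddle a VMFS block boundary?
vmfs_block = 1024 * 1024  # 1MB VMFS blocksize
guest_io   = 8 * 1024     # 8KB guest IO
alignment  = 4 * 1024     # 4KB guest file system blocks

positions = vmfs_block // alignment  # possible aligned start offsets per block
split = sum(1 for i in range(positions)
            if i * alignment + guest_io > vmfs_block)
print(f"{split}/{positions} = {split / positions:.2%} of IOs get split in two")
# -> 1/256, roughly 0.4%; with an 8MB VMFS blocksize it drops to about 0.05%
```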
Guest defragmentation is another issue and can have a measurable impact on VM performance:
http://vpivot.com/2010/02/12/windows-guest-defragmentation/
http://vpivot.com/2010/04/14/windows-guest-defragmentation-take-two/
Scott
Duncan Epping says
Only if and when a disk is thin provisioned. And then, what is the percentage of cases where that is actually true? I bet it is low. Even when it is, it doesn’t automatically mean the next VMFS block is stored in a different location.
Morten says
Another significant issue with VMFS fragmentation is the combination with the vStorage APIs for Data Protection. When backing up VMs with VADP, every non-contiguous block of data needs to be mapped independently, resulting in a huge number of mappings, “Map disk region” log entries in vCenter, and slower backups.
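A tiny illustrative sketch (made-up block addresses, not how VADP actually walks the disk) of why fragmentation inflates the number of mappings:

```python
# Illustrative only: a fragmented layout produces many more extent mappings
# for a VADP-style backup, because each run of contiguous blocks has to be
# mapped ("Map disk region") separately.
import random

def count_mappings(blocks):
    """Count runs of contiguous physical block addresses in logical order."""
    runs = 1
    for prev, cur in zip(blocks, blocks[1:]):
        if cur != prev + 1:
            runs += 1
    return runs

random.seed(1)
contiguous = list(range(1000))                   # laid out in one long run
fragmented = random.sample(range(10000), 1000)   # same count, scattered placement
print(count_mappings(contiguous))  # 1 run    -> 1 mapping
print(count_mappings(fragmented))  # ~1000 runs -> ~1000 mappings
```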
SOSTech says
As a minor correction, according to my tests (http://wp.me/p1cl48-8G), in VMFS3 Thin Provisioned VMDKs grew in chunks equal to twice the block size, while in VMFS5 they grow in chunks equal to the block size (at least for native VMFS5).
Snapshots always grow in 16MB chunks.