Thin provisioned disks and VMFS fragmentation, do I really need to worry?

I’ve seen this myth floating around from time to time and, as I have never publicly written about it, I figured it was time to write an article to debunk it. The question that is often posed is whether thin disks will hurt performance due to fragmentation of the blocks allocated on the VMFS volume. I guess we need to rehash some basics around Thin Disks and VMFS volumes first (do a search on VMFS for more info)…

When you format a VMFS volume you can select the blocksize (1MB, 2MB, 4MB or 8MB). This blocksize is used when the hypervisor allocates storage for the VMDKs. So when you create a VMDK on an 8MB formatted VMFS volume it will create that VMDK out of 8MB blocks, and yes indeed, in the case of a 1MB formatted VMFS volume it will use 1MB. Now this blocksize also happens to be the size of the extent that is used for Thin Disks. In other words, every time your thin disk needs to expand it will grow in extents equal to the blocksize, so 1MB in that last example. (Related to that, with a lazy-zeroed thick disk the zero-out also uses the blocksize. So when something needs to be written to an untouched part of the VMDK it will zero out using the blocksize of the VMFS volume.)
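
To make that a bit more concrete, here is a minimal sketch in Python (the function name and the numbers are just for illustration, nothing from ESX itself) of how the blocksize determines the granularity a thin disk grows in:

```python
import math

# Illustrative sketch (not VMware code): how many VMFS-blocksize extents a
# thin-provisioned VMDK would consume once a given amount of data has been
# written into it, assuming the disk grows one VMFS block (extent) at a time.
def thin_extents(written_bytes, vmfs_block_mb):
    extent_size = vmfs_block_mb * 1024 * 1024
    return math.ceil(written_bytes / extent_size)

written = 10 * 1024**3  # 10GB actually written by the guest
for block_mb in (1, 2, 4, 8):
    print(f"{block_mb}MB blocksize -> {thin_extents(written, block_mb):>5} extents")
```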

So does using a thin disk in combination with a small blocksize cause more fragmentation? Yes, quite possibly it does. However, the real question is whether it will hurt your performance. The answer to that is: no, it won’t. The reason is that the VMFS blocksize is totally irrelevant when it comes to Guest OS I/O. So let’s assume you have a regular Windows VM and this VM is issuing 8KB reads and writes to a 1MB blocksize formatted volume. The hypervisor won’t fetch 1MB, as that could cause substantial overhead… no, it will request from the array exactly what was requested by the Guest OS, and the array will serve it up however it is configured to. I guess what people are worried about the most is sequential I/O, but think about that for a second or two. How sequential is your I/O when you are looking at it from the array’s perspective? You have multiple hosts running dozens of VMs accessing who knows how many volumes and subsequently who knows how many spindles. That sequential I/O isn’t as sequential anymore all of a sudden, is it?!
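
A small sketch may help to illustrate this point (the names are hypothetical, this is not the actual VMFS code path): the blocksize only determines which extent a guest offset falls into, not how much data the hypervisor requests from the array.

```python
VMFS_BLOCK = 1 * 1024 * 1024   # 1MB blocksize of the VMFS volume
GUEST_IO   = 8 * 1024          # 8KB read issued by the Guest OS

def resolve_io(guest_offset, io_size, vmfs_block=VMFS_BLOCK):
    """Return (extent_index, offset_within_extent, bytes_requested_from_array)."""
    extent_index = guest_offset // vmfs_block      # which VMFS block holds the data
    offset_in_extent = guest_offset % vmfs_block   # where inside that block
    # Only the bytes the guest asked for go down to the array,
    # regardless of the VMFS blocksize.
    return extent_index, offset_in_extent, io_size

print(resolve_io(guest_offset=5 * 1024 * 1024 + 64 * 1024, io_size=GUEST_IO))
# -> (5, 65536, 8192): an 8KB request, not a 1MB one
```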

<edit> As pointed out in the comments, many arrays recognize sequential I/O and prefetch, which is correct. This doesn’t automatically mean that contiguous blocks are faster, though, as fragmented blocks can also mean more spindles, etc. </edit>

I guess the main takeaway here is: stop worrying about VMFS, it is rock solid and it will get the job done.


    Comments

    1. Christian says

      Does the expanding of the vmdk file issue some locking operation that would hurt performance?

    2. says

      “That sequential I/O isn’t as sequential anymore all of a sudden it is?!”

      Taking a slight shortcut here imo..
      Think of the really smart caching algorithms of arrays: as soon as they see “something” is requesting adjacent tracks, it’s sequential I/O for the array and it starts prestaging blocks to cache with a larger blocksize, giving a much greater chance of read cache hits. Now what if the blocks aren’t adjacent anymore…? Bye, smart algorithms. So yes, it does hurt performance.

      • says

        Yes, that was a shortcut indeed. However, just remember that even if the disk is thick provisioned, that doesn’t mean it is optimally placed on the same disks, and using the same disk doesn’t necessarily result in better performance, as multiple spindles can serve I/O faster.

        I should have expanded on it a bit more maybe.

        • says

          I see where you’re coming from, Duncan. And I agree this is not something that will only happen on thin disks. Not at all. But you also know that even with wide striping nowadays the algorithms don’t care whether a block is adjacent on the same spindle. They just see whether blocks are adjacent on a (sub)LUN basis, and will know from which array/spindle(s) to prefetch the next block. Let alone if the storage is virtualised. Heck, with virtualised arrays it’s even possible the frontend storage is prestaging sequential reads even over multiple boxes and arrays :)

          I shouldn’t have used the word “tracks”, I was a bit off there, it’s not what I meant.

          Just wanted to point out it was a bit of a shortcut, and while you might know it, lots of your readers don’t, and will take it for granted. But I see you updated your article, thanks. ;)

    3. says

      No difference between thin and thick, but there is between thin/thick and eager-zeroed thick. I suggest eager-zeroed thick disks for database transaction logs and AD servers (as they attempt forced unit access). It is more of a rule-of-thumb suggestion, but I see more risk of an I/O-intensive thin disk filling up the datastore.

      NetApp promises that a stream of blocks pushed through the NVRAM will end up next to each other on the spindles.

    4. says

      Guest IOs are penalized by VMFS fragmentation when a single guest contiguous IO spans two VMFS blocks. An example would be an 8k read on a 4k guest file system when one 4k block occurs at the end of one VMFS block and the second 4k block occurs at the beginning of the next. Due to VMFS fragmentation, one IO becomes two.

      But the key thing to remember in this situation is that the VMFS blocks are so much larger than the guest blocks that this splitting is very rare. The net impact to performance is immeasurably small. As you already pointed out, the thin disk paper we published a couple years back (http://www.vmware.com/pdf/vsp_4_thinprov_perf.pdf) discusses this on page 7.
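
      A rough back-of-the-envelope sketch (illustrative only, not taken from the paper) shows just how rare that split is with a 1MB VMFS block and 4KB-aligned guest I/O:

      ```python
      # Fraction of 4KB-aligned 8KB guest I/Os that straddle a VMFS block
      # boundary and therefore turn into two I/Os (illustrative numbers).
      def split_fraction(io_size, guest_block, vmfs_block):
          positions = vmfs_block // guest_block              # aligned start positions per VMFS block
          splitting = (io_size - guest_block) // guest_block # positions where the I/O crosses the boundary
          return splitting / positions

      print(split_fraction(io_size=8 * 1024,
                           guest_block=4 * 1024,
                           vmfs_block=1 * 1024 * 1024))      # 1/256, roughly 0.4%
      ```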

      Guest defragmentation is another issue and can have a measurable impact on VM performance:
      http://vpivot.com/2010/02/12/windows-guest-defragmentation/
      http://vpivot.com/2010/04/14/windows-guest-defragmentation-take-two/

      Scott

      • says

        Only if and when a disk is thin provisioned. And then, what is the percentage of cases where that actually happens? I bet it is low. Even if it does happen, it doesn’t automatically mean the VMFS block is stored in a different location.

    5. Morten says

      Another significant issue with VMFS fragmentation is in combination with the vStorage APIs for Data Protection. When backing up VMs with VADP, every non-contiguous block of data needs to be mapped independently, resulting in a huge number of mappings, “Map disk region” logs in vCenter, and slowdowns of backups.

    6. says

      As a minor correction, according to my tests (http://wp.me/p1cl48-8G), in VMFS3 Thin Provisioned VMDKs grow in chunks equal to twice the block size, while in VMFS5 they grow in chunks equal to the block size (at least for native VMFS5).
      Snapshots always grow in 16MB chunks.