I was answering a couple of questions on the VMTN forum and stumbled on the impact of using similar blocksizes for all VMFS volumes, something I had never realized. Someone had zeroed out the free space inside a VM to reclaim wasted diskspace by doing a Storage vMotion to a different datastore. (Something that I have described in the past here, and also in relation to VCB here.) However, to the surprise of this customer, the zeroed-out space was not reclaimed even though he did a Storage vMotion to a thin disk.
After some tests they noticed that when doing a Storage vMotion between arrays, or between datastores formatted with a different blocksize, they were able to reclaim the zeroed-out diskspace. This is a cool workaround to gobble up the zeroes. Now the question of course remains why this is the case. The answer is simple: a different datamover is used. That might not say much to some, so let me explain the three types that can be used:
- fsdm – This is the legacy datamover. It is the most basic version and the slowest, as the data moves all the way up the stack and down again.
- fs3dm – This datamover was introduced with vSphere 4.0 and contains substantial optimizations: data does not travel through the entire stack.
- fs3dm (hardware offload) – This is the VAAI full copy hardware offload, introduced with vSphere 4.1. Maximum performance and minimal host CPU/memory overhead.
I guess it is a bit easier to digest when it is visualized. It should be noted that fs3dm is used in software mode when hardware offload (VAAI) capabilities are not available. Also note that “application”, the top layer, can be Storage vMotion or, for instance, vmkfstools:
(Please note that the original diagram came from a VMworld preso (TA3220))
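By the way, if you are not sure which blocksize a volume was formatted with, vmkfstools -P against the volume will tell you from the ESXi shell, and the same information is exposed through the vSphere API. Here is a minimal pyVmomi sketch; the vCenter hostname and credentials are placeholders for the example:

```python
# Minimal pyVmomi sketch: list the VMFS block size of every datastore.
# "vcenter.local" and the credentials are placeholders for the example.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab only: skip certificate validation
si = SmartConnect(host="vcenter.local", user="administrator@vsphere.local",
                  pwd="password", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.Datastore], True)
    for ds in view.view:
        # Only VMFS datastores carry a block size; NFS datastores do not.
        if isinstance(ds.info, vim.host.VmfsDatastoreInfo):
            print(f"{ds.name}: {ds.info.vmfs.blockSizeMb} MB block size")
    view.Destroy()
finally:
    Disconnect(si)
```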
Now why does it make a difference? Well, remember this little line in the VAAI FAQ?
- The source and destination VMFS volumes have different block sizes
As soon as a VMFS volume with a different blocksize, or a datastore on a different array, is selected as the destination, the hypervisor uses the legacy datamover (fsdm). The advantage of this datamover, and it is probably one of its few advantages, is that it gobbles up the zeroes. The new datamover (fs3dm), and that includes both the software version and the hardware offload, currently does not, and as such all blocks are copied. The fs3dm datamover is, however, substantially faster than the fsdm datamover. The reason is that fsdm reads every block into the application's buffer (the application being Storage vMotion in our case) before writing it out, which means it actually sees the data and can avoid allocating blocks that contain nothing but zeroes on a thin destination; fs3dm just moves the blocks without them ever touching the application buffer. I guess it is a “minor” detail, as you will not do this on a day-to-day basis, but when you are part of operations it is definitely a “nice to know”.
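For those who want to see that difference in code: below is a deliberately simplified Python sketch, not the actual ESXi datamover implementation, of why a copy that pulls every block through an application buffer can skip zeroed blocks while a plain block-for-block move cannot. The block size and the in-memory “disks” are made up for the example:

```python
# Toy illustration only: a buffer-based copy (fsdm-style) can skip zeroed
# blocks, while a straight block move (fs3dm-style) copies everything.

BLOCK = 1024 * 1024  # 1 MB blocks, an arbitrary choice for the sketch

def fsdm_style_copy(src: bytes, thin_dst: dict) -> None:
    """Read every block into an application buffer; because the datamover
    sees the data, it can simply not allocate blocks that are all zeroes."""
    for i in range(0, len(src), BLOCK):
        buf = src[i:i + BLOCK]          # block travels up to the application
        if buf.count(0) == len(buf):    # all zeroes: nothing to write
            continue                    # thin destination stays unallocated
        thin_dst[i] = buf               # only real data gets allocated

def fs3dm_style_move(src: bytes, thin_dst: dict) -> None:
    """Move blocks without inspecting them: faster, but every block,
    zeroed or not, ends up allocated on the destination."""
    for i in range(0, len(src), BLOCK):
        thin_dst[i] = src[i:i + BLOCK]  # copied as-is, zeroes included

disk = bytes(4 * BLOCK)                 # 4 MB of zeroed-out space
dst_a, dst_b = {}, {}
fsdm_style_copy(disk, dst_a)
fs3dm_style_move(disk, dst_b)
print(len(dst_a), "blocks allocated by fsdm-style copy")   # 0
print(len(dst_b), "blocks allocated by fs3dm-style move")  # 4
```

Run it and the fsdm-style copy allocates nothing for the fully zeroed disk, while the fs3dm-style move allocates every block, which is exactly the behaviour this customer ran into.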