I was answering a couple of questions on the VMTN forum and stumbled on the impact of using the same block size for all VMFS volumes, something I had never realized. Someone had zeroed out the free space inside their VM and then did a Storage vMotion to a different datastore to reclaim the wasted disk space. (Something that I have described in the past here, and also in relation to VCB here.) However, to the surprise of this customer the zeroed-out space was not reclaimed, even though he did a Storage vMotion to a thin disk.
After some tests they noticed that when doing a Storage vMotion between arrays, or between datastores formatted with a different block size, they were able to reclaim the zeroed-out disk space. This is a cool workaround to gobble up the zeroes. Now the question of course remains why this is the case. The answer is simple: a different datamover is used. That might not say much to some, so let me explain the three types that can be used:
- fsdm – This is the legacy datamover. It is the most basic version and the slowest, as the data moves all the way up the stack and down again.
- fs3dm – This datamover was introduced with vSphere 4.0 and contains substantial optimizations: data no longer travels through all the layers of the stack.
- fs3dm – hardware offload – This leverages the VAAI full copy hardware offload, introduced with vSphere 4.1. Maximum performance and minimal host CPU/memory overhead.
I guess it is a bit easier to digest when it is visualized. It should be noted that fs3dm is used in software mode when HW offload (VAAI) capabilities are not available. Also note that “application”, the top layer, can be Storage vMotion or, for instance, vmkfstools:
(Please note that the original diagram came from a VMworld presentation, TA3220.)
Now why does it make a difference? Well remember this little line in the VAAI FAQ?
- The source and destination VMFS volumes have different block sizes
As soon as a VMFS volume with a different block size, or a different array, is selected as the destination, the hypervisor uses the legacy datamover (fsdm). The advantage of this datamover, and it is probably one of its few advantages, is that it gobbles up the zeroes. The new datamover (fs3dm), in both its software and HW offload modes, currently does not do this, and as such all blocks are copied. The fs3dm datamover is, however, substantially faster than fsdm. The reason is that fsdm reads each block into the application’s buffer (Storage vMotion in our case) and then writes the block out again, while the new datamover moves the block without passing it through the application buffer. I guess it is a “minor” detail as you will not do this on a day-to-day basis, but when you are part of operations it is definitely a “nice to know”.
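For those who want to script the workaround, here is a minimal PowerCLI sketch of the flow described above: zero out the free space inside the guest first (with sdelete on Windows, for instance), then Storage vMotion the VM to a datastore formatted with a different block size so the legacy fsdm datamover kicks in. The VM and datastore names are placeholders, and it assumes a PowerCLI version where Move-VM supports the -DiskStorageFormat parameter; on older versions you would pick the disk format in the migration wizard instead.
# Step 1 (inside the guest, not shown here): zero out free space, e.g. with sdelete
# Step 2: Storage vMotion to a datastore with a different VMFS block size,
# converting the disk to thin so the gobbled zeroes are not allocated again
Get-VM -Name "vm-to-shrink" |
  Move-VM -Datastore (Get-Datastore -Name "datastore-other-blocksize") -DiskStorageFormat Thin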
Erik Zandboer says
Duncan,
Nice one! So here you actually see an advantage of NOT using VAAI… Basically you’d want a tickbox on svMotion, “Do not use VAAI”, or maybe something more human-readable like “force shrink of thin disks”…
Will Usher says
Do you know of any way to force the use of the legacy fsdm?
Duncan Epping says
Other than using a different array or a volume with a different block size, no.
William says
Duncan,
There is a hidden option that determines whether the datamover should use fs3dm or not, though it’s only hidden/available through the use of vsish, which is only available on ESXi:
/VMFS/EnableDataMovement
1 = yes
0 = no
Description – Whether VMFS should handle data movement requests by leveraging FS3DM
http://download.virtuallyghetto.com/complete_vsish_config.html
Of course it’s probably hidden for a reason, but I assume that if you were to disable this, it would use fsdm?
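If you want to experiment with this (unsupported, and at your own risk), a rough sketch of what it would look like from the ESXi Tech Support Mode shell is below. Note that I am assuming the option is exposed under /config/VMFS3/intOpts/ the way other VMFS3.* advanced settings are:
# query the current value (1 = use fs3dm, 0 = fall back to the legacy fsdm)
vsish -e get /config/VMFS3/intOpts/EnableDataMovement
# disable fs3dm for subsequent data movement operations (unsupported, assumed path)
vsish -e set /config/VMFS3/intOpts/EnableDataMovement 0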
–William
Gregor says
You can also do this in PowerCLI:
# check the current value of the setting on the host running the VM
Get-VM -Name "vm-to-shrink" | Get-VMHost | Get-VMHostAdvancedConfiguration -Name "VMFS3.EnableDataMovement"
# disable fs3dm so the legacy fsdm datamover is used for the next Storage vMotion
Get-VM -Name "vm-to-shrink" | Get-VMHost | Set-VMHostAdvancedConfiguration -Name "VMFS3.EnableDataMovement" -Value ([int64] 0)
# re-enable fs3dm afterwards
Get-VM -Name "vm-to-shrink" | Get-VMHost | Set-VMHostAdvancedConfiguration -Name "VMFS3.EnableDataMovement" -Value ([int64] 1)
Rick Vanover says
This brings me to a question, since we are talking about it.
I have always formatted VMFS-3 volumes at 8 MB; even if the LUN size is smaller than 2 TB (-512B).
The primary reason is that workloads move around, including bigger VMs moving from larger LUNs to smaller LUNs; even then a 257 GB VMDK can still land on the 512 GB LUN.
My question is, is formatting at 8 MB at all times OK? Any downside to that?
The sub-block algorithm protects us somewhat from waste, and I’d rather keep functionality high.
What does everyone think?
Duncan Epping says
No, there are no negative side effects to formatting volumes with an 8 MB block size, Rick.
Brandon says
Wow, what the hell. It isn’t a “regular” occurrence, but anyone trying to use ‘thin’ provisioning would be interested in this one. Some environments are moving whole-sale to thin.
You said, specifically, that he was trying to reclaim space by zeroing out, I guess by using something like sdelete. The question I have is: what if the disk type is one of the “thick” types rather than already being “thin”? Does that trigger the older datamover? Is a check done that says, “this is thin already, no reason to use the legacy datamover”? If that is the case, that check is probably done when you specifically check the box that says “thin provisioned format” versus “same format as source” or “thick format”. I guess then, if that were the case, you could do a Storage vMotion to thick and then back to thin. Maybe it doesn’t matter and you flat out have to use a datastore with a different block size. ;\
Brandon says
OK I just performed some tests.
4MB block -> 4MB block datastore on the same array. Svmotion choosing “thin”. Test VM has a “thick” disk (win2k8r2 — essentially a fresh install).
35 GB “thick” .vmdk before svmotion
14.5 GB “thin” .vmdk after svmotion
Same 35 GB disk as above (currently a “thin” .vmdk at 14.5 GB), with 1 GB of data created, then deleted (confirmed the .vmdk is ~15.5 GB). Sdelete to re-zero… which of course inflates the .vmdk to essentially the full size of the disk. Same 4 MB block -> 4 MB block datastore on the same array. Specifically chose “same format as source”.
35 GB “thin” .vmdk before svmotion
35 GB “thin” .vmdk after svmotion — this is expected given Duncan’s post
Repeated svmotion specifically choosing thin:
35 GB “thin” .vmdk before svmotion
35 GB “thin” .vmdk after svmotion — again this was expected
Repeated svmotion again specifically choosing thick:
35 GB “thin” .vmdk before svmotion
35 GB “thick” .vmdk after svmotion
Repeated svmotion again, specifying thin format:
35 GB “thick” .vmdk before svmotion
35 GB “thin” .vmdk after svmotion
Wow… that sucks :\. Although it does appear that thick disks which aren’t eager-zeroed will save some space on the initial svmotion, since zeroes aren’t already written. That doesn’t help for P2Vs, etc., or anywhere else you want to reclaim. Looks like you need a “mover” datastore specifically set with a different block size in order to reclaim space. Kind of sucks for the 8 MB block size datastores with a 1.1 TB “thin” disk you want to reclaim space from :).
Duncan says
That is what I tested as well, and hence this article to explain the impact.
Brandon says
I certainly didn’t mean to take away from your original post. I just didn’t infer enough from it; it isn’t that I didn’t believe you :). I speak Oklahoma English so I have to reread some of what you write sometimes, even in the HA/DRS deepdive book. Hopefully I only added for others rather than detracted.
Ricky says
Hello,
On vSphere 5.0 U1, with two VMFS-5 datastores on the same VAAI array, I Storage vMotioned my thin VM to thick, then Storage vMotioned it back from thick to thin, and it worked.
NB: when I tried going from thin to thin, it didn’t work.
Scott Lowe says
Great find, Duncan, and thanks for sharing this information with us. It would be great if we could trigger use of the legacy datamover (to enable the reclaiming of zero’ed blocks), although I suspect that continued advances in zero reclamation by arrays and expanded use of the T10 UNMAP command will alleviate the need to reclaim those blocks at the hypervisor level.
stefano says
For people who don’t know: what is “T10 UNMAP”?
Harold says
Hi Duncan, that is a nice read !
Never realized there were more datamovers.
One quick piece of feedback: I guess your third bullet should say “HW offload” instead of “fs3dm”?
Harold says
Never mind, I misread the third bullet; it is correct after all.
It remains a nice read!
Andrew Fidel says
Seems like one reason to do thin on thin; the array should be smart enough to see the zeroed blocks and unmap them at that level =)
Chad Sakac says
@Andrew – that is one reason why I think this option will become more prevalent over time – UNMAP support (both in the array and in the future, in vSphere itself) will make this config the most efficient.
Jason Boche says
I remember the VAAI FAQ about different block sizes but I probably wouldn’t have connected it to such a scenario until you pointed it out. Nice find.
Jas
Tomas Fojta says
I was scratching my head for some time wondering why I could not reclaim zeroed space. In the end I solved it with a svMotion to an NFS datastore. Another option, if you can have some downtime, is to clone the disk with vmkfstools or export it to OVF and import it back. There should definitely be a command or checkbox to choose the datamover when doing a svMotion.
Saturnous says
You forgot to mention the trick of using vmkfstools -i to clone the disk to 2gbsparse format in order to reclaim space (see the sketch below). Anyway, it would need downtime :(.
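For reference, a rough sketch of that offline approach from the ESXi shell; the file names are placeholders and the VM has to be powered off while the disk is cloned. The clone to 2gbsparse drops the zeroed blocks, and the clone back produces a shrunk thin disk that the VM can be re-pointed at:
# clone the original disk to 2gbsparse, dropping zeroed blocks along the way
vmkfstools -i vm-to-shrink.vmdk -d 2gbsparse vm-to-shrink-sparse.vmdk
# clone it back to a thin-provisioned VMFS disk (re-point the VM at this one)
vmkfstools -i vm-to-shrink-sparse.vmdk -d thin vm-to-shrink-new.vmdk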
Jake says
Might be worth linking directly to the VAAI FAQ to save a google search. 😀 http://kb.vmware.com/kb/1021976
Mike Semon says
All of my datastores have the same block size, so I would be using the fs3dm datamover. I thought that by using the shrink feature in VMware Tools or sdelete I would reclaim or zero out unused blocks. Is that not the case? I am doing a svMotion from thick to thin.
HalChris says
Great find! It makes sense why it happens and why the workaround with different block sizes works, as does what Tomas said about svMotion to an NFS share.
I wonder if the development team at VMware is working on a fix for this?
C
Duncan Epping says
I spoke with the engineers and they will look into the possibilities. I cannot make any promises unfortunately.
Richard Powers says
I don’t have a VAAI array yet. But could you just disable VAAI on the ESX servers temporarily, and would that force it to use one of the legacy datamovers? I couldn’t find anything that says it requires downtime, but you never know.
I believe that’s what William was getting at, and it might be a cleaner way than the KB article.
Here’s the link to the KB article on how to disable VAAI.
http://tinyurl.com/4jjaz68
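For reference, a minimal PowerCLI sketch of the kind of change that KB describes; esx01 is a placeholder host name, and the first setting (full copy) is the one the datamover leverages:
# disable the VAAI primitives on a host (set the values back to 1 to re-enable)
Get-VMHost -Name "esx01" | Set-VMHostAdvancedConfiguration -Name "DataMover.HardwareAcceleratedMove" -Value ([int64] 0)
Get-VMHost -Name "esx01" | Set-VMHostAdvancedConfiguration -Name "DataMover.HardwareAcceleratedInit" -Value ([int64] 0)
Get-VMHost -Name "esx01" | Set-VMHostAdvancedConfiguration -Name "VMFS3.HardwareAcceleratedLocking" -Value ([int64] 0)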
-Rich
Duncan says
It’s the FS3DM datamover that causes this behavior. That datamover has two different mechanisms, software or VAAI, so even if you disable VAAI it won’t work. William does mention a hidden advanced option, but I cannot recommend using it.
Michael says
Thanks for a great article. I need some clarification: how does the legacy datamover handle zeroed blocks when the target vmdk is thick? Does it actually write zeroes?
Duncan Epping says
Yes, it copies those blocks and writes them out.
Alastair says
This also has consequences for View Composer disks. The supposedly thin provisioned replicas end up fully allocated as happened to a customer I visited this week: http://www.demitasse.co.nz/wordpress2/2011/08/vaai-consideration-with-view-composer/
Wee Kiong says
Does that mean that if we have datastores with the same block size, the space will not be reclaimed when we use Storage vMotion, since the fs3dm software/hardware datamover is used?
Is this resolved in vSphere 5 with the new VAAI?
Duncan Epping says
Yes, it means that when you have similar block sizes, in-guest dead space will not be reclaimed.
vSphere 5 does not resolve this. vSphere 5 will only reclaim dead space on the VMFS volume itself, when your array supports it, but not dead space inside the guest.
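As an aside, a rough sketch of how that VMFS-level reclaim is triggered manually from the ESXi shell on vSphere 5, assuming your 5.0 build includes the vmkfstools -y option and the array advertises UNMAP support (the datastore name is a placeholder):
# change into the datastore whose dead space should be reclaimed
cd /vmfs/volumes/datastore-to-reclaim
# issue UNMAP for dead space, using a temporary file of 60% of the free space
vmkfstools -y 60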
Wee Kiong says
Hi Duncan,
Thanks for the explanation. So for VMFS it doesn’t matter what the block size is, as long as the array supports it.
For the guest, the space will only be reclaimed through a Storage vMotion across datastores with different block sizes.
Daniel Rhoden says
Since vSphere 5 datastores now use 1 MB blocks only, how will we reclaim guest-level wasted space (zeroes)? I assume we can’t create an 8 MB block datastore just for reclaiming the unused space, and to be honest that’s not ideal from a management perspective anyway. I was and still am hoping for a TRIM-like ghost clean-up cycle for guests in VMware, much in the same way SSDs manage their deleted space today.
thanks as usual
Duncan says
The only thing you could do with 5 is move to a different array or to an NFS volume…
The engineers are working on improving this though.
Andrew Mauro says
Moving to a VMFS-5 datastore with a different block size could still be a solution.
I’ve also tried the William Lam “hack” and it works on ESXi 5 as well.
Kalimukwa says
Excellent… I understand the concept of datamover choice based on block size. Does this also apply to svMotions between different arrays, i.e. two VAAI-capable arrays?
Alex Henstra says
Hi Duncan,
Any word from the engineers? I ran into this problem when trying to shrink a Windows 7 VM for a View environment. I noticed that when View generates a replica on the same SAN between VMFS datastores with the same block size, the dead space in the replica is removed. So View must use fsdm??
Mike DeGain says
How is data integrity verified after being moved through fsdm? Is it checksummed? Is there any way to verify the checksum in the logs?
James says
What happens when you Storage vMotion from ESXi 4.1 with a VMFS 3.94 datastore on FC, moving VMs to a NAS remotely? How does this affect performance? Also, is there any way to increase performance at all, or somewhat?
mglaverick says
I’m looking at this issue today. I’m assuming that if someone took a VM that was thin on VMFS, and it was Storage vMotioned to NFS and back again, its free space would show up in the vSphere Web Client?