Yellow Bricks

by Duncan Epping


vmfs

Storage Filters

Duncan Epping · Aug 11, 2010 ·

I was reading about Storage Filters last week and wanted to do a short write-up. I had totally forgotten about it until I noticed this new KB article. The KB article only discusses the LUN filters, though, and not the other filters that are available today.

Currently 4 filters have been made public:

  1. config.vpxd.filter.hostRescanFilter
  2. config.vpxd.filter.vmfsFilter
  3. config.vpxd.filter.rdmFilter
  4. config.vpxd.filter.SameHostAndTransportsFilter

The first filter on the list is one I discussed roughly a year ago. The “Host Rescan Filter” makes it possible to disable the automatic storage rescan that occurs on all hosts after a VMFS volume has been created. You might want to avoid this automatic rescan when you are adding multiple volumes and prefer to initiate a single rescan after you create the final volume, rather than one rescan per volume. By setting “config.vpxd.filter.hostRescanFilter” to false the automatic rescan is disabled. In short, these are the steps needed (a scripted alternative is sketched below the list):

  1. Open up the vSphere Client
  2. Go to Administration -> vCenter Server
  3. Go to Settings -> Advanced Settings
  4. If the key “config.vpxd.filter.hostRescanFilter” is not listed, add it and set it to false
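
For those who prefer scripting over clicking through the client, something along these lines should work through the vSphere API. This is just a rough pyVmomi sketch, not an official procedure; the hostname and credentials are placeholders:

```python
# Rough pyVmomi sketch: set config.vpxd.filter.hostRescanFilter on vCenter.
# Hostname and credentials are placeholders; adjust for your environment.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()
si = SmartConnect(host="vcenter.local", user="administrator@vsphere.local",
                  pwd="secret", sslContext=ctx)

# vCenter advanced settings are exposed through the service instance OptionManager.
option_manager = si.RetrieveContent().setting
option_manager.UpdateOptions(changedValue=[
    vim.option.OptionValue(key="config.vpxd.filter.hostRescanFilter", value="false")
])

Disconnect(si)
```

Setting the value back to “true” (or removing the key) re-enables the automatic rescan.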

To be honest, this is the only storage filter I would personally recommend using. For instance, setting “config.vpxd.filter.rdmFilter” to “false” will enable you to add a LUN as an RDM to a VM while that LUN is already used as an RDM by a different VM. That can be useful in very specific situations, such as when MSCS is used, but in general it should be avoided as data could be corrupted when the wrong LUN is selected.

The filter “config.vpxd.filter.vmfsFilter” can be compared to the RDM filter: when set to false it enables you to overwrite an existing VMFS volume with a new VMFS volume or to re-use it as an RDM. Again, not something I would recommend enabling, as it could lead to loss of data, which has a serious impact on any organization.

The same goes for “config.vpxd.filter.SameHostAndTransportsFilter”. When it is set to “false” you can actually add an “incompatible LUN” as an extent to an existing volume. An example of an incompatible LUN would be a LUN that is not presented to all hosts that have access to the VMFS volume it will be added to. I can’t really think of a single reason to change the default for this setting, to be honest, besides troubleshooting, but it is good to know these filters are there.

Most of the storage filters have their own specific use cases. In general, changing these filters should be avoided, except for “config.vpxd.filter.hostRescanFilter”, which has proven to be useful in specific situations.

That’s why I love blogging…

Duncan Epping · Jun 2, 2009 ·

I’m an outspoken person, as most of you have noticed by now, but I’m also open to discussion, and that’s why I particularly like blogging. Every now and then a good discussion starts based on one of my blog articles (or a blog article of any of the other bloggers, for that matter). These usually start in the form of a comment on an article, but also via email or Twitter; even within VMware some of my articles have been discussed extensively.

A couple of weeks ago I voiced my opinion about VMFS block sizes and growing your VMFS. Growing your VMFS is a new feature, introduced with vSphere. In the article I stated that a large block size, 8MB, would be preferable because you would have less locking when using thin provisioned disks.

If you create a thin provisioned disk on a datastore with a 1MB block size, the thin provisioned disk will grow in increments of 1MB. Hopefully you can see where I’m going: a thin provisioned disk on a datastore with an 8MB block size will grow in 8MB increments. Each time the thin-provisioned disk grows, a SCSI reservation takes place because of metadata changes. As you can imagine, an 8MB block size decreases the number of metadata changes needed, which means fewer SCSI reservations. Fewer SCSI reservations equals better performance in my book.
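
To put some numbers on that, here is a quick back-of-the-envelope sketch. It is purely illustrative and assumes one reservation per block allocation, ignoring everything else going on under the covers:

```python
# Back-of-the-envelope: number of block allocations (and, in this simplified
# model, SCSI reservations) when a thin disk grows by a given amount.
GiB = 1024 ** 3
MiB = 1024 ** 2

def allocations(growth_bytes: int, block_size_bytes: int) -> int:
    # The thin disk is extended one block at a time, so round up.
    return -(-growth_bytes // block_size_bytes)

growth = 20 * GiB
for block_size in (1 * MiB, 8 * MiB):
    print(f"{block_size // MiB}MB block size: {allocations(growth, block_size)} allocations")
# 1MB block size: 20480 allocations
# 8MB block size: 2560 allocations
```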

As a consultant I get a lot of questions on VMFS locking, and with the understanding I had at the time I assumed that a larger block size would be beneficial in terms of performance. I’m no scientist or developer and I rely on the information I find on the internet, in manuals, in course material and on the occasional internal mailing list… In this case that information wasn’t correct, or better said, not yet updated for the changes that vSphere introduced. Luckily for me, and for you guys, one of my colleagues jumped in to give us some good insights:

I am a VMware employee and I wrote VMFS with a few cronies, but the following is a personal opinion:

Forget about locking. Period. Yes, SCSI reservations do happen (and I am not trying to defend that here) and there will be some minor differences in performance, but the suggestion on the (very well written) blog post goes against the mission of VMFS, which is to simplify storage virtualization.

Here’s a counter example: if you have a nearly full 8MB VMFS volume and a less full 1MB VMFS volume, you’ll still encounter less IO overheads allocating blocks on a 1MB VMFS volume compared to the 8MB volume because the resource allocator will sweat more trying to find a free block in the nearly full volume. This is just one scenario, but my point is that there are tons of things to consider if one wants to account for overheads in a holistic manner and the VMFS engineers don’t want you to bother with these “tons” of things. Let us handle all that for you.

So in summary, blocksizes and thin provisioning should be treated orthogonally. Since thin provisioning is an official feature, the thing for users to know is that it will work “well” on all VMFS blocksize configurations that we support. Thinking about reservations or # IOs the resource manager does, queue sizes on a host vs the blocksize, etc will confuse the user with assertions that are not valid all the time.

I like the post in that it explains blocks vs sub-blocks. It also appeals to power users, so that’s great too. But reservation vs. thin provisioning considerations should be academic only. I can tell you about things like non-blocking retries, optimistic IO (not optimistic locking) and tons of other things that we have done under the covers to make sure reservations and thin provisioning don’t belong in the same sentence with vSphere 4. But conversely, I challenge any user to prove that 1MB incurs a significant overhead compared to 8MB with thin provisioning :)

Satyam Vaghani

Does this mean that I would not pick an 8MB block size over a 1MB block size any more?

Not exactly, but it will depend on the specific situation of a customer. My other reason for picking an 8MB block size was growing VMFS volumes. If you grow a VMFS volume, the reason for this is probably that you need to grow a VMDK. If the VMDK needs to grow beyond the maximum file size, which is dictated by the chosen block size, you would need to move the VMDK (with Storage VMotion or a cold migration) to a different datastore. But if you had selected an 8MB block size when you created the VMFS volume, you would not be in this position. In other words, I would still prefer a larger block size, but this is based on flexibility in terms of administration and not on performance or possible locking issues.
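
For reference, the block size dictates the maximum file size on a VMFS-3 volume: 256GB for a 1MB block size, 512GB for 2MB, 1TB for 4MB and roughly 2TB for 8MB. A small illustrative check along those lines:

```python
# Commonly cited VMFS-3 maximum file sizes per block size (the 8MB limit is
# strictly 2TB minus 512 bytes; rounded here for readability).
MAX_VMDK_GB = {1: 256, 2: 512, 4: 1024, 8: 2048}  # block size (MB) -> max file size (GB)

def vmdk_fits(requested_gb: int, block_size_mb: int) -> bool:
    """True if a VMDK of the requested size fits on a datastore with this block size."""
    return requested_gb <= MAX_VMDK_GB[block_size_mb]

print(vmdk_fits(300, 1))  # False: a 300GB VMDK exceeds the 256GB limit of a 1MB block size
print(vmdk_fits(300, 8))  # True: an 8MB block size leaves plenty of headroom
```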

I want to thank Satyam for his very useful comment, thanks for chipping in!

Block sizes and growing your VMFS

Duncan Epping · May 14, 2009 ·

I had a discussion on block sizes with some of my colleagues after the post on thin-provisioned disks. For those who did not read that post, here’s a short recap:

If you create a thin provisioned disk on a datastore with a 1MB block size, the thin provisioned disk will grow in increments of 1MB. Hopefully you can see where I’m going: a thin provisioned disk on a datastore with an 8MB block size will grow in 8MB increments. Each time the thin-provisioned disk grows, a SCSI reservation takes place because of metadata changes. As you can imagine, an 8MB block size decreases the number of metadata changes needed, which means fewer SCSI reservations. Fewer SCSI reservations equals better performance in my book.

As some of you know, the locking mechanism has been improved with vSphere; yes, there’s a good reason why they call it “optimistic locking”. In other words, why bother increasing your block size if the locking mechanism has improved?

Although the mechanism behaves differently, that does not mean locking no longer occurs. In my opinion it’s still better to have 1 lock versus 8 locks when a VMDK needs to grow. But there’s another good reason: with vSphere come growable VMFS volumes. You might start with a 500GB VMFS volume and a 1MB block size, but when you expand the volume this block size might not be sufficient for the new VMs you create. Keep in mind that you can’t modify the block size, while you just might have given people the option to create disks beyond the limit imposed by the block size. (Mind you: they will receive an error; it’s not possible.)

So what about overhead? Will my 1KB log files all be created in 8MB blocks? Because that would mean a large overhead and might be a valid reason to use a 1MB block size!

No, it will not. VMFS-3 solves this issue by offering a sub-block allocator: small files use a sub-block to reduce overhead. A sub-block on a volume with a 1MB block size is 1/16th the size of the block; on an 8MB block size volume it’s 1/128th. In other words, the sub-blocks are 64KB in both cases and thus the overhead is the same in both cases as well.
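
A quick sanity check of that arithmetic, purely illustrative:

```python
MiB, KiB = 1024 ** 2, 1024

# Sub-block size is a fixed fraction of the block size: 1/16 for 1MB, 1/128 for 8MB.
sub_block_1mb = (1 * MiB) // 16
sub_block_8mb = (8 * MiB) // 128
assert sub_block_1mb == sub_block_8mb == 64 * KiB

# So a 1KB log file occupies one 64KB sub-block on either datastore,
# instead of a full 1MB or 8MB block.
print(sub_block_1mb // KiB, "KB")  # 64 KB
```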

Now my question to you guys: what do you think? Would it make sense to always use an 8MB block size… I think it would.

Update: VMFS metadata backup

Duncan Epping · Apr 5, 2009 ·

Mike Laspina just released a new version of his VMFS metadata backup script. William Lam, the creator of the health check report script I wrote about, helped Mike add a new “rolling backup” feature, with the backups organized into folders based on the host name, store alias, date label and the rolling instance number. This new version saves 10 versions of your metadata instead of just one and gives them a more appropriate name.

You can find the new version of the VMFS metadata backup script here.

Protecting VMFS Datastores

Duncan Epping · Mar 29, 2009 ·

A couple of months ago Mike La Spina wrote a cool article titled “Understanding VMFS volumes”. This article explained the concept of UUIDs and even showed how you could create a backup of your VMFS metadata. I haven’t personally ever encountered VMFS metadata corruption, but it’s better to be safe than sorry.

Today Mike wrote a follow-up which contains a script that can back up your metadata on a regular basis by using cron. The script does a dd of each VMFS volume header and copies the vh.sf file of the VMFS volume to local disk. If you haven’t got a clue what I’m talking about, I suggest you head over to Mike’s blog and read the articles!
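
To give you an idea of what such a backup boils down to, here is a rough illustration. This is not Mike’s script; the device path, header size and file locations are assumptions, so head over to his blog for the real thing:

```python
# Illustrative only, not Mike's script: copy the first couple of MB of a VMFS
# partition (where the volume header lives) plus the hidden .vh.sf metadata
# file to a backup folder. Device path, header size and locations are assumptions.
import shutil
from datetime import datetime
from pathlib import Path

HEADER_BYTES = 2 * 1024 * 1024                  # assumed size of the region to keep
DEVICE = "/vmfs/devices/disks/naa.xxxxxxxx:1"   # hypothetical VMFS partition
DATASTORE = Path("/vmfs/volumes/datastore1")    # hypothetical datastore mount point
BACKUP_DIR = Path("/tmp/vmfs-meta-backup")

def backup_metadata() -> None:
    BACKUP_DIR.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")

    # Equivalent of "dd if=<device> of=<file> bs=1M count=2": raw copy of the header.
    with open(DEVICE, "rb") as dev, open(BACKUP_DIR / f"header-{stamp}.bin", "wb") as out:
        out.write(dev.read(HEADER_BYTES))

    # Copy the volume header metadata file from the root of the datastore.
    shutil.copy2(DATASTORE / ".vh.sf", BACKUP_DIR / f"vh-{stamp}.sf")

if __name__ == "__main__":
    backup_metadata()
```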

It can’t be a coincidence that the script has been released shortly after VMware released the VMware VMFS Volume Management PDF… And is it a coincidence that Eric Sloof wrote about VMFS metadata files yesterday? Eric also posted a link to Mostafa Khalil’s VMworld 2007 presentation on VMFS volumes and how to back up metadata, by the way.

All very useful documents and articles if you want to know the ins and outs of the VMFS filesystem.

