Thin provisioned disks and VMFS fragmentation, do I really need to worry?

I’ve seen this myth floating around from time to time, and as I never publicly wrote about it I figured it was time to write an article to debunk it. The question that is often posed is whether thin disks will hurt performance due to fragmentation of the blocks allocated on the VMFS volume. I guess we need to rehash some basics first around thin disks and VMFS volumes (do a search on VMFS for more info)…

When you format a VMFS volume you can select the blocksize (1MB, 2MB, 4MB or 8MB). This blocksize is used when the hypervisor allocates storage for the VMDKs. So when you create a VMDK on an 8MB formatted VMFS volume it will create that VMDK out of 8MB blocks, and yes indeed, in the case of a 1MB formatted VMFS volume it will use 1MB blocks. Now this blocksize also happens to be the size of the extent that is used for thin disks. In other words, every time your thin disk needs to expand it will grow in extents of the blocksize, so 1MB at a time on a 1MB formatted volume. (Related to that, with a lazy-thick disk the zero-out also uses the blocksize. So when something needs to be written to an untouched part of the VMDK it will zero out using the blocksize of the VMFS volume.)
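To put some numbers on that, here is a minimal sketch of the arithmetic. It assumes, as described above, that a thin disk always grows by whole VMFS blocks; the function name `blocks_needed` is made up for this illustration and is not part of any VMware tool.

```python
MB = 1024 * 1024
VALID_BLOCK_SIZES_MB = (1, 2, 4, 8)  # the blocksizes selectable at format time


def blocks_needed(growth_bytes: int, block_size_mb: int) -> int:
    """Return how many whole VMFS blocks are allocated to hold growth_bytes.

    Hypothetical helper: it only models "grow in units of the blocksize".
    """
    if block_size_mb not in VALID_BLOCK_SIZES_MB:
        raise ValueError("blocksize must be 1, 2, 4 or 8 MB")
    block_bytes = block_size_mb * MB
    # Ceiling division without floating point
    return -(-growth_bytes // block_bytes)


# The same 100MB of new data on a 1MB and on an 8MB formatted volume
for size_mb in (1, 8):
    count = blocks_needed(100 * MB, size_mb)
    print(f"{size_mb}MB blocksize: {count} block allocations")
```

The smaller the blocksize, the more separate allocations the same amount of growth needs, which is where the fragmentation worry comes from.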

So does using a thin disk in combination with a small blocksize cause more fragmentation? Yes, more than likely it would. However, the real question is whether it will hurt your performance. The answer to that is: no, it won’t. The reason is that the VMFS blocksize is totally irrelevant when it comes to Guest OS I/O. So let’s assume you have a regular Windows VM and this VM is issuing 8KB writes and reads to a 1MB blocksize formatted volume. The hypervisor won’t fetch 1MB, as that could cause a substantial overhead… no, it will request from the array what was requested by the OS, and the array will serve that up in whatever way it is configured to. I guess what people are worried about the most is sequential I/O, but think about that for a second or two. How sequential is your I/O when you are looking at it from the array’s perspective? You have multiple hosts running dozens of VMs accessing who knows how many volumes and subsequently who knows how many spindles. That sequential I/O isn’t as sequential anymore all of a sudden, is it?!

<edit> As pointed out, many arrays recognize sequential I/O and prefetch, which is correct. This doesn’t mean that contiguous blocks are faster, as fragmented blocks also mean more spindles etc. </edit>

I guess the main takeaway here is: stop worrying about VMFS. It is rock solid and it will get the job done.

Talking about face melting stuff

Yes, not only Chad Sakac deals with face melting ultra uber geeky cool stuff, I do as well (those on twitter know what I am referring to), but unfortunately I cannot share the details on some of the stuff I am working on. I can however provide you a link which contains the papers written by the engineers on some of the stuff that might be coming up in the future. Note the “might”: there is no guarantee it will ever make it into the VMware products, but it is nevertheless still cool to read:

http://labs.vmware.com/publications

Is this all distant future? No it isn’t. For instance the paper that talks about Parda is actually what ended up as Storage I/O Control in 4.1. A couple I would recommend reading or at least that had my personal interest are:

No one likes queues

Well, depending on what type of queues we are talking about of course, but in general no one likes queues. We are however confronted with queues on a daily basis, especially in compute environments. I was having a discussion with an engineer around storage queues and he sent me the following, which I thought was worth sharing as it gives a good overview of how traffic flows from queue to queue with the default limits on the VMware side:

From top to bottom:

  • Guest device driver queue depth (LSI=32, PVSCSI=64)
  • vHBA (Hard coded limit: LSI=128, PVSCSI=255)
  • Disk.SchedNumReqOutstanding=32 (VMkernel)
  • VMkernel Device Driver (FC=32, iSCSI=128, NFS=256, local disk=32)
  • Multiple SAN/Array Queues (Check Chad’s article for more details but it includes port buffers, port queues, disk queues etc (might be different for other storage vendors))

The following is probably worth repeating or clarifying:

The PVSCSI default queue depth is 64. You can increase it to 255 if required. Please note that it is a per-device queue depth, and keep in mind that this would only be truly useful when it is increased all the way down the stack and the array controller supports it. There is no point in increasing the queue depth on a single layer when the other layers are not able to handle it, as it would only push the delay down one layer. As explained in an article a year or three ago, Disk.SchedNumReqOutstanding is enforced when multiple VMs issue I/Os on the same physical LUN. When it is a single VM it doesn’t apply, and it will be the Device Driver queue that limits it.
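The "narrowest layer wins" reasoning can be sketched in a few lines. This is a deliberately simplified model and not how the VMkernel schedules I/O: it treats the stack as a chain where the smallest limit caps the outstanding I/Os, it uses the PVSCSI and FC defaults from the list above, and it ignores the array-side queues. The dictionary keys and the function name `effective_depth` are made up for the illustration.

```python
from typing import Dict

# Defaults taken from the list above (PVSCSI in the guest, FC device driver).
DEFAULT_LIMITS: Dict[str, int] = {
    "guest_driver": 64,
    "vhba": 255,
    "Disk.SchedNumReqOutstanding": 32,
    "device_driver": 32,
}


def effective_depth(limits: Dict[str, int], multiple_vms_on_lun: bool) -> int:
    """Return the smallest queue limit that applies along the stack.

    Simplified assumption: Disk.SchedNumReqOutstanding only counts when
    more than one VM issues I/O to the same LUN, as explained above.
    """
    active = dict(limits)
    if not multiple_vms_on_lun:
        active.pop("Disk.SchedNumReqOutstanding", None)
    return min(active.values())


# Raising only the guest queue depth changes nothing further down the stack
raised_guest_only = dict(DEFAULT_LIMITS, guest_driver=255)
print(effective_depth(DEFAULT_LIMITS, multiple_vms_on_lun=True))
print(effective_depth(raised_guest_only, multiple_vms_on_lun=True))
```

Both calls report the same limit, which is the point of the paragraph above: the delay just moves one layer down unless every layer is raised.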

I hope this provides a bit more insight into how the traffic flows. And by the way, if you are worried that a single VM floods one of those queues, there is an answer for that. It is called Storage IO Control!

Managing availability through vCenter Alarms

Last week a customer asked me a question about how to respond to, for instance, a partial failure in their SAN environment. A while back I had a similar question from one of my other customers so I more or less knew where to look, and I actually already blogged about this over a year ago when I was showing some of the new vSphere features. Although this is fairly obvious I hardly ever see people using it, and hence I wanted to document one of the obvious things that you can implement… alarms.

Alarms can be used to trigger an alert, and that is of course the default behavior of predefined alarms. However you can also create your own alarms and associate an action with it. I am showing the possibilities here and am not saying that this is a best practice, but the following two screenshots show that it is possible to place a host in maintenance mode based on degraded storage redundancy.

First you define the alarm:

And then you define the action:

Again, this action could have a severe impact when a switch fails and I wouldn’t recommend it, but I wanted to ensure everyone understands the type of combinations that are possible. I would generally recommend sending an SNMP trap or even a notification email… and I would recommend defining at least the following alarms:

  • Degraded Storage Path Redundancy
  • Duplicate IP Detected
  • HA Agent Error
  • Host connection lost
  • Host error
  • Host warning
  • Host WWN changed
  • Host WWN conflict
  • Lost Network Connectivity
  • Lost Network Redundancy
  • Lost Storage Connectivity
  • Lost Storage Path Redundancy

Many of these deal with hardware issues and you might already be monitoring for them. If not, make sure you monitor them through vCenter and take appropriate action when needed.
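If you want to check an environment against that list, a small sketch like the one below could help. It assumes you already have the names of the alarms defined in vCenter as plain strings, for example from an export. How you obtain them is left out on purpose, and the alarm names are simply the ones from the list above, so they may not match the exact names in your vCenter version. The function name `missing_alarms` is made up.

```python
from typing import Iterable, List

# Names copied from the list above
RECOMMENDED_ALARMS = [
    "Degraded Storage Path Redundancy",
    "Duplicate IP Detected",
    "HA Agent Error",
    "Host connection lost",
    "Host error",
    "Host warning",
    "Host WWN changed",
    "Host WWN conflict",
    "Lost Network Connectivity",
    "Lost Network Redundancy",
    "Lost Storage Connectivity",
    "Lost Storage Path Redundancy",
]


def missing_alarms(defined_names: Iterable[str]) -> List[str]:
    """Return the recommended alarms that are absent from defined_names.

    The comparison ignores case and surrounding whitespace.
    """
    defined = {name.strip().lower() for name in defined_names}
    return [name for name in RECOMMENDED_ALARMS if name.lower() not in defined]


# Example with a made-up export that only contains two of the alarms
print(missing_alarms(["Host error", " host warning "]))
```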

PowerCLI reference book

I blogged about this almost two months ago and just read on Luc’s blog that the release date has been set and the cover art has been released. I wanted to remind all of you of this one-of-a-kind book: VMware vSphere PowerCLI Reference: Automating vSphere.

Release date: 28th of March
Authors: Luc Dekens, Alan Renouf, Glenn Sizemore, Arnim van Lieshout, Jonathan Medd
ISBN: 0470890797

Converting Open Virtualization Format (OVF) Virtual Machines to VMware Fusion

I needed to run an appliance inside VMware Fusion on my Mac, and the appliance was in OVF format. VMware Fusion currently does not support this format and requires you to convert the image with a tool called ovftool, which can be downloaded at the following location: http://communities.vmware.com/community/vmtn/vsphere/automationtools/ovf

Conversion is as simple as:

./ovftool "source.ovf" "target"

Optionally you could use parameters that are described in full detail when running ovftool --help. What remains is importing the created .vmx into Fusion, and that’s it.
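If you have more than one appliance to convert, the same call can be wrapped in a small script. This is only a sketch: it assumes ovftool sits in the current directory as in the command above, and the function names and the `extra_args` parameter are made up for the illustration. It passes the source and target through unchanged and does not add any options of its own.

```python
import subprocess
from typing import List, Sequence


def build_ovftool_command(source: str, target: str,
                          ovftool: str = "./ovftool",
                          extra_args: Sequence[str] = ()) -> List[str]:
    """Build the argument list for: ovftool [options] <source> <target>."""
    return [ovftool, *extra_args, source, target]


def convert(source: str, target: str, **kwargs) -> None:
    """Run the conversion and raise CalledProcessError if ovftool fails."""
    subprocess.run(build_ovftool_command(source, target, **kwargs), check=True)


# Usage (not executed here, since it needs ovftool and a real OVF file):
# convert("source.ovf", "target")
print(build_ovftool_command("source.ovf", "target"))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.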

-ray

RE: VMFS 3 versions – maybe you should upgrade your vmfs?

I was just answering some questions on the VMTN forum when someone asked the following question:

Should I upgrade our VMFS luns from 3.21 (some in 3.31) to 3.46 ? What benefits will we get?

This person was referred to an article by Frank Brix Pedersen who states the following:

Ever since ESX3.0 we have used the VMFS3 filesystem and we are still using it on vSphere. What most people don’t know is that there actually is sub versions of the VMFS.

  • ESX 3.0 VMFS 3.21
  • ESX 3.5 VMFS 3.31 key new feature: optimistic locking
  • ESX 4.0 VMFS 3.33 key new feature: optimistic IO

The good thing about it is that you can use all features on all versions. In ESX4 thin provisioning was introduced but it does not need the VMFS to be 3.33. It will still work on 3.21. The changes in the VMFS is primarily regarding the handling of SCSI reservations. SCSI reservations happens a lot of times. Creation of new vm, growing a snapshot delta file or growing thin provisioned disk etc.

I want to make sure everyone realizes that this is actually not true. All the enhancements made in 3.5, 4.0 and even 4.1 are not implemented on a filesystem level but rather on a VMFS Driver level or through the addition of specific filters or even a new datamover.

Just to give an extreme example: You can leverage VAAI capabilities on a VMFS volume with VMFS filesystem version 3.21, however in order to invoke VAAI you will need the VMFS 3.46 driver. In other words, a migration to a new datastore is not required to leverage new features!
