I just had a discussion about an upgrade from 3.0.x to 3.5. During that conversation the new HBA drivers that VMware uses came up. After an upgrade it seems that the queue depth settings are lost. Thinking about it I can understand why. ESX 3.5 uses a different driver for most HBAs. For instance Emulex: their driver for the LP11002 used to be “lpfcdd_732” and is now “lpfc_740”. After an upgrade the new driver/module doesn’t get the options you provided with “esxcfg-module”, because those options were specified for that specific “old” module. This means that you have to set these options again!
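A minimal sketch of what re-applying those options could look like for this Emulex example; the option name “lpfc0_lun_queue_depth” and the value 64 are illustrative, so check your HBA vendor’s documentation for the exact parameter. First check which options, if any, the new module currently carries:
esxcfg-module -g lpfc_740
Then set the option against the new module name and rebuild the boot configuration:
esxcfg-module -s lpfc0_lun_queue_depth=64 lpfc_740
esxcfg-boot -b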
Queuedepth, how and when
So you’ve probably heard this from a few dozen people by now when you don’t hit the expected SAN performance: set your queuedepth to a larger size.
So how do you set this queuedepth? Find out for which module you’ll need to set this option:
vmkload_mod -l | grep qla
Now set it to a depth of 64 for module qla2300_707
esxcfg-module -s ql2xmaxqdepth=64 qla2300_707
esxcfg-boot -b
So now you’ve set the queue depth to 64 for your HBA cards, but why? Well, I hope the answer is: “because I monitored my system with esxtop and noticed that the QUED value was high”.
So there’s your when: you’ll need to change this setting if you notice a high “QUED” value in esxtop. Take a look at the following example I borrowed from a great blog on this subject:
As you can see in the example, “ACTV” has a value of 32. Indeed 32 active commands, because that’s the default queue depth for QLogic cards. And 31 outstanding commands; in other words, if we bump up the queue depth to 64 then all the commands should be processed instead of queued in the VMkernel.
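For reference, here is a mock-up of the kind of esxtop disk view being described; the columns are abbreviated, the ACTV and QUED values are the ones from the example, and %USD is illustrative (32 of 32 slots in use):
DEVICE           QDEPTH  ACTV  QUED  %USD
vmhba1:0:1           32    32    31   100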
What will this result in?
Queuedepth, and what’s next?
I’ve seen a lot of people picking up on the queuedepth settings lately, especially when there are QLogic adapters involved. Although it can be really beneficial to set the queuedepth to 64, it’s totally useless when one forgets about the “Disk.SchedNumReqOutstanding” setting. This setting always has to be aligned with the queuedepth, because if Disk.SchedNumReqOutstanding is given a lower value than the queue depth, only that many outstanding commands are issued from the ESX kernel to the LUN for all virtual machines combined. In other words, if you set a queuedepth of 64 and a Disk.SchedNumReqOutstanding of 16, only 16 commands get issued at a time to the LUN instead of the 64 your queuedepth allows.
You can set Disk.SchedNumReqOutstanding via the command line and via VirtualCenter:
- VirtualCenter -> Configuration Tab -> Advanced Settings -> Disk -> Disk.SchedNumReqOutstanding
- Commandline -> esxcfg-advcfg -s 64 /Disk/SchedNumReqOutstanding
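Putting the two together, a minimal sketch that aligns both values at 64, reusing the qla2300_707 module from the earlier example (verify the module name on your own host with vmkload_mod -l):
esxcfg-module -s ql2xmaxqdepth=64 qla2300_707
esxcfg-boot -b
esxcfg-advcfg -s 64 /Disk/SchedNumReqOutstanding
And to verify the new value afterwards:
esxcfg-advcfg -g /Disk/SchedNumReqOutstanding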
Disk.UseDeviceReset section is deprecated, see this article for more info.
vVols and queueing
I was reading an article last week by Ray Lucchesi on Virtual Volumes (vVols) and queueing. In that article (and the accompanying podcast) Ray and friends describe vVols and the benefits they bring, but also a potential danger. I have written about vVols before, and if you don’t know what it is or does then I recommend reading those articles. I have been wondering as well how all of this works, as I also felt that there could easily be a bottleneck. I had some conversations over the last couple of weeks and figured I would share the outcome with you instead of just leaving a comment on Ray’s blog. Let’s look at an architectural diagram first:
In the diagram above (which I borrowed from the vSphere Storage blog, thanks Rolo) you see two important constructs which are part of the overall vVols architecture, namely the Storage Container aka Virtual Datastore and the Protocol Endpoint (PE). The Storage Container is where the vVols will be stored. The IO, though, is proxied through the Protocol Endpoint. You can imagine that if we did not do this and exposed every single vVol directly to vSphere, you would have thousands of devices connected to vSphere, and as you know vSphere has a 256-device limit at the moment. This would never scale, and as such the Protocol Endpoint is used as an access point to a vVols-capable storage system.
Now think about a VMFS volume and look at the vVols architectural diagram again. Yes, there is a potential bottleneck indeed. However, what the diagram does not show is that you can have multiple Protocol Endpoints. Ray mentions the following in his post: “I am also not aware of any VASA 2.0 requirement that restricts the number of PEs for a storage system’s support of a single vSphere cluster”. And I can confirm that VMware did not limit the number of Protocol Endpoints in any shape or form: I read the specifications, and they literally state 1 PE at a minimum and preferably more. Note that vendor implementations of vVols may differ; I have seen implementations that describe many PEs per storage system, but also implementations which have 1 PE per storage system. And in the case of 1 PE per storage system, can that be a bottleneck?
The queue depth of the Protocol Endpoint isn’t limited to 32 like a regular LUN when multiple VMs are contending for IO (“Disk.SchedNumReqOutstanding”), or to 64 (a typical device queue depth); it is set to 128 by default. This can be increased when required, however. Before you do, please consult your storage vendor: there are a couple of variables that need to be taken into account, like the max device queue depth, and then there is the HBA max queue depth as well. (For NFS, queue depth is typically not a concern.) The potential constraint when there is only a single PE (which is uncommon) can be mitigated. What is important here is that vVols itself does not impose any constraints.
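For those who want to poke at this on their own hosts, a hedged sketch for a vVols-capable 6.x host; I am assuming the esxcli vVols namespace and the Scsi.ScsiVVolPESNRO advanced option here, so double-check both against your version’s documentation:
esxcli storage vvol protocolendpoint list
esxcfg-advcfg -g /Scsi/ScsiVVolPESNRO
The first command lists the Protocol Endpoints the host sees; the second should return the PE default of 128 outstanding I/Os mentioned above.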
Also, note that some storage vendors have an implementation where the array can actually make the distinction between regular IO and control/management-related IO. Regular IO in those cases isn’t proxied through the PE, which means you will not fill up the queue of the PE. Pretty smart.
I am hoping that clears up some of the misunderstandings out there.
No one likes queues
Well, depending on what type of queues we are talking about of course, but in general no one likes queues. We are, however, confronted with queues on a daily basis, especially in compute environments. I was having a discussion with an engineer around storage queues and he sent me the following, which I thought was worth sharing as it gives a good overview of how traffic flows from queue to queue with the default limits on the VMware side:
From top to bottom:
- Guest device driver queue depth (LSI=32, PVSCSI=64)
- vHBA (Hard coded limit: LSI=128, PVSCSI=255)
- Disk.SchedNumReqOutstanding=32 (VMkernel)
- VMkernel Device Driver (FC=32, iSCSI=128, NFS=256, local disk=32)
- Multiple SAN/Array Queues (Check Chad’s article for more details but it includes port buffers, port queues, disk queues etc (might be different for other storage vendors))
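If you want to see the device-level limit on your own host, a hedged example: on 5.x-style hosts esxcli reports a “Device Max Queue Depth” per device, which corresponds to the VMkernel device driver entry in the list above:
esxcli storage core device list | grep -i "Queue Depth"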
The following is probably worth repeating or clarifying:
The PVSCSI default queue depth is 64. You can increase it to 255 if required, but please note that it is a per-device queue depth, and keep in mind that this is only truly useful when it is increased all the way down the stack and the Array Controller supports it. There is no point in increasing the queuedepth on a single layer when the other layers are not able to handle it, as that would only push the delay down one layer. As explained in an article a year or three ago, Disk.SchedNumReqOutstanding is enforced when multiple VMs issue I/Os on the same physical LUN; when it is a single VM it doesn’t apply, and it will be the device driver queue that limits it.
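To make the guest end of that stack concrete: on Windows the PVSCSI queue depth is typically raised through a driver parameter in the registry. The sketch below reflects my understanding of VMware’s guidance (RequestRingPages=32, MaxQueueDepth=254); treat the value name and string as assumptions and verify them against the current KB article before using them. A guest reboot is required for the change to take effect:
REG ADD HKLM\SYSTEM\CurrentControlSet\services\pvscsi\Parameters\Device /v DriverParameter /t REG_SZ /d "RequestRingPages=32,MaxQueueDepth=254"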
I hope this provides a bit more insight into how the traffic flows. And by the way, if you are worried that a single VM floods one of those queues, there is an answer for that: it is called Storage IO Control!