I’ve seen a lot of people picking up on the queue depth settings lately, especially where QLogic adapters are involved. Setting the queue depth to 64 can be really beneficial, but it’s useless if you forget about the “Disk.SchedNumReqOutstanding” setting. This setting always has to be aligned with the queue depth: if Disk.SchedNumReqOutstanding is given a lower value than the queue depth, only that many outstanding commands are issued from the ESX kernel to the LUN from all virtual machines. In other words, if you set a queue depth of 64 and a Disk.SchedNumReqOutstanding of 16, only 16 commands get issued to the LUN at a time instead of the 64 your queue depth allows.
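To make that interaction concrete, here’s a tiny plain-shell illustration (not an ESX command, just arithmetic) of the rule described above: the effective number of commands in flight is whichever of the two settings is lower.

```shell
# Toy illustration: effective limit = min(queue depth, Disk.SchedNumReqOutstanding)
qdepth=64
dsnro=16
effective=$(( dsnro < qdepth ? dsnro : qdepth ))
echo "$effective"   # only 16 commands issued at a time, despite the 64 queue depth
```

Set both values the same and the cap disappears.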
You can set Disk.SchedNumReqOutstanding via the command line and via VirtualCenter:
- VirtualCenter -> Configuration Tab -> Advanced Settings -> Disk -> Disk.SchedNumReqOutstanding
- Commandline -> esxcfg-advcfg -s 64 /Disk/SchedNumReqOutstanding
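A hedged end-to-end sketch for an ESX 3.x host with a QLogic HBA is shown below. The module name (qla2300_707 here) and the exact option name vary per driver and release, so treat both as assumptions and check what is actually loaded with “vmkload_mod -l” on your own host:

```shell
# Assumption: QLogic driver module is qla2300_707 and accepts ql2xmaxqdepth.
# Verify the module name first with: vmkload_mod -l
esxcfg-module -s ql2xmaxqdepth=64 qla2300_707

# Align Disk.SchedNumReqOutstanding with the new queue depth
esxcfg-advcfg -s 64 /Disk/SchedNumReqOutstanding

# Rebuild the boot configuration so the module option persists, then reboot
esxcfg-boot -b
reboot
```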
The Disk.UseDeviceReset section is deprecated; see this article for more info.
Hi,
Could it be that you mixed up the default values in this sentence:
ESX defaults to Disk.UseLunReset=1 and Disk.UseDeviceReset=0
Because this is what you’re setting it to.
Regards,
Lukas
I checked my ESX 3.5u1 servers and they have both the device-reset and LUN-reset parameters set to 1 (!!!). What’s up with that? I have not changed the parameters from the native install.
Yeah, you’re right Lukas, that’s a typo. Fixed it. It defaults to 1 and 1, and it should be 1 (Disk.UseLunReset) and 0 (Disk.UseDeviceReset) in a SAN environment.
Duncan,
First of all great site, great information.
I have a different understanding of the purpose of the SchedNumReqOutstanding setting. I use it to ensure that a single high-I/O VM cannot swamp the HBA queue at the expense of other VMs on that LUN. In that case SchedNumReqOutstanding is a per-VM limit on the requests that can be sent to a particular LUN, and it should be less than the HBA queue depth so that other VMs can also queue requests.
The VMware DSA course also notes that the SchedNumReqOutstanding limit only applies when there is more than one VM per LUN. If there is only one VM on the LUN, the HBA queue depth applies.
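That reading can be sketched as a toy shell model (an illustration of the stated rule, not anything ESX itself exposes): the per-LUN limit only drops to DSNRO once more than one VM shares the LUN.

```shell
# Toy model: DSNRO caps issues per LUN only when >1 VM shares it;
# with a single VM, the HBA queue depth applies instead.
hba_qdepth=64
dsnro=32
vms_on_lun=3

if [ "$vms_on_lun" -gt 1 ]; then
  effective=$dsnro
else
  effective=$hba_qdepth
fi
echo "$effective"
```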
Well, that’s another way to put it, I guess. If you reach 64 mainly due to one VM, that’s a great way to cap the VM, probably the only way.
Edit:
Just been looking for more info and this pdf came along:
http://www.vmware.com/files/pdf/scalable_storage_performance.pdf
Yes it says:
Also make sure to set the Disk.SchedNumReqOutstanding parameter to the same value as the queue depth.
If this parameter is given a higher value than the queue depth, it is still capped at the queue depth. However,
if this parameter is given a lower value than the queue depth, only that many outstanding commands are
issued from the ESX kernel to the LUN from all virtual machines.
But what about the situation where you have multiple LUNs with multiple VMs?
It’s 64 for each LUN, per VM, when multiple VMs access the LUN.
Do all the ESX servers need to be rebooted after?
Yes, and do an “esxcfg-boot -b” when you’ve applied the queue depth settings!
What does that command do?
It sets up the information required for booting, which includes this parameter.
I can run it and reboot at a later date correct?
Yes you can, but I would recommend doing it ASAP.
Thanks Duncan. Much appreciated.
There’s much more that goes into setting the queue depth than assuming that a queue depth of X on the host will be sufficient and will drive high I/O. You also need to consider:
- the fan-in ratio of host ports to a single target port, and
- the queue depth on the array’s target port side. And given that A/A multipathing is not officially supported, the active path for each LUN should be balanced across all front-end target ports.
Say I have a hypothetical target queue depth of 512 per port, and through that port I expose 4 LUNs (datastores) to a 4-node ESX cluster with a host queue depth of 64 on each. Each host has an active path through the target port.
Then I can potentially end up with 4 x 4 x 64 = 1024 outstanding I/Os, at which point the target port will issue a QFULL condition simply because it is saturated.
The way FC drivers deal with such conditions is to throttle I/O significantly so the target queues have time to clear, and then the initiator will gradually increase I/O again. The end result is significant latency, and for some operating systems (e.g. AIX) this condition will result in an I/O error if 3 consecutive QFULL conditions occur for the same request.
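The back-of-the-envelope math above can be checked in plain shell; the numbers (4 hosts, 4 LUNs, 64 host queue depth, 512 target port slots) are the hypothetical figures from the example:

```shell
# Worst case: every host can queue its full depth to every LUN through one target port
hosts=4
luns=4
host_qdepth=64
target_port_qdepth=512

worst_case=$(( hosts * luns * host_qdepth ))
echo "worst case: $worst_case outstanding I/Os vs $target_port_qdepth target slots"
if [ "$worst_case" -gt "$target_port_qdepth" ]; then
  echo "target port can be saturated (QFULL)"
fi
```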
I’ve written an article on Dynamic Queue Depth management and what NetApp has done to control, monitor and dynamically allocate and change queue slot allocations without having to touch the host queue depth value beyond the initial setup.
http://partners.netapp.com/go/techontap/matl/fc-sans.html
cheers
Thanks for the excellent reply, Nick! This is valuable info.
I see this is a two year old article, nevertheless still current on our site.
I read it and have a question (or two).
How can you raise one Disk.Sched setting without raising (or changing) the other 3 ?
I mean, when you raise the Disk.SchedNumReqOutstanding value are you not required to also raise 1 or more of the other Disk.Sched settings ?
If you, for example, look at the Disk.SchedQuantum default and max values (8 and 64), they scale by the same factor (8) as the default and max values of Disk.SchedNumReqOutstanding (32 and 256). So if you double the one, shouldn’t you be doubling the other too?
The same goes for Disk.SchedQControlSeqReqs: to be able to max out the outstanding commands, shouldn’t you also double the default here?
These are the four Disk.Sched settings:
Disk.SchedNumReqOutstanding
Number of outstanding commands to a target with competing worlds [1-256: default = 32]: 32
Disk.SchedQuantum
Number of consecutive requests from one World [1-64: default = 8]: 8
Disk.SchedQControlSeqReqs
Number of consecutive requests from a VM required to raise the outstanding commands to max [0-2048: default = 128]: 128
Disk.SchedQControlVMSwitches
Number of switches between commands issued by different VMs required to reduce outstanding commands to SchedNumReqOutstanding [0-2048: default = 6]: 6
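If you did want to scale them together, it could look like the sketch below, using the same esxcfg-advcfg syntax as earlier in the post. Whether all four must actually be scaled in lockstep is exactly the open question in this comment, so treat these values as an illustration, not a recipe:

```shell
# Hedged sketch: double SchedNumReqOutstanding and scale the related
# scheduler settings by the same factor (defaults shown in comments).
esxcfg-advcfg -s 64  /Disk/SchedNumReqOutstanding   # default 32, doubled
esxcfg-advcfg -s 16  /Disk/SchedQuantum             # default 8, doubled
esxcfg-advcfg -s 256 /Disk/SchedQControlSeqReqs     # default 128, doubled
esxcfg-advcfg -s 6   /Disk/SchedQControlVMSwitches  # default 6, left alone
```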
The explanation vCenter gives for DSNRO is that it applies when two worlds are competing for the same resource (LUN/datastore).
So what I have seen is: drop the QD value and the DSNRO to the same value, as it will operate better that way.
Having too high a QD value will eat up available port tags,
so tag count / LUNs = queue depth.
But, and it’s a big but, watch out: if you add more LUNs, the available tags will drop and you will swamp the port.
8-16 is fine.