Disk Controller features and Queue Depth?

I have been working on various VSAN configurations, and a question that always comes up is: what are the features and the queue depth of disk controller X? (Local disks, not FC based…) Note that this is not only useful to know when using VSAN, but also when you are planning on doing host local caching with solutions like PernixData FVP or SanDisk FlashSoft. The controller used can impact performance, and a really low queue depth will result in lower performance; it is as simple as that.

I have found myself digging through documentation and searching the internet until I stumbled across the following website. I figured I would share the link with you, as it will help you (especially consultants) when you need to go through this exercise multiple times:

http://forums.servethehome.com/index.php?threads/lsi-raid-controller-and-hba-complete-listing-plus-oem-models.599/

Just as an example, the Dell H200 Integrated disk controller is on the VSAN HCL. According to the website above it is based on the LSI 2008 and provides the following feature set: 2×4 port internal SAS, no cache, no BBU, and RAID 0, 1 and 10. According to the VSAN HCL it also provides “Virtual SAN Pass-Through”. I guess the only info missing is the queue depth of the controller. I have not been able to find a good source for this, so I figured I would make this post a source for that info.

Before we dive into that, I want to show something else that is important to realize. Some controllers accept SAS / NL-SAS as well as SATA drives. Although the price difference between SATA and NL-SAS is typically negligible, the queue depth difference is not. Erik Bussink was kind enough to provide me with these details for one of the controllers he is using as an example; the first entry in the list is the “RAID” device, the second is SATA, and the third is SAS. As you can see, SAS is the clear winner here, and that includes NL-SAS drives.

  mpt2sas_raid_queue_depth: int
     Max RAID Device Queue Depth (default=128)
  mpt2sas_sata_queue_depth: int
     Max SATA Device Queue Depth (default=32)
  mpt2sas_sas_queue_depth: int
     Max SAS Device Queue Depth (default=254)
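These defaults come from the Linux mpt2sas driver. If you want to see what the driver for your own controller exposes, a minimal sketch (assuming a Linux host with the mpt2sas driver for the first command, and an ESXi 5.x host for the second) would be:

  # Linux: list the mpt2sas module parameters and their default values
  modinfo -p mpt2sas

  # ESXi: list the parameters the mpt2sas module accepts, including any
  # values that have been overridden on this host
  esxcli system module parameters list -m mpt2sas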

If you want to contribute, please take the following steps and report the vendor, controller type and AQLEN value in a comment. (If you prefer the command line over esxtop, a way to cross-check per-device values is sketched below the steps.)

  1. Run the esxtop command in the ESXi shell / an SSH session
  2. Press d
  3. Press f and select Queue Stats (d)
  4. The value listed under AQLEN is the queue depth of the storage adapter
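esxtop is interactive; if you prefer to cross-check from the command line, esxcli can list each device’s maximum queue depth. Note that this is the device queue rather than the adapter’s AQLEN, so treat it as a sanity check alongside esxtop (a minimal sketch; the “Device Max Queue Depth” field is shown as of ESXi 5.5):

  # Show each device together with its maximum queue depth
  esxcli storage core device list | grep -E "^(naa|t10|mpx)|Device Max Queue Depth"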

The following table shows the vendor, controller and queue depth. Note that this is based on what we (my readers and I) have witnessed in our labs, and results may vary depending on the firmware and driver used. Make sure to check the VSAN HCL for the supported driver / firmware version. Also note that not all controllers below are on the VSAN HCL; this is a “generic” list, as I want it to serve multiple use cases.

Generally speaking, it is recommended to use a disk controller with a queue depth > 256 when used for VSAN or “host local caching” solutions.

Vendor    Disk Controller                    Queue Depth
Adaptec   RAID 2405                          504
Dell      (R610) SAS 6/iR                    127
Dell      PERC 6/i                           925
Dell      PERC H200 Integrated               600
Dell      PERC H310                          25
Dell      (M710HD) PERC H200 Embedded        499
Dell      (M910) PERC H700 Modular           975
Dell      PERC H700 Integrated               975
Dell      (M620) PERC H710 Mini              975
Dell      (T620) PERC H710 Adapter           975
Dell      (T620) PERC H710p                  975
Dell      PERC H810                          975
HP        Smart Array P220i                  1020
HP        Smart Array P400i                  128
HP        Smart Array P410i                  1020
HP        Smart Array P420i                  1020
HP        Smart Array P700m                  1200
IBM       ServeRAID-M5015                    965
IBM       ServeRAID-M5110                    975
Intel     C602 AHCI (Patsburg)               31 (per port)
Intel     C602 SCU (Patsburg)                256
Intel     RMS25KB040                         600
LSI       2004                               25
LSI       2008                               25 / 600 (firmware dependent!)
LSI       2108                               600
LSI       2208                               600
LSI       2308                               600
LSI       3008                               600
LSI       9300-8i                            600

    Comments

    1. says

      You bring up a very good point, Duncan. Thanks for that. The only question that remains from an architecture/design point of view is how big the impact of the queue depth is. More than 250 would suit small/regular workloads well, but there is still a great difference (probably in price as well) between that and 1000+.

      Question 1: At what point is 250 not enough anymore, and when would you need to go for at least one of those 600s?
      Question 2 (to avoid the “we are the best” FUD): is there a useful ceiling? Is there a point where a bigger queue depth just doesn’t make sense anymore, or is that nonexistent and is bigger always better?

      • says

        Good question… In the end it is going to depend on your IO pattern, the number of disks and SSDs attached to the controller, and what you can afford, I guess. Consider though that a controller with a queue depth of 600, like the 2208, is only a couple of bucks more than the 2008 (typically around 150 bucks per server with Supermicro).

        Q1: A SATA disk typically has a queue depth of 32, while SAS and NL-SAS have 256. So 600 is the minimum you should aim for, if you ask me, when you do VSAN or local host caching. Take anything less and you get this nice hourglass effect :) (think of the LSI 2004 above: an adapter queue of 25 sitting in front of SAS disks that can each queue 254 IOs).

        Q2: Not sure if there really is a ceiling, to be honest. I mean, a bigger queue depth will allow you to use the queues of your devices more efficiently, whether those are SAS magnetic disks or SAS flash devices.

        Hope that helps. (PS, I am not the real expert on this topic either)

    2. Peter Vandermeulen says

      I tested a few of our Dell servers:
      Dell PERC H710 Mini 975
      Dell PERC H810 975
      Dell PERC H700 Integrated 975

    3. Wade Holmes says

      Please be aware that queue depth varies depending on the driver. For example, the queue depth of the LSI 2004 is 25 with the driver that will be supported by VSAN (megaraid_sas). For the LSI 3008, queue depth can be either 256 or 1024, depending on the driver. VMware is working on having this information added to the VMware VCG for VSAN.

    4. says

      Great post! Always happy to learn a new trick (finding queue depth via esxtop) so thanks!

      You already had the info for the H710 Mini I included below, but I’m also listing the server model in case that helps anyone. I’ve also added a few new ones that weren’t listed yet. The following are a few flavors of Dell blades and a couple of types of T620s we use for our field offices running ESXi. Device names were verified via lspci.

      Dell PowerEdge M910 = Dell PERC H700 Modular, 975
      Dell PowerEdge M620 = Dell PERC H710 Mini, 975 (already listed)
      Dell PowerEdge T620 = Dell PERC H710 Adapter, 975
      Dell PowerEdge T620 = Dell PERC H710P Adapter, 975

      • says

        The NVMe specification outlines support for 64K IOs per queue, and many queues. Controller data sheets will specify the maximum number of queues and the maximum concurrent IOs that can be processed by the controller; they will if they want to help us, anyway :). The equivalent comparison to this article (great article BTW) would be: the controller accepts 64K (and more) IO submissions, but just like SAS at 254 and SATA at 32, an NVMe controller will need to specify how many IOs it can actually process concurrently, which I assure you will not be 64K :). That said, the trick will be to balance outstanding requests with the device’s (and driver’s) ability to keep up. Once you overload the device, the kernel will throttle and queue to prevent stalling the system while the device chokes through its list of IO requests.
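        A concrete way to see what a controller actually grants (a hedged sketch, assuming a Linux host with nvme-cli installed and a device at /dev/nvme0):

          # Ask the controller how many IO submission/completion queues it
          # supports (NVMe feature ID 0x07, "Number of Queues"); -H decodes
          # the raw result into human-readable form
          nvme get-feature /dev/nvme0 -f 0x07 -H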

    5. Michael says

      I did not find any Storage vMotion tests between two datastores. Is it possible to test this with VSAN on the same host, to see whether it can saturate a 10 Gigabit link at the VMware layer?

    6. Sylvain says

      Hi,
      I also checked the parameter on an HP ProLiant Gen8 with a P220i and a P420i, and I only see 28 for each in the AQLEN parameter. That is very far from the values others observed. Is there a parameter somewhere that could have been set and could be limiting it? I ask because I see very poor performance results so far on my VSAN cluster, and I think something must be wrong; it might be this queue length.
      Sylvain

        • Sylvain says

          Yes: everything is updated with the latest disk firmware and I used the HP ESXi 5.5U1 ISO.
          But I applied existing server profiles to configure the servers, and I am beginning to wonder if an old parameter could be in there, inherited from an older server with less capable hardware.
          What do you think?

          • Matt Mabis says

            This is caused by using the HP async driver; the inbox driver is the supported vSAN driver, and the value will change back from 28 to 1020 once you revert. To test this, all you have to do is download the offline bundle and use esxcli commands to remove the async driver and install the inbox driver.

            • Sylvain says

              Hello Matt
              Sorry for the late reply, but I only just saw your comment.
              Thank you for your answer.
              Can you elaborate a little more (or give me a link) on the procedure please?

            • Sylvain says

              Hi
              Just wanted to let you know I managed to swap the driver and now I see the correct AQLEN of 1020.
              For those interested, here is the process I used:

              Download update-from-esxi5.5-5.5_update01.zip from VMware (ESXi download section)
              Unzip it and extract VMware_bootbank_scsi-hpsa_5.5.0-44vmw.550.0.0.1331820.vib
              Upload it to the host or a shared datastore
              SSH to the host (or use the CLI) as root and run:

              esxcli software vib remove -n "scsi-hpsa"
              esxcli software vib install -v /path_you_stored_the_vib_file/VMware_bootbank_scsi-hpsa_5.5.0-44vmw.550.0.0.1331820.vib

              reboot, et voilà!
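              To double-check that the swap worked, you can list the installed hpsa VIB afterwards (just a sanity check; the exact version string may differ on your build):

                # Confirm which hpsa driver package is installed now
                esxcli software vib list | grep hpsa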

              Thanks again Matt.

      • Ryan says

        I’m seeing the exact same thing (28) – DL380p Gen8, P420i, HP ESXi 5.5u1 fully patched. My Fusion-IO ioDrive2 (PCIe SSD) shows 5000, which sounds nice. :)

      • says

        To be honest, I would not run these disk controllers myself. But I cannot make an official VMware statement and say you need to replace them. I suggest contacting your VMware representative and asking him/her to pass this on to product management.

    7. IronMtn says

      Hi Duncan,

      I’ve been looking at disk performance counter info lately, and I was under the impression that QUED was the queue depth, not AQLEN. I’ve also seen some people equate QUED to the stat disk.queueLatency.average. Can you help clear up my confusion on this?

    8. Shafay Latif says

      The HP H222 SAS controller has AQLEN=600.
      The best option is just to go with the embedded HP P420i, with AQLEN=1020.
      ***Enable HBA mode/pass-through on the P420i using HPSSACLI and the following ESXi commands:
      - Make sure the disks are wiped clean and no RAID config exists
      - Make sure the firmware is the latest, v5.42
      - Make sure the ESXi device driver v5.5.0-44vmw.550.0.0.1331820 is installed: http://vibsdepot.hp.com/hpq/feb2014-550/esxi-550-devicedrivers/hpsa-5.5.0-1487947.zip

      - Put the host in maintenance mode, then from the iLO of the ESXi host in support mode (Alt+F1) execute the following:

      To view the controller config using HPSSACLI via ESXCLI:
      ~ # esxcli hpssacli cmd -q "controller slot=0 show config detail"
      To enable HBA mode on the P420i using HPSSACLI:
      ~ # esxcli hpssacli cmd -q "controller slot=0 modify hbamode=on forced"

      Reboot the host and perform a rescan, and voilà … the disks will show up in the vSphere Web Client under each host > Devices, before you enable vSAN.
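      To confirm that HBA mode actually took effect after the reboot, re-run the config query from above and check what the controller now reports (the exact wording of the HBA mode line may vary by firmware version):

      ~ # esxcli hpssacli cmd -q "controller slot=0 show config detail"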
