I was reading an article last week by Ray Lucchesi on Virtual Volumes (vVols) and queueing. In that article (and podcast) Ray (and friends on the podcast) describe vVols and the benefits they bring but also a potential danger. I have written about vVols before and if you don’t know what it is or does then I recommend reading those articles. I have been wondering as well, how all of this works, as I also felt that there could easily be a bottleneck. I had some conversations over the last couple of weeks and I figured I would share it with you instead of just leaving a comment on Ray’s blog. Lets look at an architectural diagram first:
In the diagram above (which I borrowed from the vSphere Storage blog, thanks Rolo) you see two important constructs which are part of the overall vVols architecture namely the Storage Container aka Virtual Datastore and the Protocol Endpoint (PE). The Storage Container is where the vVols will be stored. The IO though is proxied through the Protocol Endpoint. You can imagine that if we would not do this and expose every single vVol directly to vSphere that you would have 1000s of devices connected to vSphere, and as you know vSphere has a 256 device limit at the moment. This would never scale, and as such the Protocol Endpoint is used as an access point to a vVols capable storage system.
Now think about a VMFS volume and look at the vVols architectural diagram again. Yes, there is a potential bottleneck indeed. However, what the diagram does not show is that you can have multiple Protocol Endpoints. Ray mentions the following in his post: “I am also not aware of any VASA 2.0 requirement that restricts the number of PEs for a storage system’s support of a single vSphere cluster”. And I can confirm that VMware did not limit the number of Protocol Endpoints in any shape or form. I read the specifications and it literally states 1 PE at a minimum and preferably more. Note that vendor implementations of vVols may differ, I have seen implementations that describe many PEs per storage system, but also implementations which have 1 PE per storage system. And in the case of 1 PE per storage system can that be a bottleneck?
The queue depth of the Protocol Endpoint isn’t limited to 32 like a regular LUN when multiple VMs are contending for IO (“disk.schednumreqoutstanding”) or 64 (typical device queue depth) but set to 128 by default. This can be increased when required however. Before you do, please consult your storage vendor. There are a couple of variables that need to be taken in to account like the max device queue depth for instance and then there also is the HBA max queue depth as well. (For NFS queue depth is no concern typically.) The potential constraint when there is only (uncommon) a single PE can be mitigated. What is important here is that vVols itself does not impose any constraints.
Also, note that some storage vendors have an implementation where the array actually can make the distinction between regular IO and control/management related IO. Regular IO in those cases doesn’t proxy through the PE, which means you will not fill up the queue of the PE. Pretty smart.
I am hoping that clears up some of the misunderstandings out there.
In vSphere 5.5 the command you mention “disk.schednumreqoutstanding”, would appear to be “Disk.SchedQControlSeqReqs”. Is this correct? In your best guestimation if HBA=600 SAS=254 SATA-SSD=32 Say 600+254-32, or 600/avg#guests+254+32, im just grasping at straws now. How might one start to quantify a correct value here? Moreover where might i find some deep dive info on this subject? Thanks!
Duncan Epping says
you are mixing a couple of things up here. Disk.SchedQControlSeqReqs is not related directly here. It is DSNRO as mentioned. More details about DSNRO can be found here: http://www.yellow-bricks.com/2011/06/23/disk-schednumreqoutstanding-the-story/
Also, when it comes to an HBA then SAS/SATA etc is not really relevant as the devices are SAN devices and the max queue depth is not fixed
Thank you very much.