I mentioned the new disk IO scheduler in vSphere 5.5 yesterday. When discussing this new disk IO scheduler, one caveat around disk limits was brought to my attention. Let's get started by saying that disk limits are a function of the host-local disk scheduler and not, I repeat, not Storage IO Control. This is a mistake many people make.
Now, when setting a limit on a virtual disk you define a limit in IOPS. The IOPS specified is the maximum number of IOPS the virtual machine can drive. The caveat is as follows: the limit takes the IO size into account. (It does this because a 64KB IO has a different cost than a 4KB IO.) The accounting is done in multiples of 32KB: a 4KB IO counts as one IO, but a 64KB IO counts as two IOs. Any IO larger than 32KB counts as at least 2 IOs, as the size is rounded up to the next 32KB multiple. In other words, a 40KB IO counts as 2 IOs, not 1.25 IOs. This also means there can be an unexpected result when you have an application doing relatively large block size IOs. If you set a limit of 100 IOPS but your app is doing 64KB IOs, then you will see your VM being limited to 50 IOPS, as each 64KB IO counts as 2 IOs instead of 1. So the formula here is: ceil(IO size in KB / 32).
I think that is useful to know when you are limiting your virtual machines, especially because this is a change in behaviour compared to vSphere 5.1.
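To make the accounting concrete, here is a small illustrative Python sketch of the rounding described above. This is my own example, not VMware's actual implementation; the only detail taken from the post is the 32KB increment.

```python
import math

BLOCK_KB = 32  # the scheduler accounts for I/Os in 32KB increments


def weighted_ios(io_size_kb):
    """How many IOs an operation of the given size counts as."""
    return math.ceil(io_size_kb / BLOCK_KB)


def effective_iops(limit_iops, io_size_kb):
    """Actual operations per second a VM can drive under a given limit."""
    return limit_iops // weighted_ios(io_size_kb)


print(weighted_ios(4))           # 1 -- anything up to 32KB is one IO
print(weighted_ios(40))          # 2 -- rounded up, not 1.25
print(effective_iops(100, 64))   # 50 -- the example from the post
```

The same math also gives the throughput ceiling: at a 100 IOPS limit with 64KB IOs, the VM tops out at 50 × 64KB = 3200KB/s rather than 6400KB/s.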
Greg Schulz (@storageio) says
@Duncan thanks and great point. It should be obvious that a larger IO would have more overhead, however what should be obvious is not always so, or known, so thanks for amplifying that. Also good point about VMW splitting IOs or handling them in 32K increments, which is very similar to AWS EBS fwiw. In the situation you are referring to there is a performance cost; a similar situation exists with AWS (and other cloud providers), who track and charge for IOs. Just like in your example, with AWS EBS (and other providers) if you have a 64K IO you are going to be charged for multiple IOs, which may seem like common sense, however it is also a surprise to many.
Cheers gs
Alexey Savva says
Hi Duncan,
Is this behavior same for 5.1 with default SFQ scheduler or it is specific to mClock scheduler only?
Thanks in advance!
Duncan Epping says
No, the behaviour is different with SFQ.
ranjit says
I am a little slow and I didn't get the last statement – how do 64KB IOs mean limiting the VM to 50 IOPS? Wouldn't that be 2 IOPS, as a 32KB IO is 1 IO?
Brian Farrugia says
Hi Ranjit,
That sentence also made me think a bit to see how Duncan came to that conclusion. If I understand properly, 1 IO operation would handle a maximum of 32KB. With 64KB you would need 2 IO operations. With a 100 IOPS limit, you would be doing only 50 64KB IO operations per second. For example, instead of reading/writing 6400KB/s you would only read/write at 3200KB/s (32KB × 100 IOPS).
Hope I got it properly and eventually my explanation helps you understand it.
Duncan Epping says
Correct!
Duncan Epping says
If you set a limit of 100 IOPS but your app is doing 64KB IOs, then you will see your VM being limited to 50 IOPS, as each 64KB IO counts as 2 IOs instead of 1. So the formula here is: ceil(IO size in KB / 32), which in this case is: ceil(64/32) = 2.
James Hess says
I think this change is unfortunate. What makes the most sense for the storage environment, I suppose, is really the opposite; many storage systems are bound by disk access, i.e. by the latency of an I/O rather than the bandwidth it uses, so it would be nice if there were an option for how the limit is implemented.
e.g. We really want to encourage large I/Os and well-behaved use of the storage resource (Avoiding unnecessary small IOPS, also avoiding bandwidth waste as well).
The real reason to want to set limits is to be able to restrict the amount of infrastructure used by a particular VM to be aligned with costs the business has agreed to bear for that VM’s workload, while leaving other capacity available for other use cases or higher priority uses…
And in some cases, this would be desired at all times, not just when there is contention. So ideally the limit mechanisms for disk utilization should allow the disk usage to be restricted in a manner that accurately reflects where the scarcity is in terms of use of the storage.
The kind of policy I am thinking I might actually want…..
VM 4k random IOP Limit = 50
But if you are doing synchronous writes, 50% penalty after the first 50 in a second. Limit = 25 IOPs
If you are doing 90% reads, you get a 50% bonus, if your average op latency was low. Limit = 75 IOPS
If your IOPS are 90% reads of a sequential nature, you get double the limit, if your average op latency was low = 100 IOPS.
For a large block read 16K or more, triple the limit = 150 IOPS.
In other words, more flexible disk limits set on a per VM basis or a per Resource pool basis, not a per VMDK basis, but totally different from having Storage I/O control with no fixed disk limits.
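James's tiered policy could be sketched roughly like this. This is a purely hypothetical Python sketch of his wish list, not an existing vSphere feature; the function name and parameters are invented, and the "after the first 50 in a second" nuance is simplified away.

```python
def policy_limit(base_limit=50, sync_writes=False, read_pct=0.0,
                 sequential=False, low_latency=False, large_block_reads=False):
    """Hypothetical per-VM tiered IOPS limit, per James's proposed policy.

    Tiers (most generous first):
    - large block (16K+) reads: triple the limit
    - 90%+ sequential reads at low latency: double the limit
    - 90%+ reads at low latency: 50% bonus
    - synchronous writes: 50% penalty
    - otherwise: the base 4K random IOP limit
    """
    if large_block_reads:
        return base_limit * 3                      # 150 at the default base
    if read_pct >= 0.9 and low_latency:
        if sequential:
            return base_limit * 2                  # 100 at the default base
        return int(base_limit * 1.5)               # 75 at the default base
    if sync_writes:
        return base_limit // 2                     # 25 at the default base
    return base_limit                              # 50 at the default base
```

The point of the sketch is the shape of the policy: the limit adapts to how friendly the workload is to the underlying disks, instead of charging purely by 32KB increments.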