What’s the point of setting “--IOPS=1”?

Duncan Epping · Mar 30, 2010

To be honest and completely frank, I really don’t have a clue why people recommend setting “--IOPS=1” by default. I have been reading all these so-called best practices around changing the default behaviour of “1000” to “1”, but none of them contain any justification. Just to give you an example, take a look at the following guide: Configuration best practices for HP StorageWorks Enterprise Virtual Array (EVA) family and VMware vSphere 4. The HP document states the following:

Secondly, for optimal default system performance with EVA, it is recommended to configure the round robin load balancing selection to IOPS with a value of 1.

Now please don’t get me wrong, I am not picking on HP here as there are more vendors recommending this. I am however really curious how they measured “optimal performance” for the HP EVA. I have the following questions:

What was the workload exposed to the EVA?
How many LUNs/VMFS volumes were running this workload?
How many VMs per volume?
Was VMware’s thin provisioning used?
If so, what was the effect on the ESX host and the array? (was there an overhead?)

So far none of the vendors have published this info and I very much doubt, yes call me sceptical, that these tests have been conducted with a real-life workload. Maybe I just don’t get it, but when consolidating workloads a threshold of 1000 IOs isn’t that high, is it? Why switch after every single IO? I can imagine that for a single VMFS volume this will boost performance, as all paths will be hit equally and load distribution on the array will be optimal. But in a real-life situation where you would have multiple VMFS volumes this effect decreases. Are you following me? Hmmm, let me give you an example:

Test Scenario 1:

1 ESX 4.0 Host
1 VMFS volume
1 VM with IOMeter
HP EVA and IOPS set to 1 with Round Robin based on the ALUA SATP

Following HP’s best practices the host will have 4 paths to the VMFS volume. However, as the HP EVA is an Asymmetric Active/Active (ALUA) array, only two paths will be shown as “optimized”. (For more info on ALUA read my article here and Frank’s excellent article here.) Clearly, when IOPS is set to 1 and there’s a single VM pushing IOs to the EVA on a single VMFS volume, the “stress” produced by this VM will be divided equally across all optimized paths without causing any spiky behaviour, contrary to what a change of paths every “1000 IOs” might do. Although 1000 is not a gigantic number, it will cause spikes in your graphs.
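To make the “4 paths, 2 optimized” point concrete, here is a minimal Python sketch. It is only an illustration: the `Path` class, the path names and `round_robin_candidates` are hypothetical and are not how ESX implements its path selection. The fallback to unoptimized paths when no optimized path is left is an assumption about the default behaviour.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    name: str          # hypothetical label, not a real runtime name
    optimized: bool    # True if the path leads to the controller owning the LUN
    available: bool = True


def round_robin_candidates(paths, use_ano=False):
    """Return the paths a Round Robin policy would rotate over in this toy model."""
    live = [p for p in paths if p.available]
    if use_ano:
        # Assumption: with useANO enabled, unoptimized paths are included too.
        return live
    optimized = [p for p in live if p.optimized]
    # Assumption: unoptimized paths are only used when no optimized path is left.
    return optimized if optimized else live


paths = [
    Path("ctrlA-port1", optimized=True),
    Path("ctrlA-port2", optimized=True),
    Path("ctrlB-port1", optimized=False),
    Path("ctrlB-port2", optimized=False),
]
active = [p.name for p in round_robin_candidates(paths)]
print(active)
```

With the four example paths, only the two paths to the owning controller end up in the rotation, which is why IOPS=1 spreads a single VM over two paths rather than four.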

Now let’s consider a different scenario. Let’s take a more realistic one:

Test Scenario 2:

8 ESX 4.0 Hosts
10 VMFS volumes
16 VMs per volume with IOMeter
HP EVA and IOPS set to 1 with Round Robin based on the ALUA SATP

Again each VMFS volume will have 4 paths, but only two of those will be “optimized” and thus be used. We will have 160 VMs in total on this 8-host cluster and 10 VMFS volumes, which means 16 VMs per VMFS volume. (Again following all best practices.) Now remember, we will only have two optimized paths per VMFS volume and we have 16 VMs driving traffic to a volume. On top of that, this traffic is coming from 8 different hosts to these Storage Processors. Potentially each host is sending traffic down every single path to every single controller…

Let’s assume the following:

Every VM produces 8 IOPS on average
Every host runs 20 VMs of which 2 will be located on the same VMFS volume

With the default of 1000, this means that every ESX host changes the path to a specific VMFS volume roughly every 62 seconds (1000/(2×8) = 62.5); with 10 volumes that’s a change roughly every 6 seconds on average per host. With 8 hosts in a cluster and just two Storage Processors… You see where I am going? Now I would be very surprised if we saw a real performance improvement when IOPS is set to 1 instead of the default 1000, especially when you have multiple hosts running multiple VMs hosted on multiple VMFS volumes. If you feel I am wrong here, or you work for a storage vendor and have access to the scenarios used, please don’t hesitate to join the discussion.
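The arithmetic behind those numbers can be written down as a small Python sketch. The function and variable names are made up for illustration; the inputs (8 IOPS per VM, 2 VMs per volume per host, 10 volumes) are the assumptions stated above.

```python
def seconds_between_switches(iops_limit, vms_on_volume, iops_per_vm):
    """Seconds one host needs to send `iops_limit` IOs to a single volume."""
    return iops_limit / (vms_on_volume * iops_per_vm)


VOLUMES = 10

# Default limit of 1000 IOs before Round Robin moves to the next path.
per_volume_default = seconds_between_switches(1000, vms_on_volume=2, iops_per_vm=8)
per_host_default = per_volume_default / VOLUMES

# Limit of 1: a switch after every IO, i.e. 16 switches per second per volume.
per_volume_iops1 = seconds_between_switches(1, vms_on_volume=2, iops_per_vm=8)

print(per_volume_default, per_host_default, per_volume_iops1)
```

Under these assumptions a host already moves to another path for some volume every few seconds with the default setting, before the other 7 hosts are even taken into account.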

<update> Let me point out though that every situation is different. If you have had discussions with your storage vendor based on your specific requirements and configuration and this recommendation was given… do not ignore it, ask why, and if it indeed fits --> implement! Your storage vendor has tested various configurations and knows when to implement what; this is just a reminder that implementing “best practices” blindly is not always the best option!</update>

Comments

Kenneth van Ditmarsch says

30 March, 2010 at 14:38

Hi Duncan,

I’m wondering, isn’t there any (time) overhead when swapping from one path to the other? If there is, then setting it to “1” wouldn’t make any sense at all.

Kenneth
dconvery says

30 March, 2010 at 14:46

This doesn’t answer your IOPS setting question. You can assign a service processor to “own” a LUN. So you could attempt a little bit of load balancing on the EVA by dividing the LUNs by SP. That way you can make use of all four (or eight) paths.
Justin says

30 March, 2010 at 14:48

I have also seen the recommendations for changing this setting to 1, but this was from the EMC side. I actually considered changing the value to 1 until I read about this bug with vSphere 4.0 U1 and this setting.

http://virtualgeek.typepad.com/virtual_geek/2009/12/vsphere-4-nmp-rr-iooperationslimit-bug-and-workaround.html

Does anyone know if this issue has been resolved with a patch?
Duncan Epping says

30 March, 2010 at 14:50

Not sure what you mean Dave. The EVA is an asymmetrical array so you can only use half of the paths you see. (When you see 4 paths, only 2 will be optimized)

The question is: why would you set it to anything other than the defined default? Does it make sense? What are you trying to achieve? When does it make sense? (if at all)

I personally don’t see where you would gain if you are using more LUNs than you have paths in a multi host environment.
Frederic says

30 March, 2010 at 15:09

Hello Duncan
From my work with the EVA, it is not presented as an asymmetrical array on Windows: all 4 paths will be used for IO when you use HP’s MPIO package.
Duncan Epping says

30 March, 2010 at 15:13

But I am not talking about Windows and it is asymmetrical for sure…
Jeff says

30 March, 2010 at 15:15

This will be resolved in 4.0 U2. See the email below from VMware.

Thank you for the information you provided. I was able to find a corresponding problem report and it looks like the bug you mentioned will be fixed in Update 2 of ESX 4.0. Unfortunately, I cannot tell you at this time when ESX 4.0 U2 will be released.

Regards,

Technical Support Engineer
Global Support Services
VMware Inc.
Chad Sakac says

30 March, 2010 at 15:26

Disclosure – EMCer here.

IMO, I wouldn’t recommend changing the default RR iooperationslimit value. Not just because of the item noted in the link to my blog post, but for the point you note, Duncan.

The “spikiness” caused by the 1000 IO duration to a single path shows up only in an “artificial” situation (very small number of VMs and datastores).

This often shows up in benchmarking activity, but in most real-world workloads (many VMs, many hosts, many datastores), there is no material upside. If the benefit isn’t large, it seems to me to be in the best interest of the customer to KISS.

Also for others, it is worth noting that with EVAs, CLARiiONs, NetApp and many others, while they can appear “active/active” via ALUA (Asymmetric Logical Unit Access), the paths on the “non-owning” SP are “active, unoptimized”. Each of the arrays has a different degree of “asymmetry” internally (ergo how much of a “penalty” there is on the array for serving IO via the “non-owning” brain/storage processor).

BTW – you can force vSphere to use non-optimized paths, but to save people from shooting themselves in the foot, I won’t go further down that discussion 🙂

This is one “classic” architectural difference with “enterprise-class” arrays (as opposed to the previous category, which are “mid-range” arrays), which typically have symmetrical internal architectures: an IO can be served by any front-end, and cache behaviour is global across all “brains”.

For what it’s worth – this is covered well in Mastering vSphere 4, Chapter 6.
Duncan Epping says

30 March, 2010 at 15:31

Thanks Chad for confirming my suspicion. I would however love to see a comment from HP, EQL or any other vendor who tested it and recommends it. I would love to see the justification for this.

And I know you can use non-optimized paths but no way I will ever recommend it as you will not only shoot yourself in the foot but probably lose your leg completely. 🙂
Daniel says

30 March, 2010 at 15:56

VMworld 2009 in SF
TA3264

http://mylearn.vmware.com/courseware/51227/TA3264_formatted.pdf

http://sessions.vmworld.com/lcms/mL_course/courseware/48059/index.html

Wow, I hope nobody can access these files directly…

Daniel
vmachine says

30 March, 2010 at 16:12

Duncan – I would also agree with your opinion.

Kenneth: As active paths don’t have to change/switch, there is no lag or overhead when swapping. They’re directly accessible, and changing between active path 1 and 2 doesn’t impact performance.

dconvery: I agree that as a storage admin you can choose the “owning” controller yourself, so you can do manual multipathing on the storage side. VMware using ALUA will only access (when using an EVA) two of four paths (because of the asymmetric architecture: 2 paths are optimized, 2 paths are not). But if using 10 LUNs, you can configure 5 LUNs owned by Controller A and 5 LUNs owned by Controller B. VMware then accesses 5 LUNs via Controller A (2 paths) and 5 LUNs via Controller B (2 paths). That means full usage and manual distribution of all paths the EVA has.
Daniel says

30 March, 2010 at 16:36

my comment is gone, nice
Rob B says

30 March, 2010 at 17:01

I’m not a vendor – but I am running our production infrastructure with the IOPS=1 setting (including a workaround for the bug mentioned above). I won’t go into a lot of detail about our config, but we have 30 hosts using storage from 4 EVAs and a pile o’ LUNs. And I did see an overall small improvement in performance – not the drastic numbers you can get from a single-VM synthetic benchmark, but worth doing.

I believe the main benefit comes from the burstiness of IO of some VMs. Allowing for the fact that with a semi-large infrastructure you will be doing a fairly good job of load-balancing anyway, there is still room for improvement when a single VM suddenly needs to do a batch of IO. With IOPS=1000 (or any number greater than the LUN queue depth) that VM can only get as many outstanding IOs as the queue depth of a single path (default 32), and only after 1000 IOs will it start filling the queue of the next path (and the 32 queued in the first path will quickly complete, leaving that queue empty). With an IOPS= value the queue depth was more spiky and intermittent.
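The burst argument above can be illustrated with a small Python sketch. It is a deliberately simplified model, not a simulation of ESX: it assumes the whole burst is issued before any IO completes, and `ios_per_path` and `max_outstanding` are hypothetical helper names.

```python
def ios_per_path(burst, iops_limit, num_paths):
    """Count how many IOs of a burst land on each path under round robin."""
    counts = [0] * num_paths
    for i in range(burst):
        # After every `iops_limit` IOs the selector moves to the next path.
        counts[(i // iops_limit) % num_paths] += 1
    return counts


def max_outstanding(burst, iops_limit, num_paths, queue_depth=32):
    """IOs that can be in flight at once if each path queues at most `queue_depth`."""
    return sum(min(c, queue_depth) for c in ios_per_path(burst, iops_limit, num_paths))


# A burst of 64 IOs over 2 optimized paths, queue depth 32 per path.
print(ios_per_path(64, 1000, 2), max_outstanding(64, 1000, 2))
print(ios_per_path(64, 1, 2), max_outstanding(64, 1, 2))
```

In this model the default limit leaves the second path idle during the burst, while a limit of 1 lets both path queues fill. Whether that translates into a measurable gain depends on how many other VMs and hosts are already keeping the paths busy.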

And as far as I know, there is no downside or overhead to spreading IOs across all available optimal paths, so why not.
nate says

30 March, 2010 at 17:19

I would think alternating paths with every single I/O would cause a pretty decent overhead on the ESX host itself at least when pushing a lot of I/O.

My own VM environments consist of about 25 hosts, around 200-300 VMs, all of them fiber attached. Combined they maybe average 350 IOPS(bursts to higher but really short bursts). A majority of the hosts are still on ESX 3.5 without round robin MPIO.

If you’re pushing like 1,000 IOPS, switching paths 1,000 times a second just seems really excessive.

my storage is a 4-controller 3PAR T400(every LUN is evenly distributed over all 4 controllers).
duncan says

30 March, 2010 at 17:20

The why not part is easy to answer: VMware tests are mostly conducted with the default value of 1000.

Also keep in mind that there IS an overhead. (Not sure what the impact is though.) As most SCSI reservations are dealt with per path, there is an overhead when changing paths for every IO, as it will return a SCSI reservation conflict (although a false one) for that IO and thus a retry of the IO is needed. I wonder what, for instance, the effect would be on a thin provisioned environment.
Michael says

30 March, 2010 at 17:33

Since some EVA/VMware users are out here… I’m facing the problem that LUN transitioning on our EVA 4400 (09522000) doesn’t seem to work correctly:

When changing the preferred path/mode for existing LUNs that are presented to VMware vSphere 4 from i.e. Path A/failover (Controller 1) to Path B/failover (Controller 2), the managing controller changes for some seconds but will then automatically revert to controller 1. This also happens when using fixed paths to Controller 2 in vSphere (where from my understanding the EVA should actually transition the LUN to the controller that is serving the most I/O).

This however also happens when creating new LUNs from scratch. Even when selecting Path B/failover, the managing controller will be controller 1 and not controller 2.

Any ideas why this might be the case?

Thanks
Doug says

30 March, 2010 at 21:08

Not to go way off topic, but I definitely agree with questioning IOPS=1, and I have yet to implement vSphere on an EVA with that change made. Changing the default PSP to RR, yes. With regard to the EVA and owning controller switches, the EVA can access all LUNs down all paths, but, as Duncan mentioned, there are ‘optimal’ paths that refer to the controller owning the LUN.

The EVA will automatically (‘implicitly’) transition a LUN in the event of a controller failure (may be obvious), when it detects that a large percentage of the I/Os are being sent to the non-owning controller, or in order to maintain DR group consistency. That last one is usually what gets people. If you create a DR group containing multiple Vdisks, the EVA will move all of those Vdisks to the same owning controller.

Specifically, usage measurements are taken on an hourly basis and implicit failover occurs if >= 2/3 of the reads occur on the non-owning controller. For a DR group >=2/3 of the total of all reads (for all LUNs in the group) must occur on the non-owning controller.
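As a rough illustration of the two-thirds rule described above for a single LUN, here is a minimal Python sketch. The function name and inputs are hypothetical, the hourly measurement window is simply taken as given, and DR groups are left out on purpose.

```python
def should_transition(reads_owning, reads_non_owning):
    """True if at least 2/3 of the measured reads hit the non-owning controller."""
    total = reads_owning + reads_non_owning
    if total == 0:
        return False
    # Integer comparison avoids rounding issues: non_owning / total >= 2 / 3.
    return 3 * reads_non_owning >= 2 * total


print(should_transition(reads_owning=100, reads_non_owning=200))
print(should_transition(reads_owning=101, reads_non_owning=199))
```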
Frank Denneman says

30 March, 2010 at 21:54

Doug,

An Implicit Lun Transfer (ILT) is disabled if a vDISK is a member of a DR Group. So all proxy-reads will happen (if no ALUA SATP or ESX 3.5 is used)
Doug says

30 March, 2010 at 22:08

Thanks Frank!
My EVA information must be a bit dated… I know that caused me heartburn in the past, and I have a doc dated 2006 that indicates the issue with DR group and implicit transitions. 🙂
Frank Denneman says

30 March, 2010 at 22:34

No probs Doug 🙂
Drew says

31 March, 2010 at 05:55

This is the response I got from VMware support when logging the issue with the iops parameter resetting to a random number. As we have an EVA I thought this info could come in handy.

………………………………………………..
Setting the iops value to 1 is considered to be too low and is not supported. Because after every I/O it will try to switch path and that is not an ideal configuration.

# esxcli nmp roundrobin setconfig -d naa.6000eb391530aa26000000000000130c --iops 64 --bytes 64000000 --type iops

With the value of “iops” set to 64, the value of “bytes” set to 64000000, and the value of “type” set to iops, ESX server will switch paths whenever the number of I/Os sent down a path exceeds the next lot of 64. If the “type” parameter were set to the value “bytes”, ESX server would switch paths whenever the number of bytes sent down a path exceeds the next lot of 64000000 bytes.

Note: On the “type” parameter:

-t|--type Set the type of the Round Robin path switching that should be enabled for this device.

Valid values for “type” are:

bytes: Set the trigger for path switching based on the number of bytes sent down a path.
default: Set the trigger for path switching back to default values.
iops: Set the trigger for path switching based on the number of I/O operations on a path.

Note that cutting down the number of iops does present some potential problems. With some storage arrays caching is done per path. By spreading the requests across multiple paths, you are defeating any caching optimization at the storage end and could end up hurting your performance.
…………………………….

I can also confirm that I was told the bug would be fixed in update 2
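To illustrate the difference between the “iops” and “bytes” triggers described in the support response, here is a minimal Python sketch. `RoundRobinSelector` is a hypothetical toy model, not VMware’s implementation, and the default limits in it (1000 IOs, 10 MB) are assumptions for illustration.

```python
class RoundRobinSelector:
    """Toy round-robin path selector with an IO-count or byte-count trigger."""

    def __init__(self, num_paths, limit_type="iops", iops=1000,
                 byte_limit=10 * 1024 * 1024):
        if limit_type not in ("iops", "bytes"):
            raise ValueError("limit_type must be 'iops' or 'bytes'")
        self.num_paths = num_paths
        self.limit_type = limit_type
        self.iops = iops
        self.byte_limit = byte_limit
        self.current = 0
        self.ios_sent = 0
        self.bytes_sent = 0

    def send(self, io_bytes):
        """Return the path index used for this IO, then switch if the limit is hit."""
        path = self.current
        self.ios_sent += 1
        self.bytes_sent += io_bytes
        if self.limit_type == "iops":
            reached = self.ios_sent >= self.iops
        else:
            reached = self.bytes_sent >= self.byte_limit
        if reached:
            # Move on to the next path and restart both counters.
            self.current = (self.current + 1) % self.num_paths
            self.ios_sent = 0
            self.bytes_sent = 0
        return path


by_iops = RoundRobinSelector(num_paths=2, limit_type="iops", iops=2)
print([by_iops.send(4096) for _ in range(5)])

by_bytes = RoundRobinSelector(num_paths=2, limit_type="bytes", byte_limit=100)
print([by_bytes.send(60) for _ in range(4)])
```

With `iops=1` the same model alternates paths on every IO, which is the behaviour the support response advises against.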
Chad Sakac says

1 April, 2010 at 03:32

Disclosure – EMCer here.

Duncan, you asked to see some of the testing data, I posted the results of the original testing (and a link to a detailed document).

You can find it here if you are curious: http://virtualgeek.typepad.com/virtual_geek/2010/03/understanding-more-about-nmp-rr-and-iooperationslimit1.html
duncan says

1 April, 2010 at 07:40

Thanks Chad! I appreciate the update and info. Seeing your data and numbers it appears I am on the right track, especially considering that the number of VMs/datastores you used is still conservative compared to what we see in real-life environments.
invisible says

7 April, 2010 at 23:02

Nice post,

I have a question – what is the concern to have 1000 IOs per single path? Bandwidth bottleneck on path itself? But even with 65K per IO bandwidth would be 65MB/sec per VM. Single 4G FC path from an ESX server to the storage would easily serve 8 VMs all of them using the same path to the storage (above numbers are approximate just for an illustration).

My personal experience with FC is that it is always the storage array and disk groups/luns configuration which creates bottleneck, not media or bandwidth.
Rob Q says

12 May, 2010 at 20:48

The data from Chad is from an Active/Passive array, which uses the 1000 IOPS parameter better. I went with a setting of “16” because it is not as drastic as “1”. I flat-lined my IOPS down 6 paths, and have each one of my controllers working evenly…
djlaube says

18 May, 2010 at 17:08

Thanks all for the useful info on this setting. Also of note, a new “best practice” guide that I received for using the HP Lefthand P4000 series storage (all iSCSI) with vSphere does recommend changing the IOPS setting to “1”. So I guess you can add that vendor to your list of ones that are making that recommendation.
Christian Skovdal says

1 June, 2010 at 12:19

Hi all
I found a workaround for the reset of iops values after reboot.
just use --config, and the changes will persist after reboot.
Got a heads-up from VMware that the bug should be fixed in Update 2.

for i in `ls /vmfs/devices/disks/ | grep naa.600 | grep -v :1` ; do esxcli nmp psp setconfig --device $i --config "policy=iops;iops=1;useANO=1"; done

Kind regards,
Christian Skovdal – Denmark
Aboubacar Diare says

10 July, 2010 at 01:46

Hey Duncan.

So I am finally making good on a promise to Calvin about commenting on your blog about the IOPS=1 recommendation. I take the full blame for the delay in responding, which was due to my other commitments, so be sure to cut Calvin some slack. :)

Anyway, as you know best practice recommendations are made for various reasons. Some are performance driven, others simply to save the customer time in configuration and/or upgrade and others are just the lesser of many evils when all things considered. So I’ll loop back around to this after I explore the technical data behind our recommendation.

When we looked at IOPS=1 just like everyone else we were after performance degradation or improvements (better latency and/or improved IOPS and MB/s).

In our meticulous approach we always tend to start off with a small environment we can use to run many workload variations to draw conclusions. But also derive assumptions we can use to adjust/derive next steps and/or configuration(s) that will make the most sense and prove/debunk these assumptions.

With this investigation we started with a very small config:

– Single ESX server
– Single VM running PVSCSI adapter
– Single datastore
– Single VMDK
– 2 EVA controllers
– 2 ports on each controller used for IO
– LUN size 500GB
– Datastore 500GB
– VMDK 200GB

and ran a bunch of workloads (sequential, random, 60/40, 100% READ and WRITE) for various IO sizes (8k, 16k, 64k, 128k) in all combinations of the above.
Then we scaled out the configuration from there.

What we found is that in most cases IOPS=1 led to slightly better throughput and/or latency, but the rest of the time it is pretty much on par with the results when IOPS=1000. However, one interesting data point looking at the whole environment was the EVA host port queue utilization distribution.

Because the EVA is ALUA capable, being an Asymmetric Active/Active array, ESX 4.0, which is also ALUA compliant when using the Round Robin and MRU policies, is able to direct IO to a LUN only through the optimal controller ports for that LUN. The useANO flag can be used to send IOs to all controllers, but by default ESX 4.0 will detect a LUN’s optimal paths and Round Robin all IOs to the ports on that controller.

What happens then is that the IOPS=1000 setting causes a very uneven usage of the EVA host port queue. Don’t get me wrong, the EVA port queues are deep: 2048 for each port on a controller.

So in the case of the simplex test config I described above, when you look at average usage over an equivalent period of time, the peak host port queue usage for IOPS=1000 is double that of IOPS=1 for each controller port used. My VM was running the PVSCSI adapter, so I know that I won’t see more than 64 IOs queued since that is the PVSCSI max queue depth. With the QLogic HBA queue set to its max of 256 I can be sure that all IOs the single VM can issue will make it to the array host port unthrottled. So in this test you note that for IOPS=1 both controller ports show steady queue utilization of ~30.
However, with IOPS=1000, because at any given point all IOs will be sent down a single path, the host port queue utilization has to account for double the number above. I have some nice graphs that show this but I can’t seem to find how to attach them to this comment.

So the impact of this is that as you add VMs and datastores to the configuration, IOPS=1 literally gives us a linear increase in terms of port utilization while yielding the same performance observation when compared to IOPS=1000. However, IOPS=1000 gets messier and messier because every time it accounts for twice as many IOs.

One functional example where this can cause undesirable behavior is as follows:
As you deploy a VI infrastructure this is what could happen:
– It would take ~16VMs on one ESX server to create 1000 IOs using PVSCSI
– If you have two EVA ports you’ll send all 1000 IOs down the first array port with IOPS=1000 then the next 1000 IOs down the next port.
– This means that at any given point in time in this configuration the array has to be able to handle 1000 IOs at the host port queue.
– If you add another ESX server with 16 VMs you simply double that number to 2000 IOs and you are now quickly getting into areas where you can create queue full conditions which impact application IOs. This is especially noticeable in configurations where other operating systems may be sharing the same array and/or configurations that are using the same array ports for, say, array-based replication. Queue full can impede your replication IO.

– Now if you used IOPS=1 you would only need to account for at most 500 IOs at each host port for the scope of our example of the 16VMs issuing 1000 IOs to two ports.
– This means it would take you 4 ESX servers and 64 VMs to get to the same border line queue full condition in our example.

We can increase the number of host ports per controller by using a larger EVA, say an 8400, which would mean we can double the configuration based on the example above, yielding 4 ESX servers and 32 VMs for IOPS=1000 and 8 ESX servers and 64 VMs for IOPS=1. Again in the context of our example.
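The two-port example above boils down to simple arithmetic, sketched here in Python. The constants come from the example (2048-deep port queues, about 1000 IOs per ESX server, two host ports); the function names are hypothetical, and the IOPS=1000 case assumes the worst case where every host is sending its current batch to the same port.

```python
PORT_QUEUE_DEPTH = 2048   # per EVA host port, as stated in the comment
IOS_PER_SERVER = 1000     # roughly 16 VMs using PVSCSI
PORTS = 2                 # optimized host ports in the example


def peak_per_port_iops_1000(servers):
    """Worst case with IOPS=1000: all hosts target the same port at once."""
    return servers * IOS_PER_SERVER


def peak_per_port_iops_1(servers, ports=PORTS):
    """With IOPS=1 each host spreads its IOs evenly over the ports."""
    return servers * IOS_PER_SERVER // ports


for servers in (1, 2, 4):
    print(servers, peak_per_port_iops_1000(servers), peak_per_port_iops_1(servers))
```

In this model two servers at IOPS=1000 and four servers at IOPS=1 both put about 2000 IOs on a port, just under the 2048 queue depth, which matches the borderline queue-full condition in the example.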

So looping back to the reasons for the recommendation: though the slight latency and throughput gains that IOPS=1 yielded for most workloads aren’t the most compelling reason, this one was pretty compelling: keeping a well-balanced environment that can be scaled. It also helps an administrator account appropriately for the impact that scaling out the environment will have on current installations.

Please let me know how I can share with you some of the graphical data I referenced to make my point more clear.

I hope I made this clear enough so let me know if you have any questions.

R/
Aboubacar.
Aboubacar Diare says

10 July, 2010 at 01:50

Christian Skovdal

A better command line to use in the example you gave above, instead of:

for i in `ls /vmfs/devices/disks/ | grep naa.600 | grep -v :1` ; do esxcli nmp psp setconfig --device $i --config "policy=iops;iops=1;useANO=1"; done

You can use:

for i in `esxcli nmp device list | grep naa.600|grep -v :1` ; ……..

The nice thing about this is that you will often have partitions listed in the /vmfs/devices/disks/ folder for the various disks, and the `ls /vmfs/devices/disks/` version will run and fail against those. To avoid that, esxcli nmp device list will return just the devices.

R/
Aboubacar.
Aboubacar Diare says

10 July, 2010 at 01:57

Back on the IOPS=1 topic….

I have had customers tell me that they’ve experimented with other values than IOPS=1 on EVA and found sweet spots around IOPS=8 and that performance would break down in their environment once they got to double digits for this value.

This is just an aside but if any of these customers are reading this post, please share those results with everyone.

Best Regards,

Aboubacar.
duncan says

10 July, 2010 at 12:36

Thanks Aboubacar for your response.

One thing that I still wonder though is what the impact is of both in an average environment. Let’s say 8 hosts and 200 VMs in total. These 200 VMs would, according to best practices be divided across at least 7 VMFS volumes and as such 7 different LUNs.

I wonder why this has been tested with a non-real-life scenario rather than a real-life scenario like the above. Would there still be a substantial gain?
Aboubacar Diare says

11 July, 2010 at 10:49

Duncan.

Real life scenario is very subjective. Lab environments are in many ways different from customer environments. That is especially true when it comes to recreating customer IO workloads: configurations are typically easy to dupe, but IO workloads and their variation in the course of business are the challenge.

So though our baseline tests were performed in the simplex config I show above, the additional tests performed, as I mentioned, were in a scaled-out environment. We definitely did not test 8 ESX servers and 200 VMs.

However, the point I was trying to make in my post with regard to the host port queue utilization was that with IOPS=1, as the environment scales out, port queue utilization is much more linear and predictable than with IOPS=1000. And as you scale out the configuration the behavior is compounded. The value is also not just in the improved performance seen with some workloads and configurations but also in the ability for an administrator to predict accurately the impact the addition of servers, datastores, VMs etc. will have on their array host ports. That from my point of view is a pretty nice value.

If you can shoot me an email, I’ll visually show you some of the data I am talking about. It’ll probably help convey my point better.

Finally, for us it doesn’t get more real life than when a customer tells us the setting is working well in their environment for their workload and/or that, while experimenting with their production IO, any value above 8 just wasn’t adequate for their environment. Such tests on production systems with real-life IO are very telling. And we hope to continue hearing from more and more customers who have implemented these recommendations or tweaked them to their specific needs.

Best Regards,
Ron Sexton says

24 February, 2011 at 01:36

Well testing with IOPS=1 for my situation with a heavy I/O SQL application got me about a 30 minute improvement on a load process that runs about 8 hours. This was an improvement over IOPS=1000 .
And I might add that the newer PowerCLI worked nicely for querying and configuring this.

This is with the default bytelimit. There may be a ‘sweeter’ spot but for now it’s a nice improvement.

ESXi 4.1 U1
EMC VMAX using 4 FAs.
BL465c G7 AMD 12 core ESX hosts with 64GB of memory.
Emulex 4GB adapter.

Thanks for the information.
- Duncan Epping says
  
  24 February, 2011 at 08:42
  
  Yes but keep in mind that you focused on a single VM in this case. Would it really matter if you have 1000 VMs running on 50 LUNs with moderate IO?
  
  I am not saying it won’t make a difference, I am saying that all of these tests are usually focused on a single heavy duty workload, while that isn’t always reality.
  
  Thanks though for providing these numbers as they will help others making decisions.
Phil says

27 April, 2011 at 05:13

So – can the IOPS and Multipathing policies be changed on the fly? Or should a host be evacuated prior?
Justin McD says

20 April, 2012 at 19:52

I know this is an old post now, but up above Duncan mentioned that he would “not recommend” using unoptimized paths. Am I missing something because I thought that was required in order to send I/O down both paths after switching to Round Robin (at least on a mid-range array like the EMC CX4-240). I was reading this post from Jason Boche (http://www.boche.net/blog/index.php/2010/02/04/configure-vmware-esxi-round-robin-on-emc-storage/) and he mentions “The nmp roundrobin setting useANO is configured by default to 0 which means unoptimized paths reported by the array will not be included in Round Robin path selection unless optimized paths become unavailable.”

So in order to see I/O on both paths in ESXTOP, you must use the command:

esxcli nmp roundrobin setconfig --useANO 1 --device naa.50060160c4602f4a50060160c4602f4a

Is Duncan referring to something else?
Duncan Epping says

21 April, 2012 at 11:39

But Jason was doing that to prove a point: the fact that you can have I/O across multiple paths. It doesn’t mean it is recommended, however. By using unoptimized paths there will be an extra hop for I/Os, which means additional latency.
Justin McD says

23 April, 2012 at 17:02

Duncan, thanks for clearing that up and for the response. I think I was (and still may be) slightly confused about the performance benefit of switching to the Round Robin NMP. So if it is not recommended to allow Round Robin to use unoptimized paths (makes sense), is the benefit that it will still spread the I/O load across two different HBAs and Fibre Channel switch ports (rather than the SPs)? So I assume I would need to manually balance the LUNs across the two SPs to load balance the storage array? Or will neither Round Robin nor any other NMP policy use both HBAs for load balancing?

Currently I am using MRU with two HBA’s connected to an EMC CLARiiON CX4-240 (FLARE 30). Only one HBA appears to have I/O on it currently and I would like to change this.
Aboubacar Diare says

16 May, 2012 at 03:52

Duncan.

Are you seeing more customers implement the IOPS=1 recommendation in their implementations? As I browse through various storage best practices including EMC, HP and others, I see that most vendors are starting to line up behind the IOPS=1 recommendation.

I am wondering if you are seeing lots of customers actually adopting this recommendation.

R/
Aboubacar.
Duncan Epping says

16 May, 2012 at 05:44

Not really to be honest, as most customers also start to realize that with a larger amount of VMs / Datastores / Hosts the randomness is already high.
Michael Sasse says

3 November, 2012 at 23:15

How about modifying the RR policy based on bytes and leaving IOPS at default? I’ve worked with a client who wanted more bandwidth across their two iSCSI 1Gb connections, so we tried changing the bytes value from 10MB to 1MB. This improved their read and write throughput from within a VM from 150MB/s to 207MB/s with a small drop in latency. Random reads and writes had no change.
Andy says

21 February, 2013 at 10:57

I’ve read through the original post and the informative comments; however, I am still undecided as to whether to change the IOPS value according to the still-current technical whitepaper:
HP Enterprise Virtual Array Storage and VMware vSphere 4.0, 4.1 and 5.x configuration best practices

We have an EVA 6350, and are looking to migrate from a 4400 in the coming weeks.
Aboubacar Diare says

21 February, 2013 at 12:04

Andy.

Feel free to email me directly at aboubacar.diare@hp.com and I can help answer questions you might have about the recommendation in the paper referenced.

R/
Aboubacar.
Vitto says

23 March, 2013 at 23:45

Hi Duncan, we’ve recently experienced a case with VMAX and latency issues on a VDI farm; it seems the VMAX has a problem with latency when it comes to heavy parallel IO (50 VDI per LUN). We saw very strange behavior where latency degraded to 300-500ms per LUN (AV updates). 6 paths, RR/1000 IOPS threshold, so pretty much the default setting for an active/active SAN. We tried everything, and once we set this to 1 IO the situation instantly improved (latency around 3-5ms).
Some details: thousands of VDIs, 6 dedicated FAs, 4 engines, 16 ESX 5.0 U1
Daniel Vexø says

28 August, 2013 at 08:53

Hi Duncan and everybody else

I too do not see the point in switching paths after every IO, or at least not when we’re talking about e.g. iSCSI.
But I do see the point in tweaking it to fit the frame size, because do we really want to switch paths after every IO, or do we want to switch paths when the frame size is reached so as to saturate the entirety of our bandwidth back to the SAN?
And I’m not the only one with this thought; Dave Gibbons has given his view on a usable tweak here: http://blog.dave.vc/2011/07/esx-iscsi-round-robin-mpio-multipath-io.html
Sadly enough I haven’t had the opportunity to test any of these solutions, so I’d very much like to hear what you guys think.

Best regards, Daniel…
