There’s a nice article about “round-robin” load balancing on SystemsArchitech which got me a bit dazzled by this new functionality:
esxcfg-mpath --lun <*.lun> --policy custom -H minq -T any -C 0 -B 2048
The command sets a custom policy that decides which (of two) HBAs to use based on the minimum queue length. This HBA selection is re-evaluated every 2048 blocks transmitted to a given LUN over the same target, and the policy will use any target available to either of the two HBAs. With storage that manages host port load balancing itself, LUNs will only have two paths (one per fabric) and the array performs host port balancing within its own management layer. With other storage arrays it is typical to perform this host port balancing from the hosts accessing the array.
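For reference, a minimal sketch of how you might inspect the current paths and then apply the policy from the ESX service console. The LUN identifier vmhba1:0:12 is a placeholder, and the exact option syntax may vary between ESX builds:

# List all LUNs, their paths and the current multipathing policy
esxcfg-mpath -l

# Apply the custom min-queue policy to a single LUN (placeholder path name)
esxcfg-mpath --lun vmhba1:0:12 --policy custom -H minq -T any -C 0 -B 2048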
With 10 ESX hosts set to this policy, wouldn’t it likely cause path thrashing on an active/active SAN? You never know which controller will access a specific LUN, and if you’re unlucky it will be switching from controller A to controller B every millisecond. Does anyone else have thoughts on this new feature and the possible danger?
Suresh Thoppay says
It should not be an issue for an Active/Active SAN.
Even for Active/Passive, if zoning is done properly, it is not a big issue.
Scott Lowe says
I had a customer tell me today that he’d been experimenting with this functionality. Granted, it was only with a single server running ESX Server 3.5. He did see an improvement in throughput versus a single path for a LUN, but he also uncovered an oddity: VirtualCenter showed all the traffic going through a single HBA.
I applaud VMware for including this “experimental” functionality, but I think I’ll wait until the experimental status is removed before I really try this with production workloads.
Brandon Meyer says
I have been using this “experimental” feature in production for a while now. It seems to have improved speed, because previously I had 12 ESX servers going through a single SP; now they all use different paths. Each server has 8 paths to the EVA 8000’s LUNs. I had the same concern about thrashing with multiple servers accessing the same LUN through different SPs. One person said this isn’t an issue with an Active/Active array. Something tells me that it is a problem, but I can’t find anything telling me either way. Does anybody know for sure, and can they explain why or why not?
Gregory Perry says
Path thrashing is only an issue if you have an Active/Passive array. In an Active/Passive array, only one SP can control a LUN at any given time. Should the active storage fabric to a single ESX server get disconnected for some reason, ESX starts talking to the LUN/VMFS through the secondary path to the passive SP. Once that LUN/VMFS is accessed through the secondary passive path, control of the LUN gets passed back and forth between the failed host and the other ESX initiators zoned to the same LUN/VMFS, which causes contention (path thrashing). For a more in-depth overview of the problem, read “Understanding Path Thrashing” on page 109 of the VI3.5 Fibre Channel SAN Configuration Guide.
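As a rough illustration of the usual mitigation on an Active/Passive array, you would keep the LUN on the Most Recently Used policy rather than the custom round-robin one, so LUN ownership does not ping-pong between SPs. A sketch only, assuming the same esxcfg-mpath syntax as in the post and a placeholder LUN name:

# Check which path is currently active for each LUN
esxcfg-mpath -l

# Keep an Active/Passive array on MRU to avoid bouncing LUN ownership between SPs (placeholder LUN name)
esxcfg-mpath --lun vmhba1:0:12 --policy mru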
Swap says
A little off-topic, but how does the SIOC (Storage I/O Control) feature behave when there are multiple ESXi hosts running unequal numbers of VMs? Let’s say ESXi-1 is running 100 VMs while ESXi-2 is running 10 VMs?
(Assuming either automated DRS is off or the ESXi hosts have different specs, which can lead to such an unequal VM distribution.)