
Yellow Bricks

by Duncan Epping


deepdive

What’s that ALUA exactly?

Duncan Epping · Sep 29, 2009 ·

Of course by now we have all read the excellent and lengthy posts by Chad Sakac on ALUA. I’m just a simple guy and usually try to summarize posts like Chad’s in a couple of lines, which makes it easier for me to remember and digest.

First of all, ALUA stands for “Asymmetric Logical Unit Access”. As Chad explains, and as a Google search shows, it’s common for midrange arrays these days to have ALUA support. With midrange we are talking about EMC CLARiiON, HP EVA and others. My interpretation of ALUA is that you can see any given LUN via both storage processors as active, but only one of these storage processors “owns” the LUN, and because of that there will be optimized and unoptimized paths. The optimized paths are the ones with a direct path to the storage processor that owns the LUN. The unoptimized paths connect to the storage processor that does not own the LUN, which reaches the owning storage processor indirectly via an interconnect bus.

In the past, when you configured your HP EVA (Active/Active according to VMware terminology) attached VMware environment, you had two (supported) pathing policies to choose from: Fixed and MRU. Most people used Fixed, however, and tried to balance the I/O manually. As Frank Denneman described in his article, this does not always lead to the expected results, because the path selection might not be consistent within the cluster. This could lead to path thrashing, as one half of the cluster is accessing the LUN through storage processor A and the other half through storage processor B.

This “problem” has been solved with vSphere. VMware vSphere is aware of which path to the LUN is optimal. In other words, vSphere knows which storage processor owns which LUN and preferably sends traffic directly to the owner. If the optimized path to a LUN is dead, an unoptimized path will be selected, and within the array the I/O will be directed via an interconnect to the owner again. The MRU pathing policy also takes optimized/unoptimized paths into account: whenever there’s no optimized path available, MRU will use an unoptimized path, and when an optimized path returns, MRU will switch back to it. Cool huh!?!
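
To make that behavior concrete, here’s a minimal Python sketch of ALUA-aware MRU path selection as I understand it (my own illustration, not VMware code; the path names are made up): prefer a live optimized path, fall back to an unoptimized one, and fail back as soon as an optimized path returns.

```python
from dataclasses import dataclass

@dataclass
class Path:
    name: str
    optimized: bool  # direct path to the storage processor that owns the LUN
    alive: bool

def select_path_mru(paths, current=None):
    """ALUA-aware MRU: prefer optimized paths, fall back to unoptimized
    ones, and fail back when an optimized path becomes available again."""
    live = [p for p in paths if p.alive]
    if not live:
        raise RuntimeError("no paths available to the LUN")
    optimized = [p for p in live if p.optimized]
    if optimized:
        # Fail back to an optimized path even if the current
        # (unoptimized) path is still working.
        return current if current in optimized else optimized[0]
    # No optimized path left: I/O travels via the interconnect bus.
    return current if current in live else live[0]

# Example: the optimized path dies, MRU fails over, then fails back.
opt = Path("vmhba1:C0:T0:L1", optimized=True, alive=True)
unopt = Path("vmhba2:C0:T1:L1", optimized=False, alive=True)
current = select_path_mru([opt, unopt])           # optimized path
opt.alive = False
current = select_path_mru([opt, unopt], current)  # unoptimized path
opt.alive = True
current = select_path_mru([opt, unopt], current)  # back to optimized
```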

What does this mean in terms of selecting the correct PSP? Like I said, you will have three options: MRU, Fixed and RR. Picking between MRU and Fixed is easy in my opinion: as MRU is aware of optimized and unoptimized paths, it is less static and error prone than Fixed. When using MRU, however, be aware that your LUNs need to be evenly balanced between the storage processors; if they are not, you might be overloading one storage processor while the other is doing absolutely nothing. This might be something you want to make your storage team aware of. The other option of course is Round Robin. With RR, 1000 commands will be sent down a path before switching over to the next one. Although theoretically this should lead to higher throughput, I haven’t seen any data to back this “claim” up. Would I recommend using RR? Yes I would, but I would also recommend performing benchmarks to ensure you are making the right decision.
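
For the Round Robin behavior, a toy sketch of the counter logic, assuming the default of 1000 commands per path before switching (again my own illustration, with made-up path names):

```python
class RoundRobinPSP:
    """Toy Round Robin policy: send `iops_limit` commands down one
    path, then rotate to the next path in the list."""
    def __init__(self, paths, iops_limit=1000):
        self.paths = paths        # assume all paths are usable
        self.iops_limit = iops_limit
        self.index = 0
        self.sent = 0

    def next_path(self):
        if self.sent >= self.iops_limit:   # time to rotate
            self.index = (self.index + 1) % len(self.paths)
            self.sent = 0
        self.sent += 1
        return self.paths[self.index]

rr = RoundRobinPSP(["vmhba1:C0:T0:L1", "vmhba2:C0:T0:L1"])
first_thousand = {rr.next_path() for _ in range(1000)}
print(first_thousand)  # {'vmhba1:C0:T0:L1'} - all on the first path
print(rr.next_path())  # 'vmhba2:C0:T0:L1'  - command 1001 rotates
```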

HA and Slot sizes

Duncan Epping · Aug 12, 2009 ·

This has always been a hot topic: HA and slot sizes/admission control. One of the most extensive (non-VMware) articles on it is by Chad Sakac aka Virtual Geek, but of course a couple of things have changed since then. Chad asked in a comment on my HA Deepdive if I could address this topic, so here you go Chad.

Slot sizes

Let’s start with the basics.

What is a slot?

A slot is a logical representation of the memory and CPU resources that satisfy the requirements for any powered-on virtual machine in the cluster.

In other words, a slot size is the worst-case CPU and memory reservation scenario in a cluster. This directly leads to the first “gotcha”:

HA uses the highest CPU reservation of any given VM and the highest memory reservation of any given VM.

If VM1 has 2GHz and 1024MB reserved and VM2 has 1GHz and 2048MB reserved, the slot size for memory will be 2048MB plus memory overhead and the slot size for CPU will be 2GHz.
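
In code that is simply a max() over the reservations per resource. A quick sketch with the numbers above (the memory overhead is an assumed constant here; in reality it varies per VM):

```python
# (cpu_reservation_mhz, mem_reservation_mb) per powered-on VM
vms = [(2000, 1024),   # VM1: 2GHz CPU, 1024MB memory reserved
       (1000, 2048)]   # VM2: 1GHz CPU, 2048MB memory reserved
mem_overhead_mb = 150  # assumed per-VM overhead for illustration

cpu_slot_mhz = max(cpu for cpu, _ in vms)                   # 2000
mem_slot_mb = max(mem for _, mem in vms) + mem_overhead_mb  # 2198
```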

Now how does HA calculate how many slots are available per host?

Of course we first need to know what the slot sizes for memory and CPU are. Then we divide the total available CPU resources of a host by the CPU slot size, and the total available memory resources of a host by the memory slot size. This leaves us with a number of slots for both memory and CPU, and the most restrictive number is the number of slots for this host. If you have 25 CPU slots but only 5 memory slots, the number of available slots for this host will be 5.
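
Continuing the sketch: divide host capacity by the slot size per resource and take the most restrictive result (all numbers are illustrative):

```python
def slots_per_host(cpu_capacity_mhz, mem_capacity_mb,
                   cpu_slot_mhz, mem_slot_mb):
    """Slots per host: capacity divided by slot size per resource,
    with the most restrictive of the two results winning."""
    cpu_slots = cpu_capacity_mhz // cpu_slot_mhz
    mem_slots = mem_capacity_mb // mem_slot_mb
    return min(cpu_slots, mem_slots)

# 25 CPU slots but only 5 memory slots -> 5 slots for this host
print(slots_per_host(50000, 11264, 2000, 2198))  # 5
```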

As you can see, this can lead to very conservative consolidation ratios. With vSphere this is configurable. If you have just one VM with a really high reservation, you can set the advanced settings das.slotCpuInMHz or das.slotMemInMB to lower the slot size used during these calculations. To make sure the VM with the high reservation can still be powered on, that VM will then take up multiple slots. Keep in mind that when you are low on resources, this could mean that you are not able to power on this high-reservation VM, as the required slots may be fragmented throughout the cluster instead of located on a single host.
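
A sketch of the effect of capping the slot size: with das.slotMemInMB set below the largest reservation, the high-reservation VM consumes multiple slots instead of inflating the slot size for everyone else (the ceiling-based rounding is my assumption of the behavior):

```python
import math

largest_mem_reservation_mb = 8192  # one VM dominates the slot size
das_slot_mem_in_mb = 1024          # advanced setting caps the slot size

mem_slot_mb = min(largest_mem_reservation_mb, das_slot_mem_in_mb)
slots_for_big_vm = math.ceil(largest_mem_reservation_mb / mem_slot_mb)
print(mem_slot_mb, slots_for_big_vm)  # 1024, 8 slots for the big VM
```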

Host Failures?

Now what happens if you set the number of allowed host failures to 1?
The host with the most slots will be taken out of the equation. If you have 8 hosts with 90 slots in total, where 7 hosts have 10 slots each and one host has 20, that single host will not be taken into account. Worst-case scenario! In other words, the 7 remaining hosts should be able to provide enough resources for the cluster when the “20 slot” host fails.

And of course, if you set it to 2, the next host taken out of the equation is the host with the second most slots, and so on.
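
The worst-case logic from the two paragraphs above as a sketch: sort the hosts by slot count, discard the N largest, and count what remains.

```python
def usable_slots(host_slots, host_failures=1):
    """Worst case: ignore the `host_failures` hosts with the most
    slots, then sum the slots of the surviving hosts."""
    surviving = sorted(host_slots)[:len(host_slots) - host_failures]
    return sum(surviving)

# 7 hosts with 10 slots plus one host with 20 slots (90 in total):
# the 20-slot host is taken out of the equation, leaving 70 slots.
print(usable_slots([10] * 7 + [20], host_failures=1))  # 70
```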

What more?

One thing worth mentioning: as Chad stated, with vCenter 2.5 the number of vCPUs of any given VM was also taken into account. This led to very conservative and restrictive admission control. This behavior was modified with vCenter 2.5 U2; the number of vCPUs is no longer taken into account.

HA deepdive

Duncan Epping · Jul 27, 2009 ·

I just refreshed my HA Deepdive page. It had been on my “to do” list for a long time, but I never got to it. Well, it took me a couple of evenings, but it’s finally done and I’m happy with it. I just hope you guys find the refresh useful and enjoy it. I also flushed all the comments on the page, so if you’ve got any questions don’t hesitate to ask them. I might even add a FAQ one day… who knows 🙂

Blades and HA / Cluster design

Duncan Epping · Feb 9, 2009 ·

After reading Aaron’s excellent articles (1, 2) on Scott Lowe’s blog I remembered a discussion I had with a couple of co-workers. The discussion was about VMware HA cluster design in blade environments.

The thing that started this discussion was an HA “problem” that occurred at a customer site. This specific customer had 2 blade chassis to avoid a single point of failure in their virtual environment. All blade servers were joined in one big cluster to get the most out of the environment in terms of Distributed Resource Scheduling.

Unfortunately for this customer, at one point in time one of his blade chassis failed. In other words: power off on the chassis, all blades gone at the same time. The first thing that comes to mind is: HA will kick in and the VMs will be up and running in no time.

High Availability “Deepdive” page

Duncan Epping · Jan 26, 2009 ·

I’ve just created a new page, which will also deal with VMware HA. I threw all my “deepdive” posts into one page, which makes them easier to find for you guys and for search engines. But most importantly: easier to maintain. When I’ve got more technical in-depth information I will add it to the page.

Check it out and let me know what you think.
