
Yellow Bricks

by Duncan Epping



ALUA and the useANO setting

Duncan Epping · Mar 29, 2011 ·

Disclaimer: Now, let's make this very clear: don't touch "useANO" unless you are specifically instructed to do so. This article is just for educational purposes.

I had some issues in my lab with an ALUA array. (If you have no clue what ALUA is, read this post.) As you hopefully know, with an ALUA array you typically have 4 paths. Two of these paths are marked within vCenter as "Active (I/O)" and the remaining two are marked as "Active". The command-line interface describes this slightly better in my opinion, as it says "Active" and "Active unoptimized". So let's assume for a second that you use Round Robin: vSphere is smart enough to only use the paths marked in vCenter as "Active (I/O)".
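
For reference, this is where that CLI wording comes from. On a 4.1 host you can list the paths for a device and check the group state per path; a quick sketch, where the device ID is just a placeholder for one of your own LUNs:

# Pick a device ID from "esxcli nmp device list" first; the one below is a placeholder
esxcli nmp path list --device naa.60060160a1b2c3d4e5f60718
# Each path reports a group state, such as "active" or "active unoptimized"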

During the discussions around these issues the setting "useANO" came up. I thought I knew what it did, but during the discussion I started to doubt myself. So I did a quick search on the internet and noticed that not a lot of people actually know what it stands for and what it does. I've seen people stating that paths are disabled or hidden… that is not the case at all. It's not a magic setting. Let's start with explaining what it stands for.

useANO = Use Active-Non-Optimized

So in other words, "useANO" allows you to enable the usage of Active-Non-Optimized paths. By default this is of course set to 0, as in a normal situation you wouldn't want to use a non-optimized path; that would mean traffic needs to flow back to the owning processor. Chad Sakac made a nice diagram that depicts this scenario in his article on ALUA (a must read!). Note that "SP B" is the processor that "owns" the LUN and the right path would typically be the path marked as "Active (I/O)". The left path would be the "Active" path, or less elegantly put, the Non-Optimized path:

As you can understand, having traffic flow through a non-optimized path is normally something you would want to avoid, as it will cause latency to go up (more hops). This is of course a scenario that could happen when the path to the "owning" processor (SP B in the diagram) is unavailable for whatever reason… You could also force this to happen by setting "useANO=1". That is what it does: it allows you to use non-optimized paths. For those who skipped it, please read the disclaimer at the top!
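
To make that a bit more concrete: on a 4.1 host the Round Robin configuration of a device includes the useANO value, and it can be inspected and (for educational purposes only, see the disclaimer!) changed along these lines. The device ID is a placeholder and the exact flag spelling may differ per release, so treat this as a sketch rather than a recipe:

# The Path Selection Policy Device Config line should show useANO=0 by default
esxcli nmp device list
# Force the Round Robin PSP to also use non-optimized paths for a single (placeholder) device
esxcli nmp roundrobin setconfig --device naa.60060160a1b2c3d4e5f60718 --useANO 1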

VAAI sweetness

Duncan Epping · Mar 24, 2011 ·

Nothing deeply technical this time, I just want to make clear how cool VAAI is! Last week I noticed on Twitter that some people reported some nice figures around VAAI. I asked them if they were willing to run some tests and compare VAAI vs non-VAAI runs. These were some of the responses I received; I cut them down to the core of the message and leave it up to you to visit these articles and read them in full. Thanks for helping me prove this point, guys!

vSphere VAAI Performance on the HP P4000 G2 by Barrie Seed

The results are pretty conclusive. For block zeroing on a VMDK, VAAI accelerates the operation by 4-5x

  • VAAI enabled: 109 seconds
  • VAAI disabled: 482 seconds

VAAI Awesomeness by Anders Hansen

I guess a picture says more than a thousand words. Difference in percentage for Cloning:

Difference in time for Eager Zero Thick Creation:

 

Exploring the performance benefits of VAAI by Matt Liebowitz

To the results:

Time to create a 50GB eagerzeroedthick VMDK without VAAI: 10 minutes generating approximately 750 write IOPS on the array

Time to create a 50GB eagerzeroedthick VMDK with VAAI: 1 minute 30 seconds, could not measure IOPS (more on that later)

Clearly there is a significant difference in creating the blank eagerzeroedthick VMDK.  How about when Windows 2008 R2 is installed on that VMDK and then converted to a template?  How fast can we deploy that template?

Deploying 50GB eagerzeroedthick template without VAAI: 19 minutes generating between 1,200-1,600 IOPS (half read/write, which makes sense since it has to read from and write to the same array)

Deploying 50GB eagerzeroedthick template with VAAI: 6 minutes (again, couldn’t measure IOPS)

 

NetApp VMware VAAI Performance Tests by Jacint Juhasz

It’s not a surprise, the trend is the same.

Operation                                          Enabled VAAI    Disabled VAAI
50GB VMDK creation with cluster support (zeroed)   5:09            9:36
Clone VM within datastore (LUN)                    8:36            13:38
Clone VM between datastores (LUN)                  8:34            14:36
Storage VMotion                                    9:38            14:45

With VAAI enabled there's no write or read rate (as there's no read or write from the host side), but the charts show a latency of around 8-10ms. With VAAI disabled the charts look a bit different. For the VMDK creation the write rate is around 100000KBps with 160ms latency (write only, no reads). The read/write operation shows a 70000KBps I/O rate with 10-15ms latency.

3PAR vSphere VAAI “Write Same” Test Results: 20x performance boost by Derek Seaman

“Write Same” Without VAAI:
70GB VMDK 2 minutes 20 seconds (500MB/sec)
240GB VMDK 8 minutes 1 second (498MB/sec)
1TB VMDK 33 minutes 10 seconds (502MB/sec)

Without VAAI the ESXi 4.1 host is sending a total of 500MB/sec of data through the SAN and into the 4 ports on the 3PAR. Because the T400 is an active/active concurrent controller design, both controllers can own the same LUN and distribute the I/O load. In the 3PAR IMC (InForm Management Console) I monitored the host ports and all four were equally loaded at around 125MB/sec.

This shows that round-robin was functioning, and highlights the very well balanced design of the T400. But this configuration is what everyone has been using for the last 10 years… nothing exciting here, unless you want to weigh down your SAN and disk array with processing zeros. Boorrrringgg!!

Now what is interesting, and what very few arrays support, is a 'zero detect' feature where the array is smart enough on thin provisioned LUNs to not write data if the entire block is all zeros. So in the 3PAR IMC I was monitoring the back-end disk-facing ports and, sure enough, there was virtually zero I/O. This means the controllers were accepting 500MB/sec of incoming zeros and writing practically nothing to disk. Pretty cool!

“Write Same” With VAAI: 20x Improvement
70GB VMDK 7 seconds (10GB/sec)
240GB VMDK 24 seconds (10GB/sec)
1TB VMDK 1 minute 23 seconds (12GB/sec)

I guess it is needless to say why VAAI rocks and why, when you are looking to buy new storage, it is important to ask whether the array is VAAI capable, and if not, to ask when it will support VAAI! VAAI isn't just for specific workloads; it was designed to reduce stress on the different layers, to decrease the cost of specific actions and, more importantly for you, to decrease the cost of operations!

ESXi Management Network Resiliency

Duncan Epping · Mar 22, 2011 ·

When we wrote the HA/DRS book, both Frank and I were still very much in an "ESX Classic" mindset. Over the last weeks I have had questions around resilient network configurations for ESXi. I referred people back to the book, but the comments I got were that the examples were very much ESX Classic instead of ESXi. Now, in my opinion the configuration looks very much the same, except that "Service Console" needs to be replaced with "Management Network", but I figured I might as well document my preference for a resilient ESXi Management Network anyway, as I needed to do it as part of an update of the book for a future version of vSphere.

In our book we give two examples: one is the simple version with a single "Service Console Network" and the other is a dual "Service Console Network" configuration. Now, I figured I could update both, but I'd rather do just one and explain why I prefer it. The one I have picked is the single "Management Network" setup. The main reason is the reduced complexity it brings. On top of that, multiple Management Networks make sense in an environment where you have many NICs and switches, but with all these converged architectures flying around it doesn't really make sense anymore to have 4 virtual links when you only have 2 physical ones. Yes, I understand that something can happen to a subnet as well, but if that is the case you have far bigger problems than your HA heartbeat network failing. Another thing to keep in mind is that you can also mitigate some of the risk of a false positive by selecting a different "Isolation Response"; typically we see this set to "Leave Powered On".

The below is an excerpt from the book.

Although many configurations are possible and supported, we recommend a simple but highly resilient configuration. We have included the vMotion (VMkernel) network in our example, as combining the Management Network and the vMotion network on a single vSwitch is the most commonly used configuration and an industry-accepted best practice.

Requirements:

  • 2 physical NICs
  • VLAN trunking

Recommended:

  • 2 physical switches

The vSwitch should be configured as follows:

  • vSwitch0: 2 Physical NICs (vmnic0 and vmnic1)
    • When multiple physical PCI devices are available make sure to use a port of each to increase resiliency
  • 2 Portgroups (Management Network and vMotion VMkernel)
  • Management Network active on vmnic0 and standby on vmnic1
  • vMotion VMkernel active on vmnic1 and standby on vmnic0
  • Failback set to No

Each portgroup has a VLAN ID assigned and runs dedicated on its own physical NIC; only in the case of a failure is it switched over to the standby NIC. We highly recommend setting failback to "No" to avoid the chance of a false positive, which can occur when a physical switch routes no traffic during boot but the ports are reported as "up". (NIC Teaming tab)
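
For those who prefer the command line, a minimal sketch of the vSwitch and portgroup plumbing from the ESXi shell could look roughly like the following. The VLAN IDs, vMotion portgroup name, IP address and netmask are placeholders, the Management Network portgroup and vmk0 typically already exist on ESXi, and the active/standby NIC order plus the failback setting would still be configured per portgroup on the NIC Teaming tab (or via PowerCLI):

# Attach both uplinks to vSwitch0
esxcfg-vswitch -L vmnic0 vSwitch0
esxcfg-vswitch -L vmnic1 vSwitch0
# Assign a VLAN ID to the existing Management Network portgroup (placeholder VLAN 10)
esxcfg-vswitch -v 10 -p "Management Network" vSwitch0
# Create the vMotion portgroup and assign its VLAN (placeholder VLAN 20)
esxcfg-vswitch -A "vMotion" vSwitch0
esxcfg-vswitch -v 20 -p "vMotion" vSwitch0
# Create the vMotion VMkernel interface with a placeholder IP address
esxcfg-vmknic -a -i 192.168.20.11 -n 255.255.255.0 "vMotion"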

Pros: Only 2 NICs in total are needed for the Management Network and vMotion VMkernel, especially useful in Blade environments. This setup is also less complex.

Cons: Just a single active path for heartbeats.

The following diagram depicts the active/standby scenario:

To increase resiliency we also recommend implementing the following advanced settings, where the IP address for "das.isolationaddress" should be a "pingable" device reachable by the ESXi hosts, preferably on the same subnet and with as few hops as possible:

 das.isolationaddress = <ip-address>
 das.failuredetectiontime = 20000

Changing the PSP from Fixed to RR

Duncan Epping · Mar 21, 2011 ·

Today I was fooling around with my new lab environment when I noticed my Path Selection Policy (PSP) was set to Fixed while the array (Clariion CX4-120) most definitely supports Round Robin (RR). I wrote about it in the past (1, 2), but as the commands changed slightly with vSphere 4.1 I figured it wouldn't hurt to write it down again:

First I validated which Storage Array Type Plugin (SATP) was currently used and which Path Selection Policy was in place:

esxcli nmp device list

(note that compared to 4.1 the “storage” bit was added… yes a minor but important change!)

Then I wanted to make sure that every LUN added in the future would get Round Robin as its default PSP:

esxcli nmp satp setdefaultpsp --satp VMW_SATP_ALUA_CX --psp VMW_PSP_RR
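
To double-check that the new default took effect, listing the SATPs and their default PSPs should now show VMW_PSP_RR next to VMW_SATP_ALUA_CX (treat the exact output format as version dependent):

esxcli nmp satp list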

Now I also needed to set the PSP per LUN, for which I used these two lines of “script”:

for i in `ls /vmfs/devices/disks | grep naa.600`;
do esxcli nmp device setpolicy --device $i --psp VMW_PSP_RR;done

And I figured, why not set the number of IOPS down to 1 as well, just to see if it changes anything:

for i in `ls /vmfs/devices/disks/ | grep naa.600`;
do esxcli nmp roundrobin setconfig --device $i --type "iops" --iops=1;done

Setting "iops=1" didn't make much difference for me, but it appears to be a general recommendation these days, so I figured it would be best to include it.
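
If you want to verify what the loop actually configured per device, the Round Robin settings can be queried as well; a quick sketch, with the device ID again being a placeholder:

esxcli nmp roundrobin getconfig --device naa.60060160a1b2c3d4e5f60718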

Before I forget, I wanted to document this as well. For my testing I used the following command which lets you clone a VMDK and time it:

time vmkfstools -i source.vmdk destination.vmdk

And the result would look as follows:

Destination disk format: VMFS zeroedthick
Cloning disk 'destination.vmdk'...
Clone: 100% done.
real    2m 9.67s
user    0m 0.33s
sys     0m 0.00s

Something that might be useful as well: timing the creation of an eagerzeroedthick VMDK:

time vmkfstools -c 30G -d eagerzeroedthick newdisk.vmdk

I am using this to measure the difference between using and not using VAAI on a storage platform. It is a lot easier than constantly kicking off tasks through vCenter. (Yes Alan and Luc, I know it is way easier with PowerCLI.)
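
As a side note, for these with/without VAAI comparisons the VAAI primitives can be toggled through the host's advanced settings. A sketch using the 4.1 advanced options (double-check the relevant KB for your build before touching these, and set them back to 1 when you are done):

# Show the current state of the VAAI primitives (1 = enabled)
esxcfg-advcfg -g /DataMover/HardwareAcceleratedMove
esxcfg-advcfg -g /DataMover/HardwareAcceleratedInit
esxcfg-advcfg -g /VMFS3/HardwareAcceleratedLocking
# Disable them for a non-VAAI test run
esxcfg-advcfg -s 0 /DataMover/HardwareAcceleratedMove
esxcfg-advcfg -s 0 /DataMover/HardwareAcceleratedInit
esxcfg-advcfg -s 0 /VMFS3/HardwareAcceleratedLocking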

MinFreePct 6% or should it be less?

Duncan Epping · Mar 17, 2011 ·

Back in the days when servers had 8GB or 16GB of memory at most, a setting was introduced that guaranteed the hypervisor a certain amount of free memory at its disposal. The main purpose of this was of course the stability of the system. As with any operating system, free memory is desirable to ensure it is available whenever a process requests it… or should we say a World, in the case of ESXi.

These days, however, we hardly see environments with 8GB or 16GB hosts… No, most servers today have a minimum of 48GB and I guess the standard is 72 or 96GB. With 72GB and 96GB being the standard today, one can imagine that 6% might be going slightly overboard. Especially in high-density environments like VDI, every single MB of extra memory can and will be worth it. As such it might be beneficial to change that 6% back to 2%. This KB article has been around for a couple of weeks and describes just that: http://kb.vmware.com/kb/1033687
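
The change itself is just an advanced setting. A sketch from the command line, assuming the option is exposed as Mem.MinFreePct as the KB describes (verify the exact procedure in the KB for your build; the vSphere Client's Advanced Settings dialog works just as well):

# Show the current value (defaults to 6) and lower it to 2
esxcfg-advcfg -g /Mem/MinFreePct
esxcfg-advcfg -s 2 /Mem/MinFreePct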

Now you might wonder what happens if you change that 6% down to 2%, as the memory states are closely related. This is what many have published in the past:

  • 6% – High
  • 4% – Soft
  • 2% – Hard
  • 1% – Low

But is that really the case? What if I changed MinFreePct? Well, I actually mentioned that in one of my previous articles. MinFreePct is defined as 6%; however, the other memory states are not fixed but rather a percentage of MinFreePct:

Free memory state thresholds {
soft:64 pct
hard:32 pct
low:16 pct
}

So that means that if you change the "High" watermark (6%) down to 2%, the thresholds that trigger ballooning / compression / swapping will also automatically change. Would I recommend changing MinFreePct? Well, it depends: if you are running a high-density VDI workload this might just give you that little bit extra you need, but in most other cases I would leave it at the default. (For more on memory tuning for VDI, read Andre's article that he coincidentally published today.)
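
To put some purely illustrative numbers on that, take a hypothetical 96GB host; the lower thresholds scale along with whatever the high watermark is set to:

MinFreePct 6%: high ≈ 5.76GB, soft (64%) ≈ 3.69GB, hard (32%) ≈ 1.84GB, low (16%) ≈ 0.92GB
MinFreePct 2%: high ≈ 1.92GB, soft (64%) ≈ 1.23GB, hard (32%) ≈ 0.61GB, low (16%) ≈ 0.31GB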

