Yellow Bricks

by Duncan Epping


vstorage

Mythbusters: ESX/ESXi caching I/O?

Duncan Epping · Apr 7, 2011 ·

We had a discussion internally about whether ESX/ESXi caches I/Os. In particular the discussion was around the caching of writes, as a customer was concerned about the consistency of their data. I fully understand the concern, and I know that in the past some vendors did cache writes, but VMware does not do this for obvious reasons: although performance is important, it is worthless when your data is corrupt or inconsistent. Of course I looked around for data to back this claim up and bust this myth once and for all. I found a KB article that acknowledges it, and I also have a quote from one of our VMFS engineers.

Source – Satyam Vaghani (VMware Engineering)
ESX(i) does not cache guest OS writes. This gives a VM the same crash consistency as a physical machine: i.e. a write that was issued by the guest OS and acknowledged as successful by the hypervisor is guaranteed to be on disk at the time of acknowledgement. In other words, there is no write cache on ESX to talk about, and so disabling it is moot. So that’s one thing out of our way.

Source – Knowledge Base
VMware ESX acknowledges a write or read to a guest operating system only after that write or read is acknowledged by the hardware controller to ESX. Applications running inside virtual machines on ESX are afforded the same crash consistency guarantees as applications running on physical machines or physical disk controllers.

Virtual Machine Storage and Snapshots Survey

Duncan Epping · Mar 30, 2011 ·

It seems to be survey month… This one is about Virtual Machine Storage and Snapshots. Most of our PMs are currently revising, updating, and prioritizing their roadmaps, and real customer data and opinions are always welcome when defining these. We would appreciate it if you could take five minutes of your time to complete the survey; it is only 12 questions.

Virtual Machine Storage and Snapshots Survey

ALUA and the useANO setting

Duncan Epping · Mar 29, 2011 ·

Disclaimer: Let’s make this very clear: don’t touch “useANO” unless you are specifically instructed to do so. This article is just for educational purposes.

I had some issues in my lab with an ALUA array. (If you have no clue what ALUA is, read this post.) As you hopefully know, with an ALUA array you typically have four paths. Two of these paths are marked within vCenter as “Active (I/O)” and the remaining two are marked as “Active”. The command-line interface describes this slightly better in my opinion, as it says “Active” and “Active unoptimized”. So let’s assume for a second that you are using Round Robin: vSphere is smart enough to only use the paths marked in vCenter as “Active (I/O)”.
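As a side note, these path states are easy to check from the command line as well. A minimal sketch, assuming the 4.1-style esxcli namespace:

# dump all paths known to NMP; the group state per path is where
# "active" (the optimized paths) and "active unoptimized" show up
esxcli nmp path list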

During the discussions around these issues the setting “useANO” came up. I thought I knew what it did, but during the discussion I started to doubt myself. So I did a quick search on the internet and noticed that not a lot of people actually know what it stands for and what it does. I’ve seen people stating that paths are disabled or hidden… That is not the case at all. It’s not a magic setting. Let’s start with what it stands for.

useANO = Use Active-Non-Optimized

So in other words, “useANO” allows you to enable the usage of Active-Non-Optimized paths. By default it is of course set to 0, as in a normal situation you wouldn’t want to use a non-optimized path: that would mean traffic needs to flow back to the owning processor. Chad Sakac made a nice diagram that depicts this scenario in his article on ALUA (a must read!). Note that “SP B” is the processor that “owns” the LUN; the right path would typically be the path marked as “Active (I/O)”, and the left path would be the “Active” path, or less elegantly put, the Non-Optimized path:

As you can understand, having traffic flow through a non-optimized path is normally something you want to avoid, as it will cause latency to go up (more hops). This is a scenario that could happen when the path to the “owning” processor (SP B in the diagram) is unavailable for whatever reason… You can also force it to happen by setting “useANO=1”. That is what it does: it allows you to use non-optimized paths. For those who skipped it, please read the disclaimer at the top!
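Purely for the educational record (and once more, see the disclaimer above): useANO is a per-device Round Robin setting rather than a host-wide switch. A minimal sketch of what flipping it could look like, assuming the 4.1 roundrobin setconfig syntax and its --useANO option, with a made-up naa identifier as placeholder:

# educational only: allow Round Robin to also schedule I/O on the
# non-optimized paths for this one device (placeholder naa identifier)
esxcli nmp roundrobin setconfig --device naa.60060160xxxxxxxxxxxxxxxx --useANO 1

# and back to the default behaviour, optimized paths only
esxcli nmp roundrobin setconfig --device naa.60060160xxxxxxxxxxxxxxxx --useANO 0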

VAAI sweetness

Duncan Epping · Mar 24, 2011 ·

Nothing deeply technical this time, I just want to make clear how cool VAAI is! Last week I noticed on Twitter that some people reported some nice figures around VAAI. I asked them if they were willing to run some tests and compare VAAI vs non-VAAI runs. These are some of the responses I received; I cut them down to the core of the message and leave it up to you to visit the articles and read them in full. Thanks for helping me prove this point, guys!
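For those who want to repeat these tests on their own array: the comparison basically boils down to timing the same operation with the VAAI primitives enabled and disabled on the host. On a 4.1 host the primitives can be toggled through advanced settings; a rough sketch, and do verify the option names on your own build before relying on them:

# check the current state of the three VAAI primitives (1 = enabled, 0 = disabled)
esxcfg-advcfg -g /DataMover/HardwareAcceleratedMove
esxcfg-advcfg -g /DataMover/HardwareAcceleratedInit
esxcfg-advcfg -g /VMFS3/HardwareAcceleratedLocking

# disable them for the non-VAAI run...
esxcfg-advcfg -s 0 /DataMover/HardwareAcceleratedMove
esxcfg-advcfg -s 0 /DataMover/HardwareAcceleratedInit
esxcfg-advcfg -s 0 /VMFS3/HardwareAcceleratedLocking

# ...and switch them back on afterwards
esxcfg-advcfg -s 1 /DataMover/HardwareAcceleratedMove
esxcfg-advcfg -s 1 /DataMover/HardwareAcceleratedInit
esxcfg-advcfg -s 1 /VMFS3/HardwareAcceleratedLocking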

vSphere VAAI Performance on the HP P4000 G2 by Barrie Seed

The results are pretty conclusive. For block zeroing on a VMDK, VAAI accelerates the operation by 4-5x

  • VAAI enabled: 109 seconds
  • VAAI disabled: 482 seconds

VAAI Awesomeness by Anders Hansen

I guess a picture says more than a thousand words. (The original post includes charts showing the difference in percentage for cloning and the difference in time for Eager Zero Thick creation.)

Exploring the performance benefits of VAAI by Matt Liebowitz

To the results:

Time to create a 50GB eagerzeroedthick VMDK without VAAI: 10 minutes generating approximately 750 write IOPS on the array

Time to create a 50GB eagerzeroedthick VMDK with VAAI: 1 minute 30 seconds, could not measure IOPS (more on that later)

Clearly there is a significant difference in creating the blank eagerzeroedthick VMDK. How about when Windows 2008 R2 is installed on that VMDK and then converted to a template? How fast can we deploy that template?

Deploying 50GB eagerzeroedthick template without VAAI: 19 minutes generating between 1,200-1,600 IOPS (half read/write, which makes sense since it has to read from and write to the same array)

Deploying 50GB eagerzeroedthick template with VAAI: 6 minutes (again, couldn’t measure IOPS)

 

NetApp VMware VAAI Performance Tests by Jacint Juhasz

It’s not a surprise: the trend is the same.

Operation                                          Enabled VAAI   Disabled VAAI
50GB VMDK creation with cluster support (zeroed)   5:09           9:36
Clone VM within datastore (LUN)                    8:36           13:38
Clone VM between datastores (LUN)                  8:34           14:36
Storage VMotion                                    9:38           14:45
(all times in minutes:seconds)

With VAAI enabled there is no write or read rate on the host side (as there are no reads or writes from the host), but the charts show a latency of around 8-10ms. With VAAI disabled the charts look a bit different: for the VMDK creation the write rate is around 100,000KBps with 160ms latency (write only, no reads), while the read/write operations show a 70,000KBps I/O rate with 10-15ms latency.

3PAR vSphere VAAI “Write Same” Test Results: 20x performance boost by Derek Seaman

“Write Same” Without VAAI:
70GB VMDK 2 minutes 20 seconds (500MB/sec)
240GB VMDK 8 minutes 1 second (498MB/sec)
1TB VMDK 33 minutes 10 seconds (502MB/sec)

Without VAAI the ESXi 4.1 host is sending a total of 500MB/sec of data through the SAN and into the 4 ports on the 3PAR. Because the T400 has an active/active concurrent controller design, both controllers can own the same LUN and distribute the I/O load. In the 3PAR IMC (InForm Management Console) I monitored the host ports and all four were equally loaded at around 125MB/sec.

This shows that round-robin was functioning, and highlights the very well balanced design of the T400. But this configuration is what everyone has been using for the last 10 years… nothing exciting here, except if you want to weigh down your SAN and disk array with processing zeros. Boorrrringgg!!

Now what is interesting, and something very few arrays support, is a ‘zero detect’ feature where the array is smart enough, on thin provisioned LUNs, to not write data if the entire block is all zeros. So in the 3PAR IMC I was monitoring the back-end disk-facing ports and, sure enough, there was virtually zero I/O. This means the controllers were accepting 500MB/sec of incoming zeros and writing practically nothing to disk. Pretty cool!

“Write Same” With VAAI: 20x Improvement
70GB VMDK 7 seconds (10GB/sec)
240GB VMDK 24 seconds (10GB/sec)
1TB VMDK 1 minute 23 seconds (12GB/sec)

I guess it is needless to say why VAAI rocks. When you are looking to buy new storage it is important to ask whether the array is VAAI capable, and if it is not, make sure you ask when it will be! VAAI isn’t just for specific workloads; it was designed to reduce stress on the various layers, to decrease the cost of specific actions and, more importantly for you, to decrease the cost of operations!
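By the way, if you are not sure whether your current array is offloading anything at all, the vSphere Client shows a “Hardware Acceleration” column for your devices and datastores, and from the console the device listing should show a VAAI status per LUN. A hedged example, as the exact wording of the field may differ per build:

# list all SCSI devices; on a 4.1 host each device should include a
# "VAAI Status" line (supported / unsupported / unknown)
esxcfg-scsidevs -l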

Changing the PSP from Fixed to RR

Duncan Epping · Mar 21, 2011 ·

Today I was fooling around with my new lab environment when I noticed my Path Selection Policy (PSP) was set to Fixed, while the array (a Clariion CX4-120) most definitely supports Round Robin (RR). I wrote about this in the past (1, 2), but as the commands changed slightly with vSphere 4.1 I figured it wouldn’t hurt to write it down again:

First I validated which Storage Array Type Plugin (SATP) was currently used and which Path Selection Policy was set:

esxcli nmp device list

(note that in vSphere 5.0 and later a “storage” namespace was added, making this “esxcli storage nmp device list”… yes, a minor but important change!)

Then I wanted to make sure that every single LUN that would be added in the future would get Round Robin as its default PSP:

esxcli nmp satp setdefaultpsp --satp VMW_SATP_ALUA_CX --psp VMW_PSP_RR

Now I also needed to set the PSP for each existing LUN, for which I used these two lines of “script”:

for i in `ls /vmfs/devices/disks | grep naa.600`;
do esxcli nmp device setpolicy --device $i --psp VMW_PSP_RR;done

And I figured, why not set the number of IOPS down to 1 as well, just to see if it changes anything:

for i in `ls /vmfs/devices/disks/ | grep naa.600`;
do esxcli nmp roundrobin setconfig --device $i --type "iops" --iops=1;done

Setting “iops=1” didn’t make much difference for me, but it appears to be a general recommendation these days, so I figured it would be best to include it.
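To double check that the change actually stuck for a given device, you can pull the Round Robin configuration back out. A minimal sketch, again with a made-up naa identifier:

# show the Round Robin settings (IO operation limit, useANO, etc.) for one device
esxcli nmp roundrobin getconfig --device naa.60060160xxxxxxxxxxxxxxxx

# or simply run "esxcli nmp device list" again and check the
# Path Selection Policy Device Config line for each device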

Before I forget, I wanted to document this as well. For my testing I used the following command which lets you clone a VMDK and time it:

time vmkfstools -i source.vmdk destination.vmdk

And the result would look as follows:

Destination disk format: VMFS zeroedthick
Cloning disk 'destination.vmdk'...
Clone: 100% done.
real    2m 9.67s
user    0m 0.33s
sys     0m 0.00s

Something that might be useful as well: timing the creation of an eagerzeroedthick VMDK:

time vmkfstools -c 30G -d eagerzeroedthick newdisk.vmdk

I am using this to measure the difference between using and not using VAAI on a storage platform. It is a lot easier than constantly kicking off tasks through vCenter. (Yes Alan and Luc, I know it is way easier with PowerCLI.)
