Nothing deeply technical this time; I just want to make clear how cool VAAI is! Last week I noticed on Twitter that some people reported some nice figures around VAAI, so I asked them if they were willing to run some tests and compare VAAI vs non-VAAI runs. These are some of the responses I received. I cut them down to the core of the message and leave it up to you to visit their articles and read them in full. Thanks for helping me prove this point, guys!
vSphere VAAI Performance on the HP P4000 G2 by Barrie Seed
The results are pretty conclusive. For block zeroing on a VMDK, VAAI accelerates the operation by 4-5x:
- VAAI enabled: 109 seconds
- VAAI disabled: 482 seconds
VAAI Awesomeness by Anders Hansen
I guess a picture says more than a thousand words. Difference in percentage for Cloning:
Difference in time for Eager Zero Thick Creation:
Exploring the performance benefits of VAAI by Matt Liebowitz
To the results:
Time to create a 50GB eagerzeroedthick VMDK without VAAI: 10 minutes generating approximately 750 write IOPS on the array
Time to create a 50GB eagerzeroedthick VMDK with VAAI: 1 minute 30 seconds, could not measure IOPS (more on that later)
Clearly there is a significant difference in creating the blank eagerzeroedthick VMDK. How about when Windows 2008 R2 is installed on that VMDK and then converted to a template? How fast can we deploy that template?
Deploying 50GB eagerzeroedthick template without VAAI: 19 minutes generating between 1,200-1,600 IOPS (half read/write, which makes sense since it has to read from and write to the same array)
Deploying 50GB eagerzeroedthick template with VAAI: 6 minutes (again, couldn’t measure IOPS)
NetApp VMware VAAI Performance Tests by Jacint Juhasz
It’s not a surprise; the trend is the same.
Operation | VAAI enabled | VAAI disabled
50GB VMDK creation with cluster support (zeroed) | 5:09 | 9:36
Clone VM within datastore (LUN) | 8:36 | 13:38
Clone VM between datastores (LUN) | 8:34 | 14:36
Storage vMotion | 9:38 | 14:45
With VAAI enabled there is no read or write rate reported (as there is no read or write from the host side), but the charts show latency of around 8-10ms. With VAAI disabled the charts look a bit different: for the VMDK creation the write rate is around 100,000KBps with 160ms latency (write only, no reads), and the read/write operations show a 70,000KBps I/O rate with 10-15ms latency.
3PAR vSphere VAAI “Write Same” Test Results: 20x performance boost by Derek Seaman
“Write Same” Without VAAI:
70GB VMDK 2 minutes 20 seconds (500MB/sec)
240GB VMDK 8 minutes 1 second (498MB/sec)
1TB VMDK 33 minutes 10 seconds (502MB/sec)
Without VAAI the ESXi 4.1 host is sending a total of 500MB/sec of data through the SAN and into the 4 ports on the 3PAR. Because the T400 is an active/active concurrent controller design, both controllers can own the same LUN and distribute the I/O load. In the 3PAR IMC (InForm Management Console) I monitored the host ports and all four were equally loaded at around 125MB/sec.
This shows that round-robin was functioning, and highlights the very well balanced design of the T400. But this configuration is what everyone has been using for the last 10 years... nothing exciting here, unless you want to weigh down your SAN and disk array with processing zeros. Boorrrringgg!!
Now what is interesting, and what very few arrays support, is a 'zero detect' feature, where the array is smart enough not to write data to a thin provisioned LUN if the entire block is all zeros. So in the 3PAR IMC I was monitoring the back-end disk-facing ports and, sure enough, saw virtually zero I/O. This means the controllers were accepting 500MB/sec of incoming zeros and writing practically nothing to disk. Pretty cool!
“Write Same” With VAAI: 20x Improvement
70GB VMDK 7 seconds (10GB/sec)
240GB VMDK 24 seconds (10GB/sec)
1TB VMDK 1 minute 23 seconds (12GB/sec)
I guess it is needless to say why VAAI rocks, and why, when you are looking to buy new storage, it is important to ask whether the array is VAAI capable, and if it isn't, to ask when it will be! VAAI isn't just for specific workloads; it was designed to reduce stress on the different layers, to decrease the cost of specific actions and, more importantly for you, to decrease the cost of operations!
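For anyone who wants to check the status on their own hosts, or reproduce a before/after comparison like the ones above, the offloads can be inspected and toggled per host via the standard vSphere advanced settings. A minimal sketch from the ESXi 4.1/5.x shell (verify the option names against your own build before relying on them):

esxcfg-advcfg -g /DataMover/HardwareAcceleratedMove    # full copy (XCOPY) offload, 1 = enabled
esxcfg-advcfg -g /DataMover/HardwareAcceleratedInit    # block zeroing (WRITE SAME) offload, 1 = enabled
esxcfg-advcfg -g /VMFS3/HardwareAcceleratedLocking     # hardware assisted locking (ATS), 1 = enabled
esxcfg-advcfg -s 0 /DataMover/HardwareAcceleratedInit  # disable the zeroing offload for a non-VAAI test run
esxcfg-advcfg -s 1 /DataMover/HardwareAcceleratedInit  # re-enable it when the test is done

The vSphere Client also shows a per-datastore and per-device "Hardware Acceleration" status (Supported / Not supported / Unknown), which is the quickest way to see whether your array is actually offloading anything.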
Andreas says
Yep, VAAI is really sweet. We deploy our templates (Win2008r2 with all the patches) in about 1min30s when the SAN is under load. This on P4300 and P4500 boxes from HP. The machine is about 16GB in size. It’s amazingly fast!
We’re still waiting for HP to include VAAI in HP EVA firmware 🙂
Shane says
I’ve heard that the HP EVA will be VAAI capable 2nd half of this year.
Shane.
Britt says
Andreas – Which controller/interfaces are you using in the P2000/MSA?
Jason Boche says
Thank you for aggregating these samples!!!
Calvin Zito says
Nice job Duncan! I have a video showing the P4000 VAAI in action. While I didn't get video of the non-VAAI run of the clone or vMotion, I do talk about how long those took without VAAI. I'll leave the link in case any of your readers want to see VAAI in a video.
http://h30507.www3.hp.com/t5/Around-the-Storage-Block-Blog/VAAI-on-the-P4000-demo/ba-p/88583
Calvin (@HPStorageGuy)
yyaazz says
Hi Duncan,
thanks for collecting these results and also thanks for including mine!
Jacint
Duncan Epping says
No thank you for providing these insights, very valuable!
Barrie says
Thanks Duncan – nice collection of data from different arrays.
Dave Convery says
Interesting numbers around Jumbo frames as well. It seems that enabling them actually inhibits performance.
Dave
Derek Seaman says
Hey Duncan, thanks so much for including my 3PAR test results. I was going to reply with a link to my test results when I started reading your article, then saw you directly referenced them! 🙂 VAAI does rock!
binaryspiral says
I took two NSM 2120 modules out of production last year because of performance issues – it wasn't the units' fault, we just outgrew them and had to decide on an upgrade path. We went with a NetApp 3140 and fast disk drawers.
With a software upgrade they are now known as P4000s and are happily serving up storage in our dev lab – but this new upgrade is going to make them very useful in development.
Thanks HP, you just earned another year of support contract purchase. 🙂
Rawley Burbridge says
Does anyone have experience with changing the data mover transfer chunk size?
esxcfg-advcfg -s <new size in KB> /DataMover/MaxHWTransferSize
Cedric says
Yes, the default chunk size is 4 MB, and there have been significant improvements after bumping the value up to 16 MB. The caveat is that if you have different arrays behind the same host, you want to set it to the lowest value that is compatible with all of them; otherwise the lower-end array may see degraded performance.
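For anyone wanting to experiment with this, a minimal sketch of checking and changing the setting from the ESXi shell, assuming the /DataMover/MaxHWTransferSize option referenced above (the value is in KB, so 4096 is the 4 MB default and 16384 is 16 MB):

esxcfg-advcfg -g /DataMover/MaxHWTransferSize        # show the current transfer size in KB (default 4096)
esxcfg-advcfg -s 16384 /DataMover/MaxHWTransferSize  # raise the XCOPY chunk size to 16 MB

The setting is per host, so it needs to be applied on every host that talks to the array, and as Cedric notes it should not exceed what the least capable array behind those hosts supports.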