Mythbusters: ESX/ESXi caching I/O?

We had an internal discussion about ESX/ESXi caching I/Os. In particular, the discussion was about caching of writes, as a customer was concerned about the consistency of their data. I fully understand their concern, and I know that in the past some vendors did do write caching; VMware, however, does not do this, for obvious reasons. Although performance is important, it is worthless when your data is corrupt or inconsistent. Of course I looked around for data to back this claim up and bust this myth once and for all. I found a KB article that acknowledges this, and I have a quote from one of our VMFS engineers.

Source Satyam Vaghani (VMware Engineering)
ESX(i) does not cache guest OS writes. This gives a VM the same crash consistency as a physical machine: i.e. a write that was issued by the guest OS and acknowledged as successful by the hypervisor is guaranteed to be on disk at the time of acknowledgement. In other words, there is no write cache on ESX to talk about, and so disabling it is moot. So that’s one thing out of our way.

Source – Knowledge Base
VMware ESX acknowledges a write or read to a guest operating system only after that write or read is acknowledged by the hardware controller to ESX. Applications running inside virtual machines on ESX are afforded the same crash consistency guarantees as applications running on physical machines or physical disk controllers.
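
To make the difference concrete, here is a small toy model in Python. It is only a sketch of the idea in the two quotes above, not ESX code: the class names (Disk, WriteThroughLayer, WriteBackLayer) are made up for illustration, and the write-back layer exists purely as a contrast to show what a caching layer could lose.

```python
class Disk:
    """Stands in for the hardware controller plus stable media (hypothetical)."""

    def __init__(self):
        self.blocks = {}

    def write(self, lba, data):
        self.blocks[lba] = data
        return True  # acknowledgement from the controller


class WriteThroughLayer:
    """Acknowledges the guest only after the disk has acknowledged."""

    def __init__(self, disk):
        self.disk = disk

    def write(self, lba, data):
        return self.disk.write(lba, data)

    def crash(self):
        pass  # nothing is buffered in this layer, so nothing can be lost


class WriteBackLayer:
    """Hypothetical caching layer, shown only for contrast."""

    def __init__(self, disk):
        self.disk = disk
        self.dirty = {}

    def write(self, lba, data):
        self.dirty[lba] = data
        return True  # acknowledged before the data reached the disk

    def crash(self):
        self.dirty.clear()  # buffered writes are gone


def acked_writes_survive_crash(layer_cls):
    """Issue a few writes, crash the layer, and check every acknowledged write is on disk."""
    disk = Disk()
    layer = layer_cls(disk)
    acked = {}
    for lba in range(4):
        data = f"data-{lba}"
        if layer.write(lba, data):
            acked[lba] = data
    layer.crash()
    return all(disk.blocks.get(lba) == data for lba, data in acked.items())


print(acked_writes_survive_crash(WriteThroughLayer))  # True
print(acked_writes_survive_crash(WriteBackLayer))     # False
```

The write-through layer is the behaviour the quotes describe: an acknowledged write is already on disk, so a crash of the layer in between cannot lose it.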

Virtual Machine Storage and Snapshots Survey

It seems to be survey month... Nevertheless, this survey will only take you a couple of minutes and is about Virtual Machine Storage and Snapshots. Most of our PMs are currently revising, updating and prioritizing the roadmap, and real customer data and opinions are always welcome to help define it. We would appreciate it if you could take 5 minutes of your time to complete this one; it is only 12 questions.

Virtual Machine Storage and Snapshots Survey

ALUA and the useANO setting

Disclaimer: Now, let’s make this very clear. Don’t touch “useANO” unless you are specifically instructed to do so; this article is just for educational purposes.

I had some issues in my lab with an ALUA array. (If you have no clue what ALUA is, read this post.) As you hopefully know, with an ALUA array you typically have 4 paths. Two of these paths are marked within vCenter as “Active (I/O)” and the remaining two are marked as “Active”. The command-line interface describes this slightly better in my opinion, as it says “Active” and “Active unoptimized”. So let’s assume for a second that you use Round Robin: vSphere is smart enough to only use the paths marked in vCenter as “Active (I/O)”.

During the discussions around these issues the setting “useANO” was mentioned. I thought I knew what it did, but during the discussion I started to doubt myself. So I did a quick search on the internet and noticed that not a lot of people actually know what it stands for and what it does. I’ve seen people stating that paths are disabled or hidden… That is not the case at all. It’s not a magic setting. Let’s start by explaining what it stands for.

useANO = Use Active-Non-Optimized

So in other words “useANO” allows you to enable the usage of Active-Non-Optimized paths. By default this is of course set to 0 as in a normal situation you wouldn’t want to use a non-optimized path as this would mean traffic would need to flow back to the owning processor. Chad Sakac made a nice diagram that depicts this scenario in his article on ALUA (must read!). Note that “SP B” is the processor that “owns” the LUN and the right path would typically be the path marked as “Active (I/O)”. The left path would be the “Active” path or less elegantly put the Non-Optimized path:

As you can understand, having traffic flow through a non-optimized path is normally something you would want to avoid, as this will cause latency to go up (more hops). This is of course a scenario that could happen when the path to the “owning” processor (SP B in the diagram) is unavailable for whatever reason. You could also force this to happen by setting “useANO=1”. That is what it does: it allows you to use non-optimized paths. For those who skipped it, please read the disclaimer at the top!
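
For those who prefer to see the logic spelled out, below is a minimal Python sketch of the path selection described above. It is a toy model and not the actual vSphere Round Robin code: the Path class, the state names and the path names (spA-0, spB-0 and so on) are assumptions made up for the example.

```python
from dataclasses import dataclass
from itertools import cycle, islice


@dataclass(frozen=True)
class Path:
    name: str   # hypothetical path name
    state: str  # "optimized", "non-optimized" or "dead"


def eligible_paths(paths, use_ano=False):
    """Return the paths a round-robin policy may use in this toy model."""
    optimized = [p for p in paths if p.state == "optimized"]
    non_optimized = [p for p in paths if p.state == "non-optimized"]
    if use_ano:
        # useANO=1: non-optimized paths take part in the rotation as well
        return optimized + non_optimized
    # useANO=0: only optimized paths, unless none of them is available
    return optimized if optimized else non_optimized


def round_robin(paths, use_ano=False):
    """Endlessly rotate over the eligible paths."""
    return cycle(eligible_paths(paths, use_ano))


# SP B owns the LUN, so its two paths are the optimized ones
paths = [
    Path("spB-0", "optimized"),
    Path("spB-1", "optimized"),
    Path("spA-0", "non-optimized"),
    Path("spA-1", "non-optimized"),
]

print([p.name for p in islice(round_robin(paths), 4)])
print([p.name for p in islice(round_robin(paths, use_ano=True), 4)])
```

With the default only the two SP B paths are rotated; with use_ano=True all four paths are used, which is exactly the extra hop you normally want to avoid.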

Enable TRIM on OS X 10.6.7

Thanks to Jason Nash’s article I finally managed to get TRIM enabled for the SSD in my Mac. The procedure works great; however, when you have a dual SSD setup like I have (booting from a 120GB Intel and running my VMs on a 256GB Kingston) it doesn’t work, as replacing the identifier leaves you with one SSD without TRIM support. I googled around for a bit and found this article by Oskar Groth. Oskar made a nice GUI tool that enables TRIM for you on all SSDs that your Mac contains! Be aware that this is a hack and there is no support whatsoever for this. So why would you enable it? Well, look at the test that Jason did. I also did some testing, and the preliminary findings are that it vastly improves your performance!

Tintri – virtual machine aware storage

This week I had a call with a new and exciting company called Tintri. Tintri has been flying under the radar for the last couple of years and has worked really hard to develop a new product. Tintri was founded by some of the smartest kids on the block, one of whom is their current CEO and former EVP of Engineering at VMware, Dr. Kieran Harty. But it is not only former VMware employees; we are also talking about former Data Domain, NetApp and Sun employees. Although it is a rough time for a storage start-up, they are jumping in at the deep end. Then again, one might wonder how deep it actually is, as these are very experienced people who know how deep they can go and what the weak and strong points are in virtualized environments when it comes to storage.

During the call the folks at Tintri offered to give a demo of how their storage unit works. As you know, I don’t have a deep storage background like someone such as Chad Sakac, but I do work with storage on a day-to-day basis. I look at storage from a different perspective. My interest is usually around management, performance and availability.

From an operational/management perspective Tintri VMstore does change things. Tintri VMstore is VM aware, which of course is easier to accomplish when you develop your own filesystem and serve it up as NFS than when using block-based storage. Tintri leverages the VMware vSphere APIs and correlates that information with what they have running on top of their filesystem. Now why would you want to do that? Well, for instance for simple things like storage-level snapshots per VM. Try doing that on the average FC/iSCSI array and you will find yourself snapshotting a full LUN or assigning dedicated LUNs to VMs. In both cases that is not an ideal situation.

What makes VMstore special is that on top of the integration with vSphere they also do inline deduplication and compression, meaning that although a 5U VMstore node (5U includes a UPS system) offers you 8.5 TB of usable hard disk capacity, it could potentially serve multiples of that (depending on the type of workload, of course). But when it is doing inline dedupe and compression, what about performance? Tintri VMstore offers 16 SATA drives. No, not just SATA, as that probably wouldn’t meet all your performance requirements; they also offer MLC flash, aka SSD, and that is where the dedupe and compression is done. In other words, in order to enable inline dedupe and compression Tintri developed a hybrid filesystem that moves data between SSD and SATA. By the way, VMstore uses RAID-6 both for the 1TB of SSD drives it contains and for the 16x1TB SATA drives. If data needs to move, it decides that at a pretty granular level: 4 KB. Of course VMstore is smart enough to batch these transfers to optimize bandwidth.
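
To give an idea of what dedupe and compression at a 4 KB granularity means, here is a minimal Python sketch. To be very clear: this is not how Tintri implemented it (I don’t know the internals). It is a generic fingerprint-based example, and the DedupeStore class, the use of SHA-256 and zlib, and the names “vm1” and “vm2” are all assumptions picked for illustration.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # 4 KB, the granularity mentioned above


class DedupeStore:
    """Toy store that keeps each unique 4 KB block once, in compressed form."""

    def __init__(self):
        self.chunks = {}  # fingerprint -> compressed block
        self.files = {}   # file name -> ordered list of fingerprints

    def write(self, name, data):
        refs = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            fingerprint = hashlib.sha256(block).hexdigest()
            if fingerprint not in self.chunks:
                # only blocks we have not seen before consume space
                self.chunks[fingerprint] = zlib.compress(block)
            refs.append(fingerprint)
        self.files[name] = refs

    def read(self, name):
        return b"".join(zlib.decompress(self.chunks[fp]) for fp in self.files[name])

    def physical_bytes(self):
        return sum(len(chunk) for chunk in self.chunks.values())


store = DedupeStore()
image = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE  # four blocks, two of them unique
store.write("vm1", image)
store.write("vm2", image)  # an identical second "VM" adds no new blocks

print(len(store.chunks))                              # 2
print(store.physical_bytes() < 2 * len(image))        # True
```

Two identical “VMs” of four blocks each end up as just two stored blocks, which is why a node could potentially serve more than its raw capacity, depending on the workload.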

Each VMstore node will be served up as a single NFS share via 10GbE. Do you need more disk space? Hook up another VMstore node and connect to its NFS share. Another thing that is simplified is of course the management of the VMstore node itself. Need to upload log files? Don’t worry, a single click sends them to the cloud over an SSL connection and Tintri will pick them up from there. No hassling with FTP etc. The same goes for the “call back” system for support: it will upload details to the cloud and Tintri will pick them up.

When they demoed it yesterday most workloads were actually running on SSD at that moment. (They showed me their VDI environment.) The cool thing is that you can actually see the performance stats on a per-VM level (see screenshot below) or even per VMDK if you want to. On top of that you can also “reserve” performance for your VMs by telling VMstore that these need to be pinned to SSD.

The following is a quote from one of their customers:

“Previous attempts to virtualize our Oracle Financials application had failed – as we couldn’t deliver the performance users required,” said Don St. Onge, CIO, TIBCO Software, Inc. “With Tintri VMstore, we saw a 2X performance boost which was more than enough to keep our users happy. Tintri’s unique approach to deduplication and compression lets us run the entire 1TB database instance in only 177GB of flash memory.”

Now this might be slightly overstating it, like most press releases do, as I have many customers virtualizing their production tier 1 apps. The key takeaway for me, though, is the fact that they run a 1TB database in 177GB of flash and still see a performance improvement. I guess that is due to the beefy specs of the VMstore node, which uses multiple multi-core CPUs.
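
As a quick back-of-the-envelope check of the numbers in that quote, the snippet below works out the data reduction ratio. The only assumption is how you count a terabyte (1000 GB or 1024 GB); the function name is made up for the example.

```python
def reduction_ratio(logical_gb, physical_gb):
    """How many times smaller the data is on flash than its logical size."""
    return logical_gb / physical_gb


# 1 TB database instance stored in 177 GB of flash
print(round(reduction_ratio(1000, 177), 2))  # 5.65, counting 1 TB as 1000 GB
print(round(reduction_ratio(1024, 177), 2))  # 5.79, counting 1 TB as 1024 GB
```

So either way the database takes up somewhere between five and six times less flash than its logical size, at least for this particular workload.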

So in short (copy from Tintri’s press release):

  • VM-aware file system designed to service I/O workloads from VMs;
  • Seamless flash/disk integration with file system for smooth workload transitions and efficient use of flash capacity;
  • Monitoring, control and reporting features on a per-VM and per-virtual disk basis for greater transparency in managing storage for VMs;
  • Hybrid flash/disk appliances with inline deduplication and compression capabilities.

So Tintri is hot, fantastic, great… but there must be things that you feel can be improved? Well of course there are…

I would love to see even more integration with VMware. Not only make the VMstore node VM aware, but also make vSphere VMstore aware. In other words, I would expect and love to see plugins which allow you to do most of the VM-level storage management tasks within vCenter instead of through the VMstore web interface. (Although it is a very simple interface which you can master in seconds.) I also feel that replication is something that it needs to have. I can imagine it is part of their roadmap, but I would rather see it today than tomorrow. Having the ability to enable replication per VM and then only replicate the changed and compressed “chunks” would be more than welcome. It would also be great if it had syslog capabilities so that event correlation is even easier.

My take in short: Tintri VMstore takes an interesting approach to the traditional problems virtualized infrastructures are facing; by making their nodes VM aware they are looking to solve these problems. Along the way they are simplifying management, and they have a very competitive price. Most definitely worth investigating whether their solution meets your requirements! Their website specifically calls out “Test and Development” as one of the target solutions for VMstore. I guess by now everyone knows what starting with “Test and Development” brought VMware… Keep an eye out for these guys.