
Yellow Bricks

by Duncan Epping


Storage

Tintri follow up

Duncan Epping · Aug 18, 2011 ·

Back in March I wrote about this new and interesting storage vendor called Tintri, which had just released a new NAS appliance called VMstore. I wrote about their level of integration and the fact that their NAS appliance is virtual machine aware and allows you to define performance policies per virtual machine. I am not going to rehash the complete post, so for more details read it before you continue with this article. During the briefing for that article we discussed some of the caveats of their design and some possible enhancements. Tintri apparently is the type of company that listens to community input and can act quickly. Yesterday I had a briefing on some of the new features Tintri will announce next week. I’ve been told that none of this is under embargo, so I will go ahead and share with you what I feel is very exciting. Before I do, I want to mention that Tintri now also has teams in APAC and EMEA; as some of you know, they started out in North America only but have now expanded to the rest of the world.

First of all, and this addresses probably the most frequently heard complaint: the upcoming Tintri VMstore devices will be available in a dual controller configuration, which will make them more interesting to many of you. Especially the more uptime-sensitive environments will appreciate this, and who isn’t sensitive about uptime these days? Particularly in a virtualized environment, where many workloads share a single device, this improvement is more than welcome! The second thing I really liked is how they enhanced their dashboard. This seems like a minor thing, but I can assure you it will make your life a lot easier. Let me dump a screenshot first and then discuss what you are looking at.

The screenshot shows the per-VM latency statistics… Now what is exciting about that? Well, if you look at the bottom you will see different colors, and each of those represents a specific type of latency. Let’s assume your VM experiences 40ms of latency and your customer starts complaining. The main thing to figure out is what causes this slowdown. (Or in many cases: who can I blame?) Is your network saturated? Is the host swamped? Is it your storage device? To identify these types of problems you would need a monitoring tool, and most likely multiple tools, to pinpoint the issue. Tintri decided to hook into vCenter, pull down the various metrics, and use those to create the nice graph that you see in the screenshot. This allows you to quickly pinpoint the issue from a single pane of glass. And yes, you can also expect this as a new tab within vCenter.
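For those who want to pull similar per-VM latency numbers straight out of vCenter themselves, here is a minimal pyVmomi sketch. To be clear, this is my illustration and not Tintri’s code; the hostname, credentials and VM name are placeholders, and which counters you actually get depends on your vCenter statistics configuration.

```python
# Minimal sketch: query per-VM virtual disk latency counters from vCenter.
# Placeholders: vcenter.example.com, admin/secret, "my-vm". Depending on your
# environment you may also need to pass an SSL context to SmartConnect.
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com", user="admin", pwd="secret")
content = si.RetrieveContent()
perf = content.perfManager

# Build a "group.name.rollup" -> counterId map so we can look up counters by name.
counter_ids = {
    f"{c.groupInfo.key}.{c.nameInfo.key}.{c.rollupType}": c.key
    for c in perf.perfCounter
}
wanted = ["virtualDisk.totalReadLatency.average",
          "virtualDisk.totalWriteLatency.average"]

# Find the VM by name (assumes it exists).
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.VirtualMachine], True)
vm = next(v for v in view.view if v.name == "my-vm")

spec = vim.PerformanceManager.QuerySpec(
    entity=vm,
    metricId=[vim.PerformanceManager.MetricId(counterId=counter_ids[n],
                                              instance="*") for n in wanted],
    intervalId=20,  # 20-second real-time samples
    maxSample=15,
)
for result in perf.QueryPerf(querySpec=[spec]):
    for series in result.value:
        avg = sum(series.value) / max(len(series.value), 1)
        print(series.id.counterId, series.id.instance, f"{avg:.1f} ms")

Disconnect(si)
```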

Another great feature Tintri offers is the ability to realign your VMDKs. Tintri does this, unlike most solutions out there, from the “inside”, meaning that the functionality is incorporated into the appliance and is not a separate tool which needs to run against each and every VM. A smart solution which can and will save you a lot of time.
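To illustrate what alignment is about: a guest partition whose starting offset is not a multiple of the array’s block size forces every guest I/O to straddle two backend blocks, doubling the work. A tiny sketch, with an assumed example block size:

```python
# Illustration only: a partition starting at sector 63 (the classic MBR
# default) is misaligned against a 4 KiB backend block; one starting at
# sector 2048 (a 1 MiB boundary) is aligned. 4096 is an assumed example.
CHUNK = 4096  # backend block size in bytes (assumption)

def is_aligned(start_sector: int, sector_size: int = 512) -> bool:
    return (start_sector * sector_size) % CHUNK == 0

print(is_aligned(63))    # False: 32,256 bytes is not on a 4 KiB boundary
print(is_aligned(2048))  # True: 1,048,576 bytes is exactly on a boundary
```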

It’s all great and amazing, isn’t it? Or are there any caveats? One thing I still feel needs to be addressed is replication. It is not available yet in this next release, but is that a problem now that SRM offers vSphere Replication? I guess that relieves some of the immediate pressure, but I would still like to see a native Tintri solution providing asynchronous and synchronous replication. Yes, it will take time, but I would expect Tintri to be working on this. I tried to persuade them to make a statement yesterday, but unfortunately they couldn’t say anything with regards to a timeline / roadmap.

Definitely a booth I will be checking out at VMworld.

Nutanix Complete Cluster

Duncan Epping · Aug 18, 2011 ·

I was just reading up and noticed an article about Nutanix. Nutanix is a “new” company which just came out of stealth mode and offers a datacenter-in-a-box type of solution, meaning that they have a solution which provides shared storage and compute resources in a single 2U chassis. This 2U chassis can hold up to 4 compute nodes, and each of these nodes can have 2 CPUs, up to 192GB of memory, 320GB of PCIe SSD, 300GB of SATA SSD and 5TB of SATA HDDs. Now the cool thing about it is that each node’s “local” storage can be served up as shared storage to all of the nodes, enabling you to use HA/DRS etc. I guess you could indeed describe Nutanix’s solution as the “Complete Cluster” solution, and as Nutanix says it is unique, and many analysts and bloggers have been really enthusiastic about it… but is it really that special?

What Nutanix actually uses for their building block is an HPC form factor case like the one I discussed in May of this year. I wouldn’t call that revolutionary, as Dell, Super Micro, HP (and others) sell these as well but market them differently (in my opinion a missed opportunity). What does make Nutanix somewhat unique is that they package it as a complete solution, including a Virtual Storage Appliance they’ve created. It is not just a VSA; it appears to be a smart device which is capable of taking advantage of the available SSD drives, using them as a shared cache distributed amongst the hosts, and it uses multiple tiers of storage: SSD and SATA. It kind of reminds me of what Tintri does, only this is a virtual appliance that is capable of leveraging multiple nodes. (I guess HP could offer something similar in a heartbeat if they bundled their VSA with the DL170e.) Still, I strongly believe that this is a promising concept and hope these guys are at VMworld so I can take a peek and discuss the technology behind this a bit more in-depth, as I have a few questions from a design perspective…

  • No 10GbE redundancy? (according to the datasheet, just a single port)
  • Only 2 NICs for VM traffic, vMotion and Management? (Why not just 2 10GbE NIC ports?)
  • What about when the VMware cluster boundaries are reached? (Currently 32 nodes)
  • Out-of-band management ports? (could be useful to have console access)
  • How about campus cluster scenarios, any constraints?
  • …..

Let’s see if I can get these answered over the next couple of days or at VMworld.

VM with disks in multiple datastore clusters?

Duncan Epping · Aug 9, 2011 ·

This week I received a question about Storage DRS: is it possible to have a VM with multiple disks in different datastore clusters? It’s not uncommon to have setups like these, so I figured it would be smart to document it. The answer is yes, that is supported. You can create a virtual machine with a system disk on a RAID-5 backed datastore cluster and a data disk on a RAID-10 backed datastore cluster. If Storage DRS sees the need to migrate either of the disks to a different datastore, it will make the recommendation to do so.
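If you want to verify which datastore cluster each disk of a VM currently lives in, something along these lines should work with pyVmomi. A sketch only, not a recipe; the connection details and VM name are placeholders:

```python
# Sketch: list each virtual disk of a VM with its backing datastore and, when
# that datastore sits inside a datastore cluster (a vim.StoragePod), the
# cluster name. Placeholders: vcenter.example.com, admin/secret, "my-vm".
from pyVim.connect import SmartConnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com", user="admin", pwd="secret")
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.VirtualMachine], True)
vm = next(v for v in view.view if v.name == "my-vm")

for dev in vm.config.hardware.device:
    if isinstance(dev, vim.vm.device.VirtualDisk):
        ds = dev.backing.datastore
        # Datastores that belong to a datastore cluster have the pod as parent.
        pod = ds.parent if isinstance(ds.parent, vim.StoragePod) else None
        print(dev.deviceInfo.label, ds.name,
              pod.name if pod else "(standalone datastore)")
```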

vSphere 5 Coverage

Duncan Epping · Aug 6, 2011 ·

I just read Eric’s article about all the topics he covered around vSphere 5 over the last couple of weeks, and as I just published the last article I had prepared, I figured it would make sense to post something similar. (Great job by the way, Eric, I always enjoy reading your articles and watching your videos!) Although I hit roughly 10,000 unique views per day on average during the first week after the launch, and still see 7,000 a day currently, I have the feeling that many were focused on the licensing changes rather than all the new and exciting features that were coming up, but now that the dust has somewhat settled it makes sense to re-emphasize them. Over the last 6 months I have been working with vSphere 5 and exploring these features; my focus for most of those 6 months was completing the book, but of course I wrote a large number of articles along the way, many of which ended up in the book in some shape or form. This is the list of articles I published. If you feel there is anything I left out that should have been covered, let me know and I will try to dive into it. I can’t make any promises though, as with VMworld coming up my time is limited.

  1. Live Blog: Raising The Bar, Part V
  2. 5 is the magic number
  3. Hot off the press: vSphere 5.0 Clustering Technical Deepdive
  4. vSphere 5.0: Storage DRS introduction
  5. vSphere 5.0: What has changed for VMFS?
  6. vSphere 5.0: Storage vMotion and the Mirror Driver
  7. Punch Zeros
  8. Storage DRS interoperability
  9. vSphere 5.0: UNMAP (VAAI feature)
  10. vSphere 5.0: ESXCLI
  11. ESXi 5: Suppressing the local/remote shell warning
  12. Testing VM Monitoring with vSphere 5.0
  13. What’s new?
  14. vSphere 5.0: vMotion Enhancements
  15. vSphere 5.0: vMotion enhancement, tiny but very welcome!
  16. ESXi 5.0 and Scripted Installs
  17. vSphere 5.0: Storage initiatives
  18. Scale Up/Out and impact of vRAM?!? (part 2)
  19. HA Architecture Series – FDM (1/5)
  20. HA Architecture Series – Primary nodes? (2/5)
  21. HA Architecture Series – Datastore Heartbeating (3/5)
  22. HA Architecture Series – Restarting VMs (4/5)
  23. HA Architecture Series – Advanced Settings (5/5)
  24. VMFS-5 LUN Sizing
  25. vSphere 5.0 HA: Changes in admission control
  26. vSphere 5 – Metro vMotion
  27. SDRS and Auto-Tiering solutions – The Injector

Once again, if there is something you feel I should be covering, let me know and I’ll try to dig into it. Preferably something that none of the other blogs have published, of course.

SDRS and Auto-Tiering solutions – The Injector

Duncan Epping · Aug 5, 2011 ·

A couple of weeks ago I wrote an article about Storage DRS (hereafter SDRS) interoperability, and I mentioned that using SDRS with auto-tiering solutions should work… Now the truth is slightly different, and as I noticed some people started throwing huge exclamation marks around SDRS, I wanted to make a statement. Many have discussed this and made comments around why SDRS would not be supported with auto-tiering solutions, and I noticed the common idea is that SDRS would not be supported with them as it could initiate a migration to a different datastore and as such “reset” the tiered VM back to default. Although this is correct, there is a different reason why VMware recommends following the guidelines provided by the storage vendor. The guideline, by the way, is to use space balancing but not to enable the I/O metric. Those who were part of the beta, or have read the documentation or our book, might recall this: when creating datastore clusters, select datastores which have similar performance characteristics. In other words, do not mix an SSD backed datastore with a SATA backed datastore; mixing SATA with SAS, however, is okay. Before we explain why, let’s repeat the basics around SDRS:

SDRS allows the aggregation of multiple datastores into a single object called a datastore cluster. SDRS will make recommendations to balance virtual machines or disks based on I/O and space utilization, and during virtual machine or virtual disk provisioning it will make recommendations for placement. SDRS can be set to fully automated or manual mode. In manual mode SDRS will only make recommendations; in fully automated mode these recommendations will be applied by SDRS as well. When balancing recommendations are applied, Storage vMotion is used to move the virtual machine.
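As a side note, the vendor guideline mentioned above (space balancing on, I/O metric off) can also be applied programmatically. Below is a rough pyVmomi sketch of how that might look; “DSC-01” is a placeholder name and I have not validated this against every vCenter version, so treat it as a starting point rather than a recipe:

```python
# Rough sketch (my illustration, not an official VMware recipe): enable SDRS
# on a datastore cluster with space balancing only, leaving the I/O metric
# disabled. "DSC-01" is a placeholder; connection details omitted as before.
from pyVim.connect import SmartConnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com", user="admin", pwd="secret")
content = si.RetrieveContent()

view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.StoragePod], True)
pod = next(p for p in view.view if p.name == "DSC-01")

spec = vim.storageDrs.ConfigSpec(
    podConfigSpec=vim.storageDrs.PodConfigSpec(
        enabled=True,                 # turn SDRS on for the pod
        ioLoadBalanceEnabled=False,   # space balancing only, no I/O metric
        defaultVmBehavior="automated",
    )
)
task = content.storageResourceManager.ConfigureStorageDrsForPod_Task(
    pod=pod, spec=spec, modify=True)
```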

So what about auto-tiering solutions? Auto-tiering solutions move “blocks” around based on hotspots. Yes, again, when Storage vMotion migrates the virtual machine or virtual disk this process is reset. In other words, the full disk will land on the same tier and the array will need to decide at some point what belongs where… but is this an issue? In my opinion it probably isn’t, but it will depend on why SDRS decides to move the virtual machine, as it might lead to a temporary decrease in performance for specific chunks of data within the VM. As auto-tiering solutions help prevent performance issues by moving blocks around, you might not want to have SDRS making performance recommendations, but why… what is the technical reason for this?

As stated, SDRS uses I/O and space utilization for balancing… Space makes sense I guess, but what about I/O… what does SDRS use, how does it know where to place a virtual machine or disk? Many people seem to be under the impression that SDRS simply uses average latency, but would that work in a greenfield deployment where no virtual machines are deployed yet? It wouldn’t, and it would also not say much about the performance capabilities of the datastore. No, in order to ensure the correct datastore is selected, SDRS needs to know what the datastore is capable of; it will need to characterize the datastore, and to do so it uses Storage I/O Control (hereafter SIOC), more specifically what we call “the injector”. The injector is part of SIOC and is a mechanism used to characterize each of the datastores by injecting random (read) I/O. Before you get worried: the injector only injects I/O when the datastore is idle, and even when the injector is busy, if it notices other activity on the datastore it will back down and retry later. Now, in order to characterize the datastore, the injector uses different amounts of outstanding I/Os and measures the latency for these I/Os. For example, it starts with 1 outstanding I/O and gets a response within 3 milliseconds. When 3 outstanding I/Os are used, the average latency for these I/Os is 3.8 milliseconds. With 5 outstanding I/Os the average latency is 4.3 milliseconds, and so on and so forth. For each device the outcome can be plotted as shown in the screenshot below, and the slope of the graph indicates the performance capabilities of the datastore. The steeper the line, the lower the performance capabilities. The graph shows a test where a multitude of datastores are characterized, each being backed by a different number of spindles. As clearly shown, there is a relationship between the steepness and the number of spindles used.
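To make the slope idea concrete, here is a small back-of-the-envelope calculation: fit a straight line through the (outstanding I/O, latency) pairs and compare the slopes. The first series echoes the sample latencies from the text; the second is made up for contrast.

```python
# Fit a least-squares line to (outstanding I/Os, avg latency in ms) samples.
# A shallower slope means the datastore absorbs more concurrency before
# latency climbs, i.e. it behaves like it has more spindles behind it.
def slope(points):
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    return (sum((x - mx) * (y - my) for x, y in points)
            / sum((x - mx) ** 2 for x, _ in points))

many_spindles = [(1, 3.0), (3, 3.8), (5, 4.3), (7, 4.9)]   # values from the text
few_spindles  = [(1, 3.0), (3, 6.5), (5, 10.1), (7, 13.8)] # made up for contrast

print(f"many-spindle-like slope: {slope(many_spindles):.2f} ms per OIO")  # ~0.31
print(f" few-spindle-like slope: {slope(few_spindles):.2f} ms per OIO")  # ~1.80
```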

So why does SDRS care? Well, in order to ensure the correct recommendations are made, each of the datastores will be characterized; in other words, a datastore backed by 16 spindles will be a more logical choice than a datastore with 4 spindles. So what is the problem with auto-tiering solutions? Well, think about it for a second… when a datastore has many hotspots, an auto-tiering solution will move chunks around. Although this is great for the virtual machine, it also means that when the injector characterizes the datastore it could potentially read from the SSD backed chunks or the SATA backed chunks, and this will lead to unexpected results in terms of average latency. As you can imagine, this will be confusing to SDRS and could possibly lead to incorrect recommendations. Now, this is typically one of those scenarios which requires extensive testing, and hence the reason VMware refers to the storage vendor for their recommendation around using SDRS in combination with auto-tiering solutions. My opinion: use SDRS space balancing, as this will help prevent downtime related to “out of space” scenarios and also help speed up the provisioning process. On top of that, you get Datastore Maintenance Mode and Affinity Rules.
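A quick toy simulation of why this confuses the injector: if random reads land sometimes on SSD backed chunks and sometimes on SATA backed chunks, repeated characterization runs of the very same datastore come back with noticeably different averages. The tier latencies and the SSD fraction below are assumptions purely for illustration.

```python
# Toy model: a datastore with 30% of its blocks on SSD (~0.2 ms reads) and
# 70% on SATA (~9 ms reads). Each "characterization run" averages 50 random
# reads; the result wobbles from run to run, so any slope fitted on top of
# these numbers wobbles too. All figures are assumed example values.
import random

def measure(ssd_fraction: float, samples: int = 50) -> float:
    read = lambda: 0.2 if random.random() < ssd_fraction else 9.0  # ms
    return sum(read() for _ in range(samples)) / samples

random.seed(1)
for run in range(3):
    print(f"run {run}: avg latency {measure(ssd_fraction=0.3):.2f} ms")
```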

