
Yellow Bricks

by Duncan Epping


Storage

How to calculate what your Virtual SAN datastore size should be

Duncan Epping · Jan 8, 2014 ·

I have had this question so many times that I figured I would write an article about it: how do you calculate what your Virtual SAN datastore size should be? Ultimately this determines which kind of server hardware you can use, which disk controller you need and which disks… so it is important that you get it right. I know the VMware Technical Marketing team is developing collateral around this topic; when that has been published I will add a link here. Let's start with a quote by Christian Dickmann, one of our engineers, as it is the foundation of this article:

In Virtual SAN your whole cluster acts as a hot-spare

Personally I like to work top-down, meaning that I start with an average per virtual machine or a total combined number. Let's go through an example, as that makes it a bit easier to digest.

Let's assume the average VM disk size is 50GB, the VMs have 4GB of memory provisioned on average, and we have 100 virtual machines in total that we want to run on a 4-host cluster. Based on that info the formula would look something like this:

(total number of VMs * average VM size) + (total number of VMs * average VM memory size) = total capacity required

In our case that would be:

(100 * 50GB) + (100 * 4GB) = 5400 GB

So that is it? Well not really; like every storage / file system there is some overhead, and we will need to take "failures to tolerate" into account. If I set my "failures to tolerate" to 1 then I would have 2 copies of my VMs, which means I need 5400 GB * 2 = 10800 GB. Personally I also add an additional 10% in disk capacity to ensure we have room for things like metadata, log files, vmx files and some small snapshots when required. Note that VSAN by default provisions all VMDKs as thin objects (swap files are thick, as Cormac explained here), so there should be room available regardless. Better safe than sorry though. This means that 10800 GB actually becomes 11880 GB, which I prefer to round up to 12TB. The formula I have been using thus looks as follows:

(((Number of VMs * Avg VM size) + (Number of VMs * Avg mem size)) * (FTT + 1)) + 10%
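
To make that concrete, here is a minimal sketch of the calculation in Python, using the numbers from the example above. The function name and structure are just mine for illustration; this is not the calculator mentioned further down.

  def vsan_capacity_gb(num_vms, avg_disk_gb, avg_mem_gb, ftt=1, overhead=0.10):
      # Follows the formula above:
      # (((VMs * avg disk) + (VMs * avg mem)) * (FTT + 1)) + 10% slack
      base = (num_vms * avg_disk_gb) + (num_vms * avg_mem_gb)  # 5400 GB in the example
      mirrored = base * (ftt + 1)                              # 10800 GB with FTT=1
      return mirrored * (1 + overhead)                         # 11880 GB, round up to ~12TB

  print(vsan_capacity_gb(100, 50, 4))  # 11880.0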

Now the next step is to see how you divide that across your hosts. I mentioned we would have 4 hosts in our cluster. We have two options: we create a cluster that can re-protect itself after a full host failure, or we create a cluster that cannot. Just to clarify, in order to have 1 host of spare capacity available we will need to divide the total capacity by 3 instead of 4. Let's look at those two options and what the impact is (a small sketch of this calculation follows the list):

  • 12TB / 3 hosts = 4TB per host (for each of the 4 hosts)
    • Allows you to re-protect (sync/mirror) all virtual machine objects even when you lose a full host
    • All virtual machines will maintain availability levels when doing maintenance
    • Requires an additional 1TB per host!
  • 12TB / 4 hosts = 3TB per host (for each of the 4 hosts)
    • If all disk space is consumed, when a host fails virtual machines cannot be “re-protected” as there would be no capacity to sync/mirror the objects again
    • When entering maintenance mode data availability cannot be maintained as there would be no room to sync/mirror the objects to another disk
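
Here is a small Python sketch of that split, using the same 12TB example. The spare_host flag is just my shorthand for reserving one host worth of capacity; it is not an official VSAN setting.

  def per_host_capacity_tb(total_tb, num_hosts, spare_host=True):
      # Reserving one host worth of capacity lets the cluster re-protect
      # (sync/mirror) all objects after a full host failure.
      divisor = num_hosts - 1 if spare_host else num_hosts
      return total_tb / divisor

  print(per_host_capacity_tb(12, 4, spare_host=True))   # 4.0 TB per host
  print(per_host_capacity_tb(12, 4, spare_host=False))  # 3.0 TB per host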

Now if you look at the numbers, we are talking about an additional 1TB per host. With 4 hosts, and let's assume we are using 2.5″ SAS 900GB Hitachi drives, that would be 4 additional drives at a cost of around $1000 per drive; when using 3.5″ SATA drives the cost would be even lower. Although this is just a number I found on the internet, it does illustrate that the cost of providing additional availability can be small. Prices will differ depending on the server brand used, but even at double the cost I would go for the additional drive and as such the additional "hot spare capacity".

To make life a bit easier I created a calculator. I hope this helps everyone who is looking at configuring hosts for their Virtual SAN based infrastructure.

Startup News Flash part 11

Duncan Epping · Dec 17, 2013 ·

Last Startup News Flash of the year, part 11… It is relatively short this time; I am guessing everyone is wrapping up before the holiday season really starts. I know I am!

I want to congratulate Nimble on their very successful IPO. They introduced their stock at a price of $21.00 per share and are now at $35.00 after just a couple of days of trading. For whatever reason I haven't written about Nimble in depth yet, but personally I've been impressed by what they offer. If you look at the cost of their solution and hold it against the quality and features they offer, I am sure you will be impressed as well; definitely one of those companies I would be talking to when looking to invest in a new storage system! Once again, congrats to all involved on the successful IPO.

Infinio just announced a new round of funding. $12 million for a Series B is not bad if you ask me. Investors include Bessemer Venture Partners, Highland Capital Partners, Lightspeed Venture Partners, and Osage University Partners (a partner of Columbia University, home of Infinio's roots). After having recently announced GA of their 1.0 product, I guess it is full speed ahead with this new injection. Congrats, and looking forward to the upcoming releases.

That was it for this year with regards to startup news; hopefully I will be back next year with more Startup News!

Startup News Flash part 10

Duncan Epping · Nov 29, 2013 ·

There we are, part 10 of the Startup News Flash. Someone asked me on Twitter last week why Company XYZ was never included in the news flash. Let it be clear that I am not leaving anyone out (unless I feel they aren't relevant to this newsletter or my audience); I have limited time, so I typically do not do briefings… Which means that if the marketing team doesn't send me the details via email and I haven't somehow stumbled across the announcement, it will not appear here. If you want your company to be listed, make sure they send their press releases over.

Some new models were announced by Nutanix. Funny to see how they've been pushing hard from a marketing perspective to remove the "pure VDI play" label they had and now launch a VDI-focused model called the 7000 series. (Do not get me wrong, I think this is a brilliant move!) The 7000 series offers you the option to include NVIDIA K1 or K2 Grid cards, primarily intended to accelerate graphics. So if you are for instance doing a lot of 3D rendering or are just a heavy graphical VDI user, these could really provide a benefit over their (and other vendors') normal offerings. On top of that the 3000 and 6000 series have been overhauled: the NX-3061 and NX-3061 with 10-core (2.8GHz) Ivy Bridge processors have been introduced, as have the NX-6060 and NX-6080 with 10 cores (2.8 and 3.0GHz respectively). I haven't seen anything around pricing, so I can't comment on that.

To be honest, I have no clue what it is exactly these guys do, but I find their teaser video very intriguing. There is not much detail to be found around what they are doing other than "re-imagine enterprise computing". Hoping to hear more from them in the future, as their teaser did make me curious.

I don’t care much about benchmarks, but it is always nice to see a smaller (or underdog) company beat the big players. Kaminario managed to outperform Oracle, IBM and Fujitsu in the SPC-2 performance benchmark using their scale-out all-flash array K2 v4, just a couple of weeks after breaking the SPC-1 benchmark world record again. Like I said, I don’t care much about benchmarks, as they typically don’t say much about operational efficiency etc. Still, it is a nice indication of what can be achieved, though your results may vary depending on your IO pattern of course.

VSAN VDI Benchmarking and Beta refresh!

Duncan Epping · Nov 26, 2013 ·

I was reading this blog post on VSAN VDI benchmarking today on Vroom, the VMware Performance blog. You see a lot of people doing synthetic tests (max IOPS with sequential reads) on all sorts of storage devices, but lately more and more vendors are doing these "real world" performance tests. While reading this article about VDI benchmarking (and I suggest you check out all parts: part 1, part 2, part 3), one thing stood out to me: the comparison between VSAN and an all-flash array.

The following quotes show the strength of VSAN if you ask me:

we see that VSAN can consolidate 677 heavy users (VDImark) for 7-node and 767 heavy users for 8-node cluster. When compared to the all flash array, we don’t see more than 5% difference in the user consolidation.

Believe me when I say that 5% is not a lot. If you are actively looking at various solutions, I would highly recommend adding "overhead costs" to your criteria list, as depending on the solution chosen this could make a substantial difference; I have seen other solutions requiring a lot more resources. But what about response time, because that is where the typical all-flash array shines… ultra-low latency. How about VSAN?

Similar to the user consolidation, the response time of Group-A operations in VSAN is similar to what we saw with the all flash array.

Both very interesting results if you ask me. Especially the < 5% difference in user consolidation stood out to me the most! Once again, for more details on these tests read the VDI Benchmarking blog part 1, part 2, part 3!

Beta Refresh

For those who are testing VSAN, there is a beta refresh available as of today. This release has a fix for the AHCI driver issue… and it increases the HDD limit per disk group from 6 to 7. From a disk group perspective this will come in handy, as many servers have 8, 16 or 24 disk slots, allowing you to do 7 HDDs + 1 SSD per group. Also, some additional RVC commands have been added in the storage policy space; I am sure they will come in handy!

A nice side effect of the number of HDDs going up is an increase in maximum capacity:

(8 hosts * (5 diskgroups * 7 HDDs)) * Size of HDD = Total capacity

With 2 TB disks this would result in:

(8 * (5 * 7)) * 2TB = 560TB
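
As a quick sanity check, here is the same ceiling in Python. The function name is mine, and this is raw capacity only; it ignores the failures to tolerate and overhead factors from the sizing article above.

  def vsan_raw_capacity_tb(hosts, disk_groups_per_host, hdds_per_group, hdd_size_tb):
      # Raw capacity ceiling: hosts * disk groups * HDDs per group * HDD size
      return hosts * disk_groups_per_host * hdds_per_group * hdd_size_tb

  print(vsan_raw_capacity_tb(8, 5, 7, 2))  # 560 TB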

Now keep on testing with VSAN and don’t forget to report feedback through the community forums or your VMware rep.

Startup intro: Coho Data

Duncan Epping · Oct 15, 2013 ·

Today a new startup named Coho Data is revealed, formerly known as Convergent.io. Coho Data was founded by Andrew Warfield, Keir Fraser and Ramana Jonnala, probably best known for the work they did at Citrix on XenServer. For those who care, they are backed by Andreessen Horowitz. What is it they introduced / revealed this week?

Coho Data introduces a new scale-out hybrid storage solution (NFS for VM workloads), with hybrid meaning a mix of SATA and SSD. This is for obvious reasons: SATA brings you capacity and flash provides raw performance. Let me point out that Coho is not a hyperconverged solution; it is a full storage system.

What does it look like? It is a 2U box which holds 2 "MicroArrays", with each MicroArray having 2 processors, 2 x 10GbE NIC ports and 2 Intel 910 PCIe cards. Each 2U block provides you 39TB of capacity and ~180K IOPS (random 80/20 read/write, 4K block size), starting at $2.50 per GB, pre-dedupe & compression (which they of course offer). There are a couple of things I liked when looking at their architecture, first and probably foremost the scale-out architecture: scale to infinity, in a linear fashion, is what they say. On top of that, it comes with an OpenFlow-enabled 10GbE switch to allow for ease of management and, again, scalability.

If you look closely at how they architected their hardware, they created high-speed IO lanes: 10GbE NIC <–> CPU <–> PCIe flash unit. Each lane has its own dedicated CPU and NIC port, and on top of that the PCIe flash, allowing for optimal performance, efficiency and fine-grained control. Nice touch if you ask me.

Another thing I really liked was their UI. You can really see they put a lot of thought into the user experience by keeping things simple and presenting data in an easily understandable way. I wish every vendor did that. I mean, if you look at the screenshot below, how simple does that look? Dead simple, right!? I've seen some of the other screens, like for instance the one for creating a snapshot schedule… again the same simplicity. Apparently, and I have not tested this but I will take them at their word, they brought that simplicity all the way down to the "install / configure" part of things. Getting Coho Data up and running literally only takes 15 minutes.

What I also liked very much about the Coho Data solution is that Software-Defined Networking (SDN) and Software-Defined Storage (SDS) are tightly coupled. In other words, Coho configures the network for you… As just said, it takes 15 minutes to set up. Try creating the zoning / masking scheme for a storage system and a set of LUNs these days; even that takes more than 15 – 20 minutes. There aren't too many vendors combining SDN and SDS in a smart fashion today.

When they briefed me they gave me a short demo and Andy explained the scale-out architecture. During the demo I could, on various occasions, draw a parallel between the VMware virtualization platform and their solution, which made it easy for me to understand and relate to it. For instance, Coho Data offers what I would call DRS for Software-Defined Storage: if for whatever reason defined policies are violated, Coho Data will balance the workload appropriately across the cluster. Just like DRS (and Storage DRS) does, Coho Data will do a risk/benefit analysis before initiating the move. I guess the logical question would be: why would I want Coho to do this when VMware can also do this with Storage DRS? Well, keep in mind that Storage DRS works "across datastores", but as Coho presents a single datastore you need something that allows you to balance within it.

I guess the question then remains: what do they lack today? Well, today as a 1.0 platform Coho doesn't offer replication outside of their own cluster. But considering they have snapshotting in place, I suspect their architecture already caters for it, and it is something they should be able to release fairly quickly. Another thing which is lacking today is a vSphere Web Client plugin, but then again, if you look at their current UI and the simplicity of it, I do wonder if there is any point in having one.

All in all, I have been impressed by these newcomers in the SDS space and I can’t wait to play around with their gear at some point!

