
Yellow Bricks

by Duncan Epping


Virtual SAN

Essential Virtual SAN – Second Edition – Rough Cuts now on Safari!

Duncan Epping · Mar 9, 2016 ·

Cormac Hogan and I have been working hard over the past couple of months to update the Virtual SAN book. It was a lot of work with all the changes that were introduced to Virtual SAN since the first version, but we managed to get it done. We just finalized the last chapter, and it looks like the Rough Cuts have just been published through Safari. If you have a Safari account you can find the book here.

The book covers the basics, but also describes specific subjects in depth, like the stretched clustering introduced in 6.1 and the data services and performance monitoring introduced in 6.2. About to go on a VSAN journey? Well, this is where you should start!

The foreword was written by Christos Karamanolis (VMware Fellow and CTO), and we had the pleasure of working with Christian Dickmann (VSAN Dev Architect) and John Nicholson (VSAN Tech Marketing) as technical reviewers. Thanks guys for keeping us honest.

Hopefully the official version will be out soon as well, through Amazon and other book stores. We have asked for a “digital first” approach, which means that the Mobi/ePub version should be out first, followed by a paper version. When it is, I will definitely let you guys know.

Before I forget, thanks Cormac… Always a pleasure working with you on projects like these. Go VSAN!

Hybrid vs All-flash VSAN, are we really getting close?

Duncan Epping · Mar 4, 2016 ·

For the last 12 months people have been saying that all-flash and hybrid configurations are getting really close in terms of pricing. During the many conversations I have had with customers it became clear that this is not always the case when they requested quotes from server vendors, and I wondered why. I figured I would go through the exercise myself to see how close we actually are, and close to what exactly. I want to end this discussion once and for all, and hopefully convince all of you to get rid of that spinning rust in your VSAN configurations, especially those of you who are at the point of making design decisions right now.

For my exercise I needed to make up some numbers, so I figured I would use an example from a customer to make it as realistic as possible. I want to point out that I am not looking at the price of the full infrastructure here, just comparing the “capacity tier”, so if you received a quote that is much higher, that makes sense, as it will have included CPU, memory, the caching tier, and so on. Note that I used dollar prices and took no discount into account; discounts will be different for every customer and differ per region, and I don’t want to make it more complex than it needs to be. This applies to both the software licenses and the hardware.

Here is what we are going to look at:

  • 10 host cluster
  • 80TB usable capacity required
  • Prices for SAS magnetic disks and an all-flash configuration

I must say that the majority of my customers use SAS; some use NL-SAS. With NL-SAS the price point is of course different, but those customers are typically also not overly concerned about performance, hence the SAS versus all-flash comparison is more appropriate. I also want to point out that the prices below are list prices. Of course the various vendors will give a substantial discount, and this will be based on your relationship and negotiation skills.

  • SAS 1.2TB HDD = $ 579
  • SAS 960GB SSD = $ 1131

The above prices were taken from dell.com, and the SSD is a read-intensive SSD. Now let’s take a look at an example, with FTT=1 and RAID-1 for hybrid and RAID-5 for all-flash. In order to get to 80TB usable we need to calculate the overhead and then divide it by the size of the device to find out how many devices we need. Multiply that outcome by the cost of the device and you end up with the cost of the 80TB usable capacity tier.

80TB * 2 (FTT factor) = 160TB / 1.2TB per device = 133.33, so 134 devices needed
134 * $579 = $77,586
80TB * 1.33 (FTT factor) = 106.4TB / 0.96TB per device = 110.8, so 111 devices needed
111 * $1131 = $125,541
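
For those who want to plug in their own numbers, here is a minimal Python sketch of the sizing math above; the function name, the overhead factors and the no-discount list prices are just the assumptions from this example, not official figures.

import math

usable_tb = 80.0  # required usable capacity for the 10 host cluster

def capacity_tier_cost(ftt_factor, device_tb, device_price):
    raw_tb = usable_tb * ftt_factor          # usable capacity plus protection overhead
    devices = math.ceil(raw_tb / device_tb)  # round up to whole devices
    return devices, devices * device_price

print(capacity_tier_cost(2.0, 1.2, 579))      # hybrid, RAID-1 FTT=1: (134, 77586)
print(capacity_tier_cost(1.33, 0.960, 1131))  # all-flash, RAID-5 FTT=1: (111, 125541)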

Now if you look at the price per solution, hybrid costs roughly $77.5k while all-flash is roughly $125.5k. That is a significant difference, and then there is also the ~$30k licensing delta (list price) you need to take into account, which means the cost for the all-flash capacity is actually around $155.5k. However, we have not taken ANY form of deduplication and compression into account. Of course the results will differ per environment, but I feel 3-4x is a reasonable average across hundreds of VMs with 80TB of usable capacity in total. Let’s do the math with 3x just to be safe.

I can’t simply divide the cost by 3, as unfortunately dedupe and compression do not work on licensing cost, unless of course the lower number of devices would result in a lower number of hosts needed, but let’s not assume that is the case. In this case I will divide the 111 devices required by 3, which means we need 37 devices in total:

37 * 1131 = $ 41847

As you can see, from a storage cost perspective we are now much lower than hybrid: roughly $42k for all-flash versus $77.5k for hybrid. We haven’t factored in the license yet though, so if you add the ~$30k for licensing (the delta between standard and advanced, times 20 CPUs), all-flash lands at roughly $72k and hybrid at $77.5k, a difference of roughly $6k in favor of all-flash. And that is not the only difference: the biggest difference of course will be the user experience, with a much higher number of IOPS and, more importantly, extremely low latency. Now, as stated, the above is an example with prices taken from Dell.com; if you do the same with SuperMicro the results will of course be different. Prices will differ per partner, but Thinkmate for instance charges $379 for a 10K RPM 1.2TB SAS drive and $549 for a Micron DC510 with 960GB of capacity, which means that the base price, just taking RAID-5 into consideration and with no dedupe and compression benefit, will be really close.
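
Continuing the sketch under the same assumptions (a 3x data reduction ratio and a ~$30k license delta, both taken from the example above rather than measured), the end-to-end comparison looks like this:

import math

dedupe_ratio = 3.0                                 # assumed average data reduction
all_flash_devices = math.ceil(111 / dedupe_ratio)  # 37 devices
all_flash_storage = all_flash_devices * 1131       # $41,847
license_delta = 30000                              # assumed standard vs advanced delta, 20 CPUs
print(all_flash_storage + license_delta)           # 71847, versus 77586 for hybrid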

Yes, the time is now: all-flash will overtake “performance” hybrid configurations for sure!

VSAN 6.2 : Why going forward FTT=2 should be your new default

Duncan Epping · Mar 1, 2016 ·

I’ve been talking to a lot of customers over the past 12-18 months, and if one thing stood out it is that about 98% of all our customers used Failures To Tolerate = 1. This means that 1 host or disk can die/disappear without losing data. Most customers, when talking to them about availability, indicated that they would prefer to use FTT=2, but the cost was simply too high.

With VSAN 6.2 all of this will change. Today, with a 100GB disk, FTT=1 results in 200GB of required disk capacity. With FTT=2 you will require 300GB of disk capacity for the same virtual machine, which is an extra 50% capacity compared to FTT=1. For most people the risk did not appear to weigh up against that cost. With RAID-5 and RAID-6 the math changes, and so does the cost of extra availability.

The 100GB disk we just mentioned, with FTT=1 and the Failure Tolerance Method set to “RAID-5/6” (only available for all-flash), requires 133GB of capacity. Already that is a saving of 67GB compared to “RAID-1”. But the savings are even bigger when going to FTT=2: now that 100GB disk requires 150GB of disk capacity. That is less than FTT=1 with RAID-1 today, and literally half of FTT=2 with FTM=RAID-1. On top of that, the delta between FTT=1 and FTT=2 is also tiny; for an additional 17GB of disk space you can now tolerate 2 failures. Let’s put that into a table, so it is a bit easier to digest.

FTT   FTM        Overhead   VM size   Capacity required
1     RAID-1     2x         100GB     200GB
1     RAID-5/6   1.33x      100GB     133GB
2     RAID-1     3x         100GB     300GB
2     RAID-5/6   1.5x       100GB     150GB
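
If you want to run the numbers for your own VM sizes, here is a minimal Python sketch of the table above; the overhead factors are the same ones used in the table, and the layout is just illustrative.

# Capacity required per policy for a given disk size.
overhead = {
    ("FTT=1", "RAID-1"):   2.0,
    ("FTT=1", "RAID-5/6"): 1.33,
    ("FTT=2", "RAID-1"):   3.0,
    ("FTT=2", "RAID-5/6"): 1.5,
}
vm_size_gb = 100
for (ftt, ftm), factor in overhead.items():
    print(f"{ftt} + {ftm}: {vm_size_gb * factor:.0f}GB required")
# prints 200GB, 133GB, 300GB and 150GB respectively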

Of course you need to ask yourself if your workload requires it. Does it make sense with desktops? Well, for most desktops it probably doesn’t… But for your Exchange environment maybe it does, for your databases maybe it does, and for your file servers, print servers, even your web farm it can make a difference. That is why I feel that the commonly used FTT setting is going to change slowly, and will (should) become FTT=2 in combination with FTM set to “RAID-5/6”. Now let it be clear, there is a performance difference between FTT=2 with FTM=RAID-1 and FTT=2 with FTM=RAID-6 (the same applies for FTT=1), and of course there is a CPU resource cost as well. Make sure to benchmark what the “cost” is for your environment and make an educated decision based on that. I believe that in the majority of cases the extra availability will outweigh the cost/overhead, but this is up to you to determine and decide. What is great about VSAN, in my opinion, is that we offer you the flexibility to decide per workload what makes sense.

VSAN 6.2 : Sparse Swap, what is it good for?

Duncan Epping · Feb 29, 2016 ·

I already briefly touched on Sparse Swap in my VSAN 6.2 launch article. When talking about space efficiency and VSAN 6.2, most people will immediately bring up RAID-5/6 and/or deduplication and compression. Those are of course the big-ticket items for VSAN 6.2, there is no doubt. Sparse Swap, however, is one of those tiny little enhancements to VSAN that can make a big difference in terms of cost and space efficiency. And although I already briefly discussed it, I would like to go over it again and show you an example of why it makes a difference and when it can make a big difference.

First of all, a bit of history. Up to VSAN 6.1 all “swap files” were created with a 100% space reservation. This means that when you deploy a VM with 4GB of memory and no memory reservation defined, a swap file is created and 4GB of disk space is reserved for it. Now keep in mind that in order to ensure availability that swap file is not a single 4GB object, but actually 2 x 4GB. You can imagine that with a single VM the cost of that swap file is negligible. But with 100 VMs per host and 1600 in a cluster, that single 4GB swap file per VM now results in:

1600 VMs * 4GB * 2 (FTT overhead) = 12,800GB of capacity reserved

Note that even with RAID-5 or RAID-6 the FTT overhead would still be 2x, because swap is a special object and VM Storage Policies do not apply to it. Also note that no other VM or object can reserve or use the space which is reserved for those swap files. When Sparse Swap is enabled (an advanced host setting), no capacity will be reserved for those VM swap files. This means that instead of losing 12,800GB of capacity, you now don’t lose anything.
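
To put a number on it for your own environment, here is a quick Python sketch of the same back-of-the-envelope math; the VM count and memory size are just the assumptions from the example above.

vms = 1600
swap_gb_per_vm = 4     # VM memory size minus memory reservation
copies = 2             # swap objects are always kept as 2 copies, regardless of policy
reserved_gb = vms * swap_gb_per_vm * copies
print(reserved_gb)     # 12800 GB that Sparse Swap gives back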

When should you use this? Well, first and foremost, when you don’t overcommit on memory! If you are planning to overcommit on memory then please do not use this functionality, as you will need the swap file when there are no memory pages available. I hope that this is clear: only use it when you are not overcommitting on memory. Linked-clone desktops are one of those use cases where swap files are a significant portion of the total required datastore capacity, and leveraging Sparse Swap will allow you to reduce the cost, especially when running all-flash. So now that we know why, how do you enable it? Well, that is really simple:

esxcfg-advcfg -s 1 /VSAN/SwapThickProvisionDisabled

I hope this article makes it clear that this small enhancement can go a long way! Oh, and before I forget: this small but useful enhancement was the result of a feature request a customer filed about 3-4 months ago. Just think about that for a second, that is agility/flexibility right there, and yes, our customers come first.

VMUGs next week in South Africa, sign up!

Duncan Epping · Feb 26, 2016 ·

Last year when I visited South Africa and presented at 3 events, someone on Twitter said: if I had known, I would have shown up. I have already tweeted it a couple of times, but just in case people missed it, I figured I would also do a short post on the topic. I am presenting at 3 VMUGs in South Africa next week, together with Scott Lowe and Joe Baguley. If you live near any of these three cities, make sure to sign up… It is a great way to meet like-minded people, extend your network, and hear all about Software Defined Everything from Joe, the future of NSX from Scott, and all about VSAN 6.2 from myself. (Note, Joe will unfortunately not be able to present in Durban.)

  • Monday, February 29, 2016 – Johannesburg
  • Wednesday, March 2, 2016 – Durban
  • Friday, March 4, 2016 – Cape Town

Hope to see you guys next week, and euuh… Yes, I would love a Stellenbrau 😉

