
Yellow Bricks

by Duncan Epping



Hybrid vs All-flash VSAN, are we really getting close?

Duncan Epping · Mar 4, 2016 ·

For the last 12 months people have been saying that all-flash and hybrid configurations are getting really close in terms of pricing. During the many conversations I have had with customers it became clear that this is not always the case when they requested quotes from server vendors, and I wondered why. I figured I would go through the exercise myself to see how close we actually are, and close to what exactly. I want to end this discussion once and for all, and hopefully convince all of you to get rid of that spinning rust in your VSAN configurations, especially those of you who are now at the point of finalizing your design.

For my exercise I needed to make up some numbers, so I figured I would use an example from a customer to make it as realistic as possible. I want to point out that I am not looking at the price of the full infrastructure here, just comparing the “capacity tier”. So if you received a quote that is much higher, that makes sense, as it will have included CPU, memory, the caching tier and so on. Note that I used dollar prices and took no discount into account; discounts will be different for every customer and differ per region, and I don’t want to make this more complex than it needs to be. This applies to both the software licenses and the hardware.

What are we going to look at:

  • 10 host cluster
  • 80TB usable capacity required
  • Prices for SAS magnetic disks and an all-flash configuration

I must say that the majority of my customers use SAS, some use NL-SAS. With NL-SAS of course the price point is different, but those customers are typically also not overly concerned about performance, hence the SAS and all-flash comparison is more appropriate. I also want to point out that the prices below are “list prices”. Of course the various vendors will give a substantial discount, and this will be based on your relationship and negotiation skills.

  • SAS 1.2TB HDD = $ 579
  • SAS 960GB SSD = $ 1131

The above prices were taken from dell.com and the SSD is a read-intensive SSD. Now let’s take a look at an example, using FTT=1 with RAID-1 for hybrid and RAID-5 for all-flash. In order to get to 80TB usable we need to calculate the overhead and then divide the result by the size of the device to find out how many devices we need. Multiply that outcome by the cost of the device and you end up with the cost of the 80TB usable capacity tier.

80TB * 2 (FTT factor) = 160TB / 1.2TB per device = 133.33, rounded up to 134 devices needed
134 * 579 = $77,586
80TB * 1.33 (FTT factor) = 106.4TB / 0.96TB per device = 110.83, rounded up to 111 devices needed
111 * 1131 = $125,541
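
To make the arithmetic easy to replay with your own quotes, here is a minimal Python sketch of the same calculation. The FTT factors, device sizes and list prices are simply the example values from this post, not fixed values.

```python
import math

def capacity_tier_cost(usable_tb, ftt_factor, device_tb, device_price):
    """Return (devices needed, total list cost) for the capacity tier.

    usable_tb    -- usable capacity required (e.g. 80)
    ftt_factor   -- raw/usable multiplier: 2 for FTT=1 RAID-1, 1.33 for FTT=1 RAID-5
    device_tb    -- capacity per device in TB (e.g. 1.2 or 0.96)
    device_price -- list price per device in dollars
    """
    raw_tb = usable_tb * ftt_factor
    devices = math.ceil(raw_tb / device_tb)
    return devices, devices * device_price

# Hybrid: FTT=1, RAID-1, 1.2TB SAS HDD at $579
print(capacity_tier_cost(80, 2.0, 1.2, 579))    # (134, 77586)
# All-flash: FTT=1, RAID-5, 960GB SSD at $1131
print(capacity_tier_cost(80, 1.33, 0.96, 1131)) # (111, 125541)
```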

Now if you look at the price per solution, hybrid costs $77.5k while all-flash is $125.5k. A significant difference, and then there is also the $30k licensing delta (list price) you need to take into account, which means the cost for the all-flash capacity tier is actually $155.5k. However, we have not taken ANY form of deduplication and compression into account. Of course the results will differ per environment, but I feel 3-4x is a reasonable average across hundreds of VMs with 80TB of usable capacity in total. Let’s do the math with 3x just to be safe.

I can’t simply divide the cost by 3, as unfortunately dedupe and compression do not work on licensing cost, unless of course the lower number of devices would result in a lower number of hosts needed, but let’s not assume that that is the case. So I will divide the 111 devices required by 3, which means we need 37 devices in total:

37 * 1131 = $ 41847

As you can see, from a storage cost perspective we are now much lower than hybrid: $41k for all-flash versus $78k for hybrid. We haven’t factored in the license yet though, so if you add the $30k for licensing (the delta between Standard and Advanced * 20 CPUs) it means all-flash is $71k and hybrid $78k. That is a difference of $7,000 between these configurations, with all-flash being cheaper. And that is not the only difference; the biggest difference of course will be the user experience: a much higher number of IOPS, but more importantly also extremely low latency. Now, as stated, the above is an example with prices taken from Dell.com; if you do the same with SuperMicro then of course the results will be different. Prices will differ per partner, but Thinkmate for instance charges $379 for a 10K RPM 1.2TB SAS drive and $549 for a Micron DC510 with 960GB of capacity. Which means that the base price, just taking RAID-5 into consideration with no dedupe and compression benefit, will be really close.

Yes, the time is now: all-flash will overtake “performance” hybrid configurations for sure!

VSAN 6.2 : Why going forward FTT=2 should be your new default

Duncan Epping · Mar 1, 2016 ·

I’ve been talking to a lot of customers over the past 12-18 months, and if one thing stood out it is that about 98% of all our customers used Failures To Tolerate = 1. This means that 1 host or disk can die/disappear without losing data. Most of the customers, when talking to them about availability, indicated that they would prefer to use FTT=2, but the cost was simply too high.

With VSAN 6.2 all of this will change. Today, with a 100GB disk, FTT=1 results in 200GB of required disk capacity. With FTT=2 you will require 300GB of disk capacity for the same virtual machine, which is an extra 50% capacity compared to FTT=1. For most people that extra protection did not appear to weigh up against the cost. With RAID-5 and RAID-6 the math changes, and so does the cost of extra availability.

The 100GB disk we just mentioned with FTT=1 and the Failure Tolerance Method set to “RAID-5/6” (only available for all-flash) requires 133GB of capacity. That is already a saving of 67GB compared to “RAID-1”. But the savings are even bigger when going to FTT=2: now that 100GB disk requires 150GB of disk capacity. This is less than “FTT=1” with “RAID-1” today, and literally half of FTT=2 with FTM=RAID-1. On top of that, the delta between FTT=1 and FTT=2 is also tiny: for an additional 17GB of disk space you can now tolerate 2 failures. Let’s put that into a table, so it is a bit easier to digest.

| FTT | FTM      | Overhead | VM size | Capacity required |
|-----|----------|----------|---------|-------------------|
| 1   | RAID-1   | 2x       | 100GB   | 200GB             |
| 1   | RAID-5/6 | 1.33x    | 100GB   | 133GB             |
| 2   | RAID-1   | 3x       | 100GB   | 300GB             |
| 2   | RAID-5/6 | 1.5x     | 100GB   | 150GB             |
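
As a quick sanity check, the table above can be reproduced with a small sketch. The overhead multipliers are simply the ones listed in the table (and remember RAID-5/6 is all-flash only); this is an illustration, not VSAN’s own sizing logic.

```python
# Overhead multipliers per (FTT, FTM) combination, as listed in the table above.
OVERHEAD = {
    (1, "RAID-1"):   2.0,
    (1, "RAID-5/6"): 1.33,   # RAID-5, 3+1
    (2, "RAID-1"):   3.0,
    (2, "RAID-5/6"): 1.5,    # RAID-6, 4+2
}

def capacity_required(vm_size_gb, ftt, ftm):
    """Raw VSAN capacity consumed by a disk of vm_size_gb under the given policy."""
    return vm_size_gb * OVERHEAD[(ftt, ftm)]

print(capacity_required(100, 2, "RAID-5/6"))  # 150.0 GB
print(capacity_required(100, 2, "RAID-1"))    # 300.0 GB
```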

Of course you need to ask yourself if your workload requires it. Does it make sense for desktops? Well, for most desktops it probably doesn’t… But for your Exchange environment maybe it does, for your databases maybe it does, and for your file servers, print servers, even your web farm it can make a difference. That is why I feel that the commonly used “FTT” setting is slowly going to change, and will (should) be FTT=2 in combination with FTM set to “RAID-5/6”. Now let it be clear, there is a performance difference between FTT=2 with FTM=RAID-1 and FTT=2 with FTM=RAID-6 (the same applies for FTT=1), and of course there is a CPU resource cost as well. Make sure to benchmark what the “cost” is for your environment and make an educated decision based on that. I believe that in the majority of cases the extra availability will outweigh the cost / overhead, but this is still up to you to determine and decide. What is great about VSAN in my opinion is the fact that we offer you the flexibility to decide per workload what makes sense.

vSphere HA deepdive 6.x available for free!

Duncan Epping · Feb 18, 2016 ·

I’ve been discussing this with Frank over the last 12 months, and to be honest we are still not sure it is the right thing to do, but we decided to take this step anyway. Over the past couple of years we released various updates of the vSphere Clustering Deepdive. Updating the book was sometimes a massive pain (version 4 to 5 for instance), but some of the minor updates have been relatively straightforward, although still time consuming due to formatting / diagrams / screenshots etc.

Ever since, we’ve been looking for new ways to distribute our book, or publication as I will refer to it from now on. I’ve looked at various options and found one which I felt was the best of all worlds: Gitbook. Gitbook is a solution which allows you as an author to develop content in Markdown and distribute it in various formats: static HTML, PDF, ePub or Mobi. Basically any format you would want in this day and age. The great thing about the platform is that it integrates with GitHub, so you can share your source there and do things like version control. It works in such a way that I can use the Gitbook client on my Mac, while someone else who wants to contribute or submit a change can simply use their client of choice and submit the change through git. Although I don’t expect too many people to do this, it will make it easier for me to have material reviewed, for instance by one of the VMware engineers.

So what did I just make available for free? Well in short, an updated version (vSphere 6.0 Update 1) of the vSphere HA Deepdive. This includes the stretched clustering section of the book. Note that DRS and SDRS have not been included (yet). This may or may not happen in some shape or form in the future though. For now, I hope you will enjoy and appreciate the content that I made available for free. You can access it by clicking “HA Deepdive” on the left, or (in my opinion) for a better reading experience read it on Gitbook directly through this link: ha.yellow-bricks.com.

Note that there are links as well to download the content in different formats, for those who want to read it on their iPad / phone / whatever. Also note that Gitbook allows you to comment on a paragraph by clicking the “+” sign on the right side of the paragraph when you hover over it… Please submit feedback when you see mistakes! And for those who are really active, if you want to you could even contribute to the content! I will keep updating the content over the upcoming months probably with more info on VVols and for instance the Advanced Settings, so keep checking back regularly!

EMC and VMware introduce VxRail, a new hyper-converged appliance

Duncan Epping · Feb 16, 2016 ·

As most of you know, I’ve been involved in Virtual SAN in some shape or form since the very first release. The reason I was very excited about Virtual SAN is that I felt it would give anyone the ability to develop a hyper-converged offering. Many VMware partners have already done this, and with the VSAN Ready Node program growing and improving every day (more about this soon) customers have an endless list of options to choose from. Today EMC and VMware introduce a new hyper-converged appliance: VxRail.

[Image: VxRail]

I am not going to make this an extremely long post, as my friend Chad has already done that of course, and there is no point in repeating his blog word for word. I do feel however that VxRail truly is the best both EMC and VMware have to offer. The great thing about VxRail in my opinion is that it can be configured in any way you like: from 6 all the way up to 28 cores per CPU, from 64GB all the way up to 512GB of memory, and from 3.6TB all the way up to 19TB of storage. And yes, that is per “node”, not per appliance. Considering the roadmap, I can see those numbers increasing fast as well. Also note that we are talking both “hybrid” and “all-flash” models here. I have to agree with Chad, I think that all-flash will be preferable to hybrid. The tipping point in terms of economics has definitely been reached, especially when you take the various data services into account that VSAN has to offer.

These are the models which VCE will offer for All-Flash. Note that you can start with 3 nodes and scale up in 1 node increments.

What I think is great about VxRail (besides the fact that it comes with vSphere and VSAN) is that it comes with additional services, like RecoverPoint for VMs (15 VMs free per appliance), which by the way is completely integrated with the Web Client. (For those who don’t know, RecoverPoint provides synchronous and asynchronous replication.) S3-compliant object storage is also provided out of the box, with a 10TB license included for free per appliance. On top of that there is built-in integration with Data Domain.

Must be expensive, right? Well, actually it isn’t. The smallest configuration starts at $60k list price… A great price point, and I can’t wait for the first boxes to hit the street. Heck, I need to talk Chad into sending me one of those all-flash models for our lab at some point.

What’s new for Virtual SAN 6.2?

Duncan Epping · Feb 10, 2016 ·

Yes, finally… the Virtual SAN 6.2 release has just been announced. Needless to say, I am very excited about this release. This is the release that I have personally been waiting for. Why? Well, I think the list of new functionality will make that obvious. There are a couple of clear themes in this release, and I think it is fair to say that data services / data efficiency is the most important one. Let’s take a look at the list of what is new first and then discuss the items one by one.

  • Deduplication and Compression
  • RAID-5/6 (Erasure Coding)
  • Sparse Swap Files
  • Checksum / disk scrubbing
  • Quality of Service / Limits
  • In-memory read caching
  • Integrated Performance Metrics
  • Enhanced Health Service
  • Application support

That is indeed a good list of new functionality, just 6 months after the previous release that brought you Stretched Clustering, 2-node ROBO etc. I’ve already discussed some of these as part of the Beta announcements, but let’s go over them one by one so we have all the details in one place. By the way, there is also an official VMware paper available here.

Deduplication and Compression has probably been the number one ask from customers when it comes to feature requests for Virtual SAN since version 1.0. Deduplication and Compression is a feature which can be enabled on all-flash configurations only. Deduplication and Compression always go hand-in-hand and are enabled at the cluster level. Note that this is referred to as nearline dedupe / compression, which basically means that deduplication and compression happen during destaging from the caching tier to the capacity tier.

[Image: VSAN 6.2]

Now let’s dig a bit deeper. More specifically, the deduplication granularity is 4KB. Deduplication happens first and is then followed by an attempt to compress the unique block. The block will only be stored compressed when it can be compressed down to 2KB or smaller. The domain for deduplication is the disk group in each host. Of course the question then remains: what kind of space savings can be expected? It depends. Our environments and our testing have shown space savings between 2x and 7x, where 7x was for full-clone desktops (the optimal situation) and 2x for a SQL database. In other words, results will depend on your workload.
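
Conceptually the destage path looks roughly like the sketch below: dedupe on 4KB blocks within the disk group first, then try to compress each unique block and only keep the compressed form if it shrinks to 2KB or less. This is a simplified illustration of the decision logic described above, not the actual VSAN code; the hash and compressor used here (SHA-256, zlib) are stand-ins for illustration.

```python
import hashlib, zlib

BLOCK = 4096           # dedupe granularity: 4KB
COMPRESS_LIMIT = 2048  # keep the compressed form only if it is <= 2KB

def destage(block, dedupe_map):
    """Return what would land on the capacity tier for one 4KB block."""
    digest = hashlib.sha256(block).digest()
    if digest in dedupe_map:
        return "dedup hit (reference only)"
    dedupe_map[digest] = True
    compressed = zlib.compress(block)
    if len(compressed) <= COMPRESS_LIMIT:
        return f"unique, stored compressed ({len(compressed)} bytes)"
    return "unique, stored uncompressed (4096 bytes)"

dedupe_map = {}
zeros = bytes(BLOCK)                  # compresses very well
print(destage(zeros, dedupe_map))     # unique, stored compressed
print(destage(zeros, dedupe_map))     # dedup hit (reference only)
```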

Next on the list is RAID-5/6, or Erasure Coding as it is also referred to. In the UI this is configurable through the VM Storage Policies, by defining the “Fault Tolerance Method” (FTM). When you configure this you have two options: RAID-1 (Mirroring) and RAID-5/6 (Erasure Coding). Depending on how FTT (Failures To Tolerate) is configured, when RAID-5/6 is selected you will end up with a 3+1 (RAID-5) configuration for FTT=1 and 4+2 (RAID-6) for FTT=2.

[Image: VSAN RAID-6]

Note that “3+1” means you will have 3 data blocks and 1 parity block; in the case of 4+2 this means 4 data blocks and 2 parity blocks. Note again that this functionality is only available for all-flash configurations. There is a huge benefit to using it by the way:

Let’s take the example of a 100GB disk:

  • 100GB disk with FTT=1 & FTM=RAID-1 –> 200GB of disk space needed
  • 100GB disk with FTT=1 & FTM=RAID-5/6 –> 133.33GB of disk space needed
  • 100GB disk with FTT=2 & FTM=RAID-1 –> 300GB of disk space needed
  • 100GB disk with FTT=2 & FTM=RAID-5/6 –> 150GB of disk space needed

As demonstrated, the space savings are enormous; especially with FTT=2 the 2x savings can and will make a big difference. Having said that, do note that the minimum number of hosts required also changes: for RAID-5 this is 4 (remember, 3+1) and for RAID-6 it is 6 (remember, 4+2). The following two screenshots demonstrate how easy it is to configure it and what the layout of the data looks like in the web client.
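
A small sketch of how the stripe layout drives both the overhead and the minimum host count: the data/parity split (3+1 or 4+2) comes straight from the text above, and each component needs its own host, so this is just illustrative arithmetic rather than VSAN’s placement logic.

```python
# (data, parity) components per policy when FTM = RAID-5/6, per the text above.
LAYOUT = {1: (3, 1),   # FTT=1 -> RAID-5, 3+1
          2: (4, 2)}   # FTT=2 -> RAID-6, 4+2

def erasure_coding_facts(ftt, vm_size_gb=100):
    data, parity = LAYOUT[ftt]
    overhead = (data + parity) / data   # 1.33x or 1.5x
    min_hosts = data + parity           # one component per host
    return overhead, min_hosts, vm_size_gb * overhead

print(erasure_coding_facts(1))  # (1.333..., 4, 133.3...) -> RAID-5 needs 4 hosts
print(erasure_coding_facts(2))  # (1.5, 6, 150.0)         -> RAID-6 needs 6 hosts
```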


Sparse Swap Files is a new feature that can only be enabled by setting an advanced setting. It is one of those features that is a direct result of a customer feature request for cost optimization. As most of you hopefully know, when you create a VM with 4GB of memory, a 4GB swap file is created on a datastore at the same time. This is to ensure memory pages can be assigned to that VM even when you are overcommitting and there is no physical memory available. With VSAN this swap file is created “thick”, at 100% of the memory size. In other words, a 4GB swap file will take up 4GB which can’t be used by any other object/component on the VSAN datastore. When you have a handful of VMs there is nothing to worry about, but if you have thousands of VMs then this adds up quickly. By setting the advanced host setting “SwapThickProvisionedDisabled” the swap file will be provisioned thin, and disk space will only be claimed when the swap file is consumed. Needless to say, we only recommend using this when you are not overcommitting on memory. Having no space for swap while needing to write to swap wouldn’t make your workloads happy.
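
To show why this adds up at scale, here is a back-of-the-envelope sketch; the VM count and memory size are made-up numbers purely for illustration, and it assumes no memory reservations are configured.

```python
def swap_reservation_gb(vm_count, mem_per_vm_gb, sparse=False):
    """Datastore capacity claimed up front for swap files.

    With the default thick swap files the full memory size is claimed at creation;
    with sparse swap (the "SwapThickProvisionedDisabled" setting mentioned above)
    nothing is claimed until the host actually swaps.
    """
    return 0 if sparse else vm_count * mem_per_vm_gb

print(swap_reservation_gb(1000, 4))               # 4000 GB claimed up front
print(swap_reservation_gb(1000, 4, sparse=True))  # 0 GB until swapping occurs
```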

Next up is the Checksum / disk scrubbing functionality. As of VSAN 6.2, for every 4KB block written a checksum is calculated and stored separately from the data (5 bytes). Note that this happens even before the write reaches the caching tier, so even an SSD corruption would not impact data integrity. On a read the checksum is of course validated, and if there is a checksum error it will be corrected automatically. Also, in order to ensure that stale data does not decay over time in any shape or form, there is a disk scrubbing process which reads the blocks and corrects them when needed. Intel crc32c is leveraged to optimize the checksum process. Note that it is enabled by default for ALL virtual machines as of this release, but if desired it can be disabled through policy for VMs which do not require this functionality.
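
The read path can be pictured roughly as below. This is a simplified illustration only, and it uses Python’s zlib.crc32 as a stand-in for the crc32c instruction that VSAN actually leverages.

```python
import zlib

BLOCK = 4096  # checksums are kept per 4KB block

def write_block(data):
    """Store a 4KB block together with its checksum (VSAN keeps these separately)."""
    return data, zlib.crc32(data)

def read_block(stored):
    """Verify the checksum on read; a mismatch would trigger automatic repair."""
    data, checksum = stored
    if zlib.crc32(data) != checksum:
        raise IOError("checksum mismatch - block would be repaired from a replica")
    return data

stored = write_block(bytes(BLOCK))
assert read_block(stored) == bytes(BLOCK)
```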

Another big ask, primarily from service providers, was Quality of Service functionality. There are many aspects of QoS, but one of the major asks was definitely the capability to limit VMs or virtual disks to a certain number of IOPS through policy, simply to prevent a single VM from consuming all available resources of a host. One thing to note is that VSAN normalizes I/Os to a 32KB block size. This means that when you set a limit of 1000 IOPS and push 64KB writes, the effective limit is 500 IOPS. When you are doing 4KB writes (or reads for that matter), however, each I/O still counts as one 32KB normalized I/O. Keep this in mind when setting the limit.
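
In other words, the limit is enforced against 32KB-normalized I/Os. The sketch below shows how a configured limit translates into effective IOPS for a given I/O size; it is an illustration of the normalization described above, not the actual scheduler.

```python
import math

NORMALIZED_IO_KB = 32  # I/Os are counted in 32KB units

def effective_iops(limit, io_size_kb):
    """Effective IOPS you can push at a given I/O size under a normalized limit."""
    units_per_io = max(1, math.ceil(io_size_kb / NORMALIZED_IO_KB))
    return limit // units_per_io

print(effective_iops(1000, 4))   # 1000 - small I/Os still count as one 32KB unit
print(effective_iops(1000, 64))  # 500  - a 64KB I/O counts as two units
```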

When it comes to caching there is also a nice “little” enhancement. As of 6.2, VSAN also has a small in-memory read cache. Small in this case means 0.4% of a host’s memory capacity, up to a maximum of 1GB. Note that this in-memory cache is a client-side cache, meaning that the blocks of a VM are cached on the host where the VM is located.
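
A quick illustration of that sizing rule (the host memory sizes here are just example numbers):

```python
def client_cache_gb(host_memory_gb):
    """In-memory read cache per host: 0.4% of host memory, capped at 1GB."""
    return min(host_memory_gb * 0.004, 1.0)

print(client_cache_gb(128))  # 0.512 GB
print(client_cache_gb(512))  # 1.0 GB (capped)
```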

Besides all these great performance and efficiency enhancements, a lot of work has of course also been done around the operational aspects. As of VSAN 6.2 you as an admin no longer need to dive into VSAN Observer; you can just open up the Web Client to see all the performance statistics you want to see about VSAN. It provides a great level of detail, ranging from how a cluster is behaving down to the individual disk. What I personally find very interesting about this performance monitoring solution is that all the data is stored on VSAN itself. When you enable the performance service you simply select a VSAN storage policy and you are set. All data is stored on VSAN and all the calculations are done by your hosts. Yes indeed, a distributed and decentralized performance monitoring solution, where the Web Client just shows the data it is provided.

Of course all new functionality, where applicable, has health check tests. This is one of those things that I got used to so fast and already take for granted. The Health Check will make your life as an admin so much easier; not just the regular tests, but also the pro-active tests which you can run whenever you desire.

Last but not least I want to call out the work that has been done around application support, I think especially the support for core SAP applications is something that stands out!

If you ask me, but of course I am heavily biased, this release is the best release so far and contains all the functionality many of you have been asking for. I hope that you are as excited about it as I am, and will consider VSAN for new projects or when current storage is about to be replaced.

