
Yellow Bricks

by Duncan Epping


flash

All-Flash Stretched vSAN Cluster and VM/Host Rules

Duncan Epping · Aug 20, 2018 ·

I had a question last week about the need for DRS rules, also known as VM/Host rules, in an all-flash stretched vSAN infrastructure. In a vSAN stretched cluster, read locality is implemented. In this case, read locality ensures that reads always come from the local fault domain.

This means that in a stretched environment, reads do not have to traverse the inter-site network. As the maximum supported inter-site latency is 5ms, this avoids a potential performance hit of up to 5ms for reads. An additional benefit is that in a hybrid configuration we avoid having to re-warm the read cache when a VM moves to the other site. Because of this (read) cache re-warming issue, we recommend customers implement VM/Host rules. These rules ensure that, in a normal healthy situation, VMs always run on the same set of hosts. (This is explained on storagehub here.)

What about an all-flash cluster, do you still need to implement these rules? The answer to that is: it depends. You don't need them for the read-cache issue, as in an all-flash cluster there is no read cache. Could you run without those rules? Yes, you can, but if you have DRS enabled it will freely move VMs around, potentially every 5 minutes. That also means vMotion traffic consuming the inter-site links, and considering how resource hungry vMotion can be, you need to ask yourself what cross-site load balancing adds, what the risk is, and what the reward is. Personally, I would prefer to load balance within a site and only go across the link when doing site maintenance, but you may have a different view or set of requirements. If so, it is good to know that vSAN and vSphere support this.
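If you do decide to pin VMs to the hosts of their local site, the rules can of course be created through the vSphere Client, but they can also be scripted. Below is a minimal pyVmomi sketch that creates a host group, a VM group, and a non-mandatory "should run on hosts in group" rule. Treat it as an illustration rather than an official procedure: the group and rule names are placeholders, and resolving the cluster, host, and VM objects is left to your own tooling.

  # Sketch: create a VM/Host "should" rule with pyVmomi (names are placeholders).
  from pyVmomi import vim

  def create_site_affinity_rule(cluster, site_hosts, site_vms):
      """Pin a set of VMs to the hosts of one site with a non-mandatory rule.

      cluster    -- vim.ClusterComputeResource object
      site_hosts -- list of vim.HostSystem objects in the local fault domain
      site_vms   -- list of vim.VirtualMachine objects that should stay there
      """
      host_group = vim.cluster.HostGroup(name="site-a-hosts", host=site_hosts)
      vm_group = vim.cluster.VmGroup(name="site-a-vms", vm=site_vms)

      rule = vim.cluster.VmHostRuleInfo(
          name="site-a-vms-should-run-on-site-a-hosts",
          enabled=True,
          mandatory=False,  # "should" rule: HA can still restart VMs in the other site
          vmGroupName="site-a-vms",
          affineHostGroupName="site-a-hosts",
      )

      spec = vim.cluster.ConfigSpecEx(
          groupSpec=[
              vim.cluster.GroupSpec(info=host_group, operation="add"),
              vim.cluster.GroupSpec(info=vm_group, operation="add"),
          ],
          rulesSpec=[vim.cluster.RuleSpec(info=rule, operation="add")],
      )
      # Apply the change to the cluster; wait for the returned task as needed.
      return cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)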

Hybrid vs All-flash VSAN, are we really getting close?

Duncan Epping · Mar 4, 2016 ·

For the last 12 months people have been saying that all-flash and hybrid configurations are getting really close in terms of pricing. During the many conversations I have had with customers it became clear that this was not always the case when they requested quotes from server vendors, and I wondered why. I figured I would go through the exercise myself to see how close we actually are, and close to what exactly. I want to end this discussion once and for all, and hopefully convince all of you to get rid of that spinning rust in your VSAN configurations, especially those of you who are now at the point of making design decisions.

For my exercise I needed some numbers, so I used an example from a customer to make it as realistic as possible. I want to point out that I am not looking at the price of the full infrastructure here, just comparing the "capacity tier", so if you received a quote that is much higher, that makes sense, as it will have included CPU, memory, the caching tier, and so on. Note that I used dollar prices and took no discount into account; discounts will be different for every customer and differ per region, and I don't want to make it more complex than it needs to be. This applies to both the software licenses and the hardware.

What are we going to look at:

  • 10 host cluster
  • 80TB usable capacity required
  • Prices for SAS magnetic disks and an all-flash configuration

I must say that the majority of my customers use SAS; some use NL-SAS. With NL-SAS the price point is of course different, but those customers are typically also not overly concerned about performance, hence the SAS versus all-flash comparison is more appropriate. I also want to point out that the prices below are list prices. Of course the various vendors will give a substantial discount, and this will be based on your relationship and negotiation skills.

  • SAS 1.2TB HDD = $ 579
  • SAS 960GB SSD = $ 1131

The above prices were taken from dell.com, and the SSD is a read-intensive SSD. Now let's take a look at an example, with FTT=1 and RAID-1 for hybrid and RAID-5 for all-flash. In order to get to 80TB usable we need to calculate the overhead and then divide the result by the size of the device to find out how many devices we need. Multiply that outcome by the cost of the device and you end up with the cost of the 80TB usable capacity tier.

80TB * 2 (FTT=1, RAID-1 overhead) = 160TB / 1.2TB per device = 133.33 devices needed
134 * $579 = $77,586
80TB * 1.33 (FTT=1, RAID-5 overhead) = 106.4TB / 0.96TB per device = 110.83 devices needed
111 * $1,131 = $125,541

Now if you look at the price per solution, the hybrid capacity tier costs roughly $77.5k while all-flash is $125.5k. A significant difference, and then there is also the roughly $30k licensing delta (list price) you need to take into account, which means the cost for the all-flash capacity is actually $155.5k. However, we have not taken ANY form of deduplication and compression into account. Of course the results will differ per environment, but I feel 3-4x is a reasonable average across hundreds of VMs with 80TB of usable capacity in total. Let's do the math with 3x just to be safe.

I can't simply divide the cost by 3, as unfortunately dedupe and compression do not reduce licensing cost, unless of course the lower number of devices would result in a lower number of hosts needed, but let's not assume that is the case. So I will divide the 111 devices required by 3, which means we need 37 devices in total:

37 * $1,131 = $41,847

As you can see, from a storage cost perspective we are now much lower than hybrid: $41k for all-flash versus $78k for hybrid. We haven't factored in the license yet though, so if you add the $30k for licensing (the delta between Standard and Advanced * 20 CPUs), all-flash comes in at $71k and hybrid at $78k. That is a difference of $7,000 between these configurations, with all-flash being the cheaper one. And that is not the only difference: the biggest difference of course will be the user experience, a much higher number of IOPS and, more importantly, extremely low latency. Now, as stated, the above is an example with prices taken from Dell.com; if you do the same with SuperMicro the results will of course be different. Prices will differ per partner, but Thinkmate for instance charges $379 for a 10K RPM 1.2TB SAS drive and $549 for a Micron DC510 with 960GB of capacity. Which means that the base price, just taking RAID-5 into consideration and no dedupe and compression benefit, will already be really close.
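To make it easy to rerun this comparison with your own quotes, here is a small Python sketch of the same back-of-the-napkin math. The prices, overhead factors, license delta, and the 3x data reduction ratio are the assumptions used in this post; replace them with your own numbers.

  # Back-of-the-napkin capacity-tier comparison, using the numbers from this post.
  import math

  USABLE_TB = 80.0
  HYBRID   = {"overhead": 2.0,  "device_tb": 1.2,  "price": 579}   # RAID-1, FTT=1, SAS 1.2TB HDD
  ALLFLASH = {"overhead": 1.33, "device_tb": 0.96, "price": 1131}  # RAID-5, FTT=1, SAS 960GB SSD
  LICENSE_DELTA = 30000   # Standard vs Advanced, 20 CPUs (list price)
  DATA_REDUCTION = 3.0    # assumed dedupe/compression ratio for all-flash

  def capacity_cost(cfg, reduction=1.0):
      raw_tb = USABLE_TB * cfg["overhead"] / reduction
      devices = math.ceil(raw_tb / cfg["device_tb"])
      return devices, devices * cfg["price"]

  hybrid_devices, hybrid_cost = capacity_cost(HYBRID)
  af_devices, af_cost = capacity_cost(ALLFLASH)
  af_devices_dr, af_cost_dr = capacity_cost(ALLFLASH, DATA_REDUCTION)

  print(f"Hybrid:            {hybrid_devices} devices, ${hybrid_cost:,}")
  print(f"All-flash (no DR): {af_devices} devices, ${af_cost + LICENSE_DELTA:,} incl. license delta")
  print(f"All-flash (3x DR): {af_devices_dr} devices, ${af_cost_dr + LICENSE_DELTA:,} incl. license delta")

Running this reproduces the numbers above: 134 HDDs at $77,586 for hybrid, and 37 SSDs at $41,847 plus the $30k license delta for all-flash with 3x data reduction.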

Yes, the time is now, all-flash will overtake "performance" hybrid configurations for sure!

All-flash VSAN configuration example

Duncan Epping · Mar 31, 2015 ·

I was talking to a customer this week who was looking to deploy various 4-node VSAN configurations. They needed a solution that would provide them performance and wanted to minimize the number of moving components due to the location and environmental aspects of the deployment; all-flash VSAN is definitely a great choice for this scenario. I looked at various server vendors and, based on their requirements (and budget), put together a configuration which I think is rather nice and comes in at slightly less than $45K.

What I found interesting is the price of the SSDs, especially for the "capacity tier", as the price is very close to that of 10K RPM SAS. I selected the Intel S3500 as the capacity tier because it was one of the cheapest devices listed on the VMware VSAN HCL; it will be good to track GB/$ for the new entries that will be coming to the HCL soon, but so far the S3500 seems to be the sweet spot. From a price point perspective the 800GB devices also seem the most cost effective at the moment. The S3500 performs well, as demonstrated in this paper by VMware on VSAN scaling / performance.

This is what the bill of materials looked like, and I can’t wait to see it deployed:

  • Supermicro SuperServer 2028TP-HC0TR – 2U TwinPro2
  • Each node comes with:
    • 2 x Eight-Core Intel Xeon Processor E5-2630 v3 2.40GHz 20MB Cache (85W)
    • 256 GB in 8 DIMMs at 2133 MHz (32GB DIMMs)
    • 2 x 10GbE NIC port
    • 1 x 4
    • Dual 10-Gigabit Ethernet
    • LSI 3008 12G SAS

That is a total of 16TB of flash-based storage capacity, 1TB of memory, and 64 cores in a mere 2U. The above price is based on a simple online configurator and does not include any licenses. A very compelling solution if you ask me.
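Just to show where those totals come from, a quick sanity check in Python; the five 800GB capacity devices per node are my assumption, derived from 16TB of S3500 capacity spread across four nodes, while the rest follows directly from the bill of materials.

  # Quick sanity check of the 2U TwinPro2 totals (per-node device count is my
  # assumption: 16TB of 800GB S3500s across 4 nodes works out to 5 per node).
  nodes = 4
  cores = nodes * 2 * 8                 # 2 x eight-core E5-2630 v3 per node
  memory_tb = nodes * 256 / 1024        # 256GB per node
  capacity_tb = nodes * 5 * 0.8         # 5 x 800GB Intel S3500 per node (assumed)
  print(cores, "cores,", memory_tb, "TB memory,", capacity_tb, "TB flash capacity")
  # -> 64 cores, 1.0 TB memory, 16.0 TB flash capacity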

All Flash VSAN – One versus multiple disk groups

Duncan Epping · Mar 11, 2015 ·

A while ago I wrote this article on the topic of "one versus multiple disk groups". The summary was that you can start with a single disk group, but that from a failure domain perspective having multiple disk groups is definitely preferred. From a performance standpoint there could be a benefit as well.

So the question now is: what about all-flash VSAN? First of all, the same rules apply: 5 disk groups max, and each disk group has 1 SSD for caching and up to 7 devices for capacity. There is something extra to consider though. It isn't something I was aware of until I read the excellent Design and Sizing Guide by Cormac. It states the following:

In version 6.0 of Virtual SAN, if the flash device used for the caching layer in all-flash configurations is less than 600GB, then 100% of the flash device is used for cache. However, if the flash cache device is larger than 600GB, then only 600GB of the device is used for caching. This is on a per-disk group basis.

Now for the majority of environments this won't really be an issue, as they typically don't hit the above limit, but it is good to know when doing your design/sizing exercise. The recommendation of a 10% cache-to-capacity ratio still stands, and this is based on used capacity before FTT. If you have a requirement for a total of 100TB raw, then with FTT=1 that is roughly 50TB of usable capacity. When it comes to flash this means you will need at most 5TB of cache. That is 5TB of flash in total: with 10 hosts that would be 500GB per host, which is below the limit. But with 5 hosts that would be 1TB per host, which is above the 600GB mark and would seemingly result in 400GB per host being unused?
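As a quick illustration, here is a small Python sketch of that sizing exercise. The 10% ratio, the 600GB per-disk-group limit, and the capacity numbers are the ones from this post; the host and disk-group counts are just example inputs.

  # Cache sizing example for all-flash VSAN 6.0 (numbers from this post).
  CACHE_RATIO = 0.10       # 10% of used capacity before FTT
  CACHE_LIMIT_GB = 600     # max cache used per disk group in 6.0

  raw_capacity_tb = 100
  usable_tb = raw_capacity_tb / 2              # FTT=1 with RAID-1 mirroring
  total_cache_gb = usable_tb * 1000 * CACHE_RATIO

  for hosts, disk_groups_per_host in [(10, 1), (5, 1), (5, 2)]:
      per_dg_gb = total_cache_gb / (hosts * disk_groups_per_host)
      used_gb = min(per_dg_gb, CACHE_LIMIT_GB)
      print(f"{hosts} hosts x {disk_groups_per_host} disk group(s): "
            f"{per_dg_gb:.0f}GB cache per disk group, {used_gb:.0f}GB usable as write buffer")

With 10 hosts the 500GB per disk group fits under the limit, with 5 hosts and a single disk group only 600GB of the 1TB device is used, and with 5 hosts and two disk groups you are back under the limit again.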

Well, not entirely. Although the write buffer has a max size of 600GB, the remainder of the capacity is used by the SSD for endurance, so it will cycle through those cells… and endurance is also mainly what you are sizing for when it comes to the write buffer; that 10% is mainly about endurance. So you have a couple of options: you can create multiple disk groups to use the full write buffer capability if you feel you need that, or you can trust VSAN and the flash "internals" to do what they need to do… I have customers doing both, and I have never heard a customer "complain" about the all-flash write-buffer limit… 600GB is a lot to fill up.

Two logical PCIe flash devices for VSAN

Duncan Epping · Jan 5, 2015 ·

A couple of days ago I was asked whether I would recommend using two logical PCIe flash devices carved out of a single physical PCIe flash device. The reason for the question was the recommendation from VMware to have two Virtual SAN disk groups instead of (just) one disk group.

First of all, I want to make it clear that this is a recommended practice but definitely not a requirement. The reason people have started recommending it is "failure domains". As some of you may know, when a flash device that is used for read caching / write buffering and fronts a given set of disks becomes unavailable, all the disks in the disk group associated with that flash device become unavailable as well. As such, a disk group can be considered a failure domain, and when it comes to availability it is typically best to spread risk, so having multiple failure domains is desirable.

When it comes to PCIe devices, would it make sense to carve up a single physical device into multiple logical ones? From a failure point of view I personally think it doesn't add much value: if the device fails, it is likely that both logical devices fail. From an availability point of view there isn't much that two logical devices add; however, it could be beneficial to have multiple logical devices if you have more than 7 disks per server.

As most of you will know, each host can have at most 7 capacity disks per disk group and 5 disk groups per server. If there is a requirement for the server to have more than 7 disks, then there will be a need for multiple flash devices, and in that scenario creating multiple logical devices would work, although from a failure tolerance perspective I would still prefer multiple physical devices over multiple logical ones. But I guess it all depends on what type of devices you use, whether you have sufficient PCIe slots available, and so on. In the end the decision is up to you, but do make sure you understand the impact of your decision.
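To make that 7-disk boundary concrete, here is a tiny Python sketch of the minimum number of disk groups (and thus caching flash devices, whether logical or physical) that a given number of capacity disks forces you into; the limits are the 7-disks-per-disk-group and 5-disk-groups-per-host maximums mentioned above, and the disk counts are just example inputs.

  # Minimum disk groups (= caching flash devices) needed for a given disk count.
  import math

  MAX_DISKS_PER_DG = 7
  MAX_DG_PER_HOST = 5

  for capacity_disks in (5, 7, 8, 14, 20, 35):
      disk_groups = math.ceil(capacity_disks / MAX_DISKS_PER_DG)
      assert disk_groups <= MAX_DG_PER_HOST, "more disks than a single host supports"
      print(f"{capacity_disks} capacity disks -> at least {disk_groups} disk group(s)")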

