Today I was answering some questions on the VMTN forums and one of the questions was around the quality of components in some of the all flash / hybrid arrays. This person kept coming back to the type of flash used (eMLC vs MLC, SATA vs NL-SAS vs SAS). One of the comments he made was the following:
I talked to Pure Storage but they want $$$ for 11TB of consumer grade MLC.
I am guessing he did a quick search on the internet, found a price for some SSDs, multiplied it, and figured that Pure Storage was asking way too much… And even compared to some more traditional arrays filled with SSDs they could sound more expensive. I guess this also applies to other solutions, so I am not calling out Pure Storage here. One thing some people seem to forget when it comes to these new storage architectures is that they are built with flash in mind.
What does that mean? Well, everyone has heard the horror stories around consumer grade flash wearing out extremely fast and blowing up in your face. Fortunately that is only true to a certain extent, as some consumer grade SSDs easily reach 1PB of writes these days. On top of that there are a couple of things I think you should know and consider before making statements like these or being influenced by a sales team who says “well we offer SLC versus MLC so we are better than them”.
For instance (as Pure Storage lists on their website), there are many more MLC drives shipped than any other type at this point. That means MLC has been tested inside out by consumers, and who can break devices in many more ways than you or your QA team can? Right, the consumer! More importantly, if you ask me, ALL of these new storage architectures have in-depth knowledge of the type of flash they are using. That is how their system was architected! They know how to leverage flash, they know how to write to flash, they know how to avoid fast wear out. They developed an architecture which was not only designed but also highly optimized for flash… This is what you pay for. You pay for the “total package”, which means the whole solution, not just the flash devices that are leveraged. The flash devices are a part of the solution, and just a relatively small part if you ask me. You pay for total capacity with low latency and functionality like deduplication, compression and replication (in some cases). You pay for the ease of deployment and management (operational efficiency), meaning you get to spend your time on stuff that matters to your customer… their applications.
You can summarize all of it in a single sentence: the physical components used in all of these solutions are just a small part of the solution, so whenever someone tries to sell you the “hardware”, that is when you need to be worried!
Nisah Cheatham says
Nice blog post.
That summary sentence… words to live by.
Howard Marks says
The problem here is the term “consumer grade MLC” which the guy complaining about the cost assumes means the same thing as “consumer grade SSD”. There are several disconnects here:
1 – The assumption that eMLC is “Enterprise Grade”, overall better quality flash. eMLC, or as Intel calls it HET MLC, is a compromise technology that uses a lower erase voltage over a LONGER time to make the erase less violent and therefore make the cells live through more erase cycles. The longer erase times mean more variable latency on writes as garbage collection is slower.
2 – So if we use MLC (the stuff used in consumer SSDs as well as enterprise SSDs and incorrectly called “Consumer Grade”) we can simply over-provision further to achieve the same endurance that eMLC would have given. A typical consumer SSD with 240GB of addressable space actually has 256GB of flash. An enterprise SSD with 256GB of flash will usually be sold as a 200GB SSD. More flash to work with extends endurance and makes garbage collection easier, making latency more consistent.
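To put numbers on the over-provisioning point, here is a minimal Python sketch that uses only the capacities quoted above. The function name is made up for illustration, and expressing the spare flash as a percentage of addressable capacity is an assumption about how one would want to report it.

```python
def over_provisioning_pct(raw_gb: float, addressable_gb: float) -> float:
    """Spare flash as a percentage of the addressable capacity.

    Hypothetical helper for illustration; the convention of dividing by
    addressable (not raw) capacity is an assumption.
    """
    return (raw_gb - addressable_gb) / addressable_gb * 100


# Capacities taken from the comment above: both drives carry 256GB of flash.
consumer_op = over_provisioning_pct(raw_gb=256, addressable_gb=240)
enterprise_op = over_provisioning_pct(raw_gb=256, addressable_gb=200)

print(f"consumer SSD:   {consumer_op:.1f}% spare flash")    # 6.7%
print(f"enterprise SSD: {enterprise_op:.1f}% spare flash")  # 28.0%
```

Same amount of flash, roughly four times the spare area on the drive sold as 200GB.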
Today most solutions using SLC at all are forced to by legacy architectures that have more write amplification (that is 1 write from the host becomes multiple writes to the flash) and can’t do system wide wear leveling. Smart software and a little extra flash can provide plenty of performance and endurance.
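A rough Python sketch of why write amplification matters for endurance. Every number below (the P/E cycle rating and both write amplification factors) is a hypothetical placeholder chosen for illustration, not a figure from any vendor, and the model assumes perfect wear leveling across all cells.

```python
def lifetime_host_writes_tb(raw_flash_gb: float,
                            pe_cycles: int,
                            write_amplification: float) -> float:
    """Host data (in TB) that can be written before the flash is worn out.

    Simplified model, assuming perfect wear leveling:
    total flash writes = raw capacity * P/E cycles, and each host write
    costs `write_amplification` flash writes.
    """
    total_flash_writes_gb = raw_flash_gb * pe_cycles
    return total_flash_writes_gb / write_amplification / 1000


# Hypothetical values, for illustration only.
PE_CYCLES = 3000  # assumed MLC program/erase rating
legacy = lifetime_host_writes_tb(256, PE_CYCLES, write_amplification=4.0)
flash_aware = lifetime_host_writes_tb(256, PE_CYCLES, write_amplification=1.5)

print(f"legacy architecture:      {legacy:.0f} TB of host writes")       # 192
print(f"flash-aware architecture: {flash_aware:.0f} TB of host writes")  # 512
```

Under these assumed inputs the same 256GB of MLC absorbs more than twice as many host writes when the software keeps write amplification low, which is the point about smart software substituting for more expensive flash.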
Vaughn Stewart (@vStewed) says
Howard,
In regards to your comment, “The longer erase times mean more variable latency on writes as garbage collection is slower.”
The true difference between eMLC & MLC is where over provisioning (OP) is provided. An all-flash array (AFA) configured with eMLC leverages SSD level OP (e.g. an XtremIO 400GB SSD has 624GB of raw capacity, or 56% OP). AFAs configured with MLC SSDs leverage system level OP (e.g. Pure Storage FlashArray reserves 20%).
Every modern AFA, regardless of vendor, incorporates one of these two forms of OP to provide additional capacity that acts as a buffer for a number of SSD operations including garbage collection (GC). While it is possible to overwrite data at a rate that is greater than that at which the AFA can execute GC, this condition is rarely (if ever) observed outside of synthetic tests / benchmark exercises that disable software functionality in an attempt to find HW limits.
The rate at which an eMLC or MLC device can erase is impacted by other variables such as the rate at which the data has been deleted. For datasets with deduplication, GC is only involved after every reference to a block is invalidated (deleted). For datasets with compression the rate at which GC is performed is accelerated, as the data being written to the new cells requires less bandwidth.
In summary architecture matters. There’s much more to IO processing in an AFA than simply the hardware. SW capabilities must be considered when discussing how HW components work within a system such as an AFA.
Viva la software-defined!
Cheers,
v
John Nicholson. says
Thinking back to Pure, one of the cooler things I remember from Tech Field Day was that they used DRAM cache not for performance but for wear leveling/write coalescing.
They use MLC flash with the controller/firmware from the Samsung enterprise line, so it's not really the same thing as just grabbing some Samsung 840s and throwing them in production (and they are doing some smart stuff to optimize writes as well).
Similarly VSAN does some smart things with writes to avoid burning out flash, and optimizes data placement to reduce load on the disks (making sure to align with boundaries, keep common LBAs together etc). This does make
Henry Scoles says
The important thing to remember is that nearly every vendor is relying on the exact same commodity hardware to build their storage array, whether it is “hardware” or software defined. They each add a few things or have a proprietary ASIC (3PAR), but it all comes down to architecture. System architecture is incredibly important, but often mostly ignored. You are paying for an architecture and support of that architecture. If you had time, money, and a team of engineers you could build this commodity stuff into something; however, it wouldn’t be cheaper when you were done because you have no economies of scale, unless you are Google, eBay or Facebook. Which is exactly what they have done, because it makes sense for them and they have the smart people, the time, the money and the economies of scale.
Every vendor is selling the exact same stuff. Dig into an array and you’ll see that 95% of all storage arrays are the exact same components with 5% being “different” and the rest being the nameplate and pretty plastic.
A well architected storage system is like the Golden Gate Bridge. It can carry dump trucks and semis (high bandwidth) and it can also carry Fiats and smart cars (small IO), and it doesn’t fall down if we mix them or place a line of only one type or another on it.
Vaughn Stewart (@vStewed) says
=== Disclaimer: Pure Storage employee ===
Measuring raw capacity makes no sense, whether one is considering hybrid disk (aka disk) or all flash arrays. There are overheads including RAID, metadata, checksums, etc., which reduce usable capacity, and data reduction technologies like dedupe, compression, pattern removal and xcopy that amplify usable capacity.
So when one comments on 11TB of raw SSD capacity they are actually referring to a configuration that, with VMware virtual infrastructures, provides 55TB to 99TB of usable capacity.
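For reference, the net multiplier implied by those figures can be worked out in a few lines of Python. The comment does not say how much of it comes from overheads versus data reduction, so this sketch only derives the combined ratio; the function name is hypothetical.

```python
def net_capacity_multiplier(raw_tb: float, usable_tb: float) -> float:
    """Usable capacity per TB of raw flash, after overheads and data reduction.

    Hypothetical helper: it lumps RAID/metadata overhead and
    dedupe/compression gains into one net ratio.
    """
    return usable_tb / raw_tb


RAW_TB = 11  # raw SSD capacity quoted in the original question
low = net_capacity_multiplier(RAW_TB, usable_tb=55)
high = net_capacity_multiplier(RAW_TB, usable_tb=99)

print(f"net multiplier: {low:.0f}x to {high:.0f}x")  # 5x to 9x
```

In other words, comparing a per-TB price on raw flash against this kind of array overstates the effective cost per usable TB by that same factor, if the quoted range holds for the workload in question.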
It’s important to understand HW limits and boundaries but we live in a world where infrastructure resources are virtualized.
Cheers,
v