I saw a tweet pass by from PernixData, and although I already knew the world of datacenter/storage design was changing, it only then really sank in. Over the last 5 years we have seen the world of storage change significantly. We have seen new types of storage being introduced, like all-flash storage, hybrid storage (a mix of SSD and SATA) and hyper-converged solutions. Examples of these would be Violin Memory (all-flash), Tintri (hybrid) and Nutanix (hyper-converged). More recently object-based storage solutions are trending. As Stephen Foskett states in his article on scaling storage, object storage is nothing new, but it seems to be more relevant in this new day and age.
I would expect Frank Denneman to dive into the whole architecture aspect as part of his “Basic elements of a flash virtualization platform” series, so I am not going into a huge amount of depth, but I did want to coin this term / strategy / direction. Host-based flash caching solutions like VMware vFlash (when released), PernixData, FlashSoft and others will allow you to decouple performance from capacity. Host-based flash caching truly should be treated as a new tier of storage, an extension of your storage system! This is something which will take time to realize… as it is natural to see a host-based flash caching solution as an extension of your hypervisor. I have been struggling with this myself for a while, to be honest. When you realize that host-based flash caching is a new storage tier, you will also wonder what should sit behind that new storage tier. In an existing environment it is clear what the next tier is, but in a green-field deployment, which components should be part of a hybrid storage stack?
Just to clarify, “hybrid” in “hybrid storage stack” refers to the usage of flash for performance requirements and spindles for capacity, whereas “stack” refers to the fact that this solution is not contained within a single box, as opposed to a hybrid storage device. So the first component obviously would be host-based flash caching; this would enable you to meet your performance requirements. Now, I will aim to keep things simple, but there are various host-based data services, like replication, which could be included if needed. From a capacity perspective a storage system would be needed, something that can easily scale out and is easy to manage. Object-based storage solutions are trending for a reason, and I think they could be a good fit. No need for me to explain why when Stephen has already done that in his excellent article, so let's just quote the relevant portion:
This is exactly the architecture that the latest storage arrays are adopting: Object storage inside, with loosely-coupled nodes offering truly dynamic scaling. Although many allow native API access, most of these products also include an integrated object-to-file gateway, with VMware-friendly NFS or Windows-oriented SMB as the front-end protocol. These aren’t the ideal protocols for scaly-storage access, but at least they’re compatible with existing applications.
By finally divorcing data storage from legacy RAID, these systems offer compelling advantages. Many include integrated tiering, with big, slow disks and flash storage acting in concert.
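To make the “stack” idea a bit more tangible, here is a minimal Python sketch of a host-side cache tier sitting in front of a separate capacity tier. Everything in it is hypothetical and for illustration only: the class names (CapacityTier, HostFlashCache), the write-through policy and the LRU eviction are my assumptions, not how vFlash, PernixData, FlashSoft or any object store actually works. The point is simply that the cache size (performance) and the backend (capacity) are sized independently.

```python
from collections import OrderedDict


class CapacityTier:
    """Hypothetical stand-in for a scale-out capacity tier (e.g. an object store)."""

    def __init__(self):
        self._blocks = {}
        self.reads = 0  # counts how often the slow tier is actually hit

    def read(self, block_id):
        self.reads += 1
        return self._blocks[block_id]

    def write(self, block_id, data):
        self._blocks[block_id] = data


class HostFlashCache:
    """Hypothetical host-local cache tier: write-through, LRU eviction."""

    def __init__(self, backend, capacity_blocks):
        self._backend = backend
        self._capacity = capacity_blocks  # sized for performance, not capacity
        self._cache = OrderedDict()       # oldest entry first
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        if block_id in self._cache:
            # Served from host flash; mark as most recently used.
            self._cache.move_to_end(block_id)
            self.hits += 1
            return self._cache[block_id]
        # Miss: fetch from the capacity tier and keep a copy locally.
        self.misses += 1
        data = self._backend.read(block_id)
        self._insert(block_id, data)
        return data

    def write(self, block_id, data):
        # Write-through (an assumption): the capacity tier always has the data.
        self._backend.write(block_id, data)
        self._insert(block_id, data)

    def _insert(self, block_id, data):
        self._cache[block_id] = data
        self._cache.move_to_end(block_id)
        # Evict least recently used blocks once the cache is over its size.
        while len(self._cache) > self._capacity:
            self._cache.popitem(last=False)
```

In this sketch you would create a CapacityTier, wrap it in a HostFlashCache with a capacity_blocks value that matches your working set, and send all reads and writes through the cache. Growing capacity means growing the backend; growing performance means growing the cache. Neither forces the other.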
Now here comes the problem I see… These object storage solutions today are not designed to work in conjunction with host-local flash caching solutions. Not that I would expect the combination to cause issues from a technical perspective, but it might cause issues from a total cost of ownership perspective. What I am saying is that many of these systems are already “optimized” for both performance and capacity, so you could end up paying for performance twice. So what would be next? A smart object-based storage solution that integrates with host-local flash caching solutions and can easily scale out for a fair price? I haven’t seen too many (which doesn’t mean there aren’t any), so it seems there is an opportunity here.
Maybe a call-to-action for all those vendors working on host based flash caching solutions… It would be nice to see reference architectures for existing environments with legacy storage, but also for green-field deployments. What if I have a brand new datacenter, where does your platform fit? How do I control cost by decoupling performance and capacity? What are good options for capacity? How well do these solutions interact / integrate? I know, a lot of questions and not a lot of answers for now… hopefully that will change.