Last week I was briefed by Springpath, and yesterday they officially launched their company, although they have been around for quite a while. Springpath was founded by Mallik Mahalingam and Krishna Yadappanavar. For those who don’t know them: Mallik was responsible for VXLAN (see the IETF draft) and Krishna was one of the folks responsible for VMFS, together with Satyam, who started PernixData. I believe it was late 2012 or early 2013 when Mallik reached out to me wanting to validate some of his thinking around the software-defined storage space. I agreed to meet up, and we discussed the state of the market at the time and where some of the gaps were. Since May 2012 they have operated in stealth (under the name Storvisor) and landed a total of 34 million dollars from investors like Sequoia, NEA and Redpoint. Well-established VC names indeed, but what did they develop?
Springpath is what most folks would refer to as a Server SAN solution; some may also call it “hyper-converged”. I don’t label them hyper-converged, as Springpath doesn’t sell a hardware solution: they sell software and maintain a strict hardware compatibility list. The list of server vendors on the HCL seems to cover the majority of the big players out there; I was told Dell, HP, Cisco and SuperMicro are on the list and that others are being worked on as we speak. According to Springpath this approach offers customers a bit more flexibility, as they can choose their own preferred vendor, leverage the server vendor relationship they already have for discounts, and maintain similar operational processes.
Springpath’s primary focus in the first release is vSphere, which makes a lot of sense knowing the background of these guys, and comes in the shape of a virtual appliance. This virtual appliance is installed on top of the hypervisor and grabs local spindles and flash. With a minimum of three nodes you can then create a shared datastore which is served back to vSphere as an NFS mount. There are of course also plans to support Hyper-V, in which case the appliance will provide SMB capabilities, and for KVM it will use NFS. That is on the roadmap right now, but not too far out according to Mallik. (Note that support for Hyper-V, KVM etc. will each be released in a different version. KVM and Docker are in beta as we speak; if you are interested, go to their website and drop them an email!) There is even talk about supporting the Springpath solution running as a Docker container and providing shared storage for Docker itself. All these different platforms should be able to leverage the same shared data platform, according to Springpath; the diagram below shows this architecture.
They demonstrated the configuration/installation of their stack and I must say I was impressed with how simple it was. They showed a simple UI which allowed them to configure the IP details etc., but they also showed how they could simply drop in a JSON file with all the config details, which would then be used to deploy the storage environment. When fully configured, the whole environment can be managed from the Web Client; no need for a separate UI or anything like that. All integrated within the Web Client, and for Hyper-V and other platforms they have similar plans: no separate client, but everything manageable through the familiar interfaces those platforms already offer.
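To give a feel for what such a config-file-driven deployment could look like, here is a minimal sketch. Note that the keys, hostnames and validation rules below are purely illustrative assumptions on my part; Springpath did not show their actual schema in detail.

```python
import json

# Purely illustrative config: these keys and hostnames are my own guesses,
# not Springpath's actual JSON schema.
cluster_config = {
    "cluster_name": "demo-cluster",
    "datastore_name": "springpath-ds",
    "replication_factor": 2,  # number of extra copies per write (policy driven)
    "nodes": [
        {"host": "esxi-01.lab.local", "mgmt_ip": "10.0.0.11", "role": "cache+persist"},
        {"host": "esxi-02.lab.local", "mgmt_ip": "10.0.0.12", "role": "cache+persist"},
        {"host": "esxi-03.lab.local", "mgmt_ip": "10.0.0.13", "role": "cache+persist"},
    ],
}

def validate(config):
    """Basic sanity checks a deployment tool might run before rollout."""
    assert len(config["nodes"]) >= 3, "a minimum of three nodes is required"
    assert config["replication_factor"] in (1, 2), "policy allows one or two replicas"
    return json.dumps(config, indent=2)

print(validate(cluster_config))
```

The appeal of this approach is that the same file can be dropped in for every cluster you roll out, making deployments repeatable and scriptable.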
Of course there is more to it than just serving up an NFS mount. Springpath has a rather unique architecture and offers a rich set of data services. For instance, in the 1.0 release Springpath will offer inline deduplication and compression, but also native snapshots (at the VM, Resource Pool or Folder level) and fast clones (they did 50 clones in roughly a minute in the demo they showed us), and there is the modular caching and persistence layer. The diagram below illustrates the architecture.
What is unique about the architecture, in my opinion, is the modularisation of their stack. I asked if they would be capable of having nodes that do just caching, for instance, and Springpath confirmed this. Just imagine you have a couple of blade chassis and a couple of 2U hosts: in this scenario you can have flash in the blades and flash plus disks in the 2U hosts, and have the blades do caching while the 2U hosts do caching and persistence. Also, caching is done in both memory and flash, so there are two layers, which is what allows for always-on inline compression and dedupe without a performance hit.
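A two-layer cache like the one described could be modelled roughly as follows. This is my own toy sketch, assuming an LRU-style memory tier in front of a larger flash tier; it is not Springpath’s actual HALO implementation.

```python
from collections import OrderedDict

class TwoTierCache:
    """Toy model of a memory + flash read cache (illustrative only)."""

    def __init__(self, mem_slots, flash_slots):
        self.mem = OrderedDict()    # fastest tier: RAM
        self.flash = OrderedDict()  # second tier: SSD
        self.mem_slots = mem_slots
        self.flash_slots = flash_slots

    def get(self, key):
        if key in self.mem:                    # memory hit
            self.mem.move_to_end(key)
            return self.mem[key], "memory"
        if key in self.flash:                  # flash hit: promote to memory
            value = self.flash.pop(key)
            self.put(key, value)
            return value, "flash"
        return None, "miss"                    # fall through to persistence tier

    def put(self, key, value):
        self.mem[key] = value
        self.mem.move_to_end(key)
        if len(self.mem) > self.mem_slots:     # demote coldest block to flash
            old_key, old_val = self.mem.popitem(last=False)
            self.flash[old_key] = old_val
            if len(self.flash) > self.flash_slots:
                self.flash.popitem(last=False)
```

The point of the two layers is that hot blocks are served from RAM, warm blocks from flash, and only cold reads ever touch the (deduped, compressed) persistence tier.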
When it comes to availability the Springpath solution reminds me of Virtual SAN. Let me quote their white paper as I can’t explain this any better than they already did:
HALO’s Log Structure Distributed Object layer provides high availability for data by replicating copies of any incoming data. First, any data that is written to the write cache on an SSD is synchronously replicated to one or two (this is a policy driven tune that users can set) other SSDs located in different servers, before the writes are acknowledged to the application. This enables incoming writes to be acknowledged quickly at a low latency and ensures they are protected from any SSD or server failures. If an SSD or server fails, the replica is quickly recreated on surviving SSDs or servers using other available copies. Similarly, all data that is de-staged from the write cache to the persistent tier is replicated by the Log Structure Distributed Object layer. This replicated persistent data is likewise protected from any hard disk or server failure. With two replicas or a total of three data copies, the cluster can survive two SSDs or two HDDs or two server failures without the risk of data loss. Please see the Springpath Systems Administrator’s Guide for a full table of fault tolerant configurations and settings.
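The write path described in that quote can be sketched in a few lines. The function name and data structures below are hypothetical, my own illustration of the acknowledge-after-replication behaviour, not Springpath’s code.

```python
def write_with_replication(data, local_ssd, peer_ssds, replicas=2):
    """Append a block to the local write cache plus `replicas` peer SSDs,
    and only then acknowledge the write back to the application."""
    if replicas not in (1, 2):
        raise ValueError("policy allows one or two replicas")
    if len(peer_ssds) < replicas:
        raise RuntimeError("not enough surviving peers to meet the policy")
    local_ssd.append(data)                 # local write-cache copy
    for peer in peer_ssds[:replicas]:      # synchronous replication to peers
        peer.append(data)
    return "ack"                           # ack only after all copies land

# With two replicas there are three copies in total, so the cluster can
# lose any two SSDs or servers without losing the data.
```

If a peer fails, a real implementation would re-protect the data by rebuilding the lost replica on a surviving node from one of the remaining copies.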
When talking about performance, the Springpath folks mentioned that they don’t do anything around data locality. They said that they do not need data locality to provide great performance, as their data distribution layer ensures that performance is guaranteed. Also, with their requirement of 10GbE the cost of networking is so small that they feel it is negligible; similar statements to what the VMware VSAN team have claimed.
That all sounds awesome, right? But what is missing, and what is all of this going to cost?
First of all the costs: licensing is an annual fee which includes support, priced per host at $4K per server. The support they offer for now is 24x7x365 email and 8x7x365 phone support, but they said that as the company grows they expect the offering to grow as well. Their support offering includes cloud-based monitoring and proactive alerts/support, by the way (aka phone-home capabilities). From a pricing standpoint this is a rather unique strategy if you ask me; I have not seen an annual subscription-based solution for Server SANs yet. I like the “per server” model, as it is nice and simple, but I am not sure how customers will respond to the subscription model. I do like that there is no capacity limitation or different pricing for different levels of functionality.
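The nice thing about a flat per-server price is that cluster cost becomes trivial to work out, with no capacity or feature tiers to factor in. A quick back-of-the-envelope calculation based on the $4K-per-server figure quoted above:

```python
def annual_license_cost(servers, per_server=4000):
    """Annual subscription cost at a flat per-server price (USD)."""
    return servers * per_server

# A minimum three-node cluster and an eight-node cluster, per year:
print(annual_license_cost(3))  # → 12000
print(annual_license_cost(8))  # → 32000
```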
Now, what is missing today? Well, obviously replication… I asked about this during our session and was told that async replication (with a low RPO) is being worked on and can be expected soon. They did not provide a date, but said that it wouldn’t be unreasonable to expect it within 6 months. When I asked about sync replication they said they are considering it, but would like more customer feedback/validation first before investing in that functionality. Another feature which won’t be in the 1.0 release but is planned is configurable “failure domains”. Springpath said they expect to deliver this very soon, again within 6 months, and that it will allow you to define failure domains for multi-rack/chassis deployments. They also mentioned that they will be working on VVol support in the near future.
Concluding: Springpath managed to create a compelling solution and 1.0 product. To be fair, considering what they’ve developed it doesn’t even sound like a 1.0 product, but rather a 2.0 product. Their product has a lot of similarities to existing offerings out there, but they do offer unique capabilities for a reasonable price, and considering the Hyper-V, KVM/OpenStack and Docker support on the roadmap it looks like they are planning to expand into different markets and grow their customer base (currently two dozen customers) fast. It is going to be interesting to see how they will compete with existing players like Nutanix, Maxta, SimpliVity and of course Virtual SAN though… It is a crowded market already and it is not going to be easy. One thing is certain: on paper their offering is definitely very strong, and time will tell if that is enough to compete with some of these already established names. I think they can.
PS: Cormac published his thoughts on Springpath yesterday, make sure to read his article as well.
What's in a name says
1. Can you elaborate a bit more on block and object access in the Data Access layer, besides file (NFS)? Which version of NFS is supported?
2. Now regarding data locality not being a problem: it makes several assumptions about ToR bandwidth, inter-rack bandwidth or even flat datacenter networks etc. that are not guaranteed if this is going to land in existing datacenters on commodity h/w. A cluster may be split between racks, and the network backbone is also carrying traffic from other VI infrastructure pieces, so it’s an interesting Jim Gray rule that needs to be applied to the cost of sending bits over the network (even if it may be 10Gbps). Note, similar arguments have been made by hyper-converged players against traditional shared storage, i.e. ‘.. we save IO traffic on the FC fabric by doing caching on the host side as a first step, and now by just getting rid of the FC fabric altogether’; but at the cost of the network, which is then quickly followed up by claims of building data locality or, in this case, “it doesn’t matter”. How is a customer to choose?
3. Inline dedupe: first, as a clarification, is this inline dedupe on both the SSD and spinning (capacity) tier? What savings are proposed, and at what CPU cost per IO? Based on the link to Cormac’s writeup, it looks like we are looking at pretty beefy 8-vCPU appliances with 40GB mem? So in your opinion, how much room does that leave for running VMs, or do you think it’s a better reference architecture to just create storage servers dedicated to running the Springpath software?
4. I will be curious to see how they stack up against Nutanix in terms of feature set (besides the obvious cost and hardware/software solution arguments).