startup

Datrium finally out of stealth… Welcome Datrium DVX!

Duncan Epping · Jul 28, 2015 ·

Before I get started, I have not been briefed by Datrium so I am also still learning as I type this and it is purely based on the somewhat limited info on their website. Datrium’s name has been in the press a couple of times as it was the company that was often associated with Diane Greene. The rumours back then were that Diane Greene was the founder and was going to take on EMC. That was just a rumour, as Diane Greene is actually an investor in Datrium. Not just her of course, Datrium is also backed by NEA (Venture Capitalist) and various other well known people like Ed Bugnion, Mendel Rosenblum, Frank Slootman and Kai Li. Yes, a big buy in from some of the original VMware founders. Knowing that two of the Datrium founders (Boris Weissman and Ganesh Venkitachalam) are former VMware Principal Engineers (and old-timers) that makes sense. (Source) This morning a tweet was sent out, and it seems today they are officially out of stealth.

As the sun rises this morning, so does a new dominant #datastorage player #Datrium #stealthmode

— Datrium (@Datrium) July 28, 2015

So what is Datrium about? Well Datrium delivers a new type of storage system which they call DVX. Datrium DVX is a hybrid solution comprised of host local data services and a network accessed capacity shelf called “netshelf”. I think this quote from their website says it all about their intention… Move all functionality to the host and let the “shelf” just take care of storing bits. I included a diagram that I found on their website as it makes it more clear.

On the host, DiESL manages in-use data in massive deduplicated and compressed caches on BYO (bring your own) commodity SSDs locally, so reads don’t need a network hop. Hosts operate locally, not as a pool with other hosts.
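To make the idea in the quote a bit more concrete, here is a minimal Python sketch of a host-local read cache that is deduplicated (by content hash) and compressed. This is purely an illustration and not how DiESL is actually implemented; the class name, the `fetch_remote` callback and the toy `shelf` dictionary are all assumptions made for the example.

```python
import hashlib
import zlib


class DedupReadCache:
    """Toy host-local read cache: deduplicated by content hash, compressed with zlib."""

    def __init__(self, fetch_remote):
        self._fetch_remote = fetch_remote  # hypothetical callback, used only on a miss
        self._block_to_digest = {}         # logical block address -> content digest
        self._store = {}                   # content digest -> compressed payload
        self.remote_reads = 0

    def read(self, address):
        digest = self._block_to_digest.get(address)
        if digest is not None:
            # Cache hit: served locally, no network hop.
            return zlib.decompress(self._store[digest])
        # Cache miss: one trip to the shared shelf, then keep a local copy.
        data = self._fetch_remote(address)
        self.remote_reads += 1
        digest = hashlib.sha256(data).hexdigest()
        # Identical blocks share a single compressed copy (deduplication).
        self._store.setdefault(digest, zlib.compress(data))
        self._block_to_digest[address] = digest
        return data

    def stored_bytes(self):
        return sum(len(payload) for payload in self._store.values())


# Hypothetical backing store: blocks 0 and 1 hold identical content.
shelf = {0: b"A" * 4096, 1: b"A" * 4096, 2: b"B" * 4096}
cache = DedupReadCache(shelf.__getitem__)
for address in (0, 1, 2, 0, 1):
    cache.read(address)
```

In this toy run only the first read of each block goes to the shelf, and the two identical blocks end up sharing one compressed copy in the cache.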

It seems that from a host perspective the data services (caching, compression, RAID, cloning etc) are implemented through the installation of a VIB. So not VM/Appliance based but rather kernel based. The NetShelf is accessible via 10GbE and Datrium uses a proprietary protocol to connect to it. From a host side (ESXi) they connect locally over NFS, which means they have implemented an NFS Server within the host. The NFS connection is also terminated within the host and they included their own protocol/driver on the host to be able to connect to the NetShelf. It is a bit of an awkward architecture, or better said … at first it is difficult to wrap your head around it. This is the reason I used the word “hybrid” but maybe I should have used unique. Hybrid, not because of the mixture of flash and HDD but rather because it is a hybrid of hyper-converged / host local caching and more traditional storage but done in a truly unique way. What does that look like? Something like this I guess:

So what does this look like from a storage perspective? Well each NetShelf will come with 29TB of usable capacity. Expected deduplication and compression rate for enterprise companies is between 2-6x which means you will have between 58TB and 174TB at your disposal. In order to ensure your data is highly available the NetShelf is a dual controller setup with dual port drives (which means the drives are connected to both controllers and used in an “active/standby” fashion). Each controller has NVRAM which is used for write caching, and a write will be acknowledged to the VM when it has been written to the NVRAM of both controllers. In other words, if a controller fails there should be no data loss.
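The effective capacity estimate is simply the usable capacity multiplied by the expected data reduction ratio. A quick Python sketch of that arithmetic, using only the figures quoted in the post (the function and constant names are made up for this example):

```python
USABLE_TB = 29            # usable capacity per NetShelf, as quoted in the post
REDUCTION_RANGE = (2, 6)  # expected dedupe + compression ratio, as quoted in the post


def effective_capacity_tb(usable_tb, reduction_ratio):
    """Effective capacity = usable capacity x data reduction ratio."""
    return usable_tb * reduction_ratio


low, high = (effective_capacity_tb(USABLE_TB, ratio) for ratio in REDUCTION_RANGE)
print(f"Effective capacity: {low}TB to {high}TB")  # Effective capacity: 58TB to 174TB
```

Keep in mind that the actual reduction ratio depends heavily on the workload, so treat these as rough bounds rather than guarantees.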

Talking about availability, what if a host fails? If I read their website correctly then there is no write caching from a host point of view, as it states that each host operates independently from a caching point of view (no mirroring of writes to other hosts). This also means that all the data services need to be inline: dedupe / compress / RAID. When those actions complete the result will be stored on the NetShelf and then it is accessible by other hosts when needed. It makes me wonder what happens when DRS is enabled and a VM is migrated from one host to another. Will the read cache migrate with it to the other host? And what about very write intensive workloads, how will those perform when all data services are inline? What kind of overhead can/will it have on the host? How will it scale out? What if I need more than 1 NetShelf? Those are some of the questions that pop up immediately. Considering the brain-power within Datrium (former VMware, Data Domain, NetApp, EMC etc) I am assuming they have a simple answer to those questions… I will try to ask them these questions at VMworld or during a briefing and write a follow up.

From an operational aspect it is an interesting solution as it should lower the effort involved with managing storage almost to zero. There is the NFS connection and you have your VMs and VMDKs at the front end, at the back-end you have a blackbox or better said a shelf dedicated to storing bits. This should be dead easy to manage and deploy. It shouldn’t require a dedicated storage administrator but the VMware admin should be able to manage it. Some of you may ask, well what if I want to connect anything other than a VMware host to it? For now Datrium appears to be mainly targeting VMware environments (which makes sense considering their DNA) but I guess they could implement this for various platforms in a similar fashion.

Again, I was not briefed by Datrium and I accidentally saw their tweet this morning but their solution is so intriguing I figured I would share it anyway. Hope it was useful.

Interested? More info here:

Datasheet – http://www.datrium.com/datasheet/DVX_DataSheet.pdf
Host side implementation info – http://www.datrium.com/dvx-overview/diesl-software/
DVX Netshelf – http://www.datrium.com/dvx-overview/datrium-netshelf/
Twitter: http://www.twitter.com/datriumstorage

PernixData announcements at #VFD5

Duncan Epping · Jun 25, 2015 ·

Today PernixData presented at Virtualization Field Day 5. Excellent presentation by Satyam once again. This week I was fortunate to catch up with Frank Denneman to discuss what was going to be announced and what can be expected in the near future. I want to make it clear that there were no expectations given around release dates, don’t expect this to drop next week.

There were 4 key announcements:

PernixData Architect – A better way to design, operate, and optimize data centers
PernixData Cloud – Making enterprise IT more transparent
PernixData FVP
1. Freedom – Yes, “free” is the key word here!
2. New features / functionality…

I am going to go at this in a different order than the deck, as I want to cover some of the changes with regards to FVP first. Satyam spoke about a new thing called “FVP Freedom“. FVP Freedom is a free version of FVP which can be used by anyone in any environment. Of course there are some constraints / limitations and these are:

Up to 128GB DFTM cluster for write through acceleration
Community support

However, you can use FVP Freedom for an unlimited number of hosts and unlimited number of VMs. Note that “DFTM” stands for Distributed Fault Tolerant Memory. This means that FVP Freedom gives you memory (read) caching only, no SSD caching, with a limit of 128GB per cluster. I think this is huge, and it is a very smart way of getting people to test your solution and run it in production. So what can you do with 128GB? Well of course they tested this, and they were capable of increasing the VSImax users from 181 to 328 with that 128GB of memory on 2 hosts. You may wonder why they took this approach, because what does giving it away for free bring them? Well that will be obvious when you read the other announcements.

Besides a free version of FVP some enhancements to the current version were also announced. For me support for vSphere 6.0 and VVols were two major items. On top of that new “phone home” functionality is built in, which allows for better and proactive support. What also stood out was the new standalone UI. This means that you will be taken out of the Web Client to a standalone HTML5+JS based interface. You may wonder why they did this, that is where the two new product announcements come in to play. FVP is still a Windows installable by the way, I hoped they would announce an appliance which lowers complexity in terms of installation and management, but maybe next time who knows.

PernixData Architect was the first announcement. It is a piece of software that enables you to monitor your infrastructure (storage focussed of course) and make educated decisions based on the information and even recommendations provided. So what are we talking about in terms of metrics etc? PernixData Architect (for now?) is focussed on storage, not just from a cluster point of view, but also from a host level and virtual machine level. What is the latency a virtual machine is experiencing? How many IOPS does this VM do on average? What is the throughput? What is the read/write ratio? What are common block sizes? All the things you would like to know when designing, scaling, sizing your storage infrastructure and of course when using FVP.
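To give an idea of what these metrics boil down to, here is a small Python sketch that derives them from a list of I/O records for a single VM. It is only meant to illustrate the definitions (IOPS, throughput, read/write ratio, latency, block sizes); the `IoRecord` type and the sample data are invented for this example and say nothing about how PernixData Architect collects or computes its numbers.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class IoRecord:
    """One I/O observed for a VM (hypothetical record format)."""
    is_read: bool
    size_bytes: int
    latency_ms: float


def summarize(records, interval_seconds):
    """Summarize the I/O records captured during one sampling interval."""
    if not records or interval_seconds <= 0:
        raise ValueError("need at least one record and a positive interval")
    reads = sum(1 for record in records if record.is_read)
    total_bytes = sum(record.size_bytes for record in records)
    return {
        "iops": len(records) / interval_seconds,
        "throughput_mb_s": total_bytes / interval_seconds / 1_000_000,
        "read_ratio": reads / len(records),
        "avg_latency_ms": sum(r.latency_ms for r in records) / len(records),
        "common_block_sizes": Counter(r.size_bytes for r in records).most_common(2),
    }


# Made-up sample: three 4KB reads and one 64KB write observed in 2 seconds.
sample = [IoRecord(True, 4096, 1.0)] * 3 + [IoRecord(False, 65536, 3.0)]
stats = summarize(sample, interval_seconds=2)
```

For the made-up sample this works out to 2 IOPS, a 75% read ratio and an average latency of 1.5 ms, with 4KB as the most common block size.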

Besides the details above you can for instance also see what the active working set is in your environment for any of your VMs. You can even get recommendations around how to configure FVP. For instance, you may have it set to write back, but if you are mainly serving reads from cache you may want to change that. It will also give you other recommendations, for instance around networking.

You can imagine that with all the metrics and info they are gathering they will be able to provide you many more recommendations in the future. I can see those dashboards expanding fast, and I think it is valuable for everyone to understand how their workloads are behaving. On Twitter some comments were made about vR Ops and CloudPhysics. Not a fair comparison as Pernix is focussed on “just” storage for now. Personally I hope they will start tying in other aspects like memory, CPU and networking as I don’t think customers want to be stuck with 2 or 3 monitoring solutions.

Now that you have all that data, what can you do with it? Well that is where PernixData Cloud comes in. PernixData Cloud can give you Insights in to how you are doing compared to others in the industry with similar environments, or even with different environments. Those running PernixData Architect can feed it in to the cloud analytics platform and do an analysis on it. But what if they don’t? How useful is this cloud analytics platform going to be? Well here is the catch, when you use FVP Freedom one of the requirements will be to upload your statistics and environmental details in to PernixData Cloud. So, what kind of data can you get out of it? Let me give you two visual examples as that shows immediately why this is valuable:

Both of the above examples demonstrate what PernixData Cloud Insights can give you. Data that is going to help with making purchasing decisions, and I can see how it could also be useful in the future for making design decisions. (Here is what others did to achieve X.) Best example is the top screenshot, not sure which flash device to buy? What are others buying? What can you expect out of it in terms of latency/throughput/IOPS? Cloud Insights will enable you to make educated decisions based on real life environments instead of based on fact-sheets which always appear to be misleading.

All in all, exciting news / announcements from PernixData at Virtualization Field Day 5. Nice work guys, and thanks Mr Denneman for taking the time to have a chat with me and thanks Mr Foskett for streaming the event live!

Rubrik follow up, GA and funding announcement

Duncan Epping · May 27, 2015 ·

Two months ago I published an introduction post on Rubrik. Yesterday Rubrik announced that their platform went GA and they announced a funding round (series B) of 41 million dollars led by Greylock. I want to congratulate Rubrik on this new milestone, a major achievement, and I am sure we will hear much more from them in the months to come. For those who don’t recall, here is what Rubrik is all about:

Rubrik is building a hyperconverged backup solution and it will scale from 3 to 1000s of nodes. Note that this solution will be up and running in 15 minutes and includes the option to age out data to the public cloud. What impressed me most is that Rubrik can discover your datacenter without any agents, it scales-out in a fully automated fashion and will be capable of deduplicating / compressing data but also offer the ability to mount data instantly. All of this through a slick UI or you can leverage the REST APIs, fully programmable end-to-end.

When I published the article some people made comments that you can do the above with various other solutions and people asked why I was so excited about their solution. Well, first of all because you can do all of that from a single platform and don’t need a backup solution plus a storage solution and have multiple pieces to manage without scale-out capabilities. I like the model, the combination of what is being offered, the fact that it is a single package designed for this purpose and not glued together… But of course there is more, I just couldn’t talk about it yet. I am not gonna go in to an extreme amount of detail as Cormac wrote an excellent piece here and there is this great blog from Chris, who is a user of the product, which explains the value of the solution. (Always nice to see by the way that people read your article and share their experience as well in return…)

I do want to touch on a couple of things which I feel sets Rubrik apart. (And there may be others who do this / offer this, but I haven’t been briefed by them.)

Global search across all data
- “Google-alike” search, which means you start typing the name of a file in the UI of any VM and while typing the UI already presents a list of potential files you are looking for. Then when it shows the right file you click it and it presents a list of options. The file with this name could of course be on one or many VMs, you can pick which one you want and select from which point in time. When I was an admin I was often challenged with this problem “I deleted a file, I know the name… but no clue where I stored it, can you recover it?”. Well that is no problem any longer with global search, just type the name and restore it.
True Scale Out
- I’d already highlighted this, but I agree with Scott Lowe that there is “scale-out” and there is “Scale-Out”. In the case of Rubrik we are talking scale out with capital S and capital O. Not just from a capacity stance, but also when it comes to (as Scott points out) task management and the ability to run any task anywhere in the cluster. So with each node you add you aren’t just scaling capacity, but also performance on all fronts. No single choking point with Rubrik as far as I can tell.
Miscellaneous, stuff that people take for granted… but does matter
- API-Driven – Not something you would expect I would get excited about. And it seems such an obvious thing, but Rubrik’s solution can be configured and managed through the API they expose. Note that every single thing you see in the UI can be done through the API, the UI is simply an API client.
- Well performing instant mount through the use of flash and serving the cluster up as a scale-out NFS solution to any vSphere host in your environment. Want to access a VM that was backed-up? Mount it!
- Cloud archiving… Yes others offer this functionality I know. I still feel it is valuable enough to mention that Rubrik does offer the option to archive data to S3 for instance.
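To illustrate the “Google-alike” global search idea from the list above, here is a minimal Python sketch of a file index that suggests names while you type and then lists every VM and point in time where that file exists. This is just a toy model to show the concept; the `FileIndex` class, its methods and the sample data are invented for this example and are not Rubrik’s implementation or API.

```python
from bisect import bisect_left
from collections import defaultdict
from datetime import datetime


class FileIndex:
    """Toy global index: file name -> every (VM, snapshot time) that contains it."""

    def __init__(self):
        self._versions = defaultdict(list)
        self._names = []  # kept sorted so prefix lookups can use binary search

    def add(self, name, vm, snapshot_time):
        if name not in self._versions:
            self._names.insert(bisect_left(self._names, name), name)
        self._versions[name].append((vm, snapshot_time))

    def suggest(self, prefix, limit=10):
        """Return up to `limit` file names starting with what was typed so far."""
        matches = []
        for name in self._names[bisect_left(self._names, prefix):]:
            if not name.startswith(prefix) or len(matches) == limit:
                break
            matches.append(name)
        return matches

    def versions(self, name):
        """All known copies of a file, newest snapshot first."""
        return sorted(self._versions.get(name, []), key=lambda v: v[1], reverse=True)


# Made-up sample data.
index = FileIndex()
index.add("budget.xlsx", "vm-finance", datetime(2015, 5, 1))
index.add("budget.xlsx", "vm-files", datetime(2015, 5, 20))
index.add("budget-old.xlsx", "vm-files", datetime(2015, 4, 1))
index.add("notes.txt", "vm-dev", datetime(2015, 5, 2))

suggestions = index.suggest("bud")
copies = index.versions("budget.xlsx")
```

Here `suggestions` contains both budget files, and `copies` lists the two VMs holding `budget.xlsx` with the most recent snapshot first, which is exactly the “I know the name, but no clue where I stored it” scenario.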

Of course there is more to Rubrik than what I just listed, read the articles by Scott, Cormac and Chris to get a good overview… Or just contact Rubrik and ask for a demo.

Startup intro: Rubrik. Backup and recovery redefined

Duncan Epping · Mar 24, 2015 ·

Some of you may have seen the article by The Register last week about this new startup called Rubrik. Rubrik just announced what they are working on and announced their funding at the same time:

Rubrik, Inc. today announced that it has received $10 million in Series A funding and launched its Early Access Program for the Rubrik Converged Data Management platform. Rubrik offers live data access for recovery and application development by fusing enterprise data management with web-scale IT, and eliminating backup software. This marks the end of a decade-long innovation drought in backup and recovery, the backbone of IT. Within minutes, businesses can manage the explosion of data across private and public clouds.

The Register made a comment, which I want to briefly touch on. They mentioned it was odd that a venture capitalist is now the CEO for a startup and how it normally is the person with the technical vision who heads up the company. I can’t agree more with The Register. For those who don’t know Rubrik and their CEO, the choice for Bipul Sinha may come as a surprise and may seem a bit odd. Then there are some who may say that it is a logical choice considering they are funded by Lightspeed… Truth of the matter is that Bipul Sinha is the person with the technical vision. I had the pleasure to see his vision evolve from a couple of scribbles on the whiteboard to what Rubrik is right now.

I still recall having a conversation with Bipul talking about the state of the “backup industry”, and I recall we agreed the different components of a datacenter had evolved over time but that the backup industry was still very much stuck in the old world. (We agreed backup and recovery solutions suck in most cases…) Back when we had this discussion there was nothing yet, no team, no name, just a vision. Knowing what is coming in the near future and knowing their vision I do think this quote from the press release captures best what Rubrik is working on and what it will do:

Today we are excited to announce the first act in our product journey. We have built a powerful time machine that delivers live data and seamless scale in a hybrid cloud environment. Businesses can now break the shackles of legacy and modernize their data infrastructure, unleashing significant cost savings and management efficiencies.

Of course Rubrik would not be possible without a very strong team of founding members. Arvind Jain, Arvind Nithrakashyap and Soham Mazumdar are probably the strongest co-founders one can wish for. The engineering team has deep experience in building distributed systems, such as Google File System, Google Search, YouTube, Facebook Data Infrastructure, Amazon Infrastructure, and Data Domain File System. Expectations just raised a couple of notches right?!

I agree that even the statement above is still a bit fluffy so let’s add some more details, what are they working on? Rubrik is working on a solution which combines backup software and a backup storage appliance in to a single solution and initially will target VMware environments. They are building (and I hate using this word) a hyperconverged backup solution and it will scale from 3 to 1000s of nodes. Note that this solution will be up and running in 15 minutes and includes the option to age out data to the public cloud. What impressed me most is that Rubrik can discover your datacenter without any agents, it scales-out in a fully automated fashion and will be capable of deduplicating / compressing data but also offer the ability to mount data instantly. All of this through a slick UI or you can leverage the REST APIs, fully programmable end-to-end.

I just went over “instant mount” quickly, but I want to point out that this is not just for “restoring VMs”. Considering the REST APIs you can also imagine that this would be a perfect solution to enable test/dev environments or running Tier 2/3 workloads. How valuable is it to have instant copies of your production data available and test your new code against production without any interruption to your current environment? To throw a buzzword in there: perfectly fit for a devops world and continuous development.

That is about all I can say for now unfortunately… For those who agree that backup/recovery has not evolved and are interested in a backup solution for tomorrow, there is an early access program and I urge you to sign up to learn more but also help shape the product! The solution is targeting environments of 200 VMs and upwards, make sure you meet those requirements. Read more here and/or follow them on Twitter (or Bipul).

Good luck Rubrik, I am sure this is going to be a great journey!

Startup introduction: Springpath

Duncan Epping · Feb 19, 2015 ·

Last week I was briefed by Springpath and they launched their company officially yesterday, although they have been around for a long time. Springpath was founded by Mallik Mahalingam and Krishna Yadappanavar. For those who don’t know them, Mallik was responsible for VXLAN (see the IETF draft) and Krishna was one of the folks who was responsible for VMFS (together with Satyam, who started PernixData). I believe it was early 2013 or end of 2012 when Mallik reached out to me and he wanted to validate some of his thinking around the software defined storage space, I agreed to meet up and we discussed the state at that time and where some of the gaps were. Since May 2012 they operated in stealth (under the name Storvisor) and landed a total of 34 million dollars from investors like Sequoia, NEA and Redpoint. Well established VC names indeed, but what did they develop?

Springpath is what most folks would refer to as a Server SAN solution, some may also refer to it as “hyper-converged”. I don’t label them as hyper-converged as Springpath doesn’t sell a hardware solution, they sell software and have a strict hardware compatibility list. The list of server vendors on the HCL seemed to cover the majority of big players out there though, I was told Dell, HP, Cisco and SuperMicro are on the list and that others are being worked on as we speak. This approach offers a bit more flexibility according to Springpath for customers as they can choose their own preferred vendor and leverage the server vendor relationship they already have for discounts but also maintain similar operational processes.

Springpath’s primary focus in the first release is vSphere, which knowing the background of these guys makes a lot of sense, and comes in the shape of a virtual appliance. This virtual appliance is installed on top of the hypervisor and grabs local spindles and flash. With a minimum of three nodes you then can create a shared datastore which is served back to vSphere as an NFS mount. There are of course also plans to support Hyper-V and when they do the appliance will provide SMB capabilities and for KVM it will use NFS. That is still on the roadmap right now, but not too far out according to Mallik. (Note that support for Hyper-V, KVM etc will all be released in a different version. KVM and Docker are in beta as we speak, if you are interested go to their website and drop them an email!) There is even talk about supporting the Springpath solution to run as a Docker container and providing shared storage for Docker itself. All these different platforms should be able to leverage the same shared data platform according to Springpath, the diagram below shows this architecture.

They demonstrated the configuration / installation of their stack and I must say I was impressed with how simple it was. They showed a simple UI which allowed them to configure the IP details etc, but they also showed how they could simply drop a JSON file in there with all the config details which would then be used to deploy the storage environment. When fully configured the whole environment can be managed from the Web Client, no need for a separate UI or anything like that. All integrated within the Web Client, and for Hyper-V and other platforms they had similar plans… no separate client but all manageable through the familiar interfaces those platforms already offer.
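I did not get to see the actual JSON file format, so the following Python sketch is purely hypothetical: the field names (`cluster_name`, `nodes`, `hostname`, `management_ip`) and the validation logic are my own assumptions, not Springpath’s schema. It only shows the general idea of driving a deployment from a config file, including a check for the three-node minimum mentioned earlier.

```python
import ipaddress
import json

# Hypothetical config file contents; the real Springpath format may look very different.
EXAMPLE_CONFIG = """
{
  "cluster_name": "demo-cluster",
  "nodes": [
    {"hostname": "esx01", "management_ip": "192.0.2.11"},
    {"hostname": "esx02", "management_ip": "192.0.2.12"},
    {"hostname": "esx03", "management_ip": "192.0.2.13"}
  ]
}
"""

MIN_NODES = 3  # the post mentions a minimum of three nodes for a shared datastore


def load_config(text):
    """Parse the JSON text and run a few sanity checks before any deployment."""
    config = json.loads(text)
    nodes = config["nodes"]
    if len(nodes) < MIN_NODES:
        raise ValueError(f"need at least {MIN_NODES} nodes, got {len(nodes)}")
    for node in nodes:
        # ip_address() raises ValueError when the address is malformed.
        ipaddress.ip_address(node["management_ip"])
    return config


config = load_config(EXAMPLE_CONFIG)
print(f"{config['cluster_name']}: {len(config['nodes'])} nodes validated")
```

Validating the file up front like this is a common pattern for config-driven installers, as it catches typos in IP details before anything is touched on the hosts.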