
Yellow Bricks

by Duncan Epping


Data Recovery

vSphere HA deepdive 6.x available for free!

Duncan Epping · Feb 18, 2016 ·

I’ve been discussing this with Frank over the last 12 months, and to be honest we are still not sure it is the right thing to do, but we decided to take this step anyway. Over the past couple of years we released various updates of the vSphere Clustering Deepdive. Updating the book was sometimes a massive pain (version 4 to 5 for instance), but some of the minor updates have been relatively straightforward, although still time consuming due to formatting, diagrams, screenshots, etc.

Ever since, we’ve been looking for new ways to distribute our book, or publication as I will refer to it from now on. I’ve looked at various options and found one which I felt was the best of all worlds: Gitbook. Gitbook is a solution which allows you as an author to develop content in Markdown and distribute it in various formats: static HTML, PDF, ePub or Mobi. Basically any format you would want in this day and age. The great thing about the platform is that it integrates with GitHub, so you can share your source there and do things like version control. It does this in such a way that I can use the Gitbook client on my Mac, while someone else who wants to contribute or submit a change can simply use their client of choice and submit the change through git. Although I don’t expect too many people to do this, it will make it easier for me to have material reviewed, for instance by one of the VMware engineers.

So what did I just make available for free? Well in short, an updated version (vSphere 6.0 Update 1) of the vSphere HA Deepdive. This includes the stretched clustering section of the book. Note that DRS and SDRS have not been included (yet). This may or may not happen in some shape or form in the future though. For now, I hope you will enjoy and appreciate the content that I made available for free. You can access it by clicking “HA Deepdive” on the left, or (in my opinion) for a better reading experience read it on Gitbook directly through this link: ha.yellow-bricks.com.

Note that there are also links to download the content in different formats, for those who want to read it on their iPad / phone / whatever. Also note that Gitbook allows you to comment on a paragraph by clicking the “+” sign that appears on the right side of a paragraph when you hover over it… Please submit feedback when you see mistakes! And for those who are really active: if you want to, you could even contribute to the content! I will keep updating the content over the upcoming months, probably with more info on VVols and for instance the Advanced Settings, so keep checking back regularly!

data copy management / converged data management / secondary storage

Duncan Epping · Dec 3, 2015 ·

At the Italian VMUG I was part of the “Expert panel” at the end of the event. One of the questions was around innovation in the world of IT: what should be next? I knew immediately what I was going to answer: backup/recovery >> data copy management. My key reason is that we haven’t seen much innovation in this space.

And yes, before some of my community friends go nuts and point at Veeam and some of the great stuff they have introduced over the last 10 years: I am talking more broadly here. Many of my customers are still using the same backup solution they used 10-15 years ago. Yes, it is probably a different version, but all the same concepts apply. Well, maybe tapes have been replaced by virtual tape libraries stored on a disk system somewhere, but that is about it. The world of backup/recovery hasn’t really evolved.

Over the last few years, though, we’ve been seeing a shift in the industry. This shift started with companies like Veeam, continued with companies like Actifio, and is now accelerated by companies like Cohesity and Rubrik. What is different about what these guys offer versus the more traditional backup solution? Well, the difference is that all of these are more than backup solutions; they don’t focus on a single use case. They “simply” took a step back and looked at what kind of solutions are using your data today, who is using it, how, and of course what for. On top of that, where the data is stored is also a critical part of the puzzle.

In my mind Rubrik and Cohesity are leading the pack when it comes to this new wave of solutions. They’ve developed offerings which are a convergence of different products (backup / scale-out storage / analytics / etc.). I used “convergence” on purpose, as this is what it is to me: “converged data (copy) management”. Although not all use cases may have reached their full potential yet, the vision is pretty clear, and multiple layers have already converged, even if we would just consider backup and scale-out storage. I said pretty clear, as the various startups have taken different messaging approaches. This is something that became obvious during the last Storage Field Day where Cohesity presented, which was a different story than, for instance, Rubrik had during Virtualization Field Day. Just as an example, Rubrik typically leads with data protection and management, where Cohesity’s messaging appears to be more around being a “secondary storage platform”. In the case of Cohesity this led to discussions (during SFD) around what secondary storage is, how you get data on the platform, and what you can then do with it.

To me (and the folks at these startups may have completely different ideas around this) there are a couple of use cases which stand out for a converged data management platform, use cases which I would expect to be the first targets, and I will explain why in a second.

  1. Backup and Recovery (long retention capabilities)
  2. Disaster Recovery using point in time snapshots/replication (relatively short retention capabilities and low RPO)

Why are these the two use cases to go after first? Well, it is the easiest way to get data into your system and make your system sticky. It is also the market where innovation is needed. On top of that, you need to have the data in your system before you can do anything with it, and before some of the other use cases start to make sense, like “data analytics”, creating clones for “test / dev” purposes, or spinning up DR instances, whether in your remote site or somewhere in the cloud.

The first use case (backup and recovery) is something which all of them are targeting; the second one, not so much at this point. In my opinion that is a shame, as it could definitely be very compelling for customers to have these two data availability concepts combined. Especially when some form of integration with an orchestration layer can be included (think Site Recovery Manager here) and protection of workloads is enabled through policy. Policy in this case allows you to specify the SLA for data recovery in the form of recovery point, recovery time and retention. And then, when needed, you as a customer have the choice of how you want to make your data available again: VM fail-over, VM recovery, live/instant recovery, file granular or application/database object level recovery, and so on and so forth. Not just that, from that point on you should be capable of using your data for other use cases, the use cases I mentioned earlier like analytics, test/dev copies, etc.
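A policy of the kind described above, expressed as recovery point, recovery time and retention, can be sketched as a small data structure. This is purely a hypothetical illustration (the class and method names are mine, not any vendor’s API):

```python
from datetime import datetime, timedelta

class ProtectionPolicy:
    """Illustrative SLA policy: RPO, RTO and retention period."""

    def __init__(self, rpo: timedelta, rto: timedelta, retention: timedelta):
        self.rpo = rpo            # max acceptable data loss
        self.rto = rto            # target time to recover
        self.retention = retention  # how long copies must be kept

    def meets_rpo(self, last_recovery_point: datetime, now: datetime) -> bool:
        """The newest usable copy must be no older than the RPO."""
        return now - last_recovery_point <= self.rpo

    def expired(self, recovery_point: datetime, now: datetime) -> bool:
        """Copies older than the retention period may be aged out."""
        return now - recovery_point > self.retention

gold = ProtectionPolicy(rpo=timedelta(hours=1),
                        rto=timedelta(minutes=15),
                        retention=timedelta(days=30))

now = datetime(2015, 12, 3, 12, 0)
print(gold.meets_rpo(datetime(2015, 12, 3, 11, 30), now))  # True: copy is 30 min old
print(gold.meets_rpo(datetime(2015, 12, 3, 9, 0), now))    # False: copy is 3 h old
```

Assigning such a policy to a workload, rather than configuring individual backup jobs, is what makes the protection policy driven.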

We aren’t there yet, or better said, we are far from there, but I do feel this is where we are headed… and some are closing in faster than others. I can’t wait for all of this to materialize, for us to start making those next steps, and to see what kind of new use cases can be made possible on converged data management platforms.

Rubrik 2.0 release announced today

Duncan Epping · Aug 19, 2015 ·

Today the Rubrik 2.0 release was announced. I’ve written about who they are and what they do twice now so I am not going to repeat that. If you haven’t read those articles please read those first. (Article 1 and article 2) Chris Wahl took the time to brief me and the first thing that stood out to me was the new term that was coined namely: Converged Data Management. Considering what Rubrik does and has planned for the future I think that term is spot on.

When it comes to 2.0 there are a bunch of features that are introduced, I will list them out and then discuss some of them in a bit more detail:

  • New Rubrik appliance model R348
    • Same 2U/4Node platform, but leveraging 8TB disks instead of 4TB disks
  • Replication
  • Auto Protect
  • WAN Efficient (global deduplication)
  • AD Authentication – No need to explain
  • OpenStack Swift support
  • Application aware backups
  • Detailed reporting
  • Capacity planning

Let’s start at the top: a new model is introduced next to the two existing models. The two other models are also 2U/4Node solutions but use 4TB drives instead of the 8TB drives the R348 will be using. This boosts capacity for a single Brik to roughly 300TB; in 2U that is not bad at all, I would say.

Of course the hardware isn’t the most exciting part; the software changes fortunately are. In the 2.0 release Rubrik introduces replication between sites / appliances, and global dedupe which ensures that replication is as efficient as it can be. The great thing here is that you back up data and replicate it straight after it has been deduplicated to other sites. All of this is again policy driven by the way, so you can define when you want to replicate, how often, and for how long data needs to be kept on the destination.
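How deduplication makes replication WAN efficient can be sketched in a few lines. This is an illustrative model (not Rubrik’s actual implementation): each site keeps an index of chunk fingerprints, and only chunks the destination has not seen before are shipped across the wire.

```python
import hashlib

def fingerprint(chunk: bytes) -> str:
    """Content-based fingerprint used for deduplication."""
    return hashlib.sha256(chunk).hexdigest()

def replicate(chunks, destination_index):
    """Ship only the chunks missing at the destination; update its index."""
    shipped = []
    for chunk in chunks:
        fp = fingerprint(chunk)
        if fp not in destination_index:
            destination_index.add(fp)
            shipped.append(chunk)
    return shipped

remote = set()  # the destination site's fingerprint index
first = replicate([b"block-A", b"block-B", b"block-A"], remote)
second = replicate([b"block-B", b"block-C"], remote)
print(len(first), len(second))  # 2 1 -> duplicates never cross the WAN
```

The same idea extended across all sites is what the feature list calls global deduplication: a chunk already known anywhere in the replication topology does not need to be sent again.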

Auto-protect is one of those features which you will quickly take for granted, but it is very valuable. Basically it allows you to set a default SLA at the vCenter level, or at the Cluster / Resource Pool / Folder level, you get the drift. Set and forget is basically what this means: no longer the risk of newly provisioned VMs which have not been added to the backup schedule. Something really simple, but very useful.
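The set-and-forget behaviour boils down to SLA inheritance down the vCenter hierarchy: a VM without its own SLA picks up the nearest one set above it. A minimal sketch, with hypothetical names (this is my illustration, not Rubrik’s data model):

```python
class Node:
    """A vCenter inventory object: vCenter, cluster, folder, pool or VM."""

    def __init__(self, name, parent=None, sla=None):
        self.name, self.parent, self.sla = name, parent, sla

    def effective_sla(self):
        """Walk up the hierarchy until an SLA assignment is found."""
        node = self
        while node is not None:
            if node.sla is not None:
                return node.sla
            node = node.parent
        return None  # nothing set anywhere above: unprotected

vcenter = Node("vcenter01", sla="Bronze")
cluster = Node("cluster01", parent=vcenter, sla="Gold")
new_vm = Node("vm-just-provisioned", parent=cluster)
special = Node("vm-db01", parent=cluster, sla="Platinum")

print(new_vm.effective_sla())   # Gold: inherited from the cluster
print(special.effective_sla())  # Platinum: the per-VM override wins
```

The newly provisioned VM is protected the moment it appears in the inventory, which is exactly why the feature removes the "forgot to add it to the backup schedule" risk.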

When it comes to application awareness, Rubrik in version 2.0 will also leverage a VSS provider to allow for transactionally consistent backups. This applies today to Microsoft Exchange, SQL, SharePoint and Active Directory. More can be expected in the near future. Note that this applies to backups; for restores there is no option (yet) to restore a specific mailbox for instance, but Chris assured me that this is on their radar.

When it comes to usability a lot of improvements have been made, starting with things like reporting and capacity planning. One of the reports I found very useful is the SLA compliance report. It simply shows you whether VMs are meeting the defined SLA or not. Capacity planning is also very helpful as it informs you what the growth rate is, locally and in the cloud, and also when you will be running out of space. A nice trigger to buy an additional appliance, right? Or to change your retention period or archival policy, etc. On top of that, things like object deletion, task cancellation, progress bars and many more usability improvements have made it into the 2.0 release.
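A back-of-the-envelope version of that capacity planning report: given current usage and an observed daily growth rate, estimate how many days of runway remain. This assumes simple linear growth (a real report would refine this with historical trends); the numbers below just reuse the ~300TB Brik figure from above as an example.

```python
def days_until_full(capacity_tb, used_tb, growth_tb_per_day):
    """Estimate days remaining before the cluster runs out of space."""
    if growth_tb_per_day <= 0:
        return None  # not growing: no projected exhaustion date
    return (capacity_tb - used_tb) / growth_tb_per_day

remaining = days_until_full(capacity_tb=300, used_tb=210, growth_tb_per_day=1.5)
print(remaining)  # 60.0 days of runway at the current growth rate
```

Projections like this are what turn a raw usage number into an actionable trigger: expand the cluster, shorten retention, or archive more aggressively to the cloud.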

All in all an impressive release, especially considering that 1.0 was released less than 6 months ago. It is great to see a high release cadence in an industry which has been moving extremely slowly for the past decades. Thanks Rubrik for stirring things up!

Rubrik follow up, GA and funding announcement

Duncan Epping · May 27, 2015 ·

Two months ago I published an introduction post on Rubrik. Yesterday Rubrik announced that their platform went GA, and they announced a funding round (series B) of 41 million dollars led by Greylock. I want to congratulate Rubrik on this new milestone, a major achievement, and I am sure we will hear much more from them in the months to come. For those who don’t recall, here is what Rubrik is all about:

Rubrik is building a hyperconverged backup solution and it will scale from 3 to 1000s of nodes. Note that this solution will be up and running in 15 minutes and includes the option to age out data to the public cloud. What impressed me most is that Rubrik can discover your datacenter without any agents, it scales out in a fully automated fashion, and it is capable of deduplicating / compressing data while also offering the ability to mount data instantly. All of this through a slick UI, or you can leverage the REST APIs; it is fully programmable end-to-end.

When I published the article some people commented that you can do the above with various other solutions, and people asked why I was so excited about theirs. Well, first of all, because you can do all of that from a single platform and don’t need a backup solution plus a storage solution with multiple pieces to manage and without scale-out capabilities. I like the model, the combination of what is being offered, the fact that it is a single package designed for this purpose and not glued together… But of course there is more, I just couldn’t talk about it yet. I am not going to go into an extreme amount of detail, as Cormac wrote an excellent piece here and there is this great blog from Chris, who is a user of the product, which explains the value of the solution. (Always nice, by the way, to see people read your article and share their experience in return…)

I do want to touch on a couple of things which I feel sets Rubrik apart. (And there may be others who do this / offer this, but I haven’t been briefed by them.)

  • Global search across all data
    • “Google-alike” search, which means you start typing the name of a file from any VM in the UI, and while you type the UI already presents a list of potential files you are looking for. When it shows the right file you click it and it presents a list of options. A file with this name could of course exist on one or many VMs; you can pick which one you want and select from which point in time. When I was an admin I was often challenged with this problem: “I deleted a file, I know the name… but no clue where I stored it, can you recover it?”. Well, that is no problem any longer with global search, just type the name and restore it.
  • True Scale Out
    • I’d already highlighted this, but I agree with Scott Lowe that there is “scale-out” and there is “Scale-Out”. In the case of Rubrik we are talking scale-out with a capital S and a capital O. Not just from a capacity standpoint, but also when it comes to (as Scott points out) task management and the ability to run any task anywhere in the cluster. So with each node you add you aren’t just scaling capacity, but also performance on all fronts. No single choking point with Rubrik as far as I can tell.
  • Miscellaneous, stuff that people take for granted… but does matter
    • API-Driven – Not something you would expect I would get excited about. And it seems such an obvious thing, but Rubrik’s solution can be configured and managed through the API they expose. Note that every single thing you see in the UI can be done through the API, the UI is simply an API client.
    • Well-performing instant mounts, through the use of flash and by serving the cluster up as a scale-out NFS solution to any vSphere host in your environment. Want to access a VM that was backed up? Mount it!
    • Cloud archiving… Yes others offer this functionality I know. I still feel it is valuable enough to mention that Rubrik does offer the option to archive data to S3 for instance.
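The global search behaviour in the first bullet can be sketched as a simple prefix lookup over an index of file names mapped to their locations. Everything below is an illustrative toy (names, snapshots and data structures are mine, not Rubrik’s):

```python
from collections import defaultdict

index = defaultdict(list)  # file name -> list of (vm, snapshot) locations

def ingest(vm, snapshot, file_names):
    """Record which files a given VM snapshot contains."""
    for name in file_names:
        index[name].append((vm, snapshot))

def search(prefix):
    """Return matches for a partially typed file name, across all VMs."""
    return {name: locs for name, locs in index.items()
            if name.startswith(prefix)}

ingest("vm-fileserver", "2015-05-26T02:00", ["budget.xlsx", "notes.txt"])
ingest("vm-desktop07", "2015-05-25T02:00", ["budget.xlsx"])

hits = search("bud")
print(sorted(hits))               # ['budget.xlsx']
print(len(hits["budget.xlsx"]))   # 2 -> same name found on two VMs
```

The “I know the name but not where I stored it” problem is solved exactly because the lookup is by name first and location second: the admin picks the VM and point in time only after the name matches.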

Of course there is more to Rubrik than what I just listed; read the articles by Scott, Cormac and Chris to get a good overview… Or just contact Rubrik and ask for a demo.

Startup intro: Rubrik. Backup and recovery redefined

Duncan Epping · Mar 24, 2015 ·

Some of you may have seen the article by The Register last week about this new startup called Rubrik. Rubrik just announced what they are working on and announced their funding at the same time:

Rubrik, Inc. today announced that it has received $10 million in Series A funding and launched its Early Access Program for the Rubrik Converged Data Management platform. Rubrik offers live data access for recovery and application development by fusing enterprise data management with web-scale IT, and eliminating backup software. This marks the end of a decade-long innovation drought in backup and recovery, the backbone of IT. Within minutes, businesses can manage the explosion of data across private and public clouds.

The Register made a comment which I want to briefly touch on. They mentioned it was odd that a venture capitalist is now the CEO of a startup, and how it is normally the person with the technical vision who heads up the company. I can’t agree more with The Register. For those who don’t know Rubrik and their CEO, the choice for Bipul Sinha may come as a surprise, or it may seem a bit odd. Then there are some who may say that it is a logical choice considering they are funded by Lightspeed… Truth of the matter is that Bipul Sinha is the person with the technical vision. I had the pleasure of seeing his vision evolve from a couple of scribbles on the whiteboard to what Rubrik is right now.

I still recall having a conversation with Bipul about the state of the “backup industry”, and I recall we agreed that the different components of a datacenter had evolved over time, but that the backup industry was still very much stuck in the old world. (We agreed backup and recovery solutions suck in most cases…) Back when we had this discussion there was nothing yet, no team, no name, just a vision. Knowing what is coming in the near future and knowing their vision, I do think this quote from the press release best captures what Rubrik is working on and what it will do:

Today we are excited to announce the first act in our product journey. We have built a powerful time machine that delivers live data and seamless scale in a hybrid cloud environment. Businesses can now break the shackles of legacy and modernize their data infrastructure, unleashing significant cost savings and management efficiencies.

Of course Rubrik would not be possible without a very strong team of founding members. Arvind Jain, Arvind Nithrakashyap and Soham Mazumdar are probably the strongest co-founders one could wish for. The engineering team has deep experience in building distributed systems, such as Google File System, Google Search, YouTube, Facebook Data Infrastructure, Amazon Infrastructure, and Data Domain File System. Expectations just went up a couple of notches, right?!

I agree that even the statement above is still a bit fluffy, so let’s add some more details: what are they working on? Rubrik is working on a solution which combines backup software and a backup storage appliance into a single solution, and initially it will target VMware environments. They are building (and I hate using this word) a hyperconverged backup solution and it will scale from 3 to 1000s of nodes. Note that this solution will be up and running in 15 minutes and includes the option to age out data to the public cloud. What impressed me most is that Rubrik can discover your datacenter without any agents, it scales out in a fully automated fashion, and it is capable of deduplicating / compressing data while also offering the ability to mount data instantly. All of this through a slick UI, or you can leverage the REST APIs; it is fully programmable end-to-end.

I just went over “instant mount” quickly, but I want to point out that this is not just for “restoring VMs”. Considering the REST APIs you can also imagine that this would be a perfect solution to enable test/dev environments or running Tier 2/3 workloads. How valuable is it to have instant copies of your production data available and test your new code against production without any interruption to your current environment? To throw a buzzword in there: perfectly fit for a devops world and continuous development.

That is about all I can say for now, unfortunately… For those who agree that backup/recovery has not evolved and who are interested in a backup solution for tomorrow, there is an early access program, and I urge you to sign up to learn more but also to help shape the product! The solution is targeting environments of 200 VMs and upwards, so make sure you meet those requirements. Read more here and/or follow them on twitter (or Bipul).

Good luck Rubrik, I am sure this is going to be a great journey!


About the Author

Duncan Epping is a Chief Technologist and Distinguished Engineering Architect at Broadcom. Besides writing on Yellow-Bricks, Duncan is the co-author of the vSAN Deep Dive and the vSphere Clustering Deep Dive book series. Duncan is also the host of the Unexplored Territory Podcast.
