Rubrik follow up, GA and funding announcement

Two months ago I published an introduction post on Rubrik. Yesterday Rubrik announced that their platform went GA and they announced a funding round (series B) of 41 million dollars led by Greylock. I want to congratulate Rubrik with this new milestone, major achievement and I am sure we will hear much more from them in the months to come. For those who don’t recall, here is what Rubrik is all about:

Rubrik is building a hyperconverged backup solution and it will scale from 3 to 1000s of nodes. Note that this solution will be up and running in 15 minutes and includes the option to age out data to the public cloud. What impressed me most is that Rubrik can discover your datacenter without any agents, it scales-out in a fully automated fashion and will be capable of deduplicating / compressing data but also offer the ability to mount data instantly. All of this through a slick UI or you can leverage the REST APIs , fully programmable end-to-end.

When I published the article some people made comments that you can do the above with various of other solutions and people asked why I was so excited about their solution. Well, first of all because you can do all of that from a single platform and don’t need a backup solution plus a storage solution and have multiple pieces to manage without scale-out capabilities. I like the model, the combination of what is being offered, the fact that is is a single package designed for this purpose and not glued together… But of course there is more, I just couldn’t talk about it yet. I am not gonna go in to an extreme amount of detail as Cormac wrote an excellent piece here and there is this great blog from Chris, who is a user of the product, which explains the value of the solution. (Always nice to see by the way people read your article and share their experience as well in return…)

I do want to touch on a couple of things which I feel sets Rubrik apart. (And there may be others who do this / offer this, but I haven’t been briefed by them.)

  • Global search across all data
    • “Google-alike” search, which means you start typing the name of a file in the UI of any VM and while typing the UI already presents a list of potential files you are looking for. Then when it shows the right file you click it and it presents a list of options. The file with this name could of course be on one or many VMs, you can pick which one you want and select from which point in time. When I was an admin I was often challenged with this problem “I deleted a file, I know the name… but no clue where I stored it, can you recover it?”. Well that is no problem any longer with global search, just type the name and restore it.
  • True Scale Out
    • I’d already highlighted this, but I agree with Scott Lowe that there is “scale-out” and there is “Scale-Out”. In the case of Rubrik we are talking scale out with capital S and capital O. Not just from a capacity stance, but also when it comes to (as Scott points out) task management and the ability to run any task anywhere in the cluster. So with each node you add you aren’t just scaling capacity, but also performance on all fronts. No single choking point with Rubrik as far as I can tell.
  • Miscellaneous, stuff that people take for granted… but does matter
    • API-Driven – Not something you would expect I would get excited about. And it seems such an obvious thing, but Rubrik’s solution can be configured and managed through the API they expose. Note that every single thing you see in the UI can be done through the API, the UI is simply an API client.
    • Well performing instant mount through the use of flash and serving the cluster up as a scale-out NFS solution to any vSphere host in your environment. Want to access a VM that was backed-up? Mount it!
    • Cloud archiving… Yes others offer this functionality I know. I still feel it is valuable enough to mention that Rubrik does offer the option to archive data to S3 for instance.

Of course there is more to Rubrik then what I just listed, read the articles by Scott, Cormac and Chris to get a good overview… Or just contact Rubrik and ask for a demo.

No one ever got fired for buying IBM/HP/DELL/EMC etc

Last week on twitter there was a discussion about hyper-converged solutions and how these were not what someone who works in an enterprise environment would buy for their tier 1 workloads. I asked the question: well what about buying Pure Storage, Tintri, Nimble or Solid Fire systems? All non-hyper converged solutions, but relatively new. Answer was straight forward: not buying those either, big risk. Then the classic comment came:

No one ever got fired for buying IBM (Dell, HP, NetApp, EMC… pick one)

Brilliant marketing slogan by the way (IBM) which has stuck around since the 70s and is now being used by many others. I wondered though… Did anyone ever get fired for buying Pure Storage? Or for buying Tintri? What about Nutanix? Or VMware Virtual SAN? Hold on, maybe someone got fired for buying Nimble, yeah probably Nimble then. No of course not, even after a dozen google searches nothing shows up. Why you may ask yourself, well because typically people don’t get fired for buying a certain solution. People get fired for being incompetent / lazy / stupid. In the case of infrastructure and workloads that translates in to managing and placing workloads incorrectly or misconfiguring infrastructure. Fatal mistakes which result in dataloss or long periods of downtime, that is what gets you fired.

Sure, buying from a startup may impose some risks. But I would hope that everyone reading this weighs those risks against the benefits, that is what you do as an architect in my opinion. You assess risks and you determine how to mitigate those within your budget. (Yes of course taking requirements and constraints in to account as well.)

Now when it comes to these newer storage solutions, and “new” is relative in this case as some have been around for over 5 years, I would argue that the risk is in most cases negligible. Will those newer storage systems be free of bugs? No, but neither will your legacy storage system be. Some of those systems have been around for over a decade and are now used in scenarios they were never designed for, which means that new problems may be exposed. I am not saying that legacy storage systems will break under your workload, but are you taking that risk in to account? Probably not, why not? Because hardly anyone talks about that risk.

If you (still) don’t feel comfortable with that “new” storage system (yet) but they do appear to give you that edge or bigger bang for the buck simply ask the sales rep a couple of questions which will help building trust:

  • How many systems are sold world wide similar to what you are looking to buy and for similar platforms
    • If they sold thousands, but none of them is using vSphere for instance then what are the chances of you hitting that driver problem firsts? If they sold thousand it will be useful to know…
  • How many customers for that particular model
    • Wouldn’t be the first time a vendors sells thousands of boxes to a single customer for a very specific use case and it works great for them, just not in your particular use case.
    • But if they have many customers, maybe ask…
  • If you can talk to a couple of customers
    • Best thing you can ask for in my opinion, reference call or visit. This is when you find out if what is promised actually is reality.

I do believe that the majority of infrastructure related startups are great companies with great technology. Personally I see a bigger threat in terms of sustainability, rather than technology. Not every startup is going to be around 10 years from now. But if you look at all the different storage (or infra) startups which are out there today, and then look at how they are doing in the market it shouldn’t be too difficult to figure out who is in it for the long run. Whether you buy from a well-established vendor or from a relatively new storage company, it is all about your workload. What are the requirements and how can those requirements be satisfied by that platform. Assess the risks and weigh them against the benefit and make a decision based on that. Don’t make decisions based on a marketing slogan that has been around since the 70s. The world looks different now, technology is moving faster than ever before, being stuck in the 70s is not going to help you or your company compete in this day and age.

VAAI support in vSphere Standard and up as of 6.0!

After some internal discussions over the last months it was decided to move VAAI (vSphere APIs for Array Integration) and Multi-Pathing down to vSphere Standard as of 6.0. Main reason for this was that Virtual Volumes, by many considered as the natural evolution of VAAI, is also part of vSphere Standard. So if you have vSphere Standard and a VAAI capable array and looking to move to 6.0, make sure to check the configuration of your hosts and use this great functionality! Note that VAAI did indeed already work in lower editions, but from a licensing point of view you weren’t entitled to it… I guess many folks never really looked at enabling / disabling it explicitly, but for those who did… now you can use it. More details on what is included with which license can be found here: http://www.vmware.com/au/products/vsphere/compare.html

VAAI support in vSphere Standard

This host currently has no network management redundancy

Bumped in to this a billion times by now, and I wouldn’t recommend applying this in production but for your lab when you need to take clean screenshots it works great. I’ve mentioned this setting before but as it was part of a larger article it doesn’t stand out when searching so I figured I would dedicate a short and simple article to it. Here is what you will need to do if you see the following message in the vSphere Web Client: this host currently has no network management redundancy.

  • Go to your Cluster object
  • Go to Settings
  • Go to “vSphere HA”
  • Click “Edit”
  • Add an advanced setting called “das.ignoreRedundantNetWarning”
  • Set the advanced setting to “true”
  • On each host right click and select “reconfigure for vSphere HA”

This is what it should look like in the UI:
This host currently has no network management redundancy

You can also do this in PowerCLI by the way, note that “Stretched-Bluefin-Frimley” is the name of my cluster.

New-AdvancedSetting -Entity Stretched-Bluefin-Frimley -type ClusterHA -Name "das.ignoreRedundantNetWarning" -Value "true" -force

 

Deploy VCSA 6.0 firstboot error

I was doing some tests in my lab and while deploying a new VCSA 6.0 I received an error that firstboot was unsuccessful. Not really a great error message if you ask me but okay. I had already validated DNS twice before I got started, but I checked it again just in case… DNS was all good, what else could it be? Figured NTP could be another problem and my friend William Lam confirmed that. I checked the host if NTP was configured and it was not for some reason. So I configured NTP on my ESXi hosts which was straight forward, but what about the VCSA I had deployed? Also not too complicated, I logged in via SSH and did the following:

  • ntp.get
    Will show “Status: Down”
  • ntp.server.add –servers 10.17.0.1
    This configures VCSA to fetch the time from ntp server to 10.17.0.1
  • timesync.set –mode NTP
    Make sure that the time sync is set to ntp
  • ntp.get
    Should show “Status: Up”

That should do it… By the way, you can simply check “resolv.conf” for DNS to see how it is configured today, also look at “hosts” for the host name etc.