
Cohesity announces 4.0 and Round C funding

Duncan Epping · Apr 4, 2017

Earlier this week I was on the phone with Rawlinson Rivera, my former VMware/vSAN colleague, and he told me all about the new Cohesity features that were just announced. First of all, congrats on the Round C funding. As we’ve all seen, it has been mayhem in the storage world lately, so landing a $90 million round is big. The round was co-led by GV (formerly Google Ventures) and Sequoia Capital, with Cisco Investments and Hewlett Packard Enterprise (HPE) also participating as strategic investors. I am not an analyst and I am not going to pretend to be one, so let’s talk tech.

Besides the funding round, Cohesity also announced the 4.0 release of their hyper-converged secondary storage platform. Now, let me be clear: I am not a fan of the “hyper-converged” term used here. Why? Well, I think this is a converged solution. They combined multiple secondary storage use cases into a single appliance. Hyper-converged stands for something specific in the industry, and usually it means the combination of a hypervisor, storage software and hardware. The hypervisor is missing here. (No, I am not saying the “hyper” in “hyper-converged” stands for hypervisor.) Anyway, let’s continue.

4.0 introduces some really big functionality. Let’s list it out and then discuss each item in turn:

S3 Compatible Object Storage
Quotas for File Services
NAS Data Protection
RBAC for Data Protection
Folder and Tag based protection
Erasure Coding

As of 4.0 you can create S3 buckets on the Cohesity platform. Besides replicating to an S3 bucket, you can now also present them! This is fully S3 compatible, and buckets can be created through their simple UI. You can also apply all of their data protection logic to these buckets, so you get cloud archival, tiering and replication, and you can enable encryption and data retention and create snapshots.

Cohesity already offered file services (NFS and SMB), and in this release they are expanding that functionality. The big request from customers was quotas, and those are introduced in 4.0, along with what they call Write-Once-Read-Many (WORM) capabilities, which in this case refers to data retention (write once, keep forever).
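To make the two file-services concepts concrete, here is a minimal, purely hypothetical sketch of what a quota and WORM retention mean for a share. The `Share` class, its methods and the in-memory model are assumptions made for illustration only. This is not Cohesity’s implementation or API.

```python
class QuotaExceeded(Exception):
    """Raised when a write would push the share over its quota."""


class WormViolation(Exception):
    """Raised when WORM-retained data would be overwritten or deleted."""


class Share:
    """Hypothetical in-memory share with a hard quota and optional WORM retention."""

    def __init__(self, quota_bytes: int, worm: bool = False) -> None:
        self.quota_bytes = quota_bytes
        self.worm = worm
        self.files: dict[str, bytes] = {}

    def used_bytes(self) -> int:
        return sum(len(data) for data in self.files.values())

    def write(self, path: str, data: bytes) -> None:
        # WORM: a file can be written once; later overwrites are refused.
        if self.worm and path in self.files:
            raise WormViolation(f"{path} is retained and cannot be overwritten")
        # Quota: the new size replaces the old size of an overwritten file.
        new_total = self.used_bytes() - len(self.files.get(path, b"")) + len(data)
        if new_total > self.quota_bytes:
            raise QuotaExceeded(f"{new_total} bytes exceeds quota of {self.quota_bytes}")
        self.files[path] = data

    def delete(self, path: str) -> None:
        # WORM: retained data is kept, so deletes are refused as well.
        if self.worm:
            raise WormViolation(f"{path} is retained and cannot be deleted")
        del self.files[path]
```

On a share created with `worm=True`, the first write of a file succeeds. Any later overwrite or delete of that file raises `WormViolation`. On any share, a write that would exceed `quota_bytes` raises `QuotaExceeded`.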

For the Data Protection platform they now offer NAS Data Protection. Basically, they can connect to a NAS device and protect everything stored on it by snapping the data and storing it on their platform. So if you have a NetApp filer, for instance, you can now protect it by offloading the data to the Cohesity platform. For the Data Protection solution they also introduce Role Based Access Control. I think this was one of the big-ticket items missing, and with 4.0 they now provide it. Then there is “vCenter Integration”, which means they can now auto-protect groups of VMs based on the folder they are in or the tag they have been given. Just imagine you have 5000 VMs. You don’t want to associate a backup scheme with each of them individually. You would much rather do that for a group of VMs with a similar SLA at a time. Give them a tag and associate the tag with the protection scheme (see screenshot). Same for folders, easy.
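Tag and folder based protection comes down to resolving each VM to a protection policy through its tag or folder, rather than assigning policies VM by VM. The minimal sketch below illustrates that idea. It assumes a hypothetical inventory of VM name, folder and tags, and uses hypothetical policy names. It is not Cohesity’s or vCenter’s API. Letting a tag match win over a folder match is a design choice of this sketch, not documented product behavior.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    name: str
    folder: str
    tags: set[str] = field(default_factory=set)


def resolve_policies(vms, tag_policies, folder_policies):
    """Map each VM name to a protection policy, or None if it is unprotected.

    In this sketch a tag match takes precedence over a folder match.
    """
    result = {}
    for vm in vms:
        policy = None
        # Check tags in sorted order so the outcome is deterministic.
        for tag in sorted(vm.tags):
            if tag in tag_policies:
                policy = tag_policies[tag]
                break
        if policy is None:
            policy = folder_policies.get(vm.folder)
        result[vm.name] = policy
    return result


# Hypothetical inventory; 5000 VMs would be handled the same way.
inventory = [
    VM("sql-01", "Databases", {"gold"}),
    VM("web-01", "Web"),
    VM("test-01", "Sandbox"),
]
print(resolve_policies(inventory, {"gold": "Gold-SLA"}, {"Web": "Silver-SLA"}))
# {'sql-01': 'Gold-SLA', 'web-01': 'Silver-SLA', 'test-01': None}
```

New VMs are picked up as soon as they receive the tag or land in the folder, which is why this scales better than per-VM assignment.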

Last but not least: Erasure Coding. This is not a “front-end” feature, but it is very useful to have. Especially in larger configurations it can save a lot of precious disk space. Today they more or less have a “RAID-1” mechanism, where each block is replicated / mirrored to another host in the cluster. This results in 100% overhead. In other words, for every 100GB stored you need 200GB of capacity. With a 3+1 Erasure Coding scheme that overhead drops immediately to 33% (40% for 5+2). Put differently, compared to mirroring, a 3+1 scheme gives you 50% more usable capacity and a 5+2 scheme (double protection) gives you 43% more. Big savings, a lot of extra usable capacity.
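The percentages follow from simple ratios. A k+m erasure coding scheme writes k data blocks plus m parity blocks, so the overhead is m/k and the usable share of raw capacity is k/(k+m). Mirroring is effectively 1+1. The sketch below reproduces the numbers. It is only a back-of-the-envelope model: it ignores metadata, slack space and any vendor-specific reservations, and the scheme names are just labels.

```python
from fractions import Fraction


def overhead(data: int, parity: int) -> Fraction:
    """Extra raw capacity needed per unit of data stored."""
    return Fraction(parity, data)


def usable_fraction(data: int, parity: int) -> Fraction:
    """Share of raw capacity that holds actual data."""
    return Fraction(data, data + parity)


MIRROR = (1, 1)  # RAID-1 style: one data copy plus one full mirror copy

for name, (d, p) in {"RAID-1": MIRROR, "EC 3+1": (3, 1), "EC 5+2": (5, 2)}.items():
    # Relative gain in usable capacity compared to mirroring the same raw disks.
    gain = usable_fraction(d, p) / usable_fraction(*MIRROR) - 1
    print(f"{name}: overhead {float(overhead(d, p)):.0%}, "
          f"raw for 100GB {100 * (d + p) / d:.0f}GB, "
          f"usable capacity vs mirroring +{float(gain):.0%}")
```

For 3+1 this prints a 33% overhead and +50% usable capacity. For 5+2 it prints a 40% overhead and +43% usable capacity.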

Oh, and before I forget: besides getting Cisco and HPE as investors, you can now also install Cohesity on Cisco kit (there’s a list of approved configurations). HPE took it one step further: they can sell you a configuration with Cohesity included and pre-installed. Smart move.

All in all, some great new functionality and some great enhancements to the current offering. Good work Cohesity, looking forward to seeing what is next for you guys.

vSAN needs 3 fault domains

Duncan Epping · Mar 29, 2017

I have been having discussions with various customers about all sorts of highly available vSAN environments. Now that vSAN has been available for a couple of years, customers are becoming more and more comfortable designing these infrastructures, which also leads to some interesting discussions. Many discussions these days are about multi-room or multi-site infrastructures. A lot of customers seem to have multiple datacenter rooms in the same building, or across a campus. When going through these different designs one thing stands out: in many cases customers have a dual-datacenter configuration, and the question is whether they can use stretched clustering across two rooms or fault domains across two rooms.

Of course, theoretically this is possible (not supported, but you can do it). Just look at the diagram below. We have 2 clusters across 2 rooms and cross-host the witnesses, protecting each witness by hosting it on the other vSAN cluster:

The challenge with these types of configurations is what happens when a datacenter room goes down. What a lot of people tend to forget is that the impact varies depending on what fails. In the scenario above, where you cross-host the witnesses, the failure of “Site A” (the left part of the diagram) results in the full environment being unavailable. Really? Yeah, really:

Site A is down
Hosts-1a / 2a / 1b / 2b are unavailable
Witness B for Cluster B is down >> as such Cluster B is down as majority is lost
As Cluster B is down (temporarily), Cluster A is also impacted as Witness A is hosted on Cluster B
So we now have a circular dependency
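The failure chain above can be replayed with a small quorum model. The sketch below is a simplification, not how vSAN counts votes internally. It assumes each stretched cluster has three equal votes (two data sites plus the witness) and stays up only with more than half of them. The witness site placement follows the failure walk-through: Witness B runs in Site A, and Witness A is assumed to run in Site B. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StretchedCluster:
    name: str
    data_sites: tuple[str, str]  # the two rooms/sites holding data hosts
    witness_site: str            # room/site where the witness VM physically runs
    witness_on: str              # name of the cluster whose datastore hosts the witness VM


def surviving_clusters(clusters, failed_sites):
    """Return the names of clusters that keep quorum after failed_sites go down.

    Simplified model: each cluster has three votes (two data sites plus the
    witness) and stays up only with more than half of them. Failures are
    propagated until nothing changes, because a cluster that goes down also
    takes down any witness VM it hosts.
    """
    up = {c.name for c in clusters}  # start from the healthy state
    while True:
        still_up = set()
        for c in clusters:
            votes = sum(site not in failed_sites for site in c.data_sites)
            if c.witness_site not in failed_sites and c.witness_on in up:
                votes += 1
            if votes * 2 > 3:
                still_up.add(c.name)
        if still_up == up:
            return up
        up = still_up


# Cross-hosted witnesses: Witness A runs on Cluster B in Site B,
# Witness B runs on Cluster A in Site A.
cross_hosted = [
    StretchedCluster("Cluster A", ("Site A", "Site B"), "Site B", "Cluster B"),
    StretchedCluster("Cluster B", ("Site A", "Site B"), "Site A", "Cluster A"),
]
print(sorted(surviving_clusters(cross_hosted, {"Site A"})))  # []
```

In this model a Site B failure also returns an empty list. Placing both witnesses in Site B keeps both clusters up when Site A fails, but loses both when Site B fails.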

Some may say: well, you can move Witness B to the same side as Witness A, meaning Site B. But now if Site B fails, both witness VMs are gone, directly impacting all clusters. That would only work if Site A is the only site ever expected to go down, and who can give that guarantee? Of course the same applies to using “fault domains”. Just look at the diagram below:

In this scenario we have the “orange” fault domain in Room A, “yellow” in Room B and “green” across rooms, as there is no other option at that point. If Room A fails, VMs that have components in “orange” and on “Host3” will be impacted directly. As more than 50% of their components are lost, these VMs cannot be restarted in Room B. Only when their component in fault domain “green” happens to be on “Host-6” can the VMs be restarted. Yes, in terms of setting up your fault domains this is possible and supported, but it isn’t recommended. No guarantee can be given that your VMs will be restarted when either of the rooms fails. My tip of the day: when you start working on your design, overlay the virtual world with the physical world and run through failure scenarios step by step. What happens if Host 1 fails? What happens if Site 1 fails? What happens if Room A fails?
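Running through such a failure scenario step by step can be as simple as counting surviving components. The sketch below assumes a hypothetical layout: only Host3 and Host-6 come from the text, and the other host names and their fault domain membership are made up. It also simplifies vSAN voting to one vote per component. It models a mirrored (FTT=1) object with one component in each fault domain.

```python
# Hypothetical layout: host-to-room mapping per fault domain.
HOST_ROOM = {
    "Host-1": "Room A", "Host-2": "Room A",  # fault domain "orange"
    "Host-4": "Room B", "Host-5": "Room B",  # fault domain "yellow"
    "Host-3": "Room A", "Host-6": "Room B",  # fault domain "green", spans both rooms
}


def object_accessible(component_hosts, failed_room):
    """True if more than 50% of the object's components survive the room failure.

    Simplification: every component carries one vote.
    """
    surviving = sum(HOST_ROOM[h] != failed_room for h in component_hosts)
    return surviving * 2 > len(component_hosts)


# One component per fault domain: orange, yellow, green.
vm_green_on_host3 = ["Host-1", "Host-4", "Host-3"]
vm_green_on_host6 = ["Host-1", "Host-4", "Host-6"]
print(object_accessible(vm_green_on_host3, "Room A"))  # False: 1 of 3 left
print(object_accessible(vm_green_on_host6, "Room A"))  # True: 2 of 3 left
```

Repeating the check for a Room B failure flips the result: the first VM survives and the second does not. That is exactly why no guarantee can be given for either room.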

So far I have been talking about fault domains and stretched clusters. These are all logical / virtual constructs that are not necessarily tied to physical constructs. In reality, however, when you design for availability and try to prevent any type of failure from impacting your environment, the physical aspect should be considered at all times. Fault domains are not random logical constructs. There’s a requirement for 3 fault domains at a minimum, so make sure you have 3 fault domains physically as well. Just to be clear, in a stretched cluster the witness acts as the 3rd fault domain. If you do not have 3 physical locations (or rooms), look for alternatives! One option, for instance, is vCloud Air: you can host your stretched cluster witness there if needed!

Virtually Speaking Podcast – vSAN Customer Use Cases

Duncan Epping · Feb 22, 2017

As John Nicholson was traveling in and around New Zealand, Pete asked me if I could co-host the Virtually Speaking Podcast again. It is always entertaining to join, and Pete is such a natural when it comes to these things. I, euuh, well, I do my best to keep up with him :). Below you can find the latest episode, on the topic of vSAN Customer Use Cases. It includes a lot of soundbites recorded at VMware World Wide Kick Off / Tech Summit, a VMware-internal event for all field-facing Sales, Pre-Sales and Post-Sales people.

You can of course also subscribe on iTunes!

Rubrik update >> 3.1

Duncan Epping · Feb 8, 2017

It has been a while since I wrote about Rubrik. This week I was briefed by Chris Wahl on what is coming in their next release, called Cloud Data Management 3.1. As Chris mentioned during the briefing, backup solutions grab data. In most cases this data is then never used, or it is used for restores and that is it. That is a bit of a waste when you consider the various other use cases for this data.

First of all, from a backup and recovery perspective it should be possible to set a policy, secure the data, validate compliance and search the data. On top of that, the data set should be fully indexed and accessible through APIs. These allow you to automate and orchestrate various types of workflows, for instance providing the data to developers for test/dev purposes.

Anyway, what was introduced in Cloud Data Management 3.1? From a source perspective, Rubrik today supports vSphere, SQL Server, Linux and NAS. With 3.1, “physical” Windows (or native, whatever you want to call it) is also supported (Windows 2008 R2, 2012 and 2012 R2). It is fully policy based, similar to how they implemented it for vSphere. Support for SQL Server Failover Clustering (WSFC) was also added. Note that the Rubrik connector must be installed on both nodes. Rubrik will automatically recognize that the hosts are part of a cluster and provide additional restore options etc.

There are a couple of user experience improvements as well. Instead of being “virtual machine” centric, the UI now revolves around “hosts”, meaning the focus is on the OS. For instance, it shows all protected file systems and a calendar with the host’s snapshots per day. One of the areas where Rubrik still had some gaps was reporting and analytics, and with 3.1 Rubrik Envision is introduced.

Rubrik Envision lets you build your own fully customisable reports, and of course provides different charts and filtering / query options. Reports can be viewed, downloaded and emailed in HTML5 format. This can also be done on a schedule: create a report and schedule it to be sent out. Four standard reports are included to get you started, and of course you can tweak those if needed.

(blatantly stole this image from Mr Wahl)

Cloud Data Management 3.1 also adds software-based encryption at rest (AES-256), where in the past self-encrypting drives were used. The great thing is that this will be supported on the entire R300 series. A single click enables it, nice! When thinking about this later, I asked Chris a question about multi-tenancy and he mentioned something I had not realized:

For multi tenant environments, we’re encrypting data transfers in and out of the appliance using SSL certificates between the clusters (such as hosting provider cluster to customer cluster), which are logically divided by SLA Domains. Customers don’t have any visibility into other replication customers and can supply their own keys for archive encryption (Azure, AWS, Object, etc.)

That was a nice surprise to me. Especially in multi-tenancy environments or large enterprise organizations with clear separation between business units that is a nice plus.

Chris also mentioned some “minor” changes. In the past Rubrik would help with every upgrade, but this didn’t scale well, and some customers have Rubrik gear installed in a “dark site” (meaning no remote connection, for security purposes). With the 3.1 release customers have the option to upgrade themselves: download the binary, upload it to the box, type upgrade and things happen. Restores directly to ESXi are also a useful addition. In the past you needed vCenter in place first. There are some other restore enhancements as well, but they are too many little things to go into. Overall a good, solid update if you ask me.

Last but not least, from a company/business point of view: 250 people work at Rubrik right now, and they have seen 6x growth in customer acquisition, which is great to hear. (No statement around customer count though.) I am sure we will hear more from them in the future. They have a good story, a good product, and they are solving a real pain point in most datacenters today: backup/recovery and the explosion of data sets and data growth. Plenty of opportunities if you ask me.

Two host stretched vSAN cluster with Standard license?

Duncan Epping · Jan 24, 2017

I was asked today whether it is possible to create a 2-host stretched cluster using a vSAN Standard license or a ROBO Standard license. First of all, from a licensing point of view the EULA states you are allowed to do this with a Standard license:

A Cluster containing exactly two Servers, commonly referred to as a 2-node Cluster, can be deployed as a Stretched Cluster. Clusters with three or more Servers are not allowed to be deployed as a Stretched Cluster, and the use of the Software in these Clusters is limited to using only a physical Server or a group of physical Servers as Fault Domains.

I figured I would give it a go in my lab. Exec summary: worked like a charm!

Loaded up the ROBO license:

Go to the Fault Domains & Stretched Cluster section under “Virtual SAN” and click Configure. Add one host to the “preferred” fault domain and one to the “secondary” fault domain:

Select the Witness host:

Select the witness disks for the vSAN cluster:

Click Finish:

And then the 2-node stretched cluster is formed using a Standard or ROBO license:

Of course I also tried this with 3 hosts. That failed, as my license does not allow me to create a stretched cluster larger than 1+1+1. And even if it had succeeded, the EULA clearly states that you are not allowed to do so; you need Enterprise licenses for that.
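The licensing rule boils down to a simple check on the number of data hosts. The sketch below is a hypothetical pre-flight helper, not a VMware tool or API. It encodes the quoted EULA rule for Standard and ROBO Standard, and the statement that larger stretched clusters need Enterprise. The edition strings are assumptions, and the sketch does not model any upper host limits.

```python
def stretched_cluster_allowed(edition: str, data_hosts: int) -> bool:
    """Hypothetical check of the EULA rule quoted above.

    Standard / ROBO Standard: only a cluster with exactly two data hosts
    (1+1, plus the witness) may be deployed as a stretched cluster.
    Enterprise: larger stretched clusters are allowed (at least one data
    host per site; upper limits are not modeled here).
    """
    if edition in ("Standard", "ROBO Standard"):
        return data_hosts == 2
    if edition == "Enterprise":
        return data_hosts >= 2
    raise ValueError(f"unknown edition: {edition}")


print(stretched_cluster_allowed("ROBO Standard", 2))  # True: the lab setup above
print(stretched_cluster_allowed("ROBO Standard", 3))  # False: needs Enterprise
```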

There you have it: a two-host stretched cluster using vSAN Standard, nice right?!