data protection

Does vSAN Data Protection work with vSAN Stretched Clusters and can snapshots be stretched?

Duncan Epping · Oct 18, 2024

I have written a few articles about vSAN Data Protection now, and my last article featured a nice vSAN DP demo. A very good question was asked in the comment section, and it was about vSAN Stretched Clusters. Basically, the question was whether Snapshots are also stretched across locations. This is a great question, as there are a couple of things which are probably worth explaining again.

vSAN Data Protection relies on the snapshot capability which was introduced with vSAN ESA. This snapshot capability in vSAN ESA is significantly different than with vSAN OSA or with VMFS. With vSAN OSA and VMFS when you create a snapshot a new object (vSAN) or file (VMFS) is created. With vSAN ESA this is no longer the case as we don’t create additional files or objects, but we create a copy of the metadata structure instead. This is why vSAN ESA snapshots perform much better than vSAN OSA or VMFS snapshots do, as we no longer need to traverse multiple files or objects to read data. We can simply use the same object, and leverage the metadata structure to keep track of what has changed.
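To make the difference a bit more tangible, here is a toy Python model of the two approaches. It is only a sketch under my own assumptions: the class names, the dictionary-based block maps, and the lookup counter are all made up for illustration and say nothing about how vSAN is actually implemented on disk.

```python
class DeltaChain:
    """Toy model of the OSA/VMFS style: every snapshot adds a new delta layer."""

    def __init__(self):
        self.layers = [{}]  # oldest layer first, last layer is the running point

    def write(self, block, data):
        self.layers[-1][block] = data

    def snapshot(self):
        self.layers.append({})  # new writes land in a fresh layer

    def read(self, block):
        lookups = 0
        # Walk from the newest layer back to the oldest until the block is found.
        for layer in reversed(self.layers):
            lookups += 1
            if block in layer:
                return layer[block], lookups
        return None, lookups


class MetadataSnapshots:
    """Toy model of the ESA style: one shared data store, snapshots copy the block map."""

    def __init__(self):
        self.store = []       # append-only data, shared by live state and snapshots
        self.live_map = {}    # block -> index into store
        self.snapshots = []   # each snapshot is a copy of the block map only

    def write(self, block, data):
        self.store.append(data)
        self.live_map[block] = len(self.store) - 1

    def snapshot(self):
        self.snapshots.append(dict(self.live_map))  # metadata copy, no data copy

    def read(self, block, snapshot=None):
        mapping = self.live_map if snapshot is None else self.snapshots[snapshot]
        index = mapping.get(block)
        return (None if index is None else self.store[index]), 1
```

In this model a block written before three snapshots costs four lookups in the delta chain, while the metadata variant always resolves it in a single lookup, regardless of how many snapshots exist.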

Now, with vSAN, as most of you hopefully know, objects (and their components) are placed across the cluster based on what is specified within the storage policy that is associated with the object or VM. In other words, if the policy states FTT=1 and RAID-1, then you will see 2 copies of the data. If the policy states the data needs to be stretched across locations, and within each location be protected with RAID-5, then you will see a RAID-1 configuration across sites and a RAID-5 configuration within each site. As vSAN ESA snapshots are an integral part of the object, the snapshots automatically follow all requirements as defined within the policy. In other words, if the policy says stretched then the snapshot will also automatically be stretched.
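The reasoning above can be sketched in a few lines of Python. This is purely illustrative: the `Policy` class, its field names, and the returned dictionary are hypothetical and do not correspond to any real vSAN API or policy format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    stretched: bool   # hypothetical flag: mirror the data across both sites
    local_raid: str   # protection within a site or cluster, e.g. "RAID-1" or "RAID-5"


def layout(policy):
    """Describe where the data of an object lands for a given policy."""
    if policy.stretched:
        return {
            "across_sites": "RAID-1",
            "per_site": {"preferred": policy.local_raid, "secondary": policy.local_raid},
        }
    return {"across_sites": None, "per_site": {"cluster": policy.local_raid}}


def snapshot_layout(policy):
    """ESA snapshots are part of the object, so they simply share its layout."""
    return layout(policy)


stretched_r5 = Policy(stretched=True, local_raid="RAID-5")
print(layout(stretched_r5))
print(snapshot_layout(stretched_r5) == layout(stretched_r5))
```

The only point of the sketch is the last function: there is no separate placement decision for a snapshot, it inherits whatever the object's policy dictates.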

There is one caveat I want to call out, and for that I want to show a diagram. The diagram below shows the Data Protection Appliance, aka the snapshot manager appliance. As you can see, it states “metadata decoupled from appliance” and it links somehow to a global namespace object. This global namespace object is where all the details of the protected VMs (and more) are stored. As you can imagine, both the Snapshot Manager and the Global Namespace object should also be stretched. For the global namespace object this means that you need to ensure that the default datastore policy is set to “stretched”, and of course for the snapshot manager appliance you can simply select the correct policy when provisioning the appliance. Either way, make sure the default datastore policy aligns with the disaster recovery and data protection policy.

I hope this helps those exploring vSAN Data Protection in a stretched cluster configuration!

vSAN Data Protection, what can you do with it today?

Duncan Epping · Jul 2, 2024

Last week I posted a blog about how to get vSAN Data Protection up and running, but I never explained why you may want to do this in the first place. Before we dive into it, it probably is smart to also mention that vSAN Data Protection leverages the new snapshotting capability which is part of the vSAN Express Storage Architecture (vSAN ESA). vSAN ESA has a snapshotting capability that infinitely scales. It literally is 100x better than the snapshotting mechanism that is part of vSAN OSA. The reason for this is simple: with vSAN OSA (and VMFS for that matter), when you create a snapshot a new object is created and IO is redirected. With vSAN ESA we basically copy the metadata structure and keep using the existing objects for IO, which means we don’t need to traverse multiple files for reads, for instance, and when we delete a snapshot we don’t need to copy data, as it is all about metadata changes / tracking.

Now in vSphere / vSAN 8.0 Update 3 we introduce a new capability called vSAN Data Protection. This provides you the ability to schedule snapshots, create immutable snapshots, clone snapshots, restore snapshots etc. All this through the vSphere Client interface, of which you can see an example below.

Now, in the above screenshot you see that half of the VMs are protected, and the other half is not. Why is that? Well, simply because I decided to only protect half of my VMs when I created my snapshot schedule. This is an “opt-in” mechanism, so you create a schedule, you decide which VMs are part of a schedule and which are not! Let’s take a look at the scheduling mechanism. You can find the mechanism at the cluster level under “Configuration -> vSAN -> Data Protection”. When you go there you see the above screen and you can click on “protection groups” and then “Create protection group”.

This will then present you a screen where you can define the name of the protection group, enable “immutability mode” and select the “Membership” of the protection group. Personally I like the Dynamic VM Patterns option. It just makes things easy. In this example I used “*de*” as a pattern, which means that all VMs with “de” in the name will be snapshotted whenever the schedule triggers a snapshot.
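If you want to get a feel for which VMs a wildcard pattern like this would pick up, a quick Python sketch with the standard `fnmatch` module does the trick. Note the assumptions: the VM names are made up, and I am assuming simple shell-style, case-sensitive wildcard matching here, which may not be exactly how vSAN Data Protection evaluates its patterns.

```python
from fnmatch import fnmatchcase


def matching_vms(pattern, vm_names):
    """Return the VM names that match a shell-style wildcard pattern."""
    return [name for name in vm_names if fnmatchcase(name, pattern)]


# Hypothetical inventory, only for illustration.
vms = ["web-de-01", "db-nl-01", "decom-test", "app-us-02"]
print(matching_vms("*de*", vms))  # prints ['web-de-01', 'decom-test']
```

Running something like this against an export of your VM names before you create the protection group is an easy way to avoid surprises, especially when the group is going to be immutable.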

As mentioned, vSAN Data Protection includes ALL the virtual machines which match the pattern when the snapshot is taken. So if you create a schedule like for instance the following, it could take close to an hour before the VM is snapshotted for the very first time. If you feel this is too long then there’s of course also an option to manually snapshot the VM.

The schedules you can create can be really complex (up to 10 schedules). I just created an example of what is possible, but there are many different options and combinations you can create. Note that a VM can be part of three different protection groups at most, and I also want to point out that the protection group is not a consistency group. Also note that you can have at most 200 snapshots per object/VM, so that is also something to take into consideration.
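To see how quickly schedules add up against those limits, here is a rough Python sketch. The three limits come from the text above; everything else is an assumption on my part: that the limit of 10 schedules applies per protection group, that a schedule keeps roughly retention divided by interval snapshots, and that all schedules of all groups count towards the 200 per VM.

```python
from datetime import timedelta

MAX_SCHEDULES_PER_GROUP = 10   # assumed to apply per protection group
MAX_GROUPS_PER_VM = 3
MAX_SNAPSHOTS_PER_VM = 200


def retained_snapshots(interval, retention):
    """Approximate steady-state snapshot count for one schedule."""
    return retention // interval  # timedelta // timedelta yields an int


def check_vm(groups):
    """groups: one list of (interval, retention) schedules per protection group."""
    problems = []
    if len(groups) > MAX_GROUPS_PER_VM:
        problems.append(f"VM is in {len(groups)} protection groups")
    for number, schedules in enumerate(groups, start=1):
        if len(schedules) > MAX_SCHEDULES_PER_GROUP:
            problems.append(f"group {number} has {len(schedules)} schedules")
    total = sum(retained_snapshots(i, r) for schedules in groups for i, r in schedules)
    if total > MAX_SNAPSHOTS_PER_VM:
        problems.append(f"about {total} snapshots retained")
    return problems


hourly_for_a_week = (timedelta(hours=1), timedelta(days=7))       # 168 snapshots
half_hourly_for_a_day = (timedelta(minutes=30), timedelta(days=1))  # 48 snapshots
print(check_vm([[hourly_for_a_week, half_hourly_for_a_day]]))
```

In this example two fairly innocent looking schedules already add up to roughly 216 snapshots for a single VM, which is over the limit.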

Now when the VMs are snapshotted you get a few extra options. You have “snapshot management” at the individual VM level, and of course you can also manage things at the protection group level. You will see options like “restore” and “clone”, and this is where it gets interesting, as this allows you to go back to a particular point in time from a particular snapshot, and also to clone the VM from a particular snapshot. Why would you clone it? Well, if you want to test something against a specific dataset for instance, or if you want to do some type of digital forensics and analyze a VM for whatever reason.

One thing that is also good to mention is that with vSAN Data Protection, you can also recover VMs which have been deleted from vCenter Server. This is one of those must-have features in my opinion, as this is one of those things that does occasionally happen, unfortunately. Power Off VM, delete … oops, wrong one! When it comes to the recovery of a VM, the process is straightforward. You select the snapshot and click “restore VM”, or “clone VM”, depending on what you would expect as the outcome. Restore means the current instance is powered off, and the snapshot is the restore point when you power it on again. Of course, when you do a clone you simply get a second instance of that VM. Note that when you create a clone, it is a linked clone, so it is extremely fast to instantiate.

The last thing I want to discuss is the immutability mode. I want to first caution people, this is what you think it is… it is IMMUTABLE! This means that you cannot delete the snapshots or change the schedule etc, let me quote the documentation:

Enable immutability mode on a protection group for additional security. You cannot edit or delete this protection group, change the VM membership, edit or delete snapshots. An immutable snapshot is a read-only copy of data that cannot be modified or deleted, even by an attacker with administrative privileges.

That is why I wanted to caution people, because if you create an immutable protection group with hundreds of VMs by accident… Yes, snapshots will be immutable, you cannot delete the protection group or the snapshots, or change the snapshot schedule. No, not even as an administrator. Do note, you can delete the VM if you want…

Anyway, I hope this gave a brief overview of what you can do with vSAN Data Protection at the moment. I’ll play around with it some more, and report back when there’s more to share 🙂

Where’s my vSAN Data Protection screen in 8.0 U3?

Duncan Epping · Jun 28, 2024

The first time I deployed vSphere/vSAN 8.0 U3 I immediately looked for the vSAN Data Protection UI. I always get excited about new features, and simply want to test it. I mean who doesn’t like scalable snapshots and a great way of managing snapshot schedules? Finally available within the vSphere Client! Of course, I could not find it, but I figured that was because I was on some weird alpha build of the product. Now that the product has shipped it must be there out of the box right?

No it isn’t. You will need to deploy an appliance in order for this functionality to appear in the UI. The appliance can be found under “Drivers and Tools” under the vSphere Hypervisor download (which is under VMware vSphere); it is called “VMware vSAN Snapshot Service Appliance”. The current version is named “snapservice_appliance-8.0.3.0-24057802_OVF10.ova”. You need to deploy this OVA, and I would highly recommend requesting a DNS name for it and having it properly registered. I fiddled around with the hosts file on VCSA and forgot to add the name to the local hosts file on my laptop, and had some weird issues as a result, which I am trying to reproduce at the moment. I will report back if/when I can.

The other thing to point out is the following: the documentation tells you to download the certs and copy the text for the appliance. As that isn’t something most of us do daily, you can simply open a web browser and use the following URL “https://<name of your vCenter server>/certs/download.zip” to download the certs, and then unzip the downloaded file. (More details to be found here.) It will contain the certs, and if you open the cert with a proper text editor you can copy/paste that into the deployment screen for the OVA. (Yes, I know there are other ways as well, but I found this one to be the easiest.)
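If you prefer a script over a browser, the same steps can be sketched in Python using only the standard library. Treat this as a sketch under assumptions: the function names are mine, the file suffixes I filter on (`.0`, `.crt`, `.pem`) are a guess at what the archive contains, and because your machine may not trust the vCenter certificate yet, you may need to pass your own `ssl` context to `download_certs`.

```python
import io
import urllib.request
import zipfile


def certs_url(vcenter_fqdn):
    """Build the download URL mentioned above for a given vCenter Server name."""
    return f"https://{vcenter_fqdn}/certs/download.zip"


def extract_cert_texts(zip_bytes, suffixes=(".0", ".crt", ".pem")):
    """Return {archive member name: text} for certificate-like files in the zip."""
    found = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith(suffixes):
                found[name] = archive.read(name).decode("utf-8", errors="replace")
    return found


def download_certs(vcenter_fqdn, context=None):
    """Download the archive and return the certificate texts it contains."""
    with urllib.request.urlopen(certs_url(vcenter_fqdn), context=context) as response:
        return extract_cert_texts(response.read())
```

The text returned for a certificate file is what you would otherwise copy out of your text editor and paste into the OVA deployment screen.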

Once you have deployed the OVA and everything is configured correctly, you should see a successful task, or actually two: download plugin and deploy plugin, as shown in the next screenshot.

If you do get the “error downloading plug-in” error message, it likely is one of two things:

DNS / Hosts files are not correctly configured, resulting in the URL not being reachable. Make sure you can resolve the URL!
Cert thumbprint was incorrectly copied/pasted, there’s a whole section on troubleshooting this here.
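Both causes are easy to check from a script. The Python sketch below resolves a name and computes a certificate thumbprint from PEM text, so you can compare it with what you pasted. The function names are mine, and the use of SHA-256 with colon-separated uppercase hex is an assumption; check which format your deployment screen actually expects.

```python
import hashlib
import socket
import ssl


def resolves(hostname):
    """Return True if the name (or IP address) can be resolved from this machine."""
    try:
        socket.getaddrinfo(hostname, 443)
        return True
    except socket.gaierror:
        return False


def sha256_thumbprint(pem_text):
    """Return the SHA-256 thumbprint of a PEM certificate as colon-separated hex."""
    der = ssl.PEM_cert_to_DER_cert(pem_text.strip())
    digest = hashlib.sha256(der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))
```

Run `resolves()` for both the vCenter Server and the appliance name, and do so from every machine involved, as a name that resolves on your laptop may still not resolve on the VCSA.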

Okay, now that I got the appliance up and running, I will probably do a follow-up post on what you can do with it 🙂

Startup intro: SaaS-based backup solution Clumio

Duncan Epping · Apr 6, 2020

Last week I saw an update from one of the Clumio founders on Twitter. It reminded me that I had promised to take a look at their product. This week I had a meeting set up with Clumio and we briefly went over their product and how to configure it. Clumio is a SaaS-based backup solution that was founded in 2017 by former PernixData, Nutanix, and EMC folks. The three founders are Poojan Kumar, Kaustubh Patil, and Woon Jung, and those three you may remember from PernixData. One thing to point out is that they had 3 rounds of funding (~190 million dollars) so far and they came out of stealth around VMworld 2019. Coincidentally they won the Gold award for Best of VMworld in the data protection category, and best of show for the entire show, not bad for a first VMworld. I guess that I have to point out that although I would classify them as backup/recovery today, they are adding new functionality weekly and “backup/recovery” is probably not a fair category. Data protection is more appropriate, and it would not surprise me if that evolves to data management and protection over time. If you are not a fan of reading, simply head over to my YouTube video on Clumio, otherwise, just continue below.

So how does it work conceptually? Well they basically have a SaaS solution, but you will need to install an OVA (they call it a cloud connector) in your environment to connect to the SaaS platform for VMware on-premises and VMware Cloud on AWS. When you connect AWS EBS they use a CloudFormation template. This cloud connector is a 4 vCPU/8GB virtual machine that then needs the ability to connect to “the outside world” of course. The Cloud Connector is stateless and requires no updates. You can run this Cloud Connector appliance in multiple clusters, on-prem, or in VMware Cloud on AWS, and once they are registered you will see those data sources in your portal. This is nice as you can see all your data sources across public and private clouds in one single pane of glass. You will have the ability to define “backup schemes” by creating policies. These policies can of course then be associated with objects. These objects can be VMs, Clusters and even vCenter Server instances. This means that if you assign a policy to vCenter Server, every new VM created will inherit the policy automatically. You may wonder, where is your data stored? Your data is stored in S3 buckets that are part of the Clumio SaaS-based platform. Customers are isolated from each other: they will have their own dedicated S3 buckets, and these buckets are created and maintained by Clumio. You as a customer only interact with Clumio!

HCI3041BU: Introducing Scalable File Storage on vSAN

Duncan Epping · Sep 6, 2018

Another beta announcement last week for vSAN was around Native File Services. This was the topic of HCI3041BU, which was titled “Introducing Scalable File Storage on vSAN with Native File Services”. The full session can be found here, the summary is below for your convenience. The session was by Venkat Kolli (Product Manager) and engineers Rick Spillane and Wenguang Wang.

Venkat kicks off the session describing the different types of storage most of our customers have in their data center today, and also what kind of data lands on the different types of storage. Basically, it is divided into three main types: Block, File, and Object. I personally believe that “object” is at the point of becoming more common on-premises, but for many it is consumed as a cloud service. Looking at where the data growth is today, it is mainly in the “unstructured data” space.

Next Venkat discusses the management complexity of traditional file storage, and not just management complexity but also scaling and forecasting, which in most cases leads to increased cost. vSAN can help with simplifying File Services and lowering cost by providing a framework which allows you to serve block, file and object. For now we are only discussing file services, but the vision is clear.

Rick is up next introducing File Services. vSAN File Services allows you to create file shares and provide file services to users/consumers through the same familiar interface you have available today in vSphere. On top of that, you get to leverage the power of policy-based management to provision file shares in a specific way. This means that File Shares will work in stretched clusters, can be protected with vSAN Data Protection, can be striped/replicated etc. The most important piece of feedback from customers during the design phase was that they did not want a separate storage cluster to manage for file services; this needed to be an integral part of today’s offering.

The requirements and design principles for the vSAN Distributed File System were:

Elastic Scaling
- Scale IOPS up/down
Single namespace across the cluster
Centrally managed, configured and monitored
Transparent failover
POSIX File Interface
Use vSAN services like data path, consensus mechanisms, and checksumming

Rick next explains a new platform that will (potentially) be included in vSAN, this is called the Storage Services Platform. What this provides is stateless containerized frontend servers which sit on top of the vSAN Distributed File System. This will be available for both VMware and partners, so even partners should be able to provide storage services through this platform. Data will sit in the VDFS volumes and then will be exposed through these services. These services, of course, are fully distributed and self-managing.

The Storage Services Platform is implemented in the form of a storage services control plane. This control plane will for instance monitor all front-end servers and nodes and help in the case of failures, but it will also help to ensure availability during maintenance and upgrades. Also, when it comes to scalability, the control plane monitors the instances and allows them to scale up and down when needed.

Okay, that sounds great, but how do file shares get formed? File shares will be an aggregate of one or multiple vSAN Objects. The great thing about this is that it allows for elasticity in size and performance, plus policies can be associated with these objects. You can now simply create file shares through the UI, or leverage the API. The vSAN team made sure that you can access it and define them the way you prefer. On top of that, this platform will also be available to Kubernetes as part of our Cloud Native Storage Control Plane.

Next Rick briefly discussed data protection for file shares, he mentioned that the team has worked with various 3rd party vendors to allow for full backup and recovery, including file-level restore. What Rick also revealed, surprisingly enough, is that in the initial release we will have:

NFS v4.1 support
AD-based Authentication
Kerberos
Containerized application support

And in the release after that support for the following is planned:

SMB
vSAN DP Integration
OpenLDAP support

Next Wenguang came up on stage, and he demoed vSAN File Services. He showed how simple it is to enable File Services in the UI. Literally a couple of steps: provide the networking details and the authentication mechanism. The next step is to download an OVF, which contains the frontend service we spoke about earlier. For now this is an NFS server, but this could be other services in the future. After File Services has been enabled and the OVF is deployed you can start creating file shares. Again this is very straightforward and part of the familiar vSAN UI / HTML5 interface, which is what I like most: if you know vSAN and/or vSphere you will be able to use vSAN File Services as well. I hope potential other services will be implemented in a similarly easy manner.

The Q&A was interesting as well, as some questions around the potential SMB implementation were answered (SAMBA on Linux vs Microsoft vs Dell/EMC stack?) and for instance what block size is used for the file system (4K, like vSAN).

All in all a very exciting solution, and a great overview of what you can expect in the future for vSAN. Note that this is part of the beta, so if you are interested sign up!