• Skip to primary navigation
  • Skip to main content
  • Skip to primary sidebar

Yellow Bricks

by Duncan Epping

  • Home
  • Unexplored Territory Podcast
  • HA Deepdive
  • ESXTOP
  • Stickers/Shirts
  • Privacy Policy
  • About
  • Show Search
Hide Search

draas

RE: Re-Imagining Ransomware Protection with VMware Ransomware Recovery

Duncan Epping · Apr 13, 2023 ·

Last week a blog post was published on VMware’s Virtual Blocks blog on the topic of Ransomware Recovery. Some of the numbers shared were astonishing and hard to contextualize even. Global damages caused by ransomware for instance are estimated to exceed 42 billion dollars in 2024, and this is expected to be doubling every year. Also, 66% of all enterprises were hit by ransomware, of which 96% did not regain full access to their data.

Now, it explicitly mentions “enterprises”, but this does not mean that only enterprise organizations are prone to ransomware attacks. Ransomware attacks do not discriminate, every company, non-profit, and even individuals are at risk if you ask me. As a smart person once said, data is the new oil, and it seems that everyone is drilling for it, including trespassers who don’t own the land! Of course, depending on the type of organization, solutions and services are available to mitigate the risks of losing access to your company’s most valuable asset, data.

VMware, and many other vendors, have various solutions (and services) to protect your data center, your workloads, and essentially your data. But what do you do if you are breached? How do you recover? How fast can you recover, and how fast do you need to recover? How far back do you need to go, and are you allowed to go? Some of you may wonder why I ask these questions, well that has everything to do with the numbers shared at the start of this blog. Unfortunately, today, when organizations are breached malicious code is often only detected after a significant amount of time. Giving the attacker time to collect information about the environment, spread itself throughout the environment, activate the attack, and ultimately request the ransom.

This is when you, the administrator, the consultant, and the cloud admin, will get those questions. How fast can you recover? How far back do we need to go? Where do we recover to? And what about your data? All fair questions, but these shouldn’t be asked after an attack has occurred and ransom is demanded. These are questions we all need to ask constantly, and we should be aligning our Ransomware Recovery strategy with the answers to those questions.

Now, it is fair to say that I am probably somewhat biased, but it is also fair to say that I am as Dutch as it gets and I wouldn’t be writing this blog if I did not believe in this service. VMware’s Ransomware Recovery as a Service, which is part of VMware Cloud Disaster Recovery, provides a unique solution in my humble opinion. First, the service provided can just simply start as a cloud storage service to which you replicate your workloads, without needing to run a full (small but still) software-defined datacenter. This is especially useful for those organizations that can afford to take ~3hrs to spin up an SDDC when there’s a need to recover (or test the process). However, it is also possible to have an SDDC ready for recovery at all times, which will reduce the recovery time objective significantly.

Of course, VMware provides the ability to protect multiple environments, many different workloads, and many point-in-time copies (snapshots). But it also enables you to verify your recovery point (snapshot) in a fully isolated environment. What you will appreciate is that the solution will actually not only isolate the workloads, but on top of that also provide you insights at various levels about the probability of the snapshot being infected. First of all, while going through the recovery process, entropy and change rate are shown which provides insights of when potentially the environment was infected. (Or ransomware was activated for that matter.)

But maybe even more important, through the use of NSX and VMware’s Next Generation Anti-Virus software, a recovery point can be safely tried. A quarantined environment is instantiated and the recovery point can be scanned for vulnerabilities and threats, and an analysis of the workloads to be recovered can be provided, as shown below. This simplifies the recovery and validation process immensely, as it removes the need for many of the manual steps usually involved in this process. Of course, as part of the recovery process, the advanced runbook capabilities of VMware Cloud Disaster Recovery are utilized, enabling the recovery of a full data center, or simply a select group of VMs, by running a recovery plan. This recovery plan includes the order in which workloads need to be powered on and restored, but can also include IP customization, DNS registration, and more.

Depending on the outcome of the analysis, you can then determine what to do with the snapshot. Is the data not compromised? Are the workloads not infected? Are there any known vulnerabilities that we would need to mitigate first? If data is compromised, or the environment is infected in any shape or form, you can simply disregard the snapshot and clean the environment. Rinse and repeat until you find that recovery point that is not compromised! If there are known vulnerabilities, and the environment is clean, you can mitigate those and complete the recovery. Ultimately resulting in full access to your company’s most valuable asset, data.

VMware announces Ransomware Recovery as a Service and Data Protection vision!

Duncan Epping · Sep 13, 2022 ·

At VMware Explore there was a whole session (CEIB1236US) dedicated to the vision for Data Protection and Ransomware Recovery as a Service. Especially the Ransomware Recovery as a Service had my interest as it is something that keeps coming up with customers. How do I protect my data, and when needed how do recover? Probably a year ago or so I had a conversation with VMware CTO for Cloud Storage and Data (Sazzala) on this topic, and we met up with various customers to gather requirements. Those discussions ultimately led to the roadmap for this new service and new features. Below I am going to summarize what was discussed in this session at VMware Explore, but I would urge you to watch the session as it is very valuable, and it is impossible for me to capture everything.

VMware’s Disaster Recovery as a Service solution is a unique offering as it provides the best of both worlds when it comes to Disaster Recovery. With DR you typically have two options:

  1. Fast recovery, relatively high cost.
    • Traditionally most customers went for this option, they had a “hot standby” environment that provided full capacity in case of emergency. But as this environment is always up and running and underutilized, it is a significant overhead.
  2. Slower recovery, relatively low cost.
    • This is where VMs are replicated to cheap and deep storage and compute resources are limited (if available at all). When a recovery needs to happen, data rehydration is required and as such, it is a relatively slow process.

With VMware’s offering, you now have a 3rd option: Fast recovery, at a relatively low cost! VMware provides the ability to store backups on cheap storage, and then recover (without hydration) directly in a cloud-based SDDC. It provides a lot of flexibility, as you can have a minimum set of hosts constantly running within your prepared SDDC, and scale out when needed during a failure, or you can even create a full SDDC at the time of recovery.

Now, this offering is available in VMware Cloud on AWS in various regions. During the session, the intention was also shared to deliver similar capabilities on Azure VMware Solution, Oracle Cloud VMware Solution, Google Cloud VMware Engine, and/or Alibaba Cloud VMware Service. Basically all global hyper-scalers. Maybe even more important, VMware also discussed additional capabilities that are being worked on. Scaling to tens of thousands of VMs, managing multi-petabytes of storage, providing 1-minute RPO levels, proving multi-VM consistency, having end-to-end SLA observability, providing advanced insights into cost and usage, and probably most important… a full REST API.

All of those enhancements are very useful for those aiming to recover from a disaster, not just natural disasters, but also for Ransomware attacks. Some of you may wonder how common a ransomware attack is, but unfortunately, it is very common. Surveys have revealed that 60% of the surveyed organizations were hit by ransomware in the past 12 months, 92% of those who paid the ransom did not gain full access to the data, and the average downtime was 16 days. Those are some scary numbers in my opinion. Especially the downtime associated with an attack, and the fact that full access was not regained even after paying a ransom.

In general recovery from ransomware is complex as ransomware typically remains undetected for larger periods of time before you are exposed to it. Then when you are exposed you don’t have too many options, you recover to a healthy point in time or you pay the ransom. When you recover, of course, you want to know if the set you are recovering is infected or not. You also want to have some indication of when the environment was infected, as no one wants to go through 3 months of snapshots before you find the right one. That alone would take days, if not weeks, and downtime is extremely expensive. This is where VMware Ransomware Recovery for VMware Cloud DR comes in.

The aim for the VMware Ransomware Recovery for VMware Cloud DR solution is to provide the ability to recover to an Isolated Recovery Environment (including networking). This first of all prevents reinfection at the time of recovery. During the recovery process, the environment is also analyzed by a next-generation anti-virus scanner for known/current threats. Simply to prevent a situation where you recover a snapshot that was infected. What I am even more impressed by is that the plan is to also include a visual indication of when most likely an environment was infected, this is done by providing an insight into the data change rate and entropy. Now, entropy is not a word most non-native speakers are familiar with, I wasn’t, but it refers to the randomness of the data. Both the change rate and the entropy could indicate abnormal patterns, which then could indicate the time of infection and help identify a healthy snapshot to recover!

As mentioned, during recovery the snapshot is scanned by a Next-Gen AV, and of course, when infections are detected they will be reported in the UI. This then provides you the option to discard the recovery and select a different snapshot. Even if no vulnerabilities are found the environment can be powered on fully isolated, providing you the ability to manually inspect before exposing app owners, or end-users, to the environment again.


Now comes the cool part, when you have curated the environment, when you are absolutely sure this is a healthy point in time that was not infected, you have the choice to fallback to your “source” environment or simply remain running in your VMware Cloud while you clean up your “source” site. Before I forget, I’ve been talking about full environments and VMs so far, but of course, it is also the intention to provide the ability to restore files and folders of course! All in all, a very impressive solution that should be available in the near future.

If you are interested in these capabilities and would like to stay informed, please fill out this form: https://forms.office.com/r/yh69Npq7nY.

Primary Sidebar

About the Author

Duncan Epping is a Chief Technologist and Distinguished Engineering Architect at Broadcom. Besides writing on Yellow-Bricks, Duncan is the co-author of the vSAN Deep Dive and the vSphere Clustering Deep Dive book series. Duncan is also the host of the Unexplored Territory Podcast.

Follow Us

  • X
  • Spotify
  • RSS Feed
  • LinkedIn

Recommended Book(s)

Advertisements




Copyright Yellow-Bricks.com © 2025 ยท Log in