BC-DR
What does Datastore Sharing/HCI Mesh/vSAN Max support when stretched?
This question has come up a few times now: what does Datastore Sharing/HCI Mesh/vSAN Max support when stretched? I personally had some trouble finding the statements in our documentation as well. I just found the statement, and I want to first of all point people to it, and then also clarify it so there is no question. If I am using Datastore Sharing / HCI Mesh, or will be using vSAN Max, and my vSAN Datastore is stretched, what does VMware support, and what does it not support?
We have multiple potential combinations. Let me list them and add whether each is supported or not. Note that this is at the time of writing, with the currently available version (vSAN 8.0 U2).
- vSAN Stretched Cluster datastore shared with vSAN Stretched Cluster –> Supported
- vSAN Stretched Cluster datastore shared with vSAN Cluster (not stretched) –> Supported
- vSAN Stretched Cluster datastore shared with Compute Only Cluster (not stretched) –> Supported
- vSAN Stretched Cluster datastore shared with Compute Only Cluster (stretched, symmetric) –> Supported
- vSAN Stretched Cluster datastore shared with Compute Only Cluster (stretched, asymmetric) –> Not Supported
So what is the difference between symmetric and asymmetric? The below image, which comes from the vSAN Stretched Configuration, explains it best. I think asymmetric is the most likely configuration in this case, so if you are running stretched vSAN together with a stretched Compute Only cluster, it most likely is not supported.
This also applies to vSAN Max by the way. I hope that helps. Oh and before anyone asks, if the “server side” is not stretched it can be connected to a stretched environment and is supported.
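The support statements listed above can be captured as a small lookup table, which is handy if you want to check a design programmatically. This is only a sketch of the list in this post (vSAN 8.0 U2, at the time of writing); the names `SUPPORT_MATRIX` and `is_supported` and the string labels are made up for illustration and are not part of any VMware tool.

```python
# Server side in every row: a vSAN Stretched Cluster datastore.
# Key: (client cluster type, client topology) -> supported?
# The labels are illustrative only; the values mirror the list in this post.
SUPPORT_MATRIX = {
    ("vsan", "stretched"): True,
    ("vsan", "not stretched"): True,
    ("compute-only", "not stretched"): True,
    ("compute-only", "stretched, symmetric"): True,
    ("compute-only", "stretched, asymmetric"): False,
}


def is_supported(client_type, client_topology):
    """Return True if the client may mount a stretched vSAN datastore."""
    try:
        return SUPPORT_MATRIX[(client_type, client_topology)]
    except KeyError:
        # Anything not listed in the post is treated as unknown, not as supported.
        raise ValueError(
            f"combination not covered: {client_type!r}, {client_topology!r}"
        ) from None


if __name__ == "__main__":
    for (ctype, topology), ok in SUPPORT_MATRIX.items():
        print(f"{ctype:13} {topology:24} {'Supported' if ok else 'Not Supported'}")
```

Unknown combinations raise an error on purpose, so a missing entry is never silently read as "supported".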
Do I need 2 isolation addresses with a (vSAN) stretched cluster for vSphere HA?
This question has come up multiple times now, so I figured I would write a quick post about it: do you need 2 isolation addresses with a (vSAN) stretched cluster for vSphere HA? This question comes up because the documentation has best practices around the configuration of HA isolation addresses for stretched clusters. The documentation (both for vSAN as well as traditional stretched storage) states that you need to have two reliable addresses, one in each location.
Now I have had the above question multiple times, as some folks have mentioned that they can use a Gateway Address with Cisco ACI which would still be accessible in both locations even if there’s a partition due to, for instance, an ISL failure. If that is the case, and the IP address is indeed available in both locations during those types of failure scenarios, then it would suffice to use a single IP address as your isolation address.
You will, however, need to make sure that the IP address is reachable over the vSAN network when using vSAN as your stretched storage platform. (When vSAN is enabled, vSphere HA uses the vSAN network for communications.) If it is reachable, you can simply define the isolation address by setting the advanced setting “das.isolationaddress0”. It is also recommended to disable the use of the default gateway of the management network by setting “das.usedefaultisolationaddress” to false for vSAN based environments.
I have requested the vSAN stretched clustering documentation to be updated to reflect this.
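To make the advanced settings described above concrete, here is a minimal sketch that builds the key/value pairs for either a single shared isolation address or one address per site. It only produces the settings; it does not connect to vCenter. The function name `build_ha_isolation_options` and the example IP addresses are assumptions for illustration, and you would still enter the resulting values as vSphere HA advanced options yourself.

```python
import ipaddress


def build_ha_isolation_options(addresses, use_default_gateway=False):
    """Build vSphere HA advanced options as a dict of setting name -> value.

    addresses: isolation addresses, in order; the first one becomes
    das.isolationaddress0, the second das.isolationaddress1, and so on.
    """
    if not addresses:
        raise ValueError("at least one isolation address is required")

    options = {}
    for index, address in enumerate(addresses):
        # Raises ValueError if the string is not a valid IPv4/IPv6 address.
        ipaddress.ip_address(address)
        options[f"das.isolationaddress{index}"] = address

    # For vSAN based environments the post recommends setting this to false.
    options["das.usedefaultisolationaddress"] = (
        "true" if use_default_gateway else "false"
    )
    return options


if __name__ == "__main__":
    # One address that stays reachable in both sites (example IP only).
    print(build_ha_isolation_options(["192.0.2.1"]))
    # One address per site, as the documentation describes (example IPs only).
    print(build_ha_isolation_options(["192.0.2.1", "198.51.100.1"]))
```

Whichever variant you use, the addresses still need to be reachable over the vSAN network; the sketch cannot verify that for you.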
Deleting the vCLS VMs using Retreat Mode starting with vSphere 8.0 U2
I posted about “retreat mode” and how to delete the vCLS VMs when needed a while back, including a quick demo. Back then you needed to configure an advanced setting for a cluster if you wanted to delete the VMs for whatever reason. (Usually people would do a delete/recreate for troubleshooting purposes.) Starting with vSphere 8.0 U2 you can now use the UI to enable retreat mode on a per cluster level. How do you do this? Well, it is fairly straightforward:
- Click on the cluster for which you want to delete the VMs
- Click on Configure
- Click on “General” under “vSphere Cluster Services”
- Click on “EDIT VCLS MODE”
- Click on “Retreat Mode” and click “OK”
Now the VMs will be deleted. If you want to recreate the VMs, follow the same procedure, but change “Retreat Mode” to “System Managed”. I tested the process yesterday and recorded a quick demo of it.
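For clusters on releases before vSphere 8.0 U2, the advanced-setting route mentioned earlier is, to the best of my knowledge, a vCenter Server advanced setting of the form `config.vcls.clusters.domain-c<number>.enabled`, where `domain-c<number>` is the cluster's managed object ID. The sketch below only builds that key/value pair; the function name `retreat_mode_setting` is hypothetical and nothing here talks to vCenter, so verify the setting name against the documentation for your version before using it.

```python
import re


def retreat_mode_setting(cluster_moref, retreat=True):
    """Return the (key, value) pair for the per-cluster vCLS advanced setting.

    cluster_moref: the cluster's managed object ID, for example "domain-c8".
    retreat=True  -> value "false" (vCLS VMs are removed).
    retreat=False -> value "true"  (vCLS VMs are recreated).
    """
    if not re.fullmatch(r"domain-c\d+", cluster_moref):
        raise ValueError(f"not a cluster managed object ID: {cluster_moref!r}")
    key = f"config.vcls.clusters.{cluster_moref}.enabled"
    value = "false" if retreat else "true"
    return key, value


if __name__ == "__main__":
    # "domain-c8" is an example ID; look up the real one for your cluster.
    print(retreat_mode_setting("domain-c8"))
    print(retreat_mode_setting("domain-c8", retreat=False))
```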
RE: Re-Imagining Ransomware Protection with VMware Ransomware Recovery
Last week a blog post was published on VMware’s Virtual Blocks blog on the topic of Ransomware Recovery. Some of the numbers shared were astonishing and even hard to contextualize. Global damages caused by ransomware, for instance, are estimated to exceed 42 billion dollars in 2024, and this is expected to double every year. Also, 66% of all enterprises were hit by ransomware, of which 96% did not regain full access to their data.
Now, it explicitly mentions “enterprises”, but this does not mean that only enterprise organizations are prone to ransomware attacks. Ransomware attacks do not discriminate: companies, non-profits, and even individuals are all at risk if you ask me. As a smart person once said, data is the new oil, and it seems that everyone is drilling for it, including trespassers who don’t own the land! Of course, depending on the type of organization, solutions and services are available to mitigate the risks of losing access to your company’s most valuable asset, data.
VMware, and many other vendors, have various solutions (and services) to protect your data center, your workloads, and essentially your data. But what do you do if you are breached? How do you recover? How fast can you recover, and how fast do you need to recover? How far back do you need to go, and are you allowed to go? Some of you may wonder why I ask these questions. Well, that has everything to do with the numbers shared at the start of this blog. Unfortunately, today, when organizations are breached, malicious code is often only detected after a significant amount of time, giving the attacker time to collect information about the environment, spread throughout the environment, activate the attack, and ultimately request the ransom.
This is when you, the administrator, the consultant, and the cloud admin, will get those questions. How fast can you recover? How far back do we need to go? Where do we recover to? And what about your data? All fair questions, but these shouldn’t be asked after an attack has occurred and ransom is demanded. These are questions we all need to ask constantly, and we should be aligning our Ransomware Recovery strategy with the answers to those questions.
Now, it is fair to say that I am probably somewhat biased, but it is also fair to say that I am as Dutch as it gets and I wouldn’t be writing this blog if I did not believe in this service. VMware’s Ransomware Recovery as a Service, which is part of VMware Cloud Disaster Recovery, provides a unique solution in my humble opinion. First, the service can simply start as a cloud storage service to which you replicate your workloads, without needing to run a full (small but still) software-defined datacenter. This is especially useful for those organizations that can afford to take ~3hrs to spin up an SDDC when there’s a need to recover (or test the process). However, it is also possible to have an SDDC ready for recovery at all times, which will reduce the recovery time objective significantly.
Of course, VMware provides the ability to protect multiple environments, many different workloads, and many point-in-time copies (snapshots). But it also enables you to verify your recovery point (snapshot) in a fully isolated environment. What you will appreciate is that the solution will not only isolate the workloads, but on top of that also provide you with insights at various levels about the probability of the snapshot being infected. First of all, while going through the recovery process, entropy and change rate are shown, which provide insight into when the environment was potentially infected. (Or when ransomware was activated, for that matter.)
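To give a feel for why entropy is a useful signal here: encrypted data looks close to random, so its byte-level entropy is near the maximum of 8 bits per byte, while typical documents and databases score lower. The sketch below computes Shannon entropy over a block of bytes. It is a generic illustration of the concept, not how VMware Cloud Disaster Recovery calculates its metric, and the function name `shannon_entropy` is my own.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # how often each byte value occurs
    # H = sum over byte values of p * log2(1/p), with p = count / total
    return sum((n / total) * math.log2(total / n) for n in counts.values())


if __name__ == "__main__":
    repetitive = b"a" * 1024            # a single byte value: no uncertainty
    uniform = bytes(range(256)) * 4     # every byte value equally often
    print(shannon_entropy(repetitive))
    print(shannon_entropy(uniform))
```

A sudden jump in entropy between two snapshots, combined with a high change rate, is the kind of pattern that hints at data having been encrypted in between.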
But maybe even more important, through the use of NSX and VMware’s Next Generation Anti-Virus software, a recovery point can be safely tried. A quarantined environment is instantiated and the recovery point can be scanned for vulnerabilities and threats, and an analysis of the workloads to be recovered can be provided, as shown below. This simplifies the recovery and validation process immensely, as it removes the need for many of the manual steps usually involved in this process. Of course, as part of the recovery process, the advanced runbook capabilities of VMware Cloud Disaster Recovery are utilized, enabling the recovery of a full data center, or simply a select group of VMs, by running a recovery plan. This recovery plan includes the order in which workloads need to be powered on and restored, but can also include IP customization, DNS registration, and more.
Depending on the outcome of the analysis, you can then determine what to do with the snapshot. Is the data not compromised? Are the workloads not infected? Are there any known vulnerabilities that we would need to mitigate first? If data is compromised, or the environment is infected in any shape or form, you can simply disregard the snapshot and clean the environment. Rinse and repeat until you find a recovery point that is not compromised! If there are known vulnerabilities, and the environment is clean, you can mitigate those and complete the recovery, ultimately resulting in full access to your company’s most valuable asset, data.