I got this question internally recently: should we create a vSAN Stretched Cluster configuration or a vSAN Fault Domains configuration when we have multiple datacenters within close proximity on our campus? In this case, we are talking about less than 1 ms RTT latency between buildings that are a few hundred meters apart at most. I think it is a very valid question, and the answer depends on what you are looking to get out of the infrastructure. I wrote down the pros and cons and wanted to share them with the rest of the world, as they may be useful for some of you out there. If anyone has additional pros and cons, feel free to share those in the comments!
vSAN Stretched Clusters:
- Pro: You can replicate across fault domains AND protect additionally within a fault domain with RAID-1/RAID-5/RAID-6 if required
- Pro: You can decide per VM whether it should be stretched across fault domains or protected only within a single fault domain/site
- Pro: Requires less than 5 ms RTT latency, which is easily achievable in this scenario
- Con/Pro: You probably also need to think about DRS/HA groups (VM-to-Host)
- Con: From an operational perspective, it also introduces a witness host and sites, which may complicate things and at the very least requires a bit more thinking
- Con: Witness needs to be hosted somewhere
- Con: Limited to 3 Fault Domains (2x data + 1x witness)
- Con: Limited to a 20+20+1 configuration (20 hosts per data site plus 1 witness)
vSAN Fault Domains:
- Pro: Usually no real considerations around VM-to-Host rules, although you can still use them to ensure certain VMs are spread across buildings
- Pro: No Witness Appliance to manage, update or upgrade. No overhead of running a witness somewhere
- Pro: No design considerations around “dedicated” witness sites and “data sites”, each site has the same function
- Pro: Can also be used with more than 3 Fault Domains or Datacenters, so could even be 6 Fault Domains, for instance
- Pro: Theoretically can go up to 64 hosts
- Con: No ability to protect additionally within a fault domain
- Con: No ability to specify that you don’t want to replicate VMs across Fault Domains
- Con/Pro: Requires sub-1 ms RTT latency at all times, which is low, but will usually be achievable in a campus cluster
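To make the trade-offs a bit more tangible, here is a minimal Python sketch that turns the two lists above into a simple checklist. It is only an illustration, not a sizing tool: the `Campus` class, its field names, and the `candidate_configs` function are hypothetical, the thresholds are the ones listed above, and the minimum of three fault domains for the Fault Domains option is my own assumption rather than something stated in the lists.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Campus:
    """Hypothetical description of a campus environment."""
    buildings: int               # buildings/datacenters that hold vSAN hosts
    hosts_per_building: int      # hosts placed in each of those buildings
    rtt_ms: float                # worst-case RTT latency between buildings
    need_local_protection: bool  # extra RAID protection inside a building
    need_unstretched_vms: bool   # some VMs must stay within one building


def candidate_configs(c: Campus) -> List[str]:
    """Return the configurations that fit the constraints listed above."""
    options = []
    total_hosts = c.buildings * c.hosts_per_building

    # Stretched cluster: 2 data sites plus a witness, under 5 ms RTT,
    # and at most 20 hosts per data site (the 20+20+1 limit).
    if c.buildings == 2 and c.rtt_ms < 5 and c.hosts_per_building <= 20:
        options.append("stretched")

    # Fault domains: sub-1 ms RTT, up to 64 hosts in total, no protection
    # within a fault domain, and every VM is replicated across domains.
    # Assumption: at least 3 fault domains are needed for this option.
    if (
        c.buildings >= 3
        and c.rtt_ms < 1
        and total_hosts <= 64
        and not c.need_local_protection
        and not c.need_unstretched_vms
    ):
        options.append("fault_domains")

    return options


two_buildings = Campus(2, 10, 0.5, False, False)
four_buildings = Campus(4, 8, 0.5, False, False)
print(candidate_configs(two_buildings))
print(candidate_configs(four_buildings))
```

An empty result simply means neither option fits as modeled here, for example four buildings that also need protection within a building, in which case the requirements need another look.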