A week ago we had a discussion on Twitter about a scenario that was discussed at VMworld. The scenario is one where you have two 2-node clusters, and for each 2-node cluster the required witness VM is running on the other. Let me show you what I mean to make it clear:
The witness VM on Cluster A is the witness for Cluster B, and the witness VM on Cluster B is the witness for Cluster A. As it stands today, this is not a supported configuration out of the box. For ongoing support, users are required to go through the RPQ process so VMware can validate the design. Please contact your VMware representative for more details.
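To see why this circular placement is risky, here is a toy availability model. This is a sketch under simplified assumptions, not VMware's actual quorum logic: each 2-node cluster holds three votes (two data nodes plus its witness), needs a majority to be available, and each witness VM is usable only while the cluster hosting it is itself available.

```python
# Toy model of the cross-hosted witness design (a sketch, not VMware's
# actual quorum logic). Each 2-node cluster has 3 votes: two data nodes
# plus its witness VM, and needs a majority (2 votes) to be available.

def has_quorum(nodes_up, witness_up):
    return sum(nodes_up) + (1 if witness_up else 0) >= 2

def recover(a_nodes, b_nodes, external_witness=False):
    """Cold-start recovery: everything begins offline, then each cluster
    comes up only once it has quorum. A's witness runs on B and vice
    versa, so each witness is usable only while the other cluster is up."""
    a_up = b_up = False
    for _ in range(5):  # iterate until the mutual dependency stabilizes
        a_up = has_quorum(a_nodes, external_witness or b_up)
        b_up = has_quorum(b_nodes, external_witness or a_up)
    return a_up, b_up

# With one host down in each cluster, the cross-hosted witnesses deadlock
# on cold start, while an independent witness lets both clusters recover:
print(recover([True, False], [True, False]))                        # (False, False)
print(recover([True, False], [True, False], external_witness=True)) # (True, True)
```

In the toy model the chicken-and-egg only shows up in degraded or cold-start situations, which is exactly when you need the witness most; with all four hosts healthy, either design works.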
A knowledge base article should be published on this topic soon; if and when it is published, I will update this post and point to it.
Eelke Hagen says
A one-way witness on another vSAN 2-node cluster is still supported though? As long as the second cluster's witness runs somewhere else (like on a third server)?
Do you have a link to that RPQ process, and in what cases should you get your design validated?
Duncan Epping says
A KB article should be released soon, it will contain more details.
David Pasek says
Hello Duncan. First of all, thanks for such valuable information. However, may I ask why it is not supported OOB? My understanding is that the only requirement for a 2-node (aka ROBO) vSAN cluster is to have an external witness. It seems to me that this requirement is fulfilled even by this design. I just don’t understand why a witness on another 2-node cluster is not supported and why the RPQ process is needed. I’m asking because I don’t understand how the witness’s underlying infrastructure impacts supportability. Thanks for your answer in advance.
Joey D. says
Duncan,
I personally worked with VMware in creating an installation as described above, and was assured that it was in fact supported. The published installation process was a bit “broken”: a few steps that are not explicitly dictated in the docs are in fact required to be done in a specific order to ensure proper instantiation (wrong order == multiple back-and-forth stepping). There were also requirements around low latency, modified multicast addressing per cluster, and proper failover for the vSS. The infrastructure itself has been functioning quite well for about 8 months across multiple clusters, so I look forward to the details of the KB publication mentioned. I also appreciate this info and will definitely be following up with their engineering folks today 🙂
wodge says
A really interesting solution for us would be if you could support a truly 2-node stretched cluster, with the witness appliance (and VC) running in-band within the vSAN cluster. That way, you would achieve a true 2-node replicated storage platform (without needing any external resource), and you could exploit the native high-availability features offered by a vSphere cluster (e.g. HA / FT). It would be ideal for a small management cluster.
I appreciate (1) there would be complexity in building the cluster initially, and (2) the Witness appliance would need to be highly available to survive the host failure (protected by FT?).
I’m sure you guys have already investigated this possibility (I guess it would be a big sell if you could provide a truly standalone 2-node vSphere/VSAN cluster).
Joe says
I second the witness using FT. I don’t see why that wouldn’t work. Just for fun, I was planning on testing this in my lab. Not sure how it will go, but in theory it sounds… plausible!
Eelke Hagen says
come on people, cluster basics 101:
In the scenario you describe, the cluster is unable to differentiate between a connectivity issue and a host being down (with regard to the host running the active witness).
You never want both sides to become active, since you will have trouble getting your data synchronized (you might end up with two fully functional, isolated ‘datacenters’, with half of the company running on one and the other half on the second).
These VMs can never be merged; think about what happens to a database server in this scenario.
Split-brain prevention is key; no cluster can go without it. When in doubt, shut all servers down. Just make sure it never gets inconsistent.
Think about what would happen to salary payments if your HR database goes corrupt this way: some payments will be made, some will be made twice, some will get lost. Better hit the emergency brake…
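The split-brain argument above comes down to simple quorum arithmetic. A minimal sketch (generic majority-voting math, not vSAN internals) shows why an odd number of votes, such as two nodes plus one witness, makes two simultaneously active sides impossible:

```python
# Generic quorum math (not vSAN internals): with an odd total vote count,
# at most one side of any network partition can hold a majority, so two
# isolated "active" sides can never coexist.

TOTAL_VOTES = 3  # node A + node B + witness

def majority(votes):
    return votes > TOTAL_VOTES // 2

# Enumerate every way the 3 votes can split across a partition.
for side_a in range(TOTAL_VOTES + 1):
    side_b = TOTAL_VOTES - side_a
    assert not (majority(side_a) and majority(side_b))
print("no partition yields two active sides")
```

The flip side of the same arithmetic: a side that loses its majority must stop, which is the “shut all servers down” behavior described above, trading availability for consistency.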
Joe says
Eelke,
Thanks for letting me know you think I should take the ICM class again. I’ll take it under advisement. Did 2 node vSphere HA clusters become a thing of the past?
Here’s a link discussing the usefulness of a 2 node vSphere cluster with HA….. Duncan supplied the correct answer. https://communities.vmware.com/thread/462963?start=0&tstart=0
Not sure how split brain would happen if HA is configured correctly in a standard cluster. Maybe you could elaborate for those of us who aren’t all-knowing.
Thanks
Joe
Eelke Hagen says
A 2-node HA cluster is no problem as long as you let the storage handle data integrity: single (shared) storage, like a SAN. vSphere uses file locking and datastore heartbeating, so this is no issue in a regular 2-node ESX cluster; it only becomes one when it comes to the storage.
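To illustrate the file-locking point with a rough analogy (POSIX advisory locks on a local file, not VMFS’s actual on-disk locking protocol): shared storage can arbitrate ownership because only one host can hold an exclusive lock on a VM’s lock file at a time, so a second host cannot power the same VM on.

```python
# Rough analogy using POSIX advisory locks (flock), NOT VMFS's actual
# on-disk locking: shared storage arbitrates VM ownership because only
# one holder of the exclusive lock can exist at a time.

import fcntl
import os

LOCK_PATH = "/tmp/vm-demo.lock"  # stand-in for a lock file on shared storage

def try_acquire(path):
    """Return a held file descriptor if we got the lock, else None."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fd          # we "own" the VM and may power it on
    except BlockingIOError:
        os.close(fd)
        return None        # someone else owns it; refuse to power on

owner = try_acquire(LOCK_PATH)       # first host wins the lock
contender = try_acquire(LOCK_PATH)   # second host is refused
print(owner is not None, contender is None)  # True True
```

The analogy only holds while both hosts can reach the same storage, which is exactly Eelke’s point: the arbitration lives in the shared storage layer, not in the hosts themselves.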
Joe says
It seems possible still. Has anyone tested this yet? I may have to test this in my lab this week.
Duncan Epping says
Of course it is possible; this is not about what is possible, this is about what is supported.
Eelke Hagen says
And if you test it, please also test breaking the connection between the nodes. If you end up with both sides active, that is a problem and a failed test.
AK says
Hi Duncan,
It is stated in the following document: http://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/products/vsan/vmware-virtual-san-6.2-stretched-cluster-guide.pdf
Copied from the PDF:
“The minimum supported configuration is 1+1+1
(3 nodes). The maximum configuration is 15+15+1 (31 nodes).”
So I can conclude from this document that this is a supported config: one node in site X, one in site Y, and one host as witness on site Z.
Or?
Thanks,
Tim says
Your diagram shows (4) hosts, (2) clusters… I would like to know if (2) nodes, (1) cluster is on the roadmap and in the testing phase? The scenario would be a branch office requiring a single cluster for compute, memory, and storage. The two hosts would be connected to each other via (2) 10GbE Twinax cables for vSAN/vMotion and also (2) 1GbE connections for the guest traffic. Witness VMs would run on both of the (2) nodes. Thank you!
Duncan Epping says
No, we do not have any plans for supporting 2-host clusters with the witness running on top of that cluster. Too many risks.
Eelke Hagen says
Hi Duncan, we have a cluster with a dedicated witness host with local storage (with the witness appliance running on top of it).
What would be the supported way to keep this backed up?
Does a daily backup using our backup tool suffice, and would it work if we restored the witness and in the process lost up to a day’s worth of witness components?
Or should we use VMware replication and limit data loss to 15 minutes? Would everything sync up if we restored that?
Regards,
Eelke
Duncan Epping says
If it goes bad, you introduce a new witness appliance into the cluster; that is usually the way to go.