stretched

Can you move a vSAN Stretched Cluster to a different vCenter Server?

Duncan Epping · Sep 17, 2019 ·

I noticed a question today on one of our internal social platforms, the question was if you can move a vSAN Stretched Cluster to a different vCenter Server. I can be short, I tested it and the answer is yes! How do you do it? Well, we have a great KB that actually documents the process for a normal vSAN Cluster and the same applies to a stretched cluster. When you add the hosts to your new vCenter Server and into your newly created cluster it will pull in the fault domain details (stretched cluster configuration details) from the hosts itself, so when you go to the UI the Fault Domains will pop up again, as shown in the screenshot below.

What did I do? Well in short, but please use the KB for the exact steps:

Powered off all VMs
Placed the hosts into maintenance mode (do not forget about the Witness!)
Disconnected all hosts from the old vCenter Server, again, do not forget about the witness
Removed the hosts from the inventory
Connected the Witness to the new vCenter Server
Created a new Cluster object on the new vCenter Server
Added the stretched cluster hosts to the new cluster on the new vCenter Server
Took the Witness out of Maintenance Mode first
Took the other hosts out of maintenance

That was it, pretty straight forward. Of course, you will need to make sure you have the storage policies in both locations, and you will also need to do some extra work if you use a VDS. Nevertheless, it works pretty much straight-forward and as you would expect it to work!

All-Flash Stretched vSAN Cluster and VM/Host Rules

Duncan Epping · Aug 20, 2018 ·

I had a question last week about the need for DRS Rules or also known as VM/Host Rules in an All-Flash Stretched vSAN Infrastructure. In a vSAN Stretched Cluster, there is read locality implemented. The read locality functionality, in this case, ensures that: reads will always come from the local fault domain.

This means that in the case of a stretched environment, the reads will not have to traverse the network. As the maximum latency is 5ms, this avoids a potential performance hit of 5ms for reads. The additional benefit is also that in a hybrid configuration we avoid needing to re-warm the cache. For the (read) cache re-warming issue we recommend our customers to implement VM/Host Rules. These rules ensure that VMs always run on the same set of hosts, in a normal healthy situation. (This is explained on storagehub here.)

What about an All-Flash cluster, do you still need to implement these rules? The answer to that is: it depends. You don’t need to implement it for “read cache” issues, as in an all-flash cluster there is no read cache. Could you run without those rules? Yes, you can, but if you have DRS enabled this also means that DRS freely moves VMs around, potentially every 5 minutes. This also means that you will have vMotion traffic consuming the inter-site links, and considering how resource hungry vMotion can be, you need to ask yourself if cross-site load balancing adds anything, what the risk is, what the reward is? Personally, I would prefer to load balance within a site, and only go across the link when doing site maintenance, but you may have a different view or set of requirements. If so, then it is good to know that vSAN and vSphere support this.

Isolation Address in a 2-node direct connect vSAN environment?

Duncan Epping · Nov 22, 2017 ·

As most of you know by now when vSAN is enabled vSphere HA uses the vSAN network for heartbeating. I recently wrote an article about the isolation address and relationship with heartbeat datastores. In the comment section, Johann asked what the settings should be for 2-Node Direct Connect with vSAN. A very valid question as an isolation is still possible, although not as likely as with a stretched cluster considering you do not have a network switch for vSAN in this configuration. Anyway, you would still like the VMs that are impacted by the isolation to be powered off and you would like the other remaining host to power them back on.

So the question remains, which IP Address do you select? Well, there’s no IP address to select in this particular case. As it is “direct connect” there are probably only 2 IP addresses on that segment (one for host 1 and another for host 2). You cannot use the default gateway either, as that is the gateway for the management interface, which is the wrong network. So what do I recommend:

Disable the Isolation Response >> set it to “leave powered on” or “disabled” (depends on the version used
Disable the use of the default gateway by setting the following HA advanced setting:
- das.usedefaultisolationaddress = false

That probably makes you wonder what will happen when a host is isolated from the rest of the cluster (other host and the witness). Well, when this happens then the VMs are still killed, but not as a result of the isolation response kicking in, but as a result of vSAN kicking in. Here’s the process:

Heartbeats are not received
Host elects itself primary
Host pings the isolation address
- If the host can’t ping the gateway of the management interface then the host declares itself isolated
- If the host can ping the gateway of the management interface then the host doesn’t declare itself isolated
Either way, the isolation response is not triggered as it is set to “Leave powered on”
vSAN will now automatically kill all VMs which have lost access to its components
- The isolated host will lose quorum
- vSAN objects will become isolated
- The advanced setting “VSAN.AutoTerminateGhostVm=1” allows vSAN to kill the “ghosted” VMs (with all components inaccessible).

In other words, don’t worry about the isolation address in a 2-node configuration, vSAN has this situation covered! Note that “VSAN.AutoTerminateGhostVm=1” only works for 2-node and Stretched vSAN configurations at this time.

UPDATE:

I triggered a failure in my lab (which is 2-node, but not direct connect), and for those who are wondering, this is what you should be seeing in your syslog.log:

syslog.log:2017-11-29T13:45:28Z killInaccessibleVms.py [INFO]: Following VMs are powered on and HA protected in this host.
syslog.log:2017-11-29T13:45:28Z killInaccessibleVms.py [INFO]: * ['vm-01', 'vm-03', 'vm-04']
syslog.log:2017-11-29T13:45:32Z killInaccessibleVms.py [INFO]: List inaccessible VMs at round 1
syslog.log:2017-11-29T13:45:32Z killInaccessibleVms.py [INFO]: * ['vim.VirtualMachine:1', 'vim.VirtualMachine:2', 'vim.VirtualMachine:3']
syslog.log:2017-11-29T13:46:06Z killInaccessibleVms.py [INFO]: List inaccessible VMs at round 2
syslog.log:2017-11-29T13:46:06Z killInaccessibleVms.py [INFO]: * ['vim.VirtualMachine:1', 'vim.VirtualMachine:2', 'vim.VirtualMachine:3']
syslog.log:2017-11-29T13:46:06Z killInaccessibleVms.py [INFO]: Following VMs are found to have all objects inaccessible, and will be terminated.
syslog.log:2017-11-29T13:46:06Z killInaccessibleVms.py [INFO]: * ['vim.VirtualMachine:1', 'vim.VirtualMachine:2', 'vim.VirtualMachine:3']
syslog.log:2017-11-29T13:46:06Z killInaccessibleVms.py [INFO]: Start terminating VMs.
syslog.log:2017-11-29T13:46:06Z killInaccessibleVms.py [INFO]: Successfully terminated inaccessible VM: vm-01
syslog.log:2017-11-29T13:46:06Z killInaccessibleVms.py [INFO]: Successfully terminated inaccessible VM: vm-03
syslog.log:2017-11-29T13:46:06Z killInaccessibleVms.py [INFO]: Successfully terminated inaccessible VM: vm-04
syslog.log:2017-11-29T13:46:06Z killInaccessibleVms.py [INFO]: Finished killing the ghost vms

New whitepaper available: vSphere Metro Storage Cluster Recommended Practices (6.5 update)

Duncan Epping · Oct 24, 2017 ·

I had many requests for an updated version of this paper, so the past couple of weeks I have been working hard. The paper was outdated as it was last updated around the vSphere 6.0 timeframe, and it was only a minor update. I looked at every single section and added in new statements and guidance around vSphere HA Restart Priority for instance. So for those running a vSphere Metro Storage Cluster / Stretched Cluster of some kind, please read the brand new vSphere Metro Storage Cluster Recommended Practices (6.5 update) white paper.

It is available on storagehub.vmware.com in PDF and for reading within your browser. Any questions and comments, please do not hesitate to leave them here.

vSAN 6.6 Stretched Cluster Demo

Duncan Epping · May 19, 2017 ·

I had one more demo to finish and share and that is the vSAN 6.6 stretched cluster demo. I already did a stretched clustering demo when we initially released it, but with the enhanced functionality around local protection I figured I would re-record it. In this demo (~12 minutes) I will show you how to configure vSAN 6.6 with dedupe / compression enabled in a Stretched Cluster configuration. I will also create 3 VM Storage Policies, assign those to VMs and show you that vSAN has place the data across locations. I hope you find it useful.