• Skip to primary navigation
  • Skip to main content
  • Skip to primary sidebar

Yellow Bricks

by Duncan Epping

  • Home
  • ESXTOP
  • Stickers/Shirts
  • Privacy Policy
  • About
  • Show Search
Hide Search

HA restarts in a DR/DA event

Duncan Epping · May 3, 2014 ·

I received a couple of questions last week about HA restarts in the scenario where a full site failure has occurred or a part of the storage system has failed and needs to be taken over by another datacenter. Yes indeed this is related to stretched clusters and HA restarts in a DR/DA event.

The questions were straight forward, how does the restart time-out work and what happens after the last retry? I wrote about HA restarts and the sequence last year, so lets just copy and paste that here:

  • Initial restart attempt
  • If the initial attempt failed, a restart will be retried after 2 minutes of the previous attempt
  • If the previous attempt failed, a restart will be retried after 4 minutes of the previous attempt
  • If the previous attempt failed, a restart will be retried after 8 minutes of the previous attempt
  • If the previous attempt failed, a restart will be retried after 16 minutes of the previous attempt

You can extend the restart retry by increasing the value “das.maxvmrestartcount”. And then after every 15/16 minutes a new restart will be attempted. The question this triggered though is why would it even take 4 retries? The answer I got was: we don’t know if we will be able to fail over the storage within 30 minutes and if we will have sufficient compute resources…

Here comes the sweet part about vSphere HA, it actually is a pretty smart solution, it will know if VMs can be restarted or not. In this case as the datastore is not available there is absolutely no point in even trying and HA as such will not even bother. As soon as the storage becomes available though the restart attempts will start. Same applies to compute resource, if for whatever reason there is insufficient unreserved compute resources to restart your VMs then HA will wait for them to become available… nice right!?! Do note I emphasized the word “unreserved” as that is what HA cares about and not actually about used resources.

Share it:

  • Tweet

Related

Server 5.5, ha, stretched, stretched cluster, u1, vSphere

Reader Interactions

Comments

  1. David Pasek says

    4 May, 2014 at 10:24

    Excellent insights. As usually!!! Thanks for sharing and your endless community support.

  2. Matthew Kryszkiewicz says

    12 May, 2014 at 10:18

    Correct me if I`m worng, but in the scenario when only storage fails at one site, the VMs will be stuck with datastore in APD, and no HA restarts going on right?

    • Virtualizer says

      12 May, 2014 at 12:03

      That is correct. The VM will eventually fail, running in memory, waiting for the SAN to come back up online..

Primary Sidebar

About the author

Duncan Epping is a Chief Technologist in the Office of CTO of the Cloud Platform BU at VMware. He is a VCDX (# 007) and the author of the "vSAN Deep Dive" and the “vSphere Clustering Technical Deep Dive” series.

Upcoming Events

04-Mar-21 | Polish VMUG – Roadshow
09-Mar-21 | Austria VMUG – Roadshow
16-Mar-21 | VMUG Turkey – Roadshow
18-Mar-21 | St Louis Usercon Keynote
26-Mar-21 | Hungary VMUG – Roadshow
08-Apr-21 | VMUG France – Roadshow

Recommended reads

Sponsors

Want to support us? Buy an advert!

Advertisements

Copyright Yellow-Bricks.com © 2021 · Log in