I wrote about how vSphere HA 5.x restart attempt timing works a long time ago, but there still appears to be some confusion about it. I figured I would clarify it a bit more. I don’t think I can make it any simpler than this:
- Initial restart attempt
- If the initial attempt failed, a restart will be retried 2 minutes after the previous attempt
- If the previous attempt failed, a restart will be retried 4 minutes after the previous attempt
- If the previous attempt failed, a restart will be retried 8 minutes after the previous attempt
- If the previous attempt failed, a restart will be retried 16 minutes after the previous attempt
After the fifth failed attempt the cycle ends. That is, unless a new master host is selected (for whatever reason) between the first and the fifth attempt. In that case, we start counting again. This means that if a new master is selected after attempt 3, the new master will start with the “initial restart attempt”.
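For anyone who prefers to see the arithmetic, here is a minimal Python sketch of the timeline described above. The function name `restart_timeline` and its `start` parameter are made up for illustration; they are not part of vSphere HA or any VMware API. The delays are simply the ones listed in this post, and `start` is an assumed way to model a new master beginning its own cycle at some arbitrary minute.

```python
# Delays in minutes between consecutive restart attempts, as listed in this post.
RETRY_DELAYS_MIN = [2, 4, 8, 16]


def restart_timeline(start=0, delays=RETRY_DELAYS_MIN):
    """Return the minute offset of every restart attempt in one cycle.

    `start` is the (hypothetical) minute at which the initial attempt happens.
    """
    offsets = [start]
    for delay in delays:
        # Each retry is scheduled relative to the previous attempt.
        offsets.append(offsets[-1] + delay)
    return offsets


# One full cycle handled by the same master.
print(restart_timeline())         # [0, 2, 6, 14, 30]

# A new master whose initial attempt happens at minute 7 (an arbitrary
# example value) starts counting from scratch.
print(restart_timeline(start=7))  # [7, 9, 13, 21, 37]
```

In other words, with the default five attempts the last one happens 30 minutes after the initial attempt, and a newly selected master gets a fresh set of five attempts.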
Or as Frank Denneman would say:
** Disclaimer: This article contains references to the words master and/or slave. I recognize these as exclusionary words. The words are used in this article for consistency because they are currently the words that appear in the software, in the UI, and in the log files. When the software is updated to remove the words, this article will be updated to be in alignment. **
Hi,
What would the usual suspects be if restarts have reached the maximum retry count/cycle?
It is still confusing. Can you re-write it?
What happens after all 5 attempts happen and nothing works – it stops?
Yes it stops. 5 attempts and then it gives up.
Duncan,
If you use the advanced option das.maxvmrestartcount and set it to, say, 10, at what interval do the last 5 restarts take place? It used to be 8 minutes, but then the restart timers changed to the ones in your post, and I can’t find the info now.
Thanks
If I am not mistaken 15 minutes. But I haven’t tested this and I can’t find any doc to back it up…
Me neither 🙁
Hello Duncan,
Do you have any more information about the HA restart timeline if you configure more than 5 restart attempts? I haven’t found any information about it, and we have several customers asking for it because 30 minutes is very short in the case of DR…
Thanks.
15/16 minutes is when each retry will occur.