Yellow Bricks

by Duncan Epping

vSphere HA restart times, how long does it actually take?

Duncan Epping · Mar 13, 2025 · Leave a Comment

I had a question today that was based on material I wrote years ago for the Clustering Deepdive (read it here). The material describes the sequence HA goes through when a failure has occurred. For instance, when a “secondary” host has failed, the sequence looks as follows:

  • T0 – Secondary host failure.
  • T3s – Primary host begins monitoring datastore heartbeats for 15 seconds.
  • T10s – The secondary host is declared unreachable and the primary will ping the management network of the failed secondary host. This is a continuous ping for 5 seconds.
  • T15s – If no heartbeat datastores are configured, the secondary host will be declared dead if there is no reply to the ping.
  • T18s – If heartbeat datastores are configured, the secondary host will be declared dead if there’s no reply to the ping and the heartbeat file has not been updated or the lock was lost.

So, depending on whether you have heartbeat datastores configured or not, this sequence takes either 15 or 18 seconds. Does that mean the VMs are then instantly restarted, and if so, how long does that take? Well no, they won’t instantly restart. When this sequence has ended, the failed secondary host has only been declared dead. HA still needs to verify whether the potentially impacted VMs have actually failed, create a list of “to be restarted” VMs, and issue a placement request.
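
The arithmetic behind the 15 versus 18 seconds can be written out as a small sketch. This is only an illustration of the timeline listed above, not how the HA agent is implemented; the function and variable names are made up for this example.

```python
def seconds_until_declared_dead(heartbeat_datastores_configured: bool) -> int:
    """Seconds after the failure (T0) at which the secondary host is declared dead.

    Illustrative only: the numbers come from the timeline in this post,
    the names are hypothetical.
    """
    ping_start = 10                # T10s: host declared unreachable, ping starts
    ping_duration = 5              # continuous ping for 5 seconds
    datastore_monitor_start = 3    # T3s: primary starts watching datastore heartbeats
    datastore_monitor_duration = 15

    ping_done = ping_start + ping_duration  # T15s
    if not heartbeat_datastores_configured:
        return ping_done

    # With heartbeat datastores, the 15 second monitoring window must also expire.
    datastore_done = datastore_monitor_start + datastore_monitor_duration  # T18s
    return max(ping_done, datastore_done)


print(seconds_until_declared_dead(False))  # 15
print(seconds_until_declared_dead(True))   # 18
```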

The placement request will either go to DRS or be handled by HA itself, depending on whether DRS is enabled and whether vCenter Server is available. After placement has been determined, the primary host will request the individual hosts to restart the VMs that should be restarted. Once a host has received the list of VMs it needs to restart, it will restart them in batches of 32, and restart priority/order will of course be applied. This whole process can easily take 10-15 seconds (if not longer), which means that in a perfect world, the restart of the VM occurs after about 30 seconds. Note that this is when the restart of the VM is initiated. It does not mean that the VM, or the services it is hosting, will be available after 30 seconds. The power-on sequence of the VM can take anywhere from seconds to minutes, depending on the size of the VM and the services that need to be started during the power-on sequence.
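
To make the numbers a bit more tangible, here is a rough sketch of the batching and of the "about 30 seconds" estimate. It is a simplification under stated assumptions: the `VM` type, the numeric priority encoding (lower number restarts earlier), and both function names are hypothetical, and real restart priority handling involves more than a simple sort.

```python
from typing import NamedTuple

BATCH_SIZE = 32  # from the post: a host restarts VMs in batches of 32


class VM(NamedTuple):
    name: str
    priority: int  # hypothetical encoding: lower number = restart earlier


def restart_batches(vms, batch_size=BATCH_SIZE):
    """Order VMs by restart priority, then split them into batches."""
    ordered = sorted(vms, key=lambda vm: vm.priority)
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


def restart_initiation_window(declared_dead_s, overhead_s=(10, 15)):
    """Earliest and latest second at which restarts are initiated,
    using the 10-15 second estimate for verification and placement."""
    return declared_dead_s + overhead_s[0], declared_dead_s + overhead_s[1]


vms = [VM(f"vm-{i:02d}", priority=i % 3) for i in range(70)]
print([len(batch) for batch in restart_batches(vms)])  # [32, 32, 6]
print(restart_initiation_window(15))  # (25, 30) without heartbeat datastores
print(restart_initiation_window(18))  # (28, 33) with heartbeat datastores
```

Keep in mind that these windows only describe when the power-on is initiated, not when the guest OS and its services are available again.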

So, although it only takes 15 to 18 seconds for vSphere HA to determine and declare a failure, there’s much more to it. Hopefully this post provides a better understanding of all that is involved.

Unexplored Territory #092 – Introducing DSM 2.2 featuring Cormac Hogan!

Duncan Epping · Mar 10, 2025 · 2 Comments

Recently Data Services Manager 2.2 was released, so it was time for me to ask my friend Cormac Hogan back on the show to share with us what was introduced. Although it was just a “minor” release, there were some major announcements, of which the S3 Object Storage capabilities are probably what will excite you the most! Make sure to listen to the episode either via the player below or on your favorite podcast app (Spotify, Apple, etc.).

Unexplored Territory #091 – Discussing performance with Ravi Soundararajan!

Duncan Epping · Feb 24, 2025 · Leave a Comment

This is probably my favorite episode in a long time. Ravi is just such an enthusiastic and charismatic person to talk to, and on top of that he has a deep understanding of everything vSphere/vCenter and performance. If you want to hear more about tagging, vCenter limits, and bandwidth for vCenter, then this is the episode to listen to! What a show!

Can I have an AF-4 ReadyNode for vSAN ESA with less memory?

Duncan Epping · Feb 18, 2025 · Leave a Comment

I got this question the other day, and it was about the amount of memory the AF-4 ReadyNode configuration needs to have in order to be supported. I can understand where the question comes from, but what most people don’t seem to understand is that there’s a set of minimum requirements, and that the ReadyNode profiles are, as the KB states, “guidance”. This guidance is based on the anticipated resource consumption for a given set of VMs. Of course, this could be very different for your workload. That is why the article that describes the hardware guidance now clearly states the following:

To maintain a configuration supported by VMware Global Services (GS), all ReadyNodes certified for vSAN ESA must meet or exceed the resources of the smallest configuration (vSAN-ESA-AF-0 for vSAN HCI or vSAN-Max-XS for vSAN Max).

This not only applies to memory, but also to other components, as long as you meet the minimum specified below.
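
The "meet or exceed the smallest configuration" rule boils down to a per-component comparison, which could be sketched as follows. Everything here is an assumption for illustration: the `NodeResources` fields, the `shortfalls` helper, and especially the numbers, which are made-up placeholders and not the values from the KB. Always take the real minimums from the hardware guidance article.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class NodeResources:
    # Hypothetical subset of components; the KB lists the real ones.
    cpu_cores: int
    memory_gb: int
    storage_tb: float


def shortfalls(node: NodeResources, minimum: NodeResources) -> list[str]:
    """Return the components where the node falls below the minimum."""
    return [
        f.name
        for f in fields(NodeResources)
        if getattr(node, f.name) < getattr(minimum, f.name)
    ]


# Placeholder numbers only, NOT the actual smallest-profile values.
minimum = NodeResources(cpu_cores=16, memory_gb=128, storage_tb=10.0)
candidate = NodeResources(cpu_cores=32, memory_gb=96, storage_tb=20.0)

print(shortfalls(candidate, minimum))  # ['memory_gb']
```

An empty list would mean the node meets or exceeds the minimum on every component that was checked.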

Can I disable the vSAN service if the cluster is running production workloads?

Duncan Epping · Feb 7, 2025 · Leave a Comment

I just had a discussion with someone who had to disable the vSAN service while the cluster was running a production workload. They had all their VMs running on 3rd party storage, so vSAN was empty, but when they went to the vSAN Configuration UI the “Turn Off” option was grayed out. The reason this option is grayed out is that vSphere HA was enabled, which is the case for most customers (probably 99.9%). If you need to turn off vSAN, make sure to temporarily disable vSphere HA first, and of course enable it again after you have turned off vSAN! This ensures that HA is reconfigured to use the Management Network instead of the vSAN Network.

Another thing to consider: if you manually configured the “HA Isolation Address” for the vSAN Network, make sure to change it back to an IP address on the Management Network. Lastly, if there’s still anything stored on vSAN, it will be inaccessible when you disable the vSAN service. Of course, if nothing is running on vSAN, then there will be no impact to the workload.

About the Author

Duncan Epping is a Chief Technologist and Distinguished Engineering Architect at Broadcom. Besides writing on Yellow-Bricks, Duncan is the co-author of the vSAN Deep Dive and the vSphere Clustering Deep Dive book series. Duncan is also the host of the Unexplored Territory Podcast.

Copyright Yellow-Bricks.com © 2025 · Log in