
Can I vMotion a VM while IO Insight is tracing it?

Duncan Epping · Mar 4, 2021

Today, during the Polish VMUG, we had a great question: can you vMotion a VM while vSAN IO Insight is tracing it? I did not know the answer, as I had never tried it, so I had to test and validate it in the lab. While testing, it became obvious that IO Insight and vMotion are not a supported combination today. Or better said: when you vMotion a VM that has IO Insight enabled while it is being traced, the tracing stops and you will not be able to inspect the results. When you click on "View Results", you will see an error suggesting that the “monitored VMs might be deleted”, as shown below.

For now, if you are tracing a VM for an extended period of time, make sure to override the DRS automation level for that VM so that DRS does not interfere with the tracing. (You can do this on a per-VM basis.) I would also recommend asking other administrators not to manually migrate the VM for the time being, to avoid the trace being stopped. You may wonder why this is the case. It is pretty simple: tracing happens at the host level. We start a user world on the host where the VM is running to trace the IO. If you move the VM, that user world unfortunately doesn't know what has happened to the VM. Who knows, this may change over time… Either way, I would always recommend not migrating VMs while tracing, as a migration also impacts the data.
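To make the "tracing happens at the host level" point concrete, here is a toy model. It is purely conceptual: the class `HostTracer`, its methods, and the VM and host names are hypothetical and are not how vSAN implements IO Insight. It only mimics the behaviour described above. A tracer is bound to one VM on one host, and once the VM shows up on a different host the trace stops and no results can be viewed.

```python
from dataclasses import dataclass, field


@dataclass
class HostTracer:
    """Hypothetical host-local tracer bound to one VM on one host."""
    host: str
    vm: str
    samples: list = field(default_factory=list)
    active: bool = True

    def observe(self, vm_locations: dict, io_size_kb: int) -> None:
        """Record an IO only while the traced VM still runs on this host."""
        if not self.active:
            return
        if vm_locations.get(self.vm) != self.host:
            # The VM has moved (for example via vMotion). A tracer that only
            # looks at its own host cannot follow it, so the trace stops.
            self.active = False
            return
        self.samples.append(io_size_kb)

    def results(self):
        """Return the samples, or None when the trace was interrupted."""
        return list(self.samples) if self.active else None


vm_locations = {"vm01": "esx-01"}          # hypothetical VM and host names
tracer = HostTracer(host="esx-01", vm="vm01")
tracer.observe(vm_locations, 4)
tracer.observe(vm_locations, 64)
vm_locations["vm01"] = "esx-02"            # simulate a vMotion to another host
tracer.observe(vm_locations, 8)
print(tracer.active, tracer.results())     # False None
```

Pinning the VM (the DRS override mentioned above) corresponds to never changing `vm_locations` during the trace, in which case `results()` returns the collected samples.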

Hope that helps, and thanks to Tomasz for the great question!

What is this Catalog folder on my datastore?

Duncan Epping · Feb 22, 2021

A question popped up on our internal Slack recently, and as I didn’t find anything online about it, I figured I would write a quick article. When you look at your datastore, you may find various folders. Some you will recognize, like the “.vSphere-HA” folder structure, which is used by vSphere HA. Others you may not recognize, like the folder called “catalog” (see screenshot below), which contains folders like “shard”, “mutex”, “tidy”, and “vclock”. The “catalog” folder, and all folders underneath it, are created automatically when you use First Class Disks (FCDs). FCD uses this folder structure to store its metadata. So please do not remove, delete, or otherwise touch these folders. If you would like to know more about FCD, make sure to read Cormac’s post on it.

Oh, and if you wonder why you are using FCD in the first place: it is often used for Kubernetes “persistent volumes”. So if you are using Tanzu/Kubernetes and have persistent volumes, chances are you are using FCD, which results in those folders on your datastore. Nothing to worry about. 🙂
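If you run your own datastore clean-up scripts, it can help to build in a guard for these folders. Below is a minimal sketch, assuming the script works with datastore-relative paths (without the "[datastore]" prefix). The function name `safe_to_delete` is hypothetical, and the protected set only contains the two folders mentioned in this post; it is not an exhaustive list of system-managed folders.

```python
# Top-level datastore folders that vSphere features create and manage
# themselves. Only the two examples from this post; not exhaustive.
PROTECTED_TOP_LEVEL = {".vSphere-HA", "catalog"}


def safe_to_delete(path: str) -> bool:
    """Return False if the datastore-relative path is empty or inside a protected folder."""
    parts = [p for p in path.strip("/").split("/") if p]
    return bool(parts) and parts[0] not in PROTECTED_TOP_LEVEL


for candidate in ["catalog/shard", "catalog/vclock", ".vSphere-HA", "old-iso/ubuntu.iso"]:
    print(candidate, safe_to_delete(candidate))
```

Only "old-iso/ubuntu.iso" is reported as safe; everything under "catalog" (the FCD metadata) and ".vSphere-HA" is left alone.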

I joined the Futr Tech Podcast last week, check out the episode here!

Duncan Epping · Feb 16, 2021

Last week I had the pleasure of joining Chris and Sandesh on the Futr Tech podcast. The episode was just published online, and I wanted to share it with all of you via this blog post. Make sure to watch/listen to the episode and subscribe to the YouTube channel or podcast. I’ve been following these guys for a while, and there are some very interesting conversations to check out. (I very much enjoyed the episode with Bipul Sinha.)

You can find them on YouTube here, or add them to your podcast app of choice (Buzzsprout, Spotify, iTunes). I had fun, and I'm looking forward to some more podcasting in 2021!

vSAN 6.7 U1 ebook available for 4.99 USD and paper copy for 19.99!

Duncan Epping · Feb 2, 2021

If you haven’t seen it yet, you can pick up the vSAN 6.7 U1 Deep Dive ebook for 4.99 USD and the paper copy for 19.99 USD. Yes, we also lowered the price for all other regions, so no matter where you are, you should be able to pick it up for less than a Big Mac meal! Although the book doesn’t cover vSAN 7.0, we still feel it is very relevant, and all core principles still apply to vSAN today. Hopefully, we will be able to update the book at some point later this year, if and when we can find the time. For now, hopefully this offer makes the book affordable all over the world!

How long does it take before a host is declared failed?

Duncan Epping · Jan 26, 2021

I had a question this week about the failure of a host: how long does it take before a host is declared failed? Now let’s be clear, failed means “dead” in this case, not isolated or partitioned. It could be that the power has failed, that the host has gone completely unresponsive, or anything else where there is absolutely no response from the host whatsoever. In that scenario, how long does it take before HA declares the host dead? Note that the timeline below applies to a traditional infrastructure, and that it is theoretical: it assumes everything is optimal.

T0 – Secondary Host failure.
T3s – The Primary Host begins monitoring datastore heartbeats for 15 seconds.
T10s – The host is declared unreachable and the Primary will ping the management network of the failed host.
- This is a continuous ping for 5 seconds.
T15s – If no heartbeat datastores are configured, the host will be declared dead.
T18s – If heartbeat datastores are configured and there have been no heartbeats, the host will be declared dead and restarts will be initiated.
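The two "declared dead" moments follow from simple arithmetic on the timeline: the ping window ends at 10 + 5 = 15 seconds, and the heartbeat window ends at 3 + 15 = 18 seconds. Here is a small sketch of that, assuming (my reading of the list, not an official formula) that with heartbeat datastores configured both checks must have come back empty. The constant and function names are hypothetical.

```python
# Offsets in seconds after a secondary host fails, taken from the timeline
# above (theoretical, optimal case). Constant names are hypothetical.
HEARTBEAT_MONITOR_START = 3   # T3s: primary starts watching datastore heartbeats
HEARTBEAT_WINDOW = 15         # ...and watches them for 15 seconds
UNREACHABLE_AT = 10           # T10s: host declared unreachable
PING_WINDOW = 5               # continuous ping of the management network for 5 s


def seconds_until_declared_dead(heartbeat_datastores: bool) -> int:
    """Seconds until the failed secondary host is declared dead."""
    ping_done = UNREACHABLE_AT + PING_WINDOW  # 10 + 5 = 15
    if not heartbeat_datastores:
        return ping_done
    heartbeat_done = HEARTBEAT_MONITOR_START + HEARTBEAT_WINDOW  # 3 + 15 = 18
    # Assumption: with heartbeat datastores, both checks must come back empty.
    return max(ping_done, heartbeat_done)


print(seconds_until_declared_dead(False))  # 15
print(seconds_until_declared_dead(True))   # 18
```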

Now, when a Primary Host fails, the timeline looks a bit different. This is mainly because a new Primary Host first needs to be elected. Also, we need to ensure that the new primary has received the latest state of all secondary hosts.

T0 – Primary Host failure.
T10s – Primary election process initiated.
T25s – New primary elected and reads the protectedlist.
- New primary waits for secondary hosts to report running VMs
T35s – Old primary declared unreachable.
T50s – Old primary declared dead, new primary initiates restarts for all VMs on the protectedlist which are not running.
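One way to read the primary failure timeline is as a list of milestones: at any given second after the failure, HA is in the phase of the most recent milestone reached. The sketch below encodes the list above and looks up the current phase with `bisect`. The names `PRIMARY_FAILURE` and `phase_at` are hypothetical, and the offsets are the theoretical, optimal-case values from the list.

```python
from bisect import bisect_right

# Milestones after a primary host failure (seconds, description), from the list above.
PRIMARY_FAILURE = [
    (0, "Primary host fails"),
    (10, "Primary election initiated"),
    (25, "New primary elected, reads the protectedlist, waits for secondaries to report running VMs"),
    (35, "Old primary declared unreachable"),
    (50, "Old primary declared dead, restarts initiated for protected VMs that are not running"),
]


def phase_at(timeline, t):
    """Return the most recent milestone reached t seconds after the failure."""
    if t < 0:
        raise ValueError("t must be >= 0")
    offsets = [offset for offset, _ in timeline]
    return timeline[bisect_right(offsets, t) - 1][1]


print(phase_at(PRIMARY_FAILURE, 30))
print(phase_at(PRIMARY_FAILURE, 50))
```

Compared with the 18 seconds of the secondary failure case, restarts start roughly 32 seconds later when the primary fails, which is the cost of the election and of waiting for the secondary hosts to report in.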

Keep in mind, this does not mean that VMs will be restarted within 18 seconds, or 50 seconds for that matter. When the host is declared dead, or a new primary is elected, the restart process starts. The VMs that need to be restarted first need to be placed, and once placed, they need to be restarted. All of these steps take time. On top of that, depending on the operating system and the applications running within the VM, the time it takes before the restart is fully completed can vary a lot between VMs. In other words, although the state is declared rather fast, the actual total time it takes to restart can vary and is definitely not an exact science.