
Yellow Bricks

by Duncan Epping


high availability

Why does HA not power-on VMs after a full cluster shutdown?

Duncan Epping · Dec 20, 2021 · 11 Comments

I received this question and figured I would write a quick post about it, as it comes up occasionally. Why does vSphere HA not power on VMs after the cluster is brought back online following a full cluster shutdown? In this case, the customer had a power outage, so their hosts and all VMs were cleanly powered off by an administrator as the backup power unit was running out of power. Unfortunately, this happens more frequently than you would think.

When VMs are powered off by an administrator, or by anything else (PowerCLI, for instance) with permission to power off VMs, vCenter Server will mark these VMs as “cleanly powered off”. On top of that, the state of the VMs is tracked by vSphere HA, so if a host is powered off, HA will know whether the VM was powered on or powered off at the time the host went missing.

Now, when the host (or hosts) returns for duty, vSphere HA will of course verify what the last known state of the cluster was. It will read the list of all the VMs that were powered on, and it will restart those that were powered on and are configured for HA. It will also look at a VM property called “runtime.cleanPowerOff”; this property indicates whether the VM was cleanly powered off by an admin or a script, or whether the VM was, for instance, powered off by vSphere HA itself (PDL response, etc.). Depending on the value of the property, the VM will, or will not, be restarted.

Having said all of that, when you power off a VM manually through the UI, or via a script, the VM will be marked as being “cleanly powered off”. This means that HA has no reason to restart it, as the powered-off VM is not the result of a host, network, or storage failure.
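If you want to verify this state for a particular VM yourself, one way is to dump the VM’s runtime information from the ESXi shell with vim-cmd. This is just a sketch: the VM ID (42 below) is an example, and the exact output of the runtime dump may differ between versions:

# list the VM IDs registered on this host
vim-cmd vmsvc/getallvms
# dump the runtime info for VM ID 42 (example) and filter for the clean power-off flag
vim-cmd vmsvc/get.runtime 42 | grep -i cleanPowerOff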

Can I make a host in a cluster the vSphere HA primary / master host?

Duncan Epping · May 21, 2021 ·

There was an interesting question on the VMware VMTN Community this week. Although I wrote about this in 2016, I figured I would do a short write-up again, as the procedure has changed since 7.0 U1. The question was whether it is possible to make a particular host in a cluster the vSphere HA primary (or master, as it was called previously) host. The use case was pretty straightforward: in this case, the customer had a stretched cluster configuration with vSAN, and they wanted to make sure that the vSphere HA primary host was located in the “preferred” site, as this could potentially speed up the restart of VMs. Now, mind you, when I say “speed up” we are talking about 2-3 seconds difference at most, but for some folks this may be crucial. I personally would not recommend making configuration changes, but if you do want to do this, vSphere does have the option to do so.

When it comes to vSphere HA, there’s no UI option or anything like that to assign the “primary/master” host role. However, there is the option to specify an advanced setting at the host level to indicate that a certain host needs to be favored during the primary/master election. Again, it is not very common for customers to configure this, but if you desire to do so, it is possible. The advanced setting is called “fdm.nodeGoodness” and, depending on which version you use, you will need to configure it either via the fdm.cfg file or via configstorecli. You can read about this process in-depth here.

Of course, I tried this in my lab. Here’s what I did: I first listed the currently configured advanced options for vSphere HA using configstorecli:

configstorecli config current get -g cluster -c ha -k fdm
{
   "mem_reservation_MB": 200,
   "memory_checker_time_in_secs": 0
}

Next, I will set “node_goodness” for my host. When setting this, it will need to be a positive value; in my case I am setting it to 10000000. I first dumped the current config to a JSON file:

configstorecli config current get -g cluster -c ha -k fdm > test.json

Next, I edited the file and added the setting “node_goodness” with a value of 10000000, so that it looks as follows:

{ 
    "mem_reservation_MB": 200, 
    "memory_checker_time_in_secs": 0,
    "node_goodness": 10000000
} 

I then imported the file:

configstorecli config current set -g cluster -c ha -k fdm -infile test.json
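Before reconfiguring HA, you can quickly verify that the import worked by re-running the get command; the node_goodness key should now show up in the output:

# confirm the imported value is now part of the HA config
configstorecli config current get -g cluster -c ha -k fdm | grep node_goodness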

After importing the file and reconfiguring for HA on one of my hosts, you can see in the screenshots below that the master role moved from 1507 to 1505.


I also created a quick demo, for those who prefer video content:

vSphere HA internals: restart placement changes in vSphere 7!

Duncan Epping · May 13, 2020 ·

Frank and I are looking to update the vSphere Clustering Deep Dive to vSphere 7. While scoping the work, I stumbled onto something interesting: the change that was introduced in vSphere 7 to the vSphere HA restart mechanism, and specifically to the placement of VMs. In previous releases, vSphere HA had a straightforward way of doing placement for VMs when they need to be restarted as a result of a failure. In vSphere 7.0 this mechanism was completely overhauled.

So how did it work pre-vSphere 7?

  • HA uses the cluster configuration
  • HA uses the latest compatibility list it received from vCenter
  • HA leverages a local copy of the DRS algorithm with a basic (fake) set of stats and runs the VMs through the algorithm
  • HA receives a placement recommendation from the local algorithm and restarts the VM on the suggested host
  • Within 5 minutes DRS runs within vCenter, and will very likely move the VM to a different host based on actual load

As you can imagine, this is far from optimal. So what is introduced in vSphere 7? Well, we introduced two different ways of doing placement for restarts:

  1. Remote Placement Engine
  2. Simple Placement Engine

The Remote Placement Engine, in short, is the ability for vSphere HA to make a call to DRS for a placement recommendation for a VM. This takes the current load of the cluster, the VM happiness, and all configured affinity/anti-affinity/VM-host affinity rules into consideration! Will this result in a much slower restart? The great thing is that the DRS algorithm has been optimized over the past years and is so fast that there will not be a noticeable difference between the old mechanism and the new mechanism. An added benefit for the engineering team, of course, is that they can remove the local DRS module, which means there’s less code to maintain. How this works is that the FDM Master communicates with the FDM Manager, which runs in vCenter Server. FDM Manager then communicates with the DRS service to request a placement recommendation.

Now some of you will probably wonder what happens when vCenter Server is unavailable; well, this is where the Simple Placement Engine comes into play. The team has developed a new placement engine that basically takes a round-robin approach, but it does, of course, consider “must rules” (VM-to-host) and the compatibility list. Note that affinity and anti-affinity rules are not considered when SPE is used instead of RPE! If a host, for instance, is not connected to the datastore of the VM that needs to be restarted, then that host is excluded from the list of potential placement targets. By the way, before I forget, version 7 also introduced a vCenter heartbeat mechanism as a result. HA heartbeats the vCenter Server instance to understand when it needs to resort to the Simple Placement Engine instead of the Remote Placement Engine.

I dug through the FDM log (/var/log/fdm.log) to find some proof of these new mechanisms and found an entry which shows there are indeed two placement engines:

Invoking the RPE + SPE Placement Engine

RPE stands for “remote placement engine” and SPE for “simple placement engine”, where Remote of course refers to DRS. You may ask yourself, how do you know whether DRS is being called? Well, that is something you can see in the DRS log files; when a placement request is received, the below entry shows up in the log file:

FdmWaitForUpdates-vim.ClusterComputeResource:domain-c8-26307464

This even happens when DRS is disabled, and also when you use a license edition which does not include DRS, which is really cool if you ask me. If for whatever reason vCenter Server is unavailable, and as a result DRS can’t be called, you will see this mentioned in the FDM log; as shown below, HA will use the Simple Placement Engine’s recommendation for the placement of the VM:

Invoke the placement service to process the placement update from SPE
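If you want to look for this in your own environment, a simple approach is to grep the FDM log on the host holding the primary role for the entries quoted above. This is just a sketch; the exact log wording may vary slightly between builds:

# search the FDM log for placement engine activity
grep -i "placement engine" /var/log/fdm.log
# or cast a wider net and look at all placement-related entries
grep -i placement /var/log/fdm.log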

A cool and very useful small HA enhancement in vSphere 7.0, if you ask me!

 

** Disclaimer: This article contains references to the words master and/or slave. I recognize these as exclusionary words. The words are used in this article for consistency because it’s currently the words that appear in the software, in the UI, and in the log files. When the software is updated to remove the words, this article will be updated to be in alignment. **

I discovered .PNG files on my datastore, can I delete them?

Duncan Epping · Sep 26, 2019 ·

I noticed this question on Reddit about .PNG files which were located in VM folders on a datastore. The user wanted to remove the datastore from the cluster but didn’t know where these files were coming from and whether the VM required those files to be available in some shape or form. I can be brief about it: you can safely delete these .PNG files. These files are typically created by VM Monitoring (part of vSphere HA) when a VM is rebooted by VM Monitoring. This is to ensure you can potentially troubleshoot the problem after the reboot has occurred, so it takes a screenshot of the VM to, for instance, capture the blue screen of death. This feature has been in vSphere for a while, but I guess most people have never really noticed it. I wrote an article about it when vSphere 5.0 was released, and below is the screenshot from that article where the .PNG file is highlighted. For whatever reason I had trouble finding my own article on this topic, so I figured I would write a new one on it. Of course, after finishing this post I found the original article. Anyway, I hope it helps others who find these .PNG files in their VM folders.

Oh, and I should have added, it can also be caused by vCloud Director or be triggered through the API, as described by William in this post from 2013.
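If you are curious whether any of these screenshots are lying around on your datastores before you clean things up, something as simple as a find from the ESXi shell should do the trick. This is just a sketch, assuming your datastores are mounted under the usual /vmfs/volumes path:

# list all .png files in the VM folders on mounted datastores
find /vmfs/volumes/ -name "*.png"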

I want vSphere HA to use a specific Management VMkernel interface

Duncan Epping · Apr 30, 2019 ·

This comes up occasionally: customers have multiple Management VMkernel interfaces and see vSphere HA traffic on both of the interfaces, or on the incorrect interface. In some cases, customers use the IP address of the first management VMkernel interface to connect the host to vCenter and then set an isolation address that is on the network of the second management VMkernel interface, so that HA uses that network. This is unfortunately not how it works. I wrote about this 6 years ago, but it never hurts to reiterate, as it came up twice over the past couple of weeks. The “Management” tickbox is all about HA traffic. Whether “Management” is enabled or not makes no difference for vCenter or SSH, for instance. If you create a VMkernel interface without the Management tickbox enabled, you can still connect to it over SSH and you can still use its IP to add the host to vCenter Server. Yes, it is confusing, but that is how it works.

If you do not want the interface to be used by vSphere HA, make sure to untick “Management”. Note that this is not the case for vSAN; with vSAN, the vSAN network is automatically used by HA. This only pertains to traditional, non-vSAN based infrastructures.
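To check which VMkernel interfaces are tagged for which services on a host, you can use esxcli from the ESXi shell. A quick sketch; vmk0 and vmk1 are examples, your interface names may differ:

# list all VMkernel interfaces and their IPv4 configuration
esxcli network ip interface ipv4 get
# show which services (Management, vMotion, vSAN, etc.) a specific interface is tagged for
esxcli network ip interface tag get -i vmk0
esxcli network ip interface tag get -i vmk1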

