
vSphere FT and vVols/SPBM: an unsupported config? Why?

Duncan Epping · Aug 20, 2020

A customer (thanks, Johan) pointed out to me that vSphere FT is not supported when using SPBM on non-vSAN based storage systems. You may wonder why this is; I certainly did. I figured it was a testing constraint of some sort, but after emailing product management, engineering, and our quality engineering team, I now understand why. Before I explain it: the constraint is documented here, and let me quote the relevant section for you:

Virtual Volume datastores.
Storage-based policy management. Storage policies are supported for vSAN storage.

So why is this, and why is vSAN supported when it also uses SPBM? Well, the difference is in the implementation. For vVols, vCenter Server needs to be available when new VMs are created, and creating a new VM is essentially what happens when an FT instance needs to be restarted. We need to associate an SPBM policy with that instance, and with vVols we can only retrieve the policy via vCenter Server. With vSAN, FT/HA can also retrieve the needed information via the ESXi host. This is why FT with vSAN is a supported configuration, while FT with vVols, unfortunately, is not at the moment. Hopefully this will change in the future. (Yes, I filed a feature request before anyone asks.)
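To make the dependency concrete, here is a toy model of where the SPBM policy can come from when an FT instance is restarted. All names are hypothetical and this is not a vSphere or ESXi API; it only captures the reasoning above (vVols: vCenter Server only; vSAN: vCenter Server or the ESXi host). The actual support statement is a product decision, not a runtime check like this.

```python
from dataclasses import dataclass

# Hypothetical toy model of the policy-lookup dependency described above.
# None of these names are real vSphere/ESXi APIs.


@dataclass
class Environment:
    datastore_type: str       # "vsan" or "vvols"
    vcenter_available: bool   # is vCenter Server reachable right now?


def policy_sources(datastore_type: str) -> set:
    """Where FT/HA can retrieve the SPBM policy for an instance being (re)created."""
    if datastore_type == "vsan":
        # With vSAN the needed information can also be retrieved via the ESXi host.
        return {"vcenter", "esxi_host"}
    if datastore_type == "vvols":
        # With vVols the policy can only be retrieved via vCenter Server.
        return {"vcenter"}
    raise ValueError(f"unknown datastore type: {datastore_type}")


def can_restart_ft_instance(env: Environment) -> bool:
    """A restart needs the policy from at least one source that is reachable."""
    reachable = {"esxi_host"}           # the host performing the restart
    if env.vcenter_available:
        reachable.add("vcenter")
    return bool(policy_sources(env.datastore_type) & reachable)
```

In this model, `can_restart_ft_instance(Environment("vvols", False))` is `False`, while the same call with `"vsan"` is `True`: losing vCenter Server only blocks the vVols case.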

VCG Notification demo and Changing the Default vSAN Policy Demo

Duncan Epping · Jun 22, 2020

Last week I created two YouTube videos that I wanted to share with everyone. In the first demo I show the new VCG Notification option. The VCG Notification option is very useful for customers who want to be notified via email when a change to a component of a ReadyNode configuration has occurred. This could be a change in support, a change of driver/firmware, etc.

The other demo shows how to change the default policy for a vSAN cluster. This seems to be an option that many folks haven’t been able to find in the UI. It is pretty straightforward, hence I am sharing it here.

vCenter 7.0 stating “New vCenter server updates are available” while there are no updates?

Duncan Epping · Jun 8, 2020

I have seen this issue reported on the VMware Community Forum a few times: when you run vCenter 7.0, you will receive a message in the vSphere Client stating “New vCenter server updates are available”. When you then click “View Updates”, however, you will notice that there are no updates available for vCenter Server and that you are indeed running the latest and greatest version. We (Cormac and I) encountered the issue in our lab as well, as demonstrated in the screenshot below.

Pretty confusing indeed. Please note that this is a known issue, so there’s no need to report it to VMware. A patch for vCenter Server that fixes this problem will be released soon.

The issue is fixed in 7.0b, as documented in the release notes!

Running vSphere 6.7 or 6.5 and configured HA APD/PDL responses? Read this…

Duncan Epping · May 14, 2020

If you are running vSphere 6.7 or 6.5, have not installed 6.7 P02 yet (6.5 P05 will be available soon), and have APD/PDL responses configured within vSphere HA, an issue could cause VMs not to be failed over when an APD or PDL occurs. This is a known issue in these releases, and P02 (6.7) or P05 (6.5) solves this problem. What is the problem? Well, a bug causes VMs that are listed in “VM Overrides” to have their unconfigured settings, specifically the APD/PDL responses, set to “disabled” instead of “unset”.

This means that even though you have APD/PDL responses configured at the cluster level, the VM-level configuration overrides them, as it is set to “disabled”. It doesn’t really matter why you added the VMs to VM Overrides; it could be to configure VM Restart Priority, for instance. The frustrating part is that the UI doesn’t show you that the response is disabled; it looks as if it simply isn’t configured.

If you can’t install the patch just yet, for whatever reason, but you do have VMs in VM Overrides, make sure to go to VM Overrides and explicitly configure the APD/PDL responses for those VMs to match the cluster-level configuration, as shown in the screenshots below.
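The override semantics behind this bug, and why the workaround helps, can be sketched as a small model. This is a hypothetical illustration, not the vSphere API: `None` stands for “unset” (inherit from the cluster), and `RESTART` is a stand-in for whichever restart response the cluster is configured with.

```python
from enum import Enum
from typing import Optional

# Hypothetical model of how a per-VM override interacts with the cluster setting.
# Names are illustrative only; this is not the vSphere API.


class Response(Enum):
    DISABLED = "disabled"
    RESTART = "restart"  # stand-in for the cluster's configured restart response


def effective_response(cluster: Response, vm_override: Optional[Response]) -> Response:
    """None means 'unset': the VM inherits the cluster-level response."""
    return cluster if vm_override is None else vm_override


def buggy_add_to_overrides(overrides: dict, vm: str) -> None:
    """Models the bug: adding a VM to VM Overrides (e.g. for restart priority)
    stores 'disabled' for APD/PDL instead of leaving it unset."""
    overrides[vm] = Response.DISABLED


def apply_workaround(overrides: dict, cluster: Response) -> None:
    """Workaround: explicitly set each overridden VM to the cluster-level response.
    (A VM you deliberately want disabled would need to be skipped.)"""
    for vm in overrides:
        overrides[vm] = cluster
```

For example, after `buggy_add_to_overrides(overrides, "vm01")`, `effective_response(Response.RESTART, overrides.get("vm01"))` yields `Response.DISABLED`, even though a UI would show nothing configured; after `apply_workaround` it yields `Response.RESTART` again. VMs that were never added to VM Overrides are unaffected.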

vSphere HA internals: restart placement changes in vSphere 7!

Duncan Epping · May 13, 2020

Frank and I are looking to update the vSphere Clustering Deep Dive to vSphere 7. While scoping the work I stumbled onto something interesting: the change that was introduced in vSphere 7 to the vSphere HA restart mechanism, specifically the placement of VMs. In previous releases vSphere HA had a straightforward way of placing VMs that needed to be restarted as the result of a failure. In vSphere 7.0 this mechanism was completely overhauled.

So how did it work pre-vSphere 7?

HA uses the cluster configuration
HA uses the latest compatibility list it received from vCenter
HA leverages a local copy of the DRS algorithm with a basic (fake) set of stats and runs the VMs through the algorithm
HA receives a placement recommendation from the local algorithm and restarts the VM on the suggested host
Within 5 minutes DRS runs within vCenter Server and will very likely move the VM to a different host based on the actual load
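A minimal sketch of why the old flow was “far from optimal”, under simplifying assumptions: the local copy of the DRS algorithm only sees fake, uniform stats, so among compatible hosts its choice is effectively arbitrary, and DRS in vCenter Server later acts on the real load. The functions and data below are hypothetical; the real DRS algorithm is far richer than “least loaded host”.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of pre-vSphere 7 restart placement; not actual FDM/DRS code.


@dataclass
class Host:
    name: str
    compatible_vms: set = field(default_factory=set)  # from the compatibility list


def local_drs_recommendation(vm: str, hosts: list) -> str:
    """HA's local placement: filter on the compatibility list, then score hosts
    with fake stats. Every host looks equally loaded, so the tie is broken by
    name here, i.e. the result ignores the real cluster load."""
    candidates = [h for h in hosts if vm in h.compatible_vms]
    if not candidates:
        raise RuntimeError(f"no compatible host for {vm}")
    fake_load = {h.name: 0.0 for h in candidates}
    return min(candidates, key=lambda h: (fake_load[h.name], h.name)).name


def drs_preferred_host(actual_load: dict) -> str:
    """Crude stand-in for the DRS pass in vCenter Server minutes later:
    it sees real load and prefers a different host."""
    return min(actual_load, key=actual_load.get)
```

With hosts `esx01` and `esx02` both compatible, the local step picks `esx01`; if `esx01` is actually busy (say load 0.9 vs 0.2), the later DRS pass prefers `esx02`, which means a second move shortly after the restart.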

As you can imagine, this is far from optimal. So what has changed in vSphere 7? Well, vSphere 7 introduces two different ways of doing placement for restarts:

Remote Placement Engine
Simple Placement Engine

The Remote Placement Engine, in short, is the ability for vSphere HA to call DRS for a placement recommendation for a VM. This takes the current load of the cluster, the VM happiness, and all configured affinity/anti-affinity/VM-host affinity rules into consideration! Will this result in a much slower restart? The great thing is that the DRS algorithm has been optimized over the past years, and it is so fast that there will not be a noticeable difference between the old mechanism and the new mechanism. An added benefit for the engineering team, of course, is that they can remove the local DRS module, which means there’s less code to maintain. This works as follows: the FDM Master communicates with the FDM Manager, which runs in vCenter Server, and FDM Manager communicates with the DRS service to request a placement recommendation.

Now some of you will probably wonder what happens when vCenter Server is unavailable. Well, this is where the Simple Placement Engine comes into play. The team developed a new placement engine that basically takes a round-robin approach, but of course does consider “must rules” (VM to Host) and the compatibility list. Note that affinity and anti-affinity rules are not considered when SPE is used instead of RPE! This is a known limitation, which is being considered for a fix in the future. If, for instance, a host is not connected to the datastore on which the VM that needs to be restarted resides, then that host is excluded from the list of potential placement targets. By the way, before I forget, vSphere 7 also introduced a vCenter Server heartbeat mechanism as a result. HA heartbeats the vCenter Server instance to determine when it needs to resort to the Simple Placement Engine instead of the Remote Placement Engine.
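The behavior described above can be sketched as follows, under stated assumptions: the vCenter Server heartbeat decides between RPE and SPE, and SPE walks the hosts round-robin while honoring “must” VM-to-Host rules and a compatibility check, modeled here as datastore connectivity only. All class and function names are hypothetical, and `rpe` is a caller-supplied stand-in for the FDM Manager/DRS call; this is not the actual FDM implementation.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical sketch of vSphere 7 restart placement; not actual FDM code.


@dataclass
class Host:
    name: str
    datastores: set = field(default_factory=set)


@dataclass
class Vm:
    name: str
    datastore: str
    must_run_on: Optional[set] = None  # "must" VM-to-Host rule, if any
    # Affinity/anti-affinity rules are deliberately not modeled: SPE ignores them.


class SimplePlacementEngine:
    def __init__(self, hosts: list):
        self.hosts = hosts
        self._cursor = itertools.cycle(range(len(hosts)))  # round-robin position

    def place(self, vm: Vm) -> Optional[str]:
        # Visit each host at most once, continuing from where the last call stopped.
        for _ in range(len(self.hosts)):
            host = self.hosts[next(self._cursor)]
            if vm.datastore not in host.datastores:
                continue  # compatibility check: host can't reach the VM's datastore
            if vm.must_run_on is not None and host.name not in vm.must_run_on:
                continue  # would violate a "must" VM-to-Host rule
            return host.name
        return None  # no valid placement target


def choose_placement(vm: Vm, vcenter_heartbeat_ok: bool,
                     rpe: Callable[[Vm], str], spe: SimplePlacementEngine):
    """Use RPE (DRS via vCenter Server) while the heartbeat is healthy, else SPE."""
    if vcenter_heartbeat_ok:
        return "RPE", rpe(vm)
    return "SPE", spe.place(vm)
```

Keeping the round-robin cursor across calls spreads successive restarts over the eligible hosts instead of piling them all onto the first one, which is about the best a load-unaware engine can do.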

I dug through the FDM log (/var/log/fdm.log) to find some proof of these new mechanisms and found an entry that shows there are indeed two placement engines:

Invoking the RPE + SPE Placement Engine

RPE stands for “remote placement engine” and SPE for “simple placement engine”, where “remote” of course refers to DRS. You may ask yourself: how do you know whether DRS is being called? Well, that is something you can see in the DRS log files. When a placement request is received, the entry below shows up in the log file:

FdmWaitForUpdates-vim.ClusterComputeResource:domain-c8-26307464

This even happens when DRS is disabled, and even when you use a license edition that does not include DRS, which is really cool if you ask me. If for whatever reason vCenter Server is unavailable, and as a result DRS can’t be called, you will see this mentioned in the FDM log and, as shown below, HA will use the Simple Placement Engine’s recommendation for the placement of the VM:

Invoke the placement service to process the placement update from SPE
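If you want to check this on your own hosts, a small script can count the entries quoted above. This is a sketch: it matches only the literal strings shown in this post and assumes nothing about the surrounding log format (timestamps, thread IDs). The DRS entry is matched on its prefix because the cluster identifier after the colon differs per environment. The sample lines in the usage note are made up for illustration.

```python
# Sketch: count the log entries quoted in this post to see which placement
# path HA took. Only the literal strings shown above are matched.

FDM_PLACEMENT = "Invoking the RPE + SPE Placement Engine"
FDM_SPE_USED = "process the placement update from SPE"
# Prefix only: the cluster ID after the colon (e.g. domain-c8-...) varies.
DRS_REQUEST = "FdmWaitForUpdates-vim.ClusterComputeResource"


def summarize(fdm_lines, drs_lines) -> dict:
    fdm = list(fdm_lines)   # materialize so file objects/generators are read once
    drs = list(drs_lines)
    return {
        "placement_invocations": sum(FDM_PLACEMENT in line for line in fdm),
        "spe_fallbacks": sum(FDM_SPE_USED in line for line in fdm),
        "drs_requests": sum(DRS_REQUEST in line for line in drs),
    }
```

For example, on a host you could pass `open("/var/log/fdm.log")` as `fdm_lines`, together with the lines of the DRS log from vCenter Server. A non-zero `spe_fallbacks` count indicates that HA could not reach DRS and used the Simple Placement Engine.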

A cool and very useful small HA enhancement in vSphere 7.0, if you ask me!

** Disclaimer: This article contains references to the words master and/or slave. I recognize these as exclusionary words. The words are used in this article for consistency because it’s currently the words that appear in the software, in the UI, and in the log files. When the software is updated to remove the words, this article will be updated to be in alignment. **