
Yellow Bricks

by Duncan Epping


high availability

vSphere HA internals: restart placement changes in vSphere 7!

Duncan Epping · May 13, 2020 · 2 Comments

Frank and I are looking to update the vSphere Clustering deep dive to vSphere 7. While scoping the work I stumbled onto something interesting: a change that was introduced to the vSphere HA restart mechanism in vSphere 7, specifically to the placement of VMs. In previous releases vSphere HA had a straightforward way of doing placement for VMs when they need to be restarted as a result of a failure. In vSphere 7.0 this mechanism was completely overhauled.

So how did it work pre-vSphere 7?

  • HA uses the cluster configuration
  • HA uses the latest compatibility list it received from vCenter
  • HA leverages a local copy of the DRS algorithm with a basic (fake) set of stats and runs the VMs through the algorithm
  • HA receives a placement recommendation from the local algorithm and restarts the VM on the suggested host
  • Within 5 minutes DRS runs within vCenter, and will very likely move the VM to a different host based on actual load

As you can imagine, this is far from optimal. So what is introduced in vSphere 7? Well, there are now two different ways of doing placement for restarts:

  1. Remote Placement Engine
  2. Simple Placement Engine

The Remote Placement Engine, in short, gives vSphere HA the ability to make a call to DRS for a placement recommendation for a VM. This takes the current load of the cluster, the VM happiness, and all configured affinity/anti-affinity/VM-host affinity rules into consideration! Will this result in a much slower restart? The great thing is that the DRS algorithm has been optimized over the past years, and it is so fast that there will not be a noticeable difference between the old mechanism and the new mechanism. An added benefit for the engineering team is that they can remove the local DRS module, which means there’s less code to maintain. How this works is that the FDM master communicates with the FDM Manager, which runs in vCenter Server. FDM Manager in turn communicates with the DRS service to request a placement recommendation.

Now some of you will probably wonder what happens when vCenter Server is unavailable; this is where the Simple Placement Engine comes into play. The team has developed a new placement engine that basically takes a round-robin approach, but which of course does consider “must rules” and the compatibility list. If a host, for instance, is not connected to the datastore of the VM that needs to be restarted, then that host is excluded from the list of potential placement targets. By the way, before I forget, version 7 also introduces a vCenter heartbeat mechanism as a result: HA heartbeats the vCenter Server instance to understand when it needs to resort to the Simple Placement Engine instead of the Remote Placement Engine.
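To make that round-robin idea a bit more concrete, below is a minimal sketch in plain shell. To be clear: this is illustrative only and not how FDM is actually implemented; the host names, VM names, and the compatibility check are all made up.

# Illustrative sketch: round-robin placement over a list of candidate
# hosts, skipping hosts that fail a compatibility check.
HOSTS="esxi01 esxi02 esxi03"   # example host names
TOTAL=3

# Stand-in for the real compatibility / "must rule" check; here every host
# passes. In reality this would exclude hosts that, for instance, cannot
# see the datastore of the VM.
is_compatible() {
  return 0
}

next=0
for vm in vm-app01 vm-db01 vm-web01; do   # example VMs to restart
  tried=0; placed=""
  while [ $tried -lt $TOTAL ]; do
    host=$(echo $HOSTS | cut -d" " -f$(( next + 1 )))
    next=$(( (next + 1) % TOTAL ))
    tried=$(( tried + 1 ))
    if is_compatible "$host" "$vm"; then placed=$host; break; fi
  done
  echo "restart $vm on ${placed:-<no compatible host>}"
done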

I dug through the FDM log (/var/log/fdm.log) to find some proof of these new mechanisms, and found an entry which shows there are indeed two placement engines:

Invoking the RPE + SPE Placement Engine
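If you want to verify this in your own environment, a simple grep against the FDM log should surface similar entries. Note that the exact log strings may differ per build; the search term below is simply what matched in my lab:

grep -i "placement engine" /var/log/fdm.log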

RPE stands for “remote placement engine” and SPE for “simple placement engine”, where “remote” of course refers to DRS. You may ask yourself: how do you know if DRS is being called? Well, that is something you can see in the DRS log files. When a placement request is received, the below entry shows up in the log file:

FdmWaitForUpdates-vim.ClusterComputeResource:domain-c8-26307464
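If you want to look for this yourself on the vCenter Server Appliance, something along these lines should do it. Do note that I am assuming the default vpxd log location here; the exact path and file may differ per version and deployment type:

grep FdmWaitForUpdates /var/log/vmware/vpxd/vpxd.log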

This even happens when DRS is disabled, and also when you use a license edition which does not include DRS, which is really cool if you ask me. If for whatever reason vCenter Server is unavailable, and as a result DRS can’t be called, you will see this mentioned in the FDM log. As shown below, HA will then use the Simple Placement Engine’s recommendation for the placement of the VM:

Invoke the placement service to process the placement update from SPE

A cool and very useful small HA enhancement in vSphere 7.0 if you ask me!

** Disclaimer: This article contains references to the words master and/or slave. I recognize these as exclusionary words. The words are used in this article for consistency because it’s currently the words that appear in the software, in the UI, and in the log files. When the software is updated to remove the words, this article will be updated to be in alignment. **

I discovered .PNG files on my datastore, can I delete them?

Duncan Epping · Sep 26, 2019 · 1 Comment

I noticed this question on Reddit about .PNG files which were located in VM folders on a datastore. The user wanted to remove the datastore from the cluster but didn’t know where these files were coming from, and whether the VM required those files to be available in some shape or form. I can be brief about it: you can safely delete these .PNG files. These files are typically created by VM Monitoring (part of vSphere HA) when a VM is rebooted by VM Monitoring. This is to ensure you can potentially troubleshoot the problem after the reboot has occurred; it takes a screenshot of the VM to, for instance, capture the blue screen of death. This feature has been in vSphere for a while, but I guess most people have never really noticed it. I wrote an article about it when vSphere 5.0 was released, and below is the screenshot from that article where the .PNG file is highlighted. For whatever reason I had trouble finding my own article on this topic, so I figured I would write a new one. Of course, after finishing this post I found the original article. Anyway, I hope it helps others who find these .PNG files in their VM folders.
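If you want to check whether you have any of these lying around before cleaning up, something like the following, run from the ESXi shell, will list them. The datastore name is just an example, and I am assuming a lowercase .png extension here:

find /vmfs/volumes/datastore1 -name "*.png"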

Oh, and I should have added, it can also be caused by vCloud Director or be triggered through the API, as described by William in this post from 2013.

I want vSphere HA to use a specific Management VMkernel interface

Duncan Epping · Apr 30, 2019

This comes up occasionally: customers have multiple Management VMkernel interfaces and see vSphere HA traffic on both of the interfaces, or on the incorrect interface. In some cases, customers use the IP address of the first management VMkernel interface to connect the host to vCenter, and then set an isolation address that is on the network of the second management VMkernel so that HA uses that network. This is unfortunately not how it works. I wrote about this 6 years ago, but it never hurts to reiterate, as it came up twice over the past couple of weeks. The “Management” tickbox is all about HA traffic. Whether “Management” is enabled or not makes no difference for vCenter or SSH, for instance. If you create a VMkernel interface without the Management tickbox enabled, you can still connect to it over SSH and you can still use its IP to add the host to vCenter Server. Yes, it is confusing, but that is how it works.

If you do not want the interface to be used by vSphere HA, make sure to untick “Management”. Note that this is not the case for vSAN: with vSAN, the vSAN network is automatically used by HA. This only pertains to traditional, non-vSAN based, infrastructures.
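By the way, the “Management” tickbox corresponds to a tag on the VMkernel interface, so you can also inspect and change this from the command line with esxcli. Note that “vmk1” below is just an example interface name:

esxcli network ip interface tag get -i vmk1
esxcli network ip interface tag remove -i vmk1 -t Management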

Trigger APD on iSCSI LUN on vSphere

Duncan Epping · Jun 21, 2018

I was testing various failure scenarios in my lab today for the vSphere Clustering Deepdive session I have scheduled for VMworld. I needed some screenshots and log files of when a datastore hits an APD scenario. For those who don’t know, APD stands for “all paths down”. In other words: the storage is inaccessible and ESXi doesn’t know what has happened or why. vSphere HA has the ability to respond to that kind of failure. I wanted to test this, but my setup was fairly simple and virtual, so I couldn’t unplug any cables. I also couldn’t make configuration changes to the iSCSI array, as that would trigger a PDL (permanent device loss) instead. So how do you test an APD scenario?

After trying various things, like killing the iSCSI daemon (it gets restarted automatically with no impact on the workload), I bumped into this command, which triggered the APD:

  • SSH into the host you want to trigger the APD on and run the following command:
    esxcli iscsi session remove -A vmhba65
  • Make sure, of course, to replace “vmhba65” with the name of your iSCSI adapter
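If you are not sure what your iSCSI adapter is called, you can list the adapters first. And while the sessions are being removed, it is useful to follow the vmkernel log and watch the APD messages appear:

esxcli iscsi adapter list
tail -f /var/log/vmkernel.log | grep -i apd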

This triggered the APD, as witnessed in the fdm.log and vmkernel.log, and it ultimately resulted in vSphere HA killing the impacted VM and restarting it on a healthy host. Anyway, just wanted to share this, as I am sure there are others who would like to test APD responses in their labs, or before their environment goes into production.

There may be other easy ways as well, if you know any, please share in the comments section.

vSphere HA Restart Priority

Duncan Epping · Apr 4, 2018

I’ve seen more and more questions popping up about vSphere HA Restart Priority lately, so I figured I would write something about it. I already did in this post about what’s new in vSphere 6.5, and in the Stretched Cluster guide. It has always been possible to set a restart priority for VMs, but pre-vSphere 6.5 this priority simply referred to the scheduling of the restart of the VM after a failure. Each host in a cluster can restart 32 VMs at the same time, so you can imagine that if the restart priority is only about scheduling VM restarts, it doesn’t really add a lot of value. (Simply because we can schedule many at the same time, the priority as such has little effect.)

As of vSphere 6.5 we have the ability to specify the priority and also to specify when HA should continue with the next batch. Especially this last part is important, as it allows you to specify that HA starts with the next priority level when:

  1. Resources are allocated (default)
  2. VMs are powered on
  3. Guest heartbeat is detected
  4. App heartbeat is detected

I think these are mostly self-explanatory. Note, though, that “resources are allocated” means that a target host for the restart has been found by the master, so this happens within milliseconds. “VMs are powered on” is very similar: it says nothing about when a VM is available, it literally is “power on”. In some cases it could take 10-20 seconds for a VM to be fully booted and the apps to be available, in other cases it may take minutes… It all depends on the services that need to be started within the VM. So if it is important for the “service provided” by the VM to be available before the next batch starts, then option 3 or 4 would be your best pick. Note that with option 4 you will need to have VM/Application Monitoring enabled and an application heartbeat defined within the VM. Once you have made your choice around when to start the next batch, you can simply start adding VMs to a specific level.

Instead of the 3 standard restart “buckets” you now have 5: Highest, High, Medium, Low, Lowest. Why these funny names? Well, that was done in order to stay backwards compatible with vSphere 6 / 5 etc. By default all VMs have the “medium” restart priority, and no, it won’t make any difference if you change all of them to high. That is simply because the restart priority is about the priority between VMs; it doesn’t change the host response times etc. In other words, changing the restart priority only makes sense when you have VMs at different levels, and it usually only makes a big difference when you also change the option “Start next priority VMs when”.

So where do you change this? Well that is pretty straight forward:

  • Click on your HA cluster and then the “Configure” Tab
  • Click on “VM Overrides” and then click “Add”
  • Click on the green plus sign and select the VMs you would like to give a higher or lower priority
  • Then select the new priority and specify when the next batch should start

And if you are wondering: yes, the restart priority also applies when vCenter is not available, so you can even use it to ensure vCenter, AD and DNS are booted up first. All of this info is stored in the cluster configuration data. You can examine this on the command line, by the way, by typing the following:

/opt/vmware/fdm/fdm/prettyPrint.sh clusterconfig

Note that the output is usually pretty big, so you will have to scroll through it to find what you need. If you search for “restartPriority” you should be able to find the VMs for which you changed the priority. Pretty cool right?!
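You can also save yourself some scrolling by piping the output straight through grep, for example:

/opt/vmware/fdm/fdm/prettyPrint.sh clusterconfig | grep -A 2 restartPriority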

Oh, if you didn’t know yet… Frank, Niels and I are actively updating the vSphere Clustering Deep Dive. Hopefully we will have something out “soon”, as in around VMworld.

