Yellow Bricks

by Duncan Epping

vSphere HA

Running vSphere 6.7 or 6.5 and configured HA APD/PDL responses? Read this…

Duncan Epping · May 14, 2020

If you are running vSphere 6.7 or 6.5, have not yet installed 6.7 P02 (6.5 P05 will be available soon), and have APD/PDL responses configured within vSphere HA, an issue could cause VMs not to be failed over when an APD or PDL occurs. This is a known issue in these releases, and P02 or P05 solves it. What is the problem? A bug causes settings that were never configured for VMs listed in “VM Overrides” to be set to “disabled” instead of “unset”, specifically the APD/PDL setting.

This means that even though you have APD/PDL responses configured at the cluster level, the VM-level configuration overrides it, as it is set to “disabled”. It does not really matter why you added the VMs to VM Overrides; it could have been to configure VM Restart Priority, for instance. The frustrating part is that the UI does not show that the setting is disabled; it looks as if it is simply not configured.

If you can’t install the patch just yet, for whatever reason, but you do have VMs in VM Overrides, make sure to go to VM Overrides and explicitly configure the VMs to have the APD/PDL responses enabled similar to what it is configured to on a cluster level as shown in the screenshots below.

vSphere HA internals: VMCP super aggressive option in vSphere 7

Duncan Epping · May 11, 2020

Most of you have probably heard of a feature called VMCP, aka VM Component Protection. If not: this is the functionality in vSphere HA that enables you to restart VMs which have been impacted by a PDL (permanent device loss) or APD (all paths down) scenario. (If you have no idea what I am talking about, read this article first.)

When you configure the APD response you have four options:

  1. Disable
  2. Issue Event
  3. Power Off / Restart – Conservative
  4. Power Off / Restart – Aggressive

The main difference between Conservative and Aggressive shows when HA is not sure whether a VM can be restarted during an APD scenario: Conservative will not power off the VM in that case, while Aggressive will. However, even with Aggressive, if HA is certain that the VM cannot be powered on elsewhere, it will not power off the VM. Basically, it prefers keeping the VM available.
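
For reference, in the vSphere API these responses correspond to the vmStorageProtectionForAPD values ‘disabled’, ‘warning’, ‘restartConservative’, and ‘restartAggressive’. Here is a minimal pyVmomi sketch of applying one of them as the cluster default, assuming a vCenter connection and cluster lookup as in the sketch shown earlier on this page:

    from pyVmomi import vim  # si/cluster obtained as in the earlier sketch

    # Set the cluster-wide APD response to "Power Off / Restart - Aggressive";
    # VM Component Protection itself must be enabled for the response to apply
    das = vim.cluster.DasConfigInfo(
        vmComponentProtecting='enabled',
        defaultVmSettings=vim.cluster.DasVmSettings(
            vmComponentProtectionSettings=vim.cluster.VmComponentProtectionSettings(
                vmStorageProtectionForAPD='restartAggressive')))
    cluster.ReconfigureComputeResource_Task(
        spec=vim.cluster.ConfigSpecEx(dasConfig=das), modify=True)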

As you can imagine, in certain scenarios having a VM running while it is impacted by an APD situation makes no sense. The VM has lost access to storage, and you may simply prefer to kill the workload. Why? When a VM loses access to storage it cannot write to disk, so you could end up in a situation where a change is acknowledged and you think it has been written to disk, while it is actually still sitting in an in-memory cache.

If you prefer the VM to be killed, regardless of whether it can be restarted or not, you can enable this via a vSphere HA advanced setting. Now before you implement this, do note that if a cluster-wide APD situation occurs, you could find yourself in the scenario where ALL virtual machines are powered off by HA and not restarted as the resources are not available. Anyway, if you feel this is a requirement, you can configure the following vSphere HA advanced setting in vSphere 7:

das.restartVmsWithoutResourceChecks = true
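
If you prefer to set this programmatically, HA advanced settings are passed as key/value options in dasConfig. A minimal pyVmomi sketch, again assuming the connection and cluster lookup from the first sketch on this page:

    from pyVmomi import vim  # si/cluster obtained as in the first sketch

    # Add the HA advanced option; HA will now terminate APD-impacted VMs
    # without first checking whether they can be restarted elsewhere
    das = vim.cluster.DasConfigInfo(option=[
        vim.option.OptionValue(key='das.restartVmsWithoutResourceChecks',
                               value='true')])
    cluster.ReconfigureComputeResource_Task(
        spec=vim.cluster.ConfigSpecEx(dasConfig=das), modify=True)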

2 node direct connect vSAN and error “vSphere HA agent on this host could not reach isolation address”

Duncan Epping · Oct 7, 2019

I’ve had this question over a dozen times now, so I figured I would add a quick pointer to my blog. What is causing the error “vSphere HA agent on this host could not reach isolation address” to pop up on a 2-node direct connect vSAN cluster? The answer is simple, when you have vSAN enabled HA uses the vSAN network for communication. When you have a 2-node Direct Connect the vSAN network is not connected to a switch and there are no other reachable IP addresses other than the IP addresses of the vSAN VMkernel interfaces.

When HA tests whether the isolation address (the default gateway of the management interface) is reachable, the ping therefore fails. You can solve this simply by disabling the isolation response, as described in this post here.
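
For completeness, disabling the isolation response can also be scripted; it corresponds to an isolationResponse of ‘none’ (leave powered on) in the API. A minimal pyVmomi sketch, assuming the connection and cluster lookup from the first sketch on this page:

    from pyVmomi import vim  # si/cluster obtained as in the first sketch

    # "none" leaves VMs powered on when a host is isolated, i.e. the
    # isolation response is disabled
    das = vim.cluster.DasConfigInfo(
        defaultVmSettings=vim.cluster.DasVmSettings(isolationResponse='none'))
    cluster.ReconfigureComputeResource_Task(
        spec=vim.cluster.ConfigSpecEx(dasConfig=das), modify=True)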

I discovered .PNG files on my datastore, can I delete them?

Duncan Epping · Sep 26, 2019

I noticed this question on Reddit about .PNG files located in VM folders on a datastore. The user wanted to remove the datastore from the cluster but did not know where these files were coming from, or whether the VM required them in some shape or form. I can be brief about it: you can safely delete these .PNG files. They are typically created by VM Monitoring (part of vSphere HA) when a VM is rebooted by VM Monitoring. A screenshot of the VM is taken to capture, for instance, a blue screen of death, so that you can potentially troubleshoot the problem after the reboot has occurred.

This feature has been in vSphere for a while, but I guess most people have never really noticed it. I wrote an article about it when vSphere 5.0 was released. For whatever reason I had trouble finding my own article on this topic, so I figured I would write a new one. Of course, after finishing this post I found the original article. Anyway, I hope it helps others who find these .PNG files in their VM folders.

Oh, and I should have added that these files can also be created by vCloud Director or be triggered through the API, as described by William in this post from 2013.

This host has no isolation addresses defined as required by vSphere HA

Duncan Epping · Dec 19, 2018

I had a comment on one of my 2-node vSAN cluster articles that there was an issue with HA when disabling the Isolation Response. The isolation response is not required for 2-node, as it is impossible to properly detect an isolation event, and vSAN has a mechanism that does exactly what the Isolation Response does: kill the VMs when they have become useless. The error witnessed was “This host has no isolation addresses defined as required by vSphere HA”.

So now what? First of all, as mentioned in the comments section, vSphere HA always checks whether an isolation address is specified; that can be the default gateway of the management network, or the isolation address you specified through the advanced setting das.isolationaddress. The use of das.isolationaddress often goes hand in hand with das.usedefaultisolationaddress set to false, and it is that last setting, das.usedefaultisolationaddress, that causes the error above to be triggered. What you should do in a 2-node configuration is the following:

  1. Do not configure the isolation response; the explanation can be found in the above-mentioned article.
  2. Do not configure das.usedefaultisolationaddress, or if it is configured, set it to true.
  3. Make sure you have a gateway on the management VMkernel interface; if that is not the case, you can set das.isolationaddress to 127.0.0.1 to prevent the error from popping up (see the sketch below).
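
For those automating cluster configuration, here is a minimal pyVmomi sketch covering steps 2 and 3, again assuming the connection and cluster lookup from the first sketch on this page; the loopback address mirrors the workaround above and only serves to suppress the error:

    from pyVmomi import vim  # si/cluster obtained as in the first sketch

    # Step 2: make sure das.usedefaultisolationaddress is true, and
    # step 3: point das.isolationaddress at 127.0.0.1 when no gateway exists
    das = vim.cluster.DasConfigInfo(option=[
        vim.option.OptionValue(key='das.usedefaultisolationaddress', value='true'),
        vim.option.OptionValue(key='das.isolationaddress', value='127.0.0.1')])
    cluster.ReconfigureComputeResource_Task(
        spec=vim.cluster.ConfigSpecEx(dasConfig=das), modify=True)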

Hope this helps those hitting this error message.
