
Yellow Bricks

by Duncan Epping

APD

Trigger APD on iSCSI LUN on vSphere

Duncan Epping · Jun 21, 2018 ·

I was testing various failure scenarios in my lab today for the vSphere Clustering Deepdive session I have scheduled for VMworld. I needed some screenshots and log files of when a datastore hit an APD scenario. For those who don’t know, APD stands for All Paths Down. In other words: the storage is inaccessible and ESXi doesn’t know what has happened or why. vSphere HA has the ability to respond to that kind of failure. I wanted to test this, but my setup was fairly simple and virtual, so I couldn’t unplug any cables. I also couldn’t make configuration changes to the iSCSI array, as that would trigger a PDL (permanent device loss) instead. So how do you test an APD scenario?

After trying various things like killing the iSCSI daemon (it gets restarted automatically with no impact on the workload), I bumped into this command, which triggered the APD:

  • SSH in to the host you want to trigger the APD on and run the following command:
    esxcli iscsi session remove -A vmhba65
  • Make sure of course to replace “vmhba65” with the name of your iSCSI adapter (a fuller sketch of the procedure follows below)
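For completeness, here is roughly what the whole procedure looks like from the ESXi shell. Treat it as a sketch: the adapter name (vmhba65) is just what my lab happened to use.

    # list the iSCSI adapters to find the right vmhba name
    esxcli iscsi adapter list
    # confirm there are active sessions on the adapter
    esxcli iscsi session list -A vmhba65
    # remove all sessions on the adapter, this is what triggers the APD
    esxcli iscsi session remove -A vmhba65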

This triggered APD, as witnessed in the fdm.log and vmkernel.log, and ultimately resulted in vSphere HA killing the impacted VM and restarting it on a healthy host. Anyway, just wanted to share this as I am sure there are others who would like to test APD responses in their labs or before their environment goes into production.
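If you want to verify the APD state yourself, the relevant entries end up in /var/log/vmkernel.log and /var/log/fdm.log on the host. Something along these lines should surface them (a sketch; the exact messages differ per build):

    # look for APD events on the affected host
    grep -i apd /var/log/vmkernel.log
    grep -i apd /var/log/fdm.log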

There may be other easy ways as well, if you know any, please share in the comments section.

Using HA VM Component Protection in a mixed environment

Duncan Epping · Nov 29, 2017 ·

I have some customers who are running both traditional storage and vSAN in the same environment. As most of you are aware, vSAN and VMCP do not go together at this point. So what does that mean for traditional storage, where VMCP can protect you against certain storage failure scenarios?

Well, the statement around vSAN and VMCP is actually a bit more delicate. vSAN does not propagate PDL or APD in a way which VMCP understands. So you can enable VMCP in your environment without it having an impact on VMs running on top of vSAN. The VMs which are running on the traditional storage will be able to use the VMCP functionality, and if an APD or PDL is declared on the LUN they are running on, vSphere HA will take action. For vSAN, well, we don’t propagate the state of a disk that way, and we have other mechanisms to provide availability / resiliency.

In summary: Yes, you can enable HA VMCP in a mixed storage environment (vSAN + Traditional Storage). It is fully supported.

vSphere 5.5 U1 patch released for NFS APD problem!

Duncan Epping · Jun 11, 2014 ·

On April 19th I wrote about an issue with vSphere 5.5 U1 and NFS based datastores APD’ing. People internally at VMware have worked very hard to root cause the issue and fix it. Log entries witnessed are:

YYYY-04-01T14:35:08.075Z: [APDCorrelator] 9414268686us: [esx.problem.storage.apd.start] Device or filesystem with identifier [12345678-abcdefg0] has entered the All Paths Down state.
YYYY-04-01T14:36:55.274Z: No correlator for vob.vmfs.nfs.server.disconnect
YYYY-04-01T14:36:55.274Z: [vmfsCorrelator] 9521467867us: [esx.problem.vmfs.nfs.server.disconnect] 192.168.1.1/NFS-DS1 12345678-abcdefg0-0000-000000000000 NFS-DS1
YYYY-04-01T14:37:28.081Z: [APDCorrelator] 9553899639us: [vob.storage.apd.timeout] Device or filesystem with identifier [12345678-abcdefg0] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed. 

More details on the fix can be found here: http://kb.vmware.com/kb/2077360

vSphere 5.1 All Paths Down (APD) enhancements

Duncan Epping · Sep 10, 2012 ·

I’ve written about Permanent Device Loss multiple times, but another scenario that some of you might have encountered is All Paths Down. The name already describes the scenario, but an example would be when, for whatever reason, the network between the host and the array fails. This would result in an APD condition, meaning that the LUNs are unreachable because all paths to the LUN are gone.

Some of you who have been in this scenario probably also have seen hosts being disconnected. In some cases, and I have seen this happen, a host might even freeze up. This would typically happen when a lot of I/O was sent to the datastore. This is of course something that everyone would want to avoid, and hence a new advanced setting has been introduced: a new mechanism to handle APD conditions.

This brand new setting is called Misc.APDHandlingEnable. It can be set to 0 or 1. A value of zero means that ESXi will stick to the “old” method, which is to always retry failed I/Os. A value of 1 enables the new behavior, which allows ESXi to “fast-fail” I/Os. This will happen after 140 seconds by default. Fast-failing I/Os is what prevents the host from being disconnected or freezing up. The timeout is configurable through Misc.APDTimeout; note that the minimum value for Misc.APDTimeout is 20 seconds. You can set a filter in the Web Client to quickly find the right advanced settings.
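If you prefer the command line over the Web Client, the same settings can be changed with esxcli. A minimal sketch, using the new behavior and the default timeout as example values:

    # enable the new APD handling behavior (0 = old retry behavior, 1 = fast-fail)
    esxcli system settings advanced set -o /Misc/APDHandlingEnable -i 1
    # set the APD timeout in seconds (default 140, minimum 20)
    esxcli system settings advanced set -o /Misc/APDTimeout -i 140
    # verify the current values
    esxcli system settings advanced list -o /Misc/APDHandlingEnable
    esxcli system settings advanced list -o /Misc/APDTimeout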

Cormac Hogan has a great article about APD with a lot more technical details, make sure to read it.
