vSphere 5.5 U1 patch released for NFS APD problem!

Duncan Epping · Jun 11, 2014

On April 19th I wrote about an issue with vSphere 5.5 U1 and NFS-based datastores APD'ing. People internally at VMware have worked very hard to root cause the issue and fix it. The log entries witnessed are:

YYYY-04-01T14:35:08.075Z: [APDCorrelator] 9414268686us: [esx.problem.storage.apd.start] Device or filesystem with identifier [12345678-abcdefg0] has entered the All Paths Down state.
YYYY-04-01T14:36:55.274Z: No correlator for vob.vmfs.nfs.server.disconnect
YYYY-04-01T14:36:55.274Z: [vmfsCorrelator] 9521467867us: [esx.problem.vmfs.nfs.server.disconnect] 192.168.1.1/NFS-DS1 12345678-abcdefg0-0000-000000000000 NFS-DS1
YYYY-04-01T14:37:28.081Z: [APDCorrelator] 9553899639us: [vob.storage.apd.timeout] Device or filesystem with identifier [12345678-abcdefg0] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed.
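
To check whether a host shows the same pattern, the entries above can be scanned for the APD start and APD timeout events. The following is a minimal sketch, not an official tool: it assumes the number before `us:` is a microsecond counter (the suffix suggests so), and the function names `parse_events` and `apd_seconds` are made up for illustration.

```python
import re

# Matches the correlator name, the counter with its "us" suffix, and the
# event id, e.g. "[APDCorrelator] 9414268686us: [esx.problem.storage.apd.start]".
EVENT_RE = re.compile(
    r"\[(?P<correlator>\w+)\] (?P<us>\d+)us: \[(?P<event>[\w.]+)\]"
)


def parse_events(lines):
    """Return (microseconds, event_id) pairs for lines that carry an event."""
    events = []
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            events.append((int(match.group("us")), match.group("event")))
    return events


def apd_seconds(events):
    """Seconds between the first APD start and the first APD timeout, or None."""
    start = next((us for us, ev in events if ev.endswith("apd.start")), None)
    timeout = next((us for us, ev in events if ev.endswith("apd.timeout")), None)
    if start is None or timeout is None:
        return None
    # Assumption: the counter is in microseconds.
    return (timeout - start) / 1_000_000


# The four log lines quoted in this post (messages shortened after the event id).
SAMPLE = [
    "YYYY-04-01T14:35:08.075Z: [APDCorrelator] 9414268686us: "
    "[esx.problem.storage.apd.start] Device or filesystem with identifier "
    "[12345678-abcdefg0] has entered the All Paths Down state.",
    "YYYY-04-01T14:36:55.274Z: No correlator for vob.vmfs.nfs.server.disconnect",
    "YYYY-04-01T14:36:55.274Z: [vmfsCorrelator] 9521467867us: "
    "[esx.problem.vmfs.nfs.server.disconnect] 192.168.1.1/NFS-DS1",
    "YYYY-04-01T14:37:28.081Z: [APDCorrelator] 9553899639us: "
    "[vob.storage.apd.timeout] Device or filesystem with identifier "
    "[12345678-abcdefg0] has entered the All Paths Down Timeout state",
]

events = parse_events(SAMPLE)
print(apd_seconds(events))
```

For the sample lines this gives roughly 139.6 seconds between the two APD events, which lines up with the 140 seconds reported in the timeout message.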

More details on the fix can be found here: http://kb.vmware.com/kb/2077360

Comments

Jeff says

11 June, 2014 at 09:12

Awesome work from VMware engineers as usual! Anxious to deploy this over the weekend.

Admin says

11 June, 2014 at 12:56

Thanks for releasing the patch for an important bug after almost two months. Come on VMware, you can do better.

Keeping enterprise customers in the dark about the release details will not make them happy.
- Duncan Epping says
  
  11 June, 2014 at 14:38
  
  I recommend that you provide this feedback directly to your VMware pre-sales or sales contact. That way the people responsible will hear directly from customers how things like these are experienced. Thanks.

Anthony Spiteri (@anthonyspiteri) says

11 June, 2014 at 15:38

Transparency on the root cause would be a favourable outcome given the time it took to resolve and the general hush hush nature of the problem.

It otherwise causes unnecessary speculation.

Glad to have the fix though… just in time for a platform upgrade.

Andy says

20 June, 2014 at 07:45

I’ve patched our hosts and am still getting the APDCorrelator errors, with NFS datastores going offline for a minute or two in the vSphere Client.
We are not getting BSODs or issues with Linux file systems, just performance issues to the point where any applications that have to connect to DBs are crashing. I have a ticket open with VMware and am waiting to hear something.
- Will says
  
  1 July, 2014 at 16:32
  
  That is worrying! We are planning to upgrade and we use NFS for almost everything. Am I better off staying on ESXi 5.5 GA, I wonder?
- Jim says
  
  17 July, 2014 at 20:31
  
  My company also applied the patch and had APD occur again. We are using NetApp and found that NetApp still recommends setting the NFS max queue depth (NFS.MaxQueueDepth) to 64.
  
  KB ID: 1014696 Version: 5.0 Published date: 07/11/2014
  https://kb.netapp.com/support/index?page=content&id=1014696&actp=LIST_RECENT
  VMware has published KB 2016122: NFS connectivity issues on NetApp NFS filers on ESXi 5.x and KB 2077360: VMware ESXi 5.5, Patch ESXi550-201406401-SG: Updates esx-base
  
  VMware's claim is that there is a version of Data ONTAP that ‘resolves’ this issue.
  The Data ONTAP upgrade referenced will ONLY prevent the TCP window size from dropping to 0; it will NOT resolve all APD issues.
  Additionally, enabling SIOC will only ‘resolve’ the issue ‘after’ it begins occurring; it will not ‘prevent’ the issue from occurring.
  
  The only recommended way to resolve this is to limit the NFS max queue depth (NFS.MaxQueueDepth) to 64.
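
For readers who want to try the queue depth limit described in the comment above, here is a small sketch that assembles the `esxcli` invocations for the `/NFS/MaxQueueDepth` advanced setting. It only builds and prints the command lines and does not run anything. The helper names `build_set_cmd` and `build_list_cmd` are made up for illustration, and the value 64 comes from the comment, not from any testing done here. Check the referenced KB articles for the exact procedure, including whether a host reboot is needed.

```python
# Advanced setting path as used by "esxcli system settings advanced".
SETTING = "/NFS/MaxQueueDepth"


def build_set_cmd(depth=64):
    """Build the esxcli argument list that sets the NFS queue depth."""
    if not isinstance(depth, int) or depth < 1:
        raise ValueError("depth must be a positive integer")
    return ["esxcli", "system", "settings", "advanced", "set",
            "-o", SETTING, "-i", str(depth)]


def build_list_cmd():
    """Build the esxcli argument list that shows the current value."""
    return ["esxcli", "system", "settings", "advanced", "list", "-o", SETTING]


# Print the commands so they can be reviewed before being run on a host.
print(" ".join(build_list_cmd()))
print(" ".join(build_set_cmd(64)))
```

Printing the commands instead of executing them keeps the sketch safe to run anywhere; on an ESXi host the same argument lists could be handed to a process runner once they have been reviewed.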
