
Yellow Bricks

by Duncan Epping


Change in Permanent Device Loss (PDL) behavior for vSphere 5.1 and up?

Duncan Epping · Aug 1, 2013 ·

Yesterday someone asked me a question on Twitter about an EMC whitepaper on stretched clusters and Permanent Device Loss (PDL) behavior. For those who don’t know what a PDL is, make sure to read this article first. The EMC whitepaper states the following on page 40:

Note:

In a full WAN partition that includes cross-connect, VPLEX can only send SCSI sense code (2/4/3+5) across 50% of the paths since the cross-connected paths are effectively dead. When using ESXi version 5.1 and above, ESXi servers at the non-preferred site will declare PDL and kill VM’s causing them to restart elsewhere (assuming advanced settings are in place); however ESXi 5.0 update 1 and below will only declare APD (even though VPLEX is sending sense code 2/4/3+5). This will result in a VM zombie state. Please see the section Path loss handling semantics (PDL and APD)

As far as I understood, and I tested this with 5.0 U1, the VMs would indeed not be killed when half of the paths were declared APD and the other half PDL. But I guess something has changed with vSphere 5.1. I knew about one change that isn’t clearly documented, so I figured I would do some digging and write a short article on this topic. Here are the changes in behavior:

Virtual Machine using multiple Datastores:

  • vSphere 5.0 U1 and lower: When a virtual machine’s files are spread across multiple datastores, it might not be restarted when a Permanent Device Loss scenario occurs.
  • vSphere 5.1 and higher: When a virtual machine’s files are spread across multiple datastores and a Permanent Device Loss scenario occurs, vSphere HA will restart the virtual machine, taking the availability of those datastores on the various hosts in your cluster into account.
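To make the 5.1 behavior a bit more concrete, here is a minimal Python sketch of the idea of "taking datastore availability into account". This is not how vSphere HA is implemented; the function name `restart_candidates` and the host and datastore names are made up for illustration. It simply assumes a host only qualifies for a restart when it can reach every datastore the virtual machine uses.

```python
def restart_candidates(vm_datastores, host_datastores):
    """Return the hosts that can reach every datastore the VM uses.

    vm_datastores: iterable of datastore names holding the VM's files.
    host_datastores: dict mapping host name -> iterable of datastores
    that host can currently access.

    Illustrative only: an assumption about what "taking availability
    into account" means, not the actual vSphere HA placement logic.
    """
    needed = set(vm_datastores)
    return sorted(
        host
        for host, accessible in host_datastores.items()
        if needed <= set(accessible)  # host must see all required datastores
    )


if __name__ == "__main__":
    hosts = {
        "esx1": ["ds1"],                # lost ds2, cannot run the VM
        "esx2": ["ds1", "ds2"],         # sees both datastores
        "esx3": ["ds1", "ds2", "ds3"],  # sees both, plus an unrelated one
    }
    print(restart_candidates(["ds1", "ds2"], hosts))
```

In this toy cluster only `esx2` and `esx3` would be considered, because `esx1` cannot reach one of the two datastores.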

Half of the paths in APD state:

  • vSphere 5.0 U1 and lower: When the datastore on which your virtual machine resides is not 100% declared in a PDL state (assume half of the paths are in APD), the virtual machine will not be killed and restarted.
  • vSphere 5.1 and higher: When the datastore on which your virtual machine resides is not 100% declared in a PDL state (assume half of the paths are in APD), the virtual machine will be killed and restarted. This is a huge change compared to 5.0 U1 and lower.
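The difference between the two releases can be summarized as a small decision function. The Python below is only a sketch of the behavior described above, not anything taken from ESXi. The names (`PathState`, `vm_killed`) are hypothetical, and two things are my own assumptions: that a VM is never killed while at least one path is still active, and that the PDL-related advanced settings are modeled as a single boolean.

```python
from enum import Enum


class PathState(Enum):
    ACTIVE = "active"
    APD = "apd"  # All Paths Down: no sense code received
    PDL = "pdl"  # Permanent Device Loss: array sent a PDL sense code


def vm_killed(version, path_states, advanced_settings=True):
    """Sketch of whether a VM on this datastore is killed (and restarted).

    version: (major, minor) tuple, e.g. (5, 0) or (5, 1).
    path_states: list of PathState values, one per path to the datastore.
    advanced_settings: assumed to stand in for the PDL advanced settings.
    """
    if not advanced_settings or not path_states:
        return False
    if any(state is PathState.ACTIVE for state in path_states):
        return False  # assumption: a working path keeps the VM alive
    pdl_count = sum(state is PathState.PDL for state in path_states)
    if version >= (5, 1):
        # 5.1 and higher: a partial PDL (rest in APD) is enough
        return pdl_count > 0
    # 5.0 U1 and lower: every path must report PDL
    return pdl_count == len(path_states)


if __name__ == "__main__":
    half_and_half = [PathState.PDL, PathState.PDL, PathState.APD, PathState.APD]
    print("5.0 U1:", vm_killed((5, 0), half_and_half))
    print("5.1:   ", vm_killed((5, 1), half_and_half))
```

With half of the paths in PDL and half in APD, the sketch returns `False` for 5.0 and `True` for 5.1, which is exactly the change in behavior described in the two bullets.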

These are the changes in behavior I know about for vSphere 5.1. I have asked engineering to confirm these changes for vSphere Metro Storage Cluster environments. When I have received an answer, I will update this blog.


Server, Storage clustering, PDL, Storage, vmsc, vsphere metro storage cluster

Comments

  1. Graham Mitchell says

    1 August, 2013 at 14:41

    Duncan,
    Any way to subscribe to this post without having to type a comment?

  2. Ben says

    1 August, 2013 at 23:34

    Subscribing to post…

  3. Paul Martin says

    2 August, 2013 at 10:56

    subscribing to post

  4. Paul Martin says

    2 August, 2013 at 11:01

    Duncan, is this behaviour only with NMP? I have tested something similar in a metro cluster with EMC PowerPath and we didn’t observe this type of behaviour. If only half the paths went dead, the VMs stayed alive and standby paths took over. This was with PPVE5.7 and VPLEX Metro 5.2 in a cross-cluster config in a metro stretched cluster, on vSphere 5.1.

    Thanks

    • Johan Blom says

      3 September, 2013 at 14:58

      I did not observe this behavior with PPVE5.8 or NMP.

      I did configure das.maskCleanShutdownEnabled and disk.terminateVMOnPDLDefault.

      When all paths were down it

      • Johan Blom says

        3 September, 2013 at 16:13

        When all paths were dead, HA restarted the VMs.

        • Johan Blom says

          5 September, 2013 at 13:25

          But in this case (which I assume is the case EMC describes), the non-preferred site loses its connection to the other site and the witness (cross-connect is “down” too), so the LUNs will be detached and VPLEX will send a PDL. Therefore the VMs will be restarted if the advanced settings have been configured. That’s what happened here when I tried it.

  5. Raiko Mesterheide says

    2 August, 2013 at 13:31

    Subscribing to post…

  6. JTurver says

    5 August, 2013 at 07:50

    Subscribing thanks

  7. Alexander Kocken says

    9 August, 2013 at 14:50

    Subscribing to post…

  8. Per says

    13 August, 2013 at 10:45

    Subscribing to post

About the author

Duncan Epping is a Chief Technologist in the Office of CTO of the Cloud Platform BU at VMware. He is a VCDX (# 007), the author of the "vSAN Deep Dive", the “vSphere Clustering Technical Deep Dive” series, and the host of the "Unexplored Territory" podcast.
