I have discussed this topic a couple of times, and want to inform people about a recent change in recommendation. In the past, when deploying a stretched cluster (vMSC), most storage vendors and VMware recommended setting Disk.AutoremoveOnPDL to 0. This disabled the feature that automatically removes LUNs which are in a PDL (permanent device loss) state. Upon return of the device, a rescan would then allow you to use the device again. With vSphere 6.0, however, the way vSphere responds to a PDL scenario has changed: vSphere does not expect the device to return. To be clear, the PDL behaviour in vSphere was designed around the removal of devices. Devices should not stay in the PDL state and then return for duty. This did work in previous versions, but only due to a bug.
With vSphere 6.0 and higher, VMware recommends setting Disk.AutoremoveOnPDL to 1, which is the default setting. If you are a vMSC / stretched cluster customer, please change your environment and design accordingly. But before you do, please consult your storage vendor and discuss the change. I would also recommend testing the change to validate that the environment returns for duty correctly after a PDL! Sorry about the confusion.
A KB article backing my recommendation was just posted: https://kb.vmware.com/kb/2059622. The documentation (vMSC whitepaper) is also being updated.
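For those who script their host configuration, the recommendation above can be sketched roughly as follows. This is a minimal illustration, not an official tool: the function names are hypothetical, the "0 before 6.0" branch reflects only the older vMSC guidance described in this post (check with your storage vendor first), and the command string assumes the usual esxcli syntax for advanced settings.

```python
def recommended_autoremove_on_pdl(vsphere_version: str) -> int:
    """Return the Disk.AutoremoveOnPDL value this post suggests for a vMSC host.

    Hypothetical helper: 1 (the default) for vSphere 6.0 and higher,
    0 for older releases, per the earlier vMSC guidance.
    """
    parts = vsphere_version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return 1 if (major, minor) >= (6, 0) else 0


def esxcli_set_command(value: int) -> str:
    """Build the esxcli command line that would apply the setting on a host.

    Assumption: the advanced option path is /Disk/AutoremoveOnPDL.
    """
    if value not in (0, 1):
        raise ValueError("Disk.AutoremoveOnPDL must be 0 or 1")
    return f'esxcli system settings advanced set -o "/Disk/AutoremoveOnPDL" -i {value}'


if __name__ == "__main__":
    for version in ("5.5", "6.0"):
        value = recommended_autoremove_on_pdl(version)
        print(version, "->", esxcli_set_command(value))
```

The script only prints the command; it does not touch a host. Run the resulting command per host only after validating the PDL behaviour in a test environment.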
Hi Duncan, after having read your post I contacted EMC support. We have a VPLEX Metro cluster providing storage to an ESX 5.5/6 stretched cluster. This is their reply:
EMC recommends setting Disk.AutoremoveOnPDL to 0 (disable this function)
– In the case of a VMware vSphere Metro Storage Cluster, the PDLs are likely temporary because one site has become orphaned from the other, in which case a failover has occurred. If the devices in a PDL state are removed permanently, they will not automatically be visible to hosts again when the failure or configuration error of the vMSC environment is fixed. This will require a manual rescan in order to bring the devices back into service. The whole reason for having a vMSC environment is that it handles these types of things automatically, so you don’t want to have to do manual rescans all the time. For this reason, the PDL AutoRemove functionality should be disabled on all hosts that are part of a vMSC configuration.
– In addition, PDL AutoRemove occurs only if there are no open handles left on the device. The auto-remove takes place when the last handle on the device closes. If the device recovers, or if it is re-added after having been inadvertently removed, it is treated as a new device. In such cases VMware does not guarantee consistency for VMs on that datastore.
– In summary, in a vMSC environment, such as with VPLEX Metro, AutoRemove should be disabled so that devices appear instantly upon recovery and do not require rescanning.
So I’m quite confused now.
Thanks.
Matteo
Yes, I understand you are confused. The change in behaviour and the change in best practice were actually discussed with EMC and our Support Team. Documents on both sides should be updated soon as well. I will stress this again today.
Hi Duncan, any news on this?
Yes, there was an update to the KB article, and the documentation is also being changed, but this is a slower process.
The revised KB can be found here with the recommendation to NOT disable Disk.AutoremoveOnPDL for vSphere 6.0 and higher:
https://kb.vmware.com/selfservice/kb/2059622
Hi Duncan, I’ve been in contact with EMC support about this and they’re now saying that with VPLEX Metro this parameter should be set to 1 for ESXi 6.0 AND 5.5.
Is there any official EMC documentation about this?
Thanks.
Matteo
I don’t know about EMC documentation, I don’t own that… but I do know about our documentation and it clearly states this change in behaviour. Please point EMC to KB: https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2059622
They can contact our team for more details. (I will also ping them once again.)