I am currently updating the vSphere Metro Storage Cluster best practices white paper. Over the last two weeks I received various questions asking whether there are any new recommendations for vMSC in 6.0. I have summarized the recommendations below for your convenience. The white paper is being reviewed and I am updating screenshots, so hopefully it will be done soon.
- To allow vSphere HA to respond to both an APD (All Paths Down) and a PDL (Permanent Device Loss) condition, vSphere HA needs to be configured in a specific way. VMware recommends enabling VM Component Protection (VMCP). It is not part of the cluster creation step, so it needs to be enabled after the cluster has been created.
- The configuration for PDL is basic. In the “Failure conditions and VM response” section you can configure what the response should be after a PDL condition is detected. VMware recommends setting this to “Power off and restart VMs”. When this condition is detected, a VM will be restarted instantly on a healthy host within the vSphere HA cluster.
- When an APD condition is detected, a timer is started. After 140 seconds the APD condition is officially declared and the device is marked as APD timed out. Once the 140 seconds have passed, HA starts counting; the default HA timeout is 3 minutes. When the 3 minutes have passed, HA will restart the impacted virtual machines, but you can configure VMCP to respond differently if desired. VMware recommends configuring it to “Power off and restart VMs (conservative)”.
- Conservative refers to the likelihood of HA being able to restart VMs. When set to “conservative”, HA will only restart the VM that is impacted by the APD if it knows another host can restart it. In the case of “aggressive”, HA will try to restart the VM even if it doesn’t know the state of the other hosts, which could lead to a situation where your VM is not restarted because no host has access to the datastore the VM is located on.
- It is also good to know that if the APD is lifted and access to the storage is restored before the time-out has passed, HA will not unnecessarily restart the virtual machine, unless you explicitly configure it to do so. If a response is desired even when the environment has recovered from the APD condition, then “Response for APD recovery after APD timeout” should be configured to “Reset VMs”. VMware recommends leaving this setting disabled.
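To make the APD timeline above a bit more tangible, here is a minimal Python sketch of the decision logic as I read it from the bullets. It is only an illustration and not how vSphere HA is implemented: the function `apd_outcome`, its parameters, and the `Outcome` names are all hypothetical, and the handling of a recovery that happens between the 140-second APD timeout and the end of the 3-minute HA delay is my interpretation of the “Response for APD recovery after APD timeout” setting.

```python
from enum import Enum

# Values taken from the text above (defaults, both can differ in your environment).
APD_TIMEOUT_S = 140        # after this the device is marked as APD timed out
VMCP_DELAY_S = 3 * 60      # default HA delay that starts after the APD timeout


class Outcome(Enum):
    NO_ACTION = "no action"
    RESET = "reset VM"
    RESTART = "power off and restart VM"


def apd_outcome(recovered_after_s=None, reset_on_recovery=False,
                conservative=True, other_host_known_good=True):
    """Hypothetical model of the VMCP response to an APD condition.

    recovered_after_s: seconds until storage access returned, None if it never did.
    reset_on_recovery: models "Response for APD recovery after APD timeout" = "Reset VMs".
    conservative: True for "conservative", False for "aggressive".
    other_host_known_good: whether HA knows another host can restart the VM.
    """
    restart_at = APD_TIMEOUT_S + VMCP_DELAY_S

    # Storage came back before the APD timeout was even declared.
    if recovered_after_s is not None and recovered_after_s < APD_TIMEOUT_S:
        return Outcome.NO_ACTION

    # Storage came back after the APD timeout but before the HA delay expired.
    if recovered_after_s is not None and recovered_after_s < restart_at:
        return Outcome.RESET if reset_on_recovery else Outcome.NO_ACTION

    # APD persisted: conservative only acts when another host is known to be usable.
    if conservative and not other_host_known_good:
        return Outcome.NO_ACTION
    return Outcome.RESTART
```

With the defaults, a restart is triggered 140 + 180 = 320 seconds after the APD condition is first detected. Calling `apd_outcome(recovered_after_s=200)` models an APD that clears during the HA delay with the recommended setting left disabled, so nothing happens to the VM.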
Joseph Griffiths says
So glad that component protection has been added. Thanks for the quick note.
Philip Coakes says
Duncan; does this latest update to vMSC for 6.0 whitepapers mean that an updated Clustering Deepdive is imminent?
Duncan Epping says
It may just be… who knows 😉
Jon says
Arrg!
Jake says
Any update on this whitepaper? In the middle of an XtremIO/VPLEX/vSphere 6 vMSC and want to get it right! thanks!
Duncan Epping says
Been on a holiday for a couple of weeks, so the paper is delayed. No ETA at the moment.
Sim says
Hello Duncan
Was just wondering if there is any update to this whitepaper.
I’m currently going to start implementation of Vmetro using Left Hands and your expertise is always welcome.
Duncan Epping says
Should be out in two weeks I have been told. But no guarantees.
Tom Otomanski says
Hi Duncan
Is the below still valid in v6?
http://www.yellow-bricks.com/2013/11/08/disable-disk-autoremoveonpdl-vmsc-environment/
Duncan Epping says
Yes it is