I was testing various failure scenarios in my lab today for the vSphere Clustering Deepdive session I have scheduled for VMworld. I needed some screenshots and log files of a datastore hitting an APD scenario. For those who don’t know, APD stands for all paths down. In other words: the storage is inaccessible and ESXi doesn’t know what has happened or why. vSphere HA has the ability to respond to that kind of failure. I wanted to test this, but my setup was fairly simple and virtual, so I couldn’t unplug any cables. I also couldn’t make configuration changes to the iSCSI array, as that would trigger a PDL (permanent device loss) instead. So how do you test an APD scenario?
After trying various things, like killing the iSCSI daemon (it gets restarted automatically with no impact on the workload), I bumped into this command, which triggered the APD:
- SSH into the host you want to trigger the APD on and run the following command:
esxcli iscsi session remove -A vmhba65
- Make sure, of course, to replace “vmhba65” with the name of your iSCSI adapter.
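The steps above can be wrapped in a small shell function. This is only a sketch: the function name trigger_apd and the adapter-name check are hypothetical additions, not part of the original procedure. It lists the sessions on the adapter first, so you can see what is about to be dropped, and then runs the same remove command.

```shell
# Hypothetical helper (not from the original post): validate the adapter
# name, show its current iSCSI sessions, then remove them to trigger the APD.
trigger_apd() {
    adapter="$1"
    case "$adapter" in
        vmhba[0-9]*) ;;   # looks like an adapter name, carry on
        *) echo "usage: trigger_apd vmhbaNN" >&2; return 1 ;;
    esac
    # Show the sessions that are about to be removed.
    esxcli iscsi session list -A "$adapter" || return 1
    # Same command as above: drop all sessions on this adapter.
    esxcli iscsi session remove -A "$adapter"
}

# Example: trigger_apd vmhba65
```

To bring the storage back afterwards, logging the sessions in again with "esxcli iscsi session add -A vmhba65" followed by a rescan should work, but treat that as an assumption and verify it in your own lab.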
This triggered the APD, as witnessed in the fdm.log and vmkernel.log, and ultimately resulted in vSphere HA killing the impacted VM and restarting it on a healthy host. Anyway, I just wanted to share this, as I am sure there are others who would like to test APD responses in their labs or before their environment goes into production.
There may be other easy ways as well. If you know of any, please share them in the comments section.
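To find the relevant entries in those two log files, a loose search is usually enough. The sketch below assumes the default ESXi log locations (/var/log/vmkernel.log and /var/log/fdm.log). The function name apd_lines is hypothetical, and the pattern is deliberately broad because the exact message text may differ between releases.

```shell
# Hypothetical helper: print every line in a log file that mentions APD.
# The match is intentionally loose (case-insensitive "apd" or
# "all paths down"), so expect the occasional unrelated hit.
apd_lines() {
    grep -i -E 'apd|all paths down' "$1"
}

# Example:
#   apd_lines /var/log/vmkernel.log
#   apd_lines /var/log/fdm.log
```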
Anh Hoang says
Hi Duncan,
Not sure if I followed you correctly, but after executing your command the VM is still running on the host even though the datastore is inaccessible.
The ESXi Shell can be disabled by an administrative user. See the
vSphere Security documentation for more information.
[root@ESXHOST:~] esxcli iscsi session remove -A vmhba65
[root@ESXHOST:~] esxcli vm process list | grep VM1
VM1
Display Name: VM1
Config File: /vmfs/volumes/5acae014-ed75bea1-58c0-2c44fd84e244/VM1/VM1.vmx
[root@ESXHOST:~] esxcli storage filesystem list
Mount Point                                        Volume Name      UUID                                 Mounted  Type             Size          Free
-------------------------------------------------  ---------------  -----------------------------------  -------  ------  -------------  ------------
/vmfs/volumes/52ceefc5-7e3f7e66-8a38-2c44fd84e245  diskfiles_tmp    52ceefc5-7e3f7e66-8a38-2c44fd84e245  true     VMFS-5   294473695232  215386947584
/vmfs/volumes/5971657b-af85ae8b-cb63-2c44fd84e244  diskfiles        5971657b-af85ae8b-cb63-2c44fd84e244  true     VMFS-5  1199906488320  468635877376
/vmfs/volumes/5acae014-ed75bea1-58c0-2c44fd84e244  iSCSI_NAS1_LUN1  5acae014-ed75bea1-58c0-2c44fd84e244  true     VMFS-6              0             0
/vmfs/volumes/09d36c79-482221c4-b644-f284d6560b0e                   09d36c79-482221c4-b644-f284d6560b0e  true     vfat        261853184      72810496
/vmfs/volumes/52ceefbd-d14919f8-056d-2c44fd84e245                   52ceefbd-d14919f8-056d-2c44fd84e245  true     vfat        299712512      82952192
/vmfs/volumes/314a6078-7f91256f-bb46-4c086893b5cf                   314a6078-7f91256f-bb46-4c086893b5cf  true     vfat        261853184      64671744
/vmfs/volumes/5ac47a9c-80f0c605-72dd-2c44fd84e244                   5ac47a9c-80f0c605-72dd-2c44fd84e244  true     vfat       4293591040    4260888576
duncan@yellow-bricks says
and you did configure HA to respond to APD failures?
Anh Hoang says
Ah, you’re right, my bad. I have now enabled the HA response to APD failures and it works, but it is not a straightforward restart of the VM on another host. The VM lost connection for a few pings and became reachable again, then it showed as inaccessible in vSphere, and finally HA responded by killing the VM and powering it on on a healthy host. I have collected all the logs; if you need them I can send them or paste them all here.
duncan@yellow-bricks says
This is the standard behavior, no need to send me the log files… this is how APD response works. It kills the VM and restarts it on a healthy host!
Anh Hoang says
Yup, very nice feature! Honestly, we somehow skipped this in our environment. Many thanks for this article.