All of you know by now that I have a love for availability-related topics… Hence the reason I needed to write something about INF-BCO2807. The session titled “vSphere HA and Datastore Access Outages – Current Capabilities Deep-Dive and Tech Preview”, presented by Keith Farkas and Smriti Desai, discussed possible future HA enhancements that address component failures. Those of you who read my whitepaper on stretched clusters can immediately see why this would be a nice enhancement!
Once again, a big fat disclaimer: VMware gives absolutely no guarantees about when, or even if, this will be released.
This session was all about inaccessible datastores. During our talk, Lee Dilworth and I explained the difference between a Permanent Device Loss (PDL) and an All Paths Down (APD) condition. In short, a PDL is signaled by a SCSI sense code issued by the storage system (or an iSCSI “login reject” for that matter). This SCSI sense code allows vSphere (both the kernel and HA) to respond and act upon it. In the case of an APD, vSphere cannot respond… the LUN is gone on that host and we don’t know why, so what do we do? Well, with 5.1 and prior we do nothing. This results in zombied virtual machines, and that is not the state you want your virtual machines to be in, right?
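To make the PDL versus APD distinction a bit more tangible, here is a minimal sketch in Python. It is purely illustrative: `DeviceView`, `DeviceState` and `classify` are names I made up, and this is not how the VMkernel or HA actually implements the detection. It only captures the idea that a PDL comes with an explicit signal from the array, while an APD is simply "no paths left and no explanation".

```python
from dataclasses import dataclass
from enum import Enum


class DeviceState(Enum):
    OK = "ok"
    PDL = "permanent device loss"
    APD = "all paths down"


@dataclass
class DeviceView:
    """One host's view of a LUN (hypothetical model, not a vSphere API)."""
    live_paths: int            # number of paths that still respond
    pdl_signal_received: bool  # array sent a PDL sense code or an iSCSI login reject


def classify(view: DeviceView) -> DeviceState:
    # An explicit signal from the storage system tells us the device is gone for good.
    if view.pdl_signal_received:
        return DeviceState.PDL
    # No paths left and no word from the array: we don't know why, so it is an APD.
    if view.live_paths == 0:
        return DeviceState.APD
    return DeviceState.OK
```

Calling `classify(DeviceView(live_paths=0, pdl_signal_received=False))` yields `DeviceState.APD`, which is exactly the case where the platform has nothing to act on today.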
So how is VMware planning to solve this? It is planning to enhance HA with what was referred to as “Component Protection”. Component Protection allows responses per virtual machine when an APD or PDL has been detected. This is not based on guest I/Os failing, but on the vSphere platform declaring that the device is in a PDL or APD condition.
When an APD scenario is detected, HA will be smart enough to understand which hosts can restart virtual machines, as in some cases multiple hosts might be impacted. Of course, it will also only kill your virtual machine and restart it when it knows capacity is available for it.
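The decision described above can be sketched roughly as follows. Again, this is my own simplified model and not the tech preview's implementation: `Host`, `VM`, `pick_restart_host` and `respond_to_apd` are hypothetical names, and "capacity" is reduced to free memory purely as an assumption to keep the example short.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Host:
    name: str
    free_memory_mb: int  # assumption: capacity is modeled as free memory only
    accessible_datastores: Set[str] = field(default_factory=set)


@dataclass
class VM:
    name: str
    memory_mb: int
    datastore: str


def pick_restart_host(vm: VM, current: Host, hosts: List[Host]) -> Optional[Host]:
    """Return the first other host that still sees the datastore and has room."""
    for host in hosts:
        if host.name == current.name:
            continue  # the impacted host itself is not a candidate
        if vm.datastore not in host.accessible_datastores:
            continue  # this host is impacted by the outage as well
        if host.free_memory_mb < vm.memory_mb:
            continue  # not enough capacity to restart the VM here
        return host
    return None


def respond_to_apd(vm: VM, current: Host, hosts: List[Host]) -> str:
    target = pick_restart_host(vm, current, hosts)
    if target is None:
        # Nowhere to go: killing the VM would only make things worse.
        return f"leave {vm.name} on {current.name}"
    return f"restart {vm.name} on {target.name}"
```

The point of the ordering is that the virtual machine is only killed once a suitable target is known, so a cluster-wide outage never results in virtual machines being terminated with nowhere to restart.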
I don’t know about you, but I would rather see this implemented today than tomorrow! APD is not common, but it is not rare either… and when disaster strikes, it strikes hard!
I don’t think this session is scheduled for VMworld Europe, so make sure to watch the recording as soon as it is available, as it is well worth your time. Keith and Smriti gave an excellent deep dive on the current vSphere HA capabilities and a nice look into the future!