Most of you have probably heard of a feature called VMCP, also known as VM Component Protection. If not, this is the functionality in vSphere HA that enables you to restart VMs which have been impacted by a PDL (permanent device loss) or APD (all paths down) scenario. (If you have no idea what I am talking about, read this article first.)
When you configure the APD response you have four options:
- Disable
- Issue Event
- Power Off / Restart – Conservative
- Power Off / Restart – Aggressive
The main difference between Conservative and Aggressive shows up when HA isn’t sure whether a VM can be restarted during an APD scenario: with Conservative it will not power off the VM, with Aggressive it will. However, even with Aggressive, if HA is certain that a VM can’t be powered on elsewhere it will not power off the VM. Basically, it prefers availability of the VM.
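To make the difference between the responses concrete, here is a minimal sketch of the decision as described above. It is only a model of the behavior in this post, not how HA is implemented; the names ApdResponse, RestartOutlook and powers_off_vm are made up for illustration.

```python
from enum import Enum


class ApdResponse(Enum):
    """The four APD response options (labels as shown in the list above)."""
    DISABLED = "Disable"
    ISSUE_EVENT = "Issue Event"
    CONSERVATIVE = "Power Off / Restart - Conservative"
    AGGRESSIVE = "Power Off / Restart - Aggressive"


class RestartOutlook(Enum):
    """Hypothetical: what HA believes about restarting the VM elsewhere."""
    CAN_RESTART = "can restart"
    UNKNOWN = "not sure"
    CANNOT_RESTART = "cannot restart"


def powers_off_vm(response: ApdResponse, outlook: RestartOutlook) -> bool:
    """Return True if the VM would be powered off under the rules in this post."""
    # Disable and Issue Event never power off the VM.
    if response in (ApdResponse.DISABLED, ApdResponse.ISSUE_EVENT):
        return False
    # If HA is certain the VM can't be restarted, it stays up (availability first).
    if outlook is RestartOutlook.CANNOT_RESTART:
        return False
    # When HA isn't sure, only Aggressive powers off the VM.
    if outlook is RestartOutlook.UNKNOWN:
        return response is ApdResponse.AGGRESSIVE
    # HA knows the VM can be restarted: both Conservative and Aggressive act.
    return True
```

Calling powers_off_vm(ApdResponse.CONSERVATIVE, RestartOutlook.UNKNOWN) gives False, while the same call with ApdResponse.AGGRESSIVE gives True. That "not sure" case is the only one where the two options differ in this model.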
As you can imagine, in certain scenarios having a VM running while it is impacted by an APD situation makes no sense. The VM has lost access to storage, and you simply may prefer to kill the workload. Why? Well, when it loses access to storage it can’t write to disk. You could find yourself in a situation where a change is acknowledged and you think it is written to disk, while it is actually still sitting in a memory cache.
If you prefer the VM to be killed, regardless of whether it can be restarted or not, you can enable this via a vSphere HA advanced setting. Now before you implement this, do note that if a cluster-wide APD situation occurs, you could find yourself in the scenario where ALL virtual machines are powered off by HA and not restarted as the resources are not available. Anyway, if you feel this is a requirement, you can configure the following vSphere HA advanced setting in vSphere 7:
das.restartVmsWithoutResourceChecks = true
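The cluster-wide warning above can be illustrated with a small toy model. This is a sketch under simplifying assumptions, not HA's actual logic: it assumes the Aggressive response, ignores host capacity, and treats "no host can reach the datastore" as the case where HA is certain a restart is impossible. The function apd_outcome and its parameters are hypothetical; kill_regardless stands in for das.restartVmsWithoutResourceChecks = true.

```python
def apd_outcome(vms, healthy_hosts, kill_regardless):
    """Map each APD-impacted VM name to its state after the event (toy model).

    vms             -- names of the VMs impacted by the APD
    healthy_hosts   -- number of hosts that can still reach the datastore
                       (0 models a cluster-wide APD)
    kill_regardless -- models das.restartVmsWithoutResourceChecks = true
    """
    outcome = {}
    for vm in vms:
        if healthy_hosts > 0:
            # A restart target exists, so the VM is powered off and restarted.
            outcome[vm] = "powered off and restarted"
        elif kill_regardless:
            # Nothing can host the VM, but it is powered off anyway.
            outcome[vm] = "powered off, not restarted"
        else:
            # Default behavior: availability is preferred, the VM keeps running.
            outcome[vm] = "left running without storage"
    return outcome
```

With apd_outcome(["vm-a", "vm-b"], healthy_hosts=0, kill_regardless=True), every VM ends up "powered off, not restarted", which is exactly the scenario to think about before enabling the setting.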
Hello Duncan, thank you for your work.
I have a question: any news about VMCP for networking?
I know it is being considered, but I can’t publicly comment on when or if it will ever make it into a release unfortunately.
OK, thank you for this feedback anyway. I appreciate it. Have a good day.
Hi Duncan! Thank you for sharing this!
I had a question about this from a vSAN perspective:
For example, we have a 12-node vSAN cluster split into 3 FDs across the racks (4+4+4). vSAN uses its own network from a NIC perspective, and also its own Ethernet fabric: 2 ToRs for vSAN and 2 ToRs for the VM network.
Now let’s imagine the vSAN fabric is completely down in one of the FDs for whatever reason (most commonly a configuration mistake at the spine level). vSAN will lock any object in the isolated FD because it lost communication with the 2 other components. So every VM running in the isolated FD will lose access to its VMDKs, and the VM will be restarted in one of the 2 other FDs. The problem is that some of the VMs will continue running, even without disks, for some time.
AFAIK, for a vSAN stretched cluster this problem is solved through VSAN.AutoTerminateGhostVm, but what will the behavior be for non-stretched HA/vSAN clusters? Is there a way to solve this issue by tuning HA settings or from an architecture perspective (in case FDs and separate fabrics for the vSAN and VM networks are required)?
There’s no solution for non-stretched at the moment. Hence we have requested the HA and vSAN teams to explore how to enable APD/PDL-like responses for vSAN. I can’t publicly comment on when or if it will make it into a release.
In the image, it’s noted that HA will restart the VM _if_ it can be restarted on another host. In a scenario where all hosts share the same storage (NAS or iSCSI) and there’s an upstream event that disrupts connectivity, does HA continue to monitor at some interval so that when storage connectivity is resolved the VM is still restarted, or is HA only making that evaluation at the time of initial impact?
Ah, great question. A change of cluster resources would trigger a restart. So the VMs will be restarted when the datastore returns for duty!
Thank you for sharing your information :)