Someone reported that in their environment VMs could not be restarted because there were no compatible hosts available. The relevant part of the error message was:
I don’t know why it happened in this case, as the log files unfortunately don’t provide those details. This person had manually restarted all of his VMs and that actually worked fine. This could mean that somehow the “compatibility list” that vSphere HA maintains was incomplete or incorrect. So the question is: how do you validate that if you ever end up in a scenario like this?
First of all, before I forget: create a support dump. That way VMware Global Support Services can help pinpoint your problem and give you tips on how to prevent it from happening again.
On a host (you will have to SSH in to one) you can run a script that provides some nice details around this. Let’s go through the options of the script and explain what you can get out of each of them. The script is called “prettyPrint.sh” and can be found in “/opt/vmware/fdm/fdm/”.
The hostlist option provides all relevant details about the hosts which are part of this cluster including “hostId”, host name, ip address etc.
The clusterconfig option provides all configuration info of your cluster like admission control and isolation response.
The compatlist option provides the list of VMs and host they are compatible with, only for vSphere 5.0.
The vmmetadata option provides the list of VMs and host they are compatible with, only for vSphere 5.1.
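If you want to script these lookups instead of typing them by hand, here is a minimal Python 3.7+ sketch. It assumes the script takes the option name as its only argument (for example “prettyPrint.sh vmmetadata”) and that you run it somewhere Python 3 is available, which may not be the case in the ESXi shell itself. The helper names compat_option, build_command and run_prettyprint are hypothetical; the only rule they encode from above is compatlist for vSphere 5.0 and vmmetadata for vSphere 5.1.

```python
import subprocess

# Location of the script as given in the post.
PRETTYPRINT = "/opt/vmware/fdm/fdm/prettyPrint.sh"

# The options described above. Passing the option as the script's single
# argument is an ASSUMPTION about its calling convention.
OPTIONS = ("hostlist", "clusterconfig", "compatlist", "vmmetadata")


def compat_option(vsphere_version):
    """Pick the option that dumps VM/host compatibility data.

    compatlist applies to vSphere 5.0, vmmetadata to vSphere 5.1.
    """
    major_minor = tuple(int(part) for part in vsphere_version.split(".")[:2])
    if major_minor == (5, 0):
        return "compatlist"
    if major_minor == (5, 1):
        return "vmmetadata"
    raise ValueError(f"no compatibility option known for vSphere {vsphere_version}")


def build_command(option):
    """Return the argv list for running prettyPrint.sh with one option."""
    if option not in OPTIONS:
        raise ValueError(f"unknown option {option!r}, expected one of {OPTIONS}")
    return [PRETTYPRINT, option]


def run_prettyprint(option):
    """Run the script on the host and return its output as text."""
    result = subprocess.run(build_command(option), capture_output=True,
                            text=True, check=True)
    return result.stdout
```

On a 5.1 host you would then call run_prettyprint(compat_option("5.1")) and inspect the returned text.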
So in this case “vmmetadata” was the important option, as it lists which hosts each VM is compatible with. In the output below, “<index>0</index>” refers to a VM and “<compatMask>0,1,2,3</compatMask>” refers to the hosts it is compatible with. Nice, right?!
<compatMatrix>
  <restartCompat>
    <index>0</index>
    <compatMask>0,1,2,3</compatMask>
  </restartCompat>
  <restartCompat>
    <index>1</index>
    <compatMask>0,1,2,3</compatMask>
  </restartCompat>
  <restartCompat>
    <index>2</index>
    <compatMask>0,1,2,3</compatMask>
  </restartCompat>
</compatMatrix>
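With more than a handful of VMs this gets hard to eyeball. The following Python sketch turns the compatMatrix output into a map from VM index to host IDs and flags VMs whose compatMask is empty, which would presumably match the “no compatible hosts” symptom. It assumes the output is well-formed XML with restartCompat elements shaped like the snippet above; parse_compat_matrix and vms_without_hosts are hypothetical names.

```python
import xml.etree.ElementTree as ET

# The vmmetadata output shown above.
SAMPLE = """<compatMatrix>
  <restartCompat> <index>0</index> <compatMask>0,1,2,3</compatMask> </restartCompat>
  <restartCompat> <index>1</index> <compatMask>0,1,2,3</compatMask> </restartCompat>
  <restartCompat> <index>2</index> <compatMask>0,1,2,3</compatMask> </restartCompat>
</compatMatrix>"""


def parse_compat_matrix(xml_text):
    """Map each VM index to the sorted host IDs listed in its compatMask."""
    root = ET.fromstring(xml_text)
    matrix = {}
    # iter() walks the whole tree (root included), so this works whether the
    # text is just the compatMatrix element or a larger document containing it.
    for entry in root.iter("restartCompat"):
        vm_index = int(entry.findtext("index"))
        mask = entry.findtext("compatMask") or ""
        matrix[vm_index] = sorted(int(h) for h in mask.split(",") if h.strip())
    return matrix


def vms_without_hosts(matrix):
    """VM indexes whose compatMask is empty, i.e. no host to restart them on."""
    return [vm for vm, hosts in sorted(matrix.items()) if not hosts]


if __name__ == "__main__":
    matrix = parse_compat_matrix(SAMPLE)
    print(matrix)  # {0: [0, 1, 2, 3], 1: [0, 1, 2, 3], 2: [0, 1, 2, 3]}
    print(vms_without_hosts(matrix))  # []
```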
** Update: Added Portgroup Test **
On VMTN someone asked whether HA also takes networking into account when restarting VMs: if a given portgroup is not available on specific hosts, will HA smartly place VMs? In my test I removed the “VM Network” portgroup from one of my hosts (the host with ID 2). When listing the compatibility list, the following shows up:
<restartCompat>
  <index>0</index>
  <compatMask>0,1,3</compatMask>
</restartCompat>
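As you can see, the host with ID 2 is missing, so HA does take portgroup availability into account when determining which hosts a VM is compatible with.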
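To spot this kind of gap programmatically, you can compare each compatMask against the full set of host IDs in the cluster. In this Python sketch the host IDs (0 to 3 here) are supplied by hand; in practice you would presumably take them from the hostlist output, whose exact format this sketch does not parse. compat_masks and excluded_hosts are hypothetical names.

```python
import xml.etree.ElementTree as ET

# The restartCompat entry shown above, after removing "VM Network" from host 2.
AFTER = "<restartCompat> <index>0</index> <compatMask>0,1,3</compatMask> </restartCompat>"


def compat_masks(xml_text):
    """Map each VM index to the set of host IDs in its compatMask."""
    masks = {}
    # iter() includes the root itself, so a lone restartCompat element works too.
    for entry in ET.fromstring(xml_text).iter("restartCompat"):
        mask = entry.findtext("compatMask") or ""
        masks[int(entry.findtext("index"))] = {int(h) for h in mask.split(",") if h.strip()}
    return masks


def excluded_hosts(xml_text, cluster_host_ids):
    """For each VM, the cluster hosts missing from its compatMask (VMs with gaps only)."""
    all_hosts = set(cluster_host_ids)
    gaps = {}
    for vm, hosts in compat_masks(xml_text).items():
        missing = all_hosts - hosts
        if missing:
            gaps[vm] = sorted(missing)
    return gaps


if __name__ == "__main__":
    print(excluded_hosts(AFTER, [0, 1, 2, 3]))  # {0: [2]}
```

If a host shows up as excluded for many VMs at once, that points at something missing on that host (like the portgroup in this test) rather than at an individual VM.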