VMware

Insufficient configured resources to satisfy the desired vSphere HA failover level on the cluster

Duncan Epping · Jul 12, 2018 ·

I was talking to the HA team this week, specifically about the upcoming HA book. One thing they mentioned is that people still seem to struggle with the concept of admission control. This is the reason I wrote a whole chapter on it years ago, yet there still seems to be a lot of confusion. One thing that is not clear to people is the percentage calculations. We have some customers running VMs with extremely large reservations. In that case, instead of using the “slot policy”, they typically switch to the “percentage based policy”, simply because the percentage based policy is a lot more flexible.

However, recently we have had some customers that hit the following error message:

Insufficient configured resources to satisfy the desired vSphere HA failover level on the cluster

In these particular situations (yes, there was a bug as well, read this article on that), this error message appeared because the percentage was set lower than what would equal a full host. In other words, in a 4 host environment a single host would equal 25%, and in some cases customers would set the percentage to a value lower than 25%. I am personally not sure why anyone would do this, as it contradicts the whole essence of admission control. Nevertheless, it happens.

This message indicates that, in the case of a host failure, you may not have sufficient resources to restart all the VMs. This of course is the result of the percentage being set lower than the value that would equal a single host. Note though that this does not stop you from powering on new VMs. You will only be stopped from powering on new VMs when you exceed the available unreserved resources.

So if you are seeing this error message, please verify the configured percentage if you set it manually. Ensure that at a minimum it equals the largest host in the cluster.
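To make the arithmetic concrete, here is a minimal sketch of the check described above. It is not how vCenter computes anything internally. It only illustrates the rule of thumb that the configured percentage should be at least the share of the cluster that the largest host represents. The function names and the capacity values are hypothetical, and in practice you would do this separately for CPU and memory.

```python
def min_failover_percentage(host_capacities):
    """Smallest percentage that reserves at least the largest host.

    host_capacities: one number per host (for example GB of memory).
    Illustrative helper, not a vSphere API.
    """
    total = sum(host_capacities)
    if total <= 0:
        raise ValueError("cluster capacity must be positive")
    return 100.0 * max(host_capacities) / total


def is_percentage_sufficient(configured_pct, host_capacities):
    """True when the configured percentage covers the largest host."""
    return configured_pct >= min_failover_percentage(host_capacities)


# Four identical hosts: a single host equals 25% of the cluster.
print(min_failover_percentage([64, 64, 64, 64]))       # 25.0
# One larger host raises the minimum that should be reserved.
print(min_failover_percentage([64, 64, 64, 128]))      # 40.0
# A manually configured 20% on four identical hosts is too low.
print(is_percentage_sufficient(20, [64, 64, 64, 64]))  # False
```

With uneven hosts the minimum is driven by the largest host, which is why a manually configured value deserves a second look whenever hosts are added or replaced.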

** back to finalizing the book **

Adding a fifth (virtual) ESXi host to vCenter Foundation

Duncan Epping · Jul 6, 2018 ·

When running a 4 node stretched cluster environment it should be possible to use “cheaper” vCenter Server licenses, namely vCenter Foundation. One of the limitations of vCenter Foundation is that you can only manage 4 hosts with it. This is where some customers who wanted to manage a stretched cluster hit some issues. The issue occurs at the point where you want to add the Witness VM to the inventory. Deploying the VM, of course, works fine, but it becomes problematic when you add the virtual ESXi host (Witness Appliance) to the vCenter Foundation instance as vCenter simply will not allow you to add a 5th host. Yes, this 5th host would be a witness, and will not be running any VMs, and even has a special license. Yet, the “add host” wizard does not differentiate between a regular host and a virtual witness appliance.

Fortunately, there’s a workaround. It is fairly straightforward, and it has to do with the order in which you add hosts to vCenter Foundation. If you add the witness VM before the physical hosts then the appliance is not counted against the license. The license count (and allocation) apparently happens after the host has been added, but somehow vCenter does validate beforehand. I guess we do this to avoid abuse.

So if you have vCenter Foundation, and want to build a stretched cluster leveraging a 2+2+1 configuration, meaning 4 physical hosts and 1 witness VM, then simply add the Witness VM to the inventory as a host first and then add the rest. For those wondering, yes this is documented in the release notes of vSphere 6.5 Update, all the way at the bottom.

Trigger APD on iSCSI LUN on vSphere

Duncan Epping · Jun 21, 2018 ·

I was testing various failure scenarios in my lab today for the vSphere Clustering Deepdive session I have scheduled for VMworld. I needed some screenshots and log files of when a datastore hit an APD scenario. For those who don’t know, APD stands for all paths down. In other words: the storage is inaccessible and ESXi doesn’t know what has happened and why. vSphere HA has the ability to respond to that kind of failure. I wanted to test this, but my setup was fairly simple and virtual, so I couldn’t unplug any cables. I also couldn’t make configuration changes to the iSCSI array, as that would instead trigger a PDL (permanent device loss). So how do you test an APD scenario?

After trying various things like killing the iSCSI daemon (it gets restarted automatically with no impact on the workload), I bumped into this command, which triggered the APD:

SSH into the host you want to trigger the APD on and run the following command:
```
esxcli iscsi session remove -A vmhba65
```
Make sure, of course, to replace “vmhba65” with the name of your iSCSI adapter.

This triggered the APD, as witnessed in the fdm.log and vmkernel.log, and ultimately resulted in vSphere HA killing the impacted VM and restarting it on a healthy host. Anyway, I just wanted to share this, as I am sure there are others who would like to test APD responses in their labs or before their environment goes into production.

There may be other easy ways as well. If you know any, please share them in the comments section.

Customer Experience Improvement Program: where, when and what?

Duncan Epping · May 28, 2018 ·

I got a question on my post about the Customer Experience Improvement Program (CEIP) demo. The question boiled down to the following:

What is being sent to VMware?
Where is the data stored by VMware?
When is the data sent to VMware (how often)?

The “what” question was easy to answer, as this was documented by John Nicholson on Storagehub.vmware.com for vSAN specifically. Realizing that it isn’t easy to find out what CEIP data is stored, I figured I would add a link here and also repeat the summary of that article, assuming by now everyone uses a VCSA (if not, go to the link):

SSH into VCSA
Run command: cd /var/log/vmware/vsan-health/
Data collected by online health checks is written and gzipped to files "<uuid>cloud_health_check_data.json.gz" and "<uuid>vsan_perf_data.json.gz"
You can extract the json content by calling "gunzip -k <gzipped-filename>" or view the contents by calling "zcat <gzipped-filename>"
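If you copy one of those gzipped files off the appliance and would rather inspect it with a script than with zcat, something along these lines should work. This is only a sketch: it assumes each file holds a single JSON document, and the function names are made up for illustration. I am not assuming anything about the structure of the data itself, so it just lists the top-level keys.

```python
import gzip
import json


def load_gzipped_json(path):
    """Decompress a .json.gz file in memory and parse it as JSON.

    Assumes the file contains exactly one JSON document.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


def top_level_keys(data):
    """Return the sorted top-level keys, or an empty list for non-objects."""
    if isinstance(data, dict):
        return sorted(data.keys())
    return []
```

For example, `top_level_keys(load_gzipped_json("<gzipped-filename>"))` gives a quick overview of what a file contains without extracting it to disk first.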

So that is how you view what is being stored. John also posted an example of the dataset on GitHub for those who just want to have a quick peek. Note that you need an “obfuscation map” (aka key) to make sense of the data in terms of host names, VM names, IP addresses etc. Without that you can stare at the dataset all you want, but you won’t be able to relate it back to a customer. I would also add that we are not storing any VM/workload data; it is configuration data, feature usage and performance data. Hopefully that answers the “what” question for you.

Where is the data stored? The data is sent to “https://vcsa.vmware.com” and it ends up in VMware’s analytics cloud, which is hosted in secure data centers in the US. The frequency is a bit difficult to answer, as this fully depends on which products are in use, but to my knowledge with vSAN/vSphere it is on an hourly basis. I have asked the VMware team that owns this to create a single page/document with all of the required details in it, so that security teams can simply be pointed to it.

Hopefully I will have a follow-up soon.

How to simplify vSAN Support!

Duncan Epping · May 25, 2018 ·

Last week I presented at the Tech Support Summit in Cork with Cormac. Our session was about the evolution of vSAN: where we are today, but more importantly in which direction we will be going. One thing that struck me when I discussed vSAN Support Insight, the solution we announced not too long ago, is that not too many people seemed to understand the benefit. When you have vSAN and you enable CEIP (Customer Experience Improvement Program), you automatically have a phone home solution for your vSphere and vSAN environment. What this brings is fairly simple to explain: less frustration! Why? Well, when you provide them your vCenter UUID, the support team will have instant access to all of the metadata of your environment. What does that mean? Well, the configuration for instance, the performance data, logs, health check details etc. This will allow them to instantly get a good understanding of what your environment looks like, without the need for you as a customer to upload your logs etc.

At the event I demoed the Support Insight interface, which is what the Support Team has available, and a lot of customers afterwards said: now I see the benefit of enabling this, I will do this for sure when I get back to the office. So I figured I would take the demo, do a voice-over and release it to the public. We need more people to join the Customer Experience Improvement Program, so watch the video to see what this gives the support team. Note by the way that everything is anonymized: without you providing a UUID it is not possible to correlate the data to a customer. Even when you provide a UUID, the support team can only see the host, VM, policy and portgroup (etc) names when you provide them with what is called an obfuscation map (key). Anyway, watch the demo and join now!