vSphere HA – VM Monitoring sensitivity

Last week there was a question on VMTN about VM Monitoring sensitivity. I could have sworn I did an article on that exact topic, but I couldn’t find it. I figured I would do a new one with a table explaining the levels of sensitivity that you can configure VM Monitoring to.

The question that was asked was based on a false positive response of VM Monitoring, in this case the virtual machine was frozen due to the consolidation of snapshots and VM Monitoring responded by restarting the virtual machine. As you can imagine the admin wasn’t too impressed as it caused downtime for his virtual machine. He wanted to know how to prevent this from happening. The answer was simple, change the sensitivity as it is set to “high” by default.

As shown in the table high sensitivity means that VM Monitoring responds to missing “VMware Tools heartbeat” within 30 seconds. However, before VM Monitoring restarts the VM though it will check if their was any storage or networking I/O for the last 120 seconds (advanced setting: das.iostatsInterval). If the answer is no to both, the VM will be restarted. So if you feel VM Monitoring is too aggressive, change it accordingly!

Sensitivity Failure Interval Max Failures Max Failures Time window
Low 120 seconds 3 7 days
Medium 60 seconds 3 24 hours
High 30 seconds 3 1 hour

Do note that you can change the above settings individually as well in the UI, as seen in the screenshot below. For instance you could manually increase the failure interval to 240 seconds. How you should configure it is something I cannot answer, it should be based on what you feel is an acceptable response time to a failure. Also, what is the sweet spot to avoid a false positive… A lot to think about indeed when introducing VM Monitoring.

vSphere 5.1 Storage DRS Interoperability

A while back I did this article on Storage DRS Interoperability. I had questions last week about this so I figured I would write a new article which reflects the current state (vSphere 5.1). I also included some details that are part of the interoperability white paper Frank and I did so that we have a fairly complete picture. This white paper is on 5.0, it will probably be updated at some point in the future.

The first column describes the feature or functionality, the second column the recommended or supported automation mode and the third and fourth column show which type of balancing is supported.

Capability Automation Mode Space Balancing I/O Metric Balancing
Array-based Snapshots Manual Yes Yes
Array-based Deduplication Manual Yes Yes
Array-based Thin provisioning Manual Yes Yes
Array-based Auto-Tiering Manual Yes No
Array-based Replication Manual Yes Yes
vSphere Raw Device Mappings Fully Automated Yes Yes
vSphere Replication Fully Automated Yes Yes
vSphere Snapshots Fully Automated Yes Yes
vSphere Thin provisioned disks Fully Automated Yes Yes
vSphere Linked Clones Fully Automated (*) Yes Yes
vSphere Storage Metro Clustering Manual Yes Yes
vSphere Site Recovery Manager Not supported n/a n/a
VMware vCloud Director Fully Automated (*) Yes Yes
VMware View (Linked Clones) Not Supported n/a n/a
VMware View (Full Clones) Fully Automated Yes Yes

(*) = Change from 5.0

DRS not taking CPU Ready Time in to account? Need your help!

For years these rumors have been floating around that DRS does not take CPU Ready Time (%RDY) in to account when it comes load balancing the virtual infrastructure. Fact is that %RDY has always been a part of the DRS algorithm but not as a first class citizen but as part of CPU Demand, which is a combination of various metrics but includes %RDY. Still, one might ask why %RDY is not a first class citizen.

There is a good reason though that %RDY isn’t, just think about what DRS is and does and how it actually goes about balancing out the environment, trying to please all virtual machines. Yes a lot of possibilities indeed to move virtual machines around in a cluster. So you can imagine that it is is really complex (and expensive) to calculate what the possible impact is after a virtual machine has been migrated “from a host” or “to a host” for all of the first class citizen metrics.

Now, for a long time the DRS engineering team has been looking for situations in the field where a cluster is balanced according to DRS but there are still virtual machines experiencing performance problems due to high %RDY. The DRS team really wants to fix this problem or bust the myth – what they need is hard data. In other words, vc-support bundles from vCenter and vm-support bundles from all hosts with high ready times. So far, no one has been able to provide these logs / cold hard facts.

If you see this scenario in your environment regularly please let me know. I will personally get you in touch with our DRS engineering team and they will look at your environment and try to solve this problem once and for all. We need YOU!

What is static overhead memory?

We had a discussion internally on static overhead memory. Coincidentally I spoke with Aashish Parikh from the DRS team on this topic a couple of weeks ago when I was in Palo Alto. Aashish is working on improving the overhead memory estimation calculation so that both HA and DRS can be even more efficient when it comes to placing virtual machines. The question was around what determines the static memory and this is the answer that Aashish provided. I found it very useful hence the reason I asked Aashish if it was okay to share it with the world. I added some bits and pieces where I felt additional details were needed though.

First of all, what is static overhead and what is dynamic overhead:

  • When a VM is powered-off, the amount of overhead memory required to power it on is called static overhead memory.
  • Once a VM is powered-on, the amount of overhead memory required to keep it running is called dynamic or runtime overhead memory.

Static overhead memory of a VM depends upon various factors:

  1. Several virtual machine configuration parameters like the number vCPUs, amount of vRAM, number of devices, etc
  2. The enabling/disabling of various VMware features (FT, CBRC; etc)
  3. ESXi Build Number

Note that static overhead memory estimation is calculated fairly conservative and we take a worst-case-scenario in to account. This is the reason why engineering is exploring ways of improving it. One of the areas that can be improved is for instance including host configuration parameters. These parameters are things like CPU model, family & stepping, various CPUID bits, etc. This means that as a result, two similar VMs residing on different hosts would have different overhead values.

But what about Dynamic? Dynamic overhead seems to be more accurate today right? Well there is a good reason for it, with dynamic overhead it is “known” where the host is running and the cost of running the VM on that host can easily be calculated. It is not a matter of estimating it any longer, but a matter of doing the math. That is the big difference: Dynamic = VM is running and we know where versus Static = VM is powered off and we don’t know where it might be powered!

Same applies for instance to vMotion scenarios. Although the platform knows what the target destination will be; it still doesn’t know how the target will treat that virtual machine. As such the vMotion process aims to be conservative and uses static overhead memory instead of dynamic. One of the things or instance that changes the amount of overhead memory needed is the “monitor mode” used (BT, HV or HWMMU).

So what is being explored to improve it? First of all including the additional host side parameters as mentioned above. But secondly, but equally important, based on the vm -> “target host” combination the overhead memory should be calculated. Or as engineering calls it calculating “Static overhead of VM v on Host h”.

Now why is this important? When is static overhead memory used? Static overhead memory is used by both HA and DRS. HA for instance uses it with Admission Control when doing the calculations around how many VMs can be powered on before unreserved resources are depleted. When you power-on a virtual machine the host side “admission control” will validate if it has sufficient unreserved resource available for the “static memory overhead” to be guaranteed… But also DRS and vMotion use the static memory overhead metric, for instance to ensure a virtual machine can be placed on a target host during a vMotion process as the static memory overhead needs to be guaranteed.

As you can see, a fairly lengthy chunk of info on just a single simple metric in vCenter / ESXTOP… but very nice to know!

Guaranteeing availability through admission control, chip in!

I have been having these discussions with our engineering teams for the last year around guaranteed restarts of virtual machines in a cluster. In the current shape / form we use Admission Control to guarantee virtual machines are restarted. Today Admission Control is all about guaranteeing virtual machine restarts by keeping track of Memory and CPU resource reservations, but you can imagine that in the Software Defined Datacenter this could be expanded with for instance storage or networking reservation.

Now why am I having these discussions, what is the problem with Admission Control today? Well first of all it is the perception that many appear to have of Admission Control. Many believe the Admission Control algorithm uses “used” resources. Reality however is that Admission Control is not that flexible, it uses resource reservations and as you know this is static. So what is the result of using reservations?

By using reservations for “admission control” vSphere HA has a simple way of guaranteeing a restart is possible at all times. Simply because it checks if sufficient “unreserved resources” are available and if so it allows the virtual machine to be powered-on. If not, then it won’t allow the power-on just to ensure that all virtual machines can be restarted in case of a failure. But what is the problem? Although we guarantee a restart we do not guarantee any type of performance after the restart! Unless, unless of course you are setting your reservations equal to what you provisioned… but I don’t know anyone doing this as it eliminates any form of overcommitment and will result in an increase of cost and a decrease in flexibility.

So that is the problem. Question is – what should we do about it? We (the engineering teams and I) would like to hear from YOU.

  • What would you like admission control to be?
  • What guarantees do you want HA to provide?
  • After a failure, what criteria should HA apply in deciding which VMs to restart?

One idea we have been discussing is to have Admission Control use something like “used” resources… or for instance an “average of resources used” per virtual machine. What if you could say: I want to ensure that my virtual machines always get at least 80% of what they use on average? If so, what should HA do when there are not enough resources to meet the 80% demand of all VMs? Power on some of the VMs? Power on all with reduced share values?

Also, something we have discussed is having vCenter show how many resources are used on average taking your high availability N-X setup in to account, which should at least provide an insight around how your VMs (and applications) will perform after a fail-over. Is that something you see value in?

What do you think? Be open and honest, tell us what you think… don’t be scared, we won’t be bite, we are open for all suggestions.

What is: Current Memory Failover Capacity?

I have had this question many times by now, what is “Current Memory Failover Capacity” that is shown in the cluster summary when you have selected the “Percentage Based Admission Control Policy”? What is that percentage? 99% of what? And will it go down to 0%? Or will it go down to the percentage that you reserved? Well I figured it was time to put things to the test and no longer be guessing.

As shown in the screenshot above, I have selected 33% of memory to be reserved and currently have 99% of memory failover capacity. Lets power-on a bunch of virtual machines and see what happens. Below is the result shown in a screenshot, “current memory failover capacity” went down from 99% to 94%.

Also when I increase the reservation in a virtual machine I can see “Current Memory Failover Capacity” drop down even further. So it is not about “used” but about “unreserved / reserved” memory resources (including memory overhead), let that be absolutely clear! When will vCenter Server shout “Insufficient resources to satisfy configured failover level for vSphere HA”?

It shouldn’t be too difficult to figure that one out, just power-on new VMs until it says “stop it”. As you can see in the screenshot below. This happens when you reach the percentage you specified to reserve as “memory failover capacity”. In other words in my case I reserved 33%, when “Current Memory Failover Capacity” reaches 33% it doesn’t allow the VM to be powered on as this would violate the selected admission control policy.

I agree, this is kind of confusing…  But I guess when you run out of resources it will become pretty clear very quickly ;-)

 

Using das.vmmemoryminmb with Percentage Based admission control

I had question today about using the advanced settings to set a minimal amount of resources that HA would use to do the admission control math with. Many of us have used these advanced settings das.vmMemoryMinMB and das.vmCpuMinMHz to dictate the slot size when no reservations were set in an environment where the “host failures” admission control policy was used. However what many don’t appear to realize is that this will also work for the Percentage Based admission control policy.

If you want to avoid extreme overcommitment and want to specify a minimal amount of resources that HA should use to do the math with then even with the Percentage Based admission control policy you can use these settings. In the case where your VM reservation does not exceed the value specified, the value is used to do the math with. In other words if you set “das.vmMemoryMinMB” to 2048, it will use 2048 to do the math with unless the reservation set on the VM is higher.

I did a quick experiment in my test lab which I had just rebuilt. Without das.vmMemoryMinMB and two VMs running (with no reservation) I had 99% Mem Failover Capacity as shown in the screenshot below:

With das.vmMemoryMinMB set to 20480, and two VMs running, I had 78% Mem Failover Capacity as shown in the screenshot below:

I guess that proves that you can use das.vmMemoryMinMB and das.vmCpuMinMHz to influence Percentage Based admission control.