VMwarewolf already posted this fix on his blog but had to remove it. VMware has now added it to their knowledge base. Check the original article, as it may change over time. For the lazy people I have included how to diagnose the problem and more…
Diagnose the problem:
1. Use the VI Client to log in to VirtualCenter as an administrator.
2. Disable DRS in the cluster and wait for 1 minute.
3. In the VI Client, note the virtual machine's CPU usage on the Performance tab.
4. In the VI Client, note the virtual machine's memory overhead on the Summary tab.
5. Enable DRS in the cluster.
6. Use VMotion to move the problematic virtual machine to another host.
7. Note the virtual machine's CPU usage and memory overhead on the new host.
8. Disable DRS in the cluster and wait for 1 minute.
9. Note the virtual machine's CPU usage and memory overhead on the new host.
If the virtual machine's CPU usage increases in step 7 compared to step 3, and then drops back to its original level in step 9 (similar to the behavior in step 3) with an observable increase in overhead memory, you are seeing the issue discussed in this article.
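The signature described above can be sketched as a tiny check. The readings and thresholds below are purely illustrative assumptions, not values from any VMware API; in practice you read the numbers off the VI Client as described in the steps.

```python
# Purely illustrative readings: CPU usage in MHz, overhead memory in MB.
def matches_issue(cpu_step3, cpu_step7, cpu_step9, ovh_step4, ovh_step9):
    """Return True for the signature described above: CPU spikes after the
    VMotion (step 7), settles back once DRS is disabled (step 9), and the
    overhead memory has observably grown. Thresholds are arbitrary choices."""
    spiked = cpu_step7 > cpu_step3 * 1.5
    settled = abs(cpu_step9 - cpu_step3) < cpu_step3 * 0.2
    overhead_grew = ovh_step9 > ovh_step4
    return spiked and settled and overhead_grew

print(matches_issue(200, 900, 210, 120, 180))  # → True
```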
You do not need to disable DRS to work around this issue.
The workaround:
- Use the VI Client to log in to VirtualCenter as an administrator.
- Right-click your cluster from the inventory.
- Click Edit Settings.
- Ensure that VMware DRS is shown as enabled. If it is not, check the box to enable VMware DRS.
- Click OK.
- Click an ESX Server from the Inventory.
- Click the Configuration tab.
- Click Advanced Settings.
- Click the Mem option.
- Locate the Mem.VMOverheadGrowthLimit parameter.
- Change the value of this parameter to 5. (Note: By default this setting is set to -1.)
- Click OK.
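If you prefer the service console, the same change can be sketched from the command line. This assumes esxcfg-advcfg (present on ESX 3.x) exposes the setting under the path /Mem/VMOverheadGrowthLimit; verify the option name on your build before setting it.

```shell
# Hedged sketch: apply the workaround from the ESX service console.
# Assumption: esxcfg-advcfg maps Mem.VMOverheadGrowthLimit to /Mem/VMOverheadGrowthLimit.
if command -v esxcfg-advcfg >/dev/null 2>&1; then
  esxcfg-advcfg -g /Mem/VMOverheadGrowthLimit    # show the current value (default -1)
  esxcfg-advcfg -s 5 /Mem/VMOverheadGrowthLimit  # set the growth limit to 5
else
  msg="esxcfg-advcfg not found - run this on the ESX service console"
  echo "$msg"
fi
```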
To verify the setting has taken effect:
- Log in to your ESX Server service console as root, either over an SSH session or directly at the console of the server.
- Type less /var/log/vmkernel.
A successfully changed setting displays a message similar to the following, and no further action is required:
vmkernel: 1:16:23:57.956 cpu3:1036)Config: 414: "VMOverheadGrowthLimit" = 5, Old Value: -1, (Status: 0x0)
If changing the setting was unsuccessful, a message similar to the following is displayed:
vmkernel: 1:08:05:22.537 cpu2:1036)Config: 414: "VMOverheadGrowthLimit" = 0, Old Value: -1, (Status: 0x0)
Note: If you see a message changing the limit to 5 followed by one changing it back to -1, the fix was not successfully applied.
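The check above can be scripted. This sketch tests a captured log line against the success pattern; in practice you would feed it the matching line from /var/log/vmkernel (e.g. via grep) rather than a hard-coded string.

```shell
# Sketch (assumption): decide from a vmkernel log line whether the new limit stuck.
line='vmkernel: 1:16:23:57.956 cpu3:1036)Config: 414: "VMOverheadGrowthLimit" = 5, Old Value: -1, (Status: 0x0)'
case "$line" in
  *'"VMOverheadGrowthLimit" = 5'*) result="limit applied" ;;
  *)                               result="limit NOT applied" ;;
esac
echo "$result"
```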
To fix multiple ESX Server hosts:
If this parameter needs to be changed on several hosts (or if the workaround fails on an individual host), use the following procedure instead of changing every server individually:
- Log on to the VirtualCenter Server Console as an administrator.
- Make a backup copy of the vpxd.cfg file (typically it is located in C:\Documents and Settings\All Users\Application Data\VMware\VMware VirtualCenter\vpxd.cfg).
- In the vpxd.cfg file, add the following configuration after the <vpxd> tag:
<cluster>
<VMOverheadGrowthLimit>5</VMOverheadGrowthLimit>
</cluster>
This configuration provides an initial growth margin, in MB, to virtual machine overhead memory. You can increase this value if doing so further improves virtual machine performance.
- Restart the VMware VirtualCenter Server Service.
Note: When you restart the VMware VirtualCenter Server Service, the new value for the overhead limit is pushed down to all the clusters in VirtualCenter.
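For orientation, here is a minimal sketch of where the block lands in vpxd.cfg. The surrounding elements are illustrative only; your actual file contains many more entries, which should be left untouched.

```xml
<config>
  <vpxd>
    <!-- add the new block directly after the <vpxd> tag -->
    <cluster>
      <VMOverheadGrowthLimit>5</VMOverheadGrowthLimit>
    </cluster>
    <!-- existing <vpxd> settings continue here -->
  </vpxd>
</config>
```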
Mark says
Thx for this nice tip. We have this problem right now, and are going to try this solution.
prithvi says
We are facing the same issue with large SAP VMs in %.0 U 2 & U 3. What might be the reason for CPU utilization spiking up on VMs after vMotion?
chp says
5.0 U2 & U3 ESXi