Yellow Bricks

by Duncan Epping


performance

esxtop running out of control?

Duncan Epping · Jun 1, 2010 ·

One of my colleagues (Thanks Horst) pointed me to an ESXTOP option I had never noticed: “-l”. The option was added to avoid high CPU utilization caused by a large number of LUNs during the collection of storage statistics.

KB 1005894
Add a lock mode (-l) to esxtop to help optimize CPU utilization.

In a large ESX Server deployment that includes many LUNs, esxtop uses a lot of CPU while accessing storage statistics.

To alleviate this problem, you can use the -l option with esxtop to enable lock mode. This option locks the entities (worlds, virtual CPUs, LUNs, NICs, and so on) for which statistics are displayed. Any new entities created during the esxtop session will not have statistics displayed.

Batch mode (-b) also implies lock mode.
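For those who want to script this, here is a minimal sketch of collecting statistics in batch mode with lock mode enabled. It assumes esxtop is available on the path of the host you run it on; the interval, iteration count and output file name are just example values.

import subprocess

# Example values only; tune the sampling interval and iteration count as needed.
INTERVAL_SECONDS = 5               # -d: delay between snapshots
ITERATIONS = 12                    # -n: number of snapshots to collect
OUTPUT_FILE = "esxtop-stats.csv"   # hypothetical output file name

with open(OUTPUT_FILE, "w") as out:
    # -b selects batch mode (CSV output), which per the KB article implies
    # lock mode; -l is added explicitly here for clarity.
    subprocess.run(
        ["esxtop", "-l", "-b", "-d", str(INTERVAL_SECONDS), "-n", str(ITERATIONS)],
        stdout=out,
        check=True,
    )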

KB Article 1020524 (TPS and Nehalem)

Duncan Epping · May 27, 2010 ·

Scott Lowe pointed to KB Article 1020524 in his short take article. Although I agree with Scott that it is a useful article, it is actually technically incorrect. I wanted to point this out because when Scott points to something, you know many will pick up on it.

On Nehalem systems with Hardware assisted Memory Management Unit (MMU), TPS improves performance from 10 – 20% by using large pages.

Since TPS is done per small page, the VMkernel delays page sharing until host memory becomes over-committed. In the background, the ESX host gathers information about TPS, then uses it when it needs more memory.

It might be just a detail, but it is important to realize that it is not TPS that improves performance but large pages. TPS has absolutely nothing to do with it and does not impact performance anywhere near the mentioned 10-20%.

One thing to note is that TPS does identify the pages which can be shared; however, as 2MB pages are used, they are not actively shared. When a system gets overcommitted, those large pages (2MB) will be broken down into small pages (4KB) and the already identified pages will be shared.
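To make that “identify now, share later” behaviour a bit more tangible, here is a toy model in a few lines of Python. This is purely an illustration of the mechanism described above, not VMkernel code, and all numbers and names in it are made up.

from collections import Counter

SMALL_PAGE_KB = 4   # size of a small page

def shareable_small_pages(page_hashes):
    """Count 4KB pages that have at least one identical copy (TPS candidates)."""
    counts = Counter(page_hashes)
    # every duplicate beyond the first copy of a hash could be collapsed into one page
    return sum(n - 1 for n in counts.values() if n > 1)

# Made-up content hashes for 2048 small pages (four 2MB large pages worth).
page_hashes = ["zero"] * 600 + ["pattern"] * 400 + list(range(1048))
candidates = shareable_small_pages(page_hashes)

host_overcommitted = False   # flips to True once host memory comes under pressure
if host_overcommitted:
    # the 2MB pages are broken down into 4KB pages and the candidates are shared
    print(f"Sharing {candidates} pages, reclaiming {candidates * SMALL_PAGE_KB} KB")
else:
    print(f"{candidates} pages identified as shareable, nothing shared yet")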

I just love TPS….

Swapping?

Duncan Epping · May 26, 2010 ·

We had a discussion internally about performance and swapping. I started writing this article and asked Frank if it made sense. Frank’s reply: “just guess what I am writing about at the moment”. As both of us had a different approach, we decided to launch both articles at the same time and refer to each other’s post. So here’s the link to Frank’s take on the discussion and I highly recommend reading it: “Re: Swapping“.

As always, the common theme of the discussion was “swapping bad”. Although I don’t necessarily disagree, I do want to note that it is important to figure out whether the system is actually actively swapping or not.

In many cases “bad performance” is blamed on swapping. However, this is not always the case. As described in my section on “ESXTOP”, there are multiple metrics on “swap” itself. Only a few of those relate to performance degradation due to swapping. I’ve listed the important metrics below.

Host:
MEM – SWAP/MB “curr” = Total swapped machine memory of all the groups, including virtual machines.
MEM – SWAP/MB “target” = The expected swap usage.
MEM – SWAP/MB “r/s” = The rate at which machine memory is swapped in from disk.
MEM – SWAP/MB “w/s” = The rate at which machine memory is swapped out to disk.

VM:
MEM – SWCUR = If larger than 0, the host has swapped memory pages from this VM in the past.
MEM – SWTGT = The expected swap usage.
MEM – SWR/s (J) = If larger than 0, the host is actively reading from swap (.vswp).
MEM – SWW/s (J) = If larger than 0, the host is actively writing to swap (.vswp).

So which metrics really matter when your customer complains about degraded performance?

First metric to check:
SWR/s (J) = If larger than zero, the ESX host is actively reading from swap (.vswp).

Associated with that metric, I would recommend looking at the following:
%SWPWT = The percentage of time the world is waiting for the ESX VMkernel to swap in memory.

So what about all those other metrics? Why don’t they really matter?
Take “Current Swap” (SWCUR): as long as it is not being read, it might just consist of pages that were touched sporadically and are now sitting there doing nothing. Will it hurt performance? Maybe, but as long as those pages are not being read, it most likely will not. Even writing to swap does not necessarily hurt performance, although it might. These metrics should be used as indicators that the system is severely overcommitted and that performance might degrade in the future when the pages are read!
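To summarize the reasoning above in a few lines, the sketch below shows how I would interpret these metrics for a single VM. The sample values are made up; in reality they come straight from esxtop.

# Sketch of the decision logic described above; the sample values are made up
# and would normally come from the esxtop memory screen.
vm = {
    "SWCUR": 512.0,   # MB currently swapped out (swapping happened in the past)
    "SWTGT": 0.0,     # expected swap usage
    "SWR/s": 0.0,     # MB/s read back in from the .vswp file
    "SWW/s": 0.0,     # MB/s written out to the .vswp file
    "%SWPWT": 0.0,    # % of time the world waits for the VMkernel to swap in memory
}

if vm["SWR/s"] > 0 or vm["%SWPWT"] > 0:
    print("Actively swapping in: performance is being hurt right now.")
elif vm["SWW/s"] > 0 or vm["SWTGT"] > 0:
    print("Host is overcommitted and swapping out: performance may degrade soon.")
elif vm["SWCUR"] > 0:
    print("Pages were swapped in the past but are not being read: most likely no impact.")
else:
    print("No swap activity for this VM.")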

Limiting your vCPU

Duncan Epping · May 18, 2010 ·

I had a discussion with someone about limiting a VM to a specific amount of MHz after I found out that limits were set on most VMs. This environment was a “cloud” environment and the limit was set to create an extra level of fairness.

My question of course was: doesn’t this impact performance? The answer was simple: no, as a limit on a vCPU is only applied when there is a resource constraint. It took me a couple of minutes to figure out what he was actually trying to tell me, but basically it came down to the following:

When a single VM has a limit of 300MHz and is the only VM running on a host, it will run at full speed as it will constantly be rescheduled for 300MHz.

However, that’s not what happens in my opinion. It took me a while to get the wording right but after a discussion with @frankdenneman this is what we came up with:

Look at a vCPU limit as a restriction within a specific time frame. When a time frame consists of 2000 units and a limit of 300 units has been applied, it will take a full pass, so 300 “active” units + 1700 units of waiting, before it is scheduled again.

In other words, applying a limit on a vCPU will slow your VM down no matter what, even if there are no other VMs running on that 4-socket quad-core host.
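A simple way to picture this is the sketch below. It uses the same example numbers as above (a 2000-unit time frame and a 300-unit limit); these are illustrative values only, not actual scheduler internals.

# Illustration of the example above: within each time frame the vCPU gets at
# most its limit, then waits for the remainder of the frame, regardless of how
# idle the rest of the host is. Not actual CPU scheduler code.
FRAME_UNITS = 2000   # size of the time frame used in the example
LIMIT_UNITS = 300    # limit applied to the vCPU, in the same units

def effective_share(frame_units, limit_units):
    """Fraction of a frame the vCPU can actually run, even on an idle host."""
    active = min(limit_units, frame_units)
    waiting = frame_units - active
    return active / frame_units, waiting

share, waiting = effective_share(FRAME_UNITS, LIMIT_UNITS)
print(f"Active for {LIMIT_UNITS} units, then waiting {waiting} units")
print(f"Effective share of the frame: {share:.0%}")   # 15% in this example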

Would I ever recommend setting a limit? Only in very few cases. For instance, when you have an old MS-DOS application which is polling 10,000 times a second, it might be useful to limit it. I have personally witnessed such applications consuming 100% of the available resources unnecessarily, while not actually doing anything.

In most cases, however, I would recommend against it. It will degrade the user experience / performance and, in my opinion, there is no need. The VMkernel has a great scheduler which takes fairness into account.

Disabling TPS hurting performance?

Duncan Epping · May 11, 2010 ·

On the internal mailing list there was a discussion today about how disabling TPS (Transparent Page Sharing) could negatively impact performance. It is something I hadn’t thought about yet, but when you do think about it, it actually makes sense and is definitely something to keep in mind.

Most new servers have some sort of NUMA architecture today. As hopefully all of you know, TPS does not cross a NUMA node boundary. This basically means that pages will not be shared between NUMA nodes. Another thing that Frank Denneman already described in his article here is that when memory pages are allocated remotely, there is a memory penalty associated with it. (Did you know there is an “esxtop” metric, N%L, which shows the percentage of memory that is local?) Remote pages are accessed across an interconnect bus, which is always slower than so-called local memory.

Now you might ask: what is the link between NUMA, TPS and degraded performance? Think about it for a second… TPS decreases the amount of physical pages needed. If TPS is disabled there is no sharing, the chances of going across NUMA nodes increase, and as stated this will definitely impact performance. Funny how disabling a mechanism (TPS) which is often associated with “CPU overhead” can have a negative impact on memory latency.
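As a back-of-the-envelope illustration with made-up numbers: when the VMs on a NUMA node touch more unique memory than the node has locally, the difference spills over to the remote node, and the pages TPS would have collapsed can be exactly what keeps the footprint local.

# Back-of-the-envelope illustration (made-up numbers, not a NUMA model):
# TPS reduces the physical footprint per NUMA node, so disabling it can push
# pages to the remote node, where every access pays the interconnect penalty.
NODE_LOCAL_GB = 64      # local memory behind one NUMA node (example value)
VM_FOOTPRINT_GB = 70    # unique pages the VMs on that node touch
TPS_SAVINGS_GB = 10     # pages TPS could collapse within the node

def remote_memory_gb(footprint_gb, savings_gb, local_gb):
    """GB that spills over to a remote NUMA node for a given TPS saving."""
    return max(0, footprint_gb - savings_gb - local_gb)

print("Remote memory with TPS enabled :", remote_memory_gb(VM_FOOTPRINT_GB, TPS_SAVINGS_GB, NODE_LOCAL_GB), "GB")
print("Remote memory with TPS disabled:", remote_memory_gb(VM_FOOTPRINT_GB, 0, NODE_LOCAL_GB), "GB")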
