
Yellow Bricks

by Duncan Epping


esxtop

Why is %WAIT so high in esxtop?

Duncan Epping · Jul 17, 2012 ·

I got this question today around %WAIT and why it was so high for all these VMs. I grabbed a screenshot from our test environment. It shows %WAIT next to %VMWAIT.

First of all, I suggest looking at %VMWAIT, which in my opinion is more relevant than %WAIT. %VMWAIT is a derivative of %WAIT; it does not include %IDLE time, but it does include %SWPWT and the time the VM is blocked when a device is unavailable. That immediately reveals why %WAIT seems extremely high: it includes %IDLE! Another thing to note is that the %WAIT for a VM rolls multiple worlds up into a single metric. Let me show you what I mean:

As you can see there are 5 worlds, which explains why %WAIT hovers around 500% when the VM is not doing much. Hope that helps…

<edit> I just got pointed to this great KB article by one of my colleagues. It explains the various CPU metrics in depth. The key takeaway from that article for me is the following: %WAIT + %RDY + %CSTP + %RUN = 100%. Note that this is per world! Thanks Daniel for pointing this out!</edit>
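To make that per-world accounting concrete, here is a quick Python sketch with made-up numbers (they are not taken from the screenshot above): each world's %WAIT + %RDY + %CSTP + %RUN adds up to 100%, and the VM-level %WAIT is simply the sum across the VM's worlds, which is why a mostly idle VM with 5 worlds sits around 500%.

```python
# Made-up per-world numbers for a mostly idle VM with 5 worlds (hypothetical).
# Per world: %WAIT + %RDY + %CSTP + %RUN = 100% (per the KB article referenced above).
worlds = {
    "vmx":    {"RUN": 0.5, "RDY": 0.1, "CSTP": 0.0, "WAIT": 99.4},
    "mks":    {"RUN": 0.1, "RDY": 0.0, "CSTP": 0.0, "WAIT": 99.9},
    "svga":   {"RUN": 0.1, "RDY": 0.0, "CSTP": 0.0, "WAIT": 99.9},
    "vcpu-0": {"RUN": 2.0, "RDY": 0.5, "CSTP": 0.0, "WAIT": 97.5},
    "vcpu-1": {"RUN": 1.0, "RDY": 0.2, "CSTP": 0.0, "WAIT": 98.8},
}

for name, metrics in worlds.items():
    assert abs(sum(metrics.values()) - 100.0) < 0.01  # each world's budget adds up to 100%

vm_wait = sum(m["WAIT"] for m in worlds.values())
print(f"VM-level %WAIT across {len(worlds)} worlds: {vm_wait:.1f}%")  # roughly 500%
```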

vSphere 5.0 – what’s new for esxtop

Duncan Epping · Oct 4, 2011 ·

I was just playing around with esxtop in vSphere 5.0 and spotted something that changed. I figured there must be more, so I started digging. I didn't dig too deep, as there is a great VMworld session (VSP1999) on this topic by Krishna Raj Raja and I figured why reinvent the wheel. Anyway, here are the things I noticed which will definitely come in handy at some point while troubleshooting performance issues:

  • Each display type now shows the number of Worlds, VMs and vCPUs on the host on the first line. This allows you to quickly identify why, for instance, %RDY is high.
  • %VMWAIT is a derivative of %WAIT; it does not include %IDLE time, but it does include %SWPWT and the time the world is “blocked”, for instance when connectivity to the storage device has failed.
  • In the Power display there is a new line, PSTATE MHZ, which shows the clock frequency per state. For instance, “2395” is the clock frequency of %P0 and “1596” is the clock frequency of %P7. Please note that “%USED” is based on the base clock frequency (%P0) of your CPU, while %UTIL is the utilization in its current state (%Px); in this case that could be 40% of %P7 (1596), which is roughly 638 MHz (see the short sketch after this list).
  • In the “Device Display” there are new stats starting with “F”, for example FCMDs; these show the failed I/Os and are a quick way to see whether there are any I/O errors.
  • Two new counters in the “Memory Display”, LLSWR/s and LLSWW/s, show the amount of memory being read from or written to host cache. They are useful when you have enabled this feature and want to know whether it is actively being used. Of course there are also vCenter stats for this.
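To illustrate the PSTATE arithmetic from the bullet above, here is a small Python sketch. The frequencies are the example values from the post; the way %USED is normalized against the base frequency is my reading of the description, not an official formula.

```python
# Example P-state frequencies from the post (MHz); the normalization below is an
# interpretation of how %USED (base-frequency based) relates to %UTIL (current P-state).
pstate_mhz = {"P0": 2395, "P7": 1596}

util_pct = 40.0                                   # %UTIL: utilization at the current P-state (P7)
effective_mhz = util_pct / 100 * pstate_mhz["P7"]
print(f"{util_pct:.0f}% of P7 ({pstate_mhz['P7']} MHz) is {effective_mhz:.0f} MHz")   # ~638 MHz

# The same amount of work expressed against the base frequency (P0):
used_pct = effective_mhz / pstate_mhz["P0"] * 100
print(f"Against P0 ({pstate_mhz['P0']} MHz) that is roughly {used_pct:.1f}% USED")    # ~26.7%
```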

I love esxtop; with 5.0 it has become even better, and especially “%VMWAIT” and the PSTATE details will come in handy at some point in time!

How cool is TPS?

Duncan Epping · Jan 10, 2011 ·

Frank and I have discussed this topic multiple times, and it was briefly mentioned in Frank’s excellent series about over-sizing virtual machines: Zero Pages, TPS and the impact of a boot-storm. Pre-vSphere 4.1 we have all seen it happen: a host fails and multiple VMs need to be restarted. Temporary contention exists, as it could take up to 60 minutes before TPS completes. Or, of course, when the memory pressure thresholds are reached the VMkernel requests TPS to scan memory and collapse pages if and where possible. However, this is usually already too late, resulting in ballooning or compression (if you’re lucky) and ultimately swapping. Whether it is an HA-initiated “boot-storm” or, for instance, your VDI users all powering up their desktops at the same time, the impact is the same.

Now, one of the other things I wanted to touch on is Large Pages, as this is the main argument our competitors use against TPS. The reason is that Large Pages are not TPS’ed, as I have discussed in this article and many articles before that one. I have even heard people say that TPS should be disabled, as most Guest OSes being installed today are 64-bit and as such ESX(i) will back even Small Pages (Guest OS) with Large Pages, so TPS would only add unnecessary overhead without any benefits… Well, I have a different opinion, and I will show you with a couple of examples why TPS should be enabled.

One of the major improvements in vSphere 4.0 is that it recognizes zeroed pages instantly and collapses them. I have dug around for detailed info, but the best I could find publicly was in the esxtop bible, and I quote:

A zero page is simply the memory page that is all zeros. If a zero guest physical page is detected by VMKernel page sharing module, this page will be backed by the same machine page on each NUMA node. Note that “ZERO” is included in “SHRD”.

(Please note that this metric was added in vSphere 4.1)

I wondered what that would look like in real life. I isolated one of my ESXi hosts (24GB of memory) in my lab and deployed 12 VMs with 3GB each, with Windows 2008 64-bit installed. I booted all of them up in literally seconds, and as Windows 2008 zeroes out memory during boot I knew what to expect:

I added a couple of arrows so that it is a bit more obvious what I am trying to show here. At the top left you can see that TPS saved 16476MB while using only 15MB to store unique pages. As the VMs clearly show, most of those savings are from “ZERO” pages; just subtract ZERO from SHRD (Shared Pages) and you will see what I mean. Pre-vSphere 4.0 this would have resulted in severe memory contention and, as a result, more than likely ballooning (if the balloon driver had already started; remember, it is a “boot-storm”) or swapping.
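As a back-of-the-envelope illustration of that subtraction, here is a tiny Python sketch. The per-VM numbers are made up; only the host-level total roughly mirrors the screenshot.

```python
# Hypothetical per-VM memory stats in MB; SHRD includes ZERO (per the quote above),
# so SHRD - ZERO is the portion of the shared pages that came from regular page sharing.
vms = [{"name": f"w2k8-{i:02d}", "SHRD": 1400.0, "ZERO": 1330.0} for i in range(1, 13)]

for vm in vms:
    non_zero_shared = vm["SHRD"] - vm["ZERO"]
    print(f'{vm["name"]}: {non_zero_shared:.0f} MB shared beyond the zero pages')

total_shared = sum(vm["SHRD"] for vm in vms)
print(f"Total shared across the host: {total_shared:.0f} MB")   # the bulk of it is zero pages
```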

Just to make sure I’m not rambling I disabled TPS (by setting Mem.ShareScanGHz to 0) and booted up those 12 VMs again. This is the result:

As shown at the top, the host’s status is “hard” as a result of no pages being shared, and even worse, as can be seen at the VM level, most VMs started swapping. We are talking about VMkernel swap here, not ballooning. I guess that clearly shows why TPS needs to be enabled and where and when you will benefit from it. Please note that you can also see “ZERO” pages in vCenter, as shown in the screenshot below.
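For reference, below is a minimal pyVmomi sketch to check what Mem.ShareScanGHz is currently set to on a host. The host name and credentials are placeholders, and I am assuming a direct connection to a standalone ESXi host; treat this purely as a way to verify the value of the advanced setting used for the test above.

```python
# Minimal pyVmomi sketch (hypothetical host name/credentials) that reads the
# Mem.ShareScanGHz advanced option mentioned above. Setting it to 0 is what
# disabled TPS scanning for the test in this post.
import ssl
from pyVim.connect import SmartConnect, Disconnect

ctx = ssl._create_unverified_context()           # lab host with a self-signed certificate
si = SmartConnect(host="esxi01.lab.local", user="root", pwd="password", sslContext=ctx)

# Standalone ESXi host: datacenter -> compute resource -> first (and only) host
host = si.content.rootFolder.childEntity[0].hostFolder.childEntity[0].host[0]
option = host.configManager.advancedOption.QueryOptions(name="Mem.ShareScanGHz")[0]
print(option.key, "=", option.value)             # 0 means the page sharing scan is disabled

Disconnect(si)
```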

One thing Frank and I discussed a while back, and that I finally managed to figure out, is why, after booting a Windows VM, the “ZERO” pages still go up and fluctuate so much. I did not know this, but I found the following explanation:

There are two threads that are specifically responsible for moving pages from one list to another. Firstly, the zero page thread runs at the lowest priority and is responsible for zeroing out free pages before moving them to the zeroed page list.

In other words, when an application/service, or even Windows itself, “deprecates” a page, it will be zeroed out by the “zero page thread” (aka garbage collector) at some point. The Page Sharing module will then pick this up and collapse the page instantly.

I guess there is only one thing left to say, how cool is TPS?!

VMworld esxtop advanced session

Duncan Epping · Nov 8, 2010 ·

During my flight from Boston back to the Netherlands I listened to the VMworld esxtop session “Troubleshooting using ESXTOP for Advanced Users, TA6720“. As always, an excellent session with a lot of in-depth info. Most of it was already documented; however, there were a couple of key points that I hadn’t documented yet. I just added those to my esxtop page, and I wanted to highlight them here as I personally believe it is very useful info. It seems pretty random, but it rolled up nicely into the esxtop page in my opinion.

  • %SYS should be less than 20. %SYS is the percentage of time spent by system services on behalf of the world; the possible system services are interrupt handlers, bottom halves, and system worlds.
  • -b = batch mode; adding “-a” forces all metrics to be gathered (see the parsing sketch further below)
  • Limit display to a single group (l)
    • enables you to focus on a specific VM
  • Limiting the number of entities (#)
    • this enables you, for instance, to watch just the top 5 worlds

I have also added thresholds for ZIP/s, UNZIP/s and CACHEUSD. These should of course be 0 from a performance perspective as anything larger than 0 means the host was overcommitted on memory and had to resort to memory compression.
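If you gather these counters in batch mode (for example with something like “esxtop -b -a -d 5 -n 60 > esxtop-batch.csv”), a few lines of Python can flag the thresholds mentioned above. The counter-name substrings below are assumptions about how the columns appear in the batch CSV header, so adjust them to match your own capture.

```python
# Rough sketch: scan an esxtop batch-mode CSV for counters above the thresholds
# discussed in this post. The substrings used to match column names are an
# assumption about the header format; tweak them for your own capture.
import csv

THRESHOLDS = {"% System": 20.0, "Zip/s": 0.0, "Unzip/s": 0.0}   # %SYS < 20; ZIP/s and UNZIP/s should stay 0

with open("esxtop-batch.csv", newline="") as f:                 # hypothetical file name
    reader = csv.reader(f)
    header = next(reader)
    for row in reader:
        for col, value in zip(header, row):
            for counter, limit in THRESHOLDS.items():
                if counter not in col:
                    continue
                try:
                    if float(value) > limit:
                        print(f"{col} = {value} (above {limit})")
                except ValueError:
                    pass                                        # skip non-numeric fields (timestamps etc.)
```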

If anyone has more metrics/thresholds to contribute which they used in the past to troubleshoot issues let me know!

How many pages can be shared if Large Pages are broken up?

Duncan Epping · Nov 7, 2010 ·

I have written multiple articles (1, 2, 3, 4) on this topic, so hopefully by now everyone knows that Large Pages are not shared by TPS. However, when there is contention the large pages will be broken up into small pages, and those will be shared based on the outcome of the TPS algorithm. Something I have always wondered, and discussed with the developers a while back, is whether it would be possible to get an indication of how many pages could be shared if Large Pages were broken down. (Please note that we are talking about Small Pages backed by Large Pages here.) Unfortunately there was no option to reveal this back then.

While watching the VMworld esxtop session “Troubleshooting using ESXTOP for Advanced Users, TA6720” I noticed something really cool, which is shown in the quote and the screenshot below, both taken from the session. Other new metrics that were revealed in this session and shown in this screenshot are around Memory Compression. I guess the screenshot speaks for itself.

  • COWH: Copy on Write Pages hints – the amount of memory in MB that is potentially shareable.
  • Potentially shareable but not (yet) shared, for instance when large pages are used; this is a good hint!
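To put a rough number on that, here is a tiny Python sketch of how I read the COWH description: sum it across the VMs to get an idea of how much additional sharing could happen if the large pages were broken up. The per-VM values are made up.

```python
# Hypothetical per-VM COWH values in MB. Per the session quote above, COWH is
# memory that is potentially shareable but not currently shared, e.g. because
# it is still backed by large pages.
cowh_mb = {"vm01": 1400.0, "vm02": 1350.0, "vm03": 980.0}

potential = sum(cowh_mb.values())
print(f"Roughly {potential:.0f} MB could still be shared if large pages were broken up")
```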

There was more cool stuff in this session that I will be posting about this week, or at least adding to my esxtop page for completeness.

