We had a discussion internally about performance and swapping. I started writing this article and asked Frank if it made sense. Frank’s reply: “Just guess what I am writing about at the moment.” As we each took a different approach, we decided to publish both articles at the same time and refer to each other’s post. So here’s the link to Frank’s take on the discussion, and I highly recommend reading it: “Re: Swapping”.
As always, the common theme of the discussion was “swapping bad”. Although I don’t necessarily disagree, I do want to note that it is important to figure out whether the system is actually actively swapping or not.
In many cases “bad performance” is blamed on swapping. However, this is not always the case. As described in my section on “ESXTOP”, there are multiple metrics on “swap” itself, and only a few of those relate to performance degradation due to swapping. I’ve listed the important metrics below.
Host:
MEM – SWAP/MB “curr” = The total swapped machine memory of all the groups, including virtual machines.
MEM – SWAP/MB “target” = The expected swap usage.
MEM – SWAP/MB “r/s” = The rate at which memory is swapped in from disk.
MEM – SWAP/MB “w/s” = The rate at which machine memory is swapped out to disk.
VM:
MEM – SWCUR = If larger than 0, the host has swapped memory pages from this VM in the past.
MEM – SWTGT = The expected swap usage.
MEM – SWR/s (J) = If larger than 0, the host is actively reading from swap (vswp).
MEM – SWW/s (J) = If larger than 0, the host is actively writing to swap (vswp).
So which metrics really matter when your customer complains about performance degradation?
First metric to check:
SWR/s (J) = If larger than zero the ESX host is actively reading from swap(vswp).
Alongside that metric, I would recommend looking at the following:
%SWPWT = The percentage of time the world is waiting for the ESX VMkernel to swap memory.
So what about all those other metrics? Why don’t they really matter?
Take “Current Swap”: as long as it is not being read, it might just be pages that are only sporadically used, sitting there doing nothing. Will it hurt performance? Maybe, but as long as it is not being read, most likely not. Even writing to swap does not necessarily hurt performance, although it might. These metrics should be used as indicators that the system is severely overcommitted and that performance might degrade in the future, once those pages are being read!
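To make that order of checks concrete, here is a minimal Python sketch. It is only an illustration of the reasoning above, not a VMware tool: the `VmSwapStats` class, the `assess_swap` function and the wording of the returned strings are all hypothetical, and the values are assumed to be copied by hand from the per-VM esxtop memory and CPU screens.

```python
from dataclasses import dataclass


@dataclass
class VmSwapStats:
    """Per-VM values as read from esxtop (hypothetical container)."""
    swcur_mb: float    # SWCUR: memory currently swapped out
    swr_per_s: float   # SWR/s: rate of reads from swap (vswp)
    sww_per_s: float   # SWW/s: rate of writes to swap (vswp)
    swpwt_pct: float   # %SWPWT: time spent waiting on swap-in


def assess_swap(stats: VmSwapStats) -> str:
    """Classify a VM using the order of checks described in the article."""
    # Reading from swap (or waiting on it) is the case that hurts right now.
    if stats.swr_per_s > 0 or stats.swpwt_pct > 0:
        return "active swap-in: likely hurting performance now"
    # Writing to swap might hurt, and shows the host is under pressure.
    if stats.sww_per_s > 0:
        return "swapping out: might hurt, host is under memory pressure"
    # Swapped pages that are never read are only a warning sign.
    if stats.swcur_mb > 0:
        return "swapped in the past: overcommitment indicator, no current impact"
    return "no swapping"


if __name__ == "__main__":
    # A VM with old swapped pages that nobody is reading back in.
    print(assess_swap(VmSwapStats(350.0, 0.0, 0.0, 0.0)))
```

The checks are deliberately ordered from "hurts now" to "might hurt later", so a VM with a non-zero SWCUR but no reads is reported as a warning rather than a problem.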
Suttoi says
As always an interesting post,
Is there any metric that will measure the impact of swapping rather than the amount of swapping?
By which I mean, we know that swapping can hurt VM performance, but I’m not totally sure how the impact of “memory slowness” might be measured.
For example:
Disk slowness is easy (in theory). If the response time is >20ms things are starting to go bad. But if a VM isn’t doing much IO this probably won’t show up as a problem, so we can get away with continuing to use slower disk. No one will ever see the difference.
The measure of how well we are “getting away with it” is probably disk queue length. Response times might be >20ms, but if the IOs are being serviced quickly and not stacking up in the queue, we are probably getting away with it. If the queues start to build, though, the end users might see some tangible impact on performance.
Is %SWPWT the metric that we should use to differentiate good swapping from bad?
Is there such a thing as a memory request queue, in the host or the guest OS?
It seems that hosts almost always run out of RAM way before they run out of CPU, so anything we can do to safely stretch memory resources can only be a good thing.
Craig Risinger says
%SWPWT is a good indicator of pain and probably the closest thing you’ll get to delay for memory access.
MEM – SWR/s is also an excellent indicator of pain. I’d look at this or the MB/s read in. The vmkernel does on-demand page-in: it will read in from .vswp only when something in the guest OS wants that data. So anything being swapped in is wanted, and the process that wants it was expecting it to be coming from memory which is 3(?) orders of magnitude faster than disk. The process will be disappointed.
I agree that MEM – SWCUR is basically worthless. It could mean your host is overcommitted and you’ll have performance problems in the future, or it could just mean you have many VMs with too much vRAM.
In most large orgs, the VMs still have too much vRAM installed. Windows on boot accesses every page. During a bootstorm, this puts memory pressure on the host, which swaps out. But many of those VMs will never touch those vRAM addresses again, since they never put real data there. So the bogus “data” (just a bunch of zeros, probably) stays swapped on disk and is never missed. The memory pressure was bogus. In this case, ESX is doing a good job of allowing efficient use of scarce pRAM even though the vRAM was not properly sized.
YP Chien says
I do get the %SWPWT stats when running esxtop in interactive mode. Does anyone know where to get %SWPWT when running esxtop in batch mode?
Craig Risinger says
When running esxtop in interactive mode, change which fields are displayed, then save that to an esxtop configuration file. Then run esxtop batch mode pointing to that config file. See http://www.yellow-bricks.com/esxtop/
esxtop -b -a will also include ALL stats in batch mode, but that can produce lines with 50,000 columns, which can be unwieldy. Excel opens only the first X columns. esxplot, a free tool, handles such wide files well. Theoretically perfmon will open these files, but plan on needing to shave at least once while you wait.
YP Chien says
Thanks, Craig, for the instructions. I took a look at the logs collected through esxtop batch mode and don’t see the %SWPWT counters, but I do see the %Swap Wait counters. Are these the same? Just to confirm.
Craig Risinger says
@YP Chien:
Yes, those are the same.
In batch mode, the names of the metrics are much longer; in live mode, everything has to fit on one screen simultaneously, so they shorten the metrics names.
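As a rough sketch of how to pick the swap-wait counter out of a batch-mode capture, the Python below scans the CSV header for columns whose name contains “Swap Wait” and reports the peak value of each. The function names, the sample host and VM names, and the exact header layout in `SAMPLE` are made up for illustration; check the header of your own capture for the real counter names before relying on the match string.

```python
import csv
import io

# Hypothetical two-sample capture; real files are far wider.
SAMPLE = r'''"Timestamp","\\host1\Group Cpu(1:vm-a)\% Swap Wait","\\host1\Group Cpu(1:vm-a)\% Used"
"12:00:00","0.00","12.50"
"12:00:05","3.25","14.00"
'''


def matching_columns(header, needles=("Swap Wait",)):
    """Return the indexes of columns whose name contains any needle."""
    return [i for i, name in enumerate(header)
            if any(needle in name for needle in needles)]


def peak_values(csv_file, needles=("Swap Wait",)):
    """Map each matching column name to the largest value seen in it."""
    reader = csv.reader(csv_file)
    header = next(reader)
    wanted = matching_columns(header, needles)
    peaks = {header[i]: 0.0 for i in wanted}
    for row in reader:
        for i in wanted:
            try:
                value = float(row[i])
            except (ValueError, IndexError):
                continue  # skip blank or truncated cells
            peaks[header[i]] = max(peaks[header[i]], value)
    return peaks


if __name__ == "__main__":
    for name, peak in peak_values(io.StringIO(SAMPLE)).items():
        print(f"{name}: peak {peak}")
```

For a real capture, pass an open file instead of the `io.StringIO` sample. Reading row by row keeps memory use low even when the file has many thousands of columns.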
J says
Hey Duncan.
I know I’m posting a reply to this almost 2 years later, but I just thought I’d take a shot.
So is the swapped (red) that I see on the guest in vCenter Resource Allocation hypervisor swapping or guest swapping?
I ask because in ESXTOP I’m seeing SWCUR at 350.00, SWR/s at 0.00, and ballooning on this guest. I guess I’m confused about why I’m seeing red on the guest if the balloon driver should be pinning down those pages to be given out to other VMs rather than swapping. My host is overcommitted, but I thought the balloon driver would kick in first and vmkernel swapping last.
I can understand the balloon driver kicking in and forcing the OS to swap out to disk, but if this is actually vmkernel swap (the red), I can assume it is impacting the performance of my VM, right?
Thanks for the great work on the site. Cheers.
Duncan Epping says
SWCUR just means that sometime in the past memory was swapped out. The balloon driver should normally kick in first, but it could be that there was a spike… the balloon driver couldn’t reclaim sufficient memory, and memory was swapped out. Just as long as you don’t see any “SWW/s” you should be good 🙂