On Twitter, @heiner_hardt asked for help with a performance-related issue he was experiencing. As I appreciate esxtop more every single day and really enjoy solving performance problems, I decided to dive into it.
After the initial couple of questions Heiner posted a screenshot:
Heiner highlighted (red outline) a couple of metrics which, as he pointed out in the text boxes, indicated swapping and ballooning. Although I can’t disagree that swapping and ballooning happened at some point in time, I do disagree with the conclusion that this virtual machine is swapping. Let’s break it down:
- 1393 Free -> Currently 1393MB memory available
- High State -> Hypervisor is not under memory pressure
- SWAP /MB 146 Cur -> 146MB has been swapped
- SWAP /MB 83 Target -> Target amount that needed to be swapped was 83MB
- 0.00 r/s -> No reads from swap currently
- 0.00 w/s -> No writes to swap currently
- MCTLSZ 1307.27 -> The amount of guest physical memory that has been reclaimed by the balloon driver is 1307.27MB
- MCTLTGT 1307.27 -> The amount of guest physical memory to be kept in the balloon driver is 1307.27MB
- SWCUR 146.61 -> The current amount of memory that has been swapped is 146.61MB
- SWTGT 83.75 -> The target amount of memory that needed to be swapped was 83.75MB
Now that we know what these metrics mean and what the associated values are, we can easily draw a conclusion:
At one point the host has most likely been overcommitted. However, there is currently no memory pressure (state = high, meaning more than 6% free memory), as 1393MB of memory is available. The metric “SWCUR” indicates that swapping has occurred; however, the host is not currently reading from swap or writing to swap (0.00 r/s and 0.00 w/s).
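The distinction between "has swapped at some point" and "is swapping right now" can be written down as a small Python sketch. This is only an illustration of the reasoning above, not anything esxtop provides: the `MemorySample` class, its field names, and the `swap_status` function are hypothetical, and the values are simply typed in from the screenshot.

```python
from dataclasses import dataclass


@dataclass
class MemorySample:
    """Hypothetical container for the values read off the esxtop screenshot."""
    swcur_mb: float          # SWCUR: memory currently residing in swap
    swap_reads_per_s: float  # r/s: current swap-in activity
    swap_writes_per_s: float # w/s: current swap-out activity


def swap_status(sample: MemorySample) -> str:
    """Separate past swapping (SWCUR > 0) from active swapping (r/s or w/s > 0)."""
    if sample.swap_reads_per_s > 0 or sample.swap_writes_per_s > 0:
        return "actively swapping"
    if sample.swcur_mb > 0:
        return "swapped in the past, not swapping now"
    return "never swapped"


# Values taken from the screenshot discussed above.
sample = MemorySample(swcur_mb=146.61, swap_reads_per_s=0.0, swap_writes_per_s=0.0)
print(swap_status(sample))
```

The point of ordering the checks this way is that a non-zero SWCUR on its own only tells you pages are sitting in swap; it is the read and write rates that tell you whether anything is happening now.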
If the host is not experiencing memory pressure, why is the balloon driver still inflated (MCTLTGT 1307.27MB)? Although the host is currently in a high memory state, the amount of available memory almost equals the amount of memory claimed by the balloon driver. Deflating the balloon would therefore return the host to a memory-constrained state.
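A quick back-of-the-envelope calculation makes this concrete. The sketch below is in Python and assumes the worst case, namely that the guest would immediately reuse every megabyte handed back by a fully deflated balloon; the variable names are mine and the numbers come straight from the screenshot.

```python
# Values from the screenshot, in MB.
free_mb = 1393.0        # host free memory
balloon_mb = 1307.27    # MCTLSZ: memory currently held by the balloon driver

# Assumption: a fully deflated balloon hands all of this memory back to the
# guest, and the guest touches all of it again (worst case).
free_after_deflate_mb = round(free_mb - balloon_mb, 2)
balloon_share_of_free = balloon_mb / free_mb

print(f"Free after deflating: {free_after_deflate_mb} MB")
print(f"Balloon as share of current free memory: {balloon_share_of_free:.0%}")
```

Under that assumption only about 86MB would remain free, since the balloon accounts for roughly 94% of the memory that is currently available.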
My recommendation? Cut down on the memory assigned to your VMs! The fact that memory has been granted does not necessarily mean it is actively used, and in this case it leads to serious overcommitment, which in turn leads to ballooning and, even worse, swapping.
One thing to point out though: the amount of “PSHARE” (TPS) is low compared to average environments. Might be something to explore!