In the “CPU/MEM Reservation Behavior” article there was a lively discussion between Chris Huss (vmtrainers.com) and myself. I think the following comment by Chris more or less summarizes the discussion:
I wasn’t aware that the balloon driver was involved with the Mem.IdleTax. I haven’t seen any documentation stating this…and assumed that the VMkernel just stopped mapping idle memory for the VM without letting it know. If the VM needed the memory again, the VMkernel would just re-map it.
I can be totally wrong about this, but I have not seen any documentation to debunk this theory. It is my belief that the Mem.IdleTax is a totally separate memory saving/shaving technique from the balloon driver or the .vswp file.
If VMware engineering has or would publish an official article on this…I think it would clear up a lot of things.
To summarize: how does ESX reclaim idle memory or free memory from a virtual machine? The answer is simple. ESX has two idle memory reclamation mechanisms:
- Balloon driver
- vSwap
I would like to refer to page 29 of the Resource Management Guide where the above is stated. I do not think it is a coincidence that the paragraph above “memory reclamation” is “Memory Tax for Idle Virtual Machines”. (By the way, there is a third memory “reclamation” mechanism called “TPS” (Transparent Page Sharing). It is not used specifically to reclaim idle memory but rather to free up memory by sharing pages where possible.)
By default the balloon driver is used to reclaim idle memory. The balloon driver is needed because some operating systems only update their internal free memory map when pages are released. In other words, the hypervisor is unaware that specific pages are unused: they might still contain data, and the GOS (Guest Operating System) will not report to the hypervisor that the pages are no longer in use. The balloon driver is used to notify the GOS that there is a lack of memory.
When the balloon inflates the GOS will first assign all “unused / free” pages to the balloon driver. If this is enough it will stop. If this isn’t enough the OS will decide which pages it will page out until it reaches its threshold. The pages will need to be written to GOS swap as they might be needed later, they can’t just be reused without storing them somewhere.
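The inflate sequence described above can be sketched as a toy model. This is only an illustration under simplified assumptions, not how the actual balloon driver is implemented: the `Guest` class and all of its fields are hypothetical names, and the guest's own paging threshold is left out.

```python
from dataclasses import dataclass


@dataclass
class Guest:
    """Toy model of a guest OS; all names here are hypothetical."""
    free_pages: int        # pages on the guest's free list
    used_pages: int        # pages holding data the guest may need later
    ballooned: int = 0     # pages currently held by the balloon driver
    swapped_out: int = 0   # pages the guest wrote to its own swap

    def inflate(self, target: int) -> int:
        """Inflate the balloon by up to `target` pages; return pages obtained."""
        # Step 1: hand over free pages first; nothing has to be saved.
        from_free = min(target, self.free_pages)
        self.free_pages -= from_free

        # Step 2: if that was not enough, the guest pages out in-use memory
        # to its own swap so the contents can be brought back later.
        from_used = min(target - from_free, self.used_pages)
        self.used_pages -= from_used
        self.swapped_out += from_used

        obtained = from_free + from_used
        self.ballooned += obtained
        return obtained
```

In this model, a guest with 100 free and 400 in-use pages that is asked for 150 pages gives up all 100 free pages and pages out the remaining 50 to its own swap.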
I guess this section of the excellent white-paper “Memory Resource Management in VMware ESX Server” by Carl Waldspurger describes what I explained above:
The guest OS decides which particular pages to reclaim and, if necessary, pages them out to its own virtual disk. The balloon driver communicates the physical page number for each allocated page to ESX Server, which may then reclaim the corresponding machine page.
To be absolutely certain I reached out to Carl Waldspurger to verify that my statements/claims were correct. (Yes they were…)
By the way this concept is also described in the “VMware vSphere: Manage for Performance” course manual on page 151. Excellent course which I can recommend to everyone as it will not only explain this concept but also how to identify it and how to resolve it.
http://www.boche.net/blog/index.php/2009/01/29/idle-memory-tax/
Nice post Duncan.
For anyone new to ESX memory management and reclamation techniques (or needing a refresher) this VMworld session by Kit Colbert is a great intro: http://www.vmworld.com/docs/DOC-2116
This topic is also described in VMware vSphere: Install, Configure & Manage on module 9.
Great post. Just one clarification point:
Re, “The pages will need to be written to GOS swap as they might be needed later, they can’t just be reused without storing them somewhere.”
That’s true for most of what we think of as data, but it’s not necessary for file cache. To make any future reads faster, the GOS may store files which it has read from disk in memory (a filesystem cache), but it may mark those memory pages as “free”. So even though there is data there, when the memory gets repurposed, it doesn’t need to be written to GOS swapfile–after all, it’s already on disk.
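A rough way to picture this distinction, as a sketch only: the page kinds and the `needs_write_before_reuse` helper below are hypothetical, and real guest operating systems differ in how they track this.

```python
def needs_write_before_reuse(kind: str, dirty: bool) -> bool:
    """Return True if a page's contents must be saved before the page is reused.

    kind  -- "free", "file_cache" or "anonymous" (hypothetical labels)
    dirty -- True if the in-memory contents are not yet stored on disk
    """
    if kind == "free":
        return False          # nothing worth preserving
    if kind == "file_cache":
        return dirty          # a clean cache page already has a copy on disk
    if kind == "anonymous":
        return True           # process data exists only in memory
    raise ValueError(f"unknown page kind: {kind}")


# Example: only the anonymous page forces a write in this mix.
pages = [("free", False), ("file_cache", False), ("anonymous", True)]
writes = sum(needs_write_before_reuse(kind, dirty) for kind, dirty in pages)
```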
I agree Craig, I left any caching mechanisms out of scope as it also differs per OS. Thanks anyway for clearing things up!
Duncan,
Thanks for your hard work on this one. It looks like Carl verified that the balloon driver or .vswp file is used during the Mem.IdleTax process. Although not specifically worded in any of VMware’s documentation, I know we can all trust Carl’s verbal verification.
To the other responders, I Google’d for any kind of references on this, but disregarded any 3rd party blogs or references that were not official VMware. Also, I teach the VMware ICM course and FT class almost every week. While it’s true we have a Module 9 that covers ballooning and the .vswp file, there is no mention of Mem.IdleTax in that course…which is why I tell my students to read the Resource Management Guide as a supplement. I am very aware of that document and everything it mentions about this topic. Never does it mention specifically that the balloon driver or the .vswp file have anything to do specifically with the Mem.IdleTax process.
If Carl says it does, then it does. It would be nice to see it documented officially…but I’ll take what I can get.
Now, the other question…which originally started this discussion, is how memory reservations and the Mem.IdleTax work. Do the same rules apply to the Mem.IdleTax/balloon driver/.vswp file if you have a memory reservation?
Can the Mem.IdleTax reclaim unused/idle memory that is being reserved? I know you guys will say no…out of instinct, but is this documented?
In a perfect world, I’d like to think the Mem.IdleTax could help us prevent a VM from hogging memory that’s been reserved…but now, just sitting idle…going to waste. It’s wishful thinking, but it would be nice to see official VMware documentation to say one way or the other.
Thanks guys,
Chris
Duncan, do you know if it is possible to identify ballooning caused by mem tax idle from ballooning caused by memory pressure (without looking at the consumed stats) ?
“By default the balloon driver is used to reclaim idle memory.”
Nope, this is a very common mistake.
In fact, the memory state determines what is used.
State is defined as:
Amount of free machine memory on the host. VMkernel has four free-memory thresholds that affect memory reclamation:
◦0 (high) Free memory >= 6% of machine memory minus Service Console memory.
◦1 (soft) 4%
◦2 (hard) 2%
◦3 (low) 1%
0 (high) and 1 (soft): Swapping is favored over ballooning.
2 (hard) and 3 (low): Ballooning is favored over swapping
TPS is used always
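As a sketch, the thresholds listed in this comment could be turned into a small classifier. The function name is hypothetical, and it assumes that each percentage is the lower bound of its state and that anything under 2% counts as "low". Which reclamation technique each state triggers is disputed in the reply below, so the sketch deliberately leaves that out.

```python
def free_memory_state(free_mb: float, machine_mb: float,
                      service_console_mb: float) -> str:
    """Classify host free memory using the percentages quoted above.

    Assumption: free memory is measured against machine memory minus
    Service Console memory, and each threshold is a lower bound.
    """
    managed_mb = machine_mb - service_console_mb
    if managed_mb <= 0:
        raise ValueError("machine memory must exceed Service Console memory")

    free_pct = 100.0 * free_mb / managed_mb
    if free_pct >= 6:
        return "high"
    if free_pct >= 4:
        return "soft"
    if free_pct >= 2:
        return "hard"
    return "low"
```

For a host with 10800 MB of machine memory and 800 MB for the Service Console, 500 MB free works out to 5% and would be classified as "soft" under these assumptions.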
Seva, thanks for your comment but you are incorrect. This article deals with idle memory, not with the memory states. I know these are related but idle memory (either memory not recently touched or “deleted” pages) can only be reclaimed by the balloon driver mechanism as VMkernel swap is random:
“A randomized page replacement policy is used to prevent the types of pathological interference with native guest OS memory management algorithms”
Besides that, your list is in the incorrect order. You might want to read up on this. I can recommend the following documents:
http://waldspurger.org/carl/papers/esx-mem-osdi02.pdf
http://www.vmware.com/files/pdf/perf-vsphere-memory_management.pdf
Duncan, I’ve been reading up on memory allocation and I still feel something is missing on the deallocation subject.
Since deallocation can only happen with TPS, Ballooning and Host swapping (according to VMware) how can my VM memory usage drop from 75% usage to 25% without ballooning (stats min/avg/max 0/0/0)?
I do not think TPS can free up that much memory.
My gut feeling(s):
Either memory is marked idle and will be first to swap/balloon when contention happens making the guest used memory status an illusion.
Or there is a process swapping out highly idle memory pages to the vswp file.
Hi Duncan,
I just wanted to clear something up here. Firstly in the article you state:
” ESX has two idle memory reclamation mechanisms:
Balloon driver
vSwap”
But then in the comments you state:
“I know these are related but idle memory(…) can only be reclaimed by the balloon driver mechanism as VMkernel swap is random”
Everything I have read states it uses both, but I have always doubted that it resorted to host swapping. This has always seemed a bit heavy handed for “idle” memory. Can you please clarify?
Thanks,
Forbes
Yeah, I should have phrased it differently. It should be “memory reclamation” and not necessarily “idle memory reclamation”.