
Yellow Bricks

by Duncan Epping


tps

Storage capacity for swap files and TPS disabled

Duncan Epping · Dec 8, 2016 ·

A while ago (2014) I wrote an article about TPS being disabled by default in future releases. (Read KB 2080735 and 2097593 for more info.) I described why VMware made this change from a security perspective and what the impact could be. Even today, two years later, I still get questions about this, for instance about the impact on swap files. With vSAN you have the ability to thin provision swap files; with TPS disabled, is this something that introduces a risk?

Let's break it down. First of all, what is the risk of having TPS enabled, and where does TPS come into play?

With large pages enabled by default, most customers aren't actually using TPS to the extent they think they are. Unless you are using old CPUs without EPT or RVI capabilities, which I doubt at this point, TPS only kicks in under memory pressure (usually): large pages are then broken into small pages, and only those small pages can be TPS'ed. If you have severe memory pressure, that usually means you will go straight to ballooning or swapping.

Having said that, let's assume a hacker has managed to find his way into your virtual machine's guest operating system. Only when memory pages are collapsed, which as described above only happens under memory pressure, will the hacker be able to attack the system. Note that the VM/data he wants to attack will need to be located on the same host (actually, on the same NUMA node), and the memory pages/data he needs to breach the system will need to be collapsed. Many would argue that if a hacker gets that far, all the way into your VM and capable of exploiting this gap, you have far bigger problems. On top of that, what is the likelihood of pulling this off? Personally, and I know the VMware security team probably doesn't agree, I think it is unlikely. I understand why VMware changed the default, but there are a lot of "ifs" in play here.

Anyway, let's assume you have assessed the risk, feel you need to protect yourself against it, and keep the default setting (intra-VM TPS only). What is the impact on your swap file capacity allocation? As stated, when there is memory pressure, and ballooning cannot free up sufficient memory, and intra-VM TPS is not providing the needed memory space either, the next step after compressing memory pages is swapping! And in order for ESXi to swap memory to disk you will need disk capacity. If the swap file is thin provisioned (vSAN Sparse Swap), then before swapping out, those blocks on vSAN will need to be allocated. (This also applies to NFS, where files are thin provisioned by default, by the way.)

What does that mean in terms of design? Well, in your design you will need to ensure you allocate capacity on vSAN (or any other storage platform) for your swap files. This doesn't need to be 100% of the configured memory, but it should be more than the level of expected overcommitment. If you expect that during maintenance, for instance, or during an HA event, you will have memory overcommitment of about 25%, then you should ensure that at least 25% of the capacity needed for swap files is available. That avoids a VM being stunned because new blocks for the swap file cannot be allocated when you run out of vSAN datastore space.
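To make that sizing exercise concrete, here is a small back-of-the-envelope calculation in Python. The input numbers (total configured vRAM and the expected overcommitment during maintenance or an HA event) are hypothetical placeholders, not recommendations; the point is simply that the swap capacity you keep free should at least cover the memory you expect to overcommit.

# Back-of-the-envelope sizing for thin-provisioned (sparse) swap files.
# All inputs are hypothetical examples.
total_configured_vram_gb = 2048   # sum of vRAM configured across the VMs on the datastore
expected_overcommit_pct  = 25     # worst-case overcommitment, e.g. during an HA event

# With sparse swap you only need capacity for what actually gets swapped out,
# so reserve at least the overcommitted portion of the configured vRAM:
min_swap_capacity_gb = total_configured_vram_gb * expected_overcommit_pct / 100

print("Keep at least %.0f GB of datastore capacity free for swap files" % min_swap_capacity_gb)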

Let it be clear, I don't know many customers who run their storage systems at 95% capacity or more, but if you do, and you have thin swap files, and you are overcommitting memory with TPS disabled, you may want to rethink your strategy.

What happens at which vSphere memory state?

Duncan Epping · Mar 2, 2015 ·

I've received a bunch of questions from people about what happens at each vSphere memory state, after writing the article about breaking up large pages and introducing a new memory state in vSphere 6.0. Note that the below applies to vSphere 6.0 only; the "Clear" memory state does not exist before 6.0. Also note that there is an upper and lower boundary when transitioning between states, which means that you will not see actions triggered at the exact specified threshold, but slightly before or after passing it.

I created a simple table that shows what happens when. Note that minFree itself is not a fixed number but rather a sliding scale, and its value depends on the host memory configuration.

Memory state | Threshold       | Actions performed
High         | 400% of minFree | Break Large Pages when below threshold (wait for next TPS run)
Clear        | 100% of minFree | Break Large Pages and actively call TPS to collapse pages
Soft         | 64% of minFree  | TPS + Balloon
Hard         | 32% of minFree  | TPS + Compress + Swap
Low          | 16% of minFree  | Compress + Swap + Block
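For those who prefer absolute numbers over percentages, the small Python sketch below translates the thresholds in the table into GB of free host memory for a hypothetical minFree value (the transition boundaries mentioned above are ignored for simplicity).

# Hypothetical example: minFree is a sliding scale that depends on host memory size,
# 10 GB is simply used here as an illustration.
min_free_gb = 10.0

states = [               # (state, threshold as a fraction of minFree, actions)
    ("High",  4.00, "break large pages (wait for next TPS run)"),
    ("Clear", 1.00, "break large pages and actively call TPS"),
    ("Soft",  0.64, "TPS + balloon"),
    ("Hard",  0.32, "TPS + compress + swap"),
    ("Low",   0.16, "compress + swap + block"),
]

for state, fraction, actions in states:
    print("%-5s entered below %5.1f GB free: %s" % (state, fraction * min_free_gb, actions))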

First of all, note that when you enter the "Hard" state the balloon driver stops and "Swap" and "Compress" take over. This is something I never really realized, but it is important to know, as it means that when memory fills up fast you will see a short period of ballooning and then a jump to compressing and swapping immediately. You may ask yourself what this "block" action is. Well, this is the worst situation you can find yourself in, and it is the last resort, as my colleague Ishan described it:

The low state is similar to the hard state. In addition to compressing and swapping memory pages, ESX may block certain VMs from allocating memory in this state. It aggressively reclaims memory from VMs, until ESX moves into the hard state.

I hope this makes it clear which action is triggered at which state, and also why the “Clear” state was introduced and the “High” state changed. It provides more time for the other actions to do what they need to do: free up memory to avoid blocking VMs from allocating new memory pages.

vSphere 6.0: Breaking Large Pages…

Duncan Epping · Feb 17, 2015 ·

When talking about Transparent Page Sharing (TPS), one thing that comes up regularly is the use of Large Pages and how that impacts TPS. As most of you hopefully know, TPS does not collapse large pages. However, when there is memory pressure you will see that large pages are broken up into small pages, and those small pages can then be collapsed by TPS. ESXi does this to prevent other memory reclamation techniques, which have a much bigger impact on performance, from kicking in. You can imagine that fetching a memory page from a swap file on a spindle will take significantly longer than fetching a page from memory. (A nice white paper on the topic of memory reclamation can be found here…)

Something that I have personally run into a couple of times is the situation where memory pressure goes up so fast that the different states at which certain memory reclamation techniques kick in are crossed in a matter of seconds. This usually results in swapping to disk, even though large pages should have been broken up and collapsed where possible by TPS, or memory should have been compressed, or VMs ballooned. This is something that I've discussed with the respective developers, and they came up with a solution. In order to understand what was implemented, let's look at how memory states were defined in vSphere 5. There were four memory states, namely High (100% of minFree), Soft (64% of minFree), Hard (32% of minFree) and Low (16% of minFree). What does "% of minFree" mean? Well, if minFree is roughly 10GB for your configuration, then the Soft state, for instance, is reached when there is less than 64% of minFree available, which is 6.4GB of memory. For Hard this is 3.2GB, and so on. It should be noted that the change in state, and the action it triggers, does not happen exactly at the percentage mentioned; there is a lower and upper boundary where the transition happens, and this was done to avoid oscillation.

With vSphere 6.0 a fifth memory state is introduced, called Clear. Clear is 100% of minFree, and High has been redefined as 400% of minFree. When there is less than High (400% of minFree) but more than Clear (100% of minFree) available, ESXi will start pre-emptively breaking up large pages so that TPS (when enabled!) can collapse them at the next run. Let's take that 10GB of minFree as an example again: when you have between 40GB (High) and 10GB (Clear) of free memory available, large pages will be broken up. This should provide the leeway needed to safely collapse pages (TPS) and avoid the potential performance decrease which the other memory states could introduce. Very useful if you ask me, and I am very happy that this change in behaviour, which I requested a long time ago, has finally made it into the product.
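Putting that worked example into a few lines of Python makes the difference with vSphere 5.x visible: large pages are now already being broken up while there is still four times minFree of free memory, long before the Soft/Hard/Low thresholds are reached. The 10GB minFree value is, again, just an example.

# Hypothetical worked example: with minFree = 10 GB, where does large page breaking start?
min_free_gb = 10.0

v5_break_below_gb = 1.00 * min_free_gb   # vSphere 5.x: below the old High threshold (100% of minFree)
v6_break_below_gb = 4.00 * min_free_gb   # vSphere 6.0: below the new High threshold (400% of minFree)
v6_active_tps_gb  = 1.00 * min_free_gb   # vSphere 6.0: below Clear, TPS is actively called as well

print("5.x: large pages broken below %.0f GB free" % v5_break_below_gb)
print("6.0: large pages broken below %.0f GB free, TPS actively called below %.0f GB free"
      % (v6_break_below_gb, v6_active_tps_gb))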

Those of you who have been paying attention the last few months will know that by default inter-VM transparent page sharing is disabled. If you do want to reap the benefits of TPS and would like to leverage it in times of contention, then enabling it in 6.0 is pretty straightforward: just go to the advanced settings and set "Mem.ShareForceSalting" to 0. Do note that there are potential security risks when doing this, and I recommend reading the above article to get a better understanding of those risks.
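If you prefer to script the change rather than click through the advanced settings, the same option can be set with esxcli, as described in the KB articles mentioned earlier. Below is a minimal sketch, assuming it runs on the ESXi host itself (which ships with both Python and esxcli); treat it as an illustration rather than a hardened script, and see KB 2097593 for the meaning of the different salting values.

# Minimal sketch: set /Mem/ShareForceSalting to 0 on the local ESXi host,
# which re-enables inter-VM page sharing for all VMs. Assumes the ESXi shell.
import subprocess

subprocess.check_call(
    ["esxcli", "system", "settings", "advanced", "set",
     "-o", "/Mem/ShareForceSalting", "-i", "0"])

# Read the option back to confirm the change
print(subprocess.check_output(
    ["esxcli", "system", "settings", "advanced", "list",
     "-o", "/Mem/ShareForceSalting"]).decode())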

** update – originally I was told that High was 300% of minFree, looking at my hosts today it seems that High is actually 400% **

Re: Large Pages (@gabvirtualworld @frankdenneman @forbesguthrie)

Duncan Epping · Jan 26, 2011 ·

I was reading an article by one of my Tech Marketing colleagues, Kyle Gleed, and coincidentally Gabe published an article about the same topic, to which Frank replied, and just now Forbes Guthrie as well… the topic being Large Pages. I have written about this topic many times in the past, and Kyle, Gabe, Forbes and Frank all mentioned the possible impact of large pages, so I won't go into detail.

There appears to be a lot of concern around the benefits and the possible downsides of leaving it enabled in terms of monitoring memory usage. There are a couple of things I want to discuss, as I have the feeling that not everyone fully understands the concept.

First of all, what are Large and Small Pages? Small Pages are regular 4KB memory pages and Large Pages are 2MB pages. I guess the difference is pretty obvious. Now, as Frank explained, when using Large Pages there is a difference in TLB (translation lookaside buffer) entries; basically, a VM provisioned with 2GB would need roughly 1,000 TLB entries with Large Pages and 512,000 with Small Pages. Now you might wonder what this has got to do with your VM. Well, that's easy: if you have a CPU that has EPT (Intel) or RVI (AMD) capabilities, the VMkernel will try to back ALL pages with Large Pages.

Please read that last sentence again and spot what I tried to emphasize. All pages. So in other words, where Gabe was talking about "does your application really benefit from it", I would like to state that that is irrelevant. We are not merely talking about just your application, but about your VM as a whole. By backing all pages with Large Pages, the chances of TLB misses are decreased, and for those who have never looked into what the TLB does, I would suggest reading this excellent Wikipedia page. Let me give you the conclusion though: TLB misses will increase latency from a memory perspective.
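The page counts behind those TLB numbers follow directly from dividing the VM's memory by the page size; the small calculation below uses binary sizes (GiB/MiB/KiB), which is why it lands slightly above the rounded 1,000 and 512,000 figures quoted above.

# How many pages, and thus potential TLB entries, does a 2GB VM need?
vm_memory  = 2 * 1024**3        # 2 GiB in bytes
small_page = 4 * 1024           # 4 KB small page
large_page = 2 * 1024**2        # 2 MB large page

print(vm_memory // small_page)  # 524288 small pages (~512,000 as quoted above)
print(vm_memory // large_page)  # 1024 large pages (~1,000 as quoted above)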

That's not all; the other thing I wanted to share is the "impact" of breaking up the large pages into small pages when there is memory pressure. As Frank so elegantly stated, "the VMkernel will resort to share-before-swap and compress-before-swap". There is no nicer way of expressing uber-sweetness, I guess. One thing that Frank did not mention, though, is that if the VMkernel detects that memory pressure has been relieved, it will start defragmenting small pages and forming large pages again, so that the workload can once more benefit from the performance increase that these bring.

Now the question remains what kind of performance benefits we can expect, as some appear to be under the impression that when the application doesn't use large pages there is no benefit. I have personally conducted several tests with a XenApp workload and measured a 15% performance increase, and on top of that fewer peaks and lower response times. Now, this isn't a guarantee that you will see the same behavior or results, but I can assure you it is beneficial for your workload regardless of what types of pages are used. Small on Large or Large on Large, all will benefit and so will you…

I guess the conclusion is, don’t worry too much as vSphere will sort it out for you!

How cool is TPS?

Duncan Epping · Jan 10, 2011 ·

Frank and I have discussed this topic multiple times, and it was briefly mentioned in Frank's excellent series about over-sizing virtual machines: Zero Pages, TPS and the impact of a boot-storm. Pre-vSphere 4.1 we have seen it all happen: a host fails and multiple VMs need to be restarted. Temporary contention exists as it could take up to 60 minutes before TPS completes. Or, of course, when the memory pressure thresholds are reached the VMkernel requests TPS to scan memory and collapse pages if and where possible. However, this is usually already too late, resulting in ballooning or compressing (if you're lucky) and ultimately swapping. Whether it is an HA-initiated "boot-storm" or, for instance, your VDI users all powering up their desktops at the same time, the impact is the same.

Now, one of the other things I wanted to touch on is Large Pages, as this is the main argument our competitors use against TPS. The reason for this is that Large Pages are not TPS'ed, as I have discussed in this article and many articles before that one. I have even heard people say that TPS should be disabled, as most guest OSes being installed today are 64-bit and as such ESX(i) will back even Small Pages (guest OS) with Large Pages, and TPS will only add unnecessary overhead without any benefits… Well, I have a different opinion about that and will show you with a couple of examples why TPS should be enabled.

One of the major improvements in vSphere 4.0 is that it recognizes zeroed pages instantly and collapses them. I have dug around for detailed info but the best I could publicly find about it was in the esxtop bible and I quote:

A zero page is simply the memory page that is all zeros. If a zero guest physical page is detected by VMKernel page sharing module, this page will be backed by the same machine page on each NUMA node. Note that “ZERO” is included in “SHRD”.

(Please note that this metric was added in vSphere 4.1)

I wondered what that would look like in real life. I isolated one of my ESXi hosts (24GB of memory) in my lab and deployed 12 VMs with 3GB each, with Windows 2008 64-bit installed. I booted all of them up within literally seconds of each other, and as Windows 2008 zeroes out memory during boot, I knew what to expect:

I added a couple of arrows so that it is a bit more obvious what I am trying to show here. On the top left you can see that TPS saved 16476MB and used 15MB to store unique pages. As the VMs clearly show, most of those savings are from "ZERO" pages. Just subtract ZERO from SHRD (shared pages) and you will see what I mean. Pre-vSphere 4.0 this would have resulted in severe memory contention and, as a result, more than likely ballooning (if the balloon driver was already started, remember it is a "boot-storm") or swapping.
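To spell that subtraction out, here it is with hypothetical per-VM counter values (the real ones are in the screenshot above); everything esxtop reports under ZERO is included in SHRD, so the difference is what was shared beyond plain zero pages.

# Hypothetical esxtop counters for one VM, in MB; ZERO is a subset of SHRD.
shrd_mb = 3000        # total shared guest physical memory for the VM
zero_mb = 2900        # portion of SHRD that consists of zero pages

print("Shared beyond zero pages: %d MB" % (shrd_mb - zero_mb))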

Just to make sure I’m not rambling I disabled TPS (by setting Mem.ShareScanGHz to 0) and booted up those 12 VMs again. This is the result:

As shown at the top, the host's memory state is "hard" as a result of no page sharing happening at all, and even worse, as can be seen at the VM level, most VMs started swapping. We are talking about VMkernel swap here, not ballooning. I guess that clearly shows why TPS needs to be enabled and where and when you will benefit from it. Please note that you can also see "ZERO" pages in vCenter, as shown in the screenshot below.

One thing Frank and I discussed a while back, and I finally managed to figure out, is why after boot of a Windows VM the “ZERO” pages still go up and fluctuate so much. I did not know this but found the following explanation:

There are two threads that are specifically responsible for moving pages from one list to another. Firstly, the zero page thread runs at the lowest priority and is responsible for zeroing out free pages before moving them to the zeroed page list.

In other words, when an application/service, or even Windows itself, "deprecates" a page, it will be zeroed out by the "zero page thread" (aka the garbage collector) at some point. The Page Sharing module will pick this up and collapse the page instantly.

I guess there is only one thing left to say, how cool is TPS?!

