Disabling TPS hurting performance?

On the internal mailing list there was a discussion today around how disabling TPS (Transparent Page Sharing) could negatively impact performance. It is something I hadn’t thought about yet, but when you do think about it, it actually makes sense and is definitely something to keep in mind.

Most new servers today have some sort of NUMA architecture. As hopefully all of you know, TPS does not cross a NUMA node boundary by default. This basically means that pages will not be shared between NUMA nodes. Another thing, which Frank Denneman already described in his article, is that when memory pages are allocated remotely there is a latency penalty associated with it. (Did you know there is an “esxtop” metric, N%L, which shows the percentage of a VM’s memory that is local? Anything below 100 means some pages are remote.) Remote pages are accessed across an interconnect bus, which is slower than accessing so-called local memory.

Now you might ask: what is the link between NUMA, TPS and degraded performance? Think about it for a second… TPS decreases the number of physical pages needed. If TPS is disabled there is no sharing, more physical pages are needed per node, and the chances of going across NUMA nodes increase. As stated, remote access is slower, so this can impact performance. Funny how disabling a mechanism (TPS) which is often associated with “CPU overhead” can have a negative impact on memory latency.
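To make the reasoning above concrete, here is a minimal back-of-the-envelope sketch. It is a toy model, not how the ESX scheduler actually places memory: it assumes all VMs are homed on a single NUMA node, that any demand beyond that node's capacity spills to a remote node, and that TPS collapses a fixed fraction of pages. The function name, the 40 GB / 32 GB figures and the 25% sharing ratio are all hypothetical.

```python
def remote_fraction(vm_memory_gb, node_capacity_gb, sharing_ratio=0.0):
    """Fraction of backing pages that would have to live on a remote node.

    vm_memory_gb     -- memory touched by the VMs homed on one NUMA node
    node_capacity_gb -- physical memory local to that node
    sharing_ratio    -- assumed fraction of pages collapsed by TPS
                        (0.0 models TPS being disabled)
    """
    if not 0.0 <= sharing_ratio < 1.0:
        raise ValueError("sharing_ratio must be in [0, 1)")
    # Physical memory actually needed once shared pages are collapsed.
    needed_gb = vm_memory_gb * (1.0 - sharing_ratio)
    if needed_gb <= 0:
        return 0.0
    # Whatever does not fit locally is assumed to be allocated remotely.
    overflow_gb = max(0.0, needed_gb - node_capacity_gb)
    return overflow_gb / needed_gb


# Hypothetical example: 40 GB of VM memory homed on a 32 GB node.
tps_off = remote_fraction(40, 32)                     # no sharing
tps_on = remote_fraction(40, 32, sharing_ratio=0.25)  # 25% of pages shared

print(f"TPS disabled: {tps_off:.0%} of pages remote")
print(f"TPS enabled:  {tps_on:.0%} of pages remote")
```

In this made-up scenario, sharing shrinks the footprint from 40 GB to 30 GB, which fits inside the local node, so nothing has to be fetched across the interconnect. Real savings depend heavily on the workload.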


11 Responses to “Disabling TPS hurting performance?”

  1. daniel says:

    Good to know, but are there any occasions where disabling TPS on a modern architecture is actually recommended?

  2. Duncan says:

    Not that I am aware of. My recommendation is to always keep it turned on.

    Duncan

  3. AFidel says:

    Too bad we will all have this penalty in the next generation or two of CPUs due to huge MMU pages making TPS completely ineffective =(

  4. NiTRo says:

    I enabled TPS across NUMA nodes to get more memory savings on a cluster, without noticeable performance impact. How much hurt are we talking about, Duncan?

  5. I don’t know… There are so many performance variables that come into play, and so many contrary opinions and supposed facts floating around about best practices. I guess the question really is…how much would losing TPS negatively affect performance? Would the loss be more than the gain from forcing the utilization of the virtualized MMU as proposed by Mr. Drummonds here: http://vpivot.com/2010/04/20/a-performance-tip-for-esx-3-0-and-esx-3-5/ ? Or…am I missing something? Wouldn’t that force the use of large page tables which negatively impacts TPS?

    I just know that regardless of how nice it might be…I’ve yet to work in an environment where management could stomach the overcommitment required to really appreciate TPS in all of its glory. Perhaps it’s a holy grail for some…but I think there are plenty who’ve hardly scratched the surface of memory overcommitment…

  6. AFidel, you can force ESX to use small pages and compare performance to see if there is a negative impact.

    I have done it myself on my Nehalem X5570s. I noticed no negative impact, and afterwards I saw amazing numbers in one experiment. You can see an esxtop screenshot here: http://blog.vadmin.ru/2010/02/transparent-page-sharing.html
    I VMotioned enough production VMs (37, actually) to one host that the sum of VM memory became equal to physical memory, 64 GB. After one hour the memory usage graph stabilized at ~32 GB. I.e. TPS saved me half of the configured memory. CPU load was 10-15% max at that moment. The VMs I’m talking about ran various OSes: RHEL, Windows XP, 2003, 2008, 32 and 64 bit.

  7. Fred Peterson says:

    If TPS was actually saving you 50%, you have way more memory assigned to those VMs than is actually necessary, because the majority of savings with TPS comes from the zero pages.

  8. AFidel says:

    Anton,
    Yes, I typically see ~50% memory savings from TPS myself, which is why I’m slightly worried about the way the hardware MMUs are headed (1GB+ pages, which would all but make TPS worthless with hardware acceleration).

    p.s.
    64GB is a really odd configuration for a 5570, should be 72GB typically. 64GB gives you an unbalanced configuration which can have a significant negative performance impact (~20-30%). Throw in a couple more DIMMs and you should see a nice pickup in performance.

  9. Fred, I have about 20 VMs in a production cluster with 3-4 GB of RAM that should never go swapping. So it’s not a matter of me giving too much memory to these VMs. In the case of Hyper-V I would have to buy all this memory, and it’s not ‘cheap as garbage’.

    AFidel, thanks for the recommendation. If you’re referring to 3-channel memory then it would be a problem for me, because I have an 8*8 GB configuration -> 4 modules per CPU. So I would have to install either 48 GB or 96 GB (3 or 6 modules). And I have no performance problem currently; 37 VMs loaded my CPUs to only about 15% total.

  10. Scot Grabowski says:

    http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1004901 So this KB from VMware is actually telling you to do this? We have many multi-proc boxes now going into DEV on VMs, and this slow reboot is an issue during the testing phase. Are they basically telling us to steal from Peter to pay Paul? For now we are testing with 2 VMs, and we turned pshare off on those guests. I guess we will see how it impacts the RAM. It scares me to think of what may happen, as we are starting to create a high performance cluster for VMs with over 4 vCPUs: if we turn all the page sharing off on them, what is going to happen to the memory? Once they go prod and are not rebooting we can change it back, but with a very large environment like ours it can be an admin nightmare.

  11. invisible says:

    Which settings control page sizes for TPS?

    I’ve also noticed that most TPS savings come from Windows machines. For Linux, page sharing usually does not exceed 10-15%.

    P.S. >200 running VMs, cluster of 10 BL460c G6 with 2 X5570 and 96GB of RAM on each.
