I was reading an article by one of my Tech Marketing colleagues, Kyle Gleed, and coincidentally Gabe published an article on the same topic, to which Frank replied, and just now Forbes Guthrie did as well… the topic being Large Pages. I have written about this topic many times in the past, and Kyle, Gabe, Forbes and Frank have all mentioned the possible impact of large pages, so I won't go into detail.
There appear to be a lot of concerns about the benefits, and about the possible downside of leaving large pages enabled when it comes to monitoring memory usage. There are a couple of things I want to discuss, as I have the feeling that not everyone fully understands the concept.
First of all, what are Large and Small Pages? Small Pages are the regular 4KB memory pages and Large Pages are 2MB pages. I guess the difference is pretty obvious. Now, as Frank explained, using Large Pages changes the number of TLB (translation lookaside buffer) entries needed: a VM provisioned with 2GB would need 1,024 TLB entries to map its memory with Large Pages and 524,288 with Small Pages. Now you might wonder what this has got to do with your VM. Well, that's easy… if you have a CPU that has EPT (Intel) or RVI (AMD) capabilities, the VMkernel will try to back ALL pages with Large Pages.
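The entry counts above are simple arithmetic: divide the VM's memory by the page size. The sketch below just reproduces that calculation; keep in mind that a real TLB holds far fewer entries than either number, so these are the translations that *could* be cached, not what the hardware actually stores.

```python
# Number of page mappings needed to cover a VM's memory with each page size.
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

SMALL_PAGE = 4 * KIB   # regular x86 page
LARGE_PAGE = 2 * MIB   # x86 large page

def pages_needed(vm_memory_bytes: int, page_size: int) -> int:
    """Mappings required to cover vm_memory_bytes, rounding up to whole pages."""
    return -(-vm_memory_bytes // page_size)  # ceiling division

vm_memory = 2 * GIB
small = pages_needed(vm_memory, SMALL_PAGE)
large = pages_needed(vm_memory, LARGE_PAGE)
print(f"4 KiB pages: {small:,}")   # 524,288
print(f"2 MiB pages: {large:,}")   # 1,024
print(f"ratio: {small // large}")  # 512 small pages per large page
```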
Please read that last sentence again and spot what I tried to emphasize: all pages. In other words, where Gabe was asking "does your application really benefit from it?", I would like to state that that is irrelevant. We are not just talking about your application, but about your VM as a whole. By backing all pages with Large Pages the chance of TLB misses decreases; for those who have never looked into what the TLB does, I would suggest reading the excellent Wikipedia page on it. Let me give you the conclusion though: TLB misses increase memory access latency.
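To make the TLB-miss argument concrete, here is a toy simulation, not a model of any real CPU: it assumes a fully associative LRU TLB with a hypothetical 64 entries and a workload that touches a hypothetical 64 MiB region at random. Real TLBs are set-associative and often keep separate entries for large pages, and with EPT/RVI a miss is more expensive because it walks both guest and host page tables; the sketch only shows how larger pages let the same number of entries cover more memory.

```python
import random
from collections import OrderedDict

def count_tlb_misses(addresses, page_size, tlb_entries):
    """Simulate a fully associative LRU TLB and return the number of misses."""
    tlb = OrderedDict()  # virtual page number -> present, kept in LRU order
    misses = 0
    for addr in addresses:
        vpn = addr // page_size
        if vpn in tlb:
            tlb.move_to_end(vpn)         # hit: mark as most recently used
        else:
            misses += 1                  # miss: would require a page table walk
            tlb[vpn] = True
            if len(tlb) > tlb_entries:
                tlb.popitem(last=False)  # evict the least recently used entry
    return misses

rng = random.Random(42)
working_set = 64 * 1024 * 1024  # hypothetical 64 MiB touched by the workload
addresses = [rng.randrange(working_set) for _ in range(100_000)]

small_misses = count_tlb_misses(addresses, 4 * 1024, tlb_entries=64)
large_misses = count_tlb_misses(addresses, 2 * 1024 * 1024, tlb_entries=64)
print(f"4 KiB pages: {small_misses} misses")
print(f"2 MiB pages: {large_misses} misses")
```

With 2 MiB pages the whole 64 MiB region fits in 32 entries, so only the first touch of each page misses; with 4 KiB pages the same region spans 16,384 pages and nearly every access misses.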
That's not all. The other thing I wanted to share is the "impact" of breaking up large pages into small pages when there is memory pressure. Transparent page sharing works on small pages, which is why large pages are broken up first. As Frank so elegantly stated, "the VMkernel will resort to share-before-swap and compress-before-swap". There is no nicer way of expressing uber-sweetness, I guess. One thing Frank did not mention, though, is that when the VMkernel detects that memory pressure has been relieved, it will start defragmenting small pages and forming large pages again, so that the workload can once more benefit from the performance increase they bring.
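Why break up large pages before sharing? Content-based sharing only helps when two pages are identical, and a whole 2 MiB page rarely is, even when most of the 4 KiB pages inside it are (zeroed pages, for example). The sketch below illustrates that with two hypothetical 2 MiB regions, one per VM, that are mostly zeroed. It is only an illustration of hash-based duplicate detection, not the VMkernel's algorithm; a real implementation would also compare page contents byte for byte before actually sharing them.

```python
import hashlib

SMALL_PAGE = 4096
SMALL_PER_LARGE = 512  # a 2 MiB large page covers 512 small pages

def make_region(vm_id: int) -> bytes:
    """Hypothetical 2 MiB guest region: 511 zeroed small pages plus one VM-specific page."""
    zero_page = bytes(SMALL_PAGE)
    unique_page = vm_id.to_bytes(8, "little") * (SMALL_PAGE // 8)
    return zero_page * (SMALL_PER_LARGE - 1) + unique_page

def count_duplicates(chunks) -> int:
    """Count chunks whose content was already seen, i.e. candidates for sharing."""
    seen = set()
    duplicates = 0
    for chunk in chunks:
        digest = hashlib.sha1(chunk).digest()
        if digest in seen:
            duplicates += 1
        else:
            seen.add(digest)
    return duplicates

def split_small(region: bytes):
    """Break a large region into its 4 KiB small pages."""
    return [region[off:off + SMALL_PAGE] for off in range(0, len(region), SMALL_PAGE)]

regions = [make_region(1), make_region(2)]  # one large page from each of two VMs
large_shareable = count_duplicates(regions)
small_shareable = count_duplicates(p for r in regions for p in split_small(r))
print(f"shareable as 2 MiB pages: {large_shareable}")  # 0
print(f"shareable as 4 KiB pages: {small_shareable}")  # 1021
```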
Now the question remains what kind of performance benefit we can expect, as some appear to be under the impression that there is no benefit when the application doesn't use large pages. I have personally conducted several tests with a XenApp workload and measured a 15% performance increase, and on top of that fewer peaks and lower response times. This isn't a guarantee that you will see the same behavior or results, but I can assure you it is beneficial for your workload regardless of what types of pages the guest uses. Small on Large or Large on Large, all will benefit and so will you…
I guess the conclusion is: don't worry too much, as vSphere will sort it out for you!
NiTRo says
Duncan, from a memory consolidation perspective, have you ever tried disabling the Mem.AllocGuestLargePage setting on such a configuration?
Duncan says
No, I haven't, but I have customers who do that as they want 100% visibility and to max out their boxes.
orzdude says
Something I'm a bit concerned about is how fast the VMkernel is going to employ share-before-swap. We have clusters with few hosts and many VMs, so if one host fails and HA kicks in, or we have to evacuate a host, the others will have to provide a lot more memory resources in a relatively short amount of time.
My guess is that there won't be enough time to reap the benefits of TPS and the VMkernel will need to swap quite a bit of memory, at least in the beginning.
(On a related note, the instant zero-page consolidation Duncan mentioned some time ago should ease some of the pressure for HA restarts.)
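The concern is easy to quantify with back-of-the-envelope arithmetic. The sketch below assumes a hypothetical cluster of 4 hosts with 32 GB each and 26 GB consumed per host, and assumes the failed host's VMs are spread evenly over the survivors; it ignores VM overhead, reservations and admission control. Whatever exceeds physical memory has to come from sharing, compression, ballooning or swapping.

```python
def per_host_demand_gb(hosts: int, consumed_per_host_gb: float, failed: int = 1) -> float:
    """Memory each surviving host must back if the failed hosts' VMs are spread evenly."""
    survivors = hosts - failed
    if survivors <= 0:
        raise ValueError("no surviving hosts")
    return hosts * consumed_per_host_gb / survivors

# Hypothetical cluster: 4 hosts with 32 GB each, 26 GB consumed per host.
physical_gb = 32
demand = per_host_demand_gb(hosts=4, consumed_per_host_gb=26)
shortfall = max(0.0, demand - physical_gb)
print(f"{demand:.1f} GB needed per surviving host, {shortfall:.1f} GB short")
# 34.7 GB needed per surviving host, 2.7 GB short
```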
I had disabled Mem.AllocGuestLargePage on production hosts before the upgrade to 4.1 and can't say I noticed any obvious performance decrease, nor any increase now that we have it enabled since 4.1. I did not conduct benchmarks, though, and those hosts serve as general consolidation for a lot of different, really light applications (but from identical OS templates).
I still have Mem.AllocGuestLargePage=0 in place on our test systems, though; with about 20 (pretty different) VMs running, it looks like this:
PMEM /MB: 32765 total: 1146 vmk, 16588 other, 15030 free
VMKMEM/MB: 32006 managed: 1920 minfree, 6229 rsvd, 25777 ursvd, high state
PSHARE/MB: 10603 shared, 1601 common: 9002 saving
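For readers who want to track these numbers over time rather than eyeball esxtop, the PSHARE line can be parsed directly. Per the esxtop memory panel fields, "shared" is the guest memory backed by shared pages, "common" is the machine memory backing them, and "saving" should therefore equal shared minus common, which the figures above confirm (10603 - 1601 = 9002). The parser below is a minimal sketch that only handles the line format shown here; the percentage uses the 32765 MB total from the PMEM line.

```python
import re

def parse_pshare(line: str) -> dict:
    """Parse an esxtop 'PSHARE/MB' line like the one above into integers (MB)."""
    m = re.search(r"(\d+)\s+shared,\s+(\d+)\s+common:\s+(\d+)\s+saving", line)
    if m is None:
        raise ValueError(f"not a PSHARE line: {line!r}")
    shared, common, saving = map(int, m.groups())
    return {"shared": shared, "common": common, "saving": saving}

stats = parse_pshare("PSHARE/MB: 10603 shared, 1601 common: 9002 saving")
assert stats["shared"] - stats["common"] == stats["saving"]  # 10603 - 1601 = 9002

pmem_total_mb = 32765  # from the PMEM line above
print(f"TPS saves {stats['saving']} MB, {100 * stats['saving'] / pmem_total_mb:.1f}% of host memory")
```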
NiTRo says
Thanks for your feedback, Orzdude; it's really good to know that TPS still has a second life 🙂
Sean D says
My biggest problem with the large page + TPS behavior is that I have no idea how much memory I really have. I know there is information in esxtop about how much memory you might get back if TPS kicks in. However, I don't do ESX sizing based on esxtop; I do it based on long-term trending, using information I pull out of ESX with the Perl SDK. AFAIK, that information is not available there.
In addition, there's the problem of inconsistent performance. Let's say I plan on getting the most out of my memory and want to count on TPS. When VMs start early in the cluster's lifetime, they'll use large pages. As the cluster fills up, they'll move to small pages and potentially see a performance change, which could cause problems.
The other alternative is to give up on TPS altogether. To do that I would need to increase my ESX footprint by a third, and I don't think my management will go for that.
As such, I’m stuck with disabling large pages until there’s a better solution for these problems.
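The "a third" figure follows from a simple relation, assuming memory is the limiting resource: if TPS saves a fraction s of the memory the VMs would otherwise consume, running without TPS needs 1 / (1 - s) times the capacity, so a saving of about 25% means a third more hosts. The sketch below encodes that assumption; the 25% and the 9-host cluster are hypothetical values, not figures from the comment.

```python
import math

def capacity_factor(tps_saving: float) -> float:
    """How much more physical memory is needed without TPS, given the fraction it saves."""
    if not 0 <= tps_saving < 1:
        raise ValueError("tps_saving must be in [0, 1)")
    return 1 / (1 - tps_saving)

def hosts_needed_without_tps(current_hosts: int, tps_saving: float) -> int:
    """Hosts required if memory is the bottleneck and the TPS savings disappear."""
    if not 0 <= tps_saving < 1:
        raise ValueError("tps_saving must be in [0, 1)")
    return math.ceil(current_hosts / (1 - tps_saving))

print(capacity_factor(0.25))              # about 1.33: a third more memory
print(hosts_needed_without_tps(9, 0.25))  # 12 hosts instead of 9
```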
Duncan says
I understand, and I will pass that feedback on to the engineers to see if we can do anything about it.
Afidel says
The only thing I rely on TPS for is non-production, where I overcommit quite a bit, and as a buffer against a multi-host failure in production. For me it's politically easier to give the VM owners "dedicated" resources that perform consistently. I'm already seeing 85+% cost savings vs. dedicated hardware, so why rock the boat to gain a few percent more and possibly make my life harder from a troubleshooting perspective?
Tore says
Afidel, that's actually a good point. It depends on the environment, of course: if you have 1000 hosts, even a 3% increase in performance will be a big deal, but if you have 10 hosts those few percent may not mean so much.
It seems that for many the problem is detecting performance degradation caused by large pages being split into small ones. Yes, you can monitor the latency and performance of your application(s). But you may have many different applications, which in turn depend on many other shared resources (e.g. the SAN) that may affect things as well. When you don't have dedicated resources, you don't know what to expect. And many don't know what values are acceptable for their application, but they sure do know what isn't (e.g. it's slow, laggy, …).
Duncan says
I guess another way to look at it is that you establish your SLA based on what TPS brings, and let your customer benefit from what Large Pages bring.
That said, I hardly know anyone who has workload response times in their SLAs; that is a very limited subset. So it makes me wonder why people are scared of that "performance degradation" when in essence they just go back to the state they were in before.
But maybe I am just looking at it too simplistically.