(Inter-VM) TPS Disabled by default, what should you do?

We’ve all probably seen the announcement around inter-VM(!!) TPS (transparent page sharing) being disabled by default in future releases of vSphere, and the recommendation to disable it in current versions. The reason for this is a published research paper that demonstrates how it is possible to gain access to data under certain highly controlled conditions. As the KB article describes:

Published academic papers have demonstrated that by forcing a flush and reload of cache memory, it is possible to measure memory timings to determine an AES encryption key in use on another virtual machine running on the same physical processor of the host server if Transparent Page Sharing is enabled. This technique works only in a highly controlled environment using a non-standard configuration.

Many people blogged about the potential impact on your environment or designs. Typically in the past people would take 20 to 30% memory sharing into account when sizing their environment. With inter-VM TPS disabled this of course goes out of the window. Frank described this nicely in this post. However, as Frank also described and I mentioned in previous articles, when large pages are being used (which is usually the case) TPS is not used by default, only under memory pressure…
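
To make that sizing impact a bit more tangible, here is a quick back-of-the-envelope sketch. The VM count, VM size and the 25% sharing ratio below are purely illustrative assumptions, so plug in your own numbers:

```python
# Back-of-the-envelope memory sizing with and without inter-VM TPS.
# All numbers are illustrative assumptions, not measurements.
vm_count = 100        # number of VMs in the design (assumption)
vm_memory_gb = 8      # configured memory per VM in GB (assumption)
sharing_ratio = 0.25  # 25% sharing, within the 20-30% range mentioned above

configured_gb = vm_count * vm_memory_gb
with_tps_gb = configured_gb * (1 - sharing_ratio)

print(f"Configured memory:           {configured_gb} GB")
print(f"Estimated need with TPS:     {with_tps_gb:.0f} GB")
print(f"Estimated need without TPS:  {configured_gb} GB")
print(f"Extra capacity to provision: {configured_gb - with_tps_gb:.0f} GB")
```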

The “under pressure” part is important if you ask me, as TPS is the first memory reclamation technique used when a host is under pressure. If TPS cannot sufficiently reduce the memory pressure then ballooning is leveraged, followed by compression and ultimately swapping. Personally I would like to avoid swapping at all costs and preferably compression as well. Ballooning typically doesn’t result in a huge performance degradation so it could be acceptable, but TPS is something I prefer as it just breaks up large pages into small pages and collapses those when possible. Performance loss is hardly measurable in that case. Of course TPS would be way more effective when pages can be shared between VMs rather than just within a VM.
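
By the way, if you want to see which of these reclamation techniques are actually active in your environment today, the quickStats vCenter exposes per VM give a rough indication. Below is a minimal pyVmomi sketch; the vCenter hostname and credentials are placeholders and certificate verification is disabled purely to keep the example short:

```python
# Minimal pyVmomi sketch: list shared, ballooned, compressed and swapped memory
# per powered-on VM, to see which memory reclamation techniques are in play.
# Hostname and credentials are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab only, skips certificate checks
si = SmartConnect(host="vcenter.local", user="administrator@vsphere.local",
                  pwd="VMware1!", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    for vm in view.view:
        if vm.runtime.powerState != vim.VirtualMachine.PowerState.poweredOn:
            continue
        qs = vm.summary.quickStats
        print(f"{vm.name}: shared={qs.sharedMemory} MB, "
              f"ballooned={qs.balloonedMemory} MB, "
              f"compressed={qs.compressedMemory} KB, "
              f"swapped={qs.swappedMemory} MB")
finally:
    Disconnect(si)
```

If ballooned, compressed or swapped memory is consistently non-zero, your hosts are already well past the TPS stage of reclamation.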

Anyway, the question remains: should you have (inter-VM) TPS disabled or not? When you assess the risk you need to ask yourself first who has access to your virtual machines, as the technique requires you to log in to a virtual machine. Before we look at the scenarios, note that I mentioned “inter-VM” a couple of times now; TPS is not completely disabled in future versions. It will be disabled for inter-VM sharing by default, but can be enabled. More on that can be found in this article on the vSphere blog.

Let’s explore three scenarios:

  1. Server virtualization (private)
  2. Public cloud
  3. Virtual Desktops

In the case of server virtualization, in most scenarios I would expect that only the system administrators and/or application owners have access to the virtual machines. The question then is: why would they go to these lengths when they have access to the virtual machines anyway? So in the scenario where server virtualization is your use case, and access to your virtual machines is restricted to a limited number of people, I would definitely consider re-enabling inter-VM TPS.
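
For those who do decide to turn it back on: the knob is the Mem.ShareForceSalting advanced host setting (0 means inter-VM sharing is allowed again, 2 is the new salted default). Below is a minimal pyVmomi sketch of what flipping it on a single host could look like; the function and host names are made up, “si” is assumed to be a connected ServiceInstance as in the earlier example, and as always, test this in a lab first:

```python
# Minimal pyVmomi sketch: check and change the Mem.ShareForceSalting advanced
# setting on one host. 0 = inter-VM page sharing enabled (pre-change behaviour),
# 2 = the new default (pages only shared within a VM unless salts match).
# Note: the setting only exists on ESXi builds that include the salting change.
# "si" is assumed to be a connected ServiceInstance; the host name is a placeholder.
from pyVmomi import vim

def set_inter_vm_tps(si, host_name, enable=True):
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    host = next(h for h in view.view if h.name == host_name)

    opt_mgr = host.configManager.advancedOption
    current = opt_mgr.QueryOptions("Mem.ShareForceSalting")[0].value
    print(f"{host.name}: Mem.ShareForceSalting is currently {current}")

    desired = 0 if enable else 2
    if current != desired:
        opt_mgr.UpdateOptions(changedValue=[
            vim.option.OptionValue(key="Mem.ShareForceSalting", value=desired)])
        print(f"{host.name}: Mem.ShareForceSalting set to {desired}")

# Example (placeholder host name):
# set_inter_vm_tps(si, "esxi01.lab.local", enable=True)
```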

In a public cloud environment this is of course different. You can imagine that a hacker could buy a virtual machine and try to retrieve the AES encryption key. What he (the hacker) could then do with it is still the question. Hopefully the cloud provider ensures that the tenants are isolated from each other from a security/networking point of view. If that is the case there shouldn’t be much they could do with it. Then again, it could be just one of the many steps they have to take to break into a system, so I would probably not want to take the risk, even though the risk is low. This is one of the scenarios where I would leave inter-VM TPS disabled.

The third and last scenario is virtual desktops. In the case of a virtual desktop many different users have access to the virtual machines… The question though is whether you are running or accessing any applications that leverage AES encryption. I cannot answer that for you, so I will leave that up in the air… you will need to assess that risk yourself.

I guess the answer to whether you should or should not disable (inter-VM) TPS is as always: it depends. I understand why inter-VM TPS was disabled, but if the risk is low I would definitely consider enabling it.

What is coming for vSphere and VSAN? VMworld reveals…

I’ve been prepping a presentation for upcoming VMUGs, but wanted to also share this with my readers. The session is all about vSphere futures: what is coming soon? Before anyone says I am breaking NDA: I’ve harvested all of this info from public VMworld sessions, except for the VSAN details, which were announced to the press at VMworld EMEA. Let’s start with Virtual SAN…

The Virtual SAN details were posted in this Computer Weekly article, and by the looks of it they interviewed VMware’s CEO Pat Gelsinger and Alberto Farronato from the VSAN product team. So what is coming soon?

  • All Flash Virtual SAN support
    Considering the price of MLC flash has dropped to roughly the same price per GB as SAS HDDs, I think this is a great new feature to have. Being able to build all-flash configurations at the price point of a regular configuration, and with probably many supported configurations, is a huge advantage of VSAN. I would expect VSAN to support various types of flash as the “capacity” layer, so this is an architect’s dream… designing your own all-flash storage system!
  • Virsto integration
    I played with Virsto when it was just released and was impressed by the performance and the scalability. Functionality that was part of Virsto, such as snapshots and clones, has been built into VSAN and will bring VSAN to the next level!
  • JBOD support
    Something many have requested, primarily to be able to use VSAN in blade environments… Well, with the JBOD support announced this will be a lot easier. I don’t know the exact details, but just the “JBOD” part got me excited.
  • 64 host VSAN cluster support
    VSAN doesn’t scale? Here you go.

That is a nice list by itself, and I am sure there is plenty more for VSAN. At VMworld, for instance, Wade Holmes also spoke about support for disk controller based encryption. Cool right?! So what about vSphere? Considering that even the version number was dropped during the keynote, hinting at a major release, you would expect some big functionality to be introduced. Once again, all the stuff below is harvested from various public VMworld sessions:

  • VMFork aka Project Fargo – discussed here…
  • Increased scale!
    • 64 host HA/DRS cluster, I know a handful of customers who asked for 64 host clusters, so here it is guys… or better said: soon you will have it!
  • SMP vCPU FT – up to 4 vCPU support
    • I like FT from an innovation point of view, but it isn’t a feature I would personally use too much as I feel “fault tolerance” from an app perspective needs to be solved by the app. Now, I do realize that there are MANY legacy applications out there, and if you have a scale-up application which needs to be highly available then SMP FT is very useful. Do note that with this release the architecture of FT has changed. For instance you used to share the same “VMDK” for both primary and secondary, but that is no longer the case.
  • vMotion across anything
    • vMotion across vCenter instances
    • vMotion across Distributed Switch
    • vMotion across very large distances, with support for up to 100ms latency
    • vMotion to vCloud Air datacenter
  • Introduction of Virtual Datacenter concept in vCenter
    • Enhances the “policy driven” experience within vCenter. A Virtual Datacenter aggregates compute clusters, storage clusters, networks, and policies!
  • Content Library
    • Content Library provides storage and versioning of files including VM templates, ISOs, and OVFs
    • Includes powerful publish and subscribe features to replicate content
    • Backed by vSphere Datastores or NFS
  • Web Client performance / enhancement
    • Recent tasks pane moves to the bottom instead of the right
    • Performance vastly improved
    • Menus flattened
  • DRS placement “network aware”
    • Hosts with high network contention can still show low CPU and memory usage; DRS will take network contention into account when looking for VM placements
    • Provide network bandwidth reservation for VMs and migrate VMs in response to reservation violations!
  • vSphere HA component protection
    • Helps when hitting “all paths down” situations by allowing HA to take action on impacted virtual machines
  • Virtual Volumes, bringing the VSAN “policy goodness” to traditional storage systems

Of course there is more, but these are the ones that were discussed at VMworld… for the remainder you will have to wait until the next version of vSphere is released, or I believe you can still sign up for the beta!

VMworld attendees: Give Back!

Always wanted to give back to the community but can’t find the time? How about letting VMware help you with that? If you are attending VMworld Europe you can simply do this by going to the hangspace and throwing a paper airplane as far as you can. The VMware Foundation did the same thing in the US and it was a big success; I am hoping we can repeat that! The amount that will be donated will depend on where your airplane lands. It could be as little as 15 dollars, but also easily 1000. I just went to the hangspace and threw a paper airplane with my friend Jessica from the VMware Foundation and former colleague, now at EMC, Paul Manning.

Take the time, go to the hangspace and give back… it literally only takes 2 minutes!

EVO:RAIL now also available through HP and HDS!

It is going to be a really short blog post, but as many folks have asked about this I figured it was worth a quick article. In the VMworld Europe 2014 keynote today, Pat Gelsinger announced that both HP and Hitachi Data Systems have joined the VMware EVO:RAIL program. If you ask me this is great news for customers throughout the world, as it provides more options for procuring an EVO:RAIL hyper-converged infrastructure appliance through your preferred server vendor!

The family is growing: Dell, Fujitsu, Hitachi Data Systems, HP, Inspur, NetOne Systems and SuperMicro… who would you like to see next?

Underlying Infrastructure for your pets and cattle

Last week on Twitter Maish asked a question that got me thinking. Actually, I have been thinking about this for a while now. The question deals with how you design your infrastructure for the various types of workloads, whether they fall in the “pet” category or the “cattle” category. (If you are not familiar with the terms pets/cattle, read this article by Massimo.)

I asked Maish what it actually means for your infrastructure, and I gave it some more thought over the last week. Cattle is the type of application architecture that handles failures by providing a distributed solution: it typically scales out instead of up, and the VMs are disposable as they usually won’t hold state. With pets this is different: they typically scale up, resiliency is often provided by either a third-party clustering mechanism or the infrastructure underneath, they in many cases contain state, and recoverability is key. As you can imagine, both types of workloads have different requirements of the infrastructure. Going back to Maish’s question, I guess the real question is whether you can afford what that means for the underlying infrastructure. What do I mean by that?

If you look at the requirements of both architectures, you could say that “pets” will typically demand more from the underlying infrastructure when it comes to resiliency/recoverability. Cattle has fewer demands from that perspective, but flexibility/agility is more important. You can imagine implementing two different infrastructure architectures for these specific workloads, but does this make sense? If you are Netflix, Google, YouTube etc. then it may make sense to do this due to the scale they operate at and the fact that IT is their core business. In those cases “cattle” is what drives the business, with the “pets” being mostly back-end systems. Reality is though that for the majority this is not the case. Your environment will be a hybrid, and more than likely “pets” will have the upper hand, as that is simply what the state of the world is today.

That does not mean they cannot co-exist. That is what I believe is the true strength of virtualization: it allows you to run many different types of workloads on the same infrastructure. Whether that is your Exchange environment or your in-house developed scale-out web application that serves hundreds of thousands of customers does not make a difference to your virtualization platform. From an operational perspective the big benefit here is that you will not have to maintain different run books to manage your workloads. From an ops perspective they will look the same on the outside, although they may differ on the inside. What may change though are the services required for those systems, but with the rich ecosystem available for virtualization platforms these days that should not be a problem. Need extra security/micro-segmentation? VMware NSX can provide the security isolation needed to run these applications smoothly. Sub-millisecond latency requirements? Plenty of storage/caching solutions out there that can deliver this!

Will the application architecture shift that is happening right now impact your underlying infrastructure? We have made huge steps in operational efficiency over the last five years, and with SDDC we are about to take the next big step. Although I do believe that the application architecture shift will result in infrastructure changes, let’s not make the same mistakes we made in the past by creating infrastructure silos per workload. I strongly believe that repeatability, consistency, reliability and predictability are key, and this starts with a solid, scalable and trusted foundation: the infrastructure.