vSphere 6.0: Breaking Large Pages…

When talking about Transparent Page Sharing (TPS) one thing that comes up regularly is the use of Large Pages and how that impacts TPS. As most of you hopefully know, TPS does not collapse large pages. However, when there is memory pressure you will see that large pages are broken up into small pages, and those small pages can then be collapsed by TPS. ESXi does this to prevent other memory reclamation techniques, which have a far bigger impact on performance, from kicking in. You can imagine that fetching a memory page from a swap file on a spindle will take significantly longer than fetching a page from memory. (A nice white paper on the topic of memory reclamation can be found here…)

Something that I have personally run into a couple of times is the situation where memory pressure goes up so fast that the different states at which certain memory reclamation techniques kick in are crossed in a matter of seconds. This usually results in swapping to disk, even though large pages should have been broken up and collapsed where possible by TPS, or memory should have been compressed, or VMs ballooned. This is something that I've discussed with the respective developers, and they came up with a solution. In order to understand what was implemented, let's look at how memory states were defined in vSphere 5. There were 4 memory states, namely High (100% of minFree), Soft (64% of minFree), Hard (32% of minFree) and Low (16% of minFree). What does that % of minFree mean? Well, if minFree is roughly 10GB for your configuration, then Soft, for instance, is reached when there is less than 64% of minFree available, which is 6.4GB of memory. For Hard this is 3.2GB, and so on. It should be noted that the change in state, and the action it triggers, does not happen exactly at the percentage mentioned; there is a lower and upper boundary where the transition happens, and this was done to avoid oscillation.

With vSphere 6.0 a fifth memory state is introduced, and this state is called Clear. Clear is 100% of minFree, and High has been redefined as 300% of minFree. When there is less than High (300% of minFree) but more than Clear (100% of minFree) available, ESXi will start pre-emptively breaking up large pages so that TPS (when enabled!) can collapse them at the next run. Let's take that 10GB of minFree as an example again: when you have between 30GB (High) and 10GB (Clear) of free memory available, large pages will be broken up. This should provide the leeway needed to safely collapse pages (TPS) and avoid the potential performance decrease which the other memory states could introduce. Very useful if you ask me, and I am very happy that this change in behaviour, which I requested a long time ago, has finally made it into the product.
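To make the arithmetic concrete, here is a minimal sketch in plain Python; the 10GB of minFree is simply the example value used above (not pulled from a real host), and the percentages are the state definitions discussed in this post:

  # Minimal sketch: compute the vSphere 6.0 memory state thresholds from minFree.
  # The 10GB minFree is the example value from this post; Clear is new in 6.0.
  min_free_gb = 10.0

  states = {
      "High":  3.00,   # 300% of minFree (redefined in vSphere 6.0)
      "Clear": 1.00,   # 100% of minFree (new in vSphere 6.0)
      "Soft":  0.64,   # 64% of minFree
      "Hard":  0.32,   # 32% of minFree
      "Low":   0.16,   # 16% of minFree
  }

  for state, fraction in states.items():
      print(f"{state:5s} threshold: {min_free_gb * fraction:5.1f} GB free")

  # Large pages are pre-emptively broken up while free memory sits between the
  # High (30GB) and Clear (10GB) thresholds in this example.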

Those of you who have been paying attention the last couple of months will know that inter-VM transparent page sharing is disabled by default. If you do want to reap the benefits of TPS and would like to leverage TPS in times of contention, then enabling it in 6.0 is pretty straightforward. Just go to the advanced settings and set "Mem.ShareForceSalting" to 0. Do note that there are potential security risks when doing this, and I recommend reading the above article to get a better understanding of those risks.
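If you prefer to script this instead of clicking through the advanced settings, the snippet below is a hedged sketch using pyVmomi. The host name and credentials are placeholders, it assumes a direct connection to a single ESXi host, and you should verify the option and value against your own environment before applying it:

  # Hedged sketch: set Mem.ShareForceSalting to 0 on one ESXi host via pyVmomi.
  # Host name and credentials are placeholders for illustration only.
  import ssl
  from pyVim.connect import SmartConnect, Disconnect
  from pyVmomi import vim

  ctx = ssl._create_unverified_context()  # lab only; use proper certificates in production
  si = SmartConnect(host="esxi01.lab.local", user="root", pwd="password", sslContext=ctx)

  # When connected directly to a host there is a single compute resource with one host.
  host = si.content.rootFolder.childEntity[0].hostFolder.childEntity[0].host[0]
  opt_mgr = host.configManager.advancedOption

  # 0 disables salting, which re-enables inter-VM page sharing behaviour.
  opt_mgr.UpdateOptions(changedValue=[vim.option.OptionValue(key="Mem.ShareForceSalting", value=0)])

  Disconnect(si)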

What is new for vMotion in vSphere 6.0?

vMotion is probably my favourite VMware feature ever. It is one of those features which revolutionized the world and just when you think they can’t really innovate anymore they take it to a whole new level. So what is new?

  • Cross vSwitch vMotion
  • Cross vCenter vMotion
  • Long Distance vMotion
  • vMotion Network improvements
    • No requirement for L2 adjacency any longer!
  • vMotion support for Microsoft Clusters using physical RDMs

That is a nice long list indeed. Let's discuss each of these new features one by one, and let's start at the top with Cross vSwitch vMotion. Cross vSwitch vMotion basically allows you to do what the name tells you: it allows you to migrate virtual machines between different vSwitches. Not just from vSS to vSS, but also from vSS to vDS and vDS to vDS. Note that vDS to vSS is not supported. This is because when migrating from a vDS, metadata of the VM is transferred as well, and the standard vSwitch does not have this logic and cannot handle the metadata. Note that the IP address of the VM that you are migrating will not magically change, so you will need to make sure both the source and the destination portgroup belong to the same layer 2 network. All of this is very useful during, for instance, datacenter migrations, when you are moving VMs between clusters, or even when you are migrating to a new vCenter instance.
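Purely as an illustration (this is not a VMware API, just the combinations described above expressed in code), a tiny Python lookup makes the supported paths explicit:

  # Illustration only: the Cross vSwitch vMotion combinations described above.
  # "vSS" is a standard vSwitch, "vDS" a Distributed Switch.
  SUPPORTED_SWITCH_MIGRATIONS = {
      ("vSS", "vSS"): True,
      ("vSS", "vDS"): True,
      ("vDS", "vDS"): True,
      ("vDS", "vSS"): False,  # vDS metadata cannot be handled by a standard vSwitch
  }

  def cross_vswitch_supported(source: str, destination: str) -> bool:
      """Return True when a vMotion between the given switch types is supported."""
      return SUPPORTED_SWITCH_MIGRATIONS.get((source, destination), False)

  print(cross_vswitch_supported("vDS", "vSS"))  # False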

Next on the list is Cross vCenter vMotion. This is something that came up fairly frequently when talking about vMotion: will we ever have the ability to move a VM to a new vCenter Server instance? Well, as of vSphere 6.0 this is indeed possible. Not only can you move between vCenter Servers, but you can do this with all the different migration types there are: change compute / storage / network. You can even do it without having a shared datastore between the source and destination vCenter, aka a "shared nothing" migration. This functionality will come in handy when you are migrating to a different vCenter instance or even when you are migrating workloads to a different location. Note, it is a requirement for the source and destination vCenter Server to belong to the same SSO domain. What I love about this feature is that when the VM is migrated, things like alarms, events, HA and DRS settings are all migrated with it. So if you have affinity rules, or changed the host isolation response, or set a limit or reservation, it will follow the VM!

My personal favourite is Long Distance vMotion. When I say long distance, I do mean long distance. Remember that the max tolerated latency for vMotion was 10ms? With this new feature that just went up to 100ms. Long Distance vMotion uses socket buffer resizing techniques to ensure that migrations succeed when latency is high. Note that this will work with any storage system; both VMFS and NFS based solutions are fully supported. I have been told that this feature is still being tested and that we may even see the latency requirements increase; when I get an official statement I will make sure to let you know.

Then there are the network enhancements. First and foremost, vMotion traffic is now fully supported over an L3 connection. So no longer is there the need for L2 adjacency for your vMotion network; I know a lot of you have asked for this and I am happy to be able to announce it. On top of that, you can now also specify which VMkernel interface should be used for the migration of cold data. It is not something many people are aware of, but depending on the type of migration you are doing and the type of VM you are migrating, it could be that in previous versions the Management Network was used to transfer data. (Frank Denneman described this scenario in this post.) For this specific scenario it is now possible to define a VMkernel interface for "Provisioning traffic" as shown in the screenshot below. This interface will be used for, and let me quote the documentation here, "Supports the traffic for virtual machine cold migration, cloning, and snapshot creation. You can use the provisioning TCP/IP stack to handle NFC (network file copy) traffic during long-distance vMotion. NFC provides a file-type aware FTP service for vSphere; ESXi uses NFC for copying and moving data between datastores."
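For those who want to automate this, the snippet below is a hedged sketch that tags an existing VMkernel interface for provisioning traffic through pyVmomi. The host name, credentials and the vmk device name are placeholders, and you should double-check the nicType value against the API reference for your build before relying on it:

  # Hedged sketch: tag an existing VMkernel interface (vmk2 here is assumed)
  # for "Provisioning traffic" via pyVmomi. Host name and credentials are
  # placeholders for illustration only.
  import ssl
  from pyVim.connect import SmartConnect, Disconnect

  ctx = ssl._create_unverified_context()  # lab only
  si = SmartConnect(host="esxi01.lab.local", user="root", pwd="password", sslContext=ctx)

  host = si.content.rootFolder.childEntity[0].hostFolder.childEntity[0].host[0]
  nic_mgr = host.configManager.virtualNicManager

  # "vSphereProvisioning" selects the interface used for cold migration,
  # cloning and snapshot (NFC) traffic, as per the documentation quote above.
  nic_mgr.SelectVnicForNicType(nicType="vSphereProvisioning", device="vmk2")

  Disconnect(si)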

Full support for vMotion of Microsoft Cluster virtual machines is also newly introduced in vSphere 6.0. Note that these VMs will need to use physical RDMs, and this is only supported with Windows 2008, 2008 R2, 2012 and 2012 R2. Very useful if you ask me, for when you need to do maintenance or have resource contention of some kind.

That was it for now… There is some more stuff coming with regards to vMotion but I cannot disclose that yet unfortunately.

vSphere 6.0 finally announced!

Today Pat Gelsinger and Ben Fathi announced vSphere 6.0. (If you missed it, you can still sign up for other events.) I know many of you have been waiting on this and are ready to start your download engines, but please note that this is just the announcement of GA… the bits will follow shortly. I figured I would do a quick post which details what is in vSphere 6.0 / what is new. There were a lot of announcements today, but I am just going to cover vSphere 6.0 and VSAN. I have some more detailed posts to come, so I am not gonna go into a lot of depth in this post; I just figured I would post a list of all the stuff that is in the release… or at least that I am aware of, some stuff wasn't broadly announced.

  • vSphere 6
    • Virtual Volumes
      • Want "Virtual SAN"-like policy-based management for your traditional storage systems? That is what Virtual Volumes will bring in vSphere 6.0. If you ask me, this is the flagship feature in this release.
    • Long Distance vMotion
    • Cross vSwitch and vCenter vMotion
    • vMotion of MSCS VMs using pRDMs
    • vMotion L2 adjacency restrictions are lifted!
    • vSMP Fault Tolerance
    • Content Library
    • NFS 4.1 support
    • Instant Clone aka VMFork
    • vSphere HA Component Protection
    • Storage DRS and SRM support
    • Storage DRS deep integration with VASA to understand thin provisioned, deduplicated, replicated or compressed datastores!
    • Network IO Control per VM reservations
    • Storage IOPS reservations
    • Introduction of Platform Services Controller architecture for vCenter
      • SSO, licensing, certificate authority services are grouped and can be centralized for multiple vCenter Server instances
    • Linked Mode support for vCenter Server Appliance
    • Web Client performance and usability improvements
    • Max Config:
      • 64 hosts per cluster
      • 8000 VMs per cluster
      • 480 CPUs per host
      • 12TB of memory
      • 1000 VMs per host
      • 128 vCPUs per VM
      • 4TB RAM per VM
    • vSphere Replication
      • Compression of replication traffic configurable per VM
      • Isolation of vSphere Replication host traffic
    • vSphere Data Protection now includes all vSphere Data Protection Advanced functionality
      • Up to 8TB of deduped data per VDP Appliance
      • Up to 800 VMs per VDP Appliance
      • Application level backup and restore of SQL Server, Exchange, SharePoint
      • Replication to other VDP Appliances and EMC Avamar
      • Data Domain support
  • Virtual SAN 6
    • All flash configurations
    • Blade enablement through certified JBOD configurations
    • Fault Domain aka “Rack Awareness”
    • Capacity planning / “What if scenarios”
    • Support for hardware-based checksumming / encryption
    • Disk serviceability (Light LED on Failure, Turn LED on/off manually etc)
    • Disk / Diskgroup maintenance mode aka evacuation
    • Virtual SAN Health Services plugin
    • Greater scale
      • 64 hosts per cluster
      • 200 VMs per host
      • 62TB max VMDK size
      • New on-disk format enables fast cloning and snapshotting
      • 32 VM snapshots
      • From 20K IOPS to 40K IOPS in hybrid configuration per host (2x)
      • 90K IOPS with All-Flash per host

As you can see, it is a long list of features and products that have been added or improved. I can't wait until the GA release is available. In the upcoming days I will post some more details on some of the above listed features, as there is no point in flooding the blogosphere even more with similar info.

Platform9 manages private clouds as a service

A couple of months ago I introduced you to Platform9, a new company founded by 4 former VMware employees. I have been having discussions with them occasionally about what they were working on, and I've been very intrigued by what they are building. I am very pleased to see their first version go GA and want to congratulate them on hitting this major milestone. For those who are not familiar with what they do, this is what their website says:

Platform9 Managed OpenStack is a cloud service that enables Enterprises to manage their internal server infrastructure as efficient private clouds.

In short, they have a SaaS based solution which allows you to simply manage KVM based virtualization hosts. It is a very simple way of creating a private cloud, and it will literally get your KVM based solution up and running in minutes, which is very welcome in this world where things seem to become increasingly complex, especially when you talk about KVM/OpenStack.

Besides the GA announcement, the pricing model was also announced. The pricing model follows the same "pay per month" model as CloudPhysics has. In the case of Platform9 the costs are $49 per CPU per month, with an annual subscription being required. This is for what they call their "business tier", which has unlimited scale. There is also a "lite tier" which is free, but it will have limited scale and is mainly aimed at people who want to test Platform9 and learn about their offering. An Enterprise tier is also in the works and will offer more advanced features and premium support. The features it will include in addition to what the Business tier offers appear to be mainly in the software defined networking and security space, so I would expect things like firewalling, network isolation, single sign-on, etc.
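To put that pricing into perspective, here is a quick back-of-the-envelope calculation; the 16-CPU environment is purely an assumed example, the $49 per CPU per month is the Business tier price mentioned above:

  # Back-of-the-envelope: Business tier cost at $49 per CPU per month, billed annually.
  # The 16 CPUs (e.g. 8 dual-socket hosts) are an assumed example environment.
  price_per_cpu_per_month = 49
  cpus = 16

  annual_cost = price_per_cpu_per_month * cpus * 12
  print(f"Annual subscription for {cpus} CPUs: ${annual_cost:,}")  # $9,408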

I highly recommend watching the Virtualization Field Day 4 videos, as they demonstrate perfectly what they are capable of. The video that is probably most interesting to you is the one where they demonstrate a beta of the offering they are planning for vSphere (embedded below). The beta shows vSphere hosts and KVM hosts in a single pane of glass. The end user can deploy "instances" (virtual machines) in the environment of choice using a single tool, which from an operational perspective is great. On top of that, Platform9 discovers existing workloads on KVM and vSphere and non-disruptively adds them to their management interface.

DRS is just a load balancing solution…

Recently I've been hearing this comment more and more: DRS is just a load balancing solution. It seems that some folks spread this FUD to diminish what DRS really is and does. Let me start by saying that DRS is not a load balancing solution. The ultimate goal of DRS is to ensure all workloads receive the resources they demand. Frank Denneman has a great post on this topic, as this has led to some confusion in the past. I would advise reading it if you want to understand why exactly VMs are not moved while the cluster seems imbalanced. In short: why balance VMs when the VMs are not constrained? In other words, DRS has a VM-centric view of the virtual world and not a host-centric one… In the end, it is all about your applications and how they perform and not necessarily about the infrastructure they are hosted on; DRS cares about VM/application happiness. Also, keep in mind that there is a risk and a cost involved with every move you do.

Of course there is a lot of functionality that you leverage without thinking about it and take for granted. Things like Resource Pools (limits / reservations / shares), DRS Maintenance Mode (fully automated), VM Placement, Admission Control (yes, DRS has one) and last but not least the various types of (anti) affinity rules. Also, before anyone starts shouting about active memory vs consumed (PercentIdleMBInMemDemand solves this) or %RDY not being taken into account… DRS has many knobs you can twist.
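As an aside, if you want to experiment with that PercentIdleMBInMemDemand knob, the snippet below is a hedged sketch of setting it as a DRS advanced option through pyVmomi. The vCenter name, credentials, cluster lookup and the value of 100 are all placeholders/assumptions for illustration; test this in a lab before touching production.

  # Hedged sketch: set the DRS advanced option PercentIdleMBInMemDemand on a
  # cluster via pyVmomi. vCenter name, credentials and the value are placeholders.
  import ssl
  from pyVim.connect import SmartConnect, Disconnect
  from pyVmomi import vim

  ctx = ssl._create_unverified_context()  # lab only
  si = SmartConnect(host="vcenter.lab.local", user="administrator@vsphere.local",
                    pwd="password", sslContext=ctx)

  # Assumes the first datacenter and the first cluster; adjust for your inventory.
  cluster = si.content.rootFolder.childEntity[0].hostFolder.childEntity[0]

  spec = vim.cluster.ConfigSpecEx()
  spec.drsConfig = vim.cluster.DrsConfigInfo()
  spec.drsConfig.option = [vim.option.OptionValue(key="PercentIdleMBInMemDemand", value="100")]

  cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)

  Disconnect(si)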

But besides that, there is more. Something not a lot of people realize is that, for instance, HA and DRS are loosely coupled but tightly integrated. When you have both enabled on your cluster, HA will be able to call upon DRS to make the right placement decision and to defragment resources when needed. What does that mean? Well, let's assume for a second that you are running at (or almost at) full capacity, while taking a host failure into account by leveraging HA admission control, and a host fails. When the host fails HA will need to restart your VMs, but what if at some point there is not enough spare capacity left to restart a VM on a given host? Well, in that case HA will call upon DRS to make space available so that these VMs can be restarted. That is nice, right?! And there is more smartness coming with regards to HA / DRS admission control; hopefully I can tell you all about it soon.

Then of course there is also the case where resource pools are implemented. vSphere HA and DRS work in conjunction to ensure that when VMs are failed over, shares are flattened to avoid strange prioritisation during times of contention. HA and DRS do this because VMs always fail over to the root resource pool of a host, but of course DRS will place the VMs back where they belong when it runs for the first time after the failover has occurred. This is especially important when you have set shares on VMs individually in a resource pool model.

So when someone says DRS is just a simple load balancing solution take their story with a grain of salt…