Cloud native inhabitants

Whenever I hear the term “cloud native” I think about my kids. It may sound a bit strange, as many of you will probably think about “apps” first when “cloud native” is dropped. Cloud native to me is not about an application, but about a problem which has been solved and a solution which is offered in a specific way. A week or so ago someone made a comment on Twitter about how “Generation X” will adopt cloud faster than the current generation of IT admins…

Some even say that “Generation X” is more tech savvy: just look at how a 3-year-old handles an iPad; they are growing up with technology. To be blunt… that has nothing to do with the technical skills of the 3-year-old, but is more about the intuitive user interface that took years to develop. It comes naturally to them, as that is what they are exposed to from day one. They see their mom or dad swiping a screen daily; mimicking them doesn’t require a deep technical understanding of how an iPad works, they just move their finger from right to left… but I digress.

My kids don’t know what a video tape is, and even a CD to play music is so 2008, which for them is a lifetime ago; my kids are cloud native inhabitants. They use Netflix to watch TV, Spotify to listen to music, Facebook to communicate with friends, and YouTube, Gmail and many other services running somewhere in the cloud. They are native inhabitants of the cloud. They won’t adopt cloud technology faster; for them it is the natural choice, as it is what they are exposed to day in, day out.

Startup intro: Rubrik. Backup and recovery redefined

Some of you may have seen the article by The Register last week about this new startup called Rubrik. Rubrik just announced what they are working on and their funding at the same time:

Rubrik, Inc. today announced that it has received $10 million in Series A funding and launched its Early Access Program for the Rubrik Converged Data Management platform. Rubrik offers live data access for recovery and application development by fusing enterprise data management with web-scale IT, and eliminating backup software. This marks the end of a decade-long innovation drought in backup and recovery, the backbone of IT. Within minutes, businesses can manage the explosion of data across private and public clouds.

The Register made a comment which I want to briefly touch on. They mentioned it was odd that a venture capitalist is now the CEO of a startup, when normally it is the person with the technical vision who heads up the company. I can’t agree more with The Register. For those who don’t know Rubrik and their CEO, the choice of Bipul Sinha may come as a surprise and may seem a bit odd. Then there are some who may say that it is a logical choice considering they are funded by Lightspeed… The truth of the matter is that Bipul Sinha is the person with the technical vision. I had the pleasure of seeing his vision evolve from a couple of scribbles on a whiteboard to what Rubrik is right now.

I still recall having a conversation with Bipul about the state of the “backup industry”; we agreed that the different components of the datacenter had evolved over time, but that the backup industry was still very much stuck in the old world. (We agreed backup and recovery solutions suck in most cases…) Back when we had this discussion there was nothing yet: no team, no name, just a vision. Knowing what is coming in the near future and knowing their vision, I think this quote from the press release best captures what Rubrik is working on and what it will do:

Today we are excited to announce the first act in our product journey. We have built a powerful time machine that delivers live data and seamless scale in a hybrid cloud environment. Businesses can now break the shackles of legacy and modernize their data infrastructure, unleashing significant cost savings and management efficiencies.

Of course Rubrik would not be possible without a very strong team of founding members. Arvind Jain, Arvind Nithrakashyap and Soham Mazumdar are probably the strongest co-founders one could wish for. The engineering team has deep experience in building distributed systems, such as Google File System, Google Search, YouTube, Facebook Data Infrastructure, Amazon Infrastructure, and Data Domain File System. Expectations just went up a couple of notches, right?!

I agree that even the statement above is still a bit fluffy, so let’s add some more detail: what are they working on? Rubrik is working on a solution which combines backup software and a backup storage appliance into a single solution, initially targeting VMware environments. They are building (and I hate using this word) a hyperconverged backup solution, and it will scale from 3 to 1000s of nodes. Note that this solution will be up and running in 15 minutes and includes the option to age out data to the public cloud. What impressed me most is that Rubrik can discover your datacenter without any agents, scales out in a fully automated fashion, and is capable of deduplicating/compressing data while also offering the ability to mount data instantly. All of this through a slick UI, or you can leverage the REST APIs: fully programmable end-to-end.

I just went over “instant mount” quickly, but I want to point out that this is not just for “restoring VMs”. Considering the REST APIs, you can also imagine that this would be a perfect solution for enabling test/dev environments or running Tier 2/3 workloads. How valuable is it to have instant copies of your production data available, and to test your new code against production data without any interruption to your current environment? To throw a buzzword in there: a perfect fit for a devops world and continuous development.
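To make that concrete, here is a minimal sketch of what driving such a test/dev workflow through a REST API could look like. Rubrik had not published API documentation at the time of writing, so every endpoint, field and name below is an invented placeholder, not the actual API:

```python
# Hypothetical sketch only: Rubrik's actual REST API was not public at the time
# of writing, so all endpoints and field names below are invented placeholders.
import requests

BASE_URL = "https://rubrik.example.com/api/v1"  # placeholder appliance address

session = requests.Session()
session.auth = ("admin", "secret")  # placeholder credentials
session.verify = False              # lab only; use proper certificates in production

# Imagined call: instantly mount a point-in-time copy of a production VM so a
# test/dev environment can run against production data without touching
# the production VM itself.
response = session.post(
    f"{BASE_URL}/snapshots/snap-1234/instant_mount",  # hypothetical endpoint
    json={"target_host": "esxi-01.example.com", "mounted_vm_name": "webapp-test"},
)
response.raise_for_status()
print(response.json())
```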

That is about all I can say for now, unfortunately… For those who agree that backup/recovery has not evolved and are interested in a backup solution for tomorrow: there is an early access program, and I urge you to sign up to learn more, but also to help shape the product! The solution targets environments of 200 VMs and upwards, so make sure you meet that requirement. Read more here and/or follow them on Twitter (or Bipul).

Good luck, Rubrik, I am sure this is going to be a great journey!

Get your download engines running, vSphere 6.0 is here!

Yes, the day is finally here: vSphere 6.0 / SRM / VSAN (and more) are now available. So where do you find them? Well, that is simple… here:

Have fun!

vSphere 6.0: Breaking Large Pages…

When talking about Transparent Page Sharing (TPS), one thing that comes up regularly is the use of Large Pages and how that impacts TPS. As most of you hopefully know, TPS does not collapse large pages. However, when there is memory pressure you will see that large pages are broken up into small pages, and those small pages can then be collapsed by TPS. ESXi does this to prevent other memory reclamation techniques, which have a far bigger impact on performance, from kicking in. You can imagine that fetching a memory page from a swap file on a spindle takes significantly longer than fetching a page from memory. (A nice white paper on the topic of memory reclamation can be found here…)

Something that I have personally run into a couple of times is the situation where memory pressure rises so fast that the different states at which certain memory reclamation techniques kick in are crossed in a matter of seconds. This usually results in swapping to disk, even though large pages should have been broken up and collapsed where possible by TPS, or memory should have been compressed, or VMs ballooned. This is something that I discussed with the respective developers, and they came up with a solution. In order to understand what was implemented, let’s look at how memory states were defined in vSphere 5. There were four memory states, namely High (100% of minFree), Soft (64% of minFree), Hard (32% of minFree) and Low (16% of minFree). What does % of minFree mean? Well, if minFree is roughly 10GB for your configuration, then Soft, for instance, is reached when there is less than 64% of minFree available, which is 6.4GB of memory. For Hard this is 3.2GB, and so on. It should be noted that the change in state, and the action it triggers, does not happen at exactly the percentage mentioned; there is a lower and an upper boundary where the transition happens, and this was done to avoid oscillation.
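As a quick sanity check, here is a small Python sketch that turns those percentages into the absolute thresholds from the example (the 10GB minFree value is taken from the text; the real value depends on your host configuration):

```python
# vSphere 5.x memory state thresholds, expressed as a fraction of minFree.
MIN_FREE_GB = 10.0  # example value from the text; host dependent in reality

STATES_5X = [
    ("High", 1.00),  # 100% of minFree
    ("Soft", 0.64),  # 64% of minFree
    ("Hard", 0.32),  # 32% of minFree
    ("Low", 0.16),   # 16% of minFree
]

for name, fraction in STATES_5X:
    print(f"{name}: entered below {fraction * MIN_FREE_GB:.1f} GB free")
# High: entered below 10.0 GB free
# Soft: entered below 6.4 GB free
# Hard: entered below 3.2 GB free
# Low: entered below 1.6 GB free
```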

With vSphere 6.0 a fifth memory state is introduced, and this state is called Clear. Clear is 100% of minFree, and High has been redefined as 300% of minFree. When there is less than High (300% of minFree) but more than Clear (100% of minFree) available, ESXi will start pre-emptively breaking up large pages so that TPS (when enabled!) can collapse them at the next run. Let’s take that 10GB minFree as an example again: when you have between 30GB (High) and 10GB (Clear) of free memory available, large pages will be broken up. This should provide the leeway needed to safely collapse pages (TPS) and avoid the potential performance decrease which the other memory states could introduce. Very useful if you ask me, and I am very happy that this change in behaviour, which I requested a long time ago, has finally made it into the product.
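Expressed the same way, the 6.0 behaviour boils down to a simple range check. This is just a sketch of the logic described above; the real implementation uses upper and lower boundaries around these thresholds to avoid oscillation, as noted earlier:

```python
MIN_FREE_GB = 10.0  # same example minFree as above

def breaks_large_pages(free_gb: float, min_free_gb: float = MIN_FREE_GB) -> bool:
    """vSphere 6.0: large pages are pre-emptively broken up when free memory
    sits between Clear (100% of minFree) and High (300% of minFree)."""
    return 1.0 * min_free_gb < free_gb < 3.0 * min_free_gb

print(breaks_large_pages(35.0))  # False: above High, no action needed
print(breaks_large_pages(20.0))  # True: between High (30GB) and Clear (10GB)
print(breaks_large_pages(5.0))   # False: below Clear, other reclamation states apply
```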

Those of you who have been paying attention over the last months will know that inter-VM transparent page sharing is disabled by default. If you do want to reap the benefits of TPS and would like to leverage it in times of contention, then enabling it in 6.0 is pretty straightforward. Just go to the advanced settings and set “Mem.ShareForceSalting” to 0. Do note that there are potential security risks when doing this, and I recommend reading the above article to get a better understanding of those risks.
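If you prefer doing this programmatically across many hosts rather than through the UI, something along these lines should work. This is a hedged sketch using pyVmomi’s OptionManager; the vCenter address and credentials are placeholders, so test it in a lab first:

```python
# Sketch: set Mem.ShareForceSalting to 0 on every host, assuming pyVmomi is
# installed and the placeholder address/credentials are replaced with real ones.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

context = ssl._create_unverified_context()  # lab only; validate certs in production
si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="secret", sslContext=context)
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.HostSystem], True)
for host in view.view:
    # Push the advanced setting to the host; takes effect without a reboot.
    host.configManager.advancedOption.UpdateOptions(
        changedValue=[vim.option.OptionValue(key="Mem.ShareForceSalting", value=0)])
view.DestroyView()
Disconnect(si)
```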

What is new for vMotion in vSphere 6.0?

vMotion is probably my favourite VMware feature ever. It is one of those features which revolutionized the world, and just when you think they can’t really innovate anymore, they take it to a whole new level. So what is new?

  • Cross vSwitch vMotion
  • Cross vCenter vMotion
  • Long Distance vMotion
  • vMotion Network improvements
    • No requirement for L2 adjacency any longer!
  • vMotion support for Microsoft Clusters using physical RDMs

That is a nice long list indeed. Let’s discuss each of these new features one by one, starting at the top with Cross vSwitch vMotion. Cross vSwitch vMotion basically allows you to do what the name says: migrate virtual machines between different vSwitches. Not just from vSS to vSS, but also from vSS to vDS and from vDS to vDS. Note that vDS to vSS is not supported. This is because when migrating from a vDS, metadata of the VM is transferred as well, and the standard vSwitch does not have this logic and cannot handle the metadata. Note that the IP address of the VM you are migrating will not magically change, so you will need to make sure both the source and the destination portgroup belong to the same layer 2 network. All of this is very useful during, for instance, datacenter migrations, when you are moving VMs between clusters, or even when you are migrating to a new vCenter instance. The supported combinations are summarized in the sketch below.
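A few lines of Python make the compatibility rule explicit (purely an illustration of the matrix described above, not a VMware API):

```python
# Supported Cross vSwitch vMotion combinations as described above.
SUPPORTED_MIGRATIONS = {("vSS", "vSS"), ("vSS", "vDS"), ("vDS", "vDS")}

def cross_vswitch_supported(source: str, destination: str) -> bool:
    """Return True when a vMotion between the given vSwitch types is supported."""
    return (source, destination) in SUPPORTED_MIGRATIONS

assert cross_vswitch_supported("vSS", "vDS")
assert not cross_vswitch_supported("vDS", "vSS")  # vDS metadata cannot land on a vSS
```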

Next on the list is Cross vCenter vMotion. This is something that came up fairly frequently when talking about vMotion: will we ever have the ability to move a VM to a new vCenter Server instance? Well, as of vSphere 6.0 this is indeed possible. Not only can you move between vCenter Servers, but you can do this with all the different migration types there are: change compute, storage and network. You can even do it without having a shared datastore between the source and destination vCenter, aka a “shared nothing” migration. This functionality will come in handy when you are migrating to a different vCenter instance, or even when you are migrating workloads to a different location. Note that it is a requirement for the source and destination vCenter Server to belong to the same SSO domain. What I love about this feature is that when the VM is migrated, things like alarms, events, and HA and DRS settings are all migrated with it. So if you have affinity rules, changed the host isolation response, or set a limit or a reservation, it will follow the VM!
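For the API-minded: Cross vCenter vMotion is driven through the regular RelocateVM_Task call, with a ServiceLocator pointing at the destination vCenter. The sketch below, assuming pyVmomi and with all arguments as placeholders for objects fetched from the source and destination vCenters, shows the rough shape; treat it as an illustration rather than production code:

```python
# Rough sketch of a Cross vCenter vMotion via pyVmomi. All arguments are
# placeholders for objects/values obtained from the two vCenters.
from pyVmomi import vim

def cross_vcenter_vmotion(vm, dest_instance_uuid, dest_url, dest_thumbprint,
                          dest_folder, dest_pool, dest_host, dest_datastore,
                          username, password):
    spec = vim.vm.RelocateSpec()
    spec.folder = dest_folder        # VM folder in the destination vCenter
    spec.pool = dest_pool            # destination resource pool
    spec.host = dest_host            # destination ESXi host
    spec.datastore = dest_datastore  # no shared datastore needed ("shared nothing")

    # The ServiceLocator tells the source vCenter where the destination lives.
    locator = vim.ServiceLocator()
    locator.instanceUuid = dest_instance_uuid
    locator.url = dest_url                   # e.g. "https://dst-vc.example.com"
    locator.sslThumbprint = dest_thumbprint  # obtained out of band
    locator.credential = vim.ServiceLocatorNamePassword(username=username,
                                                        password=password)
    spec.service = locator

    return vm.RelocateVM_Task(
        spec=spec, priority=vim.VirtualMachine.MovePriority.defaultPriority)
```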

My personal favourite is Long Distance vMotion. When I say long distance, I do mean long distance. Remember that the maximum tolerated latency for vMotion was 10ms? With this new feature, that just went up to 150ms. Long Distance vMotion uses socket buffer resizing techniques to ensure that migrations succeed when latency is high. Note that this will work with any storage system; both VMFS and NFS based solutions are fully supported. (** This was announced at 100ms, but has since been updated to 150ms! **)

Then there are the network enhancements. First and foremost, vMotion traffic is now fully supported over an L3 connection. So there is no longer a need for L2 adjacency on your vMotion network; I know a lot of you have asked for this, and I am happy to be able to announce it. On top of that, you can now also specify which VMkernel interface should be used for the migration of cold data. It is not something many people are aware of, but depending on the type of migration you are doing and the type of VM you are migrating, in previous versions it could be that the Management Network was used to transfer data. (Frank Denneman described this scenario in this post.) For this specific scenario it is now possible to define a VMkernel interface for “Provisioning traffic”, as shown in the screenshot below. This interface will be used for, and let me quote the documentation here: “Supports the traffic for virtual machine cold migration, cloning, and snapshot creation. You can use the provisioning TCP/IP stack to handle NFC (network file copy) traffic during long-distance vMotion. NFC provides a file-type aware FTP service for vSphere; ESXi uses NFC for copying and moving data between datastores.”
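If you want to tag a VMkernel interface for Provisioning traffic from a script instead of the Web Client, the sketch below shows one way to do it, assuming pyVmomi’s VirtualNicManager and its “vSphereProvisioning” traffic type; the device name vmk2 is just a placeholder:

```python
# Sketch: mark an existing VMkernel interface for Provisioning traffic.
from pyVmomi import vim

def tag_vmk_for_provisioning(host: vim.HostSystem, device: str = "vmk2") -> None:
    """Select the given vmkernel NIC for the vSphereProvisioning traffic type,
    so cold migration / cloning / NFC traffic uses this interface."""
    nic_mgr = host.configManager.virtualNicManager
    nic_mgr.SelectVnicForNicType(nicType="vSphereProvisioning", device=device)
```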

Full support for vMotion of Microsoft Cluster virtual machines is also newly introduced in vSphere 6.0. Note that these VMs will need to use physical RDMs, and this is only supported with Windows 2008, 2008 R2, 2012 and 2012 R2. Very useful if you ask me, for when you need to do maintenance or you have resource contention of some kind.

That was it for now… There is some more stuff coming with regards to vMotion, but unfortunately I cannot disclose that yet.