Platform9 manages private clouds as a service

A couple of months ago I introduced you to Platform9, a new company founded by four former VMware employees. I have been having occasional discussions with them about what they are working on, I have been very intrigued by what they are building, and I am very pleased to see their first version go GA. I want to congratulate them on hitting this major milestone. For those who are not familiar with what they do, this is what their website says:

Platform9 Managed OpenStack is a cloud service that enables Enterprises to manage their internal server infrastructure as efficient private clouds.

In short, they have a SaaS-based solution which allows you to easily manage KVM-based virtualization hosts. It is a very simple way of creating a private cloud and it will literally get your KVM-based environment up and running in minutes, which is very welcome in a world where things seem to become increasingly complex, especially when you talk about KVM/OpenStack.

Besides the GA announcement, the pricing model was also announced. The pricing follows the same “pay per month” model as CloudPhysics uses. In the case of Platform9 the cost is $49 per CPU per month, with an annual subscription being required. This is for what they call their “business tier”, which has unlimited scale. There is also a “lite tier”, which is free but limited in scale and mainly aimed at people who want to test Platform9 and learn about their offering. An Enterprise tier is also in the works and will offer more advanced features and premium support. The features it adds on top of the Business tier appear to be mainly in the software-defined networking and security space, so I would expect things like firewalling, network isolation, single sign-on, etc.

I highly recommend watching the Virtualization Field Day 4 videos as they demonstrate perfectly what they are capable of. The video that is probably most interesting to you is the one where they demonstrate a beta of the offering they are planning for vSphere (embedded below). The beta shows vSphere hosts and KVM hosts in a single pane of glass. The end user can deploy “instances” (virtual machines) in the environment of choice using a single tool, which from an operational perspective is great. On top of that, Platform9 discovers existing workloads on KVM and vSphere and non-disruptively adds them to its management interface.

You wanted VMTN back? VMUG to the rescue!

I’ve written about VMTN in the past and have discussed the return of VMTN many times within VMware with various people, all the way up to our CTO. Unfortunately, for various reasons, it never happened, but fortunately the VMUG organization jumped on it not too long ago and managed to get it revamped. If you are interested, see the blurb below, visit the VMUG website and sign up. I can’t tell you how excited I am about this and how surprised I was that the VMUG team managed to pull this off in a relatively short time frame. Thanks VMUG!

Source: VMUG – EVALExperience!
VMware and VMUG have partnered with Kivuto Solutions to provide VMUG Advantage Subscribers a customized web portal that provides VMUG Advantage Subscribers with self-service capability to download software and license keys. Licenses to available VMware products are regularly updated and posted to the self-service web portal. The licenses available to VMUG Advantage Subscribers are 365-day evaluation licenses that require a one-time, annual download. Annual product downloads ensure that Subscribers receive the most up-to-date versions of products.

Included products are:

A new 365-day entitlement will be offered with the renewal of your yearly VMUG Advantage Subscription. Software is provided to VMUG Advantage Subscribers with no associated entitlement to support services, and users may not purchase such services in association with the EVALExperience licenses.

DRS is just a load balancing solution…

Recently I’ve been hearing this comment more and more: “DRS is just a load balancing solution.” It seems that some folks spread this FUD to diminish what DRS really is and does. Let me start by saying that DRS is not a load balancing solution. The ultimate goal of DRS is to ensure all workloads receive the resources they demand. Frank Denneman has a great post on this topic, as it has led to some confusion in the past. I would advise reading it if you want to understand why exactly VMs are not moved while the cluster seems imbalanced. In short: why balance VMs when the VMs are not constrained? In other words, DRS has a VM-centric view of the virtual world, not a host-centric one. In the end, it is all about your applications and how they perform, not necessarily about the infrastructure they are hosted on; DRS cares about VM/application happiness. Also, keep in mind that there is a risk and a cost involved with every move you make.
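To make that concrete, here is a purely illustrative Python sketch (my own simplification with made-up numbers, definitely not the actual DRS algorithm): a move is only worth evaluating when a VM is constrained, meaning its demand cannot be met on its current host, and not simply because utilization differs between hosts.

```python
# Illustration only -- not the real DRS algorithm. It shows the VM-centric
# idea: a migration is only worth considering when a VM is constrained,
# not merely because host utilization numbers differ.

def constrained_vms(hosts):
    """Return (vm name, shortfall) for VMs whose demand cannot be met."""
    result = []
    for host in hosts:
        total_demand = sum(vm["demand"] for vm in host["vms"])
        if total_demand <= host["capacity"]:
            continue  # every VM gets what it asks for; imbalance alone is fine
        for vm in host["vms"]:
            # proportional slice of the host during contention (simplified)
            entitlement = vm["demand"] * host["capacity"] / total_demand
            if vm["demand"] > entitlement:
                result.append((vm["name"], vm["demand"] - entitlement))
    return result

hosts = [
    {"capacity": 100, "vms": [{"name": "vm1", "demand": 70},
                              {"name": "vm2", "demand": 50}]},
    {"capacity": 100, "vms": [{"name": "vm3", "demand": 20}]},
]
# vm1 and vm2 together demand 120 units on a 100-unit host, so they are
# constrained and a move is worth evaluating; an "imbalanced" but
# uncontended cluster would simply return an empty list.
print(constrained_vms(hosts))
```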

Of course there is a lot of functionality that you leverage without thinking about it and take for granted: things like Resource Pools (limits / reservations / shares), DRS Maintenance Mode (fully automated), VM Placement, Admission Control (yes, DRS has one) and, last but not least, the various types of (anti-)affinity rules. Also, before anyone starts shouting about active versus consumed memory (the PercentIdleMBInMemDemand advanced option addresses this) or about whether %RDY is taken into account: DRS has many knobs you can twist.
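As an example of one of those knobs, the sketch below shows how a DRS advanced option such as PercentIdleMBInMemDemand could be set through pyVmomi. Treat it as a rough sketch: the vCenter address, credentials and the cluster name “Cluster01” are placeholders, and you should validate the behaviour in a test environment before touching production.

```python
# Rough pyVmomi sketch (placeholder vCenter, credentials and cluster name).
# It sets the DRS advanced option PercentIdleMBInMemDemand to 100 so that
# idle/consumed memory is fully counted towards memory demand.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.local", user="administrator@vsphere.local",
                  pwd="password", sslContext=ssl._create_unverified_context())
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    cluster = next(c for c in view.view if c.name == "Cluster01")
    view.DestroyView()

    spec = vim.cluster.ConfigSpecEx()
    spec.drsConfig = vim.cluster.DrsConfigInfo(
        option=[vim.option.OptionValue(key="PercentIdleMBInMemDemand",
                                       value="100")])
    # modify=True applies this as an incremental change to the cluster config.
    task = cluster.ReconfigureComputeResource_Task(spec, modify=True)
    # In a real script you would wait for the task to complete here.
finally:
    Disconnect(si)
```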

But besides that, there is more. Something not a lot of people realize is that HA and DRS, for instance, are loosely coupled but tightly integrated. When you have both enabled on your cluster, HA will be able to call upon DRS to make the right placement decision and to defragment resources when needed. What does that mean? Well, let’s assume for a second that you are running at (or almost at) full capacity, with HA admission control configured to tolerate a host failure, and a host fails. HA will need to restart your VMs, but what if at some point there is not enough spare capacity left on any single host to restart a VM? Well, in that case HA will call upon DRS to make space available so that these VMs can be restarted. That is nice, right?! And there is more smartness coming around HA/DRS admission control; hopefully I can tell you all about it soon.
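If you want to picture what that defragmentation means, the toy example below (again, just an illustration with made-up hosts and numbers, not how HA/DRS are actually implemented) shows a case where the cluster has enough aggregate capacity to restart a failed VM, but no single host does, until one small VM is moved out of the way.

```python
# Illustration only -- not the actual HA/DRS code path. It shows the
# "defragmentation" idea: capacity exists in aggregate, but not on any
# single host, so a small move is needed before the failed VM can restart.

def free(host):
    return host["capacity"] - sum(host["vms"].values())

def place_with_defrag(hosts, vm_name, vm_size):
    # Try a straight placement first.
    for host in hosts:
        if free(host) >= vm_size:
            host["vms"][vm_name] = vm_size
            return f"{vm_name} restarted on {host['name']}"
    # No single host fits the VM: move the smallest VM that frees enough space.
    for src in hosts:
        for name, size in sorted(src["vms"].items(), key=lambda x: x[1]):
            if free(src) + size < vm_size:
                continue
            for dst in hosts:
                if dst is not src and free(dst) >= size:
                    dst["vms"][name] = src["vms"].pop(name)
                    src["vms"][vm_name] = vm_size
                    return (f"moved {name} to {dst['name']}, "
                            f"restarted {vm_name} on {src['name']}")
    return "insufficient capacity"

hosts = [
    {"name": "esx01", "capacity": 100, "vms": {"vm1": 60, "vm2": 30}},
    {"name": "esx02", "capacity": 100, "vms": {"vm3": 70}},
]
# A 40-unit VM from the failed host fits nowhere as-is,
# but moving vm2 makes room for it.
print(place_with_defrag(hosts, "vm4", 40))
```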

Then of course there is also the case where resource pools are implemented. vSphere HA and DRS work in conjunction to ensure that when VMs are failed over, their shares are flattened to avoid strange prioritization during times of contention. HA and DRS do this because VMs always fail over to the root resource pool of a host, but of course DRS will place the VMs back where they belong when it runs for the first time after the failover has occurred. This is especially important when you have set shares on VMs individually in a resource pool model.
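A quick back-of-the-napkin example (hypothetical share values, not a real cluster) shows why this matters: without flattening, a VM that had high shares relative to its siblings inside a small resource pool would suddenly compete with entire resource pools at the root level.

```python
# Illustrative arithmetic only (hypothetical share values). It shows why a
# VM's raw share value cannot simply be reused when it is restarted in the
# root resource pool, and why the shares are flattened afterwards.

root = {"RP-Prod": 8000, "RP-Test": 2000}   # resource pool shares at the root
rp_test_vms = {"vm-a": 4000, "vm-b": 1000}  # per-VM shares inside RP-Test

# Normal situation: vm-a's effective slice of the cluster is
#   (RP-Test / root total) * (vm-a / RP-Test total)
normal = (2000 / 10000) * (4000 / 5000)      # = 0.16

# Failover without flattening: vm-a lands in the root pool with its raw
# 4000 shares and suddenly competes with entire resource pools.
unflattened = 4000 / (8000 + 2000 + 4000)    # ~0.29, almost double

print(f"normal: {normal:.2f}, unflattened after failover: {unflattened:.2f}")
```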

So when someone says DRS is just a simple load balancing solution, take their story with a grain of salt…

ScaleIO in the ESXi Kernel, what about the rest of the ecosystem?

Before reading my take on this, please read this great article by Vijay Ramachandran, as he explains the difference between ScaleIO and VSAN in the kernel. And before I say anything, let me reinforce that this is my opinion and not necessarily VMware’s. I’ve seen some negative comments around ScaleIO / VMware / EMC, most of them around the availability of a second storage solution in the ESXi kernel next to VMware’s own Virtual SAN. The big complaint typically is: why is EMC allowed and the rest of the ecosystem isn’t? The question, though, is whether VMware really isn’t allowing other partners to do the same. While flying to Palo Alto I read an article by Itzik which stated the following:

ScaleIO 1.31 introduces several changes in the VMware environment. First, it provides the option to install the SDC natively in the ESX kernel instead of using the SVM to host the SDC component. The V1.31 SDC driver for ESX is VMware PVSP certified, and requires a host acceptance level of “PartnerSupported” or lower in the ESX hosts.

Let me point out here that the solution EMC developed falls under PVSP support. What strikes me is that many seem to think that what ScaleIO achieved is unique, despite the “partner supported” statement. Although I admit that there aren’t many storage solutions that sit within the hypervisor, and this is great innovation, it is not unique for a solution to sit within the hypervisor.

If you look at flash caching solutions, for instance, you will see that some sit in the hypervisor (PernixData, SanDisk’s FlashSoft) and some sit on top (Atlantis, Infinio). It is not like VMware favours one over the other in the case of these partners. It was their design, it was their way to get around a problem they had… Some managed to develop a solution that sits in the hypervisor, others did not focus on that. Some probably felt that optimizing the data path first was most important, and, maybe even more important, they had the expertise to do so.

Believe me when I say that it isn’t easy to create these types of solutions. There is no standard framework for this today, hence they end up being partner supported, as they leverage existing APIs and frameworks in an innovative way. Until there is, you will see some partners sitting on top and others within the hypervisor, depending on what they want to invest in and what skill set they have… (Yes, a framework is being explored, as talked about in this video by one of our partners; I don’t know when or if this will be released however!)

What ScaleIO did is innovative for sure, but there are others who have done something similar and I expect more will follow in the near future. It is just a matter of time.

Two logical PCIe flash devices for VSAN

A couple of days ago I was asked whether I would recommend using two logical PCIe flash devices carved out of a single physical PCIe flash device. The reason for the question was the recommendation from VMware to have two Virtual SAN disk groups instead of (just) one disk group.

First of all, I want to make it clear that this is a recommended practice, but definitely not a requirement. The reason people have started recommending it is “failure domains”. As some of you may know, when a flash device, which is used for read caching / write buffering and fronts a given set of disks, becomes unavailable, all the disks in the disk group associated with that flash device become unavailable. As such, a disk group can be considered a failure domain, and when it comes to availability it is typically best to spread risk, so having multiple failure domains is desirable.

When it comes to PCIe devices, would it make sense to carve up a single physical device into multiple logical devices? From a failure point of view I personally think it doesn’t add much value: if the device fails, it is likely that both logical devices fail. So from an availability point of view there isn’t much that two logical devices add; however, it could be beneficial to have multiple logical devices if you have more than 7 disks per server.

As most of you will know, each host can have at most 7 disks per disk group and 5 disk groups per server. If there is a requirement for the server to have more than 7 disks, then there will be a need for multiple disk groups and thus multiple flash devices; in that scenario, carving a single physical device into multiple logical devices would do the job, although from a failure-tolerance perspective I would still prefer multiple physical devices over multiple logical ones. But I guess it all depends on what type of devices you use, whether you have sufficient PCIe slots available, etc. In the end the decision is up to you, but do make sure you understand the impact of your decision.
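For those who like to see the math, here is a trivial sketch based on the limits mentioned above (7 disks per disk group, 5 disk groups per host) that works out how many disk groups, and therefore how many flash devices (logical or physical), a given number of disks requires.

```python
# Quick sketch using the limits mentioned above: 7 disks per disk group and
# 5 disk groups per host. Each disk group needs its own flash device in
# front of it, logical or physical.
import math

def vsan_layout(disk_count, max_disks_per_dg=7, max_dgs_per_host=5):
    disk_groups = math.ceil(disk_count / max_disks_per_dg)
    if disk_groups > max_dgs_per_host:
        raise ValueError(f"{disk_count} disks exceeds the per-host maximum "
                         f"of {max_disks_per_dg * max_dgs_per_host}")
    return {"disk_groups": disk_groups, "flash_devices": disk_groups}

print(vsan_layout(7))   # 1 disk group, 1 flash device
print(vsan_layout(10))  # 2 disk groups, 2 flash devices (logical or physical)
```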