You wanted VMTN back? VMUG to the rescue!

I’ve written about VMTN in the past and discussed its return many times within VMware with various people, all the way up to our CTO. Unfortunately, for various reasons it never happened, but fortunately the VMUG organization jumped on it not too long ago and managed to get it revamped. If you are interested, read the blurb below, then visit the VMUG website and sign up. I can’t tell you how excited I am about this, and how surprised I was that the VMUG team managed to pull it off in such a short time frame. Thanks VMUG!

Source: VMUG – EVALExperience!
VMware and VMUG have partnered with Kivuto Solutions to provide VMUG Advantage Subscribers a customized web portal that provides VMUG Advantage Subscribers with self-service capability to download software and license keys. Licenses to available VMware products are regularly updated and posted to the self-service web portal. The licenses available to VMUG Advantage Subscribers are 365-day evaluation licenses that require a one-time, annual download. Annual product downloads ensure that Subscribers receive the most up-to-date versions of products.

Included products are:

A new 365-day entitlement will be offered with the renewal of your yearly VMUG Advantage Subscription. Software is provided to VMUG Advantage Subscribers with no associated entitlement to support services, and users may not purchase such services in association with the EVALExperience licenses.

DRS is just a load balancing solution…

Recently I’ve been hearing this comment more and more: DRS is just a load balancing solution. It seems that some folks spread this FUD to diminish what DRS really is and does. Let me start by saying that DRS is not a load balancing solution. The ultimate goal of DRS is to ensure all workloads receive the resources they demand. Frank Denneman has a great post on this topic, as it has led to some confusion in the past. I would advise reading it if you want to understand why exactly VMs are not moved even though the cluster seems imbalanced. In short: why move VMs when they are not constrained? In other words, DRS has a VM-centric view of the virtual world, not a host-centric one… In the end, it is all about your applications and how they perform, not necessarily about the infrastructure they are hosted on; DRS cares about VM/application happiness. Also, keep in mind that there is a risk and a cost involved with every move you make.

Of course there is a lot of functionality that you leverage without thinking about it and take for granted: things like resource pools (limits / reservations / shares), DRS maintenance mode (fully automated), VM placement, admission control (yes, DRS has one) and, last but not least, the various types of (anti-)affinity rules. And before anyone starts shouting about active versus consumed memory (the PercentIdleMBInMemDemand advanced option solves this) or whether %RDY is taken into account… DRS has many knobs you can turn.
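For those who prefer to set such a knob programmatically, here is a minimal sketch using pyVmomi, the vSphere Python SDK. The vCenter address, the credentials, the cluster name "Cluster01" and the value of 100 are all placeholders; the only pieces taken from the text above are the PercentIdleMBInMemDemand advanced option and the standard cluster reconfigure call.

```python
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Connect to vCenter (placeholder address and credentials).
si = SmartConnect(host="vcenter.local",
                  user="administrator@vsphere.local",
                  pwd="secret",
                  sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()

# Look up the cluster by name (hypothetical name "Cluster01").
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == "Cluster01")

# Have DRS count a percentage of idle consumed memory as demand instead of
# looking at active memory alone.
spec = vim.cluster.ConfigSpecEx(
    drsConfig=vim.cluster.DrsConfigInfo(
        option=[vim.option.OptionValue(key="PercentIdleMBInMemDemand",
                                       value="100")]))
cluster.ReconfigureComputeResource_Task(spec, modify=True)

Disconnect(si)
```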

But there is more. Something not a lot of people realize is that, for instance, HA and DRS are loosely coupled but tightly integrated. When you have both enabled on your cluster, HA can call upon DRS to make the right placement decision and to defragment resources when needed. What does that mean? Let's assume for a second that you are running at (or near) full capacity while accounting for a host failure by leveraging HA admission control, and a host fails. HA will need to restart your VMs, but what if at some point there is not enough spare capacity left on any given host to restart a VM? In that case HA will call upon DRS to make space available so that these VMs can be restarted. That is nice, right?! And there is more smartness coming when it comes to HA / DRS admission control; hopefully I can tell you all about it soon.
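To make the admission control part of that scenario a bit more tangible, here is a hedged pyVmomi sketch that enables HA with admission control set to tolerate one host failure, which is what guarantees the spare capacity HA and DRS can work with. It assumes a `cluster` object looked up as in the earlier snippet, and the failover level of 1 is just an example value.

```python
from pyVmomi import vim

# "cluster" is a vim.ClusterComputeResource object, retrieved as shown earlier.
# Enable HA and have admission control reserve capacity for one host failure.
spec = vim.cluster.ConfigSpecEx(
    dasConfig=vim.cluster.DasConfigInfo(
        enabled=True,
        admissionControlEnabled=True,
        admissionControlPolicy=vim.cluster.FailoverLevelAdmissionControlPolicy(
            failoverLevel=1)))
cluster.ReconfigureComputeResource_Task(spec, modify=True)
```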

Then of course there is also the case where resource pools are implemented. vSphere HA and DRS work in conjunction to ensure that when VMs are failed over, their shares are flattened to avoid strange prioritisation during times of contention. They do this because VMs always fail over to the root resource pool of a host; DRS will then place the VMs back where they belong the first time it runs after the failover has occurred. This is especially important when you have set shares on individual VMs in a resource pool model.
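To see why flattening matters, here is a back-of-the-envelope illustration in plain Python. This is not VMware's actual algorithm and the share values are made up; it simply shows that shares only mean something relative to siblings at the same level, so a VM restarted straight into the root resource pool would compete on very different terms.

```python
def split_by_shares(shares):
    """Divide contended capacity among siblings proportionally to their shares."""
    total = sum(shares.values())
    return {name: round(value / total, 2) for name, value in shares.items()}

# Inside its own resource pool, the VM's 4000 shares dominate its sibling...
print(split_by_shares({"app-vm": 4000, "sibling-vm": 1000}))
# {'app-vm': 0.8, 'sibling-vm': 0.2}

# ...but dropped as-is into the root pool, next to resource pools holding far
# larger share values, the same 4000 shares suddenly buy almost nothing.
print(split_by_shares({"app-vm": 4000, "rp-prod": 100000, "rp-test": 50000}))
# {'app-vm': 0.03, 'rp-prod': 0.65, 'rp-test': 0.32}
```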

So when someone says DRS is just a simple load balancing solution, take their story with a grain of salt…

ScaleIO in the ESXi Kernel, what about the rest of the ecosystem?

Before reading my take on this, please read this great article by Vijay Ramachandran, in which he explains the difference between ScaleIO and VSAN in the kernel. And before I say anything, let me reinforce that this is my opinion and not necessarily VMware’s. I’ve seen some negative comments around ScaleIO / VMware / EMC, most of them about the availability of a second storage solution in the ESXi kernel next to VMware’s own Virtual SAN. The big complaint typically is: why is EMC allowed to do this and the rest of the ecosystem isn’t? The question, though, is whether VMware really isn’t allowing other partners to do the same. While flying to Palo Alto I read an article by Itzik which stated the following:

ScaleIO 1.31 introduces several changes in the VMware environment. First, it provides the option to install the SDC natively in the ESX kernel instead of using the SVM to host the SDC component. The V1.31 SDC driver for ESX is VMware PVSP certified, and requires a host acceptance level of “PartnerSupported” or lower in the ESX hosts.

Let me point out here that the solution EMC developed falls under PVSP support. What strikes me is that many seem to think what ScaleIO achieved is unique, despite the “partner supported” statement. Although I admit there aren’t many storage solutions that sit within the hypervisor, and this is great innovation, ScaleIO is not unique in doing so.
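This is also something you can verify on your own hosts. Below is a small, hedged pyVmomi sketch that reads a host's image profile acceptance level and, if needed, lowers it to "partner" so that PartnerSupported VIBs such as the ScaleIO SDC can be installed; it assumes you already hold a `vim.HostSystem` object named `host`.

```python
# "host" is an already-retrieved vim.HostSystem object; error handling omitted.
image_mgr = host.configManager.imageConfigManager

level = image_mgr.HostImageConfigGetAcceptance()
print("Current acceptance level:", level)  # e.g. "vmware_certified"

# PVSP / PartnerSupported VIBs require the "partner" (or "community") level.
if level in ("vmware_certified", "vmware_accepted"):
    image_mgr.UpdateHostImageAcceptanceLevel(newAcceptanceLevel="partner")
```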

If you look at flash caching solutions, for instance, you will see that some sit in the hypervisor (PernixData, SanDisk’s FlashSoft) and some sit on top (Atlantis, Infinio). It is not like VMware favours one of these partners over the other. It was their design, their way of getting around the problem they had… Some managed to develop a solution that sits in the hypervisor, others did not focus on that. Some probably felt that optimizing the data path first was most important and, maybe even more importantly, they had the expertise to do so.

Believe me when I say that it isn’t easy to create these types of solutions. There is no standard framework for this today, hence they end up being partner supported, as they leverage existing APIs and frameworks in an innovative way. Until there is one, you will see some partners sitting on top of the hypervisor and others within it, depending on what they want to invest in and what skill set they have… (Yes, a framework is being explored, as mentioned in this video by one of our partners; I don’t know when or if it will be released, however!)

What ScaleIO did is innovative for sure, but others have done something similar, and I expect more will follow in the near future. It is just a matter of time.

Two logical PCIe flash devices for VSAN

A couple of days ago I was asked whether I would recommend using two logical PCIe flash devices carved out of a single physical PCIe flash device. The reason for the question was VMware's recommendation to have two Virtual SAN disk groups instead of (just) one.

First of all, I want to make it clear that this is a recommended practice but definitely not a requirement. The reason people have started recommending it is “failure domains”. As some of you may know, when a flash device, which is used for read caching / write buffering and fronts a given set of disks, becomes unavailable, all the disks in the disk group associated with that flash device become unavailable. As such, a disk group can be considered a failure domain, and when it comes to availability it is typically best to spread risk, so having multiple failure domains is desirable.

When it comes to PCIe devices, would it make sense to carve up a single physical device into multiple logical devices? From a failure point of view I personally think it doesn’t add much value: if the physical device fails, it is likely that both logical devices fail. So from an availability point of view there isn’t much that two logical devices add; however, multiple logical devices could be beneficial if you have more than 7 disks per server.

As most of you will know, each host can have at most 7 disks per disk group and 5 disk groups per server. If a server needs more than 7 disks, then multiple flash devices are required; in that scenario, creating multiple logical devices on a single physical device would do the job, although from a failure tolerance perspective I would still prefer multiple physical devices over multiple logical ones. But I guess it all depends on what type of devices you use, whether you have sufficient PCIe slots available, etc. In the end the decision is up to you, but do make sure you understand the impact of your decision.
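To put some quick numbers to that, here is a simple sketch based purely on the limits mentioned above (7 disks per disk group, 5 disk groups per host, one flash device per disk group); the disk counts are just examples.

```python
import math

MAX_DISKS_PER_DISK_GROUP = 7   # capacity disks per disk group
MAX_DISK_GROUPS_PER_HOST = 5   # disk groups (and thus flash devices) per host

def flash_devices_needed(capacity_disks):
    """Minimum number of disk groups, i.e. flash devices (physical or logical),
    needed to front the given number of capacity disks in a single host."""
    groups = math.ceil(capacity_disks / MAX_DISKS_PER_DISK_GROUP)
    if groups > MAX_DISK_GROUPS_PER_HOST:
        raise ValueError("more capacity disks than a single host supports")
    return groups

print(flash_devices_needed(7))    # 1 -> a single flash device is enough
print(flash_devices_needed(10))   # 2 -> two flash devices, physical or logical
print(flash_devices_needed(35))   # 5 -> the per-host maximum
```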

and that was 2014….

We are nearing the end of 2014 and the start of 2015, so I figured it was time to look back at what I did in the past 12 months. I have a bunch of stats which may be of interest, but on a more personal level 2014 was a hectic year as well, so let's start with that.

In 2014 from a more personal point of view:

  • Joined the VMware Foundation on a life-changing trip to Vietnam: 2 weeks helping the great (and inspiring) folks of Orphan Impact. I was happy that I was able to contribute in some way, and very happy with all the kind readers who decided to donate to Orphan Impact; it really does matter!
  • Was part of the product team that made VMware EVO:RAIL happen. What a journey… I had never been part of a product team before, and seeing something grow from an idea on a whiteboard into a full-blown product, and then seeing it launched, is a great experience.
  • Was promoted to Chief Technologist. Very honoured and humbled. What an insane journey the last 7 years at VMware have been, starting in PSO as a senior consultant; who would have thought?
  • Joined the VMware Office of the CTO. A long-time dream to join this team of superheroes.
  • Co-authored “Essential Virtual SAN”, published through VMware Press, and went to China for a book signing with a crazy long queue of people waiting.

Needless to say, but 2014 was hectic and life changing in many ways for me. But what about the stats?

  • Published 143 new articles, bringing the total to 1810 posts (including this one)
  • My 5 top referrers:
  • Visitors came from 224 countries, with the US leading by a large margin (6x more than the UK and India)
  • I had 3 major contributors in terms of comments; thanks John Nicholson, Forbsy and James Hess!
  • My top visited page was the Virtual SAN page
  • My top article was the EVO:RAIL introduction article, but all VSAN articles scored extremely high
  • Visitor traffic grew by 20%

So from a stats perspective, too, 2014 was a great year. I am hoping 2015 will be even better, and I want to wish all of you a Happy New Year!