Software Defined

Virtual SAN and Network IO Control

Duncan Epping · Oct 29, 2013 ·

Since I started playing with Virtual SAN there was something that I more or less avoided / neglected, and that is Network IO Control. However, Virtual SAN and Network IO Control should go hand-in-hand. (And as such the Distributed Switch.) Note that when using VSAN (beta) the Distributed Switch and Network IO Control come with it. I guess I skipped it as there were more exciting things to talk about, but as more and more people are asking about it I figured it is time to discuss Virtual SAN and Network IO Control. Before we get started, let's list the types of networks we will have within the VSAN cluster:

Management Network
vMotion Network
Virtual SAN Network
Virtual Machine Network

Considering it is recommended to use 10GbE with Virtual SAN, that is what I will assume in this blog post. In most of these cases, at least I would hope, there will be a form of redundancy and as such we will have 2 x 10GbE at our disposal. So how would I recommend configuring the network?

Let's start with the various portgroups and VMkernel interfaces:

1 x Management Network VMkernel interface
1 x vMotion VMkernel interface (All interfaces need to be in the same subnet)
1 x Virtual SAN VMkernel interface
1 x Virtual Machine Portgroup

Some of you might be surprised that I have only listed the vMotion VMkernel interface and the Virtual SAN VMkernel interface once… After various discussions and thinking this through, I figured I would keep things as simple as possible, especially considering the average IO profile of server environments.

By default we can make sure the various traffic types are separated on different physical ports, but we can also set limits and shares when desired. I do not recommend using limits though, why limit a traffic type when you can use shares and “artificially limit” your traffic types based on resource usage and demand?! Also note that shares and limits are enforced per uplink.

So we will be using shares, as shares only come in to play when there is contention. What we will do is take 20GbE in to account and carve it up. Easiest way, if you ask me, is to say each traffic type gets an X number of GbE assigned at a minimum which is based on some of the recommendations out there for these types of traffic:

Management Network –> 1GbE
vMotion VMK –> 5GbE
Virtual Machine PG –> 2GbE
Virtual SAN VMkernel interface –> 10GbE

Now as you can see “management”, “virtual machine” and “vMotion” traffic share Port 1 and “Virtual SAN” traffic uses Port 2. This way we have sufficient bandwidth for all the various types of traffic in a normal state. We also want to make sure that no traffic type can push out other types of traffic; for that we will use the Network IO Control shares mechanism.

Now let's look at it from a shares perspective. You will want to make sure that, for instance, vMotion and Virtual SAN always have sufficient bandwidth. I will work under the assumption that I only have 1 physical port available and all traffic types share the same physical port. We know this is not the case, but let's take a “worst case scenario” approach.

Let's assume you have 1000 shares in total and let's take a worst case scenario into account where 1 physical 10GbE port has failed and only 1 is used for all traffic. By taking this approach you ensure that Virtual SAN always has 50% of the bandwidth at its disposal while leaving the remaining traffic types with sufficient bandwidth to avoid a potential self-inflicted DoS.

Traffic Type	Shares	Limit
Management Network	20	n/a
vMotion VMkernel Interface	50	n/a
Virtual Machine Portgroup	30	n/a
Virtual SAN VMkernel Interface	100	n/a
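To make the table a bit more tangible, here is a minimal Python sketch of what those shares translate to on a single, fully contended 10GbE uplink. Note that the values in the table add up to 200, and it is the ratio between them that matters: 100 out of 200 is where the 50% for Virtual SAN comes from. The function name and the proportional model are my own simplification for illustration; this is not how Network IO Control is implemented, it only mirrors the arithmetic.

```python
# Shares taken from the table above; the 10GbE uplink is the one assumed in the post.
SHARES = {
    "Management Network": 20,
    "vMotion VMkernel Interface": 50,
    "Virtual Machine Portgroup": 30,
    "Virtual SAN VMkernel Interface": 100,
}
LINK_GBIT = 10.0


def bandwidth_under_contention(shares, link_gbit, active=None):
    """Split one uplink between the traffic types that are actively sending.

    Simplified, hypothetical model: every active traffic type receives
    link_gbit * its_shares / sum(shares of all active types).
    Traffic types that are idle do not take part in the split.
    """
    active = list(shares) if active is None else list(active)
    total = sum(shares[name] for name in active)
    return {name: link_gbit * shares[name] / total for name in active}


if __name__ == "__main__":
    for name, gbit in bandwidth_under_contention(SHARES, LINK_GBIT).items():
        print(f"{name}: {gbit:.1f} Gbit/s")
```

With all four traffic types pushing as hard as they can, this works out to 5 Gbit/s for Virtual SAN, 2.5 for vMotion, 1.5 for virtual machines and 1 for management. Pass a smaller `active` list to see how the split changes when only some traffic types are busy, which is exactly why shares are more flexible than limits.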

You can imagine that when you select the uplinks used for the various types of traffic in a smart way that even more bandwidth can be leveraged by the various traffic types. After giving it some thought, this is what I would recommend per traffic type:

Management Network VMkernel interface = Explicit Fail-over order = P1 active / P2 standby
vMotion VMkernel interface = Explicit Fail-over order = P1 active / P2 standby
Virtual Machine Portgroup = Explicit Fail-over order = P1 active / P2 standby
Virtual SAN VMkernel interface = Explicit Fail-over order = P2 active / P1 standby
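The effect of this fail-over order can be sketched in a few lines of Python. This is only a toy model of the teaming policy listed above (the dictionary and the `placement` function are names I made up for illustration, not a vSphere API), but it shows which traffic types end up on which physical port in a normal state and after a port failure.

```python
# Teaming policy from the list above: (active uplink, standby uplink) per traffic type.
FAILOVER_ORDER = {
    "Management Network": ("P1", "P2"),
    "vMotion": ("P1", "P2"),
    "Virtual Machine": ("P1", "P2"),
    "Virtual SAN": ("P2", "P1"),
}


def placement(failover_order, failed=frozenset()):
    """Return {uplink: [traffic types]} for a given set of failed uplinks."""
    result = {}
    for traffic, order in failover_order.items():
        # Explicit fail-over order: use the first uplink in the list that is still healthy.
        uplink = next((port for port in order if port not in failed), None)
        if uplink is not None:
            result.setdefault(uplink, []).append(traffic)
    return result


if __name__ == "__main__":
    print("normal:   ", placement(FAILOVER_ORDER))
    print("P2 failed:", placement(FAILOVER_ORDER, failed={"P2"}))
```

In the normal state Virtual SAN has P2 to itself; when one port fails, all four traffic types land on the remaining port, and that is the situation the shares are there for.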

Why use Explicit Fail-over order for these types? The best explanation here is predictability. By separating traffic types we allow for optimal storage performance while also providing vMotion and virtual machine traffic sufficient bandwidth.

Also, vMotion traffic is bursty and can / will consume all available bandwidth, so when combined with Virtual SAN on the same uplink you could see how these two could potentially hurt each other, depending of course on the IO profile of your virtual machines and the type of operations being done. But you can see how a vMotion of a virtual machine provisioned with a lot of memory can impact the available bandwidth for other traffic types. Don’t ignore this, use Network IO Control!

Let's try to visualize things, as that makes it easier to digest. Just to be clear, dotted lines are “standby” and the others are “active”.

I hope this provides some guidance around how to configure Virtual SAN and Network IO Control in a VSAN environment. Of course there are various ways of doing it; this is my recommendation and my attempt to keep things simple, based on experience with the products.

4 is the minimum number of hosts for VSAN if you ask me

Duncan Epping · Oct 24, 2013 ·

<Update 1-oct-15>Make sure to read this article also as it is based on Virtual SAN 6.1, which is the current latest version </update>

What is the minimum number of hosts for VSAN? This is one of those discussions which is difficult… I mean, what is the minimum number of hosts for vSphere HA for instance. If you ask anyone that question then most people will say: the minimum number for HA is 2. However, when you think about why you are using vSphere HA then you will realize pretty quick that the actual minimum number is 3.

Why is that? Well you can imagine that when you need to upgrade your hosts you also want some form of resiliency for your virtual machines. Guess what, if you have only 2 hosts and you are upgrading 1 of them and the other fails… Where would your virtual machines be restarted? I can give you the answer: nowhere. The only host you had left is in maintenance mode and undergoing an upgrade. So in that case you are … euhm screwed.

Now let's look at VSAN. In order to comply with a “number of failures to tolerate = 1” policy you will need 3 hosts at a minimum at all times. Even if 1 host fails miserably you can still access your data, because with 3 hosts, 2 mirror copies and a witness you will still have > 50% of your copies available. But what happens when you place one of those hosts in maintenance mode?

Well I guess when both remaining hosts keep on functioning as expected then all VMs will just keep on running, however if one fails… then… then you have a challenge. So think about the number of hosts you want to have supporting your VSAN datastore!

I guess the question then arises, with this “number of failures to tolerate” policy, how many hosts do I need at a minimum? How many mirror copies will be created and how many witnesses? Also, how many hosts will I need when I want to take “maintenance mode” in to consideration?

Number of Failures	Mirror copies	Witnesses	Min. Hosts	Hosts + Maintenance
0	1	0	1 host	n/a
1	2	1	3 hosts	4 hosts
2	3	2	5 hosts	6 hosts
3	4	3	7 hosts	8 hosts
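The pattern in this table is simple enough to write down as a small function. For “number of failures to tolerate” = n the table shows n + 1 mirror copies, n witnesses and 2n + 1 hosts, plus one extra host if you want to be able to place a host in maintenance mode and still comply with the policy. The sketch below is only that arithmetic, read straight off the table; the function name and the returned keys are hypothetical.

```python
def vsan_host_requirements(failures_to_tolerate):
    """Reproduce one row of the table above for a given failures-to-tolerate value."""
    n = failures_to_tolerate
    if n < 0:
        raise ValueError("failures_to_tolerate must be 0 or higher")
    min_hosts = 2 * n + 1
    return {
        "mirror_copies": n + 1,
        "witnesses": n,
        "min_hosts": min_hosts,
        # The table lists n/a for 0 failures: there is no redundancy to preserve.
        "hosts_with_maintenance": min_hosts + 1 if n > 0 else None,
    }


if __name__ == "__main__":
    for n in range(4):
        print(n, vsan_host_requirements(n))
```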

I hope that helps in making the right decision…

How to configure the Virtual SAN observer for monitoring/troubleshooting

Duncan Epping · Oct 21, 2013 ·

There have been various blog posts on the topic of configuring the Virtual SAN observer on both Windows and Linux by Rawlinson Rivera and Erik Bussink. I like to keep things in a single location and document them for my own use so I figured I would do a write-up for yellow-bricks.com. First of all, what is the Virtual SAN / VSAN observer? One of our engineers (Christian Dickmann) published an internal blog on this topic and I believe it explains what it is / what it does best:

You will also find VSAN datastore as well as VM level performance statistics in the vSphere Web Client. If however you are the kind of guy who wants to really drill down on your VSAN performance in-depth, down to the physical disk layers, understand cache hit rates, reasons for observed latencies, etc. then the vSphere Web Client won’t satisfy your thirst in vSphere 5.5. That’s where the VSAN observer comes in.

So how do I enable it? Well I am a big fan of the vCenter Server Appliance so that will be my focus. Just a couple of short steps to get this up and running luckily:

Open an ssh session to your vCenter Server Appliance:
- ssh root@<name or ip of your vcva>
Open rvc using your root account and the vCenter name, in my case:
- rvc root@localhost
Now do a “cd” into your vCenter object (you can do an “ls” to see what the names of your objects are on any level); if you hit tab it will be completed with your datacenter object:
- cd localhost/Datacenter/
Now do a “cd” again, the first object is “computers” and the second is your “cluster”, in my case that looks as follows:
- cd computers/VSANCluster/
Now you can start the VSAN observer using the following command:
- vsan.observer . --run-webserver --force
Now you can see the observer querying stats every 60 seconds, and as mentioned you can stop this by doing a <Ctrl>+<C>

Fairly straightforward, right? You can now go to the observer console using:

http://<vcenter name or ip>:8010
The below is what it should look like (Thanks Rawlinson for the nice screenshot)

Now one thing that is important to realize is that everything is kept in memory until you stop the VSAN observer… So it will take up GBs after a couple of hours. This tool is intended for short term monitoring and troubleshooting. Now there are some other commands in RVC that might be useful. One of the commands I found useful was “vsan.resync_dashboard”. Basically it shows you what is happening in terms of mirror sync’ing. If you fail a host, you should see the sync happening here…

I also found “vsan.vm_object_info” very useful and interesting as it allows you to see the state of your objects. And for the geeks who do not prefer to see the pretty graphs the observer shows, take a look at “vsan.vm_perf_stats”.

Startup News Flash part 7

Duncan Epping · Oct 16, 2013 ·

VMworld Europe is this week and I’ve been very busy just running around on the show floor and doing sessions. Considering there were a couple of small but worthy updates I figured I would publish this one in between sessions… Here it is: Startup News Flash part 7.

Tintri announced yesterday that Ken Klein is taking on the role of Chief Executive Officer, and former CEO and founder Kieran Harty assumes the new role of Chief Technology Officer and will drive Tintri’s product strategy and roadmap.

Pernix has just announced a program called PernixPro, which gives industry experts free access to PernixData FVP software + various tools for collaborating with PernixData experts and R&D. If you are a vExpert or a VCDX and want to get familiar with FVP, sign up here.

“Traditionally” SimpliVity has been more focused on generic server virtualization with a high level of integration with regards to DR and Back-up / Recovery. This week SimpliVity announced they are entering the VDI space. They announced new partnership agreements with NVIDIA and Teradici. What I like about their platform is that, although they offer a hyperconverged solution, you can connect from the outside in, meaning you can scale compute independent of storage. Also, their platform offers inline dedupe, optimized full clones, and 1:1 persistent desktops. For more details hit up their website.

Nutanix just announced that they have been validated by VMware for the VMware Horizon View Agent Direct-Connection, which is part of the Horizon Suite 5.3.

For those who missed it, read the Startup Intro I posted this week on CohoData. Interesting company / solution if you ask me!

Startup intro: Coho Data

Duncan Epping · Oct 15, 2013 ·

Today a new startup is revealed named Coho Data, formerly known as Convergent.io. Coho Data was founded by Andrew Warfield, Keir Fraser and Ramana Jonnala, who are probably best known for the work they did at Citrix on XenServer. For those who care, they are backed by Andreessen Horowitz. What is it they introduced / revealed this week?

Coho Data introduces a new scale-out hybrid storage solution (NFS for VM workloads), with hybrid meaning a mix of SATA and SSD. This is for obvious reasons: SATA brings you capacity and flash provides you raw performance. Let me point out that Coho is not a hyperconverged solution, it is a full storage system.

What does it look like? It is a 2U box which holds 2 “MicroArrays”, with each MicroArray having 2 processors, 2 x 10GbE NIC ports and 2 PCIe Intel 910 cards. Each 2U block provides you 39TB of capacity and ~180K IOPS (random 80/20 read/write, 4K block size). Pricing starts at $2.50 per GB, pre-dedupe & compression (which they of course offer). A couple of things I liked looking at their architecture: first and probably foremost the “scale-out” architecture; scale to infinity in a linear fashion is what they say. On top of that, it comes with an OpenFlow-enabled 10GbE switch to allow for ease of management and, again, scalability.

If you look closely at how they architected their hardware, they created these high-speed IO lanes: 10GbE NIC <–> CPU <–> PCIe Flash Unit. Each highway has its dedicated CPU and NIC port, and on top of that the PCIe flash, allowing for optimal performance, efficiency and fine-grained control. Nice touch if you ask me.

Another thing I really liked was their UI. You can really see they put a lot of thought into the user experience aspect by keeping things simple and presenting data in an easily understandable way. I wish every vendor did that. I mean, if you look at the screenshot below how simple does that look? Dead simple right!? I’ve seen some of the other screens, like for instance for creating a snapshot schedule… again same simplicity. Apparently, and I have not tested this but I will take them at their word, they brought that simplicity all the way down to the “install / configure” part of things. Getting Coho Data up and running literally only takes 15 minutes.

What I also liked very much about the Coho Data solution is that Software-defined Networking (SDN) and Software-defined Storage (SDS) are tightly coupled. In other words, Coho configures the network for you… As just said, it takes 15 minutes to set up. Try creating the zoning / masking scheme for a storage system and a set of LUNs these days, even that takes more time than 15 – 20 minutes. There aren’t too many vendors combining SDN and SDS in a smart fashion today.

When they briefed me they gave me a short demo and Andy explained the scale-out architecture. During the demo it happened various times that I could draw a parallel between the VMware virtualization platform and their solution, which made it easy for me to understand and relate to their solution. For instance, Coho Data offers what I would call DRS for Software-Defined Storage. If for whatever reason defined policies are violated then Coho Data will balance the workload appropriately across the cluster. Just like DRS (and Storage DRS) does, Coho Data will do a risk/benefit analysis before initiating the move. I guess the logical question would be, well why would I want Coho to do this when VMware can also do this with Storage DRS? Well keep in mind that Storage DRS works “across datastores”, but as Coho presents a single datastore you need something that allows you to balance within.

I guess the question then remains what do they lack today? Well today as a 1.0 platform Coho doesn’t offer replication to outside of their own cluster. But considering they have snapshotting in place I suspect their architecture already caters for it, and it is something they should be able to release fairly quickly. Another thing which is lacking today is a vSphere Web Client plugin, but then again if you look at their current UI and the simplicity of it I do wonder if there is any point in having one.

All in all, I have been impressed by these newcomers in the SDS space and I can’t wait to play around with their gear at some point!