vSphere Metro Storage Cluster using Virtual SAN, can I do it?

This question keeps coming up lately: vSphere Metro Storage Cluster (vMSC) using Virtual SAN, can I do it? (I guess the real question is “should you do it”.) It seems that Virtual SAN/VSAN is getting more and more traction, even though it is still in beta, and people are trying to come up with all sorts of interesting use cases. At VMworld various people asked during my sessions if they could use VSAN to implement a vMSC solution, and over the last couple of weeks this question just keeps coming up in emails etc.

I guess if you look at what VSAN is and does, it makes sense for people to ask this question. It is a distributed storage solution with a synchronous distributed caching layer that allows for high resiliency. You can specify the number of copies required of your data and VSAN will take care of the magic for you; if a component of your cluster fails, VSAN can respond to it accordingly. This is what you would like to see, I guess:

Now let it be clear, the above is what you would like to see in a stretched environment, but unfortunately it is not what VSAN can do in its current form. I guess if you look at the following it becomes clear why it might not be such a great idea to use VSAN for this use case at this point in time.

The problem here is:

  • Object placement: You will want that second mirror copy to be in Location B but you cannot control it today as you cannot define “failure domains” within VSAN at the moment.
  • Witness placement: Essentially you want to have the ability to have a 3rd site that functions as a tiebreaker when there is a partition / isolation event.
  • Support: No one has tested/certified VSAN over distance, in other words… not supported

For now, the answer to the question “can I use Virtual SAN to build a vSphere Metro Storage Cluster?” is: no, it is not supported to span a VSAN cluster over distance. The feedback and requests from many of you have been heard loud and clear by our developers and PM team… At VMworld one of the developers already mentioned that he was intrigued by the use case and would be looking into it in the future. Of course, there was no mention of when this would happen or even if it would ever happen.

Virtual SAN and Network IO Control

Since I started playing with Virtual SAN there was something that I more or less avoided / neglected, and that is Network IO Control. However, Virtual SAN and Network IO Control should go hand-in-hand (and as such the Distributed Switch). Note that when using VSAN (beta) the Distributed Switch and Network IO Control come with it. I guess I skipped it as there were more exciting things to talk about, but as more and more people are asking about it I figured it is time to discuss Virtual SAN and Network IO Control. Before we get started, let’s list the types of networks we will have within the VSAN cluster:

  • Management Network
  • vMotion Network
  • Virtual SAN Network
  • Virtual Machine Network

Considering it is recommended to use 10GbE with Virtual SAN, that is what I will assume for this blog post. In most of these cases, at least I would hope, there will be a form of redundancy and as such we will have 2 x 10GbE at our disposal. So how would I recommend configuring the network?

Let’s start with the various portgroups and VMkernel interfaces:

  • 1 x Management Network VMkernel interface
  • 1 x vMotion VMkernel interface (All interfaces need to be in the same subnet)
  • 1 x Virtual SAN VMkernel interface
  • 1 x Virtual Machine Portgroup

Some of you might be surprised that I have only listed the vMotion VMkernel interface and the Virtual SAN VMkernel interface once… After various discussions and after thinking about this for a while, I figured I would keep things as simple as possible, especially considering the average IO profile of server environments.

We can make sure the various traffic types are separated on different physical ports, but we can also set limits and shares when desired. I do not recommend using limits though: why limit a traffic type when you can use shares and “artificially limit” your traffic types based on resource usage and demand?! Also note that shares and limits are enforced per uplink.

So we will be using shares, as shares only come into play when there is contention. What we will do is take the full 20GbE into account and carve it up. The easiest way, if you ask me, is to say each traffic type gets a number of GbE assigned at a minimum, based on some of the recommendations out there for these types of traffic:

  • Management Network –> 1GbE
  • vMotion VMK –> 5GbE
  • Virtual Machine PG –> 2GbE
  • Virtual SAN VMkernel interface –> 10GbE

Now as you can see, “management”, “virtual machine” and “vMotion” traffic share Port 1 and “Virtual SAN” traffic uses Port 2. This way we have sufficient bandwidth for all the various types of traffic in a normal state. We also want to make sure that no traffic type can push out other types of traffic; for that we will use the Network IO Control shares mechanism.

Now let’s look at it from a shares perspective. You will want to make sure that, for instance, vMotion and Virtual SAN always have sufficient bandwidth. I will work under the assumption that only 1 physical port is available and all traffic types share that same physical port. We know this is not the case, but let’s take a “worst case scenario” approach.

Let’s use the share values from the table below (200 shares in total) and take the worst case scenario into account where 1 physical 10GbE port has failed and only 1 is used for all traffic. By taking this approach you ensure that Virtual SAN always has 50% of the bandwidth at its disposal while leaving the remaining traffic types with sufficient bandwidth to avoid a potential self-inflicted DoS. (The arithmetic is spelled out in the sketch after the table.)

Traffic Type                     Shares   Limit
Management Network                   20   n/a
vMotion VMkernel Interface           50   n/a
Virtual Machine Portgroup            30   n/a
Virtual SAN VMkernel Interface      100   n/a
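
To make the arithmetic behind that 50% claim explicit, here is a minimal sketch (plain Python, nothing VMware specific, the names and numbers simply mirror the table above) that works out the guaranteed bandwidth per traffic type when everything is forced onto a single 10GbE uplink:

  # Share values from the table above; purely illustrative.
  shares = {
      "Management Network": 20,
      "vMotion VMkernel Interface": 50,
      "Virtual Machine Portgroup": 30,
      "Virtual SAN VMkernel Interface": 100,
  }

  uplink_gbe = 10                      # worst case: only one 10GbE uplink left
  total_shares = sum(shares.values())  # 200 shares in total

  for traffic_type, share in shares.items():
      guaranteed = uplink_gbe * share / total_shares
      print(f"{traffic_type:32s} {guaranteed:4.1f} GbE minimum under contention")

Running this gives Virtual SAN 5GbE (50% of the uplink), vMotion 2.5GbE, virtual machine traffic 1.5GbE and management 1GbE as the minimums under contention; when there is no contention any traffic type can of course still burst beyond its share.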

You can imagine that when you select the uplinks used for the various types of traffic in a smart way, even more bandwidth can be leveraged by the various traffic types. After giving it some thought, this is what I would recommend per traffic type:

  • Management Network VMkernel interface = Explicit Fail-over order = P1 active / P2 standby
  • vMotion VMkernel interface = Explicit Fail-over order = P1 active / P2 standby
  • Virtual Machine Portgroup = Explicit Fail-over order = P1 active / P2 standby
  • Virtual SAN VMkernel interface = Explicit Fail-over order = P2 active / P1 standby
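
If it helps to reason about what this fail-over order means in practice, below is a small sketch (plain Python, purely illustrative, the “P1” / “P2” labels are just this example’s names for the two uplinks) that shows which uplink each traffic type ends up on when both ports are healthy and when one of them has failed:

  # Per portgroup: first entry is the active uplink, second is standby.
  failover_order = {
      "Management Network": ["P1", "P2"],
      "vMotion":            ["P1", "P2"],
      "Virtual Machine":    ["P1", "P2"],
      "Virtual SAN":        ["P2", "P1"],
  }

  def placement(healthy_uplinks):
      # Each traffic type uses the first uplink in its fail-over order
      # that is still healthy (None if both uplinks are down).
      return {traffic: next((p for p in order if p in healthy_uplinks), None)
              for traffic, order in failover_order.items()}

  print(placement({"P1", "P2"}))  # normal state: Virtual SAN alone on P2, the rest on P1
  print(placement({"P2"}))        # P1 failed: everything lands on P2 and the NIOC shares kick in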

Why use Explicit Fail-over order for these types? The best explanation here is predictability. By separating traffic types we allow for optimal storage performance while also providing vMotion and virtual machine traffic sufficient bandwidth.

Also, vMotion traffic is bursty and can / will consume all available bandwidth, so when combined with Virtual SAN on the same uplink you can see how these two could potentially hurt each other, depending of course on the IO profile of your virtual machines and the type of operations being done. You can see how a vMotion of a virtual machine provisioned with a lot of memory can impact the available bandwidth for other traffic types. Don’t ignore this, use Network IO Control!

Let’s try to visualize things, it makes it easier to digest. Just to be clear, dotted lines are “standby” and the others are “active”.

[Diagram: Virtual SAN and Network IO Control uplink layout]

I hope this provides some guidance around how to configure Network IO Control in a Virtual SAN environment. Of course there are various ways of doing it; this is my recommendation and my attempt to keep things simple, based on my experience with the products.

4 is the minimum number of hosts for VSAN if you ask me

What is the minimum number of hosts for VSAN? This is one of those discussions which is difficult… I mean, what is the minimum number of hosts for vSphere HA, for instance? If you ask anyone that question, most people will say: the minimum number for HA is 2. However, when you think about why you are using vSphere HA, you will realize pretty quickly that the actual minimum number is 3.

Why is that? Well you can imagine that when you need to upgrade your hosts you also want some form of resiliency for your virtual machines. Guess what, if you have only 2 hosts and you are upgrading 1 of them and the other fails… Where would your virtual machines be restarted? I can give you the answer: nowhere. The only host you had left is in maintenance mode and undergoing an upgrade. So in that case you are … euhm screwed.

Now let’s look at VSAN. In order to comply with a “number of failures to tolerate = 1” policy you will need 3 hosts at a minimum at all times. Even if 1 host fails miserably you can still access your data, because with 3 hosts, 2 mirror copies and a witness you will still have > 50% of your components available. But what happens when you place one of those hosts in maintenance mode?

Well, I guess as long as both remaining hosts keep functioning as expected, all VMs will just keep on running; however, if one fails… then… then you have a challenge. So think about the number of hosts you want to have supporting your VSAN datastore!

I guess the question then arises: with this “number of failures to tolerate” policy, how many hosts do I need at a minimum? How many mirror copies and how many witnesses will be created? And how many hosts will I need when I want to take “maintenance mode” into consideration?

Number of Failures   Mirror copies   Witnesses   Min. Hosts   Hosts + Maintenance
0                    1               0           1 host       n/a
1                    2               1           3 hosts      4 hosts
2                    3               2           5 hosts      6 hosts
3                    4               3           7 hosts      8 hosts
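
For those who like to see the pattern behind the table: per object you get n+1 mirror copies and n witnesses, an object stays accessible as long as more than 50% of those components survive, and since every component needs to live on a separate host you end up with 2n+1 hosts at a minimum (plus one more if you want to be able to place a host in maintenance mode). A short sketch (plain Python, not a VMware API) that reproduces the table:

  def vsan_layout(failures_to_tolerate):
      n = failures_to_tolerate
      mirrors = n + 1                  # data copies per object
      witnesses = n                    # tie-breaker components
      min_hosts = mirrors + witnesses  # 2n + 1, each component on its own host
      maintenance = min_hosts + 1 if n > 0 else None  # spare host for maintenance mode
      return mirrors, witnesses, min_hosts, maintenance

  for n in range(4):
      print(n, vsan_layout(n))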

I hope that helps in making the right decision…

How to configure the Virtual SAN observer for monitoring/troubleshooting

There have been various blog posts on the topic of configuring the Virtual SAN observer on both Windows and Linux by Rawlinson Rivera and Erik Bussink. I like to keep things in a single location and document them for my own use, so I figured I would do a write-up for yellow-bricks.com. First of all, what is the Virtual SAN / VSAN observer? One of our engineers (Christian Dickmann) published an internal blog on this topic and I believe it explains best what it is and what it does:

You will also find VSAN datastore as well as VM level performance statistics in the vSphere Web Client. If however you are the kind of guy who wants to really drill down on your VSAN performance in-depth, down to the physical disk layers, understand cache hit rates, reasons for observed latencies, etc. then the vSphere Web Client won’t satisfy your thirst in vSphere 5.5. That’s where the VSAN observer comes in.

So how do I enable it? Well, I am a big fan of the vCenter Server Appliance so that will be my focus. Luckily it is just a couple of short steps to get this up and running:

  • Open an ssh session to your vCenter Server Appliance:
    • ssh root@<name or ip of your vcva>
  • Open rvc using your root account and the vCenter name, in my case:
    • rvc root@localhost
  • Now do a “cd” into your vCenter object (you can do an “ls” to see the names of your objects at any level); if you press tab it will be completed with your datacenter object:
    • cd localhost/Datacenter/
  • Now do a “cd” again, the first object is “computers” and the second is your “cluster”, in my case that looks as follows:
    • cd computers/VSANCluster/
  • Now you can start the VSAN observer using the following command:
    • vsan.observer . --run-webserver --force
  • Now you will see the observer querying stats every 60 seconds; you can stop it with <Ctrl>+<C>

Fairly straight forward right? You can now go to the observer console using:

  • http://<vcenter name or ip>:8010
    The below is what it should look like (Thanks Rawlinson for the nice screenshot)
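
If you want to verify from a script that the observer web server is actually up, a trivial check like the one below will do. This is just a sketch: the hostname is a made-up example and the port is simply the default 8010 mentioned above.

  import urllib.request

  # Hypothetical vCenter Server Appliance address; adjust to your environment.
  observer_url = "http://vcva.lab.local:8010/"

  try:
      with urllib.request.urlopen(observer_url, timeout=5) as response:
          print("VSAN observer is reachable, HTTP status:", response.status)
  except OSError as error:
      print("VSAN observer is not reachable:", error)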

Now one thing that is important to realize is that everything is kept in memory until you stop the VSAN observer… so it will take up GBs of memory after a couple of hours. This tool is intended for short term monitoring and troubleshooting. There are some other commands in RVC that might be useful. One of the commands I found useful was “vsan.resync_dashboard”. Basically it shows you what is happening in terms of mirror syncing. If you fail a host, you should see the sync happening here…

I also found “vsan.vm_object_info” very useful and interesting, as it allows you to see the state of your objects. And for the geeks who prefer not to look at the pretty graphs the observer shows, take a look at “vsan.vm_perf_stats”.

Startup News Flash part 7

VMworld Europe is this week and I’ve been very busy just running around on the show floor and doing sessions. Considering there were a couple of small but worthy updates, I figured I would publish this one in between sessions… Here it is: Startup News Flash part 7.

Tintri announced yesterday that Ken Klein is taking on the role of Chief Executive Officer, and former CEO and founder Kieran Harty assumes the new role of Chief Technology Officer and will drive Tintri’s product strategy and roadmap.

PernixData has just announced a program called PernixPro, which gives industry experts free access to PernixData FVP software plus various tools for collaborating with PernixData experts and R&D. If you are a vExpert or a VCDX and want to get familiar with FVP, sign up here.

“Traditionally” SimpliVity has been more focused on generic server virtualization with a high level of integration with regards to DR and back-up / recovery. This week SimpliVity announced they are entering the VDI space, with new partnership agreements with NVIDIA and Teradici. What I like about their platform is that, although they offer a hyperconverged solution, you can connect from the outside in, meaning you can scale compute independently of storage. Also, their platform offers inline dedupe, optimized full clones, and 1:1 persistent desktops. For more details hit up their website.

Nutanix just announced that they have been validated by VMware for the VMware Horizon View Agent Direct-Connection, which is part of the Horizon Suite 5.3.

For those who missed it, read the Startup Intro I posted this week on CohoData. Interesting company / solution if you ask me!