
Yellow Bricks

by Duncan Epping



vSphere Metro Storage Cluster using Virtual SAN, can I do it?

Duncan Epping · Oct 31, 2013 ·

This question keeps coming up over and over again lately: vSphere Metro Storage Cluster (vMSC) using Virtual SAN, can I do it? (I guess the real question is “should you do it”.) Virtual SAN/VSAN is getting more and more traction, even though it is still in beta, and people are trying to come up with all sorts of interesting use cases. At VMworld various people asked during my sessions if they could use VSAN to implement a vMSC solution, and the last couple of weeks this question just keeps coming up in emails and elsewhere.

I guess if you look at what VSAN is and does, it makes sense for people to ask this question. It is a distributed storage solution with a synchronous distributed caching layer that allows for high resiliency. You specify the number of copies required of your data and VSAN takes care of the magic for you; if a component of your cluster fails, VSAN can respond to it accordingly. This is what you would like to see, I guess:

Now let it be clear, the above is what you would like to see in a stretched environment, but unfortunately it is not what VSAN can do in its current form. I guess if you look at the following it becomes clear why it might not be such a great idea to use VSAN for this use case at this point in time.

The problem here is:

  • Object placement: You will want that second mirror copy to be in Location B, but you cannot control this today as VSAN has no notion of “failure domains” at the moment.
  • Witness placement: Essentially you want to have the ability to have a 3rd site that functions as a tiebreaker when there is a partition / isolation event.
  • Support: No one has tested/certified VSAN over distance; in other words, it is not supported.
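The witness problem above can be illustrated with a small sketch. This is a hypothetical model of majority-based quorum, not VSAN's actual internal logic; the function name and component counts are illustrative only:

```python
# Hypothetical sketch of why a third-site witness/tiebreaker matters during a
# partition. This models generic majority quorum, not VSAN's internal design.

def object_available(surviving_components: int, total_components: int) -> bool:
    """An object stays available only if a strict majority of its components survive."""
    return surviving_components * 2 > total_components

# Two-site layout without a witness: one mirror copy per site, 2 components total.
# A partition leaves each site with 1 of 2 components -> no site has a majority,
# so neither side can safely continue.
assert not object_available(1, 2)

# Add a witness on a third site: 3 components total. The site that can still
# reach the witness holds 2 of 3 components and keeps the object available.
assert object_available(2, 3)
```

This is exactly why a tiebreaker site is listed as a requirement: without it, a split between Location A and Location B leaves both sides unable to form a majority.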

For now, the answer to the question “Can I use Virtual SAN to build a vSphere Metro Storage Cluster?” is: no, it is not supported to span a VSAN cluster over distance. The feedback and requests from many of you have been heard loud and clear by our developers and PM team… At VMworld one of the developers already mentioned that he was intrigued by the use case and would be looking into it in the future. Of course, there was no mention of when this would happen, or even if it would ever happen.

Virtual SAN and Network IO Control

Duncan Epping · Oct 29, 2013 ·

Since I started playing with Virtual SAN there was something that I more or less avoided/neglected, and that is Network IO Control. However, Virtual SAN and Network IO Control should go hand-in-hand (and, as such, the Distributed Switch). Note that when using VSAN (beta), the Distributed Switch and Network IO Control come with it. I guess I skipped it because there were more exciting things to talk about, but as more and more people are asking about it, I figured it is time to discuss Virtual SAN and Network IO Control. Before we get started, let's list the types of traffic we will have within the VSAN cluster:

  • Management Network
  • vMotion Network
  • Virtual SAN Network
  • Virtual Machine Network

Considering it is recommended to use 10GbE with Virtual SAN, that is what I will assume in this blog post. In most of these cases, at least I would hope, there will be a form of redundancy, and as such we will have 2 x 10GbE at our disposal. So how would I recommend configuring the network?

Let's start with the various portgroups and VMkernel interfaces:

  • 1 x Management Network VMkernel interface
  • 1 x vMotion VMkernel interface (All interfaces need to be in the same subnet)
  • 1 x Virtual SAN VMkernel interface
  • 1 x Virtual Machine Portgroup

Some of you might be surprised that I have listed the vMotion VMkernel interface and the Virtual SAN VMkernel interface only once… After various discussions and some thought, I figured I would keep things as simple as possible, especially considering the average IO profile of server environments.

We can make sure the various traffic types are separated on different physical ports, but we can also set limits and shares when desired. I do not recommend using limits though: why limit a traffic type when you can use shares and “artificially limit” your traffic types based on resource usage and demand? Also note that shares and limits are enforced per uplink.

So we will be using shares, as shares only come into play when there is contention. What we will do is take 20GbE into account and carve it up. The easiest way, if you ask me, is to say that each traffic type gets a number of GbE assigned at a minimum, based on some of the recommendations out there for these types of traffic:

  • Management Network –> 1GbE
  • vMotion VMK –> 5GbE
  • Virtual Machine PG –> 2GbE
  • Virtual SAN VMkernel interface –> 10GbE

Now as you can see, “management”, “virtual machine” and “vMotion” traffic share Port 1 and “Virtual SAN” traffic uses Port 2. This way we have sufficient bandwidth for all the various types of traffic in a normal state. We also want to make sure that no traffic type can push out other types of traffic; for that we will use the Network IO Control shares mechanism.

Now let's look at it from a shares perspective. You will want to make sure that, for instance, vMotion and Virtual SAN always have sufficient bandwidth. I will work under the assumption that only one physical port is available and all traffic types share that same physical port. We know this is not the case, but let's take a “worst case scenario” approach.

Let's take a worst case scenario into account where one physical 10GbE port has failed and only one is used for all traffic. With the shares assigned this way, you ensure that Virtual SAN always has 50% of the bandwidth at its disposal, while leaving the remaining traffic types sufficient bandwidth to avoid a potential self-inflicted DoS.

Traffic Type                      Shares   Limit
Management Network                20       n/a
vMotion VMkernel Interface        50       n/a
Virtual Machine Portgroup         30       n/a
Virtual SAN VMkernel Interface    100      n/a
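The worst-case math can be checked with a quick sketch, using the share values from the table above and a single surviving 10GbE uplink (the variable names are mine, just for illustration):

```python
# Quick check of the NIOC share math: with one 10GbE uplink left, each traffic
# type gets bandwidth proportional to its shares (shares only matter under contention).

shares = {
    "Management Network": 20,
    "vMotion VMkernel Interface": 50,
    "Virtual Machine Portgroup": 30,
    "Virtual SAN VMkernel Interface": 100,
}

link_gbe = 10  # single surviving 10GbE uplink, worst case
total = sum(shares.values())  # 200 shares in play on this uplink

bandwidth = {name: s / total * link_gbe for name, s in shares.items()}
for name, gbe in bandwidth.items():
    print(f"{name}: {gbe:.1f} GbE minimum under contention")
# Virtual SAN ends up with 100/200 = 50% of the link, i.e. 5 GbE guaranteed.
```

Note that these are minimum guarantees under contention; when other traffic types are idle, any traffic type can burst well beyond its share.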

You can imagine that when you select the uplinks for the various types of traffic in a smart way, even more bandwidth can be leveraged by the various traffic types. After giving it some thought, this is what I would recommend per traffic type:

  • Management Network VMkernel interface = Explicit Fail-over order = P1 active / P2 standby
  • vMotion VMkernel interface = Explicit Fail-over order = P1 active / P2 standby
  • Virtual Machine Portgroup = Explicit Fail-over order = P1 active / P2 standby
  • Virtual SAN VMkernel interface = Explicit Fail-over order = P2 active / P1 standby

Why use an explicit fail-over order for these types? The best explanation is predictability. By separating traffic types we allow for optimal storage performance while also providing vMotion and virtual machine traffic with sufficient bandwidth.
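The fail-over behavior can be sketched as follows. This is a toy model of the explicit fail-over order listed above, not an actual vSphere API; each traffic type simply uses the first healthy uplink in its preference list:

```python
# Sketch of the explicit fail-over order above: each traffic type has an ordered
# uplink preference and uses the first uplink that is still healthy.
# (Illustrative model only, not a vSphere API.)

FAILOVER_ORDER = {
    "Management": ["P1", "P2"],
    "vMotion": ["P1", "P2"],
    "Virtual Machine": ["P1", "P2"],
    "Virtual SAN": ["P2", "P1"],
}

def active_uplink(traffic_type: str, failed: set) -> str:
    """Return the uplink a traffic type would use, honoring its fail-over order."""
    for uplink in FAILOVER_ORDER[traffic_type]:
        if uplink not in failed:
            return uplink
    raise RuntimeError("no healthy uplink left")

# Normal state: Virtual SAN is alone on P2, everything else shares P1.
assert active_uplink("Virtual SAN", failed=set()) == "P2"
assert active_uplink("vMotion", failed=set()) == "P1"

# P1 fails: all traffic types converge on P2 -- exactly the contention
# scenario the share values are sized for.
assert all(active_uplink(t, failed={"P1"}) == "P2" for t in FAILOVER_ORDER)
```

This also shows why the worst-case shares exercise matters: the moment either port fails, all four traffic types end up contending on a single 10GbE uplink.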

Also, vMotion traffic is bursty and can/will consume all available bandwidth, so when combined with Virtual SAN on the same uplink you can see how these two could potentially hurt each other, depending of course on the IO profile of your virtual machines and the type of operations being done. You can see how a vMotion of a virtual machine provisioned with a lot of memory could impact the available bandwidth for other traffic types. Don't ignore this, use Network IO Control!

Let's try to visualize things; that makes it easier to digest. Just to be clear, dotted lines are “standby” and the others are “active”.

Virtual SAN and Network IO Control

I hope this provides some guidance on how to configure Network IO Control in a VSAN environment. Of course there are various ways of doing it; this is my recommendation and my attempt to keep things simple, based on experience with the products.

How to configure the Virtual SAN observer for monitoring/troubleshooting

Duncan Epping · Oct 21, 2013 ·

There have been various blog posts on the topic of configuring the Virtual SAN observer on both Windows and Linux by Rawlinson Rivera and Erik Bussink. I like to keep things in a single location and document them for my own use so I figured I would do a write-up for yellow-bricks.com. First of all, what is the Virtual SAN / VSAN observer? One of our engineers (Christian Dickmann) published an internal blog on this topic and I believe it explains what it is / what it does best:

You will also find VSAN datastore as well as VM level performance statistics in the vSphere Web Client. If however you are the kind of guy who wants to really drill down on your VSAN performance in-depth, down to the physical disk layers, understand cache hit rates, reasons for observed latencies, etc. then the vSphere Web Client won’t satisfy your thirst in vSphere 5.5. That’s where the VSAN observer comes in.

So how do I enable it? Well, I am a big fan of the vCenter Server Appliance, so that will be my focus. Luckily it is just a couple of short steps to get this up and running:

  • Open an ssh session to your vCenter Server Appliance:
    • ssh root@<name or ip of your vcva>
  • Open rvc using your root account and the vCenter name, in my case:
    • rvc root@localhost
  • Now do a “cd” into your vCenter object (you can do an “ls” to see the names of your objects at any level); if you press tab it will be completed with your datacenter object:
    • cd localhost/Datacenter/
  • Now do a “cd” again; the first object is “computers” and the second is your cluster. In my case that looks as follows:
    • cd computers/VSANCluster/
  • Now you can start the VSAN observer using the following command:
    • vsan.observer . --run-webserver --force
  • Now you will see the observer querying stats every 60 seconds; you can stop it at any time with <Ctrl>+<C>

Fairly straight forward right? You can now go to the observer console using:

  • http://<vcenter name or ip>:8010
    The below is what it should look like (Thanks Rawlinson for the nice screenshot)

Now one thing that is important to realize is that everything is kept in memory until you stop the VSAN observer… so it will take up GBs after a couple of hours. This tool is intended for short-term monitoring and troubleshooting. There are some other commands in RVC that might be useful. One of the commands I found useful is “vsan.resync_dashboard”. Basically it shows you what is happening in terms of mirror syncing. If you fail a host, you should see the sync happening here…

I also found “vsan.vm_object_info” very useful and interesting, as it allows you to see the state of your objects. And for the geeks who prefer raw numbers over the pretty graphs the observer shows, take a look at “vsan.vm_perf_stats”.

VC Ops included in the VMware Horizon Suite 5.3

Duncan Epping · Oct 15, 2013 ·

I was reading up on the announcements published today during VMworld. When talking about VDI/EUC with customers (and I am not an EUC guy, so I try to avoid this when I can), a couple of things always stood out: the first was storage problems and the second was monitoring. I think the announcements made today are a game-changer in that space, and I am sure that you will appreciate this:

New VMware Virtual SAN for Horizon View beta will deliver significantly lower upfront capital expense (CAPEX) and total cost of ownership (TCO) for virtual desktop infrastructure (VDI). The bundling of VMware vCenter Operations Manager for View in Horizon Suite, available at no additional cost, offers advanced VDI performance and operations management for large-scale virtual desktop production monitoring, advanced problem warning, faster time to resolution and complete infrastructure coverage.

How about that? I definitely think this is a great step forward, and I am happy to see that VC Ops in particular is being included with the Horizon Suite. I can definitely recommend implementing it to those who own the Horizon Suite; for those who do not own the Suite yet, it might be time to invest. Please note that VSAN is still in beta and has not been included from a licensing perspective, but it has been tested with the Horizon Suite. Use it in your test environments, play with it, etc., but do not run your production workloads on it yet. (Read Andre's article for more details on the Horizon Suite.)

** EDIT, there was a lot of confusion yesterday about VSAN being bundled or not. Apparently the press release was only supposed to say that you can use VSAN with the Horizon Suite. There is no support, no bundling, no technology preview. **

Pretty pictures Friday, the VSAN edition…

Duncan Epping · Oct 11, 2013 ·

I've been working on a whole bunch of VSAN diagrams… I've shared a couple already via Twitter and in various blog articles, but I liked the following two very much and figured I would share them with you as well, hoping they make sense to everyone. Also, if there are any other VSAN concepts you would like to see visualized, let me know and I will see what I can do.

The first one shows how VSAN mirrors writes to two active mirror copies. Writes need to be acknowledged by all active copies, but note that they are acknowledged as soon as they hit the flash buffer! De-staging from buffer to HDD is a completely independent process; even between the two hosts this happens independently.

VSAN write acknowledgement
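The write path in the diagram can be sketched as a toy model. The class and method names here are illustrative only, not VSAN internals; the point is that the acknowledgement depends on the flash buffers, while destaging is a separate per-host step:

```python
# Toy model of the write path described above: a write is acknowledged once every
# active mirror has it in its flash buffer; destaging to HDD happens later and
# independently per host. (Names are illustrative, not VSAN internals.)

class Mirror:
    def __init__(self):
        self.flash_buffer = []
        self.hdd = []

    def buffer_write(self, block):
        self.flash_buffer.append(block)

    def destage(self):
        # Independent, per-host process: move buffered blocks to magnetic disk.
        self.hdd.extend(self.flash_buffer)
        self.flash_buffer.clear()

def write(block, mirrors):
    for m in mirrors:
        m.buffer_write(block)
    return "ack"  # acknowledged as soon as all mirrors have buffered the block

m1, m2 = Mirror(), Mirror()
assert write("blk-0", [m1, m2]) == "ack"
assert m1.flash_buffer == ["blk-0"] and m1.hdd == []  # acked, but not yet on disk

m1.destage()  # only one host destages; the other is unaffected
assert m1.hdd == ["blk-0"] and m2.flash_buffer == ["blk-0"]
```

This is why write latency is a function of the flash tier and the network between the mirrors, not of the magnetic disks behind them.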

The second diagram is all about striping. When a stripe width is defined using the VM Storage Policies, objects will grow in increments of 1MB. In other words: stripe segment 1 will go to esxi-02, stripe segment 2 will go to esxi-03, and so on.

VSAN stripe increments
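The 1MB round-robin placement from the diagram can be sketched as a simple mapping. The host names match the diagram, but the function itself is just an illustration of the modulo pattern, not VSAN's placement code:

```python
# Sketch of the 1MB striping described above: segment N of an object lands on
# stripe component N modulo the stripe width. (Illustrative only.)

SEGMENT_SIZE = 1024 * 1024  # objects grow in 1MB stripe segments

def stripe_component(offset_bytes: int, stripe_width: int) -> int:
    """Map a byte offset to the 0-based stripe component that holds it."""
    return (offset_bytes // SEGMENT_SIZE) % stripe_width

hosts = ["esxi-02", "esxi-03"]  # stripe width of 2, as in the diagram
for segment in range(4):
    offset = segment * SEGMENT_SIZE
    print(f"segment {segment + 1} -> {hosts[stripe_component(offset, len(hosts))]}")
# segment 1 -> esxi-02, segment 2 -> esxi-03, segment 3 -> esxi-02, and so on.
```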

Just a little something I figured was nice to share on a Friday, some nice light pre-weekend/VMworld content 🙂


About the Author

Duncan Epping is a Chief Technologist and Distinguished Engineering Architect at Broadcom. Besides writing on Yellow-Bricks, Duncan is the co-author of the vSAN Deep Dive and the vSphere Clustering Deep Dive book series. Duncan is also the host of the Unexplored Territory Podcast.
