
Yellow Bricks

by Duncan Epping


VMware

Removing stretched VSAN configuration?

Duncan Epping · Dec 15, 2015 ·

I had a question today about how to safely remove a stretched VSAN configuration without putting any of the workloads at risk. To be honest, this is fairly straightforward, but there are one or two things that are important. (For those wondering why you would want to do this: some customers played with this option, started loading workloads on top of VSAN, and then realized it was still running in stretched mode.) Here are the steps required:

  1. Click on your VSAN cluster, go to Manage, and disable the stretched configuration
    • This will remove the witness host, but will leave the 2 fault domains intact
  2. Remove the two remaining fault domains
  3. Go to the Monitor section, click Health, and check the “virtual san object health”. Most likely it will be “red”, as the “witness components” have gone missing. By default VSAN will repair this automatically after 60 minutes, but we prefer to take step 4 as soon as possible after removing the fault domains!
  4. Click “repair object immediately”; the witness components will be recreated and the VSAN cluster will be healthy again
  5. Click “retest” after a couple of minutes
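The 60-minute default in step 3 can be sketched as a simple decision function. This is a toy model, not an API call; the delay value reflects the default mentioned above, and `repair_immediately` simply mirrors the “repair object immediately” button:

```python
# Toy model of vSAN's absent-component repair decision (not an actual API).
# vSAN waits a delay (60 minutes by default) before rebuilding components
# that have gone "absent", e.g. after removing the witness host.

REPAIR_DELAY_MINUTES = 60  # default repair delay

def should_rebuild(minutes_absent: float, repair_immediately: bool = False) -> bool:
    """Return True once vSAN would start recreating absent components."""
    if repair_immediately:  # the "repair object immediately" button
        return True
    return minutes_absent >= REPAIR_DELAY_MINUTES

print(should_rebuild(5))                           # False: still waiting
print(should_rebuild(5, repair_immediately=True))  # True: forced rebuild
print(should_rebuild(75))                          # True: delay expired
```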

By the way, that “repair object immediately” feature can also be used in the case of a regular host failure where “components” have gone absent. Very useful feature, especially if you don’t expect a host to return any time soon (hardware failure for instance) and have the spare capacity.

Doing a stretched VSAN design?

Duncan Epping · Nov 27, 2015 ·

As I have already given various people the formulas needed to calculate how much bandwidth is required, I figured I would share this here as well. If you are doing a stretched VSAN design, you will want to read this excellent paper by Jase McCarty. It describes the bandwidth requirements between the “data sites” and from the data sites to the “witness site”. It provides the formulas needed, and it shows that the “general guidelines” provided during launch were relatively conservative. In many cases, especially the connection to the witness location can be low bandwidth. Have a read when you are designing a stretched VSAN, and do the math.

http://www.vmware.com/files/pdf/products/vsan/vmware-virtual-san-6.1-stretched-cluster-bandwidth-sizing.pdf
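The sizing rules in that paper can be turned into a small calculator. The multipliers below (a 1.4 data multiplier, a 1.25 resync multiplier, and roughly 2 Mbps of witness bandwidth per 1000 components) are my reading of the paper's defaults, so treat them as assumptions and verify them against the document itself:

```python
# Rough vSAN stretched-cluster bandwidth calculator. The constants below are
# assumptions based on the sizing paper's rules of thumb; check the paper.

DATA_MULTIPLIER = 1.4     # overhead on top of raw write bandwidth
RESYNC_MULTIPLIER = 1.25  # headroom for resync traffic

def inter_site_bandwidth_mbps(write_bandwidth_mbps: float) -> float:
    """Required bandwidth between the two data sites."""
    return write_bandwidth_mbps * DATA_MULTIPLIER * RESYNC_MULTIPLIER

def witness_bandwidth_mbps(num_components: int) -> float:
    """Required bandwidth to the witness site (~2 Mbps per 1000 components)."""
    return 2.0 * num_components / 1000

# Example: 10,000 writes/sec of 4 KB each is 320 Mbps of write traffic.
writes_mbps = 10_000 * 4_000 * 8 / 1e6  # bytes/sec to Mbps
print(round(inter_site_bandwidth_mbps(writes_mbps), 1))  # 560.0
print(witness_bandwidth_mbps(1000))                      # 2.0
```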

Dell FX2 platform certified for VSAN with storage blades!

Duncan Epping · Oct 8, 2015 ·

A couple of weeks ago the Dell FX2 disk controller was added to the Virtual SAN Compatibility Guide, and shortly after the Ready Node configurations were added as well. For those who haven't looked at the Dell FX2 platform: it is, in my opinion, hyper-converged on steroids. Not only can it provide you with 4 compute nodes in 2U, it also packs a 10GbE switch and can hold two storage blades with 16 disks each. What? Yes indeed, that is a lot of horsepower in a single system.

I am working with a customer right now who is designing a new cluster configuration leveraging the Dell FX2 platform. In this case they are planning on 16 hosts in total. After assessing their current workloads, they are going with the FC430 E5-2670 v3 series with 12 cores (dual processor). Each host will have 256GB of memory and boots from SD.

From a storage perspective they are looking to use the FD332 storage blades: two per FX2 chassis, fully maxed out with 32 drives in total, which is 8 drives per host. All-flash by the way, leveraging 1.6TB devices for the capacity tier and 400GB devices for the write cache. Yes, that is 38.4TB of raw capacity per FX2 chassis, times 4… ~153TB. It is no coincidence that the configuration is very similar to the “AF-6 Series – Dell FX2 Platform”; they prefer to use a certified and tested solution instead of picking their own components, which makes sense if you ask me.
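Those raw-capacity numbers check out; here is a quick sanity check in Python. The drive split per host is my inference (2 cache plus 6 capacity drives out of the 8, which is what makes the 38.4TB figure work, since cache does not count toward raw capacity):

```python
# Sanity-check the raw capacity figures for the described FX2 configuration.
# Assumed split: of the 8 drives per host, 2 are 400 GB cache devices and
# 6 are 1.6 TB capacity devices (cache does not count toward raw capacity).

HOSTS_PER_CHASSIS = 4
CAPACITY_DRIVES_PER_HOST = 6
CAPACITY_DRIVE_TB = 1.6
CHASSIS = 4

per_host_tb = CAPACITY_DRIVES_PER_HOST * CAPACITY_DRIVE_TB  # ~9.6 TB
per_chassis_tb = per_host_tb * HOSTS_PER_CHASSIS            # ~38.4 TB
total_tb = per_chassis_tb * CHASSIS                         # ~153.6 TB

print(round(per_chassis_tb, 1), round(total_tb, 1))  # 38.4 153.6
```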

One of the key reasons for them to go with all-flash is the upcoming beta. They want to get their hands dirty with functionality like deduplication, checksumming and RAID-5/6 (aka erasure coding) as soon as possible. All 4 chassis will run in one site first for testing purposes, and after the initial tests they are considering deploying them across two sites in a stretched configuration. They asked me what the big benefit of RAID-5 or RAID-6 over the network (aka erasure coding) is, and it definitely is the lower raw capacity requirement. With the current FTT=1 implementation, a 20GB disk requires an additional 20GB for availability reasons, which means 40GB in total. With a RAID-5 implementation instead of RAID-1, this 20GB disk would only require 26.6GB of disk space: a saving of more than 13GB immediately. And that is before any type of space efficiency (dedupe) is enabled. Anyway, back to the FX2.
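The capacity comparison above generalizes nicely: RAID-1 with FTT=1 stores two full copies (2.0x overhead), while vSAN's RAID-5 erasure coding uses a 3+1 layout (1.33x) and RAID-6 a 4+2 layout (1.5x):

```python
# Raw capacity required per protection scheme for a given usable size.

OVERHEAD = {
    "RAID-1 (FTT=1)": 2.0,    # two full copies
    "RAID-5 (3+1)":   4 / 3,  # 3 data segments + 1 parity
    "RAID-6 (4+2)":   6 / 4,  # 4 data segments + 2 parity
}

def raw_required_gb(usable_gb: float, scheme: str) -> float:
    return usable_gb * OVERHEAD[scheme]

for scheme in OVERHEAD:
    print(f"20 GB disk as {scheme}: {raw_required_gb(20, scheme):.1f} GB raw")
# RAID-1 needs 40.0 GB, RAID-5 only ~26.7 GB: over 13 GB saved per 20 GB disk.
```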

So far only “all-flash” configurations have made it to the VSAN Ready Node list, and of course the individual components are listed as well, such as the disk controller “FD332-PERC” (single and dual ROC); I have also seen the 1.8″ flash devices on the list. I am waiting to see what one of these boxes would cost in an all-flash configuration, and hoping to also see a hybrid configuration soon. I am a fan of the Dell FX2 systems, that is for sure.

2 is the minimum number of hosts for VSAN if you ask me

Duncan Epping · Oct 1, 2015 ·

In 2013 I wrote an article about the minimum number of hosts for Virtual SAN. Since then this post has started living a life of its own; somehow people have misunderstood it and used/abused it in many shapes and forms. When I look at the size of a traditional (non-VSAN) cluster, the minimum is 2 hosts. From an availability perspective I ask myself what risk I am willing to take. What does that mean?

In a previous life I did many projects for SMB customers. My SMB customers typically had somewhere in the range of 2-5 hosts, with the majority having 2-3. In many cases those with 2-3 hosts were running roughly the same number of virtual machines. The difference between the two situations, “2 hosts” versus “3 hosts”, was whether, during maintenance (upgrading / updating) or after a failure, there was still the ability to restart virtual machines after a secondary failure. Many customers decided to go with 2-node clusters, the key reason being price versus risk: during normal operations the risk is low, but the price of an additional host was relatively high.

Now compare this to Virtual SAN and you will see the same applies. With Virtual SAN we have a minimum of 3 hosts; well, in a ROBO configuration you can have 2 with an external witness. This means that from a support perspective the bare minimum of dedicated physical hosts required for VSAN is 2. There you go, 2 is the bare minimum for ROBO; for non-ROBO, 3 is the minimum. Fully supported, and it offers all the functionality a 4-host cluster does.

Is having an extra host a good plan? Yes of course it is. HA / DRS / VSAN (and any other scale-out storage solution for that matter) will benefit from more hosts. You as a customer need to ask yourself what the risk is, and if the cost is justifiable.

PS1: A question just came in, so I want to make sure this is clear: even in a 2-host ROBO configuration you can do maintenance! A single copy of the data plus the witness remains available, and together they will have quorum.

PS2: No, you cannot host your “witness” VM on the VSAN cluster itself. This is not supported, as the witness provides the quorum for the cluster and should sit outside of the cluster to provide certainty about its state in the case of a failure.
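The quorum argument in PS2 is easy to demonstrate: an object stays accessible only while more than half of its votes remain. With one vote per data copy and one on the witness (a simplification; real vSAN objects can carry weighted votes), a 2-node ROBO cluster survives a single host failure, but only if the witness sits outside the cluster:

```python
# Simplified vSAN quorum model: accessible while > 50% of votes remain.

def accessible(votes_total: int, votes_lost: int) -> bool:
    return (votes_total - votes_lost) * 2 > votes_total

# 2-node ROBO: 1 vote per data copy + 1 external witness vote = 3 votes.
print(accessible(3, votes_lost=1))  # True: one host down, 2 of 3 votes left

# If the witness VM ran on the failed host, you'd lose 2 votes at once.
print(accessible(3, votes_lost=2))  # False: 1 of 3 votes is not a majority
```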

vSAN licensing / packaging

Duncan Epping · Sep 14, 2015 ·

I’ve seen many questions on vSAN packaging over the last months so I figured I would share a table that shows what is possible with which license. A lot of the confusion is around the “ROBO” use case, and I want to make it crystal clear that you can deploy a 2-node ROBO configuration using Standard, Advanced or the special “vSAN for ROBO” 25VM pack that will be made available. Anyway, when it comes to functionality the table below should make it crystal clear what is included with what.

Before anyone asks, “stretched clusters” refers to the vSAN stretched cluster workflow / feature. Two data center rooms in the same building leveraging external witness capabilities through the stretched cluster workflow requires “Advanced”. Three datacenters stretched across campus distance using “fault domains” does not require Advanced, but can use Standard.

Also note that “vSAN Advanced” is included in the “Horizon Advanced” and “Horizon Enterprise” Suites. If you have either of those, I highly recommend testing vSAN; I am seeing more and more customers taking advantage of it. A great storage platform that performs extremely well and is really simple to manage is included in your suite, why not use it?!

The table below shows what the current licensing/packaging looks like for vSAN 6.6. Note that as of vSAN 6.5, “all-flash” is available in all licensing levels. In vSAN 6.6, “QoS” has been dropped down to Standard, and “Local Site Protection for Stretched Clusters” and “vSAN Encryption” have been added to Enterprise. For pricing, please contact your partner or a VMware sales rep.

| Feature | vSAN Standard | vSAN Advanced | vSAN Enterprise | vSAN for ROBO Standard | vSAN for ROBO Advanced |
|---|---|---|---|---|---|
| SPBM | X | X | X | X | X |
| Read/Write SSD Caching | X | X | X | X | X |
| Distributed RAID | X | X | X | X | X |
| Distributed Switch | X | X | X | X | X |
| Snapshots / Clones | X | X | X | X | X |
| Rack Awareness | X | X | X | X | X |
| Health Monitoring | X | X | X | X | X |
| vSphere Replication * | X | X | X | X | X |
| Two-Node ROBO Configuration | X | X | X | X | X |
| Two-Node Direct Connect | X | X | X | X | X |
| All-Flash | X | X | X | X | X |
| Quality of Service | X | X | X | X | X |
| Dedupe and Compression | | X | X | | X |
| RAID-5/6 | | X | X | | X |
| Stretched Cluster | | | X | | |
| Local Site Protection for Stretched Clusters | | | X | | |
| vSAN Encryption | | | X | | |

* vSphere Replication with a 5-minute RPO is new; this was exclusively certified for vSAN. In some material you will see this referred to as vSAN Replication.

The full licensing white paper can be found here.
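For scripted checks, the table above can also be expressed as data. The mapping below is transcribed from a subset of the table's rows (edition names shortened for brevity):

```python
# Which vSAN editions include which features (subset of the table above).

ALL = {"Standard", "Advanced", "Enterprise", "ROBO Standard", "ROBO Advanced"}

FEATURES = {
    "SPBM": ALL,
    "All-Flash": ALL,
    "Quality of Service": ALL,
    "Dedupe and Compression": {"Advanced", "Enterprise", "ROBO Advanced"},
    "RAID-5/6": {"Advanced", "Enterprise", "ROBO Advanced"},
    "Stretched Cluster": {"Enterprise"},
    "vSAN Encryption": {"Enterprise"},
}

def includes(edition: str, feature: str) -> bool:
    """Return True if the given edition includes the given feature."""
    return edition in FEATURES[feature]

print(includes("Standard", "RAID-5/6"))           # False
print(includes("Enterprise", "vSAN Encryption"))  # True
```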


About the Author

Duncan Epping is a Chief Technologist and Distinguished Engineering Architect at Broadcom. Besides writing on Yellow-Bricks, Duncan is the co-author of the vSAN Deep Dive and the vSphere Clustering Deep Dive book series. Duncan is also the host of the Unexplored Territory Podcast.
