
Yellow Bricks

by Duncan Epping


vSphere

Two host stretched vSAN cluster with Standard license?

Duncan Epping · Jan 24, 2017 ·

I was asked today whether it is possible to create a 2-host stretched cluster using a vSAN Standard license or a ROBO Standard license. First of all, from a licensing point of view the EULA states you are allowed to do this with a Standard license:

A Cluster containing exactly two Servers, commonly referred to as a 2-node Cluster, can be deployed as a Stretched Cluster. Clusters with three or more Servers are not allowed to be deployed as a Stretched Cluster, and the use of the Software in these Clusters is limited to using only a physical Server or a group of physical Servers as Fault Domains.

I figured I would give it a go in my lab. Exec summary: worked like a charm!

Loaded up the ROBO license:
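
For those who prefer to script this step instead of using the Web Client, below is a minimal pyVmomi sketch of assigning the key to the cluster. The vCenter address, credentials, cluster name and license key are placeholders for this example.

```python
# Minimal sketch: assign a (placeholder) vSAN ROBO/Standard key to a cluster with pyVmomi.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab only; use proper certificates in production
si = SmartConnect(host="vcenter.lab.local", user="administrator@vsphere.local",
                  pwd="VMware1!", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    cluster = next(c for c in view.view if c.name == "Stretched-Cluster")
    lam = content.licenseManager.licenseAssignmentManager
    # Assign the key to the cluster entity (identified by its managed object ID)
    lam.UpdateAssignedLicense(entity=cluster._moId,
                              licenseKey="XXXXX-XXXXX-XXXXX-XXXXX-XXXXX",
                              entityDisplayName=cluster.name)
finally:
    Disconnect(si)
```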

Go to the Fault Domains & Stretched Cluster section under “Virtual SAN” and click Configure. Add one host to the “preferred” and one to the “secondary” fault domain:

Select the Witness host:

Select the witness disks for the vSAN cluster:

Click Finish:

And then the 2-node stretched cluster is formed using a Standard or ROBO license:
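
If you want to double-check the result from a script rather than the Web Client, the sketch below lists each host and the vSAN fault domain it ended up in. It assumes the faultDomainInfo property that newer vSphere versions (6.0 and later) expose on the host vSAN configuration; the connection details and cluster name are again placeholders.

```python
# Sketch: print each clustered host and the vSAN fault domain it was placed in.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab only
si = SmartConnect(host="vcenter.lab.local", user="administrator@vsphere.local",
                  pwd="VMware1!", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    cluster = next(c for c in view.view if c.name == "Stretched-Cluster")
    for host in cluster.host:
        # faultDomainInfo is assumed to be populated once fault domains are configured
        fd = host.config.vsanHostConfig.faultDomainInfo
        print(f"{host.name}: fault domain = {fd.name if fd else 'n/a'}")
finally:
    Disconnect(si)
```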

Of course I tried the same with 3 hosts, which failed as my license does not allow me to create a stretched cluster larger than 1+1+1. And even if it had succeeded, the EULA clearly states that you are not allowed to do so; you need Enterprise licenses for that.

There you have it: a two-host stretched cluster using vSAN Standard. Nice, right?!

How to disable DRS for a single host in the cluster

Duncan Epping · Jan 17, 2017 ·

I saw an interesting question today: how do I disable DRS for a single host in a cluster? I thought about it, and you cannot do this within the UI; at least, there is no “disable DRS” option at the host level. You can enable or disable it at the cluster level, but that is it. There are, of course, ways to ensure a host is not considered by DRS:

  1. Place the host in maintenance mode
    This will result in the host not being used by DRS. However, it also means the host won’t be used by HA and you cannot run any workloads on it.
  2. Create “VM/Host” affinity rules
    Create rules that exclude the host that needs to be “DRS disabled”. That way current workloads will not run, or be considered to run, on that particular host. With “must” rules this is guaranteed; with “should” rules HA can still use the host for restarts, but unless there is severe memory pressure or you hit 100% CPU utilization DRS will not use it either. (A pyVmomi sketch of such a rule follows this list.)
  3. Disable the vMotion VMkernel interface
    This will result in not being able to vMotion any VMs to the host (or from it). However, HA will still consider it for restarts, you can run workloads on it, and the host will be considered for initial placement when powering on a VM.

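For option 2, the sketch below shows what such a rule could look like through the vSphere API with pyVmomi. The vCenter address, credentials, cluster name, host name and group/rule names are placeholders; it creates a “should” style VM/Host rule that keeps all current VMs on every host except the one you want DRS to leave alone.

```python
# Sketch: keep DRS away from one host by creating a "should" VM/Host rule that
# ties every VM to all *other* hosts in the cluster (option 2 above).
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

EXCLUDED_HOST = "esxi-03.lab.local"   # the host DRS should leave alone (placeholder)

ctx = ssl._create_unverified_context()  # lab only
si = SmartConnect(host="vcenter.lab.local", user="administrator@vsphere.local",
                  pwd="VMware1!", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    cluster = next(c for c in view.view if c.name == "Cluster-01")

    host_group = vim.cluster.HostGroup(
        name="AllHostsExceptExcluded",
        host=[h for h in cluster.host if h.name != EXCLUDED_HOST])
    vm_group = vim.cluster.VmGroup(
        name="AllClusterVMs",
        vm=list(cluster.resourcePool.vm))  # VMs in the root resource pool; extend for child pools
    rule = vim.cluster.VmHostRuleInfo(
        name="KeepVMsOffExcludedHost",
        enabled=True,
        mandatory=False,   # False = "should" rule; True would make it a "must" rule
        vmGroupName=vm_group.name,
        affineHostGroupName=host_group.name)

    spec = vim.cluster.ConfigSpecEx(
        groupSpec=[vim.cluster.GroupSpec(info=host_group, operation="add"),
                   vim.cluster.GroupSpec(info=vm_group, operation="add")],
        rulesSpec=[vim.cluster.RuleSpec(info=rule, operation="add")])
    cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
finally:
    Disconnect(si)
```
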
I will file a feature request for a “disable DRS on a particular host” option in the UI; I guess it could be useful in certain scenarios.

Can you run all-flash with vSAN 6.2 Standard license?

Duncan Epping · Jan 15, 2017 ·

As I get the following question a lot, I figured I would share the answer here as well: can you run all-flash with a vSAN 6.2 Standard license? Many of you will have seen the licensing change that was introduced with 6.5: vSAN licensing is no longer based on the storage hardware used, spindles or all-flash, so you can use the lowest license SKU either way. Which of course is great for those wanting to use 6.5, but what about those who want to stick with 6.0 U2, aka vSAN 6.2? (This also works for 6.0 and 6.1 of course, but I would highly recommend 6.2 with the latest patches!)

Well, there is a way to “downgrade” your license. (I would call it converting myself, but downgrade apparently is the official term for it.) There are three simple steps, which are described in the following KB but copied/pasted here for your convenience:

  1. Navigate to and log in to your MyVMware portal at www.myvmware.com.
  2. Locate the page with your licenses, and then select the license to convert. Once selected, click “Downgrade License Keys” from the drop down menu.
  3. Two downgrade options will be displayed in another drop down menu. Select “Virtual SAN 6 with All Flash Add-on” to convert your existing vSAN STD licenses to a STD version that includes the all-flash add-on.
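
Once the converted key has been added and assigned, you can verify from a script which license ended up on the cluster. Below is a small pyVmomi sketch; the connection details and cluster name are placeholders.

```python
# Sketch: print the license assigned to a cluster, e.g. to confirm the converted
# "Virtual SAN 6 with All Flash Add-on" key took effect.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab only
si = SmartConnect(host="vcenter.lab.local", user="administrator@vsphere.local",
                  pwd="VMware1!", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    cluster = next(c for c in view.view if c.name == "vSAN-Cluster")
    lam = content.licenseManager.licenseAssignmentManager
    for assignment in lam.QueryAssignedLicenses(entityId=cluster._moId):
        lic = assignment.assignedLicense
        print(f"{assignment.entityDisplayName}: {lic.name} ({lic.licenseKey})")
finally:
    Disconnect(si)
```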

VMs not getting killed after vMSC partition has lifted

Duncan Epping · Jan 12, 2017 ·

I was talking to a VMware partner over the past couple of weeks about challenges they had in a new vSphere Metro Storage Cluster (vMSC) environment. In their particular case they simulated a site partition. During the site partition three things were expected to happen:

  • VMs that were impacted by APD (or PDL) should be killed by vSphere HA Component Protection
    • If HA Component Protection does not work, vSphere should kill the VMs when the partition is lifted
  • VMs should be restarted by vSphere HA

The problems faced were two-fold. VMs were restarted by vSphere HA, however:

  • vSphere HA Component Protection did not kill the VMs
  • When the partition was lifted vSphere did not kill the VMs which had lost the lock to the datastore either

It took a while before we figured out what was going on, at least for one of the problems. Let’s start with the second problem: why aren’t the VMs killed when the partition is lifted? vSphere should do this automatically. Well, vSphere does do this automatically, but only when a guest operating system is installed and I/O is being issued. As soon as the VM issues an I/O, vSphere notices that the lock on the disk has been lost and obtained by another host, and kills the VM. If you have an “empty VM” this won’t happen, as there is no I/O going to the disk. (I’ve filed a feature request to kill VMs in this scenario as well, even without disk I/O or without a disk.) So how do you solve this? If you do any type of vSphere HA testing (with or without vMSC), make sure to install a guest OS so the test resembles real life.
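
If you also want to make sure a test VM keeps issuing disk I/O during this kind of failure testing, something as simple as the loop below, run inside the guest OS, is enough; the file path is just an example.

```python
# Trivial in-guest I/O generator for HA/vMSC failure testing: keeps writing and
# flushing a small file so the VM continuously issues disk I/O, which allows the
# host to detect the lost disk lock and kill the VM after a partition.
# The path below is only an example; point it at a disk backed by the datastore
# you are testing.
import os
import time

PATH = "/var/tmp/ha-io-test.bin"

while True:
    with open(PATH, "wb") as f:
        f.write(os.urandom(4096))   # 4 KB of random data
        f.flush()
        os.fsync(f.fileno())        # force the write down to the virtual disk
    time.sleep(1)
```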

Now back to the first problem. The fact that vSphere HA Component Protection did not kick in is still being debated, but I think there is a very specific reason for it. vSphere HA Component Protection is a feature that kills VMs on a host so they can be restarted when an APD or PDL scenario has occurred. However, it will only do this when:

  • It is certain the VM can be restarted on the other side (Conservative setting)
  • There are healthy hosts in the other partition, or their state is unknown (Aggressive setting)

The first one is clear, I guess (more info about this here), but what does the second one mean? Well, basically there are three options:

  • Availability of healthy host: Yes >> Terminate
  • Availability of healthy host: No >> Don’t Terminate
  • Availability of healthy host: Unknown >> Terminate

So in the case where you have VMCP set to aggressively fail over VMs, it will only do so when it knows hosts are available in the other site, or when it does not know the state of the hosts in the other site. If for whatever reason the hosts are deemed unhealthy, the answer to the question whether healthy hosts are available will be “No”, and as such the VMs will not be killed by VMCP. The question remains: why are these hosts reported as “unhealthy” in this partition scenario? That is something we are now trying to figure out. Potentially it could be caused by misconfigured heartbeat datastores, but this is still to be confirmed. If I learn more, I will update this article.
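
To make the three outcomes above a bit more tangible, here is the same decision expressed as a few lines of Python. This simply models the behaviour described in this post; it is not VMware code.

```python
# Model of the VMCP "Aggressive" termination decision described above: the VM is
# only killed when healthy hosts are known to exist in the other partition, or
# when their state is unknown; a definite "No" blocks the kill.
from enum import Enum

class HealthyHostAvailability(Enum):
    YES = "yes"
    NO = "no"
    UNKNOWN = "unknown"

def vmcp_aggressive_terminates(availability: HealthyHostAvailability) -> bool:
    """Return True when VMCP set to 'Aggressive' would terminate the affected VM."""
    return availability in (HealthyHostAvailability.YES, HealthyHostAvailability.UNKNOWN)

# The partition scenario in this post: the remaining hosts were reported as
# unhealthy ("No"), so the VMs were not killed.
assert vmcp_aggressive_terminates(HealthyHostAvailability.YES)
assert vmcp_aggressive_terminates(HealthyHostAvailability.UNKNOWN)
assert not vmcp_aggressive_terminates(HealthyHostAvailability.NO)
```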

I just received confirmation from development: heartbeat datastores need to be available on both sites for vSphere HA to identify this scenario correctly. If there are no heartbeat datastores available on both sites, it can happen that no hosts are marked as healthy, which means that VMCP will not instantly kill those VMs when the APD occurs.

Disk controllers for vSAN with or without cache?

Duncan Epping · Dec 13, 2016 ·

I got this question today and thought I had already written something on the topic, but as I cannot find anything I figured I would write up something quick. The question was whether a disk controller for vSAN should have cache or not. It is a fair question, as many disk controllers these days come with 1GB, 2GB or 4GB of cache.

Let it be clear that with vSAN you are required to disable the write cache at all times. The reason for this is simple: vSAN is in control of data consistency, and it does not expect a write cache (battery backed or not) in its data path. Make sure to disable it. From a read perspective you can have caching enabled. In some cases we see controllers where people simply set the write cache to 0% and the rest automatically becomes read cache. This is fully supported; however, our tests have shown that there is little added benefit in terms of performance. Theoretically there could be a gain, but as reads typically come from flash anyway, I would personally rather spend my money on flash capacity for vSAN.

My recommendation is fairly straightforward: use a plain pass-through disk controller without any fancy features. You don’t need RAID on the disk controller with vSAN, and you don’t need caching on the disk controller with vSAN; keep it simple, that works best. So if you have the option to dumb it down, go for it.



About the Author

Duncan Epping is a Chief Technologist and Distinguished Engineering Architect at Broadcom. Besides writing on Yellow-Bricks, Duncan is the co-author of the vSAN Deep Dive and the vSphere Clustering Deep Dive book series. Duncan is also the host of the Unexplored Territory Podcast.
