VMware vSphere Virtual SAN design considerations…

I have been playing a lot with vSphere Virtual SAN (VSAN) over the last couple of months… I figured I would write down some of my thoughts around designing a hardware platform or constructing the virtual environment for VSAN. There are some recommended practices and there are some constraints; I aim to use this blog post to gather all of these Virtual SAN design considerations. Please read the VSAN introduction, how to install VSAN in your virtual lab and "How do you know where an object is located" to get a better understanding of the product. There is a long list of VSAN blogs that can be found here: vmwa.re/vsan

Everything below is based on the vSphere 5.5 Virtual SAN (public) Beta and reflects my own interpretation and thoughts, based on various conversations with colleagues and engineering and on reading various documents.

  • vSphere Virtual SAN (VSAN) clusters are limited to a maximum total of 32 hosts and there is a minimum of 3 hosts. VSAN is also currently limited to 100 VMs per host, resulting in a maximum of 3200 VMs in a 32 host cluster. Please note that HA currently has a limit of 2048 protected VMs in a single Datastore.
  • It is recommended to dedicate a 10GbE NIC port to your VSAN VMkernel traffic. Although 1GbE is fully supported, it could be a limiting factor in I/O intensive environments. Both VSS and VDS are supported.
  • It is recommended to have a VSAN VMkernel on every physical NIC! Be sure to configure them in an "active/standby" configuration so that when you have 2 physical NIC ports and 2 VSAN VMkernels, each of them has its own port. Do note that multiple VSAN VMkernel NICs on a single host on the same subnet is not a supported configuration; in different subnets it is supported.
  • IP Hash load balancing is supported by VSAN, but due to the limited number of IP addresses between source and destination, the load balancing benefits could be limited. In other words, an EtherChannel formed out of 4 x 1GbE NICs will most likely not result in 4GbE.
  • Although Jumbo Frames are fully supported with VSAN, they do add a level of operational complexity. When Jumbo Frames are enabled, ensure they are enabled end-to-end!
  • VSAN requires at a minimum 1 SSD and 1 Magnetic Disk per diskgroup on a host which is contributing storage. Each diskgroup can have a maximum of 1 SSD and 7 magnetic disks. When you have more than 7 HDDs or two or more SSDs you will need to create additional diskgroups.
  • Each host that is providing capacity to the VSAN datastore has at least one local diskgroup. There is a maximum of 5 disk groups per host!
  • It can be beneficial to create multiple smaller disk groups instead of fewer larger disk groups. More disk groups means smaller failure domains and more cache drives/queues.
  • When sizing your environment, ensure you take data replicas into account. If your environment needs N+1 or N+2 (etc.) resiliency, factor this in accordingly.
  • SSD capacity does not count towards total VSAN datastore capacity. When sizing your environment, do not include SSD capacity in your total capacity calculation.
  • It is a recommended practice to have a minimum 1:10 ratio of SSD capacity to HDD capacity in each disk group. In other words, when you have 1TB of HDD capacity, it is recommended to have at least 100GB of SSD capacity. Note that VMware's recommendation has changed since the Beta; the new recommendation is:
    • 10 percent of the anticipated consumed storage capacity before the number of failures to tolerate is considered
  • By default, 70% of the available SSD capacity will be used as read cache and 30% will be used as a write buffer. As in most designs, when it comes to cache/buffer, more is better.
  • Selecting an SSD with the right performance profile can easily make a 5x-10x difference in VSAN performance, so choose carefully and wisely. Both SSD and PCIe flash solutions are supported, but there are requirements! Make sure to check the HCL before purchasing new hardware. My tip: the Intel S3700, a great price/performance balance.
  • VSAN relies on VM Storage Policies for policy based management. There is a default policy under the hood, but you cannot see this within the UI. As such it is a recommended practice to create a new standard policy for your environment after VSAN has been configured. It is recommended to start with all settings set to default and to ensure "Number of failures to tolerate" is configured to 1. This guarantees that when a single host fails, virtual machines can be restarted and recovered from this failure with minimal impact on the environment. Attach this policy to your virtual machines when migrating them to VSAN or during virtual machine provisioning.
  • Configure vSphere HA isolation response to “power-off” to ensure that virtual machines which reside on an isolated host can be safely restarted.
  • Ensure the vSphere HA admission control policy ("host failures to tolerate" or the percentage-based policy) aligns with your VSAN availability strategy. In other words, ensure that both compute and storage are configured using the same "N+x" availability approach.
  • When defining your VM Storage Policy avoid unnecessary usage of “flash read cache reservation”. VSAN has internal read cache optimization algorithms, trust it like you trust the “host scheduler” or DRS!
  • VSAN does not support virtual machine disks greater than 2TB minus 512 bytes; VMs which require larger VMDKs are not suitable candidates for VSAN at this point in time.
  • VSAN does not support FT, DPM, Storage DRS or Storage I/O Control. It should be noted though that VSAN internally takes care of scheduling and balancing when required. Storage DRS and SIOC are designed for SAN/NAS environments.
  • Although supported by VSAN, it is a recommended practice to keep the host/disk configuration for a VSAN cluster uniform. A non-uniform cluster configuration could lead to variations in performance and could make it more complex to stay compliant with defined policies after a failure.
  • When adding new SSDs or HDDs ensure these are not pre-formatted. Note that when VSAN is configured to “automatic mode” disks are added to existing disk groups or new disk groups are created automatically.
  • Note that vSphere HA behaves slightly differently in a VSAN enabled cluster; here are some of the changes/caveats:
    • Be aware that when HA is turned on in the cluster, FDM agent (HA) traffic goes over the VSAN network and not the Management Network. However, when a potential isolation is detected, HA will ping the default gateway (or specified isolation address) using the Management Network.
    • When enabling VSAN ensure vSphere HA is disabled. You cannot enable VSAN when HA is already configured. Either configure VSAN during the creation of the cluster or disable vSphere HA temporarily when configuring VSAN.
    • When there are only VSAN datastores available within a cluster then Datastore Heartbeating is disabled. HA will never use a VSAN datastore for heartbeating.
    • When changes are made to the VSAN network it is required to re-configure vSphere HA.
  • VSAN requires a RAID Controller / HBA which supports passthrough mode or pseudo passthrough mode. Validate with your server vendor if the included disk controller has support for passthrough. An example of a passthrough mode controller which is sold separately is the LSI SAS 9211-8i.
  • Ensure log files are stored externally to your ESXi hosts and VSAN by leveraging vSphere's syslog capabilities (a sample esxcli configuration follows this list).
  • ESXi can be installed on: USB, SD and Magnetic Disk. Hosts with 512GB or more memory are only supported when ESXi is installed on magnetic disk.
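
As a quick example of the syslog tip above, something along these lines, run on each host, should do the trick. Note that syslog.lab.local is just a placeholder for your own syslog target, and opening the firewall ruleset is only needed if it is not already enabled:

esxcli system syslog config set --loghost='udp://syslog.lab.local:514'
esxcli system syslog reload
esxcli network firewall ruleset set --ruleset-id=syslog --enabled=true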

That is it for now. When more comes to mind I will add it to the list!

vSphere 5.5 nuggets: High Availability Enhancement

There aren't a lot of changes in 5.5 when it comes to vSphere High Availability aka HA, but one is worth noting. As most of you are probably aware, vSphere HA in the past did nothing with VM-to-VM affinity or anti-affinity rules. For people using "affinity" rules this typically was not an issue, but those using "anti-affinity" rules did see it as one. They created these rules to ensure specific virtual machines would never be running on the same host, but vSphere HA would simply ignore the rule when a failure had occurred and just place the VMs "randomly". With vSphere 5.5 this has changed! vSphere HA is now "anti-affinity" aware. In order to ensure anti-affinity rules are respected you will need to set an advanced setting:

das.respectVmVmAntiAffinityRules - Values: "false" (default) and "true"

Now note that this also means that when you configure anti-affinity rules, have this advanced setting configured to "true", and somehow there aren't sufficient hosts available to respect these rules… then the rules will still be respected, which could result in HA not restarting a VM. Make sure you understand this potential impact when configuring this setting and these rules.

vSphere 5.5 nuggets: Change Disk.SchedNumReqOutstanding per device!

Always wanted to change Disk.SchedNumReqOutstanding per device instead of per host? Well, now with vSphere 5.5 you can! I didn't know about this either, but my colleague Paudie pointed it out. It is a useful feature when you have several storage arrays and you need to tweak these values. Now let's be clear… I do not recommend tweaking this, but in case you need to, you can now do it per device using esxcli.

Get the current configured value for a specific device:
esxcli storage core device list --device <device>

Set the value for a specific device:
esxcli storage core device set -d <device> -O <value between 1 and 256>
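
For example, assuming a hypothetical device named naa.600508b1001c388c92e817e43fcd5237 (substitute your own device identifier), setting and then verifying the value would look something like this:

esxcli storage core device set -d naa.600508b1001c388c92e817e43fcd5237 -O 64
esxcli storage core device list -d naa.600508b1001c388c92e817e43fcd5237

The device list output should then reflect the new value; look for the "No of outstanding IOs with competing worlds" line.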

Testing vSphere Virtual SAN in your virtual lab with vSphere 5.5

For those who want to start testing the beta of vSphere Virtual SAN in their lab with vSphere 5.5, I figured it would make sense to describe how I created my nested lab. (Do note that performance will be far from optimal.) I am not going to describe how to install ESXi nested as there are a billion articles out there that describe how to do that. I suggest creating ESXi hosts with 3 disks each and a minimum of 5GB of memory per host:

  • Disk 1 – 5GB
  • Disk 2 – 20GB
  • Disk 3 – 200GB

After you have installed ESXi and imported a vCenter Server Appliance (my preference for lab usage, so easy and fast to set up!) you add your ESXi hosts to your vCenter Server. Note: add them to the vCenter Server at the Datacenter level, NOT to a cluster yet.

Login via SSH to each of your ESXi hosts and run the following commands:

  • esxcli storage nmp satp rule add --satp VMW_SATP_LOCAL --device mpx.vmhba2:C0:T0:L0 --option "enable_local enable_ssd"
  • esxcli storage nmp satp rule add --satp VMW_SATP_LOCAL --device mpx.vmhba3:C0:T0:L0 --option "enable_local"
  • esxcli storage core claiming reclaim -d mpx.vmhba2:C0:T0:L0
  • esxcli storage core claiming reclaim -d mpx.vmhba3:C0:T0:L0

These commands ensure that the disks are seen as "local" disks by Virtual SAN and that the 20GB disk is seen as an "SSD", although it isn't actually an SSD. There is another option which might even be better: you can simply add a VMX setting to specify that the disk is an SSD. Check William's awesome blog post for the how-to.
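
If I recall correctly from William's post, that VMX approach boils down to adding a line like the one below to the .vmx file of the nested ESXi VM (or adding it via Edit Settings -> VM Options -> Advanced -> Configuration Parameters). The scsi0:1 part is an assumption on my part, namely that the 20GB disk is the second disk on the first SCSI controller; adjust it to match your own layout:

scsi0:1.virtualSSD = 1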

With the disks prepared, we need to make sure the hosts are configured properly for Virtual SAN. As mentioned, add the hosts to your vCenter Server without adding them to a cluster; just add them at the Datacenter level.

Now we will properly configure the hosts. We will need to create an additional VMkernel adapter; do this for each of the three hosts (or script it with esxcli, as sketched after these steps):

  1. Click on your host within the web client
  2. Click “Manage” -> “Networking” -> “VMkernel Adapters”
  3. Click the “Add host networking” icon
  4. Select “VMkernel Network Adapter”
  5. Select the correct vSwitch
  6. Provide an IP address and tick the "Virtual SAN traffic" tickbox!
  7. Next -> Next -> Finish
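
If you prefer the command line over the Web Client wizard above, something along these lines should achieve the same result per host. Note that the portgroup name "VSAN", vSwitch0, vmk1 and the IP address are just examples from my lab; adjust them to your own environment:

esxcli network vswitch standard portgroup add --portgroup-name=VSAN --vswitch-name=vSwitch0
esxcli network ip interface add --interface-name=vmk1 --portgroup-name=VSAN
esxcli network ip interface ipv4 set --interface-name=vmk1 --ipv4=192.168.1.11 --netmask=255.255.255.0 --type=static
esxcli vsan network ipv4 add -i vmk1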

When this is configured for all three hosts, configure a cluster:

  1. Click your “Datacenter” object
  2. On the “Getting started” tab click “Create a cluster”
  3. Give the cluster a name and tick the “Turn On” tickbox for Virtual SAN
  4. Also enable HA and DRS if required

Now you should be able to move your hosts into the cluster. With the Web Client for vSphere 5.5 you can simply drag and drop the hosts one by one into the cluster. VSAN will now be automatically configured for these hosts… Nice, right? When all configuration tasks are completed, just click on your Cluster object and then "Manage" -> "Settings" -> "Virtual SAN". Now you should see the number of hosts that are part of the VSAN cluster, the number of SSDs and the number of data disks.
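
If you want to double-check from the host side, a couple of esxcli commands will confirm that the cluster has formed and that the disks have been claimed (the exact output may differ slightly between builds):

esxcli vsan cluster get
esxcli vsan storage list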

Now before you get started there is one thing you will need to do, and that is enable “VM Storage Policies” on your cluster / hosts. You can do this via the Web Client as follows:

  • Click the “home” icon
  • Click “VM Storage Policies”
  • Click the little policy icon with the green checkmark, second from the left
  • Select your cluster and click “Enable” and then close

Now note that although you have enabled VM Storage Policies, there are no pre-defined policies. Yes, there is a "default policy", but you can only see that on the command line. For those interested, just open up an SSH session and run the following command:

~ # esxcli vsan policy getdefault
Policy Class  Policy Value
------------  --------------------------------------------------------
cluster       (("hostFailuresToTolerate" i1) )
vdisk         (("hostFailuresToTolerate" i1) )
vmnamespace   (("hostFailuresToTolerate" i1) )
vmswap        (("hostFailuresToTolerate" i1) ("forceProvisioning" i1))
~ #

Now this means that in the case of "hostFailuresToTolerate", Virtual SAN can tolerate 1 host failure before you potentially lose data. In other words, in a 3 node cluster you will have 2 copies of your data and a witness. Now if you would like to have N+2 resiliency instead of N+1, it is fairly straightforward. You do the following:

  • Click the “home” icon
  • Click “VM Storage Policies”
  • Click the “New VM Storage Policy” icon
  • Give it a name, I used “N+2 resiliency” and click “Next”
  • Click “Next” on Rule-Sets and select a vendor, which will be “vSan”
  • Now click <add capability> and select “Number of failures to tolerate” and set it to 2 and click “Next”
  • Click “Next” -> “Finish”

That is it for creating a new profile. Of course you can make these as complex as you want; there are various other options like "Number of disk stripes" and "Flash read cache reservation %". For now I wouldn't recommend tweaking these too much unless you absolutely understand the impact of changing them.

In order to use the profile you go to an existing virtual machine, right-click it and do the following:

  • Click “All vCenter Actions”
  • Click “VM Storage Service Policies”
  • Click “Manage VM Storage Policies”
  • Select the appropriate policy on “Home VM Storage Policy” and do not forget to hit the “Apply to disks” button
  • Click OK

Now the new policy will be applied to your virtual machine and its disk objects! Also, while deploying a new virtual machine you can select the correct policy immediately in the provisioning workflow so that it is deployed in the correct fashion from the start.

These are some of the basics for testing VSAN in a virtual environment… now register and get ready to play!

What’s new in vSphere 5.5 for DRS?

In vSphere 5.5 a couple of new pieces of functionality have been added to DRS. The first one is around the maximum number of VMs on a single host that DRS should allow. I guess some of you will say: hey, didn't we introduce that last year with that advanced setting called "LimitVMsPerESXHost"? Yes, that is correct, but the DRS team found this too restrictive. They've added an extra setting which is called LimitVMsPerESXHostPercent. A bit smarter, and less restrictive… so how does it work?

Basically, LimitVMsPerESXHostPercent is LimitVMsPerESXHost in a more dynamic form, as it automatically adjusts the limit. Say you set LimitVMsPerESXHostPercent to 50 in a 4 host cluster which is already running 20 VMs, and you want to power on 12 new VMs. How many VMs can a single host run?

32 total VMs, 4 hosts --> mean: 8 VMs per host

We set the percentage to 50 so the new limit is 8 + (50% * 8) = 12

So if host 1 was only running 2 VMs, it can now take on an additional 10 VMs (up to the limit of 12) without the need for you to constantly change LimitVMsPerESXHost when you introduce new VMs; LimitVMsPerESXHostPercent does this for you.

Latency Sensitive Workloads

As of vSphere 5.5 DRS recognizes VMs marked as latency-sensitive (a vCenter Web Client option). With 5.1 it could occur that latency sensitive VMs were moved around by DRS, and as you can imagine, when a VM migrates this will impact whichever application is running in it. Although the impact is typically small, for a latency sensitive workload even "small" could be disastrous. So in order to avoid this unwanted impact, DRS treats latency sensitive VMs as if they have soft affinity to the host they are running on. But what if there is an absolute need to migrate such a VM? Well, as mentioned it is "soft affinity", so it is treated like a "should rule", which means the VM can still be moved when needed.

Do note that within the DRS UI you don't see this affinity anywhere; this is handled within DRS itself. Awesome and needed if you ask me!

Another one

Last but not least, another new advanced option, this one titled "AggressiveCPUActive". When you set it to "1", DRS will be more aggressive when it comes to balancing VMs when %RDY is impacting them. This can be useful in environments where %RDY has very spiky behaviour. AggressiveCPUActive will help avoid averaging out the bursts and will allow DRS to aggressively balance your virtual infrastructure. (Official explanation: AggressiveCPUActive, when set to 1, causes DRS to use the 80th percentile of the last five 1-minute average values of CPU activity (in other words, the second highest) to predict the CPU demand of virtual machines in the DRS cluster, instead of using the 5-minute average value (which is the default behavior). This more aggressive DRS behavior can better detect spikes in CPU ready time and thus better predict CPU demand in some deployments.)

DISCLAIMER: I do not recommend using advanced settings unless there is an absolute need for it. I can see why you would use the “LimitVMsPerESXHostPercent” but be careful with “AggressiveCPUActive“.