Testing vSphere Virtual SAN in your virtual lab with vSphere 5.5

For those who want to start testing the beta of vSphere Virtual SAN in their lab with vSphere 5.5, I figured it would make sense to describe how I created my nested lab. (Do note that performance will be far from optimal.) I am not going to describe how to install ESXi nested, as there are a billion articles out there that describe how to do that. I suggest creating ESXi hosts with 3 disks each and a minimum of 5GB of memory per host:

  • Disk 1 – 5GB
  • Disk 2 – 20GB
  • Disk 3 – 200GB

After you have installed ESXi and imported a vCenter Server Appliance (my preference for lab usage, so easy and fast to set up!), you add your ESXi hosts to your vCenter Server. Note: to the vCenter Server, NOT to a cluster yet.

Log in via SSH to each of your ESXi hosts and run the following commands:

  • esxcli storage nmp satp rule add --satp VMW_SATP_LOCAL --device mpx.vmhba2:C0:T0:L0 --option "enable_local enable_ssd"
  • esxcli storage nmp satp rule add --satp VMW_SATP_LOCAL --device mpx.vmhba3:C0:T0:L0 --option "enable_local"
  • esxcli storage core claiming reclaim -d mpx.vmhba2:C0:T0:L0
  • esxcli storage core claiming reclaim -d mpx.vmhba3:C0:T0:L0

These commands ensure that the disks are seen as "local" disks by Virtual SAN and that the 20GB disk is tagged as an "SSD", although it isn't actually backed by an SSD. There is another option which might even be better: you can simply add a VMX setting to specify that the disks are SSDs. Check William's awesome blog post for the how-to.
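To double check that the tagging worked, you can list the devices and look at the "Is SSD" and "Is Local" flags. A quick sketch (the device identifiers are the ones from my lab, adjust them to match yours):

~ # esxcli storage core device list -d mpx.vmhba2:C0:T0:L0 | grep -E "Is SSD|Is Local"
~ # esxcli storage core device list -d mpx.vmhba3:C0:T0:L0 | grep -E "Is SSD|Is Local"

Both devices should report "Is Local: true", and the vmhba2 device should also report "Is SSD: true" after the reclaim.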

After running these commands we need to make sure the hosts are configured properly for Virtual SAN. As mentioned above, add them to your vCenter Server at the Datacenter level, without adding them to a cluster yet.

Now we will properly configure the hosts. We need to create an additional VMkernel adapter; do this for each of the three hosts (a CLI alternative is sketched right after the list):

  1. Click on your host within the web client
  2. Click “Manage” -> “Networking” -> “VMkernel Adapters”
  3. Click the “Add host networking” icon
  4. Select “VMkernel Network Adapter”
  5. Select the correct vSwitch
  6. Provide an IP address and tick the "Virtual SAN traffic" tickbox!
  7. Next -> Next -> Finish
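For those who prefer the command line over the Web Client, the same can be done with esxcli. A rough sketch only: "vmk1", the "VSAN" portgroup name and the IP details are placeholders for whatever your lab uses, and the portgroup needs to exist up front:

~ # esxcli network ip interface add --interface-name=vmk1 --portgroup-name=VSAN
~ # esxcli network ip interface ipv4 set --interface-name=vmk1 --ipv4=192.168.1.11 --netmask=255.255.255.0 --type=static
~ # esxcli vsan network ipv4 add --interface-name=vmk1

The last command is the CLI equivalent of ticking the "Virtual SAN traffic" tickbox; "esxcli vsan network list" should show the tagged interface afterwards.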

When this is configured for all three hosts, configure a cluster:

  1. Click your “Datacenter” object
  2. On the “Getting started” tab click “Create a cluster”
  3. Give the cluster a name and tick the “Turn On” tickbox for Virtual SAN
  4. Also enable HA and DRS if required

Now you should be able to move your hosts into the cluster. With the Web Client for vSphere 5.5 you can simply drag and drop the hosts one by one into the cluster. VSAN will now be automatically configured for these hosts… Nice, right? When all configuration tasks are completed, just click on your Cluster object and then "Manage" -> "Settings" -> "Virtual SAN". Now you should see the number of hosts that are part of the VSAN cluster, the number of SSDs and the number of data disks.
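If you would rather verify this from the shell than from the Web Client, a few esxcli commands show the same information per host (this is how it looked in my beta lab, so treat it as a sketch):

~ # esxcli vsan cluster get
~ # esxcli vsan storage list
~ # esxcli vsan network list

The first shows cluster membership, the second the claimed SSD and data disks, and the third the VMkernel interface tagged for Virtual SAN traffic.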

Now before you get started there is one thing you will need to do, and that is enable “VM Storage Policies” on your cluster / hosts. You can do this via the Web Client as follows:

  • Click the “home” icon
  • Click “VM Storage Policies”
  • Click the little policy icon with the green checkmark, second from the left
  • Select your cluster and click “Enable” and then close

Now that you have enabled VM Storage Policies, do note that there are no pre-defined policies. Yes, there is a "default policy", but you can only see that on the command line. For those interested, just open up an SSH session and run the following command:

~ # esxcli vsan policy getdefault
Policy Class  Policy Value
------------  --------------------------------------------------------
cluster       (("hostFailuresToTolerate" i1) )
vdisk         (("hostFailuresToTolerate" i1) )
vmnamespace   (("hostFailuresToTolerate" i1) )
vmswap        (("hostFailuresToTolerate" i1) ("forceProvisioning" i1))
~ #
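For completeness: the same esxcli namespace also appears to allow changing the default per policy class. A sketch only; I would stick to the VM Storage Policies workflow described below for anything other than experimentation:

~ # esxcli vsan policy setdefault -c vdisk -p '(("hostFailuresToTolerate" i2))'
~ # esxcli vsan policy setdefault -c vmnamespace -p '(("hostFailuresToTolerate" i2))'
~ # esxcli vsan policy getdefault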

This default means that with "hostFailuresToTolerate" set to 1, Virtual SAN can tolerate one host failure before you potentially lose data. In other words, in a 3-node cluster you will have 2 copies of your data and a witness. Now if you would like N+2 resiliency instead of N+1, it is fairly straightforward. You do the following:

  • Click the “home” icon
  • Click “VM Storage Policies”
  • Click the “New VM Storage Policy” icon
  • Give it a name, I used “N+2 resiliency” and click “Next”
  • Click “Next” on Rule-Sets and select a vendor, which will be “vSan”
  • Now click <add capability> and select “Number of failures to tolerate” and set it to 2 and click “Next”
  • Click “Next” -> “Finish”

That is it for creating a new profile. Of course you can make these as complex as you want; there are various other options like "Number of disk stripes" and "Flash read cache reservation (%)". For now I wouldn't recommend tweaking these too much unless you absolutely understand the impact of changing them.

In order to use the profile, go to an existing virtual machine, right-click it, and do the following:

  • Click “All vCenter Actions”
  • Click “VM Storage Service Policies”
  • Click “Manage VM Storage Policies”
  • Select the appropriate policy on “Home VM Storage Policy” and do not forget to hit the “Apply to disks” button
  • Click OK

Now the new policy will be applied to your virtual machine and its disk objects! Also, while deploying a new virtual machine you can immediately select the correct policy in the provisioning workflow, so that it is deployed in the correct fashion from the start.

These are some of the basics for testing VSAN in a virtual environment… now register and get ready to play!

What’s new in vSphere 5.5 for DRS?

In vSphere 5.5 a couple of new pieces of functionality have been added to DRS. The first one is around the maximum number of VMs on a single host that DRS should allow. I guess some of you will say: "hey, didn't we introduce that last year with that advanced setting called LimitVMsPerESXHost?" Yes, that is correct, but the DRS team found it too restrictive. They have added an extra setting which is called LimitVMsPerESXHostPercent. A bit smarter, and less restrictive… so how does it work?

Basically, LimitVMsPerESXHostPercent is LimitVMsPerESXHost in a more dynamic form, as it automatically adjusts the limit. Say you set LimitVMsPerESXHostPercent to 50 in a 4-host cluster which is already running 20 VMs, and you want to power on 12 new VMs. How many VMs can a single host run?

32 total VMs across 4 hosts --> mean: 8 VMs per host

We set the percentage to 50, so the new limit is 8 + (50% of 8) = 12 VMs per host

So if host 1 was only running 2 VMs, it can now run up to 12 without the need for you to constantly change LimitVMsPerESXHost when you introduce new VMs; LimitVMsPerESXHostPercent does this for you.
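To make the arithmetic explicit, here is a trivial shell sketch of how I understand the limit is derived (my interpretation of the behavior, not actual DRS code):

# Numbers from the example above
TOTAL_VMS=32      # 20 running + 12 being powered on
HOSTS=4
PERCENT=50
MEAN=$(( TOTAL_VMS / HOSTS ))               # 8 VMs per host on average
LIMIT=$(( MEAN + MEAN * PERCENT / 100 ))    # 8 + 50% of 8 = 12
echo "Per-host limit: ${LIMIT}"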

Latency Sensitive Workloads

As of vSphere 5.5 DRS recognizes VMs marked as latency-sensitive (a vCenter Web Client option). With 5.1 it could occur that latency-sensitive VMs were moved around by DRS, and as you can imagine, when a VM migrates this impacts whichever application is running in it. Although the impact is typically small, for a latency-sensitive workload even "small" could be disastrous. So in order to avoid this unwanted impact, DRS treats latency-sensitive VMs as if they have soft affinity to the host they are running on. But what if there is an absolute need to migrate such a VM? Well, as mentioned, it is "soft" affinity, so it is treated like a "should" rule, which means the VM can still be moved when needed.

Do note that you don't see this affinity anywhere in the DRS UI; it is handled within DRS itself. Awesome, and needed if you ask me!

Another one

Last but not least, another new advanced option; this one is called "AggressiveCPUActive". When you set it to "1", DRS will be more aggressive when it comes to scheduling. This can be useful in environments where DRS estimates CPU demand too low. AggressiveCPUActive helps avoid averaging out the bursts and allows DRS to balance your virtual infrastructure more aggressively.

DISCLAIMER: I do not recommend using advanced settings unless there is an absolute need for it. I can see why you would use the “LimitVMsPerESXHostPercent” but be careful with “AggressiveCPUActive“.

vSphere 5.5 nuggets: changes to disk.terminateVMOnPDLDefault

Those who were in the vSphere 5.5 beta program might have noticed it, but I suspect many did not. With vSphere 5.5 there is finally a proper advanced setting for what used to be Disk.terminateVMOnPDLDefault. This setting was introduced with vSphere 5.0 but unfortunately needed to be enabled in a file (/etc/vmware/settings), which was inconvenient to say the least. I asked the engineering team what the plans were to improve this, but there were no direct plans. It took a bit longer than expected, but nevertheless the feature request I created made it into the product. So if you are using a vSphere Metro Storage Cluster (what a coincidence, I am presenting on this topic in an hour at VMworld), please note that the following method should now be used to allow vSphere HA to respond to a Permanent Device Loss aka PDL:

  1. Browse to the host in the vSphere Web Client navigator
  2. Click the Manage tab and click Settings
  3. Under System, click Advanced System Settings
  4. In Advanced System Settings, select “VMkernel.Boot.terminateVMOnPDL”
  5. Click the Edit button (pencil) to edit the value and set it to “Yes”
  6. Click OK

Note the change in setting name from Disk.terminateVMOnPDLDefault to VMkernel.Boot.terminateVMOnPDL!
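For those who would rather script this than click through the Web Client: the VMkernel.Boot options live in the esxcli kernel settings namespace, so something along these lines should work (a sketch, I have only used the UI method myself, and if I remember correctly boot options need a host reboot to take effect):

~ # esxcli system settings kernel list -o terminateVMOnPDL
~ # esxcli system settings kernel set -s terminateVMOnPDL -v TRUE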

vSphere 5.5: platform scalability

One of the things that always keeps people busy is the "max config" numbers for a release. I figured I would dedicate a blog post to vSphere 5.5 platform scalability now that it has been released. A couple of things stand out, if you ask me, when it comes to platform scalability:

  • 320 Logical CPUs per host
  • 4TB of memory per host
  • 16 NUMA nodes per host
  • 4096 vCPUs maximum per host
  • 40 GbE NIC support
  • 62TB VMDK and Virtual RDM support
  • 16Gb Fibre Channel end-to-end support
  • VMFS Heap Increase, 64TB open VMDK per host max

Some nice key improvements when it comes to platform scalability, right? I think so! Not that I expect to need a host with 4TB of memory and 320 pCPUs in the near future, but you never know, right? Some more details can be found in the vSphere 5.5 platform "What's New" whitepaper.

Startup News Flash part 4

This is already the fourth part of the Startup News Flash. We are in the middle of VMworld, and of course there were many, many announcements. I tried to filter out those which are interesting; as mentioned in one of the other posts, if you feel one is missing, leave a comment.

Nutanix announced version 3.5 of their OS last week. The 3.5 release contains a bunch of new features, one of them being what they call the "Nutanix Elastic Deduplication Engine". I think it is great they added this feature, as ultimately it will allow you to utilize your flash and RAM tiers more efficiently. The more you can cache the better, right?! I am sure this will result in a performance improvement in many environments; you can imagine that this will especially be the case for VDI, or environments where most VMs are based on the same template. What might be worth knowing is that Nutanix dedupe happens inline for the RAM and flash tiers, and in the background for the magnetic disks. Nutanix also announced that besides supporting vSphere and KVM they now also support Hyper-V, which is great for customers as it offers choice. On top of all that, they managed to develop a new simplified UI and a REST-based API, allowing customers to build a software-defined datacenter! Also worth noting is that they have been working on their DR story. They have developed a Storage Replication Adapter, which is one of the components needed to implement Site Recovery Manager with array-based replication. They also optimized their replication technology by extending their compression technology to that layer. (Disclaimer: the SRA is not listed on the VMware website and as such it is not supported by VMware. Please validate the SRM section of the VMware website before implementing.)

Of course there is an update from a flash caching vendor as well; this time it is Proximal Data, who announced the 2.0 version of their software. AutoCache 2.0 includes role-based administration features and multi-hypervisor support to meet the specific needs of cloud service providers. Good to see that multi-hypervisor and cloud are becoming part of the Proximal story. I like Proximal's aggressive price point: it starts at $999 per host for flash caches smaller than 500GB, which is unique for a solution that does both block and file caching. Not sure I agree with Proximal's stance with regards to write-back caching and "down-playing" 1.0 solutions, especially not when you don't offer that functionality yourself or were a 1.0 version yesterday.

I just noticed this article published by Silicon Angle which mentions the announcement of the SMB Edition of FVP. Priced at a flat $9,999, it supports up to 100 VMs across a maximum of four hosts, with two processors and one flash drive each. More details can be found in this press release by PernixData.

Also something which might interest people is Violin Memory filing for IPO. It had been rumored numerous times, but this time it seems to be happening for real. The Register has an interesting view by the way. I hope it will be a huge success for everyone involved!

I also want to point people again to some of the cool announcements VMware made in the storage space. Although VMware is far from being a startup, I do feel these are worth listing here again: Introduction to vSphere Flash Read Cache and Introduction to Virtual SAN.