Disconnecting a host from a VSAN cluster doesn’t change capacity?

Someone asked this question on VMTN this week and I received a similar question from another user: if you disconnect a host from a VSAN cluster, why doesn’t the total amount of available capacity change? The answer is simple: you are not disconnecting the host from your VSAN cluster, you are disconnecting it from vCenter Server! (Unlike with HA and DRS, by the way.) In other words: your VSAN host is still providing storage to the VSAN datastore when it is disconnected.

If you want a host to leave a VSAN cluster you have two options in my opinion:

  • Place it in maintenance mode with full data migration and remove it from the cluster
  • Run the following command from the ESXi command line:
    esxcli vsan cluster leave

Please keep that in mind when you do maintenance… Do not use “disconnect” but actually remove the host from the cluster if you do not want it to participate in VSAN any longer.

Running CoreOS on Fusion

I wanted to play around with CoreOS and Docker a bit, so I went to the CoreOS website, but unfortunately they do not provide an OVF or OVA download. The CoreOS website doesn’t really explain how to run it on Fusion; it only covers ESXi, where it shows how to create an OVF/OVA. I figured I would do a quick write-up on how to get the latest version up and running quickly in Fusion, without jumping through hoops.

  • Download the latest version here: coreos_production_vmware_insecure.zip (~180MB)
  • Unzip the file after downloading
  • If you look in the folder you will see a “.vmx” file and a “.vmdk” file
  • Move the whole folder into the “Virtual Machines” folder under “Documents”
  • Now simply right click the VMX file and “open” it
  • You may be asked if you want to upgrade the hardware; I recommend doing this
  • Boot

After you are done booting you can “simply” connect to it as follows:

  • Look at the VM console for the IP Address
  • Now change your directory to the folder where the virtual machine is stored as there should be a key in that folder
  • Now run the following command, replacing <ip-address> with the IP of the VM:
    ssh -i insecure_ssh_key core@<ip-address>

Note that the key is highly insecure and you should replace it of course. More details can be found here.

PS: Dear CoreOS, please create an OVA or OVF… It will make life even easier for your customers.

vSphere 5.5 U1 patch released for NFS APD problem!

On April 19th I wrote about an issue with vSphere 5.5 U1 and NFS based datastores APD’ing. People internally at VMware have worked very hard to root cause the issue and fix it. Log entries witnessed are:

YYYY-04-01T14:35:08.075Z: [APDCorrelator] 9414268686us: [esx.problem.storage.apd.start] Device or filesystem with identifier [12345678-abcdefg0] has entered the All Paths Down state.
YYYY-04-01T14:36:55.274Z: No correlator for vob.vmfs.nfs.server.disconnect
YYYY-04-01T14:36:55.274Z: [vmfsCorrelator] 9521467867us: [esx.problem.vmfs.nfs.server.disconnect] 12345678-abcdefg0-0000-000000000000 NFS-DS1
YYYY-04-01T14:37:28.081Z: [APDCorrelator] 9553899639us: [vob.storage.apd.timeout] Device or filesystem with identifier [12345678-abcdefg0] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed. 
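If you want to check your own logs for the same pattern, something like the following could help. It is only a minimal sketch: the regular expression is an assumption derived from the four sample lines above (not an official log grammar), and the names `EVENT_RE`, `extract_events` and `SAMPLE` are made up for illustration.

```python
import re

# Bracketed event identifiers such as [esx.problem.storage.apd.start] or
# [vob.storage.apd.timeout]. Assumption: event ids start with "esx." or "vob."
# as they do in the sample lines; other bracketed tokens are ignored.
EVENT_RE = re.compile(r"\[((?:esx|vob)\.[A-Za-z0-9_.]+)\]")


def extract_events(lines):
    """Return (timestamp, event_id) pairs for lines that carry an event id."""
    events = []
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            # The timestamp is everything before the first ": " separator.
            timestamp = line.split(": ", 1)[0]
            events.append((timestamp, match.group(1)))
    return events


# Shortened copies of the log entries shown above.
SAMPLE = [
    "YYYY-04-01T14:35:08.075Z: [APDCorrelator] 9414268686us: "
    "[esx.problem.storage.apd.start] Device or filesystem with identifier "
    "[12345678-abcdefg0] has entered the All Paths Down state.",
    "YYYY-04-01T14:36:55.274Z: No correlator for vob.vmfs.nfs.server.disconnect",
    "YYYY-04-01T14:36:55.274Z: [vmfsCorrelator] 9521467867us: "
    "[esx.problem.vmfs.nfs.server.disconnect] "
    "12345678-abcdefg0-0000-000000000000 NFS-DS1",
    "YYYY-04-01T14:37:28.081Z: [APDCorrelator] 9553899639us: "
    "[vob.storage.apd.timeout] Device or filesystem with identifier "
    "[12345678-abcdefg0] has entered the All Paths Down Timeout state.",
]

events = extract_events(SAMPLE)
apd_events = [event for event in events if ".apd." in event[1]]

for timestamp, event_id in events:
    print(timestamp, event_id)
```

The “No correlator” line carries no bracketed event id, so the sketch skips it; filtering on `.apd.` leaves just the APD start and timeout entries.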

More details on the fix can be found here: http://kb.vmware.com/kb/2077360

Why Queue Depth matters!

A while ago I wrote an article about the queue depth of certain disk controllers, tried to harvest some of the values, and posted those. William Lam did a “one up” this week and posted a script that gathers the info, which should then be posted in a Google Docs spreadsheet; brilliant if you ask me. (PLEASE run the script and let’s fill up the spreadsheet!!) But some of you may still wonder why this matters… (For those who missed it: one customer ran into trouble with a low-end, shallow queue depth disk controller, and Chuck has written up his take on it.) Considering the different layers of queuing involved, it probably makes most sense to show the picture from the virtual machine down to the device.

[Figure: queue depth, from the virtual machine down to the device]

In this picture there are at least 6 different layers at which some form of queuing is done. Within the guest there is the vSCSI adapter, which has a queue. The next layer is VMkernel/VSAN, which of course has its own queue and manages the IO that is pushed through the MPP (multi-pathing) layer to the various devices on a host. On the next level the disk controller has a queue, and potentially (depending on the controller used) each disk controller port has a queue. Last but not least, each device (i.e. a disk) has a queue. Note that even this is a simplified diagram.

If you look closely at the picture you see that IO of many virtual machines will all flow through the same disk controller and that this IO will go to or come from one or multiple devices. (Typically multiple devices.) Realistically, what are my potential choking points?

  1. Disk Controller queue
  2. Port queue
  3. Device queue

Let’s assume you have 4 disks; these are SATA disks and each has a queue depth of 32. Combined, this means that you can handle 128 IOs in parallel. Now what if your disk controller can only handle 64? This will result in 64 IOs being held back by the VMkernel / VSAN. As you can see, it would be beneficial in this scenario to ensure that your disk controller queue can hold the same number of IOs (or more) as your device queues can hold combined.
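The arithmetic in this example can be sketched in a few lines of Python. This is a deliberately simplified model (it ignores port queues and assumes every device queue is full at the same time), and the function name `held_back_ios` is made up for illustration.

```python
def held_back_ios(device_queue_depths, controller_queue_depth):
    """Return how many IOs the devices could accept in parallel
    but the controller queue cannot pass along (never negative)."""
    device_total = sum(device_queue_depths)
    return max(0, device_total - controller_queue_depth)


# 4 SATA disks, each with a queue depth of 32, as in the example above.
sata_disks = [32, 32, 32, 32]

combined_device_depth = sum(sata_disks)           # 4 x 32 = 128
held_back = held_back_ios(sata_disks, 64)         # controller holds only 64
no_choke = held_back_ios(sata_disks, 128)         # controller matches devices

print(combined_device_depth, held_back, no_choke)
```

With a controller queue of 64 the sketch reports 64 IOs held back; once the controller queue is at least as deep as the combined device queues, the result drops to 0.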

When it comes to disk controllers there is a huge difference in maximum queue depth between vendors, and even between models from the same vendor. Let’s look at some extreme examples:

HP Smart Array P420i - 1020
Intel C602 AHCI (Patsburg) - 31 (per port)
LSI 2008 - 25
LSI 2308 - 600

For VSAN it is recommended to ensure that the disk controller has a queue depth of at least 256, but go higher if possible. As you can see in the examples the range is wide, but for most LSI controllers the queue depth is 600 or higher. Now the disk controller is just one part of the equation, as there is also the device queue. As I listed in my other post, a RAID device for LSI for instance has a default queue depth of 128, while a SAS device has 254 and a SATA device has 32. The one that stands out the most is the SATA device: with a queue depth of only 32, you can imagine this can once again become a “choking point”. Fortunately, the shallow queue depth of SATA can easily be overcome by using NL-SAS drives (nearline serially attached SCSI) instead. NL-SAS drives are essentially SATA drives with a SAS connector and come with the following benefits:

  • Dual ports allowing redundant paths
  • Full SCSI command set
  • Faster interface compared to SATA, up to 20%
  • Larger (deeper) command queue [depth]

So what about the cost then? From a cost perspective the difference between NL-SAS and SATA is negligible for most vendors. For a 4TB drive the difference at the time of writing, across different websites, was on average $30. I think it is safe to say that for ANY environment NL-SAS is the way to go and SATA should be avoided when possible.

In other words, when it comes to queue depth: spend a couple of extra bucks and go big… you don’t want to choke your own environment to death!

vCenter Availability and Performance survey

One of the VMware product managers asked me to share this with you guys and ask you to please take the time to fill out this vCenter Server availability and performance survey. It is a very in-depth survey which should help the engineering team and the product management team make the right decisions when it comes to scalability and availability. So if you feel vCenter scalability and availability is important, take the time to fill it out!