
Yellow Bricks

by Duncan Epping

vSphere Metro Storage Cluster solutions, what is supported and what not?

Duncan Epping · Oct 7, 2011 ·

I started digging into this yesterday after I received a comment on my Metro Cluster article. I found it quite challenging to work through the vSphere Metro Storage Cluster HCL details, so I decided to write an article about it which might help you as well when designing or implementing a solution like this.

First things first, here are the basic rules for a supported environment (a small sanity-check sketch follows the list):
(Note that the list below is taken from the “important support information”, which you can see in the screenshot, call out 3.)

  • Only array-based synchronous replication is supported; asynchronous replication is not supported.
  • Storage array types FC, iSCSI, SVD, and FCoE are supported.
  • NAS devices are not supported with vMSC configurations at the time of writing.
  • The maximum supported latency between the sites for the ESXi Ethernet networks is 10 milliseconds RTT.
    • Note that 10ms of latency for vMotion is only supported with Enterprise Plus licenses (Metro vMotion).
  • The maximum supported latency for synchronous storage replication is 5 milliseconds RTT (or higher depending on the type of storage used, please read more here).
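
To make these rules easier to apply during a design review, here is a minimal sketch in Python that encodes them as a checklist. It is purely illustrative: the function name, the dictionary keys, and the 5 ms threshold assumed for standard (non Metro) vMotion are my own assumptions, not part of any VMware tool or API.

```python
# Illustrative sketch only: it simply encodes the support rules listed above.
# All names (check_vmsc_design, the dictionary keys) are my own.

SUPPORTED_ARRAY_TYPES = {"FC", "iSCSI", "SVD", "FCoE"}  # NAS is not supported

def check_vmsc_design(design):
    """Return a list of violations of the basic vMSC support rules."""
    issues = []
    if design["replication"] != "synchronous":
        issues.append("Only array-based synchronous replication is supported.")
    if design["array_type"] not in SUPPORTED_ARRAY_TYPES:
        issues.append("Array type %s is not supported." % design["array_type"])
    if design["network_rtt_ms"] > 10:
        issues.append("Inter-site RTT for the ESXi networks exceeds 10 ms.")
    elif design["network_rtt_ms"] > 5 and design["license"] != "Enterprise Plus":
        # assumption: 5 ms is the limit without Metro vMotion / Enterprise Plus
        issues.append("RTT above 5 ms requires Metro vMotion (Enterprise Plus).")
    if design["storage_rtt_ms"] > 5:
        issues.append("Synchronous replication RTT exceeds 5 ms (check the array-specific limit).")
    return issues

print(check_vmsc_design({
    "replication": "synchronous", "array_type": "FC",
    "network_rtt_ms": 8, "storage_rtt_ms": 4, "license": "Enterprise Plus",
}))  # -> []
```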

How do you know whether the array / solution you are looking at is supported, and what the constraints / limitations are? This is the path you should walk to find out:

  • Go to: http://www.vmware.com/resources/compatibility/search.php?deviceCategory=san (See screenshot, call out 1)
  • In the “Array Test Configuration” section select the appropriate configuration type like for instance “FC Metro Cluster Storage” (See screenshot, call out 2)
    (note that there’s no other category at the time of writing)
  • Hit the “Update and View Results” button
  • This will result in a list of supported configurations for FC-based metro cluster solutions; currently only EMC VPLEX is listed as supported
  • Click the name of the model (in this case VPLEX) and note all the details listed
  • Unfold the “FC Metro Cluster Storage” solution to see the footnotes, as they provide additional information on what is supported and what is not.
  • In the case of our example, VPLEX, it says “Only Non-uniform host access configuration is supported” but what does this mean?
    • Go back to the Search Results and click the “Click here to Read Important Support Information” link (See screenshot, call out 3)
    • Halfway down it provides details for “vSphere Metro Cluster Storage (vMSC) in vSphere 5.0”
    • It states that in a “non-uniform” configuration ESXi hosts are only connected to the storage node(s) in the same site; paths presented to ESXi hosts from storage nodes are limited to the local site.
  • Note that in this case “non-uniform” is not the only requirement; you will also need to adhere to the latency and replication type requirements listed above.

Yes, I realize this is not a perfect way of navigating through the HCL, and I have already reached out to the people responsible for it.

Mandatory DRS Rules and HA

Duncan Epping · Aug 24, 2011 ·

On Twitter, Mike Laverick asked a question about DRS affinity rules and whether HA would respect them. In this particular instance the question was about VM-Host affinity rules; I noticed multiple tweeps responding and figured it would not hurt to repeat the answer here.

There are two different types of VM-Host affinity rules:

  1. Must aka mandatory
  2. Should aka preferential

The difference between the two with regard to HA is that HA will always respect a must rule; these are mandatory, even if that results in downtime for the VM. The should rule is also known as the preferential rule. In other words, it would be nice if this rule can be respected, but if it can’t… no harm done.

How does HA know which VM belongs to which host with regard to DRS rules? Well, that is fairly straightforward. HA keeps track of which VM is compatible with which hosts. This “VM to Host compatibility list” is used for portgroups and datastores, but also for DRS rules. Check the screenshot below for a hint…

[Screenshot: Mandatory DRS Rules and HA]
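
To illustrate the idea, here is a conceptual sketch; it is not HA’s actual data structures or code, and all names (compatibility, should_preference, pick_restart_host, the host and VM names) are hypothetical. The point is that a must rule shrinks the compatibility list itself, while a should rule is only a preference among compatible hosts.

```python
# Conceptual sketch only: not HA's actual implementation.
# A must rule shrinks the VM-to-host compatibility list itself;
# a should rule only expresses a preference among compatible hosts.

compatibility = {
    # VM -> hosts it can be restarted on (network, datastore and must rules considered)
    "vm01": {"esx01", "esx02"},                       # must rule: site A hosts only
    "vm02": {"esx01", "esx02", "esx03", "esx04"},
}

should_preference = {
    "vm02": {"esx03", "esx04"},                       # should rule: nice to have
}

def pick_restart_host(vm, available_hosts):
    candidates = compatibility[vm] & available_hosts
    if not candidates:
        return None                                   # must rule cannot be satisfied: VM stays down
    preferred = candidates & should_preference.get(vm, candidates)
    return sorted(preferred or candidates)[0]

# esx01 has failed: vm01 can only be restarted on esx02 because of its must rule
print(pick_restart_host("vm01", {"esx02", "esx03", "esx04"}))   # esx02
print(pick_restart_host("vm02", {"esx02", "esx03", "esx04"}))   # esx03 (preference honored)
```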

Please note, this is a very old article about HA; there are tons of newer articles on this topic. Just do a search on my blog, or download my ebook, which is freely available via Rubrik.

HA slotsize caveat…

Duncan Epping · Jul 21, 2011 ·

I had a question this week from one of my colleagues which had me puzzled for a while. A customer had an HA-enabled cluster and used “Host Failures Cluster Tolerates” as the admission control policy. As you hopefully all know, this policy uses a slot algorithm; in short:

HA uses the highest CPU reservation of any given VM and the highest memory reservation of any given VM. If no reservation higher than 256MHz is set, HA will use a default of 256MHz for CPU and a default of 0MB + memory overhead for memory.
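
As a quick illustration of that rule, here is a simplified sketch; it is not the actual HA code, and the function name and inputs are mine.

```python
# Simplified sketch of the slot size rule quoted above; not the actual HA code.
def slot_size(vms):
    """vms: list of (cpu_reservation_mhz, mem_reservation_mb, mem_overhead_mb) per VM."""
    cpu_slot = max([cpu for cpu, _, _ in vms] + [256])          # 256 MHz floor if no reservations
    mem_slot = max(mem + overhead for _, mem, overhead in vms)  # 0 MB + overhead if no reservation
    return cpu_slot, mem_slot

# Three VMs without memory reservations, largest memory overhead 149 MB:
print(slot_size([(0, 0, 110), (500, 0, 149), (0, 0, 90)]))      # (500, 149)
```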

In their case they ended up with a slot size of 405MB. However, after validating the overhead of all machines they found that the largest memory overhead was 149MB. So where did this 405MB come from? Luckily one of the engineers responded to the email thread and managed to clear things up. vCenter 2.5 also used a default slot size of 256MB for memory. This default slot size is configured in “vpxd.cfg”, and unfortunately after upgrading from vCenter 2.5 to vCenter 4.0 this setting is not reset. For this customer that meant the result was:

256 (default slotsize) + 149 (dynamic memory overhead) = 405MB

Although a minor issue, it is definitely something to keep in mind when troubleshooting HA slot size issues. Always check vpxd.cfg to see if there is a value defined for “<slotMemMinMB>”.
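
A quick way to check for that leftover value is sketched below (Python; the path to vpxd.cfg is just an example and differs per vCenter version and platform).

```python
# Illustrative sketch: look for a leftover <slotMemMinMB> value in vpxd.cfg.
# The path below is an example; adjust it for your vCenter installation.
import xml.etree.ElementTree as ET

VPXD_CFG = r"C:\ProgramData\VMware\VMware VirtualCenter\vpxd.cfg"   # example path

root = ET.parse(VPXD_CFG).getroot()
node = root.find(".//slotMemMinMB")
slot_min_mb = int(node.text) if node is not None else 0

largest_overhead_mb = 149   # taken from the customer example above
print("memory slot size: %d MB" % (slot_min_mb + largest_overhead_mb))
# A leftover value of 256 gives 405 MB, matching the customer's case.
```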

Disk.SchedNumReqOutstanding the story

Duncan Epping · Jun 23, 2011 ·

There has been a lot of discussion in the past around Disk.SchedNumReqOutstanding, what its value should be, and how it relates to the queue depth. Jason Boche wrote a whole article about when Disk.SchedNumReqOutstanding (DSNRO) is used and when it is not, and I guess I would explain it as follows:

When two or more virtual machines are issuing I/Os to the same datastore, Disk.SchedNumReqOutstanding will limit the number of I/Os that will be issued to the LUN.

So what does that mean? It took me a while before I fully got it, so let’s try to explain it with an example. This is basically how the VMware I/O scheduler (Start-Time Fair Queueing, aka SFQ) works.

You have set the queue depth for your HBA to 64 and a single virtual machine is issuing I/Os to a datastore. As it is just a single VM, up to 64 I/Os will end up in the device driver immediately. In most environments, however, LUNs are shared by many virtual machines and in most cases these virtual machines should be treated equally. When two or more virtual machines issue I/O to the same datastore, DSNRO kicks in. However, it will only throttle the queue depth when the VMkernel has detected that the threshold of a certain counter has been reached. That counter is Disk.SchedQControlVMSwitches and by default it is set to 6, meaning that the VMkernel needs to detect 6 VM switches while handling I/O before it throttles the queue down to the value of Disk.SchedNumReqOutstanding, which is 32 by default. (A VM switch means that the selected I/O does not come from the same VM as the previous I/O.)
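
The sketch below is how I picture that logic; it is purely conceptual (the structure and names are mine, not VMkernel code), it just mirrors the counters described above.

```python
# Conceptual sketch of the throttle decision described above; not VMkernel code.
QUEUE_DEPTH = 64                 # HBA/device queue depth
DSNRO = 32                       # Disk.SchedNumReqOutstanding (default)
VM_SWITCH_THRESHOLD = 6          # Disk.SchedQControlVMSwitches (default)

vm_switches = 0
last_vm = None

def allowed_queue_depth(vm):
    """Return the queue depth the scheduler would allow when this VM issues an I/O."""
    global vm_switches, last_vm
    if last_vm is not None and vm != last_vm:
        vm_switches += 1         # the I/O comes from a different VM than the previous one
    last_vm = vm
    # Once enough switches have been seen, the LUN queue is throttled down to DSNRO
    return DSNRO if vm_switches >= VM_SWITCH_THRESHOLD else QUEUE_DEPTH

for vm in ["A", "B", "A", "B", "A", "B", "A", "B"]:
    print(vm, allowed_queue_depth(vm))   # the depth drops from 64 to 32 after the 6th switch
```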

The reason the throttling happens is that the VMkernel cannot control the order of the I/Os that have already been issued to the driver. Just imagine you have VM A issuing a lot of I/Os and another, VM B, issuing just a few. VM A would end up using most of the full queue depth all the time. Every time VM B issues an I/O it will be picked up quickly by the VMkernel scheduler (which is a different topic) and sent to the driver as soon as another I/O completes there, but it will still end up behind the 64 I/Os already in the driver, which adds significantly to its I/O latency. By limiting the number of outstanding requests, we allow the VMkernel to schedule VM B’s I/O sooner within the I/O stream from VM A, and thus we reduce the latency penalty for VM B.

Now that brings us to the second part of all the statements out there: should you really set Disk.SchedNumReqOutstanding to the same value as your queue depth? Well, if you want your I/Os processed as quickly as possible without any fairness, you probably should. But if you have mixed workloads on a single datastore and wouldn’t want virtual machines to incur excessive latency just because a single virtual machine issues a lot of I/Os, you probably shouldn’t.

Is that it? No, not really; there are a couple of questions that remain unanswered.

  • What about sequential I/O in the case of Disk.SchedNumReqOutstanding?
  • How does the VMkernel know when to stop using Disk.SchedNumReqOutstanding?

Let’s tackle the sequential I/O question first. By default the VMkernel will allow up to 8 sequential commands (controlled by Disk.SchedQuantum) from a VM in a row, even when it would normally seem more fair to take an I/O from another VM. This is done in order not to destroy the sequential nature of VM workloads, because I/Os to sectors nearby the previous I/O are handled an order of magnitude faster (10x is not unusual when excluding cache effects, or when caches are small compared to the disk size) than I/Os to sectors far away. But what is considered sequential? If the next I/O is less than 2000 sectors away from the current I/O, it is considered sequential (controlled by Disk.SectorMaxDiff).
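
A conceptual sketch of that check (again my own simplification, not the actual scheduler):

```python
# Conceptual sketch of the sequential I/O exception described above; not VMkernel code.
SCHED_QUANTUM = 8        # Disk.SchedQuantum: max sequential commands in a row per VM
SECTOR_MAX_DIFF = 2000   # Disk.SectorMaxDiff: distance still considered sequential

def keep_with_same_vm(prev_sector, next_sector, issued_in_a_row):
    """Stay with the current VM if the next I/O is sequential and the quantum is not used up."""
    is_sequential = abs(next_sector - prev_sector) < SECTOR_MAX_DIFF
    return is_sequential and issued_in_a_row < SCHED_QUANTUM

print(keep_with_same_vm(100000, 100512, issued_in_a_row=3))   # True: nearby sector, quantum left
print(keep_with_same_vm(100000, 900000, issued_in_a_row=3))   # False: too far away, not sequential
```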

Now, if for whatever reason one of the VMs becomes idle, you would more than likely prefer your active VM to be able to use the full queue depth again. This is what Disk.SchedQControlSeqReqs is for. By default Disk.SchedQControlSeqReqs is set to 128, meaning that when a VM has been able to issue 128 commands without any switches, Disk.SchedQControlVMSwitches will be reset to 0 and the active VM can use the full queue depth of 64 again. With our example above in mind, the idea is that if VM B issues I/Os only very rarely (fewer than 1 in every 128 from the other VM), we still let VM B pay the higher latency penalty, because presumably it is not disk-bound anyway.
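
Extending the earlier conceptual sketch with this reset behavior (still just an illustration, not VMkernel code):

```python
# Conceptual sketch of the reset behavior described above; not VMkernel code.
SCHED_Q_CONTROL_SEQ_REQS = 128   # Disk.SchedQControlSeqReqs (default)

def update_counters(vm, state):
    """state: dict with 'last_vm', 'vm_switches' and 'same_vm_streak'."""
    if state["last_vm"] is not None and vm != state["last_vm"]:
        state["vm_switches"] += 1        # another VM switch detected
        state["same_vm_streak"] = 0
    else:
        state["same_vm_streak"] += 1
        if state["same_vm_streak"] >= SCHED_Q_CONTROL_SEQ_REQS:
            state["vm_switches"] = 0     # other VMs went idle: full queue depth again
            state["same_vm_streak"] = 0
    state["last_vm"] = vm

state = {"last_vm": "A", "vm_switches": 6, "same_vm_streak": 0}   # throttled state
for _ in range(128):
    update_counters("A", state)          # VM A issues 128 I/Os without a switch
print(state["vm_switches"])              # 0: the throttle is lifted again
```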

To conclude, now that the coin has finally dropped for me on Disk.SchedNumReqOutstanding, I strongly feel that these advanced settings should not be changed unless specifically requested by VMware GSS. Changing these values can impact fairness within your environment and could lead to unexpected behavior from a performance perspective.

I would like to thank Thor for all the help he provided.

List of VAAI capable storage arrays?

Duncan Epping · Jun 6, 2011 ·

I was browsing the VMTN community and noticed a great tip from my colleague Mostafa Khalil which I believe is worth sharing with you. The original question was: “Does anybody have a list of which arrays support VAAI (or a certain subset of the VAAI features)?”. Mostafa updated the post a couple of days ago with the following response, which also shows the capabilities of the 2.0 version of the VMware HCL:

A new version of the Web HCL will provide search criteria specific to VAAI.

As of this date, the new interface is still in “preview” stage. You can access it by clicking the “2.0 preview” button at the top of the page which is at: http://www.vmware.com/go/hcl/

  • The criteria are grouped under Features Category, Features and Plugins.
  • Features Category: choice of “All” or “VAAI-Block”.
  • Features: choice of “All”, “Block Zero”, “Full Copy”, “HW Assisted Locking” and more.
  • Plugins: choice of “All” and any of the listed plugins.

[Screenshot: HCL.jpg]

Unfortunately there appear to be some glitches when it comes to listing all the arrays correctly, but I am confident that it will be fixed soon… Thanks Mostafa for the great tip.
