Yellow Bricks

by Duncan Epping


ESX

Re: RTFM “What I learned today – HA Split Brain”

Duncan Epping · Jul 22, 2009 ·

I’m going to start with a quote from Mike’s article “What I learned today…“:

Split brain is an HA situation where an ESX host becomes “orphaned” from the rest of the cluster because its primary service console network has failed. As you might know, the COS network is used in the process of checking if an ESX host has suffered an untimely demise. If you fail to protect the COS network by giving vSwitch0 two NICs or by adding a 2nd COS network to, say, your VMotion switch, undesired consequences can occur. Anyway, the time for detecting split brain used to be 15 seconds; for some reason this has changed to 12 seconds. I’m not 100% sure why, or if in fact the underlying value has changed – or whether VMware has merely corrected its own documentation. You see, it’s possible to get split brain happening in VI3.5 if the network goes down for more than 12 seconds, but comes back up on the 13th, 14th or 15th second. I guess I will have to do some research on this one. Of course, the duration can be changed – and split brain is a trivial matter if you take the necessary network redundancy steps…

I thought this issue was common knowledge, but if Mike doesn’t know about it, my guess is that most of you don’t either. Before we dive into Mike’s article: technically this is not a split brain but an “orphaned VM” scenario; the disk files and the in-memory VM do not end up split between hosts.

Before we start, this setting is key in Mike’s example:

das.failuredetectiontime = the time period during which a host receives no heartbeats from another host before it declares that host dead.

The default value is 15 seconds. In other words, the host will be declared dead on the fifteenth second and a restart will be initiated by one of the primary hosts.

For now, let’s assume the isolation response is “power off”. The VMs can only be restarted elsewhere once the current VMs have been powered off. Here’s the catch: the “power off” (isolation response) will be initiated by the isolated host 2 seconds before the das.failuredetectiontime expires.

Does this mean that you can end up with your VMs being down and HA not restarting them?
Yes. When the heartbeat returns between the 13th and the 15th second, the shutdown could already have been initiated. The restart, however, will not be initiated, because the returning heartbeat indicates that the host is not isolated.
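
To make the window concrete, here is the timeline with the default das.failuredetectiontime of 15 seconds and a “power off” isolation response:

second 0: the last heartbeat is received; the host is potentially isolated
second 13: the isolated host triggers the isolation response and powers off its VMs (2 seconds before das.failuredetectiontime expires)
second 13 to 15: if the heartbeat returns in this window, the VMs have already been powered off, but no restart is initiated because the host is never declared dead
second 15: if the heartbeat has not returned, the host is declared dead and one of the primary hosts initiates the restarts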

How can you avoid this?
Pick “Leave VM powered on” as the isolation response. Increasing the das.failuredetectiontime will also decrease the chances of running into issues like these.
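
For those wondering how to change it: das.failuredetectiontime is set as an HA advanced option on the cluster, and the value is specified in milliseconds (15000 being the default), so increasing it to, say, 30 seconds would look like this:

das.failuredetectiontime = 30000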

Did this change?
No, it has been like this ever since it was introduced.

Up to 80 virtual machines per host in an HA Cluster (3.5 vs vSphere)

Duncan Epping · Jul 16, 2009 ·

I was re-reading the KB article on how to improve HA scaling. Apparently, pre vCenter 2.5 U5 there was a problem when the number of VMs that need to fail over exceeded 35. Keep in mind that this is a soft limit; you can run more than 35 VMs on a single host in an HA cluster if you want to.

To increase scalability up to 80 VMs per host, vCenter needs to be upgraded to 2.5 U5 and the following configuration changes are recommended:

  1. Increase the maximum vCPU limit to 192.
  2. Increase the Service Console memory limit to 512 MB.
  3. Increase the memory resource reservation of the vim resource pool to 1024 MB.
  4. Include/edit the host agent memory configuration values (hostdStopMemInMB=380 and hostdWarnMemInMB=300).

A question I immediately had was: what about vSphere? What are the values for vSphere, and do I need to increase them as well? Here are the vSphere default settings:
  1. Maximum vCPU limit: 512
  2. Service Console memory limit: 300 MB
  3. vim resource pool memory reservation: 0 MB
  4. hostdStopMemInMB=380 and hostdWarnMemInMB=300

As you can see, 1 and 4 are already covered by the vSphere defaults. I would always recommend setting the Service Console memory to 800 MB; with most hosts having 32 GB or more, the cost of assigning an extra 500 MB to the Service Console is minimal. That leaves the recommendation to increase the memory reservation of the vim resource pool, which I would recommend leaving at the default value. vSphere scales up to 100 VMs per host in an HA cluster, and chances are that this will be increased when U1 hits the streets. (These values usually change with every release.)
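
Purely as an illustration of the comparison above, here is a quick sketch in Python; the values come straight from the two lists in this post, and the labels are just descriptions, not actual configuration keys:

# Compare the vCenter 2.5 U5 recommendations against the vSphere defaults
# listed above. Labels are descriptive only, not real configuration keys.
recommendations = {
    "maximum vCPU limit": 192,
    "Service Console memory (MB)": 512,
    "vim resource pool reservation (MB)": 1024,
    "hostdStopMemInMB": 380,
    "hostdWarnMemInMB": 300,
}
vsphere_defaults = {
    "maximum vCPU limit": 512,
    "Service Console memory (MB)": 300,
    "vim resource pool reservation (MB)": 0,
    "hostdStopMemInMB": 380,
    "hostdWarnMemInMB": 300,
}
for name, recommended in recommendations.items():
    default = vsphere_defaults[name]
    status = "already covered" if default >= recommended else "differs"
    print(f"{name}: vSphere default {default}, recommendation {recommended} ({status})")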

NetApp’s vSphere best practices and EMC’s SRM in a can

Duncan Epping · Jul 15, 2009 ·

This week both NetApp and EMC released updated versions of documents I highly recommend to everyone interested in virtualization! Now, some might think: why would I want to read a NetApp document when we are an EMC shop? Or why would I want to read an EMC document when my environment is hosted on a NetApp FAS3050? The answer is simple: although both documents contain vendor-specific information, there’s more to be learned from them because the focus is on VMware products. No marketing nonsense, just the good stuff!

NetApp’s guide dives into the basics of multipathing, for instance. The section on iSCSI/NFS is especially useful: how do I set up multiple VMkernels for load balancing, and what are the pros and cons? EMC’s SRM and Celerra guide includes a full how-to on setting this up, covering not only the EMC side but also the VMware SRM side of it. Like I said, both documents are highly recommended!

  • TR-3749: vSphere on NetApp Storage Best Practices
  • EMC Celerra VSA and VMware SRM setup and configurations guide

Max amount of VMs per VMFS volume

Duncan Epping · Jul 7, 2009 ·

Two weeks ago I discussed how to determine the correct LUN/VMFS size. In short, it boils down to the following formula:

round((maxVMs * avgSize) + 20%)
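
As a quick illustration (a hedged sketch; the example numbers are placeholders, not recommendations), this is what the formula looks like in Python:

import math

def vmfs_size_gb(max_vms, avg_vm_size_gb, overhead=0.20):
    # maxVMs * avgSize plus ~20% headroom for snapshots and .vswp files, rounded up
    return math.ceil(max_vms * avg_vm_size_gb * (1 + overhead))

# Example: 16 VMs of 30 GB on average -> a 576 GB volume
print(vmfs_size_gb(16, 30))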

In other words: the maximum number of virtual machines per volume, multiplied by the average size of a virtual machine, plus 20% for snapshots and .vswp files, rounded up. (As pointed out in the comments, if you have VMs with large amounts of memory you will need to adjust the percentage accordingly.) This should be your default VMFS size. Now, a question that was asked in one of the comments, which I already expected, was “how do I determine what the maximum number of VMs per volume is?”. There’s an excellent white paper on this topic. Of course there’s more than meets the eye, but based on this white paper, and especially the table it contains, I decided to give it a shot:

No matter what I tried typing up, and believe me I started over a billion times, it all came down to this:

  1. Decide your optimal queue depth.
    I could do a write-up, but Frank Denneman wrote an excellent blog on this topic. Read it here, and read the article by NetApp’s Nick Triantos as well. But in short, you’ve got two options:

    • Queue Depth = (Target Queue Depth / Total number of LUNs mapped from the array) / Total number of hosts connected to a given LUN
    • Queue Depth = LUN Queue Depth / Total number of hosts connected to a given LUN

    There are two options because some vendors use a Target Queue Depth and others specifically specify a LUN Queue Depth. In case they mention both, just take the one that is most restrictive.

  2. Now that you know what your queue depth should be, let’s figure out the rest.
    Let’s take a look at the table first. I added “mv” as it was not labeled as such in the table.
    n = LUN Queue Depth
    a = Average active SCSI Commands per server
    d = Queue Depth (from a host perspective)

    m = Max number of VMs per ESX host on a single VMFS volume
    mv = Max number of VMs on a shared VMFS volume

    First, let’s figure out what “m”, the max number of VMs per host on a single volume, should be:

    • d/a = m
      queue depth 64 / 4 active I/Os on average per VM = 16 VMs per host on a single VMFS volume

    The second one is “mv”, the max number of VMs on a shared VMFS volume:

    • n/a = mv
      LUN Queue Depth of 1024 / 4 active I/Os on average per VM = 256 VMs in total on a single VMFS volume, spread across multiple hosts
  3. Now that we know “d”, “m” and “mv”, it should be fairly easy to give a rough estimate of the maximum number of VMs per LUN, provided you actually know your average number of active I/Os (see the short calculation sketch right after this list). I know this will be your next question, so my tip of the day:
    Windows: perfmon, “average disk queue length”. This contains both active and queued commands.
    For Linux this is “top”, and if you are already running a virtual environment, open up esxtop and take a look at “qstats”.
    Another option, of course, would be running Capacity Planner.
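
As referenced in step 3, here is a minimal sketch of the calculation in Python, using the example numbers from this post (a host queue depth of 64, a LUN/target queue depth of 1024 and 4 average active I/Os per VM); the variable names simply mirror the legend above:

def max_vms_per_host(d, a):
    # m = d / a: host-side queue depth divided by average active I/Os per VM
    return d // a

def max_vms_on_volume(n, a):
    # mv = n / a: LUN/target queue depth divided by average active I/Os per VM
    return n // a

d = 64    # queue depth from a host perspective
n = 1024  # LUN (or target) queue depth
a = 4     # average active SCSI commands per VM

print(max_vms_per_host(d, a))   # 16 VMs per host on a single VMFS volume
print(max_vms_on_volume(n, a))  # 256 VMs in total on the shared VMFS volume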

Please don’t overthink this. If you are experiencing issues, there are always ways to move VMs around; that’s why VMware invented Storage VMotion. Standardize your environment for ease of management, and make sure you feel comfortable with the number of “eggs in one basket”.

vSphere and vmfs-undelete

Duncan Epping · Jul 3, 2009 ·

This week someone asked me in chat during the VMTN Podcast whether I knew where vmfs-undelete resided in vSphere. I had a look but couldn’t find it either. A quick search gave me this:

vmfs-undelete utility is not available for ESX/ESXi 4.0
ESX/ESXi 3.5 Update 3 included a utility called vmfs-undelete, which could be used to recover deleted .vmdk files. This utility is not available with ESX/ESXi 4.0.

Workaround: None. Deleted .vmdk files cannot be recovered.

So if you are currently actively using vmfs-undelete and are looking into upgrading to vSphere, take this into account!

