vSphere HA and VMs per Datastore limit!

I felt I needed to get this out there, as it is not something many seem to be aware of. More and more people are starting to use storage solutions which offer one large shared datastore; examples are solutions like Virtual SAN, Tintri and Nutanix. I have seen various folks saying "unlimited number of VMs per datastore", but of course there are limits to everything! If you are planning to build a big cluster (HA enabled), keep in mind that per cluster the limit is 2048 powered-on virtual machines per datastore! Say what? Yes, that is right: per cluster you are limited to 2048 powered-on VMs on a single datastore. This is documented in the Configuration Maximums guide of both vSphere 5.5 and vSphere 5.1. Please note it says datastore and not VMFS or NFS explicitly; this applies to both!

The reason for this today is the vSphere HA poweron list, which I described in this article. In short: this list keeps track of the power state of your virtual machines. If you need more than 2048 VMs in your cluster, you will need to create multiple datastores for now. (More details in the blog post.) Do note that this is a known limitation, and I have been told that the engineering team is researching a solution to this problem. Hopefully it will be in one of the upcoming releases.
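
If you are wondering how close you are to this limit today, you can simply count the powered-on VMs per datastore. Below is a minimal pyVmomi sketch of that idea; the vCenter address, credentials and datastore name are placeholders you would need to replace:

```python
# Minimal sketch: count the powered-on VMs on a datastore with pyVmomi.
# The vCenter address, credentials and datastore name are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.local", user="administrator@vsphere.local",
                  pwd="password", sslContext=ssl._create_unverified_context())
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.Datastore], True)
    for ds in view.view:
        if ds.name != "BigSharedDatastore":  # placeholder datastore name
            continue
        # Note: the 2048 limit applies per cluster; if multiple clusters
        # share this datastore you would need to group the VMs by cluster.
        powered_on = [vm for vm in ds.vm
                      if vm.runtime.powerState ==
                      vim.VirtualMachinePowerState.poweredOn]
        print("%s: %d powered-on VMs (HA limit: 2048 per cluster)"
              % (ds.name, len(powered_on)))
    view.DestroyView()
finally:
    Disconnect(si)
```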

vSphere HA advanced settings, the KB

I’ve posted about vSphere HA advanced settings various times in the past, and let me start by saying that you shouldn’t play around with them unless you have a requirement to do so. But if you do, there is a KB article which I can highly recommend, as it lists all the known and lesser-known advanced settings. I had the KB article updated with the vSphere 5.5 advanced settings yesterday (thanks KB team for being so responsive!), but it also applies to vSphere 5.0 and 5.1. Recommended read for those who want to get into the nitty-gritty details of vSphere HA.

http://kb.vmware.com/kb/2033250
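
And if you want to check which advanced options are currently set on a cluster before comparing against the KB, a minimal pyVmomi sketch could look like the one below. The connection details and cluster name are placeholders:

```python
# Minimal sketch: list the vSphere HA (das) advanced options set on a
# cluster. The vCenter address, credentials and cluster name are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.local", user="administrator@vsphere.local",
                  pwd="password", sslContext=ssl._create_unverified_context())
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    cluster = next(c for c in view.view if c.name == "Cluster-01")  # placeholder
    view.DestroyView()
    # dasConfig holds the HA configuration, including the advanced options
    for opt in cluster.configurationEx.dasConfig.option:
        print("%s = %s" % (opt.key, opt.value))
finally:
    Disconnect(si)
```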

With a single Datastore, can I still use HA’s Datastore heartbeating?

I had a question last week around HA’s datastore heartbeating: does datastore heartbeating still work if you only have one datastore in your environment? I can understand where the question comes from, as HA throws an error that you need to have two datastores at a minimum for HA datastore heartbeating to function correctly. I want to point out that even though HA says two datastores is the minimum, even when only one datastore is available it will be used for heartbeat purposes. Yes, this error will be there on your cluster, and yes, you can suppress it using “das.ignoreInsufficientHbDatastore“. I figured others might be hitting the same error and have the same question, so why not document it?!
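
For those who would rather script the suppression than click through the UI, a minimal pyVmomi sketch could look like the one below. The connection details and cluster name are placeholders; the advanced option itself is the one mentioned above:

```python
# Minimal sketch: set das.ignoreInsufficientHbDatastore to suppress the
# heartbeat datastore warning. Connection details and the cluster name
# are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.local", user="administrator@vsphere.local",
                  pwd="password", sslContext=ssl._create_unverified_context())
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    cluster = next(c for c in view.view if c.name == "Cluster-01")  # placeholder
    view.DestroyView()

    spec = vim.cluster.ConfigSpecEx()
    spec.dasConfig = vim.cluster.DasConfigInfo()
    spec.dasConfig.option = [vim.option.OptionValue(
        key="das.ignoreInsufficientHbDatastore", value="true")]
    # modify=True merges this change into the existing cluster configuration
    cluster.ReconfigureComputeResource_Task(spec, modify=True)
finally:
    Disconnect(si)
```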

Different tiers of storage in a single Storage DRS datastore cluster?

This question around adding different tiers of storage to a single Storage DRS datastore cluster keeps popping up every once in a while. I can understand where it is coming from, as one would think that VM Storage Profiles combined with Storage DRS would allow you to have all tiers of storage in one cluster, and then balance within a given “tier” inside that pool.

Truth is that this does not work with vSphere 5.1 and lower, unfortunately. Storage DRS and VM Storage Profiles (Profile Driven Storage) are not tightly integrated, meaning that when you provision a virtual machine into a datastore cluster and Storage DRS needs to rebalance the cluster at some point, it will consider ANY datastore within that datastore cluster as a possible placement destination. Yes, I agree, it is not what you hoped for... it is what it is. (Feature request filed.) Frank visualized this nicely in his article a while back:

So when you architect your datastore clusters, there are a couple of things you will need to keep in mind. If you ask me, these are the design rules at a minimum (a small sketch to help eyeball these follows the list):

  • LUNs of the same storage tier
    • See above
  • More LUNs = more balancing options
    • Do note that size matters: a single LUN will need to be able to fit your largest VM!
  • Preferably LUNs of the same array (so VAAI offload works properly)
    • VAAI XCOPY (used by SvMotion for instance) doesn’t work when going from Array-A to Array-B
  • When replication is used, LUNs that are part of the same consistency group
    • You will want to make sure that VMs that need to be consistent from a replication perspective are not moved to a LUN that is outside of the consistency group
  • Similar availability characteristics and performance characteristics
    • You don’t want potential performance or availability to degrade when a VM is moved
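
There is no API property that tells you the “tier” of a datastore, so validating these rules is mostly a manual exercise. As a starting point, here is a minimal pyVmomi sketch (connection details and the datastore cluster name are placeholders) that lists the members of a datastore cluster with their type and capacity, so you can spot anything that does not belong:

```python
# Minimal sketch: list the member datastores of a datastore cluster
# so you can verify they match in tier, size and type.
# Connection details and the datastore cluster name are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.local", user="administrator@vsphere.local",
                  pwd="password", sslContext=ssl._create_unverified_context())
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.StoragePod], True)
    pod = next(p for p in view.view if p.name == "DatastoreCluster-Gold")  # placeholder
    view.DestroyView()

    for ds in pod.childEntity:
        summary = ds.summary
        print("%-20s type=%-5s capacity=%6d GB free=%6d GB"
              % (summary.name, summary.type,
                 summary.capacity / (1024 ** 3),
                 summary.freeSpace / (1024 ** 3)))
finally:
    Disconnect(si)
```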

Hope this helps,

Unmounting datastore fails due to vSphere HA?

On the VMware Community Forums someone reported he was having issues unmounting datastores when vSphere HA was enabled. Internally I contacted various folks to see what was going on. The error that this customer was hitting was the following:

The vSphere HA agent on host '<hostname>' failed to quiesce file activity on datastore '/vmfs/volumes/<volume id>'

After some emails back and forth with Support and Engineering (awesome to work with such a team, by the way!) the issue was identified; it turns out two separate issues around unmounting of datastores have been resolved. Keith Farkas explained on the forums how you can figure out whether you are hitting those exact problems and in which release they are fixed, but as those kinds of threads are difficult to find, I figured I would post it here for future reference:

You can determine if you are encountering this issue by searching the VC log files. Find the task corresponding to the unmount request, and see if the following error message is logged during the task’s execution (fixed in 5.1 U1a):

2012-09-28T11:24:08.707Z [7F7728EC5700 error 'DAS'] [VpxdDas::SetDatastoreDisabledForHACallback] Failed to disable datastore /vmfs/volumes/505dc9ea-2f199983-764a-001b7858bddc on host [vim.HostSystem:host-30,10.112.28.11]: N3Csi5Fault16NotAuthenticated9ExceptionE(csi.fault.NotAuthenticated)

While we are on the subject, I’ll also mention that there is another known issue in VC 5.0 that was fixed in VC 5.0 U1 (the fix is in VC 5.1 too). This issue relates to unmounting a force-mounted VMFS datastore. You can determine whether you are hitting it by again checking the VC log files. If you see an error message such as the following with VC 5.0, then you may be hitting this problem. A workaround, as above, is to disable HA while you unmount the datastore.

2011-11-29T07:20:17.108-08:00 [04528 info 'Default' opID=19B77743-00000A40] [VpxLRO] -- ERROR task-396 -- host-384 -- vim.host.StorageSystem.unmountForceMountedVmfsVolume: vim.fault.PlatformConfigFault:
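
If you have a pile of log files to go through, something like the short sketch below can save time. It is just an illustration: the log directory is a placeholder, and the patterns are simply the two error signatures from above. It scans vpxd log files and flags any line matching either issue:

```python
# Minimal sketch: scan vCenter (vpxd) log files for the two error
# signatures described above. The log directory is a placeholder.
import glob

SIGNATURES = {
    "HA unmount issue (fixed in 5.1 U1a)":
        "Failed to disable datastore",
    "Force-mounted VMFS unmount issue (fixed in VC 5.0 U1)":
        "unmountForceMountedVmfsVolume: vim.fault.PlatformConfigFault",
}

for path in glob.glob("/var/log/vmware/vpxd/vpxd-*.log"):  # placeholder path
    with open(path, errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            for issue, signature in SIGNATURES.items():
                if signature in line:
                    print("%s:%d [%s] %s" % (path, number, issue, line.strip()))
```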

CPU Affinity and vSphere HA

On the VMware Community Forums someone asked today whether CPU affinity and vSphere HA work in conjunction and whether the combination is supported. To be fair, I had never tested this scenario, but I was certain it was supported and would work… it never hurts to validate before you answer a question like that, though. I connected to my lab and disabled DRS for a VM so I could enable CPU affinity. I pinned the vCPUs down to cores 0 and 1, as shown in the screenshot below:

[Screenshot: CPU affinity settings]
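
If you would rather do the same pinning programmatically than through the UI, a minimal pyVmomi sketch could look like the following; the VM name and connection details are placeholders:

```python
# Minimal sketch: pin a VM's vCPUs to logical CPUs 0 and 1 with pyVmomi.
# The VM name and connection details are placeholders; note that DRS must
# not be managing the VM, or the reconfigure will be rejected.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.local", user="administrator@vsphere.local",
                  pwd="password", sslContext=ssl._create_unverified_context())
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    vm = next(v for v in view.view if v.name == "TestVM")  # placeholder name
    view.DestroyView()

    spec = vim.vm.ConfigSpec()
    spec.cpuAffinity = vim.vm.AffinityInfo(affinitySet=[0, 1])
    vm.ReconfigVM_Task(spec)
finally:
    Disconnect(si)
```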

After pinning the vCPUs to a set of logical CPUs I powered on the VM. The result was, as expected, a “Protected” virtual machine as shown in the screenshot below.

[Screenshot: HA protection state]

But would it get restarted if anything happened to the host? Yes it would, and of course I tested this. I switched off the server which was running this virtual machine, and within a minute vSphere HA restarted the virtual machine on one of the other hosts in the cluster. So there you have it: CPU affinity and vSphere HA work fine together.

PS: Would I ever recommend using CPU affinity? No, I would not!