vSphere HA advanced settings, the KB

I’ve posted about vSphere HA advanced settings various times in the past, and let me start by saying that you shouldn’t play around with them unless you have a requirement to do so. But if you do, there is a KB article which I can highly recommend as it lists all the known and lesser known advanced settings. I had the KB article updated with vSphere 5.5 advanced settings yesterday (Thanks KB team for being so responsive!) but it also applies to vSphere 5.0 and 5.1. Recommended read for those who want to get in to the nitty gritty details of vSphere HA.

http://kb.vmware.com/kb/2033250

With a single Datastore can I still use HA’s Datastore heartbeating?

I had a question last week around HA’s datastore heartbeating, the question was if datastore heartbeating still worked if you only have 1 datastore in your environment. I can understand where the question comes from as HA throws this error that you need to have 2 datastores at a minimum for HA datastore heartbeating to function correctly. I want to point out that even though HA says that 2 datastores is the minimum, even when only one datastore is available it will be used for heartbeat purposes. Yes this error will be there on your cluster, and yes you can suppress it using “das.ignoreInsufficientHbDatastore“. I figured others might be hitting the same error and have the same question so why not document it?!

Minimum bandwidth requirements per concurrent vMotion?

I have been digging for a long time now to figure out what the minimum bandwidth requirements are per concurrent vMotion. After a long time I finally managed to get a statement. In the past the statement was made that 622Mbps was the minimum required bandwidth for vMotion, it appears that this is incorrect for vSphere 5.0 and higher. With vSphere 5.0 a new feature called Stun During Page Send (SDPS) was introduced and this has decreased the bandwidth requirements from 622Mpbs down to 250Mbps per concurrent vMotion.

Always nice to know right?!

Unmounting datastore fails due to vSphere HA?

On the VMware Community Forums someone reported he was having issues unmounting datastores when vSphere HA was enabled. Internally I contacted various folks to see what was going on. The error that this customer was hitting was the following:

The vSphere HA agent on host '<hostname>' failed to quiesce file activity on datastore '/vmfs/volumes/<volume id>'

After some emails back and forth with Support and Engineering (awesome to work with such a team by the way!) the issue was discovered and it seems that in two separate instances issues were resolved that had to do with unmounting of datastores. Keith Farkas explained on the forums how you can figure out if you are hitting those exact problems or not and in which release they are fixed, but at I realize those kind of threads are difficult to find I figured I would post it here for future reference:

You can determine if you are encountering this issue by searching the VC log files. Find the task corresponding to the unmount request, and see if the follow error message is logged during the task’s execution (Fixed in 5.1 U1a) :

2012-09-28T11:24:08.707Z [7F7728EC5700 error 'DAS'] [VpxdDas::SetDatastoreDisabledForHACallback] Failed to disable datastore /vmfs/volumes/505dc9ea-2f199983-764a-001b7858bddc on host [vim.HostSystem:host-30,10.112.28.11]: N3Csi5Fault16NotAuthenticated9ExceptionE(csi.fault.NotAuthenticated)

While we are on the subject, I’ll also mention that there is another know issue in VC 5.0 that was fixed in VC5.0U1 (the fix is in VC 5.1 too). This issue related to unmounting a force mounted VMFS datastore. You can determine whether you are hitting this error by again checking the VC log files. If you see an error message such as the following with VC 5.0, then you may be hitting this problem. A work around, like above, is to disable HA while you unmount the datastore.

2011-11-29T07:20:17.108-08:00 [04528 info 'Default' opID=19B77743-00000A40] [VpxLRO] -- ERROR task-396 -- host-384 -- vim.host.StorageSystem.unmountForceMountedVmfsVolume: vim.fault.PlatformConfigFault:

CPU Affinity and vSphere HA

On the VMware Community Forums someone asked today if CPU Affinity and vSphere HA worked in conjunction and if it was supported. To be fair I never tested this scenario, but I was certain it was supported and would work… Never hurts to  validate though before you answer a question like that. I connected to my lab and disabled a VM for DRS so I could enable CPU affinity. I pinned the CPUs down to core 0 and 1 as shown in the screenshot below:

cpu affinity

After pinning the vCPUs to a set of logical CPUs I powered on the VM. The result was, as expected, a “Protected” virtual machine as shown in the screenshot below.

HA protection

But would it get restarted if anything happened to the host? Yes it would, and I tested this of course. I switched the server off which was running this virtual machine and within a minute vSphere HA restarted the virtual machine on one of the other hosts in the cluster. So there you have it, CPU Affinity and vSphere HA work fine.

PS: Would I ever recommend using CPU Affinity? No I would not!

How does Mem.MinFreePct work with vSphere 5.0 and up?

With vSphere 5.0 VMware changed the way Mem.MinFreePct worked. I had briefly explained Mem.MinFreePct in a blog post a long time ago. Basically Mem.MinFreePct, pre vSphere 5.0, was the percentage of memory set aside by the VMkernel to ensure there are always sufficient system resources available. I received a question on twitter yesterday based on the explanation in the vSphere 5.1 Clustering Deepdive and after exchanging > 10 tweets I figured it made sense to just write an article.

Mem.MinFreePct used to be 6% with vSphere 4.1 and lower. Now you can imagine that when you had a host with 10GB you wouldn’t worry about 600MB being kept free, but that is slightly different for a host with 100GB as it would result in 6GB being kept free but still not an extreme amount right. What would happen when you have a host with 512GB of memory… Yes, that would result in 30GB of memory being kept free. I am guessing you can see the point now. So what changed with vSphere 5.0?

In vSphere 5.0 a “sliding scale” principle was introduced instead of Mem.MinFreePct. Let me call it “Mem.MinFree”, as I wouldn’t view this as a percentage but rather do the math and view it as a number instead. Lets borrow Frank’s table for this sliding scale concept:

Percentage kept free of –>
Memory Range
6% 0-4GB
4% 4-12GB
2% 12-28GB
1% Remaining memory

What does this mean if you have 100GB of memory in your host? It means that from the first 4GB of memory we will set aside 6% which equates to ~ 245MB. For the next 8GB (4-12GB range) we set aside another 4% which equates to ~327MB. For the next 16GB (12-28GB range) we set aside 2% which equates to ~ 327MB. Now from the remaining 72GB (100GB host – 28GB) we set aside 1% which equates to ~ 720MB. In total the value of Mem.MinFree is ~ 1619MB. This number, 1619MB, is being kept free for the system.

Now, what happens when the host has less than 1619MB of free memory? That is when the various memory reclamation techniques come in to play. We all know the famous “high, soft, hard, and low” memory states, these used to be explained as: 6% (High), 4% (Soft), 2% (Hard), 1% (Low). FORGET THAT! Yes, I mean that… forget these as that is what we used in the “old world” (pre 5.0). With vSphere 5.0 and up these water marks should be viewed as a Percentage of Mem.MinFree. I used the example from above to clarify it a bit what it results in.

Free memory state Threshold in Percentage
Threshold in MB
High water mark Higher than or equal to Mem.MinFree 1619MB
Soft water mark 64% of Mem.MinFree 1036MB
Hard water mark 32% of Mem.MinFree 518MB
Low water mark 16% of Mem.MinFree 259MB

I hope this clarifies a bit how vSphere 5.0 (and up) ensures there is sufficient memory available for the VMkernel to handle system tasks…

Reminder: Free Kindle copy of vSphere 5 and 4.1 Clustering Deepdive

Just a reminder, if you want your free Kindle copy of the vSphere 5.0 Clustering Deepdive or the vSphere 4.1 HA and DRS Deepdive, make sure to check Amazon (US Kindle Store) today and tomorrow (Wednesday June the 5th and Thursday June the 6th)! You can download the Kindle copy of both these books for free, yes that is correct ZERO dollars.

So make sure you pick it up either before the promo expires…

** Note I have linked the US Kindle stores, it is also available for free in local Kindle stores, just do search! **