
Yellow Bricks

by Duncan Epping



Different tiers of storage in a single Storage DRS datastore cluster?

Duncan Epping · Aug 6, 2013 ·

This question about mixing different tiers of storage in a single Storage DRS datastore cluster keeps popping up every once in a while. I can understand where it comes from, as one would think that VM Storage Profiles combined with Storage DRS would allow you to put all tiers in one cluster and then balance within each "tier" inside that pool.

Truth is that this does not work with vSphere 5.1 and lower, unfortunately. Storage DRS and VM Storage Profiles (Profile Driven Storage) are not tightly integrated, meaning that when you provision a virtual machine into a datastore cluster and Storage DRS later needs to rebalance that cluster, it will consider ANY datastore within the datastore cluster as a possible placement destination. Yes, I agree, it is not what you hoped for, but it is what it is. (Feature request filed.) Frank visualized this nicely in his article a while back.

So when you architect your datastore clusters, there are a couple of things you will need to keep in mind. If you ask me, these are the minimum design rules (a small sanity-check sketch follows the list):

  • LUNs of the same storage tier
    • See above
  • More LUNs = more balancing options
    • Do note that size matters; a single LUN will need to be able to fit your largest VM!
  • Preferably LUNs of the same array (so VAAI offload works properly)
    • VAAI XCOPY (used by SvMotion for instance) doesn’t work when going from Array-A to Array-B
  • When replication is used, LUNs that are part of the same consistency group
    • You will want to make sure that VMs that need to be consistent from a replication perspective are not moved to a LUN that is outside of the consistency group
  • Similar availability characteristics and performance characteristics
    • You don’t want potential performance or availability to degrade when a VM is moved
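Not something from the original post, but if you want to sanity-check an existing datastore cluster against the rules above, a minimal pyVmomi sketch like the one below can list the member datastores with their type and size so you can spot obvious tier or capacity mismatches. The vCenter address, credentials, and the datastore cluster name "Tier-1-Cluster" are made-up placeholders.

# Minimal pyVmomi sketch: list the datastores in a datastore cluster so you can
# verify they are of a similar type and size before letting Storage DRS loose.
# VC_HOST/VC_USER/VC_PASS and CLUSTER_NAME are hypothetical placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

VC_HOST, VC_USER, VC_PASS = "vcenter.local", "administrator@vsphere.local", "secret"
CLUSTER_NAME = "Tier-1-Cluster"

ctx = ssl._create_unverified_context()  # lab only; use proper certificates in production
si = SmartConnect(host=VC_HOST, user=VC_USER, pwd=VC_PASS, sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.StoragePod], True)
    for pod in view.view:
        if pod.name != CLUSTER_NAME:
            continue
        print("Datastore cluster:", pod.name)
        for ds in pod.childEntity:  # the datastores backing the cluster
            s = ds.summary
            print("  %-20s type=%-6s capacity=%6.0fGB free=%6.0fGB"
                  % (s.name, s.type, s.capacity / 1024**3, s.freeSpace / 1024**3))
    view.DestroyView()
finally:
    Disconnect(si)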

Hope this helps,

Unmounting datastore fails due to vSphere HA?

Duncan Epping · Jul 5, 2013 ·

On the VMware Community Forums someone reported he was having issues unmounting datastores when vSphere HA was enabled. Internally I contacted various folks to see what was going on. The error that this customer was hitting was the following:

The vSphere HA agent on host '<hostname>' failed to quiesce file activity on datastore '/vmfs/volumes/<volume id>'

After some emails back and forth with Support and Engineering (awesome to work with such a team, by the way!) the issue was discovered, and it turns out that two separate issues around unmounting of datastores have since been resolved. Keith Farkas explained on the forums how you can figure out whether you are hitting those exact problems and in which release they are fixed, but as I realize those kinds of threads are difficult to find, I figured I would post it here for future reference:

You can determine if you are encountering this issue by searching the VC log files. Find the task corresponding to the unmount request, and see if the following error message is logged during the task's execution (fixed in 5.1 U1a):

2012-09-28T11:24:08.707Z [7F7728EC5700 error 'DAS'] [VpxdDas::SetDatastoreDisabledForHACallback] Failed to disable datastore /vmfs/volumes/505dc9ea-2f199983-764a-001b7858bddc on host [vim.HostSystem:host-30,10.112.28.11]: N3Csi5Fault16NotAuthenticated9ExceptionE(csi.fault.NotAuthenticated)

While we are on the subject, I'll also mention that there is another known issue in VC 5.0 that was fixed in VC 5.0 U1 (the fix is in VC 5.1 too). This issue relates to unmounting a force-mounted VMFS datastore. You can determine whether you are hitting this error by again checking the VC log files. If you see an error message such as the following with VC 5.0, then you may be hitting this problem. A workaround, as above, is to disable HA while you unmount the datastore.

2011-11-29T07:20:17.108-08:00 [04528 info 'Default' opID=19B77743-00000A40] [VpxLRO] -- ERROR task-396 -- host-384 -- vim.host.StorageSystem.unmountForceMountedVmfsVolume: vim.fault.PlatformConfigFault:
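For completeness, here is a small sketch (not from the original post) that scans vpxd log files for both error signatures quoted above. The log path is an assumption and will differ per vCenter version and operating system, but the search strings are taken straight from the log excerpts.

# Scan vCenter (vpxd) logs for the two unmount-related error signatures above.
# The log location is an assumption; adjust it for your vCenter installation.
import glob

SIGNATURES = [
    "Failed to disable datastore",      # HA quiesce failure during unmount (fixed in 5.1 U1a)
    "unmountForceMountedVmfsVolume",    # force-mounted VMFS unmount issue (fixed in VC 5.0 U1)
]

for path in glob.glob("/var/log/vmware/vpx/vpxd*.log"):
    with open(path, errors="ignore") as log:
        for lineno, line in enumerate(log, 1):
            if any(sig in line for sig in SIGNATURES):
                print(f"{path}:{lineno}: {line.rstrip()}")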

CPU Affinity and vSphere HA

Duncan Epping · Jun 27, 2013 ·

On the VMware Community Forums someone asked today whether CPU Affinity and vSphere HA work in conjunction and whether the combination is supported. To be fair, I had never tested this scenario, but I was certain it was supported and would work… It never hurts to validate, though, before you answer a question like that. I connected to my lab and disabled DRS for a VM so I could enable CPU affinity. I pinned the vCPUs down to cores 0 and 1 as shown in the screenshot below:

[Screenshot: CPU affinity settings]
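As a side note (not part of the original post), the same pinning can also be done through the vSphere API. The sketch below is a rough pyVmomi example; the connection and VM lookup are left out, and the exact spec properties are assumptions based on the documented VirtualMachineConfigSpec.cpuAffinity / AffinityInfo types.

# Rough pyVmomi sketch of pinning a VM's vCPUs to logical CPUs 0 and 1 via the API
# (the post used the UI). "vm" is assumed to be an existing vim.VirtualMachine object.
from pyVmomi import vim

def pin_vm_to_cpus(vm, cpu_ids):
    """Reconfigure the VM so its vCPUs may only be scheduled on the given logical CPUs."""
    spec = vim.vm.ConfigSpec()
    spec.cpuAffinity = vim.vm.AffinityInfo(affinitySet=cpu_ids)
    return vm.ReconfigVM_Task(spec=spec)  # returns a task object; wait on it if needed

# e.g. task = pin_vm_to_cpus(my_vm, [0, 1])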

After pinning the vCPUs to a set of logical CPUs I powered on the VM. The result was, as expected, a “Protected” virtual machine as shown in the screenshot below.

[Screenshot: vSphere HA protection status]

But would it get restarted if anything happened to the host? Yes it would, and of course I tested this. I switched off the server which was running this virtual machine, and within a minute vSphere HA restarted the virtual machine on one of the other hosts in the cluster. So there you have it: CPU Affinity and vSphere HA work fine together.

PS: Would I ever recommend using CPU Affinity? No I would not!

vCenter Single Sign On aka SSO, what do I recommend?

Duncan Epping · Jun 26, 2013 ·

I have had various people asking me over the last 9 months what I would recommend when it comes to SSO. Would I use a multi-site configuration, maybe even an HA configuration, or would I go for the Basic configuration? What about when I have multiple vCenter Server instances, would I share the SSO instance between these or deploy multiple SSO instances? All very valid questions, I would say. To be honest, I have intentionally kept my head down over the last year, but after reading this excellent blog post by Josh Odgers, where he posted an awesome architectural decision flow chart, I figured it was time to voice my opinion. Just look at this impression of the flow chart (for full resolution visit Josh's website).

Complex? Yes, I agree, probably too complex for most people. Difficult to digest, and that is not due to Josh's diagramming skills. SSO has various deployment models (multi-site, HA, basic), and then there is the option to deploy it centralized or localized as well. On top of that there is also the option to protect it using Heartbeat. Now you can probably understand why the flow diagram ended up looking complex. Many different options, but what makes sense?

Justin King already mentioned this as a suggestion in his blog series on SSO (part 1, 2, 3, 4), but let's drive it home! Although it might seem like it defeats the purpose, I would recommend the following in almost every scenario one can imagine: a Basic SSO deployment, local to the vCenter Server instance. Really, the KISS principle applies here. (Keep It Simple SSO!) Why do I recommend this? Well, for the following simple reasons:

  • SSO in HA mode does not make sense, as clustering the SSO database is not supported; so although you just deployed an HA solution, you still end up with a single point of failure!
  • You could separate SSO from vCenter, but why would you create a dependency on the network connection between the vCenter instance and the SSO instance? It is asking for trouble.
  • A centralized SSO instance sounds like it makes sense, but the problem is that it requires all connecting vCenter instances to be on the same version. Yes indeed, this complicates your operational model. So go localized for now.

So is there a valid reason to deviate from this? Yes there is, and it is called Linked Mode. Linked Mode "requires" SSO to be deployed in a "multi-site" configuration, which is probably the one case where I would not follow the KISS principle: when there is a requirement for Linked Mode. Personally I never use Linked Mode though; I find it confusing.

So there you have it, KISS!

How does Mem.MinFreePct work with vSphere 5.0 and up?

Duncan Epping · Jun 14, 2013 ·

With vSphere 5.0 VMware changed the way Mem.MinFreePct works. I had briefly explained Mem.MinFreePct in a blog post a long time ago. Basically Mem.MinFreePct, pre-vSphere 5.0, was the percentage of memory set aside by the VMkernel to ensure there are always sufficient system resources available. I received a question on Twitter yesterday based on the explanation in the vSphere 5.1 Clustering Deepdive, and after exchanging more than 10 tweets I figured it made sense to just write an article.

@FrankDenneman @DuncanYB reviewing vSphere 5.1 Clustering Deepdive and have difficulties with explanation of Memory shares on pages 160-161

— Tim Jabaut (@vmcutlip) June 13, 2013

Mem.MinFreePct used to be 6% with vSphere 4.1 and lower. Now you can imagine that on a host with 10GB of memory you would not worry about 600MB being kept free, but that is slightly different for a host with 100GB, as it would result in 6GB being kept free. Still not an extreme amount, but what happens when you have a host with 512GB of memory? Yes, that would result in roughly 30GB of memory being kept free. I am guessing you can see the point now. So what changed with vSphere 5.0?

In vSphere 5.0 a "sliding scale" principle was introduced instead of Mem.MinFreePct. Let me call it "Mem.MinFree", as I wouldn't view this as a percentage but rather do the math and view it as a number instead. Let's borrow Frank's table for this sliding scale concept:

Percentage kept free    Memory range
6%                      0-4GB
4%                      4-12GB
2%                      12-28GB
1%                      Remaining memory

What does this mean if you have 100GB of memory in your host? It means that for the first 4GB of memory we set aside 6%, which equates to ~245MB. For the next 8GB (the 4-12GB range) we set aside 4%, which equates to ~327MB. For the next 16GB (the 12-28GB range) we set aside 2%, which again equates to ~327MB. From the remaining 72GB (100GB host minus 28GB) we set aside 1%, which equates to ~720MB. In total the value of Mem.MinFree is ~1619MB, and this amount is kept free for the system.
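To make the sliding scale concrete, here is a small Python sketch of the calculation (the function name and structure are mine, not from the original article). Note that using 1024MB per GB consistently gives ~1638MB for the 100GB example; the slightly lower ~1619MB above comes from the rounded per-range values.

# Sliding-scale Mem.MinFree calculation as described above (names are mine).
def mem_min_free_mb(host_memory_gb):
    """Approximate Mem.MinFree (in MB) for a host with the given amount of memory."""
    # (upper bound of the range in GB, percentage kept free for that slice)
    ranges = [(4, 0.06), (12, 0.04), (28, 0.02), (float("inf"), 0.01)]
    total_mb = 0.0
    lower = 0
    for upper, pct in ranges:
        slice_gb = max(0, min(host_memory_gb, upper) - lower)
        total_mb += slice_gb * 1024 * pct
        lower = upper
    return total_mb

print(round(mem_min_free_mb(100)))  # ~1638MB for a 100GB host
print(round(mem_min_free_mb(512)))  # ~5857MB, far less than the old 6% (~30GB)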

Now, what happens when the host has less than 1619MB of free memory? That is when the various memory reclamation techniques come into play. We all know the famous "high, soft, hard, and low" memory states; these used to be explained as 6% (High), 4% (Soft), 2% (Hard), and 1% (Low). FORGET THAT! Yes, I mean that… forget those, as that is what we used in the "old world" (pre-5.0). With vSphere 5.0 and up these watermarks should be viewed as a percentage of Mem.MinFree. I used the example from above to clarify what that results in.

Free memory state    Threshold (percentage of Mem.MinFree)    Threshold in MB
High water mark      Higher than or equal to Mem.MinFree      1619MB
Soft water mark      64% of Mem.MinFree                       1036MB
Hard water mark      32% of Mem.MinFree                       518MB
Low water mark       16% of Mem.MinFree                       259MB
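Continuing the sketch from above, the watermarks follow directly from Mem.MinFree; a quick example using the values from the table (variable names are mine):

# Reclamation watermarks expressed as a fraction of Mem.MinFree.
min_free_mb = 1619  # the worked example for a 100GB host

watermarks = {"high": 1.00, "soft": 0.64, "hard": 0.32, "low": 0.16}
for state, fraction in watermarks.items():
    print(f"{state:>4} watermark: {min_free_mb * fraction:.0f}MB")
# prints: high 1619MB, soft 1036MB, hard 518MB, low 259MB -- matching the table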

I hope this clarifies a bit how vSphere 5.0 (and up) ensures there is sufficient memory available for the VMkernel to handle system tasks…

