What is: Current Memory Failover Capacity?

I have had this question many times now: what is the “Current Memory Failover Capacity” that is shown in the cluster summary when you have selected the percentage-based admission control policy? What is that percentage? 99% of what? Will it go down to 0%, or will it go down to the percentage that you reserved? Well, I figured it was time to put things to the test and stop guessing.

As shown in the screenshot above, I have selected 33% of memory to be reserved and currently have 99% memory failover capacity. Let’s power on a bunch of virtual machines and see what happens. Below is the result shown in a screenshot: “Current Memory Failover Capacity” went down from 99% to 94%.

Also, when I increase the reservation on a virtual machine, I can see “Current Memory Failover Capacity” drop even further. So it is not about “used” but about “unreserved / reserved” memory resources (including memory overhead), let that be absolutely clear! When will vCenter Server shout “Insufficient resources to satisfy configured failover level for vSphere HA”?

It shouldn’t be too difficult to figure that one out: just power on new VMs until it says “stop it”, as you can see in the screenshot below. This happens when you reach the percentage you specified to reserve as memory failover capacity. In other words, in my case I reserved 33%; when “Current Memory Failover Capacity” reaches 33%, the VM is not allowed to power on, as that would violate the selected admission control policy.
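The behavior described above can be sketched in a few lines. This is a simplified model of the percentage-based admission control math, not actual vCenter code, and the cluster size and overhead numbers are hypothetical: capacity is driven by reservations plus memory overhead, and a power-on is denied when it would push the capacity below the configured percentage.

```python
# Sketch of percentage-based admission control (numbers are hypothetical).
# HA counts reserved memory plus overhead, not "used" memory.

def current_mem_failover_capacity(total_mb, vms):
    """Current Memory Failover Capacity as a percentage.

    vms: list of (reservation_mb, overhead_mb) tuples for powered-on VMs.
    """
    reserved = sum(res + ovh for res, ovh in vms)
    return (total_mb - reserved) / total_mb * 100

def power_on_allowed(total_mb, vms, new_vm, reserved_pct):
    """Admission control: capacity after power-on must stay >= reserved_pct."""
    return current_mem_failover_capacity(total_mb, vms + [new_vm]) >= reserved_pct

total = 100 * 1024            # hypothetical 100 GB cluster, 33% reserved
vms = [(0, 200)] * 5          # five VMs, no reservation, ~200 MB overhead each
print(round(current_mem_failover_capacity(total, vms)))    # ~99
# a VM with a 70 GB reservation would push capacity below the 33% threshold
print(power_on_allowed(total, vms, (70 * 1024, 200), 33))  # False
```

Note how five VMs without reservations barely move the needle (only their overhead counts), while one large reservation is enough to trip admission control.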

I agree, this is kind of confusing…  But I guess when you run out of resources it will become pretty clear very quickly ;-)


Using das.vmMemoryMinMB with Percentage Based admission control

I had a question today about using the advanced settings to set a minimum amount of resources that HA would use to do the admission control math with. Many of us have used the advanced settings das.vmMemoryMinMB and das.vmCpuMinMHz to dictate the slot size when no reservations were set in an environment where the “host failures” admission control policy was used. However, what many don’t appear to realize is that this also works for the Percentage Based admission control policy.

If you want to avoid extreme overcommitment and want to specify a minimum amount of resources that HA should use to do the math with, you can use these settings even with the Percentage Based admission control policy. Where a VM’s reservation does not exceed the specified value, the value is used in the math. In other words, if you set “das.vmMemoryMinMB” to 2048, HA will use 2048 in the math unless the reservation set on the VM is higher.
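In other words, per VM the math uses the larger of the reservation and das.vmMemoryMinMB. A minimal sketch, with a hypothetical 200 GB cluster and two VMs (this is my model of the behavior, not HA source code):

```python
# Sketch of how das.vmMemoryMinMB changes the percentage-based admission
# control math (cluster size and reservations are hypothetical): per VM,
# HA uses the larger of the memory reservation and das.vmMemoryMinMB.

def mem_used_for_math(reservation_mb, vm_memory_min_mb):
    return max(reservation_mb, vm_memory_min_mb)

total = 200 * 1024                # hypothetical 200 GB cluster
reservations = [0, 2048]          # two VMs: no reservation, 2 GB reservation

# Without the advanced setting only the 2 GB reservation counts
without = sum(mem_used_for_math(r, 0) for r in reservations)
# With das.vmMemoryMinMB = 20480 each VM counts for at least 20 GB
with_min = sum(mem_used_for_math(r, 20480) for r in reservations)

print(round((total - without) / total * 100))   # 99
print(round((total - with_min) / total * 100))  # 80
```

The drop in “Current Memory Failover Capacity” is much steeper with the advanced setting in place, even though nothing changed on the VMs themselves.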

I did a quick experiment in my test lab which I had just rebuilt. Without das.vmMemoryMinMB and two VMs running (with no reservation) I had 99% Mem Failover Capacity as shown in the screenshot below:

With das.vmMemoryMinMB set to 20480, and two VMs running, I had 78% Mem Failover Capacity as shown in the screenshot below:

I guess that proves that you can use das.vmMemoryMinMB and das.vmCpuMinMHz to influence Percentage Based admission control.

Why the world needs Software Defined Storage

Yesterday I was at a Software Defined Datacenter event organized by IBM and VMware. The famous Cormac Hogan presented on Software Defined Storage and I very much enjoyed hearing about the VMware vision and, of course, Cormac’s take on this. Coincidentally, last week I read this article by long-time community guru Jason Boche on VAAI and the number of VMs per volume, and after a discussion with a customer yesterday (at the event) about their operational procedures for provisioning new workloads, I figured it was time to write down my thoughts.

I have seen many different definitions so far for Software Defined Storage and I guess there is a grain of truth in all of them. Before I explain what it means to me, let me describe the challenges people commonly face today.

In a lot of environments managing storage and the associated workloads is a tedious task. It is not uncommon to see large spreadsheets with long lists of LUNs, IDs, capabilities, groupings and whatever else is relevant to them and their workloads. These spreadsheets are typically used to decide where to place a virtual machine or virtual disk. Based on the requirements of the application a specific destination will be selected. On top of that, a selection will need to be made based on the currently available disk space of a datastore and of course the current IO load. You do not want to randomly place your virtual machine and find out two days later that you are running out of disk space… Well, that is if you have a relatively mature provisioning process. Of course it is also not uncommon to just pick a random datastore and hope for the best.

To be honest, I can understand why many people randomly provision virtual machines. Keeping track of virtual disks, datastores, performance, disk space and other characteristics… it is simply too much, and boring. Didn’t we invent computer systems to do these repeatable boring tasks for us? That leads us to the question: where and how should Software Defined Storage help you?

A common theme recurring in many “Software Defined” solutions presented by VMware is:

Abstract, Pool, Automate.

This also applies to Software Defined Storage, in my opinion. These are the three basic requirements that a Software Defined Storage solution should meet. But what does this mean and how does it help you? Let me try to make some sense out of that nice three-word marketing slogan:

Software Defined Storage should enable you to provision workloads to a pool of virtualized physical resources based on service level agreements (defined in a policy) in an automated fashion.

I understand that is a mouthful, so let’s elaborate a bit more. Think about the challenges I described above… or what Jason described with regards to “VMs per Volume” and how various components can impact your service level. A Software Defined Storage (SDS) solution should be able to intelligently place virtual disks (virtual machines / vApps) based on the policy selected for the object (virtual disk / machine / appliance). These policies typically contain characteristics of the provided service level. On top of that, a Software Defined Storage solution should take risks / constraints into account, meaning that you don’t want your workload deployed to a volume which is running out of disk space, for instance.

What about those characteristics, what are those? Characteristics could be anything; here are two simple examples to make it a bit more obvious:

  • Does your application require recoverability after a disaster? –> SDS selects a destination that is replicated, or instructs the storage system to create a replicated object for the VM
  • Does your application require a certain level of performance? –> SDS selects a destination that can provide this performance, or instructs the storage system to reserve storage resources for the VM
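The placement logic behind the examples above can be sketched as a small filter over datastores. This is a deliberately simplified toy, and every name in it (datastore names, capability keys, the headroom constraint) is hypothetical, not from any product:

```python
# Toy sketch of policy-driven placement (all names hypothetical): pick a
# datastore whose capabilities satisfy the policy, while excluding risky
# destinations such as volumes that are nearly out of disk space.

datastores = [
    {"name": "ds-gold",   "replicated": True,  "max_iops": 20000, "free_gb": 50},
    {"name": "ds-silver", "replicated": True,  "max_iops": 20000, "free_gb": 900},
    {"name": "ds-bronze", "replicated": False, "max_iops": 5000,  "free_gb": 2000},
]

def place(policy, needed_gb, datastores, headroom_gb=100):
    candidates = [
        ds for ds in datastores
        if ds["replicated"] == policy["replicated"]
        and ds["max_iops"] >= policy["min_iops"]
        and ds["free_gb"] - needed_gb >= headroom_gb  # constraint: keep headroom
    ]
    # prefer the candidate with the most free space left
    return max(candidates, key=lambda ds: ds["free_gb"])["name"] if candidates else None

policy = {"replicated": True, "min_iops": 10000}
print(place(policy, 200, datastores))  # ds-silver: ds-gold would run out of space
```

The point is not the code itself but the shift it represents: the spreadsheet logic (capabilities, free space, risk) becomes a policy evaluated by software, instead of a manual lookup.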

Now this all sounds a bit vague, but I am purposely avoiding product and feature names. Software Defined Storage is not about a particular feature, product or storage system. Although I dropped the word policy, note that enabling Profile Driven Storage within vCenter Server does not give you a Software Defined Storage solution. It shouldn’t matter either (to a certain extent) whether you are using EMC, NetApp, Nimbus, a VMware software solution or any of the thousands of other storage systems out there. Any of those systems, or even a combination of them, should work in the software defined world. To be clear, in my opinion there isn’t (today) such a thing as a Software Defined Storage product; it is a strategy. It is a way of operating that particular part of your datacenter.

To be fair, there is a huge difference between various solutions. There are products and features out there that will enable you to build a solution like this and transform the way you manage your storage and provision new workloads. Products and features that will allow you to create a flexible offering. VMware has been and is working hard to be a part of this space, vSphere Replication / Storage DRS / Storage IO Control / Virsto / Profile Driven Storage are part of the “now”, but just the beginning… Virtual Volumes, Virtual Flash and Distributed Storage have all been previewed at VMworld and are potentially what is next. Who knows what else is in the pipeline or what other vendors are working on.

If you ask me, there are exciting times ahead. Software Defined Storage is a big part of the Software Defined Data Center story and you can bet this will change datacenter architecture and operations.

** There are two excellent articles on this topic, the first by Bill Earl and the second by Christos Karamanolis; make sure to read their perspectives. **

Write-Same vs XCopy when using Storage vMotion

I had a question last week about Storage vMotion and when Write-Same vs XCopy is used. I was confident I knew the answer, but I figured I would do some testing. So what exactly was the question, and what scenario did I test?

Imagine you have a virtual machine with a “lazy zero thick” disk and an “eager zero thick” disk. When initiating a Storage vMotion while preserving the disk format, would the pre-initialized blocks in the “eager zero thick” disk be copied through XCopy, or would “write-same” (aka zero-out) be used?

So that is what I tested. I created a virtual machine with two disks, one “lazy zero thick” and about half filled, the other “eager zero thick”. I did a Storage vMotion to a different datastore (same format as source) and checked esxtop while the migration was ongoing:

CLONE_WR = 21943
ZERO = 2

In other words, when preserving the disk format, the XCopy command (CLONE_WR) is issued by the hypervisor. The reason is that when doing a Storage vMotion while keeping the disk format the same, the copy command is issued per chunk, and the hypervisor does not read the blocks before handing the command to the array. Hence the hypervisor doesn’t know these are “zero” blocks in the “eager zero thick” disk and simply offloads the copy to the array.

Of course, it would be interesting to see what happens if, during the migration, I specify that all disks need to become “eager zero thick”; remember, one of the disks was “lazy zero thick”:

CLONE_WR = 21928
ZERO = 35247

It is clear that in this case it does zero out the blocks (ZERO). As there is a range of blocks which aren’t used by the virtual machine yet, the hypervisor ensures these blocks are zeroed so that they can be used immediately when the virtual machine needs them… as that is what the admin requested: “eager zero thick”, aka pre-zeroed.
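The behavior observed in the two tests can be summarized in a toy model. To be clear, this is my interpretation of the esxtop counters, not actual hypervisor code, and the chunk counts are made up:

```python
# Toy model of the observed behavior (not actual hypervisor code): when the
# disk format is preserved, every chunk is offloaded with XCopy (CLONE_WR),
# because the hypervisor never reads the blocks and cannot know they are
# zeroes. When converting lazy to eager zero thick, chunks the VM has not
# written yet are pre-zeroed with Write-Same (ZERO) instead.

def vaai_primitives(chunks, convert_to_eager):
    """chunks: list of booleans, True if the chunk holds VM-written data."""
    stats = {"CLONE_WR": 0, "ZERO": 0}
    for written in chunks:
        if written or not convert_to_eager:
            stats["CLONE_WR"] += 1  # copy offloaded to the array (XCopy)
        else:
            stats["ZERO"] += 1      # unwritten chunk pre-zeroed (Write-Same)
    return stats

chunks = [True] * 60 + [False] * 40  # disk roughly half filled
print(vaai_primitives(chunks, convert_to_eager=False))  # all CLONE_WR, no ZERO
print(vaai_primitives(chunks, convert_to_eager=True))   # mix of CLONE_WR / ZERO
```

This matches the two esxtop snapshots: preserving the format gives almost exclusively CLONE_WR, while converting to “eager zero thick” produces a large ZERO count alongside it.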

For those who want to play around with this, check esxtop and then the VAAI stats. I described how to do this in this article.

How to disable Datastore Heartbeating

I have had this question multiple times now: how do I disable datastore heartbeating? Personally, I don’t know why you would ever want to do this… but as multiple people have asked, I figured I would write it down. There is no “disable” button, unfortunately, but there is a workaround. Below are the steps you need to take to disable datastore heartbeating.

vSphere Client:

  • Right-click the cluster object
  • Click “Edit Settings”
  • Click “Datastore Heartbeating”
  • Click “Select only from my preferred datastores”
  • Do not select any datastores

Web Client:

  • Click the cluster object
  • Click “Manage” tab
  • Click “vSphere HA”
  • Click “Edit button” on the right side
  • Click “Datastore Heartbeating”
  • Click “Select only from my preferred datastores”
  • Do not select any datastores

It is as simple as that… However, let me stress that this is not something I would recommend doing. Only do this when you are troubleshooting and need it disabled for whatever reason, and please make sure to re-enable it when you are done.