
Yellow Bricks

by Duncan Epping


resource management

New fling released: VM Resource and Availability Service

Duncan Epping · Feb 2, 2015 ·

I have the pleasure of announcing a brand new fling that was released today. This fling is called “VM Resource and Availability Service” and is something I came up with during a flight to Palo Alto while talking to Frank Denneman. When it comes to HA Admission Control, the one thing that always bugged me was that it is all based on static values. Yes, it is great to know my VMs will restart, but I would also like to know if they will receive the resources they were receiving before the fail-over. In other words, will my user experience be the same or not? After going back and forth with engineering, we agreed this was worth exploring further and decided to create a fling. I want to thank Rahul (DRS team), Manoj and Keith (HA team) for taking the time and going to this extent to explore the concept.

Something which I think is also unique is that this is a SaaS-based solution: it allows you to upload a DRM dump and then simulate the failure of one or more hosts in a cluster (in vSphere) and identify how many:

  • VMs would be safely restarted on different hosts
  • VMs would fail to be restarted on different hosts
  • VMs would experience performance degradation after being restarted on a different host

With this information, you can better plan the placement and configuration of your infrastructure to reduce downtime of your VMs/services in case of host failures. Is that useful or what? I would like to ask everyone to go through the motions and, of course, to provide feedback on whether you feel this is useful information or not. You can leave feedback on this blog post or on the fling website; we aim to monitor both.

For those who don’t know where to find the DRM dump, Frank described it in his article on the drmdiagnose fling, which I also recommend trying out! There is also a readme file with a bit more in-depth info!

  • vCenter server appliance: /var/log/vmware/vpx/drmdump/clusterX/
  • vCenter server Windows 2003: %ALLUSERSPROFILE%\Application Data\VMware\VMware VirtualCenter\Logs\drmdump\clusterX\
  • vCenter server Windows 2008: %ALLUSERSPROFILE%\VMware\VMware VirtualCenter\Logs\drmdump\clusterX\

So where can you find it? Well, that is really easy; no downloads as I said… it runs fully as a service:

  1. Open hasimulator.vmware.com to access the web service.
  2. Click on “Simulate Now” to accept the EULA terms, upload the DRM dump file and start the simulation process.
  3. Click on the help icon (at the top right corner) for a detailed description on how to use this service.

Hardening recommendation to set limits on VMs or Resource Pools?

Duncan Epping · Jul 25, 2013 ·

I received a question last week about a recommendation in the vSphere 5.1 Hardening Guide. The recommendation is the following:

By default, all virtual machines on an ESXi host share the resources equally. By using the resource management capabilities of ESXi, such as shares and limits, you can control the server resources that a virtual machine consumes.  You can use this mechanism to prevent a denial of service that causes one virtual machine to consume so much of the host’s resources that other virtual machines on the same host cannot perform their intended functions.

Now it might be just me, but I don’t get the recommendation, and my answer to this customer was as follows:
Virtual machines can never use more CPU/memory resources than provisioned. For instance, when 4GB of memory is provisioned for a virtual machine, the Guest OS of that VM will never consume more than 4GB. The same applies to CPU: if a VM has a single vCPU, then that VM can never consume more than a single core of a CPU.

So how do I limit my VM? First of all: right-sizing! If your VM needs 4GB, then don’t provision it with 12GB, as at some point it will consume it. Secondly: shares. Shares are the easiest way to ensure that the “noisy neighbor” isn’t pushing away the other virtual machines. Even by leaving the shares set to default you can ensure that at least all “alike” VMs have more or less the same priority when it comes to resources. So what about limits?

Try to avoid (VM-level) limits at all times! Why? Well, look at memory for a second. Let’s say you provision your VM with 4GB and limit it to 4GB, and now someone changes the memory to 8GB but forgets to change the limit. So what happens? Well, your VM uses up the 4GB and moves into the “extra” 4GB, but the limit is there, so the VM will experience memory pressure and you will see ballooning / swapping etc. Not a scenario you want to find yourself in, right? What about CPU then? Well, again, it is a hard limit in ALL scenarios. So if you set a 1GHz limit but have a 2.3GHz CPU, your VM will never consume the full 2.3GHz… A waste? Yes it is. And it is not just VM-level limits; resource pool limits come with an operational impact as well.
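
To make the stale-limit scenario concrete, here is a minimal pyVmomi sketch (just an illustration, not an official tool; the vCenter address and credentials are hypothetical) that flags every VM whose memory limit is lower than its provisioned memory and then removes that limit (-1 means unlimited):

from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim
import ssl

# Hypothetical vCenter and credentials -- replace with your own.
ctx = ssl._create_unverified_context()  # lab use only, skips certificate verification
si = SmartConnect(host="vcenter.local", user="administrator@vsphere.local",
                  pwd="password", sslContext=ctx)
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.VirtualMachine], True)

for vm in view.view:
    limit_mb = vm.config.memoryAllocation.limit      # -1 means no limit is set
    provisioned_mb = vm.config.hardware.memoryMB
    if limit_mb != -1 and limit_mb < provisioned_mb:
        print(f"{vm.name}: provisioned {provisioned_mb}MB, limited to {limit_mb}MB")
        spec = vim.vm.ConfigSpec()
        spec.memoryAllocation = vim.ResourceAllocationInfo(limit=-1)  # drop the limit
        vm.ReconfigVM_Task(spec=spec)

Disconnect(si)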

I can understand what the hardening guide is suggesting, but believe me, you don’t want to go there. So let it be clear: AVOID using limits at all times!

How does Mem.MinFreePct work with vSphere 5.0 and up?

Duncan Epping · Jun 14, 2013 ·

With vSphere 5.0 VMware changed the way Mem.MinFreePct works. I had briefly explained Mem.MinFreePct in a blog post a long time ago. Basically, Mem.MinFreePct, pre vSphere 5.0, was the percentage of memory set aside by the VMkernel to ensure there are always sufficient system resources available. I received a question on Twitter yesterday based on the explanation in the vSphere 5.1 Clustering Deepdive, and after exchanging more than 10 tweets I figured it made sense to just write an article.

https://twitter.com/vmcutlip/status/345289952684290048

Mem.MinFreePct used to be 6% with vSphere 4.1 and lower. Now you can imagine that with a host with 10GB you wouldn’t worry about 600MB being kept free, but that is slightly different for a host with 100GB, as it would result in 6GB being kept free; still not an extreme amount, right? But what happens when you have a host with 512GB of memory… Yes, that would result in roughly 30GB of memory being kept free. I am guessing you can see the point now. So what changed with vSphere 5.0?

In vSphere 5.0 a “sliding scale” principle was introduced instead of Mem.MinFreePct. Let me call it “Mem.MinFree”, as I wouldn’t view this as a percentage but rather do the math and view it as a number instead. Let’s borrow Frank’s table for this sliding scale concept:

Percentage kept free    Memory range
6%                      0-4GB
4%                      4-12GB
2%                      12-28GB
1%                      Remaining memory

What does this mean if you have 100GB of memory in your host? It means that from the first 4GB of memory we will set aside 6% which equates to ~ 245MB. For the next 8GB (4-12GB range) we set aside another 4% which equates to ~327MB. For the next 16GB (12-28GB range) we set aside 2% which equates to ~ 327MB. Now from the remaining 72GB (100GB host – 28GB) we set aside 1% which equates to ~ 720MB. In total the value of Mem.MinFree is ~ 1619MB. This number, 1619MB, is being kept free for the system.
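
If you want to repeat this math for other host sizes, here is a tiny Python sketch of the sliding-scale calculation (my own illustration of the concept, not VMkernel code):

def min_free_mb(host_memory_gb):
    """Approximate Mem.MinFree (in MB) using the sliding scale described above."""
    brackets = [(4, 0.06), (8, 0.04), (16, 0.02)]    # 0-4GB @ 6%, 4-12GB @ 4%, 12-28GB @ 2%
    remaining_gb = host_memory_gb
    total_mb = 0.0
    for size_gb, pct in brackets:
        portion_gb = min(remaining_gb, size_gb)
        total_mb += portion_gb * 1024 * pct
        remaining_gb -= portion_gb
    total_mb += max(remaining_gb, 0) * 1024 * 0.01   # everything above 28GB @ 1%
    return total_mb

print(round(min_free_mb(100)))   # ~1638MB; the rounded per-range values above add up to ~1619MB
print(round(min_free_mb(512)))   # ~5857MB, versus ~30GB with the old flat 6%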

Now, what happens when the host has less than 1619MB of free memory? That is when the various memory reclamation techniques come into play. We all know the famous “high, soft, hard, and low” memory states; these used to be explained as 6% (High), 4% (Soft), 2% (Hard), 1% (Low). FORGET THAT! Yes, I mean that… forget these, as that is what we used in the “old world” (pre 5.0). With vSphere 5.0 and up these watermarks should be viewed as a percentage of Mem.MinFree. I used the example from above to clarify what this results in.

Free memory state    Threshold in percentage                 Threshold in MB
High water mark      Higher than or equal to Mem.MinFree     1619MB
Soft water mark      64% of Mem.MinFree                      1036MB
Hard water mark      32% of Mem.MinFree                      518MB
Low water mark       16% of Mem.MinFree                      259MB
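
The same thresholds, expressed as a quick Python sketch using the ~1619MB figure from the 100GB example (again just an illustration):

min_free_mb = 1619   # Mem.MinFree for the 100GB host example above
watermarks = {"High": 1.00, "Soft": 0.64, "Hard": 0.32, "Low": 0.16}
for state, fraction in watermarks.items():
    print(f"{state} water mark: {min_free_mb * fraction:.0f}MB")
# High 1619MB, Soft 1036MB, Hard 518MB, Low 259MB, matching the table above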

I hope this clarifies a bit how vSphere 5.0 (and up) ensures there is sufficient memory available for the VMkernel to handle system tasks…
