esxi

Memory Limits

Duncan Epping · Jul 6, 2010 ·

We had a discussion internally around memory limits and what the use case would be for using them. I got some great feedback on my reply and comments so I decided to turn the whole thing into a blog article.

A comment made by one of our developers, which I highly respect, is what triggered my reply. Please note that this is not VMware’s view or usecase but what some of our customers feed back to our development team.

An admin may impose a limit on VMs executing on an unloaded host to better reflect the actual service a VM will likely get once the system is loaded; I’ve heard this use case from several admins)

From a memory performance perspective that is probably the worst thing an Admin can do in my humble opinion. If you are seriously overcommitting your hosts up to the point where swapping or ballooning will occur you need to think about the way you are provisioning. I can understand, well not really, people doing it on a CPU level as the impact is much smaller.

Andrew Mitchell commented on the same email and his reply is key to understanding the impact of memory limits.

“When modern OS’s boot, one of the first things they do is check to see how much RAM they have available then tune their caching algorithms and memory management accordingly. Applications such as SQL, Oracle and JVMs do much the same thing.”

I guess the best way to explain in one line is: The limit is not exposed to the OS itself and as such the App will suffer and so will the service provided to the user.

The funny thing about this is that although the App might request everything it can it, it might not even need it. In that case, more common than we think, it is better to decrease provisioned memory than to create an artificial boundary by applying a memory limit. The limit will more than likely impose an unneeded and unwanted performance impact. Simply lowering the amount of provisioned memory might impact performance but most likely will not as the OS will tune it’s caching algorithms and memory management accordingly.

How does “das.maxvmrestartcount” work?

Duncan Epping · Jun 30, 2010 ·

The amount of retries is configurable as of vCenter 2.5 U4 with the advanced option “das.maxvmrestartcount”. My colleague Hugo Strydom wrote about this a while ago and after a short discussion with one of our developers I realised Hugo’s article was not 100% correct. The default value is 5. Pre vCenter 2.5 U4 HA would keep retrying forever which could lead to serious problems as described in KB article 1009625 where multiple virtual machines would be registered on multiple hosts simultaneously leading to a confusing and inconsistent state. (http://kb.vmware.com/kb/1009625)

Important to note is that HA will try to start the virtual machine one of your hosts in the affected cluster; if this is unsuccessful on that host the restart count will be increased by 1. The first restart attempt will than occur after two minutes. If that one fails the next will occur after 4 minutes, and if that one fails the following will occur after 8 minutes until the “das.maxvmrestartcount” has been reached.

To make it more clear look at the following:

T+0 – Restart
T+2 – Restart retry 1
T+4 – Restart retry 2
T+8 – Restart retry 3
T+8 – Restart retry 4
T+8 – Restart retry 5

In other words, it could take up to 30 minutes before a successful restart has been initiated when using the default of “5” restarts max. If you increase that number, each following will also be “T+8” again.

DRS Sub Cluster? vSphere 4.next

Duncan Epping · Jun 21, 2010 ·

On the community forums a question was asked around Campus Clusters and pinning VMs to a specific set of hosts. In vSphere 4.0 that’s currently not possible unfortunately and it definitely is a feature that many customers would want to use.

Banjot Chanana revealed during VMworld that it was an upcoming feature but did not go into much details. However on the community forums, thanks @lamw for point this out, Elisha just revealed the following:

Controls will be available in the upcoming vSphere 4.1 release to enable this behavior. You’ll be able to set “soft” (ie. preferential) or “hard” (ie. strict) rules associating a set of vms with a set of hosts. HA will respect the hard rules and only failover vms to the appropriate hosts.

Basically DRS Host Affinity rules which VMware HA adheres to. Can’t wait for the upcoming vSphere version to be released and to figure out how all these nice “little” enhancements change our designs.

HA: Max amount of host failures?

Duncan Epping · Jun 18, 2010 ·

A colleague had a question around the maximum amount of host failures HA could take. The availability guide states the following:

The maximum Configured Failover Capacity that you can set is four. Each cluster has up to five primary hosts and if all fail simultaneously, failover of all hosts might not be successful.

However, when you select the “Percentage” admission control policy you can set it to 50% even when you have 32 hosts in a cluster. That means that the amount of failover capacity being reserved equals 16 hosts.

Although this is fully supported but there is a caveat of course. The amount of primary nodes is still limited to five. Even if you have the ability to reserve over 5 hosts as spare capacity that does not guarantee a restart. If, for what ever reason, half of your 32 hosts cluster fails and those 5 primaries happen to be part of the failed hosts your VMs will not restart. (One of the primary nodes coordinates the fail-over!) Although the “percentage” option enables you to save additional spare capacity there’s always the chance all primaries fail.

All in all, I still believe the Percentage admission control policy provides you more flexibility than any other admission control policy.

Storage IO Control, the movie

Duncan Epping · Jun 17, 2010 ·

Not sure why hardly anyone picked up on this cool youtube movie about Storage IO Control(SIOC), but I figured it was worth posting. SIOC is probably one of the coolest version coming to a vSphere version in the near future. Scott Drummonds wrote a cool article about it which shows the strength of SIOC when it comes to fairness. One might say that there already is a tool to do it and that’s called per VM disk shares, well that’s not entirely true… The following diagrams depict the current situation(without…) and the future(with…) :

As the diagrams clearly shows, the current version of shares are on a per Host basis. When a single VM on a host floods your storage all other VMs on the datastore will be effected. Those who are running on the same host could easily, by using shares, carve up the bandwidth. However if that VM which causes the load would move a different host the shares would be useless. With SIOC the fairness mechanism that was introduced goes one level up. That means that on a cluster level Disk shares will be taken into account.

There are a couple of things to keep in mind though:

SIOC is enabled per datastore
SIOC only applies disk shares when a certain threshold(Device Latency, most likely 30ms) has been reached.
- The latency value will be configurable but it is not recommended for now
SIOC carves out the array queue, this enables a faster response for VMs doing for instance sequential IOs
SIOC will enforce limits in terms of IOPS when specified on the VM level
No reservation setting for now…

Anyway, enough random ramblings… here’s the movie. Watch it!

For those with a VMworld account I can recommend watching TA3461.