I just wanted to let you know that vSphere 4.1 has been released and is available for download. There’s no point in me rehashing the same “what’s new” info everyone else is posting today and probably for the rest of the week. Expect more detailed blog posts over the coming weeks.
Reservations primer
My colleague Craig Risinger wrote the following primer and was kind enough to share it with us. Thanks Craig!
A quick primer on VMware Reservations (not that anyone asked)…
A Reservation is a guarantee.
There’s a difference between reserving a resource and using it. A VM can use more or less than it has reserved. Also, if a reservation holder isn’t using everything it has reserved, the unused portion is shared with other VMs when it’s CPU, but not when it’s RAM. In other words, CPU reservations are friendly but memory reservations are greedy.
Reservation admission control:
If a VM has a reservation defined, the ESX host must have at least that much resource unreserved (not just unused, but unreserved) or else it will refuse to power on the VM. Reservations cannot overlap. A chunk of resource can be reserved by only one entity at a time; there can’t be two reservations on it.
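To make the distinction between “unused” and “unreserved” concrete, here is a minimal Python sketch of such an admission-control check. This is purely my own illustration, not VMware code; the Host class and its power_on method are hypothetical constructs.

```python
# A minimal sketch of reservation admission control (my own illustration,
# not VMware code). The key point: the check is against *unreserved*
# capacity, never against actual usage.

class Host:
    def __init__(self, cpu_mhz: int, mem_mb: int):
        self.cpu_mhz = cpu_mhz    # total CPU capacity
        self.mem_mb = mem_mb      # total memory capacity
        self.reserved_cpu = 0     # sum of powered-on VMs' CPU reservations
        self.reserved_mem = 0     # sum of powered-on VMs' memory reservations

    def power_on(self, cpu_res: int, mem_res: int) -> bool:
        """Admit the VM only if its reservation fits in unreserved capacity.
        Reservations never overlap, so each admitted VM claims its chunk."""
        if (cpu_res > self.cpu_mhz - self.reserved_cpu or
                mem_res > self.mem_mb - self.reserved_mem):
            return False          # refuse to power on
        self.reserved_cpu += cpu_res
        self.reserved_mem += mem_res
        return True

# These numbers foreshadow Scenario #1 below.
host = Host(cpu_mhz=16000, mem_mb=16384)
print(host.power_on(cpu_res=13000, mem_res=10240))  # True:  13 GHz / 10 GB admitted
print(host.power_on(cpu_res=0, mem_res=7168))       # False: only 6 GB is unreserved
```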
Scenario #1
Given:
An ESX host has 16 GHz (= 8 x 2 GHz cores) and 16 GB.
VM-1 and VM-2 each have 8 vCPUs and 16 GB of vRAM.
VM-1 has reserved 13 GHz of CPU resources and 10 GB of memory.
VM-1 is currently using 11 GHz of CPU resources and 9 GB of memory. (Using != reserving.)
Consequently:
VM-2 can use up to 5 GHz. (Not 3 GHz; CPU reservations are friendly.)
VM-2 can reserve up to 3 GHz. (Using != reserving. Reservations don’t overlap.)
VM-2 can use up to 6 GB. (Not 7 GB. Memory reservations are greedy.)
VM-2 can reserve up to 6 GB. (Reservations don’t overlap.)
Please note that if VM-2 had a 7 GB reservation defined, it would not power on. (Reservation admission control.)
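Working through Craig’s numbers in a few lines of Python (purely illustrative; the variable names are my own):

```python
host_cpu_ghz, host_mem_gb = 16, 16   # 8 x 2 GHz cores, 16 GB RAM

vm1_cpu_res, vm1_mem_res = 13, 10    # what VM-1 has reserved
vm1_cpu_use, vm1_mem_use = 11, 9     # what VM-1 is actually using

# CPU reservations are friendly: VM-1's unused-but-reserved cycles are
# shared, so VM-2 is limited only by what VM-1 actually uses.
vm2_cpu_usable = host_cpu_ghz - vm1_cpu_use       # 5 GHz

# Reservations never overlap, so VM-2 can only reserve unreserved CPU.
vm2_cpu_reservable = host_cpu_ghz - vm1_cpu_res   # 3 GHz

# Memory reservations are greedy: reserved pRAM is off-limits even when
# unused, so usable and reservable memory are both capped the same way.
vm2_mem_usable = host_mem_gb - vm1_mem_res        # 6 GB (not 16 - 9 = 7)
vm2_mem_reservable = host_mem_gb - vm1_mem_res    # 6 GB
```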
It’s also possible for VM-1 to use more resources than it has reserved. That makes the discussion a bit more complex. VM-1 is guaranteed whatever it’s reserved, and it also gets to fight VM-2 for more resources, assuming VM-2 hasn’t reserved the excess. I’ll come up with example scenarios for that too if you like.
There’s good reason why CPU reservations are friendly but memory reservations are greedy. Say a reservation holder is not using all of a resource, and it lets an interloper use the resource for a while; later, the reservation holder wants to use all it has reserved. An interloper can be kicked off a pCPU quickly. CPU instructions are transient, quickly finished. But RAM holds data. If an interloper was holding pRAM, its data would have to be swapped to disk before the reservation holder could repurpose that pRAM to satisfy its reservation. That swapping would take significant time and delay the reservation holder unfairly. So ESX doesn’t allow reserved pRAM to be used by an interloper.
For a more detailed discussion that gets into Resource Pools, how memory reservations do or don’t prevent host-level swapping, and more, see the following post I wrote several months ago, http://www.yellow-bricks.com/2010/03/03/cpumem-reservation-behaviour/.
Author: Craig Risinger
Are you using DPM? We need you!
I just received a request from our Product Management team. Gil Adato is working on the next generation of DPM and is seeking current and past DPM users, and he asked me to post the following message on my blog. I hope you can help Gil and VMware take DPM to the next level!
Dear VMware Customers,
Product management is starting planning efforts for the 2012 vSphere release, and we are considering numerous DPM improvements. To make the right decisions, it’s critical that we get a better understanding of our customers’ experiences with the current product and hear their feedback on what needs to be on the roadmap. We’re also interested in improvement suggestions, and we’d like to share several ideas of our own. Customers’ input will be very helpful to us, and customers will see the benefit of communicating their comments, requirements, and wish-lists directly to the product team.
If you’re a current or past DPM user and you’d like to be a part of the process and help shape the next generation of VMware’s products, please contact me directly.
My contact info is the following:
Gil Adato
[email protected]
When contacting me, please send me the following initial information:
- Company Name
- Customer contact name
- Location (Country)
- Customer’s email address
VMware’s product management team is looking forward to hearing from you and making you a part of the product development process.
Thank you for your cooperation,
Gil Adato
Memory Limits
We had an internal discussion about memory limits and what the use case for them would be. I got some great feedback on my reply and comments, so I decided to turn the whole thing into a blog article.
A comment made by one of our developers, whom I highly respect, is what triggered my reply. Please note that this is not VMware’s view or use case, but feedback that some of our customers have given our development team.
“An admin may impose a limit on VMs executing on an unloaded host to better reflect the actual service a VM will likely get once the system is loaded. (I’ve heard this use case from several admins.)”
From a memory performance perspective, that is probably the worst thing an admin can do, in my humble opinion. If you are seriously overcommitting your hosts to the point where swapping or ballooning will occur, you need to think about the way you are provisioning. I can understand, well, not really, people doing this at the CPU level, as the impact there is much smaller.
Andrew Mitchell commented on the same email and his reply is key to understanding the impact of memory limits.
“When modern OS’s boot, one of the first things they do is check to see how much RAM they have available then tune their caching algorithms and memory management accordingly. Applications such as SQL, Oracle and JVMs do much the same thing.”
I guess the best way to explain it in one line is: the limit is not exposed to the guest OS itself, and as such the app will suffer, and so will the service provided to the user.
The funny thing about this is that although the app might request everything it can get, it might not even need it. In that case, which is more common than we think, it is better to decrease the provisioned memory than to create an artificial boundary by applying a memory limit. The limit will more than likely impose an unneeded and unwanted performance impact. Simply lowering the amount of provisioned memory might impact performance, but most likely it will not, as the OS will tune its caching algorithms and memory management accordingly.
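To illustrate Andrew’s point, here is a small Python sketch (my own, using the third-party psutil library) of the kind of startup sizing logic he describes. A vSphere memory limit is not exposed inside the guest, so this code sizes its cache against provisioned memory, RAM the VM may never be allowed to back with physical pages:

```python
import psutil  # third-party: pip install psutil

# Mimics what databases and JVMs do at startup: inspect visible RAM
# and size caches accordingly. A vSphere memory limit is not visible
# here, so this number reflects *provisioned* memory only.
total_ram = psutil.virtual_memory().total  # bytes the guest OS sees

# A common heuristic: dedicate a fraction of visible RAM to caching.
cache_bytes = int(total_ram * 0.25)
print(f"Guest sees {total_ram / 2**30:.1f} GiB of RAM; "
      f"sizing cache at {cache_bytes / 2**30:.1f} GiB")

# With a 2 GB limit on an 8 GB VM, this code would still size its cache
# against 8 GiB, and ESX would have to balloon or swap the difference:
# exactly the performance hit described above.
```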
How does “das.maxvmrestartcount” work?
The number of retries is configurable as of vCenter 2.5 U4 with the advanced option “das.maxvmrestartcount”. My colleague Hugo Strydom wrote about this a while ago, and after a short discussion with one of our developers I realised Hugo’s article was not 100% correct. The default value is 5. Pre-vCenter 2.5 U4, HA would keep retrying forever, which could lead to serious problems as described in KB article 1009625, where multiple virtual machines would end up registered on multiple hosts simultaneously, leading to a confusing and inconsistent state. (http://kb.vmware.com/kb/1009625)
Important to note is that HA will try to restart the virtual machine on one of the hosts in the affected cluster; if the restart is unsuccessful on that host, the restart count is increased by 1. The first restart retry will then occur after two minutes. If that one fails, the next will occur after 4 minutes, and if that one fails, the following will occur after 8 minutes, and so on until “das.maxvmrestartcount” has been reached.
To make it clearer, look at the following:
- T+0 – Restart
- T+2 – Restart retry 1 (2 minutes after the failure)
- T+6 – Restart retry 2 (4 minutes after retry 1)
- T+14 – Restart retry 3 (8 minutes after retry 2)
- T+22 – Restart retry 4 (8 minutes after retry 3)
- T+30 – Restart retry 5 (8 minutes after retry 4)
In other words, with the default of 5 retries it could take up to 30 minutes (2 + 4 + 8 + 8 + 8) before the final restart attempt is initiated. If you increase the maximum, each additional retry will also follow at an 8-minute interval.
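A few lines of Python (my own sketch of the schedule above, not anything from the HA code itself) reproduce the timeline and the 30-minute worst case:

```python
def restart_schedule(max_retries: int = 5):
    """Yield (retry number, interval, elapsed minutes) for HA restart
    retries: 2 minutes, then 4, then 8 for every further retry."""
    elapsed = 0
    for retry in range(1, max_retries + 1):
        interval = min(2 ** retry, 8)  # 2, 4, 8, 8, 8, ...
        elapsed += interval
        yield retry, interval, elapsed

for retry, interval, elapsed in restart_schedule():
    print(f"T+{elapsed:>2} - retry {retry} "
          f"({interval} minutes after the previous attempt)")
# Prints T+2, T+6, T+14, T+22, T+30: the fifth and final retry with the
# default das.maxvmrestartcount happens 30 minutes after the failure.
```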