I just wanted to let you know that vSphere 4.1 has been released and is available for download. There’s no point in me rehashing the same “what’s new” info everyone else is rehashing today and probably for the rest of the week. Expect some more detailed blog posts over the course of the coming weeks.
Reservations primer
My colleague Craig Risinger wrote the below and was kind enough to share it with us. Thanks Craig!
A quick primer on VMware Reservations (not that anyone asked)…
A Reservation is a guarantee.
There’s a difference between reserving a resource and using it. A VM can use more or less than it has reserved. Also, if a reservation-holder isn’t using all the reserved resource, it will share CPU but not RAM. In other words, CPU reservations are friendly but memory reservations are greedy.
Reservation admission control:
If a VM has a reservation defined, the ESX host must have at least that much resource unreserved (not just unused, but unreserved) or else it will refuse to power on the VM. Reservations cannot overlap. A chunk of resource can be reserved by only one entity at a time; there can’t be two reservations on it.
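To make that admission control check concrete, here is a minimal sketch in plain Python (hypothetical names, not any VMware API): a VM only powers on if the host still has enough unreserved capacity, regardless of how much is merely unused.

```python
# Minimal sketch of reservation admission control (hypothetical names,
# not the actual VMware API). A VM only powers on if the host still has
# enough *unreserved* capacity, no matter how much is merely unused.

def can_power_on(host_capacity, existing_reservations, new_reservation):
    unreserved = host_capacity - sum(existing_reservations)
    return new_reservation <= unreserved

# Host with 16 GB of RAM and 10 GB already reserved by another VM:
print(can_power_on(16, [10], 6))   # True  - fits exactly in the unreserved 6 GB
print(can_power_on(16, [10], 7))   # False - would overlap the existing reservation
```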
Scenario #1
Given:
An ESX host has 16 GHz (= 8 x 2 GHz cores) and 16 GB.
VM-1 and VM-2 each have 8 vCPUs and 16 GB of vRAM.
VM-1 has reserved 13 GHz of CPU resources and 10 GB of memory.
VM-1 is currently using 11 GHz of CPU resources and 9 GB of memory. (Using != reserving.)
Consequently:
VM-2 can use up to 5 GHz. (Not 3 GHz. CPU reservations are friendly.)
VM-2 can reserve up to 3 GHz. (Using != reserving. Reservations don’t overlap.)
VM-2 can use up to 6 GB. (Not 7 GB. Memory reservations are greedy.)
VM-2 can reserve up to 6 GB. (Reservations don’t overlap.)
Please note that if VM-2 had a 7 GB reservation defined, it would not power on. (Reservation admission control.)
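For those who like to see the arithmetic spelled out, here is a small sketch in plain Python (illustrative only) that derives the four numbers for VM-2 from the figures above:

```python
# Worked arithmetic for Scenario #1 (illustrative only).
host_cpu_ghz, host_mem_gb = 16, 16

vm1_cpu_res, vm1_mem_res = 13, 10   # what VM-1 has reserved
vm1_cpu_use, vm1_mem_use = 11, 9    # what VM-1 is actually using

# CPU reservations are friendly: reserved but unused GHz are still usable by VM-2.
vm2_cpu_usable     = host_cpu_ghz - vm1_cpu_use   # 16 - 11 = 5 GHz
# Reservations never overlap, so VM-2 can only reserve what is still unreserved.
vm2_cpu_reservable = host_cpu_ghz - vm1_cpu_res   # 16 - 13 = 3 GHz

# Memory reservations are greedy: reserved RAM is off limits even while unused.
vm2_mem_usable     = host_mem_gb - vm1_mem_res    # 16 - 10 = 6 GB
vm2_mem_reservable = host_mem_gb - vm1_mem_res    # 16 - 10 = 6 GB

print(vm2_cpu_usable, vm2_cpu_reservable, vm2_mem_usable, vm2_mem_reservable)  # 5 3 6 6
```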
It’s also possible for VM-1 to use more resources than it has reserved. That makes the discussion a bit more complex. VM-1 is guaranteed whatever it has reserved, and it also gets to fight VM-2 for more resources, assuming VM-2 hasn’t reserved the excess. I’ll come up with example scenarios for that too if you like.
There’s good reason why CPU reservations are friendly but memory reservations are greedy. Say a reservation holder is not using all of a resource, and it lets an interloper use the resource for a while; later, the reservation holder wants to use all it has reserved. An interloper can be kicked off a pCPU quickly. CPU instructions are transient, quickly finished. But RAM holds data. If an interloper was holding pRAM, its data would have to be swapped to disk before the reservation holder could repurpose that pRAM to satisfy its reservation. That swapping would take significant time and delay the reservation holder unfairly. So ESX doesn’t allow reserved pRAM to be used by an interloper.
For a more detailed discussion that gets into Resource Pools, how memory reservations do or don’t prevent host-level swapping, and more, see the following post I wrote several months ago, http://www.yellow-bricks.com/2010/03/03/cpumem-reservation-behaviour/.
Author: Craig Risinger
Changes to Snapshot mechanism “Delete All”
I don’t know if anyone noticed it or not, but with the latest set of patches VMware changed the “Delete All” mechanism that is part of the Snapshot feature. I wrote multiple articles about the “Delete All” functionality as it often led to completely filled up VMFS volumes when someone used it without knowing the inner workings.
When using the Delete All option in Snapshot Manager, the snapshot farthest from the base disk is committed to its parent first, causing that parent snapshot to grow. When the commit is complete, that snapshot is removed and the process repeats with the newly grown snapshot and its parent. This continues until every snapshot has been committed.
This method can be relatively slow, since data farthest from the base disk might be copied several times. More importantly, this method can aggressively use disk space if the snapshots are large, which is especially problematic if a limited amount of space is available on the datastore. The space issue is all the more troublesome because deleting snapshots is often exactly what you do to free up storage.
This issue is resolved in this release: the order of snapshot consolidation has been changed to start with the snapshot closest to the base disk instead of the one farthest away. The end result is that data is no longer copied repeatedly.
Just to give an example with 4 snapshots:
Old situation (pre vSphere 4 Update 2)
- Base disk – 15GB
- Snapshot 1 – 1GB –> possibly grows to 13GB
- Snapshot 2 – 1GB –> possibly grows to 12GB
- Snapshot 3 – 1GB –> possibly grows to 11GB
- Snapshot 4 – 10GB
Snapshot 4 is copied into Snapshot 3, Snapshot 3 into Snapshot 2, Snapshot 2 into Snapshot 1 and Snapshot 1 into your Base disk. Only after the copy of Snapshot 1 into the Base disk will all snapshots be deleted. Please note that the total amount of disk space consumed before the “Delete All” was 28GB. Right before the final merge the consumed disk space is 61GB. This is just an example; just imagine what could happen with a 100GB data disk!
New situation
- Base disk – 15GB
- Snapshot 1 – 1GB
- Snapshot 2 – 1GB
- Snapshot 3 – 1GB
- Snapshot 4 – 10GB
Snapshot 1 is copied into the Base disk, Snapshot 2 into the Base disk, Snapshot 3 into the Base disk and Snapshot 4 into the Base disk. After the copy of Snapshot 4 into the Base disk all snapshots will be deleted. Please note that the total amount of disk space consumed before the “Delete All” was 28GB. Right before the final merge the consumed disk space is still 28GB. Not only did VMware reduce the chances of running out of disk space, the time to commit the snapshots using “Delete All” has also decreased with this new mechanism.
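If you want to play with these numbers yourself, below is a rough worst-case sketch in plain Python (my own back-of-the-envelope model, not how the VMkernel actually performs the consolidation) that reproduces the 61GB and 28GB figures from the two examples above:

```python
# Rough worst-case model of "Delete All" disk consumption (illustrative only).
# Snapshot sizes are listed from closest to the base disk to farthest away.

def old_order_peak(base_gb, snaps_gb):
    # Pre Update 2: start committing at the snapshot farthest from the base disk.
    # Each commit can grow the parent by the size of its child, and nothing is
    # cleaned up until the whole chain has been committed into the base disk.
    sizes = list(snaps_gb)
    for i in range(len(sizes) - 1, 0, -1):
        sizes[i - 1] += sizes[i]          # parent grows by (up to) the child's size
    return base_gb + sum(sizes)           # everything is still on disk at this point

def new_order_peak(base_gb, snaps_gb):
    # Update 2 and later: start at the snapshot closest to the base disk, so all
    # deltas go straight into the base disk and no snapshot ever grows.
    return base_gb + sum(snaps_gb)

print(old_order_peak(15, [1, 1, 1, 10]))   # 61 (GB), as in the old situation
print(new_order_peak(15, [1, 1, 1, 10]))   # 28 (GB), as in the new situation
```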
vSphere 4 U2 and recovering from HA Split Brain
A couple of months ago I wrote this article about a future feature that would enable HA to recover from a Split Brain scenario. vSphere 4.0 Update 2 was recently released, but neither the release notes nor the documentation mentioned this new feature.
I never noticed this until I was having a discussion about this feature with one of my colleagues. I asked our HA Product Manager and one of our developers, and it appears that this mysteriously slipped through the release notes. As I personally believe that this is a very important feature of HA, I wanted to rehash some of the info stated in that article. I did rewrite it slightly though. Here we go:
One of the most common issues experienced in an iSCSI/NFS environment with VMware HA pre vSphere 4.0 Update 2 is a split brain situation.
First let me explain what a split brain scenario is. Let’s start by describing the situation that is most commonly encountered:
- 4 Hosts
- iSCSI / NFS based storage
- Isolation response: leave powered on
When one of the hosts is completely isolated, including the Storage Network, the following will happen:
- Host ESX001 is completely isolated, including the storage network (remember, iSCSI/NFS based storage!), but the VMs will not be powered off because the isolation response is set to “leave powered on”.
- After 15 seconds the remaining, non-isolated, hosts will try to restart the VMs.
- Because the iSCSI/NFS network is also isolated, the lock on the VMDK will time out and the remaining hosts will be able to boot up the VMs.
- When ESX001 returns from isolation it will still have the VMX processes running in memory, and this is when you will see a “ping-pong” effect within vCenter; in other words, VMs flipping back and forth between ESX001 and any of the other hosts.
As of version 4.0 Update 2 ESX(i) detects that the lock on the VMDK has been lost and issues a question which is automatically answered. The VM will be powered off to recover from the split-brain scenario and to avoid the ping-pong effect. Please note that HA will generate an event for this auto-answer which is viewable within vCenter.
Don’t you just love VMware HA!
How does “das.maxvmrestartcount” work?
The number of retries is configurable as of vCenter 2.5 U4 with the advanced option “das.maxvmrestartcount”. My colleague Hugo Strydom wrote about this a while ago, and after a short discussion with one of our developers I realised Hugo’s article was not 100% correct. The default value is 5. Pre vCenter 2.5 U4, HA would keep retrying forever, which could lead to serious problems as described in KB article 1009625, where multiple virtual machines would end up registered on multiple hosts simultaneously, leading to a confusing and inconsistent state. (http://kb.vmware.com/kb/1009625)
Important to note is that HA will try to start the virtual machine on one of your hosts in the affected cluster; if this is unsuccessful on that host, the restart count is increased by 1. The first restart retry will then occur after two minutes. If that one fails the next will occur after 4 minutes, and if that one fails every following retry will occur after 8 minutes, until “das.maxvmrestartcount” has been reached.
To make it clearer, look at the following:
- T+0 – Restart
- T+2 – Restart retry 1
- T+4 – Restart retry 2
- T+8 – Restart retry 3
- T+8 – Restart retry 4
- T+8 – Restart retry 5
In other words, it could take up to 30 minutes before a successful restart has been initiated when using the default maximum of 5 retries. If you increase that number, each additional retry will also follow after “T+8” again.
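A trivial way to see where the 30 minutes come from, assuming the intervals described above (2 minutes, then 4, then 8 for every further retry), is the following sketch; again, just an illustration and not any official VMware tooling:

```python
# Sketch of the HA restart retry schedule (assumed intervals: 2, 4, then 8 minutes).
def restart_schedule(max_retries=5):
    times, elapsed = [0], 0                # the initial restart attempt at T+0
    for retry in range(1, max_retries + 1):
        interval = min(2 ** retry, 8)      # 2, 4, 8, 8, 8, ... minutes between attempts
        elapsed += interval
        times.append(elapsed)
    return times

print(restart_schedule())   # [0, 2, 6, 14, 22, 30] -> last retry roughly 30 minutes in
```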