
Yellow Bricks

by Duncan Epping


VMware View without HA?

Duncan Epping · Jul 15, 2010 ·

I was discussing something with one of my former colleagues a couple of days ago. He asked me what the impact was of running VMware View in an environment without HA.

To be honest I am not a View SME, but I do know a thing or two about HA and vSphere in general. So the first thing I mentioned was that it isn't a good idea. Although VDI in general is all about density, not running HA in these environments can lead to serious issues when a host fails.

Now, just imagine you have 80 desktop VMs running per host and roughly 8 hosts in a DRS-only cluster on NFS-based storage. One of those hosts becomes isolated from the network… what happens?

  1. User connection is dropped
  2. VMDK Lock times out
  3. User tries to reconnect
  4. Broker powers on the VM on a new host

Now that sounds great, doesn't it? Well, yeah, in a way it does, but what happens when the host is no longer isolated?

Indeed, the VMs were still running on the isolated host, so you basically have a split-brain scenario: two live copies of the same VM. In the past, the only way to avoid this was to make sure you had HA enabled and had configured HA to power off the VM.
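Just to make that sequence explicit, here is a toy model of the timeline in Python. The host names, the broker behaviour, and the lock handling are purely illustrative placeholders, not VMware internals:

    # Toy model of the isolation timeline above; not VMware internals.
    class DesktopVm:
        def __init__(self, name, host):
            self.name, self.hosts = name, {host}   # hosts running a copy

        def lock_times_out_and_broker_restarts(self, new_host):
            # Steps 1-4: the connection drops, the NFS lock times out, and
            # the broker powers on a second copy of the VM elsewhere.
            self.hosts.add(new_host)

        def isolation_ends(self):
            # The original copy never stopped running.
            return "split brain" if len(self.hosts) > 1 else "ok"

    vm = DesktopVm("desktop-042", "esx-01")
    vm.lock_times_out_and_broker_restarts("esx-05")
    print(vm.isolation_ends())   # -> split brain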

But with vSphere 4 Update 2 a new mechanism has been introduced. I wanted to stress this, as some people have already made the assumption that it is part of AAM/HA. It actually isn't… The request to power off the VM to recover from the split-brain scenario is generated by “hostd” and answered by “vpxa”. In other words, with or without HA enabled, ESX(i) will recover from the split brain.

Again, I am most definitely not a Desktop/View guy, so I am wondering how the View experts out there feel about disabling HA on your View compute cluster. (Note that on the management layer HA should be enabled.)

vSphere 4.1: Datacenter.QueryConnectionInfo failed?

Duncan Epping · Jul 13, 2010 ·

When I was installing vSphere 4.1 ESXi I ran into a problem. I received the following error when I added the ESXi host to my cluster:

Call “Datacenter.QueryConnectionInfo” for object “yellow bricks” on vCenter Server “W2K8-001” failed.

Although the error didn't make much sense, I had the feeling it had something to do with name resolution (this KB article gave a hint, I guess). After I added my DNS suffix on my NIC, it worked. Problem solved.
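If you hit something similar, a quick way to confirm it really is name resolution is a check like the one below. The host names are hypothetical placeholders; test whatever short name and FQDN vCenter uses for your host:

    # Sanity-check that both the short name and the FQDN resolve.
    import socket

    for name in ("esxi-01", "esxi-01.lab.local"):   # placeholders
        try:
            print(name, "->", socket.gethostbyname(name))
        except socket.gaierror as err:
            print(name, "-> FAILED:", err)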

vSphere 4.1 released

Duncan Epping · Jul 13, 2010 ·

I just wanted to let you know that vSphere 4.1 has been released and is available for download. No point in me rehashing the same “what's new” info everyone else is posting today and probably for the rest of the week. Expect more detailed blogs over the course of the upcoming weeks.

Reservations primer

Duncan Epping · Jul 8, 2010 ·

My colleague Craig Risinger wrote the below and was kind enough to share it with us. Thanks Craig!

A quick primer on VMware Reservations (not that anyone asked)…

A Reservation is a guarantee.

There’s a difference between reserving a resource and using it.  A VM can use more or less than it has reserved. Also, if a reservation-holder isn’t using all the reserved resource, it will share CPU but not RAM. In other words, CPU reservations are friendly but memory reservations are greedy.

Reservation admission control:

If a VM has a reservation defined, the ESX host must have at least that much resource unreserved (not just unused, but unreserved) or else it will refuse to power on the VM. Reservations cannot overlap. A chunk of resource can be reserved by only one entity at a time; there can’t be two reservations on it.
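As a rough sketch of that rule (illustrative numbers only, not a VMware API):

    # Reservation admission control in miniature: a VM powers on only if
    # its reservation fits in the host's *unreserved* (not merely unused)
    # capacity. Numbers are illustrative.
    def can_power_on(host_capacity, existing_reservations, new_reservation):
        unreserved = host_capacity - sum(existing_reservations)
        return new_reservation <= unreserved

    # 16 GB host, 10 GB already reserved by another VM:
    print(can_power_on(16, [10], 6))   # True  -> 6 GB still unreserved
    print(can_power_on(16, [10], 7))   # False -> power-on is refused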

Scenario #1

Given:

An ESX host has 16 GHz (= 8 x 2 GHz cores) and 16 GB.
VM-1 and VM-2 each have 8 vCPUs and 16 GB of vRAM.
VM-1 has reserved 13 GHz of CPU resources and 10 GB of memory.
VM-1 is currently using 11 GHz of CPU resources and 9 GB of memory.  (Using != reserving.)

Consequently:

VM-2 can use up to 5 GHz. (Not 3 GHz, CPU reservations are friendly.)
VM-2 can reserve up to 3 GHz. (Using != reserving. Reservations don’t overlap.)
VM-2 can use up to 6 GB. (Not 7 GB. Memory reservations are greedy.)
VM-2 can reserve up to 6 GB. (Reservations don’t overlap.)

Please note that if VM-2 had a 7 GB reservation defined, it would not power on.  (Reservation admission control.)

It’s also possible for VM-1 to use more resources than it has reserved. That makes the discussion a bit more complex. VM-1 is guaranteed whatever it’s reserved, and it also gets to fight VM-2 for more resources, assuming VM-2 hasn’t reserved the excess. I’ll come up with example scenarios for that too if you like.
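To make the friendly/greedy distinction concrete, here is the Scenario #1 arithmetic as a quick Python sketch (same illustrative numbers as above, nothing VMware-specific):

    # Scenario #1 arithmetic. CPU reservations are "friendly": reserved
    # but idle GHz can still be used by others. Memory reservations are
    # "greedy": reserved pRAM is withheld whether it is used or not.
    host_cpu, host_mem = 16, 16          # GHz, GB
    vm1_cpu_res, vm1_mem_res = 13, 10    # VM-1 reservations
    vm1_cpu_use = 11                     # VM-1 actual CPU usage
    # (VM-1's 9 GB memory usage is irrelevant: the full 10 GB is withheld.)

    vm2_cpu_usable     = host_cpu - vm1_cpu_use   # 5 GHz (friendly)
    vm2_cpu_reservable = host_cpu - vm1_cpu_res   # 3 GHz (no overlap)
    vm2_mem_usable     = host_mem - vm1_mem_res   # 6 GB  (greedy)
    vm2_mem_reservable = host_mem - vm1_mem_res   # 6 GB  (no overlap)

    print(vm2_cpu_usable, vm2_cpu_reservable, vm2_mem_usable, vm2_mem_reservable)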

There’s good reason why CPU reservations are friendly but memory reservations are greedy. Say a reservation holder is not using all of a resource, and it lets an interloper use the resource for a while; later, the reservation holder wants to use all it has reserved. An interloper can be kicked off a pCPU quickly. CPU instructions are transient, quickly finished. But RAM holds data. If an interloper was holding pRAM, its data would have to be swapped to disk before the reservation holder could repurpose that pRAM to satisfy its reservation. That swapping would take significant time and delay the reservation holder unfairly. So ESX doesn’t allow reserved pRAM to be used by an interloper.

For a more detailed discussion that gets into Resource Pools, how memory reservations do or don’t prevent host-level swapping, and more, see the following post I wrote several months ago, http://www.yellow-bricks.com/2010/03/03/cpumem-reservation-behaviour/.

Author: Craig Risinger

Changes to Snapshot mechanism “Delete All”

Duncan Epping · Jul 5, 2010 ·

I don’t know if anyone noticed it or not, but with the latest set of patches VMware changed the “Delete All” mechanism that is part of the snapshot feature. I wrote multiple articles about the “Delete All” functionality, as it often led to completely filled-up VMFS volumes when someone used it without knowing the inner workings.

Source

When using the Delete All option in Snapshot Manager, the snapshot farthest from the base disk is committed to its parent, causing that parent snapshot to grow. When the commit is complete, that snapshot is removed and the process starts over on the newly updated snapshot to its parent. This continues until every snapshot has been committed.

This method can be relatively slow since data farthest from the base disk might be copied several times. More importantly, this method can aggressively use disk space if the snapshots are large, which is especially problematic if a limited amount of space is available on the datastore. The space issue is troublesome in that you might choose to delete snapshots explicitly to free up storage.

This issue is resolved in this release in that the order of snapshot consolidation has been modified to start with the snapshot closest to the base disk instead of farthest. The end result is that copying data repeatedly is avoided.

Just to give an example, 4 snapshots:

Old situation (pre vSphere 4 Update 2)

  • Base disk – 15GB
  • Snapshot 1 – 1GB –> possibly grows to 13GB
  • Snapshot 2 – 1GB –> possibly grows to 12GB
  • Snapshot 3 – 1GB –> possibly grows to 11GB
  • Snapshot 4 – 10GB

Snapshot 4 is copied into Snapshot 3, Snapshot 3 into Snapshot 2, Snapshot 2 into Snapshot 1, and Snapshot 1 into your base disk. After the copy of Snapshot 1 into the base disk, all snapshots are deleted. Please note that the total amount of disk space consumed before the “Delete All” was 28GB. Right before the final merge the consumed disk space is 61GB. This is just an example; just imagine what could happen with a 100GB data disk!

New situation

  • Base disk – 15GB
  • Snapshot 1 – 1GB
  • Snapshot 2 – 1GB
  • Snapshot 3 – 1GB
  • Snapshot 4 – 10GB

Snapshot 1 is copied into the base disk, then Snapshot 2, then Snapshot 3, and finally Snapshot 4. After the copy of Snapshot 4 into the base disk, all snapshots are deleted. Please note that the total amount of disk space consumed before the “Delete All” was 28GB. Right before the final merge the consumed disk space is still 28GB. Not only did VMware reduce the chances of running out of disk space, the time to commit the snapshots using “Delete All” has also decreased with this new mechanism.
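Out of curiosity, here are the two orders as a quick back-of-the-envelope sketch. It follows the worst-case assumptions of the example above (each merge can grow the target delta by the full size of its child, the base disk is already fully allocated, and files are only removed after the last merge); it is an illustration, not how the VMkernel actually implements it:

    # Worst-case disk usage for the two "Delete All" orders, in GB.
    def peak_usage(base, snaps, closest_first):
        snaps = list(snaps)  # delta sizes, ordered closest-to-base first
        if not closest_first:
            # Old mechanism: the farthest snapshot merges into its parent
            # delta, which may grow by the full size of the child.
            for i in range(len(snaps) - 1, 0, -1):
                snaps[i - 1] += snaps[i]
        # New mechanism: every delta merges straight into the base disk,
        # which is already fully allocated and cannot grow, so no delta grows.
        return base + sum(snaps)

    snaps = [1, 1, 1, 10]                               # Snapshots 1..4
    print(peak_usage(15, snaps, closest_first=False))   # 61 (old)
    print(peak_usage(15, snaps, closest_first=True))    # 28 (new)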

