Yellow Bricks

by Duncan Epping

Server

VMware View without HA?

Duncan Epping · Jul 15, 2010 ·

I was discussing something with one of my former colleagues a couple of days ago. He asked me what the impact was of running VMware View in an environment without HA.

To be honest, I am not a View SME, but I do know a thing or two about HA/vSphere in general. So the first thing I mentioned was that it wasn’t a good idea. Although VDI in general is all about density, not running HA in these environments can lead to serious issues when a host fails.

Now, just imagine you are running 80 desktop VMs per host on roughly 8 hosts in a DRS-only cluster on NFS-based storage. One of those hosts becomes isolated from the network… what happens?

  1. User connection is dropped
  2. VMDK Lock times out
  3. User tries to reconnect
  4. Broker powers on the VM on a new host

Now that sounds great, doesn’t it? Well, in a way it does, but what happens when the host is no longer isolated?

Indeed, the VMs were still running on the isolated host, so you basically have a split-brain scenario: two instances of the same VM. In the past, the only way to avoid this was to make sure you had HA enabled and had set the isolation response to power off the VM.

But with vSphere 4 Update 2 a new mechanism was introduced. I wanted to stress this, as some people have already made the assumption that it is part of AAM/HA. It actually isn’t… The question for powering off the VM to recover from the split-brain scenario is generated by “hostd” and answered by “vpxa”. In other words, with or without HA enabled, ESX(i) will recover from the split brain.
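
To make the mechanism a bit more concrete, here is a minimal, purely illustrative sketch in Python of the decision being made. All names are hypothetical; this is not the actual hostd/vpxa interface, just a model of the recovery logic under the assumption that the surviving instance holds the VMDK lock.

    # Hypothetical model of split-brain recovery on a formerly isolated
    # host; NOT the real hostd/vpxa API, just the decision it embodies.
    def recover_from_split_brain(local_vm_running: bool,
                                 vmdk_lock_owner: str,
                                 local_host: str) -> str:
        if not local_vm_running:
            return "nothing to do"
        if vmdk_lock_owner != local_host:
            # Another host re-registered and locked the VMDK while we
            # were isolated; our copy is stale and gets powered off.
            return "power off local VM (no HA required)"
        return "keep running"

    print(recover_from_split_brain(True, "esx02", "esx01"))
    # -> power off local VM (no HA required)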

Again, I am most definitely not a Desktop/View guy, so I am wondering how the View experts out there feel about disabling HA on your View Compute Cluster. (Note that on the Management Layer HA should remain enabled.)

vSphere 4.1, VMware HA

New maximums and DRS integration will make our life easier

Duncan Epping · Jul 14, 2010 ·

I guess there are a couple of key points I will need to stress for those creating designs:

New HA maximums

  • 32 host clusters
  • 320 virtual machines per host
  • 3,000 virtual machines per cluster

In other words:

  • you can have 10 hosts with 300 VMs each
  • or 20 hosts with 150 VMs each
  • or 32 hosts with 93 VMs each…

as long as you don’t go beyond 320 per host or 3,000 per cluster, you are fine!
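
For those validating a design, a quick sanity check against these maximums is trivial to script. A minimal sketch in Python:

    # vSphere 4.1 HA maximums from the post above.
    MAX_HOSTS_PER_CLUSTER = 32
    MAX_VMS_PER_HOST = 320
    MAX_VMS_PER_CLUSTER = 3000

    def within_ha_maximums(hosts: int, vms_per_host: int) -> bool:
        """True if the design stays within all three HA maximums."""
        return (hosts <= MAX_HOSTS_PER_CLUSTER
                and vms_per_host <= MAX_VMS_PER_HOST
                and hosts * vms_per_host <= MAX_VMS_PER_CLUSTER)

    for hosts, vms in [(10, 300), (20, 150), (32, 93), (32, 94)]:
        print(hosts, "hosts x", vms, "VMs:", within_ha_maximums(hosts, vms))
    # 32 x 94 = 3,008 VMs, which exceeds the 3,000 per-cluster maximum.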

DRS Integration

As of vSphere 4.1, HA integrates with DRS on multiple levels. It is a huge improvement, and in my opinion it is something everyone should know about.

Resource Fragmentation

As of vSphere 4.1, HA is closely integrated with DRS. When a failover occurs, HA will first check if there are resources available on that host for the failover. If resources are not available, HA will ask DRS to accommodate these where possible. Think about a VM with a huge reservation and fragmented resources throughout your cluster, as described in my HA Deepdive. HA, as of 4.1, will be able to request a defragmentation of resources to accommodate this VM’s resource requirements. How cool is that?! One thing to note though is that HA will request it, but a guarantee cannot be given, so you should still be cautious when it comes to resource fragmentation.
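
To illustrate what “fragmented resources” means here, consider this small sketch (illustrative only, not VMware code, with made-up numbers): the cluster as a whole has plenty of unreserved capacity, but no single host can satisfy the VM’s reservation, so HA has to ask DRS to defragment.

    # Unreserved CPU capacity per host (hypothetical numbers, in MHz).
    unreserved_mhz = {"esx01": 4000, "esx02": 5000, "esx03": 4500}
    vm_reservation_mhz = 8000  # the VM with the huge reservation

    fits_on_one_host = any(free >= vm_reservation_mhz
                           for free in unreserved_mhz.values())
    fits_in_aggregate = sum(unreserved_mhz.values()) >= vm_reservation_mhz

    if fits_on_one_host:
        print("Failover can proceed directly")
    elif fits_in_aggregate:
        print("Fragmented: HA requests DRS defragmentation (not guaranteed)")
    else:
        print("Cluster lacks the capacity outright")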

DPM

In the past there was barely any integration between DRS/DPM and HA. Especially when DPM was enabled, this could lead to some weird behaviour when resources were scarce and an HA failover needed to happen. With vSphere 4.1 this has changed. In such cases, VMware HA will use DRS to try to adjust the cluster (for example, by bringing hosts out of standby mode or migrating virtual machines to defragment the cluster resources) so that HA can perform the failovers.
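
A minimal sketch of that new behaviour, assuming hypothetical host names and capacities (illustrative only, not how DPM is actually implemented):

    standby_hosts = ["esx07", "esx08"]       # parked by DPM
    needed_mhz, available_mhz = 12000, 9000  # failover demand vs. capacity
    mhz_per_standby_host = 8000

    # Wake standby hosts until the failover demand can be met.
    while available_mhz < needed_mhz and standby_hosts:
        host = standby_hosts.pop()
        available_mhz += mhz_per_standby_host
        print("DRS brings", host, "out of standby for the HA failover")

    print("Failover", "can proceed" if available_mhz >= needed_mhz else "is still short")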

Shares

I didn’t even find out about this one until I read the Availability Guide again. Prior to vSphere 4.1, a virtual machine failed over by HA could be granted more resource shares than it should, causing resource starvation until DRS balanced the load. As of vSphere 4.1, HA calculates normalized shares for a virtual machine when it is powered on after an isolation event!
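
To see why normalizing matters, remember that share values are only meaningful relative to the other shares competing at the same level. A small sketch (illustrative only; the exact calculation HA performs is not described here):

    def entitlement_fraction(vm_shares: int, all_shares: list) -> float:
        """Fraction of contended resources a VM's shares entitle it to."""
        return vm_shares / sum(all_shares)

    # Competing with 19 siblings, 1000 shares buys a 5% entitlement...
    print(entitlement_fraction(1000, [1000] * 20))  # 0.05
    # ...but restarted next to a single sibling, the same 1000 shares
    # suddenly buys 50%, starving others until DRS rebalances.
    print(entitlement_fraction(1000, [1000] * 2))   # 0.5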

vSphere 4.1: Datacenter.QueryConnectionInfo failed?

Duncan Epping · Jul 13, 2010 ·

When I was installing vSphere 4.1 ESXi I ran into a problem. I received the following error when I added the ESXi host to my cluster:

Call “Datacenter.QueryConnectionInfo” for object “yellow bricks” on vCenter Server “W2K8-001” failed.

Although the error didn’t make much sense, I had the feeling it had something to do with name resolution (this KB article gave a hint, I guess). After I added my DNS suffix on my NIC, it worked. Problem solved.
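
For anyone hitting the same error, a quick way to test whether the name actually resolves before adding the host (the FQDN below is just an example):

    import socket

    hostname = "esxi01.yellow-bricks.local"  # hypothetical FQDN
    try:
        print(hostname, "resolves to", socket.gethostbyname(hostname))
    except socket.gaierror as err:
        print("Name resolution failed:", err)
        print("Check DNS suffixes / host records before adding the host.")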

vSphere 4.1 released

Duncan Epping · Jul 13, 2010 ·

I just wanted to let you know that vSphere 4.1 has been released and is available for download. There is no point in me rehashing the same “what’s new” info everyone else is rehashing today and probably for the rest of the week. Expect some more detailed blog posts over the course of the upcoming weeks.

Reservations primer

Duncan Epping · Jul 8, 2010 ·

My colleague Craig Risinger wrote the below and was kind enough to share it with us. Thanks Craig!

A quick primer on VMware Reservations (not that anyone asked)…

A Reservation is a guarantee.

There’s a difference between reserving a resource and using it.  A VM can use more or less than it has reserved. Also, if a reservation-holder isn’t using all the reserved resource, it will share CPU but not RAM. In other words, CPU reservations are friendly but memory reservations are greedy.

Reservation admission control:

If a VM has a reservation defined, the ESX host must have at least that much resource unreserved (not just unused, but unreserved) or else it will refuse to power on the VM. Reservations cannot overlap. A chunk of resource can be reserved by only one entity at a time; there can’t be two reservations on it.
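
A minimal model of that admission-control rule (illustrative only, not VMware code): what matters is unreserved capacity, not unused capacity.

    def can_power_on(capacity: float, reserved: float,
                     new_reservation: float) -> bool:
        """Refuse power-on unless the reservation fits unreserved capacity."""
        return new_reservation <= capacity - reserved

    # A 16 GB host with 10 GB already reserved (even if only 9 GB is used):
    print(can_power_on(16, 10, 6))  # True
    print(can_power_on(16, 10, 7))  # False - reservations cannot overlap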

Scenario #1

Given:

An ESX host has 16 GHz (= 8 x 2 GHz cores) and 16 GB.
VM-1 and VM-2 each have 8 vCPUs and 16 GB of vRAM.
VM-1 has reserved 13 GHz of CPU resources and 10 GB of memory.
VM-1 is currently using 11 GHz of CPU resources and 9 GB of memory.  (Using != reserving.)

Consequently:

VM-2 can use up to 5 GHz. (Not 3 GHz, CPU reservations are friendly.)
VM-2 can reserve up to 3 GHz. (Using != reserving. Reservations don’t overlap.)
VM-2 can use up to 6 GB. (Not 7 GB. Memory reservations are greedy.)
VM-2 can reserve up to 6 GB. (Reservations don’t overlap.)

Please note that if VM-2 had a 7 GB reservation defined, it would not power on.  (Reservation admission control.)
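
Working through those numbers makes the friendly/greedy distinction mechanical. A short sketch using the scenario’s figures:

    host_ghz, host_gb = 16, 16
    vm1_res_ghz, vm1_res_gb = 13, 10
    vm1_use_ghz, vm1_use_gb = 11, 9

    # CPU reservations are friendly: unused reserved cycles are shared,
    # so VM-2's usage is bounded by what VM-1 actually uses.
    print("VM-2 can use up to", host_ghz - vm1_use_ghz, "GHz")      # 5
    print("VM-2 can reserve up to", host_ghz - vm1_res_ghz, "GHz")  # 3
    # Memory reservations are greedy: reserved pRAM is withheld even
    # when unused, so both bounds come from VM-1's reservation.
    print("VM-2 can use up to", host_gb - vm1_res_gb, "GB")         # 6
    print("VM-2 can reserve up to", host_gb - vm1_res_gb, "GB")     # 6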

It’s also possible for VM-1 to use more resources than it has reserved. That makes the discussion a bit more complex. VM-1 is guaranteed whatever it’s reserved, and it also gets to fight VM-2 for more resources, assuming VM-2 hasn’t reserved the excess. I’ll come up with example scenarios for that too if you like.

There’s a good reason why CPU reservations are friendly but memory reservations are greedy. Say a reservation holder is not using all of a resource, and it lets an interloper use the resource for a while; later, the reservation holder wants to use all it has reserved. An interloper can be kicked off a pCPU quickly: CPU instructions are transient, quickly finished. But RAM holds data. If an interloper were holding pRAM, its data would have to be swapped to disk before the reservation holder could repurpose that pRAM to satisfy its reservation. That swapping would take significant time and delay the reservation holder unfairly. So ESX doesn’t allow reserved pRAM to be used by an interloper.

For a more detailed discussion that gets into Resource Pools, how memory reservations do or don’t prevent host-level swapping, and more, see the following post I wrote several months ago, http://www.yellow-bricks.com/2010/03/03/cpumem-reservation-behaviour/.

Author: Craig Risinger
