
Yellow Bricks

by Duncan Epping



My Homelab

Duncan Epping · Jan 19, 2010 ·

This week's VMTN podcast is about home labs. John Troyer asked on Twitter who had a home lab and whether they had already posted an article about it. Most bloggers had, but I never got around to it. The odd thing is that the common theme for most virtualization bloggers seems to be physical! Take a look at what some of these guys have in their home labs and try to imagine the associated cost in terms of cooling and power, not to mention the noise.

  • Jason Boche – EMC Celerra NS-120
  • Chad Sakac – Building a home lab (check the storage he has at home!)
  • Gabe – White box ESX home lab

I decided to take a completely different route. Why buy three or four servers when you can run all your ESX hosts virtually on a single desktop? Okay, I must admit, it is a desktop on steroids, but it saves me a lot of (rack) space, noise, heat and, of course, electricity. These are the core components of my desktop:

  • Asustek P6T WS Pro
  • Intel Core i7-920
  • 6 x 2GB Kingston 1333MHz
  • 2 x Seagate Cheetah SAS 15k6 in RAID-0

I also have two NAS devices hosting multiple iSCSI LUNs and NFS shares. I even have replication running between the two devices! Works like a charm.

  • 2 x Iomega IX4-200d

There’s one crucial part missing. On my laptop I use VMware Player, but on my desktop I prefer VMware Workstation. Although VMware Player would probably work fine, I like to have a bit more functionality at my disposal, teaming for instance.

  • VMware Workstation 7.0

That’s my lab. I installed 3 x ESXi 4.0 Update 1, each in a VM, and installed Windows 2008 in a VM running vCenter 4.0 Update 1. I attached the ESX hosts to the iSCSI LUNs and NFS shares and off we go: a single-box lab!

HA admission control, the answers…

Duncan Epping · Nov 9, 2009 ·

I received a whole bunch of questions about my two latest posts on HA admission control. I added all the info to my HA Deepdive page, but in case you don’t regularly read that section I will post the answers here as well:

  1. The default of 256MHz when no reservations are set is too conservative in my environment. What happens if you set a 100MHz reservation?
    Nothing. The minimum value VMware HA calculates with is 256MHz. Keep in mind that this applies both to slot-based and percentage-based admission control policies. It can be overruled with an advanced setting (das.slotCpuInMHz), but I don’t recommend doing so.
  2. What happens if you have an unbalanced cluster and the largest host fails?
    If your admission control policy is based on the amount of host failures, VMware HA will take this into account. When you select a percentage, however, it will not. You will need to make sure you specify a percentage that is equal to, or preferably larger than, the percentage of resources provided by the largest host in the cluster. Otherwise there’s a chance that VMware HA can’t restart all virtual machines.
  3. What would your recommendation be: reserve a specific percentage, or set the amount of host failures VMware HA can tolerate?
    It depends. Yes, I know that is the obvious answer, but it really does. There are three options and each has its own advantages and disadvantages. Here you go:

    • Amount of host failures
      Pros: Fully automated; when a host is added to the cluster, HA recalculates how many slots are available.
      Cons: Can be very conservative and inflexible when reservations are used, as the largest reservation dictates the slot size.
    • Percentage reserved
      Pros: Flexible. Although reservations still affect the amount of available resources, the impact on the environment is smaller.
      Cons: Manual calculations are needed when adding hosts to the cluster. Unbalanced clusters can be a problem when the chosen percentage is too low.
    • Designated failover host
      Pros: What you see is what you get.
      Cons: What you see is what you get.
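The 256MHz floor from answer 1 can be sketched in a few lines of Python. This is purely my own illustration, not VMware's implementation; the function and constant names are mine:

```python
# Illustrative sketch of the slot-size rule from answer 1 (my own code,
# not VMware's): the CPU slot size is the largest CPU reservation in the
# cluster, with a hard floor of 256MHz.
DEFAULT_CPU_SLOT_MHZ = 256  # minimum value VMware HA calculates with

def cpu_slot_size(reservations_mhz):
    """Largest CPU reservation, but never below the 256MHz floor."""
    return max([DEFAULT_CPU_SLOT_MHZ] + list(reservations_mhz))

print(cpu_slot_size([100]))        # 256 -- a 100MHz reservation changes nothing
print(cpu_slot_size([100, 2000]))  # 2000 -- the largest reservation dictates the slot size
```

This also makes the "conservative and inflexible" con above tangible: one large reservation inflates the slot size for every VM in the cluster.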

How to avoid HA slot sizing issues with reservations?

Duncan Epping · Nov 6, 2009 ·

Can I avoid large HA slot sizes caused by reservations without resorting to advanced settings? That’s a question I get almost daily. The answer used to be no: HA uses reservations to calculate the slot size, and pre-vSphere there was no way to tell HA to ignore them without advanced settings. So there is your answer: pre-vSphere.

With vSphere, VMware introduced a percentage-based option next to the amount of host failures. The percentage avoids the slot-size issue because it does not use slots for admission control. So what does it use?

When you select a specific percentage, that percentage of the total amount of resources will stay unused for HA purposes. First of all, VMware HA adds up all available resources to see how much it has in total. Then VMware HA calculates how many resources are currently consumed by adding up the memory and CPU reservations of all powered-on virtual machines. For virtual machines without a reservation, a default of 256MHz is used for CPU and a default of 0MB plus the memory overhead is used for memory. (The amount of overhead per configuration type can be found on page 28 of the Resource Management Guide.)

In other words:

((total amount of available resources – total reserved VM resources) / total amount of available resources)
where total reserved VM resources includes the default reservation of 256MHz and the memory overhead of the VM.

Let’s use a diagram to make it a bit more clear:

Total cluster resources are 24GHz (CPU) and 96GB (memory). This leads to the following calculations:

((24GHz - (2GHz + 1GHz + 256MHz + 4GHz)) / 24GHz) = 69% available
((96GB - (1.1GB + 114MB + 626MB + 3.2GB)) / 96GB) = 85% available

As you can see, the amount of memory differs from the diagram. Even when a reservation has been set, the memory overhead is added on top of it. For both metrics, HA admission control constantly checks whether the policy has been violated. When either of the two thresholds is reached, memory or CPU, admission control disallows powering on any additional virtual machines. Pretty simple, huh?!
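The calculation above can be sketched in a few lines of Python. This is my own illustration (the function name is mine); the inputs mirror the CPU example from the post:

```python
# Sketch of the percentage-based availability calculation described above
# (my own illustration, not VMware's code).
DEFAULT_CPU_RESERVATION_MHZ = 256  # used when a VM has no CPU reservation

def available_percentage(total_resources, reserved_per_vm):
    """((total - total reserved VM resources) / total) * 100, truncated."""
    reserved = sum(reserved_per_vm)
    return int((total_resources - reserved) / total_resources * 100)

# CPU example from the post: a 24GHz cluster with reservations of
# 2GHz, 1GHz, 256MHz (the default) and 4GHz.
reservations = [2000, 1000, DEFAULT_CPU_RESERVATION_MHZ, 4000]
print(available_percentage(24000, reservations))  # 69
```

The same function applies to memory, as long as each VM's entry is its reservation plus its memory overhead, per the formula above.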

Document it…

Duncan Epping · Nov 4, 2009 ·

Something I noticed over the last months while doing design reviews is that hardly anyone documents the decisions in a design. Most designs I review are physical designs, which is understandable as most IT people are technical people who could not care less about logical designs. I am perfectly fine with that, although I do recommend taking a different approach, as long as you document why you are going down a specific path.

There can be specific constraints or requirements (both technical and business related) which justify your decision, but if you don’t document them, chances are someone will change the design based on a false assumption, and who knows what that will lead to…

DRS Deepdive part II

Duncan Epping · Oct 22, 2009 ·

Yesterday I posted the DRS Deepdive. One of the questions still left open was how DRS decides which VM to move to create a balanced cluster. After a lot of digging for non-NDA info, I found this “procedure” in a VMworld presentation (TA16), amongst some other cool info.

The following procedure is used to form a set of recommendations to correct the imbalanced cluster:

While (load imbalance metric > threshold) {
    move = GetBestMove();
    If no good migration is found:
        stop;
    Else:
        Add move to the list of recommendations;
        Update cluster to the state after the move is added;
}

Step by step in plain English:

While the cluster is imbalanced (current host load standard deviation > target host load standard deviation), select a VM to migrate based on specific criteria, simulate the move, recompute the current host load standard deviation, and add the move to the migration recommendation list. If the cluster is still imbalanced (current host load standard deviation > target host load standard deviation), repeat the procedure.

Now how does DRS select the best VM to move? DRS uses the following procedure:

GetBestMove() {
    For each VM v:
        For each host h that is not the source host:
            If h is lightly loaded compared to the source host:
                If Cost-Benefit and Risk Analysis accepted:
                    Simulate move of v to h;
                    Measure new cluster-wide load imbalance metric as g;
    Return move v that gives least cluster-wide imbalance g;
}

Again in plain English:

For each VM, check whether a VMotion to each of the hosts that are less utilized than the source host would result in a less imbalanced cluster and would meet the Cost-Benefit and Risk Analysis criteria. Compare the outcomes of all tried combinations (VM <-> host) and return the VMotion that results in the least cluster imbalance.

This should result in the migration that gives the most improvement in terms of cluster balance, in other words: the most bang for the buck! This is why it is usually the larger VMs that are moved, as they are most likely to decrease the current host load standard deviation the most. If that is not enough to balance the cluster within the given threshold, GetBestMove is executed again by the procedure that forms the set of recommendations.
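The two procedures above can be combined into a toy simulation. This is my own illustrative sketch, not VMware's code: each host's load is reduced to a single number, the imbalance metric is the plain standard deviation of host loads, and the cost/benefit/risk checks are omitted. All names and numbers are hypothetical.

```python
# Toy sketch of the DRS rebalancing procedure (illustration only).
from statistics import pstdev

def imbalance(hosts):
    """Cluster-wide imbalance metric: standard deviation of host loads."""
    return pstdev(hosts.values())

def get_best_move(hosts, vms):
    """Try each VM against each host lighter than its source; return the
    (vm, src, dst) move that yields the lowest resulting imbalance."""
    best_move, best_metric = None, imbalance(hosts)
    for name, (load, src) in vms.items():
        for dst in hosts:
            if dst == src or hosts[dst] >= hosts[src]:
                continue  # only consider hosts lighter than the source
            trial = dict(hosts)          # simulate the move...
            trial[src] -= load
            trial[dst] += load
            metric = pstdev(trial.values())  # ...and re-measure imbalance
            if metric < best_metric:
                best_move, best_metric = (name, src, dst), metric
    return best_move

def rebalance(hosts, vms, threshold):
    """While the cluster is imbalanced, apply the best move and repeat."""
    recommendations = []
    while imbalance(hosts) > threshold:
        move = get_best_move(hosts, vms)
        if move is None:
            break  # no good migration found: stop
        name, src, dst = move
        load = vms[name][0]
        hosts[src] -= load               # update cluster state
        hosts[dst] += load
        vms[name] = (load, dst)
        recommendations.append(move)
    return recommendations

# Hypothetical two-host cluster: esx1 carries 9000 load units, esx2 carries 3000.
hosts = {"esx1": 9000, "esx2": 3000}
vms = {"vm1": (3000, "esx1"), "vm2": (4000, "esx1"),
       "vm3": (2000, "esx1"), "vm4": (3000, "esx2")}
print(rebalance(hosts, vms, threshold=500))  # [('vm1', 'esx1', 'esx2')]
```

Note how the greedy selection picks the move that best reduces the standard deviation, matching the "most bang for the buck" behaviour described above.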

Now the next question would be: what do Cost-Benefit and Risk Analysis consist of, and why do we do this?

First of all, we want to avoid a constant stream of VMotions, which is done by weighing costs versus benefits versus risks. These consist of:

  • Cost-Benefit
    Cost: CPU reserved during the migration on the target host
    Cost: Memory consumed by the shadow VM during the VMotion on the target host
    Cost: VM “downtime” during the VMotion
    Benefit: More resources available on the source host after the migration
    Benefit: More resources for the migrated VM as it moves to a less utilized host
    Benefit: Cluster balance
  • Risk Analysis
    Stable vs unstable workload of the VM (based on historic info)

Based on these considerations, a cost-benefit-risk metric is calculated, and if it has an acceptable value the VM will be considered for migration.

I will consolidate both posts into a single blog page today to make them easier to find!



About the Author

Duncan Epping is a Chief Technologist and Distinguished Engineering Architect at Broadcom. Besides writing on Yellow-Bricks, Duncan is the co-author of the vSAN Deep Dive and the vSphere Clustering Deep Dive book series. Duncan is also the host of the Unexplored Territory Podcast.

Copyright Yellow-Bricks.com © 2025