
Yellow Bricks

by Duncan Epping



DRS Deepdive

Duncan Epping · Oct 21, 2009 ·

Last week I mentioned which metrics DRS uses for load balancing VMs across a cluster. Of course the obvious question was when the DRS Deepdive would be posted. I must admit I’m not an expert on this topic, as, like most of you, I always took for granted that it worked out of the box. I can’t remember there ever being a need to troubleshoot DRS-related problems; or, better said, I don’t think I’ve ever seen an issue that was caused by DRS.

This article will focus on two primary DRS functions:

  1. Load balancing VMs across an imbalanced cluster
  2. VM placement at power-on

I will not be focusing on Resource Pools at all as I feel that there are already more than enough articles which explain these. The Resource Management Guide also contains a wealth of info on resource pools and this should be your starting place!

Load Balancing

First of all, VMware DRS evaluates your cluster every 5 minutes. If there’s an imbalance in load it will reorganize your cluster, with the help of VMotion, to create an evenly balanced cluster again. So how does it detect an imbalanced cluster? Let’s start with a screenshot:

[fig 1: DRS cluster summary screenshot]

There are three major elements here:

  1. Migration Threshold
  2. Target host load standard deviation
  3. Current host load standard deviation

Keep in mind that when you change the “Migration Threshold”, the value of the “Target host load standard deviation” will also change. In other words, the Migration Threshold dictates how much the cluster is allowed to be “imbalanced”. There also appears to be a direct relationship between the number of hosts in a cluster and the “Target host load standard deviation”; however, I haven’t found any reference to support this observation. (A two-host cluster with the threshold set to three has a THLSD of 0.2, while a three-host cluster has a THLSD of 0.163.) As mentioned, every 5 minutes DRS calculates the sum of the resource entitlements of all virtual machines on each host and divides that number by the capacity of the host:

sum(expected VM loads) / (capacity of host)

The results for all hosts are then used to compute an average and the standard deviation. (Which effectively is the “Current host load standard deviation” you see in the screenshot, fig 1.) I’m not going to explain what a standard deviation is, as it’s explained extensively on Wikipedia.
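The calculation above can be sketched in a few lines of Python. The cluster numbers are made up for illustration, and whether DRS uses the population or the sample standard deviation is not documented, so that choice is an assumption:

```python
import statistics

def host_load_std_dev(hosts):
    # Per-host normalized load: sum(expected VM loads) / capacity of host
    loads = [h["entitlements"] / h["capacity"] for h in hosts]
    # Standard deviation of those loads across the cluster; this sketch
    # assumes the population flavor, which is not confirmed anywhere.
    return statistics.pstdev(loads)

# Hypothetical three-host cluster, entitlements and capacity in GHz
cluster = [
    {"entitlements": 9.0, "capacity": 24.0},
    {"entitlements": 10.0, "capacity": 24.0},
    {"entitlements": 8.0, "capacity": 24.0},
]
print(round(host_load_std_dev(cluster), 3))  # -> 0.034
```

The result would then be compared against the “Target host load standard deviation” to decide whether the cluster is imbalanced.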

If the environment is imbalanced and the Current host load standard deviation exceeds the value of the “Target host load standard deviation”, DRS will either recommend migrations or perform migrations, depending on the chosen automation level.

Every migration recommendation gets a priority rating, which is based on the Current host load standard deviation. The actual algorithm used to determine this is described in this KB article. I needed to read the article 134 times before I actually understood what they were trying to explain, so I will use an example based on the info shown in the screenshot (fig 1). Just to make sure it’s absolutely clear: LoadImbalanceMetric is the Current host load standard deviation value, and ceil is basically a “round up”. Here is the formula from the KB article, followed by an example based on the screenshot (fig 1):

6 - ceil(LoadImbalanceMetric / 0.1 * sqrt(NumberOfHostsInCluster))
6 - ceil(0.022 / 0.1 * sqrt(3))

This would result in a priority level of 5 for the migration recommendation if the cluster was imbalanced.
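Worked out in Python, the calculation looks like this. Note that the KB formula as quoted is ambiguous about operator precedence; this sketch reads it as dividing by (0.1 * sqrt(NumberOfHostsInCluster)), and for the screenshot values both readings happen to produce the same priority:

```python
import math

def migration_priority(load_imbalance, num_hosts):
    # load_imbalance is the Current host load standard deviation (CHLSD)
    return 6 - math.ceil(load_imbalance / (0.1 * math.sqrt(num_hosts)))

# Values from fig 1: CHLSD of 0.022 in a three-host cluster
print(migration_priority(0.022, 3))  # -> 5
```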

The only question left for me is how DRS decides which VM it will VMotion… If anyone knows, feel free to chip in. I’ve already emailed the developers, and when I receive a reply I will add it to this article and create a separate article about the change so that it stands out.

VM Placement

The placement of a VM when it is powered on is, as you know, part of DRS. DRS analyzes the cluster based on the algorithm described in “Load Balancing”. The question, of course, is what values DRS works with for the VM that is being powered on. Here’s the catch: DRS assumes that 100% of the provisioned resources for this VM will be used. DRS does not take limits or reservations into account. Just like HA, DRS has “admission control”. If DRS can’t guarantee that the full 100% of the resources provisioned for this VM can be used, it will VMotion VMs away so that it can power on this single VM. If, however, there are not enough resources available, it will not power on this VM.
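As a rough illustration of that admission-control behavior, here is a simplified sketch. This is my own toy model, not VMware’s actual algorithm, and the host numbers are made up:

```python
def can_power_on(vm_provisioned, hosts):
    # DRS assumes the VM will consume 100% of what is provisioned,
    # ignoring limits and reservations, so check against that figure.
    free = [h["capacity"] - h["used"] for h in hosts]
    if max(free) >= vm_provisioned:
        return "power on"
    # Simplification: assumes existing VMs can always be migrated away
    # to consolidate free capacity onto a single host.
    if sum(free) >= vm_provisioned:
        return "vmotion others, then power on"
    return "refuse power-on"

# Hypothetical two-host cluster, memory in GB
hosts = [{"capacity": 32, "used": 30}, {"capacity": 32, "used": 28}]
print(can_power_on(8, hosts))  # -> refuse power-on
```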

That’s it for now… Like I said earlier, if you have more in-depth details, feel free to chip in, as this is a grey area for most people.

Active / Standby etherchannels?

Duncan Epping · Oct 12, 2009 ·

I’ve seen this a couple of times already, and just had a very long phone call with a customer who had created the following setup:

So basically the first two NICs are active, with load balancing set to IP-Hash, and configured as an Etherchannel on the stacked Cisco 3750s. The second pair is “standby”, also with load balancing set to IP-Hash and configured as a second Etherchannel on the stacked Cisco 3750s. A diagram probably makes more sense:

Explanation: all NICs belong to the same vSwitch. Etherchannel 01 consists of “vmnic0” and “vmnic3”, both active. Etherchannel 02 consists of “vmnic1” and “vmnic4”, both standby.

My customer created this to ensure a 2Gb link is always available. In other words, if “vmnic3” fails, “vmnic1” and “vmnic4” should take over, as they are a “pair”. But is this really what happens when “vmnic3” fails?

What they expected to happen did not happen. When “vmnic3” failed, VMware ESX “promoted” the first standby NIC to active, which in this case belonged to a different Etherchannel. What happened next was not a pretty sight: the MAC address table went completely nuts, with MAC flaps all over the place. I’m not a networking guy, but I can tell you that introducing a loop when you have PortFast configured is not a smart idea. DON’T DO THIS AT HOME, KIDS!
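The failure mode is easy to see if you model the failover order. This is a toy simulation of the observed behavior, not ESX code: the first standby NIC is promoted regardless of which Etherchannel it belongs to.

```python
def promote_standby(active, standby, failed_nic):
    # ESX promotes the first standby NIC in the failover order,
    # with no awareness of Etherchannel membership.
    active = [nic for nic in active if nic != failed_nic]
    if standby:
        active.append(standby.pop(0))
    return active, standby

# Etherchannel 01 active, Etherchannel 02 standby, then vmnic3 fails
active, standby = promote_standby(["vmnic0", "vmnic3"],
                                  ["vmnic1", "vmnic4"], "vmnic3")
print(active)  # -> ['vmnic0', 'vmnic1']  (NICs from two different Etherchannels!)
```

The active set now mixes the two Etherchannels, which is exactly the loop that triggered the MAC flaps.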

Best Practices: running vCenter virtual (vSphere)

Duncan Epping · Oct 9, 2009 ·

Yesterday we had a discussion about running vCenter virtual on one of the internal mailing lists. One of the gaps identified was the lack of a best-practices document. Although there are multiple for VI3, and there are some KB articles, these do not seem to be easy to find or complete. This is one of the reasons I wrote this article. Keep in mind that these are my recommendations, and they do not necessarily align with VMware’s recommendations or requirements.

Sizing

Sizing is one of the most difficult parts in my opinion. As of vSphere the minimum requirements for vCenter have changed, but they go against my personal opinion on this subject. My recommendation would be to always start with 1 vCPU for environments with fewer than 10 hosts, for instance. Here’s my suggestion:

  • < 10 ESX hosts
    • 1 x vCPU
    • 3GB of memory
    • Windows 64-bit OS (preferred) or Windows 32-bit OS
  • > 10 ESX hosts but < 50 ESX hosts
    • 2 x vCPU
    • 4GB of memory
    • Windows 64-bit OS (preferred) or Windows 32-bit OS
  • > 50 ESX hosts but < 200 ESX hosts
    • 4 x vCPU
    • 4GB of memory
    • Windows 64-bit OS (preferred) or Windows 32-bit OS
  • > 200 ESX hosts
    • 4 x vCPU
    • 8GB of memory
    • Windows 64-bit OS (requirement)
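For reference, the list above can be expressed as a small lookup helper. This is just a sketch; the behavior at exactly 10, 50 and 200 hosts is my assumption, as the list leaves those boundaries open:

```python
def vcenter_sizing(num_hosts):
    # Thresholds mirror the sizing list; boundary values (10/50/200)
    # are assumed to fall into the larger bucket.
    if num_hosts < 10:
        return {"vcpu": 1, "memory_gb": 3, "os": "64-bit preferred"}
    if num_hosts < 50:
        return {"vcpu": 2, "memory_gb": 4, "os": "64-bit preferred"}
    if num_hosts < 200:
        return {"vcpu": 4, "memory_gb": 4, "os": "64-bit preferred"}
    return {"vcpu": 4, "memory_gb": 8, "os": "64-bit required"}

print(vcenter_sizing(75))  # -> {'vcpu': 4, 'memory_gb': 4, 'os': '64-bit preferred'}
```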

My recommendations differ from VMware’s. The reason for this is that in small environments (<10 hosts) there’s usually more flexibility for increasing resources in terms of scheduling downtime. Although 2 vCPUs are a requirement, I’ve seen multiple installations where a single vCPU was more than sufficient. Another argument for starting with a single vCPU is “practice what you preach”. (How many times have you convinced an application owner to downscale after a P2V?!) I do, however, personally prefer to always use a 64-bit OS, to enable upgrades to configurations with more than 4GB of memory when needed.

vCenter Server in a HA/DRS Cluster

  1. Disable DRS for your vCenter Server VM (change the automation level!) and make sure to document where the vCenter Server is located. (My suggestion would be the first ESX host in the cluster.)
  2. Make sure HA is enabled for your vCenter Server, and set the startup priority to high. (Default is medium for every VM.)
  3. Make sure the vCenter Server VM gets enough resources by setting the shares for both Memory and CPU to “high”.
  4. Make sure other services and servers on which vCenter depends also start automatically, with a high priority and in the correct order, such as:
    1. Active Directory.
    2. DNS.
    3. SQL.
  5. Write a procedure for booting vCenter / AD / DNS / SQL manually in case a complete power outage occurs.

Most of these recommendations are pretty obvious, but you would be surprised how many environments I’ve seen where, for instance, MS SQL had a medium startup priority and vCenter a high priority. Or where, after a complete power outage, no one knows how to boot the vCenter Server. Documenting standard procedures is key here, especially now that with vSphere vCenter is more important than ever before.

Sources:

  • http://kb.vmware.com/kb/1009080
  • http://kb.vmware.com/kb/1009039
  • ESX and vCenter Server Installation Guide
  • Upgrade Guide

Using limits instead of downscaling….

Duncan Epping · Sep 25, 2009 ·

I’ve seen this floating around the communities a couple of times, and someone also mentioned it during a VCDX panel: setting limits on VMs when you are not allowed to decrease the memory. For example, you want to P2V a server with 8GB of memory and an average utilization of 15%. According to normal guidelines it would make sense to resize the VM to 2GB; however, due to political reasons (“I paid for 8GB and I demand…”) this is not an option. This is when people start looking into using limits. However, I don’t recommend this approach, and there’s a good reason for it.
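To make the arithmetic in the example concrete, here is a hypothetical rightsizing helper. The round-up-to-a-power-of-two rule is my own assumption for illustration, not a VMware guideline:

```python
import math

def rightsize_memory(provisioned_gb, avg_utilization):
    demand_gb = provisioned_gb * avg_utilization   # 8 GB * 15% = 1.2 GB
    # Round up to the next power of two, a common VM memory granularity
    return 2 ** math.ceil(math.log2(demand_gb))

print(rightsize_memory(8, 0.15))  # -> 2
```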

Using limits can lead to serious performance issues when the VM starts swapping. As many of you know, the first thing that happens when you reach the limit is that the balloon driver kicks in. The balloon driver will force the OS to swap out. Of course this will affect performance, but at least when the OS gets to pick the pages it will do so in a smart way. When the OS reaches its limits, the VMkernel will start swapping, and this is where it gets nasty, because the VMkernel does not take anything into account. It could easily swap out pages actively being used by your application or operating system, which will heavily affect the performance of your VM. (That’s a short summary of the process; if you want a more in-depth explanation, please read this excellent post by Scott “VMGuru” Herold.)

Swapping, whether by the VMkernel or the OS, is the reason I don’t recommend using limits. Just think about it for a minute. You probably convinced the application owner to virtualize their services with arguments like availability, flexibility and equal performance. Setting a limit will more than likely affect performance once the limit is approached, and thus hurt their trust in virtualization and the IT organization. Another side effect is that there’s no way to recover from swapping without a reboot, which means availability will also be decreased. In other words: avoid setting limits.

I do understand why admins take these drastic steps, but again, I don’t agree. If you want to convince your application owner that their VM needs to be resized, monitor it. Prove to them that the server is not utilizing the memory, and claim it back. Claiming back is difficult; that’s why I personally recommend investing more time and effort during the first phase of your P2V project: educate the application owners and convince them with the outcome of your capacity-planning tools. Explain to them how easy it is to increase memory, and make them feel more comfortable by adding a week of aftercare which includes resource monitoring. If you really want to convince them (though this depends on the level of maturity within the organization), change the cost model and make it more attractive to downsize…

IO DRS – Providing Performance Isolation to VMs in Shared Storage Environments (TA3461)

Duncan Epping · Sep 16, 2009 ·

This was probably one of the coolest sessions of VMworld. Irfan Ahmad was the host of this session, and some of you might know him from Project PARDA. The PARDA whitepaper describes the algorithm being used and how customers could benefit from it in terms of performance. As Irfan stated, this is still in a research phase. Although the results are above expectations, it’s still uncertain whether this will be included in a future release and, if it is, when. There are a couple of key takeaways that I want to share:

  • Congestion management on a per-datastore level -> limits on IOPS and shares set per VM
  • Check the proportional allocation of the VMs to be able to identify bottlenecks.
  • With I/O DRS, throughput for tier-1 VMs will increase when demanded (more IOPS, lower latency), of course based on the limits / shares specified.
  • CPU overhead is limited -> my take: with the hardware of today I wouldn’t worry about an overhead of a couple of percent.
  • “If it’s not broken, don’t fix it” -> if the latency is low for all workloads on a specific datastore, do not take action; only act above a certain threshold!
  • I/O DRS does not take SAN congestion into account, but the SAN is less likely to be the bottleneck.
  • Researching the use of Storage VMotion to move VMDKs around when there’s congestion at the array level.
  • Interacting with queue-depth throttling.
  • Dealing with end-points; would co-exist with PowerPath.

That’s it for now… I just wanted to make a point: there’s a lot of cool stuff coming up. Don’t be fooled by the lack of announcements during the keynotes (according to some people, that is; I personally disagree). Start watching the sessions, there’s a lot of knowledge to be gained!



About the Author

Duncan Epping is a Chief Technologist and Distinguished Engineering Architect at Broadcom. Besides writing on Yellow-Bricks, Duncan is the co-author of the vSAN Deep Dive and the vSphere Clustering Deep Dive book series. Duncan is also the host of the Unexplored Territory Podcast.

