Best training course in ages!

I’ve done a lot of training courses in my career. A lot of them were disappointing as they never met my expectations. I guess the ones that did meet my expectations, or even exceeded them, were mainly VMware related. Especially the DSA course rocked. But there’s a new training in town, and it just claimed the crown… VMware vSphere: Manage for Performance!

I read the course material this week and I can honestly say that it rocks! Of course I did not expect anything less because VMware’s performance guru Scott Drummonds was involved in developing this course. (Scott wrote an article about the training a month ago, you can find it here.) Below you will find a short description.

This hands-on training course explores the management of performance in a VMware vSphere™ environment. It provides the knowledge and skills necessary to make fundamental design decisions that enhance performance and to meet performance goals in an already-deployed vSphere installation. The course is based on VMware® ESX™ 4.0, ESXi 4.0, and vCenter™ Server 4.0.

source pdf

Like I said, highly recommended for everyone!

Reclaiming idle memory

In the “CPU/MEM Reservation Behavior” article there was a lively discussion going on between Chris Huss (vmtrainers.com) and myself. I think the following comment by Chris more or less summarizes the discussion:

I wasn’t aware that the balloon driver was involved with the Mem.IdleTax. I haven’t seen any documentation stating this…and assumed that the VMkernel just stopped mapping idle memory for the VM without letting it know. If the VM needed the memory again, the VMkernel would just re-map it.

I can be totally wrong about this, but I have not seen any documentation to debunk this theory. It is my belief that the Mem.IdleTax is a totally separate memory saving/shaving technique from the balloon driver or the .vswp file.

If VMware engineering has or would publish an official article on this…I think it would clear up alot of things.

To summarize: how does ESX reclaim idle memory or free memory from a virtual machine? The answer is simple. ESX has two idle memory reclamation mechanisms:

  1. Balloon driver
  2. vSwap

I would like to refer to page 29 of the Resource Management Guide where the above is stated. I do not think it is a coincidence that the paragraph above “memory reclamation” is “Memory Tax for Idle Virtual Machines”. (There is a third memory “reclamation” mechanism by the way, it is called “TPS”, but this is not used to specifically reclaim Idle Memory but rather to free up memory by sharing pages where possible.)

By default the balloon driver is used to reclaim idle memory. It is needed because some operating systems only update their internal free memory map. In other words, the hypervisor is unaware that specific pages are unused: they might still contain data, and the GOS (Guest Operating System) will not report to the hypervisor that the pages are no longer in use. The balloon driver is used to notify the GOS that there is a lack of memory.

When the balloon inflates, the GOS will first assign all “unused / free” pages to the balloon driver. If this is enough it will stop. If this isn’t enough, the OS will decide which pages it will page out until it reaches its threshold. Those pages need to be written to GOS swap because they might be needed later; they can’t just be reused without storing them somewhere.

I guess this section of the excellent white paper “Memory Resource Management in VMware ESX Server” by Carl Waldspurger describes what I explained above.

The guest OS decides which particular pages to reclaim and, if necessary, pages them out to its own virtual disk. The balloon driver communicates the physical page number for each allocated page to ESX Server, which may then reclaim the corresponding machine page.
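The inflation order described above (free pages first, guest paging only for the shortfall) can be sketched as a toy model. This is only an illustration of the bookkeeping, not how the balloon driver is implemented; the function name and the page counts are made up for the example.

```python
def inflate_balloon(free_pages, used_pages, target):
    """Toy model of one balloon inflation as seen from inside the guest.

    free_pages: pages the guest OS currently considers free
    used_pages: pages the guest OS is actively holding data in
    target:     pages the hypervisor asks the balloon to claim

    Returns (from_free, paged_out):
      from_free - pages handed to the balloon straight from the free list
      paged_out - pages the guest first had to write to its OWN swap
    """
    # Step 1: the guest satisfies the balloon from its free list.
    from_free = min(free_pages, target)

    # Step 2: only if that was not enough does the guest pick pages
    # to page out to its own virtual disk.
    shortfall = target - from_free
    paged_out = min(used_pages, shortfall)

    return from_free, paged_out


# Small request: free pages are enough, nothing is paged out.
print(inflate_balloon(free_pages=1000, used_pages=3000, target=600))

# Larger request: free list is exhausted, the guest pages out the rest.
print(inflate_balloon(free_pages=1000, used_pages=3000, target=1500))
```

The point of the model is that the guest, not the hypervisor, chooses which pages go, which is exactly why the balloon driver is needed in the first place.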

To be absolutely certain I reached out to Carl Waldspurger to verify that my statements/claims are correct. (Yes they were…)

By the way this concept is also described in the “VMware vSphere: Manage for Performance” course manual on page 151. Excellent course which I can recommend to everyone as it will not only explain this concept but also how to identify it and how to resolve it.

Changing the directory of your vSphere vCenter log files

Something that a lot of people haven’t looked into or just don’t think about is relocating the log files of vCenter. I wrote a short article 2 years ago and thought it was time to reiterate it. By default (Windows 2003) log files are stored in “C:\Documents and Settings\All Users\Application Data\VMware\VMware VirtualCenter\Logs”, and for Windows 2008 log files are stored in “C:\ProgramData\VMware\VMware VirtualCenter\Logs”.

As you can imagine the C:\ partition is not the ideal place for storing log files. I would personally recommend using a separate drive for log files to avoid flooding any OS or program related drives. You could pick a small size based on the expected log size and, if needed, increase the number of logs that are stored and the size of each log file.

Changing this is pretty simple. Open “vpxd.cfg” and add the following line in between <log> and </log>

<directory>D:\VMware\Logs</directory>

Changing the number of log files stored and their size is also pretty basic. In this example vCenter will store 10 log files of at most 10 MB each:

<maxFileSize>10485760</maxFileSize>
<maxFileNum>10</maxFileNum>

Keep in mind that you will need to restart the vCenter Service after these changes before they take effect!
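If you would rather script the change than edit the file by hand, here is a minimal sketch using Python's standard XML library. It works on an XML string and assumes the <log> element sits directly under the root element; the sample config below is invented for the example and is not a real vpxd.cfg, so check the layout of your own file (and take a backup) before trying anything like this.

```python
import xml.etree.ElementTree as ET


def set_log_options(cfg_xml, directory, max_file_size, max_file_num):
    """Return cfg_xml with the three log settings added or updated.

    Assumption: <log> is a direct child of the root element. If no <log>
    element exists, one is created.
    """
    root = ET.fromstring(cfg_xml)
    log = root.find("log")
    if log is None:
        log = ET.SubElement(root, "log")

    settings = (
        ("directory", directory),
        ("maxFileSize", str(max_file_size)),  # size in bytes
        ("maxFileNum", str(max_file_num)),
    )
    for tag, value in settings:
        node = log.find(tag)
        if node is None:
            node = ET.SubElement(log, tag)
        node.text = value

    return ET.tostring(root, encoding="unicode")


# Hypothetical, heavily trimmed stand-in for a real vpxd.cfg.
SAMPLE_CFG = "<config><log><level>info</level></log></config>"

# 10 MB expressed in bytes: 10 * 1024 * 1024 = 10485760
print(set_log_options(SAMPLE_CFG, "D:\\VMware\\Logs", 10 * 1024 * 1024, 10))
```

The vCenter Service still needs a restart afterwards, exactly as with a manual edit.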

VM powered on Alarm?

One of my readers (thanks Andrzej!) emailed me something that I thought might be interesting for those who are closely monitoring their environment.

Did you know that there are two similar VM event triggers in Alarms in vCenter?

  1. VM powered on
  2. DRS – VM powered on

The first only works for VMs outside of DRS enabled clusters. The second only works for VMs inside DRS enabled clusters. Now that’s definitely something you should be aware of when enabling Alarms / Event triggers. Imagine you want to know when a VM has been powered on and you enable the first event trigger but don’t notice that it will only send an alarm when the VMs are not part of a DRS cluster… You could be waiting for a very long time before you receive a single event alarm.

Just when I wanted to click “Publish” I received an email from one of my colleagues. Horst Mundt wrote an excellent article about Alarms and created a very handy spreadsheet which contains all alarms / events.

vSphere alarm triggers
In terms of alarms, vCenter 4 has much more to offer than vCenter 2.5. There is a whole range of default alarms available when you install vCenter 4, and they will give you a very good first shot for monitoring your vSphere environment. If you’ve never wondered what exactly the default alarms mean, or how to tune them – that’s fine. If you’re interested in a bit more detail – read the attached PDF.

Make sure to visit the VMTN source page and leave a comment or rate the article.

Adding NICs to your vSwitch on ESXi?

I just finished installing vSphere ESXi 4.0 Update 1 using all the default settings. I expected that all my portgroups would inherit all their settings from the vSwitch that was configured during installation… unfortunately this is not the case, as can be seen in the screenshots below.

Default install with no redundancy:

VM Network inherits from vSwitch:

Management Network does not inherit from vSwitch:

For the default “VM Network” portgroup everything works as expected. But for the “Management Network” it doesn’t. So what’s the problem? Well it might not be a huge issue but it is something you will need to keep in mind. I wanted to add two NICs to my vSwitch0 and expected that both would be marked as “active” on the vSwitch. And this is what happens on the vSwitch, BUT the “Management Network” does not inherit the vSwitch settings so what do you think will happen? Again see the screenshot below for the details:

For some weird reason one of the vmnics is set to “unused” instead of active… Keep this in mind when installing / configuring ESXi as you might end up with less redundancy than expected. I just did a quick search to see if it was a known/documented change and it appears that I am not the only one who ran into this, but it does not seem to be a commonly known “issue”/change.

Single Initiator Zoning, recommended or not?

A question we receive a lot is what kind of zoning should be implemented for our storage solution? The answer is usually really short and simple: at least single initiator zoning.

Single initiator zoning is something we have always recommended in the field (VMware PSO Consultants/Architects) and something that is clearly mentioned in our documentation… at least that’s what I thought.

On page 31 of the SAN Design and Deploy guide we clearly state the following:

When a SAN is configured using zoning, the devices outside a zone are not visible to the devices inside the zone. When there is one HBA or initiator to a single storage processor port or target zone, it is commonly referred to as single zone. This type of single zoning protects devices within a zone from fabric notifications, such as Registered State Change Notification (RSCN) changes from other zones. In addition, SAN traffic within each zone is isolated from the other zones. Thus, using single zone is a common industry practice.

That’s crystal clear isn’t it? Unfortunately there’s another document floating around which is called “Fibre Channel SAN Configuration Guide” and this document states the following on page 36:

  • ESX Server hosts that use shared storage for virtual machine failover or load balancing must be in one zone.
  • If you have a very large deployment, you might need to create separate zones for different areas of functionality. For example, you can separate accounting from human resources.

So which one is correct and which one isn’t? I don’t want any confusion around this. The first document, the SAN Design and Deploy guide is correct. VMware recommends single initiator zoning. Of course if you want to do “single initiator / single target” that would even be better, but single initiator is the bare minimum. Now let’s hope the VMware Tech Writers can get that document fixed…
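To make the difference between the two zoning styles concrete, here is a small sketch that builds zone membership lists for both. It only generates names and member lists; it does not talk to any switch. The zone naming scheme, the host and port names, and the WWPNs are all made up for the example.

```python
def single_initiator_zones(initiators, targets):
    """One zone per HBA: that HBA plus every storage target port."""
    return {
        f"z_{hba}": [hba_wwpn] + list(targets.values())
        for hba, hba_wwpn in initiators.items()
    }


def single_initiator_single_target_zones(initiators, targets):
    """One zone per (HBA, target port) pair: exactly two members each."""
    return {
        f"z_{hba}_{port}": [hba_wwpn, port_wwpn]
        for hba, hba_wwpn in initiators.items()
        for port, port_wwpn in targets.items()
    }


# Hypothetical WWPNs, for illustration only.
hbas = {
    "esx01_hba1": "10:00:00:00:c9:00:00:01",
    "esx02_hba1": "10:00:00:00:c9:00:00:02",
}
storage_ports = {
    "spa_p0": "50:06:01:60:00:00:00:01",
    "spb_p0": "50:06:01:68:00:00:00:01",
}

for name, members in single_initiator_zones(hbas, storage_ports).items():
    print(name, members)

for name, members in single_initiator_single_target_zones(hbas, storage_ports).items():
    print(name, members)
```

With two HBAs and two target ports, the first style gives two zones of three members each and the second gives four zones of two members each. Either way no zone ever contains more than one initiator, which is the bare minimum recommended above.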

CPU/MEM Reservation Behavior

Again an interesting discussion we had amongst some colleagues (Thanks Frank, Andrew and Craig! Especially Craig as most text below comes from The Resource Master). The topic was CPU/Memory reservations and more specifically the difference in behavior of these two.

One would expect that both a CPU and Memory reservation would have the same behavior when it comes to claiming and releasing resources but unfortunately this is not the case. Or should we say fortunately?

The following is taken from the resource management guide:

CPU Reservation:
Consider a virtual machine with reservation=2GHz that is totally idle. It has 2GHz reserved, but it is not using any of its reservation. Other virtual machines cannot reserve these 2GHz. Other virtual machines can use these 2GHz, that is, idle CPU reservations are not wasted.

Memory Reservation:
If a virtual machine has a memory reservation but has not yet accessed its full reservation, the unused memory can be reallocated to other virtual machines. After a virtual machine has accessed its full reservation, ESX Server allows the virtual machine to retain this much memory, and will not reclaim it, even if the virtual machine becomes idle and stops accessing memory.

The above paragraph is a bit misleading, as it seems to imply that a VM has to access its full reservation. What it should really say is “Memory which is protected by a reservation will not be reclaimed by ballooning or Host-level swapping even if it becomes idle,” and “Physical machine memory will not be allocated to the VM until the VM accesses virtual RAM needing physical RAM backing.” Then that pRAM is protected by the reservation and won’t be reclaimed by ballooning or .vswp-file swapping. That is, if there is any .vswp memory at all: no .vswp is created when the reservation is equal to the provisioned memory.

Note, however, that even if pRAM is not allocated to the VM to back vRAM because the VM hasn’t accessed the corresponding vRAM yet, the whole reservation is reserved, but the pRAM could still be used by other VMs. This gets really confusing. But I think of it thus:

  1. Reservations can be defined at the VM level or the Resource Pool level.
  2. Reservations at the RP level are activated or reserved immediately.
  3. Reservations at the VM level are activated or reserved when the VM is powered on.
  4. An activated reservation is removed from the total physical Resource “Unreserved” accounting.
  5. Reserving and using a resource are distinct: memory or CPU can be reserved but not used or used but not reserved.
  6. CPU reservations are friendly.
  7. Memory reservations are greedy and hoard memory.
  8. Memory reservations are activated at startup, yet pRAM is only allocated as needed. Unallocated pRAM may be used by others.
  9. Once pRAM is protected by a memory reservation, it will never be reclaimed by ballooning or .vswp-swapping even if the corresponding vRAM is idle.

Example: A VM has 4 GB of vRAM installed and a 3 GB memory reservation defined. When the VM starts, 3 GB of pRAM are reserved. If the host had 32 GB of RAM installed and no reservations active, it now has 29 GB “unreserved”.

However, if the VM accesses only 500 MB of vRAM, only 500 MB of pRAM are allocated (or granted) to it. Other VMs could use 2500 MB of RAM that you would think is part of the reservation. They cannot reserve that 2500 MB however. As soon as the VM accesses 3 GB of vRAM and so has 3 GB of pRAM backing it, no other VMs can use that 3 GB of pRAM even if the VM never touches it again, because that pRAM is now protected by the 3 GB Reservation. If the VM uses 4 GB, it gets the 3 GB guaranteed never to be ballooned or swapped, but the remaining 1 GB is subject to ballooning or swapping.
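The arithmetic in this example can be written down as a small sketch. It is just the accounting from the text, not anything ESX exposes; the function and field names are invented, and the numbers are in GB with 500 MB rounded to 0.5 GB to match the round figures above.

```python
def reservation_state(host_gb, reserved_gb, configured_gb, touched_gb):
    """Accounting for one powered-on VM on an otherwise empty host.

    touched_gb is the most vRAM the VM has accessed so far. Once pRAM
    backs reserved memory it stays protected even if the VM goes idle,
    so the peak is what matters, not current activity.
    """
    touched_gb = min(touched_gb, configured_gb)
    protected = min(touched_gb, reserved_gb)
    return {
        # Removed from "unreserved" as soon as the VM powers on.
        "host_unreserved_gb": host_gb - reserved_gb,
        # pRAM backing the VM that can never be ballooned or swapped.
        "protected_gb": protected,
        # Reserved but not yet touched: usable, not reservable, by others.
        "usable_by_others_gb": reserved_gb - protected,
        # pRAM above the reservation: fair game for ballooning/swapping.
        "reclaimable_gb": max(touched_gb - reserved_gb, 0),
    }


# VM just started and has touched only 0.5 GB.
print(reservation_state(host_gb=32, reserved_gb=3, configured_gb=4, touched_gb=0.5))

# VM has touched all 4 GB of its vRAM.
print(reservation_state(host_gb=32, reserved_gb=3, configured_gb=4, touched_gb=4))
```

In the first case 29 GB is unreserved on the host and 2.5 GB of the reservation is still usable by other VMs; in the second case 3 GB is protected and 1 GB is subject to ballooning or swapping.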

Simple huh ;-)
