I didn’t even notice it this morning, but I woke up at 05:45, so that should count as an excuse. It’s not only ESX 3.5 that needs to be patched; the same goes for 3.0.x hosts. So be sure to check out the patches section of the VMware website if you’ve still got 3.0.x running.
New patches for ESX 3.5
VMware just released a bunch of patches for ESX 3.5, four security updates and one general bug fix:
- VMware ESX 3.5, Patch ESX350-200811408-BG: Updates QLogic Software Driver
- VMware ESX 3.5, Patch ESX350-200811406-SG: Security Update to bzip2 in Service Console
- VMware ESX 3.5, Patch ESX350-200811405-SG: Security Update to libxml2 in Service Console
- VMware ESX 3.5, Patch ESX350-200811402-SG: Updates ESX Script
- VMware ESX 3.5, Patch ESX350-200811401-SG: Updates VMkernel, hostd, and Other RPM
And I know for a fact that some of you were waiting on ESX350-200811401 to drop:
- A memory corruption condition may occur in the virtual machine hardware. A malicious request sent from the guest operating system to the virtual hardware may cause the virtual hardware to write to uncontrolled physical memory. The Common Vulnerabilities and Exposures project (cve.mitre.org) has assigned the name CVE-2008-4917 to this issue.
- VMotion might trigger VMware Tools to automatically upgrade. This issue occurs on virtual machines that have the setting Check and upgrade Tools before each power-on enabled, when the affected virtual machines are moved, using VMotion, to a host with a newer version of VMware-esx-tools. Symptoms seen without this patch:
- Virtual machines unexpectedly restart during a VMotion migration
- The guest operating systems might stall (reported on forums).
Note: After patching the ESX host, you need to upgrade VMware Tools in the affected guests that reside on the host.
- Swapping active and standby NICs results in a loss of connectivity to the virtual machine.
- A race condition caused an ASSERT_BUG to fire unnecessarily and crash the ESX host. This change removes the invalid ASSERT_BUG. Symptoms seen without this patch: the ESX host crashes with an ASSERT message that includes fs3DiskLock.c:1423. Example: ASSERT /build/mts/release/bora-77234/bora/modules/vmkernel/vmfs3/fs3DiskLock.c:1423 bugNr=147983
- A virtual machine can become registered on multiple hosts due to a .vmdk file locking issue. This issue occurs when network errors cause HA to power on the same virtual machine on multiple hosts, and when SAN errors cause the host on which the virtual machine was originally running to lose its heartbeat. The original virtual machine becomes unresponsive. With this patch, the VI Client displays a dialog box warning you that a .vmdk lock is lost. The virtual machine is powered off after you click OK.
- This change fixes confusing VMkernel log messages in cases where one of the storage processors (SPs) of an EMC CLARiiON CX storage array is hung. The messages now correctly identify which SP is hung. Example of the confusing messages:
vmkernel: 1:23:09:57.886 cpu3:1056)WARNING: SCSI: 2667: CX SP B is hung.
vmkernel: 1:23:09:57.886 cpu3:1056)SCSI: 2715: CX SP A for path vmhba1:2:2 is hung.
vmkernel: 1:23:09:57.886 cpu3:1056)WARNING: SCSI: 4282: SP of path vmhba1:2:2 is hung. Mark all paths using this SP as dead. Causing full path failover.
In this case, research revealed that SP A was hung, but SP B was not.
- This patch allows VMkernel to successfully boot on unbalanced NUMA configurations, that is, those with some nodes having no CPU or memory. When such an unbalanced configuration is detected, VMkernel shows an alert and continues booting. Previously, VMkernel failed to load on such NUMA configurations. Sample alert message when memory is missing from one of the nodes (here, node 2):
No memory detected in SRAT node 2. This can cause very bad performance.
- When the zpool create command from a Solaris 10 virtual machine is run on a LUN that is exported as a raw device mapping (RDM) to that virtual machine, the command creates a partition table of type GPT (GUID partition table) on that LUN as part of creating the ZFS filesystem. Later, when a LUN rescan is run on the ESX server through VirtualCenter or through the command line, the rescan takes a significant amount of time to complete because the VMkernel fails to read the GUID partition table. This patch fixes the problem.
Symptoms seen without this patch: Rescanning HBAs takes a long time, and an error message similar to the following is logged in /var/log/vmkernel: Oct 31 18:10:38 vmkernel: 0:00:45:17.728 cpu0:8293)WARNING: SCSI: 255: status Timeout for vml.02006500006006016033d119005c8ef7b7f6a0dd11524149442030. residual R 800, CR 80, ER 3
- A race in LVM resignaturing code can cause volumes to disappear on a host when a snapshot is presented to multiple ESX hosts, such as in SRM environments. Symptoms: After rescanning, VMFS volumes are not visible.
- This change resolves a rare VMotion instability. Symptoms: During a VMotion migration, certain 32-bit applications running in 64-bit guests might crash due to access violations.
- Solaris 10 Update 4, 64-bit graphical installation fails with the default virtual machine RAM size of 512MB.
- DRS development and performance improvements. This change prevents unexpected migration behavior.
- In a DRS cluster environment, the hostd service reaches a hard limit for memory usage, which causes hostd to restart itself. Symptoms: The hostd service restarts and temporarily disconnects from VirtualCenter. The ESX host stops responding before hostd reconnects.
- Fixes for supporting Site Recovery Manager (upcoming December 2008 release) on ESX 3.5 Update 2 and Update 3.
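As a side note: if you ever need to dig through /var/log/vmkernel for those CLARiiON SP-hung warnings yourself, they are easy to pick out programmatically. A small illustrative sketch (the message format is taken from the examples quoted in the patch notes; nothing here is an official VMware tool):

```python
import re

# Matches SP-hung warnings like:
# "vmkernel: 1:23:09:57.886 cpu3:1056)WARNING: SCSI: 2667: CX SP B is hung."
SP_HUNG = re.compile(r"SCSI: \d+: (?:CX )?SP (?P<sp>[AB])\b.*hung")

def hung_sps(log_lines):
    """Return the set of storage processors the log names as hung."""
    found = set()
    for line in log_lines:
        m = SP_HUNG.search(line)
        if m:
            found.add(m.group("sp"))
    return found

log = [
    "vmkernel: 1:23:09:57.886 cpu3:1056)WARNING: SCSI: 2667: CX SP B is hung.",
    "vmkernel: 1:23:09:57.886 cpu3:1056)SCSI: 2715: CX SP A for path vmhba1:2:2 is hung.",
]
print(sorted(hung_sps(log)))  # ['A', 'B']
```

Which, of course, only shows what the log claims; as the patch notes explain, pre-patch the log could name the wrong SP, so treat the output accordingly.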
Converting Domain Controllers
Just noticed this great VMware KB article. It deals with converting, a.k.a. P2V’ing, Microsoft Domain Controllers. Those of you who have done VMware implementations and migrations know that this usually causes problems and leaves Active Directory in a faulty state, which in turn breaks replication. My advice usually is: create a new VM from a template and do a “dcpromo”; that’s the best solution and also gets rid of the slack. Or do a “cold migration”. No, and I repeat, NO hot migration; that will kill your replication for sure. Anyway, read this KB article for more info.
This Microsoft KB article deals with the problems that may occur when doing a P2V. It also contains a very important piece of information:
Microsoft does not support any other process that takes a snapshot of the elements of an Active Directory domain controller’s system state and copies elements of that system state to an operating system image. Unless an administrator intervenes, such processes cause a USN rollback. This USN rollback causes the direct and transitive replication partners of an incorrectly restored domain controller to have inconsistent objects in their Active Directory databases.
So in other words, hot migrations aren’t supported.
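To see why a snapshot-based (hot) P2V breaks replication, here is a toy model of the USN mechanism (purely illustrative; real AD replication also uses invocation IDs, which a *supported* restore changes precisely to avoid this): each partner remembers the highest USN it has seen from a DC, so when a restored image rolls the DC’s USN counter back, new changes reuse “old” USNs and the partner silently skips them.

```python
# Toy model of USN-based replication (illustrative only).
class PartnerDC:
    def __init__(self):
        self.highest_usn_seen = 0  # high-watermark for the source DC
        self.received = []

    def pull_changes(self, source_changes):
        # A partner only pulls changes with a USN above its watermark.
        for usn, change in source_changes:
            if usn > self.highest_usn_seen:
                self.received.append(change)
                self.highest_usn_seen = usn

partner = PartnerDC()
partner.pull_changes([(1, "user-a"), (2, "user-b")])  # normal replication

# Hot P2V / snapshot restore: the source DC's USN counter rolls back,
# and new changes reuse USNs 1 and 2.
partner.pull_changes([(1, "user-c"), (2, "user-d")])

print(partner.received)  # ['user-a', 'user-b'] -- the new users are silently lost
```

That silent loss is exactly the “inconsistent objects” the Microsoft quote above warns about.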
What’s in a name…
It will probably take me a couple of weeks, maybe even months, to fully adjust to the new names VMware is going to use for its products.
Most of the name changes were already announced during VMworld, and there are three major ones in the actual list:
- VirtualCenter → VMware vCenter
- VDI → VMware View
- VMFS → VMware vStorage VMFS
Most related products also got a revamp: Lifecycle Manager became VMware vCenter Lifecycle Manager, and so on. Makes sense to me with the Virtual Datacenter OS coming up.
So learn this list by heart and we will do a test tomorrow 😉
VMware CPU Host Info
Richard Garsthagen managed to find some spare time and update his ESX Host CPU info tool:
Interested in knowing if all your physical ESX servers are the same? VMware CPU Host Info will help you find out. The application gathers the important system information from your hosts and puts this in one single overview.
The program will tell you if your servers are VT capable and, more importantly, if this feature is turned on. I have found that on most of my servers, this feature is disabled in the BIOS.
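On a Linux box you can do a quick capability check yourself: the CPU flags advertise hardware virtualization support (“vmx” for Intel VT-x, “svm” for AMD-V). A small sketch that parses /proc/cpuinfo-style text; note that the flag may still be advertised even when the feature is disabled in the BIOS, which is exactly why a tool that checks whether it is actually enabled is useful:

```python
def vt_support(cpuinfo_text):
    """Return which hardware-virtualization flag the CPU advertises.

    Parses /proc/cpuinfo-style text: 'vmx' = Intel VT-x, 'svm' = AMD-V.
    Capability only -- a BIOS-disabled feature may still show up here.
    """
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags = line.split(":", 1)[1].split()
            if "vmx" in flags:
                return "Intel VT-x"
            if "svm" in flags:
                return "AMD-V"
            return None
    return None

sample = "processor : 0\nflags : fpu vme de pse tsc msr vmx sse2\n"
print(vt_support(sample))  # Intel VT-x
```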
At some point VMware will provide a cool new feature called FT. This ‘Fault Tolerance’ feature will only work with the latest processors, so the program will also let you know if your processors are new enough 🙂 (you have to have Harpertown or above).
At the login screen, just provide your username, password, and the IP address or DNS name of your VirtualCenter server. After login, the program collects the Vendor, Model, CPU Types, and CPU feature bits from all hosts.
You can connect the tool to multiple VCs at the same time, and it will report useful info like CPU Type and Features, which might be handy if you’ve got a dozen hosts and want to create clusters based on VMotion compatibility. And as you can see in the screenshot below, it also detects whether your hosts are Fault Tolerance compatible. Visit his blog and pick it up.
Let’s hope Richard keeps finding time to do cool stuff like this, or to blog on a more regular basis again!