
Yellow Bricks

by Duncan Epping


Site Recovery Manager 1.0 Update 1 Patch 4

Duncan Epping · Sep 14, 2009 ·

One of my colleagues, Michael White, just pointed out that VMware released a patch for Site Recovery Manager:

Site Recovery Manager 1.0 Update 1 Patch 4
File size: 7.9 MB
File type: .msi

Here are the most important fixes:

  • a problem that could cause a recovery plan to fail and log the message
    Panic: Assert Failed: “_pausing” @ d:/build/ob/bora-172907/santorini/src/recovery/secondary/recoveryTaskBase.cpp:328
  • a problem that caused the SRM SOAP API method getFinalStatus() to write all XML output on a single line (a small client-side workaround sketch follows this list)
  • full session keys are no longer logged (partial keys are used in the log instead)
  • a problem that could cause SRM to crash during a test recovery and log the message
    Exception: Assert Failed: “!IsNull()” @ d:/build/ob/bora-128004/srm101-stage/santorini/public\common/typedMoRef.h:168
  • a problem that could cause a recovery plan test to fail to create the test bubble network when recovering virtual machines that had certain types of virtual NICs
  • a problem that could cause incorrect virtual machine start-up order on recovery hosts that enable DRS
  • a problem that could cause the SRM server to crash while testing a recovery plan
  • a problem that could cause SRM to fail and log a “Cannot execute scripts” error when customizing Windows virtual machines on ESX 3.5 U1 hosts.
  • support for customizing Windows 2008 has been added
  • a problem that could prevent network settings from being updated during test recovery for guests other than Windows 2003 Std 32-bit
  • a problem that prevents protected virtual machines from following recommended Distributed Resource Scheduler (DRS) settings when recovering to more than one DRS cluster.
  • a problem observed at sites that support more than seven ESX hosts. If you refresh inventory mappings when connected to such a site, the display becomes unresponsive for up to ten minutes.
  • a problem that could prevent SRM from computing LUN consistency groups correctly when one or more of the LUNs in the consistency group did not host any virtual machines.
  • a problem that could cause the client user interface to become unresponsive when creating protection groups with over 300 members
  • several problems that could cause SRM to log an error message vim.fault.AlreadyExists when recomputing datastore groups
  • a problem that could cause SRM to log an Assert Failed: “ok” @ src/san/consistencyGroupValidator.cpp:64 error when two different datastores match a single replicated device returned by the SRA
  • a problem that could cause SRM to remove static iSCSI targets with non-test LUNs during test recovery
  • several problems that degrade the performance of inventory mapping
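
On a related note: the getFinalStatus() issue above is easy to work around on the client side until the patch is applied, since the single-line XML only needs to be reflowed. A minimal Python sketch (the sample response string is a made-up placeholder; the point is just the pretty-printing with the standard library):

    import xml.dom.minidom

    def pretty_print_final_status(raw_xml):
        # getFinalStatus() is assumed to have returned its report as one long line;
        # minidom re-indents it so it is readable in a log viewer or terminal.
        return xml.dom.minidom.parseString(raw_xml).toprettyxml(indent="  ")

    # Placeholder single-line response, purely for illustration:
    raw = '<recoveryPlanStatus><step name="shutdown" status="success"/><step name="recover" status="success"/></recoveryPlanStatus>'
    print(pretty_print_final_status(raw))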

Cool Tool Update: RVTools 2.6

Duncan Epping · Sep 13, 2009 ·

Rob de Veij just uploaded a new version of RVTools. Check it out, there are a whole bunch of cool new features. Honestly, it is one of the best free tools around, great work Rob! (Everyone keep in mind that Rob does this in his evenings, so if you're using it for commercial purposes it would be nice to make a small donation.)

Version 2.6 (September, 2009)

  • RVTools is now using the vSphere 4 SDK. The SDK has been enhanced to support new features of ESX/ESXi 4.0 and vCenter Server 4.0 systems.
  • On the vNetwork tab the Vmxnet2 information is improved (thanks to the new SDK).
  • The name of the vCenter server or ESX host to which RVTools is connected is now visible in the window title.
  • New menu option: Export All, which exports all the data to csv files.
  • The Export All function can also be started from the command line. The output files are written to a unique directory in the user's documents directory.
  • New vSwitch tab. The vSwitch tab displays for each virtual switch the name of the switch, number of ports, free ports, promiscuous mode value, MAC address changes allowed value, forged transmits allowed value, traffic shaping flag, width, peak and burst, teaming policy, reverse policy flag, notify switch value, rolling order, offload flag, TSO support flag, zero copy transmits support flag, maximum transmission unit size, host name, datacenter name and cluster name (the sketch after this list shows how some of this data can be read straight from the vSphere SDK).
  • New vPort tab. The vPort tab displays for each port the name of the port, the name of the virtual switch where the port is defined, VLAN ID, promiscuous mode value, MAC address changes allowed value, forged transmits allowed value, traffic shaping flag, width, peak and burst, teaming policy, reverse policy flag, notify switch value, rolling order, offload flag, TSO support flag, zero copy transmits support flag, size, host name, datacenter name and cluster name.
  • The filter now also works on the vHost, vSwitch and vPort tabs.
  • Health check change: the number of virtual machines per core check has been changed to number of virtual CPUs per core.
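
For those wondering where data like the vSwitch columns above comes from: roughly the same properties are exposed through the vSphere SDK that RVTools now uses. Below is a minimal sketch using the pyVmomi Python bindings instead of the .NET SDK RVTools is built on; the connection details are placeholders and only a handful of the columns are printed:

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    # Placeholder connection details.
    si = SmartConnect(host="vcenter.example.com", user="administrator", pwd="secret",
                      sslContext=ssl._create_unverified_context())
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.HostSystem], True)
        for host in view.view:
            for vswitch in host.config.network.vswitch:
                security = vswitch.spec.policy.security
                print(host.name, vswitch.name,
                      "ports:", vswitch.numPorts,
                      "free:", vswitch.numPortsAvailable,
                      "mtu:", vswitch.mtu,
                      "promiscuous:", security.allowPromiscuous,
                      "mac changes:", security.macChanges,
                      "forged transmits:", security.forgedTransmits)
        view.Destroy()
    finally:
        Disconnect(si)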

Memory alarms triggered with AMD RVI and Intel EPT?

Duncan Epping · Sep 11, 2009 ·

I’ve reported on this twice already, but it seems a fix will be offered soon. I discovered the problem back in March during a project where we virtualized a large number of Citrix XenApp servers on an AMD platform with RVI capabilities. As the hardware MMU increased performance significantly, it was enabled by default for 32-bit OSes. This is when we noticed that large pages (a side effect of enabling hardware MMU) are not TPS’ed and thus give a totally different view of resource consumption than on your average cluster. When vSphere and Nehalem were released, more customers experienced this behavior, as EPT (Intel’s version of RVI) is fully supported and utilized on vSphere, as reported in this article. To be absolutely clear: large pages were never supposed to be TPS’ed, and this is not a bug but actually working as designed. However, we did discover an issue with the algorithm being used to calculate Guest Active Memory, which causes the alarms to be triggered as “kichaonline” describes in this reply.

I’m not going to reiterate everything that has been reported in this VMTN Topic about the problem, but what I would like to mention is that a patch will be released soon to fix the incorrect alarms:

Several people have, understandably, asked about when this issue will be fixed. We are on track to resolving the problem in Patch 2, which is expected in mid to late September.

In the meantime, disabling large page usage as a temporary work-around is probably the best approach, but I would like to reiterate that this causes a measurable loss of performance. So once the patch becomes available, it is a good idea to go back and reenable large pages.

Also a small clarification. Someone asked if the temporary work-around would be “free” (i.e., have no performance penalty) for Win2k3 x64 which doesn’t enable large pages by default. While this may seem plausible, it is however not the case. When running a virtual machine, there are two levels of memory mapping in use: from guest linear to guest physical address and from guest physical to machine address. Large pages provide benefits at each of these levels. A guest that doesn’t enable large pages in the first level mapping, will still get performance improvements from large pages if they can be used for the second level mapping. (And, unsurprisingly, large pages provide the biggest benefits when both mappings are done with large pages.) You can read more about this in the “Memory and MMU Virtualization” section of this document:

http://www.vmware.com/resources/techresources/10036

Thanks,
Ole

Mid/late September may sound too vague for some, and that’s probably why Ole reported the following yesterday:

The problem will be fixed in Patch 02, which we currently expect to be available approximately September 30.

Thanks,
Ole
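
For anyone applying the temporary workaround Ole mentions above: large pages can be disabled per host through the Mem.AllocGuestLargePage advanced setting (and re-enabled by setting it back to 1 once the patch is out). The sketch below uses the pyVmomi Python bindings purely for illustration, with placeholder connection details; changing the setting in the vSphere Client's Advanced Settings dialog works just as well:

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    # Placeholder connection details.
    si = SmartConnect(host="vcenter.example.com", user="administrator", pwd="secret",
                      sslContext=ssl._create_unverified_context())
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.HostSystem], True)
        for host in view.view:
            adv = host.configManager.advancedOption
            # 0 disables guest large pages (the workaround); 1 re-enables them.
            # Note: some builds expect a long value here rather than a plain int.
            adv.UpdateOptions(changedValue=[
                vim.option.OptionValue(key="Mem.AllocGuestLargePage", value=0)])
            print("Disabled guest large pages on", host.name)
        view.Destroy()
    finally:
        Disconnect(si)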

Mythbusters: Hyperthreading and VMware FT

Duncan Epping · Sep 10, 2009 ·

When vSphere was still in beta, one of the requirements for using FT was to have hyperthreading disabled. For most people this wasn’t an issue, as traditional hyperthreading usually did not improve performance and thus was disabled by default. However, with Nehalem all this changed. Of course I can’t guarantee a specific percentage of performance increase, but increases of up to 20% have been reported, which is the primary reason for having HT enabled on any Nehalem system.

As you can imagine, the HT requirement for FT has been floating around ever since and is a myth which has never been debunked. I’ve spoken with product management about it and they confirmed it’s an obsolete requirement. Hyperthreading does not have to be disabled for FT to work. Or to put it even more strongly: FT is supported on systems which have hyperthreading enabled. Product Management promised me that a KB article will be created to debunk this myth, or an entry will be added to the FT FAQ KB article soon.

UPDATE: The FT FAQ KB Article has been updated and includes the following statement.

Does Fault Tolerance support Intel Hyper-Threading Technology?
Yes, Fault Tolerance does support Intel Hyper-Threading Technology on systems that have it enabled. Enabling or disabling Hyper-Threading has no impact on Fault Tolerance.
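
If you want to verify what state hyperthreading is in on your hosts before enabling FT, that information is available through the SDK as well. A minimal pyVmomi sketch with placeholder connection details (the same data is shown in the vSphere Client under Configuration > Processors):

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    # Placeholder connection details.
    si = SmartConnect(host="vcenter.example.com", user="administrator", pwd="secret",
                      sslContext=ssl._create_unverified_context())
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.HostSystem], True)
        for host in view.view:
            ht = host.config.hyperThread  # HostHyperThreadScheduleInfo
            print(host.name, "HT available:", ht.available, "HT active:", ht.active)
        view.Destroy()
    finally:
        Disconnect(si)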

VMware Data Recovery 1.0.2

Duncan Epping · Sep 10, 2009 ·

VMware just released a brand new version of VMware Data Recovery.

Version 1.0.2
Build Number 188925
Release Date 2009/09/09

This release fixes a couple of known issues:

  • Various Integrity Check Issues
    Under certain circumstances, integrity checks reported damaged restore points and “cannot load session” errors. For example, such problems might be reported if:

    • A combination of simultaneous overlapping backups and integrity checks is started.
    • A backup is stopped before completion because the backup window closes. In such a case, the deduplication store records transactions, but the closing of the backup window prevents recording the transaction to the catalog.

    When integrity checks failed in such cases, Data Recovery would mark restore points as damaged or report that the backup session could not be found. Data Recovery integrity check now handles these conditions properly, so these problems no longer occur.

  • Connections Using Alternate Ports not Supported
    By default, connections to vCenter Server use port 443. If vCenter Server is configured to use an alternate port, Data Recovery continued to attempt to connect using the default port. This caused the Data Recovery plug-in to report authentication failures when attempting to connect to the Data Recovery appliance. Alternate vCenter Server port configurations are now supported.
  • Multiple VMDKs with the Same Name not Handled Properly
    A virtual machine can have multiple VMDK files with the same name that are stored on different LUNs. In such a case, Data Recovery would only restore one of the disks. Data Recovery now restores all disks.

You can find the full release notes here.
