I’ve been playing around with Site Recovery Manager these last couple of days. Installing it was really easy, and the same goes for the basic configuration. I already wrote a blog post about this topic a month or so ago, but now I’ve experienced it myself. Most of the time in a Site Recovery Manager project will be spent on the Plan & Design phase and on writing documentation. Here is one example of why. The following was taken from the SRM course material:
Datastore Group
Replicated datastores containing the complete set of virtual machines that you want to protect with SRM
Protection Group
A group of virtual machines that are failed over together during test and recovery
For those who don’t know, there’s a one-to-one mapping between Datastore Groups and Protection Groups. In other words, once you’ve mapped a Datastore Group to a Protection Group, there’s no way of changing it without recreating the Protection Group.
I think a picture says more than a thousand words, so I stole this one from the Evaluator Guide to clarify the relationship between datastores, Datastore Groups and Protection Groups:

Notice that there are multiple datastores in Datastore Group 2 because VM4 has disks on both datastores, so these datastores are joined into one Datastore Group. This Datastore Group has a one-to-one relationship with a Protection Group. Keep in mind, and this is really important, that a Protection Group contains VMs that are failed over together during test and recovery.
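SRM computes the Datastore Groups itself; the following is not VMware’s code, only a hypothetical Python model of the rule described above (datastores that share a VM end up in the same group), written as a small union-find over a made-up VM-to-datastore map. Something like this can help you check a planned disk layout on paper before configuring anything.

```python
from collections import defaultdict


def compute_datastore_groups(vm_disks):
    """Model of Datastore Group computation (hypothetical, not SRM's implementation).

    vm_disks maps a VM name to the non-empty list of datastores holding its disks.
    Returns a sorted list of (datastores, vms) tuples, one per group.
    """
    parent = {}

    def find(ds):
        # Find the root representative of ds, halving the path as we go.
        parent.setdefault(ds, ds)
        while parent[ds] != ds:
            parent[ds] = parent[parent[ds]]
            ds = parent[ds]
        return ds

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # A VM with disks on several datastores links all of them into one group.
    for datastores in vm_disks.values():
        find(datastores[0])  # also registers VMs that live on a single datastore
        for ds in datastores[1:]:
            union(datastores[0], ds)

    groups = defaultdict(lambda: (set(), set()))
    for vm, datastores in vm_disks.items():
        datastore_set, vm_set = groups[find(datastores[0])]
        datastore_set.update(datastores)
        vm_set.add(vm)
    return sorted((sorted(ds), sorted(vms)) for ds, vms in groups.values())


# Hypothetical layout resembling the picture: VM4 spans DS2 and DS3.
layout = {"VM1": ["DS1"], "VM2": ["DS1"], "VM3": ["DS2"], "VM4": ["DS2", "DS3"]}
print(compute_datastore_groups(layout))
```

Adding a single disk for VM3 on DS1 to that layout is enough to collapse everything into one group, which is exactly the all-or-nothing situation described next.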
If you’ve got VMs with multiple disks spread across multiple datastores, with no logic behind which disk is placed on which datastore, you could (and probably will) end up with all datastores being members of the same Datastore Group. Being a member of the same Datastore Group means being part of the same Protection Group, and being part of the same Protection Group results in a less granular failover. It’s all or nothing in this case, and I can imagine most companies would like to have some sort of tiering model in place or, even better, fail over services one at a time. (By the way, this doesn’t mean that you can’t fail over everything at the same time if you create multiple Protection Groups; they can all be combined in one Recovery Plan.)
Some might think that you can arbitrarily add disks to datastores after you’ve finished configuring. This clearly isn’t the case. If you add a disk to a protected(!) VM, the Datastore Groups will be recomputed. In our situation this meant that all VMs in the “Medium Priority” Protection Group were moved over to the “High Priority” Protection Group. This was caused by the fact that we added a disk to a “Medium Priority” VM and placed it on a “High Priority” datastore. As you can imagine, this also leaves your Recovery Plans with a “warning”: you will need to reconfigure the moved VMs before you can fail them over as part of your “High Priority” Protection Group (which probably wasn’t the desired strategy…).
When I was searching the internet for information on SRM, I stumbled upon this article on the VMware Uptime blog by Lee Dilworth. I’ve taken the following from the “What we’ve learnt” post, which confirms what we’ve seen over the last couple of days:
Datastore Group computation is triggered by the following events:
- Existing VM is deleted or unregistered
- VM is storage vmotioned to a different datastore
- New disk is attached to VM on a datastore previously not used by the VM
- New datastore is created
- Existing datastore is expanded
So in other words, moving VMs from one datastore to another or creating a new disk on a different datastore can cause problems, because the Datastore Group computation will be re-run. Not only do you need to take virtual disk placement into consideration when configuring SRM, you also need to be really careful when moving virtual disks. Documentation, design and planning are key here.
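One way to act on this is a pre-check before adding a disk or doing a Storage VMotion. The Python sketch below is hypothetical: it assumes you keep your design as a simple mapping from each datastore to the Datastore/Protection Group it is meant to belong to, and it only flags changes that would break that design; it does not talk to SRM or VirtualCenter.

```python
def groups_touched(vm, target_datastore, vm_disks, datastore_group):
    """Groups the VM's disks would span after placing one more disk on target_datastore."""
    datastores = set(vm_disks.get(vm, [])) | {target_datastore}
    return {datastore_group[ds] for ds in datastores}


def new_disk_merges_groups(vm, target_datastore, vm_disks, datastore_group):
    """True if the new disk would link two groups and trigger an unwanted merge."""
    return len(groups_touched(vm, target_datastore, vm_disks, datastore_group)) > 1


def svmotion_changes_group(vm, target_datastore, vm_disks, datastore_group):
    """True if moving all of the VM's disks to target_datastore changes its group."""
    current = {datastore_group[ds] for ds in vm_disks[vm]}
    return current != {datastore_group[target_datastore]}


# Hypothetical design documentation: datastore -> intended group.
design = {"DS_HIGH1": "High Priority", "DS_MED1": "Medium Priority", "DS_MED2": "Medium Priority"}
placement = {"APP01": ["DS_MED1"]}

print(new_disk_merges_groups("APP01", "DS_HIGH1", placement, design))  # the mistake from our project
print(new_disk_merges_groups("APP01", "DS_MED2", placement, design))
```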
I would suggest documenting the current disk placement before you even start implementing SRM; depending on the results, you might need to move disks around before you start with SRM. Once SRM has been implemented, check your documentation and design before adding any virtual disks. By the way, documenting your current disk placement can easily be done with the script that Hugo created this week, and I would suggest regularly creating reports and saving them.
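Saved reports become really useful once you compare them. Assuming (hypothetically) that each report is a CSV file with VM, Disk and Datastore columns, a Python sketch along these lines lists what was added, removed or moved between two snapshots, which you can then check against your SRM design:

```python
import csv
import io


def load_placement(csv_text):
    """Map (vm, disk) -> datastore from a report with VM, Disk, Datastore columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {(row["VM"], row["Disk"]): row["Datastore"] for row in reader}


def diff_placement(old, new):
    """Return human-readable changes between two placement snapshots."""
    changes = []
    for vm, disk in sorted(old.keys() | new.keys()):
        before, after = old.get((vm, disk)), new.get((vm, disk))
        if before == after:
            continue
        if before is None:
            changes.append(f"ADDED {vm} [{disk}] on {after}")
        elif after is None:
            changes.append(f"REMOVED {vm} [{disk}] from {before}")
        else:
            changes.append(f"MOVED {vm} [{disk}] {before} -> {after}")
    return changes


last_week = "VM,Disk,Datastore\nAPP01,Hard disk 1,DS_MED1\n"
today = "VM,Disk,Datastore\nAPP01,Hard disk 1,DS_MED1\nAPP01,Hard disk 2,DS_HIGH1\n"
for change in diff_placement(load_placement(last_week), load_placement(today)):
    print(change)
```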
Expect some more SRM stuff coming up over the next couple of weeks.
When I published my article on the tools and scripts I use during a VMware Healthcheck, I received a couple of emails about Tripwire’s Configcheck. I’ve been on holiday for a couple of weeks, so it took me a bit longer than usual to check out the product.
Configcheck can be downloaded for free. Configcheck is a Java application, so you will need to install the JRE. Installing the JRE on a server can be a bit of a pain sometimes, which is one of the reasons it will be hard for me to actively use Configcheck at customer sites (this depends on the customer’s policy). Installing the product itself is fairly easy though:
- Download Java JRE.
- Download the file configcheck.zip to a Windows machine that has Java Runtime Environment (JRE) version 1.5, or higher.
- Unzip the configcheck.zip file
That’s it, fairly easy. Now you can run “configcheck.cmd” to check the specified ESX host for security issues. Once the check is complete, you can click the test results to view the remediation steps. The test results will look like this:

As you can see, 37 passed and 40 failed. Not really surprising, considering I ran this against a newly built ESX 3.5 U3 host with no modifications whatsoever. Clicking the test results didn’t work on my test servers because they lack an internet connection. Unfortunately, it’s also not possible to export the data in this version. It is free, though, and Tripwire’s Enterprise edition does give you this capability; if you need export and a whole lot more, check it out. You can find a data sheet with a comparison between Configcheck and Enterprise here.
Luckily, Tripwire also provides the remediation steps in PDF form. For instance, the remediation steps for 1.2.2 “Verify the log files to keep is equal to 10”:
Description:
This test determines if virtual machines are configured to keep 10 log files when the recommended log rotate size of 100KB is exceeded. Virtual machines log activity in their respective vmware.log files. If growth of these log files is not limited, it is possible for virtual machines to cause a denial of service on the ESX Server by filling up the VMFS volume. There are two options for preventing virtual machines from flooding the hard disk of the host: size-based log file rotation or disabling logging for the virtual machine. This policy checks for size-based log file rotation because disabling logging altogether limits troubleshooting options.
Remediation:
To remediate failure of this policy test, configure the virtual machine to keep 10 log files when the recommended log rotate size of 100KB is exceeded. Configuring the virtual machine to keep 10 log files when the recommended log rotate size of 100KB is exceeded:
- Log in to VirtualCenter or use the VI Client to connect directly to the ESX Server hosting the improperly configured virtual machine.
- Power off the virtual machine if needed.
- Right click the virtual machine and click Edit Settings.
- Select the Options tab.
- Select Advanced > General, and click the Configuration Parameters button.
- Look for a row with log.keepOld and set the value to 10.
- If the row does not exist, then click the Add Row button.
- In the Name field type log.keepOld.
- In the Value field type 10.
- Click OK to close the Configuration Parameters dialog.
- Click OK to close the Virtual Machine Properties dialog.
As you can see, the description and remediation explain why and what to do in a fairly extensive manner. This is great, because not only does it make solving the “problem” really easy, Tripwire’s Configcheck also educates the SysAdmin!
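The setting behind this test ends up in the virtual machine’s .vmx file as a `key = "value"` line. The Python sketch below only illustrates that check and fix on .vmx text; it is a hypothetical helper, it assumes the key is spelled exactly `log.keepOld`, and it does not replace the supported VI Client procedure above (it never touches a real VM).

```python
import re

# One setting per line in a .vmx file, e.g.  log.keepOld = "10"
VMX_LINE = re.compile(r'^\s*([^=\s]+)\s*=\s*"(.*)"\s*$')
KEEP_OLD = "log.keepOld"


def keep_old_value(vmx_text):
    """Return the log.keepOld value from .vmx text, or None if the row is missing."""
    for line in vmx_text.splitlines():
        match = VMX_LINE.match(line)
        if match and match.group(1) == KEEP_OLD:
            return match.group(2)
    return None


def set_keep_old(vmx_text, value="10"):
    """Return .vmx text with log.keepOld set to value, adding the row if it is missing."""
    new_row = f'{KEEP_OLD} = "{value}"'
    lines = vmx_text.splitlines()
    for i, line in enumerate(lines):
        match = VMX_LINE.match(line)
        if match and match.group(1) == KEEP_OLD:
            lines[i] = new_row
            break
    else:
        lines.append(new_row)  # same as "Add Row" in the Configuration Parameters dialog
    return "\n".join(lines) + "\n"


sample = 'displayName = "APP01"\nlog.keepOld = "6"\n'
print(keep_old_value(sample) == "10")  # the Configcheck test would fail here
print(set_keep_old(sample), end="")
```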
I’ve written about this tool several times, so most of you have probably tested it by now and are actively using it: RVTools. Rob just notified me that he uploaded a brand new version of his tool. The following has been added in version 2.2:
- New vDatastore tab. The “vDatastore” tab displays for each datastore the name, connectivity status, file system type, number of virtual machines on the datastore, total capacity in MB, free capacity in MB, multiple host access indication and the URL.
- Your custom defined fields are now visible on most of the tab-pages
- New menu option “export data to csv file”
- New “upgrade policy” field on vTools tab-page
- New “Sync time with host” field on vTools tab-page
- The “OS” field, which is displayed on most of the tab pages, now shows the name of the guest OS as reported by VMware Tools. In previous versions we used the configuration value. The vTools tab displays both “OS” fields.
Here’s a screenshot of the new tab “vDatastore”:

If you’ve never used RVTools before, be sure to check it out; it’s worth it. And if you are already using it, download the new version and upgrade!
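The new CSV export makes the vDatastore data easy to post-process. As a hypothetical example (the column names “Name”, “Capacity MB” and “Free MB” are assumptions; check the header of your own export), this Python sketch flags datastores that are running low on free space:

```python
import csv
import io


def low_free_datastores(csv_text, threshold_pct=20.0):
    """Return (name, free %) for datastores below threshold_pct free space."""
    flagged = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        capacity = float(row["Capacity MB"])
        free = float(row["Free MB"])
        free_pct = 100.0 * free / capacity if capacity else 0.0
        if free_pct < threshold_pct:
            flagged.append((row["Name"], round(free_pct, 1)))
    return flagged


export = "Name,Capacity MB,Free MB\nDS1,100000,10000\nDS2,100000,50000\n"
print(low_free_datastores(export))
```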
I was just reading the Press Release on the “Preliminary Fourth-Quarter Financial Results” over on EMC’s website and noticed the following:
The restructuring program will reduce EMC’s global Information Infrastructure workforce by approximately 2,400 positions, or about 7% of its headcount as of September 30, 2008.
I don’t know whether anyone has noticed it already, but it seems like major news to me, especially with all the rumours floating around about layoffs at other huge software firms.
Hugo posted a great script that compares configuration items between a given set of hosts. This can come in especially handy when you’ve got a large number of datastores or portgroups, or a large number of ESX hosts for that matter. Hugo’s post contains a set of excellent examples, so check it out for more details and the script. Here’s what the output looks like for now:
InputObject        SideIndicator
-----------        -------------
esxServer1_Local   <=
esxServer2_Local   =>
DATASTORE_TEST1    =>
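The InputObject and SideIndicator columns are the output properties of PowerShell’s Compare-Object cmdlet: “<=” marks an item that only exists in the reference set and “=>” an item that only exists in the difference set. I haven’t checked how Hugo’s script is built internally, so treat this Python sketch as an illustration of the comparison itself, using hypothetical datastore lists for two hosts (the output order may differ from Compare-Object’s):

```python
def compare_objects(reference, difference):
    """Return (item, side) pairs for items present in only one of the two collections.

    "<=" marks an item only in reference, "=>" an item only in difference,
    following the SideIndicator convention of PowerShell's Compare-Object.
    """
    ref_set, diff_set = set(reference), set(difference)
    # dict.fromkeys removes duplicates while keeping the original order.
    only_ref = [(item, "<=") for item in dict.fromkeys(reference) if item not in diff_set]
    only_diff = [(item, "=>") for item in dict.fromkeys(difference) if item not in ref_set]
    return only_ref + only_diff


host1 = ["esxServer1_Local", "SHARED01"]
host2 = ["esxServer2_Local", "SHARED01", "DATASTORE_TEST1"]
for item, side in compare_objects(host1, host2):
    print(f"{item:<18} {side}")
```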
Update: Hugo just posted a follow-up to his original blog post. The new script creates a CSV file, which can be imported into Excel for example, and the result will look like this:

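A CSV like that is essentially a presence matrix: one row per configuration item, one column per host. Purely as an illustration of that layout (not Hugo’s script, and with hypothetical host and datastore names), a Python version could look like this:

```python
import csv
import io


def presence_matrix_csv(items_by_host):
    """Build CSV text with one row per item and an 'X' for every host that has it."""
    hosts = sorted(items_by_host)
    all_items = sorted(set().union(*items_by_host.values()))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Item"] + hosts)
    for item in all_items:
        writer.writerow([item] + ["X" if item in items_by_host[host] else "" for host in hosts])
    return out.getvalue()


datastores = {
    "esxServer1": {"esxServer1_Local", "SHARED01"},
    "esxServer2": {"esxServer2_Local", "SHARED01", "DATASTORE_TEST1"},
}
print(presence_matrix_csv(datastores), end="")
```

Empty cells then stand out immediately once the file is opened in Excel.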
Ivo Beerens just published a new version of his PowerShell Healthcheck script. The script reports the following in a nicely formatted HTML file:
- VMware ESX server Hardware and version
- VMware vCenter version
- Cluster information
- VMware statistics
- Active Snapshots
- CDROMs connected to VMs
- Floppy drives connected to VMs
- Datastores Information such as free space
- RDM information
- VM information such as VMware tools version, processor and memory limits
- VMs and their datastores
- VMware timesync enabled
- Percentage disk space used inside the VM
- VC error logs last 5 days
Go to Ivo’s website for the download of the script and the source blog post. I use this script personally just to keep track of changes and get a quick overview of the current situation of an environment.
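Ivo’s script is PowerShell and I’m not reproducing it here. Just to illustrate what one of those report items (percentage disk space used inside the VM) boils down to, here is a hypothetical Python sketch that turns made-up (VM, disk, capacity, free) figures into an HTML table and highlights disks above a chosen threshold:

```python
from html import escape


def guest_disk_report(disks, warn_pct=90.0):
    """Render (vm, disk, capacity_mb, free_mb) tuples as an HTML table of % used."""
    rows = []
    for vm, disk, capacity_mb, free_mb in disks:
        if capacity_mb <= 0:
            continue  # skip entries without usable capacity data
        used_pct = 100.0 * (capacity_mb - free_mb) / capacity_mb
        style = ' style="color:red"' if used_pct >= warn_pct else ""
        rows.append(f"<tr{style}><td>{escape(vm)}</td><td>{escape(disk)}</td><td>{used_pct:.1f}%</td></tr>")
    header = "<tr><th>VM</th><th>Disk</th><th>Used</th></tr>"
    return "<table>\n" + header + "\n" + "\n".join(rows) + "\n</table>"


print(guest_disk_report([("APP01", "C:\\", 1000, 50), ("WEB01", "C:\\", 1000, 500)]))
```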
I often get asked where I get my information from. Besides the wealth of information on the PlanetV12n blogs and in the VMware documentation, there’s another great source: VMworld. Some go to VMworld just for “networking”, but besides expanding your network you can gather a lot of knowledge during your stay. And not only during your stay: your vmworld.com account also lets you download the presentations. These presentations, especially those from the deep-dive sessions, contain in-depth information on their subject.
Just to give you an idea, check out the presentation “VI3 TA19, Advanced Log Analysis” that Rich shared via VM/ETC.com. There’s a great section on SCSI error strings which might just explain the vmkernel log messages you’ve been seeing (Storagemonitor). Anyway, VMworld Europe 2009 is in Cannes, France: be there and register now!
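Most of those SCSI error strings come down to a device status plus sense data (sense key, ASC and ASCQ), and the status and sense key tables come from the SCSI standards. The Python sketch below decodes those values; note that the log line format it matches (“D:0x2 … sense data: 0x6 0x29 0x0”) is an assumption, as the exact wording differs between ESX releases, so adjust the pattern to what your vmkernel log actually contains.

```python
import re

SCSI_STATUS = {0x00: "GOOD", 0x02: "CHECK CONDITION", 0x08: "BUSY",
               0x18: "RESERVATION CONFLICT", 0x28: "TASK SET FULL"}
SENSE_KEYS = {0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
              0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST",
              0x6: "UNIT ATTENTION", 0x7: "DATA PROTECT", 0x8: "BLANK CHECK",
              0x9: "VENDOR SPECIFIC", 0xA: "COPY ABORTED", 0xB: "ABORTED COMMAND",
              0xD: "VOLUME OVERFLOW", 0xE: "MISCOMPARE"}

# Hypothetical line format: device status after "D:", then sense key, ASC, ASCQ.
PATTERN = re.compile(r"D:0x([0-9a-fA-F]+).*?sense data: 0x([0-9a-fA-F]+) "
                     r"0x([0-9a-fA-F]+) 0x([0-9a-fA-F]+)")


def decode(line):
    """Return a dict with decoded device status, sense key and ASC/ASCQ, or None."""
    match = PATTERN.search(line)
    if not match:
        return None
    status, key, asc, ascq = (int(group, 16) for group in match.groups())
    return {
        "device_status": SCSI_STATUS.get(status, f"0x{status:02x}"),
        "sense_key": SENSE_KEYS.get(key, f"0x{key:x}"),
        "asc_ascq": f"0x{asc:02x}/0x{ascq:02x}",
    }


print(decode("H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x29 0x0"))
```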