Today I noticed the following in an ESX 3.5 and VirtualCenter 2.5 environment: when you check the memory performance (past day) of the cluster, the minimum and maximum are far higher than expected. In this case the minimum is 66.9% and the maximum 106.01%. This cluster has around 64GB of memory and only around 30GB assigned, which is roughly 47% usage, nowhere near the 66.9%, let alone the 106.01%. Can anyone confirm this behavior, or even better, explain it? I’m afraid it’s a bug…
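For anyone who wants to cross-check what the VI Client graph shows, here is a minimal pyVmomi sketch that pulls the same mem.usage.average counter for a cluster straight from the VIM API. The host name, credentials, and inventory path are hypothetical, and this assumes the 5-minute historical interval that backs the past-day chart:

```python
# Minimal sketch: query a cluster's mem.usage.average via the VIM API.
# Host, credentials, and the inventory path are hypothetical placeholders.
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host='vc.example.local', user='admin', pwd='secret')
content = si.RetrieveContent()
perf = content.perfManager

# Find the mem.usage.average counter (values come back in 0.01% units).
counter_id = next(c.key for c in perf.perfCounter
                  if c.groupInfo.key == 'mem' and c.nameInfo.key == 'usage'
                  and c.rollupType == 'average')

cluster = content.searchIndex.FindByInventoryPath('DC/host/MyCluster')  # hypothetical path
spec = vim.PerformanceManager.QuerySpec(
    entity=cluster,
    metricId=[vim.PerformanceManager.MetricId(counterId=counter_id, instance='')],
    intervalId=300)  # the 5-minute interval behind the "past day" chart

for result in perf.QueryPerf(querySpec=[spec]):
    for series in result.value:
        print([v / 100.0 for v in series.value])  # convert to percentages

Disconnect(si)
```

If the raw samples from the API show the same 106.01% spike, the problem is in the collected stats rather than in the chart rendering.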
Storage VMotion Fails With Error Message “Failed to unstun VM after disk reparent”
VMware just added a new KB article about a problem with Storage VMotion:
Storage VMotion can fail for a virtual machine with the error message:
Failed to unstun VM after disk reparent.
The virtual machine is partially migrated and powered off. Generally, the virtual machine cannot be powered on again.
This issue affects:
- Virtual machines converted in-place (not deployed or cloned) from template virtual machines with compact disks.
- Virtual machines cloned from a virtual machine of the above type (as well as virtual machines cloned from those virtual machines, and so on).
- Virtual machines with disks created with the following command: vmkfstools -i <source> -d thin <destination>. ESX Server 3.5 does not support thin-provisioned (or sparse) disks.
- Virtual machines created through the SDK with at least one disk created using the RelocateSpec.transform parameter set to sparse. Again, ESX Server 3.5 does not support thin-provisioned disks.
This issue affects the above types of virtual machines because, when a disk that is thin-provisioned (or is flat but was cloned from a virtual machine that had thin-provisioned disks) is copied to the target datastore as part of Storage VMotion, the disk’s content ID (CID) value is not preserved, even though the content of the disk is correctly copied. When the virtual machine attempts to open the disk, it notices the CID is different from what it expects and fails to resume because it believes the disk is corrupted. In reality the disk is not corrupted; only the CID is incorrect. So not only does the virtual machine end up powered off, it cannot be powered back on because the CID is still incorrect.
To identify and solve the problem, download the scripts attached to the KB article!
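The KB scripts are the supported route, but if you just want to see what is going on, the CID lives in the plain-text VMDK descriptor as lines like CID=fffffffe and parentCID=ffffffff. Below is a minimal Python sketch, not the KB script, that reads those values so you can compare a child disk’s parentCID against its parent’s CID; the file names are hypothetical:

```python
import re

def read_cids(descriptor_path):
    """Return the CID and parentCID recorded in a VMDK descriptor file."""
    cids = {}
    with open(descriptor_path) as f:
        for line in f:
            m = re.match(r'(CID|parentCID)=([0-9a-fA-F]{8})', line.strip())
            if m:
                cids[m.group(1)] = m.group(2)
    return cids

# Compare a child/delta disk against its parent; a mismatch is exactly the
# condition that makes ESX treat the chain as corrupted and refuse to resume.
parent = read_cids('vm.vmdk')          # hypothetical descriptor names
child = read_cids('vm-000001.vmdk')
if child.get('parentCID') != parent.get('CID'):
    print('CID mismatch: the disk chain will be reported as corrupted')
```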
Unidentified Flying Partition?
Two days ago RyanWI posted about an unidentified partition which generates errors in the vmkernel log. Today I visited a customer with the same setup, Dell servers and an EMC SAN, and I was curious whether I could reproduce the error. Well, I did not have much trouble reproducing it, because they already had the same issue. I encountered the following errors in the vmkernel log:
Feb 12 09:16:47 esx1 kernel: SCSI device sdc: 2097151 512-byte hdwr sectors (1074 MB)
Feb 12 09:16:47 esx1 kernel: sdc: I/O error: dev 08:20, sector 0
Feb 12 09:16:47 esx1 kernel: I/O error: dev 08:20, sector 0
Feb 12 09:16:47 esx1 kernel: unable to read partition table
After closer inspection it turns out that in the DRAC BIOS of the Dell server the “Virtual Media Configuration, Virtual Media” option was set to “attached”. This causes an unrecognizable device to appear, which in turn causes the above errors. If you set “Virtual Media” to “detached”, the problem is solved.
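If you want to check whether other hosts are hitting the same symptom before touching the DRAC, a quick scan for the tell-tale lines above is enough. A minimal Python sketch; the log path is an assumption based on the usual ESX 3.x Service Console layout (these kernel: lines may also land in /var/log/messages on your build):

```python
# Scan the log for the partition-table errors shown above.
LOG_PATH = '/var/log/vmkernel'  # assumed ESX 3.x location; adjust if needed

with open(LOG_PATH) as log:
    for line in log:
        if 'unable to read partition table' in line or 'I/O error' in line:
            print(line.rstrip())
```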
NetApp SnapManager for Virtual Infrastructure
NetApp just announced a new product, “SnapManager® for Virtual Infrastructure”:
SnapManager for Virtual Infrastructure enables customers to protect their VMware environments with automated data protection and recovery of their virtual machines. SnapManager for Virtual Infrastructure dramatically reduces human error and increases server utilization for application workloads by eliminating the interruptions and performance impact caused by traditional server-hosted backups and restores. As a result, customers can protect their data more reliably. More information is available at www.netapp.com.
Add multiple SCSI controllers to your VM to improve performance
[edit 18-02-2011: It has come to my attention that the info in this article was incorrect / outdated. The LSI Logic controller has a default queue depth of 32. Even if the LSI Logic could go higher than 32, it would be capped by either the device queue depth or Disk.SchedNumReqOutstanding. To enable a single VM to have a queue depth larger than 32, the PVSCSI adapter should be used, and for optimal performance all layers should be aligned.]
A couple of months ago, at the Dutch VMUG meeting, Bouke-Jumé gave some good storage tips. This is one of them:
The LSI Logic / BusLogic controller driver has a standard queue depth of 256. Although it is not possible to set this any higher, you can add a second controller, and if you make sure the SCSI ID of your disk corresponds to that second controller, you get another queue of 256. This can lead to improved performance for database servers, file servers, and other I/O-intensive VMs. The steps in the VI Client are listed below; a scripted sketch follows the list.
- Open the properties of the VM.
- For the first disk on SCSI Controller 0, go to the virtual disk settings and select virtual device node 0:X.
- For the second disk on SCSI Controller 1, go to the virtual disk settings and select virtual device node 1:X.
- And so on…
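For anyone who would rather script this than click through the VI Client, here is a minimal pyVmomi sketch that adds a second LSI Logic controller on bus 1 and creates a disk at SCSI 1:0 on it. The temporary device keys and the disk size are hypothetical, and letting vCenter pick the file path is an assumption, so treat this as a sketch of the reconfigure call rather than a finished tool:

```python
# Minimal sketch: add a second SCSI controller (bus 1) plus a disk at 1:0.
# Assumes `vm` is an already-retrieved vim.VirtualMachine object.
from pyVmomi import vim

def add_second_scsi_controller(vm):
    """Add an LSI Logic controller on bus 1 and a new disk at SCSI 1:0."""
    controller = vim.vm.device.VirtualLsiLogicController()
    controller.key = -101                      # temporary negative key
    controller.busNumber = 1                   # second bus -> "1:X" device nodes
    controller.sharedBus = vim.vm.device.VirtualSCSIController.Sharing.noSharing

    ctrl_spec = vim.vm.device.VirtualDeviceSpec()
    ctrl_spec.operation = vim.vm.device.VirtualDeviceSpec.Operation.add
    ctrl_spec.device = controller

    disk = vim.vm.device.VirtualDisk()
    disk.key = -102
    disk.controllerKey = -101                  # ties the disk to the new controller
    disk.unitNumber = 0                        # SCSI ID 1:0
    disk.capacityInKB = 10 * 1024 * 1024       # 10 GB, hypothetical size
    backing = vim.vm.device.VirtualDisk.FlatVer2BackingInfo()
    backing.diskMode = 'persistent'
    backing.fileName = ''                      # let vCenter place it with the VM
    disk.backing = backing

    disk_spec = vim.vm.device.VirtualDeviceSpec()
    disk_spec.operation = vim.vm.device.VirtualDeviceSpec.Operation.add
    disk_spec.fileOperation = vim.vm.device.VirtualDeviceSpec.FileOperation.create
    disk_spec.device = disk

    spec = vim.vm.ConfigSpec(deviceChange=[ctrl_spec, disk_spec])
    return vm.ReconfigVM_Task(spec=spec)
```

Selecting a virtual device node on a new bus number (1:X) in the VI Client is what creates the extra controller behind the scenes; the ConfigSpec above just performs both steps explicitly in one reconfigure call.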