If your hostd, vpxd and/or vpxa log files are rotating too quickly or not quickly enough, check out this great KB article on how to set this up. The most important things to set are the maximum number of log files and their size. Be sure to do a basic calculation so your logging partition won’t fill up completely! And while you’re at it, it might be worth setting up the rotation scheme for vmkernel, messages etc. If I can find the time I will do a blog article on that one later this week.
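The "basic calculation" is nothing more than number of files times maximum file size, summed over every log you rotate. Here is a minimal Python sketch of it. The function names, the example values and the 20% headroom are my own assumptions for illustration, not values taken from the KB article.

```python
def worst_case_log_usage_mb(max_files, max_file_size_mb):
    """Upper bound on the space one rotated log can occupy, in MB."""
    return max_files * max_file_size_mb


def fits_on_partition(log_settings, partition_size_mb, headroom=0.2):
    """Sum the worst case for all logs and compare it with the partition.

    log_settings maps a log name to (max_files, max_file_size_mb).
    headroom is the fraction of the partition kept free for everything else.
    """
    total = sum(worst_case_log_usage_mb(n, size)
                for n, size in log_settings.values())
    budget = partition_size_mb * (1 - headroom)
    return total, total <= budget


# Hypothetical settings: 10 files of 5 MB each for hostd and vpxa.
settings = {"hostd": (10, 5), "vpxa": (10, 5)}
total_mb, ok = fits_on_partition(settings, partition_size_mb=2000)
print(f"worst case: {total_mb} MB, fits: {ok}")
```

Plug in your own file counts, sizes and partition size; if the result does not fit, lower either the number of files or the size per file.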
Consolidating snapshots
I was just reading this excellent KB article on consolidating snapshots. It contains a wealth of information, including a procedure for removing a snapshot of a disk whose size has been changed.
If the disk has been changed in size, the following error will show up during the boot process of the VM:
The parent virtual disk has been modified since the child was created
Read the full article for the solution.
Converting Domain Controllers
Just noticed this great VMware KB article. It deals with converting, aka P2V’ing, Microsoft Domain Controllers. Those of you who have done VMware implementations and migrations know that this usually causes problems and leaves Active Directory in a faulty state, which leads to replication no longer working properly. My advice usually is: create a new VM from a template and do a “dcpromo”, which is the best solution and also gets rid of the slack. Or do a “cold migration”; no, and I repeat NO, hot migration. That will kill your replication for sure. Anyway, read this KB article for more info.
This Microsoft KB article deals with the problems that may occur when doing a P2V. It also contains a very important piece of information:
Microsoft does not support any other process that takes a snapshot of the elements of an Active Directory domain controller’s system state and copies elements of that system state to an operating system image. Unless an administrator intervenes, such processes cause a USN rollback. This USN rollback causes the direct and transitive replication partners of an incorrectly restored domain controller to have inconsistent objects in their Active Directory databases.
So in other words, hot migrations aren’t supported.
Dell Recovery CD fails to recover ESXi version 3.5
I just noticed this new KB article that deals with not being able to recover ESXi on a Dell box because the virtual media is attached:
Upgrade to ESXi 3.5 Update 2.
If you cannot upgrade to ESXi 3.5 Update 2, use the following workaround:
- Connect to the DRAC through ILO, as follows:
- Open the Media tab.
- Open the Configuration tab.
- Deselect the Attach virtual media check box.
- Boot the ESXi system from the recovery CD.
To use DRAC virtual media to perform the recovery, follow these steps:
- Attach the virtual media
- Using the virtual media, boot the machine.
- When the recovery CD is fully loaded, disconnect the virtual media and proceed with the recovery.
Which reminded me of the nice I/O errors this Dell DRAC virtual media produces when attached. So be sure to detach the virtual media before you actually run ESX(i). The same goes for Fujitsu blades by the way: when virtual media has been present, it also produces these nice I/O errors:
Feb 12 09:16:47 esx1 kernel: SCSI device sdc: 2097151 512-byte hdwr sectors (1074 MB)
Feb 12 09:16:47 esx1 kernel: sdc: I/O error: dev 08:20, sector 0
Feb 12 09:16:47 esx1 kernel: I/O error: dev 08:20, sector 0
Feb 12 09:16:47 esx1 kernel: unable to read partition table
Which isn’t as bad as it seems: it just can’t read the partition table. For Fujitsu blades the only workaround I’ve seen so far was to completely disable USB before booting.
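If you want to check quickly whether a host is logging these errors, something like the Python sketch below will pull them out of a log. The regular expression is modelled purely on the sample lines above, and the function name is made up for this example, so treat it as a starting point rather than a complete parser.

```python
import re

# Matches e.g. "I/O error: dev 08:20, sector 0" (format taken from the sample above).
IO_ERROR = re.compile(r"I/O error: dev ([0-9a-f:]+), sector (\d+)")


def find_io_errors(lines):
    """Return a list of (device, sector) tuples for every I/O error line."""
    hits = []
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            hits.append((match.group(1), int(match.group(2))))
    return hits


sample = [
    "Feb 12 09:16:47 esx1 kernel: SCSI device sdc: 2097151 512-byte hdwr sectors (1074 MB)",
    "Feb 12 09:16:47 esx1 kernel: sdc: I/O error: dev 08:20, sector 0",
    "Feb 12 09:16:47 esx1 kernel: I/O error: dev 08:20, sector 0",
    "Feb 12 09:16:47 esx1 kernel: unable to read partition table",
]

errors = find_io_errors(sample)
devices = sorted({dev for dev, _ in errors})
print(f"{len(errors)} I/O error lines on devices: {devices}")
```

To run it against a real log, pass an open file object instead of the `sample` list.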
NFS.LockDisable: what should it be, 1 or 0?
There has been a lot of discussion (check Scott’s take on this) around this advanced NFS setting called “NFS.LockDisable”. In short, you can disable the locking mechanism on NFS volumes with this setting.
In the past NetApp had a best practices document which stated that locking should be disabled by setting it to “1”. But, as some noticed, this can and probably will result in corrupt file systems. So this “best practice” mysteriously disappeared from the NetApp VI3 Best Practices guide, and a KB article with the VMware best practice on this setting popped up.
So if you did set “NFS.LockDisable” to 1 please change it back to “0”.
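On the service console the current value can be read with `esxcfg-advcfg -g /NFS/LockDisable`. If you want to check a bunch of hosts, a small Python sketch like the one below can interpret the captured output. I am assuming the command answers with a line of the form "Value of LockDisable is 0"; verify that against your own hosts first. The function names are made up for this example.

```python
import re


def parse_advcfg_value(output):
    """Extract the integer from a line like 'Value of LockDisable is 0'.

    The output format is an assumption; adjust the pattern if your
    hosts print something different.
    """
    match = re.search(r"Value of \w+ is (-?\d+)", output)
    if match is None:
        raise ValueError(f"unexpected output: {output!r}")
    return int(match.group(1))


def lock_disable_needs_reset(output):
    """True when locking has been disabled, i.e. the value is not 0."""
    return parse_advcfg_value(output) != 0


# Example with captured output from a hypothetical host.
captured = "Value of LockDisable is 1\n"
if lock_disable_needs_reset(captured):
    print("NFS.LockDisable is not 0, change it back to 0")
```

The sketch only reads and reports; changing the value is deliberately left as a manual step.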
It might be beneficial to also implement the “prefvmx.ConsolidateDeleteNFSLocks” setting that Scott discussed, along with patch ESX350-200808401-BG. This setting is meant to avoid long delays when deleting ESX snapshots. These delays can last up to 30 seconds, which is quite long compared to iSCSI or FC. So you should only implement this fix if you run NFS, use VMware snapshots at the same time and are experiencing these delays.
I do recommend that everyone with an NFS filer takes a look at the NetApp best practices document because it does contain valuable information, but before you apply it, be sure that it doesn’t conflict with a VMware best practice!