
Yellow Bricks

by Duncan Epping


troubleshooting

Cool Tool: opvizor

Duncan Epping · Dec 7, 2010 ·

Recently Dennis Zimmer, whom most of you probably know from Icomasoft or from the books he authored, emailed me about a new tool his company is developing. I watched the video that is hosted on opvizor.com and must admit that it looks promising, especially as most solutions today are reactive or semi-proactive and opvizor is aiming to be proactive.

opvizor identifies in advance when the virtualized IT infrastructure is losing performance or might crash. Issues in VMware environments can be analyzed and corrected before they become dangerous. In addition, opvizor provides optimized logfiles and makes it possible to share the infrastructure data with internal and external partners, thus allowing more efficient problem solving. “Our goal is, that opvizor anticipates 60 percent of issues from system behavior.”

The tool has just entered the beta stage and opvizor is looking for people willing to give it a test drive and provide feedback! Funnily enough the tool reminds me of a great tool we use internally to take vm-support files apart and analyze them. I can assure you that with the right amount of work and commitment this can turn into a really powerful tool to monitor and health check your environment on a regular basis.

Configuring HA: Error while running health check script

Duncan Epping · Nov 24, 2010 ·

I was just cleaning up our Cloud Lab and noticed HA wasn’t enabled. I enabled it and immediately it threw the following error at me:

Error Message: Configuration Issues – HA agent on esx4.mgm.local in cluster ams-hadrs-01 in Lab2 has an error : Error while running health check script.

When experiencing HA configuration issues there are a couple of steps I usually take to try to fix the experienced issues:

  • Click “reconfigure for VMware HA” and see if the issue is still there, if so:
    • Is DNS configured and does it actually work? If not, fix and reconfigure for HA.
    • Is the gateway reachable? If not, fix and reconfigure for HA.
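The DNS part of these checks can be scripted from any management workstation. The following is a minimal sketch, not part of the original procedure: the function names are made up for illustration, and it only tests forward name resolution with the Python standard library. Checking the gateway would still be done separately, for example with ping.

```python
import socket


def resolve(hostname, resolver=socket.gethostbyname):
    """Return the IPv4 address for hostname, or None if the lookup fails.

    The resolver argument is injectable so the check can be exercised
    without touching a real DNS server.
    """
    try:
        return resolver(hostname)
    except OSError:  # socket.gaierror and socket.herror are subclasses
        return None


def unresolved_hosts(hostnames, resolver=socket.gethostbyname):
    """Return the hostnames that could not be resolved, in input order."""
    return [name for name in hostnames if resolve(name, resolver) is None]
```

Calling `unresolved_hosts(["esx4.mgm.local"])` with the hosts of the cluster returns an empty list when every name resolves; any name that comes back should be fixed in DNS before reconfiguring for HA.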

This usually solves 75% of the issues. If it hasn’t been fixed, the next step I usually take is unloading the agent and restarting the management services. Although it is pretty rigorous, it is the fastest way of fixing HA issues. In my case I am using ESXi and this is what I needed to do to clean up the host:

  • Disable HA on the cluster
  • /opt/vmware/aam/VMware-aam-ha-uninstall.sh
  • /sbin/services.sh restart
  • Enable HA on the cluster

This solved the issue I had with HA.

Did you know? SCSI Reservations…

Duncan Epping · Oct 26, 2010 ·

Today we had an interesting discussion on the VCDX mailing list. One thing I noticed a while back when I was randomly looking around in “esxtop” was a new field. The field is called “RESVSTATS” and can be enabled in all disk related displays (d, u, v).

[Screenshot: esxtop performance reservations scsi]

This will make troubleshooting storage related performance issues a bit easier, as the SCSI Reservations (RESV/S) are shown as a column when the field is enabled, and even more specifically the SCSI Reservation Conflicts (CONS) are shown next to it.

NFS based automated installs of ESX 4

Duncan Epping · Mar 26, 2010 ·

Just something I noticed today while testing an automated install from NFS. The arguments I pass to the installer are:

initrd=initrd.img mem=512m ksdevice=vmnic1 ip=192.168.1.123 netmask=255.255.255.0 gateway=192.168.1.1 ks=nfs://192.168.1.10:/nfs/install/ks.cfg quiet

Let’s focus on the part that’s incorrect. With ESX 3 the following bit (part of the bootstrap above) would work:

ks=nfs://192.168.1.10:/nfs/install/ks.cfg

As of ESX 4 this doesn’t work anymore, and when I do an “alt-f2” and go to /var/log and check the esx-installer.log file it shows the following error:

mount: 192.168.1.10::nfs/install failed, reason given by server: Permission denied

After checking the permissions on my NFS share 4 times I was pretty certain that they could not be causing this issue. After trying various combinations I noticed that the format of the string for “ks” has changed. As of ESX 4 you can’t use the second colon (:) anymore. So the correct format is:

ks=nfs://192.168.1.10/nfs/install/ks.cfg
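If you have a set of ESX 3 era boot lines to convert, the change can be applied mechanically. The sketch below is only an illustration of the format change described above; the function names are hypothetical and it assumes arguments on the boot line are separated by whitespace.

```python
import re

# Matches ks=nfs://<host>:/<path>, the ESX 3 style with a colon after the host.
_KS_NFS_OLD = re.compile(r"^(ks=nfs://[^/:]+):(/.*)$")


def fix_ks_argument(arg):
    """Drop the colon between host and path in a ks=nfs:// argument.

    Arguments that are not in the old format are returned unchanged.
    """
    match = _KS_NFS_OLD.match(arg)
    if match is None:
        return arg
    return match.group(1) + match.group(2)


def fix_boot_line(line):
    """Apply fix_ks_argument to every whitespace-separated argument."""
    return " ".join(fix_ks_argument(arg) for arg in line.split())
```

Running `fix_boot_line` on the installer arguments shown at the top of this post only rewrites the `ks=` argument; everything else on the line is passed through as-is.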

I still receive a warning but the installer does continue. If anyone knows why the following message is displayed please speak up:

No COS NICs have been added by the user

E1000 and dropped rx packets

Duncan Epping · Feb 2, 2010 ·

At a customer site we received several notifications of performance issues with a VMware VI3 environment. After checking the configuration of the VMs and the hosts we decided to dive into esxtop. At first sight we did not see any abnormalities: low %RDY, which is usually the first thing I check, and some swapping, but not enough to cause any major delays. The weird thing about this one was that the VM only seemed to feel sluggish when IP traffic was sent or received. As we could not reproduce the issue we decided to start esxtop in batch mode and use esxplot and perfmon to get to the bottom of it. Soon we found the issue: received packets were being dropped at the vSwitch level.

The following screenshot depicts the symptoms.

In other words, at times an enormous number of received packets were dropped. After some research I found an article which describes this behavior (http://kb.vmware.com/kb/1010071). We tried increasing the buffer size for the E1000 virtual network adapter this VM was configured with, but it did not resolve the issue. As there were other drivers mentioned in the article we decided to “upgrade” the NIC to a “vmxnet” NIC, and this actually resolved the issue. Although performance is not yet where we expected it to be, we are not seeing any dropped packets anymore and can focus on the next possible cause.
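For those who prefer a script over esxplot or perfmon, the batch-mode analysis described above could be sketched roughly as follows. This is an illustration, not what we used at the customer site. It assumes the capture was made with something like `esxtop -b -d 5 -n 60 > capture.csv`, that the first column holds the sample timestamp, and that the receive drop counters end in `% Received Packets Dropped`; verify that column name against your own capture before relying on it.

```python
import csv
import io

# Assumed suffix of the esxtop batch-mode column for dropped receive packets.
DROP_COUNTER = "% Received Packets Dropped"


def dropped_rx_samples(csv_text, threshold=0.0):
    """Return (timestamp, column, value) for every sample above threshold.

    csv_text is the full content of an esxtop batch-mode capture.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = [i for i, name in enumerate(header) if name.endswith(DROP_COUNTER)]
    hits = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        for i in columns:
            try:
                value = float(row[i])
            except (ValueError, IndexError):
                continue  # skip empty or truncated samples
            if value > threshold:
                hits.append((row[0], header[i], value))
    return hits
```

Reading the capture with `open("capture.csv").read()` and passing it to `dropped_rx_samples` lists the ports and points in time at which drops occurred, which is a quick way to see whether a single VM port stands out.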


About the Author

Duncan Epping is a Chief Technologist and Distinguished Engineering Architect at Broadcom. Besides writing on Yellow-Bricks, Duncan is the co-author of the vSAN Deep Dive and the vSphere Clustering Deep Dive book series. Duncan is also the host of the Unexplored Territory Podcast.
