Today I received an email from Chethan Kumar, who is part of Performance Engineering. Chethan rewrote the vSphere Performance Troubleshooting guide so that it contains all the nitty-gritty details you need to know about performance problems in vSphere 4.1.
The hugely popular Performance Troubleshooting for VMware vSphere 4 guide is now updated for vSphere 4.1. This document provides a step-by-step approach for troubleshooting the most common performance problems in vSphere-based virtual environments. The steps discussed in the document use performance data and charts readily available in the vSphere Client and esxtop to aid the troubleshooting flows. Each performance troubleshooting flow has two parts:
- How to identify the problem using specific performance counters.
- Possible causes of the problem and solutions to solve it.
Now the cool thing about this document is that it not only shows you some of the key metrics to look at, but also provides workflows that contain the steps required to analyze these types of issues and, of course, ultimately solve them. Just download the paper and look at, for example, figure 3 on page 16 or figure 5 on page 19, and you'll know why I love this document.
This performance troubleshooting guide is one of the most valuable documents out there, as it also gives you better insight into how some of the metrics are constructed. Something I did not know, but just discovered, is the following:
If you have a snapshot and are experiencing high I/O response times, validate the following:
The LAT/rd and LAT/wr columns indicate the average response time of read and write I/O commands as seen by the VM. Compare these with DAVG/rd and DAVG/wr. If LAT/rd > DAVG/rd or LAT/wr > DAVG/wr, and QUED = 0, then the latency is more than likely caused by the snapshot. DAVG/rd and DAVG/wr are the device latency, while LAT/rd and LAT/wr are the latency observed by the guest, so with nothing queued the extra time the guest sees is not coming from the device.
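To make the rule of thumb above concrete, here is a minimal Python sketch of the check. It assumes you have already read the LAT/rd, LAT/wr, DAVG/rd, DAVG/wr (in milliseconds) and QUED values off esxtop yourself; the function name `snapshot_latency_suspected` is hypothetical and not part of esxtop or the guide. It reads the condition as "(read or write guest latency exceeds device latency) and QUED = 0".

```python
def snapshot_latency_suspected(lat_rd, lat_wr, davg_rd, davg_wr, qued):
    """Hypothetical helper applying the snapshot latency rule of thumb.

    lat_rd, lat_wr:   average read/write latency (ms) seen by the VM (LAT/rd, LAT/wr)
    davg_rd, davg_wr: average read/write device latency (ms) (DAVG/rd, DAVG/wr)
    qued:             number of queued commands (QUED)
    """
    # The guest sees more latency than the device reports, for reads or writes.
    guest_exceeds_device = lat_rd > davg_rd or lat_wr > davg_wr
    # Only point at the snapshot when nothing is queued.
    return guest_exceeds_device and qued == 0


# Example with made-up numbers, not real measurements:
print(snapshot_latency_suspected(lat_rd=25.0, lat_wr=5.0, davg_rd=8.0, davg_wr=5.0, qued=0))  # True
```

The check is deliberately strict (plain greater-than, QUED exactly zero), mirroring the wording of the rule; in practice you would look at the size of the gap rather than treat a fraction of a millisecond as meaningful.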
You see, you can learn something new every day.