
Yellow Bricks

by Duncan Epping


performance

First Success of VMware’s Performance Service Offering

Duncan Epping · Aug 17, 2009 ·

Scott Drummonds just posted a new blog article about an upcoming VMware PSO offering. When Scott Drummonds is involved you know the topic of this offering is performance. In this case it is performance related to SQL databases and I/O bottlenecks, which is probably the most commonly reported issue. As Scott briefly explains, they were able to identify the issue rather quickly by monitoring both the physical servers and the virtual environment.

I guess this quote from Scott’s article captures the essence:

In the customer’s first implementation of the virtual infrastructure, both SQL Servers, X and Y, were placed on RAID group A. But in the native configuration SQL Server X was placed on RAID group B. This meant that the storage bandwidth of the physical configuration was approximately 1850 IOPS. In the virtual configuration the two databases shared a single 800 IOPS RAID volume. It does not take a rocket scientist to realize that users are going to complain when a critical SQL Server instance goes from 1050 IOPS to 400. And this was not news to the VI admin on-site, either. What we found as we investigated further was that virtual disks requested by the application owners were used in unexpected and undocumented ways and frequently demanded more throughput than originally estimated. In fact, through vscsiStats analysis (Using vscsiStats for Storage Performance Analysis), my contact and I were able to identify an “unused” VMDK with moderate sequential IO that we immediately recognized as log traffic. Inspection of the application’s configuration confirmed this.
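For those who want to try this type of analysis themselves: vscsiStats runs from the ESX console and collects per-virtual-disk I/O histograms. A rough sketch of the workflow, from memory, so double check the exact flags on your own host (the world group ID is just a placeholder here):

vscsiStats -l (list the running VMs with their world group IDs and virtual disks)
vscsiStats -s -w <worldGroupID> (start collecting statistics for that VM)
vscsiStats -p seekDistance -w <worldGroupID> (print the seek distance histogram; small seek distances indicate sequential I/O, as with the log VMDK mentioned above)
vscsiStats -x (stop collecting)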

vSphere CPU Scheduler whitepaper, this is it!!

Duncan Epping · Aug 13, 2009 ·

This is the whitepaper I’ve been waiting for. By now we all know that the CPU scheduler has changed. The only problem was that there wasn’t any official documentation on what changed and where we would benefit. Well, this has changed. VMware just published a new whitepaper titled “The CPU Scheduler in VMware® ESX™ 4”.

The CPU scheduler in VMware ESX 4 is crucial to providing good performance in a consolidated environment. Since most modern processors are equipped with multiple cores per processor, systems with tens of cores running hundreds of virtual machines are common. In such a large system, allocating CPU resource efficiently and fairly is critical. In ESX 4, there are significant changes to the ESX CPU scheduler that improve performance and scalability. This paper describes these changes and their impact. This paper also provides details of the CPU scheduling algorithms in the ESX server.

I could elaborate all I want, but I need you guys to read the whitepaper to understand why vSphere performs a lot better than VI 3.5. (I will give you a hint: “cell”.)

Another whitepaper that’s definitely worth reading is “Virtual Machine Monitor Execution Modes in VMware vSphere 4.0“.

The monitor is a thin layer that provides virtual x86 hardware to the overlying operating system. This paper contains VMware vSphere 4.0 default monitor modes chosen for many popular guests running modern x86 CPUs. While most workloads perform well under these default settings, a user may derive performance benefits by overriding the defaults. The paper examines situations where manual monitor mode configuration may be practical and provides two ways of changing the default monitor mode of the virtual machine in vSphere.
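For those who don’t want to wait for the paper: the two ways referred to are the vSphere Client (Edit Settings -> Options -> CPU/MMU Virtualization) and the virtual machine’s .vmx file. As a rough sketch, and assuming I remember the option names correctly, the .vmx route looks like this (possible values are automatic, software and hardware):

monitor.virtual_exec = "hardware"
monitor.virtual_mmu = "hardware"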

And while you are already taking the time to educate yourself, you might also want to read the “FT Architecture and Performance” whitepaper. Definitely worth reading!

Change the default pathing policy to round robin

Duncan Epping · Jul 10, 2009 ·

I just received an email from one of my readers, Mike Laskowski, who wanted to share the following with us:

I have over 100 LUNs in my environment. Round Robin is officially supported on ESX 4. In the past we had a script that would manually load balance the LUNs across FAs. ESX 4 has a different way to balance the LUNs: Round Robin. What you can do is build the ESX server and then run the following from the CLI:

esxcli nmp satp setdefaultpsp --psp VMW_PSP_RR --satp VMW_SATP_SYMM

Note: You should do this before presenting LUNs and adding datastores. If you already have LUNs presented and datastores added, you can run the command and then reboot the ESX server for it to take effect. This will make Round Robin the default on all LUNs. It would take forever if you had to manually change each LUN.

THX Mike Laskowski

Please note that this example is specifically for the “SYMM” SATP. SATP stands for Storage Array Type Plugin, and SYMM stands for EMC DMX Symmetrix. If you are using a different array, find out which SATP you are using and change the command accordingly.
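If you are not sure which SATP your array is claimed by, here is a quick sketch of how to check from the CLI (the output will obviously differ per environment):

esxcli nmp device list (shows per device which SATP claimed it and which PSP is currently used)
esxcli nmp satp list (shows all SATPs and their current default PSP)
esxcli nmp satp setdefaultpsp --psp VMW_PSP_RR --satp <your SATP> (same command as above, with your own SATP)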

vSphere performance

Duncan Epping · May 19, 2009 ·

The last couple of weeks I’ve seen all these performance numbers for vSphere (most not publicly available though), each one even more impressive than the other. I think everyone will agree that the latest one is really impressive: 364,000 IOPS is just insane. There’s no load vSphere can’t handle, when correctly sized of course.

But something that made an even bigger impression on me, as a consolidation fanatic, is the following line from the latest performance study:

VMware’s new paravirtualized SCSI adapter (pvSCSI) offered 12% improvement in throughput at 18% less CPU cost compared to LSI virtual adapter

Now this may not sound like much, but when you are running 50 hosts it will make a difference. It will save you on cooling / rack space / power / hardware / maintenance; in other words, this will have its effect on your ROI and TCO. This is the kind of info I would love to see more of: where did we cut down on “overhead”? Which improvements will make our consolidation numbers go up?!

CPU Affinity…

Duncan Epping · Apr 28, 2009 ·

I was just reading a discussion on the VMTN community on CPU affinity. The general opinion of the experts is “Don’t use CPU affinity”. I fully agree with them; ESX is more than capable of handling the scheduling on its own with just a limited overhead. And as Ken Cline also stresses, it could harm performance, for instance because of NUMA load balancing.

Something that’s often overlooked when using CPU affinity is that people tend to give the VM’s vCPUs a 1:1 relationship with host cores. In other words, a VM with two vCPUs will be pinned down to two cores on the host.

This does make sense, doesn’t it? No, it actually doesn’t. There’s more to a VM than just its vCPUs. I would like to refer to page 132 of the Resource Management Guide, aka the HA-DRS Bible. In short, besides the vCPUs there are several threads associated with the VM that need to be scheduled as well. When affinity is set, these threads, or worlds as VMware calls them, will be scheduled on the assigned cores. You can imagine that when you use the vCenter client to manage the VM, these threads (video / keyboard / mouse / CD-ROM, etc.) will need to be scheduled on the same set of cores as the vCPUs… If you have a two vCPU VM and want to use CPU affinity, pin it down to at least three cores! Before you start assigning cores to your VM, also read the bullet points on page 133 on why you shouldn’t.

The CPU affinity setting for a virtual machine applies not only to all of the virtual CPUs associated with the virtual machine, but also to all other threads (also known as “worlds”) associated with the virtual machine. Such virtual machine threads perform processing required for emulating mouse, keyboard, screen, CD-ROM and miscellaneous legacy devices.

In some cases, such as display-intensive workloads, significant communication might occur between the virtual CPUs and these other virtual machine threads. Performance might degrade if the virtual machine’s affinity setting prevents these additional threads from being scheduled concurrently with the virtual machine’s virtual CPUs (for example, a uniprocessor virtual machine with affinity to a single CPU, or a two-way SMP virtual machine with affinity to only two CPUs).

For the best performance, when manual affinity settings are used, VMware recommends that you include at least one additional physical CPU in the affinity setting in order to allow at least one of the virtual machine’s threads to be scheduled at the same time as its virtual CPUs (for example, a uniprocessor virtual machine with affinity to at least two CPUs or a two-way SMP virtual machine with affinity to at least three CPUs).
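For those wondering where this is actually configured: the affinity mask can be set in the vSphere Client (Edit Settings -> Resources -> Advanced CPU) or directly in the .vmx file. A rough sketch of the .vmx route, assuming sched.cpu.affinity is indeed the advanced setting used for this; the example pins a two vCPU VM to three cores as recommended above:

sched.cpu.affinity = "0,1,2"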

