VMware

Changing the PSP from Fixed to RR

Duncan Epping · Mar 21, 2011 ·

Today I was fooling around with my new Lab environment when I noticed my Path Selection Policy (PSP) was set to fixed while the array (Clariion CX4-120) most definitely supports Round Robin (RR). I wrote about it in the past(1, 2) but as with vSphere 4.1 the commands slightly changed I figured it wouldn’t hurt to write it down again:

First I validated what the currently used Storage Array Type Plugin (SATP) was and which Path Selected Policy was used:

esxcli nmp device list

(note that compared to 4.1 the “storage” bit was added… yes a minor but important change!)

Than I wanted to make sure that every single LUN that would be added would get the standard PSP for Round Robin:

esxcli nmp satp setdefaultpsp --satp VMW_SATP_ALUA_CX --psp VMW_PSP_RR

Now I also needed to set the PSP per LUN, for which I used these two lines of “script”:

for i in `ls /vmfs/devices/disks | grep naa.600`;
do esxcli nmp device setpolicy --device $i --psp VMW_PSP_RR;done

And I figured why not just set the number of IOps down to 1 as well just to see if it changes anything:

for i in `ls /vmfs/devices/disks/ | grep naa.600`;
do esxcli nmp roundrobin setconfig --device $i --type "iops" --iops=1;done

Setting “iops=1” Didn’t make much difference for me, but it appears to be a general recommendation these days so I figured it would be best to include it.

Before I forget, I wanted to document this as well. For my testing I used the following command which lets you clone a VMDK and time it:

time vmkfstools -i source.vmdk destination.vmdk

And the result would look as follows:

Destination disk format: VMFS zeroedthick
Cloning disk 'destination.vmdk'...
Clone: 100% done.
real    2m 9.67s
user    0m 0.33s
sys     0m 0.00s

Something that might be useful as well, timing the creation of a zeroedthick VMDK:

time vmkfstools -c 30G -d eagerzeroedthick newdisk.vmdk

I am using this to measure the difference between using and not using VAAI on a storage platform. It is a lot easier than constantly kicking off tasks in through vCenter. (Yes Alan and Luc I know it is way easier with PowerCLI.)

Talking about face melting stuff

Duncan Epping · Mar 7, 2011 ·

Yes not only Chad Sakac deals with face melting ultra uber geeky cool stuff I do as well (those on twitter know what I am referring to), but unfortunately I cannot share the details on some of the stuff I am working on. I can however provide you a link which contains the papers written by the engineers on some of the stuff that might be coming up in the future. Note the “might”, there is no guarantee it will ever make it into the VMware products, but nevertheless still cool to read:

http://labs.vmware.com/publications

Is this all distant future? No it isn’t. For instance the paper that talks about Parda is actually what ended up as Storage I/O Control in 4.1. A couple I would recommend reading or at least that had my personal interest are:

Re: Large Pages (@gabvirtualworld @frankdenneman @forbesguthrie)

Duncan Epping · Jan 26, 2011 ·

I was reading an article by one of my Tech Marketing colleagues, Kyle Gleed and coincidentally Gabe published an article about the same topic to which Frank replied and just now Forbes Guthrie… the topic being Large Pages. I have written about this topic many times in the past and both Kyle, Gabe, Forbes and Frank mentioned the possible impact of large pages so I won’t go into detail.

There appears to be a lot of concerns around the benefits and the possible downside of leaving it enabled in terms of monitoring memory usage. There are a couple of things I want to discuss as I have the feeling that not everyone fully understands the concept.

First of all what are the Large/Small Pages? Small Pages are regular 4k memory pages and Large Pages are 2m pages. I guess the difference is pretty obvious. Now as Frank explained when using Large Pages there is a difference in TLB(translation lookaside buffer) entries; basically a VM provisioned with 2GB would need would need a 1000 TLB entries with Large Pages and 512.000 with Small Pages. Now you might wonder what this has got to do with your VM, well that’s easy… If you have an CPU that has EPT(Intel) or RVI(AMD) capabilities the VMkernel will try to back ALL pages with Large Pages.

Please read that last sentence again and spot what I tried to emphasize. All pages. So in other words where Gabe was talking about “does your Application really benefit from” I would like to state that that is irrelevant. We are not merely talking about just your application, but about your VM as a whole. By backing all pages by Large Pages the chances of TLB misses are decreased, and for those who never looked into what the TLB does I would suggest reading this excellent wikipedia page. Let me give you the conclusion though, TLB misses will increase latency from a memory perspective.

That’s not just it, the other thing I wanted to share is the “impact” of breaking up the large pages into small pages when there is memory pressure. As Frank so elegantly stated “the VMkernel will resort to share-before-swap and compress-before-swap”. There is no nicer way of expressing uber-sweetness I guess. Now one thing that Frank did not mention though is that if the VMkernel detects memory pressure has been relieved it will start defragmenting small pages and form large pages again so that the workload can benefit again from the performance increase that these bring.

Now the question remains what kind of performance benefits can we expect as some appear to be under the impression that when the application doesn’t use large pages there is no benefit. I have personally conducted several tests with a XenApp workload and measured a 15% performance increase and on top of that less peaks and lower response times. Now this isn’t a guarantee that you will see the same behavior or results, but I can assure it is beneficial for your workload regardless of what types of pages are used. Small on Large or Large on Large, all will benefit and so will you…

I guess the conclusion is, don’t worry too much as vSphere will sort it out for you!

Enable Storage IO Control on all Datastores!

Duncan Epping · Jan 20, 2011 ·

This week I received an email from one of my readers about some weird Storage IO Control behavior in their environment. On a regular basis he would receive an error stating that an “external I/O workload has been detected on shared datastore running Storage I/O Control (SIOC) for congestion management”. He did a quick scan of his complete environment and couldn’t find any hosts connecting to those volumes. After exchanging a couple of emails about the environment I managed to figure out what triggered this alert.

Now this all sounds very logical but probably is one of the most common made mistakes… sharing spindles. Some storage platforms carve out a volume from a specific set of spindles. This means that these spindles are solely dedicated to that particular volume. Other storage platforms however group spindles and layer volumes across these. Simply said, they are sharing spindles to increase performance. NetApp’s “aggregates” and HP’s “disk groups” would be a good example.

This can and probably will cause the alarm to be triggered as essentially an unknown workload is impacting your datastore performance. If you are designing your environment from the ground-up, make sure that all spindles that are backing your VMFS volumes have SIOC enabled.

However, in an existing environment this will be difficult, don’t worry that SIOC will be overly conservative and unnecessarily throttle your virtual workload. If and when SIOC detects an external workload it will stop throttling the virtual workload to avoid giving the external more bandwidth while negatively impact the virtual workload. From a throttling perspective that will look as follows:

32 29 28 27 25 24 22 20 (detect nonVI –> Max Qdepth )
32 31 29 28 26 25 (detect nonVI –> Max Qdepth)
32 30 29 27 25 24 (detect nonVI –> Max Qdepth)
…..

Please note that the above example depicts a scenario where SIOC notices that the latency threshold is still exceeded and the cycle will start again, SIOC checks latency values every 4 seconds. The question of course remains how SIOC knows that there is an external workload accessing the datastore. SIOC uses a what we call a “self-learning algorithm”. It keeps track of historical observed latency, outstanding IOs and window sizes. Based on that info it can identify anomalies and that is what triggers the alarm.

To summarize:

Enable SIOC on all datastores that are backed by the same set of spindles
If you are designing a green field implementation try to avoid sharing spindles between non VMware and VMware workloads

More details about when this event could be triggered can be found in this KB article.

Changes

Duncan Epping · Jan 17, 2011 ·

It is that time of the year again… Roughly 1 year ago I blogged about the fact that I joined the VMware Cloud Practice, today I want to let you guys know that I have accepted a new job role within VMware as a Principal Architect working for the Technical Marketing team.

While I have enjoyed working within the PSO/TS organization, I have always been tempted to be part of an R&D organization to get as close to the source as possible, and of course to be able to influence the products, features and the direction of those. I never expected it to happen this soon though but the opportunity presented itself and as you can imagine I grabbed it with both hands. Joining Technical Marketing means that I will be able to focus more on educating people through white-papers, books, documentation and yellow-bricks.com. On top of that the promotion to Principal is a huge recognition; to be amongst the ranks of Pang Chen, Lee Dilworth, John Arrasjid, Dan Anderson and others is a true honor.

I want to thank everyone who made this possible, you know who you are! I cannot wait to get started,

Duncan