Yellow Bricks

Calling all virtual appliance vendors!

Duncan Epping · Feb 1, 2011 ·

Lately I have been playing around in my Lab a lot. I tried many virtual appliances as I wanted to use a variety of workloads. I downloaded many appliances and hoped to have all of them up and running in a bare minimum amount of time. Apparently I miscalculated / underestimated the amount of work to get a virtual appliance up and running. Yes of course there are a whole bunch that will work out of the box, and all of these have one thing in common:

OVF

Yes, deploying virtual appliances is a lot easier when they are packaged as an OVF or even an OVA for that matter. Packaging a virtual appliance which is “tarred” and “gzipped” using an old version of VMware Workstation with a sparse disk format doesn’t cut it any more in the age of automation and transportability. Although that works great on the workstation you developed it on, it doesn’t make it easy for your customer to deploy it.

I am not saying this to make my life easier; I truly believe that the adoption of a standard like OVF will also increase the adoption of your product. Instead of jumping through hoops to get the appliance running, people can focus on what it really is about: your product.

It is time to start adopting OVF.

VCDX 4

Duncan Epping · Jan 31, 2011 ·

It took a while, but finally here it is: the VCDX 4 program. Now some of us who were fortunate enough to complete the VCDX 3 certification and managed to pass VCAP-DCD are already a VCDX 4, but for those of you who just started the journey, now is the time to start digging!

In general the process hasn’t really changed all that much:

Pass the VCP exam
Pass the VCAP-DCA exam
Pass the VCAP-DCD exam
Submit a vSphere 4.x design
Defend your vSphere 4.x design

There are a couple of things I do want to point out here though:

The VCDX Application Form has changed! We received a lot of feedback on the form and based on that we decided to trim it down. So if you are planning to do the VCDX4 defense, make sure you download the latest version of the form. Also note that the design decision tree was taken out, the reason being that many already included these decisions in their design. If you haven’t done that yet, make sure you do it. For every decision make sure you explain why/what etc.
Dates… no, indeed, no dates have been announced so far. However, the VCDX4 website does state the following: “Unlike previous years, currently, there are no plans to have VCDX Defenses coinciding with the VMworld events in 2011 at this time”. Some of you will ask why. Well, just think about it for a second: you have VMworld, and then during VMworld you have the top experts confined to a room with hardly any opportunity to present/attend. I guess that is the main reason; I said “guess” because I wasn’t part of the decision making process.

All there is left to say: if you plan on certifying this year, make sure you start writing today! Your design will be a lot of work, so make sure you meet the requirements and don’t forget any of the required documents mentioned in the application. For tips, do a search on VCDX on my blog…

Re: Large Pages (@gabvirtualworld @frankdenneman @forbesguthrie)

Duncan Epping · Jan 26, 2011 ·

I was reading an article by one of my Tech Marketing colleagues, Kyle Gleed, and coincidentally Gabe published an article about the same topic, to which Frank replied and just now Forbes Guthrie… the topic being Large Pages. I have written about this topic many times in the past, and Kyle, Gabe, Forbes and Frank all mentioned the possible impact of large pages, so I won’t go into detail.

There appear to be a lot of concerns around the benefits of Large Pages and the possible downside of leaving them enabled in terms of monitoring memory usage. There are a couple of things I want to discuss as I have the feeling that not everyone fully understands the concept.

First of all, what are Large and Small Pages? Small Pages are regular 4 KB memory pages and Large Pages are 2 MB pages. I guess the difference is pretty obvious. Now as Frank explained, when using Large Pages there is a difference in TLB (translation lookaside buffer) entries; basically a VM provisioned with 2GB would need roughly 1,000 TLB entries (1,024 to be exact) with Large Pages and roughly 512,000 (524,288) with Small Pages. Now you might wonder what this has got to do with your VM. Well, that’s easy… If you have a CPU that has EPT (Intel) or RVI (AMD) capabilities the VMkernel will try to back ALL pages with Large Pages.

Please read that last sentence again and spot what I tried to emphasize. All pages. So in other words, where Gabe was talking about “does your Application really benefit from” I would like to state that that is irrelevant. We are not merely talking about your application, but about your VM as a whole. By backing all pages with Large Pages the chances of TLB misses are decreased, and for those who never looked into what the TLB does I would suggest reading this excellent Wikipedia page. Let me give you the conclusion though: TLB misses will increase latency from a memory perspective.
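The entry counts mentioned earlier come down to simple division, and a quick sketch makes that easy to check for other VM sizes. The helper name `pages_needed` is just something I made up for illustration, and it only counts how many pages are needed to map the memory; it says nothing about how many entries a real TLB can hold.

```python
# Back-of-the-envelope count of the pages needed to map a VM's memory.
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def pages_needed(memory_bytes: int, page_bytes: int) -> int:
    """Number of pages (one translation each) needed, rounded up."""
    return -(-memory_bytes // page_bytes)  # ceiling division


vm_memory = 2 * GIB
small = pages_needed(vm_memory, 4 * KIB)   # Small Pages: 4 KB
large = pages_needed(vm_memory, 2 * MIB)   # Large Pages: 2 MB

print(f"small pages: {small}")         # 524288
print(f"large pages: {large}")         # 1024
print(f"ratio: {small // large}x")     # 512x
```

So for the same 2GB VM, Large Pages need 512 times fewer translations, which is why the chance of a TLB miss goes down.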

That’s not all; the other thing I wanted to share is the “impact” of breaking up the large pages into small pages when there is memory pressure. As Frank so elegantly stated, “the VMkernel will resort to share-before-swap and compress-before-swap”. There is no nicer way of expressing uber-sweetness I guess. One thing that Frank did not mention though is that if the VMkernel detects that memory pressure has been relieved, it will start defragmenting small pages and form large pages again so that the workload can benefit again from the performance increase that these bring.

Now the question remains what kind of performance benefits we can expect, as some appear to be under the impression that when the application doesn’t use large pages there is no benefit. I have personally conducted several tests with a XenApp workload and measured a 15% performance increase and, on top of that, fewer peaks and lower response times. Now this isn’t a guarantee that you will see the same behavior or results, but I can assure you it is beneficial for your workload regardless of what types of pages are used. Small on Large or Large on Large, all will benefit and so will you…

I guess the conclusion is, don’t worry too much as vSphere will sort it out for you!

Cool Tool Update: RVTools 3.0

Duncan Epping · Jan 23, 2011 ·

When I was enjoying some family time yesterday Eric Sloof stole my usual RVTools scoop. Nevertheless I believe it is worth publishing this as RVTools is one of the most valuable free non-vendor tools out there. Rob de Veij released a major version of RVTools. There are a couple of major improvements in this version, which is why it took Rob slightly longer than expected to come out with this update.

Here are the improvements in RVTools 3.0:

Pass-through authentication implemented. Allows you to use your logged on Windows credentials to automatically logon.
All numeric columns are now formatted to make them more readable.
On vInfo the columns Commited, Uncommited, Shared and on vSnapshot the column size are now formatted in MBs instead of bytes.
New tabpage created with service console and VMKernel information.
Now using vSphere Web Services SDK 4.1 which supports the new features available in vSphere 4.1
Export to csv file now uses Windows regional separator
Using NPOI to make it possible to write directly to xls files without the need for an installed Excel version on the system.
New menu function to write all information to one Excel workbook with a new worksheet for each tabpage.
New command line options. Check the documentation!

Download it now.

Enable Storage IO Control on all Datastores!

Duncan Epping · Jan 20, 2011 ·

This week I received an email from one of my readers about some weird Storage IO Control behavior in their environment. On a regular basis he would receive an error stating that an “external I/O workload has been detected on shared datastore running Storage I/O Control (SIOC) for congestion management”. He did a quick scan of his complete environment and couldn’t find any hosts connecting to those volumes. After exchanging a couple of emails about the environment I managed to figure out what triggered this alert.

Now this all sounds very logical but is probably one of the most commonly made mistakes… sharing spindles. Some storage platforms carve out a volume from a specific set of spindles. This means that these spindles are solely dedicated to that particular volume. Other storage platforms however group spindles and layer volumes across these. Simply said, they are sharing spindles to increase performance. NetApp’s “aggregates” and HP’s “disk groups” would be good examples.

This can and probably will cause the alarm to be triggered, as essentially an unknown workload is impacting your datastore performance. If you are designing your environment from the ground up, make sure that every volume carved from the spindles backing your VMFS volumes has SIOC enabled.

However, in an existing environment this will be difficult. Don’t worry that SIOC will be overly conservative and unnecessarily throttle your virtual workload: if and when SIOC detects an external workload, it will stop throttling the virtual workload, to avoid giving the external workload more bandwidth while negatively impacting the virtual workload. From a throttling perspective that will look as follows:

32 29 28 27 25 24 22 20 (detect nonVI –> Max Qdepth )
32 31 29 28 26 25 (detect nonVI –> Max Qdepth)
32 30 29 27 25 24 (detect nonVI –> Max Qdepth)
…..

Please note that the above example depicts a scenario where SIOC notices that the latency threshold is still exceeded, so the cycle starts again; SIOC checks latency values every 4 seconds. The question of course remains how SIOC knows that there is an external workload accessing the datastore. SIOC uses what we call a “self-learning algorithm”. It keeps track of historically observed latency, outstanding IOs and window sizes. Based on that info it can identify anomalies, and that is what triggers the alarm.
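To make the shape of those sequences a bit more tangible, here is a toy model in Python. Please treat it purely as an illustration of the behavior described above and not as SIOC’s actual algorithm: the fixed step size, the `MIN_QDEPTH` floor and the function name `throttle_cycle` are all assumptions on my part, and each sample simply stands in for one of the periodic (every 4 seconds) latency checks.

```python
from typing import Iterable, List, Tuple

MAX_QDEPTH = 32  # starting/maximum queue depth, as in the sequences above
MIN_QDEPTH = 4   # hypothetical floor, not taken from the article


def throttle_cycle(samples: Iterable[Tuple[bool, bool]], step: int = 2) -> List[int]:
    """Return the queue depth at the start and after each latency check.

    Each sample is (latency_exceeded, external_workload_detected).
    """
    depth = MAX_QDEPTH
    history = [depth]
    for latency_exceeded, external_detected in samples:
        if external_detected:
            # Stop throttling: jump back to the maximum queue depth.
            depth = MAX_QDEPTH
        elif latency_exceeded:
            # Threshold still exceeded: throttle the virtual workload further.
            depth = max(MIN_QDEPTH, depth - step)
        # Otherwise the depth is left unchanged in this simplified model.
        history.append(depth)
    return history


# Four throttling steps, an external workload is detected, then two more steps.
samples = [(True, False)] * 4 + [(True, True)] + [(True, False)] * 2
print(throttle_cycle(samples))  # [32, 30, 28, 26, 24, 32, 30, 28]
```

The real step sizes are clearly not constant (look at the first sequence above), so the only thing this sketch captures is the pattern: throttle down while latency is too high, and snap back to the maximum queue depth as soon as a non-VI workload is detected.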

To summarize:

Enable SIOC on all datastores that are backed by the same set of spindles
If you are designing a green field implementation try to avoid sharing spindles between non VMware and VMware workloads
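If you keep track of which datastore lives on which aggregate or disk group, the first recommendation is easy to check. The sketch below works on a hand-typed, made-up inventory; the `Datastore` tuple, its field names and the function `groups_with_mixed_sioc` are hypothetical, and it does not talk to vCenter or to your array.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Datastore(NamedTuple):
    name: str
    spindle_group: str  # e.g. an aggregate or disk group
    sioc_enabled: bool


def groups_with_mixed_sioc(datastores: List[Datastore]) -> Dict[str, List[str]]:
    """For every spindle group that has SIOC enabled on at least one datastore,
    list the datastores in that group that still have it disabled."""
    by_group: Dict[str, List[Datastore]] = defaultdict(list)
    for ds in datastores:
        by_group[ds.spindle_group].append(ds)

    findings: Dict[str, List[str]] = {}
    for group, members in by_group.items():
        disabled = [ds.name for ds in members if not ds.sioc_enabled]
        if disabled and any(ds.sioc_enabled for ds in members):
            findings[group] = disabled
    return findings


# Made-up example inventory.
inventory = [
    Datastore("ds01", "aggr1", True),
    Datastore("ds02", "aggr1", False),  # shares spindles with ds01
    Datastore("ds03", "aggr2", True),
    Datastore("ds04", "aggr3", False),
]
print(groups_with_mixed_sioc(inventory))  # {'aggr1': ['ds02']}
```

Note that this only catches VMware datastores without SIOC; a non-VMware workload on the same spindles would not show up in such a list at all, which is exactly why the second recommendation matters.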

More details about when this event could be triggered can be found in this KB article.