Storage vMotion performance difference?

Last week I wrote about the different datamovers being used when a Storage vMotion is initiated and the destination VMFS volume has a different blocksize than the source VMFS volume. Not only does it make a difference in terms of reclaiming zero space, but as mentioned it also makes a difference in performance. The question that always arises is: how much difference does it make? Well, this week there was a question on the VMTN community regarding a Storage vMotion from FC to FATA and its slow performance. Of course, within a second FATA was blamed, but that wasn't actually the cause of the problem. The FATA disks were formatted with a different blocksize, and that caused the legacy datamover to be used. I asked Paul, who started the thread, if he could check what the difference would be when equal blocksizes were used. Today Paul ran his tests and blogged about them here, but I copied the table with the details so you can see what performance improvement the fs3dm datamover (please note, VAAI is not used… this is purely a different datamover) brought:

From                          | To                            | Duration in minutes
FC datastore, 1MB blocksize   | FATA datastore, 4MB blocksize | 08:01
FATA datastore, 4MB blocksize | FC datastore, 1MB blocksize   | 12:49
FC datastore, 4MB blocksize   | FATA datastore, 4MB blocksize | 02:36
FATA datastore, 4MB blocksize | FC datastore, 4MB blocksize   | 02:24

As I explained in my article about the datamover, the difference is caused by the fact that the data doesn’t travel all the way up the stack… and yes the difference is huge!
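For those who prefer to see the logic spelled out, here is a minimal Python sketch of that datamover selection. It is purely an illustration of the behavior described above; the names are mine and this is of course not the actual VMkernel code:

```python
# Simplified model of how ESX(i) picks a datamover for a Storage vMotion.
# Illustration only: function and constant names are made up for clarity.

LEGACY_DATAMOVER = "fsdm"   # data travels all the way up the stack
FAST_DATAMOVER = "fs3dm"    # data stays low in the stack (software variant)

def pick_datamover(src_blocksize_mb: int, dst_blocksize_mb: int) -> str:
    """Return the datamover used for a copy between two VMFS volumes."""
    if src_blocksize_mb == dst_blocksize_mb:
        # Equal blocksizes: the optimized fs3dm datamover can be used,
        # which explains the roughly 3x faster copies in the table above.
        return FAST_DATAMOVER
    # Mismatched blocksizes force the legacy fsdm datamover.
    return LEGACY_DATAMOVER

if __name__ == "__main__":
    print(pick_datamover(1, 4))  # -> fsdm  (FC 1MB to FATA 4MB, the slow case)
    print(pick_datamover(4, 4))  # -> fs3dm (equal blocksizes, the fast case)
```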

Storage Performance

This is just a post to make it easier to find these excellent articles/threads on VMTN about measuring storage performance:

All these have one "requirement" and that is that Iometer is used.

Another thing that I wanted to point out is this excellent set of scripts that Clinton Kitson created, which collect and process vscsistats data. That by itself is cool, but what is planned for the next update is even cooler: live plotted 3D graphs. Can't wait for that one to be released!
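If you want to get a feel for what processing vscsistats data looks like, here is a minimal Python sketch that turns a histogram export into a quick plot. Note that the file name and the two-column CSV layout are my own assumptions for the example; Clinton's scripts are far more complete than this.

```python
# Minimal sketch: turn a vscsiStats histogram export into a quick plot.
# Assumes a two-column CSV ("bucket_limit,count" per line) that you exported
# yourself; both the file name and the format are assumptions.
import csv
import matplotlib.pyplot as plt

def load_histogram(path):
    """Read (bucket label, I/O count) pairs from a two-column CSV file."""
    buckets, counts = [], []
    with open(path, newline="") as f:
        for bucket_limit, count in csv.reader(f):
            buckets.append(bucket_limit)
            counts.append(int(count))
    return buckets, counts

if __name__ == "__main__":
    buckets, counts = load_histogram("latency_histo.csv")
    plt.bar(range(len(counts)), counts, tick_label=buckets)
    plt.xlabel("latency bucket")
    plt.ylabel("number of I/Os")
    plt.title("vscsiStats latency histogram")
    plt.show()
```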

Enable Storage IO Control on all Datastores!

This week I received an email from one of my readers about some weird Storage IO Control behavior in their environment. On a regular basis he would receive an error stating that an "external I/O workload has been detected on shared datastore running Storage I/O Control (SIOC) for congestion management". He did a quick scan of his complete environment and couldn't find any hosts connecting to those volumes. After exchanging a couple of emails about the environment I managed to figure out what triggered this alert.

The cause is probably one of the most commonly made mistakes… sharing spindles. Some storage platforms carve out a volume from a specific set of spindles, meaning those spindles are solely dedicated to that particular volume. Other storage platforms, however, group spindles and layer volumes across them. Simply said, they share spindles to increase performance. NetApp's "aggregates" and HP's "disk groups" are good examples.

This can, and probably will, cause the alarm to be triggered, as essentially an unknown workload is impacting your datastore performance. If you are designing your environment from the ground up, make sure that SIOC is enabled on all datastores that are backed by the same set of spindles as your VMFS volumes.

However, in an existing environment this will be more difficult. Don't worry though: SIOC will not be overly conservative and unnecessarily throttle your virtual workload. If and when SIOC detects an external workload, it will stop throttling the virtual workload, to avoid giving the external workload more bandwidth while negatively impacting the virtual workload. From a throttling perspective that will look as follows:

32 29 28 27 25 24 22 20 (detect nonVI -> Max Qdepth)
32 31 29 28 26 25 (detect nonVI -> Max Qdepth)
32 30 29 27 25 24 (detect nonVI -> Max Qdepth)
…..

Please note that the above example depicts a scenario where SIOC notices that the latency threshold is still exceeded, in which case the cycle starts again; SIOC checks the latency values every 4 seconds. The question of course remains how SIOC knows that there is an external workload accessing the datastore. SIOC uses what we call a "self-learning algorithm": it keeps track of historically observed latency, outstanding I/Os and window sizes. Based on that info it can identify anomalies, and that is what triggers the alarm.
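To make this a bit more tangible, below is a toy Python model of that throttling cycle. It is purely an illustration (the real algorithm is obviously far more sophisticated and the thresholds, step sizes and detection logic here are made up), but it shows the pattern from the queue-depth sequences above: step the queue depth down while the latency threshold is exceeded, and snap back to the maximum queue depth as soon as an external (non-VI) workload is detected.

```python
# Toy model of the SIOC throttling cycle described above. Illustration only.
import random

MAX_QDEPTH = 32               # device queue depth when not throttled
LATENCY_THRESHOLD_MS = 30     # congestion threshold (value picked for the example)
CHECK_INTERVAL_S = 4          # SIOC evaluates latency every 4 seconds

def observed_latency_ms(qdepth):
    # Stand-in for the real datastore-wide latency measurement.
    return 25 + random.random() * 15

def external_workload_detected():
    # Stand-in for the "self-learning" anomaly detection.
    return random.random() < 0.2

def throttle_cycle(max_steps=12):
    """One cycle: step the queue depth down while latency is too high,
    snap back to MAX_QDEPTH as soon as a non-VI workload is detected."""
    qdepth = MAX_QDEPTH
    history = [qdepth]
    for _ in range(max_steps):
        if observed_latency_ms(qdepth) <= LATENCY_THRESHOLD_MS:
            break                                  # latency is fine again
        if external_workload_detected():
            history.append("detect nonVI -> Max Qdepth")
            break                                  # stop throttling
        qdepth = max(qdepth - random.randint(1, 3), 4)
        history.append(qdepth)
        # (in reality SIOC waits CHECK_INTERVAL_S seconds before re-checking)
    return history

if __name__ == "__main__":
    for _ in range(3):
        print(throttle_cycle())
```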

To summarize:

  • Enable SIOC on all datastores that are backed by the same set of spindles
  • If you are designing a greenfield implementation, try to avoid sharing spindles between non-VMware and VMware workloads

More details about when this event could be triggered can be found in this KB article.

Storage IO Control and Storage vMotion?

I received a very good question this week to which I did not have the answer; I had a feeling, but that is not enough. The question was whether Storage vMotion would be "throttled" by Storage IO Control. As I happened to have a couple of meetings scheduled this week with the actual engineers, I asked the question and this was their answer:

Storage IO Control can throttle Storage vMotion when the latency threshold is exceeded. The reason for this is that Storage vMotion is "billed" to the virtual machine.

This basically means that if you initiate a Storage vMotion, the "process" belongs to the VM, and as such, if the host is throttled, the Storage vMotion process might be throttled as well by the local scheduler (SFQ), depending on the number of shares that were originally allocated to this virtual machine. Definitely something to keep in mind when doing a Storage vMotion of a large virtual machine, as it could potentially increase the amount of time it takes for the Storage vMotion to complete. Don't get me wrong, that is not necessarily a negative thing, because at the same time it will prevent that particular Storage vMotion from consuming all available bandwidth.
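A simple way to picture what "billed to the virtual machine" means in practice: when the host is throttled, the device queue slots get divided according to the shares of the virtual machines on that datastore, and the Storage vMotion traffic has to fit within the slots of the VM that is being moved. Below is a toy Python sketch of that idea; it is my own simplification (names and numbers are made up), not the actual SFQ scheduler:

```python
# Toy illustration of proportional-share throttling on a datastore.
# Not the actual SFQ scheduler; it only shows the idea that Storage vMotion
# I/O is "billed" to the VM being moved and therefore has to fit within that
# VM's slice of the (possibly throttled) queue depth.

def divide_queue_slots(host_qdepth, shares):
    """Split the host-level queue depth proportionally to the VM shares."""
    total = sum(shares.values())
    return {vm: max(1, host_qdepth * s // total) for vm, s in shares.items()}

if __name__ == "__main__":
    # Hypothetical VMs on one datastore with their disk shares.
    shares = {"vm-small": 500, "vm-being-svmotioned": 1000, "vm-large": 2000}

    for host_qdepth in (32, 16):      # 16 = host throttled by SIOC
        slots = divide_queue_slots(host_qdepth, shares)
        budget = slots["vm-being-svmotioned"]
        print(f"host qdepth {host_qdepth}: {slots} -> "
              f"the Storage vMotion shares the {budget} slots of its VM")
```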

Introducing voiceforvirtual.com

At VMworld I met up with the guys presenting the Storage I/O Control session, Irfan Ahmad and Chethan Kumar. As many of you hopefully know, Irfan has always been active in the social media space (virtualscoop.org). Chethan, however, is "new" and just started his own blog.

Chethan is a Senior Member of the Performance Engineering team at VMware. He focuses on characterizing and troubleshooting the performance of enterprise applications (mostly databases) in virtual environments using VMware products. Chethan has also studied the performance characteristics of the VMware storage stack and was one of the people who produced this great whitepaper on Storage I/O Control. Chethan just released his first article and I am sure many excellent articles will follow. Make sure you add him to your bookmarks/RSS reader.

Running Virtual Center Database in a Virtual Machine

I just completed an interesting project. For years, we at VMware have believed that SQL Server databases run well when virtualized, and we have illustrated this through several benchmark studies published as white papers. It was time for us to look at real applications. One such application, found in most vSphere-based virtual environments, is the database component of the vCenter server (the brain behind a vSphere environment). Using the vCenter database as the application and its resource-intensive tasks (implemented as stored procedures in SQL Server-based databases) as the load generator, I compared the performance of these tasks in a virtual machine (on a vSphere 4.1 host) to their performance on a native server.