Scott Drummonds just posted a new blog article about an upcoming VMware PSO offering. When Scott Drummonds is involved, you know the topic of the offering is performance. In this case it's performance related to SQL databases and I/O bottlenecks, probably the most frequently reported issue. As Scott briefly explains, they were able to identify the issue rather quickly by monitoring both the physical servers and the virtual environment.
I guess this quote from Scott's article captures the essence:
In the customer’s first implementation of the virtual infrastructure, both SQL Servers, X and Y, were placed on RAID group A. But in the native configuration SQL Server X was placed on RAID group B. This meant that the storage bandwidth of the physical configuration was approximately 1850 IOPS. In the virtual configuration the two databases shared a single 800 IOPS RAID volume. It does not take a rocket scientist to realize that users are going to complain when a critical SQL Server instance goes from 1050 IOPS to 400. And this was not news to the VI admin on-site, either. What we found as we investigated further was that virtual disks requested by the application owners were used in unexpected and undocumented ways and frequently demanded more throughput than originally estimated. In fact, through vscsiStats analysis (Using vscsiStats for Storage Performance Analysis), my contact and I were able to identify an “unused” VMDK with moderate sequential IO that we immediately recognized as log traffic. Inspection of the application’s configuration confirmed this.
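The numbers in the quote can be checked with a few lines of Python. This is only a back-of-the-envelope sketch. The 1050 IOPS figure for RAID group B is inferred from the quoted totals (1850 minus 800), and the even split of RAID group A between the two servers is an assumption that matches the quote's "1050 IOPS to 400". The names `even_share` and `compare_layouts` are hypothetical, for illustration only.

```python
# Back-of-the-envelope IOPS comparison for the scenario quoted above.
# Assumption: the shared RAID volume's IOPS split evenly between the two
# SQL Servers; real workloads rarely divide this neatly.

RAID_A_IOPS = 800                                  # RAID group A (from the quote)
PHYSICAL_TOTAL_IOPS = 1850                         # native layout total (from the quote)
RAID_B_IOPS = PHYSICAL_TOTAL_IOPS - RAID_A_IOPS    # inferred: 1050


def even_share(volume_iops: float, servers: int) -> float:
    """IOPS each server gets if `servers` instances share one volume evenly."""
    if servers <= 0:
        raise ValueError("servers must be positive")
    return volume_iops / servers


def compare_layouts() -> dict:
    native_x = RAID_B_IOPS                  # native: X alone on RAID group B
    virtual_x = even_share(RAID_A_IOPS, 2)  # virtual: X and Y share RAID group A
    return {
        "native_total": RAID_A_IOPS + RAID_B_IOPS,
        "virtual_total": RAID_A_IOPS,
        "native_x": native_x,
        "virtual_x": virtual_x,
        # Percentage of SQL Server X's storage bandwidth lost in the move.
        "drop_pct": round(100 * (1 - virtual_x / native_x), 1),
    }


if __name__ == "__main__":
    for name, value in compare_layouts().items():
        print(f"{name}: {value}")
```

Under these assumptions SQL Server X loses roughly 62% of its storage bandwidth, and the pair as a whole drops from 1850 to 800 IOPS, which is why consolidating both databases onto one RAID group was the root of the complaints.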
daniel says
Interesting read. It seems that of all the criticism and skepticism VMware gets, virtualized SQL is usually at the top. We’re just starting to virtualize Oracle/MSSQL/PostgreSQL/MySQL, and it’s good to know that there are solutions for most problems we might run into.
John van der Sluis says
I see this every week. Think about your core infrastructure, your application requirements, and correct sizing based on those requirements, and follow best practices not only for SQL, in this example, but also for the underlying guest OS. I can't tell you how many times “performance” issues were traced back to misconfiguration in the guest, or to not following the best practices you would also apply on physical infrastructure.