I received a preliminary copy of this report a couple of weeks ago, but since then nothing has changed. NetApp took the time to compare FC against FCoE, iSCSI, and NFS. Like most of us, probably, I still had the VI3 mindset and expected that FC would come out on top. The fact of the matter is that everything is so close that the differences are negligible, and TR-3916 shows that regardless of the type of data access protocol used, you can get the same mileage. I am glad NetApp took the time to test these scenarios and used a variety of test cases. It is no longer about which protocol works best or which drives the most performance… no, it is about what is easiest for you to manage! Are you an NFS shop? No need to switch to FC anymore. Do you like the simplicity of iSCSI? Go for it…
Thanks, NetApp, for this valuable report. Although this report of course talks about NetApp, it is useful material to read for all of you!
A link to the report, please?
The link is in the first sentence?
I would really like to see a similar test on an EMC platform 🙂
So would I!
I’m sure some people will view this as an awful result (e.g. claiming that the 10Gb FCoE result wasn’t properly tuned or that the 8Gb FC implementation on NetApp has now been shown to be slow), but to me this is a hugely positive result.
Technology is meant to make the difficult things easy, and the impossible possible, and letting people use the protocol that fits their environment and knowledge best is a fundamental part of that.
Almost every time a vendor says “You’ll need to buy and learn all new stuff”, whether it’s FCoE for an iSCSI user or NFS for an FC user, they’re making the easy things difficult, just so they can make the sale that suits them.
Curious whether this also takes the respective workload types into consideration, i.e. SQL or Exchange, Oracle or SAP, etc.
I agree with your finding, Duncan, that it’s nice we can choose the protocol we WANT and not have to worry about the performance of one vs. the other; but for the underlying applications, is that story still true, or are some protocols going to be more performant than others?
Hi Christopher,
I think that application types are really not relevant in these tests. The tests measure throughput and latency; in other words: how much data goes through and how fast that data is carried across.
This should cover any application type. Some applications are more sensitive to latency than others, but this test states that there is very little difference, which should mean any application would perform about the same on any of the protocols.
Apart from the question of how valid the test is (see my post below), throughput and latency are the two driving factors for any application reaching out to storage.
Looking through the actual report, it occurred to me that the heaviest load they impose on any of the protocols was 128 VMs having 256 TOTAL outstanding 4K I/Os. That would mean that the entire storage array would be handling only 256 4K I/Os per second, or 256 × 4K = 1 MByte per second. Can anyone confirm I am reading this right?
1 MB/s fills a 1 Gbit link to only about 1% of its bandwidth capacity; the faster links are utilized even less.
No wonder everything comes out so close… If this is really what this test was about, it is completely useless in my opinion. I would like to see 25, 50, or even 75% bandwidth usage… Then see what happens!
Hmm, you’re right about the bandwidth. I had read it as 256 outstanding I/Os per VM, but on a second reading you’re correct, it does say total outstanding 4K I/Os, so that’s just 1 MB/s of bandwidth.
This does seem to be too low to really provide a significant amount of information, though the results are still interesting.
Just to nitpick.
1 MByte/s of traffic is 0.8% of a 1 Gbit connection (you mixed bits and bytes).
I still agree that this is far too low to be of much use if correct.
Hi Chris,
Actually I did not confuse bits and bytes. I just used a factor of 10 for the bytes-to-bits conversion to leave some room for overhead on the data transported: 1 MByte/s is about 10 Mbit/s, which is 1% of 1000 Mbit/s = 1 Gbit/s.
256 outstanding I/Os does not mean 256 I/Os per second!!
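For anyone following along, here is a quick back-of-the-envelope sketch of the conversion being debated; the factor of 10 is Erik’s rounding to allow for protocol overhead, not an exact figure:

```python
# Rough bandwidth sanity check for the figures discussed above.
# Assumption: Erik simply rounds 8 bits/byte up to a factor of 10
# to leave headroom for protocol overhead.

total_outstanding_ios = 256      # per the report, at the 128-VM load point
io_size_kb = 4                   # 4K requests

# Erik's reading: treat the 256 outstanding I/Os as 256 I/Os per second.
throughput_mb_s = total_outstanding_ios * io_size_kb / 1024   # ~1 MB/s

raw_mbit_s = throughput_mb_s * 8       # exact bytes-to-bits conversion
padded_mbit_s = throughput_mb_s * 10   # Erik's factor of 10 with overhead
link_mbit_s = 1000                     # 1 Gbit/s link

print(f"{raw_mbit_s:.1f} Mbit/s raw    -> {raw_mbit_s / link_mbit_s:.1%} of a 1 Gbit link")
print(f"{padded_mbit_s:.1f} Mbit/s padded -> {padded_mbit_s / link_mbit_s:.1%} of a 1 Gbit link")
# prints roughly 0.8% and 1.0% respectively
```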
@Erik – There are two models with which one deploys storage on vSphere, which I will refer to as shared and isolated datastores. In this TR we review both: shared in sections 3 & 4, and isolated in section 5.
Shared datastores are large pools typically containing multiple VMs, where each VM commonly has low to moderate I/O requests. Shared datastores commonly contain 5-15 VMs with SAN protocols (FC, FCoE, iSCSI) and 60-200 VMs with NAS (NFS). While each VM’s I/O load is not large, the aggregated I/O load is rather large.
Isolated datastores are smaller pools containing a single VM with high I/O requirements, such as an OLTP database.
From the details of your question, the IOMeter settings, including outstanding I/O and block size, apply to the shared datastore tests of section 3.
IOMeter sends I/O requests asynchronously, resulting in an aggregate I/O load on the datastore that can be measured in the hundreds of MB/s and tens of thousands of IOPS. I wish I could share with you the actual results, but VMware engineering specifically requests that we only publish relative numbers.
Trust me here; the workload on the shared datastore is massive.
Cheers,
Vaughn
BTW – There’s additional conversation on this topic here: http://nt-ap.com/jzOO9G
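To see why an outstanding-I/O count and an IOPS rate are different quantities, here is a minimal sketch using Little’s Law; the latency values are illustrative assumptions, not figures from TR-3916:

```python
# Little's Law for a storage queue: IOPS = outstanding I/Os / average latency.
# The latencies below are purely illustrative assumptions.

def iops_from_queue_depth(outstanding_ios: int, latency_ms: float) -> float:
    """Steady-state I/Os per second for a given queue depth and service latency."""
    return outstanding_ios / (latency_ms / 1000.0)

outstanding = 256  # total outstanding 4K I/Os at the 128-VM load point

for latency_ms in (1.0, 2.0, 5.0, 20.0):
    iops = iops_from_queue_depth(outstanding, latency_ms)
    mb_s = iops * 4 / 1024  # 4K requests
    print(f"{latency_ms:>5.1f} ms latency -> {iops:>9,.0f} IOPS (~{mb_s:,.0f} MB/s)")
```

At 2 ms of latency, 256 I/Os kept in flight already drive roughly 128,000 IOPS (~500 MB/s), which is why keeping 256 I/Os outstanding is not the same thing as issuing 256 I/Os per second.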
This phrase from the test, “The tests were executed with different numbers of VMs ranging from 32, 64, 96, and 128 executing 64, 128, 192, and 256 total outstanding I/Os, respectively”, seemed to me a bit wrong after I had done some calculations.
128 VMs × 256 I/Os × 4 KB = 131,072 KB/s, or about 128 MB/s. This result makes sense when you see that 1Gb (both iSCSI and NFS) shows the largest negative difference compared to all other protocols.
I can’t imagine NetApp conducting such tests using such low bandwidth and so few I/Os; that would make this test useless from a real-life perspective.
I should have read all document before making any conclusions. 🙂
It happens to the best of us!
Cheers!
Vaughn Stewart
NetApp
We have to remember that these results are only applicable to NetApp storage, because those storage systems are filer-oriented and specially optimized for the NFS and CIFS protocols.
@JBRISON – With regard to your personal view that the data in this report is slanted towards NAS protocols, I respectfully disagree.
I would suggest you familiarize yourself with some additional performance testing completed by NetApp and VMware engineering, which did compare NetApp NAS to SAN from vendor XXX*.
http://nt-ap.com/laK5BN
* As vendor XXX refused to authorize these results in this report, we cannot legally disclose the name of this vendor. However, all of the data in this report was reviewed by VMware performance engineering.
Cheers,
Vaughn
I haven’t read the full report yet, just the synopsis, but I’m puzzled how 1 GigE connectivity can be equally as fast as 10 GigE. Either they were testing something else, or their benchmark didn’t fully stress each connection type… or the storage array they were using couldn’t handle more work regardless of the protocol… or is there some bug in VMware that capped performance?
If my understanding is correct, only the connection between the hosts and the Cisco switch varied. The connection between the storage and the switch was always a 10Gb Converged Network Adapter.
So even if each host had 1Gb, 8 hosts can easily saturate 192 disks with an 8K or 4K workload.
Also, the 256 outstanding I/O limit was set per host, so that makes 2,048 in total.
@Mxx – does this help you with your question?
http://www.yellow-bricks.com/2011/05/16/surprising-results-fcnfsiscsifcoe/comment-page-1/#comment-24636
Vaughn
I don’t think the tests were running into bandwidth bottlenecks; that’s why you see similar performance between 1 Gbps and 10 Gbps. The bottleneck is how many IOPS the disk subsystem can deliver for the VMs.
The tests were based on random I/O performance. With 4 RAID-DP groups of 23 disks (21+2), the total IOPS for 8K random requests should be in the range of 4 × 21 × 170 IOPS ≈ 14K IOPS, which corresponds to about 110 MBps, or roughly 1 Gbps of bandwidth, for 8K requests.
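A quick sketch of that estimate; the ~170 random IOPS per spindle is a rule-of-thumb assumption, not a figure taken from the report:

```python
# Back-of-the-envelope random-I/O ceiling for the disk layout described above.
# Assumption: ~170 random IOPS per spindle; parity drives excluded from data IOPS.

raid_groups = 4            # RAID-DP groups
data_disks_per_group = 21  # 21 data + 2 parity per 23-disk group
iops_per_disk = 170        # rule-of-thumb for a single spindle (assumption)
io_size_kb = 8             # 8K random requests

total_iops = raid_groups * data_disks_per_group * iops_per_disk
throughput_mb_s = total_iops * io_size_kb / 1024
throughput_gbit_s = throughput_mb_s * 8 / 1000

print(f"~{total_iops:,} IOPS -> ~{throughput_mb_s:.0f} MB/s (~{throughput_gbit_s:.1f} Gbit/s)")
# ~14,280 IOPS -> ~112 MB/s (~0.9 Gbit/s), i.e. roughly one 1 Gbit link's worth
```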
But with all the various tests they’ve run, there should have been at least some that are more throughput-intensive than others, to show the advantage of 4/8/10 GigE connectivity over 1 Gig NFS… really strange.
It is nice to see it confirmed that iSCSI can deliver performance, etc. I have a StoneFly (iSCSI.com) unit and am very happy with it.
A couple of points.
I think it’s important to note that this is a comparison of the protocols WITHIN a NetApp filer, not a comparison of the protocols in general. Another vendor’s block storage could outperform the NetApp block storage. And just as easily, another vendor’s block storage could underperform when compared to these results. Most importantly, another vendor’s block storage could outperform its own NFS (and vice versa). So it’s all relative to the array, and all arrays have different strengths.
I don’t agree with your conclusion that it’s no longer about which protocol performs best. The 4Gb Fibre Channel substantially beats the I/O performance of 1Gb iSCSI and 1Gb NFS (100 to 93 and 94 on IOPS, respectively). Sure, the 10Gb and 8Gb results are all within a couple of relative points, but they’re also within a couple of points of the 4Gb fibre. Which, due to a lack of actual results to prove otherwise, leads me to only one logical conclusion: the testing did not properly saturate the higher-speed links (otherwise we’d at least be seeing the 8Gb FC outperforming the 4Gb FC. Unless it’s NetApp’s stance that if you want to do FC on one of their arrays, you shouldn’t waste your money on the 8Gb ports since they perform the same as the 4Gb?).
By “no longer about the protocol” I am NOT referring to the transmit rate. Of course there will be a difference between 1GbE Ethernet and 8Gb FC.
Duncan,
The only difference is in the bandwidth per link, not the rate at which data is transferred.
Most want to see a performance report where the link is saturated. What most misunderstand is that most data center links are nowhere near saturation, and that with real-world workloads and block sizes, it’s not possible to fill a 4/8/10 Gb link.
Trust me, storage vendors have real-world data on what vSphere deployments generate. We see the mix, the block sizes, and the IOPS.
There’s more to storage than theoretical maximums. 🙂
Cheers,
Vaughn Stewart
NetApp
Hi Vaughn and others,
Even the relative numbers would be interesting to see. In fact, these numbers would be VERY interesting to see. When you load any of the protocols at under a MByte/sec, all will perform on par with each other (like driving on an empty highway).
I would really like to see the relative numbers at saturation of the protocols used; like 50% or 25% of the bandwidth used. THAT is where it makes (or used to make?!?!) a difference: in the old days, FC could be loaded up to 80% of its bandwidth and still scale linearly; this is where the IP protocols would have flattened out in their performance.
That is why the relative saturation numbers count and make THE difference. Plus, I am sure a lot of people would love to see that you now actually CAN load 1 Gbit/s links to higher (relative) numbers without introducing a lot of latency.
Read the report; the relative numbers are there for both shared datastores and isolated data sets.
http://blogs.netapp.com/virtualstorageguy/2011/05/new-vsphere-41-report-measuring-san-nas-performance.html
Sorry, the links to the reports don’t work.
I noticed the URL to my blog is broken – while dated, here’s the new link for anyone interested:
http://virtualstorageguy.com/2011/05/15/new-vsphere-4-1-report-measuring-san-nas-performance/