
Yellow Bricks

by Duncan Epping


Storage

Quality of components in Hybrid / All flash storage

Duncan Epping · Jun 24, 2014 ·

Today I was answering some questions on the VMTN forums and one of the questions was around the quality of components in some of the all flash / hybrid arrays. This person kept coming back to the type of flash used (eMLC vs MLC, SATA vs NL-SAS vs SAS). One of the comments he made was the following:

I talked to Pure Storage but they want $$$ for 11TB of consumer grade MLC.

I am guessing he did a quick search on the internet, found a price for some SSDs, multiplied it, and figured that Pure Storage was asking way too much… And even compared to some more traditional arrays filled with SSDs they could sound more expensive. I guess this also applies to other solutions, so I am not calling out Pure Storage here. One thing some people seem to forget is that these new storage architectures are built with flash in mind.

What does that mean? Well, everyone has heard the horror stories about consumer grade flash wearing out extremely fast and blowing up in your face. Fortunately that is only true to a certain extent, as some consumer grade SSDs easily reach 1PB of writes these days. On top of that there are a couple of things I think you should know and consider before making statements like these, or before being influenced by a sales team that says “well, we offer SLC versus MLC so we are better than them”.

For instance (as Pure Storage lists on their website), there are many more MLC drives shipped than any other type at this point. That means MLC has been tested inside out by consumers, and who can break devices in more ways than you or your QA team ever could? Right, the consumer! More importantly, if you ask me, ALL of these new storage architectures have in-depth knowledge of the type of flash they are using. That is how their system was architected! They know how to leverage flash, they know how to write to flash, they know how to avoid fast wear out. They developed an architecture which was not only designed but also highly optimized for flash… This is what you pay for. You pay for the “total package”, which means the whole solution, not just the flash devices that are leveraged. The flash devices are a part of the solution, and a relatively small part if you ask me. You pay for total capacity with low latency and functionality like deduplication, compression and replication (in some cases). You pay for the ease of deployment and management (operational efficiency), meaning you get to spend your time on the stuff that matters to your customers… their applications.

You can summarize all of it in a single sentence: the physical components used in all of these solutions are just a small part of the solution; whenever someone tries to sell you the “hardware”, that is when you need to be worried!

vSphere 5.5 U1 patch released for NFS APD problem!

Duncan Epping · Jun 11, 2014 ·

On April 19th I wrote about an issue with vSphere 5.5 U1 and NFS based datastores going into APD. People internally at VMware have worked very hard to root cause the issue and fix it. The log entries witnessed are:

YYYY-04-01T14:35:08.075Z: [APDCorrelator] 9414268686us: [esx.problem.storage.apd.start] Device or filesystem with identifier [12345678-abcdefg0] has entered the All Paths Down state.
YYYY-04-01T14:36:55.274Z: No correlator for vob.vmfs.nfs.server.disconnect
YYYY-04-01T14:36:55.274Z: [vmfsCorrelator] 9521467867us: [esx.problem.vmfs.nfs.server.disconnect] 192.168.1.1/NFS-DS1 12345678-abcdefg0-0000-000000000000 NFS-DS1
YYYY-04-01T14:37:28.081Z: [APDCorrelator] 9553899639us: [vob.storage.apd.timeout] Device or filesystem with identifier [12345678-abcdefg0] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed. 

More details on the fix can be found here: http://kb.vmware.com/kb/2077360

Why Queue Depth matters!

Duncan Epping · Jun 9, 2014 ·

A while ago I wrote an article about the queue depth of certain disk controllers, tried to harvest some of the values, and posted those. William Lam did a “one up” this week and posted a script that gathers the info, which can then be posted to a Google Docs spreadsheet; brilliant if you ask me. (PLEASE run the script and let’s fill up the spreadsheet!!) But some of you may still wonder why this matters… (For those who didn’t read it: some of the troubles one customer had with a low-end, shallow queue depth disk controller, and Chuck’s take on it here.) Considering the different layers of queuing involved, it probably makes most sense to show the picture from the virtual machine down to the device.

[Figure: the layers of queuing, from the virtual machine down to the device]

In this picture there are at least 6 different layers at which some form of queuing is done. Within the guest there is the vSCSI adapter, which has a queue. The next layer is the VMkernel/VSAN, which of course has its own queue and manages the IO that is pushed through the MPP (multi-pathing) layer to the various devices on a host. At the next level the disk controller has a queue, and potentially (depending on the controller used) each disk controller port has a queue. Last but not least, each device (i.e. a disk) will have a queue. Note that this is even a simplified diagram.

If you look closely at the picture you see that IO of many virtual machines will all flow through the same disk controller and that this IO will go to or come from one or multiple devices. (Typically multiple devices.) Realistically, what are my potential choking points?

  1. Disk Controller queue
  2. Port queue
  3. Device queue

Let’s assume you have 4 disks; these are SATA disks and each has a queue depth of 32. Combined this means that you can handle 128 IOs in parallel. Now what if your disk controller can only handle 64? This will result in 64 IOs being held back by the VMkernel / VSAN. As you can see, it would be beneficial in this scenario to ensure that your disk controller queue can hold at least as many IOs as your device queues can hold.
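To make the arithmetic explicit, here is a minimal sketch (in Python) of the reasoning above. It is not how the VMkernel actually schedules IO; the function name and the example values are purely illustrative.

# Minimal sketch: where do IOs start queuing up in the simplified model above?
# (Illustrative only; this is not how the VMkernel/VSAN scheduler works.)

def io_capacity(device_queue_depths, controller_queue_depth):
    """Return (IOs that can be in flight in parallel, IOs held back above the controller)."""
    device_total = sum(device_queue_depths)                 # e.g. 4 SATA disks x 32 = 128
    in_flight = min(device_total, controller_queue_depth)
    held_back = max(0, device_total - controller_queue_depth)
    return in_flight, held_back

# The example from the text: 4 SATA disks (queue depth 32 each) behind a
# controller that can only handle 64 outstanding IOs.
print(io_capacity([32] * 4, 64))    # -> (64, 64): half the IOs are held back

# The same disks behind a much deeper controller queue (e.g. an LSI 2308 at ~600).
print(io_capacity([32] * 4, 600))   # -> (128, 0): the devices are the limit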

When it comes to disk controllers there is a huge difference in maximum queue depth value between vendors, and even between models from the same vendor. Let’s look at some extreme examples:

HP Smart Array P420i - 1020
Intel C602 AHCI (Patsburg) - 31 (per port)
LSI 2008 - 25
LSI 2308 - 600

For VSAN it is recommended to ensure that the disk controller has a queue depth of at least 256, but go higher if possible. As you can see in the examples above there are various ranges, but for most LSI controllers the queue depth is 600 or higher. Now the disk controller is just one part of the equation, as there is also the device queue. As I listed in my other post, a RAID device for LSI for instance has a default queue depth of 128, while a SAS device has 254 and a SATA device has 32. The one which stands out the most is the SATA device: with a queue depth of only 32, you can imagine it can once again become a “choking point”. Fortunately, the shallow queue depth of SATA can easily be overcome by using NL-SAS drives (near-line serial attached SCSI) instead. NL-SAS drives are essentially SATA drives with a SAS connector and come with the following benefits:

  • Dual ports allowing redundant paths
  • Full SCSI command set
  • Faster interface compared to SATA, up to 20%
  • Larger (deeper) command queue depth

So what about the cost then? From a cost perspective the difference between NL-SAS and SATA is negligible for most vendors. For a 4TB drive the difference, at the time of writing and across different websites, was on average $30. I think it is safe to say that for ANY environment NL-SAS is the way to go and SATA should be avoided when possible.

In other words, when it comes to queue depth: spend a couple of extra bucks and go big… you don’t want to choke your own environment to death!

Re: SFD5 event and negativity / respect

Duncan Epping · Apr 28, 2014 ·

Storage Field Day was hosted last week. I typically like these events, mainly because they have start-ups presenting their new technology, and I usually like the flow of the sessions. I also like the interaction between the “delegates” and the vendors, well, at times I do. There were several blog posts on the topic from people who are part of what I would at this point call the old boys club (yes, there were women attending as well, but you get the point), as that is what it felt like during the event. I wanted to comment on Bob’s article, but it looks like he is not looking for a healthy debate, so I figured a blog post would be the best way to reply.

For those who don’t know: the sessions usually start with some background on the company and a problem description, followed by a product session with demos and deep-dives where and when needed. Delegates will fire off questions during these sessions; sometimes this leads to a great discussion and sometimes it doesn’t.

As some of you may have noticed on twitter, this time around I personally didn’t enjoy the event very much. I think this tweet from my friend Jason Boche captures the feeling I had well:

Negativity in the stream is getting out of hand. Show some compassion, respect, & professionalism. #Heathers

— Jason Boche (@jasonboche) April 24, 2014

What stood out to me, and by watching twitter to others as well, was the negativity from some of the delegates towards some of the vendors. When the initial problem statement / marketing fluff took too long, the “boring” comments from the delegates started to pass by on twitter; during the start of the EMC session this was particularly bad. (Not the first time I have seen it… and I am definitely not trying to defend a vendor here, as they could have known what they were up against and should know the first rule of presenting: know your audience.) Maybe even more annoying for the people watching the feed were the “inside jokes”, the “anecdotes” and the “in-crowd discussions”. It really disrupted the flow of some of the sessions, and I think the PernixData session was the best example of it… it derailed too often, leading to the presenter running out of time, or as Frank put it:

https://twitter.com/FrankDenneman/status/459235681345482752

When several people commented on the tweets and the atmosphere, some heated debates kicked off. What stood out to me during these debates was that the “delegates” felt they were doing the vendors a service and that the vendors should respect their time and effort. (I agree with them to a certain extent.) It was also mentioned various times that they were all experts and there was no need for basics or problem descriptions, as all had done their due diligence and came well prepared. Personally, based on the questions asked, I don’t believe that, and I think everyone can learn something even from the basics. Besides that, I would argue that the Tech Field Day website is really clear on this:

Don’t assume all of the attendees are experts in your area. True to the spirit of Gestalt IT, we intentionally mix many IT disciplines at Tech Field Day to spark creativity and out-of-the-box thinking.

And on the topic of respect: it goes both ways, and it seems that the Tech Field Day charter agrees with me, as this is what it states in the section on what it is like to be a delegate:

… just treat them with the thoughtfulness, professionalism and mutual respect they deserve.

But what is the underlying problem? What the delegates seem to have forgotten is the vendor’s perspective… Why are these vendors there? What is their reason to participate? Are they looking for feedback from a handful of people on their product(s), aiming to make road map changes when needed… or are they looking to introduce their product (or a new version) to the world through the reach the event has? (Note I said event and not delegates on purpose.) I would expect it to be the latter, as the majority of companies presenting are introducing a new product or version, not a road map. On top of that, I would argue that if they were looking for direct product feedback they would do this in a closed setting with a limited group of people under a strict NDA. Even if that were not the case, just as you are asking the vendor to be respectful of your time, you should also be respectful towards them for what they are investing, which is probably a lot more than just time, as without their sponsorship there would not be an event. (Assuming Mr Stephen Foskett is not a secret billionaire… but who knows :-)) Either way, think about what allows these events to exist. Without these companies investing, it would be difficult for Stephen to organize them. Also, think about the people watching the event online, and even about the person sitting next to you. What is glaringly obvious to you may not be so for the person sitting next to you, simply because they come from a different background.

So why am I writing this? Well, hopefully so that things will change for the better. As I stated, I like these events, as they are valuable to the community in my opinion and they provide a nice podium for start-ups to present themselves to the world, but that positive aspect should not get lost in unneeded debates and negativity. That is what these events are about, in my opinion: providing a service to the community, and I hope it will stay that way.

PS: I have a lot of respect for the endless effort Stephen puts into organizing these sessions / events…

Alert: vSphere 5.5 U1 and NFS issue!

Duncan Epping · Apr 19, 2014 ·

Some had already reported on this on twitter and in various blog posts, but I had to wait until I received the green light from our KB/GSS team. An issue has been discovered with vSphere 5.5 Update 1 that is related to loss of connectivity to NFS based datastores. (NFS volumes include VSA datastores.)

*** Patch released, read more about it here ***

This is a serious issue, as it results in an APD of the datastore, meaning that the virtual machines will not be able to do any IO to the datastore for the duration of the APD. This by itself can result in BSODs for Windows guests and filesystems becoming read-only for Linux guests.

Witnessed log entries can include:

2014-04-01T14:35:08.074Z: [APDCorrelator] 9413898746us: [vob.storage.apd.start] Device or filesystem with identifier [12345678-abcdefg0] has entered the All Paths Down state.
2014-04-01T14:35:08.075Z: [APDCorrelator] 9414268686us: [esx.problem.storage.apd.start] Device or filesystem with identifier [12345678-abcdefg0] has entered the All Paths Down state.
2014-04-01T14:36:55.274Z: No correlator for vob.vmfs.nfs.server.disconnect
2014-04-01T14:36:55.274Z: [vmfsCorrelator] 9521467867us: [esx.problem.vmfs.nfs.server.disconnect] 192.168.1.1/NFS-DS1 12345678-abcdefg0-0000-000000000000 NFS-DS1
2014-04-01T14:37:28.081Z: [APDCorrelator] 9553899639us: [vob.storage.apd.timeout] Device or filesystem with identifier [12345678-abcdefg0] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed.
2014-04-01T14:37:28.081Z: [APDCorrelator] 9554275221us: [esx.problem.storage.apd.timeout] Device or filesystem with identifier [12345678-abcdefg0] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed.
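If you want to quickly check whether a host has hit this condition, you can look for these events in an exported log file. Below is a minimal sketch (in Python, not an official VMware tool); the log file name and the event matching are assumptions for illustration, based purely on the entries shown above.

# Minimal sketch: scan an exported ESXi log for the APD events shown above.
# (Illustrative only; the file name and event list are assumptions.)
import re

APD_EVENTS = (
    "vob.storage.apd.start",
    "esx.problem.storage.apd.start",
    "esx.problem.vmfs.nfs.server.disconnect",
    "vob.storage.apd.timeout",
    "esx.problem.storage.apd.timeout",
)

def find_apd_events(log_path):
    """Yield (timestamp, event name) for every APD-related entry in the log."""
    pattern = re.compile(r"^(?P<ts>\S+): .*\[(?P<event>[\w.]+)\]")
    with open(log_path) as log:
        for line in log:
            match = pattern.match(line)
            if match and match.group("event") in APD_EVENTS:
                yield match.group("ts"), match.group("event")

if __name__ == "__main__":
    for ts, event in find_apd_events("exported-esxi.log"):  # hypothetical file name
        print(ts, event)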

If you are hitting this issue, then VMware recommends reverting back to vSphere 5.5. Please monitor the following KB closely for more details and hopefully a fix in the near future: http://kb.vmware.com/kb/2076392

 


