Disk Controller features and Queue Depth?

I have been working on various VSAN configurations, and a question that always comes up is: what are the features and queue depth of disk controller X? (Local disks, not FC based…) Note that this is not only useful to know when using VSAN, but also when you are planning on doing host-local caching with solutions like PernixData FVP or SanDisk FlashSoft, for instance. The controller used can impact performance, and a really low queue depth will result in lower performance; it is as simple as that.

I found myself digging through documentation and doing searches on the internet until I stumbled across the following website. I figured I would share the link with you, as it will help you (especially consultants) when you need to go through this exercise multiple times:

http://forums.servethehome.com/index.php?threads/lsi-raid-controller-and-hba-complete-listing-plus-oem-models.599/

Just as an example, the Dell H200 Integrated disk controller is on the VSAN HCL. According to the website above it is based on the LSI 2008 and provides the following feature set: 2×4 port internal SAS, no cache, no BBU, and RAID 0, 1 and 10. According to the VSAN HCL it also provides “Virtual SAN Pass-Through”. I guess the only info missing is the queue depth of the controller. I have not been able to find a good source for this, so I figured I would make this post a source for that info.

Before we dive into that, I want to point out something else that is important to realize. Some controllers accept SAS / NL-SAS as well as SATA drives. Although the price difference between SATA and NL-SAS is typically negligible, the queue depth difference is not. Erik Bussink was kind enough to provide the details of one of the controllers he is using as an example; first in the list is the “RAID” device, second is SATA and third is SAS. As you can see, SAS is the clear winner here, and that includes NL-SAS drives.

  mpt2sas_raid_queue_depth: int
     Max RAID Device Queue Depth (default=128)
  mpt2sas_sata_queue_depth: int
     Max SATA Device Queue Depth (default=32)
  mpt2sas_sas_queue_depth: int
     Max SAS Device Queue Depth (default=254)
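
If you want to check these defaults on your own hosts, a quick way is to list the driver module’s parameters. A minimal sketch, assuming the controller uses the mpt2sas driver (the module name will differ for other drivers and driver versions):

  # List the parameters (and current values) of the mpt2sas module on ESXi
  esxcli system module parameters list -m mpt2sas

  # On a Linux box the same defaults show up via modinfo
  modinfo mpt2sas | grep -i queue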

If you want to contribute, please take the following steps and report the vendor, controller type and AQLEN value in a comment. (If you prefer to script it, see the sketch right after the steps.)

  1. Run the esxtop command on the ESXi shell / SSH session
  2. Press d
  3. Press f and select Queue Stats (d)
  4. The value listed under AQLEN is the queue depth of the storage adapter
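
If you need to collect this from multiple hosts, you can also pull a one-shot esxtop sample in batch mode and search the CSV header for the adapter queue depth counter. This is just a sketch; the exact counter name in the batch output may differ per ESXi build, so treat the grep pattern as an assumption:

  # Take a single batch-mode sample with all counters enabled
  esxtop -b -a -n 1 > /tmp/esxtop-sample.csv
  # Look for the adapter queue depth field(s) in the CSV header
  head -1 /tmp/esxtop-sample.csv | tr ',' '\n' | grep -i "queue depth"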

The following table shows the vendor, controller and queue depth. Note that this is based on what we (my readers and I) have witnessed in our labs, and results may vary depending on the firmware and driver used. Make sure to check the VSAN HCL for the supported driver / firmware version. Note that not all controllers below are on the VSAN HCL; this is a “generic” list, as I want it to serve multiple use cases.
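
To compare what you are actually running against the HCL, you can check which driver an adapter is bound to and which driver VIB version is installed. A minimal sketch (the megaraid example is just an assumption; substitute your own driver name):

  # Show each storage adapter and the driver it is bound to
  esxcli storage core adapter list

  # Show the installed version of a given driver VIB (example driver name)
  esxcli software vib list | grep -i megaraid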

Generally speaking it is recommended to use a disk controller with a queue depth > 256 when used for VSAN or “host local caching” solutions.

Vendor Disk Controller Queue Depth
Adaptec RAID 2405 504
Dell H310 25
Dell (R610) SAS 6/iR 127
Dell PERC 6/i 925
Dell (M710HD) PERC H200 Embedded 499
Dell (M910) PERC H700 Modular 975
Dell PERC H700 Integrated 975
Dell (M620) PERC H710 Mini 975
Dell (T620) PERC H710 Adapter 975
Dell (T620) PERC H710p 975
Dell PERC H810 975
HP Smart Array P220i 1020
HP Smart Array P400i 128
HP Smart Array P410i 1020
HP Smart Array P420i 1020
HP Smart Array P700m 1200
Intel C602 AHCI (Patsburg) 31 (per port)
Intel C602 SCU (Patsburg) 256
Intel RMS25KB040 600
LSI 2004 25
LSI 2008 25
LSI 2108 600
LSI 2208 600
LSI 2308 600
LSI 3008 600
LSI 9300-8i 600

Startup News Flash part 17

Number 17 already… This is a short one; I expect more news next week when we have “Storage Field Day”, hence I figured I would release this one already. Make sure to watch the live feed if you are interested in getting the details on new releases from companies like Diablo, SanDisk, PernixData etc.

Last week Tintri announced support for the Red Hat Enterprise Virtualization platform. It is kind of surprising to see them selecting a specific Linux vendor, to be honest, but then again it probably is also the more popular option for people who want full support etc. What is nice in my opinion is that Tintri offers the exact same “VM Aware” experience on both platforms. Although I don’t see too many customers using both VMware and RHEV in production, it is nice to have the option.

CloudVolumes, no, not a storage company, announced support for View 6.0. CloudVolumes developed a solution which helps you manage applications. They provide a central management solution and the option to distribute applications while eliminating the need for streaming / packaging. I have looked at it briefly and it is an interesting approach they take. I like how they solved the “layering” problem by isolating the app in its own disk container. It does make me wonder how this scales when you have dozens of apps per desktop; nevertheless, it is an interesting approach worth looking into.

Win a Jackery Giant backup battery, by just leaving a comment

I was one of the lucky guys who won a prize during the Top Bloggers award “ceremony”. Veeam was kind enough to provide two of the exact same items so that every blogger who won a prize could also give away a prize to their readers. I am not going to make it more difficult than it needs to be. Leave a comment before Friday the 18th of April, make sure to use your real email address in the form, and I will let my daughter pick a random winner on Saturday morning. I will update this blog post and inform the winner.

What can you win? (Funny, I was at the point of buying one of these myself as I always run out of battery on my phone and iPad during all-day events!)

Jackery Giant

- Large power capacity with 2.1A output
- The world’s most powerful external rechargeable battery
- 2.1A fast charging
- Size, style and speed make this the most powerful external rechargeable battery to date

This large-capacity portable external battery has dual output ports and 10,400mAh, extending mobile device battery life by up to 500% for smartphones. Its compact and stylish design includes three LED charge-status indicators and a two-LED flashlight good for up to 700 hours of illumination.

FUD it!

In the last couple of weeks something stood out to me in the world of storage and virtualisation, and that is animosity. What struck me personally is how aggressively some storage vendors have responded to Virtual SAN, and to Server Side Storage in general. I can understand it in a way, as Virtual SAN plays in the same field; they probably feel threatened and it makes them anxious. In some cases I even see vendors responding to VSAN who do not play in the same space at all; I guess they are in need of attention. I am not sure this is the way to go about it, to be honest. If I were considering a hyper(visor)-converged solution I wouldn’t like being called lazy because of it. Then again, I was always taught that lazy administrators are the best administrators in the world, as they plan accordingly and pro-actively take action. This allows them to lean back while everyone else is running around chasing problems, so maybe it was a compliment.

Personally I am perfectly fine with competition, and I don’t mind being challenged. Whether that involves FUD or just cold hard facts is beside the point, although I prefer to play it fair. It is a free world, and if you feel you need to say something about someone else’s product you are free to do so. However, you may want to think about the impression you leave behind. In a way it is insulting to our customers. And our customers include your customers.

For the majority of my professional career I have been a customer, and personally I can’t think of anything more insulting than a vendor spoon-feeding you reasons why their competitor is not what you are looking for. It is insulting because it insinuates that you are not smart enough to do your own research and tear the product down as you desire, not smart enough to know what you really need, not smart enough to make the decision by yourself.

Personally, when this happened in the past, I would simply ask them to skip the mud slinging and go to the part where they explain their added value. And in many cases I would end up just ignoring the whole pitch… because if you feel it is more important to “educate” me on what someone else does than on what you do, then they probably do something very well and I should be looking at them instead.

So let’s respect our customers… let them be the lazy admin when they want, and let them decide what is best for them… not what is best for you.

PS: I love the products that our competitors are working on, and I have a lot of respect for how they paved the way for the future.

Updating LSI firmware through the ESXi commandline

I received an email this week from one of my readers / followers on Twitter who had gone through the effort of upgrading his LSI controller firmware. He shared the procedure with me, as unfortunately it wasn’t well documented. I hope this will help others in the future; I know it will help me, as I was about to look at the exact same thing for my VSAN environment. Thanks for sharing this, Tom!

– copy / paste from Tom’s document –

We do quite a bit of virtualization and storage validation and performance testing in the Taneja Group Labs (http://tanejagroup.com/). Recently, we were performing some tests with VMware’s VSAN, and due to some performance issues we were having with the AHCI controllers on our servers we needed to revise our environment to add some LSI SAS 2308 controllers and attach our SSDs and HDDs to the LSI card. However, our new LSI SAS controllers didn’t come with the firmware mandated by the VSAN HCL (they had v14 and the HCL specifies v18) and didn’t recognize the attached drives. So we set about updating the LSI 2308 firmware. Updating the LSI firmware is a simple process and can be accomplished from an ESXi 5.5 U1 server, but it isn’t very well documented. After updating the firmware and rebooting the system the drives were recognized and could be used by VSAN. Below are the steps I took to update my LSI controllers from v14 to v18. [Read more...]
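
Tom’s full write-up has the exact steps; just to give you an idea of the general shape, flashing an LSI SAS2-family controller from the ESXi shell usually boils down to running LSI’s sas2flash utility against the downloaded firmware image. A rough sketch, with the controller index and file name purely as placeholders, not Tom’s exact procedure:

  # List the controllers sas2flash can see, including their current firmware
  /tmp/sas2flash -listall
  # Flash controller 0 with the new firmware image (file name is an example)
  /tmp/sas2flash -c 0 -o -f 2308IT18.bin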

VSAN for ROBO?

I noticed this new SuperMicro VSAN Ready Node being published last week. The configuration is potentially a nice solution for ROBO deployments, primarily due to the cost of the system.

When I did the math it came in at around $3,800. This is the configuration:

  • SuperMicro SuperServer 1018D-73MTF
  • 1 x Intel E3-1270 V3 3.5GHz- Quadcore
  • 32GB Memory
  • 5 x 1TB 7200 RPM NL-SAS HDD
  • 1 x 200GB Intel S3700 SSD
  • LSI 2308 Disk controller
  • 4 x 1GbE NIC port

It is a nice configuration that will allow for roughly fifteen 1 vCPU virtual machines with 3GB of memory and 60GB of disk capacity per host. Personally I would probably use a different CPU and some more memory, as that gives you a bit more headroom, especially during maintenance. The software cost is socket based, so you can increase memory and change the type of CPU with relatively low cost impact. The SuperMicro server listed, however, is limited to the E3 CPU family and to 32GB, but there are alternatives out there. (For instance the Dell R320, or maybe even the R210, etc.)
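
A quick back-of-the-envelope check on the storage side, assuming a three-node cluster, fifteen of those VMs per host and FTT=1 mirroring (my assumptions, ignoring VSAN overhead and slack space), shows that capacity is not the bottleneck:

  # Raw HDD capacity across 3 hosts with 5 x 1TB each (in GB)
  echo $(( 3 * 5 * 1000 ))
  # Capacity consumed by 45 VMs of 60GB each, mirrored (FTT=1)
  echo $(( 45 * 60 * 2 ))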

From a software point of view the cost of this configuration is limited to 3 x VSAN licenses and 3 x vSphere. As VSAN even works with Essentials Plus and Standard, you could leverage that to keep the cost down, but keep in mind that you won’t have DRS if you drop down to Standard or lower. It still sounds like a nice ROBO package to me; especially when you have many sites this could be a great way to create a standardized, packaged solution.

ESXi DCUI Shutdown vs vCenter Shutdown of a host

Today on the community forums someone mentioned he had shut down his host and that he expected vSphere HA to restart his virtual machines. For whatever reason he got into a situation where all of his VMs were still running, but he couldn’t do much with them anymore, and as such he wanted to kill the host so that HA could safely restart the virtual machines. However, when he shut down his host nothing happened: the VMs were powered off but never restarted. Why did this happen?

I had seen this before in the past, but it never really sank in until I saw the questions from this customer. I figured I would test it just to see what happened and whether I could spot a difference in the vSphere HA logs. I powered on a VM on one of my hosts and moved off all other VMs. I then went to the DCUI of the host and initiated a “shutdown” using F12. I tailed the FDM log on one of my hosts and spotted the following log message:

2014-04-04T11:41:54.882Z [688C2B70 info 'Invt' opID=SWI-24c018b] [VmStateChange::SavePowerChange] vm /vmfs/volumes/4ece24c4-3f1ca80e-9cd8-984be1047b14/New Virtual Machine/New Virtual Machine.vmx curPwrState=unknown curPowerOnCount=0 newPwrState=powered off clnPwrOff=true hostReporting=host-113

In the above scenario the virtual machine was not restarted, even though the host was shut down. I did the exact same exercise again, but this time I did the shutdown using the vCenter Web Client. After I witnessed the VM being restarted, I also noticed a difference in the FDM log:

2014-04-04T12:12:06.515Z [68040B70 info 'Invt' opID=SWI-1aad525b] [VmStateChange::SavePowerChange] vm /vmfs/volumes/4ece24c4-3f1ca80e-9cd8-984be1047b14/New Virtual Machine/New Virtual Machine.vmx curPwrState=unknown curPowerOnCount=0 newPwrState=powered on clnPwrOff=false hostReporting=host-113

The difference is the power-off state reported by vSphere HA. In the first scenario the virtual machine is marked as “clnPwrOff=true”, which basically tells vSphere HA that an administrator has powered off the virtual machine; this is what happened when the “shutdown” was initiated through the DCUI, and hence no restart took place. (It seems that ESXi cleanly shuts down all running virtual machines in that case.) In the second scenario vSphere HA reported that the VM was not cleanly powered off (“clnPwrOff=false”), and as such it restarted the virtual machine, as it assumed something bad had happened to it.
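
If you want to reproduce this yourself, you can simply watch the FDM log on another host in the cluster while you test both shutdown methods; on ESXi 5.5 the log lives in /var/log/fdm.log:

  # Watch HA's view of VM power-state changes during the test
  tail -f /var/log/fdm.log | grep SavePowerChange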

So what did we learn? If you, for whatever reason, want vSphere HA to restart the virtual machines that are currently running on a host you want to shut down, make sure that you use the vCenter Web Client instead of the DCUI!

Disclaimer: my tests were conducted using vSphere 5.5 Update 1. I believe that at some point in the past “shutdown” via the DCUI would also allow HA to restart the VMs. I am now investigating why this has changed and when. When I find out I will update this post.