In the last couple of weeks I had three different instances of people encountering weird behaviour in their environment. In two cases it was a VSAN environment and in the other it was an environment using a flash caching solution. What all three had in common was that the SSD device became unavailable when they drove a lot of IO; one of them even had trouble enabling VSAN in the first place, before any IO load was placed on it.
With the first customer it took me a while to figure out what was going on. I asked him the standard questions:
- Which disk controller are you using?
- Which flash device are you using?
- Which disks are you using?
- Do you have 10GbE networking?
- Are they on the HCL?
- What is the queue depth of the devices?
All the answers were “positive”, meaning that the full environment was supported: the queue depth was 600, so that was fine; enterprise-grade MLC devices were used; and even the HDDs were on the HCL. So what was causing their problems? I asked them to show me the disk devices and the flash devices in the Web Client, and then I noticed that the flash devices were connected to a different disk controller. The HDDs (SAS drives) were connected to the disk controller that was on the HCL, a highly performant and reliable device… The flash device, however, was connected to the on-board controller, which has a shallow queue depth and is not certified. Yes indeed, the infamous AHCI disk controller.
When I pointed it out the customers were shocked: “why on earth would the vendor do that…”. Well, to be honest, if you look at it, the SAS drives were connected to the SAS controller and the SATA flash device was connected to the SATA disk controller, so from that perspective it makes sense, right? And in the end, the OEM doesn’t really know what your plans are when you configure your own box, right? So before you install anything, open it up and make sure that everything is connected in a fully supported way! (PS: or simply go VSAN Ready Node / EVO:RAIL :-))
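The lesson is that the HCL and queue depth questions have to be answered per device, against the controller each device is actually attached to, not just against the controllers present in the box. Below is a minimal Python sketch of that check, assuming you have already gathered an inventory of disks and their controllers yourself. The `Controller` and `Disk` classes, the `find_problems` function and the `MIN_QUEUE_DEPTH` threshold are all hypothetical, and the values in the example are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical minimum queue depth; pick a value that matches your own sizing guidance.
MIN_QUEUE_DEPTH = 256


@dataclass
class Controller:
    name: str
    on_hcl: bool       # is this exact controller listed on the HCL?
    queue_depth: int   # queue depth as reported for this controller


@dataclass
class Disk:
    name: str
    is_flash: bool
    controller: Controller  # the controller this disk is actually cabled to


def find_problems(disks):
    """Check every disk against the controller it is attached to and return warnings."""
    problems = []
    for disk in disks:
        ctrl = disk.controller
        if not ctrl.on_hcl:
            problems.append(f"{disk.name}: controller {ctrl.name} is not on the HCL")
        if ctrl.queue_depth < MIN_QUEUE_DEPTH:
            problems.append(
                f"{disk.name}: controller {ctrl.name} queue depth "
                f"{ctrl.queue_depth} is below {MIN_QUEUE_DEPTH}"
            )
    return problems


if __name__ == "__main__":
    # Illustrative inventory mirroring the story: HDDs on a certified SAS controller,
    # the flash device on the on-board AHCI controller (queue depth value is hypothetical).
    sas = Controller("sas0", on_hcl=True, queue_depth=600)
    ahci = Controller("ahci0", on_hcl=False, queue_depth=31)
    inventory = [
        Disk("hdd1", is_flash=False, controller=sas),
        Disk("hdd2", is_flash=False, controller=sas),
        Disk("ssd1", is_flash=True, controller=ahci),
    ]
    for problem in find_problems(inventory):
        print(problem)
```

Note that the HDDs pass every check here; only the flash device gets flagged, which is exactly why asking “is the controller on the HCL?” at the level of the whole server can give a misleadingly positive answer.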
David Chung says
Okay – this is exactly what happened to me with the HP servers….
martijnl says
Check, same here. Specced an extra RAID controller but in the factory it wasn’t connected to the SAS drive bay. We couldn’t do that ourselves as the cable routing is such that it could never reach the add-on card.
Dan McGee says
Out of curiosity, was it the same hardware vendor across the three instances of this occurring?
John Nicholson. says
Might add “Is the firmware on the HBA supported with pass through, and configured in the right form of pass through” to the list of questions.
Duncan Epping says
I guess that would be the right question to ask depending on the HW config you have, but I can understand where you are coming from 🙂
Vince says
Why is OP trying to push EVO Rail so hard?
Duncan Epping says
OP?
John Nicholson. says
Original Poster (i.e. you). It's a common term in internet forums, not normally seen in blog comments.