In the last couple of weeks I had three different instances of people encountering weird behaviour in their environment. Two were VSAN environments and the third was an environment using a flash caching solution. What all three had in common was that when they were driving a lot of IO the SSD device would become unavailable. One of them even had challenges enabling VSAN in the first place, before any IO load was placed on it.
With the first customer it took me a while to figure out what was going on. I asked him the standard questions:
- Which disk controller are you using?
- Which flash device are you using?
- Which disks are you using?
- Do you have 10GbE networking?
- Are they on the HCL?
- What is the queue depth of the devices?
All the answers were “positive”, meaning that the full environment was supported: the queue depth was 600, so that was fine, enterprise-grade MLC devices were used, and even the HDDs were on the HCL. So what was causing their problems? I asked them to show me the disk devices and the flash devices in the Web Client, and then I noticed that the flash devices were connected to a different disk controller. The HDDs (SAS drives) were connected to the disk controller that was on the HCL, a highly performant and reliable device. The flash device, however, was connected to the on-board, non-certified controller with a shallow queue depth. Yes indeed, the infamous AHCI disk controller.
When I pointed it out the customers were shocked: “why on earth would the vendor do that…” Well, to be honest, if you look at it: the SAS drives were connected to the SAS controller and the SATA flash device was connected to the SATA disk controller. From that perspective it makes sense, right? And in the end, the OEM doesn’t really know what your plans are with the box when you configure it yourself, right? So before you install anything, open it up and make sure that everything is connected properly in a fully supported way! (PS: or simply go VSAN Ready Node / EVO:RAIL :-))
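For those who prefer to script this kind of sanity check, here is a minimal sketch in Python of the per-device check described above. It assumes you have already written down by hand which controller each device hangs off, what its queue depth is, and whether you found it on the HCL. All names, the inventory itself, the queue depth of the on-board controller, and the minimum queue depth threshold are hypothetical placeholders for illustration, not values taken from the HCL or from any tool's output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Controller:
    name: str
    queue_depth: int
    certified: bool  # True only if you looked it up on the HCL yourself


@dataclass(frozen=True)
class Device:
    name: str
    kind: str  # "flash" or "hdd"
    controller: Controller


def find_problems(devices, min_queue_depth):
    """Return one human-readable warning per issue found, checked per device."""
    problems = []
    for dev in devices:
        ctrl = dev.controller
        if not ctrl.certified:
            problems.append(
                f"{dev.name} ({dev.kind}) is on non-certified controller {ctrl.name}"
            )
        if ctrl.queue_depth < min_queue_depth:
            problems.append(
                f"{dev.name} ({dev.kind}) is on {ctrl.name} with queue depth "
                f"{ctrl.queue_depth}, below the required {min_queue_depth}"
            )
    return problems


# Hypothetical inventory mirroring the situation in this post:
# HDDs on the certified SAS controller, flash on the on-board AHCI controller.
sas = Controller(name="sas-hba", queue_depth=600, certified=True)
ahci = Controller(name="onboard-ahci", queue_depth=31, certified=False)  # illustrative value

devices = [
    Device("hdd-1", "hdd", sas),
    Device("hdd-2", "hdd", sas),
    Device("ssd-1", "flash", ahci),
]

MIN_QUEUE_DEPTH = 256  # assumed threshold; check the current HCL for the real one

for warning in find_problems(devices, MIN_QUEUE_DEPTH):
    print(warning)
```

The point of checking per device rather than per host is exactly the trap described above: a host can contain a perfectly supported controller and still have the flash device wired to a different one.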
Okay – this is exactly what happened to me with the HP servers….
Check, same here. Specced an extra RAID controller but in the factory it wasn’t connected to the SAS drive bay. We couldn’t do that ourselves as the cable routing is such that it could never reach the add-on card.
Out of curiosity, was it the same hardware vendor across the three instances of this occurring?
Might add “Is the firmware on the HBA supported with pass through, and configured in the right form of pass through” to the list of questions.
I guess that would be the right question to ask depending on the HW config you have, but I can understand where you are coming from 🙂
Why is OP trying to push EVO Rail so hard?
OP?
Original Poster (i.e. you). It's a common term in internet forums, not normally seen in blog comments.