Disk controllers for vSAN with or without cache?

Duncan Epping · Dec 13, 2016

I got this question today and I thought I had already written something on the topic, but as I cannot find anything I figured I would write up something quick. The question was whether a disk controller for vSAN should have cache or not. It is a fair question, as many disk controllers these days come with 1GB, 2GB or 4GB of cache.

Let it be clear that with vSAN you are required to disable the write cache at all times. The reason for this is simple: vSAN is in control of data consistency, and vSAN does not expect a write cache (battery backed or not) in its data path. Make sure to disable it. From a read perspective you can have caching enabled. In some cases we see controllers where people simply set the write cache to 0% and the rest then automatically becomes read cache. This is fully supported; however, our tests have shown that there’s little added benefit in terms of performance, especially as reads typically come from SSD anyway. Theoretically there could be a performance gain, but personally I would rather spend my money on flash for vSAN.

My recommendation is fairly straightforward: use a disk controller which is a plain pass-through controller without any fancy features. You don’t need RAID on the disk controller with vSAN, and you don’t need caching on the disk controller with vSAN. Keep it simple, that works best. So if you have the option to dumb it down, go for it.
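
To make the guidance above concrete, here is a minimal sketch in Python of how you could encode these rules as a checklist. It is only an illustration: the `ControllerSettings` class, its field names, the mode labels and the `review_controller` function are all hypothetical, and nothing here queries a real controller or uses any VMware tooling.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ControllerSettings:
    """Hypothetical snapshot of a disk controller's configuration."""
    mode: str                  # illustrative labels: "passthrough" or "raid0"
    write_cache_enabled: bool
    read_cache_enabled: bool


def review_controller(settings: ControllerSettings) -> List[Tuple[str, str]]:
    """Return (severity, message) findings based on the guidance in this post."""
    findings = []
    # Write cache in the data path is the one hard requirement: it must be off.
    if settings.write_cache_enabled:
        findings.append(("error", "write cache must be disabled for vSAN"))
    # Read cache is supported, but testing showed little added benefit.
    if settings.read_cache_enabled:
        findings.append(("info", "read cache is supported but adds little benefit"))
    # Pass-through is a preference, not a requirement.
    if settings.mode != "passthrough":
        findings.append(("warning", "a plain pass-through controller is preferred"))
    return findings


if __name__ == "__main__":
    raid_card = ControllerSettings(
        mode="raid0", write_cache_enabled=True, read_cache_enabled=True
    )
    for severity, message in review_controller(raid_card):
        print(f"{severity}: {message}")
```

The severities mirror the wording of the post: an enabled write cache is the only hard error, read cache is merely noted, and a non-pass-through mode is flagged as a preference rather than a violation.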

Comments

Mike C says

13 December, 2016 at 18:14

You can never say this enough. Been there done that. Trying to buy a T Shirt
James Hess says

13 December, 2016 at 19:09

Do they say anything against using controller cache for Write-Through caching with vSAN to trade some increase in latency in exchange for increase in burst write throughput thanks to the buffering that WT caching provides?

This does not affect data consistency like battery or non-battery-based Writeback caching, because with controllers that implement Write-Through caching the storage controller hardware does not acknowledge any I/O until that I/O has already been written to media. WT merely serves as a transparent buffer to allow more pending IOs to be serviced more efficiently, and the host software is still in control of data consistency, since the controller responds according to the SCSI specifications (No I/O acknowledged until I/O completed, Flush not acknowledged until all I/O completed).

This kind of write-through caching likely cannot be done at good efficiency by software on the host, since its distance latency-wise is many nanoseconds too far away from the storage media, and the peak write rate of some SSDs could still be improved by buffering data as close to the media in the data path as possible before acknowledging.
- Duncan Epping says
  
  14 December, 2016 at 00:56
  
  In most cases I know this is only implemented for RAID-0 configurations, which have an operational impact. (If you know of any implemented for pass-through let me know.) Personally I don’t feel it brings a huge value compared to the operational overhead and complexity a RAID-0 set adds. But that is my preference. (And seems to be that of the vSAN engineering team as well.)
ghost2512 says

13 December, 2016 at 19:21

I have a 4-host cluster with LSI-3008 controllers and, from this point of view, having pass-through is a great deal!!
I have just migrated from hybrid to all flash and you don’t need to touch the controller at all with pass-through, so from my point of view, if you can, stay away from RAID-0 only controllers!
- Duncan Epping says
  
  14 December, 2016 at 00:52
  
  Exactly!
  - ghost2512 says
    
    15 December, 2016 at 19:22
    
    Just to tell another advantage, last weekend I made a controller firmware update just by updating a spare controller in my PC, powered off one host, physically exchanged the controller and simply powered on the host and all went fine, without any reconfiguration!! So RAID pain went away!!
    The same for adding one controller and migrating one DG to the second one: shut down, install the new controller, change cabling and power up. All done!! 5 minutes work.
    Think about controller swap for failure, no more fear of that and no rebuild because in just 5 minutes you are ready
    - Duncan Epping says
      
      19 December, 2016 at 11:48
      
      Thanks for sharing that!
Francois Corfdir says

13 December, 2016 at 23:36

For my part I use the MegaRAID 5210 from Lenovo, which I think is an LSI OEM card. Only RAID 0 is on the vSAN HCL for it, so I use RAID 0, which can make for a long day of configuration when you have a lot of disks.
With MegaRAID, when you don’t use the 1 GB RAID cache upgrade, your card is in iMR (Integrated MegaRAID) mode, which gives you a queue depth of only 240. That is not enough for vSAN. When you add the cache upgrade card to your MegaRAID, it switches the card to MR (MegaRAID) mode, and in that mode your queue depth becomes 895.
I know that you have to disable all cache on your disks and controller, but we need a big queue depth, so you have to add the RAID cache.
- Wagner Bandeira says
  
  27 December, 2016 at 14:13
  
  Exactly! I had the same observation. The HCL even has a note that the queue depth of 895 is only possible with the cache card, even if you do not use the cache capacity (disabled). It's a good tip!
Ron Scott-Adams (@Tohuw) says

17 December, 2016 at 09:29

Thanks for taking the time to write this up; the question comes up often.

It’s worth mentioning some controllers, like Cisco’s 12G Modular RAID Controller, require at least a 1 GB caching module to achieve sufficient queue depth. That said, there’s also a successor for that card in the 12G Modular SAS Controller, which does not require cache and also allows pass-through. This is a great time to refer anyone looking at a custom vSAN build to the vSAN HCL.
- Duncan Epping says
  
  19 December, 2016 at 11:50
  
  Yeah, John Nicholson also emailed me that. That is something specific to that Cisco controller; of course normal logic / principles still apply. If memory is needed to increase the queue depth to a reasonable level then that is what it is; in “normal” cases this is not needed however. (Hence I also prefer the new Cisco pass-through option.)
  - Andy says
    
    25 April, 2017 at 14:01
    
    Good points in this article.
    Would a 12G controller with cache module installed provide sufficient queue depth only if it is set to write-through or can it be disabled and still provide 891,895 depth?
fstevenchalmers says

21 December, 2016 at 10:05

Observation from 50,000 feet: back about 40 years ago the principle “don’t hide latency, eliminate it!” was attributed to Seymour Cray. Caches are in the business of hiding latency, regardless of where they’re located.

So the 50,000 foot answer to the original question is: actually store the data somewhere fast (byte addressable storage class memory comes to mind); give the application read/write access to the appropriate subset of storage, directly from user space (you should be envisioning DAX-like access to files and objects here). At this point neither VSAN nor an array controller bucket-brigades an individual read or write: their roles become what a network person would call “control plane”, not the “data plane” intermediaries they have always been in the storage world.

Fun to watch how this plays out over the next 10 or 20 years.

@FStevenChalmers
Josh says

24 December, 2016 at 02:54

Skimping on cache and features is fine but be sure that in doing so you don’t get the cheapest raid card possible with a low queue depth. Queue depth is still very important.
KD Mann says

19 August, 2017 at 16:06

Wow…Duncan…if what you are saying is true, then there is a BIG problem with vSAN.

Are you really sure about what you are saying here? Write-back DRAM cache is ubiquitous on storage devices and it is found on ALL and EVERY block-level device, this includes virtually every HDD and every SSD ever sold in the last 20 years (or more). In fact, any storage device or controller that presents an ANSI standard block level target may (and should be PRESUMED by the operating system) to be caching writes.

That is why it is the EXPLICIT responsibility of the Operating System (in this case, VMware/vSAN) to know, and deal with the existence of a write cache >>anywhere<>VMWARE’S<< responsibility to specify this when it writes!

If vSAN cannot deal with the inevitable existence of write-back caching on block-level devices, then it is very badly broken. The idea that VMware would leave it up to customers to remember to disable write-back is ludicrous.

That said, this would not be the first time that VMware fundamentally failed to understand how storage stacks work. For example, the method VMware uses to spoof the ANSI t10 standard SCSI ABORT command (as described in a VMware patent) is virtually guaranteed to cause silent data corruption in ACID compliant OLTP.

Many underinformed people wonder why intelligent customers still refuse to virtualize their most mission-critical applications. This is a great example of why. When VMware tells me it is my responsibility to remember and then to ensure there is no write-back caching happening ANYWHERE in my storage architecture — this reminds me how immature virtualization still is (at least at VMware).

And yes, in case you were wondering, Microsoft's Hyper-V and Storage Spaces Direct handle these things correctly and transparently.
kamruddin chowdhury says

31 January, 2018 at 19:46

We are implementing hybrid vSAN 6.6 on Dell PowerEdge 730. All the caching and capacity disks are attached to the same controller (PERC H730 mini). The storage controller and the disks both have cache. I have disabled the controller cache. Should I also disable the disk cache? This decision is an urgent requirement, please.
Thanks in advance.
