This is one of those questions that comes up every now and then. I have written about this before, but it never hurts to repeat some of it. The comment I got was about the rebuild time of failed drives in VSAN: surely it takes longer than with a “legacy” storage system? The answer, of course, is: it depends (on many factors).
But what does it depend on? Well, it depends on what exactly we are talking about, but in general I think the following applies:
With VSAN, components (copies of objects, in other words copies of data) are placed across multiple hosts, multiple disk groups, and multiple disks. Basically, if you have a cluster of, let's say, 8 hosts with 7 disks each and you have 200 VMs, then the data of those 200 VMs will be spread across 8 hosts and 56 disks in total. If one of those 56 disks happens to fail, the data that was stored on that disk needs to be reprotected. That data comes from the other 7 hosts, which is potentially 49 disks in total. You may ask: why not 55 disks? Because replica copies are never stored on the same host, for resiliency purposes. Look at the diagram below, where a single object is split into 2 data components and a witness; they are all located on different hosts!
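The arithmetic above can be sketched quickly. A minimal illustration (the host and disk counts are just the example numbers from this post, not anything VSAN-specific):

```python
# Illustrative placement arithmetic from the example above: an 8-host
# cluster with 7 disks per host, where replica copies are never stored
# on the same host as the copy they protect.

HOSTS = 8
DISKS_PER_HOST = 7

total_disks = HOSTS * DISKS_PER_HOST         # 56 disks in the cluster

# When one disk fails, the surviving copies of its components live on
# the other 7 hosts, so the reads can come from at most:
source_disks = (HOSTS - 1) * DISKS_PER_HOST  # 49 potential read sources

# Any remaining disk in the cluster can receive the rebuilt replicas:
target_disks = total_disks - 1               # 55 potential write targets

print(total_disks, source_disks, target_disks)  # 56 49 55
```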
We do not “mirror” disks, we mirror the data itself, and the data can and will be placed anywhere. This means that when a disk within a disk group on a host fails, all remaining disk groups / disks / hosts will help rebuild the impacted data, which is potentially 49 disks. Note that not only will the disks and hosts containing impacted objects help rebuild the data, all 8 hosts and 55 disks will be able to receive the replica data!
Now compare this to a RAID set with a spare disk. In that case you have 1 disk which is receiving all the data that is being rebuilt. That single disk can only take a certain number of IOPS. Let's say it is a really fast disk and it can take 200 IOPS. Compare that to VSAN… Let's say you used really slow disks which only do 75 IOPS… Still, that is (potentially) 49 disks x 75 IOPS for reads and 55 disks for writes.
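A quick back-of-the-envelope comparison using the numbers above (the IOPS figures are the illustrative assumptions from this post, not measurements):

```python
# Rough rebuild-throughput comparison: one hot spare vs. a distributed
# rebuild. All figures are the example numbers from the text.

SPARE_DISK_IOPS = 200  # one fast hot spare absorbing all rebuild writes
SLOW_DISK_IOPS = 75    # a really slow disk in the VSAN cluster
READ_DISKS = 49        # potential read sources on the 7 surviving hosts
WRITE_DISKS = 55       # remaining disks able to receive replica data

# Hot-spare RAID: the single spare disk is the write bottleneck.
raid_write_iops = SPARE_DISK_IOPS

# Distributed rebuild: reads and writes are spread across the cluster.
vsan_read_iops = READ_DISKS * SLOW_DISK_IOPS    # 3675
vsan_write_iops = WRITE_DISKS * SLOW_DISK_IOPS  # 4125

print(f"RAID hot spare write capacity: {raid_write_iops} IOPS")
print(f"VSAN aggregate read capacity:  {vsan_read_iops} IOPS")
print(f"VSAN aggregate write capacity: {vsan_write_iops} IOPS")
```

Even with much slower individual disks, the aggregate rebuild capacity is an order of magnitude higher than a single fast spare.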
That is the major difference: we don't have a single drive as a designated hot spare (or should I say bottleneck?); we have the whole cluster as a hot spare! As such, rebuild times when using similar drives should always be faster with VSAN compared to traditional storage.
Brian Suhr says
Question on your statement below
“Note that not only will all 49 disks and 7 hosts help rebuilding the data, all 8 hosts and 55 disks will be able to receive the replica data!”
If a VM has a stripe =1 then the rebuild would be copying from 1 disk to the single target disk.
If stripe =2 then it would be 2 disks to 2 disks.
It does not seem like all disks would be involved unless you're configuring VMs with a very large stripe size.
Duncan Epping says
Yes, I should have explained that a bit further and added the word “potentially”. Of course it depends on how many components are stored on the impacted disk and where those are stored. Typically they will be distributed across the cluster and its disks. So if 20 or 30 components are impacted, they will come from different hosts and disks.
Duncan Epping says
Also, striping happens in other situations than those defined by policy (when an object is larger than 255GB, for instance).
John Nicholson (@Lost_Signal) says
So say you have a 1.2TB 10K drive fail in this case. Assuming the cluster was only utilized at 50%, that's only 600GB of data that needs to be copied from the 49 disks, so ~12GB of data would need to be moved from each drive. Copying 12GB from a 10K drive to a 10K drive is fairly fast (even on a slower drive it's fairly quick).
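The arithmetic in that comment checks out; a quick sketch using the same numbers:

```python
# Per-disk rebuild load from the comment above: a 1.2TB drive at 50%
# utilization leaves ~600GB to re-protect, sourced from 49 disks.

FAILED_DRIVE_GB = 1200
UTILIZATION = 0.5
SOURCE_DISKS = 49

data_to_move_gb = FAILED_DRIVE_GB * UTILIZATION  # 600 GB to re-protect
per_disk_gb = data_to_move_gb / SOURCE_DISKS     # ~12.2 GB per source disk

print(f"{per_disk_gb:.1f} GB read from each source disk")  # 12.2 GB
```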
When I worked for a customer we had a case where we had a failure like this happen on a Friday afternoon shortly before the office was shutting down. We watched the fast rebuild, checked the capacity and deferred the drive swap until Monday morning (and no one’s weekend plans were impacted).
With our traditional older arrays we had parity groups that limited a rebuild to an 8-10 disk set at most, and for cost reasons we only kept 1-2 hot spares per unit. With VSAN we could have sustained a failure a day for the week, and capacity was our only concern.
Operationally VSAN gives you a bit of flexibility here.
So in the case of the default policy of stripe 1, it would be one disk read and one disk written? I don't see any benefit. What is the recommended disk striping strategy? Increasing the stripe width will increase the speed of operation and rebuild, but will also increase the chance of having to rebuild. Right?
It looks much better if you consider more than one VM per disk – you will get multiple rebuilds across multiple disks even with a disk stripe of 1.
This is only better than legacy arrays. Enterprise arrays already stripe across numerous disks, with rebuilds in minutes.