I have seen this question pop up various times now: people want to know how long a VSAN rebuild will take with large drives. It was asked on Twitter again today, and I think there are some common misconceptions out there when it comes to rebuilding. Maybe this tweet summarizes those misconceptions best:
https://twitter.com/rbergin/status/466908885165424641
There are a couple of things I feel need to be set straight here:
- VSAN is an object store storage solution; each disk is a destination for objects
- There is no filesystem or RAID set spanning disks
I suggest you read the above twice. Now, if you know that there is no RAID set spanning disks and no single filesystem formatted across multiple disks, you can conclude the following: if a disk fails, only what is on that disk will need to be rebuilt. Let's look at an example:
I have a 4TB disk with 1TB of capacity used by virtual machine objects. The 4TB disk fails. Now the objects are more than likely out of compliance from an availability standpoint, and VSAN will start rebuilding the missing components of those objects. Notice I said “objects and components” and not “disk”. This means that VSAN will start reconstructing the 1TB worth of components of those impacted objects, and not the full 4TB! The total size of the lost components is what matters, not the total size of the lost disk.
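To make that arithmetic explicit, here is a trivial sketch in Python; the 4TB and 1TB figures are just the example above, nothing measured:

```python
# Back-of-the-envelope: what VSAN actually has to rebuild after a disk failure.
# Only the capacity consumed by components on the failed disk is reconstructed,
# never the raw size of the disk itself.

failed_disk_size_tb = 4.0      # raw capacity of the failed disk
used_by_components_tb = 1.0    # capacity consumed by VM object components on it

data_to_rebuild_tb = used_by_components_tb   # not failed_disk_size_tb!
print(f"VSAN rebuilds {data_to_rebuild_tb} TB of components, "
      f"not the full {failed_disk_size_tb} TB disk")
```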
Now when VSAN starts rebuilding, it is good to know that all hosts that hold components of the impacted objects will contribute to the rebuild. Even better, VSAN does not have to wait for the failed disk to be replaced or return for duty… VSAN uses the whole cluster as a hot spare and will start rebuilding those components within your cluster, as long as there is sufficient disk capacity available of course. On top of that, the rebuilding logic of VSAN is smart… it does not just go all out, it takes the current workload into consideration. If you have virtual machines which are doing a lot of IO, then VSAN, while rebuilding, is smart enough to prioritize the rebuild of those components in such a way that it will not hurt your workloads.
Now the question remains, how long will it take to rebuild 1TB worth of lost components? Well that depends… And what does it depend on?
- Total size of the components to be rebuilt for the impacted objects
- Number of hosts in the cluster
- Number of hosts contributing to the rebuild
- Number of disks per host
- Network infrastructure
- Current workload of VMs within the cluster
A lot of variables indeed, which makes it difficult for me to predict how long a rebuild will take in your environment.
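For those who like to play with numbers, here is a minimal sketch of how those variables interact. The throughput figures (per-disk resync rate, per-host network bandwidth, and the share of throughput left over for resync while your VMs keep doing IO) are illustrative assumptions on my part, not VSAN internals, so treat the output as a ballpark at best:

```python
def estimate_rebuild_hours(component_tb, contributing_hosts, disks_per_host,
                           per_disk_mbps=100.0, host_network_gbps=10.0,
                           resync_share=0.5):
    """Ballpark rebuild-time estimate; every default here is an assumption.

    component_tb       -- total size of the components to be rebuilt (TB)
    contributing_hosts -- hosts holding components of the impacted objects
    disks_per_host     -- capacity disks per contributing host
    per_disk_mbps      -- assumed resync throughput per disk
    host_network_gbps  -- assumed network bandwidth per host
    resync_share       -- assumed fraction of throughput left after VM IO
    """
    # Rebuild traffic flows many-to-many, so throughput aggregates across
    # all contributing hosts and their disks...
    disk_mbps = contributing_hosts * disks_per_host * per_disk_mbps
    net_mbps = contributing_hosts * host_network_gbps * 1000 / 8
    # ...but is bounded by the slower of disks and network, and VSAN
    # deliberately leaves headroom for the running workloads.
    usable_mbps = min(disk_mbps, net_mbps) * resync_share
    return component_tb * 1024 * 1024 / usable_mbps / 3600

# Example: 1TB of lost components, 4 contributing hosts with 5 disks each
print(f"{estimate_rebuild_hours(1.0, 4, 5):.1f} hours")
```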
Oh, and before I forget, congrats to the VSAN team for winning best of Microsoft TechEd in the virtualization category. WHAT? Yes you read that correctly…
Julian Wood says
Hi Duncan. I was part of the twitter conversation at the London VMUG that prompted the question from Rob. The discussion was actually around the difference between recovery of a RAID volume with traditional storage and VSAN recovery with a host failure.
The discussion was around failure domains. With a single drive failure in a SAN RAID scenario, the entire drive contents are rebuilt onto another drive, but that is at most the total space of a drive, say 4TB. A VSAN datastore can have up to 35 disks on a host. If an entire host fails, or even needs to be put into longer term maintenance mode, the total used capacity of all the VSAN blocks stored on all drives in that host will need to be rebuilt elsewhere; this takes a lot more network bandwidth and data shuffling than a single drive failure in an external array. Sure, if multiple drives die in an external array you are in serious trouble.
This is not to say VSAN is bad in any way, people just need to understand failure domains and what needs to be rebuilt when something dies, be it a disk or a host.
Perfect explanation though to get people away from thinking about VSAN in terms of RAID recovery. There is no RAID data protection with VSAN. VMs are assigned a policy which then copies their blocks onto multiple drives and hosts to satisfy performance and availability.
Thanks for sharing. 🙂
Duncan Epping says
Sure, for long term maintenance it is different. Then again, if you plan extremely long term maintenance I would suspect that it doesn't matter too much whether it takes 1 hr or 3 hrs… it is long term anyway.
Also, keep in mind that VSAN doesn't move the bits JUST from the host going into maintenance mode. It also leverages the components on other hosts to reconstruct. So it is not one to many, but many to many.
Rob Bergin says
Glad Julian chimed in.
You grabbed my tweet, but it may be out of context without Julian's.
Mine was a response to Julian's, who tweeted “So is VSAN a SPOF during host failure with 1 replica for 60 mins before rebuild starts? #LonVMUG”.
I thought to myself – 60 minutes + rebuild time isn't bad compared to the RAID rebuild times of a 1-4 TB SATA drive in a RAID group (which can take a significant amount of time and was one of the primary drivers behind RAID-DP – eliminating dual HDD failures) (disclaimer: I work for NetApp).
And the same thinking about a dual failure applies to a Virtual SAN.
I assume most production environments will likely run at least N+2 for Host Failures because if you are running 35 drives (say 1-4 TB) – the usable capacity of that Virtual SAN node could be rather large when it comes to things like Maintenance Mode or Node Failure.
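A quick back-of-the-envelope illustration of that capacity; the drive count, drive size and fill level below are made-up example numbers:

```python
# How much data could need re-protection when a full VSAN node is lost?
# Example figures only: 35 capacity drives of 4TB each, 40% used.
drives_per_host = 35
drive_size_tb = 4
used_fraction = 0.4

raw_tb = drives_per_host * drive_size_tb
to_rebuild_tb = raw_tb * used_fraction
print(f"{raw_tb} TB raw per node, roughly {to_rebuild_tb:.0f} TB of components "
      f"to rebuild elsewhere -- hence the case for N+2 capacity headroom")
```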
Duncan Epping says
I grabbed this as an example; I have had more questions on this topic. The amount of time it takes before a rebuild occurs will depend on the type of failure. If it is a single disk failure then it is instant. A host failure is 60 minutes by default, but it could be longer.
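To make that concrete, here is a tiny sketch of the resulting time-to-compliance; the function and its names are mine, purely for illustration, and the 60-minute figure is just the default delay mentioned above:

```python
def minutes_until_compliant(failure_type, rebuild_minutes, repair_delay_minutes=60):
    """How long until an object regains its configured availability?

    A failed disk is treated as a permanent failure, so the rebuild starts
    right away. An absent host (failure, or maintenance mode) first waits out
    the repair delay -- 60 minutes by default, possibly longer -- in case the
    host comes back and nothing needs to be rebuilt at all.
    """
    delay = 0 if failure_type == "disk" else repair_delay_minutes
    return delay + rebuild_minutes

print(minutes_until_compliant("disk", rebuild_minutes=20))   # 20
print(minutes_until_compliant("host", rebuild_minutes=20))   # 80
```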
John Nicholson. says
In terms of discussions about double drive failures, node failures etc., I'd point people to IBM's XIV experience with a similar design (scale-out mirroring at the sub-disk level with a semi-random distribution, giving a many-to-many restore system). While IBM's smaller-chunk system will in theory mean faster restores, its restore behavior is similar enough.
Tony Pearson has a great post discussing their experience with this.
https://www.ibm.com/developerworks/community/blogs/InsideSystemStorage/entry/ddf-debunked-xiv-two-years-later?lang=en