I have seen this question pop up several times now: people want to know how long a VSAN rebuild will take with large drives. It was asked on Twitter again today, and I think there are some common misconceptions out there when it comes to rebuilding. Maybe this tweet summarizes those misconceptions best:
@julian_wood you see rebuild times on a 1 TB vs 4 TB SATA lately?? I haven't heard too many VSAN node rebuild time stories yet.
— Rob Bergin (@rbergin) May 15, 2014
There are a couple of things I feel need to be set straight here:
- VSAN is an object store storage solution; each disk is a destination for objects
- There is no filesystem or RAID set spanning disks
I suggest you read the above twice. Once you know that there is no RAID set spanning disks, and no single filesystem formatted across multiple disks, you can conclude the following: if a disk fails, then only what is on that disk will need to be rebuilt. Let's look at an example:
I have a 4TB disk with 1TB of capacity used by virtual machine objects. The 4TB disk fails. Now the objects are more than likely out of compliance from an availability standpoint, and VSAN will start rebuilding the missing components of those objects. Notice I said “objects and components” and not “disk”. This means that VSAN will start reconstructing the 1TB worth of components of those impacted objects, and not the full 4TB! The total size of the lost components is what matters, not the total size of the lost disk.
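To make the example concrete, here is a minimal sketch in Python. It is not VSAN code: the `Component` class, the disk IDs and the component sizes are all hypothetical, made up purely to illustrate that the rebuild size is the sum of the lost components rather than the capacity of the failed disk.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """Hypothetical model of one component of a VM object on one disk."""
    object_name: str
    disk_id: str
    size_gb: int


def data_to_rebuild_gb(components, failed_disk_id):
    """Sum the sizes of the components that lived on the failed disk."""
    return sum(c.size_gb for c in components if c.disk_id == failed_disk_id)


# Illustrative placement only: 1000 GB of components on a 4000 GB disk.
components = [
    Component("vm1-disk", "disk-a", 600),
    Component("vm2-disk", "disk-a", 400),
    Component("vm1-disk", "disk-b", 600),  # replica on another disk, unaffected
]
DISK_A_CAPACITY_GB = 4000

rebuild_gb = data_to_rebuild_gb(components, "disk-a")
print(f"{rebuild_gb} GB to rebuild, not {DISK_A_CAPACITY_GB} GB")
```

The capacity of the failed disk never enters the calculation; only the components that were stored on it do.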
Now when VSAN starts rebuilding, it is good to know that all hosts that hold components of impacted objects will contribute to the rebuild. Even better, VSAN does not have to wait for the failed disk to be replaced or to return for duty… VSAN uses the whole VSAN cluster as a hot spare and will start rebuilding those components within your cluster, as long as there is sufficient disk capacity available of course. On top of that, the rebuilding logic of VSAN is smart… it will not just go all out, but will instead take the current workload into consideration. If you have virtual machines which are doing a lot of IO, then VSAN, while rebuilding, is smart enough to prioritize the rebuilding of those components in such a way that it will not hurt your workloads.
Now the question remains: how long will it take to rebuild 1TB worth of lost components? Well, that depends… And what does it depend on?
- Total size of the components of impacted objects that need to be rebuilt
- Number of hosts in the cluster
- Number of hosts contributing to the rebuild
- Number of disks per host
- Network infrastructure
- Current workload of VMs within the cluster
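To show how these factors pull against each other, here is a rough back-of-the-envelope sketch in Python. It is a toy model and not how VSAN actually schedules a rebuild: the function name, every parameter, the assumption that each contributing host is limited by the slower of its disks or its network link, and the `rebuild_share` fraction standing in for "bandwidth left over by the VM workload" are all my own assumptions, and the numbers in the usage example are placeholders rather than measurements.

```python
def estimate_rebuild_hours(rebuild_gb, contributing_hosts, disks_per_host,
                           disk_mb_per_s, network_mb_per_s_per_host,
                           rebuild_share):
    """Toy estimate of rebuild time in hours (hypothetical model).

    rebuild_share is the assumed fraction (0..1) of throughput that is
    left for the rebuild after the running VM workload takes its part.
    """
    # Assumption: a host moves data no faster than its disks or its network.
    per_host_mb_per_s = min(disks_per_host * disk_mb_per_s,
                            network_mb_per_s_per_host)
    effective_mb_per_s = contributing_hosts * per_host_mb_per_s * rebuild_share
    if effective_mb_per_s <= 0:
        raise ValueError("no throughput available for the rebuild")
    seconds = rebuild_gb * 1000 / effective_mb_per_s  # 1 GB = 1000 MB
    return seconds / 3600


# Placeholder inputs, chosen only to exercise the function.
hours = estimate_rebuild_hours(
    rebuild_gb=1000,
    contributing_hosts=4,
    disks_per_host=5,
    disk_mb_per_s=100,
    network_mb_per_s_per_host=1000,
    rebuild_share=0.5,
)
print(f"Estimated rebuild time: {hours:.2f} hours")
```

Even in this simplified form the answer swings widely as soon as you change the number of contributing hosts or the share left over by the workload, which is exactly why a single number is hard to give.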
A lot of variables indeed, difficult for me to predict how long it will take. This is something
Oh, and before I forget, congrats to the VSAN team for winning best of Microsoft TechEd in the virtualization category. WHAT? Yes, you read that correctly…