One versus multiple VSAN disk groups per host

I received two questions on the same topic this week, so I figured it would make sense to write something up quickly. The questions were around an architectural decision for VSAN: one versus multiple disk groups per host. I have explained the concept of disk groups already in various posts, but in short this is what a disk group is and what the requirements are:

A disk group is a logical container for the disks used by VSAN. Each disk group needs at a minimum 1 magnetic disk and can have a maximum of 7 magnetic disks. Each disk group also requires exactly 1 flash device.
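To make those composition rules concrete, here is a minimal sketch in Python (an illustration of the constraint only, not VSAN code; the function name is made up):

```python
# Illustration only (not VSAN code): the composition rules described above,
# i.e. exactly 1 flash device and 1 to 7 magnetic disks per disk group.

def is_valid_disk_group(flash_devices: int, magnetic_disks: int) -> bool:
    """Return True if the proposed disk group meets the composition rules."""
    return flash_devices == 1 and 1 <= magnetic_disks <= 7

print(is_valid_disk_group(1, 7))   # True  -> maximum size disk group
print(is_valid_disk_group(1, 0))   # False -> at least one magnetic disk is required
print(is_valid_disk_group(2, 3))   # False -> only one flash device per disk group
```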

VSAN disk groups

Now, when designing your VSAN cluster, at some point the question will arise: should I have one or multiple disk groups per host? Can and will it impact performance? Can it impact availability?

There are a couple of things to keep in mind when it comes to VSAN, if you ask me. The flash device that is part of each disk group is the caching/buffering layer for the disks in that group; without the flash device those disks will also be unavailable. As such, a disk group can be seen as a “failure domain”, because if the flash device fails the whole disk group is unavailable for that period of time. (Don’t worry, VSAN will automatically rebuild all the components that are impacted.) Another thing to keep in mind is performance. Each flash device will provide an X amount of IOPS. A higher total number of IOPS could (and probably will) change performance drastically, although it should be noted that capacity could still be a constraint. If this all sounds a bit fluffy, let’s run through an example!

  • Total capacity required: 20TB
  • Total flash capacity: 2TB
  • Total number of hosts: 5

This means that per host we will require:

  • 4TB of disk capacity (20TB/5 hosts)
  • 400GB of flash capacity (2TB/5 hosts)

This could simply result in each host having 2 x 2TB NL-SAS drives and 1 x 400GB flash device. Let’s assume your flash device is capable of delivering 36000 IOPS… You can see where I am going, right? What if I had 2 x 200GB flash devices and 4 x 1TB magnetic disks instead? Typically the lower-capacity drives will do fewer write IOPS, but for the Intel S3700 for instance that is only 4000 less. So instead of 1 x 36000 IOPS it would result in 2 x 32000 IOPS. Yes, that could have a nice impact indeed…

But not just that, we also end up with more disk groups and thus smaller fault domains. On top of that we end up with more magnetic disks, which means more IOPS per GB of capacity in general. (If an NL-SAS drive does 80 IOPS regardless of whether it is 1TB or 2TB, then two 1TB NL-SAS drives will do 160 IOPS: the same capacity, but twice the IOPS if you need it.)
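To put some numbers behind this, here is a quick back-of-the-envelope sketch comparing the two host layouts from the example. All figures are the assumed example numbers from above (36000/32000 flash IOPS, roughly 80 IOPS per NL-SAS drive), not measurements:

```python
# Back-of-the-envelope comparison of the two host layouts discussed above.
# The IOPS numbers are the assumed example figures, not vendor measurements.

configs = {
    "1 disk group":  {"flash": [36000],        "magnetic_tb": [2, 2]},       # 1 x 400GB flash, 2 x 2TB NL-SAS
    "2 disk groups": {"flash": [32000, 32000], "magnetic_tb": [1, 1, 1, 1]}  # 2 x 200GB flash, 4 x 1TB NL-SAS
}

NLSAS_IOPS_PER_DRIVE = 80  # rough figure per drive, as in the example above

for name, cfg in configs.items():
    flash_iops = sum(cfg["flash"])
    magnetic_iops = NLSAS_IOPS_PER_DRIVE * len(cfg["magnetic_tb"])
    capacity_tb = sum(cfg["magnetic_tb"])
    print(f"{name}: {capacity_tb}TB capacity, "
          f"{flash_iops} flash IOPS, {magnetic_iops} magnetic IOPS")

# Output:
# 1 disk group:  4TB capacity, 36000 flash IOPS, 160 magnetic IOPS
# 2 disk groups: 4TB capacity, 64000 flash IOPS, 320 magnetic IOPS
```

Same capacity per host, but twice the flash IOPS, twice the magnetic IOPS and two smaller failure domains instead of one.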

In summary: yes, there is a benefit in having more disk groups per host and as such more flash devices…

How long will VSAN rebuilding take with large drives?

I have seen this question pop up various times now, with people wanting to know how long a VSAN rebuild will take with large drives. It was also asked on Twitter today, and I think there are some common misconceptions out there when it comes to rebuilding. A tweet I saw today summarized those misconceptions nicely.

There are a couple of things I feel need to be set straight here:

  1. VSAN is an object-store storage solution; each disk is a destination for objects
  2. There is no filesystem or RAID set spanning disks

I suggest you read the above twice. If you know that there is no RAID set spanning disks, and no single filesystem formatted across multiple disks, you can conclude the following: if a disk fails, then what is on that disk is what needs to be rebuilt. Let’s look at an example:

I have a 4TB disk with 1TB of capacity used by virtual machine objects. The 4TB disk fails. Now the objects are more than likely out of compliance from an availability standpoint, and VSAN will start rebuilding the missing components of those objects. Notice I said “objects and components” and not “disk”. This means that VSAN will start reconstructing the 1TB worth of components of the impacted objects, and not the full 4TB! The total size of the lost components is what matters, not the total size of the lost disk.
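As a simple illustration, a sketch of that calculation; the component sizes are made-up example values, not real VSAN output:

```python
# Illustration only: what needs to be rebuilt is the sum of the component
# sizes that lived on the failed disk, not the raw capacity of that disk.
# The component sizes below are hypothetical example values.

failed_disk_capacity_gb = 4096  # the 4TB disk from the example

components_on_failed_disk_gb = [256, 128, 400, 240]  # hypothetical VM components

rebuild_gb = sum(components_on_failed_disk_gb)
print(f"Data to rebuild: {rebuild_gb}GB (not {failed_disk_capacity_gb}GB)")
# Data to rebuild: 1024GB (not 4096GB)
```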

Now, when VSAN starts rebuilding, it is good to know that all hosts holding components of the impacted objects will contribute to the rebuild. Even better, VSAN does not have to wait for the failed disk to be replaced or to return for duty… VSAN uses the whole cluster as a hot spare and will start rebuilding those components within your cluster, as long as there is sufficient disk capacity available of course. On top of that, the rebuild logic of VSAN is smart… it will not just go all out, but will instead take the current workload into consideration. If you have virtual machines doing a lot of IO, then VSAN, while rebuilding, is smart enough to prioritize the rebuild of those components in such a way that it will not hurt your workloads.

Now the question remains: how long will it take to rebuild 1TB worth of lost components? Well, that depends… And what does it depend on?

  • Total size of the components of impacted objects that need to be rebuilt
  • Number of hosts in the cluster
    • Number of hosts contributing to the rebuild
  • Number of disks per host
  • Network infrastructure
  • Current workload of VMs within the cluster

A lot of variables indeed, which makes it difficult for me to predict how long a rebuild will take.
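For what it is worth, here is one very rough way to reason about it. The per-host rebuild throughput and the number of contributing hosts are pure assumptions that you would need to replace with figures from your own environment:

```python
# Very rough estimate only. The throughput figure is an assumption: in reality
# it depends on disk type, network, number of disks per host and, most of all,
# on how much headroom the running VM workload leaves for resyncing.

def estimate_rebuild_hours(component_data_gb: float,
                           contributing_hosts: int,
                           rebuild_mbps_per_host: float) -> float:
    """Estimate rebuild time in hours for a given amount of lost component data."""
    aggregate_mbps = contributing_hosts * rebuild_mbps_per_host
    seconds = (component_data_gb * 1024) / aggregate_mbps
    return seconds / 3600

# 1TB of lost components, 4 hosts contributing, ~100MB/s of rebuild
# bandwidth left per host next to the normal VM workload (assumption).
print(f"{estimate_rebuild_hours(1024, 4, 100):.1f} hours")  # ~0.7 hours
```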

Oh, and before I forget, congrats to the VSAN team for winning Best of Microsoft TechEd in the virtualization category. WHAT? Yes, you read that correctly…

VPLEX Geosynchrony 5.2 supporting up to 10ms latency with HA/DRS

I was just informed that as of last week VPLEX Metro with Geosynchrony 5.2 has been certified for a round-trip time (RTT) latency of up to 10ms while running HA/DRS in a vMSC solution. Until now all vMSC solutions had been certified with 5ms RTT, so this is a major breakthrough if you ask me. Great to see that EMC spent the time certifying this, including support for HA and DRS across this distance.

Round-trip-time for a non-uniform host access configuration is now supported up to 10 milliseconds for VPLEX Geosynchrony 5.2 and ESXi 5.5 with NMP and PowerPath

More details on this topic can be found here:

Using differently sized disks in a VSAN environment

Internally someone asked this question, and at the Italian VMUG someone asked me the same thing… What if I want to scale out, or scale up, and need to add differently sized disks to my existing VSAN environment? Will that be as expensive an exercise as with (some) traditional RAID systems?

Those of you who have introduced new disks to RAID sets in the past may have seen this: you add a 2TB disk to a RAID config that only has 1TB disks and you waste 1TB, as the new disk can only contribute the same capacity as the other disks. Fortunately, VSAN is not like this!

With VSAN you can scale up and scale out dynamically. VSAN does not, to a certain extent, care about disk capacity. For VSAN a disk is just a destination to store objects; there is no filesystem or lower-level formatting going on that stripes blocks across disks. Sure, it uses a filesystem… but this is “local” to the disk, not spanning disks. So whether you add a 1TB disk to an environment with all 1TB disks, or you add a 2TB disk, it will not matter to VSAN. The same applies to replacing disks, by the way: if you need to replace a 1TB disk because it has gone bad and would like to use a 2TB disk instead… go ahead! Each disk will have its own filesystem, and the full capacity can be used by VSAN!
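Purely as an illustration of that difference in capacity accounting (not how VSAN calculates anything internally):

```python
# Illustration of the capacity difference described above (not VSAN code).
# A traditional striped RAID set is limited by its smallest member, while
# VSAN simply consumes each disk individually.

disks_tb = [1, 1, 1, 2]  # three 1TB disks plus a newly added 2TB disk

raid_set_usable_tb = min(disks_tb) * len(disks_tb)  # smallest disk x member count
vsan_usable_tb = sum(disks_tb)                      # every disk contributes fully

print(f"RAID set: {raid_set_usable_tb}TB usable")  # 4TB -> 1TB of the new disk wasted
print(f"VSAN:     {vsan_usable_tb}TB usable")      # 5TB -> the full 2TB disk is used
```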

The question then arises: will it make a difference if I use RAID-0 or passthrough at the disk controller level? Again, it does not. Keep in mind that when you do RAID-0 configurations for VSAN, each disk is in its own RAID-0 configuration. Meaning that if you have 3 disks, you will have 3 RAID-0 sets, each containing 1 disk. Of course, there is a small implication here when you replace disks, as you will need to remove the old RAID-0 set with that disk and create a new RAID-0 set with the new disk, but that is fairly straightforward.

One thing to keep in mind though, from an architectural / operational perspective… if you swap out a 1TB disk for a 2TB disk, you will need to ask yourself whether this will impact the experience for your customers. Will the performance be different? Because 100 IOPS coming from a 1TB disk is different from 100 IOPS coming from a 2TB disk, as you will (simply put) be sharing those 100 IOPS with more VMs (capacity). In short: 100 IOPS for a 1000GB disk = 0.1 IOPS per GB, but 100 IOPS for a 2000GB disk = 0.05 IOPS per GB; you can see the potential impact, right… You have more capacity per disk, but with the same number of IOPS being provided by that disk. Hopefully though, the majority of IO (all writes for sure, and most reads) will be handled by flash, so the impact should be relatively low. Still, something to consider.
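The same math in a couple of lines, using the example figure of 100 IOPS per disk:

```python
# The IOPS-per-GB math from the example above (100 IOPS is just the example figure).
disk_iops = 100

for capacity_gb in (1000, 2000):
    print(f"{capacity_gb}GB disk: {disk_iops / capacity_gb:.2f} IOPS per GB")

# 1000GB disk: 0.10 IOPS per GB
# 2000GB disk: 0.05 IOPS per GB
```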

Virtual Volumes vendor demos

I was at the Italian VMUG last week and one of the users asked me what Virtual Volumes would look like. He wanted to know if the experience would be similar to the “VM Storage Policy” experience he has been having with Virtual SAN. Unfortunately I didn’t have an environment running that was capable of demonstrating Virtual Volumes, so I shared the following videos with him. Considering I already did a blog post on this topic almost 2 years back, I figured I would also publicly share these videos. Note that these videos are demos/previews, and no statement is made about when or even if this technology will ever be released.