Slow backup of VM on VSAN Datastore

Someone at our internal field conference asked me why a full backup of a virtual machine on a VSAN datastore is slower than a full backup of the same virtual machine on a traditional storage array. Note that the test conducted here used a single virtual machine. The best way to explain why this happens is by taking a look at the architecture of VSAN. First, let me mention that the full backup of the VM on the traditional array was done on a storage system with many disks backing the datastore on which the virtual machine was located.

Virtual SAN, as hopefully all of you know, creates a shared datastore out of host-local resources. This datastore is formed out of disk and flash. Another thing to understand is that Virtual SAN is an object store. Each object is typically stored in a resilient fashion, and as such on two hosts, which is why three hosts is the minimum. By default the components of an object are not striped, which means that a component is in most cases stored on a single physical spindle. For a disk object with two mirror components this means that, without striping, the data is stored on two physical disks.

Now let's get back to the original question: why did the backup on VSAN take longer than on a traditional storage system? Looking at the above, it is fairly simple to explain. In the case of the traditional storage array you are reading from many disks (10+), but with VSAN you are only reading from 2 disks. As you can imagine, read performance and throughput will differ depending on the total number of disks (resources) that can serve the reads. In this test only a single virtual machine is being backed up, so the VSAN result will be different as it has fewer disks at its disposal; on top of that, as the VM is new, there is no data cached, so the flash layer is not used. Depending on your workload you can of course decide to stripe the components, but when it comes to backup you can also decide to increase the number of concurrent backups. If you increase the number of concurrent backups, the results will get closer, as more disks are being leveraged across all VMs. I hope that helps explain why results can be different; hopefully everyone understands that when you test things like this, parallelism (or the right stripe width) is important.
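To make the disk-count argument a bit more tangible, here is a rough back-of-the-envelope sketch in Python. The per-spindle throughput number and the function are purely illustrative assumptions of mine, not measured VSAN or array figures; the point is simply that a single backup stream scales with the number of spindles it can read from.

```python
# Toy model: aggregate sequential read throughput for a single full backup,
# assuming the stream is the only consumer and scales linearly with spindles.
PER_DISK_MBPS = 90  # assumed per-spindle sequential read throughput (illustrative)

def backup_throughput_mbps(spindles: int, per_disk_mbps: float = PER_DISK_MBPS) -> float:
    """Rough estimate of read throughput when streaming from N physical disks."""
    return spindles * per_disk_mbps

# Single VM, default VSAN policy: stripe width 1, so each mirror component
# sits on one spindle and reads come from roughly 2 disks.
print(backup_throughput_mbps(2))    # ~180 MB/s

# Same VM on a traditional array with 10+ spindles backing the datastore.
print(backup_throughput_mbps(10))   # ~900 MB/s

# Increasing the stripe width, or backing up multiple VMs concurrently,
# pulls more spindles into play and narrows the gap.
print(backup_throughput_mbps(8))    # e.g. stripe width 4 across two mirrors
```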

Software Defined Storage, which phase are you in?!

Working within R&D at VMware means you typically work with technology which is 1-2 years out, and discuss futures of products which are 2-3 years out. Especially in the storage space a lot has changed. Not just innovations within the hypervisor by VMware, like Storage DRS, Storage IO Control, VMFS-5, VM Storage Policies (SPBM), vSphere Flash Read Cache and Virtual SAN, but also by partners who offer software-based solutions like PernixData (FVP), Atlantis (ILIO) and SanDisk (FlashSoft). Of course there is the whole Server SAN / hyper-converged movement with Nutanix, ScaleIO, Pivot3, SimpliVity and others. Then there is a whole slew of new storage systems, some of which are scale-out and all-flash, others which focus more on simplicity; here we are talking about Nimble, Tintri, Pure Storage, XtremIO, Coho Data, SolidFire and many many more.

Looking at it from my perspective, I would say there are multiple phases when it comes to the SDS journey:

  • Phase 0 – Legacy storage with NFS / VMFS
  • Phase 1 – Legacy storage with NFS / VMFS + Storage IO Control and Storage DRS
  • Phase 2 – Hybrid solutions (Legacy storage + acceleration solutions or hybrid storage)
  • Phase 3 – Object granular policy driven (scale out) storage

<edit>

Maybe I should have abstracted a bit more:

  • Phase 0 – Legacy storage
  • Phase 1 – Legacy storage + basic hypervisor extensions
  • Phase 2 – Hybrid solutions with hypervisor extensions
  • Phase 3 – Fully hypervisor / OS integrated storage stack

</edit>

I have written about Software Defined Storage multiple times in the last couple of years and have worked with various solutions which are considered to be “Software Defined Storage”. I have a certain view of what the world looks like. However, when I talk to some of our customers, reality is different: some seem very happy with what they have in Phase 0. Although all of the above is the way of the future, and for some may be reality today, I do realise that Phases 1, 2 and 3 may be far away for many. I would like to invite all of you to share:

  1. Which phase you are in, and where you would like to go?
  2. What you are struggling with most today that is driving you to look at new solutions?

Good Read: Virtual SAN data locality white paper

I was reading the Virtual SAN Data Locality white paper. I think it is a well-written paper and I really enjoyed it. I figured I would share the link with all of you and provide a short summary. (http://blogs.vmware.com/vsphere/files/2014/07/Understanding-Data-Locality-in-VMware-Virtual-SAN-Ver1.0.pdf)

The paper starts with an explanation of what data locality is (also referred to as “locality of reference”) and explains the different types of latency experienced in Server SAN solutions (network, SSD). It then explains how Virtual SAN caching works, how locality of reference is implemented within VSAN, and why VSAN does not move data around, as the cost of doing so is high compared to the benefit for VSAN. It also demonstrates how VSAN delivers consistent performance, even without a local read cache. The key word here is consistent performance, something that is not the case for all Server SAN solutions. In some cases a significant performance degradation is experienced for minutes after a workload has been migrated. As hopefully all of you know, vSphere DRS runs every 5 minutes by default, which means that migrations can and will happen various times a day in most environments. (I have seen environments where 30 migrations a day was not uncommon.) The paper then explains where and when data locality can be beneficial, primarily when RAM is used and with specific use cases (like View), and then explains how CBRC aka View Accelerator (an in-RAM deduplicated read cache) could be used for this purpose. (It does not explain in depth how other Server SAN solutions leverage RAM for local read caching, but I am sure those vendors will have more detailed posts on that, which are worth reading!)

Couple of real gems in this paper, which I will probably read a couple of times in the upcoming days!

vSphere 5.5 and disk limits (mClock scheduler caveat)

I mentioned the new disk IO scheduler in vSphere 5.5 yesterday. When discussing this new disk IO scheduler, one thing that was brought to my attention is a caveat around disk limits. Let's start by saying that disk limits are a function of the host-local disk scheduler and not, I repeat, not Storage IO Control. This is a mistake many people make.

Now, when setting a limit on a virtual disk you define a limit in IOPS. The IOPS value specified is the maximum number of IOPS the virtual machine can drive. The caveat is as follows: the scheduler takes the IO size into account (it does this because a 64KB IO has a different cost than a 4KB IO), and the accounting is done in multiples of 32KB. If you do a 4KB IO it is counted as one IO, but if you do a 64KB IO it is counted as two IOs. Any IO larger than 32KB will be at least 2 IOs, as it is rounded up; in other words, a 40KB IO counts as 2 IOs and not 1.25 IOs. This also implies that there could be an unexpected result when you have an application doing relatively large IOs. If you set a limit of 100 IOPS but your app is doing 64KB IOs, then you will see your VM being limited to 50 IOPS, as each 64KB IO counts as 2 IOs instead of 1. So the formula here is: ceil(IO size in KB / 32).
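For those who like to see the math as code, here is a minimal sketch of the accounting rule described above. The function names are mine; the only behaviour modelled is the rounding up to multiples of 32KB.

```python
import math

def ios_charged(io_size_kb: float) -> int:
    """Number of 'IOs' a single request is charged against the IOPS limit.

    The scheduler accounts in multiples of 32KB, so the size is rounded up
    to the next 32KB boundary (with a minimum charge of one IO).
    """
    return max(1, math.ceil(io_size_kb / 32))

def effective_iops(limit_iops: int, io_size_kb: float) -> float:
    """Requests per second an application actually gets at a given IO size."""
    return limit_iops / ios_charged(io_size_kb)

print(ios_charged(4))           # 1  -> a 4KB IO counts as one IO
print(ios_charged(40))          # 2  -> rounded up, not 1.25
print(ios_charged(64))          # 2  -> a 64KB IO counts as two IOs
print(effective_iops(100, 64))  # 50.0 -> the 100 IOPS limit example above
```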

I think that is useful to know when you are limiting your virtual machines, especially because this is a change in behaviour compared to vSphere 5.1.

New disk IO scheduler used in vSphere 5.5

When 5.1 was released I noticed the mention of “mClock” in the advanced settings of a vSphere host. I tried enabling it but failed miserably. A couple of weeks back I noticed the same advanced setting again, but this time I also noticed it was enabled. So what is this mClock thingie? Well, mClock is the new disk IO scheduler used in vSphere 5.5. There isn’t much detail on mClock itself other than an academic paper by Ajay Gulati.


The paper describes in depth why mClock was designed and developed: primarily to provide a better IO scheduling mechanism that allows for limits, shares and, yes, also reservations. The paper also describes some interesting details around how different IO sizes and latency are taken into account. I recommend anyone who likes reading brain-hurting material to take a look at it. I am also digging internally for some more human-readable material; if I find out more I will let you know!
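For the curious, below is a very small and heavily simplified sketch of the tag-based idea described in the mClock paper, just to make the limits / shares / reservations concept concrete. This is my own toy interpretation, not the actual ESXi implementation, and all names are made up.

```python
from dataclasses import dataclass

@dataclass
class VM:
    name: str
    reservation: float  # minimum IOPS the VM should always get
    limit: float        # maximum IOPS the VM may consume
    weight: float       # relative share of spare capacity
    r_tag: float = 0.0
    l_tag: float = 0.0
    p_tag: float = 0.0

def tag_request(vm: VM, now: float) -> None:
    """Stamp the VM's next request with reservation, limit and share tags."""
    vm.r_tag = max(vm.r_tag + 1.0 / vm.reservation, now)
    vm.l_tag = max(vm.l_tag + 1.0 / vm.limit, now)
    vm.p_tag = max(vm.p_tag + 1.0 / vm.weight, now)

def pick_next(vms, now: float):
    """Serve reservations that are behind first, then schedule by shares
    among the VMs that are not being held back by their limit."""
    behind = [vm for vm in vms if vm.r_tag <= now]
    if behind:                                      # constraint-based phase
        return min(behind, key=lambda vm: vm.r_tag)
    eligible = [vm for vm in vms if vm.l_tag <= now]
    if not eligible:                                # everyone is at their limit
        return None
    return min(eligible, key=lambda vm: vm.p_tag)   # weight-based phase
```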