Slow backup of VM on VSAN Datastore

Someone at our internal field conference asked me why a full backup of a virtual machine on a VSAN datastore is slower than a full backup of the same virtual machine on a traditional storage array. Note that the test conducted here was done with a single virtual machine. The best way to explain why this is the case is by taking a look at the architecture of VSAN. First, let me mention that the full backup of the VM on the traditional array was done on a storage system that had many disks backing the datastore on which the virtual machine was located.

Virtual SAN, as hopefully all of you know, creates a shared datastore out of host local resources. This datastore is formed out of disk and flash. Another thing to understand is that Virtual SAN is an object store. Each object is typically stored in a resilient fashion and as such on two hosts, with a witness on a third, hence 3 hosts being the minimum. By default the components of an object are not striped, which means that components are in most cases stored on a single physical spindle. As you can see in the diagram below, this means that the disk (object) has two components and, without stripes, is stored on 2 physical disks.
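
To make the component math a bit more concrete, here is a minimal Python sketch of how failures to tolerate (FTT) and stripe width translate into the number of data components of a mirrored object; the function name and the example values are mine, purely for illustration.

    def vsan_mirror_components(ftt: int = 1, stripe_width: int = 1) -> int:
        """Data components for a RAID-1 (mirrored) VSAN object.

        Each replica is split into `stripe_width` components and
        (ftt + 1) replicas are kept. Witness components come on top
        of this and are not counted here.
        """
        return (ftt + 1) * stripe_width

    # Defaults described in this post: failures to tolerate = 1, no striping.
    print(vsan_mirror_components())                # 2 components -> 2 spindles
    print(vsan_mirror_components(stripe_width=4))  # 8 components -> up to 8 spindles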

Now let's get back to the original question: why did the backup on VSAN take longer than with a traditional storage system? It is fairly simple to explain with the above info in mind. In the case of the traditional storage array you are reading from multiple disks (10+), but with VSAN you are only reading from 2 disks. As you can imagine, read performance / throughput will differ depending on the resources that the total number of disks being read from can provide. In this test, as there is just a single virtual machine being backed up, the VSAN result will be different because it has a lower number of disks (resources) at its disposal, and on top of that, because the VM is new, there is no data cached, so the flash layer is not used. Depending on your workload you can of course decide to stripe the components, but when it comes to backup you can also decide to increase the number of concurrent backups… if you increase the number of concurrent backups, the results will get closer, as more disks are being leveraged across all VMs. I hope that helps explain why results can be different, but hopefully everyone understands that when you test things like this, parallelism is important, or you should provide the right stripe width.
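
To put some rough numbers on the argument, below is a back-of-the-envelope sketch; the per-spindle throughput and disk counts are assumptions purely for illustration, not measurements.

    # Back-of-the-envelope illustration of why the number of spindles being
    # read from matters. All numbers are assumptions, not measurements.

    DISK_MBPS = 100  # assumed sequential read throughput of a single spindle

    def backup_throughput(disks_read_from: int, concurrent_backups: int = 1) -> int:
        """Aggregate read throughput (MB/s), assuming each backup job reads
        from `disks_read_from` spindles and jobs do not share spindles."""
        return DISK_MBPS * disks_read_from * concurrent_backups

    # Single VM on a traditional array: the datastore is backed by many disks.
    print(backup_throughput(disks_read_from=10))                       # ~1000 MB/s

    # Single VM on VSAN with the default policy: two components, two spindles.
    print(backup_throughput(disks_read_from=2))                        # ~200 MB/s

    # Same VSAN cluster, but five VMs backed up concurrently (or a higher
    # stripe width): far more spindles are read from in parallel.
    print(backup_throughput(disks_read_from=2, concurrent_backups=5))  # ~1000 MB/s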

What is coming for vSphere and VSAN? VMworld reveals…

I’ve been prepping a presentation for upcoming VMUGs, but wanted to also share this with my readers. The session is all about vSphere futures: what is coming soon? Before anyone says I am breaking NDA: I’ve harvested all of this info from public VMworld sessions, except for the VSAN details, which were announced to the press at VMworld EMEA. Let's start with Virtual SAN…

The Virtual SAN details were posted in this Computer Weekly article, and by the looks of it they interviewed VMware’s CEO Pat Gelsinger and Alberto Farronato from the VSAN product team. So what is coming soon?

  • All Flash Virtual SAN support
    Considering the price of MLC flash has dropped to roughly the same per-GB price as SAS HDDs, I think this is a great new feature to have. Being able to build all-flash configurations at the price point of a regular configuration, probably with many supported configurations, is a huge advantage of VSAN. I would expect VSAN to support various types of flash as the “capacity” layer, so this is an architect's dream… designing your own all-flash storage system!
  • Virsto integration
    I played with Virsto when it was just released and was impressed by the performance and the scalability. Functions that were part of Virsto, such as snapshots and clones, have now been built into VSAN, and it will bring VSAN to the next level!
  • JBOD support
    Something many have requested, primarily to be able to use VSAN in blade environments… Well, with the JBOD support announced, this will be a lot easier. I don’t know the exact details, but just the “JBOD” part got me excited.
  • 64 host VSAN cluster support
    VSAN doesn’t scale? Here you go!

That is a nice list by itself, and I am sure there is plenty more for VSAN. At VMworld, for instance, Wade Holmes also spoke about support for disk controller based encryption. Cool right?! So what about vSphere? Considering even the version number was dropped during the keynote, hinting at a major release, you would expect some big functionality to be introduced. Once again, all the stuff below is harvested from various public VMworld sessions:

  • VMFork aka Project Fargo – discussed here…
  • Increased scale!
    • 64 host HA/DRS clusters. I know a handful of customers who asked for 64 host clusters, so here it is guys… or better said: soon you will have it!
  • SMP vCPU FT – up to 4 vCPU support
    • I like FT from an innovation point of view, but it isn’t a feature I would personally use too much as I feel “fault tolerance” from an app perspective needs to be solved by the app. Now, I do realize that there are MANY legacy applications out there, and if you have a scale-up application which needs to be highly available then SMP FT is very useful. Do note that with this release the architecture of FT has changed. For instance you used to share the same “VMDK” for both primary and secondary, but that is no longer the case.
  • vMotion across anything
    • vMotion across vCenter instances
    • vMotion across Distributed Switch
    • vMotion across very large distance, support up to 100ms latency
    • vMotion to vCloud Air datacenter
  • Introduction of Virtual Datacenter concept in vCenter
    • Enhances the “policy driven” experience within vCenter. A Virtual Datacenter aggregates compute clusters, storage clusters, networks, and policies!
  • Content Library
    • Content Library provides storage and versioning of files including VM templates, ISOs, and OVFs.
    • Includes powerful publish and subscribe features to replicate content
    • Backed by vSphere Datastores or NFS
  • Web Client performance / enhancement
    • Recent tasks pane drops to the bottom instead of on the right
    • Performance vastly improved
    • Menus flattened
  • DRS placement “network aware”
    • Hosts with high network contention can still show low CPU and memory usage; DRS will now take network load into account when looking for VM placements
    • Provide network bandwidth reservation for VMs and migrate VMs in response to reservation violations!
  • vSphere HA component protection
    • Helps when hitting “all paths down” situations by allowing HA to take action on impacted virtual machines
  • Virtual Volumes, bringing the VSAN “policy goodness” to traditional storage systems

Of course there is more, but these are the ones that were discussed at VMworld… for the remainder you will have to wait until the next version of vSphere is released, or, I believe, you can still sign up for the beta!

Project Fargo aka VMFork – What is it?

I have seen various people talking about Project Fargo (also known as VMFork) and what struck me is that many are under the impression that Project Fargo is the result of the CloudVolumes acquisition. Let's set that straight first: Project Fargo is not based on any technology developed by the CloudVolumes team. Project Fargo has been developed in house and, as far as I can tell, is an implementation of SnowFlock (University of Toronto / Carnegie Mellon University), although I know that in house they have been looking at techniques like these for a long time. Okay, now that we have that out of the way, what is Project Fargo?

Simply said: Project Fargo is a solution that enables you to rapidly clone a running VM. When I say “rapidly clone”, I mean RAPIDLY… within seconds. Yes, that is extremely fast for a running VM. What should be noted here, of course, is that it is not a full clone. I guess this is where the “VMFork” name comes in to play: the “parent” virtual machine is quiesced and forked, and a “child” VM is born. This child VM leverages the disk and memory of the parent (for reads), which is why it is so extremely fast to instantiate… as I said, literally seconds, as it “only” needs to create empty delta files, create a VMX and instantiate the process, and do some networking magic, as you do not want VMs popping up on the network with the same MAC address. Note that the child VM starts where the parent VM left off, so there is no boot process; it is instant-on (just as if you suspend and resume it)! I can’t reveal too much around how this works yet, but you can imagine that a technique like “fast suspend resume” (FSR), which is the cornerstone of features like Storage vMotion, is leveraged.

The question then arises: what if the child wants to write data to memory or disk? This is where the “copy on write” technique comes in to play. Of course the child won’t be allowed to overwrite shared memory pages (or disk, for that matter), and as such a new page will be allocated. For those having a hard time visualizing it, note that the diagram below is conceptual and not how it is actually implemented; I should maybe have drawn the different layers, but that would make it too complex. In this scenario you see a single parent with a single child, but you can imagine there could also be 10 child VMs or more; you can see how efficient that would be in terms of resource sharing! And even for the pages which would be unique compared to the parent, if you clone many similar VMs there is a significant chance that TPS will be able to collapse those! One thing to point out here is that the parent VM is quiesced; in other words, its sole purpose is allowing for the quick creation of child VMs.

[Image: Project Fargo conceptual diagram]
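
For those who prefer code over diagrams, here is a tiny conceptual copy-on-write model in Python; it only illustrates the sharing technique and is in no way how ESXi implements VMFork (the class and method names are mine).

    # Conceptual copy-on-write model of a forked VM's memory -- an
    # illustration of the technique only, not how ESXi implements VMFork.

    class ParentVM:
        def __init__(self, pages: dict[int, bytes]):
            self.pages = pages  # quiesced parent: read-only "golden" pages

    class ChildVM:
        def __init__(self, parent: ParentVM):
            self.parent = parent
            self.private_pages: dict[int, bytes] = {}  # starts empty (the "delta")

        def read(self, page_no: int) -> bytes:
            # Reads fall through to the shared parent pages unless the child
            # has already written (and therefore copied) that page.
            return self.private_pages.get(page_no, self.parent.pages[page_no])

        def write(self, page_no: int, data: bytes) -> None:
            # Copy-on-write: the shared page is never modified; the child
            # gets its own private copy holding the new contents.
            self.private_pages[page_no] = data

    parent = ParentVM({0: b"kernel", 1: b"app"})
    children = [ChildVM(parent) for _ in range(10)]  # ten forks share the same pages
    children[0].write(1, b"patched app")
    print(children[0].read(1), children[1].read(1))  # b'patched app' b'app'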

Cool piece of technology, I agree, but what would the use case be? Well, there are multiple use cases, and those who will be attending VMworld should definitely visit the sessions which will discuss this topic, or watch them online (SDDC3227, SDDC3281, EUC2551, etc.). I think there are 2 major use cases: virtual desktops and test/dev.

The virtual desktop (just-in-time desktops) use case is pretty obvious… You create that parent VM, spin it up, it gets quiesced, and you can start forking that parent when needed. This will be almost instant, very efficient, and it will also reduce the required resource capacity for VDI environments.

With test/dev scenarios you can imagine that when testing software you don’t want to wait for lengthy cloning processes to finish. Forking a VM will allow you to rapidly test what has been developed: within seconds you have a duplicate environment which you can use / abuse any way you like and destroy when done. As the disk footprint is small, create/destroy will have a minimal impact on your existing infrastructure, both from a resource and a “stress” point of view. It basically means that your testing will take less time end-to-end.

Can’t wait for it to be available so I can start testing it; especially when combined with products like CloudVolumes and Virtual SAN, this feature has a lot of potential.

** UPDATE: Various people asked what would happen with VMFork now that TPS is disabled by default in upcoming versions of vSphere. I spoke with the lead engineer on this topic and he assured me there is no impact on VMFork. The disabling of TPS will be overruled per VMFork group, so the parent and children belonging to the same group will be able to leverage TPS and share pages. **

It is all about choice

Over the last couple of years we’ve seen a major shift in the market towards the software-defined datacenter. This has resulted in many new products, features and solutions being brought to market. What struck me over the last couple of days, though, is that many of the articles I have read in the past 6 months (and written as well) were about hardware, and in many cases about the form factor and how it has changed. Then there are the posts around hyper-converged vs traditional, or all-flash storage solutions vs server-side caching. Although we are moving towards a software-defined world, it seems that administrators / consultants / architects still very much live in the physical world. In many of these cases it even seems like there is a certain prejudice when it comes to the various types of products and the form factor they come in, and whether that is 2U vs blade or software vs hardware is beside the point.

When I look at discussions being held around whether a server-side caching solution is preferred over an all-flash array, which is just another form factor discussion if you ask me, the only right answer that comes to mind is “it depends”. It depends on what your business requirements are, what your budget is, whether there are any constraints from an environmental perspective, the hardware life cycle, what your staff’s expertise / knowledge is, etc. It is impossible to provide a single answer and solution to all the problems out there. What I realized is that what the software-defined movement actually brought us is choice, and in many of these cases the form factor is just a tiny aspect of the total story. It seems to be important to many people though, maybe still an inheritance from the “server hugger” days when hardware was king? Those times are long gone if you ask me.

In some cases a server-side caching solution will be the perfect fit, for instance when ultra-low latency and use of the existing storage infrastructure are requirements. In other cases bringing in an all-flash array may make more sense, or a hyper-converged appliance could be the perfect fit for that particular use case. What is more important, though, is how these components will enable you to optimize your operations, how they will enable you to build that software-defined datacenter and help you meet the demands of the business. This is what you will need to ask yourself when looking at these various solutions, and if there is no clear answer… there is plenty of choice out there, so stay open minded and go explore.

VMware EVO:RAIL demos

I just bumped into a bunch of new VMware EVO:RAIL demos which I wanted to share. Especially the third demo, which shows how EVO:RAIL scales out with a couple of simple clicks.

General overview:

Customer Testimonial:

Clustering appliances:

Management experience:

Configuration experience:
