Project Fargo aka VMFork – What is it?

Duncan Epping · Oct 7, 2014 ·

I have seen various people talking about Project Fargo (also known as VMFork or Instant Clone), and what struck me is that many are under the impression that Project Fargo is the result of the CloudVolumes acquisition. Let’s set that straight first: Project Fargo is not based on any technology developed by the CloudVolumes team. Project Fargo has been developed in-house and, as far as I can tell, is an implementation of SnowFlock (University of Toronto / Carnegie Mellon University), although I know that VMware has been looking at techniques like these internally for a long time. Okay, now that we have that out of the way, what is Project Fargo?

Simply said: Project Fargo is a solution that enables you to rapidly clone a running VM. When I say “rapidly clone”, I mean RAPIDLY… within seconds. Yes, that is extremely fast for a running VM. What should be noted here, of course, is that it is not a full clone. I guess this is where the “VMFork” name comes into play: the “parent” virtual machine is quiesced and forked, and a “child” VM is born. This child VM leverages the disk and memory of the parent (for reads), which is why it is so extremely fast to instantiate. As I said, literally seconds, as it “only” needs to create empty delta files, create a VMX, instantiate the process, and do some networking magic, as you do not want VMs popping up on the network with the same MAC address. Note that the child VM starts where the parent VM left off, so there is no boot process; it is instant-on (just like when you suspend and resume a VM)! I can’t reveal too much about how this works yet, but you can imagine that a technique like “fast suspend resume” (FSR), which is the cornerstone of features like Storage vMotion, is leveraged.
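
To make the sequence above a bit more tangible, here is a minimal conceptual sketch in Python. It is not VMware's implementation and does not use any vSphere API; the class names, the `fork` function, and the `02:00:00` MAC prefix are all made up for illustration. It only models the idea that a child starts with empty delta structures, points at a quiesced parent, and gets its own MAC address.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ParentVM:
    name: str
    quiesced: bool = False  # a quiesced parent no longer runs; it only serves forks


@dataclass
class ChildVM:
    name: str
    parent: ParentVM  # disk and memory reads fall through to the parent
    mac: str          # must differ from every other VM on the network
    delta_disk: dict = field(default_factory=dict)     # empty at fork time
    private_pages: dict = field(default_factory=dict)  # empty at fork time


def new_mac(rng: random.Random) -> str:
    """Return a random MAC in a locally administered range (illustrative only)."""
    tail = [rng.randrange(256) for _ in range(3)]
    return "02:00:00:" + ":".join(f"{b:02x}" for b in tail)


def fork(parent: ParentVM, name: str, used_macs: set, rng: random.Random) -> ChildVM:
    """Create a child that starts where the quiesced parent left off."""
    if not parent.quiesced:
        raise RuntimeError("parent must be quiesced before it can be forked")
    mac = new_mac(rng)
    while mac in used_macs:  # never reuse an address already on the network
        mac = new_mac(rng)
    used_macs.add(mac)
    return ChildVM(name=name, parent=parent, mac=mac)


# Hypothetical usage: one quiesced parent, three forked desktops.
rng = random.Random(42)
used_macs = set()
parent = ParentVM("desktop-parent", quiesced=True)
children = [fork(parent, f"desktop-{i}", used_macs, rng) for i in range(3)]
```

Nothing is copied at fork time in this model, which is the reason the operation can be so quick: the only per-child work is creating a few empty structures and a unique network identity.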

The question then arises: what if the child wants to write data to memory or disk? This is where the “copy on write” technique comes into play. Of course the child won’t be allowed to overwrite shared memory pages (or disk blocks, for that matter), and as such a new page will be allocated for the child. For those having a hard time visualizing it: note that the diagram is conceptual and not how it is actually implemented. I should maybe have drawn the different layers, but that would have made it too complex. In this scenario you see a single parent with a single child, but you can imagine there could also be 10 child VMs or more, and you can see how efficient that would be in terms of resource sharing! And even for the pages which are unique compared to the parent, if you clone many similar VMs there is a significant chance that TPS will be able to collapse those as well! One thing to point out here is that the parent VM is quiesced; in other words, its sole purpose is allowing for the quick creation of child VMs.
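
The copy-on-write behaviour can be sketched in a few lines of Python. This is a simplified model under my own assumptions (whole pages are replaced on write, and the class names `SharedPages` and `ChildMemory` are hypothetical), not a description of how the hypervisor actually does it.

```python
class SharedPages:
    """Read-only memory image of the quiesced parent."""

    def __init__(self, pages):
        self._pages = dict(pages)

    def read(self, page_number):
        return self._pages[page_number]


class ChildMemory:
    """A child's view of memory: private pages first, the parent's otherwise."""

    def __init__(self, shared):
        self._shared = shared
        self._private = {}  # pages allocated only when the child writes

    def read(self, page_number):
        if page_number in self._private:
            return self._private[page_number]
        return self._shared.read(page_number)

    def write(self, page_number, data):
        # The shared page is never modified; the child gets its own page.
        self._private[page_number] = data

    def private_page_count(self):
        return len(self._private)


# Hypothetical usage: ten children share one parent image.
parent_image = SharedPages({0: b"kernel", 1: b"app"})
children = [ChildMemory(parent_image) for _ in range(10)]
children[0].write(1, b"changed")
```

After the write, only the first child sees the changed page; the other nine children and the parent still see the original content, and across all ten children just one extra page has been allocated.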

Cool piece of technology, I agree, but what would the use case be? Well, there are multiple use cases, and those who will be attending VMworld should definitely visit the sessions which will discuss this topic or watch them online (SDDC3227, SDDC3281, EUC2551 etc). I think there are two major use cases: virtual desktops and test/dev.

The virtual desktop (just-in-time desktops) use case is pretty obvious… You create that parent VM, spin it up, it gets quiesced, and you can start forking that parent when needed. This will be almost instant and very efficient, and it will also reduce the required resource capacity for VDI environments.
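
To get a feel for the capacity argument, here is a back-of-the-envelope calculation in Python. The numbers are assumptions for illustration only (100 desktops of 4 GB each, with 10% of each child's pages ending up unique); real sharing ratios depend entirely on the workload, and the function name is hypothetical.

```python
def host_memory_needed_gb(children: int, vm_size_gb: float, unique_fraction: float) -> float:
    """Estimate backing memory when children share the parent's pages.

    The parent image is stored once; each child only adds the pages it
    has written itself (its copy-on-write pages).
    """
    if not 0.0 <= unique_fraction <= 1.0:
        raise ValueError("unique_fraction must be between 0 and 1")
    return vm_size_gb + children * vm_size_gb * unique_fraction


# Assumed scenario: 100 desktops, 4 GB each, 10% unique pages per child.
assigned_gb = 100 * 4
estimated_gb = host_memory_needed_gb(children=100, vm_size_gb=4, unique_fraction=0.10)
```

Under these assumed numbers the desktops have 400 GB assigned in total, while the estimate for what actually has to be backed is roughly 44 GB. If every page of every child became unique, the estimate would exceed the assigned total by the size of the parent image, so the benefit stands or falls with how much the children diverge.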

With test/dev scenarios you can imagine that when testing software you don’t want to wait for lengthy cloning processes to finish. Forking a VM will allow you to rapidly test what has been developed: within seconds you have a duplicate environment which you can use / abuse any way you like and destroy when done. As the disk footprint is small, create/destroy will have a minimal impact on your existing infrastructure, both from a resource and a “stress” point of view. It basically means that your testing will take less time “end-to-end”.

Can’t wait for it to be available and to start testing it. Especially when combined with products like CloudVolumes and Virtual SAN, this feature has a lot of potential.

** UPDATE: Various people asked what would happen with VMFork now that TPS is disabled by default in upcoming versions of vSphere. I spoke with the lead engineer on this topic and he assured me there is no impact on VMFork. The disabling of TPS will be overruled per VMFork group, so the parent and children belonging to the same group will be able to leverage TPS and share pages. **
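
One way to picture “sharing per group” is to treat the group as part of the key used when looking for identical pages. The Python sketch below is only my own illustration of that idea, with hypothetical names and data; it does not describe the actual mechanism in vSphere. Identical pages collapse into one only when the VMs are in the same group, and a VM without a group shares pages only with itself.

```python
import hashlib


def count_backing_pages(vm_pages: dict, vm_group: dict) -> int:
    """Count distinct backing pages when sharing is limited to a group.

    vm_pages: VM name -> list of page contents (bytes)
    vm_group: VM name -> group id; VMs missing here share only with themselves
    """
    backing = set()
    for vm, pages in vm_pages.items():
        group = vm_group.get(vm)
        scope = ("group", group) if group is not None else ("vm", vm)
        for content in pages:
            digest = hashlib.sha256(content).hexdigest()
            backing.add((scope, digest))  # same scope + same content = one page
    return len(backing)


# Hypothetical data: a parent, two forks, and an unrelated VM with identical pages.
vm_pages = {
    "parent": [b"os", b"app"],
    "child-1": [b"os", b"app"],
    "child-2": [b"os", b"changed"],
    "unrelated": [b"os", b"app"],
}
fork_group = {"parent": "fork-A", "child-1": "fork-A", "child-2": "fork-A"}

pages_with_group = count_backing_pages(vm_pages, fork_group)
pages_without_group = count_backing_pages(vm_pages, {})
```

In this toy data set the fork group needs three backing pages and the unrelated VM needs two more, five in total, whereas without any grouping each VM keeps its own two pages, eight in total. The unrelated VM never shares with the fork group, even though its page contents are identical.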

Comments

Richard May says

7 October, 2014 at 17:48

*cough* SnapClone *cough*
- Duncan Epping says
  
  7 October, 2014 at 21:42
  
  not sure which snapclone you are referring to as about 20 storage vendors use that name 🙂
FabCam says

7 October, 2014 at 17:50

Definitely cool technology… Will it be a replacement for View Composer?
F.
- duncan says
  
  7 October, 2014 at 20:41
  
  Can’t comment on that…
Alan Conboy (@acconboy) says

7 October, 2014 at 19:28

Odd, Scale Computing has been doing this in production for a year.
- Duncan Epping says
  
  7 October, 2014 at 21:41
  
  Forking running VMs and getting the child VMs resumed where the parent VM left off? I did not know that, that is cool that they can do that.
David Pasek says

7 October, 2014 at 19:54

Hi Duncan. Thanks for the blog post. Sounds cool. Can you explain the benefits of VMFork over linked clones? I don’t use linked clones, but it fits the same use cases, right? I see the difference and the benefit of the memory snapshot. A forked VM probably won’t need booting, right? Anything else that is not obvious? Thx.
- duncan says
  
  7 October, 2014 at 20:42
  
  Indeed, a Forked VM doesn’t need booting and it shares memory with the parent. So if you have 100 forks the total footprint is tiny.
  - Rj says
    
    8 October, 2014 at 06:11
    
    Hey Duncan
    
    So a forked VM (child) will have its own RAM, right, but shares the pages from the source VM? Or does it use and run apps off of the source VM’s RAM? If the latter is true, then the RAM of the source VM is a limiting factor, correct?
    - Duncan Epping says
      
      8 October, 2014 at 08:44
      
      I am not sure if I am following you. Each VM has its own RAM assigned. So 100 child VMs with 4GB is 100x4GB assigned. However, many of those pages will be shared with the parent VM.
Greg M says

7 October, 2014 at 23:54

There is a lot more to it than simply cloning a machine on the storage. When the parent VM is forked, the child VMs share the memory pages at a host level with the parent, as well as the disk. What’s more, the VM is running at this time, thereby avoiding boot, prep and login times. The real power is in the preparation of the apps on the parent VM and how they are launched for the child VM (can’t go into detail), but it’s highly efficient and the VM/apps are isolated into their required context. In terms of provisioning times, we expect to see very large savings in the creation of, say, 500 desktops, and a smaller footprint required for storage. Linked clones still suffer from a number of drawbacks for some use cases; this is where CloudVolumes will also help, being able to deploy the user’s applications and persistent data on a lightweight child VM wherever they go or log in.
- Duncan Epping says
  
  8 October, 2014 at 08:45
  
  exactly 🙂
Vaughn Stewart (@vStewed) says

8 October, 2014 at 01:56

Snowflock is very cool indeed. I smell a VMware alternative to Docker. Time will tell…
David La Motta (@emitromdave) says

8 October, 2014 at 16:33

This is really interesting, thanks for sharing. How does it compare / contrast (replace?) with VAAI?
- Duncan Epping says
  
  9 October, 2014 at 08:58
  
  Not sure I am following your question… it doesn’t replace VAAI, as VAAI is an offload mechanism to the array. In this scenario there is no need to offload anything, as all the smart things happen on the hypervisor side. VAAI is more than just cloning though, and in many scenarios linked clones may not be wanted, so VAAI XCOPY etc is still very much needed.
  - David La Motta (@emitromdave) says
    
    9 October, 2014 at 15:40
    
    I had cloning specifically in my mind when I typed the comment. Some vendor’s “fast cloning” is only fast on the surface: try to make immediate use of those array-offloaded clones and you may be waiting until the array finishes off its magic under the covers. So the thought of VMFork seemed like a good alternative to a VAAI-offloaded clone. The whole linked clone argument, however, throws a wrench into the mix.
    
    Hope that clears up where I was coming from. Thanks for the reply.
vmPete says

8 October, 2014 at 19:07

Thanks for bringing up the less obvious use case of Test/Dev. I’ve seen a few interesting attempts at traditional linked cloning to support a Test/Dev workflow, and the I/O penalties and other overhead was too great to make it worth it. I’d be curious to know if the technology could be leveraged in such a way to help/revamp the approach used for snapshots. Something many with large data, high change rate environments would appreciate.
- Duncan Epping says
  
  9 October, 2014 at 08:55
  
  A new mechanism for snapshots can be expected in the future…
John Nicholson. says

8 October, 2014 at 23:35

vmPete, think of a snapshot system that doesn’t have the traditional VM performance overhead (Virstro?) or a log structured file system underneath.
vmrandy says

9 October, 2014 at 04:38

Awesome! Finally we will see “linked clones” as a first class citizen in vSphere for any VM? I never understood why it wasn’t more accessible.

It sounds a little like a competing technology to containers. Maybe complementary… Looking forward to finding out more and seeing it in action.
- Duncan Epping says
  
  9 October, 2014 at 08:59
  
  Complementary, Randy. VMFork could be an excellent way to create tiny VMs for your containers.
Fred Peterson says

9 October, 2014 at 06:11

This effectively won’t work with Active Directory joined Windows servers. But for something like a generic web farm that needs no remote resources to access via Kerberos, that can expand on the fly rapidly to support an unexpected load? Perfect. Or if you need to do an upgrade, destroy the Forks, update the parent, then start your forks back up…
- Fred Peterson says
  
  9 October, 2014 at 06:17
  
  Well I take that back, it would work for Windows as long as the fork was sysprep’d…but that’s somewhat of a lengthy process that might be more involved. PVS might be more beneficial at that point.
  - Duncan Epping says
    
    9 October, 2014 at 08:56
    
    Actually, VMware has developed a mechanism which allows Windows VMs to get a new SID, join the domain etc almost instantly without rebooting.
    - Fred Peterson says
      
      9 October, 2014 at 17:01
      
      OoooOoo. That was worth mentioning in the original post! You mentioned “network magic” which I can assume this would be part of. Thanks!
    - Steffen Oezcan says
      
      9 October, 2014 at 22:00
      
      Hi Duncan, are you talking about (Horizon) View’s QuickPrep functionality here (which can NOT change the SID) or some other new technology?
@vcdxnz001 says

18 October, 2014 at 18:38

This is very cool tech indeed. Given that TPS is being disabled in the future because some researchers have found a way to steal encryption keys from other VMs running on the same processor if TPS is enabled, how will Project Fargo aka VMFork protect against the same security risk? The info on the TPS security issues is documented in VMware KB 2080735. The way VMFork works looks a lot like TPS. If I can force a cache flush on a forked VM, will I also be able to determine the timings and get the AES encryption keys?
- duncan says
  
  19 October, 2014 at 10:38
  
  Not sure how that would work in a VDI environment Michael to be honest. But I have already asked internally what the impact is of disabling TPS on VMFork.
- Duncan Epping says
  
  21 October, 2014 at 08:32
  
  I spoke with engineering. The disabling of TPS will be overruled per VMFork group, so the parent and children belonging to the same group will be able to leverage TPS and share pages.
roderick derks says

19 October, 2014 at 18:02

Duncan, when you run multiple VM forks of one VM, won’t there be a risk that the drives of the original VM will be hammered with read requests?
- Roderick says
  
  12 November, 2014 at 21:23
  
  Got any thoughts about this?
  - Duncan Epping says
    
    13 November, 2014 at 13:47
    
    The answer is simply: yes… the VM will be hammered with read requests, but hopefully you will:
    1) consider using CBRC
    2) take this into account when creating the design
    3) look at decent storage solutions that have good read cache functionality
Justin Adams says

4 November, 2014 at 19:57

Duncan, at the end of paragraph 3 you state: “One thing to point out here is that the parent VM is quiesced, in other words it’s sole purpose is allowing for the quick creation of child VMs.”

If the sole purpose of the parent VM is to allow the quick creation of child VMs, then doesn’t that exclude the option for Test/Dev scenarios? You could potentially bog down your production application if you are performing heavy read operations in the child VMs during Test/Dev operations.
- duncan says
  
  4 November, 2014 at 21:51
  
  Why would it exclude test/dev? You can easily spin up new VMs to test what you’ve developed. Sure, it will need to be combined with a mechanism to pull in code etc.
  
  In terms of storage, yes, you will need to factor that into your architecture.
Paul says

11 November, 2014 at 15:35

Rapid scaling of web servers for ticketing / event scenarios should be a great use case based on the replies above.

Good article Duncan, thanks for posting it.
slowping says

16 December, 2014 at 01:46

not new tech… been done before two years ago. Rapid cloning (vm fork) plus dynamic host renames (without reboot) in windows workgroups and active directory.

http://youtu.be/OTqN-MlJhUE
- Janon says
  
  7 February, 2015 at 20:09
  
  Native capability from first party vendor baked into the core package vs niche tech from obscure startup who very well may get acquired and go away (whoops! They got acquired a month after you posted… What a coincidence! Good value on the stock?) is absolutely worth mentioning as news. Besides which Gridcentric tech was cool, but was also only Xen. So outside of Citrix XenDesk, and the handful of OpenStack activity, it’s questionable what was ever accomplished with it in real production. Now contrast vSphere which will immediately have to “walk the walk” in fortune 10 environments from day one.
Ryan says

10 March, 2015 at 14:17

Has there been a white paper published on this technology? I’m assuming the proper name is “Instant Clone,” but I’ve yet to find anything official on how the technology works.
- Duncan Epping says
  
  10 March, 2015 at 14:46
  
  That is the official new name indeed, but 6.0 hasn’t shipped yet, so no details other than the above.
  - Ryan says
    
    16 March, 2015 at 20:56
    
    Now that version 6 has been released, I’m noticing this feature seems absent. The clone and snapshot features in vCenter do not seem to mention anything about Instant Clone, and I’ve yet to see a white paper published.
    
    Do you have any information?
    - Duncan Epping says
      
      17 March, 2015 at 11:48
      
      The feature is only accessible through a private API. View and BDE will leverage that API, and for now are the only two solutions that will leverage it. More use cases will be explored over time.
      
      I don’t know why we’ve seen no white paper yet. I can ask the vSphere team if anything is planned.
Andy says

10 March, 2015 at 22:14

I am just curious, what happens if the parent clone is deleted? Does the child migrate the VMDKs to its own folder, or is it essentially deleted as well?
Lina Lu says

11 March, 2015 at 04:52

It’s a good article!
How about the performance of child VMs when they are allocating new storage/memory resources? Will a child VM prefetch memory from the parent?
Snowflock uses multicast to reduce network load and prevent performance degradation of child VMs. Does “Instant Clone” use multicast as well?
- Duncan Epping says
  
  11 March, 2015 at 09:43
  
  New storage / memory cannot be pre-fetched as it is new?
  
  With regards to networking, there is no change when it comes to that afaik.
  - Lina Lu says
    
    12 March, 2015 at 04:08
    
    Thanks for the reply!
    
    Allocating new storage/memory resources will take time, so children may suffer when this happens. Is this true?
    
    As far as I know, in Snowflock implementations children cannot share memory with the parent, as they will prefetch old memory from the parent when idle. This is good for performance, but bad for memory resource utilization.
    In “Instant Clone”, will children share memory with the parent until they change the contents of memory, or will they prefetch old memory from the parent?
