
Yellow Bricks

by Duncan Epping



Instant Clone in vSphere 6.7 rocks!

Duncan Epping · May 1, 2018 ·

I wrote a blog post a while back about VMFork, which was later rebranded to Instant Clone. In the vSphere 6.7 release there has been a major change to the architecture of VMFork aka Instant Clone, so I figured I would do an update on the post. As an update doesn't stand out from the rest of the content, I am sharing it as a new post instead.

Instant Clone was designed and developed to provide a mechanism that allows you to instantaneously create VMs. In the early days it was mainly used by folks who wanted to deploy desktops; the desktop community often referred to this as "just in time" desktops. These desktops would literally be created when the user tried to log in; it is indeed that fast. How did this work? Well, a good way to describe it is that it is essentially a "vMotion" of a VM onto the same host, combined with a linked clone disk. This essentially leads to a situation which looks as follows:

On a host you had a parent VM and a child VM associated with it. You would have a shared base disk, shared memory, and then of course unique memory pages and a delta disk for (potential) changes written to disk. The reason customers primarily used this only with VDI at first was the fact that there was no public API for it. Of course folks like Alan Renouf and William Lam fought hard internally for public APIs, and they managed to get things like the PowerCLI cmdlets and the python vSphere SDK pushed through. This was great, but unfortunately not 100% supported. On top of that there were some architectural challenges with the 1.0 release of Instant Clone, mainly caused by the fact that VMs were pinned to a host (next to their parent VM), and as such things like HA, DRS, and vMotion wouldn't work. With version 2.0 this all changes. William already wrote an extensive blog post about it here. I just went over all of the changes and watched some internal training, and I am going to write down some of my findings/learnings as well, just so that it sticks… First let's list the things that stood out to me:

  • Targeted use cases
    • VDI
    • Container hosts
    • Big data / Hadoop workers
    • DevTest
    • DevOps
  • There are two workflows for instant clone
    • Instant clone a running VM; the source and the generated VMs continue running
    • Instant clone a frozen VM; the source is frozen using guestRPC at a point in time defined by the customer
  • No UI yet, but “simple API” available
  • Integration with vSphere Features
    • Now supported: HA, DRS, vMotion (Storage / XvMotion etc)
  • Even when TPS is disabled (default) VMFork still leverages the P-Share technology to collapse the memory pages for efficiency
  • There is no explicit parent-child relationship any longer

Let's look at the use cases first; I think the DevTest / DevOps one is interesting. You could for instance do an Instant Clone (live) of a VM and then, for instance, test an upgrade of the application running within the VM. For this you would use the first workflow that I mentioned above: instant clone a running VM. What happens in this workflow is fairly straightforward. I am using William's screenshots of the diagrams the developers created to explain it. Thanks William, and dev team 🙂

[Diagram: Instant Clone of a running VM]

Now note in the diagram above that when the first clone is created the source gets a delta disk as well. This is to ensure that the shared disk doesn't change, as that would cause problems for the target. When a 2nd and a 3rd VM are created, the source VM gets additional deltas. As you can imagine this isn't optimal, and over time it may even slow down the source VM. One thing to point out is that although the MAC address changes for the generated VM, you as the admin still need to make sure the Guest OS picks this up. As mentioned above, as there's no UI for this functionality in vSphere 6.7, you need to use the API. If you look at the MOB you can actually find the InstantClone_Task method and simply call that; for a demo scroll down. But as said, be careful, as you don't want to end up with the same VM with the same IP on the same network multiple times. You can get around the MAC/IP conflict issue rather easily, and William has explained how in his post here. You can even change the port group for the NIC, for instance to switch over to an isolated network only used for testing these upgrade scenarios.
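To give you an idea of what that "simple API" looks like outside of the MOB, here is a minimal sketch using pyvmomi (the python vSphere SDK mentioned earlier). The vCenter address, credentials, and VM names are placeholders for your own environment:

```python
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim
import ssl

# Lab only: skip certificate verification; use proper certs in production
ctx = ssl._create_unverified_context()
si = SmartConnect(host="vcenter.lab.local", user="administrator@vsphere.local",
                  pwd="VMware1!", sslContext=ctx)

# Find the running source VM by name (placeholder name)
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.VirtualMachine], True)
source = next(vm for vm in view.view if vm.name == "source-vm")

# An InstantCloneSpec needs a name and a RelocateSpec; leaving the
# RelocateSpec empty keeps the clone on the same host and datastore
spec = vim.vm.InstantCloneSpec(name="source-vm-clone-01",
                               location=vim.vm.RelocateSpec())
WaitForTask(source.InstantClone_Task(spec=spec))

Disconnect(si)
```

That is really all there is to it for the "running source" workflow; just remember the MAC/IP caveat above before you point the clone at a production network.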

That second workflow would be used for the following use cases: VDI, Container Hosts, Hadoop workers… all more or less the same type of use case: scale out identical VMs fast! In this case, well, let's look at the diagram first:

[Diagram: Instant Clone of a frozen VM]

In the above scenario the source VM is what they call "frozen". You can freeze a VM by leveraging vmware-rpctool and running it with "instantclone.freeze". This needs to happen from within the guest, and note that you need to have VMware Tools installed to have vmware-rpctool available. When this is executed, the VM goes into a frozen state, meaning that no CPU instructions are executed. Now that you froze the VM you can go through the same instant clone workflow. Instant Clone will know that the VM is frozen. After the instant clone is created you will notice that there's a single delta disk for the source VM, and each generated VM will have its own delta disk as shown above. The big benefit is that the source VM won't accumulate many delta disks. Plus, you know for sure that every single VM you create from this frozen VM is 100% identical, as they all resume from the exact same point in time. Of course when the instant clone is created the new VM is "unfrozen / resumed"; the source will remain frozen. Note that if for whatever reason the source is restarted / power cycled, the "frozen state" is lost. Another added benefit of the frozen VM is that you can automate the "identity / IP / MAC" issue when leveraging the "frozen source VM" workflow. How do you do this? Well, you disable the network, freeze the VM, instant clone it (the clone unfreezes automatically), make the network changes, and enable the network. William just did a whole blog post on how to do the various required Guest changes; I would highly recommend reading that one as well!
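For the guest-side part of that sequence, here is a rough sketch of what a script inside a Linux source VM could look like. The NIC name (ens192) and the DHCP re-IP step are assumptions for illustration; vmware-rpctool ships with VMware Tools as noted above:

```python
import subprocess

def rpc(cmd: str) -> str:
    # vmware-rpctool talks to the hypervisor over the guest RPC channel
    out = subprocess.run(["vmware-rpctool", cmd], check=True,
                         capture_output=True, text=True)
    return out.stdout

# 1. Take the NIC down so no clone resumes with a duplicate IP on the wire
#    (interface name is a placeholder)
subprocess.run(["ip", "link", "set", "ens192", "down"], check=True)

# 2. Freeze: this call blocks in the source VM; execution resumes from
#    this exact point inside each instant clone (the source stays frozen)
rpc("instantclone.freeze")

# 3. Everything below runs inside a freshly created clone: fix up the
#    guest identity, then bring networking back
subprocess.run(["ip", "link", "set", "ens192", "up"], check=True)
subprocess.run(["dhclient", "ens192"], check=True)  # placeholder re-IP step
```

The nice property is that steps 1 and 2 run once in the source, while step 3 runs in every clone, which is exactly the "disable network, freeze, clone, fix, enable network" flow described above.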

So before you start using Instant Clone, first think about which of the two workflows you prefer and why. What else did I learn?

As mentioned, and this is something I never realized: even when TPS is disabled, Instant Clone will still share the memory pages through the P-Share mechanism. P-Share is the same mechanism that TPS leverages to collapse memory pages. I always figured that you needed to re-enable TPS (with or without salting), but that is not the case. You can't even disable the use of P-Share at this point in time… which personally I don't think is a security concern, but you may think differently about it. Either way, of course I tested this; below you see screenshots of the memory info before and after an instant clone. And yes, TPS was disabled. (Look at the shared / saving values…)

Before: [screenshot: VM memory info before the instant clone]

After: [screenshot: VM memory info after the instant clone]
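If you want to read those numbers programmatically rather than from screenshots, the quickStats of a VM expose the host-side shared memory figure. A small sketch, reusing the pyvmomi connection (si) from the earlier example; the VM name is again a placeholder:

```python
from pyVmomi import vim

def shared_memory_mb(si, vm_name: str) -> int:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    vm = next(v for v in view.view if v.name == vm_name)
    # Host-side shared memory for this VM, in MB; with P-Share at work
    # this should go up noticeably after an instant clone
    return vm.summary.quickStats.sharedMemory

print("shared:", shared_memory_mb(si, "source-vm"), "MB")
```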

Last but not least, the explicit parent-child relationship caused several problems from a functionality standpoint (like HA, DRS, and vMotion not being supported). As of vSphere 6.7 this is no longer the case. There is no strict relationship, and as such all the features you love in vSphere can be fully leveraged even for your Instant Clone VMs. This is why they call this new version of Instant Clone "parentless".

If you are wondering how you can simply test it without diving into the API too deeply and scripting… you can use the Managed Object Browser (MOB) to invoke the method, as mentioned earlier. I recorded a quick demo that shows this, based on a demo from one of our Instant Clone engineers. I recommend watching it in full screen as it is much easier to follow that way (or watch it on YouTube in a larger window…). Pay attention: it is a quick demo, as instant clone is extremely fast and the workflow is extremely simple.

And that's it for now. I hope this helps those interested in Instant Clone / VMFork, and maybe some of you will come up with interesting use cases that we haven't thought about. It would be great if you could share those use cases in the comment section below. Thanks!

Project Fargo aka VMFork – What is it?

Duncan Epping · Oct 7, 2014 ·

I have seen various people talking about Project Fargo (also known as VMFork or Instant Clone), and what struck me is that many are under the impression that Project Fargo is the result of the CloudVolumes acquisition. Let's set that straight first: Project Fargo is not based on any technology developed by the CloudVolumes team. Project Fargo has been developed in house, and as far as I can tell it is an implementation of SnowFlock (University of Toronto / Carnegie Mellon University), although I know that in house they have been looking at techniques like these for a long time. Okay, now that we have that out of the way, what is Project Fargo?

Simply said: Project Fargo is a solution that enables you to rapidly clone a running VM. When I say "rapidly clone", I mean RAPIDLY… within seconds. Yes, that is extremely fast for a running VM. What should be noted here of course is the fact that it is not a full clone. I guess this is where the "VMFork" name comes into play: the "parent" virtual machine is quiesced and forked, and a "child" VM is born. This child VM leverages the disk and memory of the parent (for reads), which is why it is so extremely fast to instantiate… as I said, literally seconds, as it "only" needs to create empty delta files, create a VMX, instantiate the process, and do some networking magic, as you do not want VMs popping up on the network with the same MAC address. Note that the child VM starts where the parent VM left off, so there is no boot process, it is instant on! (Just as if you suspended and resumed it.) I can't reveal too much around how this works yet, but you can imagine that a technique like "fast suspend resume" (FSR), which is the cornerstone of features like Storage vMotion, is leveraged.

The question then arises: what if the child wants to write data to memory or disk? This is where the "copy on write" technique comes into play. Of course the child won't be allowed to overwrite shared memory pages (or disk, for that matter), and as such a new page will be allocated. For those having a hard time visualizing it, note that the diagram below is conceptual and not how it is actually implemented; I should maybe have drawn the different layers, but that would make it too complex. In this scenario you see a single parent with a single child, but you can imagine there could also be 10 child VMs or more; you can see how efficient that would be in terms of resource sharing! And even for the pages which are unique compared to the parent, if you clone many similar VMs there is a significant chance that TPS will be able to collapse even those! One thing to point out here is that the parent VM is quiesced; in other words, its sole purpose is allowing for the quick creation of child VMs.

[Diagram: conceptual parent/child VM with shared and copy-on-write pages]
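Purely to illustrate the copy-on-write idea, and nothing VMFork-specific, here is a toy model of the diagram above: children read through to the parent's pages and only allocate a private copy on their first write.

```python
class ParentVM:
    def __init__(self, pages: dict):
        self.pages = pages            # quiesced: never written again

class ChildVM:
    def __init__(self, parent: ParentVM):
        self.parent = parent
        self.private = {}             # starts empty: only delta state

    def read(self, n: int) -> bytes:
        # a private copy wins; otherwise fall through to the shared page
        return self.private.get(n, self.parent.pages[n])

    def write(self, n: int, data: bytes):
        # first write allocates a private page; the parent's stays intact
        self.private[n] = data

parent = ParentVM({0: b"kernel", 1: b"app"})
children = [ChildVM(parent) for _ in range(10)]   # ten forks, near-zero cost
children[0].write(1, b"patched app")
assert children[1].read(1) == b"app"              # siblings still share page 1
```

Ten children sharing the parent's pages costs next to nothing until one of them writes, which is exactly the resource-sharing efficiency described above.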

A cool piece of technology, I agree, but what would the use case be? Well, there are multiple use cases, and those who will be attending VMworld should definitely visit the sessions which will discuss this topic, or watch them online (SDDC3227, SDDC3281, EUC2551, etc.). I think there are two major use cases: virtual desktops and test/dev.

The virtual desktop ("just in time" desktops) use case is pretty obvious… You create the parent VM, spin it up, it gets quiesced, and you can start forking that parent when needed. This will be almost instant, very efficient, and it will also reduce the required resource capacity for VDI environments.

With test/dev scenarios you can imagine that when testing software you don't want to wait for lengthy cloning processes to finish. Forking a VM will allow you to rapidly test what has been developed; within seconds you have a duplicate environment which you can use / abuse any way you like and destroy when done. As the disk footprint is small, create/destroy will have a minimal impact on your existing infrastructure, both from a resource and a "stress" point of view. It basically means that your testing will take less time end-to-end.

I can't wait for it to become available so I can start testing it; especially when combined with products like CloudVolumes and Virtual SAN, this feature has a lot of potential.

** UPDATE: Various people asked what would happen with VMFork now that TPS is disabled by default in upcoming versions of vSphere. I spoke with the lead engineer on this topic and he assured me there is no impact on VMFork. The disabling of TPS will be overruled per VMFork group, so the parent and children belonging to the same group will still be able to leverage TPS and share pages. **
