Don’t know if anyone noticed it or not, but with the latest set of patches VMware changed the “Delete All” mechanism that is part of the Snapshot feature. I have written multiple articles about the “Delete All” functionality, as it often led to completely filled-up VMFS volumes when someone used it without knowing the inner workings.
When using the Delete All option in Snapshot Manager, the snapshot farthest from the base disk is committed to its parent, causing that parent snapshot to grow. When the commit is complete, that snapshot is removed and the process repeats, committing the newly updated snapshot to its parent. This continues until every snapshot has been committed.
This method can be relatively slow, since data farthest from the base disk might be copied several times. More importantly, it can aggressively consume disk space if the snapshots are large, which is especially problematic when only a limited amount of space is available on the datastore. The space issue is particularly troublesome because you often delete snapshots precisely to free up storage.
This release resolves the issue: the order of snapshot consolidation has been changed to start with the snapshot closest to the base disk instead of the farthest, so data is no longer copied repeatedly.
To give an example with four snapshots:
Old situation (pre-vSphere 4 Update 2)
- Base disk – 15GB
- Snapshot 1 – 1GB –> possibly grows to 13GB
- Snapshot 2 – 1GB –> possibly grows to 12GB
- Snapshot 3 – 1GB –> possibly grows to 11GB
- Snapshot 4 – 10GB
Snapshot 4 is copied into Snapshot 3, Snapshot 3 into Snapshot 2, Snapshot 2 into Snapshot 1, and Snapshot 1 into your Base disk. After the copy of Snapshot 1 into the Base disk, all snapshots will be deleted. Please note that the total amount of disk space consumed before the “Delete All” was 28GB. Right before the final merge the consumed disk space is 61GB. This is just an example; imagine what could happen with a 100GB data disk!
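The old order’s worst case can be sketched in a few lines of Python (a hypothetical model, not VMware code; the function name is mine, and it assumes each commit grows the parent by the child’s full size and that no snapshot file is removed before the final merge):

```python
def peak_usage_old(base_gb, snapshot_gb):
    """Worst-case datastore usage (GB) for the pre-U2 'Delete All' order:
    the farthest snapshot is committed into its parent, growing the parent
    by up to the child's size; files are only removed after the final merge."""
    sizes = list(snapshot_gb)
    peak = base_gb + sum(sizes)
    # Walk from the farthest snapshot down toward the base disk.
    for i in range(len(sizes) - 1, 0, -1):
        sizes[i - 1] += sizes[i]            # parent grows by the child's size
        peak = max(peak, base_gb + sum(sizes))
    return peak

print(peak_usage_old(15, [1, 1, 1, 10]))    # 61, matching the example above
```

With the 15GB base and 1/1/1/10GB snapshots from the example, the model reproduces the 61GB peak.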
New situation
- Base disk – 15GB
- Snapshot 1 – 1GB
- Snapshot 2 – 1GB
- Snapshot 3 – 1GB
- Snapshot 4 – 10GB
Snapshot 1 is copied into the Base disk, Snapshot 2 into the Base disk, Snapshot 3 into the Base disk, and Snapshot 4 into the Base disk. After the copy of Snapshot 4 into the Base disk, all snapshots will be deleted. Please note that the total amount of disk space consumed before the “Delete All” was 28GB. Right before the final merge the consumed disk space is still 28GB. Not only has VMware reduced the chances of running out of disk space, the time to commit the snapshots using “Delete All” has also decreased with this new mechanism.
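The new order can be modeled the same way (again a hypothetical sketch with a name of my choosing; it assumes a thick base disk, which has a fixed size, so committing into it consumes no extra space):

```python
def peak_usage_new(base_gb, snapshot_gb):
    """Worst-case datastore usage (GB) for the U2 'Delete All' order:
    every snapshot is committed straight into the thick base disk, which
    never grows, so consumption peaks at the starting total."""
    return base_gb + sum(snapshot_gb)

print(peak_usage_new(15, [1, 1, 1, 10]))    # 28: no extra space is needed
```

Same inputs, but the peak stays at the starting 28GB, which is exactly why the new ordering avoids filling up the datastore.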
Matt Liebowitz says
This is a very welcome improvement as I’ve been in the situation you describe with snapshots that end up taking up 100GB+ during deletion. In my opinion this is one of the most important improvements in U2 that isn’t getting much press.
Now if they could just fix it so that the progress doesn’t just stay at 95% until it’s done (or the vCenter task times out) that would be a real improvement. 🙂
vAntMet says
“Snapshot 1 is copied in to Base disk, Snapshot 2 is copied in to Base disk, Snapshot 3 in to Base disk and Snapshot 4 in to your Base disk. After the copy of Snapshot 1 in to the Base disk all Snapshots will be deleted. ”
I’m guessing that’s a typo and you meant “After the copy of Snapshot 4 in to the Base disk all Snapshots will be deleted.” – otherwise you have a race condition, and a possibility of losing a lot of data 😉
Duncan Epping says
that’s a typo indeed, changed it. Thanks,
PiroNet says
That’s a real improvement definitely.
What about this: why not delete snapshot #1 right after it is committed into its parent, the base disk, then remap the parent/child links for snapshots #2, 3 and 4? Then again commit snapshot #2, delete it, remap the parent/child link, etc…
You could reclaim precious disk space right after the first snapshot is committed instead of waiting for the last one to be committed.
That may look like a dumb scenario, but hey, I’m learning every day 🙂
Duncan Epping says
Not sure why they didn’t use that mechanism, but remapping and deleting between merges, instead of a single remap + delete action, probably has more risks associated with it.
PD UK says
That seems like a good improvement.
Not that it matters, but if, like you say, the old method commits snapshot 4 to snapshot 3 and then deletes it before copying 3 into 2, etc., then I can only work out that you get a MAX size of 40GB. This is just before snapshot 2 is deleted but after it has been committed to snapshot 1.
Maybe it’s just my maths.
PD UK says
Sorry, just re-read this.
You said :
Snapshot 4 is copied in to Snapshot 3, Snapshot 3 in to Snapshot 2, Snapshot 2 in to Snapshot 1 and Snapshot 1 in to your Base disk. After the copy of Snapshot 1 in to the Base disk all Snapshots will be deleted
That case, it’s 61GB MAX size.
However, at the top it says :
…the snapshot farthest from the base disk is committed to its parent, causing that parent snapshot to grow. When the commit is complete, that snapshot is removed..
Now it’s 40GB MAX size.
Since we have many servers not on U2, I’m left wondering what is correct ?
Andrew Miller says
I was very happy when I saw this in the release notes (I do my best to read them each time 🙂). I still heavily caution customers about using VMware-level snapshots, but this makes it much less painful when I’m having to help people deal with long-standing snapshots.
Duncan Epping says
@PD UK: I have tested this in the past. When you click “Delete All” the snapshots will not be removed before the final copy has been done. Basically each snapshot is merged into its parent, and once the “first” snapshot has been merged into the base disk it will proceed with deleting all the snapshot files.
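The 40GB versus 61GB difference comes down purely to when the snapshot files are removed. A quick hypothetical Python model (function names and the worst-case growth assumption are mine, not VMware’s) shows both readings side by side:

```python
def peak_if_deleted_per_commit(base_gb, snapshot_gb):
    """PD UK's reading: each snapshot file is removed right after its commit.
    Worst case: every commit grows the parent by the child's full size."""
    sizes = list(snapshot_gb)
    peak = base_gb + sum(sizes)
    while len(sizes) > 1:
        sizes[-2] += sizes[-1]                  # commit child into parent
        peak = max(peak, base_gb + sum(sizes))  # both files still on disk
        sizes.pop()                             # child removed immediately
    return peak

def peak_if_deleted_at_end(base_gb, snapshot_gb):
    """Tested behaviour: no file is removed until the final merge completes."""
    sizes = list(snapshot_gb)
    peak = base_gb + sum(sizes)
    for i in range(len(sizes) - 1, 0, -1):
        sizes[i - 1] += sizes[i]
        peak = max(peak, base_gb + sum(sizes))
    return peak

print(peak_if_deleted_per_commit(15, [1, 1, 1, 10]))  # 40 (PD UK's figure)
print(peak_if_deleted_at_end(15, [1, 1, 1, 10]))      # 61 (observed behaviour)
```

Both numbers are consistent; they just correspond to different assumptions about when files are deleted, and the tested behaviour matches the 61GB case.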
Justin Cockrell says
Great info, thanks for pointing out this change. It’s a step in the right direction. Like Andrew mentions, I try to talk people out of using snapshots in most situations. When I have gotten into ‘snapshot hell’ situations I’ve generally done either a clone or V2V conversion to a different datastore vice trying to delete all snapshots. Deleting multiple snaps just makes me nervous 😉
PD UK says
Cheers Duncan.
I’ll accept that. 🙂
Nicholas Weaver says
Man that is a great improvement…
I remember back when I was a customer dealing with multiple snaps on a SQL DB and running into the space issue.
Cool that VMware is constantly tuning the behavior of small things like this while they are working on the big stuff.
.nick
Duncan Epping says
@Nicholas: I remember one of my customers having over 20 snapshots on an Exchange Server, with the largest one being roughly 100GB. That was one of the first times I was called in to do troubleshooting; I can tell you I was sweating.
William says
For the new situation, Snapshot 1 is copied into the Base disk, Snapshot 2 into the Base disk, Snapshot 3 into the Base disk and Snapshot 4 into your Base disk. After the copy of Snapshot 4 into the Base disk all snapshots will be deleted. Please note that the total amount of disk space consumed before the “Delete All” was 28GB. Right before the final merge the consumed disk space is still 28GB.
However, my calculation is 41GB before the final merge, since Snapshots 1, 2, 3 and 4 were added to the Base disk before they were removed. It’s 15 + 1 + 1 + 1 + 10 + 13 (snapshots) = 41.
Did I miss anything?
Thanks
Chris Sommers says
William
Merging a snapshot to the base disk does not increase the size of the base disk…
Chris
@vRobM says
Chris is right,
The base disk holds the raw data, and the snapshots hold the change data since the time of each snapshot creation.
So it’s possible to have 100GB of snapshot data, and only 10GB base disk. When the 100GB of changes is committed (deleted), the base disk still has 10GB of raw data, which was updated with 100GB worth of changes.
This is also why, in the old snapshot situation, the parent snapshots usually do not grow to the full size of the parent + child snapshots, but to some value in between. It wasn’t predictable, and could be surprising when you least expected it. Insert the ‘Murphy’s Law’ factor 🙂
Duncan is also right about the ‘single remap + delete action’ not being a smart one, since if at any point you run out of disk space or there is an error that makes the snapshot consolidation process fail, there’s no safe way to restart the process. If something fails, you need the original files to remain unmodified, so once the failure cause is corrected, you can retry the action again until successful completion.
duncan says
@William: The base disk, unless thin, has a fixed size. In my case it is a thick disk, which means the base will not grow. Keep in mind snapshots are like a block-level bitmap of your base disk.
Paul Sterley says
May I re-post this on my blog with a link back here? I’m mainly interested in putting it somewhere I can find it easily later if I need to explain it to someone.
Piyush Chordia says
I have written a similar article on how VMware snapshots work:
http://www.pcclm.com/2012/02/virtual-machine-snapshots-in-vmware.html