Preparing Linux for optimal thin provisioning and deduplication

VMware and disk-array based thin provisioning can help you economize on disk space, just as disk-array based deduplication can. But how do you take optimal advantage of those techniques? Executive summary:

dd if=/dev/zero of=/tmp/tempfile.zeroes bs=1M ; sync ; rm -f /tmp/tempfile.zeroes

First of all, both thin provisioning and deduplication work transparently, but you can make them work better with a little effort. There are three different types of blocks on your (virtual) disk that we need to distinguish: data blocks, old data blocks, and empty blocks.
Data blocks and empty blocks are what they are, and can't be influenced. Data blocks contain the data of files that are on your filesystem. Empty blocks have never been written to with any real data, so they're blank: they contain only zeroes.
It's the "old data" blocks that we can improve: they contain data that used to be part of a file. The file was deleted, but its contents are still on disk.
Overwriting those blocks with zeroes helps a later (re)conversion to thin provisioning, and for dedup they become empty blocks again, which gives perfect sharing possibilities.
Overwriting just the old data blocks is hard, but you can probably live with overwriting both the empty and the old data blocks. This will allocate all blocks (bye-bye thin provisioning, for now), but their contents will be all zeroes: perfect for re-thin-provisioning later, and perfect for dedup.
For each local filesystem mounted on your system, you can execute

dd if=/dev/zero of=$mountpoint/tempfile.zeroes bs=1M ; sync ; rm -f $mountpoint/tempfile.zeroes
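To repeat that command for every local filesystem, a small script like the one below could be used. It's a minimal sketch under a few assumptions: it runs as root (so the reserved blocks of ext filesystems get zeroed too), util-linux findmnt is installed, and the filesystem types in LOCAL_FSTYPES are the ones you want to treat. The function names are hypothetical. The sync before the rm matters: if the file is deleted while its zeroes are still dirty pages in the cache, the kernel may simply drop them instead of writing them to disk.

```shell
#!/bin/sh
# Minimal sketch, not a hardened tool. Assumptions: run as root,
# util-linux findmnt is available, and LOCAL_FSTYPES lists the local
# filesystem types you want to zero-fill (adjust as needed).
LOCAL_FSTYPES="ext2,ext3,ext4,xfs"

# Print one mountpoint per line for the filesystem types above.
# Note: in raw mode findmnt escapes spaces (e.g. \x20); mountpoints
# containing such characters are not handled by this sketch.
list_local_mounts() {
    findmnt -rn -o TARGET -t "$LOCAL_FSTYPES"
}

# Path of the (hypothetical) filler file for mountpoint $1; a trailing
# slash is stripped so "/" yields "/tempfile.zeroes".
filler_path() {
    printf '%s/tempfile.zeroes\n' "${1%/}"
}

# Fill the free space of the filesystem mounted at $1 with zeroes,
# flush them to disk, then remove the filler file again.
# dd stops with "No space left on device" once the filesystem is full;
# that failure is expected, so its exit status is ignored.
zero_fill() {
    filler=$(filler_path "$1")
    dd if=/dev/zero of="$filler" bs=1M 2>/dev/null
    sync
    rm -f "$filler"
}
```

Sourcing the script only defines the functions; the actual zero-filling would then be started with something like: list_local_mounts | while read -r m; do zero_fill "$m"; done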

Be careful: this will, very briefly, fill up your filesystem(s). If your application can't handle that, do it while the application isn't running. For those of you who use LVM: the unallocated PEs (physical extents) in every VG (volume group) aren't cleared by this procedure, so for every volume group, find out its name and available free space:

vgs -o vg_name,vg_free

Then run the following, filling in $vg_free and $vg_name with the data you just gathered:

lvcreate -n zerolv -L $vg_free $vg_name && dd if=/dev/zero of=/dev/$vg_name/zerolv bs=1M ; lvchange -a n /dev/$vg_name/zerolv && lvremove /dev/$vg_name/zerolv
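With several volume groups, the same steps can be looped over instead of typed by hand. The sketch below asks vgs for the number of free extents per VG (the vg_free_count field) so that full VGs are skipped, and lets lvcreate take all free space with "-l 100%FREE" rather than copying a size. It assumes root, the LVM2 tools, and that no LV named zerolv already exists; the function names are hypothetical.

```shell
#!/bin/sh
# Minimal sketch, not a hardened tool. Assumptions: run as root, LVM2
# tools installed, and no existing LV called "zerolv" in any VG.

# Read "vg_name free_extent_count" lines on stdin and print the VGs
# that still have unallocated extents (lvcreate would fail on the rest).
vgs_with_free_extents() {
    awk '$2 > 0 { print $1 }'
}

# Allocate all free extents of VG $1 to a temporary LV, overwrite it
# with zeroes, then deactivate and remove it.
# dd exits non-zero when it reaches the end of the LV ("No space left
# on device"), so its exit status is deliberately not checked.
zero_vg_free() {
    vg=$1
    lvcreate -n zerolv -l 100%FREE "$vg" || return 1
    dd if=/dev/zero of="/dev/$vg/zerolv" bs=1M
    sync
    lvchange -a n "/dev/$vg/zerolv" && lvremove "/dev/$vg/zerolv"
}
```

Sourcing it only defines the functions; a full run over all volume groups could then look like: vgs --noheadings -o vg_name,vg_free_count | vgs_with_free_extents | while read -r vg; do zero_vg_free "$vg"; done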

Comments

localhost said…

This reminds me of a blog post I wrote a year ago:
http://amedee.be/kleine-full-disk-backup-dd (in Dutch)

18/2/10 10:01

Bert de Bruijn said…

Forgot to mention: for Windows, the tool of choice is sdelete.

18/2/10 15:57

Jonathan Barber said…

Many thanks for the post, very helpful to know that you have to use the same procedure for Linux as for Windows.

Just to let you know that you can create an LV that fills a VG with the "-l 100%FREE" argument to lvcreate, e.g.:
lvcreate -l 100%FREE -n zerolv $vg_name

28/6/11 16:09

the birdhouse in my soul
