VMware and disk-array based thin provisioning can help you economize on disk space, just as disk-array based deduplication can. But how do you take optimal advantage of those techniques? Executive summary:
dd if=/dev/zero of=/tmp/tempfile.zeroes ; rm -f /tmp/tempfile.zeroes
First of all, both thin provisioning and deduplication work transparently. But you can make them work better with a little bit of effort. There are three different types of blocks on your (virtual) disk that we need to distinguish: data blocks, old data blocks, and empty blocks.
Data blocks and empty blocks are what they are, and can't be influenced. Data blocks contain data for files that are on your filesystem. Empty blocks have never been written to with any real data, so they're blank and contain only zeroes.
It's the "old data" blocks that we can improve: they contain data that used to be part of a file. The file got deleted, but the contents are still there.
Overwriting those blocks with zeroes will help a (re)conversion to thin provisioning later. For dedup, they become indistinguishable from empty blocks again, so they can all be shared.
Overwriting just the old data blocks is hard, but you can probably live with overwriting both empty and old data blocks. This will allocate all blocks (bye-bye thin provisioning, for now), but the contents will be all zeroes: perfect for re-thin-provisioning later, and perfect for dedup.
For each local filesystem mounted on your system, you can execute:
dd if=/dev/zero of=/$mountpoint/tempfile.zeroes ; rm -f /$mountpoint/tempfile.zeroes
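If you have more than a handful of filesystems, the same command can be wrapped in a small loop. What follows is only a sketch under a few assumptions: it relies on GNU df (for the -l and -x options), it assumes mountpoints without spaces in their names, and the function names and the ZERO_FILL_COUNT variable are made up for this illustration. Like the one-liner, it briefly fills each filesystem completely.

```shell
#!/bin/sh
# List mountpoints of local filesystems, skipping RAM-backed ones.
# Assumes GNU df (-l = local only, -x = exclude type) and no spaces in paths.
local_mountpoints() {
    df -lP -x tmpfs -x devtmpfs | awk 'NR > 1 { print $6 }'
}

# zero_fill DIR: write zeroes into a temporary file under DIR, then delete it.
# ZERO_FILL_COUNT is a hypothetical knob that caps the number of 1 MiB blocks
# written; leave it unset to write until the filesystem is full.
zero_fill() {
    target="$1/tempfile.zeroes"
    if [ -n "${ZERO_FILL_COUNT:-}" ]; then
        dd if=/dev/zero of="$target" bs=1048576 count="$ZERO_FILL_COUNT" 2>/dev/null
    else
        # dd ends with "No space left on device" here; that is expected.
        dd if=/dev/zero of="$target" bs=1048576 2>/dev/null
    fi
    sync            # make sure the zeroes reach the disk before deleting
    rm -f "$target"
}

# Zero-fill the free space of every local filesystem in turn.
zero_fill_all() {
    for mp in $(local_mountpoints); do
        zero_fill "$mp"
    done
}
```

Run zero_fill_all as root to process everything, or call zero_fill /some/mountpoint for a single filesystem. Setting ZERO_FILL_COUNT=10 first gives a harmless trial run that writes only 10 MiB per filesystem.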
Be careful: this will (very briefly) fill up your filesystem(s). If your application can't handle that, do it when that app isn't running. For those of you who use LVM, the unallocated PEs (physical extents) in every VG aren't cleared by this procedure, so for every Volume Group, find out its name and available free space:
vgs -o vg_name,vg_free
Then run the following (fill $vg_free and $vg_name with the data you just gathered):
lvcreate -n zerolv -L $vg_free $vg_name && dd if=/dev/zero of=/dev/$vg_name/zerolv ; lvchange -a n /dev/$vg_name/zerolv && lvremove /dev/$vg_name/zerolv
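The LVM step can be wrapped in a function as well. This is a rough sketch, not a tested procedure: it uses "-l 100%FREE" instead of a size copied from vgs, and both the function name and the RUN variable (a dry-run hook) are invented for this example.

```shell
#!/bin/sh
# zero_vg_free VG: zero the unallocated extents of volume group VG.
# RUN is a hypothetical dry-run hook: set RUN=echo to print the commands
# instead of executing them.
zero_vg_free() {
    vg="$1"
    lv="/dev/$vg/zerolv"
    # Claim every free extent in the VG for a temporary LV.
    ${RUN:-} lvcreate -n zerolv -l 100%FREE "$vg" || return 1
    # dd stops with "No space left on device" once the LV is full; expected.
    ${RUN:-} dd if=/dev/zero of="$lv" bs=1048576
    # Deactivate the temporary LV, then remove it.
    ${RUN:-} lvchange -a n "$lv" && ${RUN:-} lvremove "$lv"
}
```

A dry run such as RUN=echo zero_vg_free myvg (with myvg standing in for one of your VG names) prints the four commands without touching anything. To process every VG for real, something like "for vg in $(vgs --noheadings -o vg_name); do zero_vg_free $vg; done" should do, run as root.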
Comments
http://amedee.be/kleine-full-disk-backup-dd (in Dutch)
Just to let you know that you can create an LV that fills a VG with the "-l 100%FREE" argument to lvcreate, e.g.:
lvcreate -l 100%FREE -n zerolv $vg_name