Preparing Linux for optimal thin provisioning and deduplication

VMware and disk-array based thin provisioning can help you economize on disk space, just as disk-array based deduplication can. But how do you take optimal advantage of those techniques? Executive summary:
dd if=/dev/zero of=/tmp/tempfile.zeroes ; rm -f /tmp/tempfile.zeroes

First of all, both thin provisioning and deduplication work transparently. But you can make them work better with a little bit of effort. There are three different types of blocks on your (virtual) disk that we need to distinguish: data blocks, old data blocks, and empty blocks.
Data blocks and empty blocks are what they are, and can't be influenced. Data blocks contain data for files that are on your filesystem. Empty blocks have never been written to with any real data, so they're blank: they contain only zeroes.
It's the "old data" blocks that we can improve: they contain data that used to be part of a file. The file got deleted, but the contents are still there.
Overwriting those blocks with zeroes will help a (re)conversion to thin provisioning later, and for dedup they become empty blocks again, which makes them perfect candidates for sharing.
Overwriting just the old data blocks is hard, but you can probably live with overwriting both empty and old data blocks. This will allocate all blocks (bye-bye thin provisioning, for now), but the contents will be all zeroes: perfect for re-thin-provisioning later, and perfect for dedup.
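To get a feel for how much of a file or disk image still holds non-zero data (live or old), you can count its non-zero bytes. This is only a rough sketch: the function name nonzero_bytes is made up for this example, and it reads the whole file, so it is slow on large images.

```shell
# Print the number of non-zero bytes in a file (hypothetical helper).
# 0 means the file consists of zeroes only.
nonzero_bytes() {
    # Delete all NUL bytes, then count what is left.
    # LC_ALL=C keeps tr byte-oriented on binary data.
    LC_ALL=C tr -d '\000' < "$1" | wc -c | tr -d ' '
}
```

For example, `nonzero_bytes /path/to/some.img` prints 0 for an image that would dedup or thin-provision perfectly.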
For each local filesystem mounted on your system, you can execute:
dd if=/dev/zero of=/$mountpoint/tempfile.zeroes ; rm -f /$mountpoint/tempfile.zeroes
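The per-filesystem command above could be wrapped in a small script along these lines. This is a sketch under a few assumptions, not a tested tool: the function names are invented here, the list of local filesystems is taken from `df -lP` and limited to devices under /dev, mount points containing spaces are not handled, and the `sync` before the `rm` is an addition of mine, meant to give the buffered zeroes a chance to reach the disk before the file disappears.

```shell
# List mount points of local filesystems backed by a /dev device.
# Assumption: df supports -l (local only) and -P (POSIX output format).
local_mountpoints() {
    df -lP | awk 'NR > 1 && $1 ~ /^\/dev\// { print $6 }'
}

# zero_fill DIR [MAX_MIB]
# Write zeroes to DIR/tempfile.zeroes until the filesystem is full
# (or until MAX_MIB MiB are written, handy for a trial run), then remove it.
zero_fill() {
    tmp="$1/tempfile.zeroes"
    if [ -n "${2:-}" ]; then
        dd if=/dev/zero of="$tmp" bs=1048576 count="$2" 2>/dev/null
    else
        # dd stops with "No space left on device"; that is the goal here.
        dd if=/dev/zero of="$tmp" bs=1048576 2>/dev/null || true
    fi
    sync            # assumption: flush the zeroes before deleting the file
    rm -f "$tmp"
}

# Zero-fill every local filesystem, one after the other.
zero_fill_all() {
    local_mountpoints | while read -r mp; do
        zero_fill "$mp"
    done
}
```

A cautious first try would be `zero_fill /tmp 100`, which writes only 100 MiB; `zero_fill_all` as root does the real thing and fills every filesystem briefly.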

Be careful: this will, very briefly, fill up your filesystem(s). If your application can't handle this, do it when that app isn't running. For those of you who use LVM, the unallocated PEs in every VG aren't cleared by this procedure, so for every Volume Group, find out its name and available free space:
vgs -o vg_name,vg_free

Then run (fill in $vg_free and $vg_name with the data you just gathered):
lvcreate -n zerolv -L $vg_free $vg_name && dd if=/dev/zero of=/dev/$vg_name/zerolv ; lvchange -a n /dev/$vg_name/zerolv && lvremove /dev/$vg_name/zerolv
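The same LVM steps can be sketched as a function that takes the VG name as its argument. It uses `-l 100%FREE` (as suggested by Jonathan Barber in the comments) instead of a size copied from the vgs output. The function name zero_vg_free and the RUN variable are made up for this sketch; by default RUN is `echo`, so the commands are only printed, not executed.

```shell
# Dry run by default: commands are printed, not executed.
# Set RUN to the empty string (RUN=) to run them for real, as root.
RUN=${RUN-echo}

# zero_vg_free VG_NAME
# Create an LV covering all free extents of the VG, fill it with
# zeroes, then deactivate and remove it again.
zero_vg_free() {
    vg=$1
    lv="/dev/$vg/zerolv"
    # Stop here if lvcreate fails (e.g. no free extents), so that dd
    # never writes to a path that is not a real logical volume.
    $RUN lvcreate -l 100%FREE -n zerolv "$vg" || return 1
    # dd ends with "No space left on device" when the LV is full.
    $RUN dd if=/dev/zero of="$lv" bs=1048576 || true
    $RUN lvchange -a n "$lv" && $RUN lvremove "$lv"
}
```

Calling `zero_vg_free vg0` (vg0 being a placeholder name) prints the four commands so you can review them; call it once for each VG name reported by vgs.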

Comments

localhost said…
This reminds me of a blog post I wrote a year ago:
http://amedee.be/kleine-full-disk-backup-dd (in Dutch)
Bert de Bruijn said…
Forgot to mention: for Windows the tool of choice is sdelete.
Jonathan Barber said…
Many thanks for the post, very helpful to know that you have to use the same procedure for Linux as for Windows.

Just to let you know that you can create an LV that fills a VG with the "-l 100%FREE" argument to lvcreate, e.g.:
lvcreate -l 100%FREE -n zerolv $vg_name
