Go Beyond

Written by Teran McKinney
/ About Me / Half-time Remote DevOps/Systems Engineer /

Sparse files

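Sparse files are files whose apparent size is larger than the disk space they actually use. The unwritten regions (holes) read back as zeroes but have no blocks allocated to them, so they take up no disk space until something is written there. Only the allocated blocks count toward du and df usage; ls -l still reports the full logical size.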
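A minimal sketch of the difference between apparent and allocated size, assuming GNU coreutils (stat -c, du --apparent-size) and a scratch directory from mktemp on a filesystem that supports holes (ext4, XFS, Btrfs and tmpfs all do). The file name sparse.img is just for illustration.

```shell
#!/bin/sh
set -eu
# Scratch directory for the demo; sparse.img is a hypothetical name.
dir=$(mktemp -d)
img="$dir/sparse.img"

# Declare a 1 GiB file without writing any data to it.
truncate -s 1G "$img"

# %s is the apparent (logical) size in bytes; %b is the number of
# allocated blocks, each %B bytes in size (normally 512).
stat -c 'apparent: %s bytes, allocated: %b blocks of %B bytes' "$img"

# du counts allocated space by default; --apparent-size shows the logical size.
du -h "$img"
du -h --apparent-size "$img"
```

On most filesystems the allocated block count for a freshly truncated file is zero, while the apparent size is the full 1073741824 bytes.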

This is useful with virtual disk images for VMware, XenServer, or most other hypervisors. You can give users terabytes of space while only having gigabytes; of course, this backfires when they start using that space and the files grow until you run out.
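To see how the file "grows" as a guest uses it, here is a rough sketch that writes 10 MiB into the middle of a 1 GiB sparse image and compares allocated blocks before and after. It assumes GNU dd (for iflag=fullblock and status=none) and GNU stat; disk.img is a hypothetical name, and /dev/urandom is used so that filesystems with compression can't shrink the written data.

```shell
#!/bin/sh
set -eu
dir=$(mktemp -d)
img="$dir/disk.img"   # hypothetical VM image name

truncate -s 1G "$img"
before=$(stat -c %b "$img")

# Simulate a guest writing 10 MiB at the 100 MiB mark.
# conv=notrunc keeps dd from cutting the file off after the write.
dd if=/dev/urandom of="$img" bs=1M seek=100 count=10 \
   iflag=fullblock conv=notrunc status=none

after=$(stat -c %b "$img")
echo "allocated 512-byte blocks before: $before, after: $after"
stat -c 'apparent size still: %s bytes' "$img"
```

The apparent size stays at 1 GiB, but the allocated space grows by roughly the amount written; repeat that across many guests and you eventually hit the overcommit problem described above.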

They're also very helpful for loopback filesystems, perhaps encrypted if you wish to keep secrets from Big Brother and are too lazy to repartition.

Anyway, in a Linuxy environment I know of two ways to do this:

truncate -s 100T bigsparsefile

truncate is seldom mentioned, which is the main reason I wrote this post. It's clean and simple, though, in my opinion, not as elegant as the dd approach.
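truncate can also grow an existing file: GNU truncate accepts a leading + on the size to extend by that amount, and the new tail is a hole too. A small sketch, assuming GNU coreutils; the sizes are scaled down to 1 GiB because 100T exceeds the maximum file size on some filesystems (ext4 with 4 KiB blocks tops out at 16 TiB). grow.img is a hypothetical name.

```shell
#!/bin/sh
set -eu
dir=$(mktemp -d)
img="$dir/grow.img"   # hypothetical file name

truncate -s 1G "$img"    # create (or resize) to exactly 1 GiB
truncate -s +1G "$img"   # extend by another 1 GiB; the added range is a hole

stat -c '%s bytes apparent, %b blocks allocated' "$img"
```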

dd count=0 bs=1 seek=100T of=100tbsparsefile

This is ingenious, and I did not come up with it on my own. No input is needed: with count=0, dd copies nothing, and seek simply moves the output position to the requested offset. Since no real data is written, dd only sets the file's size to that offset; no data blocks are allocated.
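The same dd trick, scaled down to 1 GiB so it works on any filesystem, as a sketch assuming GNU dd and stat. The if=/dev/null is an addition that just makes "no input" explicit; with count=0 nothing is read either way. dd-sparse.img is a hypothetical name.

```shell
#!/bin/sh
set -eu
dir=$(mktemp -d)
img="$dir/dd-sparse.img"   # hypothetical file name

# count=0: copy zero blocks, so nothing is read or written.
# bs=1 seek=1G: move 1 GiB (in bytes) into the output. Because conv=notrunc
# is not given, dd then sets the output file's size to exactly that offset.
dd if=/dev/null of="$img" bs=1 count=0 seek=1G status=none

stat -c '%s bytes apparent, %b blocks allocated' "$img"
```

One thing to watch: if the output file already exists and is larger than the seek offset, dd will shrink it to that offset unless you add conv=notrunc.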

You need bs=1 because seek is counted in output blocks, and dd's default block size is 512 bytes. With bs=1, seek=100T means 100 TiB; without it, the same seek would ask for 512 times that (51,200 TiB).
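The block-size arithmetic is easy to check with tiny files. This sketch, assuming GNU dd and stat, runs the same seek=2K with the default block size and with bs=1 (in GNU dd, the K suffix means 1024). The two output names are hypothetical.

```shell
#!/bin/sh
set -eu
dir=$(mktemp -d)

# Default block size: seek counts 512-byte blocks, so 2K = 2048 * 512 bytes.
dd if=/dev/null of="$dir/default-bs" count=0 seek=2K status=none

# bs=1: seek counts single bytes, so 2K = 2048 bytes.
dd if=/dev/null of="$dir/bs1" bs=1 count=0 seek=2K status=none

stat -c '%n: %s bytes' "$dir/default-bs" "$dir/bs1"
```

The first file comes out at 1048576 bytes and the second at 2048 bytes: the same 512x factor that turns 100T into 51,200 TiB.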

Keep in mind that sparse files often get archived or transferred with their holes expanded into real zeroes, which can make for some awkward results, such as a copy that suddenly takes up its full size on disk. You usually need special flags to handle sparse files properly; -S (--sparse) should do the trick for both tar and rsync.
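A quick round trip through tar, as a sketch assuming GNU tar and GNU stat. With -S, tar records where the holes are instead of storing the zeroes, so the archive stays tiny, and extraction (which needs no extra flag) recreates the holes. rsync -S behaves analogously for transfers but isn't shown here. The file names are hypothetical.

```shell
#!/bin/sh
set -eu
dir=$(mktemp -d)
truncate -s 100M "$dir/sparse.img"

# -S (--sparse): store the hole map rather than 100 MiB of zeroes.
tar -cSf "$dir/sparse.tar" -C "$dir" sparse.img

# Extract into a separate directory; GNU tar restores the holes on its own.
mkdir "$dir/out"
tar -xf "$dir/sparse.tar" -C "$dir/out"

stat -c '%n: %s bytes apparent, %b blocks allocated' \
    "$dir/sparse.tar" "$dir/out/sparse.img"
```

Dropping -S from the create step makes tar store the full 100 MiB of zeroes in the archive.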

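Editor's note: This is pretty old. There can be some drawbacks to this, and it's not a good idea for swap files: swapon(8) warns that the kernel writes to a swap file directly, without the filesystem's help, which is a problem for files with holes. Email me if you're curious about this.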
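Before using a file as swap, you can check whether it has holes by comparing allocated bytes against its apparent size. The has_holes function below is a hypothetical helper, not a standard tool; it assumes GNU stat, and it assumes a filesystem without transparent compression, since compression can make a fully written file look smaller than its apparent size. The file names are also hypothetical.

```shell
#!/bin/sh
set -eu

# has_holes FILE (hypothetical helper): succeeds if FILE has fewer bytes
# allocated than its apparent size, i.e. it contains at least one hole.
has_holes() {
    alloc=$(( $(stat -c %b "$1") * $(stat -c %B "$1") ))
    [ "$alloc" -lt "$(stat -c %s "$1")" ]
}

dir=$(mktemp -d)

# Sparse: fine for disk images, not for swap.
truncate -s 64M "$dir/sparse-swap"

# Writing real zeroes allocates every block, which is what swap needs.
dd if=/dev/zero of="$dir/full-swap" bs=1M count=64 status=none

for f in "$dir/sparse-swap" "$dir/full-swap"; do
    if has_holes "$f"; then
        echo "$f: has holes"
    else
        echo "$f: fully allocated"
    fi
done
```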
