The problem
I often copy important files onto multiple devices, and then later back up all of those devices onto the same drive. At that point, I have multiple redundant copies of those files within my backup. Tools like rdfind, fdupes, and jdupes exist to deal with the general problem of efficiently searching a collection of files for duplicates, but none of them can restrict the comparison to files whose names and/or paths match, so they end up doing a lot of unnecessary work in this case.
The solution
Download the script I wrote, hardlink-dups-by-name.sh, and run it as follows:

hardlink-dups-by-name.sh a_backup/ another_backup/
Then all files like a_backup/some/path that are identical to the corresponding file another_backup/some/path will get hard-linked together, so only one copy of the data takes up space.
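
To make the behavior concrete, here is a minimal sketch of the core logic such a script could use. This is my illustration, not the contents of hardlink-dups-by-name.sh; the src and dst variable names are hypothetical.

```bash
#!/usr/bin/env bash
# Minimal sketch, not the actual script: walk the first tree, and for
# each file that also exists at the same relative path in the second
# tree, hard-link the two if their contents are identical.
set -euo pipefail

src="${1%/}"   # first backup tree (trailing slash stripped)
dst="${2%/}"   # second backup tree

find "$src" -type f -print0 | while IFS= read -r -d '' f; do
    rel="${f#"$src"/}"      # path relative to the first tree
    other="$dst/$rel"       # corresponding file in the second tree
    # Skip if there is no counterpart, or if they are already the same inode
    if [[ -f "$other" && ! "$f" -ef "$other" ]]; then
        # cmp -s exits 0 only when the files are byte-for-byte identical
        if cmp -s -- "$f" "$other"; then
            ln -f -- "$f" "$other"   # replace the duplicate with a hard link
        fi
    fi
done
```

The ln -f at the end replaces the file at the second path with a hard link to the first, so both paths remain valid but the data is stored only once.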