Transferring many small files is much slower than you would expect given their total size.
tar c directory | pv -abrt | ssh target 'cd destination; tar x'
cd destination; ssh source tar c directory | pv -abrt | tar x
There are a few different things going on here. The primary issue is that common file transfer programs like scp handle the case of lots of small files poorly, which is solved by transferring one large file instead. We can create an archive with tar to put all of the small files into one file and then transfer that file:
tar cf archive.tar "$directory"
scp archive.tar "$target:$destination"
rm archive.tar
That works, but creating the intermediate file
archive.tar might take
a lot of time and disk space, during which the actual transfer hasn't
even started yet.
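The streaming alternative can be sketched locally before any network is involved. This is a toy demonstration with made-up directory names (src, dst): tar's output is piped straight into a second tar that extracts it, so no archive.tar ever touches the disk.

```shell
set -e
# Toy payload standing in for the real directory of small files.
mkdir -p src/sub
echo 'one' > src/file1
echo 'two' > src/sub/file2

# Stream the archive straight into extraction; cf - and xf - are the
# explicit forms of writing to and reading from standard output/input.
mkdir -p dst
tar cf - src | tar xf - -C dst

ls dst/src
```

Over the network, the pipe in the middle simply gains an ssh hop, which is where the rest of this article goes.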
Taking a step back, we are going to use a useful feature of ssh that makes it very flexible for scripting, but that is easy to be unaware of if you only ever use ssh for its common use case of getting an interactive shell on a remote computer: ssh can take as an argument a command to run on the destination computer. A simple demonstration of this is
ssh $hostname w
which will output the status of the remote computer.
Note that the output appeared on our local terminal because ssh handles sending the output of w back along the network connection.
In fact, ssh will also send its input over the network, so
echo "Hello World!" | ssh $hostname 'cat > file'
creates a file on the remote host containing the string Hello World!, which was generated on the local host.
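If you don't have a remote host handy, sh -c makes a rough local stand-in for ssh $hostname: like ssh, it takes a single command string to run, and the pipe feeds that command's standard input just as ssh would carry it across the network. (file is a scratch filename, as in the text.)

```shell
set -e
# sh -c stands in for `ssh $hostname`: both run one command string,
# and the pipe delivers our stdin to that command.
echo 'Hello World!' | sh -c 'cat > file'
cat file
```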
In our case, we want to generate an archive on one host, send it over
ssh, and extract it on the target host. Importantly, we are going
to take advantage of the fact that
tar was designed to generate
its output sequentially, without making any random access writes to
the archive. While this was intended for outputting tape backups, it helps us here as well. To make tar write to standard output, we change the tar command to tar cf - directory (or simply tar c directory: like most programs, it defaults to using standard input/output, which can be explicitly denoted by -). On the other side of the pipe, we use tar x to extract the archive. Inserting the ssh in the middle gives us tar c directory | ssh target tar x. If you don't want to extract the archive into your home directory on the target, you can add a cd to the desired directory before the tar x command; just remember to enclose the entire command to be run on the remote host in quotes, or the semicolon will be misinterpreted as ending the local command.
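The quoting rule can also be seen with sh -c standing in for ssh (payload and dest are made-up names). The quotes keep cd dest; tar x together as one command string for the remote side; without them, the semicolon would end the local pipeline and tar x would run locally instead.

```shell
set -e
mkdir -p payload dest
echo 'data' > payload/f

# The quotes hand `cd dest; tar xf -` to the remote shell (sh -c here,
# ssh target in real use) as a single command string. Unquoted, the
# semicolon would terminate the local pipeline at the wrong place.
tar cf - payload | sh -c 'cd dest; tar xf -'

ls dest/payload
```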
Depending on the nature of the files being transferred and the network link, it may be worthwhile to compress the files before transferring them. Do note that on a fast enough network this may in fact slow down the transfer.
The simplest way is to just use tar's compression support by adding a flag to tar such as --lzop. If you want more control, you can separate the compression out into its own command:
tar c directory | gzip | ssh destination 'gunzip | tar x'
which is equivalent to giving the z flag to both tar commands.
ssh also has built-in support for compression, which can be enabled with the -C flag. In practice, how much any of these options helps varies with the data and the network, so it's worth measuring.
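Here is a local sketch of the explicit compression pipeline (stuff and out are made-up names); over the network, ssh would sit between gzip and gunzip.

```shell
set -e
mkdir -p stuff out
# Highly compressible sample data.
head -c 4096 /dev/zero > stuff/zeros

# Compression as its own pipeline stage, separate from tar; this is
# what the z flag to tar would otherwise do internally.
tar cf - stuff | gzip | gunzip | tar xf - -C out

cmp stuff/zeros out/stuff/zeros && echo 'round trip ok'
```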
A progress bar
When doing a file transfer, you usually want to know how fast it's going
and how long it's going to take. This is especially useful if you are
experimenting with compression settings to see which one is the fastest.
Do note that in that case you need to be careful to measure the transfer speed of the uncompressed content, which is why I mentioned splitting the compression commands out from tar.
There's a very useful little utility called pv which will sit on a pipe and display the status of transfers through that pipe. Here we will be using it to measure the network transfer speed. By inserting pv in the pipeline before ssh, we get a running display of how fast ssh is transferring data. pv has a lot of output options, so you'll probably want to check the man page.
The ones I used above are -a and -r for the average and current transfer rates, -t for the elapsed time, and -b for the bytes transferred so far.
Note that we can't easily get a progress percentage out of pv because it has no way of knowing how big the transfer is. We can tell it the expected size using the -s option… but the exact transfer size is unknown ahead of time. du -bs directory can be used to estimate the size, but it ignores the overhead of the tar format. Also, the -b option for counting bytes is a GNU extension; if not using GNU du, use du -ks to count kilobytes and multiply the result by 1024.
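The size estimate can be computed portably like this (demo is a made-up sample directory; the pv/ssh line is left as a comment since it needs a real remote host):

```shell
set -e
mkdir -p demo
echo 'sample' > demo/f

# POSIX-portable estimate: du -ks reports kilobytes; scale to bytes.
kb=$(du -ks demo | awk '{print $1}')
size=$((kb * 1024))
echo "estimated size: $size bytes"

# With GNU du, `size=$(du -bs demo | awk '{print $1}')` counts bytes
# directly. Feeding the estimate to pv enables a percentage display:
# tar c demo | pv -abrt -s "$size" | ssh target 'tar x'
```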