The problem
My existing script for publishing my blog has Pelican run on the web server and generate the static site directly into the directory served by nginx. This has the effect that while the blog is being published, it is inaccessible or some of the pages or styles are missing. The publish takes well under a minute, so this isn't a big issue, but there's no reason for any downtime at all.
The solution
Instead of serving the output/ directory directly, generate the site
there and then copy it over to the served directory by changing the
make publish line in schedule_publish.sh to the following:
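make publish || exit 1
# Only deploy if output_dir is the expected symbolic link.
if [ -L output_dir ]
then
    # Copy first: this may cross filesystems and take a while.
    cp -r output output_dir/
    # Delete the previous backup, then swap with two quick renames.
    rm -rf output_dir/html.old
    mv output_dir/html output_dir/html.old
    mv output_dir/output output_dir/html
fi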
where output_dir/ is a symbolic link to the parent of the directory
actually being served and html/ is the directory actually being served
(which output/ previously was a symbolic link to).
The details
Double buffering
This is effectively double buffering the output/ directory.
The work is organized such that invalidating the old site and installing
the new site can be executed in quick succession. As both operations
are done as simple renames (mv within a single filesystem), they
will be nearly instant, so it's extremely unlikely a web request will
arrive while the site is down. If we wanted to avoid even that, we could
tell nginx to check both directories (html/ and html.old/) for every
request, but that seems unnecessary and, without extra work, would mean
files that were deleted in the most recent version of the site would
still be accessible until the next update, as nginx would find them in
html.old/.
To ensure the timing works out, the script assumes output_dir/ may be
a symbolic link to a different filesystem and therefore copies output/
over first as that may take time. Also, instead of deleting the old
site, it is just moved out of the way, which doubles as a backup in case
the new site ends up being corrupted somehow. The previous old site is
deleted before moving the current site out of the way as deleting files
also takes time.
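
To see the ordering in action without touching a real site, here is a
minimal sketch that replays the same steps in a scratch directory. The
served/ directory and the index.html contents are hypothetical stand-ins
for the real layout; only the command sequence comes from the script above.

```shell
# Scratch layout (hypothetical): work/output is the fresh build,
# work/served/html is the "live" site, work/output_dir links to served/.
work=$(mktemp -d)
mkdir -p "$work/output" "$work/served/html"
echo "new site" > "$work/output/index.html"
echo "old site" > "$work/served/html/index.html"
ln -s "$work/served" "$work/output_dir"

cd "$work"
if [ -L output_dir ]
then
    cp -r output output_dir/                  # slow: may cross filesystems
    rm -rf output_dir/html.old                # slow: deletes the old backup
    mv output_dir/html output_dir/html.old    # fast: rename
    mv output_dir/output output_dir/html      # fast: rename
fi

cat output_dir/html/index.html        # prints "new site"
cat output_dir/html.old/index.html    # prints "old site"
```

The two slow steps happen while the old site is still being served; the
only window in which html/ does not exist is between the two mv commands.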
Error handling
Adding the || exit 1 after make publish ensures that if
the publish fails, the old site will stay in place. This also skips
scheduling future posts to be published, but if the site is unable to be
published, that would be a waste of time anyway.
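
As a small illustration of that behavior, the sketch below swaps in a
hypothetical fake_publish function that always fails in place of
make publish, and runs the steps in a subshell so that exit 1 only ends
the subshell. The paths are scratch stand-ins, not the real site.

```shell
work=$(mktemp -d)
mkdir "$work/html"
echo "old site" > "$work/html/index.html"

# Hypothetical stand-in for a failing `make publish`.
fake_publish() { return 1; }

# The subshell plays the role of schedule_publish.sh.
(
    fake_publish || exit 1
    echo "broken site" > "$work/html/index.html"   # never reached
)
status=$?

echo "exit status: $status"      # prints "exit status: 1"
cat "$work/html/index.html"      # prints "old site"
```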
Permissions
In order to make sure the files are generated with the right permissions
to be readable by the web server, output/ and the directory the
output_dir/ link points to both have their group set to www-data and
have the g+s bit set with chmod so files/directories created in them
will inherit the group and therefore also be readable by www-data.
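
The setup described above might look roughly like the following sketch.
It works in a scratch directory and uses the current user's primary
group as a stand-in for www-data so it can be tried anywhere; on the
real server the group would be www-data and the directories would be
output/ and the target of output_dir/.

```shell
demo=$(mktemp -d)
site_group=$(id -gn)    # assumption: stand-in for www-data

mkdir "$demo/output"
chgrp "$site_group" "$demo/output"   # set the directory's group
chmod g+s "$demo/output"             # new entries inherit that group

# Anything created inside now gets the directory's group.
touch "$demo/output/index.html"
ls -l "$demo/output"
```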