Pelican's default Makefile includes a target, make regenerate, which uses Pelican's -r/--autoreload option to regenerate the site whenever a file is modified. Combined with the Firefox extension Auto Reload, this makes it easy to keep an eye on how a blog post will be rendered as you author it and to quickly preview theme changes.
The problem
With just thirty articles, Pelican already takes several seconds to regenerate the site. For publishing a site, this is plenty fast, but for tweaking formatting in a blog post or theme, this is too slow.
The quick solution
Pelican has an option, --write-selected, which makes it write out only the files listed. Writing just one file takes about half a second on my computer, even though it still has to do some processing for all of the files in order to determine what to write. To use --write-selected, you have to determine the output filename of the article you are editing:
$ pelican -r content -o output -s pelicanconf.py \
--relative-urls \
--write-selected output/draft/in-progress-article.html
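If you find yourself retyping that command, it can be assembled programmatically. The following is only a sketch: pelican_command is a hypothetical helper name, the defaults mirror the command above, and joining several paths with commas is my assumption about how --write-selected accepts more than one file.

```python
def pelican_command(selected_paths, content="content", output="output",
                    settings="pelicanconf.py"):
    """Build the argument list for an auto-reloading, selective Pelican run.

    Hypothetical helper: it only assembles arguments, it does not run Pelican.
    """
    return [
        "pelican", "-r", content,
        "-o", output,
        "-s", settings,
        "--relative-urls",
        # Assumption: multiple paths are passed as one comma-separated value.
        "--write-selected", ",".join(selected_paths),
    ]
```

The resulting list can be handed to subprocess.run, or printed with shlex.join to check it before running.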
The right solution
Optimally, we wouldn't have to tell Pelican which file to output; instead, it would figure out which files could be affected by a change and regenerate only those files.
Once you consider things like {filename} links and both adding and removing tags, it becomes clear that figuring out dependencies is actually non-trivial, and it would be easy to miss some detail.
A workaround would be to handle dependencies implicitly by recording all of the information that would be used to write each file. At the step where --write-selected applies, much of the processing has already been done, including resolving {filename} links and organizing articles into tags. The only step remaining is using the template to build the HTML file. If that information changes, then the output file will change, so it should be regenerated. That's an oversimplification because there are probably properties that the template doesn't read, so some files would be regenerated unnecessarily, but figuring out exactly which properties the template does read could be difficult.
As a further simplification, instead of remembering the full data for every output file, we can note that all we care about is whether that data changed, not what it is, so it's enough to store a hash value. To make collisions negligible, the hash should be computed with a cryptographic hash function over some repeatable value, such as the string representation of the dictionary with its keys in sorted order.
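To make the hashing idea concrete, here is a minimal sketch. It is not code from Pelican: context_digest and files_to_rewrite are hypothetical names, and it assumes the data handed to the template can be reduced to a JSON-serialisable dictionary, which real Pelican content objects are not without some conversion.

```python
import hashlib
import json


def context_digest(context):
    """Return a SHA-256 hex digest of a JSON-serialisable context dict."""
    # sort_keys=True makes the serialisation independent of insertion order,
    # so equal dictionaries always produce the same string.
    canonical = json.dumps(context, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def files_to_rewrite(contexts, previous_digests):
    """Compare each output file's context against the stored digests.

    contexts maps output path -> context dict for this run;
    previous_digests maps output path -> digest from the last run.
    Returns (list of paths whose digest changed, new digest table).
    """
    changed = []
    new_digests = {}
    for path, context in contexts.items():
        digest = context_digest(context)
        new_digests[path] = digest
        if previous_digests.get(path) != digest:
            changed.append(path)
    return changed, new_digests
```

On the first run every file counts as changed because nothing is stored yet; on later runs only files whose recorded data differs are selected. The digest table would need to be kept between runs, for example in a small file next to the output directory.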
I will look into implementing this at some point, but for the time being, I found a simpler, albeit incomplete, solution.
The hackish solution
I realised that I don't actually need a complete solution: most of the time when I care about regenerate being fast, I am writing a new blog post marked as a draft. Furthermore, I rarely have many drafts at the same time, so instead of figuring out exactly which files to generate, it suffices to regenerate all drafts. So I added an option, --write-only-drafts, to do so.
The --write-only-drafts flag is like the --write-selected flag, except that instead of taking an argument, it selects all drafts.
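The selection step amounts to collecting the output path of every draft and treating that list as the set of selected files. The sketch below illustrates the idea only; it is not the actual patch. Draft is a hypothetical stand-in for Pelican's article objects, and the save_as field and the use of absolute paths are assumptions about how the real objects and the selection check behave.

```python
import os
from dataclasses import dataclass


@dataclass
class Draft:
    """Hypothetical stand-in for a draft article."""
    save_as: str  # output path relative to the output directory (assumed)


def draft_output_paths(drafts, output_dir):
    """Return the absolute output path of every draft, sorted and de-duplicated."""
    return sorted(
        {os.path.abspath(os.path.join(output_dir, d.save_as)) for d in drafts}
    )
```

With no drafts the list is empty, so nothing would be written, which matches the intent of only regenerating what is being edited.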