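I changed the URI scheme of this blog recently from one containing only the
year and month (/posts/2015/02/...) to one containing the full date (2015/02/01/...). The
latter looks better and makes the actual day of the post more visible.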
But I already had posts using the old scheme and
cool URIs don't change. Luckily, someone wrote a Pelican
plugin called pelican-alias which allows articles to be
tagged with additional URIs to redirect to their canonical location. All I
had to do was add an
Alias: /posts/2015/02/... line to the top of each
of the posts I had already written and the plugin would take care of the redirects.
Automating the aliasing
The non-trivial part of automating this is that the URIs include the article's slug, which may have been generated by Pelican from the title, so Pelican has to be involved in generating the correct redirects.
There are two ways I could have automated this process:
- Modify the plugin to add a redirect from the old scheme to the new scheme for every article. Unless somehow controlled, this would result in creating redirects for new articles which do not need them.
- Write a one-off script to get the slugs out of Pelican and write the Alias: lines into the blog posts.
I took the latter approach because it was simpler and involved no new code to maintain.
Script to add Alias: lines
I wrote the following one-off script
(which relies a bit on my
content/ directory layout and URI scheme):
make html DEBUG=1 2>&1 | grep -F 2015/02 | grep -F Writing | sed 's@.*2015/02/@@' | sed 's@/index.html@@' | sed "s@^\([^/]*\)/\([^/]*\)\$@sed -i '/^Tags:/aAlias: /posts/2015/02/\2/' content/2015/02\1-*.md@"
How it works
This script doesn't actually make any file changes. Instead it outputs a script which makes the changes. It could easily be modified to run the script that it generates, but it's a one-off script so I wanted to be able to inspect the output before executing it anyway.
Code being generated
The goal is to output a list of
sed commands which will add
Alias: lines. For example, the post Hello World!
posted on February 1, 2015 has the slug
hello-world. We need
to get the filename of the post somehow. As I have at most one post
per day, there's a mapping from dates to filenames, specifically
content/2015/0201-*.md will match the only post made on February 1,
2015. If the filename scheme did not include the date, then
$(grep -Rl '^Date: 2015-02-01' content/ | grep -v ^content/pages)
would get the filename of the post (but not page) that has a
Date: line saying it was posted on February 1, 2015.
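That alternative lookup can be tried on a throwaway directory tree. The following is only a sketch: the file names and header contents are made up for illustration, and both a post and a page carry the same Date: line so that the effect of the second grep is visible.

```shell
# Build a temporary content/ tree (hypothetical file names and headers).
workdir=$(mktemp -d)
mkdir -p "$workdir/content/2015" "$workdir/content/pages"
printf 'Title: Hello World!\nDate: 2015-02-01\nTags: meta\n\nBody.\n' \
    > "$workdir/content/2015/0201-hello-world.md"
printf 'Title: About\nDate: 2015-02-01\n\nBody.\n' \
    > "$workdir/content/pages/about.md"

# -R searches recursively, -l prints only the names of matching files;
# the second grep drops anything under content/pages.
found=$(cd "$workdir" && grep -Rl '^Date: 2015-02-01' content/ | grep -v '^content/pages')
echo "$found"

rm -rf "$workdir"
```

Only the post's path should be printed; the page is filtered out even though its Date: line matches.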
So the actual
sed command the script generates looks like
sed -i '/^Tags:/aAlias: /posts/2015/02/hello-world/' content/2015/0201-*.md
This uses two
sed features that are less commonly used:
- The -i option tells sed to write its changes back to the source file.
- The a command appends a line. Before the command, we anchor on the Tags: line, which was the last line of the header in all of my blog posts.
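To see both features in isolation, here is a small sketch that runs the same kind of command against a temporary file. The post contents are invented for the example, and it assumes GNU sed: BSD/macOS sed wants a suffix argument after -i and does not accept the one-line form of the a command.

```shell
# Create a throwaway post whose header ends with a Tags: line
# (hypothetical content, just for this demonstration).
post=$(mktemp)
printf 'Title: Hello World!\nDate: 2015-02-01\nTags: meta\n\nFirst post.\n' > "$post"

# -i edits the file in place; /^Tags:/ selects the line to anchor on;
# a appends the text that follows as a new line after it (GNU sed syntax).
sed -i '/^Tags:/aAlias: /posts/2015/02/hello-world/' "$post"

result=$(cat "$post")
echo "$result"

rm -f "$post"
```

The Alias: line ends up directly below Tags:, still inside the header and above the blank line that separates it from the body.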
How to generate it
DEBUG=1 will make Pelican output a lot of information about
exactly what it is doing. The relevant part for this script is that for
every blog article Pelican will write a line stating where it outputs
the article to:
-> Writing blog/output/2015/02/01/hello-world/index.html
Due to the URI scheme, these lines include the full date and the slug, which is the information we need.
All of the debugging information is output to stderr. In order
to actually process it,
2>&1 redirects stderr (fd
2) to stdout (fd 1), which is the stream that pipes pass to
the next command.
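The effect of the redirect is easy to check without Pelican. In this sketch, emit_debug is a hypothetical stand-in that prints one debug-style line to stderr, and grep -c counts how many lines actually arrive through the pipe.

```shell
# Hypothetical stand-in for Pelican's debug output: one line on stderr.
emit_debug() {
    echo '-> Writing blog/output/2015/02/01/hello-world/index.html' >&2
}

# Without 2>&1 the line bypasses the pipe, so grep sees no input.
# (The outer 2>/dev/null only keeps the stray line off the terminal;
# "|| true" is there because grep exits non-zero when it matches nothing.)
without=$( { emit_debug | grep -c Writing; } 2>/dev/null || true )

# With 2>&1 stderr is pointed at the pipe, so grep counts the line.
with=$( emit_debug 2>&1 | grep -c Writing || true )

echo "without 2>&1: $without, with 2>&1: $with"
```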
The grep commands select the lines we want. The first two
sed commands strip everything before the day and after the
slug, leaving just
01/hello-world in our example. Note the use of
@ instead of / as the delimiter for the
s command, as the patterns contain
/ but don't contain @.
The last sed command captures the strings before and after the
/ and uses them to write a new
sed command. The first string,
\1, is the day of the post and
\2 is the slug of the post, which are
the two pieces needed to build the output
sed command described above.
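The text-processing half of the pipeline can be exercised without running Pelican by feeding it canned lines. This is a sketch under assumptions: the sample function and its log lines (including the second post and the unrelated line) are made up, and the sed expressions are quoted so the shell passes them through unchanged.

```shell
# Canned debug lines standing in for "make html DEBUG=1 2>&1"
# (hypothetical; real output contains far more lines).
sample() {
    cat <<'EOF'
-> Writing blog/output/2015/02/01/hello-world/index.html
DEBUG: an unrelated line that also mentions 2015/02
-> Writing blog/output/2015/02/14/second-post/index.html
-> Writing blog/output/2015/03/02/march-post/index.html
EOF
}

generated=$(sample |
    grep -F 2015/02 |              # keep lines about February 2015
    grep -F Writing |              # keep only the "Writing" lines
    sed 's@.*2015/02/@@' |         # strip everything before the day
    sed 's@/index.html@@' |        # strip everything after the slug
    # \1 is the day, \2 is the slug; \$ keeps the shell from expanding $@
    sed "s@^\([^/]*\)/\([^/]*\)\$@sed -i '/^Tags:/aAlias: /posts/2015/02/\2/' content/2015/02\1-*.md@")

echo "$generated"
```

Each surviving line becomes one sed command, which can be read over before being handed to a shell.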