The problem#
I recently changed the URI scheme of this blog from /posts/YYYY/MM/slug/
to /YYYY/MM/DD/slug/. The latter looks better and makes the actual day of
the post more visible.
But I already had posts using the old scheme and
cool URIs don't change. Luckily, someone wrote a Pelican
plugin called pelican-alias which allows articles to be
tagged with additional URIs to redirect to their canonical location. All I
had to do was add an Alias: /posts/2015/02/...
line to the top of each
of the posts I had already written and the plugin would take care of the
rest.
Automating the aliasing#
The non-trivial part of automating this is that the URIs include the article's slug, which may have been generated by Pelican from the title, so Pelican has to be involved in generating the correct redirects.
There are two ways I could have automated this process:
- Modify the plugin to add a redirect from the old scheme to the new scheme for every article. Unless somehow controlled, this would result in creating redirects for new articles which do not need them.
- Write a one-off script to get the slugs out of Pelican and write the Alias: lines into the blog posts.
I took the latter approach because it was simpler and involved no new code to maintain.
Script to add Alias: lines#
I wrote the following one-off script
(which relies a bit on my content/
directory layout and URI scheme):
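# Prints the sed commands for review; it does not modify any files itself.
make html DEBUG=1 2>&1 |
    grep -F 2015/02 |              # keep only lines mentioning February 2015
    grep -F Writing |              # keep only the "Writing <path>" debug lines
    sed 's@.*2015/02/@@' |         # strip everything up to and including 2015/02/
    sed 's@/index\.html@@' |       # strip the output filename after the slug
    # \$ stops the shell from expanding $@ inside the double quotes.
    sed "s@^\([^/]*\)/\([^/]*\)\$@sed -i '/^Tags:/aAlias: /posts/2015/02/\2/' content/2015/02\1-*.md@"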
How it works#
This script doesn't actually make any file changes. Instead it outputs a script which makes the changes. It could easily be modified to run the script that it generates, but it's a one-off script so I wanted to be able to inspect the output before executing it anyway.
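The inspect-then-run workflow can be sketched as follows. This is only an illustration: `generate` is a hypothetical stand-in for the real pipeline, and it emits a harmless `echo` instead of a real `sed -i` command so the sketch can be run anywhere.

```shell
# Hypothetical stand-in for the real pipeline: prints one command per line.
generate() {
    printf '%s\n' "echo 'would edit content/2015/0201-*.md'"
}

script=$(mktemp)
generate > "$script"      # 1. save the generated commands to a file
cat "$script"             # 2. read them over before trusting them
result=$(sh "$script")    # 3. only then execute the generated script
rm -f "$script"
```

Keeping generation and execution as separate steps means a mistake in the pipeline shows up as odd-looking text rather than as damaged blog posts.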
Code being generated#
The goal is to output a list of sed commands which will add the Alias:
lines. For example, the post Hello World! posted on February 1, 2015 has
the slug hello-world. We need to get the filename of the post somehow. As
I have at most one post per day, there's a mapping from dates to
filenames: specifically, content/2015/0201-*.md will match the only post
made on February 1, 2015. If the filename scheme did not include the date,
then $(grep -Rl '^Date: 2015-02-01' content/ | grep -v ^content/pages)
would get the filename of the post (but not page) that has a Date: line
saying it was posted on February 1, 2015.
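Both lookups can be tried against a throwaway directory. In this sketch the directory layout, the filename 0201-hello-world.md, and the header contents are made up for illustration; only the glob and the grep pipeline come from the text above.

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/content/2015" "$tmp/content/pages"

# A hypothetical post and a hypothetical page sharing the same Date: line.
printf 'Title: Hello World!\nDate: 2015-02-01\nTags: misc\n\nBody\n' \
    > "$tmp/content/2015/0201-hello-world.md"
printf 'Title: About\nDate: 2015-02-01\n\nBody\n' \
    > "$tmp/content/pages/about.md"

# Lookup by filename glob (works because the filename encodes the date).
by_glob=$(cd "$tmp" && echo content/2015/0201-*.md)

# Lookup by Date: header, excluding pages.
by_date=$(cd "$tmp" && grep -Rl '^Date: 2015-02-01' content/ | grep -v '^content/pages')

rm -rf "$tmp"
```

In this setup both variables end up naming the same post, and the page is filtered out by the second grep.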
So the actual sed command the script generates looks like:
sed -i '/^Tags:/aAlias: /posts/2015/02/hello-world/' content/2015/0201-*.md
This uses two sed features that are less commonly used:
- The -i option tells sed to write its changes back to the source file.
- The a command appends a line. Before the command, we anchor on the Tags: line, which was the last line of the header in all of my blog posts.
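To see the effect of that command in isolation, it can be run on a scratch file. This sketch assumes GNU sed (the one-line form of the a command and a bare -i are GNU behaviour), and the header contents are invented for the example.

```shell
post=$(mktemp)
# A made-up post header ending in a Tags: line, followed by a body.
printf 'Title: Hello World!\nDate: 2015-02-01\nTags: misc\n\nBody text.\n' > "$post"

# Append the Alias: line right after the Tags: line, editing the file in place.
sed -i '/^Tags:/aAlias: /posts/2015/02/hello-world/' "$post"

tags_line=$(sed -n '3p' "$post")    # the original Tags: line, untouched
alias_line=$(sed -n '4p' "$post")   # the newly appended line
rm -f "$post"
```

The new line lands on line 4, directly below Tags:, and the rest of the file is left unchanged.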
How to generate it#
Adding DEBUG=1 to the make command makes Pelican output a lot of
information about exactly what it is doing. The relevant part for this
script is that for every blog article Pelican writes a line stating where
it outputs the article:
-> Writing blog/output/2015/02/01/hello-world/index.htm
Due to the URI scheme, these lines include the full date and the slug, which is the information we need.
All of the debugging information is output to stderr. In order to actually
process it, 2>&1 redirects stderr (fd 2) to stdout (fd 1), which is the
stream that pipes pass to the next command.
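The difference the redirect makes can be checked with a small stand-in for Pelican. Here `noisy` is a hypothetical function that writes one debug-style line to stderr, and the path ending in index.html is assumed to match what the script strips.

```shell
# Hypothetical stand-in for `make html DEBUG=1`: writes only to stderr.
noisy() {
    echo '-> Writing blog/output/2015/02/01/hello-world/index.html' >&2
}

# Without 2>&1 the pipe carries nothing (stderr is discarded here to keep
# the terminal quiet); `|| true` absorbs grep's non-zero exit on no match.
without=$(noisy 2>/dev/null | grep -c Writing || true)

# With 2>&1 the same line travels down the pipe and grep sees it.
with=$(noisy 2>&1 | grep -c Writing)
```

grep counts zero matching lines in the first case and one in the second.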
The grep commands select the lines we want. The first two sed commands
strip everything before the day and everything after the slug, leaving
just 01/hello-world in our example. Note the use of @ instead of / as the
delimiter for the sed s command, since the patterns contain / but don't
contain @.
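These filtering steps can be tried on a single sample line without running Pelican. The sample line below is an assumption modelled on the debug output, with the path ending in index.html as the script expects.

```shell
line='-> Writing blog/output/2015/02/01/hello-world/index.html'

# Same filters as the script, with @ as the s-command delimiter.
trimmed=$(printf '%s\n' "$line" |
    grep -F 2015/02 |
    grep -F Writing |
    sed 's@.*2015/02/@@' |
    sed 's@/index.html@@')

# The first substitution written with / as the delimiter instead:
# every / inside the pattern now needs a backslash.
slashes=$(printf '%s\n' "$line" | sed 's/.*2015\/02\///' | sed 's@/index.html@@')
```

Both variables hold 01/hello-world; the @ version is just easier to read.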
The final sed command captures the strings before and after the / and
uses them to write a new sed command. The first string, \1, is the day of
the post and the second, \2, is the slug of the post, which are the two
pieces needed to build the output sed command described above.
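The last step can also be checked on its own by feeding it the intermediate string from the example. Note that in this sketch the $ anchor is written as \$ so that the shell does not treat $@ as its positional parameters inside the double quotes.

```shell
# Input is the day/slug pair left over after the earlier filters.
cmd=$(printf '%s\n' '01/hello-world' |
    sed "s@^\([^/]*\)/\([^/]*\)\$@sed -i '/^Tags:/aAlias: /posts/2015/02/\2/' content/2015/02\1-*.md@")

# Print the generated command for inspection; nothing is executed.
printf '%s\n' "$cmd"
```

The printed line is the same sed -i command shown earlier for the Hello World! post.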