A Weird Imagination

Changing Pelican URL scheme


The problem

I changed the URI scheme of this blog recently from /posts/YYYY/MM/slug/ to /YYYY/MM/DD/slug/. The latter looks better and makes the actual day of the post more visible.

But I already had posts using the old scheme and cool URIs don't change. Luckily, someone wrote a Pelican plugin called pelican-alias which allows articles to be tagged with additional URIs to redirect to their canonical location. All I had to do was add an Alias: /posts/2015/02/... line to the top of each of the posts I had already written and the plugin would take care of the rest.

Automating the aliasing

The non-trivial part of automating this is that the URIs include the article's slug, which may have been generated by Pelican from the title, so Pelican has to be involved in generating the correct redirects.

There are two ways I could have automated this process:

  1. Modify the plugin to add a redirect from the old scheme to the new scheme for every article. Unless somehow controlled, this would result in creating redirects for new articles which do not need them.
  2. Write a one-off script to get the slugs out of Pelican and write the Alias: lines into the blog posts.

I took the latter approach because it was simpler and involved no new code to maintain.

Script to add Alias: lines

I wrote the following one-off script (which relies a bit on my content/ directory layout and URI scheme):

# Prints sed commands for review; it does not modify any files itself.
make html DEBUG=1 2>&1 |
    grep -F 2015/02 |
    grep -F Writing |
    sed 's@.*2015/02/@@' |
    sed 's@/index.html@@' |
    sed "s@^\([^/]*\)/\([^/]*\)\$@sed -i '/^Tags:/aAlias: /posts/2015/02/\2/' content/2015/02\1-*.md@"

How it works

This script doesn't actually make any file changes. Instead it outputs a script which makes the changes. It could easily be modified to run the script that it generates, but it's a one-off script so I wanted to be able to inspect the output before executing it anyway.
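The inspect-then-run workflow might look something like the sketch below. The generate function is a hypothetical stand-in for the pipeline above, and it emits harmless echo commands with made-up filenames instead of real sed commands, so the sketch is safe to run anywhere.

```shell
# Hypothetical stand-in for the generating pipeline; emits harmless commands.
generate() {
    printf "echo 'would edit %s'\n" content/2015/0201-hello.md content/2015/0202-other.md
}

script=$(mktemp)
generate > "$script"    # 1. capture the generated commands in a file
cat "$script"           # 2. read them over before anything is changed
result=$(sh "$script")  # 3. run them once they look right
echo "$result"
rm -f "$script"
```

Piping the generator straight into sh would skip the inspection step, which is the part I wanted to keep.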

Code being generated

The goal is to output a list of sed commands which will add the Alias: lines. For example, the post Hello World! posted on February 1, 2015 has the slug hello-world. We need to get the filename of the post somehow. As I have at most one post per day, there's a mapping from dates to filenames, specifically content/2015/0201-*.md will match the only post made on February 1, 2015. If the filename scheme did not include the date, then

$(grep -Rl '^Date: 2015-02-01' content/ | grep -v ^content/pages)

would get the filename of the post (but not page) that has a Date: line saying it was posted on February 1, 2015.
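As a rough check of that grep pipeline, here is a sketch that runs it against a throwaway directory. The file names and the temporary directory are invented for the demonstration; only the Date: header format and the content/pages exclusion come from the setup described above.

```shell
# Build a throwaway tree (all file names here are hypothetical).
demo=$(mktemp -d)
mkdir -p "$demo/content/pages" "$demo/content/posts"
printf 'Title: Hello World!\nDate: 2015-02-01\nTags: meta\n\nBody\n' > "$demo/content/posts/hello.md"
printf 'Title: About\nDate: 2015-02-01\nTags: meta\n\nBody\n' > "$demo/content/pages/about.md"
printf 'Title: Other\nDate: 2015-02-02\nTags: meta\n\nBody\n' > "$demo/content/posts/other.md"

# List files with a matching Date: line, then drop anything under content/pages.
found=$(cd "$demo" && grep -Rl '^Date: 2015-02-01' content/ | grep -v '^content/pages')
echo "$found"
rm -rf "$demo"
```

The page with the same date is filtered out and the post from the following day never matches, so only content/posts/hello.md is printed.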

So the actual sed command the script generates looks like

sed -i '/^Tags:/aAlias: /posts/2015/02/hello-world/' content/2015/0201-*.md

This uses two sed features that are less commonly used:

  1. The -i option tells sed to write its changes back to the source file.
  2. The a command appends a line after each line it applies to. The /^Tags:/ address in front of the command restricts it to the Tags: line, which was the last line of the header in all of my blog posts.
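To see both features in action without touching a real post, the sketch below applies the generated command to a temporary file. The header contents are made up, and it assumes GNU sed: both -i without a backup suffix and the one-line form of a are GNU extensions, so other sed implementations may need different syntax.

```shell
# Throwaway post with the header layout described above (contents are made up).
post=$(mktemp)
printf 'Title: Hello World!\nDate: 2015-02-01\nTags: meta\n\nFirst paragraph.\n' > "$post"

# GNU sed: edit the file in place, appending the Alias: line after the Tags: line.
sed -i '/^Tags:/aAlias: /posts/2015/02/hello-world/' "$post"

# Show the first four lines; the Alias: line is now the last line of the header.
header=$(head -n 4 "$post")
echo "$header"
rm -f "$post"
```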

How to generate it

Adding DEBUG=1 makes Pelican output a lot of information about exactly what it is doing. The relevant part for this script is that, for every blog article, Pelican writes a line stating where it is writing that article's output:

-> Writing blog/output/2015/02/01/hello-world/index.html

Due to the URI scheme, these lines include the full date and the slug, which is the information we need.

All of the debugging information is output to stderr. In order to actually process it, 2>&1 redirects stderr (fd 2) to stdout (fd 1), which is the stream that pipes pass to the next command.
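The effect of 2>&1 is easy to check with a small stand-in. The noisy function below is hypothetical; it just imitates a command that logs to stderr the way Pelican does in debug mode.

```shell
# Hypothetical stand-in for a command that logs to stderr.
noisy() {
    echo "-> Writing blog/output/2015/02/01/hello-world/index.html" >&2
    echo "Done"
}

# Without 2>&1 the pipe carries only stdout, so grep never sees the log line.
# (stderr is discarded here only to keep the demo quiet.)
without=$(noisy 2>/dev/null | grep -c Writing || true)

# With 2>&1, stderr is merged into stdout before the pipe, so grep sees it.
with=$(noisy 2>&1 | grep -c Writing)

echo "without: $without, with: $with"
```

grep -c counts matching lines, so this prints "without: 0, with: 1".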

The grep commands select the lines we want. The first two sed commands strip out everything before the day and after the slug, leaving just 01/hello-world in our example. Note the use of @ instead of / as the delimiter for the sed s command, as the patterns contain / but don't contain @.

The final sed command captures the strings before and after the / and uses them to write a new sed command. The first capture, \1, is the day of the post and the second, \2, is the slug of the post, which are the two pieces needed to build the output sed command described above.
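The whole chain can be tried on a single sample line without running Pelican at all. In this sketch a printf of one debug line stands in for make html DEBUG=1 2>&1, and the sed expressions are quoted (with the $ anchor escaped inside the double quotes) so the shell passes them through unchanged.

```shell
# One sample debug line, standing in for the real `make html DEBUG=1 2>&1` output.
line='-> Writing blog/output/2015/02/01/hello-world/index.html'

generated=$(printf '%s\n' "$line" |
    grep -F 2015/02 |
    grep -F Writing |
    sed 's@.*2015/02/@@' |
    sed 's@/index.html@@' |
    sed "s@^\([^/]*\)/\([^/]*\)\$@sed -i '/^Tags:/aAlias: /posts/2015/02/\2/' content/2015/02\1-*.md@")

# Only prints the generated command; nothing is executed or modified.
echo "$generated"
```

For the sample line this prints the same sed command shown earlier for the Hello World! post.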
