The problem

My previous blog post has a footnote in its first sentence. Because of the way footnotes are rendered, the footnote reference is a link to #fn:prg, which works fine when the footnote is actually on the page. On the blog main page (or any other page listing multiple articles), however, the footnote is not present, because it comes after the Read more… link. As a result, every footnote reference on those pages is a broken link. These links should either be repaired so that they point to the footnote on the article page, or removed.
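To see why the reference breaks: a fragment-only href like #fn:prg is resolved against whatever page it appears on. Here is a small illustration with Python's urllib.parse.urljoin; the example.com, index.html, and my-post.html URLs are made-up placeholders, not my blog's real addresses.

```python
from urllib.parse import urljoin

# Fragment-only href emitted for the footnote reference
footnote_href = "#fn:prg"

# A browser resolves a fragment-only href against the current page, so on the
# index page it points at an anchor that page does not contain.
on_index = urljoin("https://example.com/index.html", footnote_href)

# Repairing the link means resolving it against the article's URL instead.
repaired = urljoin("https://example.com/my-post.html", footnote_href)

print(on_index)  # https://example.com/index.html#fn:prg
print(repaired)  # https://example.com/my-post.html#fn:prg
```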

First attempt

Unable to find an existing solution, I decided to write my own plugin, summary_footnotes. I started from another plugin, clean_summary, which modifies the summary, and based my code on it. That plugin uses Beautiful Soup to parse the summary and rewrite it. After a quick look at the Beautiful Soup docs, I was able to figure out how to select the footnote links and rewrite them, which got me the first version of the plugin.
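The selecting-and-rewriting step might look roughly like the following. This is a sketch, not the plugin's actual code: rewrite_footnote_links, article_url, and the remove flag are hypothetical names, and it assumes footnote references are exactly the links whose href starts with #fn:. It covers both options from above: point the link at the article page, or drop the link and keep its text.

```python
import re

from bs4 import BeautifulSoup


def rewrite_footnote_links(summary_html, article_url, remove=False):
    """Hypothetical helper: fix footnote references in a truncated summary.

    Links whose href starts with '#fn:' are either prefixed with the
    article URL (default) or replaced by their contents (remove=True).
    """
    soup = BeautifulSoup(summary_html, "html.parser")
    # find_all returns a list, so modifying tags while looping is safe.
    for link in soup.find_all("a", href=re.compile(r"^#fn:")):
        if remove:
            # Drop the <a> tag but keep what it wrapped (the footnote number).
            link.unwrap()
        else:
            # Make the fragment resolve on the article page, not the index.
            link["href"] = article_url + link["href"]
    return str(soup)


summary = ('<p>Intro<sup id="fnref:prg">'
           '<a class="footnote-ref" href="#fn:prg">1</a></sup></p>')
print(rewrite_footnote_links(summary, "/my-post.html"))
print(rewrite_footnote_links(summary, "/my-post.html", remove=True))
```

In a real Pelican plugin, article_url would come from the content object being summarized (possibly combined with SITEURL), which this sketch does not show.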

The bug

Everything worked in my dev environment, but when I pushed it to the live site, I got a warning like

WARNING: Unable to find ParseResult(scheme='', netloc='', path=u'somefile.md', params='', query='', fragment=''), skipping url replacement

It appeared specifically for this blog post, which contains a link to an earlier blog post using the {filename}somefile.md-style internal link syntax. By running make DEBUG=1 publish and adding some debugging statements of my own to contents.py, where that warning is generated, I saw that the error occurred when the linking blog post was loaded before the post it linked to. My summary rewriting was somehow causing the content to be generated too early, before every post had been loaded. I confirmed this by checking that the clean_summary plugin triggered the same bug. Then I used the following code to log a stack trace, so I could see exactly how my code was calling into the code that generated the warning:

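import logging
import traceback
logger = logging.getLogger(__name__)  # contents.py already defines this logger
# Log each frame of the current call stack to see what triggered the warning
for line in traceback.format_list(traceback.extract_stack()):
    logger.debug(line)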
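To convince myself of the mechanism, here is a toy model. It is not Pelican's actual code: ToyArticle, LOADED, and build() are hypothetical names. Content is generated lazily on first access and, as an assumption of this model, cached afterwards, so reading it before the linked-to post has been loaded leaves the {filename} link permanently unresolved.

```python
import functools
import logging
import re

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("toy_pelican")

# Hypothetical stand-in for the site-wide map of files loaded so far.
LOADED = {}

# Matches {filename}something.md-style internal links.
LINK = re.compile(r"\{filename\}(\S+?\.md)")


class ToyArticle:
    """Toy model of lazily generated content (not Pelican's real class)."""

    def __init__(self, filename, url, source):
        self.filename = filename
        self.url = url
        self.source = source

    def load(self):
        # Register this file so later link lookups can find it.
        LOADED[self.filename] = self

    @functools.cached_property
    def content(self):
        # Resolve links against whatever has been loaded *so far*. The result
        # is cached (an assumption of this model), so a failed lookup sticks.
        def replace(match):
            target = LOADED.get(match.group(1))
            if target is None:
                logger.warning("Unable to find %s, skipping url replacement",
                               match.group(1))
                return match.group(0)
            return target.url
        return LINK.sub(replace, self.source)


def build(access_early):
    """Load a linking post before the post it links to."""
    LOADED.clear()
    new = ToyArticle("new.md", "/new.html", "See {filename}old.md for more.")
    old = ToyArticle("old.md", "/old.html", "An older post.")
    new.load()
    if access_early:
        _ = new.content  # like a plugin reading the summary during loading
    old.load()
    return new.content


print(build(access_early=False))  # See /old.html for more.
print(build(access_early=True))   # See {filename}old.md for more.
```

Only the early access changes the outcome; the loading order is the same in both runs.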

The (hackish) solution

Asking for the summary was causing the content to be generated early. The solution I came up with is a hack: I replaced the _get_summary() method with one that calls the original and then rewrites the result as desired. That way, the new code only runs when the summary is actually accessed, so it doesn't change when the content gets generated. This kind of hack is called a monkey patch, and it is generally considered very poor programming practice because it makes code difficult to follow.
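A method-level monkey patch of this kind can be sketched as follows. ToyContent and fix_footnotes are hypothetical stand-ins, not Pelican's classes. The sketch assumes summary calls self._get_summary() each time it is read. If a class instead did summary = property(_get_summary), the property would hold on to the original function, and patching the method would have no effect.

```python
class ToyContent:
    """Hypothetical stand-in for a Pelican content object."""

    def __init__(self, summary_html):
        self._summary_html = summary_html

    def _get_summary(self):
        return self._summary_html

    @property
    def summary(self):
        # Looks up _get_summary at access time, so patching the method works.
        return self._get_summary()


def fix_footnotes(html):
    """Hypothetical rewrite step: point footnote links at the article page."""
    return html.replace('href="#fn:', 'href="/my-post.html#fn:')


# The monkey patch: keep the original method, then install a wrapper that
# calls it and rewrites the result. Nothing runs until .summary is read.
_orig_get_summary = ToyContent._get_summary


def _patched_get_summary(self):
    return fix_footnotes(_orig_get_summary(self))


ToyContent._get_summary = _patched_get_summary

print(ToyContent('<a href="#fn:prg">1</a>').summary)
# <a href="/my-post.html#fn:prg">1</a>
```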

Some research turned up a discussion showing that the bug also exists in the summary plugin; I just hadn't been triggering it. The conclusion of that thread is that the bug is due to the Pelican plugin API lacking the proper hooks, and that a fix is being worked on. The thread also includes a workaround in the form of the ssummary plugin, which is also implemented using monkey patching.

As is to be expected with monkey patching, my monkey patch made assumptions that ssummary's patch broke. I fixed it by copying ssummary's way of doing the patch, which is more resilient because it replaces the entire property and keeps a private reference to the old one in orig_summary, which the new property calls:

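from pelican import contents

# Keep the old property; the new one calls it through orig_summary.
orig_summary = contents.Content.summary
# get_summary() is the plugin's own function (not shown here), which computes
# the original summary via orig_summary and rewrites its footnote links.
contents.Content.summary = \
    property(lambda instance:
                get_summary(instance, orig_summary),
             orig_summary.fset, orig_summary.fdel,
             orig_summary.__doc__)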
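Why is replacing the whole property more resilient? Here is a small sketch with hypothetical names (ToyContent, patch_summary). Each patch wraps whatever summary property is already installed, which may itself be another plugin's patch. It also copies that property's setter, deleter, and docstring, so independent patches stack instead of bypassing each other. The helper calls orig_summary.fget(instance) directly. In the plugin, get_summary() takes care of that step, and its details aren't shown here.

```python
class ToyContent:
    """Hypothetical stand-in with a summary property that has a setter."""

    def __init__(self, summary_html):
        self._summary = summary_html

    def _get_summary(self):
        return self._summary

    def _set_summary(self, value):
        self._summary = value

    summary = property(_get_summary, _set_summary, doc="Summary of the article.")


def patch_summary(cls, transform):
    """Hypothetical helper: wrap whatever `summary` property cls has now."""
    orig_summary = cls.summary  # may already be another plugin's patch
    cls.summary = property(
        lambda instance: transform(orig_summary.fget(instance)),
        orig_summary.fset,
        orig_summary.fdel,
        orig_summary.__doc__,
    )


# Two independent "plugins" patch the same property; each wraps the previous.
patch_summary(ToyContent, lambda html: html.strip())
patch_summary(ToyContent,
              lambda html: html.replace('href="#fn:', 'href="/my-post.html#fn:'))

print(ToyContent('  <a href="#fn:prg">1</a>  ').summary)
# <a href="/my-post.html#fn:prg">1</a>
```

Because the setter is carried over, assigning to summary still works after both patches.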

Get the code and use it on your own blog.

A Weird Imagination

My first Pelican plugin
