The problem
My previous blog post has a footnote in the first sentence. Due to the way footnotes are handled, the footnote reference is a link to #fn:prg, which works fine if the footnote is actually on the page, but on the blog main page (or any other listing of multiple articles) the footnote is not present because it's after the Read more… link. The result is that on those pages, all footnote references are broken links. These broken links should either be repaired such that they point to the article page or removed.
First attempt
Unable to find an existing solution, I decided to write my own plugin, summary_footnotes. I started from another plugin, clean_summary, which modifies the summary, and based my code on it. That plugin uses Beautiful Soup to parse the summary and rewrite it. After a quick look at the docs, I was able to figure out how to select the footnote links and rewrite them, which got me this version of the plugin.
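To make the rewriting step concrete, here is a minimal sketch of turning fragment-only footnote links into links to the article page. It is not the plugin's code: to stay self-contained it swaps Beautiful Soup for a standard-library regular expression, and the function name absolutize_footnote_links and the example URL are made up for illustration.

```python
import re

# Matches href attributes that point at a footnote on the same page,
# e.g. href="#fn:prg". (Assumption: double-quoted attribute values.)
FOOTNOTE_HREF = re.compile(r'href="(#fn:[^"]*)"')


def absolutize_footnote_links(summary_html, article_url):
    """Prefix fragment-only footnote hrefs with the article's URL."""
    return FOOTNOTE_HREF.sub(
        lambda match: 'href="{}{}"'.format(article_url, match.group(1)),
        summary_html,
    )


if __name__ == "__main__":
    summary = '<p>Some text<sup><a href="#fn:prg">1</a></sup>.</p>'
    # The article URL here is a hypothetical example.
    print(absolutize_footnote_links(summary, "/blog/some-post/"))
```

Links that do not start with #fn: are left untouched, so ordinary links in the summary keep working.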
The bug
Everything was fine in my dev environment, but when I pushed it to the live site, I got an error like
WARNING: Unable to find ParseResult(scheme='', netloc='', path=u'somefile.md', params='', query='', fragment=''), skipping url replacement
This happened specifically on this blog post, which contains a link to an earlier blog post using the {filename}somefile.md-style internal link scheme.
Using make DEBUG=1 publish and adding some debugging statements of my own to contents.py where that warning message was generated, I saw that the error occurred when the linking blog post was loaded before the post it linked to. My summary rewriting was somehow causing the content to get generated too early, before every post had been loaded. I confirmed this by checking that the clean_summary plugin caused the same bug to occur. Then I used the following code to output a stack trace so I could see exactly how my code was calling into the code which generated the warning:
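import traceback
# Log one entry per stack frame, outermost caller first.
# Uses the module-level logger of the file being debugged.
for line in traceback.format_list(traceback.extract_stack()):
    logger.debug(line)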
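For anyone who wants to try the technique outside of Pelican, here is a self-contained sketch. The logger name and the functions generate_warning and access_summary are hypothetical stand-ins for the warning site and the plugin code that reaches it.

```python
import logging
import traceback

logger = logging.getLogger("stack_demo")  # hypothetical logger name


def log_call_stack():
    """Log one debug message per stack frame (outermost first) and return them."""
    lines = traceback.format_list(traceback.extract_stack())
    for line in lines:
        logger.debug(line.rstrip())
    return lines


def generate_warning():
    # Stand-in for the code that emits the warning.
    return log_call_stack()


def access_summary():
    # Stand-in for the plugin code that indirectly triggers the warning.
    return generate_warning()


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    access_summary()
```

Reading the logged frames from top to bottom shows the chain of calls that led to the point where the stack was captured.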
The (hackish) solution
Asking for the summary was causing the content to get generated early. The solution I came up with is a hack: I replaced the _get_summary() method with one that calls the original and then rewrites the result as desired. That way, the new code only gets invoked when the summary is actually accessed, so it isn't changing the access patterns.
Specifically, this type of hack is called a monkey patch, and is generally considered extremely poor programming practice because it makes code very difficult to follow.
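As an illustration of the general shape of such a patch, here is a minimal sketch. The Content class below is a hypothetical stand-in whose summary property looks up _get_summary() at access time; Pelican's real class may be wired differently. The rewrite function is likewise only a placeholder for the footnote rewriting.

```python
class Content:
    """Hypothetical stand-in, NOT Pelican's actual Content class."""

    def __init__(self, text):
        self.text = text

    def _get_summary(self):
        # Everything before the "read more" marker is the summary.
        return self.text.split("<!-- more -->")[0]

    @property
    def summary(self):
        return self._get_summary()


def rewrite(summary):
    # Placeholder rewrite: point footnote links at a made-up article URL.
    return summary.replace('href="#fn:', 'href="/post/#fn:')


# Keep a reference to the original method, then swap in a wrapper.
_original_get_summary = Content._get_summary


def _patched_get_summary(self):
    return rewrite(_original_get_summary(self))


Content._get_summary = _patched_get_summary
```

Nothing runs at patch time; the rewrite only happens when some code reads the summary of an instance.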
Some research revealed that the bug actually exists in the summary plugin as well; I just hadn't been triggering it. The conclusion of that thread is that the bug is due to the Pelican plugin API lacking the proper hooks, and a fix is being worked on. The thread also includes a workaround in the form of the ssummary plugin, which is also implemented using monkey patching.
As is to be expected with monkey patching, my monkey patch made assumptions that ssummary's patch broke. I fixed it by copying ssummary's way of doing the patch, which is more resilient because it overwrites the entire property and keeps a private copy of the old version in orig_summary, which is referenced by the new property:
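# Keep the original property object so the new getter can call it.
orig_summary = contents.Content.summary
# Replace the whole property; setter, deleter, and docstring are reused.
contents.Content.summary = \
    property(lambda instance:
                 get_summary(instance, orig_summary),
             orig_summary.fset, orig_summary.fdel,
             orig_summary.__doc__)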
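The same pattern can be tried in isolation. In the sketch below, the Content class is a hypothetical stand-in rather than Pelican's, and get_summary only uppercases the text as a placeholder for the real rewriting.

```python
class Content:
    """Hypothetical stand-in, NOT Pelican's actual Content class."""

    def __init__(self, text):
        self._summary = text

    def _get_summary(self):
        return self._summary

    def _set_summary(self, value):
        self._summary = value

    summary = property(_get_summary, _set_summary, None,
                       "Summary of the content.")


def get_summary(instance, orig_property):
    # Call the original getter, then post-process its result.
    original = orig_property.fget(instance)
    return original.upper()  # placeholder for the footnote rewriting


# Looking the attribute up on the class yields the property object itself.
orig_summary = Content.summary
Content.summary = property(
    lambda instance: get_summary(instance, orig_summary),
    orig_summary.fset, orig_summary.fdel, orig_summary.__doc__)
```

Because the new property holds its own reference to the old one, it keeps working even if another plugin has already replaced the property: whatever was installed at patch time is what gets wrapped.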
Get the code and use it on your own blog.