The problem
When writing blog posts, I like to use Markdown's reference-style links, which let you avoid writing URLs inline: you use a short name in the text and define it elsewhere in the document. I always put the definitions at the end, so the bottom of the Markdown file looks like a bibliography for the post. That leaves the extra task of keeping the references at the bottom consistent with their usage in the post. It isn't a huge problem, since I usually add a link and use it immediately, then check the preview to make sure it rendered properly. But sometimes I start a post by entering a list of links I expect to use, and sometimes I miss something when checking the preview.
The solution
lint_refs.py takes any number of Markdown files as arguments and prints the references that are used but never defined, or defined but never used:
```python
#!/usr/bin/env python
import sys
from pathlib import Path

from markdown import markdown, extensions, postprocessors


class ReferenceProxy(dict):
    """Dict that records every key tested with `in`."""

    def __init__(self, *arg, **kw):
        super().__init__(*arg, **kw)
        self.read = set()

    def __contains__(self, key):
        self.read.add(key)
        return super().__contains__(key)


class ReferenceLintExtension(extensions.Extension,
                             postprocessors.Postprocessor):
    def __init__(self, filename):
        self.filename = filename

    def extendMarkdown(self, md):
        # Swap the parser's reference table for the tracking proxy.
        self.refs = md.references = \
            ReferenceProxy(**md.references)
        md.postprocessors.register(self, 'ref_lint', 1)

    def run(self, text):
        # Looked up but never defined, and defined but never looked up.
        undefined = self.refs.read - set(self.refs.keys())
        unused = set(self.refs.keys()) - self.refs.read

        if unused or undefined:
            print(f"\n\n# {self.filename}")
        if undefined:
            print("\n## UNDEFINED REFERENCES")
            print('\n'.join(sorted(undefined)))
        if unused:
            print("\n## UNUSED REFERENCES")
            print('\n'.join(sorted(unused)))
        return text


for filename in sys.argv[1:]:
    markdown(Path(filename).read_text(), extensions=[
        ReferenceLintExtension(filename),
        'markdown.extensions.extra'])
```
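The trick the script relies on is that a dict subclass can observe membership tests. Here is a minimal sketch of just that part, runnable without Python-Markdown installed. The lookups are simulated by hand here; in the real script the parser performs them, and the tuple stored as a value is only a placeholder.

```python
class ReferenceProxy(dict):
    """Dict that records every key tested with `in`."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.read = set()

    def __contains__(self, key):
        self.read.add(key)
        return super().__contains__(key)


refs = ReferenceProxy()
# Defining a reference is a plain item assignment and is not recorded.
refs["working"] = ("https://example.com/", None)

# Simulated lookups (assumption: the parser checks keys with `in`).
found = "working" in refs
missing = "broken" in refs

# dict.get() does not go through __contains__, so it leaves no trace.
refs.get("other")

print(found, missing)     # True False
print(sorted(refs.read))  # ['broken', 'working']
```

Only `in` checks are tracked, so the approach depends on the parser testing for a reference that way before it reads the value.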
Given example.md:
```markdown
A [broken link][broken]. A [working link][working].

[working]: https://example.com/
[not-used]: https://example.org/
```
```
$ ./lint_refs.py example.md


# example.md

## UNDEFINED REFERENCES
broken
broken link

## UNUSED REFERENCES
not-used
```
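The two lists in that output are plain set differences. A small sketch of the arithmetic with the values from example.md follows. The `looked_up` set is inferred from the output rather than captured from the parser, and both variable names are made up for illustration.

```python
# Keys defined at the bottom of example.md.
defined = {"working", "not-used"}

# Keys the parser tested with `in`, inferred from the output above.
looked_up = {"broken", "broken link", "working"}

undefined = looked_up - defined  # looked up but never defined
unused = defined - looked_up     # defined but never looked up

print(sorted(undefined))  # ['broken', 'broken link']
print(sorted(unused))     # ['not-used']
```

The extra `broken link` entry presumably appears because, once the explicit `[broken]` lookup fails, the parser retries the bracketed link text as a shorthand reference, and that lookup is recorded too.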