The problem
I've recently been writing more series of blog posts, or otherwise
linking between posts using {filename} links. I've also been adjusting
the schedule of my planned future posts, which means renaming files, as
my naming scheme includes the publication date in the filename. That
creates opportunities to forget to update the links to match and end up
with broken links between posts.
Pelican does generate warnings like
WARNING Unable to find './invalid.md', skipping url log.py:89
replacement.
but building my entire blog currently takes about a minute, so I
generally only do it when publishing. I wanted a more lightweight way
to check just the intra-blog {filename} links.
The solution
I wrote the script check_filename_links.sh:
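#!/bin/bash
# Check that every {filename} reference-style link points at an existing file.
content="${1:-.}"
find "$content" -iname '*.md' -type f -print0 |
    while IFS= read -r -d '' filename
    do
        # Keep only link definitions, then strip the label and any #fragment.
        grep '^\[.*]: {filename}' "$filename" |
            sed 's/^[^ ]* {filename}\([^\#]*\)\#\?.*$/\1/' |
            while read -r link
            do
                if [ "${link:0:1}" != "/" ]
                then
                    # Relative link: resolve against the linking file's directory.
                    linkedfile="$(dirname "$filename")/$link"
                else
                    # Absolute link: resolve against the content directory.
                    linkedfile="$content$link"
                fi
                if [ ! -f "$linkedfile" ]
                then
                    echo "filename=$filename, link=$link,"\
                         "file does not exist: $linkedfile"
                fi
            done
    done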
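To see what the grep and sed stages pass on to the existence check, here is a small sketch that feeds them a few made-up link definitions. The sample labels and paths are hypothetical, and it assumes GNU grep and sed, as the script itself does.

```shell
#!/bin/bash
# Hypothetical link definitions, as they might appear at the end of a post.
sample='[next]: {filename}./2024-01-02-part-two.md#setup
[abs]: {filename}/posts/intro.md
[ext]: https://example.com/'

# Same grep and sed expressions as in the script: keep only {filename}
# definitions, then drop the label, the {filename} prefix and any #fragment.
extracted="$(printf '%s\n' "$sample" |
    grep '^\[.*]: {filename}' |
    sed 's/^[^ ]* {filename}\([^\#]*\)\#\?.*$/\1/')"

printf '%s\n' "$extracted"
```

This should print `./2024-01-02-part-two.md` and `/posts/intro.md`; the external link is filtered out by grep. Note that the sed expression expects a label without spaces, so a definition like `[part two]: {filename}...` would pass through unchanged.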
Run it from your content/ directory, or provide the path to the
content/ directory as an argument, and it will print out the broken
links:
filename=./foo/bar.md, link=./invalid.md, file does not exist: ./foo/./invalid.md
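The slightly odd-looking `./foo/./invalid.md` comes from how the link is resolved: a relative link is appended to the directory of the linking file, while a link starting with `/` is appended to the content directory. A minimal sketch of just that step, using a hypothetical helper name `resolve_link` that is not part of the script:

```shell
#!/bin/bash
# resolve_link is a hypothetical helper mirroring the if/else in the script.
# Arguments: content directory, path of the linking file, link target.
resolve_link() {
    local content="$1" filename="$2" link="$3"
    if [ "${link:0:1}" != "/" ]
    then
        # Relative link: resolve against the linking file's directory.
        printf '%s\n' "$(dirname "$filename")/$link"
    else
        # Absolute link: resolve against the content directory.
        printf '%s\n' "$content$link"
    fi
}

relative="$(resolve_link . ./foo/bar.md ./invalid.md)"
absolute="$(resolve_link . ./foo/bar.md /posts/intro.md)"
printf '%s\n%s\n' "$relative" "$absolute"
```

The first call yields `./foo/./invalid.md`, matching the output above, and the second yields `./posts/intro.md`. The redundant `./` is harmless for the `-f` test, so there is no need to normalise the path.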