A Weird Imagination

Title filtering for Liferea


Liferea is a desktop news aggregator (sometimes called an RSS reader). Unlike the late Google Reader and most of its alternatives, such as the open-source Tiny Tiny RSS, which are web-based and run on a server accessed through a web browser, Liferea is a standalone desktop application that uses an embedded browser to view content.

The problem

Sometimes you don't actually care about all of the items in a feed and the site provides no filtering mechanism. If the uninteresting items are rare enough, you can just ignore them, but a news aggregator is most useful if it only notifies you of news items you actually might want to read.

The solution

Luckily, Liferea is very flexible. It supports running a command on a feed, which it calls a conversion filter. I wrote some Python scripts to filter feeds by title locally.

For instance, I wanted to follow only the changelog posts in the forum feed http://braceyourselfgames.com/forums/feed.php, but it includes updates to all forum topics, so I checked the Use conversion filter option and set the conversion filter to:

/path/to/atom_filter_title.py --whitelist "Re: Change log"
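A conversion filter is just a program that reads the downloaded feed on standard input and writes the converted feed to standard output. As a rough sketch of that contract (the names run_filter and convert are hypothetical, not part of my scripts), a do-nothing filter might look like this:

```python
import io
import sys


def run_filter(source, sink, convert=lambda text: text):
    """Read a whole feed from source, convert it, and write it to sink.

    convert is a hypothetical hook; the default passes the feed through
    unchanged.
    """
    sink.write(convert(source.read()))


# A real filter script would end with: run_filter(sys.stdin, sys.stdout)
# Here in-memory streams stand in for the pipes Liferea would provide.
demo_out = io.StringIO()
run_filter(io.StringIO("<feed/>"), demo_out)
```

Using in-memory streams like this makes it easy to try out a filter before pointing Liferea at it.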

The details

atom_filter_title.py parses the feed and removes unwanted items using Python's ElementTree library (rss_filter_title.py is the RSS version of the script, which is mostly the same):

#!/usr/bin/env python3

from sys import stdin, stdout, argv
import xml.etree.ElementTree as ET

ATOM = 'http://www.w3.org/2005/Atom'

# Keep Atom as the default namespace so the output has no ns0: prefixes.
ET.register_namespace('', ATOM)
tree = ET.parse(stdin)
root = tree.getroot()

# With --whitelist, keep only matching entries; otherwise drop them.
whitelist = len(argv) > 1 and argv[1] == '--whitelist'
keywords = argv[2:] if whitelist else argv[1:]

for node in root.findall('{%s}entry' % ATOM):
    ch = node.find('{%s}title' % ATOM)
    if ch is not None:
        # An empty <title/> has text None; treat it as an empty string.
        title = ch.text or ''
        node_matched = any(keyword in title for keyword in keywords)
        if node_matched != whitelist:
            root.remove(node)
tree.write(stdout, encoding='unicode')

The script takes any number of arguments, each a string that must not appear in the title. Alternatively, if the first argument is --whitelist, an entry is kept only if at least one of the strings appears in its title.
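For RSS feeds the same idea applies, but the structure differs: RSS 2.0 uses no XML namespace and nests its item elements inside a channel element. The following is a sketch of how that variant could look, not the actual rss_filter_title.py; the function name filter_rss_titles and the sample feed are made up for illustration.

```python
import xml.etree.ElementTree as ET


def filter_rss_titles(rss_text, keywords, whitelist=False):
    """Return rss_text with <item> elements dropped according to their titles."""
    root = ET.fromstring(rss_text)
    # RSS 2.0 has no namespace, and items are children of <channel>.
    for channel in root.findall("channel"):
        for item in channel.findall("item"):
            title_node = item.find("title")
            if title_node is None:
                continue  # like the Atom script, leave untitled items alone
            title = title_node.text or ""
            matched = any(keyword in title for keyword in keywords)
            # Blacklist mode drops matches; whitelist mode drops non-matches.
            if matched != whitelist:
                channel.remove(item)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample feed, only for trying the function out.
sample = (
    '<rss version="2.0"><channel><title>Forum</title>'
    "<item><title>Re: Change log</title></item>"
    "<item><title>Re: Bug reports</title></item>"
    "</channel></rss>"
)
kept = filter_rss_titles(sample, ["Re: Change log"], whitelist=True)
dropped = filter_rss_titles(sample, ["Re: Change log"])
```

Here kept retains only the "Re: Change log" item, while dropped retains only the other one. Wrapping the logic in a function that works on strings makes it testable without piping a feed through the script.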
