Liferea is a desktop news aggregator (sometimes called an RSS reader). Most of its alternatives, like the late Google Reader or the open-source Tiny Tiny RSS, are web-based: they run on a server and are accessed through a web browser. Liferea is instead a standalone desktop application that uses an embedded browser to display content.
The problem
Sometimes you don't care about all of the items in a feed, and the site provides no filtering mechanism. If the uninteresting items are rare enough, you can just ignore them, but a news aggregator is most useful when it only notifies you of items you might actually want to read.
The solution
Luckily, Liferea is very flexible. It supports running a command on a feed, which it calls a conversion filter. I wrote some Python scripts that use this to filter feeds by title locally.
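Judging from the scripts in this post, a conversion filter only has to read the feed on standard input and write a feed to standard output. A minimal sketch of that contract, assuming nothing else about how Liferea invokes the command; the names `passthrough`, `main`, and `filter_string` are hypothetical, not anything Liferea requires:

```python
import io
import sys
from typing import TextIO


def passthrough(source: TextIO, sink: TextIO) -> None:
    """A do-nothing conversion filter: copy the feed through unchanged.

    A real filter would parse and modify the feed between the read
    and the write.
    """
    sink.write(source.read())


def main() -> None:
    # As a standalone filter script: feed in on stdin, feed out on stdout.
    passthrough(sys.stdin, sys.stdout)


def filter_string(feed_xml: str) -> str:
    """Run the filter on an in-memory string, handy for quick checks."""
    sink = io.StringIO()
    passthrough(io.StringIO(feed_xml), sink)
    return sink.getvalue()
```

Ending a script with a call to `main()` would make it usable as a (no-op) conversion filter. A real filter does its work between the read and the write.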
For instance, I wanted to follow only the changelog posts in the forum feed http://braceyourselfgames.com/forums/feed.php, but that feed includes updates to all forum topics, so I checked the "Use conversion filter" option and set the conversion filter to:

/path/to/atom_filter_title.py --whitelist "Re: Change log"
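Before pointing Liferea at a filter, it can help to run the same command by hand on a saved copy of the feed. A minimal sketch, assuming (as the scripts' use of stdin and stdout suggests) that the feed goes into the command's standard input and the filtered feed comes out of its standard output; `run_filter` is a hypothetical helper:

```python
import shlex
import subprocess


def run_filter(command: str, feed_xml: str) -> str:
    """Pipe feed_xml through a filter command and return its output.

    The command string is split the way a POSIX shell would split it,
    so a quoted argument such as "Re: Change log" stays together.
    """
    result = subprocess.run(
        shlex.split(command),
        input=feed_xml,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the filter fails
    )
    return result.stdout
```

For example, passing the command above together with the contents of a saved feed file should return the feed with only the change-log entries left, which is an easy way to spot a typo in the keyword before Liferea silently shows an empty feed.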
The details
atom_filter_title.py parses the feed and removes unwanted items using Python's ElementTree library (rss_filter_title.py, the RSS version of the script, is mostly the same):
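```python
#!/usr/bin/env python3
# Filter an Atom feed by entry title: read the feed on stdin,
# write the filtered feed to stdout.
from sys import stdin, stdout, argv
from xml.etree.ElementTree import ElementTree, register_namespace

ATOM_NS = 'http://www.w3.org/2005/Atom'
ATOM = '{' + ATOM_NS + '}'

# Serialize Atom as the default namespace so the output keeps plain
# <entry> tags instead of ns0:entry.
register_namespace('', ATOM_NS)

tree = ElementTree()
# Parse the raw bytes so the parser honors the feed's declared encoding.
tree.parse(stdin.buffer)

# With --whitelist first, keep only matching entries; otherwise drop them.
if len(argv) > 1 and argv[1] == '--whitelist':
    whitelist = True
    keywords = argv[2:]
else:
    whitelist = False
    keywords = argv[1:]

root = tree.getroot()
# findall() returns a list, so removing entries inside the loop is safe.
for node in root.findall(ATOM + 'entry'):
    title = node.find(ATOM + 'title')
    if title is None:
        continue  # entries without a title are left alone
    node_matched = any(keyword in (title.text or '') for keyword in keywords)
    # Remove matched entries in blacklist mode and unmatched entries in
    # whitelist mode.
    if node_matched ^ whitelist:
        root.remove(node)

tree.write(stdout, encoding='unicode')
```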
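The RSS script isn't shown here. As a rough idea of what "mostly the same" might involve, here is a hypothetical sketch of an RSS 2.0 variant; it is not the author's rss_filter_title.py. It assumes the usual RSS 2.0 layout, where un-namespaced item elements sit inside channel rather than directly under the root, so they have to be removed from the channel element. It is written as a function so it can be tried on a string:

```python
import xml.etree.ElementTree as ET
from typing import Sequence


def filter_rss(feed_xml: str, keywords: Sequence[str], whitelist: bool = False) -> str:
    """Drop RSS 2.0 items by title (hypothetical sketch).

    whitelist=False: drop items whose title contains any keyword.
    whitelist=True:  keep only items whose title contains a keyword.
    """
    root = ET.fromstring(feed_xml)
    channel = root.find('channel')
    if channel is None:
        return feed_xml  # not the RSS 2.0 layout assumed here; leave it alone
    for item in channel.findall('item'):
        title = item.find('title')
        if title is None:
            continue  # items without a title are left alone
        matched = any(k in (title.text or '') for k in keywords)
        if matched ^ whitelist:
            channel.remove(item)
    return ET.tostring(root, encoding='unicode')
```

One caveat with this sketch: namespaced extension elements common in RSS feeds may be re-serialized with generated prefixes such as ns0 unless their prefixes are registered with register_namespace, as the Atom script does for its own namespace.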
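The script takes any number of arguments, each a string that must not appear in an entry's title: entries whose titles contain any of them are removed. Alternatively, if the first argument is --whitelist, the logic is inverted and an entry is kept only if its title contains at least one of the remaining strings. Matching is a case-sensitive substring test, and entries without a title are always kept.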
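To see the default and --whitelist behavior side by side without piping a feed through stdin, the same logic can be wrapped in a function. A minimal sketch; `filter_atom`, `titles`, and `SAMPLE` are hypothetical names, and the function mirrors the script above rather than replacing it:

```python
import xml.etree.ElementTree as ET
from typing import Sequence

ATOM_NS = 'http://www.w3.org/2005/Atom'
ATOM = '{' + ATOM_NS + '}'

# Serialize Atom as the default namespace, as the script does.
ET.register_namespace('', ATOM_NS)


def filter_atom(feed_xml: str, keywords: Sequence[str], whitelist: bool = False) -> str:
    """Return feed_xml with entries removed according to their titles."""
    root = ET.fromstring(feed_xml)
    for entry in root.findall(ATOM + 'entry'):
        title = entry.find(ATOM + 'title')
        if title is None:
            continue
        matched = any(k in (title.text or '') for k in keywords)
        # matched XOR whitelist is True exactly when the entry should go.
        if matched ^ whitelist:
            root.remove(entry)
    return ET.tostring(root, encoding='unicode')


def titles(feed_xml: str) -> list:
    """List the titles of the entries that remain in a feed."""
    root = ET.fromstring(feed_xml)
    return [e.findtext(ATOM + 'title') for e in root.findall(ATOM + 'entry')]


SAMPLE = (
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    '<entry><title>Re: Change log</title></entry>'
    '<entry><title>Bug report</title></entry>'
    '</feed>'
)

print(titles(filter_atom(SAMPLE, ['Re: Change log'], whitelist=True)))  # ['Re: Change log']
print(titles(filter_atom(SAMPLE, ['Re: Change log'])))  # ['Bug report']
```

A consequence of the XOR rule: with --whitelist and no strings after it, nothing can match, so every titled entry is dropped.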