A Weird Imagination

Mysterious Twitter scraping bug


The bug

A couple of days ago, my Twitter screen scraper stopped working in Liferea. I hadn't changed anything, and the script's output at the command line still looked fine when I inspected it, but Liferea started showing the message

The last update of this subscription failed!
There were errors while parsing this feed!

Details

Could not detect the type of this feed! Please check if the source really points to a resource provided in one of the supported syndication formats!

XML Parser Output:

The URL you want Liferea to subscribe to points to a webpage and the auto discovery found no feeds on this page. Maybe this webpage just does not support feed auto discovery.Source points to HTML document.

You may want to contact the author/webmaster of the feed about this!

Needless to say, this wasn't a very useful error message. Weirdly, it sometimes even included the error

HTTP error code 404: Resource Not Found

even though it was reading from the output of a script, not loading the feed from the web.

First debugging attempt

So I ran Liferea from the command line, because I knew it prints status messages there and thought they might show more detail.

And it started working. I closed it and tried running it again the normal way (through the applications menu) and the bug reappeared. It looked like I had a heisenbug.

Second debugging attempt

In an attempt to get Liferea to do something different, I created a new subscription that used the Twitter scraping command as a filter instead of a source, with a dummy website as the source. The error was more informative: it said the script had returned an exit status of 1 (non-zero values mean something went wrong). I checked in a terminal that the script did in fact return an exit status of 0 there:

$ twitter_user_to_rss_file twitter >/dev/null
$ echo $?
0

To find out what was going wrong, I needed to see what the script was doing when Liferea ran it. To do so, I attached strace to Liferea:

strace -p $(pidof liferea) -f

The -p option attaches to an already running process with the given PID, which pidof looks up by program name. The -f option makes strace also trace any child processes. That matters here, because the process of interest is the one Liferea spawns to run the script.

It turns out Liferea makes a lot of calls to poll() and recvmsg(), so I had to filter those out:

strace -p $(pidof liferea) -f -e trace=\!poll,recvmsg

Note the backslash before the !: in an interactive shell, bash treats ! as a special (history expansion) character.

The cause

In that trace, I found the error message my script was printing: Perl was unable to find a library. I had installed that library from CPAN into my home directory, but the environment variables telling Perl where to find it were set only in my .bashrc.

Environment variables are inherited by child processes, which is why it worked when I ran Liferea from a terminal: bash set the variables from my .bashrc, and Liferea inherited them from bash. When Liferea was started from the applications menu, there was no bash in its process tree, so the variables were never set.
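The inheritance rule is easy to see in a small sketch. It uses a made-up variable, DEMO_PERL5LIB, as a stand-in for the real Perl settings, and uses env -i to approximate a process whose ancestors never read ~/.bashrc. A real menu launch still gets some environment from the desktop session, just not what .bashrc sets.

```shell
#!/bin/sh
# Sketch: exported variables reach child processes; a process started
# with an empty environment does not see them.
# DEMO_PERL5LIB is a hypothetical stand-in for the real Perl settings.

# Like ~/.bashrc: set and export the variable in the parent shell.
DEMO_PERL5LIB="$HOME/perl5/lib/perl5"
export DEMO_PERL5LIB

# A child of this shell inherits the exported value
# (similar to Liferea launched from a terminal).
inherited=$(/bin/sh -c 'printf "%s" "${DEMO_PERL5LIB:-<unset>}"')
echo "child of the shell: $inherited"

# env -i starts the child with an empty environment, approximating a
# program whose ancestors never read ~/.bashrc (similar to a menu launch).
clean=$(env -i /bin/sh -c 'printf "%s" "${DEMO_PERL5LIB:-<unset>}"')
echo "clean environment:  $clean"
```

To reproduce the real failure from a terminal, something like env -u PERL5LIB twitter_user_to_rss_file twitter should behave like the menu launch, assuming PERL5LIB is the variable involved; env -u removes a single variable from the child's environment.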

The solution

The immediate workaround was, of course, to simply run Liferea from a terminal. The long-term fix is to set those variables for all X programs, which can be done in your ~/.xsession file. To avoid keeping two copies of the settings, I moved them into a separate file, ~/bin/perl_local_libs, which I source from both my ~/.bashrc and ~/.xsession files.
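The post doesn't say exactly which variables ~/bin/perl_local_libs sets, so the following is only a sketch of what such a shared file might contain. It assumes the CPAN modules were installed with local::lib into its default location, ~/perl5, and sets only PERL5LIB and PATH; local::lib itself also sets a few installer-related variables, and its documented bootstrap line, eval "$(perl -I$HOME/perl5/lib/perl5 -Mlocal::lib)", is an alternative to writing them by hand.

```shell
# ~/bin/perl_local_libs (sketch)
# Assumes local::lib's default install root, ~/perl5; adjust
# PERL_LOCAL_ROOT if your modules live elsewhere.
# Uses only POSIX sh syntax, since ~/.xsession may be run by /bin/sh.
PERL_LOCAL_ROOT="$HOME/perl5"

# Tell perl where the locally installed modules are, keeping any existing value.
PERL5LIB="$PERL_LOCAL_ROOT/lib/perl5${PERL5LIB:+:$PERL5LIB}"

# Make scripts installed alongside the modules findable as commands.
PATH="$PERL_LOCAL_ROOT/bin${PATH:+:$PATH}"

# Export so every child process (Liferea, and the scripts it runs) inherits them.
export PERL5LIB PATH
```

Both startup files can then load it with a line such as . "$HOME/bin/perl_local_libs". The POSIX . command is used rather than bash's source, because ~/.xsession is not necessarily run by bash. Sourcing the file again from nested shells prepends duplicate entries, which is harmless but untidy.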
