Long Polling in Django Channels

The problem#

In a web app where the display should be constantly up-to-date, the client needs some way to get the latest information from the server. One of the simplest approaches is to regularly (every few seconds) query the server asking if there is new information. This involves making a lot of requests, wasting bandwidth and processor time on both the client and the server (the latter can be mitigated with caching).

If updates are rare, it makes much more sense for the server to notify the client when they occur, but HTTP is designed around the client making requests to the server, not the other way around. Furthermore, the Django web framework (like many web frameworks) is built around that model.

The solution#

Of course, this is a well-understood problem, and a wide variety of APIs and libraries to solve it are discussed on the Wikipedia page for Comet. The main workarounds are WebSockets, a very flexible technology for two-way communication in a web browser, and long polling, a simpler technique in which the server does not answer a request immediately but instead waits until it actually has an update to reply with.

In the rest of this blog post, I discuss the changes I made to convert a Django-based web app, which I originally wrote to use a basic polling pattern and hosted using uWSGI, to instead use long polling and be hosted using Gunicorn/Uvicorn. I also cover the nginx configuration, including hosting the app in a subdirectory.

The details#

Why long polling?#

Since I was retrofitting a web app that I had already written to use a basic polling pattern, I decided to use long polling instead of WebSockets, as it involved fewer code changes. For a new project I would probably use WebSockets. Furthermore, a lot of the setup doesn't change, so this post may still be useful if you intend to use WebSockets.

Moving to ASGI#

The normal way to run a Python web application is to use a WSGI server like uWSGI. But WSGI is inherently synchronous, so ASGI was developed as a replacement with support for asynchronous applications. The Channels library extends Django to support asynchronous operation when run on an ASGI server.

The Channels documentation has a clear explanation of how to deploy a server. Following their instructions, this diff adds an asgi.py entrypoint to my app.
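
For reference, the result looks something like this minimal sketch based on the Channels documentation; the fear_tracker module name is my guess at the project layout, so see the linked diff for the real file:

import os

from channels.routing import ProtocolTypeRouter
from django.core.asgi import get_asgi_application

# fear_tracker.settings is an assumed module path.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'fear_tracker.settings')

application = ProtocolTypeRouter({
    # Plain HTTP requests are handled by Django as usual.
    'http': get_asgi_application(),
})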

Setting up Gunicorn#

Those instructions use Daphne, which is the reference implementation of ASGI. I chose to use Gunicorn with Uvicorn for ASGI support instead, mostly following the Gunicorn documentation on deploying. Here's the systemd .service file I use and the gunicorn.conf.py file it references. (For comparison, here are the .service file and uwsgi.ini file for the old configuration with uWSGI.)
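
I won't rehash those files here, but the core of a gunicorn.conf.py for an ASGI app is just a few settings. This is a minimal sketch, with the module path, port, and worker count as assumptions rather than my exact configuration:

# gunicorn.conf.py (the wsgi_app setting name is historical;
# it points at the ASGI application here).
wsgi_app = 'fear_tracker.asgi:application'
# Use Uvicorn's worker class so Gunicorn can speak ASGI.
worker_class = 'uvicorn.workers.UvicornWorker'
# Assumed bind address matching the nginx proxy_pass below.
bind = 'localhost:8043'
workers = 2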

One catch is that I run the app in a subdirectory, and while uWSGI has its SCRIPT_NAME setting and Daphne has the corresponding Root Path setting, Gunicorn does not appear to have an equivalent. Instead, I had to use Django's FORCE_SCRIPT_NAME setting by adding

FORCE_SCRIPT_NAME = '/fear-tracker/'
STATIC_URL = '/fear-tracker/static/'

to my local_settings.py file (where /fear-tracker/ is the subdirectory where the app is hosted).

The following nginx configuration section handles running the app in a subdirectory:

location /fear-tracker {
    # Redirect bare /fear-tracker to /fear-tracker/ so relative URLs resolve.
    rewrite /fear-tracker$ fear-tracker/ permanent;
    # Strip the /fear-tracker prefix before handing the request to the app.
    rewrite /fear-tracker(.*) $1 break;

    proxy_set_header Host $http_host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    # Don't buffer responses: long poll replies should reach the
    # client as soon as the app sends them.
    proxy_buffering off;
    proxy_pass http://localhost:8043;
    # Map redirects issued by the app back into the subdirectory.
    proxy_redirect / /fear-tracker/;
}
location /fear-tracker/static {
    alias /home/anyoneeb/sites/apps/fear-tracker/fear_tracker/static/;
}

Long polling with Channels#

As I linked above, here are the full changes to implement long polling. But to break it down a bit, here is my initial commit to add long polling. There are a few things going on here:

  1. In settings.py, I've added 'channels' to INSTALLED_APPS (as explained in the Channels tutorial) and a CHANNEL_LAYERS setting for the local-development-friendly in-memory backend for cross-channel communication (see the sketch after this list). Note that your local_settings.py file should follow their recommendations for using Redis in production.
  2. The routing.py file now routes the path to the long-polling handler to a new class instead of letting it fall through to urls.py.
  3. The JavaScript code now handles each response and then immediately starts a new request, instead of waiting five seconds between requests.
  4. And the actual meat of the change is the new class StatusLongPollConsumer in views.py.
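
To illustrate the settings change in the first item, the development-friendly channel layer is the in-memory backend; production should swap in channels_redis. The CONFIG values here are the usual defaults from the Channels documentation, not necessarily mine:

# settings.py: fine for local development, not for production.
CHANNEL_LAYERS = {
    'default': {
        'BACKEND': 'channels.layers.InMemoryChannelLayer',
    },
}

# local_settings.py in production: use Redis instead, e.g.:
# CHANNEL_LAYERS = {
#     'default': {
#         'BACKEND': 'channels_redis.core.RedisChannelLayer',
#         'CONFIG': {'hosts': [('127.0.0.1', 6379)]},
#     },
# }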

The basic logic of StatusLongPollConsumer is that it accepts a request and waits for an update to trigger a fear_tracker.invalidate_status message, which makes it tell the client to ask the normal status/ handler for the actual update.
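
The real class is in the linked commit, but its shape is roughly the following sketch. The group name status_updates and the response body are placeholders of my own; note that Channels dispatches a message of type fear_tracker.invalidate_status to a method named fear_tracker_invalidate_status:

from channels.generic.http import AsyncHttpConsumer

class StatusLongPollConsumer(AsyncHttpConsumer):
    group_name = 'status_updates'  # placeholder name

    async def handle(self, body):
        # Subscribe to invalidation messages, send the response
        # headers, and then leave the response open (see the
        # overridden http_request() below).
        await self.channel_layer.group_add(self.group_name,
                                           self.channel_name)
        await self.send_headers(headers=[(b'Content-Type', b'text/plain')])

    async def fear_tracker_invalidate_status(self, event):
        # An update happened: finish the response so the client
        # knows to re-query the normal status/ handler.
        await self.send_body(b'invalid')
        await self.channel_layer.group_discard(self.group_name,
                                               self.channel_name)

    async def disconnect(self):
        # Stop receiving invalidation messages once the client is gone.
        await self.channel_layer.group_discard(self.group_name,
                                               self.channel_name)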

Unfortunately, I was having trouble answering the HTTP request in response to a Channel Layers message. Looking at the source for AsyncHttpConsumer, I noticed the http_request method disconnects as soon as the handle() method returns, so I overrode it with a version that does not do so. This seems like a hack, but I couldn't figure out any other way to do it, and it seems to work.

async def http_request(self, message):
    """
    Async entrypoint - concatenates body fragments and hands off control
    to ``self.handle`` when the body has been completely received.
    """
    if "body" in message:
        self.body.append(message["body"])
    if not message.get("more_body"):
        try:
            await self.handle(b"".join(self.body))
        finally:
            pass
            # *Don't* disconnect immediately.
            #await self.disconnect()
            #raise StopConsumer()
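
For completeness, the sending side is a group_send over the channel layer from whatever (synchronous) view handles updates. A sketch, reusing the placeholder group name from above:

from asgiref.sync import async_to_sync
from channels.layers import get_channel_layer

def invalidate_status():
    # Wake up every long poll consumer waiting in the group.
    channel_layer = get_channel_layer()
    async_to_sync(channel_layer.group_send)(
        'status_updates',  # placeholder name from the sketch above
        {'type': 'fear_tracker.invalidate_status'},
    )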

One other detail to note is that the call to reverse() does not get the correct URL for the subdirectory. I had a workaround for this at one point, but changed to using proxy_redirect as described above because Django's default support for appending trailing slashes had the same issue; making the change at the nginx level fixed both.

Timeouts#

That version worked when testing locally, but once I wired everything up in production, it would give an error of

504 Gateway Time-out

after about a minute. Looking at the nginx error logs, I saw

upstream timed out (110: Connection timed out) while reading response header from upstream

It turns out nginx doesn't like holding connections open forever, and with good reason: each open connection is something extra for the server to keep track of, so the resources should be reclaimed if there's not actually a client waiting on the other end.

The solution is to long poll only for a limited time, and have the client start a new connection if it is still interested in updates. It would be best if the server closed the connection before the timeout hit, but I couldn't figure out how to do that without adding a separate process like this solution using asgi.delay. Instead, I just have the client abort the connection, based on this StackOverflow answer.

While we're modifying the JavaScript side of things, one other important change was making the retry logic wait a few seconds after an error, so it doesn't hammer the server whenever the app goes down for any reason.
