I often say that I love deleting code, and it’s true. It’s not often, though, that I get an example as good as this one: finding a generic problem hiding behind some specific code, and ending up with something that’s simpler, better, and more capable.

Background

I’ve spent the last couple of months working with Sidekick Studios on improvements to the client side of a mobile market research application. It’s been fun and, especially after my previous experience in a rather large organisation, it has reinforced my belief that small companies are where I most enjoy working and find it easiest to contribute.

One feature of the admin web app is the ability to generate zip files of multiple photos and videos on demand. As the main application stack (Ruby on Rails) isn’t well suited to this kind of streaming operation, the downloads were implemented with a short and simple Node.js application that fetched files from S3 and streamed them into a zip file and out to the client.

This approach works very well: the download starts almost instantly, and Node.js is a good fit for this straightforward IO-bound process. I suspect that Go would also work well, incidentally.

When I started working on the project, the downloader worked out which files to put in the zip file by querying the SQL database directly. This wasn’t a terrible thing, but it posed a few problems as we added features:

  • It was coupled to the database schema
  • It didn’t know about the more complex permissions we were adding
  • We wanted to add HTML pages to the zip file to put the files into context

All of these things could have been added to the downloader, but it would have been duplication: the web app already knew how to do this stuff.

The solution, obviously, was to make the downloader more stupid, so that it could just freeload on the web app’s functionality. Here’s how I did it:

Step 1: Remove the database connection

Instead of querying the database to find the files, just ask the web app for a list of files to include. This is the manifest. As well as the list of files, it also includes what they should be stored as, and the name of the zip file to be generated.

As the manifest is JSON, it’s trivial to parse into a JavaScript data structure. In fact, since I used restler, I didn’t even have to do the deserialisation myself.
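
As a rough illustration, a manifest along these lines would carry everything described above. The field names (`filename`, `files`, `source`, `path`) and the file paths are invented for this sketch and are not necessarily what the application used. A client library such as restler does the parsing for you; `JSON.parse` is used here only to keep the example self-contained.

```javascript
// Hypothetical manifest body, as the web app might return it.
const body = `{
  "filename": "interviews.zip",
  "files": [
    { "source": "uploads/photo-1.jpg", "path": "photos/photo-1.jpg" },
    { "source": "uploads/video-7.mp4", "path": "videos/video-7.mp4" }
  ]
}`;

// Parse the manifest and check the two fields the downloader relies on:
// the name of the zip file and the list of files to put in it.
function parseManifest(json) {
  const manifest = JSON.parse(json);
  if (typeof manifest.filename !== "string" || !Array.isArray(manifest.files)) {
    throw new Error("Malformed manifest");
  }
  return manifest;
}

const manifest = parseManifest(body);
console.log(manifest.filename, manifest.files.length); // prints: interviews.zip 2
```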

Step 2: Add the ability to download files from HTTP(S) as well as S3

Need HTML? Just ask the web app. This goes for static assets as well. We could put the latter in the downloader, but the general principle is for the downloader to know as little as possible.

This means that each file entry now has three pieces of information:

  • Where to find it
  • How to get it (HTTP or S3 API)
  • Where to put it
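
Those three pieces of information map naturally onto a small dispatch function. The following is a minimal sketch rather than the real downloader: the entry fields (`source`, `method`, `path`), the `openSource` helper, and the stand-in fetchers are all hypothetical. A real implementation would plug in an HTTP client and an S3 client that each return a readable stream.

```javascript
const { Readable } = require("node:stream");

// Pick a fetcher based on the entry's method and ask it for a stream.
// The fetchers are passed in, so this function knows nothing about
// HTTP or S3 themselves.
function openSource(entry, fetchers) {
  const fetcher = fetchers.get(entry.method);
  if (!fetcher) {
    throw new Error(`Unknown method: ${entry.method}`);
  }
  return fetcher(entry.source);
}

// Stand-in fetchers that return tiny in-memory streams instead of
// touching the network.
const fetchers = new Map([
  ["http", (url) => Readable.from([`HTTP body of ${url}`])],
  ["s3", (key) => Readable.from([`S3 object ${key}`])],
]);

// Hypothetical entries: one page generated by the web app, one file in S3.
const entries = [
  { source: "http://www.example.com/pages/index.html", method: "http", path: "index.html" },
  { source: "uploads/photo-1.jpg", method: "s3", path: "photos/photo-1.jpg" },
];

for (const entry of entries) {
  const stream = openSource(entry, fetchers);
  // A real downloader would now pipe `stream` into the zip under entry.path.
  console.log(`${entry.path}: stream ready (${entry.method})`);
}
```

Adding another way of fetching files would then mean registering one more fetcher, with no change to the dispatch itself.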

Step 3: Make it secure

We want to make sure that people can only download files that they’re allowed to access. We know that the download process starts in the web app, when an authenticated person follows a link, and that the downloader needs to request the manifest from the web app. All we need, then, is some identifier to tie the two together. That identifier is the download token:

  • Person clicks link, e.g. http://www.example.com/downloads/42
  • Web app checks permissions, generates token for requested object
  • Web app redirects to downloader with token, e.g. http://dl.example.com/?token=4d3n7
  • Downloader requests manifest with token, e.g. http://www.example.com/manifests/?token=4d3n7
  • Web app generates manifest for object identified by token
  • Web app sends manifest to downloader

The token is also used when requesting any HTML files generated by the web app for the download.

The download token is a long string composed of four parts:

[ object type | object ID | timestamp | HMAC ]

The timestamp lets the application check that the token isn’t too old, whilst the HMAC ensures that the token is authentic and hasn’t been tampered with.
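
To make the idea concrete, here is a sketch of how such a token could be generated and checked. It is written in JavaScript to match the downloader, although in this design the token lives entirely in the web app. The separator, the SHA-256 hash, the five-minute lifetime, the secret, and the `project` object type are all assumptions made for illustration, and the sketch assumes that types and IDs never contain the separator.

```javascript
const crypto = require("node:crypto");

const SECRET = "replace-with-a-real-secret"; // assumption: known only to the web app
const MAX_AGE_SECONDS = 300;                 // assumption: tokens live for five minutes

function sign(payload) {
  return crypto.createHmac("sha256", SECRET).update(payload).digest("hex");
}

// Build "type.id.timestamp.hmac", where the HMAC covers the first three parts.
function generateToken(type, id, now = Math.floor(Date.now() / 1000)) {
  const payload = `${type}.${id}.${now}`;
  return `${payload}.${sign(payload)}`;
}

// Return { type, id } if the token is authentic and fresh, otherwise null.
function verifyToken(token, now = Math.floor(Date.now() / 1000)) {
  const parts = token.split(".");
  if (parts.length !== 4) return null;
  const [type, id, timestamp, mac] = parts;

  const expected = Buffer.from(sign(`${type}.${id}.${timestamp}`));
  const actual = Buffer.from(mac);
  // timingSafeEqual throws on a length mismatch, so check lengths first.
  if (actual.length !== expected.length || !crypto.timingSafeEqual(actual, expected)) {
    return null;
  }

  const age = now - Number(timestamp);
  if (!Number.isInteger(age) || age < 0 || age > MAX_AGE_SECONDS) return null;
  return { type, id };
}

const token = generateToken("project", 42);
console.log(verifyToken(token));       // { type: 'project', id: '42' }
console.log(verifyToken(token + "0")); // null, because the HMAC no longer matches
```

Because the signature covers the object type, the ID, and the timestamp, changing any of them invalidates the token, and nothing has to be stored server-side to check it.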

The token does not need to be persisted in a database for authentication, nor does it need to be parsed anywhere outside the web app. The downloader can treat it as an opaque string that it just appends to every request.
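
On the downloader side, that can be as small as the following sketch. The `withToken` helper is a hypothetical name, and the URL and token are the ones from the example flow above.

```javascript
// Add the download token to any URL on the web app. The token is passed
// through untouched: the downloader never looks inside it.
function withToken(url, token) {
  const result = new URL(url);
  result.searchParams.set("token", token);
  return result.toString();
}

console.log(withToken("http://www.example.com/manifests/", "4d3n7"));
// prints: http://www.example.com/manifests/?token=4d3n7
```

Using `URL` and `searchParams` rather than string concatenation means the token is escaped properly and any existing query parameters are preserved.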

Is it stupid enough yet?

By this stage, the downloader only needs to know two pieces of information, neither of which is specific to the project:

  • Where to find the manifests
  • The S3 credentials

The S3 credentials are not even strictly necessary (we could fetch the files via HTTP), but requests through the public interface are more expensive than requests made within AWS.

I’ve lied a bit, because I didn’t just remove code: I added features to the web side (notably the download token and manifest generation). However, the downloader app is now about 20% shorter and does a lot more, despite knowing a lot less.