RESTful Refactor: Combine Resources

Tags [ caching, HTTP, refactoring, REST, REST API, RESTful web services ]

I’ve been spending a lot of time thinking about RESTful web services and hypermedia APIs, and I’ve started to discover several design patterns as I’ve played around with them in code. Today, I want to talk about the granularity of resources, which is roughly “how much stuff shows up at a single resource”. Generally speaking, RESTful architectures work better with coarser-grained resources, i.e., transferring more stuff in one response, and I’ll walk through an example of that in this article.

Now, in my previous article, I suggested taking each domain object (or collection of domain objects) and making it a resource with an assigned URL. While following this path (along with the other guidelines mentioned) does get you to a RESTful architecture, it may not always be an optimal one, and you may want to refactor your API to improve it.

Let’s take, for example, the canonical and oversimplified “list of favorite things” web service. There are potentially two resource types:

- the list of favorites itself, at /favorites
- each individual favorite, at /favorites/{id}

All well and good, and I can model all sorts of actions here (a client sketch follows the list):

- adding a new favorite: POST to /favorites
- removing a favorite: DELETE to the specific /favorites/{id}
- editing a favorite: PUT to the specific /favorites/{id}
- getting the full list: GET to /favorites
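To make that concrete, here is a minimal client sketch using Python’s requests library. The base URL, the JSON shapes, and the assumption that the server returns the new item’s id are all illustrative, not part of the original API.

```python
# Hypothetical client for the fine-grained API; base URL and JSON
# shapes are illustrative assumptions.
import requests

BASE = "https://api.example.com"

# adding a new favorite: POST to /favorites
resp = requests.post(f"{BASE}/favorites", json={"name": "raindrops on roses"})
new_id = resp.json()["id"]  # assumes the server returns the new item's id

# editing a favorite: PUT to the specific /favorites/{id}
requests.put(f"{BASE}/favorites/{new_id}", json={"name": "whiskers on kittens"})

# removing a favorite: DELETE to the specific /favorites/{id}
requests.delete(f"{BASE}/favorites/{new_id}")

# getting the full list: GET to /favorites
favorites = requests.get(f"{BASE}/favorites").json()
```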

Fully RESTful, great. However, let’s think about cache semantics, particularly the cache semantics we should assign to the GET to /favorites. This is probably the most common request we’d have to serve, and in fact it ought to be quite cacheable, as in practice (as with a lot of user-maintained preferences or data) there are going to be lots of read accesses between writes.

There’s a problem here, though: some of the actions that would cause an update to the list don’t operate on the list’s URL (namely, editing a single entry or deleting an entry). This means an intermediary HTTP cache won’t invalidate the cache entry for the list when those updates happen. If we want a subsequent fetch of the list by a user to reflect an immediate update, we either have to put ‘Cache-Control: max-age=0’ on the list and require validation on each access, or we need the client to remember to send ‘Cache-Control: no-cache’ when fetching a list after an update.
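In client terms, the two workarounds look something like the sketch below; it assumes the same hypothetical URLs as before, and the Cache-Control headers are the point.

```python
# Sketch of the two workarounds; the URLs and item id are assumed.
import requests

BASE = "https://api.example.com"

# Workaround 1 lives on the server: every response to GET /favorites
# would carry "Cache-Control: max-age=0", forcing caches to revalidate
# the list on every access.

# Workaround 2 lives on the client: after updating a single entry, it
# has to remember to bypass caches explicitly on the next list fetch.
requests.delete(f"{BASE}/favorites/42")
fresh = requests.get(
    f"{BASE}/favorites",
    headers={"Cache-Control": "no-cache"},  # client forced to know this
).json()
```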

Putting Cache-Control: max-age=0 on the list resource really seems a shame; most RESTful APIs are set up to cross WAN links, so you may pay nearly the full latency of a 200 OK fetch even when you get back a 304 Not Modified response, especially if you have fine-grained resources that don’t carry a lot of data (and a textual list of 10 or so favorite items isn’t a lot of data!).

Requiring the client to send Cache-Control: no-cache is also problematic: the cache semantics of the resources are really supposed to be the server’s concern, yet we are relying on the client to understand something extra about the relationship between various resources and their caching semantics. This is a road that leads to tight coupling between client and server, thus throwing away one of the really useful properties of a REST architecture: allowing the server and client to evolve largely independently.

Instead, let me offer the following rule of thumb: if a change to one resource should cause a cache invalidation of another resource, maybe they shouldn’t be separate resources. I’ll call this a “RESTful refactoring”: Combining Resources.

In our case, I would suggest that we only need one resource:

- the list of favorites itself, at /favorites

We can still model all of our actions (a client sketch follows the list):

- adding a new favorite: PUT to /favorites a list containing the new item
- removing a favorite: PUT to /favorites a new list with the offending item removed
- editing a favorite: PUT to /favorites a list containing an updated item
- getting the full list: GET to /favorites
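From the client’s side, every write becomes a read-modify-write of the whole list; again a sketch, with the URL and JSON shape assumed:

```python
# Hypothetical client for the combined resource: each write PUTs the
# entire list back to /favorites.
import requests

BASE = "https://api.example.com"

# getting the full list: GET to /favorites
favorites = requests.get(f"{BASE}/favorites").json()

# adding a new favorite: PUT a list containing the new item
favorites.append({"name": "bright copper kettles"})
requests.put(f"{BASE}/favorites", json=favorites)

# removing a favorite: PUT a new list with the offending item removed
favorites = [f for f in favorites if f["name"] != "bright copper kettles"]
requests.put(f"{BASE}/favorites", json=favorites)
```

Note that every state change now targets /favorites itself, which is exactly what lets an intermediary HTTP cache invalidate the right entry.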

But now, I can put a much longer cache timeout on the /favorites resource, because if a client does something to change its state, it will do a PUT to /favorites, invalidating its own cache (assuming the client has its own non-shared/private cache). If the resource represents a user-specific list, then I can probably set the cache timeout considering:

- how often the user is actually likely to change the list, and
- how long I can tolerate a different client of the same user (say, the same account on another device, behind a different cache) seeing a stale copy.

Probably these values are a lot larger than the zero seconds we were using via Cache-Control: max-age=0. When you can figure out how to assign longer expiration times to your responses, you get a much bigger win for performance and scale. While revalidating a cached response is probably faster than fetching the resource anew, not having to send a request to the origin at all is waaaaaaay better.
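On the server side, this can be as simple as attaching a longer freshness lifetime to the list’s responses. Here is a minimal sketch using Flask; the framework choice, the in-memory storage, and the one-hour figure are all assumptions for illustration.

```python
# Minimal Flask sketch of the combined /favorites resource with a long
# freshness lifetime; the storage and the one-hour max-age are assumed.
from flask import Flask, jsonify, request

app = Flask(__name__)
favorites = [{"name": "raindrops on roses"}]

@app.route("/favorites", methods=["GET"])
def get_favorites():
    resp = jsonify(favorites)
    # Safe to cache for a while: any client that changes the list does so
    # with a PUT to this same URL, invalidating its own cached copy.
    resp.headers["Cache-Control"] = "private, max-age=3600"
    return resp

@app.route("/favorites", methods=["PUT"])
def put_favorites():
    global favorites
    favorites = request.get_json()
    return "", 204
```

The private directive fits a user-specific list: it keeps shared caches from serving one user’s favorites to another while still letting the client’s own cache hold the response.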

The extreme case here, of course, would be a web service where a user could just get all their “stuff” in one big blob with one request (as we modelled above). There are many domains where this is quite possible, and when you factor in gzip encoding, you can start to contemplate pushing around quite verbose documents, which can be a big win assuming your server can render the response reasonably quickly.
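To get a rough feel for how much breathing room gzip buys for that kind of verbose blob, here’s a quick measurement sketch; the sample document is fabricated purely for illustration.

```python
# Quick sketch: measure how much gzip shrinks a verbose, repetitive
# JSON document (the sample data is made up for illustration).
import gzip
import json

doc = json.dumps([{"id": i, "name": f"favorite thing #{i}"} for i in range(1000)])
raw = doc.encode("utf-8")
compressed = gzip.compress(raw)
print(f"raw: {len(raw)} bytes, gzipped: {len(compressed)} bytes")
```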