RESTful Refactor: Combine Resources
Tags [ caching, HTTP, refactoring, REST, REST API, RESTful web services ]
I’ve been spending a lot of time thinking about RESTful web services and hypermedia APIs, and I’ve started to discover several design patterns as I’ve begun to play around with these ideas in code. Today, I want to talk about the granularity of resources, which is roughly “how much stuff shows up at a single resource”. Generally speaking, RESTful architectures work better with coarser-grained resources, i.e., transferring more stuff in one response, and I’ll walk through an example of that in this article.
Now, in my previous article, I suggested taking each domain object (or collection of domain objects) and making it a resource with an assigned URL. While following this path (along with the other guidelines mentioned) does get you to a RESTful architecture, it may not always be an optimal one, and you may refactor your API to improve it.
Let’s take, for example, the canonical and oversimplified “list of favorite things” web service. There are potentially two resource types:
- a favorite thing (/favorites/{id})
- a list of favorite things (/favorites)
All well and good, and I can model all sorts of actions here:
- adding a new favorite
  - POST to /favorites
- removing a favorite
  - DELETE to the specific /favorites/{id}
- editing a favorite
  - PUT to the specific /favorites/{id}
- getting the full list
  - GET to /favorites
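To make that mapping concrete, here’s a minimal sketch in Python. The `favorites_request` helper is hypothetical (nothing like it appears in any real framework); it just pairs each action with its HTTP method and URL:

```python
def favorites_request(action, fav_id=None):
    """Map each action in the fine-grained design to (method, path)."""
    if action == "add":
        return ("POST", "/favorites")
    if action == "remove":
        return ("DELETE", f"/favorites/{fav_id}")
    if action == "edit":
        return ("PUT", f"/favorites/{fav_id}")
    if action == "list":
        return ("GET", "/favorites")
    raise ValueError(f"unknown action: {action}")

# Note that "edit" and "remove" target an individual item's URL,
# not the list's URL -- that detail matters for caching, below.
print(favorites_request("edit", 42))   # -> ('PUT', '/favorites/42')
```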
Fully RESTful, great. However, let’s think about cache semantics, particularly the cache semantics we should assign to the GET to /favorites. This is probably the most common request we’d have to serve, and in fact it ought to be quite cacheable, as in practice (as with a lot of user-maintained preferences or data) there are going to be lots of read accesses between writes.
There’s a problem here, though: some of the actions that would cause an update to the list don’t operate on the list’s URL (namely, editing a single entry or deleting an entry). This means an intermediary HTTP cache won’t invalidate the cache entry for the list when those updates happen. If we want a subsequent fetch of the list by a user to reflect an immediate update, we either have to put ‘Cache-Control: max-age=0’ on the list and require validation on each access, or we need the client to remember to send ‘Cache-Control: no-cache’ when fetching a list after an update.
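A toy model makes the gap visible. This sketch is my own drastic simplification of HTTP cache invalidation (per the HTTP caching rules, an unsafe request invalidates the cache entry for its own target URL, and only that one), not a real cache:

```python
class ToyHttpCache:
    """Toy intermediary cache: responses stored by URL."""

    def __init__(self):
        self.entries = {}

    def store(self, url, body):
        self.entries[url] = body

    def lookup(self, url):
        return self.entries.get(url)

    def on_unsafe_request(self, method, url):
        # PUT/POST/DELETE invalidate only the entry for their own URL.
        self.entries.pop(url, None)

cache = ToyHttpCache()
cache.store("/favorites", '["tea", "vim", "bicycles"]')

# Deleting one favorite targets /favorites/2, not /favorites ...
cache.on_unsafe_request("DELETE", "/favorites/2")

# ... so the cached list survives, and is now stale.
assert cache.lookup("/favorites") is not None
```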
Putting Cache-Control: max-age=0 on the list resource really seems a shame; most RESTful APIs are set up to cross WAN links, and so you may be paying most of the latency of a full fetch that returned a 200 OK even if you are getting a 304 Not Modified response, especially if you have fine-grained resources that don’t have a lot of data (and a textual list of 10 or so favorite items isn’t a lot of data!).
Requiring the client to send Cache-Control: no-cache is also problematic: the cache semantics of the resources are really supposed to be the server’s concern, yet we are relying on the client to understand something extra about the relationship between various resources and their caching semantics. This is a road that leads to tight coupling between client and server, thus throwing away one of the really useful properties of a REST architecture: allowing the server and client to evolve largely independently.
Instead, let me offer the following rule of thumb: if a change to one resource should cause a cache invalidation of another resource, maybe they shouldn’t be separate resources. I’ll call this a “RESTful refactoring”: Combining Resources.
In our case, I would suggest that we only need one resource:
- the list of favorites
We can still model all of our actions:
- adding a new favorite
  - PUT to /favorites a list containing the new item
- removing a favorite
  - PUT to /favorites a new list with the offending item removed
- editing a favorite
  - PUT to /favorites a list containing an updated item
- getting the full list
  - GET to /favorites
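A sketch of the coarse-grained design, again with hypothetical helper names: every mutation is expressed as a PUT of the complete list, so writes always target the same URL the reads use:

```python
def put_full_list(favorites):
    """Every mutation in the coarse-grained design is a PUT of the whole list."""
    return ("PUT", "/favorites", list(favorites))

favorites = ["tea", "vim", "bicycles"]

# Removing a favorite: PUT the list with the item taken out.
method, url, body = put_full_list([f for f in favorites if f != "vim"])

# The unsafe request now targets /favorites itself, so an intermediary
# cache's entry for the list is invalidated as a side effect of the write.
assert (method, url) == ("PUT", "/favorites")
assert body == ["tea", "bicycles"]
```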
But now, I can put a much longer cache timeout on the /favorites resource, because if a client does something to change its state, it will do a PUT to /favorites, invalidating its own cache (assuming the client has its own non-shared/private cache). If the resource represents a user-specific list, then I can probably set the cache timeout considering:
- how long am I willing to wait for another user to see the results of this user’s updates?
- if the same user accesses the resource from a different computer, how long am I willing to allow those two views to stay out of sync? (bearing in mind that the user can usually, and pretty intuitively, hit refresh on a browser page that looks out of date)
Probably these values are a lot larger than the zero seconds we were using via Cache-Control: max-age=0. When you can figure out how to assign longer expiration times to your responses, you can get a much bigger win for performance and scale. While revalidating a cached response is probably faster than fetching the resource anew, not having to send a request at all to the origin is waaaaaaay better.
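For instance, the server might answer those two questions with a five-minute window. The value and the helper below are purely illustrative assumptions, not a recommendation:

```python
def list_response_headers(max_age_seconds=300):
    """Hypothetical response headers for GET /favorites; 300 seconds is
    an assumed answer to the staleness questions above."""
    return {
        "Content-Type": "application/json",
        # Much friendlier than max-age=0: caches may serve the list
        # without revalidating for up to five minutes.
        "Cache-Control": f"max-age={max_age_seconds}",
    }

print(list_response_headers()["Cache-Control"])   # -> max-age=300
```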
The extreme case here, of course, would be a web service where a user could just get all their “stuff” in one big blob with one request (as we modelled above). There are many domains where this is quite possible, and when you factor in gzip encoding, you can start to contemplate pushing around quite verbose documents, which can be a big win assuming your server can render the response reasonably quickly.
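As a rough illustration of that last point, here’s how well a repetitive JSON “blob of stuff” shrinks under gzip (the document shape is made up for the example):

```python
import gzip
import json

# A deliberately verbose, repetitive document, like a big list of a
# user's "stuff" (hypothetical shape).
doc = json.dumps(
    [{"id": i, "name": f"favorite thing number {i}"} for i in range(500)]
).encode("utf-8")

compressed = gzip.compress(doc)

# Repetitive JSON keys and values compress extremely well, so the bytes
# actually crossing the WAN link are a small fraction of the raw document.
print(len(doc), len(compressed))
```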