Hypermedia Programming: Lists
The humble list is one of our simplest and yet most powerful data structures–so much so that we will even routinely write them out by hand. We put them on sticky notes to help us remember things we need to do or things we need to buy from the grocery store. We even use them for entertainment. In this post I’ll explain how to represent and manipulate lists using hypermedia techniques.
The most straightforward list representation actually doesn’t look that different than a handwritten to-do list; the text/uri-list media type just consists of one URI per line. This makes the format incredibly concise, since there is very little syntactic structure (just interleaved line breaks!), while making it completely general through the use of globally unambiguous identifiers.
Now let’s talk about manipulating this list with HTTP. I would expect to be able to:
- delete the list by issuing a
DELETEto its URI
- reorder the list by issuing a
PUTto its URI with the complete, reordered list of URIs
- insert or remove new URIs somewhere in the middle by issuing a
PUT, again with a complete “new” version of the list
- append to the list by issuing a POST with a
text/uri-listbody containing the new URI(s) to be added
The above assume, of course, that I am properly authenticated and
authorized to modify the list. If the original list resource had an
Last-Modified header, I would supply
If-Unmodified-Since headers on my modification request.
Once the list grows large, however, using
PUTs to make what seem like
“small” changes (removing an item, inserting an item) doesn’t seem
particularly efficient. For these types of changes, I’d like to be
able to use
PATCH and specify these small edits. Now, since
text/uri-list is a text type, we ought to be able to borrow the output
of the common
diff utility to specify the changes we want to
make. [Unfortunately, it turns out the output of diff isn’t registered
as a standard media type, although I’m trying to rectify
as well in my not-so-copious spare time.]
This means, for example, we could see something like the following protocol exchange, starting with retrieving the initial list:
These general approaches will work very well up through relatively large lists, although at some point your list will get bigger than you are willing to serve in a single representation. Now it’s time to add pagination!
The easiest way to do this on the server side is to provide Link headers which tell you where you are in the list. In fact, there are registered link relations that are perfect for this already, in particular:
- Points to the first page of the list.
- Points to the last page of the list.
- Points to the next page in the list.
- prev or previous
- Points to the previous page in the list.
Now let’s work through an example. Suppose you fetch a URL that you expect, from context, to be a list and these response headers come back:
HTTP/1.1 200 OK Date: Fri, 31 Aug 2012 01:38:18 GMT Content-Type: text/uri-list Link: <./page/2>; rel="next last" ...
Now, you can infer a couple of things, namely, that this list spans
multiple pages (due to the presence of the
next link), but also that
it has exactly two pages (because the
next link is also the
link). We can also tell that this is the first page, because there
prev link; we might also be able to infer that if the server
Link: <.>; rel="first"
Ok, that works well for paginated list retrieval. It’s not too hard to
look for these
Link headers and traverse them to retrieve and/or
iterate over the entire list. But now how about updates? There’s
actually an ambiguity problem here, because we followed a particular
URL for the whole list but got back a representation for the
first page of the list instead. If I
DELETE that URL, does
- delete the entire list; OR
- delete all the entries on the first page only?
The short answer is: there’s no way to tell. As a server implementor,
though, when someone did a
GET on a list that I decided I needed to
paginate, I might instead issue a 302 redirect to a different URL
representing the first page explicitly. For example:
GET /list HTTP/1.1 Host: www.example.com HTTP/1.1 302 FOUND Date: Fri, 31 Aug 2012 01:38:18 GMT Location: http://www.example.com/list/page/1
Then I could treat
/list as if
they were targeting the entire list, and treat requests to
/list/page/1 as if they were targeting just the first page.
But back to our client conundrum; perhaps our server doesn’t adhere to
this redirect convention–it’s certainly not an official
standard. How do we proceed? Well, if our goal (and writing
“goal-oriented” clients is a good orientation for hypermedia clients)
is to delete the whole list, then we can just alternate
GETs until the whole thing is gone. Either the
DELETE affects the
whole list in the first shot, or it deletes just the first page. In
the latter case, I’ve made progress and can repeatedly delete the
first page until I’m done.
[Sidebar: avid readers of the HTTP spec will have spotted the trick question already here.
DELETEs are supposed to be idempotent, but deleting just the first page of a list is not an idempotent operation because the second page becomes the first page, so repeated
DELETEs to the same “first page” URL will continue deleting more items. Therefore the correct behavior for the server is to delete the entire list. However, if you meet a server that has decided on the second semantics, good luck waving standards documents around to get its implementor to change it.]
If our goal, however, is just to remove the first page, we probably
PATCH it. However, that’s not a commonly implemented HTTP verb
501 NOT IMPLEMENTED or
405 METHOD NOT ALLOWED), and there
isn’t a standardized text diff media type yet anyway, so that might
not work. In this case, our client may well have to be prepared to
DELETE the entire list and then reconstruct just the desired parts
What’s very interesting about this is that the client as we’ve
described it actually implements a
mechanism. It takes sensor readings or receives feedback
via “examining” the system with
GETs, and then takes further actions
based on its current goal and the current state of the system. For a
really good introduction to how this can lead to very robust systems,
GN&C Fault Protection Fundamentals
by Robert Rasmussen; although the
paper is about spacecraft guidance systems (cool!) its concepts are
easily applicable to software systems in general.
text/uri-list, while a great format for capturing list membership
and order, doesn’t tell a recipient anything about the actual members
of the list. In that sense, it’s all identifier and no data. For those
list members that are URLs, we can attempt to
GET them or check their
HEAD or ask for their
OPTIONS to get more information. For URIs that
are not locators (and hence dereferenceable), like
Tag URIs, we’d have
to consult a lookup service or have access to other out-of-band
information. At any rate, if a client was looking for a particular
member of a list, it might have to make several more requests before
it could find the right one. In particular, a human looking at the
list in a browser will likely have to do a bunch of cut-n-pasting to
fully investigate the list contents.
Query results are represented by a list of links with summary information, not by arrays of object representations.
In other words, along with the links, we want to provide a little
extra contextual information to make it easier for the client to
identify what they’re looking for. The
text/uri-list format lacks this
extra context and assumes the recipient can find it elsewhere
easily. Perhaps we should look for alternative formats that are nearly
as concise but which also provide opportunity to supply the little bit
of context Fielding describes. Two immediate options that spring to
which are media types whose domains specifically include lists.
For example, here’s our initial list of favorite items, represented in
Now all the same rules apply for this list, as far as what the methods
POST appends a new item to the list, etc.). This is an
example of what folks mean by uniform interface. If a URL
represents a list, then
POSTing to it should append to the list,
regardless of the media type used to serve the list or to enclose a
new item to be appended. So long as the client and server commonly
understand a sufficient set of media types, they can interoperate. In
the case of the Atom-formatted list, I would probably expect to have
<entry> containing my new item, as I have a strong
hint that the server understands
application/atom+xml. However, the
server may also advertise additional formats with
Link headers (Atom
lets us do this with embedded
<link> elements too):
Link: <.>; rel="alternate"; type="text/uri-list"
To take advantage of these I may need to adjust my client’s
header to specify my preference for them. But at any rate, if the
resource is a list, there’s no reason a client couldn’t
GET it as
Atom, and then
POST a single URI onto it with
text/uri-list, so long
as the client and server understand both media types. If the server
doesn’t, it may well reply with a
415 UNSUPPORTED MEDIA TYPE and then
the client may try again if it has another option.
Last but not least, since I like using HTML as the media type for my APIs, we should point out that this is also a fine and relatively compact way to represent a list:
Where to go next?
I’ve given you a brief tour about how to deal with hypermedia lists in a standardized way, relying on the documented semantics of HTTP and various media types. I believe it should be possible to construct relatively robust and general client libraries for dealing with lists in all their incarnations; that would be a great open source project…hint hint.