Hypermedia Programming: Lists
Tags [ hypermedia ]
The humble list is one of our simplest and yet most powerful data structures–so much so that we will even routinely write them out by hand. We put them on sticky notes to help us remember things we need to do or things we need to buy from the grocery store. We even use them for entertainment. In this post I’ll explain how to represent and manipulate lists using hypermedia techniques.
The most straightforward list representation actually doesn’t look that different than a handwritten to-do list; the text/uri-list media type just consists of one URI per line. This makes the format incredibly concise, since there is very little syntactic structure (just interleaved line breaks!), while making it completely general through the use of globally unambiguous identifiers.
Now let’s talk about manipulating this list with HTTP. I would expect to be able to:
- delete the list by issuing a
DELETE
to its URI - reorder the list by issuing a
PUT
to its URI with the complete, reordered list of URIs - insert or remove new URIs somewhere in the middle by issuing a
PUT
, again with a complete “new” version of the list - append to the list by issuing a POST with a
text/uri-list
body containing the new URI(s) to be added
The above assume, of course, that I am properly authenticated and
authorized to modify the list. If the original list resource had an
ETag
or Last-Modified
header, I would supply If-Match
or
If-Unmodified-Since
headers on my modification request.
Once the list grows large, however, using PUT
s to make what seem like
“small” changes (removing an item, inserting an item) doesn’t seem
particularly efficient. For these types of changes, I’d like to be
able to use PATCH
and specify these small edits. Now, since
text/uri-list
is a text type, we ought to be able to borrow the output
of the common diff
utility to specify the changes we want to
make. [Unfortunately, it turns out the output of diff isn’t registered
as a standard media type, although I’m trying to rectify
that
as well in my not-so-copious spare time.]
This means, for example, we could see something like the following protocol exchange, starting with retrieving the initial list:
Adding pagination
These general approaches will work very well up through relatively large lists, although at some point your list will get bigger than you are willing to serve in a single representation. Now it’s time to add pagination!
The easiest way to do this on the server side is to provide Link headers which tell you where you are in the list. In fact, there are registered link relations that are perfect for this already, in particular:
- first
- Points to the first page of the list.
- last
- Points to the last page of the list.
- next
- Points to the next page in the list.
- prev or previous
- Points to the previous page in the list.
Now let’s work through an example. Suppose you fetch a URL that you expect, from context, to be a list and these response headers come back:
HTTP/1.1 200 OK
Date: Fri, 31 Aug 2012 01:38:18 GMT
Content-Type: text/uri-list
Link: <./page/2>; rel="next last"
...
Now, you can infer a couple of things, namely, that this list spans
multiple pages (due to the presence of the next
link), but also that
it has exactly two pages (because the next
link is also the last
link). We can also tell that this is the first page, because there
isn’t a prev
link; we might also be able to infer that if the server
additionally provided:
Link: <.>; rel="first"
Ok, that works well for paginated list retrieval. It’s not too hard to
look for these Link
headers and traverse them to retrieve and/or
iterate over the entire list. But now how about updates? There’s
actually an ambiguity problem here, because we followed a particular
URL for the whole list but got back a representation for the
first page of the list instead. If I DELETE
that URL, does
it:
- delete the entire list; OR
- delete all the entries on the first page only?
The short answer is: there’s no way to tell. As a server implementor,
though, when someone did a GET
on a list that I decided I needed to
paginate, I might instead issue a 302 redirect to a different URL
representing the first page explicitly. For example:
GET /list HTTP/1.1
Host: www.example.com
HTTP/1.1 302 FOUND
Date: Fri, 31 Aug 2012 01:38:18 GMT
Location: http://www.example.com/list/page/1
Then I could treat PUT
s, DELETE
s, PATCH
es and POST
s to /list
as if
they were targeting the entire list, and treat requests to
/list/page/1
as if they were targeting just the first page.
But back to our client conundrum; perhaps our server doesn’t adhere to
this redirect convention–it’s certainly not an official
standard. How do we proceed? Well, if our goal (and writing
“goal-oriented” clients is a good orientation for hypermedia clients)
is to delete the whole list, then we can just alternate DELETE
s with
GET
s until the whole thing is gone. Either the DELETE
affects the
whole list in the first shot, or it deletes just the first page. In
the latter case, I’ve made progress and can repeatedly delete the
first page until I’m done.
[Sidebar: avid readers of the HTTP spec will have spotted the trick question already here.
DELETE
s are supposed to be idempotent, but deleting just the first page of a list is not an idempotent operation because the second page becomes the first page, so repeatedDELETE
s to the same “first page” URL will continue deleting more items. Therefore the correct behavior for the server is to delete the entire list. However, if you meet a server that has decided on the second semantics, good luck waving standards documents around to get its implementor to change it.]
If our goal, however, is just to remove the first page, we probably
want to PATCH
it. However, that’s not a commonly implemented HTTP verb
(expect 501 NOT IMPLEMENTED
or 405 METHOD NOT ALLOWED
), and there
isn’t a standardized text diff media type yet anyway, so that might
not work. In this case, our client may well have to be prepared to
DELETE
the entire list and then reconstruct just the desired parts
with PUT
and/or POST
.
What’s very interesting about this is that the client as we’ve
described it actually implements a
closed-loop control
mechanism. It takes sensor readings or receives feedback
via “examining” the system with GET
s, and then takes further actions
based on its current goal and the current state of the system. For a
really good introduction to how this can lead to very robust systems,
see
GN&C Fault Protection Fundamentals
by Robert Rasmussen; although the
paper is about spacecraft guidance systems (cool!) its concepts are
easily applicable to software systems in general.
Richer Representations
The text/uri-list
, while a great format for capturing list membership
and order, doesn’t tell a recipient anything about the actual members
of the list. In that sense, it’s all identifier and no data. For those
list members that are URLs, we can attempt to GET
them or check their
HEAD
or ask for their OPTIONS
to get more information. For URIs that
are not locators (and hence dereferenceable), like
URNs or
Tag URIs, we’d have
to consult a lookup service or have access to other out-of-band
information. At any rate, if a client was looking for a particular
member of a list, it might have to make several more requests before
it could find the right one. In particular, a human looking at the
list in a browser will likely have to do a bunch of cut-n-pasting to
fully investigate the list contents.
What can we do about this? In REST APIs must be hypertext-driven, Roy Fielding suggests the following pattern:
Query results are represented by a list of links with summary information, not by arrays of object representations.
In other words, along with the links, we want to provide a little
extra contextual information to make it easier for the client to
identify what they’re looking for. The text/uri-list
format lacks this
extra context and assumes the recipient can find it elsewhere
easily. Perhaps we should look for alternative formats that are nearly
as concise but which also provide opportunity to supply the little bit
of context Fielding describes. Two immediate options that spring to
mind are
Atom+XML and
Collection+JSON,
which are media types whose domains specifically include lists.
For example, here’s our initial list of favorite items, represented in
application/atom+xml
:
Now all the same rules apply for this list, as far as what the methods
mean (i.e. POST
appends a new item to the list, etc.). This is an
example of what folks mean by uniform interface. If a URL
represents a list, then POST
ing to it should append to the list,
regardless of the media type used to serve the list or to enclose a
new item to be appended. So long as the client and server commonly
understand a sufficient set of media types, they can interoperate. In
the case of the Atom-formatted list, I would probably expect to have
to POST
an <entry>
containing my new item, as I have a strong
hint that the server understands application/atom+xml
. However, the
server may also advertise additional formats with Link
headers (Atom
lets us do this with embedded <link>
elements too):
Link: <.>; rel="alternate"; type="text/uri-list"
To take advantage of these I may need to adjust my client’s Accept
header to specify my preference for them. But at any rate, if the
resource is a list, there’s no reason a client couldn’t GET
it as
Atom, and then POST
a single URI onto it with text/uri-list
, so long
as the client and server understand both media types. If the server
doesn’t, it may well reply with a 415 UNSUPPORTED MEDIA TYPE
and then
the client may try again if it has another option.
Last but not least, since I like using HTML as the media type for my APIs, we should point out that this is also a fine and relatively compact way to represent a list:
Where to go next?
I’ve given you a brief tour about how to deal with hypermedia lists in a standardized way, relying on the documented semantics of HTTP and various media types. I believe it should be possible to construct relatively robust and general client libraries for dealing with lists in all their incarnations; that would be a great open source project…hint hint.