PubSubHubbub was conceived as a protocol for delivering push notifications of updates to Atom and RSS feeds, and I think most would agree that it has been somewhat successful in doing so. However, people almost immediately became interested in making it support other specific serialization formats (I penned a variant for streams of JSON objects, for example) and, ultimately, in making it general enough for any arbitrary data type.
Attempting to support arbitrary data formats exposed a number of weaknesses in the original protocol, the main one being that the signature used for authenticated notifications covers only the body of the notification. This was not a big issue while the payload was constrained to being a valid feed, but support for arbitrary resources brings the need to support the HTTP headers that describe the payload, Content-Type in particular. These headers really need to be included in the signature too; otherwise an attacker could intercept a request and replace its headers in order to obtain a more harmful interpretation of the existing payload.
At the 2010 Federated Social Web Summit I suggested the solution of making the notification body be an entire HTTP response rather than just a payload, which Joseph Smarr lovingly branded a "turducken solution". Of course the problem with this approach, as I acknowledged at the time, is that most web frameworks out there are not equipped to parse an HTTP response bytestream out of the body of an HTTP request, and so this can be tricky to implement on some popular web application stacks.
Today I offer a new solution that arises from looking at PubSubHubbub from a different angle. Rather than thinking of it as a means to notify subscribers of new items in a stream, we can think of it conceptually as a protocol for mirroring resources.
If you frame the problem in terms of resources — a fundamental HTTP concept — then this brings us closer to HTTP and allows us to make better use of the facilities that HTTP provides. In particular, we can represent update notifications with HTTP PUT requests:
PUT /example.jpg HTTP/1.0
Content-Type: image/jpeg
Content-Length: 2545
Host: example.com
Authorization: HubSignature 103456 abcd1234abcd1234abcd1234abcd1234abcd1234

(image payload)
HTTP already defines how to use entity header fields with a PUT request to provide the metadata for an entity body, so we can use this as the format of a "fat ping". The only new thing in the above example is the hypothetical HubSignature auth mechanism, which I imagine to be a signature generated from the Content-* header fields, the request method, the request URI, the payload, a nonce (103456 in this example) and the hub secret.
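To make the HubSignature idea a little more concrete, here is a minimal Python sketch. Everything in it is an assumption rather than part of any spec: the canonical string (method, request URI, lowercased and sorted Content-* headers, and the nonce, one per line, followed by the raw payload), the function names, and the choice of HMAC-SHA1, picked only because it yields the 40 hex digits seen in the example header.

```python
import hashlib
import hmac


def _string_to_sign(method, uri, headers, nonce, body):
    # Hypothetical canonical form: method, request URI, each Content-* header
    # as "name:value" (name lowercased, sorted), then the nonce, one per line,
    # followed by the raw payload bytes.
    content_headers = sorted(
        (name.lower(), value.strip())
        for name, value in headers.items()
        if name.lower().startswith("content-")
    )
    lines = [method.upper(), uri]
    lines += [f"{name}:{value}" for name, value in content_headers]
    lines.append(nonce)
    return ("\n".join(lines) + "\n").encode("utf-8") + body


def _digest(secret, method, uri, headers, nonce, body):
    # Assumed algorithm: HMAC-SHA1 keyed with the hub secret, hex-encoded.
    message = _string_to_sign(method, uri, headers, nonce, body)
    return hmac.new(secret, message, hashlib.sha1).hexdigest()


def sign(secret, method, uri, headers, nonce, body):
    """Hub side: build the Authorization header value for a notification."""
    return f"HubSignature {nonce} {_digest(secret, method, uri, headers, nonce, body)}"


def verify(secret, method, uri, headers, body):
    """Subscriber side: check the Authorization header of an incoming request."""
    scheme, _, rest = headers.get("Authorization", "").partition(" ")
    nonce, _, digest = rest.partition(" ")
    if scheme != "HubSignature" or not nonce or not digest:
        return False
    expected = _digest(secret, method, uri, headers, nonce, body)
    # Constant-time comparison so the signature can't be recovered via timing.
    return hmac.compare_digest(expected.encode(), digest.encode("utf-8"))
```

Because Content-Type is part of the signed string, swapping it for, say, text/html invalidates the signature, which is exactly the header-substitution attack described earlier. A real subscriber would also remember recently seen nonces and reject repeats; this sketch leaves that out.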
This has the advantage of being very close to what lots of web server software already expects from a PUT. Web servers and frameworks generally provide a mechanism for integrating new HTTP auth mechanisms, so this protocol could be handled relatively easily in (for example) Apache HTTPD by combining its existing PUT support with a new auth module. We could also just use Basic auth over HTTPS to transmit a shared secret in a manner that allows the processing of notifications with no new software at all.
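For the Basic auth variant, the check a server performs is roughly the following. Most servers already ship this, so the sketch only illustrates what "no new software" relies on; the function name is hypothetical, and how the hub and subscriber agree on the username and password is left open.

```python
import base64
import hmac


def check_basic_auth(authorization, expected_user, expected_password):
    """Return True if an HTTP Basic Authorization header carries the shared secret.

    Basic auth sends credentials merely base64-encoded, so this is only
    reasonable when the notification arrives over HTTPS.
    """
    scheme, _, encoded = (authorization or "").partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except ValueError:  # covers malformed base64 and invalid UTF-8
        return False
    user, sep, password = decoded.partition(":")
    if not sep:
        return False
    # Compare in constant time; evaluate both before combining the results.
    user_ok = hmac.compare_digest(user.encode(), expected_user.encode())
    password_ok = hmac.compare_digest(password.encode(), expected_password.encode())
    return user_ok and password_ok
```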
If we're willing to explore a less proven part of the HTTP stack we could also exploit the newer PATCH method as a means to re-introduce optional delta-based notifications in a more general and less ambiguous way. We'd just need to figure out a means for the subscriber and hub to negotiate which patch document formats they both support. Leaving that problem aside for now, here's a patch notification using a hypothetical Atom patch format I just made up:
PATCH /example.atom HTTP/1.0
Content-Type: application/atom-delta+xml
Content-Length: 346
Host: example.com
If-Match: "abcdabcd12341234abcdabcd12341234"
Authorization: HubSignature cheese 1234abcd1234abcd1234abcd1234abcd1234abcd

<delta:feed xmlns:delta="whatever"
xmlns:at="http://purl.org/atompub/tombstones/1.0"
xmlns="http://www.w3.org/2005/Atom">
<entry>
<id>tag:example.org,2011:entry2</id>
<title>A new entry</title>
<!-- etc, etc -->
</entry>
<at:deleted-entry ref="tag:example.org,2011:entry1" />
</delta:feed>
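Since the patch format above is made up, so is any code that applies it, but a sketch helps show how little a subscriber would need. This Python version assumes the semantics implied by the example: each Atom entry in the delta is added to the feed, or replaces the existing entry with the same id, and each tombstone removes the entry its ref points at. The function name is hypothetical, and new entries are simply appended, since Atom does not give entry order any meaning.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
TOMBSTONES = "http://purl.org/atompub/tombstones/1.0"


def apply_atom_delta(feed_xml, delta_xml):
    """Apply the hypothetical atom-delta document to an Atom feed; return new XML."""
    feed = ET.fromstring(feed_xml)
    delta = ET.fromstring(delta_xml)

    def entry_id(entry):
        return entry.findtext(f"{{{ATOM}}}id")

    existing = {entry_id(e): e for e in feed.findall(f"{{{ATOM}}}entry")}

    for child in delta:
        if child.tag == f"{{{ATOM}}}entry":
            old = existing.get(entry_id(child))
            if old is not None:
                # Same id: replace in place, keeping the entry's position.
                index = list(feed).index(old)
                feed.remove(old)
                feed.insert(index, child)
            else:
                feed.append(child)
            existing[entry_id(child)] = child
        elif child.tag == f"{{{TOMBSTONES}}}deleted-entry":
            old = existing.pop(child.get("ref"), None)
            if old is not None:
                feed.remove(old)
    return ET.tostring(feed, encoding="unicode")
```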
Naturally the DELETE verb can be used to complete this story by providing a means to indicate that a mirrored resource no longer exists, though of course the subscriber would be free to ignore this and keep the resource if desired.
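Taken together, PUT, PATCH and DELETE let a subscriber treat notifications as ordinary resource manipulation. The class below is a minimal in-memory sketch of such a mirror in Python. Its names are hypothetical, and it assumes that the ETag is the quoted hex MD5 of the stored body (which would fit the 32-digit If-Match value in the PATCH example) and that patch formats are plugged in as functions keyed by media type. The status codes are the standard HTTP ones.

```python
import hashlib


class Mirror:
    """In-memory mirror of resources, driven by PUT, PATCH and DELETE notifications."""

    def __init__(self, patch_handlers=None):
        self.resources = {}  # path -> (content_type, body bytes)
        # Hypothetical registry: patch media type -> function(old_body, patch) -> new_body
        self.patch_handlers = patch_handlers or {}

    @staticmethod
    def etag(body):
        # Assumption: the ETag is the quoted hex MD5 digest of the body.
        return '"' + hashlib.md5(body).hexdigest() + '"'

    def handle(self, method, path, headers, body=b""):
        """Apply one notification and return the HTTP status code to answer with."""
        if method == "PUT":
            created = path not in self.resources
            self.resources[path] = (headers.get("Content-Type"), body)
            return 201 if created else 204
        if method == "DELETE":
            # A subscriber could instead choose to ignore this and keep its copy.
            return 204 if self.resources.pop(path, None) is not None else 404
        if method == "PATCH":
            if path not in self.resources:
                return 404
            content_type, old = self.resources[path]
            if_match = headers.get("If-Match")
            if if_match is not None and if_match != self.etag(old):
                return 412  # our copy has drifted from the one the delta assumes
            apply = self.patch_handlers.get(headers.get("Content-Type"))
            if apply is None:
                return 415  # unsupported patch document format
            self.resources[path] = (content_type, apply(old, body))
            return 204
        return 405
```

Answering 412 when the If-Match check fails tells the hub that the subscriber's copy no longer matches, at which point falling back to a full PUT seems like the natural recovery.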
As far as I can tell the only downside of this approach is its incompatibility with the established PubSubHubbub protocol, but I believe adoption of PubSubHubbub for arbitrary content types remains low enough that the community could suffer a breaking change in the interests of better integration with existing HTTP features and tools.
What do you think?
I'm not 100% sure I see the difference between POST and PUT here. Would you be able to explain it?
And I definitely agree with your final statement. Supporting arbitrary content WILL break compatibility, yet we need it... so we have to accept that we're going to leave some implementations on the side of this amazing road.
Posted by: Julien51 | 02/27/2012 at 10:20 AM
Julien,
The difference is really just that PUT has better-defined semantics and therefore generic server software like Apache HTTPD can handle it out of the box for simple cases rather than needing to run some custom application code to handle the notification. Of course, in many cases you will want a custom application to do something deeper with the new data, but I like the idea of applying this to simple cases like, for example, copying an image onto another server and updating it when the source changes.
Since HTTP basically defines POST as "some action but we don't know what it is" you always need specialized software to handle it, and the PubSubHubbub spec needs more prose on how to process the notification rather than just referring to what's been established by HTTP.
But really my proposal was not about using PUT specifically but rather applying the full set of HTTP resource manipulation verbs (DELETE and PATCH too) to this problem rather than wrapping the result in a "notification envelope" as we do for Atom and then having to re-invent mechanisms to communicate the difference between "this is a full resource", "this is a delta" and "this is a notification that the resource no longer exists".
Posted by: Martin Atkins | 02/27/2012 at 10:31 AM
Oh, I see this is mostly a semantic difference. I do actually like the idea of PATCH and DELETE very much. I understand PUT would be nice, but I wonder if it might actually be quite confusing, for a very small benefit (who uses Apache for their PubSubHubbub listener?).
We should debate about this on the group!
Posted by: Julien51 | 02/27/2012 at 12:14 PM
I'm curious to hear why you find PUT confusing, since I personally find it much more intuitive, but maybe that's because I've remolded my brain sufficiently. :)
Using Apache is of course just an example. Working closer to HTTP's core concepts means firstly that more of the required software stack is already out there and secondly that the PubSubHubbub spec can be simpler since it can just refer to HTTP's existing definitions of these concepts.
And as for "who uses Apache for their PubSubHubbub listener?", right now that's probably close to zero but I wonder if that's just because it hasn't been easy until now. It's unlikely that someone would use a feed-like resource in this way, since they'll want some application code to extract the items out of the feed rather than to just store the feed verbatim, but for something more atomic like an image or a Word document it's more likely that I'd just want to copy the data byte-for-byte onto my own server, and most HTTP servers can already do this.
And sure, we can debate this in the group, but the debate there tends to get pretty heated so I think I'll wait until later, when I'm less busy, to start a thread. :)
Posted by: Martin Atkins | 02/27/2012 at 12:32 PM