It's becoming increasingly common for content-publishing applications to include a feature where they'll duplicate (in some sense) the content a user creates on other services such as Twitter or Facebook.
Unfortunately, multi-feed aggregators cannot easily detect this duplication, so they often end up showing the same content more than once.
However, I think we can go some way towards a technical solution to this problem without trying to boil the ocean and stop people cross-posting: have the publisher that's creating the duplicate content declare that it has done so in its feeds.
What does this look like? It feels like this just takes one very simple extension element with the same attributes as the in-reply-to element introduced by Atom Threading Extensions: a ref attribute giving the id of the duplicate entry, plus a type and href pair linking to a representation of the duplicate entry. For example:
<crosspost:dupe
ref="http://twitter.com/apparentlymart/statuses/3641424947"
href="http://twitter.com/apparentlymart/statuses/3641424947"
type="text/html"
/>
This alone isn't enough to do the de-duping, since we can't trust publishers not to lie about what's a duplicate. However, in an application such as FriendFeed or MT Action Streams, where a user has configured a list of feeds to import, it is easier to assume that all of the referenced feeds are trustworthy in the context of that user: if I've added both my notes blog and Twitter to MT Action Streams, and the notes blog declares a Twitter entry from my account as a duplicate, it's fair to assume that it is indeed a duplicate.
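To make the aggregator side a little more concrete, here is a rough sketch of how that de-duping might work over the set of feeds one user has configured. It is only an illustration: the namespace URI bound to the crosspost prefix is a placeholder I made up (no namespace has been settled on), and the function names parse_entries and dedupe are hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# Placeholder namespace URI: an assumption for this sketch, not a defined value.
CROSSPOST_NS = "http://example.com/ns/crosspost"


def parse_entries(feed_xml):
    """Return (entry_id, declared_dupe_refs) pairs for one Atom feed document."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.iter(f"{{{ATOM_NS}}}entry"):
        entry_id = entry.findtext(f"{{{ATOM_NS}}}id")
        # Collect the ref attribute of every crosspost:dupe child of this entry.
        refs = {
            dupe.get("ref")
            for dupe in entry.findall(f"{{{CROSSPOST_NS}}}dupe")
            if dupe.get("ref")
        }
        entries.append((entry_id, refs))
    return entries


def dedupe(trusted_feeds):
    """Return entry ids in order, minus those declared as dupes.

    trusted_feeds is the list of Atom documents (as strings) that a single
    user has configured, so declarations are only honoured within that set.
    """
    parsed = [pair for feed in trusted_feeds for pair in parse_entries(feed)]
    declared = set()
    for _, refs in parsed:
        declared |= refs
    return [entry_id for entry_id, _ in parsed if entry_id not in declared]
```

Because the declarations are only collected from feeds the same user has added, a third-party feed cannot suppress someone else's entries just by claiming they are duplicates.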
This is not a complete solution, since it is possible that I've cross-posted to both Twitter and Facebook and you consume those two feeds but not the "origin" feed; however, I think this is a step in the right direction and solves the immediate problem at hand. It would be nice if the services that tend to receive these duplicates would extend their APIs such that publishers can declare that they're posting a dupe and so the receiving service can create a reverse-dupe element, but that's not something we can bootstrap so easily today.
I'm interested to see if any providers who offer the functionality to duplicate their content on Twitter and/or Facebook would be willing to work on this. It ought to be a reasonably easy, tightly-scoped specification and should not be a burden for implementers as long as they know how to form an Atom id (or RSS equivalent) for the services they publish to based on a service-local id returned from the API.
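On the publisher side, the work might be as small as the following sketch. It assumes that the id of the duplicate entry is simply the status permalink, as in the example earlier in this post; the real feed for a given service may use a different id scheme, in which case only twitter_status_url would need to change. The namespace URI and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI: an assumption for this sketch, not a defined value.
CROSSPOST_NS = "http://example.com/ns/crosspost"


def twitter_status_url(screen_name, status_id):
    """Build a status permalink from the service-local id returned by the API.

    The URL pattern is copied from the example in this post and is assumed
    to double as the entry id.
    """
    return f"http://twitter.com/{screen_name}/statuses/{status_id}"


def dupe_element(screen_name, status_id):
    """Create a crosspost:dupe element pointing at the cross-posted status."""
    url = twitter_status_url(screen_name, status_id)
    # Ask ElementTree to serialise the namespace with the crosspost prefix.
    ET.register_namespace("crosspost", CROSSPOST_NS)
    return ET.Element(
        f"{{{CROSSPOST_NS}}}dupe",
        {"ref": url, "href": url, "type": "text/html"},
    )
```

The publisher would append the returned element to the Atom entry for the original post after the cross-post succeeds and the API has handed back the new status id.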