
RSS Support for HTML Content

Reptile provides several components for generating RSS from arbitrary HTML content.


RSSContentSerializer

Given a URL, we can produce RSS 1.0 output that uses mod_content to carry the page's content.
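For readers unfamiliar with the format, here is a minimal Java sketch of what RSS 1.0 with mod_content output can look like. It is not RSSContentSerializer: the class `Rss10ContentSketch`, its `serialize` method and parameters, and the example.com URLs are all hypothetical, and fetching the page is left out. It only shows the shape: a channel that lists its items in an `rdf:Seq`, and an item whose HTML is carried, entity-escaped, in `content:encoded`.

```java
/**
 * Hypothetical sketch, not Reptile's RSSContentSerializer: wraps HTML that
 * has already been fetched from a URL in a one-item RSS 1.0 feed, carrying
 * the full markup in mod_content's <content:encoded> element.
 */
public class Rss10ContentSketch {

    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String RSS_NS = "http://purl.org/rss/1.0/";
    static final String CONTENT_NS = "http://purl.org/rss/1.0/modules/content/";

    /** Escape the five XML special characters (ampersand first). */
    static String xml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    /**
     * @param feedUrl URL identifying the feed itself (the channel's rdf:about)
     * @param pageUrl URL of the page the content came from
     */
    static String serialize(String feedUrl, String pageUrl, String title, String html) {
        String f = xml(feedUrl), p = xml(pageUrl), t = xml(title);
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\"?>\n")
          .append("<rdf:RDF xmlns:rdf=\"").append(RDF_NS)
          .append("\" xmlns=\"").append(RSS_NS)
          .append("\" xmlns:content=\"").append(CONTENT_NS).append("\">\n");
        // RSS 1.0 channels list their items in an rdf:Seq.
        sb.append("  <channel rdf:about=\"").append(f).append("\">\n")
          .append("    <title>").append(t).append("</title>\n")
          .append("    <link>").append(p).append("</link>\n")
          .append("    <description>").append(t).append("</description>\n")
          .append("    <items><rdf:Seq><rdf:li rdf:resource=\"").append(p)
          .append("\"/></rdf:Seq></items>\n")
          .append("  </channel>\n");
        // The item's rdf:about matches the rdf:li reference above.
        sb.append("  <item rdf:about=\"").append(p).append("\">\n")
          .append("    <title>").append(t).append("</title>\n")
          .append("    <link>").append(p).append("</link>\n")
          // mod_content: the page's HTML, entity-escaped, in content:encoded.
          .append("    <content:encoded>").append(xml(html)).append("</content:encoded>\n")
          .append("  </item>\n")
          .append("</rdf:RDF>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(serialize("http://example.com/story.rss",
                "http://example.com/story.html", "A story", "<p>Hello & welcome</p>"));
    }
}
```

Escaping the HTML (rather than wrapping it in CDATA) keeps the sketch short; either form is a common way to embed markup in `content:encoded`.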


RSSANameChannelSerializer

Given the URL of a website that uses named anchors to identify its content, we can produce an RSS 1.0 feed with mod_content included.

This seems to work well for all Blogger sites, and it also supports feed export for all Advogato users.
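One way to picture the named-anchor approach: each `<a name="...">` starts an entry, the entry runs until the next anchor, and its permalink is the page URL plus `#` and the anchor name. The Java sketch below assumes exactly that rule; it is not RSSANameChannelSerializer's actual algorithm, and the class `NamedAnchorSplitter`, the `Entry` record, and the sample blog URL are hypothetical. It uses a regular expression rather than a real HTML parser to stay self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch, not Reptile's RSSANameChannelSerializer: treats each
 * named anchor as the start of one entry that runs until the next anchor.
 */
public class NamedAnchorSplitter {

    /** One entry: its permalink (page URL + "#" + anchor name) and its HTML. */
    record Entry(String link, String html) {}

    // <a ... name="x" ...> or name='x', case-insensitive.
    private static final Pattern ANCHOR = Pattern.compile(
            "<a\\s[^>]*?\\bname\\s*=\\s*[\"']([^\"']+)[\"'][^>]*>",
            Pattern.CASE_INSENSITIVE);

    /** Drop the anchor's own closing tag and surrounding whitespace. */
    private static String clean(String segment) {
        return segment.replaceFirst("(?i)^\\s*</a>", "").trim();
    }

    static List<Entry> split(String pageUrl, String html) {
        List<Entry> entries = new ArrayList<>();
        Matcher m = ANCHOR.matcher(html);
        String name = null;   // anchor that opened the current entry
        int start = -1;       // offset just past that anchor's start tag
        while (m.find()) {
            if (name != null) {
                // The previous entry ends where this anchor begins.
                entries.add(new Entry(pageUrl + "#" + name, clean(html.substring(start, m.start()))));
            }
            name = m.group(1);
            start = m.end();
        }
        if (name != null) {
            // The last entry runs to the end of the page, including any trailing page chrome.
            entries.add(new Entry(pageUrl + "#" + name, clean(html.substring(start))));
        }
        return entries;
    }

    public static void main(String[] args) {
        String html = "<a name=\"1\"></a><p>First post</p><a name=\"2\"></a><p>Second post</p>";
        for (Entry e : split("http://example.blogspot.com/", html)) {
            System.out.println(e.link() + " -> " + e.html());
        }
    }
}
```

Content before the first anchor (typically the page header) is discarded, while the last entry keeps everything after its anchor, so a real implementation would need some way to find where the final entry ends.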


RSSWebsiteFilterSerializer

Given a base URL, we can produce RSS 1.0 with mod_content output for every link within the document. This makes it possible to produce an RSS 1.0 feed for any website with valid HTML (CNN, MSNBC, etc.).
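The link-gathering step can be sketched in Java as follows, assuming (as the Reuters note in the status list suggests, though the page doesn't spell it out) that the base URL acts as a prefix that candidate links must start with. The class `BaseUrlLinkFilter`, its `articleLinks` method, and the example.com URLs are hypothetical; this is not RSSWebsiteFilterSerializer's code. Relative `href` values are resolved with `java.net.URI.resolve`, fragments are dropped, and a `LinkedHashSet` keeps page order while discarding repeats.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch, not Reptile's RSSWebsiteFilterSerializer: collects the
 * links on a page, resolves them to absolute URLs, and keeps those under a
 * base URL prefix. Each surviving link would then be fetched and serialized
 * as one RSS item.
 */
public class BaseUrlLinkFilter {

    // href="..." or href='...' inside an <a> start tag, case-insensitive.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s[^>]*?\\bhref\\s*=\\s*[\"']([^\"']+)[\"']",
            Pattern.CASE_INSENSITIVE);

    static List<String> articleLinks(String pageUrl, String base, String html) {
        URI page = URI.create(pageUrl);
        Set<String> seen = new LinkedHashSet<>(); // keeps page order, drops duplicates
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String link;
            try {
                link = page.resolve(m.group(1).trim()).toString();
            } catch (IllegalArgumentException e) {
                continue; // not a valid URI; skip it
            }
            int hash = link.indexOf('#');
            if (hash >= 0) {
                link = link.substring(0, hash); // "a.html#top" and "a.html" are the same page
            }
            if (link.startsWith(base)) {
                seen.add(link);
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        String html = "<a href=\"/news/a.html\">A</a> <a href='http://other.com/x'>X</a>"
                + " <a href=\"/news/a.html#top\">A again</a> <a href=\"b.html\">B</a>";
        System.out.println(articleLinks("http://example.com/news/index.html",
                "http://example.com/news/", html));
    }
}
```

Normalizing away fragments before deduplicating is one plausible guard against duplicate items; whether that is the cause of the duplicates reported in the status list is not known.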

Status:

Note: all sites that require authentication will work once I add authentication support to RSSContentSerializer.

Fox News: WORKS
CBS News: Fails; all items are missing, and I don't yet know why.
ABC News: WORKS
MSNBC: Requires authentication.
CNN: WORKS
Time News: Should work, but the site offers no discernible URL pattern that distinguishes article pages from other pages. I may have to write a dedicated filter or a URL-encoded regexp for this site.
USA Today: POSSIBLE. Need to match on ^/[a-z/]+2002 for articles.
LA Times: Requires authentication.
NY Times: Requires authentication.
Reuters: SHOULD WORK with a base of /news_article.jhtml, but some articles are missing. I think this is a trivial bug.
Washington Post: WORKS. BUG: a few duplicate articles; easy to fix.
SF Chronicle: WORKS
UK Times: Requires authentication.
South China Morning Post: Can't support; an authentication token costs money.
People's Daily (China): WORKS, but with too much trailing content.
Pravda (Russia): Very bad HTML and too much trailing content. The core algorithm in RSSContentSerializer will need to be updated to handle this.
Asahi (Japan): (no status listed)
BBC (UK): WORKS. There are some miscellaneous image rendering bugs and HTML escaping problems that need to be fixed in RSSContentSerializer, but for the most part it works.
The Guardian (UK): (no status listed)
The Register (UK/Tech): WORKS
The Onion: WORKS
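The USA Today and Time notes point toward a per-site pattern that decides which links are articles. The Java sketch below assumes the pattern is tested against the URL's path with `Matcher.find()`, so the leading `^` anchors it to the start of the path; the page doesn't say how matching would actually be done. The class `ArticlePathFilter` and the sample URLs are hypothetical; only the regex `^/[a-z/]+2002` comes from the USA Today note.

```java
import java.net.URI;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of a per-site article filter: a regular expression
 * tested against the path of each candidate link.
 */
public class ArticlePathFilter {

    private final Pattern articlePath;

    ArticlePathFilter(String regex) {
        this.articlePath = Pattern.compile(regex);
    }

    /** True if the link's path looks like an article according to the regex. */
    boolean isArticle(String url) {
        String path = URI.create(url).getPath();
        // find() with a ^-anchored pattern checks a prefix of the path, not the whole path.
        return path != null && articlePath.matcher(path).find();
    }

    public static void main(String[] args) {
        // Regex from the USA Today note; the example.com paths are made up.
        ArticlePathFilter usaToday = new ArticlePathFilter("^/[a-z/]+2002");
        System.out.println(usaToday.isArticle("http://example.com/news/world/2002-05-01-story.htm"));
        System.out.println(usaToday.isArticle("http://example.com/weather/front.htm"));
    }
}
```

Keeping the pattern as data rather than code means a new site needs only a new regex, which fits the idea of a dedicated filter or regexp per site.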

The current URLs are a bit complex for real deployment, so I am going to register PURLs for all of them.



Copyright © 2001-2003, OpenPrivacy.org