Given a base URL, we can produce RSS 1.0 output with mod_content
for all the links within the document. This makes it possible to
produce an RSS 1.0 feed for any website with valid HTML (CNN,
MSNBC, etc.).
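A minimal sketch of the output side, assuming the article pages have already been fetched. It writes an RSS 1.0 (RDF) document whose items carry the article HTML, entity-escaped, in mod_content's content:encoded element. Rss10Sketch, Article and toRss are hypothetical names, not the RSSContentSerializer API. The namespace URIs are the standard RDF, RSS 1.0 and mod_content ones. The record requires Java 16 or later.

```java
import java.util.List;

// Hypothetical sketch, not RSSContentSerializer: serializes already-fetched
// articles as RSS 1.0 with the full body in mod_content's <content:encoded>.
final class Rss10Sketch {

    // Hypothetical item type: one fetched article.
    record Article(String link, String title, String html) {}

    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String RSS_NS = "http://purl.org/rss/1.0/";
    static final String CONTENT_NS = "http://purl.org/rss/1.0/modules/content/";

    // Escapes the five XML special characters so text and HTML bodies are
    // safe inside element content and attribute values.
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&' -> out.append("&amp;");
                case '<' -> out.append("&lt;");
                case '>' -> out.append("&gt;");
                case '"' -> out.append("&quot;");
                case '\'' -> out.append("&apos;");
                default -> out.append(c);
            }
        }
        return out.toString();
    }

    static String toRss(String baseUrl, String title, List<Article> articles) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<rdf:RDF xmlns:rdf=\"").append(RDF_NS).append("\"\n")
           .append("         xmlns=\"").append(RSS_NS).append("\"\n")
           .append("         xmlns:content=\"").append(CONTENT_NS).append("\">\n");
        // RSS 1.0 channels need title, link, description and an rdf:Seq of item URIs.
        xml.append("  <channel rdf:about=\"").append(escape(baseUrl)).append("\">\n")
           .append("    <title>").append(escape(title)).append("</title>\n")
           .append("    <link>").append(escape(baseUrl)).append("</link>\n")
           .append("    <description>").append(escape(title)).append("</description>\n")
           .append("    <items>\n      <rdf:Seq>\n");
        for (Article a : articles) {
            xml.append("        <rdf:li rdf:resource=\"").append(escape(a.link())).append("\"/>\n");
        }
        xml.append("      </rdf:Seq>\n    </items>\n  </channel>\n");
        // Each item repeats its URI in rdf:about and carries the escaped HTML body.
        for (Article a : articles) {
            xml.append("  <item rdf:about=\"").append(escape(a.link())).append("\">\n")
               .append("    <title>").append(escape(a.title())).append("</title>\n")
               .append("    <link>").append(escape(a.link())).append("</link>\n")
               .append("    <content:encoded>").append(escape(a.html()))
               .append("</content:encoded>\n")
               .append("  </item>\n");
        }
        xml.append("</rdf:RDF>\n");
        return xml.toString();
    }
}
```

Escaping every body through one routine is the simplest way to avoid the kind of HTML escaping problems noted below for the BBC. Wrapping the body in a CDATA section is the other common choice.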
Status:
Note that all sites that require authentication will work
once I add authentication support to
RSSContentSerializer.
Fox News: WORKS
CBS News: FAILS. All items are missing; cause unknown.
ABC News: WORKS
MSNBC: Requires authentication.
CNN: WORKS
Time News: Should work, but the site offers no discernible pattern
    distinguishing article URLs from non-article URLs. I may have to
    write a dedicated filter or URL regexp for this site.
USA Today: POSSIBLE. Need to match on ^/[a-z/]+2002 for articles.
LA Times: Requires authentication.
NY Times: Requires authentication.
Reuters: SHOULD WORK with a base of /news_article.jhtml, but has some
    problems: some articles are missing. I think this is a trivial bug.
Washington Post: WORKS
    BUG: some minor article duplicates. Easy to fix.
SF Chronicle: WORKS
UK Times: Requires authentication.
South China Morning Post: Can't support; an authentication token costs money.
People's Daily (China): WORKS, but with too much trailing content.
Pravda (Russia): Very bad HTML and too much trailing content. The core
    algorithm in RSSContentSerializer will need to be updated for this.
Asahi (Japan): (no status)
BBC (UK): WORKS. There are some miscellaneous image rendering bugs and
    HTML escaping problems that need to be fixed in RSSContentSerializer,
    but for the most part it works.
The Guardian (UK): (no status)
The Register (UK/Tech): WORKS
The Onion: WORKS
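Most of the per-site notes above come down to deciding which links on a page count as articles. The following is a minimal sketch, assuming article links can be recognized by a regexp on the URL path. That regexp could be a literal prefix such as Reuters' /news_article.jhtml or the ^/[a-z/]+2002 pattern suggested for USA Today. ArticleLinkFilter and articleLinks are hypothetical names, not part of RSSContentSerializer. The sketch also strips #fragments and drops repeated links, which is one possible (unconfirmed) cause of the Washington Post duplicates. The href scan is a naive regexp rather than a real HTML parser.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: pick article links out of a page by path regexp.
final class ArticleLinkFilter {

    // Naive href extractor: captures the value up to a quote or '#',
    // so fragments are dropped. Real-world HTML needs a proper parser.
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*[\"']([^\"'#]+)", Pattern.CASE_INSENSITIVE);

    static List<String> articleLinks(String pageUrl, String html, Pattern articlePath) {
        URI page = URI.create(pageUrl);
        Set<String> seen = new LinkedHashSet<>(); // keeps first-seen order, drops duplicates
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            URI link;
            try {
                // Resolve relative hrefs against the page URL, then normalize "." and "..".
                link = page.resolve(m.group(1).trim()).normalize();
            } catch (IllegalArgumentException e) {
                continue; // skip hrefs that are not valid URIs
            }
            // Stay on the same host; opaque links such as mailto: have no host.
            if (!page.getHost().equalsIgnoreCase(String.valueOf(link.getHost()))) {
                continue;
            }
            String path = link.getRawPath();
            if (path != null && articlePath.matcher(path).find()) {
                seen.add(link.toString());
            }
        }
        return new ArrayList<>(seen);
    }
}
```

For USA Today this would be called with Pattern.compile("^/[a-z/]+2002"). For Reuters, a quoted prefix such as Pattern.compile("^" + Pattern.quote("/news_article.jhtml")) would work.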
These URLs are a bit complex for real deployment. I am going to
register PURLs (persistent URLs) for all of them.