
Content Syndication With RSS

Alex Moskalyuk writes "Ben Hammersley's Content Syndication with RSS is a step-by-step guide to implementing RSS. This standard is gaining popularity among the Web community, and some of your favorite sites might syndicate their content as RSS feeds. The new O'Reilly publication focuses on many aspects of this standard, and is of primary interest to developers, Web site designers, data architects and anyone interested in distributing their data around the Web." So if you have a steady stream of information for your customers, family, or fans, read on for the rest of Alex's review.
Content Syndication With RSS
author Ben Hammersley
pages 222
publisher O'Reilly
rating 8/10
reviewer Alex Moskalyuk
ISBN 0596003838
summary Introduction and guide for RSS implementations

The first three chapters primarily discuss the multiplicity of RSS standards. With some other technologies this might seem a bit excessive, but remember that RSS is a forked project whose forks at this moment bear little resemblance to one another. Even the abbreviation has different expansions - RSS means Really Simple Syndication if you are using RSS 0.91 or RSS 0.92, which were developed by Dave Winer. RSS means RDF Site Summary if the version you're using is RSS 1.0. The development credits in this case go to the RSS-DEV team. To confuse you even more, the RSS 2.0 name is expanded as... correct, Really Simple Syndication again.

Hence chapter 4 discusses Winer's implementation (simplistic and user-friendly), while chapter 6 focuses on RSS 1.0 (RDF-compliant and data-architect-friendly), and chapter 8 talks about RSS 2.0 (improved RSS 0.9x). Chapter 4 is available online as a PDF file. Section 4.4 is recommended for those interested in promoting their RSS feeds, as it provides a pretty good reference to metadata.
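To make the split between the flavors concrete, here is a minimal sketch (mine, not from the book) that guesses which family a feed belongs to by looking at its root element. It assumes RDF-based feeds use an rdf:RDF root and that the 0.9x/2.0 family uses an rss root with a version attribute; the function name feed_flavor and its labels are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URI of RDF, used by the rdf:RDF root element.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def feed_flavor(xml_text):
    """Return a rough label for the RSS family of a feed document."""
    root = ET.fromstring(xml_text)
    # RDF-based feeds (RSS 1.0 style) have rdf:RDF as the root element.
    if root.tag == "{%s}RDF" % RDF_NS:
        return "RSS 1.0 (RDF)"
    # The 0.9x and 2.0 family shares an <rss> root; the version attribute differs.
    if root.tag == "rss":
        return "RSS " + root.get("version", "unknown")
    return "not RSS"


print(feed_flavor('<rss version="0.92"><channel/></rss>'))
```

A real aggregator would need more care than this, but the check shows why supporting both families usually means two code paths.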

Chapter 9 is perhaps of special interest to Web developers and administrators out there. It presents several code samples to properly parse RSS and present the result in readable HTML. The examples include (a) parsing with XML::Simple in Perl, (b) parsing with Perl regular expressions, (c) parsing with XML::Simple and sending the headlines to cell phones via WWW::SMS, and (d) parsing via XSLT transformation. Python, PHP and ASP folks might feel left out due to the abundance of Perl examples, but if you got this far in the book, you can probably apply the regular expressions example or search for appropriate support for the RSS format in your preferred language.
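For readers outside the Perl camp, the parse-and-render idea might look roughly like this in Python. This is my own sketch using only the standard library, not a translation of the book's code; the sample feed, the example.org URLs and the function names headlines and to_html are made up for illustration. It only handles the un-namespaced item elements of RSS 0.9x/2.0, so an RSS 1.0 feed would need namespace-aware lookups.

```python
import html
import xml.etree.ElementTree as ET

# A tiny made-up feed in the RSS 0.91 style, used only for demonstration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="0.91">
  <channel>
    <title>Example News</title>
    <link>http://example.org/</link>
    <item>
      <title>First &amp; foremost</title>
      <link>http://example.org/1</link>
    </item>
    <item>
      <title>Second story</title>
      <link>http://example.org/2</link>
    </item>
  </channel>
</rss>"""


def headlines(xml_text):
    """Return a list of (title, link) pairs, one per <item> in the feed."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="(untitled)")
        link = item.findtext("link", default="")
        items.append((title, link))
    return items


def to_html(items):
    """Render (title, link) pairs as an HTML unordered list."""
    lines = ["<ul>"]
    for title, link in items:
        # Escape both values so feed content cannot inject markup.
        lines.append('  <li><a href="%s">%s</a></li>'
                     % (html.escape(link, quote=True), html.escape(title)))
    lines.append("</ul>")
    return "\n".join(lines)


print(to_html(headlines(SAMPLE_FEED)))
```

Using a real XML parser rather than regular expressions keeps entity handling (such as the ampersand above) out of your own code.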

Going beyond the standard itself, RSS directories, aggregators and readers are discussed. The author makes a distinction between the last two by classifying Meerkat-like services as aggregators, and desktop or Web applications designed to present the information to the user as readers. The chapter also provides information about Syndic8 and its API, and describes the feed registration process. O'Reilly's Meerkat is also discussed in the chapter, together with a reference table for its API (you can make Meerkat generate HTML or RSS news headlines on a certain topic or using certain keywords by providing the right query to its Web interface).

The book is quite a smooth read for a text describing the details of a data specification. The chapters are informative and the book is not overloaded with useless information just to increase the page count. The tips are quite useful for someone who is new to the field, and they answer some questions not covered by the standards (e.g., how often you should request an RSS feed, what to do if you're being screen-scraped, etc.).
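On the question of how often to request a feed, one common approach (my sketch, not the book's) is to enforce a minimum interval between requests and to send HTTP conditional headers so an unchanged feed costs the server almost nothing. The 30-minute value below is only a placeholder, since each site sets its own limit, and FeedState, due, conditional_headers and poll are hypothetical names.

```python
import time
import urllib.error
import urllib.request

# Placeholder policy: at most one request per 30 minutes (check each site's terms).
MIN_INTERVAL = 30 * 60


class FeedState:
    """What we remember about one feed between polls."""

    def __init__(self):
        self.last_fetch = 0.0      # time of the last request, in epoch seconds
        self.etag = None           # ETag header from the last response, if any
        self.last_modified = None  # Last-Modified header from the last response


def due(state, now=None, min_interval=MIN_INTERVAL):
    """Return True if enough time has passed to request the feed again."""
    now = time.time() if now is None else now
    return now - state.last_fetch >= min_interval


def conditional_headers(state):
    """Build headers that let the server answer 304 Not Modified."""
    headers = {}
    if state.etag:
        headers["If-None-Match"] = state.etag
    if state.last_modified:
        headers["If-Modified-Since"] = state.last_modified
    return headers


def poll(url, state):
    """Return new feed bytes, or None if the poll was skipped or nothing changed."""
    if not due(state):
        return None
    request = urllib.request.Request(url, headers=conditional_headers(state))
    state.last_fetch = time.time()
    try:
        with urllib.request.urlopen(request) as response:
            state.etag = response.headers.get("ETag")
            state.last_modified = response.headers.get("Last-Modified")
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:  # feed unchanged since the last fetch
            return None
        raise
```

Calling poll(url, state) from a scheduler then fetches at most once per interval, and only downloads the body when the server reports a change.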

I like the way the author divided the chapters into RSS 0.9x/2.0 and RSS 1.0 and kept the two worlds apart. Most of the time you probably won't be interested in developing a feed to support both standards, but would like to focus on just one. The examples in Perl are fine with me, although for someone new to Perl or programming in general those examples with abundant regular expressions might look a bit convoluted. Kudos to the author for not padding the topic, as many do, by providing an example script for RSS manipulation in every possible language out there.

What's missing? I wish more pages were dedicated to desktop RSS readers. FeedReader, HotSheet, Syndirella, Beaver and SharpReader are excellent end-user applications currently gaining some popularity among those who'd prefer to browse their favorite headlines at a glance, instead of going to a dozen sites every morning. To be fair, there's a huge list of readers in an appendix, and some applications mentioned above only came around in the last few months, which was probably after the book went to press. Some sites also didn't make it into the book. I like DailyRotation and FreshNews, which borrow from Meerkat's versatility and provide their own feed portals.

Overall, the book is a pretty good developer's guide to the RSS standard. Accompanied by helpful illustrations and numerous tips, it's an excellent resource for those unfamiliar with RSS and a helpful reference for those who have been doing Web syndication for a while.


You can purchase Content Syndication With RSS from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

  • by Anonymous Coward on Monday April 21, 2003 @11:02AM (#5773684)
    As both an aggregator and provider of content, I can safely say an entire book need not be devoted to the subject. Maybe a pamphlet. I guarantee this book is 90% fluff.
  • by Blaine Hilton ( 626259 ) on Monday April 21, 2003 @11:05AM (#5773713) Homepage
    I'm not familiar with the book, but content syndication is a big thing on the Internet right now, and I can understand why. RSS and XML data feeds are popping up everywhere and the average John Doe user would like to be able to parse those feeds for his/her website(s).

    I think the ability to easily transfer information in real time is just going to grow with time; this is not a passing fad.

    Go calculate [webcalc.net] something!

      • Well you can always say life is a fad. On a theological scale I like to think of the ends of the movie MIB and MIB II. Where they open the locker and find that the entire world we know is really some small speck of dust in a much larger world. It's all in the perspective.

        However, as far as technology goes, I believe XML, and RSS as a type of XML, are going to catch on and stay with us, much like HTML.

        Now go calculate [webcalc.net] something.

        • On a theological scale I like to think of the ends of the movie MIB and MIB II. Where they open the locker and find that the entire world we know is really some small speck of dust in a much larger world. It's all in the perspective.

          Woah, dude, that's like so deep. You should be a philosophy major and bring up those points in your 101 class, now *that* would be original.
    • is it just me? (Score:3, Interesting)

      Am I the only one who likes to view website/blog entries in their original context (where relevant) -- i.e. on the webpage where it was published -- even if it means I have a really long "links" list?

      Presentation isn't everything, but it matters.
      • Re:is it just me? (Score:3, Interesting)

        by derch ( 184205 )
        No, of course not. I like doing it too, but I've found that an RSS reader like NetNewsWire is the best way to keep up to date on blogs and news sites. Instead of visiting ten sites once or twice a day to see if new stories have been posted, the news reader just lets me know when new content is there. It also gives me the headlines - very useful when Slashdot carries crap stories for several days straight. A simple click of a menu item in NetNewsWire loads the story in a browser.

        I've gone from manually k
        • amen brother. NetNewsWire has changed the way I browse. The only thing that bugs me is that it's not so easy to find the url for sites that do syndicate. Only reason I have PA's url is cause tycho linked it one day not long ago. You can't find it easily on the site. And that's how I think it is with a lot of sites.
      • With RSS you still view the original website, you just get updates when the site has changed. It's really handy. I use Trillian Pro and it has a built-in RSS interface, so whenever a new article shows up on Slashdot I find out about it without needing to hit their website throughout the day. Then when I see an article that interests me, I just click on the RSS headline and it opens the Slashdot article in a new browser.
    • by Anonymous Coward
      Here's the problem I have with content syndication/RSS.

      It is difficult for a non-professional or non-techie to implement someone else's feed on their site.

      I have content that is updated all day long and there are probably hundreds (at a minimum) of users that would love to add my content to their site via an RSS feed. And it would drive a lot of people to my site, too.

      Unfortunately, HTML is about as limited as most of these people get. A lot of them probably aren't even that far along - most likely using
      • Not at all useless!

        "...implement my feed on their site with no more difficulty than copying and pasting a few lines of pre-generated code"

        There are utils which do precisely this.

        It is done server-side in CGI or perl. The user is given a javascript snippet which pulls your RSS feed onto his or her website. Simple as that.

        Here's one ready to go...

        http://www.infinitepenguins.net/rss/

        Best regards -Resprung

      • All the RDF/RSS feed grabbers/users that I have seen are fairly involved perl (or other language) scripts that require a nice chunk of work on the webmaster's side.
        If a webmaster can handle HTML::Template, Syndic Lite [rant-central.com] offers a clean method of presenting RSS on a web page.
      • Try magpierss [sourceforge.net].
        Easy as heck to use: Just drop it in the source directory, add 3 lines of php code to the html and you're good to go.
      • <SHAMELESS PLUG>
        Take a look at http://www.cim.mcgill.ca/~simra/headline.html and view the page source. A few HTML comment lines fetch the rss source and then insert the links, titles, descriptions, etc any way you like. The down-side is that it's not automatic-- the html is static and must be generated by the headline script, which requires perl. The upshot is you can crontab that, and you don't need CGI capabilities on the web server.

        We've also got screen-scraping capabilities for a few sites t
      • There are php scripts that do just this. You just copy one class file (about 30 lines of code) to your web directory and paste about 5 lines of php into your html wherever you want the rss feed to go.

        No offence, but you cannot have looked very hard.
  • /. Feed (Score:4, Interesting)

    by dmdx0a0d ( 549007 ) on Monday April 21, 2003 @11:19AM (#5773803)
    When is /. going to use the RSS standard instead of its current PITA XML format?
  • by Our Man In Redmond ( 63094 ) on Monday April 21, 2003 @11:36AM (#5773924)
    Here's Slashdot's current RSS page:

    Slashdot
    http://slashdot.org/
    News for nerds, stuff that matters
    en-us
    Copyright 1997-2001, OSDN
    2003-04-21T16:33:48+00:00
    OSDN
    pater@slashdot.org
    Technology
    hourly
    1
    1970-01-01T00:00+00:00

    Your Headline Reader Has Been Banned
    http://slashdot.org/faq/accounts.shtml#ac1050
    Your RSS reader is abusing the Slashdot server. You are requesting pages more often than our terms of service allow. Please see the FAQ link for more information, and if you email us, include your IPID MD5: 2be13864b6e87d2ec6b4701261c83663.

    You May Only Load Headlines Every 30 Minutes
    http://slashdot.org/faq/accounts.shtml#ac1050
    Your RSS reader is abusing the Slashdot server. You are requesting pages more often than our terms of service allow. Please see the FAQ link for more information, and if you email us, include your IPID MD5: 2be13864b6e87d2ec6b4701261c83663.

    In 72 Hours, Your Ban Will Be Lifted
    http://slashdot.org/faq/accounts.shtml#ac1050
    Your RSS reader is abusing the Slashdot server. You are requesting pages more often than our terms of service allow. Please see the FAQ link for more information, and if you email us, include your IPID MD5: 2be13864b6e87d2ec6b4701261c83663.

    Do Not Bother Contacting Us For 72 Hours
    http://slashdot.org/faq/accounts.shtml#ac1050
    Your RSS reader is abusing the Slashdot server. You are requesting pages more often than our terms of service allow. Please see the FAQ link for more information, and if you email us, include your IPID MD5: 2be13864b6e87d2ec6b4701261c83663.


    So apparently we've not only succeeded in slashdotting Slashdot, we've gotten Slashdot to give us multiple duplicate posts! WE WIN!
    • Slashdot has been blocking my rss aggregator for about 2 weeks -- despite the fact that my aggregator is set to every 4 hours. The sad thing is I didn't really care because the RSS summaries were pretty crap. Not putting in the full article summary (which in most cases is the article) is bad; stopping in the middle of a sentence to do it is really bad...
      • I thought it was my aggregator too, but it turns out that it was Evolution, which was fetching every 10 minutes, even though I never read the summary.

        I think 30 minutes is a bit harsh, given the fact that many /. readers refresh index.html, which is a larger file, several times an hour.
    • Re:Meta-Slashdot! (Score:4, Insightful)

      by jpkunst ( 612360 ) on Monday April 21, 2003 @12:03PM (#5774127)

      Yes, RSS reader banning on /. is a bit extreme. Just trying to find the correct URLs to use got me banned for 72 hours.

      JP

  • by yerricde ( 125198 ) on Monday April 21, 2003 @11:37AM (#5773929) Homepage Journal

    How are sites that offer a Semantic Web interface such as RSS supposed to bring in revenue? They can't rely on advertising because the machines that browse the Semantic Web cannot be trusted to deliver advertising to a human eyeball.

    • 1) Text Ads
      2) If you're just linking to other people's stuff, the RSS link will go to them -- do I really owe you advertising bucks if all you do is link?
      3) If you're linking to your stuff then the RSS link will go to your page which will presumably have ads on it if you care.
      4) A lot of sites (the majority?) that offer RSS feeds are not designed to make money and those that do have better ways of doing it than ads (example: reading Jon Udell's blog made me buy one of his books).
      • Text Ads

        Are you saying put these in a separate section of the feed, where a machine can easily filter them out? Or would you put them in the main part of the feed itself, indistinguishable from a normal link, a practice which got a few search engines accused of corruption?

        do I really owe you advertising bucks if all you do is link?

        Try telling that to any major directory such as Yahoo!.

        A lot of sites (the majority?) that offer RSS feeds are not designed to make money

        In other words, the dot-com

        • Are you saying put these in a separate section of the feed, where a machine can easily filter them out? Or would you put them in the main part of the feed itself, indistinguishable from a normal link, a practice which got a few search engines accused of corruption?

          Yes, in a section that can be easily filtered. Anyone who cares can already filter out ads, so why should this be different? People who care aren't going to buy your products on general principle, so it's a wash.

          Try telling that to any major dire
      • 2) If you're just linking to other people's stuff, the RSS link will go to them -- do I really owe you advertising bucks if all you do is link?

        Well, in the case of those who have to write "scrapers" to deliver content for other websites (that get the traffic via the links & thus get to "expose" any advertising they may run), it would be nice to have some way to generate revenue & recoup development time.

        OTOH, my RSS's are popular with hacker types, geeks & computer security folks who might get,

    • 1) Write an advertisement and disguise it as a "review."
      2) Publish advertisement on web site as a real news story.
      3) ??????
      4) Profit!
    • Easy (Score:4, Interesting)

      by jimmyCarter ( 56088 ) on Monday April 21, 2003 @01:31PM (#5774787) Journal
      I've noticed that with most blogs, the content is actually placed into the RSS (HTML tags and all in some cases). Some of the bigger sites that offer feeds (/., News.com, etc.) provide a headline and then maybe a 40 character summary.

      In /.'s case, I'm consuming the headlines via an aggregator, but all I'm seeing is a link and the article headline. I still go to the site to read the full content and the comments, so I'm still seeing the banner ads and such.

      The key is putting in limited information, so you can draw the user to the site if you're trying to generate revenue from your content. Then, you better hope the internal link referenced in the feed has some advertising.
    • Good question. How about publishing the summaries of your content, but then requiring a direct connection (including the advertising) to read the content itself?

    • A great point. A few months ago, I wrote a scraper that would grab the latest posts to the computer security mailing lists archived at insecure.org [insecure.org] & convert them into valid RSS feeds. These rapidly became the most retrieved files on my website. Unfortunately, I can't even count these in a traffic analysis to a potential advertiser. Oh well ... nobody buys from my ads anyway ;-)

      And oh yeah, if you want to use those RSS's, they're at djeaux.com [djeaux.com]. Free & free of advertising!


    • ThinkGeek.com has an RSS feed
      http://www.thinkgeek.com/thinkgeek.rdf
      Where you can view all the latest stuff.

      The advertising industry needs to get more up-to-date,
      this isn't the 1950s anymore, and the general
      advertising ballgame hasn't changed.
  • List of RSS feeds? (Score:2, Interesting)

    by mortonda ( 5175 )
    It seems like this would be a good way for major news outlets to draw traffic to their sites - if I could put a brief RSS generated bit of info on one of my web pages, people might click the link and go to the other web sites. So why can't I find any RSS feeds for major news sites like CNN and such?

    Making an RSS feed is easy - I want to have RSS feeds of other more interesting sites available to put in my own web pages. And that would benefit everyone, no?
    • CNN: http://rss.syntechsoftware.com/cnn.xml

      Why they don't provide one themselves is beyond me. Perhaps it's because they move as fast as a large dinosaur.
  • by SpaceKow ( 24359 ) on Monday April 21, 2003 @12:43PM (#5774448) Homepage Journal
    Password protected feeds add real value to RSS for obvious reasons. You won't always want everyone to read your feeds.

    Diarist.com offers an HTTP password protected RSS feed here: http://rsstest.diarist.com/

    As I write this... There are only two RSS clients which can read its passworded feeds.

    1. NewsGator
    2. A beta version of FeedReader
  • I wrote a module for my site that caches headlines from feeds you enter in.

    Check it out. [bengarvey.com]

    (You need to register to edit the feeds you want to subscribe to)

  • by jmagar.com ( 67146 ) on Monday April 21, 2003 @01:10PM (#5774660) Homepage
    If you run a PHP/MySQL website and want a free and powerful RSS content syndication engine that easily integrates with any architecture, check out MyHeadlines [jmagar.com]. It has already been ported to PHPNuke, PostNuke, Xoops, MyPHPNuke and PHPWebsite, and a standalone version is also available. The easy CMS abstraction layer lets you integrate with just about any PHP-based web site. It comes with a categorized database of over 3000 feeds, and features a scraper subsystem for constructing new RSS feeds for sites that don't produce their own.

    Cheers,
    Mike

  • Just to let you know: Full featured RSS support is scheduled for KDE 3.2. See http://dot.kde.org/1049415292/ for more information.
    This will include an RSS dcop service providing a powerful XML-RPC interface to www.syndic8.com, a new RSS konqueror sidebar and a rewritten knewsticker.
    Currently everything is still under development but already quite useful (if you know how to deal with dcop...). Let's hope we will have everything finished before KDE 3.2.
  • Not only can you read any user's entries at:
    www.livejournal.com/users/andrewducker/rss

    but they syndicate over 1000 feeds in return. For instance you can add Slashdot to your friends list by adding user "Slashdot" or going to:
    http://www.livejournal.com/users/slashdot/

    I now read nearly all my news through syndication - you can see my total news feed at http://andrewducker.livejournal.com/friends/news

    Syndication has made my news gathering a whole lot easier.
  • Slashdot's Palm page in an iframe is a nice solution for your personal starting page - if anyone still has such a thing.
  • Can anyone recommend a good RSS reader? I'd prefer web-based so I can just host it on my box w/apache and read my news from anywhere. Any tips? Thanks! cuban
    • I use this: http://magpierss.sourceforge.net/ (PHP)

      Basically I have a list of URLs for stuff I like in my DB and use Magpie RSS to go get, cache and parse it all for output in xhtml.

    • Re:RSS Reader? (Score:2, Informative)

      by quake74 ( 466627 )
      I needed one which didn't use a database, but only flatfiles. It took me a while to find them but here is what I've found:
      CafeRSS [tidakada.com] The one I'm using right now. Really easy.
      OnyxRSS [readinged.com] More powerful, uses the XML parsing features of PHP
      Rippy [sooke.bc.ca] Another one, I just don't know.
      Have fun!
      quake74
  • check out www.newsblob.com [newsblob.com] It's another general purpose daily news RSS feed site.
  • Jamie Zawinski came up with Cheesegrater [jwz.org] which allows you to get an RSS feed from sites that don't have an RSS feed.

    Kind of useful, written entirely in Perl, and I've tried it on a Linux box with no problems. Not sure if it'll work with other OSes, but it's worth a shot.

    Go grab the two perl scripts and the cron job if need be.
