yyyc514-syndication 0.6.1.1

Files changed (24)

data/CHANGES +10 -0
data/DEVELOPER +5 -0
data/IMPLEMENTATION +55 -0
data/README +228 -0
data/examples/apple.rb +24 -0
data/examples/google.rb +23 -0
data/examples/yahoo.rb +21 -0
data/lib/syndication/atom.rb +531 -0
data/lib/syndication/common.rb +289 -0
data/lib/syndication/content.rb +44 -0
data/lib/syndication/dublincore.rb +98 -0
data/lib/syndication/feedburner.rb +18 -0
data/lib/syndication/google.rb +58 -0
data/lib/syndication/podcast.rb +90 -0
data/lib/syndication/rss.rb +332 -0
data/lib/syndication/syndication.rb +49 -0
data/lib/syndication/tagsoup.rb +51 -0
data/rakefile +60 -0
data/test/atomtest.rb +190 -0
data/test/feedburntest.rb +79 -0
data/test/google.rb +91 -0
data/test/rsstest.rb +422 -0
data/test/tagsouptest.rb +86 -0
metadata +83 -0

data/CHANGES ADDED

@@ -0,0 +1,10 @@
+# == Changes in 0.5.1
+#
+# - Fixes for handling of CDATA-encoded text.
+#
+# == Changes in 0.5
+#
+# - Fixed problem with syndication/dublincore reported by Ura Takefumi.
+#
+# - Added new TagSoup completely-non-validating parser, tests for same,
+#   and option to use it for parsing feeds.

data/DEVELOPER ADDED

@@ -0,0 +1,5 @@
+# = Developer info for syndication project
+#
+# You only need to know this if actually hacking on the code via RubyForge.
+#
+# Release tags are of the format v_0_5 (for 0.5).

data/IMPLEMENTATION ADDED

@@ -0,0 +1,55 @@
+# = Implementation notes
+# == Syndication 0.5
+#
+# For this release, I added a parser called TagSoup. The name is taken from
+# the jargon term used for HTML written without any regard to the rules of
+# HTML structure, i.e. HTML containing many common authoring mistakes.
+#
+# TagSoup is a very small and very dumb parser which implements the stream
+# API of REXML. The test code compares it against REXML for some simple
+# example XML and makes sure it calls the same callbacks in the same order
+# with the same parameters.
+#
+# Note that hacking together your own XML parser is, generally speaking, the
+# wrong thing to do. Using TagSoup as a general replacement for REXML is very
+# definitely the wrong thing to do. Please don't do it.
+#
+# A real XML parser does all kinds of things that TagSoup doesn't, like pay
+# attention to DTDs, handle quoted special characters in element attributes,
+# handle whitespace in a documented standard way, and so on. The fact that
+# TagSoup is defective in many areas is intentional. It's designed to be
+# used as a last resort, for parsing web syndication feeds which are invalid.
+#
+# == Syndication 0.4
+#
+# As discussed in the README, this is really my fourth attempt at writing
+# RSS parsing code. For the record, I thought I'd list the approaches I
+# tried and abandoned. In a way, that's more interesting than the one I
+# picked...
+#
+# First I used hashes for storage and just looked for matching tags.
+# That approach works, kinda, but it doesn't really understand nested
+# elements at all. As a result, it becomes really hard to deal with Atom
+# feeds, where an <email> element could belong to one of a number of kinds
+# of person. Plus, I wanted a real object-based approach which would be
+# amenable to RDoc documentation.
+#
+# Next I wrote a classic stack-based parser, with a container stack and a
+# text buffer stack. That worked well for RSS; I got it parsing every RSS
+# variant, and even went as far as a test suite.  However, as I tried
+# extending it to deal with Atom, I realized that the parser code was
+# becoming hard to follow, as the state machine gained more and more
+# special cases.
+#
+# For a third iteration, I tried to generalize the knowledge represented by the
+# state machine, by placing it in the context stack. That is, I would have a
+# smart stack that knew which XML elements could go inside other elements.
+# Actually, there would have been four context stacks, for containers,
+# attributes, tags and textual data.
+#
+# That design never made it past the paper stage, because I realized that I
+# could move all the knowledge into the classes used to create the objects of
+# the final parse tree.  With the new model--the one used in this code--the
+# parser really doesn't know anything about Atom or RSS. It just forwards
+# events to a tree of objects, which construct child objects as appropriate to
+# grow the tree and represent the feed.
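
The final design described in the notes above (a format-agnostic parser that forwards events to objects which build their own children) can be sketched roughly as follows. This is only an illustration under stated assumptions: `Container`, `Person`, `Feed` and `Driver` are hypothetical names and are not taken from the gem's source, and the events are fed in by hand rather than coming from REXML or TagSoup.

```ruby
# Illustrative sketch only: Container, Person, Feed and Driver are
# hypothetical names, not the classes defined in the syndication gem.

# Base class for nodes of the parse tree. Each node decides for itself how
# to react to a start tag or an end tag.
class Container
  # Tag name => class, for elements that get their own child object.
  def self.children
    {}
  end

  def initialize(parent = nil, tag = nil)
    @parent = parent
    @tag = tag
  end

  # Returns the object that should receive subsequent events.
  def tag_start(name)
    klass = self.class.children[name]
    return self unless klass
    child = klass.new(self, name)
    send("#{name}=", child)
    child
  end

  # Stores text for simple elements; hands control back to the parent when
  # this container's own element closes.
  def tag_end(name, text)
    return @parent if name == @tag
    setter = "#{name}="
    send(setter, text) if respond_to?(setter)
    self
  end
end

class Person < Container
  attr_accessor :name, :email
end

class Feed < Container
  attr_accessor :title, :author

  def self.children
    { 'author' => Person }
  end
end

# The driver knows nothing about feed formats; it only tracks the current
# node and the text seen since the last tag.
class Driver
  def initialize(root)
    @current = root
    @text = String.new
  end

  def tag_start(name)
    @text = String.new
    @current = @current.tag_start(name)
  end

  def text(str)
    @text << str
  end

  def tag_end(name)
    @current = @current.tag_end(name, @text.strip)
    @text = String.new
  end
end

feed = Feed.new
driver = Driver.new(feed)
[[:tag_start, 'title'], [:text, 'Example feed'], [:tag_end, 'title'],
 [:tag_start, 'author'],
 [:tag_start, 'name'], [:text, 'A. Writer'], [:tag_end, 'name'],
 [:tag_start, 'email'], [:text, 'writer@example.org'], [:tag_end, 'email'],
 [:tag_end, 'author']].each { |event, arg| driver.send(event, arg) }

puts feed.title
puts feed.author.email
```

Because the `<email>` text is delivered to whichever object is current, it lands on the `Person` that owns it, which is the nesting problem the hash-based first attempt could not handle.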

data/README ADDED

@@ -0,0 +1,228 @@
+# = Syndication 0.6
+#
+# This module provides classes for parsing web syndication feeds in RSS and
+# Atom formats.
+#
+# To parse RSS, use Syndication::RSS::Parser.
+#
+# To parse Atom, use Syndication::Atom::Parser.
+#
+# If you want my advice on which to generate, my order of preference would
+# be:
+#
+# 1. Atom 1.0
+# 2. RSS 1.0
+# 3. RSS 2.0
+#
+# My reasoning is simply that I hate having to sniff for HTML (see
+# Syndication::RSS).
+#
+# == License
+#
+# Syndication is Copyright 2005-2006 mathew <meta@pobox.com>, and is licensed
+# under the same terms as Ruby.
+#
+# == Requirements
+#
+# Built and tested using Ruby 1.8.4. Needs only the standard library.
+#
+# == Rationale
+#
+# Ruby already has an RSS library as part of the standard library, so you
+# might be wondering why I decided to write another one.
+#
+# I started out trying to document the standard rss module, but found the
+# code rather impenetrable. It was also difficult to see how it could be made
+# documentable via RDoc.
+#
+# Then I tried writing code to use the standard RSS library, and discovered
+# that it had a number of (what I consider to be) defects:
+#
+# - It doesn't support RSS 2.0 with extensions (such as iTunes podcast feeds),
+#   and it wasn't clear to me how to extend it to do so.
+#
+# - It doesn't support RSS 0.9.
+#
+# - It doesn't support Atom.
+#
+# - The API is different depending on what kind of RSS feed you are parsing.
+#
+# I asked around, and discovered that I wasn't the only person dissatisfied
+# with the RSS library. Since fixing the problems would have resulted in
+# breaking existing code that used the RSS module, I opted for an all-new
+# implementation.
+#
+# This is the result. The first release was version 0.4, which was actually my
+# fourth attempt at putting together a clean, simple, universal API for RSS
+# and Atom parsing. (The first three never saw public release.)
+#
+# == Features
+#
+# Here are what I see as the key improvements over the rss module in the
+# Ruby standard library:
+#
+# - Supports all RSS versions, including RSS 0.9, as well as Atom.
+#
+# - Provides a unified API/object model for accessing the decoded data,
+#   with no need to know what format the feed is in.
+#
+# - Allows use of extended RSS 2.0 feeds.
+#
+# - Simple API, fully documented.
+#
+# - Test suite with over 220 test assertions.
+#
+# - Commented source code.
+#
+# - Less source code than the standard library rss module.
+#
+# - Faster than the standard library (at least, in my tests).
+#
+# Other features:
+#
+# - Optional support for RSS 1.0 Dublin Core, Syndication and Content modules,
+#   Apple iTunes Podcast elements, and Google Calendar.
+#
+# - Content module decodes CDATA-escaped or encoded HTML content for you.
+#
+# - Supports namespaces, and encoded XHTML/HTML in Atom feeds.
+#
+# - Dates decoded to Ruby DateTime objects. Note, however, that this is slow,
+#   so parsing is only performed if you ask for the value.
+#
+# - Simple to extend to support your own RSS extensions; uses reflection.
+#
+# - Uses REXML fast stream parsing API for speed, or built-in TagSoup parser
+#   for invalid feeds.
+#
+# - Non-validating, tries to be as forgiving as possible of structural errors.
+#
+# - Remaps namespace prefixes to standard values if it recognizes the module's
+#   URL.
+#
+# In the interests of balance, here are some key disadvantages compared with
+# the standard library RSS support:
+#
+# - No support for _generating_ RSS feeds, only for parsing them. If
+#   you're using Rails, you can use RXML; if not, you can use rss/maker.
+#   My feeling is that XML generation isn't a wheel that needs reinventing.
+#
+# - Different API, not a drop-in replacement.
+#
+# - Incomplete support for Atom 0.3 draft. (Anyone still using it?)
+#
+# - No support for base64 data in Atom feeds (yet).
+#
+# - No Japanese documentation.
+#
+# - No XSL output options.
+#
+# - Slower if there are dates in the feed and you ask for their values.
+#
+# == Other options
+#
+# There are, of course, other Ruby RSS/Atom libraries out there. The ones I
+# know about:
+#
+# === simple-rss
+#
+# http://rubyforge.org/projects/simple-rss
+#
+# Pros:
+# - Much smaller than syndication or rss.
+#
+# - Completely non-validating.
+#
+# - Backwards compatible with rss in standard library.
+#
+# Cons:
+# - Doesn't use a real XML parser.
+#
+# - No support for namespaces.
+#
+# - Incomplete Atom support (e.g. can't get name and e-mail of <atom:person>
+#   elements as separate fields; you still have to decode XHTML data yourself).
+#
+# - No documentation.
+#
+# For the record, I started work on my library long before simple-rss was
+# announced.
+#
+# === feedtools
+#
+# http://rubyforge.org/projects/feedtools/
+#
+# This one solves most of the same problems as Syndication; however the two
+# were developed in parallel, in ignorance of each other.
+#
+# Feedtools builds in database caching and persistence, and HTTP fetching.
+# Personally, I don't think those belong in a feed parsing library--they
+# are easily implemented using other standard libraries if you want them.
+#
+# Pros:
+# - Lots of test cases.
+#
+# - Used by lots of Rails people.
+#
+# - Knows about many more namespaces.
+#
+# - Can generate feeds.
+#
+# Cons:
+# - Skimpy documentation.
+#
+# - Uses HTree then XPath parsing, rather than a single stream parse.
+#
+# - Tries to unify RSS and Atom APIs, at the expense of Atom functionality.
+#   (Which could also be a pro, depending on your viewpoint.)
+#
+# == Design philosophy
+#
+# Here's my design philosophy for this module:
+#
+# - The interface should be via standard Ruby objects and methods; e.g.
+#   feed.channel.item[0].title, rather than (say) a dictionary hash.
+#
+# - It should be easier to parse RSS via the module than to hack something
+#   together using REXML, even if all you want is a list of titles and URLs.
+#
+# - It should be easy to add support for new RSS extensions without needing
+#   to know anything about reflection or other advanced topics. Just define
+#   a mixin with a bunch of appropriately-named methods, and you're done.
+#
+# - The code should be simple to understand.
+#
+# - Even so, good complete documentation is extremely important.
+#
+# - Be lenient in what you accept.
+#
+# - Be conservative in what you generate.
+#
+# - Get well-formed feeds parsing reliably, then worry about broken feeds.
+#
+# - Atom will hopefully be the future. Provide full support for RSS, but don't
+#   hold Atom back by trying to force it into an RSS data model.
+#
+# == Future plans
+#
+# Here are some possible improvements:
+#
+# - RSS and Atom generation. Create objects, then call Syndication::FeedMaker
+#   to generate XML in various flavors. This probably won't happen until an XML
+#   generator is picked for the Ruby standard library.
+#
+# - Faster date parsing. It turns out that when I asked for parsed dates in
+#   my test code, the profiler showed Date.parse chewing up 25% of the total
+#   CPU time used. A dedicated ISO 8601 date parser could cut
+#   that down drastically.
+#
+# - Additional Google Data support. I just wanted to be able to display my
+#   upcoming calendar dates, but clearly there is a lot more that could be
+#   implemented. Unfortunately, recurring events don't seem to have a clean
+#   XML representation in Google's data feeds yet.
+#
+# == Feedback
+#
+# There are doubtless things I could have done better. Comments, suggestions,
+# etc. are welcome; e-mail <meta@pobox.com>.
+#
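
The README and IMPLEMENTATION notes describe TagSoup as a small, deliberately forgiving parser that drives the same stream callbacks as REXML. The sketch below shows what such a tokenizer might look like. It is an assumption-laden illustration, not the gem's TagSoup: `TinySoup` and `Recorder` are hypothetical names, the callback names `tag_start`, `tag_end` and `text` follow REXML's stream listener convention, and CDATA sections and entity decoding are ignored entirely.

```ruby
# Hypothetical sketch of a forgiving tokenizer; not the gem's TagSoup class.
class TinySoup
  # Alternatives, in order: declaration/comment/processing instruction
  # (skipped), end tag, start tag with raw attribute text, run of text.
  TOKEN = %r{<[!?][^>]*>|</\s*([\w:.-]+)[^>]*>|<([\w:.-]+)([^>]*)>|([^<]+)}

  # Scans the source once and reports each token to the listener. No attempt
  # is made to check that tags are balanced or properly nested.
  def self.parse_stream(source, listener)
    source.scan(TOKEN) do |end_name, start_name, rest, text|
      if end_name
        listener.tag_end(end_name)
      elsif start_name
        listener.tag_start(start_name, attributes(rest))
        # Treat <tag ... /> as a start tag immediately followed by an end tag.
        listener.tag_end(start_name) if rest.strip.end_with?('/')
      elsif text && text =~ /\S/
        listener.text(text)
      end
    end
  end

  # Extracts quoted name="value" pairs; anything else is silently dropped.
  def self.attributes(rest)
    Hash[rest.scan(/([\w:.-]+)\s*=\s*["']([^"']*)["']/)]
  end
end

# Listener that simply records the callbacks it receives, in order.
class Recorder
  attr_reader :events

  def initialize
    @events = []
  end

  def tag_start(name, attrs)
    @events << [:tag_start, name, attrs]
  end

  def tag_end(name)
    @events << [:tag_end, name]
  end

  def text(str)
    @events << [:text, str]
  end
end

# Not well-formed XML: bare ampersand, unclosed <title>, <item> and <rss>.
broken = '<rss><channel><title>News & views<item><title>One</title></channel>'
rec = Recorder.new
TinySoup.parse_stream(broken, rec)
rec.events.each { |event| p event }
```

A recording listener like this is also a convenient way to compare two parsers on the same input, since the recorded arrays can be checked for equality.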

data/examples/apple.rb ADDED

@@ -0,0 +1,24 @@
+# Example of using RSS 1.0 content module in RSS 2.0.
+# (Naughty, but there you go.)
+require 'rubygems'
+require 'syndication/rss'
+require 'syndication/content'
+require 'open-uri'
+url = 'http://docs.info.apple.com/rss/allproducts.rss'
+parser = Syndication::RSS::Parser.new
+xml = nil
+open(url) { |http|
+  xml = http.read
+}
+feed = parser.parse(xml)
+for i in feed.items
+  puts i.content_encoded
+  puts
+end
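
The README states that the content module decodes CDATA-escaped or entity-encoded HTML, which is why the example above can print `content_encoded` directly. A minimal sketch of that kind of decoding is shown here. `decode_content` and `XML_ENTITIES` are hypothetical names, not part of the gem, and only the five predefined XML entities are handled.

```ruby
# Hypothetical helper, not taken from the syndication gem.
XML_ENTITIES = {
  'lt' => '<', 'gt' => '>', 'quot' => '"', 'apos' => "'", 'amp' => '&'
}.freeze

# Returns the HTML carried by a content element, whether the feed wrapped it
# in a CDATA section or escaped it with entities.
def decode_content(raw)
  text = raw.strip
  if text =~ /\A<!\[CDATA\[(.*)\]\]>\z/m
    # CDATA content is taken literally; nothing inside it is unescaped.
    $1
  else
    # A single pass, so "&amp;lt;" becomes "&lt;" rather than "<".
    text.gsub(/&(lt|gt|quot|apos|amp);/) { XML_ENTITIES[$1] }
  end
end

puts decode_content('<![CDATA[<p>Hi &amp; bye</p>]]>')
puts decode_content('&lt;p&gt;Hi &amp;amp; bye&lt;/p&gt;')
```

Both calls yield the same HTML fragment, so the caller does not need to know which encoding the feed used.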

data/examples/google.rb ADDED

@@ -0,0 +1,23 @@
+# Atom syndication example:
+# Output upcoming events from a Google calendar feed
+require 'open-uri'
+require 'syndication/atom'
+require 'syndication/google'
+MY_CALENDAR = 'http://www.google.com/calendar/feeds/j4a3sad66efnj3rm5ou2fbnsbg@group.calendar.google.com/public/full'
+parser = Syndication::Atom::Parser.new
+feed = nil
+open(MY_CALENDAR) {|file|
+  text = file.read
+  feed = parser.parse(text)
+}
+t = feed.updated.strftime("%H:%M on %A %d %B")
+puts "#{feed.title.txt}: #{feed.subtitle.txt} (updated #{t})"
+for e in feed.entries
+  if e.gd_when && e.gd_when.first
+    t = e.gd_when.first.strftime("%d %b %y")
+    puts "#{t}: #{e.title.txt}"
+  end
+end
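
Requiring `syndication/google` is what makes `gd_when` available on entries in the example above. The README's design philosophy says extensions are added by defining a mixin with appropriately named methods, which the library then finds by reflection. The sketch below illustrates that idea under assumptions: `Item`, `store` and the `Geo` module are hypothetical, and the rule mapping a tag such as `geo:lat` to a setter `geo_lat=` is a guess at the convention rather than the gem's documented behaviour.

```ruby
# Hypothetical stand-in for one of the library's element classes.
class Item
  attr_accessor :title

  # Maps an XML tag such as "geo:lat" to a setter such as geo_lat= and calls
  # it only if some class or mixin has defined that setter.
  def store(tag, value)
    setter = tag.downcase.tr(':', '_') + '='
    return false unless respond_to?(setter)
    send(setter, value)
    true
  end
end

# Hypothetical extension: supporting two new elements only takes a mixin
# whose accessor names match the prefixed tag names.
module Geo
  attr_accessor :geo_lat, :geo_long
end

class Item
  include Geo
end

item = Item.new
item.store('title', 'Picnic')
item.store('geo:lat', '51.5')
handled = item.store('unknown:tag', 'ignored')

puts item.title
puts item.geo_lat
puts handled
```

Unknown elements are skipped rather than raising an error, which matches the stated goal of being lenient in what is accepted.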

data/examples/yahoo.rb ADDED

@@ -0,0 +1,21 @@
+# RSS Syndication example:
+#
+# Output Yahoo news headlines, dated.
+require 'open-uri'
+require 'syndication/rss'
+parser = Syndication::RSS::Parser.new
+feed = nil
+open("http://rss.news.yahoo.com/rss/topstories") {|file|
+  text = file.read
+  feed = parser.parse(text)
+}
+chan = feed.channel
+t = chan.lastbuilddate.strftime("%H:%M on %A %d %B")
+puts "#{chan.title} at #{t}"
+for i in feed.items
+  t = i.pubdate.strftime("%d %b")
+  puts "#{t}: #{i.title}"
+end
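
Calls such as `pubdate.strftime` in the example above are where the README's note about dates applies: parsing is slow, so it is only done when a value is requested. One way to get that behaviour is sketched below. `Item` here is a hypothetical class, and caching the parsed value after the first read is an assumption; the README only says that parsing is deferred.

```ruby
require 'date'

# Hypothetical element class; not the gem's implementation.
class Item
  # The parser only stores the raw string, which costs almost nothing.
  attr_writer :pubdate

  # Parses on first access and keeps the result, so feeds whose dates are
  # never read never pay for date parsing. DateTime.parse raises an error
  # on text it cannot interpret; a real reader might want to rescue that.
  def pubdate
    return nil unless @pubdate
    @pubdate = DateTime.parse(@pubdate) if @pubdate.is_a?(String)
    @pubdate
  end
end

item = Item.new
item.pubdate = 'Sat, 07 Sep 2002 09:42:31 GMT'
puts item.pubdate.strftime('%H:%M on %d %b %Y')
```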