feedme 0.1 → 0.8.0

Files changed (11)

data/History.txt CHANGED

@@ -1,3 +1,103 @@
+=== 0.8 / 2009-12-14
+* Add new virtual method _values: returns all values for a given tag.
+* Transformations with arguments are now specified as an array rather than
+  part of the symbol
+* Add transform method
+* Add regexp transform
+* Add nokogiri support (hpricot is still the default)
+* Copy/paste and fix feed-normalizer clean_html method, drop feed-normalizer dependency
+=== 0.7.1 / 2009-09-24
+* Fix nil_or_empty? to strip whitespace from strings
+=== 0.7 / 2009-09-24
+* Design decision: all element and attribute names will be stored as lower-case. They may still
+  be accessed using upper case, since keys will be normalized by all accessors.
+* Design decision: RDF will be dealt with at parse time: elements with rdf:resource attributes will be
+  replaced by the actual, referenced elements. Ordering of the referring elements will be preserved.
+* Removed the concept of ghost tags.
+=== 0.6.5 / 2009-09-24
+* Fix :truncHtml completely by requiring active_support.
+=== 0.6.4 / 2009-09-23
+* Roll version to make github happy.
+=== 0.6.3 / 2009-09-23
+* Fix truncHtml: use code by Henrik Nyh, which in turn uses Hpricot
+=== 0.6.2 / 2009-09-23
+* Fix content-parsing regular expression to correctly handle closed elements
+* Reverse earlier design decision: keep namespaces for attributes.
+=== 0.6.1 / 2009-09-23
+* Improve handling of rdf:items. From now on, .items will forward to .item_array. The rdf items can still be accessed by [:items_array] or .items_array.
+=== 0.6 / 2009-09-23
+* Fix handling of the items element (mostly affects RSS 1.0 documents)
+* Make attribute naming consistent
+* Design decision: attributes can only ever have a single value, so they will always be stored as scalars
+  rather than arrays. This will also nicely resolve any possible collisions between attribute and tag names.
+=== 0.5.4 / 2009-09-22
+* Minor improvements to to_indented_s
+* Fix tag names: change all tags with namespaces to the cleaned version (unquote, ':' replaced with '_')
+* Design decision: all attribute names will have their namespaces stripped; namespaces are generally
+  treated as optional (even if they aren't technically so) and it's annoying to have to check both forms;
+  this decision may be reversed if there are found to be conflicts
+=== 0.5.3 / 2009-09-22
+* Roll version to test GitHub weirdness.
+=== 0.5.2 / 2009-09-22
+* Improve to_s method for prettier array display.
+=== 0.5.1 / 2009-09-21
+* Update example code
+* Bug fix: call_virtual_method has invalid return if neither a key nor any of its aliases has a value
+* Subsequent releases will follow standard versioning model of "major.minor.bugfix"
+=== 0.5 / 2009-09-21
+* Special handling for atom id tag
+* to_indented_str method, which creates a pretty output for a FeedData
+* Improved to_s method that delegates to to_indented_str
+=== 0.4 / 2009-09-20
+* Expose call_virtual_method as public
+* Change 'name' argument of call_virtual_method to 'sym'
+* Add default value for call_virtual_method 'args' argument
+* Add :'media:content' and :'content:encoded' as ext tags
+* fix use of FeedNormalizer in :cleanHtml transformation
+=== 0.3 / 2009-09-18
+* Update example code
+* Bug fix: call_virtual_method always throws exception
+* Bug fix: responds_to? -> respond_to? and rels -> :rels
+=== 0.2 / 2009-09-12
+* Change bang mods to more flexible transformations framework.
+* Add additional transformation functions.
+* Add methods for RSS/Atom emulation that automatically add appropriate aliases.
+* Add empty_string_for_nil and error_on_missing_key options.
+* Add support for parsing only certain rels in the strict parser.
 === 0.1 / 2009-09-03
 * Everything is new. First release.
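
Taken together, the 0.5.4 and 0.7 entries above describe how names are normalized: namespaced tags have ':' replaced with '_', and all names are stored lower-case while accessors normalize whatever key they are given. A minimal sketch of that idea follows. `normalize_key` and `NormalizedStore` are hypothetical names used only for illustration; they are not taken from the gem's source, and the real implementation may differ.

```ruby
# Hypothetical helper: lower-case the name and replace the namespace
# separator ':' with '_', as the changelog entries describe.
def normalize_key(key)
  key.to_s.downcase.gsub(':', '_').to_sym
end

# Hypothetical store whose accessors normalize every key they receive,
# so upper-case or namespaced lookups resolve to the same entry.
class NormalizedStore
  def initialize
    @data = {}
  end

  def []=(key, value)
    @data[normalize_key(key)] = value
  end

  def [](key)
    @data[normalize_key(key)]
  end

  def key?(key)
    @data.key?(normalize_key(key))
  end
end

store = NormalizedStore.new
store[:'dc:date'] = '2009-12-14'
store[:pubDate]   = 'Mon, 14 Dec 2009'

puts store[:dc_date]  # same entry as :'dc:date'
puts store[:PUBDATE]  # upper-case access still resolves
```

Normalizing on both write and read means callers never need to know which spelling the feed used.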

data/Manifest.txt CHANGED

@@ -3,5 +3,7 @@ Manifest.txt
 README.txt
 Rakefile
 lib/feedme.rb
+lib/truncator.rb
+lib/util.rb
 examples/rocketboom.rb
 examples/rocketboom.rss

data/README.txt CHANGED

@@ -1,6 +1,6 @@
 = feedme
-* http://feedme.rubyforge.org
+* http://wiki.github.com/jdidion/feedme
 == DESCRIPTION:
@@ -24,76 +24,143 @@ The API is similar to SimpleRSS:
     require 'open-uri'
     rss = FeedMe.parse open('http://slashdot.org/index.rdf')
-	rss.version # => 1.0
+    rss.version # => 1.0
     rss.channel.title # => "Slashdot"
     rss.channel.link # => "http://slashdot.org/"
     rss.items.first.link # => "http://books.slashdot.org/article.pl?sid=05/08/29/1319236&from=rss"
-But since the parser can read Atom feeds as easily as RSS feeds, there are optional aliases that allow more atom like reading:
+But since the parser can read Atom feeds as easily as RSS feeds, there are aliases that allow more atom like reading:
     rss.feed.title # => "Slashdot"
     rss.feed.link # => "http://slashdot.org/"
     rss.entries.first.link # => "http://books.slashdot.org/article.pl?sid=05/08/29/1319236&from=rss"
-Under the covers, all content is stored in arrays. This means that you can access all content for a tag that appears multiple times (i.e. category):
-	rss.items.first.category_array	# => ["News for Nerds", "Technology"]
-	rss.items.first.category # => "News for Nerds"
+Under the covers, all element values are stored in arrays. This means that you can access all content for an element that appears multiple times (i.e. category):
+    rss.items.first.category_array  # => ["News for Nerds", "Technology"]
+    rss.items.first.category # => "News for Nerds"
 You also have access to all the attributes as well as tag values:
-	rss.items.first.guid.isPermaLink # => "true"
-	rss.items.first.guid.content	 # => http://books.slashdot.org/article.pl?sid=05/08/29/1319236
+    rss.items.first.guid.isPermaLink # => "true"
+    rss.items.first.guid.content     # => http://books.slashdot.org/article.pl?sid=05/08/29/1319236
 FeedMe also adds some syntactic sugar that makes it easy to get the information you want:
-	rss.items.first.category? # => true
-	rss.items.first.category_count # => 2
-	rss.items.first.guid_content # => http://books.slashdot.org/article.pl?sid=05/08/29/1319236
+    rss.items.first.category? # => true
+    rss.items.first.category_count # => 2
+    rss.items.first.guid_value # => http://books.slashdot.org/article.pl?sid=05/08/29/1319236
 There are two different parsers that you can use, depending on your needs. The default parser is "promiscuous," meaning that it parses all tags. There is also a strict parser that only parses tags specified in a list. Here is how you create the different types of parsers:
-	FeedMe.parse(source) # parse using the default (promiscuous) parser
-	FeedMe::ParserBuilder.new.parse(source) # equivalent to the previous line
-	FeedMe::StrictParserBuilder.new.parse(source) # only parse certain tags
+    FeedMe.parse(source) # parse using the default (promiscuous) parser
+    FeedMe::ParserBuilder.new.parse(source) # equivalent to the previous line
+    FeedMe.parse_strict(source)
+    FeedMe::StrictParserBuilder.new.parse(source) # only parse certain tags
+The FeedMe class methods and the parser builder constructors also accept an options hash. Options are also passed on to the Parser constructor. Currently, only two options are available:
+1. :empty_string_for_nil => false # return the empty string instead of a nil value
+2. :error_on_missing_key => false # raise an error if a specified key or virtual method does not exist (otherwise nil is returned)
 The strict parser can be extended by adding new tags to parse:
-	builder = FeedMe::StrictParserBuilder.new
-	builder.rss_tags << :some_new_tag
-	builder.rss_item_tags << :'item+myrel' # parse an item that has a custom rel type
-	builder.item_ext_tags << :'feedburner:origLink' # parse an extension tag - one that has a specific namespace
+    builder = FeedMe::StrictParserBuilder.new
+    builder.rss_tags << :some_new_tag
+    builder.rss_item_tags << :'item+myrel' # parse an item that has a custom rel type
+    builder.item_ext_tags << :feedburner_origLink # parse an extension tag - one that has a specific
+                                                  # namespace (use '_', not ':', to separate namespace
+                                                  # from attribute name)
 Either parser can be extended by adding aliases to existing tags:
-	builder.aliases[:updated] => :pubDate  # now you can always access the updated date using :updated, regardless of whether it's an RSS or Atom feed
+    builder.aliases[:updated] => :pubDate  # now you can always access the updated date using :updated,
+                                           # regardless of whether it's an RSS or Atom feed
+If you don't know ahead of time what type of feed you'll be parsing, you can tell FeedMe to always emulate RSS or Atom. These methods just add a bunch of aliases:
+    builder.emulate_rss!
+    builder.emulate_atom!
+Another bit of syntactic sugar are transformations. These are modifications that can be applied to feed content. There is a default transformation that can be applied by adding '!' to the tag name.
+    rss.entry.content  # => <div>Some great stuff</div>
+    rss.entry.content! # => Some great stuff
+The default transformation can be changed:
+    builder.default_transformation = [ :cleanHtml ]
+Custom transformations are defined by mapping one or more transformation functions to a suffix:
+    builder.transformations['clean'] = [ :cleanHtml ]
+    rss.entry.content           # => <div>This is a bunch of text</div><p></p></html>
+    rss.entry.content_clean     # => <div>This is a bunch of text</div>
+You can also/instead apply an arbitrary set of transformations via the transform method:
-Another bit of syntactic sugar is the "bang mod." These are modifications that can be applied to feed content by adding '!' to the tag name. The default bang mod is to strip HTML tags from the content.
+    rss.entry.transform(:content, [ :clean, [ :trunc, 50 ] ])
-	rss.entry.content # => <div>Some great stuff</div>
-	rss.entry.content! # => Some great stuff
-You can create your own bang mods. The following is an example of a bang mod that takes an argument. The first line is how bang mods are added, and the third line tells the builder to actually apply this bang mod when the '!' suffix is used. Note that bang mod names may only contain alphanumeric characters. Argument values are specified at the end separated by underscores.
+You can create your own transformation function. The following is an example of a transformation function that takes an argument. Note that transformation function names may only contain alphanumeric characters. Argument values are specified at the end separated by underscores.
+    builder.transformation_fns[:wrap] => proc {|str, col|
+        str.gsub(/(.{1,#{col}})( +|$\n?)|(.{1,#{col}})/, "\\1\\3\n").strip
+    }
+    builder.transformations['wrap'] = [ :wrap_10 ]
+    rss.entry.content = This is a bunch of text
+    rss.entry.content_wrap = This is a
+                             bunch of
+                             text
-	# wrap content at a specified number of columns
-	builder.bang_mod_fns[:wrap] => proc {|str, col| str.gsub(/(.{1,#{col}})( +|$\n?)|(.{1,#{col}})/, "\\1\\3\n").strip }
-	builder.bang_mods << :wrap_80
+The transformation functions available by default are:
+1. :stripHtml - described above
+2. :cleanHtml - ** Requires FeedNormalizer (which in turn requires Hpricot) **
+    rss.entry_array[0].content  # => 1 > 2
+    rss.entry_array[0].content! # => 1 &gt; 2
+    rss.entry_array[1].content  # => <div>Some great stuff</div><p></p></html>
+    rss.entry_array[1].content! # => <div>Some great stuff</div>
+3. :wrap - takes number of columns as a parameter. Respects word boundaries. Example of :wrap_10:
+    rss.entry.content  # => This is a bunch of text
+    rss.entry.content! # => This is a
+                            bunch of
+                            text
+4. :trunc - truncates text to a certain length. Example of :trunc_10:
+    rss.entries.first.content  # => This is a long long long sentence
+    rss.entries.first.content! # => This is a
+5. :truncHtml - truncates the content inside the first set of HTML tags, but preserves the tags. ** Requires ActiveSupport and Hpricot ** Example of :truncHtml_10:
+    rss.entries.first.content  # => <div>This is a long long long sentence</div></html>
+    rss.entries.first.content! # => <div>This is a </div></html>
+6. :regexp - apply a regular expression and extract the capture groups
+    rss.entries.first.content  # => This is a long long long entry
+    rss.entries.first.transform(:content, [ :regexp, /(This is a long ).*(entry)/ ]) # => This is a long entry
 In order to prevent clashes between tag/attribute names and the parser class' instance variables, all instance variables are prefixed with 'fm_'. They are:
-	fm_source	# the original, unparsed source
-	fm_options	# the options passed to the parser constructor
-	fm_type		# the feed type
-	fm_tags		# the tags the parser looks for in the source
-	fm_parsed	# the list of tags the parser actually found
-	fm_unparsed # the list of tags that appeared in the feed but were not parsed (useful for debugging)
+    fm_source   # the original, unparsed source
+    fm_options  # the options passed to the parser constructor
+    fm_type     # the feed type
+    fm_tags     # the tags the parser looks for in the source
+    fm_parsed   # the list of tags the parser actually found
+    fm_unparsed # the list of tags that appeared in the feed but were not parsed (useful for debugging)
 Additionally, there are several variables that are available at every level of the parse tree:
-	fm_builder	# the ParserBuilder that created the parser
-	fm_parent	# the container of the current level of the parse tree
-	fm_tag_name # the name of the rss/atom tag whose content is contained in this level of the tree
+    fm_builder  # the ParserBuilder that created the parser
+    fm_parent   # the container of the current level of the parse tree
+    fm_tag_name # the name of the rss/atom tag whose content is contained in this level of the tree
 === A word on RSS/Atom Versions
@@ -107,9 +174,20 @@ Due to various incompatibilities between different RSS versions, it is strongly
 == INSTALL:
-* gem install feedme
-* http://rubyforge.org/projects/feedme
+* gem install jdidion-feedme (Add GitHub as a gem source: gem sources -a http://gems.github.com)
+* http://github.com/jdidion/feedme/downloads
+To use certain features of FeedMe, some dependencies are required:
+* To use the :truncHtml transformation for truncating HTML content, ActiveSupport and Hpricot are required
+    sudo gem install activesupport
+    sudo gem install hpricot
+* To use the :cleanHtml for sanitizing HTML, FeedNormalizer and Hpricot are required
+    sudo gem install feed-normalizer
+    sudo gem install hpricot
 == LICENSE:
-This work is licensed under the Creative Commons Attribution 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
+This work is licensed under the Creative Commons Attribution 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
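
The README diff above describes transformations as named functions applied in sequence, with arguments given either as part of the symbol (`:wrap_10`) or, per the 0.8 changelog entry, as an array (`[ :wrap, 10 ]`). The sketch below shows one way such a dispatcher could work without the gem installed. `TRANSFORMATION_FNS`, `parse_spec`, and `transform` are hypothetical names, not FeedMe's API. The `:wrap` regular expression is copied from the README; `:trunc` is a simplified character-count stand-in; `:regexp` joins all capture groups, which follows the README example (the proc shown in lib/feedme.rb in this diff returns only `match[1]`). Note also that the README's `builder.transformation_fns[:wrap] => proc {...}` would need `=` rather than `=>` to perform an assignment.

```ruby
# Hypothetical registry of transformation functions, keyed by name.
TRANSFORMATION_FNS = {
  # Wrap text at +col+ columns, respecting word boundaries
  # (regular expression taken from the README).
  wrap: proc { |str, col|
    str.gsub(/(.{1,#{col}})( +|$\n?)|(.{1,#{col}})/, "\\1\\3\n").strip
  },
  # Simplified stand-in: keep the first +len+ characters.
  trunc: proc { |str, len| str[0, len.to_i].strip },
  # Apply a regular expression and join its capture groups.
  regexp: proc { |str, re|
    m = Regexp.new(re).match(str)
    m && m.captures.join
  }
}.freeze

# Split a transformation spec into [name, args]. Accepts the older
# symbol form (:wrap_10) and the array form ([:wrap, 10]).
def parse_spec(spec)
  if spec.is_a?(Array)
    [spec.first.to_sym, spec[1..-1]]
  else
    name, *args = spec.to_s.split('_')
    [name.to_sym, args]
  end
end

# Apply each transformation in order, feeding the result of one
# into the next.
def transform(value, specs)
  specs.inject(value) do |current, spec|
    name, args = parse_spec(spec)
    TRANSFORMATION_FNS.fetch(name).call(current, *args)
  end
end

text = 'This is a bunch of text'
puts transform(text, [:wrap_10])     # symbol form
puts transform(text, [[:wrap, 10]])  # array form, same result
```

The array form avoids encoding arguments in the symbol name, which is what makes a non-string argument such as a Regexp possible.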

data/Rakefile CHANGED

@@ -1,7 +1,24 @@
 require 'rubygems'
-require 'hoe'
+require 'jeweler'
-Hoe.spec 'feedme' do |hoe|
-  hoe.developer('John Didion', 'jdidion@rubyforge.org')
-  hoe.rubyforge_name = 'feedme'
+tasks = Jeweler::Tasks.new do |s|
+  s.name = "feedme"
+  s.authors = ["John Didion"]
+  s.description = %q{A simple, flexible, and extensible RSS and Atom parser for Ruby. Based on the popular SimpleRSS library, but with many nice extra features.}
+  s.email = ["code@didion.net"]
+  s.extra_rdoc_files = ["History.txt", "Manifest.txt", "README.txt"]
+  s.files = ["History.txt", "Manifest.txt", "README.txt", "Rakefile",
+    "lib/feedme.rb", "lib/hpricot-util.rb", "lib/nokogiri-util.rb",
+    "lib/html-cleaner.rb", "lib/util.rb", "examples/rocketboom.rb",
+    "examples/rocketboom.rss", "test/test_helper.rb"]
+  s.homepage = %q{http://wiki.github.com/jdidion/feedme}
+  s.rdoc_options = ["--main", "README.txt"]
+  s.require_paths = ["lib"]
+  s.rubyforge_project = %q{feedme}
+  s.summary = %q{A simple, flexible, and extensible RSS and Atom parser for Ruby}
+  s.test_files = ["test/test_helper.rb"]
 end
+tasks.jeweler.remote = 'github'
+Jeweler::GemcutterTasks.new

data/examples/rocketboom.rb CHANGED

@@ -1,5 +1,6 @@
-#require 'feedme'
-require '../lib/feedme'
+#!/usr/bin/ruby
+require 'rubygems'
+require 'feedme'
 require 'net/http'
 def fetch(url)
@@ -24,13 +25,13 @@ end
 # create a new ParserBuilder
 builder = FeedMe::ParserBuilder.new
 # add a bang mod to wrap content to 50 columns
-builder.bang_mods << :wrap_80
+builder.default_transformation << :wrap_80
 # parse the rss feed
 rss = builder.parse(content)
 # equivalent to rss.channel.title
-puts "#{rss.type} Feed: #{rss.title}"
+puts "#{rss.class} Feed: #{rss.title}"
 # use a virtual method...this one a shortcut to rss.items.size
 puts "#{rss.item_count} items"
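
The example above relies on a "virtual method" (`item_count`), and the README describes related suffixes (`?`, `_count`, `_array`) along with the `:empty_string_for_nil` and `:error_on_missing_key` options. The following is a minimal sketch of how such accessors can be built on `method_missing`. `TinyFeedData` is a hypothetical stand-in written for illustration; it is not FeedMe's `FeedData` class and covers only a small part of the behavior described.

```ruby
# Hypothetical stand-in for a parsed feed node.
class TinyFeedData
  SUFFIX = /(\?|_count|_array)\z/

  # data:    hash of element name => array of values
  # options: the two options described in the README
  def initialize(data, options = {})
    @data = data
    @options = options
  end

  def respond_to_missing?(name, include_private = false)
    @data.key?(name.to_s.sub(SUFFIX, '').to_sym) || super
  end

  # Missing keys yield nil by default, or '' when
  # :empty_string_for_nil is set.
  def method_missing(name, *_args)
    result = resolve(name.to_s)
    result = '' if result.nil? && @options[:empty_string_for_nil]
    result
  end

  private

  def resolve(name)
    case name
    when /\A(.+)\?\z/     then !resolve($1).nil?
    when /\A(.+)_count\z/ then Array(@data[$1.to_sym]).size
    when /\A(.+)_array\z/ then @data[$1.to_sym]
    else
      if @data.key?(name.to_sym)
        @data[name.to_sym].first
      elsif @options[:error_on_missing_key]
        raise NameError, "no such key: #{name}"
      end
    end
  end
end

item = TinyFeedData.new({ category: ['News for Nerds', 'Technology'] })
puts item.category        # first value of the element
puts item.category_count  # number of values
puts item.category?       # whether the element is present
p item.missing            # nil unless an option says otherwise
```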

data/lib/feedme.rb CHANGED

@@ -1,54 +1,84 @@
-####################################################################################
-# FeedMe v0.1
-#
-# FeedMe is an easy to use parser for RSS and Atom files. It is based on SimpleRSS,
-# but has some improvements that make it worth considering:
-# 1. Support for attributes
-# 2. Support for nested elements
-# 3. Support for elements that appear multiple times
-# 4. Syntactic sugar that makes it easier to get at the information you want
-#
-# One word of caution: FeedMe will be maintained only so long as SimpleRSS does not
-# provide the above features. I will try to keep FeedMe's API compatible with
-# SimpleRSS so that it will be easy for users to switch if/when necessary.
-####################################################################################
 require 'cgi'
 require 'time'
+require 'util.rb'
 module FeedMe
-  VERSION = "0.1"
+  # The current version of FeedMe.
+  VERSION = "0.7.2"
-  # constants for the feed type
+  # The value of Parser#fm_type for RSS feeds.
   RSS  = :RSS
+  # The value of Parser#fm_type for RDF (RSS 1.0) feeds.
+  RDF  = :RDF
+  # The value of Parser#fm_type for Atom feeds.
   ATOM = :ATOM
-  # the key used to access the content element of a mixed tag
+  # The key used to access the content element of a mixed tag.
   CONTENT_KEY = :content
+  # Helper libraries for HTML functions
+  NOKOGIRI_HELPER = 'nokogiri-util.rb'
+  HPRICOT_HELPER = 'hpricot-util.rb'
+  # Parse a feed using the promiscuous parser.
   def FeedMe.parse(source, options={})
-    ParserBuilder.new.parse(source, options)
+    ParserBuilder.new(options).parse(source)
   end
+  # Parse a feed using the strict parser.
   def FeedMe.parse_strict(source, options={})
-    StrictParserBuilder.new.parse(source, options)
+    StrictParserBuilder.new(options).parse(source)
   end
+  # This class is used to create promiscuous parsers.
   class ParserBuilder
-    attr_accessor :rss_tags, :rss_item_tags, :atom_tags, :atom_entry_tags,
-                  :date_tags, :value_tags, :ghost_tags, :aliases,
-                  :bang_mods, :bang_mod_fns
+    # The options passed to this ParserBuilder's constructor.
+    attr_reader :options
+    # The tags that are parsed for RSS feeds.
+    attr_accessor :rss_tags
+    # The subtags of item elements that are parsed for RSS feeds.
+    attr_accessor :rss_item_tags
+    # The tags that are parsed for Atom feeds.
+    attr_accessor :atom_tags
+    # The subtags of entry elements that are parsed for Atom feeds.
+    attr_accessor :atom_entry_tags
+    # The names of tags that should be parsed as date values.
+    attr_accessor :date_tags
+    # An array of names of attributes/subtags whose values can be
+    # used as the default value of a mixed element.
+    attr_accessor :value_tags
+    # Tags to use for element value when specific tag isn't specified
+    attr_accessor :default_value_tags
+    # A hash of attribute/tag name aliases.
+    attr_accessor :aliases
+    # An array of the transformation functions applied when the !
+    # suffix is added to the attribute/tag name.
+    attr_accessor :default_transformation
+    # Mapping of transformation names to functions. Each key is a
+    # suffix that can be appended to an attribute/tag name, and
+    # the value is an array of transformation function names that
+    # are applied when that transformation is used.
+    attr_accessor :transformations
+    # Mapping of transformation function names to Procs.
+    attr_accessor :transformation_fns
+    # the helper library used for HTML transformations
+    attr_accessor :html_helper_lib
-    # the promiscuous parser only has to know about tags that have nested subtags
-    def initialize
+    # Create a new ParserBuilder. Allowed options are:
+    # * :empty_string_for_nil => false # return the empty string instead of a nil value
+    # * :error_on_missing_key => false # raise an error if a specified key or virtual
+    #   method does not exist (otherwise nil is returned)
+    def initialize(options={})
+      @options = options
       # rss tags
     	@rss_tags = [
     	  {
     		  :image     => nil,
-          :textInput => nil,
-          :skipHours => nil,
-          :skipDays  => nil,
-          :items     => [{ :'rdf:Seq' => nil }],
+          :textinput => nil,
+          :skiphours => nil,
+          :skipdays  => nil,
+          :items     => [{ :rdf_seq => nil }],
          #:item      => @rss_item_tags
     		}
     	]
@@ -70,14 +100,15 @@ module FeedMe
       ]
       # tags whose value is a date
-      @date_tags = [ :pubDate, :lastBuildDate, :published, :updated, :'dc:date', :expirationDate ]
+      @date_tags = [ :pubdate, :lastbuilddate, :published, :updated, :dc_date,
+        :expirationdate ]
-      # tags that can be used as the default value for a tag with attributes
-      @value_tags = [ CONTENT_KEY, :href ]
+      # tags that can be used as the default value for a mixed element
+      @value_tags = {
+        :media_content => :url
+      }
+      @default_value_tags = [ CONTENT_KEY, :href, :url ]
-      # tags that don't become part of the parsed object tree
-      @ghost_tags = [ :'rdf:Seq' ]
       # tag/attribute aliases
     	@aliases = {
     	  :items        => :item_array,
@@ -87,64 +118,130 @@ module FeedMe
     	  :link         => :'link+self'
     	}
-    	# bang mods
-    	@bang_mods = [ :stripHtml ]
-    	@bang_mod_fns = {
-    	  :stripHtml => proc {|str| str.gsub(/<\/?[^>]*>/, "").strip },
-    	  :wrap      => proc {|str, col| str.gsub(/(.{1,#{col}})( +|$\n?)|(.{1,#{col}})/, "\\1\\3\n").strip }
+    	# transformations
+    	@html_helper_lib = HPRICOT_HELPER
+    	@default_transformation = [ :cleanHtml ]
+    	@transformations = {}
+    	@transformation_fns = {
+    	  # remove all HTML tags
+    	  :stripHtml => proc do |str|
+    	    require @html_helper_lib
+    	    FeedMe.html_helper.strip_html(str)
+    	  end,
+    	  # clean HTML content using FeedNormalizer's HtmlCleaner class
+    	  :cleanHtml => proc do |str|
+    	    require @html_helper_lib
+    	    FeedMe.html_helper.clean_html(str)
+    	  end,
+    	  # wrap text at a certain number of characters (respecting word boundaries)
+    	  :wrap => proc do |str, col|
+    	    str.gsub(/(.{1,#{col}})( +|$\n?)|(.{1,#{col}})/, "\\1\\3\n").strip
+    	  end,
+    	  # truncate text, respecting word boundaries
+    	  :trunc => proc {|str, wordcount| str.trunc(wordcount.to_i) },
+        # truncate HTML and leave enclosing HTML tags
+        :truncHtml => proc do |str, wordcount|
+          require @html_helper_lib
+    	    FeedMe.html_helper.truncate_html(str, wordcount.to_i)
+        end,
+        :regexp => proc do |str, regexp|
+          match = Regexp.new(regexp).match(str)
+          match.nil? ? nil : match[1]
+        end,
     	}
     end
+    # Prepare tag list for an RSS feed.
     def all_rss_tags
       all_tags = rss_tags.dup
       all_tags[0][:item] = rss_item_tags.dup
       return all_tags
     end
+    # Prepare tag list for an Atom feed.
     def all_atom_tags
       all_tags = atom_tags.dup
       all_tags[0][:entry] = atom_entry_tags.dup
       return all_tags
     end
-    def parse(source, options={})
+    # Add aliases so that Atom feed elements can be accessed
+    # using the names of their RSS counterparts.
+    def emulate_rss!
+      aliases.merge!({
+        :guid           => :id,       # this alias never actually gets used; see FeedData#id
+        :copyright      => :rights,
+        :pubdate        => [ :published, :updated ],
+        :lastbuilddate  => [ :updated, :published ],
+        :description    => [ :content, :summary ],
+        :managingeditor => [ :'author/name', :'contributor/name' ],
+        :webmaster      => [ :'author/name', :'contributor/name' ],
+        :image          => [ :icon, :logo ]
+      })
+    end
+    # Add aliases so that RSS feed elements can be accessed
+    # using the names of their Atom counterparts.
+    def emulate_atom!
+      aliases.merge!({
+        :rights       => :copyright,
+        :content      => :description,
+        :contributor  => :author,
+        :id           => [ :guid_value, :link ],
+        :author       => [ :managingeditor, :webmaster ],
+        :updated      => [ :lastbuilddate, :pubdate ],
+        :published    => [ :pubDate, :lastbuilddate ],
+        :icon         => :'image/url',
+        :logo         => :'image/url',
+        :summary      => :'description_trunc'
+      })
+    end
+    # Parse +source+ using a +Parser+ created from this +ParserBuilder+.
+    def parse(source)
 		  Parser.new(self, source, options)
 	  end
   end
+  #
   class StrictParserBuilder < ParserBuilder
-    attr_accessor :feed_ext_tags, :item_ext_tags
+    attr_accessor :feed_ext_tags, :item_ext_tags, :rels
-    def initialize
-      super()
+    def initialize(options={})
+      super(options)
       # rss tags
     	@rss_tags = [
     	  {
     		  :image     => [ :url, :title, :link, :width, :height, :description ],
-          :textInput => [ :title, :description, :name, :link ],
-          :skipHours => [ :hour ],
-          :skipDays  => [ :day ],
+          :textinput => [ :title, :description, :name, :link ],
+          :skiphours => [ :hour ],
+          :skipdays  => [ :day ],
           :items     => [
             {
-              :'rdf:Seq' => [ :'rdf:li' ]
+              :rdf_seq => [ :rdf_li ]
             },
-            :'rdf:Seq'
+            :rdf_seq
           ],
          #:item      => @item_tags
     		},
     		:title, :link, :description,                          # required
-    		:language, :copyright, :managingEditor, :webMaster,   # optional
-    		:pubDate, :lastBuildDate, :category, :generator,
+    		:language, :copyright, :managingeditor, :webmaster,   # optional
+    		:pubdate, :lastbuilddate, :category, :generator,
     		:docs, :cloud, :ttl, :rating,
-    		:image, :textInput, :skipHours, :skipDays, :item,     # have subtags
+    		:image, :textinput, :skiphours, :skipdays, :item,     # have subtags
     		:items
     	]
       @rss_item_tags = [
         {},
         :title, :description,                                 # required
         :link, :author, :category, :comments, :enclosure,     # optional
-        :guid, :pubDate, :source, :expirationDate
+        :guid, :pubdate, :source, :expirationdate
     	]
       #atom tags
@@ -157,9 +254,7 @@ module FeedMe
         },
         :id, :author, :title, :updated,                     # required
         :category, :contributor, :generator, :icon, :logo,  # optional
-        :'link+self', :'link+alternate', :'link+edit',
-        :'link+replies', :'link+related', :'link+enclosure',
-        :'link+via', :rights, :subtitle
+        :link, :rights, :subtitle
       ]
       @atom_entry_tags = [
         {
@@ -167,22 +262,25 @@ module FeedMe
           :contributor  => person_tags
         },
         :id, :author, :title, :updated, :summary,           # required
-        :category, :content, :contributor, :'link+self',
-        :'link+alternate', :'link+edit', :'link+replies',
-        :'link+related', :'link+enclosure', :published,
-        :rights, :source
+        :category, :content, :contributor, :link,
+        :published, :rights, :source
       ]
+      @rels = {
+        :link => [ 'self', 'alternate', 'edit', 'replies', 'related', 'enclosure', 'via' ]
+      }
       # extensions
       @feed_ext_tags = [
-        :'dc:date', :'feedburner:browserFriendly',
-        :'itunes:author', :'itunes:category'
+        :dc_date, :feedburner_browserfriendly,
+        :itunes_author, :itunes_category
       ]
       @item_ext_tags = [
-        :'dc:date', :'dc:subject', :'dc:creator',
-        :'dc:title', :'dc:rights', :'dc:publisher',
-        :'trackback:ping', :'trackback:about',
-        :'feedburner:origLink'
+        :dc_date, :dc_subject, :dc_creator,
+        :dc_title, :dc_rights, :dc_publisher,
+        :trackback_ping, :trackback_about,
+        :feedburner_origlink, :media_content,
+        :content_encoded
       ]
     end
@@ -202,46 +300,69 @@ module FeedMe
   class FeedData
     attr_reader :fm_tag_name, :fm_parent, :fm_builder
-    def initialize(tag_name, parent, builder, attrs = {})
+    def initialize(tag_name, parent, builder)
       @fm_tag_name = tag_name
       @fm_parent = parent
       @fm_builder = builder
-      @data = attrs.dup
+      @data = {}
     end
     def key?(key)
-      @data.key?(key)
+      @data.key?(clean_tag(key))
     end
     def keys
       @data.keys
     end
+    def delete(key)
+      @data.delete(clean_tag(key))
+    end
+    def each
+      @data.each {|key, value| yield(key, value) }
+    end
+    def each_with_index
+      @data.each_with_index {|key, value, index| yield(key, value, index) }
+    end
+    def size
+      @data.size
+    end
     def [](key)
-      @data[key]
+      @data[clean_tag(key)]
     end
     def []=(key, value)
-      @data[key] = value
+      @data[clean_tag(key)] = value
+    end
+    # special handling for atom id tags, due to conflict with
+    # ruby's Object#id method
+    def id
+      key?(:id) ? self[:id] : call_virtual_method(:id)
     end
     def to_s
-      @data.to_s
+      to_indented_s
     end
-    def method_missing(name, *args)
-      call_virtual_method(name, args)
+    def to_indented_s(indent_step=2)
+      FeedMe.pretty_to_s(self, indent_step, 0, Proc.new do |key, value|
+        (value.is_a?(Array) && value.size == 1) ? [unarrayize(key), value.first] : [key, value]
+      end)
     end
-    protected
-    def clean_tag(tag)
-    	tag.to_s.gsub(':','_').intern
-  	end
-    # generate a name for the array variable corresponding to a single-value variable
-    def arrayize(key)
-      return key + '_array'
+    def method_missing(name, *args)
+      result = begin
+        call_virtual_method(name, args)
+      rescue NameError
+        raise if fm_builder.options[:error_on_missing_key]
+      end
+      result = '' if result.nil? and fm_builder.options[:empty_string_for_nil]
+      result
     end
     # There are several virtual methods for each attribute/tag.
@@ -263,70 +384,146 @@ module FeedMe
     # array.size.
     # 7. If the tag name is of the form "tag+rel", the tag having the
     # specified rel value is returned
-    def call_virtual_method(name, args, history=[])
+    def call_virtual_method(sym, args=[], history=[])
       # make sure we don't get stuck in an infinite loop
       history.each do |call|
-        if call[0] == fm_tag_name and call[1] == name
-          puts name
-          puts self.inspect
-          raise FeedMe::InfiniteCallLoopError.new(name, history)
+        if call[0] == fm_tag_name and call[1] == sym
+          raise FeedMe::InfiniteCallLoopError.new(sym, history)
         end
       end
-      history << [ fm_tag_name, name ]
+      history << [ fm_tag_name, sym ]
-      raw_name = name
-      name = clean_tag(name)
+      name = clean_tag(sym)
       name_str = name.to_s
-      array_key = clean_tag(arrayize(name.to_s))
-      if name_str[-1,1] == '?'
+      array_key = arrayize(name.to_s)
+      result = if key? name
+        self[name]
+      elsif key? array_key
+        self[array_key].first
+      elsif name_str[-1,1] == '?'
         !call_virtual_method(name_str[0..-2], args, history).nil? rescue false
       elsif name_str[-1,1] == '!'
         value = call_virtual_method(name_str[0..-2], args, history)
-        fm_builder.bang_mods.each do |bm|
-          parts = bm.to_s.split('_')
-          bm_key = parts[0].to_sym
-          next unless fm_builder.bang_mod_fns.key?(bm_key)
-          value = fm_builder.bang_mod_fns[bm_key].call(value, *parts[1..-1])
+        _transform(fm_builder.default_transformation, value)
+      elsif name_str =~ /(.+)_values/
+        call_virtual_method(arrayize($1), args, history).collect do |value|
+          _resolve_value value
         end
-        return value
-      elsif key? name
-        self[name]
-      elsif key? array_key
-        self[array_key].first
       elsif name_str =~ /(.+)_value/
+        _resolve_value call_virtual_method($1, args, history)
+      elsif name_str =~ /(.+)_count/
+        call_virtual_method(arrayize($1), args, history).size
+      elsif name_str =~ /(.+)_(.+)/ && fm_builder.transformations.key?($2)
         value = call_virtual_method($1, args, history)
-        if value.is_a?(FeedData)
-          fm_builder.value_tags.each do |tag|
-            return value.call_virtual_method(tag, args, history) rescue nil
-          end
-        else
-          value
+        _transform(fm_builder.transformations[$2], value)
+      elsif name_str.include?('/')    # this is only intended to be used internally
+        value = self
+        name_str.split('/').each do |p|
+          parts = p.split('_')
+          name = clean_tag(parts[0])
+          new_args = parts.size > 1 ? parts[1..-1] : args
+          value = (value.method(name).call(*new_args) rescue
+            value.call_virtual_method(name, new_args, history)) rescue nil
+          break if value.nil?
         end
-      elsif name_str =~ /(.+)_count/
-        call_virtual_method(clean_tag(arrayize($1)), args, history).size
-      elsif name_str.include?("+")
-  		  tag_data = tag.to_s.split("+")
-  		  rel = tag_data[1]
-  		  call_virtual_method(clean_tag(arrayize(tag_data[0])), args, history).each do |elt|
+        value
+      elsif name_str.include?('+')
+  		  name_data = name_str.split('+')
+  		  rel = name_data[1]
+  		  value = nil
+  		  call_virtual_method(arrayize(name_data[0]), args, history).each do |elt|
   		    next unless elt.is_a?(FeedData) and elt.rel?
-  		    return elt if elt.rel.casecmp(rel) == 0
+  		    value = elt if elt.rel.casecmp(rel) == 0
+  		    break unless value.nil?
 		    end
+		    value
 		  elsif fm_builder.aliases.key? name
-        name = fm_builder.aliases[name]
-        method(name).call(*args) rescue call_virtual_method(name, args, history)
-      elsif fm_tag_name == :items      # special handling for RDF items tag
-        self[:'rdf:li_array'].method(raw_name).call(*args)
-      elsif fm_tag_name == :'rdf:li'   # special handling for RDF li tag
-        uri = self[:'rdf:resource']
-        fm_parent.fm_parent.item_array.each do |item|
-          if item[:'rdf:about'] == uri
-            return item.call_virtual_method(name, args, history)
-          end
+        names = fm_builder.aliases[name]
+        names = [names] unless names.is_a? Array
+        value = nil
+        names.each do |name|
+          value = (method(name).call(*args) rescue
+            call_virtual_method(name, args, history)) rescue next
+          break unless value.nil?
         end
+        value
       else
-        raise NameError.new("No such method #{name}", name)
+        nil
+      end
+      raise NameError.new("No such method '#{name}'", name) if result.nil?
+      result
+    end
+    # Apply transformations to a tag value. Can either accept a transformation
+    # name or an array of transformation function names.
+    def transform(tag, trans)
+      value = call_virtual_method(tag) or return nil
+      transformations = trans.is_a?(String) ?
+        fm_builder.transformations[trans] : trans
+      _transform(transformations, value)
+    end
+    protected
+    def clean_tag(tag)
+    	tag.to_s.downcase.gsub(':','_').intern
+  	end
+    # generate a name for the array variable corresponding to a single-value variable
+    def arrayize(key)
+      clean_tag(key.to_s + '_array')
+    end
+    def unarrayize(key)
+      clean_tag(key.to_s.gsub(/_array$/, ''))
+    end
+    private
+    def _transform(trans_array, value)
+      trans_array.each do |t|
+        if t.is_a? String
+          value = _transform(fm_builder.transformations[t], value)
+        else
+          if t.is_a? Symbol
+            t_name = t
+            args = []
+          elsif t[0].is_a? Array
+            raise 'array where symbol expected'
+          else
+            t_name = t[0]
+            args = t[1..-1]
+          end
+          trans = fm_builder.transformation_fns[t_name] or
+            raise NameError.new("No such transformation #{t_name}", t_name)
+          if value.is_a? Array
+            value = value.collect {|x| trans.call(x, *args) }
+          else
+            value = trans.call(value, *args)
+          end
+        end
+      end
+      value
+    end
+    def _resolve_value(obj)
+      value = obj
+      if obj.is_a?(FeedData)
+        if fm_builder.value_tags.key? obj.fm_tag_name
+          value = obj.call_virtual_method(fm_builder.value_tags[obj.fm_tag_name])
+        else
+          fm_builder.default_value_tags.each do |tag|
+            value = obj.call_virtual_method(tag) rescue next
+            break unless value.nil?
+          end
+        end
       end
+      value
     end
   end
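The transformation handling in the hunk above can be illustrated in isolation. What follows is a minimal sketch, not the gem's implementation: `TRANSFORMATION_FNS` and `TRANSFORMATIONS` are hypothetical stand-ins for `fm_builder.transformation_fns` and `fm_builder.transformations`, and the `:strip` and `:trunc` functions are made up for the example. It assumes, as `_transform` does, that a step is a String (a named transformation), a Symbol (a function without arguments), or an Array whose first element is the function name and whose remaining elements are arguments.

```ruby
# Hypothetical stand-in for fm_builder.transformation_fns: name => callable.
TRANSFORMATION_FNS = {
  :strip => proc { |str| str.strip },
  :trunc => proc { |str, len| str.length > len ? str[0, len] : str }
}

# Hypothetical stand-in for fm_builder.transformations: name => list of steps.
TRANSFORMATIONS = {
  'short' => [:strip, [:trunc, 5]]
}

def apply_transformations(steps, value)
  steps.each do |step|
    if step.is_a?(String)
      # A String refers to a named transformation, expanded recursively.
      value = apply_transformations(TRANSFORMATIONS.fetch(step), value)
      next
    end
    # A Symbol is a function without arguments; an Array is [name, *args].
    fn_name, *args = step.is_a?(Symbol) ? [step] : step
    fn = TRANSFORMATION_FNS[fn_name] or
      raise NameError, "No such transformation #{fn_name}"
    # Arrays are transformed element by element, scalars directly.
    value = if value.is_a?(Array)
      value.map { |x| fn.call(x, *args) }
    else
      fn.call(value, *args)
    end
  end
  value
end

short = apply_transformations(['short'], '  hello world  ')
many  = apply_transformations([:strip], [' a ', ' b '])
p short   # "hello"
p many    # ["a", "b"]
```

Passing arguments as array elements (`[:trunc, 5]`) rather than encoding them in the symbol name matches the 0.8 change described in History.txt.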
@@ -346,19 +543,31 @@ module FeedMe
     alias :feed :channel
     def fm_tag_name
-      @fm_type == FeedMe::RSS ? 'channel' : 'feed'
+      @fm_type == FeedMe::ATOM ? 'feed' : 'channel'
+    end
+    def fm_prefix
+      fm_type.to_s.downcase
     end
     private
     def parse
      # RSS = everything between channel tags + everything between </channel> and </rdf> if this is an RDF document
-      if @fm_source =~ %r{<(?:.*?:)?(?:rss|rdf)(.*?)>.*?<(?:.*?:)?channel(.*?)>(.+)</(?:.*?:)?channel>(.*)</(?:.*?:)?(?:rss|rdf)>}mi
-        @fm_type = FeedMe::RSS
+      if @fm_source =~ %r{<(?:.*?:)?(rss|rdf)(.*?)>.*?<(?:.*?:)?channel(.*?)>(.+)</(?:.*?:)?channel>(.*)</(?:.*?:)?(?:rss|rdf)>}mi
+        @fm_type = $1.upcase.to_s
         @fm_tags = fm_builder.all_rss_tags
-        attrs = parse_attributes($1, $2)
+        attrs = parse_attributes($2, $3)
         attrs[:version] ||= '1.0';
-        parse_content(self, attrs, $3 + nil_safe_to_s($4), @fm_tags)
+        parse_content(self, attrs, $4, @fm_tags)
+        # for RDF documents, replace references with actual items
+        unless nil_or_empty?($5)
+          refs = FeedData.new(nil, nil, fm_builder)
+          parse_content(refs, {}, $5, @fm_tags)
+          dereference_rdf_tags(:items_array, :item_array, refs) {|a| a.first[:rdf_seq_array].first[:rdf_li_array] }
+          [:image_array, :textinput_array].each {|tag| dereference_rdf_tags(tag, tag, refs) }
+        end
      # Atom = everything between feed tags
       elsif @fm_source =~ %r{<(?:.*?:)?feed(.*?)>(.+)</(?:.*?:)?feed>}mi
         @fm_type = FeedMe::ATOM
@@ -369,21 +578,37 @@ module FeedMe
       end
   	end
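+    # For RDF documents: references within the <channel> element (elements with
+    # an rdf:resource attribute) are replaced by the actual, referenced elements.
+    # The order of the referring elements is preserved.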
+    def dereference_rdf_tags(rdf_tag, rss_tag, refs)
+      if self.key?(rdf_tag)
+        src_items = self.delete(rdf_tag)
+        src_items = yield(src_items) if block_given?
+        ref_items = refs[rss_tag]
+        unless src_items.empty? || ref_items.empty?
+          self[rss_tag] = src_items.collect do |src_item|
+            next unless src_item.key?(:rdf_resource)
+            uri = src_item[:rdf_resource]
+            # find returns nil (not the whole ref_items array) when nothing matches
+            match = ref_items.find do |ref_item|
+              ref_item.key?(:rdf_about) && ref_item[:rdf_about].eql?(uri)
+            end
+            match[:rdf_resource] = uri unless match.nil?
+            match
+          end
+        end
+      end
+    end
   	def parse_content(parent, attrs, content, tags)
   	  # add attributes to parent
-  	  attrs.each_pair {|key, value| add_tag(parent, key, unescape(value)) }
-  	  # the first item in a tag array may be a hash that defines tags that have subtags
-  	  first_tag = 0
-  	  if !tags.nil? && tags[0].is_a?(Hash)
-  	    sub_tags = tags[0]
-  	    first_tag = 1
-      end
+  	  attrs.each_pair {|key, value| parent[key] = unescape(value) }
+      return if content.nil?
   	  # split the content into elements
   	  elements = {}
-  	  # TODO: this will break if a namespace is used that is not rss: or atom:
-  	  content.scan( %r{(<(?:rss:|atom:)?([^ >]+)([^>]*)(?:/>|>(.*?)</(?:rss:|atom:)?\2>))}mi ) do |match|
+ 	    # TODO: this will break if a namespace is used that is not rss: or atom:
+  	  content.scan( %r{(<([\w:]+)(.*?)(?:/>|>(.*?)</\2>))}mi ) do |match|
   	    # \1 = full content (from start to end tag), \2 = tag name
   	    # \3 = attributes, and \4 = content between tags
   	    key = clean_tag(match[1])
@@ -395,33 +620,37 @@ module FeedMe
   	    end
   	  end
-      # check if this is a promiscuous parser
-      if tags.nil? || tags.empty? || (tags.size == 1 && first_tag == 1)
-        tags = elements.keys
-        first_tag = 0
-      end
+      # the first item in a tag array may be a hash that defines tags that have subtags
+  	  sub_tags = tags[0] if !nil_or_empty?(tags) && tags[0].is_a?(Hash)
+  	  first_tag = sub_tags.nil? || tags.size == 1 ? 0 : 1
+  	  # if this is a promiscuous parser, tag names will depend on the elements found in the feed
+  	  tags = elements.keys if (sub_tags.nil? ? nil_or_empty?(tags) : first_tag == 0)
   	  # iterate over all tags (some or all of which may not be present)
   	  tags[first_tag..-1].each do |tag|
   	    key = clean_tag(tag)
-  	    element_array = elements.delete(tag) or next
+  		  element_array = elements.delete(tag) or next
   	    @fm_parsed << key
   		  element_array.each do |elt|
+  		    elt_attrs = elt[0]
+  		    elt_content = elt[1]
+  		    rels = fm_builder.rels[key] if fm_builder.respond_to?(:rels)
+  		    # if a list of accepted rels is specified, only parse this tag
+  		    # if its rel attribute is included in the list
+  		    next unless rels.nil? || elt_attrs.nil? || !elt_attrs.rel? || rels.include?(elt_attrs.rel)
   		    if !sub_tags.nil? && sub_tags.key?(key)
-  		      if fm_builder.ghost_tags.include? key
-  		        new_parent = parent
-  		      else
-  		        new_parent = FeedData.new(key, parent, fm_builder)
-  		        add_tag(parent, key, new_parent)
-  		      end
-  		      parse_content(new_parent, elt[0], elt[1], sub_tags[key])
+  		      new_parent = FeedData.new(key, parent, fm_builder)
+  		      add_tag(parent, key, new_parent)
+  		      parse_content(new_parent, elt_attrs, elt_content, sub_tags[key])
   		    else
-  		      add_tag(parent, key, clean_content(key, elt[0], elt[1], parent))
+  		      add_tag(parent, key, clean_content(key, elt_attrs, elt_content, parent))
   		    end
   		  end
   		end
   		@fm_unparsed += elements.keys
   		@fm_parsed.uniq!
@@ -429,7 +658,7 @@ module FeedMe
   	end
     def add_tag(hash, key, value)
-      array_var = clean_tag(arrayize(key.to_s))
+      array_var = arrayize(key)
       if hash.key? array_var
         hash[array_var] << value
       else
@@ -446,18 +675,19 @@ module FeedMe
   		content = content.to_s
   		if fm_builder.date_tags.include? tag
   			content = Time.parse(content) rescue unescape(content)
-  		else
-  			content = unescape(content)
+  		else
+  		  content = unescape(content)
   		end
       unless attrs.empty?
-        hash = FeedData.new(tag, parent, fm_builder, attrs)
+        hash = FeedData.new(tag, parent, fm_builder)
+        attrs.each_pair {|key, value| hash[key] = unescape(value) }
         if !content.empty?
           hash[FeedMe::CONTENT_KEY] = content
         end
         return hash
       end
       return content
   	end
@@ -466,9 +696,9 @@ module FeedMe
       attrs.each do |a|
         next if a.nil?
         # pull key/value pairs out of attr string
-        array = a.scan(/(\w+)=['"]?([^'"]+)/)
+        array = a.scan(/([\w:]+)=['"]?([^'"]+)/)
         # unescape values
-        array = array.collect {|key, value| [clean_tag(format_tag(key)), unescape(value)]}
+        array = array.collect {|key, value| [clean_tag(key), unescape(value)]}
         hash.merge! Hash[*array.flatten]
       end
       return hash
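The effect of widening the attribute pattern from `\w+` to `[\w:]+` can be seen with a namespaced attribute. This is a rough sketch under stated assumptions: `scan_attributes` is a hypothetical helper, it inlines the key normalization that `clean_tag` performs, and it leaves out the `unescape` step that the real `parse_attributes` applies to values.

```ruby
# Hypothetical helper, not part of the gem: pulls key/value pairs out of an
# attribute string with the given pattern and normalizes keys the way
# clean_tag does (lower-case, ':' replaced by '_', converted to a Symbol).
def scan_attributes(attr_string, pattern)
  attr_string.scan(pattern).each_with_object({}) do |(key, value), hash|
    hash[key.downcase.gsub(':', '_').to_sym] = value
  end
end

OLD_PATTERN = /(\w+)=['"]?([^'"]+)/     # pattern before the change
NEW_PATTERN = /([\w:]+)=['"]?([^'"]+)/  # pattern after the change

source = ' rdf:about="http://example.org/item1" rel="alternate"'

old_attrs = scan_attributes(source, OLD_PATTERN)
new_attrs = scan_attributes(source, NEW_PATTERN)

p old_attrs.keys  # [:about, :rel]      (namespace prefix is lost)
p new_attrs.keys  # [:rdf_about, :rel]  (namespace prefix is kept)
```

Keeping the prefix is what allows later code to look up keys such as `:rdf_about` and `:rdf_resource`.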
@@ -484,32 +714,10 @@ module FeedMe
       content = cdata[1] if cdata
       return content
-    	#if content =~ /([^-_.!~*'()a-zA-Z\d;\/?:@&=+$,\[\]]%)/n then
-    	# CGI.unescapeHTML(content).gsub(/(<!\[CDATA\[|\]\]>)/,'').strip
-    	#else
-    	#	content.gsub(/(<!\[CDATA\[|\]\]>)/,'').strip
-    	#end
-    end
-    def underscore(camel_cased_word)
-      camel_cased_word.to_s.gsub(/::/, '/').
-        gsub(/([A-Z]+)([A-Z][a-z])/,'\1_\2').
-        gsub(/([a-z\d])([A-Z])/,'\1_\2').
-        tr("-", "_").
-        downcase
-    end
-    def camelize(lower_case_and_underscored_word, first_letter_in_uppercase = true)
-      if first_letter_in_uppercase
-        lower_case_and_underscored_word.to_s.gsub(/\/(.?)/) { "::#{$1.upcase}" }.gsub(/(?:^|_)(.)/) { $1.upcase }
-      else
-        lower_case_and_underscored_word[0,1].downcase + camelize(lower_case_and_underscored_word)[1..-1]
-      end
     end
-    def nil_safe_to_s(obj)
-      obj.nil? ? '' : obj.to_s
+    def nil_or_empty?(obj)
+      obj.nil? || obj.empty? || (obj.is_a?(String) && obj.strip.empty?)
     end
   end
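As a quick illustration of the new `nil_or_empty?` helper, the sketch below copies its body from the diff into a standalone method and applies it to a few sample values. The sample values are chosen only for the example. Note that the method calls `empty?` on any non-nil argument, so it assumes the argument responds to `empty?` (String, Array, Hash and similar).

```ruby
# Body copied from the diff above; defined at top level for the example.
def nil_or_empty?(obj)
  obj.nil? || obj.empty? || (obj.is_a?(String) && obj.strip.empty?)
end

samples = [nil, '', "  \n", [], {}, 'rss', [1]]
checks  = samples.map { |value| nil_or_empty?(value) }
p checks  # [true, true, true, true, true, false, false]
```

The whitespace-only string counts as empty because of the `strip` check, which corresponds to the 0.7.1 entry in History.txt.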