feedme 0.1 → 0.8.0

data/History.txt CHANGED
@@ -1,3 +1,103 @@
1
+ === 0.8 / 2009-12-14
2
+
3
+ * Add new virtual method _values: returns all values for a given tag.
4
+ * Transformations with arguments are now specified as an array rather than
5
+ part of the symbol
6
+ * Add transform method
7
+ * Add regexp transform
8
+ * Add nokogiri support (hpricot is still the default)
9
+ * Copy/paste and fix feed-normalizer clean_html method, drop feed-normalizer dependency
10
+
11
+ === 0.7.1 / 2009-09-24
12
+
13
+ * Fix nil_or_empty? to strip whitespace from strings
14
+
15
+ === 0.7 / 2009-09-24
16
+
17
+ * Design decision: all element and attribute names will be stored as lower-case. They may still
18
+ be accessed using upper case, since keys will be normalized by all accessors.
19
+ * Design decision: RDF will be dealt with at parse time: elements with rdf:resource attributes will be
20
+ replaced by the actual, referenced elements. Ordering of the referring elements will be preserved.
21
+ * Removed the concept of ghost tags.
22
+
23
+ === 0.6.5 / 2009-09-24
24
+
25
+ * Fix :truncHtml completely by requiring active_support.
26
+
27
+ === 0.6.4 / 2009-09-23
28
+
29
+ * Roll version to make github happy.
30
+
31
+ === 0.6.3 / 2009-09-23
32
+
33
+ * Fix truncHtml: use code by Henrik Nyh, which in turn uses Hpricot
34
+
35
+ === 0.6.2 / 2009-09-23
36
+
37
+ * Fix content-parsing regular expression to correctly handle closed elements
38
+ * Reverse earlier design decision: keep namespaces for attributes.
39
+
40
+ === 0.6.1 / 2009-09-23
41
+
42
+ * Improve handling of rdf:items. From now on, .items will forward to .item_array. The rdf items can still be accessed by [:items_array] or .items_array.
43
+
44
+ === 0.6 / 2009-09-23
45
+
46
+ * Fix handling of the items element (mostly affects RSS 1.0 documents)
47
+ * Make attribute naming consistent
48
+ * Design decision: attributes can only ever have a single value, so they will always be stored as scalars
49
+ rather than arrays. This will also nicely resolve any possible collisions between attribute and tag names.
50
+
51
+ === 0.5.4 / 2009-09-22
52
+
53
+ * Minor improvements to to_indented_s
54
+ * Fix tag names: change all tags with namespaces to the cleaned version (unquote, ':' replaced with '_')
55
+ * Design decision: all attribute names will have their namespaces stripped; namespaces are generally
56
+ treated as optional (even if they aren't technically so) and it's annoying to have to check both forms;
57
+ this decision may be reversed if there are found to be conflicts
58
+
59
+ === 0.5.3 / 2009-09-22
60
+
61
+ * Roll version to test GitHub weirdness.
62
+
63
+ === 0.5.2 / 2009-09-22
64
+
65
+ * Improve to_s method for prettier array display.
66
+
67
+ === 0.5.1 / 2009-09-21
68
+
69
+ * Update example code
70
+ * Bug fix: call_virtual_method has invalid return if neither a key nor any of its aliases has a value
71
+ * Subsequent releases will follow standard versioning model of "major.minor.bugfix"
72
+
73
+ === 0.5 / 2009-09-21
74
+
75
+ * Special handling for atom id tag
76
+ * to_indented_str method, which creates a pretty output for a FeedData
77
+ * Improved to_s method that delegates to to_indented_str
78
+
79
+ === 0.4 / 2009-09-20
80
+
81
+ * Expose call_virtual_method as public
82
+ * Change 'name' argument of call_virtual_method to 'sym'
83
+ * Add default value for call_virtual_method 'args' argument
84
+ * Add :'media:content' and :'content:encoded' as ext tags
85
+ * fix use of FeedNormalizer in :cleanHtml transformation
86
+
87
+ === 0.3 / 2009-09-18
88
+
89
+ * Update example code
90
+ * Bug fix: call_virtual_method always throws exception
91
+ * Bug fix: responds_to? -> respond_to? and rels -> :rels
92
+
93
+ === 0.2 / 2009-09-12
94
+
95
+ * Change bang mods to more flexible transformations framework.
96
+ * Add additional transformation functions.
97
+ * Add methods for RSS/Atom emulation that automatically add appropriate aliases.
98
+ * Add empty_string_for_nil and error_on_missing_key options.
99
+ * Add support for parsing only certain rels in the strict parser.
100
+
1
101
  === 0.1 / 2009-09-03
2
102
 
3
103
  * Everything is new. First release.
data/Manifest.txt CHANGED
@@ -3,5 +3,7 @@ Manifest.txt
3
3
  README.txt
4
4
  Rakefile
5
5
  lib/feedme.rb
6
+ lib/truncator.rb
7
+ lib/util.rb
6
8
  examples/rocketboom.rb
7
9
  examples/rocketboom.rss
data/README.txt CHANGED
@@ -1,6 +1,6 @@
1
1
  = feedme
2
2
 
3
- * http://feedme.rubyforge.org
3
+ * http://wiki.github.com/jdidion/feedme
4
4
 
5
5
  == DESCRIPTION:
6
6
 
@@ -24,76 +24,143 @@ The API is similar to SimpleRSS:
24
24
  require 'open-uri'
25
25
 
26
26
  rss = FeedMe.parse open('http://slashdot.org/index.rdf')
27
-
28
- rss.version # => 1.0
27
+ rss.version # => 1.0
29
28
  rss.channel.title # => "Slashdot"
30
29
  rss.channel.link # => "http://slashdot.org/"
31
30
  rss.items.first.link # => "http://books.slashdot.org/article.pl?sid=05/08/29/1319236&from=rss"
32
31
 
33
- But since the parser can read Atom feeds as easily as RSS feeds, there are optional aliases that allow more atom like reading:
32
+ But since the parser can read Atom feeds as easily as RSS feeds, there are aliases that allow more Atom-like reading:
34
33
 
35
34
  rss.feed.title # => "Slashdot"
36
35
  rss.feed.link # => "http://slashdot.org/"
37
36
  rss.entries.first.link # => "http://books.slashdot.org/article.pl?sid=05/08/29/1319236&from=rss"
38
-
39
- Under the covers, all content is stored in arrays. This means that you can access all content for a tag that appears multiple times (i.e. category):
40
-
41
- rss.items.first.category_array # => ["News for Nerds", "Technology"]
42
- rss.items.first.category # => "News for Nerds"
43
-
37
+
38
+ Under the covers, all element values are stored in arrays. This means that you can access all content for an element that appears multiple times (e.g. category):
39
+
40
+ rss.items.first.category_array # => ["News for Nerds", "Technology"]
41
+ rss.items.first.category # => "News for Nerds"
42
+
44
43
  You also have access to all the attributes as well as tag values:
45
44
 
46
- rss.items.first.guid.isPermaLink # => "true"
47
- rss.items.first.guid.content # => http://books.slashdot.org/article.pl?sid=05/08/29/1319236
45
+ rss.items.first.guid.isPermaLink # => "true"
46
+ rss.items.first.guid.content # => http://books.slashdot.org/article.pl?sid=05/08/29/1319236
48
47
 
49
48
  FeedMe also adds some syntactic sugar that makes it easy to get the information you want:
50
49
 
51
- rss.items.first.category? # => true
52
- rss.items.first.category_count # => 2
53
- rss.items.first.guid_content # => http://books.slashdot.org/article.pl?sid=05/08/29/1319236
50
+ rss.items.first.category? # => true
51
+ rss.items.first.category_count # => 2
52
+ rss.items.first.guid_value # => http://books.slashdot.org/article.pl?sid=05/08/29/1319236
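The array-backed storage and the virtual accessors described above can be sketched in a few lines. `SketchNode` below is a hypothetical class written only for illustration; it is not FeedMe's FeedData implementation, and it covers just the plain, `_array`, `_count`, and `?` forms.

```ruby
# Hypothetical illustration only; not FeedMe's actual FeedData class.
class SketchNode
  def initialize
    # every tag name maps to an array of values, even if it appears once
    @data = Hash.new { |hash, key| hash[key] = [] }
  end

  # append a value for a tag (a tag may repeat within one element)
  def add(tag, value)
    @data[tag.to_sym] << value
    self
  end

  # resolve the virtual accessors from the suffix of the method name
  def method_missing(name, *args)
    str = name.to_s
    if str.end_with?('_array')
      values_for(str.sub(/_array\z/, ''))        # all values
    elsif str.end_with?('_count')
      values_for(str.sub(/_count\z/, '')).size   # number of values
    elsif str.end_with?('?')
      !values_for(str.chomp('?')).empty?         # is there any value?
    else
      values_for(str).first                      # first value, or nil
    end
  end

  private

  # look up values without creating an entry for unknown tags
  def values_for(tag)
    @data.fetch(tag.to_sym, [])
  end
end

ITEM = SketchNode.new
ITEM.add(:category, 'News for Nerds').add(:category, 'Technology')

ITEM.category_array  # every category value
ITEM.category        # only the first category value
ITEM.category_count  # how many category values exist
ITEM.category?       # whether any category value exists
```

Keeping every value in an array means the scalar accessor is just a view of the first element, so a tag that unexpectedly repeats never loses data.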
54
53
 
55
54
  There are two different parsers that you can use, depending on your needs. The default parser is "promiscuous," meaning that it parses all tags. There is also a strict parser that only parses tags specified in a list. Here is how you create the different types of parsers:
56
-
57
- FeedMe.parse(source) # parse using the default (promiscuous) parser
58
- FeedMe::ParserBuilder.new.parse(source) # equivalent to the previous line
59
- FeedMe::StrictParserBuilder.new.parse(source) # only parse certain tags
55
+
56
+ FeedMe.parse(source) # parse using the default (promiscuous) parser
57
+ FeedMe::ParserBuilder.new.parse(source) # equivalent to the previous line
58
+ FeedMe.parse_strict(source)
59
+ FeedMe::StrictParserBuilder.new.parse(source) # only parse certain tags
60
+
61
+ The FeedMe class methods and the parser builder constructors also accept an options hash. Options are also passed on to the Parser constructor. Currently, only two options are available:
62
+
63
+ 1. :empty_string_for_nil => false # return the empty string instead of a nil value
64
+ 2. :error_on_missing_key => false # raise an error if a specified key or virtual method does not exist (otherwise nil is returned)
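As a rough sketch of how these two options interact, the hypothetical `OptionedNode` class below looks up a key and then applies each option in turn. The class and the `FEED_FIELDS` data are assumptions made for illustration; only the option names and their documented meaning come from the README.

```ruby
# Hypothetical illustration only; not FeedMe's actual lookup code.
class OptionedNode
  def initialize(data, options = {})
    @data = data
    @options = options
  end

  def method_missing(name, *args)
    # :error_on_missing_key => raise instead of returning nil
    if !@data.key?(name) && @options[:error_on_missing_key]
      raise NameError, "no such key: #{name}"
    end
    value = @data[name]
    # :empty_string_for_nil => substitute '' for nil values
    value = '' if value.nil? && @options[:empty_string_for_nil]
    value
  end
end

FEED_FIELDS = { title: 'Slashdot' }.freeze

OptionedNode.new(FEED_FIELDS).author                              # nil
OptionedNode.new(FEED_FIELDS, empty_string_for_nil: true).author  # ''
begin
  OptionedNode.new(FEED_FIELDS, error_on_missing_key: true).author
rescue NameError => e
  warn e.message  # reports the missing key
end
```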
60
65
 
61
66
  The strict parser can be extended by adding new tags to parse:
62
67
 
63
- builder = FeedMe::StrictParserBuilder.new
64
- builder.rss_tags << :some_new_tag
65
- builder.rss_item_tags << :'item+myrel' # parse an item that has a custom rel type
66
- builder.item_ext_tags << :'feedburner:origLink' # parse an extension tag - one that has a specific namespace
67
-
68
+ builder = FeedMe::StrictParserBuilder.new
69
+ builder.rss_tags << :some_new_tag
70
+ builder.rss_item_tags << :'item+myrel' # parse an item that has a custom rel type
71
+ builder.item_ext_tags << :feedburner_origLink # parse an extension tag - one that has a specific
72
+ # namespace (use '_', not ':', to separate namespace
73
+ # from attribute name)
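The naming rule in the comment above (use '_' rather than ':'), together with the changelog's decision to store all names in lower case, can be pictured as one small normalization step. `normalize_tag` is a hypothetical helper written for this illustration; FeedMe's own clean_tag may differ in detail.

```ruby
# Hypothetical helper for illustration; FeedMe's clean_tag may differ.
def normalize_tag(tag)
  # lower-case the name and replace the namespace separator ':' with '_'
  tag.to_s.downcase.tr(':', '_').to_sym
end

normalize_tag(:'feedburner:origLink')  # same key as :feedburner_origlink
normalize_tag('pubDate')               # same key as :pubdate
```

If every accessor normalizes its key this way, callers can keep writing the mixed-case names from the feed specification while the stored keys stay uniform.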
74
+
68
75
  Either parser can be extended by adding aliases to existing tags:
69
76
 
70
- builder.aliases[:updated] => :pubDate # now you can always access the updated date using :updated, regardless of whether it's an RSS or Atom feed
77
+ builder.aliases[:updated] = :pubDate # now you can always access the updated date using :updated,
78
+ # regardless of whether it's an RSS or Atom feed
79
+
80
+ If you don't know ahead of time what type of feed you'll be parsing, you can tell FeedMe to always emulate RSS or Atom. These methods just add a bunch of aliases:
81
+
82
+ builder.emulate_rss!
83
+ builder.emulate_atom!
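One way to picture what such aliases do: when a key has no value of its own, each alias is tried in order and the first one with a value wins. The `resolve` method and the sample hashes below are assumptions made for illustration; the sketch is not recursive and does not handle path-style aliases such as :'author/name'.

```ruby
# Hypothetical resolver for illustration; not FeedMe's call_virtual_method.
def resolve(data, aliases, key)
  return data[key] unless data[key].nil?
  # an alias may be a single name or an ordered list of fallbacks
  Array(aliases[key]).each do |candidate|
    value = data[candidate]
    return value unless value.nil?
  end
  nil
end

# a few RSS-style names mapped onto Atom names
RSS_STYLE_ALIASES = {
  pubdate: [:published, :updated],
  copyright: :rights
}.freeze

ATOM_ENTRY = { updated: '2009-12-14T00:00:00Z', rights: 'CC BY 3.0' }.freeze

resolve(ATOM_ENTRY, RSS_STYLE_ALIASES, :pubdate)    # falls back to :updated
resolve(ATOM_ENTRY, RSS_STYLE_ALIASES, :copyright)  # falls back to :rights
```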
84
+
85
+ Transformations are another bit of syntactic sugar. These are modifications that can be applied to feed content. There is a default transformation that can be applied by adding '!' to the tag name.
86
+
87
+ rss.entry.content # => <div>Some great stuff</div>
88
+ rss.entry.content! # => Some great stuff
89
+
90
+ The default transformation can be changed:
91
+
92
+ builder.default_transformation = [ :cleanHtml ]
93
+
94
+ Custom transformations are defined by mapping one or more transformation functions to a suffix:
95
+
96
+ builder.transformations['clean'] = [ :cleanHtml ]
97
+
98
+ rss.entry.content # => <div>This is a bunch of text</div><p></p></html>
99
+ rss.entry.content_clean # => <div>This is a bunch of text</div>
100
+
101
+ Alternatively, you can apply an arbitrary set of transformations via the transform method:
71
102
 
72
- Another bit of syntactic sugar is the "bang mod." These are modifications that can be applied to feed content by adding '!' to the tag name. The default bang mod is to strip HTML tags from the content.
103
+ rss.entry.transform(:content, [ :clean, [ :trunc, 50 ] ])
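The array form used in that call, where each entry is either a bare name or a [name, argument] pair, can be processed with a simple fold. The sketch below is an assumption about how such a list might be applied; `SKETCH_FNS`, `apply_transformations`, and the two sample functions are hypothetical and are not FeedMe's built-in transformations.

```ruby
# Hypothetical registry of transformation functions (illustration only).
SKETCH_FNS = {
  strip: proc { |str| str.strip },
  trunc: proc { |str, len| str[0, len.to_i] }  # plain character truncation
}.freeze

# Apply each spec in order; a spec is :name or [:name, arg, ...].
def apply_transformations(value, specs, fns = SKETCH_FNS)
  specs.inject(value) do |result, spec|
    name, *args = Array(spec)
    fn = fns.fetch(name) { raise ArgumentError, "unknown transformation: #{name.inspect}" }
    fn.call(result, *args)
  end
end

# strip surrounding whitespace, then keep the first 4 characters
apply_transformations('  Some great stuff  ', [:strip, [:trunc, 4]])
```

Passing arguments as array elements rather than encoding them in the symbol keeps them typed, so an argument such as a Regexp or an Integer does not need to be parsed back out of a name.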
73
104
 
74
- rss.entry.content # => <div>Some great stuff</div>
75
- rss.entry.content! # => Some great stuff
76
-
77
- You can create your own bang mods. The following is an example of a bang mod that takes an argument. The first line is how bang mods are added, and the third line tells the builder to actually apply this bang mod when the '!' suffix is used. Note that bang mod names may only contain alphanumeric characters. Argument values are specified at the end separated by underscores.
105
+ You can create your own transformation function. The following is an example of a transformation function that takes an argument. Note that transformation function names may only contain alphanumeric characters. Argument values are specified at the end separated by underscores.
106
+
107
+ builder.transformation_fns[:wrap] = proc {|str, col|
108
+ str.gsub(/(.{1,#{col}})( +|$\n?)|(.{1,#{col}})/, "\\1\\3\n").strip
109
+ }
110
+ builder.transformations['wrap'] = [ :wrap_10 ]
111
+
112
+ rss.entry.content # => This is a bunch of text
113
+ rss.entry.content_wrap # => This is a
114
+ bunch of
115
+ text
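The wrapping expression can be tried on its own, outside of FeedMe. The sketch below reuses the regular expression from the example above inside a plain method; the method name `wrap_text` is an assumption made for illustration.

```ruby
# Same regular expression as the :wrap example, wrapped in a plain method.
def wrap_text(str, col)
  # first alternative: up to +col+ characters followed by spaces or end of line;
  # second alternative: a hard break for words longer than +col+
  str.gsub(/(.{1,#{col}})( +|$\n?)|(.{1,#{col}})/, "\\1\\3\n").strip
end

puts wrap_text('This is a bunch of text', 10)
# This is a
# bunch of
# text
```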
78
116
 
79
- # wrap content at a specified number of columns
80
- builder.bang_mod_fns[:wrap] => proc {|str, col| str.gsub(/(.{1,#{col}})( +|$\n?)|(.{1,#{col}})/, "\\1\\3\n").strip }
81
- builder.bang_mods << :wrap_80
117
+ The transformation functions available by default are:
82
118
 
119
+ 1. :stripHtml - described above
120
+ 2. :cleanHtml - ** Requires FeedNormalizer (which in turn requires Hpricot) **
121
+
122
+ rss.entry_array[0].content # => 1 > 2
123
+ rss.entry_array[0].content! # => 1 &gt; 2
124
+
125
+ rss.entry_array[1].content # => <div>Some great stuff</div><p></p></html>
126
+ rss.entry_array[1].content! # => <div>Some great stuff</div>
127
+
128
+ 3. :wrap - takes number of columns as a parameter. Respects word boundaries. Example of :wrap_10:
129
+
130
+ rss.entry.content # => This is a bunch of text
131
+ rss.entry.content! # => This is a
132
+ bunch of
133
+ text
134
+
135
+ 4. :trunc - truncates text to a certain length. Example of :trunc_10:
136
+
137
+ rss.entries.first.content # => This is a long long long sentence
138
+ rss.entries.first.content! # => This is a
139
+
140
+ 5. :truncHtml - truncates the content inside the first set of HTML tags, but preserves the tags. ** Requires ActiveSupport and Hpricot ** Example of :truncHtml_10:
141
+
142
+ rss.entries.first.content # => <div>This is a long long long sentence</div></html>
143
+ rss.entries.first.content! # => <div>This is a </div></html>
144
+
145
+ 6. :regexp - apply a regular expression and extract the capture groups
146
+
147
+ rss.entries.first.content # => This is a long long long entry
148
+ rss.entries.first.transform(:content, [ :regexp, /(This is a long ).*(entry)/ ]) # => This is a long entry
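The capture-group extraction shown for :regexp can be sketched as follows. Note that the :regexp proc in this diff's lib/feedme.rb appears to return only the first capture group, so the joining behaviour below follows the README's example output rather than that code; `extract_captures` is a hypothetical helper.

```ruby
# Hypothetical helper: concatenate every capture group of the first match.
def extract_captures(str, pattern)
  match = Regexp.new(pattern).match(str)
  match.nil? ? nil : match.captures.join
end

extract_captures('This is a long long long entry', /(This is a long ).*(entry)/)
# => "This is a long entry"
```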
149
+
83
150
  In order to prevent clashes between tag/attribute names and the parser class' instance variables, all instance variables are prefixed with 'fm_'. They are:
84
-
85
- fm_source # the original, unparsed source
86
- fm_options # the options passed to the parser constructor
87
- fm_type # the feed type
88
- fm_tags # the tags the parser looks for in the source
89
- fm_parsed # the list of tags the parser actually found
90
- fm_unparsed # the list of tags that appeared in the feed but were not parsed (useful for debugging)
151
+
152
+ fm_source # the original, unparsed source
153
+ fm_options # the options passed to the parser constructor
154
+ fm_type # the feed type
155
+ fm_tags # the tags the parser looks for in the source
156
+ fm_parsed # the list of tags the parser actually found
157
+ fm_unparsed # the list of tags that appeared in the feed but were not parsed (useful for debugging)
91
158
 
92
159
  Additionally, there are several variables that are available at every level of the parse tree:
93
160
 
94
- fm_builder # the ParserBuilder that created the parser
95
- fm_parent # the container of the current level of the parse tree
96
- fm_tag_name # the name of the rss/atom tag whose content is contained in this level of the tree
161
+ fm_builder # the ParserBuilder that created the parser
162
+ fm_parent # the container of the current level of the parse tree
163
+ fm_tag_name # the name of the rss/atom tag whose content is contained in this level of the tree
97
164
 
98
165
  === A word on RSS/Atom Versions
99
166
 
@@ -107,9 +174,20 @@ Due to various incompatibilities between different RSS versions, it is strongly
107
174
 
108
175
  == INSTALL:
109
176
 
110
- * gem install feedme
111
- * http://rubyforge.org/projects/feedme
177
+ * gem install jdidion-feedme (Add GitHub as a gem source: gem sources -a http://gems.github.com)
178
+ * http://github.com/jdidion/feedme/downloads
179
+
180
+ To use certain features of FeedMe, some dependencies are required:
181
+ * To use the :truncHtml transformation for truncating HTML content, ActiveSupport and Hpricot are required
182
+
183
+ sudo gem install activesupport
184
+ sudo gem install hpricot
185
+
186
+ * To use the :cleanHtml transformation for sanitizing HTML, FeedNormalizer and Hpricot are required
187
+
188
+ sudo gem install feed-normalizer
189
+ sudo gem install hpricot
112
190
 
113
191
  == LICENSE:
114
192
 
115
- This work is licensed under the Creative Commons Attribution 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
193
+ This work is licensed under the Creative Commons Attribution 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
data/Rakefile CHANGED
@@ -1,7 +1,24 @@
1
1
  require 'rubygems'
2
- require 'hoe'
2
+ require 'jeweler'
3
3
 
4
- Hoe.spec 'feedme' do |hoe|
5
- hoe.developer('John Didion', 'jdidion@rubyforge.org')
6
- hoe.rubyforge_name = 'feedme'
4
+ tasks = Jeweler::Tasks.new do |s|
5
+ s.name = "feedme"
6
+ s.authors = ["John Didion"]
7
+ s.description = %q{A simple, flexible, and extensible RSS and Atom parser for Ruby. Based on the popular SimpleRSS library, but with many nice extra features.}
8
+ s.email = ["code@didion.net"]
9
+ s.extra_rdoc_files = ["History.txt", "Manifest.txt", "README.txt"]
10
+ s.files = ["History.txt", "Manifest.txt", "README.txt", "Rakefile",
11
+ "lib/feedme.rb", "lib/hpricot-util.rb", "lib/nokogiri-util.rb",
12
+ "lib/html-cleaner.rb", "lib/util.rb", "examples/rocketboom.rb",
13
+ "examples/rocketboom.rss", "test/test_helper.rb"]
14
+ s.homepage = %q{http://wiki.github.com/jdidion/feedme}
15
+ s.rdoc_options = ["--main", "README.txt"]
16
+ s.require_paths = ["lib"]
17
+ s.rubyforge_project = %q{feedme}
18
+ s.summary = %q{A simple, flexible, and extensible RSS and Atom parser for Ruby}
19
+ s.test_files = ["test/test_helper.rb"]
7
20
  end
21
+ tasks.jeweler.remote = 'github'
22
+ Jeweler::GemcutterTasks.new
23
+
24
+
@@ -1,5 +1,6 @@
1
- #require 'feedme'
2
- require '../lib/feedme'
1
+ #!/usr/bin/ruby
2
+ require 'rubygems'
3
+ require 'feedme'
3
4
  require 'net/http'
4
5
 
5
6
  def fetch(url)
@@ -24,13 +25,13 @@ end
24
25
  # create a new ParserBuilder
25
26
  builder = FeedMe::ParserBuilder.new
26
27
  # add a transformation to wrap content to 80 columns
27
- builder.bang_mods << :wrap_80
28
+ builder.default_transformation << :wrap_80
28
29
 
29
30
  # parse the rss feed
30
31
  rss = builder.parse(content)
31
32
 
32
33
  # equivalent to rss.channel.title
33
- puts "#{rss.type} Feed: #{rss.title}"
34
+ puts "#{rss.class} Feed: #{rss.title}"
34
35
 
35
36
  # use a virtual method...this one a shortcut to rss.items.size
36
37
  puts "#{rss.item_count} items"
data/lib/feedme.rb CHANGED
@@ -1,54 +1,84 @@
1
- ####################################################################################
2
- # FeedMe v0.1
3
- #
4
- # FeedMe is an easy to use parser for RSS and Atom files. It is based on SimpleRSS,
5
- # but has some improvements that make it worth considering:
6
- # 1. Support for attributes
7
- # 2. Support for nested elements
8
- # 3. Support for elements that appear multiple times
9
- # 4. Syntactic sugar that makes it easier to get at the information you want
10
- #
11
- # One word of caution: FeedMe will be maintained only so long as SimpleRSS does not
12
- # provide the above features. I will try to keep FeedMe's API compatible with
13
- # SimpleRSS so that it will be easy for users to switch if/when necessary.
14
- ####################################################################################
15
-
16
1
  require 'cgi'
17
2
  require 'time'
3
+ require 'util.rb'
18
4
 
19
5
  module FeedMe
20
- VERSION = "0.1"
6
+ # The current version of FeedMe.
7
+ VERSION = "0.7.2"
21
8
 
22
- # constants for the feed type
9
+ # The value of Parser#fm_type for RSS feeds.
23
10
  RSS = :RSS
11
+ # The value of Parser#fm_type for RDF (RSS 1.0) feeds.
12
+ RDF = :RDF
13
+ # The value of Parser#fm_type for Atom feeds.
24
14
  ATOM = :ATOM
25
15
 
26
- # the key used to access the content element of a mixed tag
16
+ # The key used to access the content element of a mixed tag.
27
17
  CONTENT_KEY = :content
28
18
 
19
+ # Helper libraries for HTML functions
20
+ NOKOGIRI_HELPER = 'nokogiri-util.rb'
21
+ HPRICOT_HELPER = 'hpricot-util.rb'
22
+
23
+ # Parse a feed using the promiscuous parser.
29
24
  def FeedMe.parse(source, options={})
30
- ParserBuilder.new.parse(source, options)
25
+ ParserBuilder.new(options).parse(source)
31
26
  end
32
27
 
28
+ # Parse a feed using the strict parser.
33
29
  def FeedMe.parse_strict(source, options={})
34
- StrictParserBuilder.new.parse(source, options)
30
+ StrictParserBuilder.new(options).parse(source)
35
31
  end
36
32
 
33
+ # This class is used to create promiscuous parsers.
37
34
  class ParserBuilder
38
- attr_accessor :rss_tags, :rss_item_tags, :atom_tags, :atom_entry_tags,
39
- :date_tags, :value_tags, :ghost_tags, :aliases,
40
- :bang_mods, :bang_mod_fns
35
+ # The options passed to this ParserBuilder's constructor.
36
+ attr_reader :options
37
+ # The tags that are parsed for RSS feeds.
38
+ attr_accessor :rss_tags
39
+ # The subtags of item elements that are parsed for RSS feeds.
40
+ attr_accessor :rss_item_tags
41
+ # The tags that are parsed for Atom feeds.
42
+ attr_accessor :atom_tags
43
+ # The subtags of entry elements that are parsed for Atom feeds.
44
+ attr_accessor :atom_entry_tags
45
+ # The names of tags that should be parsed as date values.
46
+ attr_accessor :date_tags
47
+ # An array of names of attributes/subtags whose values can be
48
+ # used as the default value of a mixed element.
49
+ attr_accessor :value_tags
50
+ # Tags to use for element value when specific tag isn't specified
51
+ attr_accessor :default_value_tags
52
+ # A hash of attribute/tag name aliases.
53
+ attr_accessor :aliases
54
+ # An array of the transformation functions applied when the !
55
+ # suffix is added to the attribute/tag name.
56
+ attr_accessor :default_transformation
57
+ # Mapping of transformation names to functions. Each key is a
58
+ # suffix that can be appended to an attribute/tag name, and
59
+ # the value is an array of transformation function names that
60
+ # are applied when that transformation is used.
61
+ attr_accessor :transformations
62
+ # Mapping of transformation function names to Procs.
63
+ attr_accessor :transformation_fns
64
+ # the helper library used for HTML transformations
65
+ attr_accessor :html_helper_lib
41
66
 
42
- # the promiscuous parser only has to know about tags that have nested subtags
43
- def initialize
67
+ # Create a new ParserBuilder. Allowed options are:
68
+ # * :empty_string_for_nil => false # return the empty string instead of a nil value
69
+ # * :error_on_missing_key => false # raise an error if a specified key or virtual
70
+ # method does not exist (otherwise nil is returned)
71
+ def initialize(options={})
72
+ @options = options
73
+
44
74
  # rss tags
45
75
  @rss_tags = [
46
76
  {
47
77
  :image => nil,
48
- :textInput => nil,
49
- :skipHours => nil,
50
- :skipDays => nil,
51
- :items => [{ :'rdf:Seq' => nil }],
78
+ :textinput => nil,
79
+ :skiphours => nil,
80
+ :skipdays => nil,
81
+ :items => [{ :rdf_seq => nil }],
52
82
  #:item => @rss_item_tags
53
83
  }
54
84
  ]
@@ -70,14 +100,15 @@ module FeedMe
70
100
  ]
71
101
 
72
102
  # tags whose value is a date
73
- @date_tags = [ :pubDate, :lastBuildDate, :published, :updated, :'dc:date', :expirationDate ]
103
+ @date_tags = [ :pubdate, :lastbuilddate, :published, :updated, :dc_date,
104
+ :expirationdate ]
74
105
 
75
- # tags that can be used as the default value for a tag with attributes
76
- @value_tags = [ CONTENT_KEY, :href ]
106
+ # tags that can be used as the default value for a mixed element
107
+ @value_tags = {
108
+ :media_content => :url
109
+ }
110
+ @default_value_tags = [ CONTENT_KEY, :href, :url ]
77
111
 
78
- # tags that don't become part of the parsed object tree
79
- @ghost_tags = [ :'rdf:Seq' ]
80
-
81
112
  # tag/attribute aliases
82
113
  @aliases = {
83
114
  :items => :item_array,
@@ -87,64 +118,130 @@ module FeedMe
87
118
  :link => :'link+self'
88
119
  }
89
120
 
90
- # bang mods
91
- @bang_mods = [ :stripHtml ]
92
- @bang_mod_fns = {
93
- :stripHtml => proc {|str| str.gsub(/<\/?[^>]*>/, "").strip },
94
- :wrap => proc {|str, col| str.gsub(/(.{1,#{col}})( +|$\n?)|(.{1,#{col}})/, "\\1\\3\n").strip }
121
+ # transformations
122
+ @html_helper_lib = HPRICOT_HELPER
123
+ @default_transformation = [ :cleanHtml ]
124
+ @transformations = {}
125
+ @transformation_fns = {
126
+ # remove all HTML tags
127
+ :stripHtml => proc do |str|
128
+ require @html_helper_lib
129
+ FeedMe.html_helper.strip_html(str)
130
+ end,
131
+
132
+ # clean HTML content using FeedNormalizer's HtmlCleaner class
133
+ :cleanHtml => proc do |str|
134
+ require @html_helper_lib
135
+ FeedMe.html_helper.clean_html(str)
136
+ end,
137
+
138
+ # wrap text at a certain number of characters (respecting word boundaries)
139
+ :wrap => proc do |str, col|
140
+ str.gsub(/(.{1,#{col}})( +|$\n?)|(.{1,#{col}})/, "\\1\\3\n").strip
141
+ end,
142
+
143
+ # truncate text, respecting word boundaries
144
+ :trunc => proc {|str, wordcount| str.trunc(wordcount.to_i) },
145
+
146
+ # truncate HTML and leave enclosing HTML tags
147
+ :truncHtml => proc do |str, wordcount|
148
+ require @html_helper_lib
149
+ FeedMe.html_helper.truncate_html(str, wordcount.to_i)
150
+ end,
151
+
152
+ :regexp => proc do |str, regexp|
153
+ match = Regexp.new(regexp).match(str)
154
+ match.nil? ? nil : match[1]
155
+ end,
95
156
  }
96
157
  end
97
158
 
159
+ # Prepare tag list for an RSS feed.
98
160
  def all_rss_tags
99
161
  all_tags = rss_tags.dup
100
162
  all_tags[0][:item] = rss_item_tags.dup
101
163
  return all_tags
102
164
  end
103
165
 
166
+ # Prepare tag list for an Atom feed.
104
167
  def all_atom_tags
105
168
  all_tags = atom_tags.dup
106
169
  all_tags[0][:entry] = atom_entry_tags.dup
107
170
  return all_tags
108
171
  end
109
172
 
110
- def parse(source, options={})
173
+ # Add aliases so that Atom feed elements can be accessed
174
+ # using the names of their RSS counterparts.
175
+ def emulate_rss!
176
+ aliases.merge!({
177
+ :guid => :id, # this alias never actually gets used; see FeedData#id
178
+ :copyright => :rights,
179
+ :pubdate => [ :published, :updated ],
180
+ :lastbuilddate => [ :updated, :published ],
181
+ :description => [ :content, :summary ],
182
+ :managingeditor => [ :'author/name', :'contributor/name' ],
183
+ :webmaster => [ :'author/name', :'contributor/name' ],
184
+ :image => [ :icon, :logo ]
185
+ })
186
+ end
187
+
188
+ # Add aliases so that RSS feed elements can be accessed
189
+ # using the names of their Atom counterparts.
190
+ def emulate_atom!
191
+ aliases.merge!({
192
+ :rights => :copyright,
193
+ :content => :description,
194
+ :contributor => :author,
195
+ :id => [ :guid_value, :link ],
196
+ :author => [ :managingeditor, :webmaster ],
197
+ :updated => [ :lastbuilddate, :pubdate ],
198
+ :published => [ :pubDate, :lastbuilddate ],
199
+ :icon => :'image/url',
200
+ :logo => :'image/url',
201
+ :summary => :'description_trunc'
202
+ })
203
+ end
204
+
205
+ # Parse +source+ using a +Parser+ created from this +ParserBuilder+.
206
+ def parse(source)
111
207
  Parser.new(self, source, options)
112
208
  end
113
209
  end
114
210
 
211
+ #
115
212
  class StrictParserBuilder < ParserBuilder
116
- attr_accessor :feed_ext_tags, :item_ext_tags
213
+ attr_accessor :feed_ext_tags, :item_ext_tags, :rels
117
214
 
118
- def initialize
119
- super()
215
+ def initialize(options={})
216
+ super(options)
120
217
 
121
218
  # rss tags
122
219
  @rss_tags = [
123
220
  {
124
221
  :image => [ :url, :title, :link, :width, :height, :description ],
125
- :textInput => [ :title, :description, :name, :link ],
126
- :skipHours => [ :hour ],
127
- :skipDays => [ :day ],
222
+ :textinput => [ :title, :description, :name, :link ],
223
+ :skiphours => [ :hour ],
224
+ :skipdays => [ :day ],
128
225
  :items => [
129
226
  {
130
- :'rdf:Seq' => [ :'rdf:li' ]
227
+ :rdf_seq => [ :rdf_li ]
131
228
  },
132
- :'rdf:Seq'
229
+ :rdf_seq
133
230
  ],
134
231
  #:item => @item_tags
135
232
  },
136
233
  :title, :link, :description, # required
137
- :language, :copyright, :managingEditor, :webMaster, # optional
138
- :pubDate, :lastBuildDate, :category, :generator,
234
+ :language, :copyright, :managingeditor, :webmaster, # optional
235
+ :pubdate, :lastbuilddate, :category, :generator,
139
236
  :docs, :cloud, :ttl, :rating,
140
- :image, :textInput, :skipHours, :skipDays, :item, # have subtags
237
+ :image, :textinput, :skiphours, :skipdays, :item, # have subtags
141
238
  :items
142
239
  ]
143
240
  @rss_item_tags = [
144
241
  {},
145
242
  :title, :description, # required
146
243
  :link, :author, :category, :comments, :enclosure, # optional
147
- :guid, :pubDate, :source, :expirationDate
244
+ :guid, :pubdate, :source, :expirationdate
148
245
  ]
149
246
 
150
247
  #atom tags
@@ -157,9 +254,7 @@ module FeedMe
157
254
  },
158
255
  :id, :author, :title, :updated, # required
159
256
  :category, :contributor, :generator, :icon, :logo, # optional
160
- :'link+self', :'link+alternate', :'link+edit',
161
- :'link+replies', :'link+related', :'link+enclosure',
162
- :'link+via', :rights, :subtitle
257
+ :link, :rights, :subtitle
163
258
  ]
164
259
  @atom_entry_tags = [
165
260
  {
@@ -167,22 +262,25 @@ module FeedMe
167
262
  :contributor => person_tags
168
263
  },
169
264
  :id, :author, :title, :updated, :summary, # required
170
- :category, :content, :contributor, :'link+self',
171
- :'link+alternate', :'link+edit', :'link+replies',
172
- :'link+related', :'link+enclosure', :published,
173
- :rights, :source
265
+ :category, :content, :contributor, :link,
266
+ :published, :rights, :source
174
267
  ]
175
268
 
269
+ @rels = {
270
+ :link => [ 'self', 'alternate', 'edit', 'replies', 'related', 'enclosure', 'via' ]
271
+ }
272
+
176
273
  # extensions
177
274
  @feed_ext_tags = [
178
- :'dc:date', :'feedburner:browserFriendly',
179
- :'itunes:author', :'itunes:category'
275
+ :dc_date, :feedburner_browserfriendly,
276
+ :itunes_author, :itunes_category
180
277
  ]
181
278
  @item_ext_tags = [
182
- :'dc:date', :'dc:subject', :'dc:creator',
183
- :'dc:title', :'dc:rights', :'dc:publisher',
184
- :'trackback:ping', :'trackback:about',
185
- :'feedburner:origLink'
279
+ :dc_date, :dc_subject, :dc_creator,
280
+ :dc_title, :dc_rights, :dc_publisher,
281
+ :trackback_ping, :trackback_about,
282
+ :feedburner_origlink, :media_content,
283
+ :content_encoded
186
284
  ]
187
285
  end
188
286
 
@@ -202,46 +300,69 @@ module FeedMe
202
300
  class FeedData
203
301
  attr_reader :fm_tag_name, :fm_parent, :fm_builder
204
302
 
205
- def initialize(tag_name, parent, builder, attrs = {})
303
+ def initialize(tag_name, parent, builder)
206
304
  @fm_tag_name = tag_name
207
305
  @fm_parent = parent
208
306
  @fm_builder = builder
209
- @data = attrs.dup
307
+ @data = {}
210
308
  end
211
309
 
212
310
  def key?(key)
213
- @data.key?(key)
311
+ @data.key?(clean_tag(key))
214
312
  end
215
313
 
216
314
  def keys
217
315
  @data.keys
218
316
  end
219
317
 
318
+ def delete(key)
319
+ @data.delete(clean_tag(key))
320
+ end
321
+
322
+ def each
323
+ @data.each {|key, value| yield(key, value) }
324
+ end
325
+
326
+ def each_with_index
327
+ @data.each_with_index {|(key, value), index| yield(key, value, index) }
328
+ end
329
+
330
+ def size
331
+ @data.size
332
+ end
333
+
220
334
  def [](key)
221
- @data[key]
335
+ @data[clean_tag(key)]
222
336
  end
223
337
 
224
338
  def []=(key, value)
225
- @data[key] = value
339
+ @data[clean_tag(key)] = value
340
+ end
341
+
342
+ # special handling for atom id tags, due to conflict with
343
+ # ruby's Object#id method
344
+ def id
345
+ key?(:id) ? self[:id] : call_virtual_method(:id)
226
346
  end
227
347
 
228
348
  def to_s
229
- @data.to_s
349
+ to_indented_s
230
350
  end
231
351
 
232
- def method_missing(name, *args)
233
- call_virtual_method(name, args)
352
+ def to_indented_s(indent_step=2)
353
+ FeedMe.pretty_to_s(self, indent_step, 0, Proc.new do |key, value|
354
+ (value.is_a?(Array) && value.size == 1) ? [unarrayize(key), value.first] : [key, value]
355
+ end)
234
356
  end
235
357
 
236
- protected
237
-
238
- def clean_tag(tag)
239
- tag.to_s.gsub(':','_').intern
240
- end
241
-
242
- # generate a name for the array variable corresponding to a single-value variable
243
- def arrayize(key)
244
- return key + '_array'
358
+ def method_missing(name, *args)
359
+ result = begin
360
+ call_virtual_method(name, args)
361
+ rescue NameError
362
+ raise if fm_builder.options[:error_on_missing_key]
363
+ end
364
+ result = '' if result.nil? and fm_builder.options[:empty_string_for_nil]
365
+ result
245
366
  end
246
367
 
247
368
  # There are several virtual methods for each attribute/tag.
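The virtual-method lookup described in this comment can be illustrated with a much smaller class. This is only a sketch under stated assumptions: `TinyFeedData` is a hypothetical name, and it covers just the direct key, the `_array` fallback, the `?` suffix and the `_count` suffix, not the transformations, aliases or `+rel` handling of the real `call_virtual_method`.

```ruby
# Simplified stand-in for FeedData (hypothetical name, not the gem's class).
class TinyFeedData
  def initialize
    @data = {}
  end

  def key?(key)
    @data.key?(clean_tag(key))
  end

  def [](key)
    @data[clean_tag(key)]
  end

  def []=(key, value)
    @data[clean_tag(key)] = value
  end

  # Resolve unknown method names against the stored keys.
  def method_missing(name, *args)
    name_str = clean_tag(name).to_s
    array_key = "#{name_str}_array"
    if key?(name_str)
      self[name_str]                       # plain tag lookup
    elsif key?(array_key)
      self[array_key].first                # first element of a multi-valued tag
    elsif name_str.end_with?('?')
      begin
        !send(name_str[0..-2]).nil?        # tag? reports whether tag resolves
      rescue NameError
        false
      end
    elsif name_str =~ /\A(.+)_count\z/
      send("#{$1}_array").size             # tag_count is the array size
    else
      raise NameError.new("No such method '#{name_str}'", name)
    end
  end

  def respond_to_missing?(name, include_private = false)
    key?(name) || key?("#{name}_array") || super
  end

  private

  # Same normalization as the diff: lower-case, ':' becomes '_'.
  def clean_tag(tag)
    tag.to_s.downcase.gsub(':', '_').intern
  end
end

entry = TinyFeedData.new
entry['dc:creator'] = 'Ann'
entry[:link_array] = ['http://a.example/', 'http://b.example/']
entry.dc_creator  # value stored under :dc_creator
entry.link        # first element of :link_array
entry.link_count  # number of elements in :link_array
```

Unlike the gem, this sketch has no loop detection and no options such as `:error_on_missing_key`; an unresolved name always raises `NameError`.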
@@ -263,70 +384,146 @@ module FeedMe
263
384
  # array.size.
264
385
  # 7. If the tag name is of the form "tag+rel", the tag having the
265
386
  # specified rel value is returned
266
- def call_virtual_method(name, args, history=[])
387
+ def call_virtual_method(sym, args=[], history=[])
267
388
  # make sure we don't get stuck in an infinite loop
268
389
  history.each do |call|
269
- if call[0] == fm_tag_name and call[1] == name
270
- puts name
271
- puts self.inspect
272
- raise FeedMe::InfiniteCallLoopError.new(name, history)
390
+ if call[0] == fm_tag_name and call[1] == sym
391
+ raise FeedMe::InfiniteCallLoopError.new(sym, history)
273
392
  end
274
393
  end
275
- history << [ fm_tag_name, name ]
394
+ history << [ fm_tag_name, sym ]
276
395
 
277
- raw_name = name
278
- name = clean_tag(name)
396
+ name = clean_tag(sym)
279
397
  name_str = name.to_s
280
- array_key = clean_tag(arrayize(name.to_s))
281
-
282
- if name_str[-1,1] == '?'
398
+ array_key = arrayize(name.to_s)
399
+
400
+ result = if key? name
401
+ self[name]
402
+ elsif key? array_key
403
+ self[array_key].first
404
+ elsif name_str[-1,1] == '?'
283
405
  !call_virtual_method(name_str[0..-2], args, history).nil? rescue false
284
406
  elsif name_str[-1,1] == '!'
285
407
  value = call_virtual_method(name_str[0..-2], args, history)
286
- fm_builder.bang_mods.each do |bm|
287
- parts = bm.to_s.split('_')
288
- bm_key = parts[0].to_sym
289
- next unless fm_builder.bang_mod_fns.key?(bm_key)
290
- value = fm_builder.bang_mod_fns[bm_key].call(value, *parts[1..-1])
408
+ _transform(fm_builder.default_transformation, value)
409
+ elsif name_str =~ /(.+)_values/
410
+ call_virtual_method(arrayize($1), args, history).collect do |value|
411
+ _resolve_value value
291
412
  end
292
- return value
293
- elsif key? name
294
- self[name]
295
- elsif key? array_key
296
- self[array_key].first
297
413
  elsif name_str =~ /(.+)_value/
414
+ _resolve_value call_virtual_method($1, args, history)
415
+ elsif name_str =~ /(.+)_count/
416
+ call_virtual_method(arrayize($1), args, history).size
417
+ elsif name_str =~ /(.+)_(.+)/ && fm_builder.transformations.key?($2)
298
418
  value = call_virtual_method($1, args, history)
299
- if value.is_a?(FeedData)
300
- fm_builder.value_tags.each do |tag|
301
- return value.call_virtual_method(tag, args, history) rescue nil
302
- end
303
- else
304
- value
419
+ _transform(fm_builder.transformations[$2], value)
420
+ elsif name_str.include?('/') # this is only intended to be used internally
421
+ value = self
422
+ name_str.split('/').each do |p|
423
+ parts = p.split('_')
424
+ name = clean_tag(parts[0])
425
+ new_args = parts.size > 1 ? parts[1..-1] : args
426
+ value = (value.method(name).call(*new_args) rescue
427
+ value.call_virtual_method(name, new_args, history)) rescue nil
428
+ break if value.nil?
305
429
  end
306
- elsif name_str =~ /(.+)_count/
307
- call_virtual_method(clean_tag(arrayize($1)), args, history).size
308
- elsif name_str.include?("+")
309
- tag_data = tag.to_s.split("+")
310
- rel = tag_data[1]
311
- call_virtual_method(clean_tag(arrayize(tag_data[0])), args, history).each do |elt|
430
+ value
431
+ elsif name_str.include?('+')
432
+ name_data = name_str.split('+')
433
+ rel = name_data[1]
434
+ value = nil
435
+ call_virtual_method(arrayize(name_data[0]), args, history).each do |elt|
312
436
  next unless elt.is_a?(FeedData) and elt.rel?
313
- return elt if elt.rel.casecmp(rel) == 0
437
+ value = elt if elt.rel.casecmp(rel) == 0
438
+ break unless value.nil?
314
439
  end
440
+ value
315
441
  elsif fm_builder.aliases.key? name
316
- name = fm_builder.aliases[name]
317
- method(name).call(*args) rescue call_virtual_method(name, args, history)
318
- elsif fm_tag_name == :items # special handling for RDF items tag
319
- self[:'rdf:li_array'].method(raw_name).call(*args)
320
- elsif fm_tag_name == :'rdf:li' # special handling for RDF li tag
321
- uri = self[:'rdf:resource']
322
- fm_parent.fm_parent.item_array.each do |item|
323
- if item[:'rdf:about'] == uri
324
- return item.call_virtual_method(name, args, history)
325
- end
442
+ names = fm_builder.aliases[name]
443
+ names = [names] unless names.is_a? Array
444
+ value = nil
445
+ names.each do |name|
446
+ value = (method(name).call(*args) rescue
447
+ call_virtual_method(name, args, history)) rescue next
448
+ break unless value.nil?
326
449
  end
450
+ value
327
451
  else
328
- raise NameError.new("No such method #{name}", name)
452
+ nil
453
+ end
454
+
455
+ raise NameError.new("No such method '#{name}'", name) if result.nil?
456
+
457
+ result
458
+ end
459
+
460
+ # Apply transformations to a tag value. Can either accept a transformation
461
+ # name or an array of transformation function names.
462
+ def transform(tag, trans)
463
+ value = call_virtual_method(tag) or return nil
464
+ transformations = trans.is_a?(String) ?
465
+ fm_builder.transformations[trans] : trans
466
+ _transform(transformations, value)
467
+ end
468
+
469
+ protected
470
+
471
+ def clean_tag(tag)
472
+ tag.to_s.downcase.gsub(':','_').intern
473
+ end
474
+
475
+ # generate a name for the array variable corresponding to a single-value variable
476
+ def arrayize(key)
477
+ clean_tag(key.to_s + '_array')
478
+ end
479
+
480
+ def unarrayize(key)
481
+ clean_tag(key.to_s.gsub(/_array$/, ''))
482
+ end
483
+
484
+ private
485
+
486
+ def _transform(trans_array, value)
487
+ trans_array.each do |t|
488
+ if t.is_a? String
489
+ value = _transform(fm_builder.transformations[t], value)
490
+ else
491
+ if t.is_a? Symbol
492
+ t_name = t
493
+ args = []
494
+ elsif t[0].is_a? Array
495
+ raise 'array where symbol expected'
496
+ else
497
+ t_name = t[0]
498
+ args = t[1..-1]
499
+ end
500
+
501
+ trans = fm_builder.transformation_fns[t_name] or
502
+ raise NameError.new("No such transformation #{t_name}", t_name)
503
+
504
+ if value.is_a? Array
505
+ value = value.collect {|x| trans.call(x, *args) }
506
+ else
507
+ value = trans.call(value, *args)
508
+ end
509
+ end
510
+ end
511
+ value
512
+ end
513
+
514
+ def _resolve_value(obj)
515
+ value = obj
516
+ if obj.is_a?(FeedData)
517
+ if fm_builder.value_tags.key? obj.fm_tag_name
518
+ value = obj.call_virtual_method(fm_builder.value_tags[obj.fm_tag_name])
519
+ else
520
+ fm_builder.default_value_tags.each do |tag|
521
+ value = obj.call_virtual_method(tag) rescue next
522
+ break unless value.nil?
523
+ end
524
+ end
329
525
  end
526
+ value
330
527
  end
331
528
  end
332
529
 
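The transformation chain in `_transform` accepts three kinds of entries: a String naming another transformation list, a Symbol naming a function without arguments, and an Array of the form `[symbol, arg, ...]`. The sketch below mirrors that shape outside the gem. The function names (`:strip`, `:truncate`, `:upcase`) and the named lists (`'short'`, `'loud'`) are invented for illustration and are not the gem's built-in transformations.

```ruby
# Hypothetical transformation functions; each takes the value plus optional args.
TRANSFORMATION_FNS = {
  :strip    => proc { |str| str.strip },
  :truncate => proc { |str, len| str[0, len.to_i] },
  :upcase   => proc { |str| str.upcase }
}

# Hypothetical named transformation lists.
TRANSFORMATIONS = {
  'short' => [:strip, [:truncate, 5]],
  'loud'  => ['short', :upcase]
}

def apply_transformations(trans_array, value)
  trans_array.each do |t|
    if t.is_a?(String)
      # A String refers to another named list; expand it recursively.
      value = apply_transformations(TRANSFORMATIONS.fetch(t), value)
    else
      # A bare Symbol has no arguments; an Array is [name, arg, ...].
      t_name, *args = t.is_a?(Symbol) ? [t] : t
      fn = TRANSFORMATION_FNS[t_name]
      raise NameError.new("No such transformation #{t_name}", t_name) if fn.nil?
      # Arrays are transformed element by element, scalars directly.
      value = if value.is_a?(Array)
                value.map { |x| fn.call(x, *args) }
              else
                fn.call(value, *args)
              end
    end
  end
  value
end

apply_transformations(TRANSFORMATIONS['loud'], '  hello world ')
```

Passing arguments as array elements rather than encoding them in the symbol name matches the 0.8 changelog entry about transformations with arguments.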
@@ -346,19 +543,31 @@ module FeedMe
346
543
  alias :feed :channel
347
544
 
348
545
  def fm_tag_name
349
- @fm_type == FeedMe::RSS ? 'channel' : 'feed'
546
+ @fm_type == FeedMe::ATOM ? 'feed' : 'channel'
547
+ end
548
+
549
+ def fm_prefix
550
+ fm_type.to_s.downcase
350
551
  end
351
552
 
352
553
  private
353
554
 
354
555
  def parse
355
556
  # RSS = everything between channel tags + everything between </channel> and </rdf> if this is an RDF document
356
- if @fm_source =~ %r{<(?:.*?:)?(?:rss|rdf)(.*?)>.*?<(?:.*?:)?channel(.*?)>(.+)</(?:.*?:)?channel>(.*)</(?:.*?:)?(?:rss|rdf)>}mi
357
- @fm_type = FeedMe::RSS
557
+ if @fm_source =~ %r{<(?:.*?:)?(rss|rdf)(.*?)>.*?<(?:.*?:)?channel(.*?)>(.+)</(?:.*?:)?channel>(.*)</(?:.*?:)?(?:rss|rdf)>}mi
558
+ @fm_type = $1.upcase.to_s
358
559
  @fm_tags = fm_builder.all_rss_tags
359
- attrs = parse_attributes($1, $2)
560
+ attrs = parse_attributes($2, $3)
360
561
  attrs[:version] ||= '1.0';
361
- parse_content(self, attrs, $3 + nil_safe_to_s($4), @fm_tags)
562
+ parse_content(self, attrs, $4, @fm_tags)
563
+
564
+ # for RDF documents, replace references with actual items
565
+ unless nil_or_empty?($5)
566
+ refs = FeedData.new(nil, nil, fm_builder)
567
+ parse_content(refs, {}, $5, @fm_tags)
568
+ dereference_rdf_tags(:items_array, :item_array, refs) {|a| a.first[:rdf_seq_array].first[:rdf_li_array] }
569
+ [:image_array, :textinput_array].each {|tag| dereference_rdf_tags(tag, tag, refs) }
570
+ end
362
571
  # Atom = everything between feed tags
363
572
  elsif @fm_source =~ %r{<(?:.*?:)?feed(.*?)>(.+)</(?:.*?:)?feed>}mi
364
573
  @fm_type = FeedMe::ATOM
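The RDF handling in this hunk replaces each reference inside `<channel>` with the full element it points to, keeping the order of the references. A minimal sketch of that matching step, using plain hashes instead of `FeedData` objects, could look like the following. `dereference` is a hypothetical helper, and dropping unresolved references with `compact` is an assumption of this sketch rather than the gem's behaviour.

```ruby
# references: hashes carrying :rdf_resource (what <rdf:li> elements provide)
# items:      hashes carrying :rdf_about    (the full <item> elements)
def dereference(references, items)
  resolved = references.map do |ref|
    uri = ref[:rdf_resource]
    next if uri.nil?
    # Pick the item whose rdf:about matches the referenced URI.
    items.find { |item| item[:rdf_about] == uri }
  end
  resolved.compact # assumption: unresolved references are simply dropped
end

references = [
  { :rdf_resource => 'http://example.org/b' },
  { :rdf_resource => 'http://example.org/a' }
]
items = [
  { :rdf_about => 'http://example.org/a', :title => 'A' },
  { :rdf_about => 'http://example.org/b', :title => 'B' }
]

# The result follows the order of the references, not of the items.
dereference(references, items).map { |item| item[:title] }
```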
@@ -369,21 +578,37 @@ module FeedMe
369
578
  end
370
579
  end
371
580
 
581
+ # References within the <channel> element are replaced by the actual elements they refer to (matched via rdf:resource / rdf:about)
582
+ def dereference_rdf_tags(rdf_tag, rss_tag, refs)
583
+ if self.key?(rdf_tag)
584
+ src_items = self.delete(rdf_tag)
585
+ src_items = yield(src_items) if block_given?
586
+ ref_items = refs[rss_tag]
587
+ unless src_items.empty? || ref_items.empty?
588
+ self[rss_tag] = src_items.collect do |src_item|
589
+ next unless src_item.key?(:rdf_resource)
590
+ uri = src_item[:rdf_resource]
591
+ ref_items.each do |ref_item|
592
+ next unless ref_item.key?(:rdf_about)
593
+ if (ref_item[:rdf_about].eql?(uri))
594
+ ref_item[:rdf_resource] = uri
595
+ break ref_item
596
+ end
597
+ end
598
+ end
599
+ end
600
+ end
601
+ end
602
+
372
603
  def parse_content(parent, attrs, content, tags)
373
604
  # add attributes to parent
374
- attrs.each_pair {|key, value| add_tag(parent, key, unescape(value)) }
375
-
376
- # the first item in a tag array may be a hash that defines tags that have subtags
377
- first_tag = 0
378
- if !tags.nil? && tags[0].is_a?(Hash)
379
- sub_tags = tags[0]
380
- first_tag = 1
381
- end
382
-
605
+ attrs.each_pair {|key, value| parent[key] = unescape(value) }
606
+ return if content.nil?
607
+
383
608
  # split the content into elements
384
609
  elements = {}
385
- # TODO: this will break if a namespace is used that is not rss: or atom:
386
- content.scan( %r{(<(?:rss:|atom:)?([^ >]+)([^>]*)(?:/>|>(.*?)</(?:rss:|atom:)?\2>))}mi ) do |match|
610
+ # TODO: this will break if a namespace is used that is not rss: or atom:
611
+ content.scan( %r{(<([\w:]+)(.*?)(?:/>|>(.*?)</\2>))}mi ) do |match|
387
612
  # \1 = full content (from start to end tag), \2 = tag name
388
613
  # \3 = attributes, and \4 = content between tags
389
614
  key = clean_tag(match[1])
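To see what the new element-splitting pattern captures, the sketch below runs the same regular expression over a small fragment and groups the results by normalized tag name. `split_elements` is a hypothetical helper, and storing raw `[attributes, content]` pairs is an assumption made for illustration; the gem's own bookkeeping between these two hunks is not shown in the diff.

```ruby
# Same pattern as the diff: \1 full element, \2 tag name, \3 attributes, \4 content.
ELEMENT_RE = %r{(<([\w:]+)(.*?)(?:/>|>(.*?)</\2>))}mi

# Same normalization as the diff: lower-case, ':' becomes '_'.
def clean_tag(tag)
  tag.to_s.downcase.gsub(':', '_').intern
end

def split_elements(content)
  elements = Hash.new { |hash, key| hash[key] = [] }
  content.scan(ELEMENT_RE) do |_full, name, attrs, body|
    # body is nil for self-closing elements such as <link ... />
    elements[clean_tag(name)] << [attrs, body]
  end
  elements
end

xml = '<title>Hi</title><dc:date>2009-12-14</dc:date><link href="http://x.example/"/>'
split_elements(xml).keys
```

Because the tag name is matched with `[\w:]+`, prefixed names such as `dc:date` are captured whole and then normalized to `:dc_date`.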
@@ -395,33 +620,37 @@ module FeedMe
395
620
  end
396
621
  end
397
622
 
398
- # check if this is a promiscuous parser
399
- if tags.nil? || tags.empty? || (tags.size == 1 && first_tag == 1)
400
- tags = elements.keys
401
- first_tag = 0
402
- end
403
-
623
+ # the first item in a tag array may be a hash that defines tags that have subtags
624
+ sub_tags = tags[0] if !nil_or_empty?(tags) && tags[0].is_a?(Hash)
625
+ first_tag = sub_tags.nil? || tags.size == 1 ? 0 : 1
626
+ # if this is a promiscuous parser, tag names will depend on the elements found in the feed
627
+ tags = elements.keys if (sub_tags.nil? ? nil_or_empty?(tags) : first_tag == 0)
628
+
404
629
  # iterate over all tags (some or all of which may not be present)
405
630
  tags[first_tag..-1].each do |tag|
406
631
  key = clean_tag(tag)
407
- element_array = elements.delete(tag) or next
632
+ element_array = elements.delete(tag) or next
408
633
  @fm_parsed << key
409
634
 
410
635
  element_array.each do |elt|
636
+ elt_attrs = elt[0]
637
+ elt_content = elt[1]
638
+ rels = fm_builder.rels[key] if fm_builder.respond_to?(:rels)
639
+
640
+ # if a list of accepted rels is specified, only parse this tag
641
+ # if its rel attribute is included in the list
642
+ next unless rels.nil? || elt_attrs.nil? || !elt_attrs.rel? || rels.include?(elt_attrs.rel)
643
+
411
644
  if !sub_tags.nil? && sub_tags.key?(key)
412
- if fm_builder.ghost_tags.include? key
413
- new_parent = parent
414
- else
415
- new_parent = FeedData.new(key, parent, fm_builder)
416
- add_tag(parent, key, new_parent)
417
- end
418
- parse_content(new_parent, elt[0], elt[1], sub_tags[key])
645
+ new_parent = FeedData.new(key, parent, fm_builder)
646
+ add_tag(parent, key, new_parent)
647
+ parse_content(new_parent, elt_attrs, elt_content, sub_tags[key])
419
648
  else
420
- add_tag(parent, key, clean_content(key, elt[0], elt[1], parent))
649
+ add_tag(parent, key, clean_content(key, elt_attrs, elt_content, parent))
421
650
  end
422
651
  end
423
652
  end
424
-
653
+
425
654
  @fm_unparsed += elements.keys
426
655
 
427
656
  @fm_parsed.uniq!
@@ -429,7 +658,7 @@ module FeedMe
429
658
  end
430
659
 
431
660
  def add_tag(hash, key, value)
432
- array_var = clean_tag(arrayize(key.to_s))
661
+ array_var = arrayize(key)
433
662
  if hash.key? array_var
434
663
  hash[array_var] << value
435
664
  else
@@ -446,18 +675,19 @@ module FeedMe
446
675
  content = content.to_s
447
676
  if fm_builder.date_tags.include? tag
448
677
  content = Time.parse(content) rescue unescape(content)
449
- else
450
- content = unescape(content)
678
+ else
679
+ content = unescape(content)
451
680
  end
452
681
 
453
682
  unless attrs.empty?
454
- hash = FeedData.new(tag, parent, fm_builder, attrs)
683
+ hash = FeedData.new(tag, parent, fm_builder)
684
+ attrs.each_pair {|key, value| hash[key] = unescape(value) }
455
685
  if !content.empty?
456
686
  hash[FeedMe::CONTENT_KEY] = content
457
687
  end
458
688
  return hash
459
689
  end
460
-
690
+
461
691
  return content
462
692
  end
463
693
 
@@ -466,9 +696,9 @@ module FeedMe
466
696
  attrs.each do |a|
467
697
  next if a.nil?
468
698
  # pull key/value pairs out of attr string
469
- array = a.scan(/(\w+)=['"]?([^'"]+)/)
699
+ array = a.scan(/([\w:]+)=['"]?([^'"]+)/)
470
700
  # unescape values
471
- array = array.collect {|key, value| [clean_tag(format_tag(key)), unescape(value)]}
701
+ array = array.collect {|key, value| [clean_tag(key), unescape(value)]}
472
702
  hash.merge! Hash[*array.flatten]
473
703
  end
474
704
  return hash
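The change from `\w+` to `[\w:]+` in the attribute pattern lets namespaced attribute names such as `rdf:about` survive parsing. The sketch below applies the same pattern and the same `clean_tag` normalization to a raw attribute string. `parse_attribute_string` is a hypothetical helper, and it skips the `unescape` step that the gem applies to each value.

```ruby
# Same pattern as the diff: name, '=', optional quote, then everything up to a quote.
ATTR_RE = /([\w:]+)=['"]?([^'"]+)/

# Same normalization as the diff: lower-case, ':' becomes '_'.
def clean_tag(tag)
  tag.to_s.downcase.gsub(':', '_').intern
end

def parse_attribute_string(str)
  str.scan(ATTR_RE).each_with_object({}) do |(key, value), hash|
    hash[clean_tag(key)] = value
  end
end

parse_attribute_string(%q{ rdf:about="http://x.example/1" Type='text'})
```

Note that the value group only stops at a quote character, so this pattern assumes quoted attribute values; an unquoted value would run on into the following attributes.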
@@ -484,32 +714,10 @@ module FeedMe
484
714
  content = cdata[1] if cdata
485
715
 
486
716
  return content
487
-
488
- #if content =~ /([^-_.!~*'()a-zA-Z\d;\/?:@&=+$,\[\]]%)/n then
489
- # CGI.unescapeHTML(content).gsub(/(<!\[CDATA\[|\]\]>)/,'').strip
490
- #else
491
- # content.gsub(/(<!\[CDATA\[|\]\]>)/,'').strip
492
- #end
493
- end
494
-
495
- def underscore(camel_cased_word)
496
- camel_cased_word.to_s.gsub(/::/, '/').
497
- gsub(/([A-Z]+)([A-Z][a-z])/,'\1_\2').
498
- gsub(/([a-z\d])([A-Z])/,'\1_\2').
499
- tr("-", "_").
500
- downcase
501
- end
502
-
503
- def camelize(lower_case_and_underscored_word, first_letter_in_uppercase = true)
504
- if first_letter_in_uppercase
505
- lower_case_and_underscored_word.to_s.gsub(/\/(.?)/) { "::#{$1.upcase}" }.gsub(/(?:^|_)(.)/) { $1.upcase }
506
- else
507
- lower_case_and_underscored_word[0,1].downcase + camelize(lower_case_and_underscored_word)[1..-1]
508
- end
509
717
  end
510
718
 
511
- def nil_safe_to_s(obj)
512
- obj.nil? ? '' : obj.to_s
719
+ def nil_or_empty?(obj)
720
+ obj.nil? || obj.empty? || (obj.is_a?(String) && obj.strip.empty?)
513
721
  end
514
722
  end
515
723