feedme 0.1 → 0.8.0
- data/History.txt +100 -0
- data/Manifest.txt +2 -0
- data/README.txt +124 -46
- data/Rakefile +21 -4
- data/examples/rocketboom.rb +5 -4
- data/lib/feedme.rb +399 -191
- data/lib/hpricot-util.rb +82 -0
- data/lib/html-cleaner.rb +188 -0
- data/lib/nokogiri-util.rb +117 -0
- data/lib/util.rb +45 -0
- metadata +12 -16
data/History.txt
CHANGED
@@ -1,3 +1,103 @@
+=== 0.8 / 2009-12-14
+
+* Add new virtual method _values: returns all values for a given tag.
+* Transformations with arguments are now specified as an array rather than
+  part of the symbol
+* Add transform method
+* Add regexp transform
+* Add nokogiri support (hpricot is still the default)
+* Copy/paste and fix feed-normalizer clean_html method, drop feed-normalizer dependency
+
+=== 0.7.1 / 2009-09-24
+
+* Fix nil_or_empty? to strip whitespace from strings
+
+=== 0.7 / 2009-09-24
+
+* Design decision: all element and attribute names will be stored as lower-case. They may still
+  be accessed using upper case, since keys will be normalized by all accessors.
+* Design decision: RDF will be dealt with at parse time: elements with rdf:resource attributes will be
+  replaced by the actual, referenced elements. Ordering of the referring elements will be preserved.
+* Removed the concept of ghost tags.
+
+=== 0.6.5 / 2009-09-24
+
+* Fix :truncHtml completely by requiring active_support.
+
+=== 0.6.4 / 2009-09-23
+
+* Roll version to make github happy.
+
+=== 0.6.3 / 2009-09-23
+
+* Fix truncHtml: use code by Henrik Nyh, which in turn uses Hypricot
+
+=== 0.6.2 / 2009-09-23
+
+* Fix content-parsing regular expression to correctly handle closed elements
+* Reverse earlier design decision: keep namespaces for attributes.
+
+=== 0.6.1 / 2009-09-23
+
+* Improve handling of rdf:items. From now on, .items will forward to .item_array. The rdf items can still be accessed by [:items_array] or .items_array.
+
+=== 0.6 / 2009-09-23
+
+* Fix handling of the items element (mostly affects RSS 1.0 documents)
+* Make attribute naming consistent
+* Design decision: attributes can only ever have a single value, so they will always be stored as scalars
+  rather than arrays. This will also nicely resolve any possible collisions between attribute and tag names.
+
+=== 0.5.4 / 2009-09-22
+
+* Minor improvements to to_indented_s
+* Fix tag names: change all tags with namespaces to the cleaned version (unquote, ':' replaced with '_')
+* Design decision: all attribute names will have their namespaces stripped; namespaces are generally
+  treated as optional (even if they aren't technically so) and it's annoying to have to check both forms;
+  this decision may be reversed if there are found to be conflicts
+
+=== 0.5.3 / 2009-09-22
+
+* Roll version to test GitHub wierdness.
+
+=== 0.5.2 / 2009-09-22
+
+* Improve to_s method for prettier array display.
+
+=== 0.5.1 / 2009-09-21
+
+* Update example code
+* Bug fix: call_virtual_method has invalid return if neither a key nor any of its aliases has a value
+* Subsequent releases will follow standard versioning model of "major.minor.bugfix"
+
+=== 0.5 / 2009-09-21
+
+* Special handling for atom id tag
+* to_indented_str method, which creates a pretty output for a FeedData
+* Improved to_s method that delegates to to_indented_str
+
+=== 0.4 / 2009-09-20
+
+* Expose call_virtual_method as public
+* Change 'name' argument of call_virtual_method to 'sym'
+* Add default value for call_virtual_method 'args' argument
+* Add :'media:content' and :'content:encoded' as ext tags
+* fix use of FeedNormalizer in :cleanHtml transformation
+
+=== 0.3 / 2009-09-18
+
+* Update example code
+* Bug fix: call_virtual_method always throws exception
+* Bug fix: responds_to? -> respond_to? and rels -> :rels
+
+=== 0.2 / 2009-09-12
+
+* Change bang mods to more flexible transformations framework.
+* Add additional transformation functions.
+* Add methods for RSS/Atom emulation that automatically add appropriate aliases.
+* Add empty_string_for_nil and error_on_missing_key options.
+* Add support for parsing only certain rels in the strict parser.
+
 === 0.1 / 2009-09-03
 
 * Everything is new. First release.
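The 0.7 and 0.5.4 entries above describe key normalization: names are stored lower-case, and a namespace separator ':' becomes '_'. The following is a minimal sketch of that idea, not FeedMe's own code. The class name NormalizedStore and the helper normalize_key are hypothetical (the library's helper is clean_tag, whose body is not part of this diff), and the "unquote" step mentioned in 0.5.4 is left out.

```ruby
# Hypothetical illustration of key normalization; not FeedMe's actual code.
class NormalizedStore
  def initialize
    @data = {}
  end

  # Lower-case the key and replace the namespace separator ':' with '_'.
  def normalize_key(key)
    key.to_s.downcase.tr(':', '_').to_sym
  end

  def []=(key, value)
    @data[normalize_key(key)] = value
  end

  def [](key)
    @data[normalize_key(key)]
  end

  def key?(key)
    @data.key?(normalize_key(key))
  end

  def keys
    @data.keys
  end
end

store = NormalizedStore.new
store[:pubDate] = '2009-12-14'
store[:'dc:date'] = '2009-09-24'
puts store[:PUBDATE]     # same entry, whatever the case used by the caller
puts store.keys.inspect  # keys as they are actually stored
```

Because every accessor normalizes its argument, callers can keep using the mixed-case names from the feed (pubDate, dc:date) while only one canonical form is stored.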
data/Manifest.txt
CHANGED
data/README.txt
CHANGED
@@ -1,6 +1,6 @@
 = feedme
 
-* http://
+* http://wiki.github.com/jdidion/feedme
 
 == DESCRIPTION:
 
@@ -24,76 +24,143 @@ The API is similar to SimpleRSS:
   require 'open-uri'
 
   rss = FeedMe.parse open('http://slashdot.org/index.rdf')
-
-  rss.version # => 1.0
+  rss.version # => 1.0
   rss.channel.title # => "Slashdot"
   rss.channel.link # => "http://slashdot.org/"
   rss.items.first.link # => "http://books.slashdot.org/article.pl?sid=05/08/29/1319236&from=rss"
 
-But since the parser can read Atom feeds as easily as RSS feeds, there are
+But since the parser can read Atom feeds as easily as RSS feeds, there are aliases that allow more atom like reading:
 
   rss.feed.title # => "Slashdot"
   rss.feed.link # => "http://slashdot.org/"
   rss.entries.first.link # => "http://books.slashdot.org/article.pl?sid=05/08/29/1319236&from=rss"
-
-Under the covers, all
-
-
-
-
+
+Under the covers, all element values are stored in arrays. This means that you can access all content for an element that appears multiple times (i.e. category):
+
+  rss.items.first.category_array # => ["News for Nerds", "Technology"]
+  rss.items.first.category # => "News for Nerds"
+
 You also have access to all the attributes as well as tag values:
 
-
-
+  rss.items.first.guid.isPermaLink # => "true"
+  rss.items.first.guid.content # => http://books.slashdot.org/article.pl?sid=05/08/29/1319236
 
 FeedMe also adds some syntactic sugar that makes it easy to get the information you want:
 
-
-
-
+  rss.items.first.category? # => true
+  rss.items.first.category_count # => 2
+  rss.items.first.guid_value # => http://books.slashdot.org/article.pl?sid=05/08/29/1319236
 
 There are two different parsers that you can use, depending on your needs. The default parser is "promiscuous," meaning that it parses all tags. There is also a strict parser that only parses tags specified in a list. Here is how you create the different types of parsers:
-
-
-
-
+
+  FeedMe.parse(source) # parse using the default (promiscuous) parser
+  FeedMe::ParserBuilder.new.parse(source) # equivalent to the previous line
+  FeedMe.parse_strict(source)
+  FeedMe::StrictParserBuilder.new.parse(source) # only parse certain tags
+
+The FeedMe class methods and the parser builder constructors also accept an options hash. Options are also passed on to the Parser constructor. Currently, only two options are available:
+
+1. :empty_string_for_nil => false # return the empty string instead of a nil value
+2. :error_on_missing_key => false # raise an error if a specified key or virtual method does not exist (otherwise nil is returned)
 
 The strict parser can be extended by adding new tags to parse:
 
-
-
-
-
-
+  builder = FeedMe::StrictParserBuilder.new
+  builder.rss_tags << :some_new_tag
+  builder.rss_item_tags << :'item+myrel' # parse an item that has a custom rel type
+  builder.item_ext_tags << :feedburner_origLink # parse an extension tag - one that has a specific
+  # namespace (use '_', not ':', to separate namespace
+  # from attribute name)
+
 Either parser can be extended by adding aliases to existing tags:
 
-
+  builder.aliases[:updated] => :pubDate # now you can always access the updated date using :updated,
+  # regardless of whether it's an RSS or Atom feed
+
+If you don't know ahead of time what type of feed you'll be parsing, you can tell FeedMe to always emulate RSS or Atom. These methods just add a bunch of aliases:
+
+  builder.emulate_rss!
+  builder.emulate_atom!
+
+Another bit of syntactic sugar are transformations. These are modifications that can be applied to feed content. There is a default transformation that can be applied by adding '!' to the tag name.
+
+  rss.entry.content # => <div>Some great stuff</div>
+  rss.entry.content! # => Some great stuff
+
+The default transformation can be changed:
+
+  builder.default_transformation = [ :cleanHtml ]
+
+Custom transformations are defined by mapping one or more transformation functions to a suffix:
+
+  builder.transformations['clean'] = [ :cleanHtml ]
+
+  rss.entry.content # => <div>This is a bunch of text</div><p></p></html>
+  rss.entry.content_clean # => <div>This is a bunch of text</div>
+
+You can also/instead apply an arbitrary set of transformations via the transform method:
 
-
+  rss.entry.transform(:content, [ :clean, [ :trunc, 50 ] ])
 
-
-
-
-
+You can create your own transformation function. The following is an example of a transformation function that takes an argument. Note that transformation function names may only contain alphanumeric characters. Argument values are specified at the end separated by underscores.
+
+  builder.transformation_fns[:wrap] => proc {|str, col|
+    str.gsub(/(.{1,#{col}})( +|$\n?)|(.{1,#{col}})/, "\\1\\3\n").strip
+  }
+  builder.transformations['wrap'] = [ :wrap_10 ]
+
+  rss.entry.content = This is a bunch of text
+  rss.entry.content_wrap = This is a
+                           bunch of
+                           text
 
-
-builder.bang_mod_fns[:wrap] => proc {|str, col| str.gsub(/(.{1,#{col}})( +|$\n?)|(.{1,#{col}})/, "\\1\\3\n").strip }
-builder.bang_mods << :wrap_80
+The transformation functions available by default are:
 
+1. :stripHtml - described above
+2. :cleanHtml - ** Requires FeedNormalizer (which in turn requires Hypricot) **
+
+  rss.entry_array[0].content # => 1 > 2
+  rss.entry_array[0].content! # => 1 > 2
+
+  rss.entry_array[1].content # => <div>Some great stuff</div><p></p></html>
+  rss.entry_array[1].content! # => <div>Some great stuff</div>
+
+3. :wrap - takes number of columns as a parameter. Respects word boundaries. Example of :wrap_10:
+
+  rss.entry.content # => This is a bunch of text
+  rss.entry.content! # => This is a
+                          bunch of
+                          text
+
+4. :trunc - truncates text to a certain length. Example of :trunc_10:
+
+  rss.entries.first.content # => This is a long long long sentence
+  rss.entries.first.content! # => This is a
+
+5. :truncHtml - truncates the content inside the first set of HTML tags, but preserves the tags. ** Requires ActiveSupport and Hpricot ** Example of :truncHtml_10:
+
+  rss.entries.first.content # => <div>This is a long long long sentence</div></html>
+  rss.entries.first.content! # => <div>This is a </div></html>
+
+6. :regexp - apply a regular expression and extract the capture groups
+
+  rss.entries.first.content # => This is a long long long entry
+  rss.entries.first.transform(:content, [ :regexp, /(This is a long ).*(entry)/ ]) # => This is a long entry
+
 In order to prevent clashes between tag/attribute names and the parser class' instance variables, all instance variables are prefixed with 'fm_'. They are:
-
-
-
-
-
-
-
+
+  fm_source # the original, unparsed source
+  fm_options # the options passed to the parser constructor
+  fm_type # the feed type
+  fm_tags # the tags the parser looks for in the source
+  fm_parsed # the list of tags the parser actually found
+  fm_unparsed # the list of tags that appeared in the feed but were not parsed (useful for debugging)
 
 Additionally, there are several variables that are available at every level of the parse tree:
 
-
-
-
+  fm_builder # the ParserBuilder that created the parser
+  fm_parent # the container of the current level of the parse tree
+  fm_tag_name # the name of the rss/atom tag whose content is contained in this level of the tree
 
 === A word on RSS/Atom Versions
 
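The README text in the hunk above describes transformation functions as named procs, with arguments given in array form such as [ :trunc, 50 ]. The sketch below runs that idea outside the library. The :wrap and :regexp procs are copied from lib/feedme.rb as it appears in this diff; the TRANSFORMATION_FNS constant and the apply_transformations helper are hypothetical stand-ins for the builder's transformation_fns hash and the transform method, whose implementation is not shown here.

```ruby
# Transformation functions keyed by name. :wrap and :regexp are copied from
# the procs in this diff; the surrounding scaffolding is illustrative only.
TRANSFORMATION_FNS = {
  # wrap text at a number of columns, respecting word boundaries
  :wrap => proc do |str, col|
    str.gsub(/(.{1,#{col}})( +|$\n?)|(.{1,#{col}})/, "\\1\\3\n").strip
  end,
  # apply a regular expression and return the first capture group
  :regexp => proc do |str, regexp|
    match = Regexp.new(regexp).match(str)
    match.nil? ? nil : match[1]
  end
}

# Hypothetical helper: apply each step in order. A step is either a bare
# function name or an array of [name, arg1, arg2, ...].
def apply_transformations(value, steps)
  steps.inject(value) do |current, step|
    name, *args = Array(step)
    TRANSFORMATION_FNS.fetch(name).call(current, *args)
  end
end

wrapped = apply_transformations('This is a bunch of text', [[:wrap, 10]])
puts wrapped

first_group = apply_transformations('This is a long long long entry',
                                    [[:regexp, /(This is a long ).*(entry)/]])
puts first_group.inspect
```

Note that the :regexp proc as written returns only match[1], the first capture group, so a pattern with two groups does not yield both joined together. A step that returns nil (no match) would also break any later string step in the chain, so a real chain may want a nil check between steps.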
@@ -107,9 +174,20 @@ Due to various incompatibilities between different RSS versions, it is strongly
 
 == INSTALL:
 
-* gem install feedme
-* http://
+* gem install jdidion-feedme (Add GitHub as a gem source: gem sources -a http://gems.github.com)
+* http://github.com/jdidion/feedme/downloads
+
+To use certain features of FeedMe, some dependencies are required:
+* To use the :truncHtml transformation for truncating HTML content, ActiveSupport and Hpricot are required
+
+  sudo gem install activesupport
+  sudo gem install hpricot
+
+* To use the :cleanHtml for sanitizing HTML, FeedNormalizer and Hpricot are required
+
+  sudo gem install feed-normalizer
+  sudo gem install hpricot
 
 == LICENSE:
 
-This work is licensed under the Creative Commons Attribution 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
+This work is licensed under the Creative Commons Attribution 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
data/Rakefile
CHANGED
@@ -1,7 +1,24 @@
 require 'rubygems'
-require '
+require 'jeweler'
 
-
-
-
+tasks = Jeweler::Tasks.new do |s|
+  s.name = "feedme"
+  s.authors = ["John Didion"]
+  s.description = %q{A simple, flexible, and extensible RSS and Atom parser for Ruby. Based on the popular SimpleRSS library, but with many nice extra features.}
+  s.email = ["code@didion.net"]
+  s.extra_rdoc_files = ["History.txt", "Manifest.txt", "README.txt"]
+  s.files = ["History.txt", "Manifest.txt", "README.txt", "Rakefile",
+    "lib/feedme.rb", "lib/hpricot-util.rb", "lib/nokogiri-util.rb",
+    "lib/html-cleaner.rb", "lib/util.rb", "examples/rocketboom.rb",
+    "examples/rocketboom.rss", "test/test_helper.rb"]
+  s.homepage = %q{http://wiki.github.com/jdidion/feedme}
+  s.rdoc_options = ["--main", "README.txt"]
+  s.require_paths = ["lib"]
+  s.rubyforge_project = %q{feedme}
+  s.summary = %q{A simple, flexible, and extensible RSS and Atom parser for Ruby}
+  s.test_files = ["test/test_helper.rb"]
 end
+tasks.jeweler.remote = 'github'
+Jeweler::GemcutterTasks.new
+
+
data/examples/rocketboom.rb
CHANGED
@@ -1,5 +1,6 @@
-
-require '
+#!/usr/bin/ruby
+require 'rubygems'
+require 'feedme'
 require 'net/http'
 
 def fetch(url)
@@ -24,13 +25,13 @@ end
 # create a new ParserBuilder
 builder = FeedMe::ParserBuilder.new
 # add a bang mod to wrap content to 50 columns
-builder.
+builder.default_transformation << :wrap_80
 
 # parse the rss feed
 rss = builder.parse(content)
 
 # equivalent to rss.channel.title
-puts "#{rss.
+puts "#{rss.class} Feed: #{rss.title}"
 
 # use a virtual method...this one a shortcut to rss.items.size
 puts "#{rss.item_count} items"
data/lib/feedme.rb
CHANGED
@@ -1,54 +1,84 @@
-####################################################################################
-# FeedMe v0.1
-#
-# FeedMe is an easy to use parser for RSS and Atom files. It is based on SimpleRSS,
-# but has some improvements that make it worth considering:
-# 1. Support for attributes
-# 2. Support for nested elements
-# 3. Support for elements that appear multiple times
-# 4. Syntactic sugar that makes it easier to get at the information you want
-#
-# One word of caution: FeedMe will be maintained only so long as SimpleRSS does not
-# provide the above features. I will try to keep FeedMe's API compatible with
-# SimpleRSS so that it will be easy for users to switch if/when necessary.
-####################################################################################
-
 require 'cgi'
 require 'time'
+require 'util.rb'
 
 module FeedMe
-
+  # The current version of FeedMe.
+  VERSION = "0.7.2"
 
-  #
+  # The value of Parser#fm_type for RSS feeds.
   RSS = :RSS
+  # The value of Parser#fm_type for RDF (RSS 1.0) feeds.
+  RDF = :RDF
+  # The value of Parser#fm_type for Atom feeds.
   ATOM = :ATOM
 
-  #
+  # The key used to access the content element of a mixed tag.
   CONTENT_KEY = :content
 
+  # Helper libraries for HTML functions
+  NOKOGIRI_HELPER = 'nokogiri-util.rb'
+  HPRICOT_HELPER = 'hpricot-util.rb'
+
+  # Parse a feed using the promiscuous parser.
   def FeedMe.parse(source, options={})
-    ParserBuilder.new.parse(source
+    ParserBuilder.new(options).parse(source)
   end
 
+  # Parse a feed using the strict parser.
   def FeedMe.parse_strict(source, options={})
-    StrictParserBuilder.new.parse(source
+    StrictParserBuilder.new(options).parse(source)
   end
 
+  # This class is used to create promiscuous parsers.
   class ParserBuilder
-
-
-
+    # The options passed to this ParserBuilder's constructor.
+    attr_reader :options
+    # The tags that are parsed for RSS feeds.
+    attr_accessor :rss_tags
+    # The subtags of item elements that are parsed for RSS feeds.
+    attr_accessor :rss_item_tags
+    # The tags that are parsed for Atom feeds.
+    attr_accessor :atom_tags
+    # The subtags of entry elements that are parsed for Atom feeds.
+    attr_accessor :atom_entry_tags
+    # The names of tags that should be parsed as date values.
+    attr_accessor :date_tags
+    # An array of names of attributes/subtags whose values can be
+    # used as the default value of a mixed element.
+    attr_accessor :value_tags
+    # Tags to use for element value when specific tag isn't specified
+    attr_accessor :default_value_tags
+    # A hash of attribute/tag name aliases.
+    attr_accessor :aliases
+    # An array of the transformation functions applied when the !
+    # suffix is added to the attribute/tag name.
+    attr_accessor :default_transformation
+    # Mapping of transformation names to functions. Each key is a
+    # suffix that can be appended to an attribute/tag name, and
+    # the value is an array of transformation function names that
+    # are applied when that transformation is used.
+    attr_accessor :transformations
+    # Mapping of transformation function names to Procs.
+    attr_accessor :transformation_fns
+    # the helper library used for HTML transformations
+    attr_accessor :html_helper_lib
 
-    #
-
+    # Create a new ParserBuilder. Allowed options are:
+    # * :empty_string_for_nil => false # return the empty string instead of a nil value
+    # * :error_on_missing_key => false # raise an error if a specified key or virtual
+    #   method does not exist (otherwise nil is returned)
+    def initialize(options={})
+      @options = options
+
       # rss tags
       @rss_tags = [
         {
           :image => nil,
-          :
-          :
-          :
-          :items => [{ :
+          :textinput => nil,
+          :skiphours => nil,
+          :skipdays => nil,
+          :items => [{ :rdf_seq => nil }],
           #:item => @rss_item_tags
         }
       ]
@@ -70,14 +100,15 @@ module FeedMe
       ]
 
       # tags whose value is a date
-      @date_tags = [ :
+      @date_tags = [ :pubdate, :lastbuilddate, :published, :updated, :dc_date,
+        :expirationdate ]
 
-      # tags that can be used as the default value for a
-      @value_tags =
+      # tags that can be used as the default value for a mixed element
+      @value_tags = {
+        :media_content => :url
+      }
+      @default_value_tags = [ CONTENT_KEY, :href, :url ]
 
-      # tags that don't become part of the parsed object tree
-      @ghost_tags = [ :'rdf:Seq' ]
-
       # tag/attribute aliases
       @aliases = {
         :items => :item_array,
@@ -87,64 +118,130 @@ module FeedMe
         :link => :'link+self'
       }
 
-      #
-      @
-      @
-
-
+      # transformations
+      @html_helper_lib = HPRICOT_HELPER
+      @default_transformation = [ :cleanHtml ]
+      @transformations = {}
+      @transformation_fns = {
+        # remove all HTML tags
+        :stripHtml => proc do |str|
+          require @html_helper_lib
+          FeedMe.html_helper.strip_html(str)
+        end,
+
+        # clean HTML content using FeedNormalizer's HtmlCleaner class
+        :cleanHtml => proc do |str|
+          require @html_helper_lib
+          FeedMe.html_helper.clean_html(str)
+        end,
+
+        # wrap text at a certain number of characters (respecting word boundaries)
+        :wrap => proc do |str, col|
+          str.gsub(/(.{1,#{col}})( +|$\n?)|(.{1,#{col}})/, "\\1\\3\n").strip
+        end,
+
+        # truncate text, respecting word boundaries
+        :trunc => proc {|str, wordcount| str.trunc(wordcount.to_i) },
+
+        # truncate HTML and leave enclosing HTML tags
+        :truncHtml => proc do |str, wordcount|
+          require @html_helper_lib
+          FeedMe.html_helper.truncate_html(str, wordcount.to_i)
+        end,
+
+        :regexp => proc do |str, regexp|
+          match = Regexp.new(regexp).match(str)
+          match.nil? ? nil : match[1]
+        end,
       }
     end
 
+    # Prepare tag list for an RSS feed.
     def all_rss_tags
       all_tags = rss_tags.dup
       all_tags[0][:item] = rss_item_tags.dup
       return all_tags
     end
 
+    # Prepare tag list for an Atom feed.
     def all_atom_tags
       all_tags = atom_tags.dup
       all_tags[0][:entry] = atom_entry_tags.dup
       return all_tags
     end
 
-
+    # Add aliases so that Atom feed elements can be accessed
+    # using the names of their RSS counterparts.
+    def emulate_rss!
+      aliases.merge!({
+        :guid => :id, # this alias never actually gets used; see FeedData#id
+        :copyright => :rights,
+        :pubdate => [ :published, :updated ],
+        :lastbuilddate => [ :updated, :published ],
+        :description => [ :content, :summary ],
+        :managingeditor => [ :'author/name', :'contributor/name' ],
+        :webmaster => [ :'author/name', :'contributor/name' ],
+        :image => [ :icon, :logo ]
+      })
+    end
+
+    # Add aliases so that RSS feed elements can be accessed
+    # using the names of their Atom counterparts.
+    def emulate_atom!
+      aliases.merge!({
+        :rights => :copyright,
+        :content => :description,
+        :contributor => :author,
+        :id => [ :guid_value, :link ],
+        :author => [ :managingeditor, :webmaster ],
+        :updated => [ :lastbuilddate, :pubdate ],
+        :published => [ :pubDate, :lastbuilddate ],
+        :icon => :'image/url',
+        :logo => :'image/url',
+        :summary => :'description_trunc'
+      })
+    end
+
+    # Parse +source+ using a +Parser+ created from this +ParserBuilder+.
+    def parse(source)
       Parser.new(self, source, options)
     end
   end
 
+  #
   class StrictParserBuilder < ParserBuilder
-    attr_accessor :feed_ext_tags, :item_ext_tags
+    attr_accessor :feed_ext_tags, :item_ext_tags, :rels
 
-    def initialize
-      super()
+    def initialize(options={})
+      super(options)
 
       # rss tags
       @rss_tags = [
         {
           :image => [ :url, :title, :link, :width, :height, :description ],
-          :
-          :
-          :
+          :textinput => [ :title, :description, :name, :link ],
+          :skiphours => [ :hour ],
+          :skipdays => [ :day ],
           :items => [
             {
-              :
+              :rdf_seq => [ :rdf_li ]
             },
-            :
+            :rdf_seq
           ],
           #:item => @item_tags
         },
         :title, :link, :description, # required
-        :language, :copyright, :
-        :
+        :language, :copyright, :managingeditor, :webmaster, # optional
+        :pubdate, :lastbuilddate, :category, :generator,
         :docs, :cloud, :ttl, :rating,
-        :image, :
+        :image, :textinput, :skiphours, :skipdays, :item, # have subtags
         :items
       ]
       @rss_item_tags = [
         {},
         :title, :description, # required
         :link, :author, :category, :comments, :enclosure, # optional
-        :guid, :
+        :guid, :pubdate, :source, :expirationdate
       ]
 
       #atom tags
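The emulate_rss! and emulate_atom! methods in the hunk above map a name either to a single key or to an ordered list of candidate keys. The lookup that consumes these aliases is not part of this diff, so the following is only a sketch of one plausible resolution rule: use the value stored under the name itself, otherwise the first candidate that is present. The ALIASES constant and the lookup function are hypothetical, and path-style aliases such as :'author/name' are not handled.

```ruby
# Alias table in the style of emulate_rss!: a name maps to one candidate
# key or to an ordered list of candidates.
ALIASES = {
  :copyright   => :rights,
  :pubdate     => [ :published, :updated ],
  :description => [ :content, :summary ]
}

# Hypothetical lookup: return the value stored under the name itself, or
# under the first alias candidate that is present in the data.
def lookup(data, name)
  return data[name] if data.key?(name)
  Array(ALIASES[name]).each do |candidate|
    return data[candidate] if data.key?(candidate)
  end
  nil
end

atom_entry = {
  :updated => '2009-12-14T00:00:00Z',
  :summary => 'Short text',
  :rights  => 'CC BY 3.0'
}

puts lookup(atom_entry, :pubdate)         # falls through :published to :updated
puts lookup(atom_entry, :description)     # falls through :content to :summary
puts lookup(atom_entry, :copyright)       # single-candidate alias
puts lookup(atom_entry, :author).inspect  # no key and no alias
```

Ordering the candidates lets one alias cover feeds that carry only some of the equivalent elements, which is why :pubdate lists :published before :updated.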
@@ -157,9 +254,7 @@ module FeedMe
         },
         :id, :author, :title, :updated, # required
         :category, :contributor, :generator, :icon, :logo, # optional
-        :
-        :'link+replies', :'link+related', :'link+enclosure',
-        :'link+via', :rights, :subtitle
+        :link, :rights, :subtitle
       ]
       @atom_entry_tags = [
         {
@@ -167,22 +262,25 @@ module FeedMe
           :contributor => person_tags
         },
         :id, :author, :title, :updated, :summary, # required
-        :category, :content, :contributor, :
-        :
-        :'link+related', :'link+enclosure', :published,
-        :rights, :source
+        :category, :content, :contributor, :link,
+        :published, :rights, :source
       ]
 
+      @rels = {
+        :link => [ 'self', 'alternate', 'edit', 'replies', 'related', 'enclosure', 'via' ]
+      }
+
       # extensions
       @feed_ext_tags = [
-        :
-        :
+        :dc_date, :feedburner_browserfriendly,
+        :itunes_author, :itunes_category
       ]
       @item_ext_tags = [
-        :
-        :
-        :
-        :
+        :dc_date, :dc_subject, :dc_creator,
+        :dc_title, :dc_rights, :dc_publisher,
+        :trackback_ping, :trackback_about,
+        :feedburner_origlink, :media_content,
+        :content_encoded
       ]
     end
 
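The hunk above replaces the individual :'link+...' tags with a single :link tag plus a @rels table, and a comment elsewhere in this file says that a name of the form "tag+rel" returns the tag having that rel value. The selection code itself is not shown in this diff, so the following is a rough sketch of that lookup under assumed data shapes: each link is a plain hash with :rel and :href keys, and find_by_rel is a hypothetical helper.

```ruby
# Hypothetical data: each link element is a hash of its attributes.
links = [
  { :rel => 'self',      :href => 'http://example.com/feed.atom' },
  { :rel => 'alternate', :href => 'http://example.com/' }
]

# Split a "tag+rel" name into its tag and rel parts, then return the first
# element of that tag whose rel attribute matches. Without a rel part, the
# first element of the tag is returned.
def find_by_rel(elements_by_tag, name)
  tag, rel = name.to_s.split('+', 2)
  candidates = elements_by_tag.fetch(tag.to_sym, [])
  return candidates.first if rel.nil?
  candidates.find { |element| element[:rel] == rel }
end

entry = { :link => links }
puts find_by_rel(entry, :'link+alternate')[:href]
puts find_by_rel(entry, :'link+via').inspect
```

Keeping the allowed rel values in one table, as @rels does, means a new rel can be supported by appending a string instead of adding another :'link+...' symbol to every tag list.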
@@ -202,46 +300,69 @@ module FeedMe
   class FeedData
     attr_reader :fm_tag_name, :fm_parent, :fm_builder
 
-    def initialize(tag_name, parent, builder
+    def initialize(tag_name, parent, builder)
       @fm_tag_name = tag_name
       @fm_parent = parent
       @fm_builder = builder
-      @data =
+      @data = {}
     end
 
     def key?(key)
-      @data.key?(key)
+      @data.key?(clean_tag(key))
     end
 
     def keys
       @data.keys
     end
 
+    def delete(key)
+      @data.delete(clean_tag(key))
+    end
+
+    def each
+      @data.each {|key, value| yield(key, value) }
+    end
+
+    def each_with_index
+      @data.each_with_index {|key, value, index| yield(key, value, index) }
+    end
+
+    def size
+      @data.size
+    end
+
     def [](key)
-      @data[key]
+      @data[clean_tag(key)]
     end
 
     def []=(key, value)
-      @data[key] = value
+      @data[clean_tag(key)] = value
+    end
+
+    # special handling for atom id tags, due to conflict with
+    # ruby's Object#id method
+    def id
+      key?(:id) ? self[:id] : call_virtual_method(:id)
     end
 
     def to_s
-
+      to_indented_s
     end
 
-    def
-
+    def to_indented_s(indent_step=2)
+      FeedMe.pretty_to_s(self, indent_step, 0, Proc.new do |key, value|
+        (value.is_a?(Array) && value.size == 1) ? [unarrayize(key), value.first] : [key, value]
+      end)
     end
 
-
-
-
-
-
-
-
-
-      return key + '_array'
+    def method_missing(name, *args)
+      result = begin
+        call_virtual_method(name, args)
+      rescue NameError
+        raise if fm_builder.options[:error_on_missing_key]
+      end
+      result = '' if result.nil? and fm_builder.options[:empty_string_for_nil]
+      result
     end
 
     # There are several virtual methods for each attribute/tag.
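The method_missing added in the hunk above is where the two options take effect: :error_on_missing_key decides whether a failed lookup re-raises, and :empty_string_for_nil turns a nil result into ''. The sketch below reproduces that control flow in a self-contained class so it can be run on its own. OptionAwareData and fetch_value are hypothetical; fetch_value stands in for call_virtual_method and simply raises NameError for unknown keys.

```ruby
# Hypothetical stand-in for FeedData, reduced to the option handling.
class OptionAwareData
  def initialize(data, options = {})
    @data = data
    @options = options
  end

  # Stand-in for call_virtual_method: raise NameError for unknown keys.
  def fetch_value(name)
    raise NameError, "no such key: #{name}" unless @data.key?(name)
    @data[name]
  end

  def method_missing(name, *args)
    result = begin
      fetch_value(name)
    rescue NameError
      # Re-raise only when strict behavior was requested; otherwise the
      # rescue clause evaluates to nil.
      raise if @options[:error_on_missing_key]
    end
    result = '' if result.nil? && @options[:empty_string_for_nil]
    result
  end

  def respond_to_missing?(name, include_private = false)
    @data.key?(name) || super
  end
end

feed = { :title => 'Slashdot' }
lenient = OptionAwareData.new(feed)
padded  = OptionAwareData.new(feed, :empty_string_for_nil => true)
strict  = OptionAwareData.new(feed, :error_on_missing_key => true)

puts lenient.title
puts lenient.subtitle.inspect
puts padded.subtitle.inspect
begin
  strict.subtitle
rescue NameError => e
  puts "strict lookup raised: #{e.message}"
end
```

With neither option set, a missing key quietly yields nil, which matches the README's description of the defaults.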
@@ -263,70 +384,146 @@ module FeedMe
     # array.size.
     # 7. If the tag name is of the form "tag+rel", the tag having the
     # specified rel value is returned
-    def call_virtual_method(
+    def call_virtual_method(sym, args=[], history=[])
       # make sure we don't get stuck in an infinite loop
       history.each do |call|
-        if call[0] == fm_tag_name and call[1] ==
-
-          puts self.inspect
-          raise FeedMe::InfiniteCallLoopError.new(name, history)
+        if call[0] == fm_tag_name and call[1] == sym
+          raise FeedMe::InfiniteCallLoopError.new(sym, history)
         end
       end
-      history << [ fm_tag_name,
+      history << [ fm_tag_name, sym ]
 
-
-      name = clean_tag(name)
+      name = clean_tag(sym)
       name_str = name.to_s
-      array_key =
-
-
+      array_key = arrayize(name.to_s)
+
+      result = if key? name
+        self[name]
+      elsif key? array_key
+        self[array_key].first
+      elsif name_str[-1,1] == '?'
         !call_virtual_method(name_str[0..-2], args, history).nil? rescue false
       elsif name_str[-1,1] == '!'
         value = call_virtual_method(name_str[0..-2], args, history)
-        fm_builder.
-
-
-
-          value = fm_builder.bang_mod_fns[bm_key].call(value, *parts[1..-1])
+        _transform(fm_builder.default_transformation, value)
+      elsif name_str =~ /(.+)_values/
+        call_virtual_method(arrayize($1), args, history).collect do |value|
+          _resolve_value value
         end
-        return value
-      elsif key? name
-        self[name]
-      elsif key? array_key
-        self[array_key].first
       elsif name_str =~ /(.+)_value/
+        _resolve_value call_virtual_method($1, args, history)
+      elsif name_str =~ /(.+)_count/
+        call_virtual_method(arrayize($1), args, history).size
+      elsif name_str =~ /(.+)_(.+)/ && fm_builder.transformations.key?($2)
         value = call_virtual_method($1, args, history)
-
-
-
-
-
-
+        _transform(fm_builder.transformations[$2], value)
+      elsif name_str.include?('/') # this is only intended to be used internally
+        value = self
+        name_str.split('/').each do |p|
+          parts = p.split('_')
+          name = clean_tag(parts[0])
+          new_args = parts.size > 1 ? parts[1..-1] : args
+          value = (value.method(name).call(*new_args) rescue
+            value.call_virtual_method(name, new_args, history)) rescue nil
|
428
|
+
break if value.nil?
|
305
429
|
end
|
306
|
-
|
307
|
-
|
308
|
-
|
309
|
-
|
310
|
-
|
311
|
-
call_virtual_method(
|
430
|
+
value
|
431
|
+
elsif name_str.include?('+')
|
432
|
+
name_data = name_str.split('+')
|
433
|
+
rel = name_data[1]
|
434
|
+
value = nil
|
435
|
+
call_virtual_method(arrayize(name_data[0]), args, history).each do |elt|
|
312
436
|
next unless elt.is_a?(FeedData) and elt.rel?
|
313
|
-
|
437
|
+
value = elt if elt.rel.casecmp(rel) == 0
|
438
|
+
break unless value.nil?
|
314
439
|
end
|
440
|
+
value
|
315
441
|
elsif fm_builder.aliases.key? name
|
316
|
-
|
317
|
-
|
318
|
-
|
319
|
-
|
320
|
-
|
321
|
-
|
322
|
-
|
323
|
-
if item[:'rdf:about'] == uri
|
324
|
-
return item.call_virtual_method(name, args, history)
|
325
|
-
end
|
442
|
+
names = fm_builder.aliases[name]
|
443
|
+
names = [names] unless names.is_a? Array
|
444
|
+
value = nil
|
445
|
+
names.each do |name|
|
446
|
+
value = (method(name).call(*args) rescue
|
447
|
+
call_virtual_method(name, args, history)) rescue next
|
448
|
+
break unless value.nil?
|
326
449
|
end
|
450
|
+
value
|
327
451
|
else
|
328
|
-
|
452
|
+
nil
|
453
|
+
end
|
454
|
+
|
455
|
+
raise NameError.new("No such method '#{name}'", name) if result.nil?
|
456
|
+
|
457
|
+
result
|
458
|
+
end
|
459
|
+
|
460
|
+
# Apply transformations to a tag value. Can either accept a transformation
|
461
|
+
# name or an array of transformation function names.
|
462
|
+
def transform(tag, trans)
|
463
|
+
value = call_virtual_method(tag) or return nil
|
464
|
+
transformations = trans.is_a?(String) ?
|
465
|
+
fm_builder.transformations[trans] : trans
|
466
|
+
_transform(transformations, value)
|
467
|
+
end
|
468
|
+
|
469
|
+
protected
|
470
|
+
|
471
|
+
def clean_tag(tag)
|
472
|
+
tag.to_s.downcase.gsub(':','_').intern
|
473
|
+
end
|
474
|
+
|
475
|
+
# generate a name for the array variable corresponding to a single-value variable
|
476
|
+
def arrayize(key)
|
477
|
+
clean_tag(key.to_s + '_array')
|
478
|
+
end
|
479
|
+
|
480
|
+
def unarrayize(key)
|
481
|
+
clean_tag(key.to_s.gsub(/_array$/, ''))
|
482
|
+
end
|
483
|
+
|
484
|
+
private
|
485
|
+
|
486
|
+
def _transform(trans_array, value)
|
487
|
+
trans_array.each do |t|
|
488
|
+
if t.is_a? String
|
489
|
+
value = _transform(fm_builder.transformations[t], value)
|
490
|
+
else
|
491
|
+
if t.is_a? Symbol
|
492
|
+
t_name = t
|
493
|
+
args = []
|
494
|
+
elsif t[0].is_a? Array
|
495
|
+
raise 'array where symbol expected'
|
496
|
+
else
|
497
|
+
t_name = t[0]
|
498
|
+
args = t[1..-1]
|
499
|
+
end
|
500
|
+
|
501
|
+
trans = fm_builder.transformation_fns[t_name] or
|
502
|
+
raise NameError.new("No such transformation #{t_name}", t_name)
|
503
|
+
|
504
|
+
if value.is_a? Array
|
505
|
+
value = value.collect {|x| trans.call(x, *args) }
|
506
|
+
else
|
507
|
+
value = trans.call(value, *args)
|
508
|
+
end
|
509
|
+
end
|
510
|
+
end
|
511
|
+
value
|
512
|
+
end
|
513
|
+
|
514
|
+
def _resolve_value(obj)
|
515
|
+
value = obj
|
516
|
+
if obj.is_a?(FeedData)
|
517
|
+
if fm_builder.value_tags.key? obj.fm_tag_name
|
518
|
+
value = obj.call_virtual_method(fm_builder.value_tags[obj.fm_tag_name])
|
519
|
+
else
|
520
|
+
fm_builder.default_value_tags.each do |tag|
|
521
|
+
value = obj.call_virtual_method(tag) rescue next
|
522
|
+
break unless value.nil?
|
523
|
+
end
|
524
|
+
end
|
329
525
|
end
|
526
|
+
value
|
330
527
|
end
|
331
528
|
end
|
332
529
|
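The `_transform` method added in this hunk accepts three kinds of entries: a String naming another transformation list, a Symbol naming a function, or an array of the form `[function_name, arg, ...]`. The sketch below mirrors that dispatch outside the class. The two registries and the `:strip`/`:trunc` functions are assumptions made for illustration; in the diff the registries live on `fm_builder`, and the `'array where symbol expected'` guard is left out here.

```ruby
# Hypothetical registries standing in for fm_builder.transformation_fns
# and fm_builder.transformations.
TRANSFORMATION_FNS = {
  :strip => proc { |s| s.strip },
  :trunc => proc { |s, len| s[0, len.to_i] }
}
TRANSFORMATIONS = {
  'short' => [:strip, [:trunc, 5]]
}

def apply_transformations(trans_array, value)
  trans_array.each do |t|
    if t.is_a?(String)
      # A string refers to a named list of transformations; recurse into it.
      value = apply_transformations(TRANSFORMATIONS.fetch(t), value)
      next
    end

    # A bare symbol has no arguments; an array is [name, arg, ...].
    t_name, *args = t.is_a?(Symbol) ? [t] : t
    fn = TRANSFORMATION_FNS[t_name] or
      raise NameError.new("No such transformation #{t_name}", t_name)

    # Arrays are transformed element by element, as in the diff.
    value = value.is_a?(Array) ? value.collect { |x| fn.call(x, *args) } : fn.call(value, *args)
  end
  value
end

short  = apply_transformations(['short'], '  hello world ')    # => "hello"
mapped = apply_transformations([:strip], ['  ab ', ' cd'])      # => ["ab", "cd"]
```

Passing arguments as array elements rather than encoding them in the symbol is the change History.txt lists for 0.8.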
@@ -346,19 +543,31 @@ module FeedMe
     alias :feed :channel
 
     def fm_tag_name
-      @fm_type == FeedMe::
+      @fm_type == FeedMe::ATOM ? 'feed' : 'channel'
+    end
+
+    def fm_prefix
+      fm_type.to_s.downcase
     end
 
     private
 
     def parse
       # RSS = everything between channel tags + everthing between </channel> and </rdf> if this is an RDF document
-      if @fm_source =~ %r{<(?:.*?:)?(
-        @fm_type =
+      if @fm_source =~ %r{<(?:.*?:)?(rss|rdf)(.*?)>.*?<(?:.*?:)?channel(.*?)>(.+)</(?:.*?:)?channel>(.*)</(?:.*?:)?(?:rss|rdf)>}mi
+        @fm_type = $2.upcase.to_s
         @fm_tags = fm_builder.all_rss_tags
-        attrs = parse_attributes($1, $
+        attrs = parse_attributes($1, $3)
         attrs[:version] ||= '1.0';
-        parse_content(self, attrs, $
+        parse_content(self, attrs, $4, @fm_tags)
+
+        # for RDF documents, replace references with actual items
+        unless nil_or_empty?($5)
+          refs = FeedData.new(nil, nil, fm_builder)
+          parse_content(refs, {}, $5, @fm_tags)
+          dereference_rdf_tags(:items_array, :item_array, refs) {|a| a.first[:rdf_seq_array].first[:rdf_li_array] }
+          [:image_array, :textinput_array].each {|tag| dereference_rdf_tags(tag, tag, refs) }
+        end
       # Atom = everthing between feed tags
       elsif @fm_source =~ %r{<(?:.*?:)?feed(.*?)>(.+)</(?:.*?:)?feed>}mi
         @fm_type = FeedMe::ATOM
@@ -369,21 +578,37 @@ module FeedMe
       end
     end
 
+    # References within the <channel> element are replaced by the actual
+    def dereference_rdf_tags(rdf_tag, rss_tag, refs)
+      if self.key?(rdf_tag)
+        src_items = self.delete(rdf_tag)
+        src_items = yield(src_items) if block_given?
+        ref_items = refs[rss_tag]
+        unless src_items.empty? || ref_items.empty?
+          self[rss_tag] = src_items.collect do |src_item|
+            next unless src_item.key?(:rdf_resource)
+            uri = src_item[:rdf_resource]
+            ref_items.each do |ref_item|
+              next unless ref_item.key?(:rdf_about)
+              if (ref_item[:rdf_about].eql?(uri))
+                ref_item[:rdf_resource] = uri
+                break ref_item
+              end
+            end
+          end
+        end
+      end
+    end
+
     def parse_content(parent, attrs, content, tags)
       # add attributes to parent
-      attrs.each_pair {|key, value|
-
-
-      first_tag = 0
-      if !tags.nil? && tags[0].is_a?(Hash)
-        sub_tags = tags[0]
-        first_tag = 1
-      end
-
+      attrs.each_pair {|key, value| parent[key] = unescape(value) }
+      return if content.nil?
+
       # split the content into elements
       elements = {}
-
-      content.scan( %r{(<(
+      # TODO: this will break if a namespace is used that is not rss: or atom:
+      content.scan( %r{(<([\w:]+)(.*?)(?:/>|>(.*?)</\2>))}mi ) do |match|
         # \1 = full content (from start to end tag), \2 = tag name
         # \3 = attributes, and \4 = content between tags
         key = clean_tag(match[1])
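The element-splitting expression added above can be tried in isolation. The sketch below uses the same regular expression; `split_elements` and the `[attributes, content]` pair stored per element are assumptions for illustration, since the lines that store each match are elided from this hunk.

```ruby
# Same pattern as in the diff: \1 = whole element, \2 = tag name,
# \3 = attribute string, \4 = content between the tags (nil for <tag/>).
ELEMENT_RE = %r{(<([\w:]+)(.*?)(?:/>|>(.*?)</\2>))}mi

# Hypothetical helper: group matches by normalized tag name.
def split_elements(content)
  elements = {}
  content.scan(ELEMENT_RE) do |match|
    key = match[1].downcase.gsub(':', '_').intern
    (elements[key] ||= []) << [match[2], match[3]]
  end
  elements
end

xml = '<title>Example</title><dc:creator>Jane</dc:creator><link href="http://example.com/"/>'
elements = split_elements(xml)
# elements[:title]      => [["", "Example"]]
# elements[:dc_creator] => [["", "Jane"]]
# elements[:link]       => [[" href=\"http://example.com/\"", nil]]
```

The backreference `\2` ties each closing tag to its opening tag name, and the `/>` alternative is what lets self-closed elements such as `<link .../>` match with no content.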
@@ -395,33 +620,37 @@ module FeedMe
         end
       end
 
-      #
-
-
-
-
-
+      # the first item in a tag array may be a hash that defines tags that have subtags
+      sub_tags = tags[0] if !nil_or_empty?(tags) && tags[0].is_a?(Hash)
+      first_tag = sub_tags.nil? || tags.size == 1 ? 0 : 1
+      # if this is a promiscuous parser, tag names will depend on the elements found in the feed
+      tags = elements.keys if (sub_tags.nil? ? nil_or_empty?(tags) : first_tag == 0)
+
       # iterate over all tags (some or all of which may not be present)
       tags[first_tag..-1].each do |tag|
         key = clean_tag(tag)
-
+        element_array = elements.delete(tag) or next
         @fm_parsed << key
 
         element_array.each do |elt|
+          elt_attrs = elt[0]
+          elt_content = elt[1]
+          rels = fm_builder.rels[key] if fm_builder.respond_to?(:rels)
+
+          # if a list of accepted rels is specified, only parse this tag
+          # if its rel attribute is inlcuded in the list
+          next unless rels.nil? || elt_attrs.nil? || !elt_attrs.rel? || rels.include?(elt_attrs.rel)
+
           if !sub_tags.nil? && sub_tags.key?(key)
-
-
-
-              new_parent = FeedData.new(key, parent, fm_builder)
-              add_tag(parent, key, new_parent)
-            end
-            parse_content(new_parent, elt[0], elt[1], sub_tags[key])
+            new_parent = FeedData.new(key, parent, fm_builder)
+            add_tag(parent, key, new_parent)
+            parse_content(new_parent, elt_attrs, elt_content, sub_tags[key])
           else
-            add_tag(parent, key, clean_content(key,
+            add_tag(parent, key, clean_content(key, elt_attrs, elt_content, parent))
           end
         end
       end
-
+
       @fm_unparsed += elements.keys
 
       @fm_parsed.uniq!
@@ -429,7 +658,7 @@ module FeedMe
     end
 
     def add_tag(hash, key, value)
-      array_var =
+      array_var = arrayize(key)
       if hash.key? array_var
         hash[array_var] << value
       else
@@ -446,18 +675,19 @@ module FeedMe
       content = content.to_s
       if fm_builder.date_tags.include? tag
         content = Time.parse(content) rescue unescape(content)
-      else
-
+      else
+        content = unescape(content)
       end
 
       unless attrs.empty?
-        hash = FeedData.new(tag, parent, fm_builder
+        hash = FeedData.new(tag, parent, fm_builder)
+        attrs.each_pair {|key, value| hash[key] = unescape(value) }
         if !content.empty?
           hash[FeedMe::CONTENT_KEY] = content
         end
         return hash
       end
-
+
       return content
     end
 
@@ -466,9 +696,9 @@ module FeedMe
       attrs.each do |a|
         next if a.nil?
         # pull key/value pairs out of attr string
-        array = a.scan(/(\w+)=['"]?([^'"]+)/)
+        array = a.scan(/([\w:]+)=['"]?([^'"]+)/)
         # unescape values
-        array = array.collect {|key, value| [clean_tag(
+        array = array.collect {|key, value| [clean_tag(key), unescape(value)]}
         hash.merge! Hash[*array.flatten]
       end
       return hash
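The only change to the attribute pattern in this hunk is `\w+` becoming `[\w:]+`, so that namespaced attribute names such as `rdf:about` are captured whole. A minimal sketch of the new pattern, assuming a hypothetical helper `parse_attribute_string` and leaving out the `unescape` step:

```ruby
# Hypothetical helper using the attribute pattern added in the diff.
def parse_attribute_string(attr_string)
  pairs = attr_string.scan(/([\w:]+)=['"]?([^'"]+)/)
  pairs.each_with_object({}) do |(key, value), hash|
    # Normalize the name the same way clean_tag does.
    hash[key.downcase.gsub(':', '_').intern] = value
  end
end

attrs = parse_attribute_string(%q{ rdf:about="http://example.com/item1" version='1.0'})
# attrs => {:rdf_about=>"http://example.com/item1", :version=>"1.0"}

# With the old \w+ pattern only the part after the colon is captured as the name.
old_pairs = %q{ rdf:about="http://example.com/item1"}.scan(/(\w+)=['"]?([^'"]+)/)
# old_pairs => [["about", "http://example.com/item1"]]
```

One limitation follows from `[^'"]+`: an unquoted value runs on until the next quote character, so `a=1 b="2"` yields the value `1 b=` for `a`.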
@@ -484,32 +714,10 @@ module FeedMe
       content = cdata[1] if cdata
 
       return content
-
-      #if content =~ /([^-_.!~*'()a-zA-Z\d;\/?:@&=+$,\[\]]%)/n then
-      #  CGI.unescapeHTML(content).gsub(/(<!\[CDATA\[|\]\]>)/,'').strip
-      #else
-      #  content.gsub(/(<!\[CDATA\[|\]\]>)/,'').strip
-      #end
-    end
-
-    def underscore(camel_cased_word)
-      camel_cased_word.to_s.gsub(/::/, '/').
-        gsub(/([A-Z]+)([A-Z][a-z])/,'\1_\2').
-        gsub(/([a-z\d])([A-Z])/,'\1_\2').
-        tr("-", "_").
-        downcase
-    end
-
-    def camelize(lower_case_and_underscored_word, first_letter_in_uppercase = true)
-      if first_letter_in_uppercase
-        lower_case_and_underscored_word.to_s.gsub(/\/(.?)/) { "::#{$1.upcase}" }.gsub(/(?:^|_)(.)/) { $1.upcase }
-      else
-        lower_case_and_underscored_word[0,1].downcase + camelize(lower_case_and_underscored_word)[1..-1]
-      end
     end
 
-    def
-      obj.nil? ?
+    def nil_or_empty?(obj)
+      obj.nil? || obj.empty? || (obj.is_a?(String) && obj.strip.empty?)
     end
   end
 
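The added `nil_or_empty?` is small enough to exercise on its own. The definition below is copied from the added lines; the sample inputs are illustrative only.

```ruby
# Copied from the added lines of the diff.
def nil_or_empty?(obj)
  obj.nil? || obj.empty? || (obj.is_a?(String) && obj.strip.empty?)
end

results = {
  :nil        => nil_or_empty?(nil),       # true
  :empty_str  => nil_or_empty?(''),        # true
  :whitespace => nil_or_empty?("  \n"),    # true, via the strip check from 0.7.1
  :empty_arr  => nil_or_empty?([]),        # true
  :text       => nil_or_empty?('feed')     # false
}
```

The method relies on its argument responding to `empty?`, so it suits strings, arrays and hashes; an Integer, for example, would raise `NoMethodError`.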
|