feed-normalizer 1.5.1 → 1.5.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
data/History.txt CHANGED
@@ -2,51 +2,51 @@
 
  * Fix a bug that was breaking the parsing process for certain feeds. [reported by: Patrick Minton]
 
- 1.5.0
-
- * Add support for new fields:
- * Atom 0.3: issued is now available through entry.date_published.
- * RSS: feed.skip_hours, feed.skip_days, feed.ttl [joshpeek]
- * All: entry.last_updated, this is an alias to entry.date_published for RSS.
- * Rewrite relative links in content [joshpeek]
- * Handle CDATA sections consistently across all formats. [sam.lown]
- * Prevent SimpleRSS from doing its own escaping. [reported by: paul.stadig, lionel.bouton]
- * Reparse Time classes [reported by: sam.lown]
-
- 1.4.0
-
- * Support content:encoded. Accessible via Entry#content.
- * Support categories. Accessible via Entry#categories.
- * Introduces a new parsing feature 'loose parsing'. Use :loose => true
- when parsing if the required output should retain extra data, rather
- than drop it in the interests of 'lowest common denomiator' normalization.
- Currently affects how categories works. See the documentation in
- FeedNormalizer#parse for more details.
-
- 1.3.2
-
- * Add support for applicable dublin core elements. (dc:date and dc:creator)
- * Feeds can now be dumped to YAML.
-
- 1.3.1
-
- * Small changes to work with hpricot 0.6. This release depends on hpricot 0.6.
- * Reduced the greediness of a regexp that was removing html comments.
-
- 1.3.0
-
- * Small changes to work with hpricot 0.5.
-
- 1.2.0
-
- * Added HtmlCleaner - sanitizes HTML and removes 'bad' URIs to a level suitable
- for 'safe' display inside a web browser. Can be used as a standalone library,
- or as part of the Feed object. See Feed.clean! for details about cleaning a
- Feed instance. Also see HtmlCleaner and its unit tests. Uses Hpricot.
- * Added Feed-diffing. Differences between two feeds can be displayed using
- Feed.diff. Works nicely with YAML for a readable diff.
- * FeedNormalizer.parse now takes a hash for its arguments.
- * Removed FN::Content.
- * Now uses Hoe!
-
-
+ 1.5.0
+
+ * Add support for new fields:
+ * Atom 0.3: issued is now available through entry.date_published.
+ * RSS: feed.skip_hours, feed.skip_days, feed.ttl [joshpeek]
+ * All: entry.last_updated, this is an alias to entry.date_published for RSS.
+ * Rewrite relative links in content [joshpeek]
+ * Handle CDATA sections consistently across all formats. [sam.lown]
+ * Prevent SimpleRSS from doing its own escaping. [reported by: paul.stadig, lionel.bouton]
+ * Reparse Time classes [reported by: sam.lown]
+
+ 1.4.0
+
+ * Support content:encoded. Accessible via Entry#content.
+ * Support categories. Accessible via Entry#categories.
+ * Introduces a new parsing feature 'loose parsing'. Use :loose => true
+ when parsing if the required output should retain extra data, rather
+ than drop it in the interests of 'lowest common denomiator' normalization.
+ Currently affects how categories works. See the documentation in
+ FeedNormalizer#parse for more details.
+
+ 1.3.2
+
+ * Add support for applicable dublin core elements. (dc:date and dc:creator)
+ * Feeds can now be dumped to YAML.
+
+ 1.3.1
+
+ * Small changes to work with hpricot 0.6. This release depends on hpricot 0.6.
+ * Reduced the greediness of a regexp that was removing html comments.
+
+ 1.3.0
+
+ * Small changes to work with hpricot 0.5.
+
+ 1.2.0
+
+ * Added HtmlCleaner - sanitizes HTML and removes 'bad' URIs to a level suitable
+ for 'safe' display inside a web browser. Can be used as a standalone library,
+ or as part of the Feed object. See Feed.clean! for details about cleaning a
+ Feed instance. Also see HtmlCleaner and its unit tests. Uses Hpricot.
+ * Added Feed-diffing. Differences between two feeds can be displayed using
+ Feed.diff. Works nicely with YAML for a readable diff.
+ * FeedNormalizer.parse now takes a hash for its arguments.
+ * Removed FN::Content.
+ * Now uses Hoe!
+
+
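The 1.4.0 entry describes loose parsing as keeping extra data instead of reducing every feed to what all parsers can supply. A minimal sketch of that trade-off for categories, assuming each item carries a plain array of category strings; `normalize_categories` is a hypothetical helper for illustration, not part of feed-normalizer:

```ruby
# Hypothetical illustration of the :loose trade-off; not feed-normalizer's API.
# Strict mode keeps only what every parser can supply (SimpleRSS returns at
# most one category per item); loose mode keeps every category.
def normalize_categories(categories, loose = false)
  categories = Array(categories) # tolerate nil or a single string
  loose ? categories.dup : categories.first(1)
end

normalize_categories(%w[ruby rss atom])       # => ["ruby"]
normalize_categories(%w[ruby rss atom], true) # => ["ruby", "rss", "atom"]
```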
data/License.txt CHANGED
@@ -1,27 +1,27 @@
- Copyright (c) 2006-2007, Andrew A. Smith
- All rights reserved.
-
- Redistribution and use in source and binary forms, with or without modification,
- are permitted provided that the following conditions are met:
-
- * Redistributions of source code must retain the above copyright notice,
- this list of conditions and the following disclaimer.
-
- * Redistributions in binary form must reproduce the above copyright notice,
- this list of conditions and the following disclaimer in the documentation
- and/or other materials provided with the distribution.
-
- * Neither the name of the copyright owner nor the names of its contributors
- may be used to endorse or promote products derived from this software
- without specific prior written permission.
-
- THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
- ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
- WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
- DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
- ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
- (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
- LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
- ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
- SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ Copyright (c) 2006-2007, Andrew A. Smith
+ All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without modification,
+ are permitted provided that the following conditions are met:
+
+ * Redistributions of source code must retain the above copyright notice,
+ this list of conditions and the following disclaimer.
+
+ * Redistributions in binary form must reproduce the above copyright notice,
+ this list of conditions and the following disclaimer in the documentation
+ and/or other materials provided with the distribution.
+
+ * Neither the name of the copyright owner nor the names of its contributors
+ may be used to endorse or promote products derived from this software
+ without specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+ ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
+ ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
+ ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
data/Manifest.txt CHANGED
@@ -1,19 +1,18 @@
- History.txt
- License.txt
- Manifest.txt
- Rakefile
- README.txt
- lib/feed-normalizer.rb
- lib/html-cleaner.rb
- lib/parsers/rss.rb
- lib/parsers/simple-rss.rb
- lib/structures.rb
- test/data/atom03.xml
- test/data/atom10.xml
- test/data/rdf10.xml
- test/data/rss20.xml
- test/data/rss20diff.xml
- test/data/rss20diff_short.xml
- test/test_all.rb
- test/test_feednormalizer.rb
- test/test_htmlcleaner.rb
+ History.txt
+ License.txt
+ Manifest.txt
+ Rakefile
+ README.txt
+ lib/feed-normalizer.rb
+ lib/html-cleaner.rb
+ lib/parsers/rss.rb
+ lib/parsers/simple-rss.rb
+ lib/structures.rb
+ test/data/atom03.xml
+ test/data/atom10.xml
+ test/data/rdf10.xml
+ test/data/rss20.xml
+ test/data/rss20diff.xml
+ test/data/rss20diff_short.xml
+ test/test_feednormalizer.rb
+ test/test_htmlcleaner.rb
data/README.txt CHANGED
@@ -1,63 +1,63 @@
- == Feed Normalizer
-
- An extensible Ruby wrapper for Atom and RSS parsers.
-
- Feed normalizer wraps various RSS and Atom parsers, and returns a single unified
- object graph, regardless of the underlying feed format.
-
- == Download
-
- * gem install feed-normalizer
- * http://rubyforge.org/projects/feed-normalizer
- * svn co http://feed-normalizer.googlecode.com/svn/trunk
-
- == Usage
-
- require 'feed-normalizer'
- require 'open-uri'
-
- feed = FeedNormalizer::FeedNormalizer.parse open('http://www.iht.com/rss/frontpage.xml')
-
- feed.title # => "International Herald Tribune"
- feed.url # => "http://www.iht.com/pages/index.php"
- feed.entries.first.url # => "http://www.iht.com/articles/2006/10/03/frontpage/web.1003UN.php"
-
- feed.class # => FeedNormalizer::Feed
- feed.parser # => "RSS::Parser"
-
- Now read an Atom feed, and the same class is returned, and the same terminology applies:
-
- feed = FeedNormalizer::FeedNormalizer.parse open('http://www.atomenabled.org/atom.xml')
-
- feed.title # => "AtomEnabled.org"
- feed.url # => "http://www.atomenabled.org/atom.xml"
- feed.entries.first.url # => "http://www.atomenabled.org/2006/09/moving-toward-atom.php"
-
- The feed representation stays the same, even though a different parser was used.
-
- feed.class # => FeedNormalizer::Feed
- feed.parser # => "SimpleRSS"
-
- == Cleaning / Sanitizing
-
- feed.title # => "My Feed > Your Feed"
- feed.entries.first.content # => "<p x='y'>Hello</p><object></object></html>"
- feed.clean!
-
- All elements should now be either clean HTML, or HTML escaped strings.
-
- feed.title # => "My Feed &gt; Your Feed"
- feed.entries.first.content # => "<p>Hello</p>"
-
- == Extending
-
- Implement a parser wrapper by extending the FeedNormalizer::Parser class and overriding
- the public methods. Also note the helper methods in the root Parser object to make
- mapping of output from the particular parser to the Feed object easier.
-
- See FeedNormalizer::RubyRssParser and FeedNormalizer::SimpleRssParser for examples.
-
- == Authors
- * Andrew A. Smith (andy@tinnedfruit.org)
-
- This library is released under the terms of the BSD License (see the License.txt file for details).
+ == Feed Normalizer
+
+ An extensible Ruby wrapper for Atom and RSS parsers.
+
+ Feed normalizer wraps various RSS and Atom parsers, and returns a single unified
+ object graph, regardless of the underlying feed format.
+
+ == Download
+
+ * gem install feed-normalizer
+ * http://rubyforge.org/projects/feed-normalizer
+ * svn co http://feed-normalizer.googlecode.com/svn/trunk
+
+ == Usage
+
+ require 'feed-normalizer'
+ require 'open-uri'
+
+ feed = FeedNormalizer::FeedNormalizer.parse open('http://www.iht.com/rss/frontpage.xml')
+
+ feed.title # => "International Herald Tribune"
+ feed.url # => "http://www.iht.com/pages/index.php"
+ feed.entries.first.url # => "http://www.iht.com/articles/2006/10/03/frontpage/web.1003UN.php"
+
+ feed.class # => FeedNormalizer::Feed
+ feed.parser # => "RSS::Parser"
+
+ Now read an Atom feed, and the same class is returned, and the same terminology applies:
+
+ feed = FeedNormalizer::FeedNormalizer.parse open('http://www.atomenabled.org/atom.xml')
+
+ feed.title # => "AtomEnabled.org"
+ feed.url # => "http://www.atomenabled.org/atom.xml"
+ feed.entries.first.url # => "http://www.atomenabled.org/2006/09/moving-toward-atom.php"
+
+ The feed representation stays the same, even though a different parser was used.
+
+ feed.class # => FeedNormalizer::Feed
+ feed.parser # => "SimpleRSS"
+
+ == Cleaning / Sanitizing
+
+ feed.title # => "My Feed > Your Feed"
+ feed.entries.first.content # => "<p x='y'>Hello</p><object></object></html>"
+ feed.clean!
+
+ All elements should now be either clean HTML, or HTML escaped strings.
+
+ feed.title # => "My Feed &gt; Your Feed"
+ feed.entries.first.content # => "<p>Hello</p>"
+
+ == Extending
+
+ Implement a parser wrapper by extending the FeedNormalizer::Parser class and overriding
+ the public methods. Also note the helper methods in the root Parser object to make
+ mapping of output from the particular parser to the Feed object easier.
+
+ See FeedNormalizer::RubyRssParser and FeedNormalizer::SimpleRssParser for examples.
+
+ == Authors
+ * Andrew A. Smith (andy@tinnedfruit.org)
+
+ This library is released under the terms of the BSD License (see the License.txt file for details).
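The Cleaning / Sanitizing example shows `clean!` turning `feed.title` into an HTML-escaped string. A minimal sketch of that escaping step for plain-text fields, using Ruby's standard `ERB::Util.html_escape` as a stand-in; `escape_text_field` is hypothetical, and HtmlCleaner's own implementation (Hpricot-based, and also stripping disallowed tags and attributes from content) may differ:

```ruby
require 'erb'

# Hypothetical helper: escapes a plain-text feed field so it is safe to
# embed in HTML. ERB::Util.html_escape is a stdlib stand-in, not HtmlCleaner.
def escape_text_field(text)
  ERB::Util.html_escape(text.to_s)
end

escape_text_field("My Feed > Your Feed") # => "My Feed &gt; Your Feed"
```

The README fetches feeds with `open(url)` after `require 'open-uri'`; on Ruby 3.0 and later, `Kernel#open` no longer opens URLs, and `URI.open(url)` is the equivalent call.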
data/Rakefile CHANGED
@@ -1,25 +1,29 @@
- require 'hoe'
-
- Hoe.new("feed-normalizer", "1.5.1") do |s|
- s.author = "Andrew A. Smith"
- s.email = "andy@tinnedfruit.org"
- s.url = "http://feed-normalizer.rubyforge.org/"
- s.summary = "Extensible Ruby wrapper for Atom and RSS parsers"
- s.description = s.paragraphs_of('README.txt', 1..2).join("\n\n")
- s.changes = s.paragraphs_of('History.txt', 0..1).join("\n\n")
- s.extra_deps << ["simple-rss", ">= 1.1"]
- s.extra_deps << ["hpricot", ">= 0.6"]
- s.need_zip = true
- s.need_tar = false
- end
-
-
- begin
- require 'rcov/rcovtask'
- Rcov::RcovTask.new("rcov") do |t|
- t.test_files = Dir['test/test_all.rb']
- end
- rescue LoadError
- nil
- end
-
+ require 'hoe'
+
+ $: << "lib"
+ require 'feed-normalizer'
+
+ Hoe.spec("feed-normalizer") do |s|
+ s.version = "1.5.2"
+ s.author = "Andrew A. Smith"
+ s.email = "andy@tinnedfruit.org"
+ s.url = "http://github.com/aasmith/feed-normalizer"
+ s.summary = "Extensible Ruby wrapper for Atom and RSS parsers"
+ s.description = s.paragraphs_of('README.txt', 1..2).join("\n\n")
+ s.changes = s.paragraphs_of('History.txt', 0..1).join("\n\n")
+ s.extra_deps << ["simple-rss", ">= 1.1"]
+ s.extra_deps << ["hpricot", ">= 0.6"]
+ s.need_zip = true
+ s.need_tar = false
+ end
+
+
+ begin
+ require 'rcov/rcovtask'
+ Rcov::RcovTask.new("rcov") do |t|
+ t.test_files = Dir['test/test_all.rb']
+ end
+ rescue LoadError
+ nil
+ end
+
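In 1.5.2 the Manifest no longer lists `test/test_all.rb`, while the rcov task still passes `Dir['test/test_all.rb']` as its test list. `Dir[]` returns an empty array when a pattern matches nothing, so that task only has a file to run if `test/test_all.rb` exists in the working tree. A small sketch of that behaviour, assuming Ruby 2.5+ for the `base:` option of `Dir.glob`; `rcov_test_files` is a hypothetical helper:

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: evaluates the rcov task's glob relative to +root+.
# Returns [] when nothing matches, rather than raising an error.
def rcov_test_files(root)
  Dir.glob('test/test_all.rb', base: root)
end

Dir.mktmpdir do |root|
  p rcov_test_files(root) # prints [] because the file does not exist yet
  FileUtils.mkdir_p(File.join(root, 'test'))
  File.write(File.join(root, 'test', 'test_all.rb'), '')
  p rcov_test_files(root) # prints ["test/test_all.rb"]
end
```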
data/lib/feed-normalizer.rb CHANGED
@@ -1,149 +1,149 @@
- require 'structures'
- require 'html-cleaner'
-
- module FeedNormalizer
-
- # The root parser object. Every parser must extend this object.
- class Parser
-
- # Parser being used.
- def self.parser
- nil
- end
-
- # Parses the given feed, and returns a normalized representation.
- # Returns nil if the feed could not be parsed.
- def self.parse(feed, loose)
- nil
- end
-
- # Returns a number to indicate parser priority.
- # The lower the number, the more likely the parser will be used first,
- # and vice-versa.
- def self.priority
- 0
- end
-
- protected
-
- # Some utility methods that can be used by subclasses.
-
- # sets value, or appends to an existing value
- def self.map_functions!(mapping, src, dest)
-
- mapping.each do |dest_function, src_functions|
- src_functions = [src_functions].flatten # pack into array
-
- src_functions.each do |src_function|
- value = if src.respond_to?(src_function)
- src.send(src_function)
- elsif src.respond_to?(:has_key?)
- src[src_function]
- end
-
- unless value.to_s.empty?
- append_or_set!(value, dest, dest_function)
- break
- end
- end
-
- end
- end
-
- def self.append_or_set!(value, object, object_function)
- if object.send(object_function).respond_to? :push
- object.send(object_function).push(value)
- else
- object.send(:"#{object_function}=", value)
- end
- end
-
- private
-
- # Callback that ensures that every parser gets registered.
- def self.inherited(subclass)
- ParserRegistry.register(subclass)
- end
-
- end
-
-
- # The parser registry keeps a list of current parsers that are available.
- class ParserRegistry
-
- @@parsers = []
-
- def self.register(parser)
- @@parsers << parser
- end
-
- # Returns a list of currently registered parsers, in order of priority.
- def self.parsers
- @@parsers.sort_by { |parser| parser.priority }
- end
-
- end
-
-
- class FeedNormalizer
-
- # Parses the given xml and attempts to return a normalized Feed object.
- # Setting +force_parser+ to a suitable parser will mean that parser is
- # used first, and if +try_others+ is false, it is the only parser used,
- # otherwise all parsers in the ParserRegistry are attempted, in
- # order of priority.
- #
- # ===Available options
- #
- # * <tt>:force_parser</tt> - instruct feed-normalizer to try the specified
- # parser first. Takes a class, such as RubyRssParser, or SimpleRssParser.
- #
- # * <tt>:try_others</tt> - +true+ or +false+, defaults to +true+.
- # If +true+, other parsers will be used as described above. The option
- # is useful if combined with +force_parser+ to only use a single parser.
- #
- # * <tt>:loose</tt> - +true+ or +false+, defaults to +false+.
- #
- # Specifies parsing should be done loosely. This means that when
- # feed-normalizer would usually throw away data in order to meet
- # the requirement of keeping resulting feed outputs the same regardless
- # of the underlying parser, the data will instead be kept. This currently
- # affects the following items:
- # * <em>Categories:</em> RSS allows for multiple categories per feed item.
- # * <em>Limitation:</em> SimpleRSS can only return the first category
- # for an item.
- # * <em>Result:</em> When loose is true, the extra categories are kept,
- # of course, only if the parser is not SimpleRSS.
- def self.parse(xml, opts = {})
-
- # Get a string ASAP, as multiple read()'s will start returning nil..
- xml = xml.respond_to?(:read) ? xml.read : xml.to_s
-
- if opts[:force_parser]
- result = opts[:force_parser].parse(xml, opts[:loose])
-
- return result if result
- return nil if opts[:try_others] == false
- end
-
- ParserRegistry.parsers.each do |parser|
- result = parser.parse(xml, opts[:loose])
- return result if result
- end
-
- # if we got here, no parsers worked.
- return nil
- end
- end
-
-
- parser_dir = File.dirname(__FILE__) + '/parsers'
-
- # Load up the parsers
- Dir.open(parser_dir).each do |fn|
- next unless fn =~ /[.]rb$/
- require "parsers/#{fn}"
- end
-
- end
-
+ require 'structures'
+ require 'html-cleaner'
+
+ module FeedNormalizer
+
+ # The root parser object. Every parser must extend this object.
+ class Parser
+
+ # Parser being used.
+ def self.parser
+ nil
+ end
+
+ # Parses the given feed, and returns a normalized representation.
+ # Returns nil if the feed could not be parsed.
+ def self.parse(feed, loose)
+ nil
+ end
+
+ # Returns a number to indicate parser priority.
+ # The lower the number, the more likely the parser will be used first,
+ # and vice-versa.
+ def self.priority
+ 0
+ end
+
+ protected
+
+ # Some utility methods that can be used by subclasses.
+
+ # sets value, or appends to an existing value
+ def self.map_functions!(mapping, src, dest)
+
+ mapping.each do |dest_function, src_functions|
+ src_functions = [src_functions].flatten # pack into array
+
+ src_functions.each do |src_function|
+ value = if src.respond_to?(src_function)
+ src.send(src_function)
+ elsif src.respond_to?(:has_key?)
+ src[src_function]
+ end
+
+ unless value.to_s.empty?
+ append_or_set!(value, dest, dest_function)
+ break
+ end
+ end
+
+ end
+ end
+
+ def self.append_or_set!(value, object, object_function)
+ if object.send(object_function).respond_to? :push
+ object.send(object_function).push(value)
+ else
+ object.send(:"#{object_function}=", value)
+ end
+ end
+
+ private
+
+ # Callback that ensures that every parser gets registered.
+ def self.inherited(subclass)
+ ParserRegistry.register(subclass)
+ end
+
+ end
+
+
+ # The parser registry keeps a list of current parsers that are available.
+ class ParserRegistry
+
+ @@parsers = []
+
+ def self.register(parser)
+ @@parsers << parser
+ end
+
+ # Returns a list of currently registered parsers, in order of priority.
+ def self.parsers
+ @@parsers.sort_by { |parser| parser.priority }
+ end
+
+ end
+
+
+ class FeedNormalizer
+
+ # Parses the given xml and attempts to return a normalized Feed object.
+ # Setting +force_parser+ to a suitable parser will mean that parser is
+ # used first, and if +try_others+ is false, it is the only parser used,
+ # otherwise all parsers in the ParserRegistry are attempted, in
+ # order of priority.
+ #
+ # ===Available options
+ #
+ # * <tt>:force_parser</tt> - instruct feed-normalizer to try the specified
+ # parser first. Takes a class, such as RubyRssParser, or SimpleRssParser.
+ #
+ # * <tt>:try_others</tt> - +true+ or +false+, defaults to +true+.
+ # If +true+, other parsers will be used as described above. The option
+ # is useful if combined with +force_parser+ to only use a single parser.
+ #
+ # * <tt>:loose</tt> - +true+ or +false+, defaults to +false+.
+ #
+ # Specifies parsing should be done loosely. This means that when
+ # feed-normalizer would usually throw away data in order to meet
+ # the requirement of keeping resulting feed outputs the same regardless
+ # of the underlying parser, the data will instead be kept. This currently
+ # affects the following items:
+ # * <em>Categories:</em> RSS allows for multiple categories per feed item.
+ # * <em>Limitation:</em> SimpleRSS can only return the first category
+ # for an item.
+ # * <em>Result:</em> When loose is true, the extra categories are kept,
+ # of course, only if the parser is not SimpleRSS.
+ def self.parse(xml, opts = {})
+
+ # Get a string ASAP, as multiple read()'s will start returning nil..
+ xml = xml.respond_to?(:read) ? xml.read : xml.to_s
+
+ if opts[:force_parser]
+ result = opts[:force_parser].parse(xml, opts[:loose])
+
+ return result if result
+ return nil if opts[:try_others] == false
+ end
+
+ ParserRegistry.parsers.each do |parser|
+ result = parser.parse(xml, opts[:loose])
+ return result if result
+ end
+
+ # if we got here, no parsers worked.
+ return nil
+ end
+ end
+
+
+ parser_dir = File.dirname(__FILE__) + '/parsers'
+
+ # Load up the parsers
+ Dir.open(parser_dir).each do |fn|
+ next unless fn =~ /[.]rb$/
+ require "parsers/#{fn}"
+ end
+
+ end
+
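The registry code relies on Ruby's `inherited` hook: defining a subclass of `Parser` registers it, `ParserRegistry.parsers` sorts by `priority` (lowest first), and `FeedNormalizer.parse` returns the first non-nil result, honouring `:force_parser` and `:try_others`. A self-contained sketch of that control flow, with hypothetical class names (`MiniRegistry`, `MiniParser`, `StrictParser`, `LenientParser`) standing in for the real parsers; it stores parsers in a class-level instance variable rather than `@@parsers`, which is a choice of this sketch, not of the library:

```ruby
require 'stringio'

# Hypothetical stand-in for ParserRegistry.
class MiniRegistry
  @parsers = []

  class << self
    def register(parser)
      @parsers << parser
    end

    # Lowest priority number first, as in ParserRegistry.parsers.
    def parsers
      @parsers.sort_by(&:priority)
    end
  end
end

# Hypothetical stand-in for FeedNormalizer::Parser.
class MiniParser
  def self.priority
    0
  end

  # nil means "could not parse", so the next parser gets a turn.
  def self.parse(_xml, _loose)
    nil
  end

  # Ruby calls this whenever a subclass is defined, so every parser
  # registers itself without explicit bookkeeping.
  def self.inherited(subclass)
    super
    MiniRegistry.register(subclass)
  end
end

class StrictParser < MiniParser
  def self.priority; 100; end
  def self.parse(xml, _loose)
    xml.start_with?('<rss') ? "strict:#{xml}" : nil
  end
end

class LenientParser < MiniParser
  def self.priority; 900; end
  def self.parse(xml, _loose)
    "lenient:#{xml}"
  end
end

# Mirrors the control flow of FeedNormalizer.parse.
def mini_parse(xml, opts = {})
  xml = xml.respond_to?(:read) ? xml.read : xml.to_s # read IO input once
  if opts[:force_parser]
    result = opts[:force_parser].parse(xml, opts[:loose])
    return result if result
    return nil if opts[:try_others] == false
  end
  MiniRegistry.parsers.each do |parser|
    result = parser.parse(xml, opts[:loose])
    return result if result
  end
  nil # no parser succeeded
end
```

Because `StrictParser` has the lower priority number it is tried first, so `mini_parse('<feed/>')` falls through to `LenientParser`, while `mini_parse('<feed/>', force_parser: StrictParser, try_others: false)` returns nil without any fallback.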