feed-normalizer 1.5.1 → 1.5.2

data/History.txt CHANGED
@@ -2,51 +2,51 @@
 
  * Fix a bug that was breaking the parsing process for certain feeds. [reported by: Patrick Minton]
 
- 1.5.0
-
- * Add support for new fields:
-   * Atom 0.3: issued is now available through entry.date_published.
-   * RSS: feed.skip_hours, feed.skip_days, feed.ttl [joshpeek]
-   * All: entry.last_updated, this is an alias to entry.date_published for RSS.
- * Rewrite relative links in content [joshpeek]
- * Handle CDATA sections consistently across all formats. [sam.lown]
- * Prevent SimpleRSS from doing its own escaping. [reported by: paul.stadig, lionel.bouton]
- * Reparse Time classes [reported by: sam.lown]
-
- 1.4.0
-
- * Support content:encoded. Accessible via Entry#content.
- * Support categories. Accessible via Entry#categories.
- * Introduces a new parsing feature 'loose parsing'. Use :loose => true
-   when parsing if the required output should retain extra data, rather
-   than drop it in the interests of 'lowest common denomiator' normalization.
-   Currently affects how categories works. See the documentation in
-   FeedNormalizer#parse for more details.
-
- 1.3.2
-
- * Add support for applicable dublin core elements. (dc:date and dc:creator)
- * Feeds can now be dumped to YAML.
-
- 1.3.1
-
- * Small changes to work with hpricot 0.6. This release depends on hpricot 0.6.
- * Reduced the greediness of a regexp that was removing html comments.
-
- 1.3.0
-
- * Small changes to work with hpricot 0.5.
-
- 1.2.0
-
- * Added HtmlCleaner - sanitizes HTML and removes 'bad' URIs to a level suitable
-   for 'safe' display inside a web browser. Can be used as a standalone library,
-   or as part of the Feed object. See Feed.clean! for details about cleaning a
-   Feed instance. Also see HtmlCleaner and its unit tests. Uses Hpricot.
- * Added Feed-diffing. Differences between two feeds can be displayed using
-   Feed.diff. Works nicely with YAML for a readable diff.
- * FeedNormalizer.parse now takes a hash for its arguments.
- * Removed FN::Content.
- * Now uses Hoe!
-
-
+ 1.5.0
+
+ * Add support for new fields:
+   * Atom 0.3: issued is now available through entry.date_published.
+   * RSS: feed.skip_hours, feed.skip_days, feed.ttl [joshpeek]
+   * All: entry.last_updated, this is an alias to entry.date_published for RSS.
+ * Rewrite relative links in content [joshpeek]
+ * Handle CDATA sections consistently across all formats. [sam.lown]
+ * Prevent SimpleRSS from doing its own escaping. [reported by: paul.stadig, lionel.bouton]
+ * Reparse Time classes [reported by: sam.lown]
+
+ 1.4.0
+
+ * Support content:encoded. Accessible via Entry#content.
+ * Support categories. Accessible via Entry#categories.
+ * Introduces a new parsing feature 'loose parsing'. Use :loose => true
+   when parsing if the required output should retain extra data, rather
+   than drop it in the interests of 'lowest common denomiator' normalization.
+   Currently affects how categories works. See the documentation in
+   FeedNormalizer#parse for more details.
+
+ 1.3.2
+
+ * Add support for applicable dublin core elements. (dc:date and dc:creator)
+ * Feeds can now be dumped to YAML.
+
+ 1.3.1
+
+ * Small changes to work with hpricot 0.6. This release depends on hpricot 0.6.
+ * Reduced the greediness of a regexp that was removing html comments.
+
+ 1.3.0
+
+ * Small changes to work with hpricot 0.5.
+
+ 1.2.0
+
+ * Added HtmlCleaner - sanitizes HTML and removes 'bad' URIs to a level suitable
+   for 'safe' display inside a web browser. Can be used as a standalone library,
+   or as part of the Feed object. See Feed.clean! for details about cleaning a
+   Feed instance. Also see HtmlCleaner and its unit tests. Uses Hpricot.
+ * Added Feed-diffing. Differences between two feeds can be displayed using
+   Feed.diff. Works nicely with YAML for a readable diff.
+ * FeedNormalizer.parse now takes a hash for its arguments.
+ * Removed FN::Content.
+ * Now uses Hoe!
+
+
data/License.txt CHANGED
@@ -1,27 +1,27 @@
- Copyright (c) 2006-2007, Andrew A. Smith
- All rights reserved.
-
- Redistribution and use in source and binary forms, with or without modification,
- are permitted provided that the following conditions are met:
-
- * Redistributions of source code must retain the above copyright notice,
-   this list of conditions and the following disclaimer.
-
- * Redistributions in binary form must reproduce the above copyright notice,
-   this list of conditions and the following disclaimer in the documentation
-   and/or other materials provided with the distribution.
-
- * Neither the name of the copyright owner nor the names of its contributors
-   may be used to endorse or promote products derived from this software
-   without specific prior written permission.
-
- THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
- ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
- WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
- DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
- ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
- (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
- LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
- ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
- SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ Copyright (c) 2006-2007, Andrew A. Smith
+ All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without modification,
+ are permitted provided that the following conditions are met:
+
+ * Redistributions of source code must retain the above copyright notice,
+   this list of conditions and the following disclaimer.
+
+ * Redistributions in binary form must reproduce the above copyright notice,
+   this list of conditions and the following disclaimer in the documentation
+   and/or other materials provided with the distribution.
+
+ * Neither the name of the copyright owner nor the names of its contributors
+   may be used to endorse or promote products derived from this software
+   without specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+ ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
+ ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
+ ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
data/Manifest.txt CHANGED
@@ -1,19 +1,18 @@
- History.txt
- License.txt
- Manifest.txt
- Rakefile
- README.txt
- lib/feed-normalizer.rb
- lib/html-cleaner.rb
- lib/parsers/rss.rb
- lib/parsers/simple-rss.rb
- lib/structures.rb
- test/data/atom03.xml
- test/data/atom10.xml
- test/data/rdf10.xml
- test/data/rss20.xml
- test/data/rss20diff.xml
- test/data/rss20diff_short.xml
- test/test_all.rb
- test/test_feednormalizer.rb
- test/test_htmlcleaner.rb
+ History.txt
+ License.txt
+ Manifest.txt
+ Rakefile
+ README.txt
+ lib/feed-normalizer.rb
+ lib/html-cleaner.rb
+ lib/parsers/rss.rb
+ lib/parsers/simple-rss.rb
+ lib/structures.rb
+ test/data/atom03.xml
+ test/data/atom10.xml
+ test/data/rdf10.xml
+ test/data/rss20.xml
+ test/data/rss20diff.xml
+ test/data/rss20diff_short.xml
+ test/test_feednormalizer.rb
+ test/test_htmlcleaner.rb
data/README.txt CHANGED
@@ -1,63 +1,63 @@
- == Feed Normalizer
-
- An extensible Ruby wrapper for Atom and RSS parsers.
-
- Feed normalizer wraps various RSS and Atom parsers, and returns a single unified
- object graph, regardless of the underlying feed format.
-
- == Download
-
- * gem install feed-normalizer
- * http://rubyforge.org/projects/feed-normalizer
- * svn co http://feed-normalizer.googlecode.com/svn/trunk
-
- == Usage
-
-   require 'feed-normalizer'
-   require 'open-uri'
-
-   feed = FeedNormalizer::FeedNormalizer.parse open('http://www.iht.com/rss/frontpage.xml')
-
-   feed.title # => "International Herald Tribune"
-   feed.url # => "http://www.iht.com/pages/index.php"
-   feed.entries.first.url # => "http://www.iht.com/articles/2006/10/03/frontpage/web.1003UN.php"
-
-   feed.class # => FeedNormalizer::Feed
-   feed.parser # => "RSS::Parser"
-
- Now read an Atom feed, and the same class is returned, and the same terminology applies:
-
-   feed = FeedNormalizer::FeedNormalizer.parse open('http://www.atomenabled.org/atom.xml')
-
-   feed.title # => "AtomEnabled.org"
-   feed.url # => "http://www.atomenabled.org/atom.xml"
-   feed.entries.first.url # => "http://www.atomenabled.org/2006/09/moving-toward-atom.php"
-
- The feed representation stays the same, even though a different parser was used.
-
-   feed.class # => FeedNormalizer::Feed
-   feed.parser # => "SimpleRSS"
-
- == Cleaning / Sanitizing
-
-   feed.title # => "My Feed > Your Feed"
-   feed.entries.first.content # => "<p x='y'>Hello</p><object></object></html>"
-   feed.clean!
-
- All elements should now be either clean HTML, or HTML escaped strings.
-
-   feed.title # => "My Feed &gt; Your Feed"
-   feed.entries.first.content # => "<p>Hello</p>"
-
- == Extending
-
- Implement a parser wrapper by extending the FeedNormalizer::Parser class and overriding
- the public methods. Also note the helper methods in the root Parser object to make
- mapping of output from the particular parser to the Feed object easier.
-
- See FeedNormalizer::RubyRssParser and FeedNormalizer::SimpleRssParser for examples.
-
- == Authors
- * Andrew A. Smith (andy@tinnedfruit.org)
-
- This library is released under the terms of the BSD License (see the License.txt file for details).
+ == Feed Normalizer
+
+ An extensible Ruby wrapper for Atom and RSS parsers.
+
+ Feed normalizer wraps various RSS and Atom parsers, and returns a single unified
+ object graph, regardless of the underlying feed format.
+
+ == Download
+
+ * gem install feed-normalizer
+ * http://rubyforge.org/projects/feed-normalizer
+ * svn co http://feed-normalizer.googlecode.com/svn/trunk
+
+ == Usage
+
+   require 'feed-normalizer'
+   require 'open-uri'
+
+   feed = FeedNormalizer::FeedNormalizer.parse open('http://www.iht.com/rss/frontpage.xml')
+
+   feed.title # => "International Herald Tribune"
+   feed.url # => "http://www.iht.com/pages/index.php"
+   feed.entries.first.url # => "http://www.iht.com/articles/2006/10/03/frontpage/web.1003UN.php"
+
+   feed.class # => FeedNormalizer::Feed
+   feed.parser # => "RSS::Parser"
+
+ Now read an Atom feed, and the same class is returned, and the same terminology applies:
+
+   feed = FeedNormalizer::FeedNormalizer.parse open('http://www.atomenabled.org/atom.xml')
+
+   feed.title # => "AtomEnabled.org"
+   feed.url # => "http://www.atomenabled.org/atom.xml"
+   feed.entries.first.url # => "http://www.atomenabled.org/2006/09/moving-toward-atom.php"
+
+ The feed representation stays the same, even though a different parser was used.
+
+   feed.class # => FeedNormalizer::Feed
+   feed.parser # => "SimpleRSS"
+
+ == Cleaning / Sanitizing
+
+   feed.title # => "My Feed > Your Feed"
+   feed.entries.first.content # => "<p x='y'>Hello</p><object></object></html>"
+   feed.clean!
+
+ All elements should now be either clean HTML, or HTML escaped strings.
+
+   feed.title # => "My Feed &gt; Your Feed"
+   feed.entries.first.content # => "<p>Hello</p>"
+
+ == Extending
+
+ Implement a parser wrapper by extending the FeedNormalizer::Parser class and overriding
+ the public methods. Also note the helper methods in the root Parser object to make
+ mapping of output from the particular parser to the Feed object easier.
+
+ See FeedNormalizer::RubyRssParser and FeedNormalizer::SimpleRssParser for examples.
+
+ == Authors
+ * Andrew A. Smith (andy@tinnedfruit.org)
+
+ This library is released under the terms of the BSD License (see the License.txt file for details).
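
The "Extending" section of the README describes subclassing a root parser class. As a rough, self-contained sketch of how that could work (it does not load the gem; SketchRegistry, SketchParser, PickyParser, FallbackParser and sketch_parse are hypothetical stand-ins, not names from feed-normalizer), subclassing registers a parser, and the lowest priority number is tried first:

```ruby
# Hypothetical stand-in for a parser registry; not the gem's own class.
class SketchRegistry
  @parsers = []

  class << self
    def register(parser)
      @parsers << parser
    end

    # Registered parsers, lowest priority number first.
    def parsers
      @parsers.sort_by { |parser| parser.priority }
    end
  end
end

# Hypothetical root parser: every subclass registers itself on definition.
class SketchParser
  # Ruby invokes this hook each time a subclass is defined.
  def self.inherited(subclass)
    super
    SketchRegistry.register(subclass)
  end

  def self.priority
    0
  end

  # Returns nil to signal "could not parse".
  def self.parse(xml, loose)
    nil
  end
end

# Defined first, but tried last because of its higher priority number.
class FallbackParser < SketchParser
  def self.priority
    900
  end

  def self.parse(xml, loose)
    "fallback:#{xml}"
  end
end

# Only accepts input that looks like RSS; otherwise gives up with nil.
class PickyParser < SketchParser
  def self.priority
    100
  end

  def self.parse(xml, loose)
    xml.include?("<rss") ? "picky:#{xml}" : nil
  end
end

# Try each parser in priority order and return the first non-nil result.
def sketch_parse(xml, opts = {})
  SketchRegistry.parsers.each do |parser|
    result = parser.parse(xml, opts[:loose])
    return result if result
  end
  nil
end
```

Calling sketch_parse("<rss/>") is handled by PickyParser, while sketch_parse("<feed/>") falls through to FallbackParser. Returning nil from parse is what lets the loop move on to the next candidate.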
data/Rakefile CHANGED
@@ -1,25 +1,29 @@
- require 'hoe'
-
- Hoe.new("feed-normalizer", "1.5.1") do |s|
-   s.author = "Andrew A. Smith"
-   s.email = "andy@tinnedfruit.org"
-   s.url = "http://feed-normalizer.rubyforge.org/"
-   s.summary = "Extensible Ruby wrapper for Atom and RSS parsers"
-   s.description = s.paragraphs_of('README.txt', 1..2).join("\n\n")
-   s.changes = s.paragraphs_of('History.txt', 0..1).join("\n\n")
-   s.extra_deps << ["simple-rss", ">= 1.1"]
-   s.extra_deps << ["hpricot", ">= 0.6"]
-   s.need_zip = true
-   s.need_tar = false
- end
-
-
- begin
-   require 'rcov/rcovtask'
-   Rcov::RcovTask.new("rcov") do |t|
-     t.test_files = Dir['test/test_all.rb']
-   end
- rescue LoadError
-   nil
- end
-
+ require 'hoe'
+
+ $: << "lib"
+ require 'feed-normalizer'
+
+ Hoe.spec("feed-normalizer") do |s|
+   s.version = "1.5.2"
+   s.author = "Andrew A. Smith"
+   s.email = "andy@tinnedfruit.org"
+   s.url = "http://github.com/aasmith/feed-normalizer"
+   s.summary = "Extensible Ruby wrapper for Atom and RSS parsers"
+   s.description = s.paragraphs_of('README.txt', 1..2).join("\n\n")
+   s.changes = s.paragraphs_of('History.txt', 0..1).join("\n\n")
+   s.extra_deps << ["simple-rss", ">= 1.1"]
+   s.extra_deps << ["hpricot", ">= 0.6"]
+   s.need_zip = true
+   s.need_tar = false
+ end
+
+
+ begin
+   require 'rcov/rcovtask'
+   Rcov::RcovTask.new("rcov") do |t|
+     t.test_files = Dir['test/test_all.rb']
+   end
+ rescue LoadError
+   nil
+ end
+
data/lib/feed-normalizer.rb CHANGED
@@ -1,149 +1,149 @@
- require 'structures'
- require 'html-cleaner'
-
- module FeedNormalizer
-
-   # The root parser object. Every parser must extend this object.
-   class Parser
-
-     # Parser being used.
-     def self.parser
-       nil
-     end
-
-     # Parses the given feed, and returns a normalized representation.
-     # Returns nil if the feed could not be parsed.
-     def self.parse(feed, loose)
-       nil
-     end
-
-     # Returns a number to indicate parser priority.
-     # The lower the number, the more likely the parser will be used first,
-     # and vice-versa.
-     def self.priority
-       0
-     end
-
-     protected
-
-     # Some utility methods that can be used by subclasses.
-
-     # sets value, or appends to an existing value
-     def self.map_functions!(mapping, src, dest)
-
-       mapping.each do |dest_function, src_functions|
-         src_functions = [src_functions].flatten # pack into array
-
-         src_functions.each do |src_function|
-           value = if src.respond_to?(src_function)
-             src.send(src_function)
-           elsif src.respond_to?(:has_key?)
-             src[src_function]
-           end
-
-           unless value.to_s.empty?
-             append_or_set!(value, dest, dest_function)
-             break
-           end
-         end
-
-       end
-     end
-
-     def self.append_or_set!(value, object, object_function)
-       if object.send(object_function).respond_to? :push
-         object.send(object_function).push(value)
-       else
-         object.send(:"#{object_function}=", value)
-       end
-     end
-
-     private
-
-     # Callback that ensures that every parser gets registered.
-     def self.inherited(subclass)
-       ParserRegistry.register(subclass)
-     end
-
-   end
-
-
-   # The parser registry keeps a list of current parsers that are available.
-   class ParserRegistry
-
-     @@parsers = []
-
-     def self.register(parser)
-       @@parsers << parser
-     end
-
-     # Returns a list of currently registered parsers, in order of priority.
-     def self.parsers
-       @@parsers.sort_by { |parser| parser.priority }
-     end
-
-   end
-
-
-   class FeedNormalizer
-
-     # Parses the given xml and attempts to return a normalized Feed object.
-     # Setting +force_parser+ to a suitable parser will mean that parser is
-     # used first, and if +try_others+ is false, it is the only parser used,
-     # otherwise all parsers in the ParserRegistry are attempted, in
-     # order of priority.
-     #
-     # ===Available options
-     #
-     # * <tt>:force_parser</tt> - instruct feed-normalizer to try the specified
-     # parser first. Takes a class, such as RubyRssParser, or SimpleRssParser.
-     #
-     # * <tt>:try_others</tt> - +true+ or +false+, defaults to +true+.
-     # If +true+, other parsers will be used as described above. The option
-     # is useful if combined with +force_parser+ to only use a single parser.
-     #
-     # * <tt>:loose</tt> - +true+ or +false+, defaults to +false+.
-     #
-     # Specifies parsing should be done loosely. This means that when
-     # feed-normalizer would usually throw away data in order to meet
-     # the requirement of keeping resulting feed outputs the same regardless
-     # of the underlying parser, the data will instead be kept. This currently
-     # affects the following items:
-     # * <em>Categories:</em> RSS allows for multiple categories per feed item.
-     # * <em>Limitation:</em> SimpleRSS can only return the first category
-     # for an item.
-     # * <em>Result:</em> When loose is true, the extra categories are kept,
-     # of course, only if the parser is not SimpleRSS.
-     def self.parse(xml, opts = {})
-
-       # Get a string ASAP, as multiple read()'s will start returning nil..
-       xml = xml.respond_to?(:read) ? xml.read : xml.to_s
-
-       if opts[:force_parser]
-         result = opts[:force_parser].parse(xml, opts[:loose])
-
-         return result if result
-         return nil if opts[:try_others] == false
-       end
-
-       ParserRegistry.parsers.each do |parser|
-         result = parser.parse(xml, opts[:loose])
-         return result if result
-       end
-
-       # if we got here, no parsers worked.
-       return nil
-     end
-   end
-
-
-   parser_dir = File.dirname(__FILE__) + '/parsers'
-
-   # Load up the parsers
-   Dir.open(parser_dir).each do |fn|
-     next unless fn =~ /[.]rb$/
-     require "parsers/#{fn}"
-   end
-
- end
-
+ require 'structures'
+ require 'html-cleaner'
+
+ module FeedNormalizer
+
+   # The root parser object. Every parser must extend this object.
+   class Parser
+
+     # Parser being used.
+     def self.parser
+       nil
+     end
+
+     # Parses the given feed, and returns a normalized representation.
+     # Returns nil if the feed could not be parsed.
+     def self.parse(feed, loose)
+       nil
+     end
+
+     # Returns a number to indicate parser priority.
+     # The lower the number, the more likely the parser will be used first,
+     # and vice-versa.
+     def self.priority
+       0
+     end
+
+     protected
+
+     # Some utility methods that can be used by subclasses.
+
+     # sets value, or appends to an existing value
+     def self.map_functions!(mapping, src, dest)
+
+       mapping.each do |dest_function, src_functions|
+         src_functions = [src_functions].flatten # pack into array
+
+         src_functions.each do |src_function|
+           value = if src.respond_to?(src_function)
+             src.send(src_function)
+           elsif src.respond_to?(:has_key?)
+             src[src_function]
+           end
+
+           unless value.to_s.empty?
+             append_or_set!(value, dest, dest_function)
+             break
+           end
+         end
+
+       end
+     end
+
+     def self.append_or_set!(value, object, object_function)
+       if object.send(object_function).respond_to? :push
+         object.send(object_function).push(value)
+       else
+         object.send(:"#{object_function}=", value)
+       end
+     end
+
+     private
+
+     # Callback that ensures that every parser gets registered.
+     def self.inherited(subclass)
+       ParserRegistry.register(subclass)
+     end
+
+   end
+
+
+   # The parser registry keeps a list of current parsers that are available.
+   class ParserRegistry
+
+     @@parsers = []
+
+     def self.register(parser)
+       @@parsers << parser
+     end
+
+     # Returns a list of currently registered parsers, in order of priority.
+     def self.parsers
+       @@parsers.sort_by { |parser| parser.priority }
+     end
+
+   end
+
+
+   class FeedNormalizer
+
+     # Parses the given xml and attempts to return a normalized Feed object.
+     # Setting +force_parser+ to a suitable parser will mean that parser is
+     # used first, and if +try_others+ is false, it is the only parser used,
+     # otherwise all parsers in the ParserRegistry are attempted, in
+     # order of priority.
+     #
+     # ===Available options
+     #
+     # * <tt>:force_parser</tt> - instruct feed-normalizer to try the specified
+     # parser first. Takes a class, such as RubyRssParser, or SimpleRssParser.
+     #
+     # * <tt>:try_others</tt> - +true+ or +false+, defaults to +true+.
+     # If +true+, other parsers will be used as described above. The option
+     # is useful if combined with +force_parser+ to only use a single parser.
+     #
+     # * <tt>:loose</tt> - +true+ or +false+, defaults to +false+.
+     #
+     # Specifies parsing should be done loosely. This means that when
+     # feed-normalizer would usually throw away data in order to meet
+     # the requirement of keeping resulting feed outputs the same regardless
+     # of the underlying parser, the data will instead be kept. This currently
+     # affects the following items:
+     # * <em>Categories:</em> RSS allows for multiple categories per feed item.
+     # * <em>Limitation:</em> SimpleRSS can only return the first category
+     # for an item.
+     # * <em>Result:</em> When loose is true, the extra categories are kept,
+     # of course, only if the parser is not SimpleRSS.
+     def self.parse(xml, opts = {})
+
+       # Get a string ASAP, as multiple read()'s will start returning nil..
+       xml = xml.respond_to?(:read) ? xml.read : xml.to_s
+
+       if opts[:force_parser]
+         result = opts[:force_parser].parse(xml, opts[:loose])
+
+         return result if result
+         return nil if opts[:try_others] == false
+       end
+
+       ParserRegistry.parsers.each do |parser|
+         result = parser.parse(xml, opts[:loose])
+         return result if result
+       end
+
+       # if we got here, no parsers worked.
+       return nil
+     end
+   end
+
+
+   parser_dir = File.dirname(__FILE__) + '/parsers'
+
+   # Load up the parsers
+   Dir.open(parser_dir).each do |fn|
+     next unless fn =~ /[.]rb$/
+     require "parsers/#{fn}"
+   end
+
+ end
+
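
The map_functions! and append_or_set! helpers in the file above copy values from a parser's output onto a destination object, trying several source names in turn. A minimal standalone sketch of that idea follows. It assumes a plain Hash as the source and a Struct as the destination; map_fields, append_or_set and SketchEntry are hypothetical names used only for illustration, not part of the gem.

```ruby
# Hypothetical destination type: one scalar field and one list field.
SketchEntry = Struct.new(:title, :authors)

# Push onto list-like fields, assign to everything else.
def append_or_set(value, object, name)
  current = object.send(name)
  if current.respond_to?(:push)
    current.push(value)
  else
    object.send(:"#{name}=", value)
  end
end

# For each destination field, use the first source name that yields a
# non-empty value. Sources may be objects (method call) or hashes (key lookup).
def map_fields(mapping, src, dest)
  mapping.each do |dest_name, src_names|
    Array(src_names).each do |src_name|
      value = if src.respond_to?(src_name)
                src.send(src_name)
              elsif src.respond_to?(:has_key?)
                src[src_name]
              end

      next if value.to_s.empty?

      append_or_set(value, dest, dest_name)
      break
    end
  end
end

dest = SketchEntry.new(nil, [])
src  = { :dc_title => "", :title => "Hello", :author => "Ann" }

# :dc_title is empty, so :title is used; :author is appended to the list.
map_fields({ :title => [:dc_title, :title], :authors => :author }, src, dest)
```

After the call, dest.title holds "Hello" and dest.authors holds ["Ann"]. Listing several source names per destination is what lets one mapping table cover parsers that name the same field differently.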