jsl-feedzirra 0.0.12.1

Files changed (39)
  1. data/README.rdoc +194 -0
  2. data/Rakefile +56 -0
  3. data/lib/core_ext/array.rb +8 -0
  4. data/lib/core_ext/date.rb +21 -0
  5. data/lib/core_ext/string.rb +9 -0
  6. data/lib/feedzirra/backend/filesystem.rb +32 -0
  7. data/lib/feedzirra/backend/memcache.rb +37 -0
  8. data/lib/feedzirra/backend/memory.rb +22 -0
  9. data/lib/feedzirra/feed.rb +68 -0
  10. data/lib/feedzirra/feed_parser.rb +64 -0
  11. data/lib/feedzirra/http_multi.rb +185 -0
  12. data/lib/feedzirra/parser/atom.rb +26 -0
  13. data/lib/feedzirra/parser/atom_entry.rb +34 -0
  14. data/lib/feedzirra/parser/atom_feed_burner.rb +27 -0
  15. data/lib/feedzirra/parser/atom_feed_burner_entry.rb +35 -0
  16. data/lib/feedzirra/parser/feed_entry_utilities.rb +45 -0
  17. data/lib/feedzirra/parser/feed_utilities.rb +71 -0
  18. data/lib/feedzirra/parser/itunes_rss.rb +50 -0
  19. data/lib/feedzirra/parser/itunes_rss_item.rb +31 -0
  20. data/lib/feedzirra/parser/itunes_rss_owner.rb +12 -0
  21. data/lib/feedzirra/parser/rss.rb +28 -0
  22. data/lib/feedzirra/parser/rss_entry.rb +40 -0
  23. data/lib/feedzirra/reader.rb +28 -0
  24. data/lib/feedzirra.rb +44 -0
  25. data/spec/feedzirra/feed_entry_utilities_spec.rb +52 -0
  26. data/spec/feedzirra/feed_spec.rb +5 -0
  27. data/spec/feedzirra/feed_utilities_spec.rb +149 -0
  28. data/spec/feedzirra/parser/atom_entry_spec.rb +45 -0
  29. data/spec/feedzirra/parser/atom_feed_burner_entry_spec.rb +42 -0
  30. data/spec/feedzirra/parser/atom_feed_burner_spec.rb +39 -0
  31. data/spec/feedzirra/parser/atom_spec.rb +35 -0
  32. data/spec/feedzirra/parser/itunes_rss_item_spec.rb +48 -0
  33. data/spec/feedzirra/parser/itunes_rss_owner_spec.rb +18 -0
  34. data/spec/feedzirra/parser/itunes_rss_spec.rb +50 -0
  35. data/spec/feedzirra/parser/rss_entry_spec.rb +41 -0
  36. data/spec/feedzirra/parser/rss_spec.rb +41 -0
  37. data/spec/spec.opts +2 -0
  38. data/spec/spec_helper.rb +67 -0
  39. metadata +159 -0
data/README.rdoc ADDED
@@ -0,0 +1,194 @@
1
+ == Feedzirra
2
+
3
+ === Description
4
+
5
+ Feedzirra is a feed library that is designed to get and update many feeds as quickly as possible. This includes using libcurl-multi through the
6
+ taf2-curb[link:http://github.com/taf2/curb/tree/master] gem for faster http gets, and libxml through nokogiri[link:http://github.com/tenderlove/nokogiri/tree/master]
7
+ and sax-machine[link:http://github.com/pauldix/sax-machine/tree/master] for faster parsing.
8
+
9
+ It allows for easy customization of feed parsing options through the definition of custom parsing classes, and allows you to take as little or as much control as you want in updating feeds. Feedzirra
10
+ makes it easy to figure out which content in feeds is new by providing simple 'backends' so that Feedzirra can track the last contents fetched from a particular feed. Out of the box, Feedzirra can
11
+ store this information in the filesystem, Memcached or Tokyo Cabinet. If you want to keep track of new or updated feeds on your own, just use the default backend, which will let you set options
12
+ for conditional fetching of feeds without the help of Feedzirra.
13
+
14
+ === Installation
15
+
16
+ For now Feedzirra exists only on github. It also has a few gem requirements that are only on github. Before you start you need to have libcurl[link:http://curl.haxx.se/] and
17
+ libxml[link:http://xmlsoft.org/] installed. If you're on Leopard you have both. Otherwise, you'll need to grab them. Once you've got those libraries, these are the gems that get
18
+ used: nokogiri, pauldix-sax-machine, taf2-curb (note that this is a fork that lives on github and not the Ruby Forge version of curb), and pauldix-feedzirra. The feedzirra gemspec has all
19
+ the dependencies so you should be able to get up and running with the standard github gem install routine:
20
+
21
+ gem sources -a http://gems.github.com # if you haven't already
22
+ gem install pauldix-feedzirra
23
+
24
+ ==== Troubleshooting Installation
25
+
26
+ *NOTE:* Some people have been reporting a few issues related to installation. First, the Ruby Forge version of curb is not what you want. It will not work. Nor will the curl-multi gem that lives on
27
+ Ruby Forge. You have to get the taf2-curb[link:http://github.com/taf2/curb/tree/master] fork installed.
28
+
29
+ If you see this error when doing a require:
30
+
31
+ /Library/Ruby/Site/1.8/rubygems/custom_require.rb:31:in `gem_original_require': no such file to load -- curb_core (LoadError)
32
+
33
+ It means that the taf2-curb gem didn't build correctly. To resolve this you can do a git clone git://github.com/taf2/curb.git then run rake gem in the curb directory, then sudo gem
34
+ install pkg/curb-0.2.4.0.gem. After that you should be good.
35
+
36
+ If you see something like this when trying to run it:
37
+
38
+ NoMethodError: undefined method `on_success' for #<Curl::Easy:0x1182724>
39
+ from ./lib/feedzirra/feed.rb:88:in `add_url_to_multi'
40
+
41
+ This means that you are requiring curl-multi or the Ruby Forge version of Curb somewhere. You can't use those and need to get the taf2 version up and running.
42
+
43
+ If you're on Debian or Ubuntu and getting errors while trying to install the taf2-curb gem, it could be because you don't have the latest version of libcurl installed. Do this to fix:
44
+
45
+ sudo apt-get install libcurl4-gnutls-dev
46
+
47
+ Another problem could be if you are running Mac Ports and you have libcurl installed through there. You need to uninstall it for curb to work! The version in Mac Ports is old and doesn't play nice with curb. If you're running Leopard, you can just uninstall and you should be golden. If you're on an older version of OS X, you'll then need to {download curl}[http://curl.haxx.se/download.html] and build from source. Then you'll have to install the taf2-curb gem again. You might have to perform the step above.
48
+
49
+ If you're still having issues, please let me know on the mailing list. Also, {Todd Fisher (taf2)}[link:http://github.com/taf2] is working on fixing the gem install. Please send him a full error report.
50
+
51
+ === Usage
52
+
53
+ This experimental branch offers a new interface to feed fetching with persistent back-end stores. This allows you to
54
+ easily run a script retrieving the feeds once per hour or once per day, and it will remember which feeds have been seen
55
+ before and which are new. This feature uses the Feedzirra::Reader interface.
56
+
57
+ You can create a Feedzirra::Reader object after the Feedzirra library (with require 'feedzirra') is loaded as follows:
58
+
59
+ reader = Feedzirra::Reader.new('http://www.woostercollective.com/rss/index.xml')
60
+ feed = reader.fetch
61
+
62
+ The Reader object can take a single URL or a list of URLs followed by a Hash of options. The options hash
63
+ allows configuration of the backend store, as well as fetching options for the list of urls. Following is
64
+ an example of configuration with the Memcache store connected to Tokyo Tyrant (the front-end for Tokyo Cabinet):
65
+
66
+ reader = Feedzirra::Reader.new('http://www.pauldix.net/atom.xml', :backend =>
67
+ { :class => Feedzirra::Backend::Memcache, :port => 1978, :server => 'localhost' })
68
+
69
+ Other options that may be put in the options hash follow the original API described below.
70
+
71
+ Running reader.fetch will first check the back-end store to see if this feed was fetched previously. If it was previously fetched,
72
+ Feedzirra uses this information to avoid fetching the whole body if it has already been downloaded based on etag. If the feed
73
+ has been updated, the new_entries will be populated based on the results of the last query. The back-end store will be updated with
74
+ the results of every fetch, so Feedzirra will maintain state between executions. Feedzirra currently supports filesystem, memcache
75
+ and a Ruby Hash structure-based back end that doesn't attempt to persist any information.
76
+
77
+ Once you've retrieved a single feed, you can use the accessors below to query the results.
78
+
79
+ # feed and entries accessors
80
+ feed.title # => "Paul Dix Explains Nothing"
81
+ feed.url # => "http://www.pauldix.net"
82
+ feed.feed_url # => "http://feeds.feedburner.com/PaulDixExplainsNothing"
83
+ feed.etag # => "GunxqnEP4NeYhrqq9TyVKTuDnh0"
84
+ feed.last_modified # => Sat Jan 31 17:58:16 -0500 2009 # it's a Time object
85
+
86
+ entry = feed.entries.first
87
+ entry.title # => "Ruby Http Client Library Performance"
88
+ entry.url # => "http://www.pauldix.net/2009/01/ruby-http-client-library-performance.html"
89
+ entry.author # => "Paul Dix"
90
+ entry.summary # => "..."
91
+ entry.content # => "..."
92
+ entry.published # => Thu Jan 29 17:00:19 UTC 2009 # it's a Time object
93
+ entry.categories # => ["...", "..."]
94
+
95
+ # sanitizing an entry's content
96
+ entry.title.sanitize # => returns the title with harmful stuff escaped
97
+ entry.author.sanitize # => returns the author with harmful stuff escaped
98
+ entry.content.sanitize # => returns the content with harmful stuff escaped
99
+ entry.content.sanitize! # => returns content with harmful stuff escaped and replaces original (also exists for author and title)
100
+ entry.sanitize! # => sanitizes the entry's title, author, and content in place (as in, it changes the value to clean versions)
101
+ feed.sanitize_entries! # => sanitizes all entries in place
102
+
103
+ # updating a single feed
104
+ updated_feed = Feedzirra::Feed.update(feed)
105
+
106
+ # an updated feed has the following extra accessors
107
+ updated_feed.updated? # returns true if any of the feed attributes have been modified; returns false if only new entries were added
108
+ updated_feed.new_entries # a collection of the entry objects that are newer than the latest in the feed before update
109
+
110
+ # fetching multiple feeds
111
+ feed_urls = ["http://feeds.feedburner.com/PaulDixExplainsNothing", "http://feeds.feedburner.com/trottercashion"]
112
+ feeds = Feedzirra::Reader.new(feed_urls).fetch
113
+
114
+ # feeds is now a hash with the feed_urls as keys and the parsed feed objects as values. If an error was thrown
115
+ # there will be a Fixnum of the http response code instead of a feed object
116
+
117
+ # updating multiple feeds. if you're using a persistent back-end, Feedzirra uses that to determine which entries are ones that you haven't seen before
118
+ updated_feeds = Feedzirra::Reader.new(feed_urls).fetch
119
+
120
+ # defining custom behavior on failure or success. note that a return status of 304 (not updated) will call the on_success handler
121
+ feed = Feedzirra::Reader.new("http://feeds.feedburner.com/PaulDixExplainsNothing",
122
+ :on_success => lambda {|feed| puts feed.title },
123
+ :on_failure => lambda {|url, response_code, response_header, response_body| puts response_body }).fetch
124
+
125
+ # if a collection was passed into the initializer, the handlers will be called for each one
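The README says that fetching several URLs yields a hash keyed by URL, where a failed fetch leaves the HTTP response code in place of a feed object. A minimal sketch of separating the two cases follows. `FakeFeed` and the example URLs are hypothetical stand-ins, not part of Feedzirra, and the check uses `Integer` because `Fixnum` is no longer a separate class on current Ruby.

```ruby
# Hypothetical stand-in for a parsed feed object (not a Feedzirra class).
FakeFeed = Struct.new(:title)

# Shape described in the README: URL => feed object, or an integer
# HTTP status code when the fetch failed.
results = {
  'http://example.com/a.xml' => FakeFeed.new('Feed A'),
  'http://example.com/b.xml' => 404
}

# Split the [url, value] pairs into failures and successfully parsed feeds.
failed, fetched = results.partition { |_url, value| value.is_a?(Integer) }

failed_codes = Hash[failed]                       # URL => status code
titles = fetched.map { |_url, feed| feed.title }  # titles of parsed feeds
```

Checking the value type once, right after the fetch, keeps the rest of a script free of per-feed error checks.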
126
+
127
+ === Extending
128
+
129
+ Feedzirra is easily extended with custom parsing classes and persistent back-ends. You'll have to read the source to find out how, though, because we
130
+ still haven't written the documentation. :(
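The back-ends added in this changeset (Filesystem, Memcache and Memory) share one shape: an initializer that takes an options hash, plus `get(url)` and `set(url, result)`. A minimal sketch of a custom back-end following that shape is given here. The class name `DirectoryBackend` and its default path are hypothetical, and it is an assumption that the Reader needs nothing beyond these two methods, since reader.rb is not shown in this section.

```ruby
require 'digest/md5'
require 'fileutils'
require 'tmpdir'
require 'uri'

# Hypothetical back-end modelled on the get/set pair that the bundled
# Filesystem, Memcache and Memory back-ends define. Not part of Feedzirra.
class DirectoryBackend
  DEFAULTS = { :path => File.join(Dir.tmpdir, 'feedzirra-sketch') }

  def initialize(options = {})
    @options = DEFAULTS.merge(options)
    FileUtils.mkdir_p(@options[:path]) # make sure the store directory exists
  end

  # Returns the object stored for url, or nil if nothing was stored.
  def get(url)
    file = filename_for(url)
    Marshal.load(File.binread(file)) if File.exist?(file)
  end

  # Serializes result with Marshal and writes it in binary mode.
  def set(url, result)
    File.open(filename_for(url), 'wb') { |f| f.write(Marshal.dump(result)) }
  end

  private

  # One file per feed, named after the MD5 of the normalized URL.
  def filename_for(url)
    File.join(@options[:path], Digest::MD5.hexdigest(URI.parse(url).normalize.to_s))
  end
end

backend = DirectoryBackend.new(:path => Dir.mktmpdir)
backend.set('http://example.com/feed.xml', { :etag => 'abc123' })
stored  = backend.get('http://example.com/feed.xml')
missing = backend.get('http://example.com/other.xml')
```

Following the README's Memcache example, such a class would presumably be passed as `:backend => { :class => DirectoryBackend }`, though that wiring is not confirmed by the code shown here.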
131
+
132
+ === Benchmarks
133
+
134
+ One of the goals of Feedzirra is speed. This includes not only parsing, but fetching multiple feeds as quickly as possible. I ran a benchmark getting 20 feeds 10 times using Feedzirra, rFeedParser, and FeedNormalizer. For more details, the {benchmark code can be found in the project in spec/benchmarks/feedzirra_benchmarks.rb}[http://github.com/pauldix/feedzirra/blob/7fb5634c5c16e9c6ec971767b462c6518cd55f5d/spec/benchmarks/feedzirra_benchmarks.rb].
135
+
136
+ feedzirra 5.170000 1.290000 6.460000 ( 18.917796)
137
+ rfeedparser 104.260000 12.220000 116.480000 (244.799063)
138
+ feed-normalizer 66.250000 4.010000 70.260000 (191.589862)
139
+
140
+ The result of that benchmark is a bit sketchy because of the network variability. Running 10 times against the same 20 feeds was meant to smooth some of that out. However, there is also a {benchmark comparing parsing speed in spec/benchmarks/parsing_benchmark.rb}[http://github.com/pauldix/feedzirra/blob/7fb5634c5c16e9c6ec971767b462c6518cd55f5d/spec/benchmarks/parsing_benchmark.rb] on an atom feed.
141
+
142
+ feedzirra 0.500000 0.030000 0.530000 ( 0.658744)
143
+ rfeedparser 8.400000 1.110000 9.510000 ( 11.839827)
144
+ feed-normalizer 5.980000 0.160000 6.140000 ( 7.576140)
145
+
146
+ There's also a {benchmark that shows the results of using Feedzirra to perform updates on feeds}[http://github.com/pauldix/feedzirra/blob/45d64319544c61a4c9eb9f7f825c73b9f9030cb3/spec/benchmarks/updating_benchmarks.rb] you've already pulled in. I tested against 179 feeds. The first is the initial pull and the second is an update 65 seconds later. I'm not sure how many of them support etag and last-modified, so performance may be better or worse depending on what feeds you're requesting.
147
+
148
+ feedzirra fetch and parse 4.010000 0.710000 4.720000 ( 15.110101)
149
+ feedzirra update 0.660000 0.280000 0.940000 ( 5.152709)
150
+
151
+ === Discussion
152
+
153
+ I'd like feedback on the api and any bugs encountered on feeds in the wild. I've set up a {google group here}[http://groups.google.com/group/feedzirra].
154
+
155
+ === TODO
156
+
157
+ This thing needs to hammer on many different feeds in the wild. I'm sure there will be bugs. I want to find them and crush them. I didn't bother using the test suite for feedparser. I wanted to start fresh.
158
+
159
+ Here are some more specific TODOs.
160
+ * Make a feedzirra-rails gem to integrate feedzirra seamlessly with Rails and ActiveRecord.
161
+ * Add support for authenticated feeds.
162
+ * Create a super sweet DSL for defining new parsers.
163
+ * Test against Ruby 1.9.1 and fix any bugs.
164
+ * I'm not keeping track of modified on entries. Should I add this?
165
+ * Clean up the fetching code inside feed.rb so it doesn't suck so hard.
166
+ * Make the feed_spec actually mock stuff out so it doesn't hit the net.
167
+ * Readdress how feeds determine if they can parse a document. Maybe I should use namespaces instead?
168
+
169
+ === LICENSE
170
+
171
+ (The MIT License)
172
+
173
+ Copyright (c) 2009:
174
+
175
+ {Paul Dix}[http://pauldix.net]
176
+
177
+ Permission is hereby granted, free of charge, to any person obtaining
178
+ a copy of this software and associated documentation files (the
179
+ 'Software'), to deal in the Software without restriction, including
180
+ without limitation the rights to use, copy, modify, merge, publish,
181
+ distribute, sublicense, and/or sell copies of the Software, and to
182
+ permit persons to whom the Software is furnished to do so, subject to
183
+ the following conditions:
184
+
185
+ The above copyright notice and this permission notice shall be
186
+ included in all copies or substantial portions of the Software.
187
+
188
+ THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND,
189
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
190
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
191
+ IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
192
+ CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
193
+ TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
194
+ SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/Rakefile ADDED
@@ -0,0 +1,56 @@
1
+ require "spec"
2
+ require "spec/rake/spectask"
3
+ require 'rake/rdoctask'
4
+ require 'lib/feedzirra.rb'
5
+
6
+ # Grab recently touched specs
7
+ def recent_specs(touched_since)
8
+ recent_specs = FileList['app/**/*'].map do |path|
9
+
10
+ if File.mtime(path) > touched_since
11
+ spec = File.join('spec', File.dirname(path).split("/")[1..-1].join('/'),
12
+ "#{File.basename(path, ".*")}_spec.rb")
13
+ spec if File.exist?(spec)
14
+ end
15
+ end.compact
16
+
17
+ recent_specs += FileList['spec/**/*_spec.rb'].select do |path|
18
+ File.mtime(path) > touched_since
19
+ end
20
+ recent_specs.uniq
21
+ end
22
+
23
+ desc "Run all the tests"
24
+ task :default => :spec
25
+
26
+ # Tasks
27
+ Spec::Rake::SpecTask.new do |t|
28
+ t.spec_opts = ['--options', "\"#{File.dirname(__FILE__)}/spec/spec.opts\""]
29
+ t.spec_files = FileList['spec/**/*_spec.rb']
30
+ end
31
+
32
+ desc 'Run recent specs'
33
+ Spec::Rake::SpecTask.new("spec:recent") do |t|
34
+ t.spec_opts = ["--format","specdoc","--color"]
35
+ t.spec_files = recent_specs(Time.now - 600) # 10 min.
36
+ end
37
+
38
+ Spec::Rake::SpecTask.new('spec:rcov') do |t|
39
+ t.spec_opts = ['--options', "\"#{File.dirname(__FILE__)}/spec/spec.opts\""]
40
+ t.spec_files = FileList['spec/**/*_spec.rb']
41
+ t.rcov = true
42
+ t.rcov_opts = ['--exclude', 'spec,/usr/lib/ruby,/usr/local,/var/lib,/Library', '--text-report']
43
+ end
44
+
45
+ Rake::RDocTask.new do |rd|
46
+ rd.title = 'Feedzirra'
47
+ rd.rdoc_dir = 'rdoc'
48
+ rd.rdoc_files.include('README.rdoc', 'lib/feedzirra.rb', 'lib/feedzirra/**/*.rb')
49
+ rd.options = ["--quiet", "--opname", "index.html", "--line-numbers", "--inline-source", '--main', 'README.rdoc']
50
+ end
51
+
52
+ task :install do
53
+ rm_rf "*.gem"
54
+ puts `gem build feedzirra.gemspec`
55
+ puts `sudo gem install feedzirra-#{Feedzirra::VERSION}.gem`
56
+ end
data/lib/core_ext/array.rb ADDED
@@ -0,0 +1,8 @@
1
+ class Array
2
+
3
+ # From Rails' ActiveSupport library
4
+ def extract_options!
5
+ last.is_a?(::Hash) ? pop : {}
6
+ end
7
+
8
+ end
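The README says the Reader takes "a single URL or a list of URLs followed by a Hash of options", which is the calling convention `extract_options!` supports. A small sketch of the idiom follows. `split_reader_args` is a hypothetical helper written for illustration, and it is an assumption that the Reader uses the method this way, since reader.rb is not shown here.

```ruby
class Array
  # Same definition as in lib/core_ext/array.rb above.
  def extract_options!
    last.is_a?(::Hash) ? pop : {}
  end
end

# Hypothetical helper: splits a Reader-style argument list into URLs and options.
def split_reader_args(*args)
  options = args.extract_options! # removes the trailing Hash, if there is one
  [args.flatten, options]
end

urls, options = split_reader_args('http://example.com/a.xml',
                                  'http://example.com/b.xml',
                                  :user_agent => 'sketch')
bare_urls, bare_options = split_reader_args(['http://example.com/a.xml'])
```

Because `extract_options!` pops the hash off the array, the remaining elements are exactly the URLs, whether they were passed individually or as one array.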
data/lib/core_ext/date.rb ADDED
@@ -0,0 +1,21 @@
1
+ # Date code pulled from:
2
+ # Ruby Cookbook by Lucas Carlson and Leonard Richardson
3
+ # Published by O'Reilly
4
+ # ISBN: 0-596-52369-6
5
+ class Date
6
+ def feed_utils_to_gm_time
7
+ feed_utils_to_time(new_offset, :gm)
8
+ end
9
+
10
+ def feed_utils_to_local_time
11
+ feed_utils_to_time(new_offset(DateTime.now.offset-offset), :local)
12
+ end
13
+
14
+ private
15
+ def feed_utils_to_time(dest, method)
16
+ #Convert a fraction of a day to a number of microseconds
17
+ usec = (dest.sec_fraction * 60 * 60 * 24 * (10**6)).to_i
18
+ Time.send(method, dest.year, dest.month, dest.day, dest.hour, dest.min,
19
+ dest.sec, usec)
20
+ end
21
+ end
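The conversion above treats `sec_fraction` as a fraction of a day, which matches Ruby 1.8. On Ruby 1.9 and later, `DateTime#sec_fraction` is documented as a fraction of a second, so the multiplier differs. A minimal stand-alone sketch of the same idea for newer Rubies follows; `datetime_to_gm_time` is a hypothetical name, not part of this gem.

```ruby
require 'date'

# Hypothetical stand-alone counterpart of Date#feed_utils_to_gm_time for
# Ruby 1.9+, where sec_fraction is a fraction of a second.
def datetime_to_gm_time(datetime)
  utc  = datetime.new_offset(0)              # shift to UTC
  usec = (utc.sec_fraction * (10**6)).to_i   # fraction of a second -> microseconds
  Time.gm(utc.year, utc.month, utc.day, utc.hour, utc.min, utc.sec, usec)
end

published = DateTime.new(2009, 1, 29, 12, 0, 19, '-05:00')
time = datetime_to_gm_time(published)
```

On current Ruby, `DateTime#to_time` covers most of this, so a helper like the one above is mainly useful when a UTC `Time` is wanted regardless of the input offset.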
data/lib/core_ext/string.rb ADDED
@@ -0,0 +1,9 @@
1
+ class String
2
+ def sanitize!
3
+ self.replace(sanitize)
4
+ end
5
+
6
+ def sanitize
7
+ Dryopteris.sanitize(self)
8
+ end
9
+ end
data/lib/feedzirra/backend/filesystem.rb ADDED
@@ -0,0 +1,32 @@
1
+ require 'md5'
2
+ require 'uri'
3
+
4
+ module Feedzirra
5
+ module Backend
6
+ class Filesystem
7
+
8
+ DEFAULTS = {
9
+ :path => File.expand_path(File.join('~', '.feedzirra')) # expand ~ so File.exist? and File.open see a real path
10
+ }
11
+
12
+ def initialize(options = { })
13
+ @options = DEFAULTS.merge(options)
14
+ end
15
+
16
+ def get(url)
17
+ f = filename_for(url)
18
+ Marshal.load(File.read(f)) if File.exist?(f)
19
+ end
20
+
21
+ def set(url, result)
22
+ File.open(filename_for(url), 'w') {|f| f.write(Marshal.dump(result)) }
23
+ end
24
+
25
+ private
26
+
27
+ def filename_for(url)
28
+ File.join(@options[:path], MD5.hexdigest(URI.parse(url).normalize.to_s))
29
+ end
30
+ end
31
+ end
32
+ end
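Both the Filesystem and Memcache back-ends derive their storage key from the MD5 digest of the normalized URL. The sketch below isolates that step. `cache_key_for` is a hypothetical name, and it uses `Digest::MD5` from the standard library because the stand-alone `md5` file required above is only available on Ruby 1.8.

```ruby
require 'digest/md5'
require 'uri'

# Mirrors filename_for/key_for from the back-ends: normalize the URL, then hash it.
def cache_key_for(url)
  Digest::MD5.hexdigest(URI.parse(url).normalize.to_s)
end

key_a = cache_key_for('http://Example.COM/feed.xml')   # host in mixed case
key_b = cache_key_for('http://example.com/feed.xml')   # same feed, lower case
key_c = cache_key_for('http://example.com/other.xml')  # a different feed
```

Normalizing first means that spellings of the same URL differing only in host case share one cache entry, while distinct paths still get distinct keys.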
data/lib/feedzirra/backend/memcache.rb ADDED
@@ -0,0 +1,37 @@
1
+ require 'memcache'
2
+
3
+ module Feedzirra
4
+ module Backend
5
+
6
+ # Can be used to set up Memcache, or clients able to speak the Memcache protocol such as
7
+ # Tokyo Tyrant, as a Feedzirra::Backend.
8
+ class Memcache
9
+ DEFAULTS = {
10
+ :server => 'localhost',
11
+ :port => '11211'
12
+ }
13
+
14
+ def initialize(options = { })
15
+ @options = DEFAULTS.merge(options)
16
+ @cache = MemCache.new([ @options[:server], @options[:port] ].join(':'), :namespace => 'Feedzirra')
17
+ end
18
+
19
+ def get(url)
20
+ res = @cache.get(key_for(url))
21
+ Marshal.load(res) unless res.nil?
22
+ end
23
+
24
+ def set(url, result)
25
+ @cache.set(key_for(url), Marshal.dump(result))
26
+ end
27
+
28
+ private
29
+
30
+ def key_for(url)
31
+ MD5.hexdigest(URI.parse(url).normalize.to_s)
32
+ end
33
+
34
+ end
35
+
36
+ end
37
+ end
data/lib/feedzirra/backend/memory.rb ADDED
@@ -0,0 +1,22 @@
1
+ module Feedzirra
2
+ module Backend
3
+
4
+ # Memory store uses a ruby Hash to store the results of feed fetches.
5
+ # It won't persist after the application exits.
6
+ class Memory
7
+
8
+ def initialize(options = { })
9
+ @store = { }
10
+ end
11
+
12
+ def get(url)
13
+ @store[url]
14
+ end
15
+
16
+ def set(url, result)
17
+ @store[url] = result
18
+ end
19
+
20
+ end
21
+ end
22
+ end
data/lib/feedzirra/feed.rb ADDED
@@ -0,0 +1,68 @@
1
+ module Feedzirra
2
+
3
+ # TODO - remove this class, and only use the Feedzirra::Reader class for accessing feed data.
4
+ class Feed
5
+
6
+ # Fetches and returns the parsed XML for each URL provided.
7
+ #
8
+ # === Parameters
9
+ # [urls<String> or <Array>] A single feed URL, or an array of feed URLs.
10
+ # [options<Hash>] Valid keys for this argument are as follows:
11
+ # * :user_agent - String that overrides the default user agent.
12
+ # * :if_modified_since - Time object representing when the feed was last updated.
13
+ # * :if_none_match - String, an etag for the request that was stored previously.
14
+ # * :on_success - Block that gets executed after a successful request.
15
+ # * :on_failure - Block that gets executed after a failed request.
16
+ # === Returns
17
+ # A Feed object if a single URL is passed.
18
+ #
19
+ # A Hash if multiple URLs are passed. The key will be the URL, and the value the Feed object.
20
+ def self.fetch_and_parse(urls, options = {})
21
+ multi = Feedzirra::HttpMulti.new(urls, options)
22
+ multi.perform
23
+ urls.is_a?(String) ? multi.responses.values.first : multi.responses
24
+ end
25
+
26
+ # Updates each feed for each Feed object provided.
27
+ #
28
+ # === Parameters
29
+ # [feeds<Feed> or <Array>] A single feed object, or an array of feed objects.
30
+ # [options<Hash>] Valid keys for this argument are as follows:
31
+ # * :user_agent - String that overrides the default user agent.
32
+ # * :on_success - Block that gets executed after a successful request.
33
+ # * :on_failure - Block that gets executed after a failed request.
34
+ # === Returns
35
+ # An updated Feed object if a single Feed is passed.
36
+ #
37
+ # A Hash if multiple Feeds are passed. The key will be the URL, and the value the updated Feed object.
38
+ def self.update(feeds, options = {})
39
+ multi = Feedzirra::HttpMulti.new(feeds, options)
40
+
41
+ multi.perform
42
+ multi.responses.size == 1 ? multi.responses.values.first : multi.responses.values
43
+ end
44
+
45
+ # Fetches and returns the raw XML for each URL provided.
46
+ #
47
+ # === Parameters
48
+ # [urls<String> or <Array>] A single feed URL, or an array of feed URLs.
49
+ # [options<Hash>] Valid keys for this argument are as follows:
50
+ # :user_agent - String that overrides the default user agent.
51
+ # :if_modified_since - Time object representing when the feed was last updated.
52
+ # :if_none_match - String that's normally an etag for the request that was stored previously.
53
+ # :on_success - Block that gets executed after a successful request.
54
+ # :on_failure - Block that gets executed after a failed request.
55
+ # === Returns
56
+ # A String of XML if a single URL is passed.
57
+ #
58
+ # A Hash if multiple URLs are passed. The key will be the URL, and the value the XML.
59
+ #
60
+ # FIXME - Raw mode is currently not supported!
61
+ def self.fetch_raw(urls, options = {})
62
+ multi = Feedzirra::HttpMulti.new(urls, options.merge(:raw => true))
63
+ multi.perform
64
+ urls.is_a?(String) ? multi.responses.values.first : multi.responses
65
+ end
66
+
67
+ end
68
+ end
data/lib/feedzirra/feed_parser.rb ADDED
@@ -0,0 +1,64 @@
1
+ module Feedzirra
2
+ class NoParserAvailable < StandardError; end
3
+
4
+ # Determines the correct parser for an XML feed.
5
+ class FeedParser
6
+ attr_reader :xml, :parser
7
+
8
+ def initialize(xml)
9
+ @xml = xml
10
+ @parser = detect_feed_processor_for_xml(@xml)
11
+ end
12
+
13
+ def detect_feed_processor_for_xml(xml)
14
+ start_of_doc = xml.slice(0, 1000)
15
+
16
+ if_none_found = lambda do
17
+ raise NoParserAvailable.new("No valid parser for XML.")
18
+ end
19
+
20
+ feed_classes.detect(if_none_found) do |klass|
21
+ klass.able_to_parse?(start_of_doc)
22
+ end
23
+ end
24
+
25
+ # Determines the correct Parser for the XML document with which
26
+ # we were initialized and returns results of this parser on the
27
+ # document. Raises NoParserAvailable if no valid parser is
28
+ # found.
29
+ def run
30
+ @parser.parse(@xml)
31
+ end
32
+
33
+ # Adds a new feed parsing class that will be used for parsing.
34
+ #
35
+ # === Parameters
36
+ # [klass<Constant>] The class/constant that you want to register.
37
+ # === Returns
38
+ # An updated array of feed parser class names.
39
+ def add_feed_class(klass)
40
+ feed_classes.unshift klass
41
+ end
42
+
43
+ # Provides a list of registered feed parsing classes.
44
+ #
45
+ # === Returns
46
+ # An array of class names.
47
+ def feed_classes
48
+ @@feed_classes ||= [Feedzirra::Parser::RSS, Feedzirra::Parser::AtomFeedBurner, Feedzirra::Parser::Atom]
49
+ end
50
+
51
+ # Makes all entry types look for the passed in element to parse. This is actually just a call to
52
+ # element (a SAXMachine call) in the class
53
+ #
54
+ # === Parameters
55
+ # [element_tag<String>]
56
+ # [options<Hash>] Valid keys are same as with SAXMachine
57
+ def add_common_feed_entry_element(element_tag, options = {})
58
+ feed_classes.map{|k| eval("#{k}Entry") }.each do |klass|
59
+ klass.send(:element, element_tag, options)
60
+ end
61
+ end
62
+
63
+ end
64
+ end
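`detect_feed_processor_for_xml` relies on the optional argument to `Enumerable#detect`: when no element matches, the callable passed in is invoked, and here it raises. The sketch below shows that idiom in isolation. `FakeRSS`, `FakeAtom`, the `PARSERS` list and their regular expressions are hypothetical stand-ins, since the real parsers' `able_to_parse?` implementations are not shown in this section.

```ruby
class NoParserAvailable < StandardError; end

# Hypothetical stand-ins for parser classes; each answers able_to_parse?.
class FakeRSS
  def self.able_to_parse?(xml)
    !!(xml =~ /<rss/)
  end
end

class FakeAtom
  def self.able_to_parse?(xml)
    !!(xml =~ /<feed/)
  end
end

PARSERS = [FakeRSS, FakeAtom]

def detect_parser(xml)
  start_of_doc = xml.slice(0, 1000) # only inspect the start of the document
  if_none_found = lambda { raise NoParserAvailable, 'No valid parser for XML.' }
  # detect calls if_none_found only when no class accepts the document.
  PARSERS.detect(if_none_found) { |klass| klass.able_to_parse?(start_of_doc) }
end

atom_parser = detect_parser('<?xml version="1.0"?><feed xmlns="http://www.w3.org/2005/Atom"></feed>')

error = begin
  detect_parser('<html></html>')
  nil
rescue NoParserAvailable => e
  e
end
```

Classes are tried in list order, which is why `add_feed_class` above uses `unshift`: a newly registered parser is consulted before the built-in ones.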