feedzirra 0.0.24 → 0.0.30

Files changed (38)
  1. data/.rspec +1 -0
  2. data/README.rdoc +207 -0
  3. data/Rakefile +19 -24
  4. data/lib/feedzirra.rb +7 -28
  5. data/lib/feedzirra/core_ext.rb +3 -0
  6. data/lib/{core_ext → feedzirra/core_ext}/date.rb +2 -4
  7. data/lib/{core_ext → feedzirra/core_ext}/string.rb +0 -0
  8. data/lib/feedzirra/feed.rb +99 -41
  9. data/lib/feedzirra/feed_entry_utilities.rb +12 -11
  10. data/lib/feedzirra/parser.rb +15 -0
  11. data/lib/feedzirra/parser/atom.rb +7 -13
  12. data/lib/feedzirra/parser/atom_entry.rb +4 -14
  13. data/lib/feedzirra/parser/atom_feed_burner.rb +4 -10
  14. data/lib/feedzirra/parser/atom_feed_burner_entry.rb +8 -13
  15. data/lib/feedzirra/parser/itunes_rss.rb +4 -4
  16. data/lib/feedzirra/parser/itunes_rss_item.rb +1 -1
  17. data/lib/feedzirra/parser/rss.rb +4 -10
  18. data/lib/feedzirra/parser/rss_entry.rb +2 -12
  19. data/lib/feedzirra/version.rb +3 -0
  20. data/spec/benchmarks/feed_benchmarks.rb +98 -0
  21. data/spec/benchmarks/feedzirra_benchmarks.rb +40 -0
  22. data/spec/benchmarks/fetching_benchmarks.rb +28 -0
  23. data/spec/benchmarks/parsing_benchmark.rb +30 -0
  24. data/spec/benchmarks/updating_benchmarks.rb +33 -0
  25. data/spec/feedzirra/feed_entry_utilities_spec.rb +1 -1
  26. data/spec/feedzirra/feed_spec.rb +38 -5
  27. data/spec/feedzirra/feed_utilities_spec.rb +7 -4
  28. data/spec/feedzirra/parser/atom_feed_burner_entry_spec.rb +5 -0
  29. data/spec/feedzirra/parser/atom_feed_burner_spec.rb +5 -1
  30. data/spec/feedzirra/parser/atom_spec.rb +5 -1
  31. data/spec/feedzirra/parser/itunes_rss_item_spec.rb +1 -1
  32. data/spec/feedzirra/parser/rss_entry_spec.rb +2 -1
  33. data/spec/feedzirra/parser/rss_spec.rb +5 -1
  34. data/spec/sample_feeds/run_against_sample.rb +20 -0
  35. data/spec/spec_helper.rb +10 -2
  36. metadata +141 -59
  37. data/README.textile +0 -208
  38. data/spec/spec.opts +0 -2
data/.rspec ADDED
@@ -0,0 +1 @@
+ --color
data/README.rdoc ADDED
@@ -0,0 +1,207 @@
+ == Feedzirra
+
+ I'd like feedback on the api and any bugs encountered on feeds in the wild. I've set up a {google group here}[http://groups.google.com/group/feedzirra].
+
+ === Description
+
+ Feedzirra is a feed library that is designed to get and update many feeds as quickly as possible. This includes using libcurl-multi through the taf2-curb[link:http://github.com/taf2/curb/tree/master] gem for faster http gets, and libxml through nokogiri[link:http://github.com/tenderlove/nokogiri/tree/master] and sax-machine[link:http://github.com/pauldix/sax-machine/tree/master] for faster parsing.
+
+ Once you have fetched feeds using Feedzirra, they can be updated using the feed objects. Feedzirra automatically inserts etag and last-modified information from the http response headers to lower bandwidth usage, eliminate unnecessary parsing, and make things speedier in general.
+
+ Another feature present in Feedzirra is the ability to create callback functions that get called "on success" and "on failure" when getting a feed. This makes it easy to do things like log errors or update data stores.
+
+ The fetching and parsing logic have been decoupled so that either of them can be used in isolation if you'd prefer not to use everything that Feedzirra offers. However, the code examples below use helper methods in the Feed class that put everything together to make things as simple as possible.
+
+ The final feature of Feedzirra is the ability to define custom parsing classes. In truth, Feedzirra could be used to parse much more than feeds. Microformats, page scraping, and almost anything else are fair game.
+
+ === Installation
+
+ For now Feedzirra exists only on github. It also has a few gem requirements that are only on github. Before you start you need to have libcurl[link:http://curl.haxx.se/] and libxml[link:http://xmlsoft.org/] installed. If you're on Leopard you have both. Otherwise, you'll need to grab them. Once you've got those libraries, these are the gems that get used: nokogiri, pauldix-sax-machine, taf2-curb (note that this is a fork that lives on github and not the Ruby Forge version of curb), and pauldix-feedzirra. The feedzirra gemspec has all the dependencies so you should be able to get up and running with the standard github gem install routine:
+
+ gem sources -a http://gems.github.com # if you haven't already
+ gem install pauldix-feedzirra
+
+ *NOTE:*Some people have been reporting a few issues related to installation. First, the Ruby Forge version of curb is not what you want. It will not work. Nor will the curl-multi gem that lives on Ruby Forge. You have to get the taf2-curb[link:http://github.com/taf2/curb/tree/master] fork installed.
+
+ If you see this error when doing a require:
+
+ /Library/Ruby/Site/1.8/rubygems/custom_require.rb:31:in `gem_original_require': no such file to load -- curb_core (LoadError)
+
+ It means that the taf2-curb gem didn't build correctly. To resolve this you can do a git clone git://github.com/taf2/curb.git then run rake gem in the curb directory, then sudo gem install pkg/curb-0.2.4.0.gem. After that you should be good.
+
+ If you see something like this when trying to run it:
+
+ NoMethodError: undefined method `on_success' for #<Curl::Easy:0x1182724>
+ from ./lib/feedzirra/feed.rb:88:in `add_url_to_multi'
+
+ This means that you are requiring curl-multi or the Ruby Forge version of Curb somewhere. You can't use those and need to get the taf2 version up and running.
+
+ If you're on Debian or Ubuntu and getting errors while trying to install the taf2-curb gem, it could be because you don't have the latest version of libcurl installed. Do this to fix:
+
+ sudo apt-get install libcurl4-gnutls-dev
+
+ Another problem could be if you are running Mac Ports and you have libcurl installed through there. You need to uninstall it for curb to work! The version in Mac Ports is old and doesn't play nice with curb. If you're running Leopard, you can just uninstall and you should be golden. If you're on an older version of OS X, you'll then need to {download curl}[http://curl.haxx.se/download.html] and build from source. Then you'll have to install the taf2-curb gem again. You might have to perform the step above.
+
+ If you're still having issues, please let me know on the mailing list. Also, {Todd Fisher (taf2)}[link:http://github.com/taf2] is working on fixing the gem install. Please send him a full error report.
+
+ === Speedup date parsing
+
+ In MRI the date parsing code is written in ruby and is optimized for readability over speed, to speed up this part you can install the {home_run}[https://github.com/jeremyevans/home_run] gem to replace it with an optimized C version.
+
+ === Usage
+
+ {A gist of the following code}[link:http://gist.github.com/57285]
+
+ require 'feedzirra'
+
+ # fetching a single feed
+ feed = Feedzirra::Feed.fetch_and_parse("http://feeds.feedburner.com/PaulDixExplainsNothing")
+
+ # feed and entries accessors
+ feed.title # => "Paul Dix Explains Nothing"
+ feed.url # => "http://www.pauldix.net"
+ feed.feed_url # => "http://feeds.feedburner.com/PaulDixExplainsNothing"
+ feed.etag # => "GunxqnEP4NeYhrqq9TyVKTuDnh0"
+ feed.last_modified # => Sat Jan 31 17:58:16 -0500 2009 # it's a Time object
+
+ entry = feed.entries.first
+ entry.title # => "Ruby Http Client Library Performance"
+ entry.url # => "http://www.pauldix.net/2009/01/ruby-http-client-library-performance.html"
+ entry.author # => "Paul Dix"
+ entry.summary # => "..."
+ entry.content # => "..."
+ entry.published # => Thu Jan 29 17:00:19 UTC 2009 # it's a Time object
+ entry.categories # => ["...", "..."]
+
+ # sanitizing an entry's content
+ entry.title.sanitize # => returns the title with harmful stuff escaped
+ entry.author.sanitize # => returns the author with harmful stuff escaped
+ entry.content.sanitize # => returns the content with harmful stuff escaped
+ entry.content.sanitize! # => returns content with harmful stuff escaped and replaces original (also exists for author and title)
+ entry.sanitize! # => sanitizes the entry's title, author, and content in place (as in, it changes the value to clean versions)
+ feed.sanitize_entries! # => sanitizes all entries in place
+
+ # updating a single feed
+ updated_feed = Feedzirra::Feed.update(feed)
+
+ # an updated feed has the following extra accessors
+ updated_feed.updated? # returns true if any of the feed attributes have been modified. will return false if only new entries
+ updated_feed.new_entries # a collection of the entry objects that are newer than the latest in the feed before update
+
+ # fetching multiple feeds
+ feed_urls = ["http://feeds.feedburner.com/PaulDixExplainsNothing", "http://feeds.feedburner.com/trottercashion"]
+ feeds = Feedzirra::Feed.fetch_and_parse(feed_urls)
+
+ # feeds is now a hash with the feed_urls as keys and the parsed feed objects as values. If an error was thrown
+ # there will be a Fixnum of the http response code instead of a feed object
+
+ # updating multiple feeds. it expects a collection of feed objects
+ updated_feeds = Feedzirra::Feed.update(feeds.values)
+
+ # defining custom behavior on failure or success. note that a return status of 304 (not updated) will call the on_success handler
+ feed = Feedzirra::Feed.fetch_and_parse("http://feeds.feedburner.com/PaulDixExplainsNothing",
+ :on_success => lambda {|url, feed| puts feed.title },
+ :on_failure => lambda {|url, response_code, response_header, response_body| puts response_body })
+ # if a collection was passed into fetch_and_parse, the handlers will be called for each one
+
+ # the behavior for the handlers when using Feedzirra::Feed.update is slightly different. The feed passed into on_success will be
+ # the updated feed with the standard updated accessors. on failure it will be the original feed object passed into update
+
+ # fetching a feed via a proxy (optional)
+ feed = Feedzirra::Feed.fetch_and_parse("http://feeds.feedburner.com/PaulDixExplainsNothing", {:proxy_url => '10.0.0.1', :proxy_port => 3084})
+
+ === Extending
+
+ ==== Adding a feed parsing class
+
+ # Adds a new feed parsing class, this class will be used first
+ Feedzirra::Feed.add_feed_class MyFeedClass
+
+ ==== Adding attributes to all feeds types / all entries types
+
+ # Add the generator attribute to all feed types
+ Feedzirra::Feed.add_common_feed_element('generator')
+ Feedzirra::Feed.fetch_and_parse("href="http://www.pauldix.net/atom.xml").generator # => 'TypePad'
+
+ # Add some GeoRss information
+ Feedzirra::Feed.add_common_feed_entry_element('geo:lat', :as => :lat)
+ Feedzirra::Feed.fetch_and_parse("http://www.earthpublisher.com/georss.php").entries.each do |e|
+ p "lat: #{e.lat}, long: #{e.long}"
+ end
+
+ ==== Adding attributes to only one class
+
+ If you want to add attributes for only on class you simply have to declare them in the class
+
+ # Add some GeoRss information
+ require 'lib/feedzirra/parser/rss_entry'
+
+ class Feedzirra::Parser::RSSEntry
+ element 'geo:lat', :as => :lat
+ element 'geo:long', :as => :long
+ end
+
+ # Fetch a feed containing GeoRss info and print them
+ Feedzirra::Feed.fetch_and_parse("http://www.earthpublisher.com/georss.php").entries.each do |e|
+ p "lat: #{e.lat}, long: #{e.long}"
+ end
+
+ === Benchmarks
+
+ One of the goals of Feedzirra is speed. This includes not only parsing, but fetching multiple feeds as quickly as possible. I ran a benchmark getting 20 feeds 10 times using Feedzirra, rFeedParser, and FeedNormalizer. For more details the {benchmark code can be found in the project in spec/benchmarks/feedzirra_benchmarks.rb}[http://github.com/pauldix/feedzirra/blob/7fb5634c5c16e9c6ec971767b462c6518cd55f5d/spec/benchmarks/feedzirra_benchmarks.rb]
+
+ feedzirra 5.170000 1.290000 6.460000 ( 18.917796)
+ rfeedparser 104.260000 12.220000 116.480000 (244.799063)
+ feed-normalizer 66.250000 4.010000 70.260000 (191.589862)
+
+ The result of that benchmark is a bit sketchy because of the network variability. Running 10 times against the same 20 feeds was meant to smooth some of that out. However, there is also a {benchmark comparing parsing speed in spec/benchmarks/parsing_benchmark.rb}[http://github.com/pauldix/feedzirra/blob/7fb5634c5c16e9c6ec971767b462c6518cd55f5d/spec/benchmarks/parsing_benchmark.rb] on an atom feed.
+
+ feedzirra 0.500000 0.030000 0.530000 ( 0.658744)
+ rfeedparser 8.400000 1.110000 9.510000 ( 11.839827)
+ feed-normalizer 5.980000 0.160000 6.140000 ( 7.576140)
+
+ There's also a {benchmark that shows the results of using Feedzirra to perform updates on feeds}[http://github.com/pauldix/feedzirra/blob/45d64319544c61a4c9eb9f7f825c73b9f9030cb3/spec/benchmarks/updating_benchmarks.rb] you've already pulled in. I tested against 179 feeds. The first is the initial pull and the second is an update 65 seconds later. I'm not sure how many of them support etag and last-modified, so performance may be better or worse depending on what feeds you're requesting.
+
+ feedzirra fetch and parse 4.010000 0.710000 4.720000 ( 15.110101)
+ feedzirra update 0.660000 0.280000 0.940000 ( 5.152709)
+
+ === TODO
+
+ This thing needs to hammer on many different feeds in the wild. I'm sure there will be bugs. I want to find them and crush them. I didn't bother using the test suite for feedparser. i wanted to start fresh.
+
+ Here are some more specific TODOs.
+ * Make a feedzirra-rails gem to integrate feedzirra seamlessly with Rails and ActiveRecord.
+ * Add support for authenticated feeds.
+ * Create a super sweet DSL for defining new parsers.
+ * Test against Ruby 1.9.1 and fix any bugs.
+ * I'm not keeping track of modified on entries. Should I add this?
+ * Clean up the fetching code inside feed.rb so it doesn't suck so hard.
+ * Make the feed_spec actually mock stuff out so it doesn't hit the net.
+ * Readdress how feeds determine if they can parse a document. Maybe I should use namespaces instead?
+
+ === LICENSE
+
+ (The MIT License)
+
+ Copyright (c) 2009:
+
+ {Paul Dix}[http://pauldix.net]
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ 'Software'), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice shall be
+ included in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+ IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+ CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+ TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+ SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
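Note on the README's etag and last-modified claim: the bandwidth saving comes from sending conditional GET headers on later requests. A minimal sketch of how those headers could be assembled, assuming a hypothetical helper named `conditional_get_headers` (it is not part of Feedzirra; the library sets the same two headers inline in feed.rb):

```ruby
require 'time' # adds Time#httpdate

# Hypothetical helper for illustration only; Feedzirra sets these headers itself.
def conditional_get_headers(etag: nil, last_modified: nil)
  headers = {}
  # The server may answer 304 Not Modified when the stored etag still matches.
  headers["If-None-Match"] = etag if etag
  # httpdate formats the Time the way HTTP date headers expect.
  headers["If-Modified-Since"] = last_modified.httpdate if last_modified
  headers
end

# Values taken from the README's sample feed (17:58:16 -0500 is 22:58:16 UTC).
HEADERS = conditional_get_headers(
  etag: "GunxqnEP4NeYhrqq9TyVKTuDnh0",
  last_modified: Time.utc(2009, 1, 31, 22, 58, 16)
)
puts HEADERS
```

A feed that has never been fetched has neither value, so the helper returns an empty hash and the request is unconditional.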
data/Rakefile CHANGED
@@ -1,9 +1,14 @@
- require "spec"
- require "spec/rake/spectask"
- require 'rake/rdoctask'
- require 'lib/feedzirra.rb'
+ require 'bundler'
+ Bundler.setup
+
+ require 'rake'
+ require 'rdoc/task'
+ require 'rspec'
+ require 'rspec/core/rake_task'
+
+ $LOAD_PATH.unshift File.expand_path('../lib', __FILE__)
+ require 'feedzirra/version'

- # Grab recently touched specs
  def recent_specs(touched_since)
    recent_specs = FileList['app/**/*'].map do |path|

@@ -20,37 +25,27 @@ def recent_specs(touched_since)
    recent_specs.uniq
  end

- desc "Run all the tests"
- task :default => :spec
-
- # Tasks
- Spec::Rake::SpecTask.new do |t|
-   t.spec_opts = ['--options', "\"#{File.dirname(__FILE__)}/spec/spec.opts\""]
-   t.spec_files = FileList['spec/**/*_spec.rb']
+ RSpec::Core::RakeTask.new do |t|
+   t.pattern = FileList['spec/**/*_spec.rb']
  end

  desc 'Run recent specs'
- Spec::Rake::SpecTask.new("spec:recent") do |t|
-   t.spec_opts = ["--format","specdoc","--color"]
-   t.spec_files = recent_specs(Time.now - 600) # 10 min.
+ RSpec::Core::RakeTask.new("spec:recent") do |t|
+   t.pattern = recent_specs(Time.now - 600) # 10 min.
  end

- Spec::Rake::SpecTask.new('spec:rcov') do |t|
-   t.spec_opts = ['--options', "\"#{File.dirname(__FILE__)}/spec/spec.opts\""]
-   t.spec_files = FileList['spec/**/*_spec.rb']
+ RSpec::Core::RakeTask.new('spec:rcov') do |t|
+   t.pattern = FileList['spec/**/*_spec.rb']
    t.rcov = true
    t.rcov_opts = ['--exclude', 'spec,/usr/lib/ruby,/usr/local,/var/lib,/Library', '--text-report']
  end

- Rake::RDocTask.new do |rd|
+ RDoc::Task.new do |rd|
    rd.title = 'Feedzirra'
    rd.rdoc_dir = 'rdoc'
    rd.rdoc_files.include('README.rdoc', 'lib/feedzirra.rb', 'lib/feedzirra/**/*.rb')
    rd.options = ["--quiet", "--opname", "index.html", "--line-numbers", "--inline-source", '--main', 'README.rdoc']
  end

- task :install do
-   rm_rf "*.gem"
-   puts `gem build feedzirra.gemspec`
-   puts `sudo gem install feedzirra-#{Feedzirra::VERSION}.gem`
- end
+ desc "Run all the tests"
+ task :default => :spec
data/lib/feedzirra.rb CHANGED
@@ -1,41 +1,20 @@
- $LOAD_PATH.unshift(File.dirname(__FILE__)) unless $LOAD_PATH.include?(File.dirname(__FILE__))
-
  require 'zlib'
  require 'curb'
  require 'sax-machine'
  require 'loofah'
  require 'uri'

- require 'active_support/version'
  require 'active_support/basic_object'
  require 'active_support/core_ext/module'
  require 'active_support/core_ext/kernel'
  require 'active_support/core_ext/object'
+ require 'active_support/time'

- if ActiveSupport::VERSION::MAJOR >= 3
-   require 'active_support/time'
- else
-   require 'active_support/core_ext/time'
- end
-
- require 'core_ext/date'
- require 'core_ext/string'
-
- require 'feedzirra/feed_utilities'
- require 'feedzirra/feed_entry_utilities'
- require 'feedzirra/feed'
-
- require 'feedzirra/parser/rss_entry'
- require 'feedzirra/parser/itunes_rss_owner'
- require 'feedzirra/parser/itunes_rss_item'
- require 'feedzirra/parser/atom_entry'
- require 'feedzirra/parser/atom_feed_burner_entry'
-
- require 'feedzirra/parser/rss'
- require 'feedzirra/parser/itunes_rss'
- require 'feedzirra/parser/atom'
- require 'feedzirra/parser/atom_feed_burner'
+ require 'feedzirra/core_ext'

  module Feedzirra
-   VERSION = "0.0.24"
- end
+   autoload :FeedEntryUtilities, 'feedzirra/feed_entry_utilities'
+   autoload :FeedUtilities, 'feedzirra/feed_utilities'
+   autoload :Feed, 'feedzirra/feed'
+   autoload :Parser, 'feedzirra/parser'
+ end
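Note on the switch from eager `require` calls to `autoload`: with `autoload`, a file is only loaded the first time its constant is referenced. A minimal sketch of that behavior, assuming a hypothetical module `LazyDemo` and a throwaway file `lazy_parser.rb` that stand in for Feedzirra and its parser file:

```ruby
require 'tmpdir'

# LazyDemo and lazy_parser.rb are hypothetical stand-ins, not Feedzirra code.
module LazyDemo; end

def autoload_demo
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'lazy_parser.rb')
    File.write(path, "module LazyDemo\n  module Parser\n  end\nend\n")

    LazyDemo.autoload(:Parser, path)      # registers the file, loads nothing yet
    before = LazyDemo.autoload?(:Parser)  # the pending path while still unloaded
    LazyDemo::Parser                      # first reference triggers the require
    after = LazyDemo.autoload?(:Parser)   # nil once the constant is defined
    [before, after]
  end
end

PENDING_PATH, AFTER_LOAD = autoload_demo
puts "pending before first use: #{PENDING_PATH}"
puts "pending after first use:  #{AFTER_LOAD.inspect}"
```

One consequence for this diff: code that relied on every parser class being loaded as soon as `feedzirra` was required now only sees a class after something references its constant.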
data/lib/feedzirra/core_ext.rb ADDED
@@ -0,0 +1,3 @@
+ Dir["#{File.dirname(__FILE__)}/core_ext/*.rb"].sort.each do |path|
+   require "feedzirra/core_ext/#{File.basename(path, '.rb')}"
+ end
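Note on the new loader: it globs the `core_ext` directory, sorts the paths so the load order is deterministic, and requires each file by its load-path name rather than its absolute path. A minimal sketch of the name derivation, assuming a hypothetical helper `core_ext_require_names` and a temporary directory in place of the gem's `lib/feedzirra`:

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: returns the names the loader would pass to require.
def core_ext_require_names(lib_dir)
  Dir["#{lib_dir}/core_ext/*.rb"].sort.map do |path|
    "feedzirra/core_ext/#{File.basename(path, '.rb')}"
  end
end

# Build a throwaway directory holding the two extension files named in this diff.
REQUIRE_NAMES = Dir.mktmpdir do |dir|
  FileUtils.mkdir_p(File.join(dir, 'core_ext'))
  %w[string.rb date.rb].each do |name|
    FileUtils.touch(File.join(dir, 'core_ext', name))
  end
  core_ext_require_names(dir)
end

puts REQUIRE_NAMES
```

Because the names are relative to the load path, a new extension dropped into the directory is picked up without editing `feedzirra.rb`.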
@@ -1,4 +1,4 @@
- # Date code pulled from:
+ # Date code pulled and adapted from:
  # Ruby Cookbook by Lucas Carlson and Leonard Richardson
  # Published by O'Reilly
  # ISBN: 0-596-52369-6
@@ -13,9 +13,7 @@ class Date

    private
    def feed_utils_to_time(dest, method)
-     #Convert a fraction of a day to a number of microseconds
-     usec = (dest.sec_fraction * 60 * 60 * 24 * (10**6)).to_i
      Time.send(method, dest.year, dest.month, dest.day, dest.hour, dest.min,
-               dest.sec, usec)
+               dest.sec, dest.zone)
    end
  end
File without changes
data/lib/feedzirra/feed.rb CHANGED
@@ -5,16 +5,16 @@ module Feedzirra
      USER_AGENT = "feedzirra http://github.com/pauldix/feedzirra/tree/master"

      # Takes a raw XML feed and attempts to parse it. If no parser is available a Feedzirra::NoParserAvailable exception is raised.
-     #
+     # You can pass a block to be called when there's an error during the parsing.
      # === Parameters
      # [xml<String>] The XML that you would like parsed.
      # === Returns
      # An instance of the determined feed type. By default a Feedzirra::Atom, Feedzirra::AtomFeedBurner, Feedzirra::RDF, or Feedzirra::RSS object.
      # === Raises
      # Feedzirra::NoParserAvailable : If no valid parser classes could be found for the feed.
-     def self.parse(xml)
+     def self.parse(xml, &block)
        if parser = determine_feed_parser_for_xml(xml)
-         parser.parse(xml)
+         parser.parse(xml, block)
        else
          raise NoParserAvailable.new("No valid parser for XML.")
        end
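Note on the new `&block` parameter: `Feed.parse` captures the caller's block as a Proc and hands it to the parser as an ordinary second argument. A minimal sketch of that forwarding pattern, assuming hypothetical `ToyParser` and `ToyFeed` stand-ins (the empty-document error condition is invented for illustration and says nothing about when the real parser reports errors):

```ruby
# ToyParser and ToyFeed are hypothetical stand-ins, not Feedzirra classes.
class ToyParser
  # on_error is a plain positional argument holding a Proc or nil.
  def self.parse(xml, on_error = nil)
    on_error.call("document is empty") if on_error && xml.strip.empty?
    new
  end
end

module ToyFeed
  # &block turns the caller's block into a Proc, or nil when none is given.
  def self.parse(xml, &block)
    ToyParser.parse(xml, block)
  end
end

PARSE_ERRORS = []
ToyFeed.parse("   ") { |message| PARSE_ERRORS << message }
ToyFeed.parse("<rss/>") # no block: the parser receives nil and stays silent

puts PARSE_ERRORS.inspect
```

Passing the Proc positionally keeps the block optional, which matches the unchanged call sites that invoke `parse` with only the XML.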
@@ -49,30 +49,100 @@ module Feedzirra
        @feed_classes ||= [Feedzirra::Parser::RSS, Feedzirra::Parser::AtomFeedBurner, Feedzirra::Parser::Atom]
      end

-     # Makes all entry types look for the passed in element to parse. This is actually just a call to
-     # element (a SAXMachine call) in the class
+     # Makes all registered feeds types look for the passed in element to parse.
+     # This is actually just a call to element (a SAXMachine call) in the class.
+     #
+     # === Parameters
+     # [element_tag<String>] The element tag
+     # [options<Hash>] Valid keys are same as with SAXMachine
+     def self.add_common_feed_element(element_tag, options = {})
+       feed_classes.each do |k|
+         k.element element_tag, options
+       end
+     end
+
+     # Makes all registered feeds types look for the passed in elements to parse.
+     # This is actually just a call to elements (a SAXMachine call) in the class.
+     #
+     # === Parameters
+     # [element_tag<String>] The element tag
+     # [options<Hash>] Valid keys are same as with SAXMachine
+     def self.add_common_feed_elements(element_tag, options = {})
+       feed_classes.each do |k|
+         k.elements element_tag, options
+       end
+     end
+
+     # Makes all registered entry types look for the passed in element to parse.
+     # This is actually just a call to element (a SAXMachine call) in the class.
      #
      # === Parameters
      # [element_tag<String>]
      # [options<Hash>] Valid keys are same as with SAXMachine
      def self.add_common_feed_entry_element(element_tag, options = {})
-       # need to think of a better way to do this. will break for people who want this behavior
-       # across their added classes
-       feed_classes.map{|k| eval("#{k}Entry") }.each do |klass|
-         klass.send(:element, element_tag, options)
-       end
+       call_on_each_feed_entry :element, element_tag, options
      end

+     # Makes all registered entry types look for the passed in elements to parse.
+     # This is actually just a call to element (a SAXMachine call) in the class.
+     #
+     # === Parameters
+     # [element_tag<String>]
+     # [options<Hash>] Valid keys are same as with SAXMachine
+     def self.add_common_feed_entry_elements(element_tag, options = {})
+       call_on_each_feed_entry :elements, element_tag, options
+     end
+
+     # Call a method on all feed entries classes.
+     #
+     # === Parameters
+     # [method<Symbol>] The method name
+     # [parameters<Array>] The method parameters
+     def self.call_on_each_feed_entry(method, *parameters)
+       feed_classes.each do |k|
+         # iterate on the collections defined in the sax collection
+         k.sax_config.collection_elements.each_value do |vl|
+           # vl is a list of CollectionConfig mapped to an attribute name
+           # we'll look for the one set as 'entries' and add the new element
+           vl.find_all{|v| (v.accessor == 'entries') && (v.data_class.class == Class)}.each do |v|
+             v.data_class.send(method, *parameters)
+           end
+         end
+       end
+     end
+
+     # Setup curl from options.
+     # Possible parameters:
+     # * :user_agent - overrides the default user agent.
+     # * :compress - any value to enable compression
+     # * :http_authentication - array containing http authentication parameters
+     # * :proxy_url - proxy url
+     # * :proxy_port - proxy port
+     # * :max_redirects - max number of redirections
+     # * :timeout - timeout
+     def self.setup_easy curl, options
+       curl.headers["Accept-encoding"] = 'gzip, deflate' if options.has_key?(:compress)
+       curl.headers["User-Agent"] = (options[:user_agent] || USER_AGENT)
+
+       curl.userpwd = options[:http_authentication].join(':') if options.has_key?(:http_authentication)
+       curl.proxy_url = options[:proxy_url] if options.has_key?(:proxy_url)
+       curl.proxy_port = options[:proxy_port] if options.has_key?(:proxy_port)
+       curl.max_redirects = options[:max_redirects] if options[:max_redirects]
+       curl.timeout = options[:timeout] if options[:timeout]
+
+       curl.follow_location = true
+     end
+
      # Fetches and returns the raw XML for each URL provided.
      #
      # === Parameters
      # [urls<String> or <Array>] A single feed URL, or an array of feed URLs.
      # [options<Hash>] Valid keys for this argument as as followed:
-     # :user_agent - String that overrides the default user agent.
      # :if_modified_since - Time object representing when the feed was last updated.
      # :if_none_match - String that's normally an etag for the request that was stored previously.
      # :on_success - Block that gets executed after a successful request.
      # :on_failure - Block that gets executed after a failed request.
+     # * all parameters defined in setup_easy
      # === Returns
      # A String of XML if a single URL is passed.
      #
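Note on `setup_easy`: the method centralizes option handling that was previously repeated in three places. A minimal sketch of how the options map onto the curl handle, assuming a hypothetical `FakeEasy` struct in place of `Curl::Easy` so that it runs without curb (the method body follows the diff; `DEFAULT_AGENT` stands in for `USER_AGENT`):

```ruby
# FakeEasy is a hypothetical stand-in for Curl::Easy so the sketch runs without curb.
FakeEasy = Struct.new(:headers, :userpwd, :proxy_url, :proxy_port,
                      :max_redirects, :timeout, :follow_location)

DEFAULT_AGENT = "feedzirra http://github.com/pauldix/feedzirra/tree/master"

def setup_easy(curl, options)
  # Key presence, not truthiness, switches compression on.
  curl.headers["Accept-encoding"] = 'gzip, deflate' if options.has_key?(:compress)
  curl.headers["User-Agent"] = (options[:user_agent] || DEFAULT_AGENT)

  curl.userpwd = options[:http_authentication].join(':') if options.has_key?(:http_authentication)
  curl.proxy_url = options[:proxy_url] if options.has_key?(:proxy_url)
  curl.proxy_port = options[:proxy_port] if options.has_key?(:proxy_port)
  curl.max_redirects = options[:max_redirects] if options[:max_redirects]
  curl.timeout = options[:timeout] if options[:timeout]

  curl.follow_location = true
end

EASY = FakeEasy.new({})
setup_easy(EASY, :compress => false, :http_authentication => %w[user secret], :timeout => 10)
puts EASY.to_a.inspect
```

Because `:compress` is tested with `has_key?`, passing `:compress => false` still sends the `Accept-encoding` header, which is what the "any value" wording in the doc comment implies. Leave the key out to disable compression.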
@@ -83,15 +153,10 @@ module Feedzirra
        responses = {}
        url_queue.each do |url|
          easy = Curl::Easy.new(url) do |curl|
-           curl.headers["User-Agent"] = (options[:user_agent] || USER_AGENT)
+           setup_easy curl, options
+
            curl.headers["If-Modified-Since"] = options[:if_modified_since].httpdate if options.has_key?(:if_modified_since)
            curl.headers["If-None-Match"] = options[:if_none_match] if options.has_key?(:if_none_match)
-           curl.headers["Accept-encoding"] = 'gzip, deflate' if options.has_key?(:compress)
-           curl.follow_location = true
-           curl.userpwd = options[:http_authentication].join(':') if options.has_key?(:http_authentication)
-
-           curl.max_redirects = options[:max_redirects] if options[:max_redirects]
-           curl.timeout = options[:timeout] if options[:timeout]

            curl.on_success do |c|
              responses[url] = decode_content(c)
@@ -143,7 +208,7 @@ module Feedzirra
      # === Returns
      # A decoded string of XML.
      def self.decode_content(c)
-       if c.header_str.match(/Content-Encoding: gzip/)
+       if c.header_str.match(/Content-Encoding: gzip/i)
          begin
            gz = Zlib::GzipReader.new(StringIO.new(c.body_str))
            xml = gz.read
@@ -152,7 +217,7 @@ module Feedzirra
            # Maybe this is not gzipped?
            xml = c.body_str
          end
-       elsif c.header_str.match(/Content-Encoding: deflate/)
+       elsif c.header_str.match(/Content-Encoding: deflate/i)
          xml = Zlib::Inflate.inflate(c.body_str)
        else
          xml = c.body_str
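Note on the `/i` flag: HTTP header names are case-insensitive, so a server that sends `content-encoding: gzip` would previously have fallen through to the raw-body branch. A minimal sketch of the decoding logic, assuming a hypothetical `decode_body` helper that takes the header string and body directly instead of a curl handle:

```ruby
require 'zlib'
require 'stringio'

# decode_body is a hypothetical, curb-free variant of decode_content.
def decode_body(header_str, body)
  if header_str.match(/Content-Encoding: gzip/i)
    begin
      Zlib::GzipReader.new(StringIO.new(body)).read
    rescue Zlib::GzipFile::Error
      body # the header claimed gzip but the body was not compressed
    end
  elsif header_str.match(/Content-Encoding: deflate/i)
    Zlib::Inflate.inflate(body)
  else
    body
  end
end

# Test helper: gzip a string in memory.
def gzip(text)
  io = StringIO.new(String.new)
  gz = Zlib::GzipWriter.new(io)
  gz.write(text)
  gz.close
  io.string
end

puts decode_body("content-encoding: GZIP\r\n", gzip("<feed/>"))
puts decode_body("Content-Encoding: deflate\r\n", Zlib::Deflate.deflate("<feed/>"))
```

Both calls recover the original document, including the one whose header arrives in a different case.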
@@ -166,9 +231,9 @@ module Feedzirra
      # === Parameters
      # [feeds<Feed> or <Array>] A single feed object, or an array of feed objects.
      # [options<Hash>] Valid keys for this argument as as followed:
-     # * :user_agent - String that overrides the default user agent.
      # * :on_success - Block that gets executed after a successful request.
      # * :on_failure - Block that gets executed after a failed request.
+     # * all parameters defined in setup_easy
      # === Returns
      # A updated Feed object if a single URL is passed.
      #
@@ -195,23 +260,17 @@ module Feedzirra
      # [responses<Hash>] Existing responses that you want the response from the request added to.
      # [feeds<String> or <Array>] A single feed object, or an array of feed objects.
      # [options<Hash>] Valid keys for this argument as as followed:
-     # * :user_agent - String that overrides the default user agent.
      # * :on_success - Block that gets executed after a successful request.
      # * :on_failure - Block that gets executed after a failed request.
+     # * all parameters defined in setup_easy
      # === Returns
      # The updated Curl::Multi object with the request details added to it's stack.
      def self.add_url_to_multi(multi, url, url_queue, responses, options)
        easy = Curl::Easy.new(url) do |curl|
-         curl.headers["User-Agent"] = (options[:user_agent] || USER_AGENT)
+         setup_easy curl, options
          curl.headers["If-Modified-Since"] = options[:if_modified_since].httpdate if options.has_key?(:if_modified_since)
          curl.headers["If-None-Match"] = options[:if_none_match] if options.has_key?(:if_none_match)
-         curl.headers["Accept-encoding"] = 'gzip, deflate' if options.has_key?(:compress)
-         curl.follow_location = true
-         curl.userpwd = options[:http_authentication].join(':') if options.has_key?(:http_authentication)

-         curl.max_redirects = options[:max_redirects] if options[:max_redirects]
-         curl.timeout = options[:timeout] if options[:timeout]
-
          curl.on_success do |c|
            add_url_to_multi(multi, url_queue.shift, url_queue, responses, options) unless url_queue.empty?
            xml = decode_content(c)
@@ -219,7 +278,7 @@ module Feedzirra

            if klass
              begin
-               feed = klass.parse(xml)
+               feed = klass.parse(xml, Proc.new{|message| puts "Error while parsing [#{url}] #{message}" })
                feed.feed_url = c.last_effective_url
                feed.etag = etag_from_header(c.header_str)
                feed.last_modified = last_modified_from_header(c.header_str)
@@ -230,7 +289,7 @@ module Feedzirra
              end
            else
              # puts "Error determining parser for #{url} - #{c.last_effective_url}"
-             # raise NoParserAvailable.new("no valid parser for content.") (this would unfirtunately fail the whole 'multi', so it's not really useable)
+             # raise NoParserAvailable.new("no valid parser for content.") (this would unfortunately fail the whole 'multi', so it's not really usable)
              options[:on_failure].call(url, c.response_code, c.header_str, c.body_str) if options.has_key?(:on_failure)
            end
          end
@@ -238,12 +297,16 @@ module Feedzirra
          curl.on_failure do |c, err|
            add_url_to_multi(multi, url_queue.shift, url_queue, responses, options) unless url_queue.empty?
            responses[url] = c.response_code
-           options[:on_failure].call(url, c.response_code, c.header_str, c.body_str) if options.has_key?(:on_failure)
+           if c.response_code == 304 # it's not modified. this isn't an error condition
+             options[:on_success].call(url, nil) if options.has_key?(:on_success)
+           else
+             options[:on_failure].call(url, c.response_code, c.header_str, c.body_str) if options.has_key?(:on_failure)
+           end
          end
        end
        multi.add(easy)
      end
-
+
      # An abstraction for adding a feed by a Feed object to the passed Curb::multi stack.
      #
      # === Parameters
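Note on the new 304 branch in `add_url_to_multi`: a Not Modified response now reaches `on_success` with `nil` in place of a feed, and `on_failure` is reserved for real errors. A minimal sketch of that dispatch, assuming a hypothetical `dispatch_failure` helper and example URLs that are not in the diff:

```ruby
# dispatch_failure is a hypothetical helper that mirrors the new on_failure branch.
def dispatch_failure(url, response_code, header, body, options)
  if response_code == 304
    # Not modified is not an error; there is no new feed, so nil is passed.
    options[:on_success].call(url, nil) if options.has_key?(:on_success)
  else
    options[:on_failure].call(url, response_code, header, body) if options.has_key?(:on_failure)
  end
end

CALLS = []
HANDLERS = {
  :on_success => lambda { |url, feed| CALLS << [:success, url, feed] },
  :on_failure => lambda { |url, code, header, body| CALLS << [:failure, url, code] }
}

dispatch_failure("http://example.com/a.xml", 304, "", "", HANDLERS)
dispatch_failure("http://example.com/b.xml", 500, "", "", HANDLERS)
puts CALLS.inspect
```

An `on_success` handler that calls methods on its feed argument, such as the README's `puts feed.title`, would raise on the `nil` passed for a 304, so handlers used with conditional requests need a nil check.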
@@ -253,26 +316,21 @@ module Feedzirra
      # [responses<Hash>] Existing responses that you want the response from the request added to.
      # [feeds<String>] or <Array> A single feed object, or an array of feed objects.
      # [options<Hash>] Valid keys for this argument as as followed:
-     # * :user_agent - String that overrides the default user agent.
      # * :on_success - Block that gets executed after a successful request.
      # * :on_failure - Block that gets executed after a failed request.
+     # * all parameters defined in setup_easy
      # === Returns
      # The updated Curl::Multi object with the request details added to it's stack.
      def self.add_feed_to_multi(multi, feed, feed_queue, responses, options)
        easy = Curl::Easy.new(feed.feed_url) do |curl|
-         curl.headers["User-Agent"] = (options[:user_agent] || USER_AGENT)
+         setup_easy curl, options
          curl.headers["If-Modified-Since"] = feed.last_modified.httpdate if feed.last_modified
          curl.headers["If-None-Match"] = feed.etag if feed.etag
-         curl.userpwd = options[:http_authentication].join(':') if options.has_key?(:http_authentication)
-         curl.follow_location = true
-
-         curl.max_redirects = options[:max_redirects] if options[:max_redirects]
-         curl.timeout = options[:timeout] if options[:timeout]

          curl.on_success do |c|
            begin
              add_feed_to_multi(multi, feed_queue.shift, feed_queue, responses, options) unless feed_queue.empty?
-             updated_feed = Feed.parse(c.body_str)
+             updated_feed = Feed.parse(c.body_str){ |message| puts "Error while parsing [#{feed.feed_url}] #{message}" }
              updated_feed.feed_url = c.last_effective_url
              updated_feed.etag = etag_from_header(c.header_str)
              updated_feed.last_modified = last_modified_from_header(c.header_str)