astro-feedzirra 0.0.8.20090419 → 0.0.12

Files changed (38)
  1. data/README.rdoc +169 -0
  2. data/README.textile +9 -2
  3. data/Rakefile +3 -0
  4. data/lib/feedzirra.rb +11 -10
  5. data/lib/feedzirra/feed.rb +37 -41
  6. data/lib/feedzirra/feed_entry_utilities.rb +6 -0
  7. data/lib/feedzirra/parser/atom.rb +26 -0
  8. data/lib/feedzirra/parser/atom_entry.rb +34 -0
  9. data/lib/feedzirra/parser/atom_feed_burner.rb +27 -0
  10. data/lib/feedzirra/parser/atom_feed_burner_entry.rb +35 -0
  11. data/lib/feedzirra/parser/itunes_rss.rb +50 -0
  12. data/lib/feedzirra/parser/itunes_rss_item.rb +32 -0
  13. data/lib/feedzirra/parser/itunes_rss_owner.rb +12 -0
  14. data/lib/feedzirra/parser/rss.rb +28 -0
  15. data/lib/feedzirra/parser/rss_entry.rb +40 -0
  16. data/lib/feedzirra/push_parser.rb +56 -0
  17. data/spec/feedzirra/feed_spec.rb +40 -32
  18. data/spec/feedzirra/feed_utilities_spec.rb +9 -9
  19. data/spec/feedzirra/{atom_entry_spec.rb → parser/atom_entry_spec.rb} +3 -3
  20. data/spec/feedzirra/{atom_feed_burner_entry_spec.rb → parser/atom_feed_burner_entry_spec.rb} +4 -4
  21. data/spec/feedzirra/{atom_feed_burner_spec.rb → parser/atom_feed_burner_spec.rb} +6 -6
  22. data/spec/feedzirra/{atom_spec.rb → parser/atom_spec.rb} +5 -5
  23. data/spec/feedzirra/{itunes_rss_item_spec.rb → parser/itunes_rss_item_spec.rb} +8 -4
  24. data/spec/feedzirra/{itunes_rss_owner_spec.rb → parser/itunes_rss_owner_spec.rb} +3 -3
  25. data/spec/feedzirra/{itunes_rss_spec.rb → parser/itunes_rss_spec.rb} +5 -5
  26. data/spec/feedzirra/{rss_entry_spec.rb → parser/rss_entry_spec.rb} +3 -3
  27. data/spec/feedzirra/{rss_spec.rb → parser/rss_spec.rb} +5 -5
  28. data/spec/feedzirra/push_parser_spec.rb +16 -0
  29. metadata +28 -25
  30. data/lib/feedzirra/atom.rb +0 -22
  31. data/lib/feedzirra/atom_entry.rb +0 -29
  32. data/lib/feedzirra/atom_feed_burner.rb +0 -22
  33. data/lib/feedzirra/atom_feed_burner_entry.rb +0 -30
  34. data/lib/feedzirra/itunes_rss.rb +0 -46
  35. data/lib/feedzirra/itunes_rss_item.rb +0 -28
  36. data/lib/feedzirra/itunes_rss_owner.rb +0 -8
  37. data/lib/feedzirra/rss.rb +0 -23
  38. data/lib/feedzirra/rss_entry.rb +0 -35
@@ -0,0 +1,169 @@
+ == Feedzirra
+
+ I'd like feedback on the api and any bugs encountered on feeds in the wild. I've set up a {google group here}[http://groups.google.com/group/feedzirra].
+
+ === Description
+
+ Feedzirra is a feed library that is designed to get and update many feeds as quickly as possible. This includes using libcurl-multi through the taf2-curb[link:http://github.com/taf2/curb/tree/master] gem for faster http gets, and libxml through nokogiri[link:http://github.com/tenderlove/nokogiri/tree/master] and sax-machine[link:http://github.com/pauldix/sax-machine/tree/master] for faster parsing.
+
+ Once you have fetched feeds using Feedzirra, they can be updated using the feed objects. Feedzirra automatically inserts etag and last-modified information from the http response headers to lower bandwidth usage, eliminate unnecessary parsing, and make things speedier in general.
+
+ Another feature present in Feedzirra is the ability to create callback functions that get called "on success" and "on failure" when getting a feed. This makes it easy to do things like log errors or update data stores.
+
+ The fetching and parsing logic have been decoupled so that either of them can be used in isolation if you'd prefer not to use everything that Feedzirra offers. However, the code examples below use helper methods in the Feed class that put everything together to make things as simple as possible.
+
+ The final feature of Feedzirra is the ability to define custom parsing classes. In truth, Feedzirra could be used to parse much more than feeds. Microformats, page scraping, and almost anything else are fair game.
+
+ === Installation
+
+ For now Feedzirra exists only on github. It also has a few gem requirements that are only on github. Before you start you need to have libcurl[link:http://curl.haxx.se/] and libxml[link:http://xmlsoft.org/] installed. If you're on Leopard you have both. Otherwise, you'll need to grab them. Once you've got those libraries, these are the gems that get used: nokogiri, pauldix-sax-machine, taf2-curb (note that this is a fork that lives on github and not the Ruby Forge version of curb), and pauldix-feedzirra. The feedzirra gemspec has all the dependencies so you should be able to get up and running with the standard github gem install routine:
+
+ gem sources -a http://gems.github.com # if you haven't already
+ gem install pauldix-feedzirra
+
+ *NOTE:* Some people have been reporting a few issues related to installation. First, the Ruby Forge version of curb is not what you want. It will not work. Nor will the curl-multi gem that lives on Ruby Forge. You have to get the taf2-curb[link:http://github.com/taf2/curb/tree/master] fork installed.
+
+ If you see this error when doing a require:
+
+ /Library/Ruby/Site/1.8/rubygems/custom_require.rb:31:in `gem_original_require': no such file to load -- curb_core (LoadError)
+
+ It means that the taf2-curb gem didn't build correctly. To resolve this you can do a git clone git://github.com/taf2/curb.git, then run rake gem in the curb directory, then sudo gem install pkg/curb-0.2.4.0.gem. After that you should be good.
+
+ If you see something like this when trying to run it:
+
+ NoMethodError: undefined method `on_success' for #<Curl::Easy:0x1182724>
+ from ./lib/feedzirra/feed.rb:88:in `add_url_to_multi'
+
+ This means that you are requiring curl-multi or the Ruby Forge version of Curb somewhere. You can't use those and need to get the taf2 version up and running.
+
+ If you're on Debian or Ubuntu and getting errors while trying to install the taf2-curb gem, it could be because you don't have the latest version of libcurl installed. Do this to fix:
+
+ sudo apt-get install libcurl4-gnutls-dev
+
+ Another problem could be if you are running Mac Ports and you have libcurl installed through there. You need to uninstall it for curb to work! The version in Mac Ports is old and doesn't play nice with curb. If you're running Leopard, you can just uninstall and you should be golden. If you're on an older version of OS X, you'll then need to {download curl}[http://curl.haxx.se/download.html] and build from source. Then you'll have to install the taf2-curb gem again. You might have to perform the step above.
+
+ If you're still having issues, please let me know on the mailing list. Also, {Todd Fisher (taf2)}[link:http://github.com/taf2] is working on fixing the gem install. Please send him a full error report.
+
+ === Usage
+
+ {A gist of the following code}[link:http://gist.github.com/57285]
+
+ require 'feedzirra'
+
+ # fetching a single feed
+ feed = Feedzirra::Feed.fetch_and_parse("http://feeds.feedburner.com/PaulDixExplainsNothing")
+
+ # feed and entries accessors
+ feed.title # => "Paul Dix Explains Nothing"
+ feed.url # => "http://www.pauldix.net"
+ feed.feed_url # => "http://feeds.feedburner.com/PaulDixExplainsNothing"
+ feed.etag # => "GunxqnEP4NeYhrqq9TyVKTuDnh0"
+ feed.last_modified # => Sat Jan 31 17:58:16 -0500 2009 # it's a Time object
+
+ entry = feed.entries.first
+ entry.title # => "Ruby Http Client Library Performance"
+ entry.url # => "http://www.pauldix.net/2009/01/ruby-http-client-library-performance.html"
+ entry.author # => "Paul Dix"
+ entry.summary # => "..."
+ entry.content # => "..."
+ entry.published # => Thu Jan 29 17:00:19 UTC 2009 # it's a Time object
+ entry.categories # => ["...", "..."]
+
+ # sanitizing an entry's content
+ entry.title.sanitize # => returns the title with harmful stuff escaped
+ entry.author.sanitize # => returns the author with harmful stuff escaped
+ entry.content.sanitize # => returns the content with harmful stuff escaped
+ entry.content.sanitize! # => returns content with harmful stuff escaped and replaces the original (also exists for author and title)
+ entry.sanitize! # => sanitizes the entry's title, author, and content in place (as in, it changes the value to clean versions)
+ feed.sanitize_entries! # => sanitizes all entries in place
+
+ # updating a single feed
+ updated_feed = Feedzirra::Feed.update(feed)
+
+ # an updated feed has the following extra accessors
+ updated_feed.updated? # returns true if any of the feed attributes have been modified. will return false if there are only new entries
+ updated_feed.new_entries # a collection of the entry objects that are newer than the latest in the feed before update
+
+ # fetching multiple feeds
+ feed_urls = ["http://feeds.feedburner.com/PaulDixExplainsNothing", "http://feeds.feedburner.com/trottercashion"]
+ feeds = Feedzirra::Feed.fetch_and_parse(feed_urls)
+
+ # feeds is now a hash with the feed_urls as keys and the parsed feed objects as values. If an error was thrown
+ # there will be a Fixnum of the http response code instead of a feed object
+
+ # updating multiple feeds. it expects a collection of feed objects
+ updated_feeds = Feedzirra::Feed.update(feeds.values)
+
+ # defining custom behavior on failure or success. note that a return status of 304 (not updated) will call the on_success handler
+ feed = Feedzirra::Feed.fetch_and_parse("http://feeds.feedburner.com/PaulDixExplainsNothing",
+ :on_success => lambda {|feed| puts feed.title },
+ :on_failure => lambda {|url, response_code, response_header, response_body| puts response_body })
+ # if a collection was passed into fetch_and_parse, the handlers will be called for each one
+
+ # the behavior for the handlers when using Feedzirra::Feed.update is slightly different. The feed passed into on_success will be
+ # the updated feed with the standard updated accessors. on failure it will be the original feed object passed into update
+
+ # Defining custom parsers
+ # TODO: the functionality is here, just write some good examples that show how to do this
+
+ === Extending
+
+ === Benchmarks
+
+ One of the goals of Feedzirra is speed. This includes not only parsing, but fetching multiple feeds as quickly as possible. I ran a benchmark getting 20 feeds 10 times using Feedzirra, rFeedParser, and FeedNormalizer. For more details the {benchmark code can be found in the project in spec/benchmarks/feedzirra_benchmarks.rb}[http://github.com/pauldix/feedzirra/blob/7fb5634c5c16e9c6ec971767b462c6518cd55f5d/spec/benchmarks/feedzirra_benchmarks.rb].
+
+ feedzirra 5.170000 1.290000 6.460000 ( 18.917796)
+ rfeedparser 104.260000 12.220000 116.480000 (244.799063)
+ feed-normalizer 66.250000 4.010000 70.260000 (191.589862)
+
+ The result of that benchmark is a bit sketchy because of the network variability. Running 10 times against the same 20 feeds was meant to smooth some of that out. However, there is also a {benchmark comparing parsing speed in spec/benchmarks/parsing_benchmark.rb}[http://github.com/pauldix/feedzirra/blob/7fb5634c5c16e9c6ec971767b462c6518cd55f5d/spec/benchmarks/parsing_benchmark.rb] on an atom feed.
+
+ feedzirra 0.500000 0.030000 0.530000 ( 0.658744)
+ rfeedparser 8.400000 1.110000 9.510000 ( 11.839827)
+ feed-normalizer 5.980000 0.160000 6.140000 ( 7.576140)
+
+ There's also a {benchmark that shows the results of using Feedzirra to perform updates on feeds}[http://github.com/pauldix/feedzirra/blob/45d64319544c61a4c9eb9f7f825c73b9f9030cb3/spec/benchmarks/updating_benchmarks.rb] you've already pulled in. I tested against 179 feeds. The first is the initial pull and the second is an update 65 seconds later. I'm not sure how many of them support etag and last-modified, so performance may be better or worse depending on what feeds you're requesting.
+
+ feedzirra fetch and parse 4.010000 0.710000 4.720000 ( 15.110101)
+ feedzirra update 0.660000 0.280000 0.940000 ( 5.152709)
+
+ === TODO
+
+ This thing needs to hammer on many different feeds in the wild. I'm sure there will be bugs. I want to find them and crush them. I didn't bother using the test suite for feedparser. I wanted to start fresh.
+
+ Here are some more specific TODOs.
+ * Make a feedzirra-rails gem to integrate feedzirra seamlessly with Rails and ActiveRecord.
+ * Add support for authenticated feeds.
+ * Create a super sweet DSL for defining new parsers.
+ * Test against Ruby 1.9.1 and fix any bugs.
+ * I'm not keeping track of modified on entries. Should I add this?
+ * Clean up the fetching code inside feed.rb so it doesn't suck so hard.
+ * Make the feed_spec actually mock stuff out so it doesn't hit the net.
+ * Readdress how feeds determine if they can parse a document. Maybe I should use namespaces instead?
+
+ === LICENSE
+
+ (The MIT License)
+
+ Copyright (c) 2009:
+
+ {Paul Dix}[http://pauldix.net]
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ 'Software'), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice shall be
+ included in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+ IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+ CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+ TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+ SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
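
Editor's note: the on_success / on_failure callback contract described in the README above can be sketched standalone. `fake_fetch` and its hard-coded response codes below are hypothetical stand-ins for the real HTTP layer, not Feedzirra code; the README notes that a 304 also routes to on_success.

```ruby
# Minimal sketch of the callback dispatch described above (illustrative only):
# 200 and 304 responses invoke :on_success, everything else :on_failure.
def fake_fetch(url, options = {})
  response_code = url.include?("missing") ? 404 : 200 # stand-in for a real request

  if response_code == 200 || response_code == 304 # 304 (not modified) also counts as success
    options[:on_success].call(url) if options.has_key?(:on_success)
  else
    options[:on_failure].call(url, response_code) if options.has_key?(:on_failure)
  end
  response_code
end

successes = []
failures = []
fake_fetch("http://example.com/feed.xml", :on_success => lambda { |url| successes << url })
fake_fetch("http://example.com/missing", :on_failure => lambda { |url, code| failures << [url, code] })
```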
@@ -93,7 +93,7 @@ updated_feed.new_entries # a collection of the entry objects that are newer tha
 
  # fetching multiple feeds
  feed_urls = ["http://feeds.feedburner.com/PaulDixExplainsNothing", "http://feeds.feedburner.com/trottercashion"]
- feeds = Feedzirra::Feed.fetch_and_parse(feeds_urls)
+ feeds = Feedzirra::Feed.fetch_and_parse(feed_urls)
 
  # feeds is now a hash with the feed_urls as keys and the parsed feed objects as values. If an error was thrown
  # there will be a Fixnum of the http response code instead of a feed object
@@ -116,6 +116,12 @@ Feedzirra::Feed.add_common_feed_entry_element("wfw:commentRss", :as => :comment_
  # AtomEntry classes. Now you can access those in an atom feed:
  Feedzirra::Feed.parse(some_atom_xml).entries.first.comment_rss_ # => wfw:commentRss is now parsed!
 
+
+ # You can also define your own parsers and add them to the ones Feedzirra knows about. Here's an example that adds
+ # ITunesRSS parsing. It's included in the library, but not part of Feedzirra by default because some of the field names
+ # differ from other classes, thus breaking normalization.
+ Feedzirra::Feed.add_feed_class(ITunesRSS) # now all feeds will be checked to see if they match ITunesRSS before others
+
  # You can also access http basic auth feeds. Unfortunately, you can't get to these inside of a bulk get of a bunch of feeds.
  # You'll have to do it on its own like so:
  Feedzirra::Feed.fetch_and_parse(some_url, :http_authentication => ["myusername", "mypassword"])
@@ -151,7 +157,8 @@ This thing needs to hammer on many different feeds in the wild. I'm sure there w
  Here are some more specific TODOs.
  * Fix the iTunes parser so things are normalized again
  * Fix the Zlib deflate error
- * Fork taf2-curb and require that in feedzirra
+ * Fix this error: http://github.com/inbox/70508
+ * Convert to use Typhoeus instead of taf2-curb
  * Make the entries parse all link fields
  * Make a feedzirra-rails gem to integrate feedzirra seamlessly with Rails and ActiveRecord.
  * Create a super sweet DSL for defining new parsers.
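
Editor's note: the add_feed_class behavior documented in the hunk above (custom parsers are checked before the built-in ones) can be sketched standalone. The `Fake*` classes below are hypothetical stand-ins, not Feedzirra's real parser classes.

```ruby
# Standalone sketch of the parser-registry idea behind add_feed_class:
# a list of classes, each answering able_to_parse?, checked in order,
# with newly added classes placed at the front.
class FakeRSS
  def self.able_to_parse?(xml)
    xml.include?("<rss")
  end
end

class FakeITunesRSS
  def self.able_to_parse?(xml)
    xml.include?("itunes")
  end
end

feed_classes = [FakeRSS]
feed_classes.unshift(FakeITunesRSS) # like add_feed_class: the new parser is checked first

xml = %(<rss xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">)
parser = feed_classes.detect { |klass| klass.able_to_parse?(xml) }
# parser is FakeITunesRSS even though FakeRSS also matches, because order matters
```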
data/Rakefile CHANGED
@@ -20,6 +20,9 @@ def recent_specs(touched_since)
  recent_specs.uniq
  end
 
+ desc "Run all the tests"
+ task :default => :spec
+
  # Tasks
  Spec::Rake::SpecTask.new do |t|
  t.spec_opts = ['--options', "\"#{File.dirname(__FILE__)}/spec/spec.opts\""]
@@ -17,20 +17,21 @@ require 'core_ext/string'
  require 'feedzirra/feed_utilities'
  require 'feedzirra/feed_entry_utilities'
  require 'feedzirra/feed'
+ require 'feedzirra/push_parser'
 
  require 'feedzirra/push_parser'
 
- require 'feedzirra/rss_entry'
- require 'feedzirra/itunes_rss_owner'
- require 'feedzirra/itunes_rss_item'
- require 'feedzirra/atom_entry'
- require 'feedzirra/atom_feed_burner_entry'
+ require 'feedzirra/parser/rss_entry'
+ require 'feedzirra/parser/itunes_rss_owner'
+ require 'feedzirra/parser/itunes_rss_item'
+ require 'feedzirra/parser/atom_entry'
+ require 'feedzirra/parser/atom_feed_burner_entry'
 
- require 'feedzirra/rss'
- require 'feedzirra/itunes_rss'
- require 'feedzirra/atom'
- require 'feedzirra/atom_feed_burner'
+ require 'feedzirra/parser/rss'
+ require 'feedzirra/parser/itunes_rss'
+ require 'feedzirra/parser/atom'
+ require 'feedzirra/parser/atom_feed_burner'
 
  module Feedzirra
- VERSION = "0.0.8"
+ VERSION = "0.0.12"
  end
@@ -3,6 +3,7 @@ module Feedzirra
 
  class Feed
  USER_AGENT = "feedzirra http://github.com/pauldix/feedzirra/tree/master"
+ TIMEOUT = 30
 
  # Takes a raw XML feed and attempts to parse it. If no parser is available a Feedzirra::NoParserAvailable exception is raised.
  #
@@ -46,7 +47,7 @@ module Feedzirra
  # === Returns
  # An array of class names.
  def self.feed_classes
- @feed_classes ||= [ITunesRSS, RSS, AtomFeedBurner, Atom]
+ @feed_classes ||= [Feedzirra::Parser::RSS, Feedzirra::Parser::AtomFeedBurner, Feedzirra::Parser::Atom]
  end
 
  # Makes all entry types look for the passed in element to parse. This is actually just a call to
@@ -58,25 +59,11 @@ module Feedzirra
  def self.add_common_feed_entry_element(element_tag, options = {})
  # need to think of a better way to do this. will break for people who want this behavior
  # across their added classes
- [RSSEntry, AtomFeedBurnerEntry, AtomEntry].each do |klass|
+ feed_classes.map{|k| eval("#{k}Entry") }.each do |klass|
  klass.send(:element, element_tag, options)
  end
  end
 
- # Makes all entry types look for the passed in elements to parse. This is actually just a call to
- # elements (a SAXMachine call) in the class
- #
- # === Parameters
- # [element_tag<String>]
- # [options<Hash>] Valid keys are same as with SAXMachine
- def self.add_common_feed_entry_elements(element_tag, options = {})
- # need to think of a better way to do this. will break for people who want this behavior
- # across their added classes
- [RSSEntry, AtomFeedBurnerEntry, AtomEntry].each do |klass|
- klass.send(:elements, element_tag, options)
- end
- end
-
  # Fetches and returns the raw XML for each URL provided.
  #
  # === Parameters
@@ -100,7 +87,7 @@ module Feedzirra
  curl.headers["User-Agent"] = (options[:user_agent] || USER_AGENT)
  curl.headers["If-Modified-Since"] = options[:if_modified_since].httpdate if options.has_key?(:if_modified_since)
  curl.headers["If-None-Match"] = options[:if_none_match] if options.has_key?(:if_none_match)
- curl.headers["Accept-encoding"] = 'gzip, deflate'
+ curl.headers["Accept-encoding"] = 'gzip, deflate' if options.has_key?(:compress)
  curl.follow_location = true
  curl.userpwd = options[:http_authentication].join(':') if options.has_key?(:http_authentication)
 
@@ -115,7 +102,7 @@ module Feedzirra
  end
 
  multi.perform
- return urls.is_a?(String) ? responses.values.first : responses
+ urls.is_a?(String) ? responses.values.first : responses
  end
 
  # Fetches and returns the parsed XML for each URL provided.
@@ -123,11 +110,11 @@ module Feedzirra
  # === Parameters
  # [urls<String> or <Array>] A single feed URL, or an array of feed URLs.
  # [options<Hash>] Valid keys for this argument are as follows:
- # * :user_agent - String that overrides the default user agent.
- # * :if_modified_since - Time object representing when the feed was last updated.
- # * :if_none_match - String, an etag for the request that was stored previously.
- # * :on_success - Block that gets executed after a successful request.
- # * :on_failure - Block that gets executed after a failed request.
+ # * :user_agent - String that overrides the default user agent.
+ # * :if_modified_since - Time object representing when the feed was last updated.
+ # * :if_none_match - String, an etag for the request that was stored previously.
+ # * :on_success - Block that gets executed after a successful request.
+ # * :on_failure - Block that gets executed after a failed request.
  # === Returns
  # A Feed object if a single URL is passed.
  #
@@ -137,12 +124,12 @@ module Feedzirra
  multi = Curl::Multi.new
  responses = {}
 
- # I broke these down so I would only try to do 30 simultaneously because
+ # I broke these down so I would only try to do 30 simultaneously because
  # I was getting weird errors when doing a lot. As one finishes it pops another off the queue.
  url_queue.slice!(0, 30).each do |url|
  add_url_to_multi(multi, url, url_queue, responses, options)
  end
-
+
  multi.perform
  return urls.is_a?(String) ? responses.values.first : responses
  end
@@ -194,7 +181,7 @@ module Feedzirra
  end
 
  multi.perform
- return responses.size == 1 ? responses.values.first : responses.values
+ responses.size == 1 ? responses.values.first : responses.values
  end
 
  # An abstraction for adding a feed by URL to the passed Curb::multi stack.
@@ -216,7 +203,7 @@ module Feedzirra
  curl.headers["User-Agent"] = (options[:user_agent] || USER_AGENT)
  curl.headers["If-Modified-Since"] = options[:if_modified_since].httpdate if options.has_key?(:if_modified_since)
  curl.headers["If-None-Match"] = options[:if_none_match] if options.has_key?(:if_none_match)
- curl.headers["Accept-encoding"] = 'gzip, deflate'
+ curl.headers["Accept-encoding"] = 'gzip, deflate' if options.has_key?(:compress)
  curl.follow_location = true
  curl.userpwd = options[:http_authentication].join(':') if options.has_key?(:http_authentication)
 
@@ -226,12 +213,16 @@ module Feedzirra
  klass = determine_feed_parser_for_xml(xml)
 
  if klass
- feed = klass.parse(xml)
- feed.feed_url = c.last_effective_url
- feed.etag = etag_from_header(c.header_str)
- feed.last_modified = last_modified_from_header(c.header_str)
- responses[url] = feed
- options[:on_success].call(url, feed) if options.has_key?(:on_success)
+ begin
+ feed = klass.parse(xml)
+ feed.feed_url = c.last_effective_url
+ feed.etag = etag_from_header(c.header_str)
+ feed.last_modified = last_modified_from_header(c.header_str)
+ responses[url] = feed
+ options[:on_success].call(url, feed) if options.has_key?(:on_success)
+ rescue Exception => e
+ options[:on_failure].call(url, c.response_code, c.header_str, c.body_str) if options.has_key?(:on_failure)
+ end
  else
  # puts "Error determining parser for #{url} - #{c.last_effective_url}"
  # raise NoParserAvailable.new("no valid parser for content.") (this would unfortunately fail the whole 'multi', so it's not really useable)
@@ -267,18 +258,23 @@ module Feedzirra
  curl.headers["User-Agent"] = (options[:user_agent] || USER_AGENT)
  curl.headers["If-Modified-Since"] = feed.last_modified.httpdate if feed.last_modified
  curl.headers["If-None-Match"] = feed.etag if feed.etag
+ curl.timeout = (options[:timeout] || TIMEOUT)
  curl.userpwd = options[:http_authentication].join(':') if options.has_key?(:http_authentication)
  curl.follow_location = true
 
  curl.on_success do |c|
- add_feed_to_multi(multi, feed_queue.shift, feed_queue, responses, options) unless feed_queue.empty?
- updated_feed = Feed.parse(c.body_str)
- updated_feed.feed_url = c.last_effective_url
- updated_feed.etag = etag_from_header(c.header_str)
- updated_feed.last_modified = last_modified_from_header(c.header_str)
- feed.update_from_feed(updated_feed)
- responses[feed.feed_url] = feed
- options[:on_success].call(feed) if options.has_key?(:on_success)
+ begin
+ add_feed_to_multi(multi, feed_queue.shift, feed_queue, responses, options) unless feed_queue.empty?
+ updated_feed = Feed.parse(c.body_str)
+ updated_feed.feed_url = c.last_effective_url
+ updated_feed.etag = etag_from_header(c.header_str)
+ updated_feed.last_modified = last_modified_from_header(c.header_str)
+ feed.update_from_feed(updated_feed)
+ responses[feed.feed_url] = feed
+ options[:on_success].call(feed) if options.has_key?(:on_success)
+ rescue Exception => e
+ options[:on_failure].call(feed, c.response_code, c.header_str, c.body_str) if options.has_key?(:on_failure)
+ end
  end
 
  curl.on_failure do |c|
@@ -19,6 +19,12 @@ module Feedzirra
  @id || @url
  end
 
+ ##
+ # Summary is @summary, or @content if @summary is nil.
+ def summary
+ @summary || @content
+ end
+
  ##
  # Writer for published. By default, we keep the "oldest" publish time found.
  def published=(val)
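
Editor's note: the summary fallback added in the hunk above can be reproduced standalone. `DemoEntry` below is a hypothetical stand-in for the classes that include FeedEntryUtilities, not Feedzirra code.

```ruby
# Standalone sketch of the new summary fallback: return @summary when set,
# otherwise fall back to @content.
class DemoEntry
  attr_writer :summary
  attr_accessor :content

  def summary
    @summary || @content
  end
end

entry = DemoEntry.new
entry.content = "full content"
first = entry.summary   # => "full content" (no summary set, falls back to content)
entry.summary = "short"
second = entry.summary  # => "short"
```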
@@ -0,0 +1,26 @@
+ module Feedzirra
+
+ module Parser
+ # == Summary
+ # Parser for dealing with Atom feeds.
+ #
+ # == Attributes
+ # * title
+ # * feed_url
+ # * url
+ # * entries
+ class Atom
+ include SAXMachine
+ include FeedUtilities
+ element :title
+ element :link, :as => :url, :value => :href, :with => {:type => "text/html"}
+ element :link, :as => :feed_url, :value => :href, :with => {:type => "application/atom+xml"}
+ elements :entry, :as => :entries, :class => AtomEntry
+
+ def self.able_to_parse?(xml) #:nodoc:
+ xml =~ /(Atom)|(#{Regexp.escape("http://purl.org/atom")})/
+ end
+ end
+ end
+
+ end
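
Editor's note: the detection regex in Atom.able_to_parse? above can be exercised standalone. The sample XML strings below are illustrative, not taken from the library's specs.

```ruby
# The Atom detection pattern from able_to_parse?: matches documents that
# mention "Atom" or the old http://purl.org/atom namespace.
atom_detector = /(Atom)|(#{Regexp.escape("http://purl.org/atom")})/

atom_xml = %(<feed xmlns="http://www.w3.org/2005/Atom"><title>t</title></feed>)
rss_xml = %(<rss version="2.0"><channel><title>t</title></channel></rss>)

atom_match = atom_xml =~ atom_detector # matches: "Atom" appears in the namespace URI
rss_match = rss_xml =~ atom_detector   # nil: neither pattern appears
```

Note the trade-off the TODO list mentions: this is a substring match on raw text, not a namespace-aware check, which is why the README proposes readdressing parser detection with namespaces.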
@@ -0,0 +1,34 @@
+ module Feedzirra
+
+ module Parser
+ # == Summary
+ # Parser for dealing with Atom feed entries.
+ #
+ # == Attributes
+ # * title
+ # * url
+ # * author
+ # * content
+ # * summary
+ # * published
+ # * categories
+ class AtomEntry
+ include SAXMachine
+ include FeedEntryUtilities
+ element :title
+ element :link, :as => :url, :value => :href, :with => {:type => "text/html", :rel => "alternate"}
+ element :name, :as => :author
+ element :content
+ element :summary
+ element :published
+ element :id
+ element :created, :as => :published
+ element :issued, :as => :published
+ element :updated
+ element :modified, :as => :updated
+ elements :category, :as => :categories, :value => :term
+ end
+
+ end
+
+ end