jsl-feedzirra 0.0.12.8 → 0.0.12.9

data/LICENSE.rdoc ADDED
@@ -0,0 +1,22 @@
+ (The MIT License)
+
+ Copyright (c) 2009 {Paul Dix}[http://pauldix.net]
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ 'Software'), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice shall be
+ included in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+ IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+ CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+ TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+ SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.rdoc CHANGED
@@ -1,6 +1,6 @@
- == Feedzirra
+ = Feedzirra
 
- === Description
+ == Description
 
  Feedzirra is a feed library that is designed to get and update many feeds as quickly as possible. This includes using libcurl-multi through the
  taf2-curb[link:http://github.com/taf2/curb/tree/master] gem for faster http gets, and libxml through
@@ -12,22 +12,20 @@ much control as you want in updating feeds. Feedzirra makes it easy to figure o
  of a feed in a key-value store. Feedzirra uses the the "moneta" gem, which is a unified interface to key-value storage systems, in order to provide
  access to many different types of stores depending on your requirements.
 
- === Installation
+ == Installation
 
  For now Feedzirra exists only on github. It also has a few gem requirements that are only on github. Before you start you need to have
  libcurl[link:http://curl.haxx.se/] and libxml[link:http://xmlsoft.org/] installed. If you're on Leopard you have both. Otherwise, you'll need to
- grab them. Once you've got those libraries, these are the gems that get used: nokogiri, pauldix-sax-machine, taf2-curb (note that this is a fork
- that lives on github and not the Ruby Forge version of curb), and pauldix-feedzirra. The feedzirra gemspec has all the dependencies so you should
- be able to get up and running with the standard github gem install routine:
+ grab them. Once you've got those libraries, you should be able to get up and running with the standard github gem install routine:
 
  gem sources -a http://gems.github.com # if you haven't already
- gem install pauldix-feedzirra
+ gem install jsl-feedzirra
 
- === Usage
+ == Usage
 
- This experimental branch offers a new interface to feed fetching with persistent back-end stores. This allows you to
- easily run a script retrieving the feeds once per hour or once per day, and it will remember which feeds have been seen
- before and which are new. This features uses the Feedzirra::Reader interface.
+ This experimental branch offers a new interface to feed fetching with persistent back-end stores. This allows you to easily run a script
+ retrieving the feeds once per hour or once per day, and it will remember which feeds have been seenbefore and which are new. This feature
+ uses the Feedzirra::Reader interface.
 
  You can create a Feedzirra::Reader object after the Feedzirra library (with require 'feedzirra') is loaded as follows:
 
@@ -51,84 +49,60 @@ and a Ruby Hash structure-based back end that doesn't attempt to persist any inf
 
  Once you've retrieved a single feed, you can use the accessors below to query the results.
 
- # feed and entries accessors
- feed.title # => "Paul Dix Explains Nothing"
- feed.url # => "http://www.pauldix.net"
- feed.feed_url # => "http://feeds.feedburner.com/PaulDixExplainsNothing"
- feed.etag # => "GunxqnEP4NeYhrqq9TyVKTuDnh0"
- feed.last_modified # => Sat Jan 31 17:58:16 -0500 2009 # it's a Time object
-
- entry = feed.entries.first
- entry.title # => "Ruby Http Client Library Performance"
- entry.url # => "http://www.pauldix.net/2009/01/ruby-http-client-library-performance.html"
- entry.author # => "Paul Dix"
- entry.summary # => "..."
- entry.content # => "..."
- entry.published # => Thu Jan 29 17:00:19 UTC 2009 # it's a Time object
- entry.categories # => ["...", "..."]
-
- # sanitizing an entry's content
- entry.title.sanitize # => returns the title with harmful stuff escaped
- entry.author.sanitize # => returns the author with harmful stuff escaped
- entry.content.sanitize # => returns the content with harmful stuff escaped
- entry.content.sanitize! # => returns content with harmful stuff escaped and replaces original (also exists for author and title)
- entry.sanitize! # => sanitizes the entry's title, author, and content in place (as in, it changes the value to clean versions)
- feed.sanitize_entries! # => sanitizes all entries in place
-
- # updating a single feed
- updated_feed = Feedzirra::Feed.update(feed)
-
- # an updated feed has the following extra accessors
- updated_feed.updated? # returns true if any of the feed attributes have been modified. will return false if only new entries
- updated_feed.new_entries # a collection of the entry objects that are newer than the latest in the feed before update
-
- # fetching multiple feeds
- feed_urls = ["http://feeds.feedburner.com/PaulDixExplainsNothing", "http://feeds.feedburner.com/trottercashion"]
- feeds = Feedzirra::Reader.new(feed_urls).fetch
-
- # feeds is now a hash with the feed_urls as keys and the parsed feed objects as values. If an error was thrown
- # there will be a Fixnum of the http response code instead of a feed object
-
- # updating multiple feeds. if you're using a persistent back-end, Feedzirra uses that to determine which entries are ones that you haven't seen before
- updated_feeds = Feedzirra::reader.new(urls).fetch
-
- # defining custom behavior on failure or success. note that a return status of 304 (not updated) will call the on_success handler
- feed = Feedzirra::Reader.new("http://feeds.feedburner.com/PaulDixExplainsNothing",
- :on_success => lambda {|feed| puts feed.title },
- :on_failure => lambda {|url, response_code, response_header, response_body| puts response_body }).fetch
-
- # if a collection was passed into the initializer, the handlers will be called for each one
-
- === Extending
-
- Feedzirra is easily extended with custom parsing classes and persistent back-ends. You'll have to read the source to find out how, though, because we
- still haven't written the documentation. :(
-
- === Benchmarks
-
- One of the goals of Feedzirra is speed. This includes not only parsing, but fetching multiple feeds as quickly as possible. I ran a benchmark getting 20 feeds 10 times using Feedzirra, rFeedParser, and FeedNormalizer. For more details the {benchmark code can be found in the project in spec/benchmarks/feedzirra_benchmarks.rb}[http://github.com/pauldix/feedzirra/blob/7fb5634c5c16e9c6ec971767b462c6518cd55f5d/spec/benchmarks/feedzirra_benchmarks.rb]
-
- feedzirra 5.170000 1.290000 6.460000 ( 18.917796)
- rfeedparser 104.260000 12.220000 116.480000 (244.799063)
- feed-normalizer 66.250000 4.010000 70.260000 (191.589862)
-
- The result of that benchmark is a bit sketchy because of the network variability. Running 10 times against the same 20 feeds was meant to smooth some of that out. However, there is also a {benchmark comparing parsing speed in spec/benchmarks/parsing_benchmark.rb}[http://github.com/pauldix/feedzirra/blob/7fb5634c5c16e9c6ec971767b462c6518cd55f5d/spec/benchmarks/parsing_benchmark.rb] on an atom feed.
-
- feedzirra 0.500000 0.030000 0.530000 ( 0.658744)
- rfeedparser 8.400000 1.110000 9.510000 ( 11.839827)
- feed-normalizer 5.980000 0.160000 6.140000 ( 7.576140)
-
- There's also a {benchmark that shows the results of using Feedzirra to perform updates on feeds}[http://github.com/pauldix/feedzirra/blob/45d64319544c61a4c9eb9f7f825c73b9f9030cb3/spec/benchmarks/updating_benchmarks.rb] you've already pulled in. I tested against 179 feeds. The first is the initial pull and the second is an update 65 seconds later. I'm not sure how many of them support etag and last-modified, so performance may be better or worse depending on what feeds you're requesting.
-
- feedzirra fetch and parse 4.010000 0.710000 4.720000 ( 15.110101)
- feedzirra update 0.660000 0.280000 0.940000 ( 5.152709)
-
- === Discussion
+ # feed and entries accessors
+ feed.title # => "Paul Dix Explains Nothing"
+ feed.url # => "http://www.pauldix.net"
+ feed.feed_url # => "http://feeds.feedburner.com/PaulDixExplainsNothing"
+ feed.etag # => "GunxqnEP4NeYhrqq9TyVKTuDnh0"
+ feed.last_modified # => Sat Jan 31 17:58:16 -0500 2009 # it's a Time object
+
+ entry = feed.entries.first
+ entry.title # => "Ruby Http Client Library Performance"
+ entry.url # => "http://www.pauldix.net/2009/01/ruby-http-client-library-performance.html"
+ entry.author # => "Paul Dix"
+ entry.summary # => "..."
+ entry.content # => "..."
+ entry.published # => Thu Jan 29 17:00:19 UTC 2009 # it's a Time object
+ entry.categories # => ["...", "..."]
+
+ # sanitizing an entry's content
+ entry.title.sanitize # => returns the title with harmful stuff escaped
+ entry.author.sanitize # => returns the author with harmful stuff escaped
+ entry.content.sanitize # => returns the content with harmful stuff escaped
+ entry.content.sanitize! # => returns content with harmful stuff escaped and replaces original (also exists for author and title)
+ entry.sanitize! # => sanitizes the entry's title, author, and content in place (as in, it changes the value to clean versions)
+ feed.sanitize_entries! # => sanitizes all entries in place
+
+ # updating a single feed
+ updated_feed = Feedzirra::Feed.update(feed)
+
+ # an updated feed has the following extra accessors
+ updated_feed.updated? # returns true if any of the feed attributes have been modified. will return false if only new entries
+ updated_feed.new_entries # a collection of the entry objects that are newer than the latest in the feed before update
+
+ # fetching multiple feeds
+ feed_urls = ["http://feeds.feedburner.com/PaulDixExplainsNothing", "http://feeds.feedburner.com/trottercashion"]
+ feeds = Feedzirra::Reader.new(feed_urls).fetch
+
+ # feeds is now a hash with the feed_urls as keys and the parsed feed objects as values. If an error was thrown
+ # there will be a Fixnum of the http response code instead of a feed object
+
+ # updating multiple feeds. if you're using a persistent back-end, Feedzirra uses that to determine which entries are ones that you haven't seen before
+ updated_feeds = Feedzirra::reader.new(urls).fetch
+
+ # defining custom behavior on failure or success. note that a return status of 304 (not updated) will call the on_success handler
+ feed = Feedzirra::Reader.new("http://feeds.feedburner.com/PaulDixExplainsNothing",
+ :on_success => lambda {|feed| puts feed.title },
+ :on_failure => lambda {|url, response_code, response_header, response_body| puts response_body }).fetch
+
+ # if a collection was passed into the initializer, the handlers will be called for each one
+
+ == Discussion
 
  I'd like feedback on the api and any bugs encountered on feeds in the wild. I've set up a
  {google group here}[http://groups.google.com/group/feedzirra].
 
- ==== Troubleshooting Installation
+ == Troubleshooting Installation
 
  *NOTE:*Some people have been reporting a few issues related to installation. First, the Ruby Forge version of curb is not what you want. It will not work. Nor will the curl-multi gem that lives on
  Ruby Forge. You have to get the taf2-curb[link:http://github.com/taf2/curb/tree/master] fork installed.
@@ -155,7 +129,7 @@ Another problem could be if you are running Mac Ports and you have libcurl insta
 
  If you're still having issues, please let me know on the mailing list. Also, {Todd Fisher (taf2)}[link:http://github.com/taf2] is working on fixing the gem install. Please send him a full error report.
 
- === TODO
+ == TODO
 
  This thing needs to hammer on many different feeds in the wild. I'm sure there will be bugs. I want to find them and crush them. I didn't bother
  using the test suite for feedparser. i wanted to start fresh.
@@ -171,29 +145,6 @@ Here are some more specific TODOs.
  * Make the feed_spec actually mock stuff out so it doesn't hit the net.
  * Readdress how feeds determine if they can parse a document. Maybe I should use namespaces instead?
 
- === LICENSE
-
- (The MIT License)
-
- Copyright (c) 2009:
-
- {Paul Dix}[http://pauldix.net]
-
- Permission is hereby granted, free of charge, to any person obtaining
- a copy of this software and associated documentation files (the
- 'Software'), to deal in the Software without restriction, including
- without limitation the rights to use, copy, modify, merge, publish,
- distribute, sublicense, and/or sell copies of the Software, and to
- permit persons to whom the Software is furnished to do so, subject to
- the following conditions:
-
- The above copyright notice and this permission notice shall be
- included in all copies or substantial portions of the Software.
-
- THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND,
- EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
- IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
- CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
- TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
- SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ == LICENSE
+
+ This library is provided under the MIT License. See {the complete LICENSE}[link:files/LICENSE_rdoc.html] for details.
@@ -19,7 +19,7 @@ module Feedzirra
  # A Hash if multiple URL's are passed. The key will be the URL, and the value the Feed object.
  def self.fetch_and_parse(urls, options = {})
  multi = Feedzirra::HttpMulti.new(urls, options)
- multi.perform
+ multi.run
  urls.is_a?(String) ? multi.responses.values.first : multi.responses
  end
 
@@ -36,9 +36,8 @@ module Feedzirra
  #
  # A Hash if multiple Feeds are passed. The key will be the URL, and the value the updated Feed object.
  def self.update(feeds, options = {})
- multi = Feedzirra::HttpMulti.new(urls, options)
-
- multi.perform
+ multi = Feedzirra::HttpMulti.new(feeds, options)
+ multi.run
  multi.responses.size == 1 ? multi.responses.values.first : multi.responses.values
  end
 
@@ -60,7 +59,7 @@ module Feedzirra
  # FIXME - Raw mode is currently not supported!
  def self.fetch_raw(urls, options = {})
  multi = Feedzirra::HttpMulti.new(urls, options.merge(:raw => true))
- multi.perform
+ multi.run
  urls.is_a?(String) ? multi.responses.values.first : multi.responses
  end
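
The three hunks above replace multi.perform with multi.run and, in Feed.update, pass feeds to HttpMulti.new rather than urls, which is not a parameter of that method. The call pattern can be sketched without touching the network. FakeHttpMulti is a hypothetical stand-in written for this illustration (it is not the real Feedzirra::HttpMulti), and the example.com URLs are placeholders.

```ruby
# Hypothetical stand-in for Feedzirra::HttpMulti; the real class performs HTTP requests.
class FakeHttpMulti
  attr_reader :responses

  def initialize(urls, options = {})
    @urls = Array(urls)   # accept a single String or an Array of Strings
    @options = options
    @responses = {}
  end

  # Fills the responses hash keyed by URL, with no network access.
  def run
    @urls.each { |url| @responses[url] = "parsed feed for #{url}" }
  end
end

# Mirrors the shape of Feed.fetch_and_parse in the diff:
# build the multi object, call #run, then unwrap the result for a single String.
def fetch_and_parse(urls, options = {})
  multi = FakeHttpMulti.new(urls, options)
  multi.run
  urls.is_a?(String) ? multi.responses.values.first : multi.responses
end

single = fetch_and_parse("http://example.com/a.xml")
many   = fetch_and_parse(["http://example.com/a.xml", "http://example.com/b.xml"])

puts single.inspect
puts many.keys.inspect
```

With a single String the caller gets one value back; with an Array it gets the whole Hash keyed by URL, which matches the comments in the context lines of the diff.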
 
data/lib/feedzirra.rb CHANGED
@@ -39,6 +39,6 @@ require 'feedzirra/parser/atom'
  require 'feedzirra/parser/atom_feed_burner'
 
  module Feedzirra
- USER_AGENT = "feedzirra http://github.com/pauldix/feedzirra/tree/master"
- VERSION = "0.0.12.5"
+ USER_AGENT = "feedzirra http://github.com/jsl/feedzirra/tree/master"
+ VERSION = "0.0.12.9"
  end
@@ -1,5 +1,36 @@
  require File.join(File.dirname(__FILE__), %w[.. spec_helper])
 
  describe Feedzirra::Feed do
+ describe "Feed.fetch_and_parse" do
+ it "should call #run on the HttpMulti object" do
+ multi = mock('httpmulti')
+ multi.expects(:run)
+ response = mock('response', :values => [ ])
+ multi.expects(:responses).returns(response)
+ Feedzirra::HttpMulti.expects(:new).with('foo', { }).returns(multi)
+ Feedzirra::Feed.fetch_and_parse('foo')
+ end
+ end
 
+ describe "Feed.update" do
+ it "should call #run on the HttpMulti object" do
+ multi = mock('httpmulti')
+ multi.expects(:run)
+ response = mock('response', :values => [ ], :size => 0)
+ multi.expects(:responses).returns(response).times(2)
+ Feedzirra::HttpMulti.expects(:new).with('foo', { }).returns(multi)
+ Feedzirra::Feed.update('foo')
+ end
+ end
+
+ describe "#Feed.fetch_raw" do
+ it "should call #run on the HttpMulti object" do
+ multi = mock('httpmulti')
+ multi.expects(:run)
+ response = mock('response', :values => [ ])
+ multi.expects(:responses).returns(response)
+ Feedzirra::HttpMulti.expects(:new).with('foo', { :raw => true}).returns(multi)
+ Feedzirra::Feed.fetch_raw('foo')
+ end
+ end
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: jsl-feedzirra
  version: !ruby/object:Gem::Version
- version: 0.0.12.8
+ version: 0.0.12.9
  platform: ruby
  authors:
  - Paul Dix
@@ -9,7 +9,7 @@ autorequire:
  bindir: bin
  cert_chain: []
 
- date: 2009-04-29 00:00:00 -07:00
+ date: 2009-05-19 00:00:00 -07:00
  default_executable:
  dependencies:
  - !ruby/object:Gem::Dependency
@@ -18,9 +18,9 @@ dependencies:
  version_requirement:
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
- - - ">"
+ - - ">="
  - !ruby/object:Gem::Version
- version: 0.0.0
+ version: "0"
  version:
  - !ruby/object:Gem::Dependency
  name: nokogiri
@@ -28,9 +28,9 @@ dependencies:
  version_requirement:
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
- - - ">"
+ - - ">="
  - !ruby/object:Gem::Version
- version: 0.0.0
+ version: "0"
  version:
  - !ruby/object:Gem::Dependency
  name: pauldix-sax-machine
@@ -90,7 +90,7 @@ dependencies:
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
- version: 0.0.0
+ version: "0"
  version:
  description:
  email: paul@pauldix.net
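
The dependency hunks above change two requirements from "> 0.0.0" to ">= 0". A minimal sketch of what that difference means, using the Gem::Requirement and Gem::Version classes that ship with RubyGems. The variable names are illustrative only, and the sketch says nothing about why the gemspec was regenerated this way.

```ruby
require "rubygems"

# The two requirement forms that appear in the metadata diff.
old_req = Gem::Requirement.new("> 0.0.0")
new_req = Gem::Requirement.new(">= 0")

# RubyGems treats "0" and "0.0.0" as the same version.
zero  = Gem::Version.new("0")
patch = Gem::Version.new("0.0.1")

# "> 0.0.0" excludes the zero version itself; ">= 0" accepts every release version.
puts old_req.satisfied_by?(zero)   # false
puts new_req.satisfied_by?(zero)   # true
puts old_req.satisfied_by?(patch)  # true
puts new_req.satisfied_by?(patch)  # true
```

In practice the two forms differ only for a gem released as version 0, so for ordinary releases either requirement accepts any installed version.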
@@ -98,8 +98,9 @@ executables: []
 
  extensions: []
 
- extra_rdoc_files: []
-
+ extra_rdoc_files:
+ - README.rdoc
+ - LICENSE.rdoc
  files:
  - lib/core_ext/date.rb
  - lib/core_ext/string.rb
@@ -122,6 +123,7 @@ files:
  - lib/feedzirra/parser/feed_utilities.rb
  - lib/feedzirra/parser/feed_entry_utilities.rb
  - README.rdoc
+ - LICENSE.rdoc
  - Rakefile
  - spec/spec.opts
  - spec/spec_helper.rb
@@ -138,10 +140,15 @@ files:
  - spec/feedzirra/feed_utilities_spec.rb
  - spec/feedzirra/feed_entry_utilities_spec.rb
  has_rdoc: true
- homepage: http://github.com/pauldix/feedzirra
+ homepage: http://github.com/jsl/feedzirra
  post_install_message:
- rdoc_options: []
-
+ rdoc_options:
+ - --title
+ - HashBack
+ - --main
+ - README.rdoc
+ - --line-numbers
+ - --inline-source
  require_paths:
  - lib
  required_ruby_version: !ruby/object:Gem::Requirement