jsl-feedzirra 0.0.12.9 → 0.0.12.10
- data/README.rdoc +7 -10
- data/lib/feedzirra.rb +3 -1
- data/lib/feedzirra/http_multi.rb +8 -2
- metadata +1 -1
data/README.rdoc
CHANGED
@@ -3,8 +3,8 @@
 == Description
 
 Feedzirra is a feed library that is designed to get and update many feeds as quickly as possible. This includes using libcurl-multi through the
-taf2-curb[
-nokogiri[
+{taf2-curb}[http://github.com/taf2/curb/tree/master] gem for faster http gets, and libxml through
+{nokogiri}[http://github.com/tenderlove/nokogiri/tree/master] and {sax-machine}[http://github.com/pauldix/sax-machine/tree/master] for
 faster parsing.
 
 It allows for easy customization of feed parsing options through the definition of custom parsing classes, and allows you to take as little or as
@@ -15,7 +15,7 @@ access to many different types of stores depending on your requirements.
 == Installation
 
 For now Feedzirra exists only on github. It also has a few gem requirements that are only on github. Before you start you need to have
-libcurl[
+{libcurl}[http://curl.haxx.se/] and {libxml}[http://xmlsoft.org/] installed. If you're on Leopard you have both. Otherwise, you'll need to
 grab them. Once you've got those libraries, you should be able to get up and running with the standard github gem install routine:
 
 gem sources -a http://gems.github.com # if you haven't already
@@ -36,8 +36,7 @@ The Reader object can take a single URL or a list of URLs followed by a Hash of
 allows configuration of the backend store, as well as fetching options for the list of urls. Following is
 an example of configuration with the Memcache store connected to Tokyo Tyrant (the front-end for Tokyo Cabinet):
 
-reader = Feedzirra::Reader.new('http://www.pauldix.net/atom.xml', :backend =>
-{ :moneta_klass => Moneta::Memcache, :port => 1978, :server => 'localhost' })
+reader = Feedzirra::Reader.new('http://www.pauldix.net/atom.xml', :backend => { :moneta_klass => 'Moneta::Memcache', :server => 'localhost:1978' })
 
 Other options that may be put in the options hash follow the original API described below.
 
@@ -105,7 +104,7 @@ I'd like feedback on the api and any bugs encountered on feeds in the wild. I've
 == Troubleshooting Installation
 
 *NOTE:*Some people have been reporting a few issues related to installation. First, the Ruby Forge version of curb is not what you want. It will not work. Nor will the curl-multi gem that lives on
-Ruby Forge. You have to get the taf2-curb[
+Ruby Forge. You have to get the {taf2-curb}[http://github.com/taf2/curb/tree/master] fork installed.
 
 If you see this error when doing a require:
 
@@ -127,7 +126,7 @@ If you're on Debian or Ubuntu and getting errors while trying to install the taf
 
 Another problem could be if you are running Mac Ports and you have libcurl installed through there. You need to uninstall it for curb to work! The version in Mac Ports is old and doesn't play nice with curb. If you're running Leopard, you can just uninstall and you should be golden. If you're on an older version of OS X, you'll then need to {download curl}[http://curl.haxx.se/download.html] and build from source. Then you'll have to install the taf2-curb gem again. You might have to perform the step above.
 
-If you're still having issues, please let me know on the mailing list. Also, {Todd Fisher (taf2)}[
+If you're still having issues, please let me know on the mailing list. Also, {Todd Fisher (taf2)}[http://github.com/taf2] is working on fixing the gem install. Please send him a full error report.
 
 == TODO
 
@@ -141,10 +140,8 @@ Here are some more specific TODOs.
 * Create a super sweet DSL for defining new parsers.
 * Test against Ruby 1.9.1 and fix any bugs.
 * I'm not keeping track of modified on entries. Should I add this?
-* Clean up the fetching code inside feed.rb so it doesn't suck so hard.
-* Make the feed_spec actually mock stuff out so it doesn't hit the net.
 * Readdress how feeds determine if they can parse a document. Maybe I should use namespaces instead?
 
 == LICENSE
 
-This library is provided under the MIT License. See
+This library is provided under the MIT License. See the complete LICENSE in LICENSE.rdoc for details.
data/lib/feedzirra.rb
CHANGED
@@ -8,6 +8,8 @@ require 'sax-machine'
 require 'dryopteris'
 require 'uri'
 
+require 'md5'
+
 require 'active_support/basic_object'
 require 'active_support/core_ext/object'
 require 'active_support/core_ext/time'
@@ -40,5 +42,5 @@ require 'feedzirra/parser/atom_feed_burner'
 
 module Feedzirra
   USER_AGENT = "feedzirra http://github.com/jsl/feedzirra/tree/master"
-  VERSION = "0.0.12.
+  VERSION = "0.0.12.10"
 end
data/lib/feedzirra/http_multi.rb
CHANGED
@@ -29,6 +29,8 @@ module Feedzirra
   @multi.perform
 end
 
+private
+
 # Breaks the urls into chunks of 30 because of weird errors encountered on
 # entering more items. As one finishes it pops another off the queue.
 def prepare
@@ -46,7 +48,7 @@ module Feedzirra
   url = retrievable.feed_url
 else
   url = retrievable
-  retrievable = @backend[url] # Try to fetch the last retrieval from backend
+  retrievable = @backend[key_for(url)] # Try to fetch the last retrieval from backend
 end
 
 easy = build_curl_easy(url, retrievable, retrievable_queue)
@@ -100,7 +102,7 @@ module Feedzirra
   set_updated_feed_entries!(retrievable, updated_feed)
 end
 
-@backend[url] = updated_feed
+@backend[key_for(url)] = updated_feed
 
 responses[url] = updated_feed
 @options[:on_success].call(retrievable) if @options.has_key?(:on_success)
@@ -118,6 +120,10 @@ module Feedzirra
   @options[:on_failure].call(retrievable, curl.response_code, curl.header_str, curl.body_str) if options.has_key?(:on_failure)
 end
 
+def key_for(url)
+  ['url', MD5.hexdigest(url)].join('-')
+end
+
 # Determines the etag from the request headers.
 #
 # === Parameters
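
Note on the key_for change: the backend is now indexed by a fixed-length digest of the feed URL rather than by the raw URL. The diff does not state the motivation. One likely reason is that memcached-style stores limit key length and reject whitespace, which arbitrary URLs can violate. The following is a minimal sketch of the same idea, not the gem's code. It assumes Digest::MD5 from the standard library in place of the top-level MD5 constant, because require 'md5' only works on Ruby 1.8, and it uses a plain Hash as a hypothetical stand-in for the Moneta backend.

```ruby
require 'digest/md5'

# Sketch of the key derivation added in http_multi.rb. Digest::MD5 is an
# assumption made here so the example runs on Ruby 1.9 and later; the gem
# itself calls MD5.hexdigest after require 'md5'.
def key_for(url)
  ['url', Digest::MD5.hexdigest(url)].join('-')
end

# A plain Hash stands in for the backend store (hypothetical; the gem
# talks to a Moneta-style store instead).
backend = {}
url = 'http://www.pauldix.net/atom.xml'

# Store and look up under the derived key, as the diff does with @backend.
backend[key_for(url)] = :cached_feed
puts backend[key_for(url)].inspect

# Every key has the same shape: "url-" followed by 32 hex characters.
puts key_for(url).length
```

Because the digest is deterministic, the same URL always maps to the same key, so entries written by one fetch are found by the next. Entries stored under raw URL keys by earlier versions would not be found under the new keys.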