pauldix-feedzirra 0.0.1 → 0.0.2
- data/README.textile +47 -10
- data/lib/feedzirra/atom_entry.rb +2 -1
- data/lib/feedzirra/atom_feed_burner_entry.rb +2 -0
- data/lib/feedzirra/feed.rb +24 -8
- data/lib/feedzirra/feed_entry_utilities.rb +19 -0
- data/lib/feedzirra/feed_utilities.rb +17 -2
- data/lib/feedzirra/rss.rb +1 -1
- data/lib/feedzirra/rss_entry.rb +4 -0
- data/lib/feedzirra.rb +3 -1
- data/spec/feedzirra/atom_entry_spec.rb +4 -0
- data/spec/feedzirra/atom_feed_burner_entry_spec.rb +9 -0
- data/spec/feedzirra/feed_entry_utilities_spec.rb +28 -0
- data/spec/feedzirra/feed_spec.rb +20 -10
- data/spec/feedzirra/feed_utilities_spec.rb +1 -1
- data/spec/feedzirra/rss_entry_spec.rb +4 -0
- data/spec/feedzirra/rss_spec.rb +5 -4
- metadata +26 -2
data/README.textile
CHANGED
@@ -1,7 +1,8 @@
 h1. Feedzirra
 
 "http://github.com/pauldix/feedzirra/tree/master":http://github.com/pauldix/feedzirra/tree/master
-
+
+I'd like feedback on the api and any bugs encountered on feeds in the wild. I've set up a "google group here":http://groups.google.com/group/feedzirra.
 
 h2. Summary
 
@@ -9,24 +10,46 @@ A feed fetching and parsing library that treats the internet like Godzilla treat
 
 h2. Description
 
-Feedzirra is a feed library that is designed to get and update many feeds as quickly as possible. This includes using libcurl-multi through the "taf2-curb"http://github.com/taf2/curb/tree/master gem for faster http gets, and libxml through "nokogiri":http://github.com/tenderlove/nokogiri/tree/master and "sax-machine":http://github.com/pauldix/sax-machine/tree/master for faster parsing.
+Feedzirra is a feed library that is designed to get and update many feeds as quickly as possible. This includes using libcurl-multi through the "taf2-curb":http://github.com/taf2/curb/tree/master gem for faster http gets, and libxml through "nokogiri":http://github.com/tenderlove/nokogiri/tree/master and "sax-machine":http://github.com/pauldix/sax-machine/tree/master for faster parsing.
 
 Once you have fetched feeds using Feedzirra, they can be updated using the feed objects. Feedzirra automatically inserts etag and last-modified information from the http response headers to lower bandwidth usage, eliminate unnecessary parsing, and make things speedier in general.
 
+Another feature present in Feedzirra is the ability to create callback functions that get called "on success" and "on failure" when getting a feed. This makes it easy to do things like log errors or update data stores.
+
 The fetching and parsing logic have been decoupled so that either of them can be used in isolation if you'd prefer not to use everything that Feedzirra offers. However, the code examples below use helper methods in the Feed class that put everything together to make things as simple as possible.
 
 The final feature of Feedzirra is the ability to define custom parsing classes. In truth, Feedzirra could be used to parse much more than feeds. Microformats, page scraping, and almost anything else are fair game.
 
 h2. Installation
 
-For now Feedzirra exists only on github. It also has a few gem requirements that are only on github. Before you start you need to have "libcurl":http://curl.haxx.se/ and "libxml":http://xmlsoft.org/ installed. If you're on Leopard you have both. Otherwise, you'll need to grab them. Once you've got those libraries, these are the gems you
+For now Feedzirra exists only on github. It also has a few gem requirements that are only on github. Before you start you need to have "libcurl":http://curl.haxx.se/ and "libxml":http://xmlsoft.org/ installed. If you're on Leopard you have both. Otherwise, you'll need to grab them. Once you've got those libraries, these are the gems that get used: nokogiri, pauldix-sax-machine, taf2-curb (note that this is a fork that lives on github and not the Ruby Forge version of curb), and pauldix-feedzirra. The feedzirra gemspec has all the dependencies so you should be able to get up and running with the standard github gem install routine:
 <pre>
-gem install nokogiri
 gem sources -a http://gems.github.com # if you haven't already
-gem install pauldix-sax-machine
-gem install taf2-curb
 gem install pauldix-feedzirra
 </pre>
+<b>NOTE:</b> Some people have been reporting a few issues related to installation. First, the Ruby Forge version of curb is not what you want. It will not work. Nor will the curl-multi gem that lives on Ruby Forge. You have to get the "taf2-curb":http://github.com/taf2/curb/tree/master fork installed.
+
+If you see this error when doing a require:
+<pre>
+/Library/Ruby/Site/1.8/rubygems/custom_require.rb:31:in `gem_original_require': no such file to load -- curb_core (LoadError)
+</pre>
+It means that the taf2-curb gem didn't build correctly. To resolve this you can do a git clone git://github.com/taf2/curb.git, then run rake gem in the curb directory, then sudo gem install pkg/curb-0.2.4.0.gem. After that you should be good.
+
+If you see something like this when trying to run it:
+<pre>
+NoMethodError: undefined method `on_success' for #<Curl::Easy:0x1182724>
+  from ./lib/feedzirra/feed.rb:88:in `add_url_to_multi'
+</pre>
+This means that you are requiring curl-multi or the Ruby Forge version of Curb somewhere. You can't use those and need to get the taf2 version up and running.
+
+If you're on Debian or Ubuntu and getting errors while trying to install the taf2-curb gem, it could be because you don't have the latest version of libcurl installed. Do this to fix:
+<pre>
+sudo apt-get install libcurl4-gnutls-dev
+</pre>
+
+Another problem could be if you are running Mac Ports and you have libcurl installed through there. You need to uninstall it for curb to work! The version in Mac Ports is old and doesn't play nice with curb. If you're running Leopard, you can just uninstall and you should be golden. If you're on an older version of OS X, you'll then need to "download curl":http://curl.haxx.se/download.html and build from source. Then you'll have to install the taf2-curb gem again. You might have to perform the step above.
+
+If you're still having issues, please let me know on the mailing list. Also, "Todd Fisher (taf2)":http://github.com/taf2 is working on fixing the gem install. Please send him a full error report.
 
 h2. Usage
 
@@ -51,6 +74,14 @@ entry.author # => "Paul Dix"
 entry.summary # => "..."
 entry.content # => "..."
 entry.published # => Thu Jan 29 17:00:19 UTC 2009 # it's a Time object
+entry.categories # => ["...", "..."]
+
+# sanitizing an entry's content
+entry.sanitized.title # => returns the title with harmful stuff escaped
+entry.sanitized.author # => returns the author with harmful stuff escaped
+entry.sanitized.content # => returns the content with harmful stuff escaped
+entry.sanitize! # => sanitizes the entry's title, author, and content in place (as in, it changes the value to clean versions)
+feed.sanitize_entries! # => sanitizes all entries in place
 
 # updating a single feed
 updated_feed = Feedzirra::Feed.update(feed)
@@ -61,7 +92,7 @@ updated_feed.new_entries # a collection of the entry objects that are newer tha
 
 # fetching multiple feeds
 feed_urls = ["http://feeds.feedburner.com/PaulDixExplainsNothing", "http://feeds.feedburner.com/trottercashion"]
-feeds = Feedzirra::Feed.fetch_and_parse(
+feeds = Feedzirra::Feed.fetch_and_parse(feed_urls)
 
 # feeds is now a hash with the feed_urls as keys and the parsed feed objects as values. If an error was thrown
 # there will be a Fixnum of the http response code instead of a feed object
@@ -106,11 +137,17 @@ h2. Next Steps
 
 This thing needs to hammer on many different feeds in the wild. I'm sure there will be bugs. I want to find them and crush them. I didn't bother using the test suite for feedparser. I wanted to start fresh.
 
-Here are some more specific
-*
-*
+Here are some more specific TODOs.
+* Make a feedzirra-rails gem to integrate feedzirra seamlessly with Rails and ActiveRecord.
+* Add function to sanitize content.
+* Add support to automatically handle gzip and deflate encoding.
+* Add support for authenticated feeds.
 * Create a super sweet DSL for defining new parsers.
+* Test against Ruby 1.9.1 and fix any bugs.
 * I'm not keeping track of modified on entries. Should I add this?
+* Clean up the fetching code inside feed.rb so it doesn't suck so hard.
+* Make the feed_spec actually mock stuff out so it doesn't hit the net.
+* Readdress how feeds determine if they can parse a document. Maybe I should use namespaces instead?
 
 h2. LICENSE
 
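The usage notes in the README state that fetch_and_parse returns a plain result for a single url and a hash keyed by url for an array. A runnable sketch of just that return convention, with a fake fetcher standing in for the libcurl calls (the body string and method body here are illustrative assumptions, not the gem's implementation):

```ruby
# Sketch of the return convention used by Feed.fetch_raw and
# Feed.fetch_and_parse: a single url string yields a single value,
# an array of urls yields a hash keyed by url.
def fetch_raw(urls)
  url_queue = [*urls]   # [*"a"] => ["a"]; [*["a", "b"]] => ["a", "b"]
  responses = {}
  # Stand-in for the real libcurl fetch, so this runs offline.
  url_queue.each { |url| responses[url] = "<feed from #{url}>" }
  urls.is_a?(String) ? responses.values.first : responses
end

fetch_raw("http://example.com/a")    # => "<feed from http://example.com/a>"
fetch_raw(["http://example.com/a"])  # => a Hash with one key
```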
data/lib/feedzirra/atom_entry.rb
CHANGED
@@ -3,11 +3,12 @@ module Feedzirra
     include SAXMachine
     include FeedEntryUtilities
     element :title
-    element :link, :as => :url, :value => :href, :with => {:type => "text/html"}
+    element :link, :as => :url, :value => :href, :with => {:type => "text/html", :rel => "alternate"}
     element :name, :as => :author
     element :content
     element :summary
     element :published
     element :created, :as => :published
+    elements :category, :as => :categories, :value => :term
   end
 end

data/lib/feedzirra/atom_feed_burner_entry.rb
CHANGED
@@ -4,9 +4,11 @@ module Feedzirra
     include FeedEntryUtilities
     element :title
     element :name, :as => :author
+    element :link, :as => :url, :value => :href, :with => {:type => "text/html", :rel => "alternate"}
     element :"feedburner:origLink", :as => :url
     element :summary
     element :content
     element :published
+    elements :category, :as => :categories, :value => :term
   end
 end
data/lib/feedzirra/feed.rb
CHANGED
@@ -13,7 +13,7 @@ module Feedzirra
   end
 
   def self.determine_feed_parser_for_xml(xml)
-    start_of_doc = xml.slice(0,
+    start_of_doc = xml.slice(0, 1000)
     feed_classes.detect {|klass| klass.able_to_parse?(start_of_doc)}
   end
 
@@ -22,24 +22,25 @@ module Feedzirra
   end
 
   def self.feed_classes
-    @feed_classes ||= [RSS,
+    @feed_classes ||= [RSS, AtomFeedBurner, Atom]
   end
 
   # can take a single url or an array of urls
   # when passed a single url it returns the body of the response
   # when passed an array of urls it returns a hash with the urls as keys and body of responses as values
   def self.fetch_raw(urls, options = {})
-
+    url_queue = [*urls]
     multi = Curl::Multi.new
     responses = {}
-
+    url_queue.each do |url|
       easy = Curl::Easy.new(url) do |curl|
         curl.headers["User-Agent"] = (options[:user_agent] || USER_AGENT)
         curl.headers["If-Modified-Since"] = options[:if_modified_since].httpdate if options.has_key?(:if_modified_since)
         curl.headers["If-None-Match"] = options[:if_none_match] if options.has_key?(:if_none_match)
+        curl.headers["Accept-encoding"] = 'gzip, deflate'
         curl.follow_location = true
         curl.on_success do |c|
-          responses[url] = c
+          responses[url] = decode_content(c)
         end
         curl.on_failure do |c|
           responses[url] = c.response_code
@@ -49,7 +50,7 @@ module Feedzirra
     end
 
     multi.perform
-    return
+    return urls.is_a?(String) ? responses.values.first : responses
   end
 
   def self.fetch_and_parse(urls, options = {})
@@ -64,7 +65,21 @@ module Feedzirra
     end
 
     multi.perform
-    return
+    return urls.is_a?(String) ? responses.values.first : responses
+  end
+
+  def self.decode_content(c)
+    if c.header_str.match(/Content-Encoding: gzip/)
+      gz = Zlib::GzipReader.new(StringIO.new(c.body_str))
+      xml = gz.read
+      gz.close
+    elsif c.header_str.match(/Content-Encoding: deflate/)
+      xml = Zlib::Deflate.inflate(c.body_str)
+    else
+      xml = c.body_str
+    end
+
+    xml
   end
 
   def self.update(feeds, options = {})
@@ -84,10 +99,11 @@ module Feedzirra
       curl.headers["User-Agent"] = (options[:user_agent] || USER_AGENT)
       curl.headers["If-Modified-Since"] = options[:if_modified_since].httpdate if options.has_key?(:if_modified_since)
      curl.headers["If-None-Match"] = options[:if_none_match] if options.has_key?(:if_none_match)
+     curl.headers["Accept-encoding"] = 'gzip, deflate'
      curl.follow_location = true
      curl.on_success do |c|
        add_url_to_multi(multi, url_queue.shift, url_queue, responses, options) unless url_queue.empty?
-       xml = c
+       xml = decode_content(c)
        klass = determine_feed_parser_for_xml(xml)
        if klass
          feed = klass.parse(xml)
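The decode_content helper added in this release can be exercised with the Ruby standard library alone. A minimal sketch, with a plain header string and body standing in for the Curl::Easy object; note that the stdlib decompressor for zlib-deflated data is Zlib::Inflate.inflate, which is what this sketch uses:

```ruby
require 'zlib'
require 'stringio'

# Standalone sketch of the decode_content logic: pick a decompressor
# based on the Content-Encoding response header. A header string and
# body string stand in for the Curl::Easy response object.
def decode_body(header_str, body_str)
  if header_str =~ /Content-Encoding: gzip/
    gz = Zlib::GzipReader.new(StringIO.new(body_str))
    xml = gz.read
    gz.close
    xml
  elsif header_str =~ /Content-Encoding: deflate/
    # Zlib::Inflate.inflate is the stdlib decompressor for deflate data.
    Zlib::Inflate.inflate(body_str)
  else
    body_str
  end
end

xml = "<rss><title>hello</title></rss>"
io = StringIO.new
gz = Zlib::GzipWriter.new(io)
gz.write(xml)
gz.close
decode_body("Content-Encoding: gzip\r\n", io.string)                       # => the original xml
decode_body("Content-Encoding: deflate\r\n", Zlib::Deflate.deflate(xml))  # => the original xml
```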
@@ -10,6 +10,25 @@ module Feedzirra
|
|
10
10
|
@published = parse_datetime(val)
|
11
11
|
end
|
12
12
|
|
13
|
+
def sanitized
|
14
|
+
dispatcher = Class.new do
|
15
|
+
def initialize(entry)
|
16
|
+
@entry = entry
|
17
|
+
end
|
18
|
+
|
19
|
+
def method_missing(method, *args)
|
20
|
+
Dryopteris.sanitize(@entry.send(method))
|
21
|
+
end
|
22
|
+
end
|
23
|
+
dispatcher.new(self)
|
24
|
+
end
|
25
|
+
|
26
|
+
def sanitize!
|
27
|
+
self.title = sanitized.title
|
28
|
+
self.author = sanitized.author
|
29
|
+
self.content = sanitized.content
|
30
|
+
end
|
31
|
+
|
13
32
|
alias_method :last_modified, :published
|
14
33
|
end
|
15
34
|
end
|
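The new `sanitized` method builds an anonymous proxy class whose method_missing forwards any reader to the entry and scrubs the result with Dryopteris. Here is a self-contained sketch of the same dispatcher pattern; CGI.escapeHTML stands in for Dryopteris.sanitize (an assumption, so the example runs on the stdlib alone), and Entry is a minimal stand-in for a parsed feed entry:

```ruby
require 'cgi'

# Minimal stand-in for a parsed feed entry.
class Entry
  attr_accessor :title, :author

  # Returns a proxy whose method_missing forwards any reader call to
  # the entry and sanitizes the result before returning it.
  def sanitized
    dispatcher = Class.new do
      def initialize(entry)
        @entry = entry
      end

      def method_missing(method, *args)
        # CGI.escapeHTML stands in for Dryopteris.sanitize here.
        CGI.escapeHTML(@entry.send(method).to_s)
      end
    end
    dispatcher.new(self)
  end
end

entry = Entry.new
entry.title = "<script>alert(1)</script>Hello"
puts entry.sanitized.title  # prints &lt;script&gt;alert(1)&lt;/script&gt;Hello
```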
data/lib/feedzirra/feed_utilities.rb
CHANGED
@@ -26,7 +26,7 @@ module Feedzirra
 
   def update_from_feed(feed)
     self.new_entries += find_new_entries_for(feed)
-    self.entries
+    self.entries.unshift(*self.new_entries)
 
     updated! if UPDATABLE_ATTRIBUTES.any? { |name| update_attribute(feed, name) }
   end
@@ -39,6 +39,10 @@ module Feedzirra
     end
   end
 
+  def sanitize_entries!
+    entries.each {|entry| entry.sanitize!}
+  end
+
   private
 
   def updated!
@@ -46,7 +50,18 @@ module Feedzirra
   end
 
   def find_new_entries_for(feed)
-
+    # this implementation is a hack, which is why it's so ugly.
+    # it's to get around the fact that not all feeds have a published date.
+    # however, they're always ordered with the newest one first.
+    # So we go through the entries just parsed and insert each one as a new entry
+    # until we get to one that has the same url as the newest for the feed
+    latest_entry = self.entries.first
+    found_new_entries = []
+    feed.entries.each do |entry|
+      break if entry.url == latest_entry.url
+      found_new_entries << entry
+    end
+    found_new_entries
   end
 
   def existing_entry?(test_entry)
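The find_new_entries_for strategy shown in the diff walks the freshly parsed entries (newest first) and collects them until it reaches the url of the newest entry already held. A standalone sketch of that walk, with a simple Struct standing in for parsed entries (and, like the original, assuming at least one existing entry):

```ruby
# Minimal stand-in for a parsed feed entry.
Entry = Struct.new(:url)

# Collect fetched entries, newest first, until we reach the url of the
# newest entry we already have -- everything before it is new.
def find_new_entries(existing_entries, fetched_entries)
  latest_entry = existing_entries.first
  found_new_entries = []
  fetched_entries.each do |entry|
    break if entry.url == latest_entry.url
    found_new_entries << entry
  end
  found_new_entries
end

existing = [Entry.new("/b"), Entry.new("/a")]   # newest first, as feeds are ordered
fetched  = [Entry.new("/d"), Entry.new("/c"), Entry.new("/b"), Entry.new("/a")]
find_new_entries(existing, fetched).map(&:url)  # => ["/d", "/c"]
```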
data/lib/feedzirra/rss.rb
CHANGED
data/lib/feedzirra/rss_entry.rb
CHANGED
@@ -4,9 +4,13 @@ module Feedzirra
     include FeedEntryUtilities
     element :title
     element :link, :as => :url
+
     element :"dc:creator", :as => :author
     element :"content:encoded", :as => :content
     element :description, :as => :summary
+
     element :pubDate, :as => :published
+    element :"dc:date", :as => :published
+    elements :category, :as => :categories
   end
 end
data/lib/feedzirra.rb
CHANGED
@@ -2,8 +2,10 @@ $LOAD_PATH.unshift(File.dirname(__FILE__)) unless $LOAD_PATH.include?(File.dirna
 
 gem 'activesupport'
 
+require 'zlib'
 require 'curb'
 require 'sax-machine'
+require 'dryopteris'
 require 'active_support/basic_object'
 require 'active_support/core_ext/object'
 require 'active_support/core_ext/time'
@@ -25,5 +27,5 @@ require 'feedzirra/atom'
 require 'feedzirra/atom_feed_burner'
 
 module Feedzirra
-  VERSION = "0.0.
+  VERSION = "0.0.2"
 end
data/spec/feedzirra/atom_feed_burner_entry_spec.rb
CHANGED
@@ -11,6 +11,11 @@ describe Feedzirra::AtomFeedBurnerEntry do
     @entry.title.should == "Making a Ruby C library even faster"
   end
 
+  it "should be able to fetch a url via the 'alternate' rel if no origLink exists" do
+    entry = Feedzirra::AtomFeedBurner.parse(File.read("#{File.dirname(__FILE__)}/../sample_feeds/PaulDixExplainsNothingAlternate.xml")).entries.first
+    entry.url.should == 'http://feeds.feedburner.com/~r/PaulDixExplainsNothing/~3/519925023/making-a-ruby-c-library-even-faster.html'
+  end
+
   it "should parse the url" do
     @entry.url.should == "http://www.pauldix.net/2009/01/making-a-ruby-c-library-even-faster.html"
   end
@@ -30,4 +35,8 @@ describe Feedzirra::AtomFeedBurnerEntry do
   it "should parse the published date" do
     @entry.published.to_s.should == "Thu Jan 22 15:50:22 UTC 2009"
   end
+
+  it "should parse the categories" do
+    @entry.categories.should == ['Ruby', 'Another Category']
+  end
 end
data/spec/feedzirra/feed_entry_utilities_spec.rb
CHANGED
@@ -14,4 +14,32 @@ describe Feedzirra::FeedUtilities do
     time.to_s.should == "Wed Feb 20 18:05:00 UTC 2008"
    end
  end
+
+  describe "sanitizing" do
+    before(:each) do
+      @feed = Feedzirra::Feed.parse(sample_atom_feed)
+      @entry = @feed.entries.first
+    end
+
+    it "should provide a sanitized title" do
+      new_title = "<script>" + @entry.title
+      @entry.title = new_title
+      @entry.sanitized.title.should == Dryopteris.sanitize(new_title)
+    end
+
+    it "should sanitize things in place" do
+      @entry.title += "<script>"
+      @entry.author += "<script>"
+      @entry.content += "<script>"
+
+      cleaned_title = Dryopteris.sanitize(@entry.title)
+      cleaned_author = Dryopteris.sanitize(@entry.author)
+      cleaned_content = Dryopteris.sanitize(@entry.content)
+
+      @entry.sanitize!
+      @entry.title.should == cleaned_title
+      @entry.author.should == cleaned_author
+      @entry.content.should == cleaned_content
+    end
+  end
 end
data/spec/feedzirra/feed_spec.rb
CHANGED
@@ -5,29 +5,29 @@ describe Feedzirra::Feed do
   context "when there's an available parser" do
     it "should parse an rdf feed" do
       feed = Feedzirra::Feed.parse(sample_rdf_feed)
-      feed.class.should == Feedzirra::RDF
       feed.title.should == "HREF Considered Harmful"
+      feed.entries.first.published.to_s.should == "Tue Sep 02 19:50:07 UTC 2008"
       feed.entries.size.should == 10
     end
 
     it "should parse an rss feed" do
       feed = Feedzirra::Feed.parse(sample_rss_feed)
-      feed.class.should == Feedzirra::RSS
       feed.title.should == "Tender Lovemaking"
+      feed.entries.first.published.to_s.should == "Thu Dec 04 17:17:49 UTC 2008"
       feed.entries.size.should == 10
     end
 
     it "should parse an atom feed" do
       feed = Feedzirra::Feed.parse(sample_atom_feed)
-      feed.class.should == Feedzirra::Atom
       feed.title.should == "Amazon Web Services Blog"
+      feed.entries.first.published.to_s.should == "Fri Jan 16 18:21:00 UTC 2009"
       feed.entries.size.should == 10
     end
 
     it "should parse a feedburner atom feed" do
       feed = Feedzirra::Feed.parse(sample_feedburner_atom_feed)
-      feed.class.should == Feedzirra::AtomFeedBurner
       feed.title.should == "Paul Dix Explains Nothing"
+      feed.entries.first.published.to_s.should == "Thu Jan 22 15:50:22 UTC 2009"
       feed.entries.size.should == 5
     end
   end
@@ -42,8 +42,8 @@ describe Feedzirra::Feed do
 
   it "should parse a feedburner rss feed" do
     feed = Feedzirra::Feed.parse(sample_rss_feed_burner_feed)
-    feed.class.should == Feedzirra::RDF
     feed.title.should == "Sam Harris: Author, Philosopher, Essayist, Atheist"
+    feed.entries.first.published.to_s.should == "Tue Jan 13 17:20:28 UTC 2009"
     feed.entries.size.should == 10
   end
 end
@@ -57,12 +57,12 @@ describe Feedzirra::Feed do
     Feedzirra::Feed.determine_feed_parser_for_xml(sample_feedburner_atom_feed).should == Feedzirra::AtomFeedBurner
   end
 
-  it "should return the Feedzirra::
-    Feedzirra::Feed.determine_feed_parser_for_xml(sample_rdf_feed).should == Feedzirra::
+  it "should return the Feedzirra::RSS class for an rdf/rss 1.0 feed" do
+    Feedzirra::Feed.determine_feed_parser_for_xml(sample_rdf_feed).should == Feedzirra::RSS
   end
 
-  it "should return the Feedzirra::
-    Feedzirra::Feed.determine_feed_parser_for_xml(sample_rss_feed_burner_feed).should == Feedzirra::
+  it "should return the Feedzirra::RSS class for an rss feedburner feed" do
+    Feedzirra::Feed.determine_feed_parser_for_xml(sample_rss_feed_burner_feed).should == Feedzirra::RSS
   end
 
   it "should return the Feedzirra::RSS object for an rss 2.0 feed" do
@@ -113,7 +113,7 @@ describe Feedzirra::Feed do
   describe "fetching feeds" do
     before(:each) do
       @paul_feed_url = "http://feeds.feedburner.com/PaulDixExplainsNothing"
-      @trotter_feed_url = "http://
+      @trotter_feed_url = "http://feeds2.feedburner.com/trottercashion"
     end
 
     describe "handling many feeds" do
@@ -139,6 +139,11 @@ describe Feedzirra::Feed do
       results[@paul_feed_url].should =~ /Paul Dix/
       results[@trotter_feed_url].should =~ /Trotter Cashion/
     end
+
+    it "should always return a hash when passed an array" do
+      results = Feedzirra::Feed.fetch_raw([@paul_feed_url])
+      results.class.should == Hash
+    end
   end
 
   describe "#fetch_and_parse" do
@@ -169,6 +174,11 @@ describe Feedzirra::Feed do
       feeds[@trotter_feed_url].feed_url.should == @trotter_feed_url
     end
 
+    it "should always return a hash when passed an array" do
+      feeds = Feedzirra::Feed.fetch_and_parse([@paul_feed_url])
+      feeds.class.should == Hash
+    end
+
     it "should yield the url and feed object to a :on_success lambda" do
       successful_call_mock = mock("successful_call_mock")
       successful_call_mock.should_receive(:call)
data/spec/feedzirra/feed_utilities_spec.rb
CHANGED
@@ -125,8 +125,8 @@ describe Feedzirra::FeedUtilities do
     @new_entry.url = "http://pauldix.net/new.html"
     @new_entry.published = (Time.now + 10).to_s
     @feed.entries << @old_entry
-    @updated_feed.entries << @old_entry
     @updated_feed.entries << @new_entry
+    @updated_feed.entries << @old_entry
   end
 
   it "should update last-modified from the latest entry date" do
data/spec/feedzirra/rss_entry_spec.rb
CHANGED
@@ -30,4 +30,8 @@ describe Feedzirra::RSSEntry do
   it "should parse the published date" do
     @entry.published.to_s.should == "Thu Dec 04 17:17:49 UTC 2008"
   end
+
+  it "should parse the categories" do
+    @entry.categories.should == ['computadora', 'nokogiri', 'rails']
+  end
 end
data/spec/feedzirra/rss_spec.rb
CHANGED
@@ -5,10 +5,11 @@ describe Feedzirra::RSS do
   it "should return true for an RSS feed" do
     Feedzirra::RSS.should be_able_to_parse(sample_rss_feed)
   end
-
-
-
-
+
+  # this is no longer true. combined rdf and rss into one
+  # it "should return false for an rdf feed" do
+  #   Feedzirra::RSS.should_not be_able_to_parse(sample_rdf_feed)
+  # end
 
   it "should return false for an atom feed" do
     Feedzirra::RSS.should_not be_able_to_parse(sample_atom_feed)
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: pauldix-feedzirra
 version: !ruby/object:Gem::Version
-  version: 0.0.
+  version: 0.0.2
 platform: ruby
 authors:
 - Paul Dix
@@ -9,11 +9,12 @@ autorequire:
 bindir: bin
 cert_chain: []
 
-date: 2009-
+date: 2009-02-19 00:00:00 -08:00
 default_executable:
 dependencies:
 - !ruby/object:Gem::Dependency
   name: nokogiri
+  type: :runtime
   version_requirement:
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
@@ -23,6 +24,7 @@ dependencies:
     version:
 - !ruby/object:Gem::Dependency
   name: pauldix-sax-machine
+  type: :runtime
   version_requirement:
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
@@ -32,6 +34,7 @@ dependencies:
     version:
 - !ruby/object:Gem::Dependency
   name: taf2-curb
+  type: :runtime
   version_requirement:
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
@@ -39,8 +42,19 @@ dependencies:
   - !ruby/object:Gem::Version
     version: 0.2.3
   version:
+- !ruby/object:Gem::Dependency
+  name: builder
+  type: :runtime
+  version_requirement:
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+    - !ruby/object:Gem::Version
+      version: 2.1.2
+  version:
 - !ruby/object:Gem::Dependency
   name: activesupport
+  type: :runtime
   version_requirement:
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
@@ -48,6 +62,16 @@ dependencies:
   - !ruby/object:Gem::Version
     version: 2.0.0
   version:
+- !ruby/object:Gem::Dependency
+  name: mdalessio-dryopteris
+  type: :runtime
+  version_requirement:
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+    - !ruby/object:Gem::Version
+      version: 0.0.0
+  version:
 description:
 email: paul@pauldix.net
 executables: []