pauldix-feedzirra 0.0.1

data/README.textile ADDED
@@ -0,0 +1,140 @@
+ h1. Feedzirra
+
+ "http://github.com/pauldix/feedzirra/tree/master":http://github.com/pauldix/feedzirra/tree/master
+ "group discussion":http://groups.google.com/group/feedzirra
+
+ h2. Summary
+
+ A feed fetching and parsing library that treats the internet like Godzilla treats Japan: it dominates and eats all.
+
+ h2. Description
+
+ Feedzirra is a feed library designed to fetch and update many feeds as quickly as possible. This includes using libcurl-multi through the "taf2-curb":http://github.com/taf2/curb/tree/master gem for faster HTTP gets, and libxml through "nokogiri":http://github.com/tenderlove/nokogiri/tree/master and "sax-machine":http://github.com/pauldix/sax-machine/tree/master for faster parsing.
+
+ Once you have fetched feeds using Feedzirra, they can be updated using the feed objects. Feedzirra automatically records the etag and last-modified information from the HTTP response headers and sends them back on update requests to lower bandwidth usage, eliminate unnecessary parsing, and make things speedier in general.
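The conditional-GET bookkeeping is just a matter of reading two response headers. A standalone sketch (the header string below is a made-up sample for illustration; the regexes mirror the ones Feedzirra uses internally in `etag_from_header` and `last_modified_from_header`):

```ruby
require 'time'

# Pull the etag and last-modified values out of a raw HTTP response
# header string. The header below is a made-up sample.
header = "HTTP/1.1 200 OK\r\n" \
         "ETag: \"GunxqnEP4NeYhrqq9TyVKTuDnh0\"\r\n" \
         "Last-Modified: Sat, 31 Jan 2009 17:58:16 GMT\r\n\r\n"

etag          = header[/ETag:\s(.*)\r/, 1]
last_modified = Time.parse(header[/Last-Modified:\s(.*)\r/, 1])

# On the next fetch these go back out as If-None-Match and If-Modified-Since,
# so an unchanged feed costs a 304 instead of a full download and parse.
```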
+
+ The fetching and parsing logic have been decoupled so that either can be used in isolation if you'd prefer not to use everything Feedzirra offers. However, the code examples below use helper methods in the Feed class that put everything together to make things as simple as possible.
+
+ The final feature of Feedzirra is the ability to define custom parsing classes. In truth, Feedzirra could be used to parse much more than feeds. Microformats, page scraping, and almost anything else are fair game.
+
+ h2. Installation
+
+ For now Feedzirra exists only on github. It also has a few gem requirements that are only on github. Before you start you need to have "libcurl":http://curl.haxx.se/ and "libxml":http://xmlsoft.org/ installed. If you're on Leopard you have both. Otherwise, you'll need to grab them. Once you've got those libraries, these are the gems you'll need.
+ <pre>
+ gem install nokogiri
+ gem sources -a http://gems.github.com # if you haven't already
+ gem install pauldix-sax-machine
+ gem install taf2-curb
+ gem install pauldix-feedzirra
+ </pre>
+
+ h2. Usage
+
+ "A gist of the following code":http://gist.github.com/57285
+ <pre>
+ require 'feedzirra'
+
+ # fetching a single feed
+ feed = Feedzirra::Feed.fetch_and_parse("http://feeds.feedburner.com/PaulDixExplainsNothing")
+
+ # feed and entries accessors
+ feed.title # => "Paul Dix Explains Nothing"
+ feed.url # => "http://www.pauldix.net"
+ feed.feed_url # => "http://feeds.feedburner.com/PaulDixExplainsNothing"
+ feed.etag # => "GunxqnEP4NeYhrqq9TyVKTuDnh0"
+ feed.last_modified # => Sat Jan 31 17:58:16 -0500 2009 # it's a Time object
+
+ entry = feed.entries.first
+ entry.title # => "Ruby Http Client Library Performance"
+ entry.url # => "http://www.pauldix.net/2009/01/ruby-http-client-library-performance.html"
+ entry.author # => "Paul Dix"
+ entry.summary # => "..."
+ entry.content # => "..."
+ entry.published # => Thu Jan 29 17:00:19 UTC 2009 # it's a Time object
+
+ # updating a single feed
+ updated_feed = Feedzirra::Feed.update(feed)
+
+ # an updated feed has the following extra accessors
+ updated_feed.updated? # returns true if any of the feed attributes have been modified. returns false if there are only new entries
+ updated_feed.new_entries # a collection of the entry objects that are newer than the latest in the feed before the update
+
+ # fetching multiple feeds
+ feed_urls = ["http://feeds.feedburner.com/PaulDixExplainsNothing", "http://feeds.feedburner.com/trottercashion"]
+ feeds = Feedzirra::Feed.fetch_and_parse(feed_urls)
+
+ # feeds is now a hash with the feed_urls as keys and the parsed feed objects as values. If an error occurred,
+ # the value will be a Fixnum of the http response code instead of a feed object
+
+ # updating multiple feeds. it expects a collection of feed objects
+ updated_feeds = Feedzirra::Feed.update(feeds.values)
+
+ # defining custom behavior on failure or success. note that a return status of 304 (not modified) will call the on_success handler
+ feed = Feedzirra::Feed.fetch_and_parse("http://feeds.feedburner.com/PaulDixExplainsNothing",
+   :on_success => lambda {|url, feed| puts feed.title },
+   :on_failure => lambda {|url, response_code, response_header, response_body| puts response_body })
+ # if a collection was passed into fetch_and_parse, the handlers will be called for each one
+
+ # the behavior for the handlers when using Feedzirra::Feed.update is slightly different. The feed passed into on_success will be
+ # the updated feed with the standard updated accessors. on failure it will be the original feed object passed into update
+
+ # Defining custom parsers
+ # TODO: the functionality is here, just write some good examples that show how to do this
+ </pre>
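The custom-parser hook works through `Feedzirra::Feed.add_feed_class`, which registers any class answering `able_to_parse?` and `parse`. A minimal standalone sketch of that dispatch, with a hypothetical `CustomParser` and a made-up format (a real parser would include SAXMachine and FeedUtilities like the built-in ones):

```ruby
# A hypothetical custom parser. Feedzirra's dispatch only requires a class
# that answers able_to_parse?(xml) and parse(xml).
class CustomParser
  def self.able_to_parse?(xml)
    !!(xml =~ /my-custom-format/)
  end

  def self.parse(xml)
    { :title => xml[/<title>(.*?)<\/title>/m, 1] }
  end
end

# Mimics Feed.determine_feed_parser_for_xml: check the first 500 bytes of the
# document against each registered class and use the first one that claims it.
feed_classes = [CustomParser]
xml    = "<my-custom-format><title>Hello</title></my-custom-format>"
parser = feed_classes.detect { |klass| klass.able_to_parse?(xml.slice(0, 500)) }
parser.parse(xml) # => {:title=>"Hello"}
```

With the gem loaded you would register such a class via `Feedzirra::Feed.add_feed_class(CustomParser)`, which puts it at the front of the list of parsers to try.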
+
+ h2. Benchmarks
+
+ One of the goals of Feedzirra is speed. This includes not only parsing, but fetching multiple feeds as quickly as possible. I ran a benchmark getting 20 feeds 10 times using Feedzirra, rFeedParser, and FeedNormalizer. For more details the "benchmark code can be found in the project in spec/benchmarks/feedzirra_benchmarks.rb":http://github.com/pauldix/feedzirra/blob/7fb5634c5c16e9c6ec971767b462c6518cd55f5d/spec/benchmarks/feedzirra_benchmarks.rb
+ <pre>
+ feedzirra         5.170000   1.290000   6.460000 ( 18.917796)
+ rfeedparser     104.260000  12.220000 116.480000 (244.799063)
+ feed-normalizer  66.250000   4.010000  70.260000 (191.589862)
+ </pre>
+ The result of that benchmark is a bit sketchy because of network variability. Running 10 times against the same 20 feeds was meant to smooth some of that out. However, there is also a "benchmark comparing parsing speed in spec/benchmarks/parsing_benchmark.rb":http://github.com/pauldix/feedzirra/blob/7fb5634c5c16e9c6ec971767b462c6518cd55f5d/spec/benchmarks/parsing_benchmark.rb on an atom feed.
+ <pre>
+ feedzirra        0.500000  0.030000  0.530000 (  0.658744)
+ rfeedparser      8.400000  1.110000  9.510000 ( 11.839827)
+ feed-normalizer  5.980000  0.160000  6.140000 (  7.576140)
+ </pre>
+ There's also a "benchmark that shows the results of using Feedzirra to perform updates on feeds":http://github.com/pauldix/feedzirra/blob/45d64319544c61a4c9eb9f7f825c73b9f9030cb3/spec/benchmarks/updating_benchmarks.rb you've already pulled in. I tested against 179 feeds. The first run is the initial pull and the second is an update 65 seconds later. I'm not sure how many of them support etag and last-modified, so performance may be better or worse depending on what feeds you're requesting.
+ <pre>
+ feedzirra fetch and parse  4.010000  0.710000  4.720000 ( 15.110101)
+ feedzirra update           0.660000  0.280000  0.940000 (  5.152709)
+ </pre>
+
+ h2. Next Steps
+
+ This thing needs to hammer on many different feeds in the wild. I'm sure there will be bugs. I want to find them and crush them. I didn't bother using the test suite from feedparser; I wanted to start fresh.
+
+ Here are some more specific todos:
+ * Clean up the fetching code inside feed.rb so it doesn't suck so hard.
+ * Make the feed_spec actually mock stuff out so it doesn't hit the net.
+ * Create a super sweet DSL for defining new parsers.
+ * I'm not keeping track of modified on entries. Should I add this?
+
+ h2. LICENSE
+
+ (The MIT License)
+
+ Copyright (c) 2009:
+
+ "Paul Dix":http://pauldix.net
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ 'Software'), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice shall be
+ included in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+ IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+ CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+ TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+ SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/Rakefile ADDED
@@ -0,0 +1,14 @@
+ require "spec"
+ require "spec/rake/spectask"
+ require 'lib/feedzirra.rb'
+
+ Spec::Rake::SpecTask.new do |t|
+   t.spec_opts = ['--options', "\"#{File.dirname(__FILE__)}/spec/spec.opts\""]
+   t.spec_files = FileList['spec/**/*_spec.rb']
+ end
+
+ task :install do
+   rm_rf "*.gem"
+   puts `gem build feedzirra.gemspec`
+   puts `sudo gem install feedzirra-#{Feedzirra::VERSION}.gem`
+ end
data/lib/core_ext/date.rb ADDED
@@ -0,0 +1,21 @@
+ # Date code pulled from:
+ # Ruby Cookbook by Lucas Carlson and Leonard Richardson
+ # Published by O'Reilly
+ # ISBN: 0-596-52369-6
+ class Date
+   def feed_utils_to_gm_time
+     feed_utils_to_time(new_offset, :gm)
+   end
+
+   def feed_utils_to_local_time
+     feed_utils_to_time(new_offset(DateTime.now.offset - offset), :local)
+   end
+
+   private
+   def feed_utils_to_time(dest, method)
+     # Convert a fraction of a day to a number of microseconds
+     usec = (dest.sec_fraction * 60 * 60 * 24 * (10**6)).to_i
+     Time.send(method, dest.year, dest.month, dest.day, dest.hour, dest.min,
+               dest.sec, usec)
+   end
+ end
data/lib/feedzirra/atom.rb ADDED
@@ -0,0 +1,14 @@
+ module Feedzirra
+   class Atom
+     include SAXMachine
+     include FeedUtilities
+     element :title
+     element :link, :as => :url, :value => :href, :with => {:type => "text/html"}
+     element :link, :as => :feed_url, :value => :href, :with => {:type => "application/atom+xml"}
+     elements :entry, :as => :entries, :class => AtomEntry
+
+     def self.able_to_parse?(xml)
+       xml =~ /(Atom)|(#{Regexp.escape("http://purl.org/atom")})/
+     end
+   end
+ end
data/lib/feedzirra/atom_entry.rb ADDED
@@ -0,0 +1,13 @@
+ module Feedzirra
+   class AtomEntry
+     include SAXMachine
+     include FeedEntryUtilities
+     element :title
+     element :link, :as => :url, :value => :href, :with => {:type => "text/html"}
+     element :name, :as => :author
+     element :content
+     element :summary
+     element :published
+     element :created, :as => :published
+   end
+ end
data/lib/feedzirra/atom_feed_burner.rb ADDED
@@ -0,0 +1,14 @@
+ module Feedzirra
+   class AtomFeedBurner
+     include SAXMachine
+     include FeedUtilities
+     element :title
+     element :link, :as => :url, :value => :href, :with => {:type => "text/html"}
+     element :link, :as => :feed_url, :value => :href, :with => {:type => "application/atom+xml"}
+     elements :entry, :as => :entries, :class => AtomFeedBurnerEntry
+
+     def self.able_to_parse?(xml)
+       (xml =~ /Atom/ && xml =~ /feedburner/) || false
+     end
+   end
+ end
data/lib/feedzirra/atom_feed_burner_entry.rb ADDED
@@ -0,0 +1,12 @@
+ module Feedzirra
+   class AtomFeedBurnerEntry
+     include SAXMachine
+     include FeedEntryUtilities
+     element :title
+     element :name, :as => :author
+     element :"feedburner:origLink", :as => :url
+     element :summary
+     element :content
+     element :published
+   end
+ end
data/lib/feedzirra/feed.rb ADDED
@@ -0,0 +1,157 @@
+ module Feedzirra
+   class NoParserAvailable < StandardError; end
+
+   class Feed
+     USER_AGENT = "feedzirra http://github.com/pauldix/feedzirra/tree/master"
+
+     def self.parse(xml)
+       if parser = determine_feed_parser_for_xml(xml)
+         parser.parse(xml)
+       else
+         raise NoParserAvailable.new("no valid parser for content.")
+       end
+     end
+
+     def self.determine_feed_parser_for_xml(xml)
+       start_of_doc = xml.slice(0, 500)
+       feed_classes.detect {|klass| klass.able_to_parse?(start_of_doc)}
+     end
+
+     def self.add_feed_class(klass)
+       feed_classes.unshift klass
+     end
+
+     def self.feed_classes
+       @feed_classes ||= [RSS, RDF, AtomFeedBurner, Atom]
+     end
+
+     # can take a single url or an array of urls
+     # when passed a single url it returns the body of the response
+     # when passed an array of urls it returns a hash with the urls as keys and the bodies of the responses as values
+     def self.fetch_raw(urls, options = {})
+       urls = [*urls]
+       multi = Curl::Multi.new
+       responses = {}
+       urls.each do |url|
+         easy = Curl::Easy.new(url) do |curl|
+           curl.headers["User-Agent"] = (options[:user_agent] || USER_AGENT)
+           curl.headers["If-Modified-Since"] = options[:if_modified_since].httpdate if options.has_key?(:if_modified_since)
+           curl.headers["If-None-Match"] = options[:if_none_match] if options.has_key?(:if_none_match)
+           curl.follow_location = true
+           curl.on_success do |c|
+             responses[url] = c.body_str
+           end
+           curl.on_failure do |c|
+             responses[url] = c.response_code
+           end
+         end
+         multi.add(easy)
+       end
+
+       multi.perform
+       return responses.size == 1 ? responses.values.first : responses
+     end
+
+     def self.fetch_and_parse(urls, options = {})
+       url_queue = [*urls]
+       multi = Curl::Multi.new
+
+       # I broke these down so I would only try to do 30 simultaneously because
+       # I was getting weird errors when doing a lot. As one finishes it pops another off the queue.
+       responses = {}
+       url_queue.slice!(0, 30).each do |url|
+         add_url_to_multi(multi, url, url_queue, responses, options)
+       end
+
+       multi.perform
+       return responses.size == 1 ? responses.values.first : responses
+     end
+
+     def self.update(feeds, options = {})
+       feed_queue = [*feeds]
+       multi = Curl::Multi.new
+       responses = {}
+       feed_queue.slice!(0, 30).each do |feed|
+         add_feed_to_multi(multi, feed, feed_queue, responses, options)
+       end
+
+       multi.perform
+       return responses.size == 1 ? responses.values.first : responses.values
+     end
+
+     def self.add_url_to_multi(multi, url, url_queue, responses, options)
+       easy = Curl::Easy.new(url) do |curl|
+         curl.headers["User-Agent"] = (options[:user_agent] || USER_AGENT)
+         curl.headers["If-Modified-Since"] = options[:if_modified_since].httpdate if options.has_key?(:if_modified_since)
+         curl.headers["If-None-Match"] = options[:if_none_match] if options.has_key?(:if_none_match)
+         curl.follow_location = true
+         curl.on_success do |c|
+           add_url_to_multi(multi, url_queue.shift, url_queue, responses, options) unless url_queue.empty?
+           xml = c.body_str
+           klass = determine_feed_parser_for_xml(xml)
+           if klass
+             feed = klass.parse(xml)
+             feed.feed_url = c.last_effective_url
+             feed.etag = etag_from_header(c.header_str)
+             feed.last_modified = last_modified_from_header(c.header_str)
+             responses[url] = feed
+             options[:on_success].call(url, feed) if options.has_key?(:on_success)
+           else
+             puts "Error determining parser for #{url} - #{c.last_effective_url}"
+           end
+         end
+         curl.on_failure do |c|
+           add_url_to_multi(multi, url_queue.shift, url_queue, responses, options) unless url_queue.empty?
+           responses[url] = c.response_code
+           options[:on_failure].call(url, c.response_code, c.header_str, c.body_str) if options.has_key?(:on_failure)
+         end
+       end
+       multi.add(easy)
+     end
+
+     def self.add_feed_to_multi(multi, feed, feed_queue, responses, options)
+       easy = Curl::Easy.new(feed.feed_url) do |curl|
+         curl.headers["User-Agent"] = (options[:user_agent] || USER_AGENT)
+         curl.headers["If-Modified-Since"] = feed.last_modified.httpdate if feed.last_modified
+         curl.headers["If-None-Match"] = feed.etag if feed.etag
+         curl.follow_location = true
+         curl.on_success do |c|
+           add_feed_to_multi(multi, feed_queue.shift, feed_queue, responses, options) unless feed_queue.empty?
+           updated_feed = Feed.parse(c.body_str)
+           updated_feed.feed_url = c.last_effective_url
+           updated_feed.etag = etag_from_header(c.header_str)
+           updated_feed.last_modified = last_modified_from_header(c.header_str)
+           feed.update_from_feed(updated_feed)
+           responses[feed.feed_url] = feed
+           options[:on_success].call(feed) if options.has_key?(:on_success)
+         end
+         curl.on_failure do |c|
+           add_feed_to_multi(multi, feed_queue.shift, feed_queue, responses, options) unless feed_queue.empty?
+           response_code = c.response_code
+           if response_code == 304 # it's not modified. this isn't an error condition
+             responses[feed.feed_url] = feed
+             options[:on_success].call(feed) if options.has_key?(:on_success)
+           else
+             responses[feed.feed_url] = response_code
+             options[:on_failure].call(feed, response_code, c.header_str, c.body_str) if options.has_key?(:on_failure)
+           end
+         end
+       end
+       multi.add(easy)
+     end
+
+     def self.etag_from_header(header)
+       header =~ /.*ETag:\s(.*)\r/
+       $1
+     end
+
+     def self.last_modified_from_header(header)
+       header =~ /.*Last-Modified:\s(.*)\r/
+       Time.parse($1) if $1
+     end
+   end
+ end
data/lib/feedzirra/feed_entry_utilities.rb ADDED
@@ -0,0 +1,15 @@
+ module Feedzirra
+   module FeedEntryUtilities
+     attr_reader :published
+
+     def parse_datetime(string)
+       DateTime.parse(string).feed_utils_to_gm_time
+     end
+
+     def published=(val)
+       @published = parse_datetime(val)
+     end
+
+     alias_method :last_modified, :published
+   end
+ end
data/lib/feedzirra/feed_utilities.rb ADDED
@@ -0,0 +1,56 @@
+ module Feedzirra
+   module FeedUtilities
+     UPDATABLE_ATTRIBUTES = %w(title feed_url url last_modified)
+
+     attr_writer :new_entries, :updated, :last_modified
+     attr_accessor :etag
+
+     def last_modified
+       @last_modified ||= begin
+         entry = entries.reject {|e| e.published.nil? }.sort_by {|e| e.published }.last
+         entry ? entry.published : nil
+       end
+     end
+
+     def updated?
+       @updated
+     end
+
+     def new_entries
+       @new_entries ||= []
+     end
+
+     def has_new_entries?
+       new_entries.size > 0
+     end
+
+     def update_from_feed(feed)
+       self.new_entries += find_new_entries_for(feed)
+       self.entries += self.new_entries
+
+       updated! if UPDATABLE_ATTRIBUTES.any? { |name| update_attribute(feed, name) }
+     end
+
+     def update_attribute(feed, name)
+       old_value, new_value = send(name), feed.send(name)
+
+       if old_value != new_value
+         send("#{name}=", new_value)
+       end
+     end
+
+     private
+
+     def updated!
+       @updated = true
+     end
+
+     def find_new_entries_for(feed)
+       feed.entries.reject { |entry| existing_entry?(entry) }
+     end
+
+     def existing_entry?(test_entry)
+       entries.any? { |entry| entry.url == test_entry.url }
+     end
+   end
+ end
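The update logic above treats an entry as new purely on URL identity (`existing_entry?` compares nothing else). A standalone sketch of that comparison, using `Struct` stand-ins (with made-up URLs) for parsed entry objects:

```ruby
# Mirrors find_new_entries_for / existing_entry?: an entry already exists
# if any current entry shares its url; everything else counts as new.
Entry = Struct.new(:url, :title)

old_entries = [Entry.new("http://example.com/one", "first post")]
fetched     = [Entry.new("http://example.com/one", "first post"),
               Entry.new("http://example.com/two", "second post")]

new_entries = fetched.reject { |e| old_entries.any? { |old| old.url == e.url } }
new_entries.map(&:title) # => ["second post"]
```

Note the consequence: an entry whose content changed but whose URL stayed the same is not treated as new, which is also why the README todo list asks whether modified entries should be tracked.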
data/lib/feedzirra/rdf.rb ADDED
@@ -0,0 +1,15 @@
+ module Feedzirra
+   class RDF
+     include SAXMachine
+     include FeedUtilities
+     element :title
+     element :link, :as => :url
+     elements :item, :as => :entries, :class => RDFEntry
+
+     attr_accessor :feed_url
+
+     def self.able_to_parse?(xml)
+       xml =~ /(rdf\:RDF)|(#{Regexp.escape("http://purl.org/rss/1.0")})|(rss version\=\"0\.9.?\")/ || false
+     end
+   end
+ end
data/lib/feedzirra/rdf_entry.rb ADDED
@@ -0,0 +1,12 @@
+ module Feedzirra
+   class RDFEntry
+     include SAXMachine
+     include FeedEntryUtilities
+     element :title
+     element :link, :as => :url
+     element :"dc:creator", :as => :author
+     element :"content:encoded", :as => :content
+     element :description, :as => :summary
+     element :"dc:date", :as => :published
+   end
+ end
data/lib/feedzirra/rss.rb ADDED
@@ -0,0 +1,15 @@
+ module Feedzirra
+   class RSS
+     include SAXMachine
+     include FeedUtilities
+     element :title
+     element :link, :as => :url
+     elements :item, :as => :entries, :class => RSSEntry
+
+     attr_accessor :feed_url
+
+     def self.able_to_parse?(xml)
+       xml =~ /rss.*version\=\"2\.0\"/
+     end
+   end
+ end
data/lib/feedzirra/rss_entry.rb ADDED
@@ -0,0 +1,12 @@
+ module Feedzirra
+   class RSSEntry
+     include SAXMachine
+     include FeedEntryUtilities
+     element :title
+     element :link, :as => :url
+     element :"dc:creator", :as => :author
+     element :"content:encoded", :as => :content
+     element :description, :as => :summary
+     element :pubDate, :as => :published
+   end
+ end
data/lib/feedzirra.rb ADDED
@@ -0,0 +1,29 @@
+ $LOAD_PATH.unshift(File.dirname(__FILE__)) unless $LOAD_PATH.include?(File.dirname(__FILE__))
+
+ gem 'activesupport'
+
+ require 'curb'
+ require 'sax-machine'
+ require 'active_support/basic_object'
+ require 'active_support/core_ext/object'
+ require 'active_support/core_ext/time'
+
+ require 'core_ext/date'
+
+ require 'feedzirra/feed_utilities'
+ require 'feedzirra/feed_entry_utilities'
+ require 'feedzirra/feed'
+
+ require 'feedzirra/rss_entry'
+ require 'feedzirra/rdf_entry'
+ require 'feedzirra/atom_entry'
+ require 'feedzirra/atom_feed_burner_entry'
+
+ require 'feedzirra/rss'
+ require 'feedzirra/rdf'
+ require 'feedzirra/atom'
+ require 'feedzirra/atom_feed_burner'
+
+ module Feedzirra
+   VERSION = "0.0.1"
+ end
data/spec/feedzirra/atom_entry_spec.rb ADDED
@@ -0,0 +1,33 @@
+ require File.dirname(__FILE__) + '/../spec_helper'
+
+ describe Feedzirra::AtomEntry do
+   before(:each) do
+     # I don't really like doing it this way because these unit tests should only rely on AtomEntry,
+     # but this is actually how it should work. You would never just pass entry xml straight to the AtomEntry
+     @entry = Feedzirra::Atom.parse(sample_atom_feed).entries.first
+   end
+
+   it "should parse the title" do
+     @entry.title.should == "AWS Job: Architect & Designer Position in Turkey"
+   end
+
+   it "should parse the url" do
+     @entry.url.should == "http://aws.typepad.com/aws/2009/01/aws-job-architect-designer-position-in-turkey.html"
+   end
+
+   it "should parse the author" do
+     @entry.author.should == "AWS Editor"
+   end
+
+   it "should parse the content" do
+     @entry.content.should == sample_atom_entry_content
+   end
+
+   it "should provide a summary" do
+     @entry.summary.should == "Late last year an entrepreneur from Turkey visited me at Amazon HQ in Seattle. We talked about his plans to use AWS as part of his new social video portal startup. I won't spill any beans before he's ready to..."
+   end
+
+   it "should parse the published date" do
+     @entry.published.to_s.should == "Fri Jan 16 18:21:00 UTC 2009"
+   end
+ end
data/spec/feedzirra/atom_feed_burner_entry_spec.rb ADDED
@@ -0,0 +1,33 @@
+ require File.dirname(__FILE__) + '/../spec_helper'
+
+ describe Feedzirra::AtomFeedBurnerEntry do
+   before(:each) do
+     # I don't really like doing it this way because these unit tests should only rely on AtomFeedBurnerEntry,
+     # but this is actually how it should work. You would never just pass entry xml straight to the AtomFeedBurnerEntry
+     @entry = Feedzirra::AtomFeedBurner.parse(sample_feedburner_atom_feed).entries.first
+   end
+
+   it "should parse the title" do
+     @entry.title.should == "Making a Ruby C library even faster"
+   end
+
+   it "should parse the url" do
+     @entry.url.should == "http://www.pauldix.net/2009/01/making-a-ruby-c-library-even-faster.html"
+   end
+
+   it "should parse the author" do
+     @entry.author.should == "Paul Dix"
+   end
+
+   it "should parse the content" do
+     @entry.content.should == sample_feedburner_atom_entry_content
+   end
+
+   it "should provide a summary" do
+     @entry.summary.should == "Last week I released the first version of a SAX based XML parsing library called SAX-Machine. It uses Nokogiri, which uses libxml, so it's pretty fast. However, I felt that it could be even faster. The only question was how..."
+   end
+
+   it "should parse the published date" do
+     @entry.published.to_s.should == "Thu Jan 22 15:50:22 UTC 2009"
+   end
+ end
data/spec/feedzirra/atom_feed_burner_spec.rb ADDED
@@ -0,0 +1,39 @@
+ require File.dirname(__FILE__) + '/../spec_helper'
+
+ describe Feedzirra::AtomFeedBurner do
+   describe ".able_to_parse?" do
+     it "should return true for a feedburner atom feed" do
+       Feedzirra::AtomFeedBurner.should be_able_to_parse(sample_feedburner_atom_feed)
+     end
+
+     it "should return false for an rdf feed" do
+       Feedzirra::AtomFeedBurner.should_not be_able_to_parse(sample_rdf_feed)
+     end
+
+     it "should return false for a regular atom feed" do
+       Feedzirra::AtomFeedBurner.should_not be_able_to_parse(sample_atom_feed)
+     end
+   end
+
+   describe "parsing" do
+     before(:each) do
+       @feed = Feedzirra::AtomFeedBurner.parse(sample_feedburner_atom_feed)
+     end
+
+     it "should parse the title" do
+       @feed.title.should == "Paul Dix Explains Nothing"
+     end
+
+     it "should parse the url" do
+       @feed.url.should == "http://www.pauldix.net/"
+     end
+
+     it "should parse the feed_url" do
+       @feed.feed_url.should == "http://feeds.feedburner.com/PaulDixExplainsNothing"
+     end
+
+     it "should parse entries" do
+       @feed.entries.size.should == 5
+     end
+   end
+ end