logophobia-feedzirra 0.0.18 → 0.0.20

data/README.rdoc ADDED
@@ -0,0 +1,169 @@
1
+ == Feedzirra
2
+
3
+ I'd like feedback on the api and any bugs encountered on feeds in the wild. I've set up a {google group here}[http://groups.google.com/group/feedzirra].
4
+
5
+ === Description
6
+
7
+ Feedzirra is a feed library that is designed to get and update many feeds as quickly as possible. This includes using libcurl-multi through the taf2-curb[link:http://github.com/taf2/curb/tree/master] gem for faster http gets, and libxml through nokogiri[link:http://github.com/tenderlove/nokogiri/tree/master] and sax-machine[link:http://github.com/pauldix/sax-machine/tree/master] for faster parsing.
8
+
9
+ Once you have fetched feeds using Feedzirra, they can be updated using the feed objects. Feedzirra automatically inserts etag and last-modified information from the http response headers to lower bandwidth usage, eliminate unnecessary parsing, and make things speedier in general.
10
+
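The conditional request described above can be sketched with Ruby's standard library. This is only an illustration of the general HTTP mechanism, not Feedzirra's own code (Feedzirra goes through curb); the helper name `conditional_get_request` and its option keys are hypothetical.

```ruby
require 'net/http'
require 'time'
require 'uri'

# Hypothetical helper, not part of Feedzirra: builds a GET request that lets
# the server answer 304 Not Modified when the feed has not changed.
def conditional_get_request(url, options = {})
  uri = URI.parse(url)
  request = Net::HTTP::Get.new(uri.request_uri)
  # Send back the validators saved from the previous response, if any.
  request['If-None-Match'] = options[:etag] if options[:etag]
  if options[:last_modified]
    request['If-Modified-Since'] = options[:last_modified].httpdate
  end
  request
end

request = conditional_get_request(
  "http://feeds.feedburner.com/PaulDixExplainsNothing",
  :etag => "GunxqnEP4NeYhrqq9TyVKTuDnh0",
  :last_modified => Time.utc(2009, 1, 31, 22, 58, 16)
)
puts request['If-None-Match']
puts request['If-Modified-Since']
```

A server that honors these headers can reply with an empty 304 response, which is where the bandwidth and parsing savings come from.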
11
+ Another feature present in Feedzirra is the ability to create callback functions that get called "on success" and "on failure" when getting a feed. This makes it easy to do things like log errors or update data stores.
12
+
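The callback idea can be sketched without any network access. The dispatcher below is hypothetical and is not how Feedzirra is implemented; it assumes that 2xx responses count as success and, as noted in the usage section, that a 304 (not updated) also goes to the success handler.

```ruby
# Hypothetical dispatcher, for illustration only; it is not Feedzirra's code.
# Treats 2xx and 304 (not modified) as success, everything else as failure.
def dispatch_response(url, response_code, body, handlers = {})
  if (200..299).include?(response_code) || response_code == 304
    handlers[:on_success].call(url, body) if handlers[:on_success]
  else
    handlers[:on_failure].call(url, response_code, body) if handlers[:on_failure]
  end
end

log = []
handlers = {
  :on_success => lambda { |url, body| log << "ok #{url}" },
  :on_failure => lambda { |url, code, body| log << "failed #{url} (#{code})" }
}

dispatch_response("http://example.com/a.xml", 200, "<rss/>", handlers)
dispatch_response("http://example.com/b.xml", 304, "", handlers)
dispatch_response("http://example.com/c.xml", 404, "Not Found", handlers)
puts log
```

Keeping the handlers in a plain hash of lambdas means either one can be left out, which is the same shape the `:on_success` and `:on_failure` options take.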
13
+ The fetching and parsing logic have been decoupled so that either of them can be used in isolation if you'd prefer not to use everything that Feedzirra offers. However, the code examples below use helper methods in the Feed class that put everything together to make things as simple as possible.
14
+
15
+ The final feature of Feedzirra is the ability to define custom parsing classes. In truth, Feedzirra could be used to parse much more than feeds. Microformats, page scraping, and almost anything else are fair game.
16
+
17
+ === Installation
18
+
19
+ For now Feedzirra exists only on github. It also has a few gem requirements that are only on github. Before you start you need to have libcurl[link:http://curl.haxx.se/] and libxml[link:http://xmlsoft.org/] installed. If you're on Leopard you have both. Otherwise, you'll need to grab them. Once you've got those libraries, these are the gems that get used: nokogiri, pauldix-sax-machine, taf2-curb (note that this is a fork that lives on github and not the Ruby Forge version of curb), and pauldix-feedzirra. The feedzirra gemspec has all the dependencies so you should be able to get up and running with the standard github gem install routine:
20
+
21
+ gem sources -a http://gems.github.com # if you haven't already
22
+ gem install pauldix-feedzirra
23
+
24
+ *NOTE:* Some people have been reporting a few issues related to installation. First, the Ruby Forge version of curb is not what you want. It will not work. Nor will the curl-multi gem that lives on Ruby Forge. You have to get the taf2-curb[link:http://github.com/taf2/curb/tree/master] fork installed.
25
+
26
+ If you see this error when doing a require:
27
+
28
+ /Library/Ruby/Site/1.8/rubygems/custom_require.rb:31:in `gem_original_require': no such file to load -- curb_core (LoadError)
29
+
30
+ It means that the taf2-curb gem didn't build correctly. To resolve this you can do a git clone git://github.com/taf2/curb.git then run rake gem in the curb directory, then sudo gem install pkg/curb-0.2.4.0.gem. After that you should be good.
31
+
32
+ If you see something like this when trying to run it:
33
+
34
+ NoMethodError: undefined method `on_success' for #<Curl::Easy:0x1182724>
35
+ from ./lib/feedzirra/feed.rb:88:in `add_url_to_multi'
36
+
37
+ This means that you are requiring curl-multi or the Ruby Forge version of Curb somewhere. You can't use those and need to get the taf2 version up and running.
38
+
39
+ If you're on Debian or Ubuntu and getting errors while trying to install the taf2-curb gem, it could be because you don't have the latest version of libcurl installed. Do this to fix:
40
+
41
+ sudo apt-get install libcurl4-gnutls-dev
42
+
43
+ Another problem could be if you are running Mac Ports and you have libcurl installed through there. You need to uninstall it for curb to work! The version in Mac Ports is old and doesn't play nice with curb. If you're running Leopard, you can just uninstall and you should be golden. If you're on an older version of OS X, you'll then need to {download curl}[http://curl.haxx.se/download.html] and build from source. Then you'll have to install the taf2-curb gem again. You might have to perform the step above.
44
+
45
+ If you're still having issues, please let me know on the mailing list. Also, {Todd Fisher (taf2)}[link:http://github.com/taf2] is working on fixing the gem install. Please send him a full error report.
46
+
47
+ === Usage
48
+
49
+ {A gist of the following code}[link:http://gist.github.com/57285]
50
+
51
+ require 'feedzirra'
52
+
53
+ # fetching a single feed
54
+ feed = Feedzirra::Feed.fetch_and_parse("http://feeds.feedburner.com/PaulDixExplainsNothing")
55
+
56
+ # feed and entries accessors
57
+ feed.title # => "Paul Dix Explains Nothing"
58
+ feed.url # => "http://www.pauldix.net"
59
+ feed.feed_url # => "http://feeds.feedburner.com/PaulDixExplainsNothing"
60
+ feed.etag # => "GunxqnEP4NeYhrqq9TyVKTuDnh0"
61
+ feed.last_modified # => Sat Jan 31 17:58:16 -0500 2009 # it's a Time object
62
+
63
+ entry = feed.entries.first
64
+ entry.title # => "Ruby Http Client Library Performance"
65
+ entry.url # => "http://www.pauldix.net/2009/01/ruby-http-client-library-performance.html"
66
+ entry.author # => "Paul Dix"
67
+ entry.summary # => "..."
68
+ entry.content # => "..."
69
+ entry.published # => Thu Jan 29 17:00:19 UTC 2009 # it's a Time object
70
+ entry.categories # => ["...", "..."]
71
+
72
+ # sanitizing an entry's content
73
+ entry.title.sanitize # => returns the title with harmful stuff escaped
74
+ entry.author.sanitize # => returns the author with harmful stuff escaped
75
+ entry.content.sanitize # => returns the content with harmful stuff escaped
76
+ entry.content.sanitize! # => returns content with harmful stuff escaped and replaces original (also exists for author and title)
77
+ entry.sanitize! # => sanitizes the entry's title, author, and content in place (as in, it changes the value to clean versions)
78
+ feed.sanitize_entries! # => sanitizes all entries in place
79
+
80
+ # updating a single feed
81
+ updated_feed = Feedzirra::Feed.update(feed)
82
+
83
+ # an updated feed has the following extra accessors
84
+ updated_feed.updated? # returns true if any of the feed attributes have been modified; returns false if the only change is new entries
85
+ updated_feed.new_entries # a collection of the entry objects that are newer than the latest in the feed before update
86
+
87
+ # fetching multiple feeds
88
+ feed_urls = ["http://feeds.feedburner.com/PaulDixExplainsNothing", "http://feeds.feedburner.com/trottercashion"]
89
+ feeds = Feedzirra::Feed.fetch_and_parse(feed_urls)
90
+
91
+ # feeds is now a hash with the feed_urls as keys and the parsed feed objects as values. If an error was thrown
92
+ # there will be a Fixnum of the http response code instead of a feed object
93
+
94
+ # updating multiple feeds. it expects a collection of feed objects
95
+ updated_feeds = Feedzirra::Feed.update(feeds.values)
96
+
97
+ # defining custom behavior on failure or success. note that a return status of 304 (not updated) will call the on_success handler
98
+ feed = Feedzirra::Feed.fetch_and_parse("http://feeds.feedburner.com/PaulDixExplainsNothing",
99
+ :on_success => lambda {|url, feed| puts feed.title },
100
+ :on_failure => lambda {|url, response_code, response_header, response_body| puts response_body })
101
+ # if a collection was passed into fetch_and_parse, the handlers will be called for each one
102
+
103
+ # the behavior for the handlers when using Feedzirra::Feed.update is slightly different. The feed passed into on_success will be
104
+ # the updated feed with the standard updated accessors. on failure it will be the original feed object passed into update
105
+
106
+ # Defining custom parsers
107
+ # TODO: the functionality is here, just write some good examples that show how to do this
108
+
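Because a multi-feed fetch returns a hash that mixes parsed feeds with integer response codes, callers generally need to separate the two before calling update. A minimal sketch using a plain hash with stand-in values; `FakeFeed` and the example URLs are hypothetical and only imitate the shape described in the usage comments:

```ruby
# Stand-in for a parsed feed object; hypothetical, for illustration only.
FakeFeed = Struct.new(:title)

# Shape described in the usage comments: URL => feed object, or an integer
# HTTP response code when the fetch failed.
results = {
  "http://example.com/good.xml"    => FakeFeed.new("Good Feed"),
  "http://example.com/missing.xml" => 404,
  "http://example.com/broken.xml"  => 500
}

# Split the [url, value] pairs into failures (integers) and parsed feeds.
failures, successes = results.partition { |_url, value| value.is_a?(Integer) }

feeds = successes.map { |_url, feed| feed }
failed_urls = failures.map { |url, _code| url }

puts feeds.map(&:title).inspect
puts failed_urls.inspect
```

Only the objects collected in `feeds` would then be handed to `Feedzirra::Feed.update`.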
109
+ === Extending
110
+
111
+ === Benchmarks
112
+
113
+ One of the goals of Feedzirra is speed. This includes not only parsing, but fetching multiple feeds as quickly as possible. I ran a benchmark getting 20 feeds 10 times using Feedzirra, rFeedParser, and FeedNormalizer. For more details the {benchmark code can be found in the project in spec/benchmarks/feedzirra_benchmarks.rb}[http://github.com/pauldix/feedzirra/blob/7fb5634c5c16e9c6ec971767b462c6518cd55f5d/spec/benchmarks/feedzirra_benchmarks.rb]
114
+
115
+ feedzirra 5.170000 1.290000 6.460000 ( 18.917796)
116
+ rfeedparser 104.260000 12.220000 116.480000 (244.799063)
117
+ feed-normalizer 66.250000 4.010000 70.260000 (191.589862)
118
+
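For a rough sense of scale, the wall-clock column (the parenthesized last number in each row) can be compared directly. The figures below are copied from the table above; the script is only arithmetic on those numbers, not a new measurement.

```ruby
# Real (wall-clock) seconds copied from the benchmark table above.
real_seconds = {
  "feedzirra"       => 18.917796,
  "rfeedparser"     => 244.799063,
  "feed-normalizer" => 191.589862
}

baseline = real_seconds["feedzirra"]

# How many times longer each library took than Feedzirra.
slowdown = {}
real_seconds.each do |name, seconds|
  slowdown[name] = (seconds / baseline).round(1)
end

slowdown.each { |name, factor| puts "#{name}: #{factor}x" }
```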
119
+ The result of that benchmark is a bit sketchy because of the network variability. Running 10 times against the same 20 feeds was meant to smooth some of that out. However, there is also a {benchmark comparing parsing speed in spec/benchmarks/parsing_benchmark.rb}[http://github.com/pauldix/feedzirra/blob/7fb5634c5c16e9c6ec971767b462c6518cd55f5d/spec/benchmarks/parsing_benchmark.rb] on an atom feed.
120
+
121
+ feedzirra 0.500000 0.030000 0.530000 ( 0.658744)
122
+ rfeedparser 8.400000 1.110000 9.510000 ( 11.839827)
123
+ feed-normalizer 5.980000 0.160000 6.140000 ( 7.576140)
124
+
125
+ There's also a {benchmark that shows the results of using Feedzirra to perform updates on feeds}[http://github.com/pauldix/feedzirra/blob/45d64319544c61a4c9eb9f7f825c73b9f9030cb3/spec/benchmarks/updating_benchmarks.rb] you've already pulled in. I tested against 179 feeds. The first is the initial pull and the second is an update 65 seconds later. I'm not sure how many of them support etag and last-modified, so performance may be better or worse depending on what feeds you're requesting.
126
+
127
+ feedzirra fetch and parse 4.010000 0.710000 4.720000 ( 15.110101)
128
+ feedzirra update 0.660000 0.280000 0.940000 ( 5.152709)
129
+
130
+ === TODO
131
+
132
+ This thing needs to hammer on many different feeds in the wild. I'm sure there will be bugs. I want to find them and crush them. I didn't bother using the test suite for feedparser. I wanted to start fresh.
133
+
134
+ Here are some more specific TODOs.
135
+ * Make a feedzirra-rails gem to integrate feedzirra seamlessly with Rails and ActiveRecord.
136
+ * Add support for authenticated feeds.
137
+ * Create a super sweet DSL for defining new parsers.
138
+ * Test against Ruby 1.9.1 and fix any bugs.
139
+ * I'm not keeping track of modified on entries. Should I add this?
140
+ * Clean up the fetching code inside feed.rb so it doesn't suck so hard.
141
+ * Make the feed_spec actually mock stuff out so it doesn't hit the net.
142
+ * Readdress how feeds determine if they can parse a document. Maybe I should use namespaces instead?
143
+
144
+ === LICENSE
145
+
146
+ (The MIT License)
147
+
148
+ Copyright (c) 2009:
149
+
150
+ {Paul Dix}[http://pauldix.net]
151
+
152
+ Permission is hereby granted, free of charge, to any person obtaining
153
+ a copy of this software and associated documentation files (the
154
+ 'Software'), to deal in the Software without restriction, including
155
+ without limitation the rights to use, copy, modify, merge, publish,
156
+ distribute, sublicense, and/or sell copies of the Software, and to
157
+ permit persons to whom the Software is furnished to do so, subject to
158
+ the following conditions:
159
+
160
+ The above copyright notice and this permission notice shall be
161
+ included in all copies or substantial portions of the Software.
162
+
163
+ THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND,
164
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
165
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
166
+ IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
167
+ CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
168
+ TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
169
+ SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -0,0 +1,11 @@
1
+ module Feedzirra
2
+ module Parser
3
+ class MRSSCategory
4
+ include SAXMachine
5
+
6
+ element :'media:category', :as => :category
7
+ element :'media:category', :value => :scheme, :as => :scheme
8
+ element :'media:category', :value => :label, :as => :label
9
+ end
10
+ end
11
+ end
@@ -1,24 +1,48 @@
1
+ require File.dirname(__FILE__) + '/mrss_credit'
2
+ require File.dirname(__FILE__) + '/mrss_restriction'
3
+ require File.dirname(__FILE__) + '/mrss_category'
4
+ require File.dirname(__FILE__) + '/mrss_copyright'
5
+ require File.dirname(__FILE__) + '/mrss_hash'
6
+ require File.dirname(__FILE__) + '/mrss_player'
7
+ require File.dirname(__FILE__) + '/mrss_rating'
8
+ require File.dirname(__FILE__) + '/mrss_restriction'
9
+ require File.dirname(__FILE__) + '/mrss_text'
10
+ require File.dirname(__FILE__) + '/mrss_thumbnail'
11
+
1
12
  module Feedzirra
2
13
  module Parser
3
- class RSSEntry
4
- class MRSSContent
5
- include SAXMachine
14
+ class MRSSContent
15
+ include SAXMachine
16
+
17
+ element :'media:content', :as => :url, :value => :url
18
+ element :'media:content', :as => :content_type, :value => :type
19
+ element :'media:content', :as => :medium, :value => :medium
20
+ element :'media:content', :as => :duration, :value => :duration
21
+ element :'media:content', :as => :isDefault, :value => :isDefault
22
+ element :'media:content', :as => :expression, :value => :expression
23
+ element :'media:content', :as => :bitrate, :value => :bitrate
24
+ element :'media:content', :as => :framerate, :value => :framerate
25
+ element :'media:content', :as => :samplingrate, :value => :samplingrate
26
+ element :'media:content', :as => :channels, :value => :channels
27
+ element :'media:content', :as => :height, :value => :height
28
+ element :'media:content', :as => :width, :value => :width
29
+ element :'media:content', :as => :lang, :value => :lang
30
+ element :'media:content', :as => :fileSize, :value => :fileSize
31
+
32
+ # optional elements
33
+ element :'media:title', :as => :media_title
34
+ element :'media:keywords', :as => :media_keywords
35
+ element :'media:description', :as => :media_description
6
36
 
7
- element :'media:content', :as => :url, :value => :url
8
- element :'media:content', :as => :content_type, :value => :type
9
- element :'media:content', :as => :medium, :value => :medium
10
- element :'media:content', :as => :duration, :value => :duration
11
- element :'media:content', :as => :isDefault, :value => :isDefault
12
- element :'media:content', :as => :expression, :value => :expression
13
- element :'media:content', :as => :bitrate, :value => :bitrate
14
- element :'media:content', :as => :framerate, :value => :framerate
15
- element :'media:content', :as => :samplingrate, :value => :sampling
16
- element :'media:content', :as => :channels, :value => :duration
17
- element :'media:content', :as => :height, :value => :height
18
- element :'media:content', :as => :width, :value => :width
19
- element :'media:content', :as => :lang, :value => :lang
20
- element :'media:content', :as => :fileSize, :value => :fileSize
21
- end
37
+ element :'media:thumbnail', :as => :media_thumbnail, :class => MRSSThumbnail
38
+ element :'media:rating', :as => :rating, :class => MRSSRating
39
+ element :'media:category', :as => :media_category, :class => MRSSCategory
40
+ element :'media:hash', :as => :media_hash, :class => MRSSHash
41
+ element :'media:player', :as => :media_player, :class => MRSSPlayer
42
+ elements :'media:credit', :as => :credits, :class => MRSSCredit
43
+ element :'media:copyright', :as => :copyright, :class => MRSSCopyright
44
+ element :'media:restriction', :as => :media_restriction, :class => MRSSRestriction
45
+ element :'media:text', :as => :text, :class => MRSSText
22
46
  end
23
47
  end
24
48
  end
@@ -0,0 +1,10 @@
1
+ module Feedzirra
2
+ module Parser
3
+ class MRSSCopyright
4
+ include SAXMachine
5
+
6
+ element :'media:copyright', :as => :copyright
7
+ element :'media:copyright', :as => :url, :value => :url
8
+ end
9
+ end
10
+ end
@@ -1,13 +1,11 @@
1
1
  module Feedzirra
2
2
  module Parser
3
- class RSSEntry
4
- class MRSSCredit
5
- include SAXMachine
3
+ class MRSSCredit
4
+ include SAXMachine
6
5
 
7
- element :'media:credit', :as => :role, :value => :role
8
- element :'media:credit', :as => :scheme, :value => :scheme
9
- element :'media:credit', :as => :name
10
- end
6
+ element :'media:credit', :as => :role, :value => :role
7
+ element :'media:credit', :as => :scheme, :value => :scheme
8
+ element :'media:credit', :as => :name
11
9
  end
12
10
  end
13
11
  end
@@ -1,13 +1,37 @@
1
1
  require File.dirname(__FILE__) + '/mrss_content'
2
+ require File.dirname(__FILE__) + '/mrss_credit'
3
+ require File.dirname(__FILE__) + '/mrss_restriction'
4
+ require File.dirname(__FILE__) + '/mrss_group'
5
+ require File.dirname(__FILE__) + '/mrss_category'
6
+ require File.dirname(__FILE__) + '/mrss_copyright'
7
+ require File.dirname(__FILE__) + '/mrss_hash'
8
+ require File.dirname(__FILE__) + '/mrss_player'
9
+ require File.dirname(__FILE__) + '/mrss_rating'
10
+ require File.dirname(__FILE__) + '/mrss_restriction'
11
+ require File.dirname(__FILE__) + '/mrss_text'
12
+ require File.dirname(__FILE__) + '/mrss_thumbnail'
2
13
 
3
14
  module Feedzirra
4
15
  module Parser
5
- class RSSEntry
6
- class MRSSGroup
7
- include SAXMachine
16
+ class MRSSGroup
17
+ include SAXMachine
8
18
 
9
- elements :'media:content', :as => :media_content, :class => MRSSContent
10
- end
19
+ elements :'media:content', :as => :media_content, :class => MRSSContent
20
+
21
+ # optional elements
22
+ element :'media:title', :as => :media_title
23
+ element :'media:keywords', :as => :media_keywords
24
+ element :'media:description', :as => :media_description
25
+
26
+ element :'media:thumbnail', :as => :media_thumbnail, :class => MRSSThumbnail
27
+ element :'media:rating', :as => :rating, :class => MRSSRating
28
+ element :'media:category', :as => :media_category, :class => MRSSCategory
29
+ element :'media:hash', :as => :media_hash, :class => MRSSHash
30
+ element :'media:player', :as => :media_player, :class => MRSSPlayer
31
+ elements :'media:credit', :as => :credits, :class => MRSSCredit
32
+ element :'media:copyright', :as => :copyright, :class => MRSSCopyright
33
+ element :'media:restriction', :as => :media_restriction, :class => MRSSRestriction
34
+ element :'media:text', :as => :text, :class => MRSSText
11
35
  end
12
36
  end
13
37
  end
@@ -0,0 +1,10 @@
1
+ module Feedzirra
2
+ module Parser
3
+ class MRSSHash
4
+ include SAXMachine
5
+
6
+ element :'media:hash', :as => :hash
7
+ element :'media:hash', :value => :algo, :as => :algo
8
+ end
9
+ end
10
+ end
@@ -0,0 +1,11 @@
1
+ module Feedzirra
2
+ module Parser
3
+ class MRSSPlayer
4
+ include SAXMachine
5
+
6
+ element :'media:player', :value => :url, :as => :url
7
+ element :'media:player', :value => :width, :as => :width
8
+ element :'media:player', :value => :height, :as => :height
9
+ end
10
+ end
11
+ end
@@ -0,0 +1,10 @@
1
+ module Feedzirra
2
+ module Parser
3
+ class MRSSRating
4
+ include SAXMachine
5
+
6
+ element :'media:rating', :as => :rating
7
+ element :'media:rating', :value => :scheme, :as => :scheme
8
+ end
9
+ end
10
+ end
@@ -1,13 +1,11 @@
1
1
  module Feedzirra
2
2
  module Parser
3
- class RSSEntry
4
- class MRSSRestriction
5
- include SAXMachine
3
+ class MRSSRestriction
4
+ include SAXMachine
6
5
 
7
- element :'media:restriction', :as => :value
8
- element :'media:restriction', :as => :scope, :value => :type
9
- element :'media:restriction', :as => :relationship, :value => :relationship
10
- end
6
+ element :'media:restriction', :as => :value
7
+ element :'media:restriction', :as => :scope, :value => :type
8
+ element :'media:restriction', :as => :relationship, :value => :relationship
11
9
  end
12
10
  end
13
11
  end
@@ -0,0 +1,13 @@
1
+ module Feedzirra
2
+ module Parser
3
+ class MRSSText
4
+ include SAXMachine
5
+
6
+ element :'media:text', :as => :type, :value => :type
7
+ element :'media:text', :as => :lang, :value => :lang
8
+ element :'media:text', :as => :start, :value => :start
9
+ element :'media:text', :as => :end, :value => :end
10
+ element :'media:text', :as => :text
11
+ end
12
+ end
13
+ end
@@ -0,0 +1,11 @@
1
+ module Feedzirra
2
+ module Parser
3
+ class MRSSThumbnail
4
+ include SAXMachine
5
+
6
+ element :'media:thumbnail', :as => :url, :value => :url
7
+ element :'media:thumbnail', :as => :width, :value => :width
8
+ element :'media:thumbnail', :as => :height, :value => :height
9
+ end
10
+ end
11
+ end
@@ -1,3 +1,14 @@
1
+ require File.dirname(__FILE__) + '/mrss_credit'
2
+ require File.dirname(__FILE__) + '/mrss_restriction'
3
+ require File.dirname(__FILE__) + '/mrss_category'
4
+ require File.dirname(__FILE__) + '/mrss_copyright'
5
+ require File.dirname(__FILE__) + '/mrss_hash'
6
+ require File.dirname(__FILE__) + '/mrss_player'
7
+ require File.dirname(__FILE__) + '/mrss_rating'
8
+ require File.dirname(__FILE__) + '/mrss_restriction'
9
+ require File.dirname(__FILE__) + '/mrss_text'
10
+ require File.dirname(__FILE__) + '/mrss_thumbnail'
11
+
1
12
  module Feedzirra
2
13
  module Parser
3
14
  # == Summary
@@ -49,6 +60,21 @@ module Feedzirra
49
60
  # elements :'itunes:category', :as => :itunes_categories,
50
61
  # :class => ITunesCategory
51
62
 
63
+ # MediaRSS support
64
+ element :'media:title', :as => :media_title
65
+ element :'media:keywords', :as => :media_keywords
66
+ element :'media:description', :as => :media_description
67
+
68
+ element :'media:thumbnail', :as => :media_thumbnail, :class => MRSSThumbnail
69
+ element :'media:rating', :as => :rating, :class => MRSSRating
70
+ element :'media:category', :as => :media_category, :class => MRSSCategory
71
+ element :'media:hash', :as => :media_hash, :class => MRSSHash
72
+ element :'media:player', :as => :media_player, :class => MRSSPlayer
73
+ elements :'media:credit', :as => :credits, :class => MRSSCredit
74
+ element :'media:copyright', :as => :copyright, :class => MRSSCopyright
75
+ element :'media:restriction', :as => :media_restriction, :class => MRSSRestriction
76
+ element :'media:text', :as => :text, :class => MRSSText
77
+
52
78
  def self.able_to_parse?(xml) #:nodoc:
53
79
  xml =~ /\<rss|rdf/
54
80
  end
@@ -2,6 +2,14 @@ require File.dirname(__FILE__) + '/mrss_content'
2
2
  require File.dirname(__FILE__) + '/mrss_credit'
3
3
  require File.dirname(__FILE__) + '/mrss_restriction'
4
4
  require File.dirname(__FILE__) + '/mrss_group'
5
+ require File.dirname(__FILE__) + '/mrss_category'
6
+ require File.dirname(__FILE__) + '/mrss_copyright'
7
+ require File.dirname(__FILE__) + '/mrss_hash'
8
+ require File.dirname(__FILE__) + '/mrss_player'
9
+ require File.dirname(__FILE__) + '/mrss_rating'
10
+ require File.dirname(__FILE__) + '/mrss_restriction'
11
+ require File.dirname(__FILE__) + '/mrss_text'
12
+ require File.dirname(__FILE__) + '/mrss_thumbnail'
5
13
 
6
14
  module Feedzirra
7
15
  module Parser
@@ -44,44 +52,22 @@ module Feedzirra
44
52
  element :"dc:creator", :as => :author
45
53
  element :"dcterms:modified", :as => :updated
46
54
 
47
- # MediaRSS support
48
- element :'media:thumbnail', :as => :media_thumbnail, :value => :url
49
- element :'media:thumbnail', :as => :media_thumbnail_width, :value => :width
50
- element :'media:thumbnail', :as => :media_thumbnail_height, :value => :height
51
- element :'media:description', :as => :media_description
52
-
53
- element :'media:rating', :as => :rating
54
- element :'media:rating', :value => :scheme, :as => :rating_scheme
55
-
55
+ # MediaRSS support, optional elements
56
56
  element :'media:title', :as => :media_title
57
57
  element :'media:keywords', :as => :media_keywords
58
+ element :'media:description', :as => :media_description
58
59
 
59
- element :'media:category', :as => :media_category
60
- element :'media:category', :value => :scheme, :as => :media_category_scheme
61
- element :'media:category', :value => :label, :as => :media_category_label
62
-
63
- element :'media:hash', :as => :media_hash
64
- element :'media:hash', :value => :algo, :as => :media_hash_algo
65
-
66
- element :'media:player', :value => :url, :as => :media_player_url
67
- element :'media:player', :value => :width, :as => :media_player_width
68
- element :'media:player', :value => :height, :as => :media_player_height
69
-
60
+ element :'media:thumbnail', :as => :media_thumbnail, :class => MRSSThumbnail
61
+ element :'media:rating', :as => :rating, :class => MRSSRating
62
+ element :'media:category', :as => :media_category, :class => MRSSCategory
63
+ element :'media:hash', :as => :media_hash, :class => MRSSHash
64
+ element :'media:player', :as => :media_player, :class => MRSSPlayer
70
65
  elements :'media:credit', :as => :credits, :class => MRSSCredit
71
-
72
- element :'media:copyright', :as => :copyright
73
- element :'media:copyright', :as => :copyright_url, :value => :url
74
-
66
+ element :'media:copyright', :as => :copyright, :class => MRSSCopyright
75
67
  element :'media:restriction', :as => :media_restriction, :class => MRSSRestriction
76
-
68
+ element :'media:text', :as => :text, :class => MRSSText
77
69
  elements :'media:content', :as => :media_content, :class => MRSSContent
78
- element :'media:group', :as => :media_group, :class => MRSSGroup
79
-
80
- element :'media:text', :as => :media_text_type, :value => :type
81
- element :'media:text', :as => :media_text_lang, :value => :lang
82
- element :'media:text', :as => :media_text_start, :value => :start
83
- element :'media:text', :as => :media_text_end, :value => :end
84
- element :'media:text', :as => :media_text
70
+ elements :'media:group', :as => :media_groups, :class => MRSSGroup
85
71
 
86
72
  # iTunes
87
73
  element :'itunes:author', :as => :author
data/lib/feedzirra.rb CHANGED
@@ -19,11 +19,20 @@ require 'feedzirra/feed_utilities'
19
19
  require 'feedzirra/feed_entry_utilities'
20
20
  require 'feedzirra/feed'
21
21
 
22
- require 'feedzirra/parser/rss_entry'
23
- require 'feedzirra/parser/rss_image'
24
22
  require 'feedzirra/parser/mrss_content'
23
+ require 'feedzirra/parser/mrss_credit'
25
24
  require 'feedzirra/parser/mrss_restriction'
26
25
  require 'feedzirra/parser/mrss_group'
26
+ require 'feedzirra/parser/mrss_category'
27
+ require 'feedzirra/parser/mrss_copyright'
28
+ require 'feedzirra/parser/mrss_hash'
29
+ require 'feedzirra/parser/mrss_player'
30
+ require 'feedzirra/parser/mrss_rating'
31
+ require 'feedzirra/parser/mrss_restriction'
32
+ require 'feedzirra/parser/mrss_text'
33
+ require 'feedzirra/parser/mrss_thumbnail'
34
+ require 'feedzirra/parser/rss_entry'
35
+ require 'feedzirra/parser/rss_image'
27
36
  require 'feedzirra/parser/itunes_category'
28
37
  require 'feedzirra/parser/atom_entry'
29
38
  require 'feedzirra/parser/atom_feed_burner_entry'
@@ -33,5 +42,5 @@ require 'feedzirra/parser/atom'
33
42
  require 'feedzirra/parser/atom_feed_burner'
34
43
 
35
44
  module Feedzirra
36
- VERSION = "0.0.18"
45
+ VERSION = "0.0.20"
37
46
  end
@@ -0,0 +1,98 @@
1
+ # this is some spike code to compare the speed of different methods for performing
2
+ # multiple feed fetches
3
+ require 'rubygems'
4
+ require 'curb'
5
+ require 'activesupport'
6
+
7
+ require 'net/http'
8
+ require 'uri'
9
+
10
+ require 'benchmark'
11
+ include Benchmark
12
+
13
+ GET_COUNT = 1
14
+ urls = ["http://www.pauldix.net"] * GET_COUNT
15
+
16
+
17
+ benchmark do |t|
18
+ t.report("taf2-curb") do
19
+ multi = Curl::Multi.new
20
+ urls.each do |url|
21
+ easy = Curl::Easy.new(url) do |curl|
22
+ curl.headers["User-Agent"] = "feedzirra"
23
+ # curl.headers["If-Modified-Since"] = Time.now.httpdate
24
+ # curl.headers["If-None-Match"] = "ziEyTl4q9GH04BR4jgkImd0GvSE"
25
+ curl.follow_location = true
26
+ curl.on_success do |c|
27
+ # puts c.header_str.inspect
28
+ # puts c.response_code
29
+ # puts c.body_str.slice(0, 500)
30
+ end
31
+ curl.on_failure do |c|
32
+ puts "**** #{c.response_code}"
33
+ end
34
+ end
35
+ multi.add(easy)
36
+ end
37
+
38
+ multi.perform
39
+ end
40
+
41
+ t.report("nethttp") do
42
+ urls.each do |url|
43
+ res = Net::HTTP.get(URI.parse(url))
44
+ # puts res.slice(0, 500)
45
+ end
46
+ end
47
+
48
+ require 'rfuzz/session'
49
+ include RFuzz
50
+ t.report("rfuzz") do
51
+ GET_COUNT.times do
52
+ http = HttpClient.new("www.pauldix.net", 80)
53
+ response = http.get("/")
54
+ if response.http_status != "200"
55
+ puts "***** #{response.http_status}"
56
+ else
57
+ # puts response.http_status
58
+ # puts response.http_body.slice(0, 500)
59
+ end
60
+ end
61
+ end
62
+
63
+ require 'eventmachine'
64
+ t.report("eventmachine") do
65
+ counter = GET_COUNT
66
+ EM.run do
67
+ GET_COUNT.times do
68
+ http = EM::Protocols::HttpClient2.connect("www.pauldix.net", 80)
69
+ request = http.get("/")
70
+ request.callback do
71
+ # puts request.status
72
+ # puts request.content.slice(0, 500)
73
+ counter -= 1
74
+ EM.stop if counter == 0
75
+ end
76
+ end
77
+ end
78
+ end
79
+
80
+
81
+ require 'curl-multi'
82
+ t.report("curl multi") do
83
+ multi = Curl::Multi.new
84
+ urls.each do |url|
85
+ on_failure = lambda do |ex|
86
+ puts "****** Failed to retrieve #{url}"
87
+ end
88
+
89
+ on_success = lambda do |body|
90
+ # puts "got #{url}"
91
+ # puts body.slice(0, 500)
92
+ end
93
+ multi.get(url, on_success, on_failure)
94
+ end
95
+
96
+ multi.select([], []) while multi.size > 0
97
+ end
98
+ end
@@ -0,0 +1,40 @@
1
+ require File.dirname(__FILE__) + '/../../lib/feedzirra.rb'
2
+ require 'rfeedparser'
3
+ require 'feed-normalizer'
4
+ require 'open-uri'
5
+
6
+ require 'benchmark'
7
+ include Benchmark
8
+
9
+ iterations = 10
10
+ urls = File.readlines(File.dirname(__FILE__) + "/../sample_feeds/successful_feed_urls.txt").slice(0, 20)
11
+ puts "benchmarks on #{urls.size} feeds"
12
+ puts "************************************"
13
+ benchmark do |t|
14
+ t.report("feedzirra") do
15
+ iterations.times do
16
+ Feedzirra::Feed.fetch_and_parse(urls, :on_success => lambda { |url, feed| $stdout.print '.'; $stdout.flush })
17
+ end
18
+ end
19
+
20
+ t.report("rfeedparser") do
21
+ iterations.times do
22
+ urls.each do |url|
23
+ feed = FeedParser.parse(url)
24
+ $stdout.print '.'
25
+ $stdout.flush
26
+ end
27
+ end
28
+ end
29
+
30
+ t.report("feed-normalizer") do
31
+ iterations.times do
32
+ urls.each do |url|
33
+ # have to use the :force_parser option to make feed-normalizer parse an atom feed
34
+ feed = FeedNormalizer::FeedNormalizer.parse(open(url), :force_parser => FeedNormalizer::SimpleRssParser)
35
+ $stdout.print '.'
36
+ $stdout.flush
37
+ end
38
+ end
39
+ end
40
+ end
@@ -0,0 +1,28 @@
1
+ require 'rubygems'
2
+ require File.dirname(__FILE__) + '/../../lib/feedzirra.rb'
3
+
4
+ require 'open-uri'
5
+
6
+ require 'benchmark'
7
+ include Benchmark
8
+
9
+ iterations = 10
10
+ urls = File.readlines(File.dirname(__FILE__) + "/../sample_feeds/successful_feed_urls.txt").slice(0, 20)
11
+ puts "benchmarks on #{urls.size} feeds"
12
+ puts "************************************"
13
+ benchmark do |t|
14
+ t.report("feedzirra open uri") do
15
+ iterations.times do
16
+ urls.each do |url|
17
+ Feedzirra::Feed.parse(open(url, "User-Agent" => "feedzirra http://github.com/pauldix/feedzirra/tree/master").read)
18
+ $stdout.print '.'; $stdout.flush
19
+ end
20
+ end
21
+ end
22
+
23
+ t.report("feedzirra fetch and parse") do
24
+ iterations.times do
25
+ Feedzirra::Feed.fetch_and_parse(urls, :on_success => lambda { |url, feed| $stdout.print '.'; $stdout.flush })
26
+ end
27
+ end
28
+ end
@@ -0,0 +1,30 @@
1
+ require File.dirname(__FILE__) + '/../../lib/feedzirra.rb'
2
+ require 'rfeedparser'
3
+ require 'feed-normalizer'
4
+
5
+ require 'benchmark'
6
+ include Benchmark
7
+
8
+ iterations = 50
9
+ xml = File.read(File.dirname(__FILE__) + '/../sample_feeds/PaulDixExplainsNothing.xml')
10
+
11
+ benchmark do |t|
12
+ t.report("feedzirra") do
13
+ iterations.times do
14
+ Feedzirra::Feed.parse(xml)
15
+ end
16
+ end
17
+
18
+ t.report("rfeedparser") do
19
+ iterations.times do
20
+ FeedParser.parse(xml)
21
+ end
22
+ end
23
+
24
+ t.report("feed-normalizer") do
25
+ iterations.times do
26
+ # have to use the :force_parser option to make feed-normalizer parse an atom feed
27
+ FeedNormalizer::FeedNormalizer.parse(xml, :force_parser => FeedNormalizer::SimpleRssParser)
28
+ end
29
+ end
30
+ end
@@ -0,0 +1,33 @@
1
+ require 'rubygems'
2
+ require File.dirname(__FILE__) + '/../../lib/feedzirra.rb'
3
+
4
+ require 'benchmark'
5
+ include Benchmark
6
+
7
+ urls = File.readlines(File.dirname(__FILE__) + "/../sample_feeds/successful_feed_urls.txt")
8
+ puts "benchmarks on #{urls.size} feeds"
9
+ puts "************************************"
10
+ benchmark do |t|
11
+ feeds = {}
12
+ t.report("feedzirra fetch and parse") do
13
+ feeds = Feedzirra::Feed.fetch_and_parse(urls,
14
+ :on_success => lambda { |url, feed| $stdout.print '.'; $stdout.flush },
15
+ :on_failure => lambda {|url, response_code, header, body| puts "#{response_code} ERROR on #{url}"})
16
+ end
17
+
18
+ # curb caches DNS lookups for 60 seconds, so to keep the comparison fair we wait for the cache to expire
19
+ puts "sleeping to wait for dns cache to clear"
20
+ 65.times {$stdout.print('.'); sleep(1)}
21
+ puts "done"
22
+
23
+ updated_feeds = []
24
+ t.report("feedzirra update") do
25
+ updated_feeds = Feedzirra::Feed.update(feeds.values.reject {|f| f.class == Fixnum},
26
+ :on_success => lambda {|feed| $stdout.print '.'; $stdout.flush},
27
+ :on_failure => lambda {|feed, response_code, header, body| puts "#{response_code} ERROR on #{feed.feed_url}"})
28
+ end
29
+
30
+ updated_feeds.each do |feed|
31
+ puts feed.feed_url if feed.updated?
32
+ end
33
+ end
@@ -0,0 +1,32 @@
1
+ require File.join(File.dirname(__FILE__), %w[.. .. spec_helper])
2
+
3
+ describe Feedzirra::Parser::RSSEntry::MRSSContent do
4
+ before(:each) do
5
+ # I don't really like doing it this way because these unit tests should only rely on RSSEntry,
6
+ # but this is actually how it should work. You would never just pass entry xml straight to the AtomEntry
7
+ @entries = Feedzirra::Parser::RSS.parse(sample_mrss_feed).entries
8
+ end
9
+
10
+ it "should parse the media" do
11
+ entry = @entries.first
12
+ entry.media_content.size.should == 1
13
+ entry.media_description.should == 'The story began with a July 23 article in a local newspaper, The Independent. Jenna Hewitt, 26, of Montauk, and three friends said they found the ...'
14
+ entry.media_thumbnail.should == 'http://3.gvt0.com/vi/Y3rNEu4A8WM/default.jpg'
15
+ entry.media_thumbnail_width.should == '320'
16
+ entry.media_thumbnail_height.should == '240'
17
+ end
18
+
19
+ it "should handle multiple pieces of content" do
20
+ media = @entries[1].media_content
21
+ media.size.should == 2
22
+ media[0].url.should == 'http://www.youtube.com/v/pvaM6sjLbuA&#38;fs=1'
23
+ media[0].content_type.should == 'application/x-shockwave-flash'
24
+ media[0].medium.should == 'video'
25
+ media[0].duration.should == '575'
26
+
27
+ media[1].url.should == 'http://www.youtube.com/v/pvaM6sjLbuA&#38;fs=2'
28
+ media[1].content_type.should == 'video/mp4'
29
+ media[1].medium.should == 'video'
30
+ media[1].duration.should == '576'
31
+ end
32
+ end
@@ -0,0 +1,20 @@
1
+ require 'rubygems'
2
+ require File.dirname(__FILE__) + "/../../lib/feedzirra.rb"
3
+
4
+ feed_urls = File.readlines(File.dirname(__FILE__) + "/top5kfeeds.dat").collect {|line| line.split.first}
5
+
6
+ success = lambda do |url, feed|
7
+ puts "SUCCESS - #{feed.title} - #{url}"
8
+ end
9
+
10
+ failed_feeds = []
11
+ failure = lambda do |url, response_code, header, body|
12
+ failed_feeds << url if response_code == 200
13
+ puts "*********** FAILED with #{response_code} on #{url}"
14
+ end
15
+
16
+ Feedzirra::Feed.fetch_and_parse(feed_urls, :on_success => success, :on_failure => failure)
17
+
18
+ File.open("./failed_urls.txt", "w") do |f|
19
+ f.write failed_feeds.join("\n")
20
+ end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: logophobia-feedzirra
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.0.18
4
+ version: 0.0.20
5
5
  platform: ruby
6
6
  authors:
7
7
  - Paul Dix
@@ -81,37 +81,51 @@ extensions: []
81
81
  extra_rdoc_files: []
82
82
 
83
83
  files:
84
- - lib/core_ext/date.rb
85
- - lib/core_ext/string.rb
86
84
  - lib/feedzirra.rb
87
85
  - lib/feedzirra/feed.rb
88
- - lib/feedzirra/parser/atom.rb
89
- - lib/feedzirra/parser/atom_entry.rb
86
+ - lib/feedzirra/feed_utilities.rb
87
+ - lib/feedzirra/parser/rss_entry.rb
90
88
  - lib/feedzirra/parser/atom_feed_burner.rb
91
- - lib/feedzirra/parser/atom_feed_burner_entry.rb
92
- - lib/feedzirra/parser/itunes_category.rb
89
+ - lib/feedzirra/parser/mrss_hash.rb
90
+ - lib/feedzirra/parser/atom_entry.rb
93
91
  - lib/feedzirra/parser/rss.rb
94
- - lib/feedzirra/parser/rss_entry.rb
92
+ - lib/feedzirra/parser/mrss_rating.rb
93
+ - lib/feedzirra/parser/mrss_text.rb
94
+ - lib/feedzirra/parser/mrss_player.rb
95
+ - lib/feedzirra/parser/mrss_group.rb
95
96
  - lib/feedzirra/parser/rss_image.rb
96
- - lib/feedzirra/parser/mrss_content.rb
97
+ - lib/feedzirra/parser/atom.rb
98
+ - lib/feedzirra/parser/atom_feed_burner_entry.rb
99
+ - lib/feedzirra/parser/mrss_thumbnail.rb
97
100
  - lib/feedzirra/parser/mrss_credit.rb
101
+ - lib/feedzirra/parser/itunes_category.rb
98
102
  - lib/feedzirra/parser/mrss_restriction.rb
99
- - lib/feedzirra/parser/mrss_group.rb
100
- - lib/feedzirra/feed_utilities.rb
103
+ - lib/feedzirra/parser/mrss_category.rb
104
+ - lib/feedzirra/parser/mrss_content.rb
105
+ - lib/feedzirra/parser/mrss_copyright.rb
101
106
  - lib/feedzirra/feed_entry_utilities.rb
107
+ - lib/core_ext/date.rb
108
+ - lib/core_ext/string.rb
109
+ - README.rdoc
102
110
  - README.textile
103
111
  - Rakefile
104
- - spec/spec.opts
105
- - spec/spec_helper.rb
106
- - spec/feedzirra/feed_spec.rb
112
+ - spec/feedzirra/parser/mrss_content_spec.rb
113
+ - spec/feedzirra/parser/rss_entry_spec.rb
114
+ - spec/feedzirra/parser/atom_feed_burner_entry_spec.rb
115
+ - spec/feedzirra/parser/rss_spec.rb
107
116
  - spec/feedzirra/parser/atom_spec.rb
108
117
  - spec/feedzirra/parser/atom_entry_spec.rb
109
118
  - spec/feedzirra/parser/atom_feed_burner_spec.rb
110
- - spec/feedzirra/parser/atom_feed_burner_entry_spec.rb
111
- - spec/feedzirra/parser/rss_spec.rb
112
- - spec/feedzirra/parser/rss_entry_spec.rb
113
- - spec/feedzirra/feed_utilities_spec.rb
114
119
  - spec/feedzirra/feed_entry_utilities_spec.rb
120
+ - spec/feedzirra/feed_spec.rb
121
+ - spec/feedzirra/feed_utilities_spec.rb
122
+ - spec/benchmarks/feed_benchmarks.rb
123
+ - spec/benchmarks/updating_benchmarks.rb
124
+ - spec/benchmarks/feedzirra_benchmarks.rb
125
+ - spec/benchmarks/fetching_benchmarks.rb
126
+ - spec/benchmarks/parsing_benchmark.rb
127
+ - spec/sample_feeds/run_against_sample.rb
128
+ - spec/spec_helper.rb
115
129
  has_rdoc: true
116
130
  homepage: http://github.com/pauldix/feedzirra
117
131
  post_install_message:
data/spec/spec.opts DELETED
@@ -1,2 +0,0 @@
1
- --diff
2
- --color