feedjira 0.9.0
- checksums.yaml +7 -0
- data/.gitignore +14 -0
- data/.rspec +1 -0
- data/.travis.yml +8 -0
- data/CHANGELOG.md +162 -0
- data/Gemfile +17 -0
- data/Guardfile +5 -0
- data/README.md +242 -0
- data/Rakefile +6 -0
- data/benchmarks/README.md +90 -0
- data/benchmarks/basic.rb +31 -0
- data/benchmarks/feed_list.txt +10 -0
- data/benchmarks/feed_xml/apple.xml +149 -0
- data/benchmarks/feed_xml/cnn.xml +278 -0
- data/benchmarks/feed_xml/daring_fireball.xml +1697 -0
- data/benchmarks/feed_xml/engadget.xml +604 -0
- data/benchmarks/feed_xml/feedjira_commits.xml +370 -0
- data/benchmarks/feed_xml/gizmodo.xml +2 -0
- data/benchmarks/feed_xml/loop.xml +441 -0
- data/benchmarks/feed_xml/rails.xml +1938 -0
- data/benchmarks/feed_xml/white_house.xml +951 -0
- data/benchmarks/feed_xml/xkcd.xml +2 -0
- data/benchmarks/fetching_systems.rb +23 -0
- data/benchmarks/other_libraries.rb +73 -0
- data/feedjira.gemspec +27 -0
- data/lib/feedjira.rb +16 -0
- data/lib/feedjira/core_ext.rb +3 -0
- data/lib/feedjira/core_ext/date.rb +19 -0
- data/lib/feedjira/core_ext/string.rb +9 -0
- data/lib/feedjira/core_ext/time.rb +31 -0
- data/lib/feedjira/feed.rb +459 -0
- data/lib/feedjira/feed_entry_utilities.rb +66 -0
- data/lib/feedjira/feed_utilities.rb +103 -0
- data/lib/feedjira/parser.rb +20 -0
- data/lib/feedjira/parser/atom.rb +61 -0
- data/lib/feedjira/parser/atom_entry.rb +34 -0
- data/lib/feedjira/parser/atom_feed_burner.rb +22 -0
- data/lib/feedjira/parser/atom_feed_burner_entry.rb +35 -0
- data/lib/feedjira/parser/google_docs_atom.rb +28 -0
- data/lib/feedjira/parser/google_docs_atom_entry.rb +29 -0
- data/lib/feedjira/parser/itunes_rss.rb +50 -0
- data/lib/feedjira/parser/itunes_rss_item.rb +41 -0
- data/lib/feedjira/parser/itunes_rss_owner.rb +12 -0
- data/lib/feedjira/parser/rss.rb +24 -0
- data/lib/feedjira/parser/rss_entry.rb +37 -0
- data/lib/feedjira/parser/rss_feed_burner.rb +23 -0
- data/lib/feedjira/parser/rss_feed_burner_entry.rb +43 -0
- data/lib/feedjira/version.rb +3 -0
- data/spec/feedjira/feed_entry_utilities_spec.rb +62 -0
- data/spec/feedjira/feed_spec.rb +762 -0
- data/spec/feedjira/feed_utilities_spec.rb +273 -0
- data/spec/feedjira/parser/atom_entry_spec.rb +86 -0
- data/spec/feedjira/parser/atom_feed_burner_entry_spec.rb +47 -0
- data/spec/feedjira/parser/atom_feed_burner_spec.rb +56 -0
- data/spec/feedjira/parser/atom_spec.rb +76 -0
- data/spec/feedjira/parser/google_docs_atom_entry_spec.rb +22 -0
- data/spec/feedjira/parser/google_docs_atom_spec.rb +31 -0
- data/spec/feedjira/parser/itunes_rss_item_spec.rb +63 -0
- data/spec/feedjira/parser/itunes_rss_owner_spec.rb +18 -0
- data/spec/feedjira/parser/itunes_rss_spec.rb +58 -0
- data/spec/feedjira/parser/rss_entry_spec.rb +85 -0
- data/spec/feedjira/parser/rss_feed_burner_entry_spec.rb +85 -0
- data/spec/feedjira/parser/rss_feed_burner_spec.rb +57 -0
- data/spec/feedjira/parser/rss_spec.rb +57 -0
- data/spec/sample_feeds/AmazonWebServicesBlog.xml +797 -0
- data/spec/sample_feeds/AmazonWebServicesBlogFirstEntryContent.xml +63 -0
- data/spec/sample_feeds/AtomFeedWithSpacesAroundEquals.xml +61 -0
- data/spec/sample_feeds/FeedBurnerUrlNoAlternate.xml +28 -0
- data/spec/sample_feeds/GoogleDocsList.xml +188 -0
- data/spec/sample_feeds/HREFConsideredHarmful.xml +314 -0
- data/spec/sample_feeds/HREFConsideredHarmfulFirstEntry.xml +22 -0
- data/spec/sample_feeds/ITunesWithSpacesInAttributes.xml +63 -0
- data/spec/sample_feeds/PaulDixExplainsNothing.xml +175 -0
- data/spec/sample_feeds/PaulDixExplainsNothingAlternate.xml +175 -0
- data/spec/sample_feeds/PaulDixExplainsNothingFirstEntryContent.xml +19 -0
- data/spec/sample_feeds/PaulDixExplainsNothingWFW.xml +174 -0
- data/spec/sample_feeds/SamRuby.xml +583 -0
- data/spec/sample_feeds/TechCrunch.xml +1515 -0
- data/spec/sample_feeds/TechCrunchFirstEntry.xml +9 -0
- data/spec/sample_feeds/TechCrunchFirstEntryDescription.xml +3 -0
- data/spec/sample_feeds/TenderLovemaking.xml +516 -0
- data/spec/sample_feeds/TenderLovemakingFirstEntry.xml +66 -0
- data/spec/sample_feeds/TrotterCashionHome.xml +611 -0
- data/spec/sample_feeds/TypePadNews.xml +368 -0
- data/spec/sample_feeds/atom_with_link_tag_for_url_unmarked.xml +31 -0
- data/spec/sample_feeds/itunes.xml +67 -0
- data/spec/sample_feeds/pet_atom.xml +497 -0
- data/spec/spec_helper.rb +88 -0
- metadata +229 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA1:
+  metadata.gz: 1579ca4b98963d52648b62e13ce1100d09744a1b
+  data.tar.gz: f3fee2d061ffb4ad9bf54b363963abd1ac313970
+SHA512:
+  metadata.gz: 78294bd172b74e4f9e784bbabd7bfee8243f22457f16ca6946eeb12dc762b90252d0289480c04aae42727346718a8b0d41d077f5867e7a381c71796245fc68aa
+  data.tar.gz: febf23842faa2a597f5fdcae9e0c470d8d948a00e6d592115af82dc0dbb10baacee12678c02567e7d2d2fb49e8a6d12613676c9de21d4f8d5b1213222aeae0e9
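To check an unpacked gem against the values in `checksums.yaml`, something like the following could be used. It is a minimal sketch that assumes `metadata.gz` and `data.tar.gz` sit in the same directory as `checksums.yaml`; the helper name `verify_checksums` and the result format are illustrative choices, not part of the gem.

```ruby
require 'digest/sha1'
require 'digest/sha2'
require 'yaml'

# Maps the algorithm names used in checksums.yaml to stdlib digest classes.
DIGEST_CLASSES = {
  'SHA1'   => Digest::SHA1,
  'SHA512' => Digest::SHA512
}.freeze

# Hypothetical helper: recomputes the digest of every file listed in
# checksums.yaml and returns a hash of "ALGORITHM file" => true/false.
def verify_checksums(dir)
  checksums = YAML.safe_load(File.read(File.join(dir, 'checksums.yaml')))
  results = {}
  checksums.each do |algorithm, files|
    digest_class = DIGEST_CLASSES.fetch(algorithm)
    files.each do |name, expected|
      actual = digest_class.file(File.join(dir, name)).hexdigest
      results["#{algorithm} #{name}"] = (actual == expected.to_s)
    end
  end
  results
end
```

Calling `verify_checksums('/path/to/unpacked/gem')` returns one entry per algorithm and file, so a mismatch in a single digest is easy to spot.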
data/.gitignore
ADDED
data/.rspec
ADDED
@@ -0,0 +1 @@
+--color
data/.travis.yml
ADDED
data/CHANGELOG.md
ADDED
@@ -0,0 +1,162 @@
+# Feedjira Changelog
+
+## 0.9.0
+
+* Project renamed to Feedjira
+
+## 0.7.1
+
+* Bugfix
+  * Don't use entry id for updating when feed doesn't provide it [[#205][]]
+
+[#205]: https://github.com/pauldix/feedzirra/pull/205
+
+## 0.7.0
+
+* General
+  * README update for callback arity [[#202][]]
+
+* Enhancements
+  * Add error info to `on_failure` callback [[#194][]]
+    * On failure callbacks get curl and error as args
+  * Bugfix for parsing dates that are ISO 8601 with milliseconds [[#203][]]
+
+[#194]: https://github.com/pauldix/feedzirra/pull/194
+[#202]: https://github.com/pauldix/feedzirra/pull/202
+[#203]: https://github.com/pauldix/feedzirra/pull/203
+
+## 0.6.0
+
+* General
+  * Update expected parser classes in docs [[#200][]]
+  * Fix Rubinius issue with Travis
+
+* Enhancements
+  * Added content to `itunes_rss_item` [[#198][]]
+  * Allow user to pass a particular parser using `parse_with`
+  * Strip leading whitespace from XML [[#196][]]
+  * Parse out RSS version [[#172][]]
+  * Add generic preprocessing hook for Parsers
+  * Add preprocessing hook for Atom XHTML content [[#58][]] [[#130][]]
+
+[#58]: https://github.com/pauldix/feedzirra/pull/58
+[#130]: https://github.com/pauldix/feedzirra/issues/130
+[#172]: https://github.com/pauldix/feedzirra/issues/172
+[#196]: https://github.com/pauldix/feedzirra/pull/196
+[#198]: https://github.com/pauldix/feedzirra/pull/198
+[#200]: https://github.com/pauldix/feedzirra/pull/200
+
+## 0.5.0
+
+* General
+  * Lots of README cleanup
+  * Remove pending specs
+  * Rewrite benchmarks and move them out of the spec folder
+  * Upgrade to latest Rspec
+
+* Enhancements
+  * Allow spaces in rss tag when checking parse-ability [[#127][]]
+  * Compare `entry_id` and `url` for finding new entries [[#195][]]
+  * Add closed captioned and order tags for iTunesRSSItem [[#160][]]
+
+[#127]: https://github.com/pauldix/feedzirra/pull/127
+[#160]: https://github.com/pauldix/feedzirra/pull/160
+[#195]: https://github.com/pauldix/feedzirra/pull/195
+
+## 0.4.0
+
+* Enhancements
+  * Raise when parser invokes its failure callback [[#159][]]
+  * Add PubSubHubbub hub urls as feed element [[#138][]]
+  * Add support for iTunes image in iTunes RSS item [[#164][]]
+
+* Bug fixes
+  * Use curb callbacks rather than response codes [[#161][]]
+
+[#138]: https://github.com/pauldix/feedzirra/pull/138
+[#159]: https://github.com/pauldix/feedzirra/issues/159
+[#161]: https://github.com/pauldix/feedzirra/pull/161
+[#164]: https://github.com/pauldix/feedzirra/pull/164
+
+## 0.3.0
+
+* General
+  * Add CodeClimate badge [[#192][]]
+
+* Enhancements
+  * CURL SSL Version option [[#156][]]
+  * Cookie support for Curb [[#98][]]
+
+* Deprecations
+  * For `ITunesRSSItem`, use `id` instead of `guid` [[#169][]]
+
+[#98]: https://github.com/pauldix/feedzirra/pull/98
+[#156]: https://github.com/pauldix/feedzirra/pull/156
+[#169]: https://github.com/pauldix/feedzirra/pull/169
+[#192]: https://github.com/pauldix/feedzirra/pull/192
+
+## 0.2.2
+
+* General
+  * Switch to CHANGELOG
+  * Set LICENSE in gemspec
+  * Lots of whitespace cleaning
+  * README updates
+
+* Enhancements
+  * Also use dc:identifier for `entry_id` [[#182][]]
+
+* Bug fixes
+  * Don't try to sanitize non-existent elements [[#174][]]
+  * Fix Rspec deprecations [[#188][]]
+  * Fix Travis [[#191][]]
+
+[#174]: https://github.com/pauldix/feedzirra/pull/174
+[#182]: https://github.com/pauldix/feedzirra/pull/182
+[#188]: https://github.com/pauldix/feedzirra/pull/188
+[#191]: https://github.com/pauldix/feedzirra/pull/191
+
+## 0.2.1
+
+* Use `Time.parse_safely` in `Feed.last_modified_from_header` [[#129][]].
+* Added image to the RSS Entry Parser [[#103][]].
+* Compatibility fixes for Ruby 2.0 [[#136][]].
+* Remove gorillib dependency [[#113][]].
+
+[#103]: https://github.com/pauldix/feedzirra/pull/103
+[#113]: https://github.com/pauldix/feedzirra/pull/113
+[#129]: https://github.com/pauldix/feedzirra/pull/129
+[#136]: https://github.com/pauldix/feedzirra/pull/136
+
+## 0.2.0.rc2
+
+* Bump sax-machine to `v0.2.0.rc1`, fixes encoding issues [[#76][]].
+
+[#76]: https://github.com/pauldix/feedzirra/issues/76
+
+## 0.2.0.rc1
+
+* Remove ActiveSupport dependency
+* No longer tethered to any version of Rails!
+* Update curb (v0.8.0) and rspec (v2.10.0)
+* Revert [3008ceb][]
+* Add Travis-CI integration
+* General repository and gem maintenance
+
+[3008ceb]: https://github.com/pauldix/feedzirra/commit/3008ceb338df1f4c37a211d0aab8a6ad4f584dbc
+
+## 0.1.3
+
+* ?
+
+## 0.1.2
+
+* ?
+
+## 0.1.1
+
+* make FeedEntries enumerable (patch by Daniel Gregoire)
+
+## 0.1.0
+
+* lower builder requirement to make it rails-3 friendly
data/Gemfile
ADDED
@@ -0,0 +1,17 @@
+source 'https://rubygems.org/'
+
+gemspec
+
+group :development, :test do
+  gem 'rake'
+end
+
+group :tools do
+  gem 'guard-rspec'
+  gem 'simplecov', :require => false, :platforms => :mri_19
+end
+
+platforms :rbx do
+  gem 'racc'
+  gem 'rubysl'
+end
data/Guardfile
ADDED
data/README.md
ADDED
@@ -0,0 +1,242 @@
+# Feedjira [![Build Status][travis-badge]][travis] [![Code Climate][code-climate-badge]][code-climate]
+
+[travis-badge]: https://secure.travis-ci.org/feedjira/feedjira.png
+[travis]: http://travis-ci.org/feedjira/feedjira
+[code-climate-badge]: https://codeclimate.com/github/feedjira/feedjira.png
+[code-climate]: https://codeclimate.com/github/feedjira/feedjira
+
+I'd like feedback on the API and any bugs encountered on feeds in the wild. I've
+set up a [google group][].
+
+[google group]: http://groups.google.com/group/feedjira
+
+## Description
+
+Feedjira is a feed library that is designed to get and update many feeds as
+quickly as possible. This includes using libcurl-multi through the [curb][] gem
+for faster HTTP GETs, and libxml through [nokogiri][] and [sax-machine][] for
+faster parsing. Feedjira requires at least Ruby 1.9.2.
+
+[curb]: https://github.com/taf2/curb
+[nokogiri]: https://github.com/sparklemotion/nokogiri
+[sax-machine]: https://github.com/pauldix/sax-machine
+
+Once you have fetched feeds using Feedjira, they can be updated using the feed
+objects. Feedjira automatically inserts ETag and Last-Modified information from
+the HTTP response headers to lower bandwidth usage, eliminate unnecessary
+parsing, and make things speedier in general.
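The bandwidth saving described above relies on HTTP conditional requests. Feedjira performs its requests through curb; the sketch below only illustrates the mechanism with Ruby's standard `Net::HTTP` classes, and the helper name `conditional_request` is a hypothetical choice for this example.

```ruby
require 'net/http'
require 'time'
require 'uri'

# Hypothetical helper: builds a GET request that asks the server to send
# the feed body only if it changed since the validators stored earlier.
def conditional_request(url, etag: nil, last_modified: nil)
  request = Net::HTTP::Get.new(URI(url))
  request['If-None-Match'] = etag if etag
  # Time#httpdate (from 'time') formats the timestamp as an HTTP date.
  request['If-Modified-Since'] = last_modified.httpdate if last_modified
  request
end

# Build (but do not send) a request using values like those in the README.
request = conditional_request(
  'http://feeds.feedburner.com/PaulDixExplainsNothing',
  etag: 'GunxqnEP4NeYhrqq9TyVKTuDnh0',
  last_modified: Time.utc(2009, 1, 31, 22, 58, 16)
)
```

A server that honors these headers can answer `304 Not Modified` with an empty body when nothing changed, so there is nothing to download or parse.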
+
+Another feature present in Feedjira is the ability to create callback functions
+that get called "on success" and "on failure" when getting a feed. This makes it
+easy to do things like log errors or update data stores.
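The callback mechanism can be pictured as a small dispatch wrapper. The sketch below is not Feedjira's implementation: `run_with_callbacks` is a hypothetical helper, the block stands in for the actual fetch, and the failure callback here receives the URL and the exception, which is an assumption made for this example.

```ruby
# Hypothetical stand-in for a fetch: runs the block to obtain a result and
# reports the outcome through the optional :on_success / :on_failure lambdas.
def run_with_callbacks(url, options = {})
  result = yield(url)
  options[:on_success].call(url, result) if options[:on_success]
  result
rescue StandardError => error
  options[:on_failure].call(url, error) if options[:on_failure]
  nil
end

# Example: collect outcomes in a log instead of printing them.
log = []
callbacks = {
  :on_success => lambda { |url, result| log << [:ok, url, result] },
  :on_failure => lambda { |url, error| log << [:failed, url, error.message] }
}

run_with_callbacks('http://example.com/a', callbacks) { |_url| 'parsed feed' }
run_with_callbacks('http://example.com/b', callbacks) { |_url| raise 'timeout' }
```

After both calls, `log` holds one `:ok` entry and one `:failed` entry, which is the kind of bookkeeping the callbacks are meant to make easy.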
+
+The fetching and parsing logic have been decoupled so that either of them can be
+used in isolation if you'd prefer not to use everything that Feedjira offers.
+However, the code examples below use helper methods in the Feed class that put
+everything together to make things as simple as possible.
+
+The final feature of Feedjira is the ability to define custom parsing classes.
+In truth, Feedjira could be used to parse much more than feeds. Microformats,
+page scraping, and almost anything else are fair game.
+
+## Speeding up date parsing
+
+In MRI before 1.9.3, the date parsing code was written in Ruby and was optimized
+for readability over speed. To speed up this part, you can install the
+[home_run][] gem to replace it with an optimized C version. In most cases, if
+you are using Ruby 1.9.3+, you will not need to use home\_run.
+
+[home_run]: https://github.com/jeremyevans/home_run
+
+## Usage
+
+[A gist of the following code](http://gist.github.com/57285)
+
+```ruby
+require 'feedjira'
+
+# fetching a single feed
+feed = Feedjira::Feed.fetch_and_parse("http://feeds.feedburner.com/PaulDixExplainsNothing")
+
+# feed and entries accessors
+feed.title # => "Paul Dix Explains Nothing"
+feed.url # => "http://www.pauldix.net"
+feed.feed_url # => "http://feeds.feedburner.com/PaulDixExplainsNothing"
+feed.etag # => "GunxqnEP4NeYhrqq9TyVKTuDnh0"
+feed.last_modified # => Sat Jan 31 17:58:16 -0500 2009 # it's a Time object
+
+entry = feed.entries.first
+entry.title # => "Ruby Http Client Library Performance"
+entry.url # => "http://www.pauldix.net/2009/01/ruby-http-client-library-performance.html"
+entry.author # => "Paul Dix"
+entry.summary # => "..."
+entry.content # => "..."
+entry.published # => Thu Jan 29 17:00:19 UTC 2009 # it's a Time object
+entry.categories # => ["...", "..."]
+
+# sanitizing an entry's content
+entry.title.sanitize # => returns the title with harmful stuff escaped
+entry.author.sanitize # => returns the author with harmful stuff escaped
+entry.content.sanitize # => returns the content with harmful stuff escaped
+entry.content.sanitize! # => returns content with harmful stuff escaped and replaces original (also exists for author and title)
+entry.sanitize! # => sanitizes the entry's title, author, and content in place (as in, it changes the value to clean versions)
+feed.sanitize_entries! # => sanitizes all entries in place
+
+# updating a single feed
+updated_feed = Feedjira::Feed.update(feed)
+
+# an updated feed has the following extra accessors
+updated_feed.updated? # returns true if any of the feed attributes have been modified. will return false if no new entries
+updated_feed.new_entries # a collection of the entry objects that are newer than the latest in the feed before update
+
+# fetching multiple feeds
+feed_urls = ["http://feeds.feedburner.com/PaulDixExplainsNothing", "http://feeds.feedburner.com/trottercashion"]
+feeds = Feedjira::Feed.fetch_and_parse(feed_urls)
+
+# feeds is now a hash with the feed_urls as keys and the parsed feed objects as values. If an error was thrown
+# there will be a Fixnum of the http response code instead of a feed object
+
+# updating multiple feeds. it expects a collection of feed objects
+updated_feeds = Feedjira::Feed.update(feeds.values)
+
+# defining custom behavior on failure or success. note that a return status of 304 (not updated) will call the on_success handler
+feed = Feedjira::Feed.fetch_and_parse("http://feeds.feedburner.com/PaulDixExplainsNothing",
+  :on_success => lambda {|url, feed| puts feed.title },
+  :on_failure => lambda {|curl, error| puts error })
+
+# if a collection was passed into fetch_and_parse, the handlers will be called for each one
+
+# the behavior for the handlers when using Feedjira::Feed.update is slightly different. The feed passed into on_success will be
+# the updated feed with the standard updated accessors. on failure it will be the original feed object passed into update
+
+# fetching a feed via a proxy (optional)
+feed = Feedjira::Feed.fetch_and_parse("http://feeds.feedburner.com/PaulDixExplainsNothing", {:proxy_url => '10.0.0.1', :proxy_port => 3084})
+```
+
+## Extending
+
+### Adding a feed parsing class
+
+When determining which parser to use for a given XML document, the following
+list of parser classes is used:
+
+* `Feedjira::Parser::RSSFeedBurner`
+* `Feedjira::Parser::GoogleDocsAtom`
+* `Feedjira::Parser::AtomFeedBurner`
+* `Feedjira::Parser::Atom`
+* `Feedjira::Parser::ITunesRSS`
+* `Feedjira::Parser::RSS`
+
+You can insert your own parser at the front of this stack by calling
+`add_feed_class`, like this:
+
+```ruby
+Feedjira::Feed.add_feed_class MyAwesomeParser
+```
+
+Now when you `fetch_and_parse`, `MyAwesomeParser` will be the first one to get a
+chance to parse the feed.
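The "first parser that accepts the document wins" behaviour described above can be pictured with a small registry. This is a self-contained sketch, not Feedjira's code: `ParserRegistry`, `parser_for`, and the toy parser modules are hypothetical, and the `able_to_parse?` method name is an assumption about how a parser signals that it can handle a document.

```ruby
# Hypothetical registry: parsers are consulted in order and the first one
# that claims the document is used.
class ParserRegistry
  def initialize(*parsers)
    @parsers = parsers
  end

  # New parsers go to the front of the list so they are asked first.
  def add_feed_class(parser)
    @parsers.unshift(parser)
  end

  # Returns the first parser that accepts the XML, or nil if none does.
  def parser_for(xml)
    @parsers.find { |parser| parser.able_to_parse?(xml) }
  end
end

# Toy parsers with deliberately simplistic detection rules.
module ToyAtom
  def self.able_to_parse?(xml)
    xml.include?('<feed')
  end
end

module ToyRSS
  def self.able_to_parse?(xml)
    xml.include?('<rss')
  end
end

module MyAwesomeParser
  def self.able_to_parse?(xml)
    xml.include?('<rss') && xml.include?('awesome')
  end
end

registry = ParserRegistry.new(ToyAtom, ToyRSS)
registry.add_feed_class(MyAwesomeParser)
```

Because `MyAwesomeParser` sits at the front, it is chosen for documents it accepts, while everything else still falls through to the parsers registered earlier.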
+
+If you have the XML and just want to provide a parser class for one parse, you
+can specify that using `parse_with`:
+
+```ruby
+Feedjira::Feed.parse_with MyAwesomeParser, xml
+```
+
+### Adding attributes to all feed types / all entry types
+
+```ruby
+# Add the generator attribute to all feed types
+Feedjira::Feed.add_common_feed_element('generator')
+Feedjira::Feed.fetch_and_parse("http://www.pauldix.net/atom.xml").generator # => 'TypePad'
+
+# Add some GeoRss information
+Feedjira::Feed.add_common_feed_entry_element('geo:lat', :as => :lat)
+Feedjira::Feed.fetch_and_parse("http://www.earthpublisher.com/georss.php").entries.each do |e|
+  p "lat: #{e.lat}, long: #{e.long}"
+end
+```
+
+### Adding attributes to only one class
+
+If you want to add attributes for only one class, you simply have to declare them
+in the class:
+
+```ruby
+# Add some GeoRss information
+require 'lib/feedjira/parser/rss_entry'
+
+class Feedjira::Parser::RSSEntry
+  element 'geo:lat', :as => :lat
+  element 'geo:long', :as => :long
+end
+
+# Fetch a feed containing GeoRss info and print them
+Feedjira::Feed.fetch_and_parse("http://www.earthpublisher.com/georss.php").entries.each do |e|
+  p "lat: #{e.lat}, long: #{e.long}"
+end
+```
+
+## Testing
+
+Feedjira uses [curb][] to perform requests. `curb` provides bindings for
+[libcurl][] and supports numerous protocols, including FILE. To test Feedjira
+with a local file, use the `file://` protocol:
+
+[libcurl]: http://curl.haxx.se/libcurl/
+
+```ruby
+feed = Feedjira::Feed.fetch_and_parse('file:///home/feedjira/examples/feed.rss')
+```
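When the fixture lives inside a project, the `file://` URL usually has to be built from a relative path. A minimal sketch, assuming POSIX-style paths without characters that need URL escaping; `file_url` is a hypothetical helper and not part of Feedjira:

```ruby
# Hypothetical helper: turns a relative or absolute path into a file:// URL.
# File.expand_path resolves relative paths against the current directory.
def file_url(path)
  "file://#{File.expand_path(path)}"
end

fixture_url = file_url('/home/feedjira/examples/feed.rss')
```

The resulting string can then be passed wherever a feed URL is expected, for example to `fetch_and_parse` as in the README snippet above.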
+
+## Benchmarks
+
+Since a major goal of Feedjira is speed, benchmarks are provided--see the
+[Benchmark README][benchmark_readme] for more details.
+
+[benchmark_readme]: https://github.com/feedjira/feedjira/blob/master/benchmarks/README.md
+
+## TODO
+
+This thing needs to hammer on many different feeds in the wild. I'm sure there
+will be bugs. I want to find them and crush them. I didn't bother using the test
+suite for feedparser. I wanted to start fresh.
+
+Here are some more specific TODOs.
+
+* Make a feedjira-rails gem to integrate feedjira seamlessly with Rails and ActiveRecord.
+* Add support for authenticated feeds.
+* Create a super sweet DSL for defining new parsers.
+* I'm not keeping track of modified on entries. Should I add this?
+* Clean up the fetching code inside feed.rb so it doesn't suck so hard.
+* Make the feed_spec actually mock stuff out so it doesn't hit the net.
+* Readdress how feeds determine if they can parse a document. Maybe I should use namespaces instead?
+
+## LICENSE
+
+(The MIT License)
+
+Copyright (c) 2009-2013:
+
+- [Paul Dix](http://pauldix.net)
+- [Julien Kirch](http://archiloque.net/)
+- [Ezekiel Templin](http://zeke.templ.in/)
+- [Jon Allured](http://jonallured.com/)
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the 'Software'), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
+the Software, and to permit persons to whom the Software is furnished to do so,
+subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
+FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
+COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
+IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.