feedjira 0.9.0
- checksums.yaml +7 -0
- data/.gitignore +14 -0
- data/.rspec +1 -0
- data/.travis.yml +8 -0
- data/CHANGELOG.md +162 -0
- data/Gemfile +17 -0
- data/Guardfile +5 -0
- data/README.md +242 -0
- data/Rakefile +6 -0
- data/benchmarks/README.md +90 -0
- data/benchmarks/basic.rb +31 -0
- data/benchmarks/feed_list.txt +10 -0
- data/benchmarks/feed_xml/apple.xml +149 -0
- data/benchmarks/feed_xml/cnn.xml +278 -0
- data/benchmarks/feed_xml/daring_fireball.xml +1697 -0
- data/benchmarks/feed_xml/engadget.xml +604 -0
- data/benchmarks/feed_xml/feedjira_commits.xml +370 -0
- data/benchmarks/feed_xml/gizmodo.xml +2 -0
- data/benchmarks/feed_xml/loop.xml +441 -0
- data/benchmarks/feed_xml/rails.xml +1938 -0
- data/benchmarks/feed_xml/white_house.xml +951 -0
- data/benchmarks/feed_xml/xkcd.xml +2 -0
- data/benchmarks/fetching_systems.rb +23 -0
- data/benchmarks/other_libraries.rb +73 -0
- data/feedjira.gemspec +27 -0
- data/lib/feedjira.rb +16 -0
- data/lib/feedjira/core_ext.rb +3 -0
- data/lib/feedjira/core_ext/date.rb +19 -0
- data/lib/feedjira/core_ext/string.rb +9 -0
- data/lib/feedjira/core_ext/time.rb +31 -0
- data/lib/feedjira/feed.rb +459 -0
- data/lib/feedjira/feed_entry_utilities.rb +66 -0
- data/lib/feedjira/feed_utilities.rb +103 -0
- data/lib/feedjira/parser.rb +20 -0
- data/lib/feedjira/parser/atom.rb +61 -0
- data/lib/feedjira/parser/atom_entry.rb +34 -0
- data/lib/feedjira/parser/atom_feed_burner.rb +22 -0
- data/lib/feedjira/parser/atom_feed_burner_entry.rb +35 -0
- data/lib/feedjira/parser/google_docs_atom.rb +28 -0
- data/lib/feedjira/parser/google_docs_atom_entry.rb +29 -0
- data/lib/feedjira/parser/itunes_rss.rb +50 -0
- data/lib/feedjira/parser/itunes_rss_item.rb +41 -0
- data/lib/feedjira/parser/itunes_rss_owner.rb +12 -0
- data/lib/feedjira/parser/rss.rb +24 -0
- data/lib/feedjira/parser/rss_entry.rb +37 -0
- data/lib/feedjira/parser/rss_feed_burner.rb +23 -0
- data/lib/feedjira/parser/rss_feed_burner_entry.rb +43 -0
- data/lib/feedjira/version.rb +3 -0
- data/spec/feedjira/feed_entry_utilities_spec.rb +62 -0
- data/spec/feedjira/feed_spec.rb +762 -0
- data/spec/feedjira/feed_utilities_spec.rb +273 -0
- data/spec/feedjira/parser/atom_entry_spec.rb +86 -0
- data/spec/feedjira/parser/atom_feed_burner_entry_spec.rb +47 -0
- data/spec/feedjira/parser/atom_feed_burner_spec.rb +56 -0
- data/spec/feedjira/parser/atom_spec.rb +76 -0
- data/spec/feedjira/parser/google_docs_atom_entry_spec.rb +22 -0
- data/spec/feedjira/parser/google_docs_atom_spec.rb +31 -0
- data/spec/feedjira/parser/itunes_rss_item_spec.rb +63 -0
- data/spec/feedjira/parser/itunes_rss_owner_spec.rb +18 -0
- data/spec/feedjira/parser/itunes_rss_spec.rb +58 -0
- data/spec/feedjira/parser/rss_entry_spec.rb +85 -0
- data/spec/feedjira/parser/rss_feed_burner_entry_spec.rb +85 -0
- data/spec/feedjira/parser/rss_feed_burner_spec.rb +57 -0
- data/spec/feedjira/parser/rss_spec.rb +57 -0
- data/spec/sample_feeds/AmazonWebServicesBlog.xml +797 -0
- data/spec/sample_feeds/AmazonWebServicesBlogFirstEntryContent.xml +63 -0
- data/spec/sample_feeds/AtomFeedWithSpacesAroundEquals.xml +61 -0
- data/spec/sample_feeds/FeedBurnerUrlNoAlternate.xml +28 -0
- data/spec/sample_feeds/GoogleDocsList.xml +188 -0
- data/spec/sample_feeds/HREFConsideredHarmful.xml +314 -0
- data/spec/sample_feeds/HREFConsideredHarmfulFirstEntry.xml +22 -0
- data/spec/sample_feeds/ITunesWithSpacesInAttributes.xml +63 -0
- data/spec/sample_feeds/PaulDixExplainsNothing.xml +175 -0
- data/spec/sample_feeds/PaulDixExplainsNothingAlternate.xml +175 -0
- data/spec/sample_feeds/PaulDixExplainsNothingFirstEntryContent.xml +19 -0
- data/spec/sample_feeds/PaulDixExplainsNothingWFW.xml +174 -0
- data/spec/sample_feeds/SamRuby.xml +583 -0
- data/spec/sample_feeds/TechCrunch.xml +1515 -0
- data/spec/sample_feeds/TechCrunchFirstEntry.xml +9 -0
- data/spec/sample_feeds/TechCrunchFirstEntryDescription.xml +3 -0
- data/spec/sample_feeds/TenderLovemaking.xml +516 -0
- data/spec/sample_feeds/TenderLovemakingFirstEntry.xml +66 -0
- data/spec/sample_feeds/TrotterCashionHome.xml +611 -0
- data/spec/sample_feeds/TypePadNews.xml +368 -0
- data/spec/sample_feeds/atom_with_link_tag_for_url_unmarked.xml +31 -0
- data/spec/sample_feeds/itunes.xml +67 -0
- data/spec/sample_feeds/pet_atom.xml +497 -0
- data/spec/spec_helper.rb +88 -0
- metadata +229 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA1:
+  metadata.gz: 1579ca4b98963d52648b62e13ce1100d09744a1b
+  data.tar.gz: f3fee2d061ffb4ad9bf54b363963abd1ac313970
+SHA512:
+  metadata.gz: 78294bd172b74e4f9e784bbabd7bfee8243f22457f16ca6946eeb12dc762b90252d0289480c04aae42727346718a8b0d41d077f5867e7a381c71796245fc68aa
+  data.tar.gz: febf23842faa2a597f5fdcae9e0c470d8d948a00e6d592115af82dc0dbb10baacee12678c02567e7d2d2fb49e8a6d12613676c9de21d4f8d5b1213222aeae0e9
data/.gitignore
ADDED
data/.rspec
ADDED
@@ -0,0 +1 @@
+--color
data/.travis.yml
ADDED
data/CHANGELOG.md
ADDED
@@ -0,0 +1,162 @@
+# Feedjira Changelog
+
+## 0.9.0
+
+* Project renamed to Feedjira
+
+## 0.7.1
+
+* Bugfix
+  * Don't use entry id for updating when feed doesn't provide it [[#205][]]
+
+[#205]: https://github.com/pauldix/feedzirra/pull/205
+
+## 0.7.0
+
+* General
+  * README update for callback arity [[#202][]]
+
+* Enhancements
+  * Add error info to `on_failure` callback [[#194][]]
+  * On failure callbacks get curl and error as args
+  * Bugfix for parsing dates that are ISO 8601 with milliseconds [[#203][]]
+
+[#194]: https://github.com/pauldix/feedzirra/pull/194
+[#202]: https://github.com/pauldix/feedzirra/pull/202
+[#203]: https://github.com/pauldix/feedzirra/pull/203
+
+## 0.6.0
+
+* General
+  * Update expected parser classes in docs [[#200][]]
+  * Fix Rubinius issue with Travis
+
+* Enhancements
+  * Added content to `itunes_rss_item` [[#198][]]
+  * Allow user to pass a particular parser using `parse_with`
+  * Strip leading whitespace from XML [[#196][]]
+  * Parse out RSS version [[#172][]]
+  * Add generic preprocessing hook for Parsers
+  * Add preprocessing hook for Atom XHTML content [[#58][]] [[#130][]]
+
+[#58]: https://github.com/pauldix/feedzirra/pull/58
+[#130]: https://github.com/pauldix/feedzirra/issues/130
+[#172]: https://github.com/pauldix/feedzirra/issues/172
+[#196]: https://github.com/pauldix/feedzirra/pull/196
+[#198]: https://github.com/pauldix/feedzirra/pull/198
+[#200]: https://github.com/pauldix/feedzirra/pull/200
+
+## 0.5.0
+
+* General
+  * Lots of README cleanup
+  * Remove pending specs
+  * Rewrite benchmarks and move them out of the spec folder
+  * Upgrade to latest Rspec
+
+* Enhancements
+  * Allow spaces in rss tag when checking parse-ability [[#127][]]
+  * Compare `entry_id` and `url` for finding new entries [[#195][]]
+  * Add closed captioned and order tags for iTunesRSSItem [[#160][]]
+
+[#127]: https://github.com/pauldix/feedzirra/pull/127
+[#160]: https://github.com/pauldix/feedzirra/pull/160
+[#195]: https://github.com/pauldix/feedzirra/pull/195
+
+## 0.4.0
+
+* Enhancements
+  * Raise when parser invokes its failure callback [[#159][]]
+  * Add PubSubHubbub hub urls as feed element [[#138][]]
+  * Add support for iTunes image in iTunes RSS item [[#164][]]
+
+* Bug fixes
+  * Use curb callbacks rather than response codes [[#161][]]
+
+[#138]: https://github.com/pauldix/feedzirra/pull/138
+[#159]: https://github.com/pauldix/feedzirra/issues/159
+[#161]: https://github.com/pauldix/feedzirra/pull/161
+[#164]: https://github.com/pauldix/feedzirra/pull/164
+
+## 0.3.0
+
+* General
+  * Add CodeClimate badge [[#192][]]
+
+* Enhancements
+  * CURL SSL Version option [[#156][]]
+  * Cookie support for Curb [[#98][]]
+
+* Deprecations
+  * For `ITunesRSSItem`, use `id` instead of `guid` [[#169][]]
+
+[#98]: https://github.com/pauldix/feedzirra/pull/98
+[#156]: https://github.com/pauldix/feedzirra/pull/156
+[#169]: https://github.com/pauldix/feedzirra/pull/169
+[#192]: https://github.com/pauldix/feedzirra/pull/192
+
+## 0.2.2
+
+* General
+  * Switch to CHANGELOG
+  * Set LICENSE in gemspec
+  * Lots of whitespace cleaning
+  * README updates
+
+* Enhancements
+  * Also use dc:identifier for `entry_id` [[#182][]]
+
+* Bug fixes
+  * Don't try to sanitize non-existent elements [[#174][]]
+  * Fix Rspec deprecations [[#188][]]
+  * Fix Travis [[#191][]]
+
+[#174]: https://github.com/pauldix/feedzirra/pull/174
+[#182]: https://github.com/pauldix/feedzirra/pull/182
+[#188]: https://github.com/pauldix/feedzirra/pull/188
+[#191]: https://github.com/pauldix/feedzirra/pull/191
+
+## 0.2.1
+
+* Use `Time.parse_safely` in `Feed.last_modified_from_header` [[#129][]].
+* Added image to the RSS Entry Parser [[#103][]].
+* Compatibility fixes for Ruby 2.0 [[#136][]].
+* Remove gorillib dependency [[#113][]].
+
+[#103]: https://github.com/pauldix/feedzirra/pull/103
+[#113]: https://github.com/pauldix/feedzirra/pull/113
+[#129]: https://github.com/pauldix/feedzirra/pull/129
+[#136]: https://github.com/pauldix/feedzirra/pull/136
+
+## 0.2.0.rc2
+
+* Bump sax-machine to `v0.2.0.rc1`, fixes encoding issues [[#76][]].
+
+[#76]: https://github.com/pauldix/feedzirra/issues/76
+
+## 0.2.0.rc1
+
+* Remove ActiveSupport dependency
+* No longer tethered to any version of Rails!
+* Update curb (v0.8.0) and rspec (v2.10.0)
+* Revert [3008ceb][]
+* Add Travis-CI integration
+* General repository and gem maintenance
+
+[3008ceb]: https://github.com/pauldix/feedzirra/commit/3008ceb338df1f4c37a211d0aab8a6ad4f584dbc
+
+## 0.1.3
+
+* ?
+
+## 0.1.2
+
+* ?
+
+## 0.1.1
+
+* make FeedEntries enumerable (patch by Daniel Gregoire)
+
+## 0.1.0
+
+* lower builder requirement to make it rails-3 friendly
data/Gemfile
ADDED
@@ -0,0 +1,17 @@
+source 'https://rubygems.org/'
+
+gemspec
+
+group :development, :test do
+  gem 'rake'
+end
+
+group :tools do
+  gem 'guard-rspec'
+  gem 'simplecov', :require => false, :platforms => :mri_19
+end
+
+platforms :rbx do
+  gem 'racc'
+  gem 'rubysl'
+end
data/Guardfile
ADDED
data/README.md
ADDED
@@ -0,0 +1,242 @@
+# Feedjira [![Build Status][travis-badge]][travis] [![Code Climate][code-climate-badge]][code-climate]
+
+[travis-badge]: https://secure.travis-ci.org/feedjira/feedjira.png
+[travis]: http://travis-ci.org/feedjira/feedjira
+[code-climate-badge]: https://codeclimate.com/github/feedjira/feedjira.png
+[code-climate]: https://codeclimate.com/github/feedjira/feedjira
+
+I'd like feedback on the api and any bugs encountered on feeds in the wild. I've
+set up a [google group][].
+
+[google group]: http://groups.google.com/group/feedjira
+
+## Description
+
+Feedjira is a feed library that is designed to get and update many feeds as
+quickly as possible. This includes using libcurl-multi through the [curb][] gem
+for faster http gets, and libxml through [nokogiri][] and [sax-machine][] for
+faster parsing. Feedjira requires at least Ruby 1.9.2.
+
+[curb]: https://github.com/taf2/curb
+[nokogiri]: https://github.com/sparklemotion/nokogiri
+[sax-machine]: https://github.com/pauldix/sax-machine
+
+Once you have fetched feeds using Feedjira, they can be updated using the feed
+objects. Feedjira automatically inserts etag and last-modified information from
+the http response headers to lower bandwidth usage, eliminate unnecessary
+parsing, and make things speedier in general.
+
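The etag and last-modified handling described in the README corresponds to an HTTP conditional GET. The sketch below only illustrates which request headers are involved. It uses `Net::HTTP` from the standard library rather than curb, builds the request without sending it, and the helper name `conditional_request` is made up for this example, so it should not be read as Feedjira's own code.

```ruby
require 'net/http'
require 'uri'
require 'time'

# Build (but do not send) a conditional GET for a previously fetched feed.
# `etag` and `last_modified` stand in for the values a feed object exposes.
def conditional_request(feed_url, etag, last_modified)
  request = Net::HTTP::Get.new(URI(feed_url))
  request['If-None-Match'] = etag if etag
  request['If-Modified-Since'] = last_modified.httpdate if last_modified
  request
end

request = conditional_request(
  'http://feeds.feedburner.com/PaulDixExplainsNothing',
  'GunxqnEP4NeYhrqq9TyVKTuDnh0',
  Time.utc(2009, 1, 31, 22, 58, 16)
)

puts request['If-None-Match']
puts request['If-Modified-Since']
```

A server that honors these headers can answer 304 with an empty body, which is where the bandwidth and parsing savings come from.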
+Another feature present in Feedjira is the ability to create callback functions
+that get called "on success" and "on failure" when getting a feed. This makes it
+easy to do things like log errors or update data stores.
+
+The fetching and parsing logic have been decoupled so that either of them can be
+used in isolation if you'd prefer not to use everything that Feedjira offers.
+However, the code examples below use helper methods in the Feed class that put
+everything together to make things as simple as possible.
+
+The final feature of Feedjira is the ability to define custom parsing classes.
+In truth, Feedjira could be used to parse much more than feeds. Microformats,
+page scraping, and almost anything else are fair game.
+
+## Speedup date parsing
+
+In MRI before 1.9.3, the date parsing code was written in Ruby and optimized
+for readability over speed. To speed up this part, you can install the
+[home_run][] gem to replace it with an optimized C version. In most cases, if
+you are using Ruby 1.9.3+, you will not need to use home\_run.
+
+[home_run]: https://github.com/jeremyevans/home_run
+
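To judge whether date parsing is a bottleneck on a given Ruby, a quick timing run is usually enough. This is a rough sketch using only the standard library. The sample date string and the iteration count are arbitrary choices for the example, not values taken from the Feedjira benchmarks.

```ruby
require 'benchmark'
require 'time'

# Arbitrary RFC 822 style timestamp, similar to what RSS feeds carry.
sample = 'Thu, 29 Jan 2009 17:00:19 GMT'
iterations = 10_000

# Time how long the interpreter takes to parse the same date repeatedly.
elapsed = Benchmark.realtime do
  iterations.times { Time.parse(sample) }
end

puts format('%d parses in %.3f seconds', iterations, elapsed)
```

If the reported time is negligible next to network fetching, swapping in a faster date library is unlikely to matter.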
+## Usage
+
+[A gist of the following code](http://gist.github.com/57285)
+
+```ruby
+require 'feedjira'
+
+# fetching a single feed
+feed = Feedjira::Feed.fetch_and_parse("http://feeds.feedburner.com/PaulDixExplainsNothing")
+
+# feed and entries accessors
+feed.title # => "Paul Dix Explains Nothing"
+feed.url # => "http://www.pauldix.net"
+feed.feed_url # => "http://feeds.feedburner.com/PaulDixExplainsNothing"
+feed.etag # => "GunxqnEP4NeYhrqq9TyVKTuDnh0"
+feed.last_modified # => Sat Jan 31 17:58:16 -0500 2009 # it's a Time object
+
+entry = feed.entries.first
+entry.title # => "Ruby Http Client Library Performance"
+entry.url # => "http://www.pauldix.net/2009/01/ruby-http-client-library-performance.html"
+entry.author # => "Paul Dix"
+entry.summary # => "..."
+entry.content # => "..."
+entry.published # => Thu Jan 29 17:00:19 UTC 2009 # it's a Time object
+entry.categories # => ["...", "..."]
+
+# sanitizing an entry's content
+entry.title.sanitize # => returns the title with harmful stuff escaped
+entry.author.sanitize # => returns the author with harmful stuff escaped
+entry.content.sanitize # => returns the content with harmful stuff escaped
+entry.content.sanitize! # => returns content with harmful stuff escaped and replaces original (also exists for author and title)
+entry.sanitize! # => sanitizes the entry's title, author, and content in place (as in, it changes the value to clean versions)
+feed.sanitize_entries! # => sanitizes all entries in place
+
+# updating a single feed
+updated_feed = Feedjira::Feed.update(feed)
+
+# an updated feed has the following extra accessors
+updated_feed.updated? # returns true if any of the feed attributes have been modified. will return false if no new entries
+updated_feed.new_entries # a collection of the entry objects that are newer than the latest in the feed before update
+
+# fetching multiple feeds
+feed_urls = ["http://feeds.feedburner.com/PaulDixExplainsNothing", "http://feeds.feedburner.com/trottercashion"]
+feeds = Feedjira::Feed.fetch_and_parse(feed_urls)
+
+# feeds is now a hash with the feed_urls as keys and the parsed feed objects as values. If an error was thrown
+# there will be a Fixnum of the http response code instead of a feed object
+
+# updating multiple feeds. it expects a collection of feed objects
+updated_feeds = Feedjira::Feed.update(feeds.values)
+
+# defining custom behavior on failure or success. note that a return status of 304 (not updated) will call the on_success handler
+feed = Feedjira::Feed.fetch_and_parse("http://feeds.feedburner.com/PaulDixExplainsNothing",
+  :on_success => lambda {|url, feed| puts feed.title },
+  :on_failure => lambda {|curl, error| puts error })
+
+# if a collection was passed into fetch_and_parse, the handlers will be called for each one
+
+# the behavior for the handlers when using Feedjira::Feed.update is slightly different. The feed passed into on_success will be
+# the updated feed with the standard updated accessors. on failure it will be the original feed object passed into update
+
+# fetching a feed via a proxy (optional)
+feed = Feedjira::Feed.fetch_and_parse("http://feeds.feedburner.com/PaulDixExplainsNothing", {:proxy_url => '10.0.0.1', :proxy_port => 3084})
+```
+
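Because the hash returned for multiple feeds mixes parsed feed objects with integer response codes, callers usually want to separate the two before going further. A minimal sketch of that step follows. `FakeFeed` and the contents of `results` are stand-ins invented for this example so it runs without the gem or the network.

```ruby
# Stand-in for a parsed feed object; real ones come from fetch_and_parse.
FakeFeed = Struct.new(:title)

# Hypothetical result hash shaped like the one the README describes:
# feed URLs as keys, with either a feed object or an HTTP status code as value.
results = {
  'http://feeds.feedburner.com/PaulDixExplainsNothing' => FakeFeed.new('Paul Dix Explains Nothing'),
  'http://feeds.feedburner.com/trottercashion' => 404
}

# Split the hash into failures (integer codes) and successfully parsed feeds.
failed, parsed = results.partition { |_url, result| result.is_a?(Integer) }.map(&:to_h)

parsed.each { |url, feed| puts "#{url}: #{feed.title}" }
failed.each { |url, code| puts "#{url}: failed with HTTP #{code}" }
```

Only `parsed.values` would then be a safe argument for a later `Feedjira::Feed.update` call, since `update` expects feed objects.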
+## Extending
+
+### Adding a feed parsing class
+
+When determining which parser to use for a given XML document, the following
+list of parser classes is used:
+
+* `Feedjira::Parser::RSSFeedBurner`
+* `Feedjira::Parser::GoogleDocsAtom`
+* `Feedjira::Parser::AtomFeedBurner`
+* `Feedjira::Parser::Atom`
+* `Feedjira::Parser::ITunesRSS`
+* `Feedjira::Parser::RSS`
+
+You can insert your own parser at the front of this stack by calling
+`add_feed_class`, like this:
+
+```ruby
+Feedjira::Feed.add_feed_class MyAwesomeParser
+```
+
+Now when you `fetch_and_parse`, `MyAwesomeParser` will be the first one to get a
+chance to parse the feed.
+
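The "front of the stack" behavior can be pictured with a small plain-Ruby model. This is a simplified sketch and not Feedjira's source. `ParserStack`, `determine_parser`, `AtomLike`, and `RSSLike` are hypothetical names, and the string checks inside each `able_to_parse?` are deliberately naive.

```ruby
# A simplified, hypothetical model of a parser stack.
class ParserStack
  def initialize(*parsers)
    @parsers = parsers
  end

  # Mirrors the idea behind add_feed_class: new parsers go to the front.
  def add_feed_class(klass)
    @parsers.unshift(klass)
  end

  # Return the first parser class that claims the document, or nil.
  def determine_parser(xml)
    @parsers.find { |parser| parser.able_to_parse?(xml) }
  end
end

class AtomLike
  def self.able_to_parse?(xml)
    xml.include?('<feed')
  end
end

class RSSLike
  def self.able_to_parse?(xml)
    xml.include?('<rss')
  end
end

# A custom parser that only claims RSS documents mentioning "awesome".
class MyAwesomeParser
  def self.able_to_parse?(xml)
    xml.include?('<rss') && xml.include?('awesome')
  end
end

stack = ParserStack.new(AtomLike, RSSLike)
stack.add_feed_class(MyAwesomeParser)

puts stack.determine_parser('<rss version="2.0"><channel><title>awesome</title></channel></rss>') # MyAwesomeParser
puts stack.determine_parser('<rss version="2.0"></rss>')                                        # RSSLike
```

Since the first match wins, a custom parser's check should be specific enough that it does not swallow documents meant for the built-in parsers.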
+If you have the XML and just want to provide a parser class for one parse, you
+can specify that using `parse_with`:
+
+```ruby
+Feedjira::Feed.parse_with MyAwesomeParser, xml
+```
+
+### Adding attributes to all feed types / all entry types
+
+```ruby
+# Add the generator attribute to all feed types
+Feedjira::Feed.add_common_feed_element('generator')
+Feedjira::Feed.fetch_and_parse("http://www.pauldix.net/atom.xml").generator # => 'TypePad'
+
+# Add some GeoRss information
+Feedjira::Feed.add_common_feed_entry_element('geo:lat', :as => :lat)
+Feedjira::Feed.add_common_feed_entry_element('geo:long', :as => :long)
+Feedjira::Feed.fetch_and_parse("http://www.earthpublisher.com/georss.php").entries.each do |e|
+  p "lat: #{e.lat}, long: #{e.long}"
+end
+```
+
+### Adding attributes to only one class
+
+If you want to add attributes for only one class you simply have to declare them
+in the class:
+
+```ruby
+# Add some GeoRss information
+require 'feedjira'
+
+class Feedjira::Parser::RSSEntry
+  element 'geo:lat', :as => :lat
+  element 'geo:long', :as => :long
+end
+
+# Fetch a feed containing GeoRss info and print it
+Feedjira::Feed.fetch_and_parse("http://www.earthpublisher.com/georss.php").entries.each do |e|
+  p "lat: #{e.lat}, long: #{e.long}"
+end
+```
+
+## Testing
+
+Feedjira uses [curb][] to perform requests. `curb` provides bindings for
+[libcurl][] and supports numerous protocols, including FILE. To test Feedjira
+with a local file, use the `file://` protocol:
+
+[libcurl]: http://curl.haxx.se/libcurl/
+
+```ruby
+feed = Feedjira::Feed.fetch_and_parse('file:///home/feedjira/examples/feed.rss')
+```
+
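In a test suite, the `file://` URL is often built from a fixture written at run time rather than a fixed path. The sketch below shows one way to do that with only the standard library. The fixture contents are made up, and the final `fetch_and_parse` call is left commented out so the example runs without the gem installed.

```ruby
require 'tempfile'

# Write a tiny RSS document to a temporary file (contents invented for the example).
fixture = Tempfile.new(['feed', '.rss'])
fixture.write(%(<?xml version="1.0"?>\n<rss version="2.0"><channel><title>Local fixture</title></channel></rss>\n))
fixture.close

# Absolute paths start with "/", so this yields a file:/// URL.
feed_url = "file://#{File.expand_path(fixture.path)}"
puts feed_url

# With Feedjira loaded, the URL could then be passed along:
# feed = Feedjira::Feed.fetch_and_parse(feed_url)
```

Paths containing spaces or other reserved characters would need URL escaping first, which this sketch does not handle.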
+## Benchmarks
+
+Since a major goal of Feedjira is speed, benchmarks are provided--see the
+[Benchmark README][benchmark_readme] for more details.
+
+[benchmark_readme]: https://github.com/feedjira/feedjira/blob/master/benchmarks/README.md
+
+## TODO
+
+This thing needs to hammer on many different feeds in the wild. I'm sure there
+will be bugs. I want to find them and crush them. I didn't bother using the test
+suite for feedparser. I wanted to start fresh.
+
+Here are some more specific TODOs.
+
+* Make a feedjira-rails gem to integrate feedjira seamlessly with Rails and ActiveRecord.
+* Add support for authenticated feeds.
+* Create a super sweet DSL for defining new parsers.
+* I'm not keeping track of modified on entries. Should I add this?
+* Clean up the fetching code inside feed.rb so it doesn't suck so hard.
+* Make the feed_spec actually mock stuff out so it doesn't hit the net.
+* Readdress how feeds determine if they can parse a document. Maybe I should use namespaces instead?
+
+## LICENSE
+
+(The MIT License)
+
+Copyright (c) 2009-2013:
+
+- [Paul Dix](http://pauldix.net)
+- [Julien Kirch](http://archiloque.net/)
+- [Ezekiel Templin](http://zeke.templ.in/)
+- [Jon Allured](http://jonallured.com/)
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the 'Software'), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
+the Software, and to permit persons to whom the Software is furnished to do so,
+subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
+FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
+COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
+IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.