damog-feedbag 0.2

Files changed (3)
  1. data/README +76 -0
  2. data/feedbag.rb +137 -0
  3. metadata +64 -0
data/README ADDED
@@ -0,0 +1,76 @@
1
+ Feedbag
2
+ --------------------------------------------------------
3
+ Do you want me to drag my sack across your face?
4
+ - Glenn Quagmire
5
+
6
+ Feedbag is a feed auto-discovery Ruby library. You don't need to know
7
+ more about it.
8
+
9
+ Quick synopsis
10
+ --------------
11
+ ~/axiombox/feedbag :master $ irb
12
+ >> require "feedbag"
13
+ => true
14
+ >> Feedbag.find "http://log.damog.net/"
15
+ => ["http://feeds.feedburner.com/TeoremaDelCerdoInfinito", "http://log.damog.net/comments/feed/"]
16
+ >>
17
+
18
+ Tutorial
19
+ --------
20
+ So you want to know more about it.
21
+
22
+ OK, if the URL passed to the find method is a feed itself, that only
23
+ feed URL will be returned.
24
+
25
+ >> Feedbag.find "github.com/damog.atom"
26
+ => ["http://github.com/damog.atom"]
27
+ >>
28
+
29
+ Otherwise, it will always return LINK feeds first and A (anchor tag) feeds
30
+ later. Among the A feeds, those hosted on the same host as the URL
31
+ have higher priority:
32
+
33
+ >> Feedbag.find "http://ve.planetalinux.org"
34
+ => ["http://feedproxy.google.com/PlanetaLinuxVenezuela", "http://rendergraf.wordpress.com/feed/", "http://rootweiller.wordpress.com/feed/", "http://skatox.com/blog/feed/", "http://kodegeek.com/atom.xml", "http://blog.0x29.com.ve/?feed=rss2&cat=8"]
35
+ >>
36
+
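The ordering described above can be sketched in plain Ruby. This is only an illustration, not Feedbag's actual code: `order_candidates` is a hypothetical helper, the `example.org` URLs are made up, and the hrefs are assumed to have already been extracted from the page.

```ruby
require "uri"

# Hypothetical helper (not part of Feedbag): orders candidate feed URLs as the
# tutorial describes: LINK feeds first, then anchors on the page's own host,
# then anchors on any other host.
def order_candidates(page_url, link_hrefs, anchor_hrefs)
  host = URI.parse(page_url).host
  same_host, other_host = anchor_hrefs.partition do |href|
    anchor_host = URI.parse(href).host
    # A relative href has no host, so it belongs to the page's own host.
    anchor_host.nil? || anchor_host == host
  end
  (link_hrefs + same_host + other_host).uniq
end

ordered = order_candidates(
  "http://example.org/",
  ["http://example.org/atom.xml"],
  ["http://other.example.net/feed/", "/comments/feed/", "http://example.org/rss.xml"]
)
puts ordered
```

Taking `ordered.first` then yields the LINK feed whenever the page declares one.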
37
+ In your application you will usually want only the first element, or the
38
+ first few elements, of the array:
39
+
40
+ >> Feedbag.find("planet.debian.org").first(3)
41
+ => ["http://planet.debian.org/rss10.xml", "http://planet.debian.org/rss20.xml", "http://planet.debian.org/atom.xml"]
42
+ >>
43
+
44
+ Try running that same example without the "first" method. That
45
+ example's host is a blog aggregator, so it has more than a hundred feed URLs:
46
+
47
+ >> Feedbag.find("planet.debian.org").size
48
+ => 104
49
+ >>
50
+
51
+ Feedbag will find them all, but it returns the most important ones
52
+ as the first elements of the returned array.
53
+
54
+ >> Feedbag.find("cnn.com")
55
+ => ["http://rss.cnn.com/rss/cnn_topstories.rss", "http://rss.cnn.com/rss/cnn_latest.rss", "http://rss.cnn.com/services/podcasting/robinmeade/rss.xml"]
56
+ >>
57
+
58
+ Why should you use it?
59
+ ----------------------
60
+ - Because it's cool.
61
+ - Because its only dependency is Hpricot.
62
+ - Because it follows modern feed filename conventions (like those
63
+ used by WordPress blogs, Blogger, etc.).
64
+ - Because it's a single file you can embed easily in your application.
65
+ - Because it passes most of Mark Pilgrim's Atom auto-discovery test
66
+ suite. It doesn't pass them all because some of those tests are
67
+ broken (citation needed).
68
+
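The filename conventions mentioned in the list above can be illustrated with a small pattern check. This is a minimal sketch: `FEED_HINT` and `feed_like?` are hypothetical names, and the exact alternatives in the pattern are an assumption modeled on the heuristic in feedbag.rb rather than a copy of it.

```ruby
# Assumed pattern, for illustration only: matches URLs ending in .rdf, .xml or
# .rss, query strings such as ?feed=rss2, and paths ending in atom/ or feed/.
FEED_HINT = /(\.(rdf|xml|rss)$|feed=(rss|atom)|(atom|feed)\/$)/i

# Hypothetical helper: returns true when the href looks like a feed URL.
def feed_like?(href)
  !!(href =~ FEED_HINT)
end

samples = [
  "http://example.org/feed/",       # WordPress-style feed path
  "http://example.org/?feed=rss2",  # WordPress-style query string
  "http://example.org/atom.xml",    # plain feed filename
  "http://example.org/about.html"   # ordinary page
]
samples.each { |href| puts "#{href} -> #{feed_like?(href)}" }
```

A check like this only guesses from the URL's shape; it does not fetch the URL or inspect its content type.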
69
+ Why did you build it?
70
+ ---------------------
71
+ - Because I liked Benjamin Trott's Feed::Find.
72
+ - Because I thought it would be good to have Feed::Find's functionality
73
+ in Ruby.
74
+ - Because I thought it was going to be easy to maintain.
75
+ - Because I was going to use it on rFeed.
76
+
data/feedbag.rb ADDED
@@ -0,0 +1,137 @@
1
+ #!/usr/bin/ruby
2
+
3
+ # Copyright Axiombox (c) 2008
4
+ # David Moreno <david@axiombox.com> (c) 2008
5
+
6
+ # This program is free software: you can redistribute it and/or modify
7
+ # it under the terms of the GNU General Public License as published by
8
+ # the Free Software Foundation, either version 3 of the License, or
9
+ # (at your option) any later version.
10
+ #
11
+ # This program is distributed in the hope that it will be useful,
12
+ # but WITHOUT ANY WARRANTY; without even the implied warranty of
13
+ # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
14
+ # GNU General Public License for more details.
15
+ #
16
+ # You should have received a copy of the GNU General Public License
17
+ # along with this program. If not, see <http://www.gnu.org/licenses/>.
18
+
19
+
20
+ # Originally written by David Moreno <david@axiombox.com>
21
+ # mainly based on Benjamin Trott's Feed::Find
22
+
23
+ require "rubygems"
24
+ require "hpricot"
25
+ require "open-uri"
26
+ require "net/http"
27
+
28
+ module Feedbag
29
+
30
+ @content_types = [
31
+ 'application/x.atom+xml',
32
+ 'application/atom+xml',
33
+ 'application/xml',
34
+ 'text/xml',
35
+ 'application/rss+xml',
36
+ 'application/rdf+xml',
37
+ ]
38
+
39
+ $feeds = []
40
+ $base_uri = nil
41
+
42
+ def self.find(url)
43
+ $feeds = []
44
+
45
+ url_uri = URI.parse(url)
46
+ url = "#{url_uri.scheme or 'http'}://#{url_uri.host}#{url_uri.path}"
47
+
48
+ begin
49
+ html = open(url) do |f|
50
+ if @content_types.include?(f.content_type.downcase)
51
+ return self.add_feed(url, nil)
52
+ end
53
+
54
+ doc = Hpricot(f.read)
55
+
56
+ if doc.at("base") and doc.at("base")["href"]
57
+ $base_uri = doc.at("base")["href"]
58
+ else
59
+ $base_uri = nil
60
+ end
61
+
62
+ # first with links
63
+ (doc/"link").each do |l|
64
+ next unless l["rel"]
65
+ if l["type"] and @content_types.include?(l["type"].downcase.strip) and (l["rel"].downcase =~ /alternate/i or l["rel"] == "service.feed")
66
+ self.add_feed(l["href"], url, $base_uri)
67
+ end
68
+ end
69
+
70
+ (doc/"a").each do |a|
71
+ next unless a["href"]
72
+ if self.looks_like_feed?(a["href"]) and (a["href"] =~ /\// or a["href"] =~ /#{url_uri.host}/)
73
+ self.add_feed(a["href"], url, $base_uri)
74
+ end
75
+ end
76
+
77
+ (doc/"a").each do |a|
78
+ next unless a["href"]
79
+ if self.looks_like_feed?(a["href"])
80
+ self.add_feed(a["href"], url, $base_uri)
81
+ end
82
+ end
83
+
84
+ end
85
+ rescue OpenURI::HTTPError => the_error
86
+ puts "Error occurred with `#{url}': #{the_error}"
87
+ rescue SocketError => err
88
+ puts "Socket error occurred with: `#{url}': #{err}"
89
+ end
90
+
91
+ $feeds
92
+ end
93
+
94
+ def self.looks_like_feed?(url)
95
+ if url =~ /(\.(rdf|xml|rss)$|feed=(rss|atom)|(atom|feed)\/$)/i
96
+ true
97
+ else
98
+ false
99
+ end
100
+ end
101
+
102
+ def self.add_feed(feed_url, orig_url, base_uri = nil)
103
+ # puts "#{feed_url} - #{orig_url}"
104
+ url = feed_url.sub(/^feed:/, '').strip
105
+
106
+ if base_uri
107
+ # url = base_uri + feed_url
108
+ url = URI.parse(base_uri).merge(feed_url).to_s
109
+ end
110
+
111
+ begin
112
+ uri = URI.parse(url)
113
+ rescue
114
+ puts "Error with `#{url}'"
115
+ exit 1
116
+ end
117
+ unless uri.absolute?
118
+ orig = URI.parse(orig_url)
119
+ url = orig.merge(url).to_s
120
+ end
121
+
122
+ # verify url is really valid
123
+ $feeds.push(url) unless $feeds.include?(url)# if self._is_http_valid(URI.parse(url), orig_url)
124
+ end
125
+
126
+ def self._is_http_valid(uri, orig_url)
127
+ req = Net::HTTP.get_response(uri)
128
+ orig_uri = URI.parse(orig_url)
129
+ case req
130
+ when Net::HTTPSuccess then
131
+ return true
132
+ else
133
+ return false
134
+ end
135
+ end
136
+ end
137
+
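The URL handling in `add_feed` above (strip a `feed:` prefix, then resolve against the `<base>` href or the page URL) can be sketched with the standard library's `URI#merge`. This is a simplified illustration, not a drop-in replacement: `resolve_feed_url` is a hypothetical name and the `example.org` URLs are made up.

```ruby
require "uri"

# Hypothetical helper: turns a feed href found on a page into an absolute URL.
# It prefers the page's <base> href when one is given and otherwise resolves
# against the page URL. URI#merge leaves an already absolute href unchanged.
def resolve_feed_url(feed_url, page_url, base_uri = nil)
  cleaned = feed_url.sub(/^feed:/, "").strip
  URI.parse(base_uri || page_url).merge(cleaned).to_s
end

puts resolve_feed_url("/rss.xml", "http://example.org/blog/")
puts resolve_feed_url("feed/", "http://example.org/blog/")
puts resolve_feed_url("feed:http://feeds.example.net/a.xml", "http://example.org/")
puts resolve_feed_url("atom.xml", "http://example.org/blog/", "http://cdn.example.org/static/")
```

Unlike `add_feed`, this sketch lets `URI::InvalidURIError` propagate to the caller instead of calling `exit`, which is a design choice of the example and not something the original code does.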
metadata ADDED
@@ -0,0 +1,64 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: damog-feedbag
3
+ version: !ruby/object:Gem::Version
4
+ version: "0.2"
5
+ platform: ruby
6
+ authors:
7
+ - Axiombox
8
+ - David Moreno
9
+ autorequire:
10
+ bindir: bin
11
+ cert_chain: []
12
+
13
+ date: 2008-12-25 00:00:00 -08:00
14
+ default_executable:
15
+ dependencies:
16
+ - !ruby/object:Gem::Dependency
17
+ name: hpricot
18
+ version_requirement:
19
+ version_requirements: !ruby/object:Gem::Requirement
20
+ requirements:
21
+ - - ">="
22
+ - !ruby/object:Gem::Version
23
+ version: "0"
24
+ version:
25
+ description: Ruby's favorite feed auto-discovery tool
26
+ email: david@axiombox.com
27
+ executables: []
28
+
29
+ extensions: []
30
+
31
+ extra_rdoc_files:
32
+ - README
33
+ files:
34
+ - feedbag.rb
35
+ - README
36
+ has_rdoc: true
37
+ homepage: http://axiombox.com/feedbag
38
+ post_install_message:
39
+ rdoc_options:
40
+ - --main
41
+ - README
42
+ require_paths:
43
+ - lib
44
+ required_ruby_version: !ruby/object:Gem::Requirement
45
+ requirements:
46
+ - - ">="
47
+ - !ruby/object:Gem::Version
48
+ version: "0"
49
+ version:
50
+ required_rubygems_version: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - ">="
53
+ - !ruby/object:Gem::Version
54
+ version: "0"
55
+ version:
56
+ requirements: []
57
+
58
+ rubyforge_project: feedbag
59
+ rubygems_version: 1.2.0
60
+ signing_key:
61
+ specification_version: 2
62
+ summary: Ruby's favorite feed auto-discovery tool
63
+ test_files: []
64
+