penso-feedbag 0.5.100 → 0.6.1

data/.gitignore ADDED
@@ -0,0 +1,30 @@
+ *.swp
+ **/*.pid
+ log/*.log
+ log/*.pid
+ tmp
+ .DS_Store
+ public/cache/**/*
+ public/system/**/*
+ doc/**/*
+ db/*.sqlite3
+ .project
+ .loadpath
+ nbproject/
+ .idea
+ testjour.log
+ *.so
+ *.o
+ Makefile
+ mkmf.log
+ *.bundle
+ conftest
+ content/
+ .idea
+ *.sw?
+ .DS_Store
+ coverage
+ rdoc
+ pkg
+ pkg/*
+ log/*
data/ChangeLog ADDED
@@ -0,0 +1,20 @@
+ * 0.5.99 - Tue May 12 12:52:22 EDT 2009
+   - Added rails/init.rb to load easily on a Rails app.
+
+ * 0.5.13.1 - Wed Apr 22 11:16:19 EDT 2009
+   - Changed args on find() from nil to {}
+
+ * 0.5.13 - Wed Apr 22 11:12:40 EDT 2009
+   - Added :narrow option so find() skips feed_validate and A links.
+
+ * 0.5.12 - Fri Mar 20 12:34:48 EDT 2009
+   - Added support for "feed://" URLs
+
+ * 0.5.11 - Sat Mar 7 17:22:30 EST 2009
+   - Benchmark against Rfeedfinder added.
+
+ * 0.5.10 - Wed Mar 4 13:32:33 EST 2009
+   - Feeds whose URLs contained query string arguments were not being
+     auto-discovered -- fixed
+
+ ** For previous changes, see the git log
data/README.markdown CHANGED
@@ -1,7 +1,6 @@
  Feedbag
  =======
- > Do you want me to drag my sack across your face?
- > - Glenn Quagmire
+ Forked version of feedback that returns title and url.
 
  Feedbag is a feed auto-discovery Ruby library. You don't need to know more about it. It is said to be:
 
data/TODO ADDED
@@ -0,0 +1 @@
+ - Document Feedbag.feed?
data/VERSION ADDED
@@ -0,0 +1 @@
+ 0.6.0
data/feedbag.gemspec ADDED
@@ -0,0 +1,20 @@
+ # -*- encoding: utf-8 -*-
+
+ Gem::Specification.new do |s|
+   s.name = %q{feedbag}
+   s.version = "0.5.103"
+   s.homepage = "http://axiombox.com/feedbag"
+   #s.rubyforge_project = "feedbag"
+
+   s.authors = ["Axiombox", "David Moreno"]
+   s.date = %q{2009-02-10}
+   s.description = %q{Ruby's favorite feed auto-discoverty tool}
+   s.email = %q{david@axiombox.com}
+   s.extra_rdoc_files = ["README.markdown", "COPYING"]
+   s.files = ["lib/feedbag.rb", "benchmark/rfeedfinder_benchmark.rb"]
+   s.has_rdoc = true
+   s.rdoc_options = ["--main", "README.markdown"]
+   s.summary = %q{Ruby's favorite feed auto-discovery tool}
+   s.add_dependency("hpricot", '>= 0.6')
+ end
+
data/index.html ADDED
@@ -0,0 +1,115 @@
+ <h1>Feedbag</h1>
+
+ <blockquote>
+ <p>Do you want me to drag my sack across your face?
+ - Glenn Quagmire</p>
+ </blockquote>
+
+ <p>Feedbag is a feed auto-discovery Ruby library. You don't need to know more about it. It is said to be:</p>
+
+ <blockquote>
+ <p>Ruby's favorite auto-discovery tool/library!</p>
+ </blockquote>
+
+ <h3>Quick synopsis</h3>
+
+ <pre><code>&gt;&gt; require "rubygems"
+ =&gt; true
+ &gt;&gt; require "feedbag"
+ =&gt; true
+ &gt;&gt; Feedbag.find "log.damog.net"
+ =&gt; ["http://feeds.feedburner.com/TeoremaDelCerdoInfinito", "http://log.damog.net/comments/feed/"]
+ </code></pre>
+
+ <h3>Installation</h3>
+
+ <pre><code>$ sudo gem install damog-feedbag -s http://gems.github.com/
+ </code></pre>
+
+ <p>Or just grab feedbag.rb and use it on your own project:</p>
+
+ <pre><code>$ wget http://github.com/damog/feedbag/raw/master/lib/feedbag.rb
+ </code></pre>
+
+ <h2>Tutorial</h2>
+
+ <p>So you want to know more about it.</p>
+
+ <p>OK, if the URL passed to the find method is a feed itself, that only feed URL will be returned.</p>
+
+ <pre><code>&gt;&gt; Feedbag.find "github.com/damog.atom"
+ =&gt; ["http://github.com/damog.atom"]
+ &gt;&gt;
+ </code></pre>
+
+ <p>Otherwise, it will always return LINK feeds first, A (anchor tags) feeds later. Between A feeds, the ones hosted on the same URL's host, will have larger priority:</p>
+
+ <pre><code>&gt;&gt; Feedbag.find "http://ve.planetalinux.org"
+ =&gt; ["http://feedproxy.google.com/PlanetaLinuxVenezuela", "http://rendergraf.wordpress.com/feed/", "http://rootweiller.wordpress.com/feed/", "http://skatox.com/blog/feed/", "http://kodegeek.com/atom.xml", "http://blog.0x29.com.ve/?feed=rss2&amp;cat=8"]
+ &gt;&gt;
+ </code></pre>
+
+ <p>On your application you should only take the very first element of the array, most of the times:</p>
+
+ <pre><code>&gt;&gt; Feedbag.find("planet.debian.org").first(3)
+ =&gt; ["http://planet.debian.org/rss10.xml", "http://planet.debian.org/rss20.xml", "http://planet.debian.org/atom.xml"]
+ &gt;&gt;
+ </code></pre>
+
+ <p>(Try running that same example without the "first" method. That example's host is a blog aggregator, so it has hundreds of feed URLs:)</p>
+
+ <pre><code>&gt;&gt; Feedbag.find("planet.debian.org").size
+ =&gt; 104
+ &gt;&gt;
+ </code></pre>
+
+ <p>Feedbag will find them all, but it will return the most important ones on the first elements on the array returned.</p>
+
+ <pre><code>&gt;&gt; Feedbag.find("cnn.com")
+ =&gt; ["http://rss.cnn.com/rss/cnn_topstories.rss", "http://rss.cnn.com/rss/cnn_latest.rss", "http://rss.cnn.com/services/podcasting/robinmeade/rss.xml"]
+ &gt;&gt;
+ </code></pre>
+
+ <h3>Why should you use it?</h3>
+
+ <ul>
+ <li>Because it's cool.</li>
+ <li>Because it only uses <a href="https://code.whytheluckystiff.net/hpricot/">Hpricot</a> as dependency.</li>
+ <li>Because it follows modern feed filename conventions (like those ones used by WordPress blogs, or Blogger, etc).</li>
+ <li>Because it's a single file you can embed easily in your application.</li>
+ <li>Because it passes most of the Mark Pilgrim's <a href="http://diveintomark.org/tests/client/autodiscovery/">Atom auto-discovery test suite</a>. It doesn't pass them all because some of those tests are broken (citation needed).</li>
+ </ul>
+
+ <h3>Why did I build it?</h3>
+
+ <ul>
+ <li>Because I liked Benjamin Trott's <a href="http://search.cpan.org/~btrott/Feed-Find-0.06/lib/Feed/Find.pm">Feed::Find</a>.</li>
+ <li>Because I thought it would be good to have Feed::Find's functionality in Ruby.</li>
+ <li>Because I thought it was going to be easy to maintain.</li>
+ <li>Because I was going to use it on <a href="http://github.com/damog/rfeed">rFeed</a>.</li>
+ <li>And finally, because I didn't know <a href="http://rfeedfinder.rubyforge.org/">rfeedfinder</a> existed :-)</li>
+ </ul>
+
+ <h3>Bugs</h3>
+
+ <p>Please, report bugs to <a href="rt@support.axiombox.com">rt@support.axiombox.com</a> or directly to the author.</p>
+
+ <h3>Contribute</h3>
+
+ <blockquote>
+ <p>git clone git://github.com/damog/feedbag.git</p>
+ </blockquote>
+
+ <p>...patch, build, hack and make pull requests. I'll be glad.</p>
+
+ <h3>Author</h3>
+
+ <p><a href="http://damog.net/">David Moreno</a> &lt;<a href="mailto:david@axiombox.com">david@axiombox.com</a>>.</p>
+
+ <h3>Copyright</h3>
+
+ <p>This is free software. See <a href="http://github.com/damog/feedbag/master/COPYING">COPYING</a> for more information.</p>
+
+ <h3>Thanks</h3>
+
+ <p><a href="http://maggit.net">Raquel</a>, for making <a href="http://axiombox.com">Axiombox</a> and most of my dreams possible. Also, <a href="http://github.com">GitHub</a> for making a nice code sharing service that doesn't suck.</p>
data/lib/feedbag.rb CHANGED
@@ -20,8 +20,11 @@ require "rubygems"
  require "hpricot"
  require "open-uri"
  require "net/http"
+ require 'timeout'
+ require 'iconv'
 
  module Feedbag
+   Feed = Struct.new(:url, :title)
 
    @content_types = [
      'application/x.atom+xml',
@@ -64,6 +67,8 @@ module Feedbag
      end
      #url = "#{url_uri.scheme or 'http'}://#{url_uri.host}#{url_uri.path}"
 
+     return self.add_feed(url, nil) if looks_like_feed? url
+
      # check if feed_valid is avail
      unless args[:narrow]
        begin
@@ -83,48 +88,39 @@ module Feedbag
      end
 
      begin
-       html = open(url) do |f|
-         content_type = f.content_type.downcase
-         if content_type == "application/octet-stream" # open failed
-           content_type = f.meta["content-type"].gsub(/;.*$/, '')
-         end
-         if @content_types.include?(content_type)
-           return self.add_feed(url, nil)
-         end
-
-         doc = Hpricot(f.read)
-
-         if doc.at("base") and doc.at("base")["href"]
-           $base_uri = doc.at("base")["href"]
-         else
-           $base_uri = nil
-         end
-
-         # first with links
-         (doc/"link").each do |l|
-           next unless l["rel"]
-           if l["type"] and @content_types.include?(l["type"].downcase.strip) and (l["rel"].downcase =~ /alternate/i or l["rel"] == "service.feed")
-             self.add_feed(l["href"], url, $base_uri)
-           end
-         end
-
-         unless args[:narrow]
-           (doc/"a").each do |a|
-             next unless a["href"]
-             if self.looks_like_feed?(a["href"]) and (a["href"] =~ /\// or a["href"] =~ /#{url_uri.host}/)
-               self.add_feed(a["href"], url, $base_uri)
-             end
-           end
-
-           (doc/"a").each do |a|
-             next unless a["href"]
-             if self.looks_like_feed?(a["href"])
-               self.add_feed(a["href"], url, $base_uri)
-             end
-           end
+       Timeout::timeout(10) do
+         open(url) do |f|
+           if @content_types.include?(f.content_type.downcase)
+             return self.add_feed(url, nil)
+           end
+
+           ic = Iconv.new('UTF-8//IGNORE', f.charset)
+           doc = Hpricot(ic.iconv(f.read))
+
+           if doc.at("base") and doc.at("base")["href"]
+             $base_uri = doc.at("base")["href"]
+           else
+             $base_uri = nil
+           end
+
+           # first with links
+           (doc/"link").each do |l|
+             next unless l["rel"]
+             if l["type"] and @content_types.include?(l["type"].downcase.strip) and (l["rel"].downcase =~ /alternate/i or l["rel"] == "service.feed")
+               self.add_feed(l["href"], url, $base_uri, l["title"])
+             end
+           end
+
+           unless args[:narrow]
+             (doc/"a").each do |a|
+               next unless a["href"]
+               if self.looks_like_feed?(a["href"])
+                 self.add_feed(a["href"], url, $base_uri, a["title"] || a.inner_html || a['alt']) # multiple fallbacks, first title, then the tag content, then the alt tag (in case of image)
+               end
+             end
+           end
          end
-
-       end
+       end
      rescue Timeout::Error => err
        $stderr.puts "Timeout error ocurred with `#{url}: #{err}'"
      rescue OpenURI::HTTPError => the_error
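Note on the charset handling added in this hunk: the `iconv` library that the new code requires was removed from Ruby's standard library in 2.0, so `require 'iconv'` fails on later interpreters unless the separate `iconv` gem is installed. A minimal sketch of a comparable conversion using only core `String` methods follows. The helper name `to_utf8` is hypothetical and not part of feedbag, and it assumes `charset` is an encoding name that Ruby recognizes (an unknown name raises `ArgumentError`).

```ruby
# Hypothetical helper (not part of feedbag): treat `raw` as bytes in
# `charset` and return UTF-8 text, dropping bytes that cannot be
# converted. Similar in spirit to
# Iconv.new('UTF-8//IGNORE', charset).iconv(raw).
def to_utf8(raw, charset)
  text = raw.dup.force_encoding(charset)
  unless text.encoding == Encoding::UTF_8
    # Transcode, silently discarding invalid or unmappable sequences.
    text = text.encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "")
  end
  # Remove any byte sequences that are still invalid UTF-8.
  text.scrub("")
end

latin1_bytes = "caf\xE9".b # "café" encoded as ISO-8859-1
puts to_utf8(latin1_bytes, "ISO-8859-1")
```

The final `scrub` covers the case where the declared charset is already UTF-8 but the body contains invalid bytes, which a same-encoding `encode` call is not guaranteed to clean up.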
@@ -136,18 +132,17 @@ module Feedbag
      ensure
        return $feeds
      end
-
    end
 
    def self.looks_like_feed?(url)
-     if url =~ /(\.(rdf|xml|rdf|rss)$|feed=(rss|atom)|(atom|feed)\/?$)/i
+     if url =~ /((\.|\/)(rdf|xml|rdf|rss)$|feed=(rss|atom)|(atom|feed)\/?$)/i
        true
      else
        false
      end
    end
 
-   def self.add_feed(feed_url, orig_url, base_uri = nil)
+   def self.add_feed(feed_url, orig_url, base_uri = nil, title = "")
      # puts "#{feed_url} - #{orig_url}"
      url = feed_url.sub(/^feed:/, '').strip
 
@@ -168,7 +163,7 @@ module Feedbag
      end
 
      # verify url is really valid
-     $feeds.push(url) unless $feeds.include?(url)# if self._is_http_valid(URI.parse(url), orig_url)
+     $feeds.push(Feed.new(url, title)) unless $feeds.any? { |f| f.url == url }# if self._is_http_valid(URI.parse(url), orig_url)
    end
 
    # not used. yet.
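Note for callers on the lib/feedbag.rb change: entries pushed onto the result list are now `Feed` structs rather than plain URL strings, duplicates are detected by comparing `url` fields, and `looks_like_feed?` also accepts a slash before `rdf`, `xml`, or `rss`. The sketch below reproduces only those pieces in isolation. The constant names `OLD_PATTERN` and `NEW_PATTERN` and the local `add_feed` helper taking an explicit `feeds` array are illustrative assumptions; the gem itself uses the global `$feeds` and resolves relative URLs before pushing.

```ruby
# Same struct definition as the one added in the diff.
Feed = Struct.new(:url, :title)

# Patterns copied from the removed and added lines of looks_like_feed?.
OLD_PATTERN = /(\.(rdf|xml|rdf|rss)$|feed=(rss|atom)|(atom|feed)\/?$)/i
NEW_PATTERN = /((\.|\/)(rdf|xml|rdf|rss)$|feed=(rss|atom)|(atom|feed)\/?$)/i

# Illustrative stand-in for Feedbag.add_feed: append a Feed unless one
# with the same URL is already present. The first title seen is kept.
def add_feed(feeds, url, title = "")
  feeds.push(Feed.new(url, title)) unless feeds.any? { |f| f.url == url }
  feeds
end

feeds = []
add_feed(feeds, "http://example.com/blog/rss", "Blog")
add_feed(feeds, "http://example.com/blog/rss", "Duplicate")

puts feeds.size        # one entry, the duplicate URL was skipped
puts feeds.first.url
puts feeds.first.title

# A path ending in "/rss" is only recognized by the new pattern.
puts(("http://example.com/blog/rss" =~ OLD_PATTERN) ? "old: match" : "old: no match")
puts(("http://example.com/blog/rss" =~ NEW_PATTERN) ? "new: match" : "new: no match")
```

Code that previously treated the elements returned by `Feedbag.find` as URL strings would, under this change, appear to need `.url` on each element.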
data/rails/init.rb ADDED
@@ -0,0 +1 @@
+ require File.join File.dirname(__FILE__), "..", "lib", "feedbag"
data/test/atom_autodiscovery_test.rb ADDED
@@ -0,0 +1,46 @@
+ #!/usr/bin/ruby
+
+ require "#{File.dirname(__FILE__)}/../feedbag"
+ require "test/unit"
+ require "open-uri"
+ require "hpricot"
+ require "pp"
+
+ class AtomAutoDiscoveryTest < Test::Unit::TestCase
+   def test_autodisc
+     base_url = "http://diveintomark.org/tests/client/autodiscovery/"
+     url = base_url + "html4-001.html"
+
+     i = 1
+     puts "trying now with #{url}"
+     while(i)
+       puts
+       i = 0 # unless otherwise found
+
+       f = Feedbag.find url
+
+       assert_instance_of Array, f
+       assert f.size == 1, "Feedbag didn't find a feed on #{url} or found more than one"
+
+       puts " found #{f[0]}"
+       feed = Hpricot(open(f[0]))
+
+       (feed/"link").each do |l|
+         next unless l["rel"] == "alternate"
+         assert_equal l["href"], url
+       end
+
+       # now move on to the next one
+       html = Hpricot(open(url))
+       (html/"link").each do |l|
+         next unless l["rel"] == "next"
+         url = URI.parse(base_url).merge(l["href"]).to_s
+         puts "trying now with #{url}"
+         i = 1
+       end
+
+     end
+   end
+
+
+ end
metadata CHANGED
@@ -1,51 +1,58 @@
  --- !ruby/object:Gem::Specification
  name: penso-feedbag
  version: !ruby/object:Gem::Version
-   version: 0.5.100
+   version: 0.6.1
  platform: ruby
  authors:
- - Axiombox
- - David Moreno
+ - Joel Duffin
+ - Justin Ball
  - Fabien Penso
  autorequire:
  bindir: bin
  cert_chain: []
 
- date: 2010-02-11 00:00:00 +01:00
+ date: 2009-11-10 00:00:00 +01:00
  default_executable:
  dependencies:
  - !ruby/object:Gem::Dependency
-   name: hpricot
-   type: :runtime
+   name: shoulda
+   type: :development
    version_requirement:
    version_requirements: !ruby/object:Gem::Requirement
      requirements:
      - - ">="
        - !ruby/object:Gem::Version
-         version: "0.6"
+         version: "0"
      version:
- description: Ruby's favorite feed auto-discoverty tool
- email: david@axiombox.com
+ description: This gem will return title and url for each feed discovered at a given url, handling proper charset
+ email: fabienpenso@gmail.com
  executables: []
 
  extensions: []
 
  extra_rdoc_files:
+ - ChangeLog
  - README.markdown
- - COPYING
  files:
- - lib/feedbag.rb
- - benchmark/rfeedfinder_benchmark.rb
- - README.markdown
+ - .gitignore
  - COPYING
+ - ChangeLog
+ - README.markdown
+ - TODO
+ - VERSION
+ - benchmark/rfeedfinder_benchmark.rb
+ - feedbag.gemspec
+ - index.html
+ - lib/feedbag.rb
+ - rails/init.rb
+ - test/atom_autodiscovery_test.rb
  has_rdoc: true
- homepage: http://axiombox.com/feedbag
+ homepage: http://github.com/penso/muck-feedbag
  licenses: []
 
  post_install_message:
  rdoc_options:
- - --main
- - README.markdown
+ - --charset=UTF-8
  require_paths:
  - lib
  required_ruby_version: !ruby/object:Gem::Requirement
@@ -62,10 +69,10 @@ required_rubygems_version: !ruby/object:Gem::Requirement
    version:
  requirements: []
 
- rubyforge_project: feedbag
+ rubyforge_project:
  rubygems_version: 1.3.5
  signing_key:
  specification_version: 3
- summary: Ruby's favorite feed auto-discovery tool
- test_files: []
-
+ summary: Fork of the feedbag gem.
+ test_files:
+ - test/atom_autodiscovery_test.rb