async-feedbag 2.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (6)
  1. checksums.yaml +7 -0
  2. data/COPYING +20 -0
  3. data/README.markdown +77 -0
  4. data/bin/feedbag +28 -0
  5. data/lib/async-feedbag.rb +195 -0
  6. metadata +84 -0
checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: e899338c5afa426f2847a62a0e2ec083de5e3dac00eec1ccdad3488f29417864
4
+ data.tar.gz: f74563f3b9b6bd9188f9ed0325ed3745c0a5d7d77bb36935c8bc9776f573fe65
5
+ SHA512:
6
+ metadata.gz: e403c42cb90ba8cc4e27648a376b54073642d3ac2cba3227961cedfddb8324c15b938e4f1b245c16d9afbf5e371cf0476e0cf6c826692feaa51e7c052b765d92
7
+ data.tar.gz: f7c5c5b019d2e110edef47d94127ce7b91e519f4dc0798f0bd5023b9526086c4f96356b9a076e85f85f8fb7d9882522bb6717fc23e34a6e580c87bc66ddaed26
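The digests above can be checked against a downloaded archive. Below is a minimal sketch, assuming the YAML has the nested layout shown (an algorithm key containing per-file entries); `sha256_matches?` is a hypothetical helper name and is not part of the gem or of RubyGems.

```ruby
require "digest"
require "yaml"

# Hypothetical helper: compare the SHA256 of `bytes` with the value recorded
# under the "SHA256" key of a checksums.yaml-style document.
def sha256_matches?(checksums_yaml, entry_name, bytes)
  expected = YAML.safe_load(checksums_yaml).fetch("SHA256").fetch(entry_name)
  Digest::SHA256.hexdigest(bytes) == expected.to_s
end

# Build a tiny checksums document for an in-memory payload and verify it.
payload = "example payload"
checksums = <<~YAML
  ---
  SHA256:
    metadata.gz: #{Digest::SHA256.hexdigest(payload)}
YAML

puts sha256_matches?(checksums, "metadata.gz", payload)
```

For a real archive, the bytes would typically come from something like `File.binread("metadata.gz")` after unpacking the `.gem` file.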
data/COPYING ADDED
@@ -0,0 +1,20 @@
1
+ Copyright (C) 2008-2022 David Moreno <damog@damog.net> et al.
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining
4
+ a copy of this software and associated documentation files (the
5
+ "Software"), to deal in the Software without restriction, including
6
+ without limitation the rights to use, copy, modify, merge, publish,
7
+ distribute, sublicense, and/or sell copies of the Software, and to
8
+ permit persons to whom the Software is furnished to do so, subject to
9
+ the following conditions:
10
+
11
+ The above copyright notice and this permission notice shall be
12
+ included in all copies or substantial portions of the Software.
13
+
14
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
15
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
16
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
17
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
18
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
19
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
20
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.markdown ADDED
@@ -0,0 +1,77 @@
1
+ Async-Feedbag
2
+ =======
3
+
4
+ Async-Feedbag is a fork of [Feedbag](https://github.com/damog/feedbag), Ruby's favorite feed auto-discovery tool/library, that uses [async-http](https://github.com/socketry/async-http) for its HTTP requests.
5
+
6
+ ### Quick synopsis
7
+
8
+ ```ruby
9
+ >> require "async-feedbag"
10
+ => true
11
+ >> AsyncFeedbag.find "damog.net/blog"
12
+ => ["http://damog.net/blog/atom.xml"]
13
+ >> AsyncFeedbag.feed? "perl.org"
14
+ => false
15
+ >> AsyncFeedbag.feed?("https://m.signalvnoise.com/feed")
16
+ => true
17
+ ```
18
+
19
+ ### Installation
20
+
21
+ $ gem install async-feedbag
22
+
23
+ Or just grab async-feedbag.rb and use it on your own project:
24
+
25
+ $ wget https://raw.githubusercontent.com/renatolond/feedbag/main/lib/async-feedbag.rb
26
+
27
+ You can also use the command line tool for quick queries, if you install the gem:
28
+
29
+ » feedbag https://www.ruby-lang.org/en/
30
+ == https://www.ruby-lang.org/en/:
31
+ - https://www.ruby-lang.org/en/feeds/news.rss
32
+
33
+ ### Usage
34
+ AsyncFeedbag finds RSS, Atom, and JSON Feed URLs. Here's an example that finds both Atom and JSON feeds:
35
+
36
+ ```ruby
37
+ > AsyncFeedbag.find('https://daringfireball.net')
38
+ => ["https://daringfireball.net/feeds/main", "https://daringfireball.net/feeds/json", "https://daringfireball.net/linked/2021/02/17/bookfeed"]
39
+ ```
40
+
41
+ AsyncFeedbag defaults to a User-Agent string of **AsyncFeedbag/2.0.0** (the gem version), but you can override it:
42
+
43
+ ```ruby
44
+ > AsyncFeedbag.find('https://kottke.org', 'User-Agent' => "My Personal Agent/1.0.1")
45
+ => ["http://feeds.kottke.org/main", "http://feeds.kottke.org/json"]
46
+ ```
47
+
48
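+ Upstream Feedbag passes any other options given to `find` on to OpenURI. This fork fetches pages with async-http and, as of 2.0.0, forwards only the `User-Agent` option as a request header, so other options such as `open_timeout` are accepted but currently have no effect. For example: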
49
+
50
+ ```ruby
51
+ AsyncFeedbag.find("https://kottke.org", 'User-Agent' => "My Personal Agent/1.0.1", open_timeout: 1000)
52
+ ```
53
+
54
+ The options accepted by OpenURI, which upstream Feedbag uses, are documented [here](https://rubyapi.org/o/openuri/openread#method-i-open).
55
+
56
+
57
+ ### Why should you use it?
58
+
59
+ - Because its only dependencies are [Nokogiri](http://nokogiri.org/) and [async-http](https://github.com/socketry/async-http).
60
+ - Because it follows modern feed filename conventions (like those used by WordPress blogs, Blogger, etc.).
61
+ - Because it's a single file you can embed easily in your application.
62
+ - Because it's faster than anything else.
63
+
64
+ ### Author
65
+
66
+ [David Moreno](http://damog.net/) <[damog@damog.net](mailto:damog@damog.net)> is the original author of [feedbag](https://github.com/damog/feedbag).
67
+ [Renato "Lond" Cerqueira](https://lond.com.br) is the author of the async fork.
68
+
69
+ ### Donations
70
+
71
+ ![Superfeedr](https://raw.githubusercontent.com/damog/feedbag/master/img/superfeedr_150.png)
72
+
73
+ [Superfeedr](http://superfeedr.com) has kindly financially [supported](https://github.com/damog/feedbag/issues/9) the development of Feedbag.
74
+
75
+ ### Copyright
76
+
77
+ This is and will always be free software. See [COPYING](https://raw.githubusercontent.com/renatolond/feedbag/master/COPYING) for more information.
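The User-Agent behaviour described in the README can be sketched without any network access. This is a minimal illustration, assuming the defaulting and header selection work as they do in `lib/async-feedbag.rb` in this release; `request_headers` and `DEFAULT_VERSION` are hypothetical names introduced only for this example.

```ruby
# Mirrors AsyncFeedbag::VERSION in this release (assumption for the sketch).
DEFAULT_VERSION = "2.0.0"

# Hypothetical helper: apply the default User-Agent, then keep only the
# "User-Agent" entry, which is the only option sent as a request header.
def request_headers(options = {})
  merged = options.dup
  merged["User-Agent"] ||= "AsyncFeedbag/#{DEFAULT_VERSION}"
  merged.slice("User-Agent")
end

# No options: the default agent string is used.
puts request_headers["User-Agent"]

# A caller-supplied agent wins, and unrelated options are dropped.
custom = request_headers({ "User-Agent" => "My Personal Agent/1.0.1", open_timeout: 1000 })
puts custom["User-Agent"]
puts custom.key?(:open_timeout)
```

Copying the options with `dup` is a choice made for this sketch so that the caller's hash is left untouched.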
data/bin/feedbag ADDED
@@ -0,0 +1,28 @@
1
+ #!/usr/bin/env ruby
2
+ # frozen_string_literal: true
3
+
4
+ require "rubygems"
5
+ require "async-feedbag"
6
+
7
+ def usage
8
+ %(
9
+ #{$PROGRAM_NAME} <url 1> [<url 2> <url 3> ... <url n>]
10
+ )
11
+ end
12
+
13
+ if ARGV.empty?
14
+ puts usage
15
+ exit 1
16
+ end
17
+
18
+ ARGV.each do |url|
19
+ puts "== #{url}:"
20
+ feeds = AsyncFeedbag.find url
21
+ if feeds.empty?
22
+ puts " no feeds found!"
23
+ else
24
+ feeds.each do |f|
25
+ puts " - #{f}"
26
+ end
27
+ end
28
+ end
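The reporting loop of the command line tool can be exercised without the network by injecting the lookup. The following is a rough sketch under that assumption: `report_feeds` and its `finder:` and `out:` parameters are hypothetical and do not exist in the shipped script, which calls the library directly.

```ruby
require "stringio"

# Hypothetical, network-free variant of the CLI loop. `finder` is any callable
# that maps a URL to an array of feed URLs; `out` is any IO-like object.
def report_feeds(urls, finder:, out: $stdout)
  urls.each do |url|
    out.puts "== #{url}:"
    feeds = finder.call(url)
    if feeds.empty?
      out.puts " no feeds found!"
    else
      feeds.each { |feed| out.puts " - #{feed}" }
    end
  end
end

# Stub lookup standing in for the real discovery call.
stub = lambda do |url|
  url.include?("ruby-lang") ? ["https://www.ruby-lang.org/en/feeds/news.rss"] : []
end

buffer = StringIO.new
report_feeds(["https://www.ruby-lang.org/en/", "https://example.com"], finder: stub, out: buffer)
puts buffer.string
```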
data/lib/async-feedbag.rb ADDED
@@ -0,0 +1,195 @@
1
+ #!/usr/bin/ruby
2
+ # frozen_string_literal: true
3
+
4
+ # See COPYING before using this software.
5
+
6
+ require "nokogiri"
7
+ require "async/http/internet/instance"
8
+ require "async/http/middleware/location_redirector"
9
+
10
+ class AsyncInternetWithRedirect < Async::HTTP::Internet
11
+ protected
12
+
13
+ def make_client(endpoint)
14
+ ::Protocol::HTTP::AcceptEncoding.new(
15
+ Async::HTTP::Middleware::LocationRedirector.new(Async::HTTP::Client.new(endpoint, **@options))
16
+ )
17
+ end
18
+ end
19
+
20
+ class AsyncFeedbag
21
+ VERSION = "2.0.0"
22
+ CONTENT_TYPES = [
23
+ "application/x.atom+xml",
24
+ "application/atom+xml",
25
+ "application/xml",
26
+ "text/xml",
27
+ "application/rss+xml",
28
+ "application/rdf+xml",
29
+ "application/json",
30
+ "application/feed+json"
31
+ ].freeze
32
+
33
+ class << self
34
+ # (see #feed?)
35
+ def feed?(url)
36
+ new.feed?(url)
37
+ end
38
+
39
+ # @param url [String]
40
+ # @param options [Hash]
41
+ # @return (see #find)
42
+ def find(url, options = {})
43
+ new(options: options).find(url, **options)
44
+ end
45
+ end
46
+
47
+ def initialize(options: nil)
48
+ @feeds = []
49
+ @options = options || {}
50
+ @options["User-Agent"] ||= "AsyncFeedbag/#{VERSION}"
51
+ end
52
+
53
+ FEED_SCHEME_RE = %r{^feed://}
54
+ def feed?(url)
55
+ # use LWR::Simple.normalize some time
56
+ url_uri = URI.parse(url)
57
+ url = "#{url_uri.scheme or "http"}://#{url_uri.host}#{url_uri.path}"
58
+ url << "?#{url_uri.query}" if url_uri.query
59
+
60
+ # hack:
61
+ url.sub!(FEED_SCHEME_RE, "http://")
62
+
63
+ res = AsyncFeedbag.find(url)
64
+ (res.size == 1) && (res.first == url)
65
+ end
66
+
67
+ RedirectionError = Class.new(StandardError)
68
+
69
+ XML_RE = /\.xml$/
70
+ SERVICE_FEED_XPATH = "//link[@rel='alternate' or @rel='service.feed'][@href][@type]"
71
+ JSON_FEED_XPATH = "//link[@rel='alternate' and @type='application/json'][@href]"
72
+ def find(url, _options = {})
73
+ url_uri = URI.parse(url)
74
+ url = nil
75
+ if url_uri.scheme.nil?
76
+ url = "http://#{url_uri}"
77
+ elsif url_uri.scheme == "feed"
78
+ return add_feed(url_uri.to_s.sub(FEED_SCHEME_RE, "http://"), nil)
79
+ else
80
+ url = url_uri.to_s
81
+ end
82
+
83
+ # check if feed_valid is avail
84
+ begin
85
+ require "feed_validator"
86
+ v = W3C::FeedValidator.new
87
+ v.validate_url(url)
88
+ return add_feed(url, nil) if v.valid?
89
+ rescue LoadError
90
+ # feed_validator is optional; skip validation when it is not installed
91
+ rescue REXML::ParseException
92
+ # usually indicates timeout
93
+ # TODO: actually find out timeout. use Terminator?
94
+ # $stderr.puts "Feed looked like feed but might not have passed validation or timed out"
95
+ rescue => e
96
+ warn "#{e.class} error occurred with: `#{url}': #{e.message}"
97
+ end
98
+
99
+ retries = 2
100
+ begin
101
+ headers = @options.slice("User-Agent")
102
+ Sync do
103
+ response = AsyncInternetWithRedirect.get(url, headers)
104
+ if response.redirection?
105
+ original_uri = URI.parse(url)
106
+ uri = URI.parse(response.headers["location"])
107
+ if uri.host == original_uri.host
108
+ url = response.headers["location"]
109
+ raise RedirectionError
110
+ end
111
+ end
112
+
113
+ content_type = response.headers["content-type"].to_s.gsub(/;.*$/, "").downcase
114
+ next add_feed(url, nil) if CONTENT_TYPES.include?(content_type)
115
+
116
+ doc = Nokogiri::HTML(response.read)
117
+
118
+ @base_uri = (doc.at("base")["href"] if doc.at("base") && doc.at("base")["href"])
119
+
120
+ # first with links
121
+ (doc / "atom:link").each do |l|
122
+ next unless l["rel"] && l["href"] && !l["href"].empty?
123
+
124
+ add_feed(l["href"], url, @base_uri) if l["type"] && CONTENT_TYPES.include?(l["type"].downcase.strip) && (l["rel"].downcase == "self")
125
+ end
126
+
127
+ doc.xpath(SERVICE_FEED_XPATH).each do |l|
128
+ add_feed(l["href"], url, @base_uri) if CONTENT_TYPES.include?(l["type"].downcase.strip)
129
+ end
130
+
131
+ doc.xpath(JSON_FEED_XPATH).each do |e|
132
+ add_feed(e["href"], url, @base_uri) if looks_like_feed?(e["href"])
133
+ end
134
+
135
+ (doc / "a").each do |a|
136
+ next unless a["href"]
137
+
138
+ add_feed(a["href"], url, @base_uri) if looks_like_feed?(a["href"]) && (a["href"].include?("/") || a["href"] =~ /#{url_uri.host}/)
139
+
140
+ next unless a["href"]
141
+
142
+ add_feed(a["href"], url, @base_uri) if looks_like_feed?(a["href"])
143
+ end
144
+
145
+ # Added support for feeds like http://tabtimes.com/tbfeed/mashable/full.xml
146
+ add_feed(url, nil) if url.match(XML_RE) && doc.root && doc.root["xml:base"] && (doc.root["xml:base"].strip == url.strip)
147
+ ensure
148
+ response&.close
149
+ end
150
+ rescue RedirectionError
151
+ retries -= 1
152
+ retry if retries >= 0
153
+ rescue Timeout::Error => e
154
+ warn "Timeout error occurred with `#{url}': #{e}"
155
+ rescue => e
156
+ warn "#{e.class} error occurred with: `#{url}': #{e.message}"
157
+ end
158
+ return @feeds
159
+ end
160
+
161
+ FEED_RE = %r{(\.(rdf|xml|rss)(\?([\w'\-%]?(=[\w'\-%.]*)?(&|#|\+|;)?)+)?(:[\w'\-%]+)?$|feed=(rss|atom)|(atom|feed)/?$)}i
162
+ def looks_like_feed?(url)
163
+ FEED_RE.match?(url)
164
+ end
165
+
166
+ def add_feed(feed_url, orig_url, base_uri = nil)
167
+ url = feed_url.sub(/^feed:/, "").strip
168
+
169
+ if base_uri
170
+ url = URI.parse(base_uri).merge(feed_url).to_s
171
+ end
172
+
173
+ begin
174
+ uri = URI.parse(url)
175
+ rescue
176
+ puts "Error with `#{url}'"
177
+ exit 1
178
+ end
179
+ unless uri.absolute?
180
+ orig = URI.parse(orig_url)
181
+ url = orig.merge(url).to_s
182
+ end
183
+
184
+ # verify url is really valid
185
+ @feeds.push(url) unless @feeds.include?(url)
186
+ end
187
+ end
188
+
189
+ if __FILE__ == $PROGRAM_NAME
190
+ if ARGV.empty?
191
+ puts "usage: feedbag url"
192
+ else
193
+ puts AsyncFeedbag.find ARGV.first
194
+ end
195
+ end
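The network-free steps of `find` (normalising the Content-Type header, the filename heuristic, and resolving a discovered `href`) can be tried in isolation. This is a sketch rather than the library's code path: the two constants are copied from the file above under new names, while `feed_content_type?`, `feed_like?` and `resolve_href` are hypothetical helpers written for this example.

```ruby
require "uri"

# Copied from CONTENT_TYPES in lib/async-feedbag.rb.
FEED_CONTENT_TYPES = %w[
  application/x.atom+xml application/atom+xml application/xml text/xml
  application/rss+xml application/rdf+xml application/json application/feed+json
].freeze

# Copied from FEED_RE in lib/async-feedbag.rb.
FEED_NAME_RE = %r{(\.(rdf|xml|rss)(\?([\w'\-%]?(=[\w'\-%.]*)?(&|#|\+|;)?)+)?(:[\w'\-%]+)?$|feed=(rss|atom)|(atom|feed)/?$)}i

# Hypothetical helper: drop parameters such as "; charset=utf-8" before the
# lookup. A missing header (nil) is treated as "not a feed".
def feed_content_type?(header_value)
  FEED_CONTENT_TYPES.include?(header_value.to_s.sub(/;.*\z/m, "").strip.downcase)
end

# Hypothetical helper: same test as looks_like_feed? in the library.
def feed_like?(href)
  FEED_NAME_RE.match?(href)
end

# Hypothetical helper: resolve an href against <base href> when the page
# declares one, otherwise against the page URL itself.
def resolve_href(href, page_url, base_href = nil)
  URI.parse(base_href || page_url).merge(href.strip).to_s
end

puts feed_content_type?("application/atom+xml; charset=utf-8")
puts feed_like?("/blog/atom.xml")
puts resolve_href("/feeds/main", "https://daringfireball.net/")
```

Note that relative resolution depends on the trailing slash of the page URL: `atom.xml` resolved against `http://damog.net/blog/` stays under `/blog/`, while against `http://damog.net/blog` it lands at the site root.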
metadata ADDED
@@ -0,0 +1,84 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: async-feedbag
3
+ version: !ruby/object:Gem::Version
4
+ version: 2.0.0
5
+ platform: ruby
6
+ authors:
7
+ - Renato "Lond" Cerqueira
8
+ - David Moreno
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 1980-01-02 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: async-http
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - ">="
18
+ - !ruby/object:Gem::Version
19
+ version: '0.89'
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - ">="
25
+ - !ruby/object:Gem::Version
26
+ version: '0.89'
27
+ - !ruby/object:Gem::Dependency
28
+ name: nokogiri
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "~>"
32
+ - !ruby/object:Gem::Version
33
+ version: '1.8'
34
+ - - ">="
35
+ - !ruby/object:Gem::Version
36
+ version: 1.8.18
37
+ type: :runtime
38
+ prerelease: false
39
+ version_requirements: !ruby/object:Gem::Requirement
40
+ requirements:
41
+ - - "~>"
42
+ - !ruby/object:Gem::Version
43
+ version: '1.8'
44
+ - - ">="
45
+ - !ruby/object:Gem::Version
46
+ version: 1.8.18
47
+ description: Ruby's favorite feed auto-discovery tool
48
+ email: renato@lond.com.br
49
+ executables:
50
+ - feedbag
51
+ extensions: []
52
+ extra_rdoc_files:
53
+ - COPYING
54
+ - README.markdown
55
+ files:
56
+ - COPYING
57
+ - README.markdown
58
+ - bin/feedbag
59
+ - lib/async-feedbag.rb
60
+ homepage: http://github.com/renatolond/async-feedbag
61
+ licenses:
62
+ - MIT
63
+ metadata:
64
+ rubygems_mfa_required: 'true'
65
+ rdoc_options:
66
+ - "--main"
67
+ - README.markdown
68
+ require_paths:
69
+ - lib
70
+ required_ruby_version: !ruby/object:Gem::Requirement
71
+ requirements:
72
+ - - ">="
73
+ - !ruby/object:Gem::Version
74
+ version: '3.0'
75
+ required_rubygems_version: !ruby/object:Gem::Requirement
76
+ requirements:
77
+ - - ">="
78
+ - !ruby/object:Gem::Version
79
+ version: '0'
80
+ requirements: []
81
+ rubygems_version: 3.6.7
82
+ specification_version: 4
83
+ summary: RSS/Atom feed auto-discovery tool
84
+ test_files: []
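The version constraints in the metadata can be evaluated with the `Gem::Requirement` and `Gem::Version` classes that ship with RubyGems. The sketch below only restates the constraints listed above; the constant names and the `satisfied?` helper are hypothetical, and the version strings tested are arbitrary examples.

```ruby
require "rubygems"

# Constraints as declared in the gem metadata above.
NOKOGIRI_REQUIREMENT = Gem::Requirement.new("~> 1.8", ">= 1.8.18")
ASYNC_HTTP_REQUIREMENT = Gem::Requirement.new(">= 0.89")
RUBY_REQUIREMENT = Gem::Requirement.new(">= 3.0")

# Hypothetical convenience wrapper around Gem::Requirement#satisfied_by?.
def satisfied?(requirement, version)
  requirement.satisfied_by?(Gem::Version.new(version))
end

# "~> 1.8" allows any 1.x release from 1.8 on; ">= 1.8.18" raises the floor.
%w[1.8.5 1.8.18 1.16.0 2.0.0].each do |version|
  puts "nokogiri #{version}: #{satisfied?(NOKOGIRI_REQUIREMENT, version)}"
end
```

Because async-http is constrained only by `>= 0.89`, a future major release would also satisfy it, unlike the bounded Nokogiri constraint.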