async-feedbag 2.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +7 -0
- data/COPYING +20 -0
- data/README.markdown +77 -0
- data/bin/feedbag +28 -0
- data/lib/async-feedbag.rb +195 -0
- metadata +84 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA256:
+  metadata.gz: e899338c5afa426f2847a62a0e2ec083de5e3dac00eec1ccdad3488f29417864
+  data.tar.gz: f74563f3b9b6bd9188f9ed0325ed3745c0a5d7d77bb36935c8bc9776f573fe65
+SHA512:
+  metadata.gz: e403c42cb90ba8cc4e27648a376b54073642d3ac2cba3227961cedfddb8324c15b938e4f1b245c16d9afbf5e371cf0476e0cf6c826692feaa51e7c052b765d92
+  data.tar.gz: f7c5c5b019d2e110edef47d94127ce7b91e519f4dc0798f0bd5023b9526086c4f96356b9a076e85f85f8fb7d9882522bb6717fc23e34a6e580c87bc66ddaed26
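The digests recorded in checksums.yaml can be compared against the archive members after download. The following is a minimal sketch using Ruby's standard `digest` library. The helper names `digests_for` and `checksum_matches?` are hypothetical, and the byte string passed in stands in for the contents of `metadata.gz` or `data.tar.gz`.

```ruby
require "digest"

# Hypothetical helper: returns hex digests keyed like checksums.yaml.
def digests_for(bytes)
  {
    "SHA256" => Digest::SHA256.hexdigest(bytes),
    "SHA512" => Digest::SHA512.hexdigest(bytes)
  }
end

# Hypothetical helper: compares a computed digest with a recorded value.
def checksum_matches?(bytes, algorithm, expected_hex)
  digests_for(bytes).fetch(algorithm) == expected_hex.downcase
end
```

In practice the bytes would come from something like `File.binread("data.tar.gz")`, and the expected value from the matching entry in checksums.yaml.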
data/COPYING
ADDED
@@ -0,0 +1,20 @@
+Copyright (C) 2008-2022 David Moreno <damog@damog.net> et al.
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.markdown
ADDED
@@ -0,0 +1,77 @@
+Async-Feedbag
+=======
+
+Async-Feedbag is a fork of Ruby's favorite auto-discovery tool/library, using [async-http](https://github.com/socketry/async-http) for the requests.
+
+### Quick synopsis
+
+```ruby
+>> require "async-feedbag"
+=> true
+>> AsyncFeedbag.find "damog.net/blog"
+=> ["http://damog.net/blog/atom.xml"]
+>> AsyncFeedbag.feed? "perl.org"
+=> false
+>> AsyncFeedbag.feed?("https://m.signalvnoise.com/feed")
+=> true
+```
+
+### Installation
+
+$ gem install async-feedbag
+
+Or just grab async-feedbag.rb and use it on your own project:
+
+$ wget https://raw.githubusercontent.com/renatolond/feedbag/main/lib/async-feedbag.rb
+
+You can also use the command line tool for quick queries, if you install the gem:
+
+» feedbag https://www.ruby-lang.org/en/
+== https://www.ruby-lang.org/en/:
+- https://www.ruby-lang.org/en/feeds/news.rss
+
+### Usage
+Feedbag will find all RSS feed types. Here's an example of finding ATOM and JSON Feed
+
+```ruby
+> AsyncFeedbag.find('https://daringfireball.net')
+=> ["https://daringfireball.net/feeds/main", "https://daringfireball.net/feeds/json", "https://daringfireball.net/linked/2021/02/17/bookfeed"]
+```
+
+Feedbag defaults to a User-Agent string of **AsyncFeedbag/1.10.2**, however you can override this
+
+```ruby
+0> AsyncFeedbag.find('https://kottke.org', 'User-Agent' => "My Personal Agent/1.0.1")
+=> ["http://feeds.kottke.org/main", "http://feeds.kottke.org/json"]
+````
+
+The other options passed to find, will be passed to OpenURI. For example:
+
+```ruby
+AsyncFeedbag.find("https://kottke.org", 'User-Agent' => "My Personal Agent/1.0.1", open_timeout: 1000)
+```
+
+You can find the other options to OpenURI [here](https://rubyapi.org/o/openuri/openread#method-i-open).
+
+
+### Why should you use it?
+
+- Because it only uses [Nokogiri](http://nokogiri.org/) as dependency.
+- Because it follows modern feed filename conventions (like those ones used by WordPress blogs, or Blogger, etc).
+- Because it's a single file you can embed easily in your application.
+- Because it's faster than anything else.
+
+### Author
+
+[David Moreno](http://damog.net/) <[damog@damog.net](mailto:damog@damog.net)> is the original author of [feedbag](https://github.com/damog/feedbag).
+[Renato "Lond" Cerqueira](https://lond.com.br) is the author of the async fork.
+
+### Donations
+
+
+
+[Superfeedr](http://superfeedr.com) has kindly financially [supported](https://github.com/damog/feedbag/issues/9) the development of Feedbag.
+
+### Copyright
+
+This is and will always be free software. See [COPYING](https://raw.githubusercontent.com/renatolond/feedbag/master/COPYING) for more information.
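The README gives the default User-Agent as AsyncFeedbag/1.10.2, while lib/async-feedbag.rb in this same release builds it from `VERSION`, which is "2.0.0". The library's `find` also appears to forward only the "User-Agent" entry as a request header, so other options such as `open_timeout` do not seem to reach the HTTP client. The sketch below mirrors that header selection under those readings. `request_headers` is a hypothetical helper, not part of the gem, and unlike the library it copies the options hash instead of mutating it.

```ruby
# Value of AsyncFeedbag::VERSION in this release.
VERSION = "2.0.0"

# Hypothetical helper combining the default from #initialize with the
# header selection done in #find.
def request_headers(options = nil)
  opts = (options || {}).dup
  opts["User-Agent"] ||= "AsyncFeedbag/#{VERSION}"
  # Only the User-Agent entry is kept; everything else is dropped.
  opts.slice("User-Agent")
end
```

Calling `request_headers({ "User-Agent" => "My Personal Agent/1.0.1", open_timeout: 1000 })` keeps the custom agent and discards `open_timeout`.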
data/bin/feedbag
ADDED
@@ -0,0 +1,28 @@
+#!/usr/bin/env ruby
+# frozen_string_literal: true
+
+require "rubygems"
+require "feedbag-async"
+
+def usage
+  %(
+#{$PROGRAM_NAME} <url 1> [<url 2> <url 3> ... <url n>]
+  )
+end
+
+if ARGV.empty?
+  puts usage
+  exit 1
+end
+
+ARGV.each do |url|
+  puts "== #{url}:"
+  feeds = FeedbagAsync.find url
+  if feeds.empty?
+    puts " no feeds found!"
+  else
+    feeds.each do |f|
+      puts " - #{f}"
+    end
+  end
+end
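The script requires `feedbag-async` and calls `FeedbagAsync.find`, while the only library file listed in this diff is lib/async-feedbag.rb, which defines `AsyncFeedbag`. Whether another file provides the `FeedbagAsync` name is not visible here. The sketch below reproduces the reporting loop with the finder passed in as a parameter, so it runs without either name. `report` and its `finder:` and `out:` parameters are hypothetical.

```ruby
require "stringio"

# Hypothetical helper: prints the feeds for each URL in the same layout as
# bin/feedbag. `finder` is any object that responds to #call(url) and
# returns an Array of feed URLs.
def report(urls, finder:, out: $stdout)
  urls.each do |url|
    out.puts "== #{url}:"
    feeds = finder.call(url)
    if feeds.empty?
      out.puts " no feeds found!"
    else
      feeds.each { |f| out.puts " - #{f}" }
    end
  end
end

# Usage with a stubbed finder, so no network access is needed.
stub = ->(url) { url.include?("ruby-lang") ? ["https://www.ruby-lang.org/en/feeds/news.rss"] : [] }
report(["https://www.ruby-lang.org/en/"], finder: stub, out: StringIO.new)
```

With the gem loaded, passing `finder: AsyncFeedbag.method(:find)` would be the natural choice, assuming the class name from lib/async-feedbag.rb.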
data/lib/async-feedbag.rb
ADDED
@@ -0,0 +1,195 @@
+#!/usr/bin/ruby
+# frozen_string_literal: true
+
+# See COPYING before using this software.
+
+require "nokogiri"
+require "async/http/internet/instance"
+require "async/http/middleware/location_redirector"
+
+class AsyncInternetWithRedirect < Async::HTTP::Internet
+  protected
+
+  def make_client(endpoint)
+    ::Protocol::HTTP::AcceptEncoding.new(
+      Async::HTTP::Middleware::LocationRedirector.new(Async::HTTP::Client.new(endpoint, **@options))
+    )
+  end
+end
+
+class AsyncFeedbag
+  VERSION = "2.0.0"
+  CONTENT_TYPES = [
+    "application/x.atom+xml",
+    "application/atom+xml",
+    "application/xml",
+    "text/xml",
+    "application/rss+xml",
+    "application/rdf+xml",
+    "application/json",
+    "application/feed+json"
+  ].freeze
+
+  class << self
+    # (see #feed?)
+    def feed?(url)
+      new.feed?(url)
+    end
+
+    # @param url [String]
+    # @param options [Hash]
+    # @return (see #find)
+    def find(url, options = {})
+      new(options: options).find(url, **options)
+    end
+  end
+
+  def initialize(options: nil)
+    @feeds = []
+    @options = options || {}
+    @options["User-Agent"] ||= "AsyncFeedbag/#{VERSION}"
+  end
+
+  FEED_SCHEME_RE = %r{^feed://}
+  def feed?(url)
+    # use LWR::Simple.normalize some time
+    url_uri = URI.parse(url)
+    url = "#{url_uri.scheme or "http"}://#{url_uri.host}#{url_uri.path}"
+    url << "?#{url_uri.query}" if url_uri.query
+
+    # hack:
+    url.sub!(FEED_SCHEME_RE, "http://")
+
+    res = AsyncFeedbag.find(url)
+    (res.size == 1) && (res.first == url)
+  end
+
+  RedirectionError = Class.new(StandardError)
+
+  XML_RE = /.xml$/
+  SERVICE_FEED_XPATH = "//link[@rel='alternate' or @rel='service.feed'][@href][@type]"
+  JSON_FEED_XPATH = "//link[@rel='alternate' and @type='application/json'][@href]"
+  def find(url, _options = {})
+    url_uri = URI.parse(url)
+    url = nil
+    if url_uri.scheme.nil?
+      url = "http://#{url_uri}"
+    elsif url_uri.scheme == "feed"
+      return add_feed(url_uri.to_s.sub(FEED_SCHEME_RE, "http://"), nil)
+    else
+      url = url_uri.to_s
+    end
+
+    # check if feed_valid is avail
+    begin
+      require "feed_validator"
+      v = W3C::FeedValidator.new
+      v.validate_url(url)
+      return add_feed(url, nil) if v.valid?
+    rescue LoadError
+      # scoo
+    rescue REXML::ParseException
+      # usually indicates timeout
+      # TODO: actually find out timeout. use Terminator?
+      # $stderr.puts "Feed looked like feed but might not have passed validation or timed out"
+    rescue => e
+      warn "#{e.class} error occurred with: `#{url}': #{e.message}"
+    end
+
+    retries = 2
+    begin
+      headers = @options.slice("User-Agent")
+      Sync do
+        response = AsyncInternetWithRedirect.get(url, headers)
+        if response.redirection?
+          original_uri = URI.parse(url)
+          uri = URI.parse(response.headers["location"])
+          if uri.host == original_uri.host
+            url = response.headers["location"]
+            raise RedirectionError
+          end
+        end
+
+        content_type = response.headers["content-type"].gsub(/;.*$/, "").downcase
+        next add_feed(url, nil) if CONTENT_TYPES.include?(content_type)
+
+        doc = Nokogiri::HTML(response.read)
+
+        @base_uri = (doc.at("base")["href"] if doc.at("base") && doc.at("base")["href"])
+
+        # first with links
+        (doc / "atom:link").each do |l|
+          next unless l["rel"] && l["href"].present?
+
+          add_feed(l["href"], url, @base_uri) if l["type"] && CONTENT_TYPES.include?(l["type"].downcase.strip) && (l["rel"].downcase == "self")
+        end
+
+        doc.xpath(SERVICE_FEED_XPATH).each do |l|
+          add_feed(l["href"], url, @base_uri) if CONTENT_TYPES.include?(l["type"].downcase.strip)
+        end
+
+        doc.xpath(JSON_FEED_XPATH).each do |e|
+          add_feed(e["href"], url, @base_uri) if looks_like_feed?(e["href"])
+        end
+
+        (doc / "a").each do |a|
+          next unless a["href"]
+
+          add_feed(a["href"], url, @base_uri) if looks_like_feed?(a["href"]) && (a["href"].include?("/") || a["href"] =~ /#{url_uri.host}/)
+
+          next unless a["href"]
+
+          add_feed(a["href"], url, @base_uri) if looks_like_feed?(a["href"])
+        end
+
+        # Added support for feeds like http://tabtimes.com/tbfeed/mashable/full.xml
+        add_feed(url, nil) if url.match(XML_RE) && doc.root && doc.root["xml:base"] && (doc.root["xml:base"].strip == url.strip)
+      ensure
+        response&.close
+      end
+    rescue RedirectionError
+      retries -= 1
+      retry if retries >= 0
+    rescue Timeout::Error => e
+      warn "Timeout error occurred with `#{url}: #{e}'"
+    rescue => e
+      warn "#{e.class} error occurred with: `#{url}': #{e.message}"
+    end
+    return @feeds
+  end
+
+  FEED_RE = %r{(\.(rdf|xml|rss)(\?([\w'\-%]?(=[\w'\-%.]*)?(&|#|\+|;)?)+)?(:[\w'\-%]+)?$|feed=(rss|atom)|(atom|feed)/?$)}i
+  def looks_like_feed?(url)
+    FEED_RE.match?(url)
+  end
+
+  def add_feed(feed_url, orig_url, base_uri = nil)
+    url = feed_url.sub(/^feed:/, "").strip
+
+    if base_uri
+      url = URI.parse(base_uri).merge(feed_url).to_s
+    end
+
+    begin
+      uri = URI.parse(url)
+    rescue
+      puts "Error with `#{url}'"
+      exit 1
+    end
+    unless uri.absolute?
+      orig = URI.parse(orig_url)
+      url = orig.merge(url).to_s
+    end
+
+    # verify url is really valid
+    @feeds.push(url) unless @feeds.include?(url)
+  end
+end
+
+if __FILE__ == $PROGRAM_NAME
+  if ARGV.empty?
+    puts "usage: feedbag url"
+  else
+    puts AsyncFeedbag.find ARGV.first
+  end
+end
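Three pure checks in the library file can be exercised without any network access: the Content-Type match, the URL pattern used by `looks_like_feed?`, and the resolution of relative links done in `add_feed`. The sketch below copies `CONTENT_TYPES` and `FEED_RE` from the diff. `feed_content_type?` and `resolve` are hypothetical helpers that isolate the corresponding expressions and are not methods of the gem.

```ruby
require "uri"

# Both constants are copied from lib/async-feedbag.rb in this release.
CONTENT_TYPES = [
  "application/x.atom+xml", "application/atom+xml", "application/xml",
  "text/xml", "application/rss+xml", "application/rdf+xml",
  "application/json", "application/feed+json"
].freeze
FEED_RE = %r{(\.(rdf|xml|rss)(\?([\w'\-%]?(=[\w'\-%.]*)?(&|#|\+|;)?)+)?(:[\w'\-%]+)?$|feed=(rss|atom)|(atom|feed)/?$)}i

# Hypothetical helper: drops parameters such as "; charset=utf-8" and
# lowercases the value before the lookup, as #find does.
def feed_content_type?(header_value)
  CONTENT_TYPES.include?(header_value.gsub(/;.*$/, "").downcase)
end

# Same body as AsyncFeedbag#looks_like_feed?.
def looks_like_feed?(url)
  FEED_RE.match?(url)
end

# Hypothetical helper: resolves a possibly relative href against the URL
# of the page it was found on, similar to the fallback in #add_feed.
def resolve(href, page_url)
  href = href.strip
  uri = URI.parse(href)
  uri.absolute? ? uri.to_s : URI.parse(page_url).merge(href).to_s
end
```

The pattern only inspects the URL text, so a match is a hint rather than proof that the target is a feed.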
metadata
ADDED
@@ -0,0 +1,84 @@
+--- !ruby/object:Gem::Specification
+name: async-feedbag
+version: !ruby/object:Gem::Version
+  version: 2.0.0
+platform: ruby
+authors:
+- Renato "Lond" Cerqueira
+- David Moreno
+bindir: bin
+cert_chain: []
+date: 1980-01-02 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: async-http
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0.89'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0.89'
+- !ruby/object:Gem::Dependency
+  name: nokogiri
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.8'
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: 1.8.18
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.8'
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: 1.8.18
+description: Ruby's favorite feed auto-discovery tool
+email: renato@lond.com.br
+executables:
+- feedbag
+extensions: []
+extra_rdoc_files:
+- COPYING
+- README.markdown
+files:
+- COPYING
+- README.markdown
+- bin/feedbag
+- lib/async-feedbag.rb
+homepage: http://github.com/renatolond/async-feedbag
+licenses:
+- MIT
+metadata:
+  rubygems_mfa_required: 'true'
+rdoc_options:
+- "--main"
+- README.markdown
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '3.0'
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubygems_version: 3.6.7
+specification_version: 4
+summary: RSS/Atom feed auto-discovery tool
+test_files: []
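The nokogiri dependency combines two constraints, `~> 1.8` and `>= 1.8.18`, which together accept 1.8.18 and later releases below 2.0. A minimal sketch with RubyGems' own `Gem::Requirement` shows which versions qualify. `NOKOGIRI_REQUIREMENT` and `nokogiri_ok?` are hypothetical names introduced only for illustration.

```ruby
require "rubygems"

# Constraints copied from the nokogiri entry in the gem metadata.
NOKOGIRI_REQUIREMENT = Gem::Requirement.new("~> 1.8", ">= 1.8.18")

# Hypothetical helper: true when `version` satisfies both constraints.
def nokogiri_ok?(version)
  NOKOGIRI_REQUIREMENT.satisfied_by?(Gem::Version.new(version))
end
```

The async-http dependency is looser by comparison: `>= 0.89` sets no upper bound.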