directlink 0.0.3.1 → 0.0.4.0
- checksums.yaml +4 -4
- data/README.md +29 -10
- data/api_tokens_for_travis.sh +0 -1
- data/bin/directlink +1 -1
- data/directlink.gemspec +2 -1
- data/lib/directlink.rb +59 -15
- data/test.rb +42 -2
- metadata +16 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 31d4684edbeadeddce540e3ca0a1894f5756af86
+  data.tar.gz: f1efd18489faf49d2bc7460f210b53cf7cb0929c
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: bef3ef81ba91007a8b26d6f45813d67319bd00b3c3bfbc71977d2caa4c40e29f31e8f35a32d2f7af8c1c8bcd8d7a267ef3998bc51912a997462a864f2cf563e0
+  data.tar.gz: 18f45737947862ebc5409b3a20be5d2389aa7e9f85125bd85465c670c7ab488b43ab55c599ac79bdfde7063125b5a891a42336f9c2348ffcd5807707dc908d44
data/README.md
CHANGED
@@ -3,7 +3,7 @@

 # gem directlink

-This tool
+This tool obtains from any sort of hyperlink (a thumbnail URL, a link to a photo album, a news article, etc.) a directlink(s) to high resolution images at that page. Also it tells the resulting resolution and the image type (format). The gem also includes a binary so you can use it as a CLI.

 ## Usage

@@ -16,7 +16,7 @@ $ gem install directlink
 $ directlink
 usage: directlink [--debug] [--json] [--github] <link1> <link2> <link3> ...
 ```
-Converts `<img src=` attribute value from any Google web service
+Converts `<img src=` attribute value from any Google web service to the largest available:
 ```
 $ directlink //4.bp.blogspot.com/-5kP8ndL0kuM/Wpt82UCqvmI/AAAAAAAAEjI/ZbbZWs0-kgwRXEJ9JEGioR0bm6U8MOkvQCKgBGAs/w530-h278-p/IMG_20171223_093922.jpg
 <= //4.bp.blogspot.com/-5kP8ndL0kuM/Wpt82UCqvmI/AAAAAAAAEjI/ZbbZWs0-kgwRXEJ9JEGioR0bm6U8MOkvQCKgBGAs/w530-h278-p/IMG_20171223_093922.jpg
@@ -77,16 +77,30 @@ $ directlink --json https://imgur.com/a/oacI3gl https://avatars1.githubuserconte
 }
 ]
 ```
-Downloads master:HEAD version of `lib/directlink.rb` from GitHub and uses it once instead of installed one
+Downloads `master:HEAD` version of `lib/directlink.rb` from GitHub and uses it once instead of installed one:
 ```
 $ directlink --github https://imgur.com/a/oacI3gl
 ```
-When an image hosting with known API is recognized,
+When an image hosting with known API is recognized, it will try to use the API tokens you've provided as env vars (otherwise it will go "don't give up" mode):
 ```
 $ export IMGUR_CLIENT_ID=0f99cd781...
 $ export FLICKR_API_KEY=dc2bfd348b...
 ```

+#### the "don't give up mode"
+
+If the passed link is not the image link or a photo page of a known image hosting, the tool is still able to find the main images that the linked webpage contains (here it found three images in the markdown file):
+```
+$ bundle exec bin/directlink https://github.com/Nakilon/dhash-vips
+<= https://github.com/Nakilon/dhash-vips
+=> https://camo.githubusercontent.com/852607c7f4b604fc3c83b782c4f6983cf488b0d4/68747470733a2f2f73746f726167652e676f6f676c65617069732e636f6d2f64686173682d766970732e6e616b696c6f6e2e70726f2f64686173685f69737375655f6578616d706c652e706e67
+png 592x366
+=> https://camo.githubusercontent.com/5e354666bac69e32d605dbd45351bfb7d808924b/68747470733a2f2f73746f726167652e676f6f676c65617069732e636f6d2f64686173682d766970732e6e616b696c6f6e2e70726f2f6964686173685f6578616d706c655f696e2e706e67
+png 773x679
+=> https://camo.githubusercontent.com/5456cc20ae9b20c06792ddd19b533ae36404d8c1/68747470733a2f2f73746f726167652e676f6f676c65617069732e636f6d2f64686173682d766970732e6e616b696c6f6e2e70726f2f6964686173685f6578616d706c655f6f75742e706e67
+png 1610x800
+```
+
 ### As a library

 ```
@@ -110,11 +124,16 @@ Google can serve image in arbitrary resolution so `DirectLink.google` has an opt
 irb> DirectLink.google "//4.bp.blogspot.com/-5kP8ndL0kuM/Wpt82UCqvmI/AAAAAAAAEjI/ZbbZWs0-kgwRXEJ9JEGioR0bm6U8MOkvQCKgBGAs/w530-h278-p/IMG_20171223_093922.jpg", 100
 => "https://4.bp.blogspot.com/-5kP8ndL0kuM/Wpt82UCqvmI/AAAAAAAAEjI/ZbbZWs0-kgwRXEJ9JEGioR0bm6U8MOkvQCKgBGAs/s100/IMG_20171223_093922.jpg"
 ```
-To
+To disable the "don't give up" mode (otherwise it consumes time on analyzing all the images on the linked page):
+```
+irb> DirectLink "https://github.com/Nakilon/dhash-vips", nil, true
+# raises FastImage::UnknownImageType
+```
+To silent the STDOUT logger that you may see sometimes:
 ```ruby
 DirectLink.silent = true
 ```
-You also may look into [`bin/directlink`](bin/directlink)
+You also may look into [`bin/directlink`](bin/directlink) as a library usage example and the list of all possible exceptions.

 #### about long retries

@@ -125,7 +144,7 @@ NetHTTPUtils.logger.level = Logger::WARN
 ```
 W 180507 102210 : NetHTTPUtils : retrying in 10 seconds because of SocketError 'Failed to open TCP connection to minus.com:80 (getaddrinfo: nodename nor servname provided, or not known)' at: http://minus.com/
 ```
-To make `DirectLink()` respond faster pass an optional argument that specifies the max retry delay
+To make `DirectLink()` respond faster pass an optional argument that specifies the max retry delay. Here we get the exception immediately:
 ```ruby
 DirectLink "http://minus.com/", 0
 ```
@@ -136,7 +155,7 @@ SocketError: Failed to open TCP connection to minus.com:80 (getaddrinfo: nodenam
 ## Notes:

 * `module DirectLink` public methods return different sets of properties -- `DirectLink()` unites them
-* the `
+* the `ErrorAssert` and `ErrorMissingEnvVar` should never be raised and you might report it if it does
 * style: `@@` and lambdas are used to keep things private
-* this gem is a 2 or 3 libraries merged
-*
+* this gem is a historically 2 or 3 libraries merged -- this is why tests may look awkward
+* 500px.com has discontinued API in June 2018 -- the tool now uses undocumented methods
data/api_tokens_for_travis.sh
CHANGED
data/bin/directlink
CHANGED
@@ -72,7 +72,7 @@ rescue NetHTTPUtils::Error,
        SocketError,
        FastImage::UnknownImageType,
        FastImage::ImageFetchFailure,
-       DirectLink::ErrorMissingEnvVar,
+       # DirectLink::ErrorMissingEnvVar,
        # DirectLink::ErrorAssert,
        DirectLink::ErrorNotFound,
        DirectLink::ErrorBadLink => e
data/directlink.gemspec
CHANGED
@@ -1,6 +1,6 @@
 Gem::Specification.new do |spec|
   spec.name = "directlink"
-  spec.version = "0.0.
+  spec.version = "0.0.4.0"
   spec.summary = "converts any kind of image hyperlink to direct link, type of image and its resolution"

   spec.author = "Victor Maslov aka Nakilon"
@@ -11,6 +11,7 @@ Gem::Specification.new do |spec|

   spec.add_dependency "nethttputils", "~>0.2.4.0"
   spec.add_dependency "fastimage", "~>2.1.3"
+  spec.add_dependency "nokogiri"
   spec.add_development_dependency "minitest"

   spec.require_path = "lib"
data/lib/directlink.rb
CHANGED
@@ -12,24 +12,28 @@ module DirectLink
     puts str unless Module.nesting.first.silent
   end

-  class ErrorMissingEnvVar < RuntimeError ; end
   class ErrorAssert < RuntimeError
     def initialize msg
       super "#{msg} -- consider reporting this issue to GitHub"
     end
   end
-
+  logging_error = Class.new RuntimeError do
     def initialize msg
       Module.nesting.first.logger.error msg
       super msg
     end
   end
-  class ErrorNotFound <
-  class ErrorBadLink <
+  class ErrorNotFound < logging_error ; end
+  class ErrorBadLink < logging_error
     def initialize link, sure = false
       super "#{link.inspect}#{" -- if you think this link is valid, please report the issue" unless sure}"
     end
   end
+  class ErrorMissingEnvVar < logging_error
+    def initialize msg
+      super "(warning, recommendation) #{msg}"
+    end
+  end


   def self.google src, width = 0
@@ -202,7 +206,7 @@ end

 require "fastimage"

-def DirectLink link, max_redirect_resolving_retry_delay = nil
+def DirectLink link, max_redirect_resolving_retry_delay = nil, giveup = false
   begin
     URI link
   rescue URI::InvalidURIError
@@ -234,28 +238,42 @@ def DirectLink link, max_redirect_resolving_retry_delay = nil
     } : {})
   raise NetHTTPUtils::Error.new "", r.code.to_i unless "200" == r.code
   link = r.uri.to_s
+  # why do we resolve redirects before trying the known adapters?
+  # because they can be hidden behind URL shorteners
+  # also it can resolve NetHTTPUtils::Error(404) before trying the adapter
+
+  # TODO: get rid of this copypasta, that is caused by that we want to pass urls without schema to this method
+  if %w{ lh3 googleusercontent com } == URI(link).host.split(?.).last(3) ||
+     %w{ bp blogspot com } == URI(link).host.split(?.).last(3)
+    u = DirectLink.google link
+    f = FastImage.new(u, raise_on_failure: true, http_header: {"User-Agent" => "Mozilla"})
+    w, h = f.size
+    return struct.new u, w, h, f.type
+  end

-
-     %w{ i imgur com } == URI(link).host.split(?.).last(3) ||
-     %w{ m imgur com } == URI(link).host.split(?.).last(3) ||
-     %w{ www imgur com } == URI(link).host.split(?.).last(3)
+  begin
     imgur = DirectLink.imgur(link).sort_by{ |u, w, h, t| - w * h }.map do |u, w, h, t|
       struct.new u, w, h, t
     end
     # `DirectLink.imgur` return value is always an Array
     return imgur.size == 1 ? imgur.first : imgur
-
+  rescue DirectLink::ErrorMissingEnvVar
+  end if %w{ imgur com } == URI(link).host.split(?.).last(2) ||
+         %w{ i imgur com } == URI(link).host.split(?.).last(3) ||
+         %w{ m imgur com } == URI(link).host.split(?.).last(3) ||
+         %w{ www imgur com } == URI(link).host.split(?.).last(3)

   if %w{ 500px com } == URI(link).host.split(?.).last(2)
     w, h, u, t = DirectLink._500px(link)
     return struct.new u, w, h, t
   end

-
+  begin
     w, h, u = DirectLink.flickr(link)
     f = FastImage.new(u, raise_on_failure: true, http_header: {"User-Agent" => "Mozilla"})
     return struct.new u, w, h, f.type
-
+  rescue DirectLink::ErrorMissingEnvVar
+  end if %w{ www flickr com } == URI(link).host.split(?.).last(3)

   if %w{ wikipedia org } == URI(link).host.split(?.).last(2) ||
      %w{ commons wikimedia org } == URI(link).host.split(?.).last(3)
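Note on this hunk: the adapters are selected by comparing the trailing labels of the host name. Below is a minimal sketch of that comparison, pulled into a hypothetical helper `host_ends_with?` that is not part of the gem. It suggests that the two-label `imgur com` test added here already covers the `i.`, `m.` and `www.` subdomains, so the three-label tests that follow it look redundant.

```ruby
require "uri"

# Hypothetical helper, not part of directlink: true when the host of `link`
# ends with the given labels, e.g. %w{ imgur com }.
def host_ends_with? link, labels
  host = URI(link).host
  return false unless host   # e.g. a bare relative path has no host
  labels == host.split(".").last(labels.size)
end

# The two-label check accepts the bare domain and every subdomain.
puts host_ends_with?("http://imgur.com/HQHBBBD", %w{ imgur com })
puts host_ends_with?("https://i.imgur.com/HQHBBBD.jpg", %w{ imgur com })
# The three-label check rejects the bare domain.
puts host_ends_with?("http://imgur.com/HQHBBBD", %w{ i imgur com })
```

The sketch uses `"."` where the diff uses the character literal `?.`; both are the same one-character string.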
@@ -265,7 +283,33 @@ def DirectLink link, max_redirect_resolving_retry_delay = nil
     return struct.new u, w, h, f.type
   end

-
-
-
+  begin
+    f = FastImage.new(link, raise_on_failure: true, http_header: {"User-Agent" => "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36"})
+  rescue FastImage::UnknownImageType
+    raise if giveup
+    require "nokogiri"
+    html = Nokogiri::HTML NetHTTPUtils::request_data link
+    h = {}
+    l = lambda do |node, s = []|
+      node.element_children.flat_map do |child|
+        if "img" == child.node_name
+          begin
+            [[s, (h[child[:src]] = h[child[:src]] || DirectLink(child[:src], nil, true))]]
+          rescue => e
+            []
+          end
+        else
+          l[child, s + [child.node_name]]
+        end
+      end
+    end
+    l[html].group_by(&:first).map{ |k, v| [k.join(?>), v.map(&:last)] }.tap do |result|
+      next unless result.empty?
+      raise unless t = html.at_css "meta[@property='og:image']"
+      return DirectLink t[:content], nil, true
+    end.max_by{ |_, v| v.map{ |i| i.width * i.height }.inject(:+) / v.size }.last
+  else
+    w, h = f.size
+    struct.new link, w, h, f.type
+  end
 end
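Note on the fallback added in this file: every `<img>` found on the page is grouped by the chain of its ancestor tag names, and the group whose images have the largest average area is returned. Below is a minimal offline sketch of only that selection step. The `Image` struct, the paths and the sizes are made up for illustration and stand in for the Nokogiri and FastImage results the real code works with.

```ruby
# Stand-in for the struct the gem builds; fields mirror url/width/height/type.
Image = Struct.new :url, :width, :height, :type

# Each pair is [path of ancestor tag names, image], the shape the lambda `l`
# in the diff produces. All values here are invented sample data.
found = [
  [%w{ html body header },    Image.new("logo.png",   32,  32, :png)],
  [%w{ html body article p }, Image.new("a.png",     592, 366, :png)],
  [%w{ html body article p }, Image.new("b.png",     773, 679, :png)],
  [%w{ html body footer },    Image.new("badge.png",  88,  31, :png)],
]

# Group by DOM path, then keep the group with the largest mean area
# (integer division, as in the diff).
best = found.group_by(&:first).
  map{ |path, pairs| [path.join(">"), pairs.map(&:last)] }.
  max_by{ |_, images| images.map{ |i| i.width * i.height }.inject(:+) / images.size }.
  last

puts best.map(&:url).inspect
```

Because the score is a mean rather than a sum, a container holding one large image can outrank a gallery of many medium ones.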
data/test.rb
CHANGED
@@ -1,4 +1,5 @@
 STDOUT.sync = true
+require "pp"

 require "minitest/autorun"
 require "minitest/mock"
@@ -6,6 +7,9 @@ require "minitest/mock"
 # TODO: I'm not sure it's ok that after we started using NetHTTPUtils for redirect resolving
 # we don't raise `FastImage::ImageFetchFailure` anymore in any test

+fail unless ENV.include? "IMGUR_CLIENT_ID"
+fail unless ENV.include? "FLICKR_API_KEY"
+
 require_relative "lib/directlink"
 DirectLink.silent = true
 describe DirectLink do
@@ -408,7 +412,6 @@ describe DirectLink do
   describe "some other tests" do
     [
       ["http://www.aeronautica.difesa.it/organizzazione/REPARTI/divolo/PublishingImages/6%C2%B0%20Stormo/2013-decollo%20al%20tramonto%20REX%201280.jpg", ["http://www.aeronautica.difesa.it/organizzazione/REPARTI/divolo/PublishingImages/6%C2%B0%20Stormo/2013-decollo%20al%20tramonto%20REX%201280.jpg", 1280, 853, :jpeg]],
-      ["http://example.com", FastImage::UnknownImageType, "FastImage::UnknownImageType"], # we explicitly expect this useless `e.message ` to be sure we know how FastImage behaves
       ["http://minus.com/lkP3hgRJd9npi", SocketError, /nodename nor servname provided, or not known|No address associated with hostname/, 0],
       ["https://i.redd.it/si758zk7r5xz.jpg", NetHTTPUtils::Error, "HTTP error #404 "],
       ["http://www.cutehalloweencostumeideas.org/wp-content/uploads/2017/10/Niagara-Falls_04.jpg", SocketError, /nodename nor servname provided, or not known|Name or service not known/, 0],
@@ -432,6 +435,39 @@ describe DirectLink do
     end
   end

+  describe "giving up" do
+    [
+      ["http://example.com", FastImage::UnknownImageType],
+      ["https://github.com/Nakilon/dhash-vips", FastImage::UnknownImageType, true],
+      ["https://github.com/Nakilon/dhash-vips", 3],
+      ["http://imgur.com/HQHBBBD", FastImage::UnknownImageType, true],
+      ["http://imgur.com/HQHBBBD", "https://i.imgur.com/HQHBBBD.jpg?fb"],
+    ].each_with_index do |(input, expectation, giveup), i|
+      it "##{i + 1}" do
+        t = ENV.delete "IMGUR_CLIENT_ID"
+        begin
+          case expectation
+          when Class
+            e = assert_raises expectation, "for #{input} (giveup = #{giveup})" do
+              DirectLink input, nil, giveup
+            end
+            assert_equal expectation.to_s, e.message, "for #{input} (giveup = #{giveup})"
+          when String
+            result = DirectLink input, nil, giveup
+            assert_equal expectation, result.url, "for #{input} (giveup = #{giveup})"
+          else
+            result = DirectLink input, nil, giveup
+            assert_equal expectation, result.size, ->{
+              "for #{input} (giveup = #{giveup}): #{result.map &:url}"
+            }
+          end
+        ensure
+          ENV["IMGUR_CLIENT_ID"] = t
+        end
+      end
+    end
+  end
+
 end

 describe "./bin" do
@@ -468,9 +504,13 @@ describe DirectLink do
   [
     [1, "http://example.com/", "FastImage::UnknownImageType"],
     [1, "http://example.com/404", "NetHTTPUtils::Error: HTTP error #404 "],
-
+
+    # TODO: a test when the giveup=false fails and reraises the DirectLink::ErrorMissingEnvVar
+    # maybe put it to ./lib tests
+
     # by design it should be impossible to write a test for DirectLink::ErrorAssert
     [1, "https://flic.kr/p/DirectLinkErrorNotFound", "NetHTTPUtils::Error: HTTP error #404 "],
+
     [1, "https://imgur.com/a/badlinkpattern", "NetHTTPUtils::Error: HTTP error #404 "],
     # TODO: a test that it appends the `exception.cause`
   ].each_with_index do |(expected_exit_code, link, expected_output, unset), i|
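Note on the "giving up" tests above: each one deletes `IMGUR_CLIENT_ID` and restores it in an `ensure` clause. One possible way to factor that pattern out is sketched below with a hypothetical `without_env` helper, which does not exist in the test suite. The variable name and value used to exercise it are dummies.

```ruby
# Hypothetical helper: run the block with the env var unset, then restore it.
def without_env name
  saved = ENV.delete name   # returns the old value, or nil if it was unset
  yield
ensure
  ENV[name] = saved         # assigning nil leaves the variable unset
end

# Dummy variable, used only to demonstrate the helper.
ENV["DIRECTLINK_SKETCH_TOKEN"] = "dummy"
without_env("DIRECTLINK_SKETCH_TOKEN") do
  puts ENV.key?("DIRECTLINK_SKETCH_TOKEN")   # the variable is hidden inside the block
end
puts ENV["DIRECTLINK_SKETCH_TOKEN"]          # and visible again afterwards
```

The restore runs even when the block raises, which matters for the cases that expect `FastImage::UnknownImageType`.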
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: directlink
 version: !ruby/object:Gem::Version
-  version: 0.0.
+  version: 0.0.4.0
 platform: ruby
 authors:
 - Victor Maslov aka Nakilon
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2018-08-
+date: 2018-08-12 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: nethttputils
@@ -38,6 +38,20 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: 2.1.3
+- !ruby/object:Gem::Dependency
+  name: nokogiri
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
 - !ruby/object:Gem::Dependency
   name: minitest
   requirement: !ruby/object:Gem::Requirement