directlink 0.0.3.1 → 0.0.4.0
- checksums.yaml +4 -4
- data/README.md +29 -10
- data/api_tokens_for_travis.sh +0 -1
- data/bin/directlink +1 -1
- data/directlink.gemspec +2 -1
- data/lib/directlink.rb +59 -15
- data/test.rb +42 -2
- metadata +16 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 31d4684edbeadeddce540e3ca0a1894f5756af86
+  data.tar.gz: f1efd18489faf49d2bc7460f210b53cf7cb0929c
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: bef3ef81ba91007a8b26d6f45813d67319bd00b3c3bfbc71977d2caa4c40e29f31e8f35a32d2f7af8c1c8bcd8d7a267ef3998bc51912a997462a864f2cf563e0
+  data.tar.gz: 18f45737947862ebc5409b3a20be5d2389aa7e9f85125bd85465c670c7ab488b43ab55c599ac79bdfde7063125b5a891a42336f9c2348ffcd5807707dc908d44
data/README.md
CHANGED
@@ -3,7 +3,7 @@
 
 # gem directlink
 
-This tool
+This tool obtains from any sort of hyperlink (a thumbnail URL, a link to a photo album, a news article, etc.) a directlink(s) to high resolution images at that page. Also it tells the resulting resolution and the image type (format). The gem also includes a binary so you can use it as a CLI.
 
 ## Usage
 
@@ -16,7 +16,7 @@ $ gem install directlink
 $ directlink
 usage: directlink [--debug] [--json] [--github] <link1> <link2> <link3> ...
 ```
-Converts `<img src=` attribute value from any Google web service
+Converts `<img src=` attribute value from any Google web service to the largest available:
 ```
 $ directlink //4.bp.blogspot.com/-5kP8ndL0kuM/Wpt82UCqvmI/AAAAAAAAEjI/ZbbZWs0-kgwRXEJ9JEGioR0bm6U8MOkvQCKgBGAs/w530-h278-p/IMG_20171223_093922.jpg
 <= //4.bp.blogspot.com/-5kP8ndL0kuM/Wpt82UCqvmI/AAAAAAAAEjI/ZbbZWs0-kgwRXEJ9JEGioR0bm6U8MOkvQCKgBGAs/w530-h278-p/IMG_20171223_093922.jpg
@@ -77,16 +77,30 @@ $ directlink --json https://imgur.com/a/oacI3gl https://avatars1.githubuserconte
 }
 ]
 ```
-Downloads master:HEAD version of `lib/directlink.rb` from GitHub and uses it once instead of installed one
+Downloads `master:HEAD` version of `lib/directlink.rb` from GitHub and uses it once instead of installed one:
 ```
 $ directlink --github https://imgur.com/a/oacI3gl
 ```
-When an image hosting with known API is recognized,
+When an image hosting with known API is recognized, it will try to use the API tokens you've provided as env vars (otherwise it will go "don't give up" mode):
 ```
 $ export IMGUR_CLIENT_ID=0f99cd781...
 $ export FLICKR_API_KEY=dc2bfd348b...
 ```
 
+#### the "don't give up mode"
+
+If the passed link is not the image link or a photo page of a known image hosting, the tool is still able to find the main images that the linked webpage contains (here it found three images in the markdown file):
+```
+$ bundle exec bin/directlink https://github.com/Nakilon/dhash-vips
+<= https://github.com/Nakilon/dhash-vips
+=> https://camo.githubusercontent.com/852607c7f4b604fc3c83b782c4f6983cf488b0d4/68747470733a2f2f73746f726167652e676f6f676c65617069732e636f6d2f64686173682d766970732e6e616b696c6f6e2e70726f2f64686173685f69737375655f6578616d706c652e706e67
+png 592x366
+=> https://camo.githubusercontent.com/5e354666bac69e32d605dbd45351bfb7d808924b/68747470733a2f2f73746f726167652e676f6f676c65617069732e636f6d2f64686173682d766970732e6e616b696c6f6e2e70726f2f6964686173685f6578616d706c655f696e2e706e67
+png 773x679
+=> https://camo.githubusercontent.com/5456cc20ae9b20c06792ddd19b533ae36404d8c1/68747470733a2f2f73746f726167652e676f6f676c65617069732e636f6d2f64686173682d766970732e6e616b696c6f6e2e70726f2f6964686173685f6578616d706c655f6f75742e706e67
+png 1610x800
+```
+
 ### As a library
 
 ```
@@ -110,11 +124,16 @@ Google can serve image in arbitrary resolution so `DirectLink.google` has an opt
 irb> DirectLink.google "//4.bp.blogspot.com/-5kP8ndL0kuM/Wpt82UCqvmI/AAAAAAAAEjI/ZbbZWs0-kgwRXEJ9JEGioR0bm6U8MOkvQCKgBGAs/w530-h278-p/IMG_20171223_093922.jpg", 100
 => "https://4.bp.blogspot.com/-5kP8ndL0kuM/Wpt82UCqvmI/AAAAAAAAEjI/ZbbZWs0-kgwRXEJ9JEGioR0bm6U8MOkvQCKgBGAs/s100/IMG_20171223_093922.jpg"
 ```
-To
+To disable the "don't give up" mode (otherwise it consumes time on analyzing all the images on the linked page):
+```
+irb> DirectLink "https://github.com/Nakilon/dhash-vips", nil, true
+# raises FastImage::UnknownImageType
+```
+To silent the STDOUT logger that you may see sometimes:
 ```ruby
 DirectLink.silent = true
 ```
-You also may look into [`bin/directlink`](bin/directlink)
+You also may look into [`bin/directlink`](bin/directlink) as a library usage example and the list of all possible exceptions.
 
 #### about long retries
 
@@ -125,7 +144,7 @@ NetHTTPUtils.logger.level = Logger::WARN
 ```
 W 180507 102210 : NetHTTPUtils : retrying in 10 seconds because of SocketError 'Failed to open TCP connection to minus.com:80 (getaddrinfo: nodename nor servname provided, or not known)' at: http://minus.com/
 ```
-To make `DirectLink()` respond faster pass an optional argument that specifies the max retry delay
+To make `DirectLink()` respond faster pass an optional argument that specifies the max retry delay. Here we get the exception immediately:
 ```ruby
 DirectLink "http://minus.com/", 0
 ```
@@ -136,7 +155,7 @@ SocketError: Failed to open TCP connection to minus.com:80 (getaddrinfo: nodenam
 ## Notes:
 
 * `module DirectLink` public methods return different sets of properties -- `DirectLink()` unites them
-* the `
+* the `ErrorAssert` and `ErrorMissingEnvVar` should never be raised and you might report it if it does
 * style: `@@` and lambdas are used to keep things private
-* this gem is a 2 or 3 libraries merged
-*
+* this gem is a historically 2 or 3 libraries merged -- this is why tests may look awkward
+* 500px.com has discontinued API in June 2018 -- the tool now uses undocumented methods
data/api_tokens_for_travis.sh
CHANGED
data/bin/directlink
CHANGED
@@ -72,7 +72,7 @@ rescue NetHTTPUtils::Error,
        SocketError,
        FastImage::UnknownImageType,
        FastImage::ImageFetchFailure,
-       DirectLink::ErrorMissingEnvVar,
+       # DirectLink::ErrorMissingEnvVar,
        # DirectLink::ErrorAssert,
        DirectLink::ErrorNotFound,
        DirectLink::ErrorBadLink => e
data/directlink.gemspec
CHANGED
@@ -1,6 +1,6 @@
 Gem::Specification.new do |spec|
   spec.name = "directlink"
-  spec.version = "0.0.
+  spec.version = "0.0.4.0"
   spec.summary = "converts any kind of image hyperlink to direct link, type of image and its resolution"
 
   spec.author = "Victor Maslov aka Nakilon"
@@ -11,6 +11,7 @@ Gem::Specification.new do |spec|
 
   spec.add_dependency "nethttputils", "~>0.2.4.0"
   spec.add_dependency "fastimage", "~>2.1.3"
+  spec.add_dependency "nokogiri"
   spec.add_development_dependency "minitest"
 
   spec.require_path = "lib"
data/lib/directlink.rb
CHANGED
@@ -12,24 +12,28 @@ module DirectLink
     puts str unless Module.nesting.first.silent
   end
 
-  class ErrorMissingEnvVar < RuntimeError ; end
   class ErrorAssert < RuntimeError
     def initialize msg
       super "#{msg} -- consider reporting this issue to GitHub"
     end
   end
-
+  logging_error = Class.new RuntimeError do
     def initialize msg
       Module.nesting.first.logger.error msg
       super msg
     end
   end
-  class ErrorNotFound <
-  class ErrorBadLink <
+  class ErrorNotFound < logging_error ; end
+  class ErrorBadLink < logging_error
     def initialize link, sure = false
       super "#{link.inspect}#{" -- if you think this link is valid, please report the issue" unless sure}"
     end
   end
+  class ErrorMissingEnvVar < logging_error
+    def initialize msg
+      super "(warning, recommendation) #{msg}"
+    end
+  end
 
 
   def self.google src, width = 0
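The hunk above keeps the logging base class in a local variable (`logging_error`) instead of a constant, so the three error classes share the "log on construction" behaviour without the module gaining another public name. A minimal standalone sketch of that pattern follows. The `Sketch` module, its `LOG` buffer and the `Logger` wiring are assumptions made for illustration; the gem's real logger setup is not visible in this diff.

```ruby
require "logger"
require "stringio"

module Sketch
  # Assumption: an in-memory log target so the effect can be inspected.
  LOG = StringIO.new
  class << self
    attr_accessor :logger
  end
  self.logger = Logger.new LOG

  # Anonymous base class held in a local variable: subclasses inherit
  # the logging side effect, but no extra constant is defined.
  logging_error = Class.new RuntimeError do
    def initialize msg
      Sketch.logger.error msg
      super msg
    end
  end

  class ErrorNotFound < logging_error ; end
  class ErrorMissingEnvVar < logging_error
    def initialize msg
      super "(warning, recommendation) #{msg}"
    end
  end
end

begin
  raise Sketch::ErrorMissingEnvVar.new "define IMGUR_CLIENT_ID env var"
rescue RuntimeError => e
  puts e.message
end
```

Because the base class is anonymous, callers can only rescue the concrete subclasses or `RuntimeError`; whether that is the intent in the gem is a guess based on the README note about keeping things private.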
@@ -202,7 +206,7 @@ end
 
 require "fastimage"
 
-def DirectLink link, max_redirect_resolving_retry_delay = nil
+def DirectLink link, max_redirect_resolving_retry_delay = nil, giveup = false
   begin
     URI link
   rescue URI::InvalidURIError
@@ -234,28 +238,42 @@ def DirectLink link, max_redirect_resolving_retry_delay = nil
   } : {})
   raise NetHTTPUtils::Error.new "", r.code.to_i unless "200" == r.code
   link = r.uri.to_s
+  # why do we resolve redirects before trying the known adapters?
+  # because they can be hidden behind URL shorteners
+  # also it can resolve NetHTTPUtils::Error(404) before trying the adapter
+
+  # TODO: get rid of this copypasta, that is caused by that we want to pass urls without schema to this method
+  if %w{ lh3 googleusercontent com } == URI(link).host.split(?.).last(3) ||
+     %w{ bp blogspot com } == URI(link).host.split(?.).last(3)
+    u = DirectLink.google link
+    f = FastImage.new(u, raise_on_failure: true, http_header: {"User-Agent" => "Mozilla"})
+    w, h = f.size
+    return struct.new u, w, h, f.type
+  end
 
-
-  %w{ i imgur com } == URI(link).host.split(?.).last(3) ||
-  %w{ m imgur com } == URI(link).host.split(?.).last(3) ||
-  %w{ www imgur com } == URI(link).host.split(?.).last(3)
+  begin
     imgur = DirectLink.imgur(link).sort_by{ |u, w, h, t| - w * h }.map do |u, w, h, t|
       struct.new u, w, h, t
     end
     # `DirectLink.imgur` return value is always an Array
     return imgur.size == 1 ? imgur.first : imgur
-
+  rescue DirectLink::ErrorMissingEnvVar
+  end if %w{ imgur com } == URI(link).host.split(?.).last(2) ||
+         %w{ i imgur com } == URI(link).host.split(?.).last(3) ||
+         %w{ m imgur com } == URI(link).host.split(?.).last(3) ||
+         %w{ www imgur com } == URI(link).host.split(?.).last(3)
 
   if %w{ 500px com } == URI(link).host.split(?.).last(2)
     w, h, u, t = DirectLink._500px(link)
     return struct.new u, w, h, t
   end
 
-
+  begin
     w, h, u = DirectLink.flickr(link)
     f = FastImage.new(u, raise_on_failure: true, http_header: {"User-Agent" => "Mozilla"})
     return struct.new u, w, h, f.type
-
+  rescue DirectLink::ErrorMissingEnvVar
+  end if %w{ www flickr com } == URI(link).host.split(?.).last(3)
 
   if %w{ wikipedia org } == URI(link).host.split(?.).last(2) ||
      %w{ commons wikimedia org } == URI(link).host.split(?.).last(3)
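The `begin ... rescue DirectLink::ErrorMissingEnvVar ... end if <host matches>` form used in this hunk appears intended to let an API-based adapter fail quietly when its token is missing, so that control falls through to the later, generic branches. A minimal sketch of that control flow, with no network access, is shown here. `MissingEnvVar`, `api_adapter` and `resolve` are hypothetical stand-ins, and the token is passed in a plain Hash rather than read from `ENV`.

```ruby
require "uri"

# Hypothetical stand-in for DirectLink::ErrorMissingEnvVar.
class MissingEnvVar < RuntimeError ; end

# Hypothetical adapter that needs an API token, like the API-based ones in the diff.
def api_adapter link, env
  raise MissingEnvVar, "define IMGUR_CLIENT_ID env var" unless env["IMGUR_CLIENT_ID"]
  [:api, link]
end

def resolve link, env
  host = URI(link).host.split(?.)
  begin
    return api_adapter link, env
  rescue MissingEnvVar
    # swallowed on purpose: execution continues with the generic fallback
  end if %w{ imgur com } == host.last(2)
  [:fallback, link]
end

p resolve("https://imgur.com/x", {"IMGUR_CLIENT_ID" => "token"})
p resolve("https://imgur.com/x", {})
```

One observation that follows from how `Array#last` works: a `last(2)` comparison against `%w{ imgur com }` is already true for hosts such as `i.imgur.com`, so in this sketch a single suffix check covers the subdomains as well.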
@@ -265,7 +283,33 @@ def DirectLink link, max_redirect_resolving_retry_delay = nil
     return struct.new u, w, h, f.type
   end
 
-
-
-
+  begin
+    f = FastImage.new(link, raise_on_failure: true, http_header: {"User-Agent" => "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36"})
+  rescue FastImage::UnknownImageType
+    raise if giveup
+    require "nokogiri"
+    html = Nokogiri::HTML NetHTTPUtils::request_data link
+    h = {}
+    l = lambda do |node, s = []|
+      node.element_children.flat_map do |child|
+        if "img" == child.node_name
+          begin
+            [[s, (h[child[:src]] = h[child[:src]] || DirectLink(child[:src], nil, true))]]
+          rescue => e
+            []
+          end
+        else
+          l[child, s + [child.node_name]]
+        end
+      end
+    end
+    l[html].group_by(&:first).map{ |k, v| [k.join(?>), v.map(&:last)] }.tap do |result|
+      next unless result.empty?
+      raise unless t = html.at_css "meta[@property='og:image']"
+      return DirectLink t[:content], nil, true
+    end.max_by{ |_, v| v.map{ |i| i.width * i.height }.inject(:+) / v.size }.last
+  else
+    w, h = f.size
+    struct.new link, w, h, f.type
+  end
 end
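The selection step at the end of this hunk groups the images found on the page by their DOM path and keeps the group whose images have the largest average area. The sketch below reproduces only that step on hand-made data, so it runs without Nokogiri or network access. The `Image` struct, the sample paths and the `main_images` helper are illustrative assumptions; the pixel sizes of the two article images are borrowed from the README example.

```ruby
# Hypothetical stand-in for the gem's result struct.
Image = Struct.new :url, :width, :height

# Pairs of [DOM path as an array of tag names, image],
# shaped like what the recursive lambda in the diff collects.
found = [
  [%w{ html body header },    Image.new("logo.png",   32,  32)],
  [%w{ html body article p }, Image.new("a.png",     592, 366)],
  [%w{ html body article p }, Image.new("b.png",     773, 679)],
  [%w{ html body footer },    Image.new("badge.png",  88,  31)],
]

# Group by path, then pick the group with the largest mean width * height.
def main_images found
  found.group_by(&:first)
       .map{ |path, pairs| [path.join(?>), pairs.map(&:last)] }
       .max_by{ |_, images| images.map{ |i| i.width * i.height }.inject(:+) / images.size }
       .last
end

p main_images(found).map(&:url)
```

Averaging rather than summing means a path with many small icons does not outrank a path with a few large pictures, which is presumably why the diff divides by `v.size`.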
data/test.rb
CHANGED
@@ -1,4 +1,5 @@
 STDOUT.sync = true
+require "pp"
 
 require "minitest/autorun"
 require "minitest/mock"
@@ -6,6 +7,9 @@ require "minitest/mock"
 # TODO: I'm not sure it's ok that after we started using NetHTTPUtils for redirect resolving
 #       we don't raise `FastImage::ImageFetchFailure` anymore in any test
 
+fail unless ENV.include? "IMGUR_CLIENT_ID"
+fail unless ENV.include? "FLICKR_API_KEY"
+
 require_relative "lib/directlink"
 DirectLink.silent = true
 describe DirectLink do
@@ -408,7 +412,6 @@ describe DirectLink do
     describe "some other tests" do
       [
         ["http://www.aeronautica.difesa.it/organizzazione/REPARTI/divolo/PublishingImages/6%C2%B0%20Stormo/2013-decollo%20al%20tramonto%20REX%201280.jpg", ["http://www.aeronautica.difesa.it/organizzazione/REPARTI/divolo/PublishingImages/6%C2%B0%20Stormo/2013-decollo%20al%20tramonto%20REX%201280.jpg", 1280, 853, :jpeg]],
-        ["http://example.com", FastImage::UnknownImageType, "FastImage::UnknownImageType"], # we explicitly expect this useless `e.message ` to be sure we know how FastImage behaves
         ["http://minus.com/lkP3hgRJd9npi", SocketError, /nodename nor servname provided, or not known|No address associated with hostname/, 0],
         ["https://i.redd.it/si758zk7r5xz.jpg", NetHTTPUtils::Error, "HTTP error #404 "],
         ["http://www.cutehalloweencostumeideas.org/wp-content/uploads/2017/10/Niagara-Falls_04.jpg", SocketError, /nodename nor servname provided, or not known|Name or service not known/, 0],
@@ -432,6 +435,39 @@ describe DirectLink do
       end
     end
 
+    describe "giving up" do
+      [
+        ["http://example.com", FastImage::UnknownImageType],
+        ["https://github.com/Nakilon/dhash-vips", FastImage::UnknownImageType, true],
+        ["https://github.com/Nakilon/dhash-vips", 3],
+        ["http://imgur.com/HQHBBBD", FastImage::UnknownImageType, true],
+        ["http://imgur.com/HQHBBBD", "https://i.imgur.com/HQHBBBD.jpg?fb"],
+      ].each_with_index do |(input, expectation, giveup), i|
+        it "##{i + 1}" do
+          t = ENV.delete "IMGUR_CLIENT_ID"
+          begin
+            case expectation
+            when Class
+              e = assert_raises expectation, "for #{input} (giveup = #{giveup})" do
+                DirectLink input, nil, giveup
+              end
+              assert_equal expectation.to_s, e.message, "for #{input} (giveup = #{giveup})"
+            when String
+              result = DirectLink input, nil, giveup
+              assert_equal expectation, result.url, "for #{input} (giveup = #{giveup})"
+            else
+              result = DirectLink input, nil, giveup
+              assert_equal expectation, result.size, ->{
+                "for #{input} (giveup = #{giveup}): #{result.map &:url}"
+              }
+            end
+          ensure
+            ENV["IMGUR_CLIENT_ID"] = t
+          end
+        end
+      end
+    end
+
   end
 
   describe "./bin" do
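The "giving up" tests above remove `IMGUR_CLIENT_ID` for the duration of each example and put it back in an `ensure` clause. The same idea can be written as a small reusable helper, sketched here. `without_env` and the `SKETCH_TOKEN` variable are hypothetical names and are not part of the gem or its test suite.

```ruby
# Hypothetical helper: run the block with one environment variable removed,
# then restore the previous value even if the block raises.
def without_env name
  saved = ENV.delete name   # returns the old value, or nil if it was unset
  yield
ensure
  ENV[name] = saved         # assigning nil leaves the variable unset
end

ENV["SKETCH_TOKEN"] = "abc"
without_env "SKETCH_TOKEN" do
  p ENV["SKETCH_TOKEN"]     # the variable is absent inside the block
end
p ENV["SKETCH_TOKEN"]       # and restored afterwards
```

Restoring in `ensure` matters for a suite like this one, which refuses to start unless the tokens are present: a failed assertion must not leave later examples running without them.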
@@ -468,9 +504,13 @@ describe DirectLink do
     [
       [1, "http://example.com/", "FastImage::UnknownImageType"],
       [1, "http://example.com/404", "NetHTTPUtils::Error: HTTP error #404 "],
-
+
+      # TODO: a test when the giveup=false fails and reraises the DirectLink::ErrorMissingEnvVar
+      #       maybe put it to ./lib tests
+
       # by design it should be impossible to write a test for DirectLink::ErrorAssert
       [1, "https://flic.kr/p/DirectLinkErrorNotFound", "NetHTTPUtils::Error: HTTP error #404 "],
+
       [1, "https://imgur.com/a/badlinkpattern", "NetHTTPUtils::Error: HTTP error #404 "],
       # TODO: a test that it appends the `exception.cause`
     ].each_with_index do |(expected_exit_code, link, expected_output, unset), i|
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: directlink
 version: !ruby/object:Gem::Version
-  version: 0.0.
+  version: 0.0.4.0
 platform: ruby
 authors:
 - Victor Maslov aka Nakilon
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2018-08-
+date: 2018-08-12 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: nethttputils
@@ -38,6 +38,20 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: 2.1.3
+- !ruby/object:Gem::Dependency
+  name: nokogiri
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
 - !ruby/object:Gem::Dependency
   name: minitest
   requirement: !ruby/object:Gem::Requirement