directlink 0.0.3.1 → 0.0.4.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
-   metadata.gz: d421a30c60a09ebaf198aeb77679a78a61a8521c
-   data.tar.gz: 7acbb0d68a9e150da049bd50afd0bd5a64e61fe1
+   metadata.gz: 31d4684edbeadeddce540e3ca0a1894f5756af86
+   data.tar.gz: f1efd18489faf49d2bc7460f210b53cf7cb0929c
  SHA512:
-   metadata.gz: e1e5a70b474a6261f113304aee2639e10183dc5150017e68f68974b72d3ae4b9b8d41b3a5243883fc20214db6ae51d624a76039b548ccf33a07fe8911f5d1463
-   data.tar.gz: e7ece452fbf63d76a9d155dd53e05d4e702e2f5c9d825a1562206f3ff727ca1e85e910a15814ba646f59cc600f76284206604d91c929e03efccfe5f372496d70
+   metadata.gz: bef3ef81ba91007a8b26d6f45813d67319bd00b3c3bfbc71977d2caa4c40e29f31e8f35a32d2f7af8c1c8bcd8d7a267ef3998bc51912a997462a864f2cf563e0
+   data.tar.gz: 18f45737947862ebc5409b3a20be5d2389aa7e9f85125bd85465c670c7ab488b43ab55c599ac79bdfde7063125b5a891a42336f9c2348ffcd5807707dc908d44
data/README.md CHANGED
@@ -3,7 +3,7 @@

  # gem directlink

- This tool converts any sort of image hyperlink (a thumbnail URL, a link to an album, etc.) to a high resolution one. Also it tells the resulting resolution and the image type (format). I wanted such automation often so I made a gem with a binary.
+ This tool obtains from any sort of hyperlink (a thumbnail URL, a link to a photo album, a news article, etc.) a directlink(s) to high resolution images at that page. Also it tells the resulting resolution and the image type (format). The gem also includes a binary so you can use it as a CLI.

  ## Usage

@@ -16,7 +16,7 @@ $ gem install directlink
  $ directlink
  usage: directlink [--debug] [--json] [--github] <link1> <link2> <link3> ...
  ```
- Converts `<img src=` attribute value from any Google web service (current Google regexes are very strict and may often fail -- it is a [defensive programming](https://en.wikipedia.org/wiki/Defensive_programming) practice -- report me your links!) to the largest available:
+ Converts `<img src=` attribute value from any Google web service to the largest available:
  ```
  $ directlink //4.bp.blogspot.com/-5kP8ndL0kuM/Wpt82UCqvmI/AAAAAAAAEjI/ZbbZWs0-kgwRXEJ9JEGioR0bm6U8MOkvQCKgBGAs/w530-h278-p/IMG_20171223_093922.jpg
  <= //4.bp.blogspot.com/-5kP8ndL0kuM/Wpt82UCqvmI/AAAAAAAAEjI/ZbbZWs0-kgwRXEJ9JEGioR0bm6U8MOkvQCKgBGAs/w530-h278-p/IMG_20171223_093922.jpg
@@ -77,16 +77,30 @@ $ directlink --json https://imgur.com/a/oacI3gl https://avatars1.githubuserconte
  }
  ]
  ```
- Downloads master:HEAD version of `lib/directlink.rb` from GitHub and uses it once instead of installed one (this is easier than installing gem from repo):
+ Downloads `master:HEAD` version of `lib/directlink.rb` from GitHub and uses it once instead of installed one:
  ```
  $ directlink --github https://imgur.com/a/oacI3gl
  ```
- When an image hosting with known API is recognized, the API will be used and you'll have to create app there and provide env vars:
+ When an image hosting with known API is recognized, it will try to use the API tokens you've provided as env vars (otherwise it will go "don't give up" mode):
  ```
  $ export IMGUR_CLIENT_ID=0f99cd781...
  $ export FLICKR_API_KEY=dc2bfd348b...
  ```

+ #### the "don't give up mode"
+
+ If the passed link is not the image link or a photo page of a known image hosting, the tool is still able to find the main images that the linked webpage contains (here it found three images in the markdown file):
+ ```
+ $ bundle exec bin/directlink https://github.com/Nakilon/dhash-vips
+ <= https://github.com/Nakilon/dhash-vips
+ => https://camo.githubusercontent.com/852607c7f4b604fc3c83b782c4f6983cf488b0d4/68747470733a2f2f73746f726167652e676f6f676c65617069732e636f6d2f64686173682d766970732e6e616b696c6f6e2e70726f2f64686173685f69737375655f6578616d706c652e706e67
+ png 592x366
+ => https://camo.githubusercontent.com/5e354666bac69e32d605dbd45351bfb7d808924b/68747470733a2f2f73746f726167652e676f6f676c65617069732e636f6d2f64686173682d766970732e6e616b696c6f6e2e70726f2f6964686173685f6578616d706c655f696e2e706e67
+ png 773x679
+ => https://camo.githubusercontent.com/5456cc20ae9b20c06792ddd19b533ae36404d8c1/68747470733a2f2f73746f726167652e676f6f676c65617069732e636f6d2f64686173682d766970732e6e616b696c6f6e2e70726f2f6964686173685f6578616d706c655f6f75742e706e67
+ png 1610x800
+ ```
+
  ### As a library

  ```
@@ -110,11 +124,16 @@ Google can serve image in arbitrary resolution so `DirectLink.google` has an opt
  irb> DirectLink.google "//4.bp.blogspot.com/-5kP8ndL0kuM/Wpt82UCqvmI/AAAAAAAAEjI/ZbbZWs0-kgwRXEJ9JEGioR0bm6U8MOkvQCKgBGAs/w530-h278-p/IMG_20171223_093922.jpg", 100
  => "https://4.bp.blogspot.com/-5kP8ndL0kuM/Wpt82UCqvmI/AAAAAAAAEjI/ZbbZWs0-kgwRXEJ9JEGioR0bm6U8MOkvQCKgBGAs/s100/IMG_20171223_093922.jpg"
  ```
- To silent the logger that `DirectLink.imgur` uses:
+ To disable the "don't give up" mode (otherwise it consumes time on analyzing all the images on the linked page):
+ ```
+ irb> DirectLink "https://github.com/Nakilon/dhash-vips", nil, true
+ # raises FastImage::UnknownImageType
+ ```
+ To silent the STDOUT logger that you may see sometimes:
  ```ruby
  DirectLink.silent = true
  ```
- You also may look into [`bin/directlink`](bin/directlink) for usage example and the list of all possible exceptions.
+ You also may look into [`bin/directlink`](bin/directlink) as a library usage example and the list of all possible exceptions.

  #### about long retries

@@ -125,7 +144,7 @@ NetHTTPUtils.logger.level = Logger::WARN
  ```
  W 180507 102210 : NetHTTPUtils : retrying in 10 seconds because of SocketError 'Failed to open TCP connection to minus.com:80 (getaddrinfo: nodename nor servname provided, or not known)' at: http://minus.com/
  ```
- To make `DirectLink()` respond faster pass an optional argument that specifies the max retry delay as any numeric value. Here we get the exception immediately:
+ To make `DirectLink()` respond faster pass an optional argument that specifies the max retry delay. Here we get the exception immediately:
  ```ruby
  DirectLink "http://minus.com/", 0
  ```
@@ -136,7 +155,7 @@ SocketError: Failed to open TCP connection to minus.com:80 (getaddrinfo: nodenam
  ## Notes:

  * `module DirectLink` public methods return different sets of properties -- `DirectLink()` unites them
- * the `DirectLink::ErrorAssert` should never happen and you might report it if it does
+ * the `ErrorAssert` and `ErrorMissingEnvVar` should never be raised and you might report it if it does
  * style: `@@` and lambdas are used to keep things private
- * this gem is a 2 or 3 libraries merged so don't expect tests to be full and consistent
- * since 500px.com closed their API in June 2018 the gem uses potentially unreliable undocumented methods
+ * this gem is a historically 2 or 3 libraries merged -- this is why tests may look awkward
+ * 500px.com has discontinued API in June 2018 -- the tool now uses undocumented methods
@@ -1,3 +1,2 @@
  export IMGUR_CLIENT_ID=0f99cd781c9d0d8
  export FLICKR_API_KEY=dc2bfd348b01bdc5b09d36876dc38f3d
- export _500PX_CONSUMER_KEY=ESkHTUELdcE48bezGfwzSjqVIBVTnNRIPTviTGLv
data/bin/directlink CHANGED
@@ -72,7 +72,7 @@ rescue NetHTTPUtils::Error,
  SocketError,
  FastImage::UnknownImageType,
  FastImage::ImageFetchFailure,
- DirectLink::ErrorMissingEnvVar,
+ # DirectLink::ErrorMissingEnvVar,
  # DirectLink::ErrorAssert,
  DirectLink::ErrorNotFound,
  DirectLink::ErrorBadLink => e
data/directlink.gemspec CHANGED
@@ -1,6 +1,6 @@
  Gem::Specification.new do |spec|
    spec.name = "directlink"
-   spec.version = "0.0.3.1"
+   spec.version = "0.0.4.0"
    spec.summary = "converts any kind of image hyperlink to direct link, type of image and its resolution"

    spec.author = "Victor Maslov aka Nakilon"
@@ -11,6 +11,7 @@ Gem::Specification.new do |spec|

    spec.add_dependency "nethttputils", "~>0.2.4.0"
    spec.add_dependency "fastimage", "~>2.1.3"
+   spec.add_dependency "nokogiri"
    spec.add_development_dependency "minitest"

    spec.require_path = "lib"
data/lib/directlink.rb CHANGED
@@ -12,24 +12,28 @@ module DirectLink
      puts str unless Module.nesting.first.silent
    end

-   class ErrorMissingEnvVar < RuntimeError ; end
    class ErrorAssert < RuntimeError
      def initialize msg
        super "#{msg} -- consider reporting this issue to GitHub"
      end
    end
-   @@LoggingError = Class.new RuntimeError do
+   logging_error = Class.new RuntimeError do
      def initialize msg
        Module.nesting.first.logger.error msg
        super msg
      end
    end
-   class ErrorNotFound < @@LoggingError ; end
-   class ErrorBadLink < @@LoggingError
+   class ErrorNotFound < logging_error ; end
+   class ErrorBadLink < logging_error
      def initialize link, sure = false
        super "#{link.inspect}#{" -- if you think this link is valid, please report the issue" unless sure}"
      end
    end
+   class ErrorMissingEnvVar < logging_error
+     def initialize msg
+       super "(warning, recommendation) #{msg}"
+     end
+   end


    def self.google src, width = 0
@@ -202,7 +206,7 @@ end

  require "fastimage"

- def DirectLink link, max_redirect_resolving_retry_delay = nil
+ def DirectLink link, max_redirect_resolving_retry_delay = nil, giveup = false
    begin
      URI link
    rescue URI::InvalidURIError
@@ -234,28 +238,42 @@ def DirectLink link, max_redirect_resolving_retry_delay = nil
    } : {})
    raise NetHTTPUtils::Error.new "", r.code.to_i unless "200" == r.code
    link = r.uri.to_s
+   # why do we resolve redirects before trying the known adapters?
+   # because they can be hidden behind URL shorteners
+   # also it can resolve NetHTTPUtils::Error(404) before trying the adapter
+
+   # TODO: get rid of this copypasta, that is caused by that we want to pass urls without schema to this method
+   if %w{ lh3 googleusercontent com } == URI(link).host.split(?.).last(3) ||
+      %w{ bp blogspot com } == URI(link).host.split(?.).last(3)
+     u = DirectLink.google link
+     f = FastImage.new(u, raise_on_failure: true, http_header: {"User-Agent" => "Mozilla"})
+     w, h = f.size
+     return struct.new u, w, h, f.type
+   end

-   if %w{ imgur com } == URI(link).host.split(?.).last(2) ||
-      %w{ i imgur com } == URI(link).host.split(?.).last(3) ||
-      %w{ m imgur com } == URI(link).host.split(?.).last(3) ||
-      %w{ www imgur com } == URI(link).host.split(?.).last(3)
+   begin
      imgur = DirectLink.imgur(link).sort_by{ |u, w, h, t| - w * h }.map do |u, w, h, t|
        struct.new u, w, h, t
      end
      # `DirectLink.imgur` return value is always an Array
      return imgur.size == 1 ? imgur.first : imgur
-   end
+   rescue DirectLink::ErrorMissingEnvVar
+   end if %w{ imgur com } == URI(link).host.split(?.).last(2) ||
+          %w{ i imgur com } == URI(link).host.split(?.).last(3) ||
+          %w{ m imgur com } == URI(link).host.split(?.).last(3) ||
+          %w{ www imgur com } == URI(link).host.split(?.).last(3)

    if %w{ 500px com } == URI(link).host.split(?.).last(2)
      w, h, u, t = DirectLink._500px(link)
      return struct.new u, w, h, t
    end

-   if %w{ www flickr com } == URI(link).host.split(?.).last(3)
+   begin
      w, h, u = DirectLink.flickr(link)
      f = FastImage.new(u, raise_on_failure: true, http_header: {"User-Agent" => "Mozilla"})
      return struct.new u, w, h, f.type
-   end
+   rescue DirectLink::ErrorMissingEnvVar
+   end if %w{ www flickr com } == URI(link).host.split(?.).last(3)

    if %w{ wikipedia org } == URI(link).host.split(?.).last(2) ||
       %w{ commons wikimedia org } == URI(link).host.split(?.).last(3)
@@ -265,7 +283,33 @@ def DirectLink link, max_redirect_resolving_retry_delay = nil
      return struct.new u, w, h, f.type
    end

-   f = FastImage.new(link, raise_on_failure: true, http_header: {"User-Agent" => "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36"})
-   w, h = f.size
-   struct.new link, w, h, f.type
+   begin
+     f = FastImage.new(link, raise_on_failure: true, http_header: {"User-Agent" => "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36"})
+   rescue FastImage::UnknownImageType
+     raise if giveup
+     require "nokogiri"
+     html = Nokogiri::HTML NetHTTPUtils::request_data link
+     h = {}
+     l = lambda do |node, s = []|
+       node.element_children.flat_map do |child|
+         if "img" == child.node_name
+           begin
+             [[s, (h[child[:src]] = h[child[:src]] || DirectLink(child[:src], nil, true))]]
+           rescue => e
+             []
+           end
+         else
+           l[child, s + [child.node_name]]
+         end
+       end
+     end
+     l[html].group_by(&:first).map{ |k, v| [k.join(?>), v.map(&:last)] }.tap do |result|
+       next unless result.empty?
+       raise unless t = html.at_css "meta[@property='og:image']"
+       return DirectLink t[:content], nil, true
+     end.max_by{ |_, v| v.map{ |i| i.width * i.height }.inject(:+) / v.size }.last
+   else
+     w, h = f.size
+     struct.new link, w, h, f.type
+   end
  end
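
The selection step at the end of the new `rescue FastImage::UnknownImageType` branch can be read in isolation. The sketch below is only an illustration of that step, not the gem's code: it assumes the images have already been collected as pairs of ancestor tag names and an image record, and the `Image` struct, the `main_images` helper and the sample data are all hypothetical. It groups the pairs by ancestor path and keeps the group whose images have the largest mean pixel area, using the same `group_by` / `max_by` chain as the diff.

```ruby
# Hypothetical stand-in for the struct that DirectLink() returns.
Image = Struct.new(:url, :width, :height)

# Each entry pairs the chain of ancestor tag names with the image found there,
# which is the shape the recursive lambda in the diff appears to build.
# All values here are made up for the example.
found = [
  [%w[html body header],    Image.new("logo.png",   32,   32)],
  [%w[html body article p], Image.new("photo1.png", 800,  600)],
  [%w[html body article p], Image.new("photo2.png", 1000, 500)],
  [%w[html body footer],    Image.new("badge.png",  88,   31)],
]

# Group images by their ancestor path, then return the group with the
# largest average area (integer division, as in the diff).
def main_images(found)
  found
    .group_by(&:first)
    .map { |path, pairs| [path.join(">"), pairs.map(&:last)] }
    .max_by { |_, images| images.map { |i| i.width * i.height }.inject(:+) / images.size }
    .last
end

p main_images(found).map(&:url)
# prints ["photo1.png", "photo2.png"]
```

Averaging per group rather than summing keeps a long row of small icons from outweighing a couple of large photos, which seems to be the intent of dividing by `v.size` in the diff.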
data/test.rb CHANGED
@@ -1,4 +1,5 @@
  STDOUT.sync = true
+ require "pp"

  require "minitest/autorun"
  require "minitest/mock"
@@ -6,6 +7,9 @@ require "minitest/mock"
  # TODO: I'm not sure it's ok that after we started using NetHTTPUtils for redirect resolving
  # we don't raise `FastImage::ImageFetchFailure` anymore in any test

+ fail unless ENV.include? "IMGUR_CLIENT_ID"
+ fail unless ENV.include? "FLICKR_API_KEY"
+
  require_relative "lib/directlink"
  DirectLink.silent = true
  describe DirectLink do
@@ -408,7 +412,6 @@ describe DirectLink do
      describe "some other tests" do
        [
          ["http://www.aeronautica.difesa.it/organizzazione/REPARTI/divolo/PublishingImages/6%C2%B0%20Stormo/2013-decollo%20al%20tramonto%20REX%201280.jpg", ["http://www.aeronautica.difesa.it/organizzazione/REPARTI/divolo/PublishingImages/6%C2%B0%20Stormo/2013-decollo%20al%20tramonto%20REX%201280.jpg", 1280, 853, :jpeg]],
-         ["http://example.com", FastImage::UnknownImageType, "FastImage::UnknownImageType"], # we explicitly expect this useless `e.message ` to be sure we know how FastImage behaves
          ["http://minus.com/lkP3hgRJd9npi", SocketError, /nodename nor servname provided, or not known|No address associated with hostname/, 0],
          ["https://i.redd.it/si758zk7r5xz.jpg", NetHTTPUtils::Error, "HTTP error #404 "],
          ["http://www.cutehalloweencostumeideas.org/wp-content/uploads/2017/10/Niagara-Falls_04.jpg", SocketError, /nodename nor servname provided, or not known|Name or service not known/, 0],
@@ -432,6 +435,39 @@ describe DirectLink do
        end
      end

+     describe "giving up" do
+       [
+         ["http://example.com", FastImage::UnknownImageType],
+         ["https://github.com/Nakilon/dhash-vips", FastImage::UnknownImageType, true],
+         ["https://github.com/Nakilon/dhash-vips", 3],
+         ["http://imgur.com/HQHBBBD", FastImage::UnknownImageType, true],
+         ["http://imgur.com/HQHBBBD", "https://i.imgur.com/HQHBBBD.jpg?fb"],
+       ].each_with_index do |(input, expectation, giveup), i|
+         it "##{i + 1}" do
+           t = ENV.delete "IMGUR_CLIENT_ID"
+           begin
+             case expectation
+             when Class
+               e = assert_raises expectation, "for #{input} (giveup = #{giveup})" do
+                 DirectLink input, nil, giveup
+               end
+               assert_equal expectation.to_s, e.message, "for #{input} (giveup = #{giveup})"
+             when String
+               result = DirectLink input, nil, giveup
+               assert_equal expectation, result.url, "for #{input} (giveup = #{giveup})"
+             else
+               result = DirectLink input, nil, giveup
+               assert_equal expectation, result.size, ->{
+                 "for #{input} (giveup = #{giveup}): #{result.map &:url}"
+               }
+             end
+           ensure
+             ENV["IMGUR_CLIENT_ID"] = t
+           end
+         end
+       end
+     end
+
    end

    describe "./bin" do
@@ -468,9 +504,13 @@ describe DirectLink do
      [
        [1, "http://example.com/", "FastImage::UnknownImageType"],
        [1, "http://example.com/404", "NetHTTPUtils::Error: HTTP error #404 "],
-       [1, "http://imgur.com/HQHBBBD", "DirectLink::ErrorMissingEnvVar: define IMGUR_CLIENT_ID env var", " && unset IMGUR_CLIENT_ID"], # TODO: make similar test for ./lib
+
+       # TODO: a test when the giveup=false fails and reraises the DirectLink::ErrorMissingEnvVar
+       # maybe put it to ./lib tests
+
        # by design it should be impossible to write a test for DirectLink::ErrorAssert
        [1, "https://flic.kr/p/DirectLinkErrorNotFound", "NetHTTPUtils::Error: HTTP error #404 "],
+
        [1, "https://imgur.com/a/badlinkpattern", "NetHTTPUtils::Error: HTTP error #404 "],
        # TODO: a test that it appends the `exception.cause`
      ].each_with_index do |(expected_exit_code, link, expected_output, unset), i|
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: directlink
  version: !ruby/object:Gem::Version
-   version: 0.0.3.1
+   version: 0.0.4.0
  platform: ruby
  authors:
  - Victor Maslov aka Nakilon
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2018-08-10 00:00:00.000000000 Z
+ date: 2018-08-12 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: nethttputils
@@ -38,6 +38,20 @@ dependencies:
      - - "~>"
        - !ruby/object:Gem::Version
          version: 2.1.3
+ - !ruby/object:Gem::Dependency
+   name: nokogiri
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
  - !ruby/object:Gem::Dependency
    name: minitest
    requirement: !ruby/object:Gem::Requirement