directlink 0.0.3.1 → 0.0.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: d421a30c60a09ebaf198aeb77679a78a61a8521c
4
- data.tar.gz: 7acbb0d68a9e150da049bd50afd0bd5a64e61fe1
3
+ metadata.gz: 31d4684edbeadeddce540e3ca0a1894f5756af86
4
+ data.tar.gz: f1efd18489faf49d2bc7460f210b53cf7cb0929c
5
5
  SHA512:
6
- metadata.gz: e1e5a70b474a6261f113304aee2639e10183dc5150017e68f68974b72d3ae4b9b8d41b3a5243883fc20214db6ae51d624a76039b548ccf33a07fe8911f5d1463
7
- data.tar.gz: e7ece452fbf63d76a9d155dd53e05d4e702e2f5c9d825a1562206f3ff727ca1e85e910a15814ba646f59cc600f76284206604d91c929e03efccfe5f372496d70
6
+ metadata.gz: bef3ef81ba91007a8b26d6f45813d67319bd00b3c3bfbc71977d2caa4c40e29f31e8f35a32d2f7af8c1c8bcd8d7a267ef3998bc51912a997462a864f2cf563e0
7
+ data.tar.gz: 18f45737947862ebc5409b3a20be5d2389aa7e9f85125bd85465c670c7ab488b43ab55c599ac79bdfde7063125b5a891a42336f9c2348ffcd5807707dc908d44
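A `.gem` file is a tar archive whose members include `metadata.gz`, `data.tar.gz` and `checksums.yaml.gz`. The hashes above are the digests of the first two members. Here is a minimal sketch of checking one extracted member against such a `checksums.yaml`. It assumes the member has already been unpacked to disk and the YAML text is in memory. The helper name `checksum_ok?` is hypothetical.

```ruby
require "digest"
require "yaml"

# Hypothetical helper: compares the SHA512 digest of an extracted .gem member
# (e.g. data.tar.gz) with the value recorded for it in checksums.yaml.
def checksum_ok?(path, checksums_yaml, member = File.basename(path))
  expected = YAML.safe_load(checksums_yaml).fetch("SHA512").fetch(member).to_s
  Digest::SHA512.file(path).hexdigest == expected
end
```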
data/README.md CHANGED
@@ -3,7 +3,7 @@
3
3
 
4
4
  # gem directlink
5
5
 
6
- This tool converts any sort of image hyperlink (a thumbnail URL, a link to an album, etc.) to a high resolution one. Also it tells the resulting resolution and the image type (format). I wanted such automation often so I made a gem with a binary.
6
+ Given any sort of hyperlink (a thumbnail URL, a link to a photo album, a news article, etc.), this tool finds direct links to the high-resolution images on that page. It also reports the resulting resolution and the image type (format). The gem includes a binary, so you can also use it as a CLI.
7
7
 
8
8
  ## Usage
9
9
 
@@ -16,7 +16,7 @@ $ gem install directlink
16
16
  $ directlink
17
17
  usage: directlink [--debug] [--json] [--github] <link1> <link2> <link3> ...
18
18
  ```
19
- Converts `<img src=` attribute value from any Google web service (current Google regexes are very strict and may often fail -- it is a [defensive programming](https://en.wikipedia.org/wiki/Defensive_programming) practice -- report me your links!) to the largest available:
19
+ Converts the `<img src=` attribute value from any Google web service to the largest available size:
20
20
  ```
21
21
  $ directlink //4.bp.blogspot.com/-5kP8ndL0kuM/Wpt82UCqvmI/AAAAAAAAEjI/ZbbZWs0-kgwRXEJ9JEGioR0bm6U8MOkvQCKgBGAs/w530-h278-p/IMG_20171223_093922.jpg
22
22
  <= //4.bp.blogspot.com/-5kP8ndL0kuM/Wpt82UCqvmI/AAAAAAAAEjI/ZbbZWs0-kgwRXEJ9JEGioR0bm6U8MOkvQCKgBGAs/w530-h278-p/IMG_20171223_093922.jpg
@@ -77,16 +77,30 @@ $ directlink --json https://imgur.com/a/oacI3gl https://avatars1.githubuserconte
77
77
  }
78
78
  ]
79
79
  ```
80
- Downloads master:HEAD version of `lib/directlink.rb` from GitHub and uses it once instead of installed one (this is easier than installing gem from repo):
80
+ Downloads the `master:HEAD` version of `lib/directlink.rb` from GitHub and uses it once instead of the installed one:
81
81
  ```
82
82
  $ directlink --github https://imgur.com/a/oacI3gl
83
83
  ```
84
- When an image hosting with known API is recognized, the API will be used and you'll have to create app there and provide env vars:
84
+ When an image hosting service with a known API is recognized, the tool tries to use the API tokens you have provided as env vars (otherwise it falls back to the "don't give up" mode):
85
85
  ```
86
86
  $ export IMGUR_CLIENT_ID=0f99cd781...
87
87
  $ export FLICKR_API_KEY=dc2bfd348b...
88
88
  ```
89
89
 
90
+ #### the "don't give up" mode
91
+
92
+ If the passed link is neither a direct image link nor a photo page of a known image hosting service, the tool can still find the main images on the linked webpage (here it found three images in the Markdown file):
93
+ ```
94
+ $ bundle exec bin/directlink https://github.com/Nakilon/dhash-vips
95
+ <= https://github.com/Nakilon/dhash-vips
96
+ => https://camo.githubusercontent.com/852607c7f4b604fc3c83b782c4f6983cf488b0d4/68747470733a2f2f73746f726167652e676f6f676c65617069732e636f6d2f64686173682d766970732e6e616b696c6f6e2e70726f2f64686173685f69737375655f6578616d706c652e706e67
97
+ png 592x366
98
+ => https://camo.githubusercontent.com/5e354666bac69e32d605dbd45351bfb7d808924b/68747470733a2f2f73746f726167652e676f6f676c65617069732e636f6d2f64686173682d766970732e6e616b696c6f6e2e70726f2f6964686173685f6578616d706c655f696e2e706e67
99
+ png 773x679
100
+ => https://camo.githubusercontent.com/5456cc20ae9b20c06792ddd19b533ae36404d8c1/68747470733a2f2f73746f726167652e676f6f676c65617069732e636f6d2f64686173682d766970732e6e616b696c6f6e2e70726f2f6964686173685f6578616d706c655f6f75742e706e67
101
+ png 1610x800
102
+ ```
103
+
90
104
  ### As a library
91
105
 
92
106
  ```
@@ -110,11 +124,16 @@ Google can serve image in arbitrary resolution so `DirectLink.google` has an opt
110
124
  irb> DirectLink.google "//4.bp.blogspot.com/-5kP8ndL0kuM/Wpt82UCqvmI/AAAAAAAAEjI/ZbbZWs0-kgwRXEJ9JEGioR0bm6U8MOkvQCKgBGAs/w530-h278-p/IMG_20171223_093922.jpg", 100
111
125
  => "https://4.bp.blogspot.com/-5kP8ndL0kuM/Wpt82UCqvmI/AAAAAAAAEjI/ZbbZWs0-kgwRXEJ9JEGioR0bm6U8MOkvQCKgBGAs/s100/IMG_20171223_093922.jpg"
112
126
  ```
113
- To silent the logger that `DirectLink.imgur` uses:
127
+ To disable the "don't give up" mode (which otherwise spends time analyzing all the images on the linked page), pass `true` as the third argument:
128
+ ```
129
+ irb> DirectLink "https://github.com/Nakilon/dhash-vips", nil, true
130
+ # raises FastImage::UnknownImageType
131
+ ```
132
+ To silence the STDOUT logger output that you may sometimes see:
114
133
  ```ruby
115
134
  DirectLink.silent = true
116
135
  ```
117
- You also may look into [`bin/directlink`](bin/directlink) for usage example and the list of all possible exceptions.
136
+ You may also look into [`bin/directlink`](bin/directlink) for a library usage example and the list of all possible exceptions.
118
137
 
119
138
  #### about long retries
120
139
 
@@ -125,7 +144,7 @@ NetHTTPUtils.logger.level = Logger::WARN
125
144
  ```
126
145
  W 180507 102210 : NetHTTPUtils : retrying in 10 seconds because of SocketError 'Failed to open TCP connection to minus.com:80 (getaddrinfo: nodename nor servname provided, or not known)' at: http://minus.com/
127
146
  ```
128
- To make `DirectLink()` respond faster pass an optional argument that specifies the max retry delay as any numeric value. Here we get the exception immediately:
147
+ To make `DirectLink()` respond faster, pass an optional second argument that specifies the max retry delay. Here we get the exception immediately:
129
148
  ```ruby
130
149
  DirectLink "http://minus.com/", 0
131
150
  ```
@@ -136,7 +155,7 @@ SocketError: Failed to open TCP connection to minus.com:80 (getaddrinfo: nodenam
136
155
  ## Notes:
137
156
 
138
157
  * `module DirectLink` public methods return different sets of properties -- `DirectLink()` unites them
139
- * the `DirectLink::ErrorAssert` should never happen and you might report it if it does
158
+ * `ErrorAssert` and `ErrorMissingEnvVar` should never escape `DirectLink()`; please report it if either one does
140
159
  * style: `@@` and lambdas are used to keep things private
141
- * this gem is a 2 or 3 libraries merged so don't expect tests to be full and consistent
142
- * since 500px.com closed their API in June 2018 the gem uses potentially unreliable undocumented methods
160
+ * historically, this gem is 2 or 3 libraries merged together -- this is why the tests may look awkward
161
+ * 500px.com discontinued its API in June 2018 -- the tool now uses undocumented methods
@@ -1,3 +1,2 @@
1
1
  export IMGUR_CLIENT_ID=0f99cd781c9d0d8
2
2
  export FLICKR_API_KEY=dc2bfd348b01bdc5b09d36876dc38f3d
3
- export _500PX_CONSUMER_KEY=ESkHTUELdcE48bezGfwzSjqVIBVTnNRIPTviTGLv
data/bin/directlink CHANGED
@@ -72,7 +72,7 @@ rescue NetHTTPUtils::Error,
72
72
  SocketError,
73
73
  FastImage::UnknownImageType,
74
74
  FastImage::ImageFetchFailure,
75
- DirectLink::ErrorMissingEnvVar,
75
+ # DirectLink::ErrorMissingEnvVar,
76
76
  # DirectLink::ErrorAssert,
77
77
  DirectLink::ErrorNotFound,
78
78
  DirectLink::ErrorBadLink => e
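In 0.0.4.0, `DirectLink::ErrorMissingEnvVar` is commented out of the script's `rescue` list, because `DirectLink()` now rescues it itself. Below is a stripped-down sketch of that `rescue *list` pattern. The exception classes are hypothetical stand-ins, and `cli_status` is not part of the real script. The real script reports through its exit code and output rather than returning a pair.

```ruby
# Hypothetical stand-ins for the library's exception classes
class ErrorNotFound < RuntimeError; end
class ErrorBadLink < RuntimeError; end
class ErrorMissingEnvVar < RuntimeError; end

# Exceptions turned into a one-line report with status 1; ErrorMissingEnvVar
# is left out (mirroring the commented-out line), so it propagates instead.
REPORTED = [ErrorNotFound, ErrorBadLink].freeze

def cli_status
  yield
  [0, nil]
rescue *REPORTED => e
  [1, "#{e.class}: #{e.message}"]
end
```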
data/directlink.gemspec CHANGED
@@ -1,6 +1,6 @@
1
1
  Gem::Specification.new do |spec|
2
2
  spec.name = "directlink"
3
- spec.version = "0.0.3.1"
3
+ spec.version = "0.0.4.0"
4
4
  spec.summary = "converts any kind of image hyperlink to direct link, type of image and its resolution"
5
5
 
6
6
  spec.author = "Victor Maslov aka Nakilon"
@@ -11,6 +11,7 @@ Gem::Specification.new do |spec|
11
11
 
12
12
  spec.add_dependency "nethttputils", "~>0.2.4.0"
13
13
  spec.add_dependency "fastimage", "~>2.1.3"
14
+ spec.add_dependency "nokogiri"
14
15
  spec.add_development_dependency "minitest"
15
16
 
16
17
  spec.require_path = "lib"
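The gemspec pins `nethttputils` and `fastimage` with the pessimistic operator `~>`, but it adds `nokogiri` without a constraint. The generated metadata records that missing constraint as `>= 0`. RubyGems' own `Gem::Requirement` and `Gem::Version` give a quick way to see what each constraint admits. The `allows?` wrapper is only for illustration.

```ruby
require "rubygems"

# Illustrative wrapper: does +constraint+ accept +version+?
def allows?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

allows?("~> 2.1.3", "2.1.9")     # => true  (patch releases are accepted)
allows?("~> 2.1.3", "2.2.0")     # => false (the next minor release is not)
allows?("~> 0.2.4.0", "0.2.4.7") # => true
allows?(">= 0", "99.0")          # => true  (no constraint accepts anything)
```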
data/lib/directlink.rb CHANGED
@@ -12,24 +12,28 @@ module DirectLink
12
12
  puts str unless Module.nesting.first.silent
13
13
  end
14
14
 
15
- class ErrorMissingEnvVar < RuntimeError ; end
16
15
  class ErrorAssert < RuntimeError
17
16
  def initialize msg
18
17
  super "#{msg} -- consider reporting this issue to GitHub"
19
18
  end
20
19
  end
21
- @@LoggingError = Class.new RuntimeError do
20
+ logging_error = Class.new RuntimeError do
22
21
  def initialize msg
23
22
  Module.nesting.first.logger.error msg
24
23
  super msg
25
24
  end
26
25
  end
27
- class ErrorNotFound < @@LoggingError ; end
28
- class ErrorBadLink < @@LoggingError
26
+ class ErrorNotFound < logging_error ; end
27
+ class ErrorBadLink < logging_error
29
28
  def initialize link, sure = false
30
29
  super "#{link.inspect}#{" -- if you think this link is valid, please report the issue" unless sure}"
31
30
  end
32
31
  end
32
+ class ErrorMissingEnvVar < logging_error
33
+ def initialize msg
34
+ super "(warning, recommendation) #{msg}"
35
+ end
36
+ end
33
37
 
34
38
 
35
39
  def self.google src, width = 0
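0.0.4.0 replaces the `@@LoggingError` class variable with a local variable, `logging_error`, which exists only while the module body is being evaluated. `ErrorMissingEnvVar` now inherits from it, so it is logged as well. The sketch below is a self-contained version of the same pattern. The module `Demo` and its `logger` accessor are assumptions standing in for `DirectLink` and its logger.

```ruby
require "logger"

module Demo # hypothetical stand-in for DirectLink
  class << self
    attr_accessor :logger
  end
  self.logger = Logger.new($stderr)

  # A local variable, not a constant or class variable: it is invisible
  # outside this module body, yet classes defined here can subclass it.
  logging_error = Class.new(RuntimeError) do
    def initialize msg
      Demo.logger.error msg # log when the exception is constructed
      super msg
    end
  end

  class ErrorNotFound < logging_error; end
  class ErrorMissingEnvVar < logging_error
    def initialize msg
      super "(warning, recommendation) #{msg}"
    end
  end
end
```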
@@ -202,7 +206,7 @@ end
202
206
 
203
207
  require "fastimage"
204
208
 
205
- def DirectLink link, max_redirect_resolving_retry_delay = nil
209
+ def DirectLink link, max_redirect_resolving_retry_delay = nil, giveup = false
206
210
  begin
207
211
  URI link
208
212
  rescue URI::InvalidURIError
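The new third parameter, `giveup`, decides what happens when FastImage cannot recognize the link as an image. With `true`, the exception is re-raised. Otherwise, the page itself is scanned. The nested calls made for each `<img src>` pass `true`, so scanning a page never triggers a second scan. The sketch below shows that control flow with hypothetical stand-ins: `probe` only pretends to detect images, and `UnknownImageType` replaces `FastImage::UnknownImageType`.

```ruby
# Hypothetical stand-in for FastImage::UnknownImageType
class UnknownImageType < StandardError; end

# Hypothetical probe: treats only .jpg/.png URLs as images
def probe(link)
  raise UnknownImageType, link unless link.end_with?(".jpg", ".png")
  [:image, link]
end

def resolve(link, giveup = false)
  result = probe(link)
rescue UnknownImageType
  raise if giveup     # a bare raise re-raises the exception being handled
  [:page_scan, link]  # the real method parses the page HTML here instead
else
  result              # reached only when probe raised nothing
end
```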
@@ -234,28 +238,42 @@ def DirectLink link, max_redirect_resolving_retry_delay = nil
234
238
  } : {})
235
239
  raise NetHTTPUtils::Error.new "", r.code.to_i unless "200" == r.code
236
240
  link = r.uri.to_s
241
+ # why do we resolve redirects before trying the known adapters?
242
+ # because they can be hidden behind URL shorteners
243
+ # also it can resolve NetHTTPUtils::Error(404) before trying the adapter
244
+
245
+ # TODO: get rid of this copypasta, that is caused by that we want to pass urls without schema to this method
246
+ if %w{ lh3 googleusercontent com } == URI(link).host.split(?.).last(3) ||
247
+ %w{ bp blogspot com } == URI(link).host.split(?.).last(3)
248
+ u = DirectLink.google link
249
+ f = FastImage.new(u, raise_on_failure: true, http_header: {"User-Agent" => "Mozilla"})
250
+ w, h = f.size
251
+ return struct.new u, w, h, f.type
252
+ end
237
253
 
238
- if %w{ imgur com } == URI(link).host.split(?.).last(2) ||
239
- %w{ i imgur com } == URI(link).host.split(?.).last(3) ||
240
- %w{ m imgur com } == URI(link).host.split(?.).last(3) ||
241
- %w{ www imgur com } == URI(link).host.split(?.).last(3)
254
+ begin
242
255
  imgur = DirectLink.imgur(link).sort_by{ |u, w, h, t| - w * h }.map do |u, w, h, t|
243
256
  struct.new u, w, h, t
244
257
  end
245
258
  # `DirectLink.imgur` return value is always an Array
246
259
  return imgur.size == 1 ? imgur.first : imgur
247
- end
260
+ rescue DirectLink::ErrorMissingEnvVar
261
+ end if %w{ imgur com } == URI(link).host.split(?.).last(2) ||
262
+ %w{ i imgur com } == URI(link).host.split(?.).last(3) ||
263
+ %w{ m imgur com } == URI(link).host.split(?.).last(3) ||
264
+ %w{ www imgur com } == URI(link).host.split(?.).last(3)
248
265
 
249
266
  if %w{ 500px com } == URI(link).host.split(?.).last(2)
250
267
  w, h, u, t = DirectLink._500px(link)
251
268
  return struct.new u, w, h, t
252
269
  end
253
270
 
254
- if %w{ www flickr com } == URI(link).host.split(?.).last(3)
271
+ begin
255
272
  w, h, u = DirectLink.flickr(link)
256
273
  f = FastImage.new(u, raise_on_failure: true, http_header: {"User-Agent" => "Mozilla"})
257
274
  return struct.new u, w, h, f.type
258
- end
275
+ rescue DirectLink::ErrorMissingEnvVar
276
+ end if %w{ www flickr com } == URI(link).host.split(?.).last(3)
259
277
 
260
278
  if %w{ wikipedia org } == URI(link).host.split(?.).last(2) ||
261
279
  %w{ commons wikimedia org } == URI(link).host.split(?.).last(3)
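Every adapter check has the form `%w{ imgur com } == URI(link).host.split(?.).last(2)`. That is, it compares the last n dot-separated labels of the host. This comparison rejects look-alikes such as `notimgur.com`, which a plain `end_with?("imgur.com")` would accept. Below is a small helper expressing the same test. The name `host_suffix?` is hypothetical; the gem inlines the comparison.

```ruby
require "uri"

# Hypothetical helper: true when the host of +link+ ends with +labels+,
# compared label by label (the same test as `.split(?.).last(n)` above).
def host_suffix?(link, *labels)
  host = URI(link).host or return false # e.g. a relative path has no host
  host.split(".").last(labels.size) == labels
end

host_suffix?("https://i.imgur.com/HQHBBBD.jpg", "imgur", "com")  # => true
host_suffix?("https://notimgur.com/x", "imgur", "com")             # => false
host_suffix?("//4.bp.blogspot.com/x.jpg", "bp", "blogspot", "com") # => true
```

As written, the `last(2)` imgur check already matches the `i.`, `m.` and `www.` subdomains that are listed after it.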
@@ -265,7 +283,33 @@ def DirectLink link, max_redirect_resolving_retry_delay = nil
265
283
  return struct.new u, w, h, f.type
266
284
  end
267
285
 
268
- f = FastImage.new(link, raise_on_failure: true, http_header: {"User-Agent" => "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36"})
269
- w, h = f.size
270
- struct.new link, w, h, f.type
286
+ begin
287
+ f = FastImage.new(link, raise_on_failure: true, http_header: {"User-Agent" => "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36"})
288
+ rescue FastImage::UnknownImageType
289
+ raise if giveup
290
+ require "nokogiri"
291
+ html = Nokogiri::HTML NetHTTPUtils::request_data link
292
+ h = {}
293
+ l = lambda do |node, s = []|
294
+ node.element_children.flat_map do |child|
295
+ if "img" == child.node_name
296
+ begin
297
+ [[s, (h[child[:src]] = h[child[:src]] || DirectLink(child[:src], nil, true))]]
298
+ rescue => e
299
+ []
300
+ end
301
+ else
302
+ l[child, s + [child.node_name]]
303
+ end
304
+ end
305
+ end
306
+ l[html].group_by(&:first).map{ |k, v| [k.join(?>), v.map(&:last)] }.tap do |result|
307
+ next unless result.empty?
308
+ raise unless t = html.at_css "meta[@property='og:image']"
309
+ return DirectLink t[:content], nil, true
310
+ end.max_by{ |_, v| v.map{ |i| i.width * i.height }.inject(:+) / v.size }.last
311
+ else
312
+ w, h = f.size
313
+ struct.new link, w, h, f.type
314
+ end
271
315
  end
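In the "don't give up" branch, each `<img>` whose `src` resolves is tagged with the chain of ancestor tag names leading to it. The images are grouped by that chain, and the group with the largest average area (width × height) wins. If no image resolves, the `og:image` meta tag is tried instead. The sketch below covers just the selection step, working over already-extracted data, without Nokogiri or network access. `Image` and `pick_main_group` are hypothetical names.

```ruby
# Hypothetical record for a resolved image
Image = Struct.new(:url, :width, :height)

# entries: [[ancestor_tag_names, Image], ...], e.g. [["html", "body", "p"], img]
# Returns the images of the group with the largest mean area, or nil if empty.
def pick_main_group(entries)
  groups = entries.group_by(&:first).map { |path, pairs| [path.join(">"), pairs.map(&:last)] }
  return nil if groups.empty? # the gem tries og:image at this point instead
  groups.max_by { |_, imgs| imgs.sum { |i| i.width * i.height } / imgs.size }.last
end
```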
data/test.rb CHANGED
@@ -1,4 +1,5 @@
1
1
  STDOUT.sync = true
2
+ require "pp"
2
3
 
3
4
  require "minitest/autorun"
4
5
  require "minitest/mock"
@@ -6,6 +7,9 @@ require "minitest/mock"
6
7
  # TODO: I'm not sure it's ok that after we started using NetHTTPUtils for redirect resolving
7
8
  # we don't raise `FastImage::ImageFetchFailure` anymore in any test
8
9
 
10
+ fail unless ENV.include? "IMGUR_CLIENT_ID"
11
+ fail unless ENV.include? "FLICKR_API_KEY"
12
+
9
13
  require_relative "lib/directlink"
10
14
  DirectLink.silent = true
11
15
  describe DirectLink do
@@ -408,7 +412,6 @@ describe DirectLink do
408
412
  describe "some other tests" do
409
413
  [
410
414
  ["http://www.aeronautica.difesa.it/organizzazione/REPARTI/divolo/PublishingImages/6%C2%B0%20Stormo/2013-decollo%20al%20tramonto%20REX%201280.jpg", ["http://www.aeronautica.difesa.it/organizzazione/REPARTI/divolo/PublishingImages/6%C2%B0%20Stormo/2013-decollo%20al%20tramonto%20REX%201280.jpg", 1280, 853, :jpeg]],
411
- ["http://example.com", FastImage::UnknownImageType, "FastImage::UnknownImageType"], # we explicitly expect this useless `e.message ` to be sure we know how FastImage behaves
412
415
  ["http://minus.com/lkP3hgRJd9npi", SocketError, /nodename nor servname provided, or not known|No address associated with hostname/, 0],
413
416
  ["https://i.redd.it/si758zk7r5xz.jpg", NetHTTPUtils::Error, "HTTP error #404 "],
414
417
  ["http://www.cutehalloweencostumeideas.org/wp-content/uploads/2017/10/Niagara-Falls_04.jpg", SocketError, /nodename nor servname provided, or not known|Name or service not known/, 0],
@@ -432,6 +435,39 @@ describe DirectLink do
432
435
  end
433
436
  end
434
437
 
438
+ describe "giving up" do
439
+ [
440
+ ["http://example.com", FastImage::UnknownImageType],
441
+ ["https://github.com/Nakilon/dhash-vips", FastImage::UnknownImageType, true],
442
+ ["https://github.com/Nakilon/dhash-vips", 3],
443
+ ["http://imgur.com/HQHBBBD", FastImage::UnknownImageType, true],
444
+ ["http://imgur.com/HQHBBBD", "https://i.imgur.com/HQHBBBD.jpg?fb"],
445
+ ].each_with_index do |(input, expectation, giveup), i|
446
+ it "##{i + 1}" do
447
+ t = ENV.delete "IMGUR_CLIENT_ID"
448
+ begin
449
+ case expectation
450
+ when Class
451
+ e = assert_raises expectation, "for #{input} (giveup = #{giveup})" do
452
+ DirectLink input, nil, giveup
453
+ end
454
+ assert_equal expectation.to_s, e.message, "for #{input} (giveup = #{giveup})"
455
+ when String
456
+ result = DirectLink input, nil, giveup
457
+ assert_equal expectation, result.url, "for #{input} (giveup = #{giveup})"
458
+ else
459
+ result = DirectLink input, nil, giveup
460
+ assert_equal expectation, result.size, ->{
461
+ "for #{input} (giveup = #{giveup}): #{result.map &:url}"
462
+ }
463
+ end
464
+ ensure
465
+ ENV["IMGUR_CLIENT_ID"] = t
466
+ end
467
+ end
468
+ end
469
+ end
470
+
435
471
  end
436
472
 
437
473
  describe "./bin" do
@@ -468,9 +504,13 @@ describe DirectLink do
468
504
  [
469
505
  [1, "http://example.com/", "FastImage::UnknownImageType"],
470
506
  [1, "http://example.com/404", "NetHTTPUtils::Error: HTTP error #404 "],
471
- [1, "http://imgur.com/HQHBBBD", "DirectLink::ErrorMissingEnvVar: define IMGUR_CLIENT_ID env var", " && unset IMGUR_CLIENT_ID"], # TODO: make similar test for ./lib
507
+
508
+ # TODO: a test when the giveup=false fails and reraises the DirectLink::ErrorMissingEnvVar
509
+ # maybe put it to ./lib tests
510
+
472
511
  # by design it should be impossible to write a test for DirectLink::ErrorAssert
473
512
  [1, "https://flic.kr/p/DirectLinkErrorNotFound", "NetHTTPUtils::Error: HTTP error #404 "],
513
+
474
514
  [1, "https://imgur.com/a/badlinkpattern", "NetHTTPUtils::Error: HTTP error #404 "],
475
515
  # TODO: a test that it appends the `exception.cause`
476
516
  ].each_with_index do |(expected_exit_code, link, expected_output, unset), i|
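The new "giving up" tests delete `IMGUR_CLIENT_ID` for the duration of each case and put it back in an `ensure` clause. The same save-and-restore logic can be factored into a helper. `without_env` is a hypothetical name. It relies on `ENV[name] = nil` deleting the variable, so a variable that was originally unset is left unset.

```ruby
# Hypothetical helper: run the block with +name+ removed from ENV,
# restoring the previous state afterwards even if the block raises.
def without_env(name)
  saved = ENV.delete(name) # nil when the variable was not set
  yield
ensure
  ENV[name] = saved # assigning nil deletes the variable again
end
```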
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: directlink
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.0.3.1
4
+ version: 0.0.4.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Victor Maslov aka Nakilon
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2018-08-10 00:00:00.000000000 Z
11
+ date: 2018-08-12 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: nethttputils
@@ -38,6 +38,20 @@ dependencies:
38
38
  - - "~>"
39
39
  - !ruby/object:Gem::Version
40
40
  version: 2.1.3
41
+ - !ruby/object:Gem::Dependency
42
+ name: nokogiri
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - ">="
46
+ - !ruby/object:Gem::Version
47
+ version: '0'
48
+ type: :runtime
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - ">="
53
+ - !ruby/object:Gem::Version
54
+ version: '0'
41
55
  - !ruby/object:Gem::Dependency
42
56
  name: minitest
43
57
  requirement: !ruby/object:Gem::Requirement