image_metadata_scraper 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+   metadata.gz: bd0aa9cda13195b2788ff0f711d86872401ec746
4
+   data.tar.gz: 971691ea282517cfce83ad72277543e0cd3b6c69
5
+ SHA512:
6
+   metadata.gz: 34f9410fedb0716de1bb3bf9ffa659f4a9753d405777d740b247bedcf8dbf7cba206f81c6dc4e8efed5197007df5fd0a59bbfc76c19b64279623a900005c5f67
7
+   data.tar.gz: 4ce3f666c1fa9e4da10c76f4b74834fe346453d92419a8bd4ad7cdea9a6f67042e6e1509c9ece9938435747d71e8895875818d885520a5fda92d3ae7b1b4b1cd
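As a rough illustration of how the digests in checksums.yaml could be checked against the packaged files, the sketch below compares a SHA512 digest with the value stored in a YAML document of the same shape. The helper name `checksum_ok?` and the payload are hypothetical, and this is not necessarily how RubyGems itself performs the verification.

```ruby
require 'digest'
require 'yaml'

# Hypothetical helper: true when the SHA512 digest of `bytes` equals the
# value recorded for `name` in a checksums.yaml-style document.
def checksum_ok?(checksums_yaml, name, bytes)
  expected = YAML.safe_load(checksums_yaml).fetch('SHA512').fetch(name)
  Digest::SHA512.hexdigest(bytes) == expected
end

# Build a stand-in checksums document for a made-up payload.
payload = 'example archive bytes'
doc = { 'SHA512' => { 'data.tar.gz' => Digest::SHA512.hexdigest(payload) } }.to_yaml

puts checksum_ok?(doc, 'data.tar.gz', payload)
puts checksum_ok?(doc, 'data.tar.gz', 'tampered')
```

In practice `bytes` would be the contents of data.tar.gz or metadata.gz read in binary mode.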
data/.gitignore ADDED
@@ -0,0 +1,9 @@
1
+ /.bundle/
2
+ /.yardoc
3
+ /Gemfile.lock
4
+ /_yardoc/
5
+ /coverage/
6
+ /doc/
7
+ /pkg/
8
+ /spec/reports/
9
+ /tmp/
data/.travis.yml ADDED
@@ -0,0 +1,6 @@
1
+ sudo: false
2
+ language: ruby
3
+ rvm:
4
+ - 2.4.0
5
+ after_success:
6
+ - bundle exec codeclimate-test-reporter
data/Gemfile ADDED
@@ -0,0 +1,2 @@
1
+ source 'https://rubygems.org'
2
+ gemspec
data/LICENSE ADDED
@@ -0,0 +1,116 @@
1
+ CC0 1.0 Universal
2
+
3
+ Statement of Purpose
4
+
5
+ The laws of most jurisdictions throughout the world automatically confer
6
+ exclusive Copyright and Related Rights (defined below) upon the creator and
7
+ subsequent owner(s) (each and all, an "owner") of an original work of
8
+ authorship and/or a database (each, a "Work").
9
+
10
+ Certain owners wish to permanently relinquish those rights to a Work for the
11
+ purpose of contributing to a commons of creative, cultural and scientific
12
+ works ("Commons") that the public can reliably and without fear of later
13
+ claims of infringement build upon, modify, incorporate in other works, reuse
14
+ and redistribute as freely as possible in any form whatsoever and for any
15
+ purposes, including without limitation commercial purposes. These owners may
16
+ contribute to the Commons to promote the ideal of a free culture and the
17
+ further production of creative, cultural and scientific works, or to gain
18
+ reputation or greater distribution for their Work in part through the use and
19
+ efforts of others.
20
+
21
+ For these and/or other purposes and motivations, and without any expectation
22
+ of additional consideration or compensation, the person associating CC0 with a
23
+ Work (the "Affirmer"), to the extent that he or she is an owner of Copyright
24
+ and Related Rights in the Work, voluntarily elects to apply CC0 to the Work
25
+ and publicly distribute the Work under its terms, with knowledge of his or her
26
+ Copyright and Related Rights in the Work and the meaning and intended legal
27
+ effect of CC0 on those rights.
28
+
29
+ 1. Copyright and Related Rights. A Work made available under CC0 may be
30
+ protected by copyright and related or neighboring rights ("Copyright and
31
+ Related Rights"). Copyright and Related Rights include, but are not limited
32
+ to, the following:
33
+
34
+ i. the right to reproduce, adapt, distribute, perform, display, communicate,
35
+ and translate a Work;
36
+
37
+ ii. moral rights retained by the original author(s) and/or performer(s);
38
+
39
+ iii. publicity and privacy rights pertaining to a person's image or likeness
40
+ depicted in a Work;
41
+
42
+ iv. rights protecting against unfair competition in regards to a Work,
43
+ subject to the limitations in paragraph 4(a), below;
44
+
45
+ v. rights protecting the extraction, dissemination, use and reuse of data in
46
+ a Work;
47
+
48
+ vi. database rights (such as those arising under Directive 96/9/EC of the
49
+ European Parliament and of the Council of 11 March 1996 on the legal
50
+ protection of databases, and under any national implementation thereof,
51
+ including any amended or successor version of such directive); and
52
+
53
+ vii. other similar, equivalent or corresponding rights throughout the world
54
+ based on applicable law or treaty, and any national implementations thereof.
55
+
56
+ 2. Waiver. To the greatest extent permitted by, but not in contravention of,
57
+ applicable law, Affirmer hereby overtly, fully, permanently, irrevocably and
58
+ unconditionally waives, abandons, and surrenders all of Affirmer's Copyright
59
+ and Related Rights and associated claims and causes of action, whether now
60
+ known or unknown (including existing as well as future claims and causes of
61
+ action), in the Work (i) in all territories worldwide, (ii) for the maximum
62
+ duration provided by applicable law or treaty (including future time
63
+ extensions), (iii) in any current or future medium and for any number of
64
+ copies, and (iv) for any purpose whatsoever, including without limitation
65
+ commercial, advertising or promotional purposes (the "Waiver"). Affirmer makes
66
+ the Waiver for the benefit of each member of the public at large and to the
67
+ detriment of Affirmer's heirs and successors, fully intending that such Waiver
68
+ shall not be subject to revocation, rescission, cancellation, termination, or
69
+ any other legal or equitable action to disrupt the quiet enjoyment of the Work
70
+ by the public as contemplated by Affirmer's express Statement of Purpose.
71
+
72
+ 3. Public License Fallback. Should any part of the Waiver for any reason be
73
+ judged legally invalid or ineffective under applicable law, then the Waiver
74
+ shall be preserved to the maximum extent permitted taking into account
75
+ Affirmer's express Statement of Purpose. In addition, to the extent the Waiver
76
+ is so judged Affirmer hereby grants to each affected person a royalty-free,
77
+ non transferable, non sublicensable, non exclusive, irrevocable and
78
+ unconditional license to exercise Affirmer's Copyright and Related Rights in
79
+ the Work (i) in all territories worldwide, (ii) for the maximum duration
80
+ provided by applicable law or treaty (including future time extensions), (iii)
81
+ in any current or future medium and for any number of copies, and (iv) for any
82
+ purpose whatsoever, including without limitation commercial, advertising or
83
+ promotional purposes (the "License"). The License shall be deemed effective as
84
+ of the date CC0 was applied by Affirmer to the Work. Should any part of the
85
+ License for any reason be judged legally invalid or ineffective under
86
+ applicable law, such partial invalidity or ineffectiveness shall not
87
+ invalidate the remainder of the License, and in such case Affirmer hereby
88
+ affirms that he or she will not (i) exercise any of his or her remaining
89
+ Copyright and Related Rights in the Work or (ii) assert any associated claims
90
+ and causes of action with respect to the Work, in either case contrary to
91
+ Affirmer's express Statement of Purpose.
92
+
93
+ 4. Limitations and Disclaimers.
94
+
95
+ a. No trademark or patent rights held by Affirmer are waived, abandoned,
96
+ surrendered, licensed or otherwise affected by this document.
97
+
98
+ b. Affirmer offers the Work as-is and makes no representations or warranties
99
+ of any kind concerning the Work, express, implied, statutory or otherwise,
100
+ including without limitation warranties of title, merchantability, fitness
101
+ for a particular purpose, non infringement, or the absence of latent or
102
+ other defects, accuracy, or the present or absence of errors, whether or not
103
+ discoverable, all to the greatest extent permissible under applicable law.
104
+
105
+ c. Affirmer disclaims responsibility for clearing rights of other persons
106
+ that may apply to the Work or any use thereof, including without limitation
107
+ any person's Copyright and Related Rights in the Work. Further, Affirmer
108
+ disclaims responsibility for obtaining any necessary consents, permissions
109
+ or other rights required for any use of the Work.
110
+
111
+ d. Affirmer understands and acknowledges that Creative Commons is not a
112
+ party to this document and has no duty or obligation with respect to this
113
+ CC0 or use of the Work.
114
+
115
+ For more information, please see
116
+ <http://creativecommons.org/publicdomain/zero/1.0/>
data/README.md ADDED
@@ -0,0 +1,13 @@
1
+ # ImageMetadataScraper
2
+
3
+ ```ruby
4
+ require 'image_metadata_scraper'
5
+ ImageMetadataScraper.scrape 'http://fav.me/shorturl'
6
+
7
+ # => { :image_url => "http://orig.deviantart.net/image-file-url.jpg",
8
+ #      :thumbnail_url => "http://pre.deviantart.net/small-image-version-url.jpg",
9
+ #      :artist => "artist",
10
+ #      :url => "http://artist.deviantart.com/art/some-piece-123" }
11
+ ```
12
+
13
+ [Supported providers](https://github.com/little-bobby-tables/image_metadata_scraper/blob/master/lib/image_metadata_scraper.rb#L11)
data/Rakefile ADDED
@@ -0,0 +1,10 @@
1
+ require 'bundler/gem_tasks'
2
+ require 'rake/testtask'
3
+
4
+ Rake::TestTask.new do |t|
5
+   t.libs << 'test'
6
+   t.pattern = 'test/**/*_test.rb'
7
+   t.verbose = false
8
+ end
9
+
10
+ task default: :test
data/image_metadata_scraper.gemspec ADDED
@@ -0,0 +1,28 @@
1
+ $LOAD_PATH.unshift File.expand_path('../lib', __FILE__)
2
+
3
+ require 'image_metadata_scraper/version'
4
+ Gem::Specification.new do |spec|
5
+   spec.name = 'image_metadata_scraper'
6
+   spec.version = ImageMetadataScraper::VERSION
7
+   spec.authors = ['little-bobby-tables']
8
+   spec.email = ['little-bobby-tables@users.noreply.github.com']
9
+
10
+   spec.summary = 'Simple image metadata scraper.'
11
+   spec.homepage = 'https://github.com/little-bobby-tables/image_metadata_scraper'
12
+   spec.license = 'CC0-1.0'
13
+
14
+   spec.files = `git ls-files -z`.split("\x0").reject do |f|
15
+     f.match(%r{^(test|spec|features)/})
16
+   end
17
+   spec.require_paths = ['lib']
18
+
19
+   spec.add_runtime_dependency 'nokogiri'
20
+   spec.add_runtime_dependency 'activesupport'
21
+
22
+   spec.add_development_dependency 'openssl'
23
+   spec.add_development_dependency 'vcr'
24
+   spec.add_development_dependency 'webmock'
25
+   spec.add_development_dependency 'minitest-reporters'
26
+   spec.add_development_dependency 'simplecov'
27
+   spec.add_development_dependency 'codeclimate-test-reporter'
28
+ end
data/lib/image_metadata_scraper/deviantart.rb ADDED
@@ -0,0 +1,42 @@
1
+ # frozen_string_literal: true
2
+ require 'nokogiri'
3
+ require 'net/http'
4
+
5
+ module ImageMetadataScraper
6
+   module DeviantArt
7
+     def self.post(url)
8
+       uri = URI(url)
9
+       # Send the agegate_state cookie (presumably to bypass the mature-content gate).
10
+       response = Net::HTTP.new(uri.host).get(uri.request_uri, { 'Cookie' => 'agegate_state=1' })
11
+       page = Nokogiri::HTML(response.body)
12
+
13
+       # Image file URL: downloads enabled
14
+       image_file_url = page.at('a.dev-page-download')&.attr('href')
15
+       image_file_url &&= _follow_download_url(image_file_url, response)
16
+
17
+       # Image file URL: downloads disabled
18
+       image_file_url ||= page.at('.dev-view-deviation img.dev-content-full').attr('src')
19
+
20
+       thumbnail_url = page.at('.dev-view-deviation img.dev-content-normal').attr('src')
21
+
22
+       artist_name = page.at('.dev-title-container .username').content
23
+
24
+       canonical_page_url = page.at('meta[property="og:url"]')&.attr('content')
25
+
26
+       { image_url: image_file_url, thumbnail_url: thumbnail_url, artist: artist_name, url: canonical_page_url }
27
+     end
28
+
29
+     def self.direct_link(url)
30
+       # Direct links have the fav.me shortcode as a part of the URL; we just need to grab that.
31
+       shortcode = url.match(/-(.+)\..+\z/)[1]
32
+       url = "http://fav.me/#{shortcode}"
33
+       post(ImageMetadataScraper.redirect_from(url))
34
+     end
35
+
36
+     def self._follow_download_url(url, previous_request)
37
+       cookies = previous_request['set-cookie'].split(';').first
38
+       uri = URI(url)
39
+       Net::HTTP.new(uri.host).head(uri.request_uri, { 'Cookie' => cookies })['location']
40
+     end
41
+   end
42
+ end
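The shortcode extraction in `direct_link` can be tried in isolation. The sketch below reuses the same pattern on a made-up direct link; the helper name `fav_me_url` and the sample URL are assumptions for illustration, and unlike `direct_link` it returns nil instead of raising when nothing matches.

```ruby
# Same pattern as direct_link: capture the text between the first hyphen
# and the last dot of the URL.
SHORTCODE = /-(.+)\..+\z/

# Hypothetical helper: builds the fav.me short URL, or returns nil when
# the URL contains no hyphen-delimited shortcode.
def fav_me_url(direct_url)
  match = SHORTCODE.match(direct_url) or return
  "http://fav.me/#{match[1]}"
end

# Made-up direct link, used for illustration only.
puts fav_me_url('http://orig00.deviantart.net/0000/f/2017/001/a/b/title_by_artist-dabc123.png')
```

Because the pattern starts at the first hyphen, a file name with an earlier hyphen in the title would yield a longer capture than the shortcode alone.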
data/lib/image_metadata_scraper/tumblr.rb ADDED
@@ -0,0 +1,24 @@
1
+ # frozen_string_literal: true
2
+ require 'nokogiri'
3
+ require 'open-uri'
4
+
5
+ module ImageMetadataScraper
6
+   module Tumblr
7
+     def self.post(url)
8
+       api_url = url.match(/\A(?<blog>https?:\/\/.*tumblr\.com)\/(post|image)\/(?<post_id>\d+)/) do |m|
9
+         "#{m[:blog]}/api/read?id=#{m[:post_id]}"
10
+       end
11
+       xml = Nokogiri::XML(URI(api_url).open)
12
+
13
+       image_file_url = xml.at('//photo-url[@max-width="1280"]').content
14
+
15
+       thumbnail_url = xml.at('//photo-url[@max-width="500"]').content
16
+
17
+       artist_name = xml.at('//tumblelog').attr('name')
18
+
19
+       canonical_page_url = xml.at('//post').attr('url-with-slug')
20
+
21
+       { image_url: image_file_url, thumbnail_url: thumbnail_url, artist: artist_name, url: canonical_page_url }
22
+     end
23
+   end
24
+ end
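The rewrite from a post URL to the blog's `/api/read` endpoint involves no network access, so it can be checked on its own. This is a minimal sketch using the same named captures as `Tumblr.post`; the helper name `tumblr_api_url` and the sample blog URL are hypothetical.

```ruby
# Named captures pick out the blog root and the numeric post id.
TUMBLR_POST = %r{\A(?<blog>https?://.*tumblr\.com)/(post|image)/(?<post_id>\d+)}

# Hypothetical helper: returns the API URL for a post, or nil when the
# URL is not a Tumblr post or image page.
def tumblr_api_url(url)
  m = TUMBLR_POST.match(url) or return
  "#{m[:blog]}/api/read?id=#{m[:post_id]}"
end

# Made-up post URL, used for illustration only.
puts tumblr_api_url('http://someblog.tumblr.com/post/12345/some-slug')
```

Returning nil for non-matching input lets a caller bail out before attempting to fetch anything.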
data/lib/image_metadata_scraper/version.rb ADDED
@@ -0,0 +1,4 @@
1
+ # frozen_string_literal: true
2
+ module ImageMetadataScraper
3
+   VERSION = '0.1.0'
4
+ end
data/lib/image_metadata_scraper.rb ADDED
@@ -0,0 +1,56 @@
1
+ # frozen_string_literal: true
2
+ require 'net/http'
3
+
4
+ require 'image_metadata_scraper/version'
5
+ require 'image_metadata_scraper/deviantart'
6
+ require 'image_metadata_scraper/tumblr'
7
+
8
+ module ImageMetadataScraper
9
+   IMAGE_FILE_URL = /\Ahttps?:\/\/.*\.(jpg|jpeg|png|gif|svg)/
10
+
11
+   SCRAPERS = {
12
+     /\Ahttps?:\/\/.+deviantart\.com\/.+/ => DeviantArt.method(:post),
13
+     /\Ahttps?:\/\/.+deviantart\.net\/.+d.+/ => DeviantArt.method(:direct_link),
14
+     /\Ahttps?:\/\/.+tumblr\.com\/(post|image)\/.+/ => Tumblr.method(:post),
15
+     IMAGE_FILE_URL => ->(url) { { image_url: url, thumbnail_url: url } }
16
+   }.freeze
17
+
18
+   # Returns a hash of scraped image metadata that always contains:
19
+   #   +image_url+: URL to the largest available image file
20
+   #   +thumbnail_url+: URL to a small version of the image
21
+   #
22
+   # and includes, if applicable,
23
+   #   +artist+: the name of the artist (blogger)
24
+   #   +url+: canonical URL to the image page (e.g. DeviantArt post)
25
+   #
26
+   # Returns nil if the URL is blank, has a non-HTTP scheme, or matches no scraper.
27
+   def self.scrape(url)
28
+     url = http_url(url) or return
29
+     url = redirect_from(url)
30
+
31
+     scraper = SCRAPERS.detect { |regex, _| url =~ regex }&.last or return
32
+     scraper.call(url)
33
+   end
34
+
35
+   def self.redirect_from(url)
36
+     response = Net::HTTP.get_response(URI(url.strip))
37
+     case response.code when '301', '302'
38
+       response['location']
39
+     else
40
+       url
41
+     end
42
+   end
43
+
44
+   def self.http_url(url)
45
+     return if url.nil? || url.strip.empty?
46
+
47
+     url = url.strip
48
+     scheme = url.match(/\A.+:\/\//)
49
+
50
+     if scheme.nil?
51
+       "http://#{url}"
52
+     elsif scheme.to_s == 'http://' || scheme.to_s == 'https://'
53
+       url
54
+     end
55
+   end
56
+ end
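The dispatch in `scrape` boils down to normalizing the URL and calling the first handler whose pattern matches. The sketch below reproduces that flow without network access; `HANDLERS`, `normalize`, and `dispatch` are hypothetical stand-ins for `SCRAPERS`, `http_url`, and `scrape`, the redirect step is omitted, and the Tumblr handler is a stub that only tags the URL.

```ruby
# Hypothetical stand-ins for the real scrapers; nothing is fetched.
HANDLERS = {
  %r{\Ahttps?://.+tumblr\.com/(post|image)/.+} => ->(url) { { provider: :tumblr, url: url } },
  %r{\Ahttps?://.*\.(jpg|jpeg|png|gif|svg)} => ->(url) { { image_url: url, thumbnail_url: url } }
}.freeze

# Mirrors http_url: default to http://, keep http(s), reject other schemes.
def normalize(url)
  return if url.nil? || url.strip.empty?

  url = url.strip
  scheme = url[%r{\A.+://}]
  return "http://#{url}" if scheme.nil?

  url if %w[http:// https://].include?(scheme)
end

# Mirrors scrape: the first matching pattern wins; no match yields nil.
def dispatch(url)
  url = normalize(url) or return
  handler = HANDLERS.detect { |regex, _| url =~ regex }&.last or return
  handler.call(url)
end

p dispatch('example.com/cat.png')
p dispatch('ftp://example.com/cat.png')
```

Hash insertion order decides precedence here, which is why the catch-all image pattern sits last in the table.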
metadata ADDED
@@ -0,0 +1,167 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: image_metadata_scraper
3
+ version: !ruby/object:Gem::Version
4
+   version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - little-bobby-tables
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2017-03-19 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+   name: nokogiri
15
+   requirement: !ruby/object:Gem::Requirement
16
+     requirements:
17
+     - - ">="
18
+       - !ruby/object:Gem::Version
19
+         version: '0'
20
+   type: :runtime
21
+   prerelease: false
22
+   version_requirements: !ruby/object:Gem::Requirement
23
+     requirements:
24
+     - - ">="
25
+       - !ruby/object:Gem::Version
26
+         version: '0'
27
+ - !ruby/object:Gem::Dependency
28
+   name: activesupport
29
+   requirement: !ruby/object:Gem::Requirement
30
+     requirements:
31
+     - - ">="
32
+       - !ruby/object:Gem::Version
33
+         version: '0'
34
+   type: :runtime
35
+   prerelease: false
36
+   version_requirements: !ruby/object:Gem::Requirement
37
+     requirements:
38
+     - - ">="
39
+       - !ruby/object:Gem::Version
40
+         version: '0'
41
+ - !ruby/object:Gem::Dependency
42
+   name: openssl
43
+   requirement: !ruby/object:Gem::Requirement
44
+     requirements:
45
+     - - ">="
46
+       - !ruby/object:Gem::Version
47
+         version: '0'
48
+   type: :development
49
+   prerelease: false
50
+   version_requirements: !ruby/object:Gem::Requirement
51
+     requirements:
52
+     - - ">="
53
+       - !ruby/object:Gem::Version
54
+         version: '0'
55
+ - !ruby/object:Gem::Dependency
56
+   name: vcr
57
+   requirement: !ruby/object:Gem::Requirement
58
+     requirements:
59
+     - - ">="
60
+       - !ruby/object:Gem::Version
61
+         version: '0'
62
+   type: :development
63
+   prerelease: false
64
+   version_requirements: !ruby/object:Gem::Requirement
65
+     requirements:
66
+     - - ">="
67
+       - !ruby/object:Gem::Version
68
+         version: '0'
69
+ - !ruby/object:Gem::Dependency
70
+   name: webmock
71
+   requirement: !ruby/object:Gem::Requirement
72
+     requirements:
73
+     - - ">="
74
+       - !ruby/object:Gem::Version
75
+         version: '0'
76
+   type: :development
77
+   prerelease: false
78
+   version_requirements: !ruby/object:Gem::Requirement
79
+     requirements:
80
+     - - ">="
81
+       - !ruby/object:Gem::Version
82
+         version: '0'
83
+ - !ruby/object:Gem::Dependency
84
+   name: minitest-reporters
85
+   requirement: !ruby/object:Gem::Requirement
86
+     requirements:
87
+     - - ">="
88
+       - !ruby/object:Gem::Version
89
+         version: '0'
90
+   type: :development
91
+   prerelease: false
92
+   version_requirements: !ruby/object:Gem::Requirement
93
+     requirements:
94
+     - - ">="
95
+       - !ruby/object:Gem::Version
96
+         version: '0'
97
+ - !ruby/object:Gem::Dependency
98
+   name: simplecov
99
+   requirement: !ruby/object:Gem::Requirement
100
+     requirements:
101
+     - - ">="
102
+       - !ruby/object:Gem::Version
103
+         version: '0'
104
+   type: :development
105
+   prerelease: false
106
+   version_requirements: !ruby/object:Gem::Requirement
107
+     requirements:
108
+     - - ">="
109
+       - !ruby/object:Gem::Version
110
+         version: '0'
111
+ - !ruby/object:Gem::Dependency
112
+   name: codeclimate-test-reporter
113
+   requirement: !ruby/object:Gem::Requirement
114
+     requirements:
115
+     - - ">="
116
+       - !ruby/object:Gem::Version
117
+         version: '0'
118
+   type: :development
119
+   prerelease: false
120
+   version_requirements: !ruby/object:Gem::Requirement
121
+     requirements:
122
+     - - ">="
123
+       - !ruby/object:Gem::Version
124
+         version: '0'
125
+ description:
126
+ email:
127
+ - little-bobby-tables@users.noreply.github.com
128
+ executables: []
129
+ extensions: []
130
+ extra_rdoc_files: []
131
+ files:
132
+ - ".gitignore"
133
+ - ".travis.yml"
134
+ - Gemfile
135
+ - LICENSE
136
+ - README.md
137
+ - Rakefile
138
+ - image_metadata_scraper.gemspec
139
+ - lib/image_metadata_scraper.rb
140
+ - lib/image_metadata_scraper/deviantart.rb
141
+ - lib/image_metadata_scraper/tumblr.rb
142
+ - lib/image_metadata_scraper/version.rb
143
+ homepage: https://github.com/little-bobby-tables/image_metadata_scraper
144
+ licenses:
145
+ - CC0-1.0
146
+ metadata: {}
147
+ post_install_message:
148
+ rdoc_options: []
149
+ require_paths:
150
+ - lib
151
+ required_ruby_version: !ruby/object:Gem::Requirement
152
+   requirements:
153
+   - - ">="
154
+     - !ruby/object:Gem::Version
155
+       version: '0'
156
+ required_rubygems_version: !ruby/object:Gem::Requirement
157
+   requirements:
158
+   - - ">="
159
+     - !ruby/object:Gem::Version
160
+       version: '0'
161
+ requirements: []
162
+ rubyforge_project:
163
+ rubygems_version: 2.6.10
164
+ signing_key:
165
+ specification_version: 4
166
+ summary: Simple image metadata scraper.
167
+ test_files: []