link_thumbnailer 0.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
data/.gitignore ADDED
@@ -0,0 +1,17 @@
1
+ *.gem
2
+ *.rbc
3
+ .bundle
4
+ .config
5
+ .yardoc
6
+ Gemfile.lock
7
+ InstalledFiles
8
+ _yardoc
9
+ coverage
10
+ doc/
11
+ lib/bundler/man
12
+ pkg
13
+ rdoc
14
+ spec/reports
15
+ test/tmp
16
+ test/version_tmp
17
+ tmp
data/Gemfile ADDED
@@ -0,0 +1,4 @@
1
+ source 'https://rubygems.org'
2
+
3
+ # Specify your gem's dependencies in link_thumbnailer.gemspec
4
+ gemspec
data/LICENSE ADDED
@@ -0,0 +1,22 @@
1
+ Copyright (c) 2012 Pierre-Louis Gottfrois
2
+
3
+ MIT License
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining
6
+ a copy of this software and associated documentation files (the
7
+ "Software"), to deal in the Software without restriction, including
8
+ without limitation the rights to use, copy, modify, merge, publish,
9
+ distribute, sublicense, and/or sell copies of the Software, and to
10
+ permit persons to whom the Software is furnished to do so, subject to
11
+ the following conditions:
12
+
13
+ The above copyright notice and this permission notice shall be
14
+ included in all copies or substantial portions of the Software.
15
+
16
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
19
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
20
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
21
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
22
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,127 @@
1
+ # LinkThumbnailer
2
+
3
+ Ruby gem that generates image thumbnails from a given URL, ranks them and gives you back an object containing the images and website information. Works like the Facebook link previewer.
4
+
5
+ ## Installation
6
+
7
+ Add this line to your application's Gemfile:
8
+
9
+ gem 'link_thumbnailer'
10
+
11
+ And then execute:
12
+
13
+ $ bundle
14
+
15
+ Or install it yourself as:
16
+
17
+ $ gem install link_thumbnailer
18
+
19
+ Run:
20
+
21
+ $ rails g link_thumbnailer:install
22
+
23
+ This adds `link_thumbnailer.rb` to `config/initializers/`. See [Configuration](https://github.com/gottfrois/link_thumbnailer#configuration) for more details.
24
+
25
+ ## Usage
26
+
27
+ Run `irb` and require the mandatory dependencies (`rails` provides the ActiveSupport `mattr_accessor` the gem relies on):
28
+
29
+ require 'rails'
30
+ => true
31
+ require 'link_thumbnailer'
32
+ => true
33
+
34
+ This gem supports the [OpenGraph](http://ogp.me/) protocol. Here is an example with a website that uses it:
35
+
36
+ object = LinkThumbnailer.generate('http://zerply.com')
37
+ => #<LinkThumbnailer::Object description="Go beyond the résumé - showcase your work and your talent" image="http://zerply.com/img/front/facebook_icon_green.png" images=["http://zerply.com/img/front/facebook_icon_green.png"] site_name="zerply.com" title="Join Me on Zerply" url="http://zerply.com">
38
+
39
+ object.title?
40
+ => true
41
+ object.title
42
+ => "Join Me on Zerply"
43
+
44
+ object.url?
45
+ => true
46
+ object.url
47
+ => "http://zerply.com"
48
+
49
+ object.foo?
50
+ => false
51
+ object.foo
52
+ => nil
53
+
54
+ Now with a regular website that does not use any particular protocol:
55
+
56
+ object = LinkThumbnailer.generate('http://foo.com')
57
+ => #<LinkThumbnailer::Object description=nil images=[[ JPEG 750x200 750x200+0+0 DirectClass 8-bit 45kb] scene=0] title="Foo.com" url="http://foo.com">
58
+
59
+ object.title
60
+ => "Foo.com"
61
+
62
+ object.images
63
+ => [[ JPEG 750x200 750x200+0+0 DirectClass 8-bit 45kb]
64
+ scene=0]
65
+
66
+ object.images.first.source_url
67
+ => #<URI::HTTP:0x007ff7a923ef58 URL:http://foo.com/media/BAhbB1sHOgZmSSItMjAxMi8wNC8yNi8yMC8xMS80OS80MjYvY29yZG92YWJlYWNoLmpwZwY6BkVUWwg6BnA6CnRodW1iSSINNzUweDIwMCMGOwZU/cordovabeach.jpg>
68
+
69
+ You can check whether this object is valid (mandatory attributes are set in the initializer; the defaults are `[url, title, images]`):
70
+
71
+ object.valid?
72
+ => true
73
+
74
+ You can also set the `max` and `top` options at runtime. In this version they overwrite the global settings, so they also apply to subsequent calls:
75
+
76
+ object = LinkThumbnailer.generate('http://foo.com', :top => 10, :max => 20)
77
+
78
+ ## Configuration
79
+
80
+ In `config/initializers/link_thumbnailer.rb`:
81
+
82
+ LinkThumbnailer.configure do |config|
83
+ # Set the mandatory attributes required for the website to be valid.
84
+ # You can set `strict` to false if you want to skip this validation.
85
+ # config.mandatory_attributes = %w(url title images)
86
+
87
+ # Whether you want to validate given website against mandatory attributes or not.
88
+ # config.strict = true
89
+
90
+ # Number of redirects to follow before raising an exception when fetching the given URL.
91
+ # config.redirect_limit = 3
92
+
93
+ # List of blacklisted urls you want to skip when searching for images.
94
+ # config.blacklist_urls = [
95
+ # %r{^http://ad\.doubleclick\.net/},
96
+ # %r{^http://b\.scorecardresearch\.com/},
97
+ # %r{^http://pixel\.quantserve\.com/},
98
+ # %r{^http://s7\.addthis\.com/}
99
+ # ]
100
+
101
+ # Fetch 10 images maximum.
102
+ # config.max = 10
103
+
104
+ # Return top 5 images only.
105
+ # config.top = 5
106
+ end
107
+
108
+ ## Features
109
+
110
+ Implemented:
111
+
112
+ - Implements the [OpenGraph](http://ogp.me/) protocol.
113
+ - Finds images and sorts them according to how well they represent what the page is about (relative image URLs are resolved to absolute ones).
114
+ - Sorts images by size (shortest side) and number of colors.
115
+ - Blacklists some well-known advertising image URLs.
116
+
117
+ Coming soon:
118
+
119
+ - Cache results on the filesystem
120
+
121
+ ## Contributing
122
+
123
+ 1. Fork it
124
+ 2. Create your feature branch (`git checkout -b my-new-feature`)
125
+ 3. Commit your changes (`git commit -am 'Added some feature'`)
126
+ 4. Push to the branch (`git push origin my-new-feature`)
127
+ 5. Create new Pull Request
data/Rakefile ADDED
@@ -0,0 +1,2 @@
1
+ #!/usr/bin/env rake
2
+ require "bundler/gem_tasks"
data/lib/generators/link_thumbnailer/install_generator.rb ADDED
@@ -0,0 +1,15 @@
1
+ module LinkThumbnailer
2
+ module Generators
3
+ class InstallGenerator < Rails::Generators::Base
4
+
5
+ source_root File.expand_path('../../templates', __FILE__)
6
+
7
+ desc 'Creates a LinkThumbnailer initializer for your application.'
8
+
9
+ def copy_initializer
10
+ template 'initializer.rb', 'config/initializers/link_thumbnailer.rb'
11
+ end
12
+
13
+ end
14
+ end
15
+ end
data/lib/generators/templates/initializer.rb ADDED
@@ -0,0 +1,26 @@
1
+ # Use this hook to configure LinkThumbnailer behaviors.
2
+ LinkThumbnailer.configure do |config|
3
+ # Set the mandatory attributes required for the website to be valid.
4
+ # You can set `strict` to false if you want to skip this validation.
5
+ # config.mandatory_attributes = %w(url title images)
6
+
7
+ # Whether you want to validate given website against mandatory attributes or not.
8
+ # config.strict = true
9
+
10
+ # Number of redirects to follow before raising an exception when fetching the given URL.
11
+ # config.redirect_limit = 3
12
+
13
+ # List of blacklisted urls you want to skip when searching for images.
14
+ # config.blacklist_urls = [
15
+ # %r{^http://ad\.doubleclick\.net/},
16
+ # %r{^http://b\.scorecardresearch\.com/},
17
+ # %r{^http://pixel\.quantserve\.com/},
18
+ # %r{^http://s7\.addthis\.com/}
19
+ # ]
20
+
21
+ # Fetch 10 images maximum.
22
+ # config.max = 10
23
+
24
+ # Return top 5 images only.
25
+ # config.top = 5
26
+ end
data/lib/link_thumbnailer/doc.rb ADDED
@@ -0,0 +1,58 @@
1
+ require 'uri'
2
+
3
+ module LinkThumbnailer
4
+
5
+ module Doc
6
+
7
+ def doc_base_href
8
+ base = at('//head/base')
9
+ base['href'] if base
10
+ end
11
+
12
+ def img_srcs
13
+ search('//img').map { |i| i['src'] }.compact
14
+ end
15
+
16
+ def img_abs_urls(base_url = nil)
17
+ result = []
18
+
19
+ img_srcs.each do |i|
20
+ begin
21
+ u = URI(i)
22
+ rescue URI::InvalidURIError
23
+ next
24
+ end
25
+
26
+ result << if u.is_a?(URI::HTTP)
27
+ u
28
+ else
29
+ URI.join(base_url || doc_base_href || source_url, i)
30
+ end
31
+ end
32
+
33
+ result
34
+ end
35
+
36
+ def title
37
+ css('title').text.strip
38
+ end
39
+
40
+ def description
41
+ if element = xpath('//meta[@name="description" and @content]').first
42
+ return element.attributes['content'].value.strip
43
+ end
44
+
45
+ css('body p').each do |node|
46
+ if !node.has_attribute?('style') && node.first_element_child.nil?
47
+ return node.text.strip
48
+ end
49
+ end
50
+
51
+ nil
52
+ end
53
+
54
+ attr_accessor :source_url
55
+
56
+ end
57
+
58
+ end
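`Doc#img_abs_urls` keeps absolute `http`/`https` image sources unchanged. It resolves every other `src` against a base URL, trying in order the explicit argument, the page's `<base href>` and then `source_url`. Below is a minimal sketch of that resolution step that uses only Ruby's standard `uri` library. `absolute_image_urls` is a hypothetical helper, and the array of `src` strings stands in for what Nokogiri would extract from the page:

```ruby
require 'uri'

# Hypothetical helper mirroring Doc#img_abs_urls: resolve image `src`
# values against a base URL, skipping values URI cannot parse.
def absolute_image_urls(srcs, base_url)
  srcs.each_with_object([]) do |src, result|
    begin
      uri = URI(src)
    rescue URI::InvalidURIError
      next # malformed src (e.g. contains spaces): ignore it
    end

    # URI::HTTPS is a subclass of URI::HTTP, so both are kept as-is;
    # relative references (URI::Generic) are joined onto the base URL.
    result << (uri.is_a?(URI::HTTP) ? uri : URI.join(base_url, src))
  end
end

urls = absolute_image_urls(
  ['/a.png', 'http://cdn.example.com/b.jpg', 'img/c.gif', 'bad src.png'],
  'http://foo.com/dir/page.html'
)
puts urls.map(&:to_s)
```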
data/lib/link_thumbnailer/doc_parser.rb ADDED
@@ -0,0 +1,15 @@
1
+ require 'nokogiri'
2
+
3
+ module LinkThumbnailer
4
+
5
+ class DocParser
6
+
7
+ def parse(doc_string, source_url = nil)
8
+ doc = Nokogiri::HTML(doc_string).extend(LinkThumbnailer::Doc)
9
+ doc.source_url = source_url
10
+ doc
11
+ end
12
+
13
+ end
14
+
15
+ end
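`DocParser#parse` does not subclass Nokogiri's document. Instead it calls `Object#extend`, so that one parsed document instance gains the `LinkThumbnailer::Doc` methods and the `source_url` accessor. `ImgParser` does the same with `WebImage` on RMagick images. Below is a minimal, dependency-free sketch of that pattern. `PageHelpers` and `first_line` are hypothetical names, and a plain `String` stands in for the Nokogiri document:

```ruby
# Hypothetical mixin standing in for LinkThumbnailer::Doc: it adds an
# accessor and a helper method to whatever single object it extends.
module PageHelpers
  attr_accessor :source_url

  def first_line
    lines.first.to_s.strip
  end
end

raw = "Foo.com\nSecond line\n"
page = raw.dup.extend(PageHelpers) # only this instance gains the methods
page.source_url = 'http://foo.com'

puts page.first_line              # Foo.com
puts page.source_url              # http://foo.com
puts raw.respond_to?(:first_line) # false: the original string is untouched
```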
data/lib/link_thumbnailer/fetcher.rb ADDED
@@ -0,0 +1,29 @@
1
+ require 'net/http/persistent'
2
+
3
+ module LinkThumbnailer
4
+
5
+ class Fetcher
6
+
7
+ def fetch(url, redirect_count = 0)
8
+ if redirect_count > LinkThumbnailer.redirect_limit
9
+ raise ArgumentError,
10
+ "too many redirects (#{redirect_count})"
11
+ end
12
+
13
+ uri = url.is_a?(URI) ? url : URI(url)
14
+
15
+ if uri.is_a?(URI::HTTP)
16
+ http = Net::HTTP::Persistent.new('linkthumbnailer')
17
+ http.headers['User-Agent'] = 'linkthumbnailer'
18
+ resp = http.request(uri)
19
+ case resp
20
+ when Net::HTTPSuccess; resp.body
21
+ when Net::HTTPRedirection; fetch(URI.join(uri.to_s, resp['location']), redirect_count + 1)
22
+ else resp.error!
23
+ end
24
+ end
25
+ end
26
+
27
+ end
28
+
29
+ end
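`Fetcher#fetch` follows HTTP redirects recursively. It raises `ArgumentError` once the number of redirects followed exceeds `LinkThumbnailer.redirect_limit`, so the default of 3 allows three redirects. The sketch below reproduces that control flow against a hypothetical in-memory table of responses instead of `Net::HTTP::Persistent`, so it runs without network access. `fetch_following_redirects` and `FAKE_RESPONSES` are illustrative names only:

```ruby
require 'uri'

# Hypothetical stand-in for real HTTP responses: each URL maps to either
# [:ok, body] or [:redirect, location]. No network access is involved.
FAKE_RESPONSES = {
  'http://a.example/'      => [:redirect, '/moved'],
  'http://a.example/moved' => [:redirect, 'http://b.example/final'],
  'http://b.example/final' => [:ok, 'hello'],
  'http://loop.example/'   => [:redirect, 'http://loop.example/']
}.freeze

# Same control flow as Fetcher#fetch: raise once more than `limit`
# redirects have been followed, otherwise recurse on the Location target.
def fetch_following_redirects(url, limit, redirect_count = 0)
  if redirect_count > limit
    raise ArgumentError, "too many redirects (#{redirect_count})"
  end

  kind, value = FAKE_RESPONSES.fetch(url.to_s)
  case kind
  when :ok
    value
  when :redirect
    # Location may be relative, so resolve it against the current URL.
    target = URI.join(url.to_s, value).to_s
    fetch_following_redirects(target, limit, redirect_count + 1)
  end
end

puts fetch_following_redirects('http://a.example/', 3) # hello

begin
  fetch_following_redirects('http://loop.example/', 3)
rescue ArgumentError => e
  puts e.message # too many redirects (4)
end
```

Resolving `Location` with `URI.join` matters because the header may carry a relative reference such as `/moved`. `URI()` alone would parse that as a non-HTTP `URI::Generic`.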
data/lib/link_thumbnailer/img_comparator.rb ADDED
@@ -0,0 +1,18 @@
1
+ module LinkThumbnailer
2
+
3
+ module ImgComparator
4
+
5
+ def <=> other
6
+ result = ([other.rows, other.columns].min ** 2) <=>
7
+ ([rows, columns].min ** 2)
8
+
9
+ if result == 0
10
+ result = other.number_colors <=> number_colors
11
+ end
12
+
13
+ result
14
+ end
15
+
16
+ end
17
+
18
+ end
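`ImgComparator#<=>` ranks images by the square of their shortest side, largest first. It breaks ties by number of colors, again largest first. Squaring preserves the order of non-negative sizes, so this gives the same order as comparing the shortest side directly. Below is a runnable sketch that uses a hypothetical `FakeImage` struct in place of an RMagick image. Only `rows`, `columns` and `number_colors` are modelled:

```ruby
# Hypothetical stand-in for an RMagick image: only the attributes the
# comparator reads are modelled, plus a name for display.
FakeImage = Struct.new(:name, :rows, :columns, :number_colors)

# Same ordering rule as LinkThumbnailer::ImgComparator: larger shortest
# side first; on a tie, more colors first.
module ShortestSideComparator
  def <=>(other)
    result = [other.rows, other.columns].min**2 <=> [rows, columns].min**2
    result = other.number_colors <=> number_colors if result.zero?
    result
  end
end

images = [
  FakeImage.new('wide banner', 50, 750, 40),
  FakeImage.new('square photo', 300, 300, 5000),
  FakeImage.new('small icon', 16, 16, 12),
  FakeImage.new('tall photo', 750, 50, 900)
].each { |img| img.extend(ShortestSideComparator) } # per instance, as ImgParser does

puts images.sort.map(&:name)
```

In this example the two 50-pixel-wide images tie on size, so the one with more colors ("tall photo") ranks ahead of "wide banner".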
data/lib/link_thumbnailer/img_parser.rb ADDED
@@ -0,0 +1,47 @@
1
+ require 'RMagick'
2
+
3
+ module LinkThumbnailer
4
+
5
+ class ImgParser
6
+
7
+ def initialize(fetcher, img_url_filter)
8
+ @fetcher = fetcher
9
+ @img_url_filters = [*img_url_filter]
10
+ end
11
+
12
+ def parse(img_urls)
13
+ @img_url_filters.each do |filter|
14
+ img_urls.delete_if { |i| filter.reject?(i) }
15
+ end
16
+
17
+ imgs = []
18
+ count = 0
19
+ img_urls.each { |i|
20
+ break if count >= LinkThumbnailer.max
21
+ img = parse_one(i)
22
+ next unless img # parse_one returns nil when the image cannot be fetched or decoded
23
+ imgs << img.extend(LinkThumbnailer::ImgComparator)
24
+ count += 1
25
+ }
26
+
27
+ imgs.sort!
28
+
29
+ imgs.first(LinkThumbnailer.top)
30
+ end
31
+
32
+ def parse_one(img_url)
33
+ begin
34
+ img_data = @fetcher.fetch(img_url)
35
+ img = Magick::ImageList.new.from_blob(img_data).extend(
36
+ LinkThumbnailer::WebImage
37
+ )
38
+ img.source_url = img_url
39
+ img
40
+ rescue StandardError
41
+ nil
42
+ end
43
+ end
44
+
45
+ end
46
+
47
+ end
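`ImgParser#parse` is a small pipeline. It drops blacklisted URLs, downloads images until `LinkThumbnailer.max` is reached, sorts them with the comparator and keeps the first `LinkThumbnailer.top`. Below is a minimal sketch of the same flow that injects lambdas instead of using `Fetcher` and RMagick. `rank_images`, `Candidate` and the image areas are hypothetical, and a `nil` from the loader plays the role of a failed download. In this sketch a failed download does not count towards `max`:

```ruby
# Hypothetical candidate image: larger area sorts first, playing the
# role of LinkThumbnailer::ImgComparator.
Candidate = Struct.new(:url, :area) do
  def <=>(other)
    other.area <=> area
  end
end

# Hypothetical pipeline following ImgParser#parse: filter, load up to
# `max` images (skipping failures), sort, keep the best `top`.
def rank_images(urls, max:, top:, blacklisted:, loader:)
  images = []
  urls.reject(&blacklisted).each do |url|
    break if images.size >= max

    image = loader.call(url)
    images << image if image # nil means the download or decode failed
  end
  images.sort.first(top)
end

areas = {
  'http://foo.com/a.jpg' => 40_000,
  'http://foo.com/b.jpg' => 90_000,
  'http://foo.com/c.jpg' => 10_000
}
urls = [
  'http://ad.doubleclick.net/pixel.gif',
  'http://foo.com/a.jpg',
  'http://foo.com/broken.jpg', # no entry in `areas`: simulates a failed download
  'http://foo.com/b.jpg',
  'http://foo.com/c.jpg'
]
blacklisted = ->(url) { url.start_with?('http://ad.doubleclick.net/') }
loader = ->(url) { areas.key?(url) ? Candidate.new(url, areas[url]) : nil }

ranked = rank_images(urls, max: 3, top: 2, blacklisted: blacklisted, loader: loader)
puts ranked.map(&:url)
```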
data/lib/link_thumbnailer/img_url_filter.rb ADDED
@@ -0,0 +1,14 @@
1
+ module LinkThumbnailer
2
+
3
+ class ImgUrlFilter
4
+
5
+ def reject?(img_url)
6
+ LinkThumbnailer.blacklist_urls.each do |url|
7
+ return true if img_url && img_url.to_s[url]
8
+ end
9
+ false
10
+ end
11
+
12
+ end
13
+
14
+ end
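`ImgUrlFilter#reject?` converts the candidate (a `String` or a `URI`) to a string. It rejects the candidate when any regexp in `LinkThumbnailer.blacklist_urls` matches. Below is an equivalent standalone sketch. `blacklisted?` is a hypothetical name, and the two patterns are copied from the gem's defaults:

```ruby
require 'uri'

# Two of the gem's default blacklist_urls patterns.
BLACKLIST = [
  %r{^http://ad\.doubleclick\.net/},
  %r{^http://b\.scorecardresearch\.com/}
].freeze

# Hypothetical helper equivalent to ImgUrlFilter#reject?: a URL (String or
# URI) is rejected when any pattern matches its string form.
def blacklisted?(img_url, patterns = BLACKLIST)
  return false if img_url.nil?

  url = img_url.to_s
  patterns.any? { |pattern| pattern.match?(url) }
end

puts blacklisted?('http://ad.doubleclick.net/pixel.gif')        # true
puts blacklisted?(URI('http://foo.com/media/cordovabeach.jpg')) # false
```

The default patterns anchor with `^`, which in Ruby matches at the start of any line. `\A` would anchor strictly to the start of the string.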
data/lib/link_thumbnailer/object.rb ADDED
@@ -0,0 +1,24 @@
1
+ require 'hashie'
2
+
3
+ module LinkThumbnailer
4
+ class Object < Hashie::Mash
5
+
6
+ def method_missing(method_name, *args, &block)
7
+ method_name = method_name.to_s
8
+
9
+ if method_name.end_with?('?')
10
+ method_name.chop!
11
+ !self[method_name].nil?
12
+ else
13
+ self[method_name]
14
+ end
15
+ end
16
+
17
+ def valid?
18
+ return false if self.keys.empty?
19
+ LinkThumbnailer.mandatory_attributes.each {|a| return false unless self[a] } if LinkThumbnailer.strict
20
+ true
21
+ end
22
+
23
+ end
24
+ end
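`LinkThumbnailer::Object` answers `foo?` with whether the key is present and `foo` with its value (or `nil`). Its `valid?` method checks the mandatory attributes only when `strict` is enabled. The sketch below reproduces those rules on a plain `Hash` subclass so that it runs without Hashie. `PreviewObject` and the keyword-style `strict:` parameter are hypothetical, and the default attribute list matches `LinkThumbnailer.mandatory_attributes`:

```ruby
# Hypothetical Hash-based stand-in for LinkThumbnailer::Object (which builds
# on Hashie::Mash): `foo?` reports presence, `foo` reads the key.
class PreviewObject < Hash
  def method_missing(name, *args)
    key = name.to_s
    return !self[key.chomp('?')].nil? if key.end_with?('?')
    return self[key] if args.empty?

    super # e.g. setters with arguments are not handled in this sketch
  end

  def respond_to_missing?(name, include_private = false)
    name.to_s.end_with?('?') || key?(name.to_s) || super
  end

  # Same rule as Object#valid?: an empty object is never valid; in strict
  # mode every mandatory attribute must be present and truthy.
  def valid?(mandatory = %w(url title images), strict: true)
    return false if empty?
    return true unless strict

    mandatory.all? { |attr| self[attr] }
  end
end

object = PreviewObject.new
object['url'] = 'http://foo.com'
object['title'] = 'Foo.com'

puts object.title? # true
puts object.foo?   # false
puts object.valid? # false: 'images' is missing
```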
data/lib/link_thumbnailer/opengraph.rb ADDED
@@ -0,0 +1,18 @@
1
+ module LinkThumbnailer
2
+
3
+ class Opengraph
4
+
5
+ def self.parse(object, doc)
6
+ doc.css('meta').each do |m|
7
+ if m.attribute('property') && m.attribute('property').to_s.match(/^og:(.+)$/i)
8
+ object[$1.gsub('-', '_')] = m.attribute('content').to_s
9
+ end
10
+ end
11
+ object[:images] = [object[:image]] if object[:image]
12
+
13
+ object
14
+ end
15
+
16
+ end
17
+
18
+ end
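`Opengraph.parse` copies the `content` of every `<meta>` tag whose `property` starts with `og:` into the object, turning hyphens in the property name into underscores. It then wraps `image` in an `images` array. Below is a Nokogiri-free sketch of the same mapping. `opengraph_attributes` is a hypothetical helper that takes `[property, content]` pairs, and `og:site-name` is a made-up hyphenated property used only to show the underscore conversion. The sketch uses `\A`/`\z` anchors, which are slightly stricter than the gem's `^`/`$`:

```ruby
# Hypothetical helper mirroring Opengraph.parse without Nokogiri: takes
# [property, content] pairs from <meta> tags and returns a Hash.
def opengraph_attributes(meta_pairs)
  attrs = {}
  meta_pairs.each do |property, content|
    match = property.to_s.match(/\Aog:(.+)\z/i)
    next unless match # not an og: property

    attrs[match[1].tr('-', '_')] = content.to_s
  end
  attrs['images'] = [attrs['image']] if attrs['image']
  attrs
end

meta = [
  ['og:title', 'Join Me on Zerply'],
  ['og:image', 'http://zerply.com/img/front/facebook_icon_green.png'],
  ['og:site-name', 'zerply.com'],
  ['description', 'not an og: tag']
]
p opengraph_attributes(meta)
```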
data/lib/link_thumbnailer/version.rb ADDED
@@ -0,0 +1,3 @@
1
+ module LinkThumbnailer
2
+ VERSION = "0.0.1"
3
+ end
data/lib/link_thumbnailer/web_image.rb ADDED
@@ -0,0 +1,10 @@
1
+ module LinkThumbnailer
2
+
3
+ module WebImage
4
+
5
+ attr_accessor :source_url
6
+ attr_accessor :doc
7
+
8
+ end
9
+
10
+ end
data/lib/link_thumbnailer.rb ADDED
@@ -0,0 +1,76 @@
1
+ # require 'json'
2
+ # require 'nokogiri'
3
+ # require 'link_thumbnailer/parser/opengraph'
4
+
5
+ require 'link_thumbnailer/object'
6
+ require 'link_thumbnailer/fetcher'
7
+ require 'link_thumbnailer/doc_parser'
8
+ require 'link_thumbnailer/doc'
9
+ require 'link_thumbnailer/img_url_filter'
10
+ require 'link_thumbnailer/img_parser'
11
+ require 'link_thumbnailer/img_comparator'
12
+ require 'link_thumbnailer/web_image'
13
+
14
+ require 'link_thumbnailer/opengraph'
15
+
16
+ require 'link_thumbnailer/version'
17
+
18
+ module LinkThumbnailer
19
+
20
+ mattr_accessor :mandatory_attributes
21
+ @@mandatory_attributes = %w(url title images)
22
+
23
+ mattr_accessor :strict
24
+ @@strict = true
25
+
26
+ mattr_accessor :redirect_limit
27
+ @@redirect_limit = 3
28
+
29
+ mattr_accessor :blacklist_urls
30
+ @@blacklist_urls = [
31
+ %r{^http://ad\.doubleclick\.net/},
32
+ %r{^http://b\.scorecardresearch\.com/},
33
+ %r{^http://pixel\.quantserve\.com/},
34
+ %r{^http://s7\.addthis\.com/}
35
+ ]
36
+
37
+ mattr_accessor :max
38
+ @@max = 10
39
+
40
+ mattr_accessor :top
41
+ @@top = 5
42
+
43
+ def self.configure
44
+ yield self
45
+ end
46
+
47
+ def self.generate(url, options = {})
48
+ @@top = options[:top].to_i if options[:top]
49
+ @@max = options[:max].to_i if options[:max]
50
+
51
+ @object = LinkThumbnailer::Object.new
52
+ @fetcher = LinkThumbnailer::Fetcher.new
53
+ @doc_parser = LinkThumbnailer::DocParser.new
54
+
55
+ doc_string = @fetcher.fetch(url)
56
+ doc = @doc_parser.parse(doc_string, url)
57
+
58
+ @object[:url] = doc.source_url
59
+
60
+ # Try Opengraph first
61
+ @object = LinkThumbnailer::Opengraph.parse(@object, doc)
62
+ return @object if @object.valid?
63
+
64
+ # Else try manually
65
+ @img_url_filters = [LinkThumbnailer::ImgUrlFilter.new]
66
+ @img_parser = LinkThumbnailer::ImgParser.new(@fetcher, @img_url_filters)
67
+
68
+ @object[:title] = doc.title
69
+ @object[:description] = doc.description
70
+ @object[:images] = @img_parser.parse(doc.img_abs_urls.dup)
71
+
72
+ return nil unless @object.valid?
73
+ @object
74
+ end
75
+
76
+ end
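`LinkThumbnailer.configure` simply yields the module so that an initializer can assign its `mattr_accessor` settings. `generate` writes `options[:top]` and `options[:max]` back into `@@top` and `@@max`, so a per-call override stays in effect for later calls. Below is a dependency-free sketch of the same `configure` hook. It uses plain singleton accessors instead of ActiveSupport's `mattr_accessor`, and it merges per-call options without persisting them. `PreviewConfig` and `limits` are hypothetical names, and this is an alternative design rather than what the gem does:

```ruby
# Hypothetical configuration module using plain singleton accessors
# instead of ActiveSupport's mattr_accessor.
module PreviewConfig
  class << self
    attr_accessor :max, :top
  end
  self.max = 10
  self.top = 5

  # Same hook shape as LinkThumbnailer.configure.
  def self.configure
    yield self
  end

  # Per-call options layered over the global settings without writing
  # them back to the module.
  def self.limits(options = {})
    {
      max: (options[:max] || max).to_i,
      top: (options[:top] || top).to_i
    }
  end
end

PreviewConfig.configure { |config| config.top = 3 }
p PreviewConfig.limits(max: 20) # max 20, top 3
p PreviewConfig.limits          # max 10, top 3: the override did not persist
```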
data/link_thumbnailer.gemspec ADDED
@@ -0,0 +1,22 @@
1
+ # -*- encoding: utf-8 -*-
2
+ require File.expand_path('../lib/link_thumbnailer/version', __FILE__)
3
+
4
+ Gem::Specification.new do |gem|
5
+ gem.authors = ["Pierre-Louis Gottfrois"]
6
+ gem.email = ["pierrelouis.gottfrois@gmail.com"]
7
+ gem.description = %q{Ruby gem generating thumbnail images from a given URL.}
8
+ gem.summary = %q{Ruby gem ranking images from a given URL and returning an object containing images and website information.}
9
+ gem.homepage = "https://github.com/gottfrois/link_thumbnailer"
10
+
11
+ gem.files = `git ls-files`.split($\)
12
+ gem.executables = gem.files.grep(%r{^bin/}).map{ |f| File.basename(f) }
13
+ gem.test_files = gem.files.grep(%r{^(test|spec|features)/})
14
+ gem.name = "link_thumbnailer"
15
+ gem.require_paths = ["lib"]
16
+ gem.version = LinkThumbnailer::VERSION
17
+
18
+ gem.add_dependency(%q{nokogiri}, ['~> 1.4.0'])
19
+ gem.add_dependency(%q{hashie}, ['~> 1.2.0'])
20
+ gem.add_dependency(%q{net-http-persistent}, ['> 0'])
21
+ gem.add_dependency(%q{rmagick}, ['> 0'])
22
+ end
metadata ADDED
@@ -0,0 +1,129 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: link_thumbnailer
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.0.1
5
+ prerelease:
6
+ platform: ruby
7
+ authors:
8
+ - Pierre-Louis Gottfrois
9
+ autorequire:
10
+ bindir: bin
11
+ cert_chain: []
12
+ date: 2012-08-19 00:00:00.000000000 Z
13
+ dependencies:
14
+ - !ruby/object:Gem::Dependency
15
+ name: nokogiri
16
+ requirement: !ruby/object:Gem::Requirement
17
+ none: false
18
+ requirements:
19
+ - - ~>
20
+ - !ruby/object:Gem::Version
21
+ version: 1.4.0
22
+ type: :runtime
23
+ prerelease: false
24
+ version_requirements: !ruby/object:Gem::Requirement
25
+ none: false
26
+ requirements:
27
+ - - ~>
28
+ - !ruby/object:Gem::Version
29
+ version: 1.4.0
30
+ - !ruby/object:Gem::Dependency
31
+ name: hashie
32
+ requirement: !ruby/object:Gem::Requirement
33
+ none: false
34
+ requirements:
35
+ - - ~>
36
+ - !ruby/object:Gem::Version
37
+ version: 1.2.0
38
+ type: :runtime
39
+ prerelease: false
40
+ version_requirements: !ruby/object:Gem::Requirement
41
+ none: false
42
+ requirements:
43
+ - - ~>
44
+ - !ruby/object:Gem::Version
45
+ version: 1.2.0
46
+ - !ruby/object:Gem::Dependency
47
+ name: net-http-persistent
48
+ requirement: !ruby/object:Gem::Requirement
49
+ none: false
50
+ requirements:
51
+ - - ! '>'
52
+ - !ruby/object:Gem::Version
53
+ version: '0'
54
+ type: :runtime
55
+ prerelease: false
56
+ version_requirements: !ruby/object:Gem::Requirement
57
+ none: false
58
+ requirements:
59
+ - - ! '>'
60
+ - !ruby/object:Gem::Version
61
+ version: '0'
62
+ - !ruby/object:Gem::Dependency
63
+ name: rmagick
64
+ requirement: !ruby/object:Gem::Requirement
65
+ none: false
66
+ requirements:
67
+ - - ! '>'
68
+ - !ruby/object:Gem::Version
69
+ version: '0'
70
+ type: :runtime
71
+ prerelease: false
72
+ version_requirements: !ruby/object:Gem::Requirement
73
+ none: false
74
+ requirements:
75
+ - - ! '>'
76
+ - !ruby/object:Gem::Version
77
+ version: '0'
78
+ description: Ruby gem generating thumbnail images from a given URL.
79
+ email:
80
+ - pierrelouis.gottfrois@gmail.com
81
+ executables: []
82
+ extensions: []
83
+ extra_rdoc_files: []
84
+ files:
85
+ - .gitignore
86
+ - Gemfile
87
+ - LICENSE
88
+ - README.md
89
+ - Rakefile
90
+ - lib/generators/link_thumbnailer/install_generator.rb
91
+ - lib/generators/templates/initializer.rb
92
+ - lib/link_thumbnailer.rb
93
+ - lib/link_thumbnailer/doc.rb
94
+ - lib/link_thumbnailer/doc_parser.rb
95
+ - lib/link_thumbnailer/fetcher.rb
96
+ - lib/link_thumbnailer/img_comparator.rb
97
+ - lib/link_thumbnailer/img_parser.rb
98
+ - lib/link_thumbnailer/img_url_filter.rb
99
+ - lib/link_thumbnailer/object.rb
100
+ - lib/link_thumbnailer/opengraph.rb
101
+ - lib/link_thumbnailer/version.rb
102
+ - lib/link_thumbnailer/web_image.rb
103
+ - link_thumbnailer.gemspec
104
+ homepage: https://github.com/gottfrois/link_thumbnailer
105
+ licenses: []
106
+ post_install_message:
107
+ rdoc_options: []
108
+ require_paths:
109
+ - lib
110
+ required_ruby_version: !ruby/object:Gem::Requirement
111
+ none: false
112
+ requirements:
113
+ - - ! '>='
114
+ - !ruby/object:Gem::Version
115
+ version: '0'
116
+ required_rubygems_version: !ruby/object:Gem::Requirement
117
+ none: false
118
+ requirements:
119
+ - - ! '>='
120
+ - !ruby/object:Gem::Version
121
+ version: '0'
122
+ requirements: []
123
+ rubyforge_project:
124
+ rubygems_version: 1.8.24
125
+ signing_key:
126
+ specification_version: 3
127
+ summary: Ruby gem ranking images from a given URL returning an object containing images
128
+ and website information.
129
+ test_files: []