link_thumbnailer 0.0.1

data/.gitignore ADDED
@@ -0,0 +1,17 @@
1
+ *.gem
2
+ *.rbc
3
+ .bundle
4
+ .config
5
+ .yardoc
6
+ Gemfile.lock
7
+ InstalledFiles
8
+ _yardoc
9
+ coverage
10
+ doc/
11
+ lib/bundler/man
12
+ pkg
13
+ rdoc
14
+ spec/reports
15
+ test/tmp
16
+ test/version_tmp
17
+ tmp
data/Gemfile ADDED
@@ -0,0 +1,4 @@
1
+ source 'https://rubygems.org'
2
+
3
+ # Specify your gem's dependencies in link_thumbnailer.gemspec
4
+ gemspec
data/LICENSE ADDED
@@ -0,0 +1,22 @@
1
+ Copyright (c) 2012 Pierre-Louis Gottfrois
2
+
3
+ MIT License
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining
6
+ a copy of this software and associated documentation files (the
7
+ "Software"), to deal in the Software without restriction, including
8
+ without limitation the rights to use, copy, modify, merge, publish,
9
+ distribute, sublicense, and/or sell copies of the Software, and to
10
+ permit persons to whom the Software is furnished to do so, subject to
11
+ the following conditions:
12
+
13
+ The above copyright notice and this permission notice shall be
14
+ included in all copies or substantial portions of the Software.
15
+
16
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
19
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
20
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
21
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
22
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,127 @@
1
+ # LinkThumbnailer
2
+
3
+ Ruby gem that generates image thumbnails from a given URL, ranks them, and returns an object containing the images and website information. Works like Facebook's link previewer.
4
+
5
+ ## Installation
6
+
7
+ Add this line to your application's Gemfile:
8
+
9
+ gem 'link_thumbnailer'
10
+
11
+ And then execute:
12
+
13
+ $ bundle
14
+
15
+ Or install it yourself as:
16
+
17
+ $ gem install link_thumbnailer
18
+
19
+ Run:
20
+
21
+ $ rails g link_thumbnailer:install
22
+
23
+ This will add `link_thumbnailer.rb` to `config/initializers/`. See [#Configuration](https://github.com/gottfrois/link_thumbnailer#configuration) for more details.
24
+
25
+ ## Usage
26
+
27
+ Run `irb` and require mandatory dependencies:
28
+
29
+ require 'rails'
30
+ => true
31
+ require 'link_thumbnailer'
32
+ => true
33
+
34
+ This gem can handle the [Opengraph](http://ogp.me/) protocol. Here is an example with a website that supports it:
35
+
36
+ object = LinkThumbnailer.generate('http://zerply.com')
37
+ => #<LinkThumbnailer::Object description="Go beyond the résumé - showcase your work and your talent" image="http://zerply.com/img/front/facebook_icon_green.png" images=["http://zerply.com/img/front/facebook_icon_green.png"] site_name="zerply.com" title="Join Me on Zerply" url="http://zerply.com">
38
+
39
+ object.title?
40
+ => true
41
+ object.title
42
+ => "Join Me on Zerply"
43
+
44
+ object.url?
45
+ => true
46
+ object.url
47
+ => "http://zerply.com"
48
+
49
+ object.foo?
50
+ => false
51
+ object.foo
52
+ => nil
53
+
54
+ Now with a regular website with no particular protocol:
55
+
56
+ object = LinkThumbnailer.generate('http://foo.com')
57
+ => #<LinkThumbnailer::Object description=nil images=[[ JPEG 750x200 750x200+0+0 DirectClass 8-bit 45kb] scene=0] title="Foo.com" url="http://foo.com">
58
+
59
+ object.title
60
+ => "Foo.com"
61
+
62
+ object.images
63
+ => [[ JPEG 750x200 750x200+0+0 DirectClass 8-bit 45kb]
64
+ scene=0]
65
+
66
+ object.images.first.source_url
67
+ => #<URI::HTTP:0x007ff7a923ef58 URL:http://foo.com/media/BAhbB1sHOgZmSSItMjAxMi8wNC8yNi8yMC8xMS80OS80MjYvY29yZG92YWJlYWNoLmpwZwY6BkVUWwg6BnA6CnRodW1iSSINNzUweDIwMCMGOwZU/cordovabeach.jpg>
68
+
69
+ You can check whether this object is valid (mandatory attributes are set in the initializer; the defaults are `[url, title, images]`):
70
+
71
+ object.valid?
72
+ => true
73
+
74
+ You can also set the `max` and `top` options at runtime:
75
+
76
+ object = LinkThumbnailer.generate('http://foo.com', :top => 10, :max => 20)
77
+
78
+ ## Configuration
79
+
80
+ In `config/initializers/link_thumbnailer.rb`
81
+
82
+ LinkThumbnailer.configure do |config|
83
+ # Set mandatory attributes required for the website to be valid.
84
+ # You can set `strict` to false if you want to skip this validation.
85
+ # config.mandatory_attributes = %w(url title image)
86
+
87
+ # Whether you want to validate given website against mandatory attributes or not.
88
+ # config.strict = true
89
+
90
+ # Number of redirects to follow before raising an exception when fetching the given url.
91
+ # config.redirect_limit = 3
92
+
93
+ # List of blacklisted urls you want to skip when searching for images.
94
+ # config.blacklist_urls = [
95
+ # %r{^http://ad\.doubleclick\.net/},
96
+ # %r{^http://b\.scorecardresearch\.com/},
97
+ # %r{^http://pixel\.quantserve\.com/},
98
+ # %r{^http://s7\.addthis\.com/}
99
+ # ]
100
+
101
+ # Fetch 10 images maximum.
102
+ # config.max = 10
103
+
104
+ # Return top 5 images only.
105
+ # config.top = 5
106
+ end
107
+
108
+ ## Features
109
+
110
+ Implemented:
111
+
112
+ - Implements [OpenGraph](http://ogp.me/) protocol.
113
+ - Find images and sort them according to how well they represent what the page is about (includes absolute images).
114
+ - Sort images based on their size and color.
115
+ - Blacklist some well-known advertising image urls.
116
+
117
+ Coming soon:
118
+
119
+ - Cache results on filesystem
120
+
121
+ ## Contributing
122
+
123
+ 1. Fork it
124
+ 2. Create your feature branch (`git checkout -b my-new-feature`)
125
+ 3. Commit your changes (`git commit -am 'Added some feature'`)
126
+ 4. Push to the branch (`git push origin my-new-feature`)
127
+ 5. Create new Pull Request
data/Rakefile ADDED
@@ -0,0 +1,2 @@
1
+ #!/usr/bin/env rake
2
+ require "bundler/gem_tasks"
data/lib/generators/link_thumbnailer/install_generator.rb ADDED
@@ -0,0 +1,15 @@
1
+ module LinkThumbnailer
2
+ module Generators
3
+ class InstallGenerator < Rails::Generators::Base
4
+
5
+ source_root File.expand_path('../../templates', __FILE__)
6
+
7
+ desc 'Creates a LinkThumbnailer initializer for your application.'
8
+
9
+ def copy_initializer
10
+ template 'initializer.rb', 'config/initializers/link_thumbnailer.rb'
11
+ end
12
+
13
+ end
14
+ end
15
+ end
data/lib/generators/templates/initializer.rb ADDED
@@ -0,0 +1,26 @@
1
+ # Use this hook to configure LinkThumbnailer behaviors.
2
+ LinkThumbnailer.configure do |config|
3
+ # Set mandatory attributes required for the website to be valid.
4
+ # You can set `strict` to false if you want to skip this validation.
5
+ # config.mandatory_attributes = %w(url title image)
6
+
7
+ # Whether you want to validate given website against mandatory attributes or not.
8
+ # config.strict = true
9
+
10
+ # Number of redirects to follow before raising an exception when fetching the given url.
11
+ # config.redirect_limit = 3
12
+
13
+ # List of blacklisted urls you want to skip when searching for images.
14
+ # config.blacklist_urls = [
15
+ # %r{^http://ad\.doubleclick\.net/},
16
+ # %r{^http://b\.scorecardresearch\.com/},
17
+ # %r{^http://pixel\.quantserve\.com/},
18
+ # %r{^http://s7\.addthis\.com/}
19
+ # ]
20
+
21
+ # Fetch 10 images maximum.
22
+ # config.max = 10
23
+
24
+ # Return top 5 images only.
25
+ # config.top = 5
26
+ end
data/lib/link_thumbnailer/doc.rb ADDED
@@ -0,0 +1,58 @@
1
+ require 'uri'
2
+
3
+ module LinkThumbnailer
4
+
5
+ module Doc
6
+
7
+ def doc_base_href
8
+ base = at('//head/base')
9
+ base['href'] if base
10
+ end
11
+
12
+ def img_srcs
13
+ search('//img').map { |i| i['src'] }.compact
14
+ end
15
+
16
+ def img_abs_urls(base_url = nil)
17
+ result = []
18
+
19
+ img_srcs.each do |i|
20
+ begin
21
+ u = URI(i)
22
+ rescue URI::InvalidURIError
23
+ next
24
+ end
25
+
26
+ result << if u.is_a?(URI::HTTP)
27
+ u
28
+ else
29
+ URI.join(base_url || doc_base_href || source_url, i)
30
+ end
31
+ end
32
+
33
+ result
34
+ end
35
+
36
+ def title
37
+ css('title').text.strip
38
+ end
39
+
40
+ def description
41
+ if element = xpath('//meta[@name="description" and @content]').first
42
+ return element.attributes['content'].value.strip
43
+ end
44
+
45
+ css('body p').each do |node|
46
+ if !node.has_attribute?('style') && node.first_element_child.nil?
47
+ return node.text.strip
48
+ end
49
+ end
50
+
51
+ nil
52
+ end
53
+
54
+ attr_accessor :source_url
55
+
56
+ end
57
+
58
+ end
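Review note: `img_abs_urls` keeps absolute HTTP(S) sources as they are and resolves everything else against a base URL. The sketch below shows that resolution step in isolation, using only the standard library. `absolute_image_urls` and the example URLs are hypothetical names chosen for illustration; they are not part of the gem.

```ruby
require 'uri'

# Hypothetical helper (not part of the gem): turn raw <img src> values into
# absolute URIs, skipping values that cannot be parsed at all.
def absolute_image_urls(srcs, base_url)
  srcs.each_with_object([]) do |src, result|
    begin
      uri = URI(src)
    rescue URI::InvalidURIError
      next
    end

    # URI::HTTPS is a subclass of URI::HTTP, so both schemes count as absolute.
    result << (uri.is_a?(URI::HTTP) ? uri : URI.join(base_url, src))
  end
end

urls = absolute_image_urls(
  ['http://cdn.example.com/a.png', '/img/b.jpg', 'c.gif', 'http://bad url'],
  'http://example.com/blog/post.html'
)

puts urls.map(&:to_s)
```

A root-relative source such as `/img/b.jpg` resolves against the host, while a bare file name such as `c.gif` resolves against the directory of the base document.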
data/lib/link_thumbnailer/doc_parser.rb ADDED
@@ -0,0 +1,15 @@
1
+ require 'nokogiri'
2
+
3
+ module LinkThumbnailer
4
+
5
+ class DocParser
6
+
7
+ def parse(doc_string, source_url = nil)
8
+ doc = Nokogiri::HTML(doc_string).extend(LinkThumbnailer::Doc)
9
+ doc.source_url = source_url
10
+ doc
11
+ end
12
+
13
+ end
14
+
15
+ end
data/lib/link_thumbnailer/fetcher.rb ADDED
@@ -0,0 +1,29 @@
1
+ require 'net/http/persistent'
2
+
3
+ module LinkThumbnailer
4
+
5
+ class Fetcher
6
+
7
+ def fetch(url, redirect_count = 0)
8
+ if redirect_count > LinkThumbnailer.redirect_limit
9
+ raise ArgumentError,
10
+ "too many redirects (#{redirect_count})"
11
+ end
12
+
13
+ uri = url.is_a?(URI) ? url : URI(url)
14
+
15
+ if uri.is_a?(URI::HTTP)
16
+ http = Net::HTTP::Persistent.new('linkthumbnailer')
17
+ http.headers['User-Agent'] = 'linkthumbnailer'
18
+ resp = http.request(uri)
19
+ case resp
20
+ when Net::HTTPSuccess; resp.body
21
+ when Net::HTTPRedirection; fetch(resp['location'], redirect_count + 1)
22
+ else resp.error!
23
+ end
24
+ end
25
+ end
26
+
27
+ end
28
+
29
+ end
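Review note: the redirect handling in `Fetcher#fetch` is a recursive call guarded by a counter. The sketch below reproduces only that control flow so it can run without network access. `fetch_with_limit` and the `responses` table are assumptions made for illustration: the table stands in for the HTTP layer and maps a URL to either `[:ok, body]` or `[:redirect, location]`.

```ruby
# Hypothetical stand-in for the fetcher: follows redirects recursively and
# gives up once the counter exceeds the limit.
def fetch_with_limit(url, responses, redirect_limit = 3, redirect_count = 0)
  if redirect_count > redirect_limit
    raise ArgumentError, "too many redirects (#{redirect_count})"
  end

  kind, value = responses.fetch(url)
  case kind
  when :ok
    value
  when :redirect
    fetch_with_limit(value, responses, redirect_limit, redirect_count + 1)
  end
end

# Fake "network": one normal redirect and one redirect loop.
responses = {
  'http://a.test'    => [:redirect, 'http://b.test'],
  'http://b.test'    => [:ok, '<html>b</html>'],
  'http://loop.test' => [:redirect, 'http://loop.test']
}

body = fetch_with_limit('http://a.test', responses)

loop_error = begin
  fetch_with_limit('http://loop.test', responses)
rescue ArgumentError => e
  e.message
end

puts body
puts loop_error
```

Because the check is `redirect_count > redirect_limit`, a limit of 3 allows three redirects to be followed and raises on the fourth.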
data/lib/link_thumbnailer/img_comparator.rb ADDED
@@ -0,0 +1,18 @@
1
+ module LinkThumbnailer
2
+
3
+ module ImgComparator
4
+
5
+ def <=> other
6
+ result = ([other.rows, other.columns].min ** 2) <=>
7
+ ([rows, columns].min ** 2)
8
+
9
+ if result == 0
10
+ result = other.number_colors <=> number_colors
11
+ end
12
+
13
+ result
14
+ end
15
+
16
+ end
17
+
18
+ end
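Review note: `ImgComparator#<=>` sorts in descending order, first by the shorter side of the image and then by the number of colors. A minimal sketch of that ordering follows. `FakeImage` is a hypothetical stand-in for an RMagick image that models only `rows`, `columns` and `number_colors`; the squaring in the original is omitted because it does not change the order for non-negative sizes.

```ruby
# Hypothetical stand-in for an RMagick image, with the same ordering rule.
FakeImage = Struct.new(:name, :rows, :columns, :number_colors) do
  # Descending order: larger shortest side first, then more colors first.
  def <=>(other)
    result = [other.rows, other.columns].min <=> [rows, columns].min
    result = other.number_colors <=> number_colors if result.zero?
    result
  end
end

images = [
  FakeImage.new('banner', 60, 728, 200),
  FakeImage.new('photo', 400, 600, 5000),
  FakeImage.new('icon', 400, 400, 16)
]

ranked = images.sort.map(&:name)
puts ranked.inspect
```

Ranking by the shorter side pushes wide, thin banners to the end even when their total area is large.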
data/lib/link_thumbnailer/img_parser.rb ADDED
@@ -0,0 +1,47 @@
1
+ require 'RMagick'
2
+
3
+ module LinkThumbnailer
4
+
5
+ class ImgParser
6
+
7
+ def initialize(fetcher, img_url_filter)
8
+ @fetcher = fetcher
9
+ @img_url_filters = [*img_url_filter]
10
+ end
11
+
12
+ def parse(img_urls)
13
+ @img_url_filters.each do |filter|
14
+ img_urls.delete_if { |i| filter.reject?(i) }
15
+ end
16
+
17
+ imgs = []
18
+ count = 0
19
+ img_urls.each { |i|
20
+ break if count >= LinkThumbnailer.max
21
+ img = parse_one(i) or next # skip images that could not be fetched or decoded
22
+ img.extend LinkThumbnailer::ImgComparator
23
+ imgs << img
24
+ count += 1
25
+ }
26
+
27
+ imgs.sort!
28
+
29
+ imgs.first(LinkThumbnailer.top)
30
+ end
31
+
32
+ def parse_one(img_url)
33
+ begin
34
+ img_data = @fetcher.fetch(img_url)
35
+ img = Magick::ImageList.new.from_blob(img_data).extend(
36
+ LinkThumbnailer::WebImage
37
+ )
38
+ img.source_url = img_url
39
+ img
40
+ rescue StandardError
41
+ nil
42
+ end
43
+ end
44
+
45
+ end
46
+
47
+ end
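Review note: `ImgParser#parse` looks at no more than `max` URLs, ranks what it could load, and returns the `top` best. The sketch below shows that selection under simplified assumptions: `pick_top`, the `loader` lambda and the `sizes` table are hypothetical, images are plain hashes instead of RMagick objects, and failed loads are dropped before sorting.

```ruby
# Hypothetical helper: inspect at most `max` URLs, drop the ones that failed
# to load, and keep the `top` largest by shortest side.
def pick_top(urls, max, top, loader)
  images = urls.first(max).map { |url| loader.call(url) }.compact
  images.sort_by { |image| -image[:min_side] }.first(top)
end

# Fake loader: returns nil for URLs it does not know, like a failed download.
sizes = { 'a' => 50, 'b' => 300, 'd' => 120 }
loader = lambda do |url|
  sizes[url] && { url: url, min_side: sizes[url] }
end

best = pick_top(%w[a b c d e], 4, 2, loader).map { |image| image[:url] }
puts best.inspect
```

`max` bounds the work spent downloading images, while `top` bounds the size of the result.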
data/lib/link_thumbnailer/img_url_filter.rb ADDED
@@ -0,0 +1,14 @@
1
+ module LinkThumbnailer
2
+
3
+ class ImgUrlFilter
4
+
5
+ def reject?(img_url)
6
+ LinkThumbnailer.blacklist_urls.each do |url|
7
+ return true if img_url && img_url.to_s[url]
8
+ end
9
+ false
10
+ end
11
+
12
+ end
13
+
14
+ end
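Review note: `reject?` tests the image URL against each blacklist pattern. A small sketch of the same check follows, with `BLACKLIST` and `rejected?` as hypothetical names and a shortened pattern list taken from the defaults.

```ruby
# Shortened copy of the default patterns, for illustration only.
BLACKLIST = [
  %r{^http://ad\.doubleclick\.net/},
  %r{^http://pixel\.quantserve\.com/}
].freeze

# Hypothetical helper: true when any pattern matches the URL.
def rejected?(img_url, blacklist = BLACKLIST)
  blacklist.any? { |pattern| img_url.to_s =~ pattern }
end

puts rejected?('http://ad.doubleclick.net/banner.gif')
puts rejected?('https://ad.doubleclick.net/banner.gif')
puts rejected?('http://example.com/photo.jpg')
```

The default patterns are anchored on `http://`, so the `https://` variant of a blacklisted host is not rejected unless a pattern for it is added.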
data/lib/link_thumbnailer/object.rb ADDED
@@ -0,0 +1,24 @@
1
+ require 'hashie'
2
+
3
+ module LinkThumbnailer
4
+ class Object < Hashie::Mash
5
+
6
+ def method_missing(method_name, *args, &block)
7
+ method_name = method_name.to_s
8
+
9
+ if method_name.end_with?('?')
10
+ method_name.chop!
11
+ !self[method_name].nil?
12
+ else
13
+ self[method_name]
14
+ end
15
+ end
16
+
17
+ def valid?
18
+ return false if self.keys.empty?
19
+ LinkThumbnailer.mandatory_attributes.each {|a| return false unless self[a] } if LinkThumbnailer.strict
20
+ true
21
+ end
22
+
23
+ end
24
+ end
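Review note: the result object answers any attribute name through `method_missing`, with a trailing `?` turning the call into a presence check, and `valid?` requires every mandatory attribute to be set. The sketch below illustrates both without the Hashie dependency. `PreviewHash` is a hypothetical class built on a plain `Hash` with string keys, and the mandatory list is passed as an argument instead of being read from the module configuration.

```ruby
# Hypothetical, dependency-free stand-in for the Hashie::Mash subclass.
class PreviewHash < Hash
  # Any unknown method becomes a key lookup; a trailing "?" checks presence.
  def method_missing(method_name, *_args)
    name = method_name.to_s
    if name.end_with?('?')
      !self[name.chomp('?')].nil?
    else
      self[name]
    end
  end

  # Valid when not empty and, in strict mode, every mandatory key is set.
  def valid?(mandatory = %w[url title images], strict = true)
    return false if empty?
    return true unless strict

    mandatory.all? { |attribute| self[attribute] }
  end
end

preview = PreviewHash.new
preview['title'] = 'Join Me on Zerply'

has_title = preview.title?
title     = preview.title
has_foo   = preview.foo?
foo       = preview.foo

puts [has_title, title, has_foo, foo.inspect].inspect
puts preview.valid?
puts preview.valid?(%w[title])
```

Because every unknown method resolves to a lookup, a misspelled attribute returns `nil` instead of raising `NoMethodError`.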
data/lib/link_thumbnailer/opengraph.rb ADDED
@@ -0,0 +1,18 @@
1
+ module LinkThumbnailer
2
+
3
+ class Opengraph
4
+
5
+ def self.parse(object, doc)
6
+ doc.css('meta').each do |m|
7
+ if m.attribute('property') && m.attribute('property').to_s.match(/^og:(.+)$/i)
8
+ object[$1.gsub('-', '_')] = m.attribute('content').to_s
9
+ end
10
+ end
11
+ object[:images] = [object[:image]] if object[:image]
12
+
13
+ object
14
+ end
15
+
16
+ end
17
+
18
+ end
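Review note: `Opengraph.parse` copies every `og:*` meta property into the result, replacing dashes in the key with underscores, and wraps `image` in an `images` array. The sketch below shows that mapping on plain data. `og_attributes` is a hypothetical helper, and the `[property, content]` pairs stand in for the Nokogiri `<meta>` nodes.

```ruby
# Hypothetical helper: build an attribute hash from [property, content] pairs.
def og_attributes(meta_pairs)
  attrs = meta_pairs.each_with_object({}) do |(property, content), result|
    match = property.to_s.match(/^og:(.+)$/i)
    result[match[1].tr('-', '_')] = content.to_s if match
  end

  # Mirror the single image into the list that the rest of the code expects.
  attrs['images'] = [attrs['image']] if attrs['image']
  attrs
end

attributes = og_attributes([
  ['og:title', 'Join Me on Zerply'],
  ['og:site-name', 'zerply.com'],
  ['og:image', 'http://zerply.com/img/front/facebook_icon_green.png'],
  ['description', 'ignored: not an og: property'],
  [nil, 'ignored: no property attribute']
])

puts attributes.inspect
```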
data/lib/link_thumbnailer/version.rb ADDED
@@ -0,0 +1,3 @@
1
+ module LinkThumbnailer
2
+ VERSION = "0.0.1"
3
+ end
data/lib/link_thumbnailer/web_image.rb ADDED
@@ -0,0 +1,10 @@
1
+ module LinkThumbnailer
2
+
3
+ module WebImage
4
+
5
+ attr_accessor :source_url
6
+ attr_accessor :doc
7
+
8
+ end
9
+
10
+ end
data/lib/link_thumbnailer.rb ADDED
@@ -0,0 +1,76 @@
1
+ # require 'json'
2
+ # require 'nokogiri'
3
+ # require 'link_thumbnailer/parser/opengraph'
4
+
5
+ require 'link_thumbnailer/object'
6
+ require 'link_thumbnailer/fetcher'
7
+ require 'link_thumbnailer/doc_parser'
8
+ require 'link_thumbnailer/doc'
9
+ require 'link_thumbnailer/img_url_filter'
10
+ require 'link_thumbnailer/img_parser'
11
+ require 'link_thumbnailer/img_comparator'
12
+ require 'link_thumbnailer/web_image'
13
+
14
+ require 'link_thumbnailer/opengraph'
15
+
16
+ require 'link_thumbnailer/version'
17
+
18
+ module LinkThumbnailer
19
+
20
+ mattr_accessor :mandatory_attributes
21
+ @@mandatory_attributes = %w(url title images)
22
+
23
+ mattr_accessor :strict
24
+ @@strict = true
25
+
26
+ mattr_accessor :redirect_limit
27
+ @@redirect_limit = 3
28
+
29
+ mattr_accessor :blacklist_urls
30
+ @@blacklist_urls = [
31
+ %r{^http://ad\.doubleclick\.net/},
32
+ %r{^http://b\.scorecardresearch\.com/},
33
+ %r{^http://pixel\.quantserve\.com/},
34
+ %r{^http://s7\.addthis\.com/}
35
+ ]
36
+
37
+ mattr_accessor :max
38
+ @@max = 10
39
+
40
+ mattr_accessor :top
41
+ @@top = 5
42
+
43
+ def self.configure
44
+ yield self
45
+ end
46
+
47
+ def self.generate(url, options = {})
48
+ @@top = options[:top].to_i if options[:top]
49
+ @@max = options[:max].to_i if options[:max]
50
+
51
+ @object = LinkThumbnailer::Object.new
52
+ @fetcher = LinkThumbnailer::Fetcher.new
53
+ @doc_parser = LinkThumbnailer::DocParser.new
54
+
55
+ doc_string = @fetcher.fetch(url)
56
+ doc = @doc_parser.parse(doc_string, url)
57
+
58
+ @object[:url] = doc.source_url
59
+
60
+ # Try Opengraph first
61
+ @object = LinkThumbnailer::Opengraph.parse(@object, doc)
62
+ return @object if @object.valid?
63
+
64
+ # Else try manually
65
+ @img_url_filters = [LinkThumbnailer::ImgUrlFilter.new]
66
+ @img_parser = LinkThumbnailer::ImgParser.new(@fetcher, @img_url_filters)
67
+
68
+ @object[:title] = doc.title
69
+ @object[:description] = doc.description
70
+ @object[:images] = @img_parser.parse(doc.img_abs_urls.dup)
71
+
72
+ return nil unless @object.valid?
73
+ @object
74
+ end
75
+
76
+ end
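Review note: in `generate`, the `:top` and `:max` options are written to the module-level `@@top` and `@@max`, so a value passed for one call stays in effect for later calls. The sketch below shows one possible alternative, not the gem's behavior: the `configure` block sets defaults, and per-call options are merged into a copy. `PreviewConfig` and `options_for` are hypothetical names, and plain `attr_accessor` is used in place of ActiveSupport's `mattr_accessor`.

```ruby
# Hypothetical configuration module with block-style setup.
module PreviewConfig
  class << self
    attr_accessor :max, :top

    def configure
      yield self
    end

    # Merge per-call overrides into a copy; the defaults stay untouched.
    def options_for(overrides = {})
      { max: max, top: top }.merge(overrides) { |_key, _default, value| value.to_i }
    end
  end

  self.max = 10
  self.top = 5
end

PreviewConfig.configure { |config| config.top = 3 }

per_call = PreviewConfig.options_for(top: '10', max: 20)

puts per_call.inspect
puts PreviewConfig.top
```

Keeping overrides local to the call avoids one request changing the result size of the next one.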
data/link_thumbnailer.gemspec ADDED
@@ -0,0 +1,22 @@
1
+ # -*- encoding: utf-8 -*-
2
+ require File.expand_path('../lib/link_thumbnailer/version', __FILE__)
3
+
4
+ Gem::Specification.new do |gem|
5
+ gem.authors = ["Pierre-Louis Gottfrois"]
6
+ gem.email = ["pierrelouis.gottfrois@gmail.com"]
7
+ gem.description = %q{Ruby gem generating thumbnail images from a given URL.}
8
+ gem.summary = %q{Ruby gem ranking images from a given URL returning an object containing images and website informations.}
9
+ gem.homepage = "https://github.com/gottfrois/link_thumbnailer"
10
+
11
+ gem.files = `git ls-files`.split($\)
12
+ gem.executables = gem.files.grep(%r{^bin/}).map{ |f| File.basename(f) }
13
+ gem.test_files = gem.files.grep(%r{^(test|spec|features)/})
14
+ gem.name = "link_thumbnailer"
15
+ gem.require_paths = ["lib"]
16
+ gem.version = LinkThumbnailer::VERSION
17
+
18
+ gem.add_dependency(%q{nokogiri}, ['~> 1.4.0'])
19
+ gem.add_dependency(%q{hashie}, ['~> 1.2.0'])
20
+ gem.add_dependency(%q{net-http-persistent}, ['> 0'])
21
+ gem.add_dependency(%q{rmagick}, ['> 0'])
22
+ end
metadata ADDED
@@ -0,0 +1,129 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: link_thumbnailer
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.0.1
5
+ prerelease:
6
+ platform: ruby
7
+ authors:
8
+ - Pierre-Louis Gottfrois
9
+ autorequire:
10
+ bindir: bin
11
+ cert_chain: []
12
+ date: 2012-08-19 00:00:00.000000000 Z
13
+ dependencies:
14
+ - !ruby/object:Gem::Dependency
15
+ name: nokogiri
16
+ requirement: !ruby/object:Gem::Requirement
17
+ none: false
18
+ requirements:
19
+ - - ~>
20
+ - !ruby/object:Gem::Version
21
+ version: 1.4.0
22
+ type: :runtime
23
+ prerelease: false
24
+ version_requirements: !ruby/object:Gem::Requirement
25
+ none: false
26
+ requirements:
27
+ - - ~>
28
+ - !ruby/object:Gem::Version
29
+ version: 1.4.0
30
+ - !ruby/object:Gem::Dependency
31
+ name: hashie
32
+ requirement: !ruby/object:Gem::Requirement
33
+ none: false
34
+ requirements:
35
+ - - ~>
36
+ - !ruby/object:Gem::Version
37
+ version: 1.2.0
38
+ type: :runtime
39
+ prerelease: false
40
+ version_requirements: !ruby/object:Gem::Requirement
41
+ none: false
42
+ requirements:
43
+ - - ~>
44
+ - !ruby/object:Gem::Version
45
+ version: 1.2.0
46
+ - !ruby/object:Gem::Dependency
47
+ name: net-http-persistent
48
+ requirement: !ruby/object:Gem::Requirement
49
+ none: false
50
+ requirements:
51
+ - - ! '>'
52
+ - !ruby/object:Gem::Version
53
+ version: '0'
54
+ type: :runtime
55
+ prerelease: false
56
+ version_requirements: !ruby/object:Gem::Requirement
57
+ none: false
58
+ requirements:
59
+ - - ! '>'
60
+ - !ruby/object:Gem::Version
61
+ version: '0'
62
+ - !ruby/object:Gem::Dependency
63
+ name: rmagick
64
+ requirement: !ruby/object:Gem::Requirement
65
+ none: false
66
+ requirements:
67
+ - - ! '>'
68
+ - !ruby/object:Gem::Version
69
+ version: '0'
70
+ type: :runtime
71
+ prerelease: false
72
+ version_requirements: !ruby/object:Gem::Requirement
73
+ none: false
74
+ requirements:
75
+ - - ! '>'
76
+ - !ruby/object:Gem::Version
77
+ version: '0'
78
+ description: Ruby gem generating thumbnail images from a given URL.
79
+ email:
80
+ - pierrelouis.gottfrois@gmail.com
81
+ executables: []
82
+ extensions: []
83
+ extra_rdoc_files: []
84
+ files:
85
+ - .gitignore
86
+ - Gemfile
87
+ - LICENSE
88
+ - README.md
89
+ - Rakefile
90
+ - lib/generators/link_thumbnailer/install_generator.rb
91
+ - lib/generators/templates/initializer.rb
92
+ - lib/link_thumbnailer.rb
93
+ - lib/link_thumbnailer/doc.rb
94
+ - lib/link_thumbnailer/doc_parser.rb
95
+ - lib/link_thumbnailer/fetcher.rb
96
+ - lib/link_thumbnailer/img_comparator.rb
97
+ - lib/link_thumbnailer/img_parser.rb
98
+ - lib/link_thumbnailer/img_url_filter.rb
99
+ - lib/link_thumbnailer/object.rb
100
+ - lib/link_thumbnailer/opengraph.rb
101
+ - lib/link_thumbnailer/version.rb
102
+ - lib/link_thumbnailer/web_image.rb
103
+ - link_thumbnailer.gemspec
104
+ homepage: https://github.com/gottfrois/link_thumbnailer
105
+ licenses: []
106
+ post_install_message:
107
+ rdoc_options: []
108
+ require_paths:
109
+ - lib
110
+ required_ruby_version: !ruby/object:Gem::Requirement
111
+ none: false
112
+ requirements:
113
+ - - ! '>='
114
+ - !ruby/object:Gem::Version
115
+ version: '0'
116
+ required_rubygems_version: !ruby/object:Gem::Requirement
117
+ none: false
118
+ requirements:
119
+ - - ! '>='
120
+ - !ruby/object:Gem::Version
121
+ version: '0'
122
+ requirements: []
123
+ rubyforge_project:
124
+ rubygems_version: 1.8.24
125
+ signing_key:
126
+ specification_version: 3
127
+ summary: Ruby gem ranking images from a given URL returning an object containing images
128
+ and website informations.
129
+ test_files: []