dhash-vips 0.0.4.0 → 0.0.4.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: bb131dc947f50e6d428a75c6518704b6797e1d30
4
- data.tar.gz: 63dc5d70626192321675dbfeaa020c31812095ac
3
+ metadata.gz: a2e3a2b86e4e98ad5e145893959ac6297f4aa660
4
+ data.tar.gz: 5ada76a3f6567dde373947c8ea02d706e706cff5
5
5
  SHA512:
6
- metadata.gz: 8d1129484277dfcb2630aba294fdc2b2443017836376b556842a7d0faacbb4d4f77ef9e4e5f82653a32946c815c550182e26dd61b60f3c61a49fb589d539aa12
7
- data.tar.gz: 8ca92c6b85bf56415ea5c3be928c93141fb8b4d3d733845843699ee028103e5416b0fe2520fb23817d9ee9596fad4dea3764d08d0a1c0edd79a47a047572949b
6
+ metadata.gz: cf8c95f0106d35145e3389a818fd6817538c38301ea9585eb71259d24d4d3791d079cc462765d0b18913d0f3783953ee210a4c26cb8477e9224747c4fe33d262
7
+ data.tar.gz: 4eba5b36480988463f9e98df5bc1883870006bfab754fbdde3db0e9c3ae3c389f25cde1b8f0129335773738f8c9721f4472f9504308dd2bdb32c16feaa4b0e02
@@ -1,6 +1,6 @@
1
1
  The MIT License (MIT)
2
2
 
3
- Copyright (c) 2016 Nakilon
3
+ Copyright (c) 2017 Nakilon
4
4
 
5
5
  Permission is hereby granted, free of charge, to any person obtaining a copy
6
6
  of this software and associated documentation files (the "Software"), to deal
data/README.md CHANGED
@@ -1,35 +1,122 @@
1
- # About
1
+ [![Gem Version](https://badge.fury.io/rb/dhash-vips.svg)](http://badge.fury.io/rb/dhash-vips)
2
2
 
3
- dHash is for measuring the similarity of two images.
3
+ # dHash and IDHash gem powered by ruby-vips
4
4
 
5
- Read "Kind of Like That" blog post (21 January 2013): http://www.hackerfactor.com/blog/index.php?/archives/529-Kind-of-Like-That.html
5
+ dHash is a hashing algorithm that can be used to measure the similarity of two images.
6
6
 
7
- It does not automatically detect very shifted crops and rotated images but you may make a wrapper that would call the comparison function iteratively.
7
+ You can read about it in the "Kind of Like That" blog post (21 January 2013): http://www.hackerfactor.com/blog/index.php?/archives/529-Kind-of-Like-That.html
8
+ The original idea is that you split the image into 64 segments, which gives 64 bits: each bit tells whether a segment is brighter or darker than its neighbor. The [Hamming distance](https://en.wikipedia.org/wiki/Hamming_distance) between two hashes is then an inverse measure of image similarity: the smaller the distance, the more similar the images.
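
As a rough illustration of the idea above, here is a minimal sketch that skips image loading and works on a plain 8x9 grid of grayscale values. The helper names `toy_dhash` and `hamming` are hypothetical and are not part of this gem's API; the real gem obtains the pixel grid through libvips.

```ruby
# Hypothetical helper: builds a 64-bit hash from 8 rows of 9 grayscale values.
# Each bit says whether a cell is darker than its right-hand neighbor.
def toy_dhash(grid)
  bits = grid.flat_map { |row| row.each_cons(2).map { |left, right| left < right ? 1 : 0 } }
  bits.join.to_i(2)
end

# Hamming distance: the number of bit positions in which two hashes differ.
def hamming(a, b)
  (a ^ b).to_s(2).count("1")
end

ramp = Array.new(8) { (0..8).map { |x| x * 10 } }  # brightness grows left to right
dip  = ramp.map { |row| row[0...8] + [0] }         # same, but the last column is dark

h1 = toy_dhash(ramp)
h2 = toy_dhash(dip)
puts hamming(h1, h2)  # 8: exactly one bit flips in each of the 8 rows
```

The two grids differ only in their last column, so only the last comparison in each row changes and the distance stays small.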
8
9
 
9
- This implementation is powered by Vips and was forked from https://github.com/maccman/dhash that used ImageMagick.
10
+ There are several implementations on GitHub already, but they all depend on ImageMagick. My implementation takes advantage of libvips (the `ruby-vips` gem). It also uses the `.conv` method and as a result converts an image to an array of grayscale bytes almost 10 times faster:
11
+ ```
12
+ $ rake compare_speed
10
13
 
11
- # Installation
14
+ load and calculate:
15
+ user system total real
16
+ Dhash 12.610000 0.850000 13.460000 ( 13.726590)
17
+ DHashVips::DHash 1.280000 0.250000 1.530000 ( 1.435285)
18
+ DHashVips::IDHash 1.240000 0.160000 1.400000 ( 1.315536)
19
+
20
+ distance (1000 times):
21
+ user system total real
22
+ Dhash 2.500000 0.050000 2.550000 ( 2.579340)
23
+ DHashVips::DHash 2.350000 0.020000 2.370000 ( 2.401252)
24
+ DHashVips::IDHash 5.190000 0.040000 5.230000 ( 5.279742)
25
+ ```
26
+
27
+ Here `Dhash` is the https://github.com/maccman/dhash gem that I used earlier in my projects.
28
+ `DHashVips::DHash` is a port of it that uses vips. I would like to tell you that you can replace the `dhash` gem with `dhash-vips` right now, but the port turned out to have a barely noticeable issue. There is a lot of magic behind libvips speed and resizing. You may not notice it with the naked eye, but when two neighboring segments are similar enough in luminosity, the difference can change its sign. I found two identical images that differed only in colorspace and size (photo by Jordan Voth):
29
+ ![](https://storage.googleapis.com/dhash-vips.nakilon.pro/dhash_issue_example.png)
30
+ but the distance between their hashes turned out to be 5, while the `dhash` gem reported 0.
31
+
32
+ This is why `DHashVips::IDHash` appeared.
33
+
34
+ ## IDHash (the Important Difference Hash)
35
+
36
+ It has improvements over dHash that make hashing less sensitive to the resizing algorithm and bring the distance between the pair of images mentioned above back to 0. The three improvements are:
37
+ * The "Importance" is an array of 64 extra bits that tells the comparison function which half of the 64 bits is important (where the difference between neighbors was significant enough) and which is not. So not every bit in a hash is compared, only half of them.
38
+ * It subtracts not only horizontally but also vertically -- that adds 128 more bits.
39
+ * Instead of resizing to 9x8 it resizes to 8x8 and puts the image on a torus, so it also subtracts the left column from the right one and the top row from the bottom one.
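
To make the torus idea concrete, the sketch below applies the same `zip`/`rotate` expression that this release uses in `IDHash.calculate` to a tiny 3x3 grid instead of 8x8. The method name `wrapped_differences` and the sample values are made up for illustration.

```ruby
# Hypothetical helper: subtracts the next row from each row, wrapping the last
# row around to the first, so an n x n grid yields n * n differences.
def wrapped_differences(grid)
  grid.zip(grid.rotate(1)).flat_map { |r1, r2| r1.zip(r2).map { |i, j| i - j } }
end

grid = [
  [10, 20, 30],
  [10, 25, 30],
  [90, 90, 90],
]

vertical   = wrapped_differences(grid)            # each row vs. the next row
horizontal = wrapped_differences(grid.transpose)  # each column vs. the next column

p vertical    # [0, -5, 0, -80, -65, -60, 80, 70, 60]
p horizontal  # [-10, -15, 0, -10, -5, 0, 20, 20, 0]
```

Because of the wrap-around, no extra ninth column is needed: every cell has a neighbor in both directions, which is why an 8x8 grid gives 64 horizontal plus 64 vertical differences.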
40
+
41
+ For example, here are two photos (by Brian Lauer):
42
+ ![](https://storage.googleapis.com/dhash-vips.nakilon.pro/idhash_example_in.png)
43
+ and visualization of IDHash (`rake compare_images -- image1.jpg image2.jpg`):
44
+ ![](https://storage.googleapis.com/dhash-vips.nakilon.pro/idhash_example_out.png)
45
+
46
+ Here each of the 64 cells contains two circles colored according to the difference between that cell and its neighbor. If the difference is low, the Importance bit is set to zero and the circle is invisible. So there are 128 pairs of corresponding circles, and for each pair a line is drawn if at least one circle is visible and the two circles differ in color. Here you see 15 lines, so the distance between the hashes is 15 (that is pretty low and can be interpreted as "images look similar"). You can also see that the floor in this photo matters: classic dHash won't see that it is darker than the wall because it compares only horizontal neighbors, so if one photo had no floor the distance function wouldn't notice. IDHash also sees the Important difference between the rightmost and leftmost columns because the wall has a slow but visible gradient.
47
+
48
+ As the hash calculation benchmark above shows, these improvements didn't make it slower than dHash because most of the time is spent on image resizing. The distance calculation is what became about two times slower:
49
+ ```ruby
50
+ ((a | b) & (a >> 128 ^ b >> 128)).to_s(2).count "1"
51
+ ```
52
+ vs
53
+ ```ruby
54
+ (a ^ b).to_s(2).count "1"
55
+ ```
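
A scaled-down example may help when reading the two expressions above. It assumes, based on the shifts in the formula, that the high half of an IDHash holds the difference signs and the low half holds the Importance bits. `HALF` is 4 here instead of 128, and the method names are hypothetical.

```ruby
HALF = 4  # the gem shifts by 128; 4 keeps the numbers readable

# Same shape as the IDHash distance: a sign mismatch counts only where at
# least one of the two hashes marks the position as important.
def toy_idhash_distance(a, b)
  ((a | b) & ((a >> HALF) ^ (b >> HALF))).to_s(2).count("1")
end

# Plain dHash-style Hamming distance over the sign bits only.
def toy_sign_hamming(a, b)
  ((a >> HALF) ^ (b >> HALF)).to_s(2).count("1")
end

a = 0b1010_0011  # signs 1010, importance 0011
b = 0b0110_0101  # signs 0110, importance 0101

puts toy_sign_hamming(a, b)     # 2: the signs differ in two positions
puts toy_idhash_distance(a, b)  # 1: one of those positions is unimportant in both
```

The extra OR and AND over wider integers are the additional work compared with a single XOR, which is consistent with the slower distance benchmark.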
56
+
57
+ Remaining problems:
58
+ * Neither dHash nor IDHash can automatically detect heavily shifted crops or rotated images, but you can make a wrapper that calls the comparison function iteratively.
59
+ * These algorithms are color blind because they convert an image to grayscale. If you take a photo of something in your yard, the sun will create lights and shadows, but if you compare photos of something green painted on a blue wall, the machine may see nothing painted at all. The `dhash` gem had such an image in its specs, and that made them pretty useless (this was supposed to be a face):
60
+ ![](https://storage.googleapis.com/dhash-vips.nakilon.pro/colorblind.png)
61
+ * If you have a pile of 1000000 images and compare each of them with every other, that would take a month or two. To speed up the process in the case of DHashVips::DHash, which uses Hamming distance, you may want to read these:
62
+ * [How to find the closest pairs of a string of binary bins in Ruby without O^2 issues?](https://stackoverflow.com/q/8734034/322020)
63
+ * [Find all pairs of values that are close under Hamming distance](https://cstheory.stackexchange.com/q/18516/27420)
64
+ * [Finding the closest pair between two sets of points on the hypercube](https://cstheory.stackexchange.com/q/16322/27420)
65
+ * [Would PCA work for boolean data types?](https://stats.stackexchange.com/q/159705/1125)
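
For scale, the brute-force approach compares every pair, so the work grows quadratically with the number of images. The sketch below counts the pairs and shows the naive search on plain integers; `pair_count` and `close_pairs` are illustrative names, not part of the gem.

```ruby
# Number of unordered pairs among n items: n * (n - 1) / 2.
def pair_count(n)
  n * (n - 1) / 2
end

# Naive search: returns index pairs whose Hamming distance is below threshold.
def close_pairs(hashes, threshold)
  (0...hashes.size).to_a.combination(2).select do |i, j|
    (hashes[i] ^ hashes[j]).to_s(2).count("1") < threshold
  end
end

puts pair_count(1_000_000)                  # 499999500000
p close_pairs([0b0000, 0b0001, 0b1111], 2)  # [[0, 1]]
```

The approaches discussed in the links above aim to avoid visiting all of these pairs.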
66
+
67
+ ## Installation
12
68
 
13
69
  brew install vips
14
70
 
15
- If you have troubles with above, see https://jcupitt.github.io/libvips/install.html
71
+ If you have trouble, see https://jcupitt.github.io/libvips/install.html
16
72
  Then:
17
73
 
18
74
  gem install dhash-vips
19
75
 
20
- If you have troubles with the `gem vips` dependency, see https://github.com/jcupitt/ruby-vips
76
+ If you have trouble with the `gem ruby-vips` dependency, see https://github.com/jcupitt/ruby-vips
21
77
 
22
- # Usage
78
+ ## Usage
23
79
 
24
- hash1 = DhashVips.calculate "photo1.jpg"
25
- hash2 = DhashVips.calculate "photo2.jpg"
80
+ ### dHash:
26
81
 
27
- if 10 > DhashVips.hamming(hash1, hash2)
28
- puts "Images are very similar"
29
- elsif 20 > DhashVips.hamming(hash1, hash2)
30
- puts "Images are slightly similar"
31
- else
32
- puts "Images are different"
33
- end
82
+ ```ruby
83
+ require "dhash-vips"
84
+
85
+ hash1 = DHashVips::DHash.calculate "photo1.jpg"
86
+ hash2 = DHashVips::DHash.calculate "photo2.jpg"
87
+
88
+ distance = DHashVips::DHash.hamming hash1, hash2
89
+ if distance < 10
90
+ puts "Images are very similar"
91
+ elsif distance < 20
92
+ puts "Images are slightly similar"
93
+ else
94
+ puts "Images are different"
95
+ end
96
+ ```
34
97
 
35
98
  These `10` and `20` numbers were found empirically and work well enough for 8-byte hashes.
99
+
100
+ ### IDHash:
101
+
102
+ ```ruby
103
+ require "dhash-vips"
104
+
105
+ hash1 = DHashVips::IDHash.calculate "photo1.jpg"
106
+ hash2 = DHashVips::IDHash.calculate "photo2.jpg"
107
+
108
+ distance = DHashVips::IDHash.distance hash1, hash2
109
+ if distance < 10
110
+ puts "Images are very similar"
111
+ elsif distance < 25
112
+ puts "Images are slightly similar"
113
+ else
114
+ puts "Images are different"
115
+ end
116
+ ```
117
+
118
+ Note that `DHash#calculate` accepts an optional `hash_size` parameter that sets the hash size in bytes, but the `IDHash` size is currently hardcoded and can't be adjusted.
119
+
120
+ ## Credits
121
+
122
+ [John Cupitt](https://github.com/jcupitt) (libvips and ruby-vips maintainer) helped me a lot.
@@ -3,19 +3,17 @@ Gem::Specification.new do |spec|
3
3
  spec.version = (require_relative "lib/dhash-vips/version"; DHashVips::VERSION)
4
4
  spec.author = "Victor Maslov"
5
5
  spec.email = "nakilon@gmail.com"
6
- spec.summary = "dHash powered by Vips"
6
+ spec.summary = "dHash and IDHash powered by Vips"
7
7
  spec.homepage = "https://github.com/nakilon/dhash-vips"
8
8
  spec.license = "MIT"
9
9
 
10
- spec.files = `git ls-files -z`.split("\x0") - ["spec"]
11
10
  spec.test_files = ["spec"]
11
+ spec.files = `git ls-files -z`.split("\x0") - spec.test_files
12
12
  spec.require_path = "lib"
13
13
 
14
14
  spec.add_dependency "ruby-vips"
15
- spec.add_dependency "mll"
16
-
17
- spec.add_development_dependency "dhash"
18
15
 
19
16
  spec.add_development_dependency "rake"
20
17
  spec.add_development_dependency "rspec-core"
18
+ spec.add_development_dependency "dhash"
21
19
  end
@@ -27,12 +27,11 @@ module DHashVips
27
27
 
28
28
  end
29
29
 
30
- require "mll"
31
30
  module IDHash
32
31
  extend self
33
32
 
34
33
  def distance a, b
35
- # TODO: the hash_size=8 is hardcoded here
34
+ # TODO: the hash_size is hardcoded here
36
35
  ((a | b) & (a >> 128 ^ b >> 128)).to_s(2).count "1"
37
36
  end
38
37
 
@@ -56,13 +55,14 @@ module DHashVips
56
55
  fail unless 1 == @@median[[1, 1, 1]]
57
56
  fail unless 1 == @@median[[1, 1]]
58
57
 
59
- def calculate file, hash_size = 8
58
+ def calculate file
59
+ hash_size = 8
60
60
  image = Vips::Image.new_from_file file
61
61
  image = image.resize(hash_size.fdiv(image.width), vscale: hash_size.fdiv(image.height)).colourspace("b-w")
62
62
 
63
63
  array = image.to_a.map &:flatten
64
64
  d1, i1, d2, i2 = [array, array.transpose].flat_map do |a|
65
- d = MLL::subtract[a, a.rotate(1)].to_a.flat_map(&:to_a)
65
+ d = a.zip(a.rotate(1)).flat_map{ |r1, r2| r1.zip(r2).map{ |i,j| i - j } }
66
66
  m = @@median.call d.map(&:abs).sort
67
67
  [
68
68
  d.map{ |c| c < 0 ? 1 : 0 }.join.to_i(2),
@@ -1,3 +1,3 @@
1
1
  module DHashVips
2
- VERSION = "0.0.4.0"
2
+ VERSION = "0.0.4.1"
3
3
  end
@@ -21,12 +21,10 @@ require "pp"
21
21
  # [58, 59, 49, 53, 23, 0, 44, 44],
22
22
  # [43, 46, 53, 61, 43, 44, 0, 0],
23
23
  # [43, 47, 52, 64, 45, 44, 0, 0]]
24
- # [DHashVips::IDHash, 14, 14, 21],
25
24
  ].each do |lib, dm, min_similar, max_similar, min_not_similar, max_not_similar|
26
25
 
27
26
  describe lib do
28
27
 
29
- # require "tmpdir"
30
28
  require "fileutils"
31
29
  require "open-uri"
32
30
  require "digest"
@@ -44,7 +42,7 @@ describe lib do
44
42
  bw1, bw2 = %w{
45
43
  71662d4d4029a3b41d47d5baf681ab9a.jpg
46
44
  ad8a37f872956666c3077a3e9e737984.jpg
47
- } # these is the same photo but of different size and bw
45
+ } # this is the same photo but of different size and colorspace
48
46
 
49
47
  example.metadata[:extra_failure_lines] = []
50
48
  FileUtils.mkdir_p dir = "images"
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: dhash-vips
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.0.4.0
4
+ version: 0.0.4.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - Victor Maslov
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2017-10-11 00:00:00.000000000 Z
11
+ date: 2017-10-21 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: ruby-vips
@@ -25,21 +25,7 @@ dependencies:
25
25
  - !ruby/object:Gem::Version
26
26
  version: '0'
27
27
  - !ruby/object:Gem::Dependency
28
- name: mll
29
- requirement: !ruby/object:Gem::Requirement
30
- requirements:
31
- - - '>='
32
- - !ruby/object:Gem::Version
33
- version: '0'
34
- type: :runtime
35
- prerelease: false
36
- version_requirements: !ruby/object:Gem::Requirement
37
- requirements:
38
- - - '>='
39
- - !ruby/object:Gem::Version
40
- version: '0'
41
- - !ruby/object:Gem::Dependency
42
- name: dhash
28
+ name: rake
43
29
  requirement: !ruby/object:Gem::Requirement
44
30
  requirements:
45
31
  - - '>='
@@ -53,7 +39,7 @@ dependencies:
53
39
  - !ruby/object:Gem::Version
54
40
  version: '0'
55
41
  - !ruby/object:Gem::Dependency
56
- name: rake
42
+ name: rspec-core
57
43
  requirement: !ruby/object:Gem::Requirement
58
44
  requirements:
59
45
  - - '>='
@@ -67,7 +53,7 @@ dependencies:
67
53
  - !ruby/object:Gem::Version
68
54
  version: '0'
69
55
  - !ruby/object:Gem::Dependency
70
- name: rspec-core
56
+ name: dhash
71
57
  requirement: !ruby/object:Gem::Requirement
72
58
  requirements:
73
59
  - - '>='
@@ -118,5 +104,5 @@ rubyforge_project:
118
104
  rubygems_version: 2.0.14.1
119
105
  signing_key:
120
106
  specification_version: 4
121
- summary: dHash powered by Vips
107
+ summary: dHash and IDHash powered by Vips
122
108
  test_files: []