dhash-vips 0.0.4.1 → 0.0.5.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: a2e3a2b86e4e98ad5e145893959ac6297f4aa660
4
- data.tar.gz: 5ada76a3f6567dde373947c8ea02d706e706cff5
3
+ metadata.gz: 0263e2d1f269153107a10986ba97d2221f1191d6
4
+ data.tar.gz: 4e1365a3ab0507e3e14cf8b06ae1f2f7d84482f1
5
5
  SHA512:
6
- metadata.gz: cf8c95f0106d35145e3389a818fd6817538c38301ea9585eb71259d24d4d3791d079cc462765d0b18913d0f3783953ee210a4c26cb8477e9224747c4fe33d262
7
- data.tar.gz: 4eba5b36480988463f9e98df5bc1883870006bfab754fbdde3db0e9c3ae3c389f25cde1b8f0129335773738f8c9721f4472f9504308dd2bdb32c16feaa4b0e02
6
+ metadata.gz: c00ea6ebead50175bfaa74bf0814a394c1b06159d238716d2edb7e2718587ee0a6c5497ec55823806193b5f46fa1a6ac1b43ba7e277ea468dd2e0ba3c5ea97a7
7
+ data.tar.gz: 57d32bfbf327226ff8ba0771f6346d1ccdadbe8b61dcfaa4aa81795976bd3146fce877f9532d6d9fc1a798ba2ffe3f63a1a940aee263f3702a9b48a4c835a6e6
data/Gemfile CHANGED
@@ -1,4 +1,3 @@
1
1
  source 'https://rubygems.org'
2
2
 
3
- # Specify your gem's dependencies in dhash.gemspec
4
3
  gemspec
data/README.md CHANGED
@@ -2,29 +2,27 @@
2
2
 
3
3
  # dHash and IDHash gem powered by ruby-vips
4
4
 
5
- The "dHash" is an algorithm of hashing that can be used for measuring the similarity of two images.
5
+ The "dHash" is an algorithm of fingerprinting that can be used to measure the similarity of two images.
6
6
 
7
7
  You may read about it in "Kind of Like That" blog post (21 January 2013): http://www.hackerfactor.com/blog/index.php?/archives/529-Kind-of-Like-That.html
8
- The original idea is that you split the image into 64 segments and so there are 64 bits -- each tells if the one segment is brighter or darker than the neighbor one. Then the [Hamming distance](https://en.wikipedia.org/wiki/Hamming_distance) between hashes is the opposite of images similarity.
8
+ The original idea is that you split the image into 64 segments and so there are 64 bits -- each tells if the one segment is brighter or darker than the neighbor one. Then the [Hamming distance](https://en.wikipedia.org/wiki/Hamming_distance) between fingerprints is the opposite of images similarity.
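As a rough illustration of the idea described above (not the gem's implementation), here is a minimal pure-Ruby sketch. It assumes the image has already been reduced to a grid of 8 rows by 9 columns of brightness values; the `dhash_bits` and `hamming` names and the sample grids are hypothetical.

```ruby
# Hypothetical helper: turn an 8x9 grid of brightness values into a 64-bit
# integer, one bit per horizontal neighbour pair (1 when the left cell is
# brighter than the right one).
def dhash_bits(grid)
  grid.flat_map { |row| row.each_cons(2).map { |l, r| l > r ? 1 : 0 } }.join.to_i(2)
end

# Hamming distance: the number of bit positions where two fingerprints differ.
def hamming(a, b)
  (a ^ b).to_s(2).count("1")
end

# Two made-up grids: the second has the first two cells of the top row swapped.
grid_a = Array.new(8) { |y| Array.new(9) { |x| (x * 7 + y * 13) % 32 } }
grid_b = grid_a.map(&:dup)
grid_b[0][0], grid_b[0][1] = grid_b[0][1], grid_b[0][0]

puts hamming(dhash_bits(grid_a), dhash_bits(grid_a))
puts hamming(dhash_bits(grid_a), dhash_bits(grid_b))
```

Identical grids give a distance of zero, and the small local change flips only the bits whose neighbour ordering actually changed, which is why a low distance can be read as "images look similar".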
9
9
 
10
- There are several implementations on Github already but they all depend on ImageMagick. My implementation takes an advantage of libvips (the `ruby-vips` gem) -- it also uses the `.conv` method and in result converts image to an array of grayscale bytes almost 10 times faster:
10
+ There were several implementations on Github already but they all depend on ImageMagick. My implementation takes an advantage of libvips (the `ruby-vips` gem) -- it also uses the `.conv` method and in result converts image to an array of grayscale bytes almost 10 times faster:
11
11
  ```
12
- $ rake compare_speed
13
-
14
- load and calculate:
15
- user system total real
16
- Dhash 12.610000 0.850000 13.460000 ( 13.726590)
17
- DHashVips::DHash 1.280000 0.250000 1.530000 ( 1.435285)
18
- DHashVips::IDHash 1.240000 0.160000 1.400000 ( 1.315536)
19
-
20
- distance (1000 times):
21
- user system total real
22
- Dhash 2.500000 0.050000 2.550000 ( 2.579340)
23
- DHashVips::DHash 2.350000 0.020000 2.370000 ( 2.401252)
24
- DHashVips::IDHash 5.190000 0.040000 5.230000 ( 5.279742)
12
+ load and calculate the fingerprint:
13
+ user system total real
14
+ Dhash 12.400000 0.820000 13.220000 ( 13.329952)
15
+ DHashVips::DHash 1.330000 0.230000 1.560000 ( 1.509826)
16
+ DHashVips::IDHash 1.060000 0.090000 1.150000 ( 1.100332)
17
+
18
+ measure the distance (1000 times):
19
+ user system total real
20
+ Dhash hamming 3.140000 0.020000 3.160000 ( 3.179392)
21
+ DHashVips::DHash hamming 3.040000 0.020000 3.060000 ( 3.095190)
22
+ DHashVips::IDHash distance 6.720000 0.030000 6.750000 ( 6.790900)
25
23
  ```
26
24
 
27
- Here the `Dhash` is a https://github.com/maccman/dhash that I used earlier in my projects.
25
+ Here the `Dhash` is [another gem](https://github.com/maccman/dhash) that I used earlier in my projects.
28
26
  The `DHashVips::DHash` is a port of it that uses vips. I would like to tell you that you can replace the `dhash` with `dhash-vips` gem right now but it appeared to have a barely noticeable issue. There is a lot of magic behind the libvips speed and resizing -- you may not notice it with unarmed eyes but when two neighbor segments are enough similar by luminosity the difference can change the sign. So I found two identical images that were just of different colorspace and size (photo by Jordan Voth):
29
27
  ![](https://storage.googleapis.com/dhash-vips.nakilon.pro/dhash_issue_example.png)
30
28
  but the distance between their hashes appeared to be equal to 5 while `dhash` gem reported 0.
@@ -33,36 +31,40 @@ This is why `DHashVips::IDHash` appeared.
33
31
 
34
32
  ## IDHash (the Important Difference Hash)
35
33
 
36
- It has improvements over the dHash that made hashing less sensitive to the resizing algorithm and effectively made the pair of images mentioned above to have a distance of 0 again. Three improvements are:
37
- * The "Importance" is an array of extra 64 bits that tells the comparing function which half of 64 bits is important (when the difference between neighbors was enough significant) and which is not. So not every bit in a hash is being compared but only half of them.
38
- * It subtracts not only horizontally but also vertically -- that adds 128 more bits.
34
+ It has improvements over the dHash that made fingerprinting less sensitive to the resizing algorithm and effectively made the pair of images mentioned above to have a distance of 0 again. Three improvements are:
35
+ * The "Importance" is an array of extra 64 bits that tells the comparing function which half of 64 bits is important (when the difference between neighbors was enough significant) and which is not. So not every bit in a fingerprint is being compared but only half of them.
36
+ * It subtracts not only horizontally but also vertically -- that adds 128 more bits.
39
37
  * Instead of resizing to 9x8 it resizes to 8x8 and puts the image on a torus so it subtracts the left column from the right one and the top from bottom.
40
38
 
41
- For example, here are two photos (by Brian Lauer):
42
- ![](https://storage.googleapis.com/dhash-vips.nakilon.pro/idhash_example_in.png)
43
- and visualization of IDHash (`rake compare_images -- image1.jpg image2.jpg`):
44
- ![](https://storage.googleapis.com/dhash-vips.nakilon.pro/idhash_example_out.png)
45
-
46
- Here in each of 64 cells, there are two circles that color the difference between that cell and the neighbor one. If the difference is low the Importance bit is set to zero and the circle is invisible. So there are 128 pairs of corresponding circles and when you take one, if at least one circle is visible and is of different color the line is to be drawn. Here you see 15 lines and so the distance between hashes will be equal to 15 (that is pretty low and can be interpreted as "images look similar"). Also, you see here that floor on this photo matters -- classic dHash won't see that it's darker than wall because it's comparing only horizontal neighbors and if one photo had no floor the distance function won't notice that. Also, it sees the Important difference between the very right and left columns because the wall has a slow but visible gradient.
47
-
48
- You could see in hash calculation benchmark earlier that these improvements didn't make it slower than dHash because most of the time is spent on image resizing. The calculation of distance is what became two times slower:
39
+ You could see in fingerprint calculation benchmark earlier that these improvements didn't make it slower than dHash because most of the time is spent on image resizing. The calculation of distance is what became two times slower:
49
40
  ```ruby
50
- ((a | b) & (a >> 128 ^ b >> 128)).to_s(2).count "1"
41
+ ((a | b) & ((a ^ b) >> 128)).to_s(2).count "1"
51
42
  ```
52
43
  vs
53
44
  ```ruby
54
45
  (a ^ b).to_s(2).count "1"
55
46
  ```
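To make the difference between the two expressions concrete, here is a small sketch with made-up fingerprints. It assumes the 0.0.5 layout that `fingerprint` returns in `lib/dhash-vips.rb` further down in this diff (Importance bits in the upper 128 bits, difference bits in the lower 128) and mirrors the body of `distance3` there; `pack`, `idhash_distance`, `plain_hamming` and the sample values are hypothetical.

```ruby
BITS = 128 # difference bits in a power=3 fingerprint (64 horizontal + 64 vertical)

# Hypothetical helper: build a fingerprint from its two halves.
def pack(importance, difference)
  (importance << BITS) | difference
end

# Same expression as `distance3` in this diff: a differing bit counts only
# if at least one of the two fingerprints marks it as important.
def idhash_distance(a, b)
  ((a ^ b) & ((a | b) >> BITS)).to_s(2).count("1")
end

# Plain Hamming distance over the difference bits only, for comparison.
def plain_hamming(a, b)
  mask = (1 << BITS) - 1
  ((a ^ b) & mask).to_s(2).count("1")
end

a = pack(0b0011, 0b0101)
b = pack(0b0001, 0b1010)

puts plain_hamming(a, b)   # every one of the four low difference bits differs
puts idhash_distance(a, b) # only the bits flagged as important are counted
```

The extra OR, shift and AND are the additional work per comparison, which is consistent with the distance calculation being roughly two times slower in the benchmark.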
56
47
 
57
- Remaining problems:
48
+ ### Example
49
+
50
+ Here are two photos (by Brian Lauer):
51
+ ![](https://storage.googleapis.com/dhash-vips.nakilon.pro/idhash_example_in.png)
52
+ and visualization of IDHash (`rake compare_images -- image1.jpg image2.jpg`):
53
+ ![](https://storage.googleapis.com/dhash-vips.nakilon.pro/idhash_example_out.png)
54
+
55
+ Here in each of 64 cells, there are two circles that color the difference between that cell and the neighbor one. If the difference is low the Importance bit is set to zero and the circle is invisible. So there are 128 pairs of corresponding circles and when you take one, if at least one circle is visible and is of different color the line is to be drawn. Here you see 15 lines and so the distance between fingerprints will be equal to 15 (that is pretty low and can be interpreted as "images look similar"). Also, you see here that floor on this photo matters -- classic dHash won't see that it's darker than wall because it's comparing only horizontal neighbors and if one photo had no floor the distance function won't notice that. Also, it sees the Important difference between the very right and left columns because the wall has a slow but visible gradient.
56
+
57
+ ### Remaining problems
58
+
58
59
  * Neither dHash nor IDHash can't automatically detect very shifted crops and rotated images but you can make a wrapper that would call the comparison function iteratively.
59
60
  * These algorithms are color blind because of converting an image to grayscale. If you take a photo of something in your yard the sun will create lights and shadows, but if you compare photos of something green painted on a blue wall there is a possibility the machine would see nothing painted at all. The `dhash` gem had such image in specs and that made them pretty useless (this was supposed to be a face):
60
61
  ![](https://storage.googleapis.com/dhash-vips.nakilon.pro/colorblind.png)
61
- * If you have a pile of 1000000 images to compare them with each other that would take a month or two. To improve the process you in case of DHashVips::DHash that uses Hamming distance you may want to read these:
62
+ * If you have a pile of 1000000 images comparing them with each other would take a month or two. To improve the process in case of dHash that uses Hamming distance you may want to read these threads on Stackexchange network:
62
63
  * [How to find the closest pairs of a string of binary bins in Ruby without O^2 issues?](https://stackoverflow.com/q/8734034/322020)
63
64
  * [Find all pairs of values that are close under Hamming distance](https://cstheory.stackexchange.com/q/18516/27420)
64
65
  * [Finding the closest pair between two sets of points on the hypercube](https://cstheory.stackexchange.com/q/16322/27420)
65
66
  * [Would PCA work for boolean data types?](https://stats.stackexchange.com/q/159705/1125)
67
+ * [Using pHash to search agaist a huge image database, what is the best approach?](https://stackoverflow.com/q/18257641/322020)
66
68
 
67
69
  ## Installation
68
70
 
@@ -95,8 +97,6 @@ else
95
97
  end
96
98
  ```
97
99
 
98
- These `10` and `20` numbers are found empirically and just work enough well for 8-byte hashes.
99
-
100
100
  ### IDHash:
101
101
 
102
102
  ```ruby
@@ -115,7 +115,74 @@ else
115
115
  end
116
116
  ```
117
117
 
118
- Note that `DHash#calculate` accepts `hash_size` optional parameter that sets hash size in bytes, but `IDHash` size is currently hardcoded and can't be adjusted.
118
+ These `10` and `20` numbers are found empirically and just work enough well for 8-byte hashes.
119
+ To find out these thresholds we can run a rake task with hardcoded test cases:
120
+ ```
121
+ $ rake compare_matrices
122
+
123
+ Dhash
124
+ Absolutely the same image: 0..0
125
+ Complex B/W and the same but colorful: 0
126
+ Similar images: 13..16
127
+ Different images: 9..41
128
+
129
+ DHashVips::DHash
130
+ Absolutely the same image: 0..0
131
+ Complex B/W and the same but colorful: 5
132
+ Similar images: 17..18
133
+ Different images: 14..39
134
+
135
+ DHashVips::IDHash
136
+ Absolutely the same image: 0..0
137
+ Complex B/W and the same but colorful: 0
138
+ Similar images: 15..23
139
+ Different images: 19..64
140
+
141
+ DHashVips::IDHash 4
142
+ Absolutely the same image: 0..0
143
+ Complex B/W and the same but colorful: 0
144
+ Similar images: 71..108
145
+ Different images: 102..211
146
+
147
+ ```
148
+
149
+ ### Notes
150
+
151
+ * Methods were renamed from `#calculate` to `#fingerprint` and from `#hamming` to `#distance`.
152
+ * The `DHash#calculate` accepts `hash_size` optional parameter that is 8 by default. The `IDHash#fingerprint`'s optional parameter is called `power` and works in a bit different way: 3 means 8 and 4 means 16 -- other sizes are not supported because they don't seem to be useful (higher fingerprint resolution makes it vulnerable to image shifts and croppings). Because IDHash's fingerprint is more complex than DHash's one it's not that straight forward to compare them so under the hood the `#distance` methods have to check the size of fingerprint -- this trade-off costs 30-40% of speed that can be eliminated by using `#distance3` method that assumes fingerprint to be of power=3. So the full benchmark is this one:
153
+
154
+ ```
155
+ $ rake compare_speed
156
+
157
+ load and calculate the fingerprint:
158
+ user system total real
159
+ Dhash 12.400000 0.820000 13.220000 ( 13.329952)
160
+ DHashVips::DHash 1.330000 0.230000 1.560000 ( 1.509826)
161
+ DHashVips::IDHash 1.060000 0.090000 1.150000 ( 1.100332)
162
+ DHashVips::IDHash 4 1.030000 0.080000 1.110000 ( 1.089148)
163
+
164
+ measure the distance (1000 times):
165
+ user system total real
166
+ Dhash hamming 3.140000 0.020000 3.160000 ( 3.179392)
167
+ DHashVips::DHash hamming 3.040000 0.020000 3.060000 ( 3.095190)
168
+ DHashVips::IDHash distance 8.170000 0.040000 8.210000 ( 8.279950)
169
+ DHashVips::IDHash distance3 6.720000 0.030000 6.750000 ( 6.790900)
170
+ DHashVips::IDHash distance 4 24.430000 0.130000 24.560000 ( 24.652625)
171
+ ```
172
+
173
+ Also note that to make `#distance` able to infer the fingerprint resolution from the size of the Integer that represents it, a change in its structure was needed (the left half of the bits was swapped with the right one), so fingerprints from versions 0.0.4 and 0.0.5 are incompatible, but you can probably convert them manually. I know, incompatibilities suck, but if we put the version or structure information inside the fingerprint it would become slow to (de)serialize and store.
174
+
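For power=3 fingerprints, the manual conversion mentioned above could look like the following sketch. It assumes, based on the two return expressions of `IDHash` in this diff, that a 0.0.4 fingerprint is 256 bits with the difference bits in the upper half and the Importance bits in the lower half, and that 0.0.5 swaps the halves. `convert_004_to_005`, `distance_004` and `distance_005` are hypothetical names, not part of the gem.

```ruby
HALF = 128 # bits per half of a power=3 fingerprint

# Hypothetical converter: swap the upper and lower 128-bit halves.
def convert_004_to_005(fingerprint)
  low  = fingerprint & ((1 << HALF) - 1)
  high = fingerprint >> HALF
  (low << HALF) | high
end

# Distance expression from the removed (0.0.4) line of this diff.
def distance_004(a, b)
  ((a | b) & (a >> HALF ^ b >> HALF)).to_s(2).count("1")
end

# Distance expression from the added `distance3` (0.0.5) line of this diff.
def distance_005(a, b)
  ((a ^ b) & ((a | b) >> HALF)).to_s(2).count("1")
end

# Made-up 0.0.4-style fingerprints: difference bits high, Importance bits low.
old_a = (0b0101 << HALF) | 0b0011
old_b = (0b1010 << HALF) | 0b0001

new_a, new_b = [old_a, old_b].map { |f| convert_004_to_005(f) }
puts distance_004(old_a, old_b) == distance_005(new_a, new_b)
```

Because the conversion is a plain swap of halves, applying it twice returns the original value, so stored fingerprints can be migrated without recomputing them from the images.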
175
+ ## Troubleshooting
176
+
177
+ El Capitan and rbenv may cause environment issues that would make you do things like:
178
+ ```
179
+ ./ruby `rbenv which rake` compare_matrices
180
+ ```
181
+ instead of just
182
+ ```
183
+ rake compare_matrices
184
+ ```
185
+ For more information on that: https://github.com/jcupitt/ruby-vips/issues/141
119
186
 
120
187
  ## Credits
121
188
 
data/Rakefile CHANGED
@@ -50,99 +50,144 @@ task :compare_kernels do |_|
50
50
  end
51
51
  end
52
52
 
53
-
53
+ # ./ruby `rbenv which rake` compare_matrices
54
54
  desc "Compare the quality of Dhash, DHashVips::DHash and DHashVips::IDHash -- run it only after `rake test`"
55
- task :compare_matrixes do |_|
55
+ task :compare_matrices do |_|
56
56
  require "dhash"
57
57
  require_relative "lib/dhash-vips"
58
58
  require "mll"
59
- [[Dhash, :hamming], [DHashVips::DHash, :hamming], [DHashVips::IDHash, :distance]].each do |m, dm|
59
+ [
60
+ [Dhash, :calculate, :hamming],
61
+ [DHashVips::DHash, :calculate, :hamming],
62
+ [DHashVips::IDHash, :fingerprint, :distance],
63
+ [DHashVips::IDHash, :fingerprint, :distance, 4],
64
+ ].each do |m, calc, dm, power|
65
+ puts "\n#{m} #{power}"
60
66
  hashes = %w{
61
67
  71662d4d4029a3b41d47d5baf681ab9a.jpg
62
68
  ad8a37f872956666c3077a3e9e737984.jpg
69
+
70
+ 6d97739b4a08f965dc9239dd24382e96.jpg
71
+ 1b1d4bde376084011d027bba1c047a4b.jpg
72
+
63
73
  1d468d064d2e26b5b5de9a0241ef2d4b.jpg
64
74
  92d90b8977f813af803c78107e7f698e.jpg
75
+
65
76
  309666c7b45ecbf8f13e85a0bd6b0a4c.jpg
66
77
  3f9f3db06db20d1d9f8188cd753f6ef4.jpg
67
78
  df0a3b93e9412536ee8a11255f974141.jpg
68
79
  679634ff89a31279a39f03e278bc9a01.jpg
69
- }.map{ |filename| m.calculate "images/#{filename}" }
80
+ }.map{ |filename| m.public_send calc, "images/#{filename}", *power }
70
81
  table = MLL::table[m.method(dm), [hashes], [hashes]]
71
- array = Array.new(4){ [] }
82
+ # require "pp"
83
+ # pp table
84
+ array = Array.new(5){ [] }
72
85
  hashes.size.times.to_a.repeated_combination(2) do |i, j|
73
- array[i == j ? 0 : (j - i).abs == 1 && (i + j - 1) % 4 == 0 ? [i, j] == [0, 1] ? 1 : 2 : 3].push table[i][j]
86
+ array[i == j ? 0 : (j - i).abs == 1 && (i + j - 1) % 4 == 0 ? [i, j] == [0, 1] ? 1 : [i, j] == [2, 3] ? 2 : 3 : 4].push table[i][j]
74
87
  end
75
- p array.map &:sort
88
+ # p array.map &:sort
89
+ puts "Absolutely the same image: #{array[0].minmax.join ".."}"
90
+ puts "Complex B/W and the same but colorful: #{array[1][0]}"
91
+ puts "Similar images: #{array[3].minmax.join ".."}"
92
+ puts "Different images: #{[*array[2], *array[4]].minmax.join ".."}"
76
93
  end
77
94
  end
78
95
 
79
96
  # ruby -c Rakefile && rm -f ab.png && rake compare_images -- fc762fa286489d8afc80adc8cdcb125e.jpg 9c2c240ec02356472fb532f404d28dde.jpg 2>/dev/null && ql ab.png
97
+ # rm -f ab.png && ./ruby `rbenv which rake` compare_images -- 6d97739b4a08f965dc9239dd24382e96.jpg 1b1d4bde376084011d027bba1c047a4b.jpg 2>/dev/null && ql ab.png
80
98
  desc "Visualizes the IDHash difference measurement between two images"
81
99
  task :compare_images do |_|
82
- abort "there should be two image filenames passed as arguments" unless ARGV.size == 3
100
+ abort "there should be two image filenames passed as arguments (and optionally the `power`)" unless (4..5) === ARGV.size
101
+ abort "the optional argument should be either 3 or 4" unless [3, 4].include?(power = (ARGV[4] || 3).to_i)
102
+ task ARGV.last do ; end
83
103
  require_relative "lib/dhash-vips"
84
- ha, hb = ARGV.drop(1).map &DHashVips::IDHash.method(:calculate)
104
+ ha, hb = ARGV[2, 2].map{ |filename| DHashVips::IDHash.fingerprint(filename, power) }
85
105
  puts "distance: #{DHashVips::IDHash.distance ha, hb}"
106
+ size = 2 ** power
107
+ shift = 2 * size * size
108
+ ai = ha >> shift
109
+ ad = ha - (ai << shift)
110
+ bi = hb >> shift
111
+ bd = hb - (bi << shift)
112
+
113
+ a, b = ARGV[2, 2].map do |filename|
114
+ image = Vips::Image.new_from_file filename
115
+ image = image.resize(size.fdiv(image.width), vscale: size.fdiv(image.height)).colourspace("b-w").
116
+ resize(100, vscale: 100, kernel: :nearest).colourspace("srgb")
117
+ end
118
+ fail unless a.width == b.width && a.height == b.height
86
119
 
87
- require "delegate"
88
- class ImageMutable < SimpleDelegator
89
- %i{ draw_line draw_circle }.each do |m|
90
- define_method "#{m}!" do |*args|
91
- __setobj__ self.send m, *args
92
- self
120
+ _127 = shift - 1
121
+ _63 = size * size - 1
122
+ n = 0
123
+ width = a.width
124
+ height = a.height
125
+
126
+ Vips::Operation.class_eval do
127
+ old_initialize = instance_method :initialize
128
+ define_method :initialize do |value|
129
+ old_initialize.bind(self).(value).tap do
130
+ self.instance_variable_set "@operation_name", value
93
131
  end
94
132
  end
133
+ old_set = instance_method :set
134
+ define_method :set do |*args|
135
+ args[1].instance_variable_set "@operation_name", self.instance_variable_get("@operation_name") if args.first == "image"
136
+ old_set.bind(self).(*args)
137
+ end
95
138
  end
96
- a, b = ARGV.drop(1).map do |filename|
97
- image = Vips::Image.new_from_file filename
98
- ImageMutable.new image.resize(8.fdiv(image.width), vscale: 8.fdiv(image.height)).colourspace("b-w").
99
- resize(100, vscale: 100, kernel: :nearest).colourspace("srgb")
139
+ Vips::Image.class_eval do
140
+ def copy
141
+ return self if caller.first.end_with?("/gems/ruby-vips-2.0.9/lib/vips/operation.rb:148:in `set'") &&
142
+ %w{ draw_line draw_circle }.include?(instance_variable_get "@operation_name")
143
+ method_missing :copy
144
+ end
100
145
  end
101
146
 
102
- ad = ha >> 128
103
- ai = ha - (ad << 128)
104
- bd = hb >> 128
105
- bi = hb - (bd << 128)
106
-
107
- n = 0
108
- [[a, ad, ai], [b, bd, bi]].each do |image, xd, xi|
109
- 127.downto(0).each do |i|
110
- if i > 63
111
- y, x = (127 - i).divmod 8
147
+ require "get_process_mem"
148
+ a, b = [[a, ad, ai], [b, bd, bi]].map do |image, xd, xi|
149
+ _127.downto(0).each_with_index do |i, ii|
150
+ mem = GetProcessMem.new(Process.pid).mb
151
+ abort ">1000mb of memory consumed" if 1000 < mem
152
+ if i > _63
153
+ y, x = (_127 - i).divmod size
112
154
  else
113
- x, y = (63 - i).divmod 8
155
+ x, y = (_63 - i).divmod size
114
156
  end
115
- x = (image.width * (x + 0.5) / 8).round
116
- y = (image.height * (y + 0.5) / 8).round
117
- if i > 63
157
+ x = (width * (x + 0.5) / size).round
158
+ y = (height * (y + 0.5) / size).round
159
+ if i > _63
118
160
  (x-2..x+2).map do |x| [
119
- [x, y , x, (y + image.height / 16 - 1) % image.height],
120
- [x, (y + image.height / 16 + 1) % image.height, x, (y + image.height / 8 ) % image.height],
161
+ [x, y , x, (y + height / size / 2 - 1) % height],
162
+ [x, (y + height / size / 2 + 1) % height, x, (y + height / size ) % height],
121
163
  ] end
122
164
  else
123
165
  (y-2..y+2).map do |y| [
124
- [ x , y, (x + image.width / 16 - 1) % image.width, y],
125
- [(x + image.width / 16 + 1) % image.width, y, (x + image.width / 8 ) % image.width, y],
166
+ [ x , y, (x + width / size / 2 - 1) % width, y],
167
+ [(x + width / size / 2 + 1) % width, y, (x + width / size ) % width, y],
126
168
  ] end
127
169
  end.each do |coords1, coords2|
128
170
  n += 1
129
- image.draw_line! (1 - xd[i]) * 255, *coords1
130
- image.draw_line! xd[i] * 255, *coords2
171
+ image = image.draw_line (1 - xd[i]) * 255, *coords1
172
+ image = image.draw_line xd[i] * 255, *coords2
131
173
  end if ai[i] + bi[i] > 0 && ad[i] != bd[i]
132
- cx, cy = if i > 63
133
- [x, y + 20]
174
+ cx, cy = if i > _63
175
+ [x, y + 30]
134
176
  else
135
- [x + 20, y]
177
+ [x + 30, y]
136
178
  end
137
- image.draw_circle! xd[i] * 255, cx, cy, 11, fill: true if xi[i] > 0
138
- image.draw_circle! (1 - xd[i]) * 255, cx, cy, 10, fill: true if xi[i] > 0
179
+ image = image.draw_circle xd[i] * 255, cx, cy, 11, fill: true if xi[i] > 0
180
+ image = image.draw_circle (1 - xd[i]) * 255, cx, cy, 10, fill: true if xi[i] > 0
139
181
  end
182
+ image
140
183
  end
141
184
  puts "distance: #{n / 10}"
185
+ puts "(above should be equal if rake task works correctly)"
142
186
 
143
- a.join(b.__getobj__, :horizontal, shim: 10).write_to_file "ab.png"
187
+ a.join(b, :horizontal, shim: 15).write_to_file "ab.png"
144
188
  end
145
189
 
190
+ # ./ruby `rbenv which rake` compare_speed
146
191
  desc "Benchmarks Dhash, DHashVips::DHash and DHashVips::IDHash"
147
192
  task :compare_speed do
148
193
  require "dhash"
@@ -169,21 +214,33 @@ task :compare_speed do
169
214
  end
170
215
 
171
216
  require "benchmark"
172
- puts "load and calculate:"
217
+ puts "load and calculate the fingerprint:"
173
218
  hashes = []
174
- Benchmark.bm 18 do |bm|
175
- [Dhash, DHashVips::DHash, DHashVips::IDHash].each do |m|
176
- bm.report m do
177
- hashes.push filenames.map &m.method(:calculate)
219
+ Benchmark.bm 19 do |bm|
220
+ [
221
+ [Dhash, :calculate],
222
+ [DHashVips::DHash, :calculate],
223
+ [DHashVips::IDHash, :fingerprint],
224
+ [DHashVips::IDHash, :fingerprint, 4],
225
+ ].each do |m, calc, power|
226
+ bm.report "#{m} #{power}" do
227
+ hashes.push filenames.map{ |filename| m.send calc, filename, *power }
178
228
  end
179
229
  end
180
230
  end
181
- puts "distance (1000 times):"
182
- Benchmark.bm 18 do |bm|
183
- [[Dhash, :hamming], [DHashVips::DHash, :hamming], [DHashVips::IDHash, :distance]].zip(hashes) do |(m, dm), hs|
184
- bm.report m do
231
+ hashes[-1, 1] = hashes[-2, 2] # for `distance` and `distance3` we use the same hashes
232
+ puts "\nmeasure the distance (1000 times):"
233
+ Benchmark.bm 28 do |bm|
234
+ [
235
+ [Dhash, :hamming],
236
+ [DHashVips::DHash, :hamming],
237
+ [DHashVips::IDHash, :distance],
238
+ [DHashVips::IDHash, :distance3],
239
+ [DHashVips::IDHash, :distance, 4],
240
+ ].zip(hashes) do |(m, dm, power), hs|
241
+ bm.report "#{m} #{dm} #{power}" do
185
242
  hs.product hs do |h1, h2|
186
- 1000.times{ m.send dm, h1, h2 }
243
+ 1000.times{ m.public_send dm, h1, h2 }
187
244
  end
188
245
  end
189
246
  end
@@ -1,6 +1,6 @@
1
1
  Gem::Specification.new do |spec|
2
2
  spec.name = "dhash-vips"
3
- spec.version = (require_relative "lib/dhash-vips/version"; DHashVips::VERSION)
3
+ spec.version = "0.0.5.0"
4
4
  spec.author = "Victor Maslov"
5
5
  spec.email = "nakilon@gmail.com"
6
6
  spec.summary = "dHash and IDHash powered by Vips"
@@ -16,4 +16,5 @@ Gem::Specification.new do |spec|
16
16
  spec.add_development_dependency "rake"
17
17
  spec.add_development_dependency "rspec-core"
18
18
  spec.add_development_dependency "dhash"
19
+ spec.add_development_dependency "get_process_mem"
19
20
  end
@@ -1,4 +1,3 @@
1
- require_relative "dhash-vips/version"
2
1
  require "vips"
3
2
 
4
3
  module DHashVips
@@ -30,9 +29,19 @@ module DHashVips
30
29
  module IDHash
31
30
  extend self
32
31
 
32
+ def distance3 a, b
33
+ return ((a ^ b) & (a | b) >> 128).to_s(2).count "1"
34
+ end
33
35
  def distance a, b
34
- # TODO: the hash_size is hardcoded here
35
- ((a | b) & (a >> 128 ^ b >> 128)).to_s(2).count "1"
36
+ size_a, size_b = [a, b].map do |x|
37
+ case x.size
38
+ when 32 ; 8
39
+ when 128, 124, 120 ; 16
40
+ else ; fail "invalid size of fingerprint; #{x.size}"
41
+ end
42
+ end
43
+ fail "fingerprints were taken with different `power` param: #{size_a} and #{size_b}" if size_a != size_b
44
+ ((a ^ b) & (a | b) >> 2 * size_a * size_a).to_s(2).count "1"
36
45
  end
37
46
 
38
47
  @@median = lambda do |array|
@@ -55,10 +64,10 @@ module DHashVips
55
64
  fail unless 1 == @@median[[1, 1, 1]]
56
65
  fail unless 1 == @@median[[1, 1]]
57
66
 
58
- def calculate file
59
- hash_size = 8
67
+ def fingerprint file, power = 3
68
+ size = 2 ** power
60
69
  image = Vips::Image.new_from_file file
61
- image = image.resize(hash_size.fdiv(image.width), vscale: hash_size.fdiv(image.height)).colourspace("b-w")
70
+ image = image.resize(size.fdiv(image.width), vscale: size.fdiv(image.height)).colourspace("b-w")
62
71
 
63
72
  array = image.to_a.map &:flatten
64
73
  d1, i1, d2, i2 = [array, array.transpose].flat_map do |a|
@@ -69,7 +78,7 @@ module DHashVips
69
78
  d.map{ |c| c.abs >= m ? 1 : 0 }.join.to_i(2),
70
79
  ]
71
80
  end
72
- (((((d1 << hash_size * hash_size) + d2) << hash_size * hash_size) + i1) << hash_size * hash_size) + i2
81
+ (((((i1 << size * size) + i2) << size * size) + d1) << size * size) + d2
73
82
  end
74
83
 
75
84
  end
@@ -2,8 +2,10 @@ require "dhash-vips"
2
2
 
3
3
  require "pp"
4
4
 
5
+ # TODO tests about `fingerprint(4)`
6
+
5
7
  [
6
- [DHashVips::DHash, :hamming, 17, 18, 22, 39],
8
+ [DHashVips::DHash, :hamming, :calculate, 17, 18, 22, 39],
7
9
  # [[0, 17, 29, 27, 22, 29, 30, 29],
8
10
  # [17, 0, 30, 26, 33, 36, 37, 36],
9
11
  # [29, 30, 0, 18, 39, 30, 39, 36],
@@ -12,7 +14,7 @@ require "pp"
12
14
  # [29, 36, 30, 30, 17, 0, 33, 30],
13
15
  # [30, 37, 39, 35, 28, 33, 0, 5],
14
16
  # [29, 36, 36, 34, 23, 30, 5, 0]]
15
- [DHashVips::IDHash, :distance, 15, 23, 28, 64],
17
+ [DHashVips::IDHash, :distance, :fingerprint, 15, 23, 28, 64],
16
18
  # [[0, 16, 30, 32, 46, 58, 43, 43],
17
19
  # [16, 0, 28, 28, 47, 59, 46, 47],
18
20
  # [30, 28, 0, 15, 53, 49, 53, 52],
@@ -21,7 +23,7 @@ require "pp"
21
23
  # [58, 59, 49, 53, 23, 0, 44, 44],
22
24
  # [43, 46, 53, 61, 43, 44, 0, 0],
23
25
  # [43, 47, 52, 64, 45, 44, 0, 0]]
24
- ].each do |lib, dm, min_similar, max_similar, min_not_similar, max_not_similar|
26
+ ].each do |lib, dm, calc, min_similar, max_similar, min_not_similar, max_not_similar|
25
27
 
26
28
  describe lib do
27
29
 
@@ -39,10 +41,13 @@ describe lib do
39
41
  df0a3b93e9412536ee8a11255f974141.jpg
40
42
  679634ff89a31279a39f03e278bc9a01.jpg
41
43
  } # these images are consecutive pairs of slightly (but enough for nice asserts) similar images
44
+ # 6d97739b4a08f965dc9239dd24382e96.jpg
45
+ # 1b1d4bde376084011d027bba1c047a4b.jpg
46
+ # while these two tend to be a false positive match for idhash
42
47
  bw1, bw2 = %w{
43
48
  71662d4d4029a3b41d47d5baf681ab9a.jpg
44
49
  ad8a37f872956666c3077a3e9e737984.jpg
45
- } # these is the same photo but of different size and colorspace
50
+ } # these are the same photo but of different size and colorspace
46
51
 
47
52
  example.metadata[:extra_failure_lines] = []
48
53
  FileUtils.mkdir_p dir = "images"
@@ -59,12 +64,12 @@ describe lib do
59
64
  end
60
65
  end
61
66
 
62
- hashes = [*images, bw1, bw2].map &described_class.method(:calculate)
67
+ hashes = [*images, bw1, bw2].map &described_class.method(calc)
63
68
  table = MLL::table[described_class.method(dm), [hashes], [hashes]]
64
69
 
65
70
  # require "pp"
66
71
  # pp table
67
- # abort
72
+ # next
68
73
 
69
74
  aggregate_failures do
70
75
  hashes.size.times.to_a.repeated_combination(2) do |i, j|
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: dhash-vips
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.0.4.1
4
+ version: 0.0.5.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Victor Maslov
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2017-10-21 00:00:00.000000000 Z
11
+ date: 2018-02-11 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: ruby-vips
@@ -66,6 +66,20 @@ dependencies:
66
66
  - - '>='
67
67
  - !ruby/object:Gem::Version
68
68
  version: '0'
69
+ - !ruby/object:Gem::Dependency
70
+ name: get_process_mem
71
+ requirement: !ruby/object:Gem::Requirement
72
+ requirements:
73
+ - - '>='
74
+ - !ruby/object:Gem::Version
75
+ version: '0'
76
+ type: :development
77
+ prerelease: false
78
+ version_requirements: !ruby/object:Gem::Requirement
79
+ requirements:
80
+ - - '>='
81
+ - !ruby/object:Gem::Version
82
+ version: '0'
69
83
  description:
70
84
  email: nakilon@gmail.com
71
85
  executables: []
@@ -79,7 +93,6 @@ files:
79
93
  - Rakefile
80
94
  - dhash-vips.gemspec
81
95
  - lib/dhash-vips.rb
82
- - lib/dhash-vips/version.rb
83
96
  - spec/_spec.rb
84
97
  homepage: https://github.com/nakilon/dhash-vips
85
98
  licenses:
@@ -1,3 +0,0 @@
1
- module DHashVips
2
- VERSION = "0.0.4.1"
3
- end