dhash-vips 0.0.6.0 → 0.1.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 276ae83cf5882e8d61e4c45ade9e14a7e240b65a
4
- data.tar.gz: a02324864794aeaf24281e266e218c3ec865f3be
3
+ metadata.gz: b9dd21b28b731fe1b65ba352b65d3bd8b3a30e78
4
+ data.tar.gz: 51d1ba2da83dc157ed92ff3659a2538cf13bae32
5
5
  SHA512:
6
- metadata.gz: b308138a0c22a1c6d1b873573097cb6f8c918fcad538dc2b4d8a6d995232198f0a11562684bd51c1e1916b590299b93a4c63103eec82205aad6a0f66f1e6338f
7
- data.tar.gz: ba547c330d1b1a43d89149507748ff70a076ac8101e8da750bf7d65a140ec2bb48fa5a5e5166cdf780666c42b0306f800e364f7e753cdb45cdec5aa839995688
6
+ metadata.gz: 4444959377a750dad3da8fd98547d258a4cd2be46488c1a312a859f19eca6261810be4c3ef023d5c8f1681d8262dec12c9a0d6ef5ee0818e6b1bdbe538bfbd19
7
+ data.tar.gz: 62ab37e14c953b3382f3a66c1b661d6c8ff3c20fa8aac2b86e4bdd1d82d5ef601c5f9c5c5485eb76bd90dd4e9049355fffc4f3f62604eb710391881e064fdf1e
data/Gemfile CHANGED
@@ -1,3 +1,5 @@
1
1
  source 'https://rubygems.org'
2
2
 
3
+ gem "ruby-vips", github: "libvips/ruby-vips"
4
+
3
5
  gemspec
@@ -1,6 +1,6 @@
1
1
  The MIT License (MIT)
2
2
 
3
- Copyright (c) 2017 Nakilon
3
+ Copyright (c) 2017, Victor Maslov (nakilon@gmail.com)
4
4
 
5
5
  Permission is hereby granted, free of charge, to any person obtaining a copy
6
6
  of this software and associated documentation files (the "Software"), to deal
data/README.md CHANGED
@@ -2,48 +2,32 @@
2
2
 
3
3
  # dHash and IDHash gem powered by ruby-vips
4
4
 
5
- The "dHash" is an algorithm of fingerprinting that can be used to measure the similarity of two images.
5
+ **dHash** is an image fingerprinting algorithm that can be used to measure the similarity of two images.
6
+ **IDHash** is a new algorithm that has some improvements over dHash -- it is described further below.
6
7
 
7
- You may read about it in "Kind of Like That" blog post (21 January 2013): http://www.hackerfactor.com/blog/index.php?/archives/529-Kind-of-Like-That.html
8
- The original idea is that you split the image into 64 segments and so there are 64 bits -- each tells if the one segment is brighter or darker than the neighbor one. Then the [Hamming distance](https://en.wikipedia.org/wiki/Hamming_distance) between fingerprints is the opposite of images similarity.
8
+ You can read about dHash and perceptual hashing in the article ["Kind of Like That" at "The Hacker Factor Blog"](http://www.hackerfactor.com/blog/index.php?/archives/529-Kind-of-Like-That.html) (21 January 2013). The idea is that you resize the original image to 8x9 and then convert it to an 8x8 array of bits -- each bit tells whether the corresponding segment of the image is brighter or darker than the one on the right (or left). Then you apply the [Hamming distance](https://en.wikipedia.org/wiki/Hamming_distance) to such arrays to measure how different they are.
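As a rough illustration of the description above, here is a minimal pure-Ruby sketch of the dHash idea. It works on an already resized grayscale grid given as nested arrays, so no image library is involved. The names `dhash_bits` and `hamming` are hypothetical and are not part of the gem's API.

```ruby
# Hypothetical helper: turns a grid of grayscale values into dHash-style bits.
# Each row must have one more column than the number of bits wanted per row
# (9 values per row give 8 bits per row for the classic dHash).
def dhash_bits(grid)
  grid.flat_map do |row|
    # 1 if a segment is brighter than its right-hand neighbor, else 0
    row.each_cons(2).map { |left, right| left > right ? 1 : 0 }
  end
end

# Hamming distance: the number of positions where two bit arrays differ.
def hamming(a, b)
  a.zip(b).count { |x, y| x != y }
end

tiny_a = [[1, 2, 3], [3, 2, 1]]
tiny_b = [[1, 3, 2], [3, 2, 1]]
puts hamming(dhash_bits(tiny_a), dhash_bits(tiny_b))
```

A real fingerprint would pack the bits into an Integer rather than keep an array, which is what makes the XOR-based distance cheap.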
9
9
 
10
- There were several implementations on Github already but they all depend on ImageMagick. My implementation takes an advantage of libvips (the `ruby-vips` gem) -- it also uses the `.conv` method and in result converts image to an array of grayscale bytes almost 10 times faster:
11
- ```
12
- load and calculate the fingerprint:
13
- user system total real
14
- Dhash 13.110000 0.950000 14.060000 ( 14.537057)
15
- DHashVips::DHash 1.480000 0.310000 1.790000 ( 1.808787)
16
- DHashVips::IDHash 1.080000 0.100000 1.180000 ( 1.156446)
17
-
18
- measure the distance (1000 times):
19
- user system total real
20
- Dhash hamming 1.770000 0.010000 1.780000 ( 1.815612)
21
- DHashVips::DHash 1.810000 0.010000 1.820000 ( 1.875666)
22
- DHashVips::IDHash 3.430000 0.020000 3.450000 ( 3.499031)
23
- ```
10
+ There were several Ruby implementations on GitHub already but they all depended on ImageMagick. My implementation takes advantage of the speed of libvips (via the `ruby-vips` gem) -- it fingerprints images much faster:
24
11
 
25
- Here the `Dhash` is [another gem](https://github.com/maccman/dhash) that I used earlier in my projects.
26
- The `DHashVips::DHash` is a port of it that uses vips. I would like to tell you that you can replace the `dhash` with `dhash-vips` gem right now but it appeared to have a barely noticeable issue. There is a lot of magic behind the libvips speed and resizing -- you may not notice it with unarmed eyes but when two neighbor segments are enough similar by luminosity the difference can change the sign. So I found two identical images that were just of different colorspace and size (photo by Jordan Voth):
27
- ![](https://storage.googleapis.com/dhash-vips.nakilon.pro/dhash_issue_example.png)
28
- but the distance between their hashes appeared to be equal to 5 while `dhash` gem reported 0.
12
+ load the image and calculate the fingerprint:
13
+ user system total real
14
+ Dhash 6.191731 0.230885 6.422616 ( 6.428763)
15
+ DHashVips::DHash 0.858045 0.144820 1.002865 ( 0.924308)
29
16
 
30
- This is why `DHashVips::IDHash` appeared.
17
+ `Dhash` here is [another gem](https://github.com/maccman/dhash) that I used earlier in my projects before I decided to make this one.
18
+ Unfortunately both gems made slightly different fingerprints for two image files that are supposed to have the same fingerprint because from the human point of view they are the same (photo by Jordan Voth):
19
+ ![](https://storage.googleapis.com/dhash-vips.nakilon.pro/dhash_issue_example.png)
20
+ The distance here appeared to be equal to 5. This is why I've decided to improve the algorithm and this is how the "IDHash" appeared.
31
21
 
32
22
  ## IDHash (the Important Difference Hash)
33
23
 
34
- It has improvements over the dHash that made fingerprinting less sensitive to the resizing algorithm and effectively made the pair of images mentioned above to have a distance of 0 again. Three improvements are:
24
+ The improvements over dHash make it insensitive to the resizing algorithm and color scheme, and effectively give the pair of images above a distance of 0:
25
+
35
26
  * The "Importance" is an array of extra 64 bits that tells the comparing function which half of 64 bits is important (when the difference between neighbors was enough significant) and which is not. So not every bit in a fingerprint is being compared but only half of them.
36
27
  * It subtracts not only horizontally but also vertically -- that adds 128 more bits.
37
- * Instead of resizing to 9x8 it resizes to 8x8 and puts the image on a torus so it subtracts the left column from the right one and the top from bottom.
28
+ * Instead of resizing to 8x9 it resizes to 8x8 and puts the image on a torus so it subtracts the very left column from the very right one and the top from the bottom.
38
29
 
39
- You could see in fingerprint calculation benchmark earlier that these improvements didn't make it slower than dHash because most of the time is spent on image resizing (at some point it actually even became faster, idk why). The calculation of distance is what became two times slower:
40
- ```ruby
41
- ((a | b) & ((a ^ b) >> 128)).to_s(2).count "1"
42
- ```
43
- vs
44
- ```ruby
45
- (a ^ b).to_s(2).count "1"
46
- ```
30
+ You could see in fingerprint calculation benchmark earlier that these improvements didn't make it slower than dHash because most of the time is spent on image resizing. Distance measurement is what became slower.
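The torus idea from the list above can be sketched in plain Ruby as follows. This is only an illustration on a square nested array of grayscale values. `torus_differences` is a hypothetical name, and the direction of subtraction is an assumption rather than something taken from the gem's source.

```ruby
# Hypothetical helper: neighbor differences on a square grid treated as a torus,
# so the last column wraps around to the first and the last row to the first.
def torus_differences(grid)
  n = grid.size
  # difference between each cell and its right-hand neighbor (with wraparound)
  horizontal = grid.flat_map do |row|
    n.times.map { |x| row[x] - row[(x + 1) % n] }
  end
  # difference between each cell and the cell below it (with wraparound)
  vertical = n.times.flat_map do |y|
    n.times.map { |x| grid[y][x] - grid[(y + 1) % n][x] }
  end
  [horizontal, vertical]
end

horizontal, vertical = torus_differences([[1, 2], [3, 5]])
p horizontal
p vertical
```

For an 8x8 grid this yields 64 horizontal and 64 vertical differences; their signs would become the fingerprint bits, and their magnitudes would decide which bits count as important.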
47
31
 
48
32
  ### Example
49
33
 
@@ -75,7 +59,7 @@ Then:
75
59
 
76
60
  gem install dhash-vips
77
61
 
78
- If you have troubles with the `gem ruby-vips` dependency, see https://github.com/jcupitt/ruby-vips
62
+ If you have troubles with the `gem ruby-vips` dependency, see https://github.com/libvips/ruby-vips
79
63
 
80
64
  ## Usage
81
65
 
@@ -115,93 +99,112 @@ else
115
99
  end
116
100
  ```
117
101
 
118
- These `15` and `25` numbers are found empirically and just work enough well for 8-byte hashes.
119
- To find out these tresholds we can run a rake task with hardcoded test cases:
120
- ```
121
- $ rake compare_matrices
122
-
123
- Dhash
124
- Absolutely the same image: 0..0
125
- Complex B/W and the same but colorful: 0
126
- Similar images: 13..16
127
- Different images: 9..41
128
-
129
- DHashVips::DHash
130
- Absolutely the same image: 0..0
131
- Complex B/W and the same but colorful: 5
132
- Similar images: 17..18
133
- Different images: 14..39
134
-
135
- DHashVips::IDHash
136
- Absolutely the same image: 0..0
137
- Complex B/W and the same but colorful: 0
138
- Similar images: 15..23
139
- Different images: 19..64
140
-
141
- DHashVips::IDHash 4
142
- Absolutely the same image: 0..0
143
- Complex B/W and the same but colorful: 0
144
- Similar images: 71..108
145
- Different images: 102..211
102
+ ### Notes and benchmarks
146
103
 
147
- ```
104
+ * The above `15` and `25` constants were found empirically and just work well enough for 8-byte hashes. To find these thresholds we can run a rake task with hardcoded test cases (pairs of photos from the same photosession are not the same but are considered to be similar enough for the purpose of this benchmark):
148
105
 
149
- ### Notes
106
+ $ rake compare_quality
107
+
108
+ Dhash Phamilie DHashVips::DHash DHashVips::IDHash DHashVips::IDHash(4)
109
+ The same image: 0..0 0..0 0..0 0..0 0..0
110
+ 'Jordan Voth case': 4 2 4 0 0
111
+ Similar images: 1..17 14..34 2..23 6..22 53..166
112
+ Different images: 9..57 22..42 9..50 18..65 120..233
113
+ 1/FMI^2 = 1.25 4.0 1.556 1.25 1.306
114
+ FP, FN = [2, 0] [0, 6] [1, 2] [2, 0] [1, 1]
115
+
116
+ The `1/FMI^2` line here is the "quality of algorithm", i.e. the best achievable value of a function of the ["Fowlkes–Mallows index"](https://en.wikipedia.org/wiki/Fowlkes%E2%80%93Mallows_index) if you take the "similar" and "different" test pairs and try to draw the threshold line between them. A smaller number is better. Here I've added the [`phamilie` gem](https://github.com/toy/phamilie) that is DCT based (not a kind of dhash). The last line shows the number of false positives (`FP`) and false negatives (`FN`) for the best achieved FMI.
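To make the `1/FMI^2` row easier to check, here is a small sketch of the quantity, following the expression used in the `compare_quality` rake task. The method name `inverse_fmi_squared` is hypothetical, and the count of 8 similar pairs is an assumption read off the Rakefile's pair selection rule.

```ruby
# FMI = sqrt(TP/(TP+FP) * TP/(TP+FN)), therefore
# 1/FMI^2 = (TP+FP) * (TP+FN) / TP^2
def inverse_fmi_squared(tp, fp, fn)
  ((tp + fp) * (tp + fn)).fdiv(tp * tp)
end

similar_pairs = 8 # assumption: number of "similar" test pairs

# [FP, FN] values copied from the table above
{
  "Dhash"                => [2, 0],
  "Phamilie"             => [0, 6],
  "DHashVips::DHash"     => [1, 2],
  "DHashVips::IDHash"    => [2, 0],
  "DHashVips::IDHash(4)" => [1, 1],
}.each do |name, (fp, fn)|
  tp = similar_pairs - fn # every similar pair is either a true positive or a false negative
  puts "#{name}: #{inverse_fmi_squared(tp, fp, fn).round(3)}"
end
```

Under that assumption the printed values should line up with the `1/FMI^2` row of the table.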
150
117
 
151
118
  * Methods were renamed from `#calculate` to `#fingerprint` and from `#hamming` to `#distance`.
152
- * The `DHash#calculate` accepts `hash_size` optional parameter that is 8 by default. The `IDHash#fingerprint`'s optional parameter is called `power` and works in a bit different way: 3 means 8 and 4 means 16 -- other sizes are not supported because they don't seem to be useful (higher fingerprint resolution makes it vulnerable to image shifts and croppings, also `#distance` becomes much slower). Because IDHash's fingerprint is more complex than DHash's one it's not that straight forward to compare them so under the hood the `#distance` methods have to check the size of fingerprint -- this trade-off costs 30-40% of speed that can be eliminated by using `#distance3` method that assumes fingerprint to be of power=3. So the full benchmark is this one:
119
+ * The `DHash#calculate` accepts an optional `hash_size` parameter that is 8 by default. The `IDHash#fingerprint`'s optional parameter is called `power` and works in a slightly different way: 3 means 8 and 4 means 16 -- other sizes are not supported because they don't seem to be useful (higher fingerprint resolution makes it vulnerable to image shifts and croppings, also `#distance` becomes much slower). Because IDHash's fingerprint is more complex than DHash's one it's not that straightforward to compare them so under the hood the `#distance` method has to check the size of the fingerprint. If you are sure that fingerprints were made with power=3 then to skip the check you may use the `#distance3` method directly.
120
+ * The `#distance3` method will use Ruby C extension that is around 15 times faster than pure Ruby implementation -- native extension is currently hardcoded to be compiled only if it's macOS and rbenv Ruby 2.3.8 installed with `-k` flag but if you know how to make the gem gracefully fallback to native Ruby if `make` fails let me know or make a pull request. So the full benchmark:
153
121
 
154
- ```
155
- # Ruby 2.0.0
156
-
157
- load and calculate the fingerprint:
158
- user system total real
159
- Dhash 12.400000 0.820000 13.220000 ( 13.329952)
160
- DHashVips::DHash 1.330000 0.230000 1.560000 ( 1.509826)
161
- DHashVips::IDHash 1.060000 0.090000 1.150000 ( 1.100332)
162
- DHashVips::IDHash 4 1.030000 0.080000 1.110000 ( 1.089148)
163
-
164
- measure the distance (1000 times):
165
- user system total real
166
- Dhash hamming 3.140000 0.020000 3.160000 ( 3.179392)
167
- DHashVips::DHash hamming 3.040000 0.020000 3.060000 ( 3.095190)
168
- DHashVips::IDHash distance 8.170000 0.040000 8.210000 ( 8.279950)
169
- DHashVips::IDHash distance3 6.720000 0.030000 6.750000 ( 6.790900)
170
- DHashVips::IDHash distance 4 24.430000 0.130000 24.560000 ( 24.652625)
171
- ```
172
- (macOS system MRI 2.3 has some nice bit arithmetics improvement compared to 2.0)
173
- ```
174
- # Ruby 2.3.3
175
-
176
- load and calculate the fingerprint:
177
- user system total real
178
- Dhash 13.110000 0.950000 14.060000 ( 14.537057)
179
- DHashVips::DHash 1.480000 0.310000 1.790000 ( 1.808787)
180
- DHashVips::IDHash 1.080000 0.100000 1.180000 ( 1.156446)
181
- DHashVips::IDHash 4 1.030000 0.090000 1.120000 ( 1.076117)
182
-
183
- measure the distance (1000 times):
184
- user system total real
185
- Dhash hamming 1.770000 0.010000 1.780000 ( 1.815612)
186
- DHashVips::DHash hamming 1.810000 0.010000 1.820000 ( 1.875666)
187
- DHashVips::IDHash distance 4.250000 0.020000 4.270000 ( 4.350071)
188
- DHashVips::IDHash distance3 3.430000 0.020000 3.450000 ( 3.499031)
189
- DHashVips::IDHash distance 4 8.210000 0.110000 8.320000 ( 8.510735)
190
- ```
122
+ * Ruby 2.0.0
191
123
 
192
- Also note that to make `#distance` able to assume the fingerprint resolution from the size of Integer that represents it, the change in its structure was needed (left half of bits was swapped with right one), so fingerprints between versions 0.0.4 and 0.0.5 became incompatible, but you probably can convert them manually. I know, incompatibilities suck but if we put the version or structure information inside fingerprint it will became slow to (de)serialize and store.
124
+ $ bundle exec rake compare_speed
193
125
 
194
- ## Troubleshooting
126
+ load the image and calculate the fingerprint:
127
+ user system total real
128
+ Dhash 12.400000 0.820000 13.220000 ( 13.329952)
129
+ DHashVips::DHash 1.330000 0.230000 1.560000 ( 1.509826)
130
+ DHashVips::IDHash 1.060000 0.090000 1.150000 ( 1.100332)
131
+ DHashVips::IDHash 4 1.030000 0.080000 1.110000 ( 1.089148)
195
132
 
196
- El Captain and rbenv may cause environment issues that would make you do things like:
197
- ```
198
- ./ruby `rbenv which rake` compare_matrixes
199
- ```
200
- instead of just
201
- ```
202
- rake compare_matrixes
203
- ```
204
- For more information on that: https://github.com/jcupitt/ruby-vips/issues/141
133
+ measure the distance (1000 times):
134
+ user system total real
135
+ Dhash hamming 3.140000 0.020000 3.160000 ( 3.179392)
136
+ DHashVips::DHash hamming 3.040000 0.020000 3.060000 ( 3.095190)
137
+ DHashVips::IDHash distance 8.170000 0.040000 8.210000 ( 8.279950)
138
+ DHashVips::IDHash distance3 6.720000 0.030000 6.750000 ( 6.790900)
139
+ DHashVips::IDHash distance 4 24.430000 0.130000 24.560000 ( 24.652625)
140
+
141
+ * Ruby 2.3.3 seems to have some bit arithmetics improvement compared to 2.0:
142
+
143
+ load the image and calculate the fingerprint:
144
+ user system total real
145
+ Dhash 13.110000 0.950000 14.060000 ( 14.537057)
146
+ DHashVips::DHash 1.480000 0.310000 1.790000 ( 1.808787)
147
+ DHashVips::IDHash 1.080000 0.100000 1.180000 ( 1.156446)
148
+ DHashVips::IDHash 4 1.030000 0.090000 1.120000 ( 1.076117)
149
+
150
+ measure the distance (1000 times):
151
+ user system total real
152
+ Dhash hamming 1.770000 0.010000 1.780000 ( 1.815612)
153
+ DHashVips::DHash hamming 1.810000 0.010000 1.820000 ( 1.875666)
154
+ DHashVips::IDHash distance 4.250000 0.020000 4.270000 ( 4.350071)
155
+ DHashVips::IDHash distance3 3.430000 0.020000 3.450000 ( 3.499031)
156
+ DHashVips::IDHash distance 4 8.210000 0.110000 8.320000 ( 8.510735)
157
+
158
+ * Ruby 2.3.8p459 (2.4.6, 2.5.5 and 2.6.3 are all similar) with newer CPU (`sysctl -n machdep.cpu.brand_string #=> Intel(R) Core(TM) i5-7360U CPU @ 2.30GHz`):
159
+
160
+ load the image and calculate the fingerprint:
161
+ user system total real
162
+ Dhash 6.191731 0.230885 6.422616 ( 6.428763)
163
+ Phamilie 5.361751 0.037524 5.399275 ( 5.402553)
164
+ DHashVips::DHash 0.858045 0.144820 1.002865 ( 0.924308)
165
+ DHashVips::IDHash 0.769975 0.071087 0.841062 ( 0.790470)
166
+ DHashVips::IDHash 4 0.805311 0.077918 0.883229 ( 0.825897)
167
+
168
+ measure the distance (2000 times):
169
+ user system total real
170
+ Dhash hamming 1.810000 0.000000 1.810000 ( 1.824719)
171
+ Phamilie distance 1.000000 0.010000 1.010000 ( 1.006127)
172
+ DHashVips::DHash hamming 1.810000 0.000000 1.810000 ( 1.817415)
173
+ DHashVips::IDHash distance 1.400000 0.000000 1.400000 ( 1.401333)
174
+ DHashVips::IDHash distance3_ruby 3.320000 0.010000 3.330000 ( 3.337920)
175
+ DHashVips::IDHash distance3_c 0.210000 0.000000 0.210000 ( 0.212864)
176
+ DHashVips::IDHash distance 4 8.300000 0.120000 8.420000 ( 8.499735)
177
+
178
+ * Also note that to make `#distance` able to assume the fingerprint resolution from the size of the Integer that represents it, a change in its structure was needed (the left half of bits was swapped with the right one), so fingerprints between versions 0.0.4 and 0.0.5 became incompatible, but you probably can convert them manually. Otherwise, if we put the version or structure information inside the fingerprint, it would become slow to (de)serialize and store.
179
+
180
+ ## Development
181
+
182
+ * OS X El Capitan and rbenv may cause environment issues that would make you do things like:
183
+
184
+ $ ./ruby `rbenv which rake` compare_matrixes
185
+
186
+ instead of just
187
+
188
+ $ rake compare_matrixes
189
+
190
+ For more information on that: https://github.com/jcupitt/ruby-vips/issues/141
191
+
192
+ * On macOS, when you do `bundle install` it may fail to install `rmagick` gem (`dhash` gem dependency) saying:
193
+
194
+ ERROR: Can't install RMagick 4.0.0. Can't find magick/MagickCore.h.
195
+
196
+ To resolve this do:
197
+
198
+ $ brew install imagemagick@6
199
+ $ LDFLAGS="-L/usr/local/opt/imagemagick@6/lib" CPPFLAGS="-I/usr/local/opt/imagemagick@6/include" bundle install
200
+
201
+ * If you get `No package 'MagickCore' found` try:
202
+
203
+ $ PKG_CONFIG_PATH="/usr/local/Cellar/imagemagick@6/6.9.10-74/lib/pkgconfig" bundle install
204
+
205
+ * You might need to prepend `bundle exec` to all the `rake` commands.
206
+
207
+ * Execute the `rake compare_quality` at least once before executing other rake tasks because it's currently the only one that downloads the test images.
205
208
 
206
209
  ## Credits
207
210
 
data/Rakefile CHANGED
@@ -1,15 +1,7 @@
1
1
  STDOUT.sync = true
2
+ require "pp"
2
3
 
3
- require "bundler/gem_tasks"
4
-
5
-
6
- task :default => %w{ spec }
7
-
8
- require "rspec/core/rake_task"
9
- RSpec::Core::RakeTask.new :spec do |t|
10
- t.verbose = false
11
- end
12
-
4
+ require "bundler/gem_tasks" # to push to rubygems
13
5
 
14
6
  visualize_hash = lambda do |hash|
15
7
  puts hash.to_s(2).rjust(64, ?0).gsub(/(?<=.)/, '\0 ').scan(/.{16}/)
@@ -50,58 +42,80 @@ task :compare_kernels do |_|
50
42
  end
51
43
  end
52
44
 
53
- # ./ruby `rbenv which rake` compare_matrixes
54
- desc "Compare the quality of Dhash, DHashVips::DHash and DHashVips::IDHash -- run it only after `rake test`"
55
- task :compare_matrices do |_|
45
+ desc "Compare the quality of Dhash, Phamilie, DHashVips::DHash, DHashVips::IDHash"
46
+ # in this test we want to know not that photos are the same but rather that they are from the same photosession
47
+ task :compare_quality do
56
48
  require "dhash"
49
+ require "phamilie"
50
+ phamilie = Phamilie.new
57
51
  require_relative "lib/dhash-vips"
58
52
  require "mll"
59
- [
60
- [Dhash, :calculate, :hamming],
61
- [DHashVips::DHash, :calculate, :hamming],
62
- [DHashVips::IDHash, :fingerprint, :distance],
63
- [DHashVips::IDHash, :fingerprint, :distance, 4],
64
- ].each do |m, calc, dm, power|
65
- puts "\n#{m} #{power}"
66
- hashes = %w{
67
- 71662d4d4029a3b41d47d5baf681ab9a.jpg
68
- ad8a37f872956666c3077a3e9e737984.jpg
69
-
70
- 6d97739b4a08f965dc9239dd24382e96.jpg
71
- 1b1d4bde376084011d027bba1c047a4b.jpg
72
-
73
- 1d468d064d2e26b5b5de9a0241ef2d4b.jpg
74
- 92d90b8977f813af803c78107e7f698e.jpg
75
-
76
- 309666c7b45ecbf8f13e85a0bd6b0a4c.jpg
77
- 3f9f3db06db20d1d9f8188cd753f6ef4.jpg
78
- df0a3b93e9412536ee8a11255f974141.jpg
79
- 679634ff89a31279a39f03e278bc9a01.jpg
80
- }.map{ |filename| m.public_send calc, "images/#{filename}", *power }
81
- table = MLL::table[m.method(dm), [hashes], [hashes]]
82
- # require "pp"
83
- # pp table
84
- array = Array.new(5){ [] }
85
- hashes.size.times.to_a.repeated_combination(2) do |i, j|
86
- array[i == j ? 0 : (j - i).abs == 1 && (i + j - 1) % 4 == 0 ? [i, j] == [0, 1] ? 1 : [i, j] == [2, 3] ? 2 : 3 : 4].push table[i][j]
87
- end
88
- # p array.map &:sort
89
- puts "Absolutely the same image: #{array[0].minmax.join ".."}"
90
- puts "Complex B/W and the same but colorful: #{array[1][0]}"
91
- puts "Similar images: #{array[3].minmax.join ".."}"
92
- puts "Different images: #{[*array[2], *array[4]].minmax.join ".."}"
93
- end
53
+
54
+ puts MLL::grid.call( [
55
+ ["", "The same image:", "'Jordan Voth case':", "Similar images:", "Different images:", "1/FMI^2 =", "FP, FN ="],
56
+ *[
57
+ [Dhash, :calculate, :hamming],
58
+ [phamilie, :fingerprint, :distance, nil, 0],
59
+ [DHashVips::DHash, :calculate, :hamming],
60
+ [DHashVips::IDHash, :fingerprint, :distance],
61
+ [DHashVips::IDHash, :fingerprint, :distance, 4],
62
+ ].map do |m, calc, dm, power, ii|
63
+ require_relative "common"
64
+ hashes = %w{
65
+ 71662d4d4029a3b41d47d5baf681ab9a.jpg ad8a37f872956666c3077a3e9e737984.jpg
66
+
67
+ 1b1d4bde376084011d027bba1c047a4b.jpg 6d97739b4a08f965dc9239dd24382e96.jpg
68
+
69
+ 1d468d064d2e26b5b5de9a0241ef2d4b.jpg 92d90b8977f813af803c78107e7f698e.jpg
70
+ 309666c7b45ecbf8f13e85a0bd6b0a4c.jpg 3f9f3db06db20d1d9f8188cd753f6ef4.jpg
71
+ 679634ff89a31279a39f03e278bc9a01.jpg df0a3b93e9412536ee8a11255f974141.jpg
72
+ 54192a3f65bd03163b04849e1577a40b.jpg 6d32f57459e5b79b5deca2a361eb8c6e.jpg
73
+ 4b62e0eef58bfbc8d0d2fbf2b9d05483.jpg b8eb0ca91855b657f12fb3d627d45c53.jpg
74
+ 21cd9a6986d98976b6b4655e1de7baf4.jpg 9b158c0d4953d47171a22ed84917f812.jpg
75
+ 9c2c240ec02356472fb532f404d28dde.jpg fc762fa286489d8afc80adc8cdcb125e.jpg
76
+ 7a833d873f8d49f12882e86af1cc6b79.jpg ac033cf01a3941dd1baa876082938bc9.jpg
77
+ }.map(&method(:download_and_keep)).map{ |filename| [filename, m.public_send(calc, filename, *power)] }
78
+ table = MLL::table[m.method(dm), [hashes.map{|_|_[ii||1]}], [hashes.map{|_|_[ii||1]}]]
79
+ report = Struct.new(:same, :bw, :sim, :not_sim).new [], [], [], []
80
+ hashes.size.times.to_a.repeated_combination(2) do |i, j|
81
+ report[
82
+ case
83
+ when i == j ; :same
84
+ when [i, j] == [0, 1] ; :bw
85
+ when i > 3 && i + 1 == j && i % 2 == 0 ; :sim
86
+ else ; :not_sim
87
+ end
88
+ ].push table[i][j]
89
+ end
90
+ min, max = [*report.sim, *report.not_sim].minmax
91
+ fmi, fp, fn = (min..max+1).map do |b|
92
+ fp = report.not_sim.count{ |_| _ < b }
93
+ tp = report.sim.count{ |_| _ < b }
94
+ fn = report.sim.count{ |_| _ >= b }
95
+ [((tp + fp) * (tp + fn)).fdiv(tp * tp), fp, fn]
96
+ end.reject{ |_,| _.nan? }.min_by(&:first)
97
+ [
98
+ "#{m.is_a?(Module) ? m : m.class}#{"(#{power})" if power}",
99
+ report.same. minmax.join(".."),
100
+ report.bw[0],
101
+ report.sim. minmax.join(".."),
102
+ report.not_sim.minmax.join(".."),
103
+ fmi.round(3),
104
+ [fp, fn]
105
+ ]
106
+ end,
107
+ ].transpose, spacings: [1.5, 0], alignment: :right )
94
108
  end
95
109
 
96
110
  # ruby -c Rakefile && rm -f ab.png && rake compare_images -- fc762fa286489d8afc80adc8cdcb125e.jpg 9c2c240ec02356472fb532f404d28dde.jpg 2>/dev/null && ql ab.png
97
111
  # rm -f ab.png && ./ruby `rbenv which rake` compare_images -- 6d97739b4a08f965dc9239dd24382e96.jpg 1b1d4bde376084011d027bba1c047a4b.jpg 2>/dev/null && ql ab.png
98
112
  desc "Visualizes the IDHash difference measurement between two images"
99
113
  task :compare_images do |_|
100
- abort "there should be two image filenames passed as arguments (and optionally the `power`)" unless (4..5) === ARGV.size
101
- abort "the optional argument should be either 3 or 4" unless [3, 4].include?(power = (ARGV[4] || 3).to_i)
114
+ abort "there should be two image filenames passed as arguments (and optionally the `power`)" unless (3..4) === ARGV.size
115
+ abort "the optional argument should be either 3 or 4" unless [3, 4].include?(power = (ARGV[3] || 3).to_i)
102
116
  task ARGV.last do ; end
103
117
  require_relative "lib/dhash-vips"
104
- ha, hb = ARGV[2, 2].map{ |filename| DHashVips::IDHash.fingerprint(filename, power) }
118
+ ha, hb = ARGV[1, 2].map{ |filename| DHashVips::IDHash.fingerprint(filename, power) }
105
119
  puts "distance: #{DHashVips::IDHash.distance ha, hb}"
106
120
  size = 2 ** power
107
121
  shift = 2 * size * size
@@ -110,7 +124,7 @@ task :compare_images do |_|
110
124
  bi = hb >> shift
111
125
  bd = hb - (bi << shift)
112
126
 
113
- a, b = ARGV[2, 2].map do |filename|
127
+ a, b = ARGV[1, 2].map do |filename|
114
128
  image = Vips::Image.new_from_file filename
115
129
  image = image.resize(size.fdiv(image.width), vscale: size.fdiv(image.height)).colourspace("b-w").
116
130
  resize(100, vscale: 100, kernel: :nearest).colourspace("srgb")
@@ -191,6 +205,8 @@ end
191
205
  desc "Benchmarks Dhash, DHashVips::DHash and DHashVips::IDHash"
192
206
  task :compare_speed do
193
207
  require "dhash"
208
+ require "phamilie"
209
+ phamilie = Phamilie.new
194
210
  require_relative "lib/dhash-vips"
195
211
 
196
212
  filenames = %w{
@@ -214,33 +230,41 @@ task :compare_speed do
214
230
  end
215
231
 
216
232
  require "benchmark"
217
- puts "load and calculate the fingerprint:"
233
+ puts "load the image and calculate the fingerprint:"
218
234
  hashes = []
219
235
  Benchmark.bm 19 do |bm|
220
236
  [
221
237
  [Dhash, :calculate],
238
+ [phamilie, :fingerprint],
222
239
  [DHashVips::DHash, :calculate],
223
240
  [DHashVips::IDHash, :fingerprint],
224
241
  [DHashVips::IDHash, :fingerprint, 4],
225
242
  ].each do |m, calc, power|
226
- bm.report "#{m} #{power}" do
243
+ bm.report "#{m.is_a?(Module) ? m : m.class} #{power}" do
227
244
  hashes.push filenames.map{ |filename| m.send calc, filename, *power }
228
245
  end
229
246
  end
230
247
  end
231
- hashes[-1, 1] = hashes[-2, 2] # for `distance` and `distance3` we use the same hashes
232
- puts "\nmeasure the distance (1000 times):"
233
- Benchmark.bm 29 do |bm|
248
+
249
+ # for `distance`, `distance3_ruby` and `distance3_c` we use the same hashes
250
+ hashes[-1, 1] = hashes[-2, 2]
251
+ hashes[-1, 1] = hashes[-2, 2]
252
+
253
+ puts "\nmeasure the distance (2000 times):"
254
+ Benchmark.bm 32 do |bm|
234
255
  [
235
256
  [Dhash, :hamming],
257
+ [phamilie, :distance, nil, 1],
236
258
  [DHashVips::DHash, :hamming],
237
259
  [DHashVips::IDHash, :distance],
238
- [DHashVips::IDHash, :distance3],
260
+ [DHashVips::IDHash, :distance3_ruby],
261
+ [DHashVips::IDHash, :distance3_c],
239
262
  [DHashVips::IDHash, :distance, 4],
240
- ].zip(hashes) do |(m, dm, power), hs|
241
- bm.report "#{m} #{dm} #{power}" do
242
- hs.product hs do |h1, h2|
243
- 1000.times{ m.public_send dm, h1, h2 }
263
+ ].zip(hashes) do |(m, dm, power, ii), hs|
264
+ bm.report "#{m.is_a?(Module) ? m : m.class} #{dm} #{power}" do
265
+ _ = [hs, filenames][ii || 0]
266
+ _.product _ do |h1, h2|
267
+ 2000.times{ m.public_send dm, h1, h2 }
244
268
  end
245
269
  end
246
270
  end
@@ -0,0 +1,14 @@
1
+ def download_and_keep image
2
+ require "open-uri"
3
+ FileUtils.mkdir_p dir = "images"
4
+ "#{dir}/#{image}".tap do |filename|
5
+ require "digest" if Gem::Version.new(RUBY_VERSION) >= Gem::Version.new("2.5.0")
6
+ unless File.exist?(filename) && Digest::MD5.file(filename) == File.basename(filename, ".jpg")
7
+ open("https://storage.googleapis.com/dhash-vips.nakilon.pro/#{image}") do |link|
8
+ File.open(filename, "wb") do |file|
9
+ IO.copy_stream link, file
10
+ end
11
+ end
12
+ end
13
+ end
14
+ end
@@ -1,20 +1,27 @@
1
1
  Gem::Specification.new do |spec|
2
2
  spec.name = "dhash-vips"
3
- spec.version = "0.0.6.0"
3
+ spec.version = "0.1.0.0"
4
4
  spec.author = "Victor Maslov"
5
5
  spec.email = "nakilon@gmail.com"
6
6
  spec.summary = "dHash and IDHash powered by Vips"
7
7
  spec.homepage = "https://github.com/nakilon/dhash-vips"
8
8
  spec.license = "MIT"
9
9
 
10
- spec.test_files = ["spec"]
11
- spec.files = `git ls-files -z`.split("\x0") - spec.test_files
10
+ spec.test_files = %w{ test.rb }
11
+ spec.files = `git ls-files -z`.split("\x0") - spec.test_files - %w{ .gitignore }
12
12
  spec.require_path = "lib"
13
+ spec.extensions = %w{ extconf.rb }
13
14
 
14
- spec.add_dependency "ruby-vips"
15
+ spec.add_dependency "ruby-vips", "~>2.0.16"
15
16
 
16
17
  spec.add_development_dependency "rake"
17
- spec.add_development_dependency "rspec-core"
18
- spec.add_development_dependency "dhash"
18
+ spec.add_development_dependency "minitest"
19
19
  spec.add_development_dependency "get_process_mem"
20
+
21
+ spec.add_development_dependency "rmagick", "~>2.16"
22
+ spec.add_development_dependency "phamilie"
23
+ spec.add_development_dependency "dhash"
24
+
25
+ spec.add_development_dependency "mll"
26
+ spec.add_development_dependency "byebug"
20
27
  end
@@ -0,0 +1,31 @@
1
+ require "mkmf"
2
+
3
+ File.write "Makefile", dummy_makefile(?.).join
4
+ unless Gem::Platform.local.os == "darwin" && Gem::Version.new(RUBY_VERSION) == Gem::Version.new("2.3.8")
5
+ else
6
+ begin
7
+ # https://github.com/rbenv/rbenv/issues/1199
8
+ append_cppflags "-I#{Dir.glob("#{`rbenv root`.chomp}/sources/#{`rbenv version-name`.chomp}/*/").first}"
9
+ rescue
10
+ else
11
+ create_makefile "idhash"
12
+ # Why this hack?
13
+ # 1. Because I want to use Ruby and ./idhash.bundle for tests, not C.
14
+ # 2. Because I don't want to bother users with two gems instead of one.
15
+ File.write "Makefile", <<~HEREDOC + File.read("Makefile")
16
+ .PHONY: test
17
+ test: all
18
+ \t$(RUBY) -r./lib/dhash-vips.rb ./lib/dhash-vips-post-install-test.rb
19
+ HEREDOC
20
+ end
21
+ end
22
+
23
+ # Cases to check:
24
+ # 0. all is ok
25
+ # `rm -rf idhash.o idhash.bundle pkg && bundle exec rake install` # w/o ext # ["/Users/nakilon/_/dhash-vips/lib/dhash-vips.rb", 32]
26
+ # `rm -f idhash.o idhash.bundle Makefile && ruby extconf.rb && make` # with ext # ["/Users/nakilon/_/dhash-vips/lib/dhash-vips.rb", 40]
27
+ # `bundle exec rake -rdhash-vips -e "p DHashVips::IDHash.method(:distance3).source_location"`
28
+ # 1. not macOS && rbenv
29
+ # 2. fail during append_cppflags
30
+ # 3. failed compilation
31
+ # 4. failed tests
@@ -0,0 +1,29 @@
1
+ #include <bignum.c>
2
+
3
+ static VALUE idhash_distance(VALUE self, VALUE a, VALUE b){
4
+ BDIGIT* tempd;
5
+ long i, an = BIGNUM_LEN(a), bn = BIGNUM_LEN(b), templ, acc = 0;
6
+ BDIGIT* as = BDIGITS(a);
7
+ BDIGIT* bs = BDIGITS(b);
8
+ while (0 < an && as[an-1] == 0) an--; // for (i = an; --i;) printf("%u\n", as[i]);
9
+ while (0 < bn && bs[bn-1] == 0) bn--; // for (i = bn; --i;) printf("%u\n", bs[i]);
10
+ // printf("%lu %lu\n", an, bn);
11
+ if (an < bn) {
12
+ tempd = as; as = bs; bs = tempd;
13
+ templ = an; an = bn; bn = templ;
14
+ }
15
+ for (i = an; i-- > 4;) {
16
+ // printf("%ld : (%u | %u) & (%u ^ %u)\n", i, as[i], (i >= bn ? 0 : bs[i]), as[i-4], bs[i-4]);
17
+ acc += __builtin_popcountl((as[i] | (i >= bn ? 0 : bs[i])) & (as[i-4] ^ bs[i-4]));
18
+ // printf("%ld : %ld\n", i, acc);
19
+ }
20
+ RB_GC_GUARD(a);
21
+ RB_GC_GUARD(b);
22
+ return INT2FIX(acc);
23
+ }
24
+
25
+ void Init_idhash() {
26
+ VALUE m = rb_define_module("DHashVips");
27
+ VALUE mm = rb_define_module_under(m, "IDHash");
28
+ rb_define_module_function(mm, "distance3_c", idhash_distance, 2);
29
+ }
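The digit-wise loop in the C function can be mirrored in plain Ruby to see why it should agree with the one-line formula `((a ^ b) & (a | b) >> 128)`. This is only a sketch: `digits`, `distance3_by_digits` and `distance3_reference` are hypothetical names, 32-bit digits are an assumption about the platform's `BDIGIT` size, and both inputs are assumed to be non-negative and below 2**256.

```ruby
DIGIT_BITS = 32        # assumed digit width
DIGITS_PER_HALF = 4    # 4 * 32 = 128 bits per half of the fingerprint

# Splits a non-negative integer into little-endian 32-bit digits.
def digits(n, count = 2 * DIGITS_PER_HALF)
  Array.new(count) { |i| (n >> (i * DIGIT_BITS)) & 0xFFFFFFFF }
end

# Same shape as the C loop: for every digit of the high half, OR the two
# high digits, XOR the two matching low digits, AND them, count set bits.
def distance3_by_digits(a, b)
  as = digits(a)
  bs = digits(b)
  (DIGITS_PER_HALF...2 * DIGITS_PER_HALF).sum do |i|
    low = i - DIGITS_PER_HALF
    ((as[i] | bs[i]) & (as[low] ^ bs[low])).to_s(2).count("1")
  end
end

# The pure-Ruby formula as it appears in lib/dhash-vips.rb.
def distance3_reference(a, b)
  ((a ^ b) & (a | b) >> 128).to_s(2).count("1")
end

a = 1 << 128   # one bit set in the high half
b = 1          # one bit set in the low half
puts distance3_by_digits(a, b) == distance3_reference(a, b)
```

In the formula, `>>` binds tighter than `&`, so the shifted `a | b` acts as a 128-bit mask over the low half of `a ^ b`; the loop computes the same thing one digit at a time.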
data/lib/dhash-vips-post-install-test.rb ADDED
@@ -0,0 +1,36 @@
1
+ puts "Testing native extension..."
2
+
3
+ a, b = 27362028616592833077810614538336061650596602259623245623188871925927275101952, 57097733966917585112089915289446881218887831888508524872740133297073405558528
4
+ f = ->(a,b){ DHashVips::IDHash.distance3_ruby a, b }
5
+
6
+ p as = [a.to_s(16).rjust(64,?0)].pack("H*").unpack("N*")
7
+ p bs = [b.to_s(16).rjust(64,?0)].pack("H*").unpack("N*")
8
+ puts as.zip(bs)[0,4].map{ |i,j| (i | j).to_s(2).rjust(32, ?0) }.zip \
9
+ as.zip(bs)[4,4].map{ |i,j| (i ^ j).to_s(2).rjust(32, ?0) }
10
+ p DHashVips::IDHash.distance3_c a, b
11
+ p f[a, b]
12
+ fail unless 17 == f[a, b]
13
+
14
+ s = [0, 1, 1<<63, (1<<63)+1, (1<<64)-1].each do |_|
15
+ # p [_.to_s(16).rjust(64,?0)].pack("H*").unpack("N*").map{ |_| _.to_s(2).rjust(32, ?0) }
16
+ end
17
+ ss = s.repeated_permutation(4).map do |s1, s2, s3, s4|
18
+ ((s1 << 192) + (s2 << 128) + (s3 << 64) + s4).tap do |_|
19
+ # p [_.to_s(16).rjust(64,?0)].pack("H*").unpack("N*").map{ |_| _.to_s(2).rjust(32, ?0) }
20
+ end
21
+ end
22
+ fail unless :distance3 == DHashVips::IDHash.method(:distance3).original_name
23
+ ss.product ss do |s1, s2|
24
+ next unless s1.is_a?(Bignum) && s2.is_a?(Bignum)
25
+ unless f[s1, s2] == DHashVips::IDHash.distance3_c(s1, s2)
26
+ p [s1, s2]
27
+ p [s1.to_s(16).rjust(64,?0)].pack("H*").unpack("N*").map{ |_| _.to_s(2).rjust(32, ?0) }
28
+ p [s2.to_s(16).rjust(64,?0)].pack("H*").unpack("N*").map{ |_| _.to_s(2).rjust(32, ?0) }
29
+ p [f[s1, s2], DHashVips::IDHash.distance3_c(s1, s2)]
30
+ fail
31
+ end
32
+ end
33
+ 100000.times do
34
+ s1, s2 = Array.new(2){ n = rand 256; ([?0] * n + [?1] * (256 - n)).shuffle.join.to_i 2 }
35
+ fail unless DHashVips::IDHash.distance3(s1, s2) == DHashVips::IDHash.distance3_ruby(s1, s2)
36
+ end
data/lib/dhash-vips.rb CHANGED
@@ -21,7 +21,7 @@ module DHashVips
21
21
  def calculate file, hash_size = 8, kernel = nil
22
22
  image = pixelate file, hash_size, kernel
23
23
 
24
- image.cast("int").conv([1, -1]).crop(1, 0, hash_size, hash_size).>(0)./(255).cast("uchar").to_a.join.to_i(2)
24
+ image.cast("int").conv([[1, -1]]).crop(1, 0, hash_size, hash_size).>(0)./(255).cast("uchar").to_a.join.to_i(2)
25
25
  end
26
26
 
27
27
  end
@@ -29,8 +29,21 @@ module DHashVips
29
29
  module IDHash
30
30
  extend self
31
31
 
32
- def distance3 a, b
33
- return ((a ^ b) & (a | b) >> 128).to_s(2).count "1"
32
+ def distance3_ruby a, b
33
+ ((a ^ b) & (a | b) >> 128).to_s(2).count "1"
34
+ end
35
+ begin
36
+ require_relative "../idhash.bundle"
37
+ rescue LoadError
38
+ alias_method :distance3, :distance3_ruby
39
+ else
40
+ def distance3 a, b
41
+ if a.is_a?(Bignum) && b.is_a?(Bignum)
42
+ distance3_c a, b
43
+ else
44
+ distance3_ruby a, b
45
+ end
46
+ end
34
47
  end
35
48
  def distance a, b
36
49
  size_a, size_b = [a, b].map do |x|
@@ -39,6 +52,7 @@ module DHashVips
39
52
  # but also 31, 30 happens for MRI 2.3
40
53
  x.size <= 32 ? 8 : 16
41
54
  end
55
+ return distance3 a, b if [8, 8] == [size_a, size_b]
42
56
  fail "fingerprints were taken with different `power` param: #{size_a} and #{size_b}" if size_a != size_b
43
57
  ((a ^ b) & (a | b) >> 2 * size_a * size_a).to_s(2).count "1"
44
58
  end
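The `begin` / `rescue LoadError` / `else` construct in the `IDHash` hunk is a general fallback pattern: try to load a native extension, and if that fails, expose the pure-Ruby method under the public name. The sketch below reproduces the pattern outside the gem. `FallbackDemo`, `popcount_ruby`, `popcount_native` and the required library name are all hypothetical, and the `require` is expected to fail on purpose so that the fallback branch runs.

```ruby
module FallbackDemo
  extend self

  # Pure-Ruby implementation that is always available.
  def popcount_ruby(n)
    n.to_s(2).count("1")
  end

  begin
    # Hypothetical native library; it does not exist, so LoadError is raised.
    require "fallback_demo_native_extension"
  rescue LoadError
    # No native code: the public name simply points at the Ruby version.
    alias_method :popcount, :popcount_ruby
  else
    # Native code loaded: dispatch to it (popcount_native is hypothetical).
    def popcount(n)
      popcount_native(n)
    end
  end
end

puts FallbackDemo.popcount(0b1011)
```

With this layout the caller never needs to know which implementation is active. The post-install test's check on `original_name` appears to rely on the same distinction, since an aliased method keeps the name of the method it was aliased from.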
data/test.rb ADDED
@@ -0,0 +1,99 @@
1
+ require "minitest/autorun"
2
+
3
+ require "dhash-vips"
4
+
5
+ # TODO tests about `fingerprint(4)`
6
+
7
+ [
8
+ [DHashVips::DHash, :hamming, :calculate, 2, 23, 18, 50, 4],
9
+ # [[0, 14, 26, 27, 31, 27, 32, 28, 43, 43, 34, 37, 37, 34, 35, 42],
10
+ # [14, 0, 28, 25, 39, 35, 32, 32, 43, 43, 38, 41, 41, 38, 37, 50],
11
+ # [26, 28, 0, 13, 35, 41, 28, 30, 41, 41, 36, 33, 35, 32, 27, 36],
12
+ # [27, 25, 13, 0, 36, 36, 31, 35, 40, 40, 33, 32, 42, 35, 26, 33],
13
+ # [31, 39, 35, 36, 0, 16, 33, 33, 40, 40, 31, 24, 28, 33, 40, 31],
14
+ # [27, 35, 41, 36, 16, 0, 41, 41, 38, 38, 23, 26, 24, 29, 34, 27],
15
+ # [32, 32, 28, 31, 33, 41, 0, 10, 27, 25, 38, 35, 37, 32, 23, 34],
16
+ # [28, 32, 30, 35, 33, 41, 10, 0, 27, 27, 34, 31, 37, 36, 27, 34],
17
+ # [43, 43, 41, 40, 40, 38, 27, 27, 0, 2, 35, 34, 30, 31, 28, 27],
18
+ # [43, 43, 41, 40, 40, 38, 25, 27, 2, 0, 35, 34, 30, 31, 28, 27],
19
+ # [34, 38, 36, 33, 31, 23, 38, 34, 35, 35, 0, 9, 23, 26, 29, 18],
20
+ # [37, 41, 33, 32, 24, 26, 35, 31, 34, 34, 9, 0, 22, 25, 30, 19],
21
+ # [37, 41, 35, 42, 28, 24, 37, 37, 30, 30, 23, 22, 0, 19, 26, 23],
22
+ # [34, 38, 32, 35, 33, 29, 32, 36, 31, 31, 26, 25, 19, 0, 21, 26],
23
+ # [35, 37, 27, 26, 40, 34, 23, 27, 28, 28, 29, 30, 26, 21, 0, 23],
24
+ # [42, 50, 36, 33, 31, 27, 34, 34, 27, 27, 18, 19, 23, 26, 23, 0]]
25
+ [DHashVips::IDHash, :distance, :fingerprint, 6, 22, 23, 65, 0],
26
+ # [[0, 17, 32, 35, 57, 45, 51, 50, 48, 47, 54, 48, 60, 50, 47, 56],
27
+ # [17, 0, 30, 35, 58, 46, 54, 55, 47, 51, 57, 49, 62, 52, 52, 60],
28
+ # [32, 30, 0, 9, 47, 54, 45, 41, 65, 62, 42, 37, 51, 44, 49, 49],
29
+ # [35, 35, 9, 0, 54, 64, 42, 40, 57, 56, 48, 39, 50, 40, 41, 51],
30
+ # [57, 58, 47, 54, 0, 22, 43, 45, 64, 61, 48, 47, 35, 43, 47, 48],
31
+ # [45, 46, 54, 64, 22, 0, 53, 54, 55, 54, 40, 46, 39, 42, 43, 42],
32
+ # [51, 54, 45, 42, 43, 53, 0, 6, 33, 35, 52, 43, 46, 45, 44, 47],
33
+ # [50, 55, 41, 40, 45, 54, 6, 0, 38, 41, 53, 50, 48, 45, 41, 42],
34
+ # [48, 47, 65, 57, 64, 55, 33, 38, 0, 9, 51, 53, 47, 47, 41, 46],
35
+ # [47, 51, 62, 56, 61, 54, 35, 41, 9, 0, 51, 57, 50, 49, 44, 43],
36
+ # [54, 57, 42, 48, 48, 40, 52, 53, 51, 51, 0, 10, 33, 36, 38, 25],
37
+ # [48, 49, 37, 39, 47, 46, 43, 50, 53, 57, 10, 0, 27, 30, 37, 27],
38
+ # [60, 62, 51, 50, 35, 39, 46, 48, 47, 50, 33, 27, 0, 20, 23, 28],
39
+ # [50, 52, 44, 40, 43, 42, 45, 45, 47, 49, 36, 30, 20, 0, 35, 39],
40
+ # [47, 52, 49, 41, 47, 43, 44, 41, 41, 44, 38, 37, 23, 35, 0, 19],
41
+ # [56, 60, 49, 51, 48, 42, 47, 42, 46, 43, 25, 27, 28, 39, 19, 0]]
42
+ ].each do |lib, dm, calc, min_similar, max_similar, min_not_similar, max_not_similar, bw_exceptional|
43
+
44
+ describe lib do
45
+
46
+ # these are false positives by idhash
47
+ # 6d97739b4a08f965dc9239dd24382e96.jpg
48
+ # 1b1d4bde376084011d027bba1c047a4b.jpg
49
+ [
50
+ [ %w{
51
+ 1d468d064d2e26b5b5de9a0241ef2d4b.jpg 92d90b8977f813af803c78107e7f698e.jpg
52
+ 309666c7b45ecbf8f13e85a0bd6b0a4c.jpg 3f9f3db06db20d1d9f8188cd753f6ef4.jpg
53
+ 679634ff89a31279a39f03e278bc9a01.jpg df0a3b93e9412536ee8a11255f974141.jpg
54
+ 54192a3f65bd03163b04849e1577a40b.jpg 6d32f57459e5b79b5deca2a361eb8c6e.jpg
55
+ 4b62e0eef58bfbc8d0d2fbf2b9d05483.jpg b8eb0ca91855b657f12fb3d627d45c53.jpg
56
+ 21cd9a6986d98976b6b4655e1de7baf4.jpg 9b158c0d4953d47171a22ed84917f812.jpg
57
+ 9c2c240ec02356472fb532f404d28dde.jpg fc762fa286489d8afc80adc8cdcb125e.jpg
58
+ 7a833d873f8d49f12882e86af1cc6b79.jpg ac033cf01a3941dd1baa876082938bc9.jpg
59
+ }, min_similar, max_similar], # slightly similar images
60
+ [ %w{
61
+ 71662d4d4029a3b41d47d5baf681ab9a.jpg ad8a37f872956666c3077a3e9e737984.jpg
62
+ }, bw_exceptional, bw_exceptional], # these are the same photo but of different size and colorspace
63
+ ].each do |images, min, max|
64
+
65
+ require "fileutils"
66
+ require "digest" if Gem::Version.new(RUBY_VERSION) >= Gem::Version.new("2.5.0")
67
+ require "mll"
68
+
69
+ require_relative "common"
70
+ images = images.map &method(:download_and_keep)
71
+
72
+ hashes = images.map &lib.method(calc)
73
+ table = MLL::table[lib.method(dm), [hashes], [hashes]]
74
+
75
+ require "pp"
76
+ STDERR.puts ""
77
+ PP.pp table, STDERR
78
+ STDERR.puts ""
79
+
80
+ hashes.size.times.to_a.repeated_combination(2) do |i, j|
81
+ it do
82
+ case
83
+ when i == j
84
+ assert_predicate table[i][j], :zero?
85
+ when (j - i).abs == 1 && (i + j - 1) % 4 == 0
86
+ # STDERR.puts [table[i][j], min, max].inspect
87
+ assert_includes min..max, table[i][j]
88
+ else
89
+ # STDERR.puts [table[i][j], min_not_similar, max_not_similar].inspect
90
+ assert_includes min_not_similar..max_not_similar, table[i][j]
91
+ end
92
+ end
93
+ end
94
+
95
+ end
96
+
97
+ end
98
+
99
+ end
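The condition `(j - i).abs == 1 && (i + j - 1) % 4 == 0` used in this test is easier to follow with concrete indices: it picks exactly the consecutive pairs (0, 1), (2, 3), (4, 5) and so on, which is how the image lists are laid out. A small sketch, where `similar_pair?` is a hypothetical name for the inline condition:

```ruby
# Hypothetical name for the condition used in the test's `case` branch.
def similar_pair?(i, j)
  (j - i).abs == 1 && (i + j - 1) % 4 == 0
end

# Enumerate index pairs the same way the test does, for six images.
pairs = 6.times.to_a.repeated_combination(2).select { |i, j| similar_pair?(i, j) }
p pairs
```

Neighbours that straddle two pairs, such as (1, 2) or (3, 4), fail the modulo check and therefore fall into the "not similar" branch.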
metadata CHANGED
@@ -1,23 +1,37 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: dhash-vips
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.0.6.0
4
+ version: 0.1.0.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Victor Maslov
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2018-03-25 00:00:00.000000000 Z
11
+ date: 2019-12-22 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: ruby-vips
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: 2.0.16
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: 2.0.16
27
+ - !ruby/object:Gem::Dependency
28
+ name: rake
15
29
  requirement: !ruby/object:Gem::Requirement
16
30
  requirements:
17
31
  - - ">="
18
32
  - !ruby/object:Gem::Version
19
33
  version: '0'
20
- type: :runtime
34
+ type: :development
21
35
  prerelease: false
22
36
  version_requirements: !ruby/object:Gem::Requirement
23
37
  requirements:
@@ -25,7 +39,21 @@ dependencies:
25
39
  - !ruby/object:Gem::Version
26
40
  version: '0'
27
41
  - !ruby/object:Gem::Dependency
28
- name: rake
42
+ name: minitest
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - ">="
46
+ - !ruby/object:Gem::Version
47
+ version: '0'
48
+ type: :development
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - ">="
53
+ - !ruby/object:Gem::Version
54
+ version: '0'
55
+ - !ruby/object:Gem::Dependency
56
+ name: get_process_mem
29
57
  requirement: !ruby/object:Gem::Requirement
30
58
  requirements:
31
59
  - - ">="
@@ -39,7 +67,21 @@ dependencies:
39
67
  - !ruby/object:Gem::Version
40
68
  version: '0'
41
69
  - !ruby/object:Gem::Dependency
42
- name: rspec-core
70
+ name: rmagick
71
+ requirement: !ruby/object:Gem::Requirement
72
+ requirements:
73
+ - - "~>"
74
+ - !ruby/object:Gem::Version
75
+ version: '2.16'
76
+ type: :development
77
+ prerelease: false
78
+ version_requirements: !ruby/object:Gem::Requirement
79
+ requirements:
80
+ - - "~>"
81
+ - !ruby/object:Gem::Version
82
+ version: '2.16'
83
+ - !ruby/object:Gem::Dependency
84
+ name: phamilie
43
85
  requirement: !ruby/object:Gem::Requirement
44
86
  requirements:
45
87
  - - ">="
@@ -67,7 +109,21 @@ dependencies:
67
109
  - !ruby/object:Gem::Version
68
110
  version: '0'
69
111
  - !ruby/object:Gem::Dependency
70
- name: get_process_mem
112
+ name: mll
113
+ requirement: !ruby/object:Gem::Requirement
114
+ requirements:
115
+ - - ">="
116
+ - !ruby/object:Gem::Version
117
+ version: '0'
118
+ type: :development
119
+ prerelease: false
120
+ version_requirements: !ruby/object:Gem::Requirement
121
+ requirements:
122
+ - - ">="
123
+ - !ruby/object:Gem::Version
124
+ version: '0'
125
+ - !ruby/object:Gem::Dependency
126
+ name: byebug
71
127
  requirement: !ruby/object:Gem::Requirement
72
128
  requirements:
73
129
  - - ">="
@@ -83,17 +139,21 @@ dependencies:
83
139
  description:
84
140
  email: nakilon@gmail.com
85
141
  executables: []
86
- extensions: []
142
+ extensions:
143
+ - extconf.rb
87
144
  extra_rdoc_files: []
88
145
  files:
89
- - ".gitignore"
90
146
  - Gemfile
91
147
  - LICENSE.txt
92
148
  - README.md
93
149
  - Rakefile
150
+ - common.rb
94
151
  - dhash-vips.gemspec
152
+ - extconf.rb
153
+ - idhash.c
154
+ - lib/dhash-vips-post-install-test.rb
95
155
  - lib/dhash-vips.rb
96
- - spec/_spec.rb
156
+ - test.rb
97
157
  homepage: https://github.com/nakilon/dhash-vips
98
158
  licenses:
99
159
  - MIT
@@ -114,8 +174,9 @@ required_rubygems_version: !ruby/object:Gem::Requirement
114
174
  version: '0'
115
175
  requirements: []
116
176
  rubyforge_project:
117
- rubygems_version: 2.5.2
177
+ rubygems_version: 2.5.2.3
118
178
  signing_key:
119
179
  specification_version: 4
120
180
  summary: dHash and IDHash powered by Vips
121
- test_files: []
181
+ test_files:
182
+ - test.rb
data/.gitignore DELETED
@@ -1,22 +0,0 @@
1
- *.gem
2
- *.rbc
3
- .bundle
4
- .config
5
- .yardoc
6
- Gemfile.lock
7
- InstalledFiles
8
- _yardoc
9
- coverage
10
- doc/
11
- lib/bundler/man
12
- pkg
13
- rdoc
14
- spec/reports
15
- test/tmp
16
- test/version_tmp
17
- tmp
18
- *.bundle
19
- *.so
20
- *.o
21
- *.a
22
- mkmf.log
data/spec/_spec.rb DELETED
@@ -1,100 +0,0 @@
1
- require "dhash-vips"
2
-
3
- require "pp"
4
-
5
- # TODO tests about `fingerprint(4)`
6
-
7
- [
8
- [DHashVips::DHash, :hamming, :calculate, 17, 18, 22, 39],
9
- # [[0, 17, 29, 27, 22, 29, 30, 29],
10
- # [17, 0, 30, 26, 33, 36, 37, 36],
11
- # [29, 30, 0, 18, 39, 30, 39, 36],
12
- # [27, 26, 18, 0, 35, 30, 35, 34],
13
- # [22, 33, 39, 35, 0, 17, 28, 23],
14
- # [29, 36, 30, 30, 17, 0, 33, 30],
15
- # [30, 37, 39, 35, 28, 33, 0, 5],
16
- # [29, 36, 36, 34, 23, 30, 5, 0]]
17
- [DHashVips::IDHash, :distance, :fingerprint, 15, 23, 28, 64],
18
- # [[0, 16, 30, 32, 46, 58, 43, 43],
19
- # [16, 0, 28, 28, 47, 59, 46, 47],
20
- # [30, 28, 0, 15, 53, 49, 53, 52],
21
- # [32, 28, 15, 0, 56, 53, 61, 64],
22
- # [46, 47, 53, 56, 0, 23, 43, 45],
23
- # [58, 59, 49, 53, 23, 0, 44, 44],
24
- # [43, 46, 53, 61, 43, 44, 0, 0],
25
- # [43, 47, 52, 64, 45, 44, 0, 0]]
26
- ].each do |lib, dm, calc, min_similar, max_similar, min_not_similar, max_not_similar|
27
-
28
- describe lib do
29
-
30
- require "fileutils"
31
- require "open-uri"
32
- require "digest"
33
- require "mll"
34
- example do |example|
35
-
36
- images = %w{
37
- 1d468d064d2e26b5b5de9a0241ef2d4b.jpg
38
- 92d90b8977f813af803c78107e7f698e.jpg
39
- 309666c7b45ecbf8f13e85a0bd6b0a4c.jpg
40
- 3f9f3db06db20d1d9f8188cd753f6ef4.jpg
41
- df0a3b93e9412536ee8a11255f974141.jpg
42
- 679634ff89a31279a39f03e278bc9a01.jpg
43
- } # these images are consecutive pairs of slightly (but enough for nice asserts) similar images
44
- # 6d97739b4a08f965dc9239dd24382e96.jpg
45
- # 1b1d4bde376084011d027bba1c047a4b.jpg
46
- # while these two tend to be false positive matches by idhash
47
- bw1, bw2 = %w{
48
- 71662d4d4029a3b41d47d5baf681ab9a.jpg
49
- ad8a37f872956666c3077a3e9e737984.jpg
50
- } # these are the same photo but of different size and colorspace
51
-
52
- example.metadata[:extra_failure_lines] = []
53
- FileUtils.mkdir_p dir = "images"
54
- *images, bw1, bw2 = [*images, bw1, bw2].map do |image|
55
- "#{dir}/#{image}".tap do |filename|
56
- unless File.exist?(filename) && Digest::MD5.file(filename) == File.basename(filename, ".jpg")
57
- example.metadata[:extra_failure_lines] << "copying image from web to #{filename}"
58
- open("https://storage.googleapis.com/dhash-vips.nakilon.pro/#{image}") do |link|
59
- File.open(filename, "wb") do |file|
60
- IO.copy_stream link, file
61
- end
62
- end
63
- end
64
- end
65
- end
66
-
67
- hashes = [*images, bw1, bw2].map &described_class.method(calc)
68
- table = MLL::table[described_class.method(dm), [hashes], [hashes]]
69
-
70
- # require "pp"
71
- # pp table
72
- # next
73
-
74
- aggregate_failures do
75
- hashes.size.times.to_a.repeated_combination(2) do |i, j|
76
- case
77
- when i == j
78
- expect(table[i][j]).to eq 0
79
- when (j - i).abs == 1 && (i + j - 1) % 4 == 0
80
- if [i, j] == [hashes.size - 2, hashes.size - 1]
81
- if described_class == DHashVips::DHash
82
- expect(table[i][j]).to be == 5
83
- else
84
- expect(table[i][j]).to eq 0
85
- end
86
- else
87
- expect(table[i][j]).to be_between(min_similar, max_similar).inclusive
88
- end
89
- else
90
- expect(table[i][j]).to be_between(min_not_similar, max_not_similar).inclusive
91
- end
92
- end
93
-
94
- end
95
-
96
- end
97
-
98
- end
99
-
100
- end