dhash-vips 0.0.5.1 → 0.1.1.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 73ec3cad8fd74c5eef699cb7c2251077a667c6c0
4
- data.tar.gz: 925b66f53a63671a6b6985a37e7224d9e1b399dc
3
+ metadata.gz: e462c5c41c5f2c5916223c083f22d16ae37a062b
4
+ data.tar.gz: 3261d15852334f30b59bd53cbfdd03e4a6db8758
5
5
  SHA512:
6
- metadata.gz: 8058297adddc0c433c4fce7f1a6e61a345f83801f3e6a739f90231bf289a02665b799e95e997cd3e3c3af327474c2ced3f341770b79423ef9018eb0411998f4a
7
- data.tar.gz: 0466d9ee38940e67e768336c9f216710f9e721d836cb92901a218b506f7bfc4e109e718684bd3097350530eab31e34cca234bc1ab0a168f323c9490525086506
6
+ metadata.gz: 80c90dd232d5c9ea5ccfd65d715c48bafb56ca94aac70e45ced117ad27e748ae81030010f4eb62f963b86f0854d1a9c6d3f4174de730cb8f763fd7dcfcf60f89
7
+ data.tar.gz: bbe1bee6c35949f68fe886512668d3ea5215daa2207f49fe3f2db51b5ea3c1829d2fb7c69a4d3b05a05afab6df7d243ba37d10d6de39aae1088b404189d43326
@@ -1,6 +1,6 @@
1
1
  The MIT License (MIT)
2
2
 
3
- Copyright (c) 2017 Nakilon
3
+ Copyright (c) 2017, Victor Maslov (nakilon@gmail.com)
4
4
 
5
5
  Permission is hereby granted, free of charge, to any person obtaining a copy
6
6
  of this software and associated documentation files (the "Software"), to deal
data/README.md CHANGED
@@ -1,49 +1,21 @@
1
- [![Gem Version](https://badge.fury.io/rb/dhash-vips.svg)](http://badge.fury.io/rb/dhash-vips)
1
+ [![Gem Version](https://badge.fury.io/rb/dhash-vips.svg)](http://badge.fury.io/rb/dhash-vips) [![Docker Image](https://github.com/nakilon/dhash-vips/workflows/Docker%20Image/badge.svg)](https://hub.docker.com/repository/docker/nakilonishe/dhash-vips/general)
2
2
 
3
3
  # dHash and IDHash gem powered by ruby-vips
4
4
 
5
- The "dHash" is an algorithm of fingerprinting that can be used to measure the similarity of two images.
5
+ The **dHash** is an image fingerprinting algorithm that can be used to measure the similarity of two images.
6
+ The **IDHash** is a new algorithm with some improvements over dHash; it is described below.
6
7
 
7
- You may read about it in "Kind of Like That" blog post (21 January 2013): http://www.hackerfactor.com/blog/index.php?/archives/529-Kind-of-Like-That.html
8
- The original idea is that you split the image into 64 segments and so there are 64 bits -- each tells if the one segment is brighter or darker than the neighbor one. Then the [Hamming distance](https://en.wikipedia.org/wiki/Hamming_distance) between fingerprints is the opposite of images similarity.
8
+ You can read about dHash and perceptual hashing in the article ["Kind of Like That" at "The Hacker Factor Blog"](http://www.hackerfactor.com/blog/index.php?/archives/529-Kind-of-Like-That.html) (21 January 2013). The idea is that you resize the original image to 8x9 and then convert it to an 8x8 array of bits, each telling whether the corresponding segment of the image is brighter or darker than the one to its right (or left). Then you apply the [Hamming distance](https://en.wikipedia.org/wiki/Hamming_distance) to such arrays to measure how different they are.
9
9
 
10
- There were several implementations on Github already but they all depend on ImageMagick. My implementation takes an advantage of libvips (the `ruby-vips` gem) -- it also uses the `.conv` method and in result converts image to an array of grayscale bytes almost 10 times faster:
11
- ```
12
- load and calculate the fingerprint:
13
- user system total real
14
- Dhash 13.110000 0.950000 14.060000 ( 14.537057)
15
- DHashVips::DHash 1.480000 0.310000 1.790000 ( 1.808787)
16
- DHashVips::IDHash 1.080000 0.100000 1.180000 ( 1.156446)
17
-
18
- measure the distance (1000 times):
19
- user system total real
20
- Dhash hamming 1.770000 0.010000 1.780000 ( 1.815612)
21
- DHashVips::DHash 1.810000 0.010000 1.820000 ( 1.875666)
22
- DHashVips::IDHash 3.430000 0.020000 3.450000 ( 3.499031)
23
- ```
24
-
25
- Here the `Dhash` is [another gem](https://github.com/maccman/dhash) that I used earlier in my projects.
26
- The `DHashVips::DHash` is a port of it that uses vips. I would like to tell you that you can replace the `dhash` with `dhash-vips` gem right now but it appeared to have a barely noticeable issue. There is a lot of magic behind the libvips speed and resizing -- you may not notice it with unarmed eyes but when two neighbor segments are enough similar by luminosity the difference can change the sign. So I found two identical images that were just of different colorspace and size (photo by Jordan Voth):
27
- ![](https://storage.googleapis.com/dhash-vips.nakilon.pro/dhash_issue_example.png)
28
- but the distance between their hashes appeared to be equal to 5 while `dhash` gem reported 0.
29
-
30
- This is why `DHashVips::IDHash` appeared.
10
+ There were several Ruby implementations on GitHub already, but they all depended on ImageMagick. My implementation takes advantage of the speed of libvips (the `ruby-vips` gem), so it fingerprints images much faster. For even more speed, the fingerprint comparison function is implemented as a native C extension.
31
11
 
32
12
  ## IDHash (the Important Difference Hash)
33
13
 
34
- It has improvements over the dHash that made fingerprinting less sensitive to the resizing algorithm and effectively made the pair of images mentioned above to have a distance of 0 again. Three improvements are:
14
+ The main improvement over dHash is that it is insensitive to the resizing algorithm and to possible errors caused by colorspace conversion.
15
+
35
16
  * The "Importance" is an array of extra 64 bits that tells the comparing function which half of 64 bits is important (when the difference between neighbors was significant enough) and which is not. So only half of the bits in a fingerprint are compared, not all of them.
36
17
  * It subtracts not only horizontally but also vertically -- that adds 128 more bits.
37
- * Instead of resizing to 9x8 it resizes to 8x8 and puts the image on a torus so it subtracts the left column from the right one and the top from bottom.
38
-
39
- You could see in fingerprint calculation benchmark earlier that these improvements didn't make it slower than dHash because most of the time is spent on image resizing (at some point it actually even became faster, idk why). The calculation of distance is what became two times slower:
40
- ```ruby
41
- ((a | b) & ((a ^ b) >> 128)).to_s(2).count "1"
42
- ```
43
- vs
44
- ```ruby
45
- (a ^ b).to_s(2).count "1"
46
- ```
18
+ * Instead of resizing to 8x9 it resizes to 8x8 and puts the image on a torus, so it subtracts the leftmost column from the rightmost one and the top row from the bottom one.
47
19
 
48
20
  ### Example
49
21
 
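The dHash description in the README diff above can be sketched in plain Ruby, leaving out the libvips resizing step. This is a minimal illustration, not the gem's implementation. It assumes the image has already been resized and converted to grayscale as 8 rows of 9 brightness values. The helper names `dhash_fingerprint` and `hamming` are hypothetical.

```ruby
# Minimal dHash sketch (hypothetical helpers, not the dhash-vips API).
# Input: 8 rows of 9 grayscale values, i.e. an image that was already
# resized to 8x9 and converted to grayscale.
def dhash_fingerprint(gray)
  unless gray.size == 8 && gray.all? { |row| row.size == 9 }
    raise ArgumentError, "expected 8 rows of 9 grayscale values"
  end

  bits = 0
  gray.each do |row|
    # compare every pixel with its right-hand neighbour: 8 bits per row
    row.each_cons(2) do |left, right|
      bits = (bits << 1) | (left > right ? 1 : 0)
    end
  end
  bits # a 64-bit Integer fingerprint
end

# Hamming distance: the number of bits that differ between two fingerprints.
def hamming(a, b)
  (a ^ b).to_s(2).count("1")
end

ascending  = Array.new(8) { (0..8).to_a }         # gets darker to the right
descending = Array.new(8) { (0..8).to_a.reverse } # gets brighter to the right
puts hamming(dhash_fingerprint(ascending), dhash_fingerprint(descending))
```

Packing the bits into a single Integer keeps the distance computation to one XOR plus a bit count, which is the same shape as the Ruby distance formulas used elsewhere in this diff.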
@@ -65,6 +37,8 @@ Here in each of 64 cells, there are two circles that color the difference betwee
65
37
  * [Finding the closest pair between two sets of points on the hypercube](https://cstheory.stackexchange.com/q/16322/27420)
66
38
  * [Would PCA work for boolean data types?](https://stats.stackexchange.com/q/159705/1125)
67
39
  * [Using pHash to search agaist a huge image database, what is the best approach?](https://stackoverflow.com/q/18257641/322020)
40
+ * [How do I speed up this BIT_COUNT query for hamming distance?](https://stackoverflow.com/q/35065675/322020)
41
+ * [Hamming distance on binary strings in SQL](https://stackoverflow.com/q/4777070/322020)
68
42
 
69
43
  ## Installation
70
44
 
@@ -75,7 +49,7 @@ Then:
75
49
 
76
50
  gem install dhash-vips
77
51
 
78
- If you have troubles with the `gem ruby-vips` dependency, see https://github.com/jcupitt/ruby-vips
52
+ If you have troubles with the `gem ruby-vips` dependency, see https://github.com/libvips/ruby-vips
79
53
 
80
54
  ## Usage
81
55
 
@@ -115,93 +89,107 @@ else
115
89
  end
116
90
  ```
117
91
 
118
- These `15` and `25` numbers are found empirically and just work enough well for 8-byte hashes.
119
- To find out these tresholds we can run a rake task with hardcoded test cases:
120
- ```
121
- $ rake compare_matrices
122
-
123
- Dhash
124
- Absolutely the same image: 0..0
125
- Complex B/W and the same but colorful: 0
126
- Similar images: 13..16
127
- Different images: 9..41
128
-
129
- DHashVips::DHash
130
- Absolutely the same image: 0..0
131
- Complex B/W and the same but colorful: 5
132
- Similar images: 17..18
133
- Different images: 14..39
134
-
135
- DHashVips::IDHash
136
- Absolutely the same image: 0..0
137
- Complex B/W and the same but colorful: 0
138
- Similar images: 15..23
139
- Different images: 19..64
140
-
141
- DHashVips::IDHash 4
142
- Absolutely the same image: 0..0
143
- Complex B/W and the same but colorful: 0
144
- Similar images: 71..108
145
- Different images: 102..211
92
+ ### Notes and benchmarks
146
93
 
147
- ```
94
+ * The above `15` and `25` constants were found empirically and work well enough for 8-byte hashes. To find these thresholds, we can run a rake task with hardcoded test cases (pairs of photos from the same photo session are not identical but are considered 'similar' enough for the purpose of this benchmark):
148
95
 
149
- ### Notes
96
+ $ rake compare_quality
97
+
98
+                       Dhash  Phamilie  DHashVips::DHash  DHashVips::IDHash  DHashVips::IDHash(4)
99
+ The same image:        0..0      0..0              0..0               0..0                  0..0
100
+ 'Jordan Voth case':       2         2                 4                  0                     0
101
+ Similar images:       1..15    14..34             2..23              6..22               53..166
102
+ Different images:    10..54    22..42            10..50             17..65              120..233
103
+ 1/FMI^2 =             1.375       4.0             1.556               1.25                 1.306
104
+ FP, FN =             [3, 0]    [0, 6]            [1, 2]             [2, 0]                [1, 1]
105
+
106
+ The `1/FMI^2` line here is the "quality of the algorithm". It is the best value of 1/FMI^2 you can achieve by drawing a single threshold between the "similar" and "different" test pairs, where FMI is the ["Fowlkes–Mallows index"](https://en.wikipedia.org/wiki/Fowlkes%E2%80%93Mallows_index). Smaller is better, and 1.0 would mean the two sets are perfectly separated. The last line shows the number of false positives (`FP`) and false negatives (`FN`) at the best achieved FMI. Here I've also added the [`phamilie` gem](https://github.com/toy/phamilie), which is DCT-based (not a kind of dHash).
150
107
 
151
108
  * Methods were renamed from `#calculate` to `#fingerprint` and from `#hamming` to `#distance`.
152
- * The `DHash#calculate` accepts `hash_size` optional parameter that is 8 by default. The `IDHash#fingerprint`'s optional parameter is called `power` and works in a bit different way: 3 means 8 and 4 means 16 -- other sizes are not supported because they don't seem to be useful (higher fingerprint resolution makes it vulnerable to image shifts and croppings, also `#distance` becomes much slower). Because IDHash's fingerprint is more complex than DHash's one it's not that straight forward to compare them so under the hood the `#distance` methods have to check the size of fingerprint -- this trade-off costs 30-40% of speed that can be eliminated by using `#distance3` method that assumes fingerprint to be of power=3. So the full benchmark is this one:
109
+ * The `DHash#calculate` accepts an optional `hash_size` parameter that is 8 by default.
+   * The optional parameter of `IDHash#fingerprint` is called `power` and works slightly differently: 3 means 8 and 4 means 16.
+   * Other sizes are not supported because they don't seem useful: a higher fingerprint resolution makes the fingerprint vulnerable to image shifts and cropping, and `#distance` becomes much slower.
+   * The IDHash fingerprint is more complex than the DHash one, so comparing fingerprints is not that straightforward. Under the hood, the `#distance` method has to check the size of the fingerprint first.
+   * If you are sure that the fingerprints were made with power=3, you can call the `#distance3` method directly to skip that check.
110
+ * The `#distance3` method tries to compile and use a Ruby C extension that is around 15 times faster than the pure Ruby implementation. The native extension currently works only on macOS with rbenv-managed Ruby 2.3.8 to 2.4.9 that was installed with the rbenv `-k` flag. So the full benchmark:
153
111
 
154
- ```
155
- # Ruby 2.0.0
156
-
157
- load and calculate the fingerprint:
158
- user system total real
159
- Dhash 12.400000 0.820000 13.220000 ( 13.329952)
160
- DHashVips::DHash 1.330000 0.230000 1.560000 ( 1.509826)
161
- DHashVips::IDHash 1.060000 0.090000 1.150000 ( 1.100332)
162
- DHashVips::IDHash 4 1.030000 0.080000 1.110000 ( 1.089148)
163
-
164
- measure the distance (1000 times):
165
- user system total real
166
- Dhash hamming 3.140000 0.020000 3.160000 ( 3.179392)
167
- DHashVips::DHash hamming 3.040000 0.020000 3.060000 ( 3.095190)
168
- DHashVips::IDHash distance 8.170000 0.040000 8.210000 ( 8.279950)
169
- DHashVips::IDHash distance3 6.720000 0.030000 6.750000 ( 6.790900)
170
- DHashVips::IDHash distance 4 24.430000 0.130000 24.560000 ( 24.652625)
171
- ```
172
- (macOS system MRI 2.3 has some nice bit arithmetics improvement compared to 2.0)
173
- ```
174
- # Ruby 2.3.3
175
-
176
- load and calculate the fingerprint:
177
- user system total real
178
- Dhash 13.110000 0.950000 14.060000 ( 14.537057)
179
- DHashVips::DHash 1.480000 0.310000 1.790000 ( 1.808787)
180
- DHashVips::IDHash 1.080000 0.100000 1.180000 ( 1.156446)
181
- DHashVips::IDHash 4 1.030000 0.090000 1.120000 ( 1.076117)
182
-
183
- measure the distance (1000 times):
184
- user system total real
185
- Dhash hamming 1.770000 0.010000 1.780000 ( 1.815612)
186
- DHashVips::DHash hamming 1.810000 0.010000 1.820000 ( 1.875666)
187
- DHashVips::IDHash distance 4.250000 0.020000 4.270000 ( 4.350071)
188
- DHashVips::IDHash distance3 3.430000 0.020000 3.450000 ( 3.499031)
189
- DHashVips::IDHash distance 4 8.210000 0.110000 8.320000 ( 8.510735)
190
- ```
112
+ * Ruby 2.3.8p459:
191
113
 
192
- Also note that to make `#distance` able to assume the fingerprint resolution from the size of Integer that represents it, the change in its structure was needed (left half of bits was swapped with right one), so fingerprints between versions 0.0.4 and 0.0.5 became incompatible, but you probably can convert them manually. I know, incompatibilities suck but if we put the version or structure information inside fingerprint it will became slow to (de)serialize and store.
114
+ load the image and calculate the fingerprint:
115
+                          user     system      total        real
116
+ Dhash                0.230885   6.422616 (  6.428763)
117
+ Phamilie             5.361751   0.037524   5.399275 (  5.402553)
118
+ DHashVips::DHash     0.858045   0.144820   1.002865 (  0.924308)
119
+ DHashVips::IDHash    0.769975   0.071087   0.841062 (  0.790470)
120
+ DHashVips::IDHash 4  0.805311   0.077918   0.883229 (  0.825897)
193
121
 
194
- ## Troubleshooting
122
+ measure the distance (32*32*2000 times):
123
+                                       user     system      total        real
124
+ Dhash hamming                     1.810000   0.000000   1.810000 (  1.824719)
125
+ Phamilie distance                 1.000000   0.010000   1.010000 (  1.006127)
126
+ DHashVips::DHash hamming          1.810000   0.000000   1.810000 (  1.817415)
127
+ DHashVips::IDHash distance        1.400000   0.000000   1.400000 (  1.401333)
128
+ DHashVips::IDHash distance3_ruby  3.320000   0.010000   3.330000 (  3.337920)
129
+ DHashVips::IDHash distance3_c     0.210000   0.000000   0.210000 (  0.212864)
130
+ DHashVips::IDHash distance 4      8.300000   0.120000   8.420000 (  8.499735)
195
131
 
196
- El Captain and rbenv may cause environment issues that would make you do things like:
197
- ```
198
- ./ruby `rbenv which rake` compare_matrixes
199
- ```
200
- instead of just
201
- ```
202
- rake compare_matrixes
203
- ```
204
- For more information on that: https://github.com/jcupitt/ruby-vips/issues/141
132
+ * There is now a benchmark that runs both the speed and the quality tests and sums up the results in a single table, where lower numbers are better:
133
+
134
+ ruby 2.3.8p459 (2018-10-18 revision 65136) [x86_64-darwin18]
135
+ vips-8.9.2-Tue Apr 21 09:26:11 UTC 2020
136
+ Version: ImageMagick 6.9.11-24 Q16 x86_64 2020-07-18
137
+ Intel(R) Core(TM) i5-7360U CPU @ 2.30GHz
138
+
139
+           Fingerprint  Compare  1/FMI^2
140
+ Phamilie        3.943    0.630    4.000
141
+    Dhash        4.969    1.097    1.375
142
+    DHash        0.434    1.089    1.556
143
+   IDHash        0.396    0.126    1.250
144
+
145
+ * Also note that to let `#distance` infer the fingerprint resolution from the size of the Integer that represents it, the fingerprint structure had to change (the left half of the bits was swapped with the right one). As a result, fingerprints from versions 0.0.4 and 0.0.5 are incompatible, although you can probably convert them manually. The alternative, putting version or structure information inside the fingerprint, would make fingerprints slower to (de)serialize and store.
146
+
147
+ ## Development notes
148
+
149
+ $ ruby test.rb
150
+
151
+ * You might need to prepend `bundle exec` to all the `rake` commands.
152
+
153
+ * If you get this:
154
+
155
+ Can't install RMagick 2.16.0. Can't find MagickWand.h.
156
+
157
+ because ImageMagick sucks, but we need it to benchmark alternative gems, so:
158
+
159
+ $ brew install imagemagick@6
160
+ $ brew unlink imagemagick@7
161
+ $ brew link imagemagick@6 --force
162
+
163
+ * OS X El Capitan and rbenv may cause environment issues that would make you do things like:
164
+
165
+ $ ./ruby `rbenv which rake` compare_quality
166
+
167
+ instead of just
168
+
169
+ $ rake compare_quality
170
+
171
+ For more information on that: https://github.com/jcupitt/ruby-vips/issues/141
172
+
173
+ * On macOS, when you run `bundle install`, it may fail to install the `rmagick` gem (a `dhash` gem dependency) with:
174
+
175
+ ERROR: Can't install RMagick 4.0.0. Can't find magick/MagickCore.h.
176
+
177
+ To resolve this do:
178
+
179
+ $ brew install imagemagick@6
180
+ $ LDFLAGS="-L/usr/local/opt/imagemagick@6/lib" CPPFLAGS="-I/usr/local/opt/imagemagick@6/include" bundle install
181
+
182
+ * If you get `No package 'MagickCore' found` try:
183
+
184
+ $ PKG_CONFIG_PATH="/usr/local/Cellar/imagemagick@6/6.9.10-74/lib/pkgconfig" bundle install
185
+
186
+ * Run `rake compare_quality` at least once before running other rake tasks, because it is currently the only one that downloads the test images.
187
+
188
+ * The tag `v0.0.0.4` is not semver and not a real gem version; it exists only for GitHub Actions testing purposes.
189
+
190
+ * To quickly find out what the dhash-vips Docker image includes: `docker run --rm <image_name> sh -c "cat /etc/alpine-release; ruby -v; vips -v; gem list dhash-vips"` (TODO: write in this README about the existing Docker image).
191
+
192
+ * Phamilie works with filenames instead of fingerprints; it caches the fingerprints but not the distances.
205
193
 
206
194
  ## Credits
207
195
 
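The `1/FMI^2` row of the quality table can be reproduced with a small threshold search. The sketch below uses the same formula as the `compare_quality` rake task, `(tp + fp) * (tp + fn) / tp^2`, and predicts "similar" for every distance below the threshold. The name `best_threshold` is hypothetical. Unlike the rake task, it skips thresholds with no true positives instead of filtering out `NaN` afterwards.

```ruby
# Hypothetical helper: search for the threshold that minimizes 1/FMI^2.
# A pair is predicted "similar" when its distance is below the threshold.
# Returns [1/FMI^2, false positives, false negatives, threshold].
def best_threshold(similar, different)
  min, max = (similar + different).minmax
  (min..max + 1).map do |threshold|
    tp = similar.count { |d| d < threshold }
    next if tp.zero? # no true positives: recall is 0, so FMI is 0 or undefined
    fn = similar.size - tp
    fp = different.count { |d| d < threshold }
    # FMI = sqrt(precision * recall), hence 1/FMI^2 = (tp + fp) * (tp + fn) / tp^2
    [((tp + fp) * (tp + fn)).fdiv(tp * tp), fp, fn, threshold]
  end.compact.min_by(&:first)
end

# distances of "similar" pairs vs. distances of "different" pairs
p best_threshold([1, 5], [3, 6])
```

A value of 1.0 is only reachable when some threshold separates the two lists without any false positive or false negative.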
data/Rakefile CHANGED
@@ -1,15 +1,11 @@
1
- STDOUT.sync = true
2
-
3
- require "bundler/gem_tasks"
4
-
5
-
6
- task :default => %w{ spec }
7
-
8
- require "rspec/core/rake_task"
9
- RSpec::Core::RakeTask.new :spec do |t|
10
- t.verbose = false
1
+ begin
2
+ # for `rake release`
3
+ require "bundler/gem_tasks"
4
+ rescue LoadError
5
+ puts "consider to `gem install bundler` to be able to `rake release`"
11
6
  end
12
7
 
8
+ require "pp"
13
9
 
14
10
  visualize_hash = lambda do |hash|
15
11
  puts hash.to_s(2).rjust(64, ?0).gsub(/(?<=.)/, '\0 ').scan(/.{16}/)
@@ -50,58 +46,80 @@ task :compare_kernels do |_|
50
46
  end
51
47
  end
52
48
 
53
- # ./ruby `rbenv which rake` compare_matrixes
54
- desc "Compare the quality of Dhash, DHashVips::DHash and DHashVips::IDHash -- run it only after `rake test`"
55
- task :compare_matrices do |_|
49
+ desc "Compare the quality of Dhash, Phamilie, DHashVips::DHash, DHashVips::IDHash"
50
+ # in this test we want to know not that photos are the same but rather that they are from the same photosession
51
+ task :compare_quality do
56
52
  require "dhash"
53
+ require "phamilie"
54
+ phamilie = Phamilie.new
57
55
  require_relative "lib/dhash-vips"
58
56
  require "mll"
59
- [
60
- [Dhash, :calculate, :hamming],
61
- [DHashVips::DHash, :calculate, :hamming],
62
- [DHashVips::IDHash, :fingerprint, :distance],
63
- [DHashVips::IDHash, :fingerprint, :distance, 4],
64
- ].each do |m, calc, dm, power|
65
- puts "\n#{m} #{power}"
66
- hashes = %w{
67
- 71662d4d4029a3b41d47d5baf681ab9a.jpg
68
- ad8a37f872956666c3077a3e9e737984.jpg
69
-
70
- 6d97739b4a08f965dc9239dd24382e96.jpg
71
- 1b1d4bde376084011d027bba1c047a4b.jpg
72
-
73
- 1d468d064d2e26b5b5de9a0241ef2d4b.jpg
74
- 92d90b8977f813af803c78107e7f698e.jpg
75
-
76
- 309666c7b45ecbf8f13e85a0bd6b0a4c.jpg
77
- 3f9f3db06db20d1d9f8188cd753f6ef4.jpg
78
- df0a3b93e9412536ee8a11255f974141.jpg
79
- 679634ff89a31279a39f03e278bc9a01.jpg
80
- }.map{ |filename| m.public_send calc, "images/#{filename}", *power }
81
- table = MLL::table[m.method(dm), [hashes], [hashes]]
82
- # require "pp"
83
- # pp table
84
- array = Array.new(5){ [] }
85
- hashes.size.times.to_a.repeated_combination(2) do |i, j|
86
- array[i == j ? 0 : (j - i).abs == 1 && (i + j - 1) % 4 == 0 ? [i, j] == [0, 1] ? 1 : [i, j] == [2, 3] ? 2 : 3 : 4].push table[i][j]
87
- end
88
- # p array.map &:sort
89
- puts "Absolutely the same image: #{array[0].minmax.join ".."}"
90
- puts "Complex B/W and the same but colorful: #{array[1][0]}"
91
- puts "Similar images: #{array[3].minmax.join ".."}"
92
- puts "Different images: #{[*array[2], *array[4]].minmax.join ".."}"
93
- end
57
+
58
+ puts MLL::grid.call( [
59
+ ["", "The same image:", "'Jordan Voth case':", "Similar images:", "Different images:", "1/FMI^2 =", "FP, FN ="],
60
+ *[
61
+ [Dhash, :calculate, :hamming],
62
+ [phamilie, :fingerprint, :distance, nil, 0],
63
+ [DHashVips::DHash, :calculate, :hamming],
64
+ [DHashVips::IDHash, :fingerprint, :distance],
65
+ [DHashVips::IDHash, :fingerprint, :distance, 4],
66
+ ].map do |m, calc, dm, power, ii|
67
+ require_relative "common"
68
+ hashes = %w{
69
+ 71662d4d4029a3b41d47d5baf681ab9a.jpg ad8a37f872956666c3077a3e9e737984.jpg
70
+
71
+ 1b1d4bde376084011d027bba1c047a4b.jpg 6d97739b4a08f965dc9239dd24382e96.jpg
72
+
73
+ 1d468d064d2e26b5b5de9a0241ef2d4b.jpg 92d90b8977f813af803c78107e7f698e.jpg
74
+ 309666c7b45ecbf8f13e85a0bd6b0a4c.jpg 3f9f3db06db20d1d9f8188cd753f6ef4.jpg
75
+ 679634ff89a31279a39f03e278bc9a01.jpg df0a3b93e9412536ee8a11255f974141.jpg
76
+ 54192a3f65bd03163b04849e1577a40b.jpg 6d32f57459e5b79b5deca2a361eb8c6e.jpg
77
+ 4b62e0eef58bfbc8d0d2fbf2b9d05483.jpg b8eb0ca91855b657f12fb3d627d45c53.jpg
78
+ 21cd9a6986d98976b6b4655e1de7baf4.jpg 9b158c0d4953d47171a22ed84917f812.jpg
79
+ 9c2c240ec02356472fb532f404d28dde.jpg fc762fa286489d8afc80adc8cdcb125e.jpg
80
+ 7a833d873f8d49f12882e86af1cc6b79.jpg ac033cf01a3941dd1baa876082938bc9.jpg
81
+ }.map(&method(:download_and_keep)).map{ |filename| [filename, m.public_send(calc, filename, *power)] }
82
+ table = MLL::table[m.method(dm), [hashes.map{|_|_[ii||1]}], [hashes.map{|_|_[ii||1]}]]
83
+ report = Struct.new(:same, :bw, :sim, :not_sim).new [], [], [], []
84
+ hashes.size.times.to_a.repeated_combination(2) do |i, j|
85
+ report[
86
+ case
87
+ when i == j ; :same
88
+ when [i, j] == [0, 1] ; :bw
89
+ when i > 3 && i + 1 == j && i % 2 == 0 ; :sim
90
+ else ; :not_sim
91
+ end
92
+ ].push table[i][j]
93
+ end
94
+ min, max = [*report.sim, *report.not_sim].minmax
95
+ fmi, fp, fn = (min..max+1).map do |b|
96
+ fp = report.not_sim.count{ |_| _ < b }
97
+ tp = report.sim.count{ |_| _ < b }
98
+ fn = report.sim.count{ |_| _ >= b }
99
+ [((tp + fp) * (tp + fn)).fdiv(tp * tp), fp, fn]
100
+ end.reject{ |_,| _.nan? }.min_by(&:first)
101
+ [
102
+ "#{m.is_a?(Module) ? m : m.class}#{"(#{power})" if power}",
103
+ report.same. minmax.join(".."),
104
+ report.bw[0],
105
+ report.sim. minmax.join(".."),
106
+ report.not_sim.minmax.join(".."),
107
+ fmi.round(3),
108
+ [fp, fn]
109
+ ]
110
+ end,
111
+ ].transpose, spacings: [1.5, 0], alignment: :right )
94
112
  end
95
113
 
96
114
  # ruby -c Rakefile && rm -f ab.png && rake compare_images -- fc762fa286489d8afc80adc8cdcb125e.jpg 9c2c240ec02356472fb532f404d28dde.jpg 2>/dev/null && ql ab.png
97
115
  # rm -f ab.png && ./ruby `rbenv which rake` compare_images -- 6d97739b4a08f965dc9239dd24382e96.jpg 1b1d4bde376084011d027bba1c047a4b.jpg 2>/dev/null && ql ab.png
98
116
  desc "Visualizes the IDHash difference measurement between two images"
99
117
  task :compare_images do |_|
100
- abort "there should be two image filenames passed as arguments (and optionally the `power`)" unless (4..5) === ARGV.size
101
- abort "the optional argument should be either 3 or 4" unless [3, 4].include?(power = (ARGV[4] || 3).to_i)
118
+ abort "there should be two image filenames passed as arguments (and optionally the `power`)" unless (3..4) === ARGV.size
119
+ abort "the optional argument should be either 3 or 4" unless [3, 4].include?(power = (ARGV[3] || 3).to_i)
102
120
  task ARGV.last do ; end
103
121
  require_relative "lib/dhash-vips"
104
- ha, hb = ARGV[2, 2].map{ |filename| DHashVips::IDHash.fingerprint(filename, power) }
122
+ ha, hb = ARGV[1, 2].map{ |filename| DHashVips::IDHash.fingerprint(filename, power) }
105
123
  puts "distance: #{DHashVips::IDHash.distance ha, hb}"
106
124
  size = 2 ** power
107
125
  shift = 2 * size * size
@@ -110,7 +128,7 @@ task :compare_images do |_|
110
128
  bi = hb >> shift
111
129
  bd = hb - (bi << shift)
112
130
 
113
- a, b = ARGV[2, 2].map do |filename|
131
+ a, b = ARGV[1, 2].map do |filename|
114
132
  image = Vips::Image.new_from_file filename
115
133
  image = image.resize(size.fdiv(image.width), vscale: size.fdiv(image.height)).colourspace("b-w").
116
134
  resize(100, vscale: 100, kernel: :nearest).colourspace("srgb")
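The `compare_images` task splits each IDHash fingerprint into two parts:

* an upper "importance" part, `hb >> shift`;
* a lower difference part, `hb - (bi << shift)`, where `shift = 2 * size * size`.

Under that layout, the distance counts the bits that differ and that at least one fingerprint marks as important. This is a sketch based only on that decomposition, not the gem's `distance3` code, and `idhash_split` and `idhash_distance` are hypothetical names.

```ruby
# Hypothetical helpers illustrating the fingerprint layout used by the
# `compare_images` task: importance bits in the upper half, difference bits below.
def idhash_split(fingerprint, power = 3)
  size = 2**power          # 8 for power = 3, 16 for power = 4
  shift = 2 * size * size  # horizontal + vertical difference bits
  importance = fingerprint >> shift
  difference = fingerprint - (importance << shift)
  [importance, difference]
end

# Count the bits that differ between the two fingerprints AND are marked
# important in at least one of them.
def idhash_distance(a, b, power = 3)
  ai, ad = idhash_split(a, power)
  bi, bd = idhash_split(b, power)
  ((ai | bi) & (ad ^ bd)).to_s(2).count("1")
end

a = (0b1010 << 128) | 0b1100
b = (0b0001 << 128) | 0b0101
puts idhash_distance(a, b)
```

For power = 3 the same count can be written without splitting, as `((a ^ b) & ((a | b) >> 128)).to_s(2).count("1")`. This works because the shifted mask only covers the lower 128 bits.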
@@ -187,10 +205,11 @@ task :compare_images do |_|
187
205
  a.join(b, :horizontal, shim: 15).write_to_file "ab.png"
188
206
  end
189
207
 
190
- # ./ruby `rbenv which rake` compare_speed
191
- desc "Benchmarks Dhash, DHashVips::DHash and DHashVips::IDHash"
208
+ desc "Benchmark speed of Dhash, DHashVips::DHash, DHashVips::IDHash and Phamilie"
192
209
  task :compare_speed do
193
210
  require "dhash"
211
+ require "phamilie"
212
+ phamilie = Phamilie.new
194
213
  require_relative "lib/dhash-vips"
195
214
 
196
215
  filenames = %w{
@@ -214,36 +233,132 @@ task :compare_speed do
214
233
  end
215
234
 
216
235
  require "benchmark"
217
- puts "load and calculate the fingerprint:"
236
+ puts "load the image and calculate the fingerprint:"
218
237
  hashes = []
219
238
  Benchmark.bm 19 do |bm|
220
239
  [
221
240
  [Dhash, :calculate],
241
+ [phamilie, :fingerprint],
222
242
  [DHashVips::DHash, :calculate],
223
243
  [DHashVips::IDHash, :fingerprint],
224
244
  [DHashVips::IDHash, :fingerprint, 4],
225
245
  ].each do |m, calc, power|
226
- bm.report "#{m} #{power}" do
246
+ bm.report "#{m.is_a?(Module) ? m : m.class} #{power}" do
227
247
  hashes.push filenames.map{ |filename| m.send calc, filename, *power }
228
248
  end
229
249
  end
230
250
  end
231
- hashes[-1, 1] = hashes[-2, 2] # for `distance` and `distance3` we use the same hashes
232
- puts "\nmeasure the distance (1000 times):"
233
- Benchmark.bm 29 do |bm|
251
+
252
+ # for `distance`, `distance3_ruby` and `distance3_c` we use the same hashes
253
+ # this array manipulation converts [1, 2, 3, 4, 5] into [1, 2, 3, 4, 4, 4, 5]
254
+ hashes[-1, 1] = hashes[-2, 2]
255
+ hashes[-1, 1] = hashes[-2, 2]
256
+
257
+ puts "\nmeasure the distance (32*32*2000 times):"
258
+ Benchmark.bm 32 do |bm|
234
259
  [
235
260
  [Dhash, :hamming],
261
+ [phamilie, :distance, nil, 1],
236
262
  [DHashVips::DHash, :hamming],
237
263
  [DHashVips::IDHash, :distance],
238
- [DHashVips::IDHash, :distance3],
264
+ [DHashVips::IDHash, :distance3_ruby],
265
+ [DHashVips::IDHash, :distance3_c],
239
266
  [DHashVips::IDHash, :distance, 4],
240
- ].zip(hashes) do |(m, dm, power), hs|
241
- bm.report "#{m} #{dm} #{power}" do
242
- hs.product hs do |h1, h2|
243
- 1000.times{ m.public_send dm, h1, h2 }
267
+ ].zip(hashes) do |(m, dm, power, ii), hs|
268
+ bm.report "#{m.is_a?(Module) ? m : m.class} #{dm} #{power}" do
269
+ _ = [hs, filenames][ii || 0]
270
+ _.product _ do |h1, h2|
271
+ 2000.times{ m.public_send dm, h1, h2 }
244
272
  end
245
273
  end
246
274
  end
247
275
  end
248
276
 
249
277
  end
278
+
279
+ desc "Benchmarks everything about Dhash, DHashVips::DHash, DHashVips::IDHash and Phamilie"
280
+ task :benchmark do
281
+ abort "provide a folder with images grouped by similarity" unless 2 === ARGV.size
282
+ abort "invalid folder provided" unless Dir.exist?(dir = ARGV.last)
283
+
284
+ require "dhash"
285
+ require "phamilie"
286
+ phamilie = Phamilie.new
287
+ require_relative "lib/dhash-vips"
288
+
289
+ filenames = Dir.glob("#{dir}/*").map{ |_| Dir.glob "#{_}/*" }
290
+ puts "image groups sizes: #{filenames.map(&:size)}"
291
+ require "benchmark"
292
+
293
+ puts "step 1 / 3 (fingerprinting)"
294
+ hashes = []
295
+ bm1 = [
296
+ [phamilie, :fingerprint],
297
+ [Dhash, :calculate],
298
+ [DHashVips::DHash, :calculate],
299
+ [DHashVips::IDHash, :fingerprint],
300
+ ].map do |m, calc, power|
301
+ Benchmark.realtime do
302
+ hashes.push filenames.flatten.map{ |filename| m.send calc, filename, *power }
303
+ end
304
+ end
305
+
306
+ puts "step 2 / 3 (comparing fingerprints)"
307
+ combs = filenames.flatten.size ** 2
308
+ n = 10_000_000_000_000 / Dir.glob("#{dir}/*/*").map(&File.method(:size)).inject(:+) / combs
309
+ bm2 = [
310
+ [phamilie, :distance, nil, filenames.flatten],
311
+ [Dhash, :hamming],
312
+ [DHashVips::DHash, :hamming],
313
+ [DHashVips::IDHash, :distance3_c],
314
+ ].zip(hashes).map do |(m, dm, power, ii), hs|
315
+ Benchmark.realtime do
316
+ _ = ii || hs
317
+ _.product _ do |h1, h2|
318
+ n.times{ m.public_send dm, h1, h2 }
319
+ end
320
+ end
321
+ end
322
+
323
+ puts "step 3 / 3 (looking for the best threshold)"
324
+ bm3 = [
325
+ [phamilie, :fingerprint, :distance, nil, 0],
326
+ [Dhash, :calculate, :hamming],
327
+ [DHashVips::DHash, :calculate, :hamming],
328
+ [DHashVips::IDHash, :fingerprint, :distance],
329
+ ].map do |m, calc, dm, power, ii|
330
+ require_relative "common"
331
+ hashes = Dir.glob("#{dir}/*").flat_map{ |_| Dir.glob "#{_}/*" }.map{ |filename| [filename, m.public_send(calc, filename, *power)] }
332
+ report = Struct.new(:same, :sim, :not_sim).new [], [], []
333
+ hashes.size.times.to_a.repeated_combination(2) do |i, j|
334
+ report[
335
+ case
336
+ when i == j ; :same
337
+ when File.split(File.split(hashes[i][0]).first).last ==
338
+ File.split(File.split(hashes[j][0]).first).last && i < j ; :sim
339
+ else ; :not_sim
340
+ end
341
+ ].push m.method(dm).call hashes[i][ii||1], hashes[j][ii||1]
342
+ end
343
+ min, max = [*report.sim, *report.not_sim].minmax
344
+ fmi, fp, fn = (min..max+1).map do |b|
345
+ fp = report.not_sim.count{ |_| _ < b }
346
+ tp = report.sim.count{ |_| _ < b }
347
+ fn = report.sim.count{ |_| _ >= b }
348
+ [((tp + fp) * (tp + fn)).fdiv(tp * tp), fp, fn]
349
+ end.reject{ |_,| _.nan? }.min_by(&:first)
350
+ fmi
351
+ end
352
+
353
+ puts RUBY_DESCRIPTION
354
+ system "vips -v"
355
+ system "identify -version | /usr/bin/head -1"
356
+ system "sysctl -n machdep.cpu.brand_string"
357
+ require "mll"
358
+ puts MLL::grid.call %w{ \ Fingerprint Compare 1/FMI^2 }.zip(*[
359
+ %w{ Phamilie Dhash DHash IDHash },
360
+ *[bm1, bm2].map{ |bm| bm.map{ |_| "%.3f" % _ } },
361
+ bm3.map{ |_| "%.3f" % _ }
362
+ ].transpose).transpose, spacings: [1.5, 0], alignment: :right
363
+ puts "(lower numbers are better)"
364
+ end
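The new `benchmark` task expects a folder whose sub-folders each hold a group of similar images. A pair of images counts as "similar" when both files sit in the same sub-folder, because `File.split(File.split(path).first).last` is that sub-folder's name. Below is a minimal sketch of that labelling. The helper name `label_pairs` is hypothetical, and the sketch skips the same-image pairs that the task also collects.

```ruby
# Hypothetical helper: label each unordered pair of distinct image paths
# as :sim (same parent folder) or :not_sim (different parent folders).
def label_pairs(paths)
  group = ->(path) { File.basename(File.dirname(path)) }
  paths.combination(2).map do |a, b|
    [a, b, group.(a) == group.(b) ? :sim : :not_sim]
  end
end

pairs = label_pairs(%w[images/cats/1.jpg images/cats/2.jpg images/dogs/3.jpg])
pairs.each { |a, b, label| puts "#{a} #{b} #{label}" }
```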
@@ -0,0 +1,11 @@
1
+ def download_and_keep image # returns path
2
+ require "open-uri"
3
+ require "digest"
4
+ File.join(FileUtils.mkdir_p(File.expand_path "images", __dir__()).first, image).tap do |path|
5
+ open("https://storage.googleapis.com/dhash-vips.nakilon.pro/#{image}") do |link|
6
+ File.open(path, "wb") do |file|
7
+ IO.copy_stream link, file
8
+ end
9
+ end unless File.exist?(path) && Digest::MD5.file(path) == File.basename(image, ".jpg")
10
+ end
11
+ end
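The new `download_and_keep` helper skips the download only when a local copy exists and its MD5 digest matches the file name, since the test images are named by their MD5 hex digest. Comparing a `Digest::MD5` object with a String compares against its hex digest, so the check can be isolated as shown below. The name `cached_image_valid?` is hypothetical. Note that on Ruby 3.0 and later, `open` with a URL no longer goes through open-uri, so a port of the helper to a newer Ruby would need `URI.open` instead.

```ruby
require "digest"

# Hypothetical helper: a cached image is valid when the file exists and its
# MD5 hex digest equals its base name without the ".jpg" extension.
def cached_image_valid?(path)
  File.exist?(path) &&
    Digest::MD5.file(path).hexdigest == File.basename(path, ".jpg")
end

puts cached_image_valid?("images/71662d4d4029a3b41d47d5baf681ab9a.jpg")
```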
@@ -1,20 +1,32 @@
1
1
  Gem::Specification.new do |spec|
2
2
  spec.name = "dhash-vips"
3
- spec.version = "0.0.5.1"
3
+ spec.version = "0.1.1.0"
4
4
  spec.author = "Victor Maslov"
5
5
  spec.email = "nakilon@gmail.com"
6
- spec.summary = "dHash and IDHash powered by Vips"
6
+ spec.summary = "dHash and IDHash perceptual image hashing/fingerprinting"
7
7
  spec.homepage = "https://github.com/nakilon/dhash-vips"
8
8
  spec.license = "MIT"
9
9
 
10
- spec.test_files = ["spec"]
11
- spec.files = `git ls-files -z`.split("\x0") - spec.test_files
12
10
  spec.require_path = "lib"
11
+ spec.test_files = %w{ test.rb }
12
+ spec.extensions = %w{ extconf.rb }
13
+ spec.files = `git ls-files -z`.split("\x0") -
14
+ spec.test_files -
15
+ %w{ .gitignore Dockerfile } -
16
+ Dir.glob("example_*/**/*") -
17
+ Dir.glob(".github/**/*")
13
18
 
14
- spec.add_dependency "ruby-vips"
19
+ spec.add_dependency "ruby-vips", "~>2.0.16"
15
20
 
16
21
  spec.add_development_dependency "rake"
17
- spec.add_development_dependency "rspec-core"
22
+ spec.add_development_dependency "minitest"
23
+
24
+ spec.add_development_dependency "rmagick", "~>2.16"
25
+ spec.add_development_dependency "phamilie"
18
26
  spec.add_development_dependency "dhash"
27
+
19
28
  spec.add_development_dependency "get_process_mem"
29
+
30
+ spec.add_development_dependency "mll"
31
+ spec.add_development_dependency "byebug"
20
32
  end
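The gemspec now pins ruby-vips with the pessimistic operator `~>2.0.16`. RubyGems reads this as `>= 2.0.16` and `< 2.1`, so only patch-level upgrades are accepted. `Gem::Requirement` can confirm this:

```ruby
require "rubygems"

# "~>2.0.16" means ">= 2.0.16" and "< 2.1": only the last segment may grow.
req = Gem::Requirement.new("~>2.0.16")
allowed = %w{ 2.0.15 2.0.16 2.0.99 2.1.0 }.map do |v|
  [v, req.satisfied_by?(Gem::Version.new(v))]
end.to_h
p allowed  # 2.0.16 and 2.0.99 are allowed; 2.0.15 and 2.1.0 are not
```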
@@ -0,0 +1,36 @@
1
+ require "mkmf"
2
+
3
+ File.write "Makefile", dummy_makefile(?.).join
4
+ unless Gem::Platform.local.os == "darwin" && ENV["RBENV_ROOT"] && ENV["RBENV_VERSION"]
5
+ else
6
+ if Gem::Version.new(RUBY_VERSION) < Gem::Version.new("2.3.8") ||
7
+ Gem::Version.new(RUBY_VERSION) > Gem::Version.new("2.4.9")
8
+ else
9
+ if Gem::Version.new(RUBY_VERSION) < Gem::Version.new("2.4")
10
+ else
11
+ append_cppflags "-DRUBY_EXPORT"
12
+ end
13
+ # https://github.com/rbenv/rbenv/issues/1199
14
+ append_cppflags "-I#{Dir.glob("#{ENV["RBENV_ROOT"]}/sources/#{ENV["RBENV_VERSION"]}/ruby-*/").first}"
15
+ create_makefile "idhash"
16
+ # Why this hack?
17
+ # 1. Because I want to use Ruby and ./idhash.bundle for tests, not C.
18
+ # 2. Because I don't want to bother users with two gems instead of one.
19
+ File.write "Makefile", <<~HEREDOC + File.read("Makefile")
20
+ .PHONY: test
21
+ test: all
22
+ \t$(RUBY) -r./lib/dhash-vips.rb ./lib/dhash-vips-post-install-test.rb
23
+ HEREDOC
24
+ end
25
+ end
26
+
27
+ # Cases to check:
28
+ # 0. everything is ok
29
+ # `rm -rf idhash.o idhash.bundle idhash.so pkg && bundle exec rake install`
30
+ # `bundle exec rake -rdhash-vips -e "p DHashVips::IDHash.method(:distance3).source_location"` # => # ["/Users/nakilon/_/dhash-vips/lib/dhash-vips.rb", 32] # currently falsely says that gem install failed idk why
31
+ # `rm -f idhash.o idhash.bundle idhash.so Makefile && ruby extconf.rb && make`
32
+ # `bundle exec rake -rdhash-vips -e "p DHashVips::IDHash.method(:distance3).source_location"` # => # ["/Users/nakilon/_/dhash-vips/lib/dhash-vips.rb", 53]
33
+ # 1. not macOS && rbenv
34
+ # 2. fail during append_cppflags
35
+ # 3. failed compilation
36
+ # 4. failed tests
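The nested `unless`/`if` in `extconf.rb` reduces to one rule. The native `idhash` extension is compiled only on macOS under rbenv with Ruby 2.3.8 through 2.4.9. In every other case the dummy Makefile is kept, `require_relative "../idhash.bundle"` raises `LoadError`, and `distance3` stays aliased to the pure-Ruby `distance3_ruby`. Below is a sketch of that decision as a plain function. `idhash_build_flags` is a hypothetical name, `/opt/rbenv` is a placeholder path, and the `-I` flag pointing at the rbenv source tree is left out.

```ruby
require "rubygems"

# Hypothetical function mirroring extconf.rb: returns nil when only the dummy
# Makefile is written, otherwise the extra -D flags used for the C build
# (the -I flag for the rbenv Ruby sources is omitted in this sketch).
def idhash_build_flags os, env, ruby_version
  return nil unless os == "darwin" && env["RBENV_ROOT"] && env["RBENV_VERSION"]
  v = Gem::Version.new ruby_version
  return nil if v < Gem::Version.new("2.3.8") || v > Gem::Version.new("2.4.9")
  v < Gem::Version.new("2.4") ? [] : ["-DRUBY_EXPORT"]
end

rbenv = { "RBENV_ROOT" => "/opt/rbenv", "RBENV_VERSION" => "2.4.1" }  # placeholder values
p idhash_build_flags("darwin", rbenv, "2.3.8")  # []
p idhash_build_flags("darwin", rbenv, "2.4.1")  # ["-DRUBY_EXPORT"]
p idhash_build_flags("linux", rbenv, "2.4.1")   # nil
p idhash_build_flags("darwin", {}, "2.4.1")     # nil
p idhash_build_flags("darwin", rbenv, "2.5.0")  # nil
```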
@@ -0,0 +1,29 @@
1
+ #include <bignum.c>
2
+
3
+ static VALUE idhash_distance(VALUE self, VALUE a, VALUE b){
4
+ BDIGIT* tempd;
5
+ long i, an = BIGNUM_LEN(a), bn = BIGNUM_LEN(b), templ, acc = 0;
6
+ BDIGIT* as = BDIGITS(a);
7
+ BDIGIT* bs = BDIGITS(b);
8
+ while (0 < an && as[an-1] == 0) an--; // for (i = an; --i;) printf("%u\n", as[i]);
9
+ while (0 < bn && bs[bn-1] == 0) bn--; // for (i = bn; --i;) printf("%u\n", bs[i]);
10
+ // printf("%lu %lu\n", an, bn);
11
+ if (an < bn) {
12
+ tempd = as; as = bs; bs = tempd;
13
+ templ = an; an = bn; bn = templ;
14
+ }
15
+ for (i = an; i-- > 4;) {
16
+ // printf("%ld : (%u | %u) & (%u ^ %u)\n", i, as[i], (i >= bn ? 0 : bs[i]), as[i-4], bs[i-4]);
17
+ acc += __builtin_popcountl((as[i] | (i >= bn ? 0 : bs[i])) & (as[i-4] ^ bs[i-4]));
18
+ // printf("%ld : %ld\n", i, acc);
19
+ }
20
+ RB_GC_GUARD(a);
21
+ RB_GC_GUARD(b);
22
+ return INT2FIX(acc);
23
+ }
24
+
25
+ void Init_idhash() {
26
+ VALUE m = rb_define_module("DHashVips");
27
+ VALUE mm = rb_define_module_under(m, "IDHash");
28
+ rb_define_module_function(mm, "distance3_c", idhash_distance, 2);
29
+ }
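In `lib/dhash-vips.rb`, the IDHash distance is the one-liner `((a ^ b) & (a | b) >> 128).to_s(2).count "1"`. The upper 128 bits of either fingerprint act as a mask over the lower 128 bits of their XOR. `idhash_distance` accumulates the same popcount digit by digit, pairing digit `i` of the upper half with digit `i - 4` of the lower half. That pairing is consistent with 32-bit bignum digits, which is an assumption here because MRI's digit width is platform-dependent. Below is a pure-Ruby model of that loop, checked against the one-liner on random 256-bit inputs. `distance3_words` is a hypothetical name.

```ruby
# One-liner from lib/dhash-vips.rb (operator precedence: >> binds tighter than &).
def distance3_ruby a, b
  ((a ^ b) & (a | b) >> 128).to_s(2).count "1"
end

# Hypothetical model of the C loop, assuming 32-bit digits stored least significant first:
# digits 4..7 hold the upper 128 bits, digits 0..3 the lower 128 bits.
def distance3_words a, b
  digits = ->(n) { Array.new(8) { |i| (n >> (32 * i)) & 0xFFFF_FFFF } }
  as, bs = digits[a], digits[b]
  (4..7).inject(0) do |acc, i|
    acc + ((as[i] | bs[i]) & (as[i - 4] ^ bs[i - 4])).to_s(2).count("1")
  end
end

srand 42
mismatches = Array.new(1000) { [rand(1 << 256), rand(1 << 256)] }.reject do |a, b|
  distance3_words(a, b) == distance3_ruby(a, b)
end
puts "mismatches: #{mismatches.size}"
```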
@@ -0,0 +1,48 @@
1
+ puts "Testing native extension..."
2
+
3
+ a, b = 27362028616592833077810614538336061650596602259623245623188871925927275101952, 57097733966917585112089915289446881218887831888508524872740133297073405558528
4
+ f = ->(a,b){ DHashVips::IDHash.distance3_ruby a, b }
5
+
6
+ p as = [a.to_s(16).rjust(64,?0)].pack("H*").unpack("N*")
7
+ p bs = [b.to_s(16).rjust(64,?0)].pack("H*").unpack("N*")
8
+ puts as.zip(bs)[0,4].map{ |i,j| (i | j).to_s(2).rjust(32, ?0) }.zip \
9
+ as.zip(bs)[4,4].map{ |i,j| (i ^ j).to_s(2).rjust(32, ?0) }
10
+ p DHashVips::IDHash.distance3_c a, b
11
+ p f[a, b]
12
+ fail unless 17 == f[a, b]
13
+
14
+ s = [0, 1, 1<<63, (1<<63)+1, (1<<64)-1].each do |_|
15
+ # p [_.to_s(16).rjust(64,?0)].pack("H*").unpack("N*").map{ |_| _.to_s(2).rjust(32, ?0) }
16
+ end
17
+ ss = s.repeated_permutation(4).map do |s1, s2, s3, s4|
18
+ ((s1 << 192) + (s2 << 128) + (s3 << 64) + s4).tap do |_|
19
+ # p [_.to_s(16).rjust(64,?0)].pack("H*").unpack("N*").map{ |_| _.to_s(2).rjust(32, ?0) }
20
+ end
21
+ end
22
+ fail unless :distance3 == DHashVips::IDHash.method(:distance3).original_name
23
+ if Gem::Version.new(RUBY_VERSION) < Gem::Version.new("2.4")
24
+ check = lambda do |s1, s2|
25
+ s1.is_a?(Bignum) && s2.is_a?(Bignum)
26
+ end
27
+ else
28
+ require "rbconfig/sizeof"
29
+ check = lambda do |s1, s2|
30
+ # https://github.com/ruby/ruby/commit/de2f7416d2deb4166d78638a41037cb550d64484#diff-16b196bc6bfe8fba63951420f843cfb4R10
31
+ _FIXNUM_MAX = (1 << (8 * RbConfig::SIZEOF["long"] - 2)) - 1
32
+ s1 > _FIXNUM_MAX && s2 > _FIXNUM_MAX
33
+ end
34
+ end
35
+ ss.product ss do |s1, s2|
36
+ next unless check.call s1, s2
37
+ unless f[s1, s2] == DHashVips::IDHash.distance3_c(s1, s2)
38
+ p [s1, s2]
39
+ p [s1.to_s(16).rjust(64,?0)].pack("H*").unpack("N*").map{ |_| _.to_s(2).rjust(32, ?0) }
40
+ p [s2.to_s(16).rjust(64,?0)].pack("H*").unpack("N*").map{ |_| _.to_s(2).rjust(32, ?0) }
41
+ p [f[s1, s2], DHashVips::IDHash.distance3_c(s1, s2)]
42
+ fail
43
+ end
44
+ end
45
+ 100000.times do
46
+ s1, s2 = Array.new(2){ n = rand 256; ([?0] * n + [?1] * (256 - n)).shuffle.join.to_i 2 }
47
+ fail unless DHashVips::IDHash.distance3(s1, s2) == DHashVips::IDHash.distance3_ruby(s1, s2)
48
+ end
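For its diagnostic output, the post-install test turns each fingerprint into eight big-endian 32-bit words with `[n.to_s(16).rjust(64, ?0)].pack("H*").unpack("N*")`. Below is a small sketch of that conversion together with its inverse, which helps when building or inspecting 256-bit test values word by word. `to_words` and `from_words` are hypothetical helper names.

```ruby
# Split a non-negative integer below 2**256 into eight 32-bit words,
# most significant word first, via a 64-digit hex string.
def to_words n
  [n.to_s(16).rjust(64, "0")].pack("H*").unpack("N*")
end

# Inverse of to_words: fold big-endian 32-bit words back into one integer.
def from_words words
  words.inject(0) { |acc, w| (acc << 32) | w }
end

n = (1 << 255) + (1 << 32) + 7
p to_words(n)                   # [2147483648, 0, 0, 0, 0, 0, 1, 7]
p from_words(to_words(n)) == n  # true
```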
@@ -10,7 +10,7 @@ module DHashVips
10
10
  end
11
11
 
12
12
  def pixelate file, hash_size, kernel = nil
13
- image = Vips::Image.new_from_file file
13
+ image = Vips::Image.new_from_file file, access: :sequential
14
14
  if kernel
15
15
  image.resize((hash_size + 1).fdiv(image.width), vscale: hash_size.fdiv(image.height), kernel: kernel).colourspace("b-w")
16
16
  else
@@ -21,7 +21,7 @@ module DHashVips
21
21
  def calculate file, hash_size = 8, kernel = nil
22
22
  image = pixelate file, hash_size, kernel
23
23
 
24
- image.cast("int").conv([1, -1]).crop(1, 0, hash_size, hash_size).>(0)./(255).cast("uchar").to_a.join.to_i(2)
24
+ image.cast("int").conv([[1, -1]]).crop(1, 0, hash_size, hash_size).>(0)./(255).cast("uchar").to_a.join.to_i(2)
25
25
  end
26
26
 
27
27
  end
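`DHash.calculate` works in four steps:

1. It lets `pixelate` shrink the image to `hash_size + 1` columns by `hash_size` rows of grey pixels.
2. It differences horizontally adjacent pixels with the convolution mask, which this release passes as the 2-D array `[[1, -1]]` (one row of two coefficients).
3. It keeps positive differences as 1 bits.
4. It packs the bits into an integer with `.to_a.join.to_i(2)`.

The same idea on a plain Ruby matrix, independent of libvips, is sketched below. This is the textbook dHash, not a bit-exact port of the vips pipeline. Two details are assumptions: the sign convention (a bit is 1 when a pixel is brighter than its left neighbour) and the bit order. `dhash_bits` and `popcount_distance` are hypothetical names; the latter is a generic Hamming distance, because the body of the gem's own `DHash.hamming` is not part of this diff.

```ruby
# Hypothetical sketch of the classic dHash on a grey matrix of
# hash_size rows by (hash_size + 1) columns (values 0..255).
# Assumption: bit = 1 when a pixel is brighter than its left neighbour,
# bits read row by row, most significant first.
def dhash_bits grey
  grey.flat_map do |row|
    row.each_cons(2).map { |left, right| right > left ? 1 : 0 }
  end.join.to_i(2)
end

# Generic Hamming distance between two fingerprints: number of differing bits.
def popcount_distance a, b
  (a ^ b).to_s(2).count "1"
end

a = dhash_bits [[10, 20, 15], [30, 30, 40]]  # 2x3 toy "image" -> bits 10 01
b = dhash_bits [[10, 20, 25], [30, 30, 40]]  # third pixel brighter -> bits 11 01
p [a.to_s(2), b.to_s(2), popcount_distance(a, b)]  # ["1001", "1101", 1]
```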
@@ -29,8 +29,35 @@ module DHashVips
29
29
  module IDHash
30
30
  extend self
31
31
 
32
- def distance3 a, b
33
- return ((a ^ b) & (a | b) >> 128).to_s(2).count "1"
32
+ def distance3_ruby a, b
33
+ ((a ^ b) & (a | b) >> 128).to_s(2).count "1"
34
+ end
35
+ begin
36
+ require_relative "../idhash.bundle"
37
+ rescue LoadError
38
+ alias_method :distance3, :distance3_ruby
39
+ else
40
+ # we can't just do `defined? Bignum` because it's defined but deprecated (some internal CONST_DEPRECATED flag)
41
+ if Gem::Version.new(RUBY_VERSION) < Gem::Version.new("2.4")
42
+ def distance3 a, b
43
+ if a.is_a?(Bignum) && b.is_a?(Bignum)
44
+ distance3_c a, b
45
+ else
46
+ distance3_ruby a, b
47
+ end
48
+ end
49
+ else
50
+ # https://github.com/ruby/ruby/commit/de2f7416d2deb4166d78638a41037cb550d64484#diff-16b196bc6bfe8fba63951420f843cfb4R10
51
+ require "rbconfig/sizeof"
52
+ FIXNUM_MAX = (1 << (8 * RbConfig::SIZEOF["long"] - 2)) - 1
53
+ def distance3 a, b
54
+ if a > FIXNUM_MAX && b > FIXNUM_MAX
55
+ distance3_c a, b
56
+ else
57
+ distance3_ruby a, b
58
+ end
59
+ end
60
+ end
34
61
  end
35
62
  def distance a, b
36
63
  size_a, size_b = [a, b].map do |x|
@@ -39,6 +66,7 @@ module DHashVips
39
66
  # but also 31, 30 happens for MRI 2.3
40
67
  x.size <= 32 ? 8 : 16
41
68
  end
69
+ return distance3 a, b if [8, 8] == [size_a, size_b]
42
70
  fail "fingerprints were taken with different `power` param: #{size_a} and #{size_b}" if size_a != size_b
43
71
  ((a ^ b) & (a | b) >> 2 * size_a * size_a).to_s(2).count "1"
44
72
  end
@@ -65,8 +93,8 @@ module DHashVips
65
93
 
66
94
  def fingerprint filename, power = 3
67
95
  size = 2 ** power
68
- image = Vips::Image.new_from_file filename
69
- image = image.resize(size.fdiv(image.width), vscale: size.fdiv(image.height)).colourspace("b-w")
96
+ image = Vips::Image.new_from_file filename, access: :sequential
97
+ image = image.resize(size.fdiv(image.width), vscale: size.fdiv(image.height)).colourspace("b-w").flatten
70
98
 
71
99
  array = image.to_a.map &:flatten
72
100
  d1, i1, d2, i2 = [array, array.transpose].flat_map do |a|
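Since Ruby 2.4, `Fixnum` and `Bignum` are unified into `Integer`. The new `distance3` therefore chooses between the C and Ruby paths by comparing both arguments against the largest Fixnum, `(1 << (8 * RbConfig::SIZEOF["long"] - 2)) - 1`, which is 2**62 - 1 where `long` is 64 bits. `idhash_distance` reads bignum digits directly through `BIGNUM_LEN` and `BDIGITS`, so it must only receive values that are stored as bignums. Below is a sketch of that routing decision. `distance3_backend` is a hypothetical name, and the sketch assumes the native extension was loaded.

```ruby
require "rbconfig/sizeof"

# Largest Fixnum: two bits of a C long are reserved (sign and the Fixnum tag).
FIXNUM_MAX = (1 << (8 * RbConfig::SIZEOF["long"] - 2)) - 1

# Hypothetical helper: which implementation the new distance3 would route to,
# assuming idhash.bundle was loaded successfully.
def distance3_backend a, b
  a > FIXNUM_MAX && b > FIXNUM_MAX ? :c : :ruby
end

p distance3_backend(1 << 200, (1 << 255) + 1)  # :c, both arguments are bignums
p distance3_backend(12345, 1 << 255)           # :ruby, the first one fits in a Fixnum
```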
data/test.rb ADDED
@@ -0,0 +1,101 @@
1
+ require "minitest/autorun"
2
+
3
+ require "dhash-vips"
4
+
5
+ # TODO tests about `fingerprint(4)`
6
+
7
+ [
8
+ [DHashVips::DHash, :hamming, :calculate, 2, 23, 18, 50, 4],
9
+ # vips-8.9.1-Tue Jan 28 13:05:46 UTC 2020
10
+ # [[0, 14, 26, 27, 31, 27, 32, 28, 43, 43, 34, 37, 37, 34, 35, 42],
11
+ # [14, 0, 28, 25, 39, 35, 32, 32, 43, 43, 38, 41, 41, 38, 37, 50],
12
+ # [26, 28, 0, 13, 35, 41, 28, 30, 41, 41, 36, 33, 35, 32, 27, 36],
13
+ # [27, 25, 13, 0, 36, 36, 31, 35, 40, 40, 33, 32, 42, 35, 26, 33],
14
+ # [31, 39, 35, 36, 0, 16, 33, 33, 40, 40, 31, 24, 28, 33, 40, 31],
15
+ # [27, 35, 41, 36, 16, 0, 41, 41, 38, 38, 23, 26, 24, 29, 34, 27],
16
+ # [32, 32, 28, 31, 33, 41, 0, 10, 27, 25, 38, 35, 37, 32, 23, 34],
17
+ # [28, 32, 30, 35, 33, 41, 10, 0, 27, 27, 34, 31, 37, 36, 27, 34],
18
+ # [43, 43, 41, 40, 40, 38, 27, 27, 0, 2, 35, 34, 30, 31, 28, 27],
19
+ # [43, 43, 41, 40, 40, 38, 25, 27, 2, 0, 35, 34, 30, 31, 28, 27],
20
+ # [34, 38, 36, 33, 31, 23, 38, 34, 35, 35, 0, 9, 23, 26, 29, 18],
21
+ # [37, 41, 33, 32, 24, 26, 35, 31, 34, 34, 9, 0, 22, 25, 30, 19],
22
+ # [37, 41, 35, 42, 28, 24, 37, 37, 30, 30, 23, 22, 0, 19, 26, 23],
23
+ # [34, 38, 32, 35, 33, 29, 32, 36, 31, 31, 26, 25, 19, 0, 21, 26],
24
+ # [35, 37, 27, 26, 40, 34, 23, 27, 28, 28, 29, 30, 26, 21, 0, 23],
25
+ # [42, 50, 36, 33, 31, 27, 34, 34, 27, 27, 18, 19, 23, 26, 23, 0]]
26
+ [DHashVips::IDHash, :distance, :fingerprint, 6, 22, 23, 65, 0],
27
+ # vips-8.9.1-Tue Jan 28 13:05:46 UTC 2020
28
+ # [[0, 16, 32, 35, 57, 45, 51, 50, 48, 47, 54, 48, 60, 50, 47, 56],
29
+ # [16, 0, 30, 34, 58, 47, 55, 56, 47, 50, 57, 49, 62, 52, 52, 61],
30
+ # [32, 30, 0, 9, 47, 54, 45, 41, 65, 62, 42, 37, 51, 44, 49, 49],
31
+ # [35, 34, 9, 0, 54, 64, 42, 40, 57, 56, 48, 39, 50, 40, 41, 51],
32
+ # [57, 58, 47, 54, 0, 22, 43, 45, 64, 61, 48, 47, 35, 43, 47, 48],
33
+ # [45, 47, 54, 64, 22, 0, 53, 54, 55, 54, 40, 46, 39, 42, 43, 42],
34
+ # [51, 55, 45, 42, 43, 53, 0, 6, 33, 35, 52, 43, 46, 45, 44, 47],
35
+ # [50, 56, 41, 40, 45, 54, 6, 0, 38, 41, 53, 50, 48, 45, 41, 42],
36
+ # [48, 47, 65, 57, 64, 55, 33, 38, 0, 9, 51, 53, 47, 47, 41, 46],
37
+ # [47, 50, 62, 56, 61, 54, 35, 41, 9, 0, 51, 57, 50, 49, 44, 43],
38
+ # [54, 57, 42, 48, 48, 40, 52, 53, 51, 51, 0, 10, 33, 36, 38, 25],
39
+ # [48, 49, 37, 39, 47, 46, 43, 50, 53, 57, 10, 0, 27, 30, 37, 27],
40
+ # [60, 62, 51, 50, 35, 39, 46, 48, 47, 50, 33, 27, 0, 20, 23, 28],
41
+ # [50, 52, 44, 40, 43, 42, 45, 45, 47, 49, 36, 30, 20, 0, 35, 39],
42
+ # [47, 52, 49, 41, 47, 43, 44, 41, 41, 44, 38, 37, 23, 35, 0, 19],
43
+ # [56, 61, 49, 51, 48, 42, 47, 42, 46, 43, 25, 27, 28, 39, 19, 0]]
44
+ ].each do |lib, dm, calc, min_similar, max_similar, min_not_similar, max_not_similar, bw_exceptional|
45
+
46
+ describe lib do
47
+
48
+ # these are false positives by idhash
49
+ # 6d97739b4a08f965dc9239dd24382e96.jpg
50
+ # 1b1d4bde376084011d027bba1c047a4b.jpg
51
+ [
52
+ [ %w{
53
+ 1d468d064d2e26b5b5de9a0241ef2d4b.jpg 92d90b8977f813af803c78107e7f698e.jpg
54
+ 309666c7b45ecbf8f13e85a0bd6b0a4c.jpg 3f9f3db06db20d1d9f8188cd753f6ef4.jpg
55
+ 679634ff89a31279a39f03e278bc9a01.jpg df0a3b93e9412536ee8a11255f974141.jpg
56
+ 54192a3f65bd03163b04849e1577a40b.jpg 6d32f57459e5b79b5deca2a361eb8c6e.jpg
57
+ 4b62e0eef58bfbc8d0d2fbf2b9d05483.jpg b8eb0ca91855b657f12fb3d627d45c53.jpg
58
+ 21cd9a6986d98976b6b4655e1de7baf4.jpg 9b158c0d4953d47171a22ed84917f812.jpg
59
+ 9c2c240ec02356472fb532f404d28dde.jpg fc762fa286489d8afc80adc8cdcb125e.jpg
60
+ 7a833d873f8d49f12882e86af1cc6b79.jpg ac033cf01a3941dd1baa876082938bc9.jpg
61
+ }, min_similar, max_similar], # slightly similar images
62
+ [ %w{
63
+ 71662d4d4029a3b41d47d5baf681ab9a.jpg ad8a37f872956666c3077a3e9e737984.jpg
64
+ }, bw_exceptional, bw_exceptional], # these are the same photo but of different size and colorspace
65
+ ].each do |images, min, max|
66
+
67
+ require "fileutils"
68
+ require "digest"
69
+ require "mll"
70
+
71
+ require_relative "common"
72
+ images = images.map &method(:download_and_keep)
73
+
74
+ hashes = images.map &lib.method(calc)
75
+ table = MLL::table[lib.method(dm), [hashes], [hashes]]
76
+
77
+ require "pp"
78
+ STDERR.puts ""
79
+ PP.pp table, STDERR
80
+ STDERR.puts ""
81
+
82
+ hashes.size.times.to_a.repeated_combination(2) do |i, j|
83
+ it do
84
+ case
85
+ when i == j
86
+ assert_predicate table[i][j], :zero?
87
+ when (j - i).abs == 1 && (i + j - 1) % 4 == 0
88
+ # STDERR.puts [table[i][j], min, max].inspect
89
+ assert_includes min..max, table[i][j]
90
+ else
91
+ # STDERR.puts [table[i][j], min_not_similar, max_not_similar].inspect
92
+ assert_includes min_not_similar..max_not_similar, table[i][j]
93
+ end
94
+ end
95
+ end
96
+
97
+ end
98
+
99
+ end
100
+
101
+ end
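In `test.rb` the images are listed two per line. The condition `(j - i).abs == 1 && (i + j - 1) % 4 == 0` marks exactly the pairs (0, 1), (2, 3), (4, 5) and so on as the expected-similar ones. Because `repeated_combination(2)` yields only `i <= j`, the condition is equivalent to `i.even? && j == i + 1`, as this exhaustive check over the 16-image group shows:

```ruby
# Condition used in test.rb to select the "similar" pairs.
def similar_pair? i, j
  (j - i).abs == 1 && (i + j - 1) % 4 == 0
end

# repeated_combination(2) yields [i, j] with i <= j, as in test.rb.
combos = 16.times.to_a.repeated_combination(2).to_a
pairs = combos.select { |i, j| similar_pair? i, j }
p pairs.first(3)  # [[0, 1], [2, 3], [4, 5]]
p pairs == combos.select { |i, j| i.even? && j == i + 1 }  # true
p pairs.size      # 8
```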
metadata CHANGED
@@ -1,23 +1,37 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: dhash-vips
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.0.5.1
4
+ version: 0.1.1.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Victor Maslov
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2018-03-05 00:00:00.000000000 Z
11
+ date: 2020-07-26 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: ruby-vips
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: 2.0.16
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: 2.0.16
27
+ - !ruby/object:Gem::Dependency
28
+ name: rake
15
29
  requirement: !ruby/object:Gem::Requirement
16
30
  requirements:
17
31
  - - ">="
18
32
  - !ruby/object:Gem::Version
19
33
  version: '0'
20
- type: :runtime
34
+ type: :development
21
35
  prerelease: false
22
36
  version_requirements: !ruby/object:Gem::Requirement
23
37
  requirements:
@@ -25,7 +39,7 @@ dependencies:
25
39
  - !ruby/object:Gem::Version
26
40
  version: '0'
27
41
  - !ruby/object:Gem::Dependency
28
- name: rake
42
+ name: minitest
29
43
  requirement: !ruby/object:Gem::Requirement
30
44
  requirements:
31
45
  - - ">="
@@ -39,7 +53,21 @@ dependencies:
39
53
  - !ruby/object:Gem::Version
40
54
  version: '0'
41
55
  - !ruby/object:Gem::Dependency
42
- name: rspec-core
56
+ name: rmagick
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - "~>"
60
+ - !ruby/object:Gem::Version
61
+ version: '2.16'
62
+ type: :development
63
+ prerelease: false
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - "~>"
67
+ - !ruby/object:Gem::Version
68
+ version: '2.16'
69
+ - !ruby/object:Gem::Dependency
70
+ name: phamilie
43
71
  requirement: !ruby/object:Gem::Requirement
44
72
  requirements:
45
73
  - - ">="
@@ -80,20 +108,52 @@ dependencies:
80
108
  - - ">="
81
109
  - !ruby/object:Gem::Version
82
110
  version: '0'
111
+ - !ruby/object:Gem::Dependency
112
+ name: mll
113
+ requirement: !ruby/object:Gem::Requirement
114
+ requirements:
115
+ - - ">="
116
+ - !ruby/object:Gem::Version
117
+ version: '0'
118
+ type: :development
119
+ prerelease: false
120
+ version_requirements: !ruby/object:Gem::Requirement
121
+ requirements:
122
+ - - ">="
123
+ - !ruby/object:Gem::Version
124
+ version: '0'
125
+ - !ruby/object:Gem::Dependency
126
+ name: byebug
127
+ requirement: !ruby/object:Gem::Requirement
128
+ requirements:
129
+ - - ">="
130
+ - !ruby/object:Gem::Version
131
+ version: '0'
132
+ type: :development
133
+ prerelease: false
134
+ version_requirements: !ruby/object:Gem::Requirement
135
+ requirements:
136
+ - - ">="
137
+ - !ruby/object:Gem::Version
138
+ version: '0'
83
139
  description:
84
140
  email: nakilon@gmail.com
85
141
  executables: []
86
- extensions: []
142
+ extensions:
143
+ - extconf.rb
87
144
  extra_rdoc_files: []
88
145
  files:
89
- - ".gitignore"
90
146
  - Gemfile
91
147
  - LICENSE.txt
92
148
  - README.md
93
149
  - Rakefile
150
+ - common.rb
94
151
  - dhash-vips.gemspec
152
+ - extconf.rb
153
+ - idhash.c
154
+ - lib/dhash-vips-post-install-test.rb
95
155
  - lib/dhash-vips.rb
96
- - spec/_spec.rb
156
+ - test.rb
97
157
  homepage: https://github.com/nakilon/dhash-vips
98
158
  licenses:
99
159
  - MIT
@@ -114,8 +174,9 @@ required_rubygems_version: !ruby/object:Gem::Requirement
114
174
  version: '0'
115
175
  requirements: []
116
176
  rubyforge_project:
117
- rubygems_version: 2.5.2
177
+ rubygems_version: 2.6.14.4
118
178
  signing_key:
119
179
  specification_version: 4
120
- summary: dHash and IDHash powered by Vips
121
- test_files: []
180
+ summary: dHash and IDHash perceptual image hashing/fingerprinting
181
+ test_files:
182
+ - test.rb
data/.gitignore DELETED
@@ -1,22 +0,0 @@
1
- *.gem
2
- *.rbc
3
- .bundle
4
- .config
5
- .yardoc
6
- Gemfile.lock
7
- InstalledFiles
8
- _yardoc
9
- coverage
10
- doc/
11
- lib/bundler/man
12
- pkg
13
- rdoc
14
- spec/reports
15
- test/tmp
16
- test/version_tmp
17
- tmp
18
- *.bundle
19
- *.so
20
- *.o
21
- *.a
22
- mkmf.log
@@ -1,100 +0,0 @@
1
- require "dhash-vips"
2
-
3
- require "pp"
4
-
5
- # TODO tests about `fingerprint(4)`
6
-
7
- [
8
- [DHashVips::DHash, :hamming, :calculate, 17, 18, 22, 39],
9
- # [[0, 17, 29, 27, 22, 29, 30, 29],
10
- # [17, 0, 30, 26, 33, 36, 37, 36],
11
- # [29, 30, 0, 18, 39, 30, 39, 36],
12
- # [27, 26, 18, 0, 35, 30, 35, 34],
13
- # [22, 33, 39, 35, 0, 17, 28, 23],
14
- # [29, 36, 30, 30, 17, 0, 33, 30],
15
- # [30, 37, 39, 35, 28, 33, 0, 5],
16
- # [29, 36, 36, 34, 23, 30, 5, 0]]
17
- [DHashVips::IDHash, :distance, :fingerprint, 15, 23, 28, 64],
18
- # [[0, 16, 30, 32, 46, 58, 43, 43],
19
- # [16, 0, 28, 28, 47, 59, 46, 47],
20
- # [30, 28, 0, 15, 53, 49, 53, 52],
21
- # [32, 28, 15, 0, 56, 53, 61, 64],
22
- # [46, 47, 53, 56, 0, 23, 43, 45],
23
- # [58, 59, 49, 53, 23, 0, 44, 44],
24
- # [43, 46, 53, 61, 43, 44, 0, 0],
25
- # [43, 47, 52, 64, 45, 44, 0, 0]]
26
- ].each do |lib, dm, calc, min_similar, max_similar, min_not_similar, max_not_similar|
27
-
28
- describe lib do
29
-
30
- require "fileutils"
31
- require "open-uri"
32
- require "digest"
33
- require "mll"
34
- example do |example|
35
-
36
- images = %w{
37
- 1d468d064d2e26b5b5de9a0241ef2d4b.jpg
38
- 92d90b8977f813af803c78107e7f698e.jpg
39
- 309666c7b45ecbf8f13e85a0bd6b0a4c.jpg
40
- 3f9f3db06db20d1d9f8188cd753f6ef4.jpg
41
- df0a3b93e9412536ee8a11255f974141.jpg
42
- 679634ff89a31279a39f03e278bc9a01.jpg
43
- } # these images are consecutive pairs of slightly (but enough for nice asserts) similar images
44
- # 6d97739b4a08f965dc9239dd24382e96.jpg
45
- # 1b1d4bde376084011d027bba1c047a4b.jpg
46
- # while these two tend to be false-positive matches by idhash
47
- bw1, bw2 = %w{
48
- 71662d4d4029a3b41d47d5baf681ab9a.jpg
49
- ad8a37f872956666c3077a3e9e737984.jpg
50
- } # these are the same photo but of different size and colorspace
51
-
52
- example.metadata[:extra_failure_lines] = []
53
- FileUtils.mkdir_p dir = "images"
54
- *images, bw1, bw2 = [*images, bw1, bw2].map do |image|
55
- "#{dir}/#{image}".tap do |filename|
56
- unless File.exist?(filename) && Digest::MD5.file(filename) == File.basename(filename, ".jpg")
57
- example.metadata[:extra_failure_lines] << "copying image from web to #{filename}"
58
- open("https://storage.googleapis.com/dhash-vips.nakilon.pro/#{image}") do |link|
59
- File.open(filename, "wb") do |file|
60
- IO.copy_stream link, file
61
- end
62
- end
63
- end
64
- end
65
- end
66
-
67
- hashes = [*images, bw1, bw2].map &described_class.method(calc)
68
- table = MLL::table[described_class.method(dm), [hashes], [hashes]]
69
-
70
- # require "pp"
71
- # pp table
72
- # next
73
-
74
- aggregate_failures do
75
- hashes.size.times.to_a.repeated_combination(2) do |i, j|
76
- case
77
- when i == j
78
- expect(table[i][j]).to eq 0
79
- when (j - i).abs == 1 && (i + j - 1) % 4 == 0
80
- if [i, j] == [hashes.size - 2, hashes.size - 1]
81
- if described_class == DHashVips::DHash
82
- expect(table[i][j]).to be == 5
83
- else
84
- expect(table[i][j]).to eq 0
85
- end
86
- else
87
- expect(table[i][j]).to be_between(min_similar, max_similar).inclusive
88
- end
89
- else
90
- expect(table[i][j]).to be_between(min_not_similar, max_not_similar).inclusive
91
- end
92
- end
93
-
94
- end
95
-
96
- end
97
-
98
- end
99
-
100
- end