dhash-vips 0.0.5.0 → 0.1.0.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 0263e2d1f269153107a10986ba97d2221f1191d6
4
- data.tar.gz: 4e1365a3ab0507e3e14cf8b06ae1f2f7d84482f1
3
+ metadata.gz: 2b90b4abcd617f0e285dd3399338b4114d921624
4
+ data.tar.gz: d47a8e51a8e7a2bf28b60d0259dc350f8b5e4fa0
5
5
  SHA512:
6
- metadata.gz: c00ea6ebead50175bfaa74bf0814a394c1b06159d238716d2edb7e2718587ee0a6c5497ec55823806193b5f46fa1a6ac1b43ba7e277ea468dd2e0ba3c5ea97a7
7
- data.tar.gz: 57d32bfbf327226ff8ba0771f6346d1ccdadbe8b61dcfaa4aa81795976bd3146fce877f9532d6d9fc1a798ba2ffe3f63a1a940aee263f3702a9b48a4c835a6e6
6
+ metadata.gz: 3b8b2ce65f3bdfd90d03e3a069e30c61be4a9bf1fc40f9d4beace8216165383d7f54572e996e101ab9fd6f056b260676a9ed465b8860bd1a6e17a4228b859909
7
+ data.tar.gz: 2ea40198fa4c6ed05183d610b2ca558a6ad67a386e69c127dbc0afac5723686025bd4644cc7297c300d9173b212788cc1ab4f242ad849ed7ee50dbc462dd5f08
@@ -1,6 +1,6 @@
1
1
  The MIT License (MIT)
2
2
 
3
- Copyright (c) 2017 Nakilon
3
+ Copyright (c) 2017, Victor Maslov (nakilon@gmail.com)
4
4
 
5
5
  Permission is hereby granted, free of charge, to any person obtaining a copy
6
6
  of this software and associated documentation files (the "Software"), to deal
data/README.md CHANGED
@@ -1,49 +1,21 @@
1
- [![Gem Version](https://badge.fury.io/rb/dhash-vips.svg)](http://badge.fury.io/rb/dhash-vips)
1
+ [![Gem Version](https://badge.fury.io/rb/dhash-vips.svg)](http://badge.fury.io/rb/dhash-vips) [![Docker Image](https://github.com/nakilon/dhash-vips/workflows/Docker%20Image/badge.svg)](https://hub.docker.com/repository/docker/nakilonishe/dhash-vips/general)
2
2
 
3
3
  # dHash and IDHash gem powered by ruby-vips
4
4
 
5
- The "dHash" is an algorithm of fingerprinting that can be used to measure the similarity of two images.
5
+ The **dHash** is the algorithm of image fingerprinting that can be used to measure the similarity of two images.
6
+ The **IDHash** is the new algorithm that has some improvements over dHash -- I'll describe it further.
6
7
 
7
- You may read about it in "Kind of Like That" blog post (21 January 2013): http://www.hackerfactor.com/blog/index.php?/archives/529-Kind-of-Like-That.html
8
- The original idea is that you split the image into 64 segments and so there are 64 bits -- each tells if the one segment is brighter or darker than the neighbor one. Then the [Hamming distance](https://en.wikipedia.org/wiki/Hamming_distance) between fingerprints is the opposite of images similarity.
8
+ You can read about dHash and perceptual hashing in the article ["Kind of Like That" at "The Hacker Factor Blog"](http://www.hackerfactor.com/blog/index.php?/archives/529-Kind-of-Like-That.html) (21 January 2013). The idea is that you resize the original image to 8x9 and then convert it to an 8x8 array of bits -- each bit tells whether the corresponding segment of the image is brighter or darker than the one on the right (or left). Then you apply the [Hamming distance](https://en.wikipedia.org/wiki/Hamming_distance) to such arrays to measure how different they are.
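The idea described above can be sketched in plain Ruby. This is only an illustration, not the gem's implementation: `dhash_bits` and `hamming` are hypothetical helper names, the input is assumed to be an already resized grid of 8 rows by 9 brightness values, and the choice of "left brighter than right means 1" is an assumption.

```ruby
# Hypothetical helper: builds a 64-bit dHash-style fingerprint from a grid
# of 8 rows with 9 brightness values each (any comparable numbers).
def dhash_bits(grid)
  bits = grid.flat_map do |row|
    # 9 values per row give 8 left/right comparisons
    row.each_cons(2).map { |left, right| left > right ? 1 : 0 }
  end
  # pack the 64 bits into a single Integer
  bits.inject(0) { |acc, bit| (acc << 1) | bit }
end

# Hamming distance: the number of bit positions in which two Integers differ.
def hamming(a, b)
  (a ^ b).to_s(2).count("1")
end

falling = Array.new(8) { (0..8).to_a.reverse }  # brightness falls left to right
rising  = Array.new(8) { (0..8).to_a }          # brightness rises left to right

a = dhash_bits(falling)
b = dhash_bits(rising)
puts hamming(a, a)  # 0
puts hamming(a, b)  # 64
```

Two opposite gradients disagree in every one of the 64 comparisons, so they give the maximum distance, while an image compared with itself gives 0.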
9
9
 
10
- There were several implementations on Github already but they all depend on ImageMagick. My implementation takes an advantage of libvips (the `ruby-vips` gem) -- it also uses the `.conv` method and in result converts image to an array of grayscale bytes almost 10 times faster:
11
- ```
12
- load and calculate the fingerprint:
13
- user system total real
14
- Dhash 12.400000 0.820000 13.220000 ( 13.329952)
15
- DHashVips::DHash 1.330000 0.230000 1.560000 ( 1.509826)
16
- DHashVips::IDHash 1.060000 0.090000 1.150000 ( 1.100332)
17
-
18
- measure the distance (1000 times):
19
- user system total real
20
- Dhash hamming 3.140000 0.020000 3.160000 ( 3.179392)
21
- DHashVips::DHash hamming 3.040000 0.020000 3.060000 ( 3.095190)
22
- DHashVips::IDHash distance 6.720000 0.030000 6.750000 ( 6.790900)
23
- ```
24
-
25
- Here the `Dhash` is [another gem](https://github.com/maccman/dhash) that I used earlier in my projects.
26
- The `DHashVips::DHash` is a port of it that uses vips. I would like to tell you that you can replace the `dhash` with `dhash-vips` gem right now but it appeared to have a barely noticeable issue. There is a lot of magic behind the libvips speed and resizing -- you may not notice it with unarmed eyes but when two neighbor segments are enough similar by luminosity the difference can change the sign. So I found two identical images that were just of different colorspace and size (photo by Jordan Voth):
27
- ![](https://storage.googleapis.com/dhash-vips.nakilon.pro/dhash_issue_example.png)
28
- but the distance between their hashes appeared to be equal to 5 while `dhash` gem reported 0.
29
-
30
- This is why `DHashVips::IDHash` appeared.
10
+ There were several Ruby implementations on GitHub already but they all depended on ImageMagick. My implementation takes advantage of the speed of libvips (the `ruby-vips` gem) -- it fingerprints images much faster. For even more speed the fingerprint comparison function is implemented as a native C extension.
31
11
 
32
12
  ## IDHash (the Important Difference Hash)
33
13
 
34
- It has improvements over the dHash that made fingerprinting less sensitive to the resizing algorithm and effectively made the pair of images mentioned above to have a distance of 0 again. Three improvements are:
14
+ The main improvement over the dHash is what makes it insensitive to the resizing algorithm and possible errors due to color scheme conversion.
15
+
35
16
  * The "Importance" is an array of extra 64 bits that tells the comparing function which half of 64 bits is important (when the difference between neighbors was enough significant) and which is not. So not every bit in a fingerprint is being compared but only half of them.
36
17
  * It subtracts not only horizontally but also vertically -- that adds 128 more bits.
37
- * Instead of resizing to 9x8 it resizes to 8x8 and puts the image on a torus so it subtracts the left column from the right one and the top from bottom.
38
-
39
- You could see in fingerprint calculation benchmark earlier that these improvements didn't make it slower than dHash because most of the time is spent on image resizing. The calculation of distance is what became two times slower:
40
- ```ruby
41
- ((a | b) & ((a ^ b) >> 128)).to_s(2).count "1"
42
- ```
43
- vs
44
- ```ruby
45
- (a ^ b).to_s(2).count "1"
46
- ```
18
+ * Instead of resizing to 8x9 it resizes to 8x8 and puts the image on a torus so it subtracts the very left column from the very right one and the top from the bottom.
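A minimal sketch of the two ideas from the list above, assuming the bit layout used by the `compare_images` rake task (importance bits stored above the difference bits, with a shift of `2 * size * size`). The helper names `wrapped_diff_bits`, `pack` and `masked_distance` are hypothetical, and treating a bit as important when *either* fingerprint marks it so is an assumption of this sketch, not a statement about the gem's internals.

```ruby
SHIFT = 2 * 8 * 8  # 128 difference bits for the 8x8 case (horizontal + vertical)

# Torus-style comparison: every cell is compared with its right neighbour
# and the last column wraps around to the first one.
def wrapped_diff_bits(row)
  row.each_index.map { |i| row[i] > row[(i + 1) % row.size] ? 1 : 0 }
end

# Hypothetical packing: importance bits above difference bits.
def pack(importance, difference)
  (importance << SHIFT) | difference
end

# Counts differing bits only at positions flagged as important.
def masked_distance(a, b)
  mask      = (1 << SHIFT) - 1
  important = (a >> SHIFT) | (b >> SHIFT)  # assumption: OR of both masks
  differing = (a & mask) ^ (b & mask)
  (important & differing).to_s(2).count("1")
end

p wrapped_diff_bits([3, 1, 2])  # [1, 0, 0]

a = pack(0b0011, 0b0101)
b = pack(0b0001, 0b1010)
puts masked_distance(a, b)  # 2
```

All four low difference bits differ here, but only the two lowest are marked as important, so the distance is 2 rather than 4.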
47
19
 
48
20
  ### Example
49
21
 
@@ -65,6 +37,8 @@ Here in each of 64 cells, there are two circles that color the difference betwee
65
37
  * [Finding the closest pair between two sets of points on the hypercube](https://cstheory.stackexchange.com/q/16322/27420)
66
38
  * [Would PCA work for boolean data types?](https://stats.stackexchange.com/q/159705/1125)
67
39
  * [Using pHash to search agaist a huge image database, what is the best approach?](https://stackoverflow.com/q/18257641/322020)
40
+ * [How do I speed up this BIT_COUNT query for hamming distance?](https://stackoverflow.com/q/35065675/322020)
41
+ * [Hamming distance on binary strings in SQL](https://stackoverflow.com/q/4777070/322020)
68
42
 
69
43
  ## Installation
70
44
 
@@ -75,7 +49,7 @@ Then:
75
49
 
76
50
  gem install dhash-vips
77
51
 
78
- If you have troubles with the `gem ruby-vips` dependency, see https://github.com/jcupitt/ruby-vips
52
+ If you have troubles with the `gem ruby-vips` dependency, see https://github.com/libvips/ruby-vips
79
53
 
80
54
  ## Usage
81
55
 
@@ -102,11 +76,11 @@ end
102
76
  ```ruby
103
77
  require "dhash-vips"
104
78
 
105
- hash1 = DHashVips::IDHash.calculate "photo1.jpg"
106
- hash2 = DHashVips::IDHash.calculate "photo2.jpg"
79
+ hash1 = DHashVips::IDHash.fingerprint "photo1.jpg"
80
+ hash2 = DHashVips::IDHash.fingerprint "photo2.jpg"
107
81
 
108
82
  distance = DHashVips::IDHash.distance hash1, hash2
109
- if distance < 10
83
+ if distance < 15
110
84
  puts "Images are very similar"
111
85
  elsif distance < 25
112
86
  puts "Images are slightly similar"
@@ -115,74 +89,134 @@ else
115
89
  end
116
90
  ```
117
91
 
118
- These `10` and `20` numbers are found empirically and just work enough well for 8-byte hashes.
119
- To find out these tresholds we can run a rake task with hardcoded test cases:
120
- ```
121
- $ rake compare_matrices
122
-
123
- Dhash
124
- Absolutely the same image: 0..0
125
- Complex B/W and the same but colorful: 0
126
- Similar images: 13..16
127
- Different images: 9..41
128
-
129
- DHashVips::DHash
130
- Absolutely the same image: 0..0
131
- Complex B/W and the same but colorful: 5
132
- Similar images: 17..18
133
- Different images: 14..39
134
-
135
- DHashVips::IDHash
136
- Absolutely the same image: 0..0
137
- Complex B/W and the same but colorful: 0
138
- Similar images: 15..23
139
- Different images: 19..64
140
-
141
- DHashVips::IDHash 4
142
- Absolutely the same image: 0..0
143
- Complex B/W and the same but colorful: 0
144
- Similar images: 71..108
145
- Different images: 102..211
92
+ ### Notes and benchmarks
146
93
 
147
- ```
94
+ * The above `15` and `25` constants were found empirically and work well enough for 8-byte hashes. To find these thresholds we can run a rake task with hardcoded test cases (pairs of photos from the same photo session are not identical but are considered 'similar' enough for the purpose of this benchmark):
148
95
 
149
- ### Notes
96
+ $ vips -v
97
+ vips-8.9.2-Tue Apr 21 09:26:11 UTC 2020
98
+ $ identify -version | head -1
99
+ Version: ImageMagick 6.9.11-24 Q16 x86_64 2020-07-18 https://imagemagick.org
100
+ $ rake compare_quality
101
+
102
+ Dhash Phamilie DHashVips::DHash DHashVips::IDHash DHashVips::IDHash(4)
103
+ The same image: 0..0 0..0 0..0 0..0 0..0
104
+ 'Jordan Voth case': 2 2 4 0 0
105
+ Similar images: 1..15 14..34 2..23 6..22 53..166
106
+ Different images: 10..54 22..42 10..50 17..65 120..233
107
+ 1/FMI^2 = 1.375 4.0 1.556 1.25 1.306
108
+ FP, FN = [3, 0] [0, 6] [1, 2] [2, 0] [1, 1]
109
+
110
+ The `1/FMI^2` line here is the "quality of algorithm", i.e. the smallest value of `(TP + FP) * (TP + FN) / TP^2` (the inverse square of the ["Fowlkes–Mallows index"](https://en.wikipedia.org/wiki/Fowlkes%E2%80%93Mallows_index)) that you can achieve if you take the "similar" and "different" test pairs and try to draw the threshold line. A smaller number is better. The last line shows the number of false positives (`FP`) and false negatives (`FN`) for the best achieved FMI. Here I've added the [`phamilie` gem](https://github.com/toy/phamilie) that is DCT based (not a kind of dhash).
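The threshold search can be sketched as follows. It mirrors the formula used in the `compare_quality` rake task, but `best_threshold` is a hypothetical name, the distances are made up for illustration, and thresholds without any true positives are simply skipped here (a simplification compared to the Rakefile, which only rejects NaN values).

```ruby
# Returns [1/FMI^2, false positives, false negatives] for the best threshold.
# A pair counts as "similar" when its distance is below the threshold.
def best_threshold(similar, different)
  min, max = (similar + different).minmax
  (min..max + 1).map do |threshold|
    tp = similar.count   { |d| d <  threshold }
    fp = different.count { |d| d <  threshold }
    fn = similar.count   { |d| d >= threshold }
    next if tp.zero?  # the index is undefined without true positives
    [((tp + fp) * (tp + fn)).fdiv(tp * tp), fp, fn]
  end.compact.min_by(&:first)
end

similar   = [6, 10, 22]   # made-up distances between "similar" pairs
different = [17, 30, 65]  # made-up distances between "different" pairs

score, fp, fn = best_threshold(similar, different)
puts "1/FMI^2 = #{score.round(3)}, FP = #{fp}, FN = #{fn}"
# prints "1/FMI^2 = 1.333, FP = 1, FN = 0"
```

With these numbers the best threshold accepts all three similar pairs at the cost of one false positive; a perfect separation would give exactly 1.0.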
150
111
 
151
112
  * Methods were renamed from `#calculate` to `#fingerprint` and from `#hamming` to `#distance`.
152
- * The `DHash#calculate` accepts `hash_size` optional parameter that is 8 by default. The `IDHash#fingerprint`'s optional parameter is called `power` and works in a bit different way: 3 means 8 and 4 means 16 -- other sizes are not supported because they don't seem to be useful (higher fingerprint resolution makes it vulnerable to image shifts and croppings). Because IDHash's fingerprint is more complex than DHash's one it's not that straight forward to compare them so under the hood the `#distance` methods have to check the size of fingerprint -- this trade-off costs 30-40% of speed that can be eliminated by using `#distance3` method that assumes fingerprint to be of power=3. So the full benchmark is this one:
113
+ * The `DHash#calculate` accepts an optional `hash_size` parameter that is 8 by default. The `IDHash#fingerprint`'s optional parameter is called `power` and works a bit differently: 3 means 8 and 4 means 16 -- other sizes are not supported because they don't seem to be useful (higher fingerprint resolution makes it vulnerable to image shifts and croppings, also `#distance` becomes much slower). Because IDHash's fingerprint is more complex than DHash's one it's not that straightforward to compare them so under the hood the `#distance` method has to check the size of the fingerprint. If you are sure that fingerprints were made with power=3 then to skip the check you may use the `#distance3` method directly.
114
+ * The `#distance3` method will use Ruby C extension that is around 15 times faster than pure Ruby implementation -- native extension is currently hardcoded to be compiled only if it's macOS and rbenv Ruby 2.3.8 installed with `-k` flag but if you know how to make the gem gracefully fallback to native Ruby if `make` fails let me know or make a pull request. So the full benchmark:
153
115
 
154
- ```
155
- $ rake compare_speed
156
-
157
- load and calculate the fingerprint:
158
- user system total real
159
- Dhash 12.400000 0.820000 13.220000 ( 13.329952)
160
- DHashVips::DHash 1.330000 0.230000 1.560000 ( 1.509826)
161
- DHashVips::IDHash 1.060000 0.090000 1.150000 ( 1.100332)
162
- DHashVips::IDHash 4 1.030000 0.080000 1.110000 ( 1.089148)
163
-
164
- measure the distance (1000 times):
165
- user system total real
166
- Dhash hamming 3.140000 0.020000 3.160000 ( 3.179392)
167
- DHashVips::DHash hamming 3.040000 0.020000 3.060000 ( 3.095190)
168
- DHashVips::IDHash distance 8.170000 0.040000 8.210000 ( 8.279950)
169
- DHashVips::IDHash distance3 6.720000 0.030000 6.750000 ( 6.790900)
170
- DHashVips::IDHash distance 4 24.430000 0.130000 24.560000 ( 24.652625)
171
- ```
116
+ * Ruby 2.0.0
172
117
 
173
- Also note that to make `#distance` able to assume the fingerprint resolution from the size of Integer that represents it, the change in its structure was needed (left half of bits was swapped with right one), so fingerprints between versions 0.0.4 and 0.0.5 became incompatible, but you probably can convert them manually. I know, incompatibilities suck but if we put the version or structure information inside fingerprint it will became slow to (de)serialize and store.
118
+ $ bundle exec rake compare_speed
174
119
 
175
- ## Troubleshooting
120
+ load the image and calculate the fingerprint:
121
+ user system total real
122
+ Dhash 12.400000 0.820000 13.220000 ( 13.329952)
123
+ DHashVips::DHash 1.330000 0.230000 1.560000 ( 1.509826)
124
+ DHashVips::IDHash 1.060000 0.090000 1.150000 ( 1.100332)
125
+ DHashVips::IDHash 4 1.030000 0.080000 1.110000 ( 1.089148)
176
126
 
177
- El Captain and rbenv may cause environment issues that would make you do things like:
178
- ```
179
- ./ruby `rbenv which rake` compare_matrixes
180
- ```
181
- instead of just
182
- ```
183
- rake compare_matrixes
184
- ```
185
- For more information on that: https://github.com/jcupitt/ruby-vips/issues/141
127
+ measure the distance (32*32*1000 times):
128
+ user system total real
129
+ Dhash hamming 3.140000 0.020000 3.160000 ( 3.179392)
130
+ DHashVips::DHash hamming 3.040000 0.020000 3.060000 ( 3.095190)
131
+ DHashVips::IDHash distance 8.170000 0.040000 8.210000 ( 8.279950)
132
+ DHashVips::IDHash distance3 6.720000 0.030000 6.750000 ( 6.790900)
133
+ DHashVips::IDHash distance 4 24.430000 0.130000 24.560000 ( 24.652625)
134
+
135
+ * Ruby 2.3.3 seems to have some bit arithmetics improvement compared to 2.0:
136
+
137
+ load the image and calculate the fingerprint:
138
+ user system total real
139
+ Dhash 13.110000 0.950000 14.060000 ( 14.537057)
140
+ DHashVips::DHash 1.480000 0.310000 1.790000 ( 1.808787)
141
+ DHashVips::IDHash 1.080000 0.100000 1.180000 ( 1.156446)
142
+ DHashVips::IDHash 4 1.030000 0.090000 1.120000 ( 1.076117)
143
+
144
+ measure the distance (32*32*1000 times):
145
+ user system total real
146
+ Dhash hamming 1.770000 0.010000 1.780000 ( 1.815612)
147
+ DHashVips::DHash hamming 1.810000 0.010000 1.820000 ( 1.875666)
148
+ DHashVips::IDHash distance 4.250000 0.020000 4.270000 ( 4.350071)
149
+ DHashVips::IDHash distance3 3.430000 0.020000 3.450000 ( 3.499031)
150
+ DHashVips::IDHash distance 4 8.210000 0.110000 8.320000 ( 8.510735)
151
+
152
+ * Ruby 2.3.8p459 (2.4.6, 2.5.5 and 2.6.3 are all similar) with newer CPU (`sysctl -n machdep.cpu.brand_string #=> Intel(R) Core(TM) i5-7360U CPU @ 2.30GHz`):
153
+
154
+ load the image and calculate the fingerprint:
155
+ user system total real
156
+ Dhash 6.191731 0.230885 6.422616 ( 6.428763)
157
+ Phamilie 5.361751 0.037524 5.399275 ( 5.402553)
158
+ DHashVips::DHash 0.858045 0.144820 1.002865 ( 0.924308)
159
+ DHashVips::IDHash 0.769975 0.071087 0.841062 ( 0.790470)
160
+ DHashVips::IDHash 4 0.805311 0.077918 0.883229 ( 0.825897)
161
+
162
+ measure the distance (32*32*2000 times):
163
+ user system total real
164
+ Dhash hamming 1.810000 0.000000 1.810000 ( 1.824719)
165
+ Phamilie distance 1.000000 0.010000 1.010000 ( 1.006127)
166
+ DHashVips::DHash hamming 1.810000 0.000000 1.810000 ( 1.817415)
167
+ DHashVips::IDHash distance 1.400000 0.000000 1.400000 ( 1.401333)
168
+ DHashVips::IDHash distance3_ruby 3.320000 0.010000 3.330000 ( 3.337920)
169
+ DHashVips::IDHash distance3_c 0.210000 0.000000 0.210000 ( 0.212864)
170
+ DHashVips::IDHash distance 4 8.300000 0.120000 8.420000 ( 8.499735)
171
+
172
+ * Also note that to make `#distance` able to infer the fingerprint resolution from the size of the Integer that represents it, a change in its structure was needed (the left half of bits was swapped with the right one), so fingerprints between versions 0.0.4 and 0.0.5 became incompatible, but you probably can convert them manually. Otherwise, if we put the version or structure information inside the fingerprint it would become slow to (de)serialize and store.
173
+
174
+ ## Development notes
175
+
176
+ $ ruby test.rb
177
+
178
+ * You might need to prepend `bundle exec` to all the `rake` commands.
179
+
180
+ * You get this:
181
+
182
+ Can't install RMagick 2.16.0. Can't find MagickWand.h.
183
+
184
+ because Imagemagick sucks but we need it to benchmark alternative gems, so:
185
+
186
+ $ brew install imagemagick@6
187
+ $ brew unlink imagemagick@7
188
+ $ brew link imagemagick@6 --force
189
+
190
+ * OS X El Capitan and rbenv may cause environment issues that would make you do things like:
191
+
192
+ $ ./ruby `rbenv which rake` compare_matrixes
193
+
194
+ instead of just
195
+
196
+ $ rake compare_matrixes
197
+
198
+ For more information on that: https://github.com/jcupitt/ruby-vips/issues/141
199
+
200
+ * On macOS, when you do `bundle install` it may fail to install `rmagick` gem (`dhash` gem dependency) saying:
201
+
202
+ ERROR: Can't install RMagick 4.0.0. Can't find magick/MagickCore.h.
203
+
204
+ To resolve this do:
205
+
206
+ $ brew install imagemagick@6
207
+ $ LDFLAGS="-L/usr/local/opt/imagemagick@6/lib" CPPFLAGS="-I/usr/local/opt/imagemagick@6/include" bundle install
208
+
209
+ * If you get `No package 'MagickCore' found` try:
210
+
211
+ $ PKG_CONFIG_PATH="/usr/local/Cellar/imagemagick@6/6.9.10-74/lib/pkgconfig" bundle install
212
+
213
+ * Execute the `rake compare_quality` at least once before executing other rake tasks because it's currently the only one that downloads the test images.
214
+
215
+ * The tag `v0.0.0.4` is not semver and not real gem version -- it's only for Github Actions testing purposes.
216
+
217
+ * To quickly find out what the dhash-vips Docker image includes: `docker run --rm <image_name> sh -c "cat /etc/alpine-release; ruby -v; vips -v; gem list dhash-vips"` (TODO: write in this README about the existing Docker image).
218
+
219
+ * Phamilie works with filenames instead of fingerprints and caches them but not distances.
186
220
 
187
221
  ## Credits
188
222
 
data/Rakefile CHANGED
@@ -1,15 +1,7 @@
1
1
  STDOUT.sync = true
2
+ require "pp"
2
3
 
3
- require "bundler/gem_tasks"
4
-
5
-
6
- task :default => %w{ spec }
7
-
8
- require "rspec/core/rake_task"
9
- RSpec::Core::RakeTask.new :spec do |t|
10
- t.verbose = false
11
- end
12
-
4
+ require "bundler/gem_tasks" # to push to rubygems
13
5
 
14
6
  visualize_hash = lambda do |hash|
15
7
  puts hash.to_s(2).rjust(64, ?0).gsub(/(?<=.)/, '\0 ').scan(/.{16}/)
@@ -50,58 +42,80 @@ task :compare_kernels do |_|
50
42
  end
51
43
  end
52
44
 
53
- # ./ruby `rbenv which rake` compare_matrixes
54
- desc "Compare the quality of Dhash, DHashVips::DHash and DHashVips::IDHash -- run it only after `rake test`"
55
- task :compare_matrices do |_|
45
+ desc "Compare the quality of Dhash, Phamilie, DHashVips::DHash, DHashVips::IDHash"
46
+ # in this test we want to know not that photos are the same but rather that they are from the same photosession
47
+ task :compare_quality do
56
48
  require "dhash"
49
+ require "phamilie"
50
+ phamilie = Phamilie.new
57
51
  require_relative "lib/dhash-vips"
58
52
  require "mll"
59
- [
60
- [Dhash, :calculate, :hamming],
61
- [DHashVips::DHash, :calculate, :hamming],
62
- [DHashVips::IDHash, :fingerprint, :distance],
63
- [DHashVips::IDHash, :fingerprint, :distance, 4],
64
- ].each do |m, calc, dm, power|
65
- puts "\n#{m} #{power}"
66
- hashes = %w{
67
- 71662d4d4029a3b41d47d5baf681ab9a.jpg
68
- ad8a37f872956666c3077a3e9e737984.jpg
69
-
70
- 6d97739b4a08f965dc9239dd24382e96.jpg
71
- 1b1d4bde376084011d027bba1c047a4b.jpg
72
-
73
- 1d468d064d2e26b5b5de9a0241ef2d4b.jpg
74
- 92d90b8977f813af803c78107e7f698e.jpg
75
-
76
- 309666c7b45ecbf8f13e85a0bd6b0a4c.jpg
77
- 3f9f3db06db20d1d9f8188cd753f6ef4.jpg
78
- df0a3b93e9412536ee8a11255f974141.jpg
79
- 679634ff89a31279a39f03e278bc9a01.jpg
80
- }.map{ |filename| m.public_send calc, "images/#{filename}", *power }
81
- table = MLL::table[m.method(dm), [hashes], [hashes]]
82
- # require "pp"
83
- # pp table
84
- array = Array.new(5){ [] }
85
- hashes.size.times.to_a.repeated_combination(2) do |i, j|
86
- array[i == j ? 0 : (j - i).abs == 1 && (i + j - 1) % 4 == 0 ? [i, j] == [0, 1] ? 1 : [i, j] == [2, 3] ? 2 : 3 : 4].push table[i][j]
87
- end
88
- # p array.map &:sort
89
- puts "Absolutely the same image: #{array[0].minmax.join ".."}"
90
- puts "Complex B/W and the same but colorful: #{array[1][0]}"
91
- puts "Similar images: #{array[3].minmax.join ".."}"
92
- puts "Different images: #{[*array[2], *array[4]].minmax.join ".."}"
93
- end
53
+
54
+ puts MLL::grid.call( [
55
+ ["", "The same image:", "'Jordan Voth case':", "Similar images:", "Different images:", "1/FMI^2 =", "FP, FN ="],
56
+ *[
57
+ [Dhash, :calculate, :hamming],
58
+ [phamilie, :fingerprint, :distance, nil, 0],
59
+ [DHashVips::DHash, :calculate, :hamming],
60
+ [DHashVips::IDHash, :fingerprint, :distance],
61
+ [DHashVips::IDHash, :fingerprint, :distance, 4],
62
+ ].map do |m, calc, dm, power, ii|
63
+ require_relative "common"
64
+ hashes = %w{
65
+ 71662d4d4029a3b41d47d5baf681ab9a.jpg ad8a37f872956666c3077a3e9e737984.jpg
66
+
67
+ 1b1d4bde376084011d027bba1c047a4b.jpg 6d97739b4a08f965dc9239dd24382e96.jpg
68
+
69
+ 1d468d064d2e26b5b5de9a0241ef2d4b.jpg 92d90b8977f813af803c78107e7f698e.jpg
70
+ 309666c7b45ecbf8f13e85a0bd6b0a4c.jpg 3f9f3db06db20d1d9f8188cd753f6ef4.jpg
71
+ 679634ff89a31279a39f03e278bc9a01.jpg df0a3b93e9412536ee8a11255f974141.jpg
72
+ 54192a3f65bd03163b04849e1577a40b.jpg 6d32f57459e5b79b5deca2a361eb8c6e.jpg
73
+ 4b62e0eef58bfbc8d0d2fbf2b9d05483.jpg b8eb0ca91855b657f12fb3d627d45c53.jpg
74
+ 21cd9a6986d98976b6b4655e1de7baf4.jpg 9b158c0d4953d47171a22ed84917f812.jpg
75
+ 9c2c240ec02356472fb532f404d28dde.jpg fc762fa286489d8afc80adc8cdcb125e.jpg
76
+ 7a833d873f8d49f12882e86af1cc6b79.jpg ac033cf01a3941dd1baa876082938bc9.jpg
77
+ }.map(&method(:download_and_keep)).map{ |filename| [filename, m.public_send(calc, filename, *power)] }
78
+ table = MLL::table[m.method(dm), [hashes.map{|_|_[ii||1]}], [hashes.map{|_|_[ii||1]}]]
79
+ report = Struct.new(:same, :bw, :sim, :not_sim).new [], [], [], []
80
+ hashes.size.times.to_a.repeated_combination(2) do |i, j|
81
+ report[
82
+ case
83
+ when i == j ; :same
84
+ when [i, j] == [0, 1] ; :bw
85
+ when i > 3 && i + 1 == j && i % 2 == 0 ; :sim
86
+ else ; :not_sim
87
+ end
88
+ ].push table[i][j]
89
+ end
90
+ min, max = [*report.sim, *report.not_sim].minmax
91
+ fmi, fp, fn = (min..max+1).map do |b|
92
+ fp = report.not_sim.count{ |_| _ < b }
93
+ tp = report.sim.count{ |_| _ < b }
94
+ fn = report.sim.count{ |_| _ >= b }
95
+ [((tp + fp) * (tp + fn)).fdiv(tp * tp), fp, fn]
96
+ end.reject{ |_,| _.nan? }.min_by(&:first)
97
+ [
98
+ "#{m.is_a?(Module) ? m : m.class}#{"(#{power})" if power}",
99
+ report.same. minmax.join(".."),
100
+ report.bw[0],
101
+ report.sim. minmax.join(".."),
102
+ report.not_sim.minmax.join(".."),
103
+ fmi.round(3),
104
+ [fp, fn]
105
+ ]
106
+ end,
107
+ ].transpose, spacings: [1.5, 0], alignment: :right )
94
108
  end
95
109
 
96
110
  # ruby -c Rakefile && rm -f ab.png && rake compare_images -- fc762fa286489d8afc80adc8cdcb125e.jpg 9c2c240ec02356472fb532f404d28dde.jpg 2>/dev/null && ql ab.png
97
111
  # rm -f ab.png && ./ruby `rbenv which rake` compare_images -- 6d97739b4a08f965dc9239dd24382e96.jpg 1b1d4bde376084011d027bba1c047a4b.jpg 2>/dev/null && ql ab.png
98
112
  desc "Visualizes the IDHash difference measurement between two images"
99
113
  task :compare_images do |_|
100
- abort "there should be two image filenames passed as arguments (and optionally the `power`)" unless (4..5) === ARGV.size
101
- abort "the optional argument should be either 3 or 4" unless [3, 4].include?(power = (ARGV[4] || 3).to_i)
114
+ abort "there should be two image filenames passed as arguments (and optionally the `power`)" unless (3..4) === ARGV.size
115
+ abort "the optional argument should be either 3 or 4" unless [3, 4].include?(power = (ARGV[3] || 3).to_i)
102
116
  task ARGV.last do ; end
103
117
  require_relative "lib/dhash-vips"
104
- ha, hb = ARGV[2, 2].map{ |filename| DHashVips::IDHash.fingerprint(filename, power) }
118
+ ha, hb = ARGV[1, 2].map{ |filename| DHashVips::IDHash.fingerprint(filename, power) }
105
119
  puts "distance: #{DHashVips::IDHash.distance ha, hb}"
106
120
  size = 2 ** power
107
121
  shift = 2 * size * size
@@ -110,7 +124,7 @@ task :compare_images do |_|
110
124
  bi = hb >> shift
111
125
  bd = hb - (bi << shift)
112
126
 
113
- a, b = ARGV[2, 2].map do |filename|
127
+ a, b = ARGV[1, 2].map do |filename|
114
128
  image = Vips::Image.new_from_file filename
115
129
  image = image.resize(size.fdiv(image.width), vscale: size.fdiv(image.height)).colourspace("b-w").
116
130
  resize(100, vscale: 100, kernel: :nearest).colourspace("srgb")
@@ -187,10 +201,11 @@ task :compare_images do |_|
187
201
  a.join(b, :horizontal, shim: 15).write_to_file "ab.png"
188
202
  end
189
203
 
190
- # ./ruby `rbenv which rake` compare_speed
191
- desc "Benchmarks Dhash, DHashVips::DHash and DHashVips::IDHash"
204
+ desc "Benchmark speed of Dhash, DHashVips::DHash, DHashVips::IDHash and Phamilie"
192
205
  task :compare_speed do
193
206
  require "dhash"
207
+ require "phamilie"
208
+ phamilie = Phamilie.new
194
209
  require_relative "lib/dhash-vips"
195
210
 
196
211
  filenames = %w{
@@ -214,36 +229,128 @@ task :compare_speed do
214
229
  end
215
230
 
216
231
  require "benchmark"
217
- puts "load and calculate the fingerprint:"
232
+ puts "load the image and calculate the fingerprint:"
218
233
  hashes = []
219
234
  Benchmark.bm 19 do |bm|
220
235
  [
221
236
  [Dhash, :calculate],
237
+ [phamilie, :fingerprint],
222
238
  [DHashVips::DHash, :calculate],
223
239
  [DHashVips::IDHash, :fingerprint],
224
240
  [DHashVips::IDHash, :fingerprint, 4],
225
241
  ].each do |m, calc, power|
226
- bm.report "#{m} #{power}" do
242
+ bm.report "#{m.is_a?(Module) ? m : m.class} #{power}" do
227
243
  hashes.push filenames.map{ |filename| m.send calc, filename, *power }
228
244
  end
229
245
  end
230
246
  end
231
- hashes[-1, 1] = hashes[-2, 2] # for `distance` and `distance3` we use the same hashes
232
- puts "\nmeasure the distance (1000 times):"
233
- Benchmark.bm 28 do |bm|
247
+
248
+ # for `distance`, `distance3_ruby` and `distance3_c` we use the same hashes
249
+ # this array manipulation converts [1, 2, 3, 4, 5] into [1, 2, 3, 4, 4, 4, 5]
250
+ hashes[-1, 1] = hashes[-2, 2]
251
+ hashes[-1, 1] = hashes[-2, 2]
252
+
253
+ puts "\nmeasure the distance (32*32*2000 times):"
254
+ Benchmark.bm 32 do |bm|
234
255
  [
235
256
  [Dhash, :hamming],
257
+ [phamilie, :distance, nil, 1],
236
258
  [DHashVips::DHash, :hamming],
237
259
  [DHashVips::IDHash, :distance],
238
- [DHashVips::IDHash, :distance3],
260
+ [DHashVips::IDHash, :distance3_ruby],
261
+ [DHashVips::IDHash, :distance3_c],
239
262
  [DHashVips::IDHash, :distance, 4],
240
- ].zip(hashes) do |(m, dm, power), hs|
241
- bm.report "#{m} #{dm} #{power}" do
242
- hs.product hs do |h1, h2|
243
- 1000.times{ m.public_send dm, h1, h2 }
263
+ ].zip(hashes) do |(m, dm, power, ii), hs|
264
+ bm.report "#{m.is_a?(Module) ? m : m.class} #{dm} #{power}" do
265
+ _ = [hs, filenames][ii || 0]
266
+ _.product _ do |h1, h2|
267
+ 2000.times{ m.public_send dm, h1, h2 }
244
268
  end
245
269
  end
246
270
  end
247
271
  end
248
272
 
249
273
  end
274
+
275
+ desc "Benchmarks everything about Dhash, DHashVips::DHash, DHashVips::IDHash and Phamilie"
276
+ task :benchmark do
277
+ abort "provide a folder with images grouped by similarity" unless 2 === ARGV.size
278
+ abort "invalid folder provided" unless Dir.exist?(dir = ARGV.last)
279
+
280
+ require "dhash"
281
+ require "phamilie"
282
+ phamilie = Phamilie.new
283
+ require_relative "lib/dhash-vips"
284
+
285
+ filenames = Dir.glob("#{dir}/*").map{ |_| Dir.glob "#{_}/*" }
286
+ puts "image groups sizes: #{filenames.map(&:size)}"
287
+ require "benchmark"
288
+
289
+ puts "step 1 / 3 (fingerprinting)"
290
+ hashes = []
291
+ bm1 = [
292
+ [phamilie, :fingerprint],
293
+ [Dhash, :calculate],
294
+ [DHashVips::DHash, :calculate],
295
+ [DHashVips::IDHash, :fingerprint],
296
+ ].map do |m, calc, power|
297
+ Benchmark.realtime do
298
+ hashes.push filenames.flatten.map{ |filename| m.send calc, filename, *power }
299
+ end
300
+ end
301
+
302
+ puts "step 2 / 3 (comparing fingerprints)"
303
+ combs = filenames.flatten.size ** 2
304
+ n = 10_000_000_000_000 / Dir.glob("#{dir}/*/*").map(&File.method(:size)).inject(:+) / combs
305
+ bm2 = [
306
+ [phamilie, :distance, nil, filenames.flatten],
307
+ [Dhash, :hamming],
308
+ [DHashVips::DHash, :hamming],
309
+ [DHashVips::IDHash, :distance3_c],
310
+ ].zip(hashes).map do |(m, dm, power, ii), hs|
311
+ Benchmark.realtime do
312
+ _ = ii || hs
313
+ _.product _ do |h1, h2|
314
+ n.times{ m.public_send dm, h1, h2 }
315
+ end
316
+ end
317
+ end
318
+
319
+ puts "step 3 / 3 (looking for the best threshold)"
320
+ bm3 = [
321
+ [phamilie, :fingerprint, :distance, nil, 0],
322
+ [Dhash, :calculate, :hamming],
323
+ [DHashVips::DHash, :calculate, :hamming],
324
+ [DHashVips::IDHash, :fingerprint, :distance],
325
+ ].map do |m, calc, dm, power, ii|
326
+ require_relative "common"
327
+ hashes = Dir.glob("#{dir}/*").flat_map{ |_| Dir.glob "#{_}/*" }.map{ |filename| [filename, m.public_send(calc, filename, *power)] }
328
+ report = Struct.new(:same, :sim, :not_sim).new [], [], []
329
+ hashes.size.times.to_a.repeated_combination(2) do |i, j|
330
+ report[
331
+ case
332
+ when i == j ; :same
333
+ when File.split(File.split(hashes[i][0]).first).last ==
334
+ File.split(File.split(hashes[j][0]).first).last && i < j ; :sim
335
+ else ; :not_sim
336
+ end
337
+ ].push m.method(dm).call hashes[i][ii||1], hashes[j][ii||1]
338
+ end
339
+ min, max = [*report.sim, *report.not_sim].minmax
340
+ fmi, fp, fn = (min..max+1).map do |b|
341
+ fp = report.not_sim.count{ |_| _ < b }
342
+ tp = report.sim.count{ |_| _ < b }
343
+ fn = report.sim.count{ |_| _ >= b }
344
+ [((tp + fp) * (tp + fn)).fdiv(tp * tp), fp, fn]
345
+ end.reject{ |_,| _.nan? }.min_by(&:first)
346
+ fmi
347
+ end
348
+
349
+ require "mll"
350
+ puts MLL::grid.call %w{ \ Fingerprint Compare 1/FMI^2 }.zip(*[
351
+ %w{ Phamilie Dhash DHash IDHash },
352
+ *[bm1, bm2].map{ |bm| bm.map{ |_| "%.3f" % _ } },
353
+ bm3.map{ |_| "%.3f" % _ }
354
+ ].transpose).transpose, spacings: [1.5, 0], alignment: :right
355
+ puts "(lower numbers are better)"
356
+ end
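Note on step 3 above: each candidate threshold `b` is scored with `((tp + fp) * (tp + fn)).fdiv(tp * tp)`, which is the reciprocal of the squared Fowlkes-Mallows index (the `1/FMI^2` column), so lower is better. The following is a minimal standalone sketch of that search, not the Rakefile's code: `best_threshold` and the sample distances are made up for illustration.

```ruby
# Hypothetical helper: given the distances of pairs known to be similar and
# of pairs known to be different, try every integer threshold and keep the
# one with the smallest 1/FMI^2.
def best_threshold(sim, not_sim)
  min, max = (sim + not_sim).minmax
  (min..max + 1).map do |b|
    tp = sim.count { |d| d < b }        # similar pairs accepted
    fp = not_sim.count { |d| d < b }    # different pairs wrongly accepted
    fn = sim.count { |d| d >= b }       # similar pairs wrongly rejected
    score = ((tp + fp) * (tp + fn)).fdiv(tp * tp)
    [score, b, fp, fn]
  end.reject { |score, *| score.nan? }.min_by(&:first)  # 0/0 happens when nothing is accepted
end

# Made-up distances, only to exercise the function.
score, threshold, fp, fn = best_threshold([2, 5, 9], [8, 20, 30])
puts format("1/FMI^2 = %.3f at threshold %d (fp=%d, fn=%d)", score, threshold, fp, fn)
```

With these sample numbers every threshold from 10 to 20 gives the same best score, because all three similar pairs are accepted at the cost of one false positive.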
@@ -0,0 +1,11 @@
+ def download_and_keep image   # returns path
+   require "open-uri"
+   require "digest"
+   File.join(FileUtils.mkdir_p(File.expand_path "images", __dir__()).first, image).tap do |path|
+     open("https://storage.googleapis.com/dhash-vips.nakilon.pro/#{image}") do |link|
+       File.open(path, "wb") do |file|
+         IO.copy_stream link, file
+       end
+     end unless File.exist?(path) && Digest::MD5.file(path) == File.basename(image, ".jpg")
+   end
+ end
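Note on `download_and_keep` above: the test images are named after their own MD5 digest, so a cached copy can be validated without any extra manifest. A minimal sketch of just that check follows; `cached_copy_valid?` is a hypothetical name, not something the gem defines.

```ruby
require "digest"

# Hypothetical helper: true when the file exists and the MD5 hex digest of
# its content equals its own basename (extension stripped).
def cached_copy_valid?(path, ext = ".jpg")
  File.exist?(path) && Digest::MD5.file(path).hexdigest == File.basename(path, ext)
end
```

A download step would then run only when `cached_copy_valid?(path)` is false, which also replaces truncated or corrupted files left behind by an interrupted run.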
@@ -1,20 +1,31 @@
  Gem::Specification.new do |spec|
    spec.name = "dhash-vips"
-   spec.version = "0.0.5.0"
+   spec.version = "0.1.0.3"
    spec.author = "Victor Maslov"
    spec.email = "nakilon@gmail.com"
    spec.summary = "dHash and IDHash powered by Vips"
    spec.homepage = "https://github.com/nakilon/dhash-vips"
    spec.license = "MIT"

-   spec.test_files = ["spec"]
-   spec.files = `git ls-files -z`.split("\x0") - spec.test_files
    spec.require_path = "lib"
+   spec.test_files = %w{ test.rb }
+   spec.extensions = %w{ extconf.rb }
+   spec.files = `git ls-files -z`.split("\x0") -
+     spec.test_files -
+     %w{ .gitignore Dockerfile } -
+     Dir.glob("example_*/**/*") -
+     Dir.glob(".github/**/*")

-   spec.add_dependency "ruby-vips"
+   spec.add_dependency "ruby-vips", "~>2.0.16"

    spec.add_development_dependency "rake"
-   spec.add_development_dependency "rspec-core"
-   spec.add_development_dependency "dhash"
+   spec.add_development_dependency "minitest"
    spec.add_development_dependency "get_process_mem"
+
+   spec.add_development_dependency "rmagick", "~>2.16"
+   spec.add_development_dependency "phamilie"
+   spec.add_development_dependency "dhash"
+
+   spec.add_development_dependency "mll"
+   spec.add_development_dependency "byebug"
  end
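Note on the gemspec change above: the `ruby-vips` dependency goes from unconstrained to `~>2.0.16`. A short sketch of what that pessimistic constraint accepts, using the standard RubyGems API; the version numbers checked are arbitrary examples.

```ruby
require "rubygems"

# "~> 2.0.16" means ">= 2.0.16" and "< 2.1".
requirement = Gem::Requirement.new("~> 2.0.16")

%w[2.0.15 2.0.16 2.0.17 2.1.0].each do |version|
  ok = requirement.satisfied_by?(Gem::Version.new(version))
  puts "#{version}: #{ok ? 'accepted' : 'rejected'}"
end
```

Only the last version component is allowed to grow, so patch releases of 2.0 are picked up while 2.1 and later are excluded.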
@@ -0,0 +1,31 @@
+ require "mkmf"
+
+ File.write "Makefile", dummy_makefile(?.).join
+ unless Gem::Platform.local.os == "darwin" && Gem::Version.new(RUBY_VERSION) == Gem::Version.new("2.3.8")
+ else
+   begin
+     # https://github.com/rbenv/rbenv/issues/1199
+     append_cppflags "-I#{Dir.glob("#{`rbenv root`.chomp}/sources/#{`rbenv version-name`.chomp}/*/").first}"
+   rescue
+   else
+     create_makefile "idhash"
+     # Why this hack?
+     # 1. Because I want to use Ruby and ./idhash.bundle for tests, not C.
+     # 2. Because I don't want to bother users with two gems instead of one.
+     File.write "Makefile", <<~HEREDOC + File.read("Makefile")
+       .PHONY: test
+       test: all
+       \t$(RUBY) -r./lib/dhash-vips.rb ./lib/dhash-vips-post-install-test.rb
+     HEREDOC
+   end
+ end
+
+ # Cases to check:
+ # 0. all is ok
+ # `rm -rf idhash.o idhash.bundle pkg && bundle exec rake install` # w/o ext # ["/Users/nakilon/_/dhash-vips/lib/dhash-vips.rb", 32]
+ # `rm -f idhash.o idhash.bundle Makefile && ruby extconf.rb && make` # with ext # ["/Users/nakilon/_/dhash-vips/lib/dhash-vips.rb", 40]
+ # `bundle exec rake -rdhash-vips -e "p DHashVips::IDHash.method(:distance3).source_location"`
+ # 1. not macOS && rbenv
+ # 2. fail during append_cppflags
+ # 3. failed compilation
+ # 4. failed tests
@@ -0,0 +1,29 @@
+ #include <bignum.c>
+
+ static VALUE idhash_distance(VALUE self, VALUE a, VALUE b){
+     BDIGIT* tempd;
+     long i, an = BIGNUM_LEN(a), bn = BIGNUM_LEN(b), templ, acc = 0;
+     BDIGIT* as = BDIGITS(a);
+     BDIGIT* bs = BDIGITS(b);
+     while (0 < an && as[an-1] == 0) an--; // for (i = an; --i;) printf("%u\n", as[i]);
+     while (0 < bn && bs[bn-1] == 0) bn--; // for (i = bn; --i;) printf("%u\n", bs[i]);
+     // printf("%lu %lu\n", an, bn);
+     if (an < bn) {
+         tempd = as; as = bs; bs = tempd;
+         templ = an; an = bn; bn = templ;
+     }
+     for (i = an; i-- > 4;) {
+         // printf("%ld : (%u | %u) & (%u ^ %u)\n", i, as[i], (i >= bn ? 0 : bs[i]), as[i-4], bs[i-4]);
+         acc += __builtin_popcountl((as[i] | (i >= bn ? 0 : bs[i])) & (as[i-4] ^ bs[i-4]));
+         // printf("%ld : %ld\n", i, acc);
+     }
+     RB_GC_GUARD(a);
+     RB_GC_GUARD(b);
+     return INT2FIX(acc);
+ }
+
+ void Init_idhash() {
+     VALUE m = rb_define_module("DHashVips");
+     VALUE mm = rb_define_module_under(m, "IDHash");
+     rb_define_module_function(mm, "distance3_c", idhash_distance, 2);
+ }
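Note on `idhash_distance` above: the C loop pairs each high digit of `a | b` with the digit four positions lower of `a ^ b`, which mirrors the Ruby expression `(a ^ b) & (a | b) >> 128` for 256-bit fingerprints. A plain Ruby sketch of the same idea, with the two halves spelled out, is below. The names `popcount` and `masked_distance` are hypothetical, and reading the upper half as a mask of bits worth comparing is an interpretation of the expression, not something this diff states.

```ruby
# Count the set bits of a non-negative Integer.
def popcount(n)
  n.to_s(2).count("1")
end

# Sketch under the assumption that a fingerprint is 2 * half_bits wide:
# the lower half holds the bits to compare, the upper half marks which of
# those bits count. A bit contributes when either fingerprint marks it
# and the two fingerprints disagree on it.
def masked_distance(a, b, half_bits = 128)
  low        = (1 << half_bits) - 1
  difference = (a ^ b) & low          # lower halves that disagree
  mask       = (a | b) >> half_bits   # upper halves, aligned to bit 0
  popcount(difference & mask)
end

a = (0b11 << 128) | 0b01
b = 0b10
puts masked_distance(a, b)
```

For inputs that fit in 256 bits this should agree with the one-line Ruby expression, since the shifted mask then has no bits above position 127.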
@@ -0,0 +1,36 @@
+ puts "Testing native extension..."
+
+ a, b = 27362028616592833077810614538336061650596602259623245623188871925927275101952, 57097733966917585112089915289446881218887831888508524872740133297073405558528
+ f = ->(a,b){ DHashVips::IDHash.distance3_ruby a, b }
+
+ p as = [a.to_s(16).rjust(64,?0)].pack("H*").unpack("N*")
+ p bs = [b.to_s(16).rjust(64,?0)].pack("H*").unpack("N*")
+ puts as.zip(bs)[0,4].map{ |i,j| (i | j).to_s(2).rjust(32, ?0) }.zip \
+      as.zip(bs)[4,4].map{ |i,j| (i ^ j).to_s(2).rjust(32, ?0) }
+ p DHashVips::IDHash.distance3_c a, b
+ p f[a, b]
+ fail unless 17 == f[a, b]
+
+ s = [0, 1, 1<<63, (1<<63)+1, (1<<64)-1].each do |_|
+   # p [_.to_s(16).rjust(64,?0)].pack("H*").unpack("N*").map{ |_| _.to_s(2).rjust(32, ?0) }
+ end
+ ss = s.repeated_permutation(4).map do |s1, s2, s3, s4|
+   ((s1 << 192) + (s2 << 128) + (s3 << 64) + s4).tap do |_|
+     # p [_.to_s(16).rjust(64,?0)].pack("H*").unpack("N*").map{ |_| _.to_s(2).rjust(32, ?0) }
+   end
+ end
+ fail unless :distance3 == DHashVips::IDHash.method(:distance3).original_name
+ ss.product ss do |s1, s2|
+   next unless s1.is_a?(Bignum) && s2.is_a?(Bignum)
+   unless f[s1, s2] == DHashVips::IDHash.distance3_c(s1, s2)
+     p [s1, s2]
+     p [s1.to_s(16).rjust(64,?0)].pack("H*").unpack("N*").map{ |_| _.to_s(2).rjust(32, ?0) }
+     p [s2.to_s(16).rjust(64,?0)].pack("H*").unpack("N*").map{ |_| _.to_s(2).rjust(32, ?0) }
+     p [f[s1, s2], DHashVips::IDHash.distance3_c(s1, s2)]
+     fail
+   end
+ end
+ 100000.times do
+   s1, s2 = Array.new(2){ n = rand 256; ([?0] * n + [?1] * (256 - n)).shuffle.join.to_i 2 }
+   fail unless DHashVips::IDHash.distance3(s1, s2) == DHashVips::IDHash.distance3_ruby(s1, s2)
+ end
@@ -10,7 +10,7 @@ module DHashVips
      end

      def pixelate file, hash_size, kernel = nil
-       image = Vips::Image.new_from_file file
+       image = Vips::Image.new_from_file file, access: :sequential
        if kernel
          image.resize((hash_size + 1).fdiv(image.width), vscale: hash_size.fdiv(image.height), kernel: kernel).colourspace("b-w")
        else
@@ -21,7 +21,7 @@ module DHashVips
      def calculate file, hash_size = 8, kernel = nil
        image = pixelate file, hash_size, kernel

-       image.cast("int").conv([1, -1]).crop(1, 0, hash_size, hash_size).>(0)./(255).cast("uchar").to_a.join.to_i(2)
+       image.cast("int").conv([[1, -1]]).crop(1, 0, hash_size, hash_size).>(0)./(255).cast("uchar").to_a.join.to_i(2)
      end

    end
@@ -29,17 +29,30 @@ module DHashVips
    module IDHash
      extend self

-     def distance3 a, b
-       return ((a ^ b) & (a | b) >> 128).to_s(2).count "1"
+     def distance3_ruby a, b
+       ((a ^ b) & (a | b) >> 128).to_s(2).count "1"
+     end
+     begin
+       require_relative "../idhash.bundle"
+     rescue LoadError
+       alias_method :distance3, :distance3_ruby
+     else
+       def distance3 a, b
+         if a.is_a?(Bignum) && b.is_a?(Bignum)
+           distance3_c a, b
+         else
+           distance3_ruby a, b
+         end
+       end
      end
      def distance a, b
        size_a, size_b = [a, b].map do |x|
-         case x.size
-         when 32 ; 8
-         when 128, 124, 120 ; 16
-         else ; fail "invalid size of fingerprint; #{x.size}"
-         end
+         # TODO write a test about possible hash sizes
+         # they were 32 and 128, 124, 120 for MRI 2.0
+         # but also 31, 30 happens for MRI 2.3
+         x.size <= 32 ? 8 : 16
        end
+       return distance3 a, b if [8, 8] == [size_a, size_b]
        fail "fingerprints were taken with different `power` param: #{size_a} and #{size_b}" if size_a != size_b
        ((a ^ b) & (a | b) >> 2 * size_a * size_a).to_s(2).count "1"
      end
@@ -64,10 +77,10 @@ module DHashVips
      fail unless 1 == @@median[[1, 1, 1]]
      fail unless 1 == @@median[[1, 1]]

-     def fingerprint file, power = 3
+     def fingerprint filename, power = 3
        size = 2 ** power
-       image = Vips::Image.new_from_file file
-       image = image.resize(size.fdiv(image.width), vscale: size.fdiv(image.height)).colourspace("b-w")
+       image = Vips::Image.new_from_file filename, access: :sequential
+       image = image.resize(size.fdiv(image.width), vscale: size.fdiv(image.height)).colourspace("b-w").flatten

        array = image.to_a.map &:flatten
        d1, i1, d2, i2 = [array, array.transpose].flat_map do |a|
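Note on the `IDHash` hunk above: `distance3` becomes a dispatcher that prefers the compiled `idhash.bundle` and falls back to `distance3_ruby` when `require_relative` raises `LoadError`. The same optional-native-extension pattern in isolation could be sketched like this; the module `BitCount`, the method names and the library name are all hypothetical, and the library is not expected to exist, so the fallback branch is the one that runs.

```ruby
module BitCount
  extend self

  # Pure-Ruby implementation, always available.
  def popcount_ruby(n)
    n.to_s(2).count("1")
  end

  begin
    # Hypothetical native library; requiring it is expected to fail here.
    require "bitcount_native_hypothetical"
  rescue LoadError
    # No native code: the public name is just an alias of the Ruby version.
    alias_method :popcount, :popcount_ruby
  else
    # Native code loaded: dispatch to it (assumed to define `popcount_c`).
    def popcount(n)
      popcount_c(n)
    end
  end
end

puts BitCount.popcount(0b1011)
```

Callers only ever use the public name, so the presence or absence of the compiled code does not change the calling code.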
data/test.rb ADDED
@@ -0,0 +1,101 @@
1
+ require "minitest/autorun"
2
+
3
+ require "dhash-vips"
4
+
5
+ # TODO tests about `fingerprint(4)`
6
+
7
+ [
8
+ [DHashVips::DHash, :hamming, :calculate, 2, 23, 18, 50, 4],
9
+ # vips-8.9.1-Tue Jan 28 13:05:46 UTC 2020
10
+ # [[0, 14, 26, 27, 31, 27, 32, 28, 43, 43, 34, 37, 37, 34, 35, 42],
11
+ # [14, 0, 28, 25, 39, 35, 32, 32, 43, 43, 38, 41, 41, 38, 37, 50],
12
+ # [26, 28, 0, 13, 35, 41, 28, 30, 41, 41, 36, 33, 35, 32, 27, 36],
13
+ # [27, 25, 13, 0, 36, 36, 31, 35, 40, 40, 33, 32, 42, 35, 26, 33],
14
+ # [31, 39, 35, 36, 0, 16, 33, 33, 40, 40, 31, 24, 28, 33, 40, 31],
15
+ # [27, 35, 41, 36, 16, 0, 41, 41, 38, 38, 23, 26, 24, 29, 34, 27],
16
+ # [32, 32, 28, 31, 33, 41, 0, 10, 27, 25, 38, 35, 37, 32, 23, 34],
17
+ # [28, 32, 30, 35, 33, 41, 10, 0, 27, 27, 34, 31, 37, 36, 27, 34],
18
+ # [43, 43, 41, 40, 40, 38, 27, 27, 0, 2, 35, 34, 30, 31, 28, 27],
19
+ # [43, 43, 41, 40, 40, 38, 25, 27, 2, 0, 35, 34, 30, 31, 28, 27],
20
+ # [34, 38, 36, 33, 31, 23, 38, 34, 35, 35, 0, 9, 23, 26, 29, 18],
21
+ # [37, 41, 33, 32, 24, 26, 35, 31, 34, 34, 9, 0, 22, 25, 30, 19],
22
+ # [37, 41, 35, 42, 28, 24, 37, 37, 30, 30, 23, 22, 0, 19, 26, 23],
23
+ # [34, 38, 32, 35, 33, 29, 32, 36, 31, 31, 26, 25, 19, 0, 21, 26],
24
+ # [35, 37, 27, 26, 40, 34, 23, 27, 28, 28, 29, 30, 26, 21, 0, 23],
25
+ # [42, 50, 36, 33, 31, 27, 34, 34, 27, 27, 18, 19, 23, 26, 23, 0]]
26
+ [DHashVips::IDHash, :distance, :fingerprint, 6, 22, 23, 65, 0],
27
+ # vips-8.9.1-Tue Jan 28 13:05:46 UTC 2020
28
+ # [[0, 16, 32, 35, 57, 45, 51, 50, 48, 47, 54, 48, 60, 50, 47, 56],
29
+ # [16, 0, 30, 34, 58, 47, 55, 56, 47, 50, 57, 49, 62, 52, 52, 61],
30
+ # [32, 30, 0, 9, 47, 54, 45, 41, 65, 62, 42, 37, 51, 44, 49, 49],
31
+ # [35, 34, 9, 0, 54, 64, 42, 40, 57, 56, 48, 39, 50, 40, 41, 51],
32
+ # [57, 58, 47, 54, 0, 22, 43, 45, 64, 61, 48, 47, 35, 43, 47, 48],
33
+ # [45, 47, 54, 64, 22, 0, 53, 54, 55, 54, 40, 46, 39, 42, 43, 42],
34
+ # [51, 55, 45, 42, 43, 53, 0, 6, 33, 35, 52, 43, 46, 45, 44, 47],
35
+ # [50, 56, 41, 40, 45, 54, 6, 0, 38, 41, 53, 50, 48, 45, 41, 42],
36
+ # [48, 47, 65, 57, 64, 55, 33, 38, 0, 9, 51, 53, 47, 47, 41, 46],
37
+ # [47, 50, 62, 56, 61, 54, 35, 41, 9, 0, 51, 57, 50, 49, 44, 43],
38
+ # [54, 57, 42, 48, 48, 40, 52, 53, 51, 51, 0, 10, 33, 36, 38, 25],
39
+ # [48, 49, 37, 39, 47, 46, 43, 50, 53, 57, 10, 0, 27, 30, 37, 27],
40
+ # [60, 62, 51, 50, 35, 39, 46, 48, 47, 50, 33, 27, 0, 20, 23, 28],
41
+ # [50, 52, 44, 40, 43, 42, 45, 45, 47, 49, 36, 30, 20, 0, 35, 39],
42
+ # [47, 52, 49, 41, 47, 43, 44, 41, 41, 44, 38, 37, 23, 35, 0, 19],
43
+ # [56, 61, 49, 51, 48, 42, 47, 42, 46, 43, 25, 27, 28, 39, 19, 0]]
44
+ ].each do |lib, dm, calc, min_similar, max_similar, min_not_similar, max_not_similar, bw_exceptional|
45
+
46
+ describe lib do
47
+
48
+ # these are false positives by idhash
49
+ # 6d97739b4a08f965dc9239dd24382e96.jpg
50
+ # 1b1d4bde376084011d027bba1c047a4b.jpg
51
+ [
52
+ [ %w{
53
+ 1d468d064d2e26b5b5de9a0241ef2d4b.jpg 92d90b8977f813af803c78107e7f698e.jpg
54
+ 309666c7b45ecbf8f13e85a0bd6b0a4c.jpg 3f9f3db06db20d1d9f8188cd753f6ef4.jpg
55
+ 679634ff89a31279a39f03e278bc9a01.jpg df0a3b93e9412536ee8a11255f974141.jpg
56
+ 54192a3f65bd03163b04849e1577a40b.jpg 6d32f57459e5b79b5deca2a361eb8c6e.jpg
57
+ 4b62e0eef58bfbc8d0d2fbf2b9d05483.jpg b8eb0ca91855b657f12fb3d627d45c53.jpg
58
+ 21cd9a6986d98976b6b4655e1de7baf4.jpg 9b158c0d4953d47171a22ed84917f812.jpg
59
+ 9c2c240ec02356472fb532f404d28dde.jpg fc762fa286489d8afc80adc8cdcb125e.jpg
60
+ 7a833d873f8d49f12882e86af1cc6b79.jpg ac033cf01a3941dd1baa876082938bc9.jpg
61
+ }, min_similar, max_similar], # slightly similar images
62
+ [ %w{
63
+ 71662d4d4029a3b41d47d5baf681ab9a.jpg ad8a37f872956666c3077a3e9e737984.jpg
64
+ }, bw_exceptional, bw_exceptional], # these are the same photo but of different size and colorspace
65
+ ].each do |images, min, max|
66
+
67
+ require "fileutils"
68
+ require "digest"
69
+ require "mll"
70
+
71
+ require_relative "common"
72
+ images = images.map &method(:download_and_keep)
73
+
74
+ hashes = images.map &lib.method(calc)
75
+ table = MLL::table[lib.method(dm), [hashes], [hashes]]
76
+
77
+ require "pp"
78
+ STDERR.puts ""
79
+ PP.pp table, STDERR
80
+ STDERR.puts ""
81
+
82
+ hashes.size.times.to_a.repeated_combination(2) do |i, j|
83
+ it do
84
+ case
85
+ when i == j
86
+ assert_predicate table[i][j], :zero?
87
+ when (j - i).abs == 1 && (i + j - 1) % 4 == 0
88
+ # STDERR.puts [table[i][j], min, max].inspect
89
+ assert_includes min..max, table[i][j]
90
+ else
91
+ # STDERR.puts [table[i][j], min_not_similar, max_not_similar].inspect
92
+ assert_includes min_not_similar..max_not_similar, table[i][j]
93
+ end
94
+ end
95
+ end
96
+
97
+ end
98
+
99
+ end
100
+
101
+ end
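Note on the test above: the condition `(j - i).abs == 1 && (i + j - 1) % 4 == 0` selects the index pairs that are expected to be similar, relying on the image list being laid out as consecutive pairs. A small sketch that lists the pairs the predicate picks for six images; `similar_pair?` is a hypothetical name wrapping the same expression.

```ruby
# Same predicate as in the test, wrapped in a hypothetical helper.
def similar_pair?(i, j)
  (j - i).abs == 1 && (i + j - 1) % 4 == 0
end

# Enumerate all index pairs (including i == j) the way the test does.
pairs = 6.times.to_a.repeated_combination(2).select { |i, j| similar_pair?(i, j) }
p pairs  # → [[0, 1], [2, 3], [4, 5]]
```

For adjacent indices `i + j - 1` equals `2 * i`, so the modulo test passes only when `i` is even, which is why `[1, 2]` and `[3, 4]` are treated as non-similar pairs.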
metadata CHANGED
@@ -1,99 +1,159 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: dhash-vips
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.0.5.0
4
+ version: 0.1.0.3
5
5
  platform: ruby
6
6
  authors:
7
7
  - Victor Maslov
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2018-02-11 00:00:00.000000000 Z
11
+ date: 2020-07-25 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: ruby-vips
15
15
  requirement: !ruby/object:Gem::Requirement
16
16
  requirements:
17
- - - '>='
17
+ - - "~>"
18
18
  - !ruby/object:Gem::Version
19
- version: '0'
19
+ version: 2.0.16
20
20
  type: :runtime
21
21
  prerelease: false
22
22
  version_requirements: !ruby/object:Gem::Requirement
23
23
  requirements:
24
- - - '>='
24
+ - - "~>"
25
25
  - !ruby/object:Gem::Version
26
- version: '0'
26
+ version: 2.0.16
27
27
  - !ruby/object:Gem::Dependency
28
28
  name: rake
29
29
  requirement: !ruby/object:Gem::Requirement
30
30
  requirements:
31
- - - '>='
31
+ - - ">="
32
+ - !ruby/object:Gem::Version
33
+ version: '0'
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - ">="
39
+ - !ruby/object:Gem::Version
40
+ version: '0'
41
+ - !ruby/object:Gem::Dependency
42
+ name: minitest
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - ">="
46
+ - !ruby/object:Gem::Version
47
+ version: '0'
48
+ type: :development
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - ">="
53
+ - !ruby/object:Gem::Version
54
+ version: '0'
55
+ - !ruby/object:Gem::Dependency
56
+ name: get_process_mem
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - ">="
32
60
  - !ruby/object:Gem::Version
33
61
  version: '0'
34
62
  type: :development
35
63
  prerelease: false
36
64
  version_requirements: !ruby/object:Gem::Requirement
37
65
  requirements:
38
- - - '>='
66
+ - - ">="
39
67
  - !ruby/object:Gem::Version
40
68
  version: '0'
41
69
  - !ruby/object:Gem::Dependency
42
- name: rspec-core
70
+ name: rmagick
71
+ requirement: !ruby/object:Gem::Requirement
72
+ requirements:
73
+ - - "~>"
74
+ - !ruby/object:Gem::Version
75
+ version: '2.16'
76
+ type: :development
77
+ prerelease: false
78
+ version_requirements: !ruby/object:Gem::Requirement
79
+ requirements:
80
+ - - "~>"
81
+ - !ruby/object:Gem::Version
82
+ version: '2.16'
83
+ - !ruby/object:Gem::Dependency
84
+ name: phamilie
43
85
  requirement: !ruby/object:Gem::Requirement
44
86
  requirements:
45
- - - '>='
87
+ - - ">="
46
88
  - !ruby/object:Gem::Version
47
89
  version: '0'
48
90
  type: :development
49
91
  prerelease: false
50
92
  version_requirements: !ruby/object:Gem::Requirement
51
93
  requirements:
52
- - - '>='
94
+ - - ">="
53
95
  - !ruby/object:Gem::Version
54
96
  version: '0'
55
97
  - !ruby/object:Gem::Dependency
56
98
  name: dhash
57
99
  requirement: !ruby/object:Gem::Requirement
58
100
  requirements:
59
- - - '>='
101
+ - - ">="
60
102
  - !ruby/object:Gem::Version
61
103
  version: '0'
62
104
  type: :development
63
105
  prerelease: false
64
106
  version_requirements: !ruby/object:Gem::Requirement
65
107
  requirements:
66
- - - '>='
108
+ - - ">="
67
109
  - !ruby/object:Gem::Version
68
110
  version: '0'
69
111
  - !ruby/object:Gem::Dependency
70
- name: get_process_mem
112
+ name: mll
113
+ requirement: !ruby/object:Gem::Requirement
114
+ requirements:
115
+ - - ">="
116
+ - !ruby/object:Gem::Version
117
+ version: '0'
118
+ type: :development
119
+ prerelease: false
120
+ version_requirements: !ruby/object:Gem::Requirement
121
+ requirements:
122
+ - - ">="
123
+ - !ruby/object:Gem::Version
124
+ version: '0'
125
+ - !ruby/object:Gem::Dependency
126
+ name: byebug
71
127
  requirement: !ruby/object:Gem::Requirement
72
128
  requirements:
73
- - - '>='
129
+ - - ">="
74
130
  - !ruby/object:Gem::Version
75
131
  version: '0'
76
132
  type: :development
77
133
  prerelease: false
78
134
  version_requirements: !ruby/object:Gem::Requirement
79
135
  requirements:
80
- - - '>='
136
+ - - ">="
81
137
  - !ruby/object:Gem::Version
82
138
  version: '0'
83
139
  description:
84
140
  email: nakilon@gmail.com
85
141
  executables: []
86
- extensions: []
142
+ extensions:
143
+ - extconf.rb
87
144
  extra_rdoc_files: []
88
145
  files:
89
- - .gitignore
90
146
  - Gemfile
91
147
  - LICENSE.txt
92
148
  - README.md
93
149
  - Rakefile
150
+ - common.rb
94
151
  - dhash-vips.gemspec
152
+ - extconf.rb
153
+ - idhash.c
154
+ - lib/dhash-vips-post-install-test.rb
95
155
  - lib/dhash-vips.rb
96
- - spec/_spec.rb
156
+ - test.rb
97
157
  homepage: https://github.com/nakilon/dhash-vips
98
158
  licenses:
99
159
  - MIT
@@ -104,18 +164,19 @@ require_paths:
104
164
  - lib
105
165
  required_ruby_version: !ruby/object:Gem::Requirement
106
166
  requirements:
107
- - - '>='
167
+ - - ">="
108
168
  - !ruby/object:Gem::Version
109
169
  version: '0'
110
170
  required_rubygems_version: !ruby/object:Gem::Requirement
111
171
  requirements:
112
- - - '>='
172
+ - - ">="
113
173
  - !ruby/object:Gem::Version
114
174
  version: '0'
115
175
  requirements: []
116
176
  rubyforge_project:
117
- rubygems_version: 2.0.14.1
177
+ rubygems_version: 2.5.2.3
118
178
  signing_key:
119
179
  specification_version: 4
120
180
  summary: dHash and IDHash powered by Vips
121
- test_files: []
181
+ test_files:
182
+ - test.rb
data/.gitignore DELETED
@@ -1,22 +0,0 @@
1
- *.gem
2
- *.rbc
3
- .bundle
4
- .config
5
- .yardoc
6
- Gemfile.lock
7
- InstalledFiles
8
- _yardoc
9
- coverage
10
- doc/
11
- lib/bundler/man
12
- pkg
13
- rdoc
14
- spec/reports
15
- test/tmp
16
- test/version_tmp
17
- tmp
18
- *.bundle
19
- *.so
20
- *.o
21
- *.a
22
- mkmf.log
@@ -1,100 +0,0 @@
1
- require "dhash-vips"
2
-
3
- require "pp"
4
-
5
- # TODO tests about `fingerprint(4)`
6
-
7
- [
8
- [DHashVips::DHash, :hamming, :calculate, 17, 18, 22, 39],
9
- # [[0, 17, 29, 27, 22, 29, 30, 29],
10
- # [17, 0, 30, 26, 33, 36, 37, 36],
11
- # [29, 30, 0, 18, 39, 30, 39, 36],
12
- # [27, 26, 18, 0, 35, 30, 35, 34],
13
- # [22, 33, 39, 35, 0, 17, 28, 23],
14
- # [29, 36, 30, 30, 17, 0, 33, 30],
15
- # [30, 37, 39, 35, 28, 33, 0, 5],
16
- # [29, 36, 36, 34, 23, 30, 5, 0]]
17
- [DHashVips::IDHash, :distance, :fingerprint, 15, 23, 28, 64],
18
- # [[0, 16, 30, 32, 46, 58, 43, 43],
19
- # [16, 0, 28, 28, 47, 59, 46, 47],
20
- # [30, 28, 0, 15, 53, 49, 53, 52],
21
- # [32, 28, 15, 0, 56, 53, 61, 64],
22
- # [46, 47, 53, 56, 0, 23, 43, 45],
23
- # [58, 59, 49, 53, 23, 0, 44, 44],
24
- # [43, 46, 53, 61, 43, 44, 0, 0],
25
- # [43, 47, 52, 64, 45, 44, 0, 0]]
26
- ].each do |lib, dm, calc, min_similar, max_similar, min_not_similar, max_not_similar|
27
-
28
- describe lib do
29
-
30
- require "fileutils"
31
- require "open-uri"
32
- require "digest"
33
- require "mll"
34
- example do |example|
35
-
36
- images = %w{
37
- 1d468d064d2e26b5b5de9a0241ef2d4b.jpg
38
- 92d90b8977f813af803c78107e7f698e.jpg
39
- 309666c7b45ecbf8f13e85a0bd6b0a4c.jpg
40
- 3f9f3db06db20d1d9f8188cd753f6ef4.jpg
41
- df0a3b93e9412536ee8a11255f974141.jpg
42
- 679634ff89a31279a39f03e278bc9a01.jpg
43
- } # these images are consecutive pairs of slightly (but enough for nice asserts) similar images
44
- # 6d97739b4a08f965dc9239dd24382e96.jpg
45
- # 1b1d4bde376084011d027bba1c047a4b.jpg
46
- # while these two tend to be false positive matches by idhash
47
- bw1, bw2 = %w{
48
- 71662d4d4029a3b41d47d5baf681ab9a.jpg
49
- ad8a37f872956666c3077a3e9e737984.jpg
50
- } # these are the same photo but of different size and colorspace
51
-
52
- example.metadata[:extra_failure_lines] = []
53
- FileUtils.mkdir_p dir = "images"
54
- *images, bw1, bw2 = [*images, bw1, bw2].map do |image|
55
- "#{dir}/#{image}".tap do |filename|
56
- unless File.exist?(filename) && Digest::MD5.file(filename) == File.basename(filename, ".jpg")
57
- example.metadata[:extra_failure_lines] << "copying image from web to #{filename}"
58
- open("https://storage.googleapis.com/dhash-vips.nakilon.pro/#{image}") do |link|
59
- File.open(filename, "wb") do |file|
60
- IO.copy_stream link, file
61
- end
62
- end
63
- end
64
- end
65
- end
66
-
67
- hashes = [*images, bw1, bw2].map &described_class.method(calc)
68
- table = MLL::table[described_class.method(dm), [hashes], [hashes]]
69
-
70
- # require "pp"
71
- # pp table
72
- # next
73
-
74
- aggregate_failures do
75
- hashes.size.times.to_a.repeated_combination(2) do |i, j|
76
- case
77
- when i == j
78
- expect(table[i][j]).to eq 0
79
- when (j - i).abs == 1 && (i + j - 1) % 4 == 0
80
- if [i, j] == [hashes.size - 2, hashes.size - 1]
81
- if described_class == DHashVips::DHash
82
- expect(table[i][j]).to be == 5
83
- else
84
- expect(table[i][j]).to eq 0
85
- end
86
- else
87
- expect(table[i][j]).to be_between(min_similar, max_similar).inclusive
88
- end
89
- else
90
- expect(table[i][j]).to be_between(min_not_similar, max_not_similar).inclusive
91
- end
92
- end
93
-
94
- end
95
-
96
- end
97
-
98
- end
99
-
100
- end