dhash-vips 0.0.5.0 → 0.1.0.3
- checksums.yaml +4 -4
- data/LICENSE.txt +1 -1
- data/README.md +134 -100
- data/Rakefile +171 -64
- data/common.rb +11 -0
- data/dhash-vips.gemspec +17 -6
- data/extconf.rb +31 -0
- data/idhash.c +29 -0
- data/lib/dhash-vips-post-install-test.rb +36 -0
- data/lib/dhash-vips.rb +25 -12
- data/test.rb +101 -0
- metadata +84 -23
- data/.gitignore +0 -22
- data/spec/_spec.rb +0 -100
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 2b90b4abcd617f0e285dd3399338b4114d921624
+data.tar.gz: d47a8e51a8e7a2bf28b60d0259dc350f8b5e4fa0
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 3b8b2ce65f3bdfd90d03e3a069e30c61be4a9bf1fc40f9d4beace8216165383d7f54572e996e101ab9fd6f056b260676a9ed465b8860bd1a6e17a4228b859909
+data.tar.gz: 2ea40198fa4c6ed05183d610b2ca558a6ad67a386e69c127dbc0afac5723686025bd4644cc7297c300d9173b212788cc1ab4f242ad849ed7ee50dbc462dd5f08
data/LICENSE.txt
CHANGED
@@ -1,6 +1,6 @@
 The MIT License (MIT)
 
-Copyright (c) 2017
+Copyright (c) 2017, Victor Maslov (nakilon@gmail.com)
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
data/README.md
CHANGED
@@ -1,49 +1,21 @@
-[![Gem Version](https://badge.fury.io/rb/dhash-vips.svg)](http://badge.fury.io/rb/dhash-vips)
+[![Gem Version](https://badge.fury.io/rb/dhash-vips.svg)](http://badge.fury.io/rb/dhash-vips) [![Docker Image](https://github.com/nakilon/dhash-vips/workflows/Docker%20Image/badge.svg)](https://hub.docker.com/repository/docker/nakilonishe/dhash-vips/general)
 
 # dHash and IDHash gem powered by ruby-vips
 
-The
+The **dHash** is the algorithm of image fingerprinting that can be used to measure the similarity of two images.
+The **IDHash** is the new algorithm that has some improvements over dHash -- I'll describe it further.
 
-You
-The original idea is that you split the image into 64 segments and so there are 64 bits -- each tells if the one segment is brighter or darker than the neighbor one. Then the [Hamming distance](https://en.wikipedia.org/wiki/Hamming_distance) between fingerprints is the opposite of images similarity.
+You can read about the dHash and perceptual hashing in the article ["Kind of Like That" at "The Hacker Factor Blog"](http://www.hackerfactor.com/blog/index.php?/archives/529-Kind-of-Like-That.html) (21 January 2013). The idea is that you resize the original image to 8x9 and then convert it to an 8x8 array of bits -- each tells if the corresponding segment of the image is brighter or darker than the one on the right (or left). Then you apply the [Hamming distance](https://en.wikipedia.org/wiki/Hamming_distance) to such arrays to measure how much they are different.
 
-There were several implementations on Github already but they all
-```
-load and calculate the fingerprint:
-user system total real
-Dhash 12.400000 0.820000 13.220000 ( 13.329952)
-DHashVips::DHash 1.330000 0.230000 1.560000 ( 1.509826)
-DHashVips::IDHash 1.060000 0.090000 1.150000 ( 1.100332)
-
-measure the distance (1000 times):
-user system total real
-Dhash hamming 3.140000 0.020000 3.160000 ( 3.179392)
-DHashVips::DHash hamming 3.040000 0.020000 3.060000 ( 3.095190)
-DHashVips::IDHash distance 6.720000 0.030000 6.750000 ( 6.790900)
-```
-
-Here the `Dhash` is [another gem](https://github.com/maccman/dhash) that I used earlier in my projects.
-The `DHashVips::DHash` is a port of it that uses vips. I would like to tell you that you can replace the `dhash` with `dhash-vips` gem right now but it appeared to have a barely noticeable issue. There is a lot of magic behind the libvips speed and resizing -- you may not notice it with unarmed eyes but when two neighbor segments are enough similar by luminosity the difference can change the sign. So I found two identical images that were just of different colorspace and size (photo by Jordan Voth):
-![](https://storage.googleapis.com/dhash-vips.nakilon.pro/dhash_issue_example.png)
-but the distance between their hashes appeared to be equal to 5 while `dhash` gem reported 0.
-
-This is why `DHashVips::IDHash` appeared.
+There were several Ruby implementations on Github already but they all depended on ImageMagick. My implementation takes advantage of the speed of libvips (the `ruby-vips` gem) -- it fingerprints images much faster. For even more speed the fingerprint comparison function is implemented as a native C extension.
 
 ## IDHash (the Important Difference Hash)
 
-
+The main improvement over the dHash is what makes it insensitive to the resizing algorithm and possible errors due to color scheme conversion.
+
 * The "Importance" is an array of extra 64 bits that tells the comparing function which half of 64 bits is important (when the difference between neighbors was enough significant) and which is not. So not every bit in a fingerprint is being compared but only half of them.
 * It subtracts not only horizontally but also vertically -- that adds 128 more bits.
-* Instead of resizing to
-
-You could see in fingerprint calculation benchmark earlier that these improvements didn't make it slower than dHash because most of the time is spent on image resizing. The calculation of distance is what became two times slower:
-```ruby
-((a | b) & ((a ^ b) >> 128)).to_s(2).count "1"
-```
-vs
-```ruby
-(a ^ b).to_s(2).count "1"
-```
+* Instead of resizing to 8x9 it resizes to 8x8 and puts the image on a torus so it subtracts the very left column from the very right one and the top from the bottom.
 
 ### Example
 
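The README hunk above describes dHash in three steps: shrink the image to 8x9 grayscale, compare each pixel with its right-hand neighbour, and take the Hamming distance between the resulting bit arrays. The sketch below shows those steps in plain Ruby. It is an illustration and not the gem's code. It assumes the 8x9 grayscale matrix has already been computed, so the resize that `dhash-vips` delegates to libvips is skipped. The bit order is arbitrary, and `dhash_bits` and `hamming` are hypothetical names.

```ruby
# Minimal dHash sketch (not the dhash-vips implementation).
# Assumes the image was already resized to 8 rows x 9 columns of
# grayscale values; the libvips resize step is not shown.
def dhash_bits(pixels)
  unless pixels.size == 8 && pixels.all? { |row| row.size == 9 }
    raise ArgumentError, "expected 8 rows of 9 grayscale values"
  end

  # One bit per horizontal neighbour pair: 1 if the left pixel is brighter.
  bits = pixels.flat_map do |row|
    row.each_cons(2).map { |left, right| left > right ? 1 : 0 }
  end
  # Pack the 64 bits into a single Integer, most significant bit first.
  bits.inject(0) { |acc, bit| (acc << 1) | bit }
end

# Hamming distance: the number of bits that differ between two fingerprints.
def hamming(a, b)
  (a ^ b).to_s(2).count("1")
end

# Example: a gradient that darkens left to right vs. a flat image.
gradient = Array.new(8) { (0..8).map { |x| 255 - x * 20 } }
flat     = Array.new(8) { Array.new(9, 10) }
puts hamming(dhash_bits(gradient), dhash_bits(flat)) # prints 64
```

The `hamming` body is the same `(a ^ b).to_s(2).count "1"` expression that the removed README lines give as the plain dHash distance.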
@@ -65,6 +37,8 @@ Here in each of 64 cells, there are two circles that color the difference betwee
 * [Finding the closest pair between two sets of points on the hypercube](https://cstheory.stackexchange.com/q/16322/27420)
 * [Would PCA work for boolean data types?](https://stats.stackexchange.com/q/159705/1125)
 * [Using pHash to search agaist a huge image database, what is the best approach?](https://stackoverflow.com/q/18257641/322020)
+* [How do I speed up this BIT_COUNT query for hamming distance?](https://stackoverflow.com/q/35065675/322020)
+* [Hamming distance on binary strings in SQL](https://stackoverflow.com/q/4777070/322020)
 
 ## Installation
 
@@ -75,7 +49,7 @@ Then:
 
 gem install dhash-vips
 
-If you have troubles with the `gem ruby-vips` dependency, see https://github.com/
+If you have troubles with the `gem ruby-vips` dependency, see https://github.com/libvips/ruby-vips
 
 ## Usage
 
@@ -102,11 +76,11 @@ end
 ```ruby
 require "dhash-vips"
 
-hash1 = DHashVips::IDHash.
-hash2 = DHashVips::IDHash.
+hash1 = DHashVips::IDHash.fingerprint "photo1.jpg"
+hash2 = DHashVips::IDHash.fingerprint "photo2.jpg"
 
 distance = DHashVips::IDHash.distance hash1, hash2
-if distance <
+if distance < 15
 puts "Images are very similar"
 elsif distance < 25
 puts "Images are slightly similar"
@@ -115,74 +89,134 @@ else
 end
 ```
 
-
-To find out these tresholds we can run a rake task with hardcoded test cases:
-```
-$ rake compare_matrices
-
-Dhash
-Absolutely the same image: 0..0
-Complex B/W and the same but colorful: 0
-Similar images: 13..16
-Different images: 9..41
-
-DHashVips::DHash
-Absolutely the same image: 0..0
-Complex B/W and the same but colorful: 5
-Similar images: 17..18
-Different images: 14..39
-
-DHashVips::IDHash
-Absolutely the same image: 0..0
-Complex B/W and the same but colorful: 0
-Similar images: 15..23
-Different images: 19..64
-
-DHashVips::IDHash 4
-Absolutely the same image: 0..0
-Complex B/W and the same but colorful: 0
-Similar images: 71..108
-Different images: 102..211
+### Notes and benchmarks
 
-
+* The above `15` and `25` constants are found empirically and just work well enough for 8-byte hashes. To find these thresholds we can run a rake task with hardcoded test cases (pairs of photos from the same photosession are not the same but are considered 'similar' enough for the purpose of this benchmark):
 
-
+$ vips -v
+vips-8.9.2-Tue Apr 21 09:26:11 UTC 2020
+$ identify -version | head -1
+Version: ImageMagick 6.9.11-24 Q16 x86_64 2020-07-18 https://imagemagick.org
+$ rake compare_quality
+
+Dhash Phamilie DHashVips::DHash DHashVips::IDHash DHashVips::IDHash(4)
+The same image: 0..0 0..0 0..0 0..0 0..0
+'Jordan Voth case': 2 2 4 0 0
+Similar images: 1..15 14..34 2..23 6..22 53..166
+Different images: 10..54 22..42 10..50 17..65 120..233
+1/FMI^2 = 1.375 4.0 1.556 1.25 1.306
+FP, FN = [3, 0] [0, 6] [1, 2] [2, 0] [1, 1]
+
+The `FMI` line here is the "quality of algorithm", i.e. the best achievable function from the ["Fowlkes–Mallows index"](https://en.wikipedia.org/wiki/Fowlkes%E2%80%93Mallows_index) value if you take the "similar" and "different" test pairs and try to draw the threshold line. Smaller number is better. The last line shows number of false positives (`FP`) and false negatives (`FN`) in case of the best achieved FMI. Here I've added the [`phamilie` gem](https://github.com/toy/phamilie) that is DCT based (not a kind of dhash).
 
 * Methods were renamed from `#calculate` to `#fingerprint` and from `#hamming` to `#distance`.
-* The `DHash#calculate` accepts `hash_size` optional parameter that is 8 by default. The `IDHash#fingerprint`'s optional parameter is called `power` and works in a bit different way: 3 means 8 and 4 means 16 -- other sizes are not supported because they don't seem to be useful (higher fingerprint resolution makes it vulnerable to image shifts and croppings). Because IDHash's fingerprint is more complex than DHash's one it's not that straight forward to compare them so under the hood the `#distance`
+* The `DHash#calculate` accepts an optional `hash_size` parameter that is 8 by default. The `IDHash#fingerprint`'s optional parameter is called `power` and works a bit differently: 3 means 8 and 4 means 16 -- other sizes are not supported because they don't seem to be useful (higher fingerprint resolution makes it vulnerable to image shifts and croppings, also `#distance` becomes much slower). Because IDHash's fingerprint is more complex than DHash's one it's not that straightforward to compare them, so under the hood the `#distance` method has to check the size of the fingerprint. If you are sure that fingerprints were made with power=3 then to skip the check you may use the `#distance3` method directly.
+* The `#distance3` method will use a Ruby C extension that is around 15 times faster than the pure Ruby implementation -- the native extension is currently hardcoded to be compiled only on macOS with rbenv Ruby 2.3.8 installed with the `-k` flag, but if you know how to make the gem gracefully fall back to pure Ruby if `make` fails, let me know or make a pull request. So the full benchmark:
 
-
-$ rake compare_speed
-
-load and calculate the fingerprint:
-user system total real
-Dhash 12.400000 0.820000 13.220000 ( 13.329952)
-DHashVips::DHash 1.330000 0.230000 1.560000 ( 1.509826)
-DHashVips::IDHash 1.060000 0.090000 1.150000 ( 1.100332)
-DHashVips::IDHash 4 1.030000 0.080000 1.110000 ( 1.089148)
-
-measure the distance (1000 times):
-user system total real
-Dhash hamming 3.140000 0.020000 3.160000 ( 3.179392)
-DHashVips::DHash hamming 3.040000 0.020000 3.060000 ( 3.095190)
-DHashVips::IDHash distance 8.170000 0.040000 8.210000 ( 8.279950)
-DHashVips::IDHash distance3 6.720000 0.030000 6.750000 ( 6.790900)
-DHashVips::IDHash distance 4 24.430000 0.130000 24.560000 ( 24.652625)
-```
+* Ruby 2.0.0
 
-
+$ bundle exec rake compare_speed
 
-
+load the image and calculate the fingerprint:
+user system total real
+Dhash 12.400000 0.820000 13.220000 ( 13.329952)
+DHashVips::DHash 1.330000 0.230000 1.560000 ( 1.509826)
+DHashVips::IDHash 1.060000 0.090000 1.150000 ( 1.100332)
+DHashVips::IDHash 4 1.030000 0.080000 1.110000 ( 1.089148)
 
-
-
-
-
-
-
-
-
-
+measure the distance (32*32*1000 times):
+user system total real
+Dhash hamming 3.140000 0.020000 3.160000 ( 3.179392)
+DHashVips::DHash hamming 3.040000 0.020000 3.060000 ( 3.095190)
+DHashVips::IDHash distance 8.170000 0.040000 8.210000 ( 8.279950)
+DHashVips::IDHash distance3 6.720000 0.030000 6.750000 ( 6.790900)
+DHashVips::IDHash distance 4 24.430000 0.130000 24.560000 ( 24.652625)
+
+* Ruby 2.3.3 seems to have some bit arithmetic improvements compared to 2.0:
+
+load the image and calculate the fingerprint:
+user system total real
+Dhash 13.110000 0.950000 14.060000 ( 14.537057)
+DHashVips::DHash 1.480000 0.310000 1.790000 ( 1.808787)
+DHashVips::IDHash 1.080000 0.100000 1.180000 ( 1.156446)
+DHashVips::IDHash 4 1.030000 0.090000 1.120000 ( 1.076117)
+
+measure the distance (32*32*1000 times):
+user system total real
+Dhash hamming 1.770000 0.010000 1.780000 ( 1.815612)
+DHashVips::DHash hamming 1.810000 0.010000 1.820000 ( 1.875666)
+DHashVips::IDHash distance 4.250000 0.020000 4.270000 ( 4.350071)
+DHashVips::IDHash distance3 3.430000 0.020000 3.450000 ( 3.499031)
+DHashVips::IDHash distance 4 8.210000 0.110000 8.320000 ( 8.510735)
+
+* Ruby 2.3.8p459 (2.4.6, 2.5.5 and 2.6.3 are all similar) with newer CPU (`sysctl -n machdep.cpu.brand_string #=> Intel(R) Core(TM) i5-7360U CPU @ 2.30GHz`):
+
+load the image and calculate the fingerprint:
+user system total real
+Dhash 6.191731 0.230885 6.422616 ( 6.428763)
+Phamilie 5.361751 0.037524 5.399275 ( 5.402553)
+DHashVips::DHash 0.858045 0.144820 1.002865 ( 0.924308)
+DHashVips::IDHash 0.769975 0.071087 0.841062 ( 0.790470)
+DHashVips::IDHash 4 0.805311 0.077918 0.883229 ( 0.825897)
+
+measure the distance (32*32*2000 times):
+user system total real
+Dhash hamming 1.810000 0.000000 1.810000 ( 1.824719)
+Phamilie distance 1.000000 0.010000 1.010000 ( 1.006127)
+DHashVips::DHash hamming 1.810000 0.000000 1.810000 ( 1.817415)
+DHashVips::IDHash distance 1.400000 0.000000 1.400000 ( 1.401333)
+DHashVips::IDHash distance3_ruby 3.320000 0.010000 3.330000 ( 3.337920)
+DHashVips::IDHash distance3_c 0.210000 0.000000 0.210000 ( 0.212864)
+DHashVips::IDHash distance 4 8.300000 0.120000 8.420000 ( 8.499735)
+
+* Also note that to make `#distance` able to assume the fingerprint resolution from the size of the Integer that represents it, a change in its structure was needed (the left half of bits was swapped with the right one), so fingerprints between versions 0.0.4 and 0.0.5 became incompatible, but you can probably convert them manually. Otherwise, if we put the version or structure information inside the fingerprint, it would become slow to (de)serialize and store.
+
+## Development notes
+
+$ ruby test.rb
+
+* You might need to prepend `bundle exec` to all the `rake` commands.
+
+* You get this:
+
+Can't install RMagick 2.16.0. Can't find MagickWand.h.
+
+because Imagemagick sucks but we need it to benchmark alternative gems, so:
+
+$ brew install imagemagick@6
+$ brew unlink imagemagick@7
+$ brew link imagemagick@6 --force
+
+* OS X El Capitan and rbenv may cause environment issues that would make you do things like:
+
+$ ./ruby `rbenv which rake` compare_matrixes
+
+instead of just
+
+$ rake compare_matrixes
+
+For more information on that: https://github.com/jcupitt/ruby-vips/issues/141
+
+* On macOS, when you do `bundle install` it may fail to install `rmagick` gem (`dhash` gem dependency) saying:
+
+ERROR: Can't install RMagick 4.0.0. Can't find magick/MagickCore.h.
+
+To resolve this do:
+
+$ brew install imagemagick@6
+$ LDFLAGS="-L/usr/local/opt/imagemagick@6/lib" CPPFLAGS="-I/usr/local/opt/imagemagick@6/include" bundle install
+
+* If you get `No package 'MagickCore' found` try:
+
+$ PKG_CONFIG_PATH="/usr/local/Cellar/imagemagick@6/6.9.10-74/lib/pkgconfig" bundle install
+
+* Execute the `rake compare_quality` at least once before executing other rake tasks because it's currently the only one that downloads the test images.
+
+* The tag `v0.0.0.4` is not semver and not real gem version -- it's only for Github Actions testing purposes.
+
+* To quickly find out what the dhash-vips Docker image includes: `docker run --rm <image_name> sh -c "cat /etc/alpine-release; ruby -v; vips -v; gem list dhash-vips"` (TODO: write in this README about the existing Docker image).
+
+* Phamilie works with filenames instead of fingerprints and caches them but not distances.
 
 ## Credits
 
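Several README notes describe how IDHash fingerprints are compared. These notes cover the importance bits, the 128 extra difference bits from subtracting both horizontally and vertically, the removed `((a | b) & ((a ^ b) >> 128))` formula, and the remark that the two halves were swapped in 0.0.5. The sketch below shows how these fit together for power 3, under stated assumptions. The layout is inferred from the `compare_images` Rake task (`bi = hb >> shift` with `shift = 2 * size * size`): importance bits in the high half and difference bits in the low half. `idhash_distance3`, `swap_halves` and `HALF_BITS` are hypothetical names, not the gem's API. The real `#distance` also detects power 3 vs 4 from the fingerprint size, and this sketch does not.

```ruby
# Sketch of an IDHash-style distance for power = 3 fingerprints (not the gem's code).
# Assumed layout, following the Rakefile's `bi = hb >> shift`: importance bits in
# the high half, difference bits in the low half, each half 2 * 8 * 8 = 128 bits.
HALF_BITS = 2 * 8 * 8

def idhash_distance3(a, b, half = HALF_BITS)
  # A position counts if at least one fingerprint marks it as important.
  importance = (a | b) >> half
  # Difference bits (low half) on which the two fingerprints disagree.
  disagree = (a ^ b) & ((1 << half) - 1)
  (disagree & importance).to_s(2).count("1")
end

# Swap the two halves, e.g. to move an old-layout fingerprint into the new one.
def swap_halves(h, half = HALF_BITS)
  ((h & ((1 << half) - 1)) << half) | (h >> half)
end

a = (0b1010 << HALF_BITS) | 0b1111 # important positions 1 and 3; diff bits 0..3 set
b = (0b1010 << HALF_BITS)          # same importance; no diff bits set
puts idhash_distance3(a, b) # prints 2
```

Take two fingerprints in the old layout, apply `swap_halves` to each, and pass them to `idhash_distance3`. The result should equal the removed README formula applied to the originals. That makes it one way to check the manual conversion between 0.0.4 and 0.0.5 fingerprints.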
data/Rakefile
CHANGED
@@ -1,15 +1,7 @@
 STDOUT.sync = true
+require "pp"
 
-require "bundler/gem_tasks"
-
-
-task :default => %w{ spec }
-
-require "rspec/core/rake_task"
-RSpec::Core::RakeTask.new :spec do |t|
-t.verbose = false
-end
-
+require "bundler/gem_tasks" # to push to rubygems
 
 visualize_hash = lambda do |hash|
 puts hash.to_s(2).rjust(64, ?0).gsub(/(?<=.)/, '\0 ').scan(/.{16}/)
@@ -50,58 +42,80 @@ task :compare_kernels do |_|
 end
 end
 
-
-
-task :
+desc "Compare the quality of Dhash, Phamilie, DHashVips::DHash, DHashVips::IDHash"
+# in this test we want to know not that photos are the same but rather that they are from the same photosession
+task :compare_quality do
 require "dhash"
+require "phamilie"
+phamilie = Phamilie.new
 require_relative "lib/dhash-vips"
 require "mll"
-
-
-[
-[
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+
+puts MLL::grid.call( [
+["", "The same image:", "'Jordan Voth case':", "Similar images:", "Different images:", "1/FMI^2 =", "FP, FN ="],
+*[
+[Dhash, :calculate, :hamming],
+[phamilie, :fingerprint, :distance, nil, 0],
+[DHashVips::DHash, :calculate, :hamming],
+[DHashVips::IDHash, :fingerprint, :distance],
+[DHashVips::IDHash, :fingerprint, :distance, 4],
+].map do |m, calc, dm, power, ii|
+require_relative "common"
+hashes = %w{
+71662d4d4029a3b41d47d5baf681ab9a.jpg ad8a37f872956666c3077a3e9e737984.jpg
+
+1b1d4bde376084011d027bba1c047a4b.jpg 6d97739b4a08f965dc9239dd24382e96.jpg
+
+1d468d064d2e26b5b5de9a0241ef2d4b.jpg 92d90b8977f813af803c78107e7f698e.jpg
+309666c7b45ecbf8f13e85a0bd6b0a4c.jpg 3f9f3db06db20d1d9f8188cd753f6ef4.jpg
+679634ff89a31279a39f03e278bc9a01.jpg df0a3b93e9412536ee8a11255f974141.jpg
+54192a3f65bd03163b04849e1577a40b.jpg 6d32f57459e5b79b5deca2a361eb8c6e.jpg
+4b62e0eef58bfbc8d0d2fbf2b9d05483.jpg b8eb0ca91855b657f12fb3d627d45c53.jpg
+21cd9a6986d98976b6b4655e1de7baf4.jpg 9b158c0d4953d47171a22ed84917f812.jpg
+9c2c240ec02356472fb532f404d28dde.jpg fc762fa286489d8afc80adc8cdcb125e.jpg
+7a833d873f8d49f12882e86af1cc6b79.jpg ac033cf01a3941dd1baa876082938bc9.jpg
+}.map(&method(:download_and_keep)).map{ |filename| [filename, m.public_send(calc, filename, *power)] }
+table = MLL::table[m.method(dm), [hashes.map{|_|_[ii||1]}], [hashes.map{|_|_[ii||1]}]]
+report = Struct.new(:same, :bw, :sim, :not_sim).new [], [], [], []
+hashes.size.times.to_a.repeated_combination(2) do |i, j|
+report[
+case
+when i == j ; :same
+when [i, j] == [0, 1] ; :bw
+when i > 3 && i + 1 == j && i % 2 == 0 ; :sim
+else ; :not_sim
+end
+].push table[i][j]
+end
+min, max = [*report.sim, *report.not_sim].minmax
+fmi, fp, fn = (min..max+1).map do |b|
+fp = report.not_sim.count{ |_| _ < b }
+tp = report.sim.count{ |_| _ < b }
+fn = report.sim.count{ |_| _ >= b }
+[((tp + fp) * (tp + fn)).fdiv(tp * tp), fp, fn]
+end.reject{ |_,| _.nan? }.min_by(&:first)
+[
+"#{m.is_a?(Module) ? m : m.class}#{"(#{power})" if power}",
+report.same. minmax.join(".."),
+report.bw[0],
+report.sim. minmax.join(".."),
+report.not_sim.minmax.join(".."),
+fmi.round(3),
+[fp, fn]
+]
+end,
+].transpose, spacings: [1.5, 0], alignment: :right )
 end
 
 # ruby -c Rakefile && rm -f ab.png && rake compare_images -- fc762fa286489d8afc80adc8cdcb125e.jpg 9c2c240ec02356472fb532f404d28dde.jpg 2>/dev/null && ql ab.png
 # rm -f ab.png && ./ruby `rbenv which rake` compare_images -- 6d97739b4a08f965dc9239dd24382e96.jpg 1b1d4bde376084011d027bba1c047a4b.jpg 2>/dev/null && ql ab.png
 desc "Visualizes the IDHash difference measurement between two images"
 task :compare_images do |_|
-abort "there should be two image filenames passed as arguments (and optionally the `power`)" unless (4
-abort "the optional argument should be either 3 or 4" unless [3, 4].include?(power = (ARGV[
+abort "there should be two image filenames passed as arguments (and optionally the `power`)" unless (3..4) === ARGV.size
+abort "the optional argument should be either 3 or 4" unless [3, 4].include?(power = (ARGV[3] || 3).to_i)
 task ARGV.last do ; end
 require_relative "lib/dhash-vips"
-ha, hb = ARGV[
+ha, hb = ARGV[1, 2].map{ |filename| DHashVips::IDHash.fingerprint(filename, power) }
 puts "distance: #{DHashVips::IDHash.distance ha, hb}"
 size = 2 ** power
 shift = 2 * size * size
@@ -110,7 +124,7 @@ task :compare_images do |_|
 bi = hb >> shift
 bd = hb - (bi << shift)
 
-a, b = ARGV[
+a, b = ARGV[1, 2].map do |filename|
 image = Vips::Image.new_from_file filename
 image = image.resize(size.fdiv(image.width), vscale: size.fdiv(image.height)).colourspace("b-w").
 resize(100, vscale: 100, kernel: :nearest).colourspace("srgb")
@@ -187,10 +201,11 @@ task :compare_images do |_|
 a.join(b, :horizontal, shim: 15).write_to_file "ab.png"
 end
 
-
-desc "Benchmarks Dhash, DHashVips::DHash and DHashVips::IDHash"
+desc "Benchmark speed of Dhash, DHashVips::DHash, DHashVips::IDHash and Phamilie"
 task :compare_speed do
 require "dhash"
+require "phamilie"
+phamilie = Phamilie.new
 require_relative "lib/dhash-vips"
 
 filenames = %w{
@@ -214,36 +229,128 @@ task :compare_speed do
 end
 
 require "benchmark"
-puts "load and calculate the fingerprint:"
+puts "load the image and calculate the fingerprint:"
 hashes = []
 Benchmark.bm 19 do |bm|
 [
 [Dhash, :calculate],
+[phamilie, :fingerprint],
 [DHashVips::DHash, :calculate],
 [DHashVips::IDHash, :fingerprint],
 [DHashVips::IDHash, :fingerprint, 4],
 ].each do |m, calc, power|
-bm.report "#{m} #{power}" do
+bm.report "#{m.is_a?(Module) ? m : m.class} #{power}" do
 hashes.push filenames.map{ |filename| m.send calc, filename, *power }
 end
 end
 end
-
-
-
+
+# for `distance`, `distance3_ruby` and `distance3_c` we use the same hashes
+# this array manipulation converts [1, 2, 3, 4, 5] into [1, 2, 3, 4, 4, 4, 5]
+hashes[-1, 1] = hashes[-2, 2]
+hashes[-1, 1] = hashes[-2, 2]
+
+puts "\nmeasure the distance (32*32*2000 times):"
+Benchmark.bm 32 do |bm|
 [
 [Dhash, :hamming],
+[phamilie, :distance, nil, 1],
 [DHashVips::DHash, :hamming],
 [DHashVips::IDHash, :distance],
-[DHashVips::IDHash, :
+[DHashVips::IDHash, :distance3_ruby],
+[DHashVips::IDHash, :distance3_c],
 [DHashVips::IDHash, :distance, 4],
-].zip(hashes) do |(m, dm, power), hs|
-bm.report "#{m} #{dm} #{power}" do
-
-
+].zip(hashes) do |(m, dm, power, ii), hs|
+bm.report "#{m.is_a?(Module) ? m : m.class} #{dm} #{power}" do
+_ = [hs, filenames][ii || 0]
+_.product _ do |h1, h2|
+2000.times{ m.public_send dm, h1, h2 }
 end
 end
 end
 end
 
 end
+
+desc "Benchmarks everything about Dhash, DHashVips::DHash, DHashVips::IDHash and Phamilie"
+task :benchmark do
+abort "provide a folder with images grouped by similarity" unless 2 === ARGV.size
+abort "invalid folder provided" unless Dir.exist?(dir = ARGV.last)
+
+require "dhash"
+require "phamilie"
+phamilie = Phamilie.new
+require_relative "lib/dhash-vips"
+
+filenames = Dir.glob("#{dir}/*").map{ |_| Dir.glob "#{_}/*" }
+puts "image groups sizes: #{filenames.map(&:size)}"
+require "benchmark"
+
+puts "step 1 / 3 (fingerprinting)"
+hashes = []
+bm1 = [
+[phamilie, :fingerprint],
+[Dhash, :calculate],
+[DHashVips::DHash, :calculate],
+[DHashVips::IDHash, :fingerprint],
+].map do |m, calc, power|
+Benchmark.realtime do
+hashes.push filenames.flatten.map{ |filename| m.send calc, filename, *power }
+end
+end
+
+puts "step 2 / 3 (comparing fingerprints)"
+combs = filenames.flatten.size ** 2
+n = 10_000_000_000_000 / Dir.glob("#{dir}/*/*").map(&File.method(:size)).inject(:+) / combs
+bm2 = [
+[phamilie, :distance, nil, filenames.flatten],
+[Dhash, :hamming],
+[DHashVips::DHash, :hamming],
+[DHashVips::IDHash, :distance3_c],
+].zip(hashes).map do |(m, dm, power, ii), hs|
+Benchmark.realtime do
+_ = ii || hs
+_.product _ do |h1, h2|
+n.times{ m.public_send dm, h1, h2 }
+end
+end
+end
+
+puts "step 3 / 3 (looking for the best threshold)"
+bm3 = [
+[phamilie, :fingerprint, :distance, nil, 0],
+[Dhash, :calculate, :hamming],
+[DHashVips::DHash, :calculate, :hamming],
+[DHashVips::IDHash, :fingerprint, :distance],
+].map do |m, calc, dm, power, ii|
+require_relative "common"
+hashes = Dir.glob("#{dir}/*").flat_map{ |_| Dir.glob "#{_}/*" }.map{ |filename| [filename, m.public_send(calc, filename, *power)] }
+report = Struct.new(:same, :sim, :not_sim).new [], [], []
+hashes.size.times.to_a.repeated_combination(2) do |i, j|
+report[
+case
+when i == j ; :same
+when File.split(File.split(hashes[i][0]).first).last ==
+File.split(File.split(hashes[j][0]).first).last && i < j ; :sim
+else ; :not_sim
+end
+].push m.method(dm).call hashes[i][ii||1], hashes[j][ii||1]
+end
+min, max = [*report.sim, *report.not_sim].minmax
+fmi, fp, fn = (min..max+1).map do |b|
+fp = report.not_sim.count{ |_| _ < b }
+tp = report.sim.count{ |_| _ < b }
+fn = report.sim.count{ |_| _ >= b }
+[((tp + fp) * (tp + fn)).fdiv(tp * tp), fp, fn]
+end.reject{ |_,| _.nan? }.min_by(&:first)
+fmi
+end
+
+require "mll"
+puts MLL::grid.call %w{ \ Fingerprint Compare 1/FMI^2 }.zip(*[
+%w{ Phamilie Dhash DHash IDHash },
+*[bm1, bm2].map{ |bm| bm.map{ |_| "%.3f" % _ } },
+bm3.map{ |_| "%.3f" % _ }
+].transpose).transpose, spacings: [1.5, 0], alignment: :right
+puts "(lower numbers are better)"
+end
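The `compare_quality` and `benchmark` tasks both search for a threshold `b`, where a pair counts as "similar" when its distance is below `b`. They pick the `b` that minimizes `((tp + fp) * (tp + fn)).fdiv(tp * tp)`. That expression is 1/FMI² for the Fowlkes–Mallows index FMI = tp / √((tp + fp)(tp + fn)). The standalone sketch below mirrors that loop. `best_threshold` is a hypothetical name, and the inputs are toy distances, not benchmark results. One difference from the Rakefile: the Rakefile rejects only NaN scores, while this sketch skips every threshold that has no true positives.

```ruby
# Sketch of the threshold search done by the Rakefile's quality tasks.
# A pair is predicted "similar" when its distance is below threshold b.
# Score minimized: 1/FMI^2 = (tp + fp) * (tp + fn) / tp^2, because the
# Fowlkes-Mallows index is FMI = tp / sqrt((tp + fp) * (tp + fn)).
def best_threshold(similar, different)
  min, max = (similar + different).minmax
  candidates = (min..max + 1).map do |b|
    tp = similar.count { |d| d < b }
    next if tp.zero? # FMI is undefined without true positives
    fp = different.count { |d| d < b }
    fn = similar.size - tp
    [((tp + fp) * (tp + fn)).fdiv(tp * tp), b, fp, fn]
  end
  # Lowest score wins; on ties the first (smallest) threshold is kept.
  candidates.compact.min_by(&:first) # => [1/FMI^2, threshold, FP, FN]
end

p best_threshold([2, 3, 5], [8, 9, 12]) # prints [1.0, 6, 0, 0]
```

A perfectly separable set gives a score of 1.0, which is the lower bound. This matches the README's note that a smaller `1/FMI^2` is better.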
data/common.rb
ADDED
@@ -0,0 +1,11 @@
+def download_and_keep image # returns path
+require "open-uri"
+require "digest"
+File.join(FileUtils.mkdir_p(File.expand_path "images", __dir__()).first, image).tap do |path|
+open("https://storage.googleapis.com/dhash-vips.nakilon.pro/#{image}") do |link|
+File.open(path, "wb") do |file|
+IO.copy_stream link, file
+end
+end unless File.exist?(path) && Digest::MD5.file(path) == File.basename(image, ".jpg")
+end
+end
data/dhash-vips.gemspec
CHANGED
@@ -1,20 +1,31 @@
 Gem::Specification.new do |spec|
   spec.name = "dhash-vips"
-  spec.version = "0.0.
+  spec.version = "0.1.0.3"
   spec.author = "Victor Maslov"
   spec.email = "nakilon@gmail.com"
   spec.summary = "dHash and IDHash powered by Vips"
   spec.homepage = "https://github.com/nakilon/dhash-vips"
   spec.license = "MIT"

-  spec.test_files = ["spec"]
-  spec.files = `git ls-files -z`.split("\x0") - spec.test_files
   spec.require_path = "lib"
+  spec.test_files = %w{ test.rb }
+  spec.extensions = %w{ extconf.rb }
+  spec.files = `git ls-files -z`.split("\x0") -
+    spec.test_files -
+    %w{ .gitignore Dockerfile } -
+    Dir.glob("example_*/**/*") -
+    Dir.glob(".github/**/*")

-  spec.add_dependency "ruby-vips"
+  spec.add_dependency "ruby-vips", "~>2.0.16"

   spec.add_development_dependency "rake"
-  spec.add_development_dependency "
-  spec.add_development_dependency "dhash"
+  spec.add_development_dependency "minitest"
   spec.add_development_dependency "get_process_mem"
+
+  spec.add_development_dependency "rmagick", "~>2.16"
+  spec.add_development_dependency "phamilie"
+  spec.add_development_dependency "dhash"
+
+  spec.add_development_dependency "mll"
+  spec.add_development_dependency "byebug"
 end
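The new runtime constraint `"~>2.0.16"` on ruby-vips and the development constraint `"~>2.16"` on rmagick are pessimistic version requirements. RubyGems' own `Gem::Requirement` can show which versions each one admits; the version lists below are just sample inputs.

```ruby
require "rubygems"

# "~> 2.0.16" means ">= 2.0.16" and "< 2.1": later 2.0.x patches are accepted.
vips = Gem::Requirement.new("~>2.0.16")
%w{ 2.0.15 2.0.16 2.0.99 2.1.0 }.each do |v|
  puts "ruby-vips #{v}: #{vips.satisfied_by?(Gem::Version.new(v))}"
end

# "~> 2.16" means ">= 2.16" and "< 3.0": any later 2.x release is accepted.
rmagick = Gem::Requirement.new("~>2.16")
%w{ 2.15 2.16.0 2.99 3.0 }.each do |v|
  puts "rmagick #{v}: #{rmagick.satisfied_by?(Gem::Version.new(v))}"
end
```

In practice the ruby-vips pin keeps the gem on the 2.0.x API line while still picking up bug-fix releases.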
|
data/extconf.rb
ADDED
@@ -0,0 +1,31 @@
+require "mkmf"
+
+File.write "Makefile", dummy_makefile(?.).join
+unless Gem::Platform.local.os == "darwin" && Gem::Version.new(RUBY_VERSION) == Gem::Version.new("2.3.8")
+else
+  begin
+    # https://github.com/rbenv/rbenv/issues/1199
+    append_cppflags "-I#{Dir.glob("#{`rbenv root`.chomp}/sources/#{`rbenv version-name`.chomp}/*/").first}"
+  rescue
+  else
+    create_makefile "idhash"
+    # Why this hack?
+    # 1. Because I want to use Ruby and ./idhash.bundle for tests, not C.
+    # 2. Because I don't want to bother users with two gems instead of one.
+    File.write "Makefile", <<~HEREDOC + File.read("Makefile")
+      .PHONY: test
+      test: all
+      \t$(RUBY) -r./lib/dhash-vips.rb ./lib/dhash-vips-post-install-test.rb
+    HEREDOC
+  end
+end
+
+# Cases to check:
+# 0. all is ok
+# `rm -rf idhash.o idhash.bundle pkg && bundle exec rake install` # w/o ext # ["/Users/nakilon/_/dhash-vips/lib/dhash-vips.rb", 32]
+# `rm -f idhash.o idhash.bundle Makefile && ruby extconf.rb && make` # with ext # ["/Users/nakilon/_/dhash-vips/lib/dhash-vips.rb", 40]
+# `bundle exec rake -rdhash-vips -e "p DHashVips::IDHash.method(:distance3).source_location"`
+# 1. not macOS && rbenv
+# 2. fail during append_cppflags
+# 3. failed compilation
+# 4. failed tests
|
data/idhash.c
ADDED
@@ -0,0 +1,29 @@
+#include <bignum.c>
+
+static VALUE idhash_distance(VALUE self, VALUE a, VALUE b){
+  BDIGIT* tempd;
+  long i, an = BIGNUM_LEN(a), bn = BIGNUM_LEN(b), templ, acc = 0;
+  BDIGIT* as = BDIGITS(a);
+  BDIGIT* bs = BDIGITS(b);
+  while (0 < an && as[an-1] == 0) an--; // for (i = an; --i;) printf("%u\n", as[i]);
+  while (0 < bn && bs[bn-1] == 0) bn--; // for (i = bn; --i;) printf("%u\n", bs[i]);
+  // printf("%lu %lu\n", an, bn);
+  if (an < bn) {
+    tempd = as; as = bs; bs = tempd;
+    templ = an; an = bn; bn = templ;
+  }
+  for (i = an; i-- > 4;) {
+    // printf("%ld : (%u | %u) & (%u ^ %u)\n", i, as[i], (i >= bn ? 0 : bs[i]), as[i-4], bs[i-4]);
+    acc += __builtin_popcountl((as[i] | (i >= bn ? 0 : bs[i])) & (as[i-4] ^ bs[i-4]));
+    // printf("%ld : %ld\n", i, acc);
+  }
+  RB_GC_GUARD(a);
+  RB_GC_GUARD(b);
+  return INT2FIX(acc);
+}
+
+void Init_idhash() {
+  VALUE m = rb_define_module("DHashVips");
+  VALUE mm = rb_define_module_under(m, "IDHash");
+  rb_define_module_function(mm, "distance3_c", idhash_distance, 2);
+}
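`idhash_distance` walks the Bignum's digits from index `an - 1` down to 4 and, for each index `i`, counts the set bits of `(a[i] | b[i]) & (a[i-4] ^ b[i-4])`. Assuming 32-bit `BDIGIT`s (the `%u` debug format hints at this, but the digit width is build-dependent) and 256-bit fingerprints, digits 4..7 hold the high 128 bits, so the loop is a word-by-word form of `((a ^ b) & (a | b) >> 128)` in `distance3_ruby`. The Ruby model below, with hypothetical helper names `limbs` and `distance3_limbwise`, checks that equivalence on random inputs.

```ruby
# Split n into `count` 32-bit digits, least significant first (like BDIGITS).
def limbs n, count = 8
  Array.new(count) { |i| (n >> (32 * i)) & 0xFFFF_FFFF }
end

# Hypothetical model of the C loop: pair digit i (high half) with i - 4 (low half).
def distance3_limbwise a, b
  as, bs = limbs(a), limbs(b)
  (4...8).sum do |i|
    ((as[i] | bs[i]) & (as[i - 4] ^ bs[i - 4])).to_s(2).count("1")
  end
end

# Same formula as DHashVips::IDHash.distance3_ruby; `>>` binds tighter than `&`.
def distance3_ruby a, b
  ((a ^ b) & (a | b) >> 128).to_s(2).count "1"
end

srand 1
1000.times do
  a, b = rand(1 << 256), rand(1 << 256)
  fail "mismatch for #{a}, #{b}" unless distance3_limbwise(a, b) == distance3_ruby(a, b)
end
puts "limb-wise and shift-based distances agree"
```

The post-install test performs the same kind of cross-check against the real `distance3_c`, and additionally asserts 17 for its two hard-coded constants.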
|
data/lib/dhash-vips-post-install-test.rb
ADDED
@@ -0,0 +1,36 @@
+puts "Testing native extension..."
+
+a, b = 27362028616592833077810614538336061650596602259623245623188871925927275101952, 57097733966917585112089915289446881218887831888508524872740133297073405558528
+f = ->(a,b){ DHashVips::IDHash.distance3_ruby a, b }
+
+p as = [a.to_s(16).rjust(64,?0)].pack("H*").unpack("N*")
+p bs = [b.to_s(16).rjust(64,?0)].pack("H*").unpack("N*")
+puts as.zip(bs)[0,4].map{ |i,j| (i | j).to_s(2).rjust(32, ?0) }.zip \
+  as.zip(bs)[4,4].map{ |i,j| (i ^ j).to_s(2).rjust(32, ?0) }
+p DHashVips::IDHash.distance3_c a, b
+p f[a, b]
+fail unless 17 == f[a, b]
+
+s = [0, 1, 1<<63, (1<<63)+1, (1<<64)-1].each do |_|
+  # p [_.to_s(16).rjust(64,?0)].pack("H*").unpack("N*").map{ |_| _.to_s(2).rjust(32, ?0) }
+end
+ss = s.repeated_permutation(4).map do |s1, s2, s3, s4|
+  ((s1 << 192) + (s2 << 128) + (s3 << 64) + s4).tap do |_|
+    # p [_.to_s(16).rjust(64,?0)].pack("H*").unpack("N*").map{ |_| _.to_s(2).rjust(32, ?0) }
+  end
+end
+fail unless :distance3 == DHashVips::IDHash.method(:distance3).original_name
+ss.product ss do |s1, s2|
+  next unless s1.is_a?(Bignum) && s2.is_a?(Bignum)
+  unless f[s1, s2] == DHashVips::IDHash.distance3_c(s1, s2)
+    p [s1, s2]
+    p [s1.to_s(16).rjust(64,?0)].pack("H*").unpack("N*").map{ |_| _.to_s(2).rjust(32, ?0) }
+    p [s2.to_s(16).rjust(64,?0)].pack("H*").unpack("N*").map{ |_| _.to_s(2).rjust(32, ?0) }
+    p [f[s1, s2], DHashVips::IDHash.distance3_c(s1, s2)]
+    fail
+  end
+end
+100000.times do
+  s1, s2 = Array.new(2){ n = rand 256; ([?0] * n + [?1] * (256 - n)).shuffle.join.to_i 2 }
+  fail unless DHashVips::IDHash.distance3(s1, s2) == DHashVips::IDHash.distance3_ruby(s1, s2)
+end
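The debugging output in this test relies on one idiom: `[n.to_s(16).rjust(64,?0)].pack("H*").unpack("N*")` turns a 256-bit Integer into eight 32-bit big-endian words, most significant first, which is the reverse of the C digit order. So `as.zip(bs)[0,4]` are the high words that get ORed and `[4,4]` the low words that get XORed. A small sketch with hypothetical `words256` and `from_words` helpers:

```ruby
# Split a non-negative Integer below 2**256 into eight 32-bit words, most
# significant word first ("H*" = hex string to bytes, "N" = 32-bit big-endian).
def words256 n
  [n.to_s(16).rjust(64, "0")].pack("H*").unpack("N*")
end

# Inverse operation, handy for checking that the split round-trips.
def from_words words
  words.inject(0) { |acc, w| (acc << 32) | w }
end

p words256(1)         # => [0, 0, 0, 0, 0, 0, 0, 1]
p words256(1 << 255)  # => [2147483648, 0, 0, 0, 0, 0, 0, 0]
```

For inputs of 2**256 or more the hex string outgrows the 64-digit padding and the word count changes, so the helper is only meaningful for 256-bit fingerprints.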
|
data/lib/dhash-vips.rb
CHANGED
@@ -10,7 +10,7 @@ module DHashVips
     end
 
     def pixelate file, hash_size, kernel = nil
-      image = Vips::Image.new_from_file file
+      image = Vips::Image.new_from_file file, access: :sequential
       if kernel
         image.resize((hash_size + 1).fdiv(image.width), vscale: hash_size.fdiv(image.height), kernel: kernel).colourspace("b-w")
       else
@@ -21,7 +21,7 @@ module DHashVips
     def calculate file, hash_size = 8, kernel = nil
       image = pixelate file, hash_size, kernel
 
-      image.cast("int").conv([1, -1]).crop(1, 0, hash_size, hash_size).>(0)./(255).cast("uchar").to_a.join.to_i(2)
+      image.cast("int").conv([[1, -1]]).crop(1, 0, hash_size, hash_size).>(0)./(255).cast("uchar").to_a.join.to_i(2)
     end
 
   end
@@ -29,17 +29,30 @@ module DHashVips
   module IDHash
     extend self
 
-    def
-
+    def distance3_ruby a, b
+      ((a ^ b) & (a | b) >> 128).to_s(2).count "1"
+    end
+    begin
+      require_relative "../idhash.bundle"
+    rescue LoadError
+      alias_method :distance3, :distance3_ruby
+    else
+      def distance3 a, b
+        if a.is_a?(Bignum) && b.is_a?(Bignum)
+          distance3_c a, b
+        else
+          distance3_ruby a, b
+        end
+      end
     end
     def distance a, b
       size_a, size_b = [a, b].map do |x|
-
-
-
-
-        end
+        # TODO write a test about possible hash sizes
+        # they were 32 and 128, 124, 120 for MRI 2.0
+        # but also 31, 30 happens for MRI 2.3
+        x.size <= 32 ? 8 : 16
       end
+      return distance3 a, b if [8, 8] == [size_a, size_b]
       fail "fingerprints were taken with different `power` param: #{size_a} and #{size_b}" if size_a != size_b
       ((a ^ b) & (a | b) >> 2 * size_a * size_a).to_s(2).count "1"
     end
@@ -64,10 +77,10 @@ module DHashVips
     fail unless 1 == @@median[[1, 1, 1]]
     fail unless 1 == @@median[[1, 1]]
 
-    def fingerprint
+    def fingerprint filename, power = 3
       size = 2 ** power
-      image = Vips::Image.new_from_file
-      image = image.resize(size.fdiv(image.width), vscale: size.fdiv(image.height)).colourspace("b-w")
+      image = Vips::Image.new_from_file filename, access: :sequential
+      image = image.resize(size.fdiv(image.width), vscale: size.fdiv(image.height)).colourspace("b-w").flatten
 
       array = image.to_a.map &:flatten
       d1, i1, d2, i2 = [array, array.transpose].flat_map do |a|
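`lib/dhash-vips.rb` now picks its `distance3` implementation at load time: if `require_relative "../idhash.bundle"` raises `LoadError`, `distance3` is only an alias of `distance3_ruby`; otherwise it dispatches Bignum pairs to `distance3_c`. The post-install test checks `method(:distance3).original_name` to tell the two cases apart. The same pattern in isolation, using a hypothetical `Popcount` module and an extension file that does not exist, so the fallback branch is the one that runs:

```ruby
module Popcount
  extend self

  # Pure-Ruby implementation, always available.
  def count_ruby n
    n.to_s(2).count "1"
  end

  begin
    # Hypothetical compiled extension; the file is absent, so LoadError is raised.
    require_relative "popcount_ext_that_does_not_exist"
  rescue LoadError
    alias_method :count, :count_ruby
  else
    def count n
      count_c n # count_c would be defined by the extension
    end
  end
end

p Popcount.count(0b1011)                 # => 3
p Popcount.method(:count).original_name  # => :count_ruby
```

With the fallback active `original_name` reports the aliased method, which is why a value of `:distance3` in the gem's test indicates that the native path was loaded.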
|
data/test.rb
ADDED
@@ -0,0 +1,101 @@
+require "minitest/autorun"
+
+require "dhash-vips"
+
+# TODO tests about `fingerprint(4)`
+
+[
+  [DHashVips::DHash, :hamming, :calculate, 2, 23, 18, 50, 4],
+  # vips-8.9.1-Tue Jan 28 13:05:46 UTC 2020
+  # [[0, 14, 26, 27, 31, 27, 32, 28, 43, 43, 34, 37, 37, 34, 35, 42],
+  # [14, 0, 28, 25, 39, 35, 32, 32, 43, 43, 38, 41, 41, 38, 37, 50],
+  # [26, 28, 0, 13, 35, 41, 28, 30, 41, 41, 36, 33, 35, 32, 27, 36],
+  # [27, 25, 13, 0, 36, 36, 31, 35, 40, 40, 33, 32, 42, 35, 26, 33],
+  # [31, 39, 35, 36, 0, 16, 33, 33, 40, 40, 31, 24, 28, 33, 40, 31],
+  # [27, 35, 41, 36, 16, 0, 41, 41, 38, 38, 23, 26, 24, 29, 34, 27],
+  # [32, 32, 28, 31, 33, 41, 0, 10, 27, 25, 38, 35, 37, 32, 23, 34],
+  # [28, 32, 30, 35, 33, 41, 10, 0, 27, 27, 34, 31, 37, 36, 27, 34],
+  # [43, 43, 41, 40, 40, 38, 27, 27, 0, 2, 35, 34, 30, 31, 28, 27],
+  # [43, 43, 41, 40, 40, 38, 25, 27, 2, 0, 35, 34, 30, 31, 28, 27],
+  # [34, 38, 36, 33, 31, 23, 38, 34, 35, 35, 0, 9, 23, 26, 29, 18],
+  # [37, 41, 33, 32, 24, 26, 35, 31, 34, 34, 9, 0, 22, 25, 30, 19],
+  # [37, 41, 35, 42, 28, 24, 37, 37, 30, 30, 23, 22, 0, 19, 26, 23],
+  # [34, 38, 32, 35, 33, 29, 32, 36, 31, 31, 26, 25, 19, 0, 21, 26],
+  # [35, 37, 27, 26, 40, 34, 23, 27, 28, 28, 29, 30, 26, 21, 0, 23],
+  # [42, 50, 36, 33, 31, 27, 34, 34, 27, 27, 18, 19, 23, 26, 23, 0]]
+  [DHashVips::IDHash, :distance, :fingerprint, 6, 22, 23, 65, 0],
+  # vips-8.9.1-Tue Jan 28 13:05:46 UTC 2020
+  # [[0, 16, 32, 35, 57, 45, 51, 50, 48, 47, 54, 48, 60, 50, 47, 56],
+  # [16, 0, 30, 34, 58, 47, 55, 56, 47, 50, 57, 49, 62, 52, 52, 61],
+  # [32, 30, 0, 9, 47, 54, 45, 41, 65, 62, 42, 37, 51, 44, 49, 49],
+  # [35, 34, 9, 0, 54, 64, 42, 40, 57, 56, 48, 39, 50, 40, 41, 51],
+  # [57, 58, 47, 54, 0, 22, 43, 45, 64, 61, 48, 47, 35, 43, 47, 48],
+  # [45, 47, 54, 64, 22, 0, 53, 54, 55, 54, 40, 46, 39, 42, 43, 42],
+  # [51, 55, 45, 42, 43, 53, 0, 6, 33, 35, 52, 43, 46, 45, 44, 47],
+  # [50, 56, 41, 40, 45, 54, 6, 0, 38, 41, 53, 50, 48, 45, 41, 42],
+  # [48, 47, 65, 57, 64, 55, 33, 38, 0, 9, 51, 53, 47, 47, 41, 46],
+  # [47, 50, 62, 56, 61, 54, 35, 41, 9, 0, 51, 57, 50, 49, 44, 43],
+  # [54, 57, 42, 48, 48, 40, 52, 53, 51, 51, 0, 10, 33, 36, 38, 25],
+  # [48, 49, 37, 39, 47, 46, 43, 50, 53, 57, 10, 0, 27, 30, 37, 27],
+  # [60, 62, 51, 50, 35, 39, 46, 48, 47, 50, 33, 27, 0, 20, 23, 28],
+  # [50, 52, 44, 40, 43, 42, 45, 45, 47, 49, 36, 30, 20, 0, 35, 39],
+  # [47, 52, 49, 41, 47, 43, 44, 41, 41, 44, 38, 37, 23, 35, 0, 19],
+  # [56, 61, 49, 51, 48, 42, 47, 42, 46, 43, 25, 27, 28, 39, 19, 0]]
+].each do |lib, dm, calc, min_similar, max_similar, min_not_similar, max_not_similar, bw_exceptional|
+
+  describe lib do
+
+    # these are false positive by idhash
+    # 6d97739b4a08f965dc9239dd24382e96.jpg
+    # 1b1d4bde376084011d027bba1c047a4b.jpg
+    [
+      [ %w{
+        1d468d064d2e26b5b5de9a0241ef2d4b.jpg 92d90b8977f813af803c78107e7f698e.jpg
+        309666c7b45ecbf8f13e85a0bd6b0a4c.jpg 3f9f3db06db20d1d9f8188cd753f6ef4.jpg
+        679634ff89a31279a39f03e278bc9a01.jpg df0a3b93e9412536ee8a11255f974141.jpg
+        54192a3f65bd03163b04849e1577a40b.jpg 6d32f57459e5b79b5deca2a361eb8c6e.jpg
+        4b62e0eef58bfbc8d0d2fbf2b9d05483.jpg b8eb0ca91855b657f12fb3d627d45c53.jpg
+        21cd9a6986d98976b6b4655e1de7baf4.jpg 9b158c0d4953d47171a22ed84917f812.jpg
+        9c2c240ec02356472fb532f404d28dde.jpg fc762fa286489d8afc80adc8cdcb125e.jpg
+        7a833d873f8d49f12882e86af1cc6b79.jpg ac033cf01a3941dd1baa876082938bc9.jpg
+      }, min_similar, max_similar], # slightly silimar images
+      [ %w{
+        71662d4d4029a3b41d47d5baf681ab9a.jpg ad8a37f872956666c3077a3e9e737984.jpg
+      }, bw_exceptional, bw_exceptional], # these are the same photo but of different size and colorspace
+    ].each do |images, min, max|
+
+      require "fileutils"
+      require "digest"
+      require "mll"
+
+      require_relative "common"
+      images = images.map &method(:download_and_keep)
+
+      hashes = images.map &lib.method(calc)
+      table = MLL::table[lib.method(dm), [hashes], [hashes]]
+
+      require "pp"
+      STDERR.puts ""
+      PP.pp table, STDERR
+      STDERR.puts ""
+
+      hashes.size.times.to_a.repeated_combination(2) do |i, j|
+        it do
+          case
+          when i == j
+            assert_predicate table[i][j], :zero?
+          when (j - i).abs == 1 && (i + j - 1) % 4 == 0
+            # STDERR.puts [table[i][j], min, max].inspect
+            assert_includes min..max, table[i][j]
+          else
+            # STDERR.puts [table[i][j], min_not_similar, max_not_similar].inspect
+            assert_includes min_not_similar..max_not_similar, table[i][j]
+          end
+        end
+      end
+
+    end
+
+  end
+
+end
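In `test.rb` the images are listed as consecutive pairs, and a cell of the distance table is checked against the "similar" range only when `(j - i).abs == 1 && (i + j - 1) % 4 == 0`. Because `repeated_combination` yields `i <= j`, that condition reduces to `j == i + 1` with `i` even, i.e. exactly the two names written on one line of each `%w{}` list. A quick check with a hypothetical `similar_pair?` helper that copies the condition:

```ruby
# Same condition as the `when` branch in test.rb.
def similar_pair? i, j
  (j - i).abs == 1 && (i + j - 1) % 4 == 0
end

# Six images = three consecutive pairs.
pairs = 6.times.to_a.repeated_combination(2).select { |i, j| similar_pair?(i, j) }
p pairs  # => [[0, 1], [2, 3], [4, 5]]
```

Every other off-diagonal cell, such as `[1, 2]`, is asserted to fall in the `min_not_similar..max_not_similar` range instead.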
|
metadata
CHANGED
@@ -1,99 +1,159 @@
 --- !ruby/object:Gem::Specification
 name: dhash-vips
 version: !ruby/object:Gem::Version
-  version: 0.0.
+  version: 0.1.0.3
 platform: ruby
 authors:
 - Victor Maslov
 autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2020-07-25 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: ruby-vips
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - -
+    - - "~>"
       - !ruby/object:Gem::Version
-        version:
+        version: 2.0.16
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - -
+    - - "~>"
       - !ruby/object:Gem::Version
-        version:
+        version: 2.0.16
 - !ruby/object:Gem::Dependency
   name: rake
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - -
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: minitest
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: get_process_mem
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - -
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
 - !ruby/object:Gem::Dependency
-  name:
+  name: rmagick
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '2.16'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '2.16'
+- !ruby/object:Gem::Dependency
+  name: phamilie
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - -
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - -
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
 - !ruby/object:Gem::Dependency
   name: dhash
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - -
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - -
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
 - !ruby/object:Gem::Dependency
-  name:
+  name: mll
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: byebug
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - -
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - -
+    - - ">="
       - !ruby/object:Gem::Version
         version: '0'
 description:
 email: nakilon@gmail.com
 executables: []
-extensions:
+extensions:
+- extconf.rb
 extra_rdoc_files: []
 files:
-- .gitignore
 - Gemfile
 - LICENSE.txt
 - README.md
 - Rakefile
+- common.rb
 - dhash-vips.gemspec
+- extconf.rb
+- idhash.c
+- lib/dhash-vips-post-install-test.rb
 - lib/dhash-vips.rb
--
+- test.rb
 homepage: https://github.com/nakilon/dhash-vips
 licenses:
 - MIT
@@ -104,18 +164,19 @@ require_paths:
 - lib
 required_ruby_version: !ruby/object:Gem::Requirement
   requirements:
-  - -
+  - - ">="
     - !ruby/object:Gem::Version
       version: '0'
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
-  - -
+  - - ">="
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
 rubyforge_project:
-rubygems_version: 2.
+rubygems_version: 2.5.2.3
 signing_key:
 specification_version: 4
 summary: dHash and IDHash powered by Vips
-test_files:
+test_files:
+- test.rb
|
data/.gitignore
DELETED
@@ -1,22 +0,0 @@
-*.gem
-*.rbc
-.bundle
-.config
-.yardoc
-Gemfile.lock
-InstalledFiles
-_yardoc
-coverage
-doc/
-lib/bundler/man
-pkg
-rdoc
-spec/reports
-test/tmp
-test/version_tmp
-tmp
-*.bundle
-*.so
-*.o
-*.a
-mkmf.log
|
data/spec/_spec.rb
DELETED
@@ -1,100 +0,0 @@
-require "dhash-vips"
-
-require "pp"
-
-# TODO tests about `fingerprint(4)`
-
-[
-  [DHashVips::DHash, :hamming, :calculate, 17, 18, 22, 39],
-  # [[0, 17, 29, 27, 22, 29, 30, 29],
-  # [17, 0, 30, 26, 33, 36, 37, 36],
-  # [29, 30, 0, 18, 39, 30, 39, 36],
-  # [27, 26, 18, 0, 35, 30, 35, 34],
-  # [22, 33, 39, 35, 0, 17, 28, 23],
-  # [29, 36, 30, 30, 17, 0, 33, 30],
-  # [30, 37, 39, 35, 28, 33, 0, 5],
-  # [29, 36, 36, 34, 23, 30, 5, 0]]
-  [DHashVips::IDHash, :distance, :fingerprint, 15, 23, 28, 64],
-  # [[0, 16, 30, 32, 46, 58, 43, 43],
-  # [16, 0, 28, 28, 47, 59, 46, 47],
-  # [30, 28, 0, 15, 53, 49, 53, 52],
-  # [32, 28, 15, 0, 56, 53, 61, 64],
-  # [46, 47, 53, 56, 0, 23, 43, 45],
-  # [58, 59, 49, 53, 23, 0, 44, 44],
-  # [43, 46, 53, 61, 43, 44, 0, 0],
-  # [43, 47, 52, 64, 45, 44, 0, 0]]
-].each do |lib, dm, calc, min_similar, max_similar, min_not_similar, max_not_similar|
-
-  describe lib do
-
-    require "fileutils"
-    require "open-uri"
-    require "digest"
-    require "mll"
-    example do |example|
-
-      images = %w{
-        1d468d064d2e26b5b5de9a0241ef2d4b.jpg
-        92d90b8977f813af803c78107e7f698e.jpg
-        309666c7b45ecbf8f13e85a0bd6b0a4c.jpg
-        3f9f3db06db20d1d9f8188cd753f6ef4.jpg
-        df0a3b93e9412536ee8a11255f974141.jpg
-        679634ff89a31279a39f03e278bc9a01.jpg
-      } # these images a consecutive pairs of slightly (but enough for nice asserts) silimar images
-      # 6d97739b4a08f965dc9239dd24382e96.jpg
-      # 1b1d4bde376084011d027bba1c047a4b.jpg
-      # while these two are tend to be false positive match by idhash
-      bw1, bw2 = %w{
-        71662d4d4029a3b41d47d5baf681ab9a.jpg
-        ad8a37f872956666c3077a3e9e737984.jpg
-      } # these are the same photo but of different size and colorspace
-
-      example.metadata[:extra_failure_lines] = []
-      FileUtils.mkdir_p dir = "images"
-      *images, bw1, bw2 = [*images, bw1, bw2].map do |image|
-        "#{dir}/#{image}".tap do |filename|
-          unless File.exist?(filename) && Digest::MD5.file(filename) == File.basename(filename, ".jpg")
-            example.metadata[:extra_failure_lines] << "copying image from web to #{filename}"
-            open("https://storage.googleapis.com/dhash-vips.nakilon.pro/#{image}") do |link|
-              File.open(filename, "wb") do |file|
-                IO.copy_stream link, file
-              end
-            end
-          end
-        end
-      end
-
-      hashes = [*images, bw1, bw2].map &described_class.method(calc)
-      table = MLL::table[described_class.method(dm), [hashes], [hashes]]
-
-      # require "pp"
-      # pp table
-      # next
-
-      aggregate_failures do
-        hashes.size.times.to_a.repeated_combination(2) do |i, j|
-          case
-          when i == j
-            expect(table[i][j]).to eq 0
-          when (j - i).abs == 1 && (i + j - 1) % 4 == 0
-            if [i, j] == [hashes.size - 2, hashes.size - 1]
-              if described_class == DHashVips::DHash
-                expect(table[i][j]).to be == 5
-              else
-                expect(table[i][j]).to eq 0
-              end
-            else
-              expect(table[i][j]).to be_between(min_similar, max_similar).inclusive
-            end
-          else
-            expect(table[i][j]).to be_between(min_not_similar, max_not_similar).inclusive
-          end
-        end
-
-      end
-
-    end
-
-  end
-
-end
|