dhash-vips 0.0.4.1 → 0.0.5.0
- checksums.yaml +4 -4
- data/Gemfile +0 -1
- data/README.md +101 -34
- data/Rakefile +112 -55
- data/dhash-vips.gemspec +2 -1
- data/lib/dhash-vips.rb +16 -7
- data/spec/_spec.rb +11 -6
- metadata +16 -3
- data/lib/dhash-vips/version.rb +0 -3
    
        checksums.yaml
    CHANGED
    
    | @@ -1,7 +1,7 @@ | |
| 1 1 | 
             
            ---
         | 
| 2 2 | 
             
            SHA1:
         | 
| 3 | 
            -
              metadata.gz:  | 
| 4 | 
            -
              data.tar.gz:  | 
| 3 | 
            +
              metadata.gz: 0263e2d1f269153107a10986ba97d2221f1191d6
         | 
| 4 | 
            +
              data.tar.gz: 4e1365a3ab0507e3e14cf8b06ae1f2f7d84482f1
         | 
| 5 5 | 
             
            SHA512:
         | 
| 6 | 
            -
              metadata.gz:  | 
| 7 | 
            -
              data.tar.gz:  | 
| 6 | 
            +
              metadata.gz: c00ea6ebead50175bfaa74bf0814a394c1b06159d238716d2edb7e2718587ee0a6c5497ec55823806193b5f46fa1a6ac1b43ba7e277ea468dd2e0ba3c5ea97a7
         | 
| 7 | 
            +
              data.tar.gz: 57d32bfbf327226ff8ba0771f6346d1ccdadbe8b61dcfaa4aa81795976bd3146fce877f9532d6d9fc1a798ba2ffe3f63a1a940aee263f3702a9b48a4c835a6e6
         | 
    
        data/Gemfile
    CHANGED
    
    
    
        data/README.md
    CHANGED
    
    | @@ -2,29 +2,27 @@ | |
| 2 2 |  | 
| 3 3 | 
             
            # dHash and IDHash gem powered by ruby-vips
         | 
| 4 4 |  | 
| 5 | 
            -
            The "dHash" is an algorithm of  | 
| 5 | 
            +
            The "dHash" is an algorithm of fingerprinting that can be used to measure the similarity of two images.
         | 
| 6 6 |  | 
| 7 7 | 
             
            You may read about it in "Kind of Like That" blog post (21 January 2013): http://www.hackerfactor.com/blog/index.php?/archives/529-Kind-of-Like-That.html  
         | 
| 8 | 
            -
            The original idea is that you split the image into 64 segments and so there are 64 bits -- each tells if the one segment is brighter or darker than the neighbor one. Then the [Hamming distance](https://en.wikipedia.org/wiki/Hamming_distance) between  | 
| 8 | 
            +
            The original idea is to split the image into 64 segments, which yields 64 bits: each one tells whether a segment is brighter or darker than its neighbor. The [Hamming distance](https://en.wikipedia.org/wiki/Hamming_distance) between two fingerprints is then an inverse measure of image similarity: the smaller the distance, the more alike the images.
         | 
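To make the idea concrete, here is a minimal pure-Ruby sketch of the classic dHash on an already downscaled grid. It assumes the image has been reduced to 8 rows of 9 grayscale values. The helper names `dhash_bits` and `hamming` are hypothetical, and the bit order and the direction of comparison (left brighter than right) are illustrative choices, not necessarily those of any gem mentioned here.

```ruby
# Sketch only: `rows` stands in for an image already resized to 9x8 grayscale.
# Each pair of horizontal neighbors contributes one bit: 1 if left is brighter.
def dhash_bits(rows)
  rows.flat_map { |row| row.each_cons(2).map { |left, right| left > right ? 1 : 0 } }
      .inject(0) { |acc, bit| (acc << 1) | bit }
end

# Hamming distance: the number of bit positions where two fingerprints differ.
def hamming(a, b)
  (a ^ b).to_s(2).count("1")
end

gradient = Array.new(8) { (0..8).to_a }   # gets brighter from left to right
reversed = gradient.map(&:reverse)        # gets darker from left to right

puts hamming(dhash_bits(gradient), dhash_bits(gradient))  # 0
puts hamming(dhash_bits(gradient), dhash_bits(reversed))  # 64
```

Identical grids give a distance of 0, while a fully mirrored gradient flips all 64 bits.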
| 9 9 |  | 
| 10 | 
            -
            There  | 
| 10 | 
            +
            There were several implementations on GitHub already, but they all depend on ImageMagick. My implementation takes advantage of libvips (the `ruby-vips` gem) -- it also uses the `.conv` method, and as a result converts an image to an array of grayscale bytes almost 10 times faster:
         | 
| 11 11 | 
             
            ```
         | 
| 12 | 
            -
             | 
| 13 | 
            -
             | 
| 14 | 
            -
             | 
| 15 | 
            -
             | 
| 16 | 
            -
             | 
| 17 | 
            -
             | 
| 18 | 
            -
             | 
| 19 | 
            -
             | 
| 20 | 
            -
             | 
| 21 | 
            -
             | 
| 22 | 
            -
             | 
| 23 | 
            -
            DHashVips::DHash     2.350000   0.020000   2.370000 (  2.401252)
         | 
| 24 | 
            -
            DHashVips::IDHash    5.190000   0.040000   5.230000 (  5.279742)
         | 
| 12 | 
            +
            load and calculate the fingerprint:
         | 
| 13 | 
            +
                                      user     system      total        real
         | 
| 14 | 
            +
            Dhash                12.400000   0.820000  13.220000 ( 13.329952)
         | 
| 15 | 
            +
            DHashVips::DHash      1.330000   0.230000   1.560000 (  1.509826)
         | 
| 16 | 
            +
            DHashVips::IDHash     1.060000   0.090000   1.150000 (  1.100332)
         | 
| 17 | 
            +
             | 
| 18 | 
            +
            measure the distance (1000 times):
         | 
| 19 | 
            +
                                               user     system      total        real
         | 
| 20 | 
            +
            Dhash hamming                  3.140000   0.020000   3.160000 (  3.179392)
         | 
| 21 | 
            +
            DHashVips::DHash hamming       3.040000   0.020000   3.060000 (  3.095190)
         | 
| 22 | 
            +
            DHashVips::IDHash distance     6.720000   0.030000   6.750000 (  6.790900)
         | 
| 25 23 | 
             
            ```
         | 
| 26 24 |  | 
| 27 | 
            -
            Here the `Dhash` is  | 
| 25 | 
            +
            Here the `Dhash` is [another gem](https://github.com/maccman/dhash) that I used earlier in my projects.  
         | 
| 28 26 | 
             
            The `DHashVips::DHash` is a port of it that uses vips. I would like to tell you that you can replace the `dhash` gem with `dhash-vips` right now, but it turned out to have a barely noticeable issue. There is a lot of magic behind the libvips speed and resizing -- you may not notice it with the naked eye, but when two neighboring segments are similar enough in luminosity the difference can change its sign. So I found two identical images that differed only in colorspace and size (photo by Jordan Voth):  
         | 
| 29 27 | 
             
              
         | 
| 30 28 | 
             
            but the distance between their hashes appeared to be equal to 5 while `dhash` gem reported 0.
         | 
| @@ -33,36 +31,40 @@ This is why `DHashVips::IDHash` appeared. | |
| 33 31 |  | 
| 34 32 | 
             
            ## IDHash (the Important Difference Hash)
         | 
| 35 33 |  | 
| 36 | 
            -
            It has improvements over the dHash that made  | 
| 37 | 
            -
            * The "Importance" is an array of extra 64 bits that tells the comparing function which half of 64 bits is important (when the difference between neighbors was enough significant) and which is not. So not every bit in a  | 
| 38 | 
            -
            * It subtracts not only horizontally but also vertically -- that adds 128 more bits.
         | 
| 34 | 
            +
            It has improvements over the dHash that make fingerprinting less sensitive to the resizing algorithm and bring the distance between the pair of images mentioned above back to 0. The three improvements are:  
         | 
| 35 | 
            +
            * The "Importance" is an array of extra 64 bits that tells the comparing function which half of 64 bits is important (when the difference between neighbors was significant enough) and which is not. So not every bit in a fingerprint is compared, only half of them.  
         | 
| 36 | 
            +
            * It subtracts not only horizontally but also vertically -- that adds 128 more bits.  
         | 
| 39 37 | 
             
            * Instead of resizing to 9x8 it resizes to 8x8 and puts the image on a torus so it subtracts the left column from the right one and the top from bottom.
         | 
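The three improvements can be sketched in pure Ruby on an already downscaled 8x8 grid. This is an illustration only: the helper names are hypothetical, and using the median absolute difference as the Importance threshold is an assumption suggested by "only half of them", not something this README states.

```ruby
# Sketch only: `grid` stands in for an image already resized to 8x8 grayscale.
# Differences wrap around the edges, as if the image were put on a torus.
def torus_differences(grid)
  size = grid.size
  horizontal = grid.map do |row|
    row.each_index.map { |x| row[x] - row[(x + 1) % size] }
  end
  vertical = grid.each_index.map do |y|
    grid[y].each_index.map { |x| grid[y][x] - grid[(y + 1) % size][x] }
  end
  [horizontal.flatten, vertical.flatten]
end

# Upper median of a list of numbers (assumed threshold for "important" bits).
def median(values)
  sorted = values.sort
  sorted[sorted.size / 2]
end

# Returns [difference_bits, importance_bits], two arrays of 128 zeros and ones.
def idhash_bits(grid)
  torus_differences(grid).map do |diffs|
    threshold = median(diffs.map(&:abs))
    [diffs.map { |d| d > 0 ? 1 : 0 }, diffs.map { |d| d.abs >= threshold ? 1 : 0 }]
  end.transpose.map(&:flatten)
end

grid = Array.new(8) { |y| Array.new(8) { |x| x * 10 + y } }
differences, importance = idhash_bits(grid)
puts differences.size  # 128
puts importance.size   # 128
```

Horizontal and vertical passes contribute 64 difference bits each, which is where the 128 bits come from.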
| 40 38 |  | 
| 41 | 
            -
             | 
| 42 | 
            -
              
         | 
| 43 | 
            -
            and visualization of IDHash (`rake compare_images -- image1.jpg image2.jpg`):  
         | 
| 44 | 
            -
              
         | 
| 45 | 
            -
             | 
| 46 | 
            -
            Here in each of 64 cells, there are two circles that color the difference between that cell and the neighbor one. If the difference is low the Importance bit is set to zero and the circle is invisible. So there are 128 pairs of corresponding circles and when you take one, if at least one circle is visible and is of different color the line is to be drawn. Here you see 15 lines and so the distance between hashes will be equal to 15 (that is pretty low and can be interpreted as "images look similar"). Also, you see here that floor on this photo matters -- classic dHash won't see that it's darker than wall because it's comparing only horizontal neighbors and if one photo had no floor the distance function won't notice that. Also, it sees the Important difference between the very right and left columns because the wall has a slow but visible gradient.
         | 
| 47 | 
            -
             | 
| 48 | 
            -
            You could see in hash calculation benchmark earlier that these improvements didn't make it slower than dHash because most of the time is spent on image resizing. The calculation of distance is what became two times slower:
         | 
| 39 | 
            +
            As the fingerprint calculation benchmark above shows, these improvements didn't make it slower than dHash, because most of the time is spent on image resizing. The calculation of distance is what became two times slower:
         | 
| 49 40 | 
             
            ```ruby
         | 
| 50 | 
            -
            ((a | b) & (a  | 
| 41 | 
            +
            ((a | b) & ((a ^ b) >> 128)).to_s(2).count "1"
         | 
| 51 42 | 
             
            ```
         | 
| 52 43 | 
             
            vs
         | 
| 53 44 | 
             
            ```ruby
         | 
| 54 45 | 
             
            (a ^ b).to_s(2).count "1"
         | 
| 55 46 | 
             
            ```
         | 
| 56 47 |  | 
| 57 | 
            -
             | 
| 48 | 
            +
            ### Example
         | 
| 49 | 
            +
             | 
| 50 | 
            +
            Here are two photos (by Brian Lauer):  
         | 
| 51 | 
            +
              
         | 
| 52 | 
            +
            and visualization of IDHash (`rake compare_images -- image1.jpg image2.jpg`):  
         | 
| 53 | 
            +
              
         | 
| 54 | 
            +
             | 
| 55 | 
            +
            Here in each of the 64 cells there are two circles that color the difference between that cell and the neighboring one. If the difference is low, the Importance bit is set to zero and the circle is invisible. So there are 128 pairs of corresponding circles, and for each pair a line is drawn if at least one circle is visible and the two are of different color. Here you see 15 lines, so the distance between fingerprints is 15 (that is pretty low and can be interpreted as "images look similar"). Also, you see here that the floor in this photo matters -- classic dHash won't see that it's darker than the wall because it compares only horizontal neighbors, so if one photo had no floor the distance function wouldn't notice. It also sees the Important difference between the rightmost and leftmost columns because the wall has a slow but visible gradient.
         | 
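The rule for drawing a line can be written as a small distance function. The sketch below assumes a power=3 fingerprint with the Importance bits in the upper 128 bits and the difference bits in the lower 128 bits, which is how the `compare_images` task in this diff splits a fingerprint. The names `split` and `masked_distance` are hypothetical.

```ruby
SHIFT = 128 # 2 * 8 * 8 difference bits for an 8x8 fingerprint

# Split a fingerprint into [importance, differences] (assumed layout).
def split(fingerprint)
  importance = fingerprint >> SHIFT
  differences = fingerprint - (importance << SHIFT)
  [importance, differences]
end

# Count the positions where the difference bits disagree and at least
# one of the two fingerprints marks that position as important.
def masked_distance(a, b)
  ai, ad = split(a)
  bi, bd = split(b)
  ((ai | bi) & (ad ^ bd)).to_s(2).count("1")
end

a = (0b1110 << SHIFT) | 0b1010
b = (0b0110 << SHIFT) | 0b0101
puts masked_distance(a, b)  # 3
```

All four difference bits disagree here, but the lowest one is unimportant in both fingerprints, so it is not counted.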
| 56 | 
            +
             | 
| 57 | 
            +
            ### Remaining problems
         | 
| 58 | 
            +
             | 
| 58 59 | 
             
            * Neither dHash nor IDHash can automatically detect heavily shifted crops or rotated images, but you can make a wrapper that calls the comparison function iteratively.  
         | 
| 59 60 | 
             
            * These algorithms are color blind because they convert an image to grayscale. If you take a photo of something in your yard the sun will create lights and shadows, but if you compare photos of something green painted on a blue wall there is a possibility the machine would see nothing painted at all. The `dhash` gem had such an image in its specs and that made them pretty useless (this was supposed to be a face):  
         | 
| 60 61 | 
             
              
         | 
| 61 | 
            -
            * If you have a pile of 1000000 images  | 
| 62 | 
            +
            * If you have a pile of 1000000 images, comparing them with each other would take a month or two. To speed up the process in the case of dHash, which uses Hamming distance, you may want to read these threads on the Stack Exchange network:  
         | 
| 62 63 | 
             
              * [How to find the closest pairs of a string of binary bins in Ruby without O^2 issues?](https://stackoverflow.com/q/8734034/322020)  
         | 
| 63 64 | 
             
              * [Find all pairs of values that are close under Hamming distance](https://cstheory.stackexchange.com/q/18516/27420)  
         | 
| 64 65 | 
             
              * [Finding the closest pair between two sets of points on the hypercube](https://cstheory.stackexchange.com/q/16322/27420)  
         | 
| 65 66 | 
             
              * [Would PCA work for boolean data types?](https://stats.stackexchange.com/q/159705/1125)  
         | 
| 67 | 
            +
              * [Using pHash to search agaist a huge image database, what is the best approach?](https://stackoverflow.com/q/18257641/322020)  
         | 
| 66 68 |  | 
| 67 69 | 
             
            ## Installation
         | 
| 68 70 |  | 
| @@ -95,8 +97,6 @@ else | |
| 95 97 | 
             
            end
         | 
| 96 98 | 
             
            ```
         | 
| 97 99 |  | 
| 98 | 
            -
            These `10` and `20` numbers are found empirically and just work enough well for 8-byte hashes.
         | 
| 99 | 
            -
             | 
| 100 100 | 
             
            ### IDHash:
         | 
| 101 101 |  | 
| 102 102 | 
             
            ```ruby
         | 
| @@ -115,7 +115,74 @@ else | |
| 115 115 | 
             
            end
         | 
| 116 116 | 
             
            ```
         | 
| 117 117 |  | 
| 118 | 
            -
             | 
| 118 | 
            +
            These `10` and `20` numbers were found empirically and work well enough for 8-byte hashes.  
         | 
| 119 | 
            +
            To find these thresholds we can run a rake task with hardcoded test cases:
         | 
| 120 | 
            +
            ```
         | 
| 121 | 
            +
            $ rake compare_matrices
         | 
| 122 | 
            +
             | 
| 123 | 
            +
            Dhash 
         | 
| 124 | 
            +
            Absolutely the same image: 0..0
         | 
| 125 | 
            +
            Complex B/W and the same but colorful: 0
         | 
| 126 | 
            +
            Similar images: 13..16
         | 
| 127 | 
            +
            Different images: 9..41
         | 
| 128 | 
            +
             | 
| 129 | 
            +
            DHashVips::DHash 
         | 
| 130 | 
            +
            Absolutely the same image: 0..0
         | 
| 131 | 
            +
            Complex B/W and the same but colorful: 5
         | 
| 132 | 
            +
            Similar images: 17..18
         | 
| 133 | 
            +
            Different images: 14..39
         | 
| 134 | 
            +
             | 
| 135 | 
            +
            DHashVips::IDHash 
         | 
| 136 | 
            +
            Absolutely the same image: 0..0
         | 
| 137 | 
            +
            Complex B/W and the same but colorful: 0
         | 
| 138 | 
            +
            Similar images: 15..23
         | 
| 139 | 
            +
            Different images: 19..64
         | 
| 140 | 
            +
             | 
| 141 | 
            +
            DHashVips::IDHash 4
         | 
| 142 | 
            +
            Absolutely the same image: 0..0
         | 
| 143 | 
            +
            Complex B/W and the same but colorful: 0
         | 
| 144 | 
            +
            Similar images: 71..108
         | 
| 145 | 
            +
            Different images: 102..211
         | 
| 146 | 
            +
             | 
| 147 | 
            +
            ```
         | 
| 148 | 
            +
             | 
| 149 | 
            +
            ### Notes
         | 
| 150 | 
            +
             | 
| 151 | 
            +
            * The `IDHash` methods were renamed from `#calculate` to `#fingerprint` and from `#hamming` to `#distance`.  
         | 
| 152 | 
            +
            * `DHash#calculate` accepts an optional `hash_size` parameter that is 8 by default. The optional parameter of `IDHash#fingerprint` is called `power` and works a bit differently: 3 means 8 and 4 means 16 -- other sizes are not supported because they don't seem to be useful (a higher fingerprint resolution makes it vulnerable to image shifts and crops). Because an IDHash fingerprint is more complex than a DHash one, comparing them is not that straightforward, so under the hood the `#distance` method has to check the size of the fingerprint -- this trade-off costs 30-40% of speed, which can be avoided by using the `#distance3` method that assumes the fingerprint to be of power=3. So the full benchmark is this one:
         | 
| 153 | 
            +
             | 
| 154 | 
            +
            ```
         | 
| 155 | 
            +
            $ rake compare_speed
         | 
| 156 | 
            +
             | 
| 157 | 
            +
            load and calculate the fingerprint:
         | 
| 158 | 
            +
                                      user     system      total        real
         | 
| 159 | 
            +
            Dhash                12.400000   0.820000  13.220000 ( 13.329952)
         | 
| 160 | 
            +
            DHashVips::DHash      1.330000   0.230000   1.560000 (  1.509826)
         | 
| 161 | 
            +
            DHashVips::IDHash     1.060000   0.090000   1.150000 (  1.100332)
         | 
| 162 | 
            +
            DHashVips::IDHash 4   1.030000   0.080000   1.110000 (  1.089148)
         | 
| 163 | 
            +
             | 
| 164 | 
            +
            measure the distance (1000 times):
         | 
| 165 | 
            +
                                               user     system      total        real
         | 
| 166 | 
            +
            Dhash hamming                  3.140000   0.020000   3.160000 (  3.179392)
         | 
| 167 | 
            +
            DHashVips::DHash hamming       3.040000   0.020000   3.060000 (  3.095190)
         | 
| 168 | 
            +
            DHashVips::IDHash distance     8.170000   0.040000   8.210000 (  8.279950)
         | 
| 169 | 
            +
            DHashVips::IDHash distance3    6.720000   0.030000   6.750000 (  6.790900)
         | 
| 170 | 
            +
            DHashVips::IDHash distance 4  24.430000   0.130000  24.560000 ( 24.652625)
         | 
| 171 | 
            +
            ```
         | 
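The relation between `power` and the fingerprint size can be worked out with a little arithmetic. The sketch below follows the `size = 2 ** power` and `shift = 2 * size * size` lines of the `compare_images` task in this diff. That the Importance part has as many bits as the difference part is an assumption, and `fingerprint_layout` is a hypothetical helper.

```ruby
# Sketch only: derive the assumed bit layout of a fingerprint from `power`.
def fingerprint_layout(power)
  size = 2**power        # side of the downscaled square image
  half = 2 * size * size # horizontal plus vertical difference bits
  { size: size, difference_bits: half, total_bits: 2 * half }
end

[3, 4].each { |power| p fingerprint_layout(power) }
```

Under these assumptions a power=4 fingerprint has four times as many bits as a power=3 one, which would fit with `distance 4` being several times slower in the benchmark.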
| 172 | 
            +
             | 
| 173 | 
            +
            Also note that to make `#distance` able to infer the fingerprint resolution from the size of the Integer that represents it, a change in its structure was needed (the left half of the bits was swapped with the right one), so fingerprints from versions 0.0.4 and 0.0.5 are incompatible, but you can probably convert them manually. I know, incompatibilities suck, but if we put the version or structure information inside the fingerprint it would become slow to (de)serialize and store.
         | 
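A manual conversion could look like the sketch below. It assumes a 256-bit (power=3) fingerprint and that the only change between the versions is the swap of its two 128-bit halves. `swap_halves` is a hypothetical helper, so check it against fingerprints computed by both versions before relying on it.

```ruby
HALF = 128 # half of a 256-bit (power=3) fingerprint

# Swap the upper and lower halves of a fingerprint (assumed conversion).
def swap_halves(fingerprint)
  low = fingerprint & ((1 << HALF) - 1)
  high = fingerprint >> HALF
  (low << HALF) | high
end

old = (5 << HALF) | 9
puts swap_halves(old) == ((9 << HALF) | 5)  # true
puts swap_halves(swap_halves(old)) == old   # true
```

Because the swap is its own inverse, the same function would convert in either direction.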
| 174 | 
            +
             | 
| 175 | 
            +
            ## Troubleshooting
         | 
| 176 | 
            +
             | 
| 177 | 
            +
            El Capitan and rbenv may cause environment issues that would make you do things like:
         | 
| 178 | 
            +
            ```
         | 
| 179 | 
            +
            ./ruby `rbenv which rake` compare_matrices
         | 
| 180 | 
            +
            ```
         | 
| 181 | 
            +
            instead of just
         | 
| 182 | 
            +
            ```
         | 
| 183 | 
            +
            rake compare_matrices
         | 
| 184 | 
            +
            ```
         | 
| 185 | 
            +
            For more information on that: https://github.com/jcupitt/ruby-vips/issues/141
         | 
| 119 186 |  | 
| 120 187 | 
             
            ## Credits
         | 
| 121 188 |  | 
    
        data/Rakefile
    CHANGED
    
    | @@ -50,99 +50,144 @@ task :compare_kernels do |_| | |
| 50 50 | 
             
              end
         | 
| 51 51 | 
             
            end
         | 
| 52 52 |  | 
| 53 | 
            -
             | 
| 53 | 
            +
            # ./ruby `rbenv which rake` compare_matrices
         | 
| 54 54 | 
             
            desc "Compare the quality of Dhash, DHashVips::DHash and DHashVips::IDHash -- run it only after `rake test`"
         | 
| 55 | 
            -
            task : | 
| 55 | 
            +
            task :compare_matrices do |_|
         | 
| 56 56 | 
             
              require "dhash"
         | 
| 57 57 | 
             
              require_relative "lib/dhash-vips"
         | 
| 58 58 | 
             
              require "mll"
         | 
| 59 | 
            -
              [ | 
| 59 | 
            +
              [
         | 
| 60 | 
            +
                [Dhash, :calculate, :hamming],
         | 
| 61 | 
            +
                [DHashVips::DHash, :calculate, :hamming],
         | 
| 62 | 
            +
                [DHashVips::IDHash, :fingerprint, :distance],
         | 
| 63 | 
            +
                [DHashVips::IDHash, :fingerprint, :distance, 4],
         | 
| 64 | 
            +
              ].each do |m, calc, dm, power|
         | 
| 65 | 
            +
                puts "\n#{m} #{power}"
         | 
| 60 66 | 
             
                hashes = %w{
         | 
| 61 67 | 
             
                  71662d4d4029a3b41d47d5baf681ab9a.jpg
         | 
| 62 68 | 
             
                  ad8a37f872956666c3077a3e9e737984.jpg
         | 
| 69 | 
            +
             | 
| 70 | 
            +
                  6d97739b4a08f965dc9239dd24382e96.jpg
         | 
| 71 | 
            +
                  1b1d4bde376084011d027bba1c047a4b.jpg
         | 
| 72 | 
            +
             | 
| 63 73 | 
             
                  1d468d064d2e26b5b5de9a0241ef2d4b.jpg
         | 
| 64 74 | 
             
                  92d90b8977f813af803c78107e7f698e.jpg
         | 
| 75 | 
            +
             | 
| 65 76 | 
             
                  309666c7b45ecbf8f13e85a0bd6b0a4c.jpg
         | 
| 66 77 | 
             
                  3f9f3db06db20d1d9f8188cd753f6ef4.jpg
         | 
| 67 78 | 
             
                  df0a3b93e9412536ee8a11255f974141.jpg
         | 
| 68 79 | 
             
                  679634ff89a31279a39f03e278bc9a01.jpg
         | 
| 69 | 
            -
                }.map{ |filename| m. | 
| 80 | 
            +
                }.map{ |filename| m.public_send calc, "images/#{filename}", *power }
         | 
| 70 81 | 
             
                table = MLL::table[m.method(dm), [hashes], [hashes]]
         | 
| 71 | 
            -
                 | 
| 82 | 
            +
                # require "pp"
         | 
| 83 | 
            +
                # pp table
         | 
| 84 | 
            +
                array = Array.new(5){ [] }
         | 
| 72 85 | 
             
                hashes.size.times.to_a.repeated_combination(2) do |i, j|
         | 
| 73 | 
            -
                  array[i == j ? 0 : (j - i).abs == 1 && (i + j - 1) % 4 == 0 ? [i, j] == [0, 1] ? 1 : 2 : 3].push table[i][j]
         | 
| 86 | 
            +
                  array[i == j ? 0 : (j - i).abs == 1 && (i + j - 1) % 4 == 0 ? [i, j] == [0, 1] ? 1 : [i, j] == [2, 3] ? 2 : 3 : 4].push table[i][j]
         | 
| 74 87 | 
             
                end
         | 
| 75 | 
            -
                p array.map &:sort
         | 
| 88 | 
            +
                # p array.map &:sort
         | 
| 89 | 
            +
                puts "Absolutely the same image: #{array[0].minmax.join ".."}"
         | 
| 90 | 
            +
                puts "Complex B/W and the same but colorful: #{array[1][0]}"
         | 
| 91 | 
            +
                puts "Similar images: #{array[3].minmax.join ".."}"
         | 
| 92 | 
            +
                puts "Different images: #{[*array[2], *array[4]].minmax.join ".."}"
         | 
| 76 93 | 
             
              end
         | 
| 77 94 | 
             
            end
         | 
| 78 95 |  | 
| 79 96 | 
             
            # ruby -c Rakefile && rm -f ab.png && rake compare_images -- fc762fa286489d8afc80adc8cdcb125e.jpg 9c2c240ec02356472fb532f404d28dde.jpg 2>/dev/null && ql ab.png
         | 
| 97 | 
            +
            # rm -f ab.png && ./ruby `rbenv which rake` compare_images -- 6d97739b4a08f965dc9239dd24382e96.jpg 1b1d4bde376084011d027bba1c047a4b.jpg 2>/dev/null && ql ab.png
         | 
| 80 98 | 
             
            desc "Visualizes the IDHash difference measurement between two images"
         | 
| 81 99 | 
             
            task :compare_images do |_|
         | 
| 82 | 
            -
              abort "there should be two image filenames passed as arguments" unless ARGV.size | 
| 100 | 
            +
              abort "there should be two image filenames passed as arguments (and optionally the `power`)" unless (4..5) === ARGV.size
         | 
| 101 | 
            +
              abort "the optional argument should be either 3 or 4" unless [3, 4].include?(power = (ARGV[4] || 3).to_i)
         | 
| 102 | 
            +
              task ARGV.last do ; end
         | 
| 83 103 | 
             
              require_relative "lib/dhash-vips"
         | 
| 84 | 
            -
              ha, hb = ARGV. | 
| 104 | 
            +
              ha, hb = ARGV[2, 2].map{ |filename| DHashVips::IDHash.fingerprint(filename, power) }
         | 
| 85 105 | 
             
              puts "distance: #{DHashVips::IDHash.distance ha, hb}"
         | 
| 106 | 
            +
              size = 2 ** power
         | 
| 107 | 
            +
              shift = 2 * size * size
         | 
| 108 | 
            +
              ai = ha >> shift
         | 
| 109 | 
            +
              ad = ha - (ai << shift)
         | 
| 110 | 
            +
              bi = hb >> shift
         | 
| 111 | 
            +
              bd = hb - (bi << shift)
         | 
| 112 | 
            +
             | 
| 113 | 
            +
              a, b = ARGV[2, 2].map do |filename|
         | 
| 114 | 
            +
                image = Vips::Image.new_from_file filename
         | 
| 115 | 
            +
                image = image.resize(size.fdiv(image.width), vscale: size.fdiv(image.height)).colourspace("b-w").
         | 
| 116 | 
            +
                              resize(100, vscale: 100, kernel: :nearest).colourspace("srgb")
         | 
| 117 | 
            +
              end
         | 
| 118 | 
            +
              fail unless a.width == b.width && a.height == b.height
         | 
| 86 119 |  | 
| 87 | 
            -
               | 
| 88 | 
            -
               | 
| 89 | 
            -
             | 
| 90 | 
            -
             | 
| 91 | 
            -
             | 
| 92 | 
            -
             | 
| 120 | 
            +
              _127 = shift - 1
         | 
| 121 | 
            +
              _63 = size * size - 1
         | 
| 122 | 
            +
              n = 0
         | 
| 123 | 
            +
              width = a.width
         | 
| 124 | 
            +
              height = a.height
         | 
| 125 | 
            +
             | 
| 126 | 
            +
              Vips::Operation.class_eval do
         | 
| 127 | 
            +
                old_initialize = instance_method :initialize
         | 
| 128 | 
            +
                define_method :initialize do |value|
         | 
| 129 | 
            +
                  old_initialize.bind(self).(value).tap do
         | 
| 130 | 
            +
                    self.instance_variable_set "@operation_name", value
         | 
| 93 131 | 
             
                  end
         | 
| 94 132 | 
             
                end
         | 
| 133 | 
            +
                old_set = instance_method :set
         | 
| 134 | 
            +
                define_method :set do |*args|
         | 
| 135 | 
            +
                  args[1].instance_variable_set "@operation_name", self.instance_variable_get("@operation_name") if args.first == "image"
         | 
| 136 | 
            +
                  old_set.bind(self).(*args)
         | 
| 137 | 
            +
                end
         | 
| 95 138 | 
             
              end
         | 
| 96 | 
            -
               | 
| 97 | 
            -
                 | 
| 98 | 
            -
             | 
| 99 | 
            -
             | 
| 139 | 
            +
              Vips::Image.class_eval do
         | 
| 140 | 
            +
                def copy
         | 
| 141 | 
            +
                  return self if caller.first.end_with?("/gems/ruby-vips-2.0.9/lib/vips/operation.rb:148:in `set'") &&
         | 
| 142 | 
            +
                                 %w{ draw_line draw_circle }.include?(instance_variable_get "@operation_name")
         | 
| 143 | 
            +
                  method_missing :copy
         | 
| 144 | 
            +
                end
         | 
| 100 145 | 
             
              end
         | 
| 101 146 |  | 
| 102 | 
            -
               | 
| 103 | 
            -
               | 
| 104 | 
            -
             | 
| 105 | 
            -
             | 
| 106 | 
            -
             | 
| 107 | 
            -
             | 
| 108 | 
            -
             | 
| 109 | 
            -
                127.downto(0).each do |i|
         | 
| 110 | 
            -
                  if i > 63
         | 
| 111 | 
            -
                    y, x = (127 - i).divmod 8
         | 
| 147 | 
            +
              require "get_process_mem"
         | 
| 148 | 
            +
              a, b = [[a, ad, ai], [b, bd, bi]].map do |image, xd, xi|
         | 
| 149 | 
            +
                _127.downto(0).each_with_index do |i, ii|
         | 
| 150 | 
            +
                  mem = GetProcessMem.new(Process.pid).mb
         | 
| 151 | 
            +
                  abort ">1000mb of memory consumed" if 1000 < mem
         | 
| 152 | 
            +
                  if i > _63
         | 
| 153 | 
            +
                    y, x = (_127 - i).divmod size
         | 
| 112 154 | 
             
                  else
         | 
| 113 | 
            -
                    x, y = ( | 
| 155 | 
            +
                    x, y = (_63 - i).divmod size
         | 
| 114 156 | 
             
                  end
         | 
| 115 | 
            -
                  x = ( | 
| 116 | 
            -
                  y = ( | 
| 117 | 
            -
                  if i >  | 
| 157 | 
            +
                  x = (width  * (x + 0.5) / size).round
         | 
| 158 | 
            +
                  y = (height * (y + 0.5) / size).round
         | 
| 159 | 
            +
                  if i > _63
         | 
| 118 160 | 
             
                    (x-2..x+2).map do |x| [
         | 
| 119 | 
            -
                      [x,  y | 
| 120 | 
            -
                      [x, (y +  | 
| 161 | 
            +
                      [x,  y                                  , x, (y + height / size / 2 - 1) % height],
         | 
| 162 | 
            +
                      [x, (y + height / size / 2 + 1) % height, x, (y + height / size        ) % height],
         | 
| 121 163 | 
             
                    ] end
         | 
| 122 164 | 
             
                  else
         | 
| 123 165 | 
             
                    (y-2..y+2).map do |y| [
         | 
| 124 | 
            -
                      [ x | 
| 125 | 
            -
                      [(x +  | 
| 166 | 
            +
                      [ x                                , y, (x + width / size / 2 - 1) % width, y],
         | 
| 167 | 
            +
                      [(x + width / size / 2 + 1) % width, y, (x + width / size        ) % width, y],
         | 
| 126 168 | 
             
                    ] end
         | 
| 127 169 | 
             
                  end.each do |coords1, coords2|
         | 
| 128 170 | 
             
                    n += 1
         | 
| 129 | 
            -
                    image.draw_line | 
| 130 | 
            -
                    image.draw_line | 
| 171 | 
            +
                    image = image.draw_line (1 - xd[i]) * 255, *coords1
         | 
| 172 | 
            +
                    image = image.draw_line      xd[i]  * 255, *coords2
         | 
| 131 173 | 
             
                  end if ai[i] + bi[i] > 0 && ad[i] != bd[i]
         | 
| 132 | 
            -
                  cx, cy = if i >  | 
| 133 | 
            -
                    [x, y +  | 
| 174 | 
            +
                  cx, cy = if i > _63
         | 
| 175 | 
            +
                    [x, y + 30]
         | 
| 134 176 | 
             
                  else
         | 
| 135 | 
            -
                    [x +  | 
| 177 | 
            +
                    [x + 30, y]
         | 
| 136 178 | 
             
                  end
         | 
| 137 | 
            -
                  image.draw_circle | 
| 138 | 
            -
                  image.draw_circle | 
| 179 | 
            +
                  image = image.draw_circle      xd[i]  * 255, cx, cy, 11, fill: true if xi[i] > 0
         | 
| 180 | 
            +
                  image = image.draw_circle (1 - xd[i]) * 255, cx, cy, 10, fill: true if xi[i] > 0
         | 
| 139 181 | 
             
                end
         | 
| 182 | 
            +
                image
         | 
| 140 183 | 
             
              end
         | 
| 141 184 | 
             
              puts "distance: #{n / 10}"
         | 
| 185 | 
            +
              puts "(above should be equal if raketask works correcly)"
         | 
| 142 186 |  | 
| 143 | 
            -
              a.join(b | 
| 187 | 
            +
              a.join(b, :horizontal, shim: 15).write_to_file "ab.png"
         | 
| 144 188 | 
             
            end
         | 
| 145 189 |  | 
| 190 | 
            +
            # ./ruby `rbenv which rake` compare_speed
         | 
| 146 191 | 
             
            desc "Benchmarks Dhash, DHashVips::DHash and DHashVips::IDHash"
         | 
| 147 192 | 
             
            task :compare_speed do
         | 
| 148 193 | 
             
              require "dhash"
         | 
| @@ -169,21 +214,33 @@ task :compare_speed do | |
| 169 214 | 
             
              end
         | 
| 170 215 |  | 
| 171 216 | 
             
              require "benchmark"
         | 
| 172 | 
            -
              puts "load and calculate:"
         | 
| 217 | 
            +
              puts "load and calculate the fingerprint:"
         | 
| 173 218 | 
             
              hashes = []
         | 
| 174 | 
            -
              Benchmark.bm  | 
| 175 | 
            -
                [ | 
| 176 | 
            -
                   | 
| 177 | 
            -
             | 
| 219 | 
            +
              Benchmark.bm 19 do |bm|
         | 
| 220 | 
            +
                [
         | 
| 221 | 
            +
                  [Dhash, :calculate],
         | 
| 222 | 
            +
                  [DHashVips::DHash, :calculate],
         | 
| 223 | 
            +
                  [DHashVips::IDHash, :fingerprint],
         | 
| 224 | 
            +
                  [DHashVips::IDHash, :fingerprint, 4],
         | 
| 225 | 
            +
                ].each do |m, calc, power|
         | 
| 226 | 
            +
                  bm.report "#{m} #{power}" do
         | 
| 227 | 
            +
                    hashes.push filenames.map{ |filename| m.send calc, filename, *power }
         | 
| 178 228 | 
             
                  end
         | 
| 179 229 | 
             
                end
         | 
| 180 230 | 
             
              end
         | 
| 181 | 
            -
               | 
| 182 | 
            -
               | 
| 183 | 
            -
             | 
| 184 | 
            -
             | 
| 231 | 
            +
              hashes[-1, 1] = hashes[-2, 2]     # for `distance` and `distance3` we use the same hashes
         | 
| 232 | 
            +
              puts "\nmeasure the distance (1000 times):"
         | 
| 233 | 
            +
              Benchmark.bm 28 do |bm|
         | 
| 234 | 
            +
                [
         | 
| 235 | 
            +
                  [Dhash, :hamming],
         | 
| 236 | 
            +
                  [DHashVips::DHash, :hamming],
         | 
| 237 | 
            +
                  [DHashVips::IDHash, :distance],
         | 
| 238 | 
            +
                  [DHashVips::IDHash, :distance3],
         | 
| 239 | 
            +
                  [DHashVips::IDHash, :distance, 4],
         | 
| 240 | 
            +
                ].zip(hashes) do |(m, dm, power), hs|
         | 
| 241 | 
            +
                  bm.report "#{m} #{dm} #{power}" do
         | 
| 185 242 | 
             
                    hs.product hs do |h1, h2|
         | 
| 186 | 
            -
                      1000.times{ m. | 
| 243 | 
            +
                      1000.times{ m.public_send dm, h1, h2 }
         | 
| 187 244 | 
             
                    end
         | 
| 188 245 | 
             
                  end
         | 
| 189 246 | 
             
                end
         | 
    
        data/dhash-vips.gemspec
    CHANGED
    
    | @@ -1,6 +1,6 @@ | |
| 1 1 | 
             
            Gem::Specification.new do |spec|
         | 
| 2 2 | 
             
              spec.name          = "dhash-vips"
         | 
| 3 | 
            -
              spec.version       =  | 
| 3 | 
            +
              spec.version       = "0.0.5.0"
         | 
| 4 4 | 
             
              spec.author        = "Victor Maslov"
         | 
| 5 5 | 
             
              spec.email         = "nakilon@gmail.com"
         | 
| 6 6 | 
             
              spec.summary       = "dHash and IDHash powered by Vips"
         | 
| @@ -16,4 +16,5 @@ Gem::Specification.new do |spec| | |
| 16 16 | 
             
              spec.add_development_dependency "rake"
         | 
| 17 17 | 
             
              spec.add_development_dependency "rspec-core"
         | 
| 18 18 | 
             
              spec.add_development_dependency "dhash"
         | 
| 19 | 
            +
              spec.add_development_dependency "get_process_mem"
         | 
| 19 20 | 
             
            end
         | 
    
        data/lib/dhash-vips.rb
    CHANGED
    
    | @@ -1,4 +1,3 @@ | |
| 1 | 
            -
            require_relative "dhash-vips/version"
         | 
| 2 1 | 
             
            require "vips"
         | 
| 3 2 |  | 
| 4 3 | 
             
            module DHashVips
         | 
| @@ -30,9 +29,19 @@ module DHashVips | |
| 30 29 | 
             
              module IDHash
         | 
| 31 30 | 
             
                extend self
         | 
| 32 31 |  | 
| 32 | 
            +
                def distance3 a, b
         | 
| 33 | 
            +
                  return ((a ^ b) & (a | b) >> 128).to_s(2).count "1"
         | 
| 34 | 
            +
                end
         | 
| 33 35 | 
             
                def distance a, b
         | 
| 34 | 
            -
                   | 
| 35 | 
            -
             | 
| 36 | 
            +
                  size_a, size_b = [a, b].map do |x|
         | 
| 37 | 
            +
                    case x.size
         | 
| 38 | 
            +
                    when            32 ; 8
         | 
| 39 | 
            +
                    when 128, 124, 120 ; 16
         | 
| 40 | 
            +
                    else          ; fail "invalid size of fingerprint; #{x.size}"
         | 
| 41 | 
            +
                    end
         | 
| 42 | 
            +
                  end
         | 
| 43 | 
            +
                  fail "fingerprints were taken with different `power` param: #{size_a} and #{size_b}" if size_a != size_b
         | 
| 44 | 
            +
                  ((a ^ b) & (a | b) >> 2 * size_a * size_a).to_s(2).count "1"
         | 
| 36 45 | 
             
                end
         | 
| 37 46 |  | 
| 38 47 | 
             
                @@median = lambda do |array|
         | 
| @@ -55,10 +64,10 @@ module DHashVips | |
| 55 64 | 
             
                fail unless 1 == @@median[[1, 1, 1]]
         | 
| 56 65 | 
             
                fail unless 1 == @@median[[1, 1]]
         | 
| 57 66 |  | 
| 58 | 
            -
                def  | 
| 59 | 
            -
                   | 
| 67 | 
            +
                def fingerprint file, power = 3
         | 
| 68 | 
            +
                  size = 2 ** power
         | 
| 60 69 | 
             
                  image = Vips::Image.new_from_file file
         | 
| 61 | 
            -
                  image = image.resize( | 
| 70 | 
            +
                  image = image.resize(size.fdiv(image.width), vscale: size.fdiv(image.height)).colourspace("b-w")
         | 
| 62 71 |  | 
| 63 72 | 
             
                  array = image.to_a.map &:flatten
         | 
| 64 73 | 
             
                  d1, i1, d2, i2 = [array, array.transpose].flat_map do |a|
         | 
| @@ -69,7 +78,7 @@ module DHashVips | |
| 69 78 | 
             
                      d.map{ |c| c.abs >= m ? 1 : 0 }.join.to_i(2),
         | 
| 70 79 | 
             
                    ]
         | 
| 71 80 | 
             
                  end
         | 
| 72 | 
            -
                  ((((( | 
| 81 | 
            +
                  (((((i1 << size * size) + i2) << size * size) + d1) << size * size) + d2
         | 
| 73 82 | 
             
                end
         | 
| 74 83 |  | 
| 75 84 | 
             
              end
         | 
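The bit layout used by `fingerprint` and `distance` in the lib/dhash-vips.rb hunks above can be tried out without libvips. What follows is a minimal sketch, not the gem's code. The helper names `pack_fingerprint` and `idhash_distance` are hypothetical, and the side length is passed explicitly instead of being inferred from `Integer#size` as the gem does. It assumes only what the diff shows: the two importance fields sit in the upper `2 * size * size` bits, the two difference fields sit in the lower ones, and `>>` binds tighter than `&` in Ruby.

```ruby
# Hypothetical helper (not part of the gem): packs four bit fields in the same
# order as the last line of `fingerprint` in the diff: i1, i2, d1, d2.
def pack_fingerprint(i1, i2, d1, d2, size)
  bits = size * size
  (((((i1 << bits) + i2) << bits) + d1) << bits) + d2
end

# Hypothetical helper (not part of the gem): same arithmetic as `distance`,
# with the implicit operator precedence written out.
def idhash_distance(a, b, size)
  # Shifting down by 2 * size * size lines the importance bits of both
  # fingerprints up with the difference bits they describe.
  importance = (a | b) >> (2 * size * size)
  # Count difference bits that disagree and are important in either image.
  ((a ^ b) & importance).to_s(2).count("1")
end

size = 2 # toy side length for readability; the gem uses 8 or 16

# Two low difference bits disagree and both are marked important by `a`.
a = pack_fingerprint(0b0000, 0b0011, 0b1010, 0b0101, size)
b = pack_fingerprint(0b0000, 0b0001, 0b1010, 0b0110, size)
puts idhash_distance(a, b, size) # 2

# Every difference bit disagrees, but nothing is marked important.
c = pack_fingerprint(0, 0, 0b1010, 0b0101, size)
d = pack_fingerprint(0, 0, 0b0101, 0b1010, size)
puts idhash_distance(c, d, size) # 0
```

With `size = 8` the shift is 128, which matches the constant hard-coded in `distance3`.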
    
        data/spec/_spec.rb
    CHANGED
    
    | @@ -2,8 +2,10 @@ require "dhash-vips" | |
| 2 2 |  | 
| 3 3 | 
             
            require "pp"
         | 
| 4 4 |  | 
| 5 | 
            +
            # TODO tests about `fingerprint(4)`
         | 
| 6 | 
            +
             | 
| 5 7 | 
             
            [
         | 
| 6 | 
            -
              [DHashVips::DHash, :hamming, 17, 18, 22, 39],
         | 
| 8 | 
            +
              [DHashVips::DHash, :hamming, :calculate, 17, 18, 22, 39],
         | 
| 7 9 | 
             
                # [[0, 17, 29, 27, 22, 29, 30, 29],
         | 
| 8 10 | 
             
                #  [17, 0, 30, 26, 33, 36, 37, 36],
         | 
| 9 11 | 
             
                #  [29, 30, 0, 18, 39, 30, 39, 36],
         | 
| @@ -12,7 +14,7 @@ require "pp" | |
| 12 14 | 
             
                #  [29, 36, 30, 30, 17, 0, 33, 30],
         | 
| 13 15 | 
             
                #  [30, 37, 39, 35, 28, 33, 0, 5],
         | 
| 14 16 | 
             
                #  [29, 36, 36, 34, 23, 30, 5, 0]]
         | 
| 15 | 
            -
              [DHashVips::IDHash, :distance, 15, 23, 28, 64],
         | 
| 17 | 
            +
              [DHashVips::IDHash, :distance, :fingerprint, 15, 23, 28, 64],
         | 
| 16 18 | 
             
                # [[0, 16, 30, 32, 46, 58, 43, 43],
         | 
| 17 19 | 
             
                #  [16, 0, 28, 28, 47, 59, 46, 47],
         | 
| 18 20 | 
             
                #  [30, 28, 0, 15, 53, 49, 53, 52],
         | 
| @@ -21,7 +23,7 @@ require "pp" | |
| 21 23 | 
             
                #  [58, 59, 49, 53, 23, 0, 44, 44],
         | 
| 22 24 | 
             
                #  [43, 46, 53, 61, 43, 44, 0, 0],
         | 
| 23 25 | 
             
                #  [43, 47, 52, 64, 45, 44, 0, 0]]
         | 
| 24 | 
            -
            ].each do |lib, dm, min_similar, max_similar, min_not_similar, max_not_similar|
         | 
| 26 | 
            +
            ].each do |lib, dm, calc, min_similar, max_similar, min_not_similar, max_not_similar|
         | 
| 25 27 |  | 
| 26 28 | 
             
            describe lib do
         | 
| 27 29 |  | 
| @@ -39,10 +41,13 @@ describe lib do | |
| 39 41 | 
             
                  df0a3b93e9412536ee8a11255f974141.jpg
         | 
| 40 42 | 
             
                  679634ff89a31279a39f03e278bc9a01.jpg
         | 
| 41 43 | 
             
                }   # these images a consecutive pairs of slightly (but enough for nice asserts) silimar images
         | 
| 44 | 
            +
                  # 6d97739b4a08f965dc9239dd24382e96.jpg
         | 
| 45 | 
            +
                  # 1b1d4bde376084011d027bba1c047a4b.jpg
         | 
| 46 | 
            +
                    # while these two are tend to be false positive match by idhash
         | 
| 42 47 | 
             
                bw1, bw2 = %w{
         | 
| 43 48 | 
             
                  71662d4d4029a3b41d47d5baf681ab9a.jpg
         | 
| 44 49 | 
             
                  ad8a37f872956666c3077a3e9e737984.jpg
         | 
| 45 | 
            -
                }   # these  | 
| 50 | 
            +
                }   # these are the same photo but of different size and colorspace
         | 
| 46 51 |  | 
| 47 52 | 
             
                example.metadata[:extra_failure_lines] = []
         | 
| 48 53 | 
             
                FileUtils.mkdir_p dir = "images"
         | 
| @@ -59,12 +64,12 @@ describe lib do | |
| 59 64 | 
             
                  end
         | 
| 60 65 | 
             
                end
         | 
| 61 66 |  | 
| 62 | 
            -
                hashes = [*images, bw1, bw2].map &described_class.method( | 
| 67 | 
            +
                hashes = [*images, bw1, bw2].map &described_class.method(calc)
         | 
| 63 68 | 
             
                table = MLL::table[described_class.method(dm), [hashes], [hashes]]
         | 
| 64 69 |  | 
| 65 70 | 
             
                # require "pp"
         | 
| 66 71 | 
             
                # pp table
         | 
| 67 | 
            -
                #  | 
| 72 | 
            +
                # next
         | 
| 68 73 |  | 
| 69 74 | 
             
                aggregate_failures do
         | 
| 70 75 | 
             
                  hashes.size.times.to_a.repeated_combination(2) do |i, j|
         | 
    
        metadata
    CHANGED
    
    | @@ -1,14 +1,14 @@ | |
| 1 1 | 
             
            --- !ruby/object:Gem::Specification
         | 
| 2 2 | 
             
            name: dhash-vips
         | 
| 3 3 | 
             
            version: !ruby/object:Gem::Version
         | 
| 4 | 
            -
              version: 0.0. | 
| 4 | 
            +
              version: 0.0.5.0
         | 
| 5 5 | 
             
            platform: ruby
         | 
| 6 6 | 
             
            authors:
         | 
| 7 7 | 
             
            - Victor Maslov
         | 
| 8 8 | 
             
            autorequire: 
         | 
| 9 9 | 
             
            bindir: bin
         | 
| 10 10 | 
             
            cert_chain: []
         | 
| 11 | 
            -
            date:  | 
| 11 | 
            +
            date: 2018-02-11 00:00:00.000000000 Z
         | 
| 12 12 | 
             
            dependencies:
         | 
| 13 13 | 
             
            - !ruby/object:Gem::Dependency
         | 
| 14 14 | 
             
              name: ruby-vips
         | 
| @@ -66,6 +66,20 @@ dependencies: | |
| 66 66 | 
             
                - - '>='
         | 
| 67 67 | 
             
                  - !ruby/object:Gem::Version
         | 
| 68 68 | 
             
                    version: '0'
         | 
| 69 | 
            +
            - !ruby/object:Gem::Dependency
         | 
| 70 | 
            +
              name: get_process_mem
         | 
| 71 | 
            +
              requirement: !ruby/object:Gem::Requirement
         | 
| 72 | 
            +
                requirements:
         | 
| 73 | 
            +
                - - '>='
         | 
| 74 | 
            +
                  - !ruby/object:Gem::Version
         | 
| 75 | 
            +
                    version: '0'
         | 
| 76 | 
            +
              type: :development
         | 
| 77 | 
            +
              prerelease: false
         | 
| 78 | 
            +
              version_requirements: !ruby/object:Gem::Requirement
         | 
| 79 | 
            +
                requirements:
         | 
| 80 | 
            +
                - - '>='
         | 
| 81 | 
            +
                  - !ruby/object:Gem::Version
         | 
| 82 | 
            +
                    version: '0'
         | 
| 69 83 | 
             
            description: 
         | 
| 70 84 | 
             
            email: nakilon@gmail.com
         | 
| 71 85 | 
             
            executables: []
         | 
| @@ -79,7 +93,6 @@ files: | |
| 79 93 | 
             
            - Rakefile
         | 
| 80 94 | 
             
            - dhash-vips.gemspec
         | 
| 81 95 | 
             
            - lib/dhash-vips.rb
         | 
| 82 | 
            -
            - lib/dhash-vips/version.rb
         | 
| 83 96 | 
             
            - spec/_spec.rb
         | 
| 84 97 | 
             
            homepage: https://github.com/nakilon/dhash-vips
         | 
| 85 98 | 
             
            licenses:
         | 
    
        data/lib/dhash-vips/version.rb
    DELETED