measurable 0.0.7 → 0.0.8
- checksums.yaml +4 -4
- data/.travis.yml +2 -1
- data/History.txt +3 -0
- data/README.md +28 -32
- data/lib/measurable.rb +1 -2
- data/lib/measurable/chebyshev.rb +23 -19
- data/lib/measurable/cosine.rb +65 -45
- data/lib/measurable/euclidean.rb +59 -68
- data/lib/measurable/hamming.rb +28 -24
- data/lib/measurable/haversine.rb +52 -47
- data/lib/measurable/jaccard.rb +58 -55
- data/lib/measurable/kullback_leibler.rb +39 -0
- data/lib/measurable/maxmin.rb +33 -28
- data/lib/measurable/minkowski.rb +39 -22
- data/lib/measurable/tanimoto.rb +47 -27
- data/lib/measurable/version.rb +1 -1
- data/spec/chebyshev_spec.rb +20 -1
- data/spec/cosine_spec.rb +16 -0
- data/spec/euclidean_spec.rb +17 -1
- data/spec/hamming_spec.rb +17 -1
- data/spec/haversine_spec.rb +21 -1
- data/spec/jaccard_spec.rb +21 -0
- data/spec/kullback_leibler_spec.rb +46 -0
- data/spec/levenshtein_spec.rb +16 -0
- data/spec/maxmin_spec.rb +20 -1
- data/spec/minkowski_spec.rb +17 -1
- data/spec/spec_helper.rb +1 -1
- data/spec/tanimoto_spec.rb +20 -0
- metadata +6 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: ef813e1fecb8d7f5a5a19cf70018b3965ea34790
+  data.tar.gz: 23489086e5091bff9a868e6d167cd2b77210b030
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: ea90a56d0a5dc4062b45aa8acd8536e9eb2032287d4f04080130f826aaf9bfe88ee0245899dfe5a0188f48768d690e05d9639c3e2dbefed4ba7323157775912a
+  data.tar.gz: 195c6eb734ef6b2183ceece52dc68bd68cbb316de0e26127dde3e152195ce435845b52fe965ca47546f47fa6c325cc67d065311213648a41c7bf90d0563345aa
data/.travis.yml
CHANGED
data/History.txt
ADDED
data/README.md
CHANGED
@@ -3,23 +3,25 @@
 [![Build Status](https://travis-ci.org/agarie/measurable.svg?branch=master)](https://travis-ci.org/agarie/measurable)
 [![Code Climate](https://codeclimate.com/github/agarie/measurable.png)](https://codeclimate.com/github/agarie/measurable)
 
-A gem to test what metric is best for certain kinds of datasets in machine
+A gem to test what metric is best for certain kinds of datasets in machine
+learning. Besides the `Array` class, I also want to support
+[NMatrix](http://github.com/sciruby/nmatrix).
 
-
+This is a fork of the gem [Distance Measure](https://github.com/reddavis/Distance-Measures),
+which has a similar objective, but isn't actively maintained and doesn't support
+NMatrix. Thank you, [@reddavis][reddavis]. :)
 
-
-
-This is a fork of the gem [Distance Measure](https://github.com/reddavis/Distance-Measures), which has a similar objective, but isn't actively maintained and doesn't support NMatrix. Thank you, [@reddavis][reddavis]. :)
-
-## Install
+## Installation
 
 `gem install measurable`
 
-
+This gem is currently being tested on MRI Ruby 1.9.3, 2.0, 2.1.0, 2.1 (HEAD) and on Rubinius 2.x (HEAD). I hope to add JRuby support in the future.
 
-##
+## Available distance measures
 
-I'm using the term "distance measure" without much concern for the strict
+I'm using the term "distance measure" without much concern for the strict
+mathematical definition of a metric. If the documentation for one of the
+methods isn't clear about it being or not a metric, please open an issue.
 
 The following are the similarity measures supported at the moment:
 
@@ -30,46 +32,40 @@ The following are the similarity measures supported at the moment:
 - Jaccard distance
 - Tanimoto distance
 - Haversine distance
-- Minkowski (Cityblock or Manhattan) distance
+- Minkowski (aka Cityblock or Manhattan) distance
 - Chebyshev distance
 - Hamming distance
 - [Levenshtein distance](http://en.wikipedia.org/wiki/Levenshtein_distance)
-
-These still need to be implemented:
-
-- Correlation distance
-- Chi-square distance
-- Kullback-Leibler divergence
-- Jensen-Shannon divergence
-- Mahalanobis distance
-- Squared Mahalanobis distance
-
-I plan to update the specs to reflect that each method is (or isn't) a mathematical metric, but I want to finish implementing them first. Any help is appreciated! :)
+- [Kullback-Leibler divergence](http://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence)
 
 ## How to use
 
 The API I intend to support is something like this:
 
 ```ruby
-require
-
-u = NMatrix.ones([2, 1])
-v = NMatrix.zeros([2, 1])
-w = [1, 0]
-x = [2, 2]
+require 'measurable'
 
 # Calculate the distance between two points in space.
-Measurable.euclidean(
-Measurable.euclidean(w, v) # => 1.00000
-Measurable.cosine([1, 2], [2, 3]) # => 0.00772
+Measurable.euclidean([1, 1], [0, 0]) # => 1.41421
 
 # Calculate the norm of a vector, i.e. its distance from the origin.
+Measurable.euclidean([1, 1]) # => 1.4142135623730951
+
+# Get the cosine distance between
+Measurable.cosine_distance([1, 2], [2, 3]) # => 0.007722123286332261
+
+# Calculate sum of squares directly.
 Measurable.euclidean_squared([3, 4]) # => 25
 ```
 
+Most of the methods accept arbitrary enumerable objects instead of Arrays. For example, it's possible to use [NMatrix](https://github.com/sciruby/nmatrix).
+
 ## Documentation
 
-`RDoc` syntax is used to document the project. To build it locally, you'll need
+`RDoc` syntax is used to document the project. To build it locally, you'll need
+to install the [Fivefish
+generator](https://github.com/ged/rdoc-generator-fivefish) (`gem install
+rdoc-generator-fivefish`) and run the following command:
 
 ```bash
 rake rdoc
data/lib/measurable.rb
CHANGED
@@ -11,10 +11,9 @@ require 'measurable/maxmin'
 require 'measurable/haversine'
 require 'measurable/hamming'
 require 'measurable/levenshtein'
+require 'measurable/kullback_leibler'
 
 module Measurable
   # PI / 180 degrees.
   RAD_PER_DEG = Math::PI / 180
-
-  extend self # expose all instance methods as singleton methods.
 end
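The removal of `extend self` here goes together with the `extend Measurable::...` lines added in the per-measure files: each measure now appears to live in its own submodule, and `Measurable` extends itself with it. A minimal sketch of that pattern follows; the names `SketchMeasurable`, `Manhattan` and `Walker` are hypothetical and only illustrate the mechanism, they are not taken from the gem.

```ruby
# Hypothetical stand-in for the gem's top-level module.
module SketchMeasurable
  # Each measure gets its own submodule of plain instance methods.
  module Manhattan
    def manhattan(u, v)
      raise ArgumentError if u.size != v.size

      u.zip(v).sum { |a, b| (a - b).abs }
    end
  end

  # Make the submodule's methods callable on the module object itself,
  # i.e. as SketchMeasurable.manhattan(...).
  extend Manhattan
end

# The same submodule can still be mixed into a class on its own.
class Walker
  include SketchMeasurable::Manhattan
end

puts SketchMeasurable.manhattan([1, 2], [3, 5])
puts Walker.new.manhattan([0, 0], [2, 2])
```

One consequence of this layout is that a caller can mix in a single measure without pulling in all of them.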
data/lib/measurable/chebyshev.rb
CHANGED
@@ -1,23 +1,27 @@
 module Measurable
+  module Chebyshev
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+    # call-seq:
+    #   chebyshev(u, v) -> Float
+    #
+    #
+    #
+    # * *Arguments* :
+    #   - +u+ -> An array of Numeric objects.
+    #   - +v+ -> An array of Numeric objects.
+    # * *Returns* :
+    #   - The L-infinite distance between +u+ and +v+.
+    # * *Raises* :
+    #   - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
+    #
+    def chebyshev(u, v)
+      # TODO: Change this to a more specific, custom-made exception.
+      raise ArgumentError if u.size != v.size
 
-
-
+      abs_differences = u.zip(v).map { |a| (a[0] - a[1]).abs }
+      abs_differences.max
+    end
   end
-
+
+  extend Measurable::Chebyshev
+end
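As a worked illustration of the method added in this file, the Chebyshev (L-infinity) distance is the largest absolute coordinate difference. The sketch below restates the added logic as a standalone function; the name `chebyshev_sketch` is hypothetical and the sample vectors are made up for the example.

```ruby
# Standalone restatement of the added method body (hypothetical name).
def chebyshev_sketch(u, v)
  raise ArgumentError, "size mismatch" if u.size != v.size

  # Pair up coordinates, take absolute differences, keep the largest.
  u.zip(v).map { |a, b| (a - b).abs }.max
end

# Differences are |1-4| = 3, |5-2| = 3, |3-9| = 6, so the result is 6.
puts chebyshev_sketch([1, 5, 3], [4, 2, 9])
```

For two empty arrays `max` returns `nil` rather than a number, which a caller may want to guard against.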
data/lib/measurable/cosine.rb
CHANGED
@@ -1,50 +1,70 @@
+require 'measurable/euclidean'
+
 module Measurable
+  module Cosine
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+    # call-seq:
+    #   cosine_similarity(u, v) -> Float
+    #
+    # Calculate the cosine similarity between the orientation of two vectors.
+    #
+    # See: http://en.wikipedia.org/wiki/Cosine_similarity
+    #
+    # * *Arguments* :
+    #   - +u+ -> An array of Numeric objects.
+    #   - +v+ -> An array of Numeric objects.
+    # * *Returns* :
+    #   - The normalized dot product of +u+ and +v+, that is, the angle between
+    #     them in the n-dimensional space.
+    # * *Raises* :
+    #   - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
+    #
+    def cosine_similarity(u, v)
+      # TODO: Change this to a more specific, custom-made exception.
+      raise ArgumentError if u.size != v.size
+
+      dot_product = u.zip(v).reduce(0.0) { |acc, ary| acc += ary[0] * ary[1] }
+
+      dot_product / (euclidean(u) * euclidean(v))
+    end
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+    # call-seq:
+    #   cosine_distance(u, v) -> Float
+    #
+    # Calculate the cosine distance between the orientation of two vectors.
+    #
+    # See: http://en.wikipedia.org/wiki/Cosine_similarity
+    #
+    # * *Arguments* :
+    #   - +u+ -> An array of Numeric objects.
+    #   - +v+ -> An array of Numeric objects.
+    # * *Returns* :
+    #   - The normalized dot product of +u+ and +v+, that is, the angle between
+    #     them in the n-dimensional space.
+    # * *Raises* :
+    #   - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
+    #
+    def cosine_distance(u, v)
+      # TODO: Change this to a more specific, custom-made exception.
+      raise ArgumentError if u.size != v.size
+
+      1 - cosine_similarity(u, v)
+    end
+
+    def self.extended(base) # :nodoc:
+      base.instance_eval do
+        extend Measurable::Euclidean
+      end
+      super
+    end
+
+    def self.included(base) # :nodoc:
+      base.class_eval do
+        include Measurable::Euclidean
+      end
+      super
+    end
   end
+
+  extend Measurable::Cosine
 end
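Two ideas carry this file: the formula (dot product divided by the product of the two norms, which gives the cosine of the angle rather than the angle itself) and the `extended` hook that pulls in the Euclidean norm the formula depends on. The sketch below shows both in a self-contained form. The module names `SketchEuclidean`, `SketchCosine` and `SketchMeasures` are hypothetical, and the one-argument `euclidean` is a simplification of the gem's method.

```ruby
# Hypothetical, simplified norm: only the one-argument form is sketched.
module SketchEuclidean
  def euclidean(u)
    Math.sqrt(u.reduce(0.0) { |acc, x| acc + x**2 })
  end
end

module SketchCosine
  def cosine_similarity(u, v)
    raise ArgumentError if u.size != v.size

    dot = u.zip(v).reduce(0.0) { |acc, (a, b)| acc + a * b }
    dot / (euclidean(u) * euclidean(v))
  end

  def cosine_distance(u, v)
    1 - cosine_similarity(u, v)
  end

  # Hook: whatever extends SketchCosine also receives the norm it relies on.
  def self.extended(base)
    base.extend(SketchEuclidean)
    super
  end
end

module SketchMeasures
  extend SketchCosine
end

# Same vectors as the README example; the distance is roughly 0.00772.
puts SketchMeasures.cosine_distance([1, 2], [2, 3])
# Orthogonal vectors have similarity 0, hence distance 1.
puts SketchMeasures.cosine_distance([1, 0], [0, 1])
```

Because of the hook, `SketchMeasures` responds to `euclidean` even though it only extended `SketchCosine` explicitly.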
data/lib/measurable/euclidean.rb
CHANGED
@@ -1,76 +1,67 @@
 module Measurable
+  module Euclidean
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-    # vector u. For this, we need the distance between u and the origin.
-    if v.nil?
-      v = Array.new(u.size, 0)
+    # call-seq:
+    #   euclidean(u) -> Float
+    #   euclidean(u, v) -> Float
+    #
+    # Calculate the ordinary distance between arrays +u+ and +v+.
+    #
+    # If +v+ isn't given, calculate the Euclidean norm of +u+.
+    #
+    # See: http://en.wikipedia.org/wiki/Euclidean_distance#N_dimensions
+    #
+    # * *Arguments* :
+    #   - +u+ -> An array of Numeric objects.
+    #   - +v+ -> (Optional) An array of Numeric objects.
+    # * *Returns* :
+    #   - The euclidean norm of +u+ or the euclidean distance between +u+ and
+    #     +v+.
+    # * *Raises* :
+    #   - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
+    #
+    def euclidean(u, v = nil)
+      Math.sqrt(self.euclidean_squared(u, v))
     end
 
-  #
-
+    # call-seq:
+    #   euclidean_squared(u) -> Float
+    #   euclidean_squared(u, v) -> Float
+    #
+    # Calculate the same value as euclidean(u, v), but don't take the square root
+    # of it.
+    #
+    # This isn't a metric in the strict sense, i.e. it doesn't respect the
+    # triangle inequality. However, the squared Euclidean distance is very useful
+    # whenever only the relative values of distances are important, for example
+    # in optimization problems.
+    #
+    # See: http://en.wikipedia.org/wiki/Euclidean_distance#Squared_Euclidean_distance
+    #
+    # * *Arguments* :
+    #   - +u+ -> An array of Numeric objects.
+    #   - +v+ -> (Optional) An array of Numeric objects.
+    # * *Returns* :
+    #   - The squared value of the euclidean norm of +u+ or of the euclidean
+    #     distance between +u+ and +v+.
+    # * *Raises* :
+    #   - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
+    #
+    def euclidean_squared(u, v = nil)
+      # If the second argument is nil, the method should return the norm of
+      # vector u. For this, we need the distance between u and the origin.
+      if v.nil?
+        v = Array.new(u.size, 0)
+      end
 
-
-
-    end
-
-    Math.sqrt(sum)
-  end
+      # TODO: Change this to a more specific, custom-made exception.
+      raise ArgumentError if u.size != v.size
 
-
-
-
-  #
-  # Calculate the same value as euclidean(u, v), but don't take the square root
-  # of it.
-  #
-  # This isn't a metric in the strict sense, i.e. it doesn't respect the
-  # triangle inequality. However, the squared Euclidean distance is very useful
-  # whenever only the relative values of distances are important, for example
-  # in optimization problems.
-  #
-  # See: http://en.wikipedia.org/wiki/Euclidean_distance#Squared_Euclidean_distance
-  #
-  # * *Arguments* :
-  #   - +u+ -> An array of Numeric objects.
-  #   - +v+ -> (Optional) An array of Numeric objects.
-  # * *Returns* :
-  #   - The squared value of the euclidean norm of +u+ or of the euclidean
-  #     distance between +u+ and +v+.
-  # * *Raises* :
-  #   - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
-  #
-  def euclidean_squared(u, v = nil)
-    # If the second argument is nil, the method should return the norm of
-    # vector u. For this, we need the distance between u and the origin.
-    if v.nil?
-      v = Array.new(u.size, 0)
-    end
-
-    # TODO: Change this to a more specific, custom-made exception.
-    raise ArgumentError if u.size != v.size
-
-    u.zip(v).reduce(0.0) do |acc, ary|
-      acc += (ary[0] - ary[-1]) ** 2
+      u.zip(v).reduce(0.0) do |acc, ary|
+        acc += (ary[0] - ary[-1]) ** 2
+      end
     end
   end
-
+
+  extend Measurable::Euclidean
+end
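The doc comment's remark that the squared distance does not respect the triangle inequality can be checked with three collinear points. The sketch below is a self-contained restatement under assumed names: `EuclideanSketch` is hypothetical, and `module_function` is used here only to keep the example short, which differs from the `extend` approach in the diff.

```ruby
module EuclideanSketch
  module_function

  def euclidean_squared(u, v = nil)
    # With no second vector, measure against the origin (the squared norm).
    v = Array.new(u.size, 0) if v.nil?
    raise ArgumentError if u.size != v.size

    u.zip(v).reduce(0.0) { |acc, (a, b)| acc + (a - b)**2 }
  end

  def euclidean(u, v = nil)
    Math.sqrt(euclidean_squared(u, v))
  end
end

a = [0]
b = [1]
c = [2]

# Squared distances: going a -> c directly costs 4.0, while the detour
# a -> b -> c costs 1.0 + 1.0 = 2.0, so the triangle inequality fails.
direct = EuclideanSketch.euclidean_squared(a, c)
detour = EuclideanSketch.euclidean_squared(a, b) + EuclideanSketch.euclidean_squared(b, c)
puts direct > detour

# The plain Euclidean distance does satisfy it: 2.0 <= 1.0 + 1.0.
puts EuclideanSketch.euclidean(a, c) <= EuclideanSketch.euclidean(a, b) + EuclideanSketch.euclidean(b, c)
```

Squaring is monotonic on non-negative values, so comparisons between distances come out the same either way, which is why the squared form remains useful when only relative ordering matters.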
data/lib/measurable/hamming.rb
CHANGED
@@ -1,29 +1,33 @@
 module Measurable
+  module Hamming
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+    # call-seq:
+    #   hamming(s1, s2) -> Integer
+    #
+    # Count the number of different characters between strings +s1+ and +s2+,
+    # that is, how many substitutions are necessary to change +s1+ into +s2+ and
+    # vice-versa.
+    #
+    # See: http://en.wikipedia.org/wiki/Hamming_distance
+    #
+    # * *Arguments* :
+    #   - +s1+ -> A String.
+    #   - +s2+ -> A String with the same size of +s1+.
+    # * *Returns* :
+    #   - The number of characters in which +s1+ and +s2+ differ.
+    # * *Raises* :
+    #   - +ArgumentError+ -> The sizes of +s1+ and +s2+ don't match.
+    #
+    def hamming(s1, s2)
+      # TODO: Change this to a more specific, custom-made exception.
+      raise ArgumentError if s1.size != s2.size
 
-
-
-
+      s1.chars.zip(s2.chars).reduce(0) do |acc, c|
+        acc += 1 if c[0] != c[1]
+        acc
+      end
     end
   end
-
+
+  extend Measurable::Hamming
+end
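For a concrete feel of what the added method counts, here is a standalone sketch of the same position-by-position comparison. The name `hamming_sketch` is hypothetical, and `count` with a block is used as a shorter equivalent of the `reduce` in the diff.

```ruby
# Standalone restatement of the added method (hypothetical name).
def hamming_sketch(s1, s2)
  raise ArgumentError, "size mismatch" if s1.size != s2.size

  # Walk both strings in lockstep and count positions that differ.
  s1.chars.zip(s2.chars).count { |a, b| a != b }
end

# "karolin" and "kathrin" differ at positions 3, 4 and 5 (r/t, o/h, l/r).
puts hamming_sketch("karolin", "kathrin")
```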