measurable 0.0.7 → 0.0.8
- checksums.yaml +4 -4
- data/.travis.yml +2 -1
- data/History.txt +3 -0
- data/README.md +28 -32
- data/lib/measurable.rb +1 -2
- data/lib/measurable/chebyshev.rb +23 -19
- data/lib/measurable/cosine.rb +65 -45
- data/lib/measurable/euclidean.rb +59 -68
- data/lib/measurable/hamming.rb +28 -24
- data/lib/measurable/haversine.rb +52 -47
- data/lib/measurable/jaccard.rb +58 -55
- data/lib/measurable/kullback_leibler.rb +39 -0
- data/lib/measurable/maxmin.rb +33 -28
- data/lib/measurable/minkowski.rb +39 -22
- data/lib/measurable/tanimoto.rb +47 -27
- data/lib/measurable/version.rb +1 -1
- data/spec/chebyshev_spec.rb +20 -1
- data/spec/cosine_spec.rb +16 -0
- data/spec/euclidean_spec.rb +17 -1
- data/spec/hamming_spec.rb +17 -1
- data/spec/haversine_spec.rb +21 -1
- data/spec/jaccard_spec.rb +21 -0
- data/spec/kullback_leibler_spec.rb +46 -0
- data/spec/levenshtein_spec.rb +16 -0
- data/spec/maxmin_spec.rb +20 -1
- data/spec/minkowski_spec.rb +17 -1
- data/spec/spec_helper.rb +1 -1
- data/spec/tanimoto_spec.rb +20 -0
- metadata +6 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: ef813e1fecb8d7f5a5a19cf70018b3965ea34790
+  data.tar.gz: 23489086e5091bff9a868e6d167cd2b77210b030
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: ea90a56d0a5dc4062b45aa8acd8536e9eb2032287d4f04080130f826aaf9bfe88ee0245899dfe5a0188f48768d690e05d9639c3e2dbefed4ba7323157775912a
+  data.tar.gz: 195c6eb734ef6b2183ceece52dc68bd68cbb316de0e26127dde3e152195ce435845b52fe965ca47546f47fa6c325cc67d065311213648a41c7bf90d0563345aa
data/.travis.yml
CHANGED
data/History.txt
ADDED
data/README.md
CHANGED
@@ -3,23 +3,25 @@
 [](https://travis-ci.org/agarie/measurable)
 [](https://codeclimate.com/github/agarie/measurable)
 
-A gem to test what metric is best for certain kinds of datasets in machine
+A gem to test what metric is best for certain kinds of datasets in machine
+learning. Besides the `Array` class, I also want to support
+[NMatrix](http://github.com/sciruby/nmatrix).
 
-
+This is a fork of the gem [Distance Measure](https://github.com/reddavis/Distance-Measures),
+which has a similar objective, but isn't actively maintained and doesn't support
+NMatrix. Thank you, [@reddavis][reddavis]. :)
 
-
-
-This is a fork of the gem [Distance Measure](https://github.com/reddavis/Distance-Measures), which has a similar objective, but isn't actively maintained and doesn't support NMatrix. Thank you, [@reddavis][reddavis]. :)
-
-## Install
+## Installation
 
 `gem install measurable`
 
-
+This gem is currently being tested on MRI Ruby 1.9.3, 2.0, 2.1.0, 2.1 (HEAD) and on Rubinius 2.x (HEAD). I hope to add JRuby support in the future.
 
-##
+## Available distance measures
 
-I'm using the term "distance measure" without much concern for the strict
+I'm using the term "distance measure" without much concern for the strict
+mathematical definition of a metric. If the documentation for one of the
+methods isn't clear about it being or not a metric, please open an issue.
 
 The following are the similarity measures supported at the moment:
 
@@ -30,46 +32,40 @@ The following are the similarity measures supported at the moment:
 - Jaccard distance
 - Tanimoto distance
 - Haversine distance
-- Minkowski (Cityblock or Manhattan) distance
+- Minkowski (aka Cityblock or Manhattan) distance
 - Chebyshev distance
 - Hamming distance
 - [Levenshtein distance](http://en.wikipedia.org/wiki/Levenshtein_distance)
-
-These still need to be implemented:
-
-- Correlation distance
-- Chi-square distance
-- Kullback-Leibler divergence
-- Jensen-Shannon divergence
-- Mahalanobis distance
-- Squared Mahalanobis distance
-
-I plan to update the specs to reflect that each method is (or isn't) a mathematical metric, but I want to finish implementing them first. Any help is appreciated! :)
+- [Kullback-Leibler divergence](http://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence)
 
 ## How to use
 
 The API I intend to support is something like this:
 
 ```ruby
-require
-
-u = NMatrix.ones([2, 1])
-v = NMatrix.zeros([2, 1])
-w = [1, 0]
-x = [2, 2]
+require 'measurable'
 
 # Calculate the distance between two points in space.
-Measurable.euclidean(
-Measurable.euclidean(w, v) # => 1.00000
-Measurable.cosine([1, 2], [2, 3]) # => 0.00772
+Measurable.euclidean([1, 1], [0, 0]) # => 1.41421
 
 # Calculate the norm of a vector, i.e. its distance from the origin.
+Measurable.euclidean([1, 1]) # => 1.4142135623730951
+
+# Get the cosine distance between
+Measurable.cosine_distance([1, 2], [2, 3]) # => 0.007722123286332261
+
+# Calculate sum of squares directly.
 Measurable.euclidean_squared([3, 4]) # => 25
 ```
 
+Most of the methods accept arbitrary enumerable objects instead of Arrays. For example, it's possible to use [NMatrix](https://github.com/sciruby/nmatrix).
+
 ## Documentation
 
-`RDoc` syntax is used to document the project. To build it locally, you'll need
+`RDoc` syntax is used to document the project. To build it locally, you'll need
+to install the [Fivefish
+generator](https://github.com/ged/rdoc-generator-fivefish) (`gem install
+rdoc-generator-fivefish`) and run the following command:
 
 ```bash
 rake rdoc
data/lib/measurable.rb
CHANGED
@@ -11,10 +11,9 @@ require 'measurable/maxmin'
 require 'measurable/haversine'
 require 'measurable/hamming'
 require 'measurable/levenshtein'
+require 'measurable/kullback_leibler'
 
 module Measurable
   # PI / 180 degrees.
   RAD_PER_DEG = Math::PI / 180
-
-  extend self # expose all instance methods as singleton methods.
 end
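The hunk above removes `extend self` from the top-level module. The per-measure files in this release instead wrap each method in its own submodule and extend `Measurable` with it. A minimal sketch of that pattern follows; the names `Distances`, `Distances::Manhattan` and `Point` are hypothetical and not part of the gem.

```ruby
# Hypothetical stand-in for the gem's layout; all names are illustrative only.
module Distances
  module Manhattan
    # Sum of absolute coordinate differences between u and v.
    def manhattan(u, v)
      raise ArgumentError if u.size != v.size

      u.zip(v).reduce(0) { |acc, (a, b)| acc + (a - b).abs }
    end
  end

  # Expose Manhattan's instance methods as singleton methods of Distances.
  extend Distances::Manhattan
end

# The submodule can also be mixed into an unrelated class on its own.
class Point
  include Distances::Manhattan
end

Distances.manhattan([1, 2], [4, 6]) # => 7
Point.new.manhattan([0, 0], [1, 1]) # => 2
```

Compared with a single `extend self`, this layout lets a caller include one measure without pulling in every method of the top-level module.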
data/lib/measurable/chebyshev.rb
CHANGED
@@ -1,23 +1,27 @@
 module Measurable
+  module Chebyshev
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+    # call-seq:
+    # chebyshev(u, v) -> Float
+    #
+    #
+    #
+    # * *Arguments* :
+    # - +u+ -> An array of Numeric objects.
+    # - +v+ -> An array of Numeric objects.
+    # * *Returns* :
+    # - The L-infinite distance between +u+ and +v+.
+    # * *Raises* :
+    # - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
+    #
+    def chebyshev(u, v)
+      # TODO: Change this to a more specific, custom-made exception.
+      raise ArgumentError if u.size != v.size
 
-
-
+      abs_differences = u.zip(v).map { |a| (a[0] - a[1]).abs }
+      abs_differences.max
+    end
   end
-
+
+  extend Measurable::Chebyshev
+end
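For reference, the Chebyshev (L-infinity) distance added above is the largest absolute difference between corresponding coordinates. The following standalone sketch mirrors the added method without requiring the gem; the `ChebyshevSketch` module name is an assumption made for illustration.

```ruby
# Standalone sketch; ChebyshevSketch is a hypothetical name, not the gem's API.
module ChebyshevSketch
  # Largest absolute coordinate difference between u and v.
  def self.chebyshev(u, v)
    raise ArgumentError if u.size != v.size

    u.zip(v).map { |a, b| (a - b).abs }.max
  end
end

# Differences are 3, 4 and 0, so the maximum is 4.
ChebyshevSketch.chebyshev([1, 5, 2], [4, 1, 2]) # => 4
```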
data/lib/measurable/cosine.rb
CHANGED
@@ -1,50 +1,70 @@
|
|
1
|
+
require 'measurable/euclidean'
|
2
|
+
|
1
3
|
module Measurable
|
4
|
+
module Cosine
|
2
5
|
|
3
|
-
|
4
|
-
|
5
|
-
|
6
|
-
|
7
|
-
|
8
|
-
|
9
|
-
|
10
|
-
|
11
|
-
|
12
|
-
|
13
|
-
|
14
|
-
|
15
|
-
|
16
|
-
|
17
|
-
|
18
|
-
|
19
|
-
|
20
|
-
|
21
|
-
|
22
|
-
|
23
|
-
|
24
|
-
|
25
|
-
|
26
|
-
|
6
|
+
# call-seq:
|
7
|
+
# cosine_similarity(u, v) -> Float
|
8
|
+
#
|
9
|
+
# Calculate the cosine similarity between the orientation of two vectors.
|
10
|
+
#
|
11
|
+
# See: http://en.wikipedia.org/wiki/Cosine_similarity
|
12
|
+
#
|
13
|
+
# * *Arguments* :
|
14
|
+
# - +u+ -> An array of Numeric objects.
|
15
|
+
# - +v+ -> An array of Numeric objects.
|
16
|
+
# * *Returns* :
|
17
|
+
# - The normalized dot product of +u+ and +v+, that is, the angle between
|
18
|
+
# them in the n-dimensional space.
|
19
|
+
# * *Raises* :
|
20
|
+
# - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
|
21
|
+
#
|
22
|
+
def cosine_similarity(u, v)
|
23
|
+
# TODO: Change this to a more specific, custom-made exception.
|
24
|
+
raise ArgumentError if u.size != v.size
|
25
|
+
|
26
|
+
dot_product = u.zip(v).reduce(0.0) { |acc, ary| acc += ary[0] * ary[1] }
|
27
|
+
|
28
|
+
dot_product / (euclidean(u) * euclidean(v))
|
29
|
+
end
|
27
30
|
|
28
|
-
|
29
|
-
|
30
|
-
|
31
|
-
|
32
|
-
|
33
|
-
|
34
|
-
|
35
|
-
|
36
|
-
|
37
|
-
|
38
|
-
|
39
|
-
|
40
|
-
|
41
|
-
|
42
|
-
|
43
|
-
|
44
|
-
|
45
|
-
|
46
|
-
|
47
|
-
|
48
|
-
|
31
|
+
# call-seq:
|
32
|
+
# cosine_distance(u, v) -> Float
|
33
|
+
#
|
34
|
+
# Calculate the cosine distance between the orientation of two vectors.
|
35
|
+
#
|
36
|
+
# See: http://en.wikipedia.org/wiki/Cosine_similarity
|
37
|
+
#
|
38
|
+
# * *Arguments* :
|
39
|
+
# - +u+ -> An array of Numeric objects.
|
40
|
+
# - +v+ -> An array of Numeric objects.
|
41
|
+
# * *Returns* :
|
42
|
+
# - The normalized dot product of +u+ and +v+, that is, the angle between
|
43
|
+
# them in the n-dimensional space.
|
44
|
+
# * *Raises* :
|
45
|
+
# - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
|
46
|
+
#
|
47
|
+
def cosine_distance(u, v)
|
48
|
+
# TODO: Change this to a more specific, custom-made exception.
|
49
|
+
raise ArgumentError if u.size != v.size
|
50
|
+
|
51
|
+
1 - cosine_similarity(u, v)
|
52
|
+
end
|
53
|
+
|
54
|
+
def self.extended(base) # :nodoc:
|
55
|
+
base.instance_eval do
|
56
|
+
extend Measurable::Euclidean
|
57
|
+
end
|
58
|
+
super
|
59
|
+
end
|
60
|
+
|
61
|
+
def self.included(base) # :nodoc:
|
62
|
+
base.class_eval do
|
63
|
+
include Measurable::Euclidean
|
64
|
+
end
|
65
|
+
super
|
66
|
+
end
|
49
67
|
end
|
68
|
+
|
69
|
+
extend Measurable::Cosine
|
50
70
|
end
|
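The cosine file above computes similarity as the dot product divided by the product of the two Euclidean norms, and distance as one minus that similarity. The sketch below reproduces the arithmetic in a standalone form so it runs without the gem. `CosineSketch` and its `norm` helper are hypothetical names; in the gem the norm comes from `Measurable::Euclidean`, which the `extended` and `included` hooks pull in.

```ruby
# Standalone sketch; CosineSketch and norm are hypothetical names.
module CosineSketch
  module_function

  # Euclidean norm of a vector (stands in for the gem's euclidean(u)).
  def norm(u)
    Math.sqrt(u.reduce(0.0) { |acc, x| acc + x * x })
  end

  # Dot product of u and v divided by the product of their norms.
  def cosine_similarity(u, v)
    raise ArgumentError if u.size != v.size

    dot = u.zip(v).reduce(0.0) { |acc, (a, b)| acc + a * b }
    dot / (norm(u) * norm(v))
  end

  # One minus the similarity: 0 for parallel vectors, 1 for orthogonal ones.
  def cosine_distance(u, v)
    1 - cosine_similarity(u, v)
  end
end

# Close to the README value of 0.007722123286332261.
CosineSketch.cosine_distance([1, 2], [2, 3])

# Orthogonal vectors have similarity 0.0, so the distance is 1.0.
CosineSketch.cosine_distance([1, 0], [0, 1]) # => 1.0
```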
data/lib/measurable/euclidean.rb
CHANGED
@@ -1,76 +1,67 @@
|
|
1
1
|
module Measurable
|
2
|
+
module Euclidean
|
2
3
|
|
3
|
-
|
4
|
-
|
5
|
-
|
6
|
-
|
7
|
-
|
8
|
-
|
9
|
-
|
10
|
-
|
11
|
-
|
12
|
-
|
13
|
-
|
14
|
-
|
15
|
-
|
16
|
-
|
17
|
-
|
18
|
-
|
19
|
-
|
20
|
-
|
21
|
-
|
22
|
-
|
23
|
-
|
24
|
-
# vector u. For this, we need the distance between u and the origin.
|
25
|
-
if v.nil?
|
26
|
-
v = Array.new(u.size, 0)
|
4
|
+
# call-seq:
|
5
|
+
# euclidean(u) -> Float
|
6
|
+
# euclidean(u, v) -> Float
|
7
|
+
#
|
8
|
+
# Calculate the ordinary distance between arrays +u+ and +v+.
|
9
|
+
#
|
10
|
+
# If +v+ isn't given, calculate the Euclidean norm of +u+.
|
11
|
+
#
|
12
|
+
# See: http://en.wikipedia.org/wiki/Euclidean_distance#N_dimensions
|
13
|
+
#
|
14
|
+
# * *Arguments* :
|
15
|
+
# - +u+ -> An array of Numeric objects.
|
16
|
+
# - +v+ -> (Optional) An array of Numeric objects.
|
17
|
+
# * *Returns* :
|
18
|
+
# - The euclidean norm of +u+ or the euclidean distance between +u+ and
|
19
|
+
# +v+.
|
20
|
+
# * *Raises* :
|
21
|
+
# - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
|
22
|
+
#
|
23
|
+
def euclidean(u, v = nil)
|
24
|
+
Math.sqrt(self.euclidean_squared(u, v))
|
27
25
|
end
|
28
26
|
|
29
|
-
#
|
30
|
-
|
27
|
+
# call-seq:
|
28
|
+
# euclidean_squared(u) -> Float
|
29
|
+
# euclidean_squared(u, v) -> Float
|
30
|
+
#
|
31
|
+
# Calculate the same value as euclidean(u, v), but don't take the square root
|
32
|
+
# of it.
|
33
|
+
#
|
34
|
+
# This isn't a metric in the strict sense, i.e. it doesn't respect the
|
35
|
+
# triangle inequality. However, the squared Euclidean distance is very useful
|
36
|
+
# whenever only the relative values of distances are important, for example
|
37
|
+
# in optimization problems.
|
38
|
+
#
|
39
|
+
# See: http://en.wikipedia.org/wiki/Euclidean_distance#Squared_Euclidean_distance
|
40
|
+
#
|
41
|
+
# * *Arguments* :
|
42
|
+
# - +u+ -> An array of Numeric objects.
|
43
|
+
# - +v+ -> (Optional) An array of Numeric objects.
|
44
|
+
# * *Returns* :
|
45
|
+
# - The squared value of the euclidean norm of +u+ or of the euclidean
|
46
|
+
# distance between +u+ and +v+.
|
47
|
+
# * *Raises* :
|
48
|
+
# - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
|
49
|
+
#
|
50
|
+
def euclidean_squared(u, v = nil)
|
51
|
+
# If the second argument is nil, the method should return the norm of
|
52
|
+
# vector u. For this, we need the distance between u and the origin.
|
53
|
+
if v.nil?
|
54
|
+
v = Array.new(u.size, 0)
|
55
|
+
end
|
31
56
|
|
32
|
-
|
33
|
-
|
34
|
-
end
|
35
|
-
|
36
|
-
Math.sqrt(sum)
|
37
|
-
end
|
57
|
+
# TODO: Change this to a more specific, custom-made exception.
|
58
|
+
raise ArgumentError if u.size != v.size
|
38
59
|
|
39
|
-
|
40
|
-
|
41
|
-
|
42
|
-
#
|
43
|
-
# Calculate the same value as euclidean(u, v), but don't take the square root
|
44
|
-
# of it.
|
45
|
-
#
|
46
|
-
# This isn't a metric in the strict sense, i.e. it doesn't respect the
|
47
|
-
# triangle inequality. However, the squared Euclidean distance is very useful
|
48
|
-
# whenever only the relative values of distances are important, for example
|
49
|
-
# in optimization problems.
|
50
|
-
#
|
51
|
-
# See: http://en.wikipedia.org/wiki/Euclidean_distance#Squared_Euclidean_distance
|
52
|
-
#
|
53
|
-
# * *Arguments* :
|
54
|
-
# - +u+ -> An array of Numeric objects.
|
55
|
-
# - +v+ -> (Optional) An array of Numeric objects.
|
56
|
-
# * *Returns* :
|
57
|
-
# - The squared value of the euclidean norm of +u+ or of the euclidean
|
58
|
-
# distance between +u+ and +v+.
|
59
|
-
# * *Raises* :
|
60
|
-
# - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
|
61
|
-
#
|
62
|
-
def euclidean_squared(u, v = nil)
|
63
|
-
# If the second argument is nil, the method should return the norm of
|
64
|
-
# vector u. For this, we need the distance between u and the origin.
|
65
|
-
if v.nil?
|
66
|
-
v = Array.new(u.size, 0)
|
67
|
-
end
|
68
|
-
|
69
|
-
# TODO: Change this to a more specific, custom-made exception.
|
70
|
-
raise ArgumentError if u.size != v.size
|
71
|
-
|
72
|
-
u.zip(v).reduce(0.0) do |acc, ary|
|
73
|
-
acc += (ary[0] - ary[-1]) ** 2
|
60
|
+
u.zip(v).reduce(0.0) do |acc, ary|
|
61
|
+
acc += (ary[0] - ary[-1]) ** 2
|
62
|
+
end
|
74
63
|
end
|
75
64
|
end
|
76
|
-
|
65
|
+
|
66
|
+
extend Measurable::Euclidean
|
67
|
+
end
|
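In the new layout `euclidean` simply takes the square root of `euclidean_squared`, and a missing second argument means "distance from the origin". The sketch below restates that in a standalone module so it runs without the gem; `EuclideanSketch` is a hypothetical name. It also checks the doc comment's claim that the squared distance violates the triangle inequality, using three collinear points chosen for illustration.

```ruby
# Standalone sketch; EuclideanSketch is a hypothetical name, not the gem's API.
module EuclideanSketch
  module_function

  # Sum of squared coordinate differences; v defaults to the origin.
  def euclidean_squared(u, v = nil)
    v = Array.new(u.size, 0) if v.nil?
    raise ArgumentError if u.size != v.size

    u.zip(v).reduce(0.0) { |acc, (a, b)| acc + (a - b)**2 }
  end

  # Square root of the squared distance (or of the squared norm of u).
  def euclidean(u, v = nil)
    Math.sqrt(euclidean_squared(u, v))
  end
end

EuclideanSketch.euclidean_squared([3, 4]) # => 25.0
EuclideanSketch.euclidean([3, 4])         # => 5.0
EuclideanSketch.euclidean([1, 1], [0, 0]) # => 1.4142135623730951

# Triangle inequality fails for the squared distance: 4.0 > 1.0 + 1.0.
a, b, c = [0], [1], [2]
EuclideanSketch.euclidean_squared(a, c) >
  EuclideanSketch.euclidean_squared(a, b) + EuclideanSketch.euclidean_squared(b, c) # => true
```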
data/lib/measurable/hamming.rb
CHANGED
@@ -1,29 +1,33 @@
|
|
1
1
|
module Measurable
|
2
|
+
module Hamming
|
2
3
|
|
3
|
-
|
4
|
-
|
5
|
-
|
6
|
-
|
7
|
-
|
8
|
-
|
9
|
-
|
10
|
-
|
11
|
-
|
12
|
-
|
13
|
-
|
14
|
-
|
15
|
-
|
16
|
-
|
17
|
-
|
18
|
-
|
19
|
-
|
20
|
-
|
21
|
-
|
22
|
-
|
4
|
+
# call-seq:
|
5
|
+
# hamming(s1, s2) -> Integer
|
6
|
+
#
|
7
|
+
# Count the number of different characters between strings +s1+ and +s2+,
|
8
|
+
# that is, how many substitutions are necessary to change +s1+ into +s2+ and
|
9
|
+
# vice-versa.
|
10
|
+
#
|
11
|
+
# See: http://en.wikipedia.org/wiki/Hamming_distance
|
12
|
+
#
|
13
|
+
# * *Arguments* :
|
14
|
+
# - +s1+ -> A String.
|
15
|
+
# - +s2+ -> A String with the same size of +s1+.
|
16
|
+
# * *Returns* :
|
17
|
+
# - The number of characters in which +s1+ and +s2+ differ.
|
18
|
+
# * *Raises* :
|
19
|
+
# - +ArgumentError+ -> The sizes of +s1+ and +s2+ don't match.
|
20
|
+
#
|
21
|
+
def hamming(s1, s2)
|
22
|
+
# TODO: Change this to a more specific, custom-made exception.
|
23
|
+
raise ArgumentError if s1.size != s2.size
|
23
24
|
|
24
|
-
|
25
|
-
|
26
|
-
|
25
|
+
s1.chars.zip(s2.chars).reduce(0) do |acc, c|
|
26
|
+
acc += 1 if c[0] != c[1]
|
27
|
+
acc
|
28
|
+
end
|
27
29
|
end
|
28
30
|
end
|
29
|
-
|
31
|
+
|
32
|
+
extend Measurable::Hamming
|
33
|
+
end
|
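The Hamming method above pairs up the characters of two equal-length strings and counts the positions where they differ. A standalone sketch of the same idea follows. It uses `Enumerable#count` with a block rather than the `reduce` accumulator in the diff, and `HammingSketch` is a hypothetical name.

```ruby
# Standalone sketch; HammingSketch is a hypothetical name, not the gem's API.
module HammingSketch
  # Number of positions at which s1 and s2 hold different characters.
  def self.hamming(s1, s2)
    raise ArgumentError if s1.size != s2.size

    s1.chars.zip(s2.chars).count { |a, b| a != b }
  end
end

# Positions 3, 4 and 5 differ (r/t, o/h, l/r).
HammingSketch.hamming("karolin", "kathrin") # => 3
```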