measurable 0.0.7 → 0.0.8

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: aaf11caa585477217fa9724f7f8c8a794490194f
-  data.tar.gz: 2e802d508407c4b656501c9a18af5130ae488979
+  metadata.gz: ef813e1fecb8d7f5a5a19cf70018b3965ea34790
+  data.tar.gz: 23489086e5091bff9a868e6d167cd2b77210b030
 SHA512:
-  metadata.gz: 971e40facf06a090f515d1cd450006ce370d6ff460c8a9af65c9791ced5fddba58defdaff70c51cd9ce202ce712c7e0074b387507155de2c93250725cd3547f0
-  data.tar.gz: 7f2714aeca034f418567b555c0fd5f99bcfce34d2ab73d595a2d4aa5d6cb5abdd620e2502182ce8537d0d7c1dccdba4285dabe4f9877e175a4973ae1ff155126
+  metadata.gz: ea90a56d0a5dc4062b45aa8acd8536e9eb2032287d4f04080130f826aaf9bfe88ee0245899dfe5a0188f48768d690e05d9639c3e2dbefed4ba7323157775912a
+  data.tar.gz: 195c6eb734ef6b2183ceece52dc68bd68cbb316de0e26127dde3e152195ce435845b52fe965ca47546f47fa6c325cc67d065311213648a41c7bf90d0563345aa
data/.travis.yml CHANGED
@@ -2,7 +2,8 @@ language: ruby
 rvm:
   - 1.9.3
   - 2.0
+  - 2.1.0
   - 2.1
-  - rbx
+  - rbx-2
 # uncomment this line if your project needs to run something other than `rake`:
 # script: bundle exec rspec spec
data/History.txt ADDED
@@ -0,0 +1,3 @@
+=== 18th May, 2014 -- Version 0.0.8
+
+* Added Kullback-Leibler divergence.
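The changelog and the README both record that 0.0.8 adds the Kullback-Leibler divergence, but the new `measurable/kullback_leibler` file is not part of this diff. For reference, here is a minimal sketch of the textbook discrete definition, D_KL(P || Q) = sum_i p_i * ln(p_i / q_i). It is not the gem's implementation, and the method name `kl_divergence` is hypothetical.

The sketch makes several assumptions. Both inputs are same-length arrays of probabilities that each sum to 1, and the logarithm is natural. A term with p_i = 0 is treated as contributing 0, and the result is infinite when some q_i = 0 while p_i > 0.

```ruby
# Hypothetical sketch of the discrete Kullback-Leibler divergence
#   D_KL(P || Q) = sum_i p_i * ln(p_i / q_i)
# This is NOT the gem's code; the name kl_divergence is an assumption.
def kl_divergence(p, q)
  # Follow the gem's convention of rejecting inputs of different sizes.
  raise ArgumentError, "sizes don't match" if p.size != q.size

  p.zip(q).reduce(0.0) do |acc, (pi, qi)|
    # Convention: 0 * ln(0 / q) = 0, so zero-probability terms add nothing.
    next acc if pi.zero?
    # P puts mass where Q has none: the divergence is infinite.
    return Float::INFINITY if qi.zero?

    acc + pi * Math.log(pi.to_f / qi)
  end
end

p_dist = [0.5, 0.5]
q_dist = [0.9, 0.1]

# The two directions give different values: KL is not symmetric.
puts kl_divergence(p_dist, q_dist)
puts kl_divergence(q_dist, p_dist)
```

Because the two directions differ, the KL divergence is not a metric in the strict sense. This matches the README's caveat that "distance measure" is used loosely.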
data/README.md CHANGED
@@ -3,23 +3,25 @@
 [![Build Status](https://travis-ci.org/agarie/measurable.svg?branch=master)](https://travis-ci.org/agarie/measurable)
 [![Code Climate](https://codeclimate.com/github/agarie/measurable.png)](https://codeclimate.com/github/agarie/measurable)
 
-A gem to test what metric is best for certain kinds of datasets in machine learning.
+A gem to test what metric is best for certain kinds of datasets in machine
+learning. Besides the `Array` class, I also want to support
+[NMatrix](http://github.com/sciruby/nmatrix).
 
-Besides the `Array` class, I also want to support `NVector` (from [NMatrix](http://github.com/sciruby/nmatrix)).
+This is a fork of the gem [Distance Measure](https://github.com/reddavis/Distance-Measures),
+which has a similar objective, but isn't actively maintained and doesn't support
+NMatrix. Thank you, [@reddavis][reddavis]. :)
 
-The distance measures will be created in Ruby first. If I see that it's really too slow, I'll write some methods in C (or Java, for JRuby).
-
-This is a fork of the gem [Distance Measure](https://github.com/reddavis/Distance-Measures), which has a similar objective, but isn't actively maintained and doesn't support NMatrix. Thank you, [@reddavis][reddavis]. :)
-
-## Install
+## Installation
 
 `gem install measurable`
 
-I only tested it with 2.0.0 (yes, yes, travis, I'll do it eventually). I want to support JRuby as well.
+This gem is currently being tested on MRI Ruby 1.9.3, 2.0, 2.1.0, 2.1 (HEAD) and on Rubinius 2.x (HEAD). I hope to add JRuby support in the future.
 
-## Distance measures
+## Available distance measures
 
-I'm using the term "distance measure" without much concern for the strict mathematical definition of a metric. If the documentation for one of the methods isn't clear about it being or not a metric, please open an issue.
+I'm using the term "distance measure" without much concern for the strict
+mathematical definition of a metric. If the documentation for one of the
+methods isn't clear about it being or not a metric, please open an issue.
 
 The following are the similarity measures supported at the moment:
 
@@ -30,46 +32,40 @@ The following are the similarity measures supported at the moment:
 - Jaccard distance
 - Tanimoto distance
 - Haversine distance
-- Minkowski (Cityblock or Manhattan) distance
+- Minkowski (aka Cityblock or Manhattan) distance
 - Chebyshev distance
 - Hamming distance
 - [Levenshtein distance](http://en.wikipedia.org/wiki/Levenshtein_distance)
-
-These still need to be implemented:
-
-- Correlation distance
-- Chi-square distance
-- Kullback-Leibler divergence
-- Jensen-Shannon divergence
-- Mahalanobis distance
-- Squared Mahalanobis distance
-
-I plan to update the specs to reflect that each method is (or isn't) a mathematical metric, but I want to finish implementing them first. Any help is appreciated! :)
+- [Kullback-Leibler divergence](http://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence)
 
 ## How to use
 
 The API I intend to support is something like this:
 
 ```ruby
-require "measurable"
-
-u = NMatrix.ones([2, 1])
-v = NMatrix.zeros([2, 1])
-w = [1, 0]
-x = [2, 2]
+require 'measurable'
 
 # Calculate the distance between two points in space.
-Measurable.euclidean(u, v) # => 1.41421
-Measurable.euclidean(w, v) # => 1.00000
-Measurable.cosine([1, 2], [2, 3]) # => 0.00772
+Measurable.euclidean([1, 1], [0, 0]) # => 1.41421
 
 # Calculate the norm of a vector, i.e. its distance from the origin.
+Measurable.euclidean([1, 1]) # => 1.4142135623730951
+
+# Get the cosine distance between
+Measurable.cosine_distance([1, 2], [2, 3]) # => 0.007722123286332261
+
+# Calculate sum of squares directly.
 Measurable.euclidean_squared([3, 4]) # => 25
 ```
 
+Most of the methods accept arbitrary enumerable objects instead of Arrays. For example, it's possible to use [NMatrix](https://github.com/sciruby/nmatrix).
+
 ## Documentation
 
-`RDoc` syntax is used to document the project. To build it locally, you'll need to install the [Fivefish generator](https://github.com/ged/rdoc-generator-fivefish) (`gem install rdoc-generator-fivefish`) and run the following command:
+`RDoc` syntax is used to document the project. To build it locally, you'll need
+to install the [Fivefish
+generator](https://github.com/ged/rdoc-generator-fivefish) (`gem install
+rdoc-generator-fivefish`) and run the following command:
 
 ```bash
 rake rdoc
data/lib/measurable.rb CHANGED
@@ -11,10 +11,9 @@ require 'measurable/maxmin'
 require 'measurable/haversine'
 require 'measurable/hamming'
 require 'measurable/levenshtein'
+require 'measurable/kullback_leibler'
 
 module Measurable
   # PI / 180 degrees.
   RAD_PER_DEG = Math::PI / 180
-
-  extend self # expose all instance methods as singleton methods.
 end
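Release 0.0.8 drops `extend self` from the top-level `Measurable` module. The hunks that follow instead have `Measurable` extend each measure's submodule explicitly.

For readers unfamiliar with the removed idiom, here is a minimal sketch of what `extend self` does. It uses a hypothetical `Toy` module and `Route` class, neither of which is part of the gem.

```ruby
# Hypothetical module (not part of the gem) showing what the removed
# `extend self` line did: the module's instance methods also become
# methods of the module object itself.
module Toy
  extend self # expose all instance methods as singleton methods

  # Sum of absolute coordinate differences (Manhattan distance).
  def manhattan(u, v)
    raise ArgumentError if u.size != v.size

    u.zip(v).reduce(0) { |acc, (a, b)| acc + (a - b).abs }
  end
end

# Callable directly on the module...
puts Toy.manhattan([1, 2], [4, 0])

# ...and still available to any class that includes the module.
class Route
  include Toy
end
puts Route.new.manhattan([1, 2], [4, 0])
```

With one `extend self`, every method defined anywhere inside the module becomes a module method at once. The 0.0.8 layout makes that exposure explicit, one submodule per measure.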
@@ -1,23 +1,27 @@
 module Measurable
+  module Chebyshev
 
-  # call-seq:
-  # chebyshev(u, v) -> Float
-  #
-  #
-  #
-  # * *Arguments* :
-  # - +u+ -> An array of Numeric objects.
-  # - +v+ -> An array of Numeric objects.
-  # * *Returns* :
-  # - The L-infinite distance between +u+ and +v+.
-  # * *Raises* :
-  # - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
-  #
-  def chebyshev(u, v)
-    # TODO: Change this to a more specific, custom-made exception.
-    raise ArgumentError if u.size != v.size
+    # call-seq:
+    # chebyshev(u, v) -> Float
+    #
+    #
+    #
+    # * *Arguments* :
+    # - +u+ -> An array of Numeric objects.
+    # - +v+ -> An array of Numeric objects.
+    # * *Returns* :
+    # - The L-infinite distance between +u+ and +v+.
+    # * *Raises* :
+    # - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
+    #
+    def chebyshev(u, v)
+      # TODO: Change this to a more specific, custom-made exception.
+      raise ArgumentError if u.size != v.size
 
-    abs_differences = u.zip(v).map { |a| (a[0] - a[1]).abs }
-    abs_differences.max
+      abs_differences = u.zip(v).map { |a| (a[0] - a[1]).abs }
+      abs_differences.max
+    end
   end
-end
+
+  extend Measurable::Chebyshev
+end
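In 0.0.8, `chebyshev` moves into a `Measurable::Chebyshev` submodule, and `Measurable` then extends itself with that submodule. As a result, `Measurable.chebyshev(u, v)` keeps working. Because it is an ordinary module, it should also be possible for a class to include only this one measure.

Below is a minimal sketch of the same layout. `MeasurableSketch` and `Grid` are stand-in names rather than the gem's.

```ruby
# Minimal sketch mirroring the 0.0.8 layout: the method lives in a
# submodule, and the parent module extends itself with that submodule.
# MeasurableSketch is a stand-in name, not the real gem.
module MeasurableSketch
  module Chebyshev
    # L-infinity distance: the largest absolute coordinate difference.
    def chebyshev(u, v)
      raise ArgumentError if u.size != v.size

      u.zip(v).map { |a, b| (a - b).abs }.max
    end
  end

  # Makes MeasurableSketch.chebyshev(u, v) callable on the module itself.
  extend MeasurableSketch::Chebyshev
end

puts MeasurableSketch.chebyshev([1, 2, 3], [4, 0, 3])

# A class can mix in just the one measure it needs.
class Grid
  include MeasurableSketch::Chebyshev
end
puts Grid.new.chebyshev([0, 0], [2, -5])
```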
@@ -1,50 +1,70 @@
+require 'measurable/euclidean'
+
 module Measurable
+  module Cosine
 
-  # call-seq:
-  # cosine_similarity(u, v) -> Float
-  #
-  # Calculate the cosine similarity between the orientation of two vectors.
-  #
-  # See: http://en.wikipedia.org/wiki/Cosine_similarity
-  #
-  # * *Arguments* :
-  # - +u+ -> An array of Numeric objects.
-  # - +v+ -> An array of Numeric objects.
-  # * *Returns* :
-  # - The normalized dot product of +u+ and +v+, that is, the angle between
-  # them in the n-dimensional space.
-  # * *Raises* :
-  # - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
-  #
-  def cosine_similarity(u, v)
-    # TODO: Change this to a more specific, custom-made exception.
-    raise ArgumentError if u.size != v.size
-
-    dot_product = u.zip(v).reduce(0.0) { |acc, ary| acc += ary[0] * ary[1] }
-
-    dot_product / (euclidean(u) * euclidean(v))
-  end
+    # call-seq:
+    # cosine_similarity(u, v) -> Float
+    #
+    # Calculate the cosine similarity between the orientation of two vectors.
+    #
+    # See: http://en.wikipedia.org/wiki/Cosine_similarity
+    #
+    # * *Arguments* :
+    # - +u+ -> An array of Numeric objects.
+    # - +v+ -> An array of Numeric objects.
+    # * *Returns* :
+    # - The normalized dot product of +u+ and +v+, that is, the angle between
+    # them in the n-dimensional space.
+    # * *Raises* :
+    # - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
+    #
+    def cosine_similarity(u, v)
+      # TODO: Change this to a more specific, custom-made exception.
+      raise ArgumentError if u.size != v.size
+
+      dot_product = u.zip(v).reduce(0.0) { |acc, ary| acc += ary[0] * ary[1] }
+
+      dot_product / (euclidean(u) * euclidean(v))
+    end
 
-  # call-seq:
-  # cosine_distance(u, v) -> Float
-  #
-  # Calculate the cosine distance between the orientation of two vectors.
-  #
-  # See: http://en.wikipedia.org/wiki/Cosine_similarity
-  #
-  # * *Arguments* :
-  # - +u+ -> An array of Numeric objects.
-  # - +v+ -> An array of Numeric objects.
-  # * *Returns* :
-  # - The normalized dot product of +u+ and +v+, that is, the angle between
-  # them in the n-dimensional space.
-  # * *Raises* :
-  # - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
-  #
-  def cosine_distance(u, v)
-    # TODO: Change this to a more specific, custom-made exception.
-    raise ArgumentError if u.size != v.size
-
-    1 - cosine_similarity(u, v)
+    # call-seq:
+    # cosine_distance(u, v) -> Float
+    #
+    # Calculate the cosine distance between the orientation of two vectors.
+    #
+    # See: http://en.wikipedia.org/wiki/Cosine_similarity
+    #
+    # * *Arguments* :
+    # - +u+ -> An array of Numeric objects.
+    # - +v+ -> An array of Numeric objects.
+    # * *Returns* :
+    # - The normalized dot product of +u+ and +v+, that is, the angle between
+    # them in the n-dimensional space.
+    # * *Raises* :
+    # - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
+    #
+    def cosine_distance(u, v)
+      # TODO: Change this to a more specific, custom-made exception.
+      raise ArgumentError if u.size != v.size
+
+      1 - cosine_similarity(u, v)
+    end
+
+    def self.extended(base) # :nodoc:
+      base.instance_eval do
+        extend Measurable::Euclidean
+      end
+      super
+    end
+
+    def self.included(base) # :nodoc:
+      base.class_eval do
+        include Measurable::Euclidean
+      end
+      super
+    end
   end
+
+  extend Measurable::Cosine
 end
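`cosine_similarity` calls `euclidean`, which now lives in `Measurable::Euclidean`. The new `self.extended` and `self.included` hooks therefore mix `Measurable::Euclidean` into whatever object extends or includes `Cosine`.

Below is a minimal sketch of the same hook pattern. It uses hypothetical `Norm` and `CosineLike` modules in place of the gem's. It calls `base.extend` and `base.include` directly instead of the gem's `instance_eval`/`class_eval` blocks, which should be equivalent on Ruby 2.1 and later, where `Module#include` is public.

```ruby
# Hypothetical modules illustrating the hook pattern Cosine uses in 0.0.8:
# mixing in CosineLike also mixes in its dependency, Norm.
module Norm
  # Euclidean norm of u, or the Euclidean distance between u and v.
  def euclidean(u, v = nil)
    v ||= Array.new(u.size, 0)
    raise ArgumentError if u.size != v.size

    Math.sqrt(u.zip(v).reduce(0.0) { |acc, (a, b)| acc + (a - b)**2 })
  end
end

module CosineLike
  def cosine_similarity(u, v)
    raise ArgumentError if u.size != v.size

    dot = u.zip(v).reduce(0.0) { |acc, (a, b)| acc + a * b }
    dot / (euclidean(u) * euclidean(v)) # relies on Norm being mixed in
  end

  def cosine_distance(u, v)
    1 - cosine_similarity(u, v)
  end

  # Runs when an object calls `extend CosineLike`.
  def self.extended(base)
    base.extend(Norm)
    super
  end

  # Runs when a class or module calls `include CosineLike`.
  def self.included(base)
    base.include(Norm)
    super
  end
end

module Sim
  extend CosineLike # the hook also extends Sim with Norm
end
puts Sim.cosine_distance([1, 2], [2, 3])

class Doc
  include CosineLike # the hook also includes Norm in Doc
end
puts Doc.new.euclidean([3, 4])
```

The hooks keep the dependency declared in one place. Callers only need to mix in the measure they want, not its helpers.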
@@ -1,76 +1,67 @@
 module Measurable
+  module Euclidean
 
-  # call-seq:
-  # euclidean(u) -> Float
-  # euclidean(u, v) -> Float
-  #
-  # Calculate the ordinary distance between arrays +u+ and +v+.
-  #
-  # If +v+ isn't given, calculate the Euclidean norm of +u+.
-  #
-  # See: http://en.wikipedia.org/wiki/Euclidean_distance#N_dimensions
-  #
-  # * *Arguments* :
-  # - +u+ -> An array of Numeric objects.
-  # - +v+ -> (Optional) An array of Numeric objects.
-  # * *Returns* :
-  # - The euclidean norm of +u+ or the euclidean distance between +u+ and
-  # +v+.
-  # * *Raises* :
-  # - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
-  #
-  def euclidean(u, v = nil)
-    # If the second argument is nil, the method should return the norm of
-    # vector u. For this, we need the distance between u and the origin.
-    if v.nil?
-      v = Array.new(u.size, 0)
+    # call-seq:
+    # euclidean(u) -> Float
+    # euclidean(u, v) -> Float
+    #
+    # Calculate the ordinary distance between arrays +u+ and +v+.
+    #
+    # If +v+ isn't given, calculate the Euclidean norm of +u+.
+    #
+    # See: http://en.wikipedia.org/wiki/Euclidean_distance#N_dimensions
+    #
+    # * *Arguments* :
+    # - +u+ -> An array of Numeric objects.
+    # - +v+ -> (Optional) An array of Numeric objects.
+    # * *Returns* :
+    # - The euclidean norm of +u+ or the euclidean distance between +u+ and
+    # +v+.
+    # * *Raises* :
+    # - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
+    #
+    def euclidean(u, v = nil)
+      Math.sqrt(self.euclidean_squared(u, v))
     end
 
-    # TODO: Change this to a more specific, custom-made exception.
-    raise ArgumentError if u.size != v.size
+    # call-seq:
+    # euclidean_squared(u) -> Float
+    # euclidean_squared(u, v) -> Float
+    #
+    # Calculate the same value as euclidean(u, v), but don't take the square root
+    # of it.
+    #
+    # This isn't a metric in the strict sense, i.e. it doesn't respect the
+    # triangle inequality. However, the squared Euclidean distance is very useful
+    # whenever only the relative values of distances are important, for example
+    # in optimization problems.
+    #
+    # See: http://en.wikipedia.org/wiki/Euclidean_distance#Squared_Euclidean_distance
+    #
+    # * *Arguments* :
+    # - +u+ -> An array of Numeric objects.
+    # - +v+ -> (Optional) An array of Numeric objects.
+    # * *Returns* :
+    # - The squared value of the euclidean norm of +u+ or of the euclidean
+    # distance between +u+ and +v+.
+    # * *Raises* :
+    # - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
+    #
+    def euclidean_squared(u, v = nil)
+      # If the second argument is nil, the method should return the norm of
+      # vector u. For this, we need the distance between u and the origin.
+      if v.nil?
+        v = Array.new(u.size, 0)
+      end
 
-    sum = u.zip(v).reduce(0.0) do |acc, ary|
-      acc += (ary[0] - ary[-1]) ** 2
-    end
-
-    Math.sqrt(sum)
-  end
+      # TODO: Change this to a more specific, custom-made exception.
+      raise ArgumentError if u.size != v.size
 
-  # call-seq:
-  # euclidean_squared(u) -> Float
-  # euclidean_squared(u, v) -> Float
-  #
-  # Calculate the same value as euclidean(u, v), but don't take the square root
-  # of it.
-  #
-  # This isn't a metric in the strict sense, i.e. it doesn't respect the
-  # triangle inequality. However, the squared Euclidean distance is very useful
-  # whenever only the relative values of distances are important, for example
-  # in optimization problems.
-  #
-  # See: http://en.wikipedia.org/wiki/Euclidean_distance#Squared_Euclidean_distance
-  #
-  # * *Arguments* :
-  # - +u+ -> An array of Numeric objects.
-  # - +v+ -> (Optional) An array of Numeric objects.
-  # * *Returns* :
-  # - The squared value of the euclidean norm of +u+ or of the euclidean
-  # distance between +u+ and +v+.
-  # * *Raises* :
-  # - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
-  #
-  def euclidean_squared(u, v = nil)
-    # If the second argument is nil, the method should return the norm of
-    # vector u. For this, we need the distance between u and the origin.
-    if v.nil?
-      v = Array.new(u.size, 0)
-    end
-
-    # TODO: Change this to a more specific, custom-made exception.
-    raise ArgumentError if u.size != v.size
-
-    u.zip(v).reduce(0.0) do |acc, ary|
-      acc += (ary[0] - ary[-1]) ** 2
+      u.zip(v).reduce(0.0) do |acc, ary|
+        acc += (ary[0] - ary[-1]) ** 2
+      end
     end
   end
-end
+
+  extend Measurable::Euclidean
+end
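After this change, `euclidean` is simply the square root of `euclidean_squared`. The doc comment claims that the squared distance does not satisfy the triangle inequality, and three points on a number line are enough to check it.

The sketch below uses plain top-level methods rather than the gem's modules.

```ruby
# Standalone sketch of the 0.0.8 relationship: euclidean is the square root
# of euclidean_squared. Plain methods, not the gem's Measurable::Euclidean.
def euclidean_squared(u, v = nil)
  v ||= Array.new(u.size, 0) # no v given: measure from the origin (the norm)
  raise ArgumentError if u.size != v.size

  u.zip(v).reduce(0.0) { |acc, (a, b)| acc + (a - b)**2 }
end

def euclidean(u, v = nil)
  Math.sqrt(euclidean_squared(u, v))
end

x, y, z = [0], [1], [2]

# Plain distance respects the triangle inequality: 2.0 <= 1.0 + 1.0
puts euclidean(x, z) <= euclidean(x, y) + euclidean(y, z) # prints true

# Squared distance does not: 4.0 > 1.0 + 1.0
puts euclidean_squared(x, z) <= euclidean_squared(x, y) + euclidean_squared(y, z) # prints false
```

Squaring is monotonic on non-negative values, so it preserves the ordering of distances. This is why, as the doc comment notes, the squared form is still useful when only relative distances matter.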
@@ -1,29 +1,33 @@
 module Measurable
+  module Hamming
 
-  # call-seq:
-  # hamming(s1, s2) -> Integer
-  #
-  # Count the number of different characters between strings +s1+ and +s2+,
-  # that is, how many substitutions are necessary to change +s1+ into +s2+ and
-  # vice-versa.
-  #
-  # See: http://en.wikipedia.org/wiki/Hamming_distance
-  #
-  # * *Arguments* :
-  # - +s1+ -> A String.
-  # - +s2+ -> A String with the same size of +s1+.
-  # * *Returns* :
-  # - The number of characters in which +s1+ and +s2+ differ.
-  # * *Raises* :
-  # - +ArgumentError+ -> The sizes of +s1+ and +s2+ don't match.
-  #
-  def hamming(s1, s2)
-    # TODO: Change this to a more specific, custom-made exception.
-    raise ArgumentError if s1.size != s2.size
+    # call-seq:
+    # hamming(s1, s2) -> Integer
+    #
+    # Count the number of different characters between strings +s1+ and +s2+,
+    # that is, how many substitutions are necessary to change +s1+ into +s2+ and
+    # vice-versa.
+    #
+    # See: http://en.wikipedia.org/wiki/Hamming_distance
+    #
+    # * *Arguments* :
+    # - +s1+ -> A String.
+    # - +s2+ -> A String with the same size of +s1+.
+    # * *Returns* :
+    # - The number of characters in which +s1+ and +s2+ differ.
+    # * *Raises* :
+    # - +ArgumentError+ -> The sizes of +s1+ and +s2+ don't match.
+    #
+    def hamming(s1, s2)
+      # TODO: Change this to a more specific, custom-made exception.
+      raise ArgumentError if s1.size != s2.size
 
-    s1.chars.zip(s2.chars).reduce(0) do |acc, c|
-      acc += 1 if c[0] != c[1]
-      acc
+      s1.chars.zip(s2.chars).reduce(0) do |acc, c|
+        acc += 1 if c[0] != c[1]
+        acc
+      end
     end
   end
-end
+
+  extend Measurable::Hamming
+end
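The Hamming method counts the positions at which two equal-length strings differ. Below is a standalone sketch that should be equivalent, written with `Enumerable#count` instead of the accumulator in `reduce`. It is an illustration, not the gem's code.

```ruby
# Standalone sketch equivalent to the Hamming method in the diff:
# pair up characters position by position and count the mismatches.
def hamming(s1, s2)
  raise ArgumentError, "strings must have the same length" if s1.size != s2.size

  s1.chars.zip(s2.chars).count { |a, b| a != b }
end

puts hamming("karolin", "kathrin") # prints 3
puts hamming("1011101", "1001001") # prints 2
```

Since every pair of positions is compared independently, this is the number of single-character substitutions needed to turn one string into the other. The equal-length requirement is therefore part of the definition, not only an implementation check.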