measurable 0.0.7 → 0.0.8

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: aaf11caa585477217fa9724f7f8c8a794490194f
- data.tar.gz: 2e802d508407c4b656501c9a18af5130ae488979
+ metadata.gz: ef813e1fecb8d7f5a5a19cf70018b3965ea34790
+ data.tar.gz: 23489086e5091bff9a868e6d167cd2b77210b030
  SHA512:
- metadata.gz: 971e40facf06a090f515d1cd450006ce370d6ff460c8a9af65c9791ced5fddba58defdaff70c51cd9ce202ce712c7e0074b387507155de2c93250725cd3547f0
- data.tar.gz: 7f2714aeca034f418567b555c0fd5f99bcfce34d2ab73d595a2d4aa5d6cb5abdd620e2502182ce8537d0d7c1dccdba4285dabe4f9877e175a4973ae1ff155126
+ metadata.gz: ea90a56d0a5dc4062b45aa8acd8536e9eb2032287d4f04080130f826aaf9bfe88ee0245899dfe5a0188f48768d690e05d9639c3e2dbefed4ba7323157775912a
+ data.tar.gz: 195c6eb734ef6b2183ceece52dc68bd68cbb316de0e26127dde3e152195ce435845b52fe965ca47546f47fa6c325cc67d065311213648a41c7bf90d0563345aa
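The `checksums.yaml` entries are digests of the two members of the `.gem` archive, `metadata.gz` and `data.tar.gz` (a `.gem` file is a plain tar archive, so it can be unpacked with e.g. `tar -xf measurable-0.0.8.gem`). A minimal sketch for checking one of them locally, using only Ruby's standard `digest` library; the helper name `digest_matches?` is hypothetical:

```ruby
require 'digest'

# Hypothetical helper: compare a file's digest with a hex string copied from
# checksums.yaml. +algorithm+ selects the section (:sha1 or :sha512).
def digest_matches?(path, expected_hex, algorithm = :sha512)
  digest_class = { sha1: Digest::SHA1, sha512: Digest::SHA512 }.fetch(algorithm)
  digest_class.file(path).hexdigest == expected_hex.downcase
end

# Example, run inside the unpacked 0.0.8 gem directory:
# digest_matches?('data.tar.gz', '23489086e5091bff9a868e6d167cd2b77210b030', :sha1)
```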
data/.travis.yml CHANGED
@@ -2,7 +2,8 @@ language: ruby
  rvm:
  - 1.9.3
  - 2.0
+ - 2.1.0
  - 2.1
- - rbx
+ - rbx-2
  # uncomment this line if your project needs to run something other than `rake`:
  # script: bundle exec rspec spec
data/History.txt ADDED
@@ -0,0 +1,3 @@
+ === 18th May, 2014 -- Version 0.0.8
+
+ * Added Kullback-Leibler divergence.
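Version 0.0.8 adds the Kullback-Leibler divergence, but this diff only shows the new `require 'measurable/kullback_leibler'` line, not the implementation itself. For reference, the textbook definition for discrete distributions P and Q is D_KL(P || Q) = sum_i p_i * ln(p_i / q_i). The sketch below is a standalone illustration of that formula, not the gem's code; the method name `kl_divergence`, the natural logarithm, and the convention that terms with p_i = 0 contribute nothing are assumptions, and the inputs are assumed (not checked) to be probability vectors summing to 1:

```ruby
# Hypothetical sketch of the discrete Kullback-Leibler divergence
#   D_KL(P || Q) = sum_k p_k * ln(p_k / q_k)
# using the natural log. Not the gem's implementation.
def kl_divergence(p_dist, q_dist)
  raise ArgumentError, 'distributions must have the same size' if p_dist.size != q_dist.size

  p_dist.zip(q_dist).reduce(0.0) do |acc, (pk, qk)|
    # Convention: 0 * ln(0 / q) = 0, so zero-probability terms add nothing.
    next acc if pk.zero?
    # p_k > 0 with q_k = 0 makes the divergence infinite.
    return Float::INFINITY if qk.zero?

    acc + pk * Math.log(pk.to_f / qk)
  end
end

kl_divergence([0.5, 0.5], [0.9, 0.1]) # ln(5/3), about 0.5108
kl_divergence([0.9, 0.1], [0.5, 0.5]) # about 0.3681: the divergence is not symmetric
```

The asymmetry in the last two calls is why it is called a divergence rather than a distance: it does not satisfy the symmetry requirement of a metric.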
data/README.md CHANGED
@@ -3,23 +3,25 @@
  [![Build Status](https://travis-ci.org/agarie/measurable.svg?branch=master)](https://travis-ci.org/agarie/measurable)
  [![Code Climate](https://codeclimate.com/github/agarie/measurable.png)](https://codeclimate.com/github/agarie/measurable)
 
- A gem to test what metric is best for certain kinds of datasets in machine learning.
+ A gem to test what metric is best for certain kinds of datasets in machine
+ learning. Besides the `Array` class, I also want to support
+ [NMatrix](http://github.com/sciruby/nmatrix).
 
- Besides the `Array` class, I also want to support `NVector` (from [NMatrix](http://github.com/sciruby/nmatrix)).
+ This is a fork of the gem [Distance Measure](https://github.com/reddavis/Distance-Measures),
+ which has a similar objective, but isn't actively maintained and doesn't support
+ NMatrix. Thank you, [@reddavis][reddavis]. :)
 
- The distance measures will be created in Ruby first. If I see that it's really too slow, I'll write some methods in C (or Java, for JRuby).
-
- This is a fork of the gem [Distance Measure](https://github.com/reddavis/Distance-Measures), which has a similar objective, but isn't actively maintained and doesn't support NMatrix. Thank you, [@reddavis][reddavis]. :)
-
- ## Install
+ ## Installation
 
  `gem install measurable`
 
- I only tested it with 2.0.0 (yes, yes, travis, I'll do it eventually). I want to support JRuby as well.
+ This gem is currently being tested on MRI Ruby 1.9.3, 2.0, 2.1.0, 2.1 (HEAD) and on Rubinius 2.x (HEAD). I hope to add JRuby support in the future.
 
- ## Distance measures
+ ## Available distance measures
 
- I'm using the term "distance measure" without much concern for the strict mathematical definition of a metric. If the documentation for one of the methods isn't clear about it being or not a metric, please open an issue.
+ I'm using the term "distance measure" without much concern for the strict
+ mathematical definition of a metric. If the documentation for one of the
+ methods isn't clear about it being or not a metric, please open an issue.
 
  The following are the similarity measures supported at the moment:
 
@@ -30,46 +32,40 @@ The following are the similarity measures supported at the moment:
  - Jaccard distance
  - Tanimoto distance
  - Haversine distance
- - Minkowski (Cityblock or Manhattan) distance
+ - Minkowski (aka Cityblock or Manhattan) distance
  - Chebyshev distance
  - Hamming distance
  - [Levenshtein distance](http://en.wikipedia.org/wiki/Levenshtein_distance)
-
- These still need to be implemented:
-
- - Correlation distance
- - Chi-square distance
- - Kullback-Leibler divergence
- - Jensen-Shannon divergence
- - Mahalanobis distance
- - Squared Mahalanobis distance
-
- I plan to update the specs to reflect that each method is (or isn't) a mathematical metric, but I want to finish implementing them first. Any help is appreciated! :)
+ - [Kullback-Leibler divergence](http://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence)
 
  ## How to use
 
  The API I intend to support is something like this:
 
  ```ruby
- require "measurable"
-
- u = NMatrix.ones([2, 1])
- v = NMatrix.zeros([2, 1])
- w = [1, 0]
- x = [2, 2]
+ require 'measurable'
 
  # Calculate the distance between two points in space.
- Measurable.euclidean(u, v) # => 1.41421
- Measurable.euclidean(w, v) # => 1.00000
- Measurable.cosine([1, 2], [2, 3]) # => 0.00772
+ Measurable.euclidean([1, 1], [0, 0]) # => 1.41421
 
  # Calculate the norm of a vector, i.e. its distance from the origin.
+ Measurable.euclidean([1, 1]) # => 1.4142135623730951
+
+ # Get the cosine distance between
+ Measurable.cosine_distance([1, 2], [2, 3]) # => 0.007722123286332261
+
+ # Calculate sum of squares directly.
  Measurable.euclidean_squared([3, 4]) # => 25
  ```
 
+ Most of the methods accept arbitrary enumerable objects instead of Arrays. For example, it's possible to use [NMatrix](https://github.com/sciruby/nmatrix).
+
  ## Documentation
 
- `RDoc` syntax is used to document the project. To build it locally, you'll need to install the [Fivefish generator](https://github.com/ged/rdoc-generator-fivefish) (`gem install rdoc-generator-fivefish`) and run the following command:
+ `RDoc` syntax is used to document the project. To build it locally, you'll need
+ to install the [Fivefish
+ generator](https://github.com/ged/rdoc-generator-fivefish) (`gem install
+ rdoc-generator-fivefish`) and run the following command:
 
  ```bash
  rake rdoc
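The return values in the README's usage example can be checked by hand: euclidean([1, 1], [0, 0]) = sqrt(2), and the cosine distance of [1, 2] and [2, 3] is 1 - 8 / (sqrt(5) * sqrt(13)) = 1 - 8 / sqrt(65). The sketch below recomputes them with plain `Array`s; the helper names mirror the gem's public methods, but these are local re-implementations written for this check, not calls into the gem:

```ruby
# Local re-implementations for checking the README numbers; not the gem itself.
def euclidean_squared(u, v = Array.new(u.size, 0))
  raise ArgumentError, 'size mismatch' if u.size != v.size
  u.zip(v).reduce(0.0) { |acc, (a, b)| acc + (a - b)**2 }
end

def euclidean(u, v = Array.new(u.size, 0))
  Math.sqrt(euclidean_squared(u, v))
end

def cosine_distance(u, v)
  dot = u.zip(v).reduce(0.0) { |acc, (a, b)| acc + a * b }
  1 - dot / (euclidean(u) * euclidean(v))
end

puts euclidean([1, 1], [0, 0])       # sqrt(2)
puts euclidean([1, 1])               # norm of [1, 1], also sqrt(2)
puts cosine_distance([1, 2], [2, 3]) # 1 - 8 / sqrt(65), about 0.0077221
puts euclidean_squared([3, 4])       # 25.0, a Float because the sum starts at 0.0
```

Two observations. First, `euclidean_squared` in this diff also seeds its sum with `reduce(0.0)`, so `Measurable.euclidean_squared([3, 4])` should return the Float `25.0` rather than the Integer `25` shown in the README comment. Second, because these helpers only rely on `size` and `zip`, they also accept other objects with those methods, such as a Range: `euclidean(1..2, [0, 0])` gives `sqrt(5)`.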
data/lib/measurable.rb CHANGED
@@ -11,10 +11,9 @@ require 'measurable/maxmin'
  require 'measurable/haversine'
  require 'measurable/hamming'
  require 'measurable/levenshtein'
+ require 'measurable/kullback_leibler'
 
  module Measurable
  # PI / 180 degrees.
  RAD_PER_DEG = Math::PI / 180
-
- extend self # expose all instance methods as singleton methods.
  end
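This release drops `extend self` from `Measurable` and instead defines each measure in its own submodule (`Measurable::Chebyshev`, `Measurable::Cosine`, `Measurable::Euclidean`, `Measurable::Hamming`), which the top-level module then extends. Module-level calls such as `Measurable.chebyshev(u, v)` keep working, and a single measure can also be mixed into another class on its own. A minimal sketch of that pattern, with hypothetical names (`Metrics`, `Metrics::Manhattan`, `manhattan`, `Point`) standing in for the gem's:

```ruby
module Metrics
  # Each measure lives in its own submodule...
  module Manhattan
    def manhattan(u, v)
      raise ArgumentError, 'size mismatch' if u.size != v.size
      u.zip(v).reduce(0) { |acc, (a, b)| acc + (a - b).abs }
    end
  end

  # ...and the top-level module extends it, so Metrics.manhattan works
  # the same way Measurable.chebyshev does after this change.
  extend Manhattan
end

Metrics.manhattan([1, 2], [4, 0]) # 5

# Mixing only one measure into an unrelated class.
class Point
  include Metrics::Manhattan

  attr_reader :coords

  def initialize(*coords)
    @coords = coords
  end

  def distance_to(other)
    manhattan(coords, other.coords)
  end
end

Point.new(0, 0).distance_to(Point.new(3, 4)) # 7
```

Compared with `extend self` on one big module, this keeps each measure in a separate namespace, so including one of them does not pull every other method into the class.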
@@ -1,23 +1,27 @@
  module Measurable
+ module Chebyshev
 
- # call-seq:
- # chebyshev(u, v) -> Float
- #
- #
- #
- # * *Arguments* :
- # - +u+ -> An array of Numeric objects.
- # - +v+ -> An array of Numeric objects.
- # * *Returns* :
- # - The L-infinite distance between +u+ and +v+.
- # * *Raises* :
- # - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
- #
- def chebyshev(u, v)
- # TODO: Change this to a more specific, custom-made exception.
- raise ArgumentError if u.size != v.size
+ # call-seq:
+ # chebyshev(u, v) -> Float
+ #
+ #
+ #
+ # * *Arguments* :
+ # - +u+ -> An array of Numeric objects.
+ # - +v+ -> An array of Numeric objects.
+ # * *Returns* :
+ # - The L-infinite distance between +u+ and +v+.
+ # * *Raises* :
+ # - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
+ #
+ def chebyshev(u, v)
+ # TODO: Change this to a more specific, custom-made exception.
+ raise ArgumentError if u.size != v.size
 
- abs_differences = u.zip(v).map { |a| (a[0] - a[1]).abs }
- abs_differences.max
+ abs_differences = u.zip(v).map { |a| (a[0] - a[1]).abs }
+ abs_differences.max
+ end
  end
- end
+
+ extend Measurable::Chebyshev
+ end
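The Chebyshev (L-infinity) distance is the largest absolute coordinate difference, max_i |u_i - v_i|, which is what `abs_differences.max` computes above. An equivalent standalone sketch (a top-level method rather than the gem's `Measurable::Chebyshev` module) that destructures each `[u_i, v_i]` pair in the block parameters instead of indexing with `a[0]` and `a[1]`:

```ruby
# Standalone sketch of the Chebyshev (L-infinity) distance.
def chebyshev(u, v)
  raise ArgumentError, 'size mismatch' if u.size != v.size

  # |a, b| unpacks each [u_i, v_i] pair produced by zip.
  u.zip(v).map { |a, b| (a - b).abs }.max
end

chebyshev([1, 5, 2], [4, 1, 2]) # 4, from |5 - 1|
```

As with the code in the diff, two empty vectors yield `nil` rather than `0`, because `[].max` is `nil`.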
@@ -1,50 +1,70 @@
+ require 'measurable/euclidean'
+
  module Measurable
+ module Cosine
 
- # call-seq:
- # cosine_similarity(u, v) -> Float
- #
- # Calculate the cosine similarity between the orientation of two vectors.
- #
- # See: http://en.wikipedia.org/wiki/Cosine_similarity
- #
- # * *Arguments* :
- # - +u+ -> An array of Numeric objects.
- # - +v+ -> An array of Numeric objects.
- # * *Returns* :
- # - The normalized dot product of +u+ and +v+, that is, the angle between
- # them in the n-dimensional space.
- # * *Raises* :
- # - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
- #
- def cosine_similarity(u, v)
- # TODO: Change this to a more specific, custom-made exception.
- raise ArgumentError if u.size != v.size
-
- dot_product = u.zip(v).reduce(0.0) { |acc, ary| acc += ary[0] * ary[1] }
-
- dot_product / (euclidean(u) * euclidean(v))
- end
+ # call-seq:
+ # cosine_similarity(u, v) -> Float
+ #
+ # Calculate the cosine similarity between the orientation of two vectors.
+ #
+ # See: http://en.wikipedia.org/wiki/Cosine_similarity
+ #
+ # * *Arguments* :
+ # - +u+ -> An array of Numeric objects.
+ # - +v+ -> An array of Numeric objects.
+ # * *Returns* :
+ # - The normalized dot product of +u+ and +v+, that is, the angle between
+ # them in the n-dimensional space.
+ # * *Raises* :
+ # - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
+ #
+ def cosine_similarity(u, v)
+ # TODO: Change this to a more specific, custom-made exception.
+ raise ArgumentError if u.size != v.size
+
+ dot_product = u.zip(v).reduce(0.0) { |acc, ary| acc += ary[0] * ary[1] }
+
+ dot_product / (euclidean(u) * euclidean(v))
+ end
 
- # call-seq:
- # cosine_distance(u, v) -> Float
- #
- # Calculate the cosine distance between the orientation of two vectors.
- #
- # See: http://en.wikipedia.org/wiki/Cosine_similarity
- #
- # * *Arguments* :
- # - +u+ -> An array of Numeric objects.
- # - +v+ -> An array of Numeric objects.
- # * *Returns* :
- # - The normalized dot product of +u+ and +v+, that is, the angle between
- # them in the n-dimensional space.
- # * *Raises* :
- # - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
- #
- def cosine_distance(u, v)
- # TODO: Change this to a more specific, custom-made exception.
- raise ArgumentError if u.size != v.size
-
- 1 - cosine_similarity(u, v)
+ # call-seq:
+ # cosine_distance(u, v) -> Float
+ #
+ # Calculate the cosine distance between the orientation of two vectors.
+ #
+ # See: http://en.wikipedia.org/wiki/Cosine_similarity
+ #
+ # * *Arguments* :
+ # - +u+ -> An array of Numeric objects.
+ # - +v+ -> An array of Numeric objects.
+ # * *Returns* :
+ # - The normalized dot product of +u+ and +v+, that is, the angle between
+ # them in the n-dimensional space.
+ # * *Raises* :
+ # - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
+ #
+ def cosine_distance(u, v)
+ # TODO: Change this to a more specific, custom-made exception.
+ raise ArgumentError if u.size != v.size
+
+ 1 - cosine_similarity(u, v)
+ end
+
+ def self.extended(base) # :nodoc:
+ base.instance_eval do
+ extend Measurable::Euclidean
+ end
+ super
+ end
+
+ def self.included(base) # :nodoc:
+ base.class_eval do
+ include Measurable::Euclidean
+ end
+ super
+ end
  end
+
+ extend Measurable::Cosine
  end
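`cosine_similarity` calls `euclidean`, which now lives in a separate module, so `Measurable::Cosine` uses the `self.extended` and `self.included` hooks to bring `Measurable::Euclidean` along whenever Cosine is mixed in. A minimal sketch of the same hook pattern, with hypothetical stand-ins (`Norm` for `Measurable::Euclidean`, `Angle` for `Measurable::Cosine`, `Vec` as a user class). Here `base.extend(Norm)` is a direct equivalent of the diff's `base.instance_eval { extend ... }`, and `send(:include, ...)` plays the role of `class_eval { include ... }`, since `Module#include` was private before Ruby 2.1 and this project's CI still covers 1.9.3 and 2.0:

```ruby
# Hypothetical stand-ins: Norm plays the role of Measurable::Euclidean,
# Angle the role of Measurable::Cosine.
module Norm
  def norm(u)
    Math.sqrt(u.reduce(0.0) { |acc, x| acc + x * x })
  end
end

module Angle
  def cosine_similarity(u, v)
    raise ArgumentError, 'size mismatch' if u.size != v.size
    dot = u.zip(v).reduce(0.0) { |acc, (a, b)| acc + a * b }
    dot / (norm(u) * norm(v)) # only works if Norm is mixed in as well
  end

  # Runs after `obj.extend(Angle)`: give obj the Norm methods too.
  def self.extended(base)
    base.extend(Norm)
    super
  end

  # Runs after `include Angle` in a class or module.
  def self.included(base)
    base.send(:include, Norm) # Module#include is private before Ruby 2.1
    super
  end
end

calc = Object.new.extend(Angle)
calc.cosine_similarity([1, 0], [0, 1]) # 0.0 (orthogonal vectors)

class Vec
  include Angle
end
Vec.new.cosine_similarity([1, 2], [2, 4]) # about 1.0 (parallel vectors)
```

Without the hooks, `calc.cosine_similarity` would raise `NoMethodError` for `norm`. In the gem, the module-level `extend Measurable::Cosine` also triggers the `extended` hook, with `Measurable` itself as `base`.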
@@ -1,76 +1,67 @@
  module Measurable
+ module Euclidean
 
- # call-seq:
- # euclidean(u) -> Float
- # euclidean(u, v) -> Float
- #
- # Calculate the ordinary distance between arrays +u+ and +v+.
- #
- # If +v+ isn't given, calculate the Euclidean norm of +u+.
- #
- # See: http://en.wikipedia.org/wiki/Euclidean_distance#N_dimensions
- #
- # * *Arguments* :
- # - +u+ -> An array of Numeric objects.
- # - +v+ -> (Optional) An array of Numeric objects.
- # * *Returns* :
- # - The euclidean norm of +u+ or the euclidean distance between +u+ and
- # +v+.
- # * *Raises* :
- # - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
- #
- def euclidean(u, v = nil)
- # If the second argument is nil, the method should return the norm of
- # vector u. For this, we need the distance between u and the origin.
- if v.nil?
- v = Array.new(u.size, 0)
+ # call-seq:
+ # euclidean(u) -> Float
+ # euclidean(u, v) -> Float
+ #
+ # Calculate the ordinary distance between arrays +u+ and +v+.
+ #
+ # If +v+ isn't given, calculate the Euclidean norm of +u+.
+ #
+ # See: http://en.wikipedia.org/wiki/Euclidean_distance#N_dimensions
+ #
+ # * *Arguments* :
+ # - +u+ -> An array of Numeric objects.
+ # - +v+ -> (Optional) An array of Numeric objects.
+ # * *Returns* :
+ # - The euclidean norm of +u+ or the euclidean distance between +u+ and
+ # +v+.
+ # * *Raises* :
+ # - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
+ #
+ def euclidean(u, v = nil)
+ Math.sqrt(self.euclidean_squared(u, v))
  end
 
- # TODO: Change this to a more specific, custom-made exception.
- raise ArgumentError if u.size != v.size
+ # call-seq:
+ # euclidean_squared(u) -> Float
+ # euclidean_squared(u, v) -> Float
+ #
+ # Calculate the same value as euclidean(u, v), but don't take the square root
+ # of it.
+ #
+ # This isn't a metric in the strict sense, i.e. it doesn't respect the
+ # triangle inequality. However, the squared Euclidean distance is very useful
+ # whenever only the relative values of distances are important, for example
+ # in optimization problems.
+ #
+ # See: http://en.wikipedia.org/wiki/Euclidean_distance#Squared_Euclidean_distance
+ #
+ # * *Arguments* :
+ # - +u+ -> An array of Numeric objects.
+ # - +v+ -> (Optional) An array of Numeric objects.
+ # * *Returns* :
+ # - The squared value of the euclidean norm of +u+ or of the euclidean
+ # distance between +u+ and +v+.
+ # * *Raises* :
+ # - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
+ #
+ def euclidean_squared(u, v = nil)
+ # If the second argument is nil, the method should return the norm of
+ # vector u. For this, we need the distance between u and the origin.
+ if v.nil?
+ v = Array.new(u.size, 0)
+ end
 
- sum = u.zip(v).reduce(0.0) do |acc, ary|
- acc += (ary[0] - ary[-1]) ** 2
- end
-
- Math.sqrt(sum)
- end
+ # TODO: Change this to a more specific, custom-made exception.
+ raise ArgumentError if u.size != v.size
 
- # call-seq:
- # euclidean_squared(u) -> Float
- # euclidean_squared(u, v) -> Float
- #
- # Calculate the same value as euclidean(u, v), but don't take the square root
- # of it.
- #
- # This isn't a metric in the strict sense, i.e. it doesn't respect the
- # triangle inequality. However, the squared Euclidean distance is very useful
- # whenever only the relative values of distances are important, for example
- # in optimization problems.
- #
- # See: http://en.wikipedia.org/wiki/Euclidean_distance#Squared_Euclidean_distance
- #
- # * *Arguments* :
- # - +u+ -> An array of Numeric objects.
- # - +v+ -> (Optional) An array of Numeric objects.
- # * *Returns* :
- # - The squared value of the euclidean norm of +u+ or of the euclidean
- # distance between +u+ and +v+.
- # * *Raises* :
- # - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
- #
- def euclidean_squared(u, v = nil)
- # If the second argument is nil, the method should return the norm of
- # vector u. For this, we need the distance between u and the origin.
- if v.nil?
- v = Array.new(u.size, 0)
- end
-
- # TODO: Change this to a more specific, custom-made exception.
- raise ArgumentError if u.size != v.size
-
- u.zip(v).reduce(0.0) do |acc, ary|
- acc += (ary[0] - ary[-1]) ** 2
+ u.zip(v).reduce(0.0) do |acc, ary|
+ acc += (ary[0] - ary[-1]) ** 2
+ end
  end
  end
- end
+
+ extend Measurable::Euclidean
+ end
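After this change `euclidean` is just the square root of `euclidean_squared`, which now holds both the zero-vector default (for computing a norm) and the size check. In the summation, `ary[-1]` is simply the second element of each two-element `[u_i, v_i]` pair, i.e. the same as `ary[1]`. A standalone sketch of the same structure (top-level methods, not the gem's module), followed by a small numeric check of the doc comment's remark that the squared distance does not satisfy the triangle inequality:

```ruby
# Standalone sketch mirroring the structure above (not the gem's module).
def euclidean_squared(u, v = nil)
  v ||= Array.new(u.size, 0) # no v: measure from the origin, i.e. the norm
  raise ArgumentError, 'size mismatch' if u.size != v.size

  u.zip(v).reduce(0.0) { |acc, (a, b)| acc + (a - b)**2 }
end

def euclidean(u, v = nil)
  Math.sqrt(euclidean_squared(u, v))
end

# Three points on a line: a = 0, b = 1, c = 2.
a, b, c = [0], [1], [2]
direct = euclidean_squared(a, c)                           # 4.0
detour = euclidean_squared(a, b) + euclidean_squared(b, c) # 2.0
# direct > detour, so the triangle inequality fails for the squared distance,
# while the plain distance satisfies it: 2.0 <= 1.0 + 1.0.
```

Since squaring is monotonic for non-negative numbers, comparing squared distances still ranks points in the same order as comparing distances, which is why the squared form is useful when only relative values matter.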
@@ -1,29 +1,33 @@
  module Measurable
+ module Hamming
 
- # call-seq:
- # hamming(s1, s2) -> Integer
- #
- # Count the number of different characters between strings +s1+ and +s2+,
- # that is, how many substitutions are necessary to change +s1+ into +s2+ and
- # vice-versa.
- #
- # See: http://en.wikipedia.org/wiki/Hamming_distance
- #
- # * *Arguments* :
- # - +s1+ -> A String.
- # - +s2+ -> A String with the same size of +s1+.
- # * *Returns* :
- # - The number of characters in which +s1+ and +s2+ differ.
- # * *Raises* :
- # - +ArgumentError+ -> The sizes of +s1+ and +s2+ don't match.
- #
- def hamming(s1, s2)
- # TODO: Change this to a more specific, custom-made exception.
- raise ArgumentError if s1.size != s2.size
+ # call-seq:
+ # hamming(s1, s2) -> Integer
+ #
+ # Count the number of different characters between strings +s1+ and +s2+,
+ # that is, how many substitutions are necessary to change +s1+ into +s2+ and
+ # vice-versa.
+ #
+ # See: http://en.wikipedia.org/wiki/Hamming_distance
+ #
+ # * *Arguments* :
+ # - +s1+ -> A String.
+ # - +s2+ -> A String with the same size of +s1+.
+ # * *Returns* :
+ # - The number of characters in which +s1+ and +s2+ differ.
+ # * *Raises* :
+ # - +ArgumentError+ -> The sizes of +s1+ and +s2+ don't match.
+ #
+ def hamming(s1, s2)
+ # TODO: Change this to a more specific, custom-made exception.
+ raise ArgumentError if s1.size != s2.size
 
- s1.chars.zip(s2.chars).reduce(0) do |acc, c|
- acc += 1 if c[0] != c[1]
- acc
+ s1.chars.zip(s2.chars).reduce(0) do |acc, c|
+ acc += 1 if c[0] != c[1]
+ acc
+ end
  end
  end
- end
+
+ extend Measurable::Hamming
+ end
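The Hamming method walks both strings in lockstep and counts the positions where the characters differ. The same count can be written more compactly with `Enumerable#count`; a standalone sketch (a top-level method, not the gem's `Measurable::Hamming` module), checked against the familiar "karolin"/"kathrin" example:

```ruby
# Standalone sketch: number of positions at which two equal-length strings differ.
def hamming(s1, s2)
  raise ArgumentError, 'strings must have the same length' if s1.size != s2.size

  # Each zipped element is a [char_from_s1, char_from_s2] pair.
  s1.chars.zip(s2.chars).count { |a, b| a != b }
end

hamming('karolin', 'kathrin') # 3 (r/t, o/h, l/r)
hamming('1011101', '1001001') # 2
```

Like the gem's version, this compares characters rather than bytes, since both `String#size` and `String#chars` work on characters.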