measurable 0.0.5 → 0.0.11

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
- SHA1:
3
- metadata.gz: 66337383a6c25685893bb39f7caf5c7f0b40bcff
4
- data.tar.gz: 117a22bb28b1d36f14780d3a22bbad7211b279a0
2
+ SHA256:
3
+ metadata.gz: 3ea2c713d50fac6bd342f46e18eb9ed8267ff65cfdbf6f1ee75c70bcc92d5b1c
4
+ data.tar.gz: fa3c04483562118a4d875edce0d8afbcac5cf3ee5fa29e8b5b8d04b374058b73
5
5
  SHA512:
6
- metadata.gz: 0d7aff51213d2f0ca31472d1d5f43b3ce99d58255b9d3d53485b0983e953866a365927734c1852c7040f7c23149455712af6357679e03ff63bf247709ac7e124
7
- data.tar.gz: 652066925d87d7d52656f87346c856d34296e4a98662e94a670d72e9612c568dd95bf758314c524501700b21b53484eaaf059e5b5fee315129fb1b0b90a170bd
6
+ metadata.gz: 46aa4474a64b5b9e7e5c3ee53adf7804d618941528d0722e0bdff82e5010dc9f978d553ea5c7779d7917f26a9bc0e0f4c993bd74653d7f81b698abc9aabbcf9d
7
+ data.tar.gz: 9eaf669fef73e90a7dc4196939634f42c54f21471301c4913b9b46155feaf59a0755f88bf325a373d1242653fbe19e7f6d06fb4405e411a5e1aeef596a0b8713
data/.gitignore CHANGED
@@ -1,4 +1,6 @@
1
1
  pkg
2
2
  tmp/*
3
3
  benchmarks/*
4
- lib/*.bundle
4
+ lib/*.bundle
5
+ html/
6
+ Gemfile.lock
@@ -0,0 +1,7 @@
1
+ language: ruby
2
+ rvm:
3
+ - 2.5
4
+ - 2.6
5
+ - 2.7
6
+ # uncomment this line if your project needs to run something other than `rake`:
7
+ # script: bundle exec rspec spec
data/Gemfile CHANGED
@@ -1,3 +1,4 @@
1
1
  # Gemfile
2
- source "http://rubygems.org"
2
+ source "https://rubygems.org"
3
+
3
4
  gemspec
@@ -0,0 +1,11 @@
1
+ 0.0.11 -- 22nd June, 2020
2
+ * Updated rake & rdoc
3
+ * Updated Travis CI config
4
+ * ... honestly, just getting back to this repository
5
+
6
+ 0.0.9 -- 16th April, 2015
7
+ * Removed unnecessary argument length check from jaccard_index.
8
+ * Host documentation on rubydoc.info.
9
+
10
+ 0.0.8 -- 18th May, 2014
11
+ * Added Kullback-Leibler divergence.
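The 0.0.8 entry above adds the Kullback-Leibler divergence. For reference, here is a minimal standalone sketch of the discrete form D_KL(P || Q) = sum of p_i * ln(p_i / q_i). It assumes both arguments are probability vectors of equal length with q_i > 0 wherever p_i > 0. The name `kl_divergence` is hypothetical, and the sketch is not necessarily how the gem names or implements the method.

```ruby
# Hypothetical sketch of the discrete Kullback-Leibler divergence
# D_KL(P || Q) = sum over i of p_i * ln(p_i / q_i), measured in nats.
# Assumes p and q are probability vectors of the same length and that
# q_i > 0 wherever p_i > 0. Terms with p_i == 0 contribute 0 by convention.
def kl_divergence(p, q)
  raise ArgumentError, "sizes don't match" if p.size != q.size

  p.zip(q).reduce(0.0) do |acc, (pi, qi)|
    pi.zero? ? acc : acc + pi * Math.log(pi / qi.to_f)
  end
end

p_dist = [0.5, 0.5]
q_dist = [0.9, 0.1]
kl_divergence(p_dist, p_dist) # => 0.0
kl_divergence(p_dist, q_dist) # approx. 0.5108
kl_divergence(q_dist, p_dist) # approx. 0.3681, so KL is not symmetric
```

Because the two directions give different values, the divergence is not a metric, which is the kind of distinction the README asks users to report when the documentation is unclear.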
data/README.md CHANGED
@@ -1,22 +1,27 @@
1
1
  # Measurable
2
2
 
3
- A gem to test what metric is best for certain kinds of datasets in machine learning.
3
+ [![Build Status](https://travis-ci.org/agarie/measurable.svg?branch=master)](https://travis-ci.org/agarie/measurable)
4
+ [![Code Climate](https://codeclimate.com/github/agarie/measurable.png)](https://codeclimate.com/github/agarie/measurable)
4
5
 
5
- Besides the `Array` class, I also want to support `NVector` (from [NMatrix](http://github.com/sciruby/nmatrix)).
6
+ A gem to test what metric is best for certain kinds of datasets in machine
7
+ learning. Besides the `Array` class, I also want to support
8
+ [NMatrix](http://github.com/sciruby/nmatrix).
6
9
 
7
- The distance measures will be created in Ruby first. If I see that it's really too slow, I'll write some methods in C (or Java, for JRuby).
10
+ This is a fork of the gem [Distance Measure](https://github.com/reddavis/Distance-Measures),
11
+ which has a similar objective, but isn't actively maintained and doesn't support
12
+ NMatrix. Thank you, [@reddavis][reddavis]. :)
8
13
 
9
- This is a fork of the gem [Distance Measure](https://github.com/reddavis/Distance-Measures), which has a similar objective, but isn't actively maintained and doesn't support NMatrix. Thank you, [@reddavis][reddavis]. :)
10
-
11
- ## Install
14
+ ## Installation
12
15
 
13
16
  `gem install measurable`
14
17
 
15
- I only tested it with 2.0.0 (yes, yes, travis, I'll do it eventually). I want to support JRuby as well.
18
+ I test this gem (via Travis CI) on Ruby MRI 2.5, 2.6 and 2.7.
16
19
 
17
- ## Distance measures
20
+ ## Available distance measures
18
21
 
19
- I'm using the term "distance measure" without much concern for the strict mathematical definition of a metric. If the documentation for one of the methods isn't clear about it being or not a metric, please open an issue.
22
+ I'm using the term "distance measure" without much concern for the strict
23
+ mathematical definition of a metric. If the documentation for one of the
24
+ methods isn't clear about whether it is a metric, please open an issue.
20
25
 
21
26
  The following are the similarity measures supported at the moment:
22
27
 
@@ -27,52 +32,37 @@ The following are the similarity measures supported at the moment:
27
32
  - Jaccard distance
28
33
  - Tanimoto distance
29
34
  - Haversine distance
30
-
31
- These still need to be implemented:
32
-
33
- - Cityblock distance
35
+ - Minkowski (aka Cityblock or Manhattan) distance
34
36
  - Chebyshev distance
35
- - Minkowski distance
36
37
  - Hamming distance
37
- - Correlation distance
38
- - Chi-square distance
39
- - Kullback-Leibler divergence
40
- - Jensen-Shannon divergence
41
- - Mahalanobis distance
42
- - Squared Mahalanobis distance
38
+ - [Levenshtein distance](http://en.wikipedia.org/wiki/Levenshtein_distance)
39
+ - [Kullback-Leibler divergence](http://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence)
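The list above names the Minkowski distance together with its cityblock/Manhattan case. In general, the Minkowski distance of order p is (sum of |u_i - v_i|^p)^(1/p): p = 1 gives the cityblock (Manhattan) distance, p = 2 gives the Euclidean distance, and as p grows the value approaches the Chebyshev distance. The sketch below only illustrates that family. `minkowski_p` is a hypothetical name and does not describe the signature of the gem's own `minkowski` method.

```ruby
# Hypothetical generalized Minkowski distance of order p (p >= 1).
def minkowski_p(u, v, p)
  raise ArgumentError, "sizes don't match" if u.size != v.size
  raise ArgumentError, "p must be >= 1" if p < 1

  sum = u.zip(v).reduce(0.0) { |acc, (a, b)| acc + (a - b).abs**p }
  sum**(1.0 / p)
end

u = [0, 0]
v = [3, 4]
minkowski_p(u, v, 1)  # => 7.0 (cityblock / Manhattan)
minkowski_p(u, v, 2)  # => 5.0 (Euclidean)
minkowski_p(u, v, 50) # close to 4.0, the Chebyshev distance max(|3|, |4|)
```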
43
40
 
44
41
  ## How to use
45
42
 
46
43
  The API I intend to support is something like this:
47
44
 
48
45
  ```ruby
49
- require "measurable"
50
-
51
- u = NVector.ones(2)
52
- v = NVector.zeros(2)
53
- w = [1, 0]
54
- x = [2, 2]
46
+ require 'measurable'
55
47
 
56
48
  # Calculate the distance between two points in space.
57
- Measurable.euclidean(u, v) # => 1.41421
58
- Measurable.euclidean(w, v) # => 1.00000
59
- Measurable.cosine([1, 2], [2, 3]) # => 0.00772
49
+ Measurable.euclidean([1, 1], [0, 0]) # => 1.41421
60
50
 
61
51
  # Calculate the norm of a vector, i.e. its distance from the origin.
62
- Measurable.euclidean_squared([3, 4]) # => 25
63
- ```
52
+ Measurable.euclidean([1, 1]) # => 1.4142135623730951
64
53
 
65
- ## Documentation
66
-
67
- `RDoc` syntax is used to document the project. To build it locally, you'll need to install the [Fivefish generator](https://github.com/ged/rdoc-generator-fivefish) (`gem install rdoc-generator-fivefish`) and run the following command:
54
+ # Get the cosine distance between two vectors.
55
+ Measurable.cosine_distance([1, 2], [2, 3]) # => 0.007722123286332261
68
56
 
69
- ```bash
70
- rdoc -f fivefish -m README.md *.md LICENSE lib/
57
+ # Calculate sum of squares directly.
58
+ Measurable.euclidean_squared([3, 4]) # => 25.0
71
59
  ```
72
60
 
73
- I want to be able to use a Rake task to generate the documentation, thus allowing me to forget the specific command. However, there's a bug in `RDoc::Task` in which [custom generators (like Fivefish) can't be used](https://github.com/rdoc/rdoc/issues/246).
61
+ Most of the methods accept arbitrary enumerable objects instead of Arrays. For example, it's possible to use [NMatrix](https://github.com/sciruby/nmatrix).
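The methods shown in this diff (`euclidean_squared`, `cosine_similarity`, `chebyshev`) only call `#size`, `#zip` and `#reduce` or `#map` on their arguments. As a result, objects such as a `Range` can stand in for an `Array`. Below is a standalone sketch with the same structure as `euclidean_squared`. `squared_distance` is a hypothetical name, and the sketch assumes each argument responds to `#size`. Plain `Enumerable` does not define `#size`, so not every enumerable object qualifies.

```ruby
# Hypothetical stand-in with the same structure as euclidean_squared in this
# diff: it relies only on #size, #zip and #reduce of its arguments.
def squared_distance(u, v)
  raise ArgumentError, "sizes don't match" if u.size != v.size

  u.zip(v).reduce(0.0) { |acc, (a, b)| acc + (a - b)**2 }
end

squared_distance([3, 4], [0, 0])  # => 25.0 (a Float, because reduce starts at 0.0)
squared_distance(1..3, [1, 2, 3]) # => 0.0 (a Range works as well)
squared_distance(1..3, 2..4)      # => 3.0
```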
62
+
63
+ ## Documentation
74
64
 
75
- If there's something wrong with an explanation or if there's information missing, please open an issue or send a pull request.
65
+ The documentation is hosted on [rubydoc](http://www.rubydoc.info/github/agarie/measurable).
76
66
 
77
67
  ## License
78
68
 
@@ -81,4 +71,4 @@ See LICENSE for details.
81
71
  The original `distance_measures` gem is copyrighted by [@reddavis][reddavis].
82
72
 
83
73
  [maxmin]: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=05156398
84
- [reddavis]: (https://github.com/reddavis)
74
+ [reddavis]: https://github.com/reddavis
data/Rakefile CHANGED
@@ -1,7 +1,7 @@
1
1
  require 'rake'
2
2
  require 'bundler/gem_tasks'
3
3
  require "rspec/core/rake_task"
4
- # require 'rdoc/task' # See below.
4
+ require 'rdoc/task'
5
5
 
6
6
  # Setup the necessary gems, specified in the gemspec.
7
7
  require 'bundler'
@@ -13,20 +13,26 @@ rescue Bundler::BundlerError => e
13
13
  exit e.status_code
14
14
  end
15
15
 
16
+ task :default => [:spec]
17
+
16
18
  # Run all the specs.
17
19
  RSpec::Core::RakeTask.new(:spec)
18
20
 
19
- # RDoc task isn't working with custom generators, as can be seen in:
20
- # https://github.com/rdoc/rdoc/issues/246
21
- #
22
- # Whenever this issue is fixed, I'll resume using this task.
23
- #
24
- # RDoc::Task.new do |rdoc|
25
- # rdoc.main = "README.md"
26
- # rdoc.rdoc_files.include("README.md", "LICENSE", "lib")
27
- # rdoc.generator = "fivefish"
28
- # rdoc.external = true
29
- # end
21
+ RDoc::Task.new do |rdoc|
22
+ rdoc.main = "README.md"
23
+ rdoc.rdoc_files.include("README.md", "LICENSE", "lib")
24
+ rdoc.generator = "fivefish"
25
+ rdoc.external = true
26
+ end
27
+
28
+ desc "Open IRB with Measurable loaded."
29
+ task :console do
30
+ require 'irb'
31
+ require 'irb/completion'
32
+ require 'measurable'
33
+ ARGV.clear
34
+ IRB.start
35
+ end
30
36
 
31
37
  # Compile task.
32
38
  # Rake::ExtensionTask.new do |ext|
@@ -2,15 +2,18 @@ require 'measurable/version'
2
2
 
3
3
  # Distance measures. The require order is important.
4
4
  require 'measurable/euclidean'
5
+ require 'measurable/minkowski'
5
6
  require 'measurable/cosine'
6
7
  require 'measurable/jaccard'
7
8
  require 'measurable/tanimoto'
8
- require 'measurable/haversine'
9
+ require 'measurable/chebyshev'
9
10
  require 'measurable/maxmin'
11
+ require 'measurable/haversine'
12
+ require 'measurable/hamming'
13
+ require 'measurable/levenshtein'
14
+ require 'measurable/kullback_leibler'
10
15
 
11
16
  module Measurable
12
17
  # PI / 180 degrees.
13
18
  RAD_PER_DEG = Math::PI / 180
14
-
15
- extend self # expose all instance methods as singleton methods.
16
19
  end
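This hunk drops `extend self` from the top-level module. In the new files, each measure lives in its own submodule, and each file ends with `extend Measurable::<Submodule>`. That keeps the methods callable as `Measurable.chebyshev(...)` and also lets a single measure be mixed into another class. The sketch below reproduces the pattern with hypothetical names (`Metrics`, `Metrics::Cityblock`, `cityblock`, `Scorer`) so it runs without the gem.

```ruby
module Metrics
  # Each measure lives in its own submodule...
  module Cityblock
    def cityblock(u, v)
      raise ArgumentError, "sizes don't match" if u.size != v.size

      u.zip(v).reduce(0) { |acc, (a, b)| acc + (a - b).abs }
    end
  end

  # ...and the top-level module extends it, which exposes Metrics.cityblock.
  extend Metrics::Cityblock
end

# Module-level call, in the style of Measurable.euclidean(...).
Metrics.cityblock([1, 2], [4, 6]) # => 7

# The same submodule can be mixed into an unrelated class on its own.
class Scorer
  include Metrics::Cityblock
end
Scorer.new.cityblock([0, 0], [1, 1]) # => 2
```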
@@ -0,0 +1,24 @@
1
+ module Measurable
2
+ module Chebyshev
3
+
4
+ # call-seq:
5
+ # chebyshev(u, v) -> Float
6
+ #
7
+ # Arguments:
8
+ # - +u+ -> An array of Numeric objects.
9
+ # - +v+ -> An array of Numeric objects.
10
+ # Returns:
11
+ # - The L-infinity (maximum coordinate) distance between +u+ and +v+.
12
+ # Raises:
13
+ # - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
14
+ def chebyshev(u, v)
15
+ # TODO: Change this to a more specific, custom-made exception.
16
+ raise ArgumentError if u.size != v.size
17
+
18
+ abs_differences = u.zip(v).map { |a| (a[0] - a[1]).abs }
19
+ abs_differences.max
20
+ end
21
+ end
22
+
23
+ extend Measurable::Chebyshev
24
+ end
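`chebyshev` as written has one edge case worth knowing about. `Array#max` returns `nil` for an empty array, so `chebyshev([], [])` returns `nil` rather than the documented `Float`, and integer inputs yield an `Integer`. Here is a minimal sketch of a variant that always returns a `Float`. It assumes 0.0 is the desired distance between two empty vectors, and `chebyshev_f` is a hypothetical name.

```ruby
# Hypothetical variant of the chebyshev method in this diff that always
# returns a Float. Assumes the distance between two empty vectors is 0.0.
def chebyshev_f(u, v)
  raise ArgumentError, "sizes don't match" if u.size != v.size

  # The block auto-splats each [a, b] pair; max is nil only for empty input.
  u.zip(v).map { |a, b| (a - b).abs.to_f }.max || 0.0
end

chebyshev_f([1, 5, 2], [4, 1, 2]) # => 4.0
chebyshev_f([], [])               # => 0.0
```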
@@ -1,27 +1,69 @@
1
+ require 'measurable/euclidean'
2
+
1
3
  module Measurable
4
+ module Cosine
5
+
6
+ # call-seq:
7
+ # cosine_similarity(u, v) -> Float
8
+ #
9
+ # Calculate the cosine similarity between the orientation of two vectors.
10
+ #
11
+ # See: http://en.wikipedia.org/wiki/Cosine_similarity
12
+ #
13
+ # Arguments:
14
+ # - +u+ -> An array of Numeric objects.
15
+ # - +v+ -> An array of Numeric objects.
16
+ # Returns:
17
+ # - The normalized dot product of +u+ and +v+, that is, the cosine of the
18
+ # angle between them in the n-dimensional space.
19
+ # Raises:
20
+ # - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
21
+ #
22
+ def cosine_similarity(u, v)
23
+ # TODO: Change this to a more specific, custom-made exception.
24
+ raise ArgumentError if u.size != v.size
25
+
26
+ dot_product = u.zip(v).reduce(0.0) { |acc, ary| acc + ary[0] * ary[1] }
27
+
28
+ dot_product / (euclidean(u) * euclidean(v))
29
+ end
2
30
 
3
- # call-seq:
4
- # cosine(u, v) -> Float
5
- #
6
- # Calculate the similarity between the orientation of two vectors.
7
- #
8
- # See: http://en.wikipedia.org/wiki/Cosine_similarity
9
- #
10
- # * *Arguments* :
11
- # - +u+ -> An array of Numeric objects.
12
- # - +v+ -> An array of Numeric objects.
13
- # * *Returns* :
14
- # - The normalized dot product of +u+ and +v+, that is, the angle between
15
- # them in the n-dimensional space.
16
- # * *Raises* :
17
- # - +ArgumentError+ -> The sizes of +u+ and +v+ doesn't match.
18
- #
19
- def cosine(u, v)
20
- # TODO: Change this to a more specific, custom-made exception.
21
- raise ArgumentError if u.size != v.size
22
-
23
- dot_product = u.zip(v).reduce(0.0) { |acc, ary| acc += ary[0] * ary[1] }
24
-
25
- dot_product / (euclidean(u) * euclidean(v))
31
+ # call-seq:
32
+ # cosine_distance(u, v) -> Float
33
+ #
34
+ # Calculate the cosine distance between the orientation of two vectors.
35
+ #
36
+ # See: http://en.wikipedia.org/wiki/Cosine_similarity
37
+ #
38
+ # Arguments:
39
+ # - +u+ -> An array of Numeric objects.
40
+ # - +v+ -> An array of Numeric objects.
41
+ # Returns:
42
+ # - 1 minus the cosine similarity of +u+ and +v+, i.e. 1 - cos(theta),
43
+ # where theta is the angle between them in the n-dimensional space.
44
+ # Raises:
45
+ # - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
46
+ def cosine_distance(u, v)
47
+ # TODO: Change this to a more specific, custom-made exception.
48
+ raise ArgumentError if u.size != v.size
49
+
50
+ 1 - cosine_similarity(u, v)
51
+ end
52
+
53
+ def self.extended(base) # :nodoc:
54
+ base.instance_eval do
55
+ extend Measurable::Euclidean
56
+ end
57
+ super
58
+ end
59
+
60
+ def self.included(base) # :nodoc:
61
+ base.class_eval do
62
+ include Measurable::Euclidean
63
+ end
64
+ super
65
+ end
26
66
  end
27
- end
67
+
68
+ extend Measurable::Cosine
69
+ end
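`cosine_similarity` calls `euclidean`, which lives in a separate module. To cover that, `Measurable::Cosine` defines `self.extended` and `self.included` hooks that pull `Measurable::Euclidean` into whatever object or class receives `Cosine`. The standalone sketch below mirrors that dependency-hook pattern with hypothetical module names (`Norm`, `CosineSim`, `Report`). It calls the public `Object#extend` and `Module#include` directly instead of going through `instance_eval`/`class_eval`. The two approaches have the same effect on Ruby 2.1 and later, where `Module#include` is public.

```ruby
# Hypothetical dependency module, standing in for Measurable::Euclidean.
module Norm
  def norm(u)
    Math.sqrt(u.reduce(0.0) { |acc, x| acc + x * x })
  end
end

# Hypothetical module standing in for Measurable::Cosine. It calls #norm,
# so it makes sure Norm comes along wherever it is mixed in.
module CosineSim
  def cosine_similarity(u, v)
    raise ArgumentError, "sizes don't match" if u.size != v.size

    dot = u.zip(v).reduce(0.0) { |acc, (a, b)| acc + a * b }
    dot / (norm(u) * norm(v))
  end

  def cosine_distance(u, v)
    1 - cosine_similarity(u, v)
  end

  # Runs when an object calls `extend CosineSim`.
  def self.extended(base)
    base.extend(Norm)
    super
  end

  # Runs when a class or module calls `include CosineSim`.
  def self.included(base)
    base.include(Norm)
    super
  end
end

calc = Object.new.extend(CosineSim)
calc.cosine_distance([1, 2], [2, 3]) # approx. 0.0077221, as in the README

class Report
  include CosineSim
end
Report.new.cosine_similarity([1, 0], [0, 1]) # => 0.0 (orthogonal vectors)
```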
@@ -1,76 +1,64 @@
1
1
  module Measurable
2
+ module Euclidean
2
3
 
3
- # call-seq:
4
- # euclidean(u) -> Float
5
- # euclidean(u, v) -> Float
6
- #
7
- # Calculate the ordinary distance between arrays +u+ and +v+.
8
- #
9
- # If +v+ isn't given, calculate the Euclidean norm of +u+.
10
- #
11
- # See: http://en.wikipedia.org/wiki/Euclidean_distance#N_dimensions
12
- #
13
- # * *Arguments* :
14
- # - +u+ -> An array of Numeric objects.
15
- # - +v+ -> (Optional) An array of Numeric objects.
16
- # * *Returns* :
17
- # - The euclidean norm of +u+ or the euclidean distance between +u+ and
18
- # +v+.
19
- # * *Raises* :
20
- # - +ArgumentError+ -> The sizes of +u+ and +v+ doesn't match.
21
- #
22
- def euclidean(u, v = nil)
23
- # If the second argument is nil, the method should return the norm of
24
- # vector u. For this, we need the distance between u and the origin.
25
- if v.nil?
26
- v = Array.new(u.size, 0)
4
+ # call-seq:
5
+ # euclidean(u) -> Float
6
+ # euclidean(u, v) -> Float
7
+ #
8
+ # Calculate the ordinary distance between arrays +u+ and +v+.
9
+ #
10
+ # If +v+ isn't given, calculate the Euclidean norm of +u+.
11
+ #
12
+ # See: http://en.wikipedia.org/wiki/Euclidean_distance#N_dimensions
13
+ #
14
+ # Arguments:
15
+ # - +u+ -> An array of Numeric objects.
16
+ # - +v+ -> (Optional) An array of Numeric objects.
17
+ # Returns:
18
+ # - The euclidean norm of +u+ or the euclidean distance between +u+ and +v+.
19
+ # Raises:
20
+ # - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
21
+ def euclidean(u, v = nil)
22
+ Math.sqrt(self.euclidean_squared(u, v))
27
23
  end
28
24
 
29
- # TODO: Change this to a more specific, custom-made exception.
30
- raise ArgumentError if u.size != v.size
25
+ # call-seq:
26
+ # euclidean_squared(u) -> Float
27
+ # euclidean_squared(u, v) -> Float
28
+ #
29
+ # Calculate the same value as euclidean(u, v), but don't take the square root
30
+ # of it.
31
+ #
32
+ # This isn't a metric in the strict sense, i.e. it doesn't respect the
33
+ # triangle inequality. However, the squared Euclidean distance is very useful
34
+ # whenever only the relative values of distances are important, for example
35
+ # in optimization problems.
36
+ #
37
+ # See: http://en.wikipedia.org/wiki/Euclidean_distance#Squared_Euclidean_distance
38
+ #
39
+ # Arguments:
40
+ # - +u+ -> An array of Numeric objects.
41
+ # - +v+ -> (Optional) An array of Numeric objects.
42
+ # Returns:
43
+ # - The squared value of the euclidean norm of +u+ or of the euclidean
44
+ # distance between +u+ and +v+.
45
+ # Raises:
46
+ # - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
47
+ def euclidean_squared(u, v = nil)
48
+ # If the second argument is nil, the method should return the norm of
49
+ # vector u. For this, we need the distance between u and the origin.
50
+ if v.nil?
51
+ v = Array.new(u.size, 0)
52
+ end
31
53
 
32
- sum = u.zip(v).reduce(0.0) do |acc, ary|
33
- acc += (ary[0] - ary[-1]) ** 2
34
- end
35
-
36
- Math.sqrt(sum)
37
- end
54
+ # TODO: Change this to a more specific, custom-made exception.
55
+ raise ArgumentError if u.size != v.size
38
56
 
39
- # call-seq:
40
- # euclidean_squared(u) -> Float
41
- # euclidean_squared(u, v) -> Float
42
- #
43
- # Calculate the same value as euclidean(u, v), but don't take the square root
44
- # of it.
45
- #
46
- # This isn't a metric in the strict sense, i.e. it doesn't respect the
47
- # triangle inequality. However, the squared Euclidean distance is very useful
48
- # whenever only the relative values of distances are important, for example
49
- # in optimization problems.
50
- #
51
- # See: http://en.wikipedia.org/wiki/Euclidean_distance#Squared_Euclidean_distance
52
- #
53
- # * *Arguments* :
54
- # - +u+ -> An array of Numeric objects.
55
- # - +v+ -> (Optional) An array of Numeric objects.
56
- # * *Returns* :
57
- # - The squared value of the euclidean norm of +u+ or of the euclidean
58
- # distance between +u+ and +v+.
59
- # * *Raises* :
60
- # - +ArgumentError+ -> The sizes of +u+ and +v+ doesn't match.
61
- #
62
- def euclidean_squared(u, v = nil)
63
- # If the second argument is nil, the method should return the norm of
64
- # vector u. For this, we need the distance between u and the origin.
65
- if v.nil?
66
- v = Array.new(u.size, 0)
67
- end
68
-
69
- # TODO: Change this to a more specific, custom-made exception.
70
- raise ArgumentError if u.size != v.size
71
-
72
- u.zip(v).reduce(0.0) do |acc, ary|
73
- acc += (ary[0] - ary[-1]) ** 2
57
+ u.zip(v).reduce(0.0) do |acc, ary|
58
+ acc + (ary[0] - ary[-1]) ** 2
59
+ end
74
60
  end
75
61
  end
76
- end
62
+
63
+ extend Measurable::Euclidean
64
+ end
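The `euclidean_squared` documentation in this diff makes two claims. First, squared distances can violate the triangle inequality. Second, they still preserve the ordering of distances, because the square root is monotonic. The standalone sketch below checks both, using hypothetical helper names (`dist_sq`, `dist`, `nearest`). It is an illustration, not part of the gem.

```ruby
# Hypothetical helpers mirroring euclidean_squared and euclidean.
def dist_sq(u, v)
  raise ArgumentError, "sizes don't match" if u.size != v.size

  u.zip(v).reduce(0.0) { |acc, (a, b)| acc + (a - b)**2 }
end

def dist(u, v)
  Math.sqrt(dist_sq(u, v))
end

a, b, c = [0], [1], [2]

# The triangle inequality fails for squared distances: 4.0 > 1.0 + 1.0 ...
dist_sq(a, c) > dist_sq(a, b) + dist_sq(b, c) # => true
# ...but holds for the Euclidean distance itself: 2.0 <= 1.0 + 1.0.
dist(a, c) <= dist(a, b) + dist(b, c)         # => true

# Because sqrt is monotonic, both metrics pick the same nearest neighbour,
# so the cheaper squared form is enough for ranking.
def nearest(query, points, metric)
  points.min_by { |pt| metric.call(query, pt) }
end

points = [[5, 5], [1, 2], [-3, 0]]
nearest([0, 0], points, method(:dist_sq)) # => [1, 2]
nearest([0, 0], points, method(:dist))    # => [1, 2]
```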