measurable 0.0.5 → 0.0.11

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
- SHA1:
- metadata.gz: 66337383a6c25685893bb39f7caf5c7f0b40bcff
- data.tar.gz: 117a22bb28b1d36f14780d3a22bbad7211b279a0
+ SHA256:
+ metadata.gz: 3ea2c713d50fac6bd342f46e18eb9ed8267ff65cfdbf6f1ee75c70bcc92d5b1c
+ data.tar.gz: fa3c04483562118a4d875edce0d8afbcac5cf3ee5fa29e8b5b8d04b374058b73
  SHA512:
- metadata.gz: 0d7aff51213d2f0ca31472d1d5f43b3ce99d58255b9d3d53485b0983e953866a365927734c1852c7040f7c23149455712af6357679e03ff63bf247709ac7e124
- data.tar.gz: 652066925d87d7d52656f87346c856d34296e4a98662e94a670d72e9612c568dd95bf758314c524501700b21b53484eaaf059e5b5fee315129fb1b0b90a170bd
+ metadata.gz: 46aa4474a64b5b9e7e5c3ee53adf7804d618941528d0722e0bdff82e5010dc9f978d553ea5c7779d7917f26a9bc0e0f4c993bd74653d7f81b698abc9aabbcf9d
+ data.tar.gz: 9eaf669fef73e90a7dc4196939634f42c54f21471301c4913b9b46155feaf59a0755f88bf325a373d1242653fbe19e7f6d06fb4405e411a5e1aeef596a0b8713
data/.gitignore CHANGED
@@ -1,4 +1,6 @@
  pkg
  tmp/*
  benchmarks/*
- lib/*.bundle
+ lib/*.bundle
+ html/
+ Gemfile.lock
@@ -0,0 +1,7 @@
+ language: ruby
+ rvm:
+ - 2.5
+ - 2.6
+ - 2.7
+ # uncomment this line if your project needs to run something other than `rake`:
+ # script: bundle exec rspec spec
data/Gemfile CHANGED
@@ -1,3 +1,4 @@
  # Gemfile
- source "http://rubygems.org"
+ source "https://rubygems.org"
+
  gemspec
@@ -0,0 +1,11 @@
+ 0.0.11 -- 22th June, 2020
+ * Updated rake & rdoc
+ * Updated Travis CI config
+ * ... honestly, just getting back to this repository
+
+ 0.0.9 -- 16th April, 2015
+ * Removed unnecessary argument length check from jaccard_index.
+ * Host documentation on rubydoc.info.
+
+ 0.0.8 -- 18th May, 2014
+ * Added Kullback-Leibler divergence.
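The changelog records that 0.0.8 added the Kullback-Leibler divergence, but its source file is not part of this diff. As a rough illustration only, here is a standalone sketch of the discrete form D_KL(p || q) = sum_i p_i ln(p_i / q_i), assuming `p` and `q` are probability distributions of equal length and using the natural logarithm. The method name `kl_divergence` is hypothetical and is not the gem's API.

```ruby
# Hypothetical standalone sketch of the discrete Kullback-Leibler divergence
# D_KL(p || q) = sum_i p_i * ln(p_i / q_i); not the gem's implementation.
# Assumes p and q are probability distributions (non-negative, summing to 1).
def kl_divergence(p, q)
  raise ArgumentError, "sizes of p and q don't match" if p.size != q.size

  p.zip(q).reduce(0.0) do |acc, (pk, qk)|
    # By convention 0 * ln(0 / q) = 0, so zero-probability terms are skipped.
    next acc if pk.zero?
    # p puts mass where q has none: the divergence is infinite.
    return Float::INFINITY if qk.zero?

    acc + pk * Math.log(pk.to_f / qk)
  end
end

fair   = [0.5, 0.5]
biased = [0.9, 0.1]
kl_divergence(fair, fair)   # => 0.0
kl_divergence(fair, biased) # about 0.511, i.e. ln(5/3)
kl_divergence(biased, fair) # about 0.368
```

The last two calls give different values: the divergence is not symmetric and does not satisfy the triangle inequality, which is one reason the README speaks of "distance measures" rather than metrics.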
data/README.md CHANGED
@@ -1,22 +1,27 @@
  # Measurable
 
- A gem to test what metric is best for certain kinds of datasets in machine learning.
+ [![Build Status](https://travis-ci.org/agarie/measurable.svg?branch=master)](https://travis-ci.org/agarie/measurable)
+ [![Code Climate](https://codeclimate.com/github/agarie/measurable.png)](https://codeclimate.com/github/agarie/measurable)
 
- Besides the `Array` class, I also want to support `NVector` (from [NMatrix](http://github.com/sciruby/nmatrix)).
+ A gem to test what metric is best for certain kinds of datasets in machine
+ learning. Besides the `Array` class, I also want to support
+ [NMatrix](http://github.com/sciruby/nmatrix).
 
- The distance measures will be created in Ruby first. If I see that it's really too slow, I'll write some methods in C (or Java, for JRuby).
+ This is a fork of the gem [Distance Measure](https://github.com/reddavis/Distance-Measures),
+ which has a similar objective, but isn't actively maintained and doesn't support
+ NMatrix. Thank you, [@reddavis][reddavis]. :)
 
- This is a fork of the gem [Distance Measure](https://github.com/reddavis/Distance-Measures), which has a similar objective, but isn't actively maintained and doesn't support NMatrix. Thank you, [@reddavis][reddavis]. :)
-
- ## Install
+ ## Installation
 
  `gem install measurable`
 
- I only tested it with 2.0.0 (yes, yes, travis, I'll do it eventually). I want to support JRuby as well.
+ I test this gem (via Travis CI) on Ruby MRI 2.5, 2.6 and 2.7.
 
- ## Distance measures
+ ## Available distance measures
 
- I'm using the term "distance measure" without much concern for the strict mathematical definition of a metric. If the documentation for one of the methods isn't clear about it being or not a metric, please open an issue.
+ I'm using the term "distance measure" without much concern for the strict
+ mathematical definition of a metric. If the documentation for one of the
+ methods isn't clear about it being or not a metric, please open an issue.
 
  The following are the similarity measures supported at the moment:
 
@@ -27,52 +32,37 @@ The following are the similarity measures supported at the moment:
  - Jaccard distance
  - Tanimoto distance
  - Haversine distance
-
- These still need to be implemented:
-
- - Cityblock distance
+ - Minkowski (aka Cityblock or Manhattan) distance
  - Chebyshev distance
- - Minkowski distance
  - Hamming distance
- - Correlation distance
- - Chi-square distance
- - Kullback-Leibler divergence
- - Jensen-Shannon divergence
- - Mahalanobis distance
- - Squared Mahalanobis distance
+ - [Levenshtein distance](http://en.wikipedia.org/wiki/Levenshtein_distance)
+ - [Kullback-Leibler divergence](http://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence)
 
  ## How to use
 
  The API I intend to support is something like this:
 
  ```ruby
- require "measurable"
-
- u = NVector.ones(2)
- v = NVector.zeros(2)
- w = [1, 0]
- x = [2, 2]
+ require 'measurable'
 
  # Calculate the distance between two points in space.
- Measurable.euclidean(u, v) # => 1.41421
- Measurable.euclidean(w, v) # => 1.00000
- Measurable.cosine([1, 2], [2, 3]) # => 0.00772
+ Measurable.euclidean([1, 1], [0, 0]) # => 1.41421
 
  # Calculate the norm of a vector, i.e. its distance from the origin.
- Measurable.euclidean_squared([3, 4]) # => 25
- ```
+ Measurable.euclidean([1, 1]) # => 1.4142135623730951
 
- ## Documentation
-
- `RDoc` syntax is used to document the project. To build it locally, you'll need to install the [Fivefish generator](https://github.com/ged/rdoc-generator-fivefish) (`gem install rdoc-generator-fivefish`) and run the following command:
+ # Get the cosine distance between
+ Measurable.cosine_distance([1, 2], [2, 3]) # => 0.007722123286332261
 
- ```bash
- rdoc -f fivefish -m README.md *.md LICENSE lib/
+ # Calculate sum of squares directly.
+ Measurable.euclidean_squared([3, 4]) # => 25
  ```
 
- I want to be able to use a Rake task to generate the documentation, thus allowing me to forget the specific command. However, there's a bug in `RDoc::Task` in which [custom generators (like Fivefish) can't be used](https://github.com/rdoc/rdoc/issues/246).
+ Most of the methods accept arbitrary enumerable objects instead of Arrays. For example, it's possible to use [NMatrix](https://github.com/sciruby/nmatrix).
+
+ ## Documentation
 
- If there's something wrong with an explanation or if there's information missing, please open an issue or send a pull request.
+ The documentation is hosted on [rubydoc](http://www.rubydoc.info/github/agarie/measurable).
 
  ## License
 
@@ -81,4 +71,4 @@ See LICENSE for details.
  The original `distance_measures` gem is copyrighted by [@reddavis][reddavis].
 
  [maxmin]: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=05156398
- [reddavis]: (https://github.com/reddavis)
+ [reddavis]: (https://github.com/reddavis)
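The updated README lists the Levenshtein distance among the available measures, but the gem's implementation is not shown in this diff. The sketch below is a hypothetical standalone method over two strings using the classic dynamic-programming recurrence (often credited to Wagner and Fischer), keeping only two rows of the edit-distance table; the name and signature are assumptions, not the gem's API.

```ruby
# Hypothetical standalone sketch of the Levenshtein (edit) distance between
# two strings, using the dynamic-programming recurrence with two rows.
def levenshtein(s, t)
  a = s.chars
  b = t.chars
  # prev[j] holds the distance between the prefix of s processed so far
  # and the first j characters of t (initially the empty prefix of s).
  prev = (0..b.size).to_a

  a.each_with_index do |ca, i|
    curr = [i + 1] # turning i + 1 characters of s into "" takes i + 1 deletions
    b.each_with_index do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [
        prev[j + 1] + 1, # delete ca
        curr[j] + 1,     # insert cb
        prev[j] + cost   # substitute ca with cb (free if they match)
      ].min
    end
    prev = curr
  end

  prev.last
end

levenshtein("kitten", "sitting") # => 3
```

Keeping two rows brings memory down to O(|t|) while time stays O(|s| * |t|).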
data/Rakefile CHANGED
@@ -1,7 +1,7 @@
  require 'rake'
  require 'bundler/gem_tasks'
  require "rspec/core/rake_task"
- # require 'rdoc/task' # See below.
+ require 'rdoc/task'
 
  # Setup the necessary gems, specified in the gemspec.
  require 'bundler'
@@ -13,20 +13,26 @@ rescue Bundler::BundlerError => e
  exit e.status_code
  end
 
+ task :default => [:spec]
+
  # Run all the specs.
  RSpec::Core::RakeTask.new(:spec)
 
- # RDoc task isn't working with custom generators, as can be seen in:
- # https://github.com/rdoc/rdoc/issues/246
- #
- # Whenever this issue is fixed, I'll resume using this task.
- #
- # RDoc::Task.new do |rdoc|
- # rdoc.main = "README.md"
- # rdoc.rdoc_files.include("README.md", "LICENSE", "lib")
- # rdoc.generator = "fivefish"
- # rdoc.external = true
- # end
+ RDoc::Task.new do |rdoc|
+ rdoc.main = "README.md"
+ rdoc.rdoc_files.include("README.md", "LICENSE", "lib")
+ rdoc.generator = "fivefish"
+ rdoc.external = true
+ end
+
+ desc "Open IRB with Measurable loaded."
+ task :console do
+ require 'irb'
+ require 'irb/completion'
+ require 'measurable'
+ ARGV.clear
+ IRB.start
+ end
 
  # Compile task.
  # Rake::ExtensionTask.new do |ext|
@@ -2,15 +2,18 @@ require 'measurable/version'
 
  # Distance measures. The require order is important.
  require 'measurable/euclidean'
+ require 'measurable/minkowski'
  require 'measurable/cosine'
  require 'measurable/jaccard'
  require 'measurable/tanimoto'
- require 'measurable/haversine'
+ require 'measurable/chebyshev'
  require 'measurable/maxmin'
+ require 'measurable/haversine'
+ require 'measurable/hamming'
+ require 'measurable/levenshtein'
+ require 'measurable/kullback_leibler'
 
  module Measurable
  # PI / 180 degrees.
  RAD_PER_DEG = Math::PI / 180
-
- extend self # expose all instance methods as singleton methods.
  end
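This hunk starts requiring `measurable/minkowski`, whose source is not part of this diff, and the README now lists it as "Minkowski (aka Cityblock or Manhattan) distance". In the usual definition, Minkowski is a family of distances of order p >= 1, (sum_i |u_i - v_i|^p)^(1/p). Here p = 1 gives the cityblock (Manhattan) distance, p = 2 the Euclidean distance, and as p grows the value approaches the Chebyshev distance. The sketch below is a hypothetical standalone version with an explicit `p`. It does not reproduce the gem's method, whose signature may differ.

```ruby
# Hypothetical sketch of the general Minkowski distance of order p:
#   (sum_i |u_i - v_i|**p) ** (1 / p)
# p = 1 is the cityblock (Manhattan) distance, p = 2 the Euclidean distance.
# The gem's own minkowski method (not shown in this diff) may take other arguments.
def minkowski(u, v, p = 1)
  raise ArgumentError, "sizes of u and v don't match" if u.size != v.size
  # For p < 1 the triangle inequality fails, so the result is not a metric.
  raise ArgumentError, "p must be >= 1" if p < 1

  sum = u.zip(v).reduce(0.0) { |acc, (a, b)| acc + (a - b).abs**p }
  sum**(1.0 / p)
end

minkowski([0, 0], [3, 4], 1)  # => 7.0 (cityblock)
minkowski([0, 0], [3, 4], 2)  # Euclidean: 5, up to floating-point rounding
minkowski([0, 0], [3, 4], 50) # close to 4, the Chebyshev distance
```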
@@ -0,0 +1,24 @@
+ module Measurable
+ module Chebyshev
+
+ # call-seq:
+ # chebyshev(u, v) -> Float
+ #
+ # Arguments:
+ # - +u+ -> An array of Numeric objects.
+ # - +v+ -> An array of Numeric objects.
+ # Returns:
+ # - The L-infinite distance between +u+ and +v+.
+ # Raises:
+ # - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
+ def chebyshev(u, v)
+ # TODO: Change this to a more specific, custom-made exception.
+ raise ArgumentError if u.size != v.size
+
+ abs_differences = u.zip(v).map { |a| (a[0] - a[1]).abs }
+ abs_differences.max
+ end
+ end
+
+ extend Measurable::Chebyshev
+ end
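The new per-measure files follow the shape of the hunk above. The method lives in a nested module, and the file ends with `extend Measurable::Chebyshev`, which takes over from the `extend self` removed from the top-level module. The self-contained sketch below uses a hypothetical `Demo` namespace (not the gem) to show the two call styles this layout allows: a module-level call, and mixing one measure into your own class with `include`.

```ruby
# Hypothetical "Demo" namespace mirroring the layout of the hunk above;
# it is an illustration, not the gem's source.
module Demo
  module Chebyshev
    # L-infinity distance: the largest absolute coordinate difference.
    def chebyshev(u, v)
      raise ArgumentError, "sizes of u and v don't match" if u.size != v.size

      u.zip(v).map { |a, b| (a - b).abs }.max
    end
  end

  # Plays the role of `extend self`: Demo.chebyshev(...) becomes callable.
  extend Demo::Chebyshev
end

# Mix a single measure into a user-defined class.
class Board
  include Demo::Chebyshev

  # On an empty chessboard, a king needs chebyshev(from, to) moves.
  def king_moves(from, to)
    chebyshev(from, to)
  end
end

Demo.chebyshev([1, 2], [4, 0])       # => 3
Board.new.king_moves([0, 0], [3, 5]) # => 5
```

Destructuring the pairs with `|a, b|` behaves the same as the hunk's `a[0] - a[1]`. As in the gem's version, two empty arrays return `nil` rather than 0, because `Array#max` of an empty array is `nil`.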
@@ -1,27 +1,69 @@
+ require 'measurable/euclidean'
+
  module Measurable
+ module Cosine
+
+ # call-seq:
+ # cosine_similarity(u, v) -> Float
+ #
+ # Calculate the cosine similarity between the orientation of two vectors.
+ #
+ # See: http://en.wikipedia.org/wiki/Cosine_similarity
+ #
+ # Arguments:
+ # - +u+ -> An array of Numeric objects.
+ # - +v+ -> An array of Numeric objects.
+ # Returns:
+ # - The normalized dot product of +u+ and +v+, that is, the angle between
+ # them in the n-dimensional space.
+ # Raises:
+ # - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
+ #
+ def cosine_similarity(u, v)
+ # TODO: Change this to a more specific, custom-made exception.
+ raise ArgumentError if u.size != v.size
+
+ dot_product = u.zip(v).reduce(0.0) { |acc, ary| acc += ary[0] * ary[1] }
+
+ dot_product / (euclidean(u) * euclidean(v))
+ end
 
- # call-seq:
- # cosine(u, v) -> Float
- #
- # Calculate the similarity between the orientation of two vectors.
- #
- # See: http://en.wikipedia.org/wiki/Cosine_similarity
- #
- # * *Arguments* :
- # - +u+ -> An array of Numeric objects.
- # - +v+ -> An array of Numeric objects.
- # * *Returns* :
- # - The normalized dot product of +u+ and +v+, that is, the angle between
- # them in the n-dimensional space.
- # * *Raises* :
- # - +ArgumentError+ -> The sizes of +u+ and +v+ doesn't match.
- #
- def cosine(u, v)
- # TODO: Change this to a more specific, custom-made exception.
- raise ArgumentError if u.size != v.size
-
- dot_product = u.zip(v).reduce(0.0) { |acc, ary| acc += ary[0] * ary[1] }
-
- dot_product / (euclidean(u) * euclidean(v))
+ # call-seq:
+ # cosine_distance(u, v) -> Float
+ #
+ # Calculate the cosine distance between the orientation of two vectors.
+ #
+ # See: http://en.wikipedia.org/wiki/Cosine_similarity
+ #
+ # Arguments:
+ # - +u+ -> An array of Numeric objects.
+ # - +v+ -> An array of Numeric objects.
+ # Returns:
+ # - The normalized dot product of +u+ and +v+, that is, the angle between
+ # them in the n-dimensional space.
+ # Raises:
+ # - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
+ def cosine_distance(u, v)
+ # TODO: Change this to a more specific, custom-made exception.
+ raise ArgumentError if u.size != v.size
+
+ 1 - cosine_similarity(u, v)
+ end
+
+ def self.extended(base) # :nodoc:
+ base.instance_eval do
+ extend Measurable::Euclidean
+ end
+ super
+ end
+
+ def self.included(base) # :nodoc:
+ base.class_eval do
+ include Measurable::Euclidean
+ end
+ super
+ end
  end
- end
+
+ extend Measurable::Cosine
+ end
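In the hunk above, the cosine similarity is the dot product divided by the product of the two Euclidean norms, and `cosine_distance` is 1 minus that value. The `self.extended` and `self.included` hooks exist because `cosine_similarity` calls `euclidean`, so `Measurable::Euclidean` must come along whenever `Cosine` is mixed in. The sketch below is a standalone approximation of the arithmetic using plain helper methods (`dot` and `norm` are illustrative names, not part of the gem). It reproduces the README's value for `[1, 2]` and `[2, 3]`.

```ruby
# Standalone sketch of the two cosine methods in the hunk above;
# dot and norm are illustrative helpers, not gem methods.
def dot(u, v)
  u.zip(v).reduce(0.0) { |acc, (a, b)| acc + a * b }
end

def norm(u)
  Math.sqrt(dot(u, u))
end

# Cosine of the angle between u and v, in [-1, 1].
def cosine_similarity(u, v)
  raise ArgumentError, "sizes of u and v don't match" if u.size != v.size

  dot(u, v) / (norm(u) * norm(v))
end

# 1 - cosine similarity, in [0, 2].
def cosine_distance(u, v)
  1 - cosine_similarity(u, v)
end

cosine_distance([1, 2], [2, 3])   # about 0.0077221, as in the README
cosine_similarity([1, 0], [0, 1]) # => 0.0 (orthogonal vectors)
```

Two caveats apply to the gem's code as well. The similarity is the cosine of the angle, not the angle itself. And `cosine_distance` returns 1 minus the similarity, even though its doc comment repeats the similarity's description. A zero vector makes the denominator `0.0`, so the result is `NaN`.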
@@ -1,76 +1,64 @@
  module Measurable
+ module Euclidean
 
- # call-seq:
- # euclidean(u) -> Float
- # euclidean(u, v) -> Float
- #
- # Calculate the ordinary distance between arrays +u+ and +v+.
- #
- # If +v+ isn't given, calculate the Euclidean norm of +u+.
- #
- # See: http://en.wikipedia.org/wiki/Euclidean_distance#N_dimensions
- #
- # * *Arguments* :
- # - +u+ -> An array of Numeric objects.
- # - +v+ -> (Optional) An array of Numeric objects.
- # * *Returns* :
- # - The euclidean norm of +u+ or the euclidean distance between +u+ and
- # +v+.
- # * *Raises* :
- # - +ArgumentError+ -> The sizes of +u+ and +v+ doesn't match.
- #
- def euclidean(u, v = nil)
- # If the second argument is nil, the method should return the norm of
- # vector u. For this, we need the distance between u and the origin.
- if v.nil?
- v = Array.new(u.size, 0)
+ # call-seq:
+ # euclidean(u) -> Float
+ # euclidean(u, v) -> Float
+ #
+ # Calculate the ordinary distance between arrays +u+ and +v+.
+ #
+ # If +v+ isn't given, calculate the Euclidean norm of +u+.
+ #
+ # See: http://en.wikipedia.org/wiki/Euclidean_distance#N_dimensions
+ #
+ # Arguments:
+ # - +u+ -> An array of Numeric objects.
+ # - +v+ -> (Optional) An array of Numeric objects.
+ # Returns:
+ # - The euclidean norm of +u+ or the euclidean distance between +u+ and +v+.
+ # Raises:
+ # - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
+ def euclidean(u, v = nil)
+ Math.sqrt(self.euclidean_squared(u, v))
  end
 
- # TODO: Change this to a more specific, custom-made exception.
- raise ArgumentError if u.size != v.size
+ # call-seq:
+ # euclidean_squared(u) -> Float
+ # euclidean_squared(u, v) -> Float
+ #
+ # Calculate the same value as euclidean(u, v), but don't take the square root
+ # of it.
+ #
+ # This isn't a metric in the strict sense, i.e. it doesn't respect the
+ # triangle inequality. However, the squared Euclidean distance is very useful
+ # whenever only the relative values of distances are important, for example
+ # in optimization problems.
+ #
+ # See: http://en.wikipedia.org/wiki/Euclidean_distance#Squared_Euclidean_distance
+ #
+ # Arguments:
+ # - +u+ -> An array of Numeric objects.
+ # - +v+ -> (Optional) An array of Numeric objects.
+ # Returns:
+ # - The squared value of the euclidean norm of +u+ or of the euclidean
+ # distance between +u+ and +v+.
+ # Raises:
+ # - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
+ def euclidean_squared(u, v = nil)
+ # If the second argument is nil, the method should return the norm of
+ # vector u. For this, we need the distance between u and the origin.
+ if v.nil?
+ v = Array.new(u.size, 0)
+ end
 
- sum = u.zip(v).reduce(0.0) do |acc, ary|
- acc += (ary[0] - ary[-1]) ** 2
- end
-
- Math.sqrt(sum)
- end
+ # TODO: Change this to a more specific, custom-made exception.
+ raise ArgumentError if u.size != v.size
 
- # call-seq:
- # euclidean_squared(u) -> Float
- # euclidean_squared(u, v) -> Float
- #
- # Calculate the same value as euclidean(u, v), but don't take the square root
- # of it.
- #
- # This isn't a metric in the strict sense, i.e. it doesn't respect the
- # triangle inequality. However, the squared Euclidean distance is very useful
- # whenever only the relative values of distances are important, for example
- # in optimization problems.
- #
- # See: http://en.wikipedia.org/wiki/Euclidean_distance#Squared_Euclidean_distance
- #
- # * *Arguments* :
- # - +u+ -> An array of Numeric objects.
- # - +v+ -> (Optional) An array of Numeric objects.
- # * *Returns* :
- # - The squared value of the euclidean norm of +u+ or of the euclidean
- # distance between +u+ and +v+.
- # * *Raises* :
- # - +ArgumentError+ -> The sizes of +u+ and +v+ doesn't match.
- #
- def euclidean_squared(u, v = nil)
- # If the second argument is nil, the method should return the norm of
- # vector u. For this, we need the distance between u and the origin.
- if v.nil?
- v = Array.new(u.size, 0)
- end
-
- # TODO: Change this to a more specific, custom-made exception.
- raise ArgumentError if u.size != v.size
-
- u.zip(v).reduce(0.0) do |acc, ary|
- acc += (ary[0] - ary[-1]) ** 2
+ u.zip(v).reduce(0.0) do |acc, ary|
+ acc += (ary[0] - ary[-1]) ** 2
+ end
  end
  end
- end
+
+ extend Measurable::Euclidean
+ end
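After this refactor, `euclidean` delegates to `euclidean_squared` and takes the square root. The norm case (no `v`) and the distance case therefore share a single code path, where previously the summation was duplicated in both methods. The sketch below is a standalone approximation of that structure that runs without the gem. It uses `v ||= ...` in place of the hunk's `if v.nil?` block, which is equivalent here because `v` is either `nil` or an array.

```ruby
# Standalone sketch mirroring the refactored Euclidean module above:
# euclidean delegates to euclidean_squared and takes the square root.
def euclidean_squared(u, v = nil)
  # Omitting v measures the distance from the origin, i.e. the squared norm.
  v ||= Array.new(u.size, 0)
  raise ArgumentError, "sizes of u and v don't match" if u.size != v.size

  u.zip(v).reduce(0.0) { |acc, (a, b)| acc + (a - b)**2 }
end

def euclidean(u, v = nil)
  Math.sqrt(euclidean_squared(u, v))
end

euclidean([1, 1], [0, 0]) # => 1.4142135623730951
euclidean_squared([3, 4]) # => 25.0
```

Because the accumulator starts at `0.0`, `euclidean_squared([3, 4])` returns the Float `25.0`, which matches the `-> Float` in the call-seq. The README's `# => 25` is the same value written as an Integer.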