measurable 0.0.5 → 0.0.11
- checksums.yaml +5 -5
- data/.gitignore +3 -1
- data/.travis.yml +7 -0
- data/Gemfile +2 -1
- data/History.txt +11 -0
- data/README.md +29 -39
- data/Rakefile +18 -12
- data/lib/measurable.rb +6 -3
- data/lib/measurable/chebyshev.rb +24 -0
- data/lib/measurable/cosine.rb +66 -24
- data/lib/measurable/euclidean.rb +56 -68
- data/lib/measurable/hamming.rb +32 -0
- data/lib/measurable/haversine.rb +51 -47
- data/lib/measurable/jaccard.rb +54 -61
- data/lib/measurable/kullback_leibler.rb +39 -0
- data/lib/measurable/levenshtein.rb +57 -0
- data/lib/measurable/maxmin.rb +32 -28
- data/lib/measurable/minkowski.rb +44 -0
- data/lib/measurable/tanimoto.rb +46 -27
- data/lib/measurable/version.rb +2 -2
- data/measurable.gemspec +6 -4
- data/spec/chebyshev_spec.rb +48 -0
- data/spec/cosine_spec.rb +72 -24
- data/spec/euclidean_spec.rb +30 -14
- data/spec/hamming_spec.rb +46 -0
- data/spec/haversine_spec.rb +22 -2
- data/spec/jaccard_spec.rb +35 -14
- data/spec/kullback_leibler_spec.rb +46 -0
- data/spec/levenshtein_spec.rb +71 -0
- data/spec/maxmin_spec.rb +21 -2
- data/spec/minkowski_spec.rb +44 -0
- data/spec/spec_helper.rb +1 -1
- data/spec/tanimoto_spec.rb +23 -3
- metadata +53 -23
- data/Gemfile.lock +0 -27
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
-
-metadata.gz:
-data.tar.gz:
+SHA256:
+metadata.gz: 3ea2c713d50fac6bd342f46e18eb9ed8267ff65cfdbf6f1ee75c70bcc92d5b1c
+data.tar.gz: fa3c04483562118a4d875edce0d8afbcac5cf3ee5fa29e8b5b8d04b374058b73
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 46aa4474a64b5b9e7e5c3ee53adf7804d618941528d0722e0bdff82e5010dc9f978d553ea5c7779d7917f26a9bc0e0f4c993bd74653d7f81b698abc9aabbcf9d
+data.tar.gz: 9eaf669fef73e90a7dc4196939634f42c54f21471301c4913b9b46155feaf59a0755f88bf325a373d1242653fbe19e7f6d06fb4405e411a5e1aeef596a0b8713
data/.gitignore
CHANGED
data/.travis.yml
ADDED
data/Gemfile
CHANGED
data/History.txt
ADDED
@@ -0,0 +1,11 @@
+0.0.11 -- 22th June, 2020
+* Updated rake & rdoc
+* Updated Travis CI config
+* ... honestly, just getting back to this repository
+
+0.0.9 -- 16th April, 2015
+* Removed unnecessary argument length check from jaccard_index.
+* Host documentation on rubydoc.info.
+
+0.0.8 -- 18th May, 2014
+* Added Kullback-Leibler divergence.
data/README.md
CHANGED
@@ -1,22 +1,27 @@
|
|
1
1
|
# Measurable
|
2
2
|
|
3
|
-
|
3
|
+
[![Build Status](https://travis-ci.org/agarie/measurable.svg?branch=master)](https://travis-ci.org/agarie/measurable)
|
4
|
+
[![Code Climate](https://codeclimate.com/github/agarie/measurable.png)](https://codeclimate.com/github/agarie/measurable)
|
4
5
|
|
5
|
-
|
6
|
+
A gem to test what metric is best for certain kinds of datasets in machine
|
7
|
+
learning. Besides the `Array` class, I also want to support
|
8
|
+
[NMatrix](http://github.com/sciruby/nmatrix).
|
6
9
|
|
7
|
-
|
10
|
+
This is a fork of the gem [Distance Measure](https://github.com/reddavis/Distance-Measures),
|
11
|
+
which has a similar objective, but isn't actively maintained and doesn't support
|
12
|
+
NMatrix. Thank you, [@reddavis][reddavis]. :)
|
8
13
|
|
9
|
-
|
10
|
-
|
11
|
-
## Install
|
14
|
+
## Installation
|
12
15
|
|
13
16
|
`gem install measurable`
|
14
17
|
|
15
|
-
I
|
18
|
+
I test this gem (via Travis CI) on Ruby MRI 2.5, 2.6 and 2.7.
|
16
19
|
|
17
|
-
##
|
20
|
+
## Available distance measures
|
18
21
|
|
19
|
-
I'm using the term "distance measure" without much concern for the strict
|
22
|
+
I'm using the term "distance measure" without much concern for the strict
|
23
|
+
mathematical definition of a metric. If the documentation for one of the
|
24
|
+
methods isn't clear about whether it is a metric, please open an issue.
|
20
25
|
|
21
26
|
The following are the similarity measures supported at the moment:
|
22
27
|
|
@@ -27,52 +32,37 @@ The following are the similarity measures supported at the moment:
|
|
27
32
|
- Jaccard distance
|
28
33
|
- Tanimoto distance
|
29
34
|
- Haversine distance
|
30
|
-
|
31
|
-
These still need to be implemented:
|
32
|
-
|
33
|
-
- Cityblock distance
|
35
|
+
- Minkowski (aka Cityblock or Manhattan) distance
|
34
36
|
- Chebyshev distance
|
35
|
-
- Minkowski distance
|
36
37
|
- Hamming distance
|
37
|
-
-
|
38
|
-
-
|
39
|
-
- Kullback-Leibler divergence
|
40
|
-
- Jensen-Shannon divergence
|
41
|
-
- Mahalanobis distance
|
42
|
-
- Squared Mahalanobis distance
|
38
|
+
- [Levenshtein distance](http://en.wikipedia.org/wiki/Levenshtein_distance)
|
39
|
+
- [Kullback-Leibler divergence](http://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence)
|
43
40
|
|
44
41
|
## How to use
|
45
42
|
|
46
43
|
The API I intend to support is something like this:
|
47
44
|
|
48
45
|
```ruby
|
49
|
-
require
|
50
|
-
|
51
|
-
u = NVector.ones(2)
|
52
|
-
v = NVector.zeros(2)
|
53
|
-
w = [1, 0]
|
54
|
-
x = [2, 2]
|
46
|
+
require 'measurable'
|
55
47
|
|
56
48
|
# Calculate the distance between two points in space.
|
57
|
-
Measurable.euclidean(
|
58
|
-
Measurable.euclidean(w, v) # => 1.00000
|
59
|
-
Measurable.cosine([1, 2], [2, 3]) # => 0.00772
|
49
|
+
Measurable.euclidean([1, 1], [0, 0]) # => 1.41421
|
60
50
|
|
61
51
|
# Calculate the norm of a vector, i.e. its distance from the origin.
|
62
|
-
Measurable.
|
63
|
-
```
|
52
|
+
Measurable.euclidean([1, 1]) # => 1.4142135623730951
|
64
53
|
|
65
|
-
|
66
|
-
|
67
|
-
`RDoc` syntax is used to document the project. To build it locally, you'll need to install the [Fivefish generator](https://github.com/ged/rdoc-generator-fivefish) (`gem install rdoc-generator-fivefish`) and run the following command:
|
54
|
+
# Get the cosine distance between two vectors.
|
55
|
+
Measurable.cosine_distance([1, 2], [2, 3]) # => 0.007722123286332261
|
68
56
|
|
69
|
-
|
70
|
-
|
57
|
+
# Calculate sum of squares directly.
|
58
|
+
Measurable.euclidean_squared([3, 4]) # => 25.0
|
71
59
|
```
|
72
60
|
|
73
|
-
|
61
|
+
Most of the methods accept arbitrary enumerable objects instead of Arrays. For example, it's possible to use [NMatrix](https://github.com/sciruby/nmatrix).
|
62
|
+
|
63
|
+
## Documentation
|
74
64
|
|
75
|
-
|
65
|
+
The documentation is hosted on [rubydoc](http://www.rubydoc.info/github/agarie/measurable).
|
76
66
|
|
77
67
|
## License
|
78
68
|
|
@@ -81,4 +71,4 @@ See LICENSE for details.
|
|
81
71
|
The original `distance_measures` gem is copyrighted by [@reddavis][reddavis].
|
82
72
|
|
83
73
|
[maxmin]: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=05156398
|
84
|
-
[reddavis]: (https://github.com/reddavis)
|
74
|
+
[reddavis]: (https://github.com/reddavis)
|
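The README's note that most methods accept arbitrary enumerable objects can be illustrated with a minimal sketch. `EnumerableSketch` below is a hypothetical stand-in that mirrors the `euclidean_squared` body shown in `lib/measurable/euclidean.rb` in this diff; it is not the gem's module, and it assumes the inputs respond to both `size` and `zip`.

```ruby
# Hypothetical stand-in for Measurable.euclidean_squared, mirroring the body in
# lib/measurable/euclidean.rb from this diff. Not the gem's own module.
module EnumerableSketch
  def self.euclidean_squared(u, v = nil)
    # With a single argument, measure against the origin.
    v = Array.new(u.size, 0) if v.nil?
    raise ArgumentError if u.size != v.size

    u.zip(v).reduce(0.0) { |acc, (a, b)| acc + (a - b)**2 }
  end
end

# Arrays and Ranges both respond to size and zip, so either works here.
puts EnumerableSketch.euclidean_squared([3, 4])      # prints 25.0
puts EnumerableSketch.euclidean_squared(1..3, 4..6)  # prints 27.0
```

Because the accumulator starts at `0.0`, the result is a Float even for integer input.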
data/Rakefile
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
require 'rake'
|
2
2
|
require 'bundler/gem_tasks'
|
3
3
|
require "rspec/core/rake_task"
|
4
|
-
|
4
|
+
require 'rdoc/task'
|
5
5
|
|
6
6
|
# Setup the necessary gems, specified in the gemspec.
|
7
7
|
require 'bundler'
|
@@ -13,20 +13,26 @@ rescue Bundler::BundlerError => e
|
|
13
13
|
exit e.status_code
|
14
14
|
end
|
15
15
|
|
16
|
+
task :default => [:spec]
|
17
|
+
|
16
18
|
# Run all the specs.
|
17
19
|
RSpec::Core::RakeTask.new(:spec)
|
18
20
|
|
19
|
-
|
20
|
-
|
21
|
-
|
22
|
-
|
23
|
-
|
24
|
-
|
25
|
-
|
26
|
-
|
27
|
-
|
28
|
-
|
29
|
-
|
21
|
+
RDoc::Task.new do |rdoc|
|
22
|
+
rdoc.main = "README.md"
|
23
|
+
rdoc.rdoc_files.include("README.md", "LICENSE", "lib")
|
24
|
+
rdoc.generator = "fivefish"
|
25
|
+
rdoc.external = true
|
26
|
+
end
|
27
|
+
|
28
|
+
desc "Open IRB with Measurable loaded."
|
29
|
+
task :console do
|
30
|
+
require 'irb'
|
31
|
+
require 'irb/completion'
|
32
|
+
require 'measurable'
|
33
|
+
ARGV.clear
|
34
|
+
IRB.start
|
35
|
+
end
|
30
36
|
|
31
37
|
# Compile task.
|
32
38
|
# Rake::ExtensionTask.new do |ext|
|
data/lib/measurable.rb
CHANGED
@@ -2,15 +2,18 @@ require 'measurable/version'
|
|
2
2
|
|
3
3
|
# Distance measures. The require order is important.
|
4
4
|
require 'measurable/euclidean'
|
5
|
+
require 'measurable/minkowski'
|
5
6
|
require 'measurable/cosine'
|
6
7
|
require 'measurable/jaccard'
|
7
8
|
require 'measurable/tanimoto'
|
8
|
-
require 'measurable/
|
9
|
+
require 'measurable/chebyshev'
|
9
10
|
require 'measurable/maxmin'
|
11
|
+
require 'measurable/haversine'
|
12
|
+
require 'measurable/hamming'
|
13
|
+
require 'measurable/levenshtein'
|
14
|
+
require 'measurable/kullback_leibler'
|
10
15
|
|
11
16
|
module Measurable
|
12
17
|
# PI / 180 degrees.
|
13
18
|
RAD_PER_DEG = Math::PI / 180
|
14
|
-
|
15
|
-
extend self # expose all instance methods as singleton methods.
|
16
19
|
end
|
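This hunk drops `extend self` from the top-level module; in the files that follow, each distance lives in its own submodule and the file ends with `extend Measurable::<Name>`. A minimal sketch of that pattern, using hypothetical names (`Demo`, `Demo::Manhattan`, `Point`) that do not appear in the gem:

```ruby
# Hypothetical names throughout; this only illustrates the module layout.
module Demo
  module Manhattan
    # Sum of absolute coordinate-wise differences.
    def manhattan(u, v)
      raise ArgumentError if u.size != v.size

      u.zip(v).sum { |a, b| (a - b).abs }
    end
  end

  # Expose the submodule's instance methods as Demo.manhattan.
  extend Demo::Manhattan
end

# The same submodule can also be mixed into a class.
class Point
  include Demo::Manhattan

  attr_reader :coords

  def initialize(coords)
    @coords = coords
  end

  def distance_to(other)
    manhattan(coords, other.coords)
  end
end

puts Demo.manhattan([1, 2], [4, 6])                    # prints 7
puts Point.new([0, 0]).distance_to(Point.new([3, 4]))  # prints 7
```

With this layout the method is reachable both as a singleton call on the top-level module and as a mixin, without the top-level module having to extend itself.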
data/lib/measurable/chebyshev.rb
ADDED
@@ -0,0 +1,24 @@
|
|
1
|
+
module Measurable
|
2
|
+
module Chebyshev
|
3
|
+
|
4
|
+
# call-seq:
|
5
|
+
# chebyshev(u, v) -> Float
|
6
|
+
#
|
7
|
+
# Arguments:
|
8
|
+
# - +u+ -> An array of Numeric objects.
|
9
|
+
# - +v+ -> An array of Numeric objects.
|
10
|
+
# Returns:
|
11
|
+
# - The L-infinity distance between +u+ and +v+.
|
12
|
+
# Raises:
|
13
|
+
# - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
|
14
|
+
def chebyshev(u, v)
|
15
|
+
# TODO: Change this to a more specific, custom-made exception.
|
16
|
+
raise ArgumentError if u.size != v.size
|
17
|
+
|
18
|
+
abs_differences = u.zip(v).map { |a| (a[0] - a[1]).abs }
|
19
|
+
abs_differences.max
|
20
|
+
end
|
21
|
+
end
|
22
|
+
|
23
|
+
extend Measurable::Chebyshev
|
24
|
+
end
|
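A small worked example of the Chebyshev distance added above. `ChebyshevSketch` is a hypothetical stand-in that mirrors the method body from this diff so the snippet runs without the gem installed.

```ruby
# Hypothetical stand-in mirroring the chebyshev body from this diff.
module ChebyshevSketch
  def self.chebyshev(u, v)
    raise ArgumentError if u.size != v.size

    # Largest absolute coordinate-wise difference.
    u.zip(v).map { |a, b| (a - b).abs }.max
  end
end

# Differences are 3, 3 and 5, so the distance is 5.
puts ChebyshevSketch.chebyshev([1, 5, 3], [4, 2, 8])  # prints 5
p ChebyshevSketch.chebyshev([], [])                   # prints nil
```

Note that the documented `-> Float` return type only holds for Float input: with integers this logic returns an Integer, and with two empty arrays it returns `nil`.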
data/lib/measurable/cosine.rb
CHANGED
@@ -1,27 +1,69 @@
|
|
1
|
+
require 'measurable/euclidean'
|
2
|
+
|
1
3
|
module Measurable
|
4
|
+
module Cosine
|
5
|
+
|
6
|
+
# call-seq:
|
7
|
+
# cosine_similarity(u, v) -> Float
|
8
|
+
#
|
9
|
+
# Calculate the cosine similarity between the orientation of two vectors.
|
10
|
+
#
|
11
|
+
# See: http://en.wikipedia.org/wiki/Cosine_similarity
|
12
|
+
#
|
13
|
+
# Arguments:
|
14
|
+
# - +u+ -> An array of Numeric objects.
|
15
|
+
# - +v+ -> An array of Numeric objects.
|
16
|
+
# Returns:
|
17
|
+
# - The normalized dot product of +u+ and +v+, that is, the cosine of the angle
|
18
|
+
# between them in the n-dimensional space.
|
19
|
+
# Raises:
|
20
|
+
# - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
|
21
|
+
#
|
22
|
+
def cosine_similarity(u, v)
|
23
|
+
# TODO: Change this to a more specific, custom-made exception.
|
24
|
+
raise ArgumentError if u.size != v.size
|
25
|
+
|
26
|
+
dot_product = u.zip(v).reduce(0.0) { |acc, ary| acc += ary[0] * ary[1] }
|
27
|
+
|
28
|
+
dot_product / (euclidean(u) * euclidean(v))
|
29
|
+
end
|
2
30
|
|
3
|
-
|
4
|
-
|
5
|
-
|
6
|
-
|
7
|
-
|
8
|
-
|
9
|
-
|
10
|
-
|
11
|
-
|
12
|
-
|
13
|
-
|
14
|
-
|
15
|
-
|
16
|
-
|
17
|
-
|
18
|
-
|
19
|
-
|
20
|
-
|
21
|
-
|
22
|
-
|
23
|
-
|
24
|
-
|
25
|
-
|
31
|
+
# call-seq:
|
32
|
+
# cosine_distance(u, v) -> Float
|
33
|
+
#
|
34
|
+
# Calculate the cosine distance between the orientation of two vectors.
|
35
|
+
#
|
36
|
+
# See: http://en.wikipedia.org/wiki/Cosine_similarity
|
37
|
+
#
|
38
|
+
# Arguments:
|
39
|
+
# - +u+ -> An array of Numeric objects.
|
40
|
+
# - +v+ -> An array of Numeric objects.
|
41
|
+
# Returns:
|
42
|
+
# - One minus the cosine similarity of +u+ and +v+, i.e. 1 minus the cosine of
|
43
|
+
# the angle between them in the n-dimensional space.
|
44
|
+
# Raises:
|
45
|
+
# - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
|
46
|
+
def cosine_distance(u, v)
|
47
|
+
# TODO: Change this to a more specific, custom-made exception.
|
48
|
+
raise ArgumentError if u.size != v.size
|
49
|
+
|
50
|
+
1 - cosine_similarity(u, v)
|
51
|
+
end
|
52
|
+
|
53
|
+
def self.extended(base) # :nodoc:
|
54
|
+
base.instance_eval do
|
55
|
+
extend Measurable::Euclidean
|
56
|
+
end
|
57
|
+
super
|
58
|
+
end
|
59
|
+
|
60
|
+
def self.included(base) # :nodoc:
|
61
|
+
base.class_eval do
|
62
|
+
include Measurable::Euclidean
|
63
|
+
end
|
64
|
+
super
|
65
|
+
end
|
26
66
|
end
|
27
|
-
|
67
|
+
|
68
|
+
extend Measurable::Cosine
|
69
|
+
end
|
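The relationship between the two methods in `cosine.rb` can be sketched as follows. `CosineSketch` and its `norm` helper are hypothetical stand-ins (the gem calls `euclidean(u)` for the norm); the arithmetic follows the bodies shown in this diff.

```ruby
# Hypothetical stand-in for the Cosine module; norm replaces euclidean(u).
module CosineSketch
  def self.norm(u)
    Math.sqrt(u.sum { |a| a * a })
  end

  def self.cosine_similarity(u, v)
    raise ArgumentError if u.size != v.size

    dot_product = u.zip(v).reduce(0.0) { |acc, (a, b)| acc + a * b }
    dot_product / (norm(u) * norm(v))
  end

  def self.cosine_distance(u, v)
    1 - cosine_similarity(u, v)
  end
end

puts CosineSketch.cosine_distance([1, 0], [0, 1])  # prints 1.0 (orthogonal vectors)
puts CosineSketch.cosine_distance([1, 2], [2, 3])  # roughly 0.0077, as in the README
p CosineSketch.cosine_similarity([0, 0], [1, 2])   # prints NaN
```

The last call shows an edge case of this formula: a zero vector has norm 0.0, so the Float division yields `NaN` rather than raising.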
data/lib/measurable/euclidean.rb
CHANGED
@@ -1,76 +1,64 @@
|
|
1
1
|
module Measurable
|
2
|
+
module Euclidean
|
2
3
|
|
3
|
-
|
4
|
-
|
5
|
-
|
6
|
-
|
7
|
-
|
8
|
-
|
9
|
-
|
10
|
-
|
11
|
-
|
12
|
-
|
13
|
-
|
14
|
-
|
15
|
-
|
16
|
-
|
17
|
-
|
18
|
-
|
19
|
-
|
20
|
-
|
21
|
-
|
22
|
-
def euclidean(u, v = nil)
|
23
|
-
# If the second argument is nil, the method should return the norm of
|
24
|
-
# vector u. For this, we need the distance between u and the origin.
|
25
|
-
if v.nil?
|
26
|
-
v = Array.new(u.size, 0)
|
4
|
+
# call-seq:
|
5
|
+
# euclidean(u) -> Float
|
6
|
+
# euclidean(u, v) -> Float
|
7
|
+
#
|
8
|
+
# Calculate the ordinary distance between arrays +u+ and +v+.
|
9
|
+
#
|
10
|
+
# If +v+ isn't given, calculate the Euclidean norm of +u+.
|
11
|
+
#
|
12
|
+
# See: http://en.wikipedia.org/wiki/Euclidean_distance#N_dimensions
|
13
|
+
#
|
14
|
+
# Arguments:
|
15
|
+
# - +u+ -> An array of Numeric objects.
|
16
|
+
# - +v+ -> (Optional) An array of Numeric objects.
|
17
|
+
# Returns:
|
18
|
+
# - The euclidean norm of +u+ or the euclidean distance between +u+ and +v+.
|
19
|
+
# Raises:
|
20
|
+
# - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
|
21
|
+
def euclidean(u, v = nil)
|
22
|
+
Math.sqrt(self.euclidean_squared(u, v))
|
27
23
|
end
|
28
24
|
|
29
|
-
#
|
30
|
-
|
25
|
+
# call-seq:
|
26
|
+
# euclidean_squared(u) -> Float
|
27
|
+
# euclidean_squared(u, v) -> Float
|
28
|
+
#
|
29
|
+
# Calculate the same value as euclidean(u, v), but don't take the square root
|
30
|
+
# of it.
|
31
|
+
#
|
32
|
+
# This isn't a metric in the strict sense, i.e. it doesn't respect the
|
33
|
+
# triangle inequality. However, the squared Euclidean distance is very useful
|
34
|
+
# whenever only the relative values of distances are important, for example
|
35
|
+
# in optimization problems.
|
36
|
+
#
|
37
|
+
# See: http://en.wikipedia.org/wiki/Euclidean_distance#Squared_Euclidean_distance
|
38
|
+
#
|
39
|
+
# Arguments:
|
40
|
+
# - +u+ -> An array of Numeric objects.
|
41
|
+
# - +v+ -> (Optional) An array of Numeric objects.
|
42
|
+
# Returns:
|
43
|
+
# - The squared value of the euclidean norm of +u+ or of the euclidean
|
44
|
+
# distance between +u+ and +v+.
|
45
|
+
# Raises:
|
46
|
+
# - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
|
47
|
+
def euclidean_squared(u, v = nil)
|
48
|
+
# If the second argument is nil, the method should return the norm of
|
49
|
+
# vector u. For this, we need the distance between u and the origin.
|
50
|
+
if v.nil?
|
51
|
+
v = Array.new(u.size, 0)
|
52
|
+
end
|
31
53
|
|
32
|
-
|
33
|
-
|
34
|
-
end
|
35
|
-
|
36
|
-
Math.sqrt(sum)
|
37
|
-
end
|
54
|
+
# TODO: Change this to a more specific, custom-made exception.
|
55
|
+
raise ArgumentError if u.size != v.size
|
38
56
|
|
39
|
-
|
40
|
-
|
41
|
-
|
42
|
-
#
|
43
|
-
# Calculate the same value as euclidean(u, v), but don't take the square root
|
44
|
-
# of it.
|
45
|
-
#
|
46
|
-
# This isn't a metric in the strict sense, i.e. it doesn't respect the
|
47
|
-
# triangle inequality. However, the squared Euclidean distance is very useful
|
48
|
-
# whenever only the relative values of distances are important, for example
|
49
|
-
# in optimization problems.
|
50
|
-
#
|
51
|
-
# See: http://en.wikipedia.org/wiki/Euclidean_distance#Squared_Euclidean_distance
|
52
|
-
#
|
53
|
-
# * *Arguments* :
|
54
|
-
# - +u+ -> An array of Numeric objects.
|
55
|
-
# - +v+ -> (Optional) An array of Numeric objects.
|
56
|
-
# * *Returns* :
|
57
|
-
# - The squared value of the euclidean norm of +u+ or of the euclidean
|
58
|
-
# distance between +u+ and +v+.
|
59
|
-
# * *Raises* :
|
60
|
-
# - +ArgumentError+ -> The sizes of +u+ and +v+ doesn't match.
|
61
|
-
#
|
62
|
-
def euclidean_squared(u, v = nil)
|
63
|
-
# If the second argument is nil, the method should return the norm of
|
64
|
-
# vector u. For this, we need the distance between u and the origin.
|
65
|
-
if v.nil?
|
66
|
-
v = Array.new(u.size, 0)
|
67
|
-
end
|
68
|
-
|
69
|
-
# TODO: Change this to a more specific, custom-made exception.
|
70
|
-
raise ArgumentError if u.size != v.size
|
71
|
-
|
72
|
-
u.zip(v).reduce(0.0) do |acc, ary|
|
73
|
-
acc += (ary[0] - ary[-1]) ** 2
|
57
|
+
u.zip(v).reduce(0.0) do |acc, ary|
|
58
|
+
acc += (ary[0] - ary[-1]) ** 2
|
59
|
+
end
|
74
60
|
end
|
75
61
|
end
|
76
|
-
|
62
|
+
|
63
|
+
extend Measurable::Euclidean
|
64
|
+
end
|
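The two claims in the `euclidean_squared` comment (it violates the triangle inequality, yet preserves the relative order of distances) can be checked with a short sketch. `squared_distance` is a hypothetical helper that is equivalent in spirit to `euclidean_squared(u, v)` but is not taken from the gem.

```ruby
# Hypothetical helper, equivalent in spirit to euclidean_squared(u, v).
def squared_distance(u, v)
  raise ArgumentError if u.size != v.size

  u.zip(v).sum { |a, b| (a - b)**2 }
end

# Three collinear points 0, 1, 2: the direct value exceeds the sum of the
# two legs, so the triangle inequality fails for the squared distance.
puts squared_distance([0], [2])                               # prints 4
puts squared_distance([0], [1]) + squared_distance([1], [2])  # prints 2

# Ranking by squared distance matches ranking by the true distance.
query  = [0, 0]
points = [[3, 4], [1, 1], [0, 2]]
by_squared = points.sort_by { |pt| squared_distance(query, pt) }
by_root    = points.sort_by { |pt| Math.sqrt(squared_distance(query, pt)) }
p by_squared == by_root  # prints true
```

Since the square root is monotonic, skipping it changes no comparison, which is why the squared form is enough for nearest-neighbour style searches.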