measurable 0.0.5 → 0.0.11
- checksums.yaml +5 -5
- data/.gitignore +3 -1
- data/.travis.yml +7 -0
- data/Gemfile +2 -1
- data/History.txt +11 -0
- data/README.md +29 -39
- data/Rakefile +18 -12
- data/lib/measurable.rb +6 -3
- data/lib/measurable/chebyshev.rb +24 -0
- data/lib/measurable/cosine.rb +66 -24
- data/lib/measurable/euclidean.rb +56 -68
- data/lib/measurable/hamming.rb +32 -0
- data/lib/measurable/haversine.rb +51 -47
- data/lib/measurable/jaccard.rb +54 -61
- data/lib/measurable/kullback_leibler.rb +39 -0
- data/lib/measurable/levenshtein.rb +57 -0
- data/lib/measurable/maxmin.rb +32 -28
- data/lib/measurable/minkowski.rb +44 -0
- data/lib/measurable/tanimoto.rb +46 -27
- data/lib/measurable/version.rb +2 -2
- data/measurable.gemspec +6 -4
- data/spec/chebyshev_spec.rb +48 -0
- data/spec/cosine_spec.rb +72 -24
- data/spec/euclidean_spec.rb +30 -14
- data/spec/hamming_spec.rb +46 -0
- data/spec/haversine_spec.rb +22 -2
- data/spec/jaccard_spec.rb +35 -14
- data/spec/kullback_leibler_spec.rb +46 -0
- data/spec/levenshtein_spec.rb +71 -0
- data/spec/maxmin_spec.rb +21 -2
- data/spec/minkowski_spec.rb +44 -0
- data/spec/spec_helper.rb +1 -1
- data/spec/tanimoto_spec.rb +23 -3
- metadata +53 -23
- data/Gemfile.lock +0 -27
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
|
-
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
2
|
+
SHA256:
|
3
|
+
metadata.gz: 3ea2c713d50fac6bd342f46e18eb9ed8267ff65cfdbf6f1ee75c70bcc92d5b1c
|
4
|
+
data.tar.gz: fa3c04483562118a4d875edce0d8afbcac5cf3ee5fa29e8b5b8d04b374058b73
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 46aa4474a64b5b9e7e5c3ee53adf7804d618941528d0722e0bdff82e5010dc9f978d553ea5c7779d7917f26a9bc0e0f4c993bd74653d7f81b698abc9aabbcf9d
|
7
|
+
data.tar.gz: 9eaf669fef73e90a7dc4196939634f42c54f21471301c4913b9b46155feaf59a0755f88bf325a373d1242653fbe19e7f6d06fb4405e411a5e1aeef596a0b8713
|
data/.gitignore
CHANGED
data/.travis.yml
ADDED
data/Gemfile
CHANGED
data/History.txt
ADDED
@@ -0,0 +1,11 @@
|
|
1
|
+
0.0.11 -- 22th June, 2020
|
2
|
+
* Updated rake & rdoc
|
3
|
+
* Updated Travis CI config
|
4
|
+
* ... honestly, just getting back to this repository
|
5
|
+
|
6
|
+
0.0.9 -- 16th April, 2015
|
7
|
+
* Removed unnecessary argument length check from jaccard_index.
|
8
|
+
* Host documentation on rubydoc.info.
|
9
|
+
|
10
|
+
0.0.8 -- 18th May, 2014
|
11
|
+
* Added Kullback-Leibler divergence.
|
data/README.md
CHANGED
@@ -1,22 +1,27 @@
|
|
1
1
|
# Measurable
|
2
2
|
|
3
|
-
|
3
|
+
[](https://travis-ci.org/agarie/measurable)
|
4
|
+
[](https://codeclimate.com/github/agarie/measurable)
|
4
5
|
|
5
|
-
|
6
|
+
A gem to test what metric is best for certain kinds of datasets in machine
|
7
|
+
learning. Besides the `Array` class, I also want to support
|
8
|
+
[NMatrix](http://github.com/sciruby/nmatrix).
|
6
9
|
|
7
|
-
|
10
|
+
This is a fork of the gem [Distance Measure](https://github.com/reddavis/Distance-Measures),
|
11
|
+
which has a similar objective, but isn't actively maintained and doesn't support
|
12
|
+
NMatrix. Thank you, [@reddavis][reddavis]. :)
|
8
13
|
|
9
|
-
|
10
|
-
|
11
|
-
## Install
|
14
|
+
## Installation
|
12
15
|
|
13
16
|
`gem install measurable`
|
14
17
|
|
15
|
-
I
|
18
|
+
I test this gem (via Travis CI) on Ruby MRI 2.5, 2.6 and 2.7.
|
16
19
|
|
17
|
-
##
|
20
|
+
## Available distance measures
|
18
21
|
|
19
|
-
I'm using the term "distance measure" without much concern for the strict
|
22
|
+
I'm using the term "distance measure" without much concern for the strict
|
23
|
+
mathematical definition of a metric. If the documentation for one of the
|
24
|
+
methods isn't clear about it being or not a metric, please open an issue.
|
20
25
|
|
21
26
|
The following are the similarity measures supported at the moment:
|
22
27
|
|
@@ -27,52 +32,37 @@ The following are the similarity measures supported at the moment:
|
|
27
32
|
- Jaccard distance
|
28
33
|
- Tanimoto distance
|
29
34
|
- Haversine distance
|
30
|
-
|
31
|
-
These still need to be implemented:
|
32
|
-
|
33
|
-
- Cityblock distance
|
35
|
+
- Minkowski (aka Cityblock or Manhattan) distance
|
34
36
|
- Chebyshev distance
|
35
|
-
- Minkowski distance
|
36
37
|
- Hamming distance
|
37
|
-
-
|
38
|
-
-
|
39
|
-
- Kullback-Leibler divergence
|
40
|
-
- Jensen-Shannon divergence
|
41
|
-
- Mahalanobis distance
|
42
|
-
- Squared Mahalanobis distance
|
38
|
+
- [Levenshtein distance](http://en.wikipedia.org/wiki/Levenshtein_distance)
|
39
|
+
- [Kullback-Leibler divergence](http://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence)
|
43
40
|
|
44
41
|
## How to use
|
45
42
|
|
46
43
|
The API I intend to support is something like this:
|
47
44
|
|
48
45
|
```ruby
|
49
|
-
require
|
50
|
-
|
51
|
-
u = NVector.ones(2)
|
52
|
-
v = NVector.zeros(2)
|
53
|
-
w = [1, 0]
|
54
|
-
x = [2, 2]
|
46
|
+
require 'measurable'
|
55
47
|
|
56
48
|
# Calculate the distance between two points in space.
|
57
|
-
Measurable.euclidean(
|
58
|
-
Measurable.euclidean(w, v) # => 1.00000
|
59
|
-
Measurable.cosine([1, 2], [2, 3]) # => 0.00772
|
49
|
+
Measurable.euclidean([1, 1], [0, 0]) # => 1.41421
|
60
50
|
|
61
51
|
# Calculate the norm of a vector, i.e. its distance from the origin.
|
62
|
-
Measurable.
|
63
|
-
```
|
52
|
+
Measurable.euclidean([1, 1]) # => 1.4142135623730951
|
64
53
|
|
65
|
-
|
66
|
-
|
67
|
-
`RDoc` syntax is used to document the project. To build it locally, you'll need to install the [Fivefish generator](https://github.com/ged/rdoc-generator-fivefish) (`gem install rdoc-generator-fivefish`) and run the following command:
|
54
|
+
# Get the cosine distance between
|
55
|
+
Measurable.cosine_distance([1, 2], [2, 3]) # => 0.007722123286332261
|
68
56
|
|
69
|
-
|
70
|
-
|
57
|
+
# Calculate sum of squares directly.
|
58
|
+
Measurable.euclidean_squared([3, 4]) # => 25
|
71
59
|
```
|
72
60
|
|
73
|
-
|
61
|
+
Most of the methods accept arbitrary enumerable objects instead of Arrays. For example, it's possible to use [NMatrix](https://github.com/sciruby/nmatrix).
|
62
|
+
|
63
|
+
## Documentation
|
74
64
|
|
75
|
-
|
65
|
+
The documentation is hosted on [rubydoc](http://www.rubydoc.info/github/agarie/measurable).
|
76
66
|
|
77
67
|
## License
|
78
68
|
|
@@ -81,4 +71,4 @@ See LICENSE for details.
|
|
81
71
|
The original `distance_measures` gem is copyrighted by [@reddavis][reddavis].
|
82
72
|
|
83
73
|
[maxmin]: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=05156398
|
84
|
-
[reddavis]: (https://github.com/reddavis)
|
74
|
+
[reddavis]: (https://github.com/reddavis)
|
data/Rakefile
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
require 'rake'
|
2
2
|
require 'bundler/gem_tasks'
|
3
3
|
require "rspec/core/rake_task"
|
4
|
-
|
4
|
+
require 'rdoc/task'
|
5
5
|
|
6
6
|
# Setup the necessary gems, specified in the gemspec.
|
7
7
|
require 'bundler'
|
@@ -13,20 +13,26 @@ rescue Bundler::BundlerError => e
|
|
13
13
|
exit e.status_code
|
14
14
|
end
|
15
15
|
|
16
|
+
task :default => [:spec]
|
17
|
+
|
16
18
|
# Run all the specs.
|
17
19
|
RSpec::Core::RakeTask.new(:spec)
|
18
20
|
|
19
|
-
|
20
|
-
|
21
|
-
|
22
|
-
|
23
|
-
|
24
|
-
|
25
|
-
|
26
|
-
|
27
|
-
|
28
|
-
|
29
|
-
|
21
|
+
RDoc::Task.new do |rdoc|
|
22
|
+
rdoc.main = "README.md"
|
23
|
+
rdoc.rdoc_files.include("README.md", "LICENSE", "lib")
|
24
|
+
rdoc.generator = "fivefish"
|
25
|
+
rdoc.external = true
|
26
|
+
end
|
27
|
+
|
28
|
+
desc "Open IRB with Measurable loaded."
|
29
|
+
task :console do
|
30
|
+
require 'irb'
|
31
|
+
require 'irb/completion'
|
32
|
+
require 'measurable'
|
33
|
+
ARGV.clear
|
34
|
+
IRB.start
|
35
|
+
end
|
30
36
|
|
31
37
|
# Compile task.
|
32
38
|
# Rake::ExtensionTask.new do |ext|
|
data/lib/measurable.rb
CHANGED
@@ -2,15 +2,18 @@ require 'measurable/version'
|
|
2
2
|
|
3
3
|
# Distance measures. The require order is important.
|
4
4
|
require 'measurable/euclidean'
|
5
|
+
require 'measurable/minkowski'
|
5
6
|
require 'measurable/cosine'
|
6
7
|
require 'measurable/jaccard'
|
7
8
|
require 'measurable/tanimoto'
|
8
|
-
require 'measurable/
|
9
|
+
require 'measurable/chebyshev'
|
9
10
|
require 'measurable/maxmin'
|
11
|
+
require 'measurable/haversine'
|
12
|
+
require 'measurable/hamming'
|
13
|
+
require 'measurable/levenshtein'
|
14
|
+
require 'measurable/kullback_leibler'
|
10
15
|
|
11
16
|
module Measurable
|
12
17
|
# PI / 180 degrees.
|
13
18
|
RAD_PER_DEG = Math::PI / 180
|
14
|
-
|
15
|
-
extend self # expose all instance methods as singleton methods.
|
16
19
|
end
|
data/lib/measurable/chebyshev.rb
ADDED
@@ -0,0 +1,24 @@
|
|
1
|
+
module Measurable
|
2
|
+
module Chebyshev
|
3
|
+
|
4
|
+
# call-seq:
|
5
|
+
# chebyshev(u, v) -> Float
|
6
|
+
#
|
7
|
+
# Arguments:
|
8
|
+
# - +u+ -> An array of Numeric objects.
|
9
|
+
# - +v+ -> An array of Numeric objects.
|
10
|
+
# Returns:
|
11
|
+
# - The L-infinite distance between +u+ and +v+.
|
12
|
+
# Raises:
|
13
|
+
# - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
|
14
|
+
def chebyshev(u, v)
|
15
|
+
# TODO: Change this to a more specific, custom-made exception.
|
16
|
+
raise ArgumentError if u.size != v.size
|
17
|
+
|
18
|
+
abs_differences = u.zip(v).map { |a| (a[0] - a[1]).abs }
|
19
|
+
abs_differences.max
|
20
|
+
end
|
21
|
+
end
|
22
|
+
|
23
|
+
extend Measurable::Chebyshev
|
24
|
+
end
|
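The Chebyshev (L-infinity) distance added above is the largest absolute coordinate-wise difference between two vectors. A minimal standalone sketch of the same idea follows. It does not load the gem, and the name `chebyshev_sketch` is hypothetical, chosen only for this illustration.

```ruby
# Standalone sketch; the measurable gem is not required here.
# `chebyshev_sketch` is a hypothetical name, not part of the gem's API.
def chebyshev_sketch(u, v)
  raise ArgumentError, "size mismatch" if u.size != v.size

  # Pair up coordinates, take absolute differences, keep the largest one.
  u.zip(v).map { |a, b| (a - b).abs }.max
end

puts chebyshev_sketch([1, 5, 2], [4, 1, 2]) # prints 4
```

For two empty arrays this sketch returns `nil`, because `max` of an empty array is `nil`.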
data/lib/measurable/cosine.rb
CHANGED
@@ -1,27 +1,69 @@
|
|
1
|
+
require 'measurable/euclidean'
|
2
|
+
|
1
3
|
module Measurable
|
4
|
+
module Cosine
|
5
|
+
|
6
|
+
# call-seq:
|
7
|
+
# cosine_similarity(u, v) -> Float
|
8
|
+
#
|
9
|
+
# Calculate the cosine similarity between the orientation of two vectors.
|
10
|
+
#
|
11
|
+
# See: http://en.wikipedia.org/wiki/Cosine_similarity
|
12
|
+
#
|
13
|
+
# Arguments:
|
14
|
+
# - +u+ -> An array of Numeric objects.
|
15
|
+
# - +v+ -> An array of Numeric objects.
|
16
|
+
# Returns:
|
17
|
+
# - The normalized dot product of +u+ and +v+, that is, the angle between
|
18
|
+
# them in the n-dimensional space.
|
19
|
+
# Raises:
|
20
|
+
# - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
|
21
|
+
#
|
22
|
+
def cosine_similarity(u, v)
|
23
|
+
# TODO: Change this to a more specific, custom-made exception.
|
24
|
+
raise ArgumentError if u.size != v.size
|
25
|
+
|
26
|
+
dot_product = u.zip(v).reduce(0.0) { |acc, ary| acc += ary[0] * ary[1] }
|
27
|
+
|
28
|
+
dot_product / (euclidean(u) * euclidean(v))
|
29
|
+
end
|
2
30
|
|
3
|
-
|
4
|
-
|
5
|
-
|
6
|
-
|
7
|
-
|
8
|
-
|
9
|
-
|
10
|
-
|
11
|
-
|
12
|
-
|
13
|
-
|
14
|
-
|
15
|
-
|
16
|
-
|
17
|
-
|
18
|
-
|
19
|
-
|
20
|
-
|
21
|
-
|
22
|
-
|
23
|
-
|
24
|
-
|
25
|
-
|
31
|
+
# call-seq:
|
32
|
+
# cosine_distance(u, v) -> Float
|
33
|
+
#
|
34
|
+
# Calculate the cosine distance between the orientation of two vectors.
|
35
|
+
#
|
36
|
+
# See: http://en.wikipedia.org/wiki/Cosine_similarity
|
37
|
+
#
|
38
|
+
# Arguments:
|
39
|
+
# - +u+ -> An array of Numeric objects.
|
40
|
+
# - +v+ -> An array of Numeric objects.
|
41
|
+
# Returns:
|
42
|
+
# - The normalized dot product of +u+ and +v+, that is, the angle between
|
43
|
+
# them in the n-dimensional space.
|
44
|
+
# Raises:
|
45
|
+
# - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
|
46
|
+
def cosine_distance(u, v)
|
47
|
+
# TODO: Change this to a more specific, custom-made exception.
|
48
|
+
raise ArgumentError if u.size != v.size
|
49
|
+
|
50
|
+
1 - cosine_similarity(u, v)
|
51
|
+
end
|
52
|
+
|
53
|
+
def self.extended(base) # :nodoc:
|
54
|
+
base.instance_eval do
|
55
|
+
extend Measurable::Euclidean
|
56
|
+
end
|
57
|
+
super
|
58
|
+
end
|
59
|
+
|
60
|
+
def self.included(base) # :nodoc:
|
61
|
+
base.class_eval do
|
62
|
+
include Measurable::Euclidean
|
63
|
+
end
|
64
|
+
super
|
65
|
+
end
|
26
66
|
end
|
27
|
-
|
67
|
+
|
68
|
+
extend Measurable::Cosine
|
69
|
+
end
|
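The relationship between the two cosine methods above can be checked by hand: the similarity is the dot product divided by the product of the Euclidean norms (the cosine of the angle between the vectors, rather than the angle itself), and the distance is one minus that value. The sketch below is standalone and assumes nothing about the gem beyond what the diff shows. All helper names are hypothetical.

```ruby
# Standalone sketch; helper names are hypothetical and the gem is not loaded.
def norm_sketch(u)
  Math.sqrt(u.sum { |x| x * x })
end

def cosine_similarity_sketch(u, v)
  raise ArgumentError, "size mismatch" if u.size != v.size

  dot = u.zip(v).sum { |a, b| a * b }
  # Dividing by a Float norm product yields a Float even for Integer input.
  dot / (norm_sketch(u) * norm_sketch(v))
end

def cosine_distance_sketch(u, v)
  1 - cosine_similarity_sketch(u, v)
end

# Roughly 0.0077, in line with the value quoted in the README example.
puts cosine_distance_sketch([1, 2], [2, 3])
```

A zero vector makes the denominator zero, so this sketch returns `NaN` in that case. Nothing in the diff shows a guard for it, so callers may want to check for zero vectors themselves.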
data/lib/measurable/euclidean.rb
CHANGED
@@ -1,76 +1,64 @@
|
|
1
1
|
module Measurable
|
2
|
+
module Euclidean
|
2
3
|
|
3
|
-
|
4
|
-
|
5
|
-
|
6
|
-
|
7
|
-
|
8
|
-
|
9
|
-
|
10
|
-
|
11
|
-
|
12
|
-
|
13
|
-
|
14
|
-
|
15
|
-
|
16
|
-
|
17
|
-
|
18
|
-
|
19
|
-
|
20
|
-
|
21
|
-
|
22
|
-
def euclidean(u, v = nil)
|
23
|
-
# If the second argument is nil, the method should return the norm of
|
24
|
-
# vector u. For this, we need the distance between u and the origin.
|
25
|
-
if v.nil?
|
26
|
-
v = Array.new(u.size, 0)
|
4
|
+
# call-seq:
|
5
|
+
# euclidean(u) -> Float
|
6
|
+
# euclidean(u, v) -> Float
|
7
|
+
#
|
8
|
+
# Calculate the ordinary distance between arrays +u+ and +v+.
|
9
|
+
#
|
10
|
+
# If +v+ isn't given, calculate the Euclidean norm of +u+.
|
11
|
+
#
|
12
|
+
# See: http://en.wikipedia.org/wiki/Euclidean_distance#N_dimensions
|
13
|
+
#
|
14
|
+
# Arguments:
|
15
|
+
# - +u+ -> An array of Numeric objects.
|
16
|
+
# - +v+ -> (Optional) An array of Numeric objects.
|
17
|
+
# Returns:
|
18
|
+
# - The euclidean norm of +u+ or the euclidean distance between +u+ and +v+.
|
19
|
+
# Raises:
|
20
|
+
# - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
|
21
|
+
def euclidean(u, v = nil)
|
22
|
+
Math.sqrt(self.euclidean_squared(u, v))
|
27
23
|
end
|
28
24
|
|
29
|
-
#
|
30
|
-
|
25
|
+
# call-seq:
|
26
|
+
# euclidean_squared(u) -> Float
|
27
|
+
# euclidean_squared(u, v) -> Float
|
28
|
+
#
|
29
|
+
# Calculate the same value as euclidean(u, v), but don't take the square root
|
30
|
+
# of it.
|
31
|
+
#
|
32
|
+
# This isn't a metric in the strict sense, i.e. it doesn't respect the
|
33
|
+
# triangle inequality. However, the squared Euclidean distance is very useful
|
34
|
+
# whenever only the relative values of distances are important, for example
|
35
|
+
# in optimization problems.
|
36
|
+
#
|
37
|
+
# See: http://en.wikipedia.org/wiki/Euclidean_distance#Squared_Euclidean_distance
|
38
|
+
#
|
39
|
+
# Arguments:
|
40
|
+
# - +u+ -> An array of Numeric objects.
|
41
|
+
# - +v+ -> (Optional) An array of Numeric objects.
|
42
|
+
# Returns:
|
43
|
+
# - The squared value of the euclidean norm of +u+ or of the euclidean
|
44
|
+
# distance between +u+ and +v+.
|
45
|
+
# Raises:
|
46
|
+
# - +ArgumentError+ -> The sizes of +u+ and +v+ don't match.
|
47
|
+
def euclidean_squared(u, v = nil)
|
48
|
+
# If the second argument is nil, the method should return the norm of
|
49
|
+
# vector u. For this, we need the distance between u and the origin.
|
50
|
+
if v.nil?
|
51
|
+
v = Array.new(u.size, 0)
|
52
|
+
end
|
31
53
|
|
32
|
-
|
33
|
-
|
34
|
-
end
|
35
|
-
|
36
|
-
Math.sqrt(sum)
|
37
|
-
end
|
54
|
+
# TODO: Change this to a more specific, custom-made exception.
|
55
|
+
raise ArgumentError if u.size != v.size
|
38
56
|
|
39
|
-
|
40
|
-
|
41
|
-
|
42
|
-
#
|
43
|
-
# Calculate the same value as euclidean(u, v), but don't take the square root
|
44
|
-
# of it.
|
45
|
-
#
|
46
|
-
# This isn't a metric in the strict sense, i.e. it doesn't respect the
|
47
|
-
# triangle inequality. However, the squared Euclidean distance is very useful
|
48
|
-
# whenever only the relative values of distances are important, for example
|
49
|
-
# in optimization problems.
|
50
|
-
#
|
51
|
-
# See: http://en.wikipedia.org/wiki/Euclidean_distance#Squared_Euclidean_distance
|
52
|
-
#
|
53
|
-
# * *Arguments* :
|
54
|
-
# - +u+ -> An array of Numeric objects.
|
55
|
-
# - +v+ -> (Optional) An array of Numeric objects.
|
56
|
-
# * *Returns* :
|
57
|
-
# - The squared value of the euclidean norm of +u+ or of the euclidean
|
58
|
-
# distance between +u+ and +v+.
|
59
|
-
# * *Raises* :
|
60
|
-
# - +ArgumentError+ -> The sizes of +u+ and +v+ doesn't match.
|
61
|
-
#
|
62
|
-
def euclidean_squared(u, v = nil)
|
63
|
-
# If the second argument is nil, the method should return the norm of
|
64
|
-
# vector u. For this, we need the distance between u and the origin.
|
65
|
-
if v.nil?
|
66
|
-
v = Array.new(u.size, 0)
|
67
|
-
end
|
68
|
-
|
69
|
-
# TODO: Change this to a more specific, custom-made exception.
|
70
|
-
raise ArgumentError if u.size != v.size
|
71
|
-
|
72
|
-
u.zip(v).reduce(0.0) do |acc, ary|
|
73
|
-
acc += (ary[0] - ary[-1]) ** 2
|
57
|
+
u.zip(v).reduce(0.0) do |acc, ary|
|
58
|
+
acc += (ary[0] - ary[-1]) ** 2
|
59
|
+
end
|
74
60
|
end
|
75
61
|
end
|
76
|
-
|
62
|
+
|
63
|
+
extend Measurable::Euclidean
|
64
|
+
end
|
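The refactoring above moves each measure into its own submodule and then calls `extend` on it from inside `Measurable`, replacing the earlier `extend self`. A minimal sketch of that pattern follows, under the assumption that it behaves like plain Ruby `Module#extend`. The module name `SketchMeasures` is hypothetical and the gem itself is not loaded.

```ruby
# Standalone sketch; SketchMeasures is a hypothetical stand-in for the gem's module.
module SketchMeasures
  module Euclidean
    # Sum of squared coordinate differences.
    # When +v+ is omitted, the origin is used, giving the squared norm of +u+.
    def euclidean_squared(u, v = nil)
      v = Array.new(u.size, 0) if v.nil?
      raise ArgumentError, "size mismatch" if u.size != v.size

      u.zip(v).reduce(0.0) { |acc, (a, b)| acc + (a - b)**2 }
    end

    # Square root of the value above.
    def euclidean(u, v = nil)
      Math.sqrt(euclidean_squared(u, v))
    end
  end

  # Exposes the instance methods as SketchMeasures.euclidean(...) and friends.
  extend Euclidean
end

puts SketchMeasures.euclidean([3, 4])         # prints 5.0
puts SketchMeasures.euclidean_squared([3, 4]) # prints 25.0
```

Because the methods live in a submodule, they can also be mixed into other classes with `include SketchMeasures::Euclidean` without pulling in every measure at once.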