rmds 0.2

@@ -0,0 +1,27 @@
+ License
+ =====================================
+
+ RMDS - Ruby Multidimensional Scaling Library
+ Copyright (c) 2010, Christoph Heindl
+ All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without modification,
+ are permitted provided that the following conditions are met:
+ - Redistributions of source code must retain the above copyright notice, this list
+ of conditions and the following disclaimer.
+ - Redistributions in binary form must reproduce the above copyright notice, this list
+ of conditions and the following disclaimer in the documentation and/or other materials
+ provided with the distribution.
+ - Neither the name of the Christoph Heindl nor the names of its contributors may be
+ used to endorse or promote products derived from this software without specific prior
+ written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+ ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
+ IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
+ INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA,
+ OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
@@ -0,0 +1,157 @@
+ # Ruby Multidimensional Scaling Library
+
+ ## Introduction
+
+ RMDS is a library for performing multidimensional scaling.
+
+ [Wikipedia][wiki_mds] describes multidimensional scaling (MDS) as
+ > [...] a set of related statistical techniques often used in information
+ > visualization for exploring similarities or dissimilarities in data.
+
+ In essence, multidimensional scaling takes a matrix of pairwise similarities or dissimilarities between observations as input and outputs a matrix of observations in Cartesian coordinates that preserve the given similarities/dissimilarities. The dimensionality of the output is a parameter of the algorithm.
+
+ ## Metric Multidimensional Scaling
+
+ RMDS implements metric multidimensional scaling, in which dissimilarities are assumed to be distances. The result of this variant is a set of coordinates in Euclidean space that explain the given distances. In general, the embedding found is not unique: any rotation or translation applied to it leaves the pairwise distances unchanged.
+
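To make the idea of metric MDS concrete, here is a minimal sketch of classical (Torgerson) scaling written against Ruby's standard `matrix` library only. It is not RMDS code: the helper name `classical_mds` is hypothetical, and whether RMDS uses exactly this procedure internally is an assumption. The sketch double-centers the squared distance matrix, takes an eigendecomposition, and scales the leading eigenvectors by the square roots of their eigenvalues.

```ruby
require 'matrix'

# Hypothetical helper, not part of RMDS: classical MDS on a matrix of
# squared Euclidean distances, returning an n x dims coordinate matrix.
def classical_mds(d2, dims)
  n = d2.row_count
  # Centering matrix J = I - (1/n) * ones
  j = Matrix.identity(n) - Matrix.build(n, n) { 1.0 / n }
  # Double centering: B = -1/2 * J * D2 * J
  b = j * d2 * j * -0.5
  # Remove floating point asymmetry so the symmetric eigen solver is used.
  b = (b + b.transpose) * 0.5

  eig = b.eigen
  # Pair eigenvalues with eigenvectors and keep the dims largest.
  pairs = eig.eigenvalues.zip(eig.eigenvectors)
  pairs = pairs.sort_by { |value, _| -value }.first(dims)

  # Scale each unit eigenvector by the square root of its eigenvalue.
  columns = pairs.map do |value, vector|
    (vector.normalize * Math.sqrt([value, 0.0].max)).to_a
  end
  Matrix.columns(columns)
end

d2 = Matrix[[0.0, 10.0, 2.0],
            [10.0, 0.0, 20.0],
            [2.0, 20.0, 0.0]]

embedding = classical_mds(d2, 2)
puts embedding
```

The signs of the resulting columns depend on the eigen solver, so the coordinates may be mirrored relative to other implementations while the pairwise distances stay the same.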
+ ## Linear Algebra Backends
+
+ RMDS makes heavy use of linear algebra routines, but does not ship with linear algebra algorithms. Instead, RMDS has a non-intrusive adapter architecture to connect existing linear algebra packages to RMDS. For how-to details on providing new adapters for RMDS, see {MDS::MatrixInterface}.
+
+ Note that the performance of most RMDS algorithms is dominated by the algorithms and performance of the linear algebra backend used.
+
+ Currently, the following linear algebra backends are supported:
+
+ - {MDS::StdlibInterface} - Connects the Matrix class from Ruby's standard library to RMDS.
+ - {MDS::GSLInterface} - Connects the GNU Scientific Library to RMDS.
+ - {MDS::LinalgInterface} - Connects LAPACK and BLAS via Linalg to RMDS.
+
+ ## Examples
+
+ The following example calculates a two-dimensional Cartesian embedding for a given distance matrix.
+
+     require 'mds'
+     require 'mds/interfaces/gsl_interface'
+
+     # Tell RMDS the linear algebra backend to be used.
+     MDS::Backend.active = MDS::GSLInterface
+
+     # The squared Euclidean distance matrix.
+     d2 = MDS::Matrix.create_rows(
+       [0.0, 10.0, 2.0],
+       [10.0, 0.0, 20.0],
+       [2.0, 20.0, 0.0]
+     )
+
+     # Find a Cartesian embedding in two dimensions
+     # that approximates the distances in d2.
+     x = MDS::Metric.projectd(d2, 2)
+
+ The result, *x*, of the above example is shown in the following image. The red points are the coordinates that yield the input matrix *d2*. The green points are the resulting embedding *x*, which preserves the distances between individual observations and is unique up to rigid transformations.
+
+ ![Result of MDS](http://github.com/cheind/rmds/raw/master/docs/readme_example.png)
+
+ The following example works on a distance matrix that originates from air distances between European cities. It does not require a specific linear algebra backend, but rather chooses an available one. The results are plotted using Gnuplot and compared to the result from a mapping tool.
+
+     require 'mds'
+     require 'gnuplot' # gem install gnuplot
+
+     # Load backend
+     MDS::Backend.try_require
+     MDS::Backend.active = MDS::Backend.first
+     puts "Using backend #{MDS::Backend.active}"
+
+     # Load distance matrix from file, contains city names in first column
+     path = File.join(File.dirname(__FILE__), 'european_city_distances.csv')
+     cities = []
+     rows = MDS::IO::read_csv(path, ';') do |entry|
+       begin
+         f = Float(entry)
+         f * f # we use squared distances
+       rescue ArgumentError
+         cities << entry
+         nil
+       end
+     end
+
+     # Invoke MDS
+
+     d2 = MDS::Matrix.create_rows(*rows)
+     # Find a projection that preserves 90 percent of the variance of the distances.
+     proj = MDS::Metric.projectk(d2, 0.9)
+
+     # Plot results
+
+     Gnuplot.open do |gp|
+       Gnuplot::Plot.new( gp ) do |plot|
+
+         # Uncomment the following lines to write result to image.
+         # plot.term 'png size'
+         # plot.output 'visualization.png'
+
+         plot.title "Air Distances between European Cities"
+         plot.xrange "[-2000:2000]"
+         plot.yrange "[-1500:1200]"
+
+         plot.data << Gnuplot::DataSet.new(proj.columns) do |ds|
+           ds.with = "points"
+           ds.notitle
+         end
+
+         cities.each_with_index do |name, i|
+           plot.label "'#{name}' at #{proj[i,0] + 30}, #{proj[i,1] + 30}"
+         end
+       end
+     end
+
+ The following image shows the result of the above script. A map is shown in a separate image for comparison.
+
+ Keep in mind that
+
+ - MDS finds an embedding up to rotation and translation.
+ - The input matrix contains air distances, whereas the map shows correct geodesic distances.
+
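The first caveat can be checked numerically. The sketch below uses only Ruby's standard `matrix` library and hypothetical helper names (`squared_distances`, `rigid_transform`) that are not part of RMDS. It rotates and shifts three points and confirms that the squared distance matrix does not change, which is why MDS cannot recover the original orientation of a map.

```ruby
require 'matrix'

# Hypothetical helper: squared Euclidean distances between all rows.
def squared_distances(points)
  n = points.row_count
  Matrix.build(n, n) do |i, k|
    diff = points.row(i) - points.row(k)
    diff.inner_product(diff)
  end
end

# Hypothetical helper: rotate 2D points by angle (radians), then translate.
def rigid_transform(points, angle, shift)
  rotation = Matrix[[Math.cos(angle), -Math.sin(angle)],
                    [Math.sin(angle),  Math.cos(angle)]]
  rotated = points * rotation.transpose
  Matrix.rows(rotated.row_vectors.map { |row| (row + shift).to_a })
end

x = Matrix[[1.0, 2.0], [4.0, 3.0], [0.0, 1.0]]
moved = rigid_transform(x, Math::PI / 3, Vector[5.0, -2.0])

deviation = squared_distances(moved) - squared_distances(x)
max_deviation = deviation.to_a.flatten.map(&:abs).max
puts "largest change in squared distance: #{max_deviation}"
```

The printed deviation is at the level of floating point rounding, although every coordinate has changed.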
+ ![Result of MDS](http://github.com/cheind/rmds/raw/master/docs/readme_mds_cities.png)
+
+ ![Result of Mapping](http://github.com/cheind/rmds/raw/master/docs/readme_cities.png)
+
+ ## Benchmarks
+ The following tables show the results of benchmarking different linear algebra packages against three test scenarios. They were generated by invoking 'rake test:bench:all' on a quad-core 2.4 GHz machine running Ubuntu 10.04 inside a virtual machine.
+
+ Note that each test involves creating the observations randomly, calculating the squared Euclidean distance matrix of the observations, and invoking MDS. The first two steps are not listed explicitly since they take only roughly 5 percent of the total runtime.
+
+     Benchmarking RMDS for 10 observations in 2-dimensional space
+                    user     system      total        real
+     stdlib:    3.710000   1.220000   4.930000 (  5.195076)
+     gsl:       0.000000   0.000000   0.000000 (  0.067544)
+     linalg:    0.000000   0.000000   0.000000 (  0.005462)
+
+     Benchmarking RMDS for 100 observations in 5-dimensional space
+                    user     system      total        real
+     stdlib:         inf        inf        inf (       inf)
+     gsl:       0.030000   0.010000   0.040000 (  0.043115)
+     linalg:    0.020000   0.010000   0.030000 (  0.043157)
+
+     Benchmarking RMDS for 1000 observations in 10-dimensional space
+                    user     system      total        real
+     stdlib:         inf        inf        inf (       inf)
+     gsl:      34.070000   1.390000  35.460000 ( 39.013276)
+     linalg:   12.750000   0.340000  13.090000 ( 13.645427)
+
+ Because of documented limitations of {MDS::StdlibInterface}, timings for this interface are shown only for the smallest of the three scenarios. Neither GSL nor LAPACK/BLAS was re-compiled for optimal performance.
+
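The two preparatory steps that the benchmark text mentions (random observations and their squared Euclidean distance matrix) can be sketched with Ruby's standard `matrix` and `benchmark` libraries. This is only an illustration under assumptions: `random_observations` and `squared_distances` are hypothetical helpers, not the code in `test/benchmark/benchmark_metric`, and the timing printed here says nothing about the figures in the tables above.

```ruby
require 'matrix'
require 'benchmark'

# Hypothetical helper: count observations with dims random coordinates each.
def random_observations(count, dims)
  Matrix.build(count, dims) { rand }
end

# Hypothetical helper: squared Euclidean distances between all rows.
def squared_distances(points)
  n = points.row_count
  Matrix.build(n, n) do |i, k|
    diff = points.row(i) - points.row(k)
    diff.inner_product(diff)
  end
end

# Sanity check on the three points used in the RMDS examples.
known = squared_distances(Matrix[[1.0, 2.0], [4.0, 3.0], [0.0, 1.0]])
# known == Matrix[[0.0, 10.0, 2.0], [10.0, 0.0, 20.0], [2.0, 20.0, 0.0]]

# Time the preparation for the middle scenario (100 observations, 5 dimensions).
seconds = Benchmark.realtime do
  squared_distances(random_observations(100, 5))
end
puts format('preparation took %.6f s', seconds)
```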
+ ## Requirements
+
+ RMDS itself does not have any dependencies except Ruby. Each matrix interface is likely to depend on one or more external projects, such as bindings and native libraries.
+
+ RMDS is tested against Ruby 1.8.7 and Ruby 1.9.1.
+
+ ## Documentation
+
+ Up-to-date documentation for the current master branch can be found [here](http://rdoc.info/github/cheind/rmds/master/frames).
+
+ ## License
+
+ *RMDS* is copyright 2010 Christoph Heindl. It is free software, and may be redistributed under the terms specified in the {file:LICENSE.md} file.
+
+ [wiki_mds]: http://en.wikipedia.org/wiki/Multidimensional_scaling "Wikipedia - Multidimensional Scaling"
@@ -0,0 +1,77 @@
+ #
+ # RMDS - Ruby Multidimensional Scaling Library
+ # Copyright (c) Christoph Heindl, 2010
+ # http://github.com/cheind/rmds
+ #
+
+ require 'lib/mds'
+ require 'rake'
+ require 'rake/testtask'
+ require 'rake/rdoctask'
+
+ task :default => ['test:unit:all']
+
+ namespace 'test' do
+
+   namespace 'unit' do
+
+     desc 'Run all unit tests'
+     Rake::TestTask.new('all') do |t|
+       t.pattern = FileList['test/unit/**/test_*.rb']
+       t.verbose = false
+       t.warning = false
+     end
+
+     desc 'Run unit tests for GSL interface'
+     Rake::TestTask.new('gsl') do |t|
+       t.pattern = FileList['test/unit/**/test_*gsl_*.rb']
+       t.verbose = false
+       t.warning = false
+     end
+
+     desc 'Run unit tests for Stdlib interface'
+     Rake::TestTask.new('std') do |t|
+       t.pattern = FileList['test/unit/**/test_*stdlib_*.rb']
+       t.verbose = false
+       t.warning = false
+     end
+
+     desc 'Run unit tests for Linalg interface'
+     Rake::TestTask.new('linalg') do |t|
+       t.pattern = FileList['test/unit/**/test_*linalg_*.rb']
+       t.verbose = false
+       t.warning = false
+     end
+   end
+
+   namespace 'bench' do
+
+     desc 'Run all benchmarks'
+     task('all') do |t|
+       require 'test/benchmark/benchmark_metric'
+     end
+
+   end
+
+ end
+
+ namespace 'docs' do
+
+   desc "Generate rdoc documentation"
+   Rake::RDocTask.new do |rd|
+     rd.rdoc_dir = "doc"
+     rd.main = 'README.md'
+     rd.options << '--title' << "RMDS #{MDS::VERSION} Documentation"
+     rd.rdoc_files.include('README.md', 'LICENSE.md', 'lib/**/*.rb', 'examples/**/*.rb')
+   end
+
+   begin
+     require 'yard'
+     YARD::Rake::YardocTask.new do |t|
+       t.files = ['lib/**/*.rb', 'examples/**/*.rb', '-', 'LICENSE.md']
+       t.options = ['--no-cache', '--title', "RMDS #{MDS::VERSION} Documentation"]
+     end
+   rescue LoadError
+     warn '**YARD is missing, disabling docs:yard task'
+   end
+ end
@@ -0,0 +1,61 @@
+ #
+ # RMDS - Ruby Multidimensional Scaling Library
+ # Copyright (c) Christoph Heindl, 2010
+ # http://github.com/cheind/rmds
+ #
+
+ module MDS
+   #
+   # Examples for RMDS
+   #
+   module Examples
+
+     #
+     # Illustrates advanced usage of {MDS::Backend}.
+     #
+     def Examples.adaptive_backend(argv)
+       require 'mds'
+
+       # Try adding interfaces from command line arguments.
+       # Make sure that at least the standard RMDS interfaces
+       # are included.
+       argv << './lib/mds/interfaces/*.rb'
+
+       while !argv.empty?
+         MDS::Backend.try_require(argv.shift)
+       end
+
+       # Select the first available interface.
+       # Additionally pass an array of interface names that are preferred.
+       interface = MDS::Backend.first(['MDS::GSLInterface', 'MDS::LinalgInterface'])
+       raise 'No interface found' unless interface
+       puts "Using interface #{interface}"
+
+       MDS::Backend.active = interface
+
+       # The squared Euclidean distance matrix.
+       d2 = MDS::Matrix.create_rows(
+         [0.0, 10.0, 2.0],
+         [10.0, 0.0, 20.0],
+         [2.0, 20.0, 0.0]
+       )
+
+       # Find a Cartesian embedding in two dimensions
+       # that approximates the distances in d2.
+       x = MDS::Metric.projectd(d2, 2)
+
+       puts x.matrix
+       # x => -0.6037 0.2828
+       #       2.5370 -0.0841
+       #      -1.9330 -0.1987
+     end
+   end
+
+ end
+
+ if __FILE__ == $0
+   $:.unshift(File.join(File.dirname(__FILE__), '..', 'lib'))
+   MDS::Examples.adaptive_backend(ARGV)
+ end
+
+
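The selection pattern in this example (attempt to load several backends, then pick the first preferred one that actually loaded) can be illustrated without RMDS. The sketch below is a simplified stand-in, not the implementation of `MDS::Backend`: the module `BackendRegistry`, its method signatures, and the feature names passed to it are all assumptions made for illustration.

```ruby
# Hypothetical registry, loosely modeled on the usage shown above.
module BackendRegistry
  @available = []

  class << self
    # Try to load a library; remember its name only if loading succeeds.
    def try_require(feature, name)
      require feature
      @available << name unless @available.include?(name)
      true
    rescue LoadError
      false
    end

    # Return the first preferred backend that is available,
    # falling back to the first backend that loaded at all.
    def first(preferred = [])
      (preferred & @available).first || @available.first
    end
  end
end

# 'matrix' ships with Ruby; the second feature name is made up and will not load.
BackendRegistry.try_require('matrix', 'Stdlib')
BackendRegistry.try_require('no_such_linear_algebra_backend', 'Imaginary')

chosen = BackendRegistry.first(['Imaginary', 'Stdlib'])
puts "Using backend #{chosen}"
```

Rescuing `LoadError` keeps a missing optional dependency from aborting the script, the same approach the Rakefile takes for YARD.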
@@ -0,0 +1,82 @@
+ #
+ # RMDS - Ruby Multidimensional Scaling Library
+ # Copyright (c) Christoph Heindl, 2010
+ # http://github.com/cheind/rmds
+ #
+
+ module MDS
+
+   #
+   # Examples for RMDS
+   #
+   module Examples
+     #
+     # Illustrates usage of {MDS::Metric}
+     #
+     def Examples.extended_metric
+       require 'mds'
+
+       # Prepare the linear algebra backend to be used.
+       require 'mds/interfaces/stdlib_interface'
+       MDS::Backend.active = MDS::StdlibInterface
+
+       # Observations, usually unknown
+       x = MDS::Matrix.create_rows(
+         [1.0, 2.0], # Point A
+         [4.0, 3.0], # Point B
+         [0.0, 1.0]  # Point C
+       )
+
+       # Calculate the squared Euclidean distance matrix of observations,
+       # which is usually the only given input for MDS.
+
+       d = MDS::Metric.squared_distances(x)
+
+       # Find a Cartesian embedding of D, such that 99 percent
+       # of the variance of the distances in D is preserved.
+
+       x_proj = MDS::Metric.projectk(d, 0.99)
+       # => [[-0.60 0.28],
+       #     [ 2.53 -0.08],
+       #     [-1.93 -0.19]]
+
+       # Since distances are preserved, the squared
+       # Euclidean distance matrix of x_proj should
+       # equal D up to a certain delta.
+
+       diff = MDS::Metric.squared_distances(x_proj) - d
+       # => [[0.0 0.0 0.0],
+       #     [0.0 0.0 0.0],
+       #     [0.0 0.0 0.0]]
+
+       puts "Deviation from input"
+       puts diff.matrix
+
+       # If less accuracy of preserved distances is needed,
+       # one may lower the percentage. In our case, 80% is
+       # reflected in an embedding with just one dimension.
+       #
+       # If you look at the original input, X, this is plausible,
+       # since all points are close to being collinear.
+
+       x_proj = MDS::Metric.projectk(d, 0.8)
+       # => [[-0.60],
+       #     [ 2.53],
+       #     [-1.93]]
+
+       # Therefore, the distance matrix yields a reduced
+       # degree of accuracy.
+
+       diff = MDS::Metric.squared_distances(x_proj) - d
+       # => [[ 0.0 -0.13 -0.23],
+       #     [-0.13 0.0 -0.01],
+       #     [-0.23 -0.01 0.0]]
+     end
+   end
+ end
+
+
+ if __FILE__ == $0
+   $:.unshift(File.join(File.dirname(__FILE__), '..', 'lib'))
+   MDS::Examples.extended_metric
+ end
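Why 0.99 yields two dimensions and 0.8 yields one can be seen from the eigenvalues of the double-centered distance matrix. The sketch below assumes that the percentage passed to `projectk` is compared against the cumulative share of the positive eigenvalues; that reading is an assumption, since the implementation is not shown here. The helper `dimensions_for` is hypothetical, and the eigenvalues were derived by hand for points A, B and C above (rounded to four decimals).

```ruby
# Hypothetical helper: smallest number of dimensions whose eigenvalues
# account for at least the requested fraction of the total.
def dimensions_for(eigenvalues, fraction)
  positive = eigenvalues.select { |value| value > 0 }.sort.reverse
  total = positive.inject(0.0) { |sum, value| sum + value }
  running = 0.0
  positive.each_with_index do |value, index|
    running += value
    return index + 1 if running / total >= fraction
  end
  positive.size
end

# Approximate eigenvalues for the three example points; the third is zero
# because three points in the plane span at most two dimensions.
eigenvalues = [10.5402, 0.1265, 0.0]

puts dimensions_for(eigenvalues, 0.99) # first axis alone covers about 98.8 percent
puts dimensions_for(eigenvalues, 0.8)
```

Under this assumption the first axis alone falls just short of 99 percent, which matches the two-column result for 0.99 and the single column for 0.8 in the example.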
@@ -0,0 +1,52 @@
+ #
+ # RMDS - Ruby Multidimensional Scaling Library
+ # Copyright (c) Christoph Heindl, 2010
+ # http://github.com/cheind/rmds
+ #
+
+ module MDS
+   #
+   # Examples for RMDS
+   #
+   module Examples
+
+     #
+     # Illustrates usage of {MDS::Metric}.
+     #
+     # Given a matrix of squared Euclidean distances, find
+     # a two-dimensional Cartesian embedding.
+     #
+     def Examples.minimal_metric
+       # Load RMDS
+       require 'mds'
+
+       # Prepare the linear algebra backend to be used.
+       require 'mds/interfaces/gsl_interface'
+       MDS::Backend.active = MDS::GSLInterface
+
+       # The squared Euclidean distance matrix.
+       d2 = MDS::Matrix.create_rows(
+         [0.0, 10.0, 2.0],
+         [10.0, 0.0, 20.0],
+         [2.0, 20.0, 0.0]
+       )
+
+       # Find a Cartesian embedding in two dimensions
+       # that approximates the distances in d2.
+       x = MDS::Metric.projectd(d2, 2)
+
+       puts x.matrix
+       # x => -0.6037 0.2828
+       #       2.5370 -0.0841
+       #      -1.9330 -0.1987
+     end
+   end
+
+ end
+
+ if __FILE__ == $0
+   $:.unshift(File.join(File.dirname(__FILE__), '..', 'lib'))
+   MDS::Examples.minimal_metric
+ end
+
+