experiment 0.2.0 → 0.3.0
- data/Manifest.txt +5 -3
- data/{README.rdoc → README.md} +26 -57
- data/Rakefile +3 -1
- data/bin/experiment +28 -9
- data/lib/experiment/base.rb +165 -65
- data/lib/experiment/config.rb +140 -11
- data/lib/experiment/distributed.rb +51 -20
- data/lib/experiment/factorial.rb +227 -0
- data/lib/experiment/generator/{experiment_template.rb → experiment_template.rb.txt} +10 -10
- data/lib/experiment/generator/readme_template.txt +2 -2
- data/lib/experiment/notify.rb +150 -146
- data/lib/experiment/params.rb +18 -0
- data/lib/experiment/runner.rb +50 -8
- data/lib/experiment/stats/descriptive.rb +58 -0
- data/lib/experiment/work_server.rb +20 -5
- data/lib/experiment.rb +1 -1
- data/test/test_stats.rb +9 -3
- metadata +12 -9
- data/lib/experiment/stats.rb +0 -43
data/Manifest.txt
CHANGED
@@ -1,18 +1,20 @@
|
|
1
1
|
History.txt
|
2
2
|
Manifest.txt
|
3
|
-
README.
|
3
|
+
README.md
|
4
4
|
Rakefile
|
5
5
|
lib/experiment.rb
|
6
6
|
lib/experiment/config.rb
|
7
|
-
lib/experiment/stats.rb
|
7
|
+
lib/experiment/stats/descriptive.rb
|
8
8
|
lib/experiment/runner.rb
|
9
9
|
lib/experiment/generator/readme_template.txt
|
10
|
-
lib/experiment/generator/experiment_template.rb
|
10
|
+
lib/experiment/generator/experiment_template.rb.txt
|
11
11
|
lib/experiment/generator/Rakefile
|
12
12
|
lib/experiment/base.rb
|
13
13
|
lib/experiment/notify.rb
|
14
14
|
lib/experiment/work_server.rb
|
15
15
|
lib/experiment/distributed.rb
|
16
|
+
lib/experiment/factorial.rb
|
17
|
+
lib/experiment/params.rb
|
16
18
|
test/test_experiment.rb
|
17
19
|
test/test_helper.rb
|
18
20
|
bin/experiment
|
data/{README.rdoc → README.md}
RENAMED
@@ -1,15 +1,12 @@
|
|
1
|
-
|
2
|
-
* http://github.com/gampleman/experiment
|
3
|
-
|
4
|
-
== What's it about?
|
1
|
+
## What's it about?
|
5
2
|
|
6
3
|
Experiment is a ruby library and environment for running scientific experiments (eg. AI, GA...), especially good for experiments in optimizing results by variations in algorithm or parameters.
|
7
4
|
|
8
|
-
|
5
|
+
## Installation
|
9
6
|
|
10
7
|
$ sudo gem install experiment
|
11
8
|
|
12
|
-
|
9
|
+
## Getting started
|
13
10
|
|
14
11
|
Experiment is modeled after rails and the workflow should be recognizable enough.
|
15
12
|
|
@@ -21,21 +18,23 @@ This will create several files and directories. We will shortly introduce you to
|
|
21
18
|
|
22
19
|
First off is the `app` directory. This is where a basic implementation of what you mean to do. You can write your code however you want, just make sure the code is well structured - you will be overriding this later in your experiments.
|
23
20
|
|
24
|
-
|
21
|
+
## Setting up an experiment
|
25
22
|
|
26
23
|
Experiments are set up in the experiments directory. The first thing you need to do is define what consist an experiment in your case. For this open up the file `experiments/experiment.rb`. You will notice that this file contains a bunch of comments and a stub letting you easily understand what to do.
|
27
24
|
|
28
|
-
For a typical experiment you will need to do some setup work (eg. initialize your classes, calculate
|
25
|
+
For a typical experiment you will need to do some setup work (eg. initialize your classes, calculate parameters, etc.), run the experiment and maybe do cleanup (remove temp. files).
|
29
26
|
|
30
|
-
You do all this work in the `run_the_experiment` method. Use the `measure` method to wrap your measurements. These will be
|
27
|
+
You do all this work in the `run_the_experiment` method. Use the `measure` method to wrap your measurements. These will be automatically benchmarked and their output will be automatically saved to the results directory for further analysis.
|
31
28
|
|
32
|
-
The `
|
29
|
+
The `data_set` method lets you specify an array of data points that you want split for cross-validation. You can access this in your experiment with `test_data`.
|
33
30
|
|
34
31
|
Next you may want to analyze the data you got. For that there is the `analyze_result!` method which has 2 arguments. One is the raw data file that was output by your code and the other is the path to an expected output file (this can be very rich in detail, ideal for confusion matrices and the like). The method should return a hash of summary results (eg. `:total_performance => 16`).
|
35
32
|
|
36
33
|
All of this will be also saved to disk and available for later analysis.
|
37
34
|
|
38
|
-
|
35
|
+
More info [on the wiki](https://github.com/gampleman/Experiment/wiki/Designing-your-experiment).
|
36
|
+
|
37
|
+
## Creating an experimental condition
|
39
38
|
|
40
39
|
Now to get to making different conditions and measuring them. First call
|
41
40
|
|
@@ -46,7 +45,7 @@ This will create a directory in `experiments` based on the name you provide (in
|
|
46
45
|
|
47
46
|
Also notice that the description you provided is stored as a comment in that file. You can expand your hypothesis as you work on the file and it will be included in your report automatically.
|
48
47
|
|
49
|
-
|
48
|
+
## Running the experiment
|
50
49
|
|
51
50
|
Once you make the desired changes you can run the experiment with:
|
52
51
|
|
@@ -58,66 +57,34 @@ The experimental results and benchmarks will be written to this directory with a
|
|
58
57
|
|
59
58
|
Please notice that you can provide several different conditions to the run command and it will run them sequentially, all with required options.
|
60
59
|
|
61
|
-
|
60
|
+
More on the [Command Line Interface](https://github.com/gampleman/Experiment/wiki/Command-Line-Interface).
|
61
|
+
|
62
|
+
## Configuration
|
62
63
|
|
63
64
|
So far we have been talking mainly about variations in the source code of the experiments. But what if you just want to tweak a few parameters? There is always the almighty *Config* class to the rescue.
|
64
65
|
|
65
66
|
Experiment::Config[:my_config_variable] # anywhere in your code
|
66
67
|
|
67
|
-
|
68
|
-
|
69
|
-
`development` is the default environment, you can set any other with the `--env` option.
|
68
|
+
You have a config directory containing a `config.yaml` file. This file contains several environments. The idea is that you might want to tweak your options differently when running on your laptop then when running on a university supercomputer. Experiments also have their own config file that override the global.
|
70
69
|
|
71
|
-
|
70
|
+
More info on [the wiki](https://github.com/gampleman/Experiment/wiki/Configuration).
|
72
71
|
|
73
|
-
And finally when running an experiment you can use the -o or --options option to override any config you want.
|
74
72
|
|
75
|
-
Let me give you an example. Your main config file looks like this:
|
76
73
|
|
77
|
-
|
78
|
-
development:
|
79
|
-
ref_dir: /Users/kubowo/Desktop/points-vals
|
80
|
-
master_dir: /Users/kubowo/Desktop/points-vals/s014
|
81
|
-
alpha: 0.4
|
82
|
-
compute:
|
83
|
-
ref_dir: /afs/group/DB/points
|
84
|
-
master_dir: /afs/group/DB/points/s145
|
85
|
-
alpha: 0.4
|
74
|
+
## Cross Validation
|
86
75
|
|
87
|
-
|
76
|
+
*Cross validation* (CV) is one of the most crucial research methods in CS and AI. For that reason it is built right in. You specify how many CVs you want to run using the --cv flag and your data is automatically split up for you and the experiment is run for each CV with the appropriate data.
|
88
77
|
|
89
|
-
|
90
|
-
development:
|
91
|
-
alpha: 0.5
|
92
|
-
compute:
|
93
|
-
alpha: 0.6
|
94
|
-
|
95
|
-
And you run the experiment with
|
96
|
-
|
97
|
-
$ experiment run my_condition --env compute -o "master_dir: /Users/kubowo/Desktop/points-vals/s015"
|
98
|
-
|
99
|
-
Then your final config will look like this:
|
100
|
-
|
101
|
-
{ :ref_dir => "/afs/group/DB/points",
|
102
|
-
:master_dir => "/Users/kubowo/Desktop/points-vals/s015",
|
103
|
-
:alpha => 0.6 }
|
104
|
-
|
105
|
-
Flexible, eh? **NEW** Check out the *get* method. It has features like interpolation and defaults.
|
106
|
-
|
107
|
-
== Cross Validation
|
108
|
-
|
109
|
-
Cross validation (CV) is one of the most crucial research methods in CS and AI. For that reason it is built right in. You specify how many CVs you want to run using the --cv flag and your data is automatically split up for you and the experiment is run for each CV with the appropriate data.
|
110
|
-
|
111
|
-
== Reporting Results
|
78
|
+
## Reporting Results
|
112
79
|
|
113
80
|
$ experiment report
|
114
81
|
|
115
|
-
Surprise, surprise. This will create two files in your `report` directory (BTW, this directory is also meant for you to store your report or paper draft). The first is methods.mmd. This takes all the stuff you wrote in the beginnings of your experimental condition files and creates a multi-markdown
|
82
|
+
Surprise, surprise. This will create two files in your `report` directory (BTW, this directory is also meant for you to store your report or paper draft). The first is methods.mmd. This takes all the stuff you wrote in the beginnings of your experimental condition files and creates a [multi-markdown](http://fletcherpenney.net/multimarkdown/) file out of them (I chose multi-markdown for it's LaTEX support and also it is directly importable into Scrivener, my writing application of choice, available at <http://www.literatureandlatte.com/scrivener.html>).
|
116
83
|
|
117
84
|
The second file created is the `data.csv` file which contains the data from all your experiments. It should be importable to Numbers, Excel even Matlab for further analysis and charting.
|
118
85
|
|
119
86
|
|
120
|
-
|
87
|
+
## Distributed computing support
|
121
88
|
|
122
89
|
Newly this library supports a simple distributed model of running experiments. Setup worker computers with the
|
123
90
|
|
@@ -125,8 +92,10 @@ Newly this library supports a simple distributed model of running experiments. S
|
|
125
92
|
|
126
93
|
and then run experiments with --distributed flag.
|
127
94
|
|
128
|
-
|
95
|
+
More details: <https://github.com/gampleman/Experiment/wiki/Distributed-Mode>.
|
96
|
+
|
97
|
+
## Misc
|
129
98
|
|
130
|
-
So that's pretty much the gist of experiment. There's a few other features (and a few soon to come to a gem near you ;-) Growl notifications are now supported. Turn them
|
99
|
+
So that's pretty much the gist of experiment. There's a few other features (and a few soon to come to a gem near you ;-) Growl notifications are now supported. Turn them off by setting growl_notifications to false in your config file.
|
131
100
|
|
132
|
-
Also check out the RDocs
|
101
|
+
Also check out the [RDocs](http://rdoc.info/github/gampleman/Experiment/master/frames).
|
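The Configuration text in this README describes layered settings: environments in `config/config.yaml`, a per-experiment config file that overrides the global one, and (per the 0.2.0 wording and the `-o`/`--options` flag in `bin/experiment`) command-line overrides on top. A minimal sketch of that precedence, assuming plain hash merging. The helper `resolve_config` and all sample paths and values are hypothetical; this is not the gem's actual `Experiment::Config` code.

```ruby
require "yaml"

# Hypothetical helper: later layers win over earlier ones.
#   global config (per environment) < experiment config < command-line overrides
def resolve_config(global_yaml, experiment_yaml, env, overrides = {})
  global     = (YAML.load(global_yaml) || {})[env] || {}
  experiment = (YAML.load(experiment_yaml) || {})[env] || {}
  global.merge(experiment).merge(overrides)
end

# Illustrative stand-in for config/config.yaml.
GLOBAL_CONFIG = <<~YAML
  development:
    ref_dir: /tmp/points
    alpha: 0.4
  compute:
    ref_dir: /data/points
    alpha: 0.4
YAML

# Illustrative stand-in for an experiment's own config file.
EXPERIMENT_CONFIG = <<~YAML
  development:
    alpha: 0.5
  compute:
    alpha: 0.6
YAML

config = resolve_config(GLOBAL_CONFIG, EXPERIMENT_CONFIG, "compute",
                        "master_dir" => "/tmp/s015")
p config
```

Here `ref_dir` comes from the global `compute` environment, `alpha` from the experiment's file, and `master_dir` from the override hash.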
data/Rakefile
CHANGED
@@ -15,7 +15,9 @@ $hoe = Hoe.spec 'experiment' do
|
|
15
15
|
#self.post_install_message = 'PostInstall.txt' # TODO remove if post-install message not required
|
16
16
|
self.rubyforge_name = self.name # TODO this is default value
|
17
17
|
#self.extra_deps = [['ruby-growl','>= 1.0']]
|
18
|
-
|
18
|
+
self.summary = "A framework for running Scientific experiments."
|
19
|
+
self.description = "It provides basic command line tools for simply defining things like cross validations, factorial experimental design and basic statistics. All of this can be run in a distributed manner."
|
20
|
+
#self.homepage = 'https://github.com/gampleman/Experiment'
|
19
21
|
end
|
20
22
|
|
21
23
|
require 'newgem/tasks'
|
data/bin/experiment
CHANGED
@@ -21,11 +21,12 @@ class App
|
|
21
21
|
@options.verbose = false
|
22
22
|
@options.quiet = false
|
23
23
|
@options.env = :development
|
24
|
-
@options.cv = 5
|
24
|
+
@options.cv = nil#5
|
25
25
|
@options.description = ""
|
26
26
|
@options.opts = ""
|
27
27
|
@options.distributed = false
|
28
28
|
@options.master = "localhost"
|
29
|
+
@options.summary = false
|
29
30
|
end
|
30
31
|
|
31
32
|
# Parse options, check arguments, then process the command
|
@@ -57,15 +58,33 @@ class App
|
|
57
58
|
@opts.on('-V', '--verbose') { @options.verbose = true }
|
58
59
|
@opts.on('-q', '--quiet') { @options.quiet = true }
|
59
60
|
@opts.on('-e', '--env [ENV]', [:development, :compute], "Sets the environment to run in.") { |v| @options.env = v }
|
60
|
-
|
61
|
-
|
62
|
-
|
63
|
-
@
|
61
|
+
|
62
|
+
if ARGV.first == 'generate' || ARGV.first == "-h"
|
63
|
+
@opts.separator ""
|
64
|
+
@opts.separator "Options for `generate`:"
|
65
|
+
@opts.on('-m', '--description M', String, "Description or hypothesis for the condition being generated.") { |v| @options.description = v }
|
66
|
+
end
|
67
|
+
|
68
|
+
if ARGV.first == 'run' || ARGV.first == "-h"
|
69
|
+
@opts.separator ""
|
70
|
+
@opts.separator "Options for `run`:"
|
71
|
+
@opts.on('-c', '--cv CV', Integer, "The number of cross validations to run.") { |v| @options.cv = v }
|
72
|
+
@opts.on('-o', '--options OPTSTRING', String, "Options to override or define configuration with.", "format as: key1:val1,key2:val2") do |v|
|
73
|
+
@options.opts = v
|
74
|
+
end
|
75
|
+
@opts.on('--summary', "After a run of the experiment print out the summary to STDOUT.") { @options.summary = true }
|
76
|
+
@opts.on('-D', '--distributed', "Run with a distributed computing mode.", "This will be the master server/work cue.") { @options.distributed = true }
|
77
|
+
@opts.separator " Overrideable options (defined in config/config.yaml)"
|
78
|
+
@opts.separator " No options defined." unless Experiment::Config::parsing_for_options(@opts, @options)
|
64
79
|
end
|
65
|
-
@opts.on('-D', '--distributed', "Run with a distributed computing mode. This will be the master server/work cue.") { @options.distributed = true }
|
66
|
-
@opts.on('-a', '--address MODE', String, "Address to the master machine.") { |v| @options.master = v }
|
67
|
-
@opts.parse!(@arguments) #rescue return false
|
68
80
|
|
81
|
+
if ARGV.first == 'worker' || ARGV.first == "-h"
|
82
|
+
@opts.separator ""
|
83
|
+
@opts.separator "Options for `worker`:"
|
84
|
+
@opts.on('-a', '--address MODE', String, "Address to the master machine.") { |v| @options.master = v }
|
85
|
+
end
|
86
|
+
|
87
|
+
@opts.parse!(@arguments) #rescue return false
|
69
88
|
process_options
|
70
89
|
true
|
71
90
|
end
|
@@ -107,7 +126,7 @@ for a new experiment"
|
|
107
126
|
parser = RDoc::Parser.for top_level, File.dirname(__FILE__) + "/../lib/experiment/runner.rb", File.read(File.dirname(__FILE__) + "/../lib/experiment/runner.rb"), opts, stats
|
108
127
|
d = parser.scan
|
109
128
|
d.modules.first.classes.first.method_list.each do |m|
|
110
|
-
|
129
|
+
unless m.comment == "" || m.name == 'initialize' || m.name == 'new'
|
111
130
|
puts "== #{m.name == 'new_project' ? 'new' : m.name}"
|
112
131
|
puts m.comment
|
113
132
|
puts
|
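The `bin/experiment` change above registers options only for the subcommand being invoked (`generate`, `run`, `worker`). A minimal stand-alone sketch of that pattern with Ruby's standard `OptionParser` follows. The `parse_cli` function and the hash of defaults are hypothetical; only the flag names are taken from the diff, and the gem's real `App` class does more, such as help output for `-h` and config-derived options.

```ruby
require "optparse"

# Hypothetical stand-alone parser: options are registered only for the
# subcommand named in the first argument.
def parse_cli(argv)
  options = { verbose: false, cv: nil, summary: false,
              description: "", master: "localhost" }
  command = argv.first

  parser = OptionParser.new do |opts|
    opts.on("-V", "--verbose") { options[:verbose] = true }

    if command == "generate"
      opts.separator ""
      opts.separator "Options for `generate`:"
      opts.on("-m", "--description M", String) { |v| options[:description] = v }
    end

    if command == "run"
      opts.separator ""
      opts.separator "Options for `run`:"
      opts.on("-c", "--cv CV", Integer) { |v| options[:cv] = v }
      opts.on("--summary") { options[:summary] = true }
    end

    if command == "worker"
      opts.separator ""
      opts.separator "Options for `worker`:"
      opts.on("-a", "--address HOST", String) { |v| options[:master] = v }
    end
  end

  # parse (without the bang) leaves argv untouched and returns the
  # remaining non-option arguments.
  rest = parser.parse(argv)
  [options, rest]
end

options, rest = parse_cli(%w[run my_condition --cv 3 --summary])
p options
p rest
```

One consequence of this design is that a flag belonging to another subcommand, such as `--cv` passed to `generate`, is rejected with `OptionParser::InvalidOption` rather than silently accepted.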
data/lib/experiment/base.rb
CHANGED
@@ -1,23 +1,44 @@
|
|
1
1
|
require File.dirname(__FILE__) + "/notify"
|
2
|
-
require File.dirname(__FILE__) + "/stats"
|
2
|
+
require File.dirname(__FILE__) + "/stats/descriptive"
|
3
3
|
require File.dirname(__FILE__) + "/config"
|
4
|
+
require File.dirname(__FILE__) + "/params"
|
4
5
|
require File.dirname(__FILE__) + "/distributed"
|
5
6
|
require 'benchmark'
|
6
7
|
require "drb/drb"
|
8
|
+
require "yaml"
|
7
9
|
|
8
10
|
module Experiment
|
11
|
+
# The base class for defining experimental conditons.
|
12
|
+
# @author Jakub Hampl
|
13
|
+
# @see https://github.com/gampleman/Experiment/wiki/Designing-your-experiment
|
9
14
|
class Base
|
10
|
-
|
15
|
+
|
11
16
|
include Distributed
|
12
17
|
|
13
|
-
|
14
|
-
|
15
|
-
|
18
|
+
@@cleanup_raw_files = false
|
19
|
+
|
20
|
+
# The directory in which the results will be written to.
|
21
|
+
attr_reader :dir
|
22
|
+
# The number of the current cross-validation
|
23
|
+
attr_reader :current_cv
|
24
|
+
# The number of overall cross-validations
|
25
|
+
attr_reader :cvs
|
26
|
+
# The file the program is currently set to output to.
|
27
|
+
# Use this if you want to write additional data.
|
28
|
+
attr_reader :output_file
|
29
|
+
|
30
|
+
# Called internally by the framewrok
|
31
|
+
# @private
|
32
|
+
# @param [:normal, :master, :slave] mode
|
33
|
+
# @param [String] experiment name
|
34
|
+
def initialize(mode, experiment, options)
|
16
35
|
@experiment = experiment
|
36
|
+
@options = options
|
17
37
|
case mode
|
18
38
|
|
19
39
|
when :normal
|
20
40
|
@abm = []
|
41
|
+
|
21
42
|
when :master
|
22
43
|
@abm = []
|
23
44
|
extend DRb::DRbUndumped
|
@@ -25,31 +46,51 @@ module Experiment
|
|
25
46
|
when :slave
|
26
47
|
|
27
48
|
end
|
28
|
-
Experiment::Config::load(experiment, options, env)
|
49
|
+
Experiment::Config::load(experiment, options.opts, options.env)
|
29
50
|
@mode = mode
|
30
51
|
end
|
31
52
|
|
53
|
+
# Is the experiment done.
|
32
54
|
def done?
|
33
55
|
@done
|
34
56
|
end
|
35
57
|
|
36
|
-
|
58
|
+
# The default analysis function
|
59
|
+
# Not terribly useful, better to override
|
60
|
+
# @abstract Override for your own method analysis.
|
61
|
+
# @param [String] input file path of results written by `measure` calls.
|
62
|
+
# @param [String] output file path where to optionally write detailed analysis.
|
63
|
+
# @return [Hash] Summary of analysis.
|
64
|
+
def analyze_result!(input, output)
|
65
|
+
YAML::load_file(input)
|
66
|
+
end
|
67
|
+
|
68
|
+
# Sets up actions to do after the task is completed.
|
69
|
+
#
|
70
|
+
# This will be expanded in the future. Currently the only
|
71
|
+
# possible usage is with :delete_raw_files
|
72
|
+
# @example
|
73
|
+
# after_completion :delete_raw_files
|
74
|
+
# @param [:delete_raw_files] args If called will delete the raw-*.txt files
|
75
|
+
# in the {dir} after the experiment successfully completes.
|
76
|
+
def self.after_completion(*args)
|
77
|
+
@@cleanup_raw_files = args.include? :delete_raw_files
|
78
|
+
end
|
37
79
|
|
38
|
-
# runs the whole experiment
|
80
|
+
# runs the whole experiment, called by the framework
|
81
|
+
# @private
|
39
82
|
def normal_run!(cv)
|
40
83
|
@cvs = cv || 1
|
41
84
|
@results = {}
|
42
85
|
Notify.started @experiment
|
43
86
|
split_up_data
|
44
87
|
write_dir!
|
45
|
-
specification!
|
46
|
-
|
47
88
|
@cvs.times do |cv_num|
|
48
89
|
@bm = []
|
49
90
|
@current_cv = cv_num
|
50
91
|
File.open(@dir + "/raw-#{cv_num}.txt", "w") do |output|
|
51
92
|
@ouptut_file = output
|
52
|
-
run_the_experiment
|
93
|
+
run_the_experiment
|
53
94
|
end
|
54
95
|
array_merge @results, analyze_result!(@dir + "/raw-#{cv_num}.txt", @dir + "/analyzed-#{cv_num}.txt")
|
55
96
|
write_performance!
|
@@ -57,44 +98,113 @@ module Experiment
|
|
57
98
|
end
|
58
99
|
summarize_performance!
|
59
100
|
summarize_results! @results
|
101
|
+
specification!
|
102
|
+
cleanup!
|
60
103
|
Notify.completed @experiment
|
104
|
+
puts File.read(@dir + "/summary.mmd") if @options.summary
|
61
105
|
end
|
62
106
|
|
107
|
+
# Returns the portion of the {data_set} that corresponds
|
108
|
+
# to the current cross validation number.
|
109
|
+
# @return [Array]
|
110
|
+
def test_data
|
111
|
+
@data[@current_cv]
|
112
|
+
end
|
113
|
+
|
114
|
+
# Returns the {data_set} that *without* the {test_data}.
|
115
|
+
# @return [Array]
|
116
|
+
def training_data
|
117
|
+
(@data - test_data).flatten
|
118
|
+
end
|
63
119
|
|
64
|
-
#
|
120
|
+
# Use this every time you want to do a measurement.
|
65
121
|
# It will be put on the record file and benchmarked
|
66
|
-
# automatically
|
67
|
-
#
|
68
|
-
#
|
69
|
-
#
|
122
|
+
# automatically.
|
123
|
+
#
|
124
|
+
# @param [Integer] weight Used for calculating
|
125
|
+
# Notify::step. It should be an integer denoting how many
|
126
|
+
# such measurements you wish to do.
|
70
127
|
def measure(label = "", weight = nil, &block)
|
71
128
|
out = ""
|
72
129
|
benchmark label do
|
73
130
|
out = yield
|
74
131
|
end
|
75
|
-
|
132
|
+
if out.is_a? String
|
133
|
+
@ouptut_file << out
|
134
|
+
else
|
135
|
+
YAML::dump(out, @ouptut_file)
|
136
|
+
end
|
76
137
|
Notify::step(@experiment, @current_cv, 1.0/weight) unless weight.nil?
|
77
138
|
end
|
78
139
|
|
79
140
|
|
80
141
|
# Registers and performs a benchmark which is then
|
81
|
-
# calculated to the total and everage times
|
142
|
+
# calculated to the total and everage times.
|
143
|
+
#
|
144
|
+
# A lower-level alternative to measure.
|
82
145
|
def benchmark(label = "", &block)
|
83
146
|
@bm ||= []
|
84
147
|
@bm << Benchmark.measure("CV #{@current_cv} #{label}", &block)
|
85
148
|
end
|
86
149
|
|
87
150
|
|
88
|
-
|
151
|
+
|
152
|
+
|
153
|
+
# creates a summary of the results and writes to 'summary.mmd'
|
154
|
+
def summarize_results!(results)
|
155
|
+
File.open(@dir + '/results.yaml', 'w' ) do |out|
|
156
|
+
YAML.dump(results, out)
|
157
|
+
end
|
158
|
+
# create an array of arrays
|
159
|
+
res = results.keys.map do |key|
|
160
|
+
# calculate stats
|
161
|
+
a = results[key]
|
162
|
+
if a.all? {|el| el.is_a? Numeric }
|
163
|
+
[key] + a + [Stats::mean(a), Stats::standard_deviation(a)]
|
164
|
+
else
|
165
|
+
[key] + a + ["--", "--"]
|
166
|
+
end
|
167
|
+
end
|
168
|
+
|
169
|
+
ls = results.keys.map{|v| [7, v.to_s.length].max }
|
170
|
+
|
171
|
+
ls = ["Std Deviation".length] + ls
|
172
|
+
res = header_column + res
|
173
|
+
res = res.transpose
|
174
|
+
out = build_table res, ls
|
175
|
+
File.open(@dir + "/summary.mmd", 'w') do |f|
|
176
|
+
f << "## Results for #{@experiment} ##\n\n"
|
177
|
+
f << out
|
178
|
+
end
|
179
|
+
#results = results.reduce({}) do |tot, res|
|
180
|
+
# cv = res.delete :cv
|
181
|
+
# tot.merge Hash[res.to_a.map {|a| ["cv_#{cv}_#{a.first}".to_sym, a.last]}]
|
182
|
+
#end
|
183
|
+
#FasterCSV.open("./results/all.csv", "a") do |csv|
|
184
|
+
# csv << results.to_a.sort_by{|a| a.first.to_s}.map(&:last)
|
185
|
+
#end
|
186
|
+
end
|
187
|
+
|
188
|
+
|
189
|
+
# A silly method meant to be overriden.
|
190
|
+
# should return an array, which will be then split up for cross-validating.
|
191
|
+
# @abstract Override this method to return an array.
|
192
|
+
def data_set
|
193
|
+
(1..cvs).to_a
|
194
|
+
end
|
195
|
+
|
196
|
+
protected
|
197
|
+
|
198
|
+
# Creates the results directory for the current experiment
|
89
199
|
def write_dir!
|
90
200
|
@dir = "./results/#{@experiment}-cv#{@cvs}-#{Time.now.to_i.to_s[4..9]}"
|
91
201
|
Dir.mkdir @dir
|
92
202
|
end
|
93
203
|
|
94
204
|
# Writes a yaml specification of all the options used to run the experiment
|
95
|
-
def specification!
|
205
|
+
def specification! all = false
|
96
206
|
File.open(@dir + '/specification.yaml', 'w' ) do |out|
|
97
|
-
YAML.dump({:name => @experiment, :date => Time.now, :configuration => Experiment::Config.to_h, :cross_validations => @cvs}, out )
|
207
|
+
YAML.dump({:name => @experiment, :date => Time.now, :configuration => (all ? Experiment::Config.to_hash : Experiment::Config.to_h), :cross_validations => @cvs}, out )
|
98
208
|
end
|
99
209
|
end
|
100
210
|
|
@@ -120,71 +230,59 @@ module Experiment
|
|
120
230
|
f << total.format(" Average: "+Benchmark::FMTSTR)
|
121
231
|
end
|
122
232
|
end
|
123
|
-
|
124
|
-
|
125
|
-
|
126
|
-
|
127
|
-
YAML.dump(results, out)
|
128
|
-
end
|
129
|
-
|
130
|
-
# create an array of arrays
|
131
|
-
res = results.keys.map do |key|
|
132
|
-
# calculate stats
|
133
|
-
a = results[key]
|
134
|
-
[key] + a + [Stats::mean(a), Stats::standard_deviation(a)]
|
135
|
-
end
|
136
|
-
|
137
|
-
ls = results.keys.map{|v| v.to_s.length }
|
138
|
-
|
139
|
-
ls = ["Standard Deviation".length] + ls
|
140
|
-
res = [["cv"] + (1..cvs).to_a.map(&:to_s) + ["Mean", "Standard Deviation"]] + res
|
141
|
-
out = ""
|
142
|
-
res.transpose.each do |col|
|
233
|
+
|
234
|
+
def build_table(table_data, ls)
|
235
|
+
out = ""
|
236
|
+
table_data.each_with_index do |col, row_num|
|
143
237
|
col.each_with_index do |cell, i|
|
144
238
|
l = ls[i]
|
145
239
|
out << "| "
|
146
240
|
if cell.is_a?(String) || cell.is_a?(Symbol)
|
147
241
|
out << sprintf("%#{l}s", cell)
|
148
|
-
|
242
|
+
elsif cell.is_a? Numeric
|
149
243
|
out << sprintf("%#{l}.3f", cell)
|
244
|
+
else
|
245
|
+
out << sprintf("%#{l}s", cell.to_s)
|
150
246
|
end
|
151
247
|
out << " "
|
152
248
|
end
|
249
|
+
|
153
250
|
out << "|\n"
|
154
|
-
|
155
|
-
|
156
|
-
|
157
|
-
|
158
|
-
|
159
|
-
|
160
|
-
|
161
|
-
|
162
|
-
|
163
|
-
|
164
|
-
# csv << results.to_a.sort_by{|a| a.first.to_s}.map(&:last)
|
165
|
-
#end
|
166
|
-
end
|
167
|
-
|
168
|
-
def result_line
|
169
|
-
" Done\n"
|
251
|
+
|
252
|
+
if row_num == 0 || row_num == table_data.length - 3
|
253
|
+
col.each_index do |i|
|
254
|
+
out << "|" + "-" * (ls[i] + 2)
|
255
|
+
end
|
256
|
+
out << "|\n"
|
257
|
+
end
|
258
|
+
end # each_with_index
|
259
|
+
|
260
|
+
return out
|
170
261
|
end
|
171
262
|
|
172
|
-
# A silly method meant to be overriden.
|
173
|
-
# should return an array, which will be then split up for cross-validating
|
174
|
-
def test_data
|
175
|
-
(1..cvs).to_a
|
176
|
-
end
|
177
263
|
|
178
264
|
def split_up_data
|
179
265
|
@data = []
|
180
|
-
|
266
|
+
data_set.each_with_index do |item, i|
|
181
267
|
@data[i % cvs] ||= []
|
182
268
|
@data[i % cvs] << item
|
183
269
|
end
|
184
270
|
@data
|
185
271
|
end
|
186
272
|
|
187
|
-
|
273
|
+
# Performs cleanup tasks
|
274
|
+
def cleanup!
|
275
|
+
if @@cleanup_raw_files
|
276
|
+
FileUtils.rm Dir[@dir + "/raw-*.txt"]
|
277
|
+
end
|
278
|
+
end
|
279
|
+
|
280
|
+
|
281
|
+
|
282
|
+
def header_column
|
283
|
+
[["cv"] + (1..cvs).to_a.map(&:to_s) + ["Mean", "Std Deviation"]]
|
284
|
+
end
|
285
|
+
|
188
286
|
# Yields a handle to the performance table
|
189
287
|
def performance_f(&block) # just a simple wrapper to make code a little DRYer
|
190
288
|
File.open(@dir+"/performance_table.txt", "a", &block)
|
@@ -196,5 +294,7 @@ module Experiment
|
|
196
294
|
h1[key] << value
|
197
295
|
end
|
198
296
|
end
|
297
|
+
|
298
|
+
|
199
299
|
end
|
200
300
|
end
|
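The cross-validation split in `split_up_data` above deals items into `cvs` groups round-robin, and `test_data` returns the group for the current fold. A minimal stand-alone sketch of the same idea follows; the function names `split_into_folds`, `test_fold`, and `training_folds` are hypothetical. The diff's `training_data` subtracts `test_data` (an array of items) from `@data` (an array of folds), which would not appear to remove the current fold, so this sketch excludes the fold by index instead.

```ruby
# Round-robin split of a data set into `folds` groups:
# item i goes to group i % folds.
def split_into_folds(data_set, folds)
  groups = Array.new(folds) { [] }
  data_set.each_with_index { |item, i| groups[i % folds] << item }
  groups
end

# The test data for a fold is simply that fold's group.
def test_fold(groups, current)
  groups[current]
end

# The training data is every other group, flattened one level.
# Excluding by index keeps duplicate items that also occur in other folds.
def training_folds(groups, current)
  groups.each_with_index.reject { |_, i| i == current }.flat_map(&:first)
end

groups = split_into_folds((1..7).to_a, 3)
p groups
p test_fold(groups, 1)
p training_folds(groups, 1)
```

With seven items and three folds the groups are uneven, which is the expected behaviour of a round-robin split when the data set size is not a multiple of the fold count.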