nimbus 2.2.1 → 2.3.0
- checksums.yaml +7 -0
- data/CODE_OF_CONDUCT.md +7 -0
- data/CONTRIBUTING.md +46 -0
- data/MIT-LICENSE.txt +1 -1
- data/README.md +131 -21
- data/bin/nimbus +2 -2
- data/lib/nimbus.rb +2 -6
- data/lib/nimbus/classification_tree.rb +9 -12
- data/lib/nimbus/configuration.rb +22 -22
- data/lib/nimbus/forest.rb +8 -8
- data/lib/nimbus/loss_functions.rb +11 -0
- data/lib/nimbus/regression_tree.rb +8 -10
- data/lib/nimbus/tree.rb +54 -12
- data/lib/nimbus/version.rb +1 -1
- data/spec/classification_tree_spec.rb +47 -47
- data/spec/configuration_spec.rb +55 -55
- data/spec/fixtures/{classification_config.yml → classification/config.yml} +3 -3
- data/spec/fixtures/classification/random_forest.yml +1174 -0
- data/spec/fixtures/{classification_testing.data → classification/testing.data} +0 -0
- data/spec/fixtures/{classification_training.data → classification/training.data} +0 -0
- data/spec/fixtures/{regression_config.yml → regression/config.yml} +4 -4
- data/spec/fixtures/regression/random_forest.yml +2737 -0
- data/spec/fixtures/{regression_testing.data → regression/testing.data} +0 -0
- data/spec/fixtures/{regression_training.data → regression/training.data} +0 -0
- data/spec/forest_spec.rb +39 -39
- data/spec/individual_spec.rb +3 -3
- data/spec/loss_functions_spec.rb +31 -13
- data/spec/nimbus_spec.rb +2 -2
- data/spec/regression_tree_spec.rb +44 -44
- data/spec/training_set_spec.rb +3 -3
- data/spec/tree_spec.rb +4 -4
- metadata +37 -34
- data/spec/fixtures/classification_random_forest.yml +0 -922
- data/spec/fixtures/regression_random_forest.yml +0 -1741
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA1:
+  metadata.gz: 77c90d8df14d57b793836e99eb985c9bac4e80c1
+  data.tar.gz: e7ca59175e219cafd229d178d544fe4a9cb94c58
+SHA512:
+  metadata.gz: 61a480e4d2e165f75059b950e2ce78e53686199e0459fade8944fea8f47d894d91aac2a849fa8cffc6cf39cb448a57b722776f5ee5c219548b9663189bc82e0e
+  data.tar.gz: e27b2d8ebc4f4cd686e2f9df72d77beeea368128c3cdac59284db21bfcf74ec5c1d31b08c265a38fc393c19c529080df5e3551c89adc0809837a3d3469c4141b
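The digests above can be reproduced locally. A minimal Ruby sketch using the stdlib `Digest` module (the file names are assumptions, pointing at the two archives found inside an unpacked `.gem` file):

```ruby
require "digest"

# Compute the same SHA1/SHA512 digests RubyGems records in checksums.yaml.
# "metadata.gz" and "data.tar.gz" are the archives inside a .gem package;
# adjust the paths to wherever you unpacked the gem.
%w[metadata.gz data.tar.gz].each do |name|
  next unless File.exist?(name)
  puts "#{name} SHA1:   #{Digest::SHA1.file(name).hexdigest}"
  puts "#{name} SHA512: #{Digest::SHA512.file(name).hexdigest}"
end
```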
data/CODE_OF_CONDUCT.md
ADDED
@@ -0,0 +1,7 @@
+# Contributor Code of Conduct
+
+The Nimbus team is committed to fostering a welcoming community.
+
+we adopt an inclusive Code of Conduct adapted from the Contributor Covenant, version 1.4, you can read it here: [Contributor Covenant Code of Conduct](http://contributor-covenant.org/version/1/4/).
+
+
data/CONTRIBUTING.md
ADDED
@@ -0,0 +1,46 @@
+# How to Contribute to this Project
+
+## Report an issue
+
+The prefered way to report any bug is [opening an issue in the project's Github repo](https://github.com/xuanxu/nimbus/issues/new).
+
+For more informal communication, you can contact [@xuanxu via twitter](https://twitter.com/xuanxu)
+
+## Resolve an issue
+
+Pull request are welcome. If you want to contribute code to solve an issue:
+
+* Add a comment to tell everyone you are working on the issue.
+* If an issue has someone assigned it means that person is already working on it.
+* Fork the project.
+* Create a topic branch based on master.
+* Commit there your code to solve the issue.
+* Make sure all test are passing (and add specs to test any new feature if needed).
+* Follow these [best practices](https://github.com/styleguide/ruby)
+* Open a *pull request* to the main repository describing what issue you are addressing.
+
+## Cleaning up
+
+In the rush of time sometimes things get messy, you can help us cleaning things up:
+
+* implementing pending specs
+* increasing code coverage
+* improving code quality
+* updating dependencies
+* making code consistent
+
+## Other ways of contributing without coding
+
+* If you think there's a feature missing, or find a bug, create an issue (make sure it has not already been reported).
+* You can also help promoting the project talking about it in your social networks.
+
+## How to report an issue
+
+* Try to use a descriptive and to-the-point title
+* Is a good idea to include some of there sections:
+  * Steps to reproduce the bug
+  * Expected behaviour/response
+  * Actual response
+* Sometimes it is also helpful if you mention your operating system or shell.
+
+Thanks! :heart: :heart: :heart:
data/MIT-LICENSE.txt
CHANGED
data/README.md
CHANGED
@@ -1,9 +1,12 @@
-# Nimbus
+# Nimbus
 Random Forest algorithm for genomic selection.
 
+[![Build Status](https://secure.travis-ci.org/xuanxu/nimbus.png?branch=master)](http://travis-ci.org/xuanxu/nimbus)
+[![Gem Version](https://badge.fury.io/rb/nimbus.png)](http://badge.fury.io/rb/nimbus)
+
 ## Random Forest
 
-The [random forest algorithm](http://en.wikipedia.org/wiki/Random_forest) is
+The [random forest algorithm](http://en.wikipedia.org/wiki/Random_forest) is a classifier consisting in many random decision trees. It is based on choosing random subsets of variables for each tree and using the most frequent, or the averaged tree output as the overall classification. In machine learning terms, it is an ensembler classifier, so it uses multiple models to obtain better predictive performance than could be obtained from any of the constituent models.
 
 The forest outputs the class that is the mean or the mode (in regression problems) or the majority class (in classification problems) of the node's output by individual trees.
 
@@ -35,14 +38,14 @@ Nimbus can be used to:
 
 Nimbus can be used both with regression and classification problems.
 
-**Regression**: is the default mode.
+**Regression**: is the default mode.
 
-* The split of nodes uses quadratic loss as loss function.
+* The split of nodes uses quadratic loss as loss function.
 * Labeling of nodes is made averaging the fenotype values of the individuals in the node.
 
-**Classification**: user-activated declaring `classes` in the configuration file.
+**Classification**: user-activated declaring `classes` in the configuration file.
 
-* The split of nodes uses the Gini index as loss function.
+* The split of nodes uses the Gini index as loss function.
 * Labeling of nodes is made finding the majority fenotype class of the individuals in the node.
 
 ## Variable importances
@@ -51,29 +54,36 @@ By default Nimbus will estimate SNP importances everytime a training file is run
 
 You can disable this behaviour (and speed up the training process) by setting the parameter `var_importances: No` in the configuration file.
 
-##
+## Installation
+
+You need to have [Ruby](https://www.ruby-lang.org) (2.1 or higher) with Rubygems installed in your computer. Then install Nimbus with:
 
-
+````shell
+> gem install nimbus
+````
 
-
+There are not extra dependencies needed.
 
 ## Getting Started
 
 Once you have nimbus installed in your system, you can run the gem using the `nimbus` executable:
 
-
+````shell
+> nimbus
+````
 
-It will look for these files:
+It will look for these files in the directory where Nimbus is running:
 
-* `training.data`: If found it will be used to build a random forest
-* `testing.data` : If found it will be pushed down the forest to obtain predictions for every individual in the file
-* `random_forest.yml`: If found it will be the forest used for the testing.
+* `training.data`: If found it will be used to build a random forest.
+* `testing.data` : If found it will be pushed down the forest to obtain predictions for every individual in the file.
+* `random_forest.yml`: If found it will be the forest used for the testing instead of building one.
+* `config.yml`: A file detailing random forest parameters and datasets. If not found default values will be used.
 
-That way in order to train a forest a training file is needed. And to do the testing you need two files: the testing file and one of the other two: the training OR the random_forest file, because
+That way in order to train a forest a training file is needed. And to do the testing you need two files: the testing file and one of the other two: the training OR the random_forest file, because Nimbus needs a forest from which obtain the predictions.
 
 ## Configuration (config.yml)
 
-The
+The names for the input data files and the forest parameters can be specified in the `config.yml` file that should be located in the directory where you are running `nimbus`.
 
 The `config.yml` has the following structure and parameters:
 
@@ -91,14 +101,14 @@ The `config.yml` has the following structure and parameters:
   SNP_total_count: 200
   node_min_size: 5
 
-Under the input chapter:
+### Under the input chapter:
 
 * `training`: specify the path to the training data file (optional, if specified `nimbus` will create a random forest).
-* `testing`: specify the path to the testing data file (optional, if specified `nimbus` will traverse this
+* `testing`: specify the path to the testing data file (optional, if specified `nimbus` will traverse this data through a random forest).
 * `forest`: specify the path to a file containing a random forest structure (optional, if there is also testing file, this will be the forest used for the testing).
 * `classes`: **optional (needed only for classification problems)**. Specify the list of classes in the input files as a comma separated list between squared brackets, e.g.:`[A, B]`.
 
-Under the forest chapter:
+### Under the forest chapter:
 
 * `forest_size`: number of trees for the forest.
 * `SNP_sample_size_mtry`: size of the random sample of SNPs to be used in every tree node.
@@ -106,6 +116,19 @@ Under the forest chapter:
 * `node_min_size`: minimum amount of individuals in a tree node to make a split.
 * `var_importances`: **optional**. If set to `No` Nimbus will not calculate SNP importances.
 
+### Default values
+
+If there is no config.yml file present, Nimbus will use these default values:
+
+````yaml
+forest_size: 300
+tree_SNP_sample_size: 60
+tree_SNP_total_count: 200
+tree_node_min_size: 5
+training_file: 'training.data'
+testing_file: 'testing.data'
+forest_file: 'forest.yml
+````
 
 ## Input files
 
@@ -137,7 +160,83 @@ After training:
 
 After testing:
 
-* `testing_file_predictions.txt`: A file
+* `testing_file_predictions.txt`: A file detailing the predicted results for the testing dataset.
+
+## Example usage
+
+### Sample files
+
+Sample files are located in the `/spec/fixtures` directory, both for regression and classification problems. They can be used as a starting point to tweak your own configurations.
+
+Depending on the kind of problem you want to test different files are needed:
+
+### Regression
+
+**Test with a Random Forest created from a training data set**
+
+Download/copy the `config.yml`, `training.data` and `testing.data` files from the [regression folder](./tree/master/spec/fixtures/regression).
+
+Then run nimbus:
+
+````shell
+> nimbus
+````
+
+It should output a `random_forest.yml` file with the nodes and structure of the resulting random forest, the `generalization_errors` and `snp_importances` files, and the predictions for both training and testing datasets (`training_file_predictions.txt` and `testing_file_predictions.txt` files).
+
+**Test with a Random Forest previously created**
+
+Download/copy the `config.yml`, `testing.data` and `random_forest.yml` files from the [regression folder](./tree/master/spec/fixtures/regression).
+
+Edit the `config.yml` file to comment/remove the training entry.
+
+Then use nimbus to run the testing:
+
+````shell
+> nimbus
+````
+
+It should output a `testing_file_predictions.txt` file with the resulting predictions for the testing dataset using the given random forest.
+
+### Classification
+
+**Test with a Random Forest created from a training data set**
+
+Download/copy the `config.yml`, `training.data` and `testing.data` files from the [classification folder](./tree/master/spec/fixtures/classification).
+
+Then run nimbus:
+
+````shell
+> nimbus
+````
+
+It should output a `random_forest.yml` file with the nodes and structure of the resulting random forest, the `generalization_errors` file, and the predictions for both training and testing datasets (`training_file_predictions.txt` and `testing_file_predictions.txt` files).
+
+**Test with a Random Forest previously created**
+
+Download/copy the `config.yml`, `testing.data` and `random_forest.yml` files from the [classification folder](./tree/master/spec/fixtures/classification).
+
+Edit the `config.yml` file to comment/remove the training entry.
+
+Then use nimbus to run the testing:
+
+````shell
+> nimbus
+````
+
+It should output a `testing_file_predictions.txt` file with the resulting predictions for the testing dataset using the given random forest.
+
+
+## Test suite
+
+Nimbus includes a test suite located in the `spec` directory. You can run the specs if you clone the code to your local machine and run the default rake task:
+
+````shell
+> git clone git://github.com/xuanxu/nimbus.git
+> cd nimbus
+> bundle install
+> rake
+````
 
 ## Resources
 
@@ -149,8 +248,19 @@ After testing:
 * [Random Forest at Wikipedia](http://en.wikipedia.org/wiki/Random_forest)
 * [RF Leo Breiman page](http://www.stat.berkeley.edu/~breiman/RandomForests/)
 
+
+## Contributing
+
+Contributions are welcome. We encourage you to contribute to the Nimbus codebase.
+
+Please read the [CONTRIBUTING](CONTRIBUTING.md) file.
+
+
 ## Credits
 
 Nimbus was developed by [Juanjo Bazán](http://twitter.com/xuanxu) in collaboration with Oscar González-Recio.
 
-
+
+## LICENSE
+
+Copyright © 2017 Juanjo Bazán, released under the [MIT license](MIT-LICENSE.txt)
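Putting the documented README parameters together: a sketch of what a classification `config.yml` could look like. The `input:`/`forest:` nesting is inferred from the "input chapter"/"forest chapter" wording in the README hunks above, and the values are the documented defaults, not a prescription:

```yaml
input:
  training: training.data
  testing: testing.data
  classes: [A, B]          # only needed for classification problems

forest:
  forest_size: 300
  SNP_sample_size_mtry: 60
  SNP_total_count: 200
  node_min_size: 5
  var_importances: No      # skip SNP importances to speed up training
```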
data/bin/nimbus
CHANGED
@@ -1,7 +1,7 @@
 #!/usr/bin/env ruby
 
 #--
-# Copyright (c) 2011-
+# Copyright (c) 2011-2017 Juanjo Bazan
 #
 # Permission is hereby granted, free of charge, to any person obtaining a copy
 # of this software and associated documentation files (the "Software"), to
@@ -23,4 +23,4 @@
 #++
 
 require 'nimbus'
-Nimbus.application.run
+Nimbus.application.run
data/lib/nimbus.rb
CHANGED
@@ -20,6 +20,7 @@ module Nimbus
 
   STDERR = $stderr
   STDOUT = $stdout
+  STDOUT.sync = true
 
   # Nimbus module singleton methods.
   #
@@ -43,26 +44,21 @@ module Nimbus
     # Writes message to the standard output
     def message(msg)
      STDOUT.puts msg
-      STDOUT.flush
     end
 
     # Writes message to the error output
     def error_message(msg)
       STDERR.puts msg
-      STDERR.flush
     end
 
     # Writes to the standard output
     def write(str)
       STDOUT.write str
-      STDOUT.flush
     end
 
     # Clear current console line
     def clear_line!
-
-      self.write(" " * 50)
-      self.write "\r"
+      print "\r\e[2K"
     end
 
 end
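The new `clear_line!` replaces the old fixed-width trick (overwrite with 50 spaces, then `"\r"`) with an ANSI escape that works for any line length: `"\r"` returns the cursor to column 0 and the CSI sequence `ESC[2K` ("Erase in Line", mode 2) clears the whole line. A standalone sketch of the behavior:

```ruby
# "\r" moves the cursor to the start of the current line; "\e[2K" is the
# ANSI "Erase in Line" sequence with parameter 2 (erase the entire line).
def clear_line!
  print "\r\e[2K"
end

print "building tree 17/300..."
clear_line!           # wipes the progress text in place
puts "done"
```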
data/lib/nimbus/classification_tree.rb
CHANGED
@@ -8,7 +8,7 @@ module Nimbus
   # * 1: Calculate loss function for the individuals in the node (first node contains all the individuals).
   # * 2: Take a random sample of the SNPs (size m << total count of SNPs)
   # * 3: Compute the loss function (default: gini index) for the split of the sample based on value of every SNP.
-  # * 4: If the SNP with minimum loss function also minimizes the general loss of the node, split the individuals sample in
+  # * 4: If the SNP with minimum loss function also minimizes the general loss of the node, split the individuals sample in two nodes, based on average value for that SNP [0,1][2], or [0][1,2]
   # * 5: Repeat from 1 for every node until:
   #   - a) The individuals count in that node is < minimum size OR
   #   - b) None of the SNP splits has a loss function smaller than the node loss function
@@ -34,8 +34,8 @@ module Nimbus
 
     # Creates a node by taking a random sample of the SNPs and computing the loss function for every split by SNP of that sample.
     #
-    # * If SNP_min is the SNP with smaller loss function and it is < the loss function of the node, it splits the individuals sample in
-    # (
+    # * If SNP_min is the SNP with smaller loss function and it is < the loss function of the node, it splits the individuals sample in two:
+    # (the average of the 0,1,2 values for the SNP_min in the individuals is computed, and they are splitted in [<=avg], [>avg]) then it builds these 2 new nodes.
     # * Otherwise every individual in the node gets labeled with the average of the fenotype values of all of them.
     def build_node(individuals_ids, y_hat)
       # General loss function value for the node
@@ -45,24 +45,21 @@ module Nimbus
 
       # Finding the SNP that minimizes loss function
       snps = snps_random_sample
-      min_loss, min_SNP, split, ginis = node_loss_function, nil, nil, nil
+      min_loss, min_SNP, split, split_type, ginis = node_loss_function, nil, nil, nil, nil
 
       snps.each do |snp|
-        individuals_split_by_snp_value =
+        individuals_split_by_snp_value, node_split_type = split_by_snp_avegare_value individuals_ids, snp
         y_hat_0 = Nimbus::LossFunctions.majority_class(individuals_split_by_snp_value[0], @id_to_fenotype, @classes)
         y_hat_1 = Nimbus::LossFunctions.majority_class(individuals_split_by_snp_value[1], @id_to_fenotype, @classes)
-
-
+
         gini_0 = Nimbus::LossFunctions.gini_index individuals_split_by_snp_value[0], @id_to_fenotype, @classes
         gini_1 = Nimbus::LossFunctions.gini_index individuals_split_by_snp_value[1], @id_to_fenotype, @classes
-        gini_2 = Nimbus::LossFunctions.gini_index individuals_split_by_snp_value[2], @id_to_fenotype, @classes
         loss_snp = (individuals_split_by_snp_value[0].size * gini_0 +
-                    individuals_split_by_snp_value[1].size * gini_1
-                    individuals_split_by_snp_value[2].size * gini_2) / individuals_count
+                    individuals_split_by_snp_value[1].size * gini_1) / individuals_count
 
-        min_loss, min_SNP, split, ginis = loss_snp, snp, individuals_split_by_snp_value, [y_hat_0, y_hat_1
+        min_loss, min_SNP, split, split_type, ginis = loss_snp, snp, individuals_split_by_snp_value, node_split_type, [y_hat_0, y_hat_1] if loss_snp < min_loss
       end
-      return build_branch(min_SNP, split, ginis, y_hat) if min_loss < node_loss_function
+      return build_branch(min_SNP, split, split_type, ginis, y_hat) if min_loss < node_loss_function
       return label_node(y_hat, individuals_ids)
     end
 
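The change above moves the classification split from three branches (one per genotype value 0, 1, 2) to two branches around the average SNP value, so the weighted Gini loss is now computed over two groups. A minimal sketch of that weighted two-way Gini loss — illustrative only, not Nimbus's actual `LossFunctions` code:

```ruby
# Gini index of a set of class labels: 1 - sum of squared class frequencies.
# 0.0 means the set is pure (a single class).
def gini_index(labels)
  return 0.0 if labels.empty?
  total = labels.size.to_f
  1.0 - labels.group_by { |l| l }.values.inject(0.0) { |s, g| s + (g.size / total)**2 }
end

# Loss of a two-way split: size-weighted average of the branch Gini indexes.
# The node is split when this is smaller than the node's own loss.
def split_loss(branch0, branch1)
  n = (branch0.size + branch1.size).to_f
  (branch0.size * gini_index(branch0) + branch1.size * gini_index(branch1)) / n
end

puts split_loss(%w[A A B], %w[B B])  # ≈ 0.267, lower means purer branches
```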
data/lib/nimbus/configuration.rb
CHANGED
@@ -36,26 +36,26 @@ module Nimbus
     )
 
     DEFAULTS = {
-      :forest_size => 300,
-      :tree_SNP_sample_size => 60,
-      :tree_SNP_total_count => 200,
-      :tree_node_min_size => 5,
+      forest_size: 300,
+      tree_SNP_sample_size: 60,
+      tree_SNP_total_count: 200,
+      tree_node_min_size: 5,
 
-      :loss_function_discrete => 'majority_class',
-      :loss_function_continuous => 'average',
+      loss_function_discrete: 'majority_class',
+      loss_function_continuous: 'average',
 
-      :training_file => 'training.data',
-      :testing_file => 'testing.data',
-      :forest_file => 'forest.yml',
-      :config_file => 'config.yml',
+      training_file: 'training.data',
+      testing_file: 'testing.data',
+      forest_file: 'forest.yml',
+      config_file: 'config.yml',
 
-      :output_forest_file => 'random_forest.yml',
-      :output_training_file => 'training_file_predictions.txt',
-      :output_testing_file => 'testing_file_predictions.txt',
-      :output_tree_errors_file => 'generalization_errors.txt',
-      :output_snp_importances_file => 'snp_importances.txt',
+      output_forest_file: 'random_forest.yml',
+      output_training_file: 'training_file_predictions.txt',
+      output_testing_file: 'testing_file_predictions.txt',
+      output_tree_errors_file: 'generalization_errors.txt',
+      output_snp_importances_file: 'snp_importances.txt',
 
-      :silent => false
+      silent: false
     }
 
     # Initialize a Nimbus::Configuration object.
@@ -85,10 +85,10 @@ module Nimbus
     # Accessor method for the tree-related subset of options.
     def tree
       {
-        :snp_sample_size => @tree_SNP_sample_size,
-        :snp_total_count => @tree_SNP_total_count,
-        :tree_node_min_size => @tree_node_min_size,
-        :classes => @classes
+        snp_sample_size: @tree_SNP_sample_size,
+        snp_total_count: @tree_SNP_total_count,
+        tree_node_min_size: @tree_node_min_size,
+        classes: @classes
       }
     end
 
@@ -126,8 +126,8 @@ module Nimbus
       @forest_file = File.expand_path(DEFAULTS[:forest_file ], Dir.pwd) if File.exists? File.expand_path(DEFAULTS[:forest_file ], Dir.pwd)
     end
 
-      @do_training = true
-      @do_testing = true
+      @do_training = true unless @training_file.nil?
+      @do_testing = true unless @testing_file.nil?
       @classes = @classes.map{|c| c.to_s.strip} if @classes
 
       if @do_testing && !@do_training && !@forest_file