lazar 1.0.0 → 1.0.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 2211d5cf1767b241583acff9a22379b56a5d8f1c
- data.tar.gz: 923a3d00d5c78fd77a2153c973c5e3935c939eda
+ metadata.gz: c17dc3fb7cae4c75aca1be7c0a6286cfbc3f22ce
+ data.tar.gz: 5b9fb4bae6230e427188e0c8e34153fd5a6efa0a
  SHA512:
- metadata.gz: 2a366bae505c427a72211df4d59c7f296ead656bfe3f42db0fb6bb2dc3885028c70ba9df0aa7778c0bd78acdbd7b2939417caafd342a535c4954a34fef410c8d
- data.tar.gz: 04fd93e7ab52517d338e6005223fe22b498d74be324f8dc6ef2e3a4d4a843202abc9224ff55e8ba053ce7a16a6a76301437f4fc061ac2719d65ff3afa392396a
+ metadata.gz: 7cae1ffb410cd9a2d1afd1516ebf99499e2b2447af8707a4381adb652cb59711e1875c11e80cec8fc101f8368224ab21bc378f685b0084ab29c631d798145dca
+ data.tar.gz: d01273022852b6a0b59941a0e881a85ed1400a984912018d97fc137f8ab602cff1fd6f5fb42a65df5d9375cb43ca2809adeefbde7a0e385fd832c189df0da031
data/README.md CHANGED
@@ -26,10 +26,73 @@ Installation
 
  The output should give you more verbose information that can help in debugging (e.g. to identify missing libraries).
 
+ Tutorial
+ --------
+
+ Execute the following commands either from an interactive Ruby shell or a Ruby script:
+
+ ### Create and use `lazar` models for small molecules
+
+ #### Create a training dataset
+
+ Create a CSV file with two columns. The first line is a header: SMILES or InChI in the first column and the endpoint name in the second column. Each following line contains the SMILES or InChI of a training compound (first column) and its toxic activity, qualitative or quantitative (second column). Use -log10 transformed values for regression datasets. Add metadata to a JSON file with the same basename, containing the fields "species", "endpoint", "source" and "unit" (regression only). You can find example training data on [GitHub](https://github.com/opentox/lazar-public-data).
+
+ #### Create and validate a `lazar` model with default algorithms and parameters
+
+ `validated_model = Model::Validation.create_from_csv_file "EPAFHM_log10.csv"`
+
+ This command will create a `lazar` model and validate it with three independent 10-fold crossvalidations.
+
+ #### Inspect crossvalidation results
+
+ `validated_model.crossvalidations`
+
+ #### Predict a new compound
+
+ Create a compound
+
+ `compound = Compound.from_smiles "NC(=O)OCCC"`
+
+ Predict Fathead Minnow Acute Toxicity
+
+ `validated_model.predict compound`
+
+ #### Experiment with other algorithms
+
+ You can pass algorithm parameters to the `Model::Validation.create_from_csv_file` command. The [API documentation](http://rdoc.info/gems/lazar) provides detailed instructions.
+
+ ### Create and use `lazar` nanoparticle models
+
+ #### Create and validate a `nano-lazar` model from eNanoMapper with default algorithms and parameters
+
+ `validated_model = Model::Validation.create_from_enanomapper`
+
+ This command will mirror the eNanoMapper database in the local database, create a `nano-lazar` model and validate it with five independent 10-fold crossvalidations.
+
+ #### Inspect crossvalidation results
+
+ `validated_model.crossvalidations`
+
+ #### Predict nanoparticle toxicities
+
+ Choose a random nanoparticle from the "Protein Corona" dataset:
+ ```ruby
+ training_dataset = Dataset.where(:name => "Protein Corona Fingerprinting Predicts the Cellular Interaction of Gold and Silver Nanoparticles").first
+ nanoparticle = training_dataset.substances.shuffle.first
+ ```
+
+ Predict the "Net Cell Association" endpoint
+
+ `validated_model.predict nanoparticle`
+
+ #### Experiment with other datasets, endpoints and algorithms
+
+ You can pass the `training_dataset`, `prediction_feature` and `algorithms` parameters to the `Model::Validation.create_from_enanomapper` command. The [API documentation](http://rdoc.info/gems/lazar) provides detailed instructions. Detailed documentation and validation results can be found in this [publication](https://github.com/enanomapper/nano-lazar-paper/blob/master/nano-lazar.pdf).
+
  Documentation
  -------------
  * [API documentation](http://rdoc.info/gems/lazar)
 
  Copyright
  ---------
- Copyright (c) 2009-2016 Christoph Helma, Martin Guetlein, Micha Rautenberg, Andreas Maunz, David Vorgrimmler, Denis Gebele. See LICENSE for details.
+ Copyright (c) 2009-2017 Christoph Helma, Martin Guetlein, Micha Rautenberg, Andreas Maunz, David Vorgrimmler, Denis Gebele. See LICENSE for details.
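You can produce the training data files described in the tutorial with Ruby's standard library alone. The sketch below assumes the activities are already -log10 transformed. The SMILES, values, endpoint and metadata strings are placeholders rather than real measurements, and both helper names are hypothetical.

```ruby
require "csv"
require "json"

# Hypothetical helper: a header line (structure type, endpoint name),
# then one row per training compound (structure, activity).
def training_csv(activities, endpoint, structure_type: "SMILES")
  CSV.generate do |csv|
    csv << [structure_type, endpoint]
    activities.each { |structure, value| csv << [structure, value] }
  end
end

# Hypothetical helper: metadata for the JSON file that shares the CSV's basename.
# "unit" is only included for regression datasets.
def training_metadata(species:, endpoint:, source:, unit: nil)
  metadata = { "species" => species, "endpoint" => endpoint, "source" => source }
  metadata["unit"] = unit if unit
  JSON.pretty_generate(metadata)
end

# Placeholder values for illustration only
csv_text = training_csv({ "CCO" => 1.25, "c1ccccc1O" => 3.5 }, "LC50")
json_text = training_metadata(species: "Pimephales promelas", endpoint: "LC50",
                              source: "placeholder", unit: "-log10(mmol/L)")
```

Write `csv_text` and `json_text` to two files with the same basename (for example `my_endpoint.csv` and `my_endpoint.json`) to get the pair the tutorial expects. The unit string is only an example.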
data/VERSION CHANGED
@@ -1 +1 @@
- 1.0.0
+ 1.0.1
data/lib/algorithm.rb CHANGED
@@ -2,6 +2,7 @@ module OpenTox
 
  module Algorithm
 
+ # Execute an algorithm with parameters
  def self.run algorithm, parameters=nil
  klass,method = algorithm.split('.')
  Object.const_get(klass).send(method,parameters)
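`Algorithm.run` resolves a string of the form `"Class.method"` with `Object.const_get` and then calls the class method through `send`. The sketch below shows the same pattern on its own. `MeanDemo` and `run_algorithm` are hypothetical stand-ins, not lazar classes.

```ruby
# Hypothetical stand-in algorithm class
class MeanDemo
  def self.mean(parameters)
    values = parameters[:values]
    values.sum.to_f / values.size
  end
end

# Same dispatch pattern as Algorithm.run: "Class.method" -> Class.method(parameters)
def run_algorithm(algorithm, parameters = nil)
  klass, method = algorithm.split(".")
  Object.const_get(klass).send(method, parameters)
end

run_algorithm("MeanDemo.mean", { values: [1, 2, 3] }) # => 2.0
```

An unknown class name raises `NameError` from `const_get`. `send` can also reach private methods, so `public_send` is the safer choice if the algorithm string may come from untrusted input.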
data/lib/caret.rb CHANGED
@@ -1,9 +1,17 @@
  module OpenTox
  module Algorithm
 
+ # Ruby interface for the R caret package
+ # Caret model list: https://topepo.github.io/caret/modelList.html
  class Caret
- # model list: https://topepo.github.io/caret/modelList.html
 
+ # Create a local R caret model and make a prediction
+ # @param [Array<Float,Bool>] dependent_variables
+ # @param [Array<Array<Float,Bool>>] independent_variables
+ # @param [Array<Float>] weights
+ # @param [String] Caret method
+ # @param [Array<Float,Bool>] query_variables
+ # @return [Hash]
  def self.create_model_and_predict dependent_variables:, independent_variables:, weights:, method:, query_variables:
  remove = []
  # remove independent_variables with single values
@@ -77,12 +85,13 @@ module OpenTox
 
  end
 
- # call caret methods dynamically, e.g. Caret.pls
+ # Call caret methods dynamically, e.g. Caret.pls
  def self.method_missing(sym, *args, &block)
  args.first[:method] = sym.to_s
  self.create_model_and_predict args.first
  end
 
+ # Convert Ruby values to R values
  def self.to_r v
  return "F" if v == false
  return "T" if v == true
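`Caret.method_missing` turns any unknown class method, such as `Caret.pls`, into the caret `method` argument of `create_model_and_predict`. `to_r` maps Ruby booleans to R's `T` and `F`. The sketch below reproduces that dispatch without R. `CaretSketch` is hypothetical and returns a small Hash instead of fitting a model. The rest of `to_r` is not shown in this diff, so the sketch simply passes other values through unchanged, which is an assumption.

Unlike the code above, the sketch splats the parameter Hash with `**`. Since Ruby 3.0, a positional Hash is no longer converted into keyword arguments, so passing `args.first` directly to a method with required keywords raises `ArgumentError`.

```ruby
# Hypothetical stand-in for the Caret wrapper: no R, no model fitting.
class CaretSketch
  def self.create_model_and_predict(dependent_variables:, independent_variables:, weights:, method:, query_variables:)
    { method: method, training_size: dependent_variables.size }
  end

  # CaretSketch.pls(params) -> create_model_and_predict(**params, method: "pls")
  def self.method_missing(sym, *args, &block)
    params = args.first
    return super unless params.is_a?(Hash)
    # merge instead of mutating the caller's Hash, then splat into keywords
    create_model_and_predict(**params.merge(method: sym.to_s))
  end

  # Ruby booleans -> R logical literals; other values pass through (assumption)
  def self.to_r(v)
    return "F" if v == false
    return "T" if v == true
    v
  end
end
```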
@@ -1,9 +1,14 @@
  module OpenTox
  module Algorithm
 
+ # Classification algorithms
  class Classification
 
- def self.weighted_majority_vote dependent_variables:, independent_variables:nil, weights:, query_variables:
+ # Weighted majority vote
+ # @param [Array<TrueClass,FalseClass>] dependent_variables
+ # @param [Array<Float>] weights
+ # @return [Hash]
+ def self.weighted_majority_vote dependent_variables:, independent_variables:nil, weights:, query_variables:nil
  class_weights = {}
  dependent_variables.each_with_index do |v,i|
  class_weights[v] ||= []
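The 1.0.1 signature makes `query_variables` optional. A weighted majority vote only needs the neighbours' activities and their similarity weights. Only the start of the method body is visible here, so the following is a sketch of the technique rather than lazar's exact code. Each class accumulates the weights of the neighbours that carry it, and the class with the largest share of the total weight wins.

```ruby
# Weighted majority vote (sketch): sum the weights per class, normalise by the
# total weight and predict the class with the highest share.
def weighted_majority_vote(dependent_variables:, weights:)
  total = weights.sum
  return { value: nil, probabilities: {} } if total.zero? # no (weighted) neighbours
  class_weights = Hash.new(0.0)
  dependent_variables.each_with_index { |v, i| class_weights[v] += weights[i] }
  probabilities = class_weights.transform_values { |w| w / total }
  { value: probabilities.max_by { |_, p| p }.first, probabilities: probabilities }
end

weighted_majority_vote(dependent_variables: ["active", "inactive", "active"], weights: [0.75, 0.5, 0.25])
# value "active" with probability 2/3, "inactive" with 1/3
```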
data/lib/compound.rb CHANGED
@@ -2,6 +2,7 @@ CACTUS_URI="https://cactus.nci.nih.gov/chemical/structure/"
 
  module OpenTox
 
+ # Small molecules with defined chemical structures
  class Compound < Substance
  require_relative "unique_descriptors.rb"
  DEFAULT_FINGERPRINT = "MP2D"
@@ -28,6 +29,9 @@ module OpenTox
  compound
  end
 
+ # Create chemical fingerprint
+ # @param [String] fingerprint type
+ # @return [Array<String>]
  def fingerprint type=DEFAULT_FINGERPRINT
  unless fingerprints[type]
  return [] unless self.smiles
@@ -75,6 +79,9 @@ module OpenTox
  fingerprints[type]
  end
 
+ # Calculate physchem properties
+ # @param [Array<Hash>] list of descriptors
+ # @return [Array<Float>]
  def calculate_properties descriptors=PhysChem::OPENBABEL
  calculated_ids = properties.keys
  # BSON::ObjectId instances are not allowed as keys in a BSON document.
@@ -96,6 +103,10 @@ module OpenTox
  descriptors.collect{|d| properties[d.id.to_s]}
  end
 
+ # Match a SMARTS substructure
+ # @param [String] smarts
+ # @param [TrueClass,FalseClass] count matches or return true/false
+ # @return [TrueClass,FalseClass,Fixnum]
  def smarts_match smarts, count=false
  obconversion = OpenBabel::OBConversion.new
  obmol = OpenBabel::OBMol.new
@@ -116,8 +127,8 @@ module OpenTox
  # Create a compound from smiles string
  # @example
  # compound = OpenTox::Compound.from_smiles("c1ccccc1")
- # @param [String] smiles Smiles string
- # @return [OpenTox::Compound] Compound
+ # @param [String] smiles
+ # @return [OpenTox::Compound]
  def self.from_smiles smiles
  if smiles.match(/\s/) # spaces seem to confuse obconversion and may lead to invalid smiles
  $logger.warn "SMILES parsing failed for '#{smiles}'', SMILES string contains whitespaces."
@@ -132,9 +143,9 @@ module OpenTox
  end
  end
 
- # Create a compound from inchi string
- # @param inchi [String] smiles InChI string
- # @return [OpenTox::Compound] Compound
+ # Create a compound from InChI string
+ # @param [String] InChI
+ # @return [OpenTox::Compound]
  def self.from_inchi inchi
  #smiles = `echo "#{inchi}" | "#{File.join(File.dirname(__FILE__),"..","openbabel","bin","babel")}" -iinchi - -ocan`.chomp.strip
  smiles = obconversion(inchi,"inchi","can")
@@ -145,9 +156,9 @@ module OpenTox
  end
  end
 
- # Create a compound from sdf string
- # @param sdf [String] smiles SDF string
- # @return [OpenTox::Compound] Compound
+ # Create a compound from SDF
+ # @param [String] SDF
+ # @return [OpenTox::Compound]
  def self.from_sdf sdf
  # do not store sdf because it might be 2D
  Compound.from_smiles obconversion(sdf,"sdf","can")
@@ -156,40 +167,38 @@ module OpenTox
  # Create a compound from name. Relies on an external service for name lookups.
  # @example
  # compound = OpenTox::Compound.from_name("Benzene")
- # @param name [String] can be also an InChI/InChiKey, CAS number, etc
- # @return [OpenTox::Compound] Compound
+ # @param [String] name, can also be an InChI/InChIKey, CAS number, etc.
+ # @return [OpenTox::Compound]
  def self.from_name name
  Compound.from_smiles RestClientWrapper.get(File.join(CACTUS_URI,URI.escape(name),"smiles"))
  end
 
  # Get InChI
- # @return [String] InChI string
+ # @return [String]
  def inchi
  unless self["inchi"]
-
  result = obconversion(smiles,"smi","inchi")
- #result = `echo "#{self.smiles}" | "#{File.join(File.dirname(__FILE__),"..","openbabel","bin","babel")}" -ismi - -oinchi`.chomp
  update(:inchi => result.chomp) if result and !result.empty?
  end
  self["inchi"]
  end
 
  # Get InChIKey
- # @return [String] InChIKey string
+ # @return [String]
  def inchikey
  update(:inchikey => obconversion(smiles,"smi","inchikey")) unless self["inchikey"]
  self["inchikey"]
  end
 
  # Get (canonical) smiles
- # @return [String] Smiles string
+ # @return [String]
  def smiles
  update(:smiles => obconversion(self["smiles"],"smi","can")) unless self["smiles"]
  self["smiles"]
  end
 
- # Get sdf
- # @return [String] SDF string
+ # Get SDF
+ # @return [String]
  def sdf
  if self.sdf_id.nil?
  sdf = obconversion(smiles,"smi","sdf")
@@ -209,7 +218,6 @@ module OpenTox
  update(:svg_id => $gridfs.insert_one(file))
  end
  $gridfs.find_one(_id: self.svg_id).data
-
  end
 
  # Get png image
@@ -223,26 +231,27 @@ module OpenTox
  update(:png_id => $gridfs.insert_one(file))
  end
  Base64.decode64($gridfs.find_one(_id: self.png_id).data)
-
  end
 
  # Get all known compound names. Relies on an external service for name lookups.
  # @example
  # names = compound.names
- # @return [String] Compound names
+ # @return [Array<String>]
  def names
  update(:names => RestClientWrapper.get("#{CACTUS_URI}#{inchi}/names").split("\n")) unless self["names"]
  self["names"]
  end
 
- # @return [String] PubChem Compound Identifier (CID), derieved via restcall to pubchem
+ # Get PubChem Compound Identifier (CID), obtained via REST call to PubChem
+ # @return [String]
  def cid
  pug_uri = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/"
  update(:cid => RestClientWrapper.post(File.join(pug_uri, "compound", "inchi", "cids", "TXT"),{:inchi => inchi}).strip) unless self["cid"]
  self["cid"]
  end
 
- # @return [String] ChEMBL database compound id, derieved via restcall to chembl
+ # Get ChEMBL database compound id, obtained via REST call to ChEMBL
+ # @return [String]
  def chemblid
  # https://www.ebi.ac.uk/chembldb/ws#individualCompoundByInChiKey
  uri = "https://www.ebi.ac.uk/chemblws/compounds/smiles/#{smiles}.json"
@@ -292,7 +301,7 @@ module OpenTox
  mg.to_f/molecular_weight
  end
 
- # Calculate molecular weight of Compound with OB and store it in object
+ # Calculate molecular weight of Compound with OB and store it in compound object
  # @return [Float] molecular weight
  def molecular_weight
  mw_feature = PhysChem.find_or_create_by(:name => "Openbabel.MW")
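The context above converts a mass to an amount of substance with `mg.to_f/molecular_weight`. With the molecular weight in g/mol, milligrams divided by g/mol give millimoles. The README also recommends a -log10 transform for regression endpoints. Combining the two, a concentration in mg/L can be prepared as shown in this plain-Ruby sketch. The helper names are hypothetical, and the molecular weight used is that of glucose (about 180.16 g/mol).

```ruby
# mg / (g/mol) = mmol, so (mg/L) / (g/mol) = mmol/L
def mg_to_mmol(mg, molecular_weight)
  mg.to_f / molecular_weight
end

# -log10 transform for regression training data (only defined for positive values)
def neg_log10(value)
  raise ArgumentError, "value must be positive" unless value.positive?
  -Math.log10(value)
end

# 18.016 mg/L of glucose (MW 180.16 g/mol) is about 0.1 mmol/L, i.e. about 1.0 after -log10
neg_log10(mg_to_mmol(18.016, 180.16))
```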
@@ -1,10 +1,16 @@
  module OpenTox
 
  module Validation
+
+ # Crossvalidation
  class CrossValidation < Validation
  field :validation_ids, type: Array, default: []
  field :folds, type: Integer, default: 10
 
+ # Create a crossvalidation
+ # @param [OpenTox::Model::Lazar]
+ # @param [Fixnum] number of folds
+ # @return [OpenTox::Validation::CrossValidation]
  def self.create model, n=10
  $logger.debug model.algorithms
  klass = ClassificationCrossValidation if model.is_a? Model::LazarClassification
@@ -41,14 +47,20 @@ module OpenTox
  cv
  end
 
+ # Get execution time
+ # @return [Fixnum]
  def time
  finished_at - created_at
  end
 
+ # Get individual validations
+ # @return [Array<OpenTox::Validation>]
  def validations
  validation_ids.collect{|vid| TrainTest.find vid}
  end
 
+ # Get predictions for all compounds
+ # @return [Array<Hash>]
  def predictions
  predictions = {}
  validations.each{|v| predictions.merge!(v.predictions)}
@@ -56,6 +68,7 @@ module OpenTox
  end
  end
 
+ # Crossvalidation of classification models
  class ClassificationCrossValidation < CrossValidation
  include ClassificationStatistics
  field :accept_values, type: Array
@@ -68,6 +81,7 @@ module OpenTox
  field :probability_plot_id, type: BSON::ObjectId
  end
 
+ # Crossvalidation of regression models
  class RegressionCrossValidation < CrossValidation
  include RegressionStatistics
  field :rmse, type: Float, default:0
@@ -78,10 +92,16 @@ module OpenTox
  field :correlation_plot_id, type: BSON::ObjectId
  end
 
+ # Independent repeated crossvalidations
  class RepeatedCrossValidation < Validation
  field :crossvalidation_ids, type: Array, default: []
  field :correlation_plot_id, type: BSON::ObjectId
 
+ # Create repeated crossvalidations
+ # @param [OpenTox::Model::Lazar]
+ # @param [Fixnum] number of folds
+ # @param [Fixnum] number of repeats
+ # @return [OpenTox::Validation::RepeatedCrossValidation]
  def self.create model, folds=10, repeats=3
  repeated_cross_validation = self.new
  repeats.times do |n|
@@ -92,6 +112,8 @@ module OpenTox
  repeated_cross_validation
  end
 
+ # Get crossvalidations
+ # @return [Array<OpenTox::Validation::CrossValidation>]
  def crossvalidations
  crossvalidation_ids.collect{|id| CrossValidation.find(id)}
  end
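`CrossValidation.create` defaults to 10 folds and `RepeatedCrossValidation.create` to 3 repeats. The README describes the result as independent 10-fold crossvalidations, but the fold assignment itself lies outside this diff. One common approach is to shuffle the indices once per repeat and deal them round-robin into folds, so that every compound is tested exactly once per repeat. The helpers below sketch that approach and are hypothetical.

```ruby
# Split indices 0...size into `folds` [training, test] pairs after shuffling.
def kfold_splits(size, folds: 10, random: Random.new)
  indices = (0...size).to_a.shuffle(random: random)
  Array.new(folds) do |k|
    # every folds-th shuffled index goes into test fold k
    test = indices.select.with_index { |_, j| j % folds == k }
    [indices - test, test]
  end
end

# Independent repeats use independent (seeded, reproducible) shuffles.
def repeated_kfold_splits(size, folds: 10, repeats: 3, seed: 42)
  Array.new(repeats) { |r| kfold_splits(size, folds: folds, random: Random.new(seed + r)) }
end
```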
data/lib/dataset.rb CHANGED
@@ -3,32 +3,43 @@ require 'tempfile'
 
  module OpenTox
 
+ # Collection of substances and features
  class Dataset
 
  field :data_entries, type: Hash, default: {}
 
  # Readers
 
+ # Get all compounds
+ # @return [Array<OpenTox::Compound>]
  def compounds
  substances.select{|s| s.is_a? Compound}
  end
 
+ # Get all nanoparticles
+ # @return [Array<OpenTox::Nanoparticle>]
  def nanoparticles
  substances.select{|s| s.is_a? Nanoparticle}
  end
 
  # Get all substances
+ # @return [Array<OpenTox::Substance>]
  def substances
  @substances ||= data_entries.keys.collect{|id| OpenTox::Substance.find id}.uniq
  @substances
  end
 
  # Get all features
+ # @return [Array<OpenTox::Feature>]
  def features
  @features ||= data_entries.collect{|sid,data| data.keys.collect{|id| OpenTox::Feature.find(id)}}.flatten.uniq
  @features
  end
 
+ # Get all values for a given substance and feature
+ # @param [OpenTox::Substance,BSON::ObjectId,String] substance or substance id
+ # @param [OpenTox::Feature,BSON::ObjectId,String] feature or feature id
+ # @return [TrueClass,FalseClass,Float]
  def values substance,feature
  substance = substance.id if substance.is_a? Substance
  feature = feature.id if feature.is_a? Feature
@@ -41,6 +52,10 @@ module OpenTox
 
  # Writers
 
+ # Add a value for a given substance and feature
+ # @param [OpenTox::Substance,BSON::ObjectId,String] substance or substance id
+ # @param [OpenTox::Feature,BSON::ObjectId,String] feature or feature id
+ # @param [TrueClass,FalseClass,Float]
  def add(substance,feature,value)
  substance = substance.id if substance.is_a? Substance
  feature = feature.id if feature.is_a? Feature
@@ -87,7 +102,7 @@ module OpenTox
 
  # Serialisation
 
- # converts dataset to csv format including compound smiles as first column, other column headers are feature names
+ # Convert dataset to csv format including compound smiles as first column, other column headers are feature names
  # @return [String]
  def to_csv(inchi=false)
  CSV.generate() do |csv|
@@ -130,6 +145,9 @@ module OpenTox
  #end
 
  # Create a dataset from CSV file
+ # @param [File]
+ # @param [TrueClass,FalseClass] accept or reject empty values
+ # @return [OpenTox::Dataset]
  def self.from_csv_file file, accept_empty_values=false
  source = file
  name = File.basename(file,".*")
@@ -145,8 +163,10 @@ module OpenTox
  dataset
  end
 
- # parse data in tabular format (e.g. from csv)
- # does a lot of guesswork in order to determine feature types
+ # Parse data in tabular format (e.g. from csv)
+ # does a lot of guesswork in order to determine feature types
+ # @param [Array<Array>]
+ # @param [TrueClass,FalseClass] accept or reject empty values
  def parse_table table, accept_empty_values
 
  # features
@@ -225,6 +245,7 @@ module OpenTox
  save
  end
 
+ # Delete dataset
  def delete
  compounds.each{|c| c.dataset_ids.delete id.to_s}
  super
@@ -238,14 +259,20 @@ module OpenTox
  field :prediction_feature_id, type: BSON::ObjectId
  field :predictions, type: Hash, default: {}
 
+ # Get prediction feature
+ # @return [OpenTox::Feature]
  def prediction_feature
  Feature.find prediction_feature_id
  end
 
+ # Get all compounds
+ # @return [Array<OpenTox::Compound>]
  def compounds
  substances.select{|s| s.is_a? Compound}
  end
 
+ # Get all substances
+ # @return [Array<OpenTox::Substance>]
  def substances
  predictions.keys.collect{|id| Substance.find id}
  end
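`Dataset` keeps all measurements in the `data_entries` Hash. `substances` reads its keys, and `features` reads the keys of the nested Hashes. The sketch below mirrors that layout in plain Ruby under three assumptions:

* entries are stored as `{substance_id => {feature_id => [values]}}`;
* each cell holds an Array, so replicate measurements can be kept;
* ids are normalised to Strings.

`DatasetSketch` is hypothetical. The exact value container used by `Dataset#add` is not visible in this diff.

```ruby
# Hypothetical in-memory model of data_entries:
# { substance_id => { feature_id => [value, ...] } }
class DatasetSketch
  attr_reader :data_entries

  def initialize
    @data_entries = {}
  end

  # Append a value; ids are normalised to Strings (an assumption of this sketch)
  def add(substance_id, feature_id, value)
    features = (@data_entries[substance_id.to_s] ||= {})
    (features[feature_id.to_s] ||= []) << value
  end

  # All values for one substance/feature pair, or nil if there are none
  def values(substance_id, feature_id)
    @data_entries.dig(substance_id.to_s, feature_id.to_s)
  end

  def substance_ids
    @data_entries.keys
  end

  def feature_ids
    @data_entries.values.flat_map(&:keys).uniq
  end
end
```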
data/lib/feature.rb CHANGED
@@ -8,10 +8,14 @@ module OpenTox
  field :unit, type: String
  field :conditions, type: Hash
 
+ # Is it a nominal feature?
+ # @return [TrueClass,FalseClass]
  def nominal?
  self.class == NominalFeature
  end
 
+ # Is it a numeric feature?
+ # @return [TrueClass,FalseClass]
  def numeric?
  self.class == NumericFeature
  end
@@ -30,6 +34,9 @@ module OpenTox
  class Smarts < NominalFeature
  field :smarts, type: String
  index "smarts" => 1
+ # Create feature from SMARTS string
+ # @param [String]
+ # @return [OpenTox::Feature]
  def self.from_smarts smarts
  self.find_or_create_by :smarts => smarts
  end
@@ -1,13 +1,16 @@
  module OpenTox
  module Algorithm
 
+ # Feature selection algorithms
  class FeatureSelection
 
+ # Select features correlated to the model's prediction feature
+ # @param [OpenTox::Model::Lazar]
  def self.correlation_filter model
  relevant_features = {}
  R.assign "dependent", model.dependent_variables.collect{|v| to_r(v)}
  model.descriptor_weights = []
- selected_variables = []
+ selected_variables = []
  selected_descriptor_ids = []
  model.independent_variables.each_with_index do |v,i|
  v.collect!{|n| to_r(n)}
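`correlation_filter` passes the dependent and independent variables to R and keeps descriptors that correlate with the prediction feature. The test and threshold it applies in R are not part of this diff. The plain-Ruby illustration below computes Pearson's r for every descriptor column and keeps those whose |r| reaches a cutoff. The cutoff of 0.5 and the helper names are assumptions, and this fixed |r| cutoff stands in for whatever significance test lazar actually performs.

```ruby
# Pearson correlation coefficient of two equally long numeric arrays
def pearson(x, y)
  n = x.size.to_f
  mean_x = x.sum / n
  mean_y = y.sum / n
  covariance = x.zip(y).sum { |a, b| (a - mean_x) * (b - mean_y) }
  # square roots of the summed squared deviations
  norm_x = Math.sqrt(x.sum { |a| (a - mean_x)**2 })
  norm_y = Math.sqrt(y.sum { |b| (b - mean_y)**2 })
  return 0.0 if norm_x.zero? || norm_y.zero? # constant column: no correlation
  covariance / (norm_x * norm_y)
end

# Keep descriptor columns whose |r| with the dependent variable reaches min_abs_r.
# Returns { column_index => r } for the selected columns (hypothetical helper).
def correlation_filter(dependent, independent, min_abs_r: 0.5)
  independent.each_with_index.with_object({}) do |(column, i), selected|
    r = pearson(column, dependent)
    selected[i] = r if r.abs >= min_abs_r
  end
end
```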
data/lib/import.rb CHANGED
@@ -1,12 +1,14 @@
  module OpenTox
 
+ # Import data from external databases
  module Import
 
  class Enanomapper
  include OpenTox
 
- # time critical step: JSON parsing (>99%), Oj brings only minor speed gains (~1%)
+ # Import from eNanoMapper
  def self.import
+ # time critical step: JSON parsing (>99%), Oj brings only minor speed gains (~1%)
  datasets = {}
  bundles = JSON.parse(RestClientWrapper.get('https://data.enanomapper.net/bundle?media=application%2Fjson'))["dataset"]
  bundles.each do |bundle|
@@ -20,6 +22,7 @@ module OpenTox
  uri = c["component"]["compound"]["URI"]
  uri = CGI.escape File.join(uri,"&media=application/json")
  data = JSON.parse(RestClientWrapper.get "https://data.enanomapper.net/query/compound/url/all?media=application/json&search=#{uri}")
+ source = data["dataEntry"][0]["compound"]["URI"]
  smiles = data["dataEntry"][0]["values"]["https://data.enanomapper.net/feature/http%3A%2F%2Fwww.opentox.org%2Fapi%2F1.1%23SMILESDefault"]
  names = []
  names << data["dataEntry"][0]["values"]["https://data.enanomapper.net/feature/http%3A%2F%2Fwww.opentox.org%2Fapi%2F1.1%23ChemicalNameDefault"]
@@ -31,6 +34,7 @@ module OpenTox
  else
  compound = Compound.find_or_create_by(:name => names.first,:names => names.compact)
  end
+ compound.source = source
  compound.save
  if c["relation"] == "HAS_CORE"
  core_id = compound.id.to_s
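The new `source` line reads the compound URI through a chain of `[]` lookups. That chain raises `NoMethodError` on `nil` if a response lacks `dataEntry` or `compound`. `Hash#dig` and `Array#dig` return `nil` instead. In the small sketch below, the JSON strings are hypothetical fragments with the same key layout, not real eNanoMapper responses, and `first_entry_source` is a hypothetical helper.

```ruby
require "json"

# Compound URI of the first data entry, or nil if any level is missing
def first_entry_source(data)
  data.dig("dataEntry", 0, "compound", "URI")
end

sample = JSON.parse('{"dataEntry":[{"compound":{"URI":"https://example.org/compound/1"},"values":{}}]}')
first_entry_source(sample)                          # => "https://example.org/compound/1"
first_entry_source(JSON.parse('{"dataEntry":[]}'))  # => nil
```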
@@ -2,8 +2,12 @@ module OpenTox
 
  module Validation
 
+ # Leave one out validation
  class LeaveOneOut < Validation
 
+ # Create a leave one out validation
+ # @param [OpenTox::Model::Lazar]
+ # @return [OpenTox::Validation::LeaveOneOut]
  def self.create model
  bad_request_error "Cannot create leave one out validation for models with supervised feature selection. Please use crossvalidation instead." if model.algorithms[:feature_selection]
  $logger.debug "#{model.name}: LOO validation started"
@@ -32,6 +36,7 @@ module OpenTox
 
  end
 
+ # Leave one out validation for classification models
  class ClassificationLeaveOneOut < LeaveOneOut
  include ClassificationStatistics
  field :accept_values, type: Array
@@ -44,6 +49,7 @@ module OpenTox
  field :confidence_plot_id, type: BSON::ObjectId
  end
 
+ # Leave one out validation for regression models
  class RegressionLeaveOneOut < LeaveOneOut
  include RegressionStatistics
  field :rmse, type: Float, default: 0
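`RegressionLeaveOneOut` stores an `rmse` field. Leave-one-out predicts each compound from a model trained on all the others, and the root mean squared error summarises the prediction errors. The sketch below only illustrates the bookkeeping. It uses the mean of the remaining activities as a deliberately trivial, hypothetical predictor in place of a lazar model, and it needs at least two values.

```ruby
# Leave-one-out: predict each value from all other values.
# The "model" is just the mean of the training values (hypothetical baseline).
def leave_one_out_predictions(values)
  values.each_index.map do |i|
    training = values.each_with_index.reject { |_, j| j == i }.map(&:first)
    training.sum.to_f / training.size
  end
end

# Root mean squared error between measured and predicted values
def rmse(measured, predicted)
  Math.sqrt(measured.zip(predicted).sum { |m, p| (m - p)**2 } / measured.size.to_f)
end

measured = [1.0, 2.0, 3.0]
predicted = leave_one_out_predictions(measured) # => [2.5, 2.0, 1.5]
rmse(measured, predicted)                       # => Math.sqrt(1.5), about 1.2247
```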