data_modeler 0.3.4 → 1.0.0
- checksums.yaml +4 -4
- data/.gitignore +1 -0
- data/README.md +88 -15
- data/data_modeler.gemspec +5 -5
- data/installation.md +65 -0
- data/lib/data_modeler.rb +8 -6
- data/lib/data_modeler/dataset/{dataset.rb → accessor.rb} +1 -1
- data/lib/data_modeler/dataset/{dataset_gen.rb → generator.rb} +1 -1
- data/lib/data_modeler/dataset/helper.rb +10 -1
- data/lib/data_modeler/{base.rb → framework.rb} +28 -21
- data/lib/data_modeler/{support.rb → helper.rb} +6 -11
- data/lib/data_modeler/{models → model}/fann.rb +3 -1
- data/lib/data_modeler/{models → model}/selector.rb +1 -1
- data/lib/data_modeler/version.rb +5 -0
- metadata +16 -13
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA1:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 3d0ab252517704c4fd6804f63eee8d4bcfb3e6d2
|
4
|
+
data.tar.gz: 9ef139759ba7f429a9738ab93656e63d9ee709c5
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 11ab20819803dc44e5e73374089eabc6214a27a2f6b36ea3915e92c55964c21af01e8f932bea710e0184cc01aad42499c80768d3e4115c4920fc4d64d8c305fb
|
7
|
+
data.tar.gz: 129a0f555ad41c475902a136a805134a525019d3cc700a77c54cacb3ce8e9b9fad717c863e88593257a05638e5563e36ea0825456c552b556b34c38f8c752db6
|
data/.gitignore
CHANGED
data/README.md
CHANGED
@@ -1,40 +1,113 @@
|
|
1
1
|
|
2
2
|
# [Data Modeler](https://github.com/giuse/data_modeler)
|
3
3
|
|
4
|
+
|
4
5
|
[![Gem Version](https://badge.fury.io/rb/data_modeler.svg)](https://badge.fury.io/rb/data_modeler)
|
5
6
|
[![Build Status](https://travis-ci.org/giuse/data_modeler.svg?branch=master)](https://travis-ci.org/giuse/data_modeler)
|
6
7
|
[![Code Climate](https://codeclimate.com/github/giuse/data_modeler/badges/gpa.svg)](https://codeclimate.com/github/giuse/data_modeler)
|
7
8
|
|
8
|
-
|
9
|
+
|
10
|
+
#### Using machine learning, create generative models based on your data alone. Applications span from prediction to imputation and compression.
|
11
|
+
|
9
12
|
|
10
13
|
## Installation
|
11
14
|
|
12
|
-
Add
|
15
|
+
Add `gem 'data_modeler'` to your Gemfile then `$ bundle`, or install manually with `$ gem install data_modeler`.
|
16
|
+
|
17
|
+
If you're new to Ruby or Bundler, check these detailed [installation instructions](installation.md) first.
|
18
|
+
|
19
|
+
|
20
|
+
## Full documentation
|
21
|
+
|
22
|
+
I wish for my code to stay well documented. If you find the documentation lacking or outdated, please do let me know. You can find it [here](http://www.rubydoc.info/gems/data_modeler/).
|
23
|
+
|
24
|
+
|
25
|
+
## Getting started
|
26
|
+
|
27
|
+
|
28
|
+
### Obtaining a working configuration on example data
|
29
|
+
|
30
|
+
Make a copy of [`/spec/example`](spec/example) for you to play with.
|
31
|
+
The `config*.rb` files are configuration examples. The configuration is written as a simple Ruby `Hash`, and the files themselves can be executed directly (e.g. `ruby config_01.rb`) thanks to the few lines at the bottom.
|
32
|
+
|
33
|
+
The `.csv` files are examples of the format the data must be pre-processed into beforehand: a CSV table with a numeric time as first column, followed by one column for each of the time series available. The data should be complete (i.e. no missing values) and already normalized (depending on the model of choice). The file [`prepare_demo_csv`](spec/example/prepare_demo_csv.rb) can help you get started on the task, as it was used to generate the demo CSV.
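As a quick sanity check before running a config, the format requirements above can be verified with Ruby's standard `csv` library. This is only a sketch on made-up inline data: the series names `s1` and `s2` are hypothetical, and the check that times increase is my own assumption about what a well-formed file looks like, not something the gem enforces as far as shown here.

```ruby
require 'csv'

# Made-up demo data: numeric `time` first, then one column per series.
raw = <<~CSV
  time,s1,s2
  1,0.10,0.90
  2,0.20,0.80
  3,0.30,0.70
CSV

table = CSV.parse(raw, headers: true, converters: :numeric)

# The first column must be the numeric `time`
time_first = table.headers.first == 'time'

# Every cell must be present and numeric (i.e. no missing values)
complete = table.all? { |row| row.fields.all? { |value| value.is_a?(Numeric) } }

# Assumption: times are strictly increasing from one line to the next
increasing = table['time'].each_cons(2).all? { |a, b| a < b }

puts "time first: #{time_first}, complete: #{complete}, increasing: #{increasing}"
```

To check a real file, replace the inline string with `File.read` on your own CSV path.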
|
34
|
+
|
35
|
+
Start by just running one of the configurations, then play around with the config and customize them to your taste. And off you go!
|
36
|
+
|
37
|
+
|
38
|
+
### Understanding the results
|
39
|
+
|
40
|
+
Running a config file will create a folder holding the results; the path can be customized in the config file.
|
41
|
+
Note that `DataModeler.id_from` returns a zero-padded ID string built from the number at the end of a file name, saving you from forgetting to update the output folder after creating a new config by copying an existing one.
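To make the behavior concrete, here is a standalone sketch of the helper. The first method reproduces the one-line body of `id_from` as it appears in the gem's `helper.rb`; the second, `id_from_base10`, is a hypothetical variant of mine and not part of the gem. It is included because Ruby's `Integer()` treats a leading `0` as an octal prefix, so strings such as `"08"` and `"09"` are rejected unless base 10 is requested explicitly.

```ruby
# Same body as the gem's helper, defined standalone for illustration
def id_from(filename)
  format "%02d", Integer(filename[/_(\d+).rb$/, 1])
end

# Hypothetical variant: forces base 10 so that "08" and "09" are accepted
def id_from_base10(filename)
  format "%02d", Integer(filename[/_(\d+).rb$/, 1], 10)
end

puts id_from('config_01.rb')
puts id_from_base10('config_08.rb')
```

If you number your configs past `07`, it may be worth checking how the installed version of the helper behaves with your file names.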
|
42
|
+
|
43
|
+
Inside the results folder you will find a result file (CSV) for each run. They follow the naming convention `tpredobs_<nrun>.csv` as a reminder of their internal structure (time, predictions, observations):
|
44
|
+
|
45
|
+
- First column is `time` and contains the timestamp of each target in the original data
|
46
|
+
- Then come all the columns relative to predictions
|
47
|
+
- The naming pattern `p_<series name>` corresponds to the predicted values for series named "series name" in the original data.
|
48
|
+
- Then come all the columns relative to observations
|
49
|
+
- The naming pattern `o_<series name>` corresponds to the observed values for series named "series name" in the original data.
|
13
50
|
|
14
|
-
|
15
|
-
gem 'data_modeler'
|
16
|
-
```
|
51
|
+
Loading this raw result data allows for easy calculation of residuals and statistics, and to plot your predictions against the ground truth.
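A minimal sketch of such a calculation, using only Ruby's standard `csv` library. The file content below is made up and assumes a single target series named `s3`; with real results you would read a `tpredobs_<nrun>.csv` file from your results folder instead.

```ruby
require 'csv'

# Hypothetical content of a result file with one target series `s3`
raw = <<~CSV
  time,p_s3,o_s3
  1,0.50,0.25
  2,0.75,0.75
  3,0.25,0.50
CSV

table = CSV.parse(raw, headers: true, converters: :numeric)

# Recover the series names from the prediction columns `p_<series name>`
series = table.headers.grep(/\Ap_/).map { |header| header.sub(/\Ap_/, '') }

# Residual = prediction - observation, computed line by line for each series
residuals = series.map do |name|
  [name, table.map { |row| row["p_#{name}"] - row["o_#{name}"] }]
end.to_h

# Mean absolute error per series
mae = residuals.transform_values { |res| res.sum(&:abs) / res.size }

puts residuals
puts mae
```

The same `residuals` hash can be fed to the plotting or statistics library of your choice.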
|
17
52
|
|
18
|
-
And then execute:
|
19
53
|
|
20
|
-
|
54
|
+
### Customizing your experiment
|
21
55
|
|
22
|
-
|
56
|
+
Outdated documentation is often worse than lack of documentation. To understand all configuration options, consider the following:
|
23
57
|
|
24
|
-
|
58
|
+
- All configuration keys but the last refer to the data: where to find the original data, where to save the results, and how to build the train/test sets. I guarantee there will be no default value for these configurations, making it necessary for all the options to be explicitly declared in all `config` files. So everything you find there is everything there is.
|
25
59
|
|
26
|
-
|
60
|
+
- The (usually) last configuration key is named `:learner` and is model dependent, totally flexible.
|
61
|
+
Its (usually) first key is `:type`: you will find a model of the same(ish) name in the folder [`lib/data_modeler/model`](lib/data_modeler/model). The initializer of this class receives the `:learner` sub-configuration hash minus the key `:type` (already consumed to select the model).
|
27
62
|
|
28
|
-
|
63
|
+
  This means that to know all available options you should rely on a previous config file, plus the documentation (or implementation) of the `initialize` method of the model of choice (which should be small).
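For orientation, here is a rough sketch of how such a config `Hash` could be laid out. The keys under `:data` and the sizes under `:tset` mirror names that appear in the framework code; the key names `:input_series` and `:target_series`, the value `:fann` for `:type`, the `:ngens` option and all values are assumptions for illustration only. The example configs in `spec/example` remain the reference.

```ruby
# Sketch of a config Hash; see the lead-in for which keys are assumptions
config = {
  data: {
    input_file: 'demo_ts.csv',  # hypothetical path to the pre-processed CSV
    results_dir: 'results',     # where the results folder is created
    exp_id: '01',               # experiment ID, names the results subfolder
    save_models: true
  },
  tset: {
    input_series: %w[s1 s4],    # assumed key name
    target_series: %w[s3],      # assumed key name
    train_size: 100,
    test_size: 20,
    ninput_points: 1,
    tspread: 0,
    look_ahead: 0
  },
  learner: {
    type: :fann,                # assumed value, selects the model
    ngens: 1000                 # assumed model-specific option
  }
}

# The model initializer receives the :learner hash minus the :type key
learner_opts = config[:learner].dup
type = learner_opts.delete(:type)

puts "model type: #{type}, options: #{learner_opts}"
```

Working on a copy of the `:learner` hash keeps the original config intact, which is handy if you want to save it alongside the results.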
|
29
64
|
|
30
|
-
## Development
|
31
65
|
|
32
|
-
|
33
|
-
|
66
|
+
### Leveraging time series data
|
67
|
+
|
68
|
+
There are three settings under `:tset` in the config which may be cryptic: `ninput_points`, `tspread` and `look_ahead`. I found these three hard to name, so the names may change in the future; please open an issue if I forget to update this section (or if you have suggestions).
|
69
|
+
|
70
|
+
If you don't work with time series, just set them to [1,0,0], use a line counter for `time`, and ignore the following. These three only make sense if the data is composed of aligned time series, with a numeric column `time` -- its unit will also be the unit for `tspread` and `look_ahead`.
|
71
|
+
|
72
|
+
- ninput_points: how many points in time to construct the model's input. For example, if the number is 3, then data coming from 3 data lines is considered.
|
73
|
+
- tspread: time spread between the data lines considered in the point above. For example, if the number is 2, then the data lines considered will have (at least) 2 time (units) between each other.
|
74
|
+
- look_ahead: span between the most recent input considered and the target to be learned. For example, if the number is 5, then the target will be constructed from a data line which is (at least) 5 time (units) later than the most recent input.
|
75
|
+
|
76
|
+
Example configurations:
|
77
|
+
|
78
|
+
- ninput_points = 1, tspread = 0, look_ahead = 0 -> build input from one line, no spreading, predict results in same line. This is the basic configuration allowing same-timestep prediction, e.g. for static modeling or simple data imputation.
|
79
|
+
- ninput_points = 4, tspread = 7, look_ahead = 7 -> hypothesize the unit of the column `time` to be days: build input from 4 lines spanning 21 days at one-week intervals (+ current), then use it to learn to predict one week ahead. This allows training a proper time-ahead predictor, which will estimate the target at a constant one-week-ahead interval.
|
80
|
+
- ninput_points = 30, tspread = 1, look_ahead = 1 -> hypothesize the unit of the column `time` to be seconds: train a real-time predictor estimating a behavior one-second ahead based on 1s-spaced data over the past 29 seconds + current.
|
81
|
+
|
82
|
+
Important: from each line, only the data coming from the listed input time series is considered for input, while the target time series list is used to construct the output.
|
83
|
+
|
84
|
+
Example inputs and targets, considering t0 the "current" time for a given iteration:
|
85
|
+
|
86
|
+
- ninput_points = 1, tspread = 0, look_ahead = 0, input_series = [s1, s4], targets = [s3]: inputs -> [s1t0, s4t0], targets = [s3t0]
|
87
|
+
- ninput_points = 4, tspread = 7, look_ahead = 7, input_series = [s1, s4], targets = [s3, s5]: inputs -> [s1t-21, s4t-21, s1t-14, s4t-14, s1t-7, s4t-7, s1t0, s4t0], targets = [s3t+7, s5t+7]
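The arithmetic behind these three settings can be sketched in a few lines. This is not the gem's implementation: the method names `input_offsets` and `window` are hypothetical, the data is assumed to be regularly spaced (the gem speaks of "at least" so many time units), and the target is placed `look_ahead` units after the most recent input, following the definition of `look_ahead` given above.

```ruby
# Time offsets (relative to the current time t0) of the lines used as input,
# from the oldest to the most recent
def input_offsets(ninput_points:, tspread:)
  (0...ninput_points).map { |i| -i * tspread }.reverse
end

# Lists which [series, time offset] pairs make up the inputs and the targets
def window(input_series:, targets:, ninput_points:, tspread:, look_ahead:)
  offsets = input_offsets(ninput_points: ninput_points, tspread: tspread)
  {
    inputs: offsets.flat_map { |t| input_series.map { |s| [s, t] } },
    targets: targets.map { |s| [s, look_ahead] }
  }
end

weekly = window(input_series: %i[s1 s4], targets: %i[s3 s5],
                ninput_points: 4, tspread: 7, look_ahead: 7)

p weekly[:inputs]
p weekly[:targets]
```

Note that the number of model inputs is `ninput_points` times the number of input series, which matches how the framework computes `ninputs` for the learner.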
|
88
|
+
|
34
89
|
|
35
90
|
## Contributing
|
36
91
|
|
37
|
-
|
92
|
+
|
93
|
+
### Suggestions / requests
|
94
|
+
|
95
|
+
Feel free to open new issues. I mean it. We can work together from there.
|
96
|
+
|
97
|
+
|
98
|
+
### Adding new models
|
99
|
+
|
100
|
+
This system has by design a plug-in architecture. To add your own models, you just need to create a new wrapper in `lib/data_modeler/model`:
|
101
|
+
|
102
|
+
- Duplicate the `fann.rb` model: it provides both instructions and template for the interface you need to present to the system
|
103
|
+
- Duplicate the `spec/model/fann_spec.rb` spec: it will provide instructions on how to verify your model works with the system using some ready `shared_examples`.
|
104
|
+
|
105
|
+
Ideally, a `DataModeler::Model` should be a wrapper around an external independent functionality: keep it as compact as possible. To implement the interface you can use BDD on the `spec`, which verifies both the availability of the interface and basic modeling capabilities.
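To give an idea of the size of such a wrapper, here is a hypothetical toy model that always predicts the mean of the training targets. The method names `reset`, `train` and `test`, and the `ninputs:` / `noutputs:` options, follow the calls made by the framework; the class name `MeanModel` and the shape of `trainset` (a `Hash` with `:input` and `:target` arrays) are assumptions of mine. A real model will likely also need whatever the framework uses for saving, so `fann.rb` remains the authoritative template.

```ruby
# Hypothetical minimal model: always predicts the mean of the training targets
class MeanModel
  attr_reader :ninputs, :noutputs, :means

  # Extra learner options are accepted and ignored in this sketch
  def initialize(ninputs:, noutputs:, **_opts)
    @ninputs = ninputs
    @noutputs = noutputs
    reset
  end

  # Forgets what was learned, so each run starts from scratch
  def reset
    @means = Array.new(noutputs, 0.0)
  end

  # Assumption: trainset is { input: [[...], ...], target: [[...], ...] }
  def train(trainset, report_interval: 1000)
    columns = trainset[:target].transpose
    @means = columns.map { |col| col.sum.to_f / col.size }
  end

  # Returns one prediction (an array of noutputs values) per input line
  def test(inputs)
    inputs.map { means.dup }
  end
end

model = MeanModel.new(ninputs: 2, noutputs: 1)
model.train({ input: [[0, 0], [1, 1]], target: [[1.0], [3.0]] })
p model.test([[5, 5], [6, 6]])
```

Keeping the wrapper this thin makes it easy to drive with the shared examples from the spec.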
|
106
|
+
|
107
|
+
Remember to update [`lib/data_modeler.rb`](lib/data_modeler.rb) to load your file, and add an option to select it in [`lib/data_modeler/model/selector.rb`](lib/data_modeler/model/selector.rb).
|
108
|
+
|
109
|
+
THEN: please do propose a pull request! Share your work with the community!
|
110
|
+
Even if you think it's not polished enough: I'll help out before accepting.
|
38
111
|
|
39
112
|
|
40
113
|
## License
|
data/data_modeler.gemspec
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
# coding: utf-8
|
2
2
|
lib = File.expand_path('../lib', __FILE__)
|
3
3
|
$LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
|
4
|
-
require 'data_modeler/
|
4
|
+
require 'data_modeler/version'
|
5
5
|
|
6
6
|
Gem::Specification.new do |spec|
|
7
7
|
spec.name = "data_modeler"
|
@@ -9,8 +9,8 @@ Gem::Specification.new do |spec|
|
|
9
9
|
spec.authors = ["Giuseppe Cuccu"]
|
10
10
|
spec.email = ["giuseppe.cuccu@gmail.com"]
|
11
11
|
|
12
|
-
spec.summary = %
|
13
|
-
spec.description = %
|
12
|
+
spec.summary = %{Model your data with machine learning}
|
13
|
+
spec.description = %{Using machine learning, create generative models based on your data alone. Applications span from prediction to imputation and compression. This build specifically leverages time series. NOTE: Since version 1.0.0 we're production-ready! :)}
|
14
14
|
spec.homepage = "https://github.com/giuse/data_modeler"
|
15
15
|
spec.license = "MIT"
|
16
16
|
|
@@ -18,12 +18,12 @@ Gem::Specification.new do |spec|
|
|
18
18
|
f.match(%r{^(test|spec|features)/})
|
19
19
|
end
|
20
20
|
spec.bindir = "exe"
|
21
|
-
spec.executables = spec.files.grep(
|
21
|
+
spec.executables = spec.files.grep(/^exe\//, &File.method(:basename))
|
22
22
|
spec.require_paths = ["lib"]
|
23
23
|
|
24
24
|
# Models
|
25
25
|
# TODO: learn to keep them independent from the gem (plug-ins)
|
26
|
-
spec.add_dependency 'ruby-fann', '~>1.2
|
26
|
+
spec.add_dependency 'ruby-fann', '~>1.2'
|
27
27
|
|
28
28
|
# Debug
|
29
29
|
spec.add_development_dependency 'pry', '~> 0.10'
|
data/installation.md
ADDED
@@ -0,0 +1,65 @@
|
|
1
|
+
# Installation instructions
|
2
|
+
|
3
|
+
New to Ruby? Welcome! :)
|
4
|
+
|
5
|
+
|
6
|
+
## Obtaining Ruby
|
7
|
+
|
8
|
+
I handle my Rubies with RVM. Just head to [rvm.io](http://rvm.io) and follow the instructions.
|
9
|
+
Windows users: you're probably better off with [The Ruby Installer](https://rubyinstaller.org/) for starters.
|
10
|
+
|
11
|
+
|
12
|
+
## Installing the gem
|
13
|
+
|
14
|
+
To manually install the gem, use the following command from your command line:
|
15
|
+
|
16
|
+
$ gem install data_modeler
|
17
|
+
|
18
|
+
To include it in a Bundler-managed application, add this to your Gemfile:
|
19
|
+
|
20
|
+
```ruby
|
21
|
+
gem 'data_modeler'
|
22
|
+
```
|
23
|
+
|
24
|
+
And then execute:
|
25
|
+
|
26
|
+
$ bundler
|
27
|
+
|
28
|
+
Bundler will keep your gemset coherent for the life of your application. If you're not using it already, you should totally check it out at [bundler.io](http://bundler.io).
|
29
|
+
|
30
|
+
|
31
|
+
## Testing the installation
|
32
|
+
|
33
|
+
To test if everything installed correctly, launch an interactive Ruby console from the terminal with:
|
34
|
+
|
35
|
+
$ irb
|
36
|
+
|
37
|
+
then try loading the gem
|
38
|
+
|
39
|
+
```ruby
|
40
|
+
ruby> require 'data_modeler'
|
41
|
+
# => true
|
42
|
+
```
|
43
|
+
|
44
|
+
If the command returns `true`, the gem is installed and available. You should be good to go!
|
45
|
+
|
46
|
+
Still, for starters, I advise you to unpack a copy of the gem to play with. This command will create an independent copy you can mess around with at no risk:
|
47
|
+
|
48
|
+
$ gem unpack data_modeler
|
49
|
+
|
50
|
+
Now get in, install the dependencies (did you check out [Bundler](http://bundler.io) as advised?), then run the tests and make sure everything works:
|
51
|
+
|
52
|
+
$ cd data_modeler
|
53
|
+
$ bundle install
|
54
|
+
$ rake
|
55
|
+
|
56
|
+
If the tests run green, everything is working correctly. There's a working configuration example + test data in `spec/example/config_01.rb`. Go ahead and try it:
|
57
|
+
|
58
|
+
$ cd spec/example
|
59
|
+
$ ruby config_01.rb
|
60
|
+
|
61
|
+
This will create a subfolder `01/` containing the results of the computation.
|
62
|
+
|
63
|
+
With this you should be ready to head back to the [README](README.md) for further instructions.
|
64
|
+
|
65
|
+
Enjoy!
|
data/lib/data_modeler.rb
CHANGED
@@ -1,16 +1,18 @@
|
|
1
1
|
require 'pathname'
|
2
2
|
require 'fileutils'
|
3
3
|
|
4
|
-
|
4
|
+
# Gem support
|
5
|
+
require "data_modeler/version"
|
5
6
|
|
6
7
|
# Dataset
|
7
8
|
require "data_modeler/dataset/helper"
|
8
|
-
require "data_modeler/dataset/
|
9
|
-
require "data_modeler/dataset/
|
9
|
+
require "data_modeler/dataset/accessor"
|
10
|
+
require "data_modeler/dataset/generator"
|
10
11
|
|
11
12
|
# Models
|
12
|
-
require "data_modeler/
|
13
|
-
require "data_modeler/
|
13
|
+
require "data_modeler/model/selector"
|
14
|
+
require "data_modeler/model/fann"
|
14
15
|
|
15
16
|
# Framework core
|
16
|
-
require "data_modeler/
|
17
|
+
require "data_modeler/helper"
|
18
|
+
require "data_modeler/framework"
|
@@ -1,6 +1,6 @@
|
|
1
1
|
|
2
2
|
# Build complex inputs and targets from the data to train the model.
|
3
|
-
class DataModeler::Dataset
|
3
|
+
class DataModeler::Dataset::Accessor
|
4
4
|
|
5
5
|
attr_reader :data, :input_series, :target_series, :first_idx, :end_idx,
|
6
6
|
:ninput_points, :tspread, :look_ahead, :first_idx, :target_idx,
|
@@ -14,7 +14,7 @@
|
|
14
14
|
# Note how the test sets line up. This allows the testing results plots
|
15
15
|
# to be continuous, while no model is tested on data on which _itself_ has been trained.
|
16
16
|
# All data is used multiple times, alternately both as train and test sets.
|
17
|
-
class DataModeler::
|
17
|
+
class DataModeler::Dataset::Generator
|
18
18
|
|
19
19
|
attr_reader :data, :ds_args, :first_idx, :train_size, :test_size, :nrows
|
20
20
|
|
@@ -1,4 +1,14 @@
|
|
1
1
|
class DataModeler::Dataset
|
2
|
+
|
3
|
+
# Returns an instance of the DataModeler::Dataset::Accessor class
|
4
|
+
# @param args [Hash] Accessor class parameters
|
5
|
+
# @return [Accessor] initialized instance of Accessor class
|
6
|
+
def self.new *args
|
7
|
+
Accessor.new *args
|
8
|
+
end
|
9
|
+
|
10
|
+
### HELPER MODULES
|
11
|
+
|
2
12
|
# Converts between time and indices for referencing data lines
|
3
13
|
module ConvertingTimeAndIndices
|
4
14
|
# Returns the time for a given index
|
@@ -51,6 +61,5 @@ class DataModeler::Dataset
|
|
51
61
|
def to_a
|
52
62
|
each.to_a
|
53
63
|
end
|
54
|
-
|
55
64
|
end
|
56
65
|
end
|
@@ -5,7 +5,7 @@ require 'csv'
|
|
5
5
|
# - Initializes the system based on the config
|
6
6
|
# - Runs over the data training and testing models
|
7
7
|
# - Results and models are saved to the file system
|
8
|
-
class DataModeler::
|
8
|
+
class DataModeler::Framework
|
9
9
|
|
10
10
|
attr_reader :config, :inputs, :targets, :train_size, :test_size,
|
11
11
|
:nruns, :data, :out_dir, :tset_gen, :model
|
@@ -18,13 +18,13 @@ class DataModeler::Base
|
|
18
18
|
@train_size = config[:tset][:train_size]
|
19
19
|
@test_size = config[:tset][:test_size]
|
20
20
|
@nruns = config[:tset][:nruns] ||= Float::INFINITY # terminates with data
|
21
|
-
@save_models = config[:
|
21
|
+
@save_models = config[:data].delete :save_models
|
22
22
|
|
23
|
-
@data = load_data config[:data]
|
24
|
-
@out_dir = prepare_output config[:
|
23
|
+
@data = load_data config[:data].delete :input_file
|
24
|
+
@out_dir = prepare_output config[:data]
|
25
25
|
|
26
|
-
@tset_gen = DataModeler::
|
27
|
-
@model = DataModeler::
|
26
|
+
@tset_gen = DataModeler::Dataset::Generator.new data, **opts_for(:dataset_gen)
|
27
|
+
@model = DataModeler::Model.selector **opts_for(:learner)
|
28
28
|
end
|
29
29
|
|
30
30
|
# Main control: up to `nruns` (or until end of data) loop train-test-save
|
@@ -33,19 +33,27 @@ class DataModeler::Base
|
|
33
33
|
# @return [void]
|
34
34
|
# @note saves model, preds and obs to the file system at the end of each run
|
35
35
|
def run report_interval: 1000
|
36
|
-
|
36
|
+
printing = report_interval && report_interval > 0
|
37
|
+
over_nruns = nruns == Float::INFINITY ? "" : "/#{nruns}"
|
38
|
+
puts "\nStarting @ #{Time.now}\n#{self}" if printing
|
39
|
+
1.upto(nruns) do |nrun|
|
37
40
|
begin
|
38
41
|
train_set = tset_gen.train(nrun)
|
39
|
-
rescue DataModeler::
|
40
|
-
#
|
41
|
-
break
|
42
|
+
rescue DataModeler::Dataset::Generator::NoDataLeft
|
43
|
+
break # there's not enough data left for a train+test set pair
|
42
44
|
end
|
45
|
+
puts "\nRun #{nrun}#{over_nruns} -- starting @ #{Time.now}" if printing
|
43
46
|
model.reset
|
47
|
+
puts "-Training" if printing
|
44
48
|
model.train train_set, report_interval: report_interval
|
49
|
+
puts "-Testing" if printing
|
45
50
|
times, test_input, observations = tset_gen.test(nrun).values
|
46
51
|
predictions = model.test test_input
|
52
|
+
puts "-Saving" if printing
|
47
53
|
save_run nrun, model, [times, predictions, observations]
|
54
|
+
puts "Run #{nrun}#{over_nruns} -- ending @ #{Time.now}" if printing
|
48
55
|
end
|
56
|
+
puts "\nDone! @ #{Time.now}" if printing
|
49
57
|
end
|
50
58
|
|
51
59
|
# Attribute reader for instance variable `@save_models`, ending in '?' since
|
@@ -64,11 +72,10 @@ class DataModeler::Base
|
|
64
72
|
private
|
65
73
|
|
66
74
|
# Loads the data in a Hash ready for `DatasetGen` (and `Dataset`)
|
67
|
-
# @param
|
68
|
-
# @param file [String/fname] name of the file containing the data (from `config`)
|
75
|
+
# @param file [String/fname] path to the file containing the data (from `config`)
|
69
76
|
# @return [Hash] the data ready for access
|
70
|
-
def load_data
|
71
|
-
filename = Pathname.new
|
77
|
+
def load_data filename
|
78
|
+
filename = Pathname.new filename
|
72
79
|
abort "Only CSV data for now, sorry" unless filename.extname == '.csv'
|
73
80
|
# avoid loading data we won't use
|
74
81
|
series = [:time] + inputs + targets
|
@@ -85,16 +92,16 @@ class DataModeler::Base
|
|
85
92
|
# @param id [String/fname] id of current config/experiment (from `config`)
|
86
93
|
# @return [void]
|
87
94
|
# @note side effect: creates directories on file system to hold output
|
88
|
-
def prepare_output
|
89
|
-
Pathname.new(
|
95
|
+
def prepare_output results_dir:, exp_id:
|
96
|
+
Pathname.new(results_dir).join(exp_id).tap { |path| FileUtils.mkdir_p path }
|
90
97
|
end
|
91
98
|
|
92
99
|
# Compatibility helper, preparing configuration hashes for different classes
|
93
|
-
# @param
|
100
|
+
# @param whom [Symbol] which class are you preparing the config for
|
94
101
|
# @return [Hash] configuration for the class as required
|
95
|
-
def opts_for
|
96
|
-
case
|
97
|
-
when :
|
102
|
+
def opts_for whom
|
103
|
+
case whom
|
104
|
+
when :dataset_gen
|
98
105
|
{ ds_args: opts_for(:dataset),
|
99
106
|
train_size: config[:tset][:train_size],
|
100
107
|
test_size: config[:tset][:test_size]
|
@@ -111,7 +118,7 @@ class DataModeler::Base
|
|
111
118
|
ninputs: (config[:tset][:ninput_points] * inputs.size),
|
112
119
|
noutputs: targets.size
|
113
120
|
})
|
114
|
-
else abort "Unrecognized `
|
121
|
+
else abort "Unrecognized `whom`: '#{whom}'"
|
115
122
|
end
|
116
123
|
end
|
117
124
|
|
@@ -2,25 +2,20 @@
|
|
2
2
|
# Main gem module
|
3
3
|
module DataModeler
|
4
4
|
|
5
|
-
### VERSION
|
6
|
-
|
7
|
-
# Version number
|
8
|
-
VERSION = "0.3.4"
|
9
|
-
|
10
5
|
### HELPER FUNCTIONS
|
11
6
|
|
12
7
|
# Returns a standardized String ID from a (sequentially named) file
|
13
8
|
# @return [String]
|
14
9
|
# @note convenient method to have available in the config
|
15
|
-
def self.
|
10
|
+
def self.id_from filename
|
16
11
|
format "%02d", Integer(filename[/_(\d+).rb$/,1])
|
17
12
|
end
|
18
13
|
|
19
|
-
# Returns an instance of the
|
20
|
-
# @param config [Hash]
|
21
|
-
# @return [
|
14
|
+
# Returns an instance of the Framework class
|
15
|
+
# @param config [Hash] Framework class configuration
|
16
|
+
# @return [Framework] initialized instance of Framework class
|
22
17
|
def self.new config
|
23
|
-
DataModeler::
|
18
|
+
DataModeler::Framework.new config
|
24
19
|
end
|
25
20
|
|
26
21
|
### EXCEPTIONS
|
@@ -30,7 +25,7 @@ module DataModeler
|
|
30
25
|
class TimeNotFoundError < StandardError; end
|
31
26
|
end
|
32
27
|
|
33
|
-
class DataModeler::
|
28
|
+
class DataModeler::Dataset::Generator
|
34
29
|
# Exception: not enough `data` was provided for even a single train+test setup
|
35
30
|
class NotEnoughDataError < StandardError; end
|
36
31
|
|
@@ -2,7 +2,7 @@ require 'ruby-fann'
|
|
2
2
|
|
3
3
|
# Model the data using an artificial neural network, based on the
|
4
4
|
# Fast Artificial Neural Networks (FANN) implementation
|
5
|
-
class DataModeler::
|
5
|
+
class DataModeler::Model::FANN
|
6
6
|
|
7
7
|
attr_reader :fann_opts, :ngens, :fann, :algo, :actfn, :init_weights_range
|
8
8
|
|
@@ -48,6 +48,8 @@ class DataModeler::Models::FANN
|
|
48
48
|
# @param ngens [Integer] number of training generations
|
49
49
|
# @return [void]
|
50
50
|
def train trainset, ngens=@ngens, report_interval: 1000, desired_error: 1e-10
|
51
|
+
# it makes sense to temporarily disable the `report_interval` with `false` or `nil`
|
52
|
+
report_interval ||= 0
|
51
53
|
# special case: not implemented in FANN
|
52
54
|
if algo == :rwg
|
53
55
|
return train_rwg(trainset, ngens,
|
@@ -1,7 +1,7 @@
|
|
1
1
|
|
2
2
|
# All models for the framework should belong to this module.
|
3
3
|
# Also includes a model selector for initialization from config.
|
4
|
-
module DataModeler::
|
4
|
+
module DataModeler::Model
|
5
5
|
# Returns a new `Model` based on the `type` of choice initialized
|
6
6
|
# with `opts` parameters
|
7
7
|
# @param type [Symbol] selects the type of `Model`
|
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: data_modeler
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.
|
4
|
+
version: 1.0.0
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Giuseppe Cuccu
|
8
8
|
autorequire:
|
9
9
|
bindir: exe
|
10
10
|
cert_chain: []
|
11
|
-
date: 2017-
|
11
|
+
date: 2017-06-28 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: ruby-fann
|
@@ -16,14 +16,14 @@ dependencies:
|
|
16
16
|
requirements:
|
17
17
|
- - "~>"
|
18
18
|
- !ruby/object:Gem::Version
|
19
|
-
version: 1.2
|
19
|
+
version: '1.2'
|
20
20
|
type: :runtime
|
21
21
|
prerelease: false
|
22
22
|
version_requirements: !ruby/object:Gem::Requirement
|
23
23
|
requirements:
|
24
24
|
- - "~>"
|
25
25
|
- !ruby/object:Gem::Version
|
26
|
-
version: 1.2
|
26
|
+
version: '1.2'
|
27
27
|
- !ruby/object:Gem::Dependency
|
28
28
|
name: pry
|
29
29
|
requirement: !ruby/object:Gem::Requirement
|
@@ -136,9 +136,10 @@ dependencies:
|
|
136
136
|
- - "~>"
|
137
137
|
- !ruby/object:Gem::Version
|
138
138
|
version: 0.5.4
|
139
|
-
description: Using machine learning, create generative models based on your data
|
140
|
-
Applications span from
|
141
|
-
time series.
|
139
|
+
description: 'Using machine learning, create generative models based on your data
|
140
|
+
alone. Applications span from prediction to imputation and compression. This build
|
141
|
+
specifically leverages time series. NOTE: Since version 1.0.0 we''re production-ready!
|
142
|
+
:)'
|
142
143
|
email:
|
143
144
|
- giuseppe.cuccu@gmail.com
|
144
145
|
executables: []
|
@@ -155,14 +156,16 @@ files:
|
|
155
156
|
- bin/console
|
156
157
|
- bin/setup
|
157
158
|
- data_modeler.gemspec
|
159
|
+
- installation.md
|
158
160
|
- lib/data_modeler.rb
|
159
|
-
- lib/data_modeler/
|
160
|
-
- lib/data_modeler/dataset/
|
161
|
-
- lib/data_modeler/dataset/dataset_gen.rb
|
161
|
+
- lib/data_modeler/dataset/accessor.rb
|
162
|
+
- lib/data_modeler/dataset/generator.rb
|
162
163
|
- lib/data_modeler/dataset/helper.rb
|
163
|
-
- lib/data_modeler/
|
164
|
-
- lib/data_modeler/
|
165
|
-
- lib/data_modeler/
|
164
|
+
- lib/data_modeler/framework.rb
|
165
|
+
- lib/data_modeler/helper.rb
|
166
|
+
- lib/data_modeler/model/fann.rb
|
167
|
+
- lib/data_modeler/model/selector.rb
|
168
|
+
- lib/data_modeler/version.rb
|
166
169
|
homepage: https://github.com/giuse/data_modeler
|
167
170
|
licenses:
|
168
171
|
- MIT
|