data_modeler 1.0.2 → 1.0.3
- checksums.yaml +4 -4
- data/README.md +9 -8
- data/lib/data_modeler/version.rb +1 -1
- metadata +1 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-metadata.gz:
-data.tar.gz:
+metadata.gz: ea57e751cee00062d76e425d59b9c5a6fba273d5
+data.tar.gz: 15092c42c80bde72d6ee710d6495a9ff8dc6a28d
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 1835c0d942162a62b8713e6e255016d9aec1a8e8ccec97d4bc9438b77b8f9e0bea20affb698bb6be10553151a82b1d87b9264438319fc405d8d4c9df8a07101c
+data.tar.gz: 1ee72d1334169d21a25f18fada689dd4a417a2a8251e5bfb51db7b7eeb6891261c4981ff6a29a4639f75d67c95b1e8ae8f263566043d9fec09ede5fe1fcb4194
data/README.md
CHANGED
@@ -7,8 +7,7 @@
 [![Code Climate](https://codeclimate.com/github/giuse/data_modeler/badges/gpa.svg)](https://codeclimate.com/github/giuse/data_modeler)
 
 
-**Using machine learning, create generative models based on your data
-Applications span from prediction to imputation and compression.**
+**Using machine learning, create generative models based on your data.**
 
 
 ## Installation
@@ -68,7 +67,9 @@ This means that to know all available options you should rely on a previous conf
 
 There are three settings under `:tset` in the config which may be cryptic: `ninput_points`, `tspread` and `look_ahead`. Names can change in the future as I found it hard to name these three, please open an issue if I forget to modify this (or if you have suggestions).
 
-If you don't work with time series, just set them to [1,0,0]
+If you don't work with time series, just set them to `[1,0,0]`, use a line counter for `time`, and ignore the following. These three only make sense if the data is composed of aligned time series, with a numeric column `time` -- its unit will also be the unit for `tspread` and `look_ahead`.
+
+The data needs to be indexed (i.e. no repetitions) and sorted by `time`. This implies that different data "lines" in the following explanation have different time values.
 
 - ninput_points: how many points in time to construct the model's input. For example, if the number is 3, then data coming from 3 data lines is considered.
 - tspread: time spread between the data lines considered in the point above. For example, if the number is 2, then the data lines considered will have (at least) 2 time (units) between each other.
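The requirement added in this hunk (a `time` column that is sorted and free of repetitions, or a plain line counter when the data is not a time series) can be checked with a few lines of Ruby. This is only a sketch: `valid_time_index?` and `with_line_counter` are hypothetical helper names, not part of data_modeler, and rows are assumed to be plain hashes.

```ruby
# Hypothetical helpers for illustration; not part of the data_modeler API.

# True when the time values are sorted and contain no repetitions,
# i.e. every value is strictly greater than the one before it.
def valid_time_index?(times)
  times.each_cons(2).all? { |earlier, later| earlier < later }
end

# For data that is not a time series, a line counter can stand in for `time`.
# Rows are assumed to be hashes; a row's own :time value, if any, is kept.
def with_line_counter(rows)
  rows.each_with_index.map { |row, i| { time: i }.merge(row) }
end
```

A counter generated this way is strictly increasing by construction, so it satisfies the check above.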
@@ -76,16 +77,16 @@ If you don't work with time series, just set them to [1,0,0], use a line counter
 
 *Example configurations:*
 
-- ninput_points = 1
-- ninput_points = 4
-- ninput_points = 30
+- ninput_points = `1`, tspread = `0`, look_ahead = `0` -> build input from one line, no spreading, predict results in same line. This is the basic configuration allowing same-timestep prediction, e.g. for static modeling or simple data imputation.
+- ninput_points = `4`, tspread = `7`, look_ahead = `7` -> hypothesize the unit of the column `time` to be days: build input from 4 lines spanning 21 days at one-week intervals (+ current), then use it to learn to predict one week ahead. This allows to train a proper time-ahead predictor, which will estimate the target at a constant one-week ahead interval.
+- ninput_points = `30`, tspread = `1`, look_ahead = `1` -> hypothesize the unit of the column `time` to be seconds: train a real-time predictor estimating a behavior one-second ahead based on 1s-spaced data over the past 29 seconds + current.
 
 Important: from each line, only the data coming from the listed input time series is considered for input, while the target time series list is used to construct the output.
 
 *Example inputs and targets*, considering `t0` the "current" time for a given iteration:
 
-- ninput_points = 1
-- ninput_points = 4
+- ninput_points = `1`, tspread = `0`, look_ahead = `0`, input_series = `[s1, s4]`, targets = `[s3]`: inputs -> `[s1t0, s4t0]`, targets = [s3t0]
+- ninput_points = `4`, tspread = `7`, look_ahead = `7`, input_series = `[s1, s4]`, targets = `[s3, s5]`: inputs -> `[s1t-21, s4t-21, s1t-14, s4t-14, s1t-7, s4t-7, s1t0, s4t0]`, targets = `[s3t7, s5t7]`
 
 
 ## Contributing
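The input/target examples added to the README in this hunk can be reproduced with a short Ruby sketch. It is not the gem's implementation: `window_offsets` and `window_labels` are hypothetical names. It also assumes the input lines sit exactly `tspread` time units apart, whereas the README only promises "at least" that spacing.

```ruby
# Hypothetical helpers for illustration; not part of the data_modeler API.

# Time offsets relative to t0: one per input line (oldest first),
# plus the single offset at which the targets are read.
# Assumes lines are spaced exactly `tspread` units apart.
def window_offsets(ninput_points:, tspread:, look_ahead:)
  inputs = (0...ninput_points).map { |i| -tspread * (ninput_points - 1 - i) }
  { inputs: inputs, target: look_ahead }
end

# Labels in the README's notation, e.g. "s1t-21" for series s1 at t0 - 21.
# For each input line, only the listed input series are taken;
# the target series are read at t0 + look_ahead.
def window_labels(input_series:, targets:, **tset)
  offsets = window_offsets(**tset)
  {
    inputs: offsets[:inputs].flat_map { |dt| input_series.map { |s| "#{s}t#{dt}" } },
    targets: targets.map { |s| "#{s}t#{offsets[:target]}" }
  }
end

weekly = window_labels(input_series: %w[s1 s4], targets: %w[s3 s5],
                       ninput_points: 4, tspread: 7, look_ahead: 7)
puts weekly[:inputs].join(", ")
puts weekly[:targets].join(", ")
```

With `ninput_points = 1`, `tspread = 0`, `look_ahead = 0`, the same helpers collapse to a single line at `t0`, which is the same-timestep configuration listed first in the README.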
data/lib/data_modeler/version.rb
CHANGED