eps 0.2.1 → 0.3.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +14 -0
- data/LICENSE.txt +1 -1
- data/README.md +183 -243
- data/lib/eps.rb +27 -3
- data/lib/eps/base_estimator.rb +316 -47
- data/lib/eps/data_frame.rb +141 -0
- data/lib/eps/evaluators/lightgbm.rb +116 -0
- data/lib/eps/evaluators/linear_regression.rb +54 -0
- data/lib/eps/evaluators/naive_bayes.rb +95 -0
- data/lib/eps/evaluators/node.rb +26 -0
- data/lib/eps/label_encoder.rb +41 -0
- data/lib/eps/lightgbm.rb +237 -0
- data/lib/eps/linear_regression.rb +132 -386
- data/lib/eps/metrics.rb +46 -0
- data/lib/eps/model.rb +16 -58
- data/lib/eps/naive_bayes.rb +175 -164
- data/lib/eps/pmml_generators/lightgbm.rb +187 -0
- data/lib/eps/statistics.rb +79 -0
- data/lib/eps/text_encoder.rb +81 -0
- data/lib/eps/utils.rb +22 -0
- data/lib/eps/version.rb +1 -1
- metadata +33 -7
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 3ca27ba2379d1cbfb6f3407ace5ad9dd5fcb71b08e48b8805ddda6483c026194
+  data.tar.gz: 91bb0beb50664dda5c2a42684414b1972e2bff91c3a993926639939c91272ccd
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 648d8098928d0ed952ad4cf2195b3e2562db5a38249357b76eb39c0aa17d8f8f974936c4773b2395ae1b1197aedb6e47c8fd018675496f3f966ee2feebb1ed2d
+  data.tar.gz: aa48887027114d9b654f3564715586a1740b742fe7778602d8db770b4921cff8acfbf90baea3ae6092d7c3962f37763c630857d71fbcd573402dfb016159f0c2
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,17 @@
+## 0.3.0
+
+- Added support for LightGBM
+- Added text features
+- Fixed naive Bayes PMML
+- Fixed error with classification and Daru
+
+Breaking
+
+- LightGBM is now the default for new models
+- Cross-validation happens automatically by default
+- Removed support for JSON and PFA formats
+- Added smoothing to naive Bayes
+
 ## 0.2.1
 
 - Fixed error with `summary`
data/LICENSE.txt
CHANGED
data/README.md
CHANGED
@@ -4,9 +4,7 @@ Machine learning for Ruby
 
 - Build predictive models quickly and easily
 - Serve models built in Ruby, Python, R, and more
-
-- Automatically handles categorical features
-- Works great with the SciRuby ecosystem (Daru & IRuby)
+- No prior knowledge of machine learning required :tada:
 
 Check out [this post](https://ankane.org/rails-meet-data-science) for more info on machine learning with Rails
 
@@ -20,7 +18,11 @@ Add this line to your application’s Gemfile:
 gem 'eps'
 ```
 
-
+On Mac, also install OpenMP:
+
+```sh
+brew install libomp
+```
 
 ## Getting Started
 
@@ -43,160 +45,119 @@ Make a prediction
 model.predict(bedrooms: 2, bathrooms: 1)
 ```
 
-
-
-The target can be numeric (regression) or categorical (classification).
-
-## Building Models
-
-### Training and Test Sets
-
-When building models, it’s a good idea to hold out some data so you can see how well the model will perform on unseen data. To do this, we split our data into two sets: training and test. We build the model with the training set and later evaluate it on the test set.
+Store the model
 
 ```ruby
-
-train_set, test_set = houses.partition { |h| h.sold_at < split_date }
+File.write("model.pmml", model.to_pmml)
 ```
 
-
+Load the model
 
 ```ruby
-
-
+pmml = File.read("model.pmml")
+model = Eps::Model.load_pmml(pmml)
 ```
 
-
+A few notes:
 
-
+- The target can be numeric (regression) or categorical (classification)
+- Pass an array of hashes to `predict` to make multiple predictions at once
+- Models are stored in [PMML](https://en.wikipedia.org/wiki/Predictive_Model_Markup_Language), a standard for model storage
 
-
-train_set.reject! { |h| h.bedrooms.nil? || h.price < 10000 }
-```
+## Building Models
 
-###
+### Goal
 
-
+Often, the goal of building a model is to make good predictions on future data. To help achieve this, Eps splits the data into training and validation sets if you have 30+ data points. It uses the training set to build the model and the validation set to evaluate the performance.
+
+If your data has a time associated with it, it’s highly recommended to use that field for the split.
 
 ```ruby
-
+Eps::Model.new(data, target: :price, split: :listed_at)
 ```
 
-
+Otherwise, the split is random. There are a number of [other options](#validation-options) as well.
 
-
+Performance is reported in the summary.
 
-
-
-```
-
-For times, create features like day of week and hour of day with:
+- For regression, it reports validation RMSE (root mean squared error) - lower is better
+- For classification, it reports validation accuracy - higher is better
 
-
-{weekday: time.wday.to_s, hour: time.hour.to_s}
-```
+Typically, the best way to improve performance is feature engineering.
 
-
+### Feature Engineering
 
-
-def features(house)
-  {
-    bedrooms: house.bedrooms,
-    city_id: house.city_id.to_s,
-    month: house.sold_at.strftime("%b")
-  }
-end
+Features are extremely important for model performance. Features can be:
 
-
-
+1. numeric
+2. categorical
+3. text
 
-
+#### Numeric
 
-
+For numeric features, use any numeric type.
 
 ```ruby
-
-  house.price
-end
-
-train_target = train_set.map { |h| target(h) }
+{bedrooms: 4, bathrooms: 2.5}
 ```
 
-
+#### Categorical
 
-
+For categorical features, use strings or booleans.
 
 ```ruby
-
-puts model.summary
+{state: "CA", basement: true}
 ```
 
-
-
-### Evaluation
-
-When you’re happy with the model, see how well it performs on the test set. This gives us an idea of how well it’ll perform on unseen data.
+Convert any ids to strings so they’re treated as categorical features.
 
 ```ruby
-
-test_target = test_set.map { |h| target(h) }
-model.evaluate(test_features, test_target)
+{city_id: city_id.to_s}
 ```
 
-For regression, this returns:
-
-- RMSE - Root mean square error
-- MAE - Mean absolute error
-- ME - Mean error
-
-We want to minimize the RMSE and MAE and keep the ME around 0.
-
-For classification, this returns:
-
-- Accuracy
-
-We want to maximize the accuracy.
+For dates, create features like day of week and month.
 
-
+```ruby
+{weekday: sold_on.strftime("%a"), month: sold_on.strftime("%b")}
+```
 
-
+For times, create features like day of week and hour of day.
 
 ```ruby
-
-houses.reject! { |h| h.bedrooms.nil? || h.price < 10000 }
-
-# training
-all_features = houses.map { |h| features(h) }
-all_target = houses.map { |h| target(h) }
-model = Eps::Model.new(all_features, all_target)
+{weekday: listed_at.strftime("%a"), hour: listed_at.hour.to_s}
 ```
 
-
+#### Text
 
-
-
-Once the model is trained, we need to store it. Eps uses PMML - [Predictive Model Markup Language](https://en.wikipedia.org/wiki/Predictive_Model_Markup_Language) - a standard for storing models. A great option is to write the model to a file with:
+For text features, use strings with multiple words.
 
 ```ruby
-
+{description: "a beautiful house on top of a hill"}
 ```
 
-
+This creates features based on word count (term frequency).
 
-
+You can specify text features explicitly with:
 
 ```ruby
-
-model = Eps::Model.load_pmml(pmml)
+Eps::Model.new(data, target: :price, text_features: [:description])
 ```
 
-
+You can set advanced options with:
 
 ```ruby
-
+text_features: {
+  description: {
+    min_occurences: 5,
+    max_features: 1000,
+    min_length: 1,
+    case_sensitive: true,
+    tokenizer: /\s+/,
+    stop_words: ["and", "the"]
+  }
+}
 ```
 
-To continuously train models, we recommend [storing them in your database](#database-storage).
-
 ## Full Example
 
 We recommend putting all the model code in a single file. This makes it easy to rebuild the model as needed.
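Taken together, the new sections above describe one round trip: train on hashes of numeric, categorical, and text features, check the summary, then store and reload the model. A minimal sketch with toy data (the column names and values are illustrative only, and a real dataset needs many more rows, since Eps only splits off a validation set at 30+ data points):

```ruby
require "eps"

# numeric (bedrooms), categorical (city_id), and text (description) features
data = [
  {bedrooms: 3, city_id: "12", description: "sunny house on a hill", price: 450000},
  {bedrooms: 2, city_id: "7", description: "cozy cottage, needs work", price: 220000}
  # ... more rows ...
]

model = Eps::Model.new(data, target: :price)
puts model.summary

# a single hash, or an array of hashes for multiple predictions at once
model.predict(bedrooms: 3, city_id: "12", description: "sunny house on a hill")

# store and reload as PMML
File.write("model.pmml", model.to_pmml)
model = Eps::Model.load_pmml(File.read("model.pmml"))
```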
@@ -212,38 +173,18 @@ Here’s what a complete model in `app/ml_models/price_model.rb` may look like:
 ```ruby
 class PriceModel < Eps::Base
   def build
-    houses = House.all
-
-    # divide into training and test set
-    split_date = Date.parse("2018-06-01")
-    train_set, test_set = houses.partition { |h| h.sold_at < split_date }
-
-    # handle outliers and missing values
-    train_set = preprocess(train_set)
+    houses = House.all
 
     # train
-
-
-    model = Eps::Model.new(train_features, train_target)
+    data = houses.map { |v| features(v) }
+    model = Eps::Model.new(data, target: :price, split: :listed_at)
     puts model.summary
 
-    # evaluate
-    test_features = test_set.map { |v| features(v) }
-    test_target = test_set.map { |v| target(v) }
-    metrics = model.evaluate(test_features, test_target)
-    puts "Test RMSE: #{metrics[:rmse]}"
-    # for classification, use:
-    # puts "Test accuracy: #{(100 * metrics[:accuracy]).round}%"
-
-    # finalize
-    houses = preprocess(houses)
-    all_features = houses.map { |h| features(h) }
-    all_target = houses.map { |h| target(h) }
-    model = Eps::Model.new(all_features, all_target)
-
-    # save
+    # save to file
     File.write(model_file, model.to_pmml)
-
+
+    # ensure reloads from file
+    @model = nil
   end
 
   def predict(house)
@@ -252,22 +193,16 @@
 
   private
 
-  def preprocess(train_set)
-    train_set.reject { |h| h.bedrooms.nil? || h.price < 10000 }
-  end
-
   def features(house)
     {
       bedrooms: house.bedrooms,
       city_id: house.city_id.to_s,
-      month: house.sold_at.strftime("%b")
+      month: house.listed_at.strftime("%b"),
+      listed_at: house.listed_at,
+      price: house.price
     }
   end
 
-  def target(house)
-    house.price
-  end
-
   def model
     @model ||= Eps::Model.load_pmml(File.read(model_file))
   end
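For reference, a class like this is typically driven from a job or the console. A hedged sketch of usage (it assumes `Eps::Base` forwards class-level `build` and `predict` calls to an instance, and that `model_file` resolves to the PMML path, details this excerpt does not show):

```ruby
PriceModel.build                 # train, print the summary, and write the PMML file
PriceModel.predict(House.first)  # load the stored model and predict a price
```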
@@ -298,50 +233,17 @@ We recommend monitoring how well your models perform over time. To do this, save
 
 ```ruby
 actual = houses.map(&:price)
-
-Eps.metrics(actual,
+predicted = houses.map(&:predicted_price)
+Eps.metrics(actual, predicted)
 ```
 
-
+For RMSE and MAE, alert if they rise above a certain threshold. For ME, alert if it moves too far away from 0. For accuracy, alert if it drops below a certain threshold.
 
 ## Other Languages
 
-Eps makes it easy to serve models from other languages. You can build models in
-
-Eps can serve linear regression and Naive bayes models. Check out [Scoruby](https://github.com/asafschers/scoruby) to serve other models.
-
-### R
-
-To create a model in R, install the [pmml](https://cran.r-project.org/package=pmml) package
-
-```r
-install.packages("pmml")
-```
-
-For regression, run:
+Eps makes it easy to serve models from other languages. You can build models in Python, R, and others and serve them in Ruby without having to worry about how to deploy or run another language.
 
-
-library(pmml)
-
-model <- lm(dist ~ speed, cars)
-
-# save model
-data <- toString(pmml(model))
-write(data, file="model.pmml")
-```
-
-For classification, run:
-
-```r
-library(pmml)
-library(e1071)
-
-model <- naiveBayes(Species ~ ., iris)
-
-# save model
-data <- toString(pmml(model, predictedField="Species"))
-write(data, file="model.pmml")
-```
+Eps can serve LightGBM, linear regression, and naive Bayes models. Check out [ONNX Runtime](https://github.com/ankane/onnxruntime) and [Scoruby](https://github.com/asafschers/scoruby) to serve other models.
 
 ### Python
 
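The alerting advice above translates directly into a scheduled check. A minimal sketch, assuming `Eps.metrics` returns a hash keyed by metric name; `:rmse` appears verbatim earlier in this README, while `:me` and the thresholds are assumptions:

```ruby
metrics = Eps.metrics(actual, predicted)

warn "RMSE above threshold: #{metrics[:rmse]}" if metrics[:rmse] > 50_000
warn "Predictions look biased: #{metrics[:me]}" if metrics[:me].abs > 10_000
```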
@@ -351,36 +253,25 @@ To create a model in Python, install the [sklearn2pmml](https://github.com/jpmml
 pip install sklearn2pmml
 ```
 
-
+And check out the examples:
 
-
-
-
+- [LightGBM Regression](test/support/python/lightgbm_regression.py)
+- [LightGBM Classification](test/support/python/lightgbm_classification.py)
+- [Linear Regression](test/support/python/linear_regression.py)
+- [Naive Bayes](test/support/python/naive_bayes.py)
 
-
-y = [5 * xi + 3 for xi in x]
+### R
 
-model
-model.fit([[xi] for xi in x], y)
+To create a model in R, install the [pmml](https://cran.r-project.org/package=pmml) package
 
-
-
+```r
+install.packages("pmml")
 ```
 
-
+And check out the examples:
 
-
-
-from sklearn.naive_bayes import GaussianNB
-
-x = [1, 2, 3, 5, 6]
-y = ["ham", "ham", "ham", "spam", "spam"]
-
-model = GaussianNB()
-model.fit([[xi] for xi in x], y)
-
-sklearn2pmml(make_pmml_pipeline(model), "model.pmml")
-```
+- [Linear Regression](test/support/r/linear_regression.R)
+- [Naive Bayes](test/support/r/naive_bayes.R)
 
 ### Verifying
 
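Whichever language produced the PMML, serving it in Ruby looks the same. A sketch, assuming a `model.pmml` exported by one of the example scripts above (the feature names must match whatever columns the original script trained on):

```ruby
pmml = File.read("model.pmml")
model = Eps::Model.load_pmml(pmml)
model.predict(speed: 10) # e.g. for a model trained on a speed column
```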
@@ -413,37 +304,58 @@ CSV.foreach("predictions.csv", headers: true, converters: :numeric) do |row|
 end
 ```
 
-##
+## Data
 
-
+A number of data formats are supported. You can pass the target variable separately.
 
-
+```ruby
+x = [{x: 1}, {x: 2}, {x: 3}]
+y = [1, 2, 3]
+Eps::Model.new(x, y)
+```
 
-
+Or pass arrays of arrays
 
-```
-
+```ruby
+x = [[1, 2], [2, 0], [3, 1]]
+y = [1, 2, 3]
+Eps::Model.new(x, y)
 ```
 
-
+### Daru
+
+Eps works well with Daru data frames.
 
 ```ruby
-
-
+df = Daru::DataFrame.from_csv("houses.csv")
+Eps::Model.new(df, target: "price")
 ```
 
-
+### CSVs
+
+When importing data from CSV files, be sure to convert numeric fields. The `table` method does this automatically.
 
 ```ruby
-
-model = Eps::Model.load_pmml(data)
+CSV.table("data.csv").map { |row| row.to_h }
 ```
 
-##
+## Algorithms
 
-
+Pass an algorithm with:
 
-
+```ruby
+Eps::Model.new(data, algorithm: :linear_regression)
+```
+
+Eps supports:
+
+- LightGBM (default)
+- Linear Regression
+- Naive Bayes
+
+### Linear Regression
+
+To speed up training on large datasets with linear regression, [install GSL](https://www.gnu.org/software/gsl/). With Homebrew, you can use:
 
 ```sh
 brew install gsl
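Putting the CSV note and the `algorithm` option together (a sketch; the `houses.csv` file and its `price` column are assumptions):

```ruby
require "csv"

data = CSV.table("houses.csv").map { |row| row.to_h } # table converts numeric fields
model = Eps::Model.new(data, target: :price, algorithm: :linear_regression)
```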
@@ -457,65 +369,93 @@ gem 'gsl', group: :development
 
 It only needs to be available in environments used to build the model.
 
-
+## Validation Options
 
-
+Pass your own validation set with:
 
-
+```ruby
+Eps::Model.new(data, validation_set: validation_set)
+```
+
+Split on a specific value
 
 ```ruby
-
-y = [1, 2, 3]
-Eps::Model.new(x, y)
+Eps::Model.new(data, split: {column: :listed_at, value: Date.parse("2019-01-01")})
 ```
 
-
+Specify the validation set size (the default is `0.25`, which is 25%)
 
 ```ruby
-
-y = [1, 2, 3]
-Eps::Model.new(x, y)
+Eps::Model.new(data, split: {validation_size: 0.2})
 ```
 
-##
+## Database Storage
 
-
+The database is another place you can store models. It’s good if you retrain models automatically.
 
-
-
-
+> We recommend adding monitoring and guardrails as well if you retrain automatically
+
+Create an ActiveRecord model to store the predictive model.
+
+```sh
+rails g model Model key:string:uniq data:text
 ```
 
-
+Store the model with:
 
 ```ruby
-
-
-train_set = houses.where(train_index)
-test_set = houses.where(train_index.map { |v| !v })
+store = Model.where(key: "price").first_or_initialize
+store.update(data: model.to_pmml)
 ```
 
-
-
-When importing data from CSV files, be sure to convert numeric fields. The `table` method does this automatically.
+Load the model with:
 
 ```ruby
-
+data = Model.find_by!(key: "price").data
+model = Eps::Model.load_pmml(data)
 ```
 
 ## Jupyter & IRuby
 
 You can use [IRuby](https://github.com/SciRuby/iruby) to run Eps in [Jupyter](https://jupyter.org/) notebooks. Here’s how to get [IRuby working with Rails](https://ankane.org/jupyter-rails).
 
-##
+## Upgrading
+
+## 0.3.0
 
-
+Eps 0.3.0 brings a number of improvements, including support for LightGBM and cross-validation. There are a number of breaking changes to be aware of:
 
-
-model.summary(extended: true)
-```
+- LightGBM is now the default for new models. On Mac, run:
 
-
+  ```sh
+  brew install libomp
+  ```
+
+  Pass the `algorithm` option to use linear regression or naive Bayes.
+
+  ```ruby
+  Eps::Model.new(data, algorithm: :linear_regression) # or :naive_bayes
+  ```
+
+- Cross-validation happens automatically by default. You no longer need to create training and test sets manually. If you were splitting on a time, use:
+
+  ```ruby
+  Eps::Model.new(data, split: {column: :listed_at, value: Date.parse("2019-01-01")})
+  ```
+
+  Or randomly, use:
+
+  ```ruby
+  Eps::Model.new(data, split: {validation_size: 0.3})
+  ```
+
+  To continue splitting manually, use:
+
+  ```ruby
+  Eps::Model.new(data, validation_set: test_set)
+  ```
+
+- It’s no longer possible to load models in JSON or PFA formats. Retrain models and save them as PMML.
 
 ## 0.2.0
 
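For anyone on the manual-split path from the upgrade notes, the two styles compose like this (a sketch; the `data` rows and their `listed_at` field are assumptions):

```ruby
require "date"

cutoff = Date.parse("2019-01-01")
train_set, test_set = data.partition { |row| row[:listed_at] < cutoff }

# equivalent in spirit to split: {column: :listed_at, value: cutoff}
model = Eps::Model.new(train_set, target: :price, validation_set: test_set)
```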