eps 0.2.1 → 0.3.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: ae5bc00818b79dc5e07f4dcda7ca56aa825f1014d1f70203564a87cb49b375d4
4
- data.tar.gz: 8ba22dddc8635da418c429c12066c63bc1aea15238c32c4a7c4185f66281b6a5
3
+ metadata.gz: 3ca27ba2379d1cbfb6f3407ace5ad9dd5fcb71b08e48b8805ddda6483c026194
4
+ data.tar.gz: 91bb0beb50664dda5c2a42684414b1972e2bff91c3a993926639939c91272ccd
5
5
  SHA512:
6
- metadata.gz: e8a0f8cc325d26618691613a6213f6471b45c94a22bb2c9eb6ea729543dce4deabd9875d8e7055649fde90066d30d09c0b1b61949598c6859557e8270ff8e776
7
- data.tar.gz: 0b8d0918e9571ce1587d4497b8de84b64222f19ba7c466d24bd490605dcfbed10ebfa10ad10ed6f03a13ab45a7bc1b77d5928f45d0a9f8ba7757e569591a36fe
6
+ metadata.gz: 648d8098928d0ed952ad4cf2195b3e2562db5a38249357b76eb39c0aa17d8f8f974936c4773b2395ae1b1197aedb6e47c8fd018675496f3f966ee2feebb1ed2d
7
+ data.tar.gz: aa48887027114d9b654f3564715586a1740b742fe7778602d8db770b4921cff8acfbf90baea3ae6092d7c3962f37763c630857d71fbcd573402dfb016159f0c2
data/CHANGELOG.md CHANGED
@@ -1,3 +1,17 @@
1
+ ## 0.3.0
2
+
3
+ - Added support for LightGBM
4
+ - Added text features
5
+ - Fixed naive Bayes PMML
6
+ - Fixed error with classification and Daru
7
+
8
+ Breaking
9
+
10
+ - LightGBM is now the default for new models
11
+ - Cross-validation happens automatically by default
12
+ - Removed support for JSON and PFA formats
13
+ - Added smoothing to naive Bayes
14
+
1
15
  ## 0.2.1
2
16
 
3
17
  - Fixed error with `summary`
data/LICENSE.txt CHANGED
@@ -1,6 +1,6 @@
1
1
  The MIT License (MIT)
2
2
 
3
- Copyright (c) 2018 Andrew Kane
3
+ Copyright (c) 2018-2019 Andrew Kane
4
4
 
5
5
  Permission is hereby granted, free of charge, to any person obtaining a copy
6
6
  of this software and associated documentation files (the "Software"), to deal
data/README.md CHANGED
@@ -4,9 +4,7 @@ Machine learning for Ruby
4
4
 
5
5
  - Build predictive models quickly and easily
6
6
  - Serve models built in Ruby, Python, R, and more
7
- - Supports regression (linear regression) and classification (naive Bayes)
8
- - Automatically handles categorical features
9
- - Works great with the SciRuby ecosystem (Daru & IRuby)
7
+ - No prior knowledge of machine learning required :tada:
10
8
 
11
9
  Check out [this post](https://ankane.org/rails-meet-data-science) for more info on machine learning with Rails
12
10
 
@@ -20,7 +18,11 @@ Add this line to your application’s Gemfile:
20
18
  gem 'eps'
21
19
  ```
22
20
 
23
- To speed up training on large datasets, you can also [add GSL](#training-performance).
21
+ On Mac, also install OpenMP:
22
+
23
+ ```sh
24
+ brew install libomp
25
+ ```
24
26
 
25
27
  ## Getting Started
26
28
 
@@ -43,160 +45,119 @@ Make a prediction
43
45
  model.predict(bedrooms: 2, bathrooms: 1)
44
46
  ```
45
47
 
46
- > Pass an array of hashes make multiple predictions at once
47
-
48
- The target can be numeric (regression) or categorical (classification).
49
-
50
- ## Building Models
51
-
52
- ### Training and Test Sets
53
-
54
- When building models, it’s a good idea to hold out some data so you can see how well the model will perform on unseen data. To do this, we split our data into two sets: training and test. We build the model with the training set and later evaluate it on the test set.
48
+ Store the model
55
49
 
56
50
  ```ruby
57
- split_date = Date.parse("2018-06-01")
58
- train_set, test_set = houses.partition { |h| h.sold_at < split_date }
51
+ File.write("model.pmml", model.to_pmml)
59
52
  ```
60
53
 
61
- If your data doesn’t have a time associated with it, you can split it randomly.
54
+ Load the model
62
55
 
63
56
  ```ruby
64
- rng = Random.new(1) # seed random number generator
65
- train_set, test_set = houses.partition { rng.rand < 0.7 }
57
+ pmml = File.read("model.pmml")
58
+ model = Eps::Model.load_pmml(pmml)
66
59
  ```
67
60
 
68
- ### Outliers and Missing Data
61
+ A few notes:
69
62
 
70
- Next, decide what to do with outliers and missing data. There are a number of methods for handling them, but the easiest is to remove them.
63
+ - The target can be numeric (regression) or categorical (classification)
64
+ - Pass an array of hashes to `predict` to make multiple predictions at once
65
+ - Models are stored in [PMML](https://en.wikipedia.org/wiki/Predictive_Model_Markup_Language), a standard for model storage
71
66
 
72
- ```ruby
73
- train_set.reject! { |h| h.bedrooms.nil? || h.price < 10000 }
74
- ```
67
+ ## Building Models
75
68
 
76
- ### Feature Engineering
69
+ ### Goal
77
70
 
78
- Selecting features for a model is extremely important for performance. Features can be numeric or categorical. For categorical features, there’s no need to create dummy variables - just pass the data as strings.
71
+ Often, the goal of building a model is to make good predictions on future data. To help achieve this, Eps splits the data into training and validation sets if you have 30+ data points. It uses the training set to build the model and the validation set to evaluate the performance.
72
+
73
+ If your data has a time associated with it, it’s highly recommended to use that field for the split.
79
74
 
80
75
  ```ruby
81
- {state: "CA"}
76
+ Eps::Model.new(data, target: :price, split: :listed_at)
82
77
  ```
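To make the idea of a time-based split concrete, here is a minimal sketch in plain Ruby. It is not Eps's internal implementation: the `time_split` helper and its 25% validation share are assumptions for illustration. It holds out the most recent rows for validation.

```ruby
require "date"

# Hypothetical helper (not part of Eps): sort rows by a time column and
# hold out the most recent share of them for validation.
def time_split(rows, column:, validation_size: 0.25)
  sorted = rows.sort_by { |row| row[column] }
  cutoff = (sorted.size * (1 - validation_size)).floor
  [sorted.first(cutoff), sorted.drop(cutoff)]
end

# Made-up sample data: one listing per day
rows = (1..8).map do |day|
  {listed_at: Date.new(2019, 1, day), price: 100_000 + day * 1_000}
end

# Input order does not matter because the helper sorts by time first
shuffled = rows.shuffle(random: Random.new(1))
train_set, validation_set = time_split(shuffled, column: :listed_at)
```

Every row in the validation set is newer than every row in the training set, which mimics predicting on future data.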
83
78
 
84
- > Categorical features generate coefficients for each distinct value except for one
79
+ Otherwise, the split is random. There are a number of [other options](#validation-options) as well.
85
80
 
86
- Convert any ids to strings so they’re treated as categorical features.
81
+ Performance is reported in the summary.
87
82
 
88
- ```ruby
89
- {city_id: city_id.to_s}
90
- ```
91
-
92
- For times, create features like day of week and hour of day with:
83
+ - For regression, it reports validation RMSE (root mean squared error) - lower is better
84
+ - For classification, it reports validation accuracy - higher is better
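For reference, the two metrics can be computed by hand. This is a sketch with hypothetical helper names (`rmse` and `accuracy` are defined here, not taken from Eps), using the standard definitions.

```ruby
# Root mean squared error for regression: lower is better
def rmse(actual, predicted)
  squared_errors = actual.zip(predicted).map { |a, e| (a - e)**2 }
  Math.sqrt(squared_errors.sum / actual.size.to_f)
end

# Share of exact matches for classification: higher is better
def accuracy(actual, predicted)
  actual.zip(predicted).count { |a, e| a == e } / actual.size.to_f
end

regression_error = rmse([3, 5, 7], [2, 5, 9])
classification_accuracy = accuracy(%w[ham spam ham spam], %w[ham spam spam spam])
```

RMSE is in the same units as the target, so it only makes sense to compare it between models built on the same data.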
93
85
 
94
- ```ruby
95
- {weekday: time.wday.to_s, hour: time.hour.to_s}
96
- ```
86
+ Typically, the best way to improve performance is feature engineering.
97
87
 
98
- In practice, your code may look like:
88
+ ### Feature Engineering
99
89
 
100
- ```ruby
101
- def features(house)
102
- {
103
- bedrooms: house.bedrooms,
104
- city_id: house.city_id.to_s,
105
- month: house.sold_at.strftime("%b")
106
- }
107
- end
90
+ Features are extremely important for model performance. Features can be:
108
91
 
109
- train_features = train_set.map { |h| features(h) }
110
- ```
92
+ 1. numeric
93
+ 2. categorical
94
+ 3. text
111
95
 
112
- > We use a method for features so it can be used across training, evaluation, and prediction
96
+ #### Numeric
113
97
 
114
- We also need to prepare the target variable.
98
+ For numeric features, use any numeric type.
115
99
 
116
100
  ```ruby
117
- def target(house)
118
- house.price
119
- end
120
-
121
- train_target = train_set.map { |h| target(h) }
101
+ {bedrooms: 4, bathrooms: 2.5}
122
102
  ```
123
103
 
124
- ### Training
104
+ #### Categorical
125
105
 
126
- Now, let’s train the model.
106
+ For categorical features, use strings or booleans.
127
107
 
128
108
  ```ruby
129
- model = Eps::Model.new(train_features, train_target)
130
- puts model.summary
109
+ {state: "CA", basement: true}
131
110
  ```
132
111
 
133
- For regression, the summary includes the coefficients and their significance. The lower the p-value, the more significant the feature is. p-values below 0.05 are typically considered significant. It also shows the adjusted r-squared, which is a measure of how well the model fits the data. The higher the number, the better the fit. Here’s a good explanation of why it’s [better than r-squared](https://www.quora.com/What-is-the-difference-between-R-squared-and-Adjusted-R-squared).
134
-
135
- ### Evaluation
136
-
137
- When you’re happy with the model, see how well it performs on the test set. This gives us an idea of how well it’ll perform on unseen data.
112
+ Convert any ids to strings so they’re treated as categorical features.
138
113
 
139
114
  ```ruby
140
- test_features = test_set.map { |h| features(h) }
141
- test_target = test_set.map { |h| target(h) }
142
- model.evaluate(test_features, test_target)
115
+ {city_id: city_id.to_s}
143
116
  ```
144
117
 
145
- For regression, this returns:
146
-
147
- - RMSE - Root mean square error
148
- - MAE - Mean absolute error
149
- - ME - Mean error
150
-
151
- We want to minimize the RMSE and MAE and keep the ME around 0.
152
-
153
- For classification, this returns:
154
-
155
- - Accuracy
156
-
157
- We want to maximize the accuracy.
118
+ For dates, create features like day of week and month.
158
119
 
159
- ### Finalize
120
+ ```ruby
121
+ {weekday: sold_on.strftime("%a"), month: sold_on.strftime("%b")}
122
+ ```
160
123
 
161
- Now that we have an idea of how the model will perform, we want to retrain the model with all of our data. Treat outliers and missing data the same as you did with the training set.
124
+ For times, create features like day of week and hour of day.
162
125
 
163
126
  ```ruby
164
- # outliers and missing data
165
- houses.reject! { |h| h.bedrooms.nil? || h.price < 10000 }
166
-
167
- # training
168
- all_features = houses.map { |h| features(h) }
169
- all_target = houses.map { |h| target(h) }
170
- model = Eps::Model.new(all_features, all_target)
127
+ {weekday: listed_at.strftime("%a"), hour: listed_at.hour.to_s}
171
128
  ```
172
129
 
173
- We now have a model that’s ready to serve.
130
+ #### Text
174
131
 
175
- ## Serving Models
176
-
177
- Once the model is trained, we need to store it. Eps uses PMML - [Predictive Model Markup Language](https://en.wikipedia.org/wiki/Predictive_Model_Markup_Language) - a standard for storing models. A great option is to write the model to a file with:
132
+ For text features, use strings with multiple words.
178
133
 
179
134
  ```ruby
180
- File.write("model.pmml", model.to_pmml)
135
+ {description: "a beautiful house on top of a hill"}
181
136
  ```
182
137
 
183
- > You may need to add `nokogiri` to your Gemfile
138
+ This creates features based on word count (term frequency).
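As a rough picture of what word-count features look like, here is a sketch in plain Ruby. The `term_frequency` helper is hypothetical, and its defaults (whitespace tokenizer, case-insensitive, no stop words) are assumptions rather than Eps's documented behavior.

```ruby
# Hypothetical helper (not part of Eps): count how often each word
# appears in a string.
def term_frequency(text, tokenizer: /\s+/, stop_words: [], case_sensitive: false)
  text = text.downcase unless case_sensitive
  words = text.split(tokenizer).reject { |w| w.empty? || stop_words.include?(w) }
  words.tally
end

counts = term_frequency("a beautiful house on top of a hill", stop_words: ["on", "of"])
```

Each distinct word becomes one numeric feature whose value is the number of times it occurs in the text.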
184
139
 
185
- To load a model, use:
140
+ You can specify text features explicitly with:
186
141
 
187
142
  ```ruby
188
- pmml = File.read("model.pmml")
189
- model = Eps::Model.load_pmml(pmml)
143
+ Eps::Model.new(data, target: :price, text_features: [:description])
190
144
  ```
191
145
 
192
- Now we can use it to make predictions.
146
+ You can set advanced options with:
193
147
 
194
148
  ```ruby
195
- model.predict(bedrooms: 2, bathrooms: 1)
149
+ text_features: {
150
+ description: {
151
+ min_occurences: 5,
152
+ max_features: 1000,
153
+ min_length: 1,
154
+ case_sensitive: true,
155
+ tokenizer: /\s+/,
156
+ stop_words: ["and", "the"]
157
+ }
158
+ }
196
159
  ```
197
160
 
198
- To continuously train models, we recommend [storing them in your database](#database-storage).
199
-
200
161
  ## Full Example
201
162
 
202
163
  We recommend putting all the model code in a single file. This makes it easy to rebuild the model as needed.
@@ -212,38 +173,18 @@ Here’s what a complete model in `app/ml_models/price_model.rb` may look like:
212
173
  ```ruby
213
174
  class PriceModel < Eps::Base
214
175
  def build
215
- houses = House.all.to_a
216
-
217
- # divide into training and test set
218
- split_date = Date.parse("2018-06-01")
219
- train_set, test_set = houses.partition { |h| h.sold_at < split_date }
220
-
221
- # handle outliers and missing values
222
- train_set = preprocess(train_set)
176
+ houses = House.all
223
177
 
224
178
  # train
225
- train_features = train_set.map { |v| features(v) }
226
- train_target = train_set.map { |v| target(v) }
227
- model = Eps::Model.new(train_features, train_target)
179
+ data = houses.map { |v| features(v) }
180
+ model = Eps::Model.new(data, target: :price, split: :listed_at)
228
181
  puts model.summary
229
182
 
230
- # evaluate
231
- test_features = test_set.map { |v| features(v) }
232
- test_target = test_set.map { |v| target(v) }
233
- metrics = model.evaluate(test_features, test_target)
234
- puts "Test RMSE: #{metrics[:rmse]}"
235
- # for classification, use:
236
- # puts "Test accuracy: #{(100 * metrics[:accuracy]).round}%"
237
-
238
- # finalize
239
- houses = preprocess(houses)
240
- all_features = houses.map { |h| features(h) }
241
- all_target = houses.map { |h| target(h) }
242
- model = Eps::Model.new(all_features, all_target)
243
-
244
- # save
183
+ # save to file
245
184
  File.write(model_file, model.to_pmml)
246
- @model = nil # reset for future predictions
185
+
186
+ # ensure reloads from file
187
+ @model = nil
247
188
  end
248
189
 
249
190
  def predict(house)
@@ -252,22 +193,16 @@ class PriceModel < Eps::Base
252
193
 
253
194
  private
254
195
 
255
- def preprocess(train_set)
256
- train_set.reject { |h| h.bedrooms.nil? || h.price < 10000 }
257
- end
258
-
259
196
  def features(house)
260
197
  {
261
198
  bedrooms: house.bedrooms,
262
199
  city_id: house.city_id.to_s,
263
- month: house.sold_at.strftime("%b")
200
+ month: house.listed_at.strftime("%b"),
201
+ listed_at: house.listed_at,
202
+ price: house.price
264
203
  }
265
204
  end
266
205
 
267
- def target(house)
268
- house.price
269
- end
270
-
271
206
  def model
272
207
  @model ||= Eps::Model.load_pmml(File.read(model_file))
273
208
  end
@@ -298,50 +233,17 @@ We recommend monitoring how well your models perform over time. To do this, save
298
233
 
299
234
  ```ruby
300
235
  actual = houses.map(&:price)
301
- estimated = houses.map(&:estimated_price)
302
- Eps.metrics(actual, estimated)
236
+ predicted = houses.map(&:predicted_price)
237
+ Eps.metrics(actual, predicted)
303
238
  ```
304
239
 
305
- This returns the same evaluation metrics as model building. For RMSE and MAE, alert if they rise above a certain threshold. For ME, alert if it moves too far away from 0. For accuracy, alert if it drops below a certain threshold.
240
+ For RMSE and MAE, alert if they rise above a certain threshold. For ME, alert if it moves too far away from 0. For accuracy, alert if it drops below a certain threshold.
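A threshold check along these lines could look like the sketch below. The helper names and threshold values are hypothetical, and the sign convention for ME (actual minus predicted) is an assumption.

```ruby
# Mean absolute error: average size of the errors
def mae(actual, predicted)
  actual.zip(predicted).sum { |a, e| (a - e).abs } / actual.size.to_f
end

# Mean error: average signed error (assumed here to be actual minus
# predicted), which reveals systematic over- or under-prediction
def me(actual, predicted)
  actual.zip(predicted).sum { |a, e| a - e } / actual.size.to_f
end

# Hypothetical thresholds; choose values that suit your data
def alerts(actual, predicted, max_mae: 10_000, max_abs_me: 5_000)
  messages = []
  messages << "MAE too high" if mae(actual, predicted) > max_mae
  messages << "ME too far from 0" if me(actual, predicted).abs > max_abs_me
  messages
end

actual = [200_000, 310_000, 150_000]
predicted = [190_000, 300_000, 120_000]
messages = alerts(actual, predicted)
```

An empty array means every metric is within its threshold.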
306
241
 
307
242
  ## Other Languages
308
243
 
309
- Eps makes it easy to serve models from other languages. You can build models in R, Python, and others and serve them in Ruby without having to worry about how to deploy or run another language.
310
-
311
- Eps can serve linear regression and Naive bayes models. Check out [Scoruby](https://github.com/asafschers/scoruby) to serve other models.
312
-
313
- ### R
314
-
315
- To create a model in R, install the [pmml](https://cran.r-project.org/package=pmml) package
316
-
317
- ```r
318
- install.packages("pmml")
319
- ```
320
-
321
- For regression, run:
244
+ Eps makes it easy to serve models from other languages. You can build models in Python, R, and others and serve them in Ruby without having to worry about how to deploy or run another language.
322
245
 
323
- ```r
324
- library(pmml)
325
-
326
- model <- lm(dist ~ speed, cars)
327
-
328
- # save model
329
- data <- toString(pmml(model))
330
- write(data, file="model.pmml")
331
- ```
332
-
333
- For classification, run:
334
-
335
- ```r
336
- library(pmml)
337
- library(e1071)
338
-
339
- model <- naiveBayes(Species ~ ., iris)
340
-
341
- # save model
342
- data <- toString(pmml(model, predictedField="Species"))
343
- write(data, file="model.pmml")
344
- ```
246
+ Eps can serve LightGBM, linear regression, and naive Bayes models. Check out [ONNX Runtime](https://github.com/ankane/onnxruntime) and [Scoruby](https://github.com/asafschers/scoruby) to serve other models.
345
247
 
346
248
  ### Python
347
249
 
@@ -351,36 +253,25 @@ To create a model in Python, install the [sklearn2pmml](https://github.com/jpmml
351
253
  pip install sklearn2pmml
352
254
  ```
353
255
 
354
- For regression, run:
256
+ And check out the examples:
355
257
 
356
- ```python
357
- from sklearn2pmml import sklearn2pmml, make_pmml_pipeline
358
- from sklearn.linear_model import LinearRegression
258
+ - [LightGBM Regression](test/support/python/lightgbm_regression.py)
259
+ - [LightGBM Classification](test/support/python/lightgbm_classification.py)
260
+ - [Linear Regression](test/support/python/linear_regression.py)
261
+ - [Naive Bayes](test/support/python/naive_bayes.py)
359
262
 
360
- x = [1, 2, 3, 5, 6]
361
- y = [5 * xi + 3 for xi in x]
263
+ ### R
362
264
 
363
- model = LinearRegression()
364
- model.fit([[xi] for xi in x], y)
265
+ To create a model in R, install the [pmml](https://cran.r-project.org/package=pmml) package
365
266
 
366
- # save model
367
- sklearn2pmml(make_pmml_pipeline(model), "model.pmml")
267
+ ```r
268
+ install.packages("pmml")
368
269
  ```
369
270
 
370
- For classification, run:
271
+ And check out the examples:
371
272
 
372
- ```python
373
- from sklearn2pmml import sklearn2pmml, make_pmml_pipeline
374
- from sklearn.naive_bayes import GaussianNB
375
-
376
- x = [1, 2, 3, 5, 6]
377
- y = ["ham", "ham", "ham", "spam", "spam"]
378
-
379
- model = GaussianNB()
380
- model.fit([[xi] for xi in x], y)
381
-
382
- sklearn2pmml(make_pmml_pipeline(model), "model.pmml")
383
- ```
273
+ - [Linear Regression](test/support/r/linear_regression.R)
274
+ - [Naive Bayes](test/support/r/naive_bayes.R)
384
275
 
385
276
  ### Verifying
386
277
 
@@ -413,37 +304,58 @@ CSV.foreach("predictions.csv", headers: true, converters: :numeric) do |row|
413
304
  end
414
305
  ```
415
306
 
416
- ## Database Storage
307
+ ## Data
417
308
 
418
- The database is another place you can store models. It’s good if you retrain models automatically.
309
+ A number of data formats are supported. You can pass the target variable separately.
419
310
 
420
- > We recommend adding monitoring and guardrails as well if you retrain automatically
311
+ ```ruby
312
+ x = [{x: 1}, {x: 2}, {x: 3}]
313
+ y = [1, 2, 3]
314
+ Eps::Model.new(x, y)
315
+ ```
421
316
 
422
- Create an ActiveRecord model to store the predictive model.
317
+ Or pass arrays of arrays
423
318
 
424
- ```sh
425
- rails g model Model key:string:uniq data:text
319
+ ```ruby
320
+ x = [[1, 2], [2, 0], [3, 1]]
321
+ y = [1, 2, 3]
322
+ Eps::Model.new(x, y)
426
323
  ```
427
324
 
428
- Store the model with:
325
+ ### Daru
326
+
327
+ Eps works well with Daru data frames.
429
328
 
430
329
  ```ruby
431
- store = Model.where(key: "price").first_or_initialize
432
- store.update(data: model.to_pmml)
330
+ df = Daru::DataFrame.from_csv("houses.csv")
331
+ Eps::Model.new(df, target: "price")
433
332
  ```
434
333
 
435
- Load the model with:
334
+ ### CSVs
335
+
336
+ When importing data from CSV files, be sure to convert numeric fields. The `table` method does this automatically.
436
337
 
437
338
  ```ruby
438
- data = Model.find_by!(key: "price").data
439
- model = Eps::Model.load_pmml(data)
339
+ CSV.table("data.csv").map { |row| row.to_h }
440
340
  ```
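To see why the conversion matters, this sketch parses the same made-up CSV text with and without numeric converters, using Ruby's standard `csv` library. The second call passes the options that `CSV.table` is documented to apply.

```ruby
require "csv"

# Made-up sample data
text = "bedrooms,price\n2,150000\n3,210000.5\n"

# Without converters, every field is read as a string
raw = CSV.parse(text, headers: true).map(&:to_h)

# With numeric converters and symbol headers, as CSV.table applies
converted = CSV.parse(
  text,
  headers: true,
  converters: :numeric,
  header_converters: :symbol
).map(&:to_h)
```

Fields left as strings would be treated as categorical or text rather than numeric features.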
441
341
 
442
- ## Training Performance
342
+ ## Algorithms
443
343
 
444
- Speed up training on large datasets with GSL.
344
+ Pass an algorithm with:
445
345
 
446
- First, [install GSL](https://www.gnu.org/software/gsl/). With Homebrew, you can use:
346
+ ```ruby
347
+ Eps::Model.new(data, algorithm: :linear_regression)
348
+ ```
349
+
350
+ Eps supports:
351
+
352
+ - LightGBM (default)
353
+ - Linear Regression
354
+ - Naive Bayes
355
+
356
+ ### Linear Regression
357
+
358
+ To speed up training on large datasets with linear regression, [install GSL](https://www.gnu.org/software/gsl/). With Homebrew, you can use:
447
359
 
448
360
  ```sh
449
361
  brew install gsl
@@ -457,65 +369,93 @@ gem 'gsl', group: :development
457
369
 
458
370
  It only needs to be available in environments used to build the model.
459
371
 
460
- > This only speeds up regression, not classification
372
+ ## Validation Options
461
373
 
462
- ## Data
374
+ Pass your own validation set with:
463
375
 
464
- A number of data formats are supported. You can pass the target variable separately.
376
+ ```ruby
377
+ Eps::Model.new(data, validation_set: validation_set)
378
+ ```
379
+
380
+ Split on a specific value
465
381
 
466
382
  ```ruby
467
- x = [{x: 1}, {x: 2}, {x: 3}]
468
- y = [1, 2, 3]
469
- Eps::Model.new(x, y)
383
+ Eps::Model.new(data, split: {column: :listed_at, value: Date.parse("2019-01-01")})
470
384
  ```
471
385
 
472
- Or pass arrays of arrays
386
+ Specify the validation set size (the default is `0.25`, which is 25%)
473
387
 
474
388
  ```ruby
475
- x = [[1, 2], [2, 0], [3, 1]]
476
- y = [1, 2, 3]
477
- Eps::Model.new(x, y)
389
+ Eps::Model.new(data, split: {validation_size: 0.2})
478
390
  ```
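For intuition, a random split with a given validation size might look like the sketch below. The `random_split` helper and its fixed seed are assumptions for illustration, not how Eps necessarily does it.

```ruby
# Hypothetical helper (not part of Eps): shuffle with a seeded generator
# so the split is repeatable, then hold out the requested share.
def random_split(rows, validation_size: 0.25, seed: 1)
  shuffled = rows.shuffle(random: Random.new(seed))
  validation_count = (rows.size * validation_size).round
  [shuffled.drop(validation_count), shuffled.first(validation_count)]
end

# Made-up sample data
data = (1..20).map { |i| {x: i, y: 2 * i} }
train_set, validation_set = random_split(data, validation_size: 0.2)
```

Seeding the generator keeps the split stable between runs, so changes in validation performance reflect changes to the model rather than to the split.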
479
391
 
480
- ## Daru
392
+ ## Database Storage
481
393
 
482
- Eps works well with Daru data frames.
394
+ The database is another place you can store models. It’s good if you retrain models automatically.
483
395
 
484
- ```ruby
485
- df = Daru::DataFrame.from_csv("houses.csv")
486
- Eps::Model.new(df, target: "price")
396
+ > We recommend adding monitoring and guardrails as well if you retrain automatically
397
+
398
+ Create an ActiveRecord model to store the predictive model.
399
+
400
+ ```sh
401
+ rails g model Model key:string:uniq data:text
487
402
  ```
488
403
 
489
- To split into training and test sets, use:
404
+ Store the model with:
490
405
 
491
406
  ```ruby
492
- rng = Random.new(1) # seed random number generator
493
- train_index = houses.map { rng.rand < 0.7 }
494
- train_set = houses.where(train_index)
495
- test_set = houses.where(train_index.map { |v| !v })
407
+ store = Model.where(key: "price").first_or_initialize
408
+ store.update(data: model.to_pmml)
496
409
  ```
497
410
 
498
- ## CSVs
499
-
500
- When importing data from CSV files, be sure to convert numeric fields. The `table` method does this automatically.
411
+ Load the model with:
501
412
 
502
413
  ```ruby
503
- CSV.table("data.csv").map { |row| row.to_h }
414
+ data = Model.find_by!(key: "price").data
415
+ model = Eps::Model.load_pmml(data)
504
416
  ```
505
417
 
506
418
  ## Jupyter & IRuby
507
419
 
508
420
  You can use [IRuby](https://github.com/SciRuby/iruby) to run Eps in [Jupyter](https://jupyter.org/) notebooks. Here’s how to get [IRuby working with Rails](https://ankane.org/jupyter-rails).
509
421
 
510
- ## Reference
422
+ ## Upgrading
423
+
424
+ ## 0.3.0
511
425
 
512
- Get an extended summary with standard error, t-values, and r-squared
426
+ Eps 0.3.0 brings a number of improvements, including support for LightGBM and cross-validation. There are a number of breaking changes to be aware of:
513
427
 
514
- ```ruby
515
- model.summary(extended: true)
516
- ```
428
+ - LightGBM is now the default for new models. On Mac, run:
517
429
 
518
- ## Upgrading
430
+ ```sh
431
+ brew install libomp
432
+ ```
433
+
434
+ Pass the `algorithm` option to use linear regression or naive Bayes.
435
+
436
+ ```ruby
437
+ Eps::Model.new(data, algorithm: :linear_regression) # or :naive_bayes
438
+ ```
439
+
440
+ - Cross-validation happens automatically by default. You no longer need to create training and test sets manually. If you were splitting on a time, use:
441
+
442
+ ```ruby
443
+ Eps::Model.new(data, split: {column: :listed_at, value: Date.parse("2019-01-01")})
444
+ ```
445
+
446
+ Or randomly, use:
447
+
448
+ ```ruby
449
+ Eps::Model.new(data, split: {validation_size: 0.3})
450
+ ```
451
+
452
+ To continue splitting manually, use:
453
+
454
+ ```ruby
455
+ Eps::Model.new(data, validation_set: test_set)
456
+ ```
457
+
458
+ - It’s no longer possible to load models in JSON or PFA formats. Retrain models and save them as PMML.
519
459
 
520
460
  ## 0.2.0
521
461