supervised_learning 0.0.1
- checksums.yaml +7 -0
- data/.gitignore +24 -0
- data/Gemfile +4 -0
- data/LICENSE.txt +22 -0
- data/README.md +73 -0
- data/Rakefile +2 -0
- data/lib/supervised_learning/version.rb +3 -0
- data/lib/supervised_learning.rb +191 -0
- data/spec/linear_regression_spec.rb +118 -0
- data/spec/spec_helper.rb +13 -0
- data/supervised_learning.gemspec +26 -0
- metadata +128 -0
checksums.yaml ADDED
@@ -0,0 +1,7 @@
+---
+SHA1:
+  metadata.gz: f8b579d3f7901496e6f92152443e2af2a5ae1e48
+  data.tar.gz: c1b532c91505c321fbc3d0de82c0f2e19dd847dc
+SHA512:
+  metadata.gz: 595acd0096de69b602bdddf01f8e6f2c83da4cb0c40adb0d896d4232325d462edd3eea913fad21504b0d50f7c70b0bd79b7a6986c0cab20a7a1d6623890482fd
+  data.tar.gz: d64319d9bf16c941b62939cab66f18da2f5c92d7a9934c02d08c4724d11ee6107f6a51c879401bc9d6c6803d41bc3d90cec4572ae3118c6b08bae84fd4c4888e
data/.gitignore ADDED
@@ -0,0 +1,24 @@
+*.gem
+*.rbc
+.bundle
+.config
+.yardoc
+Gemfile.lock
+InstalledFiles
+_yardoc
+coverage
+doc/
+lib/bundler/man
+pkg
+rdoc
+spec/reports
+test/tmp
+test/version_tmp
+tmp
+*.bundle
+*.so
+*.o
+*.a
+mkmf.log
+.ruby-gemset
+.ruby-version
data/Gemfile
ADDED
data/LICENSE.txt ADDED
@@ -0,0 +1,22 @@
+Copyright (c) 2014 Michael Imstepf
+
+MIT License
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,73 @@
+# SupervisedLearning
+
+Supervised learning is the machine learning task of inferring a function from labeled training data. A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples.
+
+Credits for the underlying algorithms of the functions that make predictions go to [Andrew Ng](http://cs.stanford.edu/people/ang/) at Stanford University.
+
+## Example
+
+One example is the prediction of house prices (the output value) along two dimensions (features): the size of the house in square meters and the number of bedrooms.
+
+The training data could look something like this:
+
+| Size (m2)     | # bedrooms    | Price  |
+| ------------- |:-------------:| ------:|
+| 2104          | 3             | 399900 |
+| 1600          | 3             | 329900 |
+| 3000          | 4             | 539900 |
+| 1940          | 4             | 239999 |
+
+Using linear regression, we can now predict the price of a house with 2200 square meters and 3 bedrooms.
+
+## Installation
+
+Add this line to your application's Gemfile:
+
+    gem 'supervised_learning'
+
+And then execute:
+
+    $ bundle
+
+Or install it yourself as:
+
+    $ gem install supervised_learning
+
+## Usage
+
+1. Create a matrix of the training data.
+
+The **output value** (the type of value you want to predict) needs to be in the **last column**. The matrix must have a) at least two columns (one feature and one output) and b) at least one row. The more data you feed it, the more accurate the prediction.
+
+Consult the [Ruby API](http://www.ruby-doc.org/stdlib-2.1.2/libdoc/matrix/rdoc/Matrix.html) for information on how to build Matrix instances from arrays of rows or columns.
+
+```ruby
+require 'matrix'
+
+training_set = Matrix[ [2104, 3, 399900], [1600, 3, 329900], [3000, 4, 539900], [1940, 4, 239999] ]
+```
+
+2. Instantiate an object with the training data.
+
+```ruby
+program = SupervisedLearning::LinearRegression.new(training_set)
+```
+
+3. Create a prediction in the form of a matrix.
+
+This matrix has one row, and the **columns follow the order of the training set**. It has one column less than the training set, since the output value (the last column of the training set) is the value we want to predict.
+
+```ruby
+# Predict the price of a house of 2000 square meters with 3 bedrooms
+prediction_set = Matrix[ [2000, 3] ]
+program.predict(prediction_set)
+=>
+```
+
+## Contributing
+
+1. Fork it ( https://github.com/[my-github-username]/supervised_learning/fork )
+2. Create your feature branch (`git checkout -b my-new-feature`)
+3. Commit your changes (`git commit -am 'Add some feature'`)
+4. Push to the branch (`git push origin my-new-feature`)
+5. Create a new Pull Request
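The three usage steps above can be sketched end to end without the gem, using only Ruby's stdlib `Matrix`. This is a minimal illustration (not the gem's API) of the normal equation that `#predict` implements, applied to the four-row example table; the variable names are my own:

```ruby
require 'matrix'

# Training data from the example table: [size, bedrooms, price]
rows = [[2104, 3, 399900], [1600, 3, 329900], [3000, 4, 539900], [1940, 4, 239999]]

# Design matrix X with a leading column of ones, output vector y
x = Matrix.rows(rows.map { |r| [1.0] + r[0..1].map(&:to_f) })
y = Vector.elements(rows.map { |r| r.last.to_f })

# Normal equation: theta = (X^T X)^-1 X^T y
xt = x.transpose
theta = (xt * x).inverse * xt * y

# Predict the price of a 2000 m2 house with 3 bedrooms
prediction = (Matrix[[1.0, 2000.0, 3.0]] * theta).to_a.first
```

For reference, the specs further below show that with a 47-row training set, `program.predict(Matrix[ [1650, 3] ])` comes out around 293081.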
data/Rakefile ADDED
data/lib/supervised_learning.rb ADDED
@@ -0,0 +1,191 @@
+require 'supervised_learning/version'
+require 'matrix'
+require 'descriptive_statistics'
+
+module SupervisedLearning
+
+  # This class uses linear regression to make predictions based on a training set.
+  # For datasets of less than 1000 columns, use #predict since this will give the most accurate prediction.
+  # For larger datasets where the #predict method is too slow, use #predict_advanced.
+  # The algorithms in #predict and #predict_advanced were provided by Andrew Ng (Stanford University).
+  # @author Michael Imstepf
+  class LinearRegression
+    # Initializes a LinearRegression object with a training set
+    # @param training_set [Matrix] training set; each feature/dimension has one column and the last column is the output column (the type of value #predict will return)
+    # @raise [ArgumentError] unless training_set is a Matrix with at least two columns and one row
+    def initialize(training_set)
+      @training_set = training_set
+      raise ArgumentError, 'input is not a Matrix' unless @training_set.is_a? Matrix
+      raise ArgumentError, 'Matrix must have at least 2 columns and 1 row' unless @training_set.column_size > 1
+
+      @number_of_training_set_columns = @training_set.column_size
+      @number_of_features = @number_of_training_set_columns - 1
+      @number_of_training_examples = @training_set.row_size
+
+      @output_set = @training_set.column_vectors.last
+    end
+
+    # Makes a prediction using the normal equation.
+    # This algorithm is the most accurate one, but with large
+    # sets (more than 1000 columns) it might take too long to calculate.
+    # @param prediction [Matrix] prediction
+    def predict(prediction)
+      feature_set = get_feature_set(@training_set, true)
+
+      validate_prediction_input(prediction)
+
+      transposed_feature_set = feature_set.transpose # only transpose once for efficiency
+      theta = (transposed_feature_set * feature_set).inverse * transposed_feature_set * @output_set
+
+      # add column of ones to prediction
+      prediction = get_feature_set(prediction, true)
+
+      result_vectorized = prediction * theta
+      result = result_vectorized.to_a.first.to_f
+    end
+
+    # Makes a prediction using gradient descent.
+    # This algorithm requires less computing power
+    # than #predict but is less accurate since it uses approximation.
+    # @param prediction [Matrix] prediction
+    def predict_advanced(prediction, learning_rate = 0.01, iterations = 1000, debug = false)
+      validate_prediction_input(prediction)
+
+      feature_set = get_feature_set(@training_set, false)
+      feature_set = normalize_feature_set(feature_set)
+      # add ones to feature set after normalization
+      feature_set = get_feature_set(feature_set, true)
+
+      # prepare theta column vector with zeros
+      theta = Array.new(@number_of_training_set_columns, 0)
+      theta = Matrix.columns([theta])
+
+      iterations.times do
+        theta = theta - (learning_rate * (1.0/@number_of_training_examples) * (feature_set * theta - @output_set).transpose * feature_set).transpose
+        if debug
+          puts "Theta: #{theta}"
+          puts "Cost: #{calculate_cost(feature_set, theta)}"
+        end
+      end
+
+      # normalize prediction
+      prediction = normalize_prediction(prediction)
+
+      # add column of ones to prediction
+      prediction = get_feature_set(prediction, true)
+
+      result_vectorized = prediction * theta
+      result = result_vectorized[0, 0]
+    end
+
+    private
+
+    # Returns a feature set without the output set (last column of the training set)
+    # and optionally adds a leading column of ones to a Matrix.
+    # This column of ones is the first dimension of theta to easily calculate
+    # the output of a function a*1 + b*theta_1 + c*theta_2 etc.
+    # Ruby's Matrix class has no built-in function for prepending columns,
+    # hence some manual work is required.
+    # @see http://stackoverflow.com/questions/9710628/how-do-i-add-columns-and-rows-to-a-matrix-in-ruby
+    # @param matrix [Matrix] matrix
+    # @param leading_ones [Boolean] whether to prepend a column of leading ones
+    # @return [Matrix] matrix
+    def get_feature_set(matrix, leading_ones = false)
+      # get array of columns
+      existing_columns = matrix.column_vectors
+
+      columns = []
+      columns << Array.new(existing_columns.first.size, 1) if leading_ones
+      # add remaining columns
+      existing_columns.each_with_index do |column, index|
+        # the output column (last column of @training_set) needs to be skipped:
+        # when called with the training set, matrix includes the output column;
+        # when called with a prediction, matrix does not include the output column
+        break if index + 1 > @number_of_features
+        columns << column.to_a
+      end
+
+      Matrix.columns(columns)
+    end
+
+    # Validates prediction input.
+    # @param prediction [Matrix] prediction
+    # @raise [ArgumentError] if prediction is not a Matrix
+    # @raise [ArgumentError] if prediction does not have the correct number of columns (@training_set minus @output_set)
+    # @raise [ArgumentError] if prediction has more than one row
+    def validate_prediction_input(prediction)
+      raise ArgumentError, 'input is not a Matrix' unless prediction.is_a? Matrix
+      raise ArgumentError, 'input has more than one row' if prediction.row_size > 1
+      raise ArgumentError, 'input has wrong number of columns' if prediction.column_size != @number_of_features
+    end
+
+    # Normalizes the feature set for quicker and more reliable calculation.
+    # @param feature_set [Matrix] feature set
+    # @return [Matrix] normalized feature set
+    def normalize_feature_set(feature_set)
+      # create Matrix with mean
+      mean = []
+      feature_set.column_vectors.each do |feature_set_column|
+        # create column of the length of the training examples for later subtraction
+        mean << Array.new(@number_of_training_examples, feature_set_column.mean)
+      end
+      mean = Matrix.columns(mean)
+
+      # save for later usage as Matrix and not as Vector
+      @mean = Matrix[mean.row(0)]
+
+      # subtract mean from feature set
+      feature_set = feature_set - mean
+
+      # create Matrix with standard deviation
+      standard_deviation = []
+      feature_set.column_vectors.each do |feature_set_column|
+        # create row vector with standard deviation
+        standard_deviation << [feature_set_column.standard_deviation]
+      end
+      # save for later usage
+      @standard_deviation = Matrix.columns(standard_deviation)
+
+      # Dividing these non-square matrices has to be done manually
+      # (non-square matrices have no inverse and can't be divided in Ruby),
+      # so iterate through each column
+      columns = []
+      feature_set.column_vectors.each_with_index do |feature_set_column, index|
+        # manually divide each value within the column by the standard deviation of that column
+        columns << feature_set_column.to_a.collect { |value| value / @standard_deviation[0, index] }
+      end
+      # reconstruct training set
+      feature_set = Matrix.columns(columns)
+      feature_set
+    end
+
+    # Normalizes a prediction.
+    # @param prediction [Matrix] prediction
+    # @return [Matrix] normalized prediction
+    def normalize_prediction(prediction)
+      # subtract mean
+      prediction = prediction - @mean
+
+      # Dividing these non-square matrices has to be done manually
+      # (non-square matrices have no inverse and can't be divided in Ruby),
+      # so iterate through each column
+      columns = []
+      prediction.column_vectors.each_with_index do |prediction_column, index|
+        # manually divide each value within the column by the standard deviation of that column
+        columns << prediction_column / @standard_deviation[0, index]
+      end
+      # reconstruct prediction
+      prediction = Matrix.columns(columns)
+    end
+
+    # Calculates the cost of the current theta.
+    # The closer to 0, the more accurate the prediction will be.
+    # @param feature_set [Matrix] feature set
+    # @param theta [Matrix] theta
+    # @return [Float] cost
+    def calculate_cost(feature_set, theta)
+      cost_vectorized = 1.0/(2 * @number_of_training_examples) * (feature_set * theta - @output_set).transpose * (feature_set * theta - @output_set)
+      cost_vectorized[0, 0]
+    end
+  end
+end
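The update inside `#predict_advanced` can be made concrete in isolation. Below is a minimal batch gradient descent sketch using only stdlib `Matrix`, on a tiny made-up dataset (y = 1 + x, so theta should approach [1, 1]); the dataset, learning rate, and iteration count are assumptions for illustration only:

```ruby
require 'matrix'

# Tiny made-up dataset: y = 1 + x
x = Matrix[[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]  # leading column of ones
y = Matrix.columns([[2.0, 3.0, 4.0]])           # output as a column vector
m = x.row_size.to_f

theta = Matrix.columns([[0.0, 0.0]])  # start at zero, as the gem does
learning_rate = 0.1

1000.times do
  # batch gradient: (1/m) * X^T * (X*theta - y)
  gradient = (1.0 / m) * x.transpose * (x * theta - y)
  theta -= learning_rate * gradient
end

# theta[0, 0] and theta[1, 0] are now close to 1.0
```

The gem's version differs only in bookkeeping: it transposes instead of using `X^T` directly, and it normalizes the features first so a single learning rate works across columns of very different scales.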
data/spec/linear_regression_spec.rb ADDED
@@ -0,0 +1,118 @@
+require 'spec_helper'
+require 'pry'
+
+describe SupervisedLearning::LinearRegression do
+  training_set_one_feature = Matrix[ [2104,399900], [1600,329900], [2400,369000], [1416,232000], [3000,539900], [1985,299900], [1534,314900], [1427,198999], [1380,212000], [1494,242500], [1940,239999], [2000,347000], [1890,329999], [4478,699900], [1268,259900], [2300,449900], [1320,299900], [1236,199900], [2609,499998], [3031,599000], [1767,252900], [1888,255000], [1604,242900], [1962,259900], [3890,573900], [1100,249900], [1458,464500], [2526,469000], [2200,475000], [2637,299900], [1839,349900], [1000,169900], [2040,314900], [3137,579900], [1811,285900], [1437,249900], [1239,229900], [2132,345000], [4215,549000], [2162,287000], [1664,368500], [2238,329900], [2567,314000], [1200,299000], [852,179900], [1852,299900], [1203,239500] ]
+  program_one_feature = SupervisedLearning::LinearRegression.new(training_set_one_feature)
+
+  training_set_two_features = Matrix[ [2104,3,399900], [1600,3,329900], [2400,3,369000], [1416,2,232000], [3000,4,539900], [1985,4,299900], [1534,3,314900], [1427,3,198999], [1380,3,212000], [1494,3,242500], [1940,4,239999], [2000,3,347000], [1890,3,329999], [4478,5,699900], [1268,3,259900], [2300,4,449900], [1320,2,299900], [1236,3,199900], [2609,4,499998], [3031,4,599000], [1767,3,252900], [1888,2,255000], [1604,3,242900], [1962,4,259900], [3890,3,573900], [1100,3,249900], [1458,3,464500], [2526,3,469000], [2200,3,475000], [2637,3,299900], [1839,2,349900], [1000,1,169900], [2040,4,314900], [3137,3,579900], [1811,4,285900], [1437,3,249900], [1239,3,229900], [2132,4,345000], [4215,4,549000], [2162,4,287000], [1664,2,368500], [2238,3,329900], [2567,4,314000], [1200,3,299000], [852,2,179900], [1852,4,299900], [1203,3,239500] ]
+  program_two_features = SupervisedLearning::LinearRegression.new(training_set_two_features)
+
+  describe '#initialize' do
+    context 'when the training set is not a matrix' do
+      it 'raises an exception' do
+        expect { SupervisedLearning::LinearRegression.new([1, 2]) }.to raise_exception(ArgumentError)
+      end
+    end
+
+    context 'when the training set is an empty Matrix' do
+      it 'raises an exception' do
+        expect { SupervisedLearning::LinearRegression.new(Matrix[]) }.to raise_exception(ArgumentError)
+      end
+    end
+
+    context 'when the training set only has one column' do
+      it 'raises an exception' do
+        expect { SupervisedLearning::LinearRegression.new(Matrix[[1]]) }.to raise_exception(ArgumentError)
+      end
+    end
+  end
+
+  describe '#predict' do
+    context 'when the prediction set is not a matrix' do
+      it 'raises an exception' do
+        expect { program_one_feature.predict([1, 2]) }.to raise_exception(ArgumentError)
+      end
+    end
+
+    context 'when the prediction has more than one row' do
+      it 'raises an exception' do
+        expect { program_two_features.predict(Matrix[ [1, 2], [3, 4] ]) }.to raise_exception(ArgumentError)
+      end
+    end
+
+    context 'when the prediction has the wrong number of columns' do
+      context 'when the training set has one feature' do
+        it 'raises an exception' do
+          expect { program_one_feature.predict(Matrix[ [1, 2] ]) }.to raise_exception(ArgumentError)
+          expect { program_one_feature.predict(Matrix[]) }.to raise_exception(ArgumentError)
+        end
+      end
+
+      context 'when the training set has two features' do
+        it 'raises an exception' do
+          expect { program_two_features.predict(Matrix[ [1, 2, 3] ]) }.to raise_exception(ArgumentError)
+          expect { program_two_features.predict(Matrix[ [1] ]) }.to raise_exception(ArgumentError)
+        end
+      end
+    end
+
+    context 'when the prediction has the correct number of columns' do
+      context 'when the training set has one feature' do
+        it 'returns the correct prediction' do
+          expect(program_one_feature.predict(Matrix[ [1650] ]).to_i).to eq 293237
+        end
+      end
+
+      context 'when the training set has two features' do
+        it 'returns the correct prediction' do
+          expect(program_two_features.predict(Matrix[ [1650, 3] ]).to_i).to eq 293081
+        end
+      end
+    end
+  end
+
+  describe '#predict_advanced' do
+    context 'when the prediction set is not a matrix' do
+      it 'raises an exception' do
+        expect { program_one_feature.predict_advanced([1, 2]) }.to raise_exception(ArgumentError)
+      end
+    end
+
+    context 'when the prediction has more than one row' do
+      it 'raises an exception' do
+        expect { program_two_features.predict_advanced(Matrix[ [1, 2], [3, 4] ]) }.to raise_exception(ArgumentError)
+      end
+    end
+
+    context 'when the prediction has the wrong number of columns' do
+      context 'when the training set has one feature' do
+        it 'raises an exception' do
+          expect { program_one_feature.predict_advanced(Matrix[ [1, 2] ]) }.to raise_exception(ArgumentError)
+          expect { program_one_feature.predict_advanced(Matrix[]) }.to raise_exception(ArgumentError)
+        end
+      end
+
+      context 'when the training set has two features' do
+        it 'raises an exception' do
+          expect { program_two_features.predict_advanced(Matrix[ [1, 2, 3] ]) }.to raise_exception(ArgumentError)
+          expect { program_two_features.predict_advanced(Matrix[ [1] ]) }.to raise_exception(ArgumentError)
+        end
+      end
+    end
+
+    context 'when the prediction has the correct number of columns' do
+      context 'when the training set has one feature' do
+        it 'returns the correct prediction' do
+          expect(program_one_feature.predict_advanced(Matrix[ [1650] ], 0.1, 600, false).to_i).to be_within(200).of(293237)
+        end
+      end
+
+      context 'when the training set has two features' do
+        it 'returns the correct prediction' do
+          expect(program_two_features.predict_advanced(Matrix[ [1650, 3] ], 0.1, 600, false).to_i).to be_within(200).of(293237)
+        end
+      end
+    end
+  end
+end
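The `be_within(200)` tolerances above hold only because `#predict_advanced` normalizes each feature column before running gradient descent. Here is a plain-Ruby sketch (with assumed sample values) of the per-column z-score step that `normalize_feature_set` performs; in the gem itself, `mean` and `standard_deviation` come from the `descriptive_statistics` dependency:

```ruby
# z-score normalization of one feature column, mirroring normalize_feature_set
values = [2104.0, 1600.0, 3000.0, 1940.0]

mean = values.sum / values.size
# population variance (denominator n), matching descriptive_statistics' default
variance = values.sum { |v| (v - mean)**2 } / values.size
std = Math.sqrt(variance)

normalized = values.map { |v| (v - mean) / std }
# normalized now has mean 0 and standard deviation 1
```

Without this step, columns like house size (thousands) and bedroom count (single digits) would need very different learning rates, and a single rate would either diverge on one column or crawl on the other.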
data/spec/spec_helper.rb ADDED
@@ -0,0 +1,13 @@
+require 'supervised_learning'
+
+RSpec.configure do |config|
+  # Run specs in random order to surface order dependencies. If you find an
+  # order dependency and want to debug it, you can fix the order by providing
+  # the seed, which is printed after each run.
+  #     --seed 1234
+  config.order = 'random'
+
+  # when a focus tag is present in RSpec, only run tests with the focus tag: http://railscasts.com/episodes/285-spork
+  config.filter_run :focus
+  config.run_all_when_everything_filtered = true
+end
data/supervised_learning.gemspec ADDED
@@ -0,0 +1,26 @@
+# coding: utf-8
+lib = File.expand_path('../lib', __FILE__)
+$LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
+require 'supervised_learning/version'
+
+Gem::Specification.new do |spec|
+  spec.name          = "supervised_learning"
+  spec.version       = SupervisedLearning::VERSION
+  spec.authors       = ["Michael Imstepf"]
+  spec.email         = ["michael.imstepf@gmail.com"]
+  spec.summary       = %q{A module to make predictions based on a set of training data.}
+  spec.description   = %q{Supervised learning is the machine learning task of inferring a function from labeled training data. A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples.}
+  spec.homepage      = "https://github.com/michaelimstepf/supervised-learning"
+  spec.license       = "MIT"
+
+  spec.files         = `git ls-files -z`.split("\x0")
+  spec.executables   = spec.files.grep(%r{^bin/}) { |f| File.basename(f) }
+  spec.test_files    = spec.files.grep(%r{^(test|spec|features)/})
+  spec.require_paths = ["lib"]
+
+  spec.add_development_dependency "bundler", "~> 1.6"
+  spec.add_development_dependency "rake"
+  spec.add_development_dependency "rspec"
+  spec.add_development_dependency "pry"
+  spec.add_development_dependency "descriptive_statistics"
+end
metadata ADDED
@@ -0,0 +1,128 @@
+--- !ruby/object:Gem::Specification
+name: supervised_learning
+version: !ruby/object:Gem::Version
+  version: 0.0.1
+platform: ruby
+authors:
+- Michael Imstepf
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2014-07-21 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: bundler
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.6'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.6'
+- !ruby/object:Gem::Dependency
+  name: rake
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: rspec
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: pry
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: descriptive_statistics
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+description: Supervised learning is the machine learning task of inferring a function
+  from labeled training data. A supervised learning algorithm analyzes the training
+  data and produces an inferred function, which can be used for mapping new examples.
+email:
+- michael.imstepf@gmail.com
+executables: []
+extensions: []
+extra_rdoc_files: []
+files:
+- ".gitignore"
+- Gemfile
+- LICENSE.txt
+- README.md
+- Rakefile
+- lib/supervised_learning.rb
+- lib/supervised_learning/version.rb
+- spec/linear_regression_spec.rb
+- spec/spec_helper.rb
+- supervised_learning.gemspec
+homepage: https://github.com/michaelimstepf/supervised-learning
+licenses:
+- MIT
+metadata: {}
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubyforge_project:
+rubygems_version: 2.2.2
+signing_key:
+specification_version: 4
+summary: A module to make predictions based on a set of training data.
+test_files:
+- spec/linear_regression_spec.rb
+- spec/spec_helper.rb