supervised_learning 0.0.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +7 -0
- data/.gitignore +24 -0
- data/Gemfile +4 -0
- data/LICENSE.txt +22 -0
- data/README.md +73 -0
- data/Rakefile +2 -0
- data/lib/supervised_learning/version.rb +3 -0
- data/lib/supervised_learning.rb +191 -0
- data/spec/linear_regression_spec.rb +118 -0
- data/spec/spec_helper.rb +13 -0
- data/supervised_learning.gemspec +26 -0
- metadata +128 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA1:
+  metadata.gz: f8b579d3f7901496e6f92152443e2af2a5ae1e48
+  data.tar.gz: c1b532c91505c321fbc3d0de82c0f2e19dd847dc
+SHA512:
+  metadata.gz: 595acd0096de69b602bdddf01f8e6f2c83da4cb0c40adb0d896d4232325d462edd3eea913fad21504b0d50f7c70b0bd79b7a6986c0cab20a7a1d6623890482fd
+  data.tar.gz: d64319d9bf16c941b62939cab66f18da2f5c92d7a9934c02d08c4724d11ee6107f6a51c879401bc9d6c6803d41bc3d90cec4572ae3118c6b08bae84fd4c4888e
data/.gitignore
ADDED
@@ -0,0 +1,24 @@
+*.gem
+*.rbc
+.bundle
+.config
+.yardoc
+Gemfile.lock
+InstalledFiles
+_yardoc
+coverage
+doc/
+lib/bundler/man
+pkg
+rdoc
+spec/reports
+test/tmp
+test/version_tmp
+tmp
+*.bundle
+*.so
+*.o
+*.a
+mkmf.log
+.ruby-gemset
+.ruby-version
data/Gemfile
ADDED
data/LICENSE.txt
ADDED
@@ -0,0 +1,22 @@
+Copyright (c) 2014 Michael Imstepf
+
+MIT License
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,73 @@
+# SupervisedLearning
+
+Supervised learning is the machine learning task of inferring a function from labeled training data. A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples.
+
+Credit for the underlying algorithms of the functions that make predictions goes to [Andrew Ng](http://cs.stanford.edu/people/ang/) at Stanford University.
+
+## Example
+
+One example is the prediction of house prices (output value) along two dimensions (features): the size of the house in square meters and the number of bedrooms.
+
+The training data could look something like this:
+
+| Size (m2)     | # bedrooms    | Price  |
+| ------------- |:-------------:| ------:|
+| 2104          | 3             | 399900 |
+| 1600          | 3             | 329900 |
+| 3000          | 4             | 539900 |
+| 1940          | 4             | 239999 |
+
+Using linear regression, we can now predict the price of a house with 2200 square meters and 3 bedrooms.
+
+## Installation
+
+Add this line to your application's Gemfile:
+
+    gem 'supervised_learning'
+
+And then execute:
+
+    $ bundle
+
+Or install it yourself as:
+
+    $ gem install supervised_learning
+
+## Usage
+
+1. Create a matrix of the training data.
+
+   The **output value** (the type of value you want to predict) needs to be in the **last column**. The matrix must have a) at least two columns (one feature and one output) and b) at least one row. The more data you feed it, the more accurate the prediction.
+
+   Consult the [Ruby API](http://www.ruby-doc.org/stdlib-2.1.2/libdoc/matrix/rdoc/Matrix.html) for information on how to build matrices from arrays of rows or columns.
+
+   ```ruby
+   require 'matrix'
+
+   training_set = Matrix[ [2104, 3, 399900], [1600, 3, 329900], [3000, 4, 539900], [1940, 4, 239999] ]
+   ```
+
+2. Instantiate an object with the training data.
+
+   ```ruby
+   program = SupervisedLearning::LinearRegression.new(training_set)
+   ```
+
+3. Create a prediction in the form of a matrix.
+
+   This matrix has one row, and the **columns follow the order of the training set**. It has one column less than the training set, since the output value (the last column of the training set) is the value we want to predict.
+
+   ```ruby
+   # Predict the price of a house of 2000 square meters with 3 bedrooms
+   prediction_set = Matrix[ [2000, 3] ]
+   program.predict(prediction_set)
+   =>
+   ```
+
+## Contributing
+
+1. Fork it ( https://github.com/[my-github-username]/supervised_learning/fork )
+2. Create your feature branch (`git checkout -b my-new-feature`)
+3. Commit your changes (`git commit -am 'Add some feature'`)
+4. Push to the branch (`git push origin my-new-feature`)
+5. Create a new Pull Request
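The README's usage example above leaves the return value of `program.predict` blank. As a standalone illustration of the normal-equation computation that `#predict` performs (implemented in lib/supervised_learning.rb), here is a minimal sketch using only Ruby's stdlib `Matrix`; the variable names are illustrative and the data is the README's sample training set:

```ruby
require 'matrix'

# Training set: size (m2), bedrooms, price; the last column is the output.
training_set = Matrix[ [2104, 3, 399900], [1600, 3, 329900],
                       [3000, 4, 539900], [1940, 4, 239999] ]

# Split into feature matrix X (with a leading column of ones for the
# intercept term) and output vector y.
rows = training_set.to_a
x = Matrix.rows(rows.map { |r| [1] + r[0..-2] })
y = Matrix.column_vector(rows.map(&:last))

# Normal equation: theta = (X^T X)^-1 X^T y
xt = x.transpose
theta = (xt * x).inverse * xt * y

# Predict the price of a 2000 m2 house with 3 bedrooms
# (prepend the same leading 1 as in the feature matrix).
prediction = (Matrix[[1, 2000, 3]] * theta)[0, 0].to_f
puts prediction
```

With integer inputs, Ruby's `Matrix#inverse` works in exact rational arithmetic, so the only rounding happens in the final `to_f`.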
data/Rakefile
ADDED
data/lib/supervised_learning.rb
ADDED
@@ -0,0 +1,191 @@
+require 'supervised_learning/version'
+require 'matrix'
+require 'descriptive_statistics'
+
+module SupervisedLearning
+
+  # This class uses linear regression to make predictions based on a training set.
+  # For datasets of fewer than 1000 columns, use #predict since this will give the most accurate prediction.
+  # For larger datasets where the #predict method is too slow, use #predict_advanced.
+  # The algorithms in #predict and #predict_advanced were provided by Andrew Ng (Stanford University).
+  # @author Michael Imstepf
+  class LinearRegression
+    # Initializes a LinearRegression object with a training set
+    # @param training_set [Matrix] training set; each feature/dimension has one column and the last column is the output column (the type of value #predict will return)
+    # @raise [ArgumentError] unless training_set is a Matrix with at least two columns and one row
+    def initialize(training_set)
+      @training_set = training_set
+      raise ArgumentError, 'input is not a Matrix' unless @training_set.is_a? Matrix
+      raise ArgumentError, 'Matrix must have at least 2 columns and 1 row' unless @training_set.column_size > 1
+
+      @number_of_training_set_columns = @training_set.column_size
+      @number_of_features = @number_of_training_set_columns - 1
+      @number_of_training_examples = @training_set.row_size
+
+      @output_set = @training_set.column_vectors.last
+    end
+
+    # Makes a prediction using the normal equation.
+    # This algorithm is the most accurate one, but with large
+    # sets (more than 1000 columns) it might take too long to calculate.
+    # @param prediction [Matrix] prediction
+    def predict(prediction)
+      feature_set = get_feature_set(@training_set, true)
+
+      validate_prediction_input(prediction)
+
+      transposed_feature_set = feature_set.transpose # only transpose once for efficiency
+      theta = (transposed_feature_set * feature_set).inverse * transposed_feature_set * @output_set
+
+      # add column of ones to prediction
+      prediction = get_feature_set(prediction, true)
+
+      result_vectorized = prediction * theta
+      result = result_vectorized.to_a.first.to_f
+    end
+
+    # Makes a prediction using gradient descent.
+    # This algorithm requires less computing power
+    # than #predict but is less accurate since it uses approximation.
+    # @param prediction [Matrix] prediction
+    def predict_advanced(prediction, learning_rate = 0.01, iterations = 1000, debug = false)
+      validate_prediction_input(prediction)
+
+      feature_set = get_feature_set(@training_set, false)
+      feature_set = normalize_feature_set(feature_set)
+      # add ones to feature set after normalization
+      feature_set = get_feature_set(feature_set, true)
+
+      # prepare theta column vector with zeros
+      theta = Array.new(@number_of_training_set_columns, 0)
+      theta = Matrix.columns([theta])
+
+      iterations.times do
+        theta = theta - (learning_rate * (1.0/@number_of_training_examples) * (feature_set * theta - @output_set).transpose * feature_set).transpose
+        if debug
+          puts "Theta: #{theta}"
+          puts "Cost: #{calculate_cost(feature_set, theta)}"
+        end
+      end
+
+      # normalize prediction
+      prediction = normalize_prediction(prediction)
+
+      # add column of ones to prediction
+      prediction = get_feature_set(prediction, true)
+
+      result_vectorized = prediction * theta
+      result = result_vectorized[0,0]
+    end
+
+    private
+
+    # Returns a feature set without the output set (last column of the training set)
+    # and optionally adds a leading column of ones to the Matrix.
+    # This column of ones is the first dimension of theta to easily calculate
+    # the output of a function a*1 + b*theta_1 + c*theta_2 etc.
+    # Ruby's Matrix class has no built-in function for prepending columns,
+    # hence some manual work is required.
+    # @see http://stackoverflow.com/questions/9710628/how-do-i-add-columns-and-rows-to-a-matrix-in-ruby
+    # @param matrix [Matrix] matrix
+    # @param leading_ones [Boolean] whether to prepend a column of leading ones
+    # @return [Matrix] matrix
+    def get_feature_set(matrix, leading_ones = false)
+      # get array of columns
+      existing_columns = matrix.column_vectors
+
+      columns = []
+      columns << Array.new(existing_columns.first.size, 1) if leading_ones
+      # add remaining columns
+      existing_columns.each_with_index do |column, index|
+        # output column (last column of @training_set) needs to be skipped:
+        # when called with the training set, matrix includes the output column;
+        # when called with a prediction, matrix does not include the output column
+        break if index + 1 > @number_of_features
+        columns << column.to_a
+      end
+
+      Matrix.columns(columns)
+    end
+
+    # Validates prediction input.
+    # @param prediction [Matrix] prediction
+    # @raise [ArgumentError] if prediction is not a Matrix
+    # @raise [ArgumentError] if prediction does not have the correct number of columns (@training_set minus @output_set)
+    # @raise [ArgumentError] if prediction has more than one row
+    def validate_prediction_input(prediction)
+      raise ArgumentError, 'input is not a Matrix' unless prediction.is_a? Matrix
+      raise ArgumentError, 'input has more than one row' if prediction.row_size > 1
+      raise ArgumentError, 'input has wrong number of columns' if prediction.column_size != @number_of_features
+    end
+
+    # Normalizes the feature set for quicker and more reliable calculation.
+    # @param feature_set [Matrix] feature set
+    # @return [Matrix] normalized feature set
+    def normalize_feature_set(feature_set)
+      # create Matrix with mean
+      mean = []
+      feature_set.column_vectors.each do |feature_set_column|
+        # create Matrix of length of training examples for later subtraction
+        mean << Array.new(@number_of_training_examples, feature_set_column.mean)
+      end
+      mean = Matrix.columns(mean)
+
+      # save for later usage as Matrix and not as Vector
+      @mean = Matrix[mean.row(0)]
+
+      # subtract mean from feature set
+      feature_set = feature_set - mean
+
+      # create Matrix with standard deviation
+      standard_deviation = []
+      feature_set.column_vectors.each do |feature_set_column|
+        # create row vector with standard deviation
+        standard_deviation << [feature_set_column.standard_deviation]
+      end
+      # save for later usage
+      @standard_deviation = Matrix.columns(standard_deviation)
+
+      # Dividing these non-square matrices has to be done manually
+      # (non-square matrices have no inverse and can't be divided in Ruby),
+      # so iterate through each column
+      columns = []
+      feature_set.column_vectors.each_with_index do |feature_set_column, index|
+        # manually divide each value within the column by the standard deviation of that column
+        columns << feature_set_column.to_a.collect { |value| value / @standard_deviation[0,index] }
+      end
+      # reconstruct training set
+      feature_set = Matrix.columns(columns)
+      feature_set
+    end
+
+    # Normalizes the prediction.
+    # @param prediction [Matrix] prediction
+    # @return [Matrix] normalized prediction
+    def normalize_prediction(prediction)
+      # subtract mean
+      prediction = prediction - @mean
+
+      # Dividing these non-square matrices has to be done manually
+      # (non-square matrices have no inverse and can't be divided in Ruby),
+      # so iterate through each column
+      columns = []
+      prediction.column_vectors.each_with_index do |prediction_column, index|
+        # manually divide each value within the column by the standard deviation of that column
+        columns << prediction_column / @standard_deviation[0,index]
+      end
+      # reconstruct prediction
+      prediction = Matrix.columns(columns)
+    end
+
+    # Calculates the cost of the current theta.
+    # The closer to 0, the more accurate the prediction will be.
+    # @param feature_set [Matrix] feature set
+    # @param theta [Matrix] theta
+    # @return [Float] cost
+    def calculate_cost(feature_set, theta)
+      cost_vectorized = 1.0/(2 * @number_of_training_examples) * (feature_set * theta - @output_set).transpose * (feature_set * theta - @output_set)
+      cost_vectorized[0,0]
+    end
+  end
+end
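The `#predict_advanced` method above applies feature normalization followed by the batch gradient-descent update theta := theta - (alpha/m) * X^T (X*theta - y). The following is a minimal, self-contained sketch of that update rule in plain Ruby (stdlib `Matrix` only); the one-feature sample data, the use of the sample standard deviation, and the learning rate are illustrative assumptions, not the gem's exact code:

```ruby
require 'matrix'

# Tiny one-feature training set: size (m2) -> price.
data = [[2104, 399900], [1600, 329900], [3000, 539900], [1940, 239999]]
sizes  = data.map(&:first).map(&:to_f)
prices = data.map(&:last).map(&:to_f)

# Normalize the feature: subtract the mean, divide by the (sample) standard deviation.
mean = sizes.sum / sizes.size
std  = Math.sqrt(sizes.sum { |s| (s - mean)**2 } / (sizes.size - 1))
normalized = sizes.map { |s| (s - mean) / std }

# X gets a leading column of ones (the intercept term); y is the output vector.
x = Matrix.rows(normalized.map { |s| [1.0, s] })
y = Matrix.column_vector(prices)
m = prices.size

# Batch gradient descent: theta := theta - (alpha / m) * X^T (X*theta - y)
theta = Matrix.column_vector([0.0, 0.0])
alpha = 0.1
1000.times do
  gradient = x.transpose * (x * theta - y) * (1.0 / m)
  theta -= alpha * gradient
end

# Predict for a 2200 m2 house; the input must be normalized the same way.
prediction = (Matrix[[1.0, (2200 - mean) / std]] * theta)[0, 0]
puts prediction
```

Because the feature is normalized, a relatively large learning rate converges quickly; on raw square-meter values the same update would diverge, which is why `#predict_advanced` normalizes before iterating.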
data/spec/linear_regression_spec.rb
ADDED
@@ -0,0 +1,118 @@
+require 'spec_helper'
+require 'pry'
+
+describe SupervisedLearning::LinearRegression do
+  training_set_one_feature = Matrix[ [2104,399900], [1600,329900], [2400,369000], [1416,232000], [3000,539900], [1985,299900], [1534,314900], [1427,198999], [1380,212000], [1494,242500], [1940,239999], [2000,347000], [1890,329999], [4478,699900], [1268,259900], [2300,449900], [1320,299900], [1236,199900], [2609,499998], [3031,599000], [1767,252900], [1888,255000], [1604,242900], [1962,259900], [3890,573900], [1100,249900], [1458,464500], [2526,469000], [2200,475000], [2637,299900], [1839,349900], [1000,169900], [2040,314900], [3137,579900], [1811,285900], [1437,249900], [1239,229900], [2132,345000], [4215,549000], [2162,287000], [1664,368500], [2238,329900], [2567,314000], [1200,299000], [852,179900], [1852,299900], [1203,239500] ]
+  program_one_feature = SupervisedLearning::LinearRegression.new(training_set_one_feature)
+
+  training_set_two_features = Matrix[ [2104,3,399900], [1600,3,329900], [2400,3,369000], [1416,2,232000], [3000,4,539900], [1985,4,299900], [1534,3,314900], [1427,3,198999], [1380,3,212000], [1494,3,242500], [1940,4,239999], [2000,3,347000], [1890,3,329999], [4478,5,699900], [1268,3,259900], [2300,4,449900], [1320,2,299900], [1236,3,199900], [2609,4,499998], [3031,4,599000], [1767,3,252900], [1888,2,255000], [1604,3,242900], [1962,4,259900], [3890,3,573900], [1100,3,249900], [1458,3,464500], [2526,3,469000], [2200,3,475000], [2637,3,299900], [1839,2,349900], [1000,1,169900], [2040,4,314900], [3137,3,579900], [1811,4,285900], [1437,3,249900], [1239,3,229900], [2132,4,345000], [4215,4,549000], [2162,4,287000], [1664,2,368500], [2238,3,329900], [2567,4,314000], [1200,3,299000], [852,2,179900], [1852,4,299900], [1203,3,239500] ]
+  program_two_features = SupervisedLearning::LinearRegression.new(training_set_two_features)
+
+  describe '#initialize' do
+    context 'when training set is not a matrix' do
+      it 'raises an exception' do
+        expect {SupervisedLearning::LinearRegression.new([1, 2])}.to raise_exception(ArgumentError)
+      end
+    end
+
+    context 'when training set is an empty Matrix' do
+      it 'raises an exception' do
+        expect {SupervisedLearning::LinearRegression.new(Matrix[])}.to raise_exception(ArgumentError)
+      end
+    end
+
+    context 'when training set only has one column' do
+      it 'raises an exception' do
+        expect {SupervisedLearning::LinearRegression.new(Matrix[[1]])}.to raise_exception(ArgumentError)
+      end
+    end
+  end
+
+  describe '#predict' do
+    context 'when prediction set is not a matrix' do
+      it 'raises an exception' do
+        expect {program_one_feature.predict([1, 2])}.to raise_exception(ArgumentError)
+      end
+    end
+
+    context 'when prediction has more than one row' do
+      it 'raises an exception' do
+        expect {program_two_features.predict(Matrix[ [1, 2], [3, 4] ])}.to raise_exception(ArgumentError)
+      end
+    end
+
+    context 'when prediction has wrong amount of columns' do
+      context 'when training set has one feature' do
+        it 'raises an exception' do
+          expect {program_one_feature.predict(Matrix[ [1, 2] ])}.to raise_exception(ArgumentError)
+          expect {program_one_feature.predict(Matrix[ ])}.to raise_exception(ArgumentError)
+        end
+      end
+
+      context 'when training set has two features' do
+        it 'raises an exception' do
+          expect {program_two_features.predict(Matrix[ [1, 2, 3] ])}.to raise_exception(ArgumentError)
+          expect {program_two_features.predict(Matrix[ [1] ])}.to raise_exception(ArgumentError)
+        end
+      end
+    end
+
+    context 'when prediction has correct amount of columns' do
+      context 'when training set has one feature' do
+        it 'returns correct prediction' do
+          expect(program_one_feature.predict(Matrix[ [1650] ]).to_i).to eq 293237
+        end
+      end
+
+      context 'when training set has two features' do
+        it 'returns correct prediction' do
+          expect(program_two_features.predict(Matrix[ [1650, 3] ]).to_i).to eq 293081
+        end
+      end
+    end
+  end
+
+  describe '#predict_advanced' do
+    context 'when prediction set is not a matrix' do
+      it 'raises an exception' do
+        expect {program_one_feature.predict_advanced([1, 2])}.to raise_exception(ArgumentError)
+      end
+    end
+
+    context 'when prediction has more than one row' do
+      it 'raises an exception' do
+        expect {program_two_features.predict_advanced(Matrix[ [1, 2], [3, 4] ])}.to raise_exception(ArgumentError)
+      end
+    end
+
+    context 'when prediction has wrong amount of columns' do
+      context 'when training set has one feature' do
+        it 'raises an exception' do
+          expect {program_one_feature.predict_advanced(Matrix[ [1, 2] ])}.to raise_exception(ArgumentError)
+          expect {program_one_feature.predict_advanced(Matrix[ ])}.to raise_exception(ArgumentError)
+        end
+      end
+
+      context 'when training set has two features' do
+        it 'raises an exception' do
+          expect {program_two_features.predict_advanced(Matrix[ [1, 2, 3] ])}.to raise_exception(ArgumentError)
+          expect {program_two_features.predict_advanced(Matrix[ [1] ])}.to raise_exception(ArgumentError)
+        end
+      end
+    end
+
+    context 'when prediction has correct amount of columns' do
+      context 'when training set has one feature' do
+        it 'returns correct prediction' do
+          expect(program_one_feature.predict_advanced(Matrix[ [1650] ], 0.1, 600, false).to_i).to be_within(200).of(293237)
+        end
+      end
+
+      context 'when training set has two features' do
+        it 'returns correct prediction' do
+          expect(program_two_features.predict_advanced(Matrix[ [1650, 3] ], 0.1, 600, false).to_i).to be_within(200).of(293237)
+        end
+      end
+    end
+  end
+end
data/spec/spec_helper.rb
ADDED
@@ -0,0 +1,13 @@
+require 'supervised_learning'
+
+RSpec.configure do |config|
+  # Run specs in random order to surface order dependencies. If you find an
+  # order dependency and want to debug it, you can fix the order by providing
+  # the seed, which is printed after each run.
+  #     --seed 1234
+  config.order = 'random'
+
+  # when a focus tag is present in RSpec, only run tests with focus tag: http://railscasts.com/episodes/285-spork
+  config.filter_run :focus
+  config.run_all_when_everything_filtered = true
+end
data/supervised_learning.gemspec
ADDED
@@ -0,0 +1,26 @@
+# coding: utf-8
+lib = File.expand_path('../lib', __FILE__)
+$LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
+require 'supervised_learning/version'
+
+Gem::Specification.new do |spec|
+  spec.name          = "supervised_learning"
+  spec.version       = SupervisedLearning::VERSION
+  spec.authors       = ["Michael Imstepf"]
+  spec.email         = ["michael.imstepf@gmail.com"]
+  spec.summary       = %q{A module to make predictions based on a set of training data.}
+  spec.description   = %q{Supervised learning is the machine learning task of inferring a function from labeled training data. A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples.}
+  spec.homepage      = "https://github.com/michaelimstepf/supervised-learning"
+  spec.license       = "MIT"
+
+  spec.files         = `git ls-files -z`.split("\x0")
+  spec.executables   = spec.files.grep(%r{^bin/}) { |f| File.basename(f) }
+  spec.test_files    = spec.files.grep(%r{^(test|spec|features)/})
+  spec.require_paths = ["lib"]
+
+  spec.add_development_dependency "bundler", "~> 1.6"
+  spec.add_development_dependency "rake"
+  spec.add_development_dependency "rspec"
+  spec.add_development_dependency "pry"
+  spec.add_development_dependency "descriptive_statistics"
+end
metadata
ADDED
@@ -0,0 +1,128 @@
+--- !ruby/object:Gem::Specification
+name: supervised_learning
+version: !ruby/object:Gem::Version
+  version: 0.0.1
+platform: ruby
+authors:
+- Michael Imstepf
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2014-07-21 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: bundler
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.6'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.6'
+- !ruby/object:Gem::Dependency
+  name: rake
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: rspec
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: pry
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: descriptive_statistics
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+description: Supervised learning is the machine learning task of inferring a function
+  from labeled training data. A supervised learning algorithm analyzes the training
+  data and produces an inferred function, which can be used for mapping new examples.
+email:
+- michael.imstepf@gmail.com
+executables: []
+extensions: []
+extra_rdoc_files: []
+files:
+- ".gitignore"
+- Gemfile
+- LICENSE.txt
+- README.md
+- Rakefile
+- lib/supervised_learning.rb
+- lib/supervised_learning/version.rb
+- spec/linear_regression_spec.rb
+- spec/spec_helper.rb
+- supervised_learning.gemspec
+homepage: https://github.com/michaelimstepf/supervised-learning
+licenses:
+- MIT
+metadata: {}
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubyforge_project:
+rubygems_version: 2.2.2
+signing_key:
+specification_version: 4
+summary: A module to make predictions based on a set of training data.
+test_files:
+- spec/linear_regression_spec.rb
+- spec/spec_helper.rb