linefit 0.1.0

Files changed (6)
  1. data/CHANGELOG +7 -0
  2. data/LICENSE +7 -0
  3. data/README +306 -0
  4. data/examples/lrtest.rb +36 -0
  5. data/lib/linefit.rb +501 -0
  6. metadata +58 -0
data/CHANGELOG ADDED
@@ -0,0 +1,7 @@
+ = Change Log
+
+ Below is a complete listing of changes for each revision of LineFit.
+
+ == 0.1.0
+
+ * Initial public release.
data/LICENSE ADDED
@@ -0,0 +1,7 @@
+ = License Terms
+
+ Distributed under the user's choice of the {GPL Version 2}[http://www.gnu.org/licenses/old-licenses/gpl-2.0.html]
+ (see COPYING for details) or the {Ruby software license}[http://www.ruby-lang.org/en/LICENSE.txt] by
+ Eric Cline.
+
+ Please email Eric[mailto:escline@gmail.com] with any questions.
data/README ADDED
@@ -0,0 +1,306 @@
+ Contents:
+ NAME
+ SYNOPSIS
+ DESCRIPTION
+ ALGORITHM
+ LIMITATIONS
+ EXAMPLES
+ METHODS
+ SEE ALSO
+ AUTHOR
+ LICENSE
+ DISCLAIMER
+
+ NAME
+ LineFit - Least squares line fit, weighted or unweighted
+
+ SYNOPSIS
+     require 'linefit'
+     lineFit = LineFit.new
+     lineFit.setData(x,y)
+     intercept, slope = lineFit.coefficients
+     rSquared = lineFit.rSquared
+     meanSquaredError = lineFit.meanSqError
+     durbinWatson = lineFit.durbinWatson
+     sigma = lineFit.sigma
+     tStatIntercept, tStatSlope = lineFit.tStatistics
+     predictedYs = lineFit.predictedYs
+     residuals = lineFit.residuals
+     varianceIntercept, varianceSlope = lineFit.varianceOfEstimates
+
+ DESCRIPTION
+ The LineFit class does weighted or unweighted least-squares
+ line fitting to two-dimensional data (y = a + b * x). (This is also
+ called linear regression.) In addition to the slope and y-intercept, the
+ class can return the square of the correlation coefficient (R squared),
+ the Durbin-Watson statistic, the mean squared error, sigma, the t
+ statistics, the variance of the estimates of the slope and y-intercept,
+ the predicted y values and the residuals of the y values. (See the
+ METHODS section for a description of these statistics.)
+
+ The class accepts input data in separate x and y arrays or a single 2-D
+ array (an array of [x, y] pairs). The optional weights are input in a
+ separate array. The class can optionally verify that the input data and
+ weights are valid numbers. If weights are input, the line fit minimizes
+ the weighted sum of the squared errors and the following statistics are
+ weighted: the correlation coefficient, the Durbin-Watson statistic, the
+ mean squared error, sigma and the t statistics.
+
+ The class is state-oriented and caches its results. Once you call the
+ setData() method, you can call the other methods in any order or call a
+ method several times without invoking redundant calculations.
+
+ The decision whether to use weighting can be made from your a priori
+ knowledge of the data or from supplemental data. If the data is
+ sparse or contains non-random noise, weighting can degrade the solution.
+ Weighting is a good option if some points are suspect or less relevant
+ (e.g., older terms in a time series, points that are known to have more
+ noise).
+
+ ALGORITHM
+ The least-squares line is the line that minimizes the sum of the squares
+ of the y residuals:
+
+     Minimize SUM((y[i] - (a + b * x[i])) ** 2)
+
+ Setting the partial derivatives with respect to a and b to zero yields a
+ solution that can be expressed in terms of the means, variances and
+ covariances of x and y:
+
+     b = SUM((x[i] - meanX) * (y[i] - meanY)) / SUM((x[i] - meanX) ** 2)
+
+     a = meanY - b * meanX
+
+ Note that a and b are undefined if all the x values are the same.
+
+ If you use weights, each term in the above sums is multiplied by the
+ value of the weight for that index. The program normalizes the weights
+ (after copying the input values) so that the sum of the weights equals
+ the number of points. This minimizes the differences between the
+ weighted and unweighted equations.
+
+ LineFit uses equations that are mathematically equivalent to
+ the above equations and computationally more efficient. The class runs
+ in O(N) (linear time).
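+
+ As an illustration only (a minimal sketch, not the class's actual
+ implementation; the simple_fit helper is hypothetical and assumes
+ Ruby 2.4+ for Array#sum), the unweighted closed-form solution follows
+ directly from the formulas above:
+
+     # Hypothetical helper: unweighted closed-form fit for y = a + b * x.
+     def simple_fit(x, y)
+       n      = x.length.to_f
+       mean_x = x.sum / n
+       mean_y = y.sum / n
+       ss_xy  = x.zip(y).sum {|xi, yi| (xi - mean_x) * (yi - mean_y)}
+       ss_xx  = x.sum {|xi| (xi - mean_x) ** 2}
+       b = ss_xy / ss_xx   # undefined if all x values are equal (ss_xx == 0)
+       a = mean_y - b * mean_x
+       [a, b]
+     end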
+
+ LIMITATIONS
+ The regression fails if the input x values are all equal or the only
+ unequal x values have zero weights. This is an inherent limit to fitting
+ a line of the form y = a + b * x. In this case, the class issues an
+ error message and methods that return statistical values will return
+ undefined values. You can also use the return value of the regress()
+ method to check the status of the regression.
+
+ As the sum of the squared deviations of the x values approaches zero,
+ the class's results become sensitive to the precision of floating
+ point operations on the host system.
+
+ If the x values are not all the same and the apparent "best fit" line is
+ vertical, the class will fit a horizontal line. For example, an input
+ of (1, 1), (1, 7), (2, 3), (2, 5) returns a slope of zero, an intercept
+ of 4 and an R squared of zero. This is correct behavior because this
+ line is the best least-squares fit to the data for the given
+ parameterization (y = a + b * x).
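+
+ For example, the degenerate input above can be checked directly
+ (expected values taken from the preceding paragraph):
+
+     linefit = LineFit.new
+     linefit.setData([1, 1, 2, 2], [1, 7, 3, 5])
+     intercept, slope = linefit.coefficients   # => 4.0, 0.0
+     rSquared = linefit.rSquared               # => 0.0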
+
+ On a 32-bit system the results are accurate to about 11 significant
+ digits, depending on the input data. Many of the installation tests will
+ fail on a system with word lengths of 16 bits or fewer. (You might want
+ to upgrade your old 80286 IBM PC.)
+
+ EXAMPLES
+
+     require 'linefit'
+     lineFit = LineFit.new
+     x = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18]
+     y = [4039,4057,4052,4094,4104,4110,4154,4161,4186,4195,4229,4244,4242,4283,4322,4333,4368,4389]
+
+     lineFit.setData(x,y)
+
+     intercept, slope = lineFit.coefficients
+     rSquared = lineFit.rSquared
+     meanSquaredError = lineFit.meanSqError
+     durbinWatson = lineFit.durbinWatson
+     sigma = lineFit.sigma
+     tStatIntercept, tStatSlope = lineFit.tStatistics
+     predictedYs = lineFit.predictedYs
+     residuals = lineFit.residuals
+     varianceIntercept, varianceSlope = lineFit.varianceOfEstimates
+
+     print "Slope: #{slope} Y-Intercept: #{intercept}\n"
+     print "r-Squared: #{rSquared}\n"
+     print "Mean Squared Error: #{meanSquaredError}\n"
+     print "Durbin Watson Test: #{durbinWatson}\n"
+     print "Sigma: #{sigma}\n"
+     print "t Stat Intercept: #{tStatIntercept} t Stat Slope: #{tStatSlope}\n\n"
+     print "Predicted Ys: #{predictedYs.inspect}\n\n"
+     print "Residuals: #{residuals.inspect}\n\n"
+     print "Variance Intercept: #{varianceIntercept} Variance Slope: #{varianceSlope}\n"
+     print "\n"
+
+     newX = 24
+     newY = lineFit.forecast(newX)
+     print "New X: #{newX}\nNew Y: #{newY}\n"
+
+ METHODS
+ The class is state-oriented and caches its results. Once you call the
+ setData() method, you can call the other methods in any order or call a
+ method several times without invoking redundant calculations.
+
+ The regression fails if the x values are all the same. In this case, the
+ class issues an error message and methods that return statistical
+ values will return undefined values. You can also use the return value
+ of the regress() method to check the status of the regression.
+
+ new() - create a new LineFit object
+     linefit = LineFit.new
+     linefit = LineFit.new(validate)
+     linefit = LineFit.new(validate, hush)
+
+     validate = 1 -> Verify input data is numeric (slower execution)
+              = 0 -> Don't verify input data (default, faster execution)
+     hush     = 1 -> Suppress error messages
+              = 0 -> Enable error messages (default)
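+
+ For example:
+
+     linefit = LineFit.new         # no validation, error messages enabled
+     linefit = LineFit.new(1)      # validate the input data
+     linefit = LineFit.new(1, 1)   # validate, suppress error messages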
+
+ coefficients() - Return the slope and y-intercept
+     intercept, slope = linefit.coefficients
+
+ The returned list is undefined if the regression fails.
+
+ durbinWatson() - Return the Durbin-Watson statistic
+     durbinWatson = linefit.durbinWatson
+
+ The Durbin-Watson test is a test for first-order autocorrelation in the
+ residuals of a time series regression. The Durbin-Watson statistic has a
+ range of 0 to 4; a value of 2 indicates there is no autocorrelation.
+
+ The return value is undefined if the regression fails. If weights are
+ input, the return value is the weighted Durbin-Watson statistic.
+
+ meanSqError() - Return the mean squared error
+     meanSquaredError = linefit.meanSqError
+
+ The return value is undefined if the regression fails. If weights are
+ input, the return value is the weighted mean squared error.
+
+ predictedYs() - Return the predicted y values array
+     predictedYs = linefit.predictedYs
+
+ The returned list is undefined if the regression fails.
+
+ forecast() - Return the dependent (Y) value for a given independent (X) value
+     forecasted_y = linefit.forecast(x_value)
+
+ Uses the fitted slope and intercept to calculate the Y value along the
+ line at the given X value. Note: the returned value is only as good as
+ the line fit.
+
+ regress() - Do the least squares line fit (if not already done)
+     linefit.regress
+
+ You don't need to call this method because it is invoked by the other
+ methods as needed. After you call setData(), you can call regress() at
+ any time to get the status of the regression for the current data.
+
+ residuals() - Return the input y values minus the predicted y values
+     residuals = linefit.residuals
+
+ The returned list is undefined if the regression fails.
+
+ rSquared() - Return the square of the correlation coefficient
+     rSquared = linefit.rSquared
+
+ R squared, also called the square of the Pearson product-moment
+ correlation coefficient, is a measure of goodness-of-fit. It is the
+ fraction of the variation in Y that can be attributed to the variation
+ in X. A perfect fit will have an R squared of 1; fitting a line to the
+ vertices of a regular polygon will yield an R squared of zero. Graphical
+ displays of data with an R squared of less than about 0.1 do not show a
+ visible linear trend.
+
+ The return value is undefined if the regression fails. If weights are
+ input, the return value is the weighted correlation coefficient.
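+
+ In terms of the sums of squared deviations defined in the ALGORITHM
+ section, the class computes (see lib/linefit.rb):
+
+     rSquared = sumSqDevxy ** 2 / (sumSqDevx * sumSqDevy)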
+
+ setData() - Initialize (x,y) values and optional weights
+     lineFit.setData(x, y)
+     lineFit.setData(x, y, weights)
+     lineFit.setData(xy)
+     lineFit.setData(xy, weights)
+
+ xy is an array of arrays; x values are xy[i][0], y values are
+ xy[i][1]. The method distinguishes the (x, y) calling signatures from
+ the (xy) calling signatures by examining the first argument.
+
+ The optional weights array must be the same length as the data array(s).
+ The weights must be non-negative numbers; at least two of the weights
+ must be nonzero. Only the relative size of the weights is significant:
+ the program normalizes the weights (after copying the input values) so
+ that the sum of the weights equals the number of points. If you want to
+ do multiple line fits using the same weights, the weights must be passed
+ to each call to setData().
+
+ The method will return false if the array lengths don't match, there are
+ fewer than two data points, any weights are negative or fewer than two of
+ the weights are nonzero. If the new() method was called with validate =
+ 1, the method will also verify that the data and weights are valid
+ numbers. Once you successfully call setData(), the next call to any
+ method other than new() or setData() invokes the regression.
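+
+ For example, these two calls fit the same data:
+
+     lineFit.setData([1, 2, 3], [2, 4, 7])
+     lineFit.setData([[1, 2], [2, 4], [3, 7]])
+
+ and weights can be attached to either form:
+
+     lineFit.setData([1, 2, 3], [2, 4, 7], [1, 1, 0.5])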
+
+ sigma() - Return the standard error of the estimate
+     sigma = linefit.sigma
+
+ Sigma is an estimate of the homoscedastic standard deviation of the
+ error. Sigma is also known as the standard error of the estimate.
+
+ The return value is undefined if the regression fails. If weights are
+ input, the return value is the weighted standard error.
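+
+ For more than two points, the class computes (see lib/linefit.rb):
+
+     sigma = Math.sqrt(sumSqErrors / (numberOfPoints - 2))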
+
+ tStatistics() - Return the t statistics
+     tStatIntercept, tStatSlope = linefit.tStatistics
+
+ The t statistic, also called the t ratio or Wald statistic, is used to
+ accept or reject a hypothesis using a table of cutoff values computed
+ from the t distribution. The t-statistic suggests that the estimated
+ value is (reasonable, too small, too large) when the t-statistic is
+ (close to zero, large and positive, large and negative).
+
+ The returned list is undefined if the regression fails. If weights are
+ input, the returned values are the weighted t statistics.
+
+ varianceOfEstimates() - Return variances of estimates of intercept, slope
+     varianceIntercept, varianceSlope = linefit.varianceOfEstimates
+
+ Assuming the data are noisy or inaccurate, the intercept and slope
+ returned by the coefficients() method are only estimates of the true
+ intercept and slope. The varianceOfEstimates() method returns the
+ variances of the estimates of the intercept and slope, respectively. See
+ Numerical Recipes in C, section 15.2 (Fitting Data to a Straight Line),
+ equation 15.2.9.
+
+ The returned list is undefined if the regression fails. If weights are
+ input, the returned values are the weighted variances.
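+
+ The variances are in squared units of the estimates; take square roots
+ to obtain standard errors:
+
+     varianceIntercept, varianceSlope = linefit.varianceOfEstimates
+     seIntercept = Math.sqrt(varianceIntercept)
+     seSlope     = Math.sqrt(varianceSlope)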
+
+ SEE ALSO
+ Mendenhall, W., and Sincich, T.L., 2003, A Second Course in Statistics:
+ Regression Analysis, 6th ed., Prentice Hall.
+ Press, W. H., Flannery, B. P., Teukolsky, S. A., Vetterling, W. T., 1992,
+ Numerical Recipes in C: The Art of Scientific Computing, 2nd ed.,
+ Cambridge University Press.
+
+ AUTHOR
+ Eric Cline, escline(at)gmail(dot)com
+ Richard Anderson
+
+ LICENSE
+ This program is free software; you can redistribute it and/or modify it
+ under the same terms as Ruby itself.
+
+ The full text of the license can be found in the LICENSE file included
+ in the distribution and available in the RubyForge listing for
+ LineFit (see rubyforge.org).
+
+ DISCLAIMER
+ To the maximum extent permitted by applicable law, the author of this
+ module disclaims all warranties, either express or implied, including
+ but not limited to implied warranties of merchantability and fitness for
+ a particular purpose, with regard to the software and the accompanying
+ documentation.
+
data/examples/lrtest.rb ADDED
@@ -0,0 +1,36 @@
+ require 'linefit'
+
+ lineFit = LineFit.new
+
+ x = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18]
+ y = [4039,4057,4052,4094,4104,4110,4154,4161,4186,4195,4229,4244,4242,4283,4322,4333,4368,4389]
+
+ lineFit.setData(x,y)
+
+ intercept, slope = lineFit.coefficients
+
+ rSquared = lineFit.rSquared
+
+ meanSquaredError = lineFit.meanSqError
+ durbinWatson = lineFit.durbinWatson
+ sigma = lineFit.sigma
+ tStatIntercept, tStatSlope = lineFit.tStatistics
+ predictedYs = lineFit.predictedYs
+ residuals = lineFit.residuals
+ varianceIntercept, varianceSlope = lineFit.varianceOfEstimates
+
+ print "\n***** LineFit *****\n"
+ print "Slope: #{slope} Y-Intercept: #{intercept}\n"
+ print "r-Squared: #{rSquared}\n"
+ print "Mean Squared Error: #{meanSquaredError}\n"
+ print "Durbin Watson Test: #{durbinWatson}\n"
+ print "Sigma: #{sigma}\n"
+ print "t Stat Intercept: #{tStatIntercept} t Stat Slope: #{tStatSlope}\n\n"
+ print "Predicted Ys: #{predictedYs.inspect}\n\n"
+ print "Residuals: #{residuals.inspect}\n\n"
+ print "Variance Intercept: #{varianceIntercept} Variance Slope: #{varianceSlope}\n"
+ print "\n"
+
+ newX = 24
+ newY = lineFit.forecast(newX)
+ print "New X: #{newX}\nNew Y: #{newY}\n"
data/lib/linefit.rb ADDED
@@ -0,0 +1,501 @@
+ # == Synopsis
+ #
+ # Weighted or unweighted least-squares line fitting to two-dimensional data (y = a + b * x).
+ # (This is also called linear regression.)
+ #
+ # == Usage
+ #
+ #   x = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18]
+ #   y = [4039,4057,4052,4094,4104,4110,4154,4161,4186,4195,4229,4244,4242,4283,4322,4333,4368,4389]
+ #
+ #   linefit = LineFit.new
+ #   linefit.setData(x,y)
+ #
+ #   intercept, slope = linefit.coefficients
+ #   rSquared = linefit.rSquared
+ #   meanSquaredError = linefit.meanSqError
+ #   durbinWatson = linefit.durbinWatson
+ #   sigma = linefit.sigma
+ #   tStatIntercept, tStatSlope = linefit.tStatistics
+ #   predictedYs = linefit.predictedYs
+ #   residuals = linefit.residuals
+ #   varianceIntercept, varianceSlope = linefit.varianceOfEstimates
+ #
+ #   newX = 24
+ #   newY = linefit.forecast(newX)
+ #
+ # == Authors
+ # Eric Cline, escline(at)gmail(dot)com ( Ruby port, LineFit#forecast )
+ #
+ # Richard Anderson ( Statistics::LineFit Perl module )
+ # http://search.cpan.org/~randerson/Statistics-LineFit-0.07
+ #
+ # == See Also
+ # Mendenhall, W., and Sincich, T.L., 2003, A Second Course in Statistics:
+ # Regression Analysis, 6th ed., Prentice Hall.
+ # Press, W. H., Flannery, B. P., Teukolsky, S. A., Vetterling, W. T., 1992,
+ # Numerical Recipes in C: The Art of Scientific Computing, 2nd ed.,
+ # Cambridge University Press.
+ #
+ # == License
+ # Licensed under the same terms as Ruby.
+ #
+
+ class LineFit
+
+   ############################################################################
+   # Create a LineFit object with the optional validate and hush parameters
+   #
+   #   linefit = LineFit.new
+   #   linefit = LineFit.new(validate)
+   #   linefit = LineFit.new(validate, hush)
+   #
+   #   validate = 1 -> Verify input data is numeric (slower execution)
+   #            = 0 -> Don't verify input data (default, faster execution)
+   #   hush     = 1 -> Suppress error messages
+   #            = 0 -> Enable error messages (default)
+
+   def initialize(validate = false, hush = false)
+     @doneRegress = false
+     @gotData = false
+     @hush = hush
+     @validate = validate
+   end
+
+   ############################################################################
+   # Return the slope and intercept from the least-squares line fit
+   #
+   #   intercept, slope = linefit.coefficients
+   #
+   # The returned list is undefined if the regression fails.
+   #
+   def coefficients
+     self.regress unless (@intercept and @slope)
+     return @intercept, @slope
+   end
+
+   ############################################################################
+   # Return the Durbin-Watson statistic
+   #
+   #   durbinWatson = linefit.durbinWatson
+   #
+   # The Durbin-Watson test is a test for first-order autocorrelation in the
+   # residuals of a time series regression. The Durbin-Watson statistic has a
+   # range of 0 to 4; a value of 2 indicates there is no autocorrelation.
+   #
+   # The return value is undefined if the regression fails. If weights are
+   # input, the return value is the weighted Durbin-Watson statistic.
+   #
+   def durbinWatson
+     unless @durbinWatson
+       self.regress or return
+       sumErrDiff = 0
+       errorTMinus1 = @y[0] - (@intercept + @slope * @x[0])
+       1.upto(@numxy-1) do |i|
+         error = @y[i] - (@intercept + @slope * @x[i])
+         sumErrDiff += (error - errorTMinus1) ** 2
+         errorTMinus1 = error
+       end
+       @durbinWatson = sumSqErrors() > 0 ? sumErrDiff / sumSqErrors() : 0
+     end
+     return @durbinWatson
+   end
+
+   ############################################################################
+   # Return the mean squared error
+   #
+   #   meanSquaredError = linefit.meanSqError
+   #
+   # The return value is undefined if the regression fails. If weights are
+   # input, the return value is the weighted mean squared error.
+   #
+   def meanSqError
+     unless @meanSqError
+       self.regress or return
+       @meanSqError = sumSqErrors() / @numxy
+     end
+     return @meanSqError
+   end
+
+   ############################################################################
+   # Return the predicted Y values
+   #
+   #   predictedYs = linefit.predictedYs
+   #
+   # The returned list is undefined if the regression fails.
+   #
+   def predictedYs
+     unless @predictedYs
+       self.regress or return
+       @predictedYs = []
+       0.upto(@numxy-1) do |i|
+         @predictedYs[i] = @intercept + @slope * @x[i]
+       end
+     end
+     return @predictedYs
+   end
+
+   ############################################################################
+   # Return the dependent (Y) value for a given independent (X) value
+   #
+   #   forecasted_y = linefit.forecast(x_value)
+   #
+   # Uses the fitted slope and intercept to calculate the Y value along the
+   # line at the given X value. Note: the returned value is only as good as
+   # the line fit.
+   #
+   def forecast(x)
+     self.regress unless (@intercept and @slope)
+     return @slope * x + @intercept
+   end
+
+   ############################################################################
+   # Do the least squares line fit (if not already done)
+   #
+   #   linefit.regress
+   #
+   # You don't need to call this method because it is invoked by the other
+   # methods as needed. After you call setData(), you can call regress() at
+   # any time to get the status of the regression for the current data.
+   #
+   def regress
+     return @regressOK if @doneRegress
+     unless @gotData
+       puts "No valid data input - can't do regression" unless @hush
+       return false
+     end
+     sumx, sumy, @sumxx, sumyy, sumxy = computeSums()
+     @sumSqDevx = @sumxx - sumx ** 2 / @numxy
+     if @sumSqDevx != 0
+       @sumSqDevy = sumyy - sumy ** 2 / @numxy
+       @sumSqDevxy = sumxy - sumx * sumy / @numxy
+       @slope = @sumSqDevxy / @sumSqDevx
+       @intercept = (sumy - @slope * sumx) / @numxy
+       @regressOK = true
+     else
+       puts "Can't fit line when x values are all equal" unless @hush
+       @sumxx = @sumSqDevx = nil
+       @regressOK = false
+     end
+     @doneRegress = true
+     return @regressOK
+   end
+
+   ############################################################################
+   # Return the observed Y values minus the predicted Y values
+   #
+   #   residuals = linefit.residuals
+   #
+   # The returned list is undefined if the regression fails.
+   #
+   def residuals
+     unless @residuals
+       self.regress or return
+       @residuals = []
+       0.upto(@numxy-1) do |i|
+         @residuals[i] = @y[i] - (@intercept + @slope * @x[i])
+       end
+     end
+     return @residuals
+   end
+
+   ############################################################################
+   # Return the square of the correlation coefficient
+   #
+   #   rSquared = linefit.rSquared
+   #
+   # R squared, also called the square of the Pearson product-moment
+   # correlation coefficient, is a measure of goodness-of-fit. It is the
+   # fraction of the variation in Y that can be attributed to the variation
+   # in X. A perfect fit will have an R squared of 1; fitting a line to the
+   # vertices of a regular polygon will yield an R squared of zero. Graphical
+   # displays of data with an R squared of less than about 0.1 do not show a
+   # visible linear trend.
+   #
+   # The return value is undefined if the regression fails. If weights are
+   # input, the return value is the weighted correlation coefficient.
+   #
+   def rSquared
+     unless @rSquared
+       self.regress or return
+       denom = @sumSqDevx * @sumSqDevy
+       @rSquared = denom != 0 ? @sumSqDevxy ** 2 / denom : 1
+     end
+     return @rSquared
+   end
+
+   ############################################################################
+   # Initialize (x,y) values and optional weights
+   #
+   #   lineFit.setData(x, y)
+   #   lineFit.setData(x, y, weights)
+   #   lineFit.setData(xy)
+   #   lineFit.setData(xy, weights)
+   #
+   # xy is an array of arrays; x values are xy[i][0], y values are
+   # xy[i][1]. The method distinguishes the (x, y) calling signatures from
+   # the (xy) calling signatures by examining the first argument.
+   #
+   # The optional weights array must be the same length as the data array(s).
+   # The weights must be non-negative numbers; at least two of the weights
+   # must be nonzero. Only the relative size of the weights is significant:
+   # the program normalizes the weights (after copying the input values) so
+   # that the sum of the weights equals the number of points. If you want to
+   # do multiple line fits using the same weights, the weights must be passed
+   # to each call to setData().
+   #
+   # The method will return false if the array lengths don't match, there are
+   # fewer than two data points, any weights are negative or fewer than two of
+   # the weights are nonzero. If the new() method was called with validate =
+   # 1, the method will also verify that the data and weights are valid
+   # numbers. Once you successfully call setData(), the next call to any
+   # method other than new() or setData() invokes the regression.
+   #
+   def setData(x, y = nil, weights = nil)
+     @doneRegress = false
+     @x = @y = @numxy = @weight = \
+       @intercept = @slope = @rSquared = \
+       @sigma = @durbinWatson = @meanSqError = \
+       @sumSqErrors = @tStatInt = @tStatSlope = \
+       @predictedYs = @residuals = @sumxx = \
+       @sumSqDevx = @sumSqDevy = @sumSqDevxy = nil
+     if x.length < 2
+       puts "Must input more than one data point!" unless @hush
+       return false
+     end
+     if x[0].class == Array
+       # Single xy array of [x, y] pairs; the second argument (if any)
+       # holds the weights.
+       @numxy = x.length
+       setWeights(y) or return false
+       @x = []
+       @y = []
+       x.each do |xy|
+         @x << xy[0]
+         @y << xy[1]
+       end
+     else
+       if x.length != y.length
+         puts "Length of x and y arrays must be equal!" unless @hush
+         return false
+       end
+       @numxy = x.length
+       setWeights(weights) or return false
+       @x = x
+       @y = y
+     end
+     if @validate
+       unless validData()
+         @x = @y = @weight = @numxy = nil
+         return false
+       end
+     end
+     @gotData = true
+     return true
+   end
+
+   ############################################################################
+   # Return the estimated homoscedastic standard deviation of the
+   # error term
+   #
+   #   sigma = linefit.sigma
+   #
+   # Sigma is an estimate of the homoscedastic standard deviation of the
+   # error. Sigma is also known as the standard error of the estimate.
+   #
+   # The return value is undefined if the regression fails. If weights are
+   # input, the return value is the weighted standard error.
+   #
+   def sigma
+     unless @sigma
+       self.regress or return
+       @sigma = @numxy > 2 ? Math.sqrt(sumSqErrors() / (@numxy - 2)) : 0
+     end
+     return @sigma
+   end
+
+   ############################################################################
+   # Return the T statistics
+   #
+   #   tStatIntercept, tStatSlope = linefit.tStatistics
+   #
+   # The t statistic, also called the t ratio or Wald statistic, is used to
+   # accept or reject a hypothesis using a table of cutoff values computed
+   # from the t distribution. The t-statistic suggests that the estimated
+   # value is (reasonable, too small, too large) when the t-statistic is
+   # (close to zero, large and positive, large and negative).
+   #
+   # The returned list is undefined if the regression fails. If weights are
+   # input, the returned values are the weighted t statistics.
+   #
+   def tStatistics
+     unless (@tStatInt and @tStatSlope)
+       self.regress or return
+       biasEstimateInt = sigma() * Math.sqrt(@sumxx / (@sumSqDevx * @numxy))
+       @tStatInt = biasEstimateInt != 0 ? @intercept / biasEstimateInt : 0
+       biasEstimateSlope = sigma() / Math.sqrt(@sumSqDevx)
+       @tStatSlope = biasEstimateSlope != 0 ? @slope / biasEstimateSlope : 0
+     end
+     return @tStatInt, @tStatSlope
+   end
+
+   ############################################################################
+   # Return the variances in the estimates of the intercept and slope
+   #
+   #   varianceIntercept, varianceSlope = linefit.varianceOfEstimates
+   #
+   # Assuming the data are noisy or inaccurate, the intercept and slope
+   # returned by the coefficients() method are only estimates of the true
+   # intercept and slope. The varianceOfEstimates() method returns the
+   # variances of the estimates of the intercept and slope, respectively. See
+   # Numerical Recipes in C, section 15.2 (Fitting Data to a Straight Line),
+   # equation 15.2.9.
+   #
+   # The returned list is undefined if the regression fails. If weights are
+   # input, the returned values are the weighted variances.
+   #
+   def varianceOfEstimates
+     unless @intercept and @slope
+       self.regress or return
+     end
+     predictedYs = predictedYs()
+     s = sx = sxx = 0
+     if @weight
+       0.upto(@numxy-1) do |i|
+         variance = (predictedYs[i] - @y[i]) ** 2
+         unless variance == 0
+           s += 1.0 / variance
+           sx += @weight[i] * @x[i] / variance
+           sxx += @weight[i] * @x[i] ** 2 / variance
+         end
+       end
+     else
+       0.upto(@numxy-1) do |i|
+         variance = (predictedYs[i] - @y[i]) ** 2
+         unless variance == 0
+           s += 1.0 / variance
+           sx += @x[i] / variance
+           sxx += @x[i] ** 2 / variance
+         end
+       end
+     end
+     denominator = (s * sxx - sx ** 2)
+     if denominator == 0
+       return
+     else
+       return sxx / denominator, s / denominator
+     end
+   end
+
+   private
+
+   ############################################################################
+   # Compute the sums of x, y, x**2, y**2, and x*y
+   #
+   def computeSums
+     sumx = sumy = sumxx = sumyy = sumxy = 0
+     if @weight
+       0.upto(@numxy-1) do |i|
+         sumx += @weight[i] * @x[i]
+         sumy += @weight[i] * @y[i]
+         sumxx += @weight[i] * @x[i] ** 2
+         sumyy += @weight[i] * @y[i] ** 2
+         sumxy += @weight[i] * @x[i] * @y[i]
+       end
+     else
+       0.upto(@numxy-1) do |i|
+         sumx += @x[i]
+         sumy += @y[i]
+         sumxx += @x[i] ** 2
+         sumyy += @y[i] ** 2
+         sumxy += @x[i] * @y[i]
+       end
+     end
+     # Multiply each return value by 1.0 to force them to Floats
+     return sumx * 1.0, sumy * 1.0, sumxx * 1.0, sumyy * 1.0, sumxy * 1.0
+   end
+
+   ############################################################################
+   # Normalize and initialize the line fit weighting factors
+   #
+   def setWeights(weights = nil)
+     return true unless weights
+     if weights.length != @numxy
+       puts "Length of weight array must equal length of data array!" unless @hush
+       return false
+     end
+     if @validate
+       validWeights(weights) or return false
+     end
+     sumw = numNonZero = 0
+     weights.each do |weight|
+       if weight < 0
+         puts "Weights must be non-negative numbers!" unless @hush
+         return false
+       end
+       sumw += weight
+       numNonZero += 1 if weight != 0
+     end
+     if numNonZero < 2
+       puts "At least two weights must be nonzero!" unless @hush
+       return false
+     end
+     # Normalize a copy of the input so the weights sum to the number of
+     # points; only their relative sizes matter.
+     factor = weights.length.to_f / sumw
+     @weight = weights.collect {|weight| weight * factor}
+     return true
+   end
+
+   ############################################################################
+   # Return the sum of the squared errors
+   #
+   def sumSqErrors
+     unless @sumSqErrors
+       self.regress or return
+       @sumSqErrors = @sumSqDevy - @sumSqDevx * @slope ** 2
+       # Guard against small negative values caused by floating point round-off
+       @sumSqErrors = 0 if @sumSqErrors < 0
+     end
+     return @sumSqErrors
+   end
+
+   ############################################################################
+   # Verify that the input x-y data are numeric
+   #
+   def validData
+     # Match against the string form so plain numerics and numeric strings
+     # both pass the check
+     0.upto(@numxy-1) do |i|
+       unless @x[i]
+         puts "Input x[#{i}] is not defined" unless @hush
+         return false
+       end
+       if @x[i].to_s !~ /^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/
+         puts "Input x[#{i}] is not a number: #{@x[i]}" unless @hush
+         return false
+       end
+       unless @y[i]
+         puts "Input y[#{i}] is not defined" unless @hush
+         return false
+       end
+       if @y[i].to_s !~ /^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/
+         puts "Input y[#{i}] is not a number: #{@y[i]}" unless @hush
+         return false
+       end
+     end
+     return true
+   end
+
+   ############################################################################
+   # Verify that the input weights are numeric
+   #
+   def validWeights(weights)
+     0.upto(weights.length - 1) do |i|
+       unless weights[i]
+         puts "Input weights[#{i}] is not defined" unless @hush
+         return false
+       end
+       if weights[i].to_s !~ /^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/
+         puts "Input weights[#{i}] is not a number: #{weights[i]}" unless @hush
+         return false
+       end
+     end
+     return true
+   end
+
+ end
metadata ADDED
@@ -0,0 +1,58 @@
+ --- !ruby/object:Gem::Specification
+ name: linefit
+ version: !ruby/object:Gem::Version
+   version: 0.1.0
+ platform: ruby
+ authors:
+ - Eric Cline
+ - Richard Anderson
+ autorequire:
+ bindir: bin
+ cert_chain: []
+
+ date: 2009-06-11 00:00:00 -05:00
+ default_executable:
+ dependencies: []
+
+ description: LineFit does weighted or unweighted least-squares line fitting to two-dimensional data (y = a + b * x). (Linear Regression)
+ email: escline+rubyforge@gmail.com
+ executables: []
+
+ extensions: []
+
+ extra_rdoc_files: []
+
+ files:
+ - lib/linefit.rb
+ - examples/lrtest.rb
+ - README
+ - LICENSE
+ - CHANGELOG
+ has_rdoc: true
+ homepage: http://linefit.rubyforge.org/
+ post_install_message:
+ rdoc_options: []
+
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: "0"
+   version:
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: "0"
+   version:
+ requirements: []
+
+ rubyforge_project: linefit
+ rubygems_version: 1.3.1
+ signing_key:
+ specification_version: 2
+ summary: LineFit is a linear regression math class.
+ test_files: []
+