linefit 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (6)
  1. data/CHANGELOG +7 -0
  2. data/LICENSE +7 -0
  3. data/README +306 -0
  4. data/examples/lrtest.rb +36 -0
  5. data/lib/linefit.rb +501 -0
  6. metadata +58 -0
data/CHANGELOG ADDED
@@ -0,0 +1,7 @@
+ = Change Log
+
+ Below is a complete listing of changes for each revision of LineFit.
+
+ == 0.1.0
+
+ * Initial public release.
data/LICENSE ADDED
@@ -0,0 +1,7 @@
+ = License Terms
+
+ Distributed under the user's choice of the {GPL Version 2}[http://www.gnu.org/licenses/old-licenses/gpl-2.0.html]
+ (see COPYING for details) or the {Ruby software license}[http://www.ruby-lang.org/en/LICENSE.txt] by
+ Eric Cline.
+
+ Please email Eric[mailto:escline@gmail.com] with any questions.
data/README ADDED
@@ -0,0 +1,306 @@
+ Contents:
+ NAME
+ SYNOPSIS
+ DESCRIPTION
+ ALGORITHM
+ LIMITATIONS
+ EXAMPLES
+ METHODS
+ SEE ALSO
+ AUTHOR
+ LICENSE
+ DISCLAIMER
+
+ NAME
+ LineFit - Least squares line fit, weighted or unweighted
+
+ SYNOPSIS
+ require 'linefit'
+ lineFit = LineFit.new
+ lineFit.setData(x,y)
+ intercept, slope = lineFit.coefficients
+ rSquared = lineFit.rSquared
+ meanSquaredError = lineFit.meanSqError
+ durbinWatson = lineFit.durbinWatson
+ sigma = lineFit.sigma
+ tStatIntercept, tStatSlope = lineFit.tStatistics
+ predictedYs = lineFit.predictedYs
+ residuals = lineFit.residuals
+ varianceIntercept, varianceSlope = lineFit.varianceOfEstimates
+
+ DESCRIPTION
+ The LineFit class does weighted or unweighted least-squares
+ line fitting to two-dimensional data (y = a + b * x). (This is also
+ called linear regression.) In addition to the slope and y-intercept, the
+ class can return the square of the correlation coefficient (R squared),
+ the Durbin-Watson statistic, the mean squared error, sigma, the t
+ statistics, the variance of the estimates of the slope and y-intercept,
+ the predicted y values and the residuals of the y values. (See the
+ METHODS section for a description of these statistics.)
+
+ The class accepts input data in separate x and y arrays or a single 2-D
+ array (an array of [x, y] pairs). The optional weights are input in a
+ separate array. The class can optionally verify that the input data and
+ weights are valid numbers. If weights are input, the line fit minimizes
+ the weighted sum of the squared errors and the following statistics are
+ weighted: the correlation coefficient, the Durbin-Watson statistic, the
+ mean squared error, sigma and the t statistics.
+
+ The class is state-oriented and caches its results. Once you call the
+ setData() method, you can call the other methods in any order or call a
+ method several times without invoking redundant calculations.
+
+ The decision to use or not use weighting could be made using your a
+ priori knowledge of the data or using supplemental data. If the data is
+ sparse or contains non-random noise, weighting can degrade the solution.
+ Weighting is a good option if some points are suspect or less relevant
+ (e.g., older terms in a time series, points that are known to have more
+ noise).
+
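+ For illustration (the weight values here are invented for the example),
+ a fit that down-weights the two oldest points in a series looks like:
+
+ require 'linefit'
+ lineFit = LineFit.new
+ x = [1, 2, 3, 4, 5, 6]
+ y = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0]
+ weights = [0.5, 0.5, 1, 1, 1, 1]
+ lineFit.setData(x, y, weights)
+ intercept, slope = lineFit.coefficients
+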
+ ALGORITHM
+ The least-squares line is the line that minimizes the sum of the squares
+ of the y residuals:
+
+ Minimize SUM((y[i] - (a + b * x[i])) ** 2)
+
+ Setting the partial derivatives with respect to a and b to zero yields a
+ solution that can be expressed in terms of the means, variances and
+ covariances of x and y:
+
+ b = SUM((x[i] - meanX) * (y[i] - meanY)) / SUM((x[i] - meanX) ** 2)
+
+ a = meanY - b * meanX
+
+ Note that a and b are undefined if all the x values are the same.
+
+ If you use weights, each term in the above sums is multiplied by the
+ value of the weight for that index. The program normalizes the weights
+ (after copying the input values) so that the sum of the weights equals
+ the number of points. This minimizes the differences between the
+ weighted and unweighted equations.
+
+ LineFit uses equations that are mathematically equivalent to
+ the above equations and computationally more efficient. The module runs
+ in O(N) (linear time).
+
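+ As a minimal sketch of the unweighted formulas above (illustration only;
+ the library itself uses an equivalent single-pass computation):
+
+ def textbook_fit(x, y)
+   n = x.length
+   meanX = x.inject(0.0) {|s, v| s + v} / n
+   meanY = y.inject(0.0) {|s, v| s + v} / n
+   num = den = 0.0
+   0.upto(n - 1) do |i|
+     num += (x[i] - meanX) * (y[i] - meanY)
+     den += (x[i] - meanX) ** 2
+   end
+   b = num / den    # undefined when all x values are equal (den == 0)
+   a = meanY - b * meanX
+   return a, b
+ end
+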
+ LIMITATIONS
+ The regression fails if the input x values are all equal or the only
+ unequal x values have zero weights. This is an inherent limit to fitting
+ a line of the form y = a + b * x. In this case, the class issues an
+ error message and methods that return statistical values will return
+ undefined values. You can also use the return value of the regress()
+ method to check the status of the regression.
+
+ As the sum of the squared deviations of the x values approaches zero,
+ the class's results become sensitive to the precision of floating
+ point operations on the host system.
+
+ If the x values are not all the same and the apparent "best fit" line is
+ vertical, the class will fit a horizontal line. For example, an input
+ of (1, 1), (1, 7), (2, 3), (2, 5) returns a slope of zero, an intercept
+ of 4 and an R squared of zero. This is correct behavior because this
+ line is the best least-squares fit to the data for the given
+ parameterization (y = a + b * x).
+
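+ This can be checked directly:
+
+ require 'linefit'
+ lineFit = LineFit.new
+ lineFit.setData([1, 1, 2, 2], [1, 7, 3, 5])
+ intercept, slope = lineFit.coefficients   # => 4.0, 0.0
+ rSquared = lineFit.rSquared               # => 0.0
+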
+ On a 32-bit system the results are accurate to about 11 significant
+ digits, depending on the input data. Many of the installation tests will
+ fail on a system with word lengths of 16 bits or fewer. (You might want
+ to upgrade your old 80286 IBM PC.)
+
+ EXAMPLES
+
+ require 'linefit'
+ lineFit = LineFit.new
+ x = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18]
+ y = [4039,4057,4052,4094,4104,4110,4154,4161,4186,4195,4229,4244,4242,4283,4322,4333,4368,4389]
+
+ lineFit.setData(x,y)
+
+ intercept, slope = lineFit.coefficients
+ rSquared = lineFit.rSquared
+ meanSquaredError = lineFit.meanSqError
+ durbinWatson = lineFit.durbinWatson
+ sigma = lineFit.sigma
+ tStatIntercept, tStatSlope = lineFit.tStatistics
+ predictedYs = lineFit.predictedYs
+ residuals = lineFit.residuals
+ varianceIntercept, varianceSlope = lineFit.varianceOfEstimates
+
+ print "Slope: #{slope} Y-Intercept: #{intercept}\n"
+ print "r-Squared: #{rSquared}\n"
+ print "Mean Squared Error: #{meanSquaredError}\n"
+ print "Durbin Watson Test: #{durbinWatson}\n"
+ print "Sigma: #{sigma}\n"
+ print "t Stat Intercept: #{tStatIntercept} t Stat Slope: #{tStatSlope}\n\n"
+ print "Predicted Ys: #{predictedYs.inspect}\n\n"
+ print "Residuals: #{residuals.inspect}\n\n"
+ print "Variance Intercept: #{varianceIntercept} Variance Slope: #{varianceSlope}\n"
+ print "\n"
+
+ newX = 24
+ newY = lineFit.forecast(newX)
+ print "New X: #{newX}\nNew Y: #{newY}\n"
+
+ METHODS
+ The class is state-oriented and caches its results. Once you call the
+ setData() method, you can call the other methods in any order or call a
+ method several times without invoking redundant calculations.
+
+ The regression fails if the x values are all the same. In this case, the
+ class issues an error message and methods that return statistical
+ values will return undefined values. You can also use the return value
+ of the regress() method to check the status of the regression.
+
+ new() - create a new LineFit object
+ linefit = LineFit.new
+ linefit = LineFit.new(validate)
+ linefit = LineFit.new(validate, hush)
+
+ validate = true  -> Verify input data is numeric (slower execution)
+          = false -> Don't verify input data (default, faster execution)
+ hush     = true  -> Suppress error messages
+          = false -> Enable error messages (default)
+
+ coefficients() - Return the slope and y intercept
+ intercept, slope = linefit.coefficients
+
+ The returned list is undefined if the regression fails.
+
+ durbinWatson() - Return the Durbin-Watson statistic
+ durbinWatson = linefit.durbinWatson
+
+ The Durbin-Watson test is a test for first-order autocorrelation in the
+ residuals of a time series regression. The Durbin-Watson statistic has a
+ range of 0 to 4; a value of 2 indicates there is no autocorrelation.
+
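+ As computed by this class, with e[i] = y[i] - (a + b * x[i]):
+
+ DW = SUM((e[i] - e[i-1]) ** 2, i = 1..n-1) / SUM(e[i] ** 2)
+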
+ The return value is undefined if the regression fails. If weights are
+ input, the return value is the weighted Durbin-Watson statistic.
+
+ meanSqError() - Return the mean squared error
+ meanSquaredError = linefit.meanSqError
+
+ The return value is undefined if the regression fails. If weights are
+ input, the return value is the weighted mean squared error.
+
+ predictedYs() - Return the predicted y values array
+ predictedYs = linefit.predictedYs
+
+ The returned list is undefined if the regression fails.
+
+ forecast() - Return the dependent (Y) value for a given independent (X) value
+ forecasted_y = linefit.forecast(x_value)
+
+ Uses the slope and intercept to calculate the Y value along the line
+ at the given X value. Note: the returned value is only as good as the
+ line fit.
+
+ regress() - Do the least squares line fit (if not already done)
+ linefit.regress
+
+ You don't need to call this method because it is invoked by the other
+ methods as needed. After you call setData(), you can call regress() at
+ any time to get the status of the regression for the current data.
+
+ residuals() - Return the input y values minus the predicted y values
+ residuals = linefit.residuals
+
+ The returned list is undefined if the regression fails.
+
+ rSquared() - Return the square of the correlation coefficient
+ rSquared = linefit.rSquared
+
+ R squared, also called the square of the Pearson product-moment
+ correlation coefficient, is a measure of goodness-of-fit. It is the
+ fraction of the variation in Y that can be attributed to the variation
+ in X. A perfect fit will have an R squared of 1; fitting a line to the
+ vertices of a regular polygon will yield an R squared of zero. Graphical
+ displays of data with an R squared of less than about 0.1 do not show a
+ visible linear trend.
+
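+ As computed by this class (unweighted case), with dx[i] = x[i] - meanX
+ and dy[i] = y[i] - meanY:
+
+ R**2 = SUM(dx[i] * dy[i]) ** 2 / (SUM(dx[i] ** 2) * SUM(dy[i] ** 2))
+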
+ The return value is undefined if the regression fails. If weights are
+ input, the return value is the weighted correlation coefficient.
+
+ setData() - Initialize (x,y) values and optional weights
+ lineFit.setData(x, y)
+ lineFit.setData(x, y, weights)
+ lineFit.setData(xy)
+ lineFit.setData(xy, weights)
+
+ xy is an array of arrays; x values are xy[i][0], y values are
+ xy[i][1]. The method identifies the difference between the first
+ and fourth calling signatures by examining the first argument.
+
+ The optional weights array must be the same length as the data array(s).
+ The weights must be non-negative numbers; at least two of the weights
+ must be nonzero. Only the relative size of the weights is significant:
+ the program normalizes the weights (after copying the input values) so
+ that the sum of the weights equals the number of points. If you want to
+ do multiple line fits using the same weights, the weights must be passed
+ to each call to setData().
+
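+ The normalization performed by the class is simply:
+
+ factor = numberOfPoints / SUM(weights[i])
+ normalizedWeight[i] = weights[i] * factor
+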
+ The method will return false if the array lengths don't match, there are
+ fewer than two data points, any weights are negative or fewer than two of
+ the weights are nonzero. If the new() method was called with validate =
+ true, the method will also verify that the data and weights are valid
+ numbers. Once you successfully call setData(), the next call to any
+ method other than new() or setData() invokes the regression.
+
+ sigma() - Return the standard error of the estimate
+ sigma = linefit.sigma
+
+ Sigma is an estimate of the homoscedastic standard deviation of the
+ error. Sigma is also known as the standard error of the estimate.
+
+ The return value is undefined if the regression fails. If weights are
+ input, the return value is the weighted standard error.
+
+ tStatistics() - Return the t statistics
+ tStatIntercept, tStatSlope = linefit.tStatistics
+
+ The t statistic, also called the t ratio or Wald statistic, is used to
+ accept or reject a hypothesis using a table of cutoff values computed
+ from the t distribution. The t-statistic suggests that the estimated
+ value is (reasonable, too small, too large) when the t-statistic is
+ (close to zero, large and positive, large and negative).
+
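+ As computed by this class, with dx[i] = x[i] - meanX and N the number
+ of points:
+
+ tStatSlope = slope / (sigma / SQRT(SUM(dx[i] ** 2)))
+ tStatIntercept = intercept / (sigma * SQRT(SUM(x[i] ** 2) / (N * SUM(dx[i] ** 2))))
+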
+ The returned list is undefined if the regression fails. If weights are
+ input, the returned values are the weighted t statistics.
+
+ varianceOfEstimates() - Return variances of estimates of intercept, slope
+ varianceIntercept, varianceSlope = linefit.varianceOfEstimates
+
+ Assuming the data are noisy or inaccurate, the intercept and slope
+ returned by the coefficients() method are only estimates of the true
+ intercept and slope. The varianceOfEstimates() method returns the
+ variances of the estimates of the intercept and slope, respectively. See
+ Numerical Recipes in C, section 15.2 (Fitting Data to a Straight Line),
+ equation 15.2.9.
+
+ The returned list is undefined if the regression fails. If weights are
+ input, the returned values are the weighted variances.
+
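+ In the unweighted case, the class computes these per Numerical Recipes
+ equation 15.2.9, using the squared residuals e[i] ** 2 as the per-point
+ variance estimates:
+
+ varianceIntercept = Sxx / (S * Sxx - Sx ** 2)
+ varianceSlope = S / (S * Sxx - Sx ** 2)
+
+ where S = SUM(1 / e[i] ** 2), Sx = SUM(x[i] / e[i] ** 2) and
+ Sxx = SUM(x[i] ** 2 / e[i] ** 2).
+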
+ SEE ALSO
+ Mendenhall, W., and Sincich, T.L., 2003, A Second Course in Statistics:
+ Regression Analysis, 6th ed., Prentice Hall.
+ Press, W. H., Flannery, B. P., Teukolsky, S. A., Vetterling, W. T., 1992,
+ Numerical Recipes in C: The Art of Scientific Computing, 2nd ed.,
+ Cambridge University Press.
+
+ AUTHOR
+ Eric Cline, escline(at)gmail(dot)com
+ Richard Anderson
+
+ LICENSE
+ This program is free software; you can redistribute it and/or modify it
+ under the same terms as Ruby itself.
+
+ The full text of the license can be found in the LICENSE file included
+ in the distribution and available in the RubyForge listing for
+ LineFit (see rubyforge.org).
+
+ DISCLAIMER
+ To the maximum extent permitted by applicable law, the author of this
+ module disclaims all warranties, either express or implied, including
+ but not limited to implied warranties of merchantability and fitness for
+ a particular purpose, with regard to the software and the accompanying
+ documentation.
+
data/examples/lrtest.rb ADDED
@@ -0,0 +1,36 @@
+ require 'linefit'
+
+ lineFit = LineFit.new
+
+ x = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18]
+ y = [4039,4057,4052,4094,4104,4110,4154,4161,4186,4195,4229,4244,4242,4283,4322,4333,4368,4389]
+
+ lineFit.setData(x,y)
+
+ intercept, slope = lineFit.coefficients
+
+ rSquared = lineFit.rSquared
+
+ meanSquaredError = lineFit.meanSqError
+ durbinWatson = lineFit.durbinWatson
+ sigma = lineFit.sigma
+ tStatIntercept, tStatSlope = lineFit.tStatistics
+ predictedYs = lineFit.predictedYs
+ residuals = lineFit.residuals
+ varianceIntercept, varianceSlope = lineFit.varianceOfEstimates
+
+ print "\n***** LineFit *****\n"
+ print "Slope: #{slope} Y-Intercept: #{intercept}\n"
+ print "r-Squared: #{rSquared}\n"
+ print "Mean Squared Error: #{meanSquaredError}\n"
+ print "Durbin Watson Test: #{durbinWatson}\n"
+ print "Sigma: #{sigma}\n"
+ print "t Stat Intercept: #{tStatIntercept} t Stat Slope: #{tStatSlope}\n\n"
+ print "Predicted Ys: #{predictedYs.inspect}\n\n"
+ print "Residuals: #{residuals.inspect}\n\n"
+ print "Variance Intercept: #{varianceIntercept} Variance Slope: #{varianceSlope}\n"
+ print "\n"
+
+ newX = 24
+ newY = lineFit.forecast(newX)
+ print "New X: #{newX}\nNew Y: #{newY}\n"
data/lib/linefit.rb ADDED
@@ -0,0 +1,501 @@
+ # == Synopsis
+ #
+ # Weighted or unweighted least-squares line fitting to two-dimensional data (y = a + b * x).
+ # (This is also called linear regression.)
+ #
+ # == Usage
+ #
+ #   x = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18]
+ #   y = [4039,4057,4052,4094,4104,4110,4154,4161,4186,4195,4229,4244,4242,4283,4322,4333,4368,4389]
+ #
+ #   linefit = LineFit.new
+ #   linefit.setData(x,y)
+ #
+ #   intercept, slope = linefit.coefficients
+ #   rSquared = linefit.rSquared
+ #   meanSquaredError = linefit.meanSqError
+ #   durbinWatson = linefit.durbinWatson
+ #   sigma = linefit.sigma
+ #   tStatIntercept, tStatSlope = linefit.tStatistics
+ #   predictedYs = linefit.predictedYs
+ #   residuals = linefit.residuals
+ #   varianceIntercept, varianceSlope = linefit.varianceOfEstimates
+ #
+ #   newX = 24
+ #   newY = linefit.forecast(newX)
+ #
+ # == Authors
+ # Eric Cline, escline(at)gmail(dot)com, ( Ruby Port, LineFit#forecast )
+ #
+ # Richard Anderson ( Statistics::LineFit Perl module )
+ # http://search.cpan.org/~randerson/Statistics-LineFit-0.07
+ #
+ # == See Also
+ # Mendenhall, W., and Sincich, T.L., 2003, A Second Course in Statistics:
+ # Regression Analysis, 6th ed., Prentice Hall.
+ # Press, W. H., Flannery, B. P., Teukolsky, S. A., Vetterling, W. T., 1992,
+ # Numerical Recipes in C: The Art of Scientific Computing, 2nd ed.,
+ # Cambridge University Press.
+ #
+ # == License
+ # Licensed under the same terms as Ruby.
+ #
+
+ class LineFit
46
+
47
+ ############################################################################
48
+ # Create a LineFit object with the optional validate and hush parameters
49
+ #
50
+ # linefit = LineFit.new
51
+ # linefit = LineFit.new(validate)
52
+ # linefit = LineFit.new(validate, hush)
53
+ #
54
+ # validate = 1 -> Verify input data is numeric (slower execution)
55
+ # = 0 -> Don't verify input data (default, faster execution)
56
+ # hush = 1 -> Suppress error messages
57
+ # = 0 -> Enable error messages (default)
58
+
59
+ def initialize(validate = FALSE, hush = FALSE)
60
+ @doneRegress = FALSE
61
+ @gotData = FALSE
62
+ @hush = hush
63
+ @validate = validate
64
+ end
65
+
+   ##########################################################################
+   # Return the slope and intercept from least squares line fit
+   #
+   #   intercept, slope = linefit.coefficients
+   #
+   # The returned list is undefined if the regression fails.
+   #
+   def coefficients
+     self.regress unless (@intercept and @slope)
+     return @intercept, @slope
+   end
+
+   ##########################################################################
+   # Return the Durbin-Watson statistic
+   #
+   #   durbinWatson = linefit.durbinWatson
+   #
+   # The Durbin-Watson test is a test for first-order autocorrelation in the
+   # residuals of a time series regression. The Durbin-Watson statistic has a
+   # range of 0 to 4; a value of 2 indicates there is no autocorrelation.
+   #
+   # The return value is undefined if the regression fails. If weights are
+   # input, the return value is the weighted Durbin-Watson statistic.
+   #
+   def durbinWatson
+     unless @durbinWatson
+       self.regress or return
+       sumErrDiff = 0
+       errorTMinus1 = @y[0] - (@intercept + @slope * @x[0])
+       1.upto(@numxy-1) do |i|
+         error = @y[i] - (@intercept + @slope * @x[i])
+         sumErrDiff += (error - errorTMinus1) ** 2
+         errorTMinus1 = error
+       end
+       @durbinWatson = sumSqErrors() > 0 ? sumErrDiff / sumSqErrors() : 0
+     end
+     return @durbinWatson
+   end
+
+   ##########################################################################
+   # Return the mean squared error
+   #
+   #   meanSquaredError = linefit.meanSqError
+   #
+   # The return value is undefined if the regression fails. If weights are
+   # input, the return value is the weighted mean squared error.
+   #
+   def meanSqError
+     unless @meanSqError
+       self.regress or return
+       @meanSqError = sumSqErrors() / @numxy
+     end
+     return @meanSqError
+   end
+
+   ##########################################################################
+   # Return the predicted Y values
+   #
+   #   predictedYs = linefit.predictedYs
+   #
+   # The returned list is undefined if the regression fails.
+   #
+   def predictedYs
+     unless @predictedYs
+       self.regress or return
+       @predictedYs = []
+       0.upto(@numxy-1) do |i|
+         @predictedYs[i] = @intercept + @slope * @x[i]
+       end
+     end
+     return @predictedYs
+   end
+
+   ##########################################################################
+   # Return the dependent (Y) value for a given independent (X) value
+   #
+   #   forecasted_y = linefit.forecast(x_value)
+   #
+   # Uses the slope and intercept to calculate the Y value along the line
+   # at the given X value. Note: the returned value is only as good as the
+   # line fit.
+   #
+   def forecast(x)
+     self.regress unless (@intercept and @slope)
+     return @slope * x + @intercept
+   end
+
+   ##########################################################################
+   # Do the least squares line fit (if not already done)
+   #
+   #   linefit.regress
+   #
+   # You don't need to call this method because it is invoked by the other
+   # methods as needed. After you call setData(), you can call regress() at
+   # any time to get the status of the regression for the current data.
+   #
+   def regress
+     return @regressOK if @doneRegress
+     unless @gotData
+       puts "No valid data input - can't do regression" unless @hush
+       return false
+     end
+     sumx, sumy, @sumxx, sumyy, sumxy = computeSums()
+     @sumSqDevx = @sumxx - sumx ** 2 / @numxy
+     if @sumSqDevx != 0
+       @sumSqDevy = sumyy - sumy ** 2 / @numxy
+       @sumSqDevxy = sumxy - sumx * sumy / @numxy
+       @slope = @sumSqDevxy / @sumSqDevx
+       @intercept = (sumy - @slope * sumx) / @numxy
+       @regressOK = true
+     else
+       puts "Can't fit line when x values are all equal" unless @hush
+       @sumxx = @sumSqDevx = nil
+       @regressOK = false
+     end
+     @doneRegress = true
+     return @regressOK
+   end
+
+   ##########################################################################
+   # Return the observed Y values minus the predicted Y values
+   #
+   #   residuals = linefit.residuals
+   #
+   # The returned list is undefined if the regression fails.
+   #
+   def residuals
+     unless @residuals
+       self.regress or return
+       @residuals = []
+       0.upto(@numxy-1) do |i|
+         @residuals[i] = @y[i] - (@intercept + @slope * @x[i])
+       end
+     end
+     return @residuals
+   end
+
+   ##########################################################################
+   # Return the square of the correlation coefficient
+   #
+   #   rSquared = linefit.rSquared
+   #
+   # R squared, also called the square of the Pearson product-moment
+   # correlation coefficient, is a measure of goodness-of-fit. It is the
+   # fraction of the variation in Y that can be attributed to the variation
+   # in X. A perfect fit will have an R squared of 1; fitting a line to the
+   # vertices of a regular polygon will yield an R squared of zero. Graphical
+   # displays of data with an R squared of less than about 0.1 do not show a
+   # visible linear trend.
+   #
+   # The return value is undefined if the regression fails. If weights are
+   # input, the return value is the weighted correlation coefficient.
+   #
+   def rSquared
+     unless @rSquared
+       self.regress or return
+       denom = @sumSqDevx * @sumSqDevy
+       @rSquared = denom != 0 ? @sumSqDevxy ** 2 / denom : 1
+     end
+     return @rSquared
+   end
+
+   ##########################################################################
+   # Initialize (x,y) values and optional weights
+   #
+   #   lineFit.setData(x, y)
+   #   lineFit.setData(x, y, weights)
+   #   lineFit.setData(xy)
+   #   lineFit.setData(xy, weights)
+   #
+   # xy is an array of arrays; x values are xy[i][0], y values are
+   # xy[i][1]. The method identifies the difference between the first
+   # and fourth calling signatures by examining the first argument.
+   #
+   # The optional weights array must be the same length as the data array(s).
+   # The weights must be non-negative numbers; at least two of the weights
+   # must be nonzero. Only the relative size of the weights is significant:
+   # the program normalizes the weights (after copying the input values) so
+   # that the sum of the weights equals the number of points. If you want to
+   # do multiple line fits using the same weights, the weights must be passed
+   # to each call to setData().
+   #
+   # The method will return false if the array lengths don't match, there are
+   # fewer than two data points, any weights are negative or fewer than two of
+   # the weights are nonzero. If the new() method was called with validate =
+   # true, the method will also verify that the data and weights are valid
+   # numbers. Once you successfully call setData(), the next call to any
+   # method other than new() or setData() invokes the regression.
+   #
+   def setData(x, y = nil, weights = nil)
+     @doneRegress = false
+     @x = @y = @numxy = @weight = \
+       @intercept = @slope = @rSquared = \
+       @sigma = @durbinWatson = @meanSqError = \
+       @sumSqErrors = @tStatInt = @tStatSlope = \
+       @predictedYs = @residuals = @sumxx = \
+       @sumSqDevx = @sumSqDevy = @sumSqDevxy = nil
+     if x.length < 2
+       puts "Must input more than one data point!" unless @hush
+       return false
+     end
+     if x[0].class == Array
+       # Single xy array of [x, y] pairs; the second argument holds the weights
+       @numxy = x.length
+       setWeights(y) or return false
+       @x = []
+       @y = []
+       x.each do |xy|
+         @x << xy[0]
+         @y << xy[1]
+       end
+     else
+       if x.length != y.length
+         puts "Length of x and y arrays must be equal!" unless @hush
+         return false
+       end
+       @numxy = x.length
+       setWeights(weights) or return false
+       @x = x
+       @y = y
+     end
+     if @validate
+       unless validData()
+         @x = @y = @weight = @numxy = nil
+         return false
+       end
+     end
+     @gotData = true
+     return true
+   end
+
+   ##########################################################################
+   # Return the estimated homoscedastic standard deviation of the
+   # error term
+   #
+   #   sigma = linefit.sigma
+   #
+   # Sigma is an estimate of the homoscedastic standard deviation of the
+   # error. Sigma is also known as the standard error of the estimate.
+   #
+   # The return value is undefined if the regression fails. If weights are
+   # input, the return value is the weighted standard error.
+   #
+   def sigma
+     unless @sigma
+       self.regress or return
+       @sigma = @numxy > 2 ? Math.sqrt(sumSqErrors() / (@numxy - 2)) : 0
+     end
+     return @sigma
+   end
+
+   ##########################################################################
+   # Return the T statistics
+   #
+   #   tStatIntercept, tStatSlope = linefit.tStatistics
+   #
+   # The t statistic, also called the t ratio or Wald statistic, is used to
+   # accept or reject a hypothesis using a table of cutoff values computed
+   # from the t distribution. The t-statistic suggests that the estimated
+   # value is (reasonable, too small, too large) when the t-statistic is
+   # (close to zero, large and positive, large and negative).
+   #
+   # The returned list is undefined if the regression fails. If weights are
+   # input, the returned values are the weighted t statistics.
+   #
+   def tStatistics
+     unless (@tStatInt and @tStatSlope)
+       self.regress or return
+       biasEstimateInt = sigma() * Math.sqrt(@sumxx / (@sumSqDevx * @numxy))
+       @tStatInt = biasEstimateInt != 0 ? @intercept / biasEstimateInt : 0
+       biasEstimateSlope = sigma() / Math.sqrt(@sumSqDevx)
+       @tStatSlope = biasEstimateSlope != 0 ? @slope / biasEstimateSlope : 0
+     end
+     return @tStatInt, @tStatSlope
+   end
+
+   ##########################################################################
+   # Return the variances in the estimates of the intercept and slope
+   #
+   #   varianceIntercept, varianceSlope = linefit.varianceOfEstimates
+   #
+   # Assuming the data are noisy or inaccurate, the intercept and slope
+   # returned by the coefficients() method are only estimates of the true
+   # intercept and slope. The varianceOfEstimates() method returns the
+   # variances of the estimates of the intercept and slope, respectively. See
+   # Numerical Recipes in C, section 15.2 (Fitting Data to a Straight Line),
+   # equation 15.2.9.
+   #
+   # The returned list is undefined if the regression fails. If weights are
+   # input, the returned values are the weighted variances.
+   #
+   def varianceOfEstimates
+     unless @intercept and @slope
+       self.regress or return
+     end
+     predictedYs = predictedYs()
+     s = sx = sxx = 0
+     if @weight
+       0.upto(@numxy-1) do |i|
+         variance = (predictedYs[i] - @y[i]) ** 2
+         unless variance == 0
+           s += 1.0 / variance
+           sx += @weight[i] * @x[i] / variance
+           sxx += @weight[i] * @x[i] ** 2 / variance
+         end
+       end
+     else
+       0.upto(@numxy-1) do |i|
+         variance = (predictedYs[i] - @y[i]) ** 2
+         unless variance == 0
+           s += 1.0 / variance
+           sx += @x[i] / variance
+           sxx += @x[i] ** 2 / variance
+         end
+       end
+     end
+     denominator = (s * sxx - sx ** 2)
+     if denominator == 0
+       return
+     else
+       return sxx / denominator, s / denominator
+     end
+   end
+
+   private
+
+   ##########################################################################
+   # Compute the sums of x, y, x**2, y**2 and x*y
+   #
+   def computeSums
+     sumx = sumy = sumxx = sumyy = sumxy = 0
+     if @weight
+       0.upto(@numxy-1) do |i|
+         sumx += @weight[i] * @x[i]
+         sumy += @weight[i] * @y[i]
+         sumxx += @weight[i] * @x[i] ** 2
+         sumyy += @weight[i] * @y[i] ** 2
+         sumxy += @weight[i] * @x[i] * @y[i]
+       end
+     else
+       0.upto(@numxy-1) do |i|
+         sumx += @x[i]
+         sumy += @y[i]
+         sumxx += @x[i] ** 2
+         sumyy += @y[i] ** 2
+         sumxy += @x[i] * @y[i]
+       end
+     end
+     # Multiply each return value by 1.0 to force them to Floats
+     return sumx * 1.0, sumy * 1.0, sumxx * 1.0, sumyy * 1.0, sumxy * 1.0
+   end
+
+   ##########################################################################
+   # Normalize and initialize the line fit weighting factors
+   #
+   def setWeights(weights = nil)
+     return true unless weights
+     if weights.length != @numxy
+       puts "Length of weight array must equal length of data array!" unless @hush
+       return false
+     end
+     if @validate
+       validWeights(weights) or return false
+     end
+     sumw = numNonZero = 0
+     weights.each do |weight|
+       if weight < 0
+         puts "Weights must be non-negative numbers!" unless @hush
+         return false
+       end
+       sumw += weight
+       numNonZero += 1 if weight != 0
+     end
+     if numNonZero < 2
+       puts "At least two weights must be nonzero!" unless @hush
+       return false
+     end
+     # Normalize a copy of the weights so they sum to the number of points
+     factor = weights.length.to_f / sumw
+     @weight = weights.collect {|weight| weight * factor}
+     return true
+   end
+
+   ##########################################################################
+   # Return the sum of the squared errors
+   #
+   def sumSqErrors
+     unless @sumSqErrors
+       self.regress or return
+       @sumSqErrors = @sumSqDevy - @sumSqDevx * @slope ** 2
+       @sumSqErrors = 0 if @sumSqErrors < 0
+     end
+     return @sumSqErrors
+   end
+
+   ##########################################################################
+   # Verify that the input x-y data are numeric
+   #
+   def validData
+     0.upto(@numxy-1) do |i|
+       unless @x[i]
+         puts "Input x[#{i}] is not defined" unless @hush
+         return false
+       end
+       if @x[i].to_s !~ /^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/
+         puts "Input x[#{i}] is not a number: #{@x[i]}" unless @hush
+         return false
+       end
+       unless @y[i]
+         puts "Input y[#{i}] is not defined" unless @hush
+         return false
+       end
+       if @y[i].to_s !~ /^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/
+         puts "Input y[#{i}] is not a number: #{@y[i]}" unless @hush
+         return false
+       end
+     end
+     return true
+   end
+
+   ##########################################################################
+   # Verify that the input weights are numeric
+   #
+   def validWeights(weights)
+     0.upto(weights.length - 1) do |i|
+       unless weights[i]
+         puts "Input weights[#{i}] is not defined" unless @hush
+         return false
+       end
+       if weights[i].to_s !~ /^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/
+         puts "Input weights[#{i}] is not a number: #{weights[i]}" unless @hush
+         return false
+       end
+     end
+     return true
+   end
+
+ end
metadata ADDED
@@ -0,0 +1,58 @@
+ --- !ruby/object:Gem::Specification
+ name: linefit
+ version: !ruby/object:Gem::Version
+   version: 0.1.0
+ platform: ruby
+ authors:
+ - Eric Cline
+ - Richard Anderson
+ autorequire:
+ bindir: bin
+ cert_chain: []
+
+ date: 2009-06-11 00:00:00 -05:00
+ default_executable:
+ dependencies: []
+
+ description: LineFit does weighted or unweighted least-squares line fitting to two-dimensional data (y = a + b * x). (Linear Regression)
+ email: escline+rubyforge@gmail.com
+ executables: []
+
+ extensions: []
+
+ extra_rdoc_files: []
+
+ files:
+ - lib/linefit.rb
+ - examples/lrtest.rb
+ - README
+ - LICENSE
+ - CHANGELOG
+ has_rdoc: true
+ homepage: http://linefit.rubyforge.org/
+ post_install_message:
+ rdoc_options: []
+
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: "0"
+   version:
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: "0"
+   version:
+ requirements: []
+
+ rubyforge_project: linefit
+ rubygems_version: 1.3.1
+ signing_key:
+ specification_version: 2
+ summary: LineFit is a linear regression math class.
+ test_files: []
+