davidrichards-just_enumerable_stats 0.0.2

data/README.rdoc ADDED
@@ -0,0 +1,147 @@
+ == Just Enumerable Stats
+
+ I had some tricky stuff in Statisticus and Sirb that was useful, but I ended up using someone else's library for plain-old calculations on a single enumerable. What a shame, I thought. So I extracted things out and made a simpler library that I'd use for my simpler needs.
+
+ I also included a FixedRange class for traversing floating point ranges a little more easily, and a nearly exact copy of the whole library in its own module so you can keep these methods inside your own container. See the Known Issues section below and the specs for that module.
+
+ == Usage
+
+ Would I release a gem without an IRB application included? Probably not. This gem has one, and it's called jes:
+
+   jes
+   Loading Just Enumerable Stats version: 0.0.2
+
+ Looking at the library through jes, you can see that we have all the usual goodies:
+
+   >> [1,2,3].mean
+   => 2
+   >> [1,2,3].std
+   => 1.0
+   >> [1,2,3].cor [2,3,5]
+   => 0.981980506061966
+
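To make the numbers above easier to check by hand, here is a minimal plain-Ruby sketch of the same three calculations. The method names (sketch_mean, sketch_std, sketch_cor) are made up for illustration and the gem is not loaded. The sketch uses float division, so the mean comes out as 2.0 rather than the 2 shown above.

```ruby
# Plain-Ruby sketch; the method names are illustrative and the gem is not loaded.

# Arithmetic mean, using float division.
def sketch_mean(values)
  values.inject(0) { |sum, v| sum + v } / values.size.to_f
end

# Sample standard deviation: squared deviations are divided by (n - 1).
def sketch_std(values)
  mean = sketch_mean(values)
  squares = values.inject(0.0) { |sum, v| sum + (v - mean)**2 }
  Math.sqrt(squares / (values.size - 1))
end

# Pearson correlation computed from raw sums over the paired elements.
def sketch_cor(xs, ys)
  pairs  = xs.zip(ys).first([xs.size, ys.size].min)
  n      = pairs.size
  sum_xy = pairs.inject(0) { |sum, (x, y)| sum + x * y }
  sum_x  = pairs.inject(0) { |sum, (x, _)| sum + x }
  sum_y  = pairs.inject(0) { |sum, (_, y)| sum + y }
  sum_x2 = pairs.inject(0) { |sum, (x, _)| sum + x * x }
  sum_y2 = pairs.inject(0) { |sum, (_, y)| sum + y * y }
  numerator   = (n * sum_xy) - (sum_x * sum_y)
  denominator = Math.sqrt(((n * sum_x2) - sum_x**2) * ((n * sum_y2) - sum_y**2))
  numerator / denominator
end

p sketch_mean([1, 2, 3])            # 2.0
p sketch_std([1, 2, 3])             # 1.0
p sketch_cor([1, 2, 3], [2, 3, 5])  # roughly 0.98198
```

Dividing by n - 1 is what makes the standard deviation of [1,2,3] come out to exactly 1.0.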
+ The list of methods is:
+
+ * average
+ * avg (average)
+ * cartesian_product
+ * compliment
+ * cor (correlation)
+ * correlation
+ * cp (cartesian_product)
+ * cum_max (cumulative_max)
+ * cum_min (cumulative_min)
+ * cum_prod (cumulative_product)
+ * cum_sum (cumulative_sum)
+ * cumulative_max
+ * cumulative_min
+ * cumulative_product
+ * cumulative_sum
+ * default_block
+ * default_block=
+ * euclidian_distance
+ * exclusive_not
+ * intersect
+ * max
+ * max_index
+ * max_of_lists
+ * mean (average)
+ * median
+ * min
+ * min_index
+ * min_of_lists
+ * new_sort
+ * order
+ * original_max
+ * original_min
+ * permutations (cartesian_product)
+ * product
+ * quantile
+ * rand_in_range
+ * range
+ * range_as_range
+ * range_class
+ * range_class=
+ * rank
+ * sigma_pairs
+ * standard_deviation
+ * std (standard_deviation)
+ * sum
+ * tanimoto_correlation (tanimoto_pairs)
+ * tanimoto_pairs
+ * to_pairs
+ * union
+ * var (variance)
+ * variance
+ * yield_transpose
+
+ One of the more interesting methods is yield_transpose:
+
+   [1,2,3].yield_transpose([5,5,5], [2,2,2]) { |e| e.product }
+   # => [10, 20, 30]
+
+ The yield_transpose method:
+
+ * makes a list of lists (read: matrix) out of the main object and one or more other lists
+ * yields each *column* of that matrix to the block
+
+ In this case, it multiplies 1 * 5 * 2, 2 * 5 * 2, and 3 * 5 * 2 to get the final result.
+
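The column-wise behavior can be sketched in plain Ruby without the gem. The transpose_and_yield function below is a hypothetical stand-in, not the gem's method: it takes every list as an argument instead of being called on the first one.

```ruby
# Hypothetical stand-in for yield_transpose, written as a free function
# so that it runs without the gem.
def transpose_and_yield(*lists)
  # Stop at the shortest list so every column is complete.
  n = lists.map { |list| list.size }.min
  (0...n).map do |i|
    column = lists.map { |list| list[i] }
    block_given? ? yield(column) : column
  end
end

products = transpose_and_yield([1, 2, 3], [5, 5, 5], [2, 2, 2]) do |column|
  column.inject(1) { |product, value| product * value }
end

p products  # [10, 20, 30]
```

Without a block the sketch simply returns the columns, which is how an identity block would behave.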
+ There are a lot of other interesting tools that do this (RNum, Matrix), but the ones I know about aren't as flexible as this simple implementation.
+
+ Another interesting feature is the default block getter and setter. Sometimes I need to filter, scale, or normalize a result. I can do that in the default block and still hold on to the original values. Ultimately, it's more expensive to do things this way (every computation also has to go through the filter), but it's sometimes a little simpler. An example:
+
+   a = [1,2,3]
+   a.default_block = lambda {|e| e * 2}
+   a.sum
+   # => 12
+   a.std
+   # => 2.0 instead of 1.0
+
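The idea of a default block can be illustrated with a small standalone class. FilteredList is a hypothetical name and this is only a sketch under that assumption, not the gem's implementation: it stores the raw values and runs each one through the default block at calculation time.

```ruby
# Hypothetical container, not the gem's implementation: it keeps the raw
# values and applies an optional default block inside each calculation.
class FilteredList
  attr_accessor :default_block
  attr_reader :values

  def initialize(*values)
    @values = values
  end

  # Sum of the filtered values; the raw values are left untouched.
  def sum
    @values.inject(0) { |total, v| total + filtered(v) }
  end

  # Sample standard deviation of the filtered values.
  def std
    mean = sum / @values.size.to_f
    squares = @values.inject(0.0) { |total, v| total + (filtered(v) - mean)**2 }
    Math.sqrt(squares / (@values.size - 1))
  end

  private

  # Runs a value through the default block when one is set.
  def filtered(value)
    default_block ? default_block.call(value) : value
  end
end

a = FilteredList.new(1, 2, 3)
a.default_block = lambda { |e| e * 2 }
p a.sum     # 12
p a.std     # 2.0
p a.values  # [1, 2, 3]
```

Because the filter runs inside every calculation, the original values stay available, at the cost of one extra block call per element.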
+ == Installation
+
+   sudo gem install davidrichards-just_enumerable_stats
+
+ == Dependencies
+
+ None
+
+ == Known Issues
+
+ * I don't like the quantile methods. I found a different approach that I think is cleaner, and I should implement it when I get the time.
+ * This isn't really for any Enumerable. It's only tested on Arrays, though I'm pretty sure a lot of my repositories and other custom Enumerables will work well with this. Most importantly, a Hash will fall on its face here, so don't try it. If you need labeled data, keep an eye out for Marginal, a gem I'm cleaning up that offers log-linear methods on cross tables. That gem will use this gem, so whatever goodies I add here will be available there.
+ * I imagine the scope of this gem may grow by about a third more methods. It's not supposed to be an exhaustive list. TeguGears was developed to build these kinds of methods and have them work nicely with other tools. So, anything more than elementary statistics should become a TeguGears class.
+ * I should probably rename the range methods.
+ * I'm very aggressively polluting the Enumerable namespace. In complex work environments, it wouldn't work if other libraries had as liberal a view on things as I do. If this is a problem, you can do something like:
+
+   require 'just_enumerable_stats/stats'
+   class MyDataContainer
+     include Enumerable
+     include JustEnumerableStats::Stats
+
+     def initialize(*values)
+       @data = values
+     end
+
+     def method_missing(sym, *args, &block)
+       @data.send(sym, *args, &block)
+     end
+
+     def to_a
+       @data
+     end
+
+   end
+
+ To use this new class, you'd convert your data lists like this:
+   a = [1,2,3]
+   m = MyDataContainer.new(*a)
+
+ Or just:
+   m = MyDataContainer.new(1,2,3)
+
+ This approach works and passes the same tests as the main library, though it promises to be awkward.
+
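The container pattern can be tried without installing the gem. In the sketch below, TinyStats is a hypothetical one-method stand-in for JustEnumerableStats::Stats, and the container defines each directly instead of relying on method_missing. Both are assumptions made to keep the example self-contained.

```ruby
# TinyStats is a hypothetical stand-in for JustEnumerableStats::Stats so
# that this sketch runs without the gem installed.
module TinyStats
  def mean
    inject(0) { |sum, v| sum + v } / count.to_f
  end
end

class MyDataContainer
  include Enumerable
  include TinyStats

  def initialize(*values)
    @data = values
  end

  # Enumerable only needs each; inject and count come along for free.
  def each(&block)
    @data.each(&block)
  end
end

m = MyDataContainer.new(1, 2, 3)
p m.mean                 # 2.0
p [].respond_to?(:mean)  # false: Array itself stays untouched
```

The statistics live only on the container, so other libraries that extend Array or Enumerable are not affected.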
+ == COPYRIGHT
+
+ Copyright (c) 2009 David Richards. See LICENSE for details.
data/VERSION.yml ADDED
@@ -0,0 +1,4 @@
+ ---
+ :major: 0
+ :minor: 0
+ :patch: 2
data/bin/jes ADDED
@@ -0,0 +1,27 @@
+ #!/usr/bin/env ruby -wKU
+ require 'yaml'
+
+ version_hash = YAML.load_file(File.join(File.dirname(__FILE__), %w(.. VERSION.yml)))
+ version = [version_hash[:major].to_s, version_hash[:minor].to_s, version_hash[:patch].to_s].join(".")
+ jes_file = File.join(File.dirname(__FILE__), %w(.. lib just_enumerable_stats))
+
+ irb = RUBY_PLATFORM =~ /(?:mswin|mingw)/ ? 'irb.bat' : 'irb'
+
+ require 'optparse'
+ options = { :sandbox => false, :irb => irb, :without_stored_procedures => false }
+ OptionParser.new do |opt|
+   opt.banner = "Usage: jes [options]"
+   opt.on("--irb=[#{irb}]", 'Invoke a different irb.') { |v| options[:irb] = v }
+   opt.on("-w", 'Run without storing procedures') { |v| options[:without_stored_procedures] = true }
+   opt.parse!(ARGV)
+ end
+
+ libs = " -r irb/completion -r #{jes_file}"
+
+ puts "Loading Just Enumerable Stats version: #{version}"
+
+ if options[:sandbox]
+   puts "I'll have to think about how the whole sandbox concept should work for the jes"
+ end
+
+ exec "#{options[:irb]} #{libs} --simple-prompt"
@@ -0,0 +1,46 @@
+ # Because the standard Range isn't robust enough to handle floating
+ # point ranges correctly.
+ class FixedRange
+   include Enumerable
+
+   attr_reader :step_size, :max, :min
+   def initialize(min, max, step_size=1)
+     @step_size = step_size
+     if (min <=> max) < 0
+       @min = min
+       @max = max
+     else
+       @min = max
+       @max = min
+     end
+   end
+
+   def size
+     @size ||= calc_size
+   end
+
+   def step(enn=self.step_size, &block)
+     calc_size(enn).to_i.times do |i|
+       block.call(step_value(i, enn))
+     end
+   end
+
+   def each(&block)
+     step(&block)
+   end
+
+   def step_value(index, step_size=self.step_size)
+     index = size.to_i + index if index < 0
+     val = (index * step_size) + self.min
+     raise ArgumentError, "You have supplied an index and/or step_size that is outside of the range" if
+       val < self.min or val > self.max
+     return val
+   end
+   alias :[] :step_value
+
+   protected
+     def calc_size(step_size=self.step_size)
+       ((self.max - self.min) / step_size) + 1.0
+     end
+
+ end
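FixedRange computes each value as (index * step_size) + min rather than adding the step repeatedly. A minimal sketch of why that matters for floats follows. The helper names by_accumulation and by_index are made up for illustration and are not part of the class.

```ruby
# Repeated addition: rounding error builds up with every step.
def by_accumulation(min, step, count)
  value = min
  count.times { value += step }
  value
end

# Index-based: one multiplication and one addition per value, which is
# the shape FixedRange#step_value uses.
def by_index(min, step, index)
  min + (index * step)
end

p by_accumulation(0.0, 0.1, 10) == 1.0  # false
p by_index(0.0, 0.1, 10) == 1.0         # true
```

Index-based values can still carry ordinary floating point rounding, but the error does not grow with the number of steps taken.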
@@ -0,0 +1,503 @@
+ # This is a namespaced version of the gem, in case you would rather create
+ # a container for your data and only include these methods there.
+ # See the README for an example container class.
+ class Object
+
+   # Simpler way to get a random number between two values
+   def rand_between(a, b)
+     return rand_in_floats(a, b) if a.is_a?(Float) or b.is_a?(Float)
+     range = (a - b).abs + 1
+     rand(range) + [a,b].min
+   end
+
+   # Handles non-integers
+   def rand_in_floats(a, b)
+     range = (a - b).abs
+     (rand * range) + [a,b].min
+   end
+
+ end
+
+ module JustEnumerableStats #:nodoc:
+   module Stats
+
+     # To keep max and min DRY.
+     def block_sorter(a, b, &block)
+       if block
+         val = yield(a, b)
+       elsif default_block
+         # The default block takes one element, so compare the filtered values.
+         val = default_block.call(a) <=> default_block.call(b)
+       else
+         val = a <=> b
+       end
+     end
+     protected :block_sorter
+
+     # Returns the max, using an optional block.
+     def max(&block)
+       self.inject do |best, e|
+         val = block_sorter(best, e, &block)
+         best = val > 0 ? best : e
+       end
+     end
+
+     # Returns the first index of the max value
+     def max_index(&block)
+       self.index(max(&block))
+     end
+
+     # Min of any number of items
+     def min(&block)
+       self.inject do |best, e|
+         val = block_sorter(best, e, &block)
+         best = val < 0 ? best : e
+       end
+     end
+
+     # Returns the first index of the min value
+     def min_index(&block)
+       self.index(min(&block))
+     end
+
+     # The block called to filter the values in the object.
+     def default_block
+       @default_stat_block
+     end
+
+     # Allows me to set up a block for a series of operations. Example:
+     # a = [1,2,3]
+     # a.sum # => 6
+     # a.default_block = lambda{|e| 1 / e}
+     # a.sum # => 1 (integer division: 1/1 + 1/2 + 1/3)
+     def default_block=(block)
+       @default_stat_block = block
+     end
+
+     # Provides zero in the right class (Integer or Float)
+     def zero
+       any? {|e| e.is_a?(Float)} ? 0.0 : 0
+     end
+     protected :zero
+
+     # Provides one in the right class (Integer or Float)
+     def one
+       any? {|e| e.is_a?(Float)} ? 1.0 : 1
+     end
+     protected :one
+
+     # Adds up the list. Uses a block or default block if present.
+     def sum
+       sum = zero
+       if block_given?
+         each{|i| sum += yield(i)}
+       elsif default_block
+         each{|i| sum += default_block[*i]}
+       else
+         each{|i| sum += i}
+       end
+       sum
+     end
+
+     # The arithmetic mean, uses a block or default block. A list whose sum
+     # is an Integer uses integer division.
+     def average(&block)
+       sum(&block)/size
+     end
+     alias :mean :average
+     alias :avg :average
+
+     # The sample variance (divides by size - 1), uses a block or default block.
+     def variance(&block)
+       m = mean(&block)
+       sum_of_differences = if block_given?
+         sum{ |i| j=yield(i); (m - j) ** 2 }
+       elsif default_block
+         sum{ |i| j=default_block[*i]; (m - j) ** 2 }
+       else
+         sum{ |i| (m - i) ** 2 }
+       end
+       sum_of_differences / (size - 1)
+     end
+     alias :var :variance
+
+     # The standard deviation. Uses a block or default block.
+     def standard_deviation(&block)
+       Math::sqrt(variance(&block))
+     end
+     alias :std :standard_deviation
+
+     # The slow way is to iterate up to the middle point. A faster way is to
+     # use the index, when available. If a block is supplied, always iterate
+     # to the middle point.
+     def median(ratio=0.5, &block)
+       return iterate_midway(ratio, &block) if block_given?
+       begin
+         mid1, mid2 = middle_two
+         sorted = new_sort
+         med1, med2 = sorted[mid1], sorted[mid2]
+         return med1 if med1 == med2
+         return med1 + ((med2 - med1) * ratio)
+       rescue
+         iterate_midway(ratio, &block)
+       end
+     end
+
+     def middle_two
+       mid2 = size.div(2)
+       mid1 = (size % 2 == 0) ? mid2 - 1 : mid2
+       return mid1, mid2
+     end
+     protected :middle_two
+
+     def median_position
+       middle_two.last
+     end
+     protected :median_position
+
+     def first_half(&block)
+       fh = self[0..median_position].dup
+     end
+     protected :first_half
+
+     def second_half(&block)
+       # Total crap, but it's the way R does things, and this will most likely
+       # only be used to feed R some numbers to plot, if at all.
+       sh = size <= 5 ? self[median_position..-1].dup : self[median_position - 1..-1].dup
+     end
+     protected :second_half
+
+     # An iterative version of median
+     def iterate_midway(ratio, &block)
+       mid1, mid2 = middle_two
+       sort1, sort2 = nil, nil
+
+       # new_sort already applies the block (or the default block), so walk
+       # the sorted values only as far as the second middle index.
+       new_sort(&block).each_with_index do |value, j|
+         sort1 = value if j == mid1
+         sort2 = value if j == mid2
+         break if j >= mid2
+       end
+       return sort1 if sort1 == sort2
+       return sort1 + ((sort2 - sort1) * ratio)
+     end
+     protected :iterate_midway
+
+     # Just an array of [min, max] to comply with R's use of the word. Use
+     # range_as_range if you want a real Range.
+     def range(&block)
+       [min(&block), max(&block)]
+     end
+
+     # Useful for setting a real range class (FixedRange).
+     def range_class=(klass)
+       @range_class = klass
+     end
+
+     # When creating a range, what class will it be? Defaults to Range, but
+     # other classes are sometimes useful.
+     def range_class
+       @range_class ||= Range
+     end
+
+     # Actually instantiates the range, instead of producing a min and max array.
+     def range_as_range(&block)
+       range_class.new(min(&block), max(&block))
+     end
+
+     # I don't pass the block to the sort, because a sort block needs to look
+     # something like: {|x,y| x <=> y}. To get around this, set the default
+     # block on the object.
+     def new_sort(&block)
+       if block_given?
+         map { |i| yield(i) }.sort.dup
+       elsif default_block
+         map { |i| default_block[*i] }.sort.dup
+       else
+         sort().dup
+       end
+     end
+
+     # Doesn't overwrite things like Matrix#rank
+     def rank(&block)
+
+       sorted = new_sort(&block)
+
+       if block_given?
+         map { |i| sorted.index(yield(i)) + 1 }
+       elsif default_block
+         map { |i| sorted.index(default_block[*i]) + 1 }
+       else
+         map { |i| sorted.index(i) + 1 }
+       end
+
+     end unless defined?(rank)
+
+     # Given values like [10,5,5,1]
+     # Rank should produce something like [4,2,2,1]
+     # And order should produce something like [4,2,3,1]
+     # The trick is that rank skips as many as were duplicated, so there
+     # could not be a 3 in the rank from the example above.
+     def order(&block)
+       hold = []
+       rank(&block).each do |x|
+         while hold.include?(x) do
+           x += 1
+         end
+         hold << x
+       end
+       hold
+     end
+
+     # First quartile: nth_split_by_m(1, 4)
+     # Third quartile: nth_split_by_m(3, 4)
+     # Median: nth_split_by_m(1, 2)
+     # Doesn't match R, and it's silly to try to.
+     # def nth_split_by_m(n, m)
+     #   sorted = new_sort
+     #   dividers = m - 1
+     #   if size % m == dividers # Divides evenly
+     #     # Because we have a 0-based list, we get the floor
+     #     i = ((size / m.to_f) * n).floor
+     #     j = i
+     #   else
+     #     # This reflects R's approach, which I don't think I agree with.
+     #     i = (((size / m.to_f) * n) - 1)
+     #     i = i > (size / m.to_f) ? i.floor : i.ceil
+     #     j = i + 1
+     #   end
+     #   sorted[i] + ((n / m.to_f) * (sorted[j] - sorted[i]))
+     # end
+     def quantile(&block)
+       [
+         min(&block),
+         first_half(&block).median(0.25, &block),
+         median(&block),
+         second_half(&block).median(0.75, &block),
+         max(&block)
+       ]
+     end
+
+     # The cumulative sum. Example:
+     # [1,2,3].cum_sum # => [1, 3, 6]
+     def cum_sum(sorted=false, &block)
+       sum = zero
+       obj = sorted ? self.new_sort : self
+       if block_given?
+         obj.map { |i| sum += yield(i) }
+       elsif default_block
+         obj.map { |i| sum += default_block[*i] }
+       else
+         obj.map { |i| sum += i }
+       end
+     end
+     alias :cumulative_sum :cum_sum
+
+     # The cumulative product. Example:
+     # [1,2,3].cum_prod # => [1, 2, 6]
+     def cum_prod(sorted=false, &block)
+       prod = one
+       obj = sorted ? self.new_sort : self
+       if block_given?
+         obj.map { |i| prod *= yield(i) }
+       elsif default_block
+         obj.map { |i| prod *= default_block[*i] }
+       else
+         obj.map { |i| prod *= i }
+       end
+     end
+     alias :cumulative_product :cum_prod
+
+     # Used to preprocess the list
+     def morph_list(&block)
+       if block
+         self.map{ |e| block.call(e) }
+       elsif self.default_block
+         self.map{ |e| self.default_block.call(e) }
+       else
+         self
+       end
+     end
+     protected :morph_list
+
+     # Example:
+     # [1,2,3,0,5].cum_max # => [1,2,3,3,5]
+     def cum_max(&block)
+       morph_list(&block).inject([]) do |list, e|
+         found = (list | [e]).max
+         list << (found ? found : e)
+       end
+     end
+     alias :cumulative_max :cum_max
+
+     # Example:
+     # [1,2,3,0,5].cum_min # => [1,1,1,0,0]
+     def cum_min(&block)
+       morph_list(&block).inject([]) do |list, e|
+         found = (list | [e]).min
+         list << (found ? found : e)
+       end
+     end
+     alias :cumulative_min :cum_min
+
+     # Multiplies the values:
+     # >> [1,2,3].product
+     # => 6
+     def product
+       self.inject(one) {|sum, a| sum *= a}
+     end
+
+     # There are going to be a lot more of these kinds of things, so pay
+     # attention.
+     def to_pairs(other, &block)
+       n = [self.size, other.size].min
+       (0...n).map {|i| block.call(self[i], other[i]) }
+     end
+
+     # Finds the tanimoto coefficient: the intersection set size / union set
+     # size. This is used to find the distance between two vectors.
+     # >> [1,2,3].cor([2,3,5])
+     # => 0.981980506061966
+     # >> [1,2,3].tanimoto_pairs([2,3,5])
+     # => 0.5
+     def tanimoto_pairs(other)
+       intersect(other).size / union(other).size.to_f
+     end
+     alias :tanimoto_correlation :tanimoto_pairs
+
+     # Sometimes it just helps to have things spelled out. These are all
+     # part of the Array class. This means you have methods that you can't
+     # run on some kinds of enumerables.
+
+     # All of the left and right hand sides, excluding duplicates.
+     # "The union of x and y"
+     def union(other)
+       other = other.to_a unless other.is_a?(Array)
+       self | other
+     end
+
+     # What's shared on the left and right hand sides
+     # "The intersection of x and y"
+     def intersect(other)
+       other = other.to_a unless other.is_a?(Array)
+       self & other
+     end
+
+     # Everything on the left hand side except what's shared on the right
+     # hand side.
+     # "The relative complement of y in x"
+     def compliment(other)
+       other = other.to_a unless other.is_a?(Array)
+       self - other
+     end
+
+     # Everything but what's shared
+     def exclusive_not(other)
+       other = other.to_a unless other.is_a?(Array)
+       (self | other) - (self & other)
+     end
+
+     # Finds the cartesian product, excluding duplicate items and
+     # self-referential pairs. Yields the block value if given.
+     def cartesian_product(other, &block)
+       x,y = self.uniq.dup, other.uniq.dup
+       pairs = x.inject([]) do |cp, i|
+         cp | y.map{|b| i == b ? nil : [i,b]}.compact
+       end
+       return pairs unless block_given?
+       pairs.map{|p| yield p.first, p.last}
+     end
+     alias :cp :cartesian_product
+     alias :permutations :cartesian_product
+
+     # Sigma of pairs. Returns a single float, or whatever object is sent in.
+     # Example: [1,2,3].sigma_pairs([4,5,6], 0) {|x, y| x + y}
+     # returns 21 instead of 21.0.
+     def sigma_pairs(other, z=zero, &block)
+       self.to_pairs(other,&block).inject(z) {|sum, i| sum += i}
+     end
+
+     # Returns the Euclidean distance between this enumerable and another,
+     # treating each as a point and pairing elements by index.
+     def euclidian_distance(other)
+       Math.sqrt(self.sigma_pairs(other) {|a, b| (a - b) ** 2})
+     end
+
+     # Returns a list of random integers, one per position, each within the
+     # range spanned by this list and any number of other lists. This
+     # is a way to get a random vector that is tenable based on the sample
+     # data. For example, given two sets of numbers:
+     #
+     # a = [1,2,3]; b = [8,8,8]
+     #
+     # a.rand_in_range(b) will return a value >= 1 and <= 8 in the first
+     # place, >= 2 and <= 8 in the second place, and >= 3 and <= 8 in the
+     # last place.
+     # Works for integers. Rethink this for floats. May consider setting up
+     # FixedRange for floats. O(n*5)
+     def rand_in_range(*args)
+       min = self.min_of_lists(*args)
+       max = self.max_of_lists(*args)
+       (0...size).inject([]) do |ary, i|
+         ary << rand_between(min[i], max[i])
+       end
+     end
+
+     # Finds the correlation between two enumerables.
+     # Example: [1,2,3].cor [2,3,5]
+     # returns 0.981980506061966
+     def correlation(other)
+       n = [self.size, other.size].min
+       sum_of_products_of_pairs = self.sigma_pairs(other) {|a, b| a * b}
+       self_sum = self.sum
+       other_sum = other.sum
+       sum_of_squared_self_scores = self.sum { |e| e * e }
+       sum_of_squared_other_scores = other.sum { |e| e * e }
+
+       numerator = (n * sum_of_products_of_pairs) - (self_sum * other_sum)
+       self_denominator = ((n * sum_of_squared_self_scores) - (self_sum ** 2))
+       other_denominator = ((n * sum_of_squared_other_scores) - (other_sum ** 2))
+       denominator = Math.sqrt(self_denominator * other_denominator)
+       return numerator / denominator
+     end
+     alias :cor :correlation
+
+     # Transposes arrays of arrays and yields a block on the value.
+     # The regular Array#transpose ignores blocks
+     def yield_transpose(*enums, &block)
+       enums.unshift(self)
+       n = enums.map{ |x| x.size}.min
+       block ||= lambda{|e| e}
+       (0...n).map { |i| block.call enums.map{ |x| x[i] } }
+     end
+
+     # Returns the element-wise max of two or more enumerables.
+     # >> [1,2,3].max_of_lists([0,5,6], [0,2,9])
+     # => [1, 5, 9]
+     def max_of_lists(*enums)
+       yield_transpose(*enums) {|e| e.max}
+     end
+
+     # Returns the element-wise min of two or more enumerables.
+     # >> [1,2,3].min_of_lists([4,5,6], [0,2,9])
+     # => [0, 2, 3]
+     def min_of_lists(*enums)
+       yield_transpose(*enums) {|e| e.min}
+     end
+
+   end
+ end
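The difference between rank and order described in the comments above can be checked with a short standalone sketch. The names sketch_rank and sketch_order are illustrative only, and the sketch ignores the block and default block handling that the module adds.

```ruby
# Rank: one plus the first position of each value in the sorted list,
# so tied values share a rank.
def sketch_rank(values)
  sorted = values.sort
  values.map { |v| sorted.index(v) + 1 }
end

# Order: like rank, but a tied value is bumped to the next unused position.
def sketch_order(values)
  sketch_rank(values).inject([]) do |taken, position|
    position += 1 while taken.include?(position)
    taken << position
  end
end

p sketch_rank([10, 5, 5, 1])   # [4, 2, 2, 1]
p sketch_order([10, 5, 5, 1])  # [4, 2, 3, 1]
```

The two 5s share rank 2, so rank never produces a 3 here, while order hands the second 5 the next free position.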