davidrichards-just_enumerable_stats 0.0.8 → 0.0.11
- data/README.rdoc +7 -31
- data/VERSION.yml +1 -1
- data/lib/just_enumerable_stats.rb +261 -190
- data/spec/just_enumerable_stats_spec.rb +197 -7
- data/spec/spec_helper.rb +62 -0
- metadata +1 -5
- data/lib/just_enumerable_stats/stats.rb +0 -597
- data/spec/just_enumerable_stats/stats_spec.rb +0 -534
data/README.rdoc
CHANGED
@@ -149,49 +149,25 @@ OK, here we go:
 * count_if is just like a.select_all{|e| e < 3}.size, but a little more obvious.
 * dichotomize just splits the categories into two, with the first category less than or equal to the split value provided (2 in our case)
 
+==Obtrusiveness
+
+This gem won't override methods, but it puts a lot into the Enumerable namespace. It's almost as abusive as ActiveSupport. I'll be testing this in a Rails environment to make sure this plays nicely with ActiveSupport. The issue there was that ActiveSupport also wanted to override sum on an enumerable. If you are in an environment where jes isn't overriding methods, then you're going to have to use the _jes_ prefix for the calls. So, @a._jes_sum will work, as will @a._jes_standard_deviation, @a._jes_covariance, etc. I'm realizing that I always want to use this library in all of my new stuff, because it simplifies things so much. So, this is going to be an ever-more important point.
+
 ==Installation
 
 sudo gem install davidrichards-just_enumerable_stats
 
+
 == Dependencies
 
-
+There's an optional dependency on facets/dictionary. You'll have fewer surprises if you use a Dictionary instead of a hash for categories.
 
 == Known Issues
 
 * I don't like the quantile methods. I found a different approach that I think is cleaner, that I should implement when I get the time.
-* This isn't really for any Enumerable. It's only tested on Arrays, though I'm pretty sure a lot of my repositories and other custom Enumerables will work well with this. Most importantly, a Hash will fall on its face here, so don't try it. If you need labeled data, keep an eye out for Marginal, a gem I'm cleaning up that offers log-linear methods on cross tables. That gem will use this gem, so whatever goodies I add here will be available there.
+* This isn't really for any Enumerable. It's only tested on Arrays, though I'm pretty sure a lot of my repositories and other custom Enumerables will work well with this. Most importantly, a Hash will fall on its face here, so don't try it. If you need labeled data, keep an eye out for Marginal, a gem I'm cleaning up that offers log-linear methods on cross tables. That gem will use this gem, so whatever goodies I add here will be available there. Also, there is data_frame out right now that uses this gem. That is a simpler and very useful gem for labeled data.
 * I imagine the scope of this gem may grow by about a third more methods. It's not supposed to be an exhaustive list. TeguGears was developed to build these kinds of methods and have them work nicely with other tools. So, anything more than elementary statistics should become a TeguGears class.
 * I should probably rename the range methods.
-* I'm very aggressively polluting the Enumerable namespace. In complex work environments, it wouldn't work if other libraries had as liberal a view on things as I do. If this is a problem, you can do something like:
-
-  require 'just_enumerable_stats/stats'
-  class MyDataContainer
-    include Enumerable
-    include JustEnumerableStats::Stats
-
-    def initialize(*values)
-      @data = values
-    end
-
-    def method_missing(sym, *args, &block)
-      @data.send(sym, *args, &block)
-    end
-
-    def to_a
-      @data
-    end
-
-  end
-
-To use this new class, you'd convert your data lists like this:
-  a = [1,2,3]
-  m = MyDataContainer.new(*a)
-
-Or just
-  m = MyDataContainer.new(1,2,3)
-
-This approach works and passes the same tests as the main library though it promises to be awkward.
 
 ==COPYRIGHT
 
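The Obtrusiveness section above says the gem defines every method under a `_jes_` prefix and adds the bare name only when it will not override anything. Below is a minimal sketch of that alias-unless-defined pattern. It is not the gem's code. `StatsDemo`, `Bag`, `label`, `total` and `sum_of_items` are hypothetical names. The sketch uses `Module#method_defined?` to decide whether a bare name is already taken, and it applies the check to the two-argument form as well, which is an assumption.

For comparison, the `safe_alias` added to lib/just_enumerable_stats.rb in this diff tests `self.class.respond_to?(new_meth)`. Inside `def self.safe_alias` on `module Enumerable`, `self.class` is `Module`. Going by the lines shown, that check therefore asks whether the `Module` class responds to the name, not whether `Enumerable` already has such an instance method.

```ruby
# Hypothetical stand-in for Enumerable, so the demo leaves core classes alone.
module StatsDemo
  # Aliases a _jes_-prefixed method to its bare name (one argument), or
  # sym1 to sym2 (two arguments), unless the target name is already an
  # instance method of this module. Returns the new name, or false.
  def self.safe_alias(sym1, sym2 = nil)
    if sym2
      new_meth, old_meth = sym1, sym2
    else
      return false unless sym1.to_s.start_with?("_jes_")
      old_meth = sym1
      new_meth = sym1.to_s.sub(/\A_jes_/, "").to_sym
    end
    # Look at this module's instance methods, not at Module itself.
    return false if method_defined?(new_meth)
    alias_method new_meth, old_meth
    new_meth
  end

  # A pre-existing method that the prefixed version must not clobber.
  def label
    "original"
  end

  def _jes_label
    "jes"
  end
  safe_alias :_jes_label # returns false: label is already defined

  def _jes_total
    inject(0) { |acc, e| acc + e }
  end
  safe_alias :_jes_total # returns :total
end

# A tiny Enumerable container to exercise the module.
class Bag
  include Enumerable
  include StatsDemo

  def initialize(*items)
    @items = items
  end

  def each(&block)
    @items.each(&block)
    self
  end
end

bag = Bag.new(1, 2, 3)
p bag.total       # => 6
p bag.label       # => "original"
p bag._jes_label  # => "jes"
```

With this check in place, a name that already exists on the module is left untouched and the prefixed method stays callable. For example, Ruby 2.4 and later define `Enumerable#sum`, so under this scheme callers would use `_jes_sum`. That is the fallback the README describes.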
data/lib/just_enumerable_stats.rb
CHANGED
@@ -35,152 +35,180 @@ module Enumerable
   alias :original_min :min
 
   # To keep max and min DRY.
-  def
+  def _jes_block_sorter(a, b, &block)
     if block
       val = yield(a, b)
-    elsif
-      val =
+    elsif _jes_default_block
+      val = _jes_default_block.call(a, b)
     else
       val = a <=> b
     end
   end
-  protected :
+  protected :_jes_block_sorter
 
+  # Defines the new methods unobtrusively.
+  def self.safe_alias(sym1, sym2=nil)
+
+    return false if not sym2 and not sym1.to_s.match(/^_jes_/)
+
+    if sym2
+      old_meth = sym2
+      new_meth = sym1
+    else
+      old_meth = sym1
+      new_meth = sym1.to_s.sub(/^_jes_/, '').to_sym
+      return false if self.class.respond_to?(new_meth)
+    end
+    alias_method new_meth, old_meth
+  end
+
   # Returns the max, using an optional block.
-  def
+  def _jes_max(&block)
     self.inject do |best, e|
-      val =
+      val = _jes_block_sorter(best, e, &block)
       best = val > 0 ? best : e
     end
   end
+  safe_alias :_jes_max
 
   # Returns the first index of the max value
-  def
-    self.index(
+  def _jes_max_index(&block)
+    self.index(_jes_max(&block))
   end
+  safe_alias :_jes_max_index
 
   # Min of any number of items
-  def
+  def _jes_min(&block)
     self.inject do |best, e|
-      val =
+      val = _jes_block_sorter(best, e, &block)
       best = val < 0 ? best : e
     end
   end
+  safe_alias :_jes_min
 
   # Returns the first index of the min value
-  def
-    self.index(
+  def _jes_min_index(&block)
+    self.index(_jes_min(&block))
   end
+  safe_alias :_jes_min_index
 
   # The block called to filter the values in the object.
-  def
-    @
+  def _jes_default_block
+    @_jes_default_stat_block
   end
+  safe_alias :_jes_default_block
 
   # Allows me to setup a block for a series of operations. Example:
   # a = [1,2,3]
   # a.sum # => 6.0
   # a.default_block = lambda{|e| 1 / e}
   # a.sum # => 1.0
-  def
-    @
+  def _jes_default_block=(block)
+    @_jes_default_stat_block = block
   end
+  safe_alias :_jes_default_block=
 
   # Provides zero in the right class (Numeric or Float)
-  def
+  def _jes_zero
     any? {|e| e.is_a?(Float)} ? 0.0 : 0
   end
-  protected :
+  protected :_jes_zero
 
   # Provides one in the right class (Numeric or Float)
-  def
+  def _jes_one
     any? {|e| e.is_a?(Float)} ? 1.0 : 1
   end
-  protected :
+  protected :_jes_one
 
   # Adds up the list. Uses a block or default block if present.
-  def
-    sum =
+  def _jes_sum
+    sum = _jes_zero
     if block_given?
       each{|i| sum += yield(i)}
-    elsif
-      each{|i| sum +=
+    elsif _jes_default_block
+      each{|i| sum += _jes_default_block[*i]}
     else
       each{|i| sum += i}
     end
     sum
   end
+  safe_alias :_jes_sum
 
   # The arithmetic mean, uses a block or default block.
-  def
-
+  def _jes_average(&block)
+    _jes_sum(&block)/size
   end
-
-
+  safe_alias :_jes_average
+  safe_alias :mean, :_jes_average
+  safe_alias :avg, :_jes_average
 
   # The variance, uses a block or default block.
-  def
-    m =
+  def _jes_variance(&block)
+    m = _jes_average(&block)
     sum_of_differences = if block_given?
-
-    elsif
-
+      _jes_sum{ |i| j=yield(i); (m - j) ** 2 }
+    elsif _jes_default_block
+      _jes_sum{ |i| j=_jes_default_block[*i]; (m - j) ** 2 }
     else
-
+      _jes_sum{ |i| (m - i) ** 2 }
     end
     sum_of_differences / (size - 1)
   end
-
+  safe_alias :_jes_variance
+  safe_alias :var, :_jes_variance
 
   # The standard deviation. Uses a block or default block.
-  def
-    Math::sqrt(
+  def _jes_standard_deviation(&block)
+    Math::sqrt(_jes_variance(&block))
   end
-
+  safe_alias :_jes_standard_deviation
+  safe_alias :std, :_jes_standard_deviation
 
   # The slow way is to iterate up to the middle point. A faster way is to
   # use the index, when available. If a block is supplied, always iterate
   # to the middle point.
-  def
-    return
+  def _jes_median(ratio=0.5, &block)
+    return _jes_iterate_midway(ratio, &block) if block_given?
     begin
-      mid1, mid2 =
-      sorted =
+      mid1, mid2 = _jes_middle_two
+      sorted = sort
       med1, med2 = sorted[mid1], sorted[mid2]
       return med1 if med1 == med2
       return med1 + ((med2 - med1) * ratio)
     rescue
-
+      _jes_iterate_midway(ratio, &block)
     end
   end
+  safe_alias :_jes_median
+
 
-  def
+  def _jes_middle_two
     mid2 = size.div(2)
     mid1 = (size % 2 == 0) ? mid2 - 1 : mid2
     return mid1, mid2
   end
-  protected :
+  protected :_jes_middle_two
 
-  def
-
+  def _jes_median_position
+    _jes_middle_two.last
   end
-  protected :
+  protected :_jes_median_position
 
-  def
-    fh = self[0..
+  def _jes_first_half(&block)
+    fh = self[0.._jes_median_position].dup
   end
-  protected :
+  protected :_jes_first_half
 
-  def
+  def _jes_second_half(&block)
     # Total crap, but it's the way R does things, and this will most likely
     # only be used to feed R some numbers to plot, if at all.
-    sh = size <= 5 ? self[
+    sh = size <= 5 ? self[_jes_median_position..-1].dup : self[_jes_median_position - 1..-1].dup
   end
-  protected :
+  protected :_jes_second_half
 
   # An iterative version of median
-  def
-    mid1, mid2, last_value, j, sorted, sort1, sort2 =
+  def _jes_iterate_midway(ratio, &block)
+    mid1, mid2, last_value, j, sorted, sort1, sort2 = _jes_middle_two, nil, 0, sort, nil, nil
 
     if block_given?
       sorted.each do |i|
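The median code in this hunk works in two steps. `_jes_middle_two` picks the one or two middle indices, and the median then interpolates between them by `ratio`. Separately, `_jes_variance` divides the summed squared deviations by `size - 1`, which is the sample variance. Below is a minimal sketch of both on plain Arrays. The helper names are hypothetical.

The sketch converts to Float up front, which is an assumption. In the diff, `_jes_average` divides `_jes_sum` by `size` in whatever numeric type the sum has, so an all-Integer list with no block gets integer division there.

```ruby
# Sample variance: squared deviations from the mean, divided by (n - 1).
def sample_variance(values)
  xs = values.map(&:to_f)
  mean = xs.inject(0.0, :+) / xs.size
  xs.inject(0.0) { |s, x| s + (mean - x)**2 } / (xs.size - 1)
end

# Indices of the middle element(s), mirroring _jes_middle_two:
# the same index twice for an odd size, two adjacent indices for an even size.
def middle_two(size)
  mid2 = size.div(2)
  mid1 = size.even? ? mid2 - 1 : mid2
  [mid1, mid2]
end

# Median that interpolates between the two middle values by +ratio+.
def median(values, ratio = 0.5)
  mid1, mid2 = middle_two(values.size)
  sorted = values.sort
  low, high = sorted[mid1], sorted[mid2]
  return low if low == high
  low + (high - low) * ratio
end

p median([3, 1, 2])            # => 2
p median([1, 2, 3, 4])         # => 2.5
p median([1, 2, 3, 4], 0.25)   # => 2.25
p sample_variance([1, 2, 3])   # => 1.0
```

Passing a `ratio` other than 0.5 is how `_jes_quantile` reuses the median on each half of the list for its quartile-like values.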
@@ -190,9 +218,9 @@ module Enumerable
         sort2 = last_value if j == mid2
         break if j >= mid2
       end
-    elsif
+    elsif _jes_default_block
       sorted.each do |i|
-        last_value =
+        last_value = _jes_default_block[*i]
         j += 1
         sort1 = last_value if j == mid1
         sort2 = last_value if j == mid2
@@ -210,7 +238,7 @@ module Enumerable
     return med1 if med1 == med2
     return med1 + ((med2 - med1) * ratio)
   end
-  protected :
+  protected :_jes_iterate_midway
 
   # Takes the range_class and returns its map.
   # Example:
@@ -222,72 +250,81 @@ module Enumerable
   # => [1, 5/4, 3/2, 7/4, 2, 9/4, 5/2, 11/4, 3]
   # For non-numeric values, returns a unique set,
   # ordered if possible.
-  def
-    if @
-      @
-    elsif self.
-      self.
+  def _jes_categories
+    if @_jes_categories
+      @_jes_categories
+    elsif self._jes_is_numeric?
+      self._jes_range_instance.map
     else
       self.uniq.sort rescue self.uniq
     end
   end
+  safe_alias :_jes_categories
 
-  def
+  def _jes_is_numeric?
     self.all? {|e| e.is_a?(Numeric)}
   end
+  safe_alias :_jes_is_numeric?
 
   # Just an array of [min, max] to comply with R uses of the work. Use
   # range_as_range if you want a real Range.
-  def
-    [
+  def _jes_range(&block)
+    [_jes_min(&block), _jes_max(&block)]
   end
+  safe_alias :_jes_range
 
   # Useful for setting a real range class (FixedRange).
-  def
-    @
-    @
-    self.
+  def _jes_set_range_class(klass, *args)
+    @_jes_range_class = klass
+    @_jes_range_class_args = args
+    self._jes_range_class
   end
+  safe_alias :_jes_set_range_class
 
   # Takes a hash of arrays for categories
   # If Facets happens to be loaded on the computer, this keeps the order
   # of the categories straight.
-  def
+  def _jes_set_range(hash)
     if defined?(Dictionary)
-      @
-      @
-      @
+      @_jes_range_hash = Dictionary.new
+      @_jes_range_hash.merge!(hash)
+      @_jes_categories = @_jes_range_hash.keys
     else
-      @
-      @
+      @_jes_categories = hash.keys
+      @_jes_range_hash = hash
     end
-    @
+    @_jes_categories
   end
+  safe_alias :_jes_set_range
 
   # The hash of lambdas that are used to categorize the enumerable.
-  attr_reader :
+  attr_reader :_jes_range_hash
+  safe_alias :_jes_range_hash
 
   # The arguments needed to instantiate the custom-defined range class.
-  attr_reader :
+  attr_reader :_jes_range_class_args
+  safe_alias :_jes_range_class_args
 
   # Splits the values in two, <= the value and > the value.
-  def
-
-
-
-
+  def _jes_dichotomize(split_value, first_label, second_label)
+    container = defined?(Dictionary) ? Dictionary.new : Hash.new
+    container[first_label] = lambda{|e| e <= split_value}
+    container[second_label] = lambda{|e| e > split_value}
+    _jes_set_range(container)
   end
+  safe_alias :_jes_dichotomize
 
   # Counts each element where the block evaluates to true
   # Example:
   # a = [1,2,3]
   # a.count_if {|e| e % 2 == 0}
-  def
+  def _jes_count_if(&block)
     self.inject(0) do |s, e|
       s += 1 if block.call(e)
       s
     end
   end
+  safe_alias :_jes_count_if
 
   # Returns a Hash or Dictionary (if available) for each category with a
   # value as the set of matching values as an array.
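`_jes_dichotomize` stores two labelled lambdas (`<= split_value` and `> split_value`) through `_jes_set_range`. `_jes_category_values` later runs `find_all` with each lambda. The sketch below does the same split in one step on a plain Array. `dichotomize` here is a hypothetical helper, not the gem's method.

It uses a plain Hash because Ruby 1.9 and later keep Hash keys in insertion order, which is the ordering the optional facets Dictionary provides. The README's `count_if` also has a core counterpart: `Enumerable#count` with a block.

```ruby
# Split values into two labelled buckets around split_value (hypothetical helper).
def dichotomize(values, split_value, first_label, second_label)
  predicates = {
    first_label  => lambda { |e| e <= split_value },
    second_label => lambda { |e| e > split_value }
  }
  # Each label maps to the values its predicate accepts, in label order.
  predicates.each_with_object({}) do |(label, pred), out|
    out[label] = values.select(&pred)
  end
end

result = dichotomize([1, 2, 3, 4], 2, :low, :high)
p result[:low]                        # => [1, 2]
p result[:high]                       # => [3, 4]
p [1, 2, 3].count { |e| e % 2 == 0 }  # => 1
```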
@@ -295,75 +332,83 @@ module Enumerable
   # expensive call, I'm going to cache it and offer a parameter to reset
   # the cache. So, call category_values(true) if you need to reset the
   # cache.
-  def
-    @
-    return @
+  def _jes_category_values(reset=false)
+    @_jes_category_values = nil if reset
+    return @_jes_category_values if @_jes_category_values
     container = defined?(Dictionary) ? Dictionary.new : Hash.new
     if self.range_hash
-      @
-        cont[cat] = self.find_all &self.
+      @_jes_category_values = self._jes_categories.inject(container) do |cont, cat|
+        cont[cat] = self.find_all &self._jes_range_hash[cat]
         cont
       end
     else
-      @
+      @_jes_category_values = self._jes_categories.inject(container) do |cont, cat|
         cont[cat] = self.find_all {|e| e == cat}
         cont
       end
     end
   end
+  safe_alias :_jes_category_values
 
   # When creating a range, what class will it be? Defaults to Range, but
   # other classes are sometimes useful.
-  def
-    @
+  def _jes_range_class
+    @_jes_range_class ||= Range
   end
+  safe_alias :_jes_range_class
 
   # Actually instantiates the range, instead of producing a min and max array.
-  def
-    if @
-      self.
+  def _jes_range_as_range(&block)
+    if @_jes_range_class_args and not @_jes_range_class_args.empty?
+      self._jes_range_class.new(*@_jes_range_class_args)
     else
-      self.
+      self._jes_range_class.new(_jes_min(&block), _jes_max(&block))
     end
   end
-
+  safe_alias :_jes_range_as_range
+  safe_alias :_jes_range_instance, :_jes_range_as_range
+  safe_alias :range_instance, :_jes_range_as_range
 
   # I don't pass the block to the sort, because a sort block needs to look
   # something like: {|x,y| x <=> y}. To get around this, set the default
   # block on the object.
-  def
+  def _jes_new_sort(&block)
     if block_given?
       map { |i| yield(i) }.sort.dup
-    elsif
-      map { |i|
+    elsif _jes_default_block
+      map { |i| _jes_default_block[*i] }.sort.dup
     else
       sort().dup
     end
   end
+  safe_alias :_jes_new_sort
 
-  #
-  def
+  # Ranks the values
+  def _jes_rank(&block)
 
-    sorted =
+    sorted = _jes_new_sort(&block)
+    # rank = map { |i| sorted.index(i) + 1 }
 
     if block_given?
       map { |i| sorted.index(yield(i)) + 1 }
-    elsif
-      map { |i|
+    elsif _jes_default_block
+      map { |i|
+        sorted.index(_jes_default_block[*i]) + 1 }
     else
       map { |i| sorted.index(i) + 1 }
     end
 
-  end
+  end
+  safe_alias :_jes_rank
 
   # Given values like [10,5,5,1]
   # Rank should produce something like [4,2,2,1]
   # And order should produce something like [4,2,3,1]
   # The trick is that rank skips as many as were duplicated, so there
   # could not be a 3 in the rank from the example above.
-  def
+  def _jes_order(&block)
     hold = []
-
+    _jes_rank(&block).each do |x|
       while hold.include?(x) do
         x += 1
       end
@@ -371,12 +416,13 @@ module Enumerable
     end
     hold
   end
+  safe_alias :_jes_order
 
   # First quartile: nth_split_by_m(1, 4)
   # Third quartile: nth_split_by_m(3, 4)
   # Median: nth_split_by_m(1, 2)
   # Doesn't match R, and it's silly to try to.
-  # def
+  # def _jes_nth_split_by_m(n, m)
   # sorted = new_sort
   # dividers = m - 1
   # if size % m == dividers # Divides evenly
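The comment on `_jes_order` explains the difference between rank and order using `[10,5,5,1]`. The sketch below reproduces that on a plain Array, without the block and default-block branches. The helper names are hypothetical. It also assumes that the line elided between hunks in `_jes_order` appends the adjusted `x` to `hold`, which is what the comment's expected output requires.

```ruby
# Rank: 1-based position of each value's first occurrence in the sorted
# list, so tied values share the lower rank (as in the diff's _jes_rank).
def rank_of(values)
  sorted = values.sort
  values.map { |v| sorted.index(v) + 1 }
end

# Order: walk the ranks and bump any rank already used to the next free slot.
def order_of(values)
  hold = []
  rank_of(values).each do |r|
    r += 1 while hold.include?(r)
    hold << r
  end
  hold
end

p rank_of([10, 5, 5, 1])   # => [4, 2, 2, 1]
p order_of([10, 5, 5, 1])  # => [4, 2, 3, 1]
```

Both `sorted.index` and `hold.include?` scan linearly, so this is quadratic in the list length. That is acceptable for the short lists these helpers target.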
@@ -391,91 +437,98 @@ module Enumerable
   # end
   # sorted[i] + ((n / m.to_f) * (sorted[j] - sorted[i]))
   # end
-  def
+  def _jes_quantile(&block)
     [
-
-
-
-
-
+      _jes_min(&block),
+      _jes_first_half(&block)._jes_median(0.25, &block),
+      _jes_median(&block),
+      _jes_second_half(&block)._jes_median(0.75, &block),
+      _jes_max(&block)
     ]
   end
+  safe_alias :_jes_quantile
 
   # The cummulative sum. Example:
   # [1,2,3].cum_sum # => [1, 3, 6]
-  def
-    sum =
-    obj = sorted ? self.
+  def _jes_cum_sum(sorted=false, &block)
+    sum = _jes_zero
+    obj = sorted ? self.sort : self
     if block_given?
       obj.map { |i| sum += yield(i) }
-    elsif
-      obj.map { |i| sum +=
+    elsif _jes_default_block
+      obj.map { |i| sum += _jes_default_block[*i] }
     else
       obj.map { |i| sum += i }
     end
   end
-
+  safe_alias :_jes_cum_sum
+  safe_alias :cumulative_sum, :_jes_cum_sum
 
   # The cummulative product. Example:
   # [1,2,3].cum_prod # => [1.0, 2.0, 6.0]
-  def
-    prod =
-    obj = sorted ? self.
+  def _jes_cum_prod(sorted=false, &block)
+    prod = _jes_one
+    obj = sorted ? self.sort : self
     if block_given?
       obj.map { |i| prod *= yield(i) }
-    elsif
-      obj.map { |i| prod *=
+    elsif _jes_default_block
+      obj.map { |i| prod *= _jes_default_block[*i] }
     else
       obj.map { |i| prod *= i }
     end
   end
-
+  safe_alias :_jes_cum_prod
+  safe_alias :cumulative_product, :_jes_cum_prod
 
   # Used to preprocess the list
-  def
+  def _jes_morph_list(&block)
     if block
       self.map{ |e| block.call(e) }
-    elsif self.
-      self.map{ |e| self.
+    elsif self._jes_default_block
+      self.map{ |e| self._jes_default_block.call(e) }
     else
       self
     end
   end
-  protected :
+  protected :_jes_morph_list
 
   # Example:
   # [1,2,3,0,5].cum_max # => [1,2,3,3,5]
-  def
-
-    found = (list | [e]).
+  def _jes_cum_max(&block)
+    _jes_morph_list(&block).inject([]) do |list, e|
+      found = (list | [e])._jes_max
       list << (found ? found : e)
     end
   end
-
+  safe_alias :_jes_cum_max
+  safe_alias :cumulative_max, :_jes_cum_max
 
   # Example:
   # [1,2,3,0,5].cum_min # => [1,1,1,0,0]
-  def
-
+  def _jes_cum_min(&block)
+    _jes_morph_list(&block).inject([]) do |list, e|
       found = (list | [e]).min
       list << (found ? found : e)
     end
   end
-
+  safe_alias :_jes_cum_min
+  safe_alias :cumulative_min, :_jes_cum_min
 
   # Multiplies the values:
   # >> product(1,2,3)
   # => 6.0
-  def
-    self.inject(
+  def _jes_product
+    self.inject(_jes_one) {|sum, a| sum *= a}
   end
+  safe_alias :_jes_product
 
   # There are going to be a lot more of these kinds of things, so pay
   # attention.
-  def
-    n = [self.size, other.size].
+  def _jes_to_pairs(other, &block)
+    n = [self.size, other.size]._jes_min
     (0...n).map {|i| block.call(self[i], other[i]) }
   end
+  safe_alias :_jes_to_pairs
 
   # Finds the tanimoto coefficient: the intersection set size / union set
   # size. This is used to find the distance between two vectors.
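The `cum_*` methods in this hunk return running aggregates. Below is a minimal sketch on plain Arrays. It uses hypothetical helper names and has no block or default-block support.

For the running max and min, the sketch carries the best value seen so far. The diff instead recomputes the max or min over `(list | [e])` at every step. Carrying the best value keeps the pass linear, and the results for the examples in the diff's comments are the same.

```ruby
# Running sum: element i is the sum of values[0..i].
def running_sum(values)
  total = 0
  values.map { |v| total += v }
end

# Running max/min: carry the best value seen so far.
def running_max(values)
  best = nil
  values.map { |v| best = (best.nil? || v > best) ? v : best }
end

def running_min(values)
  best = nil
  values.map { |v| best = (best.nil? || v < best) ? v : best }
end

p running_sum([1, 2, 3])        # => [1, 3, 6]
p running_max([1, 2, 3, 0, 5])  # => [1, 2, 3, 3, 5]
p running_min([1, 2, 3, 0, 5])  # => [1, 1, 1, 0, 0]
```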
@@ -483,10 +536,11 @@ module Enumerable
   # => 0.981980506061966
   # >> [1,2,3].tanimoto_pairs([2,3,5])
   # => 0.5
-  def
-
+  def _jes_tanimoto_pairs(other)
+    _jes_intersect(other).size / _jes_union(other).size.to_f
   end
-
+  safe_alias :_jes_tanimoto_pairs
+  safe_alias :tanimoto_correlation, :_jes_tanimoto_pairs
 
   # Sometimes it just helps to have things spelled out. These are all
   # part of the Array class. This means, you have methods that you can't
@@ -494,31 +548,35 @@ module Enumerable
 
   # All of the left and right hand sides, excluding duplicates.
   # "The union of x and y"
-  def
+  def _jes_union(other)
     self | other
   end
+  safe_alias :_jes_union
 
   # What's shared on the left and right hand sides
   # "The intersection of x and y"
-  def
+  def _jes_intersect(other)
     self & other
   end
+  safe_alias :_jes_intersect
 
   # Everything on the left hand side except what's shared on the right
   # hand side.
   # "The relative compliment of y in x"
-  def
+  def _jes_compliment(other)
     self - other
   end
+  safe_alias :_jes_compliment
 
   # Everything but what's shared
-  def
+  def _jes_exclusive_not(other)
     (self | other) - (self & other)
   end
+  safe_alias :_jes_exclusive_not
 
   # Finds the cartesian product, excluding duplicates items and self-
   # referential pairs. Yields the block value if given.
-  def
+  def _jes_cartesian_product(other, &block)
     x,y = self.uniq.dup, other.uniq.dup
     pairs = x.inject([]) do |cp, i|
       cp | y.map{|b| i == b ? nil : [i,b]}.compact
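The set helpers in this hunk are thin wrappers over Array operators. `_jes_tanimoto_pairs` is the intersection size divided by the union size. The snippet below writes them out with the operators directly, using the `[1,2,3]` and `[2,3,5]` lists from the `tanimoto_pairs` comment. The variable names are only for illustration.

```ruby
x = [1, 2, 3]
y = [2, 3, 5]

union         = x | y              # all elements, duplicates removed
intersection  = x & y              # elements present in both lists
complement    = x - y              # relative complement of y in x
exclusive_not = (x | y) - (x & y)  # symmetric difference
tanimoto      = intersection.size / union.size.to_f

p union          # => [1, 2, 3, 5]
p intersection   # => [2, 3]
p complement     # => [1]
p exclusive_not  # => [1, 5]
p tanimoto       # => 0.5
```

`|` and `&` drop duplicates, and `-` removes every occurrence of each element of the right-hand list. As a result, these behave as set operations on the distinct values.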
@@ -526,20 +584,23 @@ module Enumerable
     return pairs unless block_given?
     pairs.map{|p| yield p.first, p.last}
   end
-
-
+  safe_alias :_jes_cartesian_product
+  safe_alias :cp, :_jes_cartesian_product
+  safe_alias :permutations, :_jes_cartesian_product
 
   # Sigma of pairs. Returns a single float, or whatever object is sent in.
   # Example: [1,2,3].sigma_pairs([4,5,6], 0) {|x, y| x + y}
   # returns 21 instead of 21.0.
-  def
-    self.
+  def _jes_sigma_pairs(other, z=_jes_zero, &block)
+    self._jes_to_pairs(other,&block).inject(z) {|sum, i| sum += i}
   end
+  safe_alias :_jes_sigma_pairs
 
   # Returns the Euclidian distance between all points of a set of enumerables
-  def
-    Math.sqrt(self.
+  def _jes_euclidian_distance(other)
+    Math.sqrt(self._jes_sigma_pairs(other) {|a, b| (a - b) ** 2})
   end
+  safe_alias :_jes_euclidian_distance
 
   # Returns a random integer in the range for any number of lists. This
   # is a way to get a random vector that is tenable based on the sample
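Two helpers in this hunk are worth spelling out:

- `_jes_cartesian_product` de-duplicates both lists and pairs every element of one with every element of the other, skipping pairs whose two items are equal.
- `_jes_euclidian_distance` sums `(a - b) ** 2` over positional pairs via `_jes_sigma_pairs`, which stops at the shorter list, and then takes the square root.

Below is a sketch of both on plain Arrays, using hypothetical helper names.

```ruby
# Pairs each distinct element of xs with each distinct element of ys,
# dropping pairs whose two items are equal.
def distinct_pairs(xs, ys)
  x, y = xs.uniq, ys.uniq
  x.inject([]) do |acc, i|
    acc | y.map { |b| i == b ? nil : [i, b] }.compact
  end
end

# Euclidean distance over positional pairs, truncated to the shorter list.
def euclidean_distance(xs, ys)
  n = [xs.size, ys.size].min
  Math.sqrt((0...n).inject(0) { |s, i| s + (xs[i] - ys[i])**2 })
end

p distinct_pairs([1, 2], [2, 3])      # => [[1, 2], [1, 3], [2, 3]]
p euclidean_distance([0, 0], [3, 4])  # => 5.0
```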
@@ -552,24 +613,25 @@ module Enumerable
   # last place.
   # Works for integers. Rethink this for floats. May consider setting up
   # FixedRange for floats. O(n*5)
-  def
-    min = self.
-    max = self.
+  def _jes_rand_in_range(*args)
+    min = self._jes_min_of_lists(*args)
+    max = self._jes_max_of_lists(*args)
     (0...size).inject([]) do |ary, i|
       ary << rand_between(min[i], max[i])
     end
   end
+  safe_alias :_jes_rand_in_range
 
   # Finds the correlation between two enumerables.
   # Example: [1,2,3].cor [2,3,5]
   # returns 0.981980506061966
-  def
-    n = [self.size, other.size].
-    sum_of_products_of_pairs = self.
-    self_sum = self.
-    other_sum = other.
-    sum_of_squared_self_scores = self.
-    sum_of_squared_other_scores = other.
+  def _jes_correlation(other)
+    n = [self.size, other.size]._jes_min
+    sum_of_products_of_pairs = self._jes_sigma_pairs(other) {|a, b| a * b}
+    self_sum = self._jes_sum
+    other_sum = other._jes_sum
+    sum_of_squared_self_scores = self._jes_sum { |e| e * e }
+    sum_of_squared_other_scores = other._jes_sum { |e| e * e }
 
     numerator = (n * sum_of_products_of_pairs) - (self_sum * other_sum)
     self_denominator = ((n * sum_of_squared_self_scores) - (self_sum ** 2))
@@ -577,56 +639,65 @@ module Enumerable
     denominator = Math.sqrt(self_denominator * other_denominator)
     return numerator / denominator
   end
-
+  safe_alias :_jes_correlation
+  safe_alias :cor, :_jes_correlation
 
   # Transposes arrays of arrays and yields a block on the value.
   # The regular Array#transpose ignores blocks
-  def
+  def _jes_yield_transpose(*enums, &block)
     enums.unshift(self)
     n = enums.map{ |x| x.size}.min
     block ||= lambda{|e| e}
     (0...n).map { |i| block.call enums.map{ |x| x[i] } }
   end
+  safe_alias :_jes_yield_transpose
 
   # Returns the max of two or more enumerables.
   # >> [1,2,3].max_of_lists([0,5,6], [0,2,9])
   # => [1, 5, 9]
-  def
-
+  def _jes_max_of_lists(*enums)
+    _jes_yield_transpose(*enums) {|e| e._jes_max}
   end
+  safe_alias :_jes_max_of_lists
 
   # Returns the min of two or more enumerables.
   # >> [1,2,3].min_of_lists([4,5,6], [0,2,9])
   # => [0, 2, 3]
-  def
-
+  def _jes_min_of_lists(*enums)
+    _jes_yield_transpose(*enums) {|e| e.min}
   end
+  safe_alias :_jes_min_of_lists
 
   # Returns the covariance of two lists.
-  def
-    self.
-    other.
-    n = [self.size, other.size].
-    self_average = self.
-    other_average = other.
-    total_expected = self.
+  def _jes_covariance(other)
+    self._jes_to_f!
+    other._jes_to_f!
+    n = [self.size, other.size]._jes_min
+    self_average = self._jes_average
+    other_average = other._jes_average
+    total_expected = self._jes_sigma_pairs(other) {|a, b| (a - self_average) * (b - other_average)}
     total_expected / n
   end
+  safe_alias :_jes_covariance
 
   # The covariance / product of standard deviations
   # http://en.wikipedia.org/wiki/Correlation
-  def
-    self.
-    other.
-    denominator = self.
-    self.
+  def _jes_pearson_correlation(other)
+    self._jes_to_f!
+    other._jes_to_f!
+    denominator = self._jes_standard_deviation * other._jes_standard_deviation
+    self._jes_covariance(other) / denominator
   end
+  safe_alias :_jes_pearson_correlation
 
   # Some calculations have to have at least floating point numbers. This
   # generates a cached version of the operation--only runs once per object.
-  def
-    return true if @
-    @
+  def _jes_to_f!
+    return true if @_jes_to_f
+    @_jes_to_f = self.map! {|e| e.to_f}
   end
+  safe_alias :_jes_to_f!
 
-end
+end
+
+@a = [1,2,3]
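`_jes_correlation` computes Pearson's r from raw sums. Its comment gives `[1,2,3].cor [2,3,5]` as 0.981980506061966. The sketch below computes r two ways on plain Arrays converted to Float, using hypothetical helper names:

- from raw sums, as `_jes_correlation` does;
- as covariance over the product of standard deviations, with the same `n - 1` divisor for all three quantities.

Both give the value in the comment.

In the diff, `_jes_covariance` divides by `n`, while `_jes_standard_deviation` goes through `_jes_variance`, which divides by `size - 1`. Going by the lines shown, that makes `_jes_pearson_correlation` smaller than r by a factor of (n - 1) / n, about 0.655 instead of 0.982 for this example.

```ruby
# Pearson's r from raw sums, the formula _jes_correlation uses.
def pearson_from_sums(xs, ys)
  n = [xs.size, ys.size].min
  x = xs.first(n).map(&:to_f)
  y = ys.first(n).map(&:to_f)
  sum_xy = x.zip(y).inject(0.0) { |s, (a, b)| s + a * b }
  sum_x  = x.inject(0.0, :+)
  sum_y  = y.inject(0.0, :+)
  sum_x2 = x.inject(0.0) { |s, a| s + a * a }
  sum_y2 = y.inject(0.0) { |s, b| s + b * b }
  numerator   = n * sum_xy - sum_x * sum_y
  denominator = Math.sqrt((n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))
  numerator / denominator
end

# Pearson's r as covariance / (sd_x * sd_y), every term divided by (n - 1).
def pearson_from_covariance(xs, ys)
  n = [xs.size, ys.size].min
  x = xs.first(n).map(&:to_f)
  y = ys.first(n).map(&:to_f)
  mean_x = x.inject(0.0, :+) / n
  mean_y = y.inject(0.0, :+) / n
  cov  = x.zip(y).inject(0.0) { |s, (a, b)| s + (a - mean_x) * (b - mean_y) } / (n - 1)
  sd_x = Math.sqrt(x.inject(0.0) { |s, a| s + (a - mean_x)**2 } / (n - 1))
  sd_y = Math.sqrt(y.inject(0.0) { |s, b| s + (b - mean_y)**2 } / (n - 1))
  cov / (sd_x * sd_y)
end

p pearson_from_sums([1, 2, 3], [2, 3, 5])        # about 0.98198
p pearson_from_covariance([1, 2, 3], [2, 3, 5])  # about 0.98198
```

Unlike `_jes_to_f!`, which calls `map!` and so converts the caller's arrays in place, this sketch works on converted copies and leaves its inputs unchanged.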