prolly 0.0.1 → 0.0.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
-   metadata.gz: 301611a54f3a7af11acf53093b45721e7555b99c
-   data.tar.gz: 539ad41437a7e9063dcd8957b369110a7566d0ff
+   metadata.gz: 2c026ea02e8ac2de33f7278ac2fe87e3d8864cf0
+   data.tar.gz: 7f9c5e4186921a36dcf74fe7662aa1b58551fd3e
  SHA512:
-   metadata.gz: 1b31d314231261ffa961d69fe0559a5c5f59822430eb33cfc659a16c82fcb7142e728d64cbb02f4497e96948f61aed7bdacfb5949c656393d0ec01cfaf1ada2c
-   data.tar.gz: e08c9a4b435449ed146f31c0e829ae3eb8b26064b50a025850d432ddb96d2d02ef5a88736c4f4da77483616ca4e63875036a92585e5ef2f35d9d8f353e9c6ed4
+   metadata.gz: 0bdfe8480833e3f61552ae656763f24acff8e8ca331e4c3d5e452286ed29f0b1e537067730fa1df043beec4e680c04c530f0151f0d025c282c837dceedf678f3
+   data.tar.gz: 6b5fcc2c664c0c18c123c0bc8c81bf109623293732696a886db3880212ee1f38898343f463b2e322180bb4f860f3b3d5d4cc959c179d2c63800dfd9e4bbf09a2
@@ -6,9 +6,10 @@ specifically for answering questions about probabilities of events based on the
6
6
  samples you've seen before.
7
7
 
8
8
  So instead of counting all the events yourself, you just express
9
- probabilities much like how math books express it. Being able to express
10
- probabilities is useful for writing machine learning algorithms at a higher level
11
- of abstraction. The right level of abstraction makes things easier to build.
9
+ probabilities, entropies, and information gain much like how math books express it.
10
+ Being able to express probabilities is useful for writing machine learning
11
+ algorithms at a higher level of abstraction. The right level of abstraction makes things
12
+ easier to build.
12
13
 
13
14
  We can now make decisions in code not just based on the current data, like `if`
14
15
  statements do, but we can make decisions based on the chance of prior data and
@@ -90,17 +91,99 @@ Ps.rv(color: :blue).given(size: :small).prob
90
91
  ```
91
92
  That gives you the probability that the random variable Color is :blue given that Size is :small.
92
93
 
93
- ### Random Variables and Operations
94
+ ### Probabilities
95
+
96
+ What is the probability there is a blue marble?
97
+ ```ruby
98
+ # P(C = blue)
99
+ Ps.rv(color: :blue).prob
100
+ ```
101
+
102
+ What is the joint probability there is a blue marble that also has a rough texture?
103
+ ```ruby
104
+ # P(C = blue, T = rough)
105
+ Ps.rv(color: :blue, texture: :rough).prob
106
+ ```
107
+
108
+ What is the probability a marble is small or med sized?
109
+ ```ruby
110
+ # P(S = small, med)
111
+ Ps.rv(size: [:small, :med]).prob
112
+ ```
113
+
114
+ What is the probability of a blue marble given that the marble is small?
115
+ ```ruby
116
+ # P(C = blue | S = small)
117
+ Ps.rv(color: :blue).given(size: :small).prob
118
+ ```
119
+
120
+ What is the probability of a blue marble and rough texture given that the marble is small?
121
+ ```ruby
122
+ # P(C = blue, T = rough | S = small)
123
+ Ps.rv(color: :blue, texture: :rough).given(size: :small).prob
124
+ ```
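Each query above boils down to a ratio of counts. The following is a minimal plain-Ruby sketch of that idea, not Prolly's implementation; the `marbles` sample and the `count_where` and `prob_of` helpers are hypothetical names made up for illustration.

```ruby
# Hypothetical sample data; each hash is one observed marble.
marbles = [
  { color: :blue,  size: :small, texture: :rough },
  { color: :blue,  size: :med,   texture: :smooth },
  { color: :green, size: :small, texture: :rough },
  { color: :blue,  size: :small, texture: :smooth }
]

# Count the samples whose value for every key is one of the wanted values.
# A single wanted value and an array of wanted values are both accepted.
def count_where(samples, conditions)
  samples.count do |sample|
    conditions.all? { |key, wanted| Array(wanted).include?(sample[key]) }
  end
end

# P(rv | given) = count(rv and given) / count(given).
# With an empty `given`, the denominator is simply the number of samples.
def prob_of(samples, rv, given = {})
  denominator = count_where(samples, given)
  return 0.0 if denominator.zero?

  count_where(samples, rv.merge(given)).to_f / denominator
end

p_blue             = prob_of(marbles, { color: :blue })                   # P(C = blue)
p_blue_given_small = prob_of(marbles, { color: :blue }, { size: :small }) # P(C = blue | S = small)
p_small_or_med     = prob_of(marbles, { size: [:small, :med] })           # P(S = small or med)
```

Returning `0.0` when nothing matches the condition is a choice made for this sketch; the README does not say how Prolly handles that case.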
125
+
126
+ ### Probability density functions
127
+
128
+ Probability density for a random variable.
129
+ ```ruby
130
+ Ps.rv(:color).pdf
131
+ ```
132
+
133
+ Probability density for a conditional random variable.
134
+ ```ruby
135
+ Ps.rv(:color).given(size: :small).pdf
136
+ ```
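The README describes `.pdf` as a hash of probabilities for the random variable. One way to picture such a hash, as a plain-Ruby sketch with a made-up `marbles` sample and a hypothetical `pdf_of` helper (Prolly's actual return format may differ):

```ruby
# Hypothetical sample data; each hash is one observed marble.
marbles = [
  { color: :blue,  size: :small },
  { color: :blue,  size: :med },
  { color: :green, size: :small },
  { color: :blue,  size: :small }
]

# Relative frequency of each observed value of rv_name, restricted to the
# samples that match every key/value pair in `given`.
def pdf_of(samples, rv_name, given = {})
  matching = samples.select { |s| given.all? { |key, value| s[key] == value } }
  counts = Hash.new(0)
  matching.each { |s| counts[s[rv_name]] += 1 }
  counts.map { |value, n| [value, n.to_f / matching.size] }.to_h
end

color_pdf       = pdf_of(marbles, :color)                   # distribution of color
small_color_pdf = pdf_of(marbles, :color, { size: :small }) # distribution of color given S = small
```

The values of each resulting hash sum to 1, which is what lets the entropy and information gain operators build on it.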
137
+
138
+ ### Entropy
139
+
140
+ Entropy of the RV color.
141
+ ```ruby
142
+ # H(C)
143
+ Ps.rv(:color).entropy
144
+ ```
145
+
146
+ Entropy of color given the marble is small
147
+ ```ruby
148
+ # H(C | S = small)
149
+ Ps.rv(:color).given(size: :small).entropy
150
+ ```
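For intuition, entropy can be computed directly from a probability hash like the one `.pdf` is described as returning. This is a sketch under assumptions: the `shannon_entropy` helper is a hypothetical name, and the base-2 logarithm is assumed because the README does not state which base Prolly uses.

```ruby
# Shannon entropy in bits: H = -sum(p * log2(p)) over all outcomes.
# Zero-probability outcomes are skipped because they contribute nothing.
def shannon_entropy(pdf)
  pdf.values.reject(&:zero?).sum { |p| -p * Math.log2(p) }
end

smooth = shannon_entropy({ blue: 0.5, green: 0.5 }) # evenly spread: maximal for two outcomes
spiky  = shannon_entropy({ blue: 1.0, green: 0.0 }) # all mass on one outcome: no uncertainty
```

A flat distribution scores high and a spiky one scores low, which matches the "spikiness or smoothness" description in the reference section.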
151
+
152
+ ### Information Gain
153
+
154
+ Information gain of color and size.
155
+ ```ruby
156
+ # IG(C | S)
157
+ Ps.rv(:color).given(:size).infogain
158
+ ```
159
+
160
+ Information gain of color and size, when we already know texture and opacity.
161
+ ```ruby
162
+ # IG(C | S, T=smooth, O=opaque)
163
+ Ps.rv(:color).given(:size, { texture: :smooth, opacity: :opaque }).infogain
164
+ ```
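Information gain is commonly defined as the entropy of a variable minus its expected entropy once the other variable is known: IG(C | S) = H(C) - Σ P(S = s) · H(C | S = s). The sketch below follows that textbook definition in plain Ruby; the sample data and the helper names (`distribution`, `entropy_of`, `infogain_of`) are hypothetical, base-2 logarithms are assumed, and none of it is taken from Prolly's source.

```ruby
# Hypothetical sample in which size fully determines color.
marbles = [
  { color: :blue,  size: :small },
  { color: :blue,  size: :small },
  { color: :green, size: :large },
  { color: :green, size: :large }
]

# Relative frequency of each value of rv_name.
def distribution(samples, rv_name)
  counts = Hash.new(0)
  samples.each { |s| counts[s[rv_name]] += 1 }
  counts.map { |value, n| [value, n.to_f / samples.size] }.to_h
end

# H(rv) in bits.
def entropy_of(samples, rv_name)
  distribution(samples, rv_name).values.sum { |p| p.zero? ? 0.0 : -p * Math.log2(p) }
end

# IG(rv | given) = H(rv) - sum over s of P(given = s) * H(rv | given = s).
def infogain_of(samples, rv_name, given_name)
  expected_conditional = samples.group_by { |s| s[given_name] }.sum do |_value, subset|
    (subset.size.to_f / samples.size) * entropy_of(subset, rv_name)
  end
  entropy_of(samples, rv_name) - expected_conditional
end

gain = infogain_of(marbles, :color, :size)
```

Conditioning on values that are already known, as in the second example above, would amount to filtering the samples down to those values before applying the same calculation.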
165
+
166
+ ### Counts
167
+
168
+ At the base of all the probabilities are counts of stuff.
169
+ ```ruby
170
+ Ps.rv(color: :blue).count
171
+ ```
172
+
173
+ ```ruby
174
+ Ps.rv(:color).given(:size).count
175
+ ```
176
+ ## Full Reference
94
177
 
95
178
  A random variable can be specified `Ps.rv(color: :blue)` or unspecified `Ps.rv(:color)`. So too can conditional random variables be specified or unspecified.
96
179
 
97
180
  Prolly currently supports five operations.
98
181
 
99
- - .prob · Calculates probability, a fractional number representing the belief you have that an event will occur; based on the amount of evidence you've seen for that event.
100
- - .pdf · Calculates probability density function, a hash of all possible probabilities for the random variable.
101
- - .entropy · Calculates entropy, a fractional number representing the spikiness or smoothness of a density function, which implies how much information is in the random variable.
102
- - .infogain · Calculates information gain, a fractional number representing the amount of information (that is, reduction in uncertainty) that knowing either variable provides about the other.
103
- - .count · Counts the number of events satisfying the conditions.
182
+ - .prob() · Calculates probability, a fractional number representing the belief you have that an event will occur; based on the amount of evidence you've seen for that event.
183
+ - .pdf() · Calculates probability density function, a hash of all possible probabilities for the random variable.
184
+ - .entropy() · Calculates entropy, a fractional number representing the spikiness or smoothness of a density function, which implies how much information is in the random variable.
185
+ - .infogain() · Calculates information gain, a fractional number representing the amount of information (that is, reduction in uncertainty) that knowing either variable provides about the other.
186
+ - .count() · Counts the number of events satisfying the conditions.
104
187
 
105
188
  Each operation works only with certain combinations of random variables. The supported combinations are listed below, and Prolly will raise an exception for any other combination.
106
189
 
@@ -108,271 +191,301 @@ Legend:
108
191
  - ✓ available for this operator
109
192
  - Δ! available, but not yet implemented for this operator.
110
193
 
194
+ ### The Probability Operator: .prob()
195
+
111
196
  <table>
112
197
  <tr>
113
- <th>RandVar</th>
114
- <th>Given</th>
115
- <th>.prob</th>
116
- <th>.pdf</th>
117
- <th>.entropy</th>
118
- <th>.infogain</th>
119
- <th>.count</th>
198
+ <th></th>
199
+ <th>n/a</th>
200
+ <th>.given(:size)</th>
201
+ <th>.given(size: :small)</th>
202
+ <th>.given(size: :small, weight: :fat)</th>
203
+ <th>.given(:size, weight: :fat)</th>
204
+ <th>.given(:size, :weight)</th>
120
205
  </tr>
121
206
  <tr>
122
- <th>Ps.rv(color: :blue)</th>
207
+ <th>rv(color: :blue)</th>
208
+ <th>&#10003;</th>
209
+ <th>&#10003;</th>
210
+ <th>&#10003;</th>
211
+ <th>&#10003;</th>
123
212
  <th></th>
213
+ <th></th>
214
+ </tr>
215
+ <tr>
216
+ <th>rv(color: [:blue, :green])</th>
124
217
  <th>&#10003;</th>
125
218
  <th></th>
126
219
  <th></th>
127
220
  <th></th>
128
- <th>&#10003;</th>
221
+ <th></th>
222
+ <th></th>
129
223
  </tr>
130
224
  <tr>
131
- <th>Ps.rv(color: :blue)</th>
132
- <th>.given(:size)</th>
225
+ <th>rv(color: :blue, texture: :rough)</th>
226
+ <th>&#10003;</th>
133
227
  <th>&#10003;</th>
228
+ <th>&#10003;</th>
229
+ <th>&#10003;</th>
230
+ <th></th>
231
+ <th></th>
232
+ </tr>
233
+ <tr>
234
+ <th>rv(:color)</th>
235
+ <th></th>
236
+ <th></th>
237
+ <th></th>
134
238
  <th></th>
135
239
  <th></th>
136
240
  <th></th>
137
- <th>&#10003;</th>
138
241
  </tr>
139
242
  <tr>
140
- <th>Ps.rv(color: :blue)</th>
141
- <th>.given(size: :small)</th>
142
- <th>&#10003;</th>
243
+ <th>rv(:color, :texture)</th>
244
+ <th></th>
245
+ <th></th>
246
+ <th></th>
143
247
  <th></th>
144
248
  <th></th>
145
249
  <th></th>
146
- <th>&#10003;</th>
147
250
  </tr>
251
+ </table>
252
+
253
+ ### The Probability Density Function Operator: .pdf()
254
+
255
+ <table>
148
256
  <tr>
149
- <th>Ps.rv(color: :blue)</th>
257
+ <th></th>
258
+ <th>n/a</th>
259
+ <th>.given(:size)</th>
260
+ <th>.given(size: :small)</th>
150
261
  <th>.given(size: :small, weight: :fat)</th>
151
- <th>&#10003;</th>
262
+ <th>.given(:size, weight: :fat)</th>
263
+ <th>.given(:size, :weight)</th>
264
+ </tr>
265
+ <tr>
266
+ <th>rv(color: :blue)</th>
267
+ <th></th>
268
+ <th></th>
269
+ <th></th>
152
270
  <th></th>
153
271
  <th></th>
154
272
  <th></th>
155
- <th>&#10003;</th>
156
273
  </tr>
157
- <tr>
158
- <th>Ps.rv(color: [:blue, :green])</th>
159
- <th></th>
160
- <th>&#10003;</th>
161
- <th></th>
162
- <th></th>
163
- <th></th>
164
- <th>&#10003;</th>
165
- </tr>
166
274
  <tr>
167
- <th>Ps.rv(color: :blue, texture: :rough)</th>
275
+ <th>rv(color: [:blue, :green])</th>
276
+ <th></th>
277
+ <th></th>
168
278
  <th></th>
169
- <th>&#10003;</th>
170
279
  <th></th>
171
280
  <th></th>
172
281
  <th></th>
173
- <th>&#10003;</th>
174
282
  </tr>
175
283
  <tr>
176
- <th>Ps.rv(color: :blue, texture: :rough)</th>
177
- <th>.given(:size)</th>
178
- <th>&#10003;</th>
284
+ <th>rv(color: :blue, texture: :rough)</th>
285
+ <th></th>
286
+ <th></th>
287
+ <th></th>
179
288
  <th></th>
180
289
  <th></th>
181
290
  <th></th>
182
- <th>&#10003;</th>
183
291
  </tr>
184
292
  <tr>
185
- <th>Ps.rv(color: :blue, texture: :rough)</th>
186
- <th>.given(size: :small)</th>
293
+ <th>rv(:color)</th>
294
+ <th>&#10003;</th>
295
+ <th>&#10003;</th>
296
+ <th>&#10003;</th>
187
297
  <th>&#10003;</th>
188
298
  <th></th>
189
299
  <th></th>
300
+ </tr>
301
+ <tr>
302
+ <th>rv(:color, :texture)</th>
303
+ <th>&Delta;!</th>
304
+ <th>&Delta;!</th>
305
+ <th>&Delta;!</th>
306
+ <th>&Delta;!</th>
307
+ <th>&Delta;!</th>
190
308
  <th></th>
191
- <th>&#10003;</th>
192
309
  </tr>
310
+ </table>
311
+
312
+ ### The Entropy Operator: .entropy()
313
+
314
+ <table>
193
315
  <tr>
194
- <th>Ps.rv(color: :blue, texture: :rough)</th>
316
+ <th></th>
317
+ <th>n/a</th>
318
+ <th>.given(:size)</th>
319
+ <th>.given(size: :small)</th>
195
320
  <th>.given(size: :small, weight: :fat)</th>
196
- <th>&#10003;</th>
321
+ <th>.given(:size, weight: :fat)</th>
322
+ <th>.given(:size, :weight)</th>
323
+ </tr>
324
+ <tr>
325
+ <th>rv(color: :blue)</th>
326
+ <th></th>
327
+ <th></th>
328
+ <th></th>
197
329
  <th></th>
198
330
  <th></th>
199
331
  <th></th>
200
- <th>&#10003;</th>
201
332
  </tr>
202
333
  <tr>
203
- <th>Ps.rv(:color)</th>
334
+ <th>rv(color: [:blue, :green])</th>
335
+ <th></th>
336
+ <th></th>
337
+ <th></th>
204
338
  <th></th>
205
339
  <th></th>
206
- <th>&#10003;</th>
207
- <th>&#10003;</th>
208
340
  <th></th>
209
- <th>&#10003;</th>
210
341
  </tr>
211
342
  <tr>
212
- <th>Ps.rv(:color)</th>
213
- <th>.given(:size)</th>
343
+ <th>rv(color: :blue, texture: :rough)</th>
344
+ <th></th>
345
+ <th></th>
346
+ <th></th>
347
+ <th></th>
214
348
  <th></th>
349
+ <th></th>
350
+ </tr>
351
+ <tr>
352
+ <th>rv(:color)</th>
215
353
  <th>&#10003;</th>
216
354
  <th>&#10003;</th>
217
355
  <th>&#10003;</th>
218
356
  <th>&#10003;</th>
357
+ <th>&#10003;</th>
358
+ <th></th>
219
359
  </tr>
220
360
  <tr>
221
- <th>Ps.rv(:color)</th>
222
- <th>.given(size: :small)</th>
223
- <th></th>
361
+ <th>rv(:color, :texture)</th>
362
+ <th>&#10003;</th>
363
+ <th>&Delta;!</th>
224
364
  <th>&#10003;</th>
365
+ <th>&Delta;!</th>
225
366
  <th>&#10003;</th>
226
367
  <th></th>
227
- <th>&#10003;</th>
228
368
  </tr>
369
+ </table>
370
+
371
+ ### The Information Gain Operator: .infogain()
372
+
373
+ <table>
229
374
  <tr>
230
- <th>Ps.rv(:color)</th>
375
+ <th></th>
376
+ <th>n/a</th>
377
+ <th>.given(:size)</th>
378
+ <th>.given(size: :small)</th>
231
379
  <th>.given(size: :small, weight: :fat)</th>
380
+ <th>.given(:size, weight: :fat)</th>
381
+ <th>.given(:size, :weight)</th>
382
+ </tr>
383
+ <tr>
384
+ <th>rv(color: :blue)</th>
385
+ <th></th>
386
+ <th></th>
387
+ <th></th>
388
+ <th></th>
232
389
  <th></th>
233
- <th>&#10003;</th>
234
- <th>&#10003;</th>
235
390
  <th></th>
236
- <th>&#10003;</th>
237
391
  </tr>
238
392
  <tr>
239
- <th>Ps.rv(:color)</th>
240
- <th>.given(:size, weight: :fat)</th>
393
+ <th>rv(color: [:blue, :green])</th>
394
+ <th></th>
395
+ <th></th>
396
+ <th></th>
397
+ <th></th>
241
398
  <th></th>
242
399
  <th></th>
243
- <th>&#10003;</th>
244
- <th>&#10003;</th>
245
- <th>&#10003;</th>
246
400
  </tr>
247
401
  <tr>
248
- <th>Ps.rv(:color, :texture)</th>
402
+ <th>rv(color: :blue, texture: :rough)</th>
249
403
  <th></th>
250
404
  <th></th>
251
- <th>&Delta;!</th>
405
+ <th></th>
406
+ <th></th>
407
+ <th></th>
408
+ <th></th>
409
+ </tr>
410
+ <tr>
411
+ <th>rv(:color)</th>
412
+ <th></th>
252
413
  <th>&#10003;</th>
253
414
  <th></th>
415
+ <th></th>
254
416
  <th>&#10003;</th>
417
+ <th></th>
255
418
  </tr>
256
419
  <tr>
257
- <th>Ps.rv(:color, :texture)</th>
258
- <th>.given(:size)</th>
420
+ <th>rv(:color, :texture)</th>
421
+ <th></th>
422
+ <th></th>
423
+ <th></th>
424
+ <th></th>
259
425
  <th></th>
260
- <th>&Delta;!</th>
261
- <th>&Delta;!</th>
262
426
  <th></th>
263
- <th>&#10003;</th>
264
427
  </tr>
428
+ </table>
429
+
430
+ ### The Count Operator: .count()
431
+
432
+ <table>
265
433
  <tr>
266
- <th>Ps.rv(:color, :texture)</th>
267
- <th>.given(size: :small)</th>
268
434
  <th></th>
269
- <th>&Delta;!</th>
435
+ <th>n/a</th>
436
+ <th>.given(:size)</th>
437
+ <th>.given(size: :small)</th>
438
+ <th>.given(size: :small, weight: :fat)</th>
439
+ <th>.given(:size, weight: :fat)</th>
440
+ <th>.given(:size, :weight)</th>
441
+ </tr>
442
+ <tr>
443
+ <th>rv(color: :blue)</th>
444
+ <th>&#10003;</th>
445
+ <th>&#10003;</th>
446
+ <th>&#10003;</th>
447
+ <th>&#10003;</th>
270
448
  <th>&#10003;</th>
271
- <th></th>
272
449
  <th>&#10003;</th>
273
450
  </tr>
274
451
  <tr>
275
- <th>Ps.rv(:color, :texture)</th>
276
- <th>.given(size: :small, weight: :fat)</th>
277
- <th></th>
278
- <th>&Delta;!</th>
279
- <th>&Delta;!</th>
280
- <th></th>
452
+ <th>rv(color: [:blue, :green])</th>
453
+ <th>&#10003;</th>
454
+ <th>&#10003;</th>
455
+ <th>&#10003;</th>
456
+ <th>&#10003;</th>
457
+ <th>&#10003;</th>
281
458
  <th>&#10003;</th>
282
459
  </tr>
283
460
  <tr>
284
- <th>Ps.rv(:color, :texture)</th>
285
- <th>.given(:size, weight: :fat)</th>
286
- <th></th>
287
- <th>&Delta;!</th>
461
+ <th>rv(color: :blue, texture: :rough)</th>
462
+ <th>&#10003;</th>
463
+ <th>&#10003;</th>
464
+ <th>&#10003;</th>
465
+ <th>&#10003;</th>
466
+ <th>&#10003;</th>
467
+ <th>&#10003;</th>
468
+ </tr>
469
+ <tr>
470
+ <th>rv(:color)</th>
471
+ <th>&#10003;</th>
472
+ <th>&#10003;</th>
473
+ <th>&#10003;</th>
474
+ <th>&#10003;</th>
475
+ <th>&#10003;</th>
476
+ <th>&#10003;</th>
477
+ </tr>
478
+ <tr>
479
+ <th>rv(:color, :texture)</th>
480
+ <th>&#10003;</th>
481
+ <th>&#10003;</th>
482
+ <th>&#10003;</th>
483
+ <th>&#10003;</th>
288
484
  <th>&#10003;</th>
289
- <th></th>
290
485
  <th>&#10003;</th>
291
486
  </tr>
292
487
  </table>
293
488
 
294
- ## Examples
295
-
296
- There are examples of using Prolly to write learning algorithms.
297
-
298
- - [Decision Tree](https://github.com/iamwilhelm/prolly/tree/master/examples/decision_tree)
299
-
300
- ### Probabilities
301
-
302
- What is the probability there is a blue marble?
303
- ```ruby
304
- # P(C = blue)
305
- Ps.rv(color: :blue).prob
306
- ```
307
-
308
- What is the joint probability there is a blue marble that also has a rough texture?
309
- ```ruby
310
- # P(C = blue, T = rough)
311
- Ps.rv(color: :blue, texture: :rough).prob
312
- ```
313
-
314
- What is the probability a marble is small or med sized?
315
- ```ruby
316
- # P(S = small, med)
317
- Ps.rv(size: [:small, :med]).prob
318
- ```
319
-
320
- What is the probability of a blue marble given that the marble is small?
321
- ```ruby
322
- # P(C = blue | S = small)
323
- Ps.rv(color: :blue).given(size: :small).prob
324
- ```
325
-
326
- What is the probability of a blue marble and rough texture given that the marble is small?
327
- ```ruby
328
- # P(C = blue, T = rough | S = small)
329
- Ps.rv(color: :blue, texture: :rough).given(size: :small).prob
330
- ```
331
-
332
- ### Probability density functions
333
-
334
- Probability density for a random variable.
335
- ```ruby
336
- Ps.rv(:color).pdf
337
- ```
338
-
339
- Probability density for a conditional random variable.
340
- ```ruby
341
- Ps.rv(:color).given(size: :small).pdf
342
- ```
343
-
344
- ### Entropy
345
-
346
- Entropy of the RV color.
347
- ```ruby
348
- # H(C)
349
- Ps.rv(:color).entropy
350
- ```
351
-
352
- Entropy of color given the marble is small
353
- ```ruby
354
- # H(C | S = small)
355
- Ps.rv(:color).given(size: :small).entropy
356
- ```
357
-
358
- ### Information Gain
359
-
360
- Information gain of color and size.
361
- ```ruby
362
- # IG(C | S)
363
- Ps.rv(:color).given(:size).infogain
364
- ```
365
- ### Counts
366
-
367
- At the base of all the probabilities are counts of stuff.
368
- ```ruby
369
- Ps.rv(color: :blue).count
370
- ```
371
-
372
- ```ruby
373
- Ps.rv(:color).given(:size).count
374
- ```
375
-
376
489
  ## Stores
377
490
 
378
491
  Prolly can use different stores to remember the prior event data from which it
@@ -60,9 +60,9 @@ module Prolly
 
  def_delegators :@storage, :reset, :add, :count, :rand_vars, :uniq_vals, :import
 
- def initialize
-   #@storage = Storage::Rubylist.new()
-   @storage = Storage::Mongodb.new()
+ def initialize(storage = nil)
+   #@storage = Storage::Mongodb.new()
+   @storage = Storage::Rubylist.new()
    #@storage = Storage::Redis.new()
  end
 
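Note that the hunk above adds a `storage` parameter but still assigns `Storage::Rubylist.new()` unconditionally, so the argument is not used in this release. Below is a minimal sketch of how an injected store could be honored while keeping a default; `InMemoryStore` and `SampleSpace` are hypothetical stand-ins, not Prolly's classes.

```ruby
require "forwardable"

# Hypothetical stand-in for a storage backend.
class InMemoryStore
  def initialize
    @data = []
  end

  def add(datum)
    @data << datum
  end

  def count
    @data.size
  end
end

# Hypothetical owner class that delegates to whichever store it was given.
class SampleSpace
  extend Forwardable

  def_delegators :@storage, :add, :count

  # Use the injected store when one is passed; otherwise fall back to a default.
  def initialize(storage = nil)
    @storage = storage || InMemoryStore.new
  end
end

shared_store = InMemoryStore.new
space = SampleSpace.new(shared_store)
space.add({ color: :blue })
```

Passing the store in this way would let callers switch backends without editing the class, which is presumably what the new parameter is heading toward.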
@@ -22,18 +22,16 @@ module Prolly
 
  def count(rvs, options = {})
    reload = options[:reload] || false
-   start_time = Time.now
    if rvs.kind_of?(Array)
      value = @data.count { |e| rvs.all? { |rv| e.has_key?(rv) } }
    elsif rvs.kind_of?(Hash)
      value = @data.count { |e|
        rvs.map { |rkey, rval|
          vals = rval.kind_of?(Array) ? rval : [rval]
-         vals.include?(e[rkey]) == rval
+         vals.include?(e[rkey])
        }.all?
      }
    end
-   elapsed = Time.now - start_time
    return value
  end
 
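The one-line change to the Hash branch matters because `include?` returns `true` or `false`, and comparing that boolean with `rval` (a symbol or an array) is false for ordinary values, so the 0.0.1 predicate matches nothing. The comparison below runs both predicates, copied from the hunk, against a small made-up data set; the `data` and `rvs` values are hypothetical.

```ruby
# Hypothetical events and query.
data = [
  { color: :blue,  size: :small },
  { color: :green, size: :small },
  { color: :blue,  size: :large }
]
rvs = { color: :blue }

# Predicate as written in 0.0.1: compares a boolean with the wanted value.
old_count = data.count { |e|
  rvs.map { |rkey, rval|
    vals = rval.kind_of?(Array) ? rval : [rval]
    vals.include?(e[rkey]) == rval
  }.all?
}

# Predicate as written in 0.0.2: uses the membership test directly.
new_count = data.count { |e|
  rvs.map { |rkey, rval|
    vals = rval.kind_of?(Array) ? rval : [rval]
    vals.include?(e[rkey])
  }.all?
}
```

With this data the old predicate counts no events, while the new one counts the two blue marbles.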
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: prolly
  version: !ruby/object:Gem::Version
-   version: 0.0.1
+   version: 0.0.2
  platform: ruby
  authors:
  - Wil Chung