prolly 0.0.1

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: 301611a54f3a7af11acf53093b45721e7555b99c
4
+ data.tar.gz: 539ad41437a7e9063dcd8957b369110a7566d0ff
5
+ SHA512:
6
+ metadata.gz: 1b31d314231261ffa961d69fe0559a5c5f59822430eb33cfc659a16c82fcb7142e728d64cbb02f4497e96948f61aed7bdacfb5949c656393d0ec01cfaf1ada2c
7
+ data.tar.gz: e08c9a4b435449ed146f31c0e829ae3eb8b26064b50a025850d432ddb96d2d02ef5a88736c4f4da77483616ca4e63875036a92585e5ef2f35d9d8f353e9c6ed4
data/LICENSE ADDED
@@ -0,0 +1,22 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) <year> <copyright holders>
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
22
+
data/README.markdown ADDED
@@ -0,0 +1,446 @@
1
+ # Prolly
2
+
3
+ **Prolly is a Domain Specific Language (DSL) for expressing probabilities in code.**
4
+ Just like a database has a query language (SQL), this is a query language
5
+ specifically for answering questions about probabilities of events based on the
6
+ samples you've seen before.
7
+
8
+ So instead of counting all the events yourself, you just express
9
+ probabilities much like how math books express it. Being able to express
10
+ probabilities is useful for writing machine learning algorithms at a higher level
11
+ of abstraction. The right level of abstraction makes things easier to build.
12
+
13
+ We can now make decisions in code based not just on the current data, like `if`
14
+ statements do, but we can make decisions based on the chance of prior data and
15
+ the current data, and that makes for smarter software.
16
+
17
+ ## What can I use this for?
18
+
19
+ There are examples of using Prolly to write learning algorithms.
20
+
21
+ - [Decision Tree](https://github.com/iamwilhelm/prolly/tree/master/examples/decision_tree)
22
+
23
+
24
+ ## Quick intro
25
+
26
+ Prolly makes it easy to express probabilities from data. It can also calculate
27
+ entropies of random variables as well as the information gain.
28
+
29
+ Here's how to express Bayes Rule in Prolly:
30
+
31
+ ```
32
+ Ps.rv(color: :blue).given(size: :red).prob * Ps.rv(size: :red).prob /
33
+ Ps.rv(color: :blue).prob
34
+ ```
35
+
36
+ The expression above calculates P(Size = red | Color = blue).
37
+
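Bayes' rule can be checked numerically by counting samples directly. The sketch below does not use Prolly at all: the sample data and the helper names `p_of` and `p_given` are assumptions made for illustration, and it only shows that the product on the right-hand side agrees with the conditional probability estimated straight from the counts.

```ruby
# Toy samples; the values are illustrative, not taken from a real dataset.
SAMPLES = [
  { color: :blue,  size: :small },
  { color: :blue,  size: :big },
  { color: :blue,  size: :big },
  { color: :green, size: :big },
  { color: :green, size: :small }
].freeze

# Hypothetical helper: fraction of rows whose fields equal everything in spec.
def p_of(rows, spec)
  return 0.0 if rows.empty?
  rows.count { |row| spec.all? { |k, v| row[k] == v } }.to_f / rows.size
end

# Hypothetical helper: P(spec | given), estimated on the rows that match given.
def p_given(rows, spec, given)
  p_of(rows.select { |row| given.all? { |k, v| row[k] == v } }, spec)
end

likelihood = p_given(SAMPLES, { color: :blue }, { size: :small }) # P(C=blue | S=small)
prior      = p_of(SAMPLES, { size: :small })                      # P(S=small)
evidence   = p_of(SAMPLES, { color: :blue })                      # P(C=blue)

via_bayes = likelihood * prior / evidence
direct    = p_given(SAMPLES, { size: :small }, { color: :blue })  # P(S=small | C=blue)

puts "Bayes: #{via_bayes}, direct: #{direct}"
```

Both numbers estimate P(S=small | C=blue); they agree up to floating-point rounding.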
38
+ ## Installing
39
+
40
+ Install with RubyGems:
41
+
42
+ `gem install prolly`
43
+
44
+ If you use Bundler, just add it to your Gemfile, and then run `bundle install`
45
+
46
+ ## Usage
47
+
48
+ We first add samples of observable events to be able to estimate the probability of the events we've seen. Then we can query it with Prolly to know the probability of different events.
49
+
50
+ ### Adding samples
51
+
52
+ Now we add the samples of data that we've observed for the random variable. Presumably, we have a large
53
+ enough dataset that we can reasonably estimate each specified RV.
54
+
55
+ ```
56
+ require 'prolly'
57
+ include Prolly
58
+
59
+ Ps.add({ color: :blue, size: :small })
60
+ Ps.add({ color: :blue, size: :big })
61
+ Ps.add({ color: :blue, size: :big })
62
+ Ps.add({ color: :green, size: :big })
63
+ Ps.add({ color: :green, size: :small })
64
+ ```
65
+
66
+ Now that we have samples to estimate our probabilities, we're good to go on how to express them.
67
+
68
+ >Note that you'll need to `include Prolly` into whatever namespace you're using it in before you can call `Ps.add`. If `Ps` is already taken in your namespace, skip the include and
69
+ type `Prolly::Ps.add` instead.
70
+
71
+ ### Expressing Stochastics through Probability Space
72
+
73
+ `Ps` is short for Probability Space. It's normally denoted by &Omega;, U (for universal set), or S (for sample set) in probability textbooks. It's the set of all events that could happen.
74
+
75
+ You start with probability space.
76
+ ```
77
+ Ps
78
+ ```
79
+ Then pick a specified random variable to examine
80
+ ```
81
+ Ps.rv(color: :blue)
82
+ ```
83
+ And if necessary, pick a conditional random variable
84
+ ```
85
+ Ps.rv(color: :blue).given(size: :small)
86
+ ```
87
+ Then pick the operation, which can be `count`, `prob`, `pdf`, `entropy`, or `infogain`.
88
+ ```
89
+ Ps.rv(color: :blue).given(size: :small).prob
90
+ ```
91
+ That gives you the probability that the random variable Color is `:blue`, given that Size is `:small`.
92
+
93
+ ### Random Variables and Operations
94
+
95
+ A random variable can be specified, as in `Ps.rv(color: :blue)`, or unspecified, as in `Ps.rv(:color)`. Conditional random variables can likewise be specified or unspecified.
96
+
97
+ Prolly currently supports five operations.
98
+
99
+ - .prob &middot; Calculates probability, a fractional number representing your belief that an event will occur, based on the amount of evidence you've seen for that event.
100
+ - .pdf &middot; Calculates the probability density function, a hash mapping each possible value of the random variable to its probability.
101
+ - .entropy &middot; Calculates entropy, a fractional number representing the spikiness or smoothness of a density function, which implies how much information is in the random variable.
102
+ - .infogain &middot; Calculates information gain, a fractional number representing the amount of information (that is, reduction in uncertainty) that knowing either variable provides about the other.
103
+ - .count &middot; Counts the number of events satisfying the conditions.
104
+
105
+ Each operation only works with certain combinations of random variables. The supported combinations are listed below, and Prolly raises an exception if you use an unsupported one.
106
+
107
+ Legend:
108
+ - &#10003; available for this operator
109
+ - &Delta;! available, but not yet implemented for this operator.
110
+
111
+ <table>
112
+ <tr>
113
+ <th>RandVar</th>
114
+ <th>Given</th>
115
+ <th>.prob</th>
116
+ <th>.pdf</th>
117
+ <th>.entropy</th>
118
+ <th>.infogain</th>
119
+ <th>.count</th>
120
+ </tr>
121
+ <tr>
122
+ <th>Ps.rv(color: :blue)</th>
123
+ <th></th>
124
+ <th>&#10003;</th>
125
+ <th></th>
126
+ <th></th>
127
+ <th></th>
128
+ <th>&#10003;</th>
129
+ </tr>
130
+ <tr>
131
+ <th>Ps.rv(color: :blue)</th>
132
+ <th>.given(:size)</th>
133
+ <th>&#10003;</th>
134
+ <th></th>
135
+ <th></th>
136
+ <th></th>
137
+ <th>&#10003;</th>
138
+ </tr>
139
+ <tr>
140
+ <th>Ps.rv(color: :blue)</th>
141
+ <th>.given(size: :small)</th>
142
+ <th>&#10003;</th>
143
+ <th></th>
144
+ <th></th>
145
+ <th></th>
146
+ <th>&#10003;</th>
147
+ </tr>
148
+ <tr>
149
+ <th>Ps.rv(color: :blue)</th>
150
+ <th>.given(size: :small, weight: :fat)</th>
151
+ <th>&#10003;</th>
152
+ <th></th>
153
+ <th></th>
154
+ <th></th>
155
+ <th>&#10003;</th>
156
+ </tr>
157
+ <tr>
158
+ <th>Ps.rv(color: [:blue, :green])</th>
159
+ <th></th>
160
+ <th>&#10003;</th>
161
+ <th></th>
162
+ <th></th>
163
+ <th></th>
164
+ <th>&#10003;</th>
165
+ </tr>
166
+ <tr>
167
+ <th>Ps.rv(color: :blue, texture: :rough)</th>
168
+ <th></th>
169
+ <th>&#10003;</th>
170
+ <th></th>
171
+ <th></th>
172
+ <th></th>
173
+ <th>&#10003;</th>
174
+ </tr>
175
+ <tr>
176
+ <th>Ps.rv(color: :blue, texture: :rough)</th>
177
+ <th>.given(:size)</th>
178
+ <th>&#10003;</th>
179
+ <th></th>
180
+ <th></th>
181
+ <th></th>
182
+ <th>&#10003;</th>
183
+ </tr>
184
+ <tr>
185
+ <th>Ps.rv(color: :blue, texture: :rough)</th>
186
+ <th>.given(size: :small)</th>
187
+ <th>&#10003;</th>
188
+ <th></th>
189
+ <th></th>
190
+ <th></th>
191
+ <th>&#10003;</th>
192
+ </tr>
193
+ <tr>
194
+ <th>Ps.rv(color: :blue, texture: :rough)</th>
195
+ <th>.given(size: :small, weight: :fat)</th>
196
+ <th>&#10003;</th>
197
+ <th></th>
198
+ <th></th>
199
+ <th></th>
200
+ <th>&#10003;</th>
201
+ </tr>
202
+ <tr>
203
+ <th>Ps.rv(:color)</th>
204
+ <th></th>
205
+ <th></th>
206
+ <th>&#10003;</th>
207
+ <th>&#10003;</th>
208
+ <th></th>
209
+ <th>&#10003;</th>
210
+ </tr>
211
+ <tr>
212
+ <th>Ps.rv(:color)</th>
213
+ <th>.given(:size)</th>
214
+ <th></th>
215
+ <th>&#10003;</th>
216
+ <th>&#10003;</th>
217
+ <th>&#10003;</th>
218
+ <th>&#10003;</th>
219
+ </tr>
220
+ <tr>
221
+ <th>Ps.rv(:color)</th>
222
+ <th>.given(size: :small)</th>
223
+ <th></th>
224
+ <th>&#10003;</th>
225
+ <th>&#10003;</th>
226
+ <th></th>
227
+ <th>&#10003;</th>
228
+ </tr>
229
+ <tr>
230
+ <th>Ps.rv(:color)</th>
231
+ <th>.given(size: :small, weight: :fat)</th>
232
+ <th></th>
233
+ <th>&#10003;</th>
234
+ <th>&#10003;</th>
235
+ <th></th>
236
+ <th>&#10003;</th>
237
+ </tr>
238
+ <tr>
239
+ <th>Ps.rv(:color)</th>
240
+ <th>.given(:size, weight: :fat)</th>
241
+ <th></th>
242
+ <th></th>
243
+ <th>&#10003;</th>
244
+ <th>&#10003;</th>
245
+ <th>&#10003;</th>
246
+ </tr>
247
+ <tr>
248
+ <th>Ps.rv(:color, :texture)</th>
249
+ <th></th>
250
+ <th></th>
251
+ <th>&Delta;!</th>
252
+ <th>&#10003;</th>
253
+ <th></th>
254
+ <th>&#10003;</th>
255
+ </tr>
256
+ <tr>
257
+ <th>Ps.rv(:color, :texture)</th>
258
+ <th>.given(:size)</th>
259
+ <th></th>
260
+ <th>&Delta;!</th>
261
+ <th>&Delta;!</th>
262
+ <th></th>
263
+ <th>&#10003;</th>
264
+ </tr>
265
+ <tr>
266
+ <th>Ps.rv(:color, :texture)</th>
267
+ <th>.given(size: :small)</th>
268
+ <th></th>
269
+ <th>&Delta;!</th>
270
+ <th>&#10003;</th>
271
+ <th></th>
272
+ <th>&#10003;</th>
273
+ </tr>
274
+ <tr>
275
+ <th>Ps.rv(:color, :texture)</th>
276
+ <th>.given(size: :small, weight: :fat)</th>
277
+ <th></th>
278
+ <th>&Delta;!</th>
279
+ <th>&Delta;!</th>
280
+ <th></th>
281
+ <th>&#10003;</th>
282
+ </tr>
283
+ <tr>
284
+ <th>Ps.rv(:color, :texture)</th>
285
+ <th>.given(:size, weight: :fat)</th>
286
+ <th></th>
287
+ <th>&Delta;!</th>
288
+ <th>&#10003;</th>
289
+ <th></th>
290
+ <th>&#10003;</th>
291
+ </tr>
292
+ </table>
293
+
294
+ ## Examples
295
+
296
+ There are examples of using Prolly to write learning algorithms.
297
+
298
+ - [Decision Tree](https://github.com/iamwilhelm/prolly/tree/master/examples/decision_tree)
299
+
300
+ ### Probabilities
301
+
302
+ What is the probability there is a blue marble?
303
+ ```ruby
304
+ # P(C = blue)
305
+ Ps.rv(color: :blue).prob
306
+ ```
307
+
308
+ What is the joint probability there is a blue marble that also has a rough texture?
309
+ ```ruby
310
+ # P(C = blue, T = rough)
311
+ Ps.rv(color: :blue, texture: :rough).prob
312
+ ```
313
+
314
+ What is the probability a marble is small or med sized?
315
+ ```ruby
316
+ # P(S = small, med)
317
+ Ps.rv(size: [:small, :med]).prob
318
+ ```
319
+
320
+ What is the probability of a blue marble given that the marble is small?
321
+ ```ruby
322
+ # P(C = blue | S = small)
323
+ Ps.rv(color: :blue).given(size: :small).prob
324
+ ```
325
+
326
+ What is the probability of a blue marble and rough texture given that the marble is small?
327
+ ```ruby
328
+ # P(C = blue, T = rough | S = small)
329
+ Ps.rv(color: :blue, texture: :rough).given(size: :small).prob
330
+ ```
331
+
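Each of the queries above reduces to a ratio of counts. The following is a minimal sketch of that idea in plain Ruby, assuming a small made-up dataset; `matches?`, `count_matching`, and `prob` are hypothetical helpers and not Prolly's API.

```ruby
# Made-up marbles, for illustration only.
MARBLES = [
  { color: :blue,  size: :small, texture: :rough },
  { color: :blue,  size: :med,   texture: :smooth },
  { color: :green, size: :big,   texture: :rough },
  { color: :green, size: :small, texture: :smooth }
].freeze

# True when the row agrees with every key in spec.
# An Array value means "any of these values".
def matches?(row, spec)
  spec.all? { |key, want| Array(want).include?(row[key]) }
end

def count_matching(rows, spec)
  rows.count { |row| matches?(row, spec) }
end

# P(rv | given) = count(rv and given) / count(given).
# With no condition, the denominator is the number of rows.
def prob(rows, rv, given = {})
  denom = count_matching(rows, given)
  denom.zero? ? 0.0 : count_matching(rows, rv.merge(given)).to_f / denom
end

puts prob(MARBLES, { color: :blue })                     # P(C = blue)
puts prob(MARBLES, { color: :blue, texture: :rough })    # P(C = blue, T = rough)
puts prob(MARBLES, { size: [:small, :med] })             # P(S = small or med)
puts prob(MARBLES, { color: :blue }, { size: :small })   # P(C = blue | S = small)
```

Returning 0.0 for an empty denominator mirrors the guard in `lib/prolly/rand_var/prob.rb` from this diff; whether that is the right convention for unseen conditions is a design choice.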
332
+ ### Probability density functions
333
+
334
+ Probability density for a random variable.
335
+ ```ruby
336
+ Ps.rv(:color).pdf
337
+ ```
338
+
339
+ Probability density for a conditional random variable.
340
+ ```ruby
341
+ Ps.rv(:color).given(size: :small).pdf
342
+ ```
343
+
344
+ ### Entropy
345
+
346
+ Entropy of the RV color.
347
+ ```ruby
348
+ # H(C)
349
+ Ps.rv(:color).entropy
350
+ ```
351
+
352
+ Entropy of color given the marble is small
353
+ ```ruby
354
+ # H(C | S = small)
355
+ Ps.rv(:color).given(size: :small).entropy
356
+ ```
357
+
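To make the entropy number concrete, here is a small sketch that builds a distribution from raw values and sums -p log p over it. It uses base-10 logarithms because that is what `lib/prolly/rand_var/entropy.rb` in this diff does; the helper names `pdf` and `entropy` and the sample colors are assumptions for illustration.

```ruby
# Illustrative observations of a single random variable.
COLORS = [:blue, :blue, :blue, :green, :green].freeze

# Distribution as a hash of value => relative frequency.
def pdf(values)
  values.group_by(&:itself).transform_values { |group| group.size.to_f / values.size }
end

# Shannon entropy in base 10. Zero-probability outcomes contribute nothing.
def entropy(distr)
  distr.values.sum { |prob| prob.zero? ? 0.0 : -prob * Math.log10(prob) }
end

distr = pdf(COLORS)
puts distr.inspect
puts entropy(distr)
```

A uniform distribution over two values gives the maximum for two outcomes, log10(2), and a distribution concentrated on one value gives zero.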
358
+ ### Information Gain
359
+
360
+ Information gain of color and size.
361
+ ```ruby
362
+ # IG(C | S)
363
+ Ps.rv(:color).given(:size).infogain
364
+ ```
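
Information gain is the entropy of the target minus its entropy once the given variable is known: IG(C | S) = H(C) - H(C | S), where H(C | S) is the average of H(C | S = s) weighted by P(S = s). The sketch below computes this by hand on made-up data; the helper names are hypothetical, and base-10 logarithms are used to match the entropy code in this diff.

```ruby
# Illustrative observations.
OBSERVATIONS = [
  { color: :blue,  size: :small },
  { color: :blue,  size: :big },
  { color: :blue,  size: :big },
  { color: :green, size: :big },
  { color: :green, size: :small }
].freeze

# Entropy (base 10) of the empirical distribution of a list of values.
def entropy_of(values)
  return 0.0 if values.empty?
  values.group_by(&:itself).values.sum do |group|
    frac = group.size.to_f / values.size
    -frac * Math.log10(frac)
  end
end

# H(target | given): entropy within each group of `given`, weighted by group size.
def conditional_entropy(rows, target, given)
  rows.group_by { |row| row[given] }.values.sum do |subset|
    weight = subset.size.to_f / rows.size
    weight * entropy_of(subset.map { |row| row[target] })
  end
end

def infogain(rows, target, given)
  entropy_of(rows.map { |row| row[target] }) - conditional_entropy(rows, target, given)
end

puts infogain(OBSERVATIONS, :color, :size)
```

On this data the gain is small but positive, since knowing the size changes the color distribution only slightly.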
365
+ ### Counts
366
+
367
+ At the base of all the probabilities are counts of stuff.
368
+ ```ruby
369
+ Ps.rv(color: :blue).count
370
+ ```
371
+
372
+ ```ruby
373
+ Ps.rv(:color).given(:size).count
374
+ ```
375
+
376
+ ## Stores
377
+
378
+ Prolly can use different stores to remember the prior event data from which it
379
+ calculates the probability. Currently Prolly implements a RubyList store and a
380
+ Mongodb store.
381
+
382
+ ### Implementing new stores
383
+
384
+ The interface for a new store is pretty easy. It just needs to implement six methods:
385
+
386
+ #### initialize
387
+
388
+ This just brings up the store, and connects to it, and whatever else you need to do in the beginning.
389
+
390
+ #### reset
391
+
392
+ This should just clear the entire store of the data in the collection.
393
+
394
+ #### add(datum)
395
+
396
+ Adds one row of data to the store.
397
+
398
+ #### count(rvs, options = {})
399
+
400
+ Counts the number of samples that satisfy the RVs requested. `rvs` can be either an Array or a Hash. When it's an array, you must count all
401
+ samples that have all the RVs.
402
+
403
+ When it's a hash, you must look for all samples that not only have the random variables, but also have the matching designated
404
+ values. Note that a value can be an array. In that case, the user wants a sample to match if the RV equals any of the values in the array.
405
+
406
+ #### rand_vars
407
+
408
+ Return a list of all random variables.
409
+
410
+ #### uniq_vals(name)
411
+
412
+ Return a list of all unique values of a random variable.
413
+
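The six-method interface described above can be sketched as a small in-memory store. This is an illustration under assumptions, not one of the stores shipped in the gem: the class name `ArrayStore` is hypothetical, and it does not inherit from Prolly's `Storage::Base`.

```ruby
# Hypothetical in-memory store implementing the six methods listed above.
class ArrayStore
  def initialize
    reset
  end

  # Clear all stored samples.
  def reset
    @rows = []
  end

  # Add one row of data, e.g. { color: :blue, size: :small }.
  def add(datum)
    @rows << datum
  end

  # Array: count rows that have every listed RV.
  # Hash: count rows whose values match; an Array value means "any of these".
  def count(rvs, options = {})
    case rvs
    when Array
      @rows.count { |row| rvs.all? { |rv| row.key?(rv) } }
    when Hash
      @rows.count do |row|
        rvs.all? { |rv, want| row.key?(rv) && Array(want).include?(row[rv]) }
      end
    else
      raise ArgumentError, "rvs must be an Array or a Hash"
    end
  end

  # Names of every random variable seen so far.
  def rand_vars
    @rows.flat_map(&:keys).uniq
  end

  # Distinct values observed for one random variable.
  def uniq_vals(name)
    @rows.select { |row| row.key?(name) }.map { |row| row[name] }.uniq
  end
end

store = ArrayStore.new
store.add({ color: :blue, size: :small })
store.add({ color: :green, size: :big })
puts store.count({ color: [:blue, :green] })
```

The `options` argument is accepted only to match the documented signature; this sketch ignores it.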
414
+ ## Motivation
415
+
416
+ A couple years back, I was reading [a blog post](http://weblog.raganwald.com/2008/02/naive-approach-to-hiring-people.html) by Raganwald, where I read this quote:
417
+
418
+ <blockquote>
419
+ A very senior Microsoft developer who moved to Google told me that Google works and thinks at a higher level of abstraction than Microsoft. “Google uses Bayesian filtering the way Microsoft uses the if statement,” he said.
420
+
421
+ —Joel Spolsky, Microsoft Jet
422
+ </blockquote>
423
+
424
+ That got me thinking very literally. What would it look like if we had probability
425
+ statements to use natively like we have "if" statements? How would that change how
426
+ we code? That would mean we could make decisions not just on the information we
427
+ have on hand, but also on the information we saw before.
428
+
429
+ ## Contributing
430
+
431
+ Write some specs and make sure the entire suite passes. Then submit a pull request.
432
+
433
+ ## Contributors
434
+
435
+ - Wil Chung
436
+
437
+ ## License
438
+
439
+ MIT license
440
+
441
+ ## Changelog
442
+
443
+ ### v0.0.1
444
+
445
+ - Initial release with counts, probs, pdf, entropy, and infogain.
446
+ - Implements two stores, RubyList and Mongodb.
data/lib/prolly.rb ADDED
@@ -0,0 +1,2 @@
1
+ require 'prolly/ps'
2
+
data/lib/prolly/ps.rb ADDED
@@ -0,0 +1,70 @@
1
+ require "forwardable"
2
+ require "prolly/rand_var"
3
+
4
+ require "prolly/ps/storage/rubylist"
5
+ require "prolly/ps/storage/mongodb"
6
+ #require "prolly/ps/storage/redis"
7
+
8
+ module Prolly
9
+ class Ps
10
+
11
+ class << self
12
+ def ps
13
+ @ps ||= Ps.new
14
+ end
15
+
16
+ def import(data)
17
+ ps.import(data)
18
+ end
19
+
20
+ def reset
21
+ ps.reset
22
+ end
23
+
24
+ def add(datum)
25
+ ps.add(datum)
26
+ end
27
+
28
+ def rv(*rand_vars)
29
+ if rand_vars.empty?
30
+ ps.rand_vars
31
+ else
32
+ RandVar.new(ps, *rand_vars)
33
+ end
34
+ end
35
+
36
+ def stash
37
+ ps.stash
38
+ end
39
+
40
+ # unique values for a random variable.
41
+ #
42
+ # If there are multiple random variables, then we get combinations of the unique
43
+ # values of the random variables
44
+ def uniq_vals(uspec_rvs)
45
+
46
+ def combo(list_of_vals)
47
+ if list_of_vals.length == 1
48
+ list_of_vals.first.map { |e| [e] }
49
+ else
50
+ combinations = combo(list_of_vals[1..-1])
51
+ list_of_vals.first.flat_map { |val| combinations.map { |e| [val] + e } }
52
+ end
53
+ end
54
+
55
+ combo(uspec_rvs.map { |uspec_rv| ps.uniq_vals(uspec_rv) })
56
+ end
57
+ end
58
+
59
+ extend Forwardable
60
+
61
+ def_delegators :@storage, :reset, :add, :count, :rand_vars, :uniq_vals, :import
62
+
63
+ def initialize
64
+ #@storage = Storage::Rubylist.new()
65
+ @storage = Storage::Mongodb.new()
66
+ #@storage = Storage::Redis.new()
67
+ end
68
+
69
+ end
70
+ end
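The recursive `combo` helper in `uniq_vals` above builds every combination of one value per random variable, which is a Cartesian product. As a point of comparison, here is a minimal sketch of the same result using Ruby's built-in `Array#product`; the method name `combos` is hypothetical and this is not a proposed patch.

```ruby
# Every combination of one value per list, in list order.
def combos(lists)
  return [] if lists.empty?
  first, *rest = lists
  first.product(*rest)
end

p combos([[:blue, :green], [:small, :big]])
# → [[:blue, :small], [:blue, :big], [:green, :small], [:green, :big]]
```

With a single list, each value comes back wrapped in its own one-element array, which matches the base case of the recursive version.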
data/lib/prolly/ps/storage/base.rb ADDED
@@ -0,0 +1,40 @@
1
+
2
+ module Prolly
3
+ class Ps
4
+ module Storage
5
+
6
+ class Base
7
+
8
+ def initialize
9
+ end
10
+
11
+ def reset
12
+ @stash ||= {}
13
+ @stash_stats ||= { hits: 0, misses: 0 }
14
+ @stash_time ||= {}
15
+ end
16
+
17
+ def import(data)
18
+ data.each { |datum| add(datum) }
19
+ end
20
+
21
+ def add(datum)
22
+ raise StandardError.new("not implemented")
23
+ end
24
+
25
+ def count(rvs, options = {})
26
+ raise StandardError.new("not implemented")
27
+ end
28
+
29
+ def rand_vars
30
+ end
31
+
32
+ def uniq_vals(name)
33
+ end
34
+
35
+ end
36
+
37
+ end
38
+ end
39
+ end
40
+
data/lib/prolly/ps/storage/mongodb.rb ADDED
@@ -0,0 +1,82 @@
1
+ require 'date'
2
+ require 'moped'
3
+
4
+ require 'prolly/ps/storage/base'
5
+
6
+ module Prolly
7
+ class Ps
8
+ module Storage
9
+
10
+ class Mongodb < Base
11
+
12
+ attr_reader :session
13
+
14
+ def initialize
15
+ @session ||= Moped::Session.new(["127.0.0.1:27017", "127.0.0.1:27018"])
16
+ @session.use 'pspace'
17
+
18
+ super
19
+ @rand_vars = []
20
+ end
21
+
22
+ def reset
23
+ super
24
+ @session['samples'].drop
25
+ end
26
+
27
+ def add(datum)
28
+ # create an index for each new datum key
29
+ #new_rvs(datum).each do |rv|
30
+ # @session.indexes.create(rv.to_sym => 1)
31
+ #end
32
+
33
+ record_new_rand_vars(datum)
34
+
35
+ @session[:samples].insert(datum)
36
+ end
37
+
38
+ def count(rvs, options = {})
39
+ reload = options["reload"] || false
40
+ if rvs.kind_of?(Array)
41
+ @session[:samples].find(
42
+ Hash[*rvs.flat_map { |rv| [rv, { '$exists' => true }] }]
43
+ ).count
44
+ elsif rvs.kind_of?(Hash)
45
+ @session[:samples].find(to_query_hash(rvs)).count
46
+ end
47
+ end
48
+
49
+ def rand_vars
50
+ @session[:rand_vars].find.map { |rv| rv[:name] }
51
+ end
52
+
53
+ def uniq_vals(name)
54
+ @session[:samples].aggregate([
55
+ { "$match" => { name.to_sym => { "$exists" => true } } },
56
+ { "$group" => { "_id" => "$#{name}" } }
57
+ ]).map { |e| e["_id"] }
58
+ end
59
+
60
+ private
61
+
62
+ def new_rvs(datum)
63
+ return datum.keys - rand_vars
64
+ end
65
+
66
+ def record_new_rand_vars(datum)
67
+ new_rvs(datum).each do |rv|
68
+ @session[:rand_vars].insert({ name: rv })
69
+ end
70
+ end
71
+
72
+ def to_query_hash(rvs)
73
+ Hash[*rvs.flat_map { |k, v|
74
+ [k, v.kind_of?(Array) ? { "$in" => v } : v]
75
+ }]
76
+ end
77
+
78
+ end
79
+
80
+ end
81
+ end
82
+ end
data/lib/prolly/ps/storage/redis.rb ADDED
@@ -0,0 +1,56 @@
1
+ require "redis"
2
+
3
+ require 'prolly/ps/storage/base'
4
+
5
+ module Prolly
6
+ class Ps
7
+ module Storage
8
+
9
+ class Redis
10
+
11
+ def initialize(data = nil)
12
+ @redis = ::Redis.new(host: "localhost", port: "6379")
13
+ reset
14
+ import(data) unless data.nil?
15
+ end
16
+
17
+ def reset
18
+ @redis.keys("pspace:*").each { |k| @redis.del k }
19
+ end
20
+
21
+ def import(data)
22
+ data.each { |datum| add(datum) }
23
+ end
24
+
25
+ def add(datum)
26
+ datum.each do |rv, val|
27
+ @redis.sadd "pspace:rand_vars", rv
28
+ @redis.sadd "pspace:uniq_vals:#{rv}", val
29
+
30
+ @redis.pfadd "pspace:count:#{rv}", datum.object_id.to_i
31
+ @redis.pfadd "pspace:count:#{rv}=#{val}", datum.object_id.to_i
32
+
33
+ end
34
+ end
35
+
36
+ def count(rvs, options = {})
37
+ if rvs.kind_of?(Array)
38
+ @redis.pfcount *rvs.map { |rv| "pspace:count:#{rv}" }
39
+ elsif rvs.kind_of?(Hash)
40
+ @redis.pfcount *rvs.map { |rv, val| "pspace:count:#{rv}=#{val}" }
41
+ end
42
+ end
43
+
44
+ def rand_vars
45
+ @redis.smembers "pspace:rand_vars"
46
+ end
47
+
48
+ def uniq_vals(rv)
49
+ @redis.smembers "pspace:uniq_vals:#{rv}"
50
+ end
51
+
52
+ end
53
+
54
+ end
55
+ end
56
+ end
data/lib/prolly/ps/storage/rubylist.rb ADDED
@@ -0,0 +1,72 @@
1
+ require 'prolly/ps/storage/base'
2
+
3
+ module Prolly
4
+ class Ps
5
+ module Storage
6
+
7
+ class Rubylist < Base
8
+
9
+ def initialize
10
+ super
+ reset # initialize @data and @uniq_vals so add works before an explicit reset
11
+ end
12
+
13
+ def reset
14
+ super
15
+ @data = []
16
+ @uniq_vals = {}
17
+ end
18
+
19
+ def add(datum)
20
+ @data << datum
21
+ end
22
+
23
+ def count(rvs, options = {})
24
+ reload = options[:reload] || false
25
+ start_time = Time.now
26
+ if rvs.kind_of?(Array)
27
+ value = @data.count { |e| rvs.all? { |rv| e.has_key?(rv) } }
28
+ elsif rvs.kind_of?(Hash)
29
+ value = @data.count { |e|
30
+ rvs.map { |rkey, rval|
31
+ vals = rval.kind_of?(Array) ? rval : [rval]
32
+ vals.include?(e[rkey])
33
+ }.all?
34
+ }
35
+ end
36
+ elapsed = Time.now - start_time
37
+ return value
38
+ end
39
+
40
+ def rand_vars
41
+ @data.first.keys
42
+ end
43
+
44
+ def uniq_vals(name)
45
+ @uniq_vals[name] ||= @data.map { |li| li.has_key?(name) ? li[name] : nil }.uniq
46
+ end
47
+
48
+ private
49
+
50
+ def explain(rvs, options = {})
51
+ end
52
+
53
+ def stats(options = {})
54
+ end
55
+
56
+ def display_stats
57
+ require 'pp'
58
+ puts "------------- Stats! --------------------"
59
+ puts
60
+ pp @stash_time.sort { |a, b| b[1][:usage] <=> a[1][:usage] }[0..10]
61
+ puts
62
+ pp @stash_time.sort { |a, b| b[1][:elapsed] <=> a[1][:elapsed] }[0..10]
63
+ puts
64
+ puts @stash_stats.inspect
65
+ puts
66
+ end
67
+
68
+ end
69
+
70
+ end
71
+ end
72
+ end
data/lib/prolly/rand_var.rb ADDED
@@ -0,0 +1,69 @@
1
+ require 'prolly/rand_var/prob'
2
+ require 'prolly/rand_var/pdf'
3
+ require 'prolly/rand_var/entropy'
4
+ require 'prolly/rand_var/infogain'
5
+
6
+ module Prolly
7
+
8
+ class RandVar
9
+
10
+ include Prob
11
+ include Pdf
12
+ include Entropy
13
+ include Infogain
14
+
15
+ def initialize(pspace, *rand_vars)
16
+ @pspace = pspace
17
+
18
+ @uspec_rv, @spec_rv = parse(rand_vars)
19
+
20
+ @uspec_gv = []
21
+ @spec_gv = {}
22
+ end
23
+
24
+ # parses rand_var arguments
25
+ #
26
+ # random variable are passed in as arguments to a method. It can take the format of:
27
+ #
28
+ # :size
29
+ #
30
+ # { size: :large, color: :green }
31
+ #
32
+ # [ :size, { color: :green, texture: :rough } ]
33
+ #
34
+ def parse(rand_vars)
35
+ if rand_vars.kind_of?(Hash)
36
+ specified_rvs = rand_vars
37
+ unspecified_rvs = []
38
+ elsif rand_vars.kind_of?(Array)
39
+ specified_rvs, unspecified_rvs = rand_vars.partition { |e| e.kind_of?(Hash) }
40
+ specified_rvs = specified_rvs.inject({}) { |t, e| t.merge(e) }
41
+ else # if it's a symbol
42
+ specified_rvs = {}
43
+ unspecified_rvs = [rand_vars]
44
+ end
45
+
46
+ return unspecified_rvs, specified_rvs
47
+ end
48
+
49
+ def given(*rand_vars)
50
+ @uspec_gv, @spec_gv = parse(rand_vars)
51
+
52
+ return self
53
+ end
54
+
55
+ def count
56
+ if !@spec_rv.empty?
57
+ if @uspec_gv.empty? and @spec_gv.empty?
58
+ @pspace.count(@spec_rv)
59
+ else
60
+ @pspace.count(@spec_rv.merge(@spec_gv))
61
+ end
62
+ else
63
+ @pspace.count(@uspec_rv)
64
+ end
65
+ end
66
+
67
+ end
68
+
69
+ end
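The core of `parse` above is splitting the arguments into specified random variables (hashes such as `{ color: :green }`) and unspecified ones (bare names such as `:size`). A minimal standalone sketch of that split follows; `split_rvs` is a hypothetical name, and the return order mirrors `parse`, which returns the unspecified list first.

```ruby
# Split rv arguments into [unspecified names, merged specified hash].
def split_rvs(args)
  specified, unspecified = args.partition { |arg| arg.is_a?(Hash) }
  [unspecified, specified.reduce({}, :merge)]
end

p split_rvs([:size, { color: :green }, { texture: :rough }])
```

Merging the hashes means a later hash wins if the same variable is specified twice.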
data/lib/prolly/rand_var/entropy.rb ADDED
@@ -0,0 +1,60 @@
1
+ module Prolly
2
+ class RandVar
3
+
4
+ module Entropy
5
+
6
+ # Entropy doesn't take hashes (for now?)
7
+ # If it did, I'm not sure what H(color=green) means at all.
8
+ def entropy
9
+ if !@spec_rv.empty?
10
+ raise "Cannot use entropy with specified random variables"
11
+ else
12
+ #puts "H(#{@rv} | #{@gv})"
13
+
14
+ if @uspec_gv.empty?# and @spec_gv.empty?
15
+ entropy_rv
16
+ else
17
+ entropy_rv_gv
18
+ end
19
+
20
+ end
21
+ end
22
+
23
+ private
24
+
25
+ # H(color)
26
+ # H(color, size)
27
+ # H(color | size=small)
28
+ # H(color, size | texture=smooth)
29
+ # H(color | size=small, texture=smooth)
30
+ def entropy_rv
31
+ distr = pdf
32
+ distr.inject(0) do |t, kv|
33
+ name, pn = kv
34
+ t += -pn * (pn == 0 ? 0.0 : Math.log(pn)) / Math.log(10)
35
+ end
36
+ end
37
+
38
+ # H(color | size)
39
+ # H(color, weight | size, texture = smooth)
40
+ # H(color | size, texture = smooth)
41
+ def entropy_rv_gv
42
+ Ps.uniq_vals(@uspec_gv).inject(0) do |t, gv_vals|
43
+ uspec_gv_speced = Hash[*@uspec_gv.zip(gv_vals).flatten]
44
+ gv = @spec_gv.merge(uspec_gv_speced)
45
+
46
+ pn = Ps.rv(gv).given(@spec_gv).prob
47
+ hn = Ps.rv(*@uspec_rv).given(gv).entropy
48
+
49
+ #puts "P(#{gv} | #{@spec_gv}) = #{pn}"
50
+ #puts "H(#{@uspec_rv} | #{gv}) = #{hn}"
51
+ #puts " #{Ps.rv(*@uspec_rv).given(gv).prob}"
52
+
53
+ t += (pn * hn)
54
+ end
55
+ end
56
+
57
+ end
58
+
59
+ end
60
+ end
data/lib/prolly/rand_var/infogain.rb ADDED
@@ -0,0 +1,22 @@
1
+ module Prolly
2
+ class RandVar
3
+
4
+ module Infogain
5
+
6
+ # I(Y | X)
7
+ # I(Y | X, A = a)
8
+ # I(Y | X, A = a, B = b)
9
+ def infogain
10
+ raise "Need given var" if @uspec_gv.empty? and @spec_gv.empty?
11
+ raise "Need unspecified given var" if @uspec_gv.empty?
12
+ raise "Need unspecified rand var" if @uspec_rv.empty?
13
+
14
+ # puts "I(#{@rv} | #{@gv})"
15
+ Ps.rv(*@uspec_rv).given(@spec_gv).entropy -
16
+ Ps.rv(*@uspec_rv).given(*@uspec_gv, @spec_gv).entropy
17
+ end
18
+
19
+ end
20
+
21
+ end
22
+ end
data/lib/prolly/rand_var/pdf.rb ADDED
@@ -0,0 +1,78 @@
1
+ module Prolly
2
+ class RandVar
3
+
4
+ module Pdf
5
+
6
+ def pdf
7
+ if !@spec_rv.empty?
8
+
9
+ raise StandardError.new("Cannot use pdf on this RV")
10
+
11
+ #if @uspec_gv.empty? and @spec_gv.empty?
12
+ # prob_rv_eq
13
+ #else
14
+ # prob_rv_eq_gv_eq
15
+ #end
16
+
17
+ else
18
+ #puts "distr : #{@rv.to_s} : #{@gv.to_s}"
19
+
20
+ if @uspec_gv.empty? and @spec_gv.empty?
21
+ prob_rv
22
+ elsif not @spec_gv.empty?
23
+ prob_rv_gv_eq
24
+ else
25
+ prob_rv_gv
26
+ end
27
+
28
+ end
29
+
30
+ end
31
+
32
+ private
33
+
34
+ # P(color) = [P(color=green), P(color=blue)]
35
+ # P(color, size) = [every combo of color and size]
36
+ def prob_rv
37
+ distr = Ps.uniq_vals(@uspec_rv).flat_map do |rv_vals|
38
+ spec_rv = Hash[*@uspec_rv.zip(rv_vals).flatten]
39
+ [rv_vals, Ps.rv(spec_rv).prob]
40
+ end
41
+
42
+ Hash[*distr]
43
+ end
44
+
45
+ # P(color | size=small) =
46
+ # [P(color=green | size=small), P(color=blue | size=small)]
47
+ # P(color | size=small, texture=smooth) =
48
+ # [P(every color | size=small, texture=smooth)]
49
+ def prob_rv_gv_eq
50
+ distr = Ps.uniq_vals(@uspec_rv).flat_map do |rv_vals|
51
+ spec_rv = Hash[*@uspec_rv.zip(rv_vals).flatten]
52
+ [rv_vals, Ps.rv(spec_rv).given(@spec_gv).prob]
53
+ end
54
+
55
+ Hash[*distr]
56
+ end
57
+
58
+ # P(color | size) =
59
+ # [P(color=green | size), P(color=blue | size)]
60
+ # TODO not tested
61
+ def prob_rv_gv
62
+ rv = @uspec_rv.first
63
+ gv = @uspec_gv.first
64
+
65
+ distr = @pspace.uniq_vals(rv).flat_map do |rv_val|
66
+ #puts "rv | gv : #{rv.to_s} | #{@gv.to_s}"
67
+
68
+ [rv_val, Ps.rv(rv.to_sym => rv_val).given(gv.to_sym).prob]
69
+ end
70
+ Hash[*distr]
71
+ end
72
+
73
+
74
+
75
+ end
76
+
77
+ end
78
+ end
data/lib/prolly/rand_var/prob.rb ADDED
@@ -0,0 +1,61 @@
1
+ module Prolly
2
+ class RandVar
3
+
4
+ module Prob
5
+
6
+ def prob
7
+ #puts "P(#{@rv} | #{@gv})"
8
+ raise StandardError.new("Cannot use prob on this RV") if @spec_rv.empty?
9
+
10
+ if @uspec_gv.empty? and @spec_gv.empty?
11
+ prob_rv_eq
12
+ else
13
+ prob_rv_eq_gv_eq
14
+ end
15
+ end
16
+
17
+ private
18
+
19
+ # P(color=green)
20
+ # P(color=green, size=small)
21
+ # P(color=[green, blue])
22
+ def prob_rv_eq
23
+ numer = self.count()
24
+ denom = @pspace.count(@spec_rv.keys)
25
+
26
+ if denom == 0.0
27
+ return 0.0
28
+ else
29
+ return numer.to_f / denom
30
+ end
31
+ end
32
+
33
+ # P(color=green | size=small)
34
+ # P(color=green, size=small | texture=smooth)
35
+ # P(color=green | size=small, texture=smooth)
36
+ def prob_rv_eq_gv_eq
37
+ numer = @pspace.count(@spec_rv.merge(@spec_gv))
38
+ denom = @pspace.count(@spec_gv)
39
+
40
+ if denom == 0.0
41
+ return 0.0
42
+ else
43
+ return numer.to_f / denom
44
+ end
45
+ end
46
+
47
+ # P(color=green | size)
48
+ #
49
+ # For now, this is like P(color=green)
50
+ def prob_rv_eq_gv
51
+ numer = @pspace.count(@spec_rv)
52
+ denom = @pspace.count(@uspec_gv)
53
+
54
+ return numer.to_f / denom
55
+ end
56
+
57
+
58
+ end
59
+
60
+ end
61
+ end
metadata ADDED
@@ -0,0 +1,78 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: prolly
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.0.1
5
+ platform: ruby
6
+ authors:
7
+ - Wil Chung
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2015-02-15 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: moped
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: '2.0'
20
+ - - ">="
21
+ - !ruby/object:Gem::Version
22
+ version: 2.0.3
23
+ type: :runtime
24
+ prerelease: false
25
+ version_requirements: !ruby/object:Gem::Requirement
26
+ requirements:
27
+ - - "~>"
28
+ - !ruby/object:Gem::Version
29
+ version: '2.0'
30
+ - - ">="
31
+ - !ruby/object:Gem::Version
32
+ version: 2.0.3
33
+ description: Just like a database has a query language like SQL this is a query language
34
+ specifically for answering questions about probabilities of events based on the
35
+ samples you have seen before
36
+ email: iamwil@gmail.com
37
+ executables: []
38
+ extensions: []
39
+ extra_rdoc_files: []
40
+ files:
41
+ - LICENSE
42
+ - README.markdown
43
+ - lib/prolly.rb
44
+ - lib/prolly/ps.rb
45
+ - lib/prolly/ps/storage/base.rb
46
+ - lib/prolly/ps/storage/mongodb.rb
47
+ - lib/prolly/ps/storage/redis.rb
48
+ - lib/prolly/ps/storage/rubylist.rb
49
+ - lib/prolly/rand_var.rb
50
+ - lib/prolly/rand_var/entropy.rb
51
+ - lib/prolly/rand_var/infogain.rb
52
+ - lib/prolly/rand_var/pdf.rb
53
+ - lib/prolly/rand_var/prob.rb
54
+ homepage: https://github.com/iamwilhelm/prolly
55
+ licenses:
56
+ - MIT
57
+ metadata: {}
58
+ post_install_message:
59
+ rdoc_options: []
60
+ require_paths:
61
+ - lib
62
+ required_ruby_version: !ruby/object:Gem::Requirement
63
+ requirements:
64
+ - - ">="
65
+ - !ruby/object:Gem::Version
66
+ version: '0'
67
+ required_rubygems_version: !ruby/object:Gem::Requirement
68
+ requirements:
69
+ - - ">="
70
+ - !ruby/object:Gem::Version
71
+ version: '0'
72
+ requirements: []
73
+ rubyforge_project:
74
+ rubygems_version: 2.4.5
75
+ signing_key:
76
+ specification_version: 4
77
+ summary: Domain Specific Language for expressing probabilities in code
78
+ test_files: []