bud 0.0.3 → 0.0.4

Files changed (50)
  1. data/README +33 -16
  2. data/bin/budplot +42 -65
  3. data/bin/budtimelines +235 -0
  4. data/bin/budvis +24 -122
  5. data/bin/rebl +1 -0
  6. data/docs/README.md +21 -10
  7. data/docs/bfs.md +4 -6
  8. data/docs/c.html +251 -0
  9. data/docs/cheat.md +45 -30
  10. data/docs/deploy.md +26 -26
  11. data/docs/getstarted.md +6 -4
  12. data/docs/visualizations.md +43 -31
  13. data/examples/chat/chat.rb +4 -9
  14. data/examples/chat/chat_server.rb +1 -8
  15. data/examples/deploy/deploy_ip_port +1 -0
  16. data/examples/deploy/keys.rb +5 -0
  17. data/examples/deploy/tokenring-ec2.rb +9 -9
  18. data/examples/deploy/{tokenring-local.rb → tokenring-fork.rb} +3 -5
  19. data/examples/deploy/tokenring-thread.rb +15 -0
  20. data/examples/deploy/tokenring.rb +25 -17
  21. data/lib/bud/aggs.rb +87 -25
  22. data/lib/bud/bud_meta.rb +48 -31
  23. data/lib/bud/bust/bust.rb +16 -15
  24. data/lib/bud/collections.rb +207 -232
  25. data/lib/bud/depanalysis.rb +1 -0
  26. data/lib/bud/deploy/countatomicdelivery.rb +8 -20
  27. data/lib/bud/deploy/deployer.rb +16 -16
  28. data/lib/bud/deploy/ec2deploy.rb +34 -35
  29. data/lib/bud/deploy/forkdeploy.rb +90 -0
  30. data/lib/bud/deploy/threaddeploy.rb +38 -0
  31. data/lib/bud/graphs.rb +103 -199
  32. data/lib/bud/joins.rb +190 -41
  33. data/lib/bud/monkeypatch.rb +84 -0
  34. data/lib/bud/rebl.rb +8 -1
  35. data/lib/bud/rewrite.rb +152 -49
  36. data/lib/bud/server.rb +1 -0
  37. data/lib/bud/state.rb +24 -10
  38. data/lib/bud/storage/dbm.rb +170 -0
  39. data/lib/bud/storage/tokyocabinet.rb +5 -1
  40. data/lib/bud/stratify.rb +6 -7
  41. data/lib/bud/viz.rb +31 -17
  42. data/lib/bud/viz_util.rb +204 -0
  43. data/lib/bud.rb +271 -244
  44. data/lib/bud.rb.orig +806 -0
  45. metadata +43 -22
  46. data/docs/bfs.raw +0 -251
  47. data/docs/diffs +0 -181
  48. data/examples/basics/out +0 -1103
  49. data/examples/basics/out.new +0 -856
  50. data/lib/bud/deploy/localdeploy.rb +0 -53
data/docs/README.md CHANGED
@@ -3,13 +3,24 @@ Welcome to the documentation for *Bud*, a prototype of Bloom under development.
3
3
 
4
4
  The documents here are organized to be read in any order, but you might like to try the following:
5
5
 
6
- * **intro.md**: A brief introduction to Bud and Bloom
7
- * **getstarted.md**: A quickstart to teach you basic Bloom concepts, the use of `rebl` interactive terminal, and the embedding of Bloom code in Ruby via the `Bud` module.
8
- * **operational.md**: an operational view of Bloom, to provide a more detailed model of how Bloom code is evaluated by Bud.
9
- * **cheat.md**: A concise "cheat sheet" to remind you about Bloom syntax.
10
- * **modules.md**: An overview of Bloom's modularity features.
11
- * **ruby_hooks.md**: Bud module methods that allow you to interact with the Bud evaluator from other Ruby threads.
12
- * **visualizations.md**: Overview of the `budvis` and `budplot` tools for visualizing Bloom program analyses.
13
- * **bfs.md**: A walkthrough of the Bloom distributed filesystem.
14
- * **bud-sandbox**: a github repository including lots of useful libraries and examples.
15
- * **Bud RubyDoc**: The Bud gem ships with RubyDoc on the language constructs and runtime hooks provided by the Bud module. (To see rdoc, you run `gem server` from a command line and open [http://0.0.0.0:8808/](http://0.0.0.0:8808/))
6
+ * **[intro.md](intro.md)**: A brief introduction to Bud and Bloom.
7
+ * **[getstarted.md](getstarted.md)**: A quickstart to teach you basic Bloom
8
+ concepts, the use of `rebl` interactive terminal, and the embedding of Bloom
9
+ code in Ruby via the `Bud` module.
10
+ * **[operational.md](operational.md)**: An operational view of Bloom, to provide
11
+ a more detailed model of how Bloom code is evaluated by Bud.
12
+ * **[cheat.md](cheat.md)**: A concise "cheat sheet" to remind you about Bloom syntax.
13
+ * **[modules.md](modules.md)**: An overview of Bloom's modularity features.
14
+ * **[ruby_hooks.md](ruby_hooks.md)**: Bud module methods that allow you to
15
+ interact with the Bud evaluator from other Ruby threads.
16
+ * **[visualizations.md](visualizations.md)**: Overview of the `budvis` and
17
+ `budplot` tools for visualizing Bloom program analyses.
18
+ * **[bfs.md](bfs.md)**: A walkthrough of the Bloom distributed filesystem.
19
+
20
+ In addition, the **[bud-sandbox](http://github.com/bloom-lang/bud-sandbox)**
21
+ GitHub repository contains lots of useful libraries and example programs built
22
+ using Bloom.
23
+
24
+ Finally, the Bud gem ships with RubyDoc on the language constructs and runtime
25
+ hooks provided by the Bud module. (To see rdoc, run `gem server` from a command
26
+ line and open [http://0.0.0.0:8808/](http://0.0.0.0:8808/))
data/docs/bfs.md CHANGED
@@ -301,10 +301,10 @@ payload of local chunks:
301
301
  [l.sender, l.payload[0]] unless l.payload[1] == [nil]
302
302
  end
303
303
 
304
- At the same time, we use the Ruby _flatmap_ method to flatten the array of chunks in the heartbeat payload into a set of tuples, which we
304
+ At the same time, we use the Ruby _flat_map_ method to flatten the array of chunks in the heartbeat payload into a set of tuples, which we
305
305
  associate with the heartbeating datanode and the time of receipt in __chunk_cache__:
306
306
 
307
- chunk_cache <= join([master_duty_cycle, last_heartbeat]).flat_map do |d, l|
307
+ chunk_cache <= (master_duty_cycle * last_heartbeat).flat_map do |d, l|
308
308
  unless l.payload[1].nil?
309
309
  l.payload[1].map do |pay|
310
310
  [l.peer, pay, Time.parse(d.val).to_f]
@@ -312,14 +312,13 @@ associate with the heartbeating datanode and the time of receipt in __chunk_cach
312
312
  end
313
313
  end
314
314
 
315
- We periodically garbage-collect this cached, removing entries for datanodes from whom we have not received a heartbeat in a configurable amount of time.
315
+ We periodically garbage-collect this cache, removing entries for datanodes from whom we have not received a heartbeat in a configurable amount of time.
316
316
  __last_heartbeat__ is an output interface provided by the __HeartbeatAgent__ module, and contains the most recent, non-stale heartbeat contents:
317
317
 
318
- chunk_cache <- join([master_duty_cycle, chunk_cache]).map do |t, c|
318
+ chunk_cache <-(master_duty_cycle * chunk_cache).pairs do |t, c|
319
319
  c unless last_heartbeat.map{|h| h.peer}.include? c.node
320
320
  end
321
321
 
322
-
323
322
  ## [BFS Client](https://github.com/bloom-lang/bud-sandbox/blob/master/bfs/bfs_client.rb)
324
323
 
325
324
  One of the most complicated parts of the basic GFS design is the client component. To minimize load on the centralized master, we take it off the critical
@@ -369,7 +368,6 @@ After defining some helper aggregates (__chunk_cnts_chunk__ or replica count by
369
368
 
370
369
  we define __lowchunks__ as the set of chunks whose replication factor is too low:
371
370
 
372
-
373
371
  lowchunks <= chunk_cnts_chunk { |c| [c.chunkid] if c.replicas < REP_FACTOR and !c.chunkid.nil?}
374
372
 
375
373
  We define __chosen_dest__ for a given underreplicated chunk as the datanode with
data/docs/c.html ADDED
@@ -0,0 +1,251 @@
1
+ <h1>Bud Cheat Sheet</h1>
2
+ <h2>General Bloom Syntax Rules</h2>
3
+ <p>Bloom programs are unordered sets of statements.<br>
4
+ Statements are delimited by semicolons (;) or newlines. <br>
5
+ As in Ruby, backslash is used to escape a newline.<br></p>
6
+ <h2>Simple embedding of Bud in a Ruby Class</h2>
7
+ <pre><code>require 'bud'
8
+
9
+ class Foo
10
+ include Bud
11
+
12
+ state do
13
+ ...
14
+ end
15
+
16
+ bloom do
17
+ ...
18
+ end
19
+ end
20
+ </code></pre>
21
+ <h2>State Declarations</h2>
22
+ <p>A <code>state</code> block contains Bud collection definitions. A Bud collection is a <em>set</em>
23
+ of <em>facts</em>; each fact is an array of Ruby values. Note that collections do not
24
+ contain duplicates (inserting a duplicate fact into a collection is ignored).</p>
25
+ <p>Like tables in a relational database, zero or more columns in a collection make
26
+ up the collection's <em>key</em>. Attempting to insert two facts that agree on the key
27
+ columns but are not duplicates results in a primary key violation (runtime
28
+ exception).</p>
29
+ <h3>Default Declaration Syntax</h3>
30
+ <p><em>BudCollection :name, [keys] =&gt; [values]</em></p>
31
+ <h3>table</h3>
32
+ <p>Contents persist in memory until explicitly deleted.<br>
33
+ Default attributes: <code>[:key] =&gt; [:val]</code></p>
34
+ <pre><code>table :keyvalue
35
+ table :composite, [:keyfield1, :keyfield2] =&gt; [:values]
36
+ table :noDups, [:field1, field2]
37
+ </code></pre>
38
+ <h3>scratch</h3>
39
+ <p>Contents emptied at start of each timestep.<br>
40
+ Default attributes: <code>[:key] =&gt; [:val]</code></p>
41
+ <pre><code>scratch :stats
42
+ </code></pre>
43
+ <h3>interface</h3>
44
+ <p>Scratch collections, used as connection points between modules.<br>
45
+ Default attributes: <code>[:key] =&gt; [:val]</code></p>
46
+ <pre><code>interface input, :request
47
+ interface output, :response
48
+ </code></pre>
49
+ <h3>channel</h3>
50
+ <p>Network channel manifested as a scratch collection.<br>
51
+ Facts that are inserted into a channel are sent to a remote host; the address of the remote host is specified in an attribute of the channel that is denoted with <code>@</code>.<br>
52
+ Default attributes: <code>[:@address, :val] =&gt; []</code></p>
53
+ <p>(Bloom statements with channel on lhs must use async merge (<code>&lt;~</code>).)</p>
54
+ <pre><code>channel :msgs
55
+ channel :req_chan, [:cartnum, :storenum, :@server] =&gt; [:command, :params]
56
+ </code></pre>
57
+ <h3>periodic</h3>
58
+ <p>System timer manifested as a scratch collection.<br>
59
+ System-provided attributes: <code>[:key] =&gt; [:val]</code><br>
60
+ &nbsp;&nbsp;&nbsp;&nbsp; (<code>key</code> is a unique ID, <code>val</code> is a Ruby Time converted to a string.)<br>
61
+ State declaration includes interval (in seconds).</p>
62
+ <p>(periodic can only be used on rhs of a Bloom statement.)</p>
63
+ <pre><code>periodic :timer, 0.1
64
+ </code></pre>
65
+ <h3>stdio</h3>
66
+ <p>Built-in scratch collection mapped to Ruby's <code>$stdin</code> and <code>$stdout</code><br>
67
+ System-provided attributes: <code>[:line] =&gt; []</code></p>
68
+ <p>Statements with stdio on lhs must use async merge (<code>&lt;~</code>).<br>
69
+ To capture <code>$stdin</code> on rhs, instantiate Bud with <code>:read_stdin</code> option.<br></p>
70
+ <h3>tctable</h3>
71
+ <p>Table collection mapped to a <a href="http://fallabs.com/tokyocabinet/">Tokyo Cabinet</a> store.<br>
72
+ Default attributes: <code>[:key] =&gt; [:val]</code></p>
73
+ <pre><code>tctable :t1
74
+ tctable :t2, [:k1, :k2] =&gt; [:v1, :v2]
75
+ </code></pre>
76
+ <h3>zktable</h3>
77
+ <p>Table collection mapped to an <a href="http://hadoop.apache.org/zookeeper/">Apache Zookeeper</a> store.<br>
78
+ System-provided attributes: <code>[:key] =&gt; [:val]</code><br>
79
+ State declaration includes Zookeeper path and optional TCP string (default: "localhost:2181")<br></p>
80
+ <pre><code>zktable :foo, "/bat"
81
+ zktable :bar, "/dat", "localhost:2182"
82
+ </code></pre>
83
+ <h2>Bloom Statements</h2>
84
+ <p><em>lhs BloomOp rhs</em></p>
85
+ <p>Left-hand-side (lhs) is a named <code>BudCollection</code> object.<br>
86
+ Right-hand-side (rhs) is a Ruby expression producing a <code>BudCollection</code> or <code>Array</code> of <code>Arrays</code>.<br>
87
+ BloomOp is one of the 4 operators listed below.</p>
88
+ <h3>Bloom Operators</h3>
89
+ <p>merges:</p>
90
+ <ul>
91
+ <li><code>left &lt;= right</code> &nbsp;&nbsp;&nbsp;&nbsp; (<em>instantaneous</em>)</li>
92
+ <li><code>left &lt;+ right</code> &nbsp;&nbsp;&nbsp;&nbsp; (<em>deferred</em>)</li>
93
+ <li><code>left &lt;~ right</code> &nbsp;&nbsp;&nbsp;&nbsp; (<em>asynchronous</em>)</li>
94
+ </ul>
95
+ <p>delete:</p>
96
+ <ul>
97
+ <li><code>left &lt;- right</code> &nbsp;&nbsp;&nbsp;&nbsp; (<em>deferred</em>)</li>
98
+ </ul>
99
+ <h3>Collection Methods</h3>
100
+ <p>Standard Ruby methods used on a BudCollection <code>bc</code>:</p>
101
+ <p>implicit map:</p>
102
+ <pre><code>t1 &lt;= bc {|t| [t.col1 + 4, t.col2.chomp]} # formatting/projection
103
+ t2 &lt;= bc {|t| t if t.col = 5} # selection
104
+ </code></pre>
105
+ <p><code>flat_map</code>:</p>
106
+ <pre><code>require 'backports' # flat_map not included in Ruby 1.8 by default
107
+
108
+ t3 &lt;= bc.flat_map do |t| # unnest a collection-valued attribute
109
+ bc.col4.map { |sub| [t.col1, t.col2, t.col3, sub] }
110
+ end
111
+ </code></pre>
112
+ <p><code>bc.reduce</code>, <code>bc.inject</code>:</p>
113
+ <pre><code>t4 &lt;= bc.reduce({}) do |memo, t| # example: groupby col1 and count
114
+ memo[t.col1] ||= 0
115
+ memo[t.col1] += 1
116
+ memo
117
+ end
118
+ </code></pre>
119
+ <p><code>bc.include?</code>:</p>
120
+ <pre><code>t5 &lt;= bc do |t| # like SQL's NOT IN
121
+ t unless t2.include?([t.col1, t.col2])
122
+ end
123
+ </code></pre>
124
+ <h2>BudCollection-Specific Methods</h2>
125
+ <p><code>bc.keys</code>: projects <code>bc</code> to key columns<br></p>
126
+ <p><code>bc.values</code>: projects <code>bc</code> to non-key columns<br></p>
127
+ <p><code>bc.inspected</code>: shorthand for <code>bc {|t| [t.inspect]}</code></p>
128
+ <pre><code>stdio &lt;~ bc.inspected
129
+ </code></pre>
130
+ <p><code>chan.payloads</code>: projects <code>chan</code> to non-address columns. Only defined for channels.</p>
131
+ <pre><code># at sender
132
+ msgs &lt;~ requests {|r| "127.0.0.1:12345", r}
133
+ # at receiver
134
+ requests &lt;= msgs.payloads
135
+ </code></pre>
136
+ <p><code>bc.exists?</code>: test for non-empty collection. Can optionally pass in a block.</p>
137
+ <pre><code>stdio &lt;~ [["Wake Up!"] if timer.exists?]
138
+ stdio &lt;~ requests do |r|
139
+ [r.inspect] if msgs.exists?{|m| r.ident == m.ident}
140
+ end
141
+ </code></pre>
142
+ <h2>SQL-style grouping/aggregation (and then some)</h2>
143
+ <ul>
144
+ <li><code>bc.group([:col1, :col2], min(:col3))</code>. <em>akin to min(col3) GROUP BY (col1,col2)</em></li>
145
+ <li>exemplary aggs: <code>min</code>, <code>max</code>, <code>choose</code></li>
146
+ <li>summary aggs: <code>sum</code>, <code>avg</code>, <code>count</code></li>
147
+ <li>structural aggs: <code>accum</code></li>
148
+ <li><code>bc.argmax([:col1], :col2)</code> &nbsp;&nbsp;&nbsp;&nbsp; <em>returns the bc tuple per col1 that has highest col2</em></li>
149
+ <li><code>bc.argmin([:col1], :col2)</code></li>
150
+ </ul>
151
+ <h3>Built-in Aggregates:</h3>
152
+ <ul>
153
+ <li>Exemplary aggs: <code>min</code>, <code>max</code>, <code>choose</code></li>
154
+ <li>Summary aggs: <code>count</code>, <code>sum</code>, <code>avg</code></li>
155
+ <li>Structural aggs: <code>accum</code></li>
156
+ </ul>
157
+ <p>Note that custom aggregation can be written using <code>reduce</code>.</p>
158
+ <h2>Collection Combination (Join)</h2>
159
+ <p>To match items across two (or more) collections, use the <code>*</code> operator, followed by methods to filter/format the result (<code>pairs</code>, <code>matches</code>, <code>combos</code>, <code>lefts</code>, <code>rights</code>).</p>
160
+ <h3>Methods on Combinations (Joins)</h3>
161
+ <p><code>pairs(</code><em>hash pairs</em><code>)</code>: <br>
162
+ Given a <code>*</code> expression, form all pairs of items with value matches in the hash-pairs attributes. Hash pairs can be fully qualified (<code>coll1.attr1 =&gt; coll2.attr2</code>) or shorthand (<code>:attr1 =&gt; :attr2</code>).</p>
163
+ <pre><code># for each inbound msg, find match in a persistent buffer
164
+ result &lt;= (msg * buffer).pairs(:val =&gt; :key) {|m, b| [m.address, m.val, b.val] }
165
+ </code></pre>
166
+ <p><code>combos(</code><em>hash pairs</em><code>)</code>: <br>
167
+ Alias for <code>pairs</code>, more readable for multi-collection <code>*</code> expressions. Must use fully-qualified hash pairs.</p>
168
+ <pre><code># the following 2 Bloom statements are equivalent to this SQL
169
+ # SELECT r.a, s_tab.b, t.c
170
+ # FROM r, s_tab, t
171
+ # WHERE r.x = s_tab.x
172
+ # AND s_tab.x = t.x;
173
+
174
+ # multiple column matches
175
+ out &lt;= (r * s_tab * t).combos(r.x =&gt; s_tab.x, s_tab.x =&gt; t.x) do |t1, t2, t3|
176
+ [t1.a, t2.b, t3.c]
177
+ end
178
+
179
+ # column matching done per pair: this will be very slow
180
+ out &lt;= (r * s_tab * t).combos do |t1, t2, t3|
181
+ [t1.a, t2.b, t3.c] if r.x == s_tab.x and s_tab.x == t.x
182
+ end
183
+ </code></pre>
184
+ <p><code>matches</code>:<br>
185
+ Shorthand for <code>combos</code> with hash pairs for all attributes with matching names.</p>
186
+ <pre><code># Equivalent to the above statements if x is the only attribute name in common:
187
+ out &lt;= (r * s_tab * t).matches do {|t1, t2, t3| [t1.a, t2.b, t3.c]}
188
+ </code></pre>
189
+ <p><code>lefts(</code><em>hash pairs</em><code>)</code>: <br>
190
+ Like <code>pairs</code>, but implicitly includes a block that projects down to the left item in each pair.</p>
191
+ <p><code>rights(</code><em>hash pairs</em><code>)</code>:
192
+ Like <code>pairs</code>, but implicitly includes a block that projects down to the right item in each pair.</p>
193
+ <p><code>flatten</code>:<br>
194
+ <code>flatten</code> is a bit like SQL's <code>SELECT *</code>: it produces a collection of concatenated objects, with a schema that is the concatenation of the schemas in tablelist (with duplicate names disambiguated.) Useful for chaining to operators that expect input collections with schemas, e.g. group:</p>
195
+ <pre><code>out &lt;= (r * s).matches.flatten.group([:a], max(:b))
196
+ </code></pre>
197
+ <p><code>outer(</code><em>hash pairs</em><code>)</code>:<br>
198
+ Left Outer Join. Like <code>pairs</code>, but objects in the first collection will be produced nil-padded if they have no match in the second collection.</p>
199
+ <h2>Temp Collections</h2>
200
+ <p><code>temp</code><br>
201
+ Temp collections are scratches defined within a <code>bloom</code> block:</p>
202
+ <pre><code>temp :my_scratch1 &lt;= foo
203
+ </code></pre>
204
+ <p>The schema of a temp collection in inherited from the rhs; if the rhs has no
205
+ schema, a simple one is manufactured to suit the data found in the rhs at
206
+ runtime: <code>[c0, c1, ...]</code>.</p>
207
+ <h2>Bud Modules</h2>
208
+ <p>A Bud module combines state (collections) and logic (Bloom rules). Using modules allows your program to be decomposed into a collection of smaller units.</p>
209
+ <p>Definining a Bud module is identical to defining a Ruby module, except that the module can use the <code>bloom</code>, <code>bootstrap</code>, and <code>state</code> blocks described above.</p>
210
+ <p>There are two ways to use a module <em>B</em> in another Bloom module <em>A</em>:</p>
211
+ <ol>
212
+ <li>
213
+ <p><code>include B</code>: This "inlines" the definitions (state and logic) from <em>B</em> into
214
+ <em>A</em>. Hence, collections defined in <em>B</em> can be accessed from <em>A</em> (via the
215
+ same syntax as <em>A</em>'s own collections). In fact, since Ruby is
216
+ dynamically-typed, Bloom statements in <em>B</em> can access collections
217
+ in <em>A</em>!</p>
218
+ </li>
219
+ <li>
220
+ <p><code>import B =&gt; :b</code>: The <code>import</code> statement provides a more structured way to
221
+ access another module. Module <em>A</em> can now access state defined in <em>B</em> by
222
+ using the qualifier <code>b</code>. <em>A</em> can also import two different copies of <em>B</em>,
223
+ and give them local names <code>b1</code> and <code>b2</code>; these copies will be independent
224
+ (facts inserted into a collection defined in <code>b1</code> won't also be inserted
225
+ into <code>b2</code>'s copy of the collection).</p>
226
+ </li>
227
+ </ol>
228
+ <h2>Skeleton of a Bud Module</h2>
229
+ <pre><code>require 'rubygems'
230
+ require 'bud'
231
+
232
+ module YourModule
233
+ include Bud
234
+
235
+ state do
236
+ ...
237
+ end
238
+
239
+ bootstrap do
240
+ ...
241
+ end
242
+
243
+ bloom :some_stmts do
244
+ ...
245
+ end
246
+
247
+ bloom :more_stmts do
248
+ ...
249
+ end
250
+ end
251
+ </code></pre>
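Note (illustration, not part of the diff): the cheat sheet above describes `flat_map` as unnesting a collection-valued attribute, and `pairs` as forming all pairs of items whose hash-pair attributes match. The sketch below mimics both ideas in plain Ruby, with no Bud dependency. The `Row`, `Msg`, and `Buf` structs and their sample data are hypothetical stand-ins for Bud collections, and the `product`/`select` pipeline is only an approximation of what `pairs(:val => :key)` computes, not how Bud implements it.

```ruby
# Plain-Ruby illustration only; nothing here calls the Bud API.
# Row, Msg and Buf are hypothetical stand-ins for Bud collection tuples.
Row = Struct.new(:col1, :col2, :col3, :col4)
Msg = Struct.new(:address, :val)
Buf = Struct.new(:key, :val)

rows = [
  Row.new(1, "a", :x, [10, 20]),
  Row.new(2, "b", :y, [30])
]

# Unnest the collection-valued attribute col4:
# each element of col4 yields one flat output fact.
unnested = rows.flat_map do |t|
  t.col4.map { |sub| [t.col1, t.col2, t.col3, sub] }
end

msgs   = [Msg.new("h1", 7), Msg.new("h2", 9)]
buffer = [Buf.new(7, "seven"), Buf.new(8, "eight")]

# Rough analogue of (msg * buffer).pairs(:val => :key) { |m, b| ... }:
# build the cross product, keep pairs where m.val == b.key, then project.
joined = msgs.product(buffer)
             .select { |m, b| m.val == b.key }
             .map    { |m, b| [m.address, m.val, b.val] }

p unnested
p joined
```

Filtering a full cross product is the slow path, the same cost the cheat sheet warns about for column matching done inside the block. Passing hash pairs to `pairs` or `combos` presumably lets Bud avoid enumerating non-matching combinations.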
data/docs/cheat.md CHANGED
@@ -9,19 +9,26 @@ As in Ruby, backslash is used to escape a newline.<br>
9
9
  require 'bud'
10
10
 
11
11
  class Foo
12
- include Bud
12
+ include Bud
13
13
 
14
- state do
15
- ...
16
- end
14
+ state do
15
+ ...
16
+ end
17
17
 
18
- bloom do
19
- ...
20
- end
18
+ bloom do
19
+ ...
20
+ end
21
21
  end
22
22
 
23
23
  ## State Declarations ##
24
- A `state` block contains Bud collection definitions.
24
+ A `state` block contains Bud collection definitions. A Bud collection is a *set*
25
+ of *facts*; each fact is an array of Ruby values. Note that collections do not
26
+ contain duplicates (inserting a duplicate fact into a collection is ignored).
27
+
28
+ Like a table in a relational databas, a subset of the columns in a collection
29
+ makeup the collection's _key_. Attempting to insert two facts into a collection
30
+ that agree on the key columns (but are not duplicates) results in a runtime
31
+ exception.
25
32
 
26
33
  ### Default Declaration Syntax ###
27
34
  *BudCollection :name, [keys] => [values]*
@@ -57,10 +64,18 @@ Default attributes: `[:@address, :val] => []`
57
64
  channel :msgs
58
65
  channel :req_chan, [:cartnum, :storenum, :@server] => [:command, :params]
59
66
 
67
+ ### loopback ###
68
+ A network channel that delivers tuples back to the current Bud instance.<br>
69
+ Default attributes: `[:key] => [:val]`
70
+
71
+ (Bloom statements with loopback on lhs must use async merge (`<~`).)
72
+
73
+ loopback :talk_to_self
74
+
60
75
  ### periodic ###
61
76
  System timer manifested as a scratch collection.<br>
62
77
  System-provided attributes: `[:key] => [:val]`<br>
63
- &nbsp;&nbsp;&nbsp;&nbsp; (`key` is a unique ID, `val` is a Ruby Time converted to a string.)<br>
78
+ &nbsp;&nbsp;&nbsp;&nbsp; (`key` is a unique ID, `val` is a Ruby `Time` object.)<br>
64
79
  State declaration includes interval (in seconds).
65
80
 
66
81
  (periodic can only be used on rhs of a Bloom statement.)
@@ -68,11 +83,19 @@ State declaration includes interval (in seconds).
68
83
  periodic :timer, 0.1
69
84
 
70
85
  ### stdio ###
71
- Built-in scratch collection mapped to Ruby's `$stdin` and `$stdout`<br>
86
+ Built-in scratch collection for performing terminal I/O.<br>
72
87
  System-provided attributes: `[:line] => []`
73
88
 
74
89
  Statements with stdio on lhs must use async merge (`<~`).<br>
75
- To capture `$stdin` on rhs, instantiate Bud with `:read_stdin` option.<br>
90
+ Using `stdio` on the lhs of an async merge results in writing to the `IO` object specified by the `:stdout` Bud option (`$stdout` by default).<br>
91
+ To use `stdio` on rhs, instantiate Bud with `:stdin` option set to an `IO` object (e.g., `$stdin`).<br>
92
+
93
+ ### dbm_table ###
94
+ Table collection mapped to a [DBM] (http://en.wikipedia.org/wiki/Dbm) store.<br>
95
+ Default attributes: `[:key] => [:val]`
96
+
97
+ dbm_table :t1
98
+ dbm_table :t2, [:k1, :k2] => [:v1, :v2]
76
99
 
77
100
  ### tctable ###
78
101
  Table collection mapped to a [Tokyo Cabinet](http://fallabs.com/tokyocabinet/) store.<br>
@@ -95,7 +118,7 @@ State declaration includes Zookeeper path and optional TCP string (default: "loc
95
118
 
96
119
  Left-hand-side (lhs) is a named `BudCollection` object.<br>
97
120
  Right-hand-side (rhs) is a Ruby expression producing a `BudCollection` or `Array` of `Arrays`.<br>
98
- BloomOp is one of the 5 operators listed below.
121
+ BloomOp is one of the 4 operators listed below.
99
122
 
100
123
  ### Bloom Operators ###
101
124
  merges:
@@ -108,13 +131,6 @@ delete:
108
131
 
109
132
  * `left <- right` &nbsp;&nbsp;&nbsp;&nbsp; (*deferred*)
110
133
 
111
- insert:<br>
112
-
113
- * `left << [...]` &nbsp;&nbsp;&nbsp;&nbsp; (*instantaneous*)
114
-
115
- Note that unlike merge/delete, insert expects a single fact on the rhs, rather
116
- than a collection.
117
-
118
134
  ### Collection Methods ###
119
135
  Standard Ruby methods used on a BudCollection `bc`:
120
136
 
@@ -154,7 +170,7 @@ implicit map:
154
170
 
155
171
  stdio <~ bc.inspected
156
172
 
157
- `chan.payloads`: projects `chan` to non-address columns. only defined for channels
173
+ `chan.payloads`: projects `chan` to non-address columns. Only defined for channels.
158
174
 
159
175
  # at sender
160
176
  msgs <~ requests {|r| "127.0.0.1:12345", r}
@@ -191,13 +207,13 @@ To match items across two (or more) collections, use the `*` operator, followed
191
207
  ### Methods on Combinations (Joins) ###
192
208
 
193
209
  `pairs(`*hash pairs*`)`: <br>
194
- given a `*` expression, form all pairs of items with value matches in the hash-pairs attributes. Hash pairs can be fully qualified (`coll1.attr1 => coll2.attr2`) or shorthand (`:attr1 => :attr2`).
210
+ Given a `*` expression, form all pairs of items with value matches in the hash-pairs attributes. Hash pairs can be fully qualified (`coll1.attr1 => coll2.attr2`) or shorthand (`:attr1 => :attr2`).
195
211
 
196
212
  # for each inbound msg, find match in a persistent buffer
197
213
  result <= (msg * buffer).pairs(:val => :key) {|m, b| [m.address, m.val, b.val] }
198
214
 
199
- `pairs(`*hash pairs*`)`: <br>
200
- alias for `pairs`, more readable for multi-collection `*` expressions. Must use fully-qualified hash pairs.
215
+ `combos(`*hash pairs*`)`: <br>
216
+ Alias for `pairs`, more readable for multi-collection `*` expressions. Must use fully-qualified hash pairs.
201
217
 
202
218
  # the following 2 Bloom statements are equivalent to this SQL
203
219
  # SELECT r.a, s_tab.b, t.c
@@ -211,16 +227,16 @@ alias for `pairs`, more readable for multi-collection `*` expressions. Must use
211
227
  end
212
228
 
213
229
  # column matching done per pair: this will be very slow
214
- out <= join([r,s_tab,t]) do |t1, t2, t3|
230
+ out <= (r * s_tab * t).combos do |t1, t2, t3|
215
231
  [t1.a, t2.b, t3.c] if r.x == s_tab.x and s_tab.x == t.x
216
232
  end
217
233
 
218
234
  `matches`:<br>
219
- Shorthand for `combos` with hash pairs for all attributes with matching names.
235
+ Shorthand for `combos` with hash pairs for all attributes with matching names; this is called the "natural join" in SQL.
220
236
 
221
237
  # Equivalent to the above statements if x is the only attribute name in common:
222
- out <= (r * s_tab * t).matches do {|t1, t2, t3| [t1.a, t2.b, t3.c]}
223
-
238
+ out <= (r * s_tab * t).matches {|t1, t2, t3| [t1.a, t2.b, t3.c]}
239
+
224
240
  `lefts(`*hash pairs*`)`: <br>
225
241
  Like `pairs`, but implicitly includes a block that projects down to the left item in each pair.
226
242
 
@@ -232,9 +248,8 @@ Like `pairs`, but implicitly includes a block that projects down to the right it
232
248
 
233
249
  out <= (r * s).matches.flatten.group([:a], max(:b))
234
250
 
235
- ### Left Join ###
236
- `leftjoin([`*t1, t2*`]` *, optional hash pairs, ...*`)`<br>
237
- Left Outer Join. Note postfix syntax with array of 2 collections as first argument, hash pairs as subsequent arguments. Objects in the first collection will be included in the output even if no match is found in the second collection.
251
+ `outer(`*hash pairs*`)`:<br>
252
+ Left Outer Join. Like `pairs`, but objects in the first collection will be produced nil-padded if they have no match in the second collection.
238
253
 
239
254
  ## Temp Collections ##
240
255
  `temp`<br>
data/docs/deploy.md CHANGED
@@ -1,45 +1,43 @@
1
1
  # Deployment
2
2
 
3
- Bud provides support for deploying a program onto a set of Bud instances. At the moment, two types of deployments are supported: local deployment and EC2 deployment. Intuitively, you include the module corresponding to the type of deployment you want into a Bud class, which you instantiate and run on a node called the "deployer." The deployer then spins up a requested number of Bud instances and distributes initial data.
3
+ Bud provides support for deploying a program onto a set of Bud instances. At the moment, two types of deployments are supported: fork-based local deployment and EC2 deployment. Intuitively, you include the module corresponding to the type of deployment you want into a Bud class, which you instantiate and run on a node called the "deployer." The deployer then spins up a requested number of Bud instances and distributes initial data.
4
4
 
5
5
  First, decide which type of deployment you want to use.
6
6
 
7
- ## Local Deployment
7
+ ## Fork-based Local Deployment
8
8
 
9
- To use local deployment, you'll need to require it in your program:
9
+ To use fork-based deployment, you'll need to include it in your class:
10
10
 
11
- require 'bud/deploy/localdeploy'
11
+ include ForkDeploy
12
12
 
13
- Don't forget to include it in your class:
14
-
15
- include LocalDeploy
16
-
17
- The next step is to declare how many nodes you want to the program to be spun up on. You need to do this in a `deploystrap` block. A `deploystrap` block is run before `bootstrap`, and is only run for a Bud class that is instantiated with the option "`:deploy => true`". As an example:
13
+ The next step is to declare how many nodes you want to the program to be spun up on. You need to do this in a `deploystrap` block. A `deploystrap` block is run before `bootstrap`, and is only run for a Bud class that is instantiated with the option `:deploy => true`. As an example:
18
14
 
19
15
  deploystrap do
20
16
  num_nodes <= [[2]]
21
17
  end
22
18
 
23
- Local deployment will spin up `num_nodes` local processes, each containing one Bud instance, running the class that you include `LocalDeploy` in. The deployment code will populate a binary collection called `node`; the first columm is a "node ID" -- a distinct integer from the range `[0, num_nodes - 1]` -- and the second argument is an "IP:port" string associated with the node. Nodes are spun up on ephemeral ports, listening on "localhost".
19
+ Fork-based deployment will spin up `num_nodes` local processes, each containing one Bud instance, running the class that you include `ForkDeploy` in. The deployment code will populate a binary collection called `node`; the first columm is a "node ID" -- a distinct integer from the range `[0, num_nodes - 1]` -- and the second argument is an "IP:port" string associated with the node. Nodes are spun up on ephemeral ports, listening on "localhost".
24
20
 
25
- Now, you need to define how you want the initial data to be distributed. You can do this, for example, by writing (multiple) rules with `initial_data` in the head. The schema of `initial_data` is as follows: [node ID, relation name as a symbol, list of tuples]. For example, to distribute the IP address of the "deployer" to all of the other nodes in a relation called `master`, you might decide to write something like this:
21
+ Now, you need to define how you want the initial data to be distributed. You can do this, for example, by writing (multiple) rules with `initial_data` in the head. These rules can appear in any `bloom` block in your program. The schema of `initial_data` is as follows: [node ID, relation name as a symbol, list of tuples].
26
22
 
27
- initial_data <= node.map {|n| [n.uid, :master, [[ip_port]]]}
23
+ For example, to distribute the IP address and port of the "deployer" to all of the other nodes in a relation called `master`, you might decide to write something like this:
28
24
 
29
- Note that the relation ("master" in this case) is never a channel. You may only distribute data to scratches and tables. Initial data is transferred only after _all_ nodes are spun up; this ensures that initial data will never be lost because a node is not yet listening on a socket, for example. Initial data is transmitted atomically to each node; this means that on each node, _all_ initial data in _all_ relations will arrive at the same Bud timestep. However, there is no global barrier for transfer of initial data. For example, if initial data is distributed to nodes 1 and 2, node 1 may receive its initial data first, and then send subsequent messages on channels to node 2 which node 2 may receive before its initial data.
25
+ initial_data <= node {|n| [n.uid, :master, [[ip_port]]]}
30
26
 
31
- The rules defining `initial_data` may appear in any `bloom` block.
27
+ Note that the relation (`master` in this case) cannot be a channel -- you may only distribute data to scratches and tables. Initial data is transferred only after _all_ nodes are spun up; this ensures that initial data will never be lost because a node is not yet listening on a socket, for example. Initial data is transmitted atomically to each node; this means that on each node, _all_ initial data in _all_ relations will arrive at the same Bud timestep. However, there is no global barrier for transfer of initial data. For example, if initial data is distributed to nodes 1 and 2, node 1 may receive its initial data first, and then send subsequent messages on channels to node 2 which node 2 may receive before its initial data.
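To make the tuple shapes above concrete, here is a minimal plain-Ruby sketch that does not use Bud at all. It only mimics the layout of the `node` collection and of the `initial_data` tuples that the rule above would produce. The names `NodeTuple`, `make_nodes`, and `build_initial_data`, the port numbers, and the deployer address are all hypothetical and chosen for illustration; the real deployer populates `node` itself and uses ephemeral ports.

```ruby
# Plain-Ruby illustration of the tuple shapes described in the docs.
# Nothing here is part of Bud; all names below are hypothetical.
NodeTuple = Struct.new(:uid, :addr)

# Stand-in for the `node` collection: one [node ID, "IP:port"] pair per node.
# The base port is an arbitrary assumption for this sketch.
def make_nodes(num_nodes, base_port = 5000)
  (0...num_nodes).map { |i| NodeTuple.new(i, "localhost:#{base_port + i}") }
end

# Build one initial_data tuple per node, following the documented schema:
# [node ID, relation name as a symbol, list of tuples].
def build_initial_data(nodes, deployer_ip_port)
  nodes.map { |n| [n.uid, :master, [[deployer_ip_port]]] }
end

nodes = make_nodes(3)
initial_data = build_initial_data(nodes, "localhost:4000")

# Print each tuple so the nesting of the third field is visible.
initial_data.each { |t| p t }
```

Note the doubly nested third field: it is a list of tuples, so a single one-column tuple for `master` is written as `[[ip_port]]`.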
32
28
 
33
- The final step is to add `:deploy => true` to the instantiation of your class. Note that the local deployer will spin up nodes without `:deploy => true`, so you don't forkbomb your system.
29
+ The final step is to add `:deploy => true` to the instantiation of your class. Note that the fork-based deployer will spin up nodes without `:deploy => true`, so you don't forkbomb your system.
34
30
 
35
31
 
36
32
  ## EC2 Deployment
37
33
 
38
- To use EC2 deployment you'll need to require it in your program:
34
+ To use EC2 deployment, you'll need to require it in your program:
39
35
 
40
36
  require 'bud/deploy/ec2deploy'
41
37
 
42
- Don't forget to include it in your class:
38
+ Note that the `amazon-ec2`, `net-scp`, and `net-ssh` gems must be installed.
39
+
40
+ Next, include the `EC2Deploy` module in your class:
43
41
 
44
42
  include EC2Deploy
45
43
 
@@ -71,26 +69,28 @@ This recursively copies all directories and files rooted at the current working
71
69
 
72
70
  Finally, `ec2_key_location` is the path to the private key of the `key_name` keypair. For example:
73
71
 
74
- key_name <= [["/home/bob/.ssh/ec2"]]
72
+ ec2_key_location <= [["/home/bob/.ssh/bob.pem"]]
75
73
 
76
- EC2 deployment will spin up `num_nodes` instances (using defaults) on EC2 using a pre-rolled Bud AMI based on Amazon's 32-bit Linux AMI (`ami-8c1fece5`). Each instance contains one Bud instance, which runs the `ruby_command`. Like before, the deployment code will populate a binary relation called `node`; the first argument is a "node ID" -- a distinct integer from the range [0, num_nodes - 1] -- and the second argument is an "IP:port" string associated with the node. Nodes are currently spun up on fixed port 54321.
74
+ EC2 deployment will spin up `num_nodes` instances (using defaults) on EC2 using a pre-rolled Bud AMI based on Amazon's 32-bit Linux AMI (`ami-8c1fece5`). Each instance contains one Bud instance, which runs the `ruby_command`. Like before, the deployment code will populate a binary relation called `node`; the first attribute is a "node ID" -- a distinct integer from the range [0, num_nodes - 1] -- and the second attribute is an "IP:port" string associated with the node. Nodes are currently spun up on fixed port 54321.
77
75
 
78
- Defining initial data works exactly the same way with EC2 deployment as it does with local deployment.
76
+ Defining `initial_data` works exactly the same way with EC2 deployment as it does with local deployment.
79
77
 
80
- There is a slight catch with EC2 deployment. Sometimes EC2 tells us that the nodes have all started up, but really, one or more nodes will never start up. Currently, in this scenario, deployment exceeds the maximum number of ssh retries, and throws an exception.
78
+ There is a slight catch with EC2 deployment. Sometimes EC2 tells us that the nodes have all started up, but really, one or more nodes will never start up. Currently, in this scenario, deployment exceeds the maximum number of SSH retries, and throws an exception.
81
79
 
82
80
  Note that EC2 deployment does *not* shut down the EC2 nodes it starts up under any circumstances. This means you must use some alternate means to shut down the nodes, such as logging onto the EC2 web interface and terminating the nodes.
83
81
 
84
82
  ## Examples
85
83
 
86
- Check out the `examples/deploy` directory in Bud. There is a simple token ring example that establishes a ring involving 10 nodes and sends a token around the ring continuously. This example can be deployed locally:
84
+ Check out the `examples/deploy` directory in Bud. There is a simple token ring example that establishes a ring involving 10 nodes and sends a token around the ring continuously. This example can be deployed locally using the fork-based deployer:
87
85
 
88
- ruby tokenring-local.rb
86
+ ruby tokenring-fork.rb
89
87
 
90
88
  or on EC2:
91
89
 
92
- ruby tokenring-ec2.rb local_ip:local_port ext_ip true
90
+ ruby tokenring-ec2.rb local_ip:local_port ext_ip:ext_port true
91
+
92
+ "ext_ip" and "ext_port" should be set to the externally-visible IP and port of the computer you are deploying from. For example, if you are behind a home router, you will want to set "ext_ip" to your public IP address, and ensure that your router forwards "ext_port" to local_port on the computer with private IP "local_ip".
93
93
 
94
- Note that before running `tokenring-ec2`, you must create a "keys.rb" file that contains `access_key_id`, `secret_access_key`, `key_name` and `ec2_key_location`.
94
+ Note that before running `tokenring-ec2`, you must create a file named "keys.rb" that contains definitions for `access_key_id`, `secret_access_key`, `key_name` and `ec2_key_location`.
95
95
 
96
- Output will be displayed to show the progress of the deployment. Be patient, it may take a while for output to appear. Once deployment is complete and all nodes are ready, each node will display output indicating when it has the token. All output will be visible for the local deployment case, whereas only the deployer node's output will be visible for the EC2 deployment case (stdout of all other nodes is materialized to disk).
96
+ Output will be displayed to show the progress of the deployment. Be patient; it may take a while for output to appear. Once deployment is complete and all nodes are ready, each node will display output indicating when it has the token. All output will be visible for the fork-based deployment case, whereas only the deployer node's output will be visible for the EC2 deployment case (stdout of all other nodes is written to local disk).
data/docs/getstarted.md CHANGED
@@ -2,7 +2,11 @@
2
2
  In this document we'll do a hands-on tour of Bud and its Bloom DSL for Ruby. We'll start with some examples, and introduce concepts as we go.
3
3
 
4
4
  ## Installation ##
5
- You know the drill!
5
+ Bud depends on one library that needs to be installed separately:
6
+
7
+ * [GraphViz](http://www.graphviz.org/Download.php) (2.26.3 recommended)
8
+
9
+ Once that's done, you know the drill!
6
10
 
7
11
  % gem install bud
8
12
 
@@ -248,8 +252,6 @@ And here's the code:
248
252
  end
249
253
  end
250
254
 
251
-
252
-
253
255
  if ARGV.length == 2
254
256
  server = ARGV[1]
255
257
  else
@@ -264,7 +266,7 @@ The ChatClient class has a typical Ruby `initialize` method that sets up two loc
264
266
 
265
267
  The next block in the class is the first Bloom `bootstrap` block we've seen. This is a set of Bloom statements that are evaluated only once, just before the first "regular" timestep of the system. In this case, we bootstrap the client by sending a message to the server on the connect channel, containing the client's address (via the built-in Bud instance method `ip_port`) and chosen nickname.
266
268
 
267
- After that comes a bloom block, with the name `:chatter`. It contains two statements: one to take stdio input from the terminal and send it to the server via mcast, and another to receive mcasts and place them on stdio output. The first statement has the built-in `stdio` scratch on the rhs: this includes any lines of terminal input that arrived since the last timestep. For each line of terminal input, the `do...end` block formats an `mcast` message destined to the address in the instance variable `@server`, with an array as the payload. The rhs of the second statement takes `mcast` messages that arrived since the last timestep. For each message `m`, the `m.val` expression in the block returns the message payload; the call to the Ruby instance method `pretty_print` formats the message so it will look nice on-screen. These formatted strings are placed (asynchronously, as before) into `stdio` on the left.
269
+ After that comes a Bloom block with the name `:chatter`. It contains two statements: one to take stdio input from the terminal and send it to the server via mcast, and another to receive mcasts and place them on stdio output. The first statement has the built-in `stdio` scratch on the rhs: this includes any lines of terminal input that arrived since the last timestep. For each line of terminal input, the `do...end` block formats an `mcast` message destined to the address in the instance variable `@server`, with an array as the payload. The rhs of the second statement takes `mcast` messages that arrived since the last timestep. For each message `m`, the `m.val` expression in the block returns the message payload; the call to the Ruby instance method `pretty_print` formats the message so it will look nice on-screen. These formatted strings are placed (asynchronously, as before) into `stdio` on the left.
268
270
 
269
271
  The remaining lines are Ruby driver code to instantiate and run the ChatClient class (which includes the `Bud` module) using arguments from the command line. Note the option `:read_stdin => true` to `ChatClient.new`: this causes the Bud runtime to capture stdin via the built-in `stdio` collection.
270
272