bud 0.0.3 → 0.0.4
- data/README +33 -16
- data/bin/budplot +42 -65
- data/bin/budtimelines +235 -0
- data/bin/budvis +24 -122
- data/bin/rebl +1 -0
- data/docs/README.md +21 -10
- data/docs/bfs.md +4 -6
- data/docs/c.html +251 -0
- data/docs/cheat.md +45 -30
- data/docs/deploy.md +26 -26
- data/docs/getstarted.md +6 -4
- data/docs/visualizations.md +43 -31
- data/examples/chat/chat.rb +4 -9
- data/examples/chat/chat_server.rb +1 -8
- data/examples/deploy/deploy_ip_port +1 -0
- data/examples/deploy/keys.rb +5 -0
- data/examples/deploy/tokenring-ec2.rb +9 -9
- data/examples/deploy/{tokenring-local.rb → tokenring-fork.rb} +3 -5
- data/examples/deploy/tokenring-thread.rb +15 -0
- data/examples/deploy/tokenring.rb +25 -17
- data/lib/bud/aggs.rb +87 -25
- data/lib/bud/bud_meta.rb +48 -31
- data/lib/bud/bust/bust.rb +16 -15
- data/lib/bud/collections.rb +207 -232
- data/lib/bud/depanalysis.rb +1 -0
- data/lib/bud/deploy/countatomicdelivery.rb +8 -20
- data/lib/bud/deploy/deployer.rb +16 -16
- data/lib/bud/deploy/ec2deploy.rb +34 -35
- data/lib/bud/deploy/forkdeploy.rb +90 -0
- data/lib/bud/deploy/threaddeploy.rb +38 -0
- data/lib/bud/graphs.rb +103 -199
- data/lib/bud/joins.rb +190 -41
- data/lib/bud/monkeypatch.rb +84 -0
- data/lib/bud/rebl.rb +8 -1
- data/lib/bud/rewrite.rb +152 -49
- data/lib/bud/server.rb +1 -0
- data/lib/bud/state.rb +24 -10
- data/lib/bud/storage/dbm.rb +170 -0
- data/lib/bud/storage/tokyocabinet.rb +5 -1
- data/lib/bud/stratify.rb +6 -7
- data/lib/bud/viz.rb +31 -17
- data/lib/bud/viz_util.rb +204 -0
- data/lib/bud.rb +271 -244
- data/lib/bud.rb.orig +806 -0
- metadata +43 -22
- data/docs/bfs.raw +0 -251
- data/docs/diffs +0 -181
- data/examples/basics/out +0 -1103
- data/examples/basics/out.new +0 -856
- data/lib/bud/deploy/localdeploy.rb +0 -53
data/docs/README.md
CHANGED
@@ -3,13 +3,24 @@ Welcome to the documentation for *Bud*, a prototype of Bloom under development.
|
|
3
3
|
|
4
4
|
The documents here are organized to be read in any order, but you might like to try the following:
|
5
5
|
|
6
|
-
* **intro.md**: A brief introduction to Bud and Bloom
|
7
|
-
* **getstarted.md**: A quickstart to teach you basic Bloom
|
8
|
-
|
9
|
-
|
10
|
-
* **
|
11
|
-
|
12
|
-
* **
|
13
|
-
* **
|
14
|
-
* **
|
15
|
-
|
6
|
+
* **[intro.md](intro.md)**: A brief introduction to Bud and Bloom.
|
7
|
+
* **[getstarted.md](getstarted.md)**: A quickstart to teach you basic Bloom
|
8
|
+
concepts, the use of the `rebl` interactive terminal, and the embedding of Bloom
|
9
|
+
code in Ruby via the `Bud` module.
|
10
|
+
* **[operational.md](operational.md)**: An operational view of Bloom, to provide
|
11
|
+
a more detailed model of how Bloom code is evaluated by Bud.
|
12
|
+
* **[cheat.md](cheat.md)**: A concise "cheat sheet" to remind you about Bloom syntax.
|
13
|
+
* **[modules.md](modules.md)**: An overview of Bloom's modularity features.
|
14
|
+
* **[ruby_hooks.md](ruby_hooks.md)**: Bud module methods that allow you to
|
15
|
+
interact with the Bud evaluator from other Ruby threads.
|
16
|
+
* **[visualizations.md](visualizations.md)**: Overview of the `budvis` and
|
17
|
+
`budplot` tools for visualizing Bloom program analyses.
|
18
|
+
* **[bfs.md](bfs.md)**: A walkthrough of the Bloom distributed filesystem.
|
19
|
+
|
20
|
+
In addition, the **[bud-sandbox](http://github.com/bloom-lang/bud-sandbox)**
|
21
|
+
GitHub repository contains lots of useful libraries and example programs built
|
22
|
+
using Bloom.
|
23
|
+
|
24
|
+
Finally, the Bud gem ships with RubyDoc on the language constructs and runtime
|
25
|
+
hooks provided by the Bud module. (To see rdoc, run `gem server` from a command
|
26
|
+
line and open [http://0.0.0.0:8808/](http://0.0.0.0:8808/))
|
data/docs/bfs.md
CHANGED
@@ -301,10 +301,10 @@ payload of local chunks:
|
|
301
301
|
[l.sender, l.payload[0]] unless l.payload[1] == [nil]
|
302
302
|
end
|
303
303
|
|
304
|
-
At the same time, we use the Ruby
|
304
|
+
At the same time, we use the Ruby _flat_map_ method to flatten the array of chunks in the heartbeat payload into a set of tuples, which we
|
305
305
|
associate with the heartbeating datanode and the time of receipt in __chunk_cache__:
|
306
306
|
|
307
|
-
chunk_cache <=
|
307
|
+
chunk_cache <= (master_duty_cycle * last_heartbeat).flat_map do |d, l|
|
308
308
|
unless l.payload[1].nil?
|
309
309
|
l.payload[1].map do |pay|
|
310
310
|
[l.peer, pay, Time.parse(d.val).to_f]
|
@@ -312,14 +312,13 @@ associate with the heartbeating datanode and the time of receipt in __chunk_cach
|
|
312
312
|
end
|
313
313
|
end
|
314
314
|
|
315
|
-
We periodically garbage-collect this
|
315
|
+
We periodically garbage-collect this cache, removing entries for datanodes from whom we have not received a heartbeat in a configurable amount of time.
|
316
316
|
__last_heartbeat__ is an output interface provided by the __HeartbeatAgent__ module, and contains the most recent, non-stale heartbeat contents:
|
317
317
|
|
318
|
-
chunk_cache <-
|
318
|
+
chunk_cache <- (master_duty_cycle * chunk_cache).pairs do |t, c|
|
319
319
|
c unless last_heartbeat.map{|h| h.peer}.include? c.node
|
320
320
|
end
|
321
321
|
|
322
|
-
|
323
322
|
## [BFS Client](https://github.com/bloom-lang/bud-sandbox/blob/master/bfs/bfs_client.rb)
|
324
323
|
|
325
324
|
One of the most complicated parts of the basic GFS design is the client component. To minimize load on the centralized master, we take it off the critical
|
@@ -369,7 +368,6 @@ After defining some helper aggregates (__chunk_cnts_chunk__ or replica count by
|
|
369
368
|
|
370
369
|
we define __lowchunks__ as the set of chunks whose replication factor is too low:
|
371
370
|
|
372
|
-
|
373
371
|
lowchunks <= chunk_cnts_chunk { |c| [c.chunkid] if c.replicas < REP_FACTOR and !c.chunkid.nil?}
|
374
372
|
|
375
373
|
We define __chosen_dest__ for a given underreplicated chunk as the datanode with
|
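The `flat_map` rule for `chunk_cache` in the bfs.md hunk above can be sketched in plain Ruby, without the Bud runtime. This is only an illustration of the unnesting step: `Heartbeat`, `unnest_chunks`, the addresses, and the timestamp are hypothetical names and values, not part of BFS.

```ruby
# Plain-Ruby sketch (no Bud runtime): unnest a collection-valued attribute.
# Heartbeat is a hypothetical stand-in for a last_heartbeat tuple.
Heartbeat = Struct.new(:peer, :payload)

def unnest_chunks(heartbeats, received_at)
  heartbeats.flat_map do |hb|
    chunks = hb.payload[1]
    # A heartbeat with no chunk list contributes no tuples.
    next [] if chunks.nil?
    # One [peer, chunk, time] tuple per chunk in the payload.
    chunks.map { |chunk_id| [hb.peer, chunk_id, received_at] }
  end
end

heartbeats = [
  Heartbeat.new("dn1:9000", [101, [1, 2]]),
  Heartbeat.new("dn2:9000", [102, nil]),
  Heartbeat.new("dn3:9000", [103, [2]])
]
chunk_cache = unnest_chunks(heartbeats, 1300000000.0)
```

Each heartbeat yields zero or more tuples, and `flat_map` concatenates them into one flat list, which is the shape a Bloom merge expects on its rhs.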
data/docs/c.html
ADDED
@@ -0,0 +1,251 @@
|
|
1
|
+
<h1>Bud Cheat Sheet</h1>
|
2
|
+
<h2>General Bloom Syntax Rules</h2>
|
3
|
+
<p>Bloom programs are unordered sets of statements.<br>
|
4
|
+
Statements are delimited by semicolons (;) or newlines. <br>
|
5
|
+
As in Ruby, backslash is used to escape a newline.<br></p>
|
6
|
+
<h2>Simple embedding of Bud in a Ruby Class</h2>
|
7
|
+
<pre><code>require 'bud'
|
8
|
+
|
9
|
+
class Foo
|
10
|
+
include Bud
|
11
|
+
|
12
|
+
state do
|
13
|
+
...
|
14
|
+
end
|
15
|
+
|
16
|
+
bloom do
|
17
|
+
...
|
18
|
+
end
|
19
|
+
end
|
20
|
+
</code></pre>
|
21
|
+
<h2>State Declarations</h2>
|
22
|
+
<p>A <code>state</code> block contains Bud collection definitions. A Bud collection is a <em>set</em>
|
23
|
+
of <em>facts</em>; each fact is an array of Ruby values. Note that collections do not
|
24
|
+
contain duplicates (inserting a duplicate fact into a collection is ignored).</p>
|
25
|
+
<p>Like tables in a relational database, zero or more columns in a collection make
|
26
|
+
up the collection's <em>key</em>. Attempting to insert two facts that agree on the key
|
27
|
+
columns but are not duplicates results in a primary key violation (runtime
|
28
|
+
exception).</p>
|
29
|
+
<h3>Default Declaration Syntax</h3>
|
30
|
+
<p><em>BudCollection :name, [keys] => [values]</em></p>
|
31
|
+
<h3>table</h3>
|
32
|
+
<p>Contents persist in memory until explicitly deleted.<br>
|
33
|
+
Default attributes: <code>[:key] => [:val]</code></p>
|
34
|
+
<pre><code>table :keyvalue
|
35
|
+
table :composite, [:keyfield1, :keyfield2] => [:values]
|
36
|
+
table :noDups, [:field1, :field2]
|
37
|
+
</code></pre>
|
38
|
+
<h3>scratch</h3>
|
39
|
+
<p>Contents emptied at start of each timestep.<br>
|
40
|
+
Default attributes: <code>[:key] => [:val]</code></p>
|
41
|
+
<pre><code>scratch :stats
|
42
|
+
</code></pre>
|
43
|
+
<h3>interface</h3>
|
44
|
+
<p>Scratch collections, used as connection points between modules.<br>
|
45
|
+
Default attributes: <code>[:key] => [:val]</code></p>
|
46
|
+
<pre><code>interface input, :request
|
47
|
+
interface output, :response
|
48
|
+
</code></pre>
|
49
|
+
<h3>channel</h3>
|
50
|
+
<p>Network channel manifested as a scratch collection.<br>
|
51
|
+
Facts that are inserted into a channel are sent to a remote host; the address of the remote host is specified in an attribute of the channel that is denoted with <code>@</code>.<br>
|
52
|
+
Default attributes: <code>[:@address, :val] => []</code></p>
|
53
|
+
<p>(Bloom statements with channel on lhs must use async merge (<code><~</code>).)</p>
|
54
|
+
<pre><code>channel :msgs
|
55
|
+
channel :req_chan, [:cartnum, :storenum, :@server] => [:command, :params]
|
56
|
+
</code></pre>
|
57
|
+
<h3>periodic</h3>
|
58
|
+
<p>System timer manifested as a scratch collection.<br>
|
59
|
+
System-provided attributes: <code>[:key] => [:val]</code><br>
|
60
|
+
(<code>key</code> is a unique ID, <code>val</code> is a Ruby Time converted to a string.)<br>
|
61
|
+
State declaration includes interval (in seconds).</p>
|
62
|
+
<p>(periodic can only be used on rhs of a Bloom statement.)</p>
|
63
|
+
<pre><code>periodic :timer, 0.1
|
64
|
+
</code></pre>
|
65
|
+
<h3>stdio</h3>
|
66
|
+
<p>Built-in scratch collection mapped to Ruby's <code>$stdin</code> and <code>$stdout</code><br>
|
67
|
+
System-provided attributes: <code>[:line] => []</code></p>
|
68
|
+
<p>Statements with stdio on lhs must use async merge (<code><~</code>).<br>
|
69
|
+
To capture <code>$stdin</code> on rhs, instantiate Bud with <code>:read_stdin</code> option.<br></p>
|
70
|
+
<h3>tctable</h3>
|
71
|
+
<p>Table collection mapped to a <a href="http://fallabs.com/tokyocabinet/">Tokyo Cabinet</a> store.<br>
|
72
|
+
Default attributes: <code>[:key] => [:val]</code></p>
|
73
|
+
<pre><code>tctable :t1
|
74
|
+
tctable :t2, [:k1, :k2] => [:v1, :v2]
|
75
|
+
</code></pre>
|
76
|
+
<h3>zktable</h3>
|
77
|
+
<p>Table collection mapped to an <a href="http://hadoop.apache.org/zookeeper/">Apache Zookeeper</a> store.<br>
|
78
|
+
System-provided attributes: <code>[:key] => [:val]</code><br>
|
79
|
+
State declaration includes Zookeeper path and optional TCP string (default: "localhost:2181")<br></p>
|
80
|
+
<pre><code>zktable :foo, "/bat"
|
81
|
+
zktable :bar, "/dat", "localhost:2182"
|
82
|
+
</code></pre>
|
83
|
+
<h2>Bloom Statements</h2>
|
84
|
+
<p><em>lhs BloomOp rhs</em></p>
|
85
|
+
<p>Left-hand-side (lhs) is a named <code>BudCollection</code> object.<br>
|
86
|
+
Right-hand-side (rhs) is a Ruby expression producing a <code>BudCollection</code> or <code>Array</code> of <code>Arrays</code>.<br>
|
87
|
+
BloomOp is one of the 4 operators listed below.</p>
|
88
|
+
<h3>Bloom Operators</h3>
|
89
|
+
<p>merges:</p>
|
90
|
+
<ul>
|
91
|
+
<li><code>left <= right</code> (<em>instantaneous</em>)</li>
|
92
|
+
<li><code>left <+ right</code> (<em>deferred</em>)</li>
|
93
|
+
<li><code>left <~ right</code> (<em>asynchronous</em>)</li>
|
94
|
+
</ul>
|
95
|
+
<p>delete:</p>
|
96
|
+
<ul>
|
97
|
+
<li><code>left <- right</code> (<em>deferred</em>)</li>
|
98
|
+
</ul>
|
99
|
+
<h3>Collection Methods</h3>
|
100
|
+
<p>Standard Ruby methods used on a BudCollection <code>bc</code>:</p>
|
101
|
+
<p>implicit map:</p>
|
102
|
+
<pre><code>t1 <= bc {|t| [t.col1 + 4, t.col2.chomp]} # formatting/projection
|
103
|
+
t2 <= bc {|t| t if t.col == 5} # selection
|
104
|
+
</code></pre>
|
105
|
+
<p><code>flat_map</code>:</p>
|
106
|
+
<pre><code>require 'backports' # flat_map not included in Ruby 1.8 by default
|
107
|
+
|
108
|
+
t3 <= bc.flat_map do |t| # unnest a collection-valued attribute
|
109
|
+
t.col4.map { |sub| [t.col1, t.col2, t.col3, sub] }
|
110
|
+
end
|
111
|
+
</code></pre>
|
112
|
+
<p><code>bc.reduce</code>, <code>bc.inject</code>:</p>
|
113
|
+
<pre><code>t4 <= bc.reduce({}) do |memo, t| # example: groupby col1 and count
|
114
|
+
memo[t.col1] ||= 0
|
115
|
+
memo[t.col1] += 1
|
116
|
+
memo
|
117
|
+
end
|
118
|
+
</code></pre>
|
119
|
+
<p><code>bc.include?</code>:</p>
|
120
|
+
<pre><code>t5 <= bc do |t| # like SQL's NOT IN
|
121
|
+
t unless t2.include?([t.col1, t.col2])
|
122
|
+
end
|
123
|
+
</code></pre>
|
124
|
+
<h2>BudCollection-Specific Methods</h2>
|
125
|
+
<p><code>bc.keys</code>: projects <code>bc</code> to key columns<br></p>
|
126
|
+
<p><code>bc.values</code>: projects <code>bc</code> to non-key columns<br></p>
|
127
|
+
<p><code>bc.inspected</code>: shorthand for <code>bc {|t| [t.inspect]}</code></p>
|
128
|
+
<pre><code>stdio <~ bc.inspected
|
129
|
+
</code></pre>
|
130
|
+
<p><code>chan.payloads</code>: projects <code>chan</code> to non-address columns. Only defined for channels.</p>
|
131
|
+
<pre><code># at sender
|
132
|
+
msgs <~ requests {|r| ["127.0.0.1:12345", r]}
|
133
|
+
# at receiver
|
134
|
+
requests <= msgs.payloads
|
135
|
+
</code></pre>
|
136
|
+
<p><code>bc.exists?</code>: test for non-empty collection. Can optionally pass in a block.</p>
|
137
|
+
<pre><code>stdio <~ [["Wake Up!"] if timer.exists?]
|
138
|
+
stdio <~ requests do |r|
|
139
|
+
[r.inspect] if msgs.exists?{|m| r.ident == m.ident}
|
140
|
+
end
|
141
|
+
</code></pre>
|
142
|
+
<h2>SQL-style grouping/aggregation (and then some)</h2>
|
143
|
+
<ul>
|
144
|
+
<li><code>bc.group([:col1, :col2], min(:col3))</code>. <em>akin to min(col3) GROUP BY (col1,col2)</em></li>
|
145
|
+
<li>exemplary aggs: <code>min</code>, <code>max</code>, <code>choose</code></li>
|
146
|
+
<li>summary aggs: <code>sum</code>, <code>avg</code>, <code>count</code></li>
|
147
|
+
<li>structural aggs: <code>accum</code></li>
|
148
|
+
<li><code>bc.argmax([:col1], :col2)</code> <em>returns the bc tuple per col1 that has highest col2</em></li>
|
149
|
+
<li><code>bc.argmin([:col1], :col2)</code></li>
|
150
|
+
</ul>
|
151
|
+
<h3>Built-in Aggregates:</h3>
|
152
|
+
<ul>
|
153
|
+
<li>Exemplary aggs: <code>min</code>, <code>max</code>, <code>choose</code></li>
|
154
|
+
<li>Summary aggs: <code>count</code>, <code>sum</code>, <code>avg</code></li>
|
155
|
+
<li>Structural aggs: <code>accum</code></li>
|
156
|
+
</ul>
|
157
|
+
<p>Note that custom aggregation can be written using <code>reduce</code>.</p>
|
158
|
+
<h2>Collection Combination (Join)</h2>
|
159
|
+
<p>To match items across two (or more) collections, use the <code>*</code> operator, followed by methods to filter/format the result (<code>pairs</code>, <code>matches</code>, <code>combos</code>, <code>lefts</code>, <code>rights</code>).</p>
|
160
|
+
<h3>Methods on Combinations (Joins)</h3>
|
161
|
+
<p><code>pairs(</code><em>hash pairs</em><code>)</code>: <br>
|
162
|
+
Given a <code>*</code> expression, form all pairs of items with value matches in the hash-pairs attributes. Hash pairs can be fully qualified (<code>coll1.attr1 => coll2.attr2</code>) or shorthand (<code>:attr1 => :attr2</code>).</p>
|
163
|
+
<pre><code># for each inbound msg, find match in a persistent buffer
|
164
|
+
result <= (msg * buffer).pairs(:val => :key) {|m, b| [m.address, m.val, b.val] }
|
165
|
+
</code></pre>
|
166
|
+
<p><code>combos(</code><em>hash pairs</em><code>)</code>: <br>
|
167
|
+
Alias for <code>pairs</code>, more readable for multi-collection <code>*</code> expressions. Must use fully-qualified hash pairs.</p>
|
168
|
+
<pre><code># the following 2 Bloom statements are equivalent to this SQL
|
169
|
+
# SELECT r.a, s_tab.b, t.c
|
170
|
+
# FROM r, s_tab, t
|
171
|
+
# WHERE r.x = s_tab.x
|
172
|
+
# AND s_tab.x = t.x;
|
173
|
+
|
174
|
+
# multiple column matches
|
175
|
+
out <= (r * s_tab * t).combos(r.x => s_tab.x, s_tab.x => t.x) do |t1, t2, t3|
|
176
|
+
[t1.a, t2.b, t3.c]
|
177
|
+
end
|
178
|
+
|
179
|
+
# column matching done per pair: this will be very slow
|
180
|
+
out <= (r * s_tab * t).combos do |t1, t2, t3|
|
181
|
+
[t1.a, t2.b, t3.c] if t1.x == t2.x and t2.x == t3.x
|
182
|
+
end
|
183
|
+
</code></pre>
|
184
|
+
<p><code>matches</code>:<br>
|
185
|
+
Shorthand for <code>combos</code> with hash pairs for all attributes with matching names.</p>
|
186
|
+
<pre><code># Equivalent to the above statements if x is the only attribute name in common:
|
187
|
+
out <= (r * s_tab * t).matches {|t1, t2, t3| [t1.a, t2.b, t3.c]}
|
188
|
+
</code></pre>
|
189
|
+
<p><code>lefts(</code><em>hash pairs</em><code>)</code>: <br>
|
190
|
+
Like <code>pairs</code>, but implicitly includes a block that projects down to the left item in each pair.</p>
|
191
|
+
<p><code>rights(</code><em>hash pairs</em><code>)</code>:
|
192
|
+
Like <code>pairs</code>, but implicitly includes a block that projects down to the right item in each pair.</p>
|
193
|
+
<p><code>flatten</code>:<br>
|
194
|
+
<code>flatten</code> is a bit like SQL's <code>SELECT *</code>: it produces a collection of concatenated objects, with a schema that is the concatenation of the schemas in tablelist (with duplicate names disambiguated.) Useful for chaining to operators that expect input collections with schemas, e.g. group:</p>
|
195
|
+
<pre><code>out <= (r * s).matches.flatten.group([:a], max(:b))
|
196
|
+
</code></pre>
|
197
|
+
<p><code>outer(</code><em>hash pairs</em><code>)</code>:<br>
|
198
|
+
Left Outer Join. Like <code>pairs</code>, but objects in the first collection will be produced nil-padded if they have no match in the second collection.</p>
|
199
|
+
<h2>Temp Collections</h2>
|
200
|
+
<p><code>temp</code><br>
|
201
|
+
Temp collections are scratches defined within a <code>bloom</code> block:</p>
|
202
|
+
<pre><code>temp :my_scratch1 <= foo
|
203
|
+
</code></pre>
|
204
|
+
<p>The schema of a temp collection is inherited from the rhs; if the rhs has no
|
205
|
+
schema, a simple one is manufactured to suit the data found in the rhs at
|
206
|
+
runtime: <code>[c0, c1, ...]</code>.</p>
|
207
|
+
<h2>Bud Modules</h2>
|
208
|
+
<p>A Bud module combines state (collections) and logic (Bloom rules). Using modules allows your program to be decomposed into a collection of smaller units.</p>
|
209
|
+
<p>Defining a Bud module is identical to defining a Ruby module, except that the module can use the <code>bloom</code>, <code>bootstrap</code>, and <code>state</code> blocks described above.</p>
|
210
|
+
<p>There are two ways to use a module <em>B</em> in another Bloom module <em>A</em>:</p>
|
211
|
+
<ol>
|
212
|
+
<li>
|
213
|
+
<p><code>include B</code>: This "inlines" the definitions (state and logic) from <em>B</em> into
|
214
|
+
<em>A</em>. Hence, collections defined in <em>B</em> can be accessed from <em>A</em> (via the
|
215
|
+
same syntax as <em>A</em>'s own collections). In fact, since Ruby is
|
216
|
+
dynamically-typed, Bloom statements in <em>B</em> can access collections
|
217
|
+
in <em>A</em>!</p>
|
218
|
+
</li>
|
219
|
+
<li>
|
220
|
+
<p><code>import B => :b</code>: The <code>import</code> statement provides a more structured way to
|
221
|
+
access another module. Module <em>A</em> can now access state defined in <em>B</em> by
|
222
|
+
using the qualifier <code>b</code>. <em>A</em> can also import two different copies of <em>B</em>,
|
223
|
+
and give them local names <code>b1</code> and <code>b2</code>; these copies will be independent
|
224
|
+
(facts inserted into a collection defined in <code>b1</code> won't also be inserted
|
225
|
+
into <code>b2</code>'s copy of the collection).</p>
|
226
|
+
</li>
|
227
|
+
</ol>
|
228
|
+
<h2>Skeleton of a Bud Module</h2>
|
229
|
+
<pre><code>require 'rubygems'
|
230
|
+
require 'bud'
|
231
|
+
|
232
|
+
module YourModule
|
233
|
+
include Bud
|
234
|
+
|
235
|
+
state do
|
236
|
+
...
|
237
|
+
end
|
238
|
+
|
239
|
+
bootstrap do
|
240
|
+
...
|
241
|
+
end
|
242
|
+
|
243
|
+
bloom :some_stmts do
|
244
|
+
...
|
245
|
+
end
|
246
|
+
|
247
|
+
bloom :more_stmts do
|
248
|
+
...
|
249
|
+
end
|
250
|
+
end
|
251
|
+
</code></pre>
|
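The cheat sheet above notes that matching columns inside the block of `combos` "will be very slow" compared with passing hash pairs. A plain-Ruby sketch of why, assuming the usual difference between a nested-loop join and a hash join; whether Bud evaluates hash pairs exactly this way is not stated in this diff, and `R`, `S`, `nested_loop_join`, and `hash_join` are hypothetical names.

```ruby
# Plain-Ruby sketch (no Bud runtime) of an equijoin on column x.
R = Struct.new(:x, :a)
S = Struct.new(:x, :b)

# Nested-loop form: the predicate is tested for every (r, s) pair.
def nested_loop_join(r, s)
  r.product(s).select { |t1, t2| t1.x == t2.x }.map { |t1, t2| [t1.a, t2.b] }
end

# Hash-join form: index s by the join column, then probe once per r tuple.
def hash_join(r, s)
  index = s.group_by(&:x)
  r.flat_map { |t1| index.fetch(t1.x, []).map { |t2| [t1.a, t2.b] } }
end

r = [R.new(1, "a1"), R.new(2, "a2"), R.new(3, "a3")]
s = [S.new(1, "b1"), S.new(1, "b1x"), S.new(3, "b3")]

slow = nested_loop_join(r, s)
fast = hash_join(r, s)
```

Both functions produce the same pairs; the first inspects `r.size * s.size` combinations, while the second only touches the `s` tuples that share a key with each `r` tuple.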
data/docs/cheat.md
CHANGED
@@ -9,19 +9,26 @@ As in Ruby, backslash is used to escape a newline.<br>
|
|
9
9
|
require 'bud'
|
10
10
|
|
11
11
|
class Foo
|
12
|
-
|
12
|
+
include Bud
|
13
13
|
|
14
|
-
|
15
|
-
|
16
|
-
|
14
|
+
state do
|
15
|
+
...
|
16
|
+
end
|
17
17
|
|
18
|
-
|
19
|
-
|
20
|
-
|
18
|
+
bloom do
|
19
|
+
...
|
20
|
+
end
|
21
21
|
end
|
22
22
|
|
23
23
|
## State Declarations ##
|
24
|
-
A `state` block contains Bud collection definitions.
|
24
|
+
A `state` block contains Bud collection definitions. A Bud collection is a *set*
|
25
|
+
of *facts*; each fact is an array of Ruby values. Note that collections do not
|
26
|
+
contain duplicates (inserting a duplicate fact into a collection is ignored).
|
27
|
+
|
28
|
+
Like a table in a relational database, a subset of the columns in a collection
|
29
|
+
make up the collection's _key_. Attempting to insert two facts into a collection
|
30
|
+
that agree on the key columns (but are not duplicates) results in a runtime
|
31
|
+
exception.
|
25
32
|
|
26
33
|
### Default Declaration Syntax ###
|
27
34
|
*BudCollection :name, [keys] => [values]*
|
@@ -57,10 +64,18 @@ Default attributes: `[:@address, :val] => []`
|
|
57
64
|
channel :msgs
|
58
65
|
channel :req_chan, [:cartnum, :storenum, :@server] => [:command, :params]
|
59
66
|
|
67
|
+
### loopback ###
|
68
|
+
A network channel that delivers tuples back to the current Bud instance.<br>
|
69
|
+
Default attributes: `[:key] => [:val]`
|
70
|
+
|
71
|
+
(Bloom statements with loopback on lhs must use async merge (`<~`).)
|
72
|
+
|
73
|
+
loopback :talk_to_self
|
74
|
+
|
60
75
|
### periodic ###
|
61
76
|
System timer manifested as a scratch collection.<br>
|
62
77
|
System-provided attributes: `[:key] => [:val]`<br>
|
63
|
-
(`key` is a unique ID, `val` is a Ruby Time
|
78
|
+
(`key` is a unique ID, `val` is a Ruby `Time` object.)<br>
|
64
79
|
State declaration includes interval (in seconds).
|
65
80
|
|
66
81
|
(periodic can only be used on rhs of a Bloom statement.)
|
@@ -68,11 +83,19 @@ State declaration includes interval (in seconds).
|
|
68
83
|
periodic :timer, 0.1
|
69
84
|
|
70
85
|
### stdio ###
|
71
|
-
Built-in scratch collection
|
86
|
+
Built-in scratch collection for performing terminal I/O.<br>
|
72
87
|
System-provided attributes: `[:line] => []`
|
73
88
|
|
74
89
|
Statements with stdio on lhs must use async merge (`<~`).<br>
|
75
|
-
|
90
|
+
Using `stdio` on the lhs of an async merge results in writing to the `IO` object specified by the `:stdout` Bud option (`$stdout` by default).<br>
|
91
|
+
To use `stdio` on rhs, instantiate Bud with `:stdin` option set to an `IO` object (e.g., `$stdin`).<br>
|
92
|
+
|
93
|
+
### dbm_table ###
|
94
|
+
Table collection mapped to a [DBM](http://en.wikipedia.org/wiki/Dbm) store.<br>
|
95
|
+
Default attributes: `[:key] => [:val]`
|
96
|
+
|
97
|
+
dbm_table :t1
|
98
|
+
dbm_table :t2, [:k1, :k2] => [:v1, :v2]
|
76
99
|
|
77
100
|
### tctable ###
|
78
101
|
Table collection mapped to a [Tokyo Cabinet](http://fallabs.com/tokyocabinet/) store.<br>
|
@@ -95,7 +118,7 @@ State declaration includes Zookeeper path and optional TCP string (default: "loc
|
|
95
118
|
|
96
119
|
Left-hand-side (lhs) is a named `BudCollection` object.<br>
|
97
120
|
Right-hand-side (rhs) is a Ruby expression producing a `BudCollection` or `Array` of `Arrays`.<br>
|
98
|
-
BloomOp is one of the
|
121
|
+
BloomOp is one of the 4 operators listed below.
|
99
122
|
|
100
123
|
### Bloom Operators ###
|
101
124
|
merges:
|
@@ -108,13 +131,6 @@ delete:
|
|
108
131
|
|
109
132
|
* `left <- right` (*deferred*)
|
110
133
|
|
111
|
-
insert:<br>
|
112
|
-
|
113
|
-
* `left << [...]` (*instantaneous*)
|
114
|
-
|
115
|
-
Note that unlike merge/delete, insert expects a single fact on the rhs, rather
|
116
|
-
than a collection.
|
117
|
-
|
118
134
|
### Collection Methods ###
|
119
135
|
Standard Ruby methods used on a BudCollection `bc`:
|
120
136
|
|
@@ -154,7 +170,7 @@ implicit map:
|
|
154
170
|
|
155
171
|
stdio <~ bc.inspected
|
156
172
|
|
157
|
-
`chan.payloads`: projects `chan` to non-address columns.
|
173
|
+
`chan.payloads`: projects `chan` to non-address columns. Only defined for channels.
|
158
174
|
|
159
175
|
# at sender
|
160
176
|
msgs <~ requests {|r| ["127.0.0.1:12345", r]}
|
@@ -191,13 +207,13 @@ To match items across two (or more) collections, use the `*` operator, followed
|
|
191
207
|
### Methods on Combinations (Joins) ###
|
192
208
|
|
193
209
|
`pairs(`*hash pairs*`)`: <br>
|
194
|
-
|
210
|
+
Given a `*` expression, form all pairs of items with value matches in the hash-pairs attributes. Hash pairs can be fully qualified (`coll1.attr1 => coll2.attr2`) or shorthand (`:attr1 => :attr2`).
|
195
211
|
|
196
212
|
# for each inbound msg, find match in a persistent buffer
|
197
213
|
result <= (msg * buffer).pairs(:val => :key) {|m, b| [m.address, m.val, b.val] }
|
198
214
|
|
199
|
-
`
|
200
|
-
|
215
|
+
`combos(`*hash pairs*`)`: <br>
|
216
|
+
Alias for `pairs`, more readable for multi-collection `*` expressions. Must use fully-qualified hash pairs.
|
201
217
|
|
202
218
|
# the following 2 Bloom statements are equivalent to this SQL
|
203
219
|
# SELECT r.a, s_tab.b, t.c
|
@@ -211,16 +227,16 @@ alias for `pairs`, more readable for multi-collection `*` expressions. Must use
|
|
211
227
|
end
|
212
228
|
|
213
229
|
# column matching done per pair: this will be very slow
|
214
|
-
out <=
|
230
|
+
out <= (r * s_tab * t).combos do |t1, t2, t3|
|
215
231
|
[t1.a, t2.b, t3.c] if t1.x == t2.x and t2.x == t3.x
|
216
232
|
end
|
217
233
|
|
218
234
|
`matches`:<br>
|
219
|
-
Shorthand for `combos` with hash pairs for all attributes with matching names.
|
235
|
+
Shorthand for `combos` with hash pairs for all attributes with matching names; this is called the "natural join" in SQL.
|
220
236
|
|
221
237
|
# Equivalent to the above statements if x is the only attribute name in common:
|
222
|
-
out <= (r * s_tab * t).matches
|
223
|
-
|
238
|
+
out <= (r * s_tab * t).matches {|t1, t2, t3| [t1.a, t2.b, t3.c]}
|
239
|
+
|
224
240
|
`lefts(`*hash pairs*`)`: <br>
|
225
241
|
Like `pairs`, but implicitly includes a block that projects down to the left item in each pair.
|
226
242
|
|
@@ -232,9 +248,8 @@ Like `pairs`, but implicitly includes a block that projects down to the right it
|
|
232
248
|
|
233
249
|
out <= (r * s).matches.flatten.group([:a], max(:b))
|
234
250
|
|
235
|
-
|
236
|
-
|
237
|
-
Left Outer Join. Note postfix syntax with array of 2 collections as first argument, hash pairs as subsequent arguments. Objects in the first collection will be included in the output even if no match is found in the second collection.
|
251
|
+
`outer(`*hash pairs*`)`:<br>
|
252
|
+
Left Outer Join. Like `pairs`, but objects in the first collection will be produced nil-padded if they have no match in the second collection.
|
238
253
|
|
239
254
|
## Temp Collections ##
|
240
255
|
`temp`<br>
|
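The `outer` description in the cheat.md hunk above (left rows "produced nil-padded if they have no match") can be sketched in plain Ruby. This is an illustration of left-outer-join semantics only, not Bud's implementation; `left_outer_join`, the column-index arguments, and the sample rows are hypothetical.

```ruby
# Plain-Ruby sketch (no Bud runtime) of a left outer join over array tuples.
# left_col / right_col are the positions of the join columns;
# right_width is the number of columns in a right-hand tuple.
def left_outer_join(left, right, left_col, right_col, right_width)
  index = right.group_by { |row| row[right_col] }
  left.flat_map do |l|
    # No match: pair the left row with an all-nil right row.
    matches = index.fetch(l[left_col], [Array.new(right_width)])
    matches.map { |r| [l, r] }
  end
end

orders   = [[1, "alice"], [2, "bob"]]
payments = [[1, 9.99]]
joined   = left_outer_join(orders, payments, 0, 0, 2)
```

Every left row appears at least once in `joined`, which is what distinguishes this from the `pairs` behavior, where unmatched left rows would be dropped.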
data/docs/deploy.md
CHANGED
@@ -1,45 +1,43 @@
|
|
1
1
|
# Deployment
|
2
2
|
|
3
|
-
Bud provides support for deploying a program onto a set of Bud instances. At the moment, two types of deployments are supported: local deployment and EC2 deployment. Intuitively, you include the module corresponding to the type of deployment you want into a Bud class, which you instantiate and run on a node called the "deployer." The deployer then spins up a requested number of Bud instances and distributes initial data.
|
3
|
+
Bud provides support for deploying a program onto a set of Bud instances. At the moment, two types of deployments are supported: fork-based local deployment and EC2 deployment. Intuitively, you include the module corresponding to the type of deployment you want into a Bud class, which you instantiate and run on a node called the "deployer." The deployer then spins up a requested number of Bud instances and distributes initial data.
|
4
4
|
|
5
5
|
First, decide which type of deployment you want to use.
|
6
6
|
|
7
|
-
## Local Deployment
|
7
|
+
## Fork-based Local Deployment
|
8
8
|
|
9
|
-
To use
|
9
|
+
To use fork-based deployment, you'll need to include it in your class:
|
10
10
|
|
11
|
-
|
11
|
+
include ForkDeploy
|
12
12
|
|
13
|
-
|
14
|
-
|
15
|
-
include LocalDeploy
|
16
|
-
|
17
|
-
The next step is to declare how many nodes you want to the program to be spun up on. You need to do this in a `deploystrap` block. A `deploystrap` block is run before `bootstrap`, and is only run for a Bud class that is instantiated with the option "`:deploy => true`". As an example:
|
13
|
+
The next step is to declare how many nodes you want the program to be spun up on. You need to do this in a `deploystrap` block. A `deploystrap` block is run before `bootstrap`, and is only run for a Bud class that is instantiated with the option `:deploy => true`. As an example:
|
18
14
|
|
19
15
|
deploystrap do
|
20
16
|
num_nodes <= [[2]]
|
21
17
|
end
|
22
18
|
|
23
|
-
|
19
|
+
Fork-based deployment will spin up `num_nodes` local processes, each containing one Bud instance, running the class that you include `ForkDeploy` in. The deployment code will populate a binary collection called `node`; the first column is a "node ID" -- a distinct integer from the range `[0, num_nodes - 1]` -- and the second column is an "IP:port" string associated with the node. Nodes are spun up on ephemeral ports, listening on "localhost".
|
24
20
|
|
25
|
-
Now, you need to define how you want the initial data to be distributed. You can do this, for example, by writing (multiple) rules with `initial_data` in the head. The schema of `initial_data` is as follows: [node ID, relation name as a symbol, list of tuples].
|
21
|
+
Now, you need to define how you want the initial data to be distributed. You can do this, for example, by writing (multiple) rules with `initial_data` in the head. These rules can appear in any `bloom` block in your program. The schema of `initial_data` is as follows: [node ID, relation name as a symbol, list of tuples].
|
26
22
|
|
27
|
-
|
23
|
+
For example, to distribute the IP address and port of the "deployer" to all of the other nodes in a relation called `master`, you might decide to write something like this:
|
28
24
|
|
29
|
-
|
25
|
+
initial_data <= node {|n| [n.uid, :master, [[ip_port]]]}
|
30
26
|
|
31
|
-
|
27
|
+
Note that the relation (`master` in this case) cannot be a channel -- you may only distribute data to scratches and tables. Initial data is transferred only after _all_ nodes are spun up; this ensures that initial data will never be lost because a node is not yet listening on a socket, for example. Initial data is transmitted atomically to each node; this means that on each node, _all_ initial data in _all_ relations will arrive at the same Bud timestep. However, there is no global barrier for transfer of initial data. For example, if initial data is distributed to nodes 1 and 2, node 1 may receive its initial data first, and then send subsequent messages on channels to node 2 which node 2 may receive before its initial data.
|
32
28
|
|
33
|
-
The final step is to add `:deploy => true` to the instantiation of your class. Note that the
|
29
|
+
The final step is to add `:deploy => true` to the instantiation of your class. Note that the fork-based deployer will spin up nodes without `:deploy => true`, so you don't forkbomb your system.
|
34
30
|
|
35
31
|
|
36
32
|
## EC2 Deployment
|
37
33
|
|
38
|
-
To use EC2 deployment you'll need to require it in your program:
|
34
|
+
To use EC2 deployment, you'll need to require it in your program:
|
39
35
|
|
40
36
|
require 'bud/deploy/ec2deploy'
|
41
37
|
|
42
|
-
|
38
|
+
Note that the `amazon-ec2`, `net-scp`, and `net-ssh` gems must be installed.
|
39
|
+
|
40
|
+
Next, include the `EC2Deploy` module in your class:
|
43
41
|
|
44
42
|
include EC2Deploy
|
45
43
|
|
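The shapes of the `node` and `initial_data` relations described in the deploy.md hunk above can be sketched in plain Ruby, without the Bud runtime. The addresses and port numbers are made up for illustration, and `deployer_ip_port` stands in for the deployer's `ip_port`.

```ruby
# Plain-Ruby sketch (no Bud runtime) of the tuples described above.
num_nodes = 2
deployer_ip_port = "localhost:5000" # hypothetical deployer address

# node: [node ID, "IP:port"]; node IDs range over [0, num_nodes - 1].
# The ports are invented here; real deployments use ephemeral ports.
node = (0...num_nodes).map { |uid| [uid, "localhost:#{6000 + uid}"] }

# initial_data: [node ID, relation name as a symbol, list of tuples].
# Every node receives one tuple for its `master` relation.
initial_data = node.map { |uid, _addr| [uid, :master, [[deployer_ip_port]]] }
```

The third field is a list of tuples, so a single-column fact is written as `[[value]]`, matching the `[[ip_port]]` in the rule shown in the hunk.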
@@ -71,26 +69,28 @@ This recursively copies all directories and files rooted at the current working

 Finally, `ec2_key_location` is the path to the private key of the `key_name` keypair. For example:

-
+ec2_key_location <= [["/home/bob/.ssh/bob.pem"]]

-EC2 deployment will spin up `num_nodes` instances (using defaults) on EC2 using a pre-rolled Bud AMI based on Amazon's 32-bit Linux AMI (`ami-8c1fece5`). Each instance contains one Bud instance, which runs the `ruby_command`. Like before, the deployment code will populate a binary relation called `node`; the first
+EC2 deployment will spin up `num_nodes` instances (using defaults) on EC2 using a pre-rolled Bud AMI based on Amazon's 32-bit Linux AMI (`ami-8c1fece5`). Each instance contains one Bud instance, which runs the `ruby_command`. Like before, the deployment code will populate a binary relation called `node`; the first attribute is a "node ID" -- a distinct integer from the range [0, num_nodes - 1] -- and the second attribute is an "IP:port" string associated with the node. Nodes are currently spun up on fixed port 54321.
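As a rough picture of what ends up in `node`, the plain-Ruby sketch below (not Bud code) builds the same kind of binary relation from a list of instance IPs. The IP addresses and the `build_node_relation` helper are made up for illustration; only the ID range and the fixed port 54321 come from the text above.

```ruby
EC2_PORT = 54321 # fixed port mentioned in the text above

# Hypothetical helper: pair each instance IP with a distinct node ID
# drawn from [0, num_nodes - 1], and format the address as "IP:port".
def build_node_relation(ips)
  ips.each_with_index.map { |ip, id| [id, "#{ip}:#{EC2_PORT}"] }
end

ips = ["10.0.0.5", "10.0.0.6", "10.0.0.7"] # made-up addresses
node = build_node_relation(ips)

node.each { |id, addr| puts "#{id} -> #{addr}" }
```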

-Defining
+Defining `initial_data` works exactly the same way with EC2 deployment as it does with local deployment.

-There is a slight catch with EC2 deployment. Sometimes EC2 tells us that the nodes have all started up, but really, one or more nodes will never start up. Currently, in this scenario, deployment exceeds the maximum number of
+There is a slight catch with EC2 deployment. Sometimes EC2 tells us that the nodes have all started up, but really, one or more nodes will never start up. Currently, in this scenario, deployment exceeds the maximum number of SSH retries, and throws an exception.

 Note that EC2 deployment does *not* shut down the EC2 nodes it starts up under any circumstances. This means you must use some alternate means to shut down the nodes, such as logging onto the EC2 web interface and terminating the nodes.

 ## Examples

-Check out the `examples/deploy` directory in Bud. There is a simple token ring example that establishes a ring involving 10 nodes and sends a token around the ring continuously. This example can be deployed locally:
+Check out the `examples/deploy` directory in Bud. There is a simple token ring example that establishes a ring involving 10 nodes and sends a token around the ring continuously. This example can be deployed locally using the fork-based deployer:

-ruby tokenring-
+ruby tokenring-fork.rb

 or on EC2:

-ruby tokenring-ec2.rb local_ip:local_port ext_ip true
+ruby tokenring-ec2.rb local_ip:local_port ext_ip:ext_port true
+
+"ext_ip" and "ext_port" should be set to the externally-visible IP and port of the computer you are deploying from. For example, if you are behind a home router, you will want to set "ext_ip" to your public IP address, and ensure that your router forwards "ext_port" to local_port on the computer with private IP "local_ip".

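Both address arguments use the `ip:port` form. A minimal plain-Ruby sketch of splitting such arguments follows; the `split_ip_port` helper and the sample addresses are hypothetical and are not taken from `tokenring-ec2.rb`.

```ruby
# Hypothetical helper: split "ip:port" into an IP string and an integer port.
def split_ip_port(arg)
  ip, port = arg.split(":", 2)
  if ip.nil? || ip.empty? || port.nil?
    raise ArgumentError, "expected ip:port, got #{arg.inspect}"
  end
  [ip, Integer(port)] # Integer() raises ArgumentError on a non-numeric port
end

# Sample values: a private address behind a router, and the router's public
# address that forwards ext_port to local_port.
args = ["192.168.1.10:54321", "203.0.113.7:54321"]
local_ip, local_port = split_ip_port(args[0])
ext_ip, ext_port = split_ip_port(args[1])

puts "listen on #{local_ip}:#{local_port}, advertise #{ext_ip}:#{ext_port}"
```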
-Note that before running `tokenring-ec2`, you must create a "keys.rb"
+Note that before running `tokenring-ec2`, you must create a file named "keys.rb" that contains definitions for `access_key_id`, `secret_access_key`, `key_name` and `ec2_key_location`.

-Output will be displayed to show the progress of the deployment. Be patient, it may take a while for output to appear. Once deployment is complete and all nodes are ready, each node will display output indicating when it has the token. All output will be visible for the
+Output will be displayed to show the progress of the deployment. Be patient, it may take a while for output to appear. Once deployment is complete and all nodes are ready, each node will display output indicating when it has the token. All output will be visible for the fork-based deployment case, whereas only the deployer node's output will be visible for the EC2 deployment case (stdout of all other nodes is written to local disk).
data/docs/getstarted.md
CHANGED
@@ -2,7 +2,11 @@
 In this document we'll do a hands-on tour of Bud and its Bloom DSL for Ruby. We'll start with some examples, and introduce concepts as we go.

 ## Installation ##
-
+Bud depends on one library that needs to be installed separately:
+
+* [GraphViz](http://www.graphviz.org/Download.php) (2.26.3 recommended)
+
+Once that's done, you know the drill!

 % gem install bud

@@ -248,8 +252,6 @@ And here's the code:
 end
 end

-
-
 if ARGV.length == 2
 server = ARGV[1]
 else
@@ -264,7 +266,7 @@ The ChatClient class has a typical Ruby `initialize` method that sets up two loc

 The next block in the class is the first Bloom `bootstrap` block we've seen. This is a set of Bloom statements that are evaluated only once, just before the first "regular" timestep of the system. In this case, we bootstrap the client by sending a message to the server on the connect channel, containing the client's address (via the built-in Bud instance method `ip_port`) and chosen nickname.

-After that comes a
+After that comes a Bloom block with the name `:chatter`. It contains two statements: one to take stdio input from the terminal and send it to the server via mcast, and another to receive mcasts and place them on stdio output. The first statement has the built-in `stdio` scratch on the rhs: this includes any lines of terminal input that arrived since the last timestep. For each line of terminal input, the `do...end` block formats an `mcast` message destined to the address in the instance variable `@server`, with an array as the payload. The rhs of the second statement takes `mcast` messages that arrived since the last timestep. For each message `m`, the `m.val` expression in the block returns the message payload; the call to the Ruby instance method `pretty_print` formats the message so it will look nice on-screen. These formatted strings are placed (asynchronously, as before) into `stdio` on the left.

 The remaining lines are Ruby driver code to instantiate and run the ChatClient class (which includes the `Bud` module) using arguments from the command line. Note the option `:read_stdin => true` to `ChatClient.new`: this causes the Bud runtime to capture stdin via the built-in `stdio` collection.
