RubyGems - bud - Versions diffs - 0.0.3 → 0.0.4 - Mend

bud 0.0.3 → 0.0.4


Files changed (50)

data/README +33 -16
data/bin/budplot +42 -65
data/bin/budtimelines +235 -0
data/bin/budvis +24 -122
data/bin/rebl +1 -0
data/docs/README.md +21 -10
data/docs/bfs.md +4 -6
data/docs/c.html +251 -0
data/docs/cheat.md +45 -30
data/docs/deploy.md +26 -26
data/docs/getstarted.md +6 -4
data/docs/visualizations.md +43 -31
data/examples/chat/chat.rb +4 -9
data/examples/chat/chat_server.rb +1 -8
data/examples/deploy/deploy_ip_port +1 -0
data/examples/deploy/keys.rb +5 -0
data/examples/deploy/tokenring-ec2.rb +9 -9
data/examples/deploy/{tokenring-local.rb → tokenring-fork.rb} +3 -5
data/examples/deploy/tokenring-thread.rb +15 -0
data/examples/deploy/tokenring.rb +25 -17
data/lib/bud/aggs.rb +87 -25
data/lib/bud/bud_meta.rb +48 -31
data/lib/bud/bust/bust.rb +16 -15
data/lib/bud/collections.rb +207 -232
data/lib/bud/depanalysis.rb +1 -0
data/lib/bud/deploy/countatomicdelivery.rb +8 -20
data/lib/bud/deploy/deployer.rb +16 -16
data/lib/bud/deploy/ec2deploy.rb +34 -35
data/lib/bud/deploy/forkdeploy.rb +90 -0
data/lib/bud/deploy/threaddeploy.rb +38 -0
data/lib/bud/graphs.rb +103 -199
data/lib/bud/joins.rb +190 -41
data/lib/bud/monkeypatch.rb +84 -0
data/lib/bud/rebl.rb +8 -1
data/lib/bud/rewrite.rb +152 -49
data/lib/bud/server.rb +1 -0
data/lib/bud/state.rb +24 -10
data/lib/bud/storage/dbm.rb +170 -0
data/lib/bud/storage/tokyocabinet.rb +5 -1
data/lib/bud/stratify.rb +6 -7
data/lib/bud/viz.rb +31 -17
data/lib/bud/viz_util.rb +204 -0
data/lib/bud.rb +271 -244
data/lib/bud.rb.orig +806 -0
metadata +43 -22
data/docs/bfs.raw +0 -251
data/docs/diffs +0 -181
data/examples/basics/out +0 -1103
data/examples/basics/out.new +0 -856
data/lib/bud/deploy/localdeploy.rb +0 -53

data/docs/README.md CHANGED

@@ -3,13 +3,24 @@ Welcome to the documentation for *Bud*, a prototype of Bloom under development.
 The documents here are organized to be read in any order, but you might like to try the following:
-* **intro.md**: A brief introduction to Bud and Bloom
-* **getstarted.md**: A quickstart to teach you basic Bloom concepts, the use of `rebl` interactive terminal, and the embedding of Bloom code in Ruby via the `Bud` module.
-* **operational.md**: an operational view of Bloom, to provide a more detailed  model of how Bloom code is evaluated by Bud.
-* **cheat.md**: A concise "cheat sheet" to remind you about Bloom syntax.
-* **modules.md**: An overview of Bloom's modularity features.
-* **ruby_hooks.md**: Bud module methods that allow you to interact with the Bud evaluator from other Ruby threads.
-* **visualizations.md**: Overview of the `budvis` and `budplot` tools for visualizing Bloom program analyses.
-* **bfs.md**:  A walkthrough of the Bloom distributed filesystem.
-* **bud-sandbox**: a github repository including lots of useful libraries and examples.
-* **Bud RubyDoc**: The Bud gem ships with RubyDoc on the language constructs and runtime hooks provided by the Bud module.  (To see rdoc, you run `gem server` from a command line and open [http://0.0.0.0:8808/](http://0.0.0.0:8808/))
+* **[intro.md](intro.md)**: A brief introduction to Bud and Bloom.
+* **[getstarted.md](getstarted.md)**: A quickstart to teach you basic Bloom
+  concepts, the use of `rebl` interactive terminal, and the embedding of Bloom
+  code in Ruby via the `Bud` module.
+* **[operational.md](operational.md)**: An operational view of Bloom, to provide
+  a more detailed model of how Bloom code is evaluated by Bud.
+* **[cheat.md](cheat.md)**: A concise "cheat sheet" to remind you about Bloom syntax.
+* **[modules.md](modules.md)**: An overview of Bloom's modularity features.
+* **[ruby_hooks.md](ruby_hooks.md)**: Bud module methods that allow you to
+  interact with the Bud evaluator from other Ruby threads.
+* **[visualizations.md](visualizations.md)**: Overview of the `budvis` and
+  `budplot` tools for visualizing Bloom program analyses.
+* **[bfs.md](bfs.md)**: A walkthrough of the Bloom distributed filesystem.
+In addition, the **[bud-sandbox](http://github.com/bloom-lang/bud-sandbox)**
+GitHub repository contains lots of useful libraries and example programs built
+using Bloom.
+Finally, the Bud gem ships with RubyDoc on the language constructs and runtime
+hooks provided by the Bud module.  (To see rdoc, run `gem server` from a command
+line and open [http://0.0.0.0:8808/](http://0.0.0.0:8808/))

data/docs/bfs.md CHANGED

@@ -301,10 +301,10 @@ payload of local chunks:
           [l.sender, l.payload[0]] unless l.payload[1] == [nil]
         end
-At the same time, we use the Ruby _flatmap_ method to flatten the array of chunks in the heartbeat payload into a set of tuples, which we
+At the same time, we use the Ruby _flat_map_ method to flatten the array of chunks in the heartbeat payload into a set of tuples, which we
 associate with the heartbeating datanode and the time of receipt in __chunk_cache__:
-        chunk_cache <= join([master_duty_cycle, last_heartbeat]).flat_map do |d, l|
+        chunk_cache <= (master_duty_cycle * last_heartbeat).flat_map do |d, l|
           unless l.payload[1].nil?
             l.payload[1].map do |pay|
               [l.peer, pay, Time.parse(d.val).to_f]
@@ -312,14 +312,13 @@ associate with the heartbeating datanode and the time of receipt in __chunk_cach
           end
         end
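The unnesting step in the rule above can be tried outside Bud. The following is a minimal plain-Ruby sketch, not code from the package: `HeartbeatRow`, `TickRow`, and `chunk_cache_tuples` are hypothetical names, and `Array#product` stands in for the `*` join. One difference is deliberate: plain `flat_map` would keep a `nil` returned by the block, so the sketch returns an empty array for heartbeats without a chunk list.

```ruby
require 'time'

# Hypothetical stand-ins for Bud tuples; field names mirror the rule above.
HeartbeatRow = Struct.new(:peer, :payload)  # payload = [metadata, chunk list or nil]
TickRow      = Struct.new(:key, :val)       # val = timestamp string

# For every (tick, heartbeat) pair, emit one [peer, chunk, time] tuple per chunk.
# flat_map concatenates the per-heartbeat arrays into one flat list of tuples.
def chunk_cache_tuples(ticks, heartbeats)
  ticks.product(heartbeats).flat_map do |d, l|
    next [] if l.payload[1].nil?            # heartbeat carried no chunk list
    l.payload[1].map { |pay| [l.peer, pay, Time.parse(d.val).to_f] }
  end
end

ticks = [TickRow.new(1, "2011-04-01 12:00:00 UTC")]
beats = [HeartbeatRow.new("dn1:5000", [:meta, [10, 11]]),
         HeartbeatRow.new("dn2:5000", [:meta, nil])]
chunk_cache_tuples(ticks, beats).map { |peer, chunk, _time| [peer, chunk] }
# => [["dn1:5000", 10], ["dn1:5000", 11]]
```

`Enumerable#flat_map` is built in from Ruby 1.9.2; on Ruby 1.8 the cheat sheet's `require 'backports'` note applies.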
-We periodically garbage-collect this cached, removing entries for datanodes from whom we have not received a heartbeat in a configurable amount of time.
+We periodically garbage-collect this cache, removing entries for datanodes from whom we have not received a heartbeat in a configurable amount of time.
 __last_heartbeat__ is an output interface provided by the __HeartbeatAgent__ module, and contains the most recent, non-stale heartbeat contents:
-        chunk_cache <- join([master_duty_cycle, chunk_cache]).map do |t, c|
+        chunk_cache <-(master_duty_cycle * chunk_cache).pairs do |t, c|
           c unless last_heartbeat.map{|h| h.peer}.include? c.node
         end
 ## [BFS Client](https://github.com/bloom-lang/bud-sandbox/blob/master/bfs/bfs_client.rb)
 One of the most complicated parts of the basic GFS design is the client component.  To minimize load on the centralized master, we take it off the critical
@@ -369,7 +368,6 @@ After defining some helper aggregates (__chunk_cnts_chunk__ or replica count by
 we define __lowchunks__ as the set of chunks whose replication factor is too low:
         lowchunks <= chunk_cnts_chunk { |c| [c.chunkid] if c.replicas < REP_FACTOR and !c.chunkid.nil?}
 We define __chosen_dest__ for a given underreplicated chunk as the datanode with

data/docs/c.html ADDED

@@ -0,0 +1,251 @@
+<h1>Bud Cheat Sheet</h1>
+<h2>General Bloom Syntax Rules</h2>
+<p>Bloom programs are unordered sets of statements.<br>
+Statements are delimited by semicolons (;) or newlines. <br>
+As in Ruby, backslash is used to escape a newline.<br></p>
+<h2>Simple embedding of Bud in a Ruby Class</h2>
+<pre><code>require 'bud'
+class Foo
+    include Bud
+    state do
+      ...
+    end
+    bloom do
+      ...
+    end
+end
+</code></pre>
+<h2>State Declarations</h2>
+<p>A <code>state</code> block contains Bud collection definitions. A Bud collection is a <em>set</em>
+of <em>facts</em>; each fact is an array of Ruby values. Note that collections do not
+contain duplicates (inserting a duplicate fact into a collection is ignored).</p>
+<p>Like tables in a relational database, zero or more columns in a collection make
+up the collection's <em>key</em>. Attempting to insert two facts that agree on the key
+columns but are not duplicates results in a primary key violation (runtime
+exception).</p>
+<h3>Default Declaration Syntax</h3>
+<p><em>BudCollection :name, [keys] =&gt; [values]</em></p>
+<h3>table</h3>
+<p>Contents persist in memory until explicitly deleted.<br>
+Default attributes: <code>[:key] =&gt; [:val]</code></p>
+<pre><code>table :keyvalue
+table :composite, [:keyfield1, :keyfield2] =&gt; [:values]
+table :noDups, [:field1, field2]
+</code></pre>
+<h3>scratch</h3>
+<p>Contents emptied at start of each timestep.<br>
+Default attributes: <code>[:key] =&gt; [:val]</code></p>
+<pre><code>scratch :stats
+</code></pre>
+<h3>interface</h3>
+<p>Scratch collections, used as connection points between modules.<br>
+Default attributes: <code>[:key] =&gt; [:val]</code></p>
+<pre><code>interface input, :request
+interface output, :response
+</code></pre>
+<h3>channel</h3>
+<p>Network channel manifested as a scratch collection.<br>
+Facts that are inserted into a channel are sent to a remote host; the address of the remote host is specified in an attribute of the channel that is denoted with <code>@</code>.<br>
+Default attributes: <code>[:@address, :val] =&gt; []</code></p>
+<p>(Bloom statements with channel on lhs must use async merge (<code>&lt;~</code>).)</p>
+<pre><code>channel :msgs
+channel :req_chan, [:cartnum, :storenum, :@server] =&gt; [:command, :params]
+</code></pre>
+<h3>periodic</h3>
+<p>System timer manifested as a scratch collection.<br>
+System-provided attributes: <code>[:key] =&gt; [:val]</code><br>
+&nbsp;&nbsp;&nbsp;&nbsp; (<code>key</code> is a unique ID, <code>val</code> is a Ruby Time converted to a string.)<br>
+State declaration includes interval (in seconds).</p>
+<p>(periodic can only be used on rhs of a Bloom statement.)</p>
+<pre><code>periodic :timer, 0.1
+</code></pre>
+<h3>stdio</h3>
+<p>Built-in scratch collection mapped to Ruby's <code>$stdin</code> and <code>$stdout</code><br>
+System-provided attributes: <code>[:line] =&gt; []</code></p>
+<p>Statements with stdio on lhs must use async merge (<code>&lt;~</code>).<br>
+To capture <code>$stdin</code> on rhs, instantiate Bud with <code>:read_stdin</code> option.<br></p>
+<h3>tctable</h3>
+<p>Table collection mapped to a <a href="http://fallabs.com/tokyocabinet/">Tokyo Cabinet</a> store.<br>
+Default attributes: <code>[:key] =&gt; [:val]</code></p>
+<pre><code>tctable :t1
+tctable :t2, [:k1, :k2] =&gt; [:v1, :v2]
+</code></pre>
+<h3>zktable</h3>
+<p>Table collection mapped to an <a href="http://hadoop.apache.org/zookeeper/">Apache Zookeeper</a> store.<br>
+System-provided attributes: <code>[:key] =&gt; [:val]</code><br>
+State declaration includes Zookeeper path and optional TCP string (default: "localhost:2181")<br></p>
+<pre><code>zktable :foo, "/bat"
+zktable :bar, "/dat", "localhost:2182"
+</code></pre>
+<h2>Bloom Statements</h2>
+<p><em>lhs BloomOp rhs</em></p>
+<p>Left-hand-side (lhs) is a named <code>BudCollection</code> object.<br>
+Right-hand-side (rhs) is a Ruby expression producing a <code>BudCollection</code> or <code>Array</code> of <code>Arrays</code>.<br>
+BloomOp is one of the 4 operators listed below.</p>
+<h3>Bloom Operators</h3>
+<p>merges:</p>
+<ul>
+<li><code>left &lt;= right</code> &nbsp;&nbsp;&nbsp;&nbsp; (<em>instantaneous</em>)</li>
+<li><code>left &lt;+ right</code> &nbsp;&nbsp;&nbsp;&nbsp; (<em>deferred</em>)</li>
+<li><code>left &lt;~ right</code> &nbsp;&nbsp;&nbsp;&nbsp; (<em>asynchronous</em>)</li>
+</ul>
+<p>delete:</p>
+<ul>
+<li><code>left &lt;- right</code> &nbsp;&nbsp;&nbsp;&nbsp; (<em>deferred</em>)</li>
+</ul>
+<h3>Collection Methods</h3>
+<p>Standard Ruby methods used on a BudCollection <code>bc</code>:</p>
+<p>implicit map:</p>
+<pre><code>t1 &lt;= bc {|t| [t.col1 + 4, t.col2.chomp]} # formatting/projection
+t2 &lt;= bc {|t| t if t.col = 5}             # selection
+</code></pre>
+<p><code>flat_map</code>:</p>
+<pre><code>require 'backports' # flat_map not included in Ruby 1.8 by default
+t3 &lt;= bc.flat_map do |t| # unnest a collection-valued attribute
+  bc.col4.map { |sub| [t.col1, t.col2, t.col3, sub] }
+end
+</code></pre>
+<p><code>bc.reduce</code>, <code>bc.inject</code>:</p>
+<pre><code>t4 &lt;= bc.reduce({}) do |memo, t|  # example: groupby col1 and count
+  memo[t.col1] ||= 0
+  memo[t.col1] += 1
+  memo
+end
+</code></pre>
+<p><code>bc.include?</code>:</p>
+<pre><code>t5 &lt;= bc do |t| # like SQL's NOT IN
+    t unless t2.include?([t.col1, t.col2])
+end
+</code></pre>
+<h2>BudCollection-Specific Methods</h2>
+<p><code>bc.keys</code>: projects <code>bc</code> to key columns<br></p>
+<p><code>bc.values</code>: projects <code>bc</code> to non-key columns<br></p>
+<p><code>bc.inspected</code>: shorthand for <code>bc {|t| [t.inspect]}</code></p>
+<pre><code>stdio &lt;~ bc.inspected
+</code></pre>
+<p><code>chan.payloads</code>: projects <code>chan</code> to non-address columns. Only defined for channels.</p>
+<pre><code># at sender
+msgs &lt;~ requests {|r| "127.0.0.1:12345", r}
+# at receiver
+requests &lt;= msgs.payloads
+</code></pre>
+<p><code>bc.exists?</code>: test for non-empty collection.  Can optionally pass in a block.</p>
+<pre><code>stdio &lt;~ [["Wake Up!"] if timer.exists?]
+stdio &lt;~ requests do |r|
+  [r.inspect] if msgs.exists?{|m| r.ident == m.ident}
+end
+</code></pre>
+<h2>SQL-style grouping/aggregation (and then some)</h2>
+<ul>
+<li><code>bc.group([:col1, :col2], min(:col3))</code>.  <em>akin to min(col3) GROUP BY (col1,col2)</em></li>
+<li>exemplary aggs: <code>min</code>, <code>max</code>, <code>choose</code></li>
+<li>summary aggs: <code>sum</code>, <code>avg</code>, <code>count</code></li>
+<li>structural aggs: <code>accum</code></li>
+<li><code>bc.argmax([:col1], :col2)</code> &nbsp;&nbsp;&nbsp;&nbsp; <em>returns the bc tuple per col1 that has highest col2</em></li>
+<li><code>bc.argmin([:col1], :col2)</code></li>
+</ul>
+<h3>Built-in Aggregates:</h3>
+<ul>
+<li>Exemplary aggs: <code>min</code>, <code>max</code>, <code>choose</code></li>
+<li>Summary aggs: <code>count</code>, <code>sum</code>, <code>avg</code></li>
+<li>Structural aggs: <code>accum</code></li>
+</ul>
+<p>Note that custom aggregation can be written using <code>reduce</code>.</p>
+<h2>Collection Combination (Join)</h2>
+<p>To match items across two (or more) collections, use the <code>*</code> operator, followed by methods to filter/format the result (<code>pairs</code>, <code>matches</code>, <code>combos</code>, <code>lefts</code>, <code>rights</code>).</p>
+<h3>Methods on Combinations (Joins)</h3>
+<p><code>pairs(</code><em>hash pairs</em><code>)</code>: <br>
+Given a <code>*</code> expression, form all pairs of items with value matches in the hash-pairs attributes.  Hash pairs can be fully qualified (<code>coll1.attr1 =&gt; coll2.attr2</code>) or shorthand (<code>:attr1 =&gt; :attr2</code>).</p>
+<pre><code># for each inbound msg, find match in a persistent buffer
+result &lt;= (msg * buffer).pairs(:val =&gt; :key) {|m, b| [m.address, m.val, b.val] }
+</code></pre>
+<p><code>combos(</code><em>hash pairs</em><code>)</code>: <br>
+Alias for <code>pairs</code>, more readable for multi-collection <code>*</code> expressions.  Must use fully-qualified hash pairs.</p>
+<pre><code># the following 2 Bloom statements are equivalent to this SQL
+# SELECT r.a, s_tab.b, t.c
+#   FROM r, s_tab, t
+#  WHERE r.x = s_tab.x
+#    AND s_tab.x = t.x;
+# multiple column matches
+out &lt;= (r * s_tab * t).combos(r.x =&gt; s_tab.x, s_tab.x =&gt; t.x) do |t1, t2, t3|
+         [t1.a, t2.b, t3.c]
+       end
+# column matching done per pair: this will be very slow
+out &lt;= (r * s_tab * t).combos do |t1, t2, t3|
+         [t1.a, t2.b, t3.c] if r.x == s_tab.x and s_tab.x == t.x
+       end
+</code></pre>
+<p><code>matches</code>:<br>
+Shorthand for <code>combos</code> with hash pairs for all attributes with matching names.</p>
+<pre><code># Equivalent to the above statements if x is the only attribute name in common:
+out &lt;= (r * s_tab * t).matches do {|t1, t2, t3| [t1.a, t2.b, t3.c]}
+</code></pre>
+<p><code>lefts(</code><em>hash pairs</em><code>)</code>: <br>
+Like <code>pairs</code>, but implicitly includes a block that projects down to the left item in each pair.</p>
+<p><code>rights(</code><em>hash pairs</em><code>)</code>:
+Like <code>pairs</code>, but implicitly includes a block that projects down to the right item in each pair.</p>
+<p><code>flatten</code>:<br>
+<code>flatten</code> is a bit like SQL's <code>SELECT *</code>: it produces a collection of concatenated objects, with a schema that is the concatenation of the schemas in tablelist (with duplicate names disambiguated.) Useful for chaining to operators that expect input collections with schemas, e.g. group:</p>
+<pre><code>out &lt;= (r * s).matches.flatten.group([:a], max(:b))
+</code></pre>
+<p><code>outer(</code><em>hash pairs</em><code>)</code>:<br>
+Left Outer Join.  Like <code>pairs</code>, but objects in the first collection will be produced nil-padded if they have no match in the second collection.</p>
+<h2>Temp Collections</h2>
+<p><code>temp</code><br>
+Temp collections are scratches defined within a <code>bloom</code> block:</p>
+<pre><code>temp :my_scratch1 &lt;= foo
+</code></pre>
+<p>The schema of a temp collection in inherited from the rhs; if the rhs has no
+schema, a simple one is manufactured to suit the data found in the rhs at
+runtime: <code>[c0, c1, ...]</code>.</p>
+<h2>Bud Modules</h2>
+<p>A Bud module combines state (collections) and logic (Bloom rules). Using modules allows your program to be decomposed into a collection of smaller units.</p>
+<p>Definining a Bud module is identical to defining a Ruby module, except that the module can use the <code>bloom</code>, <code>bootstrap</code>, and <code>state</code> blocks described above.</p>
+<p>There are two ways to use a module <em>B</em> in another Bloom module <em>A</em>:</p>
+<ol>
+<li>
+<p><code>include B</code>: This "inlines" the definitions (state and logic) from <em>B</em> into
+     <em>A</em>. Hence, collections defined in <em>B</em> can be accessed from <em>A</em> (via the
+     same syntax as <em>A</em>'s own collections). In fact, since Ruby is
+     dynamically-typed, Bloom statements in <em>B</em> can access collections
+     in <em>A</em>!</p>
+</li>
+<li>
+<p><code>import B =&gt; :b</code>: The <code>import</code> statement provides a more structured way to
+     access another module. Module <em>A</em> can now access state defined in <em>B</em> by
+     using the qualifier <code>b</code>. <em>A</em> can also import two different copies of <em>B</em>,
+     and give them local names <code>b1</code> and <code>b2</code>; these copies will be independent
+     (facts inserted into a collection defined in <code>b1</code> won't also be inserted
+     into <code>b2</code>'s copy of the collection).</p>
+</li>
+</ol>
+<h2>Skeleton of a Bud Module</h2>
+<pre><code>require 'rubygems'
+require 'bud'
+module YourModule
+  include Bud
+  state do
+    ...
+  end
+  bootstrap do
+    ...
+  end
+  bloom :some_stmts do
+    ...
+  end
+  bloom :more_stmts do
+    ...
+  end
+end
+</code></pre>
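The `reduce` group-and-count idiom listed under Collection Methods in the cheat sheet above can be checked in plain Ruby. This is a sketch under assumptions: `CountRow` and `count_by_col1` are hypothetical names, and an ordinary array stands in for a Bud collection.

```ruby
# Hypothetical row type with the column names used in the cheat sheet.
CountRow = Struct.new(:col1, :col2)

# Group rows by col1 and count them, using the same memo pattern as the cheat sheet.
def count_by_col1(rows)
  counts = rows.reduce({}) do |memo, t|
    memo[t.col1] ||= 0      # first time we see this group
    memo[t.col1] += 1
    memo                    # reduce needs the accumulator returned each step
  end
  counts.to_a               # one [group, count] pair per distinct col1
end

rows = [CountRow.new("a", 1), CountRow.new("a", 2), CountRow.new("b", 3)]
count_by_col1(rows)
# => [["a", 2], ["b", 1]]
```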

data/docs/cheat.md CHANGED

@@ -9,19 +9,26 @@ As in Ruby, backslash is used to escape a newline.<br>
     require 'bud'
     class Foo
-        include Bud
+      include Bud
-        state do
-          ...
-        end
+      state do
+        ...
+      end
-        bloom do
-          ...
-        end
+      bloom do
+        ...
+      end
     end
 ## State Declarations ##
-A `state` block contains Bud collection definitions.
+A `state` block contains Bud collection definitions. A Bud collection is a *set*
+of *facts*; each fact is an array of Ruby values. Note that collections do not
+contain duplicates (inserting a duplicate fact into a collection is ignored).
+Like a table in a relational databas, a subset of the columns in a collection
+makeup the collection's _key_. Attempting to insert two facts into a collection
+that agree on the key columns (but are not duplicates) results in a runtime
+exception.
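The set and key semantics described in this paragraph can be modeled with a small toy class. This is a sketch only, not Bud's implementation: `ToyCollection` and `KeyViolation` are hypothetical names, and facts are assumed to list their key columns first.

```ruby
# A toy in-memory model of a keyed collection (illustrative only).
class ToyCollection
  class KeyViolation < StandardError; end

  def initialize(num_key_cols)
    @num_key_cols = num_key_cols
    @rows = {}                         # key values => full fact
  end

  # Insert one fact: an array of Ruby values with the key columns first.
  def insert(fact)
    key = fact.first(@num_key_cols)
    existing = @rows[key]
    if existing.nil?
      @rows[key] = fact                # new key: store the fact
    elsif existing != fact
      # same key, different values: the "runtime exception" case
      raise KeyViolation, "key #{key.inspect} already maps to #{existing.inspect}"
    end                                # exact duplicate: silently ignored
    self
  end

  def to_a
    @rows.values
  end
end

t = ToyCollection.new(1)
t.insert([1, "a"]).insert([1, "a"])    # the duplicate is ignored
t.to_a                                 # => [[1, "a"]]
# t.insert([1, "b"]) would raise ToyCollection::KeyViolation
```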
 ### Default Declaration Syntax ###
 *BudCollection :name, [keys] => [values]*
@@ -57,10 +64,18 @@ Default attributes: `[:@address, :val] => []`
     channel :msgs
     channel :req_chan, [:cartnum, :storenum, :@server] => [:command, :params]
+### loopback ###
+A network channel that delivers tuples back to the current Bud instance.<br>
+Default attributes: `[:key] => [:val]`
+(Bloom statements with loopback on lhs must use async merge (`<~`).)
+    loopback :talk_to_self
 ### periodic ###
 System timer manifested as a scratch collection.<br>
 System-provided attributes: `[:key] => [:val]`<br>
-&nbsp;&nbsp;&nbsp;&nbsp; (`key` is a unique ID, `val` is a Ruby Time converted to a string.)<br>
+&nbsp;&nbsp;&nbsp;&nbsp; (`key` is a unique ID, `val` is a Ruby `Time` object.)<br>
 State declaration includes interval (in seconds).
 (periodic can only be used on rhs of a Bloom statement.)
@@ -68,11 +83,19 @@ State declaration includes interval (in seconds).
     periodic :timer, 0.1
 ### stdio ###
-Built-in scratch collection mapped to Ruby's `$stdin` and `$stdout`<br>
+Built-in scratch collection for performing terminal I/O.<br>
 System-provided attributes: `[:line] => []`
 Statements with stdio on lhs must use async merge (`<~`).<br>
-To capture `$stdin` on rhs, instantiate Bud with `:read_stdin` option.<br>
+Using `stdio` on the lhs of an async merge results in writing to the `IO` object specified by the `:stdout` Bud option (`$stdout` by default).<br>
+To use `stdio` on rhs, instantiate Bud with `:stdin` option set to an `IO` object (e.g., `$stdin`).<br>
+### dbm_table ###
+Table collection mapped to a [DBM] (http://en.wikipedia.org/wiki/Dbm) store.<br>
+Default attributes: `[:key] => [:val]`
+    dbm_table :t1
+    dbm_table :t2, [:k1, :k2] => [:v1, :v2]
 ### tctable ###
 Table collection mapped to a [Tokyo Cabinet](http://fallabs.com/tokyocabinet/) store.<br>
@@ -95,7 +118,7 @@ State declaration includes Zookeeper path and optional TCP string (default: "loc
 Left-hand-side (lhs) is a named `BudCollection` object.<br>
 Right-hand-side (rhs) is a Ruby expression producing a `BudCollection` or `Array` of `Arrays`.<br>
-BloomOp is one of the 5 operators listed below.
+BloomOp is one of the 4 operators listed below.
 ### Bloom Operators ###
 merges:
@@ -108,13 +131,6 @@ delete:
 * `left <- right` &nbsp;&nbsp;&nbsp;&nbsp; (*deferred*)
-insert:<br>
-* `left << [...]` &nbsp;&nbsp;&nbsp;&nbsp; (*instantaneous*)
-Note that unlike merge/delete, insert expects a single fact on the rhs, rather
-than a collection.
 ### Collection Methods ###
 Standard Ruby methods used on a BudCollection `bc`:
@@ -154,7 +170,7 @@ implicit map:
     stdio <~ bc.inspected
-`chan.payloads`: projects `chan` to non-address columns. only defined for channels
+`chan.payloads`: projects `chan` to non-address columns. Only defined for channels.
     # at sender
     msgs <~ requests {|r| "127.0.0.1:12345", r}
@@ -191,13 +207,13 @@ To match items across two (or more) collections, use the `*` operator, followed
 ### Methods on Combinations (Joins) ###
 `pairs(`*hash pairs*`)`: <br>
-given a `*` expression, form all pairs of items with value matches in the hash-pairs attributes.  Hash pairs can be fully qualified (`coll1.attr1 => coll2.attr2`) or shorthand (`:attr1 => :attr2`).
+Given a `*` expression, form all pairs of items with value matches in the hash-pairs attributes.  Hash pairs can be fully qualified (`coll1.attr1 => coll2.attr2`) or shorthand (`:attr1 => :attr2`).
     # for each inbound msg, find match in a persistent buffer
     result <= (msg * buffer).pairs(:val => :key) {|m, b| [m.address, m.val, b.val] }
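What `pairs(:val => :key)` computes in the example above is an equijoin. A plain-Ruby sketch follows, assuming hypothetical names (`MsgRow`, `BufferRow`, `toy_pairs`) and a hash index on the right-hand collection; Bud's actual join strategy may differ.

```ruby
# Hypothetical row types matching the columns used in the statement above.
MsgRow    = Struct.new(:address, :val)
BufferRow = Struct.new(:key, :val)

# Equijoin on left[left_attr] == right[right_attr]; the block formats each pair.
def toy_pairs(left, right, left_attr, right_attr)
  index = right.group_by { |r| r[right_attr] }     # hash index on the right side
  left.flat_map do |l|
    (index[l[left_attr]] || []).map { |r| yield(l, r) }
  end
end

msgs = [MsgRow.new("10.0.0.1:9000", :x), MsgRow.new("10.0.0.2:9000", :z)]
buf  = [BufferRow.new(:x, "payload-x"), BufferRow.new(:y, "payload-y")]
toy_pairs(msgs, buf, :val, :key) { |m, b| [m.address, m.val, b.val] }
# => [["10.0.0.1:9000", :x, "payload-x"]]
```

Matching through an index touches only candidate pairs, whereas testing the condition inside the block means visiting every combination first; this is one plausible reading of the cheat sheet's "this will be very slow" remark.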
-`pairs(`*hash pairs*`)`: <br>
-alias for `pairs`, more readable for multi-collection `*` expressions.  Must use fully-qualified hash pairs.
+`combos(`*hash pairs*`)`: <br>
+Alias for `pairs`, more readable for multi-collection `*` expressions.  Must use fully-qualified hash pairs.
     # the following 2 Bloom statements are equivalent to this SQL
     # SELECT r.a, s_tab.b, t.c
@@ -211,16 +227,16 @@ alias for `pairs`, more readable for multi-collection `*` expressions.  Must use
            end
     # column matching done per pair: this will be very slow
-    out <= join([r,s_tab,t]) do |t1, t2, t3|
+    out <= (r * s_tab * t).combos do |t1, t2, t3|
              [t1.a, t2.b, t3.c] if r.x == s_tab.x and s_tab.x == t.x
            end
 `matches`:<br>
-Shorthand for `combos` with hash pairs for all attributes with matching names.
+Shorthand for `combos` with hash pairs for all attributes with matching names; this is called the "natural join" in SQL.
     # Equivalent to the above statements if x is the only attribute name in common:
-    out <= (r * s_tab * t).matches do {|t1, t2, t3| [t1.a, t2.b, t3.c]}
+    out <= (r * s_tab * t).matches {|t1, t2, t3| [t1.a, t2.b, t3.c]}
 `lefts(`*hash pairs*`)`: <br>
 Like `pairs`, but implicitly includes a block that projects down to the left item in each pair.
@@ -232,9 +248,8 @@ Like `pairs`, but implicitly includes a block that projects down to the right it
     out <= (r * s).matches.flatten.group([:a], max(:b))
-### Left Join ###
-`leftjoin([`*t1, t2*`]` *, optional hash pairs, ...*`)`<br>
-Left Outer Join.  Note postfix syntax with array of 2 collections as first argument, hash pairs as subsequent arguments.  Objects in the first collection will be included in the output even if no match is found in the second collection.
+`outer(`*hash pairs*`)`:<br>
+Left Outer Join.  Like `pairs`, but objects in the first collection will be produced nil-padded if they have no match in the second collection.
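The nil-padding behavior of a left outer join can be sketched in plain Ruby. `toy_outer` is a hypothetical helper, rows are plain arrays addressed by column position, and the exact shape Bud gives the padded tuple is an assumption here.

```ruby
# Left outer join over arrays of rows.
# left_col / right_col are column positions; right_width is the number of
# columns in the right collection, needed to build the nil padding.
def toy_outer(left, right, left_col, right_col, right_width)
  index   = right.group_by { |r| r[right_col] }
  padding = Array.new(right_width)                 # [nil, nil, ...]
  left.flat_map do |l|
    # unmatched left rows are kept and paired with the all-nil row
    (index[l[left_col]] || [padding]).map { |r| [l, r] }
  end
end

emp  = [["alice", 1], ["bob", 2]]
dept = [[1, "eng"]]
toy_outer(emp, dept, 1, 0, 2)
# => [[["alice", 1], [1, "eng"]], [["bob", 2], [nil, nil]]]
```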
 ## Temp Collections ##
 `temp`<br>

data/docs/deploy.md CHANGED

@@ -1,45 +1,43 @@
 # Deployment
-Bud provides support for deploying a program onto a set of Bud instances.  At the moment, two types of deployments are supported: local deployment and EC2 deployment.  Intuitively, you include the module corresponding to the type of deployment you want into a Bud class, which you instantiate and run on a node called the "deployer."  The deployer then spins up a requested number of Bud instances and distributes initial data.
+Bud provides support for deploying a program onto a set of Bud instances.  At the moment, two types of deployments are supported: fork-based local deployment and EC2 deployment.  Intuitively, you include the module corresponding to the type of deployment you want into a Bud class, which you instantiate and run on a node called the "deployer."  The deployer then spins up a requested number of Bud instances and distributes initial data.
 First, decide which type of deployment you want to use.
-## Local Deployment
+## Fork-based Local Deployment
-To use local deployment, you'll need to require it in your program:
+To use fork-based deployment, you'll need to include it in your class:
-    require 'bud/deploy/localdeploy'
+    include ForkDeploy
-Don't forget to include it in your class:
-    include LocalDeploy
-The next step is to declare how many nodes you want to the program to be spun up on.  You need to do this in a `deploystrap` block.  A `deploystrap` block is run before `bootstrap`, and is only run for a Bud class that is instantiated with the option "`:deploy => true`".  As an example:
+The next step is to declare how many nodes you want to the program to be spun up on.  You need to do this in a `deploystrap` block.  A `deploystrap` block is run before `bootstrap`, and is only run for a Bud class that is instantiated with the option `:deploy => true`.  As an example:
     deploystrap do
       num_nodes <= [[2]]
     end
-Local deployment will spin up `num_nodes` local processes, each containing one Bud instance, running the class that you include `LocalDeploy` in.  The deployment code will populate a binary collection called `node`; the first columm is a "node ID" -- a distinct integer from the range `[0, num_nodes - 1]` -- and the second argument is an "IP:port" string associated with the node.  Nodes are spun up on ephemeral ports, listening on "localhost".
+Fork-based deployment will spin up `num_nodes` local processes, each containing one Bud instance, running the class that you include `ForkDeploy` in.  The deployment code will populate a binary collection called `node`; the first columm is a "node ID" -- a distinct integer from the range `[0, num_nodes - 1]` -- and the second argument is an "IP:port" string associated with the node.  Nodes are spun up on ephemeral ports, listening on "localhost".
-Now, you need to define how you want the initial data to be distributed.  You can do this, for example, by writing (multiple) rules with `initial_data` in the head.  The schema of `initial_data` is as follows: [node ID, relation name as a symbol, list of tuples].  For example, to distribute the IP address of the "deployer" to all of the other nodes in a relation called `master`, you might decide to write something like this:
+Now, you need to define how you want the initial data to be distributed.  You can do this, for example, by writing (multiple) rules with `initial_data` in the head.  These rules can appear in any `bloom` block in your program. The schema of `initial_data` is as follows: [node ID, relation name as a symbol, list of tuples].
-    initial_data <= node.map {|n| [n.uid, :master, [[ip_port]]]}
+For example, to distribute the IP address and port of the "deployer" to all of the other nodes in a relation called `master`, you might decide to write something like this:
-Note that the relation ("master" in this case) is never a channel.  You may only distribute data to scratches and tables.  Initial data is transferred only after _all_ nodes are spun up; this ensures that initial data will never be lost because a node is not yet listening on a socket, for example.  Initial data is transmitted atomically to each node; this means that on each node, _all_ initial data in _all_ relations will arrive at the same Bud timestep.  However, there is no global barrier for transfer of initial data.  For example, if initial data is distributed to nodes 1 and 2, node 1 may receive its initial data first, and then send subsequent messages on channels to node 2 which node 2 may receive before its initial data.
+    initial_data <= node {|n| [n.uid, :master, [[ip_port]]]}
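The shape of the facts this rule produces can be seen without running a deployment. The sketch below is plain Ruby under stated assumptions: `NodeRow`, its `addr` field, and `initial_data_for` are hypothetical, and the string `"localhost:5000"` stands in for the deployer's `ip_port`.

```ruby
# Hypothetical stand-in for a row of the `node` collection: [node ID, "IP:port"].
NodeRow = Struct.new(:uid, :addr)

# Build one initial_data fact per node:
#   [node ID, relation name as a symbol, list of tuples]
def initial_data_for(nodes, relation, tuples)
  nodes.map { |n| [n.uid, relation, tuples] }
end

nodes    = [NodeRow.new(0, "localhost:5001"), NodeRow.new(1, "localhost:5002")]
deployer = "localhost:5000"            # assumed value of ip_port on the deployer
initial_data_for(nodes, :master, [[deployer]])
# => [[0, :master, [["localhost:5000"]]], [1, :master, [["localhost:5000"]]]]
```

The double brackets matter: the third element is a list of tuples, so a single one-column tuple is written `[[value]]`.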
-The rules defining `initial_data` may appear in any `bloom` block.
+Note that the relation (`master` in this case) cannot be a channel -- you may only distribute data to scratches and tables.  Initial data is transferred only after _all_ nodes are spun up; this ensures that initial data will never be lost because a node is not yet listening on a socket, for example.  Initial data is transmitted atomically to each node; this means that on each node, _all_ initial data in _all_ relations will arrive at the same Bud timestep.  However, there is no global barrier for transfer of initial data.  For example, if initial data is distributed to nodes 1 and 2, node 1 may receive its initial data first, and then send subsequent messages on channels to node 2 which node 2 may receive before its initial data.
-The final step is to add `:deploy => true` to the instantiation of your class.  Note that the local deployer will spin up nodes without `:deploy => true`, so you don't forkbomb your system.
+The final step is to add `:deploy => true` to the instantiation of your class.  Note that the fork-based deployer will spin up nodes without `:deploy => true`, so you don't forkbomb your system.
 ## EC2 Deployment
-To use EC2 deployment you'll need to require it in your program:
+To use EC2 deployment, you'll need to require it in your program:
     require 'bud/deploy/ec2deploy'
-Don't forget to include it in your class:
+Note that the `amazon-ec2`, `net-scp`, and `net-ssh` gems must be installed.
+Next, include the `EC2Deploy` module in your class:
     include EC2Deploy
@@ -71,26 +69,28 @@ This recursively copies all directories and files rooted at the current working
 Finally, `ec2_key_location` is the path to the private key of the `key_name` keypair.  For example:
-    key_name <= [["/home/bob/.ssh/ec2"]]
+    ec2_key_location <= [["/home/bob/.ssh/bob.pem"]]
-EC2 deployment will spin up `num_nodes` instances (using defaults) on EC2 using a pre-rolled Bud AMI based on Amazon's 32-bit Linux AMI (`ami-8c1fece5`).  Each instance contains one Bud instance, which runs the `ruby_command`.  Like before, the deployment code will populate a binary relation called `node`; the first argument is a "node ID" -- a distinct integer from the range [0, num_nodes - 1] -- and the second argument is an "IP:port" string associated with the node.  Nodes are currently spun up on fixed port 54321.
+EC2 deployment will spin up `num_nodes` instances (using defaults) on EC2 using a pre-rolled Bud AMI based on Amazon's 32-bit Linux AMI (`ami-8c1fece5`).  Each instance contains one Bud instance, which runs the `ruby_command`.  Like before, the deployment code will populate a binary relation called `node`; the first attribute is a "node ID" -- a distinct integer from the range [0, num_nodes - 1] -- and the second attribute is an "IP:port" string associated with the node.  Nodes are currently spun up on fixed port 54321.
-Defining initial data works exactly the same way with EC2 deployment as it does with local deployment.
+Defining `initial_data` works exactly the same way with EC2 deployment as it does with local deployment.
-There is a slight catch with EC2 deployment.  Sometimes EC2 tells us that the nodes have all started up, but really, one or more nodes will never start up.  Currently, in this scenario, deployment exceeds the maximum number of ssh retries, and throws an exception.
+There is a slight catch with EC2 deployment.  Sometimes EC2 tells us that the nodes have all started up, but really, one or more nodes will never start up.  Currently, in this scenario, deployment exceeds the maximum number of SSH retries, and throws an exception.
 Note that EC2 deployment does *not* shut down the EC2 nodes it starts up under any circumstances.  This means you must use some alternate means to shut down the nodes, such as logging onto the EC2 web interface and terminating the nodes.
 ## Examples
-Check out the `examples/deploy` directory in Bud.  There is a simple token ring example that establishes a ring involving 10 nodes and sends a token around the ring continuously.  This example can be deployed locally:
+Check out the `examples/deploy` directory in Bud.  There is a simple token ring example that establishes a ring involving 10 nodes and sends a token around the ring continuously.  This example can be deployed locally using the fork-based deployer:
-    ruby tokenring-local.rb
+    ruby tokenring-fork.rb
 or on EC2:
-    ruby tokenring-ec2.rb local_ip:local_port ext_ip true
+    ruby tokenring-ec2.rb local_ip:local_port ext_ip:ext_port true
+"ext_ip" and "ext_port" should be set to the externally-visible IP and port of the computer you are deploying from.  For example, if you are behind a home router, you will want to set "ext_ip" to your public IP address, and ensure that your router forwards "ext_port" to local_port on the computer with private IP "local_ip".
-Note that before running `tokenring-ec2`, you must create a "keys.rb" file that contains `access_key_id`, `secret_access_key`, `key_name` and `ec2_key_location`.
+Note that before running `tokenring-ec2`, you must create a file named "keys.rb" that contains definitions for `access_key_id`, `secret_access_key`, `key_name` and `ec2_key_location`.
-Output will be displayed to show the progress of the deployment.  Be patient, it may take a while for output to appear.  Once deployment is complete and all nodes are ready, each node will display output indicating when it has the token.  All output will be visible for the local deployment case, whereas only the deployer node's output will be visible for the EC2 deployment case (stdout of all other nodes is materialized to disk).
+Output will be displayed to show the progress of the deployment.  Be patient, it may take a while for output to appear.  Once deployment is complete and all nodes are ready, each node will display output indicating when it has the token.  All output will be visible for the fork-based deployment case, whereas only the deployer node's output will be visible for the EC2 deployment case (stdout of all other nodes is written to local disk).
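
As a rough illustration of the shapes described in this file's diff, here is a plain-Ruby sketch that does not use Bud at all. It builds a `node`-style relation of `[node_id, "IP:port"]` pairs and passes a token around a 10-node ring. The helper names (`build_nodes`, `next_node`, `pass_token`), the sample IP addresses, and the "successor is id + 1, wrapping to 0" rule are assumptions made for illustration; the actual `tokenring.rb` example may organize the ring differently.

```ruby
# Plain-Ruby sketch; no Bud required. All helper names are hypothetical.
PORT = 54321 # the fixed port mentioned in the EC2 deployment notes

# Build a `node`-style relation: [node_id, "IP:port"] pairs with ids 0..n-1.
def build_nodes(ips, port = PORT)
  ips.each_with_index.map { |ip, uid| [uid, "#{ip}:#{port}"] }
end

# Assumed ring rule: the successor of node `uid` is uid + 1, wrapping to 0.
def next_node(nodes, uid)
  nodes[(uid + 1) % nodes.length]
end

# Pass the token for `hops` steps starting at node 0.
# Returns the ids of the token holders in order, including the start.
def pass_token(nodes, hops)
  holder = 0
  trace = [holder]
  hops.times do
    holder = next_node(nodes, holder).first
    trace << holder
  end
  trace
end

# Ten nodes with made-up private addresses.
nodes = build_nodes((1..10).map { |i| "10.0.0.#{i}" })
puts pass_token(nodes, 12).inspect
```

In a real deployment the `node` relation is populated by the deployer, and the token travels over Bud channels rather than through a local loop; the sketch only shows the data shapes and the wrap-around.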

data/docs/getstarted.md CHANGED

@@ -2,7 +2,11 @@
 In this document we'll do a hands-on tour of Bud and its Bloom DSL for Ruby.  We'll start with some examples, and introduce concepts as we go.
 ## Installation ##
-You know the drill!
+Bud depends on one library that needs to be installed separately:
+* [GraphViz](http://www.graphviz.org/Download.php) (2.26.3 recommended)
+Once that's done, you know the drill!
     % gem install bud
@@ -248,8 +252,6 @@ And here's the code:
       end
     end
     if ARGV.length == 2
       server = ARGV[1]
     else
@@ -264,7 +266,7 @@ The ChatClient class has a typical Ruby `initialize` method that sets up two loc
 The next block in the class is the first Bloom `bootstrap` block we've seen.  This is a set of Bloom statements that are evaluated only once, just before the first "regular" timestep of the system.  In this case, we bootstrap the client by sending a message to the server on the connect channel, containing the client's address (via the built-in Bud instance method `ip_port`) and chosen nickname.
-After that comes a bloom block, with the name `:chatter`.  It contains two statements: one to take stdio input from the terminal and send it to the server via mcast, and another to receive mcasts and place them on stdio output.  The first statement has the built-in `stdio` scratch on the rhs: this includes any lines of terminal input that arrived since the last timestep.  For each line of terminal input, the `do...end` block formats an `mcast` message destined to the address in the instance variable `@server`, with an array as the payload.  The rhs of the second statement takes `mcast` messages that arrived since the last timestep.  For each message `m`, the `m.val` expression in the block returns the message payload; the call to the Ruby instance method `pretty_print` formats the message so it will look nice on-screen.  These formatted strings are placed (asynchronously, as before) into `stdio` on the left.
+After that comes a Bloom block with the name `:chatter`.  It contains two statements: one to take stdio input from the terminal and send it to the server via mcast, and another to receive mcasts and place them on stdio output.  The first statement has the built-in `stdio` scratch on the rhs: this includes any lines of terminal input that arrived since the last timestep.  For each line of terminal input, the `do...end` block formats an `mcast` message destined to the address in the instance variable `@server`, with an array as the payload.  The rhs of the second statement takes `mcast` messages that arrived since the last timestep.  For each message `m`, the `m.val` expression in the block returns the message payload; the call to the Ruby instance method `pretty_print` formats the message so it will look nice on-screen.  These formatted strings are placed (asynchronously, as before) into `stdio` on the left.
 The remaining lines are Ruby driver code to instantiate and run the ChatClient class (which includes the `Bud` module) using arguments from the command line.  Note the option `:read_stdin => true` to `ChatClient.new`: this causes the Bud runtime to capture stdin via the built-in `stdio` collection.