RubyGems - bud - Versions diffs - 0.0.2 - Mend

bud 0.0.2


Files changed (62)

data/LICENSE +9 -0
data/README +30 -0
data/bin/budplot +134 -0
data/bin/budvis +201 -0
data/bin/rebl +4 -0
data/docs/README.md +13 -0
data/docs/bfs.md +379 -0
data/docs/bfs.raw +251 -0
data/docs/bfs_arch.png +0 -0
data/docs/bloom-loop.png +0 -0
data/docs/bust.md +83 -0
data/docs/cheat.md +291 -0
data/docs/deploy.md +96 -0
data/docs/diffs +181 -0
data/docs/getstarted.md +296 -0
data/docs/intro.md +36 -0
data/docs/modules.md +112 -0
data/docs/operational.md +96 -0
data/docs/rebl.md +99 -0
data/docs/ruby_hooks.md +19 -0
data/docs/visualizations.md +75 -0
data/examples/README +1 -0
data/examples/basics/hello.rb +12 -0
data/examples/basics/out +1103 -0
data/examples/basics/out.new +856 -0
data/examples/basics/paths.rb +51 -0
data/examples/bust/README.md +9 -0
data/examples/bust/bustclient-example.rb +23 -0
data/examples/bust/bustinspector.html +135 -0
data/examples/bust/bustserver-example.rb +18 -0
data/examples/chat/README.md +9 -0
data/examples/chat/chat.rb +45 -0
data/examples/chat/chat_protocol.rb +8 -0
data/examples/chat/chat_server.rb +29 -0
data/examples/deploy/tokenring-ec2.rb +26 -0
data/examples/deploy/tokenring-local.rb +17 -0
data/examples/deploy/tokenring.rb +39 -0
data/lib/bud/aggs.rb +126 -0
data/lib/bud/bud_meta.rb +185 -0
data/lib/bud/bust/bust.rb +126 -0
data/lib/bud/bust/client/idempotence.rb +10 -0
data/lib/bud/bust/client/restclient.rb +49 -0
data/lib/bud/collections.rb +937 -0
data/lib/bud/depanalysis.rb +44 -0
data/lib/bud/deploy/countatomicdelivery.rb +50 -0
data/lib/bud/deploy/deployer.rb +67 -0
data/lib/bud/deploy/ec2deploy.rb +200 -0
data/lib/bud/deploy/localdeploy.rb +41 -0
data/lib/bud/errors.rb +15 -0
data/lib/bud/graphs.rb +405 -0
data/lib/bud/joins.rb +300 -0
data/lib/bud/rebl.rb +314 -0
data/lib/bud/rewrite.rb +523 -0
data/lib/bud/rtrace.rb +27 -0
data/lib/bud/server.rb +43 -0
data/lib/bud/state.rb +108 -0
data/lib/bud/storage/tokyocabinet.rb +170 -0
data/lib/bud/storage/zookeeper.rb +178 -0
data/lib/bud/stratify.rb +83 -0
data/lib/bud/viz.rb +65 -0
data/lib/bud.rb +797 -0
metadata +330 -0

data/docs/bust.md ADDED

@@ -0,0 +1,83 @@
+BUST stands for BUd State Transfer and it is a REST interface to BUD.  BUST consists of a Bud implementation of a client and server.  The client implements bindings to a subset of the Ruby Nestful library, and the server is a lightweight HTTP server written in Ruby.  Note that the BUST server currently sets the "Access-Control-Allow-Origin: *" HTTP header to override web browsers' same-origin policy.
+Right now BUST supports "GET" and "POST" requests, and may support "DELETE" and "PUT" requests in the future.
+# BUST Server
+For the BUST server, a "GET" request corresponds to retrieving a subset of rows of a table, and a "POST" request corresponds to inserting a row into a table.  For example, the following "GET" request (assuming BUST is running on port 8080):
+    GET localhost:8080/foo?bar=hello&baz=world
+would retrieve all rows in table "foo" where named schema attribute "bar" is equal to the string "hello", and named schema attribute "baz" is equal to the string "world".  Right now, one limitation of BUST is that only strings are supported.
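The selection semantics described above can be sketched in plain Ruby. This is only an illustration under stated assumptions: `SCHEMA`, `ROWS`, and `select_rows` are hypothetical names, and an array of string rows stands in for the Bud table; none of this is BUST's actual code.

```ruby
require 'uri'

# Hypothetical stand-in for a Bud table: a schema plus rows of strings.
SCHEMA = [:bar, :baz, :qux]
ROWS = [
  ["hello", "world", "1"],
  ["hello", "there", "2"],
  ["bye",   "world", "3"]
]

# Keep the rows whose named attributes equal every value in the query string.
def select_rows(schema, rows, query)
  conditions = URI.decode_www_form(query) # pairs of [attribute name, value]
  rows.select do |row|
    conditions.all? do |name, value|
      idx = schema.index(name.to_sym)
      !idx.nil? && row[idx] == value       # unknown attributes match nothing
    end
  end
end

matches = select_rows(SCHEMA, ROWS, "bar=hello&baz=world")
p matches
```

Only the first row survives, because it is the only one where both "bar" and "baz" match. All comparisons are string comparisons, mirroring the strings-only limitation mentioned above.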
+To use BUST in your program, ensure you have the json gem installed.  Add the "require" line for BUST:
+    require "bud/bust/bust"
+In your class, make sure to:
+    include Bust
+That's it!  Now a BUST server will be started up when your class is instantiated.  By default, this server will listen on port 8080, but you can change this by passing a port via the "bust_port" option when you instantiate your class.
+You can test out the BUST server using Ruby's "net/http" library if you want, and you can also check out "BUST Inspector", a sample AJAX application that allows you to view the state of a bud instance.
+## net/http Example
+Try running "bustserver-example.rb" in the "examples/bust" directory:
+    ruby bustserver-example.rb
+Now, let's interact with our example using "net/http" from within IRB.  Start up an IRB instance:
+    irb
+    irb(main):001:0> require 'net/http'
+    => true
+bustserver-example.rb defines a single relation called "foo":
+    table :foo, [:bar, :baz, :qux]
+Let's fire off some requests.  First, let's put a new foo fact in:
+    irb(main):002:0> res = Net::HTTP.post_form(URI.parse('http://localhost:8080/foo'), {:bar => "a", :baz => "b", :qux => "c"})
+    => #<Net::HTTPOK 200 /OK readbody=true>
+Now, let's retrieve all foo facts where the "qux" attribute is "c", and the "baz" attribute is "b":
+    irb(main):003:0> res = Net::HTTP.get(URI.parse('http://localhost:8080/foo?qux=c&baz=b'))
+    => "[[\"a\",\"b\",\"c\"]]"
+Note that the response is a JSON array.
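Since the response is a JSON array, it can be turned back into Ruby arrays with the standard `json` library. The sketch below builds the same query URL as the IRB session and parses a copy of the response body shown there; it does not contact a server, and the literal `body` string is taken from the session above rather than fetched.

```ruby
require 'json'
require 'uri'

# Build the query URL; host, port and table name follow the example above.
uri = URI::HTTP.build(:host => 'localhost', :port => 8080, :path => '/foo',
                      :query => URI.encode_www_form(:qux => 'c', :baz => 'b'))

# The body returned in the IRB session above, copied as a literal string.
body = "[[\"a\",\"b\",\"c\"]]"
rows = JSON.parse(body) # array of rows, each row an array of strings

# Each row lists the attributes in schema order: bar, baz, qux.
rows.each { |bar, baz, qux| puts "bar=#{bar} baz=#{baz} qux=#{qux}" }
```

With a running server, `Net::HTTP.get(uri)` would supply `body` in place of the literal.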
+## BUST Inspector
+BUST Inspector is an example app, included in "examples/bust/bustinspector.html", that uses XMLHttpRequests to inspect state in a Bud program using BUST.  Right now, it assumes that the Bud instance you're trying to inspect is listening on "localhost" at port "8080".  BUST Inspector is tested to work in Firefox, and may or may not work in other browsers.  BUST Inspector will query your Bud instance every second for metadata describing the tables and their schema.  It will display a list of the tables in a pane on the left of the screen, with a checkbox next to each table.  Selecting a checkbox renders the current table contents in the right pane (these are also updated every second while the box is checked).
+# BUST Client
+The BUST client (located in the "bust/client" folder) allows Bud applications to access REST services (including a Bud client hosting a BUST instance). The REST client is basically a wrapper for the Ruby nestful library. You'll need to ensure you have the "nestful" gem installed before you can use the REST client. To use it in your application, you need to put the require line:
+    require 'bud/bust/client/restclient'
+and the include line:
+    include RestClient
+To make requests, insert into the rest_req interface, whose definition is reproduced below:
+    interface input, :rest_req, [:rid, :verb, :form, :url, :params]
+"rid" is a unique ID for the request.  "verb" is one of ":get" or ":post".  "form" is the format of the request: for example, you might use ":json", or ":form" if you're doing a form post. If set to nil, "form" defaults to ":form" for ":post", and is omitted from a ":get". For ":get" requests, the "form" parameter seems to be appended onto the end of "url". For example, if you do a ":get" for "http://example.com/ex" with "form" set to ":json", the library sends an HTTP GET to "http://example.com/ex.json". "params" is a hash, which comprises the query string for a ":get", and the contents of the body in a ":post" with "form" set to ":form".
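The rules above for "form", "url" and "params" can be modelled in a few lines of plain Ruby. `describe_request` is a hypothetical helper written for this note; it is not part of Bud or nestful, and it only encodes the behavior as described in the paragraph above (including the tentative "seems to be appended" rule for ":get").

```ruby
require 'uri'

# Hypothetical helper: models the description above, not the library itself.
def describe_request(verb, form, url, params)
  form = :form if form.nil? && verb == :post # nil defaults to :form for :post
  case verb
  when :get
    target = form ? "#{url}.#{form}" : url   # form is appended to the url
    query = URI.encode_www_form(params)      # params become the query string
    target = "#{target}?#{query}" unless query.empty?
    { :method => "GET", :url => target, :body => nil }
  when :post
    # Only the :form case is described above; other forms are left nil here.
    body = form == :form ? URI.encode_www_form(params) : nil
    { :method => "POST", :url => url, :body => body }
  else
    raise ArgumentError, "unsupported verb: #{verb.inspect}"
  end
end

get_req  = describe_request(:get, :json, "http://example.com/ex", {})
post_req = describe_request(:post, nil, "http://example.com/ex", { :a => "1" })
puts get_req[:url]
puts post_req[:body]
```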
+The output interface is:
+    interface output, :rest_response, [:rid, :resp, :exception]
+"rid" is the unique ID supplied when the request was made, "resp" is the parsed response from the server. For example, if you do a ":json" ":get", then "resp" will contain whatever JSON object was returned converted into a Ruby object, e.g., array, hash, etc. If there is an exception, then "resp" will contain a string describing the exception, and "exception" will be set to true; otherwise, "exception" will be set to false.
+A simple example is included (bustclient-example.rb) that does an HTTP GET on Twitter's public timeline, returning the most recent statuses, and prints them to stdout.  The example is in "examples/bust".
+The BUST client does not yet support OAuth. HTTP DELETE and PUT are also unsupported so far.

data/docs/cheat.md ADDED

@@ -0,0 +1,291 @@
+# Bud Cheat Sheet #
+## General Bloom Syntax Rules ##
+Bloom programs are unordered sets of statements.<br>
+Statements are delimited by semicolons (;) or newlines. <br>
+As in Ruby, backslash is used to escape a newline.<br>
+## Simple embedding of Bud in a Ruby Class ##
+    require 'bud'
+    class Foo
+        include Bud
+        state do
+          ...
+        end
+        bloom do
+          ...
+        end
+    end
+## State Declarations ##
+A `state` block contains Bud collection definitions.
+### Default Declaration Syntax ###
+*BudCollection :name, [keys] => [values]*
+### table ###
+Contents persist in memory until explicitly deleted.<br>
+Default attributes: `[:key] => [:val]`
+    table :keyvalue
+    table :composite, [:keyfield1, :keyfield2] => [:values]
+    table :noDups, [:field1, :field2]
+### scratch ###
+Contents emptied at start of each timestep.<br>
+Default attributes: `[:key] => [:val]`
+    scratch :stats
+### interface ###
+Scratch collections, used as connection points between modules.<br>
+Default attributes: `[:key] => [:val]`
+    interface input, :request
+    interface output, :response
+### channel ###
+Network channel manifested as a scratch collection.<br>
+Facts that are inserted into a channel are sent to a remote host; the address of the remote host is specified in an attribute of the channel that is denoted with `@`.<br>
+Default attributes: `[:@address, :val] => []`<br>
+(Bloom statements with channel on lhs must use async merge (`<~`).)
+    channel :msgs
+    channel :req_chan, [:@address, :cartnum, :storenum] => [:command, :params]
+### periodic ###
+System timer manifested as a scratch collection.<br>
+System-provided attributes: `[:key] => [:val]`<br>
+&nbsp;&nbsp;&nbsp;&nbsp; (`key` is a unique ID, `val` is a Ruby Time converted to a string.)<br>
+State declaration includes interval (in seconds).<br>
+(periodic can only be used on rhs of a Bloom statement.)
+    periodic :timer, 0.1
+### stdio ###
+Built-in scratch collection mapped to Ruby's `$stdin` and `$stdout`<br>
+System-provided attributes: `[:line] => []`<br>
+Statements with stdio on lhs must use async merge (`<~`).<br>
+To capture `$stdin` on rhs, instantiate Bud with `:read_stdin` option.<br>
+### tctable ###
+Table collection mapped to a [Tokyo Cabinet](http://fallabs.com/tokyocabinet/) store.<br>
+Default attributes: `[:key] => [:val]`
+    tctable :t1
+    tctable :t2, [:k1, :k2] => [:v1, :v2]
+### zktable ###
+Table collection mapped to an [Apache Zookeeper](http://hadoop.apache.org/zookeeper/) store.<br>
+System-provided attributes: `[:key] => [:val]`<br>
+State declaration includes Zookeeper path and optional TCP string (default: "localhost:2181")<br>
+    zktable :foo, "/bat"
+    zktable :bar, "/dat", "localhost:2182"
+## Bloom Statements ##
+*lhs BloomOp rhs*
+Left-hand-side (lhs) is a named `BudCollection` object.<br>
+Right-hand-side (rhs) is a Ruby expression producing a `BudCollection` or `Array` of `Arrays`.<br>
+BloomOp is one of the 5 operators listed below.
+## Bloom Operators ##
+merges:
+* `left <= right` &nbsp;&nbsp;&nbsp;&nbsp; (*instantaneous*)
+* `left <+ right` &nbsp;&nbsp;&nbsp;&nbsp; (*deferred*)
+* `left <~ right` &nbsp;&nbsp;&nbsp;&nbsp; (*asynchronous*)
+delete:
+* `left <- right` &nbsp;&nbsp;&nbsp;&nbsp; (*deferred*)
+insert:
+* `left << [...]` &nbsp;&nbsp;&nbsp;&nbsp; (*instantaneous*)
+Note that unlike merge/delete, insert expects a single fact on the rhs, rather
+than a collection.
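The timing difference between the instantaneous and deferred operators can be illustrated with a toy model in plain Ruby. `ToyTable` is a hypothetical class written for this note and is not how Bud is implemented; asynchronous merge (`<~`) is left out, and the order in which deferred deletes and inserts are applied at the timestep boundary is an assumption of the sketch.

```ruby
# Toy model of one persistent collection; not Bud's implementation.
class ToyTable
  attr_reader :rows

  def initialize
    @rows = []         # facts visible in the current timestep
    @pending_adds = [] # queued by a deferred merge (<+)
    @pending_dels = [] # queued by a deferred delete (<-)
  end

  def merge_now(facts)       # models `left <= right`
    @rows |= facts
  end

  def merge_deferred(facts)  # models `left <+ right`
    @pending_adds |= facts
  end

  def delete_deferred(facts) # models `left <- right`
    @pending_dels |= facts
  end

  def tick # move to the next timestep; deletes first (an assumption)
    @rows = (@rows - @pending_dels) | @pending_adds
    @pending_adds = []
    @pending_dels = []
  end
end

t = ToyTable.new
t.merge_now([[1, "a"]])
t.merge_deferred([[2, "b"]])
t.delete_deferred([[1, "a"]])
before = t.rows.dup # only the instantaneous merge is visible so far
t.tick
after = t.rows.dup  # deferred delete and deferred merge have taken effect
p before
p after
```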
+## Collection Methods ##
+Standard Ruby methods used on a BudCollection `bc`:
+implicit map:
+    t1 <= bc {|t| [t.col1 + 4, t.col2.chomp]} # formatting/projection
+    t2 <= bc {|t| t if t.col == 5}            # selection
+`flat_map`:
+    require 'backports' # flat_map not included in Ruby 1.8 by default
+    t3 <= bc.flat_map do |t| # unnest a collection-valued attribute
+      t.col4.map { |sub| [t.col1, t.col2, t.col3, sub] }
+    end
+`bc.reduce`, `bc.inject`:
+    t4 <= bc.reduce({}) do |memo, t|  # example: groupby col1 and count
+      memo[t.col1] ||= 0
+      memo[t.col1] += 1
+      memo
+    end
+`bc.include?`:
+    t5 <= bc do |t| # like SQL's NOT IN
+        t unless t2.include?([t.col1, t.col2])
+    end
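The same idioms can be tried on an ordinary Ruby array without Bud. In this sketch `Row` and the sample data are made up for illustration, and a plain array stands in for the collection `bc`. The sheet's selection idiom appears to rely on `nil` block results being dropped; plain Ruby needs an explicit `compact` for that.

```ruby
# Made-up row type and data; a plain Array stands in for a BudCollection.
Row = Struct.new(:col1, :col2, :col4)
bc = [Row.new(5, "x\n", [1, 2]), Row.new(7, "y\n", [3])]

# projection: build a new row from each input row
projected = bc.map { |t| [t.col1 + 4, t.col2.chomp] }

# selection: the block yields nil for rejected rows, which compact removes
selected = bc.map { |t| t if t.col1 == 5 }.compact

# flat_map: one output row per element of the nested col4 array
unnested = bc.flat_map { |t| t.col4.map { |sub| [t.col1, sub] } }

# reduce: count rows per col1 value
counts = bc.reduce({}) do |memo, t|
  memo[t.col1] ||= 0
  memo[t.col1] += 1
  memo
end

p projected, selected.length, unnested, counts
```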
+## BudCollection-Specific Methods ##
+`bc.keys`: projects `bc` to key columns<br>
+`bc.values`: projects `bc` to non-key columns<br>
+`bc.inspected`: shorthand for `bc {|t| [t.inspect]}`
+    stdio <~ bc.inspected
+`chan.payloads`: shorthand for `chan {|t| t.val}`, only defined for channels
+    # at sender
+    msgs <~ requests {|r| ["127.0.0.1:12345", r]}
+    # at receiver
+    requests <= msgs.payloads
+`bc.exists?`: test for non-empty collection.  Can optionally pass in a block.
+    stdio <~ [["Wake Up!"] if timer.exists?]
+    stdio <~ requests do |r|
+      [r.inspect] if msgs.exists?{|m| r.ident == m.ident}
+    end
+## SQL-style grouping/aggregation (and then some) ##
+* `bc.group([:col1, :col2], min(:col3))`.  *akin to min(col3) GROUP BY (col1,col2)*
+  * exemplary aggs: `min`, `max`, `choose`
+  * summary aggs: `sum`, `avg`, `count`
+  * structural aggs: `accum`
+* `bc.argmax([:col1], :col2)` &nbsp;&nbsp;&nbsp;&nbsp; *returns the bc tuple per col1 that has highest col2*
+* `bc.argmin([:col1], :col2)`
+### Built-in Aggregates: ###
+* Exemplary aggs: `min`, `max`, `choose`
+* Summary aggs: `count`, `sum`, `avg`
+* Structural aggs: `accum`
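For readers more familiar with plain Ruby than SQL, the intent of `group` and `argmax` can be approximated with `group_by`. The rows below are invented for illustration and hashes stand in for Bud tuples; this shows the meaning of the operations, not Bud's own code path.

```ruby
# Invented sample rows; hashes stand in for Bud tuples.
rows = [
  { :col1 => "a", :col2 => 3 },
  { :col1 => "a", :col2 => 9 },
  { :col1 => "b", :col2 => 4 }
]

grouped = rows.group_by { |r| r[:col1] }

# like bc.group([:col1], min(:col2)): one output row per distinct col1
mins = grouped.map { |key, members| [key, members.map { |r| r[:col2] }.min] }

# like bc.argmax([:col1], :col2): keep the whole row with the largest col2
argmax = grouped.map { |_key, members| members.max_by { |r| r[:col2] } }

p mins
p argmax
```

The difference to note is that `group` returns the grouping columns plus the aggregate value, while `argmax` returns complete input rows.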
+## Collection Combination (Join) ##
+To match items across two (or more) collections, use the `*` operator, followed by methods to filter/format the result (`pairs`, `matches`, `combos`, `lefts`, `rights`).
+### Methods on Combinations (Joins) ###
+`pairs(`*hash pairs*`)`: <br>
+given a `*` expression, form all pairs of items with value matches in the hash-pairs attributes.  Hash pairs can be fully qualified (`coll1.attr1 => coll2.attr2`) or shorthand (`:attr1 => :attr2`).
+    # for each inbound msg, find match in a persistent buffer
+    result <= (msg * buffer).pairs(:val => :key) {|m, b| [m.address, m.val, b.val] }
+`combos(`*hash pairs*`)`: <br>
+alias for `pairs`, more readable for multi-collection `*` expressions.  Must use fully-qualified hash pairs.
+    # the following 2 Bloom statements are equivalent to this SQL
+    # SELECT r.a, s_tab.b, t.c
+    #   FROM r, s_tab, t
+    #  WHERE r.x = s_tab.x
+    #    AND s_tab.x = t.x;
+    # multiple column matches
+    out <= (r * s_tab * t).combos(r.x => s_tab.x, s_tab.x => t.x) do |t1, t2, t3|
+             [t1.a, t2.b, t3.c]
+           end
+    # column matching done per pair: this will be very slow
+    out <= join([r,s_tab,t]) do |t1, t2, t3|
+             [t1.a, t2.b, t3.c] if t1.x == t2.x and t2.x == t3.x
+           end
+`matches`:<br>
+Shorthand for `combos` with hash pairs for all attributes with matching names.
+    # Equivalent to the above statements if x is the only attribute name in common:
+    out <= (r * s_tab * t).matches {|t1, t2, t3| [t1.a, t2.b, t3.c]}
+`lefts(`*hash pairs*`)`: <br>
+Like `pairs`, but implicitly includes a block that projects down to the left item in each pair.
+`rights(`*hash pairs*`)`: <br>
+Like `pairs`, but implicitly includes a block that projects down to the right item in each pair.
+`flatten`<br>
+`flatten` is a bit like SQL's `SELECT *`: it produces a collection of concatenated objects, with a schema that is the concatenation of the schemas in tablelist (with duplicate names disambiguated.) Useful for chaining to operators that expect input collections with schemas, e.g. group:
+    out <= (r * s).matches.flatten.group([:a], max(:b))
+### Left Join ###
+`leftjoin([`*t1, t2*`]` *, optional hash pairs, ...*`)`<br>
+Left Outer Join.  Note postfix syntax with array of 2 collections as first argument, hash pairs as subsequent arguments.  Objects in the first collection will be included in the output even if no match is found in the second collection.
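The matching behavior of `pairs`, `lefts`, `rights` and a left outer join can be spelled out on plain Ruby arrays. `Msg`, `Buf` and the sample data are hypothetical. Using `nil` for the unmatched side of the outer join is an assumption of this sketch, since the sheet does not say how Bud represents it.

```ruby
# Hypothetical tuple types and data for illustration only.
Msg = Struct.new(:address, :val)
Buf = Struct.new(:key, :val)

msgs   = [Msg.new("10.0.0.1:5000", "k1"), Msg.new("10.0.0.2:5000", "k9")]
buffer = [Buf.new("k1", "stored-1"), Buf.new("k2", "stored-2")]

# like (msg * buffer).pairs(:val => :key): keep pairs where m.val == b.key
pairs  = msgs.product(buffer).select { |m, b| m.val == b.key }
result = pairs.map { |m, b| [m.address, m.val, b.val] }

# like lefts / rights: project each matching pair down to one side
lefts  = pairs.map { |m, _b| m }
rights = pairs.map { |_m, b| b }

# left outer join: every msg appears; nil marks a missing match (assumption)
left_outer = msgs.flat_map do |m|
  hits = buffer.select { |b| m.val == b.key }
  hits.empty? ? [[m, nil]] : hits.map { |b| [m, b] }
end

p result
p left_outer.map { |m, b| [m.val, b && b.val] }
```

The nested-loop `product` here is only for clarity; it has the same per-pair cost that the sheet warns about for unqualified joins.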
+## Temp Collections ##
+`temp`<br>
+Temp collections are scratches defined within a `bloom` block:
+    temp :my_scratch1 <= foo
+The schema of a temp collection is inherited from the rhs; if the rhs has no
+schema, a simple one is manufactured to suit the data found in the rhs at
+runtime: `[c0, c1, ...]`.
+## Bud Modules ##
+A Bud module combines state (collections) and logic (Bloom rules). Using modules allows your program to be decomposed into a collection of smaller units.
+Defining a Bud module is identical to defining a Ruby module, except that the module can use the `bloom`, `bootstrap`, and `state` blocks described above.
+There are two ways to use a module *B* in another Bloom module *A*:
+  1. `include B`: This "inlines" the definitions (state and logic) from *B* into
+     *A*. Hence, collections defined in *B* can be accessed from *A* (via the
+     same syntax as *A*'s own collections). In fact, since Ruby is
+     dynamically-typed, Bloom statements in *B* can access collections
+     in *A*!
+  2. `import B => :b`: The `import` statement provides a more structured way to
+     access another module. Module *A* can now access state defined in *B* by
+     using the qualifier `b`. *A* can also import two different copies of *B*,
+     and give them local names `b1` and `b2`; these copies will be independent
+     (facts inserted into a collection defined in `b1` won't also be inserted
+     into `b2`'s copy of the collection).
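The remark under `include` about *B* reaching into *A* follows from ordinary Ruby mixin behavior, which this plain-Ruby sketch shows without Bud. The method names are invented, and methods returning arrays stand in for collections; `import` has no plain-Ruby equivalent and is not modelled.

```ruby
# Plain Ruby mixin; arrays returned by methods stand in for collections.
module B
  def b_items
    ["from B"]
  end

  # Refers to a_items, which B never defines. The name is resolved at call
  # time against whatever object B has been mixed into.
  def everything
    b_items + a_items
  end
end

class A
  include B # "inlines" B's definitions into A

  def a_items
    ["from A"]
  end
end

combined = A.new.everything
p combined
```

This flexibility is also why `import` is described as the more structured option: with `include`, nothing stops the two modules from depending on each other's names.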
+## Skeleton of a Bud Module ##
+    require 'rubygems'
+    require 'bud'
+    module YourModule
+      include Bud
+      state do
+        ...
+      end
+      bootstrap do
+        ...
+      end
+      bloom :some_stmts do
+        ...
+      end
+      bloom :more_stmts do
+        ...
+      end
+    end

data/docs/deploy.md ADDED

@@ -0,0 +1,96 @@
+# Deployment
+Bud provides support for deploying a program onto a set of Bud instances.  At the moment, two types of deployments are supported: local deployment and EC2 deployment.  Intuitively, you include the module corresponding to the type of deployment you want into a Bud class, which you instantiate and run on a node called the "deployer."  The deployer then spins up a requested number of Bud instances and distributes initial data.
+First, decide which type of deployment you want to use.
+## Local Deployment
+To use local deployment, you'll need to require it in your program:
+    require 'bud/deploy/localdeploy'
+Don't forget to include it in your class:
+    include LocalDeploy
+The next step is to declare how many nodes you want the program to be spun up on.  You need to do this in a `deploystrap` block.  A `deploystrap` block is run before `bootstrap`, and is only run for a Bud class that is instantiated with the option "`:deploy => true`".  As an example:
+    deploystrap do
+      num_nodes <= [[2]]
+    end
+Local deployment will spin up `num_nodes` local processes, each containing one Bud instance, running the class that you include `LocalDeploy` in.  The deployment code will populate a binary collection called `node`; the first column is a "node ID" -- a distinct integer from the range `[0, num_nodes - 1]` -- and the second column is an "IP:port" string associated with the node.  Nodes are spun up on ephemeral ports, listening on "localhost".
+Now, you need to define how you want the initial data to be distributed.  You can do this, for example, by writing (multiple) rules with `initial_data` in the head.  The schema of `initial_data` is as follows: [node ID, relation name as a symbol, list of tuples].  For example, to distribute the IP address of the "deployer" to all of the other nodes in a relation called `master`, you might decide to write something like this:
+    initial_data <= node.map {|n| [n.uid, :master, [[ip_port]]]}
+Note that the relation ("master" in this case) is never a channel.  You may only distribute data to scratches and tables.  Initial data is transferred only after _all_ nodes are spun up; this ensures that initial data will never be lost because a node is not yet listening on a socket, for example.  Initial data is transmitted atomically to each node; this means that on each node, _all_ initial data in _all_ relations will arrive at the same Bud timestep.  However, there is no global barrier for transfer of initial data.  For example, if initial data is distributed to nodes 1 and 2, node 1 may receive its initial data first, and then send subsequent messages on channels to node 2 which node 2 may receive before its initial data.
+The rules defining `initial_data` may appear in any `bloom` block.
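To make the shape of `node` and `initial_data` concrete, here is a plain-Ruby sketch that builds the same tuples outside Bud. The `Node` struct, the port numbers and the deployer address are invented for illustration; in a real deployment the `node` collection is filled in by the deployment code and `ip_port` comes from the Bud instance.

```ruby
# Invented stand-ins: real values are supplied by Bud's deployment code.
Node = Struct.new(:uid, :addr)
num_nodes = 3
nodes = (0...num_nodes).map { |i| Node.new(i, "localhost:#{50000 + i}") }
deployer_ip_port = "localhost:49999"

# One initial_data tuple per node:
#   [node ID, relation name as a symbol, list of tuples]
initial_data = nodes.map { |n| [n.uid, :master, [[deployer_ip_port]]] }

initial_data.each { |tuple| p tuple }
```

Each node ends up with a single-row `master` relation holding the deployer's address, which is what the `initial_data` rule above expresses.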
+The final step is to add `:deploy => true` to the instantiation of your class.  Note that the local deployer will spin up nodes without `:deploy => true`, so you don't forkbomb your system.
+## EC2 Deployment
+To use EC2 deployment you'll need to require it in your program:
+    require 'bud/deploy/ec2deploy'
+Don't forget to include it in your class:
+    include EC2Deploy
+As in local deployment, you'll need to define `num_nodes` in a `deploystrap` block.  Additionally in the `deploystrap` block, you need to define the following relations: `ruby_command`, `init_dir`, `access_key_id`, `secret_access_key`, `key_name`, `ec2_key_location`.  `ruby_command` is the command line to run on the EC2 nodes.  For example, if you want to run a file called `test.rb` on your EC2 nodes, you'd put:
+    ruby_command <= [["ruby test.rb"]]
+Note that whatever file you specify here _must_ take three arguments.  Here's the recommended boilerplate that you use for the file you want to deploy, assuming `Test` is the name of your class:
+    ip, port = ARGV[0].split(':')
+    ext_ip, ext_port = ARGV[1].split(':')
+    Test.new(:ip => ip,
+             :ext_ip => ext_ip,
+             :port => port,
+             :deploy => !ARGV[2]).run_fg
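The argument handling in the boilerplate can be checked in isolation with a small helper. `deploy_options` is a hypothetical function written for this note, not part of Bud; it mirrors the boilerplate's behavior, including that the port stays a string and that supplying any third argument turns `:deploy` off.

```ruby
# Hypothetical helper mirroring the boilerplate's option handling.
def deploy_options(argv)
  if argv.length < 2
    raise ArgumentError, "expected ip:port and ext_ip:ext_port arguments"
  end
  ip, port = argv[0].split(':')
  ext_ip, _ext_port = argv[1].split(':') # ext_port is parsed but not used
  { :ip => ip, :ext_ip => ext_ip, :port => port, :deploy => !argv[2] }
end

# Example addresses are made up.
two_args   = deploy_options(["10.0.0.5:54321", "203.0.113.7:54321"])
three_args = deploy_options(["10.0.0.5:54321", "203.0.113.7:54321", "true"])
p two_args
p three_args
```

The resulting hash could then be passed to `Test.new(...)` in place of the inline options.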
+`init_dir` is the directory that contains all of the Ruby files you want to deploy.  Alternatively, `init_dir` may be the single filename you include in your `ruby_command`.  If it is a directory, it must contain the file you execute in your `ruby_command`.  Unless you're doing something particularly fancy, you'll usually set `init_dir` to ".":
+    init_dir <= [["."]]
+This recursively copies all directories and files rooted at the current working directory.  `access_key_id` is your EC2 access key ID, and `secret_access_key` is your EC2 secret access key.
+    access_key_id <= [["XXXXXXXXXXXXXXXXXXXX"]]
+    secret_access_key <= [["XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"]]
+`key_name` is the name of the keypair you want to use to SSH in to the EC2 instances.  For example, if you have a keypair named "bob", you'd write:
+    key_name <= [["bob"]]
+Finally, `ec2_key_location` is the path to the private key of the `key_name` keypair.  For example:
+    ec2_key_location <= [["/home/bob/.ssh/ec2"]]
+EC2 deployment will spin up `num_nodes` instances (using defaults) on EC2 using a pre-rolled Bud AMI based on Amazon's 32-bit Linux AMI (`ami-8c1fece5`).  Each instance contains one Bud instance, which runs the `ruby_command`.  Like before, the deployment code will populate a binary relation called `node`; the first argument is a "node ID" -- a distinct integer from the range [0, num_nodes - 1] -- and the second argument is an "IP:port" string associated with the node.  Nodes are currently spun up on fixed port 54321.
+Defining initial data works exactly the same way with EC2 deployment as it does with local deployment.
+There is a slight catch with EC2 deployment.  Sometimes EC2 tells us that the nodes have all started up, but really, one or more nodes will never start up.  Currently, in this scenario, deployment exceeds the maximum number of ssh retries, and throws an exception.
+Note that EC2 deployment does *not* shut down the EC2 nodes it starts up under any circumstances.  This means you must use some alternate means to shut down the nodes, such as logging onto the EC2 web interface and terminating the nodes.
+## Examples
+Check out the `examples/deploy` directory in Bud.  There is a simple token ring example that establishes a ring involving 10 nodes and sends a token around the ring continuously.  This example can be deployed locally:
+    ruby tokenring-local.rb
+or on EC2:
+    ruby tokenring-ec2.rb local_ip:local_port ext_ip true
+Note that before running `tokenring-ec2`, you must create a "keys.rb" file that contains `access_key_id`, `secret_access_key`, `key_name` and `ec2_key_location`.
+Output will be displayed to show the progress of the deployment.  Be patient, it may take a while for output to appear.  Once deployment is complete and all nodes are ready, each node will display output indicating when it has the token.  All output will be visible for the local deployment case, whereas only the deployer node's output will be visible for the EC2 deployment case (stdout of all other nodes is materialized to disk).

data/docs/diffs ADDED

@@ -0,0 +1,181 @@
+24c24
+< channel used for communication (**JMH**: "bulk data transfer"?) between clients and datanodes and between
+---
+> channel used for communication between clients and datanodes and between
+35,43c35
+<     module FSProtocol
+<       state do
+<         interface input, :fsls, [:reqid, :path]
+<         interface input, :fscreate, [] => [:reqid, :name, :path, :data]
+<         interface input, :fsmkdir, [] => [:reqid, :name, :path]
+<         interface input, :fsrm, [] => [:reqid, :name, :path]
+<         interface output, :fsret, [:reqid, :status, :data]
+<       end
+<     end
+---
+> ==https://github.com/bloom-lang/bud-sandbox/raw/master/bfs/fs_master.rb|12-20
+57,58d48
+< (**JMH**: I find it a bit confusing how you toggle from the discussion above to this naive file-storage design here.  Can you warn us a bit more clearly that this is a starting point focused on metadata, with a strawman for data storage to be overriden later?)
+<
+62,65c52
+<     module KVSFS
+<       include FSProtocol
+<       include BasicKVS
+<       include TimestepNonce
+---
+> ==https://github.com/bloom-lang/bud-sandbox/raw/master/bfs/fs_master.rb|33-36
+67c54
+< If we wanted to replicate the metadata master (**JMH**: "the master node's metadata"?), we could consider mixing in a replicated KVS implementation instead of __BasicKVS__ -- but more on that later.
+---
+> If we wanted to replicate the metadata master, we could consider mixing in a replicated KVS implementation instead of __BasicKVS__ -- but more on that later.
+71c58
+< The directory listing operation is implemented by a simple block of Bloom statements:
+---
+> The directory listing operation is very simple:
+73,81c60
+<       bloom :elles do
+<         kvget <= fsls.map{ |l| [l.reqid, l.path] }
+<         fsret <= join([kvget_response, fsls], [kvget_response.reqid, fsls.reqid]).map{ |r, i| [r.reqid, true, r.value] }
+<         fsret <= fsls.map do |l|
+<           unless kvget_response.map{ |r| r.reqid}.include? l.reqid
+<             [l.reqid, false, nil]
+<           end
+<         end
+<       end
+---
+> ==https://github.com/bloom-lang/bud-sandbox/raw/master/bfs/fs_master.rb|51-59
+96d74
+< (**JMH**: Transition: "The following Bloom code carries this out for ...")
+98,134c76
+<         dir_exists = join [check_parent_exists, kvget_response, nonce], [check_parent_exists.reqid, kvget_response.reqid]
+<
+<         check_is_empty <= join([fsrm, nonce]).map{|m, n| [n.ident, m.reqid, terminate_with_slash(m.path) + m.name] }
+<         kvget <= check_is_empty.map{|c| [c.reqid, c.name] }
+<         can_remove <= join([kvget_response, check_is_empty], [kvget_response.reqid, check_is_empty.reqid]).map do |r, c|
+<           [c.reqid, c.orig_reqid, c.name] if r.value.length == 0
+<         end
+<
+<         fsret <= dir_exists.map do |c, r, n|
+<           if c.mtype == :rm
+<             unless can_remove.map{|can| can.orig_reqid}.include? c.reqid
+<               [c.reqid, false, "directory #{} not empty"]
+<             end
+<           end
+<         end
+<
+<         # update dir entry
+<         # note that it is unnecessary to ensure that a file is created before its corresponding
+<         # directory entry, as both inserts into :kvput below will co-occur in the same timestep.
+<         kvput <= dir_exists.map do |c, r, n|
+<           if c.mtype == :rm
+<             if can_remove.map{|can| can.orig_reqid}.include? c.reqid
+<               [ip_port, c.path, n.ident, r.value.clone.reject{|item| item == c.name}]
+<             end
+<           else
+<             [ip_port, c.path, n.ident, r.value.clone.push(c.name)]
+<           end
+<         end
+<
+<         kvput <= dir_exists.map do |c, r, n|
+<           case c.mtype
+<             when :mkdir
+<               [ip_port, terminate_with_slash(c.path) + c.name, c.reqid, []]
+<             when :create
+<               [ip_port, terminate_with_slash(c.path) + c.name, c.reqid, "LEAF"]
+<           end
+<         end
+---
+> ==https://github.com/bloom-lang/bud-sandbox/raw/master/bfs/fs_master.rb|74-110
+136d77
+< (**JMH**: This next sounds awkward.  You *do* take care: by using <= and understanding the atomicity of timesteps in Bloom.  I think what you mean to say is that Bloom's atomic timestep model makes this easy compared to ... something.)
+150c91
+<         table :chunk, [:chunkid, :file, :siz]
+---
+> ==https://github.com/bloom-lang/bud-sandbox/raw/master/bfs/chunking.rb|26-26
+154c95
+<         table :chunk_cache, [:node, :chunkid, :time]
+---
+> ==https://github.com/bloom-lang/bud-sandbox/raw/5c7734912e900c28087e39b3424a1e0191e13704/bfs/hb_master.rb|12-12
+156d96
+< (**JMH**: ambiguous reference ahead "these latter")
+161,171c101
+<     module ChunkedFSProtocol
+<       include FSProtocol
+<
+<       state do
+<         interface :input, :fschunklist, [:reqid, :file]
+<         interface :input, :fschunklocations, [:reqid, :chunkid]
+<         interface :input, :fsaddchunk, [:reqid, :file]
+<         # note that no output interface is defined.
+<         # we use :fsret (defined in FSProtocol) for output.
+<       end
+<     end
+---
+> ==https://github.com/bloom-lang/bud-sandbox/raw/master/bfs/chunking.rb|6-16
+185,187c115
+<         chunk_buffer <= join([fschunklist, kvget_response, chunk], [fschunklist.reqid, kvget_response.reqid], [fschunklist.file, chunk.file]).map{ |l, r, c| [l.reqid, c.chunkid] }
+<         chunk_buffer2 <= chunk_buffer.group([chunk_buffer.reqid], accum(chunk_buffer.chunkid))
+<         fsret <= chunk_buffer2.map{ |c| [c.reqid, true, c.chunklist] }
+---
+> ==https://github.com/bloom-lang/bud-sandbox/raw/master/bfs/chunking.rb|47-49
+191d118
+< (**JMH**: Ambiguous ref "If it")
+195,202c122
+<         minted_chunk = join([kvget_response, fsaddchunk, available, nonce], [kvget_response.reqid, fsaddchunk.reqid])
+<         chunk <= minted_chunk.map{ |r, a, v, n| [n.ident, a.file, 0] }
+<         fsret <= minted_chunk.map{ |r, a, v, n| [r.reqid, true, [n.ident, v.pref_list.slice(0, (REP_FACTOR + 2))]] }
+<         fsret <= join([kvget_response, fsaddchunk], [kvget_response.reqid, fsaddchunk.reqid]).map do |r, a|
+<           if available.empty? or available.first.pref_list.length < REP_FACTOR
+<             [r.reqid, false, "datanode set cannot satisfy REP_FACTOR = #{REP_FACTOR} with [#{available.first.nil? ? "NIL" : available.first.pref_list.inspect}]"]
+<           end
+<         end
+---
+> ==https://github.com/bloom-lang/bud-sandbox/raw/master/bfs/chunking.rb|69-76
+204d123
+< (**JMH**: Ambiguous ref "If it")
+208,212c127
+<         fsret <= fschunklocations.map do |l|
+<           unless chunk_cache.map{|c| c.chunkid}.include? l.chunkid
+<             [l.reqid, false, "no datanodes found for #{l.chunkid} in cc, now #{chunk_cache.length}"]
+<           end
+<         end
+---
+> ==https://github.com/bloom-lang/bud-sandbox/raw/master/bfs/chunking.rb|54-58
+216,219c131
+<         chunkjoin = join [fschunklocations, chunk_cache], [fschunklocations.chunkid, chunk_cache.chunkid]
+<         host_buffer <= chunkjoin.map{|l, c| [l.reqid, c.node] }
+<         host_buffer2 <= host_buffer.group([host_buffer.reqid], accum(host_buffer.host))
+<         fsret <= host_buffer2.map{|c| [c.reqid, true, c.hostlist] }
+---
+> ==https://github.com/bloom-lang/bud-sandbox/raw/master/bfs/chunking.rb|61-64
+229,233c141
+<     module BFSDatanode
+<       include HeartbeatAgent
+<       include StaticMembership
+<       include TimestepNonce
+<       include BFSHBProtocol
+---
+> ==https://github.com/bloom-lang/bud-sandbox/raw/master/bfs/datanode.rb|9-13
+241c149
+<         @dp_server = DataProtocolServer.new(dataport)
+---
+> ==https://github.com/bloom-lang/bud-sandbox/raw/master/bfs/datanode.rb|53-53
+245,250c153
+<         dir_contents <= hb_timer.flat_map do |t|
+<           dir = Dir.new("#{DATADIR}/#{@data_port}")
+<           files = dir.to_a.map{|d| d.to_i unless d =~ /^\./}.uniq!
+<           dir.close
+<           files.map {|f| [f, Time.parse(t.val).to_f]}
+<         end
+---
+> ==https://github.com/bloom-lang/bud-sandbox/raw/master/bfs/datanode.rb|24-29
+255,259c158
+<         to_payload <= join([dir_contents, nonce]).map do |c, n|
+<           unless server_knows.map{|s| s.file}.include? c.file
+<             [n.ident, c.file, c.time]
+<           end
+<         end
+---
+> ==https://github.com/bloom-lang/bud-sandbox/raw/master/bfs/datanode.rb|31-35
+275d173
+< ### I am autogenerated.  Please do not edit me.