bud 0.0.2

Files changed (62)
  1. data/LICENSE +9 -0
  2. data/README +30 -0
  3. data/bin/budplot +134 -0
  4. data/bin/budvis +201 -0
  5. data/bin/rebl +4 -0
  6. data/docs/README.md +13 -0
  7. data/docs/bfs.md +379 -0
  8. data/docs/bfs.raw +251 -0
  9. data/docs/bfs_arch.png +0 -0
  10. data/docs/bloom-loop.png +0 -0
  11. data/docs/bust.md +83 -0
  12. data/docs/cheat.md +291 -0
  13. data/docs/deploy.md +96 -0
  14. data/docs/diffs +181 -0
  15. data/docs/getstarted.md +296 -0
  16. data/docs/intro.md +36 -0
  17. data/docs/modules.md +112 -0
  18. data/docs/operational.md +96 -0
  19. data/docs/rebl.md +99 -0
  20. data/docs/ruby_hooks.md +19 -0
  21. data/docs/visualizations.md +75 -0
  22. data/examples/README +1 -0
  23. data/examples/basics/hello.rb +12 -0
  24. data/examples/basics/out +1103 -0
  25. data/examples/basics/out.new +856 -0
  26. data/examples/basics/paths.rb +51 -0
  27. data/examples/bust/README.md +9 -0
  28. data/examples/bust/bustclient-example.rb +23 -0
  29. data/examples/bust/bustinspector.html +135 -0
  30. data/examples/bust/bustserver-example.rb +18 -0
  31. data/examples/chat/README.md +9 -0
  32. data/examples/chat/chat.rb +45 -0
  33. data/examples/chat/chat_protocol.rb +8 -0
  34. data/examples/chat/chat_server.rb +29 -0
  35. data/examples/deploy/tokenring-ec2.rb +26 -0
  36. data/examples/deploy/tokenring-local.rb +17 -0
  37. data/examples/deploy/tokenring.rb +39 -0
  38. data/lib/bud/aggs.rb +126 -0
  39. data/lib/bud/bud_meta.rb +185 -0
  40. data/lib/bud/bust/bust.rb +126 -0
  41. data/lib/bud/bust/client/idempotence.rb +10 -0
  42. data/lib/bud/bust/client/restclient.rb +49 -0
  43. data/lib/bud/collections.rb +937 -0
  44. data/lib/bud/depanalysis.rb +44 -0
  45. data/lib/bud/deploy/countatomicdelivery.rb +50 -0
  46. data/lib/bud/deploy/deployer.rb +67 -0
  47. data/lib/bud/deploy/ec2deploy.rb +200 -0
  48. data/lib/bud/deploy/localdeploy.rb +41 -0
  49. data/lib/bud/errors.rb +15 -0
  50. data/lib/bud/graphs.rb +405 -0
  51. data/lib/bud/joins.rb +300 -0
  52. data/lib/bud/rebl.rb +314 -0
  53. data/lib/bud/rewrite.rb +523 -0
  54. data/lib/bud/rtrace.rb +27 -0
  55. data/lib/bud/server.rb +43 -0
  56. data/lib/bud/state.rb +108 -0
  57. data/lib/bud/storage/tokyocabinet.rb +170 -0
  58. data/lib/bud/storage/zookeeper.rb +178 -0
  59. data/lib/bud/stratify.rb +83 -0
  60. data/lib/bud/viz.rb +65 -0
  61. data/lib/bud.rb +797 -0
  62. metadata +330 -0
data/docs/bust.md ADDED
@@ -0,0 +1,83 @@
1
+ BUST stands for BUd State Transfer; it is a REST interface to Bud. BUST consists of a Bud implementation of a client and a server. The client implements bindings to a subset of the Ruby Nestful library, and the server is a lightweight HTTP server written in Ruby. Note that the BUST server currently sets the "Access-Control-Allow-Origin: *" HTTP header to override web browsers' same-origin policy.
2
+
3
+ Right now BUST supports "GET" and "POST" requests, and may support "DELETE" and "PUT" requests in the future.
4
+
5
+ # BUST Server
6
+
7
+ For the BUST server, a "GET" request corresponds to retrieving a subset of rows of a table, and a "POST" request corresponds to inserting a row into a table. For example, the following "GET" request (assuming BUST is running on port 8080):
8
+
9
+ GET localhost:8080/foo?bar=hello&baz=world
10
+
11
+ would retrieve all rows in table "foo" where the named schema attribute "bar" equals the string "hello" and the named schema attribute "baz" equals the string "world". One current limitation of BUST is that only string values are supported.
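The query semantics can be made concrete with a plain-Ruby sketch that needs neither Bud nor a running server. The helper `select_rows` and the sample rows are hypothetical. The sketch assumes, as described above, that every query parameter must equal the corresponding named attribute as a string. It also treats an unknown attribute as matching no rows, which is an assumption rather than documented BUST behavior.

```ruby
require 'uri'

# Hypothetical helper (not part of BUST): keeps the rows whose named
# attributes equal the values given in a GET query string.
def select_rows(schema, rows, query)
  conditions = URI.decode_www_form(query) # e.g. [["bar", "hello"], ["baz", "world"]]
  rows.select do |row|
    conditions.all? do |attr, value|
      idx = schema.index(attr.to_sym)
      # Assumption: an unknown attribute matches nothing; values compare as strings.
      !idx.nil? && row[idx] == value
    end
  end
end

schema = [:bar, :baz, :qux]
rows = [%w[hello world x], %w[hello there y]]
p select_rows(schema, rows, "bar=hello&baz=world")
```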
12
+
13
+ To use BUST in your program, ensure you have the json gem installed. Add the "require" line for BUST:
14
+
15
+ require "bud/bust/bust"
16
+
17
+ In your class, make sure to:
18
+
19
+ include Bust
20
+
21
+ That's it! Now a BUST server will be started up when your class is instantiated. By default, this server will listen on port 8080, but you can change this by passing a port via the "bust_port" option when you instantiate your class.
22
+
23
+ You can test the BUST server using Ruby's "net/http" library, and you can also check out "BUST Inspector", a sample AJAX application that lets you view the state of a Bud instance.
24
+
25
+ ## net/http Example
26
+
27
+ Try running "bustserver-example.rb" in the "examples/bust" directory:
28
+
29
+ ruby bustserver-example.rb
30
+
31
+ Now, let's interact with our example using "net/http" from within IRB. Start up an IRB instance:
32
+
33
+ irb
34
+ irb(main):001:0> require 'net/http'
35
+ => true
36
+
37
+ bustserver-example.rb defines a single relation called "foo":
38
+
39
+ table :foo, [:bar, :baz, :qux]
40
+
41
+ Let's fire off some requests. First, let's put a new foo fact in:
42
+
43
+ irb(main):002:0> res = Net::HTTP.post_form(URI.parse('http://localhost:8080/foo'), {:bar => "a", :baz => "b", :qux => "c"})
44
+ => #<Net::HTTPOK 200 /OK readbody=true>
45
+
46
+ Now, let's retrieve all foo facts where the "qux" attribute is "c", and the "baz" attribute is "b":
47
+
48
+ irb(main):003:0> res = Net::HTTP.get(URI.parse('http://localhost:8080/foo?qux=c&baz=b'))
49
+ => "[[\"a\",\"b\",\"c\"]]"
50
+
51
+ Note that the response is a JSON array.
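The response body is plain JSON, so a client can decode it with Ruby's standard `json` library. The following sketch pairs each returned row with the column names of `foo`. It assumes, as in the session above, that values come back in schema order.

```ruby
require 'json'

# Response body from the GET request in the IRB session above.
body = "[[\"a\",\"b\",\"c\"]]"

rows = JSON.parse(body)       # an Array of rows, each row an Array of strings
schema = [:bar, :baz, :qux]   # schema of table :foo in the example
facts = rows.map { |row| Hash[schema.zip(row)] }
p facts
```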
52
+
53
+
54
+ ## BUST Inspector
55
+
56
+ BUST Inspector (examples/bust/bustinspector.html) is an included example app that uses XMLHttpRequests to inspect state in a Bud program via BUST. Currently, it assumes that the Bud instance you're trying to inspect is listening on "localhost" at port "8080". BUST Inspector has been tested in Firefox and may or may not work in other browsers. It queries your Bud instance every second for metadata describing the tables and their schemas, and displays a list of the tables in the left pane of the screen, with a checkbox next to each table. Selecting a checkbox renders the current table contents in the right pane (these are also updated every second while the box is checked).
57
+
58
+
59
+ # BUST Client
60
+
61
+ The BUST client (located in the "bust/client" folder) allows Bud applications to access REST services (including a Bud program hosting a BUST server). The REST client is essentially a wrapper for the Ruby nestful library, so you'll need to have the "nestful" gem installed before you can use it. To use it in your application, add the require line:
62
+
63
+ require 'bud/bust/client/restclient'
64
+
65
+ and the include line:
66
+
67
+ include RestClient
68
+
69
+ To make requests, insert into the rest_req interface, whose definition is reproduced below:
70
+
71
+ interface input, :rest_req, [:rid, :verb, :form, :url, :params]
72
+
73
+ "rid" is a unique ID for the request, and "verb" is one of ":get" or ":post". "form" is the format of the request: for example, you might use ":json", or ":form" for a form post. If "form" is nil, it defaults to ":form" for a ":post" and is omitted from a ":get". For ":get" requests, a non-nil "form" is appended to the end of "url" as an extension. For example, if you do a ":get" for "http://example.com/ex" with "form" set to ":json", the library sends an HTTP GET to "http://example.com/ex.json". "params" is a hash; it makes up the query string for a ":get", and the body of a ":post" whose "form" is ":form".
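The URL rule for ":get" can be sketched in plain Ruby. The helper `get_request_url` is hypothetical and is not Nestful's or BUST's code. It only mirrors the behavior described above, and it assumes that "params" is encoded as a standard form-encoded query string.

```ruby
require 'uri'

# Hypothetical helper mirroring the documented :get behavior: a non-nil form
# becomes a URL extension, and params become the query string.
def get_request_url(url, form, params)
  full = form.nil? ? url.dup : "#{url}.#{form}"
  query = URI.encode_www_form(params || {})
  query.empty? ? full : "#{full}?#{query}"
end

puts get_request_url("http://example.com/ex", :json, {})
puts get_request_url("http://example.com/ex", :json, {:count => 5, :q => "bud"})
```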
74
+
75
+ The output interface is:
76
+
77
+ interface output, :rest_response, [:rid, :resp, :exception]
78
+
79
+ "rid" is the unique ID supplied when the request was made, "resp" is the parsed response from the server. For example, if you do a ":json" ":get", then "resp" will contain whatever JSON object was returned converted into a Ruby object, e.g., array, hash, etc. If there is an exception, then "resp" will contain a string describing the exception, and "exception" will be set to true; otherwise, "exception" will be set to false.
80
+
81
+ A simple example (bustclient-example.rb) is included that does an HTTP GET on Twitter's public timeline, fetches the most recent statuses, and prints them to stdout. The example is in the "examples/bust" directory.
82
+
83
+ The BUST client does not yet support OAuth, HTTP DELETE, or HTTP PUT.
data/docs/cheat.md ADDED
@@ -0,0 +1,291 @@
1
+ # Bud Cheat Sheet #
2
+
3
+ ## General Bloom Syntax Rules ##
4
+ Bloom programs are unordered sets of statements.<br>
5
+ Statements are delimited by semicolons (;) or newlines. <br>
6
+ As in Ruby, backslash is used to escape a newline.<br>
7
+
8
+ ## Simple embedding of Bud in a Ruby Class ##
9
+ require 'bud'
10
+
11
+ class Foo
12
+ include Bud
13
+
14
+ state do
15
+ ...
16
+ end
17
+
18
+ bloom do
19
+ ...
20
+ end
21
+ end
22
+
23
+ ## State Declarations ##
24
+ A `state` block contains Bud collection definitions.
25
+
26
+ ### Default Declaration Syntax ###
27
+ *BudCollection :name, [keys] => [values]*
28
+
29
+ ### table ###
30
+ Contents persist in memory until explicitly deleted.<br>
31
+ Default attributes: `[:key] => [:val]`
32
+
33
+ table :keyvalue
34
+ table :composite, [:keyfield1, :keyfield2] => [:values]
35
+ table :noDups, [:field1, :field2]
36
+
37
+ ### scratch ###
38
+ Contents emptied at start of each timestep.<br>
39
+ Default attributes: `[:key] => [:val]`
40
+
41
+ scratch :stats
42
+
43
+ ### interface ###
44
+ Scratch collections, used as connection points between modules.<br>
45
+ Default attributes: `[:key] => [:val]`
46
+
47
+ interface input, :request
48
+ interface output, :response
49
+
50
+ ### channel ###
51
+ Network channel manifested as a scratch collection.<br>
52
+ Facts that are inserted into a channel are sent to a remote host; the address of the remote host is specified in an attribute of the channel that is denoted with `@`.<br>
53
+ Default attributes: `[:@address, :val] => []`
54
+
55
+ (Bloom statements with channel on lhs must use async merge (`<~`).)
56
+
57
+ channel :msgs
58
+ channel :req_chan, [:@address, :cartnum, :storenum] => [:command, :params]
59
+
60
+ ### periodic ###
61
+ System timer manifested as a scratch collection.<br>
62
+ System-provided attributes: `[:key] => [:val]`<br>
63
+ &nbsp;&nbsp;&nbsp;&nbsp; (`key` is a unique ID, `val` is a Ruby Time converted to a string.)<br>
64
+ State declaration includes interval (in seconds).
65
+
66
+ (periodic can only be used on rhs of a Bloom statement.)
67
+
68
+ periodic :timer, 0.1
69
+
70
+ ### stdio ###
71
+ Built-in scratch collection mapped to Ruby's `$stdin` and `$stdout`<br>
72
+ System-provided attributes: `[:line] => []`
73
+
74
+ Statements with stdio on lhs must use async merge (`<~`).<br>
75
+ To capture `$stdin` on rhs, instantiate Bud with `:read_stdin` option.<br>
76
+
77
+ ### tctable ###
78
+ Table collection mapped to a [Tokyo Cabinet](http://fallabs.com/tokyocabinet/) store.<br>
79
+ Default attributes: `[:key] => [:val]`
80
+
81
+ tctable :t1
82
+ tctable :t2, [:k1, :k2] => [:v1, :v2]
83
+
84
+ ### zktable ###
85
+ Table collection mapped to an [Apache Zookeeper](http://hadoop.apache.org/zookeeper/) store.<br>
86
+ System-provided attributes: `[:key] => [:val]`<br>
87
+ State declaration includes Zookeeper path and optional TCP string (default: "localhost:2181")<br>
88
+
89
+ zktable :foo, "/bat"
90
+ zktable :bar, "/dat", "localhost:2182"
91
+
92
+
93
+ ## Bloom Statements ##
94
+ *lhs BloomOp rhs*
95
+
96
+ Left-hand-side (lhs) is a named `BudCollection` object.<br>
97
+ Right-hand-side (rhs) is a Ruby expression producing a `BudCollection` or `Array` of `Arrays`.<br>
98
+ BloomOp is one of the 5 operators listed below.
99
+
100
+ ## Bloom Operators ##
101
+ merges:
102
+
103
+ * `left <= right` &nbsp;&nbsp;&nbsp;&nbsp; (*instantaneous*)
104
+ * `left <+ right` &nbsp;&nbsp;&nbsp;&nbsp; (*deferred*)
105
+ * `left <~ right` &nbsp;&nbsp;&nbsp;&nbsp; (*asynchronous*)
106
+
107
+ delete:
108
+
109
+ * `left <- right` &nbsp;&nbsp;&nbsp;&nbsp; (*deferred*)
110
+
111
+ insert:<br>
112
+
113
+ * `left << [...]` &nbsp;&nbsp;&nbsp;&nbsp; (*instantaneous*)
114
+
115
+ Note that unlike merge/delete, insert expects a single fact on the rhs, rather
116
+ than a collection.
117
+
118
+ ## Collection Methods ##
119
+ Standard Ruby methods used on a BudCollection `bc`:
120
+
121
+ implicit map:
122
+
123
+ t1 <= bc {|t| [t.col1 + 4, t.col2.chomp]} # formatting/projection
124
+ t2 <= bc {|t| t if t.col == 5} # selection
125
+
126
+ `flat_map`:
127
+
128
+ require 'backports' # flat_map not included in Ruby 1.8 by default
129
+
130
+ t3 <= bc.flat_map do |t| # unnest a collection-valued attribute
131
+ t.col4.map { |sub| [t.col1, t.col2, t.col3, sub] }
132
+ end
133
+
134
+ `bc.reduce`, `bc.inject`:
135
+
136
+ t4 <= bc.reduce({}) do |memo, t| # example: groupby col1 and count
137
+ memo[t.col1] ||= 0
138
+ memo[t.col1] += 1
139
+ memo
140
+ end
141
+
142
+ `bc.include?`:
143
+
144
+ t5 <= bc do |t| # like SQL's NOT IN
145
+ t unless t2.include?([t.col1, t.col2])
146
+ end
147
+
148
+ ## BudCollection-Specific Methods ##
149
+ `bc.keys`: projects `bc` to key columns<br>
150
+
151
+ `bc.values`: projects `bc` to non-key columns<br>
152
+
153
+ `bc.inspected`: shorthand for `bc {|t| [t.inspect]}`
154
+
155
+ stdio <~ bc.inspected
156
+
157
+ `chan.payloads`: shorthand for `chan {|t| t.val}`, only defined for channels
158
+
159
+ # at sender
160
+ msgs <~ requests {|r| ["127.0.0.1:12345", r]}
161
+ # at receiver
162
+ requests <= msgs.payloads
163
+
164
+ `bc.exists?`: test for non-empty collection. Can optionally pass in a block.
165
+
166
+ stdio <~ [["Wake Up!"] if timer.exists?]
167
+ stdio <~ requests do |r|
168
+ [r.inspect] if msgs.exists?{|m| r.ident == m.ident}
169
+ end
170
+
171
+ ## SQL-style grouping/aggregation (and then some) ##
172
+
173
+ * `bc.group([:col1, :col2], min(:col3))`. *akin to min(col3) GROUP BY (col1,col2)*
174
+ * exemplary aggs: `min`, `max`, `choose`
175
+ * summary aggs: `sum`, `avg`, `count`
176
+ * structural aggs: `accum`
177
+ * `bc.argmax([:col1], :col2)` &nbsp;&nbsp;&nbsp;&nbsp; *returns the bc tuple per col1 that has highest col2*
178
+ * `bc.argmin([:col1], :col2)`
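The following plain-Ruby sketch, which does not require Bud, shows what `group` with `min` and `argmax` compute. Rows are modeled as Structs, and the helper names are hypothetical. The sketch does not model how Bud breaks ties in `argmax`.

```ruby
# Rows modeled as Structs; helper names are hypothetical.
Row = Struct.new(:col1, :col2, :col3)

# Like bc.group([:col1, :col2], min(:col3)):
# SELECT col1, col2, min(col3) ... GROUP BY col1, col2
def group_min(rows)
  rows.group_by { |r| [r.col1, r.col2] }.map do |(c1, c2), grp|
    [c1, c2, grp.map(&:col3).min]
  end
end

# Like bc.argmax([:col1], :col2): per col1 value, a row with the highest col2.
# Ties are resolved by max_by here; Bud's tie handling is not modeled.
def argmax_col2(rows)
  rows.group_by(&:col1).map { |_, grp| grp.max_by(&:col2) }
end

bc = [Row.new("a", 1, 5), Row.new("a", 1, 3), Row.new("a", 2, 7), Row.new("b", 4, 1)]
p group_min(bc)
p argmax_col2(bc)
```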
179
+
180
+ ### Built-in Aggregates: ###
181
+
182
+ * Exemplary aggs: `min`, `max`, `choose`
183
+ * Summary aggs: `count`, `sum`, `avg`
184
+ * Structural aggs: `accum`
185
+
186
+ ## Collection Combination (Join) ##
187
+ To match items across two (or more) collections, use the `*` operator, followed by methods to filter/format the result (`pairs`, `matches`, `combos`, `lefts`, `rights`).
188
+
189
+ ### Methods on Combinations (Joins) ###
190
+
191
+ `pairs(`*hash pairs*`)`: <br>
192
+ given a `*` expression, form all pairs of items with value matches in the hash-pairs attributes. Hash pairs can be fully qualified (`coll1.attr1 => coll2.attr2`) or shorthand (`:attr1 => :attr2`).
193
+
194
+ # for each inbound msg, find match in a persistent buffer
195
+ result <= (msg * buffer).pairs(:val => :key) {|m, b| [m.address, m.val, b.val] }
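Conceptually, `pairs` is an equijoin. The plain-Ruby sketch below computes the same result as the statement above using a hash join. The hash join is one common way to evaluate an equijoin and is not a claim about how Bud evaluates it. The Struct types and the helper `pairs_on` are hypothetical.

```ruby
# Hypothetical types standing in for the msg and buffer collections.
Msg = Struct.new(:address, :val)
Buf = Struct.new(:key, :val)

# All (m, b) pairs with m.<left_attr> == b.<right_attr>, via a hash join.
def pairs_on(left, right, left_attr, right_attr)
  index = right.group_by { |b| b.public_send(right_attr) } # build side
  left.flat_map do |m|                                     # probe side
    (index[m.public_send(left_attr)] || []).map { |b| [m, b] }
  end
end

msgs = [Msg.new("h1:1", "k1"), Msg.new("h2:2", "k9")]
buffer = [Buf.new("k1", "v1"), Buf.new("k2", "v2")]
result = pairs_on(msgs, buffer, :val, :key).map { |m, b| [m.address, m.val, b.val] }
p result
```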
196
+
197
+ `combos(`*hash pairs*`)`: <br>
198
+ alias for `pairs`, more readable for multi-collection `*` expressions. Must use fully-qualified hash pairs.
199
+
200
+ # the following 2 Bloom statements are equivalent to this SQL
201
+ # SELECT r.a, s_tab.b, t.c
202
+ # FROM r, s_tab, t
203
+ # WHERE r.x = s_tab.x
204
+ # AND s_tab.x = t.x;
205
+
206
+ # multiple column matches
207
+ out <= (r * s_tab * t).combos(r.x => s_tab.x, s_tab.x => t.x) do |t1, t2, t3|
208
+ [t1.a, t2.b, t3.c]
209
+ end
210
+
211
+ # column matching done per pair: this will be very slow
212
+ out <= join([r,s_tab,t]) do |t1, t2, t3|
213
+ [t1.a, t2.b, t3.c] if t1.x == t2.x and t2.x == t3.x
214
+ end
215
+
216
+ `matches`:<br>
217
+ Shorthand for `combos` with hash pairs for all attributes with matching names.
218
+
219
+ # Equivalent to the above statements if x is the only attribute name in common:
220
+ out <= (r * s_tab * t).matches {|t1, t2, t3| [t1.a, t2.b, t3.c]}
221
+
222
+ `lefts(`*hash pairs*`)`: <br>
223
+ Like `pairs`, but implicitly includes a block that projects down to the left item in each pair.
224
+
225
+ `rights(`*hash pairs*`)`: <br>
226
+ Like `pairs`, but implicitly includes a block that projects down to the right item in each pair.
227
+
228
+ `flatten`<br>
229
+ `flatten` is a bit like SQL's `SELECT *`: it produces a collection of concatenated objects, with a schema that is the concatenation of the schemas of the joined collections (with duplicate names disambiguated). This is useful for chaining to operators that expect input collections with schemas, e.g., `group`:
230
+
231
+ out <= (r * s).matches.flatten.group([:a], max(:b))
232
+
233
+ ### Left Join ###
234
+ `leftjoin([`*t1, t2*`]` *, optional hash pairs, ...*`)`<br>
235
+ Left outer join. Note the function-call syntax: an array of 2 collections is the first argument, and hash pairs are the subsequent arguments. Objects in the first collection are included in the output even if no match is found in the second collection.
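The following plain-Ruby sketch, which does not use Bud, illustrates left outer join semantics. The helper name is hypothetical, and a block stands in for the hash pairs. Padding unmatched left rows with `nil` is an assumption of the sketch; it is not necessarily how Bud represents the missing right-hand item.

```ruby
# Every left row appears at least once; unmatched rows are paired with nil.
def left_outer_join(left, right)
  left.flat_map do |l|
    matches = right.select { |r| yield(l, r) }
    matches.empty? ? [[l, nil]] : matches.map { |r| [l, r] }
  end
end

t1 = [[1, "a"], [2, "b"]]
t2 = [[1, "x"]]
p left_outer_join(t1, t2) { |l, r| l[0] == r[0] }
```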
236
+
237
+ ## Temp Collections ##
238
+ `temp`<br>
239
+ Temp collections are scratches defined within a `bloom` block:
240
+
241
+ temp :my_scratch1 <= foo
242
+
243
+ The schema of a temp collection is inherited from the rhs; if the rhs has no
244
+ schema, a simple one is manufactured to suit the data found in the rhs at
245
+ runtime: `[c0, c1, ...]`.
246
+
247
+ ## Bud Modules ##
248
+ A Bud module combines state (collections) and logic (Bloom rules). Using modules allows your program to be decomposed into a collection of smaller units.
249
+
250
+ Defining a Bud module is identical to defining a Ruby module, except that the module can use the `bloom`, `bootstrap`, and `state` blocks described above.
251
+
252
+ There are two ways to use a module *B* in another Bloom module *A*:
253
+
254
+ 1. `include B`: This "inlines" the definitions (state and logic) from *B* into
255
+ *A*. Hence, collections defined in *B* can be accessed from *A* (via the
256
+ same syntax as *A*'s own collections). In fact, since Ruby is
257
+ dynamically-typed, Bloom statements in *B* can access collections
258
+ in *A*!
259
+
260
+ 2. `import B => :b`: The `import` statement provides a more structured way to
261
+ access another module. Module *A* can now access state defined in *B* by
262
+ using the qualifier `b`. *A* can also import two different copies of *B*,
263
+ and give them local names `b1` and `b2`; these copies will be independent
264
+ (facts inserted into a collection defined in `b1` won't also be inserted
265
+ into `b2`'s copy of the collection).
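The difference between the two forms can be illustrated with a rough plain-Ruby analogy. This is not how Bud implements `include` or `import`, and the classes below are hypothetical. With `include`, one object owns B's state directly. With `import`, each local name refers to its own independent copy.

```ruby
module B
  # B's "collection", created lazily per object
  def b_facts
    @b_facts ||= []
  end
end

# 1. include: B's definitions are inlined into A.
class AWithInclude
  include B
end

# 2. import-like composition: each local name holds an independent copy of B.
class BCopy
  include B
end

class AWithImports
  attr_reader :b1, :b2

  def initialize
    @b1 = BCopy.new
    @b2 = BCopy.new
  end
end

a = AWithImports.new
a.b1.b_facts << [1]
p a.b1.b_facts
p a.b2.b_facts # b2's copy is unaffected
```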
266
+
267
+ ## Skeleton of a Bud Module ##
268
+
269
+ require 'rubygems'
270
+ require 'bud'
271
+
272
+ module YourModule
273
+ include Bud
274
+
275
+ state do
276
+ ...
277
+ end
278
+
279
+ bootstrap do
280
+ ...
281
+ end
282
+
283
+ bloom :some_stmts do
284
+ ...
285
+ end
286
+
287
+ bloom :more_stmts do
288
+ ...
289
+ end
290
+ end
291
+
data/docs/deploy.md ADDED
@@ -0,0 +1,96 @@
1
+ # Deployment
2
+
3
+ Bud provides support for deploying a program onto a set of Bud instances. At the moment, two types of deployments are supported: local deployment and EC2 deployment. In short, you include the module corresponding to the type of deployment you want in a Bud class, which you instantiate and run on a node called the "deployer." The deployer then spins up the requested number of Bud instances and distributes initial data to them.
4
+
5
+ First, decide which type of deployment you want to use.
6
+
7
+ ## Local Deployment
8
+
9
+ To use local deployment, you'll need to require it in your program:
10
+
11
+ require 'bud/deploy/localdeploy'
12
+
13
+ Don't forget to include it in your class:
14
+
15
+ include LocalDeploy
16
+
17
+ The next step is to declare how many nodes you want the program to be spun up on. You need to do this in a `deploystrap` block. A `deploystrap` block is run before `bootstrap`, and is only run for a Bud class that is instantiated with the option "`:deploy => true`". For example:
18
+
19
+ deploystrap do
20
+ num_nodes <= [[2]]
21
+ end
22
+
23
+ Local deployment will spin up `num_nodes` local processes, each containing one Bud instance running the class that you include `LocalDeploy` in. The deployment code will populate a binary collection called `node`; the first column is a "node ID" (a distinct integer from the range `[0, num_nodes - 1]`), and the second column is an "IP:port" string associated with the node. Nodes are spun up on ephemeral ports, listening on "localhost".
24
+
25
+ Now, you need to define how you want the initial data to be distributed. You can do this, for example, by writing (multiple) rules with `initial_data` in the head. The schema of `initial_data` is as follows: [node ID, relation name as a symbol, list of tuples]. For example, to distribute the IP address of the "deployer" to all of the other nodes in a relation called `master`, you might decide to write something like this:
26
+
27
+ initial_data <= node.map {|n| [n.uid, :master, [[ip_port]]]}
28
+
29
+ Note that the target relation ("master" in this case) must not be a channel: you may only distribute data to scratches and tables. Initial data is transferred only after _all_ nodes are spun up; this ensures that initial data is never lost because, for example, a node is not yet listening on a socket. Initial data is transmitted atomically to each node: on each node, _all_ initial data in _all_ relations arrives in the same Bud timestep. However, there is no global barrier for the transfer of initial data. For example, if initial data is distributed to nodes 1 and 2, node 1 may receive its initial data first and then send messages on channels to node 2, which node 2 may receive before its own initial data.
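The following plain-Ruby sketch illustrates the `initial_data` shape and the per-node atomicity described above. It is not Bud's deployer code. The `Node` struct, the addresses, and the extra `:peers` relation are hypothetical; the real deployer assigns ephemeral ports. The sketch shows the idea by bundling every row bound for one node together, so that all of that node's relations can be installed in a single timestep.

```ruby
Node = Struct.new(:uid, :addr)

nodes   = (0...3).map { |i| Node.new(i, "localhost:#{50000 + i}") } # hypothetical ports
ip_port = "localhost:49999" # hypothetical deployer address

# Same shape as initial_data: [node ID, relation name, list of tuples]
initial_data = nodes.map { |n| [n.uid, :master, [[ip_port]]] }
initial_data << [0, :peers, nodes.map { |n| [n.addr] }] # hypothetical second relation

# One bundle per node, containing all of that node's relations.
bundles = initial_data.group_by(&:first).map do |uid, rows|
  [uid, rows.map { |_, rel, tuples| [rel, tuples] }]
end
p bundles
```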
30
+
31
+ The rules defining `initial_data` may appear in any `bloom` block.
32
+
33
+ The final step is to add `:deploy => true` to the instantiation of your class. Note that the local deployer spins up the other nodes without `:deploy => true`, so that they do not deploy further nodes themselves and fork-bomb your system.
34
+
35
+
36
+ ## EC2 Deployment
37
+
38
+ To use EC2 deployment you'll need to require it in your program:
39
+
40
+ require 'bud/deploy/ec2deploy'
41
+
42
+ Don't forget to include it in your class:
43
+
44
+ include EC2Deploy
45
+
46
+ As in local deployment, you'll need to define `num_nodes` in a `deploystrap` block. Additionally in the `deploystrap` block, you need to define the following relations: `ruby_command`, `init_dir`, `access_key_id`, `secret_access_key`, `key_name`, `ec2_key_location`. `ruby_command` is the command line to run on the EC2 nodes. For example, if you want to run a file called `test.rb` on your EC2 nodes, you'd put:
47
+
48
+ ruby_command <= [["ruby test.rb"]]
49
+
50
+ Note that whatever file you specify here _must_ take three arguments. Here's the recommended boilerplate for the file you want to deploy, assuming `Test` is the name of your class:
51
+
52
+ ip, port = ARGV[0].split(':')
53
+ ext_ip, ext_port = ARGV[1].split(':')
54
+ Test.new(:ip => ip,
55
+ :ext_ip => ext_ip,
56
+ :port => port,
57
+ :deploy => !ARGV[2]).run_fg
58
+
59
+ `init_dir` is the directory that contains all of the Ruby files you want to deploy. Alternatively, `init_dir` may be the single filename you include in your `ruby_command`. If it is a directory, it must contain the file you execute in your `ruby_command`. Unless you're doing something particularly fancy, you'll usually set `init_dir` to ".":
60
+
61
+ init_dir <= [["."]]
62
+
63
+ This recursively copies all directories and files rooted at the current working directory. `access_key_id` is your EC2 access key ID, and `secret_access_key` is your EC2 secret access key.
64
+
65
+ access_key_id <= [["XXXXXXXXXXXXXXXXXXXX"]]
66
+ secret_access_key <= [["XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"]]
67
+
68
+ `key_name` is the name of the keypair you want to use to SSH in to the EC2 instances. For example, if you have a keypair named "bob", you'd write:
69
+
70
+ key_name <= [["bob"]]
71
+
72
+ Finally, `ec2_key_location` is the path to the private key of the `key_name` keypair. For example:
73
+
74
+ ec2_key_location <= [["/home/bob/.ssh/ec2"]]
75
+
76
+ EC2 deployment will spin up `num_nodes` instances (using defaults) on EC2 using a pre-rolled Bud AMI based on Amazon's 32-bit Linux AMI (`ami-8c1fece5`). Each instance contains one Bud instance, which runs the `ruby_command`. Like before, the deployment code will populate a binary relation called `node`; the first column is a "node ID" (a distinct integer from the range [0, num_nodes - 1]), and the second column is an "IP:port" string associated with the node. Nodes are currently spun up on the fixed port 54321.
77
+
78
+ Defining initial data works exactly the same way with EC2 deployment as it does with local deployment.
79
+
80
+ There is a slight catch with EC2 deployment. Sometimes EC2 reports that all of the nodes have started up, but one or more nodes never actually start. Currently, in this scenario, deployment exceeds the maximum number of SSH retries and throws an exception.
81
+
82
+ Note that EC2 deployment does *not* shut down the EC2 nodes it starts up under any circumstances. This means you must use some alternate means to shut down the nodes, such as logging onto the EC2 web interface and terminating the nodes.
83
+
84
+ ## Examples
85
+
86
+ Check out the `examples/deploy` directory in Bud. There is a simple token ring example that establishes a ring involving 10 nodes and sends a token around the ring continuously. This example can be deployed locally:
87
+
88
+ ruby tokenring-local.rb
89
+
90
+ or on EC2:
91
+
92
+ ruby tokenring-ec2.rb local_ip:local_port ext_ip true
93
+
94
+ Note that before running `tokenring-ec2`, you must create a "keys.rb" file that contains `access_key_id`, `secret_access_key`, `key_name` and `ec2_key_location`.
95
+
96
+ Output will be displayed to show the progress of the deployment. Be patient; it may take a while for output to appear. Once deployment is complete and all nodes are ready, each node will display output indicating when it has the token. All output is visible in the local deployment case, whereas only the deployer node's output is visible in the EC2 deployment case (the stdout of all other nodes is written to disk).
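To see the expected order in which nodes report holding the token, the ring can be simulated in plain Ruby without Bud or networking. The sketch assumes that node i forwards the token to node (i + 1) % num_nodes. That routing rule and the helper `token_holders` are assumptions and are not taken from tokenring.rb.

```ruby
# Which node holds the token at each step, starting from `start`,
# assuming each node forwards to (i + 1) % num_nodes.
def token_holders(num_nodes, hops, start = 0)
  (0..hops).map { |step| (start + step) % num_nodes }
end

p token_holders(10, 12)
```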
data/docs/diffs ADDED
@@ -0,0 +1,181 @@
1
+ 24c24
2
+ < channel used for communication (**JMH**: "bulk data transfer"?) between clients and datanodes and between
3
+ ---
4
+ > channel used for communication between clients and datanodes and between
5
+ 35,43c35
6
+ < module FSProtocol
7
+ < state do
8
+ < interface input, :fsls, [:reqid, :path]
9
+ < interface input, :fscreate, [] => [:reqid, :name, :path, :data]
10
+ < interface input, :fsmkdir, [] => [:reqid, :name, :path]
11
+ < interface input, :fsrm, [] => [:reqid, :name, :path]
12
+ < interface output, :fsret, [:reqid, :status, :data]
13
+ < end
14
+ < end
15
+ ---
16
+ > ==https://github.com/bloom-lang/bud-sandbox/raw/master/bfs/fs_master.rb|12-20
17
+ 57,58d48
18
+ < (**JMH**: I find it a bit confusing how you toggle from the discussion above to this naive file-storage design here. Can you warn us a bit more clearly that this is a starting point focused on metadata, with a strawman for data storage to be overriden later?)
19
+ <
20
+ 62,65c52
21
+ < module KVSFS
22
+ < include FSProtocol
23
+ < include BasicKVS
24
+ < include TimestepNonce
25
+ ---
26
+ > ==https://github.com/bloom-lang/bud-sandbox/raw/master/bfs/fs_master.rb|33-36
27
+ 67c54
28
+ < If we wanted to replicate the metadata master (**JMH**: "the master node's metadata"?), we could consider mixing in a replicated KVS implementation instead of __BasicKVS__ -- but more on that later.
29
+ ---
30
+ > If we wanted to replicate the metadata master, we could consider mixing in a replicated KVS implementation instead of __BasicKVS__ -- but more on that later.
31
+ 71c58
32
+ < The directory listing operation is implemented by a simple block of Bloom statements:
33
+ ---
34
+ > The directory listing operation is very simple:
35
+ 73,81c60
36
+ < bloom :elles do
37
+ < kvget <= fsls.map{ |l| [l.reqid, l.path] }
38
+ < fsret <= join([kvget_response, fsls], [kvget_response.reqid, fsls.reqid]).map{ |r, i| [r.reqid, true, r.value] }
39
+ < fsret <= fsls.map do |l|
40
+ < unless kvget_response.map{ |r| r.reqid}.include? l.reqid
41
+ < [l.reqid, false, nil]
42
+ < end
43
+ < end
44
+ < end
45
+ ---
46
+ > ==https://github.com/bloom-lang/bud-sandbox/raw/master/bfs/fs_master.rb|51-59
47
+ 96d74
48
+ < (**JMH**: Transition: "The following Bloom code carries this out for ...")
49
+ 98,134c76
50
+ < dir_exists = join [check_parent_exists, kvget_response, nonce], [check_parent_exists.reqid, kvget_response.reqid]
51
+ <
52
+ < check_is_empty <= join([fsrm, nonce]).map{|m, n| [n.ident, m.reqid, terminate_with_slash(m.path) + m.name] }
53
+ < kvget <= check_is_empty.map{|c| [c.reqid, c.name] }
54
+ < can_remove <= join([kvget_response, check_is_empty], [kvget_response.reqid, check_is_empty.reqid]).map do |r, c|
55
+ < [c.reqid, c.orig_reqid, c.name] if r.value.length == 0
56
+ < end
57
+ <
58
+ < fsret <= dir_exists.map do |c, r, n|
59
+ < if c.mtype == :rm
60
+ < unless can_remove.map{|can| can.orig_reqid}.include? c.reqid
61
+ < [c.reqid, false, "directory #{} not empty"]
62
+ < end
63
+ < end
64
+ < end
65
+ <
66
+ < # update dir entry
67
+ < # note that it is unnecessary to ensure that a file is created before its corresponding
68
+ < # directory entry, as both inserts into :kvput below will co-occur in the same timestep.
69
+ < kvput <= dir_exists.map do |c, r, n|
70
+ < if c.mtype == :rm
71
+ < if can_remove.map{|can| can.orig_reqid}.include? c.reqid
72
+ < [ip_port, c.path, n.ident, r.value.clone.reject{|item| item == c.name}]
73
+ < end
74
+ < else
75
+ < [ip_port, c.path, n.ident, r.value.clone.push(c.name)]
76
+ < end
77
+ < end
78
+ <
79
+ < kvput <= dir_exists.map do |c, r, n|
80
+ < case c.mtype
81
+ < when :mkdir
82
+ < [ip_port, terminate_with_slash(c.path) + c.name, c.reqid, []]
83
+ < when :create
84
+ < [ip_port, terminate_with_slash(c.path) + c.name, c.reqid, "LEAF"]
85
+ < end
86
+ < end
87
+ ---
88
+ > ==https://github.com/bloom-lang/bud-sandbox/raw/master/bfs/fs_master.rb|74-110
89
+ 136d77
90
+ < (**JMH**: This next sounds awkward. You *do* take care: by using <= and understanding the atomicity of timesteps in Bloom. I think what you mean to say is that Bloom's atomic timestep model makes this easy compared to ... something.)
91
+ 150c91
92
+ < table :chunk, [:chunkid, :file, :siz]
93
+ ---
94
+ > ==https://github.com/bloom-lang/bud-sandbox/raw/master/bfs/chunking.rb|26-26
95
+ 154c95
96
+ < table :chunk_cache, [:node, :chunkid, :time]
97
+ ---
98
+ > ==https://github.com/bloom-lang/bud-sandbox/raw/5c7734912e900c28087e39b3424a1e0191e13704/bfs/hb_master.rb|12-12
99
+ 156d96
100
+ < (**JMH**: ambiguous reference ahead "these latter")
101
+ 161,171c101
102
+ < module ChunkedFSProtocol
103
+ < include FSProtocol
104
+ <
105
+ < state do
106
+ < interface :input, :fschunklist, [:reqid, :file]
107
+ < interface :input, :fschunklocations, [:reqid, :chunkid]
108
+ < interface :input, :fsaddchunk, [:reqid, :file]
109
+ < # note that no output interface is defined.
110
+ < # we use :fsret (defined in FSProtocol) for output.
111
+ < end
112
+ < end
113
+ ---
114
+ > ==https://github.com/bloom-lang/bud-sandbox/raw/master/bfs/chunking.rb|6-16
115
+ 185,187c115
116
+ < chunk_buffer <= join([fschunklist, kvget_response, chunk], [fschunklist.reqid, kvget_response.reqid], [fschunklist.file, chunk.file]).map{ |l, r, c| [l.reqid, c.chunkid] }
117
+ < chunk_buffer2 <= chunk_buffer.group([chunk_buffer.reqid], accum(chunk_buffer.chunkid))
118
+ < fsret <= chunk_buffer2.map{ |c| [c.reqid, true, c.chunklist] }
119
+ ---
120
+ > ==https://github.com/bloom-lang/bud-sandbox/raw/master/bfs/chunking.rb|47-49
121
+ 191d118
122
+ < (**JMH**: Ambiguous ref "If it")
123
+ 195,202c122
124
+ < minted_chunk = join([kvget_response, fsaddchunk, available, nonce], [kvget_response.reqid, fsaddchunk.reqid])
125
+ < chunk <= minted_chunk.map{ |r, a, v, n| [n.ident, a.file, 0] }
126
+ < fsret <= minted_chunk.map{ |r, a, v, n| [r.reqid, true, [n.ident, v.pref_list.slice(0, (REP_FACTOR + 2))]] }
127
+ < fsret <= join([kvget_response, fsaddchunk], [kvget_response.reqid, fsaddchunk.reqid]).map do |r, a|
128
+ < if available.empty? or available.first.pref_list.length < REP_FACTOR
129
+ < [r.reqid, false, "datanode set cannot satisfy REP_FACTOR = #{REP_FACTOR} with [#{available.first.nil? ? "NIL" : available.first.pref_list.inspect}]"]
130
+ < end
131
+ < end
132
+ ---
133
+ > ==https://github.com/bloom-lang/bud-sandbox/raw/master/bfs/chunking.rb|69-76
134
+ 204d123
135
+ < (**JMH**: Ambiguous ref "If it")
136
+ 208,212c127
137
+ < fsret <= fschunklocations.map do |l|
138
+ < unless chunk_cache.map{|c| c.chunkid}.include? l.chunkid
139
+ < [l.reqid, false, "no datanodes found for #{l.chunkid} in cc, now #{chunk_cache.length}"]
140
+ < end
141
+ < end
142
+ ---
143
+ > ==https://github.com/bloom-lang/bud-sandbox/raw/master/bfs/chunking.rb|54-58
144
+ 216,219c131
145
+ < chunkjoin = join [fschunklocations, chunk_cache], [fschunklocations.chunkid, chunk_cache.chunkid]
146
+ < host_buffer <= chunkjoin.map{|l, c| [l.reqid, c.node] }
147
+ < host_buffer2 <= host_buffer.group([host_buffer.reqid], accum(host_buffer.host))
148
+ < fsret <= host_buffer2.map{|c| [c.reqid, true, c.hostlist] }
149
+ ---
150
+ > ==https://github.com/bloom-lang/bud-sandbox/raw/master/bfs/chunking.rb|61-64
151
+ 229,233c141
152
+ < module BFSDatanode
153
+ < include HeartbeatAgent
154
+ < include StaticMembership
155
+ < include TimestepNonce
156
+ < include BFSHBProtocol
157
+ ---
158
+ > ==https://github.com/bloom-lang/bud-sandbox/raw/master/bfs/datanode.rb|9-13
159
+ 241c149
160
+ < @dp_server = DataProtocolServer.new(dataport)
161
+ ---
162
+ > ==https://github.com/bloom-lang/bud-sandbox/raw/master/bfs/datanode.rb|53-53
163
+ 245,250c153
164
+ < dir_contents <= hb_timer.flat_map do |t|
165
+ < dir = Dir.new("#{DATADIR}/#{@data_port}")
166
+ < files = dir.to_a.map{|d| d.to_i unless d =~ /^\./}.uniq!
167
+ < dir.close
168
+ < files.map {|f| [f, Time.parse(t.val).to_f]}
169
+ < end
170
+ ---
171
+ > ==https://github.com/bloom-lang/bud-sandbox/raw/master/bfs/datanode.rb|24-29
172
+ 255,259c158
173
+ < to_payload <= join([dir_contents, nonce]).map do |c, n|
174
+ < unless server_knows.map{|s| s.file}.include? c.file
175
+ < [n.ident, c.file, c.time]
176
+ < end
177
+ < end
178
+ ---
179
+ > ==https://github.com/bloom-lang/bud-sandbox/raw/master/bfs/datanode.rb|31-35
180
+ 275d173
181
+ < ### I am autogenerated. Please do not edit me.