bud 0.0.2

Files changed (62)
  1. data/LICENSE +9 -0
  2. data/README +30 -0
  3. data/bin/budplot +134 -0
  4. data/bin/budvis +201 -0
  5. data/bin/rebl +4 -0
  6. data/docs/README.md +13 -0
  7. data/docs/bfs.md +379 -0
  8. data/docs/bfs.raw +251 -0
  9. data/docs/bfs_arch.png +0 -0
  10. data/docs/bloom-loop.png +0 -0
  11. data/docs/bust.md +83 -0
  12. data/docs/cheat.md +291 -0
  13. data/docs/deploy.md +96 -0
  14. data/docs/diffs +181 -0
  15. data/docs/getstarted.md +296 -0
  16. data/docs/intro.md +36 -0
  17. data/docs/modules.md +112 -0
  18. data/docs/operational.md +96 -0
  19. data/docs/rebl.md +99 -0
  20. data/docs/ruby_hooks.md +19 -0
  21. data/docs/visualizations.md +75 -0
  22. data/examples/README +1 -0
  23. data/examples/basics/hello.rb +12 -0
  24. data/examples/basics/out +1103 -0
  25. data/examples/basics/out.new +856 -0
  26. data/examples/basics/paths.rb +51 -0
  27. data/examples/bust/README.md +9 -0
  28. data/examples/bust/bustclient-example.rb +23 -0
  29. data/examples/bust/bustinspector.html +135 -0
  30. data/examples/bust/bustserver-example.rb +18 -0
  31. data/examples/chat/README.md +9 -0
  32. data/examples/chat/chat.rb +45 -0
  33. data/examples/chat/chat_protocol.rb +8 -0
  34. data/examples/chat/chat_server.rb +29 -0
  35. data/examples/deploy/tokenring-ec2.rb +26 -0
  36. data/examples/deploy/tokenring-local.rb +17 -0
  37. data/examples/deploy/tokenring.rb +39 -0
  38. data/lib/bud/aggs.rb +126 -0
  39. data/lib/bud/bud_meta.rb +185 -0
  40. data/lib/bud/bust/bust.rb +126 -0
  41. data/lib/bud/bust/client/idempotence.rb +10 -0
  42. data/lib/bud/bust/client/restclient.rb +49 -0
  43. data/lib/bud/collections.rb +937 -0
  44. data/lib/bud/depanalysis.rb +44 -0
  45. data/lib/bud/deploy/countatomicdelivery.rb +50 -0
  46. data/lib/bud/deploy/deployer.rb +67 -0
  47. data/lib/bud/deploy/ec2deploy.rb +200 -0
  48. data/lib/bud/deploy/localdeploy.rb +41 -0
  49. data/lib/bud/errors.rb +15 -0
  50. data/lib/bud/graphs.rb +405 -0
  51. data/lib/bud/joins.rb +300 -0
  52. data/lib/bud/rebl.rb +314 -0
  53. data/lib/bud/rewrite.rb +523 -0
  54. data/lib/bud/rtrace.rb +27 -0
  55. data/lib/bud/server.rb +43 -0
  56. data/lib/bud/state.rb +108 -0
  57. data/lib/bud/storage/tokyocabinet.rb +170 -0
  58. data/lib/bud/storage/zookeeper.rb +178 -0
  59. data/lib/bud/stratify.rb +83 -0
  60. data/lib/bud/viz.rb +65 -0
  61. data/lib/bud.rb +797 -0
  62. metadata +330 -0
@@ -0,0 +1,296 @@
# Getting Started with Bud #
In this document we'll do a hands-on tour of Bud and its Bloom DSL for Ruby. We'll start with some examples, and introduce concepts as we go.

## Installation ##
You know the drill!

    % gem install bud

This installs four things:

* The `Bud` module, to embed Bloom code in Ruby.
* The `rebl` executable: an interactive shell for trying out Bloom.
* The `budplot` and `budvis` executables: graphical tools for visualizing and debugging Bloom programs.

## First Blooms ##

### Hello, Clouds! ###
It seems a bit silly to do the old "Hello, World" example in a distributed programming language, but no tutorial would be complete without it...

Open up a rebl prompt, and paste in the following:

    stdio <~ [['Hello,'], ['Clouds']]
    /tick
    /q

You should see something like this:

    % rebl
    Welcome to rebl, the interactive Bloom terminal.

    Type: /h for help
          /q to quit

    rebl> stdio <~ [['Hello,'], ['Clouds']]
    rebl> /tick
    Hello,
    Clouds
    rebl> /q

    Rebellion quashed.
    %

Let's take this apart:

1. The first line you pasted is a Bloom statement. It says (roughly) "merge the two strings 'Hello,' and 'Clouds' into the standard I/O (terminal) output stream." Note that rebl doesn't execute that statement immediately! It just remembers it as part of a set of statements.
2. The second line starts with a slash, meaning it's a command to rebl. It tells rebl to "tick" the Bloom runtime--that is, to evaluate the set of statements we've typed so far in a single atomic "timestep".
3. The third line is short for `/quit`. (By the way, rebl auto-completes its commands with the tab key if you like.)

For fun and illustration, start up rebl again and paste in this minor variation:

    stdio <~ [['Hello,'], ['Clouds'], ['Clouds']]
    /tick
    /tick
    /q

What happened?

First, note that the basic data structures in Bloom are *sets* of objects, which means they have no duplicates, and they have no defined order for their elements. So you should see 'Clouds' only once per tick on the terminal. And in fact you may see 'Clouds' and 'Hello,' in either order. That is not a bug; it reflects the disorderly set of values being placed into stdio during a given timestep! (If you're a fan of duplicates and/or ordering, don't sweat it. We'll show you how to achieve them. But remember: Bloom is *disorderly* for a good reason--to reflect the reality of distributed systems execution. So we're going to try to get you comfortable with being disorderly by default, to protect your ability to write simpler distributed code. You'll need to use some additional syntax to impose order in the disorderly context of a distributed system, but it's worth it!)

Second, note that our Bud program's one statement merges the values on its right-hand side (rhs) into its left-hand side (lhs) at *every* timestep--every time you say `/tick`. If you ran this program as a server, it would generate an infinite stream of chatter! Generally it doesn't make sense to write server code with constants on the rhs of statements. We'll see more sensible examples soon.
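
Here is a small plain-Ruby model of the variation above that shows both points in isolation. It does not use Bud at all. `RHS` and `tick` are hypothetical names standing in for the pasted statement and for rebl's `/tick`. The model assumes, as a simplification, that each tick re-evaluates the rule's rhs and merges it into a fresh set of items bound for `stdio`.

```ruby
require 'set'

# Plain-Ruby model of the "Hello, Clouds" variation; this is not Bud's API.
# The rule's rhs is a constant array-of-arrays that is re-evaluated every tick.
RHS = [['Hello,'], ['Clouds'], ['Clouds']]

# One timestep: merge the rhs into a fresh set of items bound for stdio.
# The Set drops the duplicate ['Clouds']. (Ruby's Set happens to iterate in
# insertion order, but Bloom promises no order at all.)
def tick(rhs)
  Set.new(rhs)
end

# Two /tick commands: the same rule fires again on the second tick.
outputs = 2.times.map { tick(RHS) }

outputs.each_with_index do |out, i|
  puts "tick #{i + 1}: #{out.to_a.inspect}"
end
```

Each of the two ticks yields the same two-element set. That matches seeing 'Hello,' and 'Clouds' once per `/tick`.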

### Tables and Scratches ###
Before we dive into writing server code, let's try a slightly more involved single-timestep example. Start up rebl again, and paste in the following:

    table :clouds
    clouds <= [[1, "Cirrus"], [2, "Cumulus"]]
    stdio <~ clouds.inspected

Now tick your rebl, but don't quit yet.

    /tick

Hopefully the output looks sensible. A few things to note here:

1. The first line we pasted in is a Bloom *collection declaration*. It declares the existence of a `table` named `clouds`. By default, Bloom collections hold \[key, value\] pairs: i.e., arrays with the first field being a unique key, and the second a value. (Given an item `i` in a Bloom collection, you can access those fields as `i.key` and `i.val` respectively, or as `i[0]` and `i[1]`.)
2. The second line uses Bloom's `<=` merge operator. This *instantaneously* merges the contents of the rhs of the statement into the lhs within the same timestep.
3. The third line uses Bloom's `<~` merge operator. We'll spend more time on the meaning of this operator later; for now just be aware that statements with `stdio` on the lhs *must* use `<~`. (If you like, try starting over with `<=` instead of `<~` in that statement and see what happens.)
4. The `inspected` method of BudCollections converts arrays of values into strings suitable for printing. (Again, if you like, you can try the program again, leaving out the `.inspected` method.)
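
The two ways of reaching an item's fields can be mimicked in plain Ruby. This is only an analogy and an assumption about shape; it is not Bud's actual item class. A `Struct` named `Item` (hypothetical) with `key` and `val` members answers both `i.key` and `i[0]`. A `Hash` keyed on the first field captures the "unique key" convention.

```ruby
# Hypothetical stand-in for an item of a default [key, value] collection;
# Bud's real item class may differ, this only mimics the access patterns.
Item = Struct.new(:key, :val)

clouds = [Item.new(1, "Cirrus"), Item.new(2, "Cumulus")]

i = clouds.first
# Named and positional access reach the same fields.
puts i.key == i[0]
puts i.val == i[1]

# The "unique key" idea, modeled as a Hash: at most one value per key.
by_key = clouds.map { |item| [item.key, item.val] }.to_h
puts by_key[2]
```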

Now, let's use rebl's `lsrules` and `rmrule` commands to remove a Bloom statement (a.k.a. "rule") from our program. Assuming you didn't quit from the last rebl prompt, you can proceed as follows:

    /lsrules
    /rmrule 1
    /lsrules

Have a look at the output of each of those rebl commands--they're fairly self-explanatory. You should see that we deleted the rule that instantaneously merged the two cloud entries into the `clouds` table. Now tick your rebl again.

    /tick
    /q

You still get output values on this second tick because the clouds table retained its contents--even though the statement that populated that table was removed.

In many cases we don't want a collection to retain its contents across ticks. For those cases, we have `scratch` collections. Start up a new rebl and try this variant of the previous example:

    scratch :passing_clouds
    passing_clouds <= [[3, "Nimbus"], [2, "Cumulonimbus"]]
    stdio <~ passing_clouds.inspected
    /tick
    /lsrules
    /rmrule 1
    /tick
    /q

See how the second tick produced no output this time? After the first timestep, the passing\_clouds scratch collection did not retain its contents. And without the first statement to repopulate it during the second timestep, it remained empty.
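
The table/scratch distinction comes down to what survives the boundary between timesteps. The sketch below models that in plain Ruby under simplifying assumptions. `MiniRuntime`, `rule`, `rmrule`, and `tick` are hypothetical names that loosely mirror the rebl sessions above; this is not Bud's implementation. Tables keep their contents across ticks, scratches are emptied before each tick, and only the rules still present are re-run.

```ruby
require 'set'

# Hypothetical model of Bloom timesteps (not Bud's implementation):
# tables persist across ticks, scratches are emptied at the start of each tick.
class MiniRuntime
  def initialize
    @contents = {}   # collection name => Set of items
    @kinds = {}      # collection name => :table or :scratch
    @rules = []      # list of [target collection, block producing items]
  end

  def table(name)
    declare(name, :table)
  end

  def scratch(name)
    declare(name, :scratch)
  end

  # Roughly "lhs <= rhs": merge the block's items into the target every tick.
  def rule(target, &rhs)
    @rules << [target, rhs]
  end

  # Roughly rebl's /rmrule, but 0-based here for simplicity.
  def rmrule(index)
    @rules.delete_at(index)
  end

  def tick
    # Scratch contents do not survive into the next timestep.
    @kinds.each { |name, kind| @contents[name].clear if kind == :scratch }
    @rules.each { |target, rhs| @contents[target].merge(rhs.call) }
  end

  def [](name)
    @contents[name].to_a
  end

  private

  def declare(name, kind)
    @kinds[name] = kind
    @contents[name] = Set.new
  end
end

rt = MiniRuntime.new
rt.table :clouds
rt.scratch :passing_clouds
rt.rule(:clouds) { [[1, "Cirrus"], [2, "Cumulus"]] }
rt.rule(:passing_clouds) { [[3, "Nimbus"], [2, "Cumulonimbus"]] }

rt.tick
rt.rmrule(1)   # drop the passing_clouds rule
rt.rmrule(0)   # drop the clouds rule
rt.tick

p rt[:clouds]          # the table kept both rows
p rt[:passing_clouds]  # the scratch was emptied and not refilled
```

After the second tick, `clouds` still holds both rows even though no rule repopulates it, while `passing_clouds` is empty.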

### Summing Up ###
In these initial examples we learned about a few simple but important things:

* **rebl**: the interactive Bloom terminal and its `/` commands.
* **Bloom collections**: unordered sets of items, which are set up by collection declarations. So far we have seen persistent `table` and transient `scratch` collections. By default, collections are structured as \[key,value\] pairs.
* **Bloom statements**: expressions of the form *lhs op rhs*, where the lhs is a collection and the rhs is either a collection or an array-of-arrays.
* **Bloom timestep**: an atomic single-machine evaluation of a block of Bloom statements.
* **Bloom merge operators**:
    * The `<=` merge operator for *instantaneously* merging things into a collection.
    * The `<~` merge operator for *asynchronously* merging things into collections outside the control of tick evaluation: e.g., terminal output.
* **stdio**: a built-in Bloom collection that, when placed on the lhs of an async merge operator `<~`, prints its contents to stdout.
* **inspected**: a method of Bloom collections that transforms the elements to be suitable for textual display on the terminal.

## Chat, World! ##
Now that we've seen a bit of Bloom, we're ready to write our first interesting service that embeds Bloom code in Ruby. We'll implement a simple client-server "chat" service. The full code for this program is in the `examples/chat` directory of the Bud distribution. (Lest there be any confusion at this point, please note that Bloom isn't specifically designed for client/server designs. Many examples in the [bud-sandbox](http://github.com/bloom-lang/bud-sandbox) repository are more like "peer-to-peer" or "agent-based" designs, which tend to work out as neatly as this one or more so.)

**Basic idea**: The basic idea of this program is that clients will connect to a chatserver process across the Internet. When a client first connects to the server, the server will remember its address and nickname. The server will also accept messages from clients, and relay them to other clients.

Even though we're getting ahead of ourselves, let's have a peek at the Bloom statements that implement the server in `examples/chat/chat_server.rb`:

    nodelist <= connect.payloads
    mcast <~ (mcast * nodelist).pairs { |m,n| [n.key, m.val] }

That's it! There is one statement for each of the last two sentences of the "basic idea" above, which describe the server's behavior. We'll go through these two statements in more detail shortly. But it's nice to see right away how concisely and naturally a Bloom program can fit our intuitive description of a distributed service.

### The Server Side ###

Now that we've satisfied our need to peek, let's take this more methodically. First we need declarations for the various Bloom collections we'll be using. We put the declarations that are common to both client and server into the file `examples/chat/chat_protocol.rb`:

    module ChatProtocol
      state do
        channel :mcast
        channel :connect
      end

      DEFAULT_ADDR = "localhost:12345"
    end

This defines a [Ruby mixin module](http://www.ruby-doc.org/docs/ProgrammingRuby/html/tut_modules.html) called `ChatProtocol` that has a couple of special Bloom features:

1. It contains a Bloom `state` block, containing collection declarations. When embedding Bloom in Ruby, all Bloom collection declarations must appear in a `state` block of this sort.
2. This particular state block uses a kind of Bloom collection we have not seen before: a `channel`. A channel collection is a special kind of scratch used for network communication. It has a few key features:

    * Unlike the default \[key,value\] structure of scratches and tables, channels default to the structure \[address,payload\]: the first field is a destination IP string of the form 'host:port', and the second field is a payload to be delivered to that destination--typically a Ruby array. (For the record, the default key of a channel collection is the *pair* \[address,payload\].)
    * Any Bloom statement with a channel on the lhs must use the async merge (`<~`) operator. This instructs the runtime to attempt to deliver each rhs item to the address stored therein. In an async merge, each item in the collection on the right will appear in the collection on the lhs *eventually*. But this will not happen instantaneously, and it might not happen atomically--items in the collection on the rhs may "straggle in" individually over time at the destination. And if you're unlucky, this may happen after an arbitrarily long delay (possibly never). The use of `<~` for channels reflects the typical uncertainty of real-world network delivery. (Don't worry, the Bud sandbox provides libraries to wrap that uncertainty up in convenient ways.)
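
Here is a toy plain-Ruby model of a channel that makes "eventually, and not necessarily atomically" concrete. It is a hedged illustration, not Bud's networking layer. `FlakyNetwork`, `send_msg`, and `deliver_some` are hypothetical, and the seeded shuffle merely stands in for unpredictable network timing. Items sent together reach the destination's inbox a few at a time, in an arbitrary order.

```ruby
require 'set'

# Hypothetical model of async channel delivery (not Bud's implementation).
class FlakyNetwork
  def initialize(seed)
    @rng = Random.new(seed)
    @in_flight = []                                 # queued [address, payload] items
    @inboxes = Hash.new { |h, addr| h[addr] = [] }  # address => delivered payloads
  end

  # Roughly "channel <~ items": enqueue now, deliver at some later point.
  def send_msg(address, payload)
    @in_flight << [address, payload]
  end

  # Deliver up to max queued items, picked in an arbitrary (shuffled) order.
  def deliver_some(max)
    @in_flight.shuffle!(random: @rng)
    @in_flight.shift(max).each { |addr, payload| @inboxes[addr] << payload }
  end

  def inbox(address)
    @inboxes[address]
  end

  def pending
    @in_flight.size
  end
end

net = FlakyNetwork.new(42)
%w[a b c d].each { |msg| net.send_msg("localhost:12345", [msg]) }

net.deliver_some(2)   # the receiver's next timestep sees only two items
first_batch = net.inbox("localhost:12345").dup
net.deliver_some(2)   # the rest straggle in later
p first_batch
p net.inbox("localhost:12345")
```

Unlike a real network, this toy never drops or indefinitely delays an item. It only captures the straggling and reordering.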

Given this protocol (and the Ruby constant at the bottom), we're now ready to examine `examples/chat/chat_server.rb` in more detail:

    require 'rubygems'
    require 'bud'
    require 'chat_protocol'

    class ChatServer
      include Bud
      include ChatProtocol

      state { table :nodelist }

      bloom do
        nodelist <= connect.payloads
        mcast <~ (mcast * nodelist).pairs { |m,n| [n.key, m.val] }
      end
    end

    if ARGV.first
      addr = ARGV.first
    else
      addr = ChatProtocol::DEFAULT_ADDR
    end

    ip, port = addr.split(":")
    puts "Server address: #{ip}:#{port}"
    program = ChatServer.new(:ip => ip, :port => port.to_i)
    program.run

The first few lines get the appropriate Ruby classes and modules loaded via `require`. We then define the ChatServer class, which mixes in the `Bud` module and the ChatProtocol module we looked at above. Then we have another `state` block that declares one additional collection, the `nodelist` table.

With those preliminaries aside, we have our first `bloom` block, which is how Bloom statements are embedded into Ruby. Let's revisit the two Bloom statements that make up our server.

The first is pretty simple:

    nodelist <= connect.payloads

This says that whenever messages arrive on the channel named "connect", their payloads (i.e., their non-address field) should be instantaneously merged into the table nodelist, which will store them persistently. Note that nodelist has a \[key, value\] pair structure, so we expect the payloads will have that structure as well: here, a client's address and its nickname.

The next Bloom statement is more complex. Remember the description in the "basic idea" at the beginning of this section: the server needs to accept inbound chat messages from clients, and forward them to other clients.

    mcast <~ (mcast * nodelist).pairs { |m,n| [n.key, m.val] }

The first thing to note is the lhs and operator in this statement. We are merging items (asynchronously, of course!) into the mcast channel, where they will be sent to their eventual destination.

The rhs is our first introduction to the `*` operator of Bloom collections, and the `pairs` method after it. You can think of the `*` operator as "all-pairs": it produces a Bloom collection containing all pairs of mcast and nodelist items. The `pairs` method iterates through these pairs, passing them through a code block via the block arguments `m` and `n`. Finally, for each such pair the block produces an item containing the `key` attribute of the nodelist item and the `val` attribute of the mcast item. This is structured as a proper \[address, payload\] entry to be merged back into the mcast channel. Putting this together, this statement *multicasts inbound payloads on the mcast channel to all nodes in the chat*.
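
In plain Ruby terms, `(mcast * nodelist).pairs { |m,n| [n.key, m.val] }` behaves much like `Array#product` followed by `map`. The sketch below is an analogy that rests on a few assumptions. Both collections are modeled as plain arrays for a single server timestep. The addresses and nicknames are made up. Positions `[0]` and `[1]` stand in for `key`/`val`. The payload layout `[address, nick, time, line]` follows what the chat client sends.

```ruby
# Hypothetical snapshot of one server timestep, modeled with plain arrays.
# mcast items: [address, payload], payload = [sender, nick, time, line]
mcast = [
  ["localhost:12345", ["10.0.0.1:5000", "alice", "01:02.03", "hi all"]]
]
# nodelist items: [key, value] = [client address, nickname]
nodelist = [
  ["10.0.0.1:5000", "alice"],
  ["10.0.0.2:5000", "bob"]
]

# All pairs (m, n), then build [n.key, m.val]: one copy of each message
# addressed to each known client.
outbound = mcast.product(nodelist).map { |m, n| [n[0], m[1]] }

outbound.each { |addr, payload| puts "#{addr} <- #{payload.inspect}" }
```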

The remaining lines of plain Ruby simply instantiate and run the ChatServer class (which includes the `Bud` module) using an IP address and port given on the command line (or the default from `chat_protocol.rb`).

#### `*`'s and Clouds ####
You can think of our use of the `*` operator in the rhs of the second statement in a few different ways:

* If you're familiar with event-loop programming, this implements an *event handler* for messages on the mcast channel: whenever an mcast message arrives, this handler performs lookups in the nodelist table to form new messages. (It is easy to add "filters" to these handlers.) The resulting messages are dispatched via the mcast channel accordingly. This is a very common pattern in Bloom programs: handling channel messages via lookups in a table.

* If you're familiar with SQL databases, the rhs is essentially a query that is run at each timestep, performing a CROSS JOIN of the mcast and nodelist "tables", with the SELECT clause captured by the block. (It is easy to add WHERE clauses to these joins.) The resulting "tuples" are "inserted" into the lhs asynchronously (and typically on remote nodes). This is a general-purpose way to think about the `*` operator. But as you've already seen, many common use cases for Bloom's `*` operator don't "feel" like database queries, because one or more of the collections is a scratch that is "driving" the program.

We expect that people doing distributed programming are probably familiar with both of these metaphors, and they're both useful. It's fairly common to think about rules in the first form, although the second form is actually closer to the underlying semantics of the language (which come from a temporal logic called [Dedalus](http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-173.html)).
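
Both metaphors mention adding filters or WHERE clauses. The sketch below is a plain-Ruby illustration of where such a condition sits; it is not Bud syntax. The "don't echo to the sender" rule is a hypothetical variation. The server as written also sends each message back to its sender, since the sender is in `nodelist`. The sketch filters the all-pairs result before projecting it.

```ruby
# Same hypothetical data shapes as the chat server's mcast and nodelist.
mcast = [["localhost:12345", ["10.0.0.1:5000", "alice", "01:02.03", "hi all"]]]
nodelist = [["10.0.0.1:5000", "alice"], ["10.0.0.2:5000", "bob"]]

# FROM mcast CROSS JOIN nodelist
# WHERE the recipient is not the message's original sender
# SELECT [recipient address, payload]
outbound = mcast.product(nodelist)
                .reject { |m, n| n[0] == m[1][0] }
                .map { |m, n| [n[0], m[1]] }

p outbound
```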

### The Client Side ###
Given our understanding of the server, the client should be pretty simple. It needs to send an appropriately-formatted message on the `connect` channel to the server, send/receive messages on the `mcast` channel, and print the messages it receives to the screen.

And here's the code:

    require 'rubygems'
    require 'bud'
    require 'chat_protocol'

    class ChatClient
      include Bud
      include ChatProtocol

      def initialize(nick, server, opts)
        @nick = nick
        @server = server
        super opts
      end

      # send connection request to server on startup
      bootstrap do
        connect <~ [[@server, [ip_port, @nick]]]
      end

      bloom :chatter do
        # send mcast requests to server
        mcast <~ stdio do |s|
          [@server, [ip_port, @nick, Time.new.strftime("%I:%M.%S"), s.line]]
        end
        # pretty-print mcast msgs from server on terminal
        stdio <~ mcast do |m|
          [left_right_align(m.val[1].to_s + ": " \
                            + (m.val[3].to_s || ''),
                            "(" + m.val[2].to_s + ")")]
        end
      end

      # format chat messages with timestamp on the right of the screen
      def left_right_align(x, y)
        return x + " "*[66 - x.length,2].max + y
      end
    end

    if ARGV.length == 2
      server = ARGV[1]
    else
      server = ChatProtocol::DEFAULT_ADDR
    end

    puts "Server address: #{server}"
    program = ChatClient.new(ARGV[0], server, :read_stdin => true)
    program.run

The ChatClient class has a typical Ruby `initialize` method that sets up two instance variables: one for this client's nickname, and another for the 'IP:port' address string for the server. It then calls `super`, which passes a hash of options along to the `Bud` module's initializer.

The next block in the class is the first Bloom `bootstrap` block we've seen. This is a set of Bloom statements that are evaluated only once, just before the first "regular" timestep of the system. In this case, we bootstrap the client by sending a message to the server on the connect channel, containing the client's address (via the built-in Bud instance method `ip_port`) and chosen nickname.

After that comes a bloom block with the name `:chatter`. It contains two statements: one to take stdio input from the terminal and send it to the server via mcast, and another to receive mcasts and place them on stdio output. The first statement has the built-in `stdio` scratch on the rhs: this includes any lines of terminal input that arrived since the last timestep. For each line of terminal input, the `do...end` block formats an `mcast` message destined for the address in the instance variable `@server`, with an array as the payload.

The rhs of the second statement takes `mcast` messages that arrived since the last timestep. For each message `m`, the `m.val` expression in the block returns the message payload; the call to the Ruby instance method `left_right_align` formats the message so it will look nice on-screen. These formatted strings are placed (asynchronously, as before) into `stdio` on the left.
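
The formatting in `:chatter` is ordinary Ruby, so it can be tried without Bud. The sketch below copies `left_right_align` from `ChatClient`. It applies the same expression as the second statement to a made-up payload with the shape the first statement builds (`[ip_port, nick, timestamp, line]`). A fixed timestamp replaces `Time.new.strftime("%I:%M.%S")` only to keep the result repeatable.

```ruby
# left_right_align, copied from ChatClient: pad the left text out to
# column 66 (always at least two spaces), then append the right text.
def left_right_align(x, y)
  return x + " "*[66 - x.length,2].max + y
end

# Hypothetical payload with the shape the client's first :chatter statement
# builds: [ip_port, nick, timestamp, line].
payload = ["10.0.0.1:5000", "alice", "03:04.05", "hello, clouds"]

# The same expression the second :chatter statement applies to m.val.
line = left_right_align(payload[1].to_s + ": " + (payload[3].to_s || ''),
                        "(" + payload[2].to_s + ")")
puts line
```

Short messages are padded so the timestamp starts at column 66. Messages longer than that still get a two-space gap before the timestamp.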

The remaining lines are Ruby driver code to instantiate and run the ChatClient class (which includes the `Bud` module) using arguments from the command line. Note the option `:read_stdin => true` to `ChatClient.new`: this causes the Bud runtime to capture stdin via the built-in `stdio` collection.

### Running the chat ###
You can try out our little chat program on a single machine by running each of the following shell commands from the `examples/chat` subdirectory, each in a separate window:

    % ruby chat_server.rb

    % ruby chat.rb alice

    % ruby chat.rb bob

    % ruby chat.rb harvey

Alternatively, you can run the server and clients on separate nodes, specifying the server's IP:port pair on the command line (consistently): as the first argument to `chat_server.rb`, and as the second argument (after the nickname) to `chat.rb`.

### Summing Up ###
In this section we saw a number of features that did not appear in our earlier single-timestep examples in rebl:

* **state blocks**: Embedding of Bloom collection declarations into Ruby.
* **bloom blocks**: Embedding of Bloom statements into Ruby.
* **bootstrap blocks**: for one-time statements to be executed before the first timestep.
* **channel collections**: collection types that enable sending/receiving asynchronous, unreliable messages.
* **the `*` operator and pairs method**: the way to combine items from multiple collections.

# The Big Picture and the Details #
Now that you've seen some working Bloom code, hopefully you're ready to delve deeper. The [README](README.md) provides links to places you can go for more information. Have fun and [stay in touch](http://groups.google.com/group/bloom-lang)!

data/docs/intro.md ADDED
@@ -0,0 +1,36 @@
# *Bud*: Ruby <~ Bloom #

Bud is a prototype of the [*Bloom*](http://bloom-lang.org) language for distributed programming, embedded as a DSL in Ruby. "Bud" stands for *Bloom under development*. The current 0.0.1 release is the initial alpha, targeted at "friends and family" who would like to engage at an early stage in the language design.

## Distributed Code in Bloom ##
The goal of Bloom is to make distributed programming far easier than it has been with traditional languages. The key features of Bloom are:

1. *Disorderly Programming*: Traditional languages like Java and C are based on the [von Neumann model](http://en.wikipedia.org/wiki/Von_Neumann_architecture), where a program counter steps through individual instructions in order. Distributed systems don't work like that. Much of the pain in traditional distributed programming comes from this mismatch: programmers are expected to bridge from an ordered programming model into a disorderly reality that executes their code. Bloom was designed to match--and to exploit--the disorderly reality of distributed systems. Bloom programmers write code made of unordered collections of statements, and use explicit constructs to impose order when needed.

2. *A Collected Approach to Data Structures*: Taking a cue from successfully-parallelized models like MapReduce and SQL, the standard data structures in Bloom are *disorderly collections*, rather than scalar variables and nested structures like lists, queues and trees. Disorderly collection types reflect the realities of non-deterministic ordering inherent in distributed systems. Bloom provides simple, familiar syntax for manipulating these structures. In Bud, much of this syntax comes straight from Ruby, with a taste of MapReduce and SQL.

3. *CALM Consistency*: Bloom enables powerful compiler analysis techniques based on the [CALM principle](http://db.cs.berkeley.edu/papers/cidr11-bloom.pdf) to reason about the consistency of your distributed code. The Bud prototype includes program analysis tools that can point out precise *points of order* in your program: lines of code where a coordination library should be plugged in to ensure distributed consistency.

4. *Concise Code*: Bloom is a very high-level language, designed with distributed code in mind. As a result, Bloom programs tend to be far smaller (often [orders of magnitude](http://boom.cs.berkeley.edu) smaller) than equivalent programs in traditional imperative languages.

## Alpha Goals and Limitations ##

We had three main goals in preparing this release. The first was to flesh out the shape of the Bloom language: initial syntax and semantics, and the "feel" of embedding it as a DSL. The second goal was to build tools for reasoning about Bloom programs: both automatic program analysis, and tools for surfacing that analysis to developers.

The third goal was to start a feedback loop with developers interested in the potential of the ideas behind the language. We are optimistic that the principles underlying Bloom can make distributed programming radically simpler. But we realize that those ideas only matter if programmers can adopt them naturally. We intend Bud to be the beginning of an iterative design partnership with developers who see value in betting early on these ideas, and shaping the design of the language.

In developing this alpha release, we explicitly set aside some issues that we intend to revisit in the future. The first limitation is performance: Bud 0.0.1 is not intended to excel in single-node performance, whether in latency, throughput, or scale. We do expect major improvements on all these fronts in future releases: many of the known performance problems have known solutions that we've implemented in prior systems, and we intend to revisit them for our beta release.

The second main limitation involves integration issues embedding Bloom as a DSL in Ruby. In the spectrum from flexibility to purity, we leaned decidedly toward flexibility. The barriers between Ruby and Bloom code are very fluid in the alpha, and we do relatively little to prevent programmers from ad-hoc mixtures of the two. Aggressive use of Ruby within Bloom statements is likely to do something *interesting*, but not necessarily predictable or desirable. This is an area where we expect to learn more from experience, and make some more refined decisions for the beta release.

### Friends and Family: Come On In ###
Although our team has many years of development experience, Bud is still open-source academic software built on a decidedly personal scale.

This 0.0.1 alpha is targeted at "friends and family", and at developers who'd like to join them. This is definitely the bleeding edge: we're in a rapid cycle of learning about this new style of programming, and exposing what we learn in new iterations of the language. If you'd like to jump on the wheel with us and play with Bud, we'd love your feedback--both success stories and constructive criticism.

## Getting Started ##
We're shipping Bud with a [sandbox](http://github.com/bloom-lang/bud-sandbox) of libraries and example applications for distributed systems. These illustrate the language and how it can be used, and also can serve as mixins for new code you might want to write. You may be surprised at how short the provided Bud code is, but don't be fooled.

To get you started with Bud, we've provided a [quick-start tutorial](getstarted.md), instructions for [deploying distributed Bud](deploy.md) programs on Amazon's EC2 cloud, and a number of other docs you can find linked from the [README](README.md).

We welcome both constructive criticism and (hopefully occasional) smoke-out-your-ears, hair-tearing shouts of frustration. Please point your feedback cannon at the [Bloom mailing list](http://groups.google.com/group/bloom-lang) on Google Groups.

Happy Blooming!

data/docs/modules.md ADDED
@@ -0,0 +1,112 @@
# Code Structuring and Reuse in BUD

## Language Support

### Ruby Mixins

The basic unit of reuse in BUD is the mixin functionality provided by Ruby itself. BUD code is structured into modules, each of which may have its own __state__ and __bootstrap__ block and any number of __bloom__ blocks (described below). A module or class may mix in a BUD module via Ruby's _include_ statement. _include_ causes the specified module's code to be expanded into the local scope.

### Bloom Blocks

While the order and grouping of BUD rules have no semantic significance, rules can be grouped and tagged within a single module using __bloom__ blocks. Bloom blocks serve two purposes:

1. Improving readability by grouping related or dependent rules together.
2. Supporting name-based overriding.

(1) is self-explanatory. (2) represents one of several extensibility mechanisms provided by BUD. If a module B includes a module A that contains a bloom block named X, B may supply its own bloom block X, thereby replacing the set of rules defined by (A.)X with its own set. For example:

    require 'rubygems'
    require 'bud'

    module Hello
      state do
        interface input, :stim
      end
      bloom :hi do
        stdio <~ stim {|s| ["Hello, #{s.val}"]}
      end
    end

    module HelloTwo
      include Hello
      bloom :hi do
        stdio <~ stim {|s| ["Hello, #{s.key}"]}
      end
    end

    class HelloClass
      include Bud
      include HelloTwo
    end

    h = HelloClass.new
    h.run_bg
    h.sync_do{h.stim <+ [[1,2]]}

The program above will print "Hello, 1", because the module HelloTwo overrides the bloom block named __hi__. If we give the bloom block in HelloTwo a distinct name, the program will print "Hello, 1" and "Hello, 2" (in no guaranteed order).
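
One way to picture name-based overriding is ordinary Ruby method lookup, where a method defined in an including module shadows a same-named method from the module it includes. The sketch below is only an analogy; it does not show how Bud implements bloom blocks. The `rule_` prefix, the `Fire` helper, and `HelloThree` are hypothetical. It reproduces both outcomes described above: one greeting when the names collide, two when they differ.

```ruby
# Plain-Ruby analogy for bloom-block overriding (not Bud's implementation).
# Each "block" is a method whose name starts with rule_.
module Hello
  def rule_hi(key, val)
    "Hello, #{val}"
  end
end

module HelloTwo
  include Hello
  # Same name as Hello#rule_hi, so this definition shadows it.
  def rule_hi(key, val)
    "Hello, #{key}"
  end
end

module HelloThree
  include Hello
  # Distinct name: both "blocks" remain visible.
  def rule_hi_again(key, val)
    "Hello, #{key}"
  end
end

# Shared driver: call every rule_* method visible on the object.
module Fire
  def fire(key, val)
    methods.grep(/\Arule_/).map { |name| send(name, key, val) }
  end
end

class HelloClass
  include Fire
  include HelloTwo
end

class HelloClassBoth
  include Fire
  include HelloThree
end

p HelloClass.new.fire(1, 2)
p HelloClassBoth.new.fire(1, 2).sort
```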
47
+
48
+
49
+ ### The BUD Module Import System
50
+
51
+ For simple programs, composing modules via _include_ is often sufficient. But the flat namespace provided by mixins can make it difficult or impossible to support certain types of reuse. Consider a module Q that provides a queue-like functionality via an input interface _enqueue_ and an output interface _dequeue_, each with a single attribute (payload). A later module may wish to employ two queues (say, to implement a scheduler). But it cannot include Q twice! It would be necessary to rewrite Q's interfaces so as to support multiple ``users.''
52
+
53
+ In addition to _include_, BUD supports the _import_ keyword, which instantiates a BUD module under a namespace alias. For example:
+
+     module UserCode
+       import Q => :q1
+       import Q => :q2
+
+       bloom do
+         # insert into the first queue
+         q1.enqueue <= [....]
+       end
+     end
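One way to picture `import` is as holding separately named instances instead of mixing methods into one shared namespace. The plain-Ruby sketch below models the two-queue scenario under that assumption; the `Q` class, its array-backed storage, and the method names are hypothetical stand-ins for Q's `enqueue` and `dequeue` interfaces, not Bud code.

```ruby
# Plain-Ruby model of `import Q => :q1` / `import Q => :q2` (not Bud code).
class Q
  def initialize
    @payloads = []   # this instance's private queue state
  end

  # Stand-in for the enqueue input interface.
  def enqueue(payloads)
    @payloads.concat(payloads)
    self
  end

  # Stand-in for the dequeue output interface: oldest payload first.
  def dequeue
    @payloads.shift
  end
end

class UserCode
  attr_reader :q1, :q2

  def initialize
    # Each alias names an independent copy of Q's state.
    @q1 = Q.new
    @q2 = Q.new
  end
end

user = UserCode.new
user.q1.enqueue(["job-a", "job-b"])
user.q2.enqueue(["job-c"])
p user.q1.dequeue  # => "job-a"
p user.q2.dequeue  # => "job-c"
```

With `include`, by contrast, the combined namespace would contain only one `enqueue` and one `dequeue`, which is the limitation described above.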
+
+ ## Techniques
+
+ ### Structuring
+
+ In summary, BUD extends the basic Ruby code-structuring mechanisms (classes and modules) with bloom blocks, for finer-grained grouping of rules within modules, and with the import system, for scoped inclusion.
+
+ ### Composition
+
+ Basic code composition can be achieved using the Ruby mixin system. If the flat namespace causes ambiguity (as above) or hinders readability, the import system provides the ability to scope code inclusions.
+
+ ### Extension and Overriding
+
+ Extending the existing functionality of a BUD program can be achieved in a number of ways. The simplest (but arguably least flexible) is via bloom block overriding, as described in the Hello example above.
+
+ The import system can be used to implement finer-grained overriding, at the collection level. Consider a module BlackBox that provides an input interface __iin__ and an output interface __iout__. Suppose that we wish to "use" BlackBox, but need to provide additional functionality. We may extend one or both of its interfaces by _import_'ing BlackBox, redeclaring the interfaces, and gluing them together. For example, the module UsesBlackBox shown below interposes additional logic (indicated by ellipses) upstream of BlackBox's input interface, and provides "extended" BlackBox functionality.
+
+     module UsesBlackBox
+       import BlackBox => :bb
+       state do
+         interface input, :iin
+         interface output, :iout
+       end
+
+       bloom do
+         [ .... ] <= iin        # extra logic over our own iin (elided)
+         bb.iin <= [ .... ]     # feed the result into BlackBox's iin
+         iout <= bb.iout        # expose BlackBox's iout as our own
+       end
+     end
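The same interposition pattern can be modeled in plain Ruby by composition: UsesBlackBox keeps a privately named BlackBox (the role `import BlackBox => :bb` plays), runs extra logic before forwarding to `bb`'s input, and exposes `bb`'s output as its own. This is a sketch, not Bud code; BlackBox's behavior (upcasing payloads) and the extra filtering step are hypothetical placeholders for the elided logic.

```ruby
# Plain-Ruby model of interposition (not Bud code). Tuples have one payload column.
class BlackBox
  def initialize
    @out = []
  end

  # Input interface: hypothetical behavior, upcase each payload.
  def iin(tuples)
    @out.concat(tuples.map { |(payload)| [payload.upcase] })
    self
  end

  # Output interface.
  def iout
    @out.dup
  end
end

class UsesBlackBox
  def initialize
    @bb = BlackBox.new   # plays the role of `import BlackBox => :bb`
  end

  # Same external interface; extra logic runs upstream of bb's input.
  def iin(tuples)
    kept = tuples.reject { |(payload)| payload.empty? }  # hypothetical extra logic
    @bb.iin(kept)
    self
  end

  # Glue: our output is whatever bb produces.
  def iout
    @bb.iout
  end
end

p UsesBlackBox.new.iin([["a"], [""], ["b"]]).iout  # => [["A"], ["B"]]
```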
+
+ ### Abstract Interfaces and Concrete Implementations
+
+ In the previous example, UsesBlackBox extended the functionality of BlackBox by _interposing_ additional logic into its dataflow. It was able to do this transparently because both modules have the same externally visible interface: inserting tuples into __iin__ causes tuples to appear in __iout__. In some (extremely underspecified) sense, this pair of interfaces constitutes an abstract contract that both implementations satisfy, and the dependency of UsesBlackBox on BlackBox is just a detail of UsesBlackBox's implementation.
+
+ The basic Ruby module system inherited by Bud may be used, by convention, to enable code reuse and hiding via the separation of abstract interfaces and concrete implementations. Instead of repeating the schema definitions in multiple state blocks, we will often declare a protocol module as follows:
+
+     module BBProtocol
+       # Contract: do XXXXXXXXXXX
+       state do
+         interface input, :iin
+         interface output, :iout
+       end
+     end
+
+ Each implementation of the protocol would then include BBProtocol. Though the interpreter treats this as an ordinary Ruby mixin, the interpretation is that by including BBProtocol, both BlackBox and UsesBlackBox _implement_ the protocol. A downstream developer may then write code against the external interface, committing to a fully-specified implementation only when necessary.
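The convention can be sketched in plain Ruby: a protocol module that only fixes the interface, two interchangeable implementations that include it, and downstream code that checks for the protocol rather than a concrete class. All names other than BBProtocol are hypothetical, and the `NotImplementedError` defaults are an illustrative choice; a Bud protocol module declares its interfaces in a `state` block instead.

```ruby
# Plain-Ruby model of a protocol module and its implementations (not Bud code).
module BBProtocol
  # Contract (illustrative): tuples inserted via iin produce tuples in iout.
  def iin(_tuples)
    raise NotImplementedError, "#{self.class} must implement iin"
  end

  def iout
    raise NotImplementedError, "#{self.class} must implement iout"
  end
end

class EchoBox
  include BBProtocol

  def initialize
    @out = []
  end

  def iin(tuples)
    @out.concat(tuples)
    self
  end

  def iout
    @out.dup
  end
end

class ReverseBox
  include BBProtocol

  def initialize
    @out = []
  end

  def iin(tuples)
    @out.concat(tuples.map { |(payload)| [payload.reverse] })
    self
  end

  def iout
    @out.dup
  end
end

# Downstream code depends only on the protocol, not on a concrete class.
def round_trip(impl, tuples)
  raise ArgumentError, "#{impl.class} does not implement BBProtocol" unless impl.is_a?(BBProtocol)
  impl.iin(tuples).iout
end

p round_trip(EchoBox.new, [["bud"]])     # => [["bud"]]
p round_trip(ReverseBox.new, [["bud"]])  # => [["dub"]]
```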
@@ -0,0 +1,96 @@
+ # An Operational View Of Bloom #
+ You may ask yourself: well, what does a Bloom program *mean*? You may ask yourself: How do I read this Bloom code? ([You may tell yourself, this is not my beautiful house.](http://www.youtube.com/watch?v=I1wg1DNHbNU) But I digress.)
+
+ There is a formal answer to these questions about Bloom, but there's also a more approachable answer. *Very* briefly, the formal answer is that Bloom's semantics are grounded in model theory, via a temporal logic language called *Dedalus* that is described in [a paper from Berkeley](http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-173.html).
+
+ While that's nice for proving theorems (and writing [program analysis tools](visualizations.md)), many programmers don't find model-theoretic discussions of semantics terribly helpful or even interesting. It's usually easier to think about how the language *works* at some level, so you can reason about how to use it.
+
+ That's the goal of this document: to provide a relatively simple, hopefully useful intuition for how Bloom is evaluated. This is not the only way to evaluate Bloom, but it's the intuitive way to do it, and basically the way that the Bud implementation works (modulo some optimizations).
+
+ ## Bloom Timesteps ##
+ Bloom is designed to be run on multiple machines, with no assumptions about coordinating their behavior or resources.
+
+ Each machine runs an evaluator that works in a loop, as depicted in this figure:
+
+ ![Bloom Loop](bloom-loop.png?raw=true)
+
+ Each iteration of this loop is a *timestep* for that node; each timestep is associated with a monotonically increasing timestamp (which is accessible via the `budtime` method in Bud). Timesteps and timestamps are not coordinated across nodes; any such coordination has to be programmed in the Bloom language itself.
+
+ A Bloom timestep has 3 main phases:
+
+ 1. *setup*: All scratch collections are set to empty. Network messages and periodic timer events are received from the runtime and placed into their designated `channel` and `periodic` scratches, respectively, to be read in the rhs of statements. Note that a batch of multiple messages/events may be received at once.
+ 2. *logic*: All Bloom statements for the program are evaluated. In programs with recursion through instantaneous merges (`<=`), the statements are repeatedly evaluated until a *fixpoint* is reached, i.e., until no new lhs items are derived from any rhs.
+ 3. *transition*: Items derived on the lhs of deferred operators (`<+`, `<-`) are placed into or deleted from their corresponding collections, and items derived on the lhs of asynchronous merge (`<~`) are handed off to external code (i.e., the local operating system) for processing.
+
+ It is important to understand how the Bloom collection operators fit into these timesteps:
+
+ * *Instantaneous* merge (`<=`) occurs within the fixpoint of phase 2.
+ * *Deferred* operations include merge (`<+`) and delete (`<-`), and are handled in phase 3. Their effects become visible atomically to Bloom statements in phase 2 of the next timestep.
+ * *Asynchronous* merge (`<~`) is initiated during phase 3, so it cannot affect the current timestep. When multiple items are on the rhs of an async merge, they may "appear" independently, spread across several different future local timesteps.
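To make the three phases and the operator timing concrete, here is a heavily simplified plain-Ruby model of a single node's loop. It is a sketch under stated assumptions, not Bud's implementation: there is one persistent table, one `channel` scratch, no real network, and the helpers `merge_now`, `merge_later`, `delete_later`, and `send_async` are hypothetical stand-ins for `<=`, `<+`, `<-`, and `<~`. The `tick` method returns the items that would be handed off to the operating system.

```ruby
require 'set'

# Hypothetical, heavily simplified model of one node's Bloom timestep loop.
# It is not Bud's implementation. There is one persistent table, one `channel`
# scratch, and four helpers standing in for <=, <+, <- and <~.
class MiniNode
  attr_reader :budtime, :table

  def initialize(&rules)
    @rules = rules
    @budtime = 0
    @table = Set.new
  end

  # Runs one timestep and returns the items handed off by async merges.
  def tick(inbound)
    # Phase 1, setup: scratches start empty; received messages fill `channel`.
    @channel = Set.new(inbound)
    @pending_ins = Set.new
    @pending_del = Set.new
    @async = Set.new

    # Phase 2, logic: re-run all rules until no instantaneous merge adds an item.
    loop do
      size_before = @table.size
      instance_exec(&@rules)
      break if @table.size == size_before
    end

    # Phase 3, transition: apply deferred operations, then hand off async output.
    @table.subtract(@pending_del)
    @table.merge(@pending_ins)
    @budtime += 1
    @async.to_a
  end

  private

  def channel
    @channel
  end

  def merge_now(items)    # table <= items
    @table.merge(items)
  end

  def merge_later(items)  # table <+ items
    @pending_ins.merge(items)
  end

  def delete_later(items) # table <- items
    @pending_del.merge(items)
  end

  def send_async(items)   # some_channel <~ items
    @async.merge(items)
  end
end

node = MiniNode.new do
  merge_now(channel.map { |msg| [:now, msg] })      # visible in this timestep
  merge_later(channel.map { |msg| [:later, msg] })  # visible next timestep
  send_async(table.map { |tag, msg| "#{tag} #{msg}" })
end

p node.tick(["m1"])  # => ["now m1"]
p node.tick([])      # => ["now m1", "later m1"]
```

The `[:later, "m1"]` item only shows up in the second timestep's output, mirroring how deferred merges become visible in phase 2 of the next timestep.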
+
+ ## Atomicity: Timesteps and Deferred Operators ##
+
+ The only instantaneous Bloom operator is a merge (`<=`), which can only introduce additional items into a collection; it cannot delete or change existing items. As a result, all state within a Bloom timestep is *immutable*: once an item is in a collection at timestep *T*, it stays in that collection throughout timestep *T*.
+
+ To get atomic state change in Bloom, you exploit the combination of two language features:
+
+ 1. the immutability of state in a single timestep, and
+ 2. the uninterrupted sequencing of consecutive timesteps.
+
+ State "update" is achieved in Bloom via a pair of deferred statements, one positive and one negative, like so:
+
+     buffer <+ [[1, "newval"]]                   # insert the new entry (deferred)
+     buffer <- buffer {|b| b if b.key == 1}      # delete the current entry for key 1 (deferred)
+
+ This atomically replaces the entry for key 1 with the value "newval" at the start of the next timestep.
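The same idiom can be modeled in plain Ruby, assuming a table represented as a set of `[key, val]` pairs: both statements read one snapshot during timestep *T*, and their effects are applied together when moving to *T+1*, so within this model no reader sees both values or neither. The `next_timestep` helper is hypothetical and applies the deletion before the insertion; it illustrates the idea rather than reproducing Bud's internals.

```ruby
require 'set'

# Hypothetical model of the deferred-update idiom on a [key, val] table.
# Statements in timestep T all read the same snapshot of `buffer`; the
# pending insert and delete are applied together at the transition to T+1.
# (This sketch applies the deletion before the insertion.)
def next_timestep(buffer, pending_insert, pending_delete)
  (buffer - pending_delete) | pending_insert
end

buffer = Set[[1, "oldval"], [2, "other"]]

# Timestep T: both "statements" are evaluated against the unchanged buffer.
pending_insert = Set[[1, "newval"]]                              # buffer <+ [[1, "newval"]]
pending_delete = buffer.select { |key, _val| key == 1 }.to_set   # buffer <- buffer {...}
during_t = buffer.select { |key, _val| key == 1 }.to_a           # a reader in timestep T

buffer = next_timestep(buffer, pending_insert, pending_delete)   # timestep T+1
p during_t                                       # => [[1, "oldval"]]
p buffer.select { |key, _val| key == 1 }.to_a    # => [[1, "newval"]]
```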
+
+ Any reasoning about atomicity in Bloom programs is built on this simple foundation. It's really all you need. In the bud-sandbox we show how to build more powerful atomicity constructs using it, including things like enforcing [message ordering across timesteps](https://github.com/bloom-lang/bud-sandbox/tree/master/ordering), and protocols for [agreeing on ordering of distributed updates](https://github.com/bloom-lang/bud-sandbox/tree/master/paxos) across all nodes.
+
+ ## Recursion in Bloom ##
+ Because Bloom is data-driven rather than call-stack-driven, recursion may feel a bit unfamiliar at first.
+
+ Have a look at the following classic "transitive closure" example, which computes multi-hop paths in a graph based on a collection of one-hop links:
+
+     state do
+       table :link, [:from, :to, :cost]
+       table :path, [:from, :to, :cost]
+     end
+
+     bloom :make_paths do
+       # base case: every link is a path
+       path <= link {|e| [e.from, e.to, e.cost]}
+
+       # recurse: path of length n+1 made by a link to a path of length n
+       path <= (link*path).pairs(:to => :from) do |l,p|
+         [l.from, p.to, l.cost+p.cost]
+       end
+     end
+
+ The recursion in the second Bloom statement is easy to see: the lhs and rhs both contain the path collection, so path is defined in terms of itself.
+
+ You can think of this as being computed by reevaluating the bloom block over and over, within a single timestep, until no more new paths are found. In each iteration, we find new paths that are one hop longer than the longest paths found previously. When no new items are found in an iteration, we are at what's called a *fixpoint*.
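The iterate-until-nothing-new evaluation described above can be sketched in plain Ruby. This is a naive fixpoint loop over sets of `[from, to, cost]` tuples, not Bud's evaluator, and the link data is made up. On a graph with a cycle, costs would grow without bound and this loop would never terminate, so the sample graph is deliberately acyclic.

```ruby
require 'set'

# Naive fixpoint evaluation of the make_paths rules in plain Ruby.
# Tuples are [from, to, cost]; the (acyclic) link data is hypothetical.
def make_paths(link)
  path = Set.new
  rounds = 0
  loop do
    rounds += 1
    derived = Set.new
    # base case: every link is a path
    link.each { |from, to, cost| derived << [from, to, cost] }
    # recurse: a link followed by a path (join link.to == path.from)
    link.each do |l_from, l_to, l_cost|
      path.each do |p_from, p_to, p_cost|
        derived << [l_from, p_to, l_cost + p_cost] if l_to == p_from
      end
    end
    break if derived.subset?(path)   # nothing new: fixpoint reached
    path.merge(derived)
  end
  [path, rounds]
end

link = Set[["a", "b", 1], ["b", "c", 2], ["c", "d", 3]]
path, rounds = make_paths(link)
p path.size  # => 6
p rounds     # => 4 (three productive rounds, then one that finds nothing new)
```

Each productive round adds paths exactly one hop longer than before (1-hop, then 2-hop, then the 3-hop path `a -> d` with cost 6).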
+
+ Hopefully that description is fairly easy to understand. You can certainly construct more complicated examples of recursion, just as you can in a traditional language (e.g., mutual recursion). But understanding this example of simple recursion is probably sufficient for most needs.
+
+ ## Non-monotonicity and Strata ##
+
+ Consider augmenting the previous path-finding program to compute only the "highest-cost" paths between each source and destination, and print them out. We can do this by adding another statement to the above:
+
+     bloom :print_highest do
+       stdio <~ path.argmax([:from, :to], :cost)
+     end
+
+ The `argmax` expression in the rhs of this statement finds the items in path that have the maximum cost for each `[from, to]` pair.
+
+ It's interesting to think about how to evaluate this statement. Consider what happens after a single iteration of the path-finding logic listed above. We will have 1-hop paths between some pairs. But there will likely be multi-hop paths between those pairs that cost more. So it would be premature after a single iteration to put anything out on stdio. In fact, we can't be sure what should go out to stdio until we have hit a fixpoint with respect to the path collection. That's because `argmax` is a logically *non-monotonic* operator: as we merge more items into its input collection, it may have to "retract" an output it would previously have produced.
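A small plain-Ruby experiment makes the retraction concrete. The `argmax` helper below is a hypothetical stand-in that keeps, for each `[from, to]` pair, the tuples with the highest cost, and the path data is made up. Adding one more path tuple, as a later iteration would, removes an answer the earlier input produced; a monotonic operator such as a plain filter never does that.

```ruby
# Hypothetical argmax over [from, to, cost] tuples, grouped by [from, to].
def argmax(tuples)
  groups = tuples.group_by { |from, to, _cost| [from, to] }
  groups.values.flat_map do |group|
    best = group.map { |_from, _to, cost| cost }.max
    group.select { |_from, _to, cost| cost == best }
  end
end

after_first_iteration = [["a", "b", 1], ["b", "d", 5], ["a", "d", 4]]  # 1-hop paths
at_fixpoint = after_first_iteration + [["a", "d", 6]]                  # a->b->d found later

early = argmax(after_first_iteration)
final = argmax(at_fixpoint)
p(early - final)  # => [["a", "d", 4]]  (an output that must be retracted)

# A monotonic operator never retracts: its earlier output stays in its later output.
cheap = ->(tuples) { tuples.select { |_from, _to, cost| cost < 5 } }
p(cheap.(after_first_iteration) - cheap.(at_fixpoint))  # => []
```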
+
+ The Bud runtime takes care of this problem for you under the covers, by breaking your statements into *strata* (layers) via a process called *stratification*. The basic idea is simple: postpone evaluating non-monotonic operators until fixpoint is reached on their input collections. Stratification breaks up the statements in a Bloom program into layers that are separated by non-monotonic operators, and evaluates the layers in order.
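The layering idea can be sketched as a small fixed-point computation over rule dependencies. This is a textbook-style stratification pass in plain Ruby, not Bud's own analysis. Each dependency `[head, body, nonmonotonic]` means `head` is derived from `body`; a head must sit in a stratum at least as high as its body, and strictly higher when the dependency passes through a non-monotonic operator such as `argmax`. If strata keep rising past the number of collections, there must be a cycle through a non-monotonic dependency, and the function returns `nil`.

```ruby
# Hypothetical stratification pass over rule dependencies (not Bud's code).
# deps: [head, body, nonmonotonic] means "head is derived from body".
def stratify(deps)
  names = deps.flat_map { |head, body, _| [head, body] }.uniq
  stratum = names.map { |name| [name, 0] }.to_h
  changed = true
  while changed
    changed = false
    deps.each do |head, body, nonmonotonic|
      needed = stratum[body] + (nonmonotonic ? 1 : 0)
      next if stratum[head] >= needed
      # No valid layering can need more strata than there are collections.
      return nil if needed > names.size
      stratum[head] = needed
      changed = true
    end
  end
  stratum
end

path_program = [
  [:path, :link, false],  # path <= link {...}
  [:path, :path, false],  # path <= (link*path).pairs(...)
  [:stdio, :path, true]   # stdio <~ path.argmax(...)
]
p stratify(path_program)  # link and path in stratum 0, stdio in stratum 1
```

An evaluator would then run stratum 0 to fixpoint before evaluating anything in stratum 1.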
+
+ For your reference, the basic non-monotonic Bloom operators include `group`, `reduce`, `argmin`, and `argmax`. Also, statements that embed Ruby collection methods in their blocks are often non-monotonic, e.g., methods like `all?`, `empty?`, `include?`, `none?`, and `size`.
+
+ Note that it is possible to write a program in Bloom that is *unstratifiable*: there is no way to separate it into layers like this. This arises when some collection is recursively defined in terms of itself, and there is a non-monotonic method along the recursive dependency chain. A simple example of this is as follows:
+
+     glass <= one_item {|t| ['full'] if glass.empty? }
+
+ Consider the case where we start out with glass being empty. Then we know the fact `glass.empty?`, and the bloom statement says that `(glass.empty? => not glass.empty?)`. Combined with the fact `glass.empty?`, this yields `(glass.empty? and not glass.empty?)`, which is a contradiction. The Bud runtime detects cycles through non-monotonicity for you automatically when you instantiate your class.
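To see which collections are involved, a checker can search the dependency graph for a cycle that passes through a non-monotonic edge. The plain-Ruby sketch below does this with a depth-first search; the dependency-list format and the function names are hypothetical, and this is not how the Bud runtime necessarily performs its check.

```ruby
# Hypothetical detector for cycles through non-monotonic dependencies.
# deps: [head, body, nonmonotonic] means "head is derived from body",
# i.e., an edge body -> head in the dependency graph.
def find_path(graph, from, to, visited = [])
  return [from] if from == to
  return nil if visited.include?(from)
  graph.fetch(from, []).each do |succ|
    rest = find_path(graph, succ, to, visited + [from])
    return [from] + rest if rest
  end
  nil
end

def nonmonotonic_cycle(deps)
  graph = Hash.new { |hash, key| hash[key] = [] }
  deps.each { |head, body, _| graph[body] << head }
  deps.each do |head, body, nonmonotonic|
    next unless nonmonotonic
    # body -> head is non-monotonic; a path head ->* body closes the cycle.
    back = find_path(graph, head, body)
    return [body] + back if back
  end
  nil
end

glass_program = [
  [:glass, :one_item, false],  # glass <= one_item {...}
  [:glass, :glass, true]       # ... if glass.empty?  (non-monotonic)
]
p nonmonotonic_cycle(glass_program)  # => [:glass, :glass]
```

For the path-finding program with `print_highest`, the same check returns `nil`, because the non-monotonic `argmax` edge into `stdio` never leads back to `path`.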