tupelo 0.16 → 0.17
- checksums.yaml +4 -4
- data/README.md +21 -460
- data/bin/tup +11 -2
- data/example/chat/chat-nohistory.rb +1 -3
- data/example/chat/chat.rb +19 -8
- data/example/consistent-hash.rb +73 -0
- data/example/map-reduce/prime-factor-balanced.rb +54 -10
- data/example/multi-tier/kvspace.rb +1 -1
- data/example/multi-tier/memo2.rb +5 -8
- data/example/multi-tier/multi-sinatras.rb +1 -1
- data/example/subspaces/addr-book.rb +18 -27
- data/example/subspaces/pubsub.rb +2 -14
- data/example/subspaces/ramp.rb +17 -24
- data/example/subspaces/shop/shop-v2.rb +5 -8
- data/example/subspaces/simple.rb +1 -9
- data/example/{fish.rb → wip/fish.rb} +4 -0
- data/example/{fish0.rb → wip/fish1.rb} +0 -0
- data/example/wip/fish2.rb +59 -0
- data/lib/tupelo/app/builder.rb +2 -0
- data/lib/tupelo/client/subspace.rb +36 -0
- data/lib/tupelo/client/transaction.rb +5 -2
- data/lib/tupelo/client/tuplespace.rb +2 -2
- data/lib/tupelo/client/worker.rb +13 -12
- data/lib/tupelo/client.rb +1 -32
- data/lib/tupelo/util/bin-circle.rb +123 -0
- data/lib/tupelo/version.rb +1 -1
- metadata +115 -111
- data/example/map-reduce/mr.rb +0 -61
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: c5703dd8bef3eca3c30fc208f1ff5fe291b19b4a
+  data.tar.gz: c84e02596fc16e71dce360a8ba67b7033194d15b
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: f6e2afa9ccf4cea0e12497ae4b8d98d169d2b8b6b46dcf92ff13d46f44d111080331fc80aa637f9234ef41d3727cd7e591d7e861bf2492bce5fda9ab258e6e32
+  data.tar.gz: afb4669e2bdf49294eda7423b250ea62f0e4efedd7f9ea250658ee531e76871d27d027b456492680d0fa920ffd9d5045fb6f7df569bdc52945f7677b86ffb325
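The SHA1 and SHA512 entries in this hunk are hex digests of the gem's `metadata.gz` and `data.tar.gz` members. As a minimal sketch, assuming both files have already been extracted from the `.gem` archive, the digests could be recomputed with Ruby's standard `digest` library and compared by eye against the values in the diff. The helper name `gem_digests` is hypothetical and is not part of RubyGems or tupelo.

```ruby
require 'digest/sha1'
require 'digest/sha2'

# Hypothetical helper: computes the two hex digests that checksums.yaml
# records for one extracted gem member (for example "metadata.gz").
def gem_digests(path)
  {
    "SHA1"   => Digest::SHA1.file(path).hexdigest,
    "SHA512" => Digest::SHA512.file(path).hexdigest,
  }
end

# Usage (the path is an assumption; point it at the unpacked file):
#   gem_digests("metadata.gz").each { |algo, hex| puts "#{algo}: #{hex}" }
```

Reading the file through `Digest::SHA1.file` avoids loading the whole archive into memory, which matters little for a gem this size but costs nothing.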
data/README.md
CHANGED
@@ -1,34 +1,32 @@
-
+Tupelo
 ==
 
-A tuplespace that is fast, scalable, and language agnostic. It is designed for distribution of both computation and storage, in a unified language that has both transactional and tuple-operation (read/write/take) semantics.
+A tuplespace that is fast, scalable, and language agnostic. It is designed for distribution of both computation and storage (disk and memory), in a unified language that has both transactional and tuple-operation (read/write/take) semantics.
 
-
-This is the reference implementation in ruby. It should be able to communicate with implementations in other languages. Planned implementation languages include C, Python, and Go.
-
-Tupelo differs from other spaces in several ways:
-
-* minimal central storage: the only state in the server is a counter and socket connections
-
-* minimal central computation: just counter increment, message dispatch, and connection management (and it never unpacks serialized tuples)
-
-* clients do all the tuple work: registering and checking waiters, matching, searching, notifying, storing, inserting, deleting, persisting, etc. Each client is free to to decide how to do these things (application code is insulated from this, however). Special-purpose clients (known as *tuplets*) may use specialized algorithms and stores for the subspaces they manage.
-
-* transactions, in addition to the classic operators (and transactions execute client-side, reducing bottleneck and increasing expressiveness).
-
-* replication is inherent in the design (in fact it is unavoidable), for better or worse.
+This is the reference implementation in ruby. It should be able to communicate with implementations in other languages.
 
 Documentation
 ============
 
+* [Tutorial](doc/tutorial.md)
 * [FAQ](doc/faq.md)
+* [Comparisons](doc/compare.md)
+* [Transactions](doc/transactions.md)
 * [Subspaces](doc/subspace.md)
-* [
+* [Examples](example/)
+
+Internals
+---------
+* [Architecture and protocol](doc/arch.md)
+
+Talk
+----
+* [Abstract](sfdc.md) and [slides](doc/sfdc.pdf) for San Francisco Distributed Computing meetup
 
 Getting started
 ==========
 
-1. Install ruby 2 (not 1.9) from http://ruby-lang.org. Examples and tests will not work on windows (they use fork and unix sockets) or jruby, though probably the underying libs will (using tcp sockets).
+1. Install ruby 2.0 or 2.1 (not 1.9) from http://ruby-lang.org. Examples and tests will not work on windows (they use fork and unix sockets) or jruby, though probably the underying libs will (using tcp sockets).
 
 2. Install the gem and its dependencies (you may need to `sudo` this):
 
@@ -43,327 +41,14 @@ Getting started
 >> t [nil, nil]
 => ["hello", "world"]
 
-
-
-Write one or more tuples (and wait for the transaction to be recorded in the local space):
-
-w <tuple>,...
-write_wait <tuple>,...
-
-Write without waiting:
-
-write <tuple>,...
-
-Write and then wait, under user control:
-
-write(...).wait
-
-Pulse a tuple or several (write but immediately delete it, like pubsub):
-
-pl <tuple>,...
-pulse_wait ...
-
-Pulse without waiting:
-
-pulse_nowait <tuple>,...
-
-Read tuple matching a template, waiting for a match to exist:
-
-r <template>
-read <template>
-read_wait <template>
-
-Read tuple matching a template and return it, without waiting for a match to exist (returning nil in that case):
-
-read_nowait <template>
-
-Note that neither #read nor #read_nowait wait for any previously issued writes to complete. The difference is that #read waits for a match to exist and #read_nowait does not. Compare:
-
-write [1]; read_nowait [1] # ==> nil, probably
-write [2]; read [2] # ==> [2]
-
-Read all tuples matching a template, no waiting (like #read_nowait):
-
-ra <template>
-read_all <template>
-
-If the template is omitted, reads everything (careful, you get what you ask for!). The template can be a standard template as discussed below or anything with a #=== method. Hence
-
-ra Hash
-
-reads all hash tuples (and ignores array tuples), and
-
-ra proc {|t| t.size==2}
-
-reads all 2-tuples.
-
-Read tuples in a stream, both existing and as they arrive:
-
-read <template> do |tuple| ... end
-read do |tuple| ... end # match any tuple
-
-Take a tuple matching a template:
-
-t <template>
-take <template>
-
-Take a tuple matching a template and optimistically use the local value before the transaction is complete:
-
-x_final = take <template> do |x_optimistic|
-...
-end
-
-There is no guarantee that `x_final == x_optimistic`. The block may execute more than once.
-
-Take a tuple matching a template, but only if a local match exists (otherwise return nil):
-
-take_nowait <template>
-
-x_final = take_nowait <template> do |x_optimistic|
-...
-end
-
-Note that a local match is still not a guarantee of `x_final == x_optimistic`. Another process may take `x_optimistic` first, and the take will be re-executed. (Think of #take_nowait as a way of saying "take a match, but don't bother trying if there is no match known at this time.") Similarly, #take_nowait returning nil is not a guarantee that a match does not exist: another process could have written a match later than the time of the local search.
-
-Perform a general transaction:
-
-result =
-transaction do |t|
-rval = t.read ... # optimistic value
-t.write ...
-t.pulse ...
-tval = t.take ... # optimistic value
-[rval, tval] # pass out result
-end
-
-Note that the block may execute more than once, if there is competition for the tuples that you are trying to #take or #read. When the block exits, however, the transaction is final and universally accepted by all clients.
-
-Tuples written or taken during a transaction affect subsequent operations in the transaction without modifying the tuplespace or affecting other concurrent transactions (until the transaction completes):
-
-transaction do |t|
-t.write [3]
-p t.read [3] # => 3
-p read_all # => [] # note read_all called on client, not trans.
-t.take [3]
-p t.read_nowait [3] # => nil
-end
-
-Be careful about context within the do...end. If you omit the `|t|` block argument, then all operations are automatically scoped to the transaction, rather than the client. The following is equivalent to the previous example:
-
-client = self # local var that we can use inside the block
-transaction do
-write [3]
-p read [3]
-p client.read_all
-take [3]
-p read_nowait [3]
-end
-
-You can timeout a transaction:
-
-transaction timeout: 1 do
-read ["does not exist"]
-end
-
-This uses tupelo's internal lightweight scheduler, rather than ruby's heavyweight (one thread per timeout) Timeout, though the latter works with tupelo as well.
-
-You can also abort a transaction while inside it by calling `#abort` on it:
-
-write [1]
-transaction {take [1]; abort}
-read_all # => [[1]]
-
-Another thread can abort a transaction in progress (to the extent possible) by calling `#cancel` on it. See [example/cancel.rb](example/cancel.rb).
-
-4. Run tup with a server file so that two sessions can interact. Do this in two terminals in the same dir:
-
-$ tup sv
-
-(The 'sv' argument names a file that the first instance of tup uses to store information like socket addresses and the second instance uses to connect. The first instance starts the servers as child processes. However, both instances appear in the terminal as interactive shells.)
-
-To do this on two hosts, copy the sv file and, if necessary, edit its connect_host field. You can even do this:
-
-host1$ tup sv tcp localhost
-
-host2$ tup host1:path/to/sv --tunnel
-
-
-5. Look at the examples. You may need to dig a bit to find the gem installation. For example:
-
-ls -d /usr/local/lib/ruby/gems/*/gems/tupelo*
-
-Note that all bin and example programs accept blob type (e.g., --msgpack, --json) on command line (it only needs to be specified for server -- the clients discover it). Also, all these programs accept log level on command line. The default is --warn. The --info level is a good way to get an idea of what is happening, without the verbosity of --debug.
-
-6. Debugging: in addition to the --info switch on all bin and example programs, bin/tspy is also really useful; it shows all tuplespace events in sequence that they occur. For example, run
-
-$ tspy sv
-
-in another terminal after running `tup sv`. The output shows the clock tick, sending client, operation, and operation status (success or failure).
-
-There is also the similar --trace switch that is available to all bin and example programs. This turns on diagnostic output for each transaction. For example:
-
-```
-tick cid status operation
-1 2 write ["x", 1]
-2 2 write ["y", 2]
-3 3 take ["x", 1], ["y", 2]
-```
-
-The `Tupelo.application` command, provided by `tupelo/app`, is the source of all these options and is available to your programs. It's a kind of lightweight process deployment and control framework; however `Tupelo.application` is not necessary to use tupelo.
-
-
-What is a tuplespace?
-=====================
-
-A tuplespace is a service for coordination, configuration, and control of concurrent and distributed systems. The model it provides to processes is a shared space that they can use to communicate in a deterministic and sequential manner. (Deterministic in that all clients see the same, consistent view of the data.) The space contains tuples. The operations on the space are few, but powerful. It's not a database, but it might be a front-end for one or more databases.
-
-See https://en.wikipedia.org/wiki/Tuple_space for general information and history. This project is strongly influenced by Masatoshi Seki's Rinda implementation, part of the Ruby standard library. See http://pragprog.com/book/sidruby/the-druby-book for a good introduction to rinda and druby.
-
-See http://dbmsmusings.blogspot.com/2010/08/problems-with-acid-and-how-to-fix-them.html for an explanation of the importance of determinism in distributed transaction systems.
-
-What is a tuple?
-----------------
-
-A tuple is the unit of information in a tuplespace. It is immutable in the context of the tuplespace -- you can write a tuple into the space and you can read or take one from the space, but you cannot update a tuple within a space. A tuple does not have an identity other than the data it contains. A tuplespace can contain multiple copies of the same tuple. (In the ruby client, two tuples are considered the same when they are #==.)
-
-A tuple is either an array:
-
-["hello", 7]
-[nil, true, false]
-["foo", 3.2, [6,5,4], {"bar" => 3}]
-
-... or a hash:
-
-{name: "Myrtle", location: [100,200]}
-{ [1,2] => 3, [5,7] => 12 }
-
-In other words, a tuple is a fairly general object, though this depends on the serializer--see below. More or less, a tuple is anything that can be built out of:
-
-* strings
-
-* numbers
-
-* nil, true, false
-
-* arrays
-
-* hashes
-
-It's kind of like a "JSON object", except that, when using the json serializer, the hash keys can only be strings. In the msgpack case, keys have no special limitations. In the case of the marshal and yaml modes, tuples can contain many other kinds of objects.
-
-The empty tuples `[]` and `{}` are allowed, but bare values such as `3.14` or `false` are not tuples by themselves.
-
-One other thing to keep in mind: in the array case, the order of the elements is significant. In the hash case, the order is not significant. So these are both true:
-
-[1,2] != [2,1]
-{a:1, b:2} == {b:2, a:1}
-
-
-What is a template?
--------------------
-
-A template an object that matches (or does not match) tuples. It's used for querying a tuplespace. Typically, a template looks just like a tuple, but possibly with wildcards of some sort. The template:
-
-[3..5, Integer, /foo/, nil]
-
-would match the tuple:
-
-[4, 7, "foobar", "xyz"]
-
-but not these tuples:
-
-[6, 7, "foobar", "xyz"]
-[3, 7.2, "foobar", "xyz"]
-[3, 7, "fobar", "xyz"]
-
-The nil wildcard matches anything. The Range, Regexp, and Class entries function as wildcards because of the way they define the #=== (match) method. See ruby docs for general information on "threequals" matching.
-
-Every tuple can also be used as a template. The template:
-
-[4, 7, "foobar", "xyz"]
-
-matches itself.
+4. Take a look at the [FAQ](doc/faq.md), [tutorial](doc/tutorial.md), and many (examples)(example/).
 
-Here's a template for matching some hash tuples:
-
-{name: String, location: "home"}
-
-This would match all tuples whose keys are "name" and "location" and whose values for those keys are any string and the string "home", respectively.
-
-A template doesn't have to be a tuple pattern with wildcards, though. It can be anything with a #=== method. For example:
-
-read_all proc {|t| some_predicate(t)}
-read_all Hash
-read_all Array
-read_all Object
-
-An optional library, `tupelo/util/boolean`, provides a #match_any method to construct the boolean `or` of other templates:
-
-read_all match_any( [1,2,3], {foo: "bar"} )
-
-Unlike in some tuplespace implementations, templates are a client-side concept (except for subspace-defining templates), which is a source of efficiency and scalability. Matching operations (which can be computationally heavy) are performed on the client, rather than on the server, which would bottleneck the whole system.
-
-What are the operations on tuples?
---------------------
-
-* read - search the space for matching tuples, waiting if none found
-
-* write - insert the tuple into the space
-
-* take - search the space for matching tuples, waiting if none found, removing the tuple if found
-
-* pulse - write and take the tuple; readers see it, but it cannot be taken by other client, and it cannot be read later (this is not a classical tuplespace operation, but is useful for publish-subscribe communication patterns)
-
-These operations have a few variations (wait vs nowait) and options (timeouts).
-
-Transactions and optimistic concurrency
---------------------
-
-Transactions combine operations into a group that take effect at the same instant in (logical) time, isolated from other transactions.
-
-However, it may take some time to prepare the transaction. This is true in terms of both real time (clock and process) and logical time (global sequence of operations). Preparing a transaction means finding tuples that match the criteria of the read and take operations. Finding tuples may require searching (locally) for tuples, or waiting for new tuples to be written by others. Also, the transaction may fail even after matching tuples are found (when another process takes tuples of interest). Then the transaction needs to be prepared again. Once prepared, transaction is sent to all clients, where it may either succeed (in all clients) or fail (for the same reason as before--someone else grabbed one of our tuples). If it fails, then the preparation begins again. A transaction guarantees that, when it completes, all the operations were performed on the tuples at the same logical time. It does not guarantee that the world stands still while one process is inside the `transaction {...}` block.
-
-Transactions are not just about batching up operations into a more efficient package. A transaction makes the combined operations execute atomically: the transaction finishes only when all of its operations can be successfully performed. Writes and pulses can always succeed, but takes and reads only succeed if the tuples exist.
-
-Transactions give you a means of optimistic locking: the transaction proceeds in a way that depends on preconditions. See [example/increment.rb](example/increment.rb) for a very simple example. Not only can you make a transaction depend on the existence of a tuple, you can make the effect of the transaction a function of existing tuples (see [example/transaction-logic.rb](example/transaction-logic.rb) and [example/broker-optimistic.rb](example/broker-optimistic.rb)).
-
-If you prefer classical tuplespace locking, you can simply use certain tuples as locks, using take/write to lock/unlock them. See the examples, such as [example/broker-locking.rb](example/broker-locking.rb). If you have a lot of contention and want to avoid the thundering herd, see [example/lock-mgr-with-queue.rb](example/lock-mgr-with-queue.rb).
-
-If an optimistic transaction fails (for example, it is trying to take a tuple, but the tuple has just been taken by another transaction), then the transaction block is re-executed, possibly waiting for new matches to the templates. Application code must be aware of the possible re-execution of the block. This is better explained in the examples...
-
-Transactions have a significant disadvantage compared to using take/write to lock/unlock tuples: a transaction can protect only resources that are represented in the tuplespace, whereas a lock can protect anything: a file, a device, a service, etc. This is because a transaction begins and ends within a single instant of logical (tuplespace) time, whereas a lock tuple can be taken out for an arbitrary duration of real (and logical) time. Furthermore, the instant of logical time in which a transaction takes effect may occur at different wall-clock times on different processes, even on the same host.
-
-Transactions do have an advantage over using take/write to lock/unlock tuples: there is no possibility of deadlock. See [example/deadlock.rb](example/deadlock.rb) and [example/parallel.rb](example/parallel.rb).
-
-Another advantage of tranactions is that it is possible to guarantee continuous existence of a time-series of tuples. For example, suppose that tuples matching `{step: Numeric}` indicate the progress of some activity. With transactions, you can guarantee that there is exactly one matching tuple at any time, and that no client ever sees in intermediate or inconsistent state of the counter:
-
-transaction do
-step = take(step: nil)["step"]
-write step: step + 1
-end
-
-Any client which reads this template will find a (unique) match without blocking.
-
-Another use of transactions: forcing a retry when something changes:
-
-transaction do
-step = read(step: nil)["step"]
-take value: nil, step: step
-end
-
-This code waits on the existence of a value, but retries if the step changes while waiting. See example/pregel/distributed.rb for a use of this techinique.
-
-Tupelo transactions are ACID in the following sense. They are Atomic and Isolated -- this is enforced by the transaction processing in each client. Consistency is enforced by the underlying message sequencer: each client's copy of the space is the deterministic result of the same sequence of operations. This is also known as [sequential consistency] (https://en.wikipedia.org/wiki/Sequential_consistency). Durability is optional, but can be provided by the persistent archiver or other clients.
-
-On the CAP spectrum, tupelo tends towards consistency: for all clients, write and take operations are applied in the same order, so the state of the entire system up through a given tick of discrete time is universally agreed upon. This is known as [state machine replication] (http://en.wikipedia.org/wiki/State%20machine%20replication). Of course, because of the difficulties of distributed systems, one client may not yet have seen the same range of ticks as another. Tupelo's replication model (especially in the use of subspaces) can also be described as [virtual synchrony](https://en.wikipedia.org/wiki/Virtual_synchrony).
-
-Tupelo transactions do not require two-phase commit, because they are less powerful than general transactions. Each client has enough information to decide (in the same way as all other clients) whether the transaction succeeds or fails. This has performance advantages, but imposes some limitations on transactions over subspaces that are known to one client but not another. [Subspaces](doc/subspace.md).
 
+Applications
+=======
 
-
-======
+Tupelo is a flexible base layer for various distributed programming paradigms: job queues, dataflow, map-reduce, etc.
 
-You can use tupelo with a simplified syntax, like a "domain-specific language". Each construct with a block can be used in either of two forms, with an explicit block param or without. Compare [example/add-dsl.rb](example/add-dsl.rb) and [example/add.rb](example/add.rb).
 
 
 Advantages
@@ -414,137 +99,13 @@ Future
 
 - Investigate nio4r for faster networking, especially with many clients.
 
-- Interoperable client and server implementations in C, Python, Go,
+- Interoperable client and server implementations in C, Python, Go, Elixir?
 
 - UDP multicast to further reduce the bottleneck in the message sequencer.
 
 - Tupelo as a service; specialized and replicated subspace managers as services.
 
 
-Comparisons
-===========
-
-Redis
------
-
-Unlike redis, computations are not a centralized bottleneck. Set intersection, for example.
-
-Pushing data to client eliminates need for polling, makes reads faster.
-
-Tupelo's pulse/read ops are like pubsub in redis.
-
-However, tupelo is not a substitute for the caching functionality of redis and memcache.
-
-
-Rinda
------
-
-Very similar api.
-
-Rinda has a severe bottleneck, though: all matching, waiting, etc. are performed in one process.
-
-Rinda is rpc-based, which is slower and also more vulnerable due to the extra client-server state; tupelo is imlemented on a message layer, rather than rpc. This also helps with pipelined writes.
-
-Tupelo also supports custom classes in tuples, but only with marshal / yaml; must define #==; see [example/custom-class.rb](example/custom-class.rb)
-
-Both: tuples can be arrays or hashes.
-
-Spaces have an advantage over distributed hash tables: different clients may acccess tuples in terms of different dimensions. For example, a producer generates [producer_id, value]; a consumer looks for [nil, SomeParticularValues]. Separation of concerns, decoupling in the data space.
-
-
-To compare
-----------
-
-* beanstalkd
-
-* resque
-
-* zookeeper -- totally ordered updates; tupelo trades availability for lower latency (?)
-
-* chubby
-
-* doozer, etcd
-
-* serf -- tupelo has lower latency and is transactional, but at a cost compared to serf; tupelo semantics is closer to databases
-
-* arakoon
-
-* hazelcast
-
-* lmax -- minimal spof
-
-* datomic -- similar distribution of "facts", but not tuplespace; similar use of pluggable storage managers
-
-* job queues: sidekiq, resque, delayedjob, http://queues.io, https://github.com/factual/skuld
-
-* pubsubs: kafka
-
-* spark, storm
-
-* tibco and gigaspace
-
-* gridgain
-
-
|
489
|
-
Architecture
|
490
|
-
============
|
491
|
-
|
492
|
-
Two central processes:
|
493
|
-
|
494
|
-
* message sequencer -- assigns unique increasing IDs to each message (a message is essentially a transaction containing operations on the tuplespace). This is the key to the whole design. By sequencing all transactions in a way that all clients agree with, the transactions can be applied (or rejected) by all clients without further negotiation.
|
495
|
-
|
496
|
-
* client sequencer -- assigns unique increasing IDs to clients when they join the distributed system
|
497
|
-
|
498
|
-
Specialized clients:
|
499
|
-
|
500
|
-
* archiver -- dumps tuplespace state to clients joining the system later than t=0; at least one archiver is required, unless all clients start at t=0.
|
501
|
-
|
502
|
-
* tup -- command line shell for accessing (and creating) tuplespaces
|
503
|
-
|
504
|
-
* tspy -- uses the notification API to watch all events in the space
|
505
|
-
|
506
|
-
* queue / lock / lease managers (see examples)
|
507
|
-
|
508
|
-
General application clients:
|
509
|
-
|
510
|
-
* contain a worker thread and any number of application-level client threads
|
511
|
-
|
512
|
-
* worker thread manages local tuplespace state and requests to modify or access it
|
513
|
-
|
514
|
-
* client threads construct transactions and wait for results (communicating with the worker thread over queues); they may also use asynchronous transactions
|
515
|
-
|
516
|
-
Some design principles:
|
517
|
-
|
518
|
-
* Once a transaction has been sent from a client to the message sequencer, it references only tuples, not templates. This makes it faster and simpler for each receiving client to apply or reject the transaction. Also, clients that do not support local template searching (such as archivers) can store tuples using especially efficient data structures that only support tuple-insert, tuple-delete, and iterate/export operations.
-
-* Use non-blocking protocols. For example, transactions can be evaluated in one client without waiting for information from other clients. Even at the level of reading messages over sockets, tupelo uses (via funl and object-stream) non-blocking constructs. At the application level, you can use transactions to optimistically modify shared state (but applications are free to use locking if high contention demands it).
-
-* Do the hard work on the client side. For example, all pattern matching happens in the client that requested an operation that has a template argument, not on the server or other clients.
-
-Protocol
---------- 
-
-Nothing in the protocol specifies local searching or storage, or matching, or notification, or templating. That's all up to each client. The protocol only contains tuples and operations on them (take, write, pulse, read), combined into transactions.
-
-The protocol has two layers. The outer (message) layer is 6 fields, managed by the funl gem, using msgpack for serialization. All socket reads are non-blocking (using msgpack's stream mode), so a slow sender will not block other activity in the system.
-
-One of those 6 fields is a data blob, containing the actual transaction and tuple information. The inner (blob) layer manages that field using msgpack (by default), marshal, json, or yaml. This layer contains the transaction operations. The blob is not unpacked by the server, only by clients.
-
-Each inner serialization method ("blobber") has its own advantages and drawbacks:
-
-* marshal is ruby only, but can contain the widest variation of objects
-
-* yaml is portable and humanly readable, and still fairly diverse, but very inefficient
-
-* msgpack and json (yajl) are both relatively efficient (in terms of packet size, as well as parse/emit time)
-
-* msgpack and json support the least diversity of objects (just "JSON objects"), but msgpack also supports hash keys that are objects rather than just strings.
-
-For most purposes, msgpack is a good choice, so it is the default.
-
-The sending client's tupelo library must make sure that there is no aliasing within the list of tuples (this is only an issue for Marshal and YAML, since msgpack and json do not support references).
-
-
 Development
 ===========
 
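Reviewer note on the removed "Protocol" text: the aliasing remark can be reproduced with Ruby's standard library alone. The following is a minimal sketch, not tupelo's code; the helper names (`aliased?`, `marshal_round_trip`, `json_round_trip`, `unalias`) are hypothetical, and msgpack is left out because it is a separate gem.

```ruby
require 'json'
require 'yaml'

# True if the first two elements of the list are the very same object.
def aliased?(list)
  list[0].equal?(list[1])
end

def marshal_round_trip(obj)
  Marshal.load(Marshal.dump(obj))
end

def json_round_trip(obj)
  JSON.parse(JSON.generate(obj))
end

# Hypothetical sender-side fix (an assumption, not tupelo's implementation):
# deep-copy each tuple on its own so that no two slots share an object.
def unalias(tuples)
  tuples.map { |t| marshal_round_trip(t) }
end

shared = ["point", 1, 2]
tuples = [shared, shared]   # both slots refer to one Array

puts aliased?(marshal_round_trip(tuples))           # Marshal keeps the shared reference
puts aliased?(json_round_trip(tuples))              # JSON parses each element separately
puts aliased?(marshal_round_trip(unalias(tuples)))  # copies stay independent
puts YAML.dump(tuples)                              # YAML emits anchor/alias markers
p json_round_trip({1 => "a"}).keys                  # JSON turns hash keys into strings
```

The last line also illustrates the hash-key limitation of json that the removed text contrasts with msgpack.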
data/bin/tup
CHANGED
@@ -49,7 +49,7 @@ if ARGV.delete("-h") or ARGV.delete("--help")
 
   --trace enable trace output
 
-  --tunnel remote clients use ssh tunnels by default
+  --tunnel remote clients use ssh tunnels by default
 
   --pubsub publish/subscribe mode; does not keep local tuple store:
 
@@ -118,6 +118,15 @@ Tupelo.application(
     alias tr transaction
     CMD_ALIASES = %w{ w pl t r ra tr }
     private *CMD_ALIASES
+
+    def help
+      puts "Command aliases:"
+      CMD_ALIASES.each do |m_name|
+        m = method(m_name)
+        printf "%8s -> %s\n", m.name, m.original_name
+      end
+      nil
+    end
   end
 
   client_opts = {}
@@ -134,7 +143,7 @@ Tupelo.application(
   use_subspaces! if use_subspaces
 
   log.info {"cpu time: %.2fs" % Process.times.inject {|s,x|s+x}}
-  log.info {"starting shell.
+  log.info {"starting shell."}
 
   require 'tupelo/app/irb-shell'
   IRB.start_session(self)
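Reviewer note on the new `help` method: it relies on `Method#original_name` to map each alias back to the method it was created from, and on `method` being able to look up private methods from inside the object. A standalone sketch of the same idea, assuming a hypothetical `Shell` class with two made-up commands (not the class used in bin/tup):

```ruby
# Hypothetical stand-in for the shell client class; names are illustrative only.
class Shell
  def write(*tuples); end
  def take(template); end

  alias w write
  alias t take

  CMD_ALIASES = %w{ w t }
  private *CMD_ALIASES

  # Returns [alias_name, original_name] pairs as symbols.
  def alias_table
    CMD_ALIASES.map do |m_name|
      m = method(m_name)            # works for private methods too
      [m.name, m.original_name]
    end
  end
end

Shell.new.alias_table.each do |alias_name, original|
  printf "%8s -> %s\n", alias_name, original
end
```

Returning `nil` from `help`, as the diff does, keeps irb from echoing the return value of `each` after the table is printed.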
data/example/chat/chat-nohistory.rb
CHANGED
@@ -2,8 +2,6 @@
 
 require 'tupelo/app'
 
-sv = "chat-nohistory.yaml"
-
 Thread.abort_on_exception = true
 
 def display_message msg
@@ -13,7 +11,7 @@ def display_message msg
   puts "#{from}@#{time_str}> #{line}"
 end
 
-Tupelo.tcp_application
+Tupelo.tcp_application do
   me = argv.shift
 
   local do
data/example/chat/chat.rb
CHANGED
@@ -1,11 +1,23 @@
-#
-# user name to be shared with other chat clients. New clients see a brief
-# history of the chat, as well as new messages from other clients.
+# Network chat program.
 #
 # You can run several instances of chat.rb. The first will set up all needed
-# services. The rest will connect by referring to
-#
-#
+# services, as well as run the chat shell. The rest will connect by referring to
+# the services specified in a yaml file, and then run the chat shell.
+#
+# Usage:
+#
+# ruby chat.rb chat.yaml username
+#
+# For remote clients, you can copy the yaml file, or use scp syntax:
+#
+# ruby chat.rb host:path/to/chat.yaml username
+#
+# The username is shared with other chat clients. New clients see a brief
+# history of the chat, as well as new messages from other clients.
+#
+# Accepts usual tupelo switches (such as --trace, --debug, --tunnel).
+#
+# If the first instance is run with "--persist-dir <dir>", messages
 # will persist across service shutdown.
 #
 # Compare: https://github.com/bloom-lang/bud/blob/master/examples/chat.
@@ -15,7 +27,6 @@
 
 require 'tupelo/app'
 
-sv = "chat.yaml"
 history_period = 60 # seconds -- discard _my_ messages older than this
 
 Thread.abort_on_exception = true
@@ -27,7 +38,7 @@ def display_message msg
   puts "#{from}@#{time_str}> #{line}"
 end
 
-Tupelo.tcp_application
+Tupelo.tcp_application do
   me = argv.shift
 
   local do