spinoza 0.1 → 0.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 868a9f9ccabcf28a96ace9484c9c0caf0bbacd26
- data.tar.gz: 42d844a448adc075556b5933731bcb90e4a58356
+ metadata.gz: 8864d3908b3001de934f150becf1876972c78809
+ data.tar.gz: 7da59f8200f6b5d501842d950b98d9be440eb4ce
  SHA512:
- metadata.gz: 4c178a4b1cf1e14cf538dec9602c4a25eca2f57a003a8f8db2b2b92fb9b55feb9014979db38caabc6af9ce2f7fc26dfda73b564da454599fb8b054f80132585b
- data.tar.gz: ae5bcbdf391a45950d1d3aa4efc0ac52f10d99d413f0ee96e639c179ecb054a311ae14ab3448d0ecbc22976162e02e22d71a0cbb4088e091aa193decb7a1c86a
+ metadata.gz: f33925d0842233788b242cdde6d0d3c1311dff1d7d2e9349907516fd3ec95ba40b0961786e6f4e00accd3c8a01efe3e9732563758b3c640b34f6a4bb1b0745f7
+ data.tar.gz: 57e8c0075dc16a91180a89955f70c8548c310f10e4d0db444fa39a365c0be659174cb9ed7896bad64065263b7d31471a738d5eb7eb5f4dbad4e6f7364f4f7fd8
data/README.md CHANGED
@@ -1,14 +1,77 @@
  spinoza
  =======
 
- A model of the Calvin distributed database.
+ A model of the Calvin distributed database. The main purpose of this model is expository, rather than analysis of correctness or performance.
 
  Spinoza, like Calvin, was a philosopher who dealt in determinism.
 
- The model of the underlying computer and network system is in lib/spinoza/system.
-
  Calvin is developed by the Yale Databases group; the open-source releases are at https://github.com/yaledb.
 
+
+ Structure
+ =========
+
+ The model of the underlying computer and network system is in [lib/spinoza/system](lib/spinoza/system).
+
+ The Calvin model, implemented on top of the system models, is in [lib/spinoza/calvin](lib/spinoza/calvin). Other distributed transaction models could also be implemented on this layer.
+
+ The transaction class, in [lib/spinoza/transaction.rb](lib/spinoza/transaction.rb), is mostly abstracted from these layers. It is very simplistic, intended to illustrate Calvin's replication and consistency characteristics.
+
+
+ Running
+ =======
+
+ You will need ruby 2.0 or later, from http://ruby-lang.org, and the gems listed in the gemspec:
+
+     sequel
+     sqlite3
+     rbtree
+
+ You can also `gem install spinoza`, but the released gem may not be up to date.
+
+ To run the unit tests:
+
+     rake test
+
+ Examples TBD.
+
+
+ References
+ ==========
+
+ * The Calvin papers:
+
+   * [The Case for Determinism in Database Systems](http://cs-www.cs.yale.edu/homes/dna/papers/determinism-vldb10.pdf)
+
+   * [Consistency Tradeoffs in Modern Distributed Database System Design](http://cs-www.cs.yale.edu/homes/dna/papers/abadi-pacelc.pdf)
+
+   * [Modularity and Scalability in Calvin](http://sites.computer.org/debull/A13june/calvin1.pdf)
+
+   * [Calvin: Fast Distributed Transactions for Partitioned Database Systems](http://www.cs.yale.edu/homes/dna/papers/calvin-sigmod12.pdf)
+
+   * [Lightweight Locking for Main Memory Database Systems](http://cs-www.cs.yale.edu/homes/dna/papers/vll-vldb13.pdf)
+
+
+ To do
+ =====
+
+ * The performance and error modeling should optionally be statistical, with variation drawn from some distribution.
+
+ * Model IO latency and compute time, in addition to the currently modeled network latency.
+
+ * `Log#time_replicated` should be a function of the reading node and depend on the link characteristics between that node and the writing node.
+
+ * Transactions, to be more realistic, should have dataflow dependencies among operations. (But only for non-key values, because Calvin splits dependent transactions.)
+
+ * Transactions also need conditionals, or at least conditional abort, which is needed to support the splitting mentioned above.
+
+ * For comparison, implement a 2-phase commit transaction processor on top of the Spinoza::System classes.
+
+ * Output spacetime diagrams using graphviz.
+
+ * See also 'TODO' in code.
+
+
  Contact
  =======
 
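The Running section above lists the dependencies by gem name only. A minimal Gemfile sketch for working on the model locally might look like the following; it is not part of the gem, and the `rake` entry is an assumption added only so that `rake test` runs:

    # Hypothetical Gemfile mirroring the gems named in the README / gemspec.
    source 'https://rubygems.org'

    gem 'sequel'
    gem 'sqlite3'
    gem 'rbtree'
    gem 'rake'   # assumed, for running the unit tests
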
data/lib/spinoza/calvin/executor.rb ADDED
@@ -0,0 +1,107 @@
+ require 'spinoza/common'
+
+ # Represents the work performed in one thread. The scheduler assigns a sequence
+ # of transactions to each of several executors. The executor handles the
+ # transactions one at a time, in a series of substeps as data is received from
+ # peers. Within an executor, the sequence of transactions and substeps is
+ # totally ordered wrt the global timeline, but the sequences of two Executors
+ # may interleave, which is how Calvin achieves some write concurrency.
+ #
+ # Does not have access to any subsystems except the node's Store and
+ # communication with peer executors via the readcasters.
+ #
+ class Calvin::Executor
+   attr_reader :store
+   attr_reader :readcaster
+   attr_reader :task
+
+   class StateError < StandardError; end
+
+   # Represents the state of executing one transaction in this Executor, in
+   # the case where that execution involves waiting for data from peers.
+   class Task
+     attr_reader :txn
+
+     # Accumulates results as they arrive locally and from peers.
+     attr_accessor :read_results
+
+     # Set of tables the task is waiting for.
+     attr_accessor :remote_read_tables
+
+     def initialize txn, read_results: [], remote_read_tables: Set[]
+       @txn = txn
+       @read_results = read_results
+       @remote_read_tables = remote_read_tables
+     end
+   end
+
+   def initialize store: nil, readcaster: nil
+     @store = store
+     @readcaster = readcaster
+     ready!
+   end
+
+   def ready!
+     @task = nil
+   end
+
+   def ready?
+     @task.nil?
+   end
+
+   def assert_ready?
+     unless ready?
+       raise StateError, "cannot start new task -- already executing #{task}"
+     end
+   end
+
+   # Assumes all locks are held around this call.
+   def execute_transaction txn
+     assert_ready?
+
+     local_read_results = @readcaster.execute_local_reads txn
+     @readcaster.serve_reads txn, local_read_results
+
+     if passive? txn
+       result = local_read_results
+       ready!
+
+     elsif all_reads_are_local? txn
+       result = local_read_results
+       store.execute *txn.all_write_ops
+       ready!
+
+     else
+       @task = Task.new txn,
+         read_results: local_read_results,
+         remote_read_tables: txn.remote_read_tables(store)
+       result = false
+     end
+
+     return result
+   end
+
+   def passive? txn
+     not txn.active? store
+   end
+
+   def all_reads_are_local? txn
+     txn.all_reads_are_local? store
+   end
+
+   # Assumes all locks are held around this call.
+   def recv_remote_reads table, read_results
+     if task.remote_read_tables.include? table
+       task.remote_read_tables.delete table
+       task.read_results.concat read_results
+     # else this is a redundant message for this table, so ignore it
+     end
+
+     return false unless task.remote_read_tables.empty?
+
+     store.execute *task.txn.all_write_ops
+     result = task.read_results
+     ready!
+     result
+   end
+ end
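To make the executor's two-phase protocol concrete, here is a hedged usage sketch of how a caller drives it. The `store`, `readcaster`, `txn`, and `remote_results` objects are placeholders standing in for real Spinoza/Calvin instances, and all locks are assumed to be held by the caller, as the comments above require:

    # Sketch only: mirrors how Calvin::Scheduler drives an Executor.
    ex = Calvin::Executor.new store: store, readcaster: readcaster

    result = ex.execute_transaction txn
    if result
      # Passive txn or all reads local: any writes have been applied and
      # `result` holds the local read results; the executor is ready again.
    else
      # The executor is now waiting on remote tables; feed it peer broadcasts.
      # recv_remote_reads returns false until the last awaited table arrives,
      # then applies the writes and returns the accumulated read results.
      result = ex.recv_remote_reads :some_table, remote_results
    end
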
data/lib/spinoza/calvin/node.rb ADDED
@@ -0,0 +1,44 @@
+ require 'spinoza/system/node'
+ require 'spinoza/calvin/sequencer'
+ require 'spinoza/calvin/scheduler'
+
+ class Calvin::Node < Spinoza::Node
+   attr_reader :sequencer, :scheduler
+   attr_reader :log, :meta_log
+
+   def initialize *tables, log: nil, meta_log: nil,
+                  sequencer: nil, scheduler: nil, **rest
+     super *tables, **rest
+
+     @log = log
+     @meta_log = meta_log
+     @sequencer = sequencer || Calvin::Sequencer.new(node: self)
+     @scheduler = scheduler || Calvin::Scheduler.new(node: self)
+
+     on_transaction_finish &method(:default_output)
+   end
+
+   def default_output transaction, result
+     r = result.map {|rr| [rr.op.table, rr.val].join(":")}.join(", ")
+     puts "%07.6f [RESULT] #{transaction} => #{r}" % timeline.now
+   end
+
+   def recv msg: nil
+     scheduler.recv_peer_results **msg
+   end
+
+   def read_batch batch_id
+     log.read batch_id, node: self
+   end
+
+   def on_transaction_finish &b
+     @finished_transaction_handler = b
+   end
+
+   # Override this to put the result somewhere.
+   def finished_transaction transaction, result
+     if @finished_transaction_handler
+       @finished_transaction_handler[transaction, result]
+     end
+   end
+ end
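The default handler above just prints each result to the console. A caller can swap in its own sink through `on_transaction_finish`, as in this hedged sketch (it assumes `node` is an already-constructed and wired Calvin::Node):

    # Sketch only: collect finished transactions instead of printing them.
    finished = []
    node.on_transaction_finish do |txn, result|
      finished << [txn, result]
    end
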
data/lib/spinoza/calvin/readcaster.rb ADDED
@@ -0,0 +1,50 @@
+ require 'spinoza/common'
+ require 'set'
+
+ class Calvin::Readcaster
+   attr_reader :node
+
+   def initialize node: nil
+     @node = node
+     @links = nil
+     @tables = node.tables
+   end
+
+   def inspect
+     "<#{self.class} on #{node.inspect}>"
+   end
+
+   # Pre-computed map, by table, of which nodes might need data from this node:
+   # {table => Set[link, ...]}. In other words, excludes `table,link` pairs for
+   # which link.dst already has `table`.
+   def links
+     unless @links
+       @links = Hash.new {|h,k| h[k] = Set[]}
+       node.links.each do |there, link|
+         (@tables - there.tables).each do |table|
+           @links[table] << link
+         end
+       end
+     end
+     @links
+   end
+
+   def execute_local_reads txn
+     local_reads = txn.all_read_ops.select {|r| @tables.include? r.table}
+     node.store.execute *local_reads
+   end
+
+   def serve_reads txn, local_read_results
+     local_read_results.group_by {|r| r.op.table}.each do |table, results|
+       links[table].each do |link|
+         if txn.active?(link.dst)
+           send_read link, transaction: txn, table: table, read_results: results
+         end
+       end
+     end
+   end
+
+   def send_read link, **opts
+     link.send_message **opts
+   end
+ end
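As a worked illustration of the `links` map (the table names and `link_ab` are hypothetical): if this node holds `:orders` and `:items` while the peer at the far end of `link_ab` holds only `:items`, then `(@tables - there.tables)` leaves `[:orders]`, so only `:orders` results are broadcast over that link:

    # Hypothetical topology, illustrating Readcaster#links.
    #   this node's tables:  [:orders, :items]
    #   peer (link_ab.dst):  [:items]
    readcaster.links[:orders]  # => Set[link_ab] -- peer lacks :orders
    readcaster.links[:items]   # => Set[]        -- peer already has :items locally
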
data/lib/spinoza/calvin/scheduler.rb ADDED
@@ -0,0 +1,134 @@
+ require 'spinoza/common'
+ require 'spinoza/calvin/executor'
+ require 'spinoza/calvin/readcaster'
+
+ class Calvin::Scheduler
+   attr_reader :node
+
+   attr_reader :executors
+   attr_reader :idle_executors
+
+   # Maps { locally executing transaction => Executor }
+   attr_reader :ex_for_txn
+
+   # Transactions to be executed, in order.
+   attr_reader :work_queue
+
+   def initialize node: raise, n_threads: 4
+     @node = node
+
+     @executors = n_threads.times.map {
+       Calvin::Executor.new(
+         store: node.store,
+         readcaster: Calvin::Readcaster.new(node: node))}
+
+     @idle_executors = @executors.dup
+     @ex_for_txn = {}
+     @work_queue = []
+
+     node.meta_log.on_entry_available self, :handle_meta_log_entry
+   end
+
+   def inspect
+     "<#{self.class} on #{node.inspect}>"
+   end
+
+   def handle_meta_log_entry id: raise, node: raise, value: raise
+     batch_id = value
+     batch = node.read_batch(batch_id)
+     if batch
+       work_queue.concat batch
+       handle_next_transactions
+     else
+       # Log entry did not yet propagate to this node, even though MetaLog entry
+       # did propagate. Won't happen with default latency settings.
+       raise "TODO" ##
+     end
+   end
+
+   # Handle messages from peers. The only messages are the unidirectional
+   # broadcasts of read results.
+   def recv_peer_results transaction: raise, table: raise, read_results: raise
+     ex = ex_for_txn[transaction]
+     if ex
+       result = ex.recv_remote_reads table, read_results
+       if result
+         finish_transaction transaction, result
+         handle_next_transactions
+       end
+     else
+       ## TODO what if transaction hasn't started yet? Buffer? This won't
+       ## happen with our simplistic latency assumptions.
+       # The transaction has already finished locally, but another
+       # node is still sending out read results.
+     end
+   end
+
+   def handle_next_transactions
+     until work_queue.empty? or idle_executors.empty?
+       success = handle_next_transaction
+       break unless success
+     end
+   end
+
+   def handle_next_transaction
+     ex = idle_executors.last
+     txn = work_queue.first
+     raise if ex_for_txn[txn]
+
+     lock_succeeded = try_lock(txn)
+
+     if lock_succeeded
+       txn = work_queue.shift
+       result = ex.execute_transaction(txn)
+       if result
+         finish_transaction txn, result
+       else
+         idle_executors.pop
+         ex_for_txn[txn] = ex
+       end
+
+     else
+       node.lock_manager.unlock_all txn
+       # nothing to do until some executor finishes its current transaction
+       ## TODO optimization: attempt to reorder another txn to the head
+       ## of the work_queue where lock sets are disjoint.
+     end
+
+     lock_succeeded
+   end
+
+   def try_lock txn
+     lm = node.lock_manager
+     rset = txn.read_set
+     wset = txn.write_set
+
+     # get write locks first, so r/w on same key doesn't fail
+     wset.each do |table, keys|
+       keys.each do |key|
+         next if key == Spinoza::Transaction::INSERT_KEY
+         lm.lock_write [table, key], txn
+       end
+     end
+
+     rset.each do |table, keys|
+       keys.each do |key|
+         lm.lock_read [table, key], txn
+       end
+     end
+
+     true
+
+   rescue Spinoza::LockManager::ConcurrencyError
+     false
+   end
+
+   def finish_transaction transaction, result
+     ex = ex_for_txn.delete(transaction)
+     if ex
+       idle_executors.push ex
+     end
+     node.lock_manager.unlock_all transaction
+     node.finished_transaction transaction, result
+   end
+ end
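The comment in `try_lock` about taking write locks first is easiest to see with a transaction that reads and writes the same key. The key sets below are hypothetical, and the sketch assumes (this is an assumption about Spinoza::LockManager, not something shown in this diff) that a transaction may read under its own write lock but cannot upgrade a read lock it already holds to a write lock:

    # Sketch only: why write locks are acquired before read locks.
    #   txn.write_set == { users: [1] }
    #   txn.read_set  == { users: [1] }
    lm.lock_write [:users, 1], txn  # exclusive lock held by txn
    lm.lock_read  [:users, 1], txn  # assumed compatible: same owner holds the key
    # In the opposite order, lock_write would look like a conflicting upgrade and
    # raise Spinoza::LockManager::ConcurrencyError, sending the txn back to the
    # work queue even though no other transaction holds the key.
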
data/lib/spinoza/calvin/sequencer.rb ADDED
@@ -0,0 +1,74 @@
+ require 'spinoza/system/model'
+
+ # Accepts transaction requests from clients. The requests accepted in an epoch
+ # are grouped as a batch, given a sequential id, and replicated to the
+ # transaction schedulers on each node.
+ class Calvin::Sequencer < Spinoza::Model
+   attr_reader :node
+
+   # ID used to construct UUID for batch.
+   attr_reader :id
+
+   # Length of epoch in seconds.
+   attr_reader :dt_epoch
+
+   @seq_id = 0
+   class << self
+     def next_id
+       @seq_id += 1
+     end
+   end
+
+   def initialize node: raise, dt_epoch: 0.010
+     super timeline: node.timeline
+
+     @node = node
+     @dt_epoch = dt_epoch
+     @batch = []
+     @epoch = 0
+     @id = self.class.next_id
+
+     step_epoch
+   end
+
+   def inspect
+     "<#{self.class} on #{node.inspect}>"
+   end
+
+   def step_epoch
+     unless @batch.empty?
+       batch_id = [@id, @epoch] # globally unique, but not ordered
+       log.write batch_id, @batch, node: node
+
+       log.when_durable batch_id,
+         actor: self,
+         action: :append_batch_to_meta_log,
+         batch_id: batch_id
+
+       @batch = []
+     end
+     @epoch += 1
+
+     timeline.schedule Spinoza::Event[
+       time: time_now + dt_epoch,
+       actor: self,
+       action: :step_epoch
+     ]
+   end
+
+   def append_batch_to_meta_log batch_id: raise
+     meta_log.append batch_id, node: node
+   end
+
+   def log
+     node.log
+   end
+
+   def meta_log
+     node.meta_log
+   end
+
+   def accept_transaction txn
+     @batch << txn
+   end
+ end
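Finally, a hedged sketch of the sequencer's epoch loop from the client's side, assuming `node` is a fully wired Calvin::Node and `txn1`/`txn2` are transaction instances (names are placeholders):

    # Sketch only: transactions accepted within one epoch become one batch.
    node.sequencer.accept_transaction txn1
    node.sequencer.accept_transaction txn2
    # At the next epoch boundary (every dt_epoch = 0.010 s of simulated time),
    # step_epoch writes the batch to node.log under batch_id = [id, epoch] and,
    # once the write is durable, appends that batch_id to node.meta_log, which
    # is what wakes the schedulers on each replica.
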