bud 0.9.2 → 0.9.3
- data/History.txt +17 -4
- data/README.md +5 -0
- data/docs/cheat.md +1 -1
- data/docs/getstarted.md +1 -1
- data/lib/bud.rb +52 -34
- data/lib/bud/aggs.rb +8 -11
- data/lib/bud/bud_meta.rb +18 -18
- data/lib/bud/collections.rb +14 -21
- data/lib/bud/executor/README.rescan +80 -0
- data/lib/bud/executor/elements.rb +25 -44
- data/lib/bud/executor/group.rb +80 -29
- data/lib/bud/executor/join.rb +73 -90
- data/lib/bud/monkeypatch.rb +1 -1
- data/lib/bud/rebl.rb +5 -2
- data/lib/bud/rewrite.rb +18 -14
- data/lib/bud/server.rb +1 -1
- data/lib/bud/source.rb +0 -45
- data/lib/bud/storage/dbm.rb +13 -9
- data/lib/bud/viz.rb +6 -8
- data/lib/bud/viz_util.rb +1 -0
- metadata +3 -18
data/lib/bud/executor/README.rescan
ADDED
@@ -0,0 +1,80 @@
+Notes on Invalidate and Rescan in Bud
+=====================================
+
+(I'll use 'downstream' to mean rhs to lhs (like in budplot). In every stratum,
+data originates at scanned sources at the "top", winds its way through various
+PushElements and ends up in a collection at the "bottom". I'll also use the term
+"elements" to mean both dataflow nodes (PushElements) and collections).
+
+The invalidation strategy works through two flags/signals, rescan and
+invalidate. Invalidation means a stateful PushElement's or a scratch's contents
+are erased, or a table is negated. Rescan means that the tuples coming out of an
+element represent the entire collection (a full scan), not just deltas.
+
+Earlier: all stateful elements were eagerly invalidated.
+Collections with state: scratches, interfaces, channels, terminal
+Elements with state: Group, join, sort, reduce, each_with_index
+
+Now: lazy invalidation where possible, based on the observation that the same
+state is often rederived downstream, which means that as long as there are no
+negations, one should be able to go on in incremental mode (working only on
+deltas, not on storage) from one tick to another.
+
+Observations:
+
+1. There are two kinds of elements that are (or may be) invalidated at the
+   beginning of every tick: source scratches (those that are not found on the
+   lhs of any rule), and tables that process pending negations.
+
+2. a. Invalidation implies a rescan of the element's contents.
+
+   b. A rescan of an element's contents implies invalidation of downstream nodes.
+
+   c. Invalidation involves rebuilding of state, which means that if a node has
+      multiple sources, it has to ask the other sources to rescan as well.
+
+      Example: x, y, z are scratches
+         z <= x.group(....)
+         z <= y.sort {}
+
+      If x is invalidated, it will rescan its contents. The group element then
+      invalidates its state, and rebuilds itself as x is scanned. Since group is
+      in rescan mode, z invalidates its state and is rebuilt from group.
+      However, since part of z's state comes from y.sort, it asks its
+      source element (the sort node) for a rescan as well.
+
+      This push-pull negotiation can be run until fixpoint, until the set of
+      elements that need to be invalidated and rescanned is fully determined.
+
+3. If a node is stateless, it passes the rescan request upstream, and the
+   invalidations downstream. But if it is stateful, it need not pass a rescan
+   request upstream. In the example above, only the sort node needs to rescan
+   its buffer; y doesn't need to be scanned at all.
+
+4. Solving the above constraints to a fixpoint at every tick is a huge
+   overhead. So we determine the strategy at wiring time.
+
+   bud.default_invalidate/default_rescan == the set of elements that we know
+   a priori will _always_ need the corresponding signal.
+
+   scanner.invalidate_set/rescan_set == for each scanner, the set of elements
+   to invalidate/rescan should that scanner's collection be negated.
+
+bud.prepare_invalidation_scheme works as follows.
+
+Start the process by determining which tables will invalidate at each tick,
+and which PushElements will rescan at the beginning of each tick. Then run
+rescan_invalidate_tc for a transitive closure, where each element gets to
+determine its own presence in the rescan and invalidate sets, depending on
+its source or target elements' presence in those sets. This creates the
+default sets.
+
+Then for each scanner, prime the pump by setting the scanner to rescan mode,
+and determine what effect it has on the system by running
+rescan_invalidate_tc. All the elements that are not already in the default
+sets are those that need to be additionally informed at run time, should we
+discover that that scanner's collection has been negated at the beginning of
+each tick.
+
+The BUD_SAFE environment variable is used to force the old-style behavior, where
+every cached element is invalidated and fully scanned once every tick.
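The propagation rules above are compact enough to model directly. The sketch below is not Bud's rescan_invalidate_tc; the Node struct and the rescan_invalidate_fixpoint helper are invented here purely to make observations 2(a)-2(c) and 3 concrete, using the x/y/z example from the notes.

require 'set'

Node = Struct.new(:name, :sources, :stateful)

def rescan_invalidate_fixpoint(nodes, rescan, invalidate)
  loop do
    before = [rescan.size, invalidate.size]
    nodes.each do |n|
      # 2(b): a full scan arriving from any source forces this node to rebuild
      invalidate << n if n.sources.any? {|s| rescan.include?(s)}
      # 2(a): a node that rebuilds re-emits its entire contents
      rescan << n if invalidate.include?(n)
      # 2(c): a rebuilding node needs full input, so all of its sources must rescan
      n.sources.each {|s| rescan << s} if invalidate.include?(n)
      # 3: only a stateless node forwards the rescan request to its own sources;
      #    a stateful node can replay from its local cache instead
      n.sources.each {|s| rescan << s} if rescan.include?(n) && !n.stateful
    end
    break if [rescan.size, invalidate.size] == before   # fixpoint reached
  end
  [rescan, invalidate]
end

# The x/y/z program above: x and y are source scratches, group and sort are
# stateful elements, z is the stateless sink fed by both rules.
x     = Node.new(:x, [], false)
y     = Node.new(:y, [], false)
group = Node.new(:group, [x], true)
sort  = Node.new(:sort, [y], true)
z     = Node.new(:z, [group, sort], false)

rescan, invalidate = rescan_invalidate_fixpoint([x, y, group, sort, z],
                                                Set.new([x]), Set.new([x]))
rescan.map(&:name).sort      # => [:group, :sort, :x, :z]
invalidate.map(&:name).sort  # => [:group, :x, :z]

As the notes predict, group and z end up in both sets, sort only needs to rescan its buffer, and y is never touched.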
data/lib/bud/executor/elements.rb
CHANGED
@@ -144,7 +144,8 @@ module Bud
     # default for stateless elements
     public
     def add_rescan_invalidate(rescan, invalidate)
-      # if any of the source elements are in rescan mode, then put this node in
+      # if any of the source elements are in rescan mode, then put this node in
+      # rescan.
       srcs = non_temporal_predecessors
       if srcs.any?{|p| rescan.member? p}
         rescan << self
@@ -157,7 +158,7 @@ module Bud
       # finally, if this node is in rescan, pass the request on to all source
       # elements
       if rescan.member? self
-        rescan
+        rescan.merge(srcs)
       end
     end

@@ -177,14 +178,12 @@ module Bud
     def <<(i)
       insert(i, nil)
     end
+
     public
     def flush
     end
-
     def invalidate_cache
-      #override to get rid of cached information.
     end
-    public
     def stratum_end
     end

@@ -220,7 +219,7 @@ module Bud
     def join(elem2, &blk)
       # cached = @bud_instance.push_elems[[self.object_id,:join,[self,elem2], @bud_instance, blk]]
       # if cached.nil?
-      elem2
+      elem2 = elem2.to_push_elem unless elem2.class <= PushElement
       toplevel = @bud_instance.toplevel
       join = Bud::PushSHJoin.new([self, elem2], toplevel.this_rule_context, [])
       self.wire_to(join)
@@ -292,7 +291,6 @@ module Bud
       return g
     end

-
     def argagg(aggname, gbkey_cols, collection, &blk)
       gbkey_cols = gbkey_cols.map{|c| canonicalize_col(c)}
       collection = canonicalize_col(collection)
@@ -353,7 +351,6 @@ module Bud
     end

     def reduce(initial, &blk)
-      @memo = initial
       retval = Bud::PushReduce.new("reduce#{Time.new.tv_usec}",
                                    @bud_instance, @collection_name,
                                    schema, initial, &blk)
@@ -380,34 +377,18 @@ module Bud
     end
     toplevel.push_elems[[self.object_id, :inspected]]
     end
-
-    def to_enum
-      # scr = @bud_instance.scratch(("scratch_" + Process.pid.to_s + "_" + object_id.to_s + "_" + rand(10000).to_s).to_sym, schema)
-      scr = []
-      self.wire_to(scr)
-      scr
-    end
   end

   class PushStatefulElement < PushElement
-    def rescan_at_tick
-      true
-    end
-
-    def rescan
-      true # always gives an entire dump of its contents
-    end
-
     def add_rescan_invalidate(rescan, invalidate)
-
-
-      # (doesn't need to pass a rescan request to its its source nodes).
-      rescan << self
-      srcs = non_temporal_predecessors
-      if srcs.any? {|p| rescan.member? p}
+      if non_temporal_predecessors.any? {|e| rescan.member? e}
+        rescan << self
         invalidate << self
       end

+      # Note that we do not need to pass rescan requests up to our source
+      # elements, since a stateful element has enough local information to
+      # reproduce its output.
       invalidate_tables(rescan, invalidate)
     end
   end
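Read together, the stateless PushElement#add_rescan_invalidate above and the PushStatefulElement override form the propagation contract the README describes. A condensed, self-contained restatement follows as a reading aid only: sources stands in for non_temporal_predecessors, and the table-invalidation step is omitted.

require 'set'

class StatelessNode
  attr_accessor :sources
  def initialize; @sources = []; end

  def add_rescan_invalidate(rescan, invalidate)
    # a full scan arriving from any source flows straight through this node
    rescan << self if @sources.any? {|p| rescan.member? p}
    # and a node with no cache of its own must ask every source to rescan too
    rescan.merge(@sources) if rescan.member? self
  end
end

class StatefulNode < StatelessNode
  def add_rescan_invalidate(rescan, invalidate)
    # a full scan from a source makes the local cache stale: rebuild and re-emit
    if @sources.any? {|p| rescan.member? p}
      rescan << self
      invalidate << self
    end
    # nothing is pushed upstream: the cache can replay this node's output
  end
end

rescan, invalidate = Set.new, Set.new
src = StatelessNode.new
map = StatelessNode.new; map.sources = [src]
grp = StatefulNode.new;  grp.sources = [map]
rescan << src
[map, grp].each {|n| n.add_rescan_invalidate(rescan, invalidate)}
invalidate.include?(grp)  # => true: grp rebuilds from the full scan flowing through map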
@@ -437,27 +418,29 @@ module Bud
   class PushSort < PushStatefulElement
     def initialize(elem_name=nil, bud_instance=nil, collection_name=nil,
                    schema_in=nil, &blk)
-      @sortbuf = []
       super(elem_name, bud_instance, collection_name, schema_in, &blk)
+      @sortbuf = []
+      @seen_new_input = false
     end

     def insert(item, source)
       @sortbuf << item
+      @seen_new_input = true
     end

     def flush
-
+      if @seen_new_input || @rescan
         @sortbuf.sort!(&@blk)
         @sortbuf.each do |t|
           push_out(t, false)
         end
-        @
+        @seen_new_input = false
+        @rescan = false
       end
-      nil
     end

     def invalidate_cache
-      @sortbuf
+      @sortbuf.clear
     end
   end

@@ -488,11 +471,14 @@ module Bud
       @invalidate_set = invalidate
     end

-    public
     def add_rescan_invalidate(rescan, invalidate)
-      #
+      # if the collection is to be invalidated, the scanner needs to be in
+      # rescan mode
       rescan << self if invalidate.member? @collection

+      # in addition, default PushElement rescan/invalidate logic applies
+      super
+
       # Note also that this node can be nominated for rescan by a target node;
       # in other words, a scanner element can be set to rescan even if the
       # collection is not invalidated.
@@ -555,20 +541,15 @@ module Bud
     end

     def add_rescan_invalidate(rescan, invalidate)
-
-      if srcs.any? {|p| rescan.member? p}
-        invalidate << self
-        rescan << self
-      end
-
-      invalidate_tables(rescan, invalidate)
+      super

       # This node has some state (@each_index), but not the tuples. If it is in
       # rescan mode, then it must ask its sources to rescan, and restart its
       # index.
       if rescan.member? self
         invalidate << self
-
+        srcs = non_temporal_predecessors
+        rescan.merge(srcs)
       end
     end

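The PushSort hunk above adds a @seen_new_input flag so that flush only re-sorts and re-emits when something changed or a rescan was requested. A minimal standalone model of that contract (ToySort and its block-based push_out are invented for illustration, not the real element):

class ToySort
  def initialize(&cmp)
    @cmp = cmp
    @sortbuf = []
    @seen_new_input = false
    @rescan = false
  end

  def insert(item)
    @sortbuf << item
    @seen_new_input = true
  end

  def rescan!                 # a downstream node asked for a full replay
    @rescan = true
  end

  def flush(&push_out)
    return unless @seen_new_input || @rescan
    @sortbuf.sort!(&@cmp)
    @sortbuf.each {|t| push_out.call(t)}
    @seen_new_input = false
    @rescan = false
  end

  def invalidate_cache        # mirrors the fix above: actually clear the buffer
    @sortbuf.clear
  end
end

s = ToySort.new {|a, b| a <=> b}
[3, 1, 2].each {|t| s.insert(t)}
s.flush {|t| p t}   # 1, 2, 3
s.flush {|t| p t}   # nothing: no new input, no rescan
s.rescan!
s.flush {|t| p t}   # 1, 2, 3 again: the buffered state is replayed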
data/lib/bud/executor/group.rb
CHANGED
@@ -1,39 +1,80 @@
 require 'bud/executor/elements'
+require 'set'

 module Bud
   class PushGroup < PushStatefulElement
-    def initialize(elem_name, bud_instance, collection_name,
-
+    def initialize(elem_name, bud_instance, collection_name,
+                   keys_in, aggpairs_in, schema_in, &blk)
       if keys_in.nil?
         @keys = []
       else
         @keys = keys_in.map{|k| k[1]}
       end
-      #
-
+      # An aggpair is an array: [agg class instance, index of input field].
+      # ap[1] is nil for Count.
+      @aggpairs = aggpairs_in.map{|ap| [ap[0], ap[1].nil? ? nil : ap[1][1]]}
+      @groups = {}
+
+      # Check whether we need to eliminate duplicates from our input (we might
+      # see duplicates because of the rescan/invalidation logic, as well as
+      # because we don't do duplicate elimination on the output of a projection
+      # operator). We don't need to dupelim if all the args are exemplary.
+      @elim_dups = @aggpairs.any? {|a| not a[0].kind_of? ArgExemplary}
+      if @elim_dups
+        @input_cache = Set.new
+      end
+
+      @seen_new_data = false
       super(elem_name, bud_instance, collection_name, schema_in, &blk)
     end

     def insert(item, source)
+      if @elim_dups
+        return if @input_cache.include? item
+        @input_cache << item
+      end
+
+      @seen_new_data = true
       key = @keys.map{|k| item[k]}
-
-
-
-
-
+      group_state = @groups[key]
+      if group_state.nil?
+        @groups[key] = @aggpairs.map do |ap|
+          input_val = ap[1].nil? ? item : item[ap[1]]
+          ap[0].init(input_val)
+        end
+      else
+        @aggpairs.each_with_index do |ap, agg_ix|
+          input_val = ap[1].nil? ? item : item[ap[1]]
+          state_val = ap[0].trans(group_state[agg_ix], input_val)[0]
+          group_state[agg_ix] = state_val
+        end
       end
     end

+    def add_rescan_invalidate(rescan, invalidate)
+      # XXX: need to understand why this is necessary; it is dissimilar to the
+      # way other stateful non-monotonic operators are handled.
+      rescan << self
+      super
+    end
+
     def invalidate_cache
-      puts "
+      puts "#{self.class}/#{self.tabname} invalidated" if $BUD_DEBUG
       @groups.clear
+      @input_cache.clear if @elim_dups
+      @seen_new_data = false
     end

     def flush
+      # If we haven't seen any input since the last call to flush(), we're done:
+      # our output would be the same as before.
+      return unless @seen_new_data
+      @seen_new_data = false
+
       @groups.each do |g, grps|
         grp = @keys == $EMPTY ? [[]] : [g]
         @aggpairs.each_with_index do |ap, agg_ix|
-          grp << ap[0].
+          grp << ap[0].final(grps[agg_ix])
         end
         outval = grp[0].flatten
         (1..grp.length-1).each {|i| outval << grp[i]}
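The rewritten PushGroup#insert above drives each group through the aggregate protocol visible in the diff: init on a group's first input value, trans to fold in each later value (the first element of its result is the new state), and final at flush time. A self-contained toy of that loop; SumAgg is an invented stand-in, not one of Bud's aggregates:

class SumAgg
  def init(val);         val;           end
  def trans(state, val); [state + val]; end   # first element is the new state
  def final(state);      state;         end
end

def group(items, key_cols, aggpairs)
  groups = {}
  items.each do |item|
    key = key_cols.map {|k| item[k]}
    if groups[key].nil?
      groups[key] = aggpairs.map {|agg, col| agg.init(item[col])}
    else
      aggpairs.each_with_index do |(agg, col), ix|
        groups[key][ix] = agg.trans(groups[key][ix], item[col])[0]
      end
    end
  end
  groups.map do |key, states|
    key + aggpairs.zip(states).map {|(agg, _), st| agg.final(st)}
  end
end

items = [[:a, 1], [:a, 2], [:b, 5]]
p group(items, [0], [[SumAgg.new, 1]])   # => [[:a, 3], [:b, 5]]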
@@ -44,31 +85,38 @@ module Bud

   class PushArgAgg < PushGroup
     def initialize(elem_name, bud_instance, collection_name, keys_in, aggpairs_in, schema_in, &blk)
-
+      unless aggpairs_in.length == 1
+        raise Bud::Error, "multiple aggpairs #{aggpairs_in.map{|a| a.class.name}} in ArgAgg; only one allowed"
+      end
       super(elem_name, bud_instance, collection_name, keys_in, aggpairs_in, schema_in, &blk)
-      @agg = @aggpairs[0]
-      @aggcol = @aggpairs[0][1]
+      @agg, @aggcol = @aggpairs[0]
       @winners = {}
     end

     public
     def invalidate_cache
-
-      @groups.clear
+      super
       @winners.clear
     end

     def insert(item, source)
       key = @keys.map{|k| item[k]}
-
-
-
-
+      group_state = @groups[key]
+      if group_state.nil?
+        @seen_new_data = true
+        @groups[key] = @aggpairs.map do |ap|
          @winners[key] = [item]
-
-
-
-
+          input_val = item[ap[1]]
+          ap[0].init(input_val)
+        end
+      else
+        @aggpairs.each_with_index do |ap, agg_ix|
+          input_val = item[ap[1]]
+          state_val, flag, *rest = ap[0].trans(group_state[agg_ix], input_val)
+          group_state[agg_ix] = state_val
+          @seen_new_data = true unless flag == :ignore
+
+          case flag
           when :ignore
             # do nothing
           when :replace
@@ -76,19 +124,22 @@ module Bud
           when :keep
             @winners[key] << item
           when :delete
-
-            @winners[key].delete t
+            rest.each do |t|
+              @winners[key].delete t
             end
           else
-            raise Bud::Error, "strange result from argagg
+            raise Bud::Error, "strange result from argagg transition func: #{flag}"
           end
         end
-        @groups[key] ||= Array.new(@aggpairs.length)
-        @groups[key][agg_ix] = agg
       end
     end

     def flush
+      # If we haven't seen any input since the last call to flush(), we're done:
+      # our output would be the same as before.
+      return unless @seen_new_data
+      @seen_new_data = false
+
       @groups.each_key do |g|
         @winners[g].each do |t|
           push_out(t, false)
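PushArgAgg#insert above keys everything off the flag returned by trans -- :ignore, :replace, :keep, or :delete (with the tuples to drop carried in the rest of the result) -- to maintain the @winners table incrementally. A toy argmax built on the same flag protocol; MaxAgg here is invented for illustration and only exercises :ignore/:replace/:keep:

class MaxAgg
  def init(val); val; end

  # returns [new_state, flag]; the flag tells the caller what to do with the
  # tuple that carried this value
  def trans(state, val)
    if    val >  state then [val,   :replace]
    elsif val == state then [state, :keep]
    else                    [state, :ignore]
    end
  end
end

def argmax(items, key_col, agg_col, agg = MaxAgg.new)
  state, winners = {}, {}
  items.each do |item|
    key = item[key_col]
    if state[key].nil?
      state[key]   = agg.init(item[agg_col])
      winners[key] = [item]
    else
      state[key], flag = agg.trans(state[key], item[agg_col])
      case flag
      when :ignore  then next
      when :replace then winners[key] = [item]
      when :keep    then winners[key] << item
      end
    end
  end
  winners.values.flatten(1)
end

people = [[:emp, "alice", 30], [:emp, "bob", 41], [:emp, "carol", 41]]
p argmax(people, 0, 2)   # => bob's and carol's tuples (tied maximum age)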
data/lib/bud/executor/join.rb
CHANGED
@@ -67,14 +67,16 @@ module Bud
     public
     def state_id # :nodoc: all
       object_id
-
+    end
+
+    def flush
+      replay_join if @rescan
     end

     # initialize the state for this join to be carried across iterations within a fixpoint
     private
     def setup_state
       sid = state_id
-
       @tabname = ("(" + @all_rels_below.map{|r| r.tabname}.join('*') +"):"+sid.to_s).to_sym
       @hash_tables = [{}, {}]
     end
@@ -131,21 +133,21 @@ module Bud
       else
         @keys = []
       end
-      # puts "@keys = #{@keys.inspect}"
     end

     public
     def invalidate_cache
       @rels.each_with_index do |source_elem, i|
         if source_elem.rescan
-
           puts "#{tabname} rel:#{i}(#{source_elem.tabname}) invalidated" if $BUD_DEBUG
           @hash_tables[i] = {}
-          if
-            #
-            #
-
-            #
+          if i == 0
+            # Only if i == 0 because outer joins in Bloom are left outer joins.
+            # If i == 1, missing_keys will be corrected when items are populated
+            # in the rhs fork.
+            # XXX This is not modular. We are doing invalidation work for outer
+            # joins, which is part of a separate module PushSHOuterJoin.
+            @missing_keys.clear
           end
         end
       end
@@ -268,11 +270,12 @@ module Bud

     public
     def insert(item, source)
-      #
-      if
-
-
-
+      # If we need to reproduce the join's output, do that now before we process
+      # the to-be-inserted tuple. This avoids needless duplicates: if the
+      # to-be-inserted tuple produced any join output, we'd produce that output
+      # again if we didn't rescan now.
+      replay_join if @rescan
+
       if @selfjoins.include? source.elem_name
         offsets = []
         @relnames.each_with_index{|r,i| offsets << i if r == source.elem_name}
@@ -308,50 +311,26 @@ module Bud
       end
     end

-    public
-    def rescan_at_tick
-      false
-    end
-
-    public
-    def add_rescan_invalidate(rescan, invalidate)
-      if non_temporal_predecessors.any? {|e| rescan.member? e}
-        rescan << self
-        invalidate << self
-      end
-
-      # The distinction between a join node and other stateful elements is that
-      # when a join node needs a rescan it doesn't tell all its sources to
-      # rescan. In fact, it doesn't have to pass a rescan request up to a
-      # source, because if a target needs a rescan, the join node has all the
-      # state necessary to feed the downstream node. And if a source node is in
-      # rescan, then at run-time only the state associated with that particular
-      # source node @hash_tables[offset] will be cleared, and will get filled up
-      # again because that source will rescan anyway.
-      invalidate_tables(rescan, invalidate)
-    end
-
     def replay_join
-
-      b = @hash_tables
-
-
-
-
-
-
-
-
-      end
+      @rescan = false
+      a, b = @hash_tables
+      return if a.empty? or b.empty?
+
+      if a.size < b.size
+        a.each_pair do |key, items|
+          the_matches = b[key]
+          unless the_matches.nil?
+            items.each do |item|
+              process_matches(item, the_matches, 0)
            end
          end
-
-
-
-
-
-
-
+        end
+      else
+        b.each_pair do |key, items|
+          the_matches = a[key]
+          unless the_matches.nil?
+            items.each do |item|
+              process_matches(item, the_matches, 1)
            end
          end
        end
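The new replay_join above walks whichever hash table is smaller and probes the other, so replaying cached matches costs time proportional to the smaller side. The same idea as a self-contained sketch over plain arrays (process_matches and the per-relation bookkeeping of PushSHJoin are elided; both inputs are assumed to join on their first column):

def replay_join(left, right)
  out = []
  a = left.group_by  {|t| t[0]}   # hash table keyed on the join column
  b = right.group_by {|t| t[0]}
  return out if a.empty? || b.empty?

  # probe with the smaller table, build against the larger one
  probe, build, swapped = a.size < b.size ? [a, b, false] : [b, a, true]
  probe.each_pair do |key, items|
    matches = build[key]
    next if matches.nil?
    items.each do |i|
      matches.each {|m| out << (swapped ? [m, i] : [i, m])}
    end
  end
  out
end

left  = [[1, :a], [2, :b]]
right = [[1, :x], [1, :y], [3, :z]]
p replay_join(left, right)   # => [[[1, :a], [1, :x]], [[1, :a], [1, :y]]]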
@@ -489,7 +468,6 @@ module Bud
   end

   module PushSHOuterJoin
-
     private
     def insert_item(item, offset)
       if @keys.nil? or @keys.empty?
@@ -517,6 +495,11 @@ module Bud
       end
     end

+    public
+    def rescan_at_tick
+      true
+    end
+
     public
     def stratum_end
       flush
@@ -525,23 +508,24 @@ module Bud

     private
     def push_missing
-
-      @
-      @
-          push_out([t, @rels[1].null_tuple])
-        end
+      @missing_keys.each do |key|
+        @hash_tables[0][key].each do |t|
+          push_out([t, @rels[1].null_tuple])
         end
       end
     end
   end

-  # Consider "u <= s.notin(t, s.a => t.b)". notin is a non-monotonic operator,
-  # but negatively on t. Stratification ensures
-  #
-  #
-  #
-  #
+  # Consider "u <= s.notin(t, s.a => t.b)". notin is a non-monotonic operator,
+  # where u depends positively on s, but negatively on t. Stratification ensures
+  # that t is fully computed in a lower stratum, which means that we can expect
+  # multiple iterators on s's side only. If t's scanner were to push its
+  # elements down first, every insert of s merely needs to be cross checked with
+  # the cached elements of 't', and pushed down to the next element if s notin
+  # t. However, if s's scanner were to fire first, we have to wait until the
+  # first flush, at which point we are sure to have seen all the t-side tuples
+  # in this tick.
   class PushNotIn < PushStatefulElement
     def initialize(rellist, bud_instance, preds=nil, &blk) # :nodoc: all
       @lhs, @rhs = rellist
@@ -552,7 +536,6 @@ module Bud
       setup_preds(preds) unless preds.empty?
       @rhs_rcvd = false
       @hash_tables = [{},{}]
-      @rhs_rcvd = false
       if @lhs_keycols.nil? and blk.nil?
         # pointwise comparison. Could use zip, but it creates an array for each field pair
         blk = lambda {|lhs, rhs|
@@ -563,9 +546,10 @@ module Bud
     end

     def setup_preds(preds)
-      # This is simpler than PushSHJoin's setup_preds, because notin is a binary
-      # collections.
-      #
+      # This is simpler than PushSHJoin's setup_preds, because notin is a binary
+      # operator where both lhs and rhs are collections. preds is an array of
+      # hash_pairs. For now assume that the attributes are in the same order as
+      # the tables.
       @lhs_keycols, @rhs_keycols = preds.reduce([[], []]) do |memo, item|
         # each item is a hash
         l = item.keys[0]
@@ -578,11 +562,11 @@ module Bud
     def find_col(colspec, rel)
       if colspec.is_a? Symbol
         col_desc = rel.send(colspec)
-        raise "
+        raise Bud::Error, "unknown column #{colspec} in #{@rel.tabname}" if col_desc.nil?
       elsif colspec.is_a? Array
         col_desc = colspec
       else
-        raise "
+        raise Bud::Error, "symbol or column spec expected. Got #{colspec}"
       end
       col_desc[1] # col_desc is of the form [tabname, colnum, colname]
     end
@@ -592,11 +576,6 @@ module Bud
       keycols.nil? ? $EMPTY : keycols.map{|col| item[col]}
     end

-    public
-    def invalidate_at_tick
-      true
-    end
-
     public
     def rescan_at_tick
       true
@@ -605,7 +584,6 @@ module Bud
     def insert(item, source)
       offset = source == @lhs ? 0 : 1
       key = get_key(item, offset)
-      #puts "#{key}, #{item}, #{offset}"
       (@hash_tables[offset][key] ||= Set.new).add item
       if @rhs_rcvd and offset == 0
         push_lhs(key, item)
@@ -613,15 +591,14 @@ module Bud
     end

     def flush
-      # When flush is called the first time, both lhs and rhs scanners have been
-      # we know that the rhs is not
+      # When flush is called the first time, both lhs and rhs scanners have been
+      # invoked, and because of stratification we know that the rhs is not
+      # growing any more, until the next tick.
       unless @rhs_rcvd
         @rhs_rcvd = true
-        @hash_tables[0].
-          values.each{|item|
-
-          }
-        }
+        @hash_tables[0].each do |key,values|
+          values.each {|item| push_lhs(key, item)}
+        end
       end
     end

@@ -661,9 +638,15 @@ module Bud
       @delete_keys.each{|o| o.pending_delete_keys([item])}
       @pendings.each{|o| o.pending_merge([item])}
     end
-
-
-
-
+
+    def invalidate_cache
+      puts "#{self.class}/#{self.tabname} invalidated" if $BUD_DEBUG
+      @hash_tables = [{},{}]
+      @rhs_rcvd = false
+    end
+
+    def stratum_end
+      @rhs_rcvd = false
+    end
   end
 end
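The comment block added above for PushNotIn is essentially an ordering argument: lhs tuples must be buffered until the rhs is known to be complete, which stratification guarantees by the time of the first flush in a tick; after that, each lhs insert can be checked against the rhs cache immediately. A small invented model of that behavior (ToyNotIn is not Bud's element):

require 'set'

class ToyNotIn
  def initialize(&out)
    @rhs      = Set.new
    @lhs_buf  = []
    @rhs_rcvd = false
    @out      = out
  end

  def insert_rhs(t)
    @rhs << t
  end

  def insert_lhs(t)
    if @rhs_rcvd
      @out.call(t) unless @rhs.include?(t)   # rhs is final: emit immediately
    else
      @lhs_buf << t                          # rhs may still grow: hold the tuple
    end
  end

  def flush
    return if @rhs_rcvd
    @rhs_rcvd = true                         # stratification: rhs is complete now
    @lhs_buf.each {|t| @out.call(t) unless @rhs.include?(t)}
    @lhs_buf.clear
  end
end

n = ToyNotIn.new {|t| p t}
n.insert_lhs(:a)
n.insert_lhs(:b)
n.insert_rhs(:b)
n.flush            # prints :a only
n.insert_lhs(:c)   # prints :c right away -- the rhs is known complete this tick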