bud 0.9.2 → 0.9.3

@@ -0,0 +1,80 @@
+ Notes on Invalidate and Rescan in Bud
+ =====================================
+
+ (I'll use 'downstream' to mean rhs to lhs (like in budplot). In every stratum,
+ data originates at scanned sources at the "top", winds its way through various
+ PushElements, and ends up in a collection at the "bottom". I'll also use the
+ term "elements" to mean both dataflow nodes (PushElements) and collections.)
+
+ The invalidation strategy works through two flags/signals, rescan and
+ invalidate. Invalidation means that a stateful PushElement's or a scratch's
+ contents are erased, or that a table is negated. Rescan means that the tuples
+ coming out of an element represent the entire collection (a full scan), not
+ just deltas.
+
+ Earlier: all stateful elements were eagerly invalidated.
+   Collections with state: scratches, interfaces, channels, terminals
+   Elements with state: group, join, sort, reduce, each_with_index
+
+ Now: lazy invalidation where possible, based on the observation that the same
+ state is often rederived downstream. As long as there are no negations, one
+ should be able to go on in incremental mode (working only on deltas, not on
+ storage) from one tick to another.
+
+ Observations:
+
+ 1. There are two kinds of elements that are (or may be) invalidated at the
+    beginning of every tick: source scratches (those that are not found on the
+    lhs of any rule), and tables that process pending negations.
+
+ 2. a. Invalidation of an element implies a rescan of its contents.
+
+    b. A rescan of an element's contents implies invalidation of its
+       downstream nodes.
+
+    c. Invalidation involves rebuilding of state, which means that if a node
+       has multiple sources, it has to ask the other sources to rescan as
+       well.
+
+    Example: x, y, z are scratches
+        z <= x.group(....)
+        z <= y.sort {}
+
+    If x is invalidated, it will rescan its contents. The group element then
+    invalidates its state and rebuilds itself as x is scanned. Since group is
+    in rescan mode, z invalidates its state and is rebuilt from group.
+    However, since part of z's state comes from y.sort, it asks its source
+    element (the sort node) for a rescan as well.
+
+    This push-pull negotiation can be run until fixpoint, at which point the
+    set of elements that need to be invalidated and rescanned is fully
+    determined.
+
+ 3. If a node is stateless, it passes rescan requests upstream and
+    invalidations downstream. But if it is stateful, it need not pass a rescan
+    request upstream. In the example above, only the sort node needs to rescan
+    its buffer; y doesn't need to be scanned at all.
+
+ 4. Solving the above constraints to a fixpoint at every tick is a huge
+    overhead, so we determine the strategy at wiring time.
+
+    bud.default_invalidate/default_rescan == the set of elements that we know
+    a priori will _always_ need the corresponding signal.
+
+    scanner.invalidate_set/rescan_set == for each scanner, the set of elements
+    to invalidate/rescan should that scanner's collection be negated.
+
+    bud.prepare_invalidation_scheme works as follows.
+
+    Start the process by determining which tables will invalidate at each
+    tick, and which PushElements will rescan at the beginning of each tick.
+    Then run rescan_invalidate_tc for a transitive closure, where each element
+    gets to determine its own presence in the rescan and invalidate sets,
+    depending on its source or target elements' presence in those sets. This
+    creates the default sets.
+
+    Then for each scanner, prime the pump by setting the scanner to rescan
+    mode, and determine what effect that has on the system by running
+    rescan_invalidate_tc. All the elements that are not already in the default
+    sets are those that need to be additionally informed at run time, should
+    we discover that that scanner's collection has been negated at the
+    beginning of a tick.
+
+ The BUD_SAFE environment variable is used to force the old-style behavior,
+ where every cached element is invalidated and fully scanned once every tick.
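
The fixpoint negotiation in observations 2–4 can be sketched as a small graph computation. This is an illustrative model only, not Bud's actual `rescan_invalidate_tc`: the `Node` struct and the rule encoding are assumptions made for the sketch.

```ruby
require 'set'

# Hypothetical node: a name, whether it caches state, and its source nodes.
Node = Struct.new(:name, :stateful, :sources)

# Run the push-pull negotiation to fixpoint.
def rescan_invalidate_tc(nodes, rescan, invalidate)
  loop do
    before = [rescan.size, invalidate.size]
    nodes.each do |n|
      # 2b: a rescanning source floods this node with a full scan; a stateful
      # node must invalidate its cache, a stateless one just enters rescan.
      if n.sources.any? {|s| rescan.include?(s)}
        n.stateful ? invalidate.add(n) : rescan.add(n)
      end
      # 2a: an invalidated node must rescan (re-derive) its contents, and
      # 2c: to rebuild, it needs a full scan from *all* of its sources.
      if invalidate.include?(n)
        rescan.add(n)
        n.sources.each {|s| rescan.add(s)}
      end
      # 3: a stateless node in rescan mode must pull a rescan from upstream;
      # a stateful node can dump its own buffer instead.
      if rescan.include?(n) && !n.stateful
        n.sources.each {|s| rescan.add(s)}
      end
    end
    break if [rescan.size, invalidate.size] == before
  end
end

# The example from observation 2: z <= x.group(...); z <= y.sort {}
x     = Node.new(:x, false, [])
y     = Node.new(:y, false, [])
group = Node.new(:group, true, [x])
sort  = Node.new(:sort, true, [y])
z     = Node.new(:z, true, [group, sort])

rescan, invalidate = Set.new([x]), Set.new
rescan_invalidate_tc([x, y, group, sort, z], rescan, invalidate)
# sort ends up in rescan (it must dump its buffer), but y does not:
# the stateful sort node absorbs the rescan request.
```

Running the model on the example reproduces the narrative above: `group` and `z` are invalidated, `sort` rescans its buffer, and `y` is never asked to scan.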
@@ -144,7 +144,8 @@ module Bud
     # default for stateless elements
     public
     def add_rescan_invalidate(rescan, invalidate)
-      # if any of the source elements are in rescan mode, then put this node in rescan.
+      # if any of the source elements are in rescan mode, then put this node in
+      # rescan.
       srcs = non_temporal_predecessors
       if srcs.any?{|p| rescan.member? p}
         rescan << self
@@ -157,7 +158,7 @@ module Bud
       # finally, if this node is in rescan, pass the request on to all source
       # elements
       if rescan.member? self
-        rescan += srcs
+        rescan.merge(srcs)
       end
     end

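
The `rescan += srcs` to `rescan.merge(srcs)` change is not cosmetic: `+=` allocates a new `Set` and rebinds the local variable, so the caller that passed the set in never sees the update, while `merge` mutates the shared set in place. A minimal illustration (method names are invented for the demo):

```ruby
require 'set'

# += rebinds the parameter to a freshly allocated Set; the caller's Set is
# unchanged after the method returns.
def broken_add(rescan, srcs)
  rescan += srcs
end

# merge mutates the receiver in place, so the caller observes the update.
def fixed_add(rescan, srcs)
  rescan.merge(srcs)
end

s = Set.new([:a])
broken_add(s, Set.new([:b]))
s.include?(:b)  # => false
fixed_add(s, Set.new([:b]))
s.include?(:b)  # => true
```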
@@ -177,14 +178,12 @@ module Bud
     def <<(i)
       insert(i, nil)
     end
+
     public
     def flush
     end
-
     def invalidate_cache
-      # override to get rid of cached information.
     end
-    public
     def stratum_end
     end

@@ -220,7 +219,7 @@ module Bud
     def join(elem2, &blk)
       # cached = @bud_instance.push_elems[[self.object_id,:join,[self,elem2], @bud_instance, blk]]
       # if cached.nil?
-        elem2 = elem2.to_push_elem unless elem2.class <= PushElement
+      elem2 = elem2.to_push_elem unless elem2.class <= PushElement
       toplevel = @bud_instance.toplevel
       join = Bud::PushSHJoin.new([self, elem2], toplevel.this_rule_context, [])
       self.wire_to(join)
@@ -292,7 +291,6 @@ module Bud
       return g
     end

-
     def argagg(aggname, gbkey_cols, collection, &blk)
       gbkey_cols = gbkey_cols.map{|c| canonicalize_col(c)}
       collection = canonicalize_col(collection)
@@ -353,7 +351,6 @@ module Bud
     end

     def reduce(initial, &blk)
-      @memo = initial
       retval = Bud::PushReduce.new("reduce#{Time.new.tv_usec}",
                                    @bud_instance, @collection_name,
                                    schema, initial, &blk)
@@ -380,34 +377,18 @@ module Bud
     end
     toplevel.push_elems[[self.object_id, :inspected]]
   end
-
-  def to_enum
-    # scr = @bud_instance.scratch(("scratch_" + Process.pid.to_s + "_" + object_id.to_s + "_" + rand(10000).to_s).to_sym, schema)
-    scr = []
-    self.wire_to(scr)
-    scr
-  end
 end

 class PushStatefulElement < PushElement
-    def rescan_at_tick
-      true
-    end
-
-    def rescan
-      true # always gives an entire dump of its contents
-    end
-
     def add_rescan_invalidate(rescan, invalidate)
-      # If an upstream node is set to rescan, a stateful node invalidates its
-      # cache. In addition, a stateful node always rescans its own contents
-      # (doesn't need to pass a rescan request to its its source nodes).
-      rescan << self
-      srcs = non_temporal_predecessors
-      if srcs.any? {|p| rescan.member? p}
+      if non_temporal_predecessors.any? {|e| rescan.member? e}
+        rescan << self
         invalidate << self
       end

+      # Note that we do not need to pass rescan requests up to our source
+      # elements, since a stateful element has enough local information to
+      # reproduce its output.
       invalidate_tables(rescan, invalidate)
     end
   end
@@ -437,27 +418,29 @@ module Bud
   class PushSort < PushStatefulElement
     def initialize(elem_name=nil, bud_instance=nil, collection_name=nil,
                    schema_in=nil, &blk)
-      @sortbuf = []
       super(elem_name, bud_instance, collection_name, schema_in, &blk)
+      @sortbuf = []
+      @seen_new_input = false
     end

     def insert(item, source)
       @sortbuf << item
+      @seen_new_input = true
     end

     def flush
-      unless @sortbuf.empty?
+      if @seen_new_input || @rescan
         @sortbuf.sort!(&@blk)
         @sortbuf.each do |t|
           push_out(t, false)
         end
-        @sortbuf = []
+        @seen_new_input = false
+        @rescan = false
       end
-      nil
     end

     def invalidate_cache
-      @sortbuf = []
+      @sortbuf.clear
     end
   end

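
The PushSort change above swaps the old guard (`unless @sortbuf.empty?`, which also discarded the buffer after each flush) for two flags: emit when new input arrived, or re-emit the retained buffer when a rescan was requested. A condensed standalone rendering of that gating, with illustrative names and `flush` returning its output instead of pushing it downstream:

```ruby
# Minimal sketch of the seen-new-input / rescan gating; not Bud's actual class.
class MiniSort
  attr_writer :rescan

  def initialize
    @sortbuf = []          # retained across flushes, unlike the old code
    @seen_new_input = false
    @rescan = false
  end

  def insert(item)
    @sortbuf << item
    @seen_new_input = true
  end

  def flush
    out = []
    if @seen_new_input || @rescan
      @sortbuf.sort!
      @sortbuf.each {|t| out << t}
      @seen_new_input = false
      @rescan = false
    end
    out
  end
end

ms = MiniSort.new
ms.insert(3)
ms.insert(1)
ms.flush          # => [1, 3]
ms.flush          # => [] (no new input, no rescan: nothing to emit)
ms.rescan = true
ms.flush          # => [1, 3] (rescan: full dump of the retained buffer)
```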
@@ -488,11 +471,14 @@ module Bud
       @invalidate_set = invalidate
     end

-    public
     def add_rescan_invalidate(rescan, invalidate)
-      # scanner elements are never directly connected to tables.
+      # if the collection is to be invalidated, the scanner needs to be in
+      # rescan mode
       rescan << self if invalidate.member? @collection

+      # in addition, default PushElement rescan/invalidate logic applies
+      super
+
       # Note also that this node can be nominated for rescan by a target node;
       # in other words, a scanner element can be set to rescan even if the
       # collection is not invalidated.
@@ -555,20 +541,15 @@ module Bud
     end

     def add_rescan_invalidate(rescan, invalidate)
-      srcs = non_temporal_predecessors
-      if srcs.any? {|p| rescan.member? p}
-        invalidate << self
-        rescan << self
-      end
-
-      invalidate_tables(rescan, invalidate)
+      super

       # This node has some state (@each_index), but not the tuples. If it is in
       # rescan mode, then it must ask its sources to rescan, and restart its
       # index.
       if rescan.member? self
         invalidate << self
-        rescan += srcs
+        srcs = non_temporal_predecessors
+        rescan.merge(srcs)
       end
     end

@@ -1,39 +1,80 @@
 require 'bud/executor/elements'
+require 'set'

 module Bud
   class PushGroup < PushStatefulElement
-    def initialize(elem_name, bud_instance, collection_name, keys_in, aggpairs_in, schema_in, &blk)
-      @groups = {}
+    def initialize(elem_name, bud_instance, collection_name,
+                   keys_in, aggpairs_in, schema_in, &blk)
       if keys_in.nil?
         @keys = []
       else
         @keys = keys_in.map{|k| k[1]}
       end
-      # ap[1] is nil for Count
-      @aggpairs = aggpairs_in.map{|ap| ap[1].nil? ? [ap[0]] : [ap[0], ap[1][1]]}
+      # An aggpair is an array: [agg class instance, index of input field].
+      # ap[1] is nil for Count.
+      @aggpairs = aggpairs_in.map{|ap| [ap[0], ap[1].nil? ? nil : ap[1][1]]}
+      @groups = {}
+
+      # Check whether we need to eliminate duplicates from our input (we might
+      # see duplicates because of the rescan/invalidation logic, as well as
+      # because we don't do duplicate elimination on the output of a projection
+      # operator). We don't need to dupelim if all the args are exemplary.
+      @elim_dups = @aggpairs.any? {|a| not a[0].kind_of? ArgExemplary}
+      if @elim_dups
+        @input_cache = Set.new
+      end
+
+      @seen_new_data = false
       super(elem_name, bud_instance, collection_name, schema_in, &blk)
     end

     def insert(item, source)
+      if @elim_dups
+        return if @input_cache.include? item
+        @input_cache << item
+      end
+
+      @seen_new_data = true
       key = @keys.map{|k| item[k]}
-      @aggpairs.each_with_index do |ap, agg_ix|
-        agg_input = ap[1].nil? ? item : item[ap[1]]
-        agg = (@groups[key].nil? or @groups[key][agg_ix].nil?) ? ap[0].send(:init, agg_input) : ap[0].send(:trans, @groups[key][agg_ix], agg_input)[0]
-        @groups[key] ||= Array.new(@aggpairs.length)
-        @groups[key][agg_ix] = agg
+      group_state = @groups[key]
+      if group_state.nil?
+        @groups[key] = @aggpairs.map do |ap|
+          input_val = ap[1].nil? ? item : item[ap[1]]
+          ap[0].init(input_val)
+        end
+      else
+        @aggpairs.each_with_index do |ap, agg_ix|
+          input_val = ap[1].nil? ? item : item[ap[1]]
+          state_val = ap[0].trans(group_state[agg_ix], input_val)[0]
+          group_state[agg_ix] = state_val
+        end
       end
     end

+    def add_rescan_invalidate(rescan, invalidate)
+      # XXX: need to understand why this is necessary; it is dissimilar to the
+      # way other stateful non-monotonic operators are handled.
+      rescan << self
+      super
+    end
+
     def invalidate_cache
-      puts "Group #{qualified_tabname} invalidated" if $BUD_DEBUG
+      puts "#{self.class}/#{self.tabname} invalidated" if $BUD_DEBUG
       @groups.clear
+      @input_cache.clear if @elim_dups
+      @seen_new_data = false
     end

     def flush
+      # If we haven't seen any input since the last call to flush(), we're done:
+      # our output would be the same as before.
+      return unless @seen_new_data
+      @seen_new_data = false
+
       @groups.each do |g, grps|
         grp = @keys == $EMPTY ? [[]] : [g]
         @aggpairs.each_with_index do |ap, agg_ix|
-          grp << ap[0].send(:final, grps[agg_ix])
+          grp << ap[0].final(grps[agg_ix])
         end
         outval = grp[0].flatten
         (1..grp.length-1).each {|i| outval << grp[i]}
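
The grouping code above drives each aggregate through an init/trans/final protocol: `init` produces the state for a group's first input, `trans` returns an array whose first element is the updated state, and `final` extracts the answer. A hypothetical `MinAgg` showing the shape of that protocol (a stand-in for illustration, not one of Bud's aggregate classes):

```ruby
# Hypothetical aggregate obeying the init/trans/final protocol used by the
# grouping operator above.
class MinAgg
  # State for the first input of a group.
  def init(x)
    x
  end

  # Returns an array; element 0 is the new state (mirroring trans(...)[0]).
  def trans(state, x)
    [x < state ? x : state]
  end

  # Extract the final answer from the accumulated state.
  def final(state)
    state
  end
end

agg = MinAgg.new
state = agg.init(5)
state = agg.trans(state, 3)[0]
state = agg.trans(state, 7)[0]
agg.final(state)  # => 3
```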
@@ -44,31 +85,38 @@ module Bud

   class PushArgAgg < PushGroup
     def initialize(elem_name, bud_instance, collection_name, keys_in, aggpairs_in, schema_in, &blk)
-      raise Bud::Error, "multiple aggpairs #{aggpairs_in.map{|a| a.class.name}} in ArgAgg; only one allowed" if aggpairs_in.length > 1
+      unless aggpairs_in.length == 1
+        raise Bud::Error, "multiple aggpairs #{aggpairs_in.map{|a| a.class.name}} in ArgAgg; only one allowed"
+      end
       super(elem_name, bud_instance, collection_name, keys_in, aggpairs_in, schema_in, &blk)
-      @agg = @aggpairs[0][0]
-      @aggcol = @aggpairs[0][1]
+      @agg, @aggcol = @aggpairs[0]
       @winners = {}
     end

     public
     def invalidate_cache
-      puts "#{self.class}/#{self.tabname} invalidated" if $BUD_DEBUG
-      @groups.clear
+      super
       @winners.clear
     end

     def insert(item, source)
       key = @keys.map{|k| item[k]}
-      @aggpairs.each_with_index do |ap, agg_ix|
-        agg_input = item[ap[1]]
-        if @groups[key].nil?
-          agg = ap[0].send(:init, agg_input)
+      group_state = @groups[key]
+      if group_state.nil?
+        @seen_new_data = true
+        @groups[key] = @aggpairs.map do |ap|
           @winners[key] = [item]
-        else
-          agg_result = ap[0].send(:trans, @groups[key][agg_ix], agg_input)
-          agg = agg_result[0]
-          case agg_result[1]
+          input_val = item[ap[1]]
+          ap[0].init(input_val)
+        end
+      else
+        @aggpairs.each_with_index do |ap, agg_ix|
+          input_val = item[ap[1]]
+          state_val, flag, *rest = ap[0].trans(group_state[agg_ix], input_val)
+          group_state[agg_ix] = state_val
+          @seen_new_data = true unless flag == :ignore
+
+          case flag
           when :ignore
             # do nothing
           when :replace
@@ -76,19 +124,22 @@ module Bud
           when :keep
             @winners[key] << item
           when :delete
-            agg_result[2..-1].each do |t|
-              @winners[key].delete t unless @winners[key].empty?
+            rest.each do |t|
+              @winners[key].delete t
             end
           else
-            raise Bud::Error, "strange result from argagg finalizer"
+            raise Bud::Error, "strange result from argagg transition func: #{flag}"
           end
         end
-        @groups[key] ||= Array.new(@aggpairs.length)
-        @groups[key][agg_ix] = agg
       end
     end

     def flush
+      # If we haven't seen any input since the last call to flush(), we're done:
+      # our output would be the same as before.
+      return unless @seen_new_data
+      @seen_new_data = false
+
       @groups.each_key do |g|
         @winners[g].each do |t|
           push_out(t, false)
@@ -67,14 +67,16 @@ module Bud
     public
     def state_id # :nodoc: all
       object_id
-      # Marshal.dump([@rels.map{|r| r.tabname}, @localpreds]).hash
+    end
+
+    def flush
+      replay_join if @rescan
     end

     # initialize the state for this join to be carried across iterations within a fixpoint
     private
     def setup_state
       sid = state_id
-
       @tabname = ("(" + @all_rels_below.map{|r| r.tabname}.join('*') + "):" + sid.to_s).to_sym
       @hash_tables = [{}, {}]
     end
@@ -131,21 +133,21 @@ module Bud
       else
         @keys = []
       end
-      # puts "@keys = #{@keys.inspect}"
     end

     public
     def invalidate_cache
       @rels.each_with_index do |source_elem, i|
         if source_elem.rescan
-
           puts "#{tabname} rel:#{i}(#{source_elem.tabname}) invalidated" if $BUD_DEBUG
           @hash_tables[i] = {}
-          if i == 0
-            # XXX This is not modular. We are doing invalidation work for outer joins, which is part of a
-            # separate module PushSHOuterJoin.
-            @missing_keys.clear # Only if i == 0 because outer joins in Bloom are left outer joins
-            # if i == 1, missing_keys will be corrected when items are populated in the rhs fork
+          if i == 0
+            # Only if i == 0 because outer joins in Bloom are left outer joins.
+            # If i == 1, missing_keys will be corrected when items are populated
+            # in the rhs fork.
+            # XXX This is not modular. We are doing invalidation work for outer
+            # joins, which is part of a separate module PushSHOuterJoin.
+            @missing_keys.clear
           end
         end
       end
@@ -268,11 +270,12 @@ module Bud

     public
     def insert(item, source)
-      #puts "JOIN: #{source.tabname} --> #{self.tabname} : #{item}/#{item.class}"
-      if @rescan
-        replay_join
-        @rescan = false
-      end
+      # If we need to reproduce the join's output, do that now before we process
+      # the to-be-inserted tuple. This avoids needless duplicates: if the
+      # to-be-inserted tuple produced any join output, we'd produce that output
+      # again if we didn't rescan now.
+      replay_join if @rescan
+
       if @selfjoins.include? source.elem_name
         offsets = []
         @relnames.each_with_index{|r,i| offsets << i if r == source.elem_name}
@@ -308,50 +311,26 @@ module Bud
       end
     end

-    public
-    def rescan_at_tick
-      false
-    end
-
-    public
-    def add_rescan_invalidate(rescan, invalidate)
-      if non_temporal_predecessors.any? {|e| rescan.member? e}
-        rescan << self
-        invalidate << self
-      end
-
-      # The distinction between a join node and other stateful elements is that
-      # when a join node needs a rescan it doesn't tell all its sources to
-      # rescan. In fact, it doesn't have to pass a rescan request up to a
-      # source, because if a target needs a rescan, the join node has all the
-      # state necessary to feed the downstream node. And if a source node is in
-      # rescan, then at run-time only the state associated with that particular
-      # source node @hash_tables[offset] will be cleared, and will get filled up
-      # again because that source will rescan anyway.
-      invalidate_tables(rescan, invalidate)
-    end
-
     def replay_join
-      a = @hash_tables[0]
-      b = @hash_tables[1]
-
-      if not(a.empty? or b.empty?)
-        if a.size < b.size
-          a.each_pair do |key, items|
-            the_matches = b[key]
-            unless the_matches.nil?
-              items.each do |item|
-                process_matches(item, the_matches, 1)
-              end
+      @rescan = false
+      a, b = @hash_tables
+      return if a.empty? or b.empty?
+
+      if a.size < b.size
+        a.each_pair do |key, items|
+          the_matches = b[key]
+          unless the_matches.nil?
+            items.each do |item|
+              process_matches(item, the_matches, 0)
             end
           end
-        else
-          b.each_pair do |key, items|
-            the_matches = a[key]
-            unless the_matches.nil?
-              items.each do |item|
-                process_matches(item, the_matches, 0)
-              end
+        end
+      else
+        b.each_pair do |key, items|
+          the_matches = a[key]
+          unless the_matches.nil?
+            items.each do |item|
+              process_matches(item, the_matches, 1)
             end
           end
         end
@@ -489,7 +468,6 @@ module Bud
     end

   module PushSHOuterJoin
-
     private
     def insert_item(item, offset)
       if @keys.nil? or @keys.empty?
@@ -517,6 +495,11 @@ module Bud
       end
     end

+    public
+    def rescan_at_tick
+      true
+    end
+
     public
     def stratum_end
       flush
@@ -525,23 +508,24 @@ module Bud

     private
     def push_missing
-      if @missing_keys
-        @missing_keys.each do |key|
-          @hash_tables[0][key].each do |t|
-            push_out([t, @rels[1].null_tuple])
-          end
+      @missing_keys.each do |key|
+        @hash_tables[0][key].each do |t|
+          push_out([t, @rels[1].null_tuple])
         end
       end
     end
   end


-  # Consider "u <= s.notin(t, s.a => t.b)". notin is a non-monotonic operator, where u depends positively on s,
-  # but negatively on t. Stratification ensures that t is fully computed in a lower stratum, which means that we
-  # can expect multiple iterators on s's side only. If t's scanner were to push its elemends down first, every
-  # insert of s merely needs to be cross checked with the cached elements of 't', and pushed down to the next
-  # element if s notin t. However, if s's scanner were to fire first, we have to wait until the first flush, at which
-  # point we are sure to have seen all the t-side tuples in this tick.
+  # Consider "u <= s.notin(t, s.a => t.b)". notin is a non-monotonic operator,
+  # where u depends positively on s, but negatively on t. Stratification ensures
+  # that t is fully computed in a lower stratum, which means that we can expect
+  # multiple iterators on s's side only. If t's scanner were to push its
+  # elements down first, every insert of s merely needs to be cross checked with
+  # the cached elements of 't', and pushed down to the next element if s notin
+  # t. However, if s's scanner were to fire first, we have to wait until the
+  # first flush, at which point we are sure to have seen all the t-side tuples
+  # in this tick.
   class PushNotIn < PushStatefulElement
     def initialize(rellist, bud_instance, preds=nil, &blk) # :nodoc: all
       @lhs, @rhs = rellist
@@ -552,7 +536,6 @@ module Bud
       setup_preds(preds) unless preds.empty?
       @rhs_rcvd = false
       @hash_tables = [{},{}]
-      @rhs_rcvd = false
       if @lhs_keycols.nil? and blk.nil?
         # pointwise comparison. Could use zip, but it creates an array for each field pair
         blk = lambda {|lhs, rhs|
@@ -563,9 +546,10 @@ module Bud
     end

     def setup_preds(preds)
-      # This is simpler than PushSHJoin's setup_preds, because notin is a binary operator where both lhs and rhs are
-      # collections.
-      # preds an array of hash_pairs. For now assume that the attributes are in the same order as the tables.
+      # This is simpler than PushSHJoin's setup_preds, because notin is a binary
+      # operator where both lhs and rhs are collections. preds is an array of
+      # hash_pairs. For now assume that the attributes are in the same order as
+      # the tables.
       @lhs_keycols, @rhs_keycols = preds.reduce([[], []]) do |memo, item|
         # each item is a hash
         l = item.keys[0]
@@ -578,11 +562,11 @@ module Bud
     def find_col(colspec, rel)
       if colspec.is_a? Symbol
         col_desc = rel.send(colspec)
-        raise "Unknown column #{rel} in #{@rel.tabname}" if col_desc.nil?
+        raise Bud::Error, "unknown column #{colspec} in #{@rel.tabname}" if col_desc.nil?
       elsif colspec.is_a? Array
         col_desc = colspec
       else
-        raise "Symbol or column spec expected. Got #{colspec}"
+        raise Bud::Error, "symbol or column spec expected. Got #{colspec}"
       end
       col_desc[1] # col_desc is of the form [tabname, colnum, colname]
     end
@@ -592,11 +576,6 @@ module Bud
       keycols.nil? ? $EMPTY : keycols.map{|col| item[col]}
     end

-    public
-    def invalidate_at_tick
-      true
-    end
-
     public
     def rescan_at_tick
       true
@@ -605,7 +584,6 @@ module Bud
     def insert(item, source)
       offset = source == @lhs ? 0 : 1
       key = get_key(item, offset)
-      #puts "#{key}, #{item}, #{offset}"
       (@hash_tables[offset][key] ||= Set.new).add item
       if @rhs_rcvd and offset == 0
         push_lhs(key, item)
@@ -613,15 +591,14 @@ module Bud
     end

     def flush
-      # When flush is called the first time, both lhs and rhs scanners have been invoked, and because of stratification
-      # we know that the rhs is not growing any more, until the next tick.
+      # When flush is called the first time, both lhs and rhs scanners have been
+      # invoked, and because of stratification we know that the rhs is not
+      # growing any more, until the next tick.
       unless @rhs_rcvd
        @rhs_rcvd = true
-        @hash_tables[0].map{|key,values|
-          values.each{|item|
-            push_lhs(key, item)
-          }
-        }
+        @hash_tables[0].each do |key,values|
+          values.each {|item| push_lhs(key, item)}
+        end
      end
    end

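
The evaluation order described in the notin comment above (buffer lhs tuples until the first flush, when stratification guarantees the rhs is complete) can be modeled in miniature. This is a hypothetical toy, not PushNotIn itself, and `flush` returns the accumulated output instead of pushing it downstream:

```ruby
require 'set'

# Toy model of notin's buffering: lhs tuples seen before the first flush are
# checked against the complete rhs only at flush time; after that, arriving
# lhs tuples can be checked immediately.
class MiniNotIn
  def initialize
    @lhs_buf = []
    @rhs = Set.new
    @rhs_rcvd = false
    @out = []
  end

  def insert_lhs(t)
    if @rhs_rcvd
      @out << t unless @rhs.include?(t)   # rhs is complete: check immediately
    else
      @lhs_buf << t                       # rhs may still grow: buffer
    end
  end

  def insert_rhs(t)
    @rhs << t
  end

  def flush
    unless @rhs_rcvd
      @rhs_rcvd = true
      @lhs_buf.each {|t| @out << t unless @rhs.include?(t)}
    end
    @out
  end
end

n = MiniNotIn.new
n.insert_lhs(1)
n.insert_rhs(2)
n.insert_lhs(2)   # arrival order doesn't matter before the first flush
n.flush           # => [1] (2 is suppressed by the rhs)
```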
@@ -661,9 +638,15 @@ module Bud
       @delete_keys.each{|o| o.pending_delete_keys([item])}
       @pendings.each{|o| o.pending_merge([item])}
     end
-    end
-    def stratum_end
-      @hash_tables = [{},{}]
-      @rhs_rcvd = false
+
+    def invalidate_cache
+      puts "#{self.class}/#{self.tabname} invalidated" if $BUD_DEBUG
+      @hash_tables = [{},{}]
+      @rhs_rcvd = false
+    end
+
+    def stratum_end
+      @rhs_rcvd = false
     end
   end
 end