bud 0.9.2 → 0.9.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,80 @@
+ Notes on Invalidate and Rescan in Bud
+ =====================================
+
+ (I'll use 'downstream' to mean rhs to lhs (like in budplot). In every stratum,
+ data originates at scanned sources at the "top", winds its way through various
+ PushElements, and ends up in a collection at the "bottom". I'll also use the
+ term "elements" to mean both dataflow nodes (PushElements) and collections.)
+
+ The invalidation strategy works through two flags/signals, rescan and
+ invalidate. Invalidation means that a stateful PushElement's or a scratch's
+ contents are erased, or that a table is negated. Rescan means that the tuples
+ coming out of an element represent the entire collection (a full scan), not
+ just deltas.
+
+ Earlier: all stateful elements were eagerly invalidated.
+ Collections with state: scratches, interfaces, channels, terminal
+ Elements with state: Group, join, sort, reduce, each_with_index
+
+ Now: lazy invalidation where possible, based on the observation that the same
+ state is often rederived downstream, which means that as long as there are no
+ negations, one should be able to go on in incremental mode (working only on
+ deltas, not on storage) from one tick to another.
+
+ Observations:
+
+ 1. There are two kinds of elements that are (or may be) invalidated at the
+ beginning of every tick: source scratches (those that are not found on the
+ lhs of any rule), and tables that process pending negations.
+
+ 2. a. Invalidation of an element implies a rescan of its contents.
+
+ b. A rescan of an element's contents implies invalidation of downstream
+ nodes.
+
+ c. Invalidation involves rebuilding of state, which means that if a node has
+ multiple sources, it has to ask the other sources to rescan as well.
+
+ Example: x, y, z are scratches
+ z <= x.group(....)
+ z <= y.sort {}
+
+ If x is invalidated, it will rescan its contents. The group element then
+ invalidates its state, and rebuilds itself as x is scanned. Since group is
+ in rescan mode, z invalidates its state and is rebuilt from group.
+ However, since part of z's state comes from y.sort, it asks its
+ source element (the sort node) for a rescan as well.
+
+ This push-pull negotiation can be run until fixpoint, at which point the set
+ of elements that need to be invalidated and rescanned is fully determined.
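
The negotiation above can be mimicked with a toy fixpoint loop. This is an illustrative sketch only; `Node`, `stateful`, and `srcs` are stand-ins, not Bud's actual classes, and the propagation rules are the three observations restated in code:

```ruby
require 'set'

# Toy stand-ins for dataflow elements (not Bud's API).
Node = Struct.new(:name, :stateful, :srcs)

x     = Node.new(:x, true, [])          # scratches carry state
y     = Node.new(:y, true, [])
group = Node.new(:group, true, [x])
sort  = Node.new(:sort, true, [y])
z     = Node.new(:z, true, [group, sort])
nodes = [x, y, group, sort, z]

rescan     = Set[x]                     # x was invalidated, so it rescans
invalidate = Set[x]

loop do
  before = [rescan.size, invalidate.size]
  nodes.each do |n|
    # 2b: a rescanning source invalidates a downstream stateful node,
    # which then rescans (dumps) its own rebuilt contents (2a)
    if n.srcs.any? {|s| rescan.include?(s)}
      rescan << n
      invalidate << n if n.stateful
    end
    # 2c: an invalidated node asks *all* of its sources to rescan
    n.srcs.each {|s| rescan << s} if invalidate.include?(n)
    # 3: only a stateless node passes a rescan request further upstream
    n.srcs.each {|s| rescan << s} if rescan.include?(n) && !n.stateful
  end
  break if [rescan.size, invalidate.size] == before
end
```

At fixpoint, sort ends up in the rescan set (it replays its own buffer), but y is never asked to rescan, matching the example.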
+
+ 3. If a node is stateless, it passes the rescan request upstream, and the
+ invalidations downstream. But if it is stateful, it need not pass a rescan
+ request upstream. In the example above, only the sort node needs to rescan
+ its buffer; y doesn't need to be scanned at all.
+
+ 4. Solving the above constraints to a fixpoint at every tick is a huge
+ overhead. So we determine the strategy at wiring time.
+
+ bud.default_invalidate/default_rescan == the set of elements that we know
+ a priori will _always_ need the corresponding signal.
+
+ scanner.invalidate_set/rescan_set == for each scanner, the set of elements
+ to invalidate/rescan should that scanner's collection be negated.
+
+ bud.prepare_invalidation_scheme works as follows.
+
+ Start the process by determining which tables will invalidate at each tick,
+ and which PushElements will rescan at the beginning of each tick. Then run
+ rescan_invalidate_tc for a transitive closure, where each element gets to
+ determine its own presence in the rescan and invalidate sets, depending on
+ its source or target elements' presence in those sets. This creates the
+ default sets.
+
+ Then for each scanner, prime the pump by setting the scanner to rescan mode,
+ and determine what effect it has on the system by running
+ rescan_invalidate_tc. All the elements that are not already in the default
+ sets are those that need to be additionally informed at run time, should we
+ discover that that scanner's collection has been negated at the beginning of
+ each tick.
+
+ The BUD_SAFE environment variable is used to force the old-style behavior,
+ where every cached element is invalidated and fully scanned once every tick.
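
The wiring-time scheme can be sketched in a few lines. Everything here is hypothetical scaffolding (`Elem`, the simplified downstream-only closure), not Bud's real `prepare_invalidation_scheme`; the point is the shape of the computation: build default sets once, then for each scanner record only the elements *beyond* the defaults:

```ruby
require 'set'

# Toy stand-in for a dataflow element (not Bud's API).
Elem = Struct.new(:name, :stateful, :srcs)

# Simplified closure: a rescanning source forces its downstream elements
# to rescan, and invalidates the stateful ones among them.
def rescan_invalidate_tc(elems, rescan, invalidate)
  loop do
    before = [rescan.size, invalidate.size]
    elems.each do |e|
      next unless e.srcs.any? {|s| rescan.include?(s)}
      rescan << e
      invalidate << e if e.stateful
    end
    break if before == [rescan.size, invalidate.size]
  end
end

src    = Elem.new(:src, true, [])      # source scratch: invalidated every tick
sorter = Elem.new(:sorter, true, [src])
tbl    = Elem.new(:t, true, [])        # table: negated only occasionally
proj   = Elem.new(:proj, false, [tbl]) # stateless projection over t
grp    = Elem.new(:grp, true, [proj])
elems  = [src, sorter, tbl, proj, grp]

# Default sets: what happens at the start of *every* tick.
default_rescan     = Set[src]
default_invalidate = Set[src]
rescan_invalidate_tc(elems, default_rescan, default_invalidate)

# Per-scanner sets for t: prime the pump, close, subtract the defaults.
r = default_rescan.dup << tbl
i = default_invalidate.dup
rescan_invalidate_tc(elems, r, i)
t_rescan_set     = r - default_rescan
t_invalidate_set = i - default_invalidate
```

Only `t_rescan_set` and `t_invalidate_set` need to be signalled at run time when t's collection turns out to have been negated; the defaults are handled every tick anyway.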
@@ -144,7 +144,8 @@ module Bud
 # default for stateless elements
 public
 def add_rescan_invalidate(rescan, invalidate)
- # if any of the source elements are in rescan mode, then put this node in rescan.
+ # if any of the source elements are in rescan mode, then put this node in
+ # rescan.
 srcs = non_temporal_predecessors
 if srcs.any?{|p| rescan.member? p}
 rescan << self
@@ -157,7 +158,7 @@ module Bud
 # finally, if this node is in rescan, pass the request on to all source
 # elements
 if rescan.member? self
- rescan += srcs
+ rescan.merge(srcs)
 end
 end

@@ -177,14 +178,12 @@ module Bud
 def <<(i)
 insert(i, nil)
 end
+
 public
 def flush
 end
-
 def invalidate_cache
- #override to get rid of cached information.
 end
- public
 def stratum_end
 end

@@ -220,7 +219,7 @@ module Bud
 def join(elem2, &blk)
 # cached = @bud_instance.push_elems[[self.object_id,:join,[self,elem2], @bud_instance, blk]]
 # if cached.nil?
- elem2 = elem2.to_push_elem unless elem2.class <= PushElement
+ elem2 = elem2.to_push_elem unless elem2.class <= PushElement
 toplevel = @bud_instance.toplevel
 join = Bud::PushSHJoin.new([self, elem2], toplevel.this_rule_context, [])
 self.wire_to(join)
@@ -292,7 +291,6 @@ module Bud
 return g
 end

-
 def argagg(aggname, gbkey_cols, collection, &blk)
 gbkey_cols = gbkey_cols.map{|c| canonicalize_col(c)}
 collection = canonicalize_col(collection)
@@ -353,7 +351,6 @@ module Bud
 end

 def reduce(initial, &blk)
- @memo = initial
 retval = Bud::PushReduce.new("reduce#{Time.new.tv_usec}",
 @bud_instance, @collection_name,
 schema, initial, &blk)
@@ -380,34 +377,18 @@ module Bud
 end
 toplevel.push_elems[[self.object_id, :inspected]]
 end
-
- def to_enum
- # scr = @bud_instance.scratch(("scratch_" + Process.pid.to_s + "_" + object_id.to_s + "_" + rand(10000).to_s).to_sym, schema)
- scr = []
- self.wire_to(scr)
- scr
- end
 end

 class PushStatefulElement < PushElement
- def rescan_at_tick
- true
- end
-
- def rescan
- true # always gives an entire dump of its contents
- end
-
 def add_rescan_invalidate(rescan, invalidate)
- # If an upstream node is set to rescan, a stateful node invalidates its
- # cache. In addition, a stateful node always rescans its own contents
- # (doesn't need to pass a rescan request to its its source nodes).
- rescan << self
- srcs = non_temporal_predecessors
- if srcs.any? {|p| rescan.member? p}
+ if non_temporal_predecessors.any? {|e| rescan.member? e}
+ rescan << self
 invalidate << self
 end

+ # Note that we do not need to pass rescan requests up to our source
+ # elements, since a stateful element has enough local information to
+ # reproduce its output.
 invalidate_tables(rescan, invalidate)
 end
 end
@@ -437,27 +418,29 @@ module Bud
 class PushSort < PushStatefulElement
 def initialize(elem_name=nil, bud_instance=nil, collection_name=nil,
 schema_in=nil, &blk)
- @sortbuf = []
 super(elem_name, bud_instance, collection_name, schema_in, &blk)
+ @sortbuf = []
+ @seen_new_input = false
 end

 def insert(item, source)
 @sortbuf << item
+ @seen_new_input = true
 end

 def flush
- unless @sortbuf.empty?
+ if @seen_new_input || @rescan
 @sortbuf.sort!(&@blk)
 @sortbuf.each do |t|
 push_out(t, false)
 end
- @sortbuf = []
+ @seen_new_input = false
+ @rescan = false
 end
- nil
 end

 def invalidate_cache
- @sortbuf = []
+ @sortbuf.clear
 end
 end

@@ -488,11 +471,14 @@ module Bud
 @invalidate_set = invalidate
 end

- public
 def add_rescan_invalidate(rescan, invalidate)
- # scanner elements are never directly connected to tables.
+ # if the collection is to be invalidated, the scanner needs to be in
+ # rescan mode
 rescan << self if invalidate.member? @collection

+ # in addition, default PushElement rescan/invalidate logic applies
+ super
+
 # Note also that this node can be nominated for rescan by a target node;
 # in other words, a scanner element can be set to rescan even if the
 # collection is not invalidated.
@@ -555,20 +541,15 @@ module Bud
 end

 def add_rescan_invalidate(rescan, invalidate)
- srcs = non_temporal_predecessors
- if srcs.any? {|p| rescan.member? p}
- invalidate << self
- rescan << self
- end
-
- invalidate_tables(rescan, invalidate)
+ super

 # This node has some state (@each_index), but not the tuples. If it is in
 # rescan mode, then it must ask its sources to rescan, and restart its
 # index.
 if rescan.member? self
 invalidate << self
- rescan += srcs
+ srcs = non_temporal_predecessors
+ rescan.merge(srcs)
 end
 end

@@ -1,39 +1,80 @@
 require 'bud/executor/elements'
+ require 'set'

 module Bud
 class PushGroup < PushStatefulElement
- def initialize(elem_name, bud_instance, collection_name, keys_in, aggpairs_in, schema_in, &blk)
- @groups = {}
+ def initialize(elem_name, bud_instance, collection_name,
+ keys_in, aggpairs_in, schema_in, &blk)
 if keys_in.nil?
 @keys = []
 else
 @keys = keys_in.map{|k| k[1]}
 end
- # ap[1] is nil for Count
- @aggpairs = aggpairs_in.map{|ap| ap[1].nil? ? [ap[0]] : [ap[0], ap[1][1]]}
+ # An aggpair is an array: [agg class instance, index of input field].
+ # ap[1] is nil for Count.
+ @aggpairs = aggpairs_in.map{|ap| [ap[0], ap[1].nil? ? nil : ap[1][1]]}
+ @groups = {}
+
+ # Check whether we need to eliminate duplicates from our input (we might
+ # see duplicates because of the rescan/invalidation logic, as well as
+ # because we don't do duplicate elimination on the output of a projection
+ # operator). We don't need to dupelim if all the args are exemplary.
+ @elim_dups = @aggpairs.any? {|a| not a[0].kind_of? ArgExemplary}
+ if @elim_dups
+ @input_cache = Set.new
+ end
+
+ @seen_new_data = false
 super(elem_name, bud_instance, collection_name, schema_in, &blk)
 end

 def insert(item, source)
+ if @elim_dups
+ return if @input_cache.include? item
+ @input_cache << item
+ end
+
+ @seen_new_data = true
 key = @keys.map{|k| item[k]}
- @aggpairs.each_with_index do |ap, agg_ix|
- agg_input = ap[1].nil? ? item : item[ap[1]]
- agg = (@groups[key].nil? or @groups[key][agg_ix].nil?) ? ap[0].send(:init, agg_input) : ap[0].send(:trans, @groups[key][agg_ix], agg_input)[0]
- @groups[key] ||= Array.new(@aggpairs.length)
- @groups[key][agg_ix] = agg
+ group_state = @groups[key]
+ if group_state.nil?
+ @groups[key] = @aggpairs.map do |ap|
+ input_val = ap[1].nil? ? item : item[ap[1]]
+ ap[0].init(input_val)
+ end
+ else
+ @aggpairs.each_with_index do |ap, agg_ix|
+ input_val = ap[1].nil? ? item : item[ap[1]]
+ state_val = ap[0].trans(group_state[agg_ix], input_val)[0]
+ group_state[agg_ix] = state_val
+ end
 end
 end

+ def add_rescan_invalidate(rescan, invalidate)
+ # XXX: need to understand why this is necessary; it is dissimilar to the
+ # way other stateful non-monotonic operators are handled.
+ rescan << self
+ super
+ end
+
 def invalidate_cache
- puts "Group #{qualified_tabname} invalidated" if $BUD_DEBUG
+ puts "#{self.class}/#{self.tabname} invalidated" if $BUD_DEBUG
 @groups.clear
+ @input_cache.clear if @elim_dups
+ @seen_new_data = false
 end

 def flush
+ # If we haven't seen any input since the last call to flush(), we're done:
+ # our output would be the same as before.
+ return unless @seen_new_data
+ @seen_new_data = false
+
 @groups.each do |g, grps|
 grp = @keys == $EMPTY ? [[]] : [g]
 @aggpairs.each_with_index do |ap, agg_ix|
- grp << ap[0].send(:final, grps[agg_ix])
+ grp << ap[0].final(grps[agg_ix])
 end
 outval = grp[0].flatten
 (1..grp.length-1).each {|i| outval << grp[i]}
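
The new `@elim_dups` check rests on a property of "exemplary" aggregates that a plain-Ruby example (not Bud's agg classes) makes concrete: replaying a duplicate tuple cannot change the answer of min or max, whose result is simply an exemplar drawn from the input, but it corrupts accumulating aggregates like sum:

```ruby
# Why PushGroup needs duplicate elimination only when some aggregate is
# non-exemplary: a replayed duplicate leaves min unchanged but corrupts sum.
input = [3, 7, 3]            # the duplicate 3 stands in for a replayed tuple

min_with_dups    = input.min        # exemplary: duplicates are harmless
min_without_dups = input.uniq.min
sum_with_dups    = input.sum        # non-exemplary: duplicate counted twice
sum_without_dups = input.uniq.sum
```

Hence the group operator keeps an `@input_cache` to drop duplicates on the way in, but only when at least one of its aggregates is not `ArgExemplary`.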
@@ -44,31 +85,38 @@ module Bud

 class PushArgAgg < PushGroup
 def initialize(elem_name, bud_instance, collection_name, keys_in, aggpairs_in, schema_in, &blk)
- raise Bud::Error, "multiple aggpairs #{aggpairs_in.map{|a| a.class.name}} in ArgAgg; only one allowed" if aggpairs_in.length > 1
+ unless aggpairs_in.length == 1
+ raise Bud::Error, "multiple aggpairs #{aggpairs_in.map{|a| a.class.name}} in ArgAgg; only one allowed"
+ end
 super(elem_name, bud_instance, collection_name, keys_in, aggpairs_in, schema_in, &blk)
- @agg = @aggpairs[0][0]
- @aggcol = @aggpairs[0][1]
+ @agg, @aggcol = @aggpairs[0]
 @winners = {}
 end

 public
 def invalidate_cache
- puts "#{self.class}/#{self.tabname} invalidated" if $BUD_DEBUG
- @groups.clear
+ super
 @winners.clear
 end

 def insert(item, source)
 key = @keys.map{|k| item[k]}
- @aggpairs.each_with_index do |ap, agg_ix|
- agg_input = item[ap[1]]
- if @groups[key].nil?
- agg = ap[0].send(:init, agg_input)
+ group_state = @groups[key]
+ if group_state.nil?
+ @seen_new_data = true
+ @groups[key] = @aggpairs.map do |ap|
 @winners[key] = [item]
- else
- agg_result = ap[0].send(:trans, @groups[key][agg_ix], agg_input)
- agg = agg_result[0]
- case agg_result[1]
+ input_val = item[ap[1]]
+ ap[0].init(input_val)
+ end
+ else
+ @aggpairs.each_with_index do |ap, agg_ix|
+ input_val = item[ap[1]]
+ state_val, flag, *rest = ap[0].trans(group_state[agg_ix], input_val)
+ group_state[agg_ix] = state_val
+ @seen_new_data = true unless flag == :ignore
+
+ case flag
 when :ignore
 # do nothing
 when :replace
@@ -76,19 +124,22 @@ module Bud
 when :keep
 @winners[key] << item
 when :delete
- agg_result[2..-1].each do |t|
- @winners[key].delete t unless @winners[key].empty?
+ rest.each do |t|
+ @winners[key].delete t
 end
 else
- raise Bud::Error, "strange result from argagg finalizer"
+ raise Bud::Error, "strange result from argagg transition func: #{flag}"
 end
 end
- @groups[key] ||= Array.new(@aggpairs.length)
- @groups[key][agg_ix] = agg
 end
 end

 def flush
+ # If we haven't seen any input since the last call to flush(), we're done:
+ # our output would be the same as before.
+ return unless @seen_new_data
+ @seen_new_data = false
+
 @groups.each_key do |g|
 @winners[g].each do |t|
 push_out(t, false)
@@ -67,14 +67,16 @@ module Bud
 public
 def state_id # :nodoc: all
 object_id
- # Marshal.dump([@rels.map{|r| r.tabname}, @localpreds]).hash
+ end
+
+ def flush
+ replay_join if @rescan
 end

 # initialize the state for this join to be carried across iterations within a fixpoint
 private
 def setup_state
 sid = state_id
-
 @tabname = ("(" + @all_rels_below.map{|r| r.tabname}.join('*') +"):"+sid.to_s).to_sym
 @hash_tables = [{}, {}]
 end
@@ -131,21 +133,21 @@ module Bud
 else
 @keys = []
 end
- # puts "@keys = #{@keys.inspect}"
 end

 public
 def invalidate_cache
 @rels.each_with_index do |source_elem, i|
 if source_elem.rescan
-
 puts "#{tabname} rel:#{i}(#{source_elem.tabname}) invalidated" if $BUD_DEBUG
 @hash_tables[i] = {}
- if i == 0
- # XXX This is not modular. We are doing invalidation work for outer joins, which is part of a
- # separate module PushSHOuterJoin.
- @missing_keys.clear # Only if i == 0 because outer joins in Bloom are left outer joins
- # if i == 1, missing_keys will be corrected when items are populated in the rhs fork
+ if i == 0
+ # Only if i == 0 because outer joins in Bloom are left outer joins.
+ # If i == 1, missing_keys will be corrected when items are populated
+ # in the rhs fork.
+ # XXX This is not modular. We are doing invalidation work for outer
+ # joins, which is part of a separate module PushSHOuterJoin.
+ @missing_keys.clear
 end
 end
 end
@@ -268,11 +270,12 @@ module Bud

 public
 def insert(item, source)
- #puts "JOIN: #{source.tabname} --> #{self.tabname} : #{item}/#{item.class}"
- if @rescan
- replay_join
- @rescan = false
- end
+ # If we need to reproduce the join's output, do that now before we process
+ # the to-be-inserted tuple. This avoids needless duplicates: if the
+ # to-be-inserted tuple produced any join output, we'd produce that output
+ # again if we didn't rescan now.
+ replay_join if @rescan
+
 if @selfjoins.include? source.elem_name
 offsets = []
 @relnames.each_with_index{|r,i| offsets << i if r == source.elem_name}
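
The "replay before processing the delta" rule in that comment can be seen in miniature. The sketch below is illustrative only (plain lambdas, not `PushSHJoin`); one join side is held fixed and we track the pairs emitted downstream:

```ruby
# Toy illustration of why a join replays its cached state *before*
# processing a newly inserted tuple (not Bud's API).
rhs       = {1 => [:r1]}     # join key => rhs tuples
lhs_state = [[1, :l1]]       # lhs tuples already in the hash table
out       = []

replay = lambda do           # re-emit output for all cached lhs tuples
  lhs_state.each {|k, l| (rhs[k] || []).each {|r| out << [l, r]}}
end
insert = lambda do |k, l|    # add a delta tuple and emit its matches
  lhs_state << [k, l]
  (rhs[k] || []).each {|r| out << [l, r]}
end

# correct order: replay the cached state, then process the delta
replay.call
insert.call(1, :l2)
correct = out.dup

# wrong order: the delta lands in the cache first, so the replay
# emits its matches a second time
out.clear
lhs_state = [[1, :l1]]
insert.call(1, :l2)
replay.call
wrong = out.dup
```

With the wrong ordering, the pair for `:l2` appears twice in the output; replaying first keeps the output duplicate-free.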
@@ -308,50 +311,26 @@ module Bud
 end
 end

- public
- def rescan_at_tick
- false
- end
-
- public
- def add_rescan_invalidate(rescan, invalidate)
- if non_temporal_predecessors.any? {|e| rescan.member? e}
- rescan << self
- invalidate << self
- end
-
- # The distinction between a join node and other stateful elements is that
- # when a join node needs a rescan it doesn't tell all its sources to
- # rescan. In fact, it doesn't have to pass a rescan request up to a
- # source, because if a target needs a rescan, the join node has all the
- # state necessary to feed the downstream node. And if a source node is in
- # rescan, then at run-time only the state associated with that particular
- # source node @hash_tables[offset] will be cleared, and will get filled up
- # again because that source will rescan anyway.
- invalidate_tables(rescan, invalidate)
- end
-
 def replay_join
- a = @hash_tables[0]
- b = @hash_tables[1]
-
- if not(a.empty? or b.empty?)
- if a.size < b.size
- a.each_pair do |key, items|
- the_matches = b[key]
- unless the_matches.nil?
- items.each do |item|
- process_matches(item, the_matches, 1)
- end
+ @rescan = false
+ a, b = @hash_tables
+ return if a.empty? or b.empty?
+
+ if a.size < b.size
+ a.each_pair do |key, items|
+ the_matches = b[key]
+ unless the_matches.nil?
+ items.each do |item|
+ process_matches(item, the_matches, 0)
 end
 end
- else
- b.each_pair do |key, items|
- the_matches = a[key]
- unless the_matches.nil?
- items.each do |item|
- process_matches(item, the_matches, 0)
- end
+ end
+ else
+ b.each_pair do |key, items|
+ the_matches = a[key]
+ unless the_matches.nil?
+ items.each do |item|
+ process_matches(item, the_matches, 1)
 end
 end
 end
@@ -489,7 +468,6 @@ module Bud
 end

 module PushSHOuterJoin
-
 private
 def insert_item(item, offset)
 if @keys.nil? or @keys.empty?
@@ -517,6 +495,11 @@ module Bud
 end
 end

+ public
+ def rescan_at_tick
+ true
+ end
+
 public
 def stratum_end
 flush
@@ -525,23 +508,24 @@ module Bud

 private
 def push_missing
- if @missing_keys
- @missing_keys.each do |key|
- @hash_tables[0][key].each do |t|
- push_out([t, @rels[1].null_tuple])
- end
+ @missing_keys.each do |key|
+ @hash_tables[0][key].each do |t|
+ push_out([t, @rels[1].null_tuple])
 end
 end
 end


- # Consider "u <= s.notin(t, s.a => t.b)". notin is a non-monotonic operator, where u depends positively on s,
- # but negatively on t. Stratification ensures that t is fully computed in a lower stratum, which means that we
- # can expect multiple iterators on s's side only. If t's scanner were to push its elemends down first, every
- # insert of s merely needs to be cross checked with the cached elements of 't', and pushed down to the next
- # element if s notin t. However, if s's scanner were to fire first, we have to wait until the first flush, at which
- # point we are sure to have seen all the t-side tuples in this tick.
+ # Consider "u <= s.notin(t, s.a => t.b)". notin is a non-monotonic operator,
+ # where u depends positively on s, but negatively on t. Stratification ensures
+ # that t is fully computed in a lower stratum, which means that we can expect
+ # multiple iterators on s's side only. If t's scanner were to push its
+ # elements down first, every insert of s merely needs to be cross checked with
+ # the cached elements of 't', and pushed down to the next element if s notin
+ # t. However, if s's scanner were to fire first, we have to wait until the
+ # first flush, at which point we are sure to have seen all the t-side tuples
+ # in this tick.
 class PushNotIn < PushStatefulElement
 def initialize(rellist, bud_instance, preds=nil, &blk) # :nodoc: all
 @lhs, @rhs = rellist
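
For readers new to Bloom, the semantics that comment describes can be read off a few lines of plain Ruby. This is only an illustration of what `u <= s.notin(t, s.a => t.b)` computes, not the `PushNotIn` implementation:

```ruby
# Keep each s-tuple whose 'a' field has no match among the 'b' fields of t.
s = [[1, :a1], [2, :a2], [3, :a3]]   # s tuples; field 0 plays the role of s.a
t = [[:b1, 2], [:b2, 9]]             # t tuples; field 1 plays the role of t.b

t_keys = t.map {|tup| tup[1]}        # t is fully computed in a lower stratum
u = s.reject {|tup| t_keys.include?(tup[0])}
# u == [[1, :a1], [3, :a3]]
```

The dataflow element caches the t side for exactly this reason: once t is known to be complete, each s-tuple can be checked and pushed (or suppressed) immediately.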
@@ -552,7 +536,6 @@ module Bud
 setup_preds(preds) unless preds.empty?
 @rhs_rcvd = false
 @hash_tables = [{},{}]
- @rhs_rcvd = false
 if @lhs_keycols.nil? and blk.nil?
 # pointwise comparison. Could use zip, but it creates an array for each field pair
 blk = lambda {|lhs, rhs|
@@ -563,9 +546,10 @@ module Bud
 end

 def setup_preds(preds)
- # This is simpler than PushSHJoin's setup_preds, because notin is a binary operator where both lhs and rhs are
- # collections.
- # preds an array of hash_pairs. For now assume that the attributes are in the same order as the tables.
+ # This is simpler than PushSHJoin's setup_preds, because notin is a binary
+ # operator where both lhs and rhs are collections. preds is an array of
+ # hash_pairs. For now assume that the attributes are in the same order as
+ # the tables.
 @lhs_keycols, @rhs_keycols = preds.reduce([[], []]) do |memo, item|
 # each item is a hash
 l = item.keys[0]
@@ -578,11 +562,11 @@ module Bud
 def find_col(colspec, rel)
 if colspec.is_a? Symbol
 col_desc = rel.send(colspec)
- raise "Unknown column #{rel} in #{@rel.tabname}" if col_desc.nil?
+ raise Bud::Error, "unknown column #{colspec} in #{@rel.tabname}" if col_desc.nil?
 elsif colspec.is_a? Array
 col_desc = colspec
 else
- raise "Symbol or column spec expected. Got #{colspec}"
+ raise Bud::Error, "symbol or column spec expected. Got #{colspec}"
 end
 col_desc[1] # col_desc is of the form [tabname, colnum, colname]
 end
@@ -592,11 +576,6 @@ module Bud
 keycols.nil? ? $EMPTY : keycols.map{|col| item[col]}
 end

- public
- def invalidate_at_tick
- true
- end
-
 public
 def rescan_at_tick
 true
@@ -605,7 +584,6 @@ module Bud
 def insert(item, source)
 offset = source == @lhs ? 0 : 1
 key = get_key(item, offset)
- #puts "#{key}, #{item}, #{offset}"
 (@hash_tables[offset][key] ||= Set.new).add item
 if @rhs_rcvd and offset == 0
 push_lhs(key, item)
@@ -613,15 +591,14 @@ module Bud
 end

 def flush
- # When flush is called the first time, both lhs and rhs scanners have been invoked, and because of stratification
- # we know that the rhs is not growing any more, until the next tick.
+ # When flush is called the first time, both lhs and rhs scanners have been
+ # invoked, and because of stratification we know that the rhs is not
+ # growing any more, until the next tick.
 unless @rhs_rcvd
 @rhs_rcvd = true
- @hash_tables[0].map{|key,values|
- values.each{|item|
- push_lhs(key, item)
- }
- }
+ @hash_tables[0].each do |key,values|
+ values.each {|item| push_lhs(key, item)}
+ end
 end
 end

@@ -661,9 +638,15 @@ module Bud
 @delete_keys.each{|o| o.pending_delete_keys([item])}
 @pendings.each{|o| o.pending_merge([item])}
 end
- end
- def stratum_end
- @hash_tables = [{},{}]
- @rhs_rcvd = false
+
+ def invalidate_cache
+ puts "#{self.class}/#{self.tabname} invalidated" if $BUD_DEBUG
+ @hash_tables = [{},{}]
+ @rhs_rcvd = false
+ end
+
+ def stratum_end
+ @rhs_rcvd = false
+ end
 end
 end