bud 0.9.2 → 0.9.3
- data/History.txt +17 -4
- data/README.md +5 -0
- data/docs/cheat.md +1 -1
- data/docs/getstarted.md +1 -1
- data/lib/bud.rb +52 -34
- data/lib/bud/aggs.rb +8 -11
- data/lib/bud/bud_meta.rb +18 -18
- data/lib/bud/collections.rb +14 -21
- data/lib/bud/executor/README.rescan +80 -0
- data/lib/bud/executor/elements.rb +25 -44
- data/lib/bud/executor/group.rb +80 -29
- data/lib/bud/executor/join.rb +73 -90
- data/lib/bud/monkeypatch.rb +1 -1
- data/lib/bud/rebl.rb +5 -2
- data/lib/bud/rewrite.rb +18 -14
- data/lib/bud/server.rb +1 -1
- data/lib/bud/source.rb +0 -45
- data/lib/bud/storage/dbm.rb +13 -9
- data/lib/bud/viz.rb +6 -8
- data/lib/bud/viz_util.rb +1 -0
- metadata +3 -18
data/lib/bud/executor/README.rescan
ADDED
@@ -0,0 +1,80 @@
+Notes on Invalidate and Rescan in Bud
+=====================================
+
+(I'll use 'downstream' to mean rhs to lhs (like in budplot). In every stratum,
+data originates at scanned sources at the "top", winds its way through various
+PushElements and ends up in a collection at the "bottom". I'll also use the term
+"elements" to mean both dataflow nodes (PushElements) and collections).
+
+The invalidation strategy works through two flags/signals, rescan and
+invalidate. Invalidation means a stateful PushElement's or a scratch's contents
+are erased, or a table is negated. Rescan means that the tuples coming out of an
+element represent the entire collection (a full scan), not just deltas.
+
+Earlier: all stateful elements were eagerly invalidated.
+Collections with state: scratches, interfaces, channels, terminal
+Elements with state: Group, join, sort, reduce, each_with_index
+
+Now: lazy invalidation where possible, based on the observation that the same
+state is often rederived downstream, which means that as long as there are no
+negations, one should be able to go on in incremental mode (working only on
+deltas, not on storage) from one tick to another.
+
+Observations:
+
+1. There are two kinds of elements that are (or may be) invalidated at the
+   beginning of every tick: source scratches (those that are not found on the
+   lhs of any rule), and tables that process pending negations.
+
+2. a. Invalidation implies a rescan of the element's contents.
+
+   b. A rescan of an element's contents implies invalidation of downstream nodes.
+
+   c. Invalidation involves rebuilding of state, which means that if a node has
+      multiple sources, it has to ask the other sources to rescan as well.
+
+      Example: x, y, z are scratches
+         z <= x.group(....)
+         z <= y.sort {}
+
+      If x is invalidated, it will rescan its contents. The group element then
+      invalidates its state, and rebuilds itself as x is scanned. Since group is
+      in rescan mode, z invalidates its state and is rebuilt from group.
+      However, since part of z's state comes from y.sort, it asks its
+      source element (the sort node) for a rescan as well.
+
+      This push-pull negotiation can be run until fixpoint, until the set of
+      elements that need to be invalidated and rescanned is fully determined.
+
+3. If a node is stateless, it passes the rescan request upstream, and the
+   invalidations downstream. But if it is stateful, it need not pass a rescan
+   request upstream. In the example above, only the sort node needs to rescan
+   its buffer; y doesn't need to be scanned at all.
+
+4. Solving the above constraints to a fixpoint at every tick is a huge
+   overhead. So we determine the strategy at wiring time.
+
+   bud.default_invalidate/default_rescan == the set of elements that we know
+   a priori will _always_ need the corresponding signal.
+
+   scanner.invalidate_set/rescan_set == for each scanner, the set of elements
+   to invalidate/rescan should that scanner's collection be negated.
+
+bud.prepare_invalidation_scheme works as follows.
+
+Start the process by determining which tables will invalidate at each tick,
+and which PushElements will rescan at the beginning of each tick. Then run
+rescan_invalidate_tc for a transitive closure, where each element gets to
+determine its own presence in the rescan and invalidate sets, depending on
+its source or target elements' presence in those sets. This creates the
+default sets.
+
+Then for each scanner, prime the pump by setting the scanner to rescan mode,
+and determine what effect it has on the system by running
+rescan_invalidate_tc. All the elements that are not already in the default
+sets are those that need to be additionally informed at run time, should we
+discover that that scanner's collection has been negated at the beginning of
+each tick.
+
+The BUD_SAFE environment variable is used to force the old-style behavior, where
+every cached element is invalidated and fully scanned once every tick.
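The propagation rules above are compact enough to model directly. The sketch below is not Bud's rescan_invalidate_tc; the Node struct and the rescan_invalidate_fixpoint helper are invented here purely to make observations 2(a)-2(c) and 3 concrete, using the x/y/z example from the notes.

require 'set'

Node = Struct.new(:name, :sources, :stateful)

def rescan_invalidate_fixpoint(nodes, rescan, invalidate)
  loop do
    before = [rescan.size, invalidate.size]
    nodes.each do |n|
      # 2(b): a full scan arriving from any source forces this node to rebuild
      invalidate << n if n.sources.any? {|s| rescan.include?(s)}
      # 2(a): a node that rebuilds re-emits its entire contents
      rescan << n if invalidate.include?(n)
      # 2(c): a rebuilding node needs full input, so all of its sources must rescan
      n.sources.each {|s| rescan << s} if invalidate.include?(n)
      # 3: only a stateless node forwards the rescan request to its own sources;
      #    a stateful node can replay from its local cache instead
      n.sources.each {|s| rescan << s} if rescan.include?(n) && !n.stateful
    end
    break if [rescan.size, invalidate.size] == before   # fixpoint reached
  end
  [rescan, invalidate]
end

# The x/y/z program above: x and y are source scratches, group and sort are
# stateful elements, z is the stateless sink fed by both rules.
x     = Node.new(:x, [], false)
y     = Node.new(:y, [], false)
group = Node.new(:group, [x], true)
sort  = Node.new(:sort, [y], true)
z     = Node.new(:z, [group, sort], false)

rescan, invalidate = rescan_invalidate_fixpoint([x, y, group, sort, z],
                                                Set.new([x]), Set.new([x]))
rescan.map(&:name).sort      # => [:group, :sort, :x, :z]
invalidate.map(&:name).sort  # => [:group, :x, :z]

As the notes predict, group and z end up in both sets, sort only needs to rescan its buffer, and y is never touched.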
data/lib/bud/executor/elements.rb
CHANGED
@@ -144,7 +144,8 @@ module Bud
     # default for stateless elements
     public
     def add_rescan_invalidate(rescan, invalidate)
-      # if any of the source elements are in rescan mode, then put this node in
+      # if any of the source elements are in rescan mode, then put this node in
+      # rescan.
       srcs = non_temporal_predecessors
       if srcs.any?{|p| rescan.member? p}
         rescan << self
@@ -157,7 +158,7 @@ module Bud
       # finally, if this node is in rescan, pass the request on to all source
       # elements
       if rescan.member? self
-        rescan
+        rescan.merge(srcs)
       end
     end

@@ -177,14 +178,12 @@ module Bud
     def <<(i)
       insert(i, nil)
     end
+
     public
     def flush
     end
-
     def invalidate_cache
-      #override to get rid of cached information.
     end
-    public
     def stratum_end
     end

@@ -220,7 +219,7 @@ module Bud
     def join(elem2, &blk)
       # cached = @bud_instance.push_elems[[self.object_id,:join,[self,elem2], @bud_instance, blk]]
       # if cached.nil?
-      elem2
+      elem2 = elem2.to_push_elem unless elem2.class <= PushElement
       toplevel = @bud_instance.toplevel
       join = Bud::PushSHJoin.new([self, elem2], toplevel.this_rule_context, [])
       self.wire_to(join)
@@ -292,7 +291,6 @@ module Bud
       return g
     end

-
     def argagg(aggname, gbkey_cols, collection, &blk)
       gbkey_cols = gbkey_cols.map{|c| canonicalize_col(c)}
       collection = canonicalize_col(collection)
@@ -353,7 +351,6 @@ module Bud
     end

     def reduce(initial, &blk)
-      @memo = initial
       retval = Bud::PushReduce.new("reduce#{Time.new.tv_usec}",
                                    @bud_instance, @collection_name,
                                    schema, initial, &blk)
@@ -380,34 +377,18 @@ module Bud
     end
     toplevel.push_elems[[self.object_id, :inspected]]
     end
-
-    def to_enum
-      # scr = @bud_instance.scratch(("scratch_" + Process.pid.to_s + "_" + object_id.to_s + "_" + rand(10000).to_s).to_sym, schema)
-      scr = []
-      self.wire_to(scr)
-      scr
-    end
   end

   class PushStatefulElement < PushElement
-    def rescan_at_tick
-      true
-    end
-
-    def rescan
-      true # always gives an entire dump of its contents
-    end
-
     def add_rescan_invalidate(rescan, invalidate)
-
-
-      # (doesn't need to pass a rescan request to its its source nodes).
-      rescan << self
-      srcs = non_temporal_predecessors
-      if srcs.any? {|p| rescan.member? p}
+      if non_temporal_predecessors.any? {|e| rescan.member? e}
+        rescan << self
         invalidate << self
       end

+      # Note that we do not need to pass rescan requests up to our source
+      # elements, since a stateful element has enough local information to
+      # reproduce its output.
       invalidate_tables(rescan, invalidate)
     end
   end
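Read together, the stateless PushElement#add_rescan_invalidate above and the PushStatefulElement override form the propagation contract the README describes. A condensed, self-contained restatement follows as a reading aid only: sources stands in for non_temporal_predecessors, and the table-invalidation step is omitted.

require 'set'

class StatelessNode
  attr_accessor :sources
  def initialize; @sources = []; end

  def add_rescan_invalidate(rescan, invalidate)
    # a full scan arriving from any source flows straight through this node
    rescan << self if @sources.any? {|p| rescan.member? p}
    # and a node with no cache of its own must ask every source to rescan too
    rescan.merge(@sources) if rescan.member? self
  end
end

class StatefulNode < StatelessNode
  def add_rescan_invalidate(rescan, invalidate)
    # a full scan from a source makes the local cache stale: rebuild and re-emit
    if @sources.any? {|p| rescan.member? p}
      rescan << self
      invalidate << self
    end
    # nothing is pushed upstream: the cache can replay this node's output
  end
end

rescan, invalidate = Set.new, Set.new
src = StatelessNode.new
map = StatelessNode.new; map.sources = [src]
grp = StatefulNode.new;  grp.sources = [map]
rescan << src
[map, grp].each {|n| n.add_rescan_invalidate(rescan, invalidate)}
invalidate.include?(grp)  # => true: grp rebuilds from the full scan flowing through map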
@@ -437,27 +418,29 @@ module Bud
   class PushSort < PushStatefulElement
     def initialize(elem_name=nil, bud_instance=nil, collection_name=nil,
                    schema_in=nil, &blk)
-      @sortbuf = []
       super(elem_name, bud_instance, collection_name, schema_in, &blk)
+      @sortbuf = []
+      @seen_new_input = false
     end

     def insert(item, source)
       @sortbuf << item
+      @seen_new_input = true
     end

     def flush
-
+      if @seen_new_input || @rescan
         @sortbuf.sort!(&@blk)
         @sortbuf.each do |t|
           push_out(t, false)
         end
-        @
+        @seen_new_input = false
+        @rescan = false
       end
-      nil
     end

     def invalidate_cache
-      @sortbuf
+      @sortbuf.clear
     end
   end

@@ -488,11 +471,14 @@ module Bud
       @invalidate_set = invalidate
     end

-    public
     def add_rescan_invalidate(rescan, invalidate)
-      #
+      # if the collection is to be invalidated, the scanner needs to be in
+      # rescan mode
       rescan << self if invalidate.member? @collection

+      # in addition, default PushElement rescan/invalidate logic applies
+      super
+
       # Note also that this node can be nominated for rescan by a target node;
       # in other words, a scanner element can be set to rescan even if the
       # collection is not invalidated.
@@ -555,20 +541,15 @@ module Bud
     end

     def add_rescan_invalidate(rescan, invalidate)
-
-      if srcs.any? {|p| rescan.member? p}
-        invalidate << self
-        rescan << self
-      end
-
-      invalidate_tables(rescan, invalidate)
+      super

       # This node has some state (@each_index), but not the tuples. If it is in
       # rescan mode, then it must ask its sources to rescan, and restart its
       # index.
       if rescan.member? self
         invalidate << self
-
+        srcs = non_temporal_predecessors
+        rescan.merge(srcs)
       end
     end

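The PushSort hunk above adds a @seen_new_input flag so that flush only re-sorts and re-emits when something changed or a rescan was requested. A minimal standalone model of that contract (ToySort and its block-based push_out are invented for illustration, not the real element):

class ToySort
  def initialize(&cmp)
    @cmp = cmp
    @sortbuf = []
    @seen_new_input = false
    @rescan = false
  end

  def insert(item)
    @sortbuf << item
    @seen_new_input = true
  end

  def rescan!                 # a downstream node asked for a full replay
    @rescan = true
  end

  def flush(&push_out)
    return unless @seen_new_input || @rescan
    @sortbuf.sort!(&@cmp)
    @sortbuf.each {|t| push_out.call(t)}
    @seen_new_input = false
    @rescan = false
  end

  def invalidate_cache        # mirrors the fix above: actually clear the buffer
    @sortbuf.clear
  end
end

s = ToySort.new {|a, b| a <=> b}
[3, 1, 2].each {|t| s.insert(t)}
s.flush {|t| p t}   # 1, 2, 3
s.flush {|t| p t}   # nothing: no new input, no rescan
s.rescan!
s.flush {|t| p t}   # 1, 2, 3 again: the buffered state is replayed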
data/lib/bud/executor/group.rb
CHANGED
@@ -1,39 +1,80 @@
 require 'bud/executor/elements'
+require 'set'

 module Bud
   class PushGroup < PushStatefulElement
-    def initialize(elem_name, bud_instance, collection_name,
-
+    def initialize(elem_name, bud_instance, collection_name,
+                   keys_in, aggpairs_in, schema_in, &blk)
       if keys_in.nil?
         @keys = []
       else
         @keys = keys_in.map{|k| k[1]}
       end
-      #
-
+      # An aggpair is an array: [agg class instance, index of input field].
+      # ap[1] is nil for Count.
+      @aggpairs = aggpairs_in.map{|ap| [ap[0], ap[1].nil? ? nil : ap[1][1]]}
+      @groups = {}
+
+      # Check whether we need to eliminate duplicates from our input (we might
+      # see duplicates because of the rescan/invalidation logic, as well as
+      # because we don't do duplicate elimination on the output of a projection
+      # operator). We don't need to dupelim if all the args are exemplary.
+      @elim_dups = @aggpairs.any? {|a| not a[0].kind_of? ArgExemplary}
+      if @elim_dups
+        @input_cache = Set.new
+      end
+
+      @seen_new_data = false
       super(elem_name, bud_instance, collection_name, schema_in, &blk)
     end

     def insert(item, source)
+      if @elim_dups
+        return if @input_cache.include? item
+        @input_cache << item
+      end
+
+      @seen_new_data = true
       key = @keys.map{|k| item[k]}
-
-
-
-
-
+      group_state = @groups[key]
+      if group_state.nil?
+        @groups[key] = @aggpairs.map do |ap|
+          input_val = ap[1].nil? ? item : item[ap[1]]
+          ap[0].init(input_val)
+        end
+      else
+        @aggpairs.each_with_index do |ap, agg_ix|
+          input_val = ap[1].nil? ? item : item[ap[1]]
+          state_val = ap[0].trans(group_state[agg_ix], input_val)[0]
+          group_state[agg_ix] = state_val
+        end
       end
     end

+    def add_rescan_invalidate(rescan, invalidate)
+      # XXX: need to understand why this is necessary; it is dissimilar to the
+      # way other stateful non-monotonic operators are handled.
+      rescan << self
+      super
+    end
+
     def invalidate_cache
-      puts "
+      puts "#{self.class}/#{self.tabname} invalidated" if $BUD_DEBUG
       @groups.clear
+      @input_cache.clear if @elim_dups
+      @seen_new_data = false
     end

     def flush
+      # If we haven't seen any input since the last call to flush(), we're done:
+      # our output would be the same as before.
+      return unless @seen_new_data
+      @seen_new_data = false
+
       @groups.each do |g, grps|
         grp = @keys == $EMPTY ? [[]] : [g]
         @aggpairs.each_with_index do |ap, agg_ix|
-          grp << ap[0].
+          grp << ap[0].final(grps[agg_ix])
         end
         outval = grp[0].flatten
         (1..grp.length-1).each {|i| outval << grp[i]}
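The rewritten PushGroup#insert above drives each group through the aggregate protocol visible in the diff: init on a group's first input value, trans to fold in each later value (the first element of its result is the new state), and final at flush time. A self-contained toy of that loop; SumAgg is an invented stand-in, not one of Bud's aggregates:

class SumAgg
  def init(val);         val;           end
  def trans(state, val); [state + val]; end   # first element is the new state
  def final(state);      state;         end
end

def group(items, key_cols, aggpairs)
  groups = {}
  items.each do |item|
    key = key_cols.map {|k| item[k]}
    if groups[key].nil?
      groups[key] = aggpairs.map {|agg, col| agg.init(item[col])}
    else
      aggpairs.each_with_index do |(agg, col), ix|
        groups[key][ix] = agg.trans(groups[key][ix], item[col])[0]
      end
    end
  end
  groups.map do |key, states|
    key + aggpairs.zip(states).map {|(agg, _), st| agg.final(st)}
  end
end

items = [[:a, 1], [:a, 2], [:b, 5]]
p group(items, [0], [[SumAgg.new, 1]])   # => [[:a, 3], [:b, 5]]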
@@ -44,31 +85,38 @@ module Bud

   class PushArgAgg < PushGroup
     def initialize(elem_name, bud_instance, collection_name, keys_in, aggpairs_in, schema_in, &blk)
-
+      unless aggpairs_in.length == 1
+        raise Bud::Error, "multiple aggpairs #{aggpairs_in.map{|a| a.class.name}} in ArgAgg; only one allowed"
+      end
       super(elem_name, bud_instance, collection_name, keys_in, aggpairs_in, schema_in, &blk)
-      @agg = @aggpairs[0]
-      @aggcol = @aggpairs[0][1]
+      @agg, @aggcol = @aggpairs[0]
       @winners = {}
     end

     public
     def invalidate_cache
-
-      @groups.clear
+      super
       @winners.clear
     end

     def insert(item, source)
       key = @keys.map{|k| item[k]}
-
-
-
-
+      group_state = @groups[key]
+      if group_state.nil?
+        @seen_new_data = true
+        @groups[key] = @aggpairs.map do |ap|
          @winners[key] = [item]
-
-
-
-
+          input_val = item[ap[1]]
+          ap[0].init(input_val)
+        end
+      else
+        @aggpairs.each_with_index do |ap, agg_ix|
+          input_val = item[ap[1]]
+          state_val, flag, *rest = ap[0].trans(group_state[agg_ix], input_val)
+          group_state[agg_ix] = state_val
+          @seen_new_data = true unless flag == :ignore
+
+          case flag
           when :ignore
             # do nothing
           when :replace
@@ -76,19 +124,22 @@ module Bud
           when :keep
             @winners[key] << item
           when :delete
-
-            @winners[key].delete t
+            rest.each do |t|
+              @winners[key].delete t
             end
           else
-            raise Bud::Error, "strange result from argagg
+            raise Bud::Error, "strange result from argagg transition func: #{flag}"
           end
         end
-        @groups[key] ||= Array.new(@aggpairs.length)
-        @groups[key][agg_ix] = agg
       end
     end

     def flush
+      # If we haven't seen any input since the last call to flush(), we're done:
+      # our output would be the same as before.
+      return unless @seen_new_data
+      @seen_new_data = false
+
       @groups.each_key do |g|
         @winners[g].each do |t|
           push_out(t, false)
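PushArgAgg#insert above keys everything off the flag returned by trans -- :ignore, :replace, :keep, or :delete (with the tuples to drop carried in the rest of the result) -- to maintain the @winners table incrementally. A toy argmax built on the same flag protocol; MaxAgg here is invented for illustration and only exercises :ignore/:replace/:keep:

class MaxAgg
  def init(val); val; end

  # returns [new_state, flag]; the flag tells the caller what to do with the
  # tuple that carried this value
  def trans(state, val)
    if    val >  state then [val,   :replace]
    elsif val == state then [state, :keep]
    else                    [state, :ignore]
    end
  end
end

def argmax(items, key_col, agg_col, agg = MaxAgg.new)
  state, winners = {}, {}
  items.each do |item|
    key = item[key_col]
    if state[key].nil?
      state[key]   = agg.init(item[agg_col])
      winners[key] = [item]
    else
      state[key], flag = agg.trans(state[key], item[agg_col])
      case flag
      when :ignore  then next
      when :replace then winners[key] = [item]
      when :keep    then winners[key] << item
      end
    end
  end
  winners.values.flatten(1)
end

people = [[:emp, "alice", 30], [:emp, "bob", 41], [:emp, "carol", 41]]
p argmax(people, 0, 2)   # => bob's and carol's tuples (tied maximum age)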
data/lib/bud/executor/join.rb
CHANGED
@@ -67,14 +67,16 @@ module Bud
     public
     def state_id # :nodoc: all
       object_id
-
+    end
+
+    def flush
+      replay_join if @rescan
     end

     # initialize the state for this join to be carried across iterations within a fixpoint
     private
     def setup_state
       sid = state_id
-
       @tabname = ("(" + @all_rels_below.map{|r| r.tabname}.join('*') +"):"+sid.to_s).to_sym
       @hash_tables = [{}, {}]
     end
@@ -131,21 +133,21 @@ module Bud
       else
         @keys = []
       end
-      # puts "@keys = #{@keys.inspect}"
     end

     public
     def invalidate_cache
       @rels.each_with_index do |source_elem, i|
         if source_elem.rescan
-
           puts "#{tabname} rel:#{i}(#{source_elem.tabname}) invalidated" if $BUD_DEBUG
           @hash_tables[i] = {}
-          if
-            #
-            #
-
-            #
+          if i == 0
+            # Only if i == 0 because outer joins in Bloom are left outer joins.
+            # If i == 1, missing_keys will be corrected when items are populated
+            # in the rhs fork.
+            # XXX This is not modular. We are doing invalidation work for outer
+            # joins, which is part of a separate module PushSHOuterJoin.
+            @missing_keys.clear
           end
         end
       end
@@ -268,11 +270,12 @@ module Bud

     public
     def insert(item, source)
-      #
-      if
-
-
-
+      # If we need to reproduce the join's output, do that now before we process
+      # the to-be-inserted tuple. This avoids needless duplicates: if the
+      # to-be-inserted tuple produced any join output, we'd produce that output
+      # again if we didn't rescan now.
+      replay_join if @rescan
+
       if @selfjoins.include? source.elem_name
         offsets = []
         @relnames.each_with_index{|r,i| offsets << i if r == source.elem_name}
@@ -308,50 +311,26 @@ module Bud
       end
     end

-    public
-    def rescan_at_tick
-      false
-    end
-
-    public
-    def add_rescan_invalidate(rescan, invalidate)
-      if non_temporal_predecessors.any? {|e| rescan.member? e}
-        rescan << self
-        invalidate << self
-      end
-
-      # The distinction between a join node and other stateful elements is that
-      # when a join node needs a rescan it doesn't tell all its sources to
-      # rescan. In fact, it doesn't have to pass a rescan request up to a
-      # source, because if a target needs a rescan, the join node has all the
-      # state necessary to feed the downstream node. And if a source node is in
-      # rescan, then at run-time only the state associated with that particular
-      # source node @hash_tables[offset] will be cleared, and will get filled up
-      # again because that source will rescan anyway.
-      invalidate_tables(rescan, invalidate)
-    end
-
     def replay_join
-
-      b = @hash_tables
-
-
-
-
-
-
-
-
-      end
+      @rescan = false
+      a, b = @hash_tables
+      return if a.empty? or b.empty?
+
+      if a.size < b.size
+        a.each_pair do |key, items|
+          the_matches = b[key]
+          unless the_matches.nil?
+            items.each do |item|
+              process_matches(item, the_matches, 0)
            end
          end
-
-
-
-
-
-
-
+        end
+      else
+        b.each_pair do |key, items|
+          the_matches = a[key]
+          unless the_matches.nil?
+            items.each do |item|
+              process_matches(item, the_matches, 1)
            end
          end
        end
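The new replay_join above walks whichever hash table is smaller and probes the other, so replaying cached matches costs time proportional to the smaller side. The same idea as a self-contained sketch over plain arrays (process_matches and the per-relation bookkeeping of PushSHJoin are elided; both inputs are assumed to join on their first column):

def replay_join(left, right)
  out = []
  a = left.group_by  {|t| t[0]}   # hash table keyed on the join column
  b = right.group_by {|t| t[0]}
  return out if a.empty? || b.empty?

  # probe with the smaller table, build against the larger one
  probe, build, swapped = a.size < b.size ? [a, b, false] : [b, a, true]
  probe.each_pair do |key, items|
    matches = build[key]
    next if matches.nil?
    items.each do |i|
      matches.each {|m| out << (swapped ? [m, i] : [i, m])}
    end
  end
  out
end

left  = [[1, :a], [2, :b]]
right = [[1, :x], [1, :y], [3, :z]]
p replay_join(left, right)   # => [[[1, :a], [1, :x]], [[1, :a], [1, :y]]]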
@@ -489,7 +468,6 @@ module Bud
   end

   module PushSHOuterJoin
-
     private
     def insert_item(item, offset)
       if @keys.nil? or @keys.empty?
@@ -517,6 +495,11 @@ module Bud
       end
     end

+    public
+    def rescan_at_tick
+      true
+    end
+
     public
     def stratum_end
       flush
@@ -525,23 +508,24 @@ module Bud

     private
     def push_missing
-
-      @
-      @
-          push_out([t, @rels[1].null_tuple])
-        end
+      @missing_keys.each do |key|
+        @hash_tables[0][key].each do |t|
+          push_out([t, @rels[1].null_tuple])
         end
       end
     end
   end

-  # Consider "u <= s.notin(t, s.a => t.b)". notin is a non-monotonic operator,
-  # but negatively on t. Stratification ensures
-  #
-  #
-  #
-  #
+  # Consider "u <= s.notin(t, s.a => t.b)". notin is a non-monotonic operator,
+  # where u depends positively on s, but negatively on t. Stratification ensures
+  # that t is fully computed in a lower stratum, which means that we can expect
+  # multiple iterators on s's side only. If t's scanner were to push its
+  # elements down first, every insert of s merely needs to be cross checked with
+  # the cached elements of 't', and pushed down to the next element if s notin
+  # t. However, if s's scanner were to fire first, we have to wait until the
+  # first flush, at which point we are sure to have seen all the t-side tuples
+  # in this tick.
   class PushNotIn < PushStatefulElement
     def initialize(rellist, bud_instance, preds=nil, &blk) # :nodoc: all
       @lhs, @rhs = rellist
@@ -552,7 +536,6 @@ module Bud
       setup_preds(preds) unless preds.empty?
       @rhs_rcvd = false
       @hash_tables = [{},{}]
-      @rhs_rcvd = false
       if @lhs_keycols.nil? and blk.nil?
         # pointwise comparison. Could use zip, but it creates an array for each field pair
         blk = lambda {|lhs, rhs|
@@ -563,9 +546,10 @@ module Bud
     end

     def setup_preds(preds)
-      # This is simpler than PushSHJoin's setup_preds, because notin is a binary
-      # collections.
-      #
+      # This is simpler than PushSHJoin's setup_preds, because notin is a binary
+      # operator where both lhs and rhs are collections. preds is an array of
+      # hash_pairs. For now assume that the attributes are in the same order as
+      # the tables.
       @lhs_keycols, @rhs_keycols = preds.reduce([[], []]) do |memo, item|
         # each item is a hash
         l = item.keys[0]
@@ -578,11 +562,11 @@ module Bud
     def find_col(colspec, rel)
       if colspec.is_a? Symbol
         col_desc = rel.send(colspec)
-        raise "
+        raise Bud::Error, "unknown column #{colspec} in #{@rel.tabname}" if col_desc.nil?
       elsif colspec.is_a? Array
         col_desc = colspec
       else
-        raise "
+        raise Bud::Error, "symbol or column spec expected. Got #{colspec}"
       end
       col_desc[1] # col_desc is of the form [tabname, colnum, colname]
     end
@@ -592,11 +576,6 @@ module Bud
       keycols.nil? ? $EMPTY : keycols.map{|col| item[col]}
     end

-    public
-    def invalidate_at_tick
-      true
-    end
-
     public
     def rescan_at_tick
       true
@@ -605,7 +584,6 @@ module Bud
     def insert(item, source)
       offset = source == @lhs ? 0 : 1
       key = get_key(item, offset)
-      #puts "#{key}, #{item}, #{offset}"
       (@hash_tables[offset][key] ||= Set.new).add item
       if @rhs_rcvd and offset == 0
         push_lhs(key, item)
@@ -613,15 +591,14 @@ module Bud
     end

     def flush
-      # When flush is called the first time, both lhs and rhs scanners have been
-      # we know that the rhs is not
+      # When flush is called the first time, both lhs and rhs scanners have been
+      # invoked, and because of stratification we know that the rhs is not
+      # growing any more, until the next tick.
       unless @rhs_rcvd
         @rhs_rcvd = true
-        @hash_tables[0].
-          values.each{|item|
-
-          }
-        }
+        @hash_tables[0].each do |key,values|
+          values.each {|item| push_lhs(key, item)}
+        end
       end
     end

@@ -661,9 +638,15 @@ module Bud
       @delete_keys.each{|o| o.pending_delete_keys([item])}
       @pendings.each{|o| o.pending_merge([item])}
     end
-
-
-
-
+
+    def invalidate_cache
+      puts "#{self.class}/#{self.tabname} invalidated" if $BUD_DEBUG
+      @hash_tables = [{},{}]
+      @rhs_rcvd = false
+    end
+
+    def stratum_end
+      @rhs_rcvd = false
+    end
   end
 end
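The comment block added above for PushNotIn is essentially an ordering argument: lhs tuples must be buffered until the rhs is known to be complete, which stratification guarantees by the time of the first flush in a tick; after that, each lhs insert can be checked against the rhs cache immediately. A small invented model of that behavior (ToyNotIn is not Bud's element):

require 'set'

class ToyNotIn
  def initialize(&out)
    @rhs      = Set.new
    @lhs_buf  = []
    @rhs_rcvd = false
    @out      = out
  end

  def insert_rhs(t)
    @rhs << t
  end

  def insert_lhs(t)
    if @rhs_rcvd
      @out.call(t) unless @rhs.include?(t)   # rhs is final: emit immediately
    else
      @lhs_buf << t                          # rhs may still grow: hold the tuple
    end
  end

  def flush
    return if @rhs_rcvd
    @rhs_rcvd = true                         # stratification: rhs is complete now
    @lhs_buf.each {|t| @out.call(t) unless @rhs.include?(t)}
    @lhs_buf.clear
  end
end

n = ToyNotIn.new {|t| p t}
n.insert_lhs(:a)
n.insert_lhs(:b)
n.insert_rhs(:b)
n.flush            # prints :a only
n.insert_lhs(:c)   # prints :c right away -- the rhs is known complete this tick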