data_structures_rmolinari 0.2.1 → 0.3.0

Files changed (10)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: b8d63f6987f160f00ce92f79643f904f95bc230e4a5b60679f4301ecd366622a
-  data.tar.gz: cdc660ab53a0259a79b72ecc3928f640faa7b1251a1bae3baabb1c70bac20d01
+  metadata.gz: 9f006234ee3b216d5607e9b10bb1958a6107ccfa0cc8c359f98383dc7fde14ee
+  data.tar.gz: f281ab0768e24e7c983cd046ba7b185dab8fd972fb3065fd73ff575782bf5486
 SHA512:
-  metadata.gz: 057d29af42606c7ecc4a78e6c752fa5f7a29b41788f21ca7ede6bab0893d1ba63b6c9dc3e5f3b5747f3912718565d6dfcb5952d189f78b40a6ef77efe95f34c9
-  data.tar.gz: '08f6921ab4b1c7cf84c9fe5624266b7060f4f1d4f1ecc56e3850106b3bb30e0c6b30c3ea808a61939966350118684f3353ccbb628d24b353b35ae865268222d2'
+  metadata.gz: e274a97f177fad44bad20ecf24ecca1385fee3c217e7e42aac076c24377970c6444dfdbadc6fd3e1e201555177429c9f8eddaee211e463dd60f6b36e74004eec
+  data.tar.gz: 293fc0b2973a8d851c27f4e64177dbf7b9a25b2bb7eb9efb4b33abdb07c4e006f80f4450996ef99da7e8bb1516ca8aa89ab893258960d9127d101995906254ed

data/CHANGELOG.md ADDED

@@ -0,0 +1,18 @@
+# Changelog
+## [Unreleased]
+## [0.3.0] 2023-01-06
+### Added
+- Start this file
+- `Heap` can be constructed as "non-addressable"
+  - `update` is not possible but duplicates can be inserted and overall performance is a little better.
+### Changed
+- `LogicError` gets a subclassed `InternalLogicError` for issues inside the library.
+- `Shared::Pair` becomes `Shared::Point`
+  - this doesn't change the API of `MaxPrioritySearchTree` because of ducktyping. But client code (of which there is none) might be
+    using the `Pair` name.

data/lib/data_structures_rmolinari/{disjoint_union_internal.rb → disjoint_union.rb} RENAMED

@@ -4,13 +4,21 @@
 # The data structure provides efficient actions to merge two disjoint subsets, i.e., replace them by their union, and determine if
 # two elements are in the same subset.
 #
-# The elements of the set must be 0, 1, ..., n-1. Client code can map its data to these representatives. The code uses several ideas
-# from Tarjan and van Leeuwen for efficiency
+# The elements of the set are 0, 1, ..., n-1, where n is the size of the universe. Client code can map its data to these
+# representatives.
 #
 # See https://en.wikipedia.org/wiki/Disjoint-set_data_structure for a good introduction.
 #
+# The code uses several ideas from Tarjan and van Leeuwen for efficiency. We use "union by rank" in +unite+ and path-halving in
+# +find+. Together, these make the amortized cost for each of n such operations effectively constant.
+#
 # - Tarjan, Robert E., van Leeuwen, Jan (1984). "Worst-case analysis of set union algorithms". Journal of the ACM. 31 (2): 245–281.
-class DisjointUnionInternal
+#
+# @todo
+#   - allow caller to expand the size of the universe. This operation is called "make set".
+#     - All we need to do is increase the size of @d, set the parent pointers, define the new ranks (zero), and update @size.
+class DataStructuresRMolinari::DisjointUnion
+  # The number of subsets in the partition.
   attr_reader :subset_count
   # @param size the size of the universe, which must be known at the time of construction. The elements 0, 1, ..., size - 1 start
@@ -52,7 +60,7 @@ class DisjointUnionInternal
   end
   private def check_value(v)
-    raise "Value must be given and be in (0..#{@size - 1})" unless v && v.between?(0, @size - 1)
+    raise DataError, "Value must be given and be in (0..#{@size - 1})" unless v && v.between?(0, @size - 1)
   end
   private def link(e, f)
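
The header comment above names two techniques, "union by rank" in +unite+ and path-halving in +find+. The following is a minimal, self-contained sketch of how those two ideas fit together. It is not the gem's implementation: the class name `DisjointUnionSketch` and the instance variables `@parent` and `@rank` are assumptions made for illustration.

```ruby
# Hypothetical sketch, not the gem's code: union by rank plus path halving.
class DisjointUnionSketch
  # The number of subsets in the partition.
  attr_reader :subset_count

  def initialize(size)
    @parent = (0...size).to_a # every element starts as the root of its own subset
    @rank = Array.new(size, 0)
    @subset_count = size
  end

  # Path halving: every node visited on the way up is re-pointed at its grandparent.
  def find(e)
    while @parent[e] != e
      @parent[e] = @parent[@parent[e]]
      e = @parent[e]
    end
    e
  end

  # Union by rank: the root of lower rank is attached under the root of higher rank.
  def unite(e, f)
    e = find(e)
    f = find(f)
    return if e == f # already in the same subset

    e, f = f, e if @rank[e] < @rank[f]
    @parent[f] = e
    @rank[e] += 1 if @rank[e] == @rank[f]
    @subset_count -= 1
  end
end
```

Client code would map its own objects to the integers 0, 1, ..., n-1 (for example with a Hash) before calling `unite` and `find`.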

data/lib/data_structures_rmolinari/{generic_segment_tree_internal.rb → generic_segment_tree.rb} RENAMED

@@ -8,45 +8,33 @@ require_relative 'shared'
 # called an "interval tree."
 #
 # For more details (and some close-to-metal analysis of run time, especially for large datasets) see
-# https://en.algorithmica.org/hpc/data-structures/segment-trees/. In particular, this shows how to do a bottom-up
-# implementation, which is faster, at least for large datasets and cache-relevant compiled code.
+# https://en.algorithmica.org/hpc/data-structures/segment-trees/. In particular, this shows how to do a bottom-up implementation,
+# which is faster, at least for large datasets and cache-relevant compiled code. These issues don't really apply to code written in
+# Ruby.
 #
-# This is a generic implementation.
+# This is a generic implementation, intended to allow easy configuration for concrete instances. See the parameters to the
+# initializer and the definitions of concrete realisations like MaxValSegmentTree.
 #
 # We do O(n) work to build the internal data structure at initialization. Then we answer queries in O(log n) time.
-#
-# @todo
-#   - provide a data-update operation like update_val_at(idx, val)
-#     - this is O(log n)
-#     - note that this may need some rework. Consider something like IndexOfMaxVal: @merge needs to know about the underlying data
-#       in that case. Hmmm. Maybe the lambda can close over the data in a way that makes it possible to change the data "from the
-#       outside". Yes:
-#         a = [1,2,3]
-#         foo = ->() { a.max }
-#         foo.call # 3
-#         a = [1,2,4]
-#         foo.call # 4
-#   - Offer an optional parameter base_case_value_extractor (<-- need better name) to be used in #determine_val in the case that
-#     left == tree_l && right == tree_r instead of simply returning @tree[tree_idx]
-#     - Use case: https://cp-algorithms.com/data_structures/segment_tree.html#saving-the-entire-subarrays-in-each-vertex, such as
-#       finding the least element in a subarray l..r no smaller than a given value x. In this case we store a sorted version the
-#       entire subarray at each node and use a binary search on it.
-#     - the default value would simply be the identity function.
-#     - NOTE that in this case, we have different "combine" functions in #determine_val and #build. In #build we would combine
-#       sorted lists into a larger sorted list. In #determine_val we combine results via #min.
-#     - Think about the interface before doing this.
-class GenericSegmentTreeInternal
+class DataStructuresRMolinari::GenericSegmentTree
   include Shared::BinaryTreeArithmetic
   # Construct a concrete instance of a Segment Tree. See details at the links above for the underlying concepts here.
   # @param combine a lambda that takes two values and munges them into a combined value.
   #   - For example, if we are calculating sums over subintervals, combine.call(a, b) = a + b, while if we are doing maxima we will
-  #     return max(a, b)
+  #     return max(a, b).
+  #   - Things get more complicated when we are calculating, say, the _index_ of the maximal value in a subinterval. Now it is not
+  #     enough simply to store that index at each tree node, because to combine the indices from two child nodes we need to know
+  #     both the index of the maximal element in each child node's interval, but also the maximal values themselves, so we know
+  #     which one "wins" for the parent node. This affects the sort of work we need to do when combining and the value provided by
+  #     the +single_cell_array_val+ lambda.
   # @param single_cell_array_val a lambda that takes an index i and returns the value we need to store in the #build
-  #     operation for the subinterval i..i. This is often simply be the value data[i], but in some cases - like "index of max val" -
-  #     it will be something else.
+  #     operation for the subinterval i..i.
+  #     - This will often simply be the value data[i], but in some cases it will be something else. For example, when we are
+  #       calculating the index of the maximal value on each subinterval we need [i, data[i]] here.
+  #     - If +update_at+ is called later, this lambda must close over the underlying data in a way that captures the updated value.
   # @param size the size of the underlying data array, used in certain internal arithmetic.
-  # @param identity is the value to return when we are querying on an empty interval
+  # @param identity the value to return when we are querying on an empty interval
   #   - for sums, this will be zero; for maxima, this will be -Infinity, etc
   def initialize(combine:, single_cell_array_val:, size:, identity:)
     @combine = combine
@@ -62,15 +50,28 @@ class GenericSegmentTreeInternal
   # @param left the left end of the subinterval.
   # @param right the right end (inclusive) of the subinterval.
   #
-  # The type of the return value depends on the concrete instance of the segment tree.
+  # The type of the return value depends on the concrete instance of the segment tree. We return the _identity_ element provided at
+  # construction time if the interval is empty.
   def query_on(left, right)
-    raise "Bad query interval #{left}..#{right}" if left.negative? || right >= @size
+    raise DataError, "Bad query interval #{left}..#{right}" if left.negative? || right >= @size
     return @identity if left > right # empty interval
     determine_val(root, left, right, 0, @size - 1)
   end
+  # Update the value in the underlying array at the given idx
+  #
+  # @param idx an index in the underlying data array.
+  #
+  # Note that we don't need the updated value itself. We get that by calling the lambda +single_cell_array_val+ supplied at
+  # construction.
+  def update_at(idx)
+    raise DataError, 'Cannot update an index outside the initial range of the underlying data' unless (0...@size).cover?(idx)
+    update_val_at(idx, root, 0, @size - 1)
+  end
   private def determine_val(tree_idx, left, right, tree_l, tree_r)
     # Does the current tree node exactly serve up the interval we're interested in?
     return @tree[tree_idx] if left == tree_l && right == tree_r
@@ -92,6 +93,26 @@ class GenericSegmentTreeInternal
     end
   end
+  private def update_val_at(idx, tree_idx, tree_l, tree_r)
+    if tree_l == tree_r
+      # We have found the spot!
+      raise InternalLogicError, 'tree_l == tree_r, but they do not agree with the idx holding the updated value' unless tree_l == idx
+      @tree[tree_idx] = @single_cell_array_val.call(tree_l)
+    else
+      # Recursively update the appropriate subtree
+      mid = midpoint(tree_l, tree_r)
+      left = left(tree_idx)
+      right = right(tree_idx)
+      if mid >= idx
+        update_val_at(idx, left(tree_idx), tree_l, mid)
+      else
+        update_val_at(idx, right(tree_idx), mid + 1, tree_r)
+      end
+      @tree[tree_idx] = @combine.call(@tree[left], @tree[right])
+    end
+  end
   # Build the internal data structure.
   #
   # - tree_idx is the index into @tree
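
The documentation above says that an "index of max value" tree must store `[i, data[i]]` at each leaf, and that +update_at+ takes no new value because it re-reads the leaf through the +single_cell_array_val+ lambda. The sketch below illustrates both points with a small stand-in class. `SegmentTreeSketch` is hypothetical and is not the gem's `GenericSegmentTree`; only the keyword names are taken from the diff.

```ruby
# Hypothetical sketch, not the gem's code: a small recursive segment tree that takes
# the same keyword arguments as the initializer documented above.
class SegmentTreeSketch
  def initialize(combine:, single_cell_array_val:, size:, identity:)
    @combine = combine
    @single_cell_array_val = single_cell_array_val
    @size = size
    @identity = identity
    @tree = []
    build(1, 0, size - 1) if size.positive?
  end

  def query_on(left, right)
    return @identity if left > right # empty interval

    determine_val(1, left, right, 0, @size - 1)
  end

  # No new value is passed in: the leaf is recomputed by calling the lambda.
  def update_at(idx, node = 1, lo = 0, hi = @size - 1)
    if lo == hi
      @tree[node] = @single_cell_array_val.call(lo)
    else
      mid = (lo + hi) / 2
      if idx <= mid
        update_at(idx, 2 * node, lo, mid)
      else
        update_at(idx, 2 * node + 1, mid + 1, hi)
      end
      @tree[node] = @combine.call(@tree[2 * node], @tree[2 * node + 1])
    end
  end

  private

  def build(node, lo, hi)
    if lo == hi
      @tree[node] = @single_cell_array_val.call(lo)
    else
      mid = (lo + hi) / 2
      build(2 * node, lo, mid)
      build(2 * node + 1, mid + 1, hi)
      @tree[node] = @combine.call(@tree[2 * node], @tree[2 * node + 1])
    end
  end

  def determine_val(node, left, right, lo, hi)
    return @tree[node] if left == lo && right == hi

    mid = (lo + hi) / 2
    if right <= mid
      determine_val(2 * node, left, right, lo, mid)
    elsif left > mid
      determine_val(2 * node + 1, left, right, mid + 1, hi)
    else
      @combine.call(
        determine_val(2 * node, left, mid, lo, mid),
        determine_val(2 * node + 1, mid + 1, right, mid + 1, hi)
      )
    end
  end
end

data = [3, 1, 4, 1, 5]
index_of_max = SegmentTreeSketch.new(
  combine: ->(a, b) { a[1] >= b[1] ? a : b },    # keep the [index, value] pair with the larger value
  single_cell_array_val: ->(i) { [i, data[i]] }, # closes over +data+, so later changes are visible
  size: data.size,
  identity: nil
)
```

After `data[1] = 9`, calling `index_of_max.update_at(1)` is enough to refresh the affected nodes, because the lambda reads the new value from `data` itself.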

data/lib/data_structures_rmolinari/{heap_internal.rb → heap.rb} RENAMED

@@ -13,8 +13,8 @@ require_relative 'shared'
 # - +empty?+
 #   - is the heap empty?
 #   - O(1)
-# - +insert+
-#   - add a new element to the heap with an associated priority
+# - +insert(item, priority)+
+#   - add a new item to the heap with an associated priority
 #   - O(log N)
 # - +top+
 #   - return the lowest-priority element, which is the element at the root of the tree. In a max-heap this is the highest-priority
@@ -23,12 +23,18 @@ require_relative 'shared'
 # - +pop+
 #   - removes and returns the item that would be returned by +top+
 #   - O(log N)
-# - +update+
+# - +update(item, priority)+
 #   - tell the heap that the priority of a particular item has changed
 #   - O(log N)
 #
 # Here N is the number of elements in the heap.
 #
+# The internal requirements needed to implement +update+ have several consequences.
+# - Items added to the heap must be distinct. Otherwise we would not know which occurrence to update
+# - There is some bookkeeping overhead.
+# If client code doesn't need to call +update+ then we can create a "non-addressable" heap that allows for the insertion of
+# duplicate items and has slightly faster runtime overall. See the arguments to the initializer.
+#
 # References:
 #
 # - https://en.wikipedia.org/wiki/Binary_heap
@@ -36,23 +42,31 @@ require_relative 'shared'
 #   DOI 10.1007/s00224-017-9760-2
 #
 # @todo
-#   - relax the requirement that priorities must be comparable vai +<+ and respond to negation. Instead, allow comparison via +<=>+
-#     and handle max-heaps differently.
-#     - this will allow priorities to be arrays for tie-breakers and similar.
-class HeapInternal
+#   - let caller see the priority of the top element. Maybe this is useful sometimes.
+class DataStructuresRMolinari::Heap
+  include Shared
   include Shared::BinaryTreeArithmetic
+  # The number of items currently in the heap
   attr_reader :size
-  Pair = Struct.new(:priority, :item)
+  # An (item, priority) pair
+  InternalPair = Struct.new(:item, :priority)
+  private_constant :InternalPair
   # @param max_heap when truthy, make a max-heap rather than a min-heap
-  # @param debug when truthy, verify the heap property after each update than might violate it. This makes operations much slower.
-  def initialize(max_heap: false, debug: false)
+  # @param addressable when truthy, the heap is _addressable_. This means that
+  #   - item priorities are updatable with +update(item, p)+, and
+  #   - items added to the heap must be distinct.
+  #   When falsy, priorities are not updateable but items may be inserted multiple times. Operations are slightly faster because
+  #   there is less internal bookkeeping.
+  # @param debug when truthy, verify the heap property after each change that might violate it. This makes operations much slower.
+  def initialize(max_heap: false, addressable: true, debug: false)
     @data = []
     @size = 0
     @max_heap = max_heap
-    @index_of = {}
+    @addressable = addressable
+    @index_of = {} # used in addressable heaps
     @debug = debug
   end
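
The +addressable+ flag above trades the +update+ operation for the ability to insert duplicate items. A rough sketch of why the two conflict: an addressable heap keeps a Hash from item to array position, which only works when items are distinct. The class below is an illustration under assumptions. `HeapSketch`, `Entry`, the 0-based array layout, and the use of `ArgumentError` are not taken from the gem.

```ruby
# Hypothetical sketch, not the gem's code: a min-heap with an optional item => index map.
class HeapSketch
  Entry = Struct.new(:item, :priority)

  def initialize(addressable: true)
    @data = []
    @addressable = addressable
    @index_of = {} # item => position in @data; only maintained when addressable
  end

  def insert(item, priority)
    raise ArgumentError, "Heap already contains #{item}" if @addressable && @index_of.key?(item)

    assign(Entry.new(item, priority), @data.size)
    sift_up(@data.size - 1)
  end

  def pop
    raise 'Heap is empty!' if @data.empty?

    top = @data.first
    last = @data.pop
    @index_of.delete(top.item) if @addressable
    unless @data.empty?
      assign(last, 0)
      sift_down(0)
    end
    top.item
  end

  def update(item, priority)
    raise 'Cannot update priorities in a non-addressable heap' unless @addressable

    idx = @index_of.fetch(item) # raises KeyError if the item is absent
    @data[idx].priority = priority
    sift_down(sift_up(idx))
  end

  private

  def assign(entry, idx)
    @data[idx] = entry
    @index_of[entry.item] = idx if @addressable
  end

  def swap(i, j)
    a, b = @data[i], @data[j]
    assign(b, i)
    assign(a, j)
  end

  # Both sift methods return the final position of the entry they moved.
  def sift_up(idx)
    while idx.positive?
      parent = (idx - 1) / 2
      break unless (@data[idx].priority <=> @data[parent].priority) == -1

      swap(idx, parent)
      idx = parent
    end
    idx
  end

  def sift_down(idx)
    loop do
      child = 2 * idx + 1
      break if child >= @data.size

      child += 1 if child + 1 < @data.size && (@data[child + 1].priority <=> @data[child].priority) == -1
      break unless (@data[child].priority <=> @data[idx].priority) == -1

      swap(idx, child)
      idx = child
    end
    idx
  end
end
```

With `addressable: false` the Hash is never touched, which is where the "less internal bookkeeping" mentioned in the parameter documentation would come from in a design like this one.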
@@ -61,27 +75,25 @@ class HeapInternal
     @size.zero?
   end
-  # Insert a new element into the heap with the given property.
-  # @param value the item to be inserted. It is an error to insert an item that is already present in the heap, though we don't
-  #   check for this.
-  # @param priority the priority to use for new item. The values used as priorities ust be totally ordered via +<+ and, if +self+ is
-  #   a max-heap, must respond to negation +@-+ in the natural order-respecting way.
-  # @todo
-  #   - check for duplicate
+  # Insert a new element into the heap with the given priority.
+  # @param value the item to be inserted.
+  #   - If the heap is addressable (the default) it is an error to insert an item that is already present in the heap.
+  # @param priority the priority to use for new item. The values used as priorities must be totally ordered via +<=>+.
   def insert(value, priority)
-    priority *= -1 if @max_heap
+    raise DataError, "Heap already contains #{value}" if @addressable && contains?(value)
     @size += 1
-    d = Pair.new(priority, value)
+    d = InternalPair.new(value, priority)
     assign(d, @size)
     sift_up(@size)
   end
   # Return the top of the heap without removing it
-  # @return the value with minimal (maximal for max-heaps) priority. Strictly speaking, it returns the item at the root of the
-  #   binary tree; this element has minimal priority, but there may be other elements with the same priority.
+  # @return a value with minimal priority (maximal for max-heaps). Strictly speaking, it returns the item at the root of the
+  #   binary tree; this element has minimal priority, but there may be other elements with the same priority and they do not appear
+  #   at the top of the heap in any guaranteed order.
   def top
     raise 'Heap is empty!' unless @size.positive?
@@ -92,12 +104,11 @@ class HeapInternal
   # @return (see #top)
   def pop
     result = top
-    @index_of.delete(result)
     assign(@data[@size], root)
     @data[@size] = nil
     @size -= 1
+    @index_of.delete(result) if @addressable
     sift_down(root) if @size.positive?
@@ -105,21 +116,20 @@ class HeapInternal
   end
   # Update the priority of the given element and maintain the necessary heap properties.
+  #
   # @param element the item whose priority we are updating. It is an error to update the priority of an element not already in the
   #   heap
   # @param priority the new priority
-  #
-  # @todo
-  #   - check that the element is in the heap
   def update(element, priority)
-    priority *= -1 if @max_heap
+    raise LogicError, 'Cannot update priorities in a non-addressable heap' unless @addressable
+    raise DataError, "Cannot update priority for value #{element} not already in the heap" unless contains?(element)
     idx = @index_of[element]
     old = @data[idx].priority
     @data[idx].priority = priority
-    if priority > old
+    if less_than_priority?(old, priority)
       sift_down(idx)
-    elsif priority < old
+    elsif less_than_priority?(priority, old)
       sift_up(idx)
     end
@@ -133,7 +143,7 @@ class HeapInternal
     x = @data[idx]
     while idx != root
       i = parent(idx)
-      break unless x.priority < @data[i].priority
+      break unless less_than?(x, @data[i])
       assign(@data[i], idx)
       idx = i
@@ -148,9 +158,9 @@ class HeapInternal
     x = @data[idx]
     while (j = left(idx)) <= @size
-      j += 1 if j + 1 <= @size && @data[j + 1].priority < @data[j].priority
+      j += 1 if j + 1 <= @size && less_than?(@data[j + 1], @data[j])
-      break unless @data[j].priority < x.priority
+      break unless less_than?(@data[j], x)
       assign(@data[j], idx)
       idx = j
@@ -163,7 +173,27 @@ class HeapInternal
   # Put the pair in the given heap location
   private def assign(pair, idx)
     @data[idx] = pair
-    @index_of[pair.item] = idx
+    @index_of[pair.item] = idx if @addressable
+  end
+  # Compare the priorities of two items with <=> and return truthy exactly when the result is -1.
+  #
+  # If this is a max-heap return truthy exactly when the result of <=> is 1.
+  #
+  # The arguments must respond to +priority+; use +less_than_priority?+ to compare raw priorities.
+  private def less_than?(p1, p2)
+    less_than_priority?(p1.priority, p2.priority)
+  end
+  # Direct comparison of priorities
+  private def less_than_priority?(priority1, priority2)
+    return (priority1 <=> priority2) == 1 if @max_heap
+    (priority1 <=> priority2) == -1
+  end
+  private def contains?(item)
+    !!@index_of[item]
   end
   # For debugging
@@ -172,8 +202,8 @@ class HeapInternal
       left = left(idx)
       right = right(idx)
-      raise "Heap property violated by left child of index #{idx}" if left <= @size && @data[idx].priority >= @data[left].priority
-      raise "Heap property violated by right child of index #{idx}" if right <= @size && @data[idx].priority >= @data[right].priority
+      raise InternalLogicError, "Heap property violated by left child of index #{idx}" if left <= @size && less_than?(@data[left], @data[idx])
+      raise InternalLogicError, "Heap property violated by right child of index #{idx}" if right <= @size && less_than?(@data[right], @data[idx])
     end
   end
 end
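
Switching the comparisons from +<+ and negation to +<=>+ means that priorities only need to be mutually comparable, so arrays can serve as priorities with built-in tie-breakers. The snippet below restates the comparison from the diff as a standalone method to show the effect. The `max_heap:` keyword argument is an assumption standing in for the `@max_heap` instance variable.

```ruby
# Hypothetical standalone version of the comparison shown in the diff above.
# Returns true when priority1 should sit closer to the root than priority2.
def less_than_priority?(priority1, priority2, max_heap: false)
  cmp = priority1 <=> priority2
  max_heap ? cmp == 1 : cmp == -1
end

# Array priorities compare element by element, so the second entry breaks ties.
first_wins = less_than_priority?([1, 'b'], [1, 'c'])

# In a max-heap the sense of the comparison is reversed; no negation is needed.
larger_wins = less_than_priority?(5, 3, max_heap: true)
```

Equal priorities give `false` in both directions, which matches the note above that items of equal priority come out in no guaranteed order.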

data/lib/data_structures_rmolinari/{max_priority_search_tree_internal.rb → max_priority_search_tree.rb} RENAMED

@@ -1,13 +1,9 @@
 require 'set'
 require_relative 'shared'
-# A priority search tree (PST) stores a set, P, of two-dimensional points (x,y) in a way that allows efficient answes to certain
+# A priority search tree (PST) stores a set, P, of two-dimensional points (x,y) in a way that allows efficient answers to certain
 # questions about P.
 #
-# (In the current implementation no two points can share an x-value and no two points can share a y-value. This (rather severe)
-# restriction can be relaxed with some more complicated code.)
-#
 # The data structure was introduced in 1985 by Edward McCreight. Later, De, Maheshwari, Nandy, and Smid showed how to construct a
 # PST in-place (using only O(1) extra memory), at the expense of some slightly more complicated code for the various supported
 # operations. It is their approach that we have implemented.
@@ -33,21 +29,29 @@ require_relative 'shared'
 #
 # The final operation (enumerate) takes O(m + log n) time, where m is the number of points that are enumerated.
 #
+# In the current implementation no two points can share an x-value and no two points can share a y-value. This (rather severe)
+# restriction can be relaxed with some more complicated code.
+#
+#
 # There is a related data structure called the Min-max priority search tree so we have called this a "Max priority search tree", or
 # MaxPST.
 #
 # References:
-# * E.M. McCreight, _Priority search trees_, SIAM J. Comput., 14(2):257-276, 1985.  Later, De,
+# * E.M. McCreight, _Priority search trees_, SIAM J. Comput., 14(2):257-276, 1985.
 # * M. De, A. Maheshwari, S. C. Nandy, M. Smid, _An In-Place Priority Search Tree_, 23rd Canadian Conference on Computational
 #   Geometry, 2011
-class MaxPrioritySearchTreeInternal
+class DataStructuresRMolinari::MaxPrioritySearchTree
   include Shared
   include BinaryTreeArithmetic
   # Construct a MaxPST from the collection of points in +data+.
   #
-  # @param data [Array] the set P of points presented as an array. The tree is built in the array in-place without cloning. Each
-  #        element of the array must respond to +#x+ and +#y+ (though this is not currently checked).
+  # @param data [Array] the set P of points presented as an array. The tree is built in the array in-place without cloning.
+  #   - Each element of the array must respond to +#x+ and +#y+.
+  #     - This is not checked explicitly but a missing method exception will be thrown when we try to call one of them.
+  #   - The +x+ values must be distinct, as must the +y+ values. We raise a +Shared::DataError+ if this isn't the case.
+  #     - This is a restriction that simplifies some of the algorithm code. It can be removed at the cost of some extra work. Issue
+  #       #9.
   #
   # @param verify [Boolean] when truthy, check that the properties of a PST are satisfied after construction, raising an exception
   #        if not.
@@ -69,7 +73,7 @@ class MaxPrioritySearchTreeInternal
   # Let Q = [x0, infty) X [y0, infty) be the northeast quadrant defined by the point (x0, y0) and let P be the points in this data
   # structure. Define p* as
   #
-  # - (infty, -infty) f Q \intersect P is empty and
+  # - (infty, -infty) if Q \intersect P is empty and
   # - the highest (max-y) point in Q \intersect P otherwise.
   #
   # This method returns p* in O(log n) time and O(1) extra space.
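
As a reading aid for the definition of p* above, here is a brute-force O(n) reference for the "highest point in the northeast quadrant" query, with "highest" meaning largest y. It is only a sketch for checking answers on small inputs and says nothing about how the tree computes the result. `Pt` and `highest_ne_brute_force` are hypothetical names, not part of the gem.

```ruby
# Hypothetical brute-force reference, not the gem's O(log n) algorithm.
Pt = Struct.new(:x, :y)

def highest_ne_brute_force(points, x0, y0)
  # Q = [x0, infty) X [y0, infty)
  in_q = points.select { |pt| pt.x >= x0 && pt.y >= y0 }

  # The sentinel (infty, -infty) is returned when no point lies in Q.
  in_q.max_by(&:y) || Pt.new(Float::INFINITY, -Float::INFINITY)
end

points = [Pt.new(1, 9), Pt.new(4, 6), Pt.new(6, 2), Pt.new(7, 8)]
```

A comparison of this function against the tree's answer on random point sets is one plausible way to test an implementation.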
@@ -82,7 +86,7 @@ class MaxPrioritySearchTreeInternal
   # Let Q = (-infty, x0] X [y0, infty) be the northwest quadrant defined by the point (x0, y0) and let P be the points in this data
   # structure. Define p* as
   #
-  # - (-infty, -infty) f Q \intersect P is empty and
+  # - (-infty, -infty) if Q \intersect P is empty and
   # - the highest (max-y) point in Q \intersect P otherwise.
   #
   # This method returns p* in O(log n) time and O(1) extra space.
@@ -109,12 +113,12 @@ class MaxPrioritySearchTreeInternal
     p = root
     if quadrant == :ne
-      best = Pair.new(INFINITY, -INFINITY)
+      best = Point.new(INFINITY, -INFINITY)
       preferred_child = ->(n) { right(n) }
       nonpreferred_child = ->(n) { left(n) }
       sufficient_x = ->(x) { x >= x0 }
     else
-      best = Pair.new(-INFINITY, -INFINITY)
+      best = Point.new(-INFINITY, -INFINITY)
       preferred_child = ->(n) { left(n) }
       nonpreferred_child = ->(n) { right(n) }
       sufficient_x = ->(x) { x <= x0 }
@@ -186,7 +190,7 @@ class MaxPrioritySearchTreeInternal
   # Let Q = [x0, infty) X [y0, infty) be the northeast quadrant defined by the point (x0, y0) and let P be the points in this data
   # structure. Define p* as
   #
-  # - (infty, infty) f Q \intersect P is empty and
+  # - (infty, infty) if Q \intersect P is empty and
   # - the leftmost (min-x) point in Q \intersect P otherwise.
   #
   # This method returns p* in O(log n) time and O(1) extra space.
@@ -224,10 +228,10 @@ class MaxPrioritySearchTreeInternal
     if quadrant == :ne
       sign = 1
-      best = Pair.new(INFINITY, INFINITY)
+      best = Point.new(INFINITY, INFINITY)
     else
       sign = -1
-      best = Pair.new(-INFINITY, INFINITY)
+      best = Point.new(-INFINITY, INFINITY)
     end
     p = q = root
@@ -369,7 +373,7 @@ class MaxPrioritySearchTreeInternal
     #
     # Sometimes we don't have a relevant node to the left or right of Q. The booleans L and R (which we call left and right) track
     # whether p and q are defined at the moment.
-    best = Pair.new(INFINITY, -INFINITY)
+    best = Point.new(INFINITY, -INFINITY)
     p = q = left = right = nil
     x_range = (x0..x1)
@@ -637,7 +641,7 @@ class MaxPrioritySearchTreeInternal
           end
           current = parent(current)
         else
-          raise LogicError, "Explore(t) state is somehow #{state} rather than 0, 1, or 2."
+          raise InternalLogicError, "Explore(t) state is somehow #{state} rather than 0, 1, or 2."
         end
       end
     end
@@ -782,7 +786,7 @@ class MaxPrioritySearchTreeInternal
             p_in = right(p_in)
             left = true
           else
-            raise LogicError, 'q_in cannot be active, by the value in the right child of p_in!' if right_in
+            raise InternalLogicError, 'q_in cannot be active, by the value in the right child of p_in!' if right_in
             p = left(p_in)
             q = right(p_in)
@@ -792,7 +796,7 @@ class MaxPrioritySearchTreeInternal
           end
         elsif left_val.x <= x1
           if right_val.x > x1
-            raise LogicError, 'q_in cannot be active, by the value in the right child of p_in!' if right_in
+            raise InternalLogicError, 'q_in cannot be active, by the value in the right child of p_in!' if right_in
             q = right(p_in)
             p_in = left(p_in)
@@ -806,7 +810,7 @@ class MaxPrioritySearchTreeInternal
             right_in = true
           end
         else
-          raise LogicError, 'q_in cannot be active, by the value in the right child of p_in!' if right_in
+          raise InternalLogicError, 'q_in cannot be active, by the value in the right child of p_in!' if right_in
           q = left(p_in)
           deactivate_p_in.call
@@ -842,8 +846,8 @@ class MaxPrioritySearchTreeInternal
       # q has two children. Cases!
       if @data[left(q)].x < x0
-        raise LogicError, 'p_in should not be active, based on the value at left(q)' if left_in
-        raise LogicError, 'q_in should not be active, based on the value at left(q)' if right_in
+        raise InternalLogicError, 'p_in should not be active, based on the value at left(q)' if left_in
+        raise InternalLogicError, 'q_in should not be active, based on the value at left(q)' if right_in
         left = true
         if @data[right(q)].x < x0
@@ -874,7 +878,7 @@ class MaxPrioritySearchTreeInternal
     # Given: q' is active and satisfies x0 <= x(q') <= x1
     enumerate_right_in = lambda do
-      raise LogicError, 'right_in should be true if we call enumerate_right_in' unless right_in
+      raise InternalLogicError, 'right_in should be true if we call enumerate_right_in' unless right_in
       if @data[q_in].y >= y0
         report.call(q_in)
@@ -906,7 +910,7 @@ class MaxPrioritySearchTreeInternal
       # q' has two children
       right_val = @data[right(q_in)]
       if left_val.x < x0
-        raise LogicError, 'p_in cannot be active, by the value in the left child of q_in' if left_in
+        raise InternalLogicError, 'p_in cannot be active, by the value in the left child of q_in' if left_in
         if right_val.x < x0
           p = right(q_in)
@@ -966,7 +970,7 @@ class MaxPrioritySearchTreeInternal
     while left || left_in || right_in || right
       # byebug if $do_it
-      raise LogicError, 'It should not be that q_in is active but p_in is not' if right_in && !left_in
+      raise InternalLogicError, 'It should not be that q_in is active but p_in is not' if right_in && !left_in
       set_i = []
       set_i << :left if left
@@ -984,7 +988,7 @@ class MaxPrioritySearchTreeInternal
       when :right
         enumerate_right.call
       else
-        raise LogicError, "bad symbol #{z}"
+        raise InternalLogicError, "bad symbol #{z}"
       end
     end
     return result unless block_given?
@@ -994,9 +998,14 @@ class MaxPrioritySearchTreeInternal
   # Build the initial structure
   private def construct_pst
-    # We follow the algorithm in the paper by De, Maheshwari et al. Note that indexing is from 1 there. For now we pretend that that
-    # is the case here, too.
+    raise DataError, 'Duplicate x values are not supported' if contains_duplicates?(@data, by: :x)
+    raise DataError, 'Duplicate y values are not supported' if contains_duplicates?(@data, by: :y)
+    # We follow the algorithm in the paper by De, Maheshwari et al.
+    # Since we are building an implicit binary tree, things are simpler if the array is 1-based. This probably requires a malloc and
+    # data copy, which isn't great, but it's in the C layer so cheap compared to the O(n log^2 n) work we need to do for
+    # construction. In fact, we are probably doing O(n^2) work because of all the calls to #index_with_largest_y_in.
     @data.unshift nil
     h = Math.log2(@size).floor
@@ -1106,13 +1115,11 @@ class MaxPrioritySearchTreeInternal
     (l..r).max_by { |idx| @data[idx].y }
   end
-  # Sort the subarray @data[l..r]. This is much faster than a Ruby-layer heapsort because it is mostly happening in C.
+  # Sort the subarray @data[l..r].
   private def sort_subarray(l, r)
-    # heapsort_subarray(l, r)
     return if l == r # 1-array already sorted!
-    #l -= 1
-    #r -= 1
+    # This slice-replacement is much faster than a Ruby-layer heapsort because it is mostly happening in C.
     @data[l..r] = @data[l..r].sort_by(&:x)
   end
@@ -1127,7 +1134,7 @@ class MaxPrioritySearchTreeInternal
   private def verify_properties
     # It's a max-heap in y
     (2..@size).each do |node|
-      raise LogicError, "Heap property violated at child #{node}" unless @data[node].y < @data[parent(node)].y
+      raise InternalLogicError, "Heap property violated at child #{node}" unless @data[node].y < @data[parent(node)].y
     end
     # Left subtree has x values less than all of the right subtree
@@ -1137,7 +1144,7 @@ class MaxPrioritySearchTreeInternal
       left_max = max_x_in_subtree(left(node))
       right_min = min_x_in_subtree(right(node))
-      raise LogicError, "Left-right property of x-values violated at #{node}" unless left_max < right_min
+      raise InternalLogicError, "Left-right property of x-values violated at #{node}" unless left_max < right_min
     end
   end
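
The new checks in +construct_pst+ call a helper, `contains_duplicates?(@data, by: :x)`, whose body is not part of the hunks shown here. One plausible way to write such a helper is sketched below. This is an assumption about its behavior inferred from the call site, not the gem's actual code, and `PointSketch` is a hypothetical stand-in for `Shared::Point`.

```ruby
require 'set'

# Hypothetical stand-in for Shared::Point.
PointSketch = Struct.new(:x, :y)

# Assumed behavior: true when two elements share the same value for the attribute +by+.
def contains_duplicates?(enum, by:)
  seen = Set.new
  # Set#add? returns nil when the value is already present.
  enum.any? { |element| seen.add?(element.public_send(by)).nil? }
end

pts = [PointSketch.new(1, 2), PointSketch.new(3, 2), PointSketch.new(5, 7)]
```

This runs in expected O(n) time, so it would not change the overall cost of construction discussed in the comments above.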

data/lib/data_structures_rmolinari/{minmax_priority_search_tree_internal.rb → minmax_priority_search_tree.rb} RENAMED

@@ -2,15 +2,13 @@ require 'must_be'
 require_relative 'shared'
+# THIS CLASS IS INCOMPLETE AND NOT USABLE
+#
 # A priority search tree (PST) stores points in two dimensions (x,y) and can efficiently answer certain questions about the set of
 # points.
 #
 # The structure was introduced by McCreight [1].
 #
-# It is a binary search tree which is a max-heap by the y-coordinate, and, for a non-leaf node N storing (x, y), all the nodes in
-# the left subtree of N have smaller x values than any of the nodes in the right subtree of N. Note, though, that the x-value at N
-# has no particular property relative to the x values in its subtree. It is thus _almost_ a binary search tree in the x coordinate.
-#
 # See more: https://en.wikipedia.org/wiki/Priority_search_tree
 #
 # It is possible to build such a tree in place, given an array of pairs. See [2]. In a follow-up paper, [3], the authors show how to
@@ -40,12 +38,12 @@ require_relative 'shared'
 # [2] De, Maheshwari, Nandy, Smid, _An in-place priority search tree_, 23rd Annual Canadian Conference on Computational Geometry.
 # [3] De, Maheshwari, Nandy, Smid, _An in-place min-max priority search tree_, Computational Geometry, v46 (2013), pp 310-327.
 # [4] Atkinson, Sack, Santoro, Strothotte, _Min-max heaps and generalized priority queues_, Commun. ACM 29 (10) (1986), pp 996-1000.
-class MinmaxPrioritySearchTreeInternal
+class DataStructuresRMolinari::MinmaxPrioritySearchTree
   include Shared
   # The array of pairs is turned into a minmax PST in-place without cloning. So clone before passing it in, if you care.
   #
-  # Each element must respond to #x and #y. Use Pair (above) if you like.
+  # Each element must respond to #x and #y. Use Point (above) if you like.
   def initialize(data, verify: false)
     @data = data
     @size = @data.size
@@ -75,7 +73,7 @@ class MinmaxPrioritySearchTreeInternal
   #
   # Here T(x) is the subtree rooted at x
   def leftmost_ne(x0, y0)
-    best = Pair.new(INFINITY, INFINITY)
+    best = Point.new(INFINITY, INFINITY)
     p = q = root
     in_q = ->(pair) { pair.x >= x0 && pair.y >= y0 }
@@ -284,7 +282,7 @@ class MinmaxPrioritySearchTreeInternal
   #
   # This method returns p*
   # def highest_3_sided_up(x0, x1, y0)
-  #   best = Pair.new(INFINITY, -INFINITY)
+  #   best = Point.new(INFINITY, -INFINITY)
   #   in_q = lambda do |pair|
   #     pair.x >= x0 && pair.x <= x1 && pair.y >= y0
@@ -407,7 +405,7 @@ class MinmaxPrioritySearchTreeInternal
     #     - If Q intersect P is empty then p* = best
     #
     # Here, P is the set of points in our data structure and T_p is the subtree rooted at p
-    best = Pair.new(INFINITY, -INFINITY)
+    best = Point.new(INFINITY, -INFINITY)
     p = root # root of the whole tree AND the pair stored there
     in_q = lambda do |pair|

data/lib/data_structures_rmolinari/shared.rb CHANGED

@@ -1,11 +1,20 @@
 # Some odds and ends shared by other classes
 module Shared
+  # Infinity without having to put a +Float::+ prefix every time
   INFINITY = Float::INFINITY
-  Pair = Struct.new(:x, :y)
+  # An (x, y) coordinate pair.
+  Point = Struct.new(:x, :y)
   # @private
+  # Used for errors related to logic errors in client code
   class LogicError < StandardError; end
+  # Used for errors related to logic errors in library code
+  class InternalLogicError < LogicError; end
+  # Used for errors related to data, such as duplicated elements where they must be distinct.
+  class DataError < StandardError; end
   # @private
   #
@@ -61,4 +70,22 @@ module Shared
       (i & 1).zero?
     end
   end
+  # Simple O(n) check for duplicates in an enumerable.
+  #
+  # It may be worse than O(n), depending on how close to constant set insertion is.
+  #
+  # @param enum the enumerable to check for duplicates
+  # @param by a method to call on each element of enum before checking. The results of these methods are checked for
+  #        duplication. When nil we don't call anything and just use the elements themselves.
+  def contains_duplicates?(enum, by: nil)
+    seen = Set.new
+    enum.each do |v|
+      v = v.send(by) if by
+      return true if seen.include? v
+      seen << v
+    end
+    false
+  end
 end
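As an illustration of the `by:` parameter, here is a self-contained sketch that copies the `contains_duplicates?` body from the hunk above into a top-level method. The `Point` struct is a stand-in for `Shared::Point`, and the explicit `require 'set'` is an addition for older Ruby versions; neither is taken from the gem's source.

```ruby
require 'set'

# Stand-in for Shared::Point.
Point = Struct.new(:x, :y)

# Same logic as the method added in the diff, defined at top level for the demo.
def contains_duplicates?(enum, by: nil)
  seen = Set.new
  enum.each do |v|
    v = v.send(by) if by           # compare a derived key when one is requested
    return true if seen.include? v
    seen << v
  end
  false
end

points = [Point.new(1, 10), Point.new(2, 20), Point.new(1, 30)]

puts contains_duplicates?(points)          # whole points are compared
puts contains_duplicates?(points, by: :x)  # only the x values are compared
```

With these inputs the first call reports no duplicates, since the three points differ as whole structs, while the second reports a duplicate because two points share the same x value.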

data/lib/data_structures_rmolinari.rb CHANGED

@@ -1,54 +1,78 @@
 require_relative 'data_structures_rmolinari/shared'
-require_relative 'data_structures_rmolinari/disjoint_union_internal'
-require_relative 'data_structures_rmolinari/generic_segment_tree_internal'
-require_relative 'data_structures_rmolinari/heap_internal'
-require_relative 'data_structures_rmolinari/max_priority_search_tree_internal'
-require_relative 'data_structures_rmolinari/minmax_priority_search_tree_internal'
 module DataStructuresRMolinari
-  Pair = Shared::Pair
-  ########################################
-  # Priority Search Trees
-  #
-  # Note that MinmaxPrioritySearchTree is only a fragment of what we need
+  # A struct responding to +.x+ and +.y+.
+  Point = Shared::Point
+end
-  MaxPrioritySearchTree = MaxPrioritySearchTreeInternal
-  MinmaxPrioritySearchTree = MinmaxPrioritySearchTreeInternal
+# These define classes inside module DataStructuresRMolinari
+require_relative 'data_structures_rmolinari/disjoint_union'
+require_relative 'data_structures_rmolinari/generic_segment_tree'
+require_relative 'data_structures_rmolinari/heap'
+require_relative 'data_structures_rmolinari/max_priority_search_tree'
+require_relative 'data_structures_rmolinari/minmax_priority_search_tree'
+# A namespace to hold the provided classes. We want to avoid polluting the global namespace with names like "Heap"
+module DataStructuresRMolinari
   ########################################
-  # Segment Trees
-  GenericSegmentTree = GenericSegmentTreeInternal
-  # Takes an array A[0...n] and tells us what the maximum value is on a subinterval i..j in O(log n) time.
+  # Concrete instances of Segment Tree
   #
-  # TODO:
-  # - allow min val too
-  #   - add a flag to the initializer
-  #   - call it ExtremalValSegment tree or something similar
+  # @todo consider moving these into generic_segment_tree.rb
+  # A segment tree that for an array A(0...n) answers questions of the form "what is the maximum value in the subinterval A(i..j)?"
+  # in O(log n) time.
   class MaxValSegmentTree
     extend Forwardable
-    def_delegator :@structure, :query_on, :max_on
+    # Tell the tree that the value at idx has changed
+    def_delegator :@structure, :update_at
+    # @param data an object that contains values at integer indices based at 0, via +data[i]+.
+    #   - This will usually be an Array, but it could also be a hash or a proc.
     def initialize(data)
       @structure = GenericSegmentTree.new(
         combine:               ->(a, b) { [a, b].max },
         single_cell_array_val: ->(i) { data[i] },
         size:                  data.size,
-        identity:              -Float::INFINITY
+        identity:              -Shared::INFINITY
       )
     end
+    # The maximum value in A(i..j).
+    #
+    # The arguments must be integers in 0...(A.size)
+    # @return the largest value in A(i..j) or -Infinity if i > j.
+    def max_on(i, j)
+      @structure.query_on(i, j)
+    end
   end
-  ########################################
-  # Heap
+  # A segment tree that for an array A(0...n) answers questions of the form "what is the index of the maximal value in the
+  # subinterval A(i..j)?" in O(log n) time.
+  class IndexOfMaxValSegmentTree
+    extend Forwardable
-  Heap = HeapInternal
+    # Tell the tree that the value at idx has changed
+    def_delegator :@structure, :update_at
-  ########################################
-  # Disjoint Union
+    # @param (see MaxValSegmentTree#initialize)
+    def initialize(data)
+      @structure = GenericSegmentTree.new(
+        combine:               ->(p1, p2) { p1[1] >= p2[1] ? p1 : p2 },
+        single_cell_array_val: ->(i) { [i, data[i]] },
+        size:                  data.size,
+        identity:              nil
+      )
+    end
-  DisjointUnion = DisjointUnionInternal
+    # The index of the maximum value in A(i..j)
+    #
+    # The arguments must be integers in 0...(A.size)
+    # @return (Integer, nil) the index of the largest value in A(i..j) or +nil+ if i > j.
+    #   - If there is more than one entry with that value, return one of the indices. There is no guarantee as to which one.
+    #   - Return +nil+ if i > j
+    def index_of_max_val_on(i, j)
+      @structure.query_on(i, j)&.first # discard the value part of the pair
+    end
+  end
 end
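The `[index, value]` pairing used by `IndexOfMaxValSegmentTree` can be seen without the segment tree itself. The sketch below reuses the `combine` lambda from the diff but replaces the tree with a naive linear scan; `naive_index_of_max` is a hypothetical helper written for this illustration and is not part of the gem.

```ruby
# Same shape as the combine lambda in the diff: each cell is an [index, value]
# pair, and the pair with the larger value wins.
combine = ->(p1, p2) { p1[1] >= p2[1] ? p1 : p2 }

# Naive O(j - i) reference for the query; the real class answers in O(log n).
def naive_index_of_max(data, i, j, combine)
  return nil if i > j

  pairs = (i..j).map { |k| [k, data[k]] }
  pairs.reduce { |a, b| combine.call(a, b) }.first # discard the value part
end

data = [3, 8, 2, 8, 5]

puts naive_index_of_max(data, 2, 4, combine)
p naive_index_of_max(data, 3, 2, combine)
```

Carrying the index alongside the value lets the same generic "combine two subintervals" machinery return a position instead of a value. When the maximum occurs more than once, as with the two 8s here, which index comes back depends on the order in which pairs are combined, which matches the "no guarantee" wording in the method's documentation.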

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: data_structures_rmolinari
 version: !ruby/object:Gem::Version
-  version: 0.2.1
+  version: 0.3.0
 platform: ruby
 authors:
 - Rory Molinari
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2023-01-05 00:00:00.000000000 Z
+date: 2023-01-06 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: must_be
@@ -67,24 +67,26 @@ dependencies:
       - !ruby/object:Gem::Version
         version: 0.22.0
 description: |
-  This small gem contains several data structures that I have implemented to learn how they work.
+  This small gem contains several data structures that I have implemented in Ruby to learn how they work.
   Sometimes it is not enough to read the description of a data structure and accompanying pseudo-code.
-  Actually implementing the structure is often helpful in understanding what is going on. It is also
+  Actually implementing it is often helpful in understanding what is going on. It is also
   usually fun.
-  We implement Disjoin Union, Heap, Priority Search Tree, and Segment Tree.
-email: rorymolinari+rubygems@gmail.com
+  The gem contains basic implementations of Disjoint Union, Heap, Priority Search Tree, and Segment Tree.
+  See the homepage for more details.
+email: rorymolinari@gmail.com
 executables: []
 extensions: []
 extra_rdoc_files: []
 files:
+- CHANGELOG.md
 - lib/data_structures_rmolinari.rb
-- lib/data_structures_rmolinari/disjoint_union_internal.rb
-- lib/data_structures_rmolinari/generic_segment_tree_internal.rb
-- lib/data_structures_rmolinari/heap_internal.rb
-- lib/data_structures_rmolinari/max_priority_search_tree_internal.rb
-- lib/data_structures_rmolinari/minmax_priority_search_tree_internal.rb
+- lib/data_structures_rmolinari/disjoint_union.rb
+- lib/data_structures_rmolinari/generic_segment_tree.rb
+- lib/data_structures_rmolinari/heap.rb
+- lib/data_structures_rmolinari/max_priority_search_tree.rb
+- lib/data_structures_rmolinari/minmax_priority_search_tree.rb
 - lib/data_structures_rmolinari/shared.rb
 homepage: https://github.com/rmolinari/data_structures
 licenses: