data_structures_rmolinari 0.2.1 → 0.3.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: b8d63f6987f160f00ce92f79643f904f95bc230e4a5b60679f4301ecd366622a
4
- data.tar.gz: cdc660ab53a0259a79b72ecc3928f640faa7b1251a1bae3baabb1c70bac20d01
3
+ metadata.gz: 9f006234ee3b216d5607e9b10bb1958a6107ccfa0cc8c359f98383dc7fde14ee
4
+ data.tar.gz: f281ab0768e24e7c983cd046ba7b185dab8fd972fb3065fd73ff575782bf5486
5
5
  SHA512:
6
- metadata.gz: 057d29af42606c7ecc4a78e6c752fa5f7a29b41788f21ca7ede6bab0893d1ba63b6c9dc3e5f3b5747f3912718565d6dfcb5952d189f78b40a6ef77efe95f34c9
7
- data.tar.gz: '08f6921ab4b1c7cf84c9fe5624266b7060f4f1d4f1ecc56e3850106b3bb30e0c6b30c3ea808a61939966350118684f3353ccbb628d24b353b35ae865268222d2'
6
+ metadata.gz: e274a97f177fad44bad20ecf24ecca1385fee3c217e7e42aac076c24377970c6444dfdbadc6fd3e1e201555177429c9f8eddaee211e463dd60f6b36e74004eec
7
+ data.tar.gz: 293fc0b2973a8d851c27f4e64177dbf7b9a25b2bb7eb9efb4b33abdb07c4e006f80f4450996ef99da7e8bb1516ca8aa89ab893258960d9127d101995906254ed
data/CHANGELOG.md ADDED
@@ -0,0 +1,18 @@
1
+ # Changelog
2
+
3
+ ## [Unreleased]
4
+
5
+ ## [0.3.0] 2023-01-06
6
+
7
+ ### Added
8
+
9
+ - Start this file
10
+ - `Heap` can be constructed as "non-addressable"
11
+ - `update` is not possible but duplicates can be inserted and overall performance is a little better.
12
+
13
+ ### Changed
14
+
15
+ - `LogicError` gets a subclassed `InternalLogicError` for issues inside the library.
16
+ - `Shared::Pair` becomes `Shared::Point`
17
+ - this doesn't change the API of `MaxPrioritySearchTree` because of duck typing. But client code (of which there is none) might be
18
+ using the `Pair` name.
@@ -4,13 +4,21 @@
4
4
  # The data structure provides efficient actions to merge two disjoint subsets, i.e., replace them by their union, and determine if
5
5
  # two elements are in the same subset.
6
6
  #
7
- # The elements of the set must be 0, 1, ..., n-1. Client code can map its data to these representatives. The code uses several ideas
8
- # from Tarjan and van Leeuwen for efficiency
7
+ # The elements of the set are 0, 1, ..., n-1, where n is the size of the universe. Client code can map its data to these
8
+ # representatives.
9
9
  #
10
10
  # See https://en.wikipedia.org/wiki/Disjoint-set_data_structure for a good introduction.
11
11
  #
12
+ # The code uses several ideas from Tarjan and van Leeuwen for efficiency. We use "union by rank" in +unite+ and path-halving in
13
+ # +find+. Together, these make the amortized cost for each of n such operations effectively constant.
14
+ #
12
15
  # - Tarjan, Robert E., van Leeuwen, Jan (1984). "Worst-case analysis of set union algorithms". Journal of the ACM. 31 (2): 245–281.
13
- class DisjointUnionInternal
16
+ #
17
+ # @todo
18
+ # - allow caller to expand the size of the universe. This operation is called "make set".
19
+ # - All we need to do is increase the size of @d, set the parent pointers, define the new ranks (zero), and update @size.
20
+ class DataStructuresRMolinari::DisjointUnion
21
+ # The number of subsets in the partition.
14
22
  attr_reader :subset_count
15
23
 
16
24
  # @param size the size of the universe, which must be known at the time of construction. The elements 0, 1, ..., size - 1 start
@@ -52,7 +60,7 @@ class DisjointUnionInternal
52
60
  end
53
61
 
54
62
  private def check_value(v)
55
- raise "Value must be given and be in (0..#{@size - 1})" unless v && v.between?(0, @size - 1)
63
+ raise DataError, "Value must be given and be in (0..#{@size - 1})" unless v && v.between?(0, @size - 1)
56
64
  end
57
65
 
58
66
  private def link(e, f)
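The class comment above names two techniques: "union by rank" in +unite+ and path halving in +find+. The following is a minimal sketch of how those two ideas fit together. It is not the gem's code: the class name TinyDisjointUnion and the choice to return silently when both elements are already in the same subset are assumptions made for illustration.

```ruby
# Illustrative sketch only; TinyDisjointUnion is a hypothetical name, not the gem's class.
class TinyDisjointUnion
  attr_reader :subset_count

  def initialize(size)
    @parent = (0...size).to_a    # every element starts as the root of its own subset
    @rank = Array.new(size, 0)   # upper bound on the height of the tree under each root
    @subset_count = size
  end

  # Path halving: while walking to the root, re-point each visited node at its grandparent.
  def find(e)
    while @parent[e] != e
      @parent[e] = @parent[@parent[e]]
      e = @parent[e]
    end
    e
  end

  # Union by rank: hang the root of lower rank under the root of higher rank.
  # Assumption: uniting two elements of the same subset is a silent no-op here.
  def unite(e, f)
    re = find(e)
    rf = find(f)
    return if re == rf

    re, rf = rf, re if @rank[re] < @rank[rf]
    @parent[rf] = re
    @rank[re] += 1 if @rank[re] == @rank[rf]
    @subset_count -= 1
  end
end

du = TinyDisjointUnion.new(6)
du.unite(0, 1)
du.unite(2, 3)
du.unite(1, 3)
```

After these three calls, elements 0 through 3 share one representative and 4 and 5 remain singletons, so three subsets are left.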
@@ -8,45 +8,33 @@ require_relative 'shared'
8
8
  # called an "interval tree."
9
9
  #
10
10
  # For more details (and some close-to-metal analysis of run time, especially for large datasets) see
11
- # https://en.algorithmica.org/hpc/data-structures/segment-trees/. In particular, this shows how to do a bottom-up
12
- # implementation, which is faster, at least for large datasets and cache-relevant compiled code.
11
+ # https://en.algorithmica.org/hpc/data-structures/segment-trees/. In particular, this shows how to do a bottom-up implementation,
12
+ # which is faster, at least for large datasets and cache-relevant compiled code. These issues don't really apply to code written in
13
+ # Ruby.
13
14
  #
14
- # This is a generic implementation.
15
+ # This is a generic implementation, intended to allow easy configuration for concrete instances. See the parameters to the
16
+ # initializer and the definitions of concrete realisations like MaxValSegmentTree.
15
17
  #
16
18
  # We do O(n) work to build the internal data structure at initialization. Then we answer queries in O(log n) time.
17
- #
18
- # @todo
19
- # - provide a data-update operation like update_val_at(idx, val)
20
- # - this is O(log n)
21
- # - note that this may need some rework. Consider something like IndexOfMaxVal: @merge needs to know about the underlying data
22
- # in that case. Hmmm. Maybe the lambda can close over the data in a way that makes it possible to change the data "from the
23
- # outside". Yes:
24
- # a = [1,2,3]
25
- # foo = ->() { a.max }
26
- # foo.call # 3
27
- # a = [1,2,4]
28
- # foo.call # 4
29
- # - Offer an optional parameter base_case_value_extractor (<-- need better name) to be used in #determine_val in the case that
30
- # left == tree_l && right == tree_r instead of simply returning @tree[tree_idx]
31
- # - Use case: https://cp-algorithms.com/data_structures/segment_tree.html#saving-the-entire-subarrays-in-each-vertex, such as
32
- # finding the least element in a subarray l..r no smaller than a given value x. In this case we store a sorted version the
33
- # entire subarray at each node and use a binary search on it.
34
- # - the default value would simply be the identity function.
35
- # - NOTE that in this case, we have different "combine" functions in #determine_val and #build. In #build we would combine
36
- # sorted lists into a larger sorted list. In #determine_val we combine results via #min.
37
- # - Think about the interface before doing this.
38
- class GenericSegmentTreeInternal
19
+ class DataStructuresRMolinari::GenericSegmentTree
39
20
  include Shared::BinaryTreeArithmetic
40
21
 
41
22
  # Construct a concrete instance of a Segment Tree. See details at the links above for the underlying concepts here.
42
23
  # @param combine a lambda that takes two values and munges them into a combined value.
43
24
  # - For example, if we are calculating sums over subintervals, combine.call(a, b) = a + b, while if we are doing maxima we will
44
- # return max(a, b)
25
+ # return max(a, b).
26
+ # - Things get more complicated when we are calculating, say, the _index_ of the maximal value in a subinterval. Now it is not
27
+ # enough simply to store that index at each tree node, because to combine the indices from two child nodes we need to know
28
+ # not only the index of the maximal element in each child node's interval, but also the maximal values themselves, so we know
29
+ # which one "wins" for the parent node. This affects the sort of work we need to do when combining and the value provided by
30
+ # the +single_cell_array_val+ lambda.
45
31
  # @param single_cell_array_val a lambda that takes an index i and returns the value we need to store in the #build
46
- # operation for the subinterval i..i. This is often simply be the value data[i], but in some cases - like "index of max val" -
47
- # it will be something else.
32
+ # operation for the subinterval i..i.
33
+ # - This will often simply be the value data[i], but in some cases it will be something else. For example, when we are
34
+ # calculating the index of the maximal value on each subinterval we need [i, data[i]] here.
35
+ # - If +update_at+ is called later, this lambda must close over the underlying data in a way that captures the updated value.
48
36
  # @param size the size of the underlying data array, used in certain internal arithmetic.
49
- # @param identity is the value to return when we are querying on an empty interval
37
+ # @param identity the value to return when we are querying on an empty interval
50
38
  # - for sums, this will be zero; for maxima, this will be -Infinity, etc
51
39
  def initialize(combine:, single_cell_array_val:, size:, identity:)
52
40
  @combine = combine
@@ -62,15 +50,28 @@ class GenericSegmentTreeInternal
62
50
  # @param left the left end of the subinterval.
63
51
  # @param right the right end (inclusive) of the subinterval.
64
52
  #
65
- # The type of the return value depends on the concrete instance of the segment tree.
53
+ # The type of the return value depends on the concrete instance of the segment tree. We return the _identity_ element provided at
54
+ # construction time if the interval is empty.
66
55
  def query_on(left, right)
67
- raise "Bad query interval #{left}..#{right}" if left.negative? || right >= @size
56
+ raise DataError, "Bad query interval #{left}..#{right}" if left.negative? || right >= @size
68
57
 
69
58
  return @identity if left > right # empty interval
70
59
 
71
60
  determine_val(root, left, right, 0, @size - 1)
72
61
  end
73
62
 
63
+ # Update the value in the underlying array at the given idx
64
+ #
65
+ # @param idx an index in the underlying data array.
66
+ #
67
+ # Note that we don't need the updated value itself. We get that by calling the lambda +single_cell_array_val+ supplied at
68
+ # construction.
69
+ def update_at(idx)
70
+ raise DataError, 'Cannot update an index outside the initial range of the underlying data' unless (0...@size).cover?(idx)
71
+
72
+ update_val_at(idx, root, 0, @size - 1)
73
+ end
74
+
74
75
  private def determine_val(tree_idx, left, right, tree_l, tree_r)
75
76
  # Does the current tree node exactly serve up the interval we're interested in?
76
77
  return @tree[tree_idx] if left == tree_l && right == tree_r
@@ -92,6 +93,26 @@ class GenericSegmentTreeInternal
92
93
  end
93
94
  end
94
95
 
96
+ private def update_val_at(idx, tree_idx, tree_l, tree_r)
97
+ if tree_l == tree_r
98
+ # We have found the spot!
99
+ raise InternalLogicError, 'tree_l == tree_r, but they do not agree with the idx holding the updated value' unless tree_l == idx
100
+
101
+ @tree[tree_idx] = @single_cell_array_val.call(tree_l)
102
+ else
103
+ # Recursively update the appropriate subtree
104
+ mid = midpoint(tree_l, tree_r)
105
+ left = left(tree_idx)
106
+ right = right(tree_idx)
107
+ if mid >= idx
108
+ update_val_at(idx, left(tree_idx), tree_l, mid)
109
+ else
110
+ update_val_at(idx, right(tree_idx), mid + 1, tree_r)
111
+ end
112
+ @tree[tree_idx] = @combine.call(@tree[left], @tree[right])
113
+ end
114
+ end
115
+
95
116
  # Build the internal data structure.
96
117
  #
97
118
  # - tree_idx is the index into @tree
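The +update_at(idx)+ method added above takes no new value: it re-reads position idx through the +single_cell_array_val+ lambda and then recombines the nodes on the path back to the root. A minimal sketch of that pattern for a max tree follows. TinyMaxTree and its internals are hypothetical and much simpler than the gem's generic class; the point is only that the lambda reads from the same array the caller mutates.

```ruby
# Illustrative sketch only; TinyMaxTree is a hypothetical stand-in for a concrete segment tree.
class TinyMaxTree
  def initialize(size:, single_cell_array_val:)
    @size = size
    @single_cell_array_val = single_cell_array_val
    @tree = []
    build(1, 0, size - 1)
  end

  # Maximum over left..right (inclusive). -Infinity plays the role of the identity.
  def query_on(left, right)
    return -Float::INFINITY if left > right

    query(1, left, right, 0, @size - 1)
  end

  # Re-read position idx through the lambda and refresh the nodes above it.
  def update_at(idx)
    update(idx, 1, 0, @size - 1)
  end

  private

  def build(node, lo, hi)
    if lo == hi
      @tree[node] = @single_cell_array_val.call(lo)
    else
      mid = (lo + hi) / 2
      build(2 * node, lo, mid)
      build(2 * node + 1, mid + 1, hi)
      @tree[node] = [@tree[2 * node], @tree[2 * node + 1]].max
    end
  end

  def query(node, left, right, lo, hi)
    return @tree[node] if left == lo && right == hi

    mid = (lo + hi) / 2
    return query(2 * node, left, right, lo, mid) if right <= mid
    return query(2 * node + 1, left, right, mid + 1, hi) if left > mid

    [query(2 * node, left, mid, lo, mid), query(2 * node + 1, mid + 1, right, mid + 1, hi)].max
  end

  def update(idx, node, lo, hi)
    if lo == hi
      @tree[node] = @single_cell_array_val.call(lo)
    else
      mid = (lo + hi) / 2
      if idx <= mid
        update(idx, 2 * node, lo, mid)
      else
        update(idx, 2 * node + 1, mid + 1, hi)
      end
      @tree[node] = [@tree[2 * node], @tree[2 * node + 1]].max
    end
  end
end

data = [3, 1, 4, 1, 5]
tree = TinyMaxTree.new(size: data.size, single_cell_array_val: ->(i) { data[i] })
data[1] = 9        # mutate the array the lambda closes over
tree.update_at(1)  # then tell the tree which index changed
```

Changing data without calling update_at leaves the tree answering from stale values, which is why the two steps go together.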
@@ -13,8 +13,8 @@ require_relative 'shared'
13
13
  # - +empty?+
14
14
  # - is the heap empty?
15
15
  # - O(1)
16
- # - +insert+
17
- # - add a new element to the heap with an associated priority
16
+ # - +insert(item, priority)+
17
+ # - add a new item to the heap with an associated priority
18
18
  # - O(log N)
19
19
  # - +top+
20
20
  # - return the lowest-priority element, which is the element at the root of the tree. In a max-heap this is the highest-priority
@@ -23,12 +23,18 @@ require_relative 'shared'
23
23
  # - +pop+
24
24
  # - removes and returns the item that would be returned by +top+
25
25
  # - O(log N)
26
- # - +update+
26
+ # - +update(item, priority)+
27
27
  # - tell the heap that the priority of a particular item has changed
28
28
  # - O(log N)
29
29
  #
30
30
  # Here N is the number of elements in the heap.
31
31
  #
32
+ # The internal requirements needed to implement +update+ have several consequences.
33
+ # - Items added to the heap must be distinct. Otherwise we would not know which occurrence to update
34
+ # - There is some bookkeeping overhead.
35
+ # If client code doesn't need to call +update+ then we can create a "non-addressable" heap that allows for the insertion of
36
+ # duplicate items and has slightly faster runtime overall. See the arguments to the initializer.
37
+ #
32
38
  # References:
33
39
  #
34
40
  # - https://en.wikipedia.org/wiki/Binary_heap
@@ -36,23 +42,31 @@ require_relative 'shared'
36
42
  # DOI 10.1007/s00224-017-9760-2
37
43
  #
38
44
  # @todo
39
- # - relax the requirement that priorities must be comparable vai +<+ and respond to negation. Instead, allow comparison via +<=>+
40
- # and handle max-heaps differently.
41
- # - this will allow priorities to be arrays for tie-breakers and similar.
42
- class HeapInternal
45
+ # - let caller see the priority of the top element. Maybe this is useful sometimes.
46
+ class DataStructuresRMolinari::Heap
47
+ include Shared
43
48
  include Shared::BinaryTreeArithmetic
44
49
 
50
+ # The number of items currently in the heap
45
51
  attr_reader :size
46
52
 
47
- Pair = Struct.new(:priority, :item)
53
+ # An (item, priority) pair
54
+ InternalPair = Struct.new(:item, :priority)
55
+ private_constant :InternalPair
48
56
 
49
57
  # @param max_heap when truthy, make a max-heap rather than a min-heap
50
- # @param debug when truthy, verify the heap property after each update than might violate it. This makes operations much slower.
51
- def initialize(max_heap: false, debug: false)
58
+ # @param addressable when truthy, the heap is _addressable_. This means that
59
+ # - item priorities are updatable with +update(item, p)+, and
60
+ # - items added to the heap must be distinct.
61
+ # When falsy, priorities are not updatable but items may be inserted multiple times. Operations are slightly faster because
62
+ # there is less internal bookkeeping.
63
+ # @param debug when truthy, verify the heap property after each change that might violate it. This makes operations much slower.
64
+ def initialize(max_heap: false, addressable: true, debug: false)
52
65
  @data = []
53
66
  @size = 0
54
67
  @max_heap = max_heap
55
- @index_of = {}
68
+ @addressable = addressable
69
+ @index_of = {} # used in addressable heaps
56
70
  @debug = debug
57
71
  end
58
72
 
@@ -61,27 +75,25 @@ class HeapInternal
61
75
  @size.zero?
62
76
  end
63
77
 
64
- # Insert a new element into the heap with the given property.
65
- # @param value the item to be inserted. It is an error to insert an item that is already present in the heap, though we don't
66
- # check for this.
67
- # @param priority the priority to use for new item. The values used as priorities ust be totally ordered via +<+ and, if +self+ is
68
- # a max-heap, must respond to negation +@-+ in the natural order-respecting way.
69
- # @todo
70
- # - check for duplicate
78
+ # Insert a new element into the heap with the given priority.
79
+ # @param value the item to be inserted.
80
+ # - If the heap is addressable (the default) it is an error to insert an item that is already present in the heap.
81
+ # @param priority the priority to use for the new item. The values used as priorities must be totally ordered via +<=>+.
71
82
  def insert(value, priority)
72
- priority *= -1 if @max_heap
83
+ raise DataError, "Heap already contains #{value}" if @addressable && contains?(value)
73
84
 
74
85
  @size += 1
75
86
 
76
- d = Pair.new(priority, value)
87
+ d = InternalPair.new(value, priority)
77
88
  assign(d, @size)
78
89
 
79
90
  sift_up(@size)
80
91
  end
81
92
 
82
93
  # Return the top of the heap without removing it
83
- # @return the value with minimal (maximal for max-heaps) priority. Strictly speaking, it returns the item at the root of the
84
- # binary tree; this element has minimal priority, but there may be other elements with the same priority.
94
+ # @return a value with minimal priority (maximal for max-heaps). Strictly speaking, it returns the item at the root of the
95
+ # binary tree; this element has minimal priority, but there may be other elements with the same priority and they do not appear
96
+ # at the top of the heap in any guaranteed order.
85
97
  def top
86
98
  raise 'Heap is empty!' unless @size.positive?
87
99
 
@@ -92,12 +104,11 @@ class HeapInternal
92
104
  # @return (see #top)
93
105
  def pop
94
106
  result = top
95
- @index_of.delete(result)
96
-
97
107
  assign(@data[@size], root)
98
108
 
99
109
  @data[@size] = nil
100
110
  @size -= 1
111
+ @index_of.delete(result) if @addressable
101
112
 
102
113
  sift_down(root) if @size.positive?
103
114
 
@@ -105,21 +116,20 @@ class HeapInternal
105
116
  end
106
117
 
107
118
  # Update the priority of the given element and maintain the necessary heap properties.
119
+ #
108
120
  # @param element the item whose priority we are updating. It is an error to update the priority of an element not already in the
109
121
  # heap
110
122
  # @param priority the new priority
111
- #
112
- # @todo
113
- # - check that the element is in the heap
114
123
  def update(element, priority)
115
- priority *= -1 if @max_heap
124
+ raise LogicError, 'Cannot update priorities in a non-addressable heap' unless @addressable
125
+ raise DataError, "Cannot update priority for value #{element} not already in the heap" unless contains?(element)
116
126
 
117
127
  idx = @index_of[element]
118
128
  old = @data[idx].priority
119
129
  @data[idx].priority = priority
120
- if priority > old
130
+ if less_than_priority?(old, priority)
121
131
  sift_down(idx)
122
- elsif priority < old
132
+ elsif less_than_priority?(priority, old)
123
133
  sift_up(idx)
124
134
  end
125
135
 
@@ -133,7 +143,7 @@ class HeapInternal
133
143
  x = @data[idx]
134
144
  while idx != root
135
145
  i = parent(idx)
136
- break unless x.priority < @data[i].priority
146
+ break unless less_than?(x, @data[i])
137
147
 
138
148
  assign(@data[i], idx)
139
149
  idx = i
@@ -148,9 +158,9 @@ class HeapInternal
148
158
  x = @data[idx]
149
159
 
150
160
  while (j = left(idx)) <= @size
151
- j += 1 if j + 1 <= @size && @data[j + 1].priority < @data[j].priority
161
+ j += 1 if j + 1 <= @size && less_than?(@data[j + 1], @data[j])
152
162
 
153
- break unless @data[j].priority < x.priority
163
+ break unless less_than?(@data[j], x)
154
164
 
155
165
  assign(@data[j], idx)
156
166
  idx = j
@@ -163,7 +173,27 @@ class HeapInternal
163
173
  # Put the pair in the given heap location
164
174
  private def assign(pair, idx)
165
175
  @data[idx] = pair
166
- @index_of[pair.item] = idx
176
+ @index_of[pair.item] = idx if @addressable
177
+ end
178
+
179
+ # Compare the priorities of two items with <=> and return truthy exactly when the result is -1.
180
+ #
181
+ # If this is a max-heap return truthy exactly when the result of <=> is 1.
182
+ #
183
+ # The arguments are heap entries that respond to +priority+. To compare bare priorities, use +less_than_priority?+.
184
+ private def less_than?(p1, p2)
185
+ less_than_priority?(p1.priority, p2.priority)
186
+ end
187
+
188
+ # Direct comparison of priorities
189
+ private def less_than_priority?(priority1, priority2)
190
+ return (priority1 <=> priority2) == 1 if @max_heap
191
+
192
+ (priority1 <=> priority2) == -1
193
+ end
194
+
195
+ private def contains?(item)
196
+ !!@index_of[item]
167
197
  end
168
198
 
169
199
  # For debugging
@@ -172,8 +202,8 @@ class HeapInternal
172
202
  left = left(idx)
173
203
  right = right(idx)
174
204
 
175
- raise "Heap property violated by left child of index #{idx}" if left <= @size && @data[idx].priority >= @data[left].priority
176
- raise "Heap property violated by right child of index #{idx}" if right <= @size && @data[idx].priority >= @data[right].priority
205
+ raise InternalLogicError, "Heap property violated by left child of index #{idx}" if left <= @size && less_than?(@data[left], @data[idx])
206
+ raise InternalLogicError, "Heap property violated by right child of index #{idx}" if right <= @size && less_than?(@data[right], @data[idx])
177
207
  end
178
208
  end
179
209
  end
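The heap changes above replace the old "negate the priority for a max-heap" approach with a comparison through +<=>+ whose sense is flipped for max-heaps. One consequence is that priorities no longer need to respond to negation, so arrays can serve as priorities with built-in tie-breakers. The sketch below is a free-standing version of that comparison. The max_heap keyword argument is an assumption made so the example is self-contained; the gem reads an instance variable instead.

```ruby
# Illustrative sketch only; a free-standing version of the comparison idea, not the gem's method.
def less_than_priority?(priority1, priority2, max_heap: false)
  cmp = priority1 <=> priority2
  max_heap ? cmp == 1 : cmp == -1
end

# In a min-heap, 1 sorts ahead of 2.
min_result = less_than_priority?(1, 2)

# In a max-heap the sense is reversed, so 1 does not sort ahead of 2.
max_result = less_than_priority?(1, 2, max_heap: true)

# Arrays compare element by element, so the second entry acts as a tie-breaker.
tie_result = less_than_priority?([1, 'b'], [1, 'a'], max_heap: true)
```

Equal priorities are "less than" in neither direction, which matches the two-branch test in +update+.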
@@ -1,13 +1,9 @@
1
1
  require 'set'
2
2
  require_relative 'shared'
3
3
 
4
-
5
- # A priority search tree (PST) stores a set, P, of two-dimensional points (x,y) in a way that allows efficient answes to certain
4
+ # A priority search tree (PST) stores a set, P, of two-dimensional points (x,y) in a way that allows efficient answers to certain
6
5
  # questions about P.
7
6
  #
8
- # (In the current implementation no two points can share an x-value and no two points can share a y-value. This (rather severe)
9
- # restriction can be relaxed with some more complicated code.)
10
- #
11
7
  # The data structure was introduced in 1985 by Edward McCreight. Later, De, Maheshwari, Nandy, and Smid showed how to construct a
12
8
  # PST in-place (using only O(1) extra memory), at the expense of some slightly more complicated code for the various supported
13
9
  # operations. It is their approach that we have implemented.
@@ -33,21 +29,29 @@ require_relative 'shared'
33
29
  #
34
30
  # The final operation (enumerate) takes O(m + log n) time, where m is the number of points that are enumerated.
35
31
  #
32
+ # In the current implementation no two points can share an x-value and no two points can share a y-value. This (rather severe)
33
+ # restriction can be relaxed with some more complicated code.
34
+ #
35
+ #
36
36
  # There is a related data structure called the Min-max priority search tree so we have called this a "Max priority search tree", or
37
37
  # MaxPST.
38
38
  #
39
39
  # References:
40
- # * E.M. McCreight, _Priority search trees_, SIAM J. Comput., 14(2):257-276, 1985. Later, De,
40
+ # * E.M. McCreight, _Priority search trees_, SIAM J. Comput., 14(2):257-276, 1985.
41
41
  # * M. De, A. Maheshwari, S. C. Nandy, M. Smid, _An In-Place Priority Search Tree_, 23rd Canadian Conference on Computational
42
42
  # Geometry, 2011
43
- class MaxPrioritySearchTreeInternal
43
+ class DataStructuresRMolinari::MaxPrioritySearchTree
44
44
  include Shared
45
45
  include BinaryTreeArithmetic
46
46
 
47
47
  # Construct a MaxPST from the collection of points in +data+.
48
48
  #
49
- # @param data [Array] the set P of points presented as an array. The tree is built in the array in-place without cloning. Each
50
- # element of the array must respond to +#x+ and +#y+ (though this is not currently checked).
49
+ # @param data [Array] the set P of points presented as an array. The tree is built in the array in-place without cloning.
50
+ # - Each element of the array must respond to +#x+ and +#y+.
51
+ # - This is not checked explicitly but a missing method exception will be thrown when we try to call one of them.
52
+ # - The +x+ values must be distinct, as must the +y+ values. We raise a +Shared::DataError+ if this isn't the case.
53
+ # - This is a restriction that simplifies some of the algorithm code. It can be removed at the cost of some extra work. Issue
54
+ # #9.
51
55
  #
52
56
  # @param verify [Boolean] when truthy, check that the properties of a PST are satisfied after construction, raising an exception
53
57
  # if not.
@@ -69,7 +73,7 @@ class MaxPrioritySearchTreeInternal
69
73
  # Let Q = [x0, infty) X [y0, infty) be the northeast quadrant defined by the point (x0, y0) and let P be the points in this data
70
74
  # structure. Define p* as
71
75
  #
72
- # - (infty, -infty) f Q \intersect P is empty and
76
+ # - (infty, -infty) if Q \intersect P is empty and
73
77
  # - the highest (max-y) point in Q \intersect P otherwise.
74
78
  #
75
79
  # This method returns p* in O(log n) time and O(1) extra space.
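The definition of p* above can be checked against a brute-force reading. The sketch below is an O(n) scan, not the O(log n) tree search the class implements. It reads "highest" as largest y, which is an assumption consistent with the sentinel's y-value of -infinity. The names Pt and highest_ne_brute_force are hypothetical.

```ruby
# Illustrative sketch only; a brute-force O(n) reading of the definition, not the tree algorithm.
Pt = Struct.new(:x, :y)

def highest_ne_brute_force(points, x0, y0)
  # Q = [x0, infty) X [y0, infty)
  in_quadrant = points.select { |pt| pt.x >= x0 && pt.y >= y0 }
  return Pt.new(Float::INFINITY, -Float::INFINITY) if in_quadrant.empty?

  in_quadrant.max_by(&:y)
end

pts = [Pt.new(1, 4), Pt.new(3, 9), Pt.new(5, 2), Pt.new(7, 6)]
best = highest_ne_brute_force(pts, 4, 1)
```

With x0 = 4 and y0 = 1 the quadrant contains (5, 2) and (7, 6), so the scan picks (7, 6).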
@@ -82,7 +86,7 @@ class MaxPrioritySearchTreeInternal
82
86
  # Let Q = (-infty, x0] X [y0, infty) be the northwest quadrant defined by the point (x0, y0) and let P be the points in this data
83
87
  # structure. Define p* as
84
88
  #
85
- # - (-infty, -infty) f Q \intersect P is empty and
89
+ # - (-infty, -infty) if Q \intersect P is empty and
86
90
  # - the highest (max-y) point in Q \intersect P otherwise.
87
91
  #
88
92
  # This method returns p* in O(log n) time and O(1) extra space.
@@ -109,12 +113,12 @@ class MaxPrioritySearchTreeInternal
109
113
 
110
114
  p = root
111
115
  if quadrant == :ne
112
- best = Pair.new(INFINITY, -INFINITY)
116
+ best = Point.new(INFINITY, -INFINITY)
113
117
  preferred_child = ->(n) { right(n) }
114
118
  nonpreferred_child = ->(n) { left(n) }
115
119
  sufficient_x = ->(x) { x >= x0 }
116
120
  else
117
- best = Pair.new(-INFINITY, -INFINITY)
121
+ best = Point.new(-INFINITY, -INFINITY)
118
122
  preferred_child = ->(n) { left(n) }
119
123
  nonpreferred_child = ->(n) { right(n) }
120
124
  sufficient_x = ->(x) { x <= x0 }
@@ -186,7 +190,7 @@ class MaxPrioritySearchTreeInternal
186
190
  # Let Q = [x0, infty) X [y0, infty) be the northeast quadrant defined by the point (x0, y0) and let P be the points in this data
187
191
  # structure. Define p* as
188
192
  #
189
- # - (infty, infty) f Q \intersect P is empty and
193
+ # - (infty, infty) if Q \intersect P is empty and
190
194
  # - the leftmost (min-x) point in Q \intersect P otherwise.
191
195
  #
192
196
  # This method returns p* in O(log n) time and O(1) extra space.
@@ -224,10 +228,10 @@ class MaxPrioritySearchTreeInternal
224
228
 
225
229
  if quadrant == :ne
226
230
  sign = 1
227
- best = Pair.new(INFINITY, INFINITY)
231
+ best = Point.new(INFINITY, INFINITY)
228
232
  else
229
233
  sign = -1
230
- best = Pair.new(-INFINITY, INFINITY)
234
+ best = Point.new(-INFINITY, INFINITY)
231
235
  end
232
236
 
233
237
  p = q = root
@@ -369,7 +373,7 @@ class MaxPrioritySearchTreeInternal
369
373
  #
370
374
  # Sometimes we don't have a relevant node to the left or right of Q. The booleans L and R (which we call left and right) track
371
375
  # whether p and q are defined at the moment.
372
- best = Pair.new(INFINITY, -INFINITY)
376
+ best = Point.new(INFINITY, -INFINITY)
373
377
  p = q = left = right = nil
374
378
 
375
379
  x_range = (x0..x1)
@@ -637,7 +641,7 @@ class MaxPrioritySearchTreeInternal
637
641
  end
638
642
  current = parent(current)
639
643
  else
640
- raise LogicError, "Explore(t) state is somehow #{state} rather than 0, 1, or 2."
644
+ raise InternalLogicError, "Explore(t) state is somehow #{state} rather than 0, 1, or 2."
641
645
  end
642
646
  end
643
647
  end
@@ -782,7 +786,7 @@ class MaxPrioritySearchTreeInternal
782
786
  p_in = right(p_in)
783
787
  left = true
784
788
  else
785
- raise LogicError, 'q_in cannot be active, by the value in the right child of p_in!' if right_in
789
+ raise InternalLogicError, 'q_in cannot be active, by the value in the right child of p_in!' if right_in
786
790
 
787
791
  p = left(p_in)
788
792
  q = right(p_in)
@@ -792,7 +796,7 @@ class MaxPrioritySearchTreeInternal
792
796
  end
793
797
  elsif left_val.x <= x1
794
798
  if right_val.x > x1
795
- raise LogicError, 'q_in cannot be active, by the value in the right child of p_in!' if right_in
799
+ raise InternalLogicError, 'q_in cannot be active, by the value in the right child of p_in!' if right_in
796
800
 
797
801
  q = right(p_in)
798
802
  p_in = left(p_in)
@@ -806,7 +810,7 @@ class MaxPrioritySearchTreeInternal
806
810
  right_in = true
807
811
  end
808
812
  else
809
- raise LogicError, 'q_in cannot be active, by the value in the right child of p_in!' if right_in
813
+ raise InternalLogicError, 'q_in cannot be active, by the value in the right child of p_in!' if right_in
810
814
 
811
815
  q = left(p_in)
812
816
  deactivate_p_in.call
@@ -842,8 +846,8 @@ class MaxPrioritySearchTreeInternal
842
846
 
843
847
  # q has two children. Cases!
844
848
  if @data[left(q)].x < x0
845
- raise LogicError, 'p_in should not be active, based on the value at left(q)' if left_in
846
- raise LogicError, 'q_in should not be active, based on the value at left(q)' if right_in
849
+ raise InternalLogicError, 'p_in should not be active, based on the value at left(q)' if left_in
850
+ raise InternalLogicError, 'q_in should not be active, based on the value at left(q)' if right_in
847
851
 
848
852
  left = true
849
853
  if @data[right(q)].x < x0
@@ -874,7 +878,7 @@ class MaxPrioritySearchTreeInternal
874
878
 
875
879
  # Given: q' is active and satisfies x0 <= x(q') <= x1
876
880
  enumerate_right_in = lambda do
877
- raise LogicError, 'right_in should be true if we call enumerate_right_in' unless right_in
881
+ raise InternalLogicError, 'right_in should be true if we call enumerate_right_in' unless right_in
878
882
 
879
883
  if @data[q_in].y >= y0
880
884
  report.call(q_in)
@@ -906,7 +910,7 @@ class MaxPrioritySearchTreeInternal
906
910
  # q' has two children
907
911
  right_val = @data[right(q_in)]
908
912
  if left_val.x < x0
909
- raise LogicError, 'p_in cannot be active, by the value in the left child of q_in' if left_in
913
+ raise InternalLogicError, 'p_in cannot be active, by the value in the left child of q_in' if left_in
910
914
 
911
915
  if right_val.x < x0
912
916
  p = right(q_in)
@@ -966,7 +970,7 @@ class MaxPrioritySearchTreeInternal
966
970
 
967
971
  while left || left_in || right_in || right
968
972
  # byebug if $do_it
969
- raise LogicError, 'It should not be that q_in is active but p_in is not' if right_in && !left_in
973
+ raise InternalLogicError, 'It should not be that q_in is active but p_in is not' if right_in && !left_in
970
974
 
971
975
  set_i = []
972
976
  set_i << :left if left
@@ -984,7 +988,7 @@ class MaxPrioritySearchTreeInternal
984
988
  when :right
985
989
  enumerate_right.call
986
990
  else
987
- raise LogicError, "bad symbol #{z}"
991
+ raise InternalLogicError, "bad symbol #{z}"
988
992
  end
989
993
  end
990
994
  return result unless block_given?
@@ -994,9 +998,14 @@ class MaxPrioritySearchTreeInternal
994
998
  # Build the initial structure
995
999
 
996
1000
  private def construct_pst
997
- # We follow the algorithm in the paper by De, Maheshwari et al. Note that indexing is from 1 there. For now we pretend that that
998
- # is the case here, too.
1001
+ raise DataError, 'Duplicate x values are not supported' if contains_duplicates?(@data, by: :x)
1002
+ raise DataError, 'Duplicate y values are not supported' if contains_duplicates?(@data, by: :y)
1003
+
1004
+ # We follow the algorithm in the paper by De, Maheshwari et al.
999
1005
 
1006
+ # Since we are building an implicit binary tree, things are simpler if the array is 1-based. This probably requires a malloc and
1007
+ # data copy, which isn't great, but it's in the C layer so cheap compared to the O(n log^2 n) work we need to do for
1008
+ # construction. In fact, we are probably doing O(n^2) work because of all the calls to #index_with_largest_y_in.
1000
1009
  @data.unshift nil
1001
1010
 
1002
1011
  h = Math.log2(@size).floor
@@ -1106,13 +1115,11 @@ class MaxPrioritySearchTreeInternal
1106
1115
  (l..r).max_by { |idx| @data[idx].y }
1107
1116
  end
1108
1117
 
1109
- # Sort the subarray @data[l..r]. This is much faster than a Ruby-layer heapsort because it is mostly happening in C.
1118
+ # Sort the subarray @data[l..r].
1110
1119
  private def sort_subarray(l, r)
1111
- # heapsort_subarray(l, r)
1112
1120
  return if l == r # 1-array already sorted!
1113
1121
 
1114
- #l -= 1
1115
- #r -= 1
1122
+ # This slice-replacement is much faster than a Ruby-layer heapsort because it is mostly happening in C.
1116
1123
  @data[l..r] = @data[l..r].sort_by(&:x)
1117
1124
  end
1118
1125
 
@@ -1127,7 +1134,7 @@ class MaxPrioritySearchTreeInternal
1127
1134
  private def verify_properties
1128
1135
  # It's a max-heap in y
1129
1136
  (2..@size).each do |node|
1130
- raise LogicError, "Heap property violated at child #{node}" unless @data[node].y < @data[parent(node)].y
1137
+ raise InternalLogicError, "Heap property violated at child #{node}" unless @data[node].y < @data[parent(node)].y
1131
1138
  end
1132
1139
 
1133
1140
  # Left subtree has x values less than all of the right subtree
@@ -1137,7 +1144,7 @@ class MaxPrioritySearchTreeInternal
1137
1144
  left_max = max_x_in_subtree(left(node))
1138
1145
  right_min = min_x_in_subtree(right(node))
1139
1146
 
1140
- raise LogicError, "Left-right property of x-values violated at #{node}" unless left_max < right_min
1147
+ raise InternalLogicError, "Left-right property of x-values violated at #{node}" unless left_max < right_min
1141
1148
  end
1142
1149
  end
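
A rough sketch of what the change from `LogicError` to `InternalLogicError` means for callers: because the new class subclasses `LogicError`, an existing `rescue LogicError` still catches it. The error classes are redefined at top level here only so the example is self-contained, and `check_heap_property` with its `node / 2` parent rule is a hypothetical simplification of `verify_properties`, assuming the usual 1-based implicit binary tree layout.

```ruby
# Stand-ins for the classes defined in Shared (redefined so this runs alone).
class LogicError < StandardError; end
class InternalLogicError < LogicError; end

# Hypothetical helper: ys is 1-based, so ys[0] is an unused placeholder.
# Assumes the parent of node i is i / 2.
def check_heap_property(ys)
  (2...ys.size).each do |node|
    parent = node / 2
    raise InternalLogicError, "Heap property violated at child #{node}" unless ys[node] < ys[parent]
  end
  true
end

# Rescuing the parent class also catches the subclass.
caught = begin
  check_heap_property([nil, 5, 9, 3])
rescue LogicError => e
  e
end

valid = check_heap_property([nil, 9, 5, 3])
```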
1143
1150
 
@@ -2,15 +2,13 @@ require 'must_be'
2
2
 
3
3
  require_relative 'shared'
4
4
 
5
+ # THIS CLASS IS INCOMPLETE AND NOT USABLE
6
+ #
5
7
  # A priority search tree (PST) stores points in two dimensions (x,y) and can efficiently answer certain questions about the set of
6
8
  # points.
7
9
  #
8
10
  # The structure was introduced by McCreight [1].
9
11
  #
10
- # It is a binary search tree which is a max-heap by the y-coordinate, and, for a non-leaf node N storing (x, y), all the nodes in
11
- # the left subtree of N have smaller x values than any of the nodes in the right subtree of N. Note, though, that the x-value at N
12
- # has no particular property relative to the x values in its subtree. It is thus _almost_ a binary search tree in the x coordinate.
13
- #
14
12
  # See more: https://en.wikipedia.org/wiki/Priority_search_tree
15
13
  #
16
14
  # It is possible to build such a tree in place, given an array of pairs. See [2]. In a follow-up paper, [3], the authors show how to
@@ -40,12 +38,12 @@ require_relative 'shared'
40
38
  # [2] De, Maheshwari, Nandy, Smid, _An in-place priority search tree_, 23rd Annual Canadian Conference on Computational Geometry.
41
39
  # [3] De, Maheshwari, Nandy, Smid, _An in-place min-max priority search tree_, Computational Geometry, v46 (2013), pp 310-327.
42
40
  # [4] Atkinson, Sack, Santoro, Strothotte, _Min-max heaps and generalized priority queues_, Commun. ACM 29 (10) (1986), pp 996-1000.
43
- class MinmaxPrioritySearchTreeInternal
41
+ class DataStructuresRMolinari::MinmaxPrioritySearchTree
44
42
  include Shared
45
43
 
46
44
  # The array of pairs is turned into a minmax PST in-place without cloning. So clone before passing it in, if you care.
47
45
  #
48
- # Each element must respond to #x and #y. Use Pair (above) if you like.
46
+ # Each element must respond to #x and #y. Use Point (above) if you like.
49
47
  def initialize(data, verify: false)
50
48
  @data = data
51
49
  @size = @data.size
@@ -75,7 +73,7 @@ class MinmaxPrioritySearchTreeInternal
75
73
  #
76
74
  # Here T(x) is the subtree rooted at x
77
75
  def leftmost_ne(x0, y0)
78
- best = Pair.new(INFINITY, INFINITY)
76
+ best = Point.new(INFINITY, INFINITY)
79
77
  p = q = root
80
78
 
81
79
  in_q = ->(pair) { pair.x >= x0 && pair.y >= y0 }
@@ -284,7 +282,7 @@ class MinmaxPrioritySearchTreeInternal
284
282
  #
285
283
  # This method returns p*
286
284
  # def highest_3_sided_up(x0, x1, y0)
287
- # best = Pair.new(INFINITY, -INFINITY)
285
+ # best = Point.new(INFINITY, -INFINITY)
288
286
 
289
287
  # in_q = lambda do |pair|
290
288
  # pair.x >= x0 && pair.x <= x1 && pair.y >= y0
@@ -407,7 +405,7 @@ class MinmaxPrioritySearchTreeInternal
407
405
  # - If Q intersect P is empty then p* = best
408
406
  #
409
407
  # Here, P is the set of points in our data structure and T_p is the subtree rooted at p
410
- best = Pair.new(INFINITY, -INFINITY)
408
+ best = Point.new(INFINITY, -INFINITY)
411
409
  p = root # root of the whole tree AND the pair stored there
412
410
 
413
411
  in_q = lambda do |pair|
@@ -1,11 +1,20 @@
1
1
  # Some odds and ends shared by other classes
2
2
  module Shared
3
+ # Infinity without having to put a +Float::+ prefix every time
3
4
  INFINITY = Float::INFINITY
4
5
 
5
- Pair = Struct.new(:x, :y)
6
+ # An (x, y) coordinate pair.
7
+ Point = Struct.new(:x, :y)
6
8
 
7
9
  # @private
10
+
11
+ # Raised for logic errors in client code
8
12
  class LogicError < StandardError; end
13
+ # Raised for logic errors in library code
14
+ class InternalLogicError < LogicError; end
15
+
16
+ # Used for errors related to data, such as duplicated elements where they must be distinct.
17
+ class DataError < StandardError; end
9
18
 
10
19
  # @private
11
20
  #
@@ -61,4 +70,22 @@ module Shared
61
70
  (i & 1).zero?
62
71
  end
63
72
  end
73
+
74
+ # Simple O(n) check for duplicates in an enumerable.
75
+ #
76
+ # It may be worse than O(n), depending on how close to constant set insertion is.
77
+ #
78
+ # @param enum the enumerable to check for duplicates
79
+ # @param by a method to call on each element of enum before checking. The results of these methods are checked for
80
+ # duplication. When nil we don't call anything and just use the elements themselves.
81
+ def contains_duplicates?(enum, by: nil)
82
+ seen = Set.new
83
+ enum.each do |v|
84
+ v = v.send(by) if by
85
+ return true if seen.include? v
86
+
87
+ seen << v
88
+ end
89
+ false
90
+ end
64
91
  end
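
The duplicate check added above can be tried in isolation. The following is a sketch that copies the method body from the diff into a free-standing method; the `Point` struct and the sample points are assumptions made for the example.

```ruby
require 'set'

# Hypothetical stand-in for Shared::Point.
Point = Struct.new(:x, :y)

# Same logic as the helper in the diff, as a top-level method.
def contains_duplicates?(enum, by: nil)
  seen = Set.new
  enum.each do |v|
    v = v.send(by) if by
    return true if seen.include?(v)

    seen << v
  end
  false
end

points = [Point.new(1, 10), Point.new(2, 20), Point.new(3, 10)]

x_dup = contains_duplicates?(points, by: :x) # x values 1, 2, 3 are distinct
y_dup = contains_duplicates?(points, by: :y) # y value 10 appears twice
whole_dup = contains_duplicates?(points)     # the points themselves are distinct
```

With `by: nil` the elements themselves go into the set, so the result depends on their `hash` and `eql?` implementations; `Struct` compares member by member.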
@@ -1,54 +1,78 @@
1
1
  require_relative 'data_structures_rmolinari/shared'
2
- require_relative 'data_structures_rmolinari/disjoint_union_internal'
3
- require_relative 'data_structures_rmolinari/generic_segment_tree_internal'
4
- require_relative 'data_structures_rmolinari/heap_internal'
5
- require_relative 'data_structures_rmolinari/max_priority_search_tree_internal'
6
- require_relative 'data_structures_rmolinari/minmax_priority_search_tree_internal'
7
2
 
8
3
  module DataStructuresRMolinari
9
- Pair = Shared::Pair
10
-
11
- ########################################
12
- # Priority Search Trees
13
- #
14
- # Note that MinmaxPrioritySearchTree is only a fragment of what we need
4
+ # A struct responding to +.x+ and +.y+.
5
+ Point = Shared::Point
6
+ end
15
7
 
16
- MaxPrioritySearchTree = MaxPrioritySearchTreeInternal
17
- MinmaxPrioritySearchTree = MinmaxPrioritySearchTreeInternal
8
+ # These define classes inside module DataStructuresRMolinari
9
+ require_relative 'data_structures_rmolinari/disjoint_union'
10
+ require_relative 'data_structures_rmolinari/generic_segment_tree'
11
+ require_relative 'data_structures_rmolinari/heap'
12
+ require_relative 'data_structures_rmolinari/max_priority_search_tree'
13
+ require_relative 'data_structures_rmolinari/minmax_priority_search_tree'
18
14
 
15
+ # A namespace to hold the provided classes. We want to avoid polluting the global namespace with names like "Heap"
16
+ module DataStructuresRMolinari
19
17
  ########################################
20
- # Segment Trees
21
-
22
- GenericSegmentTree = GenericSegmentTreeInternal
23
-
24
- # Takes an array A[0...n] and tells us what the maximum value is on a subinterval i..j in O(log n) time.
18
+ # Concrete instances of Segment Tree
25
19
  #
26
- # TODO:
27
- # - allow min val too
28
- # - add a flag to the initializer
29
- # - call it ExtremalValSegment tree or something similar
20
+ # @todo consider moving these into generic_segment_tree.rb
21
+
22
+ # A segment tree that for an array A(0...n) answers questions of the form "what is the maximum value in the subinterval A(i..j)?"
23
+ # in O(log n) time.
30
24
  class MaxValSegmentTree
31
25
  extend Forwardable
32
26
 
33
- def_delegator :@structure, :query_on, :max_on
27
+ # Tell the tree that the value at idx has changed
28
+ def_delegator :@structure, :update_at
34
29
 
30
+ # @param data an object that contains values at integer indices based at 0, via +data[i]+.
31
+ # - This will usually be an Array, but it could also be a hash or a proc.
35
32
  def initialize(data)
36
33
  @structure = GenericSegmentTree.new(
37
34
  combine: ->(a, b) { [a, b].max },
38
35
  single_cell_array_val: ->(i) { data[i] },
39
36
  size: data.size,
40
- identity: -Float::INFINITY
37
+ identity: -Shared::INFINITY
41
38
  )
42
39
  end
40
+
41
+ # The maximum value in A(i..j).
42
+ #
43
+ # The arguments must be integers in 0...(A.size)
44
+ # @return the largest value in A(i..j) or -Infinity if i > j.
45
+ def max_on(i, j)
46
+ @structure.query_on(i, j)
47
+ end
43
48
  end
44
49
 
45
- ########################################
46
- # Heap
50
+ # A segment tree that for an array A(0...n) answers questions of the form "what is the index of the maximal value in the
51
+ # subinterval A(i..j)?" in O(log n) time.
52
+ class IndexOfMaxValSegmentTree
53
+ extend Forwardable
47
54
 
48
- Heap = HeapInternal
55
+ # Tell the tree that the value at idx has changed
56
+ def_delegator :@structure, :update_at
49
57
 
50
- ########################################
51
- # Disjoint Union
58
+ # @param (see MaxValSegmentTree#initialize)
59
+ def initialize(data)
60
+ @structure = GenericSegmentTree.new(
61
+ combine: ->(p1, p2) { p1[1] >= p2[1] ? p1 : p2 },
62
+ single_cell_array_val: ->(i) { [i, data[i]] },
63
+ size: data.size,
64
+ identity: nil
65
+ )
66
+ end
52
67
 
53
- DisjointUnion = DisjointUnionInternal
68
+ # The index of the maximum value in A(i..j)
69
+ #
70
+ # The arguments must be integers in 0...(A.size)
71
+ # @return (Integer, nil) the index of the largest value in A(i..j) or +nil+ if i > j.
72
+ # - If there is more than one entry with that value, return one of the indices. There is no guarantee as to which one.
73
+ # - Return +nil+ if i > j
74
+ def index_of_max_val_on(i, j)
75
+ @structure.query_on(i, j)&.first # discard the value part of the pair
76
+ end
77
+ end
54
78
  end
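
To illustrate how the two `combine` lambdas above behave, here is a sketch built on a small recursive segment tree. `TinySegmentTree` is a hypothetical stand-in written for this example; it accepts the same keyword arguments that the diff passes to `GenericSegmentTree`, but it is not the gem's implementation and makes no claim about how that class works internally.

```ruby
# Hypothetical minimal segment tree, for illustration only.
class TinySegmentTree
  def initialize(combine:, single_cell_array_val:, size:, identity:)
    @combine = combine
    @leaf = single_cell_array_val
    @size = size
    @identity = identity
    @tree = []
    build(1, 0, size - 1) if size.positive?
  end

  # Combined value over indices i..j; the identity when the range is empty.
  def query_on(i, j)
    return @identity if i > j

    query(1, 0, @size - 1, i, j)
  end

  private

  def build(node, lo, hi)
    if lo == hi
      @tree[node] = @leaf.call(lo)
    else
      mid = (lo + hi) / 2
      build(2 * node, lo, mid)
      build(2 * node + 1, mid + 1, hi)
      @tree[node] = @combine.call(@tree[2 * node], @tree[2 * node + 1])
    end
  end

  def query(node, lo, hi, i, j)
    return @tree[node] if i <= lo && hi <= j

    mid = (lo + hi) / 2
    return query(2 * node, lo, mid, i, j) if j <= mid
    return query(2 * node + 1, mid + 1, hi, i, j) if i > mid

    @combine.call(query(2 * node, lo, mid, i, j), query(2 * node + 1, mid + 1, hi, i, j))
  end
end

data = [3, 1, 4, 1, 5, 9, 2, 6]

# Same lambdas as MaxValSegmentTree in the diff.
max_tree = TinySegmentTree.new(
  combine: ->(a, b) { [a, b].max },
  single_cell_array_val: ->(i) { data[i] },
  size: data.size,
  identity: -Float::INFINITY
)

# Same lambdas as IndexOfMaxValSegmentTree: leaves are [index, value] pairs.
idx_tree = TinySegmentTree.new(
  combine: ->(p1, p2) { p1[1] >= p2[1] ? p1 : p2 },
  single_cell_array_val: ->(i) { [i, data[i]] },
  size: data.size,
  identity: nil
)

max_2_4 = max_tree.query_on(2, 4)        # maximum of 4, 1, 5
idx_2_4 = idx_tree.query_on(2, 4)&.first # index of that maximum
idx_empty = idx_tree.query_on(4, 2)&.first
```

Note that this sketch never passes the identity to `combine`; it only returns it for an empty range. That matters for the index variant, whose lambda would fail on a `nil` argument.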
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: data_structures_rmolinari
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.2.1
4
+ version: 0.3.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Rory Molinari
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2023-01-05 00:00:00.000000000 Z
11
+ date: 2023-01-06 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: must_be
@@ -67,24 +67,26 @@ dependencies:
67
67
  - !ruby/object:Gem::Version
68
68
  version: 0.22.0
69
69
  description: |
70
- This small gem contains several data structures that I have implemented to learn how they work.
70
+ This small gem contains several data structures that I have implemented in Ruby to learn how they work.
71
71
 
72
72
  Sometimes it is not enough to read the description of a data structure and accompanying pseudo-code.
73
- Actually implementing the structure is often helpful in understanding what is going on. It is also
73
+ Actually implementing it is often helpful in understanding what is going on. It is also
74
74
  usually fun.
75
75
 
76
- We implement Disjoin Union, Heap, Priority Search Tree, and Segment Tree.
77
- email: rorymolinari+rubygems@gmail.com
76
+ The gem contains basic implementations of Disjoint Union, Heap, Priority Search Tree, and Segment Tree.
77
+ See the homepage for more details.
78
+ email: rorymolinari@gmail.com
78
79
  executables: []
79
80
  extensions: []
80
81
  extra_rdoc_files: []
81
82
  files:
83
+ - CHANGELOG.md
82
84
  - lib/data_structures_rmolinari.rb
83
- - lib/data_structures_rmolinari/disjoint_union_internal.rb
84
- - lib/data_structures_rmolinari/generic_segment_tree_internal.rb
85
- - lib/data_structures_rmolinari/heap_internal.rb
86
- - lib/data_structures_rmolinari/max_priority_search_tree_internal.rb
87
- - lib/data_structures_rmolinari/minmax_priority_search_tree_internal.rb
85
+ - lib/data_structures_rmolinari/disjoint_union.rb
86
+ - lib/data_structures_rmolinari/generic_segment_tree.rb
87
+ - lib/data_structures_rmolinari/heap.rb
88
+ - lib/data_structures_rmolinari/max_priority_search_tree.rb
89
+ - lib/data_structures_rmolinari/minmax_priority_search_tree.rb
88
90
  - lib/data_structures_rmolinari/shared.rb
89
91
  homepage: https://github.com/rmolinari/data_structures
90
92
  licenses: