data_structures_rmolinari 0.2.1 → 0.3.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: b8d63f6987f160f00ce92f79643f904f95bc230e4a5b60679f4301ecd366622a
- data.tar.gz: cdc660ab53a0259a79b72ecc3928f640faa7b1251a1bae3baabb1c70bac20d01
+ metadata.gz: 9f006234ee3b216d5607e9b10bb1958a6107ccfa0cc8c359f98383dc7fde14ee
+ data.tar.gz: f281ab0768e24e7c983cd046ba7b185dab8fd972fb3065fd73ff575782bf5486
  SHA512:
- metadata.gz: 057d29af42606c7ecc4a78e6c752fa5f7a29b41788f21ca7ede6bab0893d1ba63b6c9dc3e5f3b5747f3912718565d6dfcb5952d189f78b40a6ef77efe95f34c9
- data.tar.gz: '08f6921ab4b1c7cf84c9fe5624266b7060f4f1d4f1ecc56e3850106b3bb30e0c6b30c3ea808a61939966350118684f3353ccbb628d24b353b35ae865268222d2'
+ metadata.gz: e274a97f177fad44bad20ecf24ecca1385fee3c217e7e42aac076c24377970c6444dfdbadc6fd3e1e201555177429c9f8eddaee211e463dd60f6b36e74004eec
+ data.tar.gz: 293fc0b2973a8d851c27f4e64177dbf7b9a25b2bb7eb9efb4b33abdb07c4e006f80f4450996ef99da7e8bb1516ca8aa89ab893258960d9127d101995906254ed
data/CHANGELOG.md ADDED
@@ -0,0 +1,18 @@
+ # Changelog
+
+ ## [Unreleased]
+
+ ## [0.3.0] 2023-01-06
+
+ ### Added
+
+ - Start this file
+ - `Heap` can be constructed as "non-addressable"
+   - `update` is not possible but duplicates can be inserted and overall performance is a little better.
+
+ ### Changed
+
+ - `LogicError` gets a subclassed `InternalLogicError` for issues inside the library.
+ - `Shared::Pair` becomes `Shared::Point`
+   - this doesn't change the API of `MaxPrioritySearchTree` because of ducktyping. But client code (of which there is none) might be
+     using the `Pair` name.
@@ -4,13 +4,21 @@
4
4
  # The data structure provides efficient actions to merge two disjoint subsets, i.e., replace them by their union, and determine if
5
5
  # two elements are in the same subset.
6
6
  #
7
- # The elements of the set must be 0, 1, ..., n-1. Client code can map its data to these representatives. The code uses several ideas
8
- # from Tarjan and van Leeuwen for efficiency
7
+ # The elements of the set are 0, 1, ..., n-1, where n is the size of the universe. Client code can map its data to these
8
+ # representatives.
9
9
  #
10
10
  # See https://en.wikipedia.org/wiki/Disjoint-set_data_structure for a good introduction.
11
11
  #
12
+ # The code uses several ideas from Tarjan and van Leeuwen for efficiency. We use "union by rank" in +unite+ and path-halving in
13
+ # +find+. Together, these make the amortized cost for each of n such operations effectively constant.
14
+ #
12
15
  # - Tarjan, Robert E., van Leeuwen, Jan (1984). "Worst-case analysis of set union algorithms". Journal of the ACM. 31 (2): 245–281.
13
- class DisjointUnionInternal
16
+ #
17
+ # @todo
18
+ # - allow caller to expand the size of the universe. This operation is called "make set".
19
+ # - All we need to do is increase the size of @d, set the parent pointers, define the new ranks (zero), and update @size.
20
+ class DataStructuresRMolinari::DisjointUnion
21
+ # The number of subsets in the partition.
14
22
  attr_reader :subset_count
15
23
 
16
24
  # @param size the size of the universe, which must be known at the time of construction. The elements 0, 1, ..., size - 1 start
@@ -52,7 +60,7 @@ class DisjointUnionInternal
52
60
  end
53
61
 
54
62
  private def check_value(v)
55
- raise "Value must be given and be in (0..#{@size - 1})" unless v && v.between?(0, @size - 1)
63
+ raise DataError, "Value must be given and be in (0..#{@size - 1})" unless v && v.between?(0, @size - 1)
56
64
  end
57
65
 
58
66
  private def link(e, f)
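The new class comment names two techniques, "union by rank" in +unite+ and path-halving in +find+. The following is a minimal, self-contained sketch of how those two ideas fit together. It is not the gem's code: the class name `TinyDisjointUnion` and its instance variables are hypothetical, and it silently ignores a request to unite two elements that are already in the same subset, which may differ from the library's behaviour.

```ruby
# A hypothetical illustration of union by rank plus path halving.
# Not the implementation in DataStructuresRMolinari::DisjointUnion.
class TinyDisjointUnion
  # The number of subsets in the partition.
  attr_reader :subset_count

  def initialize(size)
    @parent = (0...size).to_a   # every element starts as the root of its own tree
    @rank = Array.new(size, 0)  # an upper bound on the height of each root's tree
    @subset_count = size
  end

  # Path halving: each node visited on the way up is re-pointed at its grandparent,
  # which roughly halves the length of the path for later calls.
  def find(e)
    while @parent[e] != e
      @parent[e] = @parent[@parent[e]]
      e = @parent[e]
    end
    e
  end

  # Union by rank: hang the root of lower rank beneath the root of higher rank.
  def unite(e, f)
    e = find(e)
    f = find(f)
    return if e == f # already in the same subset (assumption: do nothing)

    e, f = f, e if @rank[e] < @rank[f]
    @parent[f] = e
    @rank[e] += 1 if @rank[e] == @rank[f]
    @subset_count -= 1
  end
end

du = TinyDisjointUnion.new(5)
du.unite(0, 1)
du.unite(3, 4)
puts du.find(0) == du.find(1)
puts du.subset_count
```

Neither technique alone gives the near-constant amortized bound mentioned in the comment; it is the combination that Tarjan and van Leeuwen analyse.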
@@ -8,45 +8,33 @@ require_relative 'shared'
8
8
  # called an "interval tree."
9
9
  #
10
10
  # For more details (and some close-to-metal analysis of run time, especially for large datasets) see
11
- # https://en.algorithmica.org/hpc/data-structures/segment-trees/. In particular, this shows how to do a bottom-up
12
- # implementation, which is faster, at least for large datasets and cache-relevant compiled code.
11
+ # https://en.algorithmica.org/hpc/data-structures/segment-trees/. In particular, this shows how to do a bottom-up implementation,
12
+ # which is faster, at least for large datasets and cache-relevant compiled code. These issues don't really apply to code written in
13
+ # Ruby.
13
14
  #
14
- # This is a generic implementation.
15
+ # This is a generic implementation, intended to allow easy configuration for concrete instances. See the parameters to the
16
+ # initializer and the definitions of concrete realisations like MaxValSegmentTree.
15
17
  #
16
18
  # We do O(n) work to build the internal data structure at initialization. Then we answer queries in O(log n) time.
17
- #
18
- # @todo
19
- # - provide a data-update operation like update_val_at(idx, val)
20
- # - this is O(log n)
21
- # - note that this may need some rework. Consider something like IndexOfMaxVal: @merge needs to know about the underlying data
22
- # in that case. Hmmm. Maybe the lambda can close over the data in a way that makes it possible to change the data "from the
23
- # outside". Yes:
24
- # a = [1,2,3]
25
- # foo = ->() { a.max }
26
- # foo.call # 3
27
- # a = [1,2,4]
28
- # foo.call # 4
29
- # - Offer an optional parameter base_case_value_extractor (<-- need better name) to be used in #determine_val in the case that
30
- # left == tree_l && right == tree_r instead of simply returning @tree[tree_idx]
31
- # - Use case: https://cp-algorithms.com/data_structures/segment_tree.html#saving-the-entire-subarrays-in-each-vertex, such as
32
- # finding the least element in a subarray l..r no smaller than a given value x. In this case we store a sorted version the
33
- # entire subarray at each node and use a binary search on it.
34
- # - the default value would simply be the identity function.
35
- # - NOTE that in this case, we have different "combine" functions in #determine_val and #build. In #build we would combine
36
- # sorted lists into a larger sorted list. In #determine_val we combine results via #min.
37
- # - Think about the interface before doing this.
38
- class GenericSegmentTreeInternal
19
+ class DataStructuresRMolinari::GenericSegmentTree
39
20
  include Shared::BinaryTreeArithmetic
40
21
 
41
22
  # Construct a concrete instance of a Segment Tree. See details at the links above for the underlying concepts here.
42
23
  # @param combine a lambda that takes two values and munges them into a combined value.
43
24
  # - For example, if we are calculating sums over subintervals, combine.call(a, b) = a + b, while if we are doing maxima we will
44
- # return max(a, b)
25
+ # return max(a, b).
26
+ # - Things get more complicated when we are calculating, say, the _index_ of the maximal value in a subinterval. Now it is not
27
+ # enough simply to store that index at each tree node, because to combine the indices from two child nodes we need to know
28
+ # both the index of the maximal element in each child node's interval, but also the maximal values themselves, so we know
29
+ # which one "wins" for the parent node. This affects the sort of work we need to do when combining and the value provided by
30
+ # the +single_cell_array_val+ lambda.
45
31
  # @param single_cell_array_val a lambda that takes an index i and returns the value we need to store in the #build
46
- # operation for the subinterval i..i. This is often simply be the value data[i], but in some cases - like "index of max val" -
47
- # it will be something else.
32
+ # operation for the subinterval i..i.
33
+ # - This will often simply be the value data[i], but in some cases it will be something else. For example, when we are
34
+ # calculating the index of the maximal value on each subinterval we need [i, data[i]] here.
35
+ # - If +update_at+ is called later, this lambda must close over the underlying data in a way that captures the updated value.
48
36
  # @param size the size of the underlying data array, used in certain internal arithmetic.
49
- # @param identity is the value to return when we are querying on an empty interval
37
+ # @param identity the value to return when we are querying on an empty interval
50
38
  # - for sums, this will be zero; for maxima, this will be -Infinity, etc
51
39
  def initialize(combine:, single_cell_array_val:, size:, identity:)
52
40
  @combine = combine
@@ -62,15 +50,28 @@ class GenericSegmentTreeInternal
62
50
  # @param left the left end of the subinterval.
63
51
  # @param right the right end (inclusive) of the subinterval.
64
52
  #
65
- # The type of the return value depends on the concrete instance of the segment tree.
53
+ # The type of the return value depends on the concrete instance of the segment tree. We return the _identity_ element provided at
54
+ # construction time if the interval is empty.
66
55
  def query_on(left, right)
67
- raise "Bad query interval #{left}..#{right}" if left.negative? || right >= @size
56
+ raise DataError, "Bad query interval #{left}..#{right}" if left.negative? || right >= @size
68
57
 
69
58
  return @identity if left > right # empty interval
70
59
 
71
60
  determine_val(root, left, right, 0, @size - 1)
72
61
  end
73
62
 
63
+ # Update the value in the underlying array at the given idx
64
+ #
65
+ # @param idx an index in the underlying data array.
66
+ #
67
+ # Note that we don't need the updated value itself. We get that by calling the lambda +single_cell_array_val+ supplied at
68
+ # construction.
69
+ def update_at(idx)
70
+ raise DataError, 'Cannot update an index outside the initial range of the underlying data' unless (0...@size).cover?(idx)
71
+
72
+ update_val_at(idx, root, 0, @size - 1)
73
+ end
74
+
74
75
  private def determine_val(tree_idx, left, right, tree_l, tree_r)
75
76
  # Does the current tree node exactly serve up the interval we're interested in?
76
77
  return @tree[tree_idx] if left == tree_l && right == tree_r
@@ -92,6 +93,26 @@ class GenericSegmentTreeInternal
92
93
  end
93
94
  end
94
95
 
96
+ private def update_val_at(idx, tree_idx, tree_l, tree_r)
97
+ if tree_l == tree_r
98
+ # We have found the spot!
99
+ raise InternalLogicError, 'tree_l == tree_r, but they do not agree with the idx holding the updated value' unless tree_l == idx
100
+
101
+ @tree[tree_idx] = @single_cell_array_val.call(tree_l)
102
+ else
103
+ # Recursively update the appropriate subtree
104
+ mid = midpoint(tree_l, tree_r)
105
+ left = left(tree_idx)
106
+ right = right(tree_idx)
107
+ if mid >= idx
108
+ update_val_at(idx, left(tree_idx), tree_l, mid)
109
+ else
110
+ update_val_at(idx, right(tree_idx), mid + 1, tree_r)
111
+ end
112
+ @tree[tree_idx] = @combine.call(@tree[left], @tree[right])
113
+ end
114
+ end
115
+
95
116
  # Build the internal data structure.
96
117
  #
97
118
  # - tree_idx is the index into @tree
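The documentation for +update_at+ says that the updated value is not passed in; the tree re-reads it through the +single_cell_array_val+ lambda, which therefore has to close over the live data. A stripped-down sketch of that arrangement follows. The class `TinySegmentTree` is hypothetical and only mirrors the keyword arguments seen in this diff; error handling and the `Shared::BinaryTreeArithmetic` helpers are replaced by plain arithmetic.

```ruby
# A hypothetical, simplified segment tree showing update_at with a closure.
# Not the implementation in DataStructuresRMolinari::GenericSegmentTree.
class TinySegmentTree
  def initialize(combine:, single_cell_array_val:, size:, identity:)
    @combine = combine
    @single_cell_array_val = single_cell_array_val
    @size = size
    @identity = identity
    @tree = []
    build(1, 0, size - 1) if size.positive?
  end

  # Combined value over left..right (inclusive); the identity for an empty interval.
  def query_on(left, right)
    return @identity if left > right

    determine_val(1, left, right, 0, @size - 1)
  end

  # Refresh the tree after the caller has changed the underlying data at idx.
  def update_at(idx)
    raise ArgumentError, 'index out of range' unless (0...@size).cover?(idx)

    update_val_at(idx, 1, 0, @size - 1)
  end

  private

  def build(node, lo, hi)
    if lo == hi
      @tree[node] = @single_cell_array_val.call(lo)
    else
      mid = (lo + hi) / 2
      build(2 * node, lo, mid)
      build(2 * node + 1, mid + 1, hi)
      @tree[node] = @combine.call(@tree[2 * node], @tree[2 * node + 1])
    end
  end

  def determine_val(node, left, right, lo, hi)
    return @identity if left > right
    return @tree[node] if left == lo && right == hi

    mid = (lo + hi) / 2
    l_val = determine_val(2 * node, left, [right, mid].min, lo, mid)
    r_val = determine_val(2 * node + 1, [left, mid + 1].max, right, mid + 1, hi)
    @combine.call(l_val, r_val)
  end

  def update_val_at(idx, node, lo, hi)
    if lo == hi
      # Re-read the leaf value through the lambda; the caller never passes it in.
      @tree[node] = @single_cell_array_val.call(lo)
    else
      mid = (lo + hi) / 2
      if idx <= mid
        update_val_at(idx, 2 * node, lo, mid)
      else
        update_val_at(idx, 2 * node + 1, mid + 1, hi)
      end
      @tree[node] = @combine.call(@tree[2 * node], @tree[2 * node + 1])
    end
  end
end

data = [3, 1, 4, 1, 5]
tree = TinySegmentTree.new(
  combine: ->(a, b) { a > b ? a : b },
  single_cell_array_val: ->(i) { data[i] }, # closes over data, so later writes are visible
  size: data.size,
  identity: -Float::INFINITY
)
before = tree.query_on(0, 2)
data[1] = 9       # mutate the array in place
tree.update_at(1) # tell the tree which index changed
after = tree.query_on(0, 2)
```

The closure only sees the change because `data` is mutated in place; rebinding the variable to a new array inside some other scope would leave the lambda looking at the old one.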
@@ -13,8 +13,8 @@ require_relative 'shared'
13
13
  # - +empty?+
14
14
  # - is the heap empty?
15
15
  # - O(1)
16
- # - +insert+
17
- # - add a new element to the heap with an associated priority
16
+ # - +insert(item, priority)+
17
+ # - add a new item to the heap with an associated priority
18
18
  # - O(log N)
19
19
  # - +top+
20
20
  # - return the lowest-priority element, which is the element at the root of the tree. In a max-heap this is the highest-priority
@@ -23,12 +23,18 @@ require_relative 'shared'
23
23
  # - +pop+
24
24
  # - removes and returns the item that would be returned by +top+
25
25
  # - O(log N)
26
- # - +update+
26
+ # - +update(item, priority)+
27
27
  # - tell the heap that the priority of a particular item has changed
28
28
  # - O(log N)
29
29
  #
30
30
  # Here N is the number of elements in the heap.
31
31
  #
32
+ # The internal requirements needed to implement +update+ have several consequences.
33
+ # - Items added to the heap must be distinct. Otherwise we would not know which occurrence to update
34
+ # - There is some bookkeeping overhead.
35
+ # If client code doesn't need to call +update+ then we can create a "non-addressable" heap that allows for the insertion of
36
+ # duplicate items and has slightly faster runtime overall. See the arguments to the initializer.
37
+ #
32
38
  # References:
33
39
  #
34
40
  # - https://en.wikipedia.org/wiki/Binary_heap
@@ -36,23 +42,31 @@ require_relative 'shared'
36
42
  # DOI 10.1007/s00224-017-9760-2
37
43
  #
38
44
  # @todo
39
- # - relax the requirement that priorities must be comparable vai +<+ and respond to negation. Instead, allow comparison via +<=>+
40
- # and handle max-heaps differently.
41
- # - this will allow priorities to be arrays for tie-breakers and similar.
42
- class HeapInternal
45
+ # - let caller see the priority of the top element. Maybe this is useful sometimes.
46
+ class DataStructuresRMolinari::Heap
47
+ include Shared
43
48
  include Shared::BinaryTreeArithmetic
44
49
 
50
+ # The number of items currently in the heap
45
51
  attr_reader :size
46
52
 
47
- Pair = Struct.new(:priority, :item)
53
+ # An (item, priority) pair
54
+ InternalPair = Struct.new(:item, :priority)
55
+ private_constant :InternalPair
48
56
 
49
57
  # @param max_heap when truthy, make a max-heap rather than a min-heap
50
- # @param debug when truthy, verify the heap property after each update than might violate it. This makes operations much slower.
51
- def initialize(max_heap: false, debug: false)
58
+ # @param addressable when truthy, the heap is _addressable_. This means that
59
+ # - item priorities are updatable with +update(item, p)+, and
60
+ # - items added to the heap must be distinct.
61
+ # When falsy, priorities are not updateable but items may be inserted multiple times. Operations are slightly faster because
62
+ # there is less internal bookkeeping.
63
+ # @param debug when truthy, verify the heap property after each change that might violate it. This makes operations much slower.
64
+ def initialize(max_heap: false, addressable: true, debug: false)
52
65
  @data = []
53
66
  @size = 0
54
67
  @max_heap = max_heap
55
- @index_of = {}
68
+ @addressable = addressable
69
+ @index_of = {} # used in addressable heaps
56
70
  @debug = debug
57
71
  end
58
72
 
@@ -61,27 +75,25 @@ class HeapInternal
61
75
  @size.zero?
62
76
  end
63
77
 
64
- # Insert a new element into the heap with the given property.
65
- # @param value the item to be inserted. It is an error to insert an item that is already present in the heap, though we don't
66
- # check for this.
67
- # @param priority the priority to use for new item. The values used as priorities ust be totally ordered via +<+ and, if +self+ is
68
- # a max-heap, must respond to negation +@-+ in the natural order-respecting way.
69
- # @todo
70
- # - check for duplicate
78
+ # Insert a new element into the heap with the given priority.
79
+ # @param value the item to be inserted.
80
+ # - If the heap is addressable (the default) it is an error to insert an item that is already present in the heap.
81
+ # @param priority the priority to use for new item. The values used as priorities must be totally ordered via +<=>+.
71
82
  def insert(value, priority)
72
- priority *= -1 if @max_heap
83
+ raise DataError, "Heap already contains #{value}" if @addressable && contains?(value)
73
84
 
74
85
  @size += 1
75
86
 
76
- d = Pair.new(priority, value)
87
+ d = InternalPair.new(value, priority)
77
88
  assign(d, @size)
78
89
 
79
90
  sift_up(@size)
80
91
  end
81
92
 
82
93
  # Return the top of the heap without removing it
83
- # @return the value with minimal (maximal for max-heaps) priority. Strictly speaking, it returns the item at the root of the
84
- # binary tree; this element has minimal priority, but there may be other elements with the same priority.
94
+ # @return a value with minimal priority (maximal for max-heaps). Strictly speaking, it returns the item at the root of the
95
+ # binary tree; this element has minimal priority, but there may be other elements with the same priority and they do not appear
96
+ # at the top of the heap in any guaranteed order.
85
97
  def top
86
98
  raise 'Heap is empty!' unless @size.positive?
87
99
 
@@ -92,12 +104,11 @@ class HeapInternal
92
104
  # @return (see #top)
93
105
  def pop
94
106
  result = top
95
- @index_of.delete(result)
96
-
97
107
  assign(@data[@size], root)
98
108
 
99
109
  @data[@size] = nil
100
110
  @size -= 1
111
+ @index_of.delete(result) if @addressable
101
112
 
102
113
  sift_down(root) if @size.positive?
103
114
 
@@ -105,21 +116,20 @@ class HeapInternal
105
116
  end
106
117
 
107
118
  # Update the priority of the given element and maintain the necessary heap properties.
119
+ #
108
120
  # @param element the item whose priority we are updating. It is an error to update the priority of an element not already in the
109
121
  # heap
110
122
  # @param priority the new priority
111
- #
112
- # @todo
113
- # - check that the element is in the heap
114
123
  def update(element, priority)
115
- priority *= -1 if @max_heap
124
+ raise LogicError, 'Cannot update priorities in a non-addressable heap' unless @addressable
125
+ raise DataError, "Cannot update priority for value #{element} not already in the heap" unless contains?(element)
116
126
 
117
127
  idx = @index_of[element]
118
128
  old = @data[idx].priority
119
129
  @data[idx].priority = priority
120
- if priority > old
130
+ if less_than_priority?(old, priority)
121
131
  sift_down(idx)
122
- elsif priority < old
132
+ elsif less_than_priority?(priority, old)
123
133
  sift_up(idx)
124
134
  end
125
135
 
@@ -133,7 +143,7 @@ class HeapInternal
133
143
  x = @data[idx]
134
144
  while idx != root
135
145
  i = parent(idx)
136
- break unless x.priority < @data[i].priority
146
+ break unless less_than?(x, @data[i])
137
147
 
138
148
  assign(@data[i], idx)
139
149
  idx = i
@@ -148,9 +158,9 @@ class HeapInternal
148
158
  x = @data[idx]
149
159
 
150
160
  while (j = left(idx)) <= @size
151
- j += 1 if j + 1 <= @size && @data[j + 1].priority < @data[j].priority
161
+ j += 1 if j + 1 <= @size && less_than?(@data[j + 1], @data[j])
152
162
 
153
- break unless @data[j].priority < x.priority
163
+ break unless less_than?(@data[j], x)
154
164
 
155
165
  assign(@data[j], idx)
156
166
  idx = j
@@ -163,7 +173,27 @@ class HeapInternal
163
173
  # Put the pair in the given heap location
164
174
  private def assign(pair, idx)
165
175
  @data[idx] = pair
166
- @index_of[pair.item] = idx
176
+ @index_of[pair.item] = idx if @addressable
177
+ end
178
+
179
+ # Compare the priorities of two items with <=> and return truthy exactly when the result is -1.
180
+ #
181
+ # If this is a max-heap return truthy exactly when the result of <=> is 1.
182
+ #
183
+ # The arguments can also be the priorities themselves.
184
+ private def less_than?(p1, p2)
185
+ less_than_priority?(p1.priority, p2.priority)
186
+ end
187
+
188
+ # Direct comparison of priorities
189
+ private def less_than_priority?(priority1, priority2)
190
+ return (priority1 <=> priority2) == 1 if @max_heap
191
+
192
+ (priority1 <=> priority2) == -1
193
+ end
194
+
195
+ private def contains?(item)
196
+ !!@index_of[item]
167
197
  end
168
198
 
169
199
  # For debugging
@@ -172,8 +202,8 @@ class HeapInternal
172
202
  left = left(idx)
173
203
  right = right(idx)
174
204
 
175
- raise "Heap property violated by left child of index #{idx}" if left <= @size && @data[idx].priority >= @data[left].priority
176
- raise "Heap property violated by right child of index #{idx}" if right <= @size && @data[idx].priority >= @data[right].priority
205
+ raise InternalLogicError, "Heap property violated by left child of index #{idx}" if left <= @size && less_than?(@data[left], @data[idx])
206
+ raise InternalLogicError, "Heap property violated by right child of index #{idx}" if right <= @size && less_than?(@data[right], @data[idx])
177
207
  end
178
208
  end
179
209
  end
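In this release the heap stops negating priorities for max-heaps and compares them with +<=>+ instead. One consequence, noted in the removed @todo, is that priorities no longer need to respond to negation, so arrays can serve as priorities with built-in tie-breakers. A small sketch of the comparison rule follows; it is written here as a free-standing method with a hypothetical `max_heap:` keyword rather than as the private instance method used in the class.

```ruby
# A hypothetical stand-alone version of the comparison rule described above.
# In a min-heap, priority1 "wins" when <=> gives -1; in a max-heap, when it gives 1.
def less_than_priority?(priority1, priority2, max_heap: false)
  cmp = priority1 <=> priority2
  max_heap ? cmp == 1 : cmp == -1
end

min_result  = less_than_priority?(1, 2)                  # 1 sorts ahead of 2 in a min-heap
max_result  = less_than_priority?(1, 2, max_heap: true)  # but not in a max-heap
tie_breaker = less_than_priority?([1, 'a'], [1, 'b'])    # arrays compare element by element
```

Arrays have no unary minus, so the old `priority *= -1` approach could not have supported the last call.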
@@ -1,13 +1,9 @@
1
1
  require 'set'
2
2
  require_relative 'shared'
3
3
 
4
-
5
- # A priority search tree (PST) stores a set, P, of two-dimensional points (x,y) in a way that allows efficient answes to certain
4
+ # A priority search tree (PST) stores a set, P, of two-dimensional points (x,y) in a way that allows efficient answers to certain
6
5
  # questions about P.
7
6
  #
8
- # (In the current implementation no two points can share an x-value and no two points can share a y-value. This (rather severe)
9
- # restriction can be relaxed with some more complicated code.)
10
- #
11
7
  # The data structure was introduced in 1985 by Edward McCreight. Later, De, Maheshwari, Nandy, and Smid showed how to construct a
12
8
  # PST in-place (using only O(1) extra memory), at the expense of some slightly more complicated code for the various supported
13
9
  # operations. It is their approach that we have implemented.
@@ -33,21 +29,29 @@ require_relative 'shared'
33
29
  #
34
30
  # The final operation (enumerate) takes O(m + log n) time, where m is the number of points that are enumerated.
35
31
  #
32
+ # In the current implementation no two points can share an x-value and no two points can share a y-value. This (rather severe)
33
+ # restriction can be relaxed with some more complicated code.
34
+ #
35
+ #
36
36
  # There is a related data structure called the Min-max priority search tree so we have called this a "Max priority search tree", or
37
37
  # MaxPST.
38
38
  #
39
39
  # References:
40
- # * E.M. McCreight, _Priority search trees_, SIAM J. Comput., 14(2):257-276, 1985. Later, De,
40
+ # * E.M. McCreight, _Priority search trees_, SIAM J. Comput., 14(2):257-276, 1985.
41
41
  # * M. De, A. Maheshwari, S. C. Nandy, M. Smid, _An In-Place Priority Search Tree_, 23rd Canadian Conference on Computational
42
42
  # Geometry, 2011
43
- class MaxPrioritySearchTreeInternal
43
+ class DataStructuresRMolinari::MaxPrioritySearchTree
44
44
  include Shared
45
45
  include BinaryTreeArithmetic
46
46
 
47
47
  # Construct a MaxPST from the collection of points in +data+.
48
48
  #
49
- # @param data [Array] the set P of points presented as an array. The tree is built in the array in-place without cloning. Each
50
- # element of the array must respond to +#x+ and +#y+ (though this is not currently checked).
49
+ # @param data [Array] the set P of points presented as an array. The tree is built in the array in-place without cloning.
50
+ # - Each element of the array must respond to +#x+ and +#y+.
51
+ # - This is not checked explicitly but a missing method exception will be thrown when we try to call one of them.
52
+ # - The +x+ values must be distinct, as must the +y+ values. We raise a +Shared::DataError+ if this isn't the case.
53
+ # - This is a restriction that simplifies some of the algorithm code. It can be removed at the cost of some extra work. Issue
54
+ # #9.
51
55
  #
52
56
  # @param verify [Boolean] when truthy, check that the properties of a PST are satisified after construction, raising an exception
53
57
  # if not.
@@ -69,7 +73,7 @@ class MaxPrioritySearchTreeInternal
69
73
  # Let Q = [x0, infty) X [y0, infty) be the northeast quadrant defined by the point (x0, y0) and let P be the points in this data
70
74
  # structure. Define p* as
71
75
  #
72
- # - (infty, -infty) f Q \intersect P is empty and
76
+ # - (infty, -infty) if Q \intersect P is empty and
73
77
  # - the highest (max-x) point in Q \intersect P otherwise.
74
78
  #
75
79
  # This method returns p* in O(log n) time and O(1) extra space.
@@ -82,7 +86,7 @@ class MaxPrioritySearchTreeInternal
82
86
  # Let Q = (-infty, x0] X [y0, infty) be the northwest quadrant defined by the point (x0, y0) and let P be the points in this data
83
87
  # structure. Define p* as
84
88
  #
85
- # - (-infty, -infty) f Q \intersect P is empty and
89
+ # - (-infty, -infty) if Q \intersect P is empty and
86
90
  # - the highest (max-y) point in Q \intersect P otherwise.
87
91
  #
88
92
  # This method returns p* in O(log n) time and O(1) extra space.
@@ -109,12 +113,12 @@ class MaxPrioritySearchTreeInternal
109
113
 
110
114
  p = root
111
115
  if quadrant == :ne
112
- best = Pair.new(INFINITY, -INFINITY)
116
+ best = Point.new(INFINITY, -INFINITY)
113
117
  preferred_child = ->(n) { right(n) }
114
118
  nonpreferred_child = ->(n) { left(n) }
115
119
  sufficient_x = ->(x) { x >= x0 }
116
120
  else
117
- best = Pair.new(-INFINITY, -INFINITY)
121
+ best = Point.new(-INFINITY, -INFINITY)
118
122
  preferred_child = ->(n) { left(n) }
119
123
  nonpreferred_child = ->(n) { right(n) }
120
124
  sufficient_x = ->(x) { x <= x0 }
@@ -186,7 +190,7 @@ class MaxPrioritySearchTreeInternal
186
190
  # Let Q = [x0, infty) X [y0, infty) be the northeast quadrant defined by the point (x0, y0) and let P be the points in this data
187
191
  # structure. Define p* as
188
192
  #
189
- # - (infty, infty) f Q \intersect P is empty and
193
+ # - (infty, infty) if Q \intersect P is empty and
190
194
  # - the leftmost (min-x) point in Q \intersect P otherwise.
191
195
  #
192
196
  # This method returns p* in O(log n) time and O(1) extra space.
@@ -224,10 +228,10 @@ class MaxPrioritySearchTreeInternal
224
228
 
225
229
  if quadrant == :ne
226
230
  sign = 1
227
- best = Pair.new(INFINITY, INFINITY)
231
+ best = Point.new(INFINITY, INFINITY)
228
232
  else
229
233
  sign = -1
230
- best = Pair.new(-INFINITY, INFINITY)
234
+ best = Point.new(-INFINITY, INFINITY)
231
235
  end
232
236
 
233
237
  p = q = root
@@ -369,7 +373,7 @@ class MaxPrioritySearchTreeInternal
369
373
  #
370
374
  # Sometimes we don't have a relevant node to the left or right of Q. The booleans L and R (which we call left and right) track
371
375
  # whether p and q are defined at the moment.
372
- best = Pair.new(INFINITY, -INFINITY)
376
+ best = Point.new(INFINITY, -INFINITY)
373
377
  p = q = left = right = nil
374
378
 
375
379
  x_range = (x0..x1)
@@ -637,7 +641,7 @@ class MaxPrioritySearchTreeInternal
637
641
  end
638
642
  current = parent(current)
639
643
  else
640
- raise LogicError, "Explore(t) state is somehow #{state} rather than 0, 1, or 2."
644
+ raise InternalLogicError, "Explore(t) state is somehow #{state} rather than 0, 1, or 2."
641
645
  end
642
646
  end
643
647
  end
@@ -782,7 +786,7 @@ class MaxPrioritySearchTreeInternal
782
786
  p_in = right(p_in)
783
787
  left = true
784
788
  else
785
- raise LogicError, 'q_in cannot be active, by the value in the right child of p_in!' if right_in
789
+ raise InternalLogicError, 'q_in cannot be active, by the value in the right child of p_in!' if right_in
786
790
 
787
791
  p = left(p_in)
788
792
  q = right(p_in)
@@ -792,7 +796,7 @@ class MaxPrioritySearchTreeInternal
792
796
  end
793
797
  elsif left_val.x <= x1
794
798
  if right_val.x > x1
795
- raise LogicError, 'q_in cannot be active, by the value in the right child of p_in!' if right_in
799
+ raise InternalLogicError, 'q_in cannot be active, by the value in the right child of p_in!' if right_in
796
800
 
797
801
  q = right(p_in)
798
802
  p_in = left(p_in)
@@ -806,7 +810,7 @@ class MaxPrioritySearchTreeInternal
806
810
  right_in = true
807
811
  end
808
812
  else
809
- raise LogicError, 'q_in cannot be active, by the value in the right child of p_in!' if right_in
813
+ raise InternalLogicError, 'q_in cannot be active, by the value in the right child of p_in!' if right_in
810
814
 
811
815
  q = left(p_in)
812
816
  deactivate_p_in.call
@@ -842,8 +846,8 @@ class MaxPrioritySearchTreeInternal
842
846
 
843
847
  # q has two children. Cases!
844
848
  if @data[left(q)].x < x0
845
- raise LogicError, 'p_in should not be active, based on the value at left(q)' if left_in
846
- raise LogicError, 'q_in should not be active, based on the value at left(q)' if right_in
849
+ raise InternalLogicError, 'p_in should not be active, based on the value at left(q)' if left_in
850
+ raise InternalLogicError, 'q_in should not be active, based on the value at left(q)' if right_in
847
851
 
848
852
  left = true
849
853
  if @data[right(q)].x < x0
@@ -874,7 +878,7 @@ class MaxPrioritySearchTreeInternal
874
878
 
875
879
  # Given: q' is active and satisfied x0 <= x(q') <= x1
876
880
  enumerate_right_in = lambda do
877
- raise LogicError, 'right_in should be true if we call enumerate_right_in' unless right_in
881
+ raise InternalLogicError, 'right_in should be true if we call enumerate_right_in' unless right_in
878
882
 
879
883
  if @data[q_in].y >= y0
880
884
  report.call(q_in)
@@ -906,7 +910,7 @@ class MaxPrioritySearchTreeInternal
906
910
  # q' has two children
907
911
  right_val = @data[right(q_in)]
908
912
  if left_val.x < x0
909
- raise LogicError, 'p_in cannot be active, by the value in the left child of q_in' if left_in
913
+ raise InternalLogicError, 'p_in cannot be active, by the value in the left child of q_in' if left_in
910
914
 
911
915
  if right_val.x < x0
912
916
  p = right(q_in)
@@ -966,7 +970,7 @@ class MaxPrioritySearchTreeInternal
966
970
 
967
971
  while left || left_in || right_in || right
968
972
  # byebug if $do_it
969
- raise LogicError, 'It should not be that q_in is active but p_in is not' if right_in && !left_in
973
+ raise InternalLogicError, 'It should not be that q_in is active but p_in is not' if right_in && !left_in
970
974
 
971
975
  set_i = []
972
976
  set_i << :left if left
@@ -984,7 +988,7 @@ class MaxPrioritySearchTreeInternal
984
988
  when :right
985
989
  enumerate_right.call
986
990
  else
987
- raise LogicError, "bad symbol #{z}"
991
+ raise InternalLogicError, "bad symbol #{z}"
988
992
  end
989
993
  end
990
994
  return result unless block_given?
@@ -994,9 +998,14 @@ class MaxPrioritySearchTreeInternal
994
998
  # Build the initial stucture
995
999
 
996
1000
  private def construct_pst
997
- # We follow the algorithm in the paper by De, Maheshwari et al. Note that indexing is from 1 there. For now we pretend that that
998
- # is the case here, too.
1001
+ raise DataError, 'Duplicate x values are not supported' if contains_duplicates?(@data, by: :x)
1002
+ raise DataError, 'Duplicate y values are not supported' if contains_duplicates?(@data, by: :y)
1003
+
1004
+ # We follow the algorithm in the paper by De, Maheshwari et al.
999
1005
 
1006
+ # Since we are building an implicit binary tree, things are simpler if the array is 1-based. This probably requires a malloc and
1007
+ # data copy, which isn't great, but it's in the C layer so cheap compared to the O(n log^2 n) work we need to do for
1008
+ # construction. In fact, we are probably doing O(n^2) work because of all the calls to #index_with_largest_y_in.
1000
1009
  @data.unshift nil
1001
1010
 
1002
1011
  h = Math.log2(@size).floor
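The construction code now rejects input with repeated x or y values by calling `contains_duplicates?(@data, by: :x)`. That helper lives in `Shared` and its body is not part of this diff, so the following is only a guess at one way such a check could be written. The method body and the `SketchPoint` struct are assumptions made for illustration.

```ruby
require 'set'

# Hypothetical stand-in for the library's point type.
SketchPoint = Struct.new(:x, :y)

# A possible duplicate check: true if two items share a key.
# With by: given, the key is the result of calling that method on each item.
def contains_duplicates?(enum, by: nil)
  seen = Set.new
  enum.any? do |item|
    key = by ? item.public_send(by) : item
    !seen.add?(key) # Set#add? returns nil when the key was already present
  end
end

points = [SketchPoint.new(1, 5), SketchPoint.new(2, 7), SketchPoint.new(3, 5)]
dup_x = contains_duplicates?(points, by: :x)
dup_y = contains_duplicates?(points, by: :y)
```

A check like this is O(n) expected time, which is small next to the construction work described in the comment above.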
@@ -1106,13 +1115,11 @@ class MaxPrioritySearchTreeInternal
1106
1115
  (l..r).max_by { |idx| @data[idx].y }
1107
1116
  end
1108
1117
 
1109
- # Sort the subarray @data[l..r]. This is much faster than a Ruby-layer heapsort because it is mostly happening in C.
1118
+ # Sort the subarray @data[l..r].
1110
1119
  private def sort_subarray(l, r)
1111
- # heapsort_subarray(l, r)
1112
1120
  return if l == r # 1-array already sorted!
1113
1121
 
1114
- #l -= 1
1115
- #r -= 1
1122
+ # This slice-replacement is much faster than a Ruby-layer heapsort because it is mostly happening in C.
1116
1123
  @data[l..r] = @data[l..r].sort_by(&:x)
1117
1124
  end
1118
1125
 
@@ -1127,7 +1134,7 @@ class MaxPrioritySearchTreeInternal
1127
1134
  private def verify_properties
1128
1135
  # It's a max-heap in y
1129
1136
  (2..@size).each do |node|
1130
- raise LogicError, "Heap property violated at child #{node}" unless @data[node].y < @data[parent(node)].y
1137
+ raise InternalLogicError, "Heap property violated at child #{node}" unless @data[node].y < @data[parent(node)].y
1131
1138
  end
1132
1139
 
1133
1140
  # Left subtree has x values less than all of the right subtree
@@ -1137,7 +1144,7 @@ class MaxPrioritySearchTreeInternal
1137
1144
  left_max = max_x_in_subtree(left(node))
1138
1145
  right_min = min_x_in_subtree(right(node))
1139
1146
 
1140
- raise LogicError, "Left-right property of x-values violated at #{node}" unless left_max < right_min
1147
+ raise InternalLogicError, "Left-right property of x-values violated at #{node}" unless left_max < right_min
1141
1148
  end
1142
1149
  end
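The first check in `verify_properties` can be sketched on its own. This is an illustrative version under stated assumptions: `HeapCheckPoint` and `max_heap_in_y?` are hypothetical names, and the parent of node `i` is assumed to be `i / 2`, which is the usual layout for a 1-based implicit binary tree (the gem's own `parent` helper is not shown in this hunk).

```ruby
# Hypothetical stand-in for the gem's Shared::Point struct.
HeapCheckPoint = Struct.new(:x, :y)

# True when every non-root node has a strictly smaller y than its parent.
# Assumes a 1-based array (index 0 unused) in which parent(i) == i / 2.
def max_heap_in_y?(data)
  size = data.size - 1
  (2..size).all? { |node| data[node].y < data[node / 2].y }
end

good = [nil, HeapCheckPoint.new(2, 9), HeapCheckPoint.new(1, 5), HeapCheckPoint.new(3, 7), HeapCheckPoint.new(0, 1)]
bad  = [nil, HeapCheckPoint.new(2, 3), HeapCheckPoint.new(1, 5)]

good_ok = max_heap_in_y?(good)
bad_ok  = max_heap_in_y?(bad)
```

The comparison is strict, matching the `<` in the diff, which is consistent with the constructor rejecting duplicate y values.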
1143
1150
 
@@ -2,15 +2,13 @@ require 'must_be'
2
2
 
3
3
  require_relative 'shared'
4
4
 
5
+ # THIS CLASS IS INCOMPLETE AND NOT USABLE
6
+ #
5
7
  # A priority search tree (PST) stores points in two dimensions (x,y) and can efficiently answer certain questions about the set of
6
8
  # points.
7
9
  #
8
10
  # The structure was introduced by McCreight [1].
9
11
  #
10
- # It is a binary search tree which is a max-heap by the y-coordinate, and, for a non-leaf node N storing (x, y), all the nodes in
11
- # the left subtree of N have smaller x values than any of the nodes in the right subtree of N. Note, though, that the x-value at N
12
- # has no particular property relative to the x values in its subtree. It is thus _almost_ a binary search tree in the x coordinate.
13
- #
14
12
  # See more: https://en.wikipedia.org/wiki/Priority_search_tree
15
13
  #
16
14
  # It is possible to build such a tree in place, given an array of pairs. See [2]. In a follow-up paper, [3], the authors show how to
@@ -40,12 +38,12 @@ require_relative 'shared'
40
38
  # [2] De, Maheshwari, Nandy, Smid, _An in-place priority search tree_, 23rd Annual Canadian Conference on Computational Geometry.
41
39
  # [3] De, Maheshwari, Nandy, Smid, _An in-place min-max priority search tree_, Computational Geometry, v46 (2013), pp 310-327.
42
40
  # [4] Atkinson, Sack, Santoro, Strothotte, _Min-max heaps and generalized priority queues_, Commun. ACM 29 (10) (1986), pp 996-1000.
43
- class MinmaxPrioritySearchTreeInternal
41
+ class DataStructuresRMolinari::MinmaxPrioritySearchTree
44
42
  include Shared
45
43
 
46
44
  # The array of pairs is turned into a minmax PST in-place without cloning. So clone before passing it in, if you care.
47
45
  #
48
- # Each element must respond to #x and #y. Use Pair (above) if you like.
46
+ # Each element must respond to #x and #y. Use Point (above) if you like.
49
47
  def initialize(data, verify: false)
50
48
  @data = data
51
49
  @size = @data.size
@@ -75,7 +73,7 @@ class MinmaxPrioritySearchTreeInternal
75
73
  #
76
74
  # Here T(x) is the subtree rooted at x
77
75
  def leftmost_ne(x0, y0)
78
- best = Pair.new(INFINITY, INFINITY)
76
+ best = Point.new(INFINITY, INFINITY)
79
77
  p = q = root
80
78
 
81
79
  in_q = ->(pair) { pair.x >= x0 && pair.y >= y0 }
@@ -284,7 +282,7 @@ class MinmaxPrioritySearchTreeInternal
284
282
  #
285
283
  # This method returns p*
286
284
  # def highest_3_sided_up(x0, x1, y0)
287
- # best = Pair.new(INFINITY, -INFINITY)
285
+ # best = Point.new(INFINITY, -INFINITY)
288
286
 
289
287
  # in_q = lambda do |pair|
290
288
  # pair.x >= x0 && pair.x <= x1 && pair.y >= y0
@@ -407,7 +405,7 @@ class MinmaxPrioritySearchTreeInternal
407
405
  # - If Q intersect P is empty then p* = best
408
406
  #
409
407
  # Here, P is the set of points in our data structure and T_p is the subtree rooted at p
410
- best = Pair.new(INFINITY, -INFINITY)
408
+ best = Point.new(INFINITY, -INFINITY)
411
409
  p = root # root of the whole tree AND the pair stored there
412
410
 
413
411
  in_q = lambda do |pair|
@@ -1,11 +1,20 @@
1
1
  # Some odds and ends shared by other classes
2
2
  module Shared
3
+ # Infinity without having to put a +Float::+ prefix every time
3
4
  INFINITY = Float::INFINITY
4
5
 
5
- Pair = Struct.new(:x, :y)
6
+ # An (x, y) coordinate pair.
7
+ Point = Struct.new(:x, :y)
6
8
 
7
9
  # @private
10
+
11
+ # Raised for logic errors in client code
8
12
  class LogicError < StandardError; end
13
+ # Raised for logic errors in library code
14
+ class InternalLogicError < LogicError; end
15
+
16
+ # Used for errors related to data, such as duplicated elements where they must be distinct.
17
+ class DataError < StandardError; end
9
18
 
10
19
  # @private
11
20
  #
@@ -61,4 +70,22 @@ module Shared
61
70
  (i & 1).zero?
62
71
  end
63
72
  end
73
+
74
+ # Simple O(n) check for duplicates in an enumerable.
75
+ #
76
+ # It may be worse than O(n), depending on how close to constant set insertion is.
77
+ #
78
+ # @param enum the enumerable to check for duplicates
79
+ # @param by a method to call on each element of enum before checking. The results of these methods are checked for
80
+ # duplication. When nil we don't call anything and just use the elements themselves.
81
+ def contains_duplicates?(enum, by: nil)
82
+ seen = Set.new
83
+ enum.each do |v|
84
+ v = v.send(by) if by
85
+ return true if seen.include? v
86
+
87
+ seen << v
88
+ end
89
+ false
90
+ end
64
91
  end
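To see how `contains_duplicates?` behaves with and without the `by:` argument, here is a self-contained sketch. The method body follows the one added in this diff; `DupCheckPoint` is a hypothetical struct used only for the demonstration, and `require 'set'` is included because older Ruby versions do not load `Set` automatically.

```ruby
require 'set'

# Hypothetical stand-in for the gem's Shared::Point struct.
DupCheckPoint = Struct.new(:x, :y)

# Returns true as soon as a value is seen for the second time.
# When `by` is given, that method is called on each element first and
# the results are compared instead of the elements themselves.
def contains_duplicates?(enum, by: nil)
  seen = Set.new
  enum.each do |v|
    v = v.send(by) if by
    return true if seen.include?(v)

    seen << v
  end
  false
end

points = [DupCheckPoint.new(1, 10), DupCheckPoint.new(2, 20), DupCheckPoint.new(1, 30)]

dup_x = contains_duplicates?(points, by: :x) # two points share x == 1
dup_y = contains_duplicates?(points, by: :y) # all y values are distinct
```

The early return means the scan stops at the first repeated value, so an input with a duplicate near the front is rejected without visiting the rest.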
@@ -1,54 +1,78 @@
1
1
  require_relative 'data_structures_rmolinari/shared'
2
- require_relative 'data_structures_rmolinari/disjoint_union_internal'
3
- require_relative 'data_structures_rmolinari/generic_segment_tree_internal'
4
- require_relative 'data_structures_rmolinari/heap_internal'
5
- require_relative 'data_structures_rmolinari/max_priority_search_tree_internal'
6
- require_relative 'data_structures_rmolinari/minmax_priority_search_tree_internal'
7
2
 
8
3
  module DataStructuresRMolinari
9
- Pair = Shared::Pair
10
-
11
- ########################################
12
- # Priority Search Trees
13
- #
14
- # Note that MinmaxPrioritySearchTree is only a fragment of what we need
4
+ # A struct responding to +.x+ and +.y+.
5
+ Point = Shared::Point
6
+ end
15
7
 
16
- MaxPrioritySearchTree = MaxPrioritySearchTreeInternal
17
- MinmaxPrioritySearchTree = MinmaxPrioritySearchTreeInternal
8
+ # These define classes inside module DataStructuresRMolinari
9
+ require_relative 'data_structures_rmolinari/disjoint_union'
10
+ require_relative 'data_structures_rmolinari/generic_segment_tree'
11
+ require_relative 'data_structures_rmolinari/heap'
12
+ require_relative 'data_structures_rmolinari/max_priority_search_tree'
13
+ require_relative 'data_structures_rmolinari/minmax_priority_search_tree'
18
14
 
15
+ # A namespace to hold the provided classes. We want to avoid polluting the global namespace with names like "Heap"
16
+ module DataStructuresRMolinari
19
17
  ########################################
20
- # Segment Trees
21
-
22
- GenericSegmentTree = GenericSegmentTreeInternal
23
-
24
- # Takes an array A[0...n] and tells us what the maximum value is on a subinterval i..j in O(log n) time.
18
+ # Concrete instances of Segment Tree
25
19
  #
26
- # TODO:
27
- # - allow min val too
28
- # - add a flag to the initializer
29
- # - call it ExtremalValSegment tree or something similar
20
+ # @todo consider moving these into generic_segment_tree.rb
21
+
22
+ # A segment tree that for an array A(0...n) answers questions of the form "what is the maximum value in the subinterval A(i..j)?"
23
+ # in O(log n) time.
30
24
  class MaxValSegmentTree
31
25
  extend Forwardable
32
26
 
33
- def_delegator :@structure, :query_on, :max_on
27
+ # Tell the tree that the value at idx has changed
28
+ def_delegator :@structure, :update_at
34
29
 
30
+ # @param data an object that contains values at integer indices based at 0, via +data[i]+.
31
+ # - This will usually be an Array, but it could also be a hash or a proc.
35
32
  def initialize(data)
36
33
  @structure = GenericSegmentTree.new(
37
34
  combine: ->(a, b) { [a, b].max },
38
35
  single_cell_array_val: ->(i) { data[i] },
39
36
  size: data.size,
40
- identity: -Float::INFINITY
37
+ identity: -Shared::INFINITY
41
38
  )
42
39
  end
40
+
41
+ # The maximum value in A(i..j).
42
+ #
43
+ # The arguments must be integers in 0...(A.size)
44
+ # @return the largest value in A(i..j) or -Infinity if i > j.
45
+ def max_on(i, j)
46
+ @structure.query_on(i, j)
47
+ end
43
48
  end
44
49
 
45
- ########################################
46
- # Heap
50
+ # A segment tree that for an array A(0...n) answers questions of the form "what is the index of the maximal value in the
51
+ # subinterval A(i..j)?" in O(log n) time.
52
+ class IndexOfMaxValSegmentTree
53
+ extend Forwardable
47
54
 
48
- Heap = HeapInternal
55
+ # Tell the tree that the value at idx has changed
56
+ def_delegator :@structure, :update_at
49
57
 
50
- ########################################
51
- # Disjoint Union
58
+ # @param (see MaxValSegmentTree#initialize)
59
+ def initialize(data)
60
+ @structure = GenericSegmentTree.new(
61
+ combine: ->(p1, p2) { p1[1] >= p2[1] ? p1 : p2 },
62
+ single_cell_array_val: ->(i) { [i, data[i]] },
63
+ size: data.size,
64
+ identity: nil
65
+ )
66
+ end
52
67
 
53
- DisjointUnion = DisjointUnionInternal
68
+ # The index of the maximum value in A(i..j)
69
+ #
70
+ # The arguments must be integers in 0...(A.size)
71
+ # @return (Integer, nil) the index of the largest value in A(i..j) or +nil+ if i > j.
72
+ # - If there is more than one entry with that value, return one of the indices. There is no guarantee as to which one.
73
+ # - Return +nil+ if i > j
74
+ def index_of_max_val_on(i, j)
75
+ @structure.query_on(i, j)&.first # discard the value part of the pair
76
+ end
77
+ end
54
78
  end
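The two wrapper classes differ only in the `combine`, `single_cell_array_val`, and `identity` arguments they pass. The sketch below shows those arguments at work against `TinySegmentTree`, a hypothetical, minimal stand-in written for this example; it is not the gem's `GenericSegmentTree`, supports no updates, and only accepts the same keyword arguments and a `query_on(i, j)` method.

```ruby
# Hypothetical minimal segment tree; NOT the gem's GenericSegmentTree.
class TinySegmentTree
  def initialize(combine:, single_cell_array_val:, size:, identity:)
    @combine = combine
    @size = size
    @identity = identity
    @tree = {}
    build(1, 0, size - 1, single_cell_array_val) if size.positive?
  end

  # Combined value over the closed interval i..j, or the identity if i > j.
  def query_on(i, j)
    return @identity if i > j

    query(1, 0, @size - 1, i, j)
  end

  private

  # Fill in the node covering lo..hi and return its value.
  def build(node, lo, hi, cell)
    @tree[node] =
      if lo == hi
        cell.call(lo)
      else
        mid = (lo + hi) / 2
        @combine.call(build(2 * node, lo, mid, cell), build(2 * node + 1, mid + 1, hi, cell))
      end
  end

  # Only descends into children that overlap i..j, so combine never sees the identity.
  def query(node, lo, hi, i, j)
    return @tree[node] if i <= lo && hi <= j

    mid = (lo + hi) / 2
    return query(2 * node, lo, mid, i, j) if j <= mid
    return query(2 * node + 1, mid + 1, hi, i, j) if i > mid

    @combine.call(query(2 * node, lo, mid, i, j), query(2 * node + 1, mid + 1, hi, i, j))
  end
end

data = [3, 9, 4, 9, 1]

# Same arguments as MaxValSegmentTree in the diff.
max_tree = TinySegmentTree.new(
  combine: ->(a, b) { [a, b].max },
  single_cell_array_val: ->(i) { data[i] },
  size: data.size,
  identity: -Float::INFINITY
)

# Same arguments as IndexOfMaxValSegmentTree: each cell is an [index, value] pair.
index_tree = TinySegmentTree.new(
  combine: ->(p1, p2) { p1[1] >= p2[1] ? p1 : p2 },
  single_cell_array_val: ->(i) { [i, data[i]] },
  size: data.size,
  identity: nil
)

max_on_2_4   = max_tree.query_on(2, 4)
index_on_2_4 = index_tree.query_on(2, 4)&.first # discard the value part of the pair
```

Storing `[index, value]` pairs lets the index variant reuse the same tree machinery, and the safe-navigation call turns the `nil` identity of an empty range into a `nil` result.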
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: data_structures_rmolinari
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.2.1
4
+ version: 0.3.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Rory Molinari
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2023-01-05 00:00:00.000000000 Z
11
+ date: 2023-01-06 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: must_be
@@ -67,24 +67,26 @@ dependencies:
67
67
  - !ruby/object:Gem::Version
68
68
  version: 0.22.0
69
69
  description: |
70
- This small gem contains several data structures that I have implemented to learn how they work.
70
+ This small gem contains several data structures that I have implemented in Ruby to learn how they work.
71
71
 
72
72
  Sometimes it is not enough to read the description of a data structure and accompanying pseudo-code.
73
- Actually implementing the structure is often helpful in understanding what is going on. It is also
73
+ Actually implementing it is often helpful in understanding what is going on. It is also
74
74
  usually fun.
75
75
 
76
- We implement Disjoin Union, Heap, Priority Search Tree, and Segment Tree.
77
- email: rorymolinari+rubygems@gmail.com
76
+ The gem contains basic implementations of Disjoint Union, Heap, Priority Search Tree, and Segment Tree.
77
+ See the homepage for more details.
78
+ email: rorymolinari@gmail.com
78
79
  executables: []
79
80
  extensions: []
80
81
  extra_rdoc_files: []
81
82
  files:
83
+ - CHANGELOG.md
82
84
  - lib/data_structures_rmolinari.rb
83
- - lib/data_structures_rmolinari/disjoint_union_internal.rb
84
- - lib/data_structures_rmolinari/generic_segment_tree_internal.rb
85
- - lib/data_structures_rmolinari/heap_internal.rb
86
- - lib/data_structures_rmolinari/max_priority_search_tree_internal.rb
87
- - lib/data_structures_rmolinari/minmax_priority_search_tree_internal.rb
85
+ - lib/data_structures_rmolinari/disjoint_union.rb
86
+ - lib/data_structures_rmolinari/generic_segment_tree.rb
87
+ - lib/data_structures_rmolinari/heap.rb
88
+ - lib/data_structures_rmolinari/max_priority_search_tree.rb
89
+ - lib/data_structures_rmolinari/minmax_priority_search_tree.rb
88
90
  - lib/data_structures_rmolinari/shared.rb
89
91
  homepage: https://github.com/rmolinari/data_structures
90
92
  licenses: