data_structures_rmolinari 0.1.0

@@ -0,0 +1,1156 @@
1
+ require 'set'
2
+
3
+ require_relative 'shared'
4
+
5
+ class LogicError < StandardError; end
6
+
7
+ # A priority search tree (PST) stores a set, P, of two-dimensional points (x,y) in a way that allows efficient answers to certain
8
+ # questions about P.
9
+ #
10
+ # (In the current implementation no two points can share an x-value and no two points can share a y-value. This (rather severe)
11
+ # restriction can be relaxed with some more complicated code.)
12
+ #
13
+ # The data structure was introduced in 1985 by Edward McCreight. Later, De, Maheshwari, Nandy, and Smid showed how to construct a
14
+ # PST in-place (using only O(1) extra memory), at the expense of some slightly more complicated code for the various supported
15
+ # operations. It is their approach that we have implemented.
16
+ #
17
+ # The PST structure is an implicit, balanced binary tree with the following properties:
18
+ # * The tree is a _max-heap_ in the y coordinate. That is, the point at each node has a y-value less than that of its parent.
19
+ # * For each node p, the x-values of all the nodes in the left subtree of p are less than the x-values of all the nodes in the right
20
+ # subtree of p. Note that this says nothing about the x-value at the node p itself. The tree is thus _almost_ a binary search tree
21
+ # in the x coordinate.
22
+ #
23
+ # Given a set of n points, we can answer the following questions quickly:
24
+ #
25
+ # - +leftmost_ne+: for x0 and y0, what is the leftmost point (x, y) in P satisfying x >= x0 and y >= y0?
26
+ # - +rightmost_nw+: for x0 and y0, what is the rightmost point (x, y) in P satisfying x <= x0 and y >= y0?
27
+ # - +highest_ne+: for x0 and y0, what is the highest point (x, y) in P satisfying x >= x0 and y >= y0?
28
+ # - +highest_nw+: for x0 and y0, what is the highest point (x, y) in P satisfying x <= x0 and y >= y0?
29
+ # - +highest_3_sided+: for x0, x1, and y0, what is the highest point (x, y) in P satisfying x >= x0, x <= x1 and y >= y0?
30
+ # - +enumerate_3_sided+: for x0, x1, and y0, enumerate all points in P satisfying x >= x0, x <= x1 and y >= y0.
31
+ #
32
+ # (Here, "leftmost/rightmost" means "minimal/maximal x", and "highest" means "maximal y".)
33
+ #
34
+ # The first 5 operations take O(log n) time.
35
+ #
36
+ # The final operation (enumerate) takes O(m + log n) time, where m is the number of points that are enumerated.
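+ #
+ # As a reference for what these queries mean, here is a minimal brute-force sketch. It assumes +points+ is a plain array of
+ # objects responding to +#x+ and +#y+. The lambda names are illustrative only and are not part of this class. Each lambda scans
+ # every point in O(n) time and yields nil (or an empty array) when nothing qualifies, whereas the methods of this class return a
+ # sentinel pair in that case.
+ #
+ #   in_ne = ->(pt, x0, y0) { pt.x >= x0 && pt.y >= y0 }
+ #   in_nw = ->(pt, x0, y0) { pt.x <= x0 && pt.y >= y0 }
+ #
+ #   leftmost_ne  = ->(points, x0, y0) { points.select { |pt| in_ne.call(pt, x0, y0) }.min_by(&:x) }
+ #   rightmost_nw = ->(points, x0, y0) { points.select { |pt| in_nw.call(pt, x0, y0) }.max_by(&:x) }
+ #   highest_ne   = ->(points, x0, y0) { points.select { |pt| in_ne.call(pt, x0, y0) }.max_by(&:y) }
+ #   highest_nw   = ->(points, x0, y0) { points.select { |pt| in_nw.call(pt, x0, y0) }.max_by(&:y) }
+ #
+ #   enumerate_3_sided = ->(points, x0, x1, y0) { points.select { |pt| pt.x.between?(x0, x1) && pt.y >= y0 } }
+ #   highest_3_sided   = ->(points, x0, x1, y0) { enumerate_3_sided.call(points, x0, x1, y0).max_by(&:y) }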
37
+ #
38
+ # There is a related data structure called the Min-max priority search tree, so we have called this a "Max priority search tree", or
39
+ # MaxPST.
40
+ #
41
+ # References:
42
+ # * E.M. McCreight, _Priority search trees_, SIAM J. Comput., 14(2):257-276, 1985.
43
+ # * M. De, A. Maheshwari, S. C. Nandy, M. Smid, _An In-Place Priority Search Tree_, 23rd Canadian Conference on Computational
44
+ # Geometry, 2011
45
+ class MaxPrioritySearchTreeInternal
46
+ include Shared
47
+
48
+ # Construct a MaxPST from the collection of points in +data+.
49
+ #
50
+ # @param data [Array] the set P of points presented as an array. The tree is built in the array in-place without cloning. Each
51
+ # element of the array must respond to +#x+ and +#y+ (though this is not currently checked).
52
+ #
53
+ # @param verify [Boolean] when truthy, check that the properties of a PST are satisfied after construction, raising an exception
54
+ # if not.
55
+ def initialize(data, verify: false)
56
+ @data = data
57
+ @size = @data.size
58
+
59
+ construct_pst
60
+ return unless verify
61
+
62
+ verify_properties
63
+ end
64
+
65
+ ########################################
66
+ # Highest NE and Highest NW
67
+
68
+ # Return the highest point in P to the "northeast" of (x0, y0).
69
+ #
70
+ # Let Q = [x0, infty) X [y0, infty) be the northeast quadrant defined by the point (x0, y0) and let P be the points in this data
71
+ # structure. Define p* as
72
+ #
73
+ # - (infty, -infty) if Q \intersect P is empty and
74
+ # - the highest (max-y) point in Q \intersect P otherwise.
75
+ #
76
+ # This method returns p* in O(log n) time and O(1) extra space.
77
+ def highest_ne(x0, y0)
78
+ highest_in_quadrant(x0, y0, :ne)
79
+ end
80
+
81
+ # Return the highest point in P to the "northwest" of (x0, y0).
82
+ #
83
+ # Let Q = (-infty, x0] X [y0, infty) be the northwest quadrant defined by the point (x0, y0) and let P be the points in this data
84
+ # structure. Define p* as
85
+ #
86
+ # - (-infty, -infty) if Q \intersect P is empty and
87
+ # - the highest (max-y) point in Q \intersect P otherwise.
88
+ #
89
+ # This method returns p* in O(log n) time and O(1) extra space.
90
+ def highest_nw(x0, y0)
91
+ highest_in_quadrant(x0, y0, :nw)
92
+ end
93
+
94
+ # The basic algorithm is from De et al. section 3.1. We have generalized it slightly to allow it to calculate both highest_ne and
95
+ # highest_nw
96
+ #
97
+ # Note that highest_ne(x0, y0) = highest_3_sided(x0, infinity, y0) so we don't really need this. But it's a bit faster than the
98
+ # general case and is a simple algorithm that introduces a typical way that an algorithm interacts with the data structure.
99
+ #
100
+ # From the paper:
101
+ #
102
+ # The algorithm uses two variables best and p, which satisfy the following invariant
103
+ #
104
+ # - If Q intersect P is nonempty then p* in {best} union T_p
105
+ # - If Q intersect P is empty then p* = best
106
+ #
107
+ # Here, P is the set of points in our data structure and T_p is the subtree rooted at p
108
+ private def highest_in_quadrant(x0, y0, quadrant)
109
+ quadrant.must_be_in [:ne, :nw]
110
+
111
+ p = root
112
+ if quadrant == :ne
113
+ best = Pair.new(INFINITY, -INFINITY)
114
+ preferred_child = ->(n) { right(n) }
115
+ nonpreferred_child = ->(n) { left(n) }
116
+ sufficient_x = ->(x) { x >= x0 }
117
+ else
118
+ best = Pair.new(-INFINITY, -INFINITY)
119
+ preferred_child = ->(n) { left(n) }
120
+ nonpreferred_child = ->(n) { right(n) }
121
+ sufficient_x = ->(x) { x <= x0 }
122
+ end
123
+
124
+ # True when x == x0 or x is not sufficient. This test sometimes excludes the other child of a node from consideration.
125
+ exclusionary_x = ->(x) { x == x0 || !sufficient_x.call(x) }
126
+
127
+ in_q = lambda do |pair|
128
+ sufficient_x.call(pair.x) && pair.y >= y0
129
+ end
130
+
131
+ # From the paper:
132
+ #
133
+ # takes as input a point t and does the following: if t \in Q and y(t) > y(best) then it assigns best = t
134
+ #
135
+ # Note that the paper identifies a node in the tree with its value. We need to grab the correct node.
136
+ update_highest = lambda do |node|
137
+ t = @data[node]
138
+ if in_q.call(t) && t.y > best.y
139
+ best = t
140
+ end
141
+ end
142
+
143
+ # We could make this code more efficient. But since we only have O(log n) steps we won't actually gain much so let's keep it
144
+ # readable and close to the paper's pseudocode for now.
145
+ until leaf?(p)
146
+ p_val = @data[p]
147
+ if in_q.call(p_val)
148
+ # p \in Q and nothing in its subtree can beat it because of the max-heap
149
+ update_highest.call(p)
150
+ return best
151
+ elsif p_val.y < y0
152
+ # p is too low for Q, so the entire subtree is too low as well
153
+ return best
154
+ elsif one_child?(p)
155
+ # With just one child we need to check it
156
+ p = left(p)
157
+ elsif exclusionary_x.call(@data[preferred_child.call(p)].x)
158
+ # The preferred child (right(p) for :ne) might be in Q, but nothing in the other subtree can be, by the PST property on x.
159
+ p = preferred_child.call(p)
160
+ elsif sufficient_x.call(@data[nonpreferred_child.call(p)].x)
161
+ # Both children have sufficient x, so try the y-higher of them. Note that nothing else in either subtree will beat this one,
162
+ # by the y-property of the PST
163
+ higher = left(p)
164
+ if @data[right(p)].y > @data[left(p)].y
165
+ higher = right(p)
166
+ end
167
+ p = higher
168
+ elsif @data[preferred_child.call(p)].y < y0
169
+ # Nothing in the preferred subtree (the right one for :ne) is in Q, but maybe we'll find something in the other one
170
+ p = nonpreferred_child.call(p)
171
+ else
172
+ # At this point we know that the preferred child (right(p) for :ne) is in Q so we need to check it. Nothing in its subtree can
173
+ # beat it so we don't need to look there. But there might be something better in the other subtree.
174
+ update_highest.call(preferred_child.call(p))
175
+ p = nonpreferred_child.call(p)
176
+ end
177
+ end
178
+ update_highest.call(p) # try the leaf
179
+ best
180
+ end
181
+
182
+ ########################################
183
+ # Leftmost NE and Rightmost NW
184
+
185
+ # Return the leftmost (min-x) point in P to the northeast of (x0, y0).
186
+ #
187
+ # Let Q = [x0, infty) X [y0, infty) be the northeast quadrant defined by the point (x0, y0) and let P be the points in this data
188
+ # structure. Define p* as
189
+ #
190
+ # - (infty, infty) if Q \intersect P is empty and
191
+ # - the leftmost (min-x) point in Q \intersect P otherwise.
192
+ #
193
+ # This method returns p* in O(log n) time and O(1) extra space.
194
+ def leftmost_ne(x0, y0)
195
+ extremal_in_x_dimension(x0, y0, :ne)
196
+ end
197
+
198
+ # Return the rightmost (max-x) point in P to the northwest of (x0, y0).
199
+ #
200
+ # Let Q = (-infty, x0] X [y0, infty) be the northwest quadrant defined by the point (x0, y0) and let P be the points in this data
201
+ # structure. Define p* as
202
+ #
203
+ # - (-infty, infty) if Q \intersect P is empty and
204
+ # - the rightmost (max-x) point in Q \intersect P otherwise.
205
+ #
206
+ # This method returns p* in O(log n) time and O(1) extra space.
207
+ def rightmost_nw(x0, y0)
208
+ extremal_in_x_dimension(x0, y0, :nw)
209
+ end
210
+
211
+ # A genericized version of the paper's leftmost_ne that can calculate either leftmost_ne or rightmost_nw as specified via a
212
+ # parameter.
213
+ #
214
+ # Quadrant is either :ne (which gives leftmost_ne) or :nw (which gives rightmost_nw).
215
+ #
216
+ # From De et al:
217
+ #
218
+ # The algorithm uses three variables best, p, and q which satisfy the following invariant:
219
+ #
220
+ # - if Q \intersect P is empty then p* = best
221
+ # - if Q \intersect P is nonempty then p* \in {best} \union T(p) \union T(q)
222
+ # - p and q are at the same level of T and x(p) <= x(q)
223
+ private def extremal_in_x_dimension(x0, y0, quadrant)
224
+ quadrant.must_be_in [:ne, :nw]
225
+
226
+ if quadrant == :ne
227
+ sign = 1
228
+ best = Pair.new(INFINITY, INFINITY)
229
+ else
230
+ sign = -1
231
+ best = Pair.new(-INFINITY, INFINITY)
232
+ end
233
+
234
+ p = q = root
235
+
236
+ in_q = lambda do |pair|
237
+ sign * pair.x >= sign * x0 && pair.y >= y0
238
+ end
239
+
240
+ # From the paper:
241
+ #
242
+ # takes as input a point t and does the following: if t \in Q and x(t) < x(best) then it assigns best = t
243
+ #
244
+ # Note that the paper identifies a node in the tree with its value. We need to grab the correct node.
245
+ update_leftmost = lambda do |node|
246
+ t = @data[node]
247
+ if in_q.call(t) && sign * t.x < sign * best.x
248
+ best = t
249
+ end
250
+ end
251
+
252
+ # Use the approach described in the Min-Max paper, p 316
253
+ #
254
+ # In the paper c = [c1, c2, ..., ck] is an array of four nodes, [left(p), right(p), left(q), right(q)], but we also use this
255
+ # logic when q has only a left child.
256
+ #
257
+ # Idea: x(c1) < x(c2) < ..., so the key thing to know for the next step is where x0 fits in.
258
+ #
259
+ # - If x0 <= x(c1) then all subtrees have large enough x values and we look for the leftmost node in c with a large enough y
260
+ # value. Both p and q are sent into that subtree.
261
+ # - If x0 >= x(ck) then the rightmost subtree is our only hope.
262
+ # - Otherwise, x(c1) < x0 < x(ck) and we let i be least so that x(ci) <= x0 < x(c(i+1)). Then q becomes the leftmost cj in c not
263
+ # to the left of ci such that y(cj) >= y0, if any. p becomes ci if y(ci) >= y0 and q otherwise. If there is no such j, we put
264
+ # q = p. This may leave both of p, q undefined which means there is no useful way forward and we return nils to signal this to
265
+ # calling code.
266
+ #
267
+ # The same logic applies to rightmost_nw, though everything is "backwards"
268
+ # - membership of Q depends on having a small-enough value of x, rather than a large-enough one
269
+ # - among the ci, values towards the end of the array tend not to be in Q while values towards the start of the array tend to be
270
+ # in Q
271
+ #
272
+ # Idea: handle the first issue by negating all x-values being compared and handle the second by reversing the array c before
273
+ # doing anything and swapping the values for p and q that we work out.
274
+ determine_next_nodes = lambda do |*c|
275
+ c.reverse! if quadrant == :nw
276
+
277
+ if sign * @data[c.first].x > sign * x0
278
+ # All subtrees have x-values good enough for Q. We look at y-values to work out which subtree to focus on
279
+ leftmost = c.find { |node| @data[node].y >= y0 } # might be nil
280
+
281
+ # Otherwise, explore the "leftmost" subtree with large enough y values. Its root is in Q and can't be beaten as "leftmost"
282
+ # by anything to its "right". If it's nil the calling code can bail
283
+ return [leftmost, leftmost]
284
+ end
285
+
286
+ if sign * @data[c.last].x <= sign * x0
287
+ # only the "rightmost" subtree can possibly have anything in Q, assuming distinct x-values
288
+ return [c.last, c.last]
289
+ end
290
+
291
+ values = c.map { |node| @data[node] }
292
+
293
+ # Note that x(c1) <= x0 < x(c4) so i is well-defined
294
+ i = (0...(c.size - 1)).find { |j| sign * values[j].x <= sign * x0 && sign * x0 < sign * values[j + 1].x }
295
+
296
+ # These nodes all have large-enough x values so looking at y finds the ones in Q
297
+ new_q = c[(i + 1)..].find { |node| @data[node].y >= y0 } # could be nil
298
+ new_p = c[i] if values[i].y >= y0 # The leftmost subtree is worth exploring if the y-value is big enough but not otherwise
299
+ new_p ||= new_q # if c[i] is no good, send p along with q
300
+ new_q ||= new_p # but if there is no worthwhile value for q we should send it along with p
301
+
302
+ return [new_q, new_p] if quadrant == :nw # swap for the rightmost_nw case.
303
+
304
+ [new_p, new_q]
305
+ end
306
+
307
+ until leaf?(p)
308
+ update_leftmost.call(p)
309
+ update_leftmost.call(q)
310
+
311
+ if p == q
312
+ if one_child?(p)
313
+ p = q = left(p)
314
+ else
315
+ q = right(p)
316
+ p = left(p)
317
+ end
318
+ else
319
+ # p != q
320
+ if leaf?(q)
321
+ q = p # p itself is just one layer above the leaves, or is itself a leaf
322
+ elsif one_child?(q)
323
+ # This generic approach is not as fast as the bespoke checks described in the paper. But it is easier to maintain the code
324
+ # this way and allows easy implementation of rightmost_nw
325
+ p, q = determine_next_nodes.call(left(p), right(p), left(q))
326
+ else
327
+ p, q = determine_next_nodes.call(left(p), right(p), left(q), right(q))
328
+ end
329
+ break unless p # we've run out of useful nodes
330
+ end
331
+ end
332
+ update_leftmost.call(p) if p
333
+ update_leftmost.call(q) if q
334
+ best
335
+ end
336
+
337
+ ########################################
338
+ # Highest 3 Sided
339
+
340
+ # Return the highest point of P in the box bounded by x0, x1, and y0.
341
+ #
342
+ # Let Q = [x0, x1] X [y0, infty) be the "three-sided" box bounded by x0, x1, and y0, and let P be the set of points in the
343
+ # MaxPST. (Note that Q is empty if x1 < x0.) Define p* as
344
+ #
345
+ # - (infty, -infty) if Q \intersect P is empty and
346
+ # - the highest (max-y) point in Q \intersect P otherwise.
347
+ #
348
+ # This method returns p* in O(log n) time and O(1) extra space.
349
+ def highest_3_sided(x0, x1, y0)
350
+ # From the paper:
351
+ #
352
+ # The three real numbers x0, x1, and y0 define the three-sided range Q = [x0,x1] X [y0,∞). If Q \intersect P is not \empty,
353
+ # define p* to be the highest point of P in Q. If Q \intersect P = \empty, define p∗ to be the point (infty, -infty).
354
+ # Algorithm Highest3Sided(x0,x1,y0) returns the point p∗.
355
+ #
356
+ # The algorithm uses two bits L and R, and three variables best, p, and q. As before, best stores the highest point in Q
357
+ # found so far. The bit L indicates whether or not p∗ may be in the subtree of p; if L=1, then p is to the left of
358
+ # Q. Similarly, the bit R indicates whether or not p∗ may be in the subtree of q; if R=1, then q is to the right of Q.
359
+ #
360
+ # Although there are a lot of lines and cases the overall idea is simple. We maintain in p the rightmost node at its level that
361
+ # is to the left of the area Q. Likewise, q is the leftmost node that is to the right of Q. The logic just updates this data at
362
+ # each step. The helper check_left updates p and check_right updates q.
363
+ #
364
+ # A couple of simple observations that show why maintaining just these two points is enough.
365
+ #
366
+ # - We know that x(p) < x0. This tells us nothing about the x values in the subtrees of p (which is why we need to check various
367
+ # cases), but it does tell us that everything to the left of p has values of x that are too small to bother with.
368
+ # - We don't need to maintain any state inside the region Q because the max-heap property means that if we ever find a node r in
369
+ # Q we check it for best and then ignore its subtree (which cannot beat r on y-value).
370
+ #
371
+ # Sometimes we don't have a relevant node to the left or right of Q. The booleans L and R (which we call left and right) track
372
+ # whether p and q are defined at the moment.
373
+ best = Pair.new(INFINITY, -INFINITY)
374
+ p = q = left = right = nil
375
+
376
+ x_range = (x0..x1)
377
+
378
+ in_q = lambda do |pair|
379
+ x_range.cover?(pair.x) && pair.y >= y0
380
+ end
381
+
382
+ # From the paper:
383
+ #
384
+ # takes as input a point t and does the following: if t \in Q and y(t) > y(best) then it assigns best = t
385
+ #
386
+ # Note that the paper identifies a node in the tree with its value. We need to grab the correct node.
387
+ update_highest = lambda do |node|
388
+ t = @data[node]
389
+ if in_q.call(t) && t.y > best.y
390
+ best = t
391
+ end
392
+ end
393
+
394
+ # "Input: a node p such that x(p) < x0"
395
+ #
396
+ # Step-by-step it is pretty straightforward. As the paper says
397
+ #
398
+ # [E]ither p moves one level down in the tree T or the bit L is set to 0. In addition, the point q either stays the same or it
399
+ # becomes a child of (the original) p.
400
+ check_left = lambda do
401
+ if leaf?(p)
402
+ left = false # Question: did p ever get checked as a potential winner?
403
+ elsif one_child?(p)
404
+ if x_range.cover? @data[left(p)].x
405
+ update_highest.call(left(p))
406
+ left = false # can't do y-better in the subtree
407
+ elsif @data[left(p)].x < x0
408
+ p = left(p)
409
+ else
410
+ q = left(p)
411
+ right = true
412
+ left = false
413
+ end
414
+ else
415
+ # p has two children
416
+ if @data[left(p)].x < x0
417
+ if @data[right(p)].x < x0
418
+ p = right(p)
419
+ elsif @data[right(p)].x <= x1
420
+ update_highest.call(right(p))
421
+ p = left(p)
422
+ else
423
+ # x(p_r) > x1, so q needs to take it
424
+ q = right(p)
425
+ p = left(p)
426
+ right = true
427
+ end
428
+ elsif @data[left(p)].x <= x1
429
+ update_highest.call(left(p))
430
+ left = false # we won't do better in T(p_l)
431
+ if @data[right(p)].x > x1
432
+ q = right(p)
433
+ right = true
434
+ else
435
+ update_highest.call(right(p))
436
+ end
437
+ else
438
+ q = left(p)
439
+ left = false
440
+ right = true
441
+ end
442
+ end
443
+ end
444
+
445
+ # Do "on the right" with q what check_left does on the left with p
446
+ #
447
+ # We know that x(q) > x1
448
+ #
449
+ # TODO: can we share logic between check_left and check_right? At first glance they are too different to parameterize but maybe
450
+ # the bones can be shared.
451
+ #
452
+ # We either push q further down the tree or make right = false. We might also make p a child of (original) q. We never change
453
+ # left from true to false
454
+ check_right = lambda do
455
+ if leaf?(q)
456
+ right = false
457
+ elsif one_child?(q)
458
+ if x_range.cover? @data[left(q)].x
459
+ update_highest.call(left(q))
460
+ right = false # can't do y-better in the subtree
461
+ elsif @data[left(q)].x < x0
462
+ p = left(q)
463
+ left = true
464
+ right = false
465
+ else
466
+ q = left(q)
467
+ end
468
+ else
469
+ # q has two children
470
+ if @data[left(q)].x < x0
471
+ left = true
472
+ if @data[right(q)].x < x0
473
+ p = right(q)
474
+ right = false
475
+ elsif @data[right(q)].x <= x1
476
+ update_highest.call(right(q))
477
+ p = left(q)
478
+ right = false
479
+ else
480
+ # x(q_r) > x1
481
+ p = left(q)
482
+ q = right(q)
483
+ # left = true
484
+ end
485
+ elsif @data[left(q)].x <= x1
486
+ update_highest.call(left(q))
487
+ if @data[right(q)].x > x1
488
+ q = right(q)
489
+ else
490
+ update_highest.call(right(q))
491
+ right = false
492
+ end
493
+ else
494
+ q = left(q)
495
+ end
496
+ end
497
+ end
498
+
499
+ root_val = @data[root]
500
+
501
+ # If the root value is in the region Q, the max-heap property on y means we can't do better
502
+ if x_range.cover? root_val.x
503
+ # If y(root) is large enough then the root is the winner because of the max heap property in y. And if it isn't large enough
504
+ # then no other point in the tree can be high enough either
505
+ left = right = false
506
+ return root_val.y >= y0 ? root_val : best # nothing below the root can do better, so stop here
507
+ end
508
+
509
+ if root_val.x < x0
510
+ p = root
511
+ left = true
512
+ right = false
513
+ else
514
+ q = root
515
+ left = false
516
+ right = true
517
+ end
518
+
519
+ val = ->(sym) { sym == :left ? p : q }
520
+
521
+ while left || right
522
+ set_i = []
523
+ set_i << :left if left
524
+ set_i << :right if right
525
+ z = set_i.min_by { |s| level(val.call(s)) }
526
+ if z == :left
527
+ check_left.call
528
+ else
529
+ check_right.call
530
+ end
531
+ end
532
+
533
+ best
534
+ end
535
+
536
+ ########################################
537
+ # Enumerate 3 sided
538
+
539
+ # Enumerate the points of P in the box bounded by x0, x1, and y0.
540
+ #
541
+ # Let Q = [x0, x1] X [y0, infty) be the "three-sided" box bounded by x0, x1, and y0, and let P be the set of points in the
542
+ # MaxPST. (Note that Q is empty if x1 < x0.) We find and enumerate all the points in Q \intersect P.
543
+ #
544
+ # If the calling code provides a block then we +yield+ each point to it. Otherwise we return a set containing all the points in
545
+ # the intersection.
546
+ #
547
+ # This method runs in O(m + log n) time and O(1) extra space, where m is the number of points found.
548
+ def enumerate_3_sided(x0, x1, y0)
549
+ # From the paper
550
+ #
551
+ # Given three real numbers x0, x1, and y0 define the three sided range Q = [x0, x1] X [y0, infty). Algorithm
552
+ # Enumerate3Sided(x0, x1,y0) returns all elements of Q \intersect P. The algorithm uses the same approach as algorithm
553
+ # Highest3Sided. Besides the two bits L and R it uses two additional bits L' and R'. Each of these four bits ... corresponds
554
+ # to a subtree of T rooted at the points p, p', q, and q', respectively; if the bit is equal to one, then the subtree may
555
+ # contain points that are in the query range Q.
556
+ #
557
+ # The following invariant will be maintained:
558
+ #
559
+ # - If L = 1 then x(p) < x0.
560
+ # - If L' = 1 then x0 <= x(p') <= x1.
561
+ # - If R = 1 then x(q) > x1.
562
+ # - If R' = 1 then x0 <= x(q') <= x1.
563
+ # - If L' = 1 and R' = 1 then x(p') <= x(q').
564
+ # - All points in Q \intersect P [other than those in the subtrees of the currently active search nodes] have been reported.
565
+ #
566
+ #
567
+ # My high-level understanding of the algorithm
568
+ # --------------------------------------------
569
+ #
570
+ # We need to find all elements of Q \intersect P, so it isn't enough, as it was in highest_3_sided, simply to keep track of p and
571
+ # q. We need to track four nodes, p, p', q', and q which are (with a little handwaving) respectively
572
+ #
573
+ # - the rightmost node to the left of Q' = [x0, x1] X [-infinity, infinity],
574
+ # - the leftmost node inside Q',
575
+ # - the rightmost node inside Q', and
576
+ # - the leftmost node to the right of Q'.
577
+ #
578
+ # Tracking these is enough. Subtrees of things to the left of p can't have anything in Q by the x-value properties of the PST,
579
+ # and likewise with things to the right of q.
580
+ #
581
+ # And we don't need to track any more nodes inside Q'. If we had r with p' <~ r <~ q' (where s <~ t represents "t is to the
582
+ # right of s"), then all of the subtree rooted at r lies inside Q', and we can visit all of its elements of Q \intersect P via
583
+ # the routine Explore(), which is what we do whenever we need to. The node r is thus exhausted, and we can forget about it.
584
+ #
585
+ # So the algorithm is actually quite simple. There is a large amount of code here because of the many cases that need to be
586
+ # handled at each update.
587
+ #
588
+ # If a block is given, yield each found point to it. Otherwise return all the found points in an enumerable (currently Set).
589
+ x_range = x0..x1
590
+ # Instead of using primes we use "_in"
591
+ left = left_in = right_in = right = false
592
+ p = p_in = q_in = q = nil
593
+
594
+ result = Set.new
595
+
596
+ report = lambda do |node|
597
+ if block_given?
598
+ yield @data[node]
599
+ else
600
+ result << @data[node]
601
+ end
602
+ end
603
+
604
+ # "reports all points in T_t whose y-coordinates are at least y0"
605
+ #
606
+ # We follow the logic from the min-max paper, leaving out the need to worry about the parity of the level and the min- or max-
607
+ # switching.
608
+ explore = lambda do |t|
609
+ current = t
610
+ state = 0
611
+ while current != t || state != 2
612
+ case state
613
+ when 0
614
+ # State 0: we have arrived at this node for the first time
615
+ # look at current and perhaps descend to left child
616
+ # The paper describes this algorithm as in-order, but isn't this pre-order?
617
+ if @data[current].y >= y0
618
+ report.call(current)
619
+ end
620
+ if !leaf?(current) && @data[left(current)].y >= y0
621
+ current = left(current)
622
+ else
623
+ state = 1
624
+ end
625
+ when 1
626
+ # State 1: we've already handled this node and its left subtree. Should we descend to the right subtree?
627
+ if two_children?(current) && @data[right(current)].y >= y0
628
+ current = right(current)
629
+ state = 0
630
+ else
631
+ state = 2
632
+ end
633
+ when 2
634
+ # State 2: we're done with this node and its subtrees. Go back up a level, having set state correctly for the logic at the
635
+ # parent node.
636
+ if left_child?(current)
637
+ state = 1
638
+ end
639
+ current = parent(current)
640
+ else
641
+ raise LogicError, "Explore(t) state is somehow #{state} rather than 0, 1, or 2."
642
+ end
643
+ end
644
+ end
645
+
646
+ # Helpers for the helpers
647
+ #
648
+ # Invariant: if q_in is active then p_in is active. In other words, if only one "inside" node is active then it is p_in.
649
+
650
+ # Mark p_in as inactive. Then, if q_in is active, it becomes p_in.
651
+ deactivate_p_in = lambda do
652
+ left_in = false
653
+ return unless right_in
654
+
655
+ p_in = q_in
656
+ left_in = true
657
+ right_in = false
658
+ end
659
+
660
+ # Add a new leftmost "in" point. This becomes p_in. We handle existing "inside" points appropriately
661
+ add_leftmost_inner_node = lambda do |node|
662
+ if left_in && right_in
663
+ # the old p_in is squeezed between node and q_in
664
+ explore.call(p_in)
665
+ elsif left_in
666
+ q_in = p_in
667
+ right_in = true
668
+ else
669
+ left_in = true
670
+ end
671
+ p_in = node
672
+ end
673
+
674
+ add_rightmost_inner_node = lambda do |node|
675
+ if left_in && right_in
676
+ # the old q_in is squeezed between p_in and node
677
+ explore.call(q_in)
678
+ q_in = node
679
+ elsif left_in
680
+ q_in = node
681
+ right_in = true
682
+ else
683
+ p_in = node
684
+ left_in = true
685
+ end
686
+ end
687
+
688
+ ########################################
689
+ # The four key helpers described in the paper
690
+
691
+ # Handle the next step of the subtree at p
692
+ #
693
+ # I need to go through this with paper, pencil, and some diagrams.
694
+ enumerate_left = lambda do
695
+ if leaf?(p)
696
+ left = false
697
+ return
698
+ end
699
+
700
+ if one_child?(p)
701
+ if x_range.cover? @data[left(p)].x
702
+ add_leftmost_inner_node.call(left(p))
703
+ left = false
704
+ elsif @data[left(p)].x < x0
705
+ p = left(p)
706
+ else
707
+ q = left(p)
708
+ right = true
709
+ left = false
710
+ end
711
+ return
712
+ end
713
+
714
+ # p has two children
715
+ if @data[left(p)].x < x0
716
+ if @data[right(p)].x < x0
717
+ p = right(p)
718
+ elsif @data[right(p)].x <= x1
719
+ add_leftmost_inner_node.call(right(p))
720
+ p = left(p)
721
+ else
722
+ q = right(p)
723
+ p = left(p)
724
+ right = true
725
+ end
726
+ elsif @data[left(p)].x <= x1
727
+ if @data[right(p)].x > x1
728
+ q = right(p)
729
+ p_in = left(p)
730
+ left = false
731
+ left_in = right = true
732
+ else
733
+ # p_l and p_r both lie inside [x0, x1]
734
+ add_leftmost_inner_node.call(right(p))
735
+ add_leftmost_inner_node.call(left(p))
736
+ left = false
737
+ end
738
+ else
739
+ q = left(p)
740
+ left = false
741
+ right = true
742
+ end
743
+ end
744
+
745
+ # Given: p' satisfies x0 <= x(p') <= x1. (Our p_in is the paper's p')
746
+ enumerate_left_in = lambda do
747
+ if @data[p_in].y >= y0
748
+ report.call(p_in)
749
+ end
750
+
751
+ if leaf?(p_in) # nothing more to do
752
+ deactivate_p_in.call
753
+ return
754
+ end
755
+
756
+ left_val = @data[left(p_in)]
757
+ if one_child?(p_in)
758
+ if x_range.cover? left_val.x
759
+ p_in = left(p_in)
760
+ elsif left_val.x < x0
761
+ # We aren't in the [x0, x1] zone any more and have moved out to the left
762
+ p = left(p_in)
763
+ deactivate_p_in.call
764
+ left = true
765
+ else
766
+ # similar, but we've moved out to the right. Note that left(p_in) is the leftmost node to the right of Q.
767
+ raise LogicError, 'q_in should not be active (by the val of left(p_in))' if right_in
768
+
769
+ q = left(p_in)
770
+ deactivate_p_in.call
771
+ right = true
772
+ end
773
+ else
774
+ # p' has two children
775
+ right_val = @data[right(p_in)]
776
+ if left_val.x < x0
777
+ if right_val.x < x0
778
+ p = right(p_in)
779
+ left = true
780
+ deactivate_p_in.call
781
+ elsif right_val.x <= x1
782
+ p = left(p_in)
783
+ p_in = right(p_in)
784
+ left = true
785
+ else
786
+ raise LogicError, 'q_in cannot be active, by the value in the right child of p_in!' if right_in
787
+
788
+ p = left(p_in)
789
+ q = right(p_in)
790
+ deactivate_p_in.call
791
+ left = true
792
+ right = true
793
+ end
794
+ elsif left_val.x <= x1
795
+ if right_val.x > x1
796
+ raise LogicError, 'q_in cannot be active, by the value in the right child of p_in!' if right_in
797
+
798
+ q = right(p_in)
799
+ p_in = left(p_in)
800
+ right = true
801
+ elsif right_in
802
+ explore.call(right(p_in))
803
+ p_in = left(p_in)
804
+ else
805
+ q_in = right(p_in)
806
+ p_in = left(p_in)
807
+ right_in = true
808
+ end
809
+ else
810
+ raise LogicError, 'q_in cannot be active, by the value in the right child of p_in!' if right_in
811
+
812
+ q = left(p_in)
813
+ deactivate_p_in.call
814
+ right = true
815
+ end
816
+ end
817
+ end
818
+
819
+ # This is "just like" enumerate left, but handles q instead of p.
820
+ #
821
+ # The paper doesn't give an implementation, but it should be pretty symmetric. Can we share any logic with enumerate_left?
822
+ #
823
+ # Q: why is my implementation more complicated than enumerate_left? I must be missing something.
824
+ enumerate_right = lambda do
825
+ if leaf?(q)
826
+ right = false
827
+ return
828
+ end
829
+
830
+ if one_child?(q)
831
+ if x_range.cover? @data[left(q)].x
832
+ add_rightmost_inner_node.call(left(q))
833
+ right = false
834
+ elsif @data[left(q)].x < x0
835
+ p = left(q)
836
+ left = true
837
+ right = false
838
+ else
839
+ q = left(q)
840
+ end
841
+ return
842
+ end
843
+
844
+ # q has two children. Cases!
845
+ if @data[left(q)].x < x0
846
+ raise LogicError, 'p_in should not be active, based on the value at left(q)' if left_in
847
+ raise LogicError, 'q_in should not be active, based on the value at left(q)' if right_in
848
+
849
+ left = true
850
+ if @data[right(q)].x < x0
851
+ p = right(q)
852
+ right = false
853
+ elsif @data[right(q)].x <= x1
854
+ p_in = right(q)
855
+ p = left(q)
856
+ left_in = true
857
+ right = false
858
+ else
859
+ p = left(q)
860
+ q = right(q)
861
+ end
862
+ elsif @data[left(q)].x <= x1
863
+ add_rightmost_inner_node.call(left(q))
864
+ if @data[right(q)].x > x1
865
+ q = right(q)
866
+ else
867
+ add_rightmost_inner_node.call(right(q))
868
+ right = false
869
+ end
870
+ else
871
+ # x(q_l) > x1
872
+ q = left(q)
873
+ end
874
+ end
875
+
876
+ # Given: q' is active and satisfies x0 <= x(q') <= x1
877
+ enumerate_right_in = lambda do
878
+ raise LogicError, 'right_in should be true if we call enumerate_right_in' unless right_in
879
+
880
+ if @data[q_in].y >= y0
881
+ report.call(q_in)
882
+ end
883
+
884
+ if leaf?(q_in)
885
+ right_in = false
886
+ return
887
+ end
888
+
889
+ left_val = @data[left(q_in)]
890
+ if one_child?(q_in)
891
+ if x_range.cover? left_val.x
892
+ q_in = left(q_in)
893
+ elsif left_val.x < x0
894
+ # We have moved out to the left
895
+ p = left(q_in)
896
+ right_in = false
897
+ left = true
898
+ else
899
+ # We have moved out to the right
900
+ q = left(q_in)
901
+ right_in = false
902
+ right = true
903
+ end
904
+ return
905
+ end
906
+
907
+ # q' has two children
908
+ right_val = @data[right(q_in)]
909
+ if left_val.x < x0
910
+ raise LogicError, 'p_in cannot be active, by the value in the left child of q_in' if left_in
911
+
912
+ if right_val.x < x0
913
+ p = right(q_in)
914
+ elsif right_val.x <= x1
915
+ p = left(q_in)
916
+ p_in = right(q_in) # should this be q_in = right(q_in) ??
917
+ left_in = true
918
+ else
919
+ p = left(q_in)
920
+ q = right(q_in)
921
+ right = true
922
+ end
923
+ right_in = false
924
+ left = true
925
+ elsif left_val.x <= x1
926
+ if right_val.x > x1
927
+ q = right(q_in)
928
+ right = true
929
+ if left_in
930
+ q_in = left(q_in)
931
+ else
932
+ p_in = left(q_in)
933
+ left_in = true
934
+ right_in = false
935
+ end
936
+ else
937
+ if left_in
938
+ explore.call(left(q_in))
939
+ else
940
+ p_in = left(q_in)
941
+ left_in = true
942
+ end
943
+ q_in = right(q_in)
944
+ end
945
+ else
946
+ q = left(q_in)
947
+ right_in = false
948
+ right = true
949
+ end
950
+ end
951
+
952
+ val = ->(sym) { { left: p, left_in: p_in, right_in: q_in, right: q }[sym] }
953
+
954
+ root_val = @data[root]
955
+ if root_val.y < y0
956
+ # no hope, no op
957
+ elsif root_val.x < x0
958
+ p = root
959
+ left = true
960
+ elsif root_val.x <= x1 # Possible bug in paper, which tests "< x1"
961
+ p_in = root
962
+ left_in = true
963
+ else
964
+ q = root
965
+ right = true
966
+ end
967
+
968
+ while left || left_in || right_in || right
969
+ # byebug if $do_it
970
+ raise LogicError, 'It should not be that q_in is active but p_in is not' if right_in && !left_in
971
+
972
+ set_i = []
973
+ set_i << :left if left
974
+ set_i << :left_in if left_in
975
+ set_i << :right_in if right_in
976
+ set_i << :right if right
977
+ z = set_i.min_by { |sym| level(val.call(sym)) }
978
+ case z
979
+ when :left
980
+ enumerate_left.call
981
+ when :left_in
982
+ enumerate_left_in.call
983
+ when :right_in
984
+ enumerate_right_in.call
985
+ when :right
986
+ enumerate_right.call
987
+ else
988
+ raise LogicError, "bad symbol #{z}"
989
+ end
990
+ end
991
+ return result unless block_given?
992
+ end
993
+
994
+ ########################################
995
+ # Build the initial structure
996
+
997
+ private def construct_pst
998
+ # We follow the algorithm in the paper by De, Maheshwari et al. Note that indexing is from 1 there. For now we pretend that that
999
+ # is the case here, too.
1000
+
1001
+ @data.unshift nil
1002
+
1003
+ h = Math.log2(@size).floor
1004
+ a = @size - (2**h - 1) # the paper calls it A
1005
+ sort_subarray(1, @size)
1006
+
1007
+ @last_non_leaf = @size / 2
1008
+ if @size.even?
1009
+ @parent_of_one_child = @last_non_leaf
1010
+ @last_parent_of_two_children = @parent_of_one_child - 1
1011
+ else
1012
+ @parent_of_one_child = nil
1013
+ @last_parent_of_two_children = @last_non_leaf
1014
+ end
1015
+
1016
+ (0...h).each do |i|
1017
+ pow_of_2 = 2**i
1018
+ k = a / (2**(h - i))
1019
+ k1 = 2**(h + 1 - i) - 1
1020
+ k2 = (1 - k) * 2**(h - i) - 1 + a
1021
+ k3 = 2**(h - i) - 1
1022
+ (1..k).each do |j|
1023
+ l = index_with_largest_y_in(
1024
+ pow_of_2 + (j - 1) * k1, pow_of_2 + j * k1 - 1
1025
+ )
1026
+ swap(l, pow_of_2 + j - 1)
1027
+ end
1028
+
1029
+ if k < pow_of_2
1030
+ l = index_with_largest_y_in(
1031
+ pow_of_2 + k * k1, pow_of_2 + k * k1 + k2 - 1
1032
+ )
1033
+ swap(l, pow_of_2 + k)
1034
+
1035
+ m = pow_of_2 + k * k1 + k2
1036
+ (1..(pow_of_2 - k - 1)).each do |j|
1037
+ l = index_with_largest_y_in(
1038
+ m + (j - 1) * k3, m + j * k3 - 1
1039
+ )
1040
+ swap(l, pow_of_2 + k + j)
1041
+ end
1042
+ end
1043
+ sort_subarray(2 * pow_of_2, @size)
1044
+ end
1045
+ end
1046
+
1047
+ ########################################
1048
+ # Tree arithmetic
1049
+
1050
+ # First element and root of the tree structure
1051
+ private def root
1052
+ 1
1053
+ end
1054
+
1055
+ # Indexing is from 1
1056
+ private def parent(i)
1057
+ i >> 1
1058
+ end
1059
+
1060
+ private def left(i)
1061
+ i << 1
1062
+ end
1063
+
1064
+ private def right(i)
1065
+ 1 + (i << 1)
1066
+ end
1067
+
1068
+ private def level(i)
1069
+ l = 0
1070
+ while i > root
1071
+ i >>= 1
1072
+ l += 1
1073
+ end
1074
+ l
1075
+ end
1076
+
1077
+ # i has no children
1078
+ private def leaf?(i)
1079
+ i > @last_non_leaf
1080
+ end
1081
+
1082
+ # i has exactly one child (the left)
1083
+ private def one_child?(i)
1084
+ i == @parent_of_one_child
1085
+ end
1086
+
1087
+ # i has two children
1088
+ private def two_children?(i)
1089
+ i <= @last_parent_of_two_children
1090
+ end
1091
+
1092
+ # i is the left child of its parent.
1093
+ private def left_child?(i)
1094
+ (i & 1).zero?
1095
+ end
1096
+
1097
+ private def swap(index1, index2)
1098
+ return if index1 == index2
1099
+
1100
+ @data[index1], @data[index2] = @data[index2], @data[index1]
1101
+ end
1102
+
1103
+ # The index in @data[l..r] having the largest value for y
1104
+ private def index_with_largest_y_in(l, r)
1105
+ return nil if r < l
1106
+
1107
+ (l..r).max_by { |idx| @data[idx].y }
1108
+ end
1109
+
1110
+ # Sort the subarray @data[l..r]. This is much faster than a Ruby-layer heapsort because it is mostly happening in C.
1111
+ private def sort_subarray(l, r)
1112
+ # heapsort_subarray(l, r)
1113
+ return if l == r # 1-array already sorted!
1114
+
1115
+ #l -= 1
1116
+ #r -= 1
1117
+ @data[l..r] = @data[l..r].sort_by(&:x)
1118
+ end
1119
+
1120
+ ########################################
1121
+ # Debugging support
1122
+ #
1123
+ # These methods are not written for speed
1124
+
1125
+ # Check that our data satisfies the requirements of a Priority Search Tree:
1126
+ # - max-heap in y
1127
+ # - all the x values in the left subtree are less than all the x values in the right subtree
1128
+ private def verify_properties
1129
+ # It's a max-heap in y
1130
+ (2..@size).each do |node|
1131
+ raise LogicError, "Heap property violated at child #{node}" unless @data[node].y < @data[parent(node)].y
1132
+ end
1133
+
1134
+ # Left subtree has x values less than all of the right subtree
1135
+ (1..@size).each do |node|
1136
+ next if right(node) > @size # indices run from 1 to @size, so @size itself is a valid node
1137
+
1138
+ left_max = max_x_in_subtree(left(node))
1139
+ right_min = min_x_in_subtree(right(node))
1140
+
1141
+ raise LogicError, "Left-right property of x-values violated at #{node}" unless left_max < right_min
1142
+ end
1143
+ end
1144
+
1145
+ private def max_x_in_subtree(root)
1146
+ return -Float::INFINITY if root > @size # indices run from 1 to @size
1147
+
1148
+ [@data[root].x, max_x_in_subtree(left(root)), max_x_in_subtree(right(root))].max
1149
+ end
1150
+
1151
+ private def min_x_in_subtree(root)
1152
+ return Float::INFINITY if root > @size # indices run from 1 to @size
1153
+
1154
+ [@data[root].x, min_x_in_subtree(left(root)), min_x_in_subtree(right(root))].min
1155
+ end
1156
+ end
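
The tree arithmetic and the two invariants checked by verify_properties can be tried outside the class. What follows is a minimal standalone sketch, not the gem's code: `Point`, `parent_of`, `left_of`, `right_of`, `level_of`, `subtree_indices`, and `pst_valid?` are hypothetical names introduced here for illustration. It assumes the same layout as above: nodes live at indices 1..size, index 0 is an unused placeholder, and a node is missing exactly when its index exceeds size.

```ruby
# Hypothetical stand-in for the point type; only .x and .y are needed.
Point = Struct.new(:x, :y)

# 1-based implicit binary tree arithmetic.
def parent_of(i)
  i >> 1
end

def left_of(i)
  i << 1
end

def right_of(i)
  (i << 1) + 1
end

# Depth of node i; the root (index 1) is at level 0.
def level_of(i)
  i.bit_length - 1
end

# All indices in the subtree rooted at i, for a tree with nodes 1..size.
def subtree_indices(i, size)
  return [] if i > size

  [i] + subtree_indices(left_of(i), size) + subtree_indices(right_of(i), size)
end

# Brute-force check of the two PST properties on data[1..size]:
# - max-heap in y
# - every x in a left subtree is less than every x in the sibling right subtree
def pst_valid?(data, size)
  heap_ok = (2..size).all? { |i| data[i].y < data[parent_of(i)].y }
  split_ok = (1..size).all? do |i|
    next true if right_of(i) > size # fewer than two children: nothing to compare

    left_max = subtree_indices(left_of(i), size).map { |j| data[j].x }.max
    right_min = subtree_indices(right_of(i), size).map { |j| data[j].x }.min
    left_max < right_min
  end
  heap_ok && split_ok
end

# A hand-built 5-node tree; index 0 is the placeholder.
tree = [nil, Point.new(3, 9), Point.new(1, 7), Point.new(5, 8), Point.new(0, 2), Point.new(2, 4)]
puts pst_valid?(tree, 5)
```

Swapping the two leaves under node 2 keeps the heap property but breaks the left/right ordering in x, so `pst_valid?` should then return false. The check is quadratic in the worst case and is meant only for small test inputs.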