data_structures_rmolinari 0.1.0

@@ -0,0 +1,1156 @@
1
+ require 'set'
2
+
3
+ require_relative 'shared'
4
+
5
+ class LogicError < StandardError; end
6
+
7
+ # A priority search tree (PST) stores a set, P, of two-dimensional points (x,y) in a way that allows efficient answers to certain
8
+ # questions about P.
9
+ #
10
+ # (In the current implementation no two points can share an x-value and no two points can share a y-value. This (rather severe)
11
+ # restriction can be relaxed with some more complicated code.)
12
+ #
13
+ # The data structure was introduced in 1985 by Edward McCreight. Later, De, Maheshwari, Nandy, and Smid showed how to construct a
14
+ # PST in-place (using only O(1) extra memory), at the expense of some slightly more complicated code for the various supported
15
+ # operations. It is their approach that we have implemented.
16
+ #
17
+ # The PST structure is an implicit, balanced binary tree with the following properties:
18
+ # * The tree is a _max-heap_ in the y coordinate. That is, the point at each node has a y-value less than that of its parent.
19
+ # * For each node p, the x-values of all the nodes in the left subtree of p are less than the x-values of all the nodes in the right
20
+ # subtree of p. Note that this says nothing about the x-value at the node p itself. The tree is thus _almost_ a binary search tree
21
+ # in the x coordinate.
22
+ #
23
+ # Given a set of n points, we can answer the following questions quickly:
24
+ #
25
+ # - +leftmost_ne+: for x0 and y0, what is the leftmost point (x, y) in P satisfying x >= x0 and y >= y0?
26
+ # - +rightmost_nw+: for x0 and y0, what is the rightmost point (x, y) in P satisfying x <= x0 and y >= y0?
27
+ # - +highest_ne+: for x0 and y0, what is the highest point (x, y) in P satisfying x >= x0 and y >= y0?
28
+ # - +highest_nw+: for x0 and y0, what is the highest point (x, y) in P satisfying x <= x0 and y >= y0?
29
+ # - +highest_3_sided+: for x0, x1, and y0, what is the highest point (x, y) in P satisfying x >= x0, x <= x1 and y >= y0?
30
+ # - +enumerate_3_sided+: for x0, x1, and y0, enumerate all points in P satisfying x >= x0, x <= x1 and y >= y0.
31
+ #
32
+ # (Here, "leftmost/rightmost" means "minimal/maximal x", and "highest" means "maximal y".)
33
+ #
34
+ # The first 5 operations take O(log n) time.
35
+ #
36
+ # The final operation (enumerate) takes O(m + log n) time, where m is the number of points that are enumerated.
37
+ #
38
+ # There is a related data structure called the Min-max priority search tree so we have called this a "Max priority search tree", or
39
+ # MaxPST.
40
+ #
41
+ # References:
42
+ # * E.M. McCreight, _Priority search trees_, SIAM J. Comput., 14(2):257-276, 1985.
43
+ # * M. De, A. Maheshwari, S. C. Nandy, M. Smid, _An In-Place Priority Search Tree_, 23rd Canadian Conference on Computational
44
+ # Geometry, 2011
45
+ class MaxPrioritySearchTreeInternal
46
+ include Shared
47
+
48
+ # Construct a MaxPST from the collection of points in +data+.
49
+ #
50
+ # @param data [Array] the set P of points presented as an array. The tree is built in the array in-place without cloning. Each
51
+ # element of the array must respond to +#x+ and +#y+ (though this is not currently checked).
52
+ #
53
+ # @param verify [Boolean] when truthy, check that the properties of a PST are satisfied after construction, raising an exception
54
+ # if not.
55
+ def initialize(data, verify: false)
56
+ @data = data
57
+ @size = @data.size
58
+
59
+ construct_pst
60
+ return unless verify
61
+
62
+ verify_properties
63
+ end
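The two PST properties that `verify:` is meant to check can be tested directly on a small example. The following is a minimal sketch, not the gem's `verify_properties` (which is not shown in this diff). It assumes a 1-based implicit layout with the root at index 1 and the children of node i at 2i and 2i + 1. `Point`, `pst_valid?`, and the unused slot at index 0 are hypothetical names chosen for illustration.

```ruby
Point = Struct.new(:x, :y)

# Hypothetical checker. +tree+ is an array whose slot 0 is unused (nil), so the
# root sits at index 1 and node i has children 2i and 2i + 1 (assumed layout).
def pst_valid?(tree)
  n = tree.size - 1

  # Indices of every node in the subtree rooted at i (empty if i is out of range).
  subtree = lambda do |i|
    return [] if i > n

    [i] + subtree.call(2 * i) + subtree.call(2 * i + 1)
  end

  (1..n).all? do |i|
    children = [2 * i, 2 * i + 1].select { |c| c <= n }

    # Max-heap in y: each child is strictly lower than its parent.
    heap_ok = children.all? { |c| tree[c].y < tree[i].y }

    # Everything in the left subtree has a smaller x than everything in the right subtree.
    left_xs = subtree.call(2 * i).map { |j| tree[j].x }
    right_xs = subtree.call(2 * i + 1).map { |j| tree[j].x }
    x_ok = left_xs.empty? || right_xs.empty? || left_xs.max < right_xs.min

    heap_ok && x_ok
  end
end

sample = [nil, [4, 9], [2, 7], [6, 8], [1, 3], [3, 5], [5, 2]].map { |xy| xy && Point.new(*xy) }
puts pst_valid?(sample) # prints true
```

This check is quadratic in the worst case because it recomputes subtrees at every node. That is acceptable for a test helper but not for production use.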
64
+
65
+ ########################################
66
+ # Highest NE and Highest NW
67
+
68
+ # Return the highest point in P to the "northeast" of (x0, y0).
69
+ #
70
+ # Let Q = [x0, infty) X [y0, infty) be the northeast quadrant defined by the point (x0, y0) and let P be the points in this data
71
+ # structure. Define p* as
72
+ #
73
+ # - (infty, -infty) if Q \intersect P is empty and
74
+ # - the highest (max-y) point in Q \intersect P otherwise.
75
+ #
76
+ # This method returns p* in O(log n) time and O(1) extra space.
77
+ def highest_ne(x0, y0)
78
+ highest_in_quadrant(x0, y0, :ne)
79
+ end
80
+
81
+ # Return the highest point in P to the "northwest" of (x0, y0).
82
+ #
83
+ # Let Q = (-infty, x0] X [y0, infty) be the northwest quadrant defined by the point (x0, y0) and let P be the points in this data
84
+ # structure. Define p* as
85
+ #
86
+ # - (-infty, -infty) if Q \intersect P is empty and
87
+ # - the highest (max-y) point in Q \intersect P otherwise.
88
+ #
89
+ # This method returns p* in O(log n) time and O(1) extra space.
90
+ def highest_nw(x0, y0)
91
+ highest_in_quadrant(x0, y0, :nw)
92
+ end
93
+
94
+ # The basic algorithm is from De et al. section 3.1. We have generalized it slightly to allow it to calculate both highest_ne and
95
+ # highest_nw
96
+ #
97
+ # Note that highest_ne(x0, y0) = highest_3_sided(x0, infinity, y0) so we don't really need this. But it's a bit faster than the
98
+ # general case and is a simple algorithm that introduces a typical way that an algorithm interacts with the data structure.
99
+ #
100
+ # From the paper:
101
+ #
102
+ # The algorithm uses two variables best and p, which satisfy the following invariant
103
+ #
104
+ # - If Q intersect P is nonempty then p* in {best} union T_p
105
+ # - If Q intersect P is empty then p* = best
106
+ #
107
+ # Here, P is the set of points in our data structure and T_p is the subtree rooted at p
108
+ private def highest_in_quadrant(x0, y0, quadrant)
109
+ quadrant.must_be_in [:ne, :nw]
110
+
111
+ p = root
112
+ if quadrant == :ne
113
+ best = Pair.new(INFINITY, -INFINITY)
114
+ preferred_child = ->(n) { right(n) }
115
+ nonpreferred_child = ->(n) { left(n) }
116
+ sufficient_x = ->(x) { x >= x0 }
117
+ else
118
+ best = Pair.new(-INFINITY, -INFINITY)
119
+ preferred_child = ->(n) { left(n) }
120
+ nonpreferred_child = ->(n) { right(n) }
121
+ sufficient_x = ->(x) { x <= x0 }
122
+ end
123
+
124
+ # True when x == x0 or x is not sufficient. This test sometimes excludes the other child of a node from consideration.
125
+ exclusionary_x = ->(x) { x == x0 || !sufficient_x.call(x) }
126
+
127
+ in_q = lambda do |pair|
128
+ sufficient_x.call(pair.x) && pair.y >= y0
129
+ end
130
+
131
+ # From the paper:
132
+ #
133
+ # takes as input a point t and does the following: if t \in Q and y(t) > y(best) then it assigns best = t
134
+ #
135
+ # Note that the paper identifies a node in the tree with its value. We need to grab the correct node.
136
+ update_highest = lambda do |node|
137
+ t = @data[node]
138
+ if in_q.call(t) && t.y > best.y
139
+ best = t
140
+ end
141
+ end
142
+
143
+ # We could make this code more efficient. But since we only have O(log n) steps we won't actually gain much so let's keep it
144
+ # readable and close to the paper's pseudocode for now.
145
+ until leaf?(p)
146
+ p_val = @data[p]
147
+ if in_q.call(p_val)
148
+ # p \in Q and nothing in its subtree can beat it because of the max-heap
149
+ update_highest.call(p)
150
+ return best
151
+ elsif p_val.y < y0
152
+ # p is too low for Q, so the entire subtree is too low as well
153
+ return best
154
+ elsif one_child?(p)
155
+ # With just one child we need to check it
156
+ p = left(p)
157
+ elsif exclusionary_x.call(@data[preferred_child.call(p)].x)
158
+ # The preferred child (right(p) for :ne) might be in Q, but nothing in the other subtree can be, by the PST property on x.
159
+ p = preferred_child.call(p)
160
+ elsif sufficient_x.call(@data[nonpreferred_child.call(p)].x)
161
+ # Both children have sufficient x, so try the y-higher of them. Note that nothing else in either subtree will beat this one,
162
+ # by the y-property of the PST
163
+ higher = left(p)
164
+ if @data[right(p)].y > @data[left(p)].y
165
+ higher = right(p)
166
+ end
167
+ p = higher
168
+ elsif @data[preferred_child.call(p)].y < y0
169
+ # Nothing in the preferred subtree is in Q, but maybe we'll find something in the other one
170
+ p = nonpreferred_child.call(p)
171
+ else
172
+ # At this point we know that the preferred child is in Q so we need to check it. Nothing in its subtree can beat it so we don't need to
173
+ # look there. But there might be something better in the other subtree.
174
+ update_highest.call(preferred_child.call(p))
175
+ p = nonpreferred_child.call(p)
176
+ end
177
+ end
178
+ update_highest.call(p) # try the leaf
179
+ best
180
+ end
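The two heap-based pruning rules used above (stop at a node that is in Q, and abandon a subtree whose root lies below y0) can be seen in isolation in a simplified search. This sketch is not the O(log n) algorithm from De et al.: it ignores the x-ordering of subtrees and simply recurses into both children, and it returns nil rather than a sentinel pair when nothing is found. The 1-based array layout, `Point`, and `highest_ne_pruned` are assumptions made for illustration.

```ruby
Point = Struct.new(:x, :y)

# Assumed layout: tree[0] is unused, the root is at index 1, and node i has
# children 2i and 2i + 1. Returns the highest point with x >= x0 and y >= y0,
# or nil if there is none.
def highest_ne_pruned(tree, x0, y0, i = 1)
  return nil if i >= tree.size

  node = tree[i]
  return nil if node.y < y0   # max-heap in y: the whole subtree is too low
  return node if node.x >= x0 # node is in Q and nothing below it is higher

  # The node is high enough but too far left, so look below it on both sides.
  [highest_ne_pruned(tree, x0, y0, 2 * i),
   highest_ne_pruned(tree, x0, y0, 2 * i + 1)].compact.max_by(&:y)
end

tree = [nil, [4, 9], [2, 7], [6, 8], [1, 3], [3, 5], [5, 2]].map { |xy| xy && Point.new(*xy) }
p highest_ne_pruned(tree, 5, 1) # the point (6, 8)
```

The real method avoids the double recursion by using the x-ordering of the left and right subtrees to pick a single child at each level.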
181
+
182
+ ########################################
183
+ # Leftmost NE and Rightmost NW
184
+
185
+ # Return the leftmost (min-x) point in P to the northeast of (x0, y0).
186
+ #
187
+ # Let Q = [x0, infty) X [y0, infty) be the northeast quadrant defined by the point (x0, y0) and let P be the points in this data
188
+ # structure. Define p* as
189
+ #
190
+ # - (infty, infty) if Q \intersect P is empty and
191
+ # - the leftmost (min-x) point in Q \intersect P otherwise.
192
+ #
193
+ # This method returns p* in O(log n) time and O(1) extra space.
194
+ def leftmost_ne(x0, y0)
195
+ extremal_in_x_dimension(x0, y0, :ne)
196
+ end
197
+
198
+ # Return the rightmost (max-x) point in P to the northwest of (x0, y0).
199
+ #
200
+ # Let Q = (-infty, x0] X [y0, infty) be the northwest quadrant defined by the point (x0, y0) and let P be the points in this data
201
+ # structure. Define p* as
202
+ #
203
+ # - (-infty, infty) if Q \intersect P is empty and
204
+ # - the rightmost (max-x) point in Q \intersect P otherwise.
205
+ #
206
+ # This method returns p* in O(log n) time and O(1) extra space.
207
+ def rightmost_nw(x0, y0)
208
+ extremal_in_x_dimension(x0, y0, :nw)
209
+ end
210
+
211
+ # A genericized version of the paper's leftmost_ne that can calculate either leftmost_ne or rightmost_nw as specified via a
212
+ # parameter.
213
+ #
214
+ # Quadrant is either :ne (which gives leftmost_ne) or :nw (which gives rightmost_nw).
215
+ #
216
+ # From De et al:
217
+ #
218
+ # The algorithm uses three variables best, p, and q which satisfy the following invariant:
219
+ #
220
+ # - if Q \intersect P is empty then p* = best
221
+ # - if Q \intersect P is nonempty then p* \in {best} \union T(p) \union T(q)
222
+ # - p and q are at the same level of T and x(p) <= x(q)
223
+ private def extremal_in_x_dimension(x0, y0, quadrant)
224
+ quadrant.must_be_in [:ne, :nw]
225
+
226
+ if quadrant == :ne
227
+ sign = 1
228
+ best = Pair.new(INFINITY, INFINITY)
229
+ else
230
+ sign = -1
231
+ best = Pair.new(-INFINITY, INFINITY)
232
+ end
233
+
234
+ p = q = root
235
+
236
+ in_q = lambda do |pair|
237
+ sign * pair.x >= sign * x0 && pair.y >= y0
238
+ end
239
+
240
+ # From the paper:
241
+ #
242
+ # takes as input a point t and does the following: if t \in Q and x(t) < x(best) then it assigns best = t
243
+ #
244
+ # Note that the paper identifies a node in the tree with its value. We need to grab the correct node.
245
+ update_leftmost = lambda do |node|
246
+ t = @data[node]
247
+ if in_q.call(t) && sign * t.x < sign * best.x
248
+ best = t
249
+ end
250
+ end
251
+
252
+ # Use the approach described in the Min-Max paper, p 316
253
+ #
254
+ # In the paper c = [c1, c2, ..., ck] is an array of four nodes, [left(p), right(p), left(q), right(q)], but we also use this
255
+ # logic when q has only a left child.
256
+ #
257
+ # Idea: x(c1) < x(c2) < ..., so the key thing to know for the next step is where x0 fits in.
258
+ #
259
+ # - If x0 <= x(c1) then all subtrees have large enough x values and we look for the leftmost node in c with a large enough y
260
+ # value. Both p and q are sent into that subtree.
261
+ # - If x0 >= x(ck) then the rightmost subtree is our only hope.
262
+ # - Otherwise, x(c1) < x0 < x(ck) and we let i be least so that x(ci) <= x0 < x(c(i+1)). Then q becomes the leftmost cj in c not
263
+ # to the left of ci such that y(cj) >= y0, if any. p becomes ci if y(ci) >= y0 and q otherwise. If there is no such j, we put
264
+ # q = p. This may leave both of p, q undefined which means there is no useful way forward and we return nils to signal this to
265
+ # calling code.
266
+ #
267
+ # The same logic applies to rightmost_nw, though everything is "backwards"
268
+ # - membership of Q depends on having a small-enough value of x, rather than a large-enough one
269
+ # - among the ci, values towards the end of the array tend not to be in Q while values towards the start of the array tend to be
270
+ # in Q
271
+ #
272
+ # Idea: handle the first issue by negating all x-values being compared and handle the second by reversing the array c before
273
+ # doing anything and swapping the values for p and q that we work out.
274
+ determine_next_nodes = lambda do |*c|
275
+ c.reverse! if quadrant == :nw
276
+
277
+ if sign * @data[c.first].x > sign * x0
278
+ # All subtrees have x-values good enough for Q. We look at y-values to work out which subtree to focus on
279
+ leftmost = c.find { |node| @data[node].y >= y0 } # might be nil
280
+
281
+ # Explore the "leftmost" subtree with large enough y values. Its root is in Q and can't be beaten as "leftmost"
282
+ # by anything to its "right". If it's nil the calling code can bail
283
+ return [leftmost, leftmost]
284
+ end
285
+
286
+ if sign * @data[c.last].x <= sign * x0
287
+ # only the "rightmost" subtree can possibly have anything in Q, assuming distinct x-values
288
+ return [c.last, c.last]
289
+ end
290
+
291
+ values = c.map { |node| @data[node] }
292
+
293
+ # Note that x(c1) <= x0 < x(ck), where ck is the last node in c, so i is well-defined
294
+ i = (0...4).find { |j| sign * values[j].x <= sign * x0 && sign * x0 < sign * values[j + 1].x }
295
+
296
+ # These nodes all have large-enough x values so looking at y finds the ones in Q
297
+ new_q = c[(i + 1)..].find { |node| @data[node].y >= y0 } # could be nil
298
+ new_p = c[i] if values[i].y >= y0 # The leftmost subtree is worth exploring if the y-value is big enough but not otherwise
299
+ new_p ||= new_q # if c[i] is no good, send p along with q
300
+ new_q ||= new_p # but if there is no worthwhile value for q we should send it along with p
301
+
302
+ return [new_q, new_p] if quadrant == :nw # swap for the rightmost_nw case.
303
+
304
+ [new_p, new_q]
305
+ end
306
+
307
+ until leaf?(p)
308
+ update_leftmost.call(p)
309
+ update_leftmost.call(q)
310
+
311
+ if p == q
312
+ if one_child?(p)
313
+ p = q = left(p)
314
+ else
315
+ q = right(p)
316
+ p = left(p)
317
+ end
318
+ else
319
+ # p != q
320
+ if leaf?(q)
321
+ q = p # p itself is just one layer above the leaves, or is itself a leaf
322
+ elsif one_child?(q)
323
+ # This generic approach is not as fast as the bespoke checks described in the paper. But it is easier to maintain the code
324
+ # this way and allows easy implementation of rightmost_nw
325
+ p, q = determine_next_nodes.call(left(p), right(p), left(q))
326
+ else
327
+ p, q = determine_next_nodes.call(left(p), right(p), left(q), right(q))
328
+ end
329
+ break unless p # we've run out of useful nodes
330
+ end
331
+ end
332
+ update_leftmost.call(p) if p
333
+ update_leftmost.call(q) if q
334
+ best
335
+ end
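The `sign` trick used above amounts to reflecting the point set in the y-axis: the rightmost point to the northwest of (x0, y0) is the mirror image of the leftmost point to the northeast of (-x0, y0) in the mirrored set. A brute-force sketch of that equivalence follows. It runs in O(n) and returns nil instead of a sentinel pair. `Point`, `leftmost_ne_brute`, and `rightmost_nw_mirrored` are hypothetical names, not part of the gem.

```ruby
Point = Struct.new(:x, :y)

# Brute force: the min-x point with x >= x0 and y >= y0, or nil if there is none.
def leftmost_ne_brute(points, x0, y0)
  points.select { |pt| pt.x >= x0 && pt.y >= y0 }.min_by(&:x)
end

# rightmost_nw obtained by negating every x-value, asking the leftmost_ne
# question at (-x0, y0), and negating the x-value of the answer again.
def rightmost_nw_mirrored(points, x0, y0)
  mirrored = points.map { |pt| Point.new(-pt.x, pt.y) }
  hit = leftmost_ne_brute(mirrored, -x0, y0)
  hit && Point.new(-hit.x, hit.y)
end

points = [[1, 5], [2, 9], [4, 3], [6, 7]].map { |x, y| Point.new(x, y) }
p rightmost_nw_mirrored(points, 5, 4) # the point (2, 9)
```

The method above applies the same idea without copying the data: it multiplies the x-values being compared by `sign` and reverses the order of the candidate children.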
336
+
337
+ ########################################
338
+ # Highest 3 Sided
339
+
340
+ # Return the highest point of P in the box bounded by x0, x1, and y0.
341
+ #
342
+ # Let Q = [x0, x1] X [y0, infty) be the "three-sided" box bounded by x0, x1, and y0, and let P be the set of points in the
343
+ # MaxPST. (Note that Q is empty if x1 < x0.) Define p* as
344
+ #
345
+ # - (infty, -infty) if Q \intersect P is empty and
346
+ # - the highest (max-y) point in Q \intersect P otherwise.
347
+ #
348
+ # This method returns p* in O(log n) time and O(1) extra space.
349
+ def highest_3_sided(x0, x1, y0)
350
+ # From the paper:
351
+ #
352
+ # The three real numbers x0, x1, and y0 define the three-sided range Q = [x0,x1] X [y0,∞). If Q \intersect P is not \empty,
353
+ # define p* to be the highest point of P in Q. If Q \intersect P = \empty, define p* to be the point (infty, -infty).
354
+ # Algorithm Highest3Sided(x0,x1,y0) returns the point p*.
355
+ #
356
+ # The algorithm uses two bits L and R, and three variables best, p, and q. As before, best stores the highest point in Q
357
+ # found so far. The bit L indicates whether or not p* may be in the subtree of p; if L=1, then p is to the left of
358
+ # Q. Similarly, the bit R indicates whether or not p* may be in the subtree of q; if R=1, then q is to the right of Q.
359
+ #
360
+ # Although there are a lot of lines and cases the overall idea is simple. We maintain in p the rightmost node at its level that
361
+ # is to the left of the area Q. Likewise, q is the leftmost node that is to the right of Q. The logic just updates this data at
362
+ # each step. The helper check_left updates p and check_right updates q.
363
+ #
364
+ # A couple of simple observations that show why maintaining just these two points is enough.
365
+ #
366
+ # - We know that x(p) < x0. This tells us nothing about the x values in the subtrees of p (which is why we need to check various
367
+ # cases), but it does tell us that everything to the left of p has values of x that are too small to bother with.
368
+ # - We don't need to maintain any state inside the region Q because the max-heap property means that if we ever find a node r in
369
+ # Q we check it for best and then ignore its subtree (which cannot beat r on y-value).
370
+ #
371
+ # Sometimes we don't have a relevant node to the left or right of Q. The booleans L and R (which we call left and right) track
372
+ # whether p and q are defined at the moment.
373
+ best = Pair.new(INFINITY, -INFINITY)
374
+ p = q = left = right = nil
375
+
376
+ x_range = (x0..x1)
377
+
378
+ in_q = lambda do |pair|
379
+ x_range.cover?(pair.x) && pair.y >= y0
380
+ end
381
+
382
+ # From the paper:
383
+ #
384
+ # takes as input a point t and does the following: if t \in Q and y(t) > y(best) then it assigns best = t
385
+ #
386
+ # Note that the paper identifies a node in the tree with its value. We need to grab the correct node.
387
+ update_highest = lambda do |node|
388
+ t = @data[node]
389
+ if in_q.call(t) && t.y > best.y
390
+ best = t
391
+ end
392
+ end
393
+
394
+ # "Input: a node p such that x(p) < x0"
395
+ #
396
+ # Step-by-step it is pretty straightforward. As the paper says
397
+ #
398
+ # [E]ither p moves one level down in the tree T or the bit L is set to 0. In addition, the point q either stays the same or it
399
+ # becomes a child of (the original) p.
400
+ check_left = lambda do
401
+ if leaf?(p)
402
+ left = false # Question: did p ever get checked as a potential winner?
403
+ elsif one_child?(p)
404
+ if x_range.cover? @data[left(p)].x
405
+ update_highest.call(left(p))
406
+ left = false # can't do y-better in the subtree
407
+ elsif @data[left(p)].x < x0
408
+ p = left(p)
409
+ else
410
+ q = left(p)
411
+ right = true
412
+ left = false
413
+ end
414
+ else
415
+ # p has two children
416
+ if @data[left(p)].x < x0
417
+ if @data[right(p)].x < x0
418
+ p = right(p)
419
+ elsif @data[right(p)].x <= x1
420
+ update_highest.call(right(p))
421
+ p = left(p)
422
+ else
423
+ # x(p_r) > x1, so q needs to take it
424
+ q = right(p)
425
+ p = left(p)
426
+ right = true
427
+ end
428
+ elsif @data[left(p)].x <= x1
429
+ update_highest.call(left(p))
430
+ left = false # we won't do better in T(p_l)
431
+ if @data[right(p)].x > x1
432
+ q = right(p)
433
+ right = true
434
+ else
435
+ update_highest.call(right(p))
436
+ end
437
+ else
438
+ q = left(p)
439
+ left = false
440
+ right = true
441
+ end
442
+ end
443
+ end
444
+
445
+ # Do "on the right" with q what check_left does on the left with p
446
+ #
447
+ # We know that x(q) > x1
448
+ #
449
+ # TODO: can we share logic between check_left and check_right? At first glance they are too different to parameterize but maybe
450
+ # the bones can be shared.
451
+ #
452
+ # We either push q further down the tree or make right = false. We might also make p a child of (original) q. We never change
453
+ # left from true to false
454
+ check_right = lambda do
455
+ if leaf?(q)
456
+ right = false
457
+ elsif one_child?(q)
458
+ if x_range.cover? @data[left(q)].x
459
+ update_highest.call(left(q))
460
+ right = false # can't do y-better in the subtree
461
+ elsif @data[left(q)].x < x0
462
+ p = left(q)
463
+ left = true
464
+ right = false
465
+ else
466
+ q = left(q)
467
+ end
468
+ else
469
+ # q has two children
470
+ if @data[left(q)].x < x0
471
+ left = true
472
+ if @data[right(q)].x < x0
473
+ p = right(q)
474
+ right = false
475
+ elsif @data[right(q)].x <= x1
476
+ update_highest.call(right(q))
477
+ p = left(q)
478
+ right = false
479
+ else
480
+ # x(q_r) > x1
481
+ p = left(q)
482
+ q = right(q)
483
+ # left = true
484
+ end
485
+ elsif @data[left(q)].x <= x1
486
+ update_highest.call(left(q))
487
+ if @data[right(q)].x > x1
488
+ q = right(q)
489
+ else
490
+ update_highest.call(right(q))
491
+ right = false
492
+ end
493
+ else
494
+ q = left(q)
495
+ end
496
+ end
497
+ end
498
+
499
+ root_val = @data[root]
500
+
501
+ # If the root value is in the region Q, the max-heap property on y means we can't do better
502
+ if x_range.cover? root_val.x
503
+ # If y(root) is large enough then the root is the winner because of the max heap property in y. And if it isn't large enough
504
+ # then no other point in the tree can be high enough either
505
+ left = right = false
506
+ best = root_val if root_val.y >= y0
507
+ end
508
+
509
+ if root_val.x < x0
510
+ p = root
511
+ left = true
512
+ right = false
513
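+ elsif root_val.x > x1 # if the root lies in [x0, x1] the search is already finished, so neither p nor q is activated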
514
+ q = root
515
+ left = false
516
+ right = true
517
+ end
518
+
519
+ val = ->(sym) { sym == :left ? p : q }
520
+
521
+ while left || right
522
+ set_i = []
523
+ set_i << :left if left
524
+ set_i << :right if right
525
+ z = set_i.min_by { |s| level(val.call(s)) }
526
+ if z == :left
527
+ check_left.call
528
+ else
529
+ check_right.call
530
+ end
531
+ end
532
+
533
+ best
534
+ end
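A brute-force version of the three-sided query is a convenient oracle when testing the method above on random data. The sketch below is an O(n) reference, not the gem's code. It follows the sentinel convention from the doc comment, with `Float::INFINITY` standing in for the gem's `INFINITY` constant (an assumption), and `Point`, `brute_highest_3_sided`, and `brute_highest_ne` are hypothetical names.

```ruby
Point = Struct.new(:x, :y)
INF = Float::INFINITY

# The max-y point with x0 <= x <= x1 and y >= y0, or (INF, -INF) if there is none.
def brute_highest_3_sided(points, x0, x1, y0)
  hits = points.select { |pt| pt.x >= x0 && pt.x <= x1 && pt.y >= y0 }
  hits.max_by(&:y) || Point.new(INF, -INF)
end

# highest_ne is the special case with no right-hand boundary.
def brute_highest_ne(points, x0, y0)
  brute_highest_3_sided(points, x0, INF, y0)
end

points = [[1, 5], [2, 9], [4, 3], [6, 7]].map { |x, y| Point.new(x, y) }
p brute_highest_3_sided(points, 3, 5, 0) # the point (4, 3)
p brute_highest_ne(points, 3, 2)         # the point (6, 7)
```

Because Q is empty when x1 < x0, the sentinel is returned for any such query without special handling.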
535
+
536
+ ########################################
537
+ # Enumerate 3 sided
538
+
539
+ # Enumerate the points of P in the box bounded by x0, x1, and y0.
540
+ #
541
+ # Let Q = [x0, x1] X [y0, infty) be the "three-sided" box bounded by x0, x1, and y0, and let P be the set of points in the
542
+ # MaxPST. (Note that Q is empty if x1 < x0.) We find and enumerate all the points in Q \intersect P.
543
+ #
544
+ # If the calling code provides a block then we +yield+ each point to it. Otherwise we return a set containing all the points in
545
+ # the intersection.
546
+ #
547
+ # This method runs in O(m + log n) time and O(1) extra space, where m is the number of points found.
548
+ def enumerate_3_sided(x0, x1, y0)
549
+ # From the paper
550
+ #
551
+ # Given three real numbers x0, x1, and y0 define the three sided range Q = [x0, x1] X [y0, infty). Algorithm
552
+ # Enumerate3Sided(x0, x1,y0) returns all elements of Q \intersect P. The algorithm uses the same approach as algorithm
553
+ # Highest3Sided. Besides the two bits L and R it uses two additional bits L' and R'. Each of these four bits ... corresponds
554
+ # to a subtree of T rooted at the points p, p', q, and q', respectively; if the bit is equal to one, then the subtree may
555
+ # contain points that are in the query range Q.
556
+ #
557
+ # The following invariant will be maintained:
558
+ #
559
+ # - If L = 1 then x(p) < x0.
560
+ # - If L' = 1 then x0 <= x(p') <= x1.
561
+ # - If R = 1 then x(q) > x1.
562
+ # - If R' = 1 then x0 <= x(q') <= x1.
563
+ # - If L' = 1 and R' = 1 then x(p') <= x(q').
564
+ # - All points in Q \intersect P [other than those in the subtrees of the currently active search nodes] have been reported.
565
+ #
566
+ #
567
+ # My high-level understanding of the algorithm
568
+ # --------------------------------------------
569
+ #
570
+ # We need to find all elements of Q \intersect P, so it isn't enough, as it was in highest_3_sided, simply to keep track of p and
571
+ # q. We need to track four nodes, p, p', q', and q which are (with a little handwaving) respectively
572
+ #
573
+ # - the rightmost node to the left of Q' = [x0, x1] X (-infinity, infinity),
574
+ # - the leftmost node inside Q',
575
+ # - the rightmost node inside Q', and
576
+ # - the leftmost node to the right of Q'.
577
+ #
578
+ # Tracking these is enough. Subtrees of things to the left of p can't have anything in Q by the x-value properties of the PST,
579
+ # and likewise with things to the right of q.
580
+ #
581
+ # And we don't need to track any more nodes inside Q'. If we had r with p' <~ r <~ q' (where s <~ t represents "t is to the
582
+ # right of s"), then all of the subtree rooted at r lies inside Q', and we can visit all of its elements of Q \intersect P via
583
+ # the routine Explore(), which is what we do whenever we need to. The node r is thus exhausted, and we can forget about it.
584
+ #
585
+ # So the algorithm is actually quite simple. There is a large amount of code here because of the many cases that need to be
586
+ # handled at each update.
587
+ #
588
+ # If a block is given, yield each found point to it. Otherwise return all the found points in an enumerable (currently Set).
589
+ x_range = x0..x1
590
+ # Instead of using primes we use "_in"
591
+ left = left_in = right_in = right = false
592
+ p = p_in = q_in = q = nil
593
+
594
+ result = Set.new
595
+
596
+ report = lambda do |node|
597
+ if block_given?
598
+ yield @data[node]
599
+ else
600
+ result << @data[node]
601
+ end
602
+ end
603
+
604
+ # "reports all points in T_t whose y-coordinates are at least y0"
605
+ #
606
+ # We follow the logic from the min-max paper, leaving out the need to worry about the parity of the level and the min- or max-
607
+ # switching.
608
+ explore = lambda do |t|
609
+ current = t
610
+ state = 0
611
+ while current != t || state != 2
612
+ case state
613
+ when 0
614
+ # State 0: we have arrived at this node for the first time
615
+ # look at current and perhaps descend to left child
616
+ # The paper describes this algorithm as in-order, but isn't this pre-order?
617
+ if @data[current].y >= y0
618
+ report.call(current)
619
+ end
620
+ if !leaf?(current) && @data[left(current)].y >= y0
621
+ current = left(current)
622
+ else
623
+ state = 1
624
+ end
625
+ when 1
626
+ # State 1: we've already handled this node and its left subtree. Should we descend to the right subtree?
627
+ if two_children?(current) && @data[right(current)].y >= y0
628
+ current = right(current)
629
+ state = 0
630
+ else
631
+ state = 2
632
+ end
633
+ when 2
634
+ # State 2: we're done with this node and its subtrees. Go back up a level, having set state correctly for the logic at the
635
+ # parent node.
636
+ if left_child?(current)
637
+ state = 1
638
+ end
639
+ current = parent(current)
640
+ else
641
+ raise LogicError, "Explore(t) state is somehow #{state} rather than 0, 1, or 2."
642
+ end
643
+ end
644
+ end
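The three-state walk in `explore` can be exercised on its own. Here is a minimal standalone sketch, assuming a 1-based implicit layout (slot 0 unused, children of node i at 2i and 2i + 1, parent at i / 2). The gem's own `left`, `right`, `parent`, and `leaf?` helpers are not shown in this diff, so plain index arithmetic stands in for them, and `Point` and `explore_subtree` are hypothetical names.

```ruby
Point = Struct.new(:x, :y)

# Collect every point in the subtree rooted at t whose y-value is at least y0,
# using O(1) working state: the current index and a state in {0, 1, 2}.
def explore_subtree(tree, t, y0)
  n = tree.size - 1
  found = []
  current = t
  state = 0
  until current == t && state == 2
    case state
    when 0 # first visit: report the node, then try to descend left
      found << tree[current] if tree[current].y >= y0
      left = 2 * current
      if left <= n && tree[left].y >= y0
        current = left
      else
        state = 1
      end
    when 1 # left side finished: try to descend right
      right = 2 * current + 1
      if right <= n && tree[right].y >= y0
        current = right
        state = 0
      else
        state = 2
      end
    when 2 # subtree finished: climb, resuming the parent in the correct state
      state = 1 if current.even? # a left child hands control back for the right side
      current /= 2
    end
  end
  found
end

tree = [nil, [4, 9], [2, 7], [6, 8], [1, 3], [3, 5], [5, 2]].map { |xy| xy && Point.new(*xy) }
p explore_subtree(tree, 1, 4).map(&:to_a) # prints [[4, 9], [2, 7], [3, 5], [6, 8]]
```

Each node is reported on its first visit, before either subtree is entered, and the max-heap property lets the walk skip any child whose y-value is already below y0.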
645
+
646
+ # Helpers for the helpers
647
+ #
648
+ # Invariant: if q_in is active then p_in is active. In other words, if only one "inside" node is active then it is p_in.
649
+
650
+ # Mark p_in as inactive. Then, if q_in is active, it becomes p_in.
651
+ deactivate_p_in = lambda do
652
+ left_in = false
653
+ return unless right_in
654
+
655
+ p_in = q_in
656
+ left_in = true
657
+ right_in = false
658
+ end
659
+
660
+ # Add a new leftmost "in" point. This becomes p_in. We handle existing "inside" points appropriately
661
+ add_leftmost_inner_node = lambda do |node|
662
+ if left_in && right_in
663
+ # the old p_in is squeezed between node and q_in
664
+ explore.call(p_in)
665
+ elsif left_in
666
+ q_in = p_in
667
+ right_in = true
668
+ else
669
+ left_in = true
670
+ end
671
+ p_in = node
672
+ end
673
+
674
+ add_rightmost_inner_node = lambda do |node|
675
+ if left_in && right_in
676
+ # the old q_in is squeezed between p_in and node
677
+ explore.call(q_in)
678
+ q_in = node
679
+ elsif left_in
680
+ q_in = node
681
+ right_in = true
682
+ else
683
+ p_in = node
684
+ left_in = true
685
+ end
686
+ end
687
+
688
+ ########################################
689
+ # The four key helpers described in the paper
690
+
691
+ # Handle the next step of the subtree at p
692
+ #
693
+ # I need to go through this with paper, pencil, and some diagrams.
694
+ enumerate_left = lambda do
695
+ if leaf?(p)
696
+ left = false
697
+ return
698
+ end
699
+
700
+ if one_child?(p)
701
+ if x_range.cover? @data[left(p)].x
702
+ add_leftmost_inner_node.call(left(p))
703
+ left = false
704
+ elsif @data[left(p)].x < x0
705
+ p = left(p)
706
+ else
707
+ q = left(p)
708
+ right = true
709
+ left = false
710
+ end
711
+ return
712
+ end
713
+
714
+ # p has two children
715
+ if @data[left(p)].x < x0
716
+ if @data[right(p)].x < x0
717
+ p = right(p)
718
+ elsif @data[right(p)].x <= x1
719
+ add_leftmost_inner_node.call(right(p))
720
+ p = left(p)
721
+ else
722
+ q = right(p)
723
+ p = left(p)
724
+ right = true
725
+ end
726
+ elsif @data[left(p)].x <= x1
727
+ if @data[right(p)].x > x1
728
+ q = right(p)
729
+ p_in = left(p)
730
+ left = false
731
+ left_in = right = true
732
+ else
733
+ # p_l and p_r both lie inside [x0, x1]
734
+ add_leftmost_inner_node.call(right(p))
735
+ add_leftmost_inner_node.call(left(p))
736
+ left = false
737
+ end
738
+ else
739
+ q = left(p)
740
+ left = false
741
+ right = true
742
+ end
743
+ end
744
+
745
+ # Given: p' satisfies x0 <= x(p') <= x1. (Our p_in is the paper's p')
746
+ enumerate_left_in = lambda do
747
+ if @data[p_in].y >= y0
748
+ report.call(p_in)
749
+ end
750
+
751
+ if leaf?(p_in) # nothing more to do
752
+ deactivate_p_in.call
753
+ return
754
+ end
755
+
756
+ left_val = @data[left(p_in)]
757
+ if one_child?(p_in)
758
+ if x_range.cover? left_val.x
759
+ p_in = left(p_in)
760
+ elsif left_val.x < x0
761
+ # We aren't in the [x0, x1] zone any more and have moved out to the left
762
+ p = left(p_in)
763
+ deactivate_p_in.call
764
+ left = true
765
+ else
766
+ # similar, but we've moved out to the right. Note that left(p_in) is the leftmost node to the right of Q.
767
+ raise LogicError, 'q_in should not be active (by the value of left(p_in))' if right_in
768
+
769
+ q = left(p_in)
770
+ deactivate_p_in.call
771
+ right = true
772
+ end
773
+ else
774
+ # p' has two children
775
+ right_val = @data[right(p_in)]
776
+ if left_val.x < x0
777
+ if right_val.x < x0
778
+ p = right(p_in)
779
+ left = true
780
+ deactivate_p_in.call
781
+ elsif right_val.x <= x1
782
+ p = left(p_in)
783
+ p_in = right(p_in)
784
+ left = true
785
+ else
786
+ raise LogicError, 'q_in cannot be active, by the value in the right child of p_in!' if right_in
787
+
788
+ p = left(p_in)
789
+ q = right(p_in)
790
+ deactivate_p_in.call
791
+ left = true
792
+ right = true
793
+ end
794
+ elsif left_val.x <= x1
795
+ if right_val.x > x1
796
+ raise LogicError, 'q_in cannot be active, by the value in the right child of p_in!' if right_in
797
+
798
+ q = right(p_in)
799
+ p_in = left(p_in)
800
+ right = true
801
+ elsif right_in
802
+ explore.call(right(p_in))
803
+ p_in = left(p_in)
804
+ else
805
+ q_in = right(p_in)
806
+ p_in = left(p_in)
807
+ right_in = true
808
+ end
809
+ else
810
+ raise LogicError, 'q_in cannot be active, by the value in the left child of p_in!' if right_in
811
+
812
+ q = left(p_in)
813
+ deactivate_p_in.call
814
+ right = true
815
+ end
816
+ end
817
+ end
818
+
819
+ # This is "just like" enumerate left, but handles q instead of p.
820
+ #
821
+ # The paper doesn't give an implementation, but it should be pretty symmetric. Can we share any logic with enumerate_left?
822
+ #
823
+ # Q: why is my implementation more complicated than enumerate_left? I must be missing something.
824
+ enumerate_right = lambda do
825
+ if leaf?(q)
826
+ right = false
827
+ return
828
+ end
829
+
830
+ if one_child?(q)
831
+ if x_range.cover? @data[left(q)].x
832
+ add_rightmost_inner_node.call(left(q))
833
+ right = false
834
+ elsif @data[left(q)].x < x0
835
+ p = left(q)
836
+ left = true
837
+ right = false
838
+ else
839
+ q = left(q)
840
+ end
841
+ return
842
+ end
843
+
844
+ # q has two children. Cases!
845
+ if @data[left(q)].x < x0
846
+ raise LogicError, 'p_in should not be active, based on the value at left(q)' if left_in
847
+ raise LogicError, 'q_in should not be active, based on the value at left(q)' if right_in
848
+
849
+ left = true
850
+ if @data[right(q)].x < x0
851
+ p = right(q)
852
+ right = false
853
+ elsif @data[right(q)].x <= x1
854
+ p_in = right(q)
855
+ p = left(q)
856
+ left_in = true
857
+ right = false
858
+ else
859
+ p = left(q)
860
+ q = right(q)
861
+ end
862
+ elsif @data[left(q)].x <= x1
863
+ add_rightmost_inner_node.call(left(q))
864
+ if @data[right(q)].x > x1
865
+ q = right(q)
866
+ else
867
+ add_rightmost_inner_node.call(right(q))
868
+ right = false
869
+ end
870
+ else
871
+ # x(q_l) > x1
872
+ q = left(q)
873
+ end
874
+ end
875
+
876
+ # Given: q' is active and satisfies x0 <= x(q') <= x1
877
+ enumerate_right_in = lambda do
878
+ raise LogicError, 'right_in should be true if we call enumerate_right_in' unless right_in
879
+
880
+ if @data[q_in].y >= y0
881
+ report.call(q_in)
882
+ end
883
+
884
+ if leaf?(q_in)
885
+ right_in = false
886
+ return
887
+ end
888
+
889
+ left_val = @data[left(q_in)]
890
+ if one_child?(q_in)
891
+ if x_range.cover? left_val.x
892
+ q_in = left(q_in)
893
+ elsif left_val.x < x0
894
+ # We have moved out to the left
895
+ p = left(q_in)
896
+ right_in = false
897
+ left = true
898
+ else
899
+ # We have moved out to the right
900
+ q = left(q_in)
901
+ right_in = false
902
+ right = true
903
+ end
904
+ return
905
+ end
906
+
907
+ # q' has two children
908
+ right_val = @data[right(q_in)]
909
+ if left_val.x < x0
910
+ raise LogicError, 'p_in cannot be active, by the value in the left child of q_in' if left_in
911
+
912
+ if right_val.x < x0
913
+ p = right(q_in)
914
+ elsif right_val.x <= x1
915
+ p = left(q_in)
916
+ p_in = right(q_in) # should this be q_in = right(q_in) ??
917
+ left_in = true
918
+ else
919
+ p = left(q_in)
920
+ q = right(q_in)
921
+ right = true
922
+ end
923
+ right_in = false
924
+ left = true
925
+ elsif left_val.x <= x1
926
+ if right_val.x > x1
927
+ q = right(q_in)
928
+ right = true
929
+ if left_in
930
+ q_in = left(q_in)
931
+ else
932
+ p_in = left(q_in)
933
+ left_in = true
934
+ right_in = false
935
+ end
936
+ else
937
+ if left_in
938
+ explore.call(left(q_in))
939
+ else
940
+ p_in = left(q_in)
941
+ left_in = true
942
+ end
943
+ q_in = right(q_in)
944
+ end
945
+ else
946
+ q = left(q_in)
947
+ right_in = false
948
+ right = true
949
+ end
950
+ end
951
+
952
+ val = ->(sym) { { left: p, left_in: p_in, right_in: q_in, right: q }[sym] }
953
+
954
+ root_val = @data[root]
955
+ if root_val.y < y0
956
+ # no hope, no op
957
+ elsif root_val.x < x0
958
+ p = root
959
+ left = true
960
+ elsif root_val.x <= x1 # Possible bug in paper, which tests "< x1"
961
+ p_in = root
962
+ left_in = true
963
+ else
964
+ q = root
965
+ right = true
966
+ end
967
+
968
+ while left || left_in || right_in || right
969
+ # byebug if $do_it
970
+ raise LogicError, 'It should not be that q_in is active but p_in is not' if right_in && !left_in
971
+
972
+ set_i = []
973
+ set_i << :left if left
974
+ set_i << :left_in if left_in
975
+ set_i << :right_in if right_in
976
+ set_i << :right if right
977
+ z = set_i.min_by { |sym| level(val.call(sym)) }
978
+ case z
979
+ when :left
980
+ enumerate_left.call
981
+ when :left_in
982
+ enumerate_left_in.call
983
+ when :right_in
984
+ enumerate_right_in.call
985
+ when :right
986
+ enumerate_right.call
987
+ else
988
+ raise LogicError, "bad symbol #{z}"
989
+ end
990
+ end
991
+ return result unless block_given?
992
+ end
993
+
994
+ ########################################
995
+ # Build the initial structure
996
+
997
+ private def construct_pst
998
+ # We follow the algorithm in the paper by De, Maheshwari et al. Note that indexing is from 1 there. For now we pretend that that
999
+ # is the case here, too.
1000
+
1001
+ @data.unshift nil
1002
+
1003
+ h = Math.log2(@size).floor
1004
+ a = @size - (2**h - 1) # the paper calls it A
1005
+ sort_subarray(1, @size)
1006
+
1007
+ @last_non_leaf = @size / 2
1008
+ if @size.even?
1009
+ @parent_of_one_child = @last_non_leaf
1010
+ @last_parent_of_two_children = @parent_of_one_child - 1
1011
+ else
1012
+ @parent_of_one_child = nil
1013
+ @last_parent_of_two_children = @last_non_leaf
1014
+ end
1015
+
1016
+ (0...h).each do |i|
1017
+ pow_of_2 = 2**i
1018
+ k = a / (2**(h - i))
1019
+ k1 = 2**(h + 1 - i) - 1
1020
+ k2 = (1 - k) * 2**(h - i) - 1 + a
1021
+ k3 = 2**(h - i) - 1
1022
+ (1..k).each do |j|
1023
+ l = index_with_largest_y_in(
1024
+ pow_of_2 + (j - 1) * k1, pow_of_2 + j * k1 - 1
1025
+ )
1026
+ swap(l, pow_of_2 + j - 1)
1027
+ end
1028
+
1029
+ if k < pow_of_2
1030
+ l = index_with_largest_y_in(
1031
+ pow_of_2 + k * k1, pow_of_2 + k * k1 + k2 - 1
1032
+ )
1033
+ swap(l, pow_of_2 + k)
1034
+
1035
+ m = pow_of_2 + k * k1 + k2
1036
+ (1..(pow_of_2 - k - 1)).each do |j|
1037
+ l = index_with_largest_y_in(
1038
+ m + (j - 1) * k3, m + j * k3 - 1
1039
+ )
1040
+ swap(l, pow_of_2 + k + j)
1041
+ end
1042
+ end
1043
+ sort_subarray(2 * pow_of_2, @size)
1044
+ end
1045
+ end
1046
+
1047
+ ########################################
1048
+ # Tree arithmetic
1049
+
1050
+ # First element and root of the tree structure
1051
+ private def root
1052
+ 1
1053
+ end
1054
+
1055
+ # Indexing is from 1
1056
+ private def parent(i)
1057
+ i >> 1
1058
+ end
1059
+
1060
+ private def left(i)
1061
+ i << 1
1062
+ end
1063
+
1064
+ private def right(i)
1065
+ 1 + (i << 1)
1066
+ end
1067
+
1068
+ private def level(i)
1069
+ l = 0
1070
+ while i > root
1071
+ i >>= 1
1072
+ l += 1
1073
+ end
1074
+ l
1075
+ end
1076
+
1077
+ # i has no children
1078
+ private def leaf?(i)
1079
+ i > @last_non_leaf
1080
+ end
1081
+
1082
+ # i has exactly one child (the left)
1083
+ private def one_child?(i)
1084
+ i == @parent_of_one_child
1085
+ end
1086
+
1087
+ # i has two children
1088
+ private def two_children?(i)
1089
+ i <= @last_parent_of_two_children
1090
+ end
1091
+
1092
+ # i is the left child of its parent.
1093
+ private def left_child?(i)
1094
+ (i & 1).zero?
1095
+ end
1096
+
1097
+ private def swap(index1, index2)
1098
+ return if index1 == index2
1099
+
1100
+ @data[index1], @data[index2] = @data[index2], @data[index1]
1101
+ end
1102
+
1103
+ # The index in @data[l..r] having the largest value for y
1104
+ private def index_with_largest_y_in(l, r)
1105
+ return nil if r < l
1106
+
1107
+ (l..r).max_by { |idx| @data[idx].y }
1108
+ end
1109
+
1110
+ # Sort the subarray @data[l..r]. This is much faster than a Ruby-layer heapsort because it is mostly happening in C.
1111
+ private def sort_subarray(l, r)
1112
+ # heapsort_subarray(l, r)
1113
+ return if l == r # 1-array already sorted!
1114
+
1115
+ #l -= 1
1116
+ #r -= 1
1117
+ @data[l..r] = @data[l..r].sort_by(&:x)
1118
+ end
1119
+
1120
+ ########################################
1121
+ # Debugging support
1122
+ #
1123
+ # These methods are not written for speed
1124
+
1125
+ # Check that our data satisfies the requirements of a Priority Search Tree:
1126
+ # - max-heap in y
1127
+ # - all the x values in the left subtree are less than all the x values in the right subtree
1128
+ private def verify_properties
1129
+ # It's a max-heap in y
1130
+ (2..@size).each do |node|
1131
+ raise LogicError, "Heap property violated at child #{node}" unless @data[node].y < @data[parent(node)].y
1132
+ end
1133
+
1134
+ # Left subtree has x values less than all of the right subtree
1135
+ (1..@size).each do |node|
1136
+ next if right(node) > @size # indices run from 1 to @size, so right(node) == @size is a real node
1137
+
1138
+ left_max = max_x_in_subtree(left(node))
1139
+ right_min = min_x_in_subtree(right(node))
1140
+
1141
+ raise LogicError, "Left-right property of x-values violated at #{node}" unless left_max < right_min
1142
+ end
1143
+ end
1144
+
1145
+ private def max_x_in_subtree(root)
1146
+ return -Float::INFINITY if root > @size # index @size is the last valid node
1147
+
1148
+ [@data[root].x, max_x_in_subtree(left(root)), max_x_in_subtree(right(root))].max
1149
+ end
1150
+
1151
+ private def min_x_in_subtree(root)
1152
+ return Float::INFINITY if root > @size # index @size is the last valid node
1153
+
1154
+ [@data[root].x, min_x_in_subtree(left(root)), min_x_in_subtree(right(root))].min
1155
+ end
1156
+ end
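
The tree arithmetic above can be tried out in isolation. What follows is a minimal sketch, not code from the gem: the module name `ImplicitTree` and the method `kind` are hypothetical, introduced only for illustration. It assumes the same layout as the class, with nodes stored at indices 1..size, and it replaces the loop in `level` with `Integer#bit_length`, which should give the same answer for any positive index.

```ruby
# Hypothetical standalone helper (not part of the gem): index arithmetic for a
# complete binary tree stored implicitly at indices 1..size.
module ImplicitTree
  module_function

  def parent(i)
    i >> 1
  end

  def left(i)
    i << 1
  end

  def right(i)
    (i << 1) + 1
  end

  # Depth of node i, with the root (index 1) at level 0.
  # Assumption: i >= 1. Index i has bit_length == level + 1.
  def level(i)
    i.bit_length - 1
  end

  # Classify node i in a tree of +size+ nodes, mirroring the boundaries that
  # construct_pst records in @last_non_leaf and @parent_of_one_child.
  def kind(i, size)
    last_non_leaf = size / 2
    return :leaf if i > last_non_leaf
    # Only an even-sized tree has a node with a left child and no right child.
    return :one_child if size.even? && i == last_non_leaf

    :two_children
  end
end

p (1..6).map { |i| ImplicitTree.kind(i, 6) }
# => [:two_children, :two_children, :one_child, :leaf, :leaf, :leaf]
```

With six nodes, node 3 has a left child at index 6 and no right child, which is the single "one child" case the enumeration lambdas test for with `one_child?`.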