data_structures_rmolinari 0.4.4 → 0.5.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +11 -0
- data/README.md +96 -29
- data/ext/c_segment_tree_template/segment_tree_template.c +2 -1
- data/lib/data_structures_rmolinari/algorithms.rb +5 -5
- data/lib/data_structures_rmolinari/c_segment_tree_template_impl.rb +3 -100
- data/lib/data_structures_rmolinari/disjoint_union.rb +2 -0
- data/lib/data_structures_rmolinari/segment_tree.rb +126 -0
- data/lib/data_structures_rmolinari/segment_tree_template.rb +3 -3
- data/lib/data_structures_rmolinari.rb +2 -67
- metadata +3 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 7682f6d3b0779f347ce0797f55f33b9d7dcc7bd9c2039fc2fd6f865eb72e085a
+  data.tar.gz: d717e5e36f79ddc4ecb605a59b475b7114359dea7476445590deb300f7915bd4
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: c3ffd9a4f67f55b7a2df1c949cf2288c06fcae416d5ff03a10307a1b79c3dae1daa74e2576d5e190c989adeea47b046426fad8c3c64199aadf22ba500b317f36
+  data.tar.gz: 8380d6117f2955da9362395f8315f5121b4f7afba2f69aabb1981a01b675cbbed81d07c10b5745409080c2588c92df3d676ec36efa128571234a74dceef0e20d
data/CHANGELOG.md
CHANGED
@@ -2,6 +2,17 @@
 
 ## [Unreleased]
 
+## [0.5.0] 2023-02-03
+
+- SegmentTree
+  - Reorganize the code into a SegmentTree submodule.
+  - Provide a convenience method for getting concrete instances.
+
+- README.md
+  - Add some simple example code for the data types.
+
+## [0.4.4] 2023-02-02
+
 - Disjoint Union
   - C extension: use Convenient Containers rather than my janky Dynamic Array attempt.
 
data/README.md
CHANGED
@@ -14,18 +14,6 @@ The code is available as a gem: https://rubygems.org/gems/data_structures_rmolin
 The right way to organize the code is not obvious to me. For now the data structures are all defined in the module
 `DataStructuresRMolinari` to avoid polluting the global namespace.
 
-Example usage after the gem is installed:
-```
-require 'data_structures_rmolinari`
-
-# Pull what we need out of the namespace
-MaxPrioritySearchTree = DataStructuresRMolinari::MaxPrioritySearchTree
-Point = DataStructuresRMolinari::Point # anything responding to :x and :y is fine
-
-pst = MaxPrioritySearchTree.new([Point.new(1, 1)])
-puts pst.largest_y_in_ne(0, 0) # "Point(1,1)"
-```
-
 # Implementations
 
 ## Disjoint Union
@@ -42,6 +30,23 @@ It also provides
 For more details see https://en.wikipedia.org/wiki/Disjoint-set_data_structure and the paper [[TvL1984]](#references) by Tarjan and
 van Leeuwen.
 
+``` ruby
+require 'data_structures_rmolinari'
+DisjointUnion = DataStructuresRMolinari::DisjointUnion
+
+# Create an instance over the "universe" 0, 1, ..., 9.
+du = DisjointUnion.new(10)
+du.subset_count # => 10; each element starts out in its own subset
+
+du.unite(2, 3) # say that 2 and 3 are actually in the same subset
+du.subset_count # => 9
+du.find(2) == du.find(3) # => true
+
+du.unite(4, 5)
+du.unite(3, 4) # now 2, 3, 4, and 5 are all in the same subset
+du.subset_count # => 7
+```
+
 ## Heap
 
 This is a standard binary heap with an `update` method, suitable for use as a priority queue. There are several supported
@@ -63,6 +68,24 @@ allows the insertion of duplicate items (which is sometimes useful) and slightly
 
 See https://en.wikipedia.org/wiki/Binary_heap and the paper by Edelkamp, Elmasry, and Katajainen [[EEK2017]](#references).
 
+``` ruby
+require 'data_structures_rmolinari'
+Heap = DataStructuresRMolinari::Heap
+
+data = [4, 3, 2, 1]
+
+heap = Heap.new
+
+# Insert the elements of data, each with itself as priority.
+data.each { |v| heap.insert(v, v) }
+
+heap.top # => 1, since we have a min-heap.
+heap.pop # => 1
+heap.top # => 2; with 1 gone, this is the element with least priority
+heap.update(3, -3)
+heap.top # => 3; now 3 is the element with least priority
+```
+
 ## Priority Search Tree
 
 A PST stores a set P of two-dimensional points in a way that allows certain queries about P to be answered efficiently. The data
@@ -96,13 +119,27 @@ regions.
 By default these data structures are immutable: once constructed they cannot be changed. But there is a constructor option that
 makes the instance "dynamic". This allows us to delete the element at the root of the tree - the one with largest y value (smallest
 for MinPST) - with the `delete_top!` method. This operation is important in certain algorithms, such as enumerating all maximal
-empty rectangles (see the second paper by De et al
+empty rectangles (see the second paper by De et al. [[DMNS2013]](#references)). Note that points still cannot be added to the PST in
 any case, and choosing the dynamic option makes certain internal bookkeeping operations slower.
 
 In [[DMNS2013]](#references) De et al. generalize the in-place structure to a _Min-max Priority Search Tree_ (MinmaxPST) that can
 answer queries in all four quadrants and both "kinds" of 3-sided boxes. Having one of these would save the trouble of constructing
 both a MaxPST and MinPST. But the presentation is hard to follow in places and the paper's pseudocode is buggy.[^minmaxpst]
 
+``` ruby
+require 'data_structures_rmolinari'
+MaxPST = DataStructuresRMolinari::MaxPrioritySearchTree
+Point = Shared::Point # simple (x, y) struct. Anything responding to #x and #y will work
+
+data = [Point.new(0, 0), Point.new(1, 2), Point.new(2, 1)]
+pst = MaxPST.new(data)
+
+pst.largest_y_in_ne(0, 0) # => #<struct Shared::Point x=1, y=2>
+pst.largest_y_in_ne(1, 1) # => #<struct Shared::Point x=1, y=2>
+pst.largest_y_in_ne(1.5, 1) # => #<struct Shared::Point x=2, y=1>
+pst.largest_y_in_3_sided(-0.5, 0.5, 0) # => #<struct Shared::Point x=0, y=0>
+```
+
 ## Segment Tree
 
 A segment tree stores information related to subintervals of a certain array. For example, a segment tree can be used to find the
@@ -112,11 +149,37 @@ arbitrary subarrays.
 
 An excellent description of the idea is found at https://cp-algorithms.com/data_structures/segment_tree.html.
 
-Generic code is provided in `SegmentTreeTemplate
-
-
+Generic code is provided in `SegmentTree::SegmentTreeTemplate` and its equivalent (and faster) C-based sibling,
+`SegmentTree::CSegmentTreeTemplate` (see [below](#c-extensions)).
+
+Writing a concrete segment tree class just means providing some simple lambdas and constants to the template class's
+initializer. Figuring out the details requires some knowledge of the internal mechanisms of a segment tree, for which the link at
+cp-algorithms.com is very helpful. See the implementations of the concrete classes `MaxValSegmentTree` and
 `IndexOfMaxValSegmentTree` for examples.
 
+Since there are several concrete "types" and two underlying generic implementations, there is a convenience method on the `SegmentTree`
+module to get instances.
+
+``` ruby
+require 'data_structures_rmolinari'
+SegmentTree = DataStructuresRMolinari::SegmentTree # namespace module
+
+data = [1, -3, 2, 1, 5, -9]
+
+# Get a segment tree instance that will answer "max over this subinterval" questions about data.
+# Here we get one using the Ruby implementation of the generic functionality.
+#
+# We offer :index_of_max as an alternative to :max. This will construct an instance that answers
+# questions of the form "an index of the maximum value over this subinterval".
+#
+# To use the version written in C, put :c instead of :ruby.
+seg_tree = SegmentTree.construct(data, :max, :ruby)
+
+seg_tree.max_on(0, 2) # => 2
+seg_tree.max_on(1, 4) # => 5
+# ..etc..
+```
+
 ## Algorithms
 
 The Algorithms submodule contains some algorithms using the data structures.
@@ -130,13 +193,12 @@ The Algorithms submodule contains some algorithms using the data structures.
 
 # C Extensions
 
-As another learning process I have implemented several of these data structures as C extensions. The
-and they can be required like their pure Ruby versions. They have the same APIs as their Ruby cousins.
+As another learning process I have implemented several of these data structures as C extensions. Their APIs are the same as those of their pure Ruby counterparts.
 
 ## Disjoint Union
 
-A benchmark suggests that a long sequence of `unite` operations is about 3 times as fast
-`DisjointUnion`.
+The C version is called `CDisjointUnion`. A benchmark suggests that a long sequence of `unite` operations is about 3 times as fast
+with `CDisjointUnion` as with `DisjointUnion`.
 
 The implementation uses the remarkable Convenient Containers library from Jackson Allan [[Allan]](#references).
 
@@ -145,16 +207,21 @@ The implementation uses the remarkable Convenient Containers library from Jackso
 `CSegmentTreeTemplate` is the C implementation of the generic class. Concrete classes are built on top of this in Ruby, just as with
 the pure Ruby `SegmentTreeTemplate` class.
 
-A benchmark suggests that a long sequence of `max_on` operations against a max-val Segment Tree is about 4 times as fast with
-
-
+A benchmark suggests that a long sequence of `max_on` operations against a max-val Segment Tree is about 4 times as fast with C as
+with Ruby. I'm a bit surprised the improvement isn't larger, but remember that the C code must still interact with the Ruby objects in
+the underlying data array, and must combine them, etc., via Ruby lambdas.
 
 # References
 
-- [Allan] Allan, J., _CC: Convenient Containers_, https://github.com/JacksonAllan/CC, retrieved 2023-02-01.
-- [TvL1984] Tarjan, Robert E., van Leeuwen, J., _Worst-case Analysis of Set Union Algorithms_, Journal of the ACM, v31:2 (1984), pp
-
-- [
--
-- [
+- [Allan] Allan, J., _CC: Convenient Containers_, https://github.com/JacksonAllan/CC (retrieved 2023-02-01).
+- [TvL1984] Tarjan, Robert E., van Leeuwen, J., _Worst-case Analysis of Set Union Algorithms_, Journal of the ACM, v31:2 (1984), pp
+  245–281, https://dl.acm.org/doi/10.1145/62.2160 (retrieved 2022-02-01).
+- [EEK2017] Edelkamp, S., Elmasry, A., Katajainen, J., _Optimizing Binary Heaps_, Theory Comput Syst (2017), vol 61, pp 606-636, DOI
+  10.1007/s00224-017-9760-2, https://kclpure.kcl.ac.uk/portal/files/87388857/TheoryComputingSzstems.pdf (retrieved 2022-02-02).
+- [McC1985] McCreight, E. M., _Priority Search Trees_, SIAM J. Comput., 14(2):257-276, 1985,
+  http://www.cs.duke.edu/courses/fall08/cps234/handouts/SMJ000257.pdf (retrieved 2023-02-02).
+- [DMNS2011] De, M., Maheshwari, A., Nandy, S. C., Smid, M., _An In-Place Priority Search Tree_, 23rd Canadian Conference on
+  Computational Geometry, 2011, http://www.cs.carleton.ca/~michiel/inplace_pst.pdf (retrieved 2023-02-02).
+- [DMNS2013] De, M., Maheshwari, A., Nandy, S. C., Smid, M., _An In-Place Min-max Priority Search Tree_, Computational Geometry, v46
+  (2013), pp 310-327, https://people.scs.carleton.ca/~michiel/MinMaxPST.pdf (retrieved 2023-02-02).
 
 [^minmaxpst]: See the comments in the fragmentary class `MinMaxPrioritySearchTree` for further details.
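The README's Disjoint Union section defers the mechanics to Wikipedia and [TvL1984] rather than showing them. As a rough illustration of what `unite`, `find`, and `subset_count` in the README example do, here is a minimal, self-contained union-find using union by rank and path compression, two of the techniques Tarjan and van Leeuwen analyse. The class name `SketchDisjointUnion` is hypothetical: this is a sketch, not the gem's `DisjointUnion` (whose internals this diff does not show), though it reproduces the README example's outputs.

```ruby
# Hypothetical minimal union-find (not the gem's DisjointUnion). Method names mirror the README example.
class SketchDisjointUnion
  attr_reader :subset_count

  # The universe is 0...size; each element starts out in its own subset.
  def initialize(size)
    @parent = (0...size).to_a
    @rank = Array.new(size, 0)
    @subset_count = size
  end

  # Representative ("root") of e's subset. Path compression: re-point every node on the path at the root.
  def find(e)
    root = e
    root = @parent[root] while @parent[root] != root
    while @parent[e] != root
      nxt = @parent[e]
      @parent[e] = root
      e = nxt
    end
    root
  end

  # Merge the subsets containing e and f; a no-op if they are already in the same subset.
  def unite(e, f)
    re = find(e)
    rf = find(f)
    return if re == rf

    # Union by rank: hang the lower-rank tree under the higher-rank root.
    if @rank[re] < @rank[rf]
      re, rf = rf, re
    end
    @parent[rf] = re
    @rank[re] += 1 if @rank[re] == @rank[rf]
    @subset_count -= 1
  end
end

du = SketchDisjointUnion.new(10)
du.subset_count # => 10
du.unite(2, 3)
du.subset_count # => 9
du.find(2) == du.find(3) # => true
du.unite(4, 5)
du.unite(3, 4) # now 2, 3, 4, and 5 are all in the same subset
du.subset_count # => 7
```

Union by rank keeps the trees shallow and path compression flattens them further on every `find`; together they give the near-constant (inverse-Ackermann) amortized cost per operation that makes long sequences of `unite` calls cheap.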
@@ -353,7 +353,8 @@ static VALUE segment_tree_update_at(VALUE self, VALUE idx) {
  * (see SegmentTreeTemplate)
  */
 void Init_c_segment_tree_template() {
-  VALUE
+  VALUE mSegmentTree = rb_define_module_under(mDataStructuresRMolinari, "SegmentTree");
+  VALUE cSegmentTreeTemplate = rb_define_class_under(mSegmentTree, "CSegmentTreeTemplate", rb_cObject);
 
   rb_define_alloc_func(cSegmentTreeTemplate, segment_tree_alloc);
   rb_define_method(cSegmentTreeTemplate, "c_initialize", segment_tree_init, 4);
@@ -1,4 +1,4 @@
-#
+# Algorithms that use the module's data structures but don't belong as a method on one of the data structures
 module DataStructuresRMolinari::Algorithms
   include Shared
 
@@ -11,12 +11,12 @@ module DataStructuresRMolinari::Algorithms
   #
   # A _maximal empty rectangle_ (MER) for P is an empty rectangle for P not properly contained in any other.
   #
-  # We enumerate all maximal empty rectangles for P, yielding each as (left, right, bottom, top)
-  #
-  #
+  # We enumerate all maximal empty rectangles for P, yielding each as (left, right, bottom, top). The algorithm is due to De, M.,
+  # Maheshwari, A., Nandy, S. C., Smid, M., _An In-Place Min-max Priority Search Tree_, Computational Geometry, v46 (2013), pp
+  # 310-327.
   #
   # It runs in O(m log n) time, where m is the number of MERs enumerated and n is the number of points in P. (Constructing the
-  # MaxPST
+  # MaxPST takes O(n log^2 n) time, but m = O(n^2) so we are still O(m log n) overall.)
   #
   # @param points [Array] an array of points in the x-y plane. Each must respond to +x+ and +y+.
   def self.maximal_empty_rectangles(points)
@@ -3,110 +3,13 @@ require 'must_be'
 require_relative 'shared'
 require_relative 'c_segment_tree_template'
 
-# The
-# max) on a arbitrary subarray of a given array.
+# The underlying functionality of the Segment Tree data type, implemented in C as a Ruby extension.
 #
-#
-# Wikipedia article (https://en.wikipedia.org/wiki/Segment_tree) appears to describe a different data structure which is sometimes
-# called an "interval tree."
-#
-# For more details (and some close-to-metal analysis of run time, especially for large datasets) see
-# https://en.algorithmica.org/hpc/data-structures/segment-trees/. In particular, this shows how to do a bottom-up implementation,
-# which is faster, at least for large datasets and cache-relevant compiled code. These issues don't really apply to code written in
-# Ruby.
-#
-# This is a generic implementation, intended to allow easy configuration for concrete instances. See the parameters to the
-# initializer and the definitions of concrete realisations like MaxValSegmentTree.
-#
-# We do O(n) work to build the internal data structure at initialization. Then we answer queries in O(log n) time.
+# See SegmentTreeTemplate for more information.
 class DataStructuresRMolinari::CSegmentTreeTemplate
-
-  # Construct a concrete instance of a Segment Tree. See details at the links above for the underlying concepts here.
-  # @param combine a lambda that takes two values and munges them into a combined value.
-  # - For example, if we are calculating sums over subintervals, combine.call(a, b) = a + b, while if we are doing maxima we will
-  # return max(a, b).
-  # - Things get more complicated when we are calculating, say, the _index_ of the maximal value in a subinterval. Now it is not
-  # enough simply to store that index at each tree node, because to combine the indices from two child nodes we need to know
-  # both the index of the maximal element in each child node's interval, but also the maximal values themselves, so we know
-  # which one "wins" for the parent node. This affects the sort of work we need to do when combining and the value provided by
-  # the +single_cell_array_val+ lambda.
-  # @param single_cell_array_val a lambda that takes an index i and returns the value we need to store in the #build
-  # operation for the subinterval i..i.
-  # - This will often simply be the value data[i], but in some cases it will be something else. For example, when we are
-  # calculating the index of the maximal value on each subinterval we need [i, data[i]] here.
-  # - If +update_at+ is called later, this lambda must close over the underlying data in a way that captures the updated value.
-  # @param size the size of the underlying data array, used in certain internal arithmetic.
-  # @param identity the value to return when we are querying on an empty interval
-  # - for sums, this will be zero; for maxima, this will be -Infinity, etc
+  # (see SegmentTreeTemplate::initialize)
   def initialize(combine:, single_cell_array_val:, size:, identity:)
     # having sorted out the keyword arguments, pass them more easily to the C layer.
     c_initialize(combine, single_cell_array_val, size, identity)
   end
 end
-
-# A segment tree that for an array A(0...n) answers questions of the form "what is the maximum value in the subinterval A(i..j)?"
-# in O(log n) time.
-#
-# C version
-#
-# TODO: share the definition with (non-C) MasValSegmentTree. The only difference is the class of the underlying segment tree
-# template.
-module DataStructuresRMolinari
-  class CMaxValSegmentTree
-    extend Forwardable
-
-    # Tell the tree that the value at idx has changed
-    def_delegator :@structure, :update_at
-
-    # @param data an object that contains values at integer indices based at 0, via +data[i]+.
-    # - This will usually be an Array, but it could also be a hash or a proc.
-    def initialize(data)
-      @structure = CSegmentTreeTemplate.new(
-        combine: ->(a, b) { [a, b].max },
-        single_cell_array_val: ->(i) { data[i] },
-        size: data.size,
-        identity: -Shared::INFINITY
-      )
-    end
-
-    # The maximum value in A(i..j).
-    #
-    # The arguments must be integers in 0...(A.size)
-    # @return the largest value in A(i..j) or -Infinity if i > j.
-    def max_on(i, j)
-      @structure.query_on(i, j)
-    end
-  end
-
-  # A segment tree that for an array A(0...n) answers questions of the form "what is the index of the maximal value in the
-  # subinterval A(i..j)?" in O(log n) time.
-  #
-  # C version
-  class CIndexOfMaxValSegmentTree
-    extend Forwardable
-
-    # Tell the tree that the value at idx has changed
-    def_delegator :@structure, :update_at
-
-    # @param (see MaxValSegmentTree#initialize)
-    def initialize(data)
-      @structure = CSegmentTreeTemplate.new(
-        combine: ->(p1, p2) { p1[1] >= p2[1] ? p1 : p2 },
-        single_cell_array_val: ->(i) { [i, data[i]] },
-        size: data.size,
-        identity: nil
-      )
-    end
-
-    # The index of the maximum value in A(i..j)
-    #
-    # The arguments must be integers in 0...(A.size)
-    # @return (Integer, nil) the index of the largest value in A(i..j) or +nil+ if i > j.
-    # - If there is more than one entry with that value, return one the indices. There is no guarantee as to which one.
-    # - Return +nil+ if i > j
-    def index_of_max_val_on(i, j)
-      @structure.query_on(i, j)&.first # discard the value part of the pair, which is a bookkeeping
-    end
-  end
-
-end
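Both the old `c_segment_tree_template_impl.rb` and the new `segment_tree.rb` keep a small Ruby `initialize` that accepts keyword arguments and forwards them positionally to `c_initialize`, which the C hunk registers with `rb_define_method(cSegmentTreeTemplate, "c_initialize", segment_tree_init, 4)`. The sketch below imitates that split in pure Ruby so it runs without the extension; `FakeCBackedTemplate` and its `c_initialize` body are hypothetical stand-ins for the C side. It shows the practical payoff: Ruby itself rejects a missing keyword with an `ArgumentError` before anything reaches the fixed-arity positional layer.

```ruby
# Hypothetical stand-in for a C-backed class. In the gem, c_initialize is a C function registered with
# rb_define_method(cSegmentTreeTemplate, "c_initialize", segment_tree_init, 4); here it is plain Ruby.
class FakeCBackedTemplate
  attr_reader :size, :identity

  # Ruby-side constructor: the interpreter checks the keywords (a missing or unknown one raises
  # ArgumentError), then everything is forwarded positionally.
  def initialize(combine:, single_cell_array_val:, size:, identity:)
    c_initialize(combine, single_cell_array_val, size, identity)
  end

  # Plays the role of the 4-argument C function: positional arguments only, no keyword parsing.
  def c_initialize(combine, single_cell_array_val, size, identity)
    @combine = combine
    @single_cell_array_val = single_cell_array_val
    @size = size
    @identity = identity
  end
end

tpl = FakeCBackedTemplate.new(
  combine: ->(a, b) { a + b },
  single_cell_array_val: ->(i) { i },
  size: 3,
  identity: 0
)
tpl.size # => 3

begin
  FakeCBackedTemplate.new(combine: ->(a, b) { a + b }, size: 3, identity: 0) # single_cell_array_val omitted
rescue ArgumentError => e
  puts "rejected before reaching c_initialize: #{e.message}"
end
```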
@@ -0,0 +1,126 @@
+require_relative 'shared'
+
+# A namespace to hold the various bits and bobs related to the SegmentTree implementation
+module DataStructuresRMolinari::SegmentTree
+end
+
+require_relative 'segment_tree_template' # Ruby implementation of the generic API
+require_relative 'c_segment_tree_template' # C implementation of the generic API
+
+# Segment Tree: various concrete implementations
+#
+# There is an excellent description of the data structure at https://cp-algorithms.com/data_structures/segment_tree.html. The
+# Wikipedia article (https://en.wikipedia.org/wiki/Segment_tree) appears to describe a different data structure which is sometimes
+# called an "interval tree."
+#
+# For more details (and some close-to-metal analysis of run time, especially for large datasets) see
+# https://en.algorithmica.org/hpc/data-structures/segment-trees/. In particular, this shows how to do a bottom-up implementation,
+# which is faster, at least for large datasets and cache-relevant compiled code. These issues don't really apply to code written in
+# Ruby.
+#
+# Here we provide several concrete segment tree implementations built on top of the template (generic) versions. Each instance is
+# backed either by the pure Ruby SegmentTreeTemplate or its C-based sibling CSegmentTreeTemplate.
+module DataStructuresRMolinari
+  module SegmentTree
+    # A convenience method to construct a Segment Tree that, for a given array A(0...size), answers questions of the kind given by
+    # operation, using the template written in lang.
+    #
+    # - @param data: the array A.
+    #   - It must respond to +#size+ and to +#[]+ with non-negative integer arguments.
+    # - @param operation: a supported "style" of Segment Tree
+    #   - for now, must be one of these (but you can write your own concrete version)
+    #     - +:max+: implementing +max_on(i, j)+, returning the maximum value in A(i..j)
+    #     - +:index_of_max+: implementing +index_of_max_val_on(i, j)+, returning an index corresponding to the maximum value in
+    #       A(i..j).
+    # - @param lang: the language in which the underlying "template" is written
+    #   - +:c+ or +:ruby+
+    #   - the C version will run faster but for now may be buggier and harder to debug
+    module_function def construct(data, operation, lang)
+      operation.must_be_in [:max, :index_of_max]
+      lang.must_be_in [:ruby, :c]
+
+      klass = operation == :max ? MaxValSegmentTree : IndexOfMaxValSegmentTree
+      template = lang == :ruby ? SegmentTreeTemplate : CSegmentTreeTemplate
+
+      klass.new(template, data)
+    end
+
+    # A segment tree that for an array A(0...n) answers questions of the form "what is the maximum value in the subinterval A(i..j)?"
+    # in O(log n) time.
+    class MaxValSegmentTree
+      extend Forwardable
+
+      # Tell the tree that the value at idx has changed
+      def_delegator :@structure, :update_at
+
+      # @param template_klass the "template" class that provides the generic implementation of the Segment Tree functionality.
+      # @param data an object that contains values at integer indices based at 0, via +data[i]+.
+      #   - This will usually be an Array, but it could also be a Hash. It must be Enumerable and respond to +#size+.
+      def initialize(template_klass, data)
+        data.must_be_a Enumerable
+
+        @structure = template_klass.new(
+          combine: ->(a, b) { [a, b].max },
+          single_cell_array_val: ->(i) { data[i] },
+          size: data.size,
+          identity: -Shared::INFINITY
+        )
+      end
+
+      # The maximum value in A(i..j).
+      #
+      # The arguments must be integers in 0...(A.size)
+      # @return the largest value in A(i..j) or -Infinity if i > j.
+      def max_on(i, j)
+        @structure.query_on(i, j)
+      end
+    end
+
+    # A segment tree that for an array A(0...n) answers questions of the form "what is the index of the maximal value in the
+    # subinterval A(i..j)?" in O(log n) time.
+    class IndexOfMaxValSegmentTree
+      extend Forwardable
+
+      # Tell the tree that the value at idx has changed
+      def_delegator :@structure, :update_at
+
+      # @param (see MaxValSegmentTree#initialize)
+      def initialize(template_klass, data)
+        data.must_be_a Enumerable
+
+        @structure = template_klass.new(
+          combine: ->(p1, p2) { p1[1] >= p2[1] ? p1 : p2 },
+          single_cell_array_val: ->(i) { [i, data[i]] },
+          size: data.size,
+          identity: nil
+        )
+      end
+
+      # The index of the maximum value in A(i..j)
+      #
+      # The arguments must be integers in 0...(A.size)
+      # @return (Integer, nil) the index of the largest value in A(i..j) or +nil+ if i > j.
+      #   - If there is more than one entry with that value, return one of the indices. There is no guarantee as to which one.
+      #   - Return +nil+ if i > j
+      def index_of_max_val_on(i, j)
+        @structure.query_on(i, j)&.first # discard the value part of the pair, which is only bookkeeping
+      end
+    end
+
+    # The underlying functionality of the Segment Tree data type, implemented in C as a Ruby extension.
+    #
+    # See SegmentTreeTemplate for more information.
+    #
+    # Implementation note
+    #
+    # The functionality is entirely written in C. But we write the constructor in Ruby because keyword arguments are difficult to
+    # parse on the C side.
+    class CSegmentTreeTemplate
+      # (see SegmentTreeTemplate::initialize)
+      def initialize(combine:, single_cell_array_val:, size:, identity:)
+        # having sorted out the keyword arguments, pass them more easily to the C layer.
+        c_initialize(combine, single_cell_array_val, size, identity)
+      end
+    end
+  end
+end
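The two concrete classes in `segment_tree.rb` differ only in the lambdas and identity they hand to the template. A brute-force fold makes the contract of `query_on(i, j)` concrete: combine `single_cell_array_val` over `i..j`, or return `identity` when the interval is empty. The helper `brute_force_query` is hypothetical and not part of the gem; it costs O(j - i) per query instead of O(log n), but it can serve as a test oracle for a concrete tree. `Float::INFINITY` stands in for the gem's `Shared::INFINITY`.

```ruby
# Hypothetical test oracle (not part of the gem): what query_on(i, j) should return, computed naively by
# folding single_cell_array_val over i..j with combine. O(j - i) per query instead of O(log n).
def brute_force_query(combine:, single_cell_array_val:, identity:, i:, j:)
  return identity if i > j # empty interval

  (i..j).map { |k| single_cell_array_val.call(k) }.reduce { |acc, v| combine.call(acc, v) }
end

data = [1, -3, 5, 1, 5, -9]

# Same lambdas as MaxValSegmentTree; Float::INFINITY stands in for Shared::INFINITY.
max_args = {
  combine: ->(a, b) { [a, b].max },
  single_cell_array_val: ->(i) { data[i] },
  identity: -Float::INFINITY
}
brute_force_query(**max_args, i: 1, j: 4) # => 5
brute_force_query(**max_args, i: 2, j: 1) # => -Infinity

# Same lambdas as IndexOfMaxValSegmentTree: each cell is an [index, value] pair, so combine can
# compare values while carrying the index along.
index_args = {
  combine: ->(p1, p2) { p1[1] >= p2[1] ? p1 : p2 },
  single_cell_array_val: ->(i) { [i, data[i]] },
  identity: nil
}
brute_force_query(**index_args, i: 1, j: 4) # => [2, 5]
brute_force_query(**index_args, i: 1, j: 4)&.first # => 2
brute_force_query(**index_args, i: 3, j: 2)&.first # => nil
```

Because `combine` prefers its left argument on ties, this left-to-right fold returns the leftmost index of the maximum (2 rather than 4 here); the gem's documentation only promises some index of the maximum.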
@@ -1,7 +1,7 @@
 require_relative 'shared'
 
-#
-# max) on a arbitrary subarray of a given array.
+# A generic implementation of Segment Tree, which can be used for various interval-related purposes, like efficiently finding the
+# sum (or min or max) on an arbitrary subarray of a given array.
 #
 # There is an excellent description of the data structure at https://cp-algorithms.com/data_structures/segment_tree.html. The
 # Wikipedia article (https://en.wikipedia.org/wiki/Segment_tree) appears to describe a different data structure which is sometimes
@@ -16,7 +16,7 @@ require_relative 'shared'
 # initializer and the definitions of concrete realisations like MaxValSegmentTree.
 #
 # We do O(n) work to build the internal data structure at initialization. Then we answer queries in O(log n) time.
-class DataStructuresRMolinari::SegmentTreeTemplate
+class DataStructuresRMolinari::SegmentTree::SegmentTreeTemplate
   include Shared
   include Shared::BinaryTreeArithmetic
 
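The template's comments describe a generic structure configured through four initializer parameters (`combine`, `single_cell_array_val`, `size`, `identity`), with O(n) construction and O(log n) queries. Below is a minimal recursive sketch of such a template in the cp-algorithms style, plus a hypothetical `SumSegmentTree` that receives the template class the same way `MaxValSegmentTree` does in this release. `SketchSegmentTreeTemplate` is not the gem's `SegmentTreeTemplate` (whose internals, such as `Shared::BinaryTreeArithmetic`, are not shown here); only the keyword interface, `query_on`, and `update_at` are modelled on this diff.

```ruby
# Hypothetical minimal template (not the gem's SegmentTreeTemplate): the same four keyword arguments,
# with a recursive top-down tree stored in an array where node k has children 2k and 2k + 1.
class SketchSegmentTreeTemplate
  def initialize(combine:, single_cell_array_val:, size:, identity:)
    @combine = combine
    @single_cell_array_val = single_cell_array_val
    @size = size
    @identity = identity
    @tree = Array.new(4 * [size, 1].max)
    build(1, 0, size - 1) if size.positive? # O(n): each node is computed once
  end

  # Combined value over A(i..j), or the identity if i > j. Visits O(log n) nodes.
  def query_on(i, j)
    return @identity if i > j

    query(1, 0, @size - 1, i, j)
  end

  # Recompute the O(log n) nodes whose intervals contain idx, after A[idx] has changed.
  def update_at(idx)
    update(1, 0, @size - 1, idx)
  end

  private

  def build(node, left, right)
    if left == right
      @tree[node] = @single_cell_array_val.call(left)
    else
      mid = (left + right) / 2
      build(2 * node, left, mid)
      build(2 * node + 1, mid + 1, right)
      @tree[node] = @combine.call(@tree[2 * node], @tree[2 * node + 1])
    end
  end

  # Requires left <= i <= j <= right. Never combines with the identity, so identity: nil is safe.
  def query(node, left, right, i, j)
    return @tree[node] if i == left && j == right

    mid = (left + right) / 2
    return query(2 * node, left, mid, i, j) if j <= mid
    return query(2 * node + 1, mid + 1, right, i, j) if i > mid

    @combine.call(query(2 * node, left, mid, i, mid), query(2 * node + 1, mid + 1, right, mid + 1, j))
  end

  def update(node, left, right, idx)
    if left == right
      @tree[node] = @single_cell_array_val.call(idx)
    else
      mid = (left + right) / 2
      idx <= mid ? update(2 * node, left, mid, idx) : update(2 * node + 1, mid + 1, right, idx)
      @tree[node] = @combine.call(@tree[2 * node], @tree[2 * node + 1])
    end
  end
end

# Hypothetical concrete class, shaped like this release's MaxValSegmentTree (template class injected).
class SumSegmentTree
  def initialize(template_klass, data)
    @structure = template_klass.new(
      combine: ->(a, b) { a + b },
      single_cell_array_val: ->(i) { data[i] }, # closes over data, so update_at sees later changes
      size: data.size,
      identity: 0
    )
  end

  # Sum of A(i..j); 0 for an empty interval.
  def sum_on(i, j)
    @structure.query_on(i, j)
  end

  def update_at(idx)
    @structure.update_at(idx)
  end
end

data = [1, -3, 2, 1, 5, -9]
tree = SumSegmentTree.new(SketchSegmentTreeTemplate, data)
tree.sum_on(0, 2) # => 0
tree.sum_on(1, 4) # => 5
tree.sum_on(3, 2) # => 0
data[1] = 10
tree.update_at(1)
tree.sum_on(0, 2) # => 13
```

Passing the template class in, rather than hard-coding it, is what lets one concrete class sit on either the Ruby or the C template, which is the design choice behind `SegmentTree.construct`.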
@@ -14,77 +14,12 @@ require_relative 'data_structures_rmolinari/algorithms'
 require_relative 'data_structures_rmolinari/disjoint_union'
 require_relative 'data_structures_rmolinari/c_disjoint_union' # version as a C extension
 
-require_relative 'data_structures_rmolinari/
-require_relative 'data_structures_rmolinari/c_segment_tree_template_impl'
+require_relative 'data_structures_rmolinari/segment_tree'
 
 require_relative 'data_structures_rmolinari/heap'
 require_relative 'data_structures_rmolinari/max_priority_search_tree'
 require_relative 'data_structures_rmolinari/min_priority_search_tree'
 
 module DataStructuresRMolinari
-
-  # Concrete instances of Segment Tree
-  #
-  # @todo consider moving these into generic_segment_tree.rb and renaming that file
-
-  # A segment tree that for an array A(0...n) answers questions of the form "what is the maximum value in the subinterval A(i..j)?"
-  # in O(log n) time.
-  class MaxValSegmentTree
-    extend Forwardable
-
-    # Tell the tree that the value at idx has changed
-    def_delegator :@structure, :update_at
-
-    # @param data an object that contains values at integer indices based at 0, via +data[i]+.
-    # - This will usually be an Array, but it could also be a hash or a proc.
-    def initialize(data)
-      data.must_be_a Enumerable
-
-      @structure = SegmentTreeTemplate.new(
-        combine: ->(a, b) { [a, b].max },
-        single_cell_array_val: ->(i) { data[i] },
-        size: data.size,
-        identity: -Shared::INFINITY
-      )
-    end
-
-    # The maximum value in A(i..j).
-    #
-    # The arguments must be integers in 0...(A.size)
-    # @return the largest value in A(i..j) or -Infinity if i > j.
-    def max_on(i, j)
-      @structure.query_on(i, j)
-    end
-  end
-
-  # A segment tree that for an array A(0...n) answers questions of the form "what is the index of the maximal value in the
-  # subinterval A(i..j)?" in O(log n) time.
-  class IndexOfMaxValSegmentTree
-    extend Forwardable
-
-    # Tell the tree that the value at idx has changed
-    def_delegator :@structure, :update_at
-
-    # @param (see MaxValSegmentTree#initialize)
-    def initialize(data)
-      data.must_be_a Enumerable
-
-      @structure = SegmentTreeTemplate.new(
-        combine: ->(p1, p2) { p1[1] >= p2[1] ? p1 : p2 },
-        single_cell_array_val: ->(i) { [i, data[i]] },
-        size: data.size,
-        identity: nil
-      )
-    end
-
-    # The index of the maximum value in A(i..j)
-    #
-    # The arguments must be integers in 0...(A.size)
-    # @return (Integer, nil) the index of the largest value in A(i..j) or +nil+ if i > j.
-    # - If there is more than one entry with that value, return one the indices. There is no guarantee as to which one.
-    # - Return +nil+ if i > j
-    def index_of_max_val_on(i, j)
-      @structure.query_on(i, j)&.first # discard the value part of the pair, which is a bookkeeping
-    end
-  end
+  # Add things here if needed
 end
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: data_structures_rmolinari
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.5.0
 platform: ruby
 authors:
 - Rory Molinari
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2023-02-
+date: 2023-02-03 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: must_be
@@ -97,6 +97,7 @@ files:
 - lib/data_structures_rmolinari/heap.rb
 - lib/data_structures_rmolinari/max_priority_search_tree.rb
 - lib/data_structures_rmolinari/min_priority_search_tree.rb
+- lib/data_structures_rmolinari/segment_tree.rb
 - lib/data_structures_rmolinari/segment_tree_template.rb
 - lib/data_structures_rmolinari/shared.rb
 homepage: https://github.com/rmolinari/data_structures