data_structures_rmolinari 0.4.4 → 0.5.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +11 -0
- data/README.md +96 -29
- data/ext/c_segment_tree_template/segment_tree_template.c +2 -1
- data/lib/data_structures_rmolinari/algorithms.rb +5 -5
- data/lib/data_structures_rmolinari/c_segment_tree_template_impl.rb +3 -100
- data/lib/data_structures_rmolinari/disjoint_union.rb +2 -0
- data/lib/data_structures_rmolinari/segment_tree.rb +126 -0
- data/lib/data_structures_rmolinari/segment_tree_template.rb +3 -3
- data/lib/data_structures_rmolinari.rb +2 -67
- metadata +3 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 7682f6d3b0779f347ce0797f55f33b9d7dcc7bd9c2039fc2fd6f865eb72e085a
|
4
|
+
data.tar.gz: d717e5e36f79ddc4ecb605a59b475b7114359dea7476445590deb300f7915bd4
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: c3ffd9a4f67f55b7a2df1c949cf2288c06fcae416d5ff03a10307a1b79c3dae1daa74e2576d5e190c989adeea47b046426fad8c3c64199aadf22ba500b317f36
|
7
|
+
data.tar.gz: 8380d6117f2955da9362395f8315f5121b4f7afba2f69aabb1981a01b675cbbed81d07c10b5745409080c2588c92df3d676ec36efa128571234a74dceef0e20d
|
data/CHANGELOG.md
CHANGED
@@ -2,6 +2,17 @@
|
|
2
2
|
|
3
3
|
## [Unreleased]
|
4
4
|
|
5
|
+
## [0.5.0] 2023-02-03
|
6
|
+
|
7
|
+
- SegmentTree
|
8
|
+
- Reorganize the code into a SegmentTree submodule.
|
9
|
+
- Provide a convenience method for getting concrete instances.
|
10
|
+
|
11
|
+
- README.md
|
12
|
+
- Add some simple example code for the data types.
|
13
|
+
|
14
|
+
## [0.4.4] 2023-02-02
|
15
|
+
|
5
16
|
- Disjoint Union
|
6
17
|
- C extension: use Convenient Containers rather than my janky Dynamic Array attempt.
|
7
18
|
|
data/README.md
CHANGED
@@ -14,18 +14,6 @@ The code is available as a gem: https://rubygems.org/gems/data_structures_rmolin
|
|
14
14
|
The right way to organize the code is not obvious to me. For now the data structures are all defined in the module
|
15
15
|
`DataStructuresRMolinari` to avoid polluting the global namespace.
|
16
16
|
|
17
|
-
Example usage after the gem is installed:
|
18
|
-
```
|
19
|
-
require 'data_structures_rmolinari`
|
20
|
-
|
21
|
-
# Pull what we need out of the namespace
|
22
|
-
MaxPrioritySearchTree = DataStructuresRMolinari::MaxPrioritySearchTree
|
23
|
-
Point = DataStructuresRMolinari::Point # anything responding to :x and :y is fine
|
24
|
-
|
25
|
-
pst = MaxPrioritySearchTree.new([Point.new(1, 1)])
|
26
|
-
puts pst.largest_y_in_ne(0, 0) # "Point(1,1)"
|
27
|
-
```
|
28
|
-
|
29
17
|
# Implementations
|
30
18
|
|
31
19
|
## Disjoint Union
|
@@ -42,6 +30,23 @@ It also provides
|
|
42
30
|
For more details see https://en.wikipedia.org/wiki/Disjoint-set_data_structure and the paper [[TvL1984]](#references) by Tarjan and
|
43
31
|
van Leeuwen.
|
44
32
|
|
33
|
+
``` ruby
|
34
|
+
require 'data_structures_rmolinari'
|
35
|
+
DisjointUnion = DataStructuresRMolinari::DisjointUnion
|
36
|
+
|
37
|
+
# Create an instance over the "universe" 0, 1, ..., 9.
|
38
|
+
du = DisjointUnion.new(10)
|
39
|
+
du.subset_count # => 10; each element starts out in its own subset
|
40
|
+
|
41
|
+
du.unite(2, 3) # say that 2 and 3 are actually in the same subset
|
42
|
+
du.subset_count # => 9
|
43
|
+
du.find(2) == du.find(3) # => true
|
44
|
+
|
45
|
+
du.unite(4, 5)
|
46
|
+
du.unite(3, 4) # now 2, 3, 4, and 5 are all in the same subset
|
47
|
+
du.subset_count # => 7
|
48
|
+
```
|
49
|
+
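To make the behaviour in the README example above concrete, here is a minimal, self-contained sketch of a disjoint-set forest with union by rank and path compression. The class name `SketchDisjointUnion` is hypothetical and this is not the gem's implementation; it only mirrors the `unite`, `find`, and `subset_count` calls used in the example. Treating `unite` on two elements that are already in the same subset as a no-op is an assumption of the sketch.

```ruby
# Hypothetical stand-in for illustration only; not the gem's DisjointUnion.
class SketchDisjointUnion
  attr_reader :subset_count

  # Universe is 0, 1, ..., size - 1; each element starts in its own subset.
  def initialize(size)
    @parent = Array.new(size) { |i| i }
    @rank = Array.new(size, 0)
    @subset_count = size
  end

  # Representative of the subset containing e, with path compression.
  def find(e)
    root = e
    root = @parent[root] until @parent[root] == root

    # Second pass: point every node on the path directly at the root.
    while @parent[e] != root
      nxt = @parent[e]
      @parent[e] = root
      e = nxt
    end
    root
  end

  # Merge the subsets containing a and b (assumption: no-op if already merged).
  def unite(a, b)
    ra = find(a)
    rb = find(b)
    return if ra == rb

    # Union by rank: hang the shallower tree under the deeper one.
    ra, rb = rb, ra if @rank[ra] < @rank[rb]
    @parent[rb] = ra
    @rank[ra] += 1 if @rank[ra] == @rank[rb]
    @subset_count -= 1
  end
end
```

Run against the same sequence of calls as the README example, the sketch reports 10, 9, and 7 subsets at the corresponding steps.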
|
45
50
|
## Heap
|
46
51
|
|
47
52
|
This is a standard binary heap with an `update` method, suitable for use as a priority queue. There are several supported
|
@@ -63,6 +68,24 @@ allows the insertion of duplicate items (which is sometimes useful) and slightly
|
|
63
68
|
|
64
69
|
See https://en.wikipedia.org/wiki/Binary_heap and the paper by Edelkamp, Elmasry, and Katajainen [[EEK2017]](#references).
|
65
70
|
|
71
|
+
``` ruby
|
72
|
+
require 'data_structures_rmolinari'
|
73
|
+
Heap = DataStructuresRMolinari::Heap
|
74
|
+
|
75
|
+
data = [4, 3, 2, 1]
|
76
|
+
|
77
|
+
heap = Heap.new
|
78
|
+
|
79
|
+
# Insert the elements of data, each with itself as priority.
|
80
|
+
data.each { |v| heap.insert(v, v) }
|
81
|
+
|
82
|
+
heap.top # => 1, since we have a min-heap.
|
83
|
+
heap.pop # => 1
|
84
|
+
heap.top # => 2; with 1 gone, this is the element with least priority
|
85
|
+
heap.update(3, -3)
|
86
|
+
heap.top # => 3; now 3 is the element with least priority
|
87
|
+
```
|
88
|
+
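The `update` call in the example above is what separates this heap from a plain binary heap: the structure has to locate an item before it can re-prioritise it. A minimal sketch of one common way to do that, keeping a hash from item to array position, follows. The class `SketchMinHeap` is hypothetical and is not the gem's `Heap`; in particular, rejecting duplicate items is an assumption of the sketch.

```ruby
# Hypothetical stand-in for illustration only; not the gem's Heap.
class SketchMinHeap
  def initialize
    @items = []     # [priority, item] pairs stored in binary-heap order
    @index_of = {}  # item => its current position in @items
  end

  def insert(item, priority)
    raise ArgumentError, "duplicate item #{item.inspect}" if @index_of.key?(item)

    @items << [priority, item]
    @index_of[item] = @items.size - 1
    sift_up(@items.size - 1)
  end

  # The item with the smallest priority, or nil if the heap is empty.
  def top
    @items.first&.last
  end

  # Remove and return the item with the smallest priority.
  def pop
    return nil if @items.empty?

    swap(0, @items.size - 1)
    _priority, item = @items.pop
    @index_of.delete(item)
    sift_down(0) unless @items.empty?
    item
  end

  # Change the priority of an item already in the heap.
  def update(item, new_priority)
    idx = @index_of.fetch(item)
    @items[idx][0] = new_priority
    sift_up(idx)
    sift_down(@index_of[item])
  end

  private

  def swap(i, j)
    @items[i], @items[j] = @items[j], @items[i]
    @index_of[@items[i][1]] = i
    @index_of[@items[j][1]] = j
  end

  def sift_up(idx)
    while idx.positive?
      parent = (idx - 1) / 2
      break if @items[parent][0] <= @items[idx][0]

      swap(parent, idx)
      idx = parent
    end
  end

  def sift_down(idx)
    loop do
      smallest = idx
      [2 * idx + 1, 2 * idx + 2].each do |child|
        smallest = child if child < @items.size && @items[child][0] < @items[smallest][0]
      end
      break if smallest == idx

      swap(idx, smallest)
      idx = smallest
    end
  end
end
```

The position map lets `update` find an item in constant expected time, after which restoring heap order costs O(log n).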
|
66
89
|
## Priority Search Tree
|
67
90
|
|
68
91
|
A PST stores a set P of two-dimensional points in a way that allows certain queries about P to be answered efficiently. The data
|
@@ -96,13 +119,27 @@ regions.
|
|
96
119
|
By default these data structures are immutable: once constructed they cannot be changed. But there is a constructor option that
|
97
120
|
makes the instance "dynamic". This allows us to delete the element at the root of the tree - the one with largest y value (smallest
|
98
121
|
for MinPST) - with the `delete_top!` method. This operation is important in certain algorithms, such as enumerating all maximal
|
99
|
-
empty rectangles (see the second paper by De et al
|
122
|
+
empty rectangles (see the second paper by De et al. [[DMNS2013]](#references)). Note that points still cannot be added to the PST in
|
100
123
|
any case, and choosing the dynamic option makes certain internal bookkeeping operations slower.
|
101
124
|
|
102
125
|
In [[DMNS2013]](#references) De et al. generalize the in-place structure to a _Min-max Priority Search Tree_ (MinmaxPST) that can
|
103
126
|
answer queries in all four quadrants and both "kinds" of 3-sided boxes. Having one of these would save the trouble of constructing
|
104
127
|
both a MaxPST and MinPST. But the presentation is hard to follow in places and the paper's pseudocode is buggy.[^minmaxpst]
|
105
128
|
|
129
|
+
``` ruby
|
130
|
+
require 'data_structures_rmolinari'
|
131
|
+
MaxPST = DataStructuresRMolinari::MaxPrioritySearchTree
|
132
|
+
Point = Shared::Point # simple (x, y) struct. Anything responding to #x and #y will work
|
133
|
+
|
134
|
+
data = [Point.new(0, 0), Point.new(1, 2), Point.new(2, 1)]
|
135
|
+
pst = MaxPST.new(data)
|
136
|
+
|
137
|
+
pst.largest_y_in_ne(0, 0) # => #<struct Shared::Point x=1, y=2>
|
138
|
+
pst.largest_y_in_ne(1, 1) # => #<struct Shared::Point x=1, y=2>
|
139
|
+
pst.largest_y_in_ne(1.5, 1) # => #<struct Shared::Point x=2, y=1>
|
140
|
+
pst.largest_y_in_3_sided(-0.5, 0.5, 0) # => #<struct Shared::Point x=0, y=0>
|
141
|
+
```
|
142
|
+
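As a reading aid for the queries in the example above, the following brute-force reference spells out what the two calls are answering. It is a linear scan, not a priority search tree, and the helper names and `SketchPoint` struct are hypothetical. The closed (inclusive) boundaries are an assumption inferred from the example outputs, for instance `largest_y_in_ne(1.5, 1)` returning the point with y = 1.

```ruby
# Hypothetical helpers for illustration only; O(n) scans, not the gem's PST.
SketchPoint = Struct.new(:x, :y)

# Highest point in the "north-east" quadrant x >= x0, y >= y0 (nil if none).
def brute_largest_y_in_ne(points, x0, y0)
  points.select { |p| p.x >= x0 && p.y >= y0 }.max_by(&:y)
end

# Highest point in the three-sided box x0 <= x <= x1, y >= y0 (nil if none).
def brute_largest_y_in_3_sided(points, x0, x1, y0)
  points.select { |p| p.x.between?(x0, x1) && p.y >= y0 }.max_by(&:y)
end

sketch_points = [SketchPoint.new(0, 0), SketchPoint.new(1, 2), SketchPoint.new(2, 1)]
p brute_largest_y_in_ne(sketch_points, 1.5, 1)
p brute_largest_y_in_3_sided(sketch_points, -0.5, 0.5, 0)
```

A scan like this is handy as a test oracle when checking a real PST on small inputs.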
|
106
143
|
## Segment Tree
|
107
144
|
|
108
145
|
A segment tree stores information related to subintervals of a certain array. For example, a segment tree can be used to find the
|
@@ -112,11 +149,37 @@ arbitrary subarrays.
|
|
112
149
|
|
113
150
|
An excellent description of the idea is found at https://cp-algorithms.com/data_structures/segment_tree.html.
|
114
151
|
|
115
|
-
Generic code is provided in `SegmentTreeTemplate
|
116
|
-
|
117
|
-
|
152
|
+
Generic code is provided in `SegmentTree::SegmentTreeTemplate` and its equivalent (and faster) C-based sibling,
|
153
|
+
`SegmentTree::CSegmentTreeTemplate` (see [below](#c-extensions)).
|
154
|
+
|
155
|
+
Writing a concrete segment tree class just means providing some simple lambdas and constants to the template class's
|
156
|
+
initializer. Figuring out the details requires some knowledge of the internal mechanisms of a segment tree, for which the link at
|
157
|
+
cp-algorithms.com is very helpful. See the implementations of the concrete classes `MaxValSegmentTree` and
|
118
158
|
`IndexOfMaxValSegmentTree` for examples.
|
119
159
|
|
160
|
+
Since there are several concrete "types" and two underlying generic implementations there is a convenience method on the `SegmentTree`
|
161
|
+
module to get instances.
|
162
|
+
|
163
|
+
``` ruby
|
164
|
+
require 'data_structures_rmolinari'
|
165
|
+
SegmentTree = DataStructuresRMolinari::SegmentTree # namespace module
|
166
|
+
|
167
|
+
data = [1, -3, 2, 1, 5, -9]
|
168
|
+
|
169
|
+
# Get a segment tree instance that will answer "max over this subinterval" questions about data.
|
170
|
+
# Here we get one using the ruby implementation of the generic functionality.
|
171
|
+
#
|
172
|
+
# We offer :index_of_max as an alternative to :max. This will construct an instance that answers
|
173
|
+
# questions of the form "an index of the maximum value over this subinterval".
|
174
|
+
#
|
175
|
+
# To use the version written in C, put :c instead of :ruby.
|
176
|
+
seg_tree = SegmentTree.construct(data, :max, :ruby)
|
177
|
+
|
178
|
+
seg_tree.max_on(0, 2) # => 2
|
179
|
+
seg_tree.max_on(1, 4) # => 5
|
180
|
+
# ..etc..
|
181
|
+
```
|
182
|
+
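The README says a concrete segment tree is obtained by handing "some simple lambdas and constants" to the template's initializer. The sketch below shows roughly how such a template can work: a top-down recursive tree that accepts the same four keyword arguments (`combine:`, `single_cell_array_val:`, `size:`, `identity:`) and exposes `query_on` and `update_at`. `SketchSegmentTree` is a hypothetical stand-in written for this illustration; the gem's own templates may be organised quite differently internally.

```ruby
# Hypothetical stand-in for illustration only; not the gem's SegmentTreeTemplate.
class SketchSegmentTree
  def initialize(combine:, single_cell_array_val:, size:, identity:)
    @combine = combine
    @single_cell_array_val = single_cell_array_val
    @size = size
    @identity = identity
    @tree = []
    build(0, 0, size - 1) if size.positive?
  end

  # Combined value over the closed index range left..right.
  def query_on(left, right)
    return @identity if left > right

    query(0, 0, @size - 1, left, right)
  end

  # Tell the tree that the underlying value at idx has changed.
  def update_at(idx)
    update(0, 0, @size - 1, idx)
  end

  private

  def build(node, lo, hi)
    if lo == hi
      @tree[node] = @single_cell_array_val.call(lo)
    else
      mid = (lo + hi) / 2
      build(2 * node + 1, lo, mid)
      build(2 * node + 2, mid + 1, hi)
      @tree[node] = @combine.call(@tree[2 * node + 1], @tree[2 * node + 2])
    end
  end

  def query(node, lo, hi, left, right)
    return @tree[node] if left == lo && right == hi

    mid = (lo + hi) / 2
    if right <= mid
      query(2 * node + 1, lo, mid, left, right)
    elsif left > mid
      query(2 * node + 2, mid + 1, hi, left, right)
    else
      @combine.call(query(2 * node + 1, lo, mid, left, mid),
                    query(2 * node + 2, mid + 1, hi, mid + 1, right))
    end
  end

  def update(node, lo, hi, idx)
    if lo == hi
      @tree[node] = @single_cell_array_val.call(lo)
    else
      mid = (lo + hi) / 2
      idx <= mid ? update(2 * node + 1, lo, mid, idx) : update(2 * node + 2, mid + 1, hi, idx)
      @tree[node] = @combine.call(@tree[2 * node + 1], @tree[2 * node + 2])
    end
  end
end

sketch_data = [1, -3, 2, 1, 5, -9]
max_tree = SketchSegmentTree.new(
  combine: ->(a, b) { [a, b].max },
  single_cell_array_val: ->(i) { sketch_data[i] },
  size: sketch_data.size,
  identity: -Float::INFINITY
)
puts max_tree.query_on(0, 2)
puts max_tree.query_on(1, 4)
```

Because `single_cell_array_val` closes over the data array, changing an entry and then calling `update_at` with its index is enough to refresh the affected nodes.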
|
120
183
|
## Algorithms
|
121
184
|
|
122
185
|
The Algorithms submodule contains some algorithms using the data structures.
|
@@ -130,13 +193,12 @@ The Algorithms submodule contains some algorithms using the data structures.
|
|
130
193
|
|
131
194
|
# C Extensions
|
132
195
|
|
133
|
-
As another learning process I have implemented several of these data structures as C extensions. The
|
134
|
-
and they can be required like their pure Ruby versions. They have the same APIs as their Ruby cousins.
|
196
|
+
As another learning process I have implemented several of these data structures as C extensions. The APIs are the same.
|
135
197
|
|
136
198
|
## Disjoint Union
|
137
199
|
|
138
|
-
A benchmark suggests that a long sequence of `unite` operations is about 3 times as fast
|
139
|
-
`DisjointUnion`.
|
200
|
+
The C version is called `CDisjointUnion`. A benchmark suggests that a long sequence of `unite` operations is about 3 times as fast
|
201
|
+
with `CDisjointUnion` as with `DisjointUnion`.
|
140
202
|
|
141
203
|
The implementation uses the remarkable Convenient Containers library from Jackson Allan [[Allan]](#references).
|
142
204
|
|
@@ -145,16 +207,21 @@ The implementation uses the remarkable Convenient Containers library from Jackso
|
|
145
207
|
`CSegmentTreeTemplate` is the C implementation of the generic class. Concrete classes are built on top of this in Ruby, just as with
|
146
208
|
the pure Ruby `SegmentTreeTemplate` class.
|
147
209
|
|
148
|
-
A benchmark suggests that a long sequence of `max_on` operations against a max-val Segment Tree is about 4 times as fast with
|
149
|
-
|
150
|
-
|
210
|
+
A benchmark suggests that a long sequence of `max_on` operations against a max-val Segment Tree is about 4 times as fast with C as
|
211
|
+
with Ruby. I'm a bit surprised the improvement isn't larger, but remember that the C code must still interact with the Ruby objects in
|
212
|
+
the underlying data array, and must combine them, etc., via Ruby lambdas.
|
151
213
|
|
152
214
|
# References
|
153
|
-
- [Allan] Allan, J., _CC: Convenient Containers_, https://github.com/JacksonAllan/CC, retrieved 2023-02-01.
|
154
|
-
- [TvL1984] Tarjan, Robert E., van Leeuwen, J., _Worst-case Analysis of Set Union Algorithms_, Journal of the ACM, v31:2 (1984), pp
|
155
|
-
|
156
|
-
- [
|
157
|
-
-
|
158
|
-
- [
|
215
|
+
- [Allan] Allan, J., _CC: Convenient Containers_, https://github.com/JacksonAllan/CC, (retrieved 2023-02-01).
|
216
|
+
- [TvL1984] Tarjan, Robert E., van Leeuwen, J., _Worst-case Analysis of Set Union Algorithms_, Journal of the ACM, v31:2 (1984), pp
|
217
|
+
245–281, https://dl.acm.org/doi/10.1145/62.2160 (retrieved 2022-02-01).
|
218
|
+
- [EEK2017] Edelkamp, S., Elmasry, A., Katajainen, J., _Optimizing Binary Heaps_, Theory Comput Syst (2017), vol 61, pp 606-636, DOI
|
219
|
+
10.1007/s00224-017-9760-2, https://kclpure.kcl.ac.uk/portal/files/87388857/TheoryComputingSzstems.pdf (retrieved 2022-02-02).
|
220
|
+
- [McC1985] McCreight, E. M., _Priority Search Trees_, SIAM J. Comput., 14(2):257-276, 1985,
|
221
|
+
http://www.cs.duke.edu/courses/fall08/cps234/handouts/SMJ000257.pdf (retrieved 2023-02-02).
|
222
|
+
- [DMNS2011] De, M., Maheshwari, A., Nandy, S. C., Smid, M., _An In-Place Priority Search Tree_, 23rd Canadian Conference on
|
223
|
+
Computational Geometry, 2011, http://www.cs.carleton.ca/~michiel/inplace_pst.pdf (retrieved 2023-02-02).
|
224
|
+
- [DMNS2013] De, M., Maheshwari, A., Nandy, S. C., Smid, M., _An In-Place Min-max Priority Search Tree_, Computational Geometry, v46
|
225
|
+
(2013), pp 310-327, https://people.scs.carleton.ca/~michiel/MinMaxPST.pdf (retrieved 2023-02-02).
|
159
226
|
|
160
227
|
[^minmaxpst]: See the comments in the fragmentary class `MinMaxPrioritySearchTree` for further details.
|
@@ -353,7 +353,8 @@ static VALUE segment_tree_update_at(VALUE self, VALUE idx) {
|
|
353
353
|
* (see SegmentTreeTemplate)
|
354
354
|
*/
|
355
355
|
void Init_c_segment_tree_template() {
|
356
|
-
VALUE
|
356
|
+
VALUE mSegmentTree = rb_define_module_under(mDataStructuresRMolinari, "SegmentTree");
|
357
|
+
VALUE cSegmentTreeTemplate = rb_define_class_under(mSegmentTree, "CSegmentTreeTemplate", rb_cObject);
|
357
358
|
|
358
359
|
rb_define_alloc_func(cSegmentTreeTemplate, segment_tree_alloc);
|
359
360
|
rb_define_method(cSegmentTreeTemplate, "c_initialize", segment_tree_init, 4);
|
@@ -1,4 +1,4 @@
|
|
1
|
-
#
|
1
|
+
# Algorithms that use the module's data structures but don't belong as a method on one of the data structures
|
2
2
|
module DataStructuresRMolinari::Algorithms
|
3
3
|
include Shared
|
4
4
|
|
@@ -11,12 +11,12 @@ module DataStructuresRMolinari::Algorithms
|
|
11
11
|
#
|
12
12
|
# A _maximal empty rectangle_ (MER) for P is an empty rectangle for P not properly contained in any other.
|
13
13
|
#
|
14
|
-
# We enumerate all maximal empty rectangles for P, yielding each as (left, right, bottom, top)
|
15
|
-
#
|
16
|
-
#
|
14
|
+
# We enumerate all maximal empty rectangles for P, yielding each as (left, right, bottom, top). The algorithm is due to De, M.,
|
15
|
+
# Maheshwari, A., Nandy, S. C., Smid, M., _An In-Place Min-max Priority Search Tree_, Computational Geometry, v46 (2013), pp
|
16
|
+
# 310-327.
|
17
17
|
#
|
18
18
|
# It runs in O(m log n) time, where m is the number of MERs enumerated and n is the number of points in P. (Constructing the
|
19
|
-
# MaxPST
|
19
|
+
# MaxPST takes O(n log^2 n) time, but m = O(n^2) so we are still O(m log n) overall.)
|
20
20
|
#
|
21
21
|
# @param points [Array] an array of points in the x-y plane. Each must respond to +x+ and +y+.
|
22
22
|
def self.maximal_empty_rectangles(points)
|
@@ -3,110 +3,13 @@ require 'must_be'
|
|
3
3
|
require_relative 'shared'
|
4
4
|
require_relative 'c_segment_tree_template'
|
5
5
|
|
6
|
-
# The
|
7
|
-
# max) on a arbitrary subarray of a given array.
|
6
|
+
# The underlying functionality of the Segment Tree data type, implemented in C as a Ruby extension.
|
8
7
|
#
|
9
|
-
#
|
10
|
-
# Wikipedia article (https://en.wikipedia.org/wiki/Segment_tree) appears to describe a different data structure which is sometimes
|
11
|
-
# called an "interval tree."
|
12
|
-
#
|
13
|
-
# For more details (and some close-to-metal analysis of run time, especially for large datasets) see
|
14
|
-
# https://en.algorithmica.org/hpc/data-structures/segment-trees/. In particular, this shows how to do a bottom-up implementation,
|
15
|
-
# which is faster, at least for large datasets and cache-relevant compiled code. These issues don't really apply to code written in
|
16
|
-
# Ruby.
|
17
|
-
#
|
18
|
-
# This is a generic implementation, intended to allow easy configuration for concrete instances. See the parameters to the
|
19
|
-
# initializer and the definitions of concrete realisations like MaxValSegmentTree.
|
20
|
-
#
|
21
|
-
# We do O(n) work to build the internal data structure at initialization. Then we answer queries in O(log n) time.
|
8
|
+
# See SegmentTreeTemplate for more information.
|
22
9
|
class DataStructuresRMolinari::CSegmentTreeTemplate
|
23
|
-
|
24
|
-
# Construct a concrete instance of a Segment Tree. See details at the links above for the underlying concepts here.
|
25
|
-
# @param combine a lambda that takes two values and munges them into a combined value.
|
26
|
-
# - For example, if we are calculating sums over subintervals, combine.call(a, b) = a + b, while if we are doing maxima we will
|
27
|
-
# return max(a, b).
|
28
|
-
# - Things get more complicated when we are calculating, say, the _index_ of the maximal value in a subinterval. Now it is not
|
29
|
-
# enough simply to store that index at each tree node, because to combine the indices from two child nodes we need to know
|
30
|
-
# both the index of the maximal element in each child node's interval, but also the maximal values themselves, so we know
|
31
|
-
# which one "wins" for the parent node. This affects the sort of work we need to do when combining and the value provided by
|
32
|
-
# the +single_cell_array_val+ lambda.
|
33
|
-
# @param single_cell_array_val a lambda that takes an index i and returns the value we need to store in the #build
|
34
|
-
# operation for the subinterval i..i.
|
35
|
-
# - This will often simply be the value data[i], but in some cases it will be something else. For example, when we are
|
36
|
-
# calculating the index of the maximal value on each subinterval we need [i, data[i]] here.
|
37
|
-
# - If +update_at+ is called later, this lambda must close over the underlying data in a way that captures the updated value.
|
38
|
-
# @param size the size of the underlying data array, used in certain internal arithmetic.
|
39
|
-
# @param identity the value to return when we are querying on an empty interval
|
40
|
-
# - for sums, this will be zero; for maxima, this will be -Infinity, etc
|
10
|
+
# (see SegmentTreeTemplate::initialize)
|
41
11
|
def initialize(combine:, single_cell_array_val:, size:, identity:)
|
42
12
|
# having sorted out the keyword arguments, pass them more easily to the C layer.
|
43
13
|
c_initialize(combine, single_cell_array_val, size, identity)
|
44
14
|
end
|
45
15
|
end
|
46
|
-
|
47
|
-
# A segment tree that for an array A(0...n) answers questions of the form "what is the maximum value in the subinterval A(i..j)?"
|
48
|
-
# in O(log n) time.
|
49
|
-
#
|
50
|
-
# C version
|
51
|
-
#
|
52
|
-
# TODO: share the definition with (non-C) MasValSegmentTree. The only difference is the class of the underlying segment tree
|
53
|
-
# template.
|
54
|
-
module DataStructuresRMolinari
|
55
|
-
class CMaxValSegmentTree
|
56
|
-
extend Forwardable
|
57
|
-
|
58
|
-
# Tell the tree that the value at idx has changed
|
59
|
-
def_delegator :@structure, :update_at
|
60
|
-
|
61
|
-
# @param data an object that contains values at integer indices based at 0, via +data[i]+.
|
62
|
-
# - This will usually be an Array, but it could also be a hash or a proc.
|
63
|
-
def initialize(data)
|
64
|
-
@structure = CSegmentTreeTemplate.new(
|
65
|
-
combine: ->(a, b) { [a, b].max },
|
66
|
-
single_cell_array_val: ->(i) { data[i] },
|
67
|
-
size: data.size,
|
68
|
-
identity: -Shared::INFINITY
|
69
|
-
)
|
70
|
-
end
|
71
|
-
|
72
|
-
# The maximum value in A(i..j).
|
73
|
-
#
|
74
|
-
# The arguments must be integers in 0...(A.size)
|
75
|
-
# @return the largest value in A(i..j) or -Infinity if i > j.
|
76
|
-
def max_on(i, j)
|
77
|
-
@structure.query_on(i, j)
|
78
|
-
end
|
79
|
-
end
|
80
|
-
|
81
|
-
# A segment tree that for an array A(0...n) answers questions of the form "what is the index of the maximal value in the
|
82
|
-
# subinterval A(i..j)?" in O(log n) time.
|
83
|
-
#
|
84
|
-
# C version
|
85
|
-
class CIndexOfMaxValSegmentTree
|
86
|
-
extend Forwardable
|
87
|
-
|
88
|
-
# Tell the tree that the value at idx has changed
|
89
|
-
def_delegator :@structure, :update_at
|
90
|
-
|
91
|
-
# @param (see MaxValSegmentTree#initialize)
|
92
|
-
def initialize(data)
|
93
|
-
@structure = CSegmentTreeTemplate.new(
|
94
|
-
combine: ->(p1, p2) { p1[1] >= p2[1] ? p1 : p2 },
|
95
|
-
single_cell_array_val: ->(i) { [i, data[i]] },
|
96
|
-
size: data.size,
|
97
|
-
identity: nil
|
98
|
-
)
|
99
|
-
end
|
100
|
-
|
101
|
-
# The index of the maximum value in A(i..j)
|
102
|
-
#
|
103
|
-
# The arguments must be integers in 0...(A.size)
|
104
|
-
# @return (Integer, nil) the index of the largest value in A(i..j) or +nil+ if i > j.
|
105
|
-
# - If there is more than one entry with that value, return one of the indices. There is no guarantee as to which one.
|
106
|
-
# - Return +nil+ if i > j
|
107
|
-
def index_of_max_val_on(i, j)
|
108
|
-
@structure.query_on(i, j)&.first # discard the value part of the pair, which is a bookkeeping
|
109
|
-
end
|
110
|
-
end
|
111
|
-
|
112
|
-
end
|
@@ -0,0 +1,126 @@
|
|
1
|
+
require_relative 'shared'
|
2
|
+
|
3
|
+
# A namespace to hold the various bits and bobs related to the SegmentTree implementation
|
4
|
+
module DataStructuresRMolinari::SegmentTree
|
5
|
+
end
|
6
|
+
|
7
|
+
require_relative 'segment_tree_template' # Ruby implementation of the generic API
|
8
|
+
require_relative 'c_segment_tree_template' # C implementation of the generic API
|
9
|
+
|
10
|
+
# Segment Tree: various concrete implementations
|
11
|
+
#
|
12
|
+
# There is an excellent description of the data structure at https://cp-algorithms.com/data_structures/segment_tree.html. The
|
13
|
+
# Wikipedia article (https://en.wikipedia.org/wiki/Segment_tree) appears to describe a different data structure which is sometimes
|
14
|
+
# called an "interval tree."
|
15
|
+
#
|
16
|
+
# For more details (and some close-to-metal analysis of run time, especially for large datasets) see
|
17
|
+
# https://en.algorithmica.org/hpc/data-structures/segment-trees/. In particular, this shows how to do a bottom-up implementation,
|
18
|
+
# which is faster, at least for large datasets and cache-relevant compiled code. These issues don't really apply to code written in
|
19
|
+
# Ruby.
|
20
|
+
#
|
21
|
+
# Here we provide several concrete segment tree implementations built on top of the template (generic) versions. Each instance is
|
22
|
+
# backed either by the pure Ruby SegmentTreeTemplate or its C-based sibling CSegmentTreeTemplate
|
23
|
+
module DataStructuresRMolinari
|
24
|
+
module SegmentTree
|
25
|
+
# A convenience method to construct a Segment Tree that, for a given array A(0...size), answers questions of the kind given by
|
26
|
+
# operation, using the template written in lang
|
27
|
+
#
|
28
|
+
# - @param data: the array A.
|
29
|
+
# - It must respond to +#size+ and to +#[]+ with non-negative integer arguments.
|
30
|
+
# - @param operation: a supported "style" of Segment Tree
|
31
|
+
# - for now, must be one of these (but you can write your own concrete version)
|
32
|
+
# - +:max+: implementing +max_on(i, j)+, returning the maximum value in A(i..j)
|
33
|
+
# - +:index_of_max+: implementing +index_of_max_val_on(i, j)+, returning an index corresponding to the maximum value in
|
34
|
+
# A(i..j).
|
35
|
+
# - @param lang: the language in which the underlying "template" is written
|
36
|
+
# - +:c+ or +:ruby+
|
37
|
+
# - the C version will run faster but for now may be buggier and harder to debug
|
38
|
+
module_function def construct(data, operation, lang)
|
39
|
+
operation.must_be_in [:max, :index_of_max]
|
40
|
+
lang.must_be_in [:ruby, :c]
|
41
|
+
|
42
|
+
klass = operation == :max ? MaxValSegmentTree : IndexOfMaxValSegmentTree
|
43
|
+
template = lang == :ruby ? SegmentTreeTemplate : CSegmentTreeTemplate
|
44
|
+
|
45
|
+
klass.new(template, data)
|
46
|
+
end
|
47
|
+
|
48
|
+
# A segment tree that for an array A(0...n) answers questions of the form "what is the maximum value in the subinterval A(i..j)?"
|
49
|
+
# in O(log n) time.
|
50
|
+
class MaxValSegmentTree
|
51
|
+
extend Forwardable
|
52
|
+
|
53
|
+
# Tell the tree that the value at idx has changed
|
54
|
+
def_delegator :@structure, :update_at
|
55
|
+
|
56
|
+
# @param template_klass the "template" class that provides the generic implementation of the Segment Tree functionality.
|
57
|
+
# @param data an object that contains values at integer indices based at 0, via +data[i]+.
|
58
|
+
# - This will usually be an Array, but it could also be a hash or a proc.
|
59
|
+
def initialize(template_klass, data)
|
60
|
+
data.must_be_a Enumerable
|
61
|
+
|
62
|
+
@structure = template_klass.new(
|
63
|
+
combine: ->(a, b) { [a, b].max },
|
64
|
+
single_cell_array_val: ->(i) { data[i] },
|
65
|
+
size: data.size,
|
66
|
+
identity: -Shared::INFINITY
|
67
|
+
)
|
68
|
+
end
|
69
|
+
|
70
|
+
# The maximum value in A(i..j).
|
71
|
+
#
|
72
|
+
# The arguments must be integers in 0...(A.size)
|
73
|
+
# @return the largest value in A(i..j) or -Infinity if i > j.
|
74
|
+
def max_on(i, j)
|
75
|
+
@structure.query_on(i, j)
|
76
|
+
end
|
77
|
+
end
|
78
|
+
|
79
|
+
# A segment tree that for an array A(0...n) answers questions of the form "what is the index of the maximal value in the
|
80
|
+
# subinterval A(i..j)?" in O(log n) time.
|
81
|
+
class IndexOfMaxValSegmentTree
|
82
|
+
extend Forwardable
|
83
|
+
|
84
|
+
# Tell the tree that the value at idx has changed
|
85
|
+
def_delegator :@structure, :update_at
|
86
|
+
|
87
|
+
# @param (see MaxValSegmentTree#initialize)
|
88
|
+
def initialize(template_klass, data)
|
89
|
+
data.must_be_a Enumerable
|
90
|
+
|
91
|
+
@structure = template_klass.new(
|
92
|
+
combine: ->(p1, p2) { p1[1] >= p2[1] ? p1 : p2 },
|
93
|
+
single_cell_array_val: ->(i) { [i, data[i]] },
|
94
|
+
size: data.size,
|
95
|
+
identity: nil
|
96
|
+
)
|
97
|
+
end
|
98
|
+
|
99
|
+
# The index of the maximum value in A(i..j)
|
100
|
+
#
|
101
|
+
# The arguments must be integers in 0...(A.size)
|
102
|
+
# @return (Integer, nil) the index of the largest value in A(i..j) or +nil+ if i > j.
|
103
|
+
# - If there is more than one entry with that value, return one of the indices. There is no guarantee as to which one.
|
104
|
+
# - Return +nil+ if i > j
|
105
|
+
def index_of_max_val_on(i, j)
|
106
|
+
@structure.query_on(i, j)&.first # discard the value part of the pair, which is only bookkeeping
|
107
|
+
end
|
108
|
+
end
|
109
|
+
|
110
|
+
# The underlying functionality of the Segment Tree data type, implemented in C as a Ruby extension.
|
111
|
+
#
|
112
|
+
# See SegmentTreeTemplate for more information.
|
113
|
+
#
|
114
|
+
# Implementation note
|
115
|
+
#
|
116
|
+
# The functionality is entirely written in C. But we write the constructor in Ruby because keyword arguments are difficult to
|
117
|
+
# parse on the C side.
|
118
|
+
class CSegmentTreeTemplate
|
119
|
+
# (see SegmentTreeTemplate::initialize)
|
120
|
+
def initialize(combine:, single_cell_array_val:, size:, identity:)
|
121
|
+
# having sorted out the keyword arguments, pass them more easily to the C layer.
|
122
|
+
c_initialize(combine, single_cell_array_val, size, identity)
|
123
|
+
end
|
124
|
+
end
|
125
|
+
end
|
126
|
+
end
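The two lambdas given to the template in `IndexOfMaxValSegmentTree` carry `[index, value]` pairs rather than bare indices, since choosing between two candidate indices requires knowing the values behind them. The sketch below applies the same two lambdas in a plain linear fold, with no tree involved, just to show what they compute together. The helper name `sketch_index_of_max` is hypothetical. In this left-to-right fold ties go to the leftmost index, whereas the class's documentation makes no guarantee about which index is returned.

```ruby
# Hypothetical helper for illustration only: an O(n) fold, not a segment tree.
def sketch_index_of_max(data, i, j)
  # Same shape as the lambdas passed to the template in IndexOfMaxValSegmentTree.
  single_cell_array_val = ->(k) { [k, data[k]] }
  combine = ->(p1, p2) { p1[1] >= p2[1] ? p1 : p2 }

  # Build the [index, value] pairs for i..j, keep the winning pair, drop the value.
  # An empty range (i > j) reduces to nil, playing the role of the nil identity.
  (i..j).map(&single_cell_array_val).reduce(&combine)&.first
end

sketch_values = [1, -3, 2, 1, 5, -9]
p sketch_index_of_max(sketch_values, 0, 2)
p sketch_index_of_max(sketch_values, 1, 4)
```

A segment tree applies the same `combine` to precomputed pairs for O(log n) sub-ranges instead of folding over every element.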
|
@@ -1,7 +1,7 @@
|
|
1
1
|
require_relative 'shared'
|
2
2
|
|
3
|
-
#
|
4
|
-
# max) on a arbitrary subarray of a given array.
|
3
|
+
# A generic implementation of Segment Tree, which can be used for various interval-related purposes, like efficiently finding the
|
4
|
+
# sum (or min or max) on an arbitrary subarray of a given array.
|
5
5
|
#
|
6
6
|
# There is an excellent description of the data structure at https://cp-algorithms.com/data_structures/segment_tree.html. The
|
7
7
|
# Wikipedia article (https://en.wikipedia.org/wiki/Segment_tree) appears to describe a different data structure which is sometimes
|
@@ -16,7 +16,7 @@ require_relative 'shared'
|
|
16
16
|
# initializer and the definitions of concrete realisations like MaxValSegmentTree.
|
17
17
|
#
|
18
18
|
# We do O(n) work to build the internal data structure at initialization. Then we answer queries in O(log n) time.
|
19
|
-
class DataStructuresRMolinari::SegmentTreeTemplate
|
19
|
+
class DataStructuresRMolinari::SegmentTree::SegmentTreeTemplate
|
20
20
|
include Shared
|
21
21
|
include Shared::BinaryTreeArithmetic
|
22
22
|
|
@@ -14,77 +14,12 @@ require_relative 'data_structures_rmolinari/algorithms'
|
|
14
14
|
require_relative 'data_structures_rmolinari/disjoint_union'
|
15
15
|
require_relative 'data_structures_rmolinari/c_disjoint_union' # version as a C extension
|
16
16
|
|
17
|
-
require_relative 'data_structures_rmolinari/
|
18
|
-
require_relative 'data_structures_rmolinari/c_segment_tree_template_impl'
|
17
|
+
require_relative 'data_structures_rmolinari/segment_tree'
|
19
18
|
|
20
19
|
require_relative 'data_structures_rmolinari/heap'
|
21
20
|
require_relative 'data_structures_rmolinari/max_priority_search_tree'
|
22
21
|
require_relative 'data_structures_rmolinari/min_priority_search_tree'
|
23
22
|
|
24
23
|
module DataStructuresRMolinari
|
25
|
-
|
26
|
-
# Concrete instances of Segment Tree
|
27
|
-
#
|
28
|
-
# @todo consider moving these into generic_segment_tree.rb and renaming that file
|
29
|
-
|
30
|
-
# A segment tree that for an array A(0...n) answers questions of the form "what is the maximum value in the subinterval A(i..j)?"
|
31
|
-
# in O(log n) time.
|
32
|
-
class MaxValSegmentTree
|
33
|
-
extend Forwardable
|
34
|
-
|
35
|
-
# Tell the tree that the value at idx has changed
|
36
|
-
def_delegator :@structure, :update_at
|
37
|
-
|
38
|
-
# @param data an object that contains values at integer indices based at 0, via +data[i]+.
|
39
|
-
# - This will usually be an Array, but it could also be a hash or a proc.
|
40
|
-
def initialize(data)
|
41
|
-
data.must_be_a Enumerable
|
42
|
-
|
43
|
-
@structure = SegmentTreeTemplate.new(
|
44
|
-
combine: ->(a, b) { [a, b].max },
|
45
|
-
single_cell_array_val: ->(i) { data[i] },
|
46
|
-
size: data.size,
|
47
|
-
identity: -Shared::INFINITY
|
48
|
-
)
|
49
|
-
end
|
50
|
-
|
51
|
-
# The maximum value in A(i..j).
|
52
|
-
#
|
53
|
-
# The arguments must be integers in 0...(A.size)
|
54
|
-
# @return the largest value in A(i..j) or -Infinity if i > j.
|
55
|
-
def max_on(i, j)
|
56
|
-
@structure.query_on(i, j)
|
57
|
-
end
|
58
|
-
end
|
59
|
-
|
60
|
-
# A segment tree that for an array A(0...n) answers questions of the form "what is the index of the maximal value in the
|
61
|
-
# subinterval A(i..j)?" in O(log n) time.
|
62
|
-
class IndexOfMaxValSegmentTree
|
63
|
-
extend Forwardable
|
64
|
-
|
65
|
-
# Tell the tree that the value at idx has changed
|
66
|
-
def_delegator :@structure, :update_at
|
67
|
-
|
68
|
-
# @param (see MaxValSegmentTree#initialize)
|
69
|
-
def initialize(data)
|
70
|
-
data.must_be_a Enumerable
|
71
|
-
|
72
|
-
@structure = SegmentTreeTemplate.new(
|
73
|
-
combine: ->(p1, p2) { p1[1] >= p2[1] ? p1 : p2 },
|
74
|
-
single_cell_array_val: ->(i) { [i, data[i]] },
|
75
|
-
size: data.size,
|
76
|
-
identity: nil
|
77
|
-
)
|
78
|
-
end
|
79
|
-
|
80
|
-
# The index of the maximum value in A(i..j)
|
81
|
-
#
|
82
|
-
# The arguments must be integers in 0...(A.size)
|
83
|
-
# @return (Integer, nil) the index of the largest value in A(i..j) or +nil+ if i > j.
|
84
|
-
# - If there is more than one entry with that value, return one of the indices. There is no guarantee as to which one.
|
85
|
-
# - Return +nil+ if i > j
|
86
|
-
def index_of_max_val_on(i, j)
|
87
|
-
@structure.query_on(i, j)&.first # discard the value part of the pair, which is a bookkeeping
|
88
|
-
end
|
89
|
-
end
|
24
|
+
# Add things here if needed
|
90
25
|
end
|
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: data_structures_rmolinari
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.
|
4
|
+
version: 0.5.0
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Rory Molinari
|
8
8
|
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date: 2023-02-
|
11
|
+
date: 2023-02-03 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: must_be
|
@@ -97,6 +97,7 @@ files:
|
|
97
97
|
- lib/data_structures_rmolinari/heap.rb
|
98
98
|
- lib/data_structures_rmolinari/max_priority_search_tree.rb
|
99
99
|
- lib/data_structures_rmolinari/min_priority_search_tree.rb
|
100
|
+
- lib/data_structures_rmolinari/segment_tree.rb
|
100
101
|
- lib/data_structures_rmolinari/segment_tree_template.rb
|
101
102
|
- lib/data_structures_rmolinari/shared.rb
|
102
103
|
homepage: https://github.com/rmolinari/data_structures
|