data_structures_rmolinari 0.4.4 → 0.5.1
- checksums.yaml +4 -4
- data/CHANGELOG.md +11 -0
- data/README.md +98 -29
- data/ext/c_segment_tree_template/segment_tree_template.c +2 -1
- data/ext/cc.h +3879 -0
- data/ext/shared.h +33 -0
- data/lib/data_structures_rmolinari/algorithms.rb +5 -5
- data/lib/data_structures_rmolinari/c_segment_tree_template_impl.rb +3 -100
- data/lib/data_structures_rmolinari/disjoint_union.rb +2 -0
- data/lib/data_structures_rmolinari/segment_tree.rb +126 -0
- data/lib/data_structures_rmolinari/segment_tree_template.rb +3 -3
- data/lib/data_structures_rmolinari.rb +2 -67
- metadata +5 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: a41b03a0d46982d64fc1ec5648174df24c6c9b0593ff93354ad90fc65ff84e8e
|
4
|
+
data.tar.gz: 544aa69540c9cef54c4eb6402b260bafbbeddf7dc5e770bf428976c11161c58e
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 19efca577bb0c3c9524cf09b6f63c0e06beab70bfd20d1dbafa5034009be9bcb965ac5b8cee0ba0a39ecdcb646c401a54b363852a4572613941f7a34da174593
|
7
|
+
data.tar.gz: 05606e9c142a9bcb69a9fd555586b0ad8fe76bf7572af3dd82003de673d78b671b9a5edf1e5d4a2e4c7d5c6d5e41bef2ed1679573600725b74a25d4f3665c6ee
|
data/CHANGELOG.md
CHANGED
@@ -2,6 +2,17 @@
|
|
2
2
|
|
3
3
|
## [Unreleased]
|
4
4
|
|
5
|
+
## [0.5.0] 2023-02-03
|
6
|
+
|
7
|
+
- SegmentTree
|
8
|
+
- Reorganize the code into a SegmentTree submodule.
|
9
|
+
- Provide a convenience method for getting concrete instances.
|
10
|
+
|
11
|
+
- README.md
|
12
|
+
- Add some simple example code for the data types.
|
13
|
+
|
14
|
+
## [0.4.4] 2023-02-02
|
15
|
+
|
5
16
|
- Disjoint Union
|
6
17
|
- C extension: use Convenient Containers rather than my janky Dynamic Array attempt.
|
7
18
|
|
data/README.md
CHANGED
@@ -9,23 +9,13 @@ as fast as possible.
|
|
9
9
|
|
10
10
|
The code is available as a gem: https://rubygems.org/gems/data_structures_rmolinari.
|
11
11
|
|
12
|
+
It is distributed under the MIT license.
|
13
|
+
|
12
14
|
## Usage
|
13
15
|
|
14
16
|
The right way to organize the code is not obvious to me. For now the data structures are all defined in the module
|
15
17
|
`DataStructuresRMolinari` to avoid polluting the global namespace.
|
16
18
|
|
17
|
-
Example usage after the gem is installed:
|
18
|
-
```
|
19
|
-
require 'data_structures_rmolinari`
|
20
|
-
|
21
|
-
# Pull what we need out of the namespace
|
22
|
-
MaxPrioritySearchTree = DataStructuresRMolinari::MaxPrioritySearchTree
|
23
|
-
Point = DataStructuresRMolinari::Point # anything responding to :x and :y is fine
|
24
|
-
|
25
|
-
pst = MaxPrioritySearchTree.new([Point.new(1, 1)])
|
26
|
-
puts pst.largest_y_in_ne(0, 0) # "Point(1,1)"
|
27
|
-
```
|
28
|
-
|
29
19
|
# Implementations
|
30
20
|
|
31
21
|
## Disjoint Union
|
@@ -42,6 +32,23 @@ It also provides
|
|
42
32
|
For more details see https://en.wikipedia.org/wiki/Disjoint-set_data_structure and the paper [[TvL1984]](#references) by Tarjan and
|
43
33
|
van Leeuwen.
|
44
34
|
|
35
|
+
``` ruby
|
36
|
+
require 'data_structures_rmolinari'
|
37
|
+
DisjointUnion = DataStructuresRMolinari::DisjointUnion
|
38
|
+
|
39
|
+
# Create an instance over the "universe" 0, 1, ..., 9.
|
40
|
+
du = DisjointUnion.new(10)
|
41
|
+
du.subset_count # => 10; each element starts out in its own subset
|
42
|
+
|
43
|
+
du.unite(2, 3) # say that 2 and 3 are actually in the same subset
|
44
|
+
du.subset_count # => 9
|
45
|
+
du.find(2) == du.find(3) # => true
|
46
|
+
|
47
|
+
du.unite(4, 5)
|
48
|
+
du.unite(3, 4) # now 2, 3, 4, and 5 are all in the same subset
|
49
|
+
du.subset_count # => 7
|
50
|
+
```
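
For readers curious about what `unite` and `find` do underneath, here is a minimal, self-contained sketch of a disjoint-set forest with union by size and path compression. It is only an illustration and not the gem's implementation: the class name `TinyDisjointUnion` and its internals are assumptions made for this example.

``` ruby
# Hypothetical illustration; this is not the gem's DisjointUnion class.
class TinyDisjointUnion
  attr_reader :subset_count

  def initialize(size)
    @parent = Array.new(size) { |i| i } # every element starts as its own root
    @size = Array.new(size, 1)          # size of the tree rooted at each element
    @subset_count = size
  end

  # Return the representative of e's subset, compressing the path on the way.
  def find(e)
    root = e
    root = @parent[root] until @parent[root] == root

    # Second pass: point every node on the path directly at the root.
    until @parent[e] == root
      next_up = @parent[e]
      @parent[e] = root
      e = next_up
    end
    root
  end

  # Merge the subsets containing e and f. Does nothing if they are already merged.
  def unite(e, f)
    a = find(e)
    b = find(f)
    return if a == b

    a, b = b, a if @size[a] < @size[b] # hang the smaller tree under the larger
    @parent[b] = a
    @size[a] += @size[b]
    @subset_count -= 1
  end
end

du = TinyDisjointUnion.new(10)
du.unite(2, 3)
du.unite(4, 5)
du.unite(3, 4)
du.subset_count # => 7
```

The sketch follows the same sequence of calls as the README example, so the subset counts line up with the ones shown there.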
|
51
|
+
|
45
52
|
## Heap
|
46
53
|
|
47
54
|
This is a standard binary heap with an `update` method, suitable for use as a priority queue. There are several supported
|
@@ -63,6 +70,24 @@ allows the insertion of duplicate items (which is sometimes useful) and slightly
|
|
63
70
|
|
64
71
|
See https://en.wikipedia.org/wiki/Binary_heap and the paper by Edelkamp, Elmasry, and Katajainen [[EEK2017]](#references).
|
65
72
|
|
73
|
+
``` ruby
|
74
|
+
require 'data_structures_rmolinari'
|
75
|
+
Heap = DataStructuresRMolinari::Heap
|
76
|
+
|
77
|
+
data = [4, 3, 2, 1]
|
78
|
+
|
79
|
+
heap = Heap.new
|
80
|
+
|
81
|
+
# Insert the elements of data, each with itself as priority.
|
82
|
+
data.each { |v| heap.insert(v, v) }
|
83
|
+
|
84
|
+
heap.top # => 1, since we have a min-heap.
|
85
|
+
heap.pop # => 1
|
86
|
+
heap.top # => 2; with 1 gone, this is the element with least priority
|
87
|
+
heap.update(3, -3)
|
88
|
+
heap.top # => 3; now 3 is the element with least priority
|
89
|
+
```
|
90
|
+
|
66
91
|
## Priority Search Tree
|
67
92
|
|
68
93
|
A PST stores a set P of two-dimensional points in a way that allows certain queries about P to be answered efficiently. The data
|
@@ -96,13 +121,27 @@ regions.
|
|
96
121
|
By default these data structures are immutable: once constructed they cannot be changed. But there is a constructor option that
|
97
122
|
makes the instance "dynamic". This allows us to delete the element at the root of the tree - the one with largest y value (smallest
|
98
123
|
for MinPST) - with the `delete_top!` method. This operation is important in certain algorithms, such as enumerating all maximal
|
99
|
-
empty rectangles (see the second paper by De et al
|
124
|
+
empty rectangles (see the second paper by De et al. [[DMNS2013]](#references)). Note that points still cannot be added to the PST in
|
100
125
|
any case, and choosing the dynamic option makes certain internal bookkeeping operations slower.
|
101
126
|
|
102
127
|
In [[DMNS2013]](#references) De et al. generalize the in-place structure to a _Min-max Priority Search Tree_ (MinmaxPST) that can
|
103
128
|
answer queries in all four quadrants and both "kinds" of 3-sided boxes. Having one of these would save the trouble of constructing
|
104
129
|
both a MaxPST and MinPST. But the presentation is hard to follow in places and the paper's pseudocode is buggy.[^minmaxpst]
|
105
130
|
|
131
|
+
``` ruby
|
132
|
+
require 'data_structures_rmolinari'
|
133
|
+
MaxPST = DataStructuresRMolinari::MaxPrioritySearchTree
|
134
|
+
Point = Shared::Point # simple (x, y) struct. Anything responding to #x and #y will work
|
135
|
+
|
136
|
+
data = [Point.new(0, 0), Point.new(1, 2), Point.new(2, 1)]
|
137
|
+
pst = MaxPST.new(data)
|
138
|
+
|
139
|
+
pst.largest_y_in_ne(0, 0) # => #<struct Shared::Point x=1, y=2>
|
140
|
+
pst.largest_y_in_ne(1, 1) # => #<struct Shared::Point x=1, y=2>
|
141
|
+
pst.largest_y_in_ne(1.5, 1) # => #<struct Shared::Point x=2, y=1>
|
142
|
+
pst.largest_y_in_3_sided(-0.5, 0.5, 0) # => #<struct Shared::Point x=0, y=0>
|
143
|
+
```
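
To make the meaning of these queries concrete, here is a brute-force sketch of what `largest_y_in_ne` and `largest_y_in_3_sided` compute. It scans every point in linear time, which is what a PST is designed to avoid, so it is useful only as a reference. The helper names and the `Pt` struct are hypothetical, and the closed boundaries are an assumption that is consistent with the outputs shown above.

``` ruby
Pt = Struct.new(:x, :y) # hypothetical stand-in for the gem's Point

# Highest point in the quadrant x >= x0, y >= y0, or nil if no point lies there.
# Assumes the region includes its boundary.
def brute_largest_y_in_ne(points, x0, y0)
  points.select { |p| p.x >= x0 && p.y >= y0 }.max_by(&:y)
end

# Highest point in the three-sided box x0 <= x <= x1, y >= y0, or nil.
def brute_largest_y_in_3_sided(points, x0, x1, y0)
  points.select { |p| p.x.between?(x0, x1) && p.y >= y0 }.max_by(&:y)
end

points = [Pt.new(0, 0), Pt.new(1, 2), Pt.new(2, 1)]

brute_largest_y_in_ne(points, 1.5, 1)            # => #<struct Pt x=2, y=1>
brute_largest_y_in_3_sided(points, -0.5, 0.5, 0) # => #<struct Pt x=0, y=0>
```

A checker like this is handy in tests: build a PST over random points and compare its answers with the brute-force ones.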
|
144
|
+
|
106
145
|
## Segment Tree
|
107
146
|
|
108
147
|
A segment tree stores information related to subintervals of a certain array. For example, a segment tree can be used to find the
|
@@ -112,11 +151,37 @@ arbitrary subarrays.
|
|
112
151
|
|
113
152
|
An excellent description of the idea is found at https://cp-algorithms.com/data_structures/segment_tree.html.
|
114
153
|
|
115
|
-
Generic code is provided in `SegmentTreeTemplate
|
116
|
-
|
117
|
-
|
154
|
+
Generic code is provided in `SegmentTree::SegmentTreeTemplate` and its equivalent (and faster) C-based sibling,
|
155
|
+
`SegmentTree::CSegmentTreeTemplate` (see [below](#c-extensions)).
|
156
|
+
|
157
|
+
Writing a concrete segment tree class just means providing some simple lambdas and constants to the template class's
|
158
|
+
initializer. Figuring out the details requires some knowledge of the internal mechanisms of a segment tree, for which the link at
|
159
|
+
cp-algorithms.com is very helpful. See the implementations of the concrete classes `MaxValSegmentTree` and
|
118
160
|
`IndexOfMaxValSegmentTree` for examples.
|
119
161
|
|
162
|
+
Since there are several concrete "types" and two underlying generic implementations there is a convenience method on the `SegmentTree`
|
163
|
+
module to get instances.
|
164
|
+
|
165
|
+
``` ruby
|
166
|
+
require 'data_structures_rmolinari'
|
167
|
+
SegmentTree = DataStructuresRMolinari::SegmentTree # namespace module
|
168
|
+
|
169
|
+
data = [1, -3, 2, 1, 5, -9]
|
170
|
+
|
171
|
+
# Get a segment tree instance that will answer "max over this subinterval?" questions about data.
|
172
|
+
# Here we get one using the ruby implementation of the generic functionality.
|
173
|
+
#
|
174
|
+
# Put :index_of_max in place of :max to get an instance that returns "an index of the maximum value
|
175
|
+
# over this subinterval".
|
176
|
+
#
|
177
|
+
# To use the generic code written in C, put :c instead of :ruby.
|
178
|
+
seg_tree = SegmentTree.construct(data, :max, :ruby)
|
179
|
+
|
180
|
+
seg_tree.max_on(0, 2) # => 2
|
181
|
+
seg_tree.max_on(1, 4) # => 5
|
182
|
+
# ..etc..
|
183
|
+
```
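
The idea of a generic template configured by "some simple lambdas and constants" can be sketched in a few lines. The class below is a self-contained illustration only: `TinySegmentTree`, its keyword arguments, and the `query_on` method are hypothetical names, and the gem's `SegmentTreeTemplate` may take different arguments and be organized differently (this sketch also omits updates).

``` ruby
# Hypothetical illustration of the "template plus lambdas" idea; not the gem's code.
class TinySegmentTree
  # combine:  lambda merging the summaries of two adjacent subintervals
  # single:   lambda turning one array element into a summary
  # identity: summary of an empty interval
  def initialize(data, combine:, single:, identity:)
    @n = data.size
    @combine = combine
    @identity = identity
    @tree = Array.new(2 * @n, identity)

    # Leaves live at indices n..2n-1; internal node i summarizes nodes 2i and 2i+1.
    data.each_with_index { |v, i| @tree[@n + i] = single.call(v) }
    (@n - 1).downto(1) { |i| @tree[i] = combine.call(@tree[2 * i], @tree[2 * i + 1]) }
  end

  # Summary of data[left..right], both ends inclusive.
  def query_on(left, right)
    result_l = @identity
    result_r = @identity
    l = left + @n
    r = right + @n + 1 # half-open upper bound
    while l < r
      if l.odd?
        result_l = @combine.call(result_l, @tree[l])
        l += 1
      end
      if r.odd?
        r -= 1
        result_r = @combine.call(@tree[r], result_r)
      end
      l /= 2
      r /= 2
    end
    @combine.call(result_l, result_r)
  end
end

data = [1, -3, 2, 1, 5, -9]

# A "max value" tree is just the template plus three small pieces of configuration.
max_tree = TinySegmentTree.new(
  data,
  combine: ->(a, b) { [a, b].max },
  single: ->(v) { v },
  identity: -Float::INFINITY
)

max_tree.query_on(0, 2) # => 2
max_tree.query_on(1, 4) # => 5
```

Swapping in `combine: ->(a, b) { a + b }` and `identity: 0` would give a range-sum tree from the same template, which is the appeal of keeping the tree mechanics generic.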
|
184
|
+
|
120
185
|
## Algorithms
|
121
186
|
|
122
187
|
The Algorithms submodule contains some algorithms using the data structures.
|
@@ -130,13 +195,12 @@ The Algorithms submodule contains some algorithms using the data structures.
|
|
130
195
|
|
131
196
|
# C Extensions
|
132
197
|
|
133
|
-
As another learning process I have implemented several of these data structures as C extensions. The
|
134
|
-
and they can be required like their pure Ruby versions. They have the same APIs as their Ruby cousins.
|
198
|
+
As another learning process I have implemented several of these data structures as C extensions. The APIs are the same.
|
135
199
|
|
136
200
|
## Disjoint Union
|
137
201
|
|
138
|
-
A benchmark suggests that a long sequence of `unite` operations is about 3 times as fast
|
139
|
-
`DisjointUnion`.
|
202
|
+
The C version is called `CDisjointUnion`. A benchmark suggests that a long sequence of `unite` operations is about 3 times as fast
|
203
|
+
with `CDisjointUnion` as with `DisjointUnion`.
|
140
204
|
|
141
205
|
The implementation uses the remarkable Convenient Containers library from Jackson Allan [[Allan]](#references).
|
142
206
|
|
@@ -145,16 +209,21 @@ The implementation uses the remarkable Convenient Containers library from Jackso
|
|
145
209
|
`CSegmentTreeTemplate` is the C implementation of the generic class. Concrete classes are built on top of this in Ruby, just as with
|
146
210
|
the pure Ruby `SegmentTreeTemplate` class.
|
147
211
|
|
148
|
-
A benchmark suggests that a long sequence of `max_on` operations against a max-val Segment Tree is about 4 times as fast with
|
149
|
-
|
150
|
-
|
212
|
+
A benchmark suggests that a long sequence of `max_on` operations against a max-val Segment Tree is about 4 times as fast with C as
|
213
|
+
with Ruby. I'm a bit surprised the improvement isn't larger, but remember that the C code must still interact with the Ruby objects in
|
214
|
+
the underlying data array, and must combine them, etc., via Ruby lambdas.
|
151
215
|
|
152
216
|
# References
|
153
|
-
- [Allan] Allan, J., _CC: Convenient Containers_, https://github.com/JacksonAllan/CC, retrieved 2023-02-01.
|
154
|
-
- [TvL1984] Tarjan, Robert E., van Leeuwen, J., _Worst-case Analysis of Set Union Algorithms_, Journal of the ACM, v31:2 (1984), pp
|
155
|
-
|
156
|
-
- [
|
157
|
-
-
|
158
|
-
- [
|
217
|
+
- [Allan] Allan, J., _CC: Convenient Containers_, https://github.com/JacksonAllan/CC, (retrieved 2023-02-01).
|
218
|
+
- [TvL1984] Tarjan, Robert E., van Leeuwen, J., _Worst-case Analysis of Set Union Algorithms_, Journal of the ACM, v31:2 (1984), pp
|
219
|
+
245–281, https://dl.acm.org/doi/10.1145/62.2160 (retrieved 2022-02-01).
|
220
|
+
- [EEK2017] Edelkamp, S., Elmasry, A., Katajainen, J., _Optimizing Binary Heaps_, Theory Comput Syst (2017), vol 61, pp 606-636, DOI
|
221
|
+
10.1007/s00224-017-9760-2, https://kclpure.kcl.ac.uk/portal/files/87388857/TheoryComputingSzstems.pdf (retrieved 2022-02-02).
|
222
|
+
- [McC1985] McCreight, E. M., _Priority Search Trees_, SIAM J. Comput., 14(2):257-276, 1985,
|
223
|
+
http://www.cs.duke.edu/courses/fall08/cps234/handouts/SMJ000257.pdf (retrieved 2023-02-02).
|
224
|
+
- [DMNS2011] De, M., Maheshwari, A., Nandy, S. C., Smid, M., _An In-Place Priority Search Tree_, 23rd Canadian Conference on
|
225
|
+
Computational Geometry, 2011, http://www.cs.carleton.ca/~michiel/inplace_pst.pdf (retrieved 2023-02-02).
|
226
|
+
- [DMNS2013] De, M., Maheshwari, A., Nandy, S. C., Smid, M., _An In-Place Min-max Priority Search Tree_, Computational Geometry, v46
|
227
|
+
(2013), pp 310-327, https://people.scs.carleton.ca/~michiel/MinMaxPST.pdf (retrieved 2023-02-02).
|
159
228
|
|
160
229
|
[^minmaxpst]: See the comments in the fragmentary class `MinMaxPrioritySearchTree` for further details.
|
data/ext/c_segment_tree_template/segment_tree_template.c
CHANGED
@@ -353,7 +353,8 @@ static VALUE segment_tree_update_at(VALUE self, VALUE idx) {
|
|
353
353
|
* (see SegmentTreeTemplate)
|
354
354
|
*/
|
355
355
|
void Init_c_segment_tree_template() {
|
356
|
-
VALUE
|
356
|
+
VALUE mSegmentTree = rb_define_module_under(mDataStructuresRMolinari, "SegmentTree");
|
357
|
+
VALUE cSegmentTreeTemplate = rb_define_class_under(mSegmentTree, "CSegmentTreeTemplate", rb_cObject);
|
357
358
|
|
358
359
|
rb_define_alloc_func(cSegmentTreeTemplate, segment_tree_alloc);
|
359
360
|
rb_define_method(cSegmentTreeTemplate, "c_initialize", segment_tree_init, 4);
|