data_structures_rmolinari 0.4.2 → 0.4.4
- checksums.yaml +4 -4
- data/CHANGELOG.md +12 -0
- data/README.md +35 -16
- data/Rakefile +6 -4
- data/ext/c_disjoint_union/disjoint_union.c +100 -142
- data/ext/c_disjoint_union/extconf.rb +7 -2
- data/ext/c_segment_tree_template/extconf.rb +17 -0
- data/ext/c_segment_tree_template/segment_tree_template.c +362 -0
- data/ext/shared.c +32 -0
- data/lib/data_structures_rmolinari/c_segment_tree_template_impl.rb +112 -0
- data/lib/data_structures_rmolinari/segment_tree_template.rb +8 -5
- data/lib/data_structures_rmolinari.rb +8 -0
- metadata +7 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 943ac55678a074cc0da3667dccbb07ee7d203639233f53bd8587af7fd8cd062e
+  data.tar.gz: ad235e5f4714e699f1cf5f113dd4b3a356a194cced5a74b60e17c5e3a896e01b
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: a68de76c88c67fadc42752610c695b1f0b8fd17f34db9c806291aeab4c933fe84c6523615deb4197e1c9fa6d36dce30987cc4e8896a2b0c1700b7e72b5bd2fff
+  data.tar.gz: 9063d89a98d599f27db2585bf383dbfb13e8f927abce64ac7eafb2edd70c490ddad1f1fc51e0f11c24adf29f28ab8c56548a6db264b15ace239c63b1a2ce5a01
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,17 @@
 # Changelog
 
+## [Unreleased]
+
+- Disjoint Union
+  - C extension: use Convenient Containers rather than my janky Dynamic Array attempt.
+
+- Segment Tree
+  - Add a C implementation as CSegmentTreeTemplate.
+
+## [0.4.3] 2023-01-27
+
+- Fix bad directive in Rakefile for DisjointUnion C extension
+
 ## [0.4.2] 2023-01-26
 
 ### Added
data/README.md
CHANGED
@@ -4,8 +4,8 @@ This is a small collection of Ruby data structures that I have implemented for m
|
|
4
4
|
structure is almost always more educational than simply reading about it and is usually fun. I wrote some of them while
|
5
5
|
participating in the Advent of Code (https://adventofcode.com/).
|
6
6
|
|
7
|
-
|
8
|
-
|
7
|
+
The implementations are based on the expository descriptions and pseudo-code I found as I read about each structure and so are not
|
8
|
+
as fast as possible.
|
9
9
|
|
10
10
|
The code is available as a gem: https://rubygems.org/gems/data_structures_rmolinari.
|
11
11
|
|
@@ -42,9 +42,6 @@ It also provides
|
|
42
42
|
For more details see https://en.wikipedia.org/wiki/Disjoint-set_data_structure and the paper [[TvL1984]](#references) by Tarjan and
|
43
43
|
van Leeuwen.
|
44
44
|
|
45
|
-
There is an experimental implementation as a C extension in c_disjoint_union.c. The Ruby class for this implementation is
|
46
|
-
`CDisjointUnion`. Benchmarks indicate that a long sequence of `unite` calls is about twice as fast.
|
47
|
-
|
48
45
|
## Heap
|
49
46
|
|
50
47
|
This is a standard binary heap with an `update` method, suitable for use as a priority queue. There are several supported
|
@@ -84,15 +81,15 @@ pointing north.
|
|
84
81
|
|
85
82
|
There is no `smallest_x_in_3_sided(x0, x1, y0)`. Just use `smallest_x_in_ne(x0, y0)`.
|
86
83
|
|
84
|
+
(These queries appear rather abstract at first but there are interesting applications. See, for example, section 4 of
|
85
|
+
[[McC1985]](#references), keeping in mind that the data structure in that paper is actually a _MinPST_.)
|
86
|
+
|
87
87
|
The single-point queries run in O(log n) time, where n is the size of P, while `enumerate_3_sided` runs in O(m + log n), where m is
|
88
88
|
the number of points actually enumerated.
|
89
89
|
|
90
90
|
The implementation is in `MaxPrioritySearchTree` (MaxPST for short), so called because internally the structure is, among other
|
91
91
|
things, a max-heap on the y-coordinates.
|
92
92
|
|
93
|
-
These queries appear rather abstract at first but there are interesting applications. See, for example, section 4 of
|
94
|
-
[[McC85]](#references), keeping in mind that the data structure in that paper is actually a _MinPST_.
|
95
|
-
|
96
93
|
We also provide a `MinPrioritySearchTree`, which answers analogous queries in the southward-infinite quadrants and 3-sided
|
97
94
|
regions.
|
98
95
|
|
@@ -108,17 +105,17 @@ both a MaxPST and MinPST. But the presentation is hard to follow in places and
|
|
108
105
|
|
109
106
|
## Segment Tree
|
110
107
|
|
111
|
-
|
112
|
-
elements in an arbitrary subinterval A
|
113
|
-
of A in such a way that the values we store in the nodes can be combined efficiently to determine the desired result for
|
114
|
-
subarrays.
|
108
|
+
A segment tree stores information related to subintervals of a certain array. For example, a segment tree can be used to find the
|
109
|
+
sum of the elements in an arbitrary subinterval A(i..j) of an array A(0..n) in O(log n) time. Each node in the tree corresponds to a
|
110
|
+
subarray of A in such a way that the values we store in the nodes can be combined efficiently to determine the desired result for
|
111
|
+
arbitrary subarrays.
|
115
112
|
|
116
113
|
An excellent description of the idea is found at https://cp-algorithms.com/data_structures/segment_tree.html.
|
117
114
|
|
118
|
-
Generic code is provided in `SegmentTreeTemplate`. Concrete classes
|
119
|
-
|
120
|
-
|
121
|
-
`
|
115
|
+
Generic code is provided in `SegmentTreeTemplate`. Concrete classes provide a handful of simple lambdas and constants to the
|
116
|
+
template class's initializer. Figuring out the details requires some knowledge of the internal mechanisms of a segment tree, for
|
117
|
+
which the link at cp-algorithms.com is very helpful. See the definitions of the concrete classes `MaxValSegmentTree` and
|
118
|
+
`IndexOfMaxValSegmentTree` for examples.
|
122
119
|
|
123
120
|
## Algorithms
|
124
121
|
|
@@ -131,7 +128,29 @@ The Algorithms submodule contains some algorithms using the data structures.
|
|
131
128
|
[left, right, bottom, top].
|
132
129
|
- The algorithm is due to [[DMNS2013]](#references).
|
133
130
|
|
131
|
+
# C Extensions
|
132
|
+
|
133
|
+
As another learning process I have implemented several of these data structures as C extensions. The class names are prefixed with "C"
|
134
|
+
and they can be required like their pure Ruby versions. They have the same APIs as their Ruby cousins.
|
135
|
+
|
136
|
+
## Disjoint Union
|
137
|
+
|
138
|
+
A benchmark suggests that a long sequence of `unite` operations is about 3 times as fast with the `CDisjointUnion` as with
|
139
|
+
`DisjointUnion`.
|
140
|
+
|
141
|
+
The implementation uses the remarkable Convenient Containers library from Jackson Allan [[Allan]](#references).
|
142
|
+
|
143
|
+
## Segment Tree
|
144
|
+
|
145
|
+
`CSegmentTreeTemplate` is the C implementation of the generic class. Concrete classes are built on top of this in Ruby, just as with
|
146
|
+
the pure Ruby `SegmentTreeTemplate` class.
|
147
|
+
|
148
|
+
A benchmark suggests that a long sequence of `max_on` operations against a max-val Segment Tree is about 4 times as fast with the C
|
149
|
+
version as with the Ruby version. I'm a bit surprised the improvement isn't larger, but we must remember that the C code must still
|
150
|
+
interact with the Ruby objects in the underlying data array, and must "combine" them, etc., by calling Ruby lambdas.
|
151
|
+
|
134
152
|
# References
|
153
|
+
- [Allan] Allan, J., _CC: Convenient Containers_, https://github.com/JacksonAllan/CC, retrieved 2023-02-01.
|
135
154
|
- [TvL1984] Tarjan, Robert E., van Leeuwen, J., _Worst-case Analysis of Set Union Algorithms_, Journal of the ACM, v31:2 (1984), pp 245–281.
|
136
155
|
- [EEK2017] Edelkamp, S., Elmasry, A., Katajainen, J., _Optimizing Binary Heaps_, Theory Comput Syst (2017), vol 61, pp 606-636, DOI 10.1007/s00224-017-9760-2.
|
137
156
|
- [McC1985] McCreight, E. M., _Priority Search Trees_, SIAM J. Comput., 14(2):257-276, 1985.
|
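The README's Segment Tree section says that concrete classes hand a few lambdas and constants to a template class. The following is a minimal pure-Ruby sketch of that idea, not the gem's `SegmentTreeTemplate`: the class name `SketchSegmentTree`, the method `query_on`, and the keyword names are assumptions made for illustration only.

```ruby
# Hypothetical, simplified stand-in for a template-style segment tree.
# It is NOT the gem's implementation; all names here are illustrative.
class SketchSegmentTree
  # combine:               lambda merging the values of two adjacent subintervals
  # single_cell_array_val: lambda giving the value for the one-element interval [i, i]
  # size:                  length of the underlying array
  # identity:              value returned for an empty interval
  def initialize(combine:, single_cell_array_val:, size:, identity:)
    @combine = combine
    @single_cell_array_val = single_cell_array_val
    @size = size
    @identity = identity
    @tree = Array.new(4 * size) # 1-based implicit binary tree; 4n slots is a safe bound
    build(1, 0, size - 1) if size > 0
  end

  # Combined value over the closed interval [left, right], in O(log n) combine calls.
  def query_on(left, right)
    return @identity if left > right || @size.zero?

    query(1, 0, @size - 1, left, right)
  end

  private

  # Fill in the node at idx, which covers the subarray [lo, hi].
  def build(idx, lo, hi)
    if lo == hi
      @tree[idx] = @single_cell_array_val.call(lo)
    else
      mid = (lo + hi) / 2
      build(2 * idx, lo, mid)
      build(2 * idx + 1, mid + 1, hi)
      @tree[idx] = @combine.call(@tree[2 * idx], @tree[2 * idx + 1])
    end
  end

  def query(idx, lo, hi, left, right)
    return @identity if right < lo || hi < left     # no overlap
    return @tree[idx] if left <= lo && hi <= right  # node fully inside the query

    mid = (lo + hi) / 2
    @combine.call(
      query(2 * idx, lo, mid, left, right),
      query(2 * idx + 1, mid + 1, hi, left, right)
    )
  end
end

data = [3, 1, 4, 1, 5, 9, 2, 6]
sum_tree = SketchSegmentTree.new(
  combine: ->(a, b) { a + b },
  single_cell_array_val: ->(i) { data[i] },
  size: data.size,
  identity: 0
)
max_tree = SketchSegmentTree.new(
  combine: ->(a, b) { [a, b].max },
  single_cell_array_val: ->(i) { data[i] },
  size: data.size,
  identity: -Float::INFINITY
)

puts sum_tree.query_on(2, 5) # sum of data[2..5]
puts max_tree.query_on(0, 3) # max of data[0..3]
```

Only the two lambdas and the identity differ between the sum and max trees, which is the point of the template design. The sketch rebuilds nothing on update; it only shows construction and querying.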
data/Rakefile
CHANGED
@@ -2,10 +2,12 @@ require 'rubygems'
 require 'rake/testtask'
 require 'rake/extensiontask'
 
-
-
-
-
+['c_disjoint_union', 'c_segment_tree_template'].each do |extension_name|
+  Rake::ExtensionTask.new("data_structures_rmolinari/#{extension_name}") do |ext|
+    ext.name = extension_name
+    ext.ext_dir = "ext/#{extension_name}"
+    ext.lib_dir = 'lib/data_structures_rmolinari/'
+  end
 end
 
 Rake::TestTask.new do |t|
data/ext/c_disjoint_union/disjoint_union.c
CHANGED
@@ -16,128 +16,84 @@
|
|
16
16
|
*/
|
17
17
|
|
18
18
|
#include "ruby.h"
|
19
|
-
|
20
|
-
|
21
|
-
// just fine.
|
22
|
-
#define mShared rb_define_module("Shared")
|
23
|
-
#define eDataError rb_const_get(mShared, rb_intern_const("DataError"))
|
19
|
+
#include "cc.h" // Convenient Containers
|
20
|
+
#include "shared.h"
|
24
21
|
|
25
22
|
/**
|
26
|
-
*
|
27
|
-
*
|
28
|
-
* Dynamic array of longs, with an initial value for otherwise uninitialized elements.
|
29
|
-
* Based on https://stackoverflow.com/questions/3536153/c-dynamically-growing-array
|
23
|
+
* Data type for the (parent, rank) pair, and some accessor helpers for the vec() container we are going to be using.
|
30
24
|
*/
|
31
|
-
typedef struct {
|
32
|
-
long *array;
|
33
|
-
size_t size;
|
34
|
-
long default_val;
|
35
|
-
} DynamicArray;
|
36
|
-
|
37
|
-
void initDynamicArray(DynamicArray *a, size_t initial_size, long default_val) {
|
38
|
-
a->array = malloc(initial_size * sizeof(long));
|
39
|
-
a->size = initial_size;
|
40
|
-
a->default_val = default_val;
|
41
|
-
|
42
|
-
for (size_t i = 0; i < initial_size; i++) {
|
43
|
-
a->array[i] = default_val;
|
44
|
-
}
|
45
|
-
}
|
46
|
-
|
47
|
-
void insertDynamicArray(DynamicArray *a, unsigned long index, long element) {
|
48
|
-
if (a->size <= index) {
|
49
|
-
size_t new_size = a->size;
|
50
|
-
while (new_size <= index) {
|
51
|
-
new_size = 8 * new_size / 5 + 8; // 8/5 gives "Fibonacci-like" growth; adding 8 to avoid small arrays having to reallocate
|
52
|
-
// too often. Who knows if it's worth being "clever"."
|
53
|
-
}
|
54
|
-
|
55
|
-
long* new_array = realloc(a->array, new_size * sizeof(long));
|
56
|
-
if (!new_array) {
|
57
|
-
rb_raise(rb_eRuntimeError, "Cannot allocate memory to expand DynamicArray!");
|
58
|
-
}
|
59
25
|
|
60
|
-
|
61
|
-
|
62
|
-
|
63
|
-
|
26
|
+
typedef struct data_pair {
|
27
|
+
long parent;
|
28
|
+
unsigned long rank;
|
29
|
+
} data_pair;
|
64
30
|
|
65
|
-
|
66
|
-
|
31
|
+
#define DEFAULT_PARENT -1
|
32
|
+
#define DEFAULT_RANK 0
|
33
|
+
static data_pair default_pair = { .parent = DEFAULT_PARENT, .rank = DEFAULT_RANK };
|
67
34
|
|
68
|
-
|
35
|
+
static data_pair make_data_pair(long parent, unsigned long rank) {
|
36
|
+
data_pair pair = { .parent = parent, .rank = rank };
|
37
|
+
return pair;
|
69
38
|
}
|
70
39
|
|
71
|
-
|
72
|
-
|
73
|
-
|
74
|
-
|
75
|
-
|
40
|
+
/* The vector generic from Convenient Containers */
|
41
|
+
typedef vec(data_pair) pair_vector;
|
42
|
+
|
43
|
+
#define parent(disjoint_union_ptr, idx) (get((disjoint_union_ptr)->pairs, idx)->parent)
|
44
|
+
#define rank(disjoint_union_ptr, idx) (get((disjoint_union_ptr)->pairs, idx)->rank)
|
76
45
|
|
77
46
|
/**
|
78
47
|
* The C implementation of a Disjoint Union
|
79
48
|
*
|
80
|
-
* See
|
49
|
+
* See the paper for optimizations we use to get almost constant time for find() and unite().
|
50
|
+
*
|
51
|
+
* Tarjan, Robert E., van Leeuwen, J., _Worst-case Analysis of Set Union Algorithms_, Journal of the ACM, v31:2 (1984), pp 245–281.
|
81
52
|
*/
|
82
53
|
|
83
54
|
/*
|
84
55
|
* The Disjoint Union struct.
|
85
|
-
* -
|
86
|
-
* -
|
87
|
-
*
|
56
|
+
* - pairs: a vector (dynamic array) of pairs, the i-th of which contains
|
57
|
+
* - the "parent" of element i in its membership tree
|
58
|
+
* - An element e is the root of its tree just when it is its own parent
|
59
|
+
* - Two elements are in the same subset just when they are in the same tree in the forest.
|
88
60
|
* - So the key idea is that we can check this by navigating via parents from each element to their roots. Clever optimizations
|
89
61
|
* keep the trees flat and so most nodes are close to their roots.
|
90
|
-
*
|
91
|
-
*
|
92
|
-
* Leeuwen
|
62
|
+
* - the "rank" of element i
|
63
|
+
* - this value is used to guide the "linking" of trees when subsets are being merged to keep the trees flat.
|
93
64
|
* - subset_count: the number of (disjoint) subsets.
|
94
65
|
* - it isn't needed internally but may be useful to client code.
|
95
66
|
*/
|
96
67
|
typedef struct du_data {
|
97
|
-
|
98
|
-
DynamicArray* rank; // the "ranks" of the elements, used when uniting subsets
|
68
|
+
pair_vector *pairs; // The generic vector container from the amazing Convenient Containers library
|
99
69
|
size_t subset_count;
|
100
70
|
} disjoint_union_data;
|
101
71
|
|
102
72
|
/*
|
103
|
-
* Create one.
|
104
|
-
*
|
105
|
-
* The dynamic arrays are initialized with a size of 100 because I didn't have a better idea. This will end up getting called from
|
106
|
-
* the Ruby #allocate method, which happens before #initialize. Thus we don't know the calling code's desired initial size.
|
73
|
+
* Create one (on the heap).
|
107
74
|
*/
|
108
|
-
|
109
|
-
|
110
|
-
disjoint_union_data* disjoint_union = malloc(sizeof(disjoint_union_data));
|
75
|
+
static disjoint_union_data *create_disjoint_union() {
|
76
|
+
disjoint_union_data *disjoint_union = (disjoint_union_data *)malloc(sizeof(disjoint_union_data));
|
111
77
|
|
112
78
|
// Allocate the structures
|
113
|
-
|
114
|
-
|
115
|
-
initDynamicArray(forest, INITIAL_SIZE, -1);
|
116
|
-
initDynamicArray(rank, INITIAL_SIZE, 0);
|
79
|
+
disjoint_union->pairs = malloc(sizeof(pair_vector));
|
80
|
+
init(disjoint_union->pairs);
|
117
81
|
|
118
|
-
disjoint_union->forest = forest;
|
119
|
-
disjoint_union->rank = rank;
|
120
82
|
disjoint_union->subset_count = 0;
|
121
83
|
|
122
84
|
return disjoint_union;
|
123
85
|
}
|
124
86
|
|
125
87
|
/*
|
126
|
-
* Free the memory associated with a disjoint union.
|
88
|
+
* Free the memory associated with a disjoint union.
|
89
|
+
*
|
90
|
+
* This will end up getting triggered by the Ruby garbage collector. Ruby learns about it via the disjoint_union_type struct below.
|
127
91
|
*/
|
128
92
|
static void disjoint_union_free(void *ptr) {
|
129
93
|
if (ptr) {
|
130
94
|
disjoint_union_data *disjoint_union = ptr;
|
131
|
-
|
132
|
-
|
133
|
-
|
134
|
-
free(disjoint_union->forest);
|
135
|
-
disjoint_union->forest = NULL;
|
136
|
-
|
137
|
-
free(disjoint_union->rank);
|
138
|
-
disjoint_union->rank = NULL;
|
139
|
-
|
140
|
-
free(disjoint_union);
|
95
|
+
cleanup(disjoint_union->pairs);
|
96
|
+
xfree(disjoint_union);
|
141
97
|
}
|
142
98
|
}
|
143
99
|
|
@@ -148,17 +104,23 @@ static void disjoint_union_free(void *ptr) {
|
|
148
104
|
/*
|
149
105
|
* Is the given element already a member of the universe?
|
150
106
|
*/
|
151
|
-
static int present_p(disjoint_union_data*
|
152
|
-
|
153
|
-
return (forest->size > element && (forest->array[element] != forest->default_val));
|
107
|
+
static int present_p(disjoint_union_data *disjoint_union, size_t element) {
|
108
|
+
return (size(disjoint_union->pairs) > element && (parent(disjoint_union, element) != DEFAULT_PARENT));
|
154
109
|
}
|
155
110
|
|
156
111
|
/*
|
157
112
|
* Check that the given element is a member of the universe and raise Shared::DataError (ruby-side) if not
|
158
113
|
*/
|
159
|
-
static void assert_membership(disjoint_union_data*
|
114
|
+
static void assert_membership(disjoint_union_data *disjoint_union, size_t element) {
|
160
115
|
if (!present_p(disjoint_union, element)) {
|
161
|
-
rb_raise(
|
116
|
+
rb_raise(eSharedDataError, "Value %zu is not part of the universe", element);
|
117
|
+
/* rb_raise( */
|
118
|
+
/* eSharedDataError, */
|
119
|
+
/* "Value %zu is not part of the universe, size = %zu, forest_val = %lu", */
|
120
|
+
/* element, */
|
121
|
+
/* size(disjoint_union->pairs), */
|
122
|
+
/* get(disjoint_union->pairs, element)->parent */
|
123
|
+
/* ); */
|
162
124
|
}
|
163
125
|
}
|
164
126
|
|
@@ -167,52 +129,57 @@ static void assert_membership(disjoint_union_data* disjoint_union, size_t elemen
|
|
167
129
|
*
|
168
130
|
* Shared::DataError is raised if it is already an element.
|
169
131
|
*/
|
170
|
-
static void add_new_element(disjoint_union_data*
|
132
|
+
static void add_new_element(disjoint_union_data *disjoint_union, size_t element) {
|
171
133
|
if (present_p(disjoint_union, element)) {
|
172
|
-
rb_raise(
|
134
|
+
rb_raise(eSharedDataError, "Element %zu already present in the universe", element);
|
135
|
+
}
|
136
|
+
|
137
|
+
// Expand the underlying vector if necessary
|
138
|
+
size_t sz = size(disjoint_union->pairs);
|
139
|
+
if (sz <= element) {
|
140
|
+
resize(disjoint_union->pairs, element + 1);
|
141
|
+
for (size_t i = sz; i <= element; i++) { // the new slots are sz..element; each one needs a default value
|
142
|
+
lval(disjoint_union->pairs, i) = default_pair;
|
143
|
+
}
|
173
144
|
}
|
174
145
|
|
175
|
-
|
176
|
-
insertDynamicArray(disjoint_union->rank, element, 0);
|
146
|
+
lval(disjoint_union->pairs, element) = make_data_pair(element, 0l);
|
177
147
|
disjoint_union->subset_count++;
|
178
148
|
}
|
179
149
|
|
180
150
|
/*
|
181
|
-
* Find the canonical representative of the given element. This is the root of the tree
|
151
|
+
* Find the canonical representative of the given element. This is the root of the tree containing it.
|
182
152
|
*
|
183
153
|
* Two elements are in the same subset exactly when their canonical representatives are equal.
|
184
154
|
*/
|
185
|
-
static size_t find(disjoint_union_data*
|
155
|
+
static size_t find(disjoint_union_data *disjoint_union, size_t element) {
|
186
156
|
assert_membership(disjoint_union, element);
|
187
157
|
|
188
|
-
// We
|
189
|
-
long* d = disjoint_union->forest->array; // the actual forest data
|
158
|
+
// We use "halving" to shrink the length of paths to the root. See Tarjan and van Leeuwen p 252.
|
190
159
|
size_t x = element;
|
191
|
-
|
192
|
-
|
160
|
+
long p, gp; // parent and grandparent
|
161
|
+
while (p = parent(disjoint_union, x), gp = parent(disjoint_union, p), p != gp) {
|
162
|
+
parent(disjoint_union, x) = gp; // halving: point x at its grandparent (assigning to p's parent would be a no-op)
|
163
|
+
x = gp;
|
193
164
|
}
|
194
|
-
return
|
165
|
+
return parent(disjoint_union, x);
|
195
166
|
}
|
196
167
|
|
197
168
|
/*
|
198
|
-
* "Link"
|
169
|
+
* "Link" the two given elements so that they are in the same subset now.
|
199
170
|
*
|
200
171
|
* In other words, merge the subtrees containing the two elements.
|
201
172
|
*
|
202
|
-
*
|
203
|
-
* though we don't check that here.
|
173
|
+
* elt1 and elt2 must be distinct and the roots of their trees, though we don't check that here.
|
204
174
|
*/
|
205
|
-
static void link_roots(disjoint_union_data*
|
206
|
-
|
207
|
-
|
208
|
-
|
209
|
-
|
210
|
-
|
211
|
-
} else if (rank[elt1] == rank[elt2]) {
|
212
|
-
forest[elt2] = elt1;
|
213
|
-
rank[elt1]++;
|
175
|
+
static void link_roots(disjoint_union_data *disjoint_union, size_t elt1, size_t elt2) {
|
176
|
+
if (rank(disjoint_union, elt1) > rank(disjoint_union, elt2)) {
|
177
|
+
parent(disjoint_union, elt2) = elt1;
|
178
|
+
} else if (rank(disjoint_union, elt1) == rank(disjoint_union, elt2)) {
|
179
|
+
parent(disjoint_union, elt2) = elt1;
|
180
|
+
rank(disjoint_union, elt1)++;
|
214
181
|
} else {
|
215
|
-
|
182
|
+
parent(disjoint_union, elt1) = elt2;
|
216
183
|
}
|
217
184
|
|
218
185
|
disjoint_union->subset_count--;
|
@@ -221,12 +188,12 @@ static void link_roots(disjoint_union_data* disjoint_union, size_t elt1, size_t
|
|
221
188
|
/*
|
222
189
|
* "Unite" or merge the subsets containing elt1 and elt2.
|
223
190
|
*/
|
224
|
-
static void unite(disjoint_union_data*
|
191
|
+
static void unite(disjoint_union_data *disjoint_union, size_t elt1, size_t elt2) {
|
225
192
|
assert_membership(disjoint_union, elt1);
|
226
193
|
assert_membership(disjoint_union, elt2);
|
227
194
|
|
228
195
|
if (elt1 == elt2) {
|
229
|
-
rb_raise(
|
196
|
+
rb_raise(eSharedDataError, "Uniting an element with itself is meaningless");
|
230
197
|
}
|
231
198
|
|
232
199
|
size_t root1 = find(disjoint_union, elt1);
|
@@ -249,8 +216,10 @@ static void unite(disjoint_union_data* disjoint_union, size_t elt1, size_t elt2)
|
|
249
216
|
// deciding how aggressive to be during garbage collection and such.
|
250
217
|
static size_t disjoint_union_memsize(const void *ptr) {
|
251
218
|
if (ptr) {
|
252
|
-
const disjoint_union_data *
|
253
|
-
|
219
|
+
const disjoint_union_data *du = ptr;
|
220
|
+
|
221
|
+
// See https://github.com/JacksonAllan/CC/issues/3
|
222
|
+
return sizeof( cc_vec_hdr_ty ) + cap( du->pairs ) * CC_EL_SIZE( *(du->pairs) );
|
254
223
|
} else {
|
255
224
|
return 0;
|
256
225
|
}
|
@@ -273,26 +242,10 @@ static const rb_data_type_t disjoint_union_type = {
|
|
273
242
|
};
|
274
243
|
|
275
244
|
/*
|
276
|
-
*
|
277
|
-
*
|
278
|
-
* TODO: can we return a size_t or unsigned long instead?
|
245
|
+
* Unwrap a Ruby-side disjoint union object to get the C struct inside.
|
279
246
|
*/
|
280
|
-
static
|
281
|
-
|
282
|
-
long c_val = FIX2LONG(val);
|
283
|
-
|
284
|
-
if (c_val < 0) {
|
285
|
-
rb_raise(eDataError, "Value must be non-negative");
|
286
|
-
}
|
287
|
-
|
288
|
-
return c_val;
|
289
|
-
}
|
290
|
-
|
291
|
-
/*
|
292
|
-
* Unwrap a Rubyfied disjoint union to get the C struct inside.
|
293
|
-
*/
|
294
|
-
static disjoint_union_data* unwrapped(VALUE self) {
|
295
|
-
disjoint_union_data* disjoint_union;
|
247
|
+
static disjoint_union_data *unwrapped(VALUE self) {
|
248
|
+
disjoint_union_data *disjoint_union;
|
296
249
|
TypedData_Get_Struct((self), disjoint_union_data, &disjoint_union_type, disjoint_union);
|
297
250
|
return disjoint_union;
|
298
251
|
}
|
@@ -301,7 +254,9 @@ static disjoint_union_data* unwrapped(VALUE self) {
|
|
301
254
|
* This is for CDisjointUnion.allocate on the Ruby side
|
302
255
|
*/
|
303
256
|
static VALUE disjoint_union_alloc(VALUE klass) {
|
304
|
-
|
257
|
+
// Get one on the heap
|
258
|
+
disjoint_union_data *disjoint_union = create_disjoint_union();
|
259
|
+
// Wrap it up into a Ruby object
|
305
260
|
return TypedData_Wrap_Struct(klass, &disjoint_union_type, disjoint_union);
|
306
261
|
}
|
307
262
|
|
@@ -318,11 +273,15 @@ static VALUE disjoint_union_init(int argc, VALUE *argv, VALUE self) {
|
|
318
273
|
rb_raise(rb_eArgError, "wrong number of arguments");
|
319
274
|
} else {
|
320
275
|
size_t initial_size = checked_nonneg_fixnum(argv[0]);
|
321
|
-
disjoint_union_data*
|
276
|
+
disjoint_union_data *disjoint_union = unwrapped(self);
|
277
|
+
|
278
|
+
pair_vector *pair_vec = disjoint_union->pairs;
|
279
|
+
resize(pair_vec, initial_size);
|
322
280
|
|
323
281
|
for (size_t i = 0; i < initial_size; i++) {
|
324
|
-
|
282
|
+
lval(pair_vec, i) = make_data_pair(i, 0);
|
325
283
|
}
|
284
|
+
disjoint_union->subset_count = initial_size;
|
326
285
|
}
|
327
286
|
return self;
|
328
287
|
}
|
@@ -330,7 +289,7 @@ static VALUE disjoint_union_init(int argc, VALUE *argv, VALUE self) {
|
|
330
289
|
/**
|
331
290
|
* And now the simple wrappers around the Disjoint Union C functionality. In each case we
|
332
291
|
* - unwrap a 'VALUE self',
|
333
|
-
* - i.e.,
|
292
|
+
* - i.e., the CDisjointUnion instance on the Ruby side;
|
334
293
|
* - munge any other arguments into longs;
|
335
294
|
* - call the appropriate C function to act on the struct; and
|
336
295
|
* - return an appropriate VALUE that the Ruby runtime can use.
|
@@ -341,7 +300,7 @@ static VALUE disjoint_union_init(int argc, VALUE *argv, VALUE self) {
|
|
341
300
|
/*
|
342
301
|
* Add a new subset to the universe containing the element +new_v+.
|
343
302
|
*
|
344
|
-
* @param the new element, starting in its own singleton subset
|
303
|
+
* @param arg the new element, starting in its own singleton subset
|
345
304
|
* - it must be a non-negative integer, not already part of the universe of elements.
|
346
305
|
*/
|
347
306
|
static VALUE disjoint_union_make_set(VALUE self, VALUE arg) {
|
@@ -389,8 +348,7 @@ static VALUE disjoint_union_unite(VALUE self, VALUE arg1, VALUE arg2) {
|
|
389
348
|
* The data structure provides efficient actions to merge two disjoint subsets, i.e., replace them by their union, and determine if
|
390
349
|
* two elements are in the same subset.
|
391
350
|
*
|
392
|
-
* The elements of the set are
|
393
|
-
* representatives.
|
351
|
+
* The elements of the set are non-negative integers. Client code can map its data to these representatives.
|
394
352
|
*
|
395
353
|
* See https://en.wikipedia.org/wiki/Disjoint-set_data_structure for a good introduction.
|
396
354
|
*
|
@@ -400,7 +358,7 @@ static VALUE disjoint_union_unite(VALUE self, VALUE arg1, VALUE arg2) {
|
|
400
358
|
* - Tarjan, Robert E., van Leeuwen, Jan (1984). _Worst-case analysis of set union algorithms_. Journal of the ACM. 31 (2): 245–281.
|
401
359
|
*/
|
402
360
|
void Init_c_disjoint_union() {
|
403
|
-
VALUE mDataStructuresRMolinari = rb_define_module("DataStructuresRMolinari");
|
361
|
+
//VALUE mDataStructuresRMolinari = rb_define_module("DataStructuresRMolinari");
|
404
362
|
VALUE cDisjointUnion = rb_define_class_under(mDataStructuresRMolinari, "CDisjointUnion", rb_cObject);
|
405
363
|
|
406
364
|
rb_define_alloc_func(cDisjointUnion, disjoint_union_alloc);
|
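The two optimizations used in the C code above, path halving in find() and linking by rank in link_roots(), can be sketched in pure Ruby as follows. This is an illustration under stated assumptions rather than the gem's `DisjointUnion` class: the name `SketchDisjointUnion` is hypothetical, it raises `ArgumentError` where the gem raises `Shared::DataError`, and `unite` silently returns when the two elements already share a subset.

```ruby
# Hypothetical illustration only; not the gem's DisjointUnion or CDisjointUnion.
class SketchDisjointUnion
  attr_reader :subset_count

  def initialize(size = 0)
    @parent = Array.new(size) { |i| i } # every element starts as its own root
    @rank = Array.new(size, 0)
    @subset_count = size
  end

  # Add a new element in its own singleton subset.
  def make_set(element)
    raise ArgumentError, "#{element} must be a non-negative integer" unless element.is_a?(Integer) && element >= 0
    raise ArgumentError, "#{element} is already present" if present?(element)

    @parent[element] = element # Ruby pads any skipped slots with nil, our "absent" marker
    @rank[element] = 0
    @subset_count += 1
  end

  # The canonical representative (tree root) of the subset containing element.
  def find(element)
    raise ArgumentError, "#{element} is not in the universe" unless present?(element)

    x = element
    # Path halving: point x at its grandparent, then step up to that grandparent.
    while @parent[x] != @parent[@parent[x]]
      @parent[x] = @parent[@parent[x]]
      x = @parent[x]
    end
    @parent[x]
  end

  # Merge the subsets containing the two elements.
  def unite(elt1, elt2)
    root1 = find(elt1)
    root2 = find(elt2)
    return if root1 == root2

    link_roots(root1, root2)
  end

  private

  def present?(element)
    element.is_a?(Integer) && element >= 0 && !@parent[element].nil?
  end

  # Link by rank: the root of lower rank becomes a child of the other root.
  def link_roots(root1, root2)
    if @rank[root1] > @rank[root2]
      @parent[root2] = root1
    elsif @rank[root1] == @rank[root2]
      @parent[root2] = root1
      @rank[root1] += 1
    else
      @parent[root1] = root2
    end
    @subset_count -= 1
  end
end

du = SketchDisjointUnion.new(6)
du.unite(0, 1)
du.unite(2, 3)
du.unite(1, 3)
puts du.find(0) == du.find(2) # 0 and 2 are now in the same subset
puts du.subset_count
```

Storing `nil` for absent elements plays the role of the `DEFAULT_PARENT` sentinel in the C version, so a sparse universe (for example, calling `make_set(8)` on a six-element structure) leaves the skipped indices marked as not present.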
data/ext/c_disjoint_union/extconf.rb
CHANGED
@@ -3,10 +3,15 @@ require 'mkmf'
 abort 'missing malloc()' unless have_func "malloc"
 abort 'missing realloc()' unless have_func "realloc"
 
-if try_cflags('-
-  append_cflags('-
+if try_cflags('-O3')
+  append_cflags('-O3')
 end
 
 extension_name = "c_disjoint_union"
 dir_config(extension_name)
+
+$srcs = ["disjoint_union.c", "../shared.c"]
+$INCFLAGS << " -I$(srcdir)/.."
+$VPATH << "$(srcdir)/.."
+
 create_makefile("data_structures_rmolinari/c_disjoint_union")
data/ext/c_segment_tree_template/extconf.rb
ADDED
@@ -0,0 +1,17 @@
+require 'mkmf'
+
+abort 'missing malloc()' unless have_func "malloc"
+abort 'missing realloc()' unless have_func "realloc"
+
+if try_cflags('-O3')
+  append_cflags('-O3')
+end
+
+extension_name = "c_segment_tree_template"
+dir_config(extension_name)
+
+$srcs = ["segment_tree_template.c", "../shared.c"]
+$INCFLAGS << " -I$(srcdir)/.."
+$VPATH << "$(srcdir)/.."
+
+create_makefile("data_structures_rmolinari/c_segment_tree_template")
data/ext/c_segment_tree_template/segment_tree_template.c
ADDED
@@ -0,0 +1,362 @@
|
|
1
|
+
/*
|
2
|
+
* This is a C implementation of a Segment Tree data structure.
|
3
|
+
*
|
4
|
+
* More specifically, it is the C version of the SegmentTreeTemplate Ruby class, for which see elsewhere in the repo.
|
5
|
+
*/
|
6
|
+
|
7
|
+
#include "ruby.h"
|
8
|
+
#include "shared.h"
|
9
|
+
|
10
|
+
#define single_cell_val_at(seg_tree, idx) rb_funcall(seg_tree->single_cell_array_val_lambda, rb_intern("call"), 1, LONG2FIX(idx))
|
11
|
+
#define combined_val(seg_tree, v1, v2) rb_funcall(seg_tree->combine_lambda, rb_intern("call"), 2, (v1), (v2))
|
12
|
+
|
13
|
+
/**
|
14
|
+
* The C implementation of a generic Segment Tree
|
15
|
+
*/
|
16
|
+
|
17
|
+
typedef struct {
|
18
|
+
VALUE *tree; // The 1-based implicit binary tree in which the data structure lives
|
19
|
+
VALUE single_cell_array_val_lambda;
|
20
|
+
VALUE combine_lambda;
|
21
|
+
VALUE identity;
|
22
|
+
size_t size; // the size of the underlying data array
|
23
|
+
size_t tree_alloc_size; // the size of the VALUE* tree array
|
24
|
+
} segment_tree_data;
|
25
|
+
|
26
|
+
/************************************************************
|
27
|
+
* Memory Management
|
28
|
+
*
|
29
|
+
*/
|
30
|
+
|
31
|
+
/*
|
32
|
+
* Create one (on the heap).
|
33
|
+
*/
|
34
|
+
static segment_tree_data *create_segment_tree() {
|
35
|
+
segment_tree_data *segment_tree = malloc(sizeof(segment_tree_data));
|
36
|
+
|
37
|
+
// Allocate the structures
|
38
|
+
segment_tree->tree = NULL; // we don't yet know how much space we need
|
39
|
+
|
40
|
+
segment_tree->single_cell_array_val_lambda = 0;
|
41
|
+
segment_tree->combine_lambda = 0;
|
42
|
+
segment_tree->size = 0; // we don't know the right value yet
|
43
|
+
|
44
|
+
return segment_tree;
|
45
|
+
}
|
46
|
+
|
47
|
+
/*
|
48
|
+
* Free the memory associated with a segment_tree.
|
49
|
+
*
|
50
|
+
* This will end up getting triggered by the Ruby garbage collector. Ruby learns about it via the segment_tree_type struct below.
|
51
|
+
*/
|
52
|
+
static void segment_tree_free(void *ptr) {
|
53
|
+
if (ptr) {
|
54
|
+
segment_tree_data *segment_tree = ptr;
|
55
|
+
xfree(segment_tree->tree);
|
56
|
+
xfree(segment_tree);
|
57
|
+
}
|
58
|
+
}
|
59
|
+
|
60
|
+
/*
|
61
|
+
* How much memory (roughly) does a segment_tree_data instance consume?
|
62
|
+
*
|
63
|
+
* I guess the Ruby runtime can use this information when deciding how aggressive to be during garbage collection and such.
|
64
|
+
*/
|
65
|
+
static size_t segment_tree_memsize(const void *ptr) {
|
66
|
+
if (ptr) {
|
67
|
+
const segment_tree_data *st = ptr;
|
68
|
+
|
69
|
+
// Space for the tree array plus the size of the segment_tree_data struct itself.
|
70
|
+
return sizeof( VALUE ) * st->tree_alloc_size * 4 + sizeof(segment_tree_data);
|
71
|
+
} else {
|
72
|
+
return 0;
|
73
|
+
}
|
74
|
+
}
|
75
|
+
|
76
|
+
/*
|
77
|
+
* Mark the Ruby objects we hold so that the Ruby garbage collector knows that they are still in use.
|
78
|
+
*/
|
79
|
+
static void segment_tree_mark(void *ptr) {
|
80
|
+
segment_tree_data *st = ptr;
|
81
|
+
|
82
|
+
rb_gc_mark(st->combine_lambda);
|
83
|
+
rb_gc_mark(st->single_cell_array_val_lambda);
|
84
|
+
rb_gc_mark(st->identity);
|
85
|
+
|
86
|
+
for (size_t i = 0; i < st->tree_alloc_size; i++) {
|
87
|
+
VALUE value = st->tree[i];
|
88
|
+
if (value) {
|
89
|
+
rb_gc_mark(value);
|
90
|
+
}
|
91
|
+
}
|
92
|
+
}
|
93
|
+
|
94
|
+
|
95
|
+
/*
|
96
|
+
* A configuration struct that tells the Ruby runtime how to deal with a segment_tree_data object.
|
97
|
+
*
|
98
|
+
* https://docs.ruby-lang.org/en/master/extension_rdoc.html#label-Encapsulate+C+data+into+a+Ruby+object
|
99
|
+
*/
|
100
|
+
static const rb_data_type_t segment_tree_type = {
|
101
|
+
.wrap_struct_name = "segment_tree_template",
|
102
|
+
{ // help for the Ruby garbage collector
|
103
|
+
.dmark = segment_tree_mark, // dmark, for marking other Ruby objects.
|
104
|
+
.dfree = segment_tree_free, // how to free the memory associated with an object
|
105
|
+
.dsize = segment_tree_memsize, // roughly how much space does the object consume?
|
106
|
+
},
|
107
|
+
.data = NULL, // a data field we could use for something here if we wanted. Ruby ignores it
|
108
|
+
.flags = 0 // GC-related flag values.
|
109
|
+
};
|
110
|
+
|
111
|
+
/*
|
112
|
+
* End memory management functions.
|
113
|
+
************************************************************/
|
114
|
+
|
115
|
+
|
116
|
+
/************************************************************
|
117
|
+
* Wrapping and unwrapping the C struct and other things.
|
118
|
+
*
|
119
|
+
*/
|
120
|
+
|
121
|
+
/*
|
122
|
+
* Unwrap a Ruby-side segment tree object to get the C struct inside.
|
123
|
+
*
|
124
|
+
* TODO: consider a macro in a shared header
|
125
|
+
*/
|
126
|
+
static segment_tree_data *unwrapped(VALUE self) {
|
127
|
+
segment_tree_data *segment_tree;
|
128
|
+
TypedData_Get_Struct((self), segment_tree_data, &segment_tree_type, segment_tree);
|
129
|
+
return segment_tree;
|
130
|
+
}
|
131
|
+
|
132
|
+
/*
|
133
|
+
* Allocate a segment_tree_data struct and wrap it for the Ruby runtime.
|
134
|
+
*
|
135
|
+
* This is for CSegmentTreeTemplate.allocate on the Ruby side.
|
136
|
+
*/
|
137
|
+
static VALUE segment_tree_alloc(VALUE klass) {
|
138
|
+
// Get one on the heap
|
139
|
+
segment_tree_data *segment_tree = create_segment_tree();
|
140
|
+
// ...and wrap it into a Ruby object
|
141
|
+
return TypedData_Wrap_Struct(klass, &segment_tree_type, segment_tree);
|
142
|
+
}
|
143
|
+
|
144
|
+
/*
|
145
|
+
* End wrapping and unwrapping functions.
|
146
|
+
************************************************************/
|
147
|
+
|
148
|
+
/************************************************************
|
149
|
+
* The Segment Tree API on the C side.
|
150
|
+
*
|
151
|
+
* We wrap these in the Ruby-ready functions below
|
152
|
+
*/
|
153
|
+
|
154
|
+
/*
|
155
|
+
* Recursively build the internal tree data structure.
|
156
|
+
*
|
157
|
+
* - tree_idx: the index into the tree array of the node being calculated
|
158
|
+
* - [tree_l, tree_r]: the sub-interval of the underlying array data corresponding to the tree node being calculated.
|
159
|
+
*/
|
160
|
+
static void build(segment_tree_data *segment_tree, size_t tree_idx, size_t tree_l, size_t tree_r) {
|
161
|
+
VALUE *tree = segment_tree->tree;
|
162
|
+
|
163
|
+
if (tree_l == tree_r) {
|
164
|
+
// Base case: the node corresponds to a subarray of length 1.
|
165
|
+
segment_tree->tree[tree_idx] = single_cell_val_at(segment_tree, tree_l);
|
166
|
+
} else {
|
167
|
+
// Build the two child nodes, and then combine their values for this node.
|
168
|
+
size_t mid = midpoint(tree_l, tree_r);
|
169
|
+
size_t left = left_child(tree_idx);
|
170
|
+
size_t right = right_child(tree_idx);
|
171
|
+
|
172
|
+
build(segment_tree, left, tree_l, mid);
|
173
|
+
build(segment_tree, right, mid + 1, tree_r);
|
174
|
+
|
175
|
+
VALUE comb_val = combined_val(segment_tree, tree[left], tree[right]);
|
176
|
+
segment_tree->tree[tree_idx] = comb_val;
|
177
|
+
}
|
178
|
+
}
|
179
|
+
|
180
|
+
/*
|
181
|
+
* Set up the internals with the arguments we get from #initialize.
|
182
|
+
*
|
183
|
+
* - combine: must be callable
|
184
|
+
* - single_cell_array_val: must be callable
|
185
|
+
* - size: must be a positive integer
|
186
|
+
* - identity: we don't care what it is.
|
187
|
+
* - maybe we should check at least that it is not 0. But Qnil is fine.
|
188
|
+
*/
|
189
|
+
static void setup(segment_tree_data* seg_tree, VALUE combine, VALUE single_cell_array_val, VALUE size, VALUE identity) {
|
190
|
+
VALUE idCall = rb_intern("call");
|
191
|
+
|
192
|
+
if (!rb_obj_respond_to(combine, idCall, TRUE)) {
|
193
|
+
rb_raise(rb_eArgError, "wrong type argument %"PRIsVALUE" (should be callable)", rb_obj_class(combine));
|
194
|
+
}
|
195
|
+
|
196
|
+
if (!rb_obj_respond_to(single_cell_array_val, idCall, TRUE)) {
|
197
|
+
rb_raise(rb_eArgError, "wrong type argument %"PRIsVALUE" (should be callable)", rb_obj_class(single_cell_array_val));
|
198
|
+
}
|
199
|
+
|
200
|
+
seg_tree->combine_lambda = combine;
|
201
|
+
seg_tree->single_cell_array_val_lambda = single_cell_array_val;
|
202
|
+
seg_tree->identity = identity;
|
203
|
+
seg_tree->size = checked_nonneg_fixnum(size);
|
204
|
+
|
205
|
+
if (seg_tree->size == 0) {
|
206
|
+
rb_raise(rb_eArgError, "size must be positive.");
|
207
|
+
}
|
208
|
+
|
209
|
+
// Implicit binary tree with n leaves and straightforward left() and right() may use indices up to 4n. But see here for a way to
|
210
|
+
// reduce the requirement to 2n: https://cp-algorithms.com/data_structures/segment_tree.html#memory-efficient-implementation
|
211
|
+
size_t tree_size = 1 + 4 * seg_tree->size;
|
212
|
+
seg_tree->tree = calloc(tree_size, sizeof(VALUE));
|
213
|
+
seg_tree->tree_alloc_size = tree_size;
|
214
|
+
|
215
|
+
build(seg_tree, TREE_ROOT, 0, seg_tree->size - 1);
|
216
|
+
}
|
217
|
+
|
218
|
+
|
219
|
+
/*
|
220
|
+
* Determine the value for the subarray A(left, right).
|
221
|
+
*
|
222
|
+
* - tree_idx: the index in the array of the node we are currently visiting
|
223
|
+
* - tree_l..tree_r: the subarray handled by the current node.
|
224
|
+
* - left..right: the subarray whose value we are currently looking for.
|
225
|
+
*
|
226
|
+
* As an invariant we have left..right \subset tree_l..tree_r.
|
227
|
+
*
|
228
|
+
* We start out with
|
229
|
+
* - tree_idx = TREE_ROOT
|
230
|
+
* - tree_l..tree_r = 0..(size - 1), and
|
231
|
+
* - left..right given by the client code's query
|
232
|
+
*
|
233
|
+
* If [tree_l, tree_r] = [left, right] then the current node gives the desired answer. Otherwise we descend the tree with one or two
|
234
|
+
* recursive calls.
|
235
|
+
*
|
236
|
+
* If left..right is contained in the bottom or top half of tree_l..tree_r we descend to the corresponding child with one recursive
|
237
|
+
* call. Otherwise we split left..right at the midpoint of tree_l..tree_r, make two recursive calls, and then combine the results.
|
238
|
+
*/
|
239
|
+
static VALUE determine_val(segment_tree_data* seg_tree, size_t tree_idx, size_t left, size_t right, size_t tree_l, size_t tree_r) {
|
240
|
+
// Does the current tree node exactly serve up the interval we're interested in?
|
241
|
+
if (left == tree_l && right == tree_r) {
|
242
|
+
return seg_tree->tree[tree_idx];
|
243
|
+
}
|
244
|
+
|
245
|
+
// We need to go further down the tree
|
246
|
+
size_t mid = midpoint(tree_l, tree_r);
|
247
|
+
if (mid >= right) {
|
248
|
+
// Our interval is contained by the left child's interval
|
249
|
+
return determine_val(seg_tree, left_child(tree_idx), left, right, tree_l, mid);
|
250
|
+
} else if (mid + 1 <= left) {
|
251
|
+
// Our interval is contained by the right child's interval
|
252
|
+
return determine_val(seg_tree, right_child(tree_idx), left, right, mid + 1, tree_r);
|
253
|
+
} else {
|
254
|
+
// Our interval is split between the two, so we need to combine the results from the children.
|
255
|
+
return rb_funcall(
|
256
|
+
seg_tree->combine_lambda, rb_intern("call"), 2,
|
257
|
+
determine_val(seg_tree, left_child(tree_idx), left, mid, tree_l, mid),
|
258
|
+
determine_val(seg_tree, right_child(tree_idx), mid + 1, right, mid + 1, tree_r)
|
259
|
+
);
|
260
|
+
}
|
261
|
+
}
|
262
|
+
|
263
|
+
/*
|
264
|
+
* Update the structure to reflect the change in the underlying array at index idx.
|
265
|
+
*
|
266
|
+
* - idx: the index at which the underlying array data has changed.
|
267
|
+
* - tree_idx: the index in the internal data structure of the node we are currently visiting.
|
268
|
+
* - tree_l..tree_r: the range handled by the current node
|
269
|
+
*/
|
270
|
+
static void update_val_at(segment_tree_data *seg_tree, size_t idx, size_t tree_idx, size_t tree_l, size_t tree_r) {
|
271
|
+
if (tree_l == tree_r) {
|
272
|
+
// We have found the base case of our update
|
273
|
+
if (tree_l != idx) {
|
274
|
+
rb_raise(
|
275
|
+
eSharedInternalLogicError,
|
276
|
+
"tree_l == tree_r == %lu but they do not agree with the idx %lu holding the updated value",
|
277
|
+
tree_r, idx
|
278
|
+
);
|
279
|
+
}
|
280
|
+
seg_tree->tree[tree_idx] = single_cell_val_at(seg_tree, tree_l);
|
281
|
+
} else {
|
282
|
+
// Recursively update the appropriate subtree...
|
283
|
+
size_t mid = midpoint(tree_l, tree_r);
|
284
|
+
size_t left = left_child(tree_idx);
|
285
|
+
size_t right = right_child(tree_idx);
|
286
|
+
if (mid >= idx) {
|
287
|
+
update_val_at(seg_tree, idx, left, tree_l, mid);
|
288
|
+
} else {
|
289
|
+
update_val_at(seg_tree, idx, right, mid + 1, tree_r);
|
290
|
+
}
|
291
|
+
// ...and ourself to incorporate the change
|
292
|
+
seg_tree->tree[tree_idx] = combined_val(seg_tree, seg_tree->tree[left], seg_tree->tree[right]);
|
293
|
+
}
|
294
|
+
}
|
295
|
+
|
296
|
+
/*
|
297
|
+
* End C implementation of the Segment Tree API
|
298
|
+
************************************************************/
|
299
|
+
|
300
|
+
/**
|
301
|
+
* And now the wrappers around the C functionality.
|
302
|
+
*/
|
303
|
+
|
304
|
+
/*
|
305
|
+
* CSegmentTreeTemplate#c_initialize.
|
306
|
+
*
|
307
|
+
* (see CSegmentTreeTemplate#initialize).
|
308
|
+
*/
|
309
|
+
static VALUE segment_tree_init(VALUE self, VALUE combine, VALUE single_cell_array_val, VALUE size, VALUE identity) {
|
310
|
+
setup(unwrapped(self), combine, single_cell_array_val, size, identity);
|
311
|
+
return self;
|
312
|
+
}
|
313
|
+
|
314
|
+
/*
|
315
|
+
* (see SegmentTreeTemplate#query_on)
|
316
|
+
*/
|
317
|
+
static VALUE segment_tree_query_on(VALUE self, VALUE left, VALUE right) {
|
318
|
+
segment_tree_data* seg_tree = unwrapped(self);
|
319
|
+
size_t c_left = checked_nonneg_fixnum(left);
|
320
|
+
size_t c_right = checked_nonneg_fixnum(right);
|
321
|
+
|
322
|
+
if (c_right >= seg_tree->size) {
|
323
|
+
rb_raise(eSharedDataError, "Bad query interval %lu..%lu (size = %lu)", c_left, c_right, seg_tree->size);
|
324
|
+
}
|
325
|
+
|
326
|
+
if (c_left > c_right) {
|
327
|
+
// empty interval.
|
328
|
+
return seg_tree->identity;
|
329
|
+
}
|
330
|
+
|
331
|
+
return determine_val(seg_tree, TREE_ROOT, c_left, c_right, 0, seg_tree->size - 1);
|
332
|
+
}
|
333
|
+
|
334
|
+
/*
|
335
|
+
* (see SegmentTreeTemplate#update_at)
|
336
|
+
*/
|
337
|
+
static VALUE segment_tree_update_at(VALUE self, VALUE idx) {
|
338
|
+
segment_tree_data *seg_tree = unwrapped(self);
|
339
|
+
size_t c_idx = checked_nonneg_fixnum(idx);
|
340
|
+
|
341
|
+
if (c_idx >= seg_tree->size) {
|
342
|
+
rb_raise(eSharedDataError, "Cannot update value at index %lu, size = %lu", c_idx, seg_tree->size);
|
343
|
+
}
|
344
|
+
|
345
|
+
update_val_at(seg_tree, c_idx, TREE_ROOT, 0, seg_tree->size - 1);
|
346
|
+
|
347
|
+
return Qnil;
|
348
|
+
}
|
349
|
+
|
350
|
+
/*
|
351
|
+
* A generic Segment Tree template, written in C.
|
352
|
+
*
|
353
|
+
* (see SegmentTreeTemplate)
|
354
|
+
*/
|
355
|
+
void Init_c_segment_tree_template() {
|
356
|
+
VALUE cSegmentTreeTemplate = rb_define_class_under(mDataStructuresRMolinari, "CSegmentTreeTemplate", rb_cObject);
|
357
|
+
|
358
|
+
rb_define_alloc_func(cSegmentTreeTemplate, segment_tree_alloc);
|
359
|
+
rb_define_method(cSegmentTreeTemplate, "c_initialize", segment_tree_init, 4);
|
360
|
+
rb_define_method(cSegmentTreeTemplate, "query_on", segment_tree_query_on, 2);
|
361
|
+
rb_define_method(cSegmentTreeTemplate, "update_at", segment_tree_update_at, 1);
|
362
|
+
}
|
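The recursive scheme documented in build, determine_val and update_val_at above can be sketched in plain Ruby. This is an illustrative sketch, not the gem's code: the class name SketchSegmentTree is hypothetical, and it assumes the root sits at index 1 so that the children of node i are 2i and 2i + 1 (the value of TREE_ROOT is not shown in this diff).

```ruby
# Sketch only: a top-down segment tree mirroring the C functions above.
# Assumption: root at index 1; children of node i are 2i and 2i + 1.
class SketchSegmentTree
  ROOT = 1

  def initialize(combine:, single_cell_array_val:, size:, identity:)
    raise ArgumentError, 'size must be positive' unless size.positive?

    @combine = combine
    @single_cell_array_val = single_cell_array_val
    @size = size
    @identity = identity
    @tree = Array.new(1 + 4 * size) # same generous bound as the C code
    build(ROOT, 0, size - 1)
  end

  def query_on(left, right)
    bad = left.negative? || right.negative? || right >= @size
    raise ArgumentError, "Bad query interval #{left}..#{right}" if bad
    return @identity if left > right # empty interval

    determine_val(ROOT, left, right, 0, @size - 1)
  end

  def update_at(idx)
    raise ArgumentError, "Cannot update value at index #{idx}" unless (0...@size).cover?(idx)

    update_val_at(idx, ROOT, 0, @size - 1)
    nil
  end

  private

  def build(tree_idx, tree_l, tree_r)
    if tree_l == tree_r
      @tree[tree_idx] = @single_cell_array_val.call(tree_l) # leaf: subarray of length 1
    else
      mid = (tree_l + tree_r) / 2
      build(2 * tree_idx, tree_l, mid)
      build(2 * tree_idx + 1, mid + 1, tree_r)
      @tree[tree_idx] = @combine.call(@tree[2 * tree_idx], @tree[2 * tree_idx + 1])
    end
  end

  def determine_val(tree_idx, left, right, tree_l, tree_r)
    return @tree[tree_idx] if left == tree_l && right == tree_r

    mid = (tree_l + tree_r) / 2
    if mid >= right # query lies entirely in the left child
      determine_val(2 * tree_idx, left, right, tree_l, mid)
    elsif mid + 1 <= left # query lies entirely in the right child
      determine_val(2 * tree_idx + 1, left, right, mid + 1, tree_r)
    else # query straddles the midpoint: split it and combine
      @combine.call(
        determine_val(2 * tree_idx, left, mid, tree_l, mid),
        determine_val(2 * tree_idx + 1, mid + 1, right, mid + 1, tree_r)
      )
    end
  end

  def update_val_at(idx, tree_idx, tree_l, tree_r)
    if tree_l == tree_r
      @tree[tree_idx] = @single_cell_array_val.call(tree_l)
    else
      mid = (tree_l + tree_r) / 2
      if mid >= idx
        update_val_at(idx, 2 * tree_idx, tree_l, mid)
      else
        update_val_at(idx, 2 * tree_idx + 1, mid + 1, tree_r)
      end
      @tree[tree_idx] = @combine.call(@tree[2 * tree_idx], @tree[2 * tree_idx + 1])
    end
  end
end

data = [3, 1, 4, 1, 5, 9, 2, 6]
sums = SketchSegmentTree.new(
  combine: ->(a, b) { a + b },
  single_cell_array_val: ->(i) { data[i] },
  size: data.size,
  identity: 0
)

total_before = sums.query_on(2, 5) # 4 + 1 + 5 + 9
data[4] = 10
sums.update_at(4)                  # the lambda reads the new data[4]
total_after = sums.query_on(2, 5)
```

Note that update_at takes only the index: the new value is fetched through single_cell_array_val, which is why that lambda has to close over the live data.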
data/ext/shared.c
ADDED
@@ -0,0 +1,32 @@
|
|
1
|
+
#include "shared.h"
|
2
|
+
|
3
|
+
/*
|
4
|
+
* Arithmetic for in-array binary tree
|
5
|
+
*/
|
6
|
+
size_t midpoint(size_t left, size_t right) {
|
7
|
+
return (left + right) / 2;
|
8
|
+
}
|
9
|
+
|
10
|
+
size_t left_child(size_t i) {
|
11
|
+
return i << 1;
|
12
|
+
}
|
13
|
+
|
14
|
+
size_t right_child(size_t i) {
|
15
|
+
return 1 + (i << 1);
|
16
|
+
}
|
17
|
+
|
18
|
+
/*
|
19
|
+
* Check that a Ruby value is a non-negative Fixnum and convert it to a C unsigned long
|
20
|
+
*/
|
21
|
+
unsigned long checked_nonneg_fixnum(VALUE val) {
|
22
|
+
Check_Type(val, T_FIXNUM);
|
23
|
+
long c_val = FIX2LONG(val);
|
24
|
+
|
25
|
+
if (c_val < 0) {
|
26
|
+
rb_raise(eSharedDataError, "Value must be non-negative");
|
27
|
+
}
|
28
|
+
|
29
|
+
return c_val;
|
30
|
+
}
|
31
|
+
|
32
|
+
|
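The index arithmetic above is what drives the `1 + 4 * size` allocation in the segment tree's setup. A small Ruby sketch can check that bound empirically. The helper names max_tree_index and slots_needed are hypothetical, and the sketch assumes the root is at index 1 (TREE_ROOT is not shown in this diff).

```ruby
# Sketch: largest array index reached when a top-down tree is laid out over
# the leaves tree_l..tree_r, with children of node i at 2i and 2i + 1.
# Assumption: the root is node 1.
def max_tree_index(tree_idx, tree_l, tree_r)
  return tree_idx if tree_l == tree_r

  mid = (tree_l + tree_r) / 2
  [
    max_tree_index(2 * tree_idx, tree_l, mid),
    max_tree_index(2 * tree_idx + 1, mid + 1, tree_r)
  ].max
end

# Number of array slots needed for n leaves (index 0 is unused).
def slots_needed(n)
  max_tree_index(1, 0, n - 1) + 1
end

# Compare the real requirement with the 1 + 4n allocation for a few sizes.
[1, 5, 8, 100].each do |n|
  puts "n = #{n}: needs #{slots_needed(n)} slots, allocates #{1 + 4 * n}"
end
```

The bound is loose for most sizes. It is chosen because it is simple and always sufficient, not because it is tight.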
data/lib/data_structures_rmolinari/c_segment_tree_template_impl.rb
ADDED
@@ -0,0 +1,112 @@
|
|
1
|
+
require 'must_be'
|
2
|
+
|
3
|
+
require_relative 'shared'
|
4
|
+
require_relative 'c_segment_tree_template'
|
5
|
+
|
6
|
+
# The template of Segment Tree, which can be used for various interval-related purposes, like efficiently finding the sum (or min or
|
7
|
+
# max) on an arbitrary subarray of a given array.
|
8
|
+
#
|
9
|
+
# There is an excellent description of the data structure at https://cp-algorithms.com/data_structures/segment_tree.html. The
|
10
|
+
# Wikipedia article (https://en.wikipedia.org/wiki/Segment_tree) appears to describe a different data structure which is sometimes
|
11
|
+
# called an "interval tree."
|
12
|
+
#
|
13
|
+
# For more details (and some close-to-metal analysis of run time, especially for large datasets) see
|
14
|
+
# https://en.algorithmica.org/hpc/data-structures/segment-trees/. In particular, this shows how to do a bottom-up implementation,
|
15
|
+
# which is faster, at least for large datasets and cache-relevant compiled code. These issues don't really apply to code written in
|
16
|
+
# Ruby.
|
17
|
+
#
|
18
|
+
# This is a generic implementation, intended to allow easy configuration for concrete instances. See the parameters to the
|
19
|
+
# initializer and the definitions of concrete realisations like MaxValSegmentTree.
|
20
|
+
#
|
21
|
+
# We do O(n) work to build the internal data structure at initialization. Then we answer queries in O(log n) time.
|
22
|
+
class DataStructuresRMolinari::CSegmentTreeTemplate
|
23
|
+
|
24
|
+
# Construct a concrete instance of a Segment Tree. See details at the links above for the underlying concepts here.
|
25
|
+
# @param combine a lambda that takes two values and munges them into a combined value.
|
26
|
+
# - For example, if we are calculating sums over subintervals, combine.call(a, b) = a + b, while if we are doing maxima we will
|
27
|
+
# return max(a, b).
|
28
|
+
# - Things get more complicated when we are calculating, say, the _index_ of the maximal value in a subinterval. Now it is not
|
29
|
+
# enough simply to store that index at each tree node, because to combine the indices from two child nodes we need to know
|
30
|
+
# not only the index of the maximal element in each child node's interval but also the maximal values themselves, so we know
|
31
|
+
# which one "wins" for the parent node. This affects the sort of work we need to do when combining and the value provided by
|
32
|
+
# the +single_cell_array_val+ lambda.
|
33
|
+
# @param single_cell_array_val a lambda that takes an index i and returns the value we need to store in the #build
|
34
|
+
# operation for the subinterval i..i.
|
35
|
+
# - This will often simply be the value data[i], but in some cases it will be something else. For example, when we are
|
36
|
+
# calculating the index of the maximal value on each subinterval we need [i, data[i]] here.
|
37
|
+
# - If +update_at+ is called later, this lambda must close over the underlying data in a way that captures the updated value.
|
38
|
+
# @param size the size of the underlying data array, used in certain internal arithmetic.
|
39
|
+
# @param identity the value to return when we are querying on an empty interval
|
40
|
+
# - for sums, this will be zero; for maxima, this will be -Infinity, etc
|
41
|
+
def initialize(combine:, single_cell_array_val:, size:, identity:)
|
42
|
+
# having sorted out the keyword arguments, pass them more easily to the C layer.
|
43
|
+
c_initialize(combine, single_cell_array_val, size, identity)
|
44
|
+
end
|
45
|
+
end
|
46
|
+
|
47
|
+
# A segment tree that for an array A(0...n) answers questions of the form "what is the maximum value in the subinterval A(i..j)?"
|
48
|
+
# in O(log n) time.
|
49
|
+
#
|
50
|
+
# C version
|
51
|
+
#
|
52
|
+
# TODO: share the definition with (non-C) MaxValSegmentTree. The only difference is the class of the underlying segment tree
|
53
|
+
# template.
|
54
|
+
module DataStructuresRMolinari
|
55
|
+
class CMaxValSegmentTree
|
56
|
+
extend Forwardable
|
57
|
+
|
58
|
+
# Tell the tree that the value at idx has changed
|
59
|
+
def_delegator :@structure, :update_at
|
60
|
+
|
61
|
+
# @param data an object that contains values at integer indices based at 0, via +data[i]+.
|
62
|
+
# - This will usually be an Array, but it could also be a hash or a proc.
|
63
|
+
def initialize(data)
|
64
|
+
@structure = CSegmentTreeTemplate.new(
|
65
|
+
combine: ->(a, b) { [a, b].max },
|
66
|
+
single_cell_array_val: ->(i) { data[i] },
|
67
|
+
size: data.size,
|
68
|
+
identity: -Shared::INFINITY
|
69
|
+
)
|
70
|
+
end
|
71
|
+
|
72
|
+
# The maximum value in A(i..j).
|
73
|
+
#
|
74
|
+
# The arguments must be integers in 0...(A.size)
|
75
|
+
# @return the largest value in A(i..j) or -Infinity if i > j.
|
76
|
+
def max_on(i, j)
|
77
|
+
@structure.query_on(i, j)
|
78
|
+
end
|
79
|
+
end
|
80
|
+
|
81
|
+
# A segment tree that for an array A(0...n) answers questions of the form "what is the index of the maximal value in the
|
82
|
+
# subinterval A(i..j)?" in O(log n) time.
|
83
|
+
#
|
84
|
+
# C version
|
85
|
+
class CIndexOfMaxValSegmentTree
|
86
|
+
extend Forwardable
|
87
|
+
|
88
|
+
# Tell the tree that the value at idx has changed
|
89
|
+
def_delegator :@structure, :update_at
|
90
|
+
|
91
|
+
# @param (see MaxValSegmentTree#initialize)
|
92
|
+
def initialize(data)
|
93
|
+
@structure = CSegmentTreeTemplate.new(
|
94
|
+
combine: ->(p1, p2) { p1[1] >= p2[1] ? p1 : p2 },
|
95
|
+
single_cell_array_val: ->(i) { [i, data[i]] },
|
96
|
+
size: data.size,
|
97
|
+
identity: nil
|
98
|
+
)
|
99
|
+
end
|
100
|
+
|
101
|
+
# The index of the maximum value in A(i..j)
|
102
|
+
#
|
103
|
+
# The arguments must be integers in 0...(A.size)
|
104
|
+
# @return (Integer, nil) the index of the largest value in A(i..j) or +nil+ if i > j.
|
105
|
+
# - If there is more than one entry with that value, return one of the indices. There is no guarantee as to which one.
|
106
|
+
# - Return +nil+ if i > j
|
107
|
+
def index_of_max_val_on(i, j)
|
108
|
+
@structure.query_on(i, j)&.first # discard the value part of the pair, which is a bookkeeping detail
|
109
|
+
end
|
110
|
+
end
|
111
|
+
|
112
|
+
end
|
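The pair-valued bookkeeping used by CIndexOfMaxValSegmentTree can be seen in isolation with the same two lambdas. In this minimal sketch a plain left-to-right fold stands in for the tree, so it illustrates only how the lambdas behave, not the O(log n) query path. The sample data is made up.

```ruby
# Sketch: the lambdas from CIndexOfMaxValSegmentTree, exercised without the tree.
data = [3, 9, 2, 9]

# Each leaf stores [index, value] so that combine can compare values
# while still remembering where the winner came from.
single_cell_array_val = ->(i) { [i, data[i]] }
combine = ->(p1, p2) { p1[1] >= p2[1] ? p1 : p2 }

pairs = (0...data.size).map { |i| single_cell_array_val.call(i) }
best = pairs.reduce { |acc, pair| combine.call(acc, pair) }

# As in index_of_max_val_on, keep only the index part of the pair.
index_of_max = best&.first

# Because the lambda closes over data, it reflects later updates.
data[2] = 10
updated_leaf = single_cell_array_val.call(2)
```

With a tie, as between indices 1 and 3 here, this fold keeps the earlier pair because of the `>=`. The class documentation makes no such promise for the tree, so callers should not rely on it.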
data/lib/data_structures_rmolinari/segment_tree_template.rb
CHANGED
@@ -17,6 +17,7 @@ require_relative 'shared'
|
|
17
17
|
#
|
18
18
|
# We do O(n) work to build the internal data structure at initialization. Then we answer queries in O(log n) time.
|
19
19
|
class DataStructuresRMolinari::SegmentTreeTemplate
|
20
|
+
include Shared
|
20
21
|
include Shared::BinaryTreeArithmetic
|
21
22
|
|
22
23
|
# Construct a concrete instance of a Segment Tree. See details at the links above for the underlying concepts here.
|
@@ -47,27 +48,29 @@ class DataStructuresRMolinari::SegmentTreeTemplate
|
|
47
48
|
end
|
48
49
|
|
49
50
|
# The desired value (max, sum, etc.) on the subinterval left..right.
|
51
|
+
#
|
50
52
|
# @param left the left end of the subinterval.
|
51
53
|
# @param right the right end (inclusive) of the subinterval.
|
52
54
|
#
|
55
|
+
# It must be that left..right is contained in 0...size.
|
56
|
+
#
|
53
57
|
# The type of the return value depends on the concrete instance of the segment tree. We return the _identity_ element provided at
|
54
58
|
# construction time if the interval is empty.
|
55
59
|
def query_on(left, right)
|
56
|
-
raise DataError, "Bad query interval #{left}..#{right}"
|
60
|
+
raise DataError, "Bad query interval #{left}..#{right} (size = #{@size})" unless (0...@size).cover?(left..right)
|
57
61
|
|
58
62
|
return @identity if left > right # empty interval
|
59
63
|
|
60
64
|
determine_val(root, left, right, 0, @size - 1)
|
61
65
|
end
|
62
66
|
|
63
|
-
#
|
67
|
+
# Reflect the fact that the underlying array has been updated at the given idx
|
64
68
|
#
|
65
69
|
# @param idx an index in the underlying data array.
|
66
70
|
#
|
67
71
|
# Note that we don't need the updated value itself. We get that by calling the lambda +single_cell_array_val+ supplied at
|
68
72
|
# construction.
|
69
73
|
def update_at(idx)
|
70
|
-
raise DataError, 'Cannot update an index outside the initial range of the underlying data' unless (0...@size).cover?(idx)
|
71
74
|
|
72
75
|
update_val_at(idx, root, 0, @size - 1)
|
73
76
|
end
|
@@ -105,9 +108,9 @@ class DataStructuresRMolinari::SegmentTreeTemplate
|
|
105
108
|
left = left(tree_idx)
|
106
109
|
right = right(tree_idx)
|
107
110
|
if mid >= idx
|
108
|
-
update_val_at(idx, left
|
111
|
+
update_val_at(idx, left, tree_l, mid)
|
109
112
|
else
|
110
|
-
update_val_at(idx, right
|
113
|
+
update_val_at(idx, right, mid + 1, tree_r)
|
111
114
|
end
|
112
115
|
@tree[tree_idx] = @combine.call(@tree[left], @tree[right])
|
113
116
|
end
|
data/lib/data_structures_rmolinari.rb
CHANGED
@@ -10,9 +10,13 @@ end
|
|
10
10
|
|
11
11
|
# These define classes inside module DataStructuresRMolinari
|
12
12
|
require_relative 'data_structures_rmolinari/algorithms'
|
13
|
+
|
13
14
|
require_relative 'data_structures_rmolinari/disjoint_union'
|
14
15
|
require_relative 'data_structures_rmolinari/c_disjoint_union' # version as a C extension
|
16
|
+
|
15
17
|
require_relative 'data_structures_rmolinari/segment_tree_template'
|
18
|
+
require_relative 'data_structures_rmolinari/c_segment_tree_template_impl'
|
19
|
+
|
16
20
|
require_relative 'data_structures_rmolinari/heap'
|
17
21
|
require_relative 'data_structures_rmolinari/max_priority_search_tree'
|
18
22
|
require_relative 'data_structures_rmolinari/min_priority_search_tree'
|
@@ -34,6 +38,8 @@ module DataStructuresRMolinari
|
|
34
38
|
# @param data an object that contains values at integer indices based at 0, via +data[i]+.
|
35
39
|
# - This will usually be an Array, but it could also be a hash or a proc.
|
36
40
|
def initialize(data)
|
41
|
+
data.must_be_a Enumerable
|
42
|
+
|
37
43
|
@structure = SegmentTreeTemplate.new(
|
38
44
|
combine: ->(a, b) { [a, b].max },
|
39
45
|
single_cell_array_val: ->(i) { data[i] },
|
@@ -61,6 +67,8 @@ module DataStructuresRMolinari
|
|
61
67
|
|
62
68
|
# @param (see MaxValSegmentTree#initialize)
|
63
69
|
def initialize(data)
|
70
|
+
data.must_be_a Enumerable
|
71
|
+
|
64
72
|
@structure = SegmentTreeTemplate.new(
|
65
73
|
combine: ->(p1, p2) { p1[1] >= p2[1] ? p1 : p2 },
|
66
74
|
single_cell_array_val: ->(i) { [i, data[i]] },
|
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: data_structures_rmolinari
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.4.
|
4
|
+
version: 0.4.4
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Rory Molinari
|
8
8
|
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date: 2023-
|
11
|
+
date: 2023-02-02 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: must_be
|
@@ -79,6 +79,7 @@ email: rorymolinari@gmail.com
|
|
79
79
|
executables: []
|
80
80
|
extensions:
|
81
81
|
- ext/c_disjoint_union/extconf.rb
|
82
|
+
- ext/c_segment_tree_template/extconf.rb
|
82
83
|
extra_rdoc_files: []
|
83
84
|
files:
|
84
85
|
- CHANGELOG.md
|
@@ -86,8 +87,12 @@ files:
|
|
86
87
|
- Rakefile
|
87
88
|
- ext/c_disjoint_union/disjoint_union.c
|
88
89
|
- ext/c_disjoint_union/extconf.rb
|
90
|
+
- ext/c_segment_tree_template/extconf.rb
|
91
|
+
- ext/c_segment_tree_template/segment_tree_template.c
|
92
|
+
- ext/shared.c
|
89
93
|
- lib/data_structures_rmolinari.rb
|
90
94
|
- lib/data_structures_rmolinari/algorithms.rb
|
95
|
+
- lib/data_structures_rmolinari/c_segment_tree_template_impl.rb
|
91
96
|
- lib/data_structures_rmolinari/disjoint_union.rb
|
92
97
|
- lib/data_structures_rmolinari/heap.rb
|
93
98
|
- lib/data_structures_rmolinari/max_priority_search_tree.rb
|