data_structures_rmolinari 0.4.2 → 0.4.4
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +12 -0
- data/README.md +35 -16
- data/Rakefile +6 -4
- data/ext/c_disjoint_union/disjoint_union.c +100 -142
- data/ext/c_disjoint_union/extconf.rb +7 -2
- data/ext/c_segment_tree_template/extconf.rb +17 -0
- data/ext/c_segment_tree_template/segment_tree_template.c +362 -0
- data/ext/shared.c +32 -0
- data/lib/data_structures_rmolinari/c_segment_tree_template_impl.rb +112 -0
- data/lib/data_structures_rmolinari/segment_tree_template.rb +8 -5
- data/lib/data_structures_rmolinari.rb +8 -0
- metadata +7 -2
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: 943ac55678a074cc0da3667dccbb07ee7d203639233f53bd8587af7fd8cd062e
|
|
4
|
+
data.tar.gz: ad235e5f4714e699f1cf5f113dd4b3a356a194cced5a74b60e17c5e3a896e01b
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: a68de76c88c67fadc42752610c695b1f0b8fd17f34db9c806291aeab4c933fe84c6523615deb4197e1c9fa6d36dce30987cc4e8896a2b0c1700b7e72b5bd2fff
|
|
7
|
+
data.tar.gz: 9063d89a98d599f27db2585bf383dbfb13e8f927abce64ac7eafb2edd70c490ddad1f1fc51e0f11c24adf29f28ab8c56548a6db264b15ace239c63b1a2ce5a01
|
data/CHANGELOG.md
CHANGED
|
@@ -1,5 +1,17 @@
|
|
|
1
1
|
# Changelog
|
|
2
2
|
|
|
3
|
+
## [Unreleased]
|
|
4
|
+
|
|
5
|
+
- Disjoint Union
|
|
6
|
+
- C extension: use Convenient Containers rather than my janky Dynamic Array attempt.
|
|
7
|
+
|
|
8
|
+
- Segment Tree
|
|
9
|
+
- Add a C implementation as CSegmentTreeTemplate.
|
|
10
|
+
|
|
11
|
+
## [0.4.3] 2023-01-27
|
|
12
|
+
|
|
13
|
+
- Fix bad directive in Rakefile for DisjointUnion C extension
|
|
14
|
+
|
|
3
15
|
## [0.4.2] 2023-01-26
|
|
4
16
|
|
|
5
17
|
### Added
|
data/README.md
CHANGED
|
@@ -4,8 +4,8 @@ This is a small collection of Ruby data structures that I have implemented for m
|
|
|
4
4
|
structure is almost always more educational than simply reading about it and is usually fun. I wrote some of them while
|
|
5
5
|
participating in the Advent of Code (https://adventofcode.com/).
|
|
6
6
|
|
|
7
|
-
|
|
8
|
-
|
|
7
|
+
The implementations are based on the expository descriptions and pseudo-code I found as I read about each structure and so are not
|
|
8
|
+
as fast as possible.
|
|
9
9
|
|
|
10
10
|
The code is available as a gem: https://rubygems.org/gems/data_structures_rmolinari.
|
|
11
11
|
|
|
@@ -42,9 +42,6 @@ It also provides
|
|
|
42
42
|
For more details see https://en.wikipedia.org/wiki/Disjoint-set_data_structure and the paper [[TvL1984]](#references) by Tarjan and
|
|
43
43
|
van Leeuwen.
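To make the ideas from that paper concrete, here is a minimal pure-Ruby sketch of a disjoint union that uses union by rank and path halving. The class name `SketchDisjointUnion` and all of its internals are assumptions made for illustration; this is not the gem's `DisjointUnion` code, and its error handling is deliberately simpler.

```ruby
# Hypothetical sketch, not the gem's implementation: a disjoint union over the
# elements 0...size, with union by rank and path halving.
class SketchDisjointUnion
  attr_reader :subset_count

  def initialize(size)
    @parent = Array.new(size) { |i| i } # every element starts as its own root
    @rank = Array.new(size, 0)
    @subset_count = size
  end

  # The canonical representative (tree root) of the subset containing e.
  def find(e)
    raise ArgumentError, "#{e} is not in the universe" unless e.between?(0, @parent.size - 1)

    x = e
    # Stop once the parent of x is a root, i.e. parent and grandparent agree.
    while @parent[x] != @parent[@parent[x]]
      @parent[x] = @parent[@parent[x]] # path halving: point x at its grandparent
      x = @parent[x]
    end
    @parent[x]
  end

  # Merge the subsets containing e and f. Does nothing if they already coincide.
  def unite(e, f)
    root_e = find(e)
    root_f = find(f)
    return if root_e == root_f

    # Union by rank: hang the root of lower rank under the root of higher rank.
    root_e, root_f = root_f, root_e if @rank[root_e] < @rank[root_f]
    @rank[root_e] += 1 if @rank[root_e] == @rank[root_f]
    @parent[root_f] = root_e
    @subset_count -= 1
  end
end
```

Two elements are in the same subset exactly when `find` returns the same representative for both, so client code can compare `find(a) == find(b)` after any sequence of `unite` calls.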
|
|
44
44
|
|
|
45
|
-
There is an experimental implementation as a C extension in c_disjoint_union.c. The Ruby class for this implementation is
|
|
46
|
-
`CDisjointUnion`. Benchmarks indicate that a long sequence of `unite` calls is about twice as fast.
|
|
47
|
-
|
|
48
45
|
## Heap
|
|
49
46
|
|
|
50
47
|
This is a standard binary heap with an `update` method, suitable for use as a priority queue. There are several supported
|
|
@@ -84,15 +81,15 @@ pointing north.
|
|
|
84
81
|
|
|
85
82
|
There is no `smallest_x_in_3_sided(x0, x1, y0)`. Just use `smallest_x_in_ne(x0, y0)`.
|
|
86
83
|
|
|
84
|
+
(These queries appear rather abstract at first but there are interesting applications. See, for example, section 4 of
|
|
85
|
+
[[McC85]](#references), keeping in mind that the data structure in that paper is actually a _MinPST_.)
|
|
86
|
+
|
|
87
87
|
The single-point queries run in O(log n) time, where n is the size of P, while `enumerate_3_sided` runs in O(m + log n), where m is
|
|
88
88
|
the number of points actually enumerated.
|
|
89
89
|
|
|
90
90
|
The implementation is in `MaxPrioritySearchTree` (MaxPST for short), so called because internally the structure is, among other
|
|
91
91
|
things, a max-heap on the y-coordinates.
|
|
92
92
|
|
|
93
|
-
These queries appear rather abstract at first but there are interesting applications. See, for example, section 4 of
|
|
94
|
-
[[McC85]](#references), keeping in mind that the data structure in that paper is actually a _MinPST_.
|
|
95
|
-
|
|
96
93
|
We also provide a `MinPrioritySearchTree`, which answers analogous queries in the southward-infinite quadrants and 3-sided
|
|
97
94
|
regions.
|
|
98
95
|
|
|
@@ -108,17 +105,17 @@ both a MaxPST and MinPST. But the presentiation is hard to follow in places and
|
|
|
108
105
|
|
|
109
106
|
## Segment Tree
|
|
110
107
|
|
|
111
|
-
|
|
112
|
-
elements in an arbitrary subinterval A
|
|
113
|
-
of A in such a way that the values we store in the nodes can be combined efficiently to determine the desired result for
|
|
114
|
-
subarrays.
|
|
108
|
+
A segment tree stores information related to subintervals of a certain array. For example, a segment tree can be used to find the
|
|
109
|
+
sum of the elements in an arbitrary subinterval A(i..j) of an array A(0..n) in O(log n) time. Each node in the tree corresponds to a
|
|
110
|
+
subarray of A in such a way that the values we store in the nodes can be combined efficiently to determine the desired result for
|
|
111
|
+
arbitrary subarrays.
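The sum example can be sketched in a few lines of Ruby. The class below is a hypothetical illustration (the name `SketchSumSegmentTree` and the method `sum_on` are assumptions, not part of the gem): each node of a 1-based implicit binary tree stores the sum of one subarray, and a query combines O(log n) of those stored sums.

```ruby
# Hypothetical sketch: a sum segment tree stored as a 1-based implicit binary tree.
class SketchSumSegmentTree
  def initialize(data)
    @size = data.size
    @tree = Array.new(4 * [@size, 1].max, 0) # 4n slots are always enough
    build(data, 1, 0, @size - 1) unless data.empty?
  end

  # Sum of data[i..j] (inclusive) in O(log n) time.
  def sum_on(i, j)
    raise ArgumentError, 'bad range' unless 0 <= i && i <= j && j < @size

    query(1, 0, @size - 1, i, j)
  end

  private

  # Node `node` is responsible for the subarray data[lo..hi].
  def build(data, node, lo, hi)
    if lo == hi
      @tree[node] = data[lo]
    else
      mid = (lo + hi) / 2
      build(data, 2 * node, lo, mid)
      build(data, 2 * node + 1, mid + 1, hi)
      @tree[node] = @tree[2 * node] + @tree[2 * node + 1]
    end
  end

  def query(node, lo, hi, i, j)
    return 0 if j < lo || hi < i             # node's interval misses i..j entirely
    return @tree[node] if i <= lo && hi <= j # node's interval lies inside i..j

    mid = (lo + hi) / 2
    query(2 * node, lo, mid, i, j) + query(2 * node + 1, mid + 1, hi, i, j)
  end
end
```

For instance, with `tree = SketchSumSegmentTree.new([3, 1, 4, 1, 5, 9, 2, 6])`, the call `tree.sum_on(2, 4)` adds up 4, 1 and 5.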
|
|
115
112
|
|
|
116
113
|
An excellent description of the idea is found at https://cp-algorithms.com/data_structures/segment_tree.html.
|
|
117
114
|
|
|
118
|
-
Generic code is provided in `SegmentTreeTemplate`. Concrete classes
|
|
119
|
-
|
|
120
|
-
|
|
121
|
-
`
|
|
115
|
+
Generic code is provided in `SegmentTreeTemplate`. Concrete classes provide a handful of simple lambdas and constants to the
|
|
116
|
+
template class's initializer. Figuring out the details requires some knowledge of the internal mechanisms of a segment tree, for
|
|
117
|
+
which the link at cp-algorithms.com is very helpful. See the definitions of the concrete classes `MaxValSegmentTree` and
|
|
118
|
+
`IndexOfMaxValSegmentTree` for examples.
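The relationship between the template and a concrete class can be sketched as follows. This is a simplified, hypothetical stand-in: the names `SketchSegmentTreeTemplate`, `SketchMaxValSegmentTree` and `query_on`, and the keyword arguments, are assumptions chosen to echo the description above, and the real classes in the gem may differ in detail.

```ruby
# Hypothetical sketch of the template idea: the generic class builds and queries
# the tree, while the lambdas and the identity say what the stored values mean.
class SketchSegmentTreeTemplate
  def initialize(combine:, single_cell_array_val:, size:, identity:)
    @combine = combine
    @single_cell_array_val = single_cell_array_val
    @size = size
    @identity = identity
    @tree = Array.new(4 * [size, 1].max, identity)
    build(1, 0, size - 1) if size.positive?
  end

  # Combined value over the cells i..j (inclusive).
  def query_on(i, j)
    query(1, 0, @size - 1, i, j)
  end

  private

  def build(node, lo, hi)
    if lo == hi
      @tree[node] = @single_cell_array_val.call(lo)
    else
      mid = (lo + hi) / 2
      build(2 * node, lo, mid)
      build(2 * node + 1, mid + 1, hi)
      @tree[node] = @combine.call(@tree[2 * node], @tree[2 * node + 1])
    end
  end

  def query(node, lo, hi, i, j)
    return @identity if j < lo || hi < i     # no overlap: contribute the identity
    return @tree[node] if i <= lo && hi <= j # full overlap: use the stored value

    mid = (lo + hi) / 2
    @combine.call(query(2 * node, lo, mid, i, j), query(2 * node + 1, mid + 1, hi, i, j))
  end
end

# A concrete class only supplies the lambdas and the identity element.
class SketchMaxValSegmentTree
  def initialize(data)
    @template = SketchSegmentTreeTemplate.new(
      combine: ->(a, b) { [a, b].max },
      single_cell_array_val: ->(i) { data[i] },
      size: data.size,
      identity: -Float::INFINITY
    )
  end

  def max_on(i, j)
    @template.query_on(i, j)
  end
end
```

The identity must be neutral for the combine operation (here, negative infinity for `max`), because it is what a node contributes when its interval does not overlap the query.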
|
|
122
119
|
|
|
123
120
|
## Algorithms
|
|
124
121
|
|
|
@@ -131,7 +128,29 @@ The Algorithms submodule contains some algorithms using the data structures.
|
|
|
131
128
|
[left, right, bottom, top].
|
|
132
129
|
- The algorithm is due to [[DMNS2013]](#references).
|
|
133
130
|
|
|
131
|
+
# C Extensions
|
|
132
|
+
|
|
133
|
+
As another learning process I have implemented several of these data structures as C extensions. The class names have a "C" prefixed
|
|
134
|
+
and they can be required like their pure Ruby versions. They have the same APIs as their Ruby cousins.
|
|
135
|
+
|
|
136
|
+
## Disjoint Union
|
|
137
|
+
|
|
138
|
+
A benchmark suggests that a long sequence of `unite` operations is about 3 times as fast with the `CDisjointUnion` as with
|
|
139
|
+
`DisjointUnion`.
|
|
140
|
+
|
|
141
|
+
The implementation uses the remarkable Convenient Containers library from Jackson Allan [[Allan]](#references).
|
|
142
|
+
|
|
143
|
+
## Segment Tree
|
|
144
|
+
|
|
145
|
+
`CSegmentTreeTemplate` is the C implementation of the generic class. Concrete classes are built on top of this in Ruby, just as with
|
|
146
|
+
the pure Ruby `SegmentTreeTemplate` class.
|
|
147
|
+
|
|
148
|
+
A benchmark suggests that a long sequence of `max_on` operations against a max-val Segment Tree is about 4 times as fast with the C
|
|
149
|
+
version as with the Ruby version. I'm a bit surprised the improvement isn't larger, but we must remember that the C code must still
|
|
150
|
+
interact with the Ruby objects in the underlying data array, and must "combine" them, etc., by calling Ruby lambdas.
|
|
151
|
+
|
|
134
152
|
# References
|
|
153
|
+
- [Allan] Allan, J., _CC: Convenient Containers_, https://github.com/JacksonAllan/CC, retrieved 2023-02-01.
|
|
135
154
|
- [TvL1984] Tarjan, Robert E., van Leeuwen, J., _Worst-case Analysis of Set Union Algorithms_, Journal of the ACM, v31:2 (1984), pp 245–281.
|
|
136
155
|
- [EEK2017] Edelkamp, S., Elmasry, A., Katajainen, J., _Optimizing Binary Heaps_, Theory Comput Syst (2017), vol 61, pp 606-636, DOI 10.1007/s00224-017-9760-2.
|
|
137
156
|
- [McC1985] McCreight, E. M., _Priority Search Trees_, SIAM J. Comput., 14(2):257-276, 1985.
|
data/Rakefile
CHANGED
|
@@ -2,10 +2,12 @@ require 'rubygems'
|
|
|
2
2
|
require 'rake/testtask'
|
|
3
3
|
require 'rake/extensiontask'
|
|
4
4
|
|
|
5
|
-
|
|
6
|
-
|
|
7
|
-
|
|
8
|
-
|
|
5
|
+
['c_disjoint_union', 'c_segment_tree_template'].each do |extension_name|
|
|
6
|
+
Rake::ExtensionTask.new("data_structures_rmolinari/#{extension_name}") do |ext|
|
|
7
|
+
ext.name = extension_name
|
|
8
|
+
ext.ext_dir = "ext/#{extension_name}"
|
|
9
|
+
ext.lib_dir = 'lib/data_structures_rmolinari/'
|
|
10
|
+
end
|
|
9
11
|
end
|
|
10
12
|
|
|
11
13
|
Rake::TestTask.new do |t|
|
data/ext/c_disjoint_union/disjoint_union.c
CHANGED
|
@@ -16,128 +16,84 @@
|
|
|
16
16
|
*/
|
|
17
17
|
|
|
18
18
|
#include "ruby.h"
|
|
19
|
-
|
|
20
|
-
|
|
21
|
-
// just fine.
|
|
22
|
-
#define mShared rb_define_module("Shared")
|
|
23
|
-
#define eDataError rb_const_get(mShared, rb_intern_const("DataError"))
|
|
19
|
+
#include "cc.h" // Convenient Containers
|
|
20
|
+
#include "shared.h"
|
|
24
21
|
|
|
25
22
|
/**
|
|
26
|
-
*
|
|
27
|
-
*
|
|
28
|
-
* Dynamic array of longs, with an initial value for otherwise uninitialized elements.
|
|
29
|
-
* Based on https://stackoverflow.com/questions/3536153/c-dynamically-growing-array
|
|
23
|
+
* Data type for the (parent, rank) pair, and some accessor helpers for the vec() container we are going to be using.
|
|
30
24
|
*/
|
|
31
|
-
typedef struct {
|
|
32
|
-
long *array;
|
|
33
|
-
size_t size;
|
|
34
|
-
long default_val;
|
|
35
|
-
} DynamicArray;
|
|
36
|
-
|
|
37
|
-
void initDynamicArray(DynamicArray *a, size_t initial_size, long default_val) {
|
|
38
|
-
a->array = malloc(initial_size * sizeof(long));
|
|
39
|
-
a->size = initial_size;
|
|
40
|
-
a->default_val = default_val;
|
|
41
|
-
|
|
42
|
-
for (size_t i = 0; i < initial_size; i++) {
|
|
43
|
-
a->array[i] = default_val;
|
|
44
|
-
}
|
|
45
|
-
}
|
|
46
|
-
|
|
47
|
-
void insertDynamicArray(DynamicArray *a, unsigned long index, long element) {
|
|
48
|
-
if (a->size <= index) {
|
|
49
|
-
size_t new_size = a->size;
|
|
50
|
-
while (new_size <= index) {
|
|
51
|
-
new_size = 8 * new_size / 5 + 8; // 8/5 gives "Fibonacci-like" growth; adding 8 to avoid small arrays having to reallocate
|
|
52
|
-
// too often. Who knows if it's worth being "clever"."
|
|
53
|
-
}
|
|
54
|
-
|
|
55
|
-
long* new_array = realloc(a->array, new_size * sizeof(long));
|
|
56
|
-
if (!new_array) {
|
|
57
|
-
rb_raise(rb_eRuntimeError, "Cannot allocate memory to expand DynamicArray!");
|
|
58
|
-
}
|
|
59
25
|
|
|
60
|
-
|
|
61
|
-
|
|
62
|
-
|
|
63
|
-
|
|
26
|
+
typedef struct data_pair {
|
|
27
|
+
long parent;
|
|
28
|
+
unsigned long rank;
|
|
29
|
+
} data_pair;
|
|
64
30
|
|
|
65
|
-
|
|
66
|
-
|
|
31
|
+
#define DEFAULT_PARENT -1
|
|
32
|
+
#define DEFAULT_RANK 0
|
|
33
|
+
static data_pair default_pair = { .parent = DEFAULT_PARENT, .rank = DEFAULT_RANK };
|
|
67
34
|
|
|
68
|
-
|
|
35
|
+
static data_pair make_data_pair(long parent, unsigned long rank) {
|
|
36
|
+
data_pair pair = { .parent = parent, .rank = rank };
|
|
37
|
+
return pair;
|
|
69
38
|
}
|
|
70
39
|
|
|
71
|
-
|
|
72
|
-
|
|
73
|
-
|
|
74
|
-
|
|
75
|
-
|
|
40
|
+
/* The vector generic from Convenient Containers */
|
|
41
|
+
typedef vec(data_pair) pair_vector;
|
|
42
|
+
|
|
43
|
+
#define parent(disjoint_union_ptr, idx) (get((disjoint_union_ptr)->pairs, idx)->parent)
|
|
44
|
+
#define rank(disjoint_union_ptr, idx) (get((disjoint_union_ptr)->pairs, idx)->rank)
|
|
76
45
|
|
|
77
46
|
/**
|
|
78
47
|
* The C implementation of a Disjoint Union
|
|
79
48
|
*
|
|
80
|
-
* See
|
|
49
|
+
* See the paper for optimizations we use to get almost constant time for find() and unite().
|
|
50
|
+
*
|
|
51
|
+
* Tarjan, Robert E., van Leeuwen, J., _Worst-case Analysis of Set Union Algorithms_, Journal of the ACM, v31:2 (1984), pp 245–281.
|
|
81
52
|
*/
|
|
82
53
|
|
|
83
54
|
/*
|
|
84
55
|
* The Disjoint Union struct.
|
|
85
|
-
* -
|
|
86
|
-
* -
|
|
87
|
-
*
|
|
56
|
+
* - pairs: a vector (dynamic array) of pairs, the i-th of which contains
|
|
57
|
+
* - the "parent" of element i in its membership tree
|
|
58
|
+
* - An element e is the root of its tree just when it is its own parent
|
|
59
|
+
* - Two elements are in the same subset just when they are in the same tree in the forest.
|
|
88
60
|
* - So the key idea is that we can check this by navigating via parents from each element to their roots. Clever optimizations
|
|
89
61
|
* keep the trees flat and so most nodes are close to their roots.
|
|
90
|
-
*
|
|
91
|
-
*
|
|
92
|
-
* Leeuwen
|
|
62
|
+
* - the "rank" of element i
|
|
63
|
+
* - this value is used to guide the "linking" of trees when subsets are being merged to keep the trees flat.
|
|
93
64
|
* - subset_count: the number of (disjoint) subsets.
|
|
94
65
|
* - it isn't needed internally but may be useful to client code.
|
|
95
66
|
*/
|
|
96
67
|
typedef struct du_data {
|
|
97
|
-
|
|
98
|
-
DynamicArray* rank; // the "ranks" of the elements, used when uniting subsets
|
|
68
|
+
pair_vector *pairs; // The generic vector container from the amazing Convenient Containers library
|
|
99
69
|
size_t subset_count;
|
|
100
70
|
} disjoint_union_data;
|
|
101
71
|
|
|
102
72
|
/*
|
|
103
|
-
* Create one.
|
|
104
|
-
*
|
|
105
|
-
* The dynamic arrays are initialized with a size of 100 because I didn't have a better idea. This will end up getting called from
|
|
106
|
-
* the Ruby #allocate method, which happens before #initialize. Thus we don't know the calling code's desired initial size.
|
|
73
|
+
* Create one (on the heap).
|
|
107
74
|
*/
|
|
108
|
-
|
|
109
|
-
|
|
110
|
-
disjoint_union_data* disjoint_union = malloc(sizeof(disjoint_union_data));
|
|
75
|
+
static disjoint_union_data *create_disjoint_union() {
|
|
76
|
+
disjoint_union_data *disjoint_union = (disjoint_union_data *)malloc(sizeof(disjoint_union_data));
|
|
111
77
|
|
|
112
78
|
// Allocate the structures
|
|
113
|
-
|
|
114
|
-
|
|
115
|
-
initDynamicArray(forest, INITIAL_SIZE, -1);
|
|
116
|
-
initDynamicArray(rank, INITIAL_SIZE, 0);
|
|
79
|
+
disjoint_union->pairs = malloc(sizeof(pair_vector));
|
|
80
|
+
init(disjoint_union->pairs);
|
|
117
81
|
|
|
118
|
-
disjoint_union->forest = forest;
|
|
119
|
-
disjoint_union->rank = rank;
|
|
120
82
|
disjoint_union->subset_count = 0;
|
|
121
83
|
|
|
122
84
|
return disjoint_union;
|
|
123
85
|
}
|
|
124
86
|
|
|
125
87
|
/*
|
|
126
|
-
* Free the memory associated with a disjoint union.
|
|
88
|
+
* Free the memory associated with a disjoint union.
|
|
89
|
+
*
|
|
90
|
+
* This will end up getting triggered by the Ruby garbage collector. Ruby learns about it via the disjoint_union_type struct below.
|
|
127
91
|
*/
|
|
128
92
|
static void disjoint_union_free(void *ptr) {
|
|
129
93
|
if (ptr) {
|
|
130
94
|
disjoint_union_data *disjoint_union = ptr;
|
|
131
|
-
|
|
132
|
-
|
|
133
|
-
|
|
134
|
-
free(disjoint_union->forest);
|
|
135
|
-
disjoint_union->forest = NULL;
|
|
136
|
-
|
|
137
|
-
free(disjoint_union->rank);
|
|
138
|
-
disjoint_union->rank = NULL;
|
|
139
|
-
|
|
140
|
-
free(disjoint_union);
|
|
95
|
+
cleanup(disjoint_union->pairs);
|
|
96
|
+
xfree(disjoint_union);
|
|
141
97
|
}
|
|
142
98
|
}
|
|
143
99
|
|
|
@@ -148,17 +104,23 @@ static void disjoint_union_free(void *ptr) {
|
|
|
148
104
|
/*
|
|
149
105
|
* Is the given element already a member of the universe?
|
|
150
106
|
*/
|
|
151
|
-
static int present_p(disjoint_union_data*
|
|
152
|
-
|
|
153
|
-
return (forest->size > element && (forest->array[element] != forest->default_val));
|
|
107
|
+
static int present_p(disjoint_union_data *disjoint_union, size_t element) {
|
|
108
|
+
return (size(disjoint_union->pairs) > element && (parent(disjoint_union, element) != DEFAULT_PARENT));
|
|
154
109
|
}
|
|
155
110
|
|
|
156
111
|
/*
|
|
157
112
|
* Check that the given element is a member of the universe and raise Shared::DataError (ruby-side) if not
|
|
158
113
|
*/
|
|
159
|
-
static void assert_membership(disjoint_union_data*
|
|
114
|
+
static void assert_membership(disjoint_union_data *disjoint_union, size_t element) {
|
|
160
115
|
if (!present_p(disjoint_union, element)) {
|
|
161
|
-
rb_raise(
|
|
116
|
+
rb_raise(eSharedDataError, "Value %zu is not part of the universe", element);
|
|
117
|
+
/* rb_raise( */
|
|
118
|
+
/* eSharedDataError, */
|
|
119
|
+
/* "Value %zu is not part of the universe, size = %zu, forest_val = %lu", */
|
|
120
|
+
/* element, */
|
|
121
|
+
/* size(disjoint_union->pairs), */
|
|
122
|
+
/* get(disjoint_union->pairs, element)->parent */
|
|
123
|
+
/* ); */
|
|
162
124
|
}
|
|
163
125
|
}
|
|
164
126
|
|
|
@@ -167,52 +129,57 @@ static void assert_membership(disjoint_union_data* disjoint_union, size_t elemen
|
|
|
167
129
|
*
|
|
168
130
|
* Shared::DataError is raised if it is already an element.
|
|
169
131
|
*/
|
|
170
|
-
static void add_new_element(disjoint_union_data*
|
|
132
|
+
static void add_new_element(disjoint_union_data *disjoint_union, size_t element) {
|
|
171
133
|
if (present_p(disjoint_union, element)) {
|
|
172
|
-
rb_raise(
|
|
134
|
+
rb_raise(eSharedDataError, "Element %zu already present in the universe", element);
|
|
135
|
+
}
|
|
136
|
+
|
|
137
|
+
// Expand the underlying vector if necessary
|
|
138
|
+
size_t sz = size(disjoint_union->pairs);
|
|
139
|
+
if (sz <= element) {
|
|
140
|
+
resize(disjoint_union->pairs, element + 1);
|
|
141
|
+
for (size_t i = sz; i < element; i++) { // fill every new slot below element; element itself is set after the loop
|
|
142
|
+
lval(disjoint_union->pairs, i) = default_pair;
|
|
143
|
+
}
|
|
173
144
|
}
|
|
174
145
|
|
|
175
|
-
|
|
176
|
-
insertDynamicArray(disjoint_union->rank, element, 0);
|
|
146
|
+
lval(disjoint_union->pairs, element) = make_data_pair(element, 0l);
|
|
177
147
|
disjoint_union->subset_count++;
|
|
178
148
|
}
|
|
179
149
|
|
|
180
150
|
/*
|
|
181
|
-
* Find the canonical representative of the given element. This is the root of the tree
|
|
151
|
+
* Find the canonical representative of the given element. This is the root of the tree containing it.
|
|
182
152
|
*
|
|
183
153
|
* Two elements are in the same subset exactly when their canonical representatives are equal.
|
|
184
154
|
*/
|
|
185
|
-
static size_t find(disjoint_union_data*
|
|
155
|
+
static size_t find(disjoint_union_data *disjoint_union, size_t element) {
|
|
186
156
|
assert_membership(disjoint_union, element);
|
|
187
157
|
|
|
188
|
-
// We
|
|
189
|
-
long* d = disjoint_union->forest->array; // the actual forest data
|
|
158
|
+
// We use "halving" to shrink the length of paths to the root. See Tarjan and van Leeuwen, p. 252.
|
|
190
159
|
size_t x = element;
|
|
191
|
-
|
|
192
|
-
|
|
160
|
+
long p, gp; // parent and grandparent
|
|
161
|
+
while (p = parent(disjoint_union, x), gp = parent(disjoint_union, p), p != gp) {
|
|
162
|
+
parent(disjoint_union, x) = gp; // halving: x now points at its grandparent
|
|
163
|
+
x = gp;
|
|
193
164
|
}
|
|
194
|
-
return
|
|
165
|
+
return parent(disjoint_union, x);
|
|
195
166
|
}
|
|
196
167
|
|
|
197
168
|
/*
|
|
198
|
-
* "Link"
|
|
169
|
+
* "Link" the two given elements so that they are in the same subset now.
|
|
199
170
|
*
|
|
200
171
|
* In other words, merge the subtrees containing the two elements.
|
|
201
172
|
*
|
|
202
|
-
*
|
|
203
|
-
* though we don't check that here.
|
|
173
|
+
* elt1 and elt2 must be distinct and the roots of their trees, though we don't check that here.
|
|
204
174
|
*/
|
|
205
|
-
static void link_roots(disjoint_union_data*
|
|
206
|
-
|
|
207
|
-
|
|
208
|
-
|
|
209
|
-
|
|
210
|
-
|
|
211
|
-
} else if (rank[elt1] == rank[elt2]) {
|
|
212
|
-
forest[elt2] = elt1;
|
|
213
|
-
rank[elt1]++;
|
|
175
|
+
static void link_roots(disjoint_union_data *disjoint_union, size_t elt1, size_t elt2) {
|
|
176
|
+
if (rank(disjoint_union, elt1) > rank(disjoint_union, elt2)) {
|
|
177
|
+
parent(disjoint_union, elt2) = elt1;
|
|
178
|
+
} else if (rank(disjoint_union, elt1) == rank(disjoint_union, elt2)) {
|
|
179
|
+
parent(disjoint_union, elt2) = elt1;
|
|
180
|
+
rank(disjoint_union, elt1)++;
|
|
214
181
|
} else {
|
|
215
|
-
|
|
182
|
+
parent(disjoint_union, elt1) = elt2;
|
|
216
183
|
}
|
|
217
184
|
|
|
218
185
|
disjoint_union->subset_count--;
|
|
@@ -221,12 +188,12 @@ static void link_roots(disjoint_union_data* disjoint_union, size_t elt1, size_t
|
|
|
221
188
|
/*
|
|
222
189
|
* "Unite" or merge the subsets containing elt1 and elt2.
|
|
223
190
|
*/
|
|
224
|
-
static void unite(disjoint_union_data*
|
|
191
|
+
static void unite(disjoint_union_data *disjoint_union, size_t elt1, size_t elt2) {
|
|
225
192
|
assert_membership(disjoint_union, elt1);
|
|
226
193
|
assert_membership(disjoint_union, elt2);
|
|
227
194
|
|
|
228
195
|
if (elt1 == elt2) {
|
|
229
|
-
rb_raise(
|
|
196
|
+
rb_raise(eSharedDataError, "Uniting an element with itself is meaningless");
|
|
230
197
|
}
|
|
231
198
|
|
|
232
199
|
size_t root1 = find(disjoint_union, elt1);
|
|
@@ -249,8 +216,10 @@ static void unite(disjoint_union_data* disjoint_union, size_t elt1, size_t elt2)
|
|
|
249
216
|
// deciding how aggressive to be during garbage collection and such.
|
|
250
217
|
static size_t disjoint_union_memsize(const void *ptr) {
|
|
251
218
|
if (ptr) {
|
|
252
|
-
const disjoint_union_data *
|
|
253
|
-
|
|
219
|
+
const disjoint_union_data *du = ptr;
|
|
220
|
+
|
|
221
|
+
// See https://github.com/JacksonAllan/CC/issues/3
|
|
222
|
+
return sizeof( cc_vec_hdr_ty ) + cap( du->pairs ) * CC_EL_SIZE( *(du->pairs) );
|
|
254
223
|
} else {
|
|
255
224
|
return 0;
|
|
256
225
|
}
|
|
@@ -273,26 +242,10 @@ static const rb_data_type_t disjoint_union_type = {
|
|
|
273
242
|
};
|
|
274
243
|
|
|
275
244
|
/*
|
|
276
|
-
*
|
|
277
|
-
*
|
|
278
|
-
* TODO: can we return an size_t or unsigned long instead?
|
|
245
|
+
* Unwrap a Ruby-side disjoint union object to get the C struct inside.
|
|
279
246
|
*/
|
|
280
|
-
static
|
|
281
|
-
|
|
282
|
-
long c_val = FIX2LONG(val);
|
|
283
|
-
|
|
284
|
-
if (c_val < 0) {
|
|
285
|
-
rb_raise(eDataError, "Value must be non-negative");
|
|
286
|
-
}
|
|
287
|
-
|
|
288
|
-
return c_val;
|
|
289
|
-
}
|
|
290
|
-
|
|
291
|
-
/*
|
|
292
|
-
* Unwrap a Rubyfied disjoint union to get the C struct inside.
|
|
293
|
-
*/
|
|
294
|
-
static disjoint_union_data* unwrapped(VALUE self) {
|
|
295
|
-
disjoint_union_data* disjoint_union;
|
|
247
|
+
static disjoint_union_data *unwrapped(VALUE self) {
|
|
248
|
+
disjoint_union_data *disjoint_union;
|
|
296
249
|
TypedData_Get_Struct((self), disjoint_union_data, &disjoint_union_type, disjoint_union);
|
|
297
250
|
return disjoint_union;
|
|
298
251
|
}
|
|
@@ -301,7 +254,9 @@ static disjoint_union_data* unwrapped(VALUE self) {
|
|
|
301
254
|
* This is for CDisjointUnion.allocate on the Ruby side
|
|
302
255
|
*/
|
|
303
256
|
static VALUE disjoint_union_alloc(VALUE klass) {
|
|
304
|
-
|
|
257
|
+
// Get one on the heap
|
|
258
|
+
disjoint_union_data *disjoint_union = create_disjoint_union();
|
|
259
|
+
// Wrap it up into a Ruby object
|
|
305
260
|
return TypedData_Wrap_Struct(klass, &disjoint_union_type, disjoint_union);
|
|
306
261
|
}
|
|
307
262
|
|
|
@@ -318,11 +273,15 @@ static VALUE disjoint_union_init(int argc, VALUE *argv, VALUE self) {
|
|
|
318
273
|
rb_raise(rb_eArgError, "wrong number of arguments");
|
|
319
274
|
} else {
|
|
320
275
|
size_t initial_size = checked_nonneg_fixnum(argv[0]);
|
|
321
|
-
disjoint_union_data*
|
|
276
|
+
disjoint_union_data *disjoint_union = unwrapped(self);
|
|
277
|
+
|
|
278
|
+
pair_vector *pair_vec = disjoint_union->pairs;
|
|
279
|
+
resize(pair_vec, initial_size);
|
|
322
280
|
|
|
323
281
|
for (size_t i = 0; i < initial_size; i++) {
|
|
324
|
-
|
|
282
|
+
lval(pair_vec, i) = make_data_pair(i, 0);
|
|
325
283
|
}
|
|
284
|
+
disjoint_union->subset_count = initial_size;
|
|
326
285
|
}
|
|
327
286
|
return self;
|
|
328
287
|
}
|
|
@@ -330,7 +289,7 @@ static VALUE disjoint_union_init(int argc, VALUE *argv, VALUE self) {
|
|
|
330
289
|
/**
|
|
331
290
|
* And now the simple wrappers around the Disjoint Union C functionality. In each case we
|
|
332
291
|
* - unwrap a 'VALUE self',
|
|
333
|
-
* - i.e.,
|
|
292
|
+
* - i.e., the CDisjointUnion instance on the Ruby side;
|
|
334
293
|
* - munge any other arguments into longs;
|
|
335
294
|
* - call the appropriate C function to act on the struct; and
|
|
336
295
|
* - return an appropriate VALUE that the Ruby runtime can use.
|
|
@@ -341,7 +300,7 @@ static VALUE disjoint_union_init(int argc, VALUE *argv, VALUE self) {
|
|
|
341
300
|
/*
|
|
342
301
|
* Add a new subset to the universe containing the element +new_v+.
|
|
343
302
|
*
|
|
344
|
-
* @param the new element, starting in its own singleton subset
|
|
303
|
+
* @param arg the new element, starting in its own singleton subset
|
|
345
304
|
* - it must be a non-negative integer, not already part of the universe of elements.
|
|
346
305
|
*/
|
|
347
306
|
static VALUE disjoint_union_make_set(VALUE self, VALUE arg) {
|
|
@@ -389,8 +348,7 @@ static VALUE disjoint_union_unite(VALUE self, VALUE arg1, VALUE arg2) {
|
|
|
389
348
|
* The data structure provides efficient actions to merge two disjoint subsets, i.e., replace them by their union, and determine if
|
|
390
349
|
* two elements are in the same subset.
|
|
391
350
|
*
|
|
392
|
-
* The elements of the set are
|
|
393
|
-
* representatives.
|
|
351
|
+
* The elements of the set are non-negative integers. Client code can map its data to these representatives.
|
|
394
352
|
*
|
|
395
353
|
* See https://en.wikipedia.org/wiki/Disjoint-set_data_structure for a good introduction.
|
|
396
354
|
*
|
|
@@ -400,7 +358,7 @@ static VALUE disjoint_union_unite(VALUE self, VALUE arg1, VALUE arg2) {
|
|
|
400
358
|
* - Tarjan, Robert E., van Leeuwen, Jan (1984). _Worst-case analysis of set union algorithms_. Journal of the ACM. 31 (2): 245–281.
|
|
401
359
|
*/
|
|
402
360
|
void Init_c_disjoint_union() {
|
|
403
|
-
VALUE mDataStructuresRMolinari = rb_define_module("DataStructuresRMolinari");
|
|
361
|
+
//VALUE mDataStructuresRMolinari = rb_define_module("DataStructuresRMolinari");
|
|
404
362
|
VALUE cDisjointUnion = rb_define_class_under(mDataStructuresRMolinari, "CDisjointUnion", rb_cObject);
|
|
405
363
|
|
|
406
364
|
rb_define_alloc_func(cDisjointUnion, disjoint_union_alloc);
|
data/ext/c_disjoint_union/extconf.rb
CHANGED
|
@@ -3,10 +3,15 @@ require 'mkmf'
|
|
|
3
3
|
abort 'missing malloc()' unless have_func "malloc"
|
|
4
4
|
abort 'missing realloc()' unless have_func "realloc"
|
|
5
5
|
|
|
6
|
-
if try_cflags('-
|
|
7
|
-
append_cflags('-
|
|
6
|
+
if try_cflags('-O3')
|
|
7
|
+
append_cflags('-O3')
|
|
8
8
|
end
|
|
9
9
|
|
|
10
10
|
extension_name = "c_disjoint_union"
|
|
11
11
|
dir_config(extension_name)
|
|
12
|
+
|
|
13
|
+
$srcs = ["disjoint_union.c", "../shared.c"]
|
|
14
|
+
$INCFLAGS << " -I$(srcdir)/.."
|
|
15
|
+
$VPATH << "$(srcdir)/.."
|
|
16
|
+
|
|
12
17
|
create_makefile("data_structures_rmolinari/c_disjoint_union")
|
data/ext/c_segment_tree_template/extconf.rb
ADDED
|
@@ -0,0 +1,17 @@
|
|
|
1
|
+
require 'mkmf'
|
|
2
|
+
|
|
3
|
+
abort 'missing malloc()' unless have_func "malloc"
|
|
4
|
+
abort 'missing realloc()' unless have_func "realloc"
|
|
5
|
+
|
|
6
|
+
if try_cflags('-O3')
|
|
7
|
+
append_cflags('-O3')
|
|
8
|
+
end
|
|
9
|
+
|
|
10
|
+
extension_name = "c_segment_tree_template"
|
|
11
|
+
dir_config(extension_name)
|
|
12
|
+
|
|
13
|
+
$srcs = ["segment_tree_template.c", "../shared.c"]
|
|
14
|
+
$INCFLAGS << " -I$(srcdir)/.."
|
|
15
|
+
$VPATH << "$(srcdir)/.."
|
|
16
|
+
|
|
17
|
+
create_makefile("data_structures_rmolinari/c_segment_tree_template")
|
data/ext/c_segment_tree_template/segment_tree_template.c
ADDED
|
@@ -0,0 +1,362 @@
|
|
|
1
|
+
/*
|
|
2
|
+
* This is a C implementation of a Segment Tree data structure.
|
|
3
|
+
*
|
|
4
|
+
* More specifically, it is the C version of the SegmentTreeTemplate Ruby class, for which see elsewhere in the repo.
|
|
5
|
+
*/
|
|
6
|
+
|
|
7
|
+
#include "ruby.h"
|
|
8
|
+
#include "shared.h"
|
|
9
|
+
|
|
10
|
+
#define single_cell_val_at(seg_tree, idx) rb_funcall(seg_tree->single_cell_array_val_lambda, rb_intern("call"), 1, LONG2FIX(idx))
|
|
11
|
+
#define combined_val(seg_tree, v1, v2) rb_funcall(seg_tree->combine_lambda, rb_intern("call"), 2, (v1), (v2))
|
|
12
|
+
|
|
13
|
+
/**
|
|
14
|
+
* The C implementation of a generic Segment Tree
|
|
15
|
+
*/
|
|
16
|
+
|
|
17
|
+
typedef struct {
|
|
18
|
+
VALUE *tree; // The 1-based implicit binary tree in which the data structure lives
|
|
19
|
+
VALUE single_cell_array_val_lambda;
|
|
20
|
+
VALUE combine_lambda;
|
|
21
|
+
VALUE identity;
|
|
22
|
+
size_t size; // the size of the underlying data array
|
|
23
|
+
size_t tree_alloc_size; // the size of the VALUE* tree array
|
|
24
|
+
} segment_tree_data;
|
|
25
|
+
|
|
26
|
+
/************************************************************
|
|
27
|
+
* Memory Management
|
|
28
|
+
*
|
|
29
|
+
*/
|
|
30
|
+
|
|
31
|
+
/*
|
|
32
|
+
* Create one (on the heap).
|
|
33
|
+
*/
|
|
34
|
+
static segment_tree_data *create_segment_tree() {
|
|
35
|
+
segment_tree_data *segment_tree = malloc(sizeof(segment_tree_data));
|
|
36
|
+
|
|
37
|
+
// Allocate the structures
|
|
38
|
+
segment_tree->tree = NULL; // we don't yet know how much space we need
|
|
39
|
+
|
|
40
|
+
segment_tree->single_cell_array_val_lambda = 0;
|
|
41
|
+
segment_tree->combine_lambda = 0;
|
|
42
|
+
segment_tree->size = 0; // we don't know the right value yet
|
|
43
|
+
|
|
44
|
+
return segment_tree;
|
|
45
|
+
}
|
|
46
|
+
|
|
47
|
+
/*
|
|
48
|
+
* Free the memory associated with a segment_tree.
|
|
49
|
+
*
|
|
50
|
+
* This will end up getting triggered by the Ruby garbage collector. Ruby learns about it via the segment_tree_type struct below.
|
|
51
|
+
*/
|
|
52
|
+
static void segment_tree_free(void *ptr) {
|
|
53
|
+
if (ptr) {
|
|
54
|
+
segment_tree_data *segment_tree = ptr;
|
|
55
|
+
xfree(segment_tree->tree);
|
|
56
|
+
xfree(segment_tree);
|
|
57
|
+
}
|
|
58
|
+
}
|
|
59
|
+
|
|
60
|
+
/*
|
|
61
|
+
* How much memory (roughly) does a segment_tree_data instance consume?
|
|
62
|
+
*
|
|
63
|
+
* I guess the Ruby runtime can use this information when deciding how aggressive to be during garbage collection and such.
|
|
64
|
+
*/
|
|
65
|
+
static size_t segment_tree_memsize(const void *ptr) {
|
|
66
|
+
if (ptr) {
|
|
67
|
+
const segment_tree_data *st = ptr;
|
|
68
|
+
|
|
69
|
+
// for the tree array plus the size of the segment_tree_data struct itself.
|
|
70
|
+
return sizeof( VALUE ) * st->tree_alloc_size * 4 + sizeof(segment_tree_data);
|
|
71
|
+
} else {
|
|
72
|
+
return 0;
|
|
73
|
+
}
|
|
74
|
+
}
|
|
75
|
+
|
|
76
|
+
/*
|
|
77
|
+
* Mark the Ruby objects we hold so that the Ruby garbage collector knows that they are still in use.
|
|
78
|
+
*/
|
|
79
|
+
static void segment_tree_mark(void *ptr) {
|
|
80
|
+
segment_tree_data *st = ptr;
|
|
81
|
+
|
|
82
|
+
rb_gc_mark(st->combine_lambda);
|
|
83
|
+
rb_gc_mark(st->single_cell_array_val_lambda);
|
|
84
|
+
rb_gc_mark(st->identity);
|
|
85
|
+
|
|
86
|
+
for (size_t i = 0; i < st->tree_alloc_size; i++) {
|
|
87
|
+
VALUE value = st->tree[i];
|
|
88
|
+
if (value) {
|
|
89
|
+
rb_gc_mark(value);
|
|
90
|
+
}
|
|
91
|
+
}
|
|
92
|
+
}
|
|
93
|
+
|
|
94
|
+
|
|
95
|
+
/*
|
|
96
|
+
* A configuration struct that tells the Ruby runtime how to deal with a segment_tree_data object.
|
|
97
|
+
*
|
|
98
|
+
* https://docs.ruby-lang.org/en/master/extension_rdoc.html#label-Encapsulate+C+data+into+a+Ruby+object
|
|
99
|
+
*/
|
|
100
|
+
static const rb_data_type_t segment_tree_type = {
|
|
101
|
+
.wrap_struct_name = "segment_tree_template",
|
|
102
|
+
{ // help for the Ruby garbage collector
|
|
103
|
+
.dmark = segment_tree_mark, // dmark, for marking other Ruby objects.
|
|
104
|
+
.dfree = segment_tree_free, // how to free the memory associated with an object
|
|
105
|
+
.dsize = segment_tree_memsize, // roughly how much space does the object consume?
|
|
106
|
+
},
|
|
107
|
+
.data = NULL, // a data field we could use for something here if we wanted. Ruby ignores it
|
|
108
|
+
.flags = 0 // GC-related flag values.
|
|
109
|
+
};
|
|
110
|
+
|
|
111
|
+
/*
|
|
112
|
+
* End memory management functions.
|
|
113
|
+
************************************************************/
|
|
114
|
+
|
|
115
|
+
|
|
116
|
+
/************************************************************
|
|
117
|
+
* Wrapping and unwrapping the C struct and other things.
|
|
118
|
+
*
|
|
119
|
+
*/
|
|
120
|
+
|
|
121
|
+
/*
|
|
122
|
+
* Unwrap a Ruby-side segment tree object to get the C struct inside.
|
|
123
|
+
*
|
|
124
|
+
* TODO: consider a macro in a shared header
|
|
125
|
+
*/
|
|
126
|
+
static segment_tree_data *unwrapped(VALUE self) {
|
|
127
|
+
segment_tree_data *segment_tree;
|
|
128
|
+
TypedData_Get_Struct((self), segment_tree_data, &segment_tree_type, segment_tree);
|
|
129
|
+
return segment_tree;
|
|
130
|
+
}
|
|
131
|
+
|
|
132
|
+
/*
|
|
133
|
+
* Allocate a segment_tree_data struct and wrap it for the Ruby runtime.
|
|
134
|
+
*
|
|
135
|
+
* This is for CSegmentTreeTemplate.allocate on the Ruby side.
|
|
136
|
+
*/
|
|
137
|
+
static VALUE segment_tree_alloc(VALUE klass) {
|
|
138
|
+
// Get one on the heap
|
|
139
|
+
segment_tree_data *segment_tree = create_segment_tree();
|
|
140
|
+
// ...and wrap it into a Ruby object
|
|
141
|
+
return TypedData_Wrap_Struct(klass, &segment_tree_type, segment_tree);
|
|
142
|
+
}
|
|
143
|
+
|
|
144
|
+
/*
|
|
145
|
+
* End wrapping and unwrapping functions.
|
|
146
|
+
************************************************************/
|
|
147
|
+
|
|
148
|
+
/************************************************************
|
|
149
|
+
* The Segment Tree API on the C side.
|
|
150
|
+
*
|
|
151
|
+
* We wrap these in the Ruby-ready functions below
|
|
152
|
+
*/
|
|
153
|
+
|
|
154
|
+
/*
|
|
155
|
+
* Recursively build the internal tree data structure.
|
|
156
|
+
*
|
|
157
|
+
* - tree_idx: the index into the tree array of the node being calculated
|
|
158
|
+
* - [tree_l, tree_r]: the sub-interval of the underlying array data corresponding to the tree node being calculated.
|
|
159
|
+
*/
|
|
160
|
+
static void build(segment_tree_data *segment_tree, size_t tree_idx, size_t tree_l, size_t tree_r) {
|
|
161
|
+
VALUE *tree = segment_tree->tree;
|
|
162
|
+
|
|
163
|
+
if (tree_l == tree_r) {
|
|
164
|
+
// Base case: the node corresponds to a subarray of length 1.
|
|
165
|
+
segment_tree->tree[tree_idx] = single_cell_val_at(segment_tree, tree_l);
|
|
166
|
+
} else {
|
|
167
|
+
// Build the two child nodes, and then combine their values for this node.
|
|
168
|
+
size_t mid = midpoint(tree_l, tree_r);
|
|
169
|
+
size_t left = left_child(tree_idx);
|
|
170
|
+
size_t right = right_child(tree_idx);
|
|
171
|
+
|
|
172
|
+
build(segment_tree, left, tree_l, mid);
|
|
173
|
+
build(segment_tree, right, mid + 1, tree_r);
|
|
174
|
+
|
|
175
|
+
VALUE comb_val = combined_val(segment_tree, tree[left], tree[right]);
|
|
176
|
+
segment_tree->tree[tree_idx] = comb_val;
|
|
177
|
+
}
|
|
178
|
+
}
|
|
179
|
+
|
|
180
|
+
/*
|
|
181
|
+
* Set up the internals with the arguments we get from #initialize.
|
|
182
|
+
*
|
|
183
|
+
* - combine: must be callable
|
|
184
|
+
* - single_cell_array_val: must be callable
|
|
185
|
+
* - size: must be a positive integer
|
|
186
|
+
* - identity: we don't care what it is.
|
|
187
|
+
* - maybe we should check at least that it is not 0. But Qnil is fine.
|
|
188
|
+
*/
|
|
189
|
+
static void setup(segment_tree_data* seg_tree, VALUE combine, VALUE single_cell_array_val, VALUE size, VALUE identity) {
|
|
190
|
+
ID idCall = rb_intern("call");
|
|
191
|
+
|
|
192
|
+
if (!rb_obj_respond_to(combine, idCall, TRUE)) {
|
|
193
|
+
rb_raise(rb_eArgError, "wrong type argument %"PRIsVALUE" (should be callable)", rb_obj_class(combine));
|
|
194
|
+
}
|
|
195
|
+
|
|
196
|
+
if (!rb_obj_respond_to(single_cell_array_val, idCall, TRUE)) {
|
|
197
|
+
rb_raise(rb_eArgError, "wrong type argument %"PRIsVALUE" (should be callable)", rb_obj_class(single_cell_array_val));
|
|
198
|
+
}
|
|
199
|
+
|
|
200
|
+
seg_tree->combine_lambda = combine;
|
|
201
|
+
seg_tree->single_cell_array_val_lambda = single_cell_array_val;
|
|
202
|
+
seg_tree->identity = identity;
|
|
203
|
+
seg_tree->size = checked_nonneg_fixnum(size);
|
|
204
|
+
|
|
205
|
+
if (seg_tree->size == 0) {
|
|
206
|
+
rb_raise(rb_eArgError, "size must be positive.");
|
|
207
|
+
}
|
|
208
|
+
|
|
209
|
+
// Implicit binary tree with n leaves and straightforward left() and right() may use indices up to 4n. But see here for a way to
|
|
210
|
+
// reduce the requirement to 2n: https://cp-algorithms.com/data_structures/segment_tree.html#memory-efficient-implementation
|
|
211
|
+
size_t tree_size = 1 + 4 * seg_tree->size;
|
|
212
|
+
seg_tree->tree = calloc(tree_size, sizeof(VALUE));
|
|
213
|
+
seg_tree->tree_alloc_size = tree_size;
|
|
214
|
+
|
|
215
|
+
build(seg_tree, TREE_ROOT, 0, seg_tree->size - 1);
|
|
216
|
+
}
|
|
217
|
+
|
|
218
|
+
|
|
219
|
+
/*
|
|
220
|
+
* Determine the value for the subarray A(left, right).
|
|
221
|
+
*
|
|
222
|
+
* - tree_idx: the index in the array of the node we are currently visiting
|
|
223
|
+
* - tree_l..tree_r: the subarray handled by the current node.
|
|
224
|
+
* - left..right: the subarray whose value we are currently looking for.
|
|
225
|
+
*
|
|
226
|
+
* As an invariant we have left..right \subset tree_l..tree_r.
|
|
227
|
+
*
|
|
228
|
+
* We start out with
|
|
229
|
+
* - tree_idx = TREE_ROOT
|
|
230
|
+
* - tree_l..tree_r = 0..(size - 1), and
|
|
231
|
+
* - left..right given by the client code's query
|
|
232
|
+
*
|
|
233
|
+
* If [tree_l, tree_r] = [left, right] then the current node gives the desired answer. Otherwise we descend the tree with one or two
|
|
234
|
+
* recursive calls.
|
|
235
|
+
*
|
|
236
|
+
* If left..right is contained in the bottom or top half of tree_l..tree_r we descend to the corresponding child with one recursive
|
|
237
|
+
* call. Otherwise we split left..right at the midpoint of tree_l..tree_r, make two recursive calls, and then combine the results.
|
|
238
|
+
*/
|
|
239
|
+
static VALUE determine_val(segment_tree_data* seg_tree, size_t tree_idx, size_t left, size_t right, size_t tree_l, size_t tree_r) {
|
|
240
|
+
// Does the current tree node exactly serve up the interval we're interested in?
|
|
241
|
+
if (left == tree_l && right == tree_r) {
|
|
242
|
+
return seg_tree->tree[tree_idx];
|
|
243
|
+
}
|
|
244
|
+
|
|
245
|
+
// We need to go further down the tree
|
|
246
|
+
size_t mid = midpoint(tree_l, tree_r);
|
|
247
|
+
if (mid >= right) {
|
|
248
|
+
// Our interval is contained by the left child's interval
|
|
249
|
+
return determine_val(seg_tree, left_child(tree_idx), left, right, tree_l, mid);
|
|
250
|
+
} else if (mid + 1 <= left) {
|
|
251
|
+
// Our interval is contained by the right child's interval
|
|
252
|
+
return determine_val(seg_tree, right_child(tree_idx), left, right, mid + 1, tree_r);
|
|
253
|
+
} else {
|
|
254
|
+
// Our interval is split between the two, so we need to combine the results from the children.
|
|
255
|
+
return rb_funcall(
|
|
256
|
+
seg_tree->combine_lambda, rb_intern("call"), 2,
|
|
257
|
+
determine_val(seg_tree, left_child(tree_idx), left, mid, tree_l, mid),
|
|
258
|
+
determine_val(seg_tree, right_child(tree_idx), mid + 1, right, mid + 1, tree_r)
|
|
259
|
+
);
|
|
260
|
+
}
|
|
261
|
+
}
|
|
262
|
+
|
|
263
|
+
/*
|
|
264
|
+
* Update the structure to reflect the change in the underlying array at index idx.
|
|
265
|
+
*
|
|
266
|
+
* - idx: the index at which the underlying array data has changed.
|
|
267
|
+
* - tree_idx: the index in the internal data structure of the node we are currently visiting.
|
|
268
|
+
* - tree_l..tree_r: the range handled by the current node
|
|
269
|
+
*/
|
|
270
|
+
static void update_val_at(segment_tree_data *seg_tree, size_t idx, size_t tree_idx, size_t tree_l, size_t tree_r) {
|
|
271
|
+
if (tree_l == tree_r) {
|
|
272
|
+
// We have found the base case of our update
|
|
273
|
+
if (tree_l != idx) {
|
|
274
|
+
rb_raise(
|
|
275
|
+
eSharedInternalLogicError,
|
|
276
|
+
"tree_l == tree_r == %lu but they do not agree with the idx %lu holding the updated value",
|
|
277
|
+
tree_r, idx
|
|
278
|
+
);
|
|
279
|
+
}
|
|
280
|
+
seg_tree->tree[tree_idx] = single_cell_val_at(seg_tree, tree_l);
|
|
281
|
+
} else {
|
|
282
|
+
// Recursively update the appropriate subtree...
|
|
283
|
+
size_t mid = midpoint(tree_l, tree_r);
|
|
284
|
+
size_t left = left_child(tree_idx);
|
|
285
|
+
size_t right = right_child(tree_idx);
|
|
286
|
+
if (mid >= idx) {
|
|
287
|
+
update_val_at(seg_tree, idx, left, tree_l, mid);
|
|
288
|
+
} else {
|
|
289
|
+
update_val_at(seg_tree, idx, right, mid + 1, tree_r);
|
|
290
|
+
}
|
|
291
|
+
// ...and ourself to incorporate the change
|
|
292
|
+
seg_tree->tree[tree_idx] = combined_val(seg_tree, seg_tree->tree[left], seg_tree->tree[right]);
|
|
293
|
+
}
|
|
294
|
+
}
|
|
295
|
+
|
|
296
|
+
/*
|
|
297
|
+
* End C implementation of the Segment Tree API
|
|
298
|
+
************************************************************/
|
|
299
|
+
|
|
300
|
+
/**
|
|
301
|
+
* And now the wrappers around the C functionality.
|
|
302
|
+
*/
|
|
303
|
+
|
|
304
|
+
/*
|
|
305
|
+
* CSegmentTreeTemplate#c_initialize.
|
|
306
|
+
*
|
|
307
|
+
* (see CSegmentTreeTemplate#initialize).
|
|
308
|
+
*/
|
|
309
|
+
static VALUE segment_tree_init(VALUE self, VALUE combine, VALUE single_cell_array_val, VALUE size, VALUE identity) {
|
|
310
|
+
setup(unwrapped(self), combine, single_cell_array_val, size, identity);
|
|
311
|
+
return self;
|
|
312
|
+
}
|
|
313
|
+
|
|
314
|
+
/*
|
|
315
|
+
* (see SegmentTreeTemplate#query_on)
|
|
316
|
+
*/
|
|
317
|
+
static VALUE segment_tree_query_on(VALUE self, VALUE left, VALUE right) {
|
|
318
|
+
segment_tree_data* seg_tree = unwrapped(self);
|
|
319
|
+
size_t c_left = checked_nonneg_fixnum(left);
|
|
320
|
+
size_t c_right = checked_nonneg_fixnum(right);
|
|
321
|
+
|
|
322
|
+
if (c_right >= seg_tree->size) {
|
|
323
|
+
rb_raise(eSharedDataError, "Bad query interval %lu..%lu (size = %lu)", c_left, c_right, seg_tree->size);
|
|
324
|
+
}
|
|
325
|
+
|
|
326
|
+
if (c_left > c_right) {
|
|
327
|
+
// empty interval.
|
|
328
|
+
return seg_tree->identity;
|
|
329
|
+
}
|
|
330
|
+
|
|
331
|
+
return determine_val(seg_tree, TREE_ROOT, c_left, c_right, 0, seg_tree->size - 1);
|
|
332
|
+
}
|
|
333
|
+
|
|
334
|
+
/*
|
|
335
|
+
* (see SegmentTreeTemplate#update_at)
|
|
336
|
+
*/
|
|
337
|
+
static VALUE segment_tree_update_at(VALUE self, VALUE idx) {
|
|
338
|
+
segment_tree_data *seg_tree = unwrapped(self);
|
|
339
|
+
size_t c_idx = checked_nonneg_fixnum(idx);
|
|
340
|
+
|
|
341
|
+
if (c_idx >= seg_tree->size) {
|
|
342
|
+
rb_raise(eSharedDataError, "Cannot update value at index %lu, size = %lu", c_idx, seg_tree->size);
|
|
343
|
+
}
|
|
344
|
+
|
|
345
|
+
update_val_at(seg_tree, c_idx, TREE_ROOT, 0, seg_tree->size - 1);
|
|
346
|
+
|
|
347
|
+
return Qnil;
|
|
348
|
+
}
|
|
349
|
+
|
|
350
|
+
/*
|
|
351
|
+
* A generic Segment Tree template, written in C.
|
|
352
|
+
*
|
|
353
|
+
* (see SegmentTreeTemplate)
|
|
354
|
+
*/
|
|
355
|
+
void Init_c_segment_tree_template() {
|
|
356
|
+
VALUE cSegmentTreeTemplate = rb_define_class_under(mDataStructuresRMolinari, "CSegmentTreeTemplate", rb_cObject);
|
|
357
|
+
|
|
358
|
+
rb_define_alloc_func(cSegmentTreeTemplate, segment_tree_alloc);
|
|
359
|
+
rb_define_method(cSegmentTreeTemplate, "c_initialize", segment_tree_init, 4);
|
|
360
|
+
rb_define_method(cSegmentTreeTemplate, "query_on", segment_tree_query_on, 2);
|
|
361
|
+
rb_define_method(cSegmentTreeTemplate, "update_at", segment_tree_update_at, 1);
|
|
362
|
+
}
|
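The recursion in `build`, `determine_val`, and `update_val_at` above can be sketched in plain Ruby. This is a minimal illustration, not the gem's implementation: the class name `SketchSegmentTree` is hypothetical, and the root index of 1 is an assumption inferred from the `left_child`/`right_child` arithmetic in shared.c, since the value of `TREE_ROOT` is not shown in this part of the diff.

```ruby
# Hypothetical pure-Ruby sketch of the recursion used by the C code.
class SketchSegmentTree
  ROOT = 1 # assumed root index, so that the children of i are 2i and 2i + 1

  def initialize(combine:, single_cell_array_val:, size:, identity:)
    raise ArgumentError, 'size must be positive' unless size.positive?

    @combine = combine
    @single_cell_array_val = single_cell_array_val
    @size = size
    @identity = identity
    @tree = Array.new(1 + 4 * size) # same 4n bound as the C setup()
    build(ROOT, 0, size - 1)
  end

  # Value (sum, max, ...) on left..right; the identity for an empty interval.
  def query_on(left, right)
    raise ArgumentError, "bad interval #{left}..#{right}" if left.negative? || right >= @size
    return @identity if left > right

    determine_val(ROOT, left, right, 0, @size - 1)
  end

  # Recompute the nodes on the path from the root to the leaf for idx.
  def update_at(idx)
    raise ArgumentError, "bad index #{idx}" unless (0...@size).cover?(idx)

    update_val_at(idx, ROOT, 0, @size - 1)
    nil
  end

  private

  def build(tree_idx, tree_l, tree_r)
    if tree_l == tree_r
      @tree[tree_idx] = @single_cell_array_val.call(tree_l)
    else
      mid = (tree_l + tree_r) / 2
      build(2 * tree_idx, tree_l, mid)
      build(2 * tree_idx + 1, mid + 1, tree_r)
      @tree[tree_idx] = @combine.call(@tree[2 * tree_idx], @tree[2 * tree_idx + 1])
    end
  end

  def determine_val(tree_idx, left, right, tree_l, tree_r)
    return @tree[tree_idx] if left == tree_l && right == tree_r

    mid = (tree_l + tree_r) / 2
    if mid >= right
      determine_val(2 * tree_idx, left, right, tree_l, mid)
    elsif mid + 1 <= left
      determine_val(2 * tree_idx + 1, left, right, mid + 1, tree_r)
    else
      @combine.call(
        determine_val(2 * tree_idx, left, mid, tree_l, mid),
        determine_val(2 * tree_idx + 1, mid + 1, right, mid + 1, tree_r)
      )
    end
  end

  def update_val_at(idx, tree_idx, tree_l, tree_r)
    if tree_l == tree_r
      @tree[tree_idx] = @single_cell_array_val.call(tree_l)
    else
      mid = (tree_l + tree_r) / 2
      if mid >= idx
        update_val_at(idx, 2 * tree_idx, tree_l, mid)
      else
        update_val_at(idx, 2 * tree_idx + 1, mid + 1, tree_r)
      end
      @tree[tree_idx] = @combine.call(@tree[2 * tree_idx], @tree[2 * tree_idx + 1])
    end
  end
end

# Usage: range sums over a small array.
data = [3, 1, 4, 1, 5, 9, 2, 6]
sums = SketchSegmentTree.new(
  combine: ->(a, b) { a + b },
  single_cell_array_val: ->(i) { data[i] },
  size: data.size,
  identity: 0
)
sums.query_on(2, 5) # => 19
```

As in the C version, `update_at` takes only the index: the new value is read back through the `single_cell_array_val` lambda, so that lambda has to close over the live data.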
data/ext/shared.c
ADDED
|
@@ -0,0 +1,32 @@
|
|
|
1
|
+
#include "shared.h"
|
|
2
|
+
|
|
3
|
+
/*
|
|
4
|
+
* Arithmetic for in-array binary tree
|
|
5
|
+
*/
|
|
6
|
+
size_t midpoint(size_t left, size_t right) {
|
|
7
|
+
return (left + right) / 2;
|
|
8
|
+
}
|
|
9
|
+
|
|
10
|
+
size_t left_child(size_t i) {
|
|
11
|
+
return i << 1;
|
|
12
|
+
}
|
|
13
|
+
|
|
14
|
+
size_t right_child(size_t i) {
|
|
15
|
+
return 1 + (i << 1);
|
|
16
|
+
}
|
|
17
|
+
|
|
18
|
+
/*
|
|
19
|
+
* Check that a Ruby value is a non-negative Fixnum and convert it to a C unsigned long
|
|
20
|
+
*/
|
|
21
|
+
unsigned long checked_nonneg_fixnum(VALUE val) {
|
|
22
|
+
Check_Type(val, T_FIXNUM);
|
|
23
|
+
long c_val = FIX2LONG(val);
|
|
24
|
+
|
|
25
|
+
if (c_val < 0) {
|
|
26
|
+
rb_raise(eSharedDataError, "Value must be non-negative");
|
|
27
|
+
}
|
|
28
|
+
|
|
29
|
+
return c_val;
|
|
30
|
+
}
|
|
31
|
+
|
|
32
|
+
|
|
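The arithmetic helpers above (`midpoint`, `left_child`, `right_child`) are what make the `1 + 4 * size` allocation in `setup` sufficient. The following Ruby sketch walks the same implicit tree and reports the largest array index it touches. The helper name `max_tree_index` is hypothetical, and the sketch assumes the root lives at index 1.

```ruby
# Hypothetical helper: the largest tree index visited when building a segment
# tree over tree_l..tree_r, starting from the node stored at tree_idx.
# Mirrors midpoint(), left_child() and right_child() from shared.c.
def max_tree_index(tree_idx, tree_l, tree_r)
  return tree_idx if tree_l == tree_r # leaf: no further children

  mid = (tree_l + tree_r) / 2
  [
    max_tree_index(tree_idx << 1, tree_l, mid),           # left child
    max_tree_index(1 + (tree_idx << 1), mid + 1, tree_r)  # right child
  ].max
end

# For 5 leaves the tree reaches index 9, comfortably inside 1 + 4 * 5 = 21 slots.
max_tree_index(1, 0, 4) # => 9
```

The tree has depth at most ceil(log2 n), so every index stays below 4n, which is why an array of `1 + 4 * size` entries is always large enough.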
data/lib/data_structures_rmolinari/c_segment_tree_template_impl.rb
ADDED
|
@@ -0,0 +1,112 @@
|
|
|
1
|
+
require 'must_be'
|
|
2
|
+
|
|
3
|
+
require_relative 'shared'
|
|
4
|
+
require_relative 'c_segment_tree_template'
|
|
5
|
+
|
|
6
|
+
# The template of Segment Tree, which can be used for various interval-related purposes, like efficiently finding the sum (or min or
|
|
7
|
+
# max) on an arbitrary subarray of a given array.
|
|
8
|
+
#
|
|
9
|
+
# There is an excellent description of the data structure at https://cp-algorithms.com/data_structures/segment_tree.html. The
|
|
10
|
+
# Wikipedia article (https://en.wikipedia.org/wiki/Segment_tree) appears to describe a different data structure which is sometimes
|
|
11
|
+
# called an "interval tree."
|
|
12
|
+
#
|
|
13
|
+
# For more details (and some close-to-metal analysis of run time, especially for large datasets) see
|
|
14
|
+
# https://en.algorithmica.org/hpc/data-structures/segment-trees/. In particular, this shows how to do a bottom-up implementation,
|
|
15
|
+
# which is faster, at least for large datasets and cache-relevant compiled code. These issues don't really apply to code written in
|
|
16
|
+
# Ruby.
|
|
17
|
+
#
|
|
18
|
+
# This is a generic implementation, intended to allow easy configuration for concrete instances. See the parameters to the
|
|
19
|
+
# initializer and the definitions of concrete realisations like MaxValSegmentTree.
|
|
20
|
+
#
|
|
21
|
+
# We do O(n) work to build the internal data structure at initialization. Then we answer queries in O(log n) time.
|
|
22
|
+
class DataStructuresRMolinari::CSegmentTreeTemplate
|
|
23
|
+
|
|
24
|
+
# Construct a concrete instance of a Segment Tree. See details at the links above for the underlying concepts here.
|
|
25
|
+
# @param combine a lambda that takes two values and munges them into a combined value.
|
|
26
|
+
# - For example, if we are calculating sums over subintervals, combine.call(a, b) = a + b, while if we are doing maxima we will
|
|
27
|
+
# return max(a, b).
|
|
28
|
+
# - Things get more complicated when we are calculating, say, the _index_ of the maximal value in a subinterval. Now it is not
|
|
29
|
+
# enough simply to store that index at each tree node, because to combine the indices from two child nodes we need to know
|
|
30
|
+
# both the index of the maximal element in each child node's interval, but also the maximal values themselves, so we know
|
|
31
|
+
# which one "wins" for the parent node. This affects the sort of work we need to do when combining and the value provided by
|
|
32
|
+
# the +single_cell_array_val+ lambda.
|
|
33
|
+
# @param single_cell_array_val a lambda that takes an index i and returns the value we need to store in the #build
|
|
34
|
+
# operation for the subinterval i..i.
|
|
35
|
+
# - This will often simply be the value data[i], but in some cases it will be something else. For example, when we are
|
|
36
|
+
# calculating the index of the maximal value on each subinterval we need [i, data[i]] here.
|
|
37
|
+
# - If +update_at+ is called later, this lambda must close over the underlying data in a way that captures the updated value.
|
|
38
|
+
# @param size the size of the underlying data array, used in certain internal arithmetic.
|
|
39
|
+
# @param identity the value to return when we are querying on an empty interval
|
|
40
|
+
# - for sums, this will be zero; for maxima, this will be -Infinity, etc
|
|
41
|
+
def initialize(combine:, single_cell_array_val:, size:, identity:)
|
|
42
|
+
# having sorted out the keyword arguments, pass them more easily to the C layer.
|
|
43
|
+
c_initialize(combine, single_cell_array_val, size, identity)
|
|
44
|
+
end
|
|
45
|
+
end
|
|
46
|
+
|
|
47
|
+
# A segment tree that for an array A(0...n) answers questions of the form "what is the maximum value in the subinterval A(i..j)?"
|
|
48
|
+
# in O(log n) time.
|
|
49
|
+
#
|
|
50
|
+
# C version
|
|
51
|
+
#
|
|
52
|
+
# TODO: share the definition with (non-C) MaxValSegmentTree. The only difference is the class of the underlying segment tree
|
|
53
|
+
# template.
|
|
54
|
+
module DataStructuresRMolinari
|
|
55
|
+
class CMaxValSegmentTree
|
|
56
|
+
extend Forwardable
|
|
57
|
+
|
|
58
|
+
# Tell the tree that the value at idx has changed
|
|
59
|
+
def_delegator :@structure, :update_at
|
|
60
|
+
|
|
61
|
+
# @param data an object that contains values at integer indices based at 0, via +data[i]+.
|
|
62
|
+
# - This will usually be an Array, but it could also be a hash or a proc.
|
|
63
|
+
def initialize(data)
|
|
64
|
+
@structure = CSegmentTreeTemplate.new(
|
|
65
|
+
combine: ->(a, b) { [a, b].max },
|
|
66
|
+
single_cell_array_val: ->(i) { data[i] },
|
|
67
|
+
size: data.size,
|
|
68
|
+
identity: -Shared::INFINITY
|
|
69
|
+
)
|
|
70
|
+
end
|
|
71
|
+
|
|
72
|
+
# The maximum value in A(i..j).
|
|
73
|
+
#
|
|
74
|
+
# The arguments must be integers in 0...(A.size)
|
|
75
|
+
# @return the largest value in A(i..j) or -Infinity if i > j.
|
|
76
|
+
def max_on(i, j)
|
|
77
|
+
@structure.query_on(i, j)
|
|
78
|
+
end
|
|
79
|
+
end
|
|
80
|
+
|
|
81
|
+
# A segment tree that for an array A(0...n) answers questions of the form "what is the index of the maximal value in the
|
|
82
|
+
# subinterval A(i..j)?" in O(log n) time.
|
|
83
|
+
#
|
|
84
|
+
# C version
|
|
85
|
+
class CIndexOfMaxValSegmentTree
|
|
86
|
+
extend Forwardable
|
|
87
|
+
|
|
88
|
+
# Tell the tree that the value at idx has changed
|
|
89
|
+
def_delegator :@structure, :update_at
|
|
90
|
+
|
|
91
|
+
# @param (see MaxValSegmentTree#initialize)
|
|
92
|
+
def initialize(data)
|
|
93
|
+
@structure = CSegmentTreeTemplate.new(
|
|
94
|
+
combine: ->(p1, p2) { p1[1] >= p2[1] ? p1 : p2 },
|
|
95
|
+
single_cell_array_val: ->(i) { [i, data[i]] },
|
|
96
|
+
size: data.size,
|
|
97
|
+
identity: nil
|
|
98
|
+
)
|
|
99
|
+
end
|
|
100
|
+
|
|
101
|
+
# The index of the maximum value in A(i..j)
|
|
102
|
+
#
|
|
103
|
+
# The arguments must be integers in 0...(A.size)
|
|
104
|
+
# @return (Integer, nil) the index of the largest value in A(i..j) or +nil+ if i > j.
|
|
105
|
+
# - If there is more than one entry with that value, return one of the indices. There is no guarantee as to which one.
|
|
106
|
+
# - Return +nil+ if i > j
|
|
107
|
+
def index_of_max_val_on(i, j)
|
|
108
|
+
@structure.query_on(i, j)&.first # discard the value part of the pair, which is only bookkeeping
|
|
109
|
+
end
|
|
110
|
+
end
|
|
111
|
+
|
|
112
|
+
end
|
|
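The pair bookkeeping used by `CIndexOfMaxValSegmentTree` can be checked against a naive O(n) reference that folds the same two lambdas over the interval. This is only an illustrative sketch: the lambda `naive_index_of_max` is hypothetical and does not use the gem or the C extension.

```ruby
data = [7, 2, 9, 9, 4]

# The same lambdas the class above passes to the segment tree template.
combine = ->(p1, p2) { p1[1] >= p2[1] ? p1 : p2 }
single_cell_array_val = ->(i) { [i, data[i]] }

# Hypothetical naive reference: fold the [index, value] pairs over i..j and
# keep only the index, returning nil for an empty interval.
naive_index_of_max = lambda do |i, j|
  return nil if i > j

  pairs = (i..j).map { |k| single_cell_array_val.call(k) }
  pairs.reduce { |a, b| combine.call(a, b) }.first
end

best = naive_index_of_max.call(0, 4)
data[best] # => 9
```

Each node stores `[index, value]` rather than the index alone because combining two children requires comparing their maximal values; the value is dropped only at the very end, as `index_of_max_val_on` does with `&.first`.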
data/lib/data_structures_rmolinari/segment_tree_template.rb
CHANGED
|
@@ -17,6 +17,7 @@ require_relative 'shared'
|
|
|
17
17
|
#
|
|
18
18
|
# We do O(n) work to build the internal data structure at initialization. Then we answer queries in O(log n) time.
|
|
19
19
|
class DataStructuresRMolinari::SegmentTreeTemplate
|
|
20
|
+
include Shared
|
|
20
21
|
include Shared::BinaryTreeArithmetic
|
|
21
22
|
|
|
22
23
|
# Construct a concrete instance of a Segment Tree. See details at the links above for the underlying concepts here.
|
|
@@ -47,27 +48,29 @@ class DataStructuresRMolinari::SegmentTreeTemplate
|
|
|
47
48
|
end
|
|
48
49
|
|
|
49
50
|
# The desired value (max, sum, etc.) on the subinterval left..right.
|
|
51
|
+
#
|
|
50
52
|
# @param left the left end of the subinterval.
|
|
51
53
|
# @param right the right end (inclusive) of the subinterval.
|
|
52
54
|
#
|
|
55
|
+
# It must be that left..right is contained in 0...size.
|
|
56
|
+
#
|
|
53
57
|
# The type of the return value depends on the concrete instance of the segment tree. We return the _identity_ element provided at
|
|
54
58
|
# construction time if the interval is empty.
|
|
55
59
|
def query_on(left, right)
|
|
56
|
-
raise DataError, "Bad query interval #{left}..#{right}"
|
|
60
|
+
raise DataError, "Bad query interval #{left}..#{right} (size = #{@size})" unless (0...@size).cover?(left..right)
|
|
57
61
|
|
|
58
62
|
return @identity if left > right # empty interval
|
|
59
63
|
|
|
60
64
|
determine_val(root, left, right, 0, @size - 1)
|
|
61
65
|
end
|
|
62
66
|
|
|
63
|
-
#
|
|
67
|
+
# Reflect the fact that the underlying array has been updated at the given idx
|
|
64
68
|
#
|
|
65
69
|
# @param idx an index in the underlying data array.
|
|
66
70
|
#
|
|
67
71
|
# Note that we don't need the updated value itself. We get that by calling the lambda +single_cell_array_val+ supplied at
|
|
68
72
|
# construction.
|
|
69
73
|
def update_at(idx)
|
|
70
|
-
raise DataError, 'Cannot update an index outside the initial range of the underlying data' unless (0...@size).cover?(idx)
|
|
71
74
|
|
|
72
75
|
update_val_at(idx, root, 0, @size - 1)
|
|
73
76
|
end
|
|
@@ -105,9 +108,9 @@ class DataStructuresRMolinari::SegmentTreeTemplate
|
|
|
105
108
|
left = left(tree_idx)
|
|
106
109
|
right = right(tree_idx)
|
|
107
110
|
if mid >= idx
|
|
108
|
-
update_val_at(idx, left
|
|
111
|
+
update_val_at(idx, left, tree_l, mid)
|
|
109
112
|
else
|
|
110
|
-
update_val_at(idx, right
|
|
113
|
+
update_val_at(idx, right, mid + 1, tree_r)
|
|
111
114
|
end
|
|
112
115
|
@tree[tree_idx] = @combine.call(@tree[left], @tree[right])
|
|
113
116
|
end
|
|
data/lib/data_structures_rmolinari.rb
CHANGED
|
@@ -10,9 +10,13 @@ end
|
|
|
10
10
|
|
|
11
11
|
# These define classes inside module DataStructuresRMolinari
|
|
12
12
|
require_relative 'data_structures_rmolinari/algorithms'
|
|
13
|
+
|
|
13
14
|
require_relative 'data_structures_rmolinari/disjoint_union'
|
|
14
15
|
require_relative 'data_structures_rmolinari/c_disjoint_union' # version as a C extension
|
|
16
|
+
|
|
15
17
|
require_relative 'data_structures_rmolinari/segment_tree_template'
|
|
18
|
+
require_relative 'data_structures_rmolinari/c_segment_tree_template_impl'
|
|
19
|
+
|
|
16
20
|
require_relative 'data_structures_rmolinari/heap'
|
|
17
21
|
require_relative 'data_structures_rmolinari/max_priority_search_tree'
|
|
18
22
|
require_relative 'data_structures_rmolinari/min_priority_search_tree'
|
|
@@ -34,6 +38,8 @@ module DataStructuresRMolinari
|
|
|
34
38
|
# @param data an object that contains values at integer indices based at 0, via +data[i]+.
|
|
35
39
|
# - This will usually be an Array, but it could also be a hash or a proc.
|
|
36
40
|
def initialize(data)
|
|
41
|
+
data.must_be_a Enumerable
|
|
42
|
+
|
|
37
43
|
@structure = SegmentTreeTemplate.new(
|
|
38
44
|
combine: ->(a, b) { [a, b].max },
|
|
39
45
|
single_cell_array_val: ->(i) { data[i] },
|
|
@@ -61,6 +67,8 @@ module DataStructuresRMolinari
|
|
|
61
67
|
|
|
62
68
|
# @param (see MaxValSegmentTree#initialize)
|
|
63
69
|
def initialize(data)
|
|
70
|
+
data.must_be_a Enumerable
|
|
71
|
+
|
|
64
72
|
@structure = SegmentTreeTemplate.new(
|
|
65
73
|
combine: ->(p1, p2) { p1[1] >= p2[1] ? p1 : p2 },
|
|
66
74
|
single_cell_array_val: ->(i) { [i, data[i]] },
|
metadata
CHANGED
|
@@ -1,14 +1,14 @@
|
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
|
2
2
|
name: data_structures_rmolinari
|
|
3
3
|
version: !ruby/object:Gem::Version
|
|
4
|
-
version: 0.4.
|
|
4
|
+
version: 0.4.4
|
|
5
5
|
platform: ruby
|
|
6
6
|
authors:
|
|
7
7
|
- Rory Molinari
|
|
8
8
|
autorequire:
|
|
9
9
|
bindir: bin
|
|
10
10
|
cert_chain: []
|
|
11
|
-
date: 2023-
|
|
11
|
+
date: 2023-02-02 00:00:00.000000000 Z
|
|
12
12
|
dependencies:
|
|
13
13
|
- !ruby/object:Gem::Dependency
|
|
14
14
|
name: must_be
|
|
@@ -79,6 +79,7 @@ email: rorymolinari@gmail.com
|
|
|
79
79
|
executables: []
|
|
80
80
|
extensions:
|
|
81
81
|
- ext/c_disjoint_union/extconf.rb
|
|
82
|
+
- ext/c_segment_tree_template/extconf.rb
|
|
82
83
|
extra_rdoc_files: []
|
|
83
84
|
files:
|
|
84
85
|
- CHANGELOG.md
|
|
@@ -86,8 +87,12 @@ files:
|
|
|
86
87
|
- Rakefile
|
|
87
88
|
- ext/c_disjoint_union/disjoint_union.c
|
|
88
89
|
- ext/c_disjoint_union/extconf.rb
|
|
90
|
+
- ext/c_segment_tree_template/extconf.rb
|
|
91
|
+
- ext/c_segment_tree_template/segment_tree_template.c
|
|
92
|
+
- ext/shared.c
|
|
89
93
|
- lib/data_structures_rmolinari.rb
|
|
90
94
|
- lib/data_structures_rmolinari/algorithms.rb
|
|
95
|
+
- lib/data_structures_rmolinari/c_segment_tree_template_impl.rb
|
|
91
96
|
- lib/data_structures_rmolinari/disjoint_union.rb
|
|
92
97
|
- lib/data_structures_rmolinari/heap.rb
|
|
93
98
|
- lib/data_structures_rmolinari/max_priority_search_tree.rb
|