data_structures_rmolinari 0.4.1 → 0.4.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: eb25e49219167201208f45a402b202180466202bc071940da418b2f84d281f6d
4
- data.tar.gz: f43c1614c2a433d7a4e1148eb90121fc4b1d61c807afeb78c456d77b66935adb
3
+ metadata.gz: c9022e9531472d1125c6172025c2d10c5d4ef4f9c43e326a43f1c5b4f0721263
4
+ data.tar.gz: '0212619be7fe32e68b63d2087730f81ffd6b4179b8b8bf63aa0026e4e3056224'
5
5
  SHA512:
6
- metadata.gz: d2e77397f790e8fe8d650d7727be55b464aa8f19d928e215b824820b712df1f5762d74ed304196e1c773960f02d73cd29bc07486f0cc28eb5ddcd5dbd422d691
7
- data.tar.gz: 56328269625f88b5119b64696f792a2e567d6327605dc9a6cc391edd33e5e49f293adb304ef3b819094b860299f904b79862eaac2a8eaeff0d249350bce280db
6
+ metadata.gz: a7f9258eeed2dc7e7fa5713aaecfcdf44e061bb161aa3d0d2662fb662bfb6b2685c61be221b4a109792982c3b2aa6215da75b51ae299d4a9237b6226000612e4
7
+ data.tar.gz: e585a245f753ef731895163eedba802e3fe2f6000720d10705b5a0cd02a12642a35220eea57c8b71f504b66db7cb06161fdfca1660edc0dc132ee026dd83be4d
data/CHANGELOG.md CHANGED
@@ -1,6 +1,28 @@
1
1
  # Changelog
2
2
 
3
- ## [Unreleased]
3
+ ## [0.4.3] 2023-01-27
4
+
5
+ - Fix bad directive in Rakefile for DisjointUnion C extension
6
+
7
+ ## [0.4.2] 2023-01-26
8
+
9
+ ### Added
10
+
11
+ - MinPrioritySearchTree added
12
+ - it's a thin layer on top of a MaxPrioritySearchTree with negated y values.
13
+
14
+ - MaxPrioritySearchTree
15
+ - A "dynamic" constructor option now allows deletion of the "top" (root) node. This is useful in certain algorithms.
16
+
17
+ - DisjointUnion
18
+ - Added a proof-of-concept implementation in C, which is about twice as fast.
19
+
20
+ - Algorithms
21
+ - Implement the Maximal Empty Rectangle algorithm of De et al. It uses a dynamic MaxPST.
22
+
23
+ ## [0.4.1] 2023-01-12
24
+
25
+ - Update this file for the gem (though I forgot to add this comment first!)
4
26
 
5
27
  ## [0.4.0] 2023-01-12
6
28
 
@@ -10,10 +32,10 @@
10
32
  - Duplicate y values are now allowed. Ties are broken with a preference for smaller values of x.
11
33
  - Method names have changed
12
34
  - Instead of "highest", "leftmost", "rightmost" we use "largest_y", "smallest_x", "largest_x"
13
- - For example, +highest_ne+ is now +largest_y_in_nw+
35
+ - For example, `highest_ne` is now `largest_y_in_nw`
14
36
  - DisjointUnion
15
37
  - the size argument to initializer is optional. The default value is 0.
16
- - elements can be added to the "universe" of known values with +make_set+
38
+ - elements can be added to the "universe" of known values with `make_set`
17
39
 
18
40
  ### Removed
19
41
  - MinmaxPrioritySearchTree is no longer available
data/README.md ADDED
@@ -0,0 +1,141 @@
1
+ # Data Structures
2
+
3
+ This is a small collection of Ruby data structures that I have implemented for my own interest. Implementing the code for a data
4
+ structure is almost always more educational than simply reading about it and is usually fun. I wrote some of them while
5
+ participating in the Advent of Code (https://adventofcode.com/).
6
+
7
+ These implementations are not particularly clever. They are based on the expository descriptions and pseudo-code I found as I read
8
+ about each structure and so are not as fast as possible.
9
+
10
+ The code is available as a gem: https://rubygems.org/gems/data_structures_rmolinari.
11
+
12
+ ## Usage
13
+
14
+ The right way to organize the code is not obvious to me. For now the data structures are all defined in the module
15
+ `DataStructuresRMolinari` to avoid polluting the global namespace.
16
+
17
+ Example usage after the gem is installed:
18
+ ```
19
+ require 'data_structures_rmolinari'
20
+
21
+ # Pull what we need out of the namespace
22
+ MaxPrioritySearchTree = DataStructuresRMolinari::MaxPrioritySearchTree
23
+ Point = DataStructuresRMolinari::Point # anything responding to :x and :y is fine
24
+
25
+ pst = MaxPrioritySearchTree.new([Point.new(1, 1)])
26
+ puts pst.largest_y_in_ne(0, 0) # "Point(1,1)"
27
+ ```
28
+
29
+ # Implementations
30
+
31
+ ## Disjoint Union
32
+
33
+ We represent a set S of non-negative integers as the disjoint union of subsets. Equivalently, we represent a partition of S. The
34
+ data structure provides very efficient implementations of the two key operations
35
+ - `unite(e, f)`, which merges the subsets containing e and f; and
36
+ - `find(e)`, which returns the canonical representative of the subset containing e. Two elements e and f are in the same subset
37
+ exactly when `find(e) == find(f)`.
38
+
39
+ It also provides
40
+ - `make_set(v)`, which adds a new value `v` to the set S, starting out in a singleton subset.
41
+
42
+ For more details see https://en.wikipedia.org/wiki/Disjoint-set_data_structure and the paper [[TvL1984]](#references) by Tarjan and
43
+ van Leeuwen.
44
+
45
+ There is an experimental implementation as a C extension in c_disjoint_union.c. The Ruby class for this implementation is
46
+ `CDisjointUnion`. Benchmarks indicate that a long sequence of `unite` calls is about twice as fast.
47
+
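As a rough illustration of the operations described above, here is a minimal, self-contained Ruby sketch of a disjoint union with union by rank and path halving, the two techniques named in the C source later in this diff. The class name `SketchDisjointUnion` and its error handling are assumptions made for this example; it is not the gem's `DisjointUnion` or `CDisjointUnion`.

```ruby
# Illustrative sketch only; not the gem's DisjointUnion class.
# Elements are assumed to be non-negative integers.
class SketchDisjointUnion
  attr_reader :subset_count

  def initialize(size = 0)
    @parent = [] # @parent[e] is the parent of e; roots are their own parent
    @rank = []   # used to decide which root survives a merge
    @subset_count = 0
    size.times { |e| make_set(e) }
  end

  # Add a new element in its own singleton subset.
  def make_set(e)
    raise ArgumentError, "#{e} is already present" if @parent[e]

    @parent[e] = e
    @rank[e] = 0
    @subset_count += 1
  end

  # Root of the tree containing e. Path halving: each visited node is
  # re-pointed at its grandparent as we walk up.
  def find(e)
    raise ArgumentError, "#{e} is not in the universe" unless @parent[e]

    x = e
    while @parent[@parent[x]] != @parent[x]
      @parent[x] = @parent[@parent[x]]
      x = @parent[x]
    end
    @parent[x]
  end

  # Merge the subsets containing e and f using union by rank.
  def unite(e, f)
    root1 = find(e)
    root2 = find(f)
    return if root1 == root2 # already united

    if @rank[root1] < @rank[root2]
      @parent[root1] = root2
    else
      @parent[root2] = root1
      @rank[root1] += 1 if @rank[root1] == @rank[root2]
    end
    @subset_count -= 1
  end
end

du = SketchDisjointUnion.new(5)
du.unite(0, 1)
du.unite(3, 4)
du.unite(1, 4)
puts du.find(0) == du.find(3) # true
puts du.find(0) == du.find(2) # false
puts du.subset_count          # 2
```

Unlike the C code in this diff, the sketch silently accepts `unite(e, e)`; that difference is a simplification, not a recommendation.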
48
+ ## Heap
49
+
50
+ This is a standard binary heap with an `update` method, suitable for use as a priority queue. There are several supported
51
+ operations:
52
+
53
+ - `insert(item, priority)`, insert the given item with the stated priority.
54
+ - By default, items must be distinct.
55
+ - `top`, return the element with smallest priority without removing it
56
+ - `pop`, return the element with smallest priority and remove it from the structure
57
+ - `update(item, priority)`, update the priority of the given item, which must already be in the heap
58
+
59
+ `top` is O(1). The others are O(log n) where n is the number of items in the heap.
60
+
61
+ By default we have a min-heap: the top element is the one with smallest priority. A configuration parameter at construction can make
62
+ it a max-heap.
63
+
64
+ Another configuration parameter allows the creation of a "non-addressable" heap. This makes it impossible to call `update`, but
65
+ allows the insertion of duplicate items (which is sometimes useful) and slightly faster operation overall.
66
+
67
+ See https://en.wikipedia.org/wiki/Binary_heap and the paper by Edelkamp, Elmasry, and Katajainen [[EEK2017]](#references).
68
+
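To make the stated costs concrete, the following is a minimal sketch of a binary min-heap with `insert`, `top`, and `pop`. The class name `SketchMinHeap` is hypothetical and the code is not the gem's `Heap`; in particular it omits `update`, which would need an extra map from items to array positions.

```ruby
# Illustrative sketch only; not the gem's Heap class.
class SketchMinHeap
  def initialize
    @data = [] # [priority, item] pairs stored in heap order
  end

  def empty?
    @data.empty?
  end

  # O(log n): append at the bottom, then sift up.
  def insert(item, priority)
    @data << [priority, item]
    sift_up(@data.size - 1)
  end

  # O(1): the item with smallest priority.
  def top
    raise 'heap is empty' if empty?

    @data.first[1]
  end

  # O(log n): remove and return the item with smallest priority.
  def pop
    raise 'heap is empty' if empty?

    result = @data.first[1]
    last = @data.pop
    unless @data.empty?
      @data[0] = last
      sift_down(0)
    end
    result
  end

  private

  def sift_up(i)
    while i.positive?
      parent = (i - 1) / 2
      break if @data[parent][0] <= @data[i][0]

      @data[parent], @data[i] = @data[i], @data[parent]
      i = parent
    end
  end

  def sift_down(i)
    loop do
      smallest = i
      [2 * i + 1, 2 * i + 2].each do |child|
        smallest = child if child < @data.size && @data[child][0] < @data[smallest][0]
      end
      break if smallest == i

      @data[smallest], @data[i] = @data[i], @data[smallest]
      i = smallest
    end
  end
end

heap = SketchMinHeap.new
heap.insert(:write_report, 5)
heap.insert(:fix_bug, 1)
heap.insert(:review_pr, 3)
puts heap.top # fix_bug
order = []
order << heap.pop until heap.empty?
p order # [:fix_bug, :review_pr, :write_report]
```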
69
+ ## Priority Search Tree
70
+
71
+ A PST stores a set P of two-dimensional points in a way that allows certain queries about P to be answered efficiently. The data
72
+ structure was introduced by McCreight [[McC1985]](#references). De, Maheshwari, Nandy, and Smid [[DMNS2011]](#references) showed
73
+ how to build the structure in-place and we use their approach here. The structure answers the following queries:
74
+
75
+ - `largest_y_in_ne(x0, y0)` and `largest_y_in_nw(x0, y0)`, the "highest" (max-y) point in the quadrant to the northeast/northwest of
76
+ (x0, y0);
77
+ - `smallest_x_in_ne(x0, y0)`, the "leftmost" (min-x) point in the quadrant to the northeast of (x0, y0);
78
+ - `largest_x_in_nw(x0, y0)`, the "rightmost" (max-x) point in the quadrant to the northwest of (x0, y0);
79
+ - `largest_y_in_3_sided(x0, x1, y0)`, the highest point in the region specified by x0 <= x <= x1 and y0 <= y; and
80
+ - `enumerate_3_sided(x0, x1, y0)`, enumerate all the points in that region.
81
+
82
+ Here compass directions are the natural ones in the x-y plane with the positive x-axis pointing east and the positive y-axis
83
+ pointing north.
84
+
85
+ There is no `smallest_x_in_3_sided(x0, x1, y0)`. Just use `smallest_x_in_ne(x0, y0)`.
86
+
87
+ The single-point queries run in O(log n) time, where n is the size of P, while `enumerate_3_sided` runs in O(m + log n), where m is
88
+ the number of points actually enumerated.
89
+
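The queries listed above can be pinned down with a brute-force reference. The sketch below scans every point, so each call is O(n) rather than O(log n); it only shows what the answers should be, not how a PST computes them. The `Pt` struct and the free-standing method signatures (taking the point list as a first argument) are assumptions for this example. Closed boundaries for the quadrants are also an assumption, extrapolated from the 3-sided region x0 <= x <= x1 and y0 <= y, and tie-breaking between equal y values is not modeled.

```ruby
# Reference semantics only: every query is a linear scan.
Pt = Struct.new(:x, :y)

# Highest point with x >= x0 and y >= y0, or nil if there is none.
def largest_y_in_ne(points, x0, y0)
  points.select { |pt| pt.x >= x0 && pt.y >= y0 }.max_by(&:y)
end

# Highest point with x <= x0 and y >= y0.
def largest_y_in_nw(points, x0, y0)
  points.select { |pt| pt.x <= x0 && pt.y >= y0 }.max_by(&:y)
end

# Leftmost point with x >= x0 and y >= y0.
def smallest_x_in_ne(points, x0, y0)
  points.select { |pt| pt.x >= x0 && pt.y >= y0 }.min_by(&:x)
end

# Rightmost point with x <= x0 and y >= y0.
def largest_x_in_nw(points, x0, y0)
  points.select { |pt| pt.x <= x0 && pt.y >= y0 }.max_by(&:x)
end

# All points with x0 <= x <= x1 and y >= y0.
def enumerate_3_sided(points, x0, x1, y0)
  points.select { |pt| pt.x.between?(x0, x1) && pt.y >= y0 }
end

points = [Pt.new(1, 5), Pt.new(2, 3), Pt.new(4, 6), Pt.new(6, 2), Pt.new(7, 4)]
p largest_y_in_ne(points, 3, 3).to_a                # [4, 6]
p smallest_x_in_ne(points, 5, 1).to_a               # [6, 2]
p largest_x_in_nw(points, 5, 4).to_a                # [4, 6]
p enumerate_3_sided(points, 2, 6, 3).map(&:to_a)    # [[2, 3], [4, 6]]
```

A reference like this can be useful as a test oracle when checking a real PST against random point sets.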
90
+ The implementation is in `MaxPrioritySearchTree` (MaxPST for short), so called because internally the structure is, among other
91
+ things, a max-heap on the y-coordinates.
92
+
93
+ These queries appear rather abstract at first but there are interesting applications. See, for example, section 4 of
94
+ [[McC1985]](#references), keeping in mind that the data structure in that paper is actually a _MinPST_.
95
+
96
+ We also provide a `MinPrioritySearchTree`, which answers analogous queries in the southward-infinite quadrants and 3-sided
97
+ regions.
98
+
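The changelog in this diff describes the MinPST as a thin layer over a MaxPST with negated y values. The sketch below shows that reduction in isolation, using a brute-force stand-in for the max query. The `Pt` struct, the method signatures, and the query name `smallest_y_in_se` are assumptions made for illustration and may not match the gem's API.

```ruby
Pt = Struct.new(:x, :y)

# Stand-in for a MaxPST query (brute force): highest point with
# x >= x0 and y >= y0, or nil if there is none.
def largest_y_in_ne(points, x0, y0)
  points.select { |pt| pt.x >= x0 && pt.y >= y0 }.max_by(&:y)
end

# Lowest point with x >= x0 and y <= y0, answered by running the max
# query on the points mirrored in the x-axis and mirroring the result back.
def smallest_y_in_se(points, x0, y0)
  mirrored = points.map { |pt| Pt.new(pt.x, -pt.y) }
  hit = largest_y_in_ne(mirrored, x0, -y0)
  hit && Pt.new(hit.x, -hit.y)
end

points = [Pt.new(1, 5), Pt.new(2, 3), Pt.new(4, 6), Pt.new(6, 2), Pt.new(7, 4)]
p smallest_y_in_se(points, 3, 5).to_a # [6, 2]
```

A real wrapper would negate the points once at construction time rather than on every query.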
99
+ By default these data structures are immutable: once constructed they cannot be changed. But there is a constructor option that
100
+ makes the instance "dynamic". This allows us to delete the element at the root of the tree - the one with largest y value (smallest
101
+ for MinPST) - with the `delete_top!` method. This operation is important in certain algorithms, such as enumerating all maximal
102
+ empty rectangles (see the second paper by De et al. [[DMNS2013]](#references)). Note that points still cannot be added to the PST in
103
+ any case, and choosing the dynamic option makes certain internal bookkeeping operations slower.
104
+
105
+ In [[DMNS2013]](#references) De et al. generalize the in-place structure to a _Min-max Priority Search Tree_ (MinmaxPST) that can
106
+ answer queries in all four quadrants and both "kinds" of 3-sided boxes. Having one of these would save the trouble of constructing
107
+ both a MaxPST and MinPST. But the presentation is hard to follow in places and the paper's pseudocode is buggy.[^minmaxpst]
108
+
109
+ ## Segment Tree
110
+
111
+ Segment trees store information related to subintervals of a certain array. For example, they can be used to find the sum of the
112
+ elements in an arbitrary subinterval A[i..j] of an array A[0..n] in O(log n) time. Each node in the tree corresponds to a subarray
113
+ of A in such a way that the values we store in the nodes can be combined efficiently to determine the desired result for arbitrary
114
+ subarrays.
115
+
116
+ An excellent description of the idea is found at https://cp-algorithms.com/data_structures/segment_tree.html.
117
+
118
+ Generic code is provided in `SegmentTreeTemplate`. Concrete classes are written by providing a handful of simple lambdas and
119
+ constants to the template class's initializer. Figuring out the details requires some knowledge of the internal mechanisms of a
120
+ segment tree, for which the link at cp-algorithms.com is very helpful. See the definitions of the concrete classes,
121
+ `MaxValSegmentTree` and `IndexOfMaxValSegmentTree`, for examples.
122
+
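For readers who have not seen the idea before, here is a minimal sketch of a segment tree that answers "maximum of A[i..j]" in O(log n). It hard-codes `max` as the combining operation instead of taking lambdas, so it is not the gem's `SegmentTreeTemplate`; the class name `SketchMaxSegmentTree` and the method names `max_on` and `update_at` are assumptions made for this example.

```ruby
# Illustrative sketch only; not the gem's SegmentTreeTemplate.
class SketchMaxSegmentTree
  def initialize(values)
    @n = values.size
    # Node 1 is the root; node k has children 2k and 2k + 1.
    @tree = Array.new(4 * [@n, 1].max, -Float::INFINITY)
    build(values, 1, 0, @n - 1) unless values.empty?
  end

  # Maximum of values[i..j] (inclusive), in O(log n).
  def max_on(i, j)
    raise ArgumentError, 'bad interval' unless 0 <= i && i <= j && j < @n

    query(1, 0, @n - 1, i, j)
  end

  # Set values[idx] = value and repair the nodes above it, in O(log n).
  def update_at(idx, value)
    raise ArgumentError, 'index out of range' unless idx.between?(0, @n - 1)

    update(1, 0, @n - 1, idx, value)
  end

  private

  def build(values, node, lo, hi)
    if lo == hi
      @tree[node] = values[lo]
    else
      mid = (lo + hi) / 2
      build(values, 2 * node, lo, mid)
      build(values, 2 * node + 1, mid + 1, hi)
      @tree[node] = [@tree[2 * node], @tree[2 * node + 1]].max
    end
  end

  def query(node, lo, hi, i, j)
    return -Float::INFINITY if j < lo || hi < i # no overlap
    return @tree[node] if i <= lo && hi <= j    # node interval fully covered

    mid = (lo + hi) / 2
    [query(2 * node, lo, mid, i, j), query(2 * node + 1, mid + 1, hi, i, j)].max
  end

  def update(node, lo, hi, idx, value)
    if lo == hi
      @tree[node] = value
    else
      mid = (lo + hi) / 2
      if idx <= mid
        update(2 * node, lo, mid, idx, value)
      else
        update(2 * node + 1, mid + 1, hi, idx, value)
      end
      @tree[node] = [@tree[2 * node], @tree[2 * node + 1]].max
    end
  end
end

tree = SketchMaxSegmentTree.new([3, 1, 4, 1, 5, 9, 2, 6])
puts tree.max_on(0, 3) # 4
puts tree.max_on(2, 4) # 5
tree.update_at(3, 10)
puts tree.max_on(0, 3) # 10
```

Swapping `max` and `-Float::INFINITY` for another associative operation and its identity (for example `+` and `0`) gives a sum tree, which is roughly the kind of variation a template with lambdas is meant to capture.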
123
+ ## Algorithms
124
+
125
+ The Algorithms submodule contains some algorithms using the data structures.
126
+
127
+ - `maximal_empty_rectangles(points)`
128
+ - We are given a set P contained in a minimal box B = [x_min, x_max] x [y_min, y_max]. An _empty rectangle_ is an axis-parallel
129
+ rectangle with positive area contained in B containing no element of P in its interior. A _maximal empty rectangle_ is an empty
130
+ rectangle not properly contained in any other empty rectangle. This method yields each maximal empty rectangle in the form
131
+ [left, right, bottom, top].
132
+ - The algorithm is due to [[DMNS2013]](#references).
133
+
134
+ # References
135
+ - [TvL1984] Tarjan, Robert E., van Leeuwen, J., _Worst-case Analysis of Set Union Algorithms_, Journal of the ACM, v31:2 (1984), pp 245–281.
136
+ - [EEK2017] Edelkamp, S., Elmasry, A., Katajainen, J., _Optimizing Binary Heaps_, Theory Comput Syst (2017), vol 61, pp 606-636, DOI 10.1007/s00224-017-9760-2.
137
+ - [McC1985] McCreight, E. M., _Priority Search Trees_, SIAM J. Comput., 14(2):257-276, 1985.
138
+ - [DMNS2011] De, M., Maheshwari, A., Nandy, S. C., Smid, M., _An In-Place Priority Search Tree_, 23rd Canadian Conference on Computational Geometry, 2011.
139
+ - [DMNS2013] De, M., Maheshwari, A., Nandy, S. C., Smid, M., _An In-Place Min-max Priority Search Tree_, Computational Geometry, v46 (2013), pp 310-327.
140
+
141
+ [^minmaxpst]: See the comments in the fragmentary class `MinMaxPrioritySearchTree` for further details.
data/Rakefile ADDED
@@ -0,0 +1,16 @@
1
+ require 'rubygems'
2
+ require 'rake/testtask'
3
+ require 'rake/extensiontask'
4
+
5
+ Rake::ExtensionTask.new('data_structures_rmolinari/c_disjoint_union') do |ext|
6
+ ext.name = 'c_disjoint_union'
7
+ ext.ext_dir = 'ext/c_disjoint_union'
8
+ ext.lib_dir = 'lib/data_structures_rmolinari/'
9
+ end
10
+
11
+ Rake::TestTask.new do |t|
12
+ t.libs << 'test'
13
+ end
14
+
15
+ desc 'Run Tests'
16
+ task default: :test
@@ -0,0 +1,424 @@
1
+ /*
2
+ * This is a C implementation of a simple Ruby Disjoint Union data structure.
3
+ *
4
+ * A Disjoint Union doesn't have much of an implementation in Ruby: see disjoint_union.rb in this gem. This means that we don't gain
5
+ * much by implementing it in C but that it serves as a good learning experience for me.
6
+ *
7
+ * It turns out that writing a C extension for Ruby like this isn't very complicated, but there are a bunch of moving parts and the
8
+ * available documentation is a bit of a slog. Writing this was very educational.
9
+ *
10
+ * See https://docs.ruby-lang.org/en/master/extension_rdoc.html for some documentation. It's a bit hard to read in places, but
11
+ * plugging away at things helps.
12
+ *
13
+ * https://guides.rubygems.org/gems-with-extensions/ is a decent tutorial, though it leaves out lots of details.
14
+ *
15
+ * See https://aaronbedra.com/extending-ruby/ for another tutorial.
16
+ */
17
+
18
+ #include "ruby.h"
19
+
20
+ // The Shared::DataError exception type in the Ruby code. We only need it when we detect a runtime error, so a macro should be fine.
21
+ #define mShared rb_define_module("Shared")
22
+ #define eSharedDataError rb_const_get(mShared, rb_intern_const("DataError"))
23
+
24
+ /**
25
+ * It's been so long since I've written non-trivial C that I need to copy examples from online.
26
+ *
27
+ * Dynamic array of longs, with an initial value for otherwise uninitialized elements.
28
+ * Based on https://stackoverflow.com/questions/3536153/c-dynamically-growing-array
29
+ */
30
+ typedef struct {
31
+ long *array;
32
+ size_t size;
33
+ long default_val;
34
+ } DynamicArray;
35
+
36
+ /*
37
+ * Initialize a DynamicArray struct with the given initial size and with all values set to the default value.
38
+ *
39
+ * The default value is stored and used to initialize new array sections if and when the array needs to be expanded.
40
+ */
41
+ void initDynamicArray(DynamicArray *a, size_t initial_size, long default_val) {
42
+ a->array = malloc(initial_size * sizeof(long));
43
+ a->size = initial_size;
44
+ a->default_val = default_val;
45
+
46
+ for (size_t i = 0; i < initial_size; i++) {
47
+ a->array[i] = default_val;
48
+ }
49
+ }
50
+
51
+ /*
52
+ * Assign +value+ to the +index+-th element of the array, expanding the available space if necessary.
53
+ */
54
+ void assignInDynamicArray(DynamicArray *a, unsigned long index, long value) {
55
+ if (a->size <= index) {
56
+ size_t new_size = a->size;
57
+ while (new_size <= index) {
58
+ new_size = 8 * new_size / 5 + 8; // 8/5 gives "Fibonacci-like" growth; adding 8 to avoid small arrays having to reallocate
59
+ // too often as they grow. Who knows if it's worth being "clever".
60
+ }
61
+
62
+ long *new_array = realloc(a->array, new_size * sizeof(long));
63
+ if (!new_array) {
64
+ rb_raise(rb_eRuntimeError, "Cannot allocate memory to expand DynamicArray!");
65
+ }
66
+
67
+ a->array = new_array;
68
+ for (size_t i = a->size; i < new_size; i++) {
69
+ a->array[i] = a->default_val;
70
+ }
71
+
72
+ a->size = new_size;
73
+ }
74
+
75
+ a->array[index] = value;
76
+ }
77
+
78
+ void freeDynamicArray(DynamicArray *a) {
79
+ free(a->array);
80
+ a->array = NULL;
81
+ a->size = 0;
82
+ }
83
+
84
+ size_t _size_of(DynamicArray *a) {
85
+ return a->size * sizeof(a->default_val);
86
+ }
87
+
88
+ /**
89
+ * The C implementation of a Disjoint Union
90
+ *
91
+ * See Tarjan, Robert E., van Leeuwen, J., _Worst-case Analysis of Set Union Algorithms_, Journal of the ACM, v31:2 (1984), pp 245–281.
92
+ */
93
+
94
+ /*
95
+ * The Disjoint Union struct.
96
+ * - forest: an array of longs giving, for each element, the element's parent.
97
+ * - An element e is the root of its tree just when forest[e] == e.
98
+ * - Two elements are in the same subset just when they are in the same tree in the forest.
99
+ * - So the key idea is that we can check this by navigating via parents from each element to their roots. Clever optimizations
100
+ * keep the trees flat and so most nodes are close to their roots.
101
+ * - rank: an array of longs giving the "rank" of each element.
102
+ * - This value is used to guide the "linking" of trees when subsets are being merged to keep the trees flat. See Tarjan & van
103
+ * Leeuwen
104
+ * - subset_count: the number of (disjoint) subsets.
105
+ * - it isn't needed internally but may be useful to client code.
106
+ */
107
+ typedef struct du_data {
108
+ DynamicArray *forest; // the forest that describes the unified subsets
109
+ DynamicArray *rank; // the "ranks" of the elements, used when uniting subsets
110
+ size_t subset_count;
111
+ } disjoint_union_data;
112
+
113
+ /*
114
+ * Create one (on the heap).
115
+ *
116
+ * The dynamic arrays are initialized with a size of 100 because I didn't have a better idea. This will end up getting called from
117
+ * the Ruby #allocate method, which happens before #initialize. Thus we don't know the calling code's desired initial size.
118
+ */
119
+ #define INITIAL_SIZE 100
120
+ static disjoint_union_data *create_disjoint_union() {
121
+ disjoint_union_data *disjoint_union = (disjoint_union_data *)malloc(sizeof(disjoint_union_data));
122
+
123
+ // Allocate the structures
124
+ DynamicArray *forest = (DynamicArray *)malloc(sizeof(DynamicArray));
125
+ DynamicArray *rank = (DynamicArray *)malloc(sizeof(DynamicArray));
126
+ initDynamicArray(forest, INITIAL_SIZE, -1);
127
+ initDynamicArray(rank, INITIAL_SIZE, 0);
128
+
129
+ disjoint_union->forest = forest;
130
+ disjoint_union->rank = rank;
131
+ disjoint_union->subset_count = 0;
132
+
133
+ return disjoint_union;
134
+ }
135
+
136
+ /*
137
+ * Free the memory associated with a disjoint union.
138
+ *
139
+ * This will end up getting triggered by the Ruby garbage collector. Ruby learns about it via the disjoint_union_type struct below.
140
+ */
141
+ static void disjoint_union_free(void *ptr) {
142
+ if (ptr) {
143
+ disjoint_union_data *disjoint_union = ptr;
144
+ freeDynamicArray(disjoint_union->forest);
145
+ freeDynamicArray(disjoint_union->rank);
146
+
147
+ free(disjoint_union->forest);
148
+ disjoint_union->forest = NULL;
149
+
150
+ free(disjoint_union->rank);
151
+ disjoint_union->rank = NULL;
152
+
153
+ free(disjoint_union); // allocated with malloc() in create_disjoint_union, so release it with free() rather than xfree()
154
+ }
155
+ }
156
+
157
+ /************************************************************
158
+ * The disjoint union operations
159
+ ************************************************************/
160
+
161
+ /*
162
+ * Is the given element already a member of the universe?
163
+ */
164
+ static int present_p(disjoint_union_data *disjoint_union, size_t element) {
165
+ DynamicArray *forest = (DynamicArray *)disjoint_union->forest;
166
+ return (forest->size > element && (forest->array[element] != forest->default_val));
167
+ }
168
+
169
+ /*
170
+ * Check that the given element is a member of the universe and raise Shared::DataError (ruby-side) if not
171
+ */
172
+ static void assert_membership(disjoint_union_data *disjoint_union, size_t element) {
173
+ if (!present_p(disjoint_union, element)) {
174
+ rb_raise(eSharedDataError, "Value %zu is not part of the universe", element);
175
+ }
176
+ }
177
+
178
+ /*
179
+ * Add a new element to the universe. It starts out in its own singleton subset.
180
+ *
181
+ * Shared::DataError is raised if it is already an element.
182
+ */
183
+ static void add_new_element(disjoint_union_data *disjoint_union, size_t element) {
184
+ if (present_p(disjoint_union, element)) {
185
+ rb_raise(eSharedDataError, "Element %zu already present in the universe", element);
186
+ }
187
+
188
+ assignInDynamicArray(disjoint_union->forest, element, element);
189
+ assignInDynamicArray(disjoint_union->rank, element, 0);
190
+ disjoint_union->subset_count++;
191
+ }
192
+
193
+ /*
194
+ * Find the canonical representative of the given element. This is the root of the tree (in forest) containing element.
195
+ *
196
+ * Two elements are in the same subset exactly when their canonical representatives are equal.
197
+ */
198
+ static size_t find(disjoint_union_data *disjoint_union, size_t element) {
199
+ assert_membership(disjoint_union, element);
200
+
201
+ // We implement find with "halving" to shrink the length of paths to the root. See Tarjan and van Leeuwen p 252.
202
+ long *d = disjoint_union->forest->array; // the actual forest data
203
+ size_t x = element;
204
+ while (d[d[x]] != d[x]) {
205
+ x = d[x] = d[d[x]];
206
+ }
207
+ return d[x];
208
+ }
209
+
210
+ /*
211
+ * "Link" the two given elements so that they are in the same subset now.
212
+ *
213
+ * In other words, merge the subtrees containing the two elements.
214
+ *
215
+ * Good performance (see Tarjan and van Leeuwen) assumes that elt1 and elt2 are distinct and already the roots of their trees,
216
+ * though we don't check that here.
217
+ */
218
+ static void link_roots(disjoint_union_data *disjoint_union, size_t elt1, size_t elt2) {
219
+ long *rank = disjoint_union->rank->array;
220
+ long *forest = disjoint_union->forest->array;
221
+
222
+ if (rank[elt1] > rank[elt2]) {
223
+ forest[elt2] = elt1;
224
+ } else if (rank[elt1] == rank[elt2]) {
225
+ forest[elt2] = elt1;
226
+ rank[elt1]++;
227
+ } else {
228
+ forest[elt1] = elt2;
229
+ }
230
+
231
+ disjoint_union->subset_count--;
232
+ }
233
+
234
+ /*
235
+ * "Unite" or merge the subsets containing elt1 and elt2.
236
+ */
237
+ static void unite(disjoint_union_data *disjoint_union, size_t elt1, size_t elt2) {
238
+ assert_membership(disjoint_union, elt1);
239
+ assert_membership(disjoint_union, elt2);
240
+
241
+ if (elt1 == elt2) {
242
+ rb_raise(eSharedDataError, "Uniting an element with itself is meaningless");
243
+ }
244
+
245
+ size_t root1 = find(disjoint_union, elt1);
246
+ size_t root2 = find(disjoint_union, elt2);
247
+
248
+ if (root1 == root2) {
249
+ return; // already united
250
+ }
251
+
252
+ link_roots(disjoint_union, root1, root2);
253
+ }
254
+
255
+
256
+ /**
257
+ * Wrapping and unwrapping things for the Ruby runtime
258
+ *
259
+ */
260
+
261
+ // How much memory (roughly) does a disjoint_union_data instance consume? I guess the Ruby runtime can use this information when
262
+ // deciding how aggressive to be during garbage collection and such.
263
+ static size_t disjoint_union_memsize(const void *ptr) {
264
+ if (ptr) {
265
+ const disjoint_union_data *du = ptr;
266
+ return sizeof(disjoint_union_data) + _size_of(du->forest) + _size_of(du->rank);
267
+ } else {
268
+ return 0;
269
+ }
270
+ }
271
+
272
+ /*
273
+ * A configuration struct that tells the Ruby runtime how to deal with a disjoint_union_data object.
274
+ *
275
+ * https://docs.ruby-lang.org/en/master/extension_rdoc.html#label-Encapsulate+C+data+into+a+Ruby+object
276
+ */
277
+ static const rb_data_type_t disjoint_union_type = {
278
+ .wrap_struct_name = "disjoint_union",
279
+ { // help for the Ruby garbage collector
280
+ .dmark = NULL, // dmark, for marking other Ruby objects. We don't hold any other objects so this can be NULL
281
+ .dfree = disjoint_union_free, // how to free the memory associated with an object
282
+ .dsize = disjoint_union_memsize, // roughly how much space does the object consume?
283
+ },
284
+ .data = NULL, // a data field we could use for something here if we wanted. Ruby ignores it
285
+ .flags = 0 // GC-related flag values.
286
+ };
287
+
288
+ /*
289
+ * Helper: check that a Ruby value is a non-negative Fixnum and convert it to a C unsigned long
290
+ */
291
+ static unsigned long checked_nonneg_fixnum(VALUE val) {
292
+ Check_Type(val, T_FIXNUM);
293
+ long c_val = FIX2LONG(val);
294
+
295
+ if (c_val < 0) {
296
+ rb_raise(eSharedDataError, "Value must be non-negative");
297
+ }
298
+
299
+ return c_val;
300
+ }
301
+
302
+ /*
303
+ * Unwrap a Rubyfied disjoint union to get the C struct inside.
304
+ */
305
+ static disjoint_union_data *unwrapped(VALUE self) {
306
+ disjoint_union_data *disjoint_union;
307
+ TypedData_Get_Struct((self), disjoint_union_data, &disjoint_union_type, disjoint_union);
308
+ return disjoint_union;
309
+ }
310
+
311
+ /*
312
+ * This is for CDisjointUnion.allocate on the Ruby side
313
+ */
314
+ static VALUE disjoint_union_alloc(VALUE klass) {
315
+ // Get one on the heap
316
+ disjoint_union_data *disjoint_union = create_disjoint_union();
317
+ // Wrap it up into a Ruby object
318
+ return TypedData_Wrap_Struct(klass, &disjoint_union_type, disjoint_union);
319
+ }
320
+
321
+ /*
322
+ * A single parameter is optional. If given it should be a non-negative integer and specifies the initial size, s, of the universe
323
+ * 0, 1, ..., s-1.
324
+ *
325
+ * If no argument is given we act as though a value of 0 were passed.
326
+ */
327
+ static VALUE disjoint_union_init(int argc, VALUE *argv, VALUE self) {
328
+ if (argc == 0) {
329
+ return self;
330
+ } else if (argc > 1) {
331
+ rb_raise(rb_eArgError, "wrong number of arguments");
332
+ } else {
333
+ size_t initial_size = checked_nonneg_fixnum(argv[0]);
334
+ disjoint_union_data *disjoint_union = unwrapped(self);
335
+
336
+ for (size_t i = 0; i < initial_size; i++) {
337
+ add_new_element(disjoint_union, i);
338
+ }
339
+ }
340
+ return self;
341
+ }
342
+
343
+ /**
344
+ * And now the simple wrappers around the Disjoint Union C functionality. In each case we
345
+ * - unwrap a 'VALUE self',
346
+ * - i.e., the CDisjointUnion instance that contains a disjoint_union_data struct;
347
+ * - munge any other arguments into longs;
348
+ * - call the appropriate C function to act on the struct; and
349
+ * - return an appropriate VALUE that the Ruby runtime can use.
350
+ *
351
+ * We make them into methods on CDisjointUnion in the Init_c_disjoint_union function, below.
352
+ */
353
+
354
+ /*
355
+ * Add a new subset to the universe containing the element +new_v+.
356
+ *
357
+ * @param the new element, starting in its own singleton subset
358
+ * - it must be a non-negative integer, not already part of the universe of elements.
359
+ */
360
+ static VALUE disjoint_union_make_set(VALUE self, VALUE arg) {
361
+ add_new_element(unwrapped(self), checked_nonneg_fixnum(arg));
362
+
363
+ return Qnil;
364
+ }
365
+
366
+ /*
367
+ * @return the number of subsets into which the universe is currently partitioned.
368
+ */
369
+ static VALUE disjoint_union_subset_count(VALUE self) {
370
+ return LONG2NUM(unwrapped(self)->subset_count);
371
+ }
372
+
373
+ /*
374
+ * The canonical representative of the subset containing e. Two elements d and e are in the same subset exactly when find(d) ==
375
+ * find(e).
376
+ *
377
+ * The parameter must be in the universe of elements.
378
+ *
379
+ * @return (Integer) one of the universe of elements
380
+ */
381
+ static VALUE disjoint_union_find(VALUE self, VALUE arg) {
382
+ return LONG2NUM(find(unwrapped(self), checked_nonneg_fixnum(arg)));
383
+ }
384
+
385
+ /*
386
+ * Declare that the arguments are equivalent, i.e., in the same subset. If they are already in the same subset this is a no-op.
387
+ *
388
+ * Each argument must be in the universe of elements
389
+ */
390
+ static VALUE disjoint_union_unite(VALUE self, VALUE arg1, VALUE arg2) {
391
+ unite(unwrapped(self), checked_nonneg_fixnum(arg1), checked_nonneg_fixnum(arg2));
392
+
393
+ return Qnil;
394
+ }
395
+
396
+ /*
397
+ * A Disjoint Union.
398
+ *
399
+ * A "disjoint set union" represents a set of elements belonging to _disjoint_ subsets. Alternatively, this expresses a
400
+ * partition of a fixed set.
401
+ *
402
+ * The data structure provides efficient actions to merge two disjoint subsets, i.e., replace them by their union, and determine if
403
+ * two elements are in the same subset.
404
+ *
405
+ * The elements of the set are non-negative integers. Client code can map its data to these representatives.
406
+ *
407
+ * See https://en.wikipedia.org/wiki/Disjoint-set_data_structure for a good introduction.
408
+ *
409
+ * The code uses several ideas from Tarjan and van Leeuwen for efficiency. We use "union by rank" in +unite+ and path-halving in
410
+ * +find+. Together, these make the amortized cost of each operation effectively constant.
411
+ *
412
+ * - Tarjan, Robert E., van Leeuwen, Jan (1984). _Worst-case analysis of set union algorithms_. Journal of the ACM. 31 (2): 245–281.
413
+ */
414
+ void Init_c_disjoint_union() {
415
+ VALUE mDataStructuresRMolinari = rb_define_module("DataStructuresRMolinari");
416
+ VALUE cDisjointUnion = rb_define_class_under(mDataStructuresRMolinari, "CDisjointUnion", rb_cObject);
417
+
418
+ rb_define_alloc_func(cDisjointUnion, disjoint_union_alloc);
419
+ rb_define_method(cDisjointUnion, "initialize", disjoint_union_init, -1);
420
+ rb_define_method(cDisjointUnion, "make_set", disjoint_union_make_set, 1);
421
+ rb_define_method(cDisjointUnion, "subset_count", disjoint_union_subset_count, 0);
422
+ rb_define_method(cDisjointUnion, "find", disjoint_union_find, 1);
423
+ rb_define_method(cDisjointUnion, "unite", disjoint_union_unite, 2);
424
+ }
@@ -0,0 +1,12 @@
1
+ require 'mkmf'
2
+
3
+ abort 'missing malloc()' unless have_func "malloc"
4
+ abort 'missing realloc()' unless have_func "realloc"
5
+
6
+ if try_cflags('-O')
7
+ append_cflags('-O')
8
+ end
9
+
10
+ extension_name = "c_disjoint_union"
11
+ dir_config(extension_name)
12
+ create_makefile("data_structures_rmolinari/c_disjoint_union")