data_structures_rmolinari 0.4.1 → 0.4.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: eb25e49219167201208f45a402b202180466202bc071940da418b2f84d281f6d
4
- data.tar.gz: f43c1614c2a433d7a4e1148eb90121fc4b1d61c807afeb78c456d77b66935adb
3
+ metadata.gz: c912d4ddf3a7cfc721b7f298a966f7e0d4cbd4249797506457605a44774523a0
4
+ data.tar.gz: c168b7096178e496f76fa53f5b8566cd2ac26897fd5b362c3c37da5314f2a6db
5
5
  SHA512:
6
- metadata.gz: d2e77397f790e8fe8d650d7727be55b464aa8f19d928e215b824820b712df1f5762d74ed304196e1c773960f02d73cd29bc07486f0cc28eb5ddcd5dbd422d691
7
- data.tar.gz: 56328269625f88b5119b64696f792a2e567d6327605dc9a6cc391edd33e5e49f293adb304ef3b819094b860299f904b79862eaac2a8eaeff0d249350bce280db
6
+ metadata.gz: 0c88c1ad7c07fe6358e3eefd21406b4bbd33a89e89731edb3997bb027efca52a7d312ffee1435af960be73c5cd5212950a854a11c6f2105dfccc47ed4ae00c2b
7
+ data.tar.gz: 9bf6e4570017217b59f4f3a0b1d9e23d7752ba4e4b5dc11a988826726367956c6564763044f23b5105293213c4844667249a7f88726861f792264c1d634256ae
data/CHANGELOG.md CHANGED
@@ -1,6 +1,24 @@
1
1
  # Changelog
2
2
 
3
- ## [Unreleased]
3
+ ## [0.4.2] 2023-01-26
4
+
5
+ ### Added
6
+
7
+ - MinPrioritySearchTree added
8
+ - it's a thin layer on top of a MaxPrioritySearchTree with negated y values.
9
+
10
+ - MaxPrioritySearchTree
11
+ - A "dynamic" constructor option now allows deletion of the "top" (root) node. This is useful in certain algorithms.
12
+
13
+ - DisjointUnion
14
+ - Added a proof-of-concept implementation in C, which is about twice as fast.
15
+
16
+ - Algorithms
17
+ - Implement the Maximal Empty Rectangle algorithm of De et al. It uses a dynamic MaxPST.
18
+
19
+ ## [0.4.1] 2023-01-12
20
+
21
+ - Update this file for the gem (though I forgot to add this comment first!)
4
22
 
5
23
  ## [0.4.0] 2023-01-12
6
24
 
@@ -10,10 +28,10 @@
10
28
  - Duplicate y values are now allowed. Ties are broken with a preference for smaller values of x.
11
29
  - Method names have changed
12
30
  - Instead of "highest", "leftmost", "rightmost" we use "largest_y", "smallest_x", "largest_x"
13
- - For example, +highest_ne+ is now +largest_y_in_nw+
31
+ - For example, `highest_ne` is now `largest_y_in_nw`
14
32
  - DisjointUnion
15
33
  - the size argument to initializer is optional. The default value is 0.
16
- - elements can be added to the "universe" of known values with +make_set+
34
+ - elements can be added to the "universe" of known values with `make_set`
17
35
 
18
36
  ### Removed
19
37
  - MinmaxPrioritySearchTree is no longer available
data/README.md ADDED
@@ -0,0 +1,141 @@
1
+ # Data Structures
2
+
3
+ This is a small collection of Ruby data structures that I have implemented for my own interest. Implementing the code for a data
4
+ structure is almost always more educational than simply reading about it and is usually fun. I wrote some of them while
5
+ participating in the Advent of Code (https://adventofcode.com/).
6
+
7
+ These implementations are not particularly clever. They are based on the expository descriptions and pseudo-code I found as I read
8
+ about each structure and so are not as fast as possible.
9
+
10
+ The code is available as a gem: https://rubygems.org/gems/data_structures_rmolinari.
11
+
12
+ ## Usage
13
+
14
+ The right way to organize the code is not obvious to me. For now the data structures are all defined in the module
15
+ `DataStructuresRMolinari` to avoid polluting the global namespace.
16
+
17
+ Example usage after the gem is installed:
18
+ ```
19
+ require 'data_structures_rmolinari'
20
+
21
+ # Pull what we need out of the namespace
22
+ MaxPrioritySearchTree = DataStructuresRMolinari::MaxPrioritySearchTree
23
+ Point = DataStructuresRMolinari::Point # anything responding to :x and :y is fine
24
+
25
+ pst = MaxPrioritySearchTree.new([Point.new(1, 1)])
26
+ puts pst.largest_y_in_ne(0, 0) # "Point(1,1)"
27
+ ```
28
+
29
+ # Implementations
30
+
31
+ ## Disjoint Union
32
+
33
+ We represent a set S of non-negative integers as the disjoint union of subsets. Equivalently, we represent a partition of S. The
34
+ data structure provides very efficient implementations of the two key operations
35
+ - `unite(e, f)`, which merges the subsets containing e and f; and
36
+ - `find(e)`, which returns the canonical representative of the subset containing e. Two elements e and f are in the same subset
37
+ exactly when `find(e) == find(f)`.
38
+
39
+ It also provides
40
+ - `make_set(v)`, which adds a new value `v` to the set S, starting out in a singleton subset.
41
+
42
+ For more details see https://en.wikipedia.org/wiki/Disjoint-set_data_structure and the paper [[TvL1984]](#references) by Tarjan and
43
+ van Leeuwen.
44
+
45
+ There is an experimental implementation as a C extension in c_disjoint_union.c. The Ruby class for this implementation is
46
+ `CDisjointUnion`. Benchmarks indicate that a long sequence of `unite` calls is about twice as fast as with the pure-Ruby `DisjointUnion`.
47
+
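The two operations above can be sketched in a few lines of plain Ruby. This is a minimal illustration, not the gem's `DisjointUnion` class: the class name `SketchDisjointUnion` and the use of `ArgumentError` are assumptions made for the example. It uses union by rank in `unite` and path halving in `find`, the two ideas from Tarjan and van Leeuwen that the C extension's comments mention.

```ruby
# Hypothetical illustration only; not the gem's DisjointUnion class.
class SketchDisjointUnion
  attr_reader :subset_count

  def initialize(size = 0)
    @parent = []       # @parent[e] is the parent of e in its tree; roots point at themselves
    @rank = []         # rough upper bound on the height of the tree rooted at e
    @subset_count = 0
    size.times { |e| make_set(e) }
  end

  # Add a new element in its own singleton subset.
  def make_set(e)
    raise ArgumentError, 'element must be a non-negative integer' unless e.is_a?(Integer) && e >= 0
    raise ArgumentError, "#{e} is already present" if @parent[e]

    @parent[e] = e
    @rank[e] = 0
    @subset_count += 1
  end

  # Canonical representative of the subset containing e.
  # Path halving: each visited node is re-pointed at its grandparent.
  def find(e)
    raise ArgumentError, "#{e} is not in the universe" unless e.is_a?(Integer) && e >= 0 && @parent[e]

    x = e
    while @parent[@parent[x]] != @parent[x]
      @parent[x] = @parent[@parent[x]]
      x = @parent[x]
    end
    @parent[x]
  end

  # Merge the subsets containing e and f. The root of smaller rank is
  # attached below the root of larger rank.
  def unite(e, f)
    root_e = find(e)
    root_f = find(f)
    return if root_e == root_f # already in the same subset

    if @rank[root_e] < @rank[root_f]
      @parent[root_e] = root_f
    else
      @parent[root_f] = root_e
      @rank[root_e] += 1 if @rank[root_e] == @rank[root_f]
    end
    @subset_count -= 1
  end
end

du = SketchDisjointUnion.new(5)
du.unite(0, 1)
du.unite(3, 4)
puts du.find(0) == du.find(1) # true
puts du.subset_count          # 3
```

Unlike the C extension, this sketch silently accepts `unite(e, e)` as a no-op; that difference is a choice made for brevity here.
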
48
+ ## Heap
49
+
50
+ This is a standard binary heap with an `update` method, suitable for use as a priority queue. There are several supported
51
+ operations:
52
+
53
+ - `insert(item, priority)`, insert the given item with the stated priority.
54
+ - By default, items must be distinct.
55
+ - `top`, return the element with smallest priority without removing it
56
+ - `pop`, return the element with smallest priority and remove it from the structure
57
+ - `update(item, priority)`, update the priority of the given item, which must already be in the heap
58
+
59
+ `top` is O(1). The others are O(log n) where n is the number of items in the heap.
60
+
61
+ By default we have a min-heap: the top element is the one with smallest priority. A configuration parameter at construction can make
62
+ it a max-heap.
63
+
64
+ Another configuration parameter allows the creation of a "non-addressable" heap. This makes it impossible to call `update`, but
65
+ allows the insertion of duplicate items (which is sometimes useful) and slightly faster operation overall.
66
+
67
+ See https://en.wikipedia.org/wiki/Binary_heap and the paper by Edelkamp, Elmasry, and Katajainen [[EEK2017]](#references).
68
+
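To make the role of `update` concrete, here is a minimal sketch of an addressable binary min-heap in plain Ruby. It is not the gem's `Heap` class: the name `SketchMinHeap`, the internal hash `@index_of` and the error types are assumptions for illustration. The hash from item to array position is what makes `update` possible, and it is also why items must be distinct.

```ruby
# Hypothetical illustration only; not the gem's Heap class.
class SketchMinHeap
  def initialize
    @items = []      # [priority, item] pairs stored in heap order
    @index_of = {}   # item => its current position in @items
  end

  def empty?
    @items.empty?
  end

  # Insert an item with the given priority. O(log n).
  def insert(item, priority)
    raise ArgumentError, 'items must be distinct' if @index_of.key?(item)

    @items << [priority, item]
    @index_of[item] = @items.size - 1
    sift_up(@items.size - 1)
  end

  # The item with the smallest priority, or nil if the heap is empty. O(1).
  def top
    @items.first&.last
  end

  # Remove and return the item with the smallest priority. O(log n).
  def pop
    return nil if empty?

    swap(0, @items.size - 1)
    _priority, item = @items.pop
    @index_of.delete(item)
    sift_down(0) unless empty?
    item
  end

  # Change the priority of an item that is already in the heap. O(log n).
  def update(item, priority)
    i = @index_of.fetch(item) { raise ArgumentError, 'unknown item' }
    @items[i][0] = priority
    sift_up(i)
    sift_down(@index_of[item])
  end

  private

  def swap(i, j)
    @items[i], @items[j] = @items[j], @items[i]
    @index_of[@items[i][1]] = i
    @index_of[@items[j][1]] = j
  end

  def sift_up(i)
    while i.positive?
      parent = (i - 1) / 2
      break if @items[parent][0] <= @items[i][0]

      swap(i, parent)
      i = parent
    end
  end

  def sift_down(i)
    loop do
      smallest = i
      [2 * i + 1, 2 * i + 2].each do |child|
        smallest = child if child < @items.size && @items[child][0] < @items[smallest][0]
      end
      break if smallest == i

      swap(i, smallest)
      i = smallest
    end
  end
end
```

A "non-addressable" variant would simply drop `@index_of` and `update`, which removes the distinctness requirement and the bookkeeping in `swap`.
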
69
+ ## Priority Search Tree
70
+
71
+ A PST stores a set P of two-dimensional points in a way that allows certain queries about P to be answered efficiently. The data
72
+ structure was introduced by McCreight [[McC1985]](#references). De, Maheshwari, Nandy, and Smid [[DMNS2011]](#references) showed
73
+ how to build the structure in-place and we use their approach here. The supported queries are:
74
+
75
+ - `largest_y_in_ne(x0, y0)` and `largest_y_in_nw(x0, y0)`, the "highest" (max-y) point in the quadrant to the northeast/northwest of
76
+ (x0, y0);
77
+ - `smallest_x_in_ne(x0, y0)`, the "leftmost" (min-x) point in the quadrant to the northeast of (x0, y0);
78
+ - `largest_x_in_nw(x0, y0)`, the "rightmost" (max-x) point in the quadrant to the northwest of (x0, y0);
79
+ - `largest_y_in_3_sided(x0, x1, y0)`, the highest point in the region specified by x0 <= x <= x1 and y0 <= y; and
80
+ - `enumerate_3_sided(x0, x1, y0)`, enumerate all the points in that region.
81
+
82
+ Here compass directions are the natural ones in the x-y plane with the positive x-axis pointing east and the positive y-axis
83
+ pointing north.
84
+
85
+ There is no `smallest_x_in_3_sided(x0, x1, y0)`. Just use `smallest_x_in_ne(x0, y0)` and check that the x-coordinate of the result is at most x1.
86
+
87
+ The single-point queries run in O(log n) time, where n is the size of P, while `enumerate_3_sided` runs in O(m + log n), where m is
88
+ the number of points actually enumerated.
89
+
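The meaning of the queries is easiest to see from a brute-force reference. The sketch below answers some of them in O(n) time by scanning an array, which is exactly the cost the PST avoids. Everything here is hypothetical scaffolding: the `Pt` struct and the `BruteForceQueries` module are not part of the gem, and treating the quadrants as closed (boundary points included) is an assumption carried over from the stated definition of the 3-sided region.

```ruby
# Hypothetical brute-force reference; each query is O(n), unlike the PST.
Pt = Struct.new(:x, :y)

module BruteForceQueries
  module_function

  # Highest point in the quadrant x >= x0, y >= y0 (assumed closed).
  def largest_y_in_ne(points, x0, y0)
    points.select { |p| p.x >= x0 && p.y >= y0 }.max_by(&:y)
  end

  # Leftmost point in the quadrant x >= x0, y >= y0 (assumed closed).
  def smallest_x_in_ne(points, x0, y0)
    points.select { |p| p.x >= x0 && p.y >= y0 }.min_by(&:x)
  end

  # All points with x0 <= x <= x1 and y0 <= y.
  def enumerate_3_sided(points, x0, x1, y0)
    points.select { |p| p.x.between?(x0, x1) && p.y >= y0 }
  end

  # Highest point in the 3-sided region.
  def largest_y_in_3_sided(points, x0, x1, y0)
    enumerate_3_sided(points, x0, x1, y0).max_by(&:y)
  end

  # The leftmost point of the 3-sided region is the leftmost point of the
  # NE quadrant, provided that point does not lie to the right of x1.
  def smallest_x_in_3_sided(points, x0, x1, y0)
    hit = smallest_x_in_ne(points, x0, y0)
    hit if hit && hit.x <= x1
  end
end

pts = [Pt.new(1, 5), Pt.new(3, 2), Pt.new(4, 7), Pt.new(6, 4)]
p BruteForceQueries.largest_y_in_ne(pts, 2, 3)
p BruteForceQueries.enumerate_3_sided(pts, 3, 6, 3)
```

The sample points have distinct coordinates on purpose, so the sketch does not need to take a position on how ties are broken.
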
90
+ The implementation is in `MaxPrioritySearchTree` (MaxPST for short), so called because internally the structure is, among other
91
+ things, a max-heap on the y-coordinates.
92
+
93
+ These queries appear rather abstract at first but there are interesting applications. See, for example, section 4 of
94
+ [[McC1985]](#references), keeping in mind that the data structure in that paper is actually a _MinPST_.
95
+
96
+ We also provide a `MinPrioritySearchTree`, which answers analogous queries in the southward-infinite quadrants and 3-sided
97
+ regions.
98
+
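The changelog describes the MinPST as a thin layer over a MaxPST with negated y values. The reflection trick can be sketched as follows, with a brute-force scan standing in for the real MaxPST query. The names `XYPoint`, `max_y_in_ne` and `min_y_in_se` are hypothetical and chosen only for this example; the gem's actual method names for the southward queries are not shown in this document.

```ruby
# Hypothetical illustration of answering a "min" query with a "max" structure.
XYPoint = Struct.new(:x, :y)

# Stand-in for a MaxPST query: highest point with x >= x0 and y >= y0.
def max_y_in_ne(points, x0, y0)
  points.select { |p| p.x >= x0 && p.y >= y0 }.max_by(&:y)
end

# Lowest point with x >= x0 and y <= y0. Reflecting every point in the
# x-axis turns "lowest below y0" into "highest above -y0", so the max
# query answers it; the result is reflected back before it is returned.
def min_y_in_se(points, x0, y0)
  flipped = points.map { |p| XYPoint.new(p.x, -p.y) }
  hit = max_y_in_ne(flipped, x0, -y0)
  hit && XYPoint.new(hit.x, -hit.y)
end

sample = [XYPoint.new(1, 5), XYPoint.new(3, 2), XYPoint.new(4, 7), XYPoint.new(6, 4)]
p min_y_in_se(sample, 2, 5)
```

A real wrapper would negate the points once at construction time rather than on every query; the per-query `map` here only keeps the sketch short.
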
99
+ By default these data structures are immutable: once constructed they cannot be changed. But there is a constructor option that
100
+ makes the instance "dynamic". This allows us to delete the element at the root of the tree - the one with largest y value (smallest
101
+ for MinPST) - with the `delete_top!` method. This operation is important in certain algorithms, such as enumerating all maximal
102
+ empty rectangles (see the second paper by De et al. [[DMNS2013]](#references)). Note that points still cannot be added to the PST in
103
+ any case, and choosing the dynamic option makes certain internal bookkeeping operations slower.
104
+
105
+ In [[DMNS2013]](#references) De et al. generalize the in-place structure to a _Min-max Priority Search Tree_ (MinmaxPST) that can
106
+ answer queries in all four quadrants and both "kinds" of 3-sided boxes. Having one of these would save the trouble of constructing
107
+ both a MaxPST and MinPST. But the presentation is hard to follow in places and the paper's pseudocode is buggy.[^minmaxpst]
108
+
109
+ ## Segment Tree
110
+
111
+ Segment trees store information related to subintervals of a certain array. For example, they can be used to find the sum of the
112
+ elements in an arbitrary subinterval A[i..j] of an array A[0..n] in O(log n) time. Each node in the tree corresponds to a subarray
113
+ of A in such a way that the values we store in the nodes can be combined efficiently to determine the desired result for arbitrary
114
+ subarrays.
115
+
116
+ An excellent description of the idea is found at https://cp-algorithms.com/data_structures/segment_tree.html.
117
+
118
+ Generic code is provided in `SegmentTreeTemplate`. Concrete classes are written by providing a handful of simple lambdas and
119
+ constants to the template class's initializer. Figuring out the details requires some knowledge of the internal mechanisms of a
120
+ segment tree, for which the link at cp-algorithms.com is very helpful. See the definitions of the concrete classes,
121
+ `MaxValSegmentTree` and `IndexOfMaxValSegmentTree`, for examples.
122
+
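As a rough picture of what "a handful of simple lambdas and constants" might look like, here is a small self-contained segment tree in plain Ruby. It is a sketch under assumptions, not the gem's `SegmentTreeTemplate`: the class name `SketchSegmentTree` and the keyword arguments `combine:` and `identity:` are invented for this example, and the real template's parameters may differ.

```ruby
# Hypothetical illustration only; not the gem's SegmentTreeTemplate.
class SketchSegmentTree
  # combine:  associative lambda merging the values of two adjacent subintervals
  # identity: value that is neutral for combine (used for non-overlapping nodes)
  def initialize(data, combine:, identity:)
    @size = data.size
    @combine = combine
    @identity = identity
    @tree = Array.new(4 * [@size, 1].max, identity)
    build(data, 1, 0, @size - 1) unless data.empty?
  end

  # Combined value over data[i..j], inclusive, in O(log n).
  def query(i, j)
    raise ArgumentError, 'bad interval' unless i >= 0 && j < @size && i <= j

    query_node(1, 0, @size - 1, i, j)
  end

  # Set data[idx] = value and repair the nodes above it, in O(log n).
  def update(idx, value)
    raise ArgumentError, 'index out of range' unless idx >= 0 && idx < @size

    update_node(1, 0, @size - 1, idx, value)
  end

  private

  # Node `node` covers data[lo..hi]; its children are 2*node and 2*node + 1.
  def build(data, node, lo, hi)
    if lo == hi
      @tree[node] = data[lo]
      return
    end
    mid = (lo + hi) / 2
    build(data, 2 * node, lo, mid)
    build(data, 2 * node + 1, mid + 1, hi)
    @tree[node] = @combine.call(@tree[2 * node], @tree[2 * node + 1])
  end

  def query_node(node, lo, hi, i, j)
    return @identity if j < lo || hi < i       # no overlap
    return @tree[node] if i <= lo && hi <= j   # node fully inside the query

    mid = (lo + hi) / 2
    @combine.call(query_node(2 * node, lo, mid, i, j),
                  query_node(2 * node + 1, mid + 1, hi, i, j))
  end

  def update_node(node, lo, hi, idx, value)
    if lo == hi
      @tree[node] = value
      return
    end
    mid = (lo + hi) / 2
    if idx <= mid
      update_node(2 * node, lo, mid, idx, value)
    else
      update_node(2 * node + 1, mid + 1, hi, idx, value)
    end
    @tree[node] = @combine.call(@tree[2 * node], @tree[2 * node + 1])
  end
end

data = [3, 1, 4, 1, 5, 9, 2, 6]
max_tree = SketchSegmentTree.new(data, combine: ->(a, b) { [a, b].max }, identity: -Float::INFINITY)
sum_tree = SketchSegmentTree.new(data, combine: ->(a, b) { a + b }, identity: 0)
puts max_tree.query(0, 3) # 4
puts sum_tree.query(1, 3) # 6
```

Swapping the lambda and the identity is all it takes to move from a max tree to a sum tree, which is the sense in which the generic code acts as a template.
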
123
+ ## Algorithms
124
+
125
+ The Algorithms submodule contains some algorithms using the data structures.
126
+
127
+ - `maximal_empty_rectangles(points)`
128
+ - We are given a set P contained in a minimal box B = [x_min, x_max] x [y_min, y_max]. An _empty rectangle_ is an axis-parallel
129
+ rectangle with positive area contained in B containing no element of P in its interior. A _maximal empty rectangle_ is an empty
130
+ rectangle not properly contained in any other empty rectangle. This method yields each maximal empty rectangle in the form
131
+ [left, right, bottom, top].
132
+ - The algorithm is due to [[DMNS2013]](#references).
133
+
134
+ # References
135
+ - [TvL1984] Tarjan, Robert E., van Leeuwen, J., _Worst-case Analysis of Set Union Algorithms_, Journal of the ACM, v31:2 (1984), pp 245–281.
136
+ - [EEK2017] Edelkamp, S., Elmasry, A., Katajainen, J., _Optimizing Binary Heaps_, Theory Comput Syst (2017), vol 61, pp 606-636, DOI 10.1007/s00224-017-9760-2.
137
+ - [McC1985] McCreight, E. M., _Priority Search Trees_, SIAM J. Comput., 14(2):257-276, 1985.
138
+ - [DMNS2011] De, M., Maheshwari, A., Nandy, S. C., Smid, M., _An In-Place Priority Search Tree_, 23rd Canadian Conference on Computational Geometry, 2011.
139
+ - [DMNS2013] De, M., Maheshwari, A., Nandy, S. C., Smid, M., _An In-Place Min-max Priority Search Tree_, Computational Geometry, v46 (2013), pp 310-327.
140
+
141
+ [^minmaxpst]: See the comments in the fragmentary class `MinMaxPrioritySearchTree` for further details.
data/Rakefile ADDED
@@ -0,0 +1,16 @@
1
+ require 'rubygems'
2
+ require 'rake/testtask'
3
+ require 'rake/extensiontask'
4
+
5
+ Rake::ExtensionTask.new('data_structures_rmolinari/c_disjoint_union') do |ext|
6
+ ext.name = 'CDisjointUnion'
7
+ ext.ext_dir = 'ext/c_disjoint_union'
8
+ ext.lib_dir = 'lib/data_structures_rmolinari/'
9
+ end
10
+
11
+ Rake::TestTask.new do |t|
12
+ t.libs << 'test'
13
+ end
14
+
15
+ desc 'Run Tests'
16
+ task default: :test
@@ -0,0 +1,412 @@
1
+ /*
2
+ * This is a C implementation of a simple Ruby Disjoint Union data structure.
3
+ *
4
+ * A Disjoint Union doesn't have much of an implementation in Ruby: see disjoint_union.rb in this gem. This means that we don't gain
5
+ * much by implementing it in C but that it serves as a good learning experience for me.
6
+ *
7
+ * It turns out that writing a C extension for Ruby like this isn't very complicated, but there are a bunch of moving parts and the
8
+ * available documentation is a bit of a slog. Writing this was very educational.
9
+ *
10
+ * See https://docs.ruby-lang.org/en/master/extension_rdoc.html for some documentation. It's a bit hard to read in places, but
11
+ * plugging away at things helps.
12
+ *
13
+ * https://guides.rubygems.org/gems-with-extensions/ is a decent tutorial, though it leaves out lots of details.
14
+ *
15
+ * See https://aaronbedra.com/extending-ruby/ for another tutorial.
16
+ */
17
+
18
+ #include "ruby.h"
19
+
20
+ // The Shared::DataError exception type in the Ruby code. We only need it when we detect a runtime error, so a macro is simplest and
21
+ // just fine.
22
+ #define mShared rb_define_module("Shared")
23
+ #define eDataError rb_const_get(mShared, rb_intern_const("DataError"))
24
+
25
+ /**
26
+ * It's been so long since I've written non-trivial C that I need to copy examples from online.
27
+ *
28
+ * Dynamic array of longs, with an initial value for otherwise uninitialized elements.
29
+ * Based on https://stackoverflow.com/questions/3536153/c-dynamically-growing-array
30
+ */
31
+ typedef struct {
32
+ long *array;
33
+ size_t size;
34
+ long default_val;
35
+ } DynamicArray;
36
+
37
+ void initDynamicArray(DynamicArray *a, size_t initial_size, long default_val) {
38
+ a->array = malloc(initial_size * sizeof(long));
39
+ a->size = initial_size;
40
+ a->default_val = default_val;
41
+
42
+ for (size_t i = 0; i < initial_size; i++) {
43
+ a->array[i] = default_val;
44
+ }
45
+ }
46
+
47
+ void insertDynamicArray(DynamicArray *a, unsigned long index, long element) {
48
+ if (a->size <= index) {
49
+ size_t new_size = a->size;
50
+ while (new_size <= index) {
51
+ new_size = 8 * new_size / 5 + 8; // 8/5 gives "Fibonacci-like" growth; adding 8 to avoid small arrays having to reallocate
52
+ // too often. Who knows if it's worth being "clever".
53
+ }
54
+
55
+ long* new_array = realloc(a->array, new_size * sizeof(long));
56
+ if (!new_array) {
57
+ rb_raise(rb_eRuntimeError, "Cannot allocate memory to expand DynamicArray!");
58
+ }
59
+
60
+ a->array = new_array;
61
+ for (size_t i = a->size; i < new_size; i++) {
62
+ a->array[i] = a->default_val;
63
+ }
64
+
65
+ a->size = new_size;
66
+ }
67
+
68
+ a->array[index] = element;
69
+ }
70
+
71
+ void freeDynamicArray(DynamicArray *a) {
72
+ free(a->array);
73
+ a->array = NULL;
74
+ a->size = 0;
75
+ }
76
+
77
+ /**
78
+ * The C implementation of a Disjoint Union
79
+ *
80
+ * See Tarjan, Robert E., van Leeuwen, J., _Worst-case Analysis of Set Union Algorithms_, Journal of the ACM, v31:2 (1984), pp 245–281.
81
+ */
82
+
83
+ /*
84
+ * The Disjoint Union struct.
85
+ * - forest: an array of longs giving, for each element, the parent element of its tree.
86
+ * - An element e is the root of its tree just when forest[e] == e.
87
+ * - Two elements are in the same subset just when they are in the same tree in the forest.
88
+ * - So the key idea is that we can check this by navigating via parents from each element to their roots. Clever optimizations
89
+ * keep the trees flat and so most nodes are close to their roots.
90
+ * - rank: an array of longs giving the "rank" of each element.
91
+ * - This value is used to guide the "linking" of trees when subsets are being merged to keep the trees flat. See Tarjan & van
92
+ * Leeuwen
93
+ * - subset_count: the number of (disjoint) subsets.
94
+ * - it isn't needed internally but may be useful to client code.
95
+ */
96
+ typedef struct du_data {
97
+ DynamicArray* forest; // the forest that describes the unified subsets
98
+ DynamicArray* rank; // the "ranks" of the elements, used when uniting subsets
99
+ size_t subset_count;
100
+ } disjoint_union_data;
101
+
102
+ /*
103
+ * Create one.
104
+ *
105
+ * The dynamic arrays are initialized with a size of 100 because I didn't have a better idea. This will end up getting called from
106
+ * the Ruby #allocate method, which happens before #initialize. Thus we don't know the calling code's desired initial size.
107
+ */
108
+ #define INITIAL_SIZE 100
109
+ static disjoint_union_data* create_disjoint_union() {
110
+ disjoint_union_data* disjoint_union = malloc(sizeof(disjoint_union_data));
111
+
112
+ // Allocate the structures
113
+ DynamicArray* forest = malloc(sizeof(DynamicArray));
114
+ DynamicArray* rank = malloc(sizeof(DynamicArray));
115
+ initDynamicArray(forest, INITIAL_SIZE, -1);
116
+ initDynamicArray(rank, INITIAL_SIZE, 0);
117
+
118
+ disjoint_union->forest = forest;
119
+ disjoint_union->rank = rank;
120
+ disjoint_union->subset_count = 0;
121
+
122
+ return disjoint_union;
123
+ }
124
+
125
+ /*
126
+ * Free the memory associated with a disjoint union. This will end up getting triggered by the Ruby garbage collector.
127
+ */
128
+ static void disjoint_union_free(void *ptr) {
129
+ if (ptr) {
130
+ disjoint_union_data *disjoint_union = ptr;
131
+ freeDynamicArray(disjoint_union->forest);
132
+ freeDynamicArray(disjoint_union->rank);
133
+
134
+ free(disjoint_union->forest);
135
+ disjoint_union->forest = NULL;
136
+
137
+ free(disjoint_union->rank);
138
+ disjoint_union->rank = NULL;
139
+
140
+ free(disjoint_union);
141
+ }
142
+ }
143
+
144
+ /************************************************************
145
+ * The disjoint union operations
146
+ ************************************************************/
147
+
148
+ /*
149
+ * Is the given element already a member of the universe?
150
+ */
151
+ static int present_p(disjoint_union_data* disjoint_union, size_t element) {
152
+ DynamicArray* forest = disjoint_union->forest;
153
+ return (forest->size > element && (forest->array[element] != forest->default_val));
154
+ }
155
+
156
+ /*
157
+ * Check that the given element is a member of the universe and raise Shared::DataError (ruby-side) if not
158
+ */
159
+ static void assert_membership(disjoint_union_data* disjoint_union, size_t element) {
160
+ if (!present_p(disjoint_union, element)) {
161
+ rb_raise(eDataError, "Value %zu is not part of the universe", element);
162
+ }
163
+ }
164
+
165
+ /*
166
+ * Add a new element to the universe. It starts out in its own singleton subset.
167
+ *
168
+ * Shared::DataError is raised if it is already an element.
169
+ */
170
+ static void add_new_element(disjoint_union_data* disjoint_union, size_t element) {
171
+ if (present_p(disjoint_union, element)) {
172
+ rb_raise(eDataError, "Element %zu already present in the universe", element);
173
+ }
174
+
175
+ insertDynamicArray(disjoint_union->forest, element, element);
176
+ insertDynamicArray(disjoint_union->rank, element, 0);
177
+ disjoint_union->subset_count++;
178
+ }
179
+
180
+ /*
181
+ * Find the canonical representative of the given element. This is the root of the tree (in forest) containing element.
182
+ *
183
+ * Two elements are in the same subset exactly when their canonical representatives are equal.
184
+ */
185
+ static size_t find(disjoint_union_data* disjoint_union, size_t element) {
186
+ assert_membership(disjoint_union, element);
187
+
188
+ // We implement find with "halving" to shrink the length of paths to the root. See Tarjan and van Leeuwen p 252.
189
+ long* d = disjoint_union->forest->array; // the actual forest data
190
+ size_t x = element;
191
+ while (d[d[x]] != d[x]) {
192
+ x = d[x] = d[d[x]];
193
+ }
194
+ return d[x];
195
+ }
196
+
197
+ /*
198
+ * "Link" the two given elements so that they are in the same subset now.
199
+ *
200
+ * In other words, merge the subtrees containing the two elements.
201
+ *
202
+ * Good performance (see Tarjan and van Leeuwen) assumes that elt1 and elt2 are distinct and already the roots of their trees,
203
+ * though we don't check that here.
204
+ */
205
+ static void link_roots(disjoint_union_data* disjoint_union, size_t elt1, size_t elt2) {
206
+ long* rank = disjoint_union->rank->array;
207
+ long* forest = disjoint_union->forest->array;
208
+
209
+ if (rank[elt1] > rank[elt2]) {
210
+ forest[elt2] = elt1;
211
+ } else if (rank[elt1] == rank[elt2]) {
212
+ forest[elt2] = elt1;
213
+ rank[elt1]++;
214
+ } else {
215
+ forest[elt1] = elt2;
216
+ }
217
+
218
+ disjoint_union->subset_count--;
219
+ }
220
+
221
+ /*
222
+ * "Unite" or merge the subsets containing elt1 and elt2.
223
+ */
224
+ static void unite(disjoint_union_data* disjoint_union, size_t elt1, size_t elt2) {
225
+ assert_membership(disjoint_union, elt1);
226
+ assert_membership(disjoint_union, elt2);
227
+
228
+ if (elt1 == elt2) {
229
+ rb_raise(eDataError, "Uniting an element with itself is meaningless");
230
+ }
231
+
232
+ size_t root1 = find(disjoint_union, elt1);
233
+ size_t root2 = find(disjoint_union, elt2);
234
+
235
+ if (root1 == root2) {
236
+ return; // already united
237
+ }
238
+
239
+ link_roots(disjoint_union, root1, root2);
240
+ }
241
+
242
+
243
+ /**
244
+ * Wrapping and unwrapping things for the Ruby runtime
245
+ *
246
+ */
247
+
248
+ // How much memory (roughly) does a disjoint_union_data instance consume? I guess the Ruby runtime can use this information when
249
+ // deciding how aggressive to be during garbage collection and such.
250
+ static size_t disjoint_union_memsize(const void *ptr) {
251
+ if (ptr) {
252
+ const disjoint_union_data *disjoint_union = ptr;
253
+ return (2 * disjoint_union->forest->size * sizeof(long)); // disjoint_union->rank is the same size
254
+ } else {
255
+ return 0;
256
+ }
257
+ }
258
+
259
+ /*
260
+ * A configuration struct that tells the Ruby runtime how to deal with a disjoint_union_data object.
261
+ *
262
+ * https://docs.ruby-lang.org/en/master/extension_rdoc.html#label-Encapsulate+C+data+into+a+Ruby+object
263
+ */
264
+ static const rb_data_type_t disjoint_union_type = {
265
+ .wrap_struct_name = "disjoint_union",
266
+ { // help for the Ruby garbage collector
267
+ .dmark = NULL, // dmark, for marking other Ruby objects. We don't hold any other objects so this can be NULL
268
+ .dfree = disjoint_union_free, // how to free the memory associated with an object
269
+ .dsize = disjoint_union_memsize, // roughly how much space does the object consume?
270
+ },
271
+ .data = NULL, // a data field we could use for something here if we wanted. Ruby ignores it
272
+ .flags = 0 // GC-related flag values.
273
+ };
274
+
275
+ /*
276
+ * Helper: check that a Ruby value is a non-negative Fixnum and convert it to a nice C long
277
+ *
278
+ * TODO: can we return a size_t or unsigned long instead?
279
+ */
280
+ static long checked_nonneg_fixnum(VALUE val) {
281
+ Check_Type(val, T_FIXNUM);
282
+ long c_val = FIX2LONG(val);
283
+
284
+ if (c_val < 0) {
285
+ rb_raise(eDataError, "Value must be non-negative");
286
+ }
287
+
288
+ return c_val;
289
+ }
290
+
291
+ /*
292
+ * Unwrap a Rubyfied disjoint union to get the C struct inside.
293
+ */
294
+ static disjoint_union_data* unwrapped(VALUE self) {
295
+ disjoint_union_data* disjoint_union;
296
+ TypedData_Get_Struct((self), disjoint_union_data, &disjoint_union_type, disjoint_union);
297
+ return disjoint_union;
298
+ }
299
+
300
+ /*
301
+ * This is for CDisjointUnion.allocate on the Ruby side
302
+ */
303
+ static VALUE disjoint_union_alloc(VALUE klass) {
304
+ disjoint_union_data* disjoint_union = create_disjoint_union();
305
+ return TypedData_Wrap_Struct(klass, &disjoint_union_type, disjoint_union);
306
+ }
307
+
308
+ /*
309
+ * A single parameter is optional. If given it should be a non-negative integer and specifies the initial size, s, of the universe
310
+ * 0, 1, ..., s-1.
311
+ *
312
+ * If no argument is given we act as though a value of 0 were passed.
313
+ */
314
+ static VALUE disjoint_union_init(int argc, VALUE *argv, VALUE self) {
315
+ if (argc == 0) {
316
+ return self;
317
+ } else if (argc > 1) {
318
+ rb_raise(rb_eArgError, "wrong number of arguments");
319
+ } else {
320
+ size_t initial_size = checked_nonneg_fixnum(argv[0]);
321
+ disjoint_union_data* disjoint_union = unwrapped(self);
322
+
323
+ for (size_t i = 0; i < initial_size; i++) {
324
+ add_new_element(disjoint_union, i);
325
+ }
326
+ }
327
+ return self;
328
+ }
329
+
330
+ /**
331
+ * And now the simple wrappers around the Disjoint Union C functionality. In each case we
332
+ * - unwrap a 'VALUE self',
333
+ * - i.e., the CDisjointUnion instance that contains a disjoint_union_data struct;
334
+ * - munge any other arguments into longs;
335
+ * - call the appropriate C function to act on the struct; and
336
+ * - return an appropriate VALUE that the Ruby runtime can use.
337
+ *
338
+ * We make them into methods on CDisjointUnion in the Init_c_disjoint_union function, below.
339
+ */
340
+
341
+ /*
342
+ * Add a new subset to the universe containing the element +new_v+.
343
+ *
344
+ * @param the new element, starting in its own singleton subset
345
+ * - it must be a non-negative integer, not already part of the universe of elements.
346
+ */
347
+ static VALUE disjoint_union_make_set(VALUE self, VALUE arg) {
348
+ add_new_element(unwrapped(self), checked_nonneg_fixnum(arg));
349
+
350
+ return Qnil;
351
+ }
352
+
353
+ /*
354
+ * @return the number of subsets into which the universe is currently partitioned.
355
+ */
356
+ static VALUE disjoint_union_subset_count(VALUE self) {
357
+ return LONG2NUM(unwrapped(self)->subset_count);
358
+ }
359
+
360
+ /*
361
+ * The canonical representative of the subset containing e. Two elements d and e are in the same subset exactly when find(d) ==
362
+ * find(e).
363
+ *
364
+ * The parameter must be in the universe of elements.
365
+ *
366
+ * @return (Integer) one of the universe of elements
367
+ */
368
+ static VALUE disjoint_union_find(VALUE self, VALUE arg) {
369
+ return LONG2NUM(find(unwrapped(self), checked_nonneg_fixnum(arg)));
370
+ }
371
+
372
+ /*
373
+ * Declare that the arguments are equivalent, i.e., in the same subset. If they are already in the same subset this is a no-op.
374
+ *
375
+ * Each argument must be in the universe of elements
376
+ */
377
+ static VALUE disjoint_union_unite(VALUE self, VALUE arg1, VALUE arg2) {
378
+ unite(unwrapped(self), checked_nonneg_fixnum(arg1), checked_nonneg_fixnum(arg2));
379
+
380
+ return Qnil;
381
+ }
382
+
383
+ /*
384
+ * A Disjoint Union.
385
+ *
386
+ * A "disjoint set union" that represents a set of elements that belong to _disjoint_ subsets. Alternatively, this expresses a
387
+ * partition of a fixed set.
388
+ *
389
+ * The data structure provides efficient actions to merge two disjoint subsets, i.e., replace them by their union, and determine if
390
+ * two elements are in the same subset.
391
+ *
392
+ * The elements of the set are 0, 1, ..., n-1, where n is the size of the universe. Client code can map its data to these
393
+ * representatives.
394
+ *
395
+ * See https://en.wikipedia.org/wiki/Disjoint-set_data_structure for a good introduction.
396
+ *
397
+ * The code uses several ideas from Tarjan and van Leeuwen for efficiency. We use "union by rank" in +unite+ and path-halving in
398
+ * +find+. Together, these make the amortized cost of each operation effectively constant.
399
+ *
400
+ * - Tarjan, Robert E., van Leeuwen, Jan (1984). _Worst-case analysis of set union algorithms_. Journal of the ACM. 31 (2): 245–281.
401
+ */
402
+ void Init_c_disjoint_union() {
403
+ VALUE mDataStructuresRMolinari = rb_define_module("DataStructuresRMolinari");
404
+ VALUE cDisjointUnion = rb_define_class_under(mDataStructuresRMolinari, "CDisjointUnion", rb_cObject);
405
+
406
+ rb_define_alloc_func(cDisjointUnion, disjoint_union_alloc);
407
+ rb_define_method(cDisjointUnion, "initialize", disjoint_union_init, -1);
408
+ rb_define_method(cDisjointUnion, "make_set", disjoint_union_make_set, 1);
409
+ rb_define_method(cDisjointUnion, "subset_count", disjoint_union_subset_count, 0);
410
+ rb_define_method(cDisjointUnion, "find", disjoint_union_find, 1);
411
+ rb_define_method(cDisjointUnion, "unite", disjoint_union_unite, 2);
412
+ }
@@ -0,0 +1,12 @@
1
+ require 'mkmf'
2
+
3
+ abort 'missing malloc()' unless have_func "malloc"
4
+ abort 'missing realloc()' unless have_func "realloc"
5
+
6
+ if try_cflags('-O')
7
+ append_cflags('-O')
8
+ end
9
+
10
+ extension_name = "c_disjoint_union"
11
+ dir_config(extension_name)
12
+ create_makefile("data_structures_rmolinari/c_disjoint_union")