data_structures_rmolinari 0.4.1 → 0.4.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: eb25e49219167201208f45a402b202180466202bc071940da418b2f84d281f6d
- data.tar.gz: f43c1614c2a433d7a4e1148eb90121fc4b1d61c807afeb78c456d77b66935adb
+ metadata.gz: c912d4ddf3a7cfc721b7f298a966f7e0d4cbd4249797506457605a44774523a0
+ data.tar.gz: c168b7096178e496f76fa53f5b8566cd2ac26897fd5b362c3c37da5314f2a6db
  SHA512:
- metadata.gz: d2e77397f790e8fe8d650d7727be55b464aa8f19d928e215b824820b712df1f5762d74ed304196e1c773960f02d73cd29bc07486f0cc28eb5ddcd5dbd422d691
- data.tar.gz: 56328269625f88b5119b64696f792a2e567d6327605dc9a6cc391edd33e5e49f293adb304ef3b819094b860299f904b79862eaac2a8eaeff0d249350bce280db
+ metadata.gz: 0c88c1ad7c07fe6358e3eefd21406b4bbd33a89e89731edb3997bb027efca52a7d312ffee1435af960be73c5cd5212950a854a11c6f2105dfccc47ed4ae00c2b
+ data.tar.gz: 9bf6e4570017217b59f4f3a0b1d9e23d7752ba4e4b5dc11a988826726367956c6564763044f23b5105293213c4844667249a7f88726861f792264c1d634256ae
data/CHANGELOG.md CHANGED
@@ -1,6 +1,24 @@
  # Changelog
 
- ## [Unreleased]
+ ## [0.4.2] 2023-01-26
+
+ ### Added
+
+ - MinPrioritySearchTree added
+   - It's a thin layer on top of a MaxPrioritySearchTree with negated y values.
+
+ - MaxPrioritySearchTree
+   - A "dynamic" constructor option now allows deletion of the "top" (root) node. This is useful in certain algorithms.
+
+ - DisjointUnion
+   - Added a proof-of-concept implementation in C, which is about twice as fast as the Ruby version.
+
+ - Algorithms
+   - Implement the Maximal Empty Rectangle algorithm of De et al. It uses a dynamic MaxPST.
+
+ ## [0.4.1] 2023-01-12
+
+ - Update this file for the gem (though I forgot to add this comment first!)
 
  ## [0.4.0] 2023-01-12
 
@@ -10,10 +28,10 @@
  - Duplicate y values are now allowed. Ties are broken with a preference for smaller values of x.
  - Method names have changed
  - Instead of "highest", "leftmost", "rightmost" we use "largest_y", "smallest_x", "largest_x"
- - For example, +highest_ne+ is now +largest_y_in_nw+
+ - For example, `highest_ne` is now `largest_y_in_nw`
  - DisjointUnion
  - the size argument to initializer is optional. The default value is 0.
- - elements can be added to the "universe" of known values with +make_set+
+ - elements can be added to the "universe" of known values with `make_set`
 
  ### Removed
  - MinmaxPrioritySearchTree is no longer available
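The 0.4.2 entry describes `MinPrioritySearchTree` as a thin layer over a MaxPST with negated y values. The reduction can be sketched in plain Ruby. This is an illustration only, not the gem's code: points are assumed to be `[x, y]` pairs, quadrants are taken as closed, and the helpers `largest_y_in_ne` and `smallest_y_in_se` are hypothetical linear scans rather than O(log n) tree queries. Negating every y value maps the southeast quadrant of (x0, y0) onto the northeast quadrant of (x0, -y0), so the lowest point in the first is the highest point in the second.

```ruby
# Hypothetical brute-force stand-in for a MaxPST query: the point with the
# largest y in the closed quadrant x >= x0, y >= y0. Ties on y go to the
# smaller x. Returns nil when the quadrant is empty.
def largest_y_in_ne(points, x0, y0)
  points.select { |x, y| x >= x0 && y >= y0 }
        .max_by { |x, y| [y, -x] }
end

# Hypothetical "MinPST" query answered by negating y and reusing the max query:
# the point with the smallest y in the closed quadrant x >= x0, y <= y0.
def smallest_y_in_se(points, x0, y0)
  negated = points.map { |x, y| [x, -y] }
  best = largest_y_in_ne(negated, x0, -y0)
  best && [best[0], -best[1]] # undo the negation on the way out
end

points = [[1, 5], [2, 3], [4, 1], [5, 2]]
p smallest_y_in_se(points, 2, 4) # => [4, 1]
```

Because negation reverses the order on y, the wrapper only has to negate once on the way in and once on the way out; all of the real work stays in the max-side query.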
data/README.md ADDED
@@ -0,0 +1,141 @@
+ # Data Structures
+
+ This is a small collection of Ruby data structures that I have implemented for my own interest. Implementing the code for a data
+ structure is almost always more educational than simply reading about it and is usually fun. I wrote some of them while
+ participating in the Advent of Code (https://adventofcode.com/).
+
+ These implementations are not particularly clever. They are based on the expository descriptions and pseudo-code I found as I read
+ about each structure and so are not as fast as possible.
+
+ The code is available as a gem: https://rubygems.org/gems/data_structures_rmolinari.
+
+ ## Usage
+
+ The right way to organize the code is not obvious to me. For now the data structures are all defined in the module
+ `DataStructuresRMolinari` to avoid polluting the global namespace.
+
+ Example usage after the gem is installed:
+ ```ruby
+ require 'data_structures_rmolinari'
+
+ # Pull what we need out of the namespace
+ MaxPrioritySearchTree = DataStructuresRMolinari::MaxPrioritySearchTree
+ Point = DataStructuresRMolinari::Point # anything responding to :x and :y is fine
+
+ pst = MaxPrioritySearchTree.new([Point.new(1, 1)])
+ puts pst.largest_y_in_ne(0, 0) # "Point(1,1)"
+ ```
+
+ # Implementations
+
+ ## Disjoint Union
+
+ We represent a set S of non-negative integers as the disjoint union of subsets. Equivalently, we represent a partition of S. The
+ data structure provides very efficient implementations of the two key operations:
+ - `unite(e, f)`, which merges the subsets containing e and f; and
+ - `find(e)`, which returns the canonical representative of the subset containing e. Two elements e and f are in the same subset
+   exactly when `find(e) == find(f)`.
+
+ It also provides
+ - `make_set(v)`, which adds a new value `v` to the set S, starting out in a singleton subset.
+
+ For more details see https://en.wikipedia.org/wiki/Disjoint-set_data_structure and the paper [[TvL1984]](#references) by Tarjan and
+ van Leeuwen.
+
+ There is an experimental implementation as a C extension in c_disjoint_union.c. The Ruby class for this implementation is
+ `CDisjointUnion`. Benchmarks indicate that a long sequence of `unite` calls is about twice as fast as with the Ruby version.
+
+ ## Heap
+
+ This is a standard binary heap with an `update` method, suitable for use as a priority queue. There are several supported
+ operations:
+
+ - `insert(item, priority)`, which inserts the given item with the stated priority;
+   - by default, items must be distinct.
+ - `top`, which returns the element with the smallest priority;
+ - `pop`, which returns the element with the smallest priority and removes it from the structure; and
+ - `update(item, priority)`, which changes the priority of the given item. The item must already be in the heap.
+
+ `top` is O(1). The others are O(log n) where n is the number of items in the heap.
+
+ By default we have a min-heap: the top element is the one with smallest priority. A configuration parameter at construction can make
+ it a max-heap.
+
+ Another configuration parameter allows the creation of a "non-addressable" heap. This makes it impossible to call `update`, but
+ allows the insertion of duplicate items (which is sometimes useful) and slightly faster operation overall.
+
+ See https://en.wikipedia.org/wiki/Binary_heap and the paper by Edelkamp, Elmasry, and Katajainen [[EEK2017]](#references).
+
+ ## Priority Search Tree
+
+ A PST stores a set P of two-dimensional points in a way that allows certain queries about P to be answered efficiently. The data
+ structure was introduced by McCreight [[McC1985]](#references). De, Maheshwari, Nandy, and Smid [[DMNS2011]](#references) showed
+ how to build the structure in-place and we use their approach here. The supported queries are:
+
+ - `largest_y_in_ne(x0, y0)` and `largest_y_in_nw(x0, y0)`, the "highest" (max-y) point in the quadrant to the northeast/northwest of
+   (x0, y0);
+ - `smallest_x_in_ne(x0, y0)`, the "leftmost" (min-x) point in the quadrant to the northeast of (x0, y0);
+ - `largest_x_in_nw(x0, y0)`, the "rightmost" (max-x) point in the quadrant to the northwest of (x0, y0);
+ - `largest_y_in_3_sided(x0, x1, y0)`, the highest point in the region specified by x0 <= x <= x1 and y0 <= y; and
+ - `enumerate_3_sided(x0, x1, y0)`, which enumerates all the points in that region.
+
+ Here compass directions are the natural ones in the x-y plane with the positive x-axis pointing east and the positive y-axis
+ pointing north.
+
+ There is no `smallest_x_in_3_sided(x0, x1, y0)`. Instead, use `smallest_x_in_ne(x0, y0)` and check that the result's x value is at most x1.
+
+ The single-point queries run in O(log n) time, where n is the size of P, while `enumerate_3_sided` runs in O(m + log n), where m is
+ the number of points actually enumerated.
+
+ The implementation is in `MaxPrioritySearchTree` (MaxPST for short), so called because internally the structure is, among other
+ things, a max-heap on the y-coordinates.
+
+ These queries appear rather abstract at first but there are interesting applications. See, for example, section 4 of
+ [[McC1985]](#references), keeping in mind that the data structure in that paper is actually a _MinPST_.
+
+ We also provide a `MinPrioritySearchTree`, which answers analogous queries in the southward-infinite quadrants and 3-sided
+ regions.
+
+ By default these data structures are immutable: once constructed they cannot be changed. But there is a constructor option that
+ makes the instance "dynamic". This allows us to delete the element at the root of the tree, the one with the largest y value (smallest
+ for a MinPST), with the `delete_top!` method. This operation is important in certain algorithms, such as enumerating all maximal
+ empty rectangles (see the second paper by De et al. [[DMNS2013]](#references)). Note that points cannot be added to the PST in
+ either case, and choosing the dynamic option makes certain internal bookkeeping operations slower.
+
+ In [[DMNS2013]](#references) De et al. generalize the in-place structure to a _Min-max Priority Search Tree_ (MinmaxPST) that can
+ answer queries in all four quadrants and both "kinds" of 3-sided boxes. Having one of these would save the trouble of constructing
+ both a MaxPST and MinPST. But the presentation is hard to follow in places and the paper's pseudocode is buggy.[^minmaxpst]
+
+ ## Segment Tree
+
+ Segment trees store information related to subintervals of an array. For example, they can be used to find the sum of the
+ elements in an arbitrary subinterval A[i..j] of an array A[0..n] in O(log n) time. Each node in the tree corresponds to a subarray
+ of A in such a way that the values we store in the nodes can be combined efficiently to determine the desired result for arbitrary
+ subarrays.
+
+ An excellent description of the idea is found at https://cp-algorithms.com/data_structures/segment_tree.html.
+
+ Generic code is provided in `SegmentTreeTemplate`. Concrete classes are written by providing a handful of simple lambdas and
+ constants to the template class's initializer. Figuring out the details requires some knowledge of the internal mechanisms of a
+ segment tree, for which the link at cp-algorithms.com is very helpful. See the definitions of the concrete classes,
+ `MaxValSegmentTree` and `IndexOfMaxValSegmentTree`, for examples.
+
+ ## Algorithms
+
+ The Algorithms submodule contains some algorithms using the data structures.
+
+ - `maximal_empty_rectangles(points)`
+   - We are given a set P of points and the minimal box B = [x_min, x_max] x [y_min, y_max] containing it. An _empty rectangle_ is an
+     axis-parallel rectangle with positive area that is contained in B and has no element of P in its interior. A _maximal empty
+     rectangle_ is an empty rectangle not properly contained in any other empty rectangle. This method yields each maximal empty
+     rectangle in the form [left, right, bottom, top].
+   - The algorithm is due to [[DMNS2013]](#references).
+
+ # References
+ - [TvL1984] Tarjan, R. E., van Leeuwen, J., _Worst-case Analysis of Set Union Algorithms_, Journal of the ACM, v31:2 (1984), pp 245–281.
+ - [EEK2017] Edelkamp, S., Elmasry, A., Katajainen, J., _Optimizing Binary Heaps_, Theory of Computing Systems, v61 (2017), pp 606–636, DOI 10.1007/s00224-017-9760-2.
+ - [McC1985] McCreight, E. M., _Priority Search Trees_, SIAM Journal on Computing, v14:2 (1985), pp 257–276.
+ - [DMNS2011] De, M., Maheshwari, A., Nandy, S. C., Smid, M., _An In-Place Priority Search Tree_, 23rd Canadian Conference on Computational Geometry, 2011.
+ - [DMNS2013] De, M., Maheshwari, A., Nandy, S. C., Smid, M., _An In-Place Min-max Priority Search Tree_, Computational Geometry, v46 (2013), pp 310–327.
+
+ [^minmaxpst]: See the comments in the fragmentary class `MinMaxPrioritySearchTree` for further details.
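The README describes the PST queries only in words. A linear-scan reference pins down what each one returns. The sketch below is not the gem's code and costs O(n) per query, which is exactly what the tree avoids. It assumes points are `[x, y]` pairs, that quadrants and 3-sided regions are closed, and that ties on y go to the smaller x as the 0.4.0 changelog describes; the module name `NaivePST` is hypothetical.

```ruby
# Hypothetical linear-scan versions of the PST queries described in the README.
# Points are [x, y] pairs; all regions are treated as closed (an assumption).
module NaivePST
  module_function

  def ne(points, x0, y0)
    points.select { |x, y| x >= x0 && y >= y0 }
  end

  def nw(points, x0, y0)
    points.select { |x, y| x <= x0 && y >= y0 }
  end

  def three_sided(points, x0, x1, y0)
    points.select { |x, y| x0 <= x && x <= x1 && y >= y0 }
  end

  # Highest point; ties on y go to the smaller x. nil for an empty region.
  def highest(pts)
    pts.max_by { |x, y| [y, -x] }
  end

  def largest_y_in_ne(points, x0, y0)
    highest(ne(points, x0, y0))
  end

  def largest_y_in_nw(points, x0, y0)
    highest(nw(points, x0, y0))
  end

  def smallest_x_in_ne(points, x0, y0)
    ne(points, x0, y0).min_by { |x, _y| x }
  end

  def largest_x_in_nw(points, x0, y0)
    nw(points, x0, y0).max_by { |x, _y| x }
  end

  def largest_y_in_3_sided(points, x0, x1, y0)
    highest(three_sided(points, x0, x1, y0))
  end

  def enumerate_3_sided(points, x0, x1, y0)
    three_sided(points, x0, x1, y0)
  end
end

pts = [[1, 1], [2, 4], [3, 2], [5, 3], [6, 5]]
p NaivePST.largest_y_in_ne(pts, 2, 0)         # => [6, 5]
p NaivePST.smallest_x_in_ne(pts, 2, 3)        # => [2, 4]
p NaivePST.largest_x_in_nw(pts, 4, 2)         # => [3, 2]
p NaivePST.largest_y_in_3_sided(pts, 2, 5, 0) # => [2, 4]
p NaivePST.enumerate_3_sided(pts, 2, 5, 3)    # => [[2, 4], [5, 3]]
```

A reference like this is mostly useful as a test oracle: its answers can be compared against a real `MaxPrioritySearchTree` on random point sets.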
data/Rakefile ADDED
@@ -0,0 +1,16 @@
+ require 'rubygems'
+ require 'rake/testtask'
+ require 'rake/extensiontask'
+
+ Rake::ExtensionTask.new('data_structures_rmolinari/c_disjoint_union') do |ext|
+   ext.name = 'CDisjointUnion'
+   ext.ext_dir = 'ext/c_disjoint_union'
+   ext.lib_dir = 'lib/data_structures_rmolinari/'
+ end
+
+ Rake::TestTask.new do |t|
+   t.libs << 'test'
+ end
+
+ desc 'Run Tests'
+ task default: :test
@@ -0,0 +1,412 @@
+ /*
+  * This is a C implementation of a simple Ruby Disjoint Union data structure.
+  *
+  * A Disjoint Union doesn't have much of an implementation in Ruby: see disjoint_union.rb in this gem. This means that we don't gain
+  * much by implementing it in C, but it serves as a good learning experience for me.
+  *
+  * It turns out that writing a C extension for Ruby like this isn't very complicated, but there are a bunch of moving parts and the
+  * available documentation is a bit of a slog. Writing this was very educational.
+  *
+  * See https://docs.ruby-lang.org/en/master/extension_rdoc.html for some documentation. It's a bit hard to read in places, but
+  * plugging away at things helps.
+  *
+  * https://guides.rubygems.org/gems-with-extensions/ is a decent tutorial, though it leaves out lots of details.
+  *
+  * See https://aaronbedra.com/extending-ruby/ for another tutorial.
+  */
+
+ #include "ruby.h"
+
+ // The Shared::DataError exception type in the Ruby code. We only need it when we detect a runtime error, so a macro is simplest and
+ // just fine.
+ #define mShared rb_define_module("Shared")
+ #define eDataError rb_const_get(mShared, rb_intern_const("DataError"))
+
+ /**
+  * It's been so long since I've written non-trivial C that I need to copy examples from online.
+  *
+  * Dynamic array of longs, with an initial value for otherwise uninitialized elements.
+  * Based on https://stackoverflow.com/questions/3536153/c-dynamically-growing-array
+  */
+ typedef struct {
+   long *array;
+   size_t size;
+   long default_val;
+ } DynamicArray;
+
+ void initDynamicArray(DynamicArray *a, size_t initial_size, long default_val) {
+   a->array = malloc(initial_size * sizeof(long));
+   a->size = initial_size;
+   a->default_val = default_val;
+
+   for (size_t i = 0; i < initial_size; i++) {
+     a->array[i] = default_val;
+   }
+ }
+
+ void insertDynamicArray(DynamicArray *a, unsigned long index, long element) {
+   if (a->size <= index) {
+     size_t new_size = a->size;
+     while (new_size <= index) {
+       new_size = 8 * new_size / 5 + 8; // 8/5 gives "Fibonacci-like" growth; adding 8 to avoid small arrays having to reallocate
+                                        // too often. Who knows if it's worth being "clever".
+     }
+
+     long* new_array = realloc(a->array, new_size * sizeof(long));
+     if (!new_array) {
+       rb_raise(rb_eRuntimeError, "Cannot allocate memory to expand DynamicArray!");
+     }
+
+     a->array = new_array;
+     for (size_t i = a->size; i < new_size; i++) {
+       a->array[i] = a->default_val;
+     }
+
+     a->size = new_size;
+   }
+
+   a->array[index] = element;
+ }
+
+ void freeDynamicArray(DynamicArray *a) {
+   free(a->array);
+   a->array = NULL;
+   a->size = 0;
+ }
+
+ /**
+  * The C implementation of a Disjoint Union
+  *
+  * See Tarjan, Robert E., van Leeuwen, J., _Worst-case Analysis of Set Union Algorithms_, Journal of the ACM, v31:2 (1984), pp 245–281.
+  */
+
+ /*
+  * The Disjoint Union struct.
+  * - forest: an array of longs giving, for each element, the parent element of its tree.
+  *   - An element e is the root of its tree just when forest[e] == e.
+  *   - Two elements are in the same subset just when they are in the same tree in the forest.
+  *     - So the key idea is that we can check this by navigating via parents from each element to their roots. Clever optimizations
+  *       keep the trees flat and so most nodes are close to their roots.
+  * - rank: an array of longs giving the "rank" of each element.
+  *   - This value is used to guide the "linking" of trees when subsets are being merged to keep the trees flat. See Tarjan & van
+  *     Leeuwen.
+  * - subset_count: the number of (disjoint) subsets.
+  *   - It isn't needed internally but may be useful to client code.
+  */
+ typedef struct du_data {
+   DynamicArray* forest; // the forest that describes the unified subsets
+   DynamicArray* rank;   // the "ranks" of the elements, used when uniting subsets
+   size_t subset_count;
+ } disjoint_union_data;
+
+ /*
+  * Create one.
+  *
+  * The dynamic arrays are initialized with a size of 100 because I didn't have a better idea. This will end up getting called from
+  * the Ruby #allocate method, which happens before #initialize. Thus we don't know the calling code's desired initial size.
+  */
+ #define INITIAL_SIZE 100
+ static disjoint_union_data* create_disjoint_union() {
+   disjoint_union_data* disjoint_union = malloc(sizeof(disjoint_union_data));
+
+   // Allocate the structures
+   DynamicArray* forest = malloc(sizeof(DynamicArray));
+   DynamicArray* rank = malloc(sizeof(DynamicArray));
+   initDynamicArray(forest, INITIAL_SIZE, -1);
+   initDynamicArray(rank, INITIAL_SIZE, 0);
+
+   disjoint_union->forest = forest;
+   disjoint_union->rank = rank;
+   disjoint_union->subset_count = 0;
+
+   return disjoint_union;
+ }
+
+ /*
+  * Free the memory associated with a disjoint union. This will end up getting triggered by the Ruby garbage collector.
+  */
+ static void disjoint_union_free(void *ptr) {
+   if (ptr) {
+     disjoint_union_data *disjoint_union = ptr;
+     freeDynamicArray(disjoint_union->forest);
+     freeDynamicArray(disjoint_union->rank);
+
+     free(disjoint_union->forest);
+     disjoint_union->forest = NULL;
+
+     free(disjoint_union->rank);
+     disjoint_union->rank = NULL;
+
+     free(disjoint_union);
+   }
+ }
+
+ /************************************************************
+  * The disjoint union operations
+  ************************************************************/
+
+ /*
+  * Is the given element already a member of the universe?
+  */
+ static int present_p(disjoint_union_data* disjoint_union, size_t element) {
+   DynamicArray* forest = disjoint_union->forest;
+   return (forest->size > element && (forest->array[element] != forest->default_val));
+ }
+
+ /*
+  * Check that the given element is a member of the universe and raise Shared::DataError (ruby-side) if not
+  */
+ static void assert_membership(disjoint_union_data* disjoint_union, size_t element) {
+   if (!present_p(disjoint_union, element)) {
+     rb_raise(eDataError, "Value %zu is not part of the universe", element);
+   }
+ }
+
+ /*
+  * Add a new element to the universe. It starts out in its own singleton subset.
+  *
+  * Shared::DataError is raised if it is already an element.
+  */
+ static void add_new_element(disjoint_union_data* disjoint_union, size_t element) {
+   if (present_p(disjoint_union, element)) {
+     rb_raise(eDataError, "Element %zu already present in the universe", element);
+   }
+
+   insertDynamicArray(disjoint_union->forest, element, element);
+   insertDynamicArray(disjoint_union->rank, element, 0);
+   disjoint_union->subset_count++;
+ }
+
+ /*
+  * Find the canonical representative of the given element. This is the root of the tree (in forest) containing element.
+  *
+  * Two elements are in the same subset exactly when their canonical representatives are equal.
+  */
+ static size_t find(disjoint_union_data* disjoint_union, size_t element) {
+   assert_membership(disjoint_union, element);
+
+   // We implement find with "halving" to shrink the length of paths to the root. See Tarjan and van Leeuwen p 252.
+   long* d = disjoint_union->forest->array; // the actual forest data
+   size_t x = element;
+   while (d[d[x]] != d[x]) {
+     x = d[x] = d[d[x]];
+   }
+   return d[x];
+ }
+
+ /*
+  * "Link" the two given elements so that they are in the same subset now.
+  *
+  * In other words, merge the trees containing the two elements.
+  *
+  * Good performance (see Tarjan and van Leeuwen) assumes that elt1 and elt2 are distinct and already the roots of their trees,
+  * though we don't check that here.
+  */
+ static void link_roots(disjoint_union_data* disjoint_union, size_t elt1, size_t elt2) {
+   long* rank = disjoint_union->rank->array;
+   long* forest = disjoint_union->forest->array;
+
+   if (rank[elt1] > rank[elt2]) {
+     forest[elt2] = elt1;
+   } else if (rank[elt1] == rank[elt2]) {
+     forest[elt2] = elt1;
+     rank[elt1]++;
+   } else {
+     forest[elt1] = elt2;
+   }
+
+   disjoint_union->subset_count--;
+ }
+
+ /*
+  * "Unite" or merge the subsets containing elt1 and elt2.
+  */
+ static void unite(disjoint_union_data* disjoint_union, size_t elt1, size_t elt2) {
+   assert_membership(disjoint_union, elt1);
+   assert_membership(disjoint_union, elt2);
+
+   if (elt1 == elt2) {
+     rb_raise(eDataError, "Uniting an element with itself is meaningless");
+   }
+
+   size_t root1 = find(disjoint_union, elt1);
+   size_t root2 = find(disjoint_union, elt2);
+
+   if (root1 == root2) {
+     return; // already united
+   }
+
+   link_roots(disjoint_union, root1, root2);
+ }
+
+
+ /**
+  * Wrapping and unwrapping things for the Ruby runtime
+  *
+  */
+
+ // How much memory (roughly) does a disjoint_union_data instance consume? I guess the Ruby runtime can use this information when
+ // deciding how aggressive to be during garbage collection and such.
+ static size_t disjoint_union_memsize(const void *ptr) {
+   if (ptr) {
+     const disjoint_union_data *disjoint_union = ptr;
+     return (2 * disjoint_union->forest->size * sizeof(long)); // disjoint_union->rank is the same size
+   } else {
+     return 0;
+   }
+ }
+
+ /*
+  * A configuration struct that tells the Ruby runtime how to deal with a disjoint_union_data object.
+  *
+  * https://docs.ruby-lang.org/en/master/extension_rdoc.html#label-Encapsulate+C+data+into+a+Ruby+object
+  */
+ static const rb_data_type_t disjoint_union_type = {
+   .wrap_struct_name = "disjoint_union",
+   { // help for the Ruby garbage collector
+     .dmark = NULL, // dmark, for marking other Ruby objects. We don't hold any other objects so this can be NULL
+     .dfree = disjoint_union_free, // how to free the memory associated with an object
+     .dsize = disjoint_union_memsize, // roughly how much space does the object consume?
+   },
+   .data = NULL, // a data field we could use for something here if we wanted. Ruby ignores it
+   .flags = 0 // GC-related flag values.
+ };
+
+ /*
+  * Helper: check that a Ruby value is a non-negative Fixnum and convert it to a nice C long
+  *
+  * TODO: can we return a size_t or unsigned long instead?
+  */
+ static long checked_nonneg_fixnum(VALUE val) {
+   Check_Type(val, T_FIXNUM);
+   long c_val = FIX2LONG(val);
+
+   if (c_val < 0) {
+     rb_raise(eDataError, "Value must be non-negative");
+   }
+
+   return c_val;
+ }
+
+ /*
+  * Unwrap a Rubyfied disjoint union to get the C struct inside.
+  */
+ static disjoint_union_data* unwrapped(VALUE self) {
+   disjoint_union_data* disjoint_union;
+   TypedData_Get_Struct((self), disjoint_union_data, &disjoint_union_type, disjoint_union);
+   return disjoint_union;
+ }
+
+ /*
+  * This is for CDisjointUnion.allocate on the Ruby side
+  */
+ static VALUE disjoint_union_alloc(VALUE klass) {
+   disjoint_union_data* disjoint_union = create_disjoint_union();
+   return TypedData_Wrap_Struct(klass, &disjoint_union_type, disjoint_union);
+ }
+
+ /*
+  * The single parameter is optional. If given, it should be a non-negative integer and specifies the initial size, s, of the universe
+  * 0, 1, ..., s-1.
+  *
+  * If no argument is given we act as though a value of 0 were passed.
+  */
+ static VALUE disjoint_union_init(int argc, VALUE *argv, VALUE self) {
+   if (argc == 0) {
+     return self;
+   } else if (argc > 1) {
+     rb_raise(rb_eArgError, "wrong number of arguments");
+   } else {
+     size_t initial_size = checked_nonneg_fixnum(argv[0]);
+     disjoint_union_data* disjoint_union = unwrapped(self);
+
+     for (size_t i = 0; i < initial_size; i++) {
+       add_new_element(disjoint_union, i);
+     }
+   }
+   return self;
+ }
+
+ /**
+  * And now the simple wrappers around the Disjoint Union C functionality. In each case we
+  * - unwrap a 'VALUE self',
+  *   - i.e., the CDisjointUnion instance that contains a disjoint_union_data struct;
+  * - munge any other arguments into longs;
+  * - call the appropriate C function to act on the struct; and
+  * - return an appropriate VALUE that the Ruby runtime can use.
+  *
+  * We make them into methods on CDisjointUnion in the Init_c_disjoint_union function, below.
+  */
+
+ /*
+  * Add a new subset to the universe containing the element +arg+.
+  *
+  * @param arg the new element, starting in its own singleton subset
+  *   - it must be a non-negative integer, not already part of the universe of elements.
+  */
+ static VALUE disjoint_union_make_set(VALUE self, VALUE arg) {
+   add_new_element(unwrapped(self), checked_nonneg_fixnum(arg));
+
+   return Qnil;
+ }
+
+ /*
+  * @return the number of subsets into which the universe is currently partitioned.
+  */
+ static VALUE disjoint_union_subset_count(VALUE self) {
+   return LONG2NUM(unwrapped(self)->subset_count);
+ }
+
+ /*
+  * The canonical representative of the subset containing e. Two elements d and e are in the same subset exactly when find(d) ==
+  * find(e).
+  *
+  * The parameter must be in the universe of elements.
+  *
+  * @return (Integer) one of the universe of elements
+  */
+ static VALUE disjoint_union_find(VALUE self, VALUE arg) {
+   return LONG2NUM(find(unwrapped(self), checked_nonneg_fixnum(arg)));
+ }
+
+ /*
+  * Declare that the arguments are equivalent, i.e., in the same subset. If they are already in the same subset this is a no-op.
+  *
+  * Each argument must be in the universe of elements.
+  */
+ static VALUE disjoint_union_unite(VALUE self, VALUE arg1, VALUE arg2) {
+   unite(unwrapped(self), checked_nonneg_fixnum(arg1), checked_nonneg_fixnum(arg2));
+
+   return Qnil;
+ }
+
+ /*
+  * A Disjoint Union.
+  *
+  * A "disjoint set union" that represents a set of elements belonging to _disjoint_ subsets. Alternatively, this expresses a
+  * partition of a fixed set.
+  *
+  * The data structure provides efficient actions to merge two disjoint subsets, i.e., replace them by their union, and determine if
+  * two elements are in the same subset.
+  *
+  * The elements of the set are 0, 1, ..., n-1, where n is the size of the universe. Client code can map its data to these
+  * representatives.
+  *
+  * See https://en.wikipedia.org/wiki/Disjoint-set_data_structure for a good introduction.
+  *
+  * The code uses several ideas from Tarjan and van Leeuwen for efficiency. We use "union by rank" in +unite+ and path-halving in
+  * +find+. Together, these make the amortized cost of each operation effectively constant.
+  *
+  * - Tarjan, Robert E., van Leeuwen, Jan (1984). _Worst-case analysis of set union algorithms_. Journal of the ACM. 31 (2): 245–281.
+  */
+ void Init_c_disjoint_union() {
+   VALUE mDataStructuresRMolinari = rb_define_module("DataStructuresRMolinari");
+   VALUE cDisjointUnion = rb_define_class_under(mDataStructuresRMolinari, "CDisjointUnion", rb_cObject);
+
+   rb_define_alloc_func(cDisjointUnion, disjoint_union_alloc);
+   rb_define_method(cDisjointUnion, "initialize", disjoint_union_init, -1);
+   rb_define_method(cDisjointUnion, "make_set", disjoint_union_make_set, 1);
+   rb_define_method(cDisjointUnion, "subset_count", disjoint_union_subset_count, 0);
+   rb_define_method(cDisjointUnion, "find", disjoint_union_find, 1);
+   rb_define_method(cDisjointUnion, "unite", disjoint_union_unite, 2);
+ }
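For comparison with the C code above, here is a minimal pure-Ruby sketch of the same algorithm: a parent-pointer forest with union by rank in `unite` and path halving in `find`. It is an illustration, not the gem's `DisjointUnion` class. The class name `SketchDisjointUnion` is hypothetical, it raises `ArgumentError` where the C extension raises `Shared::DataError`, and it marks values outside the universe with `nil` instead of -1.

```ruby
# Hypothetical pure-Ruby sketch mirroring c_disjoint_union.c:
# union by rank in #unite and path halving in #find.
class SketchDisjointUnion
  attr_reader :subset_count

  def initialize(size = 0)
    @forest = [] # @forest[e] is the parent of e; nil means e is not in the universe
    @rank = []   # upper bound on the height of the tree rooted at e
    @subset_count = 0
    size.times { |e| make_set(e) }
  end

  # Add e to the universe as a singleton subset.
  def make_set(e)
    check_element!(e)
    raise ArgumentError, "#{e} is already in the universe" if @forest[e]

    @forest[e] = e
    @rank[e] = 0
    @subset_count += 1
  end

  # Return the root of e's tree. Path halving: every other node on the path
  # is re-pointed at its grandparent, as in the C find().
  def find(e)
    check_element!(e)
    raise ArgumentError, "#{e} is not in the universe" unless @forest[e]

    x = e
    while @forest[@forest[x]] != @forest[x]
      @forest[x] = @forest[@forest[x]]
      x = @forest[x]
    end
    @forest[x]
  end

  # Merge the subsets containing e and f; a no-op if they already coincide.
  def unite(e, f)
    root_e = find(e)
    root_f = find(f)
    raise ArgumentError, "uniting an element with itself is meaningless" if e == f
    return if root_e == root_f

    link_roots(root_e, root_f)
  end

  private

  def check_element!(e)
    raise ArgumentError, "#{e.inspect} must be a non-negative Integer" unless e.is_a?(Integer) && e >= 0
  end

  # Union by rank: the lower-rank root goes under the higher-rank root;
  # on a tie, r2 goes under r1 and r1's rank grows by one.
  def link_roots(r1, r2)
    if @rank[r1] > @rank[r2]
      @forest[r2] = r1
    elsif @rank[r1] == @rank[r2]
      @forest[r2] = r1
      @rank[r1] += 1
    else
      @forest[r1] = r2
    end
    @subset_count -= 1
  end
end

du = SketchDisjointUnion.new(5)
du.unite(0, 1)
du.unite(3, 4)
du.unite(1, 4)
p du.subset_count          # => 2
p du.find(0) == du.find(3) # => true
p du.find(2) == du.find(0) # => false
```

The Ruby version needs no growth policy for its arrays because Ruby arrays resize themselves, which is roughly the job `DynamicArray` does by hand in the C extension.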
@@ -0,0 +1,12 @@
+ require 'mkmf'
+
+ abort 'missing malloc()' unless have_func "malloc"
+ abort 'missing realloc()' unless have_func "realloc"
+
+ if try_cflags('-O')
+   append_cflags('-O')
+ end
+
+ extension_name = "c_disjoint_union"
+ dir_config(extension_name)
+ create_makefile("data_structures_rmolinari/c_disjoint_union")