data_structures_rmolinari 0.4.1 → 0.4.2
- checksums.yaml +4 -4
- data/CHANGELOG.md +21 -3
- data/README.md +141 -0
- data/Rakefile +16 -0
- data/ext/c_disjoint_union/disjoint_union.c +412 -0
- data/ext/c_disjoint_union/extconf.rb +12 -0
- data/lib/data_structures_rmolinari/algorithms.rb +103 -0
- data/lib/data_structures_rmolinari/max_priority_search_tree.rb +200 -58
- data/lib/data_structures_rmolinari/min_priority_search_tree.rb +187 -0
- data/lib/data_structures_rmolinari/{generic_segment_tree.rb → segment_tree_template.rb} +0 -0
- data/lib/data_structures_rmolinari/shared.rb +5 -16
- data/lib/data_structures_rmolinari.rb +6 -3
- metadata +12 -5
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: c912d4ddf3a7cfc721b7f298a966f7e0d4cbd4249797506457605a44774523a0
|
4
|
+
data.tar.gz: c168b7096178e496f76fa53f5b8566cd2ac26897fd5b362c3c37da5314f2a6db
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 0c88c1ad7c07fe6358e3eefd21406b4bbd33a89e89731edb3997bb027efca52a7d312ffee1435af960be73c5cd5212950a854a11c6f2105dfccc47ed4ae00c2b
|
7
|
+
data.tar.gz: 9bf6e4570017217b59f4f3a0b1d9e23d7752ba4e4b5dc11a988826726367956c6564763044f23b5105293213c4844667249a7f88726861f792264c1d634256ae
|
data/CHANGELOG.md
CHANGED
@@ -1,6 +1,24 @@
|
|
1
1
|
# Changelog
|
2
2
|
|
3
|
-
## [
|
3
|
+
## [0.4.2] 2023-01-26
|
4
|
+
|
5
|
+
### Added
|
6
|
+
|
7
|
+
- MinPrioritySearchTree added
|
8
|
+
- it's a thin layer on top of a MaxPrioritySearchTree with negated y values.
|
9
|
+
|
10
|
+
- MaxPrioritySearchTree
|
11
|
+
- A "dynamic" constructor option now allows deletion of the "top" (root) node. This is useful in certain algorithms.
|
12
|
+
|
13
|
+
- DisjointUnion
|
14
|
+
- Added a proof-of-concept implementation in C, which is about twice as fast.
|
15
|
+
|
16
|
+
- Algorithms
|
17
|
+
- Implement the Maximal Empty Rectangle algorithm of De et al. It uses a dynamic MaxPST.
|
18
|
+
|
19
|
+
## [0.4.1] 2023-01-12
|
20
|
+
|
21
|
+
- Update this file for the gem (though I forgot to add this comment first!)
|
4
22
|
|
5
23
|
## [0.4.0] 2023-01-12
|
6
24
|
|
@@ -10,10 +28,10 @@
|
|
10
28
|
- Duplicate y values are now allowed. Ties are broken with a preference for smaller values of x.
|
11
29
|
- Method names have changed
|
12
30
|
- Instead of "highest", "leftmost", "rightmost" we use "largest_y", "smallest_x", "largest_x"
|
13
|
-
- For example,
|
31
|
+
- For example, `highest_ne` is now `largest_y_in_ne`
|
14
32
|
- DisjointUnion
|
15
33
|
- the size argument to initializer is optional. The default value is 0.
|
16
|
-
- elements can be added to the "universe" of known values with
|
34
|
+
- elements can be added to the "universe" of known values with `make_set`
|
17
35
|
|
18
36
|
### Removed
|
19
37
|
- MinmaxPrioritySearchTree is no longer available
|
data/README.md
ADDED
@@ -0,0 +1,141 @@
|
|
1
|
+
# Data Structures
|
2
|
+
|
3
|
+
This is a small collection of Ruby data structures that I have implemented for my own interest. Implementing the code for a data
|
4
|
+
structure is almost always more educational than simply reading about it and is usually fun. I wrote some of them while
|
5
|
+
participating in the Advent of Code (https://adventofcode.com/).
|
6
|
+
|
7
|
+
These implementations are not particularly clever. They are based on the expository descriptions and pseudo-code I found as I read
|
8
|
+
about each structure and so are not as fast as possible.
|
9
|
+
|
10
|
+
The code is available as a gem: https://rubygems.org/gems/data_structures_rmolinari.
|
11
|
+
|
12
|
+
## Usage
|
13
|
+
|
14
|
+
The right way to organize the code is not obvious to me. For now the data structures are all defined in the module
|
15
|
+
`DataStructuresRMolinari` to avoid polluting the global namespace.
|
16
|
+
|
17
|
+
Example usage after the gem is installed:
|
18
|
+
```
|
19
|
+
require 'data_structures_rmolinari'
|
20
|
+
|
21
|
+
# Pull what we need out of the namespace
|
22
|
+
MaxPrioritySearchTree = DataStructuresRMolinari::MaxPrioritySearchTree
|
23
|
+
Point = DataStructuresRMolinari::Point # anything responding to :x and :y is fine
|
24
|
+
|
25
|
+
pst = MaxPrioritySearchTree.new([Point.new(1, 1)])
|
26
|
+
puts pst.largest_y_in_ne(0, 0) # "Point(1,1)"
|
27
|
+
```
|
28
|
+
|
29
|
+
# Implementations
|
30
|
+
|
31
|
+
## Disjoint Union
|
32
|
+
|
33
|
+
We represent a set S of non-negative integers as the disjoint union of subsets. Equivalently, we represent a partition of S. The
|
34
|
+
data structure provides very efficient implementations of the two key operations
|
35
|
+
- `unite(e, f)`, which merges the subsets containing e and f; and
|
36
|
+
- `find(e)`, which returns the canonical representative of the subset containing e. Two elements e and f are in the same subset
|
37
|
+
exactly when `find(e) == find(f)`.
|
38
|
+
|
39
|
+
It also provides
|
40
|
+
- `make_set(v)`, which adds a new value `v` to the set S, starting out in a singleton subset.
|
41
|
+
|
42
|
+
For more details see https://en.wikipedia.org/wiki/Disjoint-set_data_structure and the paper [[TvL1984]](#references) by Tarjan and
|
43
|
+
van Leeuwen.
|
44
|
+
|
45
|
+
There is an experimental implementation as a C extension in c_disjoint_union.c. The Ruby class for this implementation is
|
46
|
+
`CDisjointUnion`. Benchmarks indicate that a long sequence of `unite` calls is about twice as fast.
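
The three operations above can be illustrated with a small pure-Ruby sketch. This is not the gem's `DisjointUnion` or `CDisjointUnion` code: the class name `SketchDisjointUnion` and its error handling are assumptions made for illustration. It follows the same ideas the C extension in this diff describes (a parent "forest", union by rank, and path halving in `find`).

```ruby
# A minimal pure-Ruby sketch of a disjoint union (illustrative; not the gem's code).
class SketchDisjointUnion
  attr_reader :subset_count

  def initialize(size = 0)
    @parent = [] # @parent[e] is the parent of e; a root is its own parent
    @rank = []   # used to decide which root becomes the parent when linking
    @subset_count = 0
    size.times { |e| make_set(e) }
  end

  # Add a new element, starting out in its own singleton subset.
  def make_set(e)
    raise ArgumentError, "#{e} must be a non-negative integer" unless e.is_a?(Integer) && e >= 0
    raise ArgumentError, "#{e} is already present" if present?(e)

    @parent[e] = e
    @rank[e] = 0
    @subset_count += 1
  end

  # Canonical representative of the subset containing e, with path halving.
  def find(e)
    raise ArgumentError, "#{e} is not in the universe" unless present?(e)

    x = e
    while @parent[@parent[x]] != @parent[x]
      @parent[x] = @parent[@parent[x]] # point x at its grandparent
      x = @parent[x]
    end
    @parent[x]
  end

  # Merge the subsets containing e and f, linking the lower-rank root under the other.
  def unite(e, f)
    r1 = find(e)
    r2 = find(f)
    return if r1 == r2 # already in the same subset

    r1, r2 = r2, r1 if @rank[r1] < @rank[r2]
    @rank[r1] += 1 if @rank[r1] == @rank[r2]
    @parent[r2] = r1
    @subset_count -= 1
  end

  private

  def present?(e)
    e.is_a?(Integer) && e >= 0 && !@parent[e].nil?
  end
end

du = SketchDisjointUnion.new(5)
du.unite(0, 1)
du.unite(3, 4)
puts du.find(0) == du.find(1) # true
puts du.find(1) == du.find(3) # false
puts du.subset_count          # 3
```

Client code that works with arbitrary objects would first map them to the integers 0, 1, ..., n-1, as the C source comments in this diff suggest.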
|
47
|
+
|
48
|
+
## Heap
|
49
|
+
|
50
|
+
This is a standard binary heap with an `update` method, suitable for use as a priority queue. There are several supported
|
51
|
+
operations:
|
52
|
+
|
53
|
+
- `insert(item, priority)`, insert the given item with the stated priority.
|
54
|
+
- By default, items must be distinct.
|
55
|
+
- `top`, returning the element with smallest priority
|
56
|
+
- `pop`, return the element with smallest priority and remove it from the structure
|
57
|
+
- `update(item, priority)`, update the priority of the given item, which must already be in the heap
|
58
|
+
|
59
|
+
`top` is O(1). The others are O(log n) where n is the number of items in the heap.
|
60
|
+
|
61
|
+
By default we have a min-heap: the top element is the one with smallest priority. A configuration parameter at construction can make
|
62
|
+
it a max-heap.
|
63
|
+
|
64
|
+
Another configuration parameter allows the creation of a "non-addressable" heap. This makes it impossible to call `update`, but
|
65
|
+
allows the insertion of duplicate items (which is sometimes useful) and slightly faster operation overall.
|
66
|
+
|
67
|
+
See https://en.wikipedia.org/wiki/Binary_heap and the paper by Edelkamp, Elmasry, and Katajainen [[EEK2017]](#references).
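
As a rough illustration of the min-heap behaviour described above, here is a self-contained sketch of a binary heap with `insert`, `top`, and `pop`. The class name `SketchMinHeap` is hypothetical and this is not the gem's `Heap` class; in particular it omits `update`, so it corresponds most closely to the "non-addressable" variant.

```ruby
# A minimal min-heap sketch keyed by priority (illustrative; not the gem's Heap class).
class SketchMinHeap
  def initialize
    @data = [] # [priority, item] pairs; children of index i live at 2i+1 and 2i+2
  end

  # Insert the item with the given priority: O(log n).
  def insert(item, priority)
    @data << [priority, item]
    sift_up(@data.size - 1)
  end

  # The item with smallest priority, or nil if the heap is empty: O(1).
  def top
    @data.empty? ? nil : @data.first.last
  end

  # Remove and return the item with smallest priority: O(log n).
  def pop
    return nil if @data.empty?

    top_item = @data.first.last
    last = @data.pop
    unless @data.empty?
      @data[0] = last
      sift_down(0)
    end
    top_item
  end

  private

  # Move the entry at index i up until its parent's priority is no larger.
  def sift_up(i)
    while i > 0
      parent = (i - 1) / 2
      break if @data[parent][0] <= @data[i][0]

      @data[parent], @data[i] = @data[i], @data[parent]
      i = parent
    end
  end

  # Move the entry at index i down until both children have priorities no smaller.
  def sift_down(i)
    loop do
      smallest = i
      [2 * i + 1, 2 * i + 2].each do |child|
        smallest = child if child < @data.size && @data[child][0] < @data[smallest][0]
      end
      break if smallest == i

      @data[smallest], @data[i] = @data[i], @data[smallest]
      i = smallest
    end
  end
end

heap = SketchMinHeap.new
heap.insert("write tests", 3)
heap.insert("fix bug", 1)
heap.insert("refactor", 2)
puts heap.top # fix bug
puts heap.pop # fix bug
puts heap.pop # refactor
```

Supporting `update` would additionally need a map from each item to its current index in the array, which is one reason an addressable heap requires distinct items.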
|
68
|
+
|
69
|
+
## Priority Search Tree
|
70
|
+
|
71
|
+
A PST stores a set P of two-dimensional points in a way that allows certain queries about P to be answered efficiently. The data
|
72
|
+
structure was introduced by McCreight [[McC1985]](#references). De, Maheshwari, Nandy, and Smid [[DMNS2011]](#references) showed
|
73
|
+
how to build the structure in-place and we use their approach here.
|
74
|
+
|
75
|
+
- `largest_y_in_ne(x0, y0)` and `largest_y_in_nw(x0, y0)`, the "highest" (max-y) point in the quadrant to the northeast/northwest of
|
76
|
+
(x0, y0);
|
77
|
+
- `smallest_x_in_ne(x0, y0)`, the "leftmost" (min-x) point in the quadrant to the northeast of (x0, y0);
|
78
|
+
- `largest_x_in_nw(x0, y0)`, the "rightmost" (max-x) point in the quadrant to the northwest of (x0, y0);
|
79
|
+
- `largest_y_in_3_sided(x0, x1, y0)`, the highest point in the region specified by x0 <= x <= x1 and y0 <= y; and
|
80
|
+
- `enumerate_3_sided(x0, x1, y0)`, enumerate all the points in that region.
|
81
|
+
|
82
|
+
Here compass directions are the natural ones in the x-y plane with the positive x-axis pointing east and the positive y-axis
|
83
|
+
pointing north.
|
84
|
+
|
85
|
+
There is no `smallest_x_in_3_sided(x0, x1, y0)`. Just use `smallest_x_in_ne(x0, y0)`.
|
86
|
+
|
87
|
+
The single-point queries run in O(log n) time, where n is the size of P, while `enumerate_3_sided` runs in O(m + log n), where m is
|
88
|
+
the number of points actually enumerated.
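
To make the meaning of the queries concrete, here is a brute-force O(n) reference for two of them. It only pins down what the answers should be; it says nothing about how the gem computes them in O(log n). The names `SketchPoint`, `brute_largest_y_in_ne`, and `brute_enumerate_3_sided` are hypothetical, and treating the quadrant boundary as closed is an assumption (the README states closed bounds only for the 3-sided region). Tie-breaking between equal y values is not modelled.

```ruby
SketchPoint = Struct.new(:x, :y)

# Highest point in the quadrant north-east of (x0, y0), or nil if there is none.
# Assumption: the quadrant includes its boundary.
def brute_largest_y_in_ne(points, x0, y0)
  points.select { |p| p.x >= x0 && p.y >= y0 }.max_by(&:y)
end

# All points with x0 <= x <= x1 and y0 <= y.
def brute_enumerate_3_sided(points, x0, x1, y0)
  points.select { |p| p.x.between?(x0, x1) && p.y >= y0 }
end

points = [[1, 5], [2, 3], [4, 7], [6, 2]].map { |x, y| SketchPoint.new(x, y) }

puts brute_largest_y_in_ne(points, 2, 0).to_a.inspect                # [4, 7]
puts brute_enumerate_3_sided(points, 1, 4, 4).map(&:to_a).inspect    # [[1, 5], [4, 7]]
```

A reference like this is also handy for randomized testing of a real PST: generate points, run both implementations, and compare the answers.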
|
89
|
+
|
90
|
+
The implementation is in `MaxPrioritySearchTree` (MaxPST for short), so called because internally the structure is, among other
|
91
|
+
things, a max-heap on the y-coordinates.
|
92
|
+
|
93
|
+
These queries appear rather abstract at first but there are interesting applications. See, for example, section 4 of
|
94
|
+
[[McC1985]](#references), keeping in mind that the data structure in that paper is actually a _MinPST_.
|
95
|
+
|
96
|
+
We also provide a `MinPrioritySearchTree`, which answers analogous queries in the southward-infinite quadrants and 3-sided
|
97
|
+
regions.
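
The changelog in this release notes that the MinPST is a thin layer over a MaxPST with negated y values. A sketch of that reduction, using a brute-force stand-in for the max-y query, might look like the following. The names `SketchPt`, `max_y_in_ne`, and `min_y_in_se` are hypothetical and the closed boundary is an assumption; the gem's actual method names and internals may differ.

```ruby
SketchPt = Struct.new(:x, :y)

# Stand-in for a MaxPST query: highest point with x >= x0 and y >= y0.
# A real MaxPST would answer this in O(log n); this version is O(n).
def max_y_in_ne(points, x0, y0)
  points.select { |p| p.x >= x0 && p.y >= y0 }.max_by(&:y)
end

# Lowest point in the quadrant south-east of (x0, y0), i.e. x >= x0 and y <= y0.
# Negate every y, ask the max-y question at (x0, -y0), then negate the answer back.
def min_y_in_se(points, x0, y0)
  negated = points.map { |p| SketchPt.new(p.x, -p.y) }
  hit = max_y_in_ne(negated, x0, -y0)
  hit && SketchPt.new(hit.x, -hit.y)
end

points = [[1, 5], [2, 3], [4, 7], [6, 2]].map { |x, y| SketchPt.new(x, y) }
puts min_y_in_se(points, 2, 6).to_a.inspect # [6, 2]
```

The condition y <= y0 becomes -y >= -y0 after negation, which is why a "south" query on the original points is a "north" query on the negated ones.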
|
98
|
+
|
99
|
+
By default these data structures are immutable: once constructed they cannot be changed. But there is a constructor option that
|
100
|
+
makes the instance "dynamic". This allows us to delete the element at the root of the tree - the one with largest y value (smallest
|
101
|
+
for MinPST) - with the `delete_top!` method. This operation is important in certain algorithms, such as enumerating all maximal
|
102
|
+
empty rectangles (see the second paper by De et al. [[DMNS2013]](#references)). Note that points still cannot be added to the PST in
|
103
|
+
any case, and choosing the dynamic option makes certain internal bookkeeping operations slower.
|
104
|
+
|
105
|
+
In [[DMNS2013]](#references) De et al. generalize the in-place structure to a _Min-max Priority Search Tree_ (MinmaxPST) that can
|
106
|
+
answer queries in all four quadrants and both "kinds" of 3-sided boxes. Having one of these would save the trouble of constructing
|
107
|
+
both a MaxPST and MinPST. But the presentation is hard to follow in places and the paper's pseudocode is buggy.[^minmaxpst]
|
108
|
+
|
109
|
+
## Segment Tree
|
110
|
+
|
111
|
+
Segment trees store information related to subintervals of a certain array. For example, they can be used to find the sum of the
|
112
|
+
elements in an arbitrary subinterval A[i..j] of an array A[0..n] in O(log n) time. Each node in the tree corresponds to a subarray
|
113
|
+
of A in such a way that the values we store in the nodes can be combined efficiently to determine the desired result for arbitrary
|
114
|
+
subarrays.
|
115
|
+
|
116
|
+
An excellent description of the idea is found at https://cp-algorithms.com/data_structures/segment_tree.html.
|
117
|
+
|
118
|
+
Generic code is provided in `SegmentTreeTemplate`. Concrete classes are written by providing a handful of simple lambdas and
|
119
|
+
constants to the template class's initializer. Figuring out the details requires some knowledge of the internal mechanisms of a
|
120
|
+
segment tree, for which the link at cp-algorithms.com is very helpful. See the definitions of the concrete classes,
|
121
|
+
`MaxValSegmentTree` and `IndexOfMaxValSegmentTree`, for examples.
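
For readers new to the idea, here is a compact, self-contained max-value segment tree. It is a sketch of the general technique only: it does not use `SegmentTreeTemplate`, and the class name `SketchMaxSegmentTree` and its method names are assumptions, not the gem's API. It uses the common array layout in which the leaves sit at indices n..2n-1 and node i combines nodes 2i and 2i+1.

```ruby
# A minimal max-value segment tree sketch (illustrative; not the gem's template class).
class SketchMaxSegmentTree
  def initialize(values)
    @n = values.size
    @tree = Array.new(2 * @n, -Float::INFINITY)
    values.each_with_index { |v, i| @tree[@n + i] = v } # leaves
    (@n - 1).downto(1) { |i| @tree[i] = [@tree[2 * i], @tree[2 * i + 1]].max }
  end

  # Maximum of values[i..j] (inclusive) in O(log n).
  def max_on(i, j)
    raise ArgumentError, "bad range #{i}..#{j}" unless i >= 0 && i <= j && j < @n

    best = -Float::INFINITY
    lo = i + @n
    hi = j + @n + 1 # work with the half-open range [lo, hi)
    while lo < hi
      if lo.odd? # lo is a right child, so its parent covers too much: take lo itself
        best = [best, @tree[lo]].max
        lo += 1
      end
      if hi.odd? # hi - 1 is a left child: take it and shrink the range
        hi -= 1
        best = [best, @tree[hi]].max
      end
      lo /= 2
      hi /= 2
    end
    best
  end

  # Set values[i] = value and repair the ancestors in O(log n).
  def update(i, value)
    raise ArgumentError, "bad index #{i}" unless i >= 0 && i < @n

    pos = i + @n
    @tree[pos] = value
    pos /= 2
    while pos >= 1
      @tree[pos] = [@tree[2 * pos], @tree[2 * pos + 1]].max
      pos /= 2
    end
  end
end

tree = SketchMaxSegmentTree.new([3, 1, 4, 1, 5, 9, 2, 6])
puts tree.max_on(0, 3) # 4
puts tree.max_on(2, 4) # 5
tree.update(3, 10)
puts tree.max_on(0, 3) # 10
```

Replacing `max` and `-Float::INFINITY` with another associative operation and its identity (for example `+` and `0`) gives a sum tree, which is the kind of variation the lambdas and constants passed to the template class are meant to capture.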
|
122
|
+
|
123
|
+
## Algorithms
|
124
|
+
|
125
|
+
The Algorithms submodule contains some algorithms using the data structures.
|
126
|
+
|
127
|
+
- `maximal_empty_rectangles(points)`
|
128
|
+
- We are given a set P contained in a minimal box B = [x_min, x_max] x [y_min, y_max]. An _empty rectangle_ is an axis-parallel
|
129
|
+
rectangle with positive area contained in B containing no element of P in its interior. A _maximal empty rectangle_ is an empty
|
130
|
+
rectangle not properly contained in any other empty rectangle. This method yields each maximal empty rectangle in the form
|
131
|
+
[left, right, bottom, top].
|
132
|
+
- The algorithm is due to [[DMNS2013]](#references).
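
The definition of an _empty rectangle_ can be checked directly. The helper below is a hypothetical illustration (the name `empty_rectangle?` and the `[x, y]` array representation of points are assumptions); it is not part of the gem and does not implement the algorithm of De et al. It tests only positive area and the absence of points strictly inside the rectangle, not containment in the bounding box B or maximality.

```ruby
# Does the rectangle [left, right] x [bottom, top] have positive area and
# contain no point of `points` strictly in its interior?
# Points are given as [x, y] pairs; points on the boundary are allowed.
def empty_rectangle?(points, left, right, bottom, top)
  return false unless left < right && bottom < top

  points.none? { |(x, y)| left < x && x < right && bottom < y && y < top }
end

points = [[0, 0], [4, 4], [2, 2]]
puts empty_rectangle?(points, 0, 4, 0, 2) # true: (2, 2) lies on the top edge
puts empty_rectangle?(points, 0, 4, 0, 4) # false: (2, 2) is in the interior
```

A check like this can be used to sanity-test each `[left, right, bottom, top]` rectangle yielded by an enumeration.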
|
133
|
+
|
134
|
+
# References
|
135
|
+
- [TvL1984] Tarjan, Robert E., van Leeuwen, J., _Worst-case Analysis of Set Union Algorithms_, Journal of the ACM, v31:2 (1984), pp 245–281.
|
136
|
+
- [EEK2017] Edelkamp, S., Elmasry, A., Katajainen, J., _Optimizing Binary Heaps_, Theory Comput Syst (2017), vol 61, pp 606-636, DOI 10.1007/s00224-017-9760-2.
|
137
|
+
- [McC1985] McCreight, E. M., _Priority Search Trees_, SIAM J. Comput., 14(2):257-276, 1985.
|
138
|
+
- [DMNS2011] De, M., Maheshwari, A., Nandy, S. C., Smid, M., _An In-Place Priority Search Tree_, 23rd Canadian Conference on Computational Geometry, 2011.
|
139
|
+
- [DMNS2013] De, M., Maheshwari, A., Nandy, S. C., Smid, M., _An In-Place Min-max Priority Search Tree_, Computational Geometry, v46 (2013), pp 310-327.
|
140
|
+
|
141
|
+
[^minmaxpst]: See the comments in the fragmentary class `MinMaxPrioritySearchTree` for further details.
|
data/Rakefile
ADDED
@@ -0,0 +1,16 @@
|
|
1
|
+
require 'rubygems'
|
2
|
+
require 'rake/testtask'
|
3
|
+
require 'rake/extensiontask'
|
4
|
+
|
5
|
+
Rake::ExtensionTask.new('data_structures_rmolinari/c_disjoint_union') do |ext|
|
6
|
+
ext.name = 'CDisjointUnion'
|
7
|
+
ext.ext_dir = 'ext/c_disjoint_union'
|
8
|
+
ext.lib_dir = 'lib/data_structures_rmolinari/'
|
9
|
+
end
|
10
|
+
|
11
|
+
Rake::TestTask.new do |t|
|
12
|
+
t.libs << 'test'
|
13
|
+
end
|
14
|
+
|
15
|
+
desc 'Run Tests'
|
16
|
+
task default: :test
|
data/ext/c_disjoint_union/disjoint_union.c
ADDED
@@ -0,0 +1,412 @@
|
|
1
|
+
/*
|
2
|
+
* This is a C implementation of a simple Ruby Disjoint Union data structure.
|
3
|
+
*
|
4
|
+
* A Disjoint Union doesn't have much of an implementation in Ruby: see disjoint_union.rb in this gem. This means that we don't gain
|
5
|
+
* much by implementing it in C but that it serves as a good learning experience for me.
|
6
|
+
*
|
7
|
+
* It turns out that writing a C extension for Ruby like this isn't very complicated, but there are a bunch of moving parts and the
|
8
|
+
* available documentation is a bit of a slog. Writing this was very educational.
|
9
|
+
*
|
10
|
+
* See https://docs.ruby-lang.org/en/master/extension_rdoc.html for some documentation. It's a bit hard to read in places, but
|
11
|
+
* plugging away at things helps.
|
12
|
+
*
|
13
|
+
* https://guides.rubygems.org/gems-with-extensions/ is a decent tutorial, though it leaves out lots of details.
|
14
|
+
*
|
15
|
+
* See https://aaronbedra.com/extending-ruby/ for another tutorial.
|
16
|
+
*/
|
17
|
+
|
18
|
+
#include "ruby.h"
|
19
|
+
|
20
|
+
// The Shared::DataError exception type in the Ruby code. We only need it when we detect a runtime error, so a macro is simplest and
|
21
|
+
// just fine.
|
22
|
+
#define mShared rb_define_module("Shared")
|
23
|
+
#define eDataError rb_const_get(mShared, rb_intern_const("DataError"))
|
24
|
+
|
25
|
+
/**
|
26
|
+
* It's been so long since I've written non-trivial C that I need to copy examples from online.
|
27
|
+
*
|
28
|
+
* Dynamic array of longs, with an initial value for otherwise uninitialized elements.
|
29
|
+
* Based on https://stackoverflow.com/questions/3536153/c-dynamically-growing-array
|
30
|
+
*/
|
31
|
+
typedef struct {
|
32
|
+
long *array;
|
33
|
+
size_t size;
|
34
|
+
long default_val;
|
35
|
+
} DynamicArray;
|
36
|
+
|
37
|
+
void initDynamicArray(DynamicArray *a, size_t initial_size, long default_val) {
|
38
|
+
a->array = malloc(initial_size * sizeof(long));
|
39
|
+
a->size = initial_size;
|
40
|
+
a->default_val = default_val;
|
41
|
+
|
42
|
+
for (size_t i = 0; i < initial_size; i++) {
|
43
|
+
a->array[i] = default_val;
|
44
|
+
}
|
45
|
+
}
|
46
|
+
|
47
|
+
void insertDynamicArray(DynamicArray *a, unsigned long index, long element) {
|
48
|
+
if (a->size <= index) {
|
49
|
+
size_t new_size = a->size;
|
50
|
+
while (new_size <= index) {
|
51
|
+
new_size = 8 * new_size / 5 + 8; // 8/5 gives "Fibonacci-like" growth; adding 8 to avoid small arrays having to reallocate
|
52
|
+
// too often. Who knows if it's worth being "clever".
|
53
|
+
}
|
54
|
+
|
55
|
+
long* new_array = realloc(a->array, new_size * sizeof(long));
|
56
|
+
if (!new_array) {
|
57
|
+
rb_raise(rb_eRuntimeError, "Cannot allocate memory to expand DynamicArray!");
|
58
|
+
}
|
59
|
+
|
60
|
+
a->array = new_array;
|
61
|
+
for (size_t i = a->size; i < new_size; i++) {
|
62
|
+
a->array[i] = a->default_val;
|
63
|
+
}
|
64
|
+
|
65
|
+
a->size = new_size;
|
66
|
+
}
|
67
|
+
|
68
|
+
a->array[index] = element;
|
69
|
+
}
|
70
|
+
|
71
|
+
void freeDynamicArray(DynamicArray *a) {
|
72
|
+
free(a->array);
|
73
|
+
a->array = NULL;
|
74
|
+
a->size = 0;
|
75
|
+
}
|
76
|
+
|
77
|
+
/**
|
78
|
+
* The C implementation of a Disjoint Union
|
79
|
+
*
|
80
|
+
* See Tarjan, Robert E., van Leeuwen, J., _Worst-case Analysis of Set Union Algorithms_, Journal of the ACM, v31:2 (1984), pp 245–281.
|
81
|
+
*/
|
82
|
+
|
83
|
+
/*
|
84
|
+
* The Disjoint Union struct.
|
85
|
+
* - forest: an array of longs giving, for each element, the parent element of its tree.
|
86
|
+
* - An element e is the root of its tree just when forest[e] == e.
|
87
|
+
* - Two elements are in the same subset just when they are in the same tree in the forest.
|
88
|
+
* - So the key idea is that we can check this by navigating via parents from each element to their roots. Clever optimizations
|
89
|
+
* keep the trees flat and so most nodes are close to their roots.
|
90
|
+
* - rank: an array of longs giving the "rank" of each element.
|
91
|
+
* - This value is used to guide the "linking" of trees when subsets are being merged to keep the trees flat. See Tarjan & van
|
92
|
+
* Leeuwen
|
93
|
+
* - subset_count: the number of (disjoint) subsets.
|
94
|
+
* - it isn't needed internally but may be useful to client code.
|
95
|
+
*/
|
96
|
+
typedef struct du_data {
|
97
|
+
DynamicArray* forest; // the forest that describes the unified subsets
|
98
|
+
DynamicArray* rank; // the "ranks" of the elements, used when uniting subsets
|
99
|
+
size_t subset_count;
|
100
|
+
} disjoint_union_data;
|
101
|
+
|
102
|
+
/*
|
103
|
+
* Create one.
|
104
|
+
*
|
105
|
+
* The dynamic arrays are initialized with a size of 100 because I didn't have a better idea. This will end up getting called from
|
106
|
+
* the Ruby #allocate method, which happens before #initialize. Thus we don't know the calling code's desired initial size.
|
107
|
+
*/
|
108
|
+
#define INITIAL_SIZE 100
|
109
|
+
static disjoint_union_data* create_disjoint_union() {
|
110
|
+
disjoint_union_data* disjoint_union = malloc(sizeof(disjoint_union_data));
|
111
|
+
|
112
|
+
// Allocate the structures
|
113
|
+
DynamicArray* forest = malloc(sizeof(DynamicArray));
|
114
|
+
DynamicArray* rank = malloc(sizeof(DynamicArray));
|
115
|
+
initDynamicArray(forest, INITIAL_SIZE, -1);
|
116
|
+
initDynamicArray(rank, INITIAL_SIZE, 0);
|
117
|
+
|
118
|
+
disjoint_union->forest = forest;
|
119
|
+
disjoint_union->rank = rank;
|
120
|
+
disjoint_union->subset_count = 0;
|
121
|
+
|
122
|
+
return disjoint_union;
|
123
|
+
}
|
124
|
+
|
125
|
+
/*
|
126
|
+
* Free the memory associated with a disjoint union. This will end up getting triggered by the Ruby garbage collector.
|
127
|
+
*/
|
128
|
+
static void disjoint_union_free(void *ptr) {
|
129
|
+
if (ptr) {
|
130
|
+
disjoint_union_data *disjoint_union = ptr;
|
131
|
+
freeDynamicArray(disjoint_union->forest);
|
132
|
+
freeDynamicArray(disjoint_union->rank);
|
133
|
+
|
134
|
+
free(disjoint_union->forest);
|
135
|
+
disjoint_union->forest = NULL;
|
136
|
+
|
137
|
+
free(disjoint_union->rank);
|
138
|
+
disjoint_union->rank = NULL;
|
139
|
+
|
140
|
+
free(disjoint_union);
|
141
|
+
}
|
142
|
+
}
|
143
|
+
|
144
|
+
/************************************************************
|
145
|
+
* The disjoint union operations
|
146
|
+
************************************************************/
|
147
|
+
|
148
|
+
/*
|
149
|
+
* Is the given element already a member of the universe?
|
150
|
+
*/
|
151
|
+
static int present_p(disjoint_union_data* disjoint_union, size_t element) {
|
152
|
+
DynamicArray* forest = disjoint_union->forest;
|
153
|
+
return (forest->size > element && (forest->array[element] != forest->default_val));
|
154
|
+
}
|
155
|
+
|
156
|
+
/*
|
157
|
+
* Check that the given element is a member of the universe and raise Shared::DataError (ruby-side) if not
|
158
|
+
*/
|
159
|
+
static void assert_membership(disjoint_union_data* disjoint_union, size_t element) {
|
160
|
+
if (!present_p(disjoint_union, element)) {
|
161
|
+
rb_raise(eDataError, "Value %zu is not part of the universe", element);
|
162
|
+
}
|
163
|
+
}
|
164
|
+
|
165
|
+
/*
|
166
|
+
* Add a new element to the universe. It starts out in its own singleton subset.
|
167
|
+
*
|
168
|
+
* Shared::DataError is raised if it is already an element.
|
169
|
+
*/
|
170
|
+
static void add_new_element(disjoint_union_data* disjoint_union, size_t element) {
|
171
|
+
if (present_p(disjoint_union, element)) {
|
172
|
+
rb_raise(eDataError, "Element %zu already present in the universe", element);
|
173
|
+
}
|
174
|
+
|
175
|
+
insertDynamicArray(disjoint_union->forest, element, element);
|
176
|
+
insertDynamicArray(disjoint_union->rank, element, 0);
|
177
|
+
disjoint_union->subset_count++;
|
178
|
+
}
|
179
|
+
|
180
|
+
/*
|
181
|
+
* Find the canonical representative of the given element. This is the root of the tree (in forest) containing element.
|
182
|
+
*
|
183
|
+
* Two elements are in the same subset exactly when their canonical representatives are equal.
|
184
|
+
*/
|
185
|
+
static size_t find(disjoint_union_data* disjoint_union, size_t element) {
|
186
|
+
assert_membership(disjoint_union, element);
|
187
|
+
|
188
|
+
// We implement find with "halving" to shrink the length of paths to the root. See Tarjan and van Leeuwen p 252.
|
189
|
+
long* d = disjoint_union->forest->array; // the actual forest data
|
190
|
+
size_t x = element;
|
191
|
+
while (d[d[x]] != d[x]) {
|
192
|
+
x = d[x] = d[d[x]];
|
193
|
+
}
|
194
|
+
return d[x];
|
195
|
+
}
|
196
|
+
|
197
|
+
/*
|
198
|
+
* "Link" the two given elements so that they are in the same subset now.
|
199
|
+
*
|
200
|
+
* In other words, merge the subtrees containing the two elements.
|
201
|
+
*
|
202
|
+
* Good performance (see Tarjan and van Leeuwen) assumes that elt1 and elt2 are distinct and already the roots of their trees,
|
203
|
+
* though we don't check that here.
|
204
|
+
*/
|
205
|
+
static void link_roots(disjoint_union_data* disjoint_union, size_t elt1, size_t elt2) {
|
206
|
+
long* rank = disjoint_union->rank->array;
|
207
|
+
long* forest = disjoint_union->forest->array;
|
208
|
+
|
209
|
+
if (rank[elt1] > rank[elt2]) {
|
210
|
+
forest[elt2] = elt1;
|
211
|
+
} else if (rank[elt1] == rank[elt2]) {
|
212
|
+
forest[elt2] = elt1;
|
213
|
+
rank[elt1]++;
|
214
|
+
} else {
|
215
|
+
forest[elt1] = elt2;
|
216
|
+
}
|
217
|
+
|
218
|
+
disjoint_union->subset_count--;
|
219
|
+
}
|
220
|
+
|
221
|
+
/*
|
222
|
+
* "Unite" or merge the subsets containing elt1 and elt2.
|
223
|
+
*/
|
224
|
+
static void unite(disjoint_union_data* disjoint_union, size_t elt1, size_t elt2) {
|
225
|
+
assert_membership(disjoint_union, elt1);
|
226
|
+
assert_membership(disjoint_union, elt2);
|
227
|
+
|
228
|
+
if (elt1 == elt2) {
|
229
|
+
rb_raise(eDataError, "Uniting an element with itself is meaningless");
|
230
|
+
}
|
231
|
+
|
232
|
+
size_t root1 = find(disjoint_union, elt1);
|
233
|
+
size_t root2 = find(disjoint_union, elt2);
|
234
|
+
|
235
|
+
if (root1 == root2) {
|
236
|
+
return; // already united
|
237
|
+
}
|
238
|
+
|
239
|
+
link_roots(disjoint_union, root1, root2);
|
240
|
+
}
|
241
|
+
|
242
|
+
|
243
|
+
/**
|
244
|
+
* Wrapping and unwrapping things for the Ruby runtime
|
245
|
+
*
|
246
|
+
*/
|
247
|
+
|
248
|
+
// How much memory (roughly) does a disjoint_union_data instance consume? I guess the Ruby runtime can use this information when
|
249
|
+
// deciding how aggressive to be during garbage collection and such.
|
250
|
+
static size_t disjoint_union_memsize(const void *ptr) {
|
251
|
+
if (ptr) {
|
252
|
+
const disjoint_union_data *disjoint_union = ptr;
|
253
|
+
return (2 * disjoint_union->forest->size * sizeof(long)); // disjoint_union->rank is the same size
|
254
|
+
} else {
|
255
|
+
return 0;
|
256
|
+
}
|
257
|
+
}
|
258
|
+
|
259
|
+
/*
|
260
|
+
* A configuration struct that tells the Ruby runtime how to deal with a disjoint_union_data object.
|
261
|
+
*
|
262
|
+
* https://docs.ruby-lang.org/en/master/extension_rdoc.html#label-Encapsulate+C+data+into+a+Ruby+object
|
263
|
+
*/
|
264
|
+
static const rb_data_type_t disjoint_union_type = {
|
265
|
+
.wrap_struct_name = "disjoint_union",
|
266
|
+
{ // help for the Ruby garbage collector
|
267
|
+
.dmark = NULL, // dmark, for marking other Ruby objects. We don't hold any other objects so this can be NULL
|
268
|
+
.dfree = disjoint_union_free, // how to free the memory associated with an object
|
269
|
+
.dsize = disjoint_union_memsize, // roughly how much space does the object consume?
|
270
|
+
},
|
271
|
+
.data = NULL, // a data field we could use for something here if we wanted. Ruby ignores it
|
272
|
+
.flags = 0 // GC-related flag values.
|
273
|
+
};
|
274
|
+
|
275
|
+
/*
|
276
|
+
* Helper: check that a Ruby value is a non-negative Fixnum and convert it to a nice C long
|
277
|
+
*
|
278
|
+
* TODO: can we return a size_t or unsigned long instead?
|
279
|
+
*/
|
280
|
+
static long checked_nonneg_fixnum(VALUE val) {
|
281
|
+
Check_Type(val, T_FIXNUM);
|
282
|
+
long c_val = FIX2LONG(val);
|
283
|
+
|
284
|
+
if (c_val < 0) {
|
285
|
+
rb_raise(eDataError, "Value must be non-negative");
|
286
|
+
}
|
287
|
+
|
288
|
+
return c_val;
|
289
|
+
}
|
290
|
+
|
291
|
+
/*
|
292
|
+
* Unwrap a Rubyfied disjoint union to get the C struct inside.
|
293
|
+
*/
|
294
|
+
static disjoint_union_data* unwrapped(VALUE self) {
|
295
|
+
disjoint_union_data* disjoint_union;
|
296
|
+
TypedData_Get_Struct((self), disjoint_union_data, &disjoint_union_type, disjoint_union);
|
297
|
+
return disjoint_union;
|
298
|
+
}
|
299
|
+
|
300
|
+
/*
|
301
|
+
* This is for CDisjointUnion.allocate on the Ruby side
|
302
|
+
*/
|
303
|
+
static VALUE disjoint_union_alloc(VALUE klass) {
|
304
|
+
disjoint_union_data* disjoint_union = create_disjoint_union();
|
305
|
+
return TypedData_Wrap_Struct(klass, &disjoint_union_type, disjoint_union);
|
306
|
+
}
|
307
|
+
|
308
|
+
/*
|
309
|
+
* A single parameter is optional. If given it should be a non-negative integer and specifies the initial size, s, of the universe
|
310
|
+
* 0, 1, ..., s-1.
|
311
|
+
*
|
312
|
+
* If no argument is given we act as though a value of 0 were passed.
|
313
|
+
*/
|
314
|
+
static VALUE disjoint_union_init(int argc, VALUE *argv, VALUE self) {
|
315
|
+
if (argc == 0) {
|
316
|
+
return self;
|
317
|
+
} else if (argc > 1) {
|
318
|
+
rb_raise(rb_eArgError, "wrong number of arguments");
|
319
|
+
} else {
|
320
|
+
size_t initial_size = checked_nonneg_fixnum(argv[0]);
|
321
|
+
disjoint_union_data* disjoint_union = unwrapped(self);
|
322
|
+
|
323
|
+
for (size_t i = 0; i < initial_size; i++) {
|
324
|
+
add_new_element(disjoint_union, i);
|
325
|
+
}
|
326
|
+
}
|
327
|
+
return self;
|
328
|
+
}
|
329
|
+
|
330
|
+
/**
|
331
|
+
* And now the simple wrappers around the Disjoint Union C functionality. In each case we
|
332
|
+
* - unwrap a 'VALUE self',
|
333
|
+
* - i.e., the CDisjointUnion instance that contains a disjoint_union_data struct;
|
334
|
+
* - munge any other arguments into longs;
|
335
|
+
* - call the appropriate C function to act on the struct; and
|
336
|
+
* - return an appropriate VALUE that the Ruby runtime can use.
|
337
|
+
*
|
338
|
+
* We make them into methods on CDisjointUnion in the Init_c_disjoint_union function, below.
|
339
|
+
*/
|
340
|
+
|
341
|
+
/*
|
342
|
+
* Add a new subset to the universe containing the element +new_v+.
|
343
|
+
*
|
344
|
+
* @param the new element, starting in its own singleton subset
|
345
|
+
* - it must be a non-negative integer, not already part of the universe of elements.
|
346
|
+
*/
|
347
|
+
static VALUE disjoint_union_make_set(VALUE self, VALUE arg) {
|
348
|
+
add_new_element(unwrapped(self), checked_nonneg_fixnum(arg));
|
349
|
+
|
350
|
+
return Qnil;
|
351
|
+
}
|
352
|
+
|
353
|
+
/*
|
354
|
+
* @return the number of subsets into which the universe is currently partitioned.
|
355
|
+
*/
|
356
|
+
static VALUE disjoint_union_subset_count(VALUE self) {
|
357
|
+
return LONG2NUM(unwrapped(self)->subset_count);
|
358
|
+
}
|
359
|
+
|
360
|
+
/*
|
361
|
+
* The canonical representative of the subset containing e. Two elements d and e are in the same subset exactly when find(d) ==
|
362
|
+
* find(e).
|
363
|
+
*
|
364
|
+
* The parameter must be in the universe of elements.
|
365
|
+
*
|
366
|
+
* @return (Integer) one of the universe of elements
|
367
|
+
*/
|
368
|
+
static VALUE disjoint_union_find(VALUE self, VALUE arg) {
|
369
|
+
return LONG2NUM(find(unwrapped(self), checked_nonneg_fixnum(arg)));
|
370
|
+
}
|
371
|
+
|
372
|
+
/*
|
373
|
+
* Declare that the arguments are equivalent, i.e., in the same subset. If they are already in the same subset this is a no-op.
|
374
|
+
*
|
375
|
+
* Each argument must be in the universe of elements
|
376
|
+
*/
|
377
|
+
static VALUE disjoint_union_unite(VALUE self, VALUE arg1, VALUE arg2) {
|
378
|
+
unite(unwrapped(self), checked_nonneg_fixnum(arg1), checked_nonneg_fixnum(arg2));
|
379
|
+
|
380
|
+
return Qnil;
|
381
|
+
}
|
382
|
+
|
383
|
+
/*
|
384
|
+
* A Disjoint Union.
|
385
|
+
*
|
386
|
+
* A "disjoint set union" that represents a set of elements belonging to _disjoint_ subsets. Alternatively, this expresses a
|
387
|
+
* partition of a fixed set.
|
388
|
+
*
|
389
|
+
* The data structure provides efficient actions to merge two disjoint subsets, i.e., replace them by their union, and determine if
|
390
|
+
* two elements are in the same subset.
|
391
|
+
*
|
392
|
+
* The elements of the set are 0, 1, ..., n-1, where n is the size of the universe. Client code can map its data to these
|
393
|
+
* representatives.
|
394
|
+
*
|
395
|
+
* See https://en.wikipedia.org/wiki/Disjoint-set_data_structure for a good introduction.
|
396
|
+
*
|
397
|
+
* The code uses several ideas from Tarjan and van Leeuwen for efficiency. We use "union by rank" in +unite+ and path-halving in
|
398
|
+
* +find+. Together, these make the amortized cost of each operation effectively constant.
|
399
|
+
*
|
400
|
+
* - Tarjan, Robert E., van Leeuwen, Jan (1984). _Worst-case analysis of set union algorithms_. Journal of the ACM. 31 (2): 245–281.
|
401
|
+
*/
|
402
|
+
void Init_c_disjoint_union() {
|
403
|
+
VALUE mDataStructuresRMolinari = rb_define_module("DataStructuresRMolinari");
|
404
|
+
VALUE cDisjointUnion = rb_define_class_under(mDataStructuresRMolinari, "CDisjointUnion", rb_cObject);
|
405
|
+
|
406
|
+
rb_define_alloc_func(cDisjointUnion, disjoint_union_alloc);
|
407
|
+
rb_define_method(cDisjointUnion, "initialize", disjoint_union_init, -1);
|
408
|
+
rb_define_method(cDisjointUnion, "make_set", disjoint_union_make_set, 1);
|
409
|
+
rb_define_method(cDisjointUnion, "subset_count", disjoint_union_subset_count, 0);
|
410
|
+
rb_define_method(cDisjointUnion, "find", disjoint_union_find, 1);
|
411
|
+
rb_define_method(cDisjointUnion, "unite", disjoint_union_unite, 2);
|
412
|
+
}
|
data/ext/c_disjoint_union/extconf.rb
ADDED
@@ -0,0 +1,12 @@
|
|
1
|
+
require 'mkmf'
|
2
|
+
|
3
|
+
abort 'missing malloc()' unless have_func "malloc"
|
4
|
+
abort 'missing realloc()' unless have_func "realloc"
|
5
|
+
|
6
|
+
if try_cflags('-O')
|
7
|
+
append_cflags('-O')
|
8
|
+
end
|
9
|
+
|
10
|
+
extension_name = "c_disjoint_union"
|
11
|
+
dir_config(extension_name)
|
12
|
+
create_makefile("data_structures_rmolinari/c_disjoint_union")
|