data_structures_rmolinari 0.4.1 → 0.4.2
- checksums.yaml +4 -4
- data/CHANGELOG.md +21 -3
- data/README.md +141 -0
- data/Rakefile +16 -0
- data/ext/c_disjoint_union/disjoint_union.c +412 -0
- data/ext/c_disjoint_union/extconf.rb +12 -0
- data/lib/data_structures_rmolinari/algorithms.rb +103 -0
- data/lib/data_structures_rmolinari/max_priority_search_tree.rb +200 -58
- data/lib/data_structures_rmolinari/min_priority_search_tree.rb +187 -0
- data/lib/data_structures_rmolinari/{generic_segment_tree.rb → segment_tree_template.rb} +0 -0
- data/lib/data_structures_rmolinari/shared.rb +5 -16
- data/lib/data_structures_rmolinari.rb +6 -3
- metadata +12 -5
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: c912d4ddf3a7cfc721b7f298a966f7e0d4cbd4249797506457605a44774523a0
|
4
|
+
data.tar.gz: c168b7096178e496f76fa53f5b8566cd2ac26897fd5b362c3c37da5314f2a6db
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 0c88c1ad7c07fe6358e3eefd21406b4bbd33a89e89731edb3997bb027efca52a7d312ffee1435af960be73c5cd5212950a854a11c6f2105dfccc47ed4ae00c2b
|
7
|
+
data.tar.gz: 9bf6e4570017217b59f4f3a0b1d9e23d7752ba4e4b5dc11a988826726367956c6564763044f23b5105293213c4844667249a7f88726861f792264c1d634256ae
|
data/CHANGELOG.md
CHANGED
@@ -1,6 +1,24 @@
|
|
1
1
|
# Changelog
|
2
2
|
|
3
|
-
## [
|
3
|
+
## [0.4.2] 2023-01-26
|
4
|
+
|
5
|
+
### Added
|
6
|
+
|
7
|
+
- MinPrioritySearchTree added
|
8
|
+
- it's a thin layer on top of a MaxPrioritySearchTree with negated y values.
|
9
|
+
|
10
|
+
- MaxPrioritySearchTree
|
11
|
+
- A "dynamic" constructor option now allows deletion of the "top" (root) node. This is useful in certain algorithms.
|
12
|
+
|
13
|
+
- DisjointUnion
|
14
|
+
- Added a proof-of-concept implementation in C, which is about twice as fast.
|
15
|
+
|
16
|
+
- Algorithms
|
17
|
+
- Implement the Maximal Empty Rectangle algorithm of De et al. It uses a dynamic MaxPST.
|
18
|
+
|
19
|
+
## [0.4.1] 2023-01-12
|
20
|
+
|
21
|
+
- Update this file for the gem (though I forgot to add this comment first!)
|
4
22
|
|
5
23
|
## [0.4.0] 2023-01-12
|
6
24
|
|
@@ -10,10 +28,10 @@
|
|
10
28
|
- Duplicate y values are now allowed. Ties are broken with a preference for smaller values of x.
|
11
29
|
- Method names have changed
|
12
30
|
- Instead of "highest", "leftmost", "rightmost" we use "largest_y", "smallest_x", "largest_x"
|
13
|
-
- For example,
|
31
|
+
- For example, `highest_ne` is now `largest_y_in_ne`
|
14
32
|
- DisjointUnion
|
15
33
|
- the size argument to initializer is optional. The default value is 0.
|
16
|
-
- elements can be added to the "universe" of known values with
|
34
|
+
- elements can be added to the "universe" of known values with `make_set`
|
17
35
|
|
18
36
|
### Removed
|
19
37
|
- MinmaxPrioritySearchTree is no longer available
|
data/README.md
ADDED
@@ -0,0 +1,141 @@
|
|
1
|
+
# Data Structures
|
2
|
+
|
3
|
+
This is a small collection of Ruby data structures that I have implemented for my own interest. Implementing the code for a data
|
4
|
+
structure is almost always more educational than simply reading about it and is usually fun. I wrote some of them while
|
5
|
+
participating in the Advent of Code (https://adventofcode.com/).
|
6
|
+
|
7
|
+
These implementations are not particularly clever. They are based on the expository descriptions and pseudo-code I found as I read
|
8
|
+
about each structure and so are not as fast as possible.
|
9
|
+
|
10
|
+
The code is available as a gem: https://rubygems.org/gems/data_structures_rmolinari.
|
11
|
+
|
12
|
+
## Usage
|
13
|
+
|
14
|
+
The right way to organize the code is not obvious to me. For now the data structures are all defined in the module
|
15
|
+
`DataStructuresRMolinari` to avoid polluting the global namespace.
|
16
|
+
|
17
|
+
Example usage after the gem is installed:
|
18
|
+
```
|
19
|
+
require 'data_structures_rmolinari'
|
20
|
+
|
21
|
+
# Pull what we need out of the namespace
|
22
|
+
MaxPrioritySearchTree = DataStructuresRMolinari::MaxPrioritySearchTree
|
23
|
+
Point = DataStructuresRMolinari::Point # anything responding to :x and :y is fine
|
24
|
+
|
25
|
+
pst = MaxPrioritySearchTree.new([Point.new(1, 1)])
|
26
|
+
puts pst.largest_y_in_ne(0, 0) # "Point(1,1)"
|
27
|
+
```
|
28
|
+
|
29
|
+
# Implementations
|
30
|
+
|
31
|
+
## Disjoint Union
|
32
|
+
|
33
|
+
We represent a set S of non-negative integers as the disjoint union of subsets. Equivalently, we represent a partition of S. The
|
34
|
+
data structure provides very efficient implementation of the two key operations
|
35
|
+
- `unite(e, f)`, which merges the subsets containing e and f; and
|
36
|
+
- `find(e)`, which returns the canonical representative of the subset containing e. Two elements e and f are in the same subset
|
37
|
+
exactly when `find(e) == find(f)`.
|
38
|
+
|
39
|
+
It also provides
|
40
|
+
- `make_set(v)`, which adds a new value `v` to the set S, starting out in a singleton subset.
|
41
|
+
|
42
|
+
For more details see https://en.wikipedia.org/wiki/Disjoint-set_data_structure and the paper [[TvL1984]](#references) by Tarjan and
|
43
|
+
van Leeuwen.
|
44
|
+
|
45
|
+
There is an experimental implementation as a C extension in `ext/c_disjoint_union/disjoint_union.c`. The Ruby class for this implementation is
|
46
|
+
`CDisjointUnion`. Benchmarks indicate that a long sequence of `unite` calls is about twice as fast.
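
The two optimizations that keep `find` and `unite` fast, path halving and union by rank, can be sketched in a few lines of plain Ruby. This is only an illustration of the technique: the class name `SketchDisjointUnion` is made up for this example, and the code is not the gem's `DisjointUnion` or `CDisjointUnion`.

```ruby
# Illustrative only: a tiny pure-Ruby disjoint union, not the gem's implementation.
class SketchDisjointUnion
  attr_reader :subset_count

  def initialize(size = 0)
    @parent = [] # @parent[e] is the parent of e; a root is its own parent
    @rank = []   # an upper bound on the height of the tree rooted at e
    @subset_count = 0
    size.times { |e| make_set(e) }
  end

  # Add e to the universe in its own singleton subset.
  def make_set(e)
    raise ArgumentError, "#{e} is already present" if @parent[e]

    @parent[e] = e
    @rank[e] = 0
    @subset_count += 1
  end

  # Canonical representative of the subset containing e, using path halving.
  def find(e)
    raise ArgumentError, "#{e} is not in the universe" unless @parent[e]

    while @parent[e] != e
      @parent[e] = @parent[@parent[e]] # point e at its grandparent
      e = @parent[e]
    end
    e
  end

  # Merge the subsets containing e and f, using union by rank.
  def unite(e, f)
    root_e = find(e)
    root_f = find(f)
    return if root_e == root_f

    # Hang the shallower tree below the deeper one.
    root_e, root_f = root_f, root_e if @rank[root_e] < @rank[root_f]
    @parent[root_f] = root_e
    @rank[root_e] += 1 if @rank[root_e] == @rank[root_f]
    @subset_count -= 1
  end
end

du = SketchDisjointUnion.new(5)
du.unite(0, 1)
du.unite(3, 4)
du.unite(1, 4)
same = du.find(0) == du.find(3)  # 0 and 3 are now in the same subset
apart = du.find(2) != du.find(0) # 2 is still in a subset of its own
```

After the three `unite` calls the universe {0, 1, 2, 3, 4} is partitioned into {0, 1, 3, 4} and {2}.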
|
47
|
+
|
48
|
+
## Heap
|
49
|
+
|
50
|
+
This is a standard binary heap with an `update` method, suitable for use as a priority queue. There are several supported
|
51
|
+
operations:
|
52
|
+
|
53
|
+
- `insert(item, priority)`, insert the given item with the stated priority.
|
54
|
+
- By default, items must be distinct.
|
55
|
+
- `top`, return the element with smallest priority
|
56
|
+
- `pop`, return the element with smallest priority and remove it from the structure
|
57
|
+
- `update(item, priority)`, update the priority of the given item, which must already be in the heap
|
58
|
+
|
59
|
+
`top` is O(1). The others are O(log n) where n is the number of items in the heap.
|
60
|
+
|
61
|
+
By default we have a min-heap: the top element is the one with smallest priority. A configuration parameter at construction can make
|
62
|
+
it a max-heap.
|
63
|
+
|
64
|
+
Another configuration parameter allows the creation of a "non-addressable" heap. This makes it impossible to call `update`, but
|
65
|
+
allows the insertion of duplicate items (which is sometimes useful) and slightly faster operation overall.
|
66
|
+
|
67
|
+
See https://en.wikipedia.org/wiki/Binary_heap and the paper by Edelkamp, Elmasry, and Katajainen [[EEK2017]](#references).
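
To make the role of `update` concrete, here is a small sketch of an addressable binary min-heap. It assumes the simplest possible design, an array in heap order plus a hash from each item to its position. The class name `SketchHeap` is hypothetical and the code is not the gem's `Heap`.

```ruby
# Illustrative only: a small addressable binary min-heap, not the gem's Heap class.
class SketchHeap
  def initialize
    @data = []     # [priority, item] pairs in heap order
    @index_of = {} # item => position in @data; this is what makes update possible
  end

  def insert(item, priority)
    raise ArgumentError, "duplicate item #{item.inspect}" if @index_of.key?(item)

    @data << [priority, item]
    @index_of[item] = @data.size - 1
    sift_up(@data.size - 1)
  end

  # The item with smallest priority, or nil if the heap is empty; O(1).
  def top
    @data.first&.last
  end

  # Remove and return the item with smallest priority; O(log n).
  def pop
    return nil if @data.empty?

    swap(0, @data.size - 1)
    _priority, item = @data.pop
    @index_of.delete(item)
    sift_down(0) unless @data.empty?
    item
  end

  # Change the priority of an item that is already in the heap; O(log n).
  def update(item, priority)
    i = @index_of.fetch(item)
    old_priority = @data[i][0]
    @data[i][0] = priority
    priority < old_priority ? sift_up(i) : sift_down(i)
  end

  private

  def swap(i, j)
    @data[i], @data[j] = @data[j], @data[i]
    @index_of[@data[i][1]] = i
    @index_of[@data[j][1]] = j
  end

  def sift_up(i)
    while i.positive?
      parent = (i - 1) / 2
      break if @data[parent][0] <= @data[i][0]

      swap(i, parent)
      i = parent
    end
  end

  def sift_down(i)
    loop do
      smallest = i
      [2 * i + 1, 2 * i + 2].each do |child|
        smallest = child if child < @data.size && @data[child][0] < @data[smallest][0]
      end
      break if smallest == i

      swap(i, smallest)
      i = smallest
    end
  end
end

heap = SketchHeap.new
heap.insert(:write_report, 5)
heap.insert(:fix_bug, 2)
heap.insert(:drink_coffee, 9)
heap.update(:drink_coffee, 1)          # coffee is now the most urgent
order = [heap.pop, heap.pop, heap.pop] # items come out by increasing priority
```

Dropping the `@index_of` hash gives the "non-addressable" variant: `update` is no longer possible, but duplicate items are fine and there is less bookkeeping per swap.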
|
68
|
+
|
69
|
+
## Priority Search Tree
|
70
|
+
|
71
|
+
A PST stores a set P of two-dimensional points in a way that allows certain queries about P to be answered efficiently. The data
|
72
|
+
structure was introduced by McCreight [[McC1985]](#references). De, Maheshwari, Nandy, and Smid [[DMNS2011]](#references) showed
|
73
|
+
how to build the structure in-place and we use their approach here.
|
74
|
+
|
75
|
+
- `largest_y_in_ne(x0, y0)` and `largest_y_in_nw(x0, y0)`, the "highest" (max-y) point in the quadrant to the northeast/northwest of
|
76
|
+
(x0, y0);
|
77
|
+
- `smallest_x_in_ne(x0, y0)`, the "leftmost" (min-x) point in the quadrant to the northeast of (x0, y0);
|
78
|
+
- `largest_x_in_nw(x0, y0)`, the "rightmost" (max-x) point in the quadrant to the northwest of (x0, y0);
|
79
|
+
- `largest_y_in_3_sided(x0, x1, y0)`, the highest point in the region specified by x0 <= x <= x1 and y0 <= y; and
|
80
|
+
- `enumerate_3_sided(x0, x1, y0)`, enumerate all the points in that region.
|
81
|
+
|
82
|
+
Here compass directions are the natural ones in the x-y plane with the positive x-axis pointing east and the positive y-axis
|
83
|
+
pointing north.
|
84
|
+
|
85
|
+
There is no `smallest_x_in_3_sided(x0, x1, y0)`. Just use `smallest_x_in_ne(x0, y0)`.
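
The meaning of the queries, and the reason a separate `smallest_x_in_3_sided` is unnecessary, can be spelled out with brute-force O(n) reference versions. This sketch assumes closed boundaries for the quadrants (as stated above for the 3-sided region) and does not model tie-breaking. The module `BruteForcePST` and struct `Pt` are names made up for this example; nothing here is the gem's code.

```ruby
# Illustrative only: O(n) reference answers, not the gem's O(log n) structure.
module BruteForcePST
  Pt = Struct.new(:x, :y)

  module_function

  # Quadrant boundaries are assumed to be closed.
  def ne_quadrant(points, x0, y0)
    points.select { |pt| pt.x >= x0 && pt.y >= y0 }
  end

  def nw_quadrant(points, x0, y0)
    points.select { |pt| pt.x <= x0 && pt.y >= y0 }
  end

  def largest_y_in_ne(points, x0, y0)
    ne_quadrant(points, x0, y0).max_by(&:y)
  end

  def smallest_x_in_ne(points, x0, y0)
    ne_quadrant(points, x0, y0).min_by(&:x)
  end

  def largest_x_in_nw(points, x0, y0)
    nw_quadrant(points, x0, y0).max_by(&:x)
  end

  def enumerate_3_sided(points, x0, x1, y0)
    points.select { |pt| pt.x.between?(x0, x1) && pt.y >= y0 }
  end

  # The 3-sided region is the part of the NE quadrant with x <= x1. So the
  # leftmost point of the quadrant is the answer if it satisfies x <= x1, and
  # otherwise the region contains no points at all.
  def smallest_x_in_3_sided(points, x0, x1, y0)
    candidate = smallest_x_in_ne(points, x0, y0)
    candidate if candidate && candidate.x <= x1
  end
end

pts = [[1, 5], [3, 2], [4, 7], [6, 4], [8, 6]].map { |x, y| BruteForcePST::Pt.new(x, y) }
highest = BruteForcePST.largest_y_in_ne(pts, 2, 3)           # the point (4, 7)
in_box = BruteForcePST.enumerate_3_sided(pts, 2, 7, 3)       # the points (4, 7) and (6, 4)
leftmost = BruteForcePST.smallest_x_in_3_sided(pts, 5, 7, 3) # the point (6, 4)
nothing = BruteForcePST.smallest_x_in_3_sided(pts, 2, 3, 3)  # nil: the region is empty
```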
|
86
|
+
|
87
|
+
The single-point queries run in O(log n) time, where n is the size of P, while `enumerate_3_sided` runs in O(m + log n), where m is
|
88
|
+
the number of points actually enumerated.
|
89
|
+
|
90
|
+
The implementation is in `MaxPrioritySearchTree` (MaxPST for short), so called because internally the structure is, among other
|
91
|
+
things, a max-heap on the y-coordinates.
|
92
|
+
|
93
|
+
These queries appear rather abstract at first but there are interesting applications. See, for example, section 4 of
|
94
|
+
[[McC1985]](#references), keeping in mind that the data structure in that paper is actually a _MinPST_.
|
95
|
+
|
96
|
+
We also provide a `MinPrioritySearchTree`, which answers analogous queries in the southward-infinite quadrants and 3-sided
|
97
|
+
regions.
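
According to the changelog, the MinPST is a thin layer on top of a MaxPST with negated y values. The idea can be sketched with brute-force stand-ins for the trees: reflect the points in the x-axis, ask the "max" question, and reflect the answer back. The query name `smallest_y_in_se` and the module `NegatedYSketch` are assumptions made for this example, not names confirmed by the text above.

```ruby
# Illustrative only: brute-force stand-ins that show the negation trick.
module NegatedYSketch
  Pt = Struct.new(:x, :y)

  module_function

  # Stand-in for a MaxPST query: the highest point with x >= x0 and y >= y0.
  def largest_y_in_ne(points, x0, y0)
    points.select { |pt| pt.x >= x0 && pt.y >= y0 }.max_by(&:y)
  end

  # A "min" query answered by a "max" query on the reflected points:
  # the lowest point with x >= x0 and y <= y0.
  def smallest_y_in_se(points, x0, y0)
    flipped = points.map { |pt| Pt.new(pt.x, -pt.y) }
    found = largest_y_in_ne(flipped, x0, -y0)
    found && Pt.new(found.x, -found.y)
  end
end

pts = [[1, 5], [3, 2], [4, 7], [6, 4], [8, 6]].map { |x, y| NegatedYSketch::Pt.new(x, y) }
lowest = NegatedYSketch.smallest_y_in_se(pts, 2, 5) # the point (3, 2)
```

A real implementation would negate the y values once, at construction, rather than on every query as this sketch does.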
|
98
|
+
|
99
|
+
By default these data structures are immutable: once constructed they cannot be changed. But there is a constructor option that
|
100
|
+
makes the instance "dynamic". This allows us to delete the element at the root of the tree - the one with largest y value (smallest
|
101
|
+
for MinPST) - with the `delete_top!` method. This operation is important in certain algorithms, such as enumerating all maximal
|
102
|
+
empty rectangles (see the second paper by De et al. [[DMNS2013]](#references)). Note that points still cannot be added to the PST in
|
103
|
+
any case, and choosing the dynamic option makes certain internal bookkeeping operations slower.
|
104
|
+
|
105
|
+
In [[DMNS2013]](#references) De et al. generalize the in-place structure to a _Min-max Priority Search Tree_ (MinmaxPST) that can
|
106
|
+
answer queries in all four quadrants and both "kinds" of 3-sided boxes. Having one of these would save the trouble of constructing
|
107
|
+
both a MaxPST and MinPST. But the presentation is hard to follow in places and the paper's pseudocode is buggy.[^minmaxpst]
|
108
|
+
|
109
|
+
## Segment Tree
|
110
|
+
|
111
|
+
Segment trees store information related to subintervals of a certain array. For example, they can be used to find the sum of the
|
112
|
+
elements in an arbitrary subinterval A[i..j] of an array A[0..n] in O(log n) time. Each node in the tree corresponds to a subarray
|
113
|
+
of A in such a way that the values we store in the nodes can be combined efficiently to determine the desired result for arbitrary
|
114
|
+
subarrays.
|
115
|
+
|
116
|
+
An excellent description of the idea is found at https://cp-algorithms.com/data_structures/segment_tree.html.
|
117
|
+
|
118
|
+
Generic code is provided in `SegmentTreeTemplate`. Concrete classes are written by providing a handful of simple lambdas and
|
119
|
+
constants to the template class's initializer. Figuring out the details requires some knowledge of the internal mechanisms of a
|
120
|
+
segment tree, for which the link at cp-algorithms.com is very helpful. See the definitions of the concrete classes,
|
121
|
+
`MaxValSegmentTree` and `IndexOfMaxValSegmentTree`, for examples.
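
The idea of building concrete trees from a few lambdas and constants can be sketched as follows. The constructor arguments `combine:` and `identity:` are hypothetical and are not the actual signature of `SegmentTreeTemplate`. The sketch also uses the compact iterative bottom-up array layout rather than the recursive one described at cp-algorithms.com, and it assumes the combining operation is commutative and associative (as max and sum are).

```ruby
# Illustrative only: a generic segment tree parameterised by a combining lambda.
# The constructor arguments are hypothetical, not SegmentTreeTemplate's signature.
class SketchSegmentTree
  def initialize(values, combine:, identity:)
    @size = values.size
    @combine = combine
    @identity = identity
    @tree = Array.new(2 * @size, identity)
    values.each_with_index { |v, i| @tree[@size + i] = v } # leaves
    (@size - 1).downto(1) do |i|                           # internal nodes
      @tree[i] = combine.call(@tree[2 * i], @tree[2 * i + 1])
    end
  end

  # Combined value over values[i..j] (inclusive), in O(log n).
  def query(i, j)
    left = i + @size
    right = j + @size + 1 # half-open upper bound
    acc = @identity
    while left < right
      if left.odd?
        acc = @combine.call(acc, @tree[left])
        left += 1
      end
      if right.odd?
        right -= 1
        acc = @combine.call(acc, @tree[right])
      end
      left /= 2
      right /= 2
    end
    acc
  end

  # Set values[i] = value and repair its ancestors, in O(log n).
  def update(i, value)
    pos = i + @size
    @tree[pos] = value
    while pos > 1
      pos /= 2
      @tree[pos] = @combine.call(@tree[2 * pos], @tree[2 * pos + 1])
    end
  end
end

values = [3, 1, 4, 1, 5, 9, 2, 6]
max_tree = SketchSegmentTree.new(values, combine: ->(a, b) { a > b ? a : b }, identity: -Float::INFINITY)
sum_tree = SketchSegmentTree.new(values, combine: ->(a, b) { a + b }, identity: 0)

max_tree.query(2, 4) # max of 4, 1, 5
sum_tree.query(2, 4) # 4 + 1 + 5
```

Only the lambda and the identity element differ between the two trees, which is the point of having a template.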
|
122
|
+
|
123
|
+
## Algorithms
|
124
|
+
|
125
|
+
The Algorithms submodule contains some algorithms using the data structures.
|
126
|
+
|
127
|
+
- `maximal_empty_rectangles(points)`
|
128
|
+
- We are given a set P contained in a minimal box B = [x_min, x_max] x [y_min, y_max]. An _empty rectangle_ is an axis-parallel
|
129
|
+
rectangle with positive area contained in B containing no element of P in its interior. A _maximal empty rectangle_ is an empty
|
130
|
+
rectangle not properly contained in any other empty rectangle. This method yields each maximal empty rectangle in the form
|
131
|
+
[left, right, bottom, top].
|
132
|
+
- The algorithm is due to [[DMNS2013]](#references).
|
133
|
+
|
134
|
+
# References
|
135
|
+
- [TvL1984] Tarjan, Robert E., van Leeuwen, J., _Worst-case Analysis of Set Union Algorithms_, Journal of the ACM, v31:2 (1984), pp 245–281.
|
136
|
+
- [EEK2017] Edelkamp, S., Elmasry, A., Katajainen, J., _Optimizing Binary Heaps_, Theory Comput Syst (2017), vol 61, pp 606-636, DOI 10.1007/s00224-017-9760-2.
|
137
|
+
- [McC1985] McCreight, E. M., _Priority Search Trees_, SIAM J. Comput., 14(2):257-276, 1985.
|
138
|
+
- [DMNS2011] De, M., Maheshwari, A., Nandy, S. C., Smid, M., _An In-Place Priority Search Tree_, 23rd Canadian Conference on Computational Geometry, 2011.
|
139
|
+
- [DMNS2013] De, M., Maheshwari, A., Nandy, S. C., Smid, M., _An In-Place Min-max Priority Search Tree_, Computational Geometry, v46 (2013), pp 310-327.
|
140
|
+
|
141
|
+
[^minmaxpst]: See the comments in the fragmentary class `MinMaxPrioritySearchTree` for further details.
|
data/Rakefile
ADDED
@@ -0,0 +1,16 @@
|
|
1
|
+
require 'rubygems'
|
2
|
+
require 'rake/testtask'
|
3
|
+
require 'rake/extensiontask'
|
4
|
+
|
5
|
+
Rake::ExtensionTask.new('data_structures_rmolinari/c_disjoint_union') do |ext|
|
6
|
+
ext.name = 'CDisjointUnion'
|
7
|
+
ext.ext_dir = 'ext/c_disjoint_union'
|
8
|
+
ext.lib_dir = 'lib/data_structures_rmolinari/'
|
9
|
+
end
|
10
|
+
|
11
|
+
Rake::TestTask.new do |t|
|
12
|
+
t.libs << 'test'
|
13
|
+
end
|
14
|
+
|
15
|
+
desc 'Run Tests'
|
16
|
+
task default: :test
|
data/ext/c_disjoint_union/disjoint_union.c
ADDED
@@ -0,0 +1,412 @@
|
|
1
|
+
/*
|
2
|
+
* This is a C implementation of a simple Ruby Disjoint Union data structure.
|
3
|
+
*
|
4
|
+
* A Disjoint Union doesn't have much of an implementation in Ruby: see disjoint_union.rb in this gem. This means that we don't gain
|
5
|
+
* much by implementing it in C but that it serves as a good learning experience for me.
|
6
|
+
*
|
7
|
+
* It turns out that writing a C extension for Ruby like this isn't very complicated, but there are a bunch of moving parts and the
|
8
|
+
* available documentation is a bit of a slog. Writing this was very educational.
|
9
|
+
*
|
10
|
+
* See https://docs.ruby-lang.org/en/master/extension_rdoc.html for some documentation. It's a bit hard to read in places, but
|
11
|
+
* plugging away at things helps.
|
12
|
+
*
|
13
|
+
* https://guides.rubygems.org/gems-with-extensions/ is a decent tutorial, though it leaves out lots of details.
|
14
|
+
*
|
15
|
+
* See https://aaronbedra.com/extending-ruby/ for another tutorial.
|
16
|
+
*/
|
17
|
+
|
18
|
+
#include "ruby.h"
|
19
|
+
|
20
|
+
// The Shared::DataError exception type in the Ruby code. We only need it when we detect a runtime error, so a macro is simplest and
|
21
|
+
// just fine.
|
22
|
+
#define mShared rb_define_module("Shared")
|
23
|
+
#define eDataError rb_const_get(mShared, rb_intern_const("DataError"))
|
24
|
+
|
25
|
+
/**
|
26
|
+
* It's been so long since I've written non-trivial C that I need to copy examples from online.
|
27
|
+
*
|
28
|
+
* Dynamic array of longs, with an initial value for otherwise uninitialized elements.
|
29
|
+
* Based on https://stackoverflow.com/questions/3536153/c-dynamically-growing-array
|
30
|
+
*/
|
31
|
+
typedef struct {
|
32
|
+
long *array;
|
33
|
+
size_t size;
|
34
|
+
long default_val;
|
35
|
+
} DynamicArray;
|
36
|
+
|
37
|
+
void initDynamicArray(DynamicArray *a, size_t initial_size, long default_val) {
|
38
|
+
a->array = malloc(initial_size * sizeof(long));
|
39
|
+
a->size = initial_size;
|
40
|
+
a->default_val = default_val;
|
41
|
+
|
42
|
+
for (size_t i = 0; i < initial_size; i++) {
|
43
|
+
a->array[i] = default_val;
|
44
|
+
}
|
45
|
+
}
|
46
|
+
|
47
|
+
void insertDynamicArray(DynamicArray *a, unsigned long index, long element) {
|
48
|
+
if (a->size <= index) {
|
49
|
+
size_t new_size = a->size;
|
50
|
+
while (new_size <= index) {
|
51
|
+
new_size = 8 * new_size / 5 + 8; // 8/5 gives "Fibonacci-like" growth; adding 8 to avoid small arrays having to reallocate
|
52
|
+
// too often. Who knows if it's worth being "clever".
|
53
|
+
}
|
54
|
+
|
55
|
+
long* new_array = realloc(a->array, new_size * sizeof(long));
|
56
|
+
if (!new_array) {
|
57
|
+
rb_raise(rb_eRuntimeError, "Cannot allocate memory to expand DynamicArray!");
|
58
|
+
}
|
59
|
+
|
60
|
+
a->array = new_array;
|
61
|
+
for (size_t i = a->size; i < new_size; i++) {
|
62
|
+
a->array[i] = a->default_val;
|
63
|
+
}
|
64
|
+
|
65
|
+
a->size = new_size;
|
66
|
+
}
|
67
|
+
|
68
|
+
a->array[index] = element;
|
69
|
+
}
|
70
|
+
|
71
|
+
void freeDynamicArray(DynamicArray *a) {
|
72
|
+
free(a->array);
|
73
|
+
a->array = NULL;
|
74
|
+
a->size = 0;
|
75
|
+
}
|
76
|
+
|
77
|
+
/**
|
78
|
+
* The C implementation of a Disjoint Union
|
79
|
+
*
|
80
|
+
* See Tarjan, Robert E., van Leeuwen, J., _Worst-case Analysis of Set Union Algorithms_, Journal of the ACM, v31:2 (1984), pp 245–281.
|
81
|
+
*/
|
82
|
+
|
83
|
+
/*
|
84
|
+
* The Disjoint Union struct.
|
85
|
+
* - forest: an array of longs giving, for each element, the parent element of its tree.
|
86
|
+
* - An element e is the root of its tree just when forest[e] == e.
|
87
|
+
* - Two elements are in the same subset just when they are in the same tree in the forest.
|
88
|
+
* - So the key idea is that we can check this by navigating via parents from each element to their roots. Clever optimizations
|
89
|
+
* keep the trees flat and so most nodes are close to their roots.
|
90
|
+
* - rank: an array of longs giving the "rank" of each element.
|
91
|
+
* - This value is used to guide the "linking" of trees when subsets are being merged to keep the trees flat. See Tarjan & van
|
92
|
+
* Leeuwen
|
93
|
+
* - subset_count: the number of (disjoint) subsets.
|
94
|
+
* - it isn't needed internally but may be useful to client code.
|
95
|
+
*/
|
96
|
+
typedef struct du_data {
|
97
|
+
DynamicArray* forest; // the forest that describes the unified subsets
|
98
|
+
DynamicArray* rank; // the "ranks" of the elements, used when uniting subsets
|
99
|
+
size_t subset_count;
|
100
|
+
} disjoint_union_data;
|
101
|
+
|
102
|
+
/*
|
103
|
+
* Create one.
|
104
|
+
*
|
105
|
+
* The dynamic arrays are initialized with a size of 100 because I didn't have a better idea. This will end up getting called from
|
106
|
+
* the Ruby #allocate method, which happens before #initialize. Thus we don't know the calling code's desired initial size.
|
107
|
+
*/
|
108
|
+
#define INITIAL_SIZE 100
|
109
|
+
static disjoint_union_data* create_disjoint_union() {
|
110
|
+
disjoint_union_data* disjoint_union = malloc(sizeof(disjoint_union_data));
|
111
|
+
|
112
|
+
// Allocate the structures
|
113
|
+
DynamicArray* forest = malloc(sizeof(DynamicArray));
|
114
|
+
DynamicArray* rank = malloc(sizeof(DynamicArray));
|
115
|
+
initDynamicArray(forest, INITIAL_SIZE, -1);
|
116
|
+
initDynamicArray(rank, INITIAL_SIZE, 0);
|
117
|
+
|
118
|
+
disjoint_union->forest = forest;
|
119
|
+
disjoint_union->rank = rank;
|
120
|
+
disjoint_union->subset_count = 0;
|
121
|
+
|
122
|
+
return disjoint_union;
|
123
|
+
}
|
124
|
+
|
125
|
+
/*
|
126
|
+
* Free the memory associated with a disjoint union. This will end up getting triggered by the Ruby garbage collector.
|
127
|
+
*/
|
128
|
+
static void disjoint_union_free(void *ptr) {
|
129
|
+
if (ptr) {
|
130
|
+
disjoint_union_data *disjoint_union = ptr;
|
131
|
+
freeDynamicArray(disjoint_union->forest);
|
132
|
+
freeDynamicArray(disjoint_union->rank);
|
133
|
+
|
134
|
+
free(disjoint_union->forest);
|
135
|
+
disjoint_union->forest = NULL;
|
136
|
+
|
137
|
+
free(disjoint_union->rank);
|
138
|
+
disjoint_union->rank = NULL;
|
139
|
+
|
140
|
+
free(disjoint_union);
|
141
|
+
}
|
142
|
+
}
|
143
|
+
|
144
|
+
/************************************************************
|
145
|
+
* The disjoint union operations
|
146
|
+
************************************************************/
|
147
|
+
|
148
|
+
/*
|
149
|
+
* Is the given element already a member of the universe?
|
150
|
+
*/
|
151
|
+
static int present_p(disjoint_union_data* disjoint_union, size_t element) {
|
152
|
+
DynamicArray* forest = disjoint_union->forest;
|
153
|
+
return (forest->size > element && (forest->array[element] != forest->default_val));
|
154
|
+
}
|
155
|
+
|
156
|
+
/*
|
157
|
+
* Check that the given element is a member of the universe and raise Shared::DataError (ruby-side) if not
|
158
|
+
*/
|
159
|
+
static void assert_membership(disjoint_union_data* disjoint_union, size_t element) {
|
160
|
+
if (!present_p(disjoint_union, element)) {
|
161
|
+
rb_raise(eDataError, "Value %zu is not part of the universe", element);
|
162
|
+
}
|
163
|
+
}
|
164
|
+
|
165
|
+
/*
|
166
|
+
* Add a new element to the universe. It starts out in its own singleton subset.
|
167
|
+
*
|
168
|
+
* Shared::DataError is raised if it is already an element.
|
169
|
+
*/
|
170
|
+
static void add_new_element(disjoint_union_data* disjoint_union, size_t element) {
|
171
|
+
if (present_p(disjoint_union, element)) {
|
172
|
+
rb_raise(eDataError, "Element %zu already present in the universe", element);
|
173
|
+
}
|
174
|
+
|
175
|
+
insertDynamicArray(disjoint_union->forest, element, element);
|
176
|
+
insertDynamicArray(disjoint_union->rank, element, 0);
|
177
|
+
disjoint_union->subset_count++;
|
178
|
+
}
|
179
|
+
|
180
|
+
/*
|
181
|
+
* Find the canonical representative of the given element. This is the root of the tree (in forest) containing element.
|
182
|
+
*
|
183
|
+
* Two elements are in the same subset exactly when their canonical representatives are equal.
|
184
|
+
*/
|
185
|
+
static size_t find(disjoint_union_data* disjoint_union, size_t element) {
|
186
|
+
assert_membership(disjoint_union, element);
|
187
|
+
|
188
|
+
// We implement find with "halving" to shrink the length of paths to the root. See Tarjan and van Leeuwen p 252.
|
189
|
+
long* d = disjoint_union->forest->array; // the actual forest data
|
190
|
+
size_t x = element;
|
191
|
+
while (d[d[x]] != d[x]) {
|
192
|
+
x = d[x] = d[d[x]];
|
193
|
+
}
|
194
|
+
return d[x];
|
195
|
+
}
|
196
|
+
|
197
|
+
/*
|
198
|
+
* "Link" the two given elements so that they are in the same subset now.
|
199
|
+
*
|
200
|
+
* In other words, merge the subtrees containing the two elements.
|
201
|
+
*
|
202
|
+
* Good performance (see Tarjan and van Leeuwen) assumes that elt1 and elt2 are distinct and already the roots of their trees,
|
203
|
+
* though we don't check that here.
|
204
|
+
*/
|
205
|
+
static void link_roots(disjoint_union_data* disjoint_union, size_t elt1, size_t elt2) {
|
206
|
+
long* rank = disjoint_union->rank->array;
|
207
|
+
long* forest = disjoint_union->forest->array;
|
208
|
+
|
209
|
+
if (rank[elt1] > rank[elt2]) {
|
210
|
+
forest[elt2] = elt1;
|
211
|
+
} else if (rank[elt1] == rank[elt2]) {
|
212
|
+
forest[elt2] = elt1;
|
213
|
+
rank[elt1]++;
|
214
|
+
} else {
|
215
|
+
forest[elt1] = elt2;
|
216
|
+
}
|
217
|
+
|
218
|
+
disjoint_union->subset_count--;
|
219
|
+
}
|
220
|
+
|
221
|
+
/*
|
222
|
+
* "Unite" or merge the subsets containing elt1 and elt2.
|
223
|
+
*/
|
224
|
+
static void unite(disjoint_union_data* disjoint_union, size_t elt1, size_t elt2) {
|
225
|
+
assert_membership(disjoint_union, elt1);
|
226
|
+
assert_membership(disjoint_union, elt2);
|
227
|
+
|
228
|
+
if (elt1 == elt2) {
|
229
|
+
rb_raise(eDataError, "Uniting an element with itself is meaningless");
|
230
|
+
}
|
231
|
+
|
232
|
+
size_t root1 = find(disjoint_union, elt1);
|
233
|
+
size_t root2 = find(disjoint_union, elt2);
|
234
|
+
|
235
|
+
if (root1 == root2) {
|
236
|
+
return; // already united
|
237
|
+
}
|
238
|
+
|
239
|
+
link_roots(disjoint_union, root1, root2);
|
240
|
+
}
|
241
|
+
|
242
|
+
|
243
|
+
/**
|
244
|
+
* Wrapping and unwrapping things for the Ruby runtime
|
245
|
+
*
|
246
|
+
*/
|
247
|
+
|
248
|
+
// How much memory (roughly) does a disjoint_union_data instance consume? I guess the Ruby runtime can use this information when
|
249
|
+
// deciding how aggressive to be during garbage collection and such.
|
250
|
+
static size_t disjoint_union_memsize(const void *ptr) {
|
251
|
+
if (ptr) {
|
252
|
+
const disjoint_union_data *disjoint_union = ptr;
|
253
|
+
return (2 * disjoint_union->forest->size * sizeof(long)); // disjoint_union->rank is the same size
|
254
|
+
} else {
|
255
|
+
return 0;
|
256
|
+
}
|
257
|
+
}
|
258
|
+
|
259
|
+
/*
|
260
|
+
* A configuration struct that tells the Ruby runtime how to deal with a disjoint_union_data object.
|
261
|
+
*
|
262
|
+
* https://docs.ruby-lang.org/en/master/extension_rdoc.html#label-Encapsulate+C+data+into+a+Ruby+object
|
263
|
+
*/
|
264
|
+
static const rb_data_type_t disjoint_union_type = {
|
265
|
+
.wrap_struct_name = "disjoint_union",
|
266
|
+
{ // help for the Ruby garbage collector
|
267
|
+
.dmark = NULL, // dmark, for marking other Ruby objects. We don't hold any other objects so this can be NULL
|
268
|
+
.dfree = disjoint_union_free, // how to free the memory associated with an object
|
269
|
+
.dsize = disjoint_union_memsize, // roughly how much space does the object consume?
|
270
|
+
},
|
271
|
+
.data = NULL, // a data field we could use for something here if we wanted. Ruby ignores it
|
272
|
+
.flags = 0 // GC-related flag values.
|
273
|
+
};
|
274
|
+
|
275
|
+
/*
|
276
|
+
* Helper: check that a Ruby value is a non-negative Fixnum and convert it to a nice C long
|
277
|
+
*
|
278
|
+
* TODO: can we return a size_t or unsigned long instead?
|
279
|
+
*/
|
280
|
+
static long checked_nonneg_fixnum(VALUE val) {
|
281
|
+
Check_Type(val, T_FIXNUM);
|
282
|
+
long c_val = FIX2LONG(val);
|
283
|
+
|
284
|
+
if (c_val < 0) {
|
285
|
+
rb_raise(eDataError, "Value must be non-negative");
|
286
|
+
}
|
287
|
+
|
288
|
+
return c_val;
|
289
|
+
}
|
290
|
+
|
291
|
+
/*
|
292
|
+
* Unwrap a Rubyfied disjoint union to get the C struct inside.
|
293
|
+
*/
|
294
|
+
static disjoint_union_data* unwrapped(VALUE self) {
|
295
|
+
disjoint_union_data* disjoint_union;
|
296
|
+
TypedData_Get_Struct((self), disjoint_union_data, &disjoint_union_type, disjoint_union);
|
297
|
+
return disjoint_union;
|
298
|
+
}
|
299
|
+
|
300
|
+
/*
|
301
|
+
* This is for CDisjointUnion.allocate on the Ruby side
|
302
|
+
*/
|
303
|
+
static VALUE disjoint_union_alloc(VALUE klass) {
|
304
|
+
disjoint_union_data* disjoint_union = create_disjoint_union();
|
305
|
+
return TypedData_Wrap_Struct(klass, &disjoint_union_type, disjoint_union);
|
306
|
+
}
|
307
|
+
|
308
|
+
/*
|
309
|
+
* A single parameter is optional. If given it should be a non-negative integer and specifies the initial size, s, of the universe
|
310
|
+
* 0, 1, ..., s-1.
|
311
|
+
*
|
312
|
+
* If no argument is given we act as though a value of 0 were passed.
|
313
|
+
*/
|
314
|
+
static VALUE disjoint_union_init(int argc, VALUE *argv, VALUE self) {
|
315
|
+
if (argc == 0) {
|
316
|
+
return self;
|
317
|
+
} else if (argc > 1) {
|
318
|
+
rb_raise(rb_eArgError, "wrong number of arguments");
|
319
|
+
} else {
|
320
|
+
size_t initial_size = checked_nonneg_fixnum(argv[0]);
|
321
|
+
disjoint_union_data* disjoint_union = unwrapped(self);
|
322
|
+
|
323
|
+
for (size_t i = 0; i < initial_size; i++) {
|
324
|
+
add_new_element(disjoint_union, i);
|
325
|
+
}
|
326
|
+
}
|
327
|
+
return self;
|
328
|
+
}
|
329
|
+
|
330
|
+
/**
|
331
|
+
* And now the simple wrappers around the Disjoint Union C functionality. In each case we
|
332
|
+
* - unwrap a 'VALUE self',
|
333
|
+
* - i.e., the CDisjointUnion instance that contains a disjoint_union_data struct;
|
334
|
+
* - munge any other arguments into longs;
|
335
|
+
* - call the appropriate C function to act on the struct; and
|
336
|
+
* - return an appropriate VALUE that the Ruby runtime can use.
|
337
|
+
*
|
338
|
+
* We make them into methods on CDisjointUnion in the Init_c_disjoint_union function, below.
|
339
|
+
*/
|
340
|
+
|
341
|
+
/*
|
342
|
+
* Add a new subset to the universe containing the element +new_v+.
|
343
|
+
*
|
344
|
+
* @param the new element, starting in its own singleton subset
|
345
|
+
* - it must be a non-negative integer, not already part of the universe of elements.
|
346
|
+
*/
|
347
|
+
static VALUE disjoint_union_make_set(VALUE self, VALUE arg) {
|
348
|
+
add_new_element(unwrapped(self), checked_nonneg_fixnum(arg));
|
349
|
+
|
350
|
+
return Qnil;
|
351
|
+
}
|
352
|
+
|
353
|
+
/*
|
354
|
+
* @return the number of subsets into which the universe is currently partitioned.
|
355
|
+
*/
|
356
|
+
static VALUE disjoint_union_subset_count(VALUE self) {
|
357
|
+
return LONG2NUM(unwrapped(self)->subset_count);
|
358
|
+
}
|
359
|
+
|
360
|
+
/*
|
361
|
+
* The canonical representative of the subset containing e. Two elements d and e are in the same subset exactly when find(d) ==
|
362
|
+
* find(e).
|
363
|
+
*
|
364
|
+
* The parameter must be in the universe of elements.
|
365
|
+
*
|
366
|
+
* @return (Integer) one of the universe of elements
|
367
|
+
*/
|
368
|
+
static VALUE disjoint_union_find(VALUE self, VALUE arg) {
|
369
|
+
return LONG2NUM(find(unwrapped(self), checked_nonneg_fixnum(arg)));
|
370
|
+
}
|
371
|
+
|
372
|
+
/*
|
373
|
+
* Declare that the arguments are equivalent, i.e., in the same subset. If they are already in the same subset this is a no-op.
|
374
|
+
*
|
375
|
+
* Each argument must be in the universe of elements
|
376
|
+
*/
|
377
|
+
static VALUE disjoint_union_unite(VALUE self, VALUE arg1, VALUE arg2) {
|
378
|
+
unite(unwrapped(self), checked_nonneg_fixnum(arg1), checked_nonneg_fixnum(arg2));
|
379
|
+
|
380
|
+
return Qnil;
|
381
|
+
}
|
382
|
+
|
383
|
+
/*
|
384
|
+
* A Disjoint Union.
|
385
|
+
*
|
386
|
+
* A "disjoint set union" that represents a set of elements belonging to _disjoint_ subsets. Alternatively, this expresses a
|
387
|
+
* partition of a fixed set.
|
388
|
+
*
|
389
|
+
* The data structure provides efficient actions to merge two disjoint subsets, i.e., replace them by their union, and determine if
|
390
|
+
* two elements are in the same subset.
|
391
|
+
*
|
392
|
+
* The elements of the set are 0, 1, ..., n-1, where n is the size of the universe. Client code can map its data to these
|
393
|
+
* representatives.
|
394
|
+
*
|
395
|
+
* See https://en.wikipedia.org/wiki/Disjoint-set_data_structure for a good introduction.
|
396
|
+
*
|
397
|
+
* The code uses several ideas from Tarjan and van Leeuwen for efficiency. We use "union by rank" in +unite+ and path-halving in
|
398
|
+
* +find+. Together, these make the amortized cost of each operation effectively constant.
|
399
|
+
*
|
400
|
+
* - Tarjan, Robert E., van Leeuwen, Jan (1984). _Worst-case analysis of set union algorithms_. Journal of the ACM. 31 (2): 245–281.
|
401
|
+
*/
|
402
|
+
void Init_c_disjoint_union() {
|
403
|
+
VALUE mDataStructuresRMolinari = rb_define_module("DataStructuresRMolinari");
|
404
|
+
VALUE cDisjointUnion = rb_define_class_under(mDataStructuresRMolinari, "CDisjointUnion", rb_cObject);
|
405
|
+
|
406
|
+
rb_define_alloc_func(cDisjointUnion, disjoint_union_alloc);
|
407
|
+
rb_define_method(cDisjointUnion, "initialize", disjoint_union_init, -1);
|
408
|
+
rb_define_method(cDisjointUnion, "make_set", disjoint_union_make_set, 1);
|
409
|
+
rb_define_method(cDisjointUnion, "subset_count", disjoint_union_subset_count, 0);
|
410
|
+
rb_define_method(cDisjointUnion, "find", disjoint_union_find, 1);
|
411
|
+
rb_define_method(cDisjointUnion, "unite", disjoint_union_unite, 2);
|
412
|
+
}
|
data/ext/c_disjoint_union/extconf.rb
ADDED
@@ -0,0 +1,12 @@
|
|
1
|
+
require 'mkmf'
|
2
|
+
|
3
|
+
abort 'missing malloc()' unless have_func "malloc"
|
4
|
+
abort 'missing realloc()' unless have_func "realloc"
|
5
|
+
|
6
|
+
if try_cflags('-O')
|
7
|
+
append_cflags('-O')
|
8
|
+
end
|
9
|
+
|
10
|
+
extension_name = "c_disjoint_union"
|
11
|
+
dir_config(extension_name)
|
12
|
+
create_makefile("data_structures_rmolinari/c_disjoint_union")
|