data_structures_rmolinari 0.4.3 → 0.4.4

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: c9022e9531472d1125c6172025c2d10c5d4ef4f9c43e326a43f1c5b4f0721263
4
- data.tar.gz: '0212619be7fe32e68b63d2087730f81ffd6b4179b8b8bf63aa0026e4e3056224'
3
+ metadata.gz: 943ac55678a074cc0da3667dccbb07ee7d203639233f53bd8587af7fd8cd062e
4
+ data.tar.gz: ad235e5f4714e699f1cf5f113dd4b3a356a194cced5a74b60e17c5e3a896e01b
5
5
  SHA512:
6
- metadata.gz: a7f9258eeed2dc7e7fa5713aaecfcdf44e061bb161aa3d0d2662fb662bfb6b2685c61be221b4a109792982c3b2aa6215da75b51ae299d4a9237b6226000612e4
7
- data.tar.gz: e585a245f753ef731895163eedba802e3fe2f6000720d10705b5a0cd02a12642a35220eea57c8b71f504b66db7cb06161fdfca1660edc0dc132ee026dd83be4d
6
+ metadata.gz: a68de76c88c67fadc42752610c695b1f0b8fd17f34db9c806291aeab4c933fe84c6523615deb4197e1c9fa6d36dce30987cc4e8896a2b0c1700b7e72b5bd2fff
7
+ data.tar.gz: 9063d89a98d599f27db2585bf383dbfb13e8f927abce64ac7eafb2edd70c490ddad1f1fc51e0f11c24adf29f28ab8c56548a6db264b15ace239c63b1a2ce5a01
data/CHANGELOG.md CHANGED
@@ -1,5 +1,13 @@
1
1
  # Changelog
2
2
 
3
+ ## [Unreleased]
4
+
5
+ - Disjoint Union
6
+ - C extension: use Convenient Containers rather than my janky Dynamic Array attempt.
7
+
8
+ - Segment Tree
9
+ - Add a C implementation as CSegmentTreeTemplate.
10
+
3
11
  ## [0.4.3] 2023-01-27
4
12
 
5
13
  - Fix bad directive in Rakefile for DisjointUnion C extension
data/README.md CHANGED
@@ -4,8 +4,8 @@ This is a small collection of Ruby data structures that I have implemented for m
4
4
  structure is almost always more educational than simply reading about it and is usually fun. I wrote some of them while
5
5
  participating in the Advent of Code (https://adventofcode.com/).
6
6
 
7
- These implementations are not particularly clever. They are based on the expository descriptions and pseudo-code I found as I read
8
- about each structure and so are not as fast as possible.
7
+ The implementations are based on the expository descriptions and pseudo-code I found as I read about each structure and so are not
8
+ as fast as possible.
9
9
 
10
10
  The code is available as a gem: https://rubygems.org/gems/data_structures_rmolinari.
11
11
 
@@ -42,9 +42,6 @@ It also provides
42
42
  For more details see https://en.wikipedia.org/wiki/Disjoint-set_data_structure and the paper [[TvL1984]](#references) by Tarjan and
43
43
  van Leeuwen.
44
44
 
45
- There is an experimental implementation as a C extension in c_disjoint_union.c. The Ruby class for this implementation is
46
- `CDisjointUnion`. Benchmarks indicate that a long sequence of `unite` calls is about twice as fast.
47
-
48
45
  ## Heap
49
46
 
50
47
  This is a standard binary heap with an `update` method, suitable for use as a priority queue. There are several supported
@@ -84,15 +81,15 @@ pointing north.
84
81
 
85
82
  There is no `smallest_x_in_3_sided(x0, x1, y0)`. Just use `smallest_x_in_ne(x0, y0)`.
86
83
 
84
+ (These queries appear rather abstract at first but there are interesting applications. See, for example, section 4 of
85
+ [[McC1985]](#references), keeping in mind that the data structure in that paper is actually a _MinPST_.)
86
+
87
87
  The single-point queries run in O(log n) time, where n is the size of P, while `enumerate_3_sided` runs in O(m + log n), where m is
88
88
  the number of points actually enumerated.
89
89
 
90
90
  The implementation is in `MaxPrioritySearchTree` (MaxPST for short), so called because internally the structure is, among other
91
91
  things, a max-heap on the y-coordinates.
92
92
 
93
- These queries appear rather abstract at first but there are interesting applications. See, for example, section 4 of
94
- [[McC85]](#references), keeping in mind that the data structure in that paper is actually a _MinPST_.
95
-
96
93
  We also provide a `MinPrioritySearchTree`, which answers analogous queries in the southward-infinite quadrants and 3-sided
97
94
  regions.
98
95
 
@@ -108,17 +105,17 @@ both a MaxPST and MinPST. But the presentation is hard to follow in places and
108
105
 
109
106
  ## Segment Tree
110
107
 
111
- Segment trees store information related to subintervals of a certain array. For example, they can be used to find the sum of the
112
- elements in an arbitrary subinterval A[i..j] of an array A[0..n] in O(log n) time. Each node in the tree corresponds to a subarray
113
- of A in such a way that the values we store in the nodes can be combined efficiently to determine the desired result for arbitrary
114
- subarrays.
108
+ A segment tree stores information related to subintervals of a certain array. For example, a segment tree can be used to find the
109
+ sum of the elements in an arbitrary subinterval A(i..j) of an array A(0..n) in O(log n) time. Each node in the tree corresponds to a
110
+ subarray of A in such a way that the values we store in the nodes can be combined efficiently to determine the desired result for
111
+ arbitrary subarrays.
115
112
 
116
113
  An excellent description of the idea is found at https://cp-algorithms.com/data_structures/segment_tree.html.
117
114
 
118
- Generic code is provided in `SegmentTreeTemplate`. Concrete classes are written by providing a handful of simple lambdas and
119
- constants to the template class's initializer. Figuring out the details requires some knowledge of the internal mechanisms of a
120
- segment tree, for which the link at cp-algorithms.com is very helpful. See the definitions of the concrete classes,
121
- `MaxValSegmentTree` and `IndexOfMaxValSegmentTree`, for examples.
115
+ Generic code is provided in `SegmentTreeTemplate`. Concrete classes provide a handful of simple lambdas and constants to the
116
+ template class's initializer. Figuring out the details requires some knowledge of the internal mechanisms of a segment tree, for
117
+ which the link at cp-algorithms.com is very helpful. See the definitions of the concrete classes `MaxValSegmentTree` and
118
+ `IndexOfMaxValSegmentTree` for examples.
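To make the lambda-driven design concrete, here is a minimal pure-Ruby sketch of a template-style segment tree. The class name `SketchSegmentTree` and the method `query_on` are hypothetical and are not the gem's API. The keyword names mirror the arguments of the C `setup` function (`combine`, `single_cell_array_val`, `size`, `identity`), and the 1-based tree layout with children at `2i` and `2i + 1` is an assumption taken from the cp-algorithms description.

```ruby
# Hypothetical stand-in for a lambda-configured segment tree; not the gem's actual class.
class SketchSegmentTree
  # combine:               lambda merging the values of two adjacent subintervals
  # single_cell_array_val: lambda giving the value for the one-element interval at index i
  # size:                  length of the underlying array
  # identity:              value returned for an empty interval
  def initialize(combine:, single_cell_array_val:, size:, identity:)
    @combine = combine
    @single_cell_array_val = single_cell_array_val
    @size = size
    @identity = identity
    @tree = Array.new(4 * size + 1) # 1-based implicit binary tree (assumed layout)
    build(1, 0, size - 1)
  end

  # Combined value over the closed interval left..right of the underlying array.
  def query_on(left, right)
    raise ArgumentError, 'bad interval' if left.negative? || right >= @size
    return @identity if left > right

    query(1, 0, @size - 1, left, right)
  end

  private

  # Fill the node at idx, which covers lo..hi, from its two children.
  def build(idx, lo, hi)
    if lo == hi
      @tree[idx] = @single_cell_array_val.call(lo)
    else
      mid = (lo + hi) / 2
      build(2 * idx, lo, mid)
      build(2 * idx + 1, mid + 1, hi)
      @tree[idx] = @combine.call(@tree[2 * idx], @tree[2 * idx + 1])
    end
  end

  # Invariant: left..right is contained in lo..hi.
  def query(idx, lo, hi, left, right)
    return @tree[idx] if lo == left && hi == right

    mid = (lo + hi) / 2
    if right <= mid
      query(2 * idx, lo, mid, left, right)
    elsif left > mid
      query(2 * idx + 1, mid + 1, hi, left, right)
    else
      @combine.call(
        query(2 * idx, lo, mid, left, mid),
        query(2 * idx + 1, mid + 1, hi, mid + 1, right)
      )
    end
  end
end

# A max-value tree is just one choice of lambdas.
data = [3, 1, 4, 1, 5, 9, 2, 6]
max_tree = SketchSegmentTree.new(
  combine: ->(a, b) { [a, b].max },
  single_cell_array_val: ->(i) { data[i] },
  size: data.size,
  identity: -Float::INFINITY
)
max_tree.query_on(2, 5) # maximum of data[2..5]
```

Swapping in `->(a, b) { a + b }` for `combine` and `0` for `identity` would give a range-sum tree without touching the class itself, which is the point of the template approach.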
122
119
 
123
120
  ## Algorithms
124
121
 
@@ -131,7 +128,29 @@ The Algorithms submodule contains some algorithms using the data structures.
131
128
  [left, right, bottom, top].
132
129
  - The algorithm is due to [[DMNS2013]](#references).
133
130
 
131
+ # C Extensions
132
+
133
+ As another learning process I have implemented several of these data structures as C extensions. The class names have a "C" prefixed
134
+ and they can be required like their pure Ruby versions. They have the same APIs as their Ruby cousins.
135
+
136
+ ## Disjoint Union
137
+
138
+ A benchmark suggests that a long sequence of `unite` operations is about 3 times as fast with the `CDisjointUnion` as with
139
+ `DisjointUnion`.
140
+
141
+ The implementation uses the remarkable Convenient Containers library from Jackson Allan [[Allan]](#references).
142
+
143
+ ## Segment Tree
144
+
145
+ `CSegmentTreeTemplate` is the C implementation of the generic class. Concrete classes are built on top of this in Ruby, just as with
146
+ the pure Ruby `SegmentTreeTemplate` class.
147
+
148
+ A benchmark suggests that a long sequence of `max_on` operations against a max-val Segment Tree is about 4 times as fast with the C
149
+ version as with the Ruby version. I'm a bit surprised the improvement isn't larger, but we must remember that the C code must still
150
+ interact with the Ruby objects in the underlying data array, and must "combine" them, etc., by calling Ruby lambdas.
151
+
134
152
  # References
153
+ - [Allan] Allan, J., _CC: Convenient Containers_, https://github.com/JacksonAllan/CC, retrieved 2023-02-01.
135
154
  - [TvL1984] Tarjan, Robert E., van Leeuwen, J., _Worst-case Analysis of Set Union Algorithms_, Journal of the ACM, v31:2 (1984), pp 245–281.
136
155
  - [EEK2017] Edelkamp, S., Elmasry, A., Katajainen, J., _Optimizing Binary Heaps_, Theory Comput Syst (2017), vol 61, pp 606-636, DOI 10.1007/s00224-017-9760-2.
137
156
  - [McC1985] McCreight, E. M., _Priority Search Trees_, SIAM J. Comput., 14(2):257-276, 1985.
data/Rakefile CHANGED
@@ -2,10 +2,12 @@ require 'rubygems'
2
2
  require 'rake/testtask'
3
3
  require 'rake/extensiontask'
4
4
 
5
- Rake::ExtensionTask.new('data_structures_rmolinari/c_disjoint_union') do |ext|
6
- ext.name = 'c_disjoint_union'
7
- ext.ext_dir = 'ext/c_disjoint_union'
8
- ext.lib_dir = 'lib/data_structures_rmolinari/'
5
+ ['c_disjoint_union', 'c_segment_tree_template'].each do |extension_name|
6
+ Rake::ExtensionTask.new("data_structures_rmolinari/#{extension_name}") do |ext|
7
+ ext.name = extension_name
8
+ ext.ext_dir = "ext/#{extension_name}"
9
+ ext.lib_dir = 'lib/data_structures_rmolinari/'
10
+ end
9
11
  end
10
12
 
11
13
  Rake::TestTask.new do |t|
@@ -16,118 +16,69 @@
16
16
  */
17
17
 
18
18
  #include "ruby.h"
19
-
20
- // The Shared::DataError exception type in the Ruby code. We only need it when we detect a runtime error, so a macro should be fine.
21
- #define mShared rb_define_module("Shared")
22
- #define eSharedDataError rb_const_get(mShared, rb_intern_const("DataError"))
19
+ #include "cc.h" // Convenient Containers
20
+ #include "shared.h"
23
21
 
24
22
  /**
25
- * It's been so long since I've written non-trival C that I need to copy examples from online.
26
- *
27
- * Dynamic array of longs, with an initial value for otherwise uninitialized elements.
28
- * Based on https://stackoverflow.com/questions/3536153/c-dynamically-growing-array
29
- */
30
- typedef struct {
31
- long *array;
32
- size_t size;
33
- long default_val;
34
- } DynamicArray;
35
-
36
- /*
37
- * Initialize a DynamicArray struct with the given initial size and with all values set to the default value.
38
- *
39
- * The default value is stored and used to initialize new array sections if and when the array needs to be expanded.
40
- */
41
- void initDynamicArray(DynamicArray *a, size_t initial_size, long default_val) {
42
- a->array = malloc(initial_size * sizeof(long));
43
- a->size = initial_size;
44
- a->default_val = default_val;
45
-
46
- for (size_t i = 0; i < initial_size; i++) {
47
- a->array[i] = default_val;
48
- }
49
- }
50
-
51
- /*
52
- * Assign +value+ to the the +index+-th element of the array, expanding the available space if necessary.
23
+ * Data type for the (parent, rank) pair, and some accessor helpers for the vec() container we are going to be using.
53
24
  */
54
- void assignInDynamicArray(DynamicArray *a, unsigned long index, long value) {
55
- if (a->size <= index) {
56
- size_t new_size = a->size;
57
- while (new_size <= index) {
58
- new_size = 8 * new_size / 5 + 8; // 8/5 gives "Fibonnacci-like" growth; adding 8 to avoid small arrays having to reallocate
59
- // too often as they grow. Who knows if it's worth being "clever".
60
- }
61
25
 
62
- long *new_array = realloc(a->array, new_size * sizeof(long));
63
- if (!new_array) {
64
- rb_raise(rb_eRuntimeError, "Cannot allocate memory to expand DynamicArray!");
65
- }
26
+ typedef struct data_pair {
27
+ long parent;
28
+ unsigned long rank;
29
+ } data_pair;
66
30
 
67
- a->array = new_array;
68
- for (size_t i = a->size; i < new_size; i++) {
69
- a->array[i] = a->default_val;
70
- }
71
-
72
- a->size = new_size;
73
- }
31
+ #define DEFAULT_PARENT -1
32
+ #define DEFAULT_RANK 0
33
+ static data_pair default_pair = { .parent = DEFAULT_PARENT, .rank = DEFAULT_RANK };
74
34
 
75
- a->array[index] = value;
35
+ static data_pair make_data_pair(long parent, unsigned long rank) {
36
+ data_pair pair = { .parent = parent, .rank = rank };
37
+ return pair;
76
38
  }
77
39
 
78
- void freeDynamicArray(DynamicArray *a) {
79
- free(a->array);
80
- a->array = NULL;
81
- a->size = 0;
82
- }
40
+ /* The vector generic from Convenient Containers */
41
+ typedef vec(data_pair) pair_vector;
83
42
 
84
- size_t _size_of(DynamicArray *a) {
85
- return a->size * sizeof(a->default_val);
86
- }
43
+ #define parent(disjoint_union_ptr, idx) (get((disjoint_union_ptr)->pairs, idx)->parent)
44
+ #define rank(disjoint_union_ptr, idx) (get((disjoint_union_ptr)->pairs, idx)->rank)
87
45
 
88
46
  /**
89
47
  * The C implementation of a Disjoint Union
90
48
  *
91
- * See Tarjan, Robert E., van Leeuwen, J., _Worst-case Analysis of Set Union Algorithms_, Journal of the ACM, v31:2 (1984), pp 245–281.
49
+ * See the paper for optimizations we use to get almost constant time for find() and unite().
50
+ *
51
+ * Tarjan, Robert E., van Leeuwen, J., _Worst-case Analysis of Set Union Algorithms_, Journal of the ACM, v31:2 (1984), pp 245–281.
92
52
  */
93
53
 
94
54
  /*
95
55
  * The Disjoint Union struct.
96
- * - forest: an array of longs giving, for each element, the element's parent.
97
- * - An element e is the root of its tree just when forest[e] == e.
98
- * - Two elements are in the same subset just when they are in the same tree in the forest.
56
+ * - pairs: a vector (dynamic array) of pairs, the i-th of which contains
57
+ * - the "parent" of element i in its membership tree
58
+ * - An element e is the root of its tree just when it is its own parent
59
+ * - Two elements are in the same subset just when they are in the same tree in the forest.
99
60
  * - So the key idea is that we can check this by navigating via parents from each element to their roots. Clever optimizations
100
61
  * keep the trees flat and so most nodes are close to their roots.
101
- * - rank: a array of longs giving the "rank" of each element.
102
- * - This value is used to guide the "linking" of trees when subsets are being merged to keep the trees flat. See Tarjan & van
103
- * Leeuwen
62
+ * - the "rank" of element i
63
+ * - this value is used to guide the "linking" of trees when subsets are being merged to keep the trees flat.
104
64
  * - subset_count: the number of (disjoint) subsets.
105
65
  * - it isn't needed internally but may be useful to client code.
106
66
  */
107
67
  typedef struct du_data {
108
- DynamicArray *forest; // the forest that describes the unified subsets
109
- DynamicArray *rank; // the "ranks" of the elements, used when uniting subsets
68
+ pair_vector *pairs; // The generic vector container from the amazing Convenient Containers library
110
69
  size_t subset_count;
111
70
  } disjoint_union_data;
112
71
 
113
72
  /*
114
73
  * Create one (on the heap).
115
- *
116
- * The dynamic arrays are initialized with a size of 100 because I didn't have a better idea. This will end up getting called from
117
- * the Ruby #allocate method, which happens before #initialize. Thus we don't know the calling code's desired initial size.
118
74
  */
119
- #define INITIAL_SIZE 100
120
75
  static disjoint_union_data *create_disjoint_union() {
121
76
  disjoint_union_data *disjoint_union = (disjoint_union_data *)malloc(sizeof(disjoint_union_data));
122
77
 
123
78
  // Allocate the structures
124
- DynamicArray *forest = (DynamicArray *)malloc(sizeof(DynamicArray));
125
- DynamicArray *rank = (DynamicArray *)malloc(sizeof(DynamicArray));
126
- initDynamicArray(forest, INITIAL_SIZE, -1);
127
- initDynamicArray(rank, INITIAL_SIZE, 0);
79
+ disjoint_union->pairs = malloc(sizeof(pair_vector));
80
+ init(disjoint_union->pairs);
128
81
 
129
- disjoint_union->forest = forest;
130
- disjoint_union->rank = rank;
131
82
  disjoint_union->subset_count = 0;
132
83
 
133
84
  return disjoint_union;
@@ -141,15 +92,7 @@ static disjoint_union_data *create_disjoint_union() {
141
92
  static void disjoint_union_free(void *ptr) {
142
93
  if (ptr) {
143
94
  disjoint_union_data *disjoint_union = ptr;
144
- freeDynamicArray(disjoint_union->forest);
145
- freeDynamicArray(disjoint_union->rank);
146
-
147
- free(disjoint_union->forest);
148
- disjoint_union->forest = NULL;
149
-
150
- free(disjoint_union->rank);
151
- disjoint_union->rank = NULL;
152
-
95
+ cleanup(disjoint_union->pairs);
153
96
  xfree(disjoint_union);
154
97
  }
155
98
  }
@@ -162,8 +105,7 @@ static void disjoint_union_free(void *ptr) {
162
105
  * Is the given element already a member of the universe?
163
106
  */
164
107
  static int present_p(disjoint_union_data *disjoint_union, size_t element) {
165
- DynamicArray *forest = (DynamicArray *)disjoint_union->forest;
166
- return (forest->size > element && (forest->array[element] != forest->default_val));
108
+ return (size(disjoint_union->pairs) > element && (parent(disjoint_union, element) != DEFAULT_PARENT));
167
109
  }
168
110
 
169
111
  /*
@@ -172,6 +114,13 @@ static int present_p(disjoint_union_data *disjoint_union, size_t element) {
172
114
  static void assert_membership(disjoint_union_data *disjoint_union, size_t element) {
173
115
  if (!present_p(disjoint_union, element)) {
174
116
  rb_raise(eSharedDataError, "Value %zu is not part of the universe", element);
117
+ /* rb_raise( */
118
+ /* eSharedDataError, */
119
+ /* "Value %zu is not part of the universe, size = %zu, forest_val = %lu", */
120
+ /* element, */
121
+ /* size(disjoint_union->pairs), */
122
+ /* get(disjoint_union->pairs, element)->parent */
123
+ /* ); */
175
124
  }
176
125
  }
177
126
 
@@ -185,47 +134,52 @@ static void add_new_element(disjoint_union_data *disjoint_union, size_t element)
185
134
  rb_raise(eSharedDataError, "Element %zu already present in the universe", element);
186
135
  }
187
136
 
188
- assignInDynamicArray(disjoint_union->forest, element, element);
189
- assignInDynamicArray(disjoint_union->rank, element, 0);
137
+ // Expand the underlying vector if necessary
138
+ size_t sz = size(disjoint_union->pairs);
139
+ if (sz <= element) {
140
+ resize(disjoint_union->pairs, element + 1);
141
+ for (size_t i = sz; i <= element; i++) {
142
+ lval(disjoint_union->pairs, i) = default_pair;
143
+ }
144
+ }
145
+
146
+ lval(disjoint_union->pairs, element) = make_data_pair(element, 0l);
190
147
  disjoint_union->subset_count++;
191
148
  }
192
149
 
193
150
  /*
194
- * Find the canonical representative of the given element. This is the root of the tree (in forest) containing element.
151
+ * Find the canonical representative of the given element. This is the root of the tree containing it.
195
152
  *
196
153
  * Two elements are in the same subset exactly when their canonical representatives are equal.
197
154
  */
198
155
  static size_t find(disjoint_union_data *disjoint_union, size_t element) {
199
156
  assert_membership(disjoint_union, element);
200
157
 
201
- // We implement find with "halving" to shrink the length of paths to the root. See Tarjan and van Leeuwin p 252.
202
- long *d = disjoint_union->forest->array; // the actual forest data
158
+ // We use "halving" to shrink the length of paths to the root. See Tarjan and van Leeuwen p 252.
203
159
  size_t x = element;
204
- while (d[d[x]] != d[x]) {
205
- x = d[x] = d[d[x]];
160
+ long p, gp; // parent and grandparent
161
+ while (p = parent(disjoint_union, x), gp = parent(disjoint_union, p), p != gp) {
162
+ parent(disjoint_union, x) = gp; // halving: point x at its grandparent (assigning to p's parent would be a no-op)
163
+ x = gp;
206
164
  }
207
- return d[x];
165
+ return parent(disjoint_union, x);
208
166
  }
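As a rough illustration of the two optimizations named in the comments here (path halving in `find` and linking by rank), the following is a minimal pure-Ruby sketch. The class `SketchDisjointUnion` and its private `link` helper are hypothetical names chosen for this example, not the gem's implementation, and quietly returning from `unite` when both elements already share a root is an assumption.

```ruby
# Hypothetical pure-Ruby sketch; the names are illustrative, not the gem's API.
class SketchDisjointUnion
  attr_reader :subset_count

  def initialize(size)
    @parent = Array.new(size) { |i| i } # each element starts as its own root
    @rank = Array.new(size, 0)
    @subset_count = size
  end

  # Root of the tree containing element, shortening the path as we go.
  def find(element)
    x = element
    while @parent[@parent[x]] != @parent[x]
      # Path halving: point x at its grandparent, then step up to it.
      @parent[x] = @parent[@parent[x]]
      x = @parent[x]
    end
    @parent[x]
  end

  # Merge the subsets containing e and f (assumed to be a no-op if already merged).
  def unite(e, f)
    root_e = find(e)
    root_f = find(f)
    return if root_e == root_f

    link(root_e, root_f)
  end

  private

  # Both arguments must be distinct roots. The root of lower rank is attached
  # under the root of higher rank; ties go to r1, whose rank then grows by one.
  def link(r1, r2)
    if @rank[r1] > @rank[r2]
      @parent[r2] = r1
    elsif @rank[r1] == @rank[r2]
      @parent[r2] = r1
      @rank[r1] += 1
    else
      @parent[r1] = r2
    end
    @subset_count -= 1
  end
end
```

Two elements are in the same subset exactly when `find` returns the same root for both, so a membership check is just a comparison of two `find` results.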
209
167
 
210
168
  /*
211
- * "Link"" the two given elements so that they are in the same subset now.
169
+ * "Link" the two given elements so that they are in the same subset now.
212
170
  *
213
171
  * In other words, merge the subtrees containing the two elements.
214
172
  *
215
- * Good performace (see Tarjan and van Leeuwin) assumes that elt1 and elt2 area are disinct and already the roots of their trees,
216
- * though we don't check that here.
173
+ * elt1 and elt2 must be distinct and the roots of their trees, though we don't check that here.
217
174
  */
218
175
  static void link_roots(disjoint_union_data *disjoint_union, size_t elt1, size_t elt2) {
219
- long *rank = disjoint_union->rank->array;
220
- long *forest = disjoint_union->forest->array;
221
-
222
- if (rank[elt1] > rank[elt2]) {
223
- forest[elt2] = elt1;
224
- } else if (rank[elt1] == rank[elt2]) {
225
- forest[elt2] = elt1;
226
- rank[elt1]++;
176
+ if (rank(disjoint_union, elt1) > rank(disjoint_union, elt2)) {
177
+ parent(disjoint_union, elt2) = elt1;
178
+ } else if (rank(disjoint_union, elt1) == rank(disjoint_union, elt2)) {
179
+ parent(disjoint_union, elt2) = elt1;
180
+ rank(disjoint_union, elt1)++;
227
181
  } else {
228
- forest[elt1] = elt2;
182
+ parent(disjoint_union, elt1) = elt2;
229
183
  }
230
184
 
231
185
  disjoint_union->subset_count--;
@@ -263,7 +217,9 @@ static void unite(disjoint_union_data *disjoint_union, size_t elt1, size_t elt2)
263
217
  static size_t disjoint_union_memsize(const void *ptr) {
264
218
  if (ptr) {
265
219
  const disjoint_union_data *du = ptr;
266
- return sizeof(disjoint_union_data) + _size_of(du->forest) + _size_of(du->rank);
220
+
221
+ // See https://github.com/JacksonAllan/CC/issues/3
222
+ return sizeof( cc_vec_hdr_ty ) + cap( du->pairs ) * CC_EL_SIZE( *(du->pairs) );
267
223
  } else {
268
224
  return 0;
269
225
  }
@@ -286,21 +242,7 @@ static const rb_data_type_t disjoint_union_type = {
286
242
  };
287
243
 
288
244
  /*
289
- * Helper: check that a Ruby value is a non-negative Fixnum and convert it to a C unsigned long
290
- */
291
- static unsigned long checked_nonneg_fixnum(VALUE val) {
292
- Check_Type(val, T_FIXNUM);
293
- long c_val = FIX2LONG(val);
294
-
295
- if (c_val < 0) {
296
- rb_raise(eSharedDataError, "Value must be non-negative");
297
- }
298
-
299
- return c_val;
300
- }
301
-
302
- /*
303
- * Unwrap a Rubyfied disjoint union to get the C struct inside.
245
+ * Unwrap a Ruby-side disjoint union object to get the C struct inside.
304
246
  */
305
247
  static disjoint_union_data *unwrapped(VALUE self) {
306
248
  disjoint_union_data *disjoint_union;
@@ -333,9 +275,13 @@ static VALUE disjoint_union_init(int argc, VALUE *argv, VALUE self) {
333
275
  size_t initial_size = checked_nonneg_fixnum(argv[0]);
334
276
  disjoint_union_data *disjoint_union = unwrapped(self);
335
277
 
278
+ pair_vector *pair_vec = disjoint_union->pairs;
279
+ resize(pair_vec, initial_size);
280
+
336
281
  for (size_t i = 0; i < initial_size; i++) {
337
- add_new_element(disjoint_union, i);
282
+ lval(pair_vec, i) = make_data_pair(i, 0);
338
283
  }
284
+ disjoint_union->subset_count = initial_size;
339
285
  }
340
286
  return self;
341
287
  }
@@ -343,7 +289,7 @@ static VALUE disjoint_union_init(int argc, VALUE *argv, VALUE self) {
343
289
  /**
344
290
  * And now the simple wrappers around the Disjoint Union C functionality. In each case we
345
291
  * - unwrap a 'VALUE self',
346
- * - i.e., theCDisjointUnion instance that contains a disjoint_union_data struct;
292
+ * - i.e., the CDisjointUnion instance on the Ruby side;
347
293
  * - munge any other arguments into longs;
348
294
  * - call the appropriate C function to act on the struct; and
349
295
  * - return an appropriate VALUE that the Ruby runtime can use.
@@ -354,7 +300,7 @@ static VALUE disjoint_union_init(int argc, VALUE *argv, VALUE self) {
354
300
  /*
355
301
  * Add a new subset to the universe containing the element +new_v+.
356
302
  *
357
- * @param the new element, starting in its own singleton subset
303
+ * @param arg the new element, starting in its own singleton subset
358
304
  * - it must be a non-negative integer, not already part of the universe of elements.
359
305
  */
360
306
  static VALUE disjoint_union_make_set(VALUE self, VALUE arg) {
@@ -412,7 +358,7 @@ static VALUE disjoint_union_unite(VALUE self, VALUE arg1, VALUE arg2) {
412
358
  * - Tarjan, Robert E., van Leeuwen, Jan (1984). _Worst-case analysis of set union algorithms_. Journal of the ACM. 31 (2): 245–281.
413
359
  */
414
360
  void Init_c_disjoint_union() {
415
- VALUE mDataStructuresRMolinari = rb_define_module("DataStructuresRMolinari");
361
+ //VALUE mDataStructuresRMolinari = rb_define_module("DataStructuresRMolinari");
416
362
  VALUE cDisjointUnion = rb_define_class_under(mDataStructuresRMolinari, "CDisjointUnion", rb_cObject);
417
363
 
418
364
  rb_define_alloc_func(cDisjointUnion, disjoint_union_alloc);
@@ -3,10 +3,15 @@ require 'mkmf'
3
3
  abort 'missing malloc()' unless have_func "malloc"
4
4
  abort 'missing realloc()' unless have_func "realloc"
5
5
 
6
- if try_cflags('-O')
7
- append_cflags('-O')
6
+ if try_cflags('-O3')
7
+ append_cflags('-O3')
8
8
  end
9
9
 
10
10
  extension_name = "c_disjoint_union"
11
11
  dir_config(extension_name)
12
+
13
+ $srcs = ["disjoint_union.c", "../shared.c"]
14
+ $INCFLAGS << " -I$(srcdir)/.."
15
+ $VPATH << "$(srcdir)/.."
16
+
12
17
  create_makefile("data_structures_rmolinari/c_disjoint_union")
@@ -0,0 +1,17 @@
1
+ require 'mkmf'
2
+
3
+ abort 'missing malloc()' unless have_func "malloc"
4
+ abort 'missing realloc()' unless have_func "realloc"
5
+
6
+ if try_cflags('-O3')
7
+ append_cflags('-O3')
8
+ end
9
+
10
+ extension_name = "c_segment_tree_template"
11
+ dir_config(extension_name)
12
+
13
+ $srcs = ["segment_tree_template.c", "../shared.c"]
14
+ $INCFLAGS << " -I$(srcdir)/.."
15
+ $VPATH << "$(srcdir)/.."
16
+
17
+ create_makefile("data_structures_rmolinari/c_segment_tree_template")
@@ -0,0 +1,362 @@
1
+ /*
2
+ * This is a C implementation of a Segment Tree data structure.
3
+ *
4
+ * More specifically, it is the C version of the SegmentTreeTemplate Ruby class, for which see elsewhere in the repo.
5
+ */
6
+
7
+ #include "ruby.h"
8
+ #include "shared.h"
9
+
10
+ #define single_cell_val_at(seg_tree, idx) rb_funcall(seg_tree->single_cell_array_val_lambda, rb_intern("call"), 1, LONG2FIX(idx))
11
+ #define combined_val(seg_tree, v1, v2) rb_funcall(seg_tree->combine_lambda, rb_intern("call"), 2, (v1), (v2))
12
+
13
+ /**
14
+ * The C implementation of a generic Segment Tree
15
+ */
16
+
17
+ typedef struct {
18
+ VALUE *tree; // The 1-based implicit binary tree in which the data structure lives
19
+ VALUE single_cell_array_val_lambda;
20
+ VALUE combine_lambda;
21
+ VALUE identity;
22
+ size_t size; // the size of the underlying data array
23
+ size_t tree_alloc_size; // the size of the VALUE* tree array
24
+ } segment_tree_data;
25
+
26
+ /************************************************************
27
+ * Memory Management
28
+ *
29
+ */
30
+
31
+ /*
32
+ * Create one (on the heap).
33
+ */
34
+ static segment_tree_data *create_segment_tree() {
35
+ segment_tree_data *segment_tree = malloc(sizeof(segment_tree_data));
36
+
37
+ // Allocate the structures
38
+ segment_tree->tree = NULL; // we don't yet know how much space we need
39
+
40
+ segment_tree->single_cell_array_val_lambda = 0;
41
+ segment_tree->combine_lambda = 0;
42
+ segment_tree->size = 0; // we don't know the right value yet
43
+
44
+ return segment_tree;
45
+ }
46
+
47
+ /*
48
+ * Free the memory associated with a segment_tree.
49
+ *
50
+ * This will end up getting triggered by the Ruby garbage collector. Ruby learns about it via the segment_tree_type struct below.
51
+ */
52
+ static void segment_tree_free(void *ptr) {
53
+ if (ptr) {
54
+ segment_tree_data *segment_tree = ptr;
55
+ xfree(segment_tree->tree);
56
+ xfree(segment_tree);
57
+ }
58
+ }
59
+
60
+ /*
61
+ * How much memory (roughly) does a segment_tree_data instance consume?
62
+ *
63
+ * I guess the Ruby runtime can use this information when deciding how aggressive to be during garbage collection and such.
64
+ */
65
+ static size_t segment_tree_memsize(const void *ptr) {
66
+ if (ptr) {
67
+ const segment_tree_data *st = ptr;
68
+
69
+ // for the tree array plus the size of the segment_tree_data struct itself.
70
+ return sizeof( VALUE ) * st->tree_alloc_size + sizeof(segment_tree_data);
71
+ } else {
72
+ return 0;
73
+ }
74
+ }
75
+
76
+ /*
77
+ * Mark the Ruby objects we hold so that the Ruby garbage collector knows that they are still in use.
78
+ */
79
+ static void segment_tree_mark(void *ptr) {
80
+ segment_tree_data *st = ptr;
81
+
82
+ rb_gc_mark(st->combine_lambda);
83
+ rb_gc_mark(st->single_cell_array_val_lambda);
84
+ rb_gc_mark(st->identity);
85
+
86
+ for (size_t i = 0; i < st->tree_alloc_size; i++) {
87
+ VALUE value = st->tree[i];
88
+ if (value) {
89
+ rb_gc_mark(value);
90
+ }
91
+ }
92
+ }
93
+
94
+
95
+ /*
96
+ * A configuration struct that tells the Ruby runtime how to deal with a segment_tree_data object.
97
+ *
98
+ * https://docs.ruby-lang.org/en/master/extension_rdoc.html#label-Encapsulate+C+data+into+a+Ruby+object
99
+ */
100
+ static const rb_data_type_t segment_tree_type = {
101
+ .wrap_struct_name = "segment_tree_template",
102
+ { // help for the Ruby garbage collector
103
+ .dmark = segment_tree_mark, // dmark, for marking other Ruby objects.
104
+ .dfree = segment_tree_free, // how to free the memory associated with an object
105
+ .dsize = segment_tree_memsize, // roughly how much space does the object consume?
106
+ },
107
+ .data = NULL, // a data field we could use for something here if we wanted. Ruby ignores it
108
+ .flags = 0 // GC-related flag values.
109
+ };
110
+
111
+ /*
112
+ * End memory management functions.
113
+ ************************************************************/
114
+
115
+
116
+ /************************************************************
117
+ * Wrapping and unwrapping the C struct and other things.
118
+ *
119
+ */
120
+
121
+ /*
122
+ * Unwrap a Ruby-side segment tree object to get the C struct inside.
123
+ *
124
+ * TODO: consider a macro in a shared header
125
+ */
126
+ static segment_tree_data *unwrapped(VALUE self) {
127
+ segment_tree_data *segment_tree;
128
+ TypedData_Get_Struct((self), segment_tree_data, &segment_tree_type, segment_tree);
129
+ return segment_tree;
130
+ }
131
+
132
+ /*
133
+ * Allocate a segment_tree_data struct and wrap it for the Ruby runtime.
134
+ *
135
+ * This is for CSegmentTreeTemplate.allocate on the Ruby side.
136
+ */
137
+ static VALUE segment_tree_alloc(VALUE klass) {
138
+ // Get one on the heap
139
+ segment_tree_data *segment_tree = create_segment_tree();
140
+ // ...and wrap it into a Ruby object
141
+ return TypedData_Wrap_Struct(klass, &segment_tree_type, segment_tree);
142
+ }
143
+
144
+ /*
145
+ * End wrapping and unwrapping functions.
146
+ ************************************************************/
147
+
148
+ /************************************************************
149
+ * The Segment Tree API on the C side.
150
+ *
151
+ * We wrap these in the Ruby-ready functions below
152
+ */
153
+
154
+ /*
155
+ * Recursively build the internal tree data structure.
156
+ *
157
+ * - tree_idx: the index into the tree array of the node being calculated
158
+ * - [tree_l, tree_r]: the sub-interval of the underlying array data corresponding to the tree node being calculated.
159
+ */
160
+ static void build(segment_tree_data *segment_tree, size_t tree_idx, size_t tree_l, size_t tree_r) {
161
+ VALUE *tree = segment_tree->tree;
162
+
163
+ if (tree_l == tree_r) {
164
+ // Base case: the node corresponds to a subarray of length 1.
165
+ segment_tree->tree[tree_idx] = single_cell_val_at(segment_tree, tree_l);
166
+ } else {
167
+ // Build the two child nodes, and then combine their values for this node.
168
+ size_t mid = midpoint(tree_l, tree_r);
169
+ size_t left = left_child(tree_idx);
170
+ size_t right = right_child(tree_idx);
171
+
172
+ build(segment_tree, left, tree_l, mid);
173
+ build(segment_tree, right, mid + 1, tree_r);
174
+
175
+ VALUE comb_val = combined_val(segment_tree, tree[left], tree[right]);
176
+ segment_tree->tree[tree_idx] = comb_val;
177
+ }
178
+ }
179
+
180
+ /*
181
+ * Set up the internals with the arguments we get from #initialize.
182
+ *
183
+ * - combine: must be callable
184
+ * - single_cell_array_val: must be callable
185
+ * - size: must be a positive integer
186
+ * - identity: we don't care what it is.
187
+ * - maybe we should check at least that it is not 0. But Qnil is fine.
188
+ */
189
+ static void setup(segment_tree_data* seg_tree, VALUE combine, VALUE single_cell_array_val, VALUE size, VALUE identity) {
190
+ VALUE idCall = rb_intern("call");
191
+
192
+ if (!rb_obj_respond_to(combine, idCall, TRUE)) {
193
+ rb_raise(rb_eArgError, "wrong type argument %"PRIsVALUE" (should be callable)", rb_obj_class(combine));
194
+ }
195
+
196
+ if (!rb_obj_respond_to(single_cell_array_val, idCall, TRUE)) {
197
+ rb_raise(rb_eArgError, "wrong type argument %"PRIsVALUE" (should be callable)", rb_obj_class(single_cell_array_val));
198
+ }
199
+
200
+ seg_tree->combine_lambda = combine;
201
+ seg_tree->single_cell_array_val_lambda = single_cell_array_val;
202
+ seg_tree->identity = identity;
203
+ seg_tree->size = checked_nonneg_fixnum(size);
204
+
205
+ if (seg_tree->size == 0) {
206
+ rb_raise(rb_eArgError, "size must be positive.");
207
+ }
208
+
209
+ // Implicit binary tree with n leaves and straightforward left() and right() may use indices up to 4n. But see here for a way to
210
+ // reduce the requirement to 2n: https://cp-algorithms.com/data_structures/segment_tree.html#memory-efficient-implementation
211
+ size_t tree_size = 1 + 4 * seg_tree->size;
212
+ seg_tree->tree = calloc(tree_size, sizeof(VALUE));
213
+ seg_tree->tree_alloc_size = tree_size;
214
+
215
+ build(seg_tree, TREE_ROOT, 0, seg_tree->size - 1);
216
+ }
217
+
218
+
219
+ /*
220
+ * Determine the value for the subarray A(left, right).
221
+ *
222
+ * - tree_idx: the index in the array of the node we are currently visiting
223
+ * - tree_l..tree_r: the subarray handled by the current node.
224
+ * - left..right: the subarray whose value we are currently looking for.
225
+ *
226
+ * As an invariant we have left..right \subset tree_l..tree_r.
227
+ *
228
+ * We start out with
229
+ * - tree_idx = TREE_ROOT
230
+ * - tree_l..tree_r = 0..(size - 1), and
231
+ * - left..right given by the client code's query
232
+ *
233
+ * If [tree_l, tree_r] = [left, right] then the current node gives the desired answer. Otherwise we descend the tree with one or two
234
+ * recursive calls.
235
+ *
236
+ * If left..right is contained in the bottom or top half of tree_l..tree_r we descend to the corresponding child with one recursive
237
+ * call. Otherwise we split left..right at the midpoint of tree_l..tree_r, make two recursive calls, and then combine the results.
238
+ */
239
+ static VALUE determine_val(segment_tree_data* seg_tree, size_t tree_idx, size_t left, size_t right, size_t tree_l, size_t tree_r) {
240
+ // Does the current tree node exactly serve up the interval we're interested in?
241
+ if (left == tree_l && right == tree_r) {
242
+ return seg_tree->tree[tree_idx];
243
+ }
244
+
245
+ // We need to go further down the tree
246
+ size_t mid = midpoint(tree_l, tree_r);
247
+ if (mid >= right) {
248
+ // Our interval is contained by the left child's interval
249
+ return determine_val(seg_tree, left_child(tree_idx), left, right, tree_l, mid);
250
+ } else if (mid + 1 <= left) {
251
+ // Our interval is contained by the right child's interval
252
+ return determine_val(seg_tree, right_child(tree_idx), left, right, mid + 1, tree_r);
253
+ } else {
254
+ // Our interval is split between the two, so we need to combine the results from the children.
255
+ return rb_funcall(
256
+ seg_tree->combine_lambda, rb_intern("call"), 2,
257
+ determine_val(seg_tree, left_child(tree_idx), left, mid, tree_l, mid),
258
+ determine_val(seg_tree, right_child(tree_idx), mid + 1, right, mid + 1, tree_r)
259
+ );
260
+ }
261
+ }
262
+
263
+ /*
264
+ * Update the structure to reflect the change in the underlying array at index idx.
265
+ *
266
+ * - idx: the index at which the underlying array data has changed.
267
+ * - tree_idx: the index in the internal data structure of the node we are currently visiting.
268
+ * - tree_l..tree_r: the range handled by the current node
269
+ */
270
+ static void update_val_at(segment_tree_data *seg_tree, size_t idx, size_t tree_idx, size_t tree_l, size_t tree_r) {
271
+ if (tree_l == tree_r) {
272
+ // We have found the base case of our update
273
+ if (tree_l != idx) {
274
+ rb_raise(
275
+ eSharedInternalLogicError,
276
+ "tree_l == tree_r == %lu but they do not agree with the idx %lu holding the updated value",
277
+ tree_r, idx
278
+ );
279
+ }
280
+ seg_tree->tree[tree_idx] = single_cell_val_at(seg_tree, tree_l);
281
+ } else {
282
+ // Recursively update the appropriate subtree...
283
+ size_t mid = midpoint(tree_l, tree_r);
284
+ size_t left = left_child(tree_idx);
285
+ size_t right = right_child(tree_idx);
286
+ if (mid >= idx) {
287
+ update_val_at(seg_tree, idx, left, tree_l, mid);
288
+ } else {
289
+ update_val_at(seg_tree, idx, right, mid + 1, tree_r);
290
+ }
291
+ // ...and ourself to incorporate the change
292
+ seg_tree->tree[tree_idx] = combined_val(seg_tree, seg_tree->tree[left], seg_tree->tree[right]);
293
+ }
294
+ }
295
+
296
+ /*
297
+ * End C implementation of the Segment Tree API
298
+ ************************************************************/
299
+
300
+ /**
301
+ * And now the wrappers around the C functionality.
302
+ */
303
+
304
+ /*
305
+ * CSegmentTreeTemplate#c_initialize.
306
+ *
307
+ * (see CSegmentTreeTemplate#initialize).
308
+ */
309
+ static VALUE segment_tree_init(VALUE self, VALUE combine, VALUE single_cell_array_val, VALUE size, VALUE identity) {
310
+ setup(unwrapped(self), combine, single_cell_array_val, size, identity);
311
+ return self;
312
+ }
313
+
314
+ /*
315
+ * (see SegmentTreeTemplate#query_on)
316
+ */
317
+ static VALUE segment_tree_query_on(VALUE self, VALUE left, VALUE right) {
318
+ segment_tree_data* seg_tree = unwrapped(self);
319
+ size_t c_left = checked_nonneg_fixnum(left);
320
+ size_t c_right = checked_nonneg_fixnum(right);
321
+
322
+ if (c_right >= seg_tree->size) {
323
+ rb_raise(eSharedDataError, "Bad query interval %lu..%lu (size = %lu)", c_left, c_right, seg_tree->size);
324
+ }
325
+
326
+ if (c_left > c_right) {
327
+ // empty interval.
328
+ return seg_tree->identity;
329
+ }
330
+
331
+ return determine_val(seg_tree, TREE_ROOT, c_left, c_right, 0, seg_tree->size - 1);
332
+ }
333
+
334
+ /*
335
+ * (see SegmentTreeTemplate#update_at)
336
+ */
337
+ static VALUE segment_tree_update_at(VALUE self, VALUE idx) {
338
+ segment_tree_data *seg_tree = unwrapped(self);
339
+ size_t c_idx = checked_nonneg_fixnum(idx);
340
+
341
+ if (c_idx >= seg_tree->size) {
342
+ rb_raise(eSharedDataError, "Cannot update value at index %lu, size = %lu", c_idx, seg_tree->size);
343
+ }
344
+
345
+ update_val_at(seg_tree, c_idx, TREE_ROOT, 0, seg_tree->size - 1);
346
+
347
+ return Qnil;
348
+ }
349
+
350
+ /*
351
+ * A generic Segment Tree template, written in C.
352
+ *
353
+ * (see SegmentTreeTemplate)
354
+ */
355
+ void Init_c_segment_tree_template() {
356
+ VALUE cSegmentTreeTemplate = rb_define_class_under(mDataStructuresRMolinari, "CSegmentTreeTemplate", rb_cObject);
357
+
358
+ rb_define_alloc_func(cSegmentTreeTemplate, segment_tree_alloc);
359
+ rb_define_method(cSegmentTreeTemplate, "c_initialize", segment_tree_init, 4);
360
+ rb_define_method(cSegmentTreeTemplate, "query_on", segment_tree_query_on, 2);
361
+ rb_define_method(cSegmentTreeTemplate, "update_at", segment_tree_update_at, 1);
362
+ }
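The recursion in `build`, `determine_val`, and `update_val_at` above can be sketched in plain Ruby. This is a minimal illustration and not the gem's code: the class name `SketchSegmentTree` is hypothetical, and it assumes the tree root lives at index 1, which is what the `i << 1` child arithmetic suggests.

```ruby
# Illustrative pure-Ruby sketch of the top-down recursion.
# SketchSegmentTree is a hypothetical name, not part of the gem.
class SketchSegmentTree
  ROOT = 1 # assumption: root at index 1, children at 2i and 2i + 1

  def initialize(combine:, single_cell_array_val:, size:, identity:)
    raise ArgumentError, 'size must be positive' unless size.positive?

    @combine = combine
    @single_cell_array_val = single_cell_array_val
    @size = size
    @identity = identity
    @tree = Array.new(1 + 4 * size) # same generous bound as the C code
    build(ROOT, 0, size - 1)
  end

  # Value on the inclusive interval left..right; identity if it is empty.
  def query_on(left, right)
    raise ArgumentError, "bad interval #{left}..#{right}" if left.negative? || right >= @size
    return @identity if left > right

    determine_val(ROOT, left, right, 0, @size - 1)
  end

  # Tell the tree that the underlying data changed at idx.
  def update_at(idx)
    raise ArgumentError, "bad index #{idx}" unless (0...@size).cover?(idx)

    update_val_at(idx, ROOT, 0, @size - 1)
    nil
  end

  private

  def build(tree_idx, tree_l, tree_r)
    if tree_l == tree_r
      @tree[tree_idx] = @single_cell_array_val.call(tree_l)
    else
      mid = (tree_l + tree_r) / 2
      build(2 * tree_idx, tree_l, mid)
      build(2 * tree_idx + 1, mid + 1, tree_r)
      @tree[tree_idx] = @combine.call(@tree[2 * tree_idx], @tree[2 * tree_idx + 1])
    end
  end

  def determine_val(tree_idx, left, right, tree_l, tree_r)
    return @tree[tree_idx] if left == tree_l && right == tree_r

    mid = (tree_l + tree_r) / 2
    if mid >= right # entirely inside the left child
      determine_val(2 * tree_idx, left, right, tree_l, mid)
    elsif mid + 1 <= left # entirely inside the right child
      determine_val(2 * tree_idx + 1, left, right, mid + 1, tree_r)
    else # split at mid and combine the two partial answers
      @combine.call(
        determine_val(2 * tree_idx, left, mid, tree_l, mid),
        determine_val(2 * tree_idx + 1, mid + 1, right, mid + 1, tree_r)
      )
    end
  end

  def update_val_at(idx, tree_idx, tree_l, tree_r)
    if tree_l == tree_r
      @tree[tree_idx] = @single_cell_array_val.call(tree_l)
    else
      mid = (tree_l + tree_r) / 2
      if mid >= idx
        update_val_at(idx, 2 * tree_idx, tree_l, mid)
      else
        update_val_at(idx, 2 * tree_idx + 1, mid + 1, tree_r)
      end
      @tree[tree_idx] = @combine.call(@tree[2 * tree_idx], @tree[2 * tree_idx + 1])
    end
  end
end

data = [3, 1, 4, 1, 5, 9, 2, 6]
max_tree = SketchSegmentTree.new(
  combine: ->(a, b) { [a, b].max },
  single_cell_array_val: ->(i) { data[i] },
  size: data.size,
  identity: -Float::INFINITY
)
max_tree.query_on(1, 3) # maximum of data[1..3]
data[2] = 0
max_tree.update_at(2)   # the lambda re-reads data[2]; no value is passed in
```

The sketch keeps the same argument order and the same three-way case split as the C functions, so it may help when reading them side by side.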
data/ext/shared.c ADDED
@@ -0,0 +1,32 @@
1
+ #include "shared.h"
2
+
3
+ /*
4
+ * Arithmetic for in-array binary tree
5
+ */
6
+ size_t midpoint(size_t left, size_t right) {
7
+ return (left + right) / 2;
8
+ }
9
+
10
+ size_t left_child(size_t i) {
11
+ return i << 1;
12
+ }
13
+
14
+ size_t right_child(size_t i) {
15
+ return 1 + (i << 1);
16
+ }
17
+
18
+ /*
19
+ * Check that a Ruby value is a non-negative Fixnum and convert it to a C unsigned long
20
+ */
21
+ unsigned long checked_nonneg_fixnum(VALUE val) {
22
+ Check_Type(val, T_FIXNUM);
23
+ long c_val = FIX2LONG(val);
24
+
25
+ if (c_val < 0) {
26
+ rb_raise(eSharedDataError, "Value must be non-negative");
27
+ }
28
+
29
+ return c_val;
30
+ }
31
+
32
+
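The allocation of `1 + 4 * size` slots in `setup` relies on the child arithmetic above never reaching index 4n for n leaves. A small Ruby sketch can check that empirically; `max_tree_index` is a hypothetical helper written for this illustration, and it assumes the root sits at index 1.

```ruby
# Hypothetical helper: the largest tree index touched when building a
# top-down segment tree over the interval tree_l..tree_r, using the same
# midpoint / left_child / right_child arithmetic as shared.c.
def max_tree_index(tree_idx, tree_l, tree_r)
  return tree_idx if tree_l == tree_r

  mid = (tree_l + tree_r) / 2
  [
    max_tree_index(tree_idx << 1, tree_l, mid),
    max_tree_index((tree_idx << 1) + 1, mid + 1, tree_r)
  ].max
end

# Check the 4n bound for a range of small sizes.
(1..64).each do |n|
  used = max_tree_index(1, 0, n - 1)
  raise "bound violated for n = #{n}" unless used < 4 * n
end
```

When n is a power of two the tree is perfectly balanced and the largest index is 2n - 1; other sizes leave gaps in the array, which is why the looser 4n bound is used.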
@@ -0,0 +1,112 @@
1
+ require 'must_be'
2
+
3
+ require_relative 'shared'
4
+ require_relative 'c_segment_tree_template'
5
+
6
+ # The template of Segment Tree, which can be used for various interval-related purposes, like efficiently finding the sum (or min or
7
+ # max) on an arbitrary subarray of a given array.
8
+ #
9
+ # There is an excellent description of the data structure at https://cp-algorithms.com/data_structures/segment_tree.html. The
10
+ # Wikipedia article (https://en.wikipedia.org/wiki/Segment_tree) appears to describe a different data structure which is sometimes
11
+ # called an "interval tree."
12
+ #
13
+ # For more details (and some close-to-metal analysis of run time, especially for large datasets) see
14
+ # https://en.algorithmica.org/hpc/data-structures/segment-trees/. In particular, this shows how to do a bottom-up implementation,
15
+ # which is faster, at least for large datasets and cache-relevant compiled code. These issues don't really apply to code written in
16
+ # Ruby.
17
+ #
18
+ # This is a generic implementation, intended to allow easy configuration for concrete instances. See the parameters to the
19
+ # initializer and the definitions of concrete realisations like MaxValSegmentTree.
20
+ #
21
+ # We do O(n) work to build the internal data structure at initialization. Then we answer queries in O(log n) time.
22
+ class DataStructuresRMolinari::CSegmentTreeTemplate
23
+
24
+ # Construct a concrete instance of a Segment Tree. See details at the links above for the underlying concepts here.
25
+ # @param combine a lambda that takes two values and munges them into a combined value.
26
+ # - For example, if we are calculating sums over subintervals, combine.call(a, b) = a + b, while if we are doing maxima we will
27
+ # return max(a, b).
28
+ # - Things get more complicated when we are calculating, say, the _index_ of the maximal value in a subinterval. Now it is not
29
+ # enough simply to store that index at each tree node, because to combine the indices from two child nodes we need to know
30
+ # not only the index of the maximal element in each child node's interval but also the maximal values themselves, so we know
31
+ # which one "wins" for the parent node. This affects the sort of work we need to do when combining and the value provided by
32
+ # the +single_cell_array_val+ lambda.
33
+ # @param single_cell_array_val a lambda that takes an index i and returns the value we need to store in the #build
34
+ # operation for the subinterval i..i.
35
+ # - This will often simply be the value data[i], but in some cases it will be something else. For example, when we are
36
+ # calculating the index of the maximal value on each subinterval we need [i, data[i]] here.
37
+ # - If +update_at+ is called later, this lambda must close over the underlying data in a way that captures the updated value.
38
+ # @param size the size of the underlying data array, used in certain internal arithmetic.
39
+ # @param identity the value to return when we are querying on an empty interval
40
+ # - for sums, this will be zero; for maxima, this will be -Infinity, etc
41
+ def initialize(combine:, single_cell_array_val:, size:, identity:)
42
+ # having sorted out the keyword arguments, pass them more easily to the C layer.
43
+ c_initialize(combine, single_cell_array_val, size, identity)
44
+ end
45
+ end
46
+
47
+ # A segment tree that for an array A(0...n) answers questions of the form "what is the maximum value in the subinterval A(i..j)?"
48
+ # in O(log n) time.
49
+ #
50
+ # C version
51
+ #
52
+ # TODO: share the definition with (non-C) MaxValSegmentTree. The only difference is the class of the underlying segment tree
53
+ # template.
54
+ module DataStructuresRMolinari
55
+ class CMaxValSegmentTree
56
+ extend Forwardable
57
+
58
+ # Tell the tree that the value at idx has changed
59
+ def_delegator :@structure, :update_at
60
+
61
+ # @param data an object that contains values at integer indices based at 0, via +data[i]+.
62
+ # - This will usually be an Array, but it could also be a hash or a proc.
63
+ def initialize(data)
64
+ @structure = CSegmentTreeTemplate.new(
65
+ combine: ->(a, b) { [a, b].max },
66
+ single_cell_array_val: ->(i) { data[i] },
67
+ size: data.size,
68
+ identity: -Shared::INFINITY
69
+ )
70
+ end
71
+
72
+ # The maximum value in A(i..j).
73
+ #
74
+ # The arguments must be integers in 0...(A.size)
75
+ # @return the largest value in A(i..j) or -Infinity if i > j.
76
+ def max_on(i, j)
77
+ @structure.query_on(i, j)
78
+ end
79
+ end
80
+
81
+ # A segment tree that for an array A(0...n) answers questions of the form "what is the index of the maximal value in the
82
+ # subinterval A(i..j)?" in O(log n) time.
83
+ #
84
+ # C version
85
+ class CIndexOfMaxValSegmentTree
86
+ extend Forwardable
87
+
88
+ # Tell the tree that the value at idx has changed
89
+ def_delegator :@structure, :update_at
90
+
91
+ # @param (see MaxValSegmentTree#initialize)
92
+ def initialize(data)
93
+ @structure = CSegmentTreeTemplate.new(
94
+ combine: ->(p1, p2) { p1[1] >= p2[1] ? p1 : p2 },
95
+ single_cell_array_val: ->(i) { [i, data[i]] },
96
+ size: data.size,
97
+ identity: nil
98
+ )
99
+ end
100
+
101
+ # The index of the maximum value in A(i..j)
102
+ #
103
+ # The arguments must be integers in 0...(A.size)
104
+ # @return (Integer, nil) the index of the largest value in A(i..j) or +nil+ if i > j.
105
+ # - If there is more than one entry with that value, return one of the indices. There is no guarantee as to which one.
106
+ # - Return +nil+ if i > j
107
+ def index_of_max_val_on(i, j)
108
+ @structure.query_on(i, j)&.first # discard the value part of the pair, which is only bookkeeping
109
+ end
110
+ end
111
+
112
+ end
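The `[i, data[i]]` pairs used by `CIndexOfMaxValSegmentTree` can be illustrated without the tree at all. The sketch below folds the same two lambdas over an interval naively, in O(n); the name `index_of_max` is hypothetical and exists only for this illustration. It also shows why `single_cell_array_val` has to close over the data: after a change, the lambda sees the new value.

```ruby
data = [3, 9, 4, 9, 5]

# The same lambdas that CIndexOfMaxValSegmentTree passes to the template.
single_cell_array_val = ->(i) { [i, data[i]] }
combine = ->(p1, p2) { p1[1] >= p2[1] ? p1 : p2 }

# Hypothetical naive stand-in for query_on: fold the pairs left to right,
# then drop the value half of the winning pair. An empty interval gives nil.
index_of_max = lambda do |i, j|
  (i..j).map { |k| single_cell_array_val.call(k) }
        .reduce { |a, b| combine.call(a, b) }
        &.first
end

index_of_max.call(0, 4) # index of a maximal value in data[0..4]
data[1] = 0             # the closure sees this change on the next call
index_of_max.call(0, 4)
```

A real segment tree caches combined pairs at internal nodes, so unlike this fold it must be told about the change through `update_at`.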
@@ -17,6 +17,7 @@ require_relative 'shared'
17
17
  #
18
18
  # We do O(n) work to build the internal data structure at initialization. Then we answer queries in O(log n) time.
19
19
  class DataStructuresRMolinari::SegmentTreeTemplate
20
+ include Shared
20
21
  include Shared::BinaryTreeArithmetic
21
22
 
22
23
  # Construct a concrete instance of a Segment Tree. See details at the links above for the underlying concepts here.
@@ -47,27 +48,29 @@ class DataStructuresRMolinari::SegmentTreeTemplate
47
48
  end
48
49
 
49
50
  # The desired value (max, sum, etc.) on the subinterval left..right.
51
+ #
50
52
  # @param left the left end of the subinterval.
51
53
  # @param right the right end (inclusive) of the subinterval.
52
54
  #
55
+ # It must be that left..right is contained in 0...size.
56
+ #
53
57
  # The type of the return value depends on the concrete instance of the segment tree. We return the _identity_ element provided at
54
58
  # construction time if the interval is empty.
55
59
  def query_on(left, right)
56
- raise DataError, "Bad query interval #{left}..#{right}" if left.negative? || right >= @size
60
+ raise DataError, "Bad query interval #{left}..#{right} (size = #{@size})" unless (0...@size).cover?(left..right)
57
61
 
58
62
  return @identity if left > right # empty interval
59
63
 
60
64
  determine_val(root, left, right, 0, @size - 1)
61
65
  end
62
66
 
63
- # Update the value in the underlying array at the given idx
67
+ # Reflect the fact that the underlying array has been updated at the given idx
64
68
  #
65
69
  # @param idx an index in the underlying data array.
66
70
  #
67
71
  # Note that we don't need the updated value itself. We get that by calling the lambda +single_cell_array_val+ supplied at
68
72
  # construction.
69
73
  def update_at(idx)
70
- raise DataError, 'Cannot update an index outside the initial range of the underlying data' unless (0...@size).cover?(idx)
71
74
 
72
75
  update_val_at(idx, root, 0, @size - 1)
73
76
  end
@@ -105,9 +108,9 @@ class DataStructuresRMolinari::SegmentTreeTemplate
105
108
  left = left(tree_idx)
106
109
  right = right(tree_idx)
107
110
  if mid >= idx
108
- update_val_at(idx, left(tree_idx), tree_l, mid)
111
+ update_val_at(idx, left, tree_l, mid)
109
112
  else
110
- update_val_at(idx, right(tree_idx), mid + 1, tree_r)
113
+ update_val_at(idx, right, mid + 1, tree_r)
111
114
  end
112
115
  @tree[tree_idx] = @combine.call(@tree[left], @tree[right])
113
116
  end
@@ -10,9 +10,13 @@ end
10
10
 
11
11
  # These define classes inside module DataStructuresRMolinari
12
12
  require_relative 'data_structures_rmolinari/algorithms'
13
+
13
14
  require_relative 'data_structures_rmolinari/disjoint_union'
14
15
  require_relative 'data_structures_rmolinari/c_disjoint_union' # version as a C extension
16
+
15
17
  require_relative 'data_structures_rmolinari/segment_tree_template'
18
+ require_relative 'data_structures_rmolinari/c_segment_tree_template_impl'
19
+
16
20
  require_relative 'data_structures_rmolinari/heap'
17
21
  require_relative 'data_structures_rmolinari/max_priority_search_tree'
18
22
  require_relative 'data_structures_rmolinari/min_priority_search_tree'
@@ -34,6 +38,8 @@ module DataStructuresRMolinari
34
38
  # @param data an object that contains values at integer indices based at 0, via +data[i]+.
35
39
  # - This will usually be an Array, but it could also be a hash or a proc.
36
40
  def initialize(data)
41
+ data.must_be_a Enumerable
42
+
37
43
  @structure = SegmentTreeTemplate.new(
38
44
  combine: ->(a, b) { [a, b].max },
39
45
  single_cell_array_val: ->(i) { data[i] },
@@ -61,6 +67,8 @@ module DataStructuresRMolinari
61
67
 
62
68
  # @param (see MaxValSegmentTree#initialize)
63
69
  def initialize(data)
70
+ data.must_be_a Enumerable
71
+
64
72
  @structure = SegmentTreeTemplate.new(
65
73
  combine: ->(p1, p2) { p1[1] >= p2[1] ? p1 : p2 },
66
74
  single_cell_array_val: ->(i) { [i, data[i]] },
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: data_structures_rmolinari
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.4.3
4
+ version: 0.4.4
5
5
  platform: ruby
6
6
  authors:
7
7
  - Rory Molinari
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2023-01-27 00:00:00.000000000 Z
11
+ date: 2023-02-02 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: must_be
@@ -79,6 +79,7 @@ email: rorymolinari@gmail.com
79
79
  executables: []
80
80
  extensions:
81
81
  - ext/c_disjoint_union/extconf.rb
82
+ - ext/c_segment_tree_template/extconf.rb
82
83
  extra_rdoc_files: []
83
84
  files:
84
85
  - CHANGELOG.md
@@ -86,8 +87,12 @@ files:
86
87
  - Rakefile
87
88
  - ext/c_disjoint_union/disjoint_union.c
88
89
  - ext/c_disjoint_union/extconf.rb
90
+ - ext/c_segment_tree_template/extconf.rb
91
+ - ext/c_segment_tree_template/segment_tree_template.c
92
+ - ext/shared.c
89
93
  - lib/data_structures_rmolinari.rb
90
94
  - lib/data_structures_rmolinari/algorithms.rb
95
+ - lib/data_structures_rmolinari/c_segment_tree_template_impl.rb
91
96
  - lib/data_structures_rmolinari/disjoint_union.rb
92
97
  - lib/data_structures_rmolinari/heap.rb
93
98
  - lib/data_structures_rmolinari/max_priority_search_tree.rb