RubyGems - data_structures_rmolinari - Versions diffs - 0.4.2 → 0.4.4 - Mend

data_structures_rmolinari 0.4.2 → 0.4.4

Files changed (13)

checksums.yaml +4 -4
data/CHANGELOG.md +12 -0
data/README.md +35 -16
data/Rakefile +6 -4
data/ext/c_disjoint_union/disjoint_union.c +100 -142
data/ext/c_disjoint_union/extconf.rb +7 -2
data/ext/c_segment_tree_template/extconf.rb +17 -0
data/ext/c_segment_tree_template/segment_tree_template.c +362 -0
data/ext/shared.c +32 -0
data/lib/data_structures_rmolinari/c_segment_tree_template_impl.rb +112 -0
data/lib/data_structures_rmolinari/segment_tree_template.rb +8 -5
data/lib/data_structures_rmolinari.rb +8 -0
metadata +7 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: c912d4ddf3a7cfc721b7f298a966f7e0d4cbd4249797506457605a44774523a0
-  data.tar.gz: c168b7096178e496f76fa53f5b8566cd2ac26897fd5b362c3c37da5314f2a6db
+  metadata.gz: 943ac55678a074cc0da3667dccbb07ee7d203639233f53bd8587af7fd8cd062e
+  data.tar.gz: ad235e5f4714e699f1cf5f113dd4b3a356a194cced5a74b60e17c5e3a896e01b
 SHA512:
-  metadata.gz: 0c88c1ad7c07fe6358e3eefd21406b4bbd33a89e89731edb3997bb027efca52a7d312ffee1435af960be73c5cd5212950a854a11c6f2105dfccc47ed4ae00c2b
-  data.tar.gz: 9bf6e4570017217b59f4f3a0b1d9e23d7752ba4e4b5dc11a988826726367956c6564763044f23b5105293213c4844667249a7f88726861f792264c1d634256ae
+  metadata.gz: a68de76c88c67fadc42752610c695b1f0b8fd17f34db9c806291aeab4c933fe84c6523615deb4197e1c9fa6d36dce30987cc4e8896a2b0c1700b7e72b5bd2fff
+  data.tar.gz: 9063d89a98d599f27db2585bf383dbfb13e8f927abce64ac7eafb2edd70c490ddad1f1fc51e0f11c24adf29f28ab8c56548a6db264b15ace239c63b1a2ce5a01

data/CHANGELOG.md CHANGED

@@ -1,5 +1,17 @@
 # Changelog
+## [Unreleased]
+- Disjoint Union
+  - C extension: use Convenient Containers rather than my janky Dynamic Array attempt.
+- Segment Tree
+  - Add a C implementation as CSegmentTreeTemplate.
+## [0.4.3] 2023-01-27
+- Fix bad directive in Rakefile for DisjointUnion C extension
 ## [0.4.2] 2023-01-26
 ### Added

data/README.md CHANGED

@@ -4,8 +4,8 @@ This is a small collection of Ruby data structures that I have implemented for m
 structure is almost always more educational than simply reading about it and is usually fun.  I wrote some of them while
 participating in the Advent of Code (https://adventofcode.com/).
-These implementations are not particularly clever. They are based on the expository descriptions and pseudo-code I found as I read
-about each structure and so are not as fast as possible.
+The implementations are based on the expository descriptions and pseudo-code I found as I read about each structure and so are not
+as fast as possible.
 The code is available as a gem: https://rubygems.org/gems/data_structures_rmolinari.
@@ -42,9 +42,6 @@ It also provides
 For more details see https://en.wikipedia.org/wiki/Disjoint-set_data_structure and the paper [[TvL1984]](#references) by Tarjan and
 van Leeuwen.
-There is an experimental implementation as a C extension in c_disjoint_union.c. The Ruby class for this implementation is
-`CDisjointUnion`. Benchmarks indicate that a long sequence of `unite` calls is about twice as fast.
 ## Heap
 This is a standard binary heap with an `update` method, suitable for use as a priority queue. There are several supported
@@ -84,15 +81,15 @@ pointing north.
 There is no `smallest_x_in_3_sided(x0, x1, y0)`. Just use `smallest_x_in_ne(x0, y0)`.
+(These queries appear rather abstract at first but there are interesting applications. See, for example, section 4 of
+[[McC1985]](#references), keeping in mind that the data structure in that paper is actually a _MinPST_.)
 The single-point queries run in O(log n) time, where n is the size of P, while `enumerate_3_sided` runs in O(m + log n), where m is
 the number of points actually enumerated.
 The implementation is in `MaxPrioritySearchTree` (MaxPST for short), so called because internally the structure is, among other
 things, a max-heap on the y-coordinates.
-These queries appear rather abstract at first but there are interesting applications. See, for example, section 4 of
-[[McC85]](#references), keeping in mind that the data structure in that paper is actually a _MinPST_.
 We also provide a `MinPrioritySearchTree`, which answers analogous queries in the southward-infinite quadrants and 3-sided
 regions.
@@ -108,17 +105,17 @@ both a MaxPST and MinPST. But the presentiation is hard to follow in places and
 ## Segment Tree
-Segment trees store information related to subintervals of a certain array. For example, they can be used to find the sum of the
-elements in an arbitrary subinterval A[i..j] of an array A[0..n] in O(log n) time. Each node in the tree corresponds to a subarray
-of A in such a way that the values we store in the nodes can be combined efficiently to determine the desired result for arbitrary
-subarrays.
+A segment tree stores information related to subintervals of a certain array. For example, a segment tree can be used to find the
+sum of the elements in an arbitrary subinterval A(i..j) of an array A(0..n) in O(log n) time. Each node in the tree corresponds to a
+subarray of A in such a way that the values we store in the nodes can be combined efficiently to determine the desired result for
+arbitrary subarrays.
 An excellent description of the idea is found at https://cp-algorithms.com/data_structures/segment_tree.html.
-Generic code is provided in `SegmentTreeTemplate`. Concrete classes are written by providing a handful of simple lambdas and
-constants to the template class's initializer. Figuring out the details requires some knowledge of the internal mechanisms of a
-segment tree, for which the link at cp-algorithms.com is very helpful. See the definitions of the concrete classes,
-`MaxValSegmentTree` and `IndexOfMaxValSegmentTree`, for examples.
+Generic code is provided in `SegmentTreeTemplate`. Concrete classes provide a handful of simple lambdas and constants to the
+template class's initializer. Figuring out the details requires some knowledge of the internal mechanisms of a segment tree, for
+which the link at cp-algorithms.com is very helpful. See the definitions of the concrete classes `MaxValSegmentTree` and
+`IndexOfMaxValSegmentTree` for examples.
 ## Algorithms
@@ -131,7 +128,29 @@ The Algorithms submodule contains some algorithms using the data structures.
     [left, right, bottom, top].
   - The algorithm is due to [[DMNS2013]](#references).
+# C Extensions
+As another learning process I have implemented several of these data structures as C extensions. The class names have a "C" prefixed
+and they can be required like their pure Ruby versions. They have the same APIs as their Ruby cousins.
+## Disjoint Union
+A benchmark suggests that a long sequence of `unite` operations is about 3 times as fast with the `CDisjointUnion` as with
+`DisjointUnion`.
+The implementation uses the remarkable Convenient Containers library from Jackson Allan [[Allan]](#references).
+## Segment Tree
+`CSegmentTreeTemplate` is the C implementation of the generic class. Concrete classes are built on top of this in Ruby, just as with
+the pure Ruby `SegmentTreeTemplate` class.
+A benchmark suggests that a long sequence of `max_on` operations against a max-val Segment Tree is about 4 times as fast with the C
+version as with the Ruby version. I'm a bit surprised the improvement isn't larger, but we must remember that the C code must still
+interact with the Ruby objects in the underlying data array, and must "combine" them, etc., by calling Ruby lambdas.
 # References
+- [Allan] Allan, J., _CC: Convenient Containers_, https://github.com/JacksonAllan/CC, retrieved 2023-02-01.
 - [TvL1984] Tarjan, Robert E., van Leeuwen, J., _Worst-case Analysis of Set Union Algorithms_, Journal of the ACM, v31:2 (1984), pp 245–281.
 - [EEK2017] Edelkamp, S., Elmasry, A., Katajainen, J., _Optimizing Binary Heaps_, Theory Comput Syst (2017), vol 61, pp 606-636, DOI 10.1007/s00224-017-9760-2.
 - [McC1985] McCreight, E. M., _Priority Search Trees_, SIAM J. Comput., 14(2):257-276, 1985.
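
The README's Segment Tree section says concrete classes are built by handing a few lambdas and constants to the template. The following is a minimal pure-Ruby sketch of that idea, not the gem's code: the class name `TinySegmentTree`, its keyword arguments, and the `query_on` method are hypothetical names chosen for illustration, and the gem's `SegmentTreeTemplate` may differ in its API and internals.

```ruby
# Hypothetical illustration of a lambda-configured segment tree.
# This is NOT the gem's SegmentTreeTemplate; all names here are assumptions.
class TinySegmentTree
  def initialize(combine:, single_cell_array_val:, size:, identity:)
    @combine = combine                              # merges the values of two adjacent subintervals
    @single_cell_array_val = single_cell_array_val  # value for a subinterval of length 1
    @size = size
    @identity = identity                            # value for an empty subinterval
    @tree = Array.new(4 * size + 1)                 # 1-based implicit binary tree
    build(1, 0, size - 1)
  end

  # Combined value over the subinterval i..j of the underlying data.
  def query_on(i, j)
    query(1, 0, @size - 1, i, j)
  end

  private

  def build(idx, l, r)
    if l == r
      @tree[idx] = @single_cell_array_val.call(l)
    else
      mid = (l + r) / 2
      build(2 * idx, l, mid)
      build(2 * idx + 1, mid + 1, r)
      @tree[idx] = @combine.call(@tree[2 * idx], @tree[2 * idx + 1])
    end
  end

  def query(idx, l, r, i, j)
    return @identity if j < l || r < i       # node interval is disjoint from i..j
    return @tree[idx] if i <= l && r <= j    # node interval lies inside i..j

    mid = (l + r) / 2
    @combine.call(query(2 * idx, l, mid, i, j), query(2 * idx + 1, mid + 1, r, i, j))
  end
end

data = [3, 1, 4, 1, 5, 9, 2, 6]
max_tree = TinySegmentTree.new(
  combine: ->(a, b) { [a, b].max },
  single_cell_array_val: ->(i) { data[i] },
  size: data.size,
  identity: -Float::INFINITY
)
puts max_tree.query_on(2, 4) # => 5
```

Swapping the `combine` lambda for `->(a, b) { a + b }` and the identity for `0` would give a range-sum tree with the same skeleton, which is the appeal of the template approach.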

data/Rakefile CHANGED

@@ -2,10 +2,12 @@ require 'rubygems'
 require 'rake/testtask'
 require 'rake/extensiontask'
-Rake::ExtensionTask.new('data_structures_rmolinari/c_disjoint_union') do |ext|
-  ext.name = 'CDisjointUnion'
-  ext.ext_dir = 'ext/c_disjoint_union'
-  ext.lib_dir = 'lib/data_structures_rmolinari/'
+['c_disjoint_union', 'c_segment_tree_template'].each do |extension_name|
+  Rake::ExtensionTask.new("data_structures_rmolinari/#{extension_name}") do |ext|
+    ext.name = extension_name
+    ext.ext_dir = "ext/#{extension_name}"
+    ext.lib_dir = 'lib/data_structures_rmolinari/'
+  end
 end
 Rake::TestTask.new do |t|

data/ext/c_disjoint_union/disjoint_union.c CHANGED

@@ -16,128 +16,84 @@
  */
 #include "ruby.h"
-// The Shared::DataError exception type in the Ruby code. We only need it when we detect a runtime error, so a macro is simplest and
-// just fine.
-#define mShared rb_define_module("Shared")
-#define eDataError rb_const_get(mShared, rb_intern_const("DataError"))
+#include "cc.h" // Convenient Containers
+#include "shared.h"
 /**
- * It's been so long since I've written non-trival C that I need to copy examples from online.
- *
- * Dynamic array of longs, with an initial value for otherwise uninitialized elements.
- * Based on  https://stackoverflow.com/questions/3536153/c-dynamically-growing-array
+ * Data type for the (parent, rank) pair, and some accessor helpers for the vec() container we are going to be using.
  */
-typedef struct {
-  long *array;
-  size_t size;
-  long default_val;
-} DynamicArray;
-void initDynamicArray(DynamicArray *a, size_t initial_size, long default_val) {
-  a->array = malloc(initial_size * sizeof(long));
-  a->size = initial_size;
-  a->default_val = default_val;
-  for (size_t i = 0; i < initial_size; i++) {
-    a->array[i] = default_val;
-  }
-}
-void insertDynamicArray(DynamicArray *a, unsigned long index, long element) {
-  if (a->size <= index) {
-    size_t new_size = a->size;
-    while (new_size <= index) {
-      new_size = 8 * new_size / 5 + 8; // 8/5 gives "Fibonnacci-like" growth; adding 8 to avoid small arrays having to reallocate
-                                       // too often. Who knows if it's worth being "clever"."
-    }
-    long* new_array = realloc(a->array, new_size * sizeof(long));
-    if (!new_array) {
-      rb_raise(rb_eRuntimeError, "Cannot allocate memory to expand DynamicArray!");
-    }
-    a->array = new_array;
-    for (size_t i = a->size; i < new_size; i++) {
-      a->array[i] = a->default_val;
-    }
+typedef struct data_pair {
+  long parent;
+  unsigned long rank;
+} data_pair;
-    a->size = new_size;
-  }
+#define DEFAULT_PARENT -1
+#define DEFAULT_RANK 0
+static data_pair default_pair = { .parent = DEFAULT_PARENT, .rank = DEFAULT_RANK };
-  a->array[index] = element;
+static data_pair make_data_pair(long parent, unsigned long rank) {
+  data_pair pair = { .parent = parent, .rank = rank };
+  return pair;
 }
-void freeDynamicArray(DynamicArray *a) {
-  free(a->array);
-  a->array = NULL;
-  a->size = 0;
-}
+/* The vector generic from Convenient Containers */
+typedef vec(data_pair) pair_vector;
+#define parent(disjoint_union_ptr, idx) (get((disjoint_union_ptr)->pairs, idx)->parent)
+#define rank(disjoint_union_ptr, idx) (get((disjoint_union_ptr)->pairs, idx)->rank)
 /**
  * The C implementation of a Disjoint Union
  *
- * See Tarjan, Robert E., van Leeuwen, J., _Worst-case Analysis of Set Union Algorithms_, Journal of the ACM, v31:2 (1984), pp 245–281.
+ * See the paper for optimizations we use to get almost constant time for find() and unite().
+ *
+ * Tarjan, Robert E., van Leeuwen, J., _Worst-case Analysis of Set Union Algorithms_, Journal of the ACM, v31:2 (1984), pp 245–281.
  */
 /*
  * The Disjoint Union struct.
- * - forest: an array of longs giving, for each element, the parent element of its tree.
- *   - An element e is the root of its tree just when forest[e] == e.
- *   - Two elements are in the same subset just when they are in the same tree in the forest.
+ * - pairs: a vector (dynamic array) of pairs, the i-th of which contains
+ *   - the "parent" of element i in its membership tree
+ *     - An element e is the root of its tree just when it is its own parent
+ *     - Two elements are in the same subset just when they are in the same tree in the forest.
  *     - So the key idea is that we can check this by navigating via parents from each element to their roots. Clever optimizations
  *       keep the trees flat and so most nodes are close to their roots.
- * - rank: a array of longs giving the "rank" of each element.
- *   - This value is used to guide the "linking" of trees when subsets are being merged to keep the trees flat. See Tarjan & van
- *     Leeuwen
+ *   - the "rank" of element i
+ *     - this value is used to guide the "linking" of trees when subsets are being merged to keep the trees flat.
  * - subset_count: the number of (disjoint) subsets.
  *   - it isn't needed internally but may be useful to client code.
  */
 typedef struct du_data {
-  DynamicArray* forest; // the forest that describes the unified subsets
-  DynamicArray* rank;   // the "ranks" of the elements, used when uniting subsets
+  pair_vector *pairs; // The generic vector container from the amazing Convenient Containers library
   size_t subset_count;
 } disjoint_union_data;
 /*
- * Create one.
- *
- * The dynamic arrays are initialized with a size of 100 because I didn't have a better idea. This will end up getting called from
- * the Ruby #allocate method, which happens before #initialize. Thus we don't know the calling code's desired initial size.
+ * Create one (on the heap).
  */
-#define INITIAL_SIZE 100
-static disjoint_union_data* create_disjoint_union() {
-  disjoint_union_data* disjoint_union = malloc(sizeof(disjoint_union_data));
+static disjoint_union_data *create_disjoint_union() {
+  disjoint_union_data *disjoint_union = (disjoint_union_data *)malloc(sizeof(disjoint_union_data));
   // Allocate the structures
-  DynamicArray* forest = malloc(sizeof(DynamicArray));
-  DynamicArray* rank = malloc(sizeof(DynamicArray));
-  initDynamicArray(forest, INITIAL_SIZE, -1);
-  initDynamicArray(rank,   INITIAL_SIZE, 0);
+  disjoint_union->pairs = malloc(sizeof(pair_vector));
+  init(disjoint_union->pairs);
-  disjoint_union->forest = forest;
-  disjoint_union->rank = rank;
   disjoint_union->subset_count = 0;
   return disjoint_union;
 }
 /*
- * Free the memory associated with a disjoint union. This will end up getting triggered by the Ruby garbage collector.
+ * Free the memory associated with a disjoint union.
+ *
+ * This will end up getting triggered by the Ruby garbage collector. Ruby learns about it via the disjoint_union_type struct below.
  */
 static void disjoint_union_free(void *ptr) {
   if (ptr) {
     disjoint_union_data *disjoint_union = ptr;
-    freeDynamicArray(disjoint_union->forest);
-    freeDynamicArray(disjoint_union->rank);
-    free(disjoint_union->forest);
-    disjoint_union->forest = NULL;
-    free(disjoint_union->rank);
-    disjoint_union->rank = NULL;
-    free(disjoint_union);
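+    cleanup(disjoint_union->pairs);
+    free(disjoint_union->pairs); // the vector handle itself was malloc'ed in create_disjoint_union()
+    free(disjoint_union);        // allocated with malloc(), so release with free()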
   }
 }
@@ -148,17 +104,23 @@ static void disjoint_union_free(void *ptr) {
 /*
  * Is the given element already a member of the universe?
  */
-static int present_p(disjoint_union_data* disjoint_union, size_t element) {
-  DynamicArray* forest = disjoint_union->forest;
-  return (forest->size > element && (forest->array[element] != forest->default_val));
+static int present_p(disjoint_union_data *disjoint_union, size_t element) {
+  return (size(disjoint_union->pairs) > element && (parent(disjoint_union, element) != DEFAULT_PARENT));
 }
 /*
  * Check that the given element is a member of the universe and raise Shared::DataError (ruby-side) if not
  */
-static void assert_membership(disjoint_union_data* disjoint_union, size_t element) {
+static void assert_membership(disjoint_union_data *disjoint_union, size_t element) {
   if (!present_p(disjoint_union, element)) {
-    rb_raise(eDataError, "Value %zu is not part of the universe", element);
+    rb_raise(eSharedDataError, "Value %zu is not part of the universe", element);
+    /* rb_raise( */
+    /*          eSharedDataError, */
+    /*          "Value %zu is not part of the universe, size = %zu, forest_val = %lu", */
+    /*          element, */
+    /*          size(disjoint_union->pairs), */
+    /*          get(disjoint_union->pairs, element)->parent */
+    /*          ); */
   }
 }
@@ -167,52 +129,57 @@ static void assert_membership(disjoint_union_data* disjoint_union, size_t elemen
  *
  * Shared::DataError is raised if it is already an element.
  */
-static void add_new_element(disjoint_union_data* disjoint_union, size_t element) {
+static void add_new_element(disjoint_union_data *disjoint_union, size_t element) {
   if (present_p(disjoint_union, element)) {
-    rb_raise(eDataError, "Element %zu already present in the universe", element);
+    rb_raise(eSharedDataError, "Element %zu already present in the universe", element);
+  }
+  // Expand the underlying vector if necessary
+  size_t sz = size(disjoint_union->pairs);
+  if (sz <= element) {
+    resize(disjoint_union->pairs, element + 1);
+    for (size_t i = sz; i <= element; i++) {
+      lval(disjoint_union->pairs, i) = default_pair;
+    }
   }
-  insertDynamicArray(disjoint_union->forest, element, element);
-  insertDynamicArray(disjoint_union->rank, element, 0);
+  lval(disjoint_union->pairs, element) = make_data_pair(element, 0l);
   disjoint_union->subset_count++;
 }
 /*
- * Find the canonical representative of the given element. This is the root of the tree (in forest) containing element.
+ * Find the canonical representative of the given element. This is the root of the tree containing it.
  *
  * Two elements are in the same subset exactly when their canonical representatives are equal.
  */
-static size_t find(disjoint_union_data* disjoint_union, size_t element) {
+static size_t find(disjoint_union_data *disjoint_union, size_t element) {
   assert_membership(disjoint_union, element);
-  // We implement find with "halving" to shrink the length of paths to the root. See Tarjan and van Leeuwin p 252.
-  long* d = disjoint_union->forest->array; // the actual forest data
+  // We use "halving" to shrink the length of paths to the root. See Tarjan and van Leeuwen p 252.
   size_t x = element;
-  while (d[d[x]] != d[x]) {
-    x = d[x] = d[d[x]];
+  long p, gp; // parent and grandparent
+  while (p = parent(disjoint_union, x), gp = parent(disjoint_union, p), p != gp) {
+    parent(disjoint_union, x) = gp; // halving: point x at its grandparent
+    x = gp;
   }
-  return d[x];
+  return parent(disjoint_union, x);
 }
 /*
- * "Link"" the two given elements so that they are in the same subset now.
+ * "Link" the two given elements so that they are in the same subset now.
  *
  * In other words, merge the subtrees containing the two elements.
  *
- * Good performace (see Tarjan and van Leeuwin) assumes that elt1 and elt2 area are disinct and already the roots of their trees,
- * though we don't check that here.
+ * elt1 and elt2 must be distinct and the roots of their trees, though we don't check that here.
  */
-static void link_roots(disjoint_union_data* disjoint_union, size_t elt1, size_t elt2) {
-  long* rank = disjoint_union->rank->array;
-  long* forest = disjoint_union->forest->array;
-  if (rank[elt1] > rank[elt2]) {
-    forest[elt2] = elt1;
-  } else if (rank[elt1] == rank[elt2]) {
-    forest[elt2] = elt1;
-    rank[elt1]++;
+static void link_roots(disjoint_union_data *disjoint_union, size_t elt1, size_t elt2) {
+  if (rank(disjoint_union, elt1) > rank(disjoint_union, elt2)) {
+    parent(disjoint_union, elt2) =  elt1;
+  } else if (rank(disjoint_union, elt1) == rank(disjoint_union, elt2)) {
+    parent(disjoint_union, elt2) = elt1;
+    rank(disjoint_union, elt1)++;
   } else {
-    forest[elt1] = elt2;
+    parent(disjoint_union, elt1) = elt2;
   }
   disjoint_union->subset_count--;
@@ -221,12 +188,12 @@ static void link_roots(disjoint_union_data* disjoint_union, size_t elt1, size_t
 /*
  * "Unite" or merge the subsets containing elt1 and elt2.
  */
-static void unite(disjoint_union_data* disjoint_union, size_t elt1, size_t elt2) {
+static void unite(disjoint_union_data *disjoint_union, size_t elt1, size_t elt2) {
   assert_membership(disjoint_union, elt1);
   assert_membership(disjoint_union, elt2);
   if (elt1 == elt2) {
-    rb_raise(eDataError, "Uniting an element with itself is meaningless");
+    rb_raise(eSharedDataError, "Uniting an element with itself is meaningless");
   }
   size_t root1 = find(disjoint_union, elt1);
@@ -249,8 +216,10 @@ static void unite(disjoint_union_data* disjoint_union, size_t elt1, size_t elt2)
 // deciding how aggressive to be during garbage collection and such.
 static size_t disjoint_union_memsize(const void *ptr) {
   if (ptr) {
-    const disjoint_union_data *disjoint_union = ptr;
-    return (2 * disjoint_union->forest->size * sizeof(long)); // disjoint_union->rank is the same size
+    const disjoint_union_data *du = ptr;
+    // See https://github.com/JacksonAllan/CC/issues/3
+    return sizeof( cc_vec_hdr_ty ) + cap( du->pairs ) * CC_EL_SIZE( *(du->pairs) );
   } else {
     return 0;
   }
@@ -273,26 +242,10 @@ static const rb_data_type_t disjoint_union_type = {
 };
 /*
- * Helper: check that a Ruby value is a non-negative Fixnum and convert it to a nice C long
- *
- * TODO: can we return an size_t or unsigned long instead?
+ * Unwrap a Ruby-side disjoint union object to get the C struct inside.
  */
-static long checked_nonneg_fixnum(VALUE val) {
-  Check_Type(val, T_FIXNUM);
-  long c_val = FIX2LONG(val);
-  if (c_val < 0) {
-    rb_raise(eDataError, "Value must be non-negative");
-  }
-  return c_val;
-}
-/*
- * Unwrap a Rubyfied disjoint union to get the C struct inside.
- */
-static disjoint_union_data* unwrapped(VALUE self) {
-  disjoint_union_data* disjoint_union;
+static disjoint_union_data *unwrapped(VALUE self) {
+  disjoint_union_data *disjoint_union;
   TypedData_Get_Struct((self), disjoint_union_data, &disjoint_union_type, disjoint_union);
   return disjoint_union;
 }
@@ -301,7 +254,9 @@ static disjoint_union_data* unwrapped(VALUE self) {
  * This is for CDisjointUnion.allocate on the Ruby side
  */
 static VALUE disjoint_union_alloc(VALUE klass) {
-  disjoint_union_data* disjoint_union = create_disjoint_union();
+  // Get one on the heap
+  disjoint_union_data *disjoint_union = create_disjoint_union();
+  // Wrap it up into a Ruby object
   return TypedData_Wrap_Struct(klass, &disjoint_union_type, disjoint_union);
 }
@@ -318,11 +273,15 @@ static VALUE disjoint_union_init(int argc, VALUE *argv, VALUE self) {
     rb_raise(rb_eArgError, "wrong number of arguments");
   } else {
     size_t initial_size = checked_nonneg_fixnum(argv[0]);
-    disjoint_union_data* disjoint_union = unwrapped(self);
+    disjoint_union_data *disjoint_union = unwrapped(self);
+    pair_vector *pair_vec = disjoint_union->pairs;
+    resize(pair_vec, initial_size);
     for (size_t i = 0; i < initial_size; i++) {
-      add_new_element(disjoint_union, i);
+      lval(pair_vec, i) = make_data_pair(i, 0);
     }
+    disjoint_union->subset_count = initial_size;
   }
   return self;
 }
@@ -330,7 +289,7 @@ static VALUE disjoint_union_init(int argc, VALUE *argv, VALUE self) {
 /**
  * And now the simple wrappers around the Disjoint Union C functionality. In each case we
  *   - unwrap a 'VALUE self',
- *     - i.e., theCDisjointUnion instance that contains a disjoint_union_data struct;
+ *     - i.e., the CDisjointUnion instance on the Ruby side;
  *   - munge any other arguments into longs;
  *   - call the appropriate C function to act on the struct; and
  *   - return an appropriate VALUE that the Ruby runtime can use.
@@ -341,7 +300,7 @@ static VALUE disjoint_union_init(int argc, VALUE *argv, VALUE self) {
 /*
  * Add a new subset to the universe containing the element +new_v+.
  *
- * @param the new element, starting in its own singleton subset
+ * @param arg the new element, starting in its own singleton subset
  *   - it must be a non-negative integer, not already part of the universe of elements.
  */
 static VALUE disjoint_union_make_set(VALUE self, VALUE arg) {
@@ -389,8 +348,7 @@ static VALUE disjoint_union_unite(VALUE self, VALUE arg1, VALUE arg2) {
  * The data structure provides efficient actions to merge two disjoint subsets, i.e., replace them by their union, and determine if
  * two elements are in the same subset.
  *
- * The elements of the set are 0, 1, ..., n-1, where n is the size of the universe. Client code can map its data to these
- * representatives.
+ * The elements of the set are non-negative integers. Client code can map its data to these representatives.
  *
  * See https://en.wikipedia.org/wiki/Disjoint-set_data_structure for a good introduction.
  *
@@ -400,7 +358,7 @@ static VALUE disjoint_union_unite(VALUE self, VALUE arg1, VALUE arg2) {
  * - Tarjan, Robert E., van Leeuwen, Jan (1984). _Worst-case analysis of set union algorithms_. Journal of the ACM. 31 (2): 245–281.
  */
 void Init_c_disjoint_union() {
-  VALUE mDataStructuresRMolinari = rb_define_module("DataStructuresRMolinari");
+  //VALUE mDataStructuresRMolinari = rb_define_module("DataStructuresRMolinari");
   VALUE cDisjointUnion = rb_define_class_under(mDataStructuresRMolinari, "CDisjointUnion", rb_cObject);
   rb_define_alloc_func(cDisjointUnion, disjoint_union_alloc);
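
The comments in `disjoint_union.c` describe `find` with path "halving" and a rank-guided `link_roots`. The two ideas can be sketched in pure Ruby as follows. This is an illustration only: `TinyDisjointUnion` is a hypothetical class, not the gem's `DisjointUnion` or `CDisjointUnion`, and it omits the membership checks and error raising that the C code performs.

```ruby
# Hypothetical illustration of union by rank with path halving.
# This is NOT the gem's DisjointUnion; names and behaviour are assumptions.
class TinyDisjointUnion
  attr_reader :subset_count

  def initialize(size)
    @parent = Array.new(size) { |i| i } # every element starts as its own root
    @rank = Array.new(size, 0)
    @subset_count = size
  end

  # The canonical representative (tree root) of x's subset.
  def find(x)
    # Path halving: point x at its grandparent, then continue from there.
    while @parent[x] != @parent[@parent[x]]
      @parent[x] = @parent[@parent[x]]
      x = @parent[x]
    end
    @parent[x]
  end

  # Merge the subsets containing a and b (a no-op if they already coincide).
  def unite(a, b)
    root_a = find(a)
    root_b = find(b)
    return if root_a == root_b

    link_roots(root_a, root_b)
  end

  private

  # Both arguments must be distinct roots. The root of lower rank is attached
  # below the other, which keeps the trees shallow.
  def link_roots(r1, r2)
    if @rank[r1] > @rank[r2]
      @parent[r2] = r1
    elsif @rank[r1] == @rank[r2]
      @parent[r2] = r1
      @rank[r1] += 1
    else
      @parent[r1] = r2
    end
    @subset_count -= 1
  end
end

du = TinyDisjointUnion.new(6)
du.unite(0, 1)
du.unite(2, 3)
du.unite(1, 3)
puts du.find(0) == du.find(2) # => true
puts du.subset_count          # => 3
```

Note that the halving step writes to the parent of `x` itself. Writing the grandparent into the parent's slot would change nothing, because the grandparent is by definition already stored there.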

data/ext/c_disjoint_union/extconf.rb CHANGED

@@ -3,10 +3,15 @@ require 'mkmf'
 abort 'missing malloc()' unless have_func "malloc"
 abort 'missing realloc()' unless have_func "realloc"
-if try_cflags('-O')
-  append_cflags('-O')
+if try_cflags('-O3')
+  append_cflags('-O3')
 end
 extension_name = "c_disjoint_union"
 dir_config(extension_name)
+$srcs = ["disjoint_union.c", "../shared.c"]
+$INCFLAGS << " -I$(srcdir)/.."
+$VPATH << "$(srcdir)/.."
 create_makefile("data_structures_rmolinari/c_disjoint_union")

data/ext/c_segment_tree_template/extconf.rb ADDED

@@ -0,0 +1,17 @@
+require 'mkmf'
+abort 'missing malloc()' unless have_func "malloc"
+abort 'missing realloc()' unless have_func "realloc"
+if try_cflags('-O3')
+  append_cflags('-O3')
+end
+extension_name = "c_segment_tree_template"
+dir_config(extension_name)
+$srcs = ["segment_tree_template.c", "../shared.c"]
+$INCFLAGS << " -I$(srcdir)/.."
+$VPATH << "$(srcdir)/.."
+create_makefile("data_structures_rmolinari/c_segment_tree_template")

data/ext/c_segment_tree_template/segment_tree_template.c ADDED

@@ -0,0 +1,362 @@
+/*
+ * This is a C implementation of a Segment Tree data structure.
+ *
+ * More specifically, it is the C version of the SegmentTreeTemplate Ruby class, for which see elsewhere in the repo.
+ */
+#include "ruby.h"
+#include "shared.h"
+#define single_cell_val_at(seg_tree, idx) rb_funcall(seg_tree->single_cell_array_val_lambda, rb_intern("call"), 1, LONG2FIX(idx))
+#define combined_val(seg_tree, v1, v2) rb_funcall(seg_tree->combine_lambda, rb_intern("call"), 2, (v1), (v2))
+/**
+ * The C implementation of a generic Segment Tree
+ */
+typedef struct {
+  VALUE *tree; // The 1-based implicit binary tree in which the data structure lives
+  VALUE single_cell_array_val_lambda;
+  VALUE combine_lambda;
+  VALUE identity;
+  size_t size; // the size of the underlying data array
+  size_t tree_alloc_size; // the size of the VALUE* tree array
+} segment_tree_data;
+/************************************************************
+ * Memory Management
+ *
+ */
+/*
+ * Create one (on the heap).
+ */
+static segment_tree_data *create_segment_tree() {
+  segment_tree_data *segment_tree = malloc(sizeof(segment_tree_data));
+  // Allocate the structures
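+  segment_tree->tree = NULL; // we don't yet know how much space we need
+  segment_tree->tree_alloc_size = 0; // nothing for segment_tree_mark() to scan until setup() runs
+  segment_tree->single_cell_array_val_lambda = 0;
+  segment_tree->combine_lambda = 0;
+  segment_tree->identity = 0; // segment_tree_mark() reads this, so it must not be left uninitialized
+  segment_tree->size = 0; // we don't know the right value yet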
+  return segment_tree;
+}
+/*
+ * Free the memory associated with a segment_tree.
+ *
+ * This will end up getting triggered by the Ruby garbage collector. Ruby learns about it via the segment_tree_type struct below.
+ */
+static void segment_tree_free(void *ptr) {
+  if (ptr) {
+    segment_tree_data *segment_tree = ptr;
+    xfree(segment_tree->tree);
+    xfree(segment_tree);
+  }
+}
+/*
+ * How much memory (roughly) does a segment_tree_data instance consume?
+ *
+ * I guess the Ruby runtime can use this information when deciding how aggressive to be during garbage collection and such.
+ */
+static size_t segment_tree_memsize(const void *ptr) {
+  if (ptr) {
+    const segment_tree_data *st = ptr;
+    // for the tree array plus the size of the segment_tree_data struct itself.
+    return sizeof( VALUE ) * st->tree_alloc_size + sizeof(segment_tree_data);
+  } else {
+    return 0;
+  }
+}
+/*
+ * Mark the Ruby objects we hold so that the Ruby garbage collector knows that they are still in use.
+ */
+static void segment_tree_mark(void *ptr) {
+  segment_tree_data *st = ptr;
+  rb_gc_mark(st->combine_lambda);
+  rb_gc_mark(st->single_cell_array_val_lambda);
+  rb_gc_mark(st->identity);
+  for (size_t i = 0; i < st->tree_alloc_size; i++) {
+    VALUE value = st->tree[i];
+    if (value) {
+      rb_gc_mark(value);
+    }
+  }
+}
+/*
+ * A configuration struct that tells the Ruby runtime how to deal with a segment_tree_data object.
+ *
+ * https://docs.ruby-lang.org/en/master/extension_rdoc.html#label-Encapsulate+C+data+into+a+Ruby+object
+ */
+static const rb_data_type_t segment_tree_type = {
+  .wrap_struct_name = "segment_tree_template",
+  { // help for the Ruby garbage collector
+    .dmark = segment_tree_mark, // dmark, for marking other Ruby objects.
+    .dfree = segment_tree_free, // how to free the memory associated with an object
+    .dsize = segment_tree_memsize, // roughly how much space does the object consume?
+  },
+  .data = NULL, // a data field we could use for something here if we wanted. Ruby ignores it
+  .flags = 0  // GC-related flag values.
+};
+/*
+ * End memory management functions.
+ ************************************************************/
+/************************************************************
+ * Wrapping and unwrapping the C struct and other things.
+ *
+ */
+/*
+ * Unwrap a Ruby-side segment tree object to get the C struct inside.
+ *
+ * TODO: consider a macro in a shared header
+ */
+static segment_tree_data *unwrapped(VALUE self) {
+  segment_tree_data *segment_tree;
+  TypedData_Get_Struct((self), segment_tree_data, &segment_tree_type, segment_tree);
+  return segment_tree;
+}
+/*
+ * Allocate a segment_tree_data struct and wrap it for the Ruby runtime.
+ *
+ * This is for CSegmentTreeTemplate.allocate on the Ruby side.
+ */
+static VALUE segment_tree_alloc(VALUE klass) {
+  // Get one on the heap
+  segment_tree_data *segment_tree = create_segment_tree();
+  // ...and wrap it into a Ruby object
+  return TypedData_Wrap_Struct(klass, &segment_tree_type, segment_tree);
+}
+/*
+ * End wrapping and unwrapping functions.
+ ************************************************************/
+/************************************************************
+ * The Segment Tree API on the C side.
+ *
+ * We wrap these in the Ruby-ready functions below
+ */
+/*
+ * Recursively build the internal tree data structure.
+ *
+ * - tree_idx: the index into the tree array of the node being calculated
+ * - [tree_l, tree_r]: the sub-interval of the underlying array data corresponding to the tree node being calculated.
+ */
+static void build(segment_tree_data *segment_tree, size_t tree_idx, size_t tree_l, size_t tree_r) {
+  VALUE *tree = segment_tree->tree;
+  if (tree_l == tree_r) {
+    // Base case: the node corresponds to a subarray of length 1.
+    segment_tree->tree[tree_idx] = single_cell_val_at(segment_tree, tree_l);
+  } else {
+    // Build the two child nodes, and then combine their values for this node.
+    size_t mid = midpoint(tree_l, tree_r);
+    size_t left = left_child(tree_idx);
+    size_t right = right_child(tree_idx);
+    build(segment_tree, left, tree_l, mid);
+    build(segment_tree, right, mid + 1, tree_r);
+    VALUE comb_val = combined_val(segment_tree, tree[left], tree[right]);
+    segment_tree->tree[tree_idx] = comb_val;
+  }
+}
+/*
+ * Set up the internals with the arguments we get from #initialize.
+ *
+ * - combine: must be callable
+ * - single_cell_array_val: must be callable
+ * - size: must be a positive integer
+ * - identity: we don't care what it is.
+ *   - maybe we should check at least that it is not 0. But Qnil is fine.
+ */
+static void setup(segment_tree_data* seg_tree, VALUE combine, VALUE single_cell_array_val, VALUE size, VALUE identity) {
+  VALUE idCall = rb_intern("call");
+  if (!rb_obj_respond_to(combine, idCall, TRUE)) {
+    rb_raise(rb_eArgError, "wrong type argument %"PRIsVALUE" (should be callable)", rb_obj_class(combine));
+  }
+  if (!rb_obj_respond_to(single_cell_array_val, idCall, TRUE)) {
+    rb_raise(rb_eArgError, "wrong type argument %"PRIsVALUE" (should be callable)", rb_obj_class(single_cell_array_val));
+  }
+  seg_tree->combine_lambda = combine;
+  seg_tree->single_cell_array_val_lambda = single_cell_array_val;
+  seg_tree->identity = identity;
+  seg_tree->size = checked_nonneg_fixnum(size);
+  if (seg_tree->size == 0) {
+    rb_raise(rb_eArgError, "size must be positive.");
+  }
+  // Implicit binary tree with n leaves and straightforward left() and right() may use indices up to 4n.  But see here for a way to
+  // reduce the requirement to 2n: https://cp-algorithms.com/data_structures/segment_tree.html#memory-efficient-implementation
+  size_t tree_size = 1 + 4 * seg_tree->size;
+  seg_tree->tree = calloc(tree_size, sizeof(VALUE));
+  seg_tree->tree_alloc_size = tree_size;
+  build(seg_tree, TREE_ROOT, 0, seg_tree->size - 1);
+}
+/*
+ * Determine the value for the subarray A(left, right).
+ *
+ * - tree_idx: the index in the array of the node we are currently visiting
+ * - tree_l..tree_r: the subarray handled by the current node.
+ * - left..right: the subarray whose value we are currently looking for.
+ *
+ * As an invariant we have left..right \subset tree_l..tree_r.
+ *
+ * We start out with
+ * - tree_idx = TREE_ROOT
+ * - tree_l..tree_r = 0..(size - 1), and
+ * - left..right given by the client code's query
+ *
+ * If [tree_l, tree_r] = [left, right] then the current node gives the desired answer. Otherwise we descend the tree with one or two
+ * recursive calls.
+ *
+ * If left..right is contained in the bottom or top half of tree_l..tree_r we descend to the corresponding child with one recursive
+ * call. Otherwise we split left..right at the midpoint of tree_l..tree_r, make two recursive calls, and then combine the results.
+ */
+static VALUE determine_val(segment_tree_data* seg_tree, size_t tree_idx, size_t left, size_t right, size_t tree_l, size_t tree_r) {
+  // Does the current tree node exactly serve up the interval we're interested in?
+  if (left == tree_l && right == tree_r) {
+    return seg_tree->tree[tree_idx];
+  }
+  // We need to go further down the tree
+  size_t mid = midpoint(tree_l, tree_r);
+  if (mid >= right) {
+    // Our interval is contained by the left child's interval
+    return determine_val(seg_tree, left_child(tree_idx),  left, right, tree_l,  mid);
+  } else if (mid + 1 <= left) {
+    // Our interval is contained by the right child's interval
+    return determine_val(seg_tree, right_child(tree_idx), left, right, mid + 1, tree_r);
+  } else {
+    // Our interval is split between the two, so we need to combine the results from the children.
+    return rb_funcall(
+                      seg_tree->combine_lambda, rb_intern("call"), 2,
+                      determine_val(seg_tree, left_child(tree_idx),  left,    mid,   tree_l,  mid),
+                      determine_val(seg_tree, right_child(tree_idx), mid + 1, right, mid + 1, tree_r)
+                      );
+  }
+}
+/*
+ * Update the structure to reflect the change in the underlying array at index idx.
+ *
+ * - idx: the index at which the underlying array data has changed.
+ * - tree_idx: the index in the internal data structure of the node we are currently visiting.
+ * - tree_l..tree_r: the range handled by the current node
+ */
+static void update_val_at(segment_tree_data *seg_tree, size_t idx, size_t tree_idx, size_t tree_l, size_t tree_r) {
+  if (tree_l == tree_r) {
+    // We have found the base case of our update
+    if (tree_l != idx) {
+      rb_raise(
+               eSharedInternalLogicError,
+               "tree_l == tree_r == %lu but they do not agree with the idx %lu holding the updated value",
+               tree_r, idx
+               );
+    }
+    seg_tree->tree[tree_idx] = single_cell_val_at(seg_tree, tree_l);
+  } else {
+    // Recursively update the appropriate subtree...
+    size_t mid = midpoint(tree_l, tree_r);
+    size_t left = left_child(tree_idx);
+    size_t right = right_child(tree_idx);
+    if (mid >= idx) {
+      update_val_at(seg_tree, idx, left, tree_l, mid);
+    } else {
+      update_val_at(seg_tree, idx, right, mid + 1, tree_r);
+    }
+    // ...and ourself to incorporate the change
+    seg_tree->tree[tree_idx] = combined_val(seg_tree, seg_tree->tree[left], seg_tree->tree[right]);
+  }
+}
+/*
+ * End C implementation of the Segment Tree API
+ ************************************************************/
+/**
+ * And now the wrappers around the C functionality.
+ */
+/*
+ * CSegmentTreeTemplate#c_initialize.
+ *
+ * (see CSegmentTreeTemplate#initialize).
+ */
+static VALUE segment_tree_init(VALUE self, VALUE combine, VALUE single_cell_array_val, VALUE size, VALUE identity) {
+  setup(unwrapped(self), combine, single_cell_array_val, size, identity);
+  return self;
+}
+/*
+ * (see SegmentTreeTemplate#query_on)
+ */
+static VALUE segment_tree_query_on(VALUE self, VALUE left, VALUE right) {
+  segment_tree_data* seg_tree = unwrapped(self);
+  size_t c_left = checked_nonneg_fixnum(left);
+  size_t c_right = checked_nonneg_fixnum(right);
+  if (c_right >= seg_tree->size) {
+    rb_raise(eSharedDataError, "Bad query interval %lu..%lu (size = %lu)", c_left, c_right, seg_tree->size);
+  }
+  if (c_left > c_right) {
+    // empty interval.
+    return seg_tree->identity;
+  }
+  return determine_val(seg_tree, TREE_ROOT, c_left, c_right, 0, seg_tree->size - 1);
+}
+/*
+ * (see SegmentTreeTemplate#update_at)
+ */
+static VALUE segment_tree_update_at(VALUE self, VALUE idx) {
+  segment_tree_data *seg_tree = unwrapped(self);
+  size_t c_idx = checked_nonneg_fixnum(idx);
+  if (c_idx >= seg_tree->size) {
+    rb_raise(eSharedDataError, "Cannot update value at index %lu, size = %lu", c_idx, seg_tree->size);
+  }
+  update_val_at(seg_tree, c_idx, TREE_ROOT, 0, seg_tree->size - 1);
+  return Qnil;
+}
+/*
+ * A generic Segment Tree template, written in C.
+ *
+ * (see SegmentTreeTemplate)
+ */
+void Init_c_segment_tree_template() {
+  VALUE cSegmentTreeTemplate = rb_define_class_under(mDataStructuresRMolinari, "CSegmentTreeTemplate", rb_cObject);
+  rb_define_alloc_func(cSegmentTreeTemplate, segment_tree_alloc);
+  rb_define_method(cSegmentTreeTemplate, "c_initialize", segment_tree_init, 4);
+  rb_define_method(cSegmentTreeTemplate, "query_on", segment_tree_query_on, 2);
+  rb_define_method(cSegmentTreeTemplate, "update_at", segment_tree_update_at, 1);
+}
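
For readers following the recursion in `build`, `determine_val`, and `update_val_at` above, the same top-down scheme can be sketched in a few lines of plain Ruby. This is an illustration, not the gem's code: the class name `TinySegmentTree` is hypothetical, the root index of 1 is an assumption (the value of `TREE_ROOT` is not shown in this diff), and argument validation is left out.

```ruby
# Hypothetical illustration only; not part of data_structures_rmolinari.
class TinySegmentTree
  ROOT = 1 # assumed root index, so that the children of i are 2i and 2i + 1

  def initialize(combine:, single_cell_array_val:, size:, identity:)
    @combine = combine
    @single_cell_array_val = single_cell_array_val
    @size = size
    @identity = identity
    @tree = Array.new(1 + 4 * size) # same generous bound as the C code
    build(ROOT, 0, size - 1)
  end

  # Value on the inclusive interval left..right; the identity if it is empty.
  def query_on(left, right)
    return @identity if left > right

    determine_val(ROOT, left, right, 0, @size - 1)
  end

  # Recompute the nodes on the path from the leaf for idx up to the root.
  def update_at(idx)
    update_val_at(idx, ROOT, 0, @size - 1)
  end

  private

  def build(tree_idx, tree_l, tree_r)
    if tree_l == tree_r
      @tree[tree_idx] = @single_cell_array_val.call(tree_l)
    else
      mid = (tree_l + tree_r) / 2
      build(2 * tree_idx, tree_l, mid)
      build(2 * tree_idx + 1, mid + 1, tree_r)
      @tree[tree_idx] = @combine.call(@tree[2 * tree_idx], @tree[2 * tree_idx + 1])
    end
  end

  def determine_val(tree_idx, left, right, tree_l, tree_r)
    # The node covers exactly the interval we want.
    return @tree[tree_idx] if left == tree_l && right == tree_r

    mid = (tree_l + tree_r) / 2
    if mid >= right
      determine_val(2 * tree_idx, left, right, tree_l, mid)
    elsif mid + 1 <= left
      determine_val(2 * tree_idx + 1, left, right, mid + 1, tree_r)
    else
      # The interval straddles the midpoint: split it and combine.
      @combine.call(
        determine_val(2 * tree_idx, left, mid, tree_l, mid),
        determine_val(2 * tree_idx + 1, mid + 1, right, mid + 1, tree_r)
      )
    end
  end

  def update_val_at(idx, tree_idx, tree_l, tree_r)
    if tree_l == tree_r
      @tree[tree_idx] = @single_cell_array_val.call(tree_l)
    else
      mid = (tree_l + tree_r) / 2
      if mid >= idx
        update_val_at(idx, 2 * tree_idx, tree_l, mid)
      else
        update_val_at(idx, 2 * tree_idx + 1, mid + 1, tree_r)
      end
      @tree[tree_idx] = @combine.call(@tree[2 * tree_idx], @tree[2 * tree_idx + 1])
    end
  end
end

data = [3, 1, 4, 1, 5]
sums = TinySegmentTree.new(
  combine: ->(a, b) { a + b },
  single_cell_array_val: ->(i) { data[i] },
  size: data.size,
  identity: 0
)
```

After changing `data[2]`, calling `sums.update_at(2)` is enough to refresh the tree, because the leaf value is re-read through the `single_cell_array_val` lambda rather than passed in.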

data/ext/shared.c ADDED

@@ -0,0 +1,32 @@
+#include "shared.h"
+/*
+ * Arithmetic for in-array binary tree
+ */
+size_t midpoint(size_t left, size_t right) {
+  return (left + right) / 2;
+}
+size_t left_child(size_t i) {
+  return i << 1;
+}
+size_t right_child(size_t i) {
+  return 1 + (i << 1);
+}
+/*
+ * Check that a Ruby value is a non-negative Fixnum and convert it to a C unsigned long
+ */
+unsigned long checked_nonneg_fixnum(VALUE val) {
+  Check_Type(val, T_FIXNUM);
+  long c_val = FIX2LONG(val);
+  if (c_val < 0) {
+    rb_raise(eSharedDataError, "Value must be non-negative");
+  }
+  return c_val;
+}
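
The index arithmetic above is what drives the `1 + 4 * size` allocation in `setup`. As a rough check of that bound, the sketch below mirrors the three helpers in Ruby and walks the same recursion as `build` to find the largest array index it would touch. The helper `max_index_used` is hypothetical, and the root index of 1 is an assumption, since `TREE_ROOT` is not defined in the part of the diff shown here.

```ruby
# Ruby mirrors of midpoint/left_child/right_child from shared.c.
def midpoint(left, right)
  (left + right) / 2
end

def left_child(i)
  i << 1
end

def right_child(i)
  1 + (i << 1)
end

# Hypothetical helper: the largest array index visited when building a
# tree over tree_l..tree_r whose top node is stored at tree_idx.
def max_index_used(tree_idx, tree_l, tree_r)
  return tree_idx if tree_l == tree_r

  mid = midpoint(tree_l, tree_r)
  [
    max_index_used(left_child(tree_idx), tree_l, mid),
    max_index_used(right_child(tree_idx), mid + 1, tree_r)
  ].max
end

(1..8).each do |n|
  puts "n = #{n}: largest index #{max_index_used(1, 0, n - 1)}, allocated #{1 + 4 * n}"
end
```

The recursion has depth at most ceil(log2 n), so no index reaches 4n; the allocation is safe but, as the comment in `setup` notes, larger than a tighter layout would need.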

data/lib/data_structures_rmolinari/c_segment_tree_template_impl.rb ADDED

@@ -0,0 +1,112 @@
+require 'must_be'
+require_relative 'shared'
+require_relative 'c_segment_tree_template'
+# The template of Segment Tree, which can be used for various interval-related purposes, like efficiently finding the sum (or min or
+# max) on an arbitrary subarray of a given array.
+#
+# There is an excellent description of the data structure at https://cp-algorithms.com/data_structures/segment_tree.html. The
+# Wikipedia article (https://en.wikipedia.org/wiki/Segment_tree) appears to describe a different data structure which is sometimes
+# called an "interval tree."
+#
+# For more details (and some close-to-metal analysis of run time, especially for large datasets) see
+# https://en.algorithmica.org/hpc/data-structures/segment-trees/. In particular, this shows how to do a bottom-up implementation,
+# which is faster, at least for large datasets and cache-relevant compiled code. These issues don't really apply to code written in
+# Ruby.
+#
+# This is a generic implementation, intended to allow easy configuration for concrete instances. See the parameters to the
+# initializer and the definitions of concrete realisations like MaxValSegmentTree.
+#
+# We do O(n) work to build the internal data structure at initialization. Then we answer queries in O(log n) time.
+class DataStructuresRMolinari::CSegmentTreeTemplate
+  # Construct a concrete instance of a Segment Tree. See details at the links above for the underlying concepts here.
+  # @param combine a lambda that takes two values and munges them into a combined value.
+  #   - For example, if we are calculating sums over subintervals, combine.call(a, b) = a + b, while if we are doing maxima we will
+  #     return max(a, b).
+  #   - Things get more complicated when we are calculating, say, the _index_ of the maximal value in a subinterval. Now it is not
+  #     enough simply to store that index at each tree node, because to combine the indices from two child nodes we need to know
+  #     not only the index of the maximal element in each child node's interval but also the maximal values themselves, so we know
+  #     which one "wins" for the parent node. This affects the sort of work we need to do when combining and the value provided by
+  #     the +single_cell_array_val+ lambda.
+  # @param single_cell_array_val a lambda that takes an index i and returns the value we need to store in the #build
+  #     operation for the subinterval i..i.
+  #     - This will often simply be the value data[i], but in some cases it will be something else. For example, when we are
+  #       calculating the index of the maximal value on each subinterval we need [i, data[i]] here.
+  #     - If +update_at+ is called later, this lambda must close over the underlying data in a way that captures the updated value.
+  # @param size the size of the underlying data array, used in certain internal arithmetic.
+  # @param identity the value to return when we are querying on an empty interval
+  #   - for sums, this will be zero; for maxima, this will be -Infinity, etc
+  def initialize(combine:, single_cell_array_val:, size:, identity:)
+    # having sorted out the keyword arguments, pass them more easily to the C layer.
+    c_initialize(combine, single_cell_array_val, size, identity)
+  end
+end
+# A segment tree that for an array A(0...n) answers questions of the form "what is the maximum value in the subinterval A(i..j)?"
+# in O(log n) time.
+#
+# C version
+#
+# TODO: share the definition with (non-C) MaxValSegmentTree. The only difference is the class of the underlying segment tree
+# template.
+module DataStructuresRMolinari
+  class CMaxValSegmentTree
+    extend Forwardable
+    # Tell the tree that the value at idx has changed
+    def_delegator :@structure, :update_at
+    # @param data an object that contains values at integer indices based at 0, via +data[i]+.
+    #   - This will usually be an Array, but it could also be a hash or a proc.
+    def initialize(data)
+      @structure = CSegmentTreeTemplate.new(
+        combine:               ->(a, b) { [a, b].max },
+        single_cell_array_val: ->(i) { data[i] },
+        size:                  data.size,
+        identity:              -Shared::INFINITY
+      )
+    end
+    # The maximum value in A(i..j).
+    #
+    # The arguments must be integers in 0...(A.size)
+    # @return the largest value in A(i..j) or -Infinity if i > j.
+    def max_on(i, j)
+      @structure.query_on(i, j)
+    end
+  end
+  # A segment tree that for an array A(0...n) answers questions of the form "what is the index of the maximal value in the
+  # subinterval A(i..j)?" in O(log n) time.
+  #
+  # C version
+  class CIndexOfMaxValSegmentTree
+    extend Forwardable
+    # Tell the tree that the value at idx has changed
+    def_delegator :@structure, :update_at
+    # @param (see MaxValSegmentTree#initialize)
+    def initialize(data)
+      @structure = CSegmentTreeTemplate.new(
+        combine:               ->(p1, p2) { p1[1] >= p2[1] ? p1 : p2 },
+        single_cell_array_val: ->(i) { [i, data[i]] },
+        size:                  data.size,
+        identity:              nil
+      )
+    end
+    # The index of the maximum value in A(i..j)
+    #
+    # The arguments must be integers in 0...(A.size)
+    # @return (Integer, nil) the index of the largest value in A(i..j) or +nil+ if i > j.
+    #   - If there is more than one entry with that value, return one of the indices. There is no guarantee as to which one.
+    #   - Return +nil+ if i > j
+    def index_of_max_val_on(i, j)
+      @structure.query_on(i, j)&.first # discard the value part of the pair, which is just bookkeeping
+    end
+  end
+end
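
The two lambdas passed by `CIndexOfMaxValSegmentTree` can be tried out without the tree. The sketch below folds the `[index, value]` pairs for an interval directly, which is linear rather than logarithmic but uses the same `combine` and `single_cell_array_val` definitions. The helper `index_of_max_on` is hypothetical and stands in for a tree query. It also shows why the lambda has to close over `data` itself: a later change to the array is visible the next time the lambda is called.

```ruby
data = [3, 1, 4, 1, 5]

# Same lambdas as in CIndexOfMaxValSegmentTree: each node stores [index, value].
combine = ->(p1, p2) { p1[1] >= p2[1] ? p1 : p2 }
single_cell_array_val = ->(i) { [i, data[i]] }

# Hypothetical stand-in for a tree query: combine the leaves of i..j one by one.
index_of_max_on = lambda do |i, j|
  pairs = (i..j).map { |k| single_cell_array_val.call(k) }
  pairs.reduce { |p1, p2| combine.call(p1, p2) }&.first
end

before = index_of_max_on.call(0, 3)
data[1] = 9 # the lambda reads data at call time, so it sees this change
after = index_of_max_on.call(0, 3)
empty = index_of_max_on.call(3, 2) # empty interval: nothing to combine

puts "before: #{before}, after: #{after}, empty: #{empty.inspect}"
```

Storing only the index at each node would not work here, because `combine` needs the values to decide which child's index wins.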

data/lib/data_structures_rmolinari/segment_tree_template.rb CHANGED

@@ -17,6 +17,7 @@ require_relative 'shared'
 #
 # We do O(n) work to build the internal data structure at initialization. Then we answer queries in O(log n) time.
 class DataStructuresRMolinari::SegmentTreeTemplate
+  include Shared
   include Shared::BinaryTreeArithmetic
   # Construct a concrete instance of a Segment Tree. See details at the links above for the underlying concepts here.
@@ -47,27 +48,29 @@ class DataStructuresRMolinari::SegmentTreeTemplate
   end
   # The desired value (max, sum, etc.) on the subinterval left..right.
+  #
   # @param left the left end of the subinterval.
   # @param right the right end (inclusive) of the subinterval.
   #
+  # It must be that left..right is contained in 0...size.
+  #
   # The type of the return value depends on the concrete instance of the segment tree. We return the _identity_ element provided at
   # construction time if the interval is empty.
   def query_on(left, right)
-    raise DataError, "Bad query interval #{left}..#{right}" if left.negative? || right >= @size
+    raise DataError, "Bad query interval #{left}..#{right} (size = #{@size})" unless (0...@size).cover?(left..right)
     return @identity if left > right # empty interval
     determine_val(root, left, right, 0, @size - 1)
   end
-  # Update the value in the underlying array at the given idx
+  # Reflect the fact that the underlying array has been updated at the given idx
   #
   # @param idx an index in the underlying data array.
   #
   # Note that we don't need the updated value itself. We get that by calling the lambda +single_cell_array_val+ supplied at
   # construction.
   def update_at(idx)
-    raise DataError, 'Cannot update an index outside the initial range of the underlying data' unless (0...@size).cover?(idx)
     update_val_at(idx, root, 0, @size - 1)
   end
@@ -105,9 +108,9 @@ class DataStructuresRMolinari::SegmentTreeTemplate
       left = left(tree_idx)
       right = right(tree_idx)
       if mid >= idx
-        update_val_at(idx, left(tree_idx), tree_l, mid)
+        update_val_at(idx, left, tree_l, mid)
       else
-        update_val_at(idx, right(tree_idx), mid + 1, tree_r)
+        update_val_at(idx, right, mid + 1, tree_r)
       end
       @tree[tree_idx] = @combine.call(@tree[left], @tree[right])
     end

data/lib/data_structures_rmolinari.rb CHANGED

@@ -10,9 +10,13 @@ end
 # These define classes inside module DataStructuresRMolinari
 require_relative 'data_structures_rmolinari/algorithms'
 require_relative 'data_structures_rmolinari/disjoint_union'
 require_relative 'data_structures_rmolinari/c_disjoint_union' # version as a C extension
 require_relative 'data_structures_rmolinari/segment_tree_template'
+require_relative 'data_structures_rmolinari/c_segment_tree_template_impl'
 require_relative 'data_structures_rmolinari/heap'
 require_relative 'data_structures_rmolinari/max_priority_search_tree'
 require_relative 'data_structures_rmolinari/min_priority_search_tree'
@@ -34,6 +38,8 @@ module DataStructuresRMolinari
     # @param data an object that contains values at integer indices based at 0, via +data[i]+.
     #   - This will usually be an Array, but it could also be a hash or a proc.
     def initialize(data)
+      data.must_be_a Enumerable
       @structure = SegmentTreeTemplate.new(
         combine:               ->(a, b) { [a, b].max },
         single_cell_array_val: ->(i) { data[i] },
@@ -61,6 +67,8 @@ module DataStructuresRMolinari
     # @param (see MaxValSegmentTree#initialize)
     def initialize(data)
+      data.must_be_a Enumerable
       @structure = SegmentTreeTemplate.new(
         combine:               ->(p1, p2) { p1[1] >= p2[1] ? p1 : p2 },
         single_cell_array_val: ->(i) { [i, data[i]] },

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: data_structures_rmolinari
 version: !ruby/object:Gem::Version
-  version: 0.4.2
+  version: 0.4.4
 platform: ruby
 authors:
 - Rory Molinari
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2023-01-26 00:00:00.000000000 Z
+date: 2023-02-02 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: must_be
@@ -79,6 +79,7 @@ email: rorymolinari@gmail.com
 executables: []
 extensions:
 - ext/c_disjoint_union/extconf.rb
+- ext/c_segment_tree_template/extconf.rb
 extra_rdoc_files: []
 files:
 - CHANGELOG.md
@@ -86,8 +87,12 @@ files:
 - Rakefile
 - ext/c_disjoint_union/disjoint_union.c
 - ext/c_disjoint_union/extconf.rb
+- ext/c_segment_tree_template/extconf.rb
+- ext/c_segment_tree_template/segment_tree_template.c
+- ext/shared.c
 - lib/data_structures_rmolinari.rb
 - lib/data_structures_rmolinari/algorithms.rb
+- lib/data_structures_rmolinari/c_segment_tree_template_impl.rb
 - lib/data_structures_rmolinari/disjoint_union.rb
 - lib/data_structures_rmolinari/heap.rb
 - lib/data_structures_rmolinari/max_priority_search_tree.rb