RubyGems - data_structures_rmolinari - Versions diffs - 0.4.3 → 0.5.0 - Mend

data_structures_rmolinari 0.4.3 → 0.5.0


Files changed (16)

checksums.yaml +4 -4
data/CHANGELOG.md +19 -0
data/README.md +119 -33
data/Rakefile +6 -4
data/ext/c_disjoint_union/disjoint_union.c +75 -129
data/ext/c_disjoint_union/extconf.rb +7 -2
data/ext/c_segment_tree_template/extconf.rb +17 -0
data/ext/c_segment_tree_template/segment_tree_template.c +363 -0
data/ext/shared.c +32 -0
data/lib/data_structures_rmolinari/algorithms.rb +5 -5
data/lib/data_structures_rmolinari/c_segment_tree_template_impl.rb +15 -0
data/lib/data_structures_rmolinari/disjoint_union.rb +2 -0
data/lib/data_structures_rmolinari/segment_tree.rb +126 -0
data/lib/data_structures_rmolinari/segment_tree_template.rb +11 -8
data/lib/data_structures_rmolinari.rb +5 -62
metadata +8 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: c9022e9531472d1125c6172025c2d10c5d4ef4f9c43e326a43f1c5b4f0721263
-  data.tar.gz: '0212619be7fe32e68b63d2087730f81ffd6b4179b8b8bf63aa0026e4e3056224'
+  metadata.gz: 7682f6d3b0779f347ce0797f55f33b9d7dcc7bd9c2039fc2fd6f865eb72e085a
+  data.tar.gz: d717e5e36f79ddc4ecb605a59b475b7114359dea7476445590deb300f7915bd4
 SHA512:
-  metadata.gz: a7f9258eeed2dc7e7fa5713aaecfcdf44e061bb161aa3d0d2662fb662bfb6b2685c61be221b4a109792982c3b2aa6215da75b51ae299d4a9237b6226000612e4
-  data.tar.gz: e585a245f753ef731895163eedba802e3fe2f6000720d10705b5a0cd02a12642a35220eea57c8b71f504b66db7cb06161fdfca1660edc0dc132ee026dd83be4d
+  metadata.gz: c3ffd9a4f67f55b7a2df1c949cf2288c06fcae416d5ff03a10307a1b79c3dae1daa74e2576d5e190c989adeea47b046426fad8c3c64199aadf22ba500b317f36
+  data.tar.gz: 8380d6117f2955da9362395f8315f5121b4f7afba2f69aabb1981a01b675cbbed81d07c10b5745409080c2588c92df3d676ec36efa128571234a74dceef0e20d
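
The entries above are hex digests of the two archives packed inside the `.gem` file. As a rough sketch of how such values could be recomputed locally, the snippet below hashes a file with Ruby's standard `digest` library. The helper name `archive_digests` is hypothetical and is not part of RubyGems; the demo hashes a throwaway file rather than a real archive.

```ruby
require 'digest'
require 'tempfile'

# Hypothetical helper (not part of RubyGems): compute the two digests that a
# checksums.yaml-style file records for one archive.
def archive_digests(path)
  {
    'SHA256' => Digest::SHA256.file(path).hexdigest,
    'SHA512' => Digest::SHA512.file(path).hexdigest
  }
end

# Demonstrate on a throwaway file instead of a real gem archive.
Tempfile.create('demo') do |f|
  f.write('abc')
  f.flush
  digests = archive_digests(f.path)
  puts digests['SHA256']
  puts digests['SHA512'].length # a SHA512 hex digest is 128 characters long
end
```

Comparing the returned strings with the `metadata.gz` and `data.tar.gz` entries would then confirm that an unpacked archive matches the published version.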

data/CHANGELOG.md CHANGED

@@ -1,5 +1,24 @@
 # Changelog
+## [Unreleased]
+## [0.5.0] 2023-02-03
+- SegmentTree
+  - Reorganize the code into a SegmentTree submodule.
+  - Provide a convenience method for getting concrete instances.
+- README.md
+  - Add some simple example code for the data types.
+## [0.4.4] 2023-02-02
+- Disjoint Union
+  - C extension: use Convenient Containers rather than my janky Dynamic Array attempt.
+- Segment Tree
+  - Add a C implementation as CSegmentTreeTemplate.
 ## [0.4.3] 2023-01-27
 - Fix bad directive in Rakefile for DisjointUnion C extension

data/README.md CHANGED

@@ -4,8 +4,8 @@ This is a small collection of Ruby data structures that I have implemented for m
 structure is almost always more educational than simply reading about it and is usually fun.  I wrote some of them while
 participating in the Advent of Code (https://adventofcode.com/).
-These implementations are not particularly clever. They are based on the expository descriptions and pseudo-code I found as I read
-about each structure and so are not as fast as possible.
+The implementations are based on the expository descriptions and pseudo-code I found as I read about each structure and so are not
+as fast as possible.
 The code is available as a gem: https://rubygems.org/gems/data_structures_rmolinari.
@@ -14,18 +14,6 @@ The code is available as a gem: https://rubygems.org/gems/data_structures_rmolin
 The right way to organize the code is not obvious to me. For now the data structures are all defined in the module
 `DataStructuresRMolinari` to avoid polluting the global namespace.
-Example usage after the gem is installed:
-```
-require 'data_structures_rmolinari`
-# Pull what we need out of the namespace
-MaxPrioritySearchTree = DataStructuresRMolinari::MaxPrioritySearchTree
-Point = DataStructuresRMolinari::Point # anything responding to :x and :y is fine
-pst = MaxPrioritySearchTree.new([Point.new(1, 1)])
-puts pst.largest_y_in_ne(0, 0) # "Point(1,1)"
-```
 # Implementations
 ## Disjoint Union
@@ -42,8 +30,22 @@ It also provides
 For more details see https://en.wikipedia.org/wiki/Disjoint-set_data_structure and the paper [[TvL1984]](#references) by Tarjan and
 van Leeuwen.
-There is an experimental implementation as a C extension in c_disjoint_union.c. The Ruby class for this implementation is
-`CDisjointUnion`. Benchmarks indicate that a long sequence of `unite` calls is about twice as fast.
+``` ruby
+require 'data_structures_rmolinari'
+DisjointUnion = DataStructuresRMolinari::DisjointUnion
+# Create an instance over the "universe" 0, 1, ..., 9.
+du = DisjointUnion.new(10)
+du.subset_count          # => 10; each element starts out in its own subset
+du.unite(2, 3)           # say that 2 and 3 are actually in the same subset
+du.subset_count          # => 9
+du.find(2) == du.find(3) # => true
+du.unite(4, 5)
+du.unite(3, 4)           # now 2, 3, 4, and 5 are all in the same subset
+du.subset_count          # => 7
+```
 ## Heap
@@ -66,6 +68,24 @@ allows the insertion of duplicate items (which is sometimes useful) and slightly
 See https://en.wikipedia.org/wiki/Binary_heap and the paper by Edelkamp, Elmasry, and Katajainen [[EEK2017]](#references).
+``` ruby
+require 'data_structures_rmolinari'
+Heap = DataStructuresRMolinari::Heap
+data = [4, 3, 2, 1]
+heap = Heap.new
+# Insert the elements of data, each with itself as priority.
+data.each { |v| heap.insert(v, v) }
+heap.top           # => 1, since we have a min-heap.
+heap.pop           # => 1
+heap.top           # => 2; with 1 gone, this is the element with least priority
+heap.update(3, -3)
+heap.top           # => 3; now 3 is the element with least priority
+```
 ## Priority Search Tree
 A PST stores a set P of two-dimensional points in a way that allows certain queries about P to be answered efficiently. The data
@@ -84,41 +104,81 @@ pointing north.
 There is no `smallest_x_in_3_sided(x0, x1, y0)`. Just use `smallest_x_in_ne(x0, y0)`.
+(These queries appear rather abstract at first but there are interesting applications. See, for example, section 4 of
+[[McC1985]](#references), keeping in mind that the data structure in that paper is actually a _MinPST_.)
 The single-point queries run in O(log n) time, where n is the size of P, while `enumerate_3_sided` runs in O(m + log n), where m is
 the number of points actually enumerated.
 The implementation is in `MaxPrioritySearchTree` (MaxPST for short), so called because internally the structure is, among other
 things, a max-heap on the y-coordinates.
-These queries appear rather abstract at first but there are interesting applications. See, for example, section 4 of
-[[McC85]](#references), keeping in mind that the data structure in that paper is actually a _MinPST_.
 We also provide a `MinPrioritySearchTree`, which answers analogous queries in the southward-infinite quadrants and 3-sided
 regions.
 By default these data structures are immutable: once constructed they cannot be changed. But there is a constructor option that
 makes the instance "dynamic". This allows us to delete the element at the root of the tree - the one with largest y value (smallest
 for MinPST) - with the `delete_top!` method. This operation is important in certain algorithms, such as enumerating all maximal
-empty rectangles (see the second paper by De et al.[[DMNS2013]](#references)) Note that points can still not be added to the PST in
+empty rectangles (see the second paper by De et al. [[DMNS2013]](#references)). Note that points still cannot be added to the PST in
 any case, and choosing the dynamic option makes certain internal bookkeeping operations slower.
 In [[DMNS2013]](#references) De et al. generalize the in-place structure to a _Min-max Priority Search Tree_ (MinmaxPST) that can
 answer queries in all four quadrants and both "kinds" of 3-sided boxes. Having one of these would save the trouble of constructing
 both a MaxPST and MinPST. But the presentation is hard to follow in places and the paper's pseudocode is buggy.[^minmaxpst]
+``` ruby
+require 'data_structures_rmolinari'
+MaxPST = DataStructuresRMolinari::MaxPrioritySearchTree
+Point = Shared::Point # simple (x, y) struct. Anything responding to #x and #y will work
+data = [Point.new(0, 0), Point.new(1, 2), Point.new(2, 1)]
+pst = MaxPST.new(data)
+pst.largest_y_in_ne(0, 0)              # => #<struct Shared::Point x=1, y=2>
+pst.largest_y_in_ne(1, 1)              # => #<struct Shared::Point x=1, y=2>
+pst.largest_y_in_ne(1.5, 1)            # => #<struct Shared::Point x=2, y=1>
+pst.largest_y_in_3_sided(-0.5, 0.5, 0) # => #<struct Shared::Point x=0, y=0>
+```
 ## Segment Tree
-Segment trees store information related to subintervals of a certain array. For example, they can be used to find the sum of the
-elements in an arbitrary subinterval A[i..j] of an array A[0..n] in O(log n) time. Each node in the tree corresponds to a subarray
-of A in such a way that the values we store in the nodes can be combined efficiently to determine the desired result for arbitrary
-subarrays.
+A segment tree stores information related to subintervals of a certain array. For example, a segment tree can be used to find the
+sum of the elements in an arbitrary subinterval A(i..j) of an array A(0..n) in O(log n) time. Each node in the tree corresponds to a
+subarray of A in such a way that the values we store in the nodes can be combined efficiently to determine the desired result for
+arbitrary subarrays.
 An excellent description of the idea is found at https://cp-algorithms.com/data_structures/segment_tree.html.
-Generic code is provided in `SegmentTreeTemplate`. Concrete classes are written by providing a handful of simple lambdas and
-constants to the template class's initializer. Figuring out the details requires some knowledge of the internal mechanisms of a
-segment tree, for which the link at cp-algorithms.com is very helpful. See the definitions of the concrete classes,
-`MaxValSegmentTree` and `IndexOfMaxValSegmentTree`, for examples.
+Generic code is provided in `SegmentTree::SegmentTreeTemplate` and its equivalent (and faster) C-based sibling,
+`SegmentTree::CSegmentTreeTemplate` (see [below](#c-extensions)).
+Writing a concrete segment tree class just means providing some simple lambdas and constants to the template class's
+initializer. Figuring out the details requires some knowledge of the internal mechanisms of a segment tree, for which the link at
+cp-algorithms.com is very helpful. See the implementations of the concrete classes `MaxValSegmentTree` and
+`IndexOfMaxValSegmentTree` for examples.
+Since there are several concrete "types" and two underlying generic implementations there is a convenience method on the `SegmentTree`
+module to get instances.
+``` ruby
+require 'data_structures_rmolinari'
+SegmentTree = DataStructuresRMolinari::SegmentTree # namespace module
+data = [1, -3, 2, 1, 5, -9]
+# Get a segment tree instance that will answer "max over this subinterval" questions about data.
+# Here we get one using the ruby implementation of the generic functionality.
+#
+# We offer :index_of_max as an alternative to :max. This will construct an instance that answers
+# questions of the form "an index of the maximum value over this subinterval".
+#
+# To use the version written in C, put :c instead of :ruby.
+seg_tree = SegmentTree.construct(data, :max, :ruby)
+seg_tree.max_on(0, 2) # => 2
+seg_tree.max_on(1, 4) # => 5
+# ..etc..
+```
 ## Algorithms
@@ -131,11 +191,37 @@ The Algorithms submodule contains some algorithms using the data structures.
     [left, right, bottom, top].
   - The algorithm is due to [[DMNS2013]](#references).
+# C Extensions
+As another learning process I have implemented several of these data structures as C extensions. The APIs are the same.
+## Disjoint Union
+The C version is called `CDisjointUnion`.  A benchmark suggests that a long sequence of `unite` operations is about 3 times as fast
+with `CDisjointUnion` as with `DisjointUnion`.
+The implementation uses the remarkable Convenient Containers library from Jackson Allan [[Allan]](#references).
+## Segment Tree
+`CSegmentTreeTemplate` is the C implementation of the generic class. Concrete classes are built on top of this in Ruby, just as with
+the pure Ruby `SegmentTreeTemplate` class.
+A benchmark suggests that a long sequence of `max_on` operations against a max-val Segment Tree is about 4 times as fast with C as
+with Ruby. I'm a bit surprised the improvement isn't larger, but remember that the C code must still interact with the Ruby objects in
+the underlying data array, and must combine them, etc., via Ruby lambdas.
 # References
-- [TvL1984] Tarjan, Robert E., van Leeuwen, J., _Worst-case Analysis of Set Union Algorithms_, Journal of the ACM, v31:2 (1984), pp 245–281.
-- [EEK2017] Edelkamp, S., Elmasry, A., Katajainen, J., _Optimizing Binary Heaps_, Theory Comput Syst (2017), vol 61, pp 606-636, DOI 10.1007/s00224-017-9760-2.
-- [McC1985] McCreight, E. M., _Priority Search Trees_, SIAM J. Comput., 14(2):257-276, 1985.
-- [DMNS2011] De, M., Maheshwari, A., Nandy, S. C., Smid, M., _An In-Place Priority Search Tree_, 23rd Canadian Conference on Computational Geometry, 2011.
-- [DMNS2013] De, M., Maheshwari, A., Nandy, S. C., Smid, M., _An In-Place Min-max Priority Search Tree_, Computational Geometry, v46 (2013), pp 310-327.
+- [Allan] Allan, J., _CC: Convenient Containers_, https://github.com/JacksonAllan/CC, (retrieved 2023-02-01).
+- [TvL1984] Tarjan, Robert E., van Leeuwen, J., _Worst-case Analysis of Set Union Algorithms_, Journal of the ACM, v31:2 (1984), pp
+  245–281, https://dl.acm.org/doi/10.1145/62.2160 (retrieved 2022-02-01).
+- [EEK2017] Edelkamp, S., Elmasry, A., Katajainen, J., _Optimizing Binary Heaps_, Theory Comput Syst (2017), vol 61, pp 606-636, DOI
+  10.1007/s00224-017-9760-2, https://kclpure.kcl.ac.uk/portal/files/87388857/TheoryComputingSzstems.pdf (retrieved 2022-02-02).
+- [McC1985] McCreight, E. M., _Priority Search Trees_, SIAM J. Comput., 14(2):257-276, 1985,
+  http://www.cs.duke.edu/courses/fall08/cps234/handouts/SMJ000257.pdf (retrieved 2023-02-02).
+- [DMNS2011] De, M., Maheshwari, A., Nandy, S. C., Smid, M., _An In-Place Priority Search Tree_, 23rd Canadian Conference on
+  Computational Geometry, 2011, http://www.cs.carleton.ca/~michiel/inplace_pst.pdf (retrieved 2023-02-02).
+- [DMNS2013] De, M., Maheshwari, A., Nandy, S. C., Smid, M., _An In-Place Min-max Priority Search Tree_, Computational Geometry, v46
+  (2013), pp 310-327, https://people.scs.carleton.ca/~michiel/MinMaxPST.pdf (retrieved 2023-02-02).
 [^minmaxpst]: See the comments in the fragmentary class `MinMaxPrioritySearchTree` for further details.
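
The README's Segment Tree section describes nodes that each cover a subarray and whose stored values are combined to answer queries on arbitrary subintervals. The following is a minimal pure-Ruby sketch of that idea for the "max over a subinterval" case. It is not the gem's `SegmentTreeTemplate`; the class name `TinyMaxSegmentTree` and its internals are assumptions made for illustration, and only the method name `max_on` is borrowed from the README example.

```ruby
# Illustrative sketch only: a recursive max segment tree over a fixed array.
class TinyMaxSegmentTree
  def initialize(data)
    @size = data.size
    # 4 * n slots are always enough for the implicit binary tree.
    @tree = Array.new(4 * [@size, 1].max)
    build(data, 0, 0, @size - 1) unless data.empty?
  end

  # Maximum of data[i..j], both ends inclusive.
  def max_on(i, j)
    raise ArgumentError, 'bad interval' unless i >= 0 && i <= j && j < @size

    query(0, 0, @size - 1, i, j)
  end

  private

  # Fill in the node covering data[lo..hi].
  def build(data, node, lo, hi)
    if lo == hi
      @tree[node] = data[lo]
    else
      mid = (lo + hi) / 2
      build(data, 2 * node + 1, lo, mid)
      build(data, 2 * node + 2, mid + 1, hi)
      @tree[node] = [@tree[2 * node + 1], @tree[2 * node + 2]].max
    end
  end

  # Answer the query for i..j using the node covering lo..hi.
  def query(node, lo, hi, i, j)
    return @tree[node] if i <= lo && hi <= j # node lies inside the query

    mid = (lo + hi) / 2
    return query(2 * node + 1, lo, mid, i, j) if j <= mid
    return query(2 * node + 2, mid + 1, hi, i, j) if i > mid

    [query(2 * node + 1, lo, mid, i, j),
     query(2 * node + 2, mid + 1, hi, i, j)].max
  end
end

tree = TinyMaxSegmentTree.new([1, -3, 2, 1, 5, -9])
puts tree.max_on(0, 2) # prints 2
puts tree.max_on(1, 4) # prints 5
```

Replacing the two calls to `max` with another associative combining operation (a sum, say) gives a different concrete tree, which is roughly the role the lambdas passed to the template class play in the gem.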

data/Rakefile CHANGED

@@ -2,10 +2,12 @@ require 'rubygems'
 require 'rake/testtask'
 require 'rake/extensiontask'
-Rake::ExtensionTask.new('data_structures_rmolinari/c_disjoint_union') do |ext|
-  ext.name = 'c_disjoint_union'
-  ext.ext_dir = 'ext/c_disjoint_union'
-  ext.lib_dir = 'lib/data_structures_rmolinari/'
+['c_disjoint_union', 'c_segment_tree_template'].each do |extension_name|
+  Rake::ExtensionTask.new("data_structures_rmolinari/#{extension_name}") do |ext|
+    ext.name = extension_name
+    ext.ext_dir = "ext/#{extension_name}"
+    ext.lib_dir = 'lib/data_structures_rmolinari/'
+  end
 end
 Rake::TestTask.new do |t|

data/ext/c_disjoint_union/disjoint_union.c CHANGED

@@ -16,118 +16,69 @@
  */
 #include "ruby.h"
-// The Shared::DataError exception type in the Ruby code. We only need it when we detect a runtime error, so a macro should be fine.
-#define mShared rb_define_module("Shared")
-#define eSharedDataError rb_const_get(mShared, rb_intern_const("DataError"))
+#include "cc.h" // Convenient Containers
+#include "shared.h"
 /**
- * It's been so long since I've written non-trival C that I need to copy examples from online.
- *
- * Dynamic array of longs, with an initial value for otherwise uninitialized elements.
- * Based on  https://stackoverflow.com/questions/3536153/c-dynamically-growing-array
- */
-typedef struct {
-  long *array;
-  size_t size;
-  long default_val;
-} DynamicArray;
-/*
- * Initialize a DynamicArray struct with the given initial size and with all values set to the default value.
- *
- * The default value is stored and used to initialize new array sections if and when the array needs to be expanded.
- */
-void initDynamicArray(DynamicArray *a, size_t initial_size, long default_val) {
-  a->array = malloc(initial_size * sizeof(long));
-  a->size = initial_size;
-  a->default_val = default_val;
-  for (size_t i = 0; i < initial_size; i++) {
-    a->array[i] = default_val;
-  }
-}
-/*
- * Assign +value+ to the the +index+-th element of the array, expanding the available space if necessary.
+ * Data type for the (parent, rank) pair, and some accessor helpers for the vec() container we are going to be using.
  */
-void assignInDynamicArray(DynamicArray *a, unsigned long index, long value) {
-  if (a->size <= index) {
-    size_t new_size = a->size;
-    while (new_size <= index) {
-      new_size = 8 * new_size / 5 + 8; // 8/5 gives "Fibonnacci-like" growth; adding 8 to avoid small arrays having to reallocate
-                                       // too often as they grow. Who knows if it's worth being "clever".
-    }
-    long *new_array = realloc(a->array, new_size * sizeof(long));
-    if (!new_array) {
-      rb_raise(rb_eRuntimeError, "Cannot allocate memory to expand DynamicArray!");
-    }
+typedef struct data_pair {
+  long parent;
+  unsigned long rank;
+} data_pair;
-    a->array = new_array;
-    for (size_t i = a->size; i < new_size; i++) {
-      a->array[i] = a->default_val;
-    }
-    a->size = new_size;
-  }
+#define DEFAULT_PARENT -1
+#define DEFAULT_RANK 0
+static data_pair default_pair = { .parent = DEFAULT_PARENT, .rank = DEFAULT_RANK };
-  a->array[index] = value;
+static data_pair make_data_pair(long parent, unsigned long rank) {
+  data_pair pair = { .parent = parent, .rank = rank };
+  return pair;
 }
-void freeDynamicArray(DynamicArray *a) {
-  free(a->array);
-  a->array = NULL;
-  a->size = 0;
-}
+/* The vector generic from Convenient Containers */
+typedef vec(data_pair) pair_vector;
-size_t _size_of(DynamicArray *a) {
-  return a->size * sizeof(a->default_val);
-}
+#define parent(disjoint_union_ptr, idx) (get((disjoint_union_ptr)->pairs, idx)->parent)
+#define rank(disjoint_union_ptr, idx) (get((disjoint_union_ptr)->pairs, idx)->rank)
 /**
  * The C implementation of a Disjoint Union
  *
- * See Tarjan, Robert E., van Leeuwen, J., _Worst-case Analysis of Set Union Algorithms_, Journal of the ACM, v31:2 (1984), pp 245–281.
+ * See the paper for optimizations we use to get almost constant time for find() and unite().
+ *
+ * Tarjan, Robert E., van Leeuwen, J., _Worst-case Analysis of Set Union Algorithms_, Journal of the ACM, v31:2 (1984), pp 245–281.
  */
 /*
  * The Disjoint Union struct.
- * - forest: an array of longs giving, for each element, the element's parent.
- *   - An element e is the root of its tree just when forest[e] == e.
- *   - Two elements are in the same subset just when they are in the same tree in the forest.
+ * - pairs: a vector (dynamic array) of pairs, the i-th of which contains
+ *   - the "parent" of element i in its membership tree
+ *     - An element e is the root of its tree just when it is its own parent
+ *     - Two elements are in the same subset just when they are in the same tree in the forest.
  *     - So the key idea is that we can check this by navigating via parents from each element to their roots. Clever optimizations
  *       keep the trees flat and so most nodes are close to their roots.
- * - rank: a array of longs giving the "rank" of each element.
- *   - This value is used to guide the "linking" of trees when subsets are being merged to keep the trees flat. See Tarjan & van
- *     Leeuwen
+ *   - the "rank" of element i
+ *     - this value is used to guide the "linking" of trees when subsets are being merged to keep the trees flat.
  * - subset_count: the number of (disjoint) subsets.
  *   - it isn't needed internally but may be useful to client code.
  */
 typedef struct du_data {
-  DynamicArray *forest; // the forest that describes the unified subsets
-  DynamicArray *rank;   // the "ranks" of the elements, used when uniting subsets
+  pair_vector *pairs; // The generic vector container from the amazing Convenient Containers library
   size_t subset_count;
 } disjoint_union_data;
 /*
  * Create one (on the heap).
- *
- * The dynamic arrays are initialized with a size of 100 because I didn't have a better idea. This will end up getting called from
- * the Ruby #allocate method, which happens before #initialize. Thus we don't know the calling code's desired initial size.
  */
-#define INITIAL_SIZE 100
 static disjoint_union_data *create_disjoint_union() {
   disjoint_union_data *disjoint_union = (disjoint_union_data *)malloc(sizeof(disjoint_union_data));
   // Allocate the structures
-  DynamicArray *forest = (DynamicArray *)malloc(sizeof(DynamicArray));
-  DynamicArray *rank = (DynamicArray *)malloc(sizeof(DynamicArray));
-  initDynamicArray(forest, INITIAL_SIZE, -1);
-  initDynamicArray(rank,   INITIAL_SIZE, 0);
+  disjoint_union->pairs = malloc(sizeof(pair_vector));
+  init(disjoint_union->pairs);
-  disjoint_union->forest = forest;
-  disjoint_union->rank = rank;
   disjoint_union->subset_count = 0;
   return disjoint_union;
@@ -141,15 +92,7 @@ static disjoint_union_data *create_disjoint_union() {
 static void disjoint_union_free(void *ptr) {
   if (ptr) {
     disjoint_union_data *disjoint_union = ptr;
-    freeDynamicArray(disjoint_union->forest);
-    freeDynamicArray(disjoint_union->rank);
-    free(disjoint_union->forest);
-    disjoint_union->forest = NULL;
-    free(disjoint_union->rank);
-    disjoint_union->rank = NULL;
+    cleanup(disjoint_union->pairs);
     xfree(disjoint_union);
   }
 }
@@ -162,8 +105,7 @@ static void disjoint_union_free(void *ptr) {
  * Is the given element already a member of the universe?
  */
 static int present_p(disjoint_union_data *disjoint_union, size_t element) {
-  DynamicArray *forest = (DynamicArray *)disjoint_union->forest;
-  return (forest->size > element && (forest->array[element] != forest->default_val));
+  return (size(disjoint_union->pairs) > element && (parent(disjoint_union, element) != DEFAULT_PARENT));
 }
 /*
@@ -172,6 +114,13 @@ static int present_p(disjoint_union_data *disjoint_union, size_t element) {
 static void assert_membership(disjoint_union_data *disjoint_union, size_t element) {
   if (!present_p(disjoint_union, element)) {
     rb_raise(eSharedDataError, "Value %zu is not part of the universe", element);
+    /* rb_raise( */
+    /*          eSharedDataError, */
+    /*          "Value %zu is not part of the universe, size = %zu, forest_val = %lu", */
+    /*          element, */
+    /*          size(disjoint_union->pairs), */
+    /*          get(disjoint_union->pairs, element)->parent */
+    /*          ); */
   }
 }
@@ -185,47 +134,52 @@ static void add_new_element(disjoint_union_data *disjoint_union, size_t element)
     rb_raise(eSharedDataError, "Element %zu already present in the universe", element);
   }
-  assignInDynamicArray(disjoint_union->forest, element, element);
-  assignInDynamicArray(disjoint_union->rank, element, 0);
+  // Expand the underlying vector if necessary
+  size_t sz = size(disjoint_union->pairs);
+  if (sz <= element) {
+    resize(disjoint_union->pairs, element + 1);
+    for (size_t i = sz; i <= element; i++) {
+      lval(disjoint_union->pairs, i) = default_pair;
+    }
+  }
+  lval(disjoint_union->pairs, element) = make_data_pair(element, 0l);
   disjoint_union->subset_count++;
 }
 /*
- * Find the canonical representative of the given element. This is the root of the tree (in forest) containing element.
+ * Find the canonical representative of the given element. This is the root of the tree containing it.
  *
  * Two elements are in the same subset exactly when their canonical representatives are equal.
  */
 static size_t find(disjoint_union_data *disjoint_union, size_t element) {
   assert_membership(disjoint_union, element);
-  // We implement find with "halving" to shrink the length of paths to the root. See Tarjan and van Leeuwin p 252.
-  long *d = disjoint_union->forest->array; // the actual forest data
+  // We use "halving" to shrink the length of paths to the root. See Tarjan and van Leeuwen p 252.
   size_t x = element;
-  while (d[d[x]] != d[x]) {
-    x = d[x] = d[d[x]];
+  long p, gp; // parent and grandparent
+  while (p = parent(disjoint_union, x), gp = parent(disjoint_union, p), p != gp) {
+    parent(disjoint_union, x) = gp;
+    x = gp;
   }
-  return d[x];
+  return parent(disjoint_union, x);
 }
 /*
- * "Link"" the two given elements so that they are in the same subset now.
+ * "Link" the two given elements so that they are in the same subset now.
  *
  * In other words, merge the subtrees containing the two elements.
  *
- * Good performace (see Tarjan and van Leeuwin) assumes that elt1 and elt2 area are disinct and already the roots of their trees,
- * though we don't check that here.
+ * elt1 and elt2 must be distinct and the roots of their trees, though we don't check that here.
  */
 static void link_roots(disjoint_union_data *disjoint_union, size_t elt1, size_t elt2) {
-  long *rank = disjoint_union->rank->array;
-  long *forest = disjoint_union->forest->array;
-  if (rank[elt1] > rank[elt2]) {
-    forest[elt2] = elt1;
-  } else if (rank[elt1] == rank[elt2]) {
-    forest[elt2] = elt1;
-    rank[elt1]++;
+  if (rank(disjoint_union, elt1) > rank(disjoint_union, elt2)) {
+    parent(disjoint_union, elt2) =  elt1;
+  } else if (rank(disjoint_union, elt1) == rank(disjoint_union, elt2)) {
+    parent(disjoint_union, elt2) = elt1;
+    rank(disjoint_union, elt1)++;
   } else {
-    forest[elt1] = elt2;
+    parent(disjoint_union, elt1) = elt2;
   }
   disjoint_union->subset_count--;
@@ -263,7 +217,9 @@ static void unite(disjoint_union_data *disjoint_union, size_t elt1, size_t elt2)
 static size_t disjoint_union_memsize(const void *ptr) {
   if (ptr) {
     const disjoint_union_data *du = ptr;
-    return sizeof(disjoint_union_data) + _size_of(du->forest) + _size_of(du->rank);
+    // See https://github.com/JacksonAllan/CC/issues/3
+    return sizeof( cc_vec_hdr_ty ) + cap( du->pairs ) * CC_EL_SIZE( *(du->pairs) );
   } else {
     return 0;
   }
@@ -286,21 +242,7 @@ static const rb_data_type_t disjoint_union_type = {
 };
 /*
- * Helper: check that a Ruby value is a non-negative Fixnum and convert it to a C unsigned long
- */
-static unsigned long checked_nonneg_fixnum(VALUE val) {
-  Check_Type(val, T_FIXNUM);
-  long c_val = FIX2LONG(val);
-  if (c_val < 0) {
-    rb_raise(eSharedDataError, "Value must be non-negative");
-  }
-  return c_val;
-}
-/*
- * Unwrap a Rubyfied disjoint union to get the C struct inside.
+ * Unwrap a Ruby-side disjoint union object to get the C struct inside.
  */
 static disjoint_union_data *unwrapped(VALUE self) {
   disjoint_union_data *disjoint_union;
@@ -333,9 +275,13 @@ static VALUE disjoint_union_init(int argc, VALUE *argv, VALUE self) {
     size_t initial_size = checked_nonneg_fixnum(argv[0]);
     disjoint_union_data *disjoint_union = unwrapped(self);
+    pair_vector *pair_vec = disjoint_union->pairs;
+    resize(pair_vec, initial_size);
     for (size_t i = 0; i < initial_size; i++) {
-      add_new_element(disjoint_union, i);
+      lval(pair_vec, i) = make_data_pair(i, 0);
     }
+    disjoint_union->subset_count = initial_size;
   }
   return self;
 }
@@ -343,7 +289,7 @@ static VALUE disjoint_union_init(int argc, VALUE *argv, VALUE self) {
 /**
  * And now the simple wrappers around the Disjoint Union C functionality. In each case we
  *   - unwrap a 'VALUE self',
- *     - i.e., theCDisjointUnion instance that contains a disjoint_union_data struct;
+ *     - i.e., the CDisjointUnion instance on the Ruby side;
  *   - munge any other arguments into longs;
  *   - call the appropriate C function to act on the struct; and
  *   - return an appropriate VALUE that the Ruby runtime can use.
@@ -354,7 +300,7 @@ static VALUE disjoint_union_init(int argc, VALUE *argv, VALUE self) {
 /*
  * Add a new subset to the universe containing the element +new_v+.
  *
- * @param the new element, starting in its own singleton subset
+ * @param arg the new element, starting in its own singleton subset
  *   - it must be a non-negative integer, not already part of the universe of elements.
  */
 static VALUE disjoint_union_make_set(VALUE self, VALUE arg) {
@@ -412,7 +358,7 @@ static VALUE disjoint_union_unite(VALUE self, VALUE arg1, VALUE arg2) {
  * - Tarjan, Robert E., van Leeuwen, Jan (1984). _Worst-case analysis of set union algorithms_. Journal of the ACM. 31 (2): 245–281.
  */
 void Init_c_disjoint_union() {
-  VALUE mDataStructuresRMolinari = rb_define_module("DataStructuresRMolinari");
+  //VALUE mDataStructuresRMolinari = rb_define_module("DataStructuresRMolinari");
   VALUE cDisjointUnion = rb_define_class_under(mDataStructuresRMolinari, "CDisjointUnion", rb_cObject);
   rb_define_alloc_func(cDisjointUnion, disjoint_union_alloc);
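
The C code in this file pairs each element with a parent and a rank, finds roots with path "halving", and links roots by rank. A compact Ruby sketch of those same two ideas may make the hunks above easier to follow. It is an illustration under assumptions, not the gem's `DisjointUnion` class: the name `TinyDisjointUnion` is hypothetical, and quietly ignoring a `unite` of two elements that already share a subset is a choice made for this sketch only.

```ruby
# Illustrative sketch only: union-find with path halving and linking by rank.
class TinyDisjointUnion
  attr_reader :subset_count

  def initialize(size)
    @parent = Array.new(size) { |i| i } # every element starts as its own root
    @rank = Array.new(size, 0)
    @subset_count = size
  end

  # Canonical representative (root) of x's subset.
  def find(x)
    # Path halving: point x at its grandparent, then jump there.
    while @parent[x] != @parent[@parent[x]]
      @parent[x] = @parent[@parent[x]]
      x = @parent[x]
    end
    @parent[x]
  end

  # Merge the subsets containing a and b.
  def unite(a, b)
    root_a = find(a)
    root_b = find(b)
    return if root_a == root_b # sketch-only choice: nothing to do

    link_roots(root_a, root_b)
  end

  private

  # Attach the root of lower rank beneath the other to keep trees shallow.
  def link_roots(r1, r2)
    if @rank[r1] > @rank[r2]
      @parent[r2] = r1
    elsif @rank[r1] == @rank[r2]
      @parent[r2] = r1
      @rank[r1] += 1
    else
      @parent[r1] = r2
    end
    @subset_count -= 1
  end
end

du = TinyDisjointUnion.new(10)
du.unite(2, 3)
du.unite(4, 5)
du.unite(3, 4)
puts du.subset_count          # prints 7
puts du.find(2) == du.find(5) # prints true
```

Storing parent and rank side by side, as the new `data_pair` struct does, keeps the two values for one element adjacent in memory; the sketch uses two parallel arrays only because that is the simplest thing to write in Ruby.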

data/ext/c_disjoint_union/extconf.rb CHANGED

@@ -3,10 +3,15 @@ require 'mkmf'
 abort 'missing malloc()' unless have_func "malloc"
 abort 'missing realloc()' unless have_func "realloc"
-if try_cflags('-O')
-  append_cflags('-O')
+if try_cflags('-O3')
+  append_cflags('-O3')
 end
 extension_name = "c_disjoint_union"
 dir_config(extension_name)
+$srcs = ["disjoint_union.c", "../shared.c"]
+$INCFLAGS << " -I$(srcdir)/.."
+$VPATH << "$(srcdir)/.."
 create_makefile("data_structures_rmolinari/c_disjoint_union")

data/ext/c_segment_tree_template/extconf.rb ADDED

@@ -0,0 +1,17 @@
+require 'mkmf'
+abort 'missing malloc()' unless have_func "malloc"
+abort 'missing realloc()' unless have_func "realloc"
+if try_cflags('-O3')
+  append_cflags('-O3')
+end
+extension_name = "c_segment_tree_template"
+dir_config(extension_name)
+$srcs = ["segment_tree_template.c", "../shared.c"]
+$INCFLAGS << " -I$(srcdir)/.."
+$VPATH << "$(srcdir)/.."
+create_makefile("data_structures_rmolinari/c_segment_tree_template")