data_structures_rmolinari 0.4.3 → 0.5.0

Files changed (16)

checksums.yaml +4 -4
data/CHANGELOG.md +19 -0
data/README.md +119 -33
data/Rakefile +6 -4
data/ext/c_disjoint_union/disjoint_union.c +75 -129
data/ext/c_disjoint_union/extconf.rb +7 -2
data/ext/c_segment_tree_template/extconf.rb +17 -0
data/ext/c_segment_tree_template/segment_tree_template.c +363 -0
data/ext/shared.c +32 -0
data/lib/data_structures_rmolinari/algorithms.rb +5 -5
data/lib/data_structures_rmolinari/c_segment_tree_template_impl.rb +15 -0
data/lib/data_structures_rmolinari/disjoint_union.rb +2 -0
data/lib/data_structures_rmolinari/segment_tree.rb +126 -0
data/lib/data_structures_rmolinari/segment_tree_template.rb +11 -8
data/lib/data_structures_rmolinari.rb +5 -62
metadata +8 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: c9022e9531472d1125c6172025c2d10c5d4ef4f9c43e326a43f1c5b4f0721263
-  data.tar.gz: '0212619be7fe32e68b63d2087730f81ffd6b4179b8b8bf63aa0026e4e3056224'
+  metadata.gz: 7682f6d3b0779f347ce0797f55f33b9d7dcc7bd9c2039fc2fd6f865eb72e085a
+  data.tar.gz: d717e5e36f79ddc4ecb605a59b475b7114359dea7476445590deb300f7915bd4
 SHA512:
-  metadata.gz: a7f9258eeed2dc7e7fa5713aaecfcdf44e061bb161aa3d0d2662fb662bfb6b2685c61be221b4a109792982c3b2aa6215da75b51ae299d4a9237b6226000612e4
-  data.tar.gz: e585a245f753ef731895163eedba802e3fe2f6000720d10705b5a0cd02a12642a35220eea57c8b71f504b66db7cb06161fdfca1660edc0dc132ee026dd83be4d
+  metadata.gz: c3ffd9a4f67f55b7a2df1c949cf2288c06fcae416d5ff03a10307a1b79c3dae1daa74e2576d5e190c989adeea47b046426fad8c3c64199aadf22ba500b317f36
+  data.tar.gz: 8380d6117f2955da9362395f8315f5121b4f7afba2f69aabb1981a01b675cbbed81d07c10b5745409080c2588c92df3d676ec36efa128571234a74dceef0e20d

data/CHANGELOG.md CHANGED

@@ -1,5 +1,24 @@
 # Changelog
+## [Unreleased]
+## [0.5.0] 2023-02-03
+- SegmentTree
+  - Reorganize the code into a SegmentTree submodule.
+  - Provide a convenience method for getting concrete instances.
+- README.md
+  - Add some simple example code for the data types.
+## [0.4.4] 2023-02-02
+- Disjoint Union
+  - C extension: use Convenient Containers rather than my janky Dynamic Array attempt.
+- Segment Tree
+  - Add a C implementation as CSegmentTreeTemplate.
 ## [0.4.3] 2023-01-27
 - Fix bad directive in Rakefile for DisjointUnion C extension

data/README.md CHANGED

@@ -4,8 +4,8 @@ This is a small collection of Ruby data structures that I have implemented for m
 structure is almost always more educational than simply reading about it and is usually fun.  I wrote some of them while
 participating in the Advent of Code (https://adventofcode.com/).
-These implementations are not particularly clever. They are based on the expository descriptions and pseudo-code I found as I read
-about each structure and so are not as fast as possible.
+The implementations are based on the expository descriptions and pseudo-code I found as I read about each structure and so are not
+as fast as possible.
 The code is available as a gem: https://rubygems.org/gems/data_structures_rmolinari.
@@ -14,18 +14,6 @@ The code is available as a gem: https://rubygems.org/gems/data_structures_rmolin
 The right way to organize the code is not obvious to me. For now the data structures are all defined in the module
 `DataStructuresRMolinari` to avoid polluting the global namespace.
-Example usage after the gem is installed:
-```
-require 'data_structures_rmolinari`
-# Pull what we need out of the namespace
-MaxPrioritySearchTree = DataStructuresRMolinari::MaxPrioritySearchTree
-Point = DataStructuresRMolinari::Point # anything responding to :x and :y is fine
-pst = MaxPrioritySearchTree.new([Point.new(1, 1)])
-puts pst.largest_y_in_ne(0, 0) # "Point(1,1)"
-```
 # Implementations
 ## Disjoint Union
@@ -42,8 +30,22 @@ It also provides
 For more details see https://en.wikipedia.org/wiki/Disjoint-set_data_structure and the paper [[TvL1984]](#references) by Tarjan and
 van Leeuwen.
-There is an experimental implementation as a C extension in c_disjoint_union.c. The Ruby class for this implementation is
-`CDisjointUnion`. Benchmarks indicate that a long sequence of `unite` calls is about twice as fast.
+``` ruby
+require 'data_structures_rmolinari'
+DisjointUnion = DataStructuresRMolinari::DisjointUnion
+# Create an instance over the "universe" 0, 1, ..., 9.
+du = DisjointUnion.new(10)
+du.subset_count          # => 10; each element starts out in its own subset
+du.unite(2, 3)           # say that 2 and 3 are actually in the same subset
+du.subset_count          # => 9
+du.find(2) == du.find(3) # => true
+du.unite(4, 5)
+du.unite(3, 4)           # now 2, 3, 4, and 5 are all in the same subset
+du.subset_count          # => 7
+```
 ## Heap
@@ -66,6 +68,24 @@ allows the insertion of duplicate items (which is sometimes useful) and slightly
 See https://en.wikipedia.org/wiki/Binary_heap and the paper by Edelkamp, Elmasry, and Katajainen [[EEK2017]](#references).
+``` ruby
+require 'data_structures_rmolinari'
+Heap = DataStructuresRMolinari::Heap
+data = [4, 3, 2, 1]
+heap = Heap.new
+# Insert the elements of data, each with itself as priority.
+data.each { |v| heap.insert(v, v) }
+heap.top           # => 1, since we have a min-heap.
+heap.pop           # => 1
+heap.top           # => 2; with 1 gone, this is the element with least priority
+heap.update(3, -3)
+heap.top           # => 3; now 3 is the element with least priority
+```
 ## Priority Search Tree
 A PST stores a set P of two-dimensional points in a way that allows certain queries about P to be answered efficiently. The data
@@ -84,41 +104,81 @@ pointing north.
 There is no `smallest_x_in_3_sided(x0, x1, y0)`. Just use `smallest_x_in_ne(x0, y0)`.
+(These queries appear rather abstract at first but there are interesting applications. See, for example, section 4 of
+[[McC1985]](#references), keeping in mind that the data structure in that paper is actually a _MinPST_.)
 The single-point queries run in O(log n) time, where n is the size of P, while `enumerate_3_sided` runs in O(m + log n), where m is
 the number of points actually enumerated.
 The implementation is in `MaxPrioritySearchTree` (MaxPST for short), so called because internally the structure is, among other
 things, a max-heap on the y-coordinates.
-These queries appear rather abstract at first but there are interesting applications. See, for example, section 4 of
-[[McC85]](#references), keeping in mind that the data structure in that paper is actually a _MinPST_.
 We also provide a `MinPrioritySearchTree`, which answers analogous queries in the southward-infinite quadrants and 3-sided
 regions.
 By default these data structures are immutable: once constructed they cannot be changed. But there is a constructor option that
 makes the instance "dynamic". This allows us to delete the element at the root of the tree - the one with largest y value (smallest
 for MinPST) - with the `delete_top!` method. This operation is important in certain algorithms, such as enumerating all maximal
-empty rectangles (see the second paper by De et al.[[DMNS2013]](#references)) Note that points can still not be added to the PST in
+empty rectangles (see the second paper by De et al. [[DMNS2013]](#references)). Note that points still cannot be added to the PST in
 any case, and choosing the dynamic option makes certain internal bookkeeping operations slower.
 In [[DMNS2013]](#references) De et al. generalize the in-place structure to a _Min-max Priority Search Tree_ (MinmaxPST) that can
 answer queries in all four quadrants and both "kinds" of 3-sided boxes. Having one of these would save the trouble of constructing
 both a MaxPST and MinPST. But the presentation is hard to follow in places and the paper's pseudocode is buggy.[^minmaxpst]
+``` ruby
+require 'data_structures_rmolinari'
+MaxPST = DataStructuresRMolinari::MaxPrioritySearchTree
+Point = Shared::Point # simple (x, y) struct. Anything responding to #x and #y will work
+data = [Point.new(0, 0), Point.new(1, 2), Point.new(2, 1)]
+pst = MaxPST.new(data)
+pst.largest_y_in_ne(0, 0)              # => #<struct Shared::Point x=1, y=2>
+pst.largest_y_in_ne(1, 1)              # => #<struct Shared::Point x=1, y=2>
+pst.largest_y_in_ne(1.5, 1)            # => #<struct Shared::Point x=2, y=1>
+pst.largest_y_in_3_sided(-0.5, 0.5, 0) # => #<struct Shared::Point x=0, y=0>
+```
 ## Segment Tree
-Segment trees store information related to subintervals of a certain array. For example, they can be used to find the sum of the
-elements in an arbitrary subinterval A[i..j] of an array A[0..n] in O(log n) time. Each node in the tree corresponds to a subarray
-of A in such a way that the values we store in the nodes can be combined efficiently to determine the desired result for arbitrary
-subarrays.
+A segment tree stores information related to subintervals of a certain array. For example, a segment tree can be used to find the
+sum of the elements in an arbitrary subinterval A(i..j) of an array A(0..n) in O(log n) time. Each node in the tree corresponds to a
+subarray of A in such a way that the values we store in the nodes can be combined efficiently to determine the desired result for
+arbitrary subarrays.
 An excellent description of the idea is found at https://cp-algorithms.com/data_structures/segment_tree.html.
-Generic code is provided in `SegmentTreeTemplate`. Concrete classes are written by providing a handful of simple lambdas and
-constants to the template class's initializer. Figuring out the details requires some knowledge of the internal mechanisms of a
-segment tree, for which the link at cp-algorithms.com is very helpful. See the definitions of the concrete classes,
-`MaxValSegmentTree` and `IndexOfMaxValSegmentTree`, for examples.
+Generic code is provided in `SegmentTree::SegmentTreeTemplate` and its equivalent (and faster) C-based sibling,
+`SegmentTree::CSegmentTreeTemplate` (see [below](#c-extensions)).
+Writing a concrete segment tree class just means providing some simple lambdas and constants to the template class's
+initializer. Figuring out the details requires some knowledge of the internal mechanisms of a segment tree, for which the link at
+cp-algorithms.com is very helpful. See the implementations of the concrete classes `MaxValSegmentTree` and
+`IndexOfMaxValSegmentTree` for examples.
+Since there are several concrete "types" and two underlying generic implementations there is a convenience method on the `SegmentTree`
+module to get instances.
+``` ruby
+require 'data_structures_rmolinari'
+SegmentTree = DataStructuresRMolinari::SegmentTree # namespace module
+data = [1, -3, 2, 1, 5, -9]
+# Get a segment tree instance that will answer "max over this subinterval" questions about data.
+# Here we get one using the ruby implementation of the generic functionality.
+#
+# We offer :index_of_max as an alternative to :max. This will construct an instance that answers
+# questions of the form "an index of the maximum value over this subinterval".
+#
+# To use the version written in C, put :c instead of :ruby.
+seg_tree = SegmentTree.construct(data, :max, :ruby)
+seg_tree.max_on(0, 2) # => 2
+seg_tree.max_on(1, 4) # => 5
+# ..etc..
+```
 ## Algorithms
@@ -131,11 +191,37 @@ The Algorithms submodule contains some algorithms using the data structures.
     [left, right, bottom, top].
   - The algorithm is due to [[DMNS2013]](#references).
+# C Extensions
+As another learning process I have implemented several of these data structures as C extensions. The APIs are the same.
+## Disjoint Union
+The C version is called `CDisjointUnion`.  A benchmark suggests that a long sequence of `unite` operations is about 3 times as fast
+with `CDisjointUnion` as with `DisjointUnion`.
+The implementation uses the remarkable Convenient Containers library from Jackson Allan [[Allan]](#references).
+## Segment Tree
+`CSegmentTreeTemplate` is the C implementation of the generic class. Concrete classes are built on top of this in Ruby, just as with
+the pure Ruby `SegmentTreeTemplate` class.
+A benchmark suggests that a long sequence of `max_on` operations against a max-val Segment Tree is about 4 times as fast with C as
+with Ruby. I'm a bit surprised the improvement isn't larger, but remember that the C code must still interact with the Ruby objects in
+the underlying data array, and must combine them, etc., via Ruby lambdas.
 # References
-- [TvL1984] Tarjan, Robert E., van Leeuwen, J., _Worst-case Analysis of Set Union Algorithms_, Journal of the ACM, v31:2 (1984), pp 245–281.
-- [EEK2017] Edelkamp, S., Elmasry, A., Katajainen, J., _Optimizing Binary Heaps_, Theory Comput Syst (2017), vol 61, pp 606-636, DOI 10.1007/s00224-017-9760-2.
-- [McC1985] McCreight, E. M., _Priority Search Trees_, SIAM J. Comput., 14(2):257-276, 1985.
-- [DMNS2011] De, M., Maheshwari, A., Nandy, S. C., Smid, M., _An In-Place Priority Search Tree_, 23rd Canadian Conference on Computational Geometry, 2011.
-- [DMNS2013] De, M., Maheshwari, A., Nandy, S. C., Smid, M., _An In-Place Min-max Priority Search Tree_, Computational Geometry, v46 (2013), pp 310-327.
+- [Allan] Allan, J., _CC: Convenient Containers_, https://github.com/JacksonAllan/CC, (retrieved 2023-02-01).
+- [TvL1984] Tarjan, Robert E., van Leeuwen, J., _Worst-case Analysis of Set Union Algorithms_, Journal of the ACM, v31:2 (1984), pp
+  245–281, https://dl.acm.org/doi/10.1145/62.2160 (retrieved 2022-02-01).
+- [EEK2017] Edelkamp, S., Elmasry, A., Katajainen, J., _Optimizing Binary Heaps_, Theory Comput Syst (2017), vol 61, pp 606-636, DOI
+  10.1007/s00224-017-9760-2, https://kclpure.kcl.ac.uk/portal/files/87388857/TheoryComputingSzstems.pdf (retrieved 2022-02-02).
+- [McC1985] McCreight, E. M., _Priority Search Trees_, SIAM J. Comput., 14(2):257-276, 1985,
+  http://www.cs.duke.edu/courses/fall08/cps234/handouts/SMJ000257.pdf (retrieved 2023-02-02).
+- [DMNS2011] De, M., Maheshwari, A., Nandy, S. C., Smid, M., _An In-Place Priority Search Tree_, 23rd Canadian Conference on
+  Computational Geometry, 2011, http://www.cs.carleton.ca/~michiel/inplace_pst.pdf (retrieved 2023-02-02).
+- [DMNS2013] De, M., Maheshwari, A., Nandy, S. C., Smid, M., _An In-Place Min-max Priority Search Tree_, Computational Geometry, v46
+  (2013), pp 310-327, https://people.scs.carleton.ca/~michiel/MinMaxPST.pdf (retrieved 2023-02-02).
 [^minmaxpst]: See the comments in the fragmentary class `MinMaxPrioritySearchTree` for further details.
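
Reviewer note on the Priority Search Tree queries in the README diff above. The query semantics can be pinned down with a naive O(n) reference, which is handy for cross-checking a PST in tests. This is only a sketch: the names `Pt`, `largest_y_in_ne_naive` and `largest_y_in_3_sided_naive` are hypothetical, and the assumption that the query regions are closed (boundary points count) is inferred from the README's example outputs, not from the gem's code.

```ruby
# Hypothetical stand-in for the gem's Point struct: anything with #x and #y.
Pt = Struct.new(:x, :y)

# Naive reference: the highest point in the quadrant x >= x0, y >= y0
# (assumed closed), or nil if the quadrant contains no points.
def largest_y_in_ne_naive(points, x0, y0)
  points.select { |p| p.x >= x0 && p.y >= y0 }.max_by(&:y)
end

# Naive reference: the highest point in the northward-infinite box
# x0 <= x <= x1, y >= y0 (assumed closed), or nil if it is empty.
def largest_y_in_3_sided_naive(points, x0, x1, y0)
  points.select { |p| p.x >= x0 && p.x <= x1 && p.y >= y0 }.max_by(&:y)
end

data = [Pt.new(0, 0), Pt.new(1, 2), Pt.new(2, 1)]
p largest_y_in_ne_naive(data, 1.5, 1)
p largest_y_in_3_sided_naive(data, -0.5, 0.5, 0)
```

On the README's sample data these naive versions agree with the outputs shown there; a real PST answers the same questions in O(log n) rather than O(n).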

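Reviewer note on the Segment Tree section of the README diff above. The idea that each node stores a value for one subarray, and that these values combine to answer arbitrary subinterval queries, can be sketched in a few lines of pure Ruby. This is a minimal max-only illustration under my own assumptions: the class name `SketchMaxSegmentTree` is hypothetical, and the gem's template classes are configured with lambdas rather than hard-coding `max` as done here.

```ruby
# A minimal max-only segment tree. Node k covers a subarray data[lo..hi];
# its children 2k and 2k+1 cover the left and right halves.
class SketchMaxSegmentTree
  def initialize(data)
    @size = data.size
    @tree = Array.new(4 * [@size, 1].max)
    build(data, 1, 0, @size - 1) unless data.empty?
  end

  # The maximum of data[i..j], endpoints inclusive.
  def max_on(i, j)
    raise ArgumentError, 'bad interval' unless 0 <= i && i <= j && j < @size

    query(1, 0, @size - 1, i, j)
  end

  private

  def build(data, node, lo, hi)
    if lo == hi
      @tree[node] = data[lo]
    else
      mid = (lo + hi) / 2
      build(data, 2 * node, lo, mid)
      build(data, 2 * node + 1, mid + 1, hi)
      # Combine the children's values; for a max tree this is just max.
      @tree[node] = [@tree[2 * node], @tree[2 * node + 1]].max
    end
  end

  def query(node, lo, hi, i, j)
    return @tree[node] if i <= lo && hi <= j # node lies wholly inside [i, j]

    mid = (lo + hi) / 2
    return query(2 * node, lo, mid, i, j) if j <= mid
    return query(2 * node + 1, mid + 1, hi, i, j) if i > mid

    [query(2 * node, lo, mid, i, j), query(2 * node + 1, mid + 1, hi, i, j)].max
  end
end

tree = SketchMaxSegmentTree.new([1, -3, 2, 1, 5, -9])
puts tree.max_on(0, 2) # 2
puts tree.max_on(1, 4) # 5
```

Only O(log n) nodes are visited per query because at each level at most two nodes are partially covered by the query interval.
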
data/Rakefile CHANGED

@@ -2,10 +2,12 @@ require 'rubygems'
 require 'rake/testtask'
 require 'rake/extensiontask'
-Rake::ExtensionTask.new('data_structures_rmolinari/c_disjoint_union') do |ext|
-  ext.name = 'c_disjoint_union'
-  ext.ext_dir = 'ext/c_disjoint_union'
-  ext.lib_dir = 'lib/data_structures_rmolinari/'
+['c_disjoint_union', 'c_segment_tree_template'].each do |extension_name|
+  Rake::ExtensionTask.new("data_structures_rmolinari/#{extension_name}") do |ext|
+    ext.name = extension_name
+    ext.ext_dir = "ext/#{extension_name}"
+    ext.lib_dir = 'lib/data_structures_rmolinari/'
+  end
 end
 Rake::TestTask.new do |t|

data/ext/c_disjoint_union/disjoint_union.c CHANGED

@@ -16,118 +16,69 @@
  */
 #include "ruby.h"
-// The Shared::DataError exception type in the Ruby code. We only need it when we detect a runtime error, so a macro should be fine.
-#define mShared rb_define_module("Shared")
-#define eSharedDataError rb_const_get(mShared, rb_intern_const("DataError"))
+#include "cc.h" // Convenient Containers
+#include "shared.h"
 /**
- * It's been so long since I've written non-trival C that I need to copy examples from online.
- *
- * Dynamic array of longs, with an initial value for otherwise uninitialized elements.
- * Based on  https://stackoverflow.com/questions/3536153/c-dynamically-growing-array
- */
-typedef struct {
-  long *array;
-  size_t size;
-  long default_val;
-} DynamicArray;
-/*
- * Initialize a DynamicArray struct with the given initial size and with all values set to the default value.
- *
- * The default value is stored and used to initialize new array sections if and when the array needs to be expanded.
- */
-void initDynamicArray(DynamicArray *a, size_t initial_size, long default_val) {
-  a->array = malloc(initial_size * sizeof(long));
-  a->size = initial_size;
-  a->default_val = default_val;
-  for (size_t i = 0; i < initial_size; i++) {
-    a->array[i] = default_val;
-  }
-}
-/*
- * Assign +value+ to the the +index+-th element of the array, expanding the available space if necessary.
+ * Data type for the (parent, rank) pair, and some accessor helpers for the vec() container we are going to be using.
  */
-void assignInDynamicArray(DynamicArray *a, unsigned long index, long value) {
-  if (a->size <= index) {
-    size_t new_size = a->size;
-    while (new_size <= index) {
-      new_size = 8 * new_size / 5 + 8; // 8/5 gives "Fibonnacci-like" growth; adding 8 to avoid small arrays having to reallocate
-                                       // too often as they grow. Who knows if it's worth being "clever".
-    }
-    long *new_array = realloc(a->array, new_size * sizeof(long));
-    if (!new_array) {
-      rb_raise(rb_eRuntimeError, "Cannot allocate memory to expand DynamicArray!");
-    }
+typedef struct data_pair {
+  long parent;
+  unsigned long rank;
+} data_pair;
-    a->array = new_array;
-    for (size_t i = a->size; i < new_size; i++) {
-      a->array[i] = a->default_val;
-    }
-    a->size = new_size;
-  }
+#define DEFAULT_PARENT -1
+#define DEFAULT_RANK 0
+static data_pair default_pair = { .parent = DEFAULT_PARENT, .rank = DEFAULT_RANK };
-  a->array[index] = value;
+static data_pair make_data_pair(long parent, unsigned long rank) {
+  data_pair pair = { .parent = parent, .rank = rank };
+  return pair;
 }
-void freeDynamicArray(DynamicArray *a) {
-  free(a->array);
-  a->array = NULL;
-  a->size = 0;
-}
+/* The vector generic from Convenient Containers */
+typedef vec(data_pair) pair_vector;
-size_t _size_of(DynamicArray *a) {
-  return a->size * sizeof(a->default_val);
-}
+#define parent(disjoint_union_ptr, idx) (get((disjoint_union_ptr)->pairs, idx)->parent)
+#define rank(disjoint_union_ptr, idx) (get((disjoint_union_ptr)->pairs, idx)->rank)
 /**
  * The C implementation of a Disjoint Union
  *
- * See Tarjan, Robert E., van Leeuwen, J., _Worst-case Analysis of Set Union Algorithms_, Journal of the ACM, v31:2 (1984), pp 245–281.
+ * See the paper for optimizations we use to get almost constant time for find() and unite().
+ *
+ * Tarjan, Robert E., van Leeuwen, J., _Worst-case Analysis of Set Union Algorithms_, Journal of the ACM, v31:2 (1984), pp 245–281.
  */
 /*
  * The Disjoint Union struct.
- * - forest: an array of longs giving, for each element, the element's parent.
- *   - An element e is the root of its tree just when forest[e] == e.
- *   - Two elements are in the same subset just when they are in the same tree in the forest.
+ * - pairs: a vector (dynamic array) of pairs, the i-th of which contains
+ *   - the "parent" of element i in its membership tree
+ *     - An element e is the root of its tree just when it is its own parent
+ *     - Two elements are in the same subset just when they are in the same tree in the forest.
  *     - So the key idea is that we can check this by navigating via parents from each element to their roots. Clever optimizations
  *       keep the trees flat and so most nodes are close to their roots.
- * - rank: a array of longs giving the "rank" of each element.
- *   - This value is used to guide the "linking" of trees when subsets are being merged to keep the trees flat. See Tarjan & van
- *     Leeuwen
+ *   - the "rank" of element i
+ *     - this value is used to guide the "linking" of trees when subsets are being merged to keep the trees flat.
  * - subset_count: the number of (disjoint) subsets.
  *   - it isn't needed internally but may be useful to client code.
  */
 typedef struct du_data {
-  DynamicArray *forest; // the forest that describes the unified subsets
-  DynamicArray *rank;   // the "ranks" of the elements, used when uniting subsets
+  pair_vector *pairs; // The generic vector container from the amazing Convenient Containers library
   size_t subset_count;
 } disjoint_union_data;
 /*
  * Create one (on the heap).
- *
- * The dynamic arrays are initialized with a size of 100 because I didn't have a better idea. This will end up getting called from
- * the Ruby #allocate method, which happens before #initialize. Thus we don't know the calling code's desired initial size.
  */
-#define INITIAL_SIZE 100
 static disjoint_union_data *create_disjoint_union() {
   disjoint_union_data *disjoint_union = (disjoint_union_data *)malloc(sizeof(disjoint_union_data));
   // Allocate the structures
-  DynamicArray *forest = (DynamicArray *)malloc(sizeof(DynamicArray));
-  DynamicArray *rank = (DynamicArray *)malloc(sizeof(DynamicArray));
-  initDynamicArray(forest, INITIAL_SIZE, -1);
-  initDynamicArray(rank,   INITIAL_SIZE, 0);
+  disjoint_union->pairs = malloc(sizeof(pair_vector));
+  init(disjoint_union->pairs);
-  disjoint_union->forest = forest;
-  disjoint_union->rank = rank;
   disjoint_union->subset_count = 0;
   return disjoint_union;
@@ -141,15 +92,7 @@ static disjoint_union_data *create_disjoint_union() {
 static void disjoint_union_free(void *ptr) {
   if (ptr) {
     disjoint_union_data *disjoint_union = ptr;
-    freeDynamicArray(disjoint_union->forest);
-    freeDynamicArray(disjoint_union->rank);
-    free(disjoint_union->forest);
-    disjoint_union->forest = NULL;
-    free(disjoint_union->rank);
-    disjoint_union->rank = NULL;
+    cleanup(disjoint_union->pairs);
     xfree(disjoint_union);
   }
 }
@@ -162,8 +105,7 @@ static void disjoint_union_free(void *ptr) {
  * Is the given element already a member of the universe?
  */
 static int present_p(disjoint_union_data *disjoint_union, size_t element) {
-  DynamicArray *forest = (DynamicArray *)disjoint_union->forest;
-  return (forest->size > element && (forest->array[element] != forest->default_val));
+  return (size(disjoint_union->pairs) > element && (parent(disjoint_union, element) != DEFAULT_PARENT));
 }
 /*
@@ -172,6 +114,13 @@ static int present_p(disjoint_union_data *disjoint_union, size_t element) {
 static void assert_membership(disjoint_union_data *disjoint_union, size_t element) {
   if (!present_p(disjoint_union, element)) {
     rb_raise(eSharedDataError, "Value %zu is not part of the universe", element);
+    /* rb_raise( */
+    /*          eSharedDataError, */
+    /*          "Value %zu is not part of the universe, size = %zu, forest_val = %lu", */
+    /*          element, */
+    /*          size(disjoint_union->pairs), */
+    /*          get(disjoint_union->pairs, element)->parent */
+    /*          ); */
   }
 }
@@ -185,47 +134,52 @@ static void add_new_element(disjoint_union_data *disjoint_union, size_t element)
     rb_raise(eSharedDataError, "Element %zu already present in the universe", element);
   }
-  assignInDynamicArray(disjoint_union->forest, element, element);
-  assignInDynamicArray(disjoint_union->rank, element, 0);
+  // Expand the underlying vector if necessary
+  size_t sz = size(disjoint_union->pairs);
+  if (sz <= element) {
+    resize(disjoint_union->pairs, element + 1);
+    for (size_t i = sz; i < element; i++) {
+      lval(disjoint_union->pairs, i) = default_pair;
+    }
+  }
+  lval(disjoint_union->pairs, element) = make_data_pair(element, 0l);
   disjoint_union->subset_count++;
 }
 /*
- * Find the canonical representative of the given element. This is the root of the tree (in forest) containing element.
+ * Find the canonical representative of the given element. This is the root of the tree containing it.
  *
  * Two elements are in the same subset exactly when their canonical representatives are equal.
  */
 static size_t find(disjoint_union_data *disjoint_union, size_t element) {
   assert_membership(disjoint_union, element);
-  // We implement find with "halving" to shrink the length of paths to the root. See Tarjan and van Leeuwin p 252.
-  long *d = disjoint_union->forest->array; // the actual forest data
+  // We use "halving" to shrink the length of paths to the root. See Tarjan and van Leeuwen p 252.
   size_t x = element;
-  while (d[d[x]] != d[x]) {
-    x = d[x] = d[d[x]];
+  long p, gp; // parent and grandparent
+  while (p = parent(disjoint_union, x), gp = parent(disjoint_union, p), p != gp) {
+    parent(disjoint_union, x) = gp;
+    x = gp;
   }
-  return d[x];
+  return parent(disjoint_union, x);
 }
 /*
- * "Link"" the two given elements so that they are in the same subset now.
+ * "Link" the two given elements so that they are in the same subset now.
  *
  * In other words, merge the subtrees containing the two elements.
  *
- * Good performace (see Tarjan and van Leeuwin) assumes that elt1 and elt2 area are disinct and already the roots of their trees,
- * though we don't check that here.
+ * elt1 and elt2 must be distinct and the roots of their trees, though we don't check that here.
  */
 static void link_roots(disjoint_union_data *disjoint_union, size_t elt1, size_t elt2) {
-  long *rank = disjoint_union->rank->array;
-  long *forest = disjoint_union->forest->array;
-  if (rank[elt1] > rank[elt2]) {
-    forest[elt2] = elt1;
-  } else if (rank[elt1] == rank[elt2]) {
-    forest[elt2] = elt1;
-    rank[elt1]++;
+  if (rank(disjoint_union, elt1) > rank(disjoint_union, elt2)) {
+    parent(disjoint_union, elt2) =  elt1;
+  } else if (rank(disjoint_union, elt1) == rank(disjoint_union, elt2)) {
+    parent(disjoint_union, elt2) = elt1;
+    rank(disjoint_union, elt1)++;
   } else {
-    forest[elt1] = elt2;
+    parent(disjoint_union, elt1) = elt2;
   }
   disjoint_union->subset_count--;
@@ -263,7 +217,9 @@ static void unite(disjoint_union_data *disjoint_union, size_t elt1, size_t elt2)
 static size_t disjoint_union_memsize(const void *ptr) {
   if (ptr) {
     const disjoint_union_data *du = ptr;
-    return sizeof(disjoint_union_data) + _size_of(du->forest) + _size_of(du->rank);
+    // See https://github.com/JacksonAllan/CC/issues/3
+    return sizeof( cc_vec_hdr_ty ) + cap( du->pairs ) * CC_EL_SIZE( *(du->pairs) );
   } else {
     return 0;
   }
@@ -286,21 +242,7 @@ static const rb_data_type_t disjoint_union_type = {
 };
 /*
- * Helper: check that a Ruby value is a non-negative Fixnum and convert it to a C unsigned long
- */
-static unsigned long checked_nonneg_fixnum(VALUE val) {
-  Check_Type(val, T_FIXNUM);
-  long c_val = FIX2LONG(val);
-  if (c_val < 0) {
-    rb_raise(eSharedDataError, "Value must be non-negative");
-  }
-  return c_val;
-}
-/*
- * Unwrap a Rubyfied disjoint union to get the C struct inside.
+ * Unwrap a Ruby-side disjoint union object to get the C struct inside.
  */
 static disjoint_union_data *unwrapped(VALUE self) {
   disjoint_union_data *disjoint_union;
@@ -333,9 +275,13 @@ static VALUE disjoint_union_init(int argc, VALUE *argv, VALUE self) {
     size_t initial_size = checked_nonneg_fixnum(argv[0]);
     disjoint_union_data *disjoint_union = unwrapped(self);
+    pair_vector *pair_vec = disjoint_union->pairs;
+    resize(pair_vec, initial_size);
     for (size_t i = 0; i < initial_size; i++) {
-      add_new_element(disjoint_union, i);
+      lval(pair_vec, i) = make_data_pair(i, 0);
     }
+    disjoint_union->subset_count = initial_size;
   }
   return self;
 }
@@ -343,7 +289,7 @@ static VALUE disjoint_union_init(int argc, VALUE *argv, VALUE self) {
 /**
  * And now the simple wrappers around the Disjoint Union C functionality. In each case we
  *   - unwrap a 'VALUE self',
- *     - i.e., theCDisjointUnion instance that contains a disjoint_union_data struct;
+ *     - i.e., the CDisjointUnion instance on the Ruby side;
  *   - munge any other arguments into longs;
  *   - call the appropriate C function to act on the struct; and
  *   - return an appropriate VALUE that the Ruby runtime can use.
@@ -354,7 +300,7 @@ static VALUE disjoint_union_init(int argc, VALUE *argv, VALUE self) {
 /*
  * Add a new subset to the universe containing the element +new_v+.
  *
- * @param the new element, starting in its own singleton subset
+ * @param arg the new element, starting in its own singleton subset
  *   - it must be a non-negative integer, not already part of the universe of elements.
  */
 static VALUE disjoint_union_make_set(VALUE self, VALUE arg) {
@@ -412,7 +358,7 @@ static VALUE disjoint_union_unite(VALUE self, VALUE arg1, VALUE arg2) {
  * - Tarjan, Robert E., van Leeuwen, Jan (1984). _Worst-case analysis of set union algorithms_. Journal of the ACM. 31 (2): 245–281.
  */
 void Init_c_disjoint_union() {
-  VALUE mDataStructuresRMolinari = rb_define_module("DataStructuresRMolinari");
+  //VALUE mDataStructuresRMolinari = rb_define_module("DataStructuresRMolinari");
   VALUE cDisjointUnion = rb_define_class_under(mDataStructuresRMolinari, "CDisjointUnion", rb_cObject);
   rb_define_alloc_func(cDisjointUnion, disjoint_union_alloc);
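
Reviewer note on `find` and `link_roots` in the C diff above. The two optimizations they rely on, path halving and linking by rank, may be easier to follow in a pure-Ruby sketch. This is an illustration under my own assumptions, not the gem's Ruby implementation: the class name `SketchDisjointUnion` is hypothetical, and silently returning when the two elements are already united is a choice made for the sketch.

```ruby
# A disjoint union over 0...size, stored as a forest of parent pointers.
class SketchDisjointUnion
  attr_reader :subset_count

  def initialize(size)
    @parent = Array.new(size) { |i| i } # every element starts as its own root
    @rank = Array.new(size, 0)
    @subset_count = size
  end

  # The root of the tree containing e. Path halving: re-point each visited
  # node at its grandparent, then step to that grandparent.
  def find(e)
    x = e
    while @parent[x] != @parent[@parent[x]]
      @parent[x] = @parent[@parent[x]]
      x = @parent[x]
    end
    @parent[x]
  end

  # Merge the subsets containing e and f, if they differ.
  def unite(e, f)
    root_e = find(e)
    root_f = find(f)
    return if root_e == root_f

    link_roots(root_e, root_f)
  end

  private

  # Linking by rank: hang the root of lower rank under the other one,
  # bumping the rank only when the two ranks are equal.
  def link_roots(r1, r2)
    if @rank[r1] > @rank[r2]
      @parent[r2] = r1
    elsif @rank[r1] == @rank[r2]
      @parent[r2] = r1
      @rank[r1] += 1
    else
      @parent[r1] = r2
    end
    @subset_count -= 1
  end
end

du = SketchDisjointUnion.new(10)
du.unite(2, 3)
du.unite(4, 5)
du.unite(3, 4)
puts du.subset_count # 7
```

Note that halving writes to the parent pointer of the node being visited, not to that of its parent; assigning the grandparent to the parent's own pointer would be a no-op.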

data/ext/c_disjoint_union/extconf.rb CHANGED

@@ -3,10 +3,15 @@ require 'mkmf'
 abort 'missing malloc()' unless have_func "malloc"
 abort 'missing realloc()' unless have_func "realloc"
-if try_cflags('-O')
-  append_cflags('-O')
+if try_cflags('-O3')
+  append_cflags('-O3')
 end
 extension_name = "c_disjoint_union"
 dir_config(extension_name)
+$srcs = ["disjoint_union.c", "../shared.c"]
+$INCFLAGS << " -I$(srcdir)/.."
+$VPATH << "$(srcdir)/.."
 create_makefile("data_structures_rmolinari/c_disjoint_union")

data/ext/c_segment_tree_template/extconf.rb ADDED

@@ -0,0 +1,17 @@
+require 'mkmf'
+abort 'missing malloc()' unless have_func "malloc"
+abort 'missing realloc()' unless have_func "realloc"
+if try_cflags('-O3')
+  append_cflags('-O3')
+end
+extension_name = "c_segment_tree_template"
+dir_config(extension_name)
+$srcs = ["segment_tree_template.c", "../shared.c"]
+$INCFLAGS << " -I$(srcdir)/.."
+$VPATH << "$(srcdir)/.."
+create_makefile("data_structures_rmolinari/c_segment_tree_template")