RubyGems - data_structures_rmolinari - Versions diffs - 0.4.1 → 0.4.3 - Mend

data_structures_rmolinari 0.4.1 → 0.4.3

Files changed (13)

checksums.yaml +4 -4
data/CHANGELOG.md +25 -3
data/README.md +141 -0
data/Rakefile +16 -0
data/ext/c_disjoint_union/disjoint_union.c +424 -0
data/ext/c_disjoint_union/extconf.rb +12 -0
data/lib/data_structures_rmolinari/algorithms.rb +103 -0
data/lib/data_structures_rmolinari/max_priority_search_tree.rb +200 -58
data/lib/data_structures_rmolinari/min_priority_search_tree.rb +187 -0
data/lib/data_structures_rmolinari/{generic_segment_tree.rb → segment_tree_template.rb} +0 -0
data/lib/data_structures_rmolinari/shared.rb +5 -16
data/lib/data_structures_rmolinari.rb +6 -3
metadata +12 -5

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: eb25e49219167201208f45a402b202180466202bc071940da418b2f84d281f6d
-  data.tar.gz: f43c1614c2a433d7a4e1148eb90121fc4b1d61c807afeb78c456d77b66935adb
+  metadata.gz: c9022e9531472d1125c6172025c2d10c5d4ef4f9c43e326a43f1c5b4f0721263
+  data.tar.gz: '0212619be7fe32e68b63d2087730f81ffd6b4179b8b8bf63aa0026e4e3056224'
 SHA512:
-  metadata.gz: d2e77397f790e8fe8d650d7727be55b464aa8f19d928e215b824820b712df1f5762d74ed304196e1c773960f02d73cd29bc07486f0cc28eb5ddcd5dbd422d691
-  data.tar.gz: 56328269625f88b5119b64696f792a2e567d6327605dc9a6cc391edd33e5e49f293adb304ef3b819094b860299f904b79862eaac2a8eaeff0d249350bce280db
+  metadata.gz: a7f9258eeed2dc7e7fa5713aaecfcdf44e061bb161aa3d0d2662fb662bfb6b2685c61be221b4a109792982c3b2aa6215da75b51ae299d4a9237b6226000612e4
+  data.tar.gz: e585a245f753ef731895163eedba802e3fe2f6000720d10705b5a0cd02a12642a35220eea57c8b71f504b66db7cb06161fdfca1660edc0dc132ee026dd83be4d

data/CHANGELOG.md CHANGED

@@ -1,6 +1,28 @@
 # Changelog
-## [Unreleased]
+## [0.4.3] 2023-01-27
+- Fix bad directive in Rakefile for DisjointUnion C extension
+## [0.4.2] 2023-01-26
+### Added
+- MinPrioritySearchTree added
+  - it's a thin layer on top of a MaxPrioritySearchTree with negated y values.
+- MaxPrioritySearchTree
+  - A "dynamic" constructor option now allows deletion of the "top" (root) node. This is useful in certain algorithms.
+- DisjointUnion
+  - Added a proof-of-concept implementation in C, which is about twice as fast.
+- Algorithms
+  - Implement the Maximal Empty Rectangle algorithm of De et al. It uses a dynamic MaxPST.
+## [0.4.1] 2023-01-12
+- Update this file for the gem (though I forgot to add this comment first!)
 ## [0.4.0] 2023-01-12
@@ -10,10 +32,10 @@
   - Duplicate y values are now allowed. Ties are broken with a preference for smaller values of x.
   - Method names have changed
     - Instead of "highest", "leftmost", "rightmost" we use "largest_y", "smallest_x", "largest_x"
-    - For example, +highest_ne+ is now +largest_y_in_nw+
+    - For example, `highest_ne` is now `largest_y_in_ne`
 - DisjointUnion
   - the size argument to initializer is optional. The default value is 0.
-  - elements can be added to the "universe" of known values with +make_set+
+  - elements can be added to the "universe" of known values with `make_set`
 ### Removed
 - MinmaxPrioritySearchTree is no longer available
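The changelog describes MinPrioritySearchTree as a thin layer on top of a MaxPrioritySearchTree with negated y values. Here is a minimal sketch of why that reduction works. A point satisfies y <= y0 exactly when -y >= -y0. So once every y is negated, the lowest point to the southeast of (x0, y0) is the highest point to the northeast of (x0, -y0).

To keep the sketch self-contained, `BruteMaxPST` answers `largest_y_in_ne` with a linear scan rather than a real priority search tree. The names `BruteMaxPST`, `NegatedMinPST` and `smallest_y_in_se` are hypothetical and are not necessarily the gem's API. Ties go to the smaller x, following the rule stated in the 0.4.0 notes.

```ruby
Point = Struct.new(:x, :y)

# Hypothetical stand-in for a MaxPST: answers one query type by a linear scan (O(n), not O(log n)).
class BruteMaxPST
  def initialize(points)
    @points = points
  end

  # Highest point with x >= x0 and y >= y0; ties go to the smaller x. Returns nil if there is none.
  def largest_y_in_ne(x0, y0)
    @points.select { |pt| pt.x >= x0 && pt.y >= y0 }
           .max_by { |pt| [pt.y, -pt.x] }
  end
end

# Hypothetical MinPST built as a thin layer: store every point with its y negated.
class NegatedMinPST
  def initialize(points)
    @max_pst = BruteMaxPST.new(points.map { |pt| Point.new(pt.x, -pt.y) })
  end

  # Lowest point with x >= x0 and y <= y0; ties go to the smaller x.
  def smallest_y_in_se(x0, y0)
    # y <= y0 holds exactly when -y >= -y0, so ask the max structure about (x0, -y0)
    found = @max_pst.largest_y_in_ne(x0, -y0)
    found && Point.new(found.x, -found.y) # undo the negation before returning
  end
end

points = [Point.new(1, 5), Point.new(2, 3), Point.new(4, 1), Point.new(5, 7)]
min_pst = NegatedMinPST.new(points)
p min_pst.smallest_y_in_se(2, 4) # the lower of (2, 3) and (4, 1)
```

Each query is simply forwarded, so the layer adds only constant overhead on top of whatever the max structure costs.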

data/README.md ADDED

@@ -0,0 +1,141 @@
+# Data Structures
+This is a small collection of Ruby data structures that I have implemented for my own interest.  Implementing the code for a data
+structure is almost always more educational than simply reading about it and is usually fun.  I wrote some of them while
+participating in the Advent of Code (https://adventofcode.com/).
+These implementations are not particularly clever. They are based on the expository descriptions and pseudo-code I found as I read
+about each structure and so are not as fast as possible.
+The code is available as a gem: https://rubygems.org/gems/data_structures_rmolinari.
+## Usage
+The right way to organize the code is not obvious to me. For now the data structures are all defined in the module
+`DataStructuresRMolinari` to avoid polluting the global namespace.
+Example usage after the gem is installed:
+```ruby
+require 'data_structures_rmolinari'
+# Pull what we need out of the namespace
+MaxPrioritySearchTree = DataStructuresRMolinari::MaxPrioritySearchTree
+Point = DataStructuresRMolinari::Point # anything responding to :x and :y is fine
+pst = MaxPrioritySearchTree.new([Point.new(1, 1)])
+puts pst.largest_y_in_ne(0, 0) # "Point(1,1)"
+```
+# Implementations
+## Disjoint Union
+We represent a set S of non-negative integers as the disjoint union of subsets. Equivalently, we represent a partition of S. The
+data structure provides very efficient implementations of the two key operations:
+- `unite(e, f)`, which merges the subsets containing e and f; and
+- `find(e)`, which returns the canonical representative of the subset containing e. Two elements e and f are in the same subset
+  exactly when `find(e) == find(f)`.
+It also provides
+- `make_set(v)`, which adds a new value `v` to the set S, starting out in a singleton subset.
+For more details see https://en.wikipedia.org/wiki/Disjoint-set_data_structure and the paper [[TvL1984]](#references) by Tarjan and
+van Leeuwen.
+There is an experimental implementation as a C extension in `ext/c_disjoint_union/disjoint_union.c`. The Ruby class for this implementation is
+`CDisjointUnion`. Benchmarks indicate that a long sequence of `unite` calls is about twice as fast.
+## Heap
+This is a standard binary heap with an `update` method, suitable for use as a priority queue. There are several supported
+operations:
+- `insert(item, priority)`, insert the given item with the stated priority.
+  - By default, items must be distinct.
+- `top`, returning the element with smallest priority
+- `pop`, return the element with smallest priority and remove it from the structure
+- `update(item, priority)`, update the priority of the given item, which must already be in the heap
+`top` is O(1). The others are O(log n) where n is the number of items in the heap.
+By default we have a min-heap: the top element is the one with smallest priority. A configuration parameter at construction can make
+it a max-heap.
+Another configuration parameter allows the creation of a "non-addressable" heap. This makes it impossible to call `update`, but
+allows the insertion of duplicate items (which is sometimes useful) and slightly faster operation overall.
+See https://en.wikipedia.org/wiki/Binary_heap and the paper by Edelkamp, Elmasry, and Katajainen [[EEK2017]](#references).
+## Priority Search Tree
+A PST stores a set P of two-dimensional points in a way that allows certain queries about P to be answered efficiently. The data
+structure was introduced by McCreight [[McC1985]](#references). De, Maheshwari, Nandy, and Smid [[DMNS2011]](#references) showed
+how to build the structure in-place and we use their approach here.
+- `largest_y_in_ne(x0, y0)` and `largest_y_in_nw(x0, y0)`, the "highest" (max-y) point in the quadrant to the northeast/northwest of
+  (x0, y0);
+- `smallest_x_in_ne(x0, y0)`, the "leftmost" (min-x) point in the quadrant to the northeast of (x0, y0);
+- `largest_x_in_nw(x0, y0)`, the "rightmost" (max-x) point in the quadrant to the northwest of (x0, y0);
+- `largest_y_in_3_sided(x0, x1, y0)`, the highest point in the region specified by x0 <= x <= x1 and y0 <= y; and
+- `enumerate_3_sided(x0, x1, y0)`, enumerate all the points in that region.
+Here compass directions are the natural ones in the x-y plane with the positive x-axis pointing east and the positive y-axis
+pointing north.
+There is no `smallest_x_in_3_sided(x0, x1, y0)`. Just use `smallest_x_in_ne(x0, y0)`.
+The single-point queries run in O(log n) time, where n is the size of P, while `enumerate_3_sided` runs in O(m + log n), where m is
+the number of points actually enumerated.
+The implementation is in `MaxPrioritySearchTree` (MaxPST for short), so called because internally the structure is, among other
+things, a max-heap on the y-coordinates.
+These queries appear rather abstract at first but there are interesting applications. See, for example, section 4 of
+[[McC1985]](#references), keeping in mind that the data structure in that paper is actually a _MinPST_.
+We also provide a `MinPrioritySearchTree`, which answers analogous queries in the southward-infinite quadrants and 3-sided
+regions.
+By default these data structures are immutable: once constructed they cannot be changed. But there is a constructor option that
+makes the instance "dynamic". This allows us to delete the element at the root of the tree - the one with largest y value (smallest
+for MinPST) - with the `delete_top!` method. This operation is important in certain algorithms, such as enumerating all maximal
+empty rectangles (see the second paper by De et al. [[DMNS2013]](#references)). Note that points still cannot be added to the PST in
+any case, and choosing the dynamic option makes certain internal bookkeeping operations slower.
+In [[DMNS2013]](#references) De et al. generalize the in-place structure to a _Min-max Priority Search Tree_ (MinmaxPST) that can
+answer queries in all four quadrants and both "kinds" of 3-sided boxes. Having one of these would save the trouble of constructing
+both a MaxPST and MinPST. But the presentation is hard to follow in places and the paper's pseudocode is buggy.[^minmaxpst]
+## Segment Tree
+Segment trees store information related to subintervals of a certain array. For example, they can be used to find the sum of the
+elements in an arbitrary subinterval A[i..j] of an array A[0..n] in O(log n) time. Each node in the tree corresponds to a subarray
+of A in such a way that the values we store in the nodes can be combined efficiently to determine the desired result for arbitrary
+subarrays.
+An excellent description of the idea is found at https://cp-algorithms.com/data_structures/segment_tree.html.
+Generic code is provided in `SegmentTreeTemplate`. Concrete classes are written by providing a handful of simple lambdas and
+constants to the template class's initializer. Figuring out the details requires some knowledge of the internal mechanisms of a
+segment tree, for which the link at cp-algorithms.com is very helpful. See the definitions of the concrete classes,
+`MaxValSegmentTree` and `IndexOfMaxValSegmentTree`, for examples.
+## Algorithms
+The Algorithms submodule contains some algorithms using the data structures.
+- `maximal_empty_rectangles(points)`
+  - We are given a set P contained in a minimal box B = [x_min, x_max] x [y_min, y_max]. An _empty rectangle_ is an axis-parallel
+    rectangle with positive area contained in B containing no element of P in its interior. A _maximal empty rectangle_ is an empty
+    rectangle not properly contained in any other empty rectangle. This method yields each maximal empty rectangle in the form
+    [left, right, bottom, top].
+  - The algorithm is due to [[DMNS2013]](#references).
+# References
+- [TvL1984] Tarjan, Robert E., van Leeuwen, J., _Worst-case Analysis of Set Union Algorithms_, Journal of the ACM, v31:2 (1984), pp 245–281.
+- [EEK2017] Edelkamp, S., Elmasry, A., Katajainen, J., _Optimizing Binary Heaps_, Theory Comput Syst (2017), vol 61, pp 606-636, DOI 10.1007/s00224-017-9760-2.
+- [McC1985] McCreight, E. M., _Priority Search Trees_, SIAM J. Comput., 14(2):257-276, 1985.
+- [DMNS2011] De, M., Maheshwari, A., Nandy, S. C., Smid, M., _An In-Place Priority Search Tree_, 23rd Canadian Conference on Computational Geometry, 2011.
+- [DMNS2013] De, M., Maheshwari, A., Nandy, S. C., Smid, M., _An In-Place Min-max Priority Search Tree_, Computational Geometry, v46 (2013), pp 310-327.
+[^minmaxpst]: See the comments in the fragmentary class `MinMaxPrioritySearchTree` for further details.
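The README's Segment Tree section describes answering sum queries over A[i..j] in O(log n) time. Below is a minimal sketch of that idea for sums only. It assumes the recursive layout used in the cp-algorithms.com article the README links, where node k's children are 2k and 2k + 1.

The class name `SumSegmentTree` is hypothetical. The gem's `SegmentTreeTemplate` instead takes lambdas and constants in its initializer, and its exact signature is not reproduced here.

```ruby
# Hypothetical range-sum segment tree (not the gem's SegmentTreeTemplate API).
# Node k covers data[left..right]; its children are nodes 2k and 2k + 1.
class SumSegmentTree
  def initialize(data)
    raise ArgumentError, "data must be non-empty" if data.empty?

    @n = data.size
    @tree = Array.new(4 * @n, 0) # 4n slots are always enough for this layout
    build(data, 1, 0, @n - 1)
  end

  # Sum of data[i..j], inclusive; assumes 0 <= i <= j < n.
  def sum(i, j)
    query(1, 0, @n - 1, i, j)
  end

  private

  def build(data, node, left, right)
    if left == right
      @tree[node] = data[left]
      return
    end
    mid = (left + right) / 2
    build(data, 2 * node, left, mid)
    build(data, 2 * node + 1, mid + 1, right)
    @tree[node] = @tree[2 * node] + @tree[2 * node + 1] # combine the two halves
  end

  def query(node, left, right, i, j)
    return 0 if j < left || right < i             # node's range is disjoint from [i, j]
    return @tree[node] if i <= left && right <= j # node's range lies inside [i, j]

    mid = (left + right) / 2
    query(2 * node, left, mid, i, j) + query(2 * node + 1, mid + 1, right, i, j)
  end
end

tree = SumSegmentTree.new([5, 2, 7, 1, 3, 8])
puts tree.sum(1, 4) # 2 + 7 + 1 + 3
```

A query touches only a constant number of nodes per tree level, which gives the O(log n) bound. Replacing `+` and its identity `0` with another associative operation, such as taking a maximum, is the kind of variation the template approach is meant to capture.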

data/Rakefile ADDED

@@ -0,0 +1,16 @@
+require 'rubygems'
+require 'rake/testtask'
+require 'rake/extensiontask'
+Rake::ExtensionTask.new('data_structures_rmolinari/c_disjoint_union') do |ext|
+  ext.name = 'c_disjoint_union'
+  ext.ext_dir = 'ext/c_disjoint_union'
+  ext.lib_dir = 'lib/data_structures_rmolinari/'
+end
+Rake::TestTask.new do |t|
+  t.libs << 'test'
+end
+desc 'Run Tests'
+task default: :test

data/ext/c_disjoint_union/disjoint_union.c ADDED

@@ -0,0 +1,424 @@
+/*
+ * This is a C implementation of a simple Ruby Disjoint Union data structure.
+ *
+ * A Disjoint Union doesn't have much of an implementation in Ruby: see disjoint_union.rb in this gem. This means that we don't gain
+ * much by implementing it in C but that it serves as a good learning experience for me.
+ *
+ * It turns out that writing a C extension for Ruby like this isn't very complicated, but there are a bunch of moving parts and the
+ * available documentation is a bit of a slog. Writing this was very educational.
+ *
+ * See https://docs.ruby-lang.org/en/master/extension_rdoc.html for some documentation. It's a bit hard to read in places, but
+ * plugging away at things helps.
+ *
+ * https://guides.rubygems.org/gems-with-extensions/ is a decent tutorial, though it leaves out lots of details.
+ *
+ * See https://aaronbedra.com/extending-ruby/ for another tutorial.
+ */
+#include "ruby.h"
+// The Shared::DataError exception type in the Ruby code. We only need it when we detect a runtime error, so a macro should be fine.
+#define mShared rb_define_module("Shared")
+#define eSharedDataError rb_const_get(mShared, rb_intern_const("DataError"))
+/**
+ * It's been so long since I've written non-trivial C that I need to copy examples from online.
+ *
+ * Dynamic array of longs, with an initial value for otherwise uninitialized elements.
+ * Based on  https://stackoverflow.com/questions/3536153/c-dynamically-growing-array
+ */
+typedef struct {
+  long *array;
+  size_t size;
+  long default_val;
+} DynamicArray;
+/*
+ * Initialize a DynamicArray struct with the given initial size and with all values set to the default value.
+ *
+ * The default value is stored and used to initialize new array sections if and when the array needs to be expanded.
+ */
+void initDynamicArray(DynamicArray *a, size_t initial_size, long default_val) {
+  a->array = malloc(initial_size * sizeof(long));
+  a->size = initial_size;
+  a->default_val = default_val;
+  for (size_t i = 0; i < initial_size; i++) {
+    a->array[i] = default_val;
+  }
+}
+/*
+ * Assign +value+ to the +index+-th element of the array, expanding the available space if necessary.
+ */
+void assignInDynamicArray(DynamicArray *a, unsigned long index, long value) {
+  if (a->size <= index) {
+    size_t new_size = a->size;
+    while (new_size <= index) {
+      new_size = 8 * new_size / 5 + 8; // 8/5 gives "Fibonacci-like" growth; adding 8 to avoid small arrays having to reallocate
+                                       // too often as they grow. Who knows if it's worth being "clever".
+    }
+    long *new_array = realloc(a->array, new_size * sizeof(long));
+    if (!new_array) {
+      rb_raise(rb_eRuntimeError, "Cannot allocate memory to expand DynamicArray!");
+    }
+    a->array = new_array;
+    for (size_t i = a->size; i < new_size; i++) {
+      a->array[i] = a->default_val;
+    }
+    a->size = new_size;
+  }
+  a->array[index] = value;
+}
+void freeDynamicArray(DynamicArray *a) {
+  free(a->array);
+  a->array = NULL;
+  a->size = 0;
+}
+size_t _size_of(DynamicArray *a) {
+  return a->size * sizeof(a->default_val);
+}
+/**
+ * The C implementation of a Disjoint Union
+ *
+ * See Tarjan, Robert E., van Leeuwen, J., _Worst-case Analysis of Set Union Algorithms_, Journal of the ACM, v31:2 (1984), pp 245–281.
+ */
+/*
+ * The Disjoint Union struct.
+ * - forest: an array of longs giving, for each element, the element's parent.
+ *   - An element e is the root of its tree just when forest[e] == e.
+ *   - Two elements are in the same subset just when they are in the same tree in the forest.
+ *     - So the key idea is that we can check this by navigating via parents from each element to their roots. Clever optimizations
+ *       keep the trees flat and so most nodes are close to their roots.
+ * - rank: an array of longs giving the "rank" of each element.
+ *   - This value is used to guide the "linking" of trees when subsets are being merged to keep the trees flat. See Tarjan & van
+ *     Leeuwen
+ * - subset_count: the number of (disjoint) subsets.
+ *   - it isn't needed internally but may be useful to client code.
+ */
+typedef struct du_data {
+  DynamicArray *forest; // the forest that describes the unified subsets
+  DynamicArray *rank;   // the "ranks" of the elements, used when uniting subsets
+  size_t subset_count;
+} disjoint_union_data;
+/*
+ * Create one (on the heap).
+ *
+ * The dynamic arrays are initialized with a size of 100 because I didn't have a better idea. This will end up getting called from
+ * the Ruby #allocate method, which happens before #initialize. Thus we don't know the calling code's desired initial size.
+ */
+#define INITIAL_SIZE 100
+static disjoint_union_data *create_disjoint_union() {
+  disjoint_union_data *disjoint_union = (disjoint_union_data *)malloc(sizeof(disjoint_union_data));
+  // Allocate the structures
+  DynamicArray *forest = (DynamicArray *)malloc(sizeof(DynamicArray));
+  DynamicArray *rank = (DynamicArray *)malloc(sizeof(DynamicArray));
+  initDynamicArray(forest, INITIAL_SIZE, -1);
+  initDynamicArray(rank,   INITIAL_SIZE, 0);
+  disjoint_union->forest = forest;
+  disjoint_union->rank = rank;
+  disjoint_union->subset_count = 0;
+  return disjoint_union;
+}
+/*
+ * Free the memory associated with a disjoint union.
+ *
+ * This will end up getting triggered by the Ruby garbage collector. Ruby learns about it via the disjoint_union_type struct below.
+ */
+static void disjoint_union_free(void *ptr) {
+  if (ptr) {
+    disjoint_union_data *disjoint_union = ptr;
+    freeDynamicArray(disjoint_union->forest);
+    freeDynamicArray(disjoint_union->rank);
+    free(disjoint_union->forest);
+    disjoint_union->forest = NULL;
+    free(disjoint_union->rank);
+    disjoint_union->rank = NULL;
+    free(disjoint_union); // allocated with malloc() in create_disjoint_union, so release it with free()
+  }
+}
+/************************************************************
+ * The disjoint union operations
+ ************************************************************/
+/*
+ * Is the given element already a member of the universe?
+ */
+static int present_p(disjoint_union_data *disjoint_union, size_t element) {
+  DynamicArray *forest = (DynamicArray *)disjoint_union->forest;
+  return (forest->size > element && (forest->array[element] != forest->default_val));
+}
+/*
+ * Check that the given element is a member of the universe and raise Shared::DataError (ruby-side) if not
+ */
+static void assert_membership(disjoint_union_data *disjoint_union, size_t element) {
+  if (!present_p(disjoint_union, element)) {
+    rb_raise(eSharedDataError, "Value %zu is not part of the universe", element);
+  }
+}
+/*
+ * Add a new element to the universe. It starts out in its own singleton subset.
+ *
+ * Shared::DataError is raised if it is already an element.
+ */
+static void add_new_element(disjoint_union_data *disjoint_union, size_t element) {
+  if (present_p(disjoint_union, element)) {
+    rb_raise(eSharedDataError, "Element %zu already present in the universe", element);
+  }
+  assignInDynamicArray(disjoint_union->forest, element, element);
+  assignInDynamicArray(disjoint_union->rank, element, 0);
+  disjoint_union->subset_count++;
+}
+/*
+ * Find the canonical representative of the given element. This is the root of the tree (in forest) containing element.
+ *
+ * Two elements are in the same subset exactly when their canonical representatives are equal.
+ */
+static size_t find(disjoint_union_data *disjoint_union, size_t element) {
+  assert_membership(disjoint_union, element);
+  // We implement find with "halving" to shrink the length of paths to the root. See Tarjan and van Leeuwen p 252.
+  long *d = disjoint_union->forest->array; // the actual forest data
+  size_t x = element;
+  while (d[d[x]] != d[x]) {
+    x = d[x] = d[d[x]];
+  }
+  return d[x];
+}
+/*
+ * "Link" the two given elements so that they are in the same subset now.
+ *
+ * In other words, merge the subtrees containing the two elements.
+ *
+ * Good performance (see Tarjan and van Leeuwen) assumes that elt1 and elt2 are distinct and already the roots of their trees,
+ * though we don't check that here.
+ */
+static void link_roots(disjoint_union_data *disjoint_union, size_t elt1, size_t elt2) {
+  long *rank = disjoint_union->rank->array;
+  long *forest = disjoint_union->forest->array;
+  if (rank[elt1] > rank[elt2]) {
+    forest[elt2] = elt1;
+  } else if (rank[elt1] == rank[elt2]) {
+    forest[elt2] = elt1;
+    rank[elt1]++;
+  } else {
+    forest[elt1] = elt2;
+  }
+  disjoint_union->subset_count--;
+}
+/*
+ * "Unite" or merge the subsets containing elt1 and elt2.
+ */
+static void unite(disjoint_union_data *disjoint_union, size_t elt1, size_t elt2) {
+  assert_membership(disjoint_union, elt1);
+  assert_membership(disjoint_union, elt2);
+  if (elt1 == elt2) {
+    rb_raise(eSharedDataError, "Uniting an element with itself is meaningless");
+  }
+  size_t root1 = find(disjoint_union, elt1);
+  size_t root2 = find(disjoint_union, elt2);
+  if (root1 == root2) {
+    return; // already united
+  }
+  link_roots(disjoint_union, root1, root2);
+}
+/**
+ * Wrapping and unwrapping things for the Ruby runtime
+ *
+ */
+// How much memory (roughly) does a disjoint_union_data instance consume? I guess the Ruby runtime can use this information when
+// deciding how aggressive to be during garbage collection and such.
+static size_t disjoint_union_memsize(const void *ptr) {
+  if (ptr) {
+    const disjoint_union_data *du = ptr;
+    return sizeof(disjoint_union_data) + _size_of(du->forest) + _size_of(du->rank);
+  } else {
+    return 0;
+  }
+}
+/*
+ * A configuration struct that tells the Ruby runtime how to deal with a disjoint_union_data object.
+ *
+ * https://docs.ruby-lang.org/en/master/extension_rdoc.html#label-Encapsulate+C+data+into+a+Ruby+object
+ */
+static const rb_data_type_t disjoint_union_type = {
+  .wrap_struct_name = "disjoint_union",
+  { // help for the Ruby garbage collector
+    .dmark = NULL, // dmark, for marking other Ruby objects. We don't hold any other objects so this can be NULL
+    .dfree = disjoint_union_free, // how to free the memory associated with an object
+    .dsize = disjoint_union_memsize, // roughly how much space does the object consume?
+  },
+  .data = NULL, // a data field we could use for something here if we wanted. Ruby ignores it
+  .flags = 0  // GC-related flag values.
+};
+/*
+ * Helper: check that a Ruby value is a non-negative Fixnum and convert it to a C unsigned long
+ */
+static unsigned long checked_nonneg_fixnum(VALUE val) {
+  Check_Type(val, T_FIXNUM);
+  long c_val = FIX2LONG(val);
+  if (c_val < 0) {
+    rb_raise(eSharedDataError, "Value must be non-negative");
+  }
+  return c_val;
+}
+/*
+ * Unwrap a Rubyfied disjoint union to get the C struct inside.
+ */
+static disjoint_union_data *unwrapped(VALUE self) {
+  disjoint_union_data *disjoint_union;
+  TypedData_Get_Struct((self), disjoint_union_data, &disjoint_union_type, disjoint_union);
+  return disjoint_union;
+}
+/*
+ * This is for CDisjointUnion.allocate on the Ruby side
+ */
+static VALUE disjoint_union_alloc(VALUE klass) {
+  // Get one on the heap
+  disjoint_union_data *disjoint_union = create_disjoint_union();
+  // Wrap it up into a Ruby object
+  return TypedData_Wrap_Struct(klass, &disjoint_union_type, disjoint_union);
+}
+/*
+ * A single parameter is optional. If given it should be a non-negative integer and specifies the initial size, s, of the universe
+ * 0, 1, ..., s-1.
+ *
+ * If no argument is given we act as though a value of 0 were passed.
+ */
+static VALUE disjoint_union_init(int argc, VALUE *argv, VALUE self) {
+  if (argc == 0) {
+    return self;
+  } else if (argc > 1) {
+    rb_raise(rb_eArgError, "wrong number of arguments");
+  } else {
+    size_t initial_size = checked_nonneg_fixnum(argv[0]);
+    disjoint_union_data *disjoint_union = unwrapped(self);
+    for (size_t i = 0; i < initial_size; i++) {
+      add_new_element(disjoint_union, i);
+    }
+  }
+  return self;
+}
+/**
+ * And now the simple wrappers around the Disjoint Union C functionality. In each case we
+ *   - unwrap a 'VALUE self',
+ *     - i.e., the CDisjointUnion instance that contains a disjoint_union_data struct;
+ *   - munge any other arguments into longs;
+ *   - call the appropriate C function to act on the struct; and
+ *   - return an appropriate VALUE that the Ruby runtime can use.
+ *
+ * We make them into methods on CDisjointUnion in the Init_c_disjoint_union function, below.
+ */
+/*
+ * Add a new subset to the universe containing the element +new_v+.
+ *
+ * @param the new element, starting in its own singleton subset
+ *   - it must be a non-negative integer, not already part of the universe of elements.
+ */
+static VALUE disjoint_union_make_set(VALUE self, VALUE arg) {
+  add_new_element(unwrapped(self), checked_nonneg_fixnum(arg));
+  return Qnil;
+}
+/*
+ * @return the number of subsets into which the universe is currently partitioned.
+ */
+static VALUE disjoint_union_subset_count(VALUE self) {
+  return LONG2NUM(unwrapped(self)->subset_count);
+}
+/*
+ * The canonical representative of the subset containing e. Two elements d and e are in the same subset exactly when find(d) ==
+ * find(e).
+ *
+ * The parameter must be in the universe of elements.
+ *
+ * @return (Integer) one of the universe of elements
+ */
+static VALUE disjoint_union_find(VALUE self, VALUE arg) {
+  return LONG2NUM(find(unwrapped(self), checked_nonneg_fixnum(arg)));
+}
+/*
+ * Declare that the arguments are equivalent, i.e., in the same subset. If they are already in the same subset this is a no-op.
+ *
+ * Each argument must be in the universe of elements
+ */
+static VALUE disjoint_union_unite(VALUE self, VALUE arg1, VALUE arg2) {
+  unite(unwrapped(self), checked_nonneg_fixnum(arg1), checked_nonneg_fixnum(arg2));
+  return Qnil;
+}
+/*
+ * A Disjoint Union.
+ *
+ * A "disjoint set union" that represents a set of elements belonging to _disjoint_ subsets. Alternatively, this expresses a
+ * partition of a fixed set.
+ *
+ * The data structure provides efficient actions to merge two disjoint subsets, i.e., replace them by their union, and determine if
+ * two elements are in the same subset.
+ *
+ * The elements of the set are non-negative integers. Client code can map its data to these representatives.
+ *
+ * See https://en.wikipedia.org/wiki/Disjoint-set_data_structure for a good introduction.
+ *
+ * The code uses several ideas from Tarjan and van Leeuwen for efficiency. We use "union by rank" in +unite+ and path-halving in
+ * +find+. Together, these make the amortized cost of each operation effectively constant.
+ *
+ * - Tarjan, Robert E., van Leeuwen, Jan (1984). _Worst-case analysis of set union algorithms_. Journal of the ACM. 31 (2): 245–281.
+ */
+void Init_c_disjoint_union() {
+  VALUE mDataStructuresRMolinari = rb_define_module("DataStructuresRMolinari");
+  VALUE cDisjointUnion = rb_define_class_under(mDataStructuresRMolinari, "CDisjointUnion", rb_cObject);
+  rb_define_alloc_func(cDisjointUnion, disjoint_union_alloc);
+  rb_define_method(cDisjointUnion, "initialize", disjoint_union_init, -1);
+  rb_define_method(cDisjointUnion, "make_set", disjoint_union_make_set, 1);
+  rb_define_method(cDisjointUnion, "subset_count", disjoint_union_subset_count, 0);
+  rb_define_method(cDisjointUnion, "find", disjoint_union_find, 1);
+  rb_define_method(cDisjointUnion, "unite", disjoint_union_unite, 2);
+}
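The C extension above keeps a parent array (`forest`), a `rank` array and a `subset_count`. It uses path halving in `find` and union by rank in `link_roots`.

Below is a minimal pure-Ruby sketch that mirrors that logic, which may make the C easier to follow. `SketchDisjointUnion` is a hypothetical name and is not the gem's `DisjointUnion` class. `ArgumentError` stands in for the gem's `Shared::DataError`.

```ruby
# Hypothetical pure-Ruby mirror of disjoint_union.c (not the gem's DisjointUnion class).
# ArgumentError stands in for the gem's Shared::DataError.
class SketchDisjointUnion
  attr_reader :subset_count

  def initialize(size = 0)
    @forest = [] # @forest[e] is e's parent; e is a root exactly when @forest[e] == e
    @rank = []   # upper bound on tree height, used to keep trees flat when linking
    @subset_count = 0
    size.times { |e| make_set(e) }
  end

  # Add e to the universe as a singleton subset.
  def make_set(e)
    check_nonneg(e)
    raise ArgumentError, "Element #{e} already present in the universe" if @forest[e]

    @forest[e] = e
    @rank[e] = 0
    @subset_count += 1
  end

  # Canonical representative of e's subset, using path halving as in the C find().
  def find(e)
    check_member(e)
    x = e
    # Point x at its grandparent and step up to it, until x's parent is a root.
    while @forest[@forest[x]] != @forest[x]
      x = @forest[x] = @forest[@forest[x]]
    end
    @forest[x]
  end

  # Merge the subsets containing e and f, using union by rank as in link_roots().
  def unite(e, f)
    check_member(e)
    check_member(f)
    raise ArgumentError, "Uniting an element with itself is meaningless" if e == f

    r1 = find(e)
    r2 = find(f)
    return if r1 == r2 # already united

    if @rank[r1] > @rank[r2]
      @forest[r2] = r1
    elsif @rank[r1] == @rank[r2]
      @forest[r2] = r1
      @rank[r1] += 1
    else
      @forest[r1] = r2
    end
    @subset_count -= 1
  end

  private

  def check_nonneg(e)
    raise ArgumentError, "Value must be a non-negative Integer" unless e.is_a?(Integer) && e >= 0
  end

  def check_member(e)
    check_nonneg(e)
    raise ArgumentError, "Value #{e} is not part of the universe" unless @forest[e]
  end
end

du = SketchDisjointUnion.new(5)
du.unite(0, 1)
du.unite(3, 4)
du.unite(1, 4)
puts du.find(0) == du.find(3) # true
puts du.subset_count          # 2: {0, 1, 3, 4} and {2}
```

Ruby arrays grow on assignment, so this sketch does not need the C code's `DynamicArray`. The `nil` slots play the role of the C sentinel value -1.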

data/ext/c_disjoint_union/extconf.rb ADDED

@@ -0,0 +1,12 @@
+require 'mkmf'
+abort 'missing malloc()' unless have_func "malloc"
+abort 'missing realloc()' unless have_func "realloc"
+if try_cflags('-O')
+  append_cflags('-O')
+end
+extension_name = "c_disjoint_union"
+dir_config(extension_name)
+create_makefile("data_structures_rmolinari/c_disjoint_union")
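In `disjoint_union.c`, `assignInDynamicArray` grows the arrays with the integer rule `new_size = 8 * new_size / 5 + 8` until the requested index fits. The ratio 8/5 = 1.6 is close to the golden ratio (about 1.618), which is the limiting ratio of consecutive Fibonacci numbers. That is presumably what the "Fibonacci-like" comment refers to.

Below is a small Ruby sketch of the arithmetic, starting from `INITIAL_SIZE` (100). The helper `grown_size` is hypothetical. It does not model C integer overflow, and it returns the current size unchanged when the index already fits, a case the C code guards against before entering its loop.

```ruby
# Mirrors the growth loop in assignInDynamicArray(): starting from the current
# capacity, grow by new = 8 * new / 5 + 8 until index fits (index < new).
# Ruby's Integer#/ floors; for non-negative values that matches C's truncation.
def grown_size(current_size, index)
  new_size = current_size
  new_size = 8 * new_size / 5 + 8 while new_size <= index
  new_size
end

# Successive capacities starting from INITIAL_SIZE (100), growing one step at a time
sizes = [100]
3.times { sizes << grown_size(sizes.last, sizes.last) }
p sizes # => [100, 168, 276, 449]
```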