ractor_queue 0.1.0 → 0.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 47a22f9a81a1d6ce08a5588f4d1324e0d15006a57062057ec2ac74d5f44b8888
-  data.tar.gz: c56c3feadd7d4bfa98c28f9b10705fd749ca51d5fb56ab9004530fcba6d6b984
+  metadata.gz: 361cbe2d5b565cd4159a21e0a8f7b4c36c0d86e63881d05bf6ba242006c8a736
+  data.tar.gz: 5788c2099b8c84930d3b1ff626dd1749188185faf717325c59ef4e2f34f4edf1
 SHA512:
-  metadata.gz: f381232cfc1aff09b17f0a41c35fdd22db4bbbbca1319135593c61c3f98dab892a811e59df1922e9eb0fcc55989b896460cf7587d911b6a3809306e28c7beb70
-  data.tar.gz: 335c0137bb03e242cd3fc63daad949e4a862fdf92eca3db068dbbf0728eec31e34210b833cd68376d9a9ac899c1b09f813bc871b6d083469336e3984419df705
+  metadata.gz: 36cf18faa696857aa353d3d2cf7bdee071005557e52537d33d7ecc67370f1945b812664174d9034d7fe7349c9421191667c5b5a43335343598c242a4178f49f0
+  data.tar.gz: a938292d49c5e0a3ca144602b0118506f2c3f8ec539c94a25c9cee6c172eff5568902905314426f0931ec3b9f5b1fccb891b822706b723d4c5d4e783d2efb05f
data/README.md CHANGED
@@ -14,7 +14,7 @@ producer.value
 consumer.value
 ```
 
-Backed by the [max0x7ba/atomic_queue](https://github.com/max0x7ba/atomic_queue) C++14 header-only library via [Rice](https://github.com/jasonroelofs/rice) 4.x bindings.
+Backed by the [max0x7ba/atomic_queue](https://github.com/max0x7ba/atomic_queue) C++17 header-only library via [Rice](https://github.com/jasonroelofs/rice) 4.x bindings.
 
 ---
 
@@ -62,6 +62,14 @@ q.pop # => 99
 # Blocking with timeout
 q.pop(timeout: 0.5) # raises RactorQueue::TimeoutError after 500 ms if still empty
 
+# Fiber-scheduler-aware — use inside Async { } blocks
+require "async"
+Async do
+  q.async_push(42)          # => self (yields via sleep(0) while full)
+  q.async_pop               # => 42 (yields via sleep(0) while empty)
+  q.async_pop(timeout: 1.0) # raises RactorQueue::TimeoutError after 1 s
+end
+
 # State (approximate under concurrency)
 q.size   # => Integer
 q.empty? # => true / false
@@ -81,8 +89,10 @@ Ractor.shareable?(q) # => true
 | `RactorQueue.new(capacity:, validate_shareable: false)` | `RactorQueue` instance | Capacity rounded up to power-of-two minimum |
 | `try_push(obj)` | `true` / `false` | Non-blocking; `false` if full |
 | `try_pop` | `obj` or `RactorQueue::EMPTY` | Non-blocking; `EMPTY` sentinel if queue was empty; `nil` if `nil` was pushed |
-| `push(obj, timeout: nil)` | `self` | Blocks until space; raises `TimeoutError` if timeout expires |
-| `pop(timeout: nil)` | `obj` | Blocks until item; raises `TimeoutError` if timeout expires |
+| `push(obj, timeout: nil)` | `self` | Blocking; OS-thread backoff (`Thread.pass` spin → timed sleep); best for Ractors and plain Threads |
+| `pop(timeout: nil)` | `obj` | Blocking; OS-thread backoff; best for Ractors and plain Threads |
+| `async_push(obj, timeout: nil)` | `self` | Fiber-scheduler-aware blocking push; yields via `sleep(0)`; best inside `Async { }` blocks |
+| `async_pop(timeout: nil)` | `obj` | Fiber-scheduler-aware blocking pop; yields via `sleep(0)`; best inside `Async { }` blocks |
 | `size` | Integer | Approximate element count |
 | `empty?` | Boolean | Approximate |
 | `full?` | Boolean | Approximate |
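The `try_pop` row above relies on a sentinel object rather than `nil`. A minimal plain-Ruby sketch of that pattern (a toy `Array`-backed stand-in, not the gem's C-backed queue):

```ruby
# A unique frozen object can never collide with user data, so it can safely
# mean "the queue was empty" even when nil itself is a legal element.
EMPTY = Object.new.freeze

def try_pop(buffer)
  buffer.empty? ? EMPTY : buffer.shift
end

buffer = [nil]                    # nil was legitimately pushed
v = try_pop(buffer)
v.equal?(EMPTY)                   # => false: we got the pushed nil back
try_pop(buffer).equal?(EMPTY)     # => true: now genuinely empty
```

Identity comparison (`equal?`) matters here: the sentinel is recognised by object identity, not value equality.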
@@ -248,6 +258,8 @@ Under MRI threads (no Ractors), Ruby's `Queue` is faster because the GVL makes l
 ```sh
 bundle exec ruby examples/01_basic_usage.rb # Ractor usage patterns
 bundle exec ruby examples/02_performance.rb # Throughput benchmarks
+bundle exec ruby examples/05_simd.rb        # TF-IDF scoring — SIMD fan-out pattern
+bundle exec ruby examples/06_pipeline.rb    # Semantic chunk ranking — MIMD pipeline pattern
 ```
 
 ---
@@ -268,6 +280,8 @@ bundle exec rake test # run the test suite
 |---|---|
 | [`examples/01_basic_usage.rb`](examples/01_basic_usage.rb) | Annotated Ractor usage patterns (1P1C, timeout, worker pool, pipeline, validate_shareable) |
 | [`examples/02_performance.rb`](examples/02_performance.rb) | Throughput benchmarks across queue topologies and Ractor counts |
+| [`examples/05_simd.rb`](examples/05_simd.rb) | SIMD fan-out: parallel TF-IDF scoring across W Ractors via `Parallel.map` |
+| [`examples/06_pipeline.rb`](examples/06_pipeline.rb) | MIMD pipeline: 2-stage chunk-rank pipeline via `Parallel.pipeline`, 6 Ractors per stage |
 | [`docs/superpowers/specs/2026-04-10-atomic-queue-design.md`](docs/superpowers/specs/2026-04-10-atomic-queue-design.md) | Original design specification (C extension architecture, Rice bindings, API design decisions) |
 | [`docs/superpowers/plans/`](docs/superpowers/plans/) | Implementation plans for each development phase |
 
@@ -5,6 +5,14 @@
 
 using namespace Rice;
 
+// Tell Rice how to mark a StandardQueue during GC.
+namespace Rice {
+  template<>
+  void ruby_mark<StandardQueue>(StandardQueue* data) {
+    if (data) data->mark();
+  }
+}
+
 // Global sentinel — a unique frozen Ruby Object used to signal "queue empty"
 // from c_try_pop. Pinned as a permanent GC root so it is never collected.
 VALUE g_empty_sentinel = Qnil; // Set in Init_ractor_queue
@@ -24,10 +32,15 @@ extern "C" void Init_ractor_queue() {
                    Arg("v").setValue())
     .define_method("c_try_pop", &StandardQueue::try_pop,
                    Return().setValue())
-    .define_method("capacity", &StandardQueue::capacity)
-    .define_method("was_size", &StandardQueue::was_size)
-    .define_method("was_empty", &StandardQueue::was_empty)
-    .define_method("was_full", &StandardQueue::was_full);
+    .define_method("capacity",  &StandardQueue::capacity)
+    .define_method("was_size",  &StandardQueue::was_size)
+    .define_method("was_empty", &StandardQueue::was_empty)
+    .define_method("was_full",  &StandardQueue::was_full)
+    // gc_unprotect(VALUE self_val): called once post-construction with the Ruby
+    // VALUE of the queue itself so rb_gc_writebarrier_unprotect can be applied.
+    // See standard_queue.h::gc_unprotect for the full rationale.
+    .define_method("_gc_unprotect", &StandardQueue::gc_unprotect,
+                   Arg("self_val").setValue());
 
   // Create the permanent EMPTY_SENTINEL object and pin it as a GC root.
   g_empty_sentinel = rb_obj_alloc(rb_cObject);
@@ -36,10 +49,8 @@ extern "C" void Init_ractor_queue() {
   rb_define_const(rb_cRQ, "EMPTY_SENTINEL", g_empty_sentinel);
 
   // Mark the wrapped C++ type as Ractor-shareable when frozen.
-  // Rice only sets RUBY_TYPED_FREE_IMMEDIATELY; we OR-in RUBY_TYPED_FROZEN_SHAREABLE
-  // so that Ractor.make_shareable(instance) succeeds after Ruby-side freeze.
   Data_Type<StandardQueue>::ruby_data_type()->flags |= RUBY_TYPED_FROZEN_SHAREABLE;
 
-  // Restore: methods defined after this point (by other code) are not auto-marked Ractor-safe.
+  // Restore: methods defined after this point are not auto-marked Ractor-safe.
   rb_ext_ractor_safe(false);
 }
@@ -1,6 +1,7 @@
 #pragma once
 #include <atomic_queue/atomic_queue.h>
 #include <ruby.h>
+#include <atomic>
 
 // Global sentinel — initialized in Init_ractor_queue, returned by try_pop when empty.
 // Never a valid user-pushed VALUE; only used by the Ruby layer to detect "empty."
@@ -9,17 +10,160 @@ extern VALUE g_empty_sentinel;
 class StandardQueue {
   atomic_queue::AtomicQueueB2<VALUE> q_;
 
+  // GC shadow: one std::atomic<VALUE> slot per queue capacity entry.
+  // When a heap-allocated VALUE is enqueued, we CAS it into a free slot.
+  // When it is dequeued, we CAS that slot back to Qnil.
+  // The dmark callback marks every non-Qnil slot, keeping queued objects alive.
+  //
+  // Design constraints:
+  // - No std::mutex: the GC can call dmark during stop-the-world while any
+  //   Ruby thread may be paused mid-push/pop, so a mutex could deadlock.
+  // - CAS operations are lock-free: safe from Ractors (no GVL required) and
+  //   from Threads; two pushers or a push+pop never corrupt the same slot.
+  // - Array is sized to the actual rounded-up capacity, so there are always
+  //   enough slots for the maximum in-flight item count.
+  std::atomic<VALUE>* gc_slots_;
+  unsigned gc_cap_;
+
+  // Rotating scan hints for try_push and try_pop.
+  //
+  // gc_slots are claimed/freed in roughly FIFO order (mirroring the queue),
+  // so a hint that advances with each successful operation gives O(1) amortised
+  // scan instead of a worst-case O(gc_cap_) scan from 0.
+  //
+  // push_hint_: position of the last successfully claimed slot + 1.
+  // After a full queue cycle (claim 0..gc_cap_-1, wrap), slot 0 is the first
+  // freed, and the hint is back at 0 — so the next push claim is O(1).
+  //
+  // pop_hint_: position of the last successfully freed slot + 1.
+  // Without this, all concurrent pop Ractors scan from slot 0, creating a
+  // thundering herd on the slot-0 cache line that dwarfs any scan savings.
+  // With the hint, each successful pop advances the starting position so
+  // concurrent workers naturally spread across different cache lines.
+  std::atomic<unsigned> push_hint_{0};
+  std::atomic<unsigned> pop_hint_{0};
+
 public:
-  explicit StandardQueue(unsigned capacity) : q_(capacity) {}
+  explicit StandardQueue(unsigned capacity)
+    : q_(capacity), gc_cap_(q_.capacity()) {
+    gc_slots_ = new std::atomic<VALUE>[gc_cap_];
+    for (unsigned i = 0; i < gc_cap_; i++)
+      gc_slots_[i].store(Qnil, std::memory_order_relaxed);
+  }
+
+  ~StandardQueue() { delete[] gc_slots_; }
 
   // Non-blocking push. Returns true if element was enqueued, false if full.
-  bool try_push(VALUE v) { return q_.try_push(v); }
+  //
+  // ORDERING: gc_slot is claimed BEFORE pushing to q_. This guarantees that any
+  // VALUE in q_ is always covered by a gc_slot — there is no window where an item
+  // is in the queue but unprotected from GC. Without this ordering, a concurrent
+  // pop could drain the item between q_.try_push and the gc_slot CAS, leaving a
+  // stale slot claimed forever. After ~gc_cap_ such races all slots fill up, new
+  // pushes lose GC coverage, and minor GC collects in-flight items (crash).
+  //
+  // FAST-PATH: was_full() exits before touching gc_slots_ on the hot retry path.
+  // The rotating push_hint_ makes the CAS scan O(1) amortised: since gc_slots
+  // are claimed in the same FIFO order as the queue, the hint always points at
+  // (or one step past) the oldest freed slot, which is almost always free.
+  bool try_push(VALUE v) {
+    if (RB_SPECIAL_CONST_P(v)) {
+      // Special consts (fixnum, symbol, true/false/nil) live outside the heap;
+      // they are never collected and need no gc_slot.
+      return q_.try_push(v);
+    }
+
+    // Skip gc_slot work entirely when the queue reports full.
+    if (q_.was_full()) return false;
+
+    // Scan from the hint position. O(1) amortised for FIFO workloads.
+    unsigned start = push_hint_.load(std::memory_order_relaxed) % gc_cap_;
+    int claimed = -1;
+    for (unsigned i = 0; i < gc_cap_; i++) {
+      unsigned idx = (start + i) % gc_cap_;
+      VALUE expected = Qnil;
+      if (gc_slots_[idx].compare_exchange_strong(
+            expected, v,
+            std::memory_order_release,
+            std::memory_order_relaxed)) {
+        claimed = (int)idx;
+        push_hint_.store((idx + 1) % gc_cap_, std::memory_order_relaxed);
+        break;
+      }
+    }
+
+    if (claimed < 0) {
+      // Every gc_slot is occupied: queue is at capacity.
+      return false;
+    }
+
+    bool ok = q_.try_push(v);
+    if (!ok) {
+      // Push failed (race: queue became full between was_full() and here).
+      // Release the slot and roll the hint back so the next push re-checks it.
+      gc_slots_[claimed].store(Qnil, std::memory_order_release);
+      push_hint_.store((unsigned)claimed, std::memory_order_relaxed);
+    }
+    return ok;
+  }
 
   // Non-blocking pop. Returns the VALUE if one was available,
   // or g_empty_sentinel if the queue was empty.
+  //
+  // Uses CAS (not load+store) to clear the gc_slot so that two concurrent pops
+  // of the same VALUE (pushed twice) always clear two DISTINCT slots. A plain
+  // load+store would let both pops target the same slot, leaving the other slot
+  // permanently occupied — a slow gc_slot leak that eventually fills all slots.
+  //
+  // Scans from pop_hint_: under high concurrency (e.g. 12 Ractor workers all
+  // calling try_pop), scanning from 0 concentrates every CAS retry on the same
+  // cache line — a thundering herd that collapses throughput. The rotating hint
+  // naturally spreads concurrent scanners across different cache lines, giving
+  // O(1) amortised scan with no coherency storm.
   VALUE try_pop() {
     VALUE v;
-    return q_.try_pop(v) ? v : g_empty_sentinel;
+    if (!q_.try_pop(v)) return g_empty_sentinel;
+    if (!RB_SPECIAL_CONST_P(v)) {
+      unsigned start = pop_hint_.load(std::memory_order_relaxed) % gc_cap_;
+      for (unsigned i = 0; i < gc_cap_; i++) {
+        unsigned idx = (start + i) % gc_cap_;
+        VALUE expected = v;
+        if (gc_slots_[idx].compare_exchange_strong(
+              expected, Qnil,
+              std::memory_order_release,
+              std::memory_order_relaxed)) {
+          pop_hint_.store((idx + 1) % gc_cap_, std::memory_order_relaxed);
+          break;
+        }
+      }
+    }
+    return v;
+  }
+
+  // Called by the GC dmark callback. Marks every occupied gc_slot so that
+  // queued heap-allocated objects survive the current GC cycle.
+  // GC is stop-the-world: no concurrent push/pop is possible here, so the
+  // relaxed loads are safe.
+  void mark() const noexcept {
+    for (unsigned i = 0; i < gc_cap_; i++) {
+      VALUE v = gc_slots_[i].load(std::memory_order_relaxed);
+      if (!RB_SPECIAL_CONST_P(v)) rb_gc_mark(v);
+    }
+  }
+
+  // Called once after construction from RactorQueue.new.
+  // Marks this object as write-barrier-unprotected so Ruby's generational GC
+  // (RGENGC) always scans gc_slots_ during minor GC.
+  //
+  // Without this, StandardQueue (an OLD object after a few GC cycles) never
+  // appears in the minor-GC remembered set. Any YOUNG Ruby objects pushed to
+  // the queue would not be traced by minor GC and could be collected before
+  // they are popped — causing crashes at any pop site.
+  //
+  // self_val must be the Ruby VALUE of this wrapper object, passed from the
+  // Ruby layer via instance._gc_unprotect(instance).
+  void gc_unprotect(VALUE self_val) {
+    rb_gc_writebarrier_unprotect(self_val);
   }
 
   unsigned capacity() const { return q_.capacity(); }
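The duplicate-value argument behind the CAS clear can be seen even single-threaded. A hypothetical Ruby toy (names invented for illustration; the real code uses `std::atomic` compare-exchange over `gc_slots_`) showing that clearing "the first slot holding this value" twice releases two distinct slots:

```ruby
# Two slots hold the same value because it was pushed twice. Each clear
# releases exactly one claim, mimicking the CAS semantics: a slot already
# cleared by the first call can no longer match the second call.
def clear_slot(slots, value)
  idx = slots.index(value)   # stand-in for the CAS scan over the slot array
  slots[idx] = nil if idx
  idx
end

slots = [:obj, :obj, nil, nil]
clear_slot(slots, :obj)      # => 0  (first claim released)
clear_slot(slots, :obj)      # => 1  (the other claim, not slot 0 again)
```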
@@ -34,6 +34,35 @@ class RactorQueue
     blocking_pop(timeout)
   end
 
+  # Fiber-scheduler-aware pop. Yields to the async reactor on every empty
+  # check via sleep(0) rather than spinning with Thread.pass first.
+  # Use inside Async { } blocks. Degrades gracefully to a near-no-op sleep
+  # in plain Thread context (no scheduler installed).
+  # Raises RactorQueue::TimeoutError if timeout expires.
+  def async_pop(timeout: nil)
+    deadline = timeout ? Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout : nil
+    loop do
+      result = c_try_pop
+      return result unless result.equal?(EMPTY_SENTINEL)
+      raise TimeoutError if deadline && Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
+      sleep(0)
+    end
+  end
+
+  # Fiber-scheduler-aware push. Yields to the async reactor on every full
+  # check via sleep(0) rather than spinning with Thread.pass first.
+  # Use inside Async { } blocks. Degrades gracefully in plain Thread context.
+  # Raises RactorQueue::TimeoutError if timeout expires.
+  def async_push(obj, timeout: nil)
+    validate_shareable!(obj) if @validate_shareable
+    deadline = timeout ? Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout : nil
+    loop do
+      return self if c_try_push(obj)
+      raise TimeoutError if deadline && Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
+      sleep(0)
+    end
+  end
+
   # Approximate current element count.
   def size = was_size
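Outside the gem, the same `sleep(0)`-based polling loop can be sketched standalone. `poll` and `PollTimeout` below are hypothetical names for this sketch; the lambda stands in for `c_try_pop`:

```ruby
class PollTimeout < StandardError; end

# Poll `source` until it yields a non-nil value. sleep(0) hands control to a
# fiber scheduler when one is installed (e.g. inside Async { }) and is a
# near-no-op under plain threads, matching async_pop's degradation behaviour.
def poll(source, timeout: nil)
  deadline = timeout && Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    value = source.call
    return value unless value.nil?
    raise PollTimeout if deadline && Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep(0)
  end
end

box = nil
t = Thread.new { sleep 0.05; box = :ready }
result = poll(-> { box }, timeout: 1.0)   # => :ready
t.join
```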
 
@@ -77,7 +106,7 @@ class RactorQueue
       raise TimeoutError if deadline && Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
       if spins < SPIN_THRESHOLD
         spins += 1
-        Thread.pass
+        Thread.pass # ~100ns cooperative yield; works in Ractors and Threads
       else
         sleep(SLEEP_INTERVAL)
       end
@@ -97,7 +126,7 @@ class RactorQueue
       raise TimeoutError if deadline && Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
       if spins < SPIN_THRESHOLD
         spins += 1
-        Thread.pass
+        Thread.pass # ~100ns cooperative yield; works in Ractors and Threads
       else
         sleep(SLEEP_INTERVAL)
       end
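The spin-then-sleep shape of the blocking paths can be sketched generically. `wait_until` and the two constants below are invented for this sketch and are not the gem's actual names or values:

```ruby
SPIN_PHASE    = 100     # assumed spin budget, not the gem's SPIN_THRESHOLD
SLEEP_QUANTUM = 0.0005  # assumed interval, not the gem's SLEEP_INTERVAL

# Busy-wait briefly with Thread.pass (cheap when the wait is short), then
# fall back to timed sleep so a long wait does not burn a whole core.
def wait_until(timeout = nil)
  deadline = timeout && Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  spins = 0
  until yield
    raise "timed out" if deadline && Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    if spins < SPIN_PHASE
      spins += 1
      Thread.pass          # cooperative yield; legal in Ractors and Threads
    else
      sleep(SLEEP_QUANTUM) # suspend the OS thread once spinning looks futile
    end
  end
end

flag = false
t = Thread.new { sleep 0.02; flag = true }
wait_until(1.0) { flag }   # returns once the other thread sets the flag
t.join
```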
@@ -13,6 +13,15 @@ class RactorQueue
   # @param validate_shareable [Boolean] Raise NotShareableError on non-shareable pushes.
   def self.new(capacity:, validate_shareable: false)
     instance = super(capacity)
+    # RGENGC write-barrier fix: mark the queue as WB-unprotected so minor GC
+    # always scans gc_slots_ and keeps queued young objects alive.
+    # Without this, pushing a young VALUE into an OLD StandardQueue creates an
+    # untracked old→young reference — minor GC never marks it, the VALUE is
+    # collected, and subsequent pops return garbage (crash at scale).
+    # We pass `instance` explicitly because rb_gc_writebarrier_unprotect needs
+    # the raw Ruby VALUE; there is no way to capture `self` as a VALUE inside
+    # a Rice member-function binding without an explicit argument.
+    instance._gc_unprotect(instance)
     instance.instance_variable_set(:@validate_shareable, validate_shareable)
     # Make the queue instance itself Ractor-shareable. This deep-freezes the Ruby
     # wrapper object. The C++ AtomicQueueB2 buffer is not affected by Ruby's freeze.
@@ -1,3 +1,3 @@
 class RactorQueue
-  VERSION = "0.1.0"
+  VERSION = "0.2.0"
 end
metadata CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: ractor_queue
 version: !ruby/object:Gem::Version
-  version: 0.1.0
+  version: 0.2.0
 platform: ruby
 authors:
 - Dewayne VanHoozer
@@ -65,6 +65,20 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: '5.0'
+- !ruby/object:Gem::Dependency
+  name: async
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
 description: A lock-free MPMC queue that can be shared across Ruby Ractors — the only
   Ractor-safe bounded queue option since Ruby's built-in Queue uses Mutex and cannot
   cross Ractor boundaries.
@@ -104,7 +118,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
   - !ruby/object:Gem::Version
     version: '0'
 requirements: []
-rubygems_version: 4.0.10
+rubygems_version: 4.0.11
specification_version: 4
summary: Ractor-shareable bounded queue for Ruby parallel workloads
test_files: []