ractor_queue 0.1.0 → 0.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 47a22f9a81a1d6ce08a5588f4d1324e0d15006a57062057ec2ac74d5f44b8888
-  data.tar.gz: c56c3feadd7d4bfa98c28f9b10705fd749ca51d5fb56ab9004530fcba6d6b984
+  metadata.gz: 361cbe2d5b565cd4159a21e0a8f7b4c36c0d86e63881d05bf6ba242006c8a736
+  data.tar.gz: 5788c2099b8c84930d3b1ff626dd1749188185faf717325c59ef4e2f34f4edf1
 SHA512:
-  metadata.gz: f381232cfc1aff09b17f0a41c35fdd22db4bbbbca1319135593c61c3f98dab892a811e59df1922e9eb0fcc55989b896460cf7587d911b6a3809306e28c7beb70
-  data.tar.gz: 335c0137bb03e242cd3fc63daad949e4a862fdf92eca3db068dbbf0728eec31e34210b833cd68376d9a9ac899c1b09f813bc871b6d083469336e3984419df705
+  metadata.gz: 36cf18faa696857aa353d3d2cf7bdee071005557e52537d33d7ecc67370f1945b812664174d9034d7fe7349c9421191667c5b5a43335343598c242a4178f49f0
+  data.tar.gz: a938292d49c5e0a3ca144602b0118506f2c3f8ec539c94a25c9cee6c172eff5568902905314426f0931ec3b9f5b1fccb891b822706b723d4c5d4e783d2efb05f
data/README.md CHANGED
@@ -14,7 +14,7 @@ producer.value
 consumer.value
 ```
 
-Backed by the [max0x7ba/atomic_queue](https://github.com/max0x7ba/atomic_queue) C++14 header-only library via [Rice](https://github.com/jasonroelofs/rice) 4.x bindings.
+Backed by the [max0x7ba/atomic_queue](https://github.com/max0x7ba/atomic_queue) C++17 header-only library via [Rice](https://github.com/jasonroelofs/rice) 4.x bindings.
 
 ---
 
@@ -62,6 +62,14 @@ q.pop # => 99
 # Blocking with timeout
 q.pop(timeout: 0.5) # raises RactorQueue::TimeoutError after 500 ms if still empty
 
+# Fiber-scheduler-aware — use inside Async { } blocks
+require "async"
+Async do
+  q.async_push(42)          # => self (yields via sleep(0) while full)
+  q.async_pop               # => 42 (yields via sleep(0) while empty)
+  q.async_pop(timeout: 1.0) # raises RactorQueue::TimeoutError after 1 s
+end
+
 # State (approximate under concurrency)
 q.size   # => Integer
 q.empty? # => true / false
@@ -81,8 +89,10 @@ Ractor.shareable?(q) # => true
 | `RactorQueue.new(capacity:, validate_shareable: false)` | `RactorQueue` instance | Capacity rounded up to power-of-two minimum |
 | `try_push(obj)` | `true` / `false` | Non-blocking; `false` if full |
 | `try_pop` | `obj` or `RactorQueue::EMPTY` | Non-blocking; `EMPTY` sentinel if queue was empty; `nil` if `nil` was pushed |
-| `push(obj, timeout: nil)` | `self` | Blocks until space; raises `TimeoutError` if timeout expires |
-| `pop(timeout: nil)` | `obj` | Blocks until item; raises `TimeoutError` if timeout expires |
+| `push(obj, timeout: nil)` | `self` | Blocking; OS-thread backoff (`Thread.pass` spin → timed sleep); best for Ractors and plain Threads |
+| `pop(timeout: nil)` | `obj` | Blocking; OS-thread backoff; best for Ractors and plain Threads |
+| `async_push(obj, timeout: nil)` | `self` | Fiber-scheduler-aware blocking push; yields via `sleep(0)`; best inside `Async { }` blocks |
+| `async_pop(timeout: nil)` | `obj` | Fiber-scheduler-aware blocking pop; yields via `sleep(0)`; best inside `Async { }` blocks |
 | `size` | Integer | Approximate element count |
 | `empty?` | Boolean | Approximate |
 | `full?` | Boolean | Approximate |
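The `try_pop` row above relies on a sentinel object rather than `nil`. A minimal plain-Ruby sketch of that pattern (a toy `Array`-backed stand-in, not the gem's C-backed queue):

```ruby
# A unique frozen object can never collide with user data, so it can safely
# mean "the queue was empty" even when nil itself is a legal element.
EMPTY = Object.new.freeze

def try_pop(buffer)
  buffer.empty? ? EMPTY : buffer.shift
end

buffer = [nil]                    # nil was legitimately pushed
v = try_pop(buffer)
v.equal?(EMPTY)                   # => false: we got the pushed nil back
try_pop(buffer).equal?(EMPTY)     # => true: now genuinely empty
```

Identity comparison (`equal?`) matters here: the sentinel is recognised by object identity, not value equality.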
@@ -248,6 +258,8 @@ Under MRI threads (no Ractors), Ruby's `Queue` is faster because the GVL makes l
 ```sh
 bundle exec ruby examples/01_basic_usage.rb # Ractor usage patterns
 bundle exec ruby examples/02_performance.rb # Throughput benchmarks
+bundle exec ruby examples/05_simd.rb        # TF-IDF scoring — SIMD fan-out pattern
+bundle exec ruby examples/06_pipeline.rb    # Semantic chunk ranking — MIMD pipeline pattern
 ```
 
 ---
@@ -268,6 +280,8 @@ bundle exec rake test # run the test suite
 |---|---|
 | [`examples/01_basic_usage.rb`](examples/01_basic_usage.rb) | Annotated Ractor usage patterns (1P1C, timeout, worker pool, pipeline, validate_shareable) |
 | [`examples/02_performance.rb`](examples/02_performance.rb) | Throughput benchmarks across queue topologies and Ractor counts |
+| [`examples/05_simd.rb`](examples/05_simd.rb) | SIMD fan-out: parallel TF-IDF scoring across W Ractors via `Parallel.map` |
+| [`examples/06_pipeline.rb`](examples/06_pipeline.rb) | MIMD pipeline: 2-stage chunk-rank pipeline via `Parallel.pipeline`, 6 Ractors per stage |
 | [`docs/superpowers/specs/2026-04-10-atomic-queue-design.md`](docs/superpowers/specs/2026-04-10-atomic-queue-design.md) | Original design specification (C extension architecture, Rice bindings, API design decisions) |
 | [`docs/superpowers/plans/`](docs/superpowers/plans/) | Implementation plans for each development phase |
 
@@ -5,6 +5,14 @@
 
 using namespace Rice;
 
+// Tell Rice how to mark a StandardQueue during GC.
+namespace Rice {
+  template<>
+  void ruby_mark<StandardQueue>(StandardQueue* data) {
+    if (data) data->mark();
+  }
+}
+
 // Global sentinel — a unique frozen Ruby Object used to signal "queue empty"
 // from c_try_pop. Pinned as a permanent GC root so it is never collected.
 VALUE g_empty_sentinel = Qnil; // Set in Init_ractor_queue
@@ -24,10 +32,15 @@ extern "C" void Init_ractor_queue() {
                    Arg("v").setValue())
     .define_method("c_try_pop", &StandardQueue::try_pop,
                    Return().setValue())
-    .define_method("capacity", &StandardQueue::capacity)
-    .define_method("was_size", &StandardQueue::was_size)
-    .define_method("was_empty", &StandardQueue::was_empty)
-    .define_method("was_full", &StandardQueue::was_full);
+    .define_method("capacity",  &StandardQueue::capacity)
+    .define_method("was_size",  &StandardQueue::was_size)
+    .define_method("was_empty", &StandardQueue::was_empty)
+    .define_method("was_full",  &StandardQueue::was_full)
+    // gc_unprotect(VALUE self_val): called once post-construction with the Ruby
+    // VALUE of the queue itself so rb_gc_writebarrier_unprotect can be applied.
+    // See standard_queue.h::gc_unprotect for the full rationale.
+    .define_method("_gc_unprotect", &StandardQueue::gc_unprotect,
+                   Arg("self_val").setValue());
 
   // Create the permanent EMPTY_SENTINEL object and pin it as a GC root.
   g_empty_sentinel = rb_obj_alloc(rb_cObject);
@@ -36,10 +49,8 @@ extern "C" void Init_ractor_queue() {
   rb_define_const(rb_cRQ, "EMPTY_SENTINEL", g_empty_sentinel);
 
   // Mark the wrapped C++ type as Ractor-shareable when frozen.
-  // Rice only sets RUBY_TYPED_FREE_IMMEDIATELY; we OR-in RUBY_TYPED_FROZEN_SHAREABLE
-  // so that Ractor.make_shareable(instance) succeeds after Ruby-side freeze.
   Data_Type<StandardQueue>::ruby_data_type()->flags |= RUBY_TYPED_FROZEN_SHAREABLE;
 
-  // Restore: methods defined after this point (by other code) are not auto-marked Ractor-safe.
+  // Restore: methods defined after this point are not auto-marked Ractor-safe.
   rb_ext_ractor_safe(false);
 }
@@ -1,6 +1,7 @@
 #pragma once
 #include <atomic_queue/atomic_queue.h>
 #include <ruby.h>
+#include <atomic>
 
 // Global sentinel — initialized in Init_ractor_queue, returned by try_pop when empty.
 // Never a valid user-pushed VALUE; only used by the Ruby layer to detect "empty."
@@ -9,17 +10,160 @@ extern VALUE g_empty_sentinel;
 class StandardQueue {
   atomic_queue::AtomicQueueB2<VALUE> q_;
 
+  // GC shadow: one std::atomic<VALUE> slot per queue capacity entry.
+  // When a heap-allocated VALUE is enqueued, we CAS it into a free slot.
+  // When it is dequeued, we CAS that slot back to Qnil.
+  // The dmark callback marks every non-Qnil slot, keeping queued objects alive.
+  //
+  // Design constraints:
+  // - No std::mutex: the GC can call dmark during stop-the-world while any
+  //   Ruby thread may be paused mid-push/pop, so a mutex could deadlock.
+  // - CAS operations are lock-free: safe from Ractors (no GVL required) and
+  //   from Threads; two pushers or a push+pop never corrupt the same slot.
+  // - Array is sized to the actual rounded-up capacity, so there are always
+  //   enough slots for the maximum in-flight item count.
+  std::atomic<VALUE>* gc_slots_;
+  unsigned gc_cap_;
+
+  // Rotating scan hints for try_push and try_pop.
+  //
+  // gc_slots are claimed/freed in roughly FIFO order (mirroring the queue),
+  // so a hint that advances with each successful operation gives O(1) amortised
+  // scan instead of a worst-case O(gc_cap_) scan from 0.
+  //
+  // push_hint_: position of the last successfully claimed slot + 1.
+  // After a full queue cycle (claim 0..gc_cap_-1, wrap), slot 0 is the first
+  // freed, and the hint is back at 0 — so the next push claim is O(1).
+  //
+  // pop_hint_: position of the last successfully freed slot + 1.
+  // Without this, all concurrent pop Ractors scan from slot 0, creating a
+  // thundering herd on the slot-0 cache line that dwarfs any scan savings.
+  // With the hint, each successful pop advances the starting position so
+  // concurrent workers naturally spread across different cache lines.
+  std::atomic<unsigned> push_hint_{0};
+  std::atomic<unsigned> pop_hint_{0};
+
 public:
-  explicit StandardQueue(unsigned capacity) : q_(capacity) {}
+  explicit StandardQueue(unsigned capacity)
+    : q_(capacity), gc_cap_(q_.capacity()) {
+    gc_slots_ = new std::atomic<VALUE>[gc_cap_];
+    for (unsigned i = 0; i < gc_cap_; i++)
+      gc_slots_[i].store(Qnil, std::memory_order_relaxed);
+  }
+
+  ~StandardQueue() { delete[] gc_slots_; }
 
   // Non-blocking push. Returns true if element was enqueued, false if full.
-  bool try_push(VALUE v) { return q_.try_push(v); }
+  //
+  // ORDERING: gc_slot is claimed BEFORE pushing to q_. This guarantees that any
+  // VALUE in q_ is always covered by a gc_slot — there is no window where an item
+  // is in the queue but unprotected from GC. Without this ordering, a concurrent
+  // pop could drain the item between q_.try_push and the gc_slot CAS, leaving a
+  // stale slot claimed forever. After ~gc_cap_ such races all slots fill up, new
+  // pushes lose GC coverage, and minor GC collects in-flight items (crash).
+  //
+  // FAST-PATH: was_full() exits before touching gc_slots_ on the hot retry path.
+  // The rotating push_hint_ makes the CAS scan O(1) amortised: since gc_slots
+  // are claimed in the same FIFO order as the queue, the hint always points at
+  // (or one step past) the oldest freed slot, which is almost always free.
+  bool try_push(VALUE v) {
+    if (RB_SPECIAL_CONST_P(v)) {
+      // Special consts (fixnum, symbol, true/false/nil) live outside the heap;
+      // they are never collected and need no gc_slot.
+      return q_.try_push(v);
+    }
+
+    // Skip gc_slot work entirely when the queue reports full.
+    if (q_.was_full()) return false;
+
+    // Scan from the hint position. O(1) amortised for FIFO workloads.
+    unsigned start = push_hint_.load(std::memory_order_relaxed) % gc_cap_;
+    int claimed = -1;
+    for (unsigned i = 0; i < gc_cap_; i++) {
+      unsigned idx = (start + i) % gc_cap_;
+      VALUE expected = Qnil;
+      if (gc_slots_[idx].compare_exchange_strong(
+            expected, v,
+            std::memory_order_release,
+            std::memory_order_relaxed)) {
+        claimed = (int)idx;
+        push_hint_.store((idx + 1) % gc_cap_, std::memory_order_relaxed);
+        break;
+      }
+    }
+
+    if (claimed < 0) {
+      // Every gc_slot is occupied: queue is at capacity.
+      return false;
+    }
+
+    bool ok = q_.try_push(v);
+    if (!ok) {
+      // Push failed (race: queue became full between was_full() and here).
+      // Release the slot and roll the hint back so the next push re-checks it.
+      gc_slots_[claimed].store(Qnil, std::memory_order_release);
+      push_hint_.store((unsigned)claimed, std::memory_order_relaxed);
+    }
+    return ok;
+  }
 
   // Non-blocking pop. Returns the VALUE if one was available,
   // or g_empty_sentinel if the queue was empty.
+  //
+  // Uses CAS (not load+store) to clear the gc_slot so that two concurrent pops
+  // of the same VALUE (pushed twice) always clear two DISTINCT slots. A plain
+  // load+store would let both pops target the same slot, leaving the other slot
+  // permanently occupied — a slow gc_slot leak that eventually fills all slots.
+  //
+  // Scans from pop_hint_: under high concurrency (e.g. 12 Ractor workers all
+  // calling try_pop), scanning from 0 concentrates every CAS retry on the same
+  // cache line — a thundering herd that collapses throughput. The rotating hint
+  // naturally spreads concurrent scanners across different cache lines, giving
+  // O(1) amortised scan with no coherency storm.
   VALUE try_pop() {
     VALUE v;
-    return q_.try_pop(v) ? v : g_empty_sentinel;
+    if (!q_.try_pop(v)) return g_empty_sentinel;
+    if (!RB_SPECIAL_CONST_P(v)) {
+      unsigned start = pop_hint_.load(std::memory_order_relaxed) % gc_cap_;
+      for (unsigned i = 0; i < gc_cap_; i++) {
+        unsigned idx = (start + i) % gc_cap_;
+        VALUE expected = v;
+        if (gc_slots_[idx].compare_exchange_strong(
+              expected, Qnil,
+              std::memory_order_release,
+              std::memory_order_relaxed)) {
+          pop_hint_.store((idx + 1) % gc_cap_, std::memory_order_relaxed);
+          break;
+        }
+      }
+    }
+    return v;
+  }
+
+  // Called by the GC dmark callback. Marks every occupied gc_slot so that
+  // queued heap-allocated objects survive the current GC cycle.
+  // GC is stop-the-world: no concurrent push/pop is possible here, so the
+  // relaxed loads are safe.
+  void mark() const noexcept {
+    for (unsigned i = 0; i < gc_cap_; i++) {
+      VALUE v = gc_slots_[i].load(std::memory_order_relaxed);
+      if (!RB_SPECIAL_CONST_P(v)) rb_gc_mark(v);
+    }
+  }
+
+  // Called once after construction from RactorQueue.new.
+  // Marks this object as write-barrier-unprotected so Ruby's generational GC
+  // (RGENGC) always scans gc_slots_ during minor GC.
+  //
+  // Without this, StandardQueue (an OLD object after a few GC cycles) never
+  // appears in the minor-GC remembered set. Any YOUNG Ruby objects pushed to
+  // the queue would not be traced by minor GC and could be collected before
+  // they are popped — causing crashes at any pop site.
+  //
+  // self_val must be the Ruby VALUE of this wrapper object, passed from the
+  // Ruby layer via instance._gc_unprotect(instance).
+  void gc_unprotect(VALUE self_val) {
+    rb_gc_writebarrier_unprotect(self_val);
   }
 
   unsigned capacity() const { return q_.capacity(); }
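The duplicate-value argument behind the CAS clear can be seen even single-threaded. A hypothetical Ruby toy (names invented for illustration; the real code uses `std::atomic` compare-exchange over `gc_slots_`) showing that clearing "the first slot holding this value" twice releases two distinct slots:

```ruby
# Two slots hold the same value because it was pushed twice. Each clear
# releases exactly one claim, mimicking the CAS semantics: a slot already
# cleared by the first call can no longer match the second call.
def clear_slot(slots, value)
  idx = slots.index(value)   # stand-in for the CAS scan over the slot array
  slots[idx] = nil if idx
  idx
end

slots = [:obj, :obj, nil, nil]
clear_slot(slots, :obj)      # => 0  (first claim released)
clear_slot(slots, :obj)      # => 1  (the other claim, not slot 0 again)
```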
@@ -34,6 +34,35 @@ class RactorQueue
     blocking_pop(timeout)
   end
 
+  # Fiber-scheduler-aware pop. Yields to the async reactor on every empty
+  # check via sleep(0) rather than spinning with Thread.pass first.
+  # Use inside Async { } blocks. Degrades gracefully to a near-no-op sleep
+  # in plain Thread context (no scheduler installed).
+  # Raises RactorQueue::TimeoutError if timeout expires.
+  def async_pop(timeout: nil)
+    deadline = timeout ? Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout : nil
+    loop do
+      result = c_try_pop
+      return result unless result.equal?(EMPTY_SENTINEL)
+      raise TimeoutError if deadline && Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
+      sleep(0)
+    end
+  end
+
+  # Fiber-scheduler-aware push. Yields to the async reactor on every full
+  # check via sleep(0) rather than spinning with Thread.pass first.
+  # Use inside Async { } blocks. Degrades gracefully in plain Thread context.
+  # Raises RactorQueue::TimeoutError if timeout expires.
+  def async_push(obj, timeout: nil)
+    validate_shareable!(obj) if @validate_shareable
+    deadline = timeout ? Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout : nil
+    loop do
+      return self if c_try_push(obj)
+      raise TimeoutError if deadline && Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
+      sleep(0)
+    end
+  end
+
   # Approximate current element count.
   def size = was_size
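Outside the gem, the same `sleep(0)`-based polling loop can be sketched standalone. `poll` and `PollTimeout` below are hypothetical names for this sketch; the lambda stands in for `c_try_pop`:

```ruby
class PollTimeout < StandardError; end

# Poll `source` until it yields a non-nil value. sleep(0) hands control to a
# fiber scheduler when one is installed (e.g. inside Async { }) and is a
# near-no-op under plain threads, matching async_pop's degradation behaviour.
def poll(source, timeout: nil)
  deadline = timeout && Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    value = source.call
    return value unless value.nil?
    raise PollTimeout if deadline && Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep(0)
  end
end

box = nil
t = Thread.new { sleep 0.05; box = :ready }
result = poll(-> { box }, timeout: 1.0)   # => :ready
t.join
```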
 
@@ -77,7 +106,7 @@ class RactorQueue
       raise TimeoutError if deadline && Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
       if spins < SPIN_THRESHOLD
         spins += 1
-        Thread.pass
+        Thread.pass # ~100ns cooperative yield; works in Ractors and Threads
       else
         sleep(SLEEP_INTERVAL)
       end
@@ -97,7 +126,7 @@ class RactorQueue
       raise TimeoutError if deadline && Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
       if spins < SPIN_THRESHOLD
         spins += 1
-        Thread.pass
+        Thread.pass # ~100ns cooperative yield; works in Ractors and Threads
       else
         sleep(SLEEP_INTERVAL)
       end
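The spin-then-sleep shape of the blocking paths can be sketched generically. `wait_until` and the two constants below are invented for this sketch and are not the gem's actual names or values:

```ruby
SPIN_PHASE    = 100     # assumed spin budget, not the gem's SPIN_THRESHOLD
SLEEP_QUANTUM = 0.0005  # assumed interval, not the gem's SLEEP_INTERVAL

# Busy-wait briefly with Thread.pass (cheap when the wait is short), then
# fall back to timed sleep so a long wait does not burn a whole core.
def wait_until(timeout = nil)
  deadline = timeout && Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  spins = 0
  until yield
    raise "timed out" if deadline && Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    if spins < SPIN_PHASE
      spins += 1
      Thread.pass          # cooperative yield; legal in Ractors and Threads
    else
      sleep(SLEEP_QUANTUM) # suspend the OS thread once spinning looks futile
    end
  end
end

flag = false
t = Thread.new { sleep 0.02; flag = true }
wait_until(1.0) { flag }   # returns once the other thread sets the flag
t.join
```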
@@ -13,6 +13,15 @@ class RactorQueue
   # @param validate_shareable [Boolean] Raise NotShareableError on non-shareable pushes.
   def self.new(capacity:, validate_shareable: false)
     instance = super(capacity)
+    # RGENGC write-barrier fix: mark the queue as WB-unprotected so minor GC
+    # always scans gc_slots_ and keeps queued young objects alive.
+    # Without this, pushing a young VALUE into an OLD StandardQueue creates an
+    # untracked old→young reference — minor GC never marks it, the VALUE is
+    # collected, and subsequent pops return garbage (crash at scale).
+    # We pass `instance` explicitly because rb_gc_writebarrier_unprotect needs
+    # the raw Ruby VALUE; there is no way to capture `self` as a VALUE inside
+    # a Rice member-function binding without an explicit argument.
+    instance._gc_unprotect(instance)
     instance.instance_variable_set(:@validate_shareable, validate_shareable)
     # Make the queue instance itself Ractor-shareable. This deep-freezes the Ruby
     # wrapper object. The C++ AtomicQueueB2 buffer is not affected by Ruby's freeze.
@@ -1,3 +1,3 @@
 class RactorQueue
-  VERSION = "0.1.0"
+  VERSION = "0.2.0"
 end
metadata CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: ractor_queue
 version: !ruby/object:Gem::Version
-  version: 0.1.0
+  version: 0.2.0
 platform: ruby
 authors:
 - Dewayne VanHoozer
@@ -65,6 +65,20 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: '5.0'
+- !ruby/object:Gem::Dependency
+  name: async
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
 description: A lock-free MPMC queue that can be shared across Ruby Ractors — the only
   Ractor-safe bounded queue option since Ruby's built-in Queue uses Mutex and cannot
   cross Ractor boundaries.
@@ -104,7 +118,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
   - !ruby/object:Gem::Version
     version: '0'
 requirements: []
-rubygems_version: 4.0.10
+rubygems_version: 4.0.11
specification_version: 4
summary: Ractor-shareable bounded queue for Ruby parallel workloads
test_files: []