fiber_stream 0.4.0 → 0.5.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 1cc93666d0610e659313a12dc756fca935579e62711972a9ea65d9ad818f6020
-   data.tar.gz: 97f315765ba573a5c047752fd083006db89404eff06e92bda5afbfa0f1933ed1
+   metadata.gz: 504a7400182e09bb66a5a07981bbb2cfab350397d09c23ea036e856f29afe1d2
+   data.tar.gz: 67c7ddead203d42bf869dcba7d840e7d09514d888b90b84b77d97a8cedc19e1e
  SHA512:
-   metadata.gz: 5e635531f9e34510ef0eab76254c33e70f36124be779f48d2f9692c351a2f26ed36c7c14fa0d1d596c5645b511bb4fa1abe2bb773fc434c1f4da39a6e19dbefc
-   data.tar.gz: 67eb50ae0a1d727c65fd172be572f7bf08ba82b3ee0cf4d9094c7e9ad579aac327155cc772bcd4e94b778cef53f63decf64a08e12e7ea5ef6949a6d813336b04
+   metadata.gz: f9c8d80faca53d9c6e059326bc9e6ffe7497aa8a6cbcad3b42eefe0ef3a3d8d29f08a3afbec8fc2b02b2064a745bb00ea644e816f913aa624a2ad11f1b087b99
+   data.tar.gz: 9e6f87b74cbfe2ccaaa9c32990f9786af3ee3ee3d7f2c3dd605a88efa8f3e436e4a1051d65c1eb1664859af212a2fecc82c66dc585cf9333dd799917c2a9035e
data/CHANGELOG.md CHANGED
@@ -1,5 +1,30 @@
  # Changelog
  
+ ## 0.5.0 - 2026-06-21
+
+ ### Added
+
+ - `Flow.tap { |element| ... }` for lazy pass-through observation without
+   changing emitted elements.
+ - `Flow.filter_map { |element| ... }` and `Source#filter_map` for combined
+   transformation and falsey-value dropping.
+ - `Flow.reject { |element| ... }` and `Source#reject` for complement
+   predicate filtering.
+ - `Flow.compact` and `Source#compact` for nil-only filtering while preserving
+   `false`.
+ - `Flow.map_concat { |element| enumerable }` and `Source#map_concat` for
+   one-to-many element expansion.
+ - `Flow.throttle(...)`, `Source#throttle(...)`, and `RateLimiter` for
+   pull-driven rate limiting with optional shared quota state.
+ - `Sink.count` for counting stream elements without accumulating them.
+
+ ### Changed
+
+ - Expanded README and website reference coverage for the flow operators and
+   rate limiter added in 0.5.0.
+ - Promoted completed flow product specs and design docs from draft to accepted
+   status.
+
  ## 0.4.0 - 2026-06-09
  
  ### Added
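Reviewer note on the 0.5.0 entries above: the operator semantics they describe line up with familiar core Ruby behavior. The sketch below is only an analogy built from plain `Array` methods. It does not load FiberStream, and the pairing of each core method with a FiberStream operator is an assumption drawn from the changelog wording, not from the gem's own tests.

```ruby
# Plain-Ruby analogy only: core Array methods standing in for the new
# FiberStream operators. No FiberStream code is loaded here.
values = [1, nil, false, 2, 3]

# filter_map analogue: transform, then drop falsey results (nil and false).
doubled = values.filter_map { |v| v && v * 2 }

# compact analogue: drop nil only, keep false.
compacted = values.compact

# reject analogue: drop elements for which the predicate is truthy.
odds = [1, 2, 3, 4].reject(&:even?)

# map_concat analogue: one-to-many expansion, flattened one level.
tokens = ["alpha beta", "gamma"].flat_map(&:split)

# Sink.count analogue: only the number of elements is kept.
element_count = [1, 2, 3].count
```

The eager `Array` methods differ from the stream stages in one respect that matters: they process the whole collection at once, whereas the changelog describes pull-driven stages.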
data/README.md CHANGED
@@ -1,7 +1,10 @@
  # FiberStream
- FiberStream is a Ruby library for linear stream processing with pull-based backpressure.
  
- It builds lazy Source definitions, transforms values with Flow stages, and materializes results with Sink objects.
+ FiberStream is a Ruby library for linear stream processing with pull-based
+ backpressure.
+
+ It builds lazy `Source` definitions, transforms values with `Flow` stages, and
+ materializes results with `Sink` objects.
  
  [![Gem Version](https://badge.fury.io/rb/fiber_stream.svg)](https://badge.fury.io/rb/fiber_stream)
  
@@ -30,11 +33,12 @@ Implemented capabilities:
  - in-memory, IO, FiberStream-owned Ractor producer, backpressure-aware Ractor
    port, and Ractor port merge sources
  - lazy source concatenation, zipping, and scheduler-backed merging
- - mapping, filtering, limiting, predicate-based limiting and dropping,
-   fixed-prefix dropping, fixed-size grouping, line splitting, buffering, async
-   boundaries, ordered and unordered parallel mapping, and ordered
-   Ractor-backed mapping
- - array, first-element, fold, foreach, and IO sinks
+ - mapping, filtering, transform-and-filter, nil compaction, side-effect
+   observation, one-to-many expansion, limiting, predicate-based limiting and
+   dropping, fixed-prefix dropping, fixed-size grouping, line splitting,
+   buffering, async boundaries, throttling, ordered and unordered parallel
+   mapping, and ordered Ractor-backed mapping
+ - array, first-element, count, fold, foreach, and IO sinks
  - reusable flow composition and runnable pipelines
  - foreground and scheduler-backed background pipeline execution
  - public RBS signatures
@@ -205,6 +209,62 @@ FiberStream::Source.each([" a ", "", " b "])
  # => ["a", "b"]
  ```
  
+ Use `reject` when the predicate names values to drop. Truthy predicate results
+ drop the original element; `false` and `nil` results pass it through unchanged:
+
+ ```ruby
+ result =
+   FiberStream::Source.each([1, 2, 3, 4])
+     .reject(&:even?)
+     .run_with(FiberStream::Sink.to_a)
+
+ result # => [1, 3]
+ ```
+
+ Use `filter_map` when filtering and transformation are one decision. Truthy
+ block results are emitted as transformed values; `false` and `nil` are
+ dropped:
+
+ ```ruby
+ ids =
+   FiberStream::Source.each([{ id: 1 }, {}, { id: 3 }])
+     .filter_map { |record| record[:id] }
+     .run_with(FiberStream::Sink.to_a)
+
+ ids # => [1, 3]
+ ```
+
+ Use `compact` to drop only `nil` while keeping `false`, and `map_concat` to
+ expand one upstream element into zero or more downstream elements:
+
+ ```ruby
+ tokens =
+   FiberStream::Source.each(["alpha beta", nil, "gamma"])
+     .compact
+     .map_concat { |line| line.split }
+     .run_with(FiberStream::Sink.to_a)
+
+ tokens # => ["alpha", "beta", "gamma"]
+ ```
+
+ Use `Flow.tap` for observation inside a reusable flow without changing the
+ element:
+
+ ```ruby
+ seen = []
+
+ observed =
+   FiberStream::Flow.tap { |value| seen << value }
+     .via(FiberStream::Flow.map { |value| value * 10 })
+
+ FiberStream::Source.each([1, 2])
+   .via(observed)
+   .run_with(FiberStream::Sink.to_a)
+ # => [10, 20]
+
+ seen # => [1, 2]
+ ```
+
  Use `parallel_map` for ordered scheduler-backed mapping when each element
  waits on non-blocking IO. It preserves input order while allowing up to
  `concurrency` mapping operations to be in flight:
@@ -293,6 +353,14 @@ FiberStream::Source.each([1, 2, 3])
  # => 6
  ```
  
+ Use `Sink.count` when only the number of elements matters:
+
+ ```ruby
+ FiberStream::Source.each([1, 2, 3])
+   .run_with(FiberStream::Sink.count)
+ # => 3
+ ```
+
  Use `Sink.foreach` when the terminal operation is a side effect and the stream
  values should not be accumulated:
  
@@ -473,12 +541,15 @@ does not provide CPU parallelism. Use producer ractors with
  `Source.ractor_producer` or `Source.ractor_merge_producers` when producer work
  needs true isolation.
  
- `Flow.buffer(count)` allows bounded prefetch. `Flow.async`, `Flow.buffer`,
+ `Flow.buffer(count)` allows bounded prefetch. `Flow.throttle(rate:, per:)`
+ paces elements before downstream side effects. `Flow.async`, `Flow.buffer`,
  `Flow.parallel_map`, `Flow.parallel_unordered_map`, `Source.io`,
  `Source#merge`, `Sink.io`, and `Pipeline#run_async` require an installed
  `Fiber.scheduler` and a non-blocking current fiber when demanded or started.
- FiberStream does not install a scheduler and does not depend on Async at
- runtime.
+ `Flow.throttle` requires that scheduler context only when it needs to wait.
+ Pass `throttle(limiter:)` with a `FiberStream::RateLimiter` when multiple
+ pipelines or repeated runs should share quota state. FiberStream does not
+ install a scheduler and does not depend on Async at runtime.
  
  ## API Surface
  
@@ -498,10 +569,14 @@ Source convenience methods:
  - `Source#zip(source)`
  - `Source#merge(source)`
  - `Source#map { |element| ... }`
+ - `Source#filter_map { |element| ... }`
+ - `Source#compact`
+ - `Source#map_concat { |element| enumerable }`
  - `Source#parallel_map(concurrency:) { |element| ... }`
  - `Source#parallel_unordered_map(concurrency:) { |element| ... }`
  - `Source#ractor_map(workers:, input_transfer: :copy, output_transfer: :copy) { |element| ... }`
  - `Source#select { |element| ... }`
+ - `Source#reject { |element| ... }`
  - `Source#take(count)`
  - `Source#drop(count)`
  - `Source#grouped(count)`
@@ -510,6 +585,8 @@ Source convenience methods:
  - `Source#drop_while { |element| ... }`
  - `Source#async`
  - `Source#buffer(count)`
+ - `Source#throttle(rate:, per: 1, burst: nil)`
+ - `Source#throttle(limiter:)`
  - `Source#lines(chomp: true, max_length: nil)`
  - `Source#split(separator, keep_separator: false, max_length: nil)`
  - `Source#to(sink)`
@@ -518,10 +595,15 @@ Source convenience methods:
  Flows:
  
  - `FiberStream::Flow.map { |element| ... }`
+ - `FiberStream::Flow.filter_map { |element| ... }`
+ - `FiberStream::Flow.compact`
+ - `FiberStream::Flow.map_concat { |element| enumerable }`
+ - `FiberStream::Flow.tap { |element| ... }`
  - `FiberStream::Flow.parallel_map(concurrency:) { |element| ... }`
  - `FiberStream::Flow.parallel_unordered_map(concurrency:) { |element| ... }`
  - `FiberStream::Flow.ractor_map(workers:, input_transfer: :copy, output_transfer: :copy) { |element| ... }`
  - `FiberStream::Flow.select { |element| ... }`
+ - `FiberStream::Flow.reject { |element| ... }`
  - `FiberStream::Flow.take(count)`
  - `FiberStream::Flow.drop(count)`
  - `FiberStream::Flow.grouped(count)`
@@ -530,10 +612,14 @@ Flows:
  - `FiberStream::Flow.drop_while { |element| ... }`
  - `FiberStream::Flow.async`
  - `FiberStream::Flow.buffer(count)`
+ - `FiberStream::Flow.throttle(rate:, per: 1, burst: nil)`
+ - `FiberStream::Flow.throttle(limiter:)`
  - `FiberStream::Flow.lines(chomp: true, max_length: nil)`
  - `FiberStream::Flow.split(separator, keep_separator: false, max_length: nil)`
  - `Flow#via(flow)`
  - `Flow#to(sink)`
+ - `FiberStream::RateLimiter.new(rate:, per: 1, burst: nil)`
+ - `FiberStream::RateLimiter#acquire(permits: 1)`
  
  `lines` and `split` default to `max_length: nil`, which allows one
  unterminated line or frame to buffer without bound. Set a positive
@@ -543,6 +629,7 @@ Sinks:
  
  - `FiberStream::Sink.to_a`
  - `FiberStream::Sink.first`
+ - `FiberStream::Sink.count`
  - `FiberStream::Sink.fold(initial) { |accumulator, element| ... }`
  - `FiberStream::Sink.foreach { |element| ... }`
  - `FiberStream::Sink.io(io, close: false, flush: false)`
@@ -13,6 +13,51 @@ module FiberStream
        new { |upstream| Pull.map(upstream, block) }
      end
  
+     # Creates a transform-and-filter flow.
+     #
+     # The block is called once for each upstream element observed by this
+     # stage. Truthy block results are emitted downstream as transformed values;
+     # false and nil results are dropped. Exceptions raised by the block fail the
+     # stream and are re-raised from `Source#run_with`.
+     def self.filter_map(&block)
+       raise ArgumentError, "missing block" unless block
+
+       new { |upstream| Pull.filter_map(upstream, block) }
+     end
+
+     # Creates a nil-dropping flow.
+     #
+     # The flow drops `nil` elements and passes every non-`nil` element through
+     # unchanged, including `false`.
+     def self.compact
+       new { |upstream| Pull.compact(upstream) }
+     end
+
+     # Creates a one-to-many mapping flow.
+     #
+     # The block is called once for each upstream element whose expansion is
+     # needed. It must return an object that responds to `#each`; yielded values
+     # are emitted in order before the next upstream element is pulled.
+     # Exceptions raised by the block or by the returned object's `#each` fail
+     # the stream and are re-raised from `Source#run_with`.
+     def self.map_concat(&block)
+       raise ArgumentError, "missing block" unless block
+
+       new { |upstream| Pull.map_concat(upstream, block) }
+     end
+
+     # Creates a pass-through observing flow.
+     #
+     # The block is called once for each element before that element is emitted
+     # downstream. The block return value is ignored and the original element is
+     # passed through unchanged. Exceptions raised by the block fail the stream
+     # and are re-raised from `Source#run_with`.
+     def self.tap(&block)
+       raise ArgumentError, "missing block" unless block
+
+       new { |upstream| Pull.tap(upstream, block) }
+     end
+
      # Creates an ordered scheduler-backed parallel mapping flow.
      #
      # The stage starts internal scheduled fibers on first downstream demand and
@@ -78,6 +123,19 @@ module FiberStream
        new { |upstream| Pull.select(upstream, block) }
      end
  
+     # Creates a complement filtering flow.
+     #
+     # The block is called for upstream elements until it returns `false` or
+     # `nil`, or upstream completes. Truthy predicate results drop the original
+     # element; false and nil results pass the element through unchanged.
+     # Exceptions raised by the block fail the stream and are re-raised from
+     # `Source#run_with`.
+     def self.reject(&block)
+       raise ArgumentError, "missing block" unless block
+
+       new { |upstream| Pull.reject(upstream, block) }
+     end
+
      # Creates a limiting flow.
      #
      # The flow emits at most `count` elements. `take(0)` completes without
@@ -180,6 +238,19 @@ module FiberStream
        new { |upstream| Pull.buffer(upstream, count) }
      end
  
+     # Creates a scheduler-aware throttling flow.
+     #
+     # The `rate:` form creates a fresh `RateLimiter` for each materialization.
+     # The `limiter:` form uses the supplied limiter object, which must respond
+     # to `acquire(permits:)` and return only after permits are acquired. When
+     # FiberStream-owned waiting is required, the current fiber must be
+     # non-blocking with an installed `Fiber.scheduler`.
+     def self.throttle(**options)
+       limiter = build_throttle_limiter(options)
+
+       new { |upstream| Pull.throttle(upstream, limiter.call) }
+     end
+
      # Creates a line-splitting flow.
      #
      # The flow accepts String chunks and emits lines split on "\n". By default
@@ -222,6 +293,38 @@ module FiberStream
        new(&attach)
      end
  
+     def self.build_throttle_limiter(options)
+       unknown_keywords = options.keys - [:rate, :per, :burst, :limiter]
+       raise ArgumentError, "unknown keywords: #{unknown_keywords.join(", ")}" unless unknown_keywords.empty?
+
+       rate_given = options.key?(:rate)
+       per_given = options.key?(:per)
+       burst_given = options.key?(:burst)
+       limiter_given = options.key?(:limiter)
+
+       if limiter_given
+         raise ArgumentError, "cannot pass rate and limiter together" if rate_given
+         raise ArgumentError, "cannot pass per with limiter" if per_given
+         raise ArgumentError, "cannot pass burst with limiter" if burst_given
+
+         limiter = options.fetch(:limiter)
+         raise TypeError, "limiter must respond to acquire" unless limiter.respond_to?(:acquire)
+
+         return -> { limiter }
+       end
+
+       raise ArgumentError, "missing rate or limiter" unless rate_given
+
+       rate = options.fetch(:rate)
+       per = options.fetch(:per, 1)
+       burst = options.fetch(:burst, nil)
+       RateLimiter.validate_options!(rate:, per:, burst:)
+
+       -> { RateLimiter.new(rate:, per:, burst:) }
+     end
+
+     private_class_method :build_throttle_limiter
+
      # Returns a reusable flow that applies this flow and then `flow`.
      #
      # Construction is lazy. No upstream stream is attached and no elements are
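Reviewer note on `build_throttle_limiter` above: it returns a lambda, not a limiter, and `throttle` invokes that lambda each time the flow is attached to an upstream. That appears to be what makes the `rate:` form build a fresh limiter per materialization while the `limiter:` form reuses one object. A minimal sketch of that factory pattern follows. `TinyLimiter` and `limiter_factory` are hypothetical names for illustration, not part of the gem.

```ruby
# Hypothetical stand-in for a limiter; only object identity matters here.
class TinyLimiter
  def acquire(permits: 1)
    nil
  end
end

# Mirrors the two return shapes in the hunk above: a lambda that always
# returns the caller's limiter, or one that builds a new limiter per call.
def limiter_factory(limiter: nil)
  return -> { limiter } if limiter

  -> { TinyLimiter.new }
end

shared = TinyLimiter.new
shared_factory = limiter_factory(limiter: shared)
fresh_factory = limiter_factory

# Each .call stands in for one materialization of the flow.
shared_runs = [shared_factory.call, shared_factory.call]
fresh_runs = [fresh_factory.call, fresh_factory.call]
```

In the shared case both runs see the same object and therefore the same quota state; in the fresh case each run gets its own.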
@@ -0,0 +1,39 @@
+ # frozen_string_literal: true
+
+ module FiberStream
+   module Pull
+     # Nil-dropping stage.
+     #
+     # A single downstream demand may pull multiple upstream elements until a
+     # non-nil value is observed or upstream completes. Dropped nil values are
+     # discarded immediately and are not buffered.
+     class Compact
+       def initialize(upstream)
+         @upstream = upstream
+         @closed = false
+         @done = false
+       end
+
+       def next
+         return DONE if @closed || @done
+
+         loop do
+           value = @upstream.next
+           if Pull.done?(value)
+             @done = true
+             return DONE
+           end
+
+           return value unless value.nil?
+         end
+       end
+
+       def close
+         return if @closed
+
+         @closed = true
+         @upstream.close
+       end
+     end
+   end
+ end
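Reviewer note on the stage above: the comment says one downstream demand may pull several upstream elements. The self-contained sketch below illustrates that with a counting source. `DONE`, `CountingSource`, and `NilDropper` are hypothetical stand-ins; the gem's real `Pull::DONE` sentinel and source classes are internal and not shown in this diff.

```ruby
# Hypothetical completion sentinel, compared by identity.
DONE = Object.new.freeze

# Hypothetical upstream that records how many times it was pulled.
class CountingSource
  attr_reader :pulls

  def initialize(values)
    @values = values.dup
    @pulls = 0
  end

  def next
    @pulls += 1
    @values.empty? ? DONE : @values.shift
  end
end

# Simplified nil-dropping stage: keeps pulling until a non-nil value
# or the completion sentinel arrives. Nothing is buffered.
class NilDropper
  def initialize(upstream)
    @upstream = upstream
  end

  def next
    loop do
      value = @upstream.next
      return value if value.equal?(DONE) || !value.nil?
    end
  end
end

source = CountingSource.new([nil, nil, false, 7])
stage = NilDropper.new(source)

first = stage.next                # two nils are skipped, false is kept
pulls_after_first = source.pulls  # three upstream pulls for one demand
second = stage.next
third = stage.next                # upstream is exhausted
```

The sketch omits the `@closed` and `@done` bookkeeping of the real stage, so it shows only the demand-to-pull relationship.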
@@ -0,0 +1,41 @@
+ # frozen_string_literal: true
+
+ module FiberStream
+   module Pull
+     # Transform-and-filter stage.
+     #
+     # A single downstream demand may pull multiple upstream elements until the
+     # transform returns a truthy value or upstream completes. Falsey transform
+     # results are discarded immediately and are not buffered.
+     class FilterMap
+       def initialize(upstream, transform)
+         @upstream = upstream
+         @transform = transform
+         @closed = false
+         @done = false
+       end
+
+       def next
+         return DONE if @closed || @done
+
+         loop do
+           value = @upstream.next
+           if Pull.done?(value)
+             @done = true
+             return DONE
+           end
+
+           result = @transform.call(value)
+           return result if result
+         end
+       end
+
+       def close
+         return if @closed
+
+         @closed = true
+         @upstream.close
+       end
+     end
+   end
+ end
@@ -0,0 +1,56 @@
+ # frozen_string_literal: true
+
+ module FiberStream
+   module Pull
+     # One-to-many mapping stage.
+     #
+     # It expands one upstream element into the values yielded by one returned
+     # `#each` object. Only one expansion is active at a time, and the stage
+     # never pulls the next upstream element until the active expansion is
+     # exhausted.
+     class MapConcat
+       def initialize(upstream, transform)
+         @upstream = upstream
+         @transform = transform
+         @current_enumerator = nil
+         @closed = false
+         @done = false
+       end
+
+       def next
+         return DONE if @closed || @done
+
+         loop do
+           if @current_enumerator
+             begin
+               return @current_enumerator.next
+             rescue StopIteration
+               @current_enumerator = nil
+             end
+           end
+
+           value = @upstream.next
+           if Pull.done?(value)
+             @done = true
+             return DONE
+           end
+
+           result = @transform.call(value)
+           unless result.respond_to?(:each)
+             raise TypeError, "map_concat block result must respond to each"
+           end
+
+           @current_enumerator = result.to_enum(:each)
+         end
+       end
+
+       def close
+         return if @closed
+
+         @closed = true
+         @current_enumerator = nil
+         @upstream.close
+       end
+     end
+   end
+ end
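Reviewer note on the stage above: it relies on `to_enum(:each)` to turn any `#each`-responding object into an external enumerator, then treats `StopIteration` as the signal that the current expansion is exhausted. The sketch below shows that core-Ruby mechanism in isolation. `Countdown` is a hypothetical example type, not something from the gem.

```ruby
# Hypothetical expansion object that only defines #each.
class Countdown
  def initialize(from)
    @from = from
  end

  def each
    @from.downto(1) { |n| yield n }
  end
end

# Object#to_enum wraps #each so values can be pulled one at a time.
enum = Countdown.new(3).to_enum(:each)

pulled = []
exhausted = false
begin
  # Ask for one more value than exists to trigger StopIteration.
  4.times { pulled << enum.next }
rescue StopIteration
  exhausted = true
end

# An empty expansion raises StopIteration on the first pull, which is how
# a one-to-many stage can emit zero elements for an upstream element.
empty_raised = false
begin
  [].to_enum(:each).next
rescue StopIteration
  empty_raised = true
end
```

Pulling through an external enumerator is what lets the stage emit one expanded value per downstream demand instead of materializing the whole expansion.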
@@ -0,0 +1,40 @@
+ # frozen_string_literal: true
+
+ module FiberStream
+   module Pull
+     # Complement filtering stage.
+     #
+     # A single downstream demand may pull multiple upstream elements until the
+     # predicate retains a value or upstream completes. Rejected elements are
+     # discarded immediately and are not buffered.
+     class Reject
+       def initialize(upstream, predicate)
+         @upstream = upstream
+         @predicate = predicate
+         @closed = false
+         @done = false
+       end
+
+       def next
+         return DONE if @closed || @done
+
+         loop do
+           value = @upstream.next
+           if Pull.done?(value)
+             @done = true
+             return DONE
+           end
+
+           return value unless @predicate.call(value)
+         end
+       end
+
+       def close
+         return if @closed
+
+         @closed = true
+         @upstream.close
+       end
+     end
+   end
+ end
@@ -0,0 +1,38 @@
+ # frozen_string_literal: true
+
+ module FiberStream
+   module Pull
+     # Stateless observing stage.
+     #
+     # It pulls one upstream element for each downstream demand, calls the
+     # observer for real elements, and emits the original element unchanged.
+     class Tap
+       def initialize(upstream, observer)
+         @upstream = upstream
+         @observer = observer
+         @closed = false
+         @done = false
+       end
+
+       def next
+         return DONE if @closed || @done
+
+         value = @upstream.next
+         if Pull.done?(value)
+           @done = true
+           return DONE
+         end
+
+         @observer.call(value)
+         value
+       end
+
+       def close
+         return if @closed
+
+         @closed = true
+         @upstream.close
+       end
+     end
+   end
+ end
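Reviewer note on the stage above: because the observer runs only when downstream demands an element, observation is as lazy as the rest of the pipeline. The sketch below illustrates the same property with core Ruby's `Enumerator::Lazy`, used here purely as an analogy. It is an assumption that FiberStream behaves the same way when combined with a limiting stage; the diff does not show such a test.

```ruby
seen = []

# A lazy map that records each value and returns it unchanged plays the
# role of a pass-through observer in this analogy.
observed = [1, 2, 3, 4].lazy.map do |value|
  seen << value
  value
end

# Nothing has been observed yet: building the lazy chain pulls nothing.
seen_before_demand = seen.dup

# Demanding two elements triggers exactly two observations.
taken = observed.first(2)
```

The observer's return value matters in this analogy because `map` emits it, whereas the stage in the diff ignores the block result and always emits the original element.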
@@ -0,0 +1,43 @@
+ # frozen_string_literal: true
+
+ module FiberStream
+   module Pull
+     # Pull-driven rate-limiting stage.
+     #
+     # The stage pulls at most one upstream value, acquires one permit, and then
+     # emits that value unless the stage was closed while waiting.
+     class Throttle
+       def initialize(upstream, limiter)
+         @upstream = upstream
+         @limiter = limiter
+         @closed = false
+         @done = false
+       end
+
+       def next
+         return DONE if @closed || @done
+
+         value = @upstream.next
+         if Pull.done?(value)
+           @done = true
+           return DONE
+         end
+
+         @limiter.acquire(permits: 1)
+         if @closed
+           @done = true
+           return DONE
+         end
+
+         value
+       end
+
+       def close
+         return if @closed
+
+         @closed = true
+         @upstream.close
+       end
+     end
+   end
+ end
@@ -85,6 +85,22 @@ module FiberStream
        Map.new(upstream, transform)
      end
  
+     def self.filter_map(upstream, transform)
+       FilterMap.new(upstream, transform)
+     end
+
+     def self.compact(upstream)
+       Compact.new(upstream)
+     end
+
+     def self.map_concat(upstream, transform)
+       MapConcat.new(upstream, transform)
+     end
+
+     def self.tap(upstream, observer)
+       Tap.new(upstream, observer)
+     end
+
      def self.parallel_map(upstream, concurrency, transform)
        ParallelMapBoundary.new(upstream, concurrency, transform)
      end
@@ -101,6 +117,10 @@ module FiberStream
        Select.new(upstream, predicate)
      end
  
+     def self.reject(upstream, predicate)
+       Reject.new(upstream, predicate)
+     end
+
      def self.take(upstream, count)
        Take.new(upstream, count)
      end
@@ -133,6 +153,10 @@ module FiberStream
        BufferBoundary.new(upstream, count)
      end
  
+     def self.throttle(upstream, limiter)
+       Throttle.new(upstream, limiter)
+     end
+
      def self.lines(upstream, chomp, max_length)
        Lines.new(upstream, chomp, max_length)
      end
@@ -154,7 +178,12 @@ require_relative "pull/concat"
  require_relative "pull/zip"
  require_relative "pull/merge"
  require_relative "pull/map"
+ require_relative "pull/filter_map"
+ require_relative "pull/compact"
+ require_relative "pull/map_concat"
+ require_relative "pull/tap"
  require_relative "pull/select"
+ require_relative "pull/reject"
  require_relative "pull/take"
  require_relative "pull/drop"
  require_relative "pull/grouped"
@@ -165,6 +194,7 @@ require_relative "pull/lines"
  require_relative "pull/split"
  require_relative "pull/async_boundary"
  require_relative "pull/buffer_boundary"
+ require_relative "pull/throttle"
  require_relative "pull/parallel_map_boundary"
  require_relative "pull/parallel_unordered_map_boundary"
  require_relative "pull/ractor_map_boundary"
@@ -172,8 +202,8 @@ require_relative "pull/ractor_map_boundary"
  module FiberStream
    module Pull
      private_constant :Each, :IOSource, :RactorPortSource, :RactorMergePortsSource, :RactorProducerSource, :Concat,
-                      :Zip, :Merge, :Map, :Select, :Take, :Drop, :Grouped, :Scan, :TakeWhile, :DropWhile, :Lines, :Split,
-                      :AsyncBoundary, :BufferBoundary, :ParallelMapBoundary, :ParallelUnorderedMapBoundary,
-                      :RactorMapBoundary
+                      :Zip, :Merge, :Map, :FilterMap, :Compact, :MapConcat, :Tap, :Select, :Reject, :Take, :Drop,
+                      :Grouped, :Scan, :TakeWhile, :DropWhile, :Lines, :Split, :AsyncBoundary, :BufferBoundary,
+                      :Throttle, :ParallelMapBoundary, :ParallelUnorderedMapBoundary, :RactorMapBoundary
    end
  end
@@ -0,0 +1,163 @@
+ # frozen_string_literal: true
+
+ module FiberStream
+   # Scheduler-aware token-bucket rate limiter.
+   #
+   # `rate` permits refill every `per` seconds. `burst` is the maximum token
+   # capacity and defaults to `rate`. Immediate permit grants do not require a
+   # scheduler. When FiberStream must sleep, the current fiber must be
+   # non-blocking with an installed `Fiber.scheduler`.
+   class RateLimiter
+     Request = Data.define(:rate, :per, :burst, :permits, :now)
+
+     class << self
+       def validate_options!(rate:, per: 1, burst: nil) # :nodoc:
+         rate = validate_rate(rate)
+         validate_duration(:per, per)
+         validate_burst(burst.nil? ? rate : burst)
+         nil
+       end
+
+       private
+
+       def validate_rate(rate)
+         raise TypeError, "rate must be an Integer" unless rate.is_a?(Integer)
+         raise ArgumentError, "rate must be positive" unless rate.positive?
+
+         rate
+       end
+
+       def validate_burst(burst)
+         raise TypeError, "burst must be an Integer" unless burst.is_a?(Integer)
+         raise ArgumentError, "burst must be positive" unless burst.positive?
+
+         burst
+       end
+
+       def validate_duration(name, duration)
+         raise TypeError, "#{name} must be Numeric" unless duration.is_a?(Numeric)
+         raise ArgumentError, "#{name} must be finite and real" unless finite_real?(duration)
+         raise ArgumentError, "#{name} must be positive" unless duration.positive?
+
+         duration
+       end
+
+       def finite_real?(value)
+         return false if value.is_a?(Complex)
+         return value.finite? if value.respond_to?(:finite?)
+
+         true
+       end
+     end
+
+     def initialize(rate:, per: 1, burst: nil, &block)
+       self.class.validate_options!(rate:, per:, burst:)
+
+       @rate = rate
+       @per = per
+       @burst = burst.nil? ? @rate : burst
+       @policy = block
+       @mutex = Mutex.new
+       @tokens = @burst.to_f
+       @updated_at = monotonic_now
+     end
+
+     # Acquires `permits`, waiting when necessary.
+     #
+     # Waits are scheduler-backed and non-blocking. Requests larger than `burst`
+     # are rejected because the local token bucket could never satisfy them.
+     def acquire(permits: 1)
+       permits = validate_permits(permits)
+
+       if @policy
+         acquire_with_policy(permits)
+       else
+         acquire_with_token_bucket(permits)
+       end
+
+       nil
+     end
+
+     private
+
+     def acquire_with_policy(permits)
+       loop do
+         wait = normalize_policy_wait(@policy.call(request_for(permits)))
+         return if wait <= 0
+
+         scheduler_sleep(wait)
+       end
+     end
+
+     def acquire_with_token_bucket(permits)
+       loop do
+         wait = nil
+
+         @mutex.synchronize do
+           refill_tokens
+
+           if @tokens >= permits
+             @tokens -= permits
+             return
+           end
+
+           wait = (permits - @tokens) / permits_per_second
+         end
+
+         scheduler_sleep(wait)
+       end
+     end
+
+     def request_for(permits)
+       Request.new(rate: @rate, per: @per, burst: @burst, permits:, now: monotonic_now)
+     end
+
+     def refill_tokens
+       now = monotonic_now
+       elapsed = now - @updated_at
+       @updated_at = now
+       @tokens = [@burst, @tokens + (elapsed * permits_per_second)].min
+     end
+
+     def permits_per_second
+       @rate.to_f / @per.to_f
+     end
+
+     def scheduler_sleep(duration)
+       validate_scheduler!
+       sleep(duration)
+     end
+
+     def validate_scheduler!
+       return if Fiber.scheduler && !Fiber.current.blocking?
+
+       message =
+         if Fiber.scheduler
+           "RateLimiter#acquire requires a non-blocking fiber"
+         else
+           "RateLimiter#acquire requires Fiber.scheduler"
+         end
+       raise SchedulerRequiredError, message
+     end
+
+     def validate_permits(permits)
+       raise TypeError, "permits must be an Integer" unless permits.is_a?(Integer)
+       raise ArgumentError, "permits must be positive" unless permits.positive?
+       raise ArgumentError, "permits must be less than or equal to burst" if permits > @burst
+
+       permits
+     end
+
+     def normalize_policy_wait(wait)
+       return 0 if wait.nil?
+       raise TypeError, "custom wait duration must be Numeric or nil" unless wait.is_a?(Numeric)
+       raise ArgumentError, "custom wait duration must be finite and real" unless self.class.send(:finite_real?, wait)
+
+       wait.positive? ? wait : 0
+     end
+
+     def monotonic_now
+       Process.clock_gettime(Process::CLOCK_MONOTONIC)
+     end
+   end
+ end
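Reviewer note on the token-bucket arithmetic above: `refill_tokens` adds `elapsed * rate / per` tokens capped at `burst`, and a short bucket waits `(permits - tokens) / (rate / per)` seconds. The sketch below restates those two formulas as pure functions with explicit inputs so the numbers can be checked by hand. The function names are hypothetical, and the clock, mutex, and scheduler sleep of the real class are deliberately left out.

```ruby
# Refill speed in permits per second, as in the class above.
def permits_per_second(rate, per)
  rate.to_f / per.to_f
end

# Token count after `elapsed` seconds, capped at the burst capacity.
def refill(tokens, elapsed, rate:, per:, burst:)
  [burst, tokens + (elapsed * permits_per_second(rate, per))].min
end

# Seconds to wait before `permits` tokens are available; zero if they
# already are.
def wait_for(permits, tokens, rate:, per:)
  return 0.0 if tokens >= permits

  (permits - tokens) / permits_per_second(rate, per)
end

# rate: 4 per 2 seconds means 2 permits per second.
after_quarter_second = refill(1.0, 0.25, rate: 4, per: 2, burst: 4)
after_long_idle = refill(1.0, 10.0, rate: 4, per: 2, burst: 4)
wait_when_empty = wait_for(1, 0.0, rate: 4, per: 2)
wait_when_stocked = wait_for(1, 3.0, rate: 4, per: 2)
```

With these inputs a quarter second adds half a token, a long idle period saturates at the burst of 4, and an empty bucket needs half a second to earn one permit.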
@@ -29,6 +29,22 @@ module FiberStream
        end
      end
  
+     # Creates a sink that counts all stream elements.
+     #
+     # The sink consumes upstream until normal completion and returns the number
+     # of elements observed. It does not store consumed elements.
+     def self.count
+       new do |stream|
+         count = 0
+
+         Pull.each_value(stream) do
+           count += 1
+         end
+
+         count
+       end
+     end
+
      # Creates a sink that folds all stream elements into an accumulator.
      #
      # The sink consumes upstream until normal completion. It returns the final
@@ -178,6 +178,35 @@ module FiberStream
178
178
  via(Flow.map(&block))
179
179
  end
180
180
 
181
+ # Returns a new source definition that emits truthy transformed values.
182
+ #
183
+ # This is a convenience wrapper around
184
+ # `via(FiberStream::Flow.filter_map { ... })` and has the same falsey-drop,
185
+ # lazy construction, error, and backpressure behavior as the underlying
186
+ # flow.
187
+ def filter_map(&block)
188
+ via(Flow.filter_map(&block))
189
+ end
190
+
191
+ # Returns a new source definition that drops nil elements.
192
+ #
193
+ # This is a convenience wrapper around `via(FiberStream::Flow.compact)` and
194
+ # preserves the same nil-only filtering, lazy construction, and
195
+ # backpressure behavior as the underlying flow.
196
+ def compact
197
+ via(Flow.compact)
198
+ end
199
+
200
+ # Returns a new source definition that emits each mapped expansion.
201
+ #
202
+ # This is a convenience wrapper around
203
+ # `via(FiberStream::Flow.map_concat { ... })` and has the same one-level
204
+ # flattening, lazy construction, error, and backpressure behavior as the
205
+ # underlying flow.
206
+ def map_concat(&block)
207
+ via(Flow.map_concat(&block))
208
+ end
209
+
181
210
  # Returns a new source definition that maps elements concurrently.
182
211
  #
183
212
  # This is a convenience wrapper around
@@ -226,6 +255,15 @@ module FiberStream
226
255
  via(Flow.select(&block))
227
256
  end
228
257
 
258
+ # Returns a new source definition that drops elements matching `block`.
259
+ #
260
+ # This is a convenience wrapper around
261
+ # `via(FiberStream::Flow.reject { ... })` and has the same truthiness and
262
+ # lazy construction behavior as the underlying flow.
263
+ def reject(&block)
264
+ via(Flow.reject(&block))
265
+ end
266
+
229
267
  # Returns a new source definition that emits at most `count` elements.
230
268
  #
231
269
  # This is a convenience wrapper around `via(FiberStream::Flow.take(count))`
@@ -297,6 +335,15 @@ module FiberStream
297
335
  via(Flow.buffer(count))
298
336
  end
299
337
 
338
+ # Returns a new source definition that rate-limits emitted elements.
339
+ #
340
+ # This is a convenience wrapper around `via(FiberStream::Flow.throttle(...))`.
341
+ # The `rate:` form creates a fresh default limiter for each materialization;
342
+ # pass `limiter:` to share quota state across sources or runs.
343
+ def throttle(**options)
344
+ via(Flow.throttle(**options))
345
+ end
346
+
300
347
  # Returns a new source definition that splits String chunks into lines.
301
348
  #
302
349
  # This is a convenience wrapper around
data/lib/fiber_stream/version.rb CHANGED
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module FiberStream
4
- VERSION = "0.4.0"
4
+ VERSION = "0.5.0"
5
5
  end
data/lib/fiber_stream.rb CHANGED
@@ -3,6 +3,7 @@
3
3
  require_relative "fiber_stream/pull"
4
4
  require_relative "fiber_stream/version"
5
5
  require_relative "fiber_stream/errors"
6
+ require_relative "fiber_stream/rate_limiter"
6
7
  require_relative "fiber_stream/internal/ractor_transfer_policy"
7
8
  require_relative "fiber_stream/ractor_port"
8
9
  require_relative "fiber_stream/ractor_producer"
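
Reviewer note: the new `fiber_stream/rate_limiter` file is required here, but its body is not part of this section. The following is only a minimal token-bucket sketch of what a limiter with a `rate:`/`per:`/`burst:` constructor and a blocking `acquire(permits:)` could look like. `SketchLimiter`, its `clock:` and `sleeper:` parameters, and the refill arithmetic are all assumptions made for illustration; they are not taken from the gem's implementation.

```ruby
# Hypothetical token-bucket sketch; NOT the gem's RateLimiter implementation.
# The clock: and sleeper: keywords are illustrative hooks for deterministic use.
class SketchLimiter
  def initialize(rate:, per: 1.0, burst: nil, clock: nil, sleeper: nil)
    @rate = rate
    @per = per.to_f
    @capacity = (burst || rate).to_f   # assumed default: burst equals rate
    @tokens = @capacity
    @clock = clock || -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }
    @sleeper = sleeper || ->(seconds) { sleep(seconds) }
    @last = @clock.call
  end

  # Waits (via the sleeper) until `permits` tokens are available, then returns nil.
  def acquire(permits: 1)
    raise ArgumentError, "permits exceeds capacity" if permits > @capacity

    loop do
      refill
      if @tokens >= permits
        @tokens -= permits
        return nil
      end
      # Sleep just long enough for the missing tokens to accrue.
      @sleeper.call((permits - @tokens) * @per / @rate)
    end
  end

  private

  # Adds tokens for the time elapsed since the last refill, capped at capacity.
  def refill
    now = @clock.call
    @tokens = [@capacity, @tokens + (now - @last) * @rate / @per].min
    @last = now
  end
end

# Deterministic usage with a fake clock: two permits are free, the third waits.
fake_now = 0.0
waits = []
limiter = SketchLimiter.new(
  rate: 2, per: 1.0,
  clock: -> { fake_now },
  sleeper: ->(seconds) { waits << seconds; fake_now += seconds }
)
3.times { limiter.acquire }
```

Keeping the bucket in a standalone object, rather than inside the flow, is one way to get the behavior the `Source#throttle` comment describes, where passing the same limiter to several sources shares quota state between them.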
data/sig/fiber_stream.rbs CHANGED
@@ -14,6 +14,19 @@ module FiberStream
14
14
  class PipelineCancelledError < RuntimeError
15
15
  end
16
16
 
17
+ class RateLimiter
18
+ class Request < Data
19
+ attr_reader rate: Integer
20
+ attr_reader per: Numeric
21
+ attr_reader burst: Integer
22
+ attr_reader permits: Integer
23
+ attr_reader now: Float
24
+ end
25
+
26
+ def initialize: (rate: Integer, ?per: Numeric, ?burst: Integer?) ?{ (Request request) -> Numeric? } -> void
27
+ def acquire: (?permits: Integer) -> nil
28
+ end
29
+
17
30
  class RactorPortSourceError < RuntimeError
18
31
  attr_reader kind: ractor_port_source_error_kind
19
32
  attr_reader cause_class_name: String
@@ -74,7 +87,11 @@ module FiberStream
74
87
  def parallel_map: [Out] (concurrency: Integer) { (Elem) -> Out } -> Source[Out]
75
88
  def parallel_unordered_map: [Out] (concurrency: Integer) { (Elem) -> Out } -> Source[Out]
76
89
  def ractor_map: [Out] (workers: Integer, ?input_transfer: ractor_transfer_policy, ?output_transfer: ractor_transfer_policy) { (Elem) -> Out } -> Source[Out]
90
+ def filter_map: [Out] () { (Elem) -> (Out | false | nil) } -> Source[Out]
91
+ def compact: () -> Source[Elem]
92
+ def map_concat: [Out] () { (Elem) -> Enumerable[Out] } -> Source[Out]
77
93
  def select: () { (Elem) -> boolish } -> Source[Elem]
94
+ def reject: () { (Elem) -> boolish } -> Source[Elem]
78
95
  def take: (Integer count) -> Source[Elem]
79
96
  def drop: (Integer count) -> Source[Elem]
80
97
  def grouped: (Integer count) -> Source[Array[Elem]]
@@ -83,6 +100,7 @@ module FiberStream
83
100
  def drop_while: () { (Elem) -> boolish } -> Source[Elem]
84
101
  def async: () -> Source[Elem]
85
102
  def buffer: (Integer count) -> Source[Elem]
103
+ def throttle: (?rate: Integer, ?per: Numeric, ?burst: Integer?, ?limiter: untyped) -> Source[Elem]
86
104
  def lines: (?chomp: bool, ?max_length: Integer?) -> Source[String]
87
105
  def split: (String separator, ?keep_separator: bool, ?max_length: Integer?) -> Source[String]
88
106
  def to: [Mat] (Sink[Elem, Mat] sink) -> Pipeline[Mat]
@@ -91,10 +109,15 @@ module FiberStream
91
109
 
92
110
  class Flow[In, Out]
93
111
  def self.map: [In, Out] () { (In) -> Out } -> Flow[In, Out]
112
+ def self.tap: [Elem] () { (Elem) -> void } -> Flow[Elem, Elem]
94
113
  def self.parallel_map: [In, Out] (concurrency: Integer) { (In) -> Out } -> Flow[In, Out]
95
114
  def self.parallel_unordered_map: [In, Out] (concurrency: Integer) { (In) -> Out } -> Flow[In, Out]
96
115
  def self.ractor_map: [In, Out] (workers: Integer, ?input_transfer: ractor_transfer_policy, ?output_transfer: ractor_transfer_policy) { (In) -> Out } -> Flow[In, Out]
116
+ def self.filter_map: [In, Out] () { (In) -> (Out | false | nil) } -> Flow[In, Out]
117
+ def self.compact: [Elem] () -> Flow[Elem, Elem]
118
+ def self.map_concat: [In, Out] () { (In) -> Enumerable[Out] } -> Flow[In, Out]
97
119
  def self.select: [Elem] () { (Elem) -> boolish } -> Flow[Elem, Elem]
120
+ def self.reject: [Elem] () { (Elem) -> boolish } -> Flow[Elem, Elem]
98
121
  def self.take: [Elem] (Integer count) -> Flow[Elem, Elem]
99
122
  def self.drop: [Elem] (Integer count) -> Flow[Elem, Elem]
100
123
  def self.grouped: [Elem] (Integer count) -> Flow[Elem, Array[Elem]]
@@ -103,6 +126,7 @@ module FiberStream
103
126
  def self.drop_while: [Elem] () { (Elem) -> boolish } -> Flow[Elem, Elem]
104
127
  def self.async: [Elem] () -> Flow[Elem, Elem]
105
128
  def self.buffer: [Elem] (Integer count) -> Flow[Elem, Elem]
129
+ def self.throttle: [Elem] (?rate: Integer, ?per: Numeric, ?burst: Integer?, ?limiter: untyped) -> Flow[Elem, Elem]
106
130
  def self.lines: (?chomp: bool, ?max_length: Integer?) -> Flow[String, String]
107
131
  def self.split: (String separator, ?keep_separator: bool, ?max_length: Integer?) -> Flow[String, String]
108
132
  def via: [Next] (Flow[Out, Next] flow) -> Flow[In, Next]
@@ -112,6 +136,7 @@ module FiberStream
112
136
  class Sink[In, Mat]
113
137
  def self.to_a: [Elem] () -> Sink[Elem, Array[Elem]]
114
138
  def self.first: [Elem] () -> Sink[Elem, Elem?]
139
+ def self.count: [Elem] () -> Sink[Elem, Integer]
115
140
  def self.fold: [Elem, Acc] (Acc initial) { (Acc, Elem) -> Acc } -> Sink[Elem, Acc]
116
141
  def self.foreach: [Elem] () { (Elem) -> void } -> Sink[Elem, Integer]
117
142
  def self.io: (untyped io, ?close: bool, ?flush: bool) -> Sink[String, Integer]
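
Reviewer note: the element-level semantics implied by the new `filter_map`, `compact`, `reject`, and `map_concat` signatures can be illustrated with plain Ruby. The sketch below uses `Enumerator::Lazy` as a stand-in and assumes the FiberStream operators behave like their `Enumerable` namesakes, with `flat_map` standing in for `map_concat`. It does not call the gem and says nothing about backpressure or materialization. `Enumerator::Lazy#compact` needs Ruby 3.1 or later.

```ruby
# Plain-Ruby analogue only: Enumerator::Lazy stands in for a FiberStream source.

# filter_map: transform each element, then drop falsey results (nil and false).
doubled_odds = [1, 2, 3, 4].lazy.filter_map { |n| n * 2 if n.odd? }.to_a

# compact: drop nil only; false is preserved.
compacted = [1, nil, 2, false, 3].lazy.compact.to_a

# reject: drop elements for which the block returns a truthy value.
rejected = [1, 2, 3, 4].lazy.reject(&:even?).to_a

# map_concat analogue (flat_map): expand each element into several outputs,
# flattening exactly one level, so nested arrays survive as elements.
expanded = [1, 2].lazy.flat_map { |n| [n, [n * 10]] }.to_a

p doubled_odds # [2, 6]
p compacted    # [1, 2, false, 3]
p rejected     # [1, 3]
p expanded     # [1, [10], 2, [20]]
```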
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: fiber_stream
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.4.0
4
+ version: 0.5.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Dai Akatsuka
@@ -128,14 +128,17 @@ files:
128
128
  - lib/fiber_stream/pull.rb
129
129
  - lib/fiber_stream/pull/async_boundary.rb
130
130
  - lib/fiber_stream/pull/buffer_boundary.rb
131
+ - lib/fiber_stream/pull/compact.rb
131
132
  - lib/fiber_stream/pull/concat.rb
132
133
  - lib/fiber_stream/pull/drop.rb
133
134
  - lib/fiber_stream/pull/drop_while.rb
134
135
  - lib/fiber_stream/pull/each.rb
136
+ - lib/fiber_stream/pull/filter_map.rb
135
137
  - lib/fiber_stream/pull/grouped.rb
136
138
  - lib/fiber_stream/pull/io_source.rb
137
139
  - lib/fiber_stream/pull/lines.rb
138
140
  - lib/fiber_stream/pull/map.rb
141
+ - lib/fiber_stream/pull/map_concat.rb
139
142
  - lib/fiber_stream/pull/merge.rb
140
143
  - lib/fiber_stream/pull/parallel_map_boundary.rb
141
144
  - lib/fiber_stream/pull/parallel_unordered_map_boundary.rb
@@ -143,14 +146,18 @@ files:
143
146
  - lib/fiber_stream/pull/ractor_merge_ports_source.rb
144
147
  - lib/fiber_stream/pull/ractor_port_source.rb
145
148
  - lib/fiber_stream/pull/ractor_producer_source.rb
149
+ - lib/fiber_stream/pull/reject.rb
146
150
  - lib/fiber_stream/pull/scan.rb
147
151
  - lib/fiber_stream/pull/select.rb
148
152
  - lib/fiber_stream/pull/split.rb
149
153
  - lib/fiber_stream/pull/take.rb
150
154
  - lib/fiber_stream/pull/take_while.rb
155
+ - lib/fiber_stream/pull/tap.rb
156
+ - lib/fiber_stream/pull/throttle.rb
151
157
  - lib/fiber_stream/pull/zip.rb
152
158
  - lib/fiber_stream/ractor_port.rb
153
159
  - lib/fiber_stream/ractor_producer.rb
160
+ - lib/fiber_stream/rate_limiter.rb
154
161
  - lib/fiber_stream/running_pipeline.rb
155
162
  - lib/fiber_stream/sink.rb
156
163
  - lib/fiber_stream/source.rb
@@ -162,7 +169,7 @@ licenses:
162
169
  metadata:
163
170
  allowed_push_host: https://rubygems.org
164
171
  homepage_uri: https://github.com/dakatsuka/fiber_stream
165
- source_code_uri: https://github.com/dakatsuka/fiber_stream/tree/v0.4.0
172
+ source_code_uri: https://github.com/dakatsuka/fiber_stream/tree/v0.5.0
166
173
  changelog_uri: https://github.com/dakatsuka/fiber_stream/blob/main/CHANGELOG.md
167
174
  rubygems_mfa_required: 'true'
168
175
  rdoc_options: []