kino 0.1.0

data/Cargo.toml ADDED
@@ -0,0 +1,15 @@
1
+ # This Cargo.toml is here to let external tools (IDEs, etc.) know that this is
2
+ # a Rust project. Your extension's dependencies should be added to the Cargo.toml
3
+ # in the ext/ directory.
4
+
5
+ [workspace]
6
+ members = ["./ext/kino"]
7
+ resolver = "2"
8
+
9
+ [profile.release]
10
+ # Keep debug symbols in release builds so the final binary stays debuggable.
11
+ debug = true
12
+ opt-level = 3
13
+ lto = "fat"
14
+ codegen-units = 1
15
+ incremental = false
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2026 Yaroslav Markin
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,384 @@
1
+ # Kino
2
+
3
+ **Kino** is a high-performance **Ractor** web server for Ruby 4.0+.
4
+
5
+ [![GitHub Release](https://img.shields.io/github/v/release/yaroslav/kino)](https://github.com/yaroslav/kino/releases)
6
+ [![Docs](https://img.shields.io/badge/yard-docs-blue.svg)](https://rubydoc.info/gems/kino)
7
+
8
+ Ruby threads cannot run Ruby code in parallel, so production setups fork
9
+ a process per core and pay for each copy in memory. Kino runs your code
10
+ on every core in **one small process**. A **Rust** (tokio + hyper)
11
+ front-end owns the network, parallel **Ractors** run your Rack 3 app,
12
+ and a threaded fallback mode runs everything else, Rails included.
13
+
14
+ * **Fast.** On a real 8-core server, every Kino mode is **1.4-2×** ahead
15
+ of a same-topology Puma cluster on I/O-light endpoints. Ractor mode
16
+ also wins on pure CPU. [Benchmarks](#benchmarks) below.
17
+ * **A fraction of the memory.** One process instead of a fork per core:
18
+ about **1/19th of the Puma cluster's memory** under the same load, and
19
+ about 1/8th when serving the Rails hello-world.
20
+ * **Parallel without forking.** Ractor mode runs CPU work **5×** faster
21
+ than Kino's own GVL-bound threaded mode, in the same small process.
22
+ * **Production plumbing included.** Graceful drain, crash supervision
23
+ and respawn, bounded queues with 503 backpressure, request timeouts,
24
+ TLS (rustls), live stats, async access and app logging.
25
+ * **Tells you why.** `kino --check` lists exactly what blocks your app
26
+ from ractor mode, finding by finding, so you do not have to decode
27
+ `Ractor::IsolationError` yourself.
28
+ * **Puma-shaped.** The same `workers × threads` topology, a familiar
29
+ config DSL, a `kino` CLI. If you can run Puma, you can run Kino.
30
+
31
+ **N.B.:** Ractors are officially **experimental** in Ruby 4.0, and so is this server. The threaded mode is solid. Still, Kino aims to be the best way to experiment with Ractors today—and the best Ractor server when they become stable.
32
+
33
+ ---
34
+
35
+ ## Table of Contents
36
+
37
+ - [Why](#why)
38
+ - [Benchmarks](#benchmarks)
39
+ - [Install](#install)
40
+ - [Usage](#usage)
41
+ - [Config file and CLI](#config-file-and-cli)
42
+ - [`kino --check`](#kino---check)
43
+ - [Request timeouts](#request-timeouts)
44
+ - [Stats](#stats)
45
+ - [Logging](#logging)
46
+ - [Timer waits](#timer-waits)
47
+ - [Rack 3 compliance](#rack-3-compliance)
48
+ - [Rails](#rails)
49
+
50
+ ## Why
51
+
52
+ The GVL (Global VM Lock) allows only one Ruby thread to run at a time. To use all cores,
53
+ Ruby servers fork processes, and every fork costs a full copy of the
54
+ app. Ractors do not have this limit: each one has its own lock, so one
55
+ process can run Ruby in parallel. What was missing is a server that
56
+ dispatches requests to them. Ruby 4.0 reworked Ractors (`Ractor::Port`,
57
+ `shareable_proc`, less lock contention) and made this worth building.
58
+
59
+ Why a Ractor server has to be built this way, and which Rust parts make
60
+ Ractors fast here: [doc/why-kino.md](doc/why-kino.md). The full design
61
+ notes live in [doc/architecture.md](doc/architecture.md).
62
+
63
+ ## Benchmarks
64
+
65
+ Measured on a real server: AWS **c7a.2xlarge** (8-core AMD EPYC 9R14,
66
+ 16 GB, Amazon Linux 2023). This is a realistic app-server size. The same
67
+ Ractor-shareable app runs on every server, Ruby 4.0.5 with YJIT, equal
68
+ topology (8 workers × 3 threads; Puma forks, Kino stays in one process).
69
+ Numbers are req/s measured with wrk (8-second windows, 64 connections, same host).
70
+ Methodology and the analysis behind every column:
71
+ [doc/benchmarks.md](doc/benchmarks.md).
72
+
73
+ | endpoint | Kino :ractor | + lanes | Kino :threaded | Puma (cluster) |
74
+ |-------------|-------------:|--------:|---------------:|---------------:|
75
+ | /plaintext | 201,472 | **241,501** | 218,348 | 117,838 |
76
+ | /10k | 156,635 | **183,564** | 153,442 | 106,666 |
77
+ | /cpu (fib) | 66,735¹| **70,373** | 13,298 | 58,207 |
78
+ | /io (5 ms) | 4,527²| 4,530 | **4,715** | 4,691 |
79
+ | /io_native | 4,714 | **4,717** | 4,709 | 4,692 |
80
+
81
+ Memory on the same box, RSS under load:
82
+
83
+ | serving | Kino (one process) | Puma cluster (8 workers) |
84
+ |-----------------------|-------------------:|-------------------------:|
85
+ | bench app, :ractor | **57 MB** | 1,078 MB |
86
+ | bench app, :threaded | **50 MB** | 1,078 MB |
87
+ | Rails hello-world | **97 MB** | 797 MB |
88
+
89
+ "+ lanes" is the experimental per-worker-queue dispatcher (`lanes true`).
90
+ It adds up to 20% over the shared queue on this hardware (/plaintext) and makes ractor
91
+ mode the fastest Kino configuration. Details:
92
+ [doc/benchmarks.md](doc/benchmarks.md#lane-dispatch-experimental-lanes-true).
93
+
94
+ ¹ Stock settings, no tuning. Ractor mode beats the fork cluster on pure
95
+ CPU by +15% (+21% with lanes). Threaded mode shows the GVL ceiling that
96
+ every single-process Ruby server hits. The CPU-tuning recipe that our
97
+ earlier Docker measurements needed makes no difference on real hardware
98
+ (+0.5%); see [doc/benchmarks.md](doc/benchmarks.md#cpu-bound-tuning).
99
+
100
+ ² The ractor timer tax is small on real hardware: −4% against threaded
101
+ mode (it was −18% in Docker). Wait-bound throughput is slots ÷ wait, and
102
+ Kino slots are threads, not processes. `workers 32, threads 1` measured
103
+ **5,922 /io (+27% over the cluster) and 6,254 /io_native (+34%)**, still
104
+ one small process. See
105
+ [doc/benchmarks.md](doc/benchmarks.md#why-io-lags-in-ractor-mode-on-linux).
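The "slots ÷ wait" rule from footnote ² can be checked with a few lines of arithmetic. This is a minimal sketch: the helper names `wait_bound_ceiling` and `share_of_ceiling` are hypothetical (not part of Kino), the 5 ms wait is the /io endpoint's, and the measured figures are copied from the table and footnote above.

```ruby
# Hypothetical helper: upper bound on req/s when every slot spends
# `wait_ms` milliseconds per request waiting (slots ÷ wait).
def wait_bound_ceiling(slots:, wait_ms:)
  slots * 1000 / wait_ms
end

# Share of the ceiling that a measured req/s figure reaches, in percent.
def share_of_ceiling(measured, ceiling)
  (measured * 100.0 / ceiling).round(1)
end

default_ceiling = wait_bound_ceiling(slots: 8 * 3, wait_ms: 5)  # 8 workers × 3 threads
wide_ceiling    = wait_bound_ceiling(slots: 32 * 1, wait_ms: 5) # workers 32, threads 1

puts default_ceiling                           # 4800
puts wide_ceiling                              # 6400
puts share_of_ceiling(4_715, default_ceiling)  # 98.2 (Kino :threaded, /io)
puts share_of_ceiling(5_922, wide_ceiling)     # 92.5 (workers 32, /io)
```

Under this reading, an 8 × 3 topology has 24 slots whether they are threads or forked processes, which would explain why the /io row is nearly flat across servers; adding slots is what moves the ceiling.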
106
+
107
+ A common first idea is to keep your current server and wrap the app in
108
+ a ractor pool. We measured that too (same box; the analysis is in the
109
+ doc):
110
+
111
+ | endpoint | Kino :ractor | Puma + ractor wrapper | Falcon + ractor wrapper |
112
+ |------------|-------------:|----------------------:|------------------------:|
113
+ | /plaintext | **201,472** | 19,425 | 100,624 |
114
+ | /cpu (fib) | **66,735** | 17,106 | 49,083 |
115
+ | /io (5 ms) | **4,527** | 1,447 | 1,549 |
116
+
117
+ In short: ractor mode reaches fork-level CPU parallelism (**5×** Kino's
118
+ own GVL-bound threaded mode) in one process, at about 1/19th of the
119
+ cluster's memory. Every Kino mode is 1.4-2× ahead of the cluster on
120
+ I/O-light endpoints. The macOS numbers (secondary; everything there hits
121
+ the loopback ceiling) and the YJIT × Ractors gotcha are in
122
+ [doc/benchmarks.md](doc/benchmarks.md).
123
+
124
+ Reproduce: `bench/run.sh [seconds] [concurrency]` for the main table,
125
+ `bench/studies.sh` for the follow-ups (CPU recipe, topology, scaling,
126
+ logging, memory).
127
+
128
+ ## Install
129
+
130
+ You need Ruby >= 4.0. Add Kino to your application's bundle:
131
+
132
+ ```sh
133
+ bundle add kino # or: gem install kino (outside a bundle)
134
+ ```
135
+
136
+ or put it in the `Gemfile` yourself:
137
+
138
+ ```ruby
139
+ gem "kino", "~> 0.1"
140
+ ```
141
+
142
+ Then generate a config and serve:
143
+
144
+ ```sh
145
+ bundle exec kino --init # writes kino.rb; every directive documented in place
146
+ bundle exec kino # picks up config.ru + kino.rb, serves on :9292
147
+ ```
148
+
149
+ (After a standalone `gem install`, the `kino` command works without
150
+ `bundle exec`.)
151
+
152
+ No Rust compiler needed: released versions ship precompiled native gems
153
+ for Linux (x86_64/aarch64, glibc and musl) and macOS (arm64). On other
154
+ platforms the gem compiles at install time; that needs a Rust toolchain,
155
+ plus clang/libclang on Linux.
156
+
157
+ ## Usage
158
+
159
+ ```ruby
160
+ require "kino"
161
+
162
+ # Ractor mode needs a Ractor-shareable app: capture nothing, freeze config.
163
+ app = Ractor.shareable_proc do |env|
164
+ [200, { "content-type" => "text/plain" }, ["Hello from #{Ractor.current}"]]
165
+ end
166
+
167
+ Kino::Server.run(app, port: 9292) # traps INT/TERM; Ctrl-C drains gracefully
168
+ ```
169
+
170
+ Or embedded, with everything spelled out:
171
+
172
+ ```ruby
173
+ server = Kino::Server.new(app,
174
+ bind: "127.0.0.1",
175
+ port: 9292, # 0 = ephemeral; read back via server.port
176
+ workers: Etc.nprocessors, # ractors (parallelism); add require "etc" if it is not already loaded
177
+ threads: 3, # threads per ractor (I/O concurrency, Puma-style)
178
+ mode: :auto, # :auto | :ractor | :threaded
179
+ queue_depth: 1024, # bounded queue; overflow → 503
180
+ queue_timeout: 1.0, # seconds before 503 on a full queue
181
+ request_timeout: nil, # seconds before a slow response becomes a 504 (nil = off)
182
+ shutdown_timeout: 30, # drain deadline
183
+ tls: { cert: "cert.pem", key: "key.pem" }, # file paths or inline PEM
184
+ )
185
+ server.start
186
+ server.shutdown # graceful: drain → deadline → abort stragglers
187
+ ```
188
+
189
+ ### Modes
190
+
191
+ - **`:ractor`**: `workers` Ractors × `threads` Threads each. The app must
192
+ be `Ractor.shareable?` (frozen middleware, `shareable_proc` endpoints).
193
+ Forcing `:ractor` with an unshareable app raises
194
+ `Kino::UnshareableAppError`. A crashed ractor returns 500 to its
195
+ in-flight requests right away, then respawns.
196
+ - **`:threaded`**: the same machinery on `workers × threads` plain
197
+ Threads. Runs **any** Rack app, including Rails, today. Parallel for
198
+ I/O, serialized by the GVL for CPU.
199
+ - **`:auto`** (default): `:ractor` when the app is shareable, otherwise
200
+ a warning and `:threaded`. One caveat: a *class* used as a Rack app
201
+ always counts as "shareable" (classes are), even if calling it touches
202
+ unshareable state. Force `:threaded` for those.
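The `:auto` rule and its class caveat can be reproduced with `Ractor.shareable?` alone. The sketch below is an illustration, not Kino's code: `pick_mode`, `CountingApp` and `StatelessApp` are hypothetical names, and the helper only assumes that `:auto` follows the shareability check as described above.

```ruby
# Hypothetical helper mirroring the :auto rule: ractor mode only when the
# app object itself passes Ractor.shareable?.
def pick_mode(app)
  Ractor.shareable?(app) ? :ractor : :threaded
end

# A lambda that captures a mutable local is not shareable.
cache = {}
capturing_app = ->(env) { [200, {}, [cache.fetch(env["PATH_INFO"], "miss")]] }

# A class-style app: the class passes the check even though it keeps an
# unshareable Hash in a class-level instance variable (the caveat above).
class CountingApp
  @hits = {}

  def self.call(env)
    @hits[env["PATH_INFO"]] = true
    [200, {}, ["ok"]]
  end
end

# A frozen instance without mutable state is shareable.
class StatelessApp
  def call(_env)
    [200, {}, ["ok"]]
  end
end

puts pick_mode(capturing_app)            # threaded
puts pick_mode(CountingApp)              # ractor (misleading: force :threaded)
puts pick_mode(StatelessApp.new.freeze)  # ractor
```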
203
+
204
+ ## Config file and CLI
205
+
206
+ Settings can live in a Puma-style Ruby DSL file. Precedence: explicit
207
+ kwargs and CLI flags > config file > defaults.
208
+
209
+ ```ruby
210
+ # kino.rb
211
+ port 9292
212
+ workers 8
213
+ threads 3
214
+ mode :ractor
215
+ ```
216
+
217
+ ```sh
218
+ kino --init # write a fully commented sample kino.rb
219
+ kino # config.ru + kino.rb, port 9292
220
+ kino --check # explain whether the app can run in :ractor mode
221
+ kino -C config/kino.rb -p 3000 -w 4 -m ractor my_app.ru
222
+ ```
223
+
224
+ The generated sample documents every directive, including the Rails
225
+ settings and the performance notes.
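The precedence rule amounts to a layered merge in which later sources win. A minimal sketch, assuming hypothetical names (`DEFAULTS`, `resolve_settings`) and illustrative default values; only port 9292 and mode `:auto` are stated as defaults in this README, and Kino's real resolution code may differ.

```ruby
# Illustrative defaults. Only port 9292 and mode :auto are documented
# defaults; the worker and thread counts here are assumptions.
DEFAULTS = { port: 9292, workers: 4, threads: 3, mode: :auto }.freeze

# Later sources win: defaults < config file < CLI flags and explicit kwargs.
def resolve_settings(file: {}, cli: {})
  DEFAULTS.merge(file).merge(cli)
end

from_file = { port: 9292, workers: 8, threads: 3, mode: :ractor } # the kino.rb above
from_cli  = { port: 3000, workers: 4 }                            # e.g. -p 3000 -w 4

settings = resolve_settings(file: from_file, cli: from_cli)
puts settings[:port]    # 3000   (CLI beats the file)
puts settings[:workers] # 4      (CLI beats the file)
puts settings[:mode]    # ractor (file beats the default)
```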
226
+
227
+ ## `kino --check`
228
+
229
+ When an app cannot run in `:ractor` mode, Kino can tell you why, instead
230
+ of leaving you with a bare `Ractor::IsolationError`. The check changes
231
+ nothing (it does not freeze your objects) and names each blocker:
232
+ captured variables with the place they were defined, instance variables
233
+ by path, and the class-level instance variable trap that catches
234
+ class-style apps:
235
+
236
+ ```
237
+ $ kino --check
238
+ check: app is NOT Ractor-shareable
239
+ - app (Proc at app.rb:12)—captures `cache` = {} (Hash) (unshareable)
240
+ - app (HelloApp).@instance—class-level ivar holds #<HelloApp…>—classes
241
+ pass Ractor.shareable?, but reading this from a worker ractor raises
242
+ Ractor::IsolationError on the first request
243
+ hints: freeze config at boot; build endpoints with Ractor.shareable_proc;
244
+ keep per-worker resources in Ractor.store_if_absent; or run mode :threaded.
245
+ ```
246
+
247
+ Exit status is 0/1, so it works in CI. The programmatic form is
248
+ `Kino::Check.report(app)`.
249
+
250
+ ## Request timeouts
251
+
252
+ `request_timeout: seconds` (or `request_timeout 30` in `kino.rb`) limits
253
+ how long the app may take to produce a response. Past the deadline the
254
+ client gets an immediate **504** while the handler keeps running; its
255
+ late response is dropped without harm. Off by default. The handler is
256
+ deliberately *not* killed, because interrupting arbitrary Ruby mid-flight
257
+ is unsafe. A stuck handler still occupies its worker slot until it
258
+ returns, so set the deadline above your slowest legitimate endpoint and
259
+ watch `stats[:timeouts]`.
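The behaviour described above (answer 504 at the deadline, leave the handler running) can be imitated in plain Ruby. This is only an illustration of the semantics: Kino enforces the deadline in its native front-end rather than with a Ruby thread per request, and `call_with_deadline` is a hypothetical helper.

```ruby
# Plain-Ruby illustration only; not how Kino implements request_timeout.
GATEWAY_TIMEOUT = [504, { "content-type" => "text/plain" }, ["Gateway Timeout"]].freeze

# Returns [response, handler_thread]. On a missed deadline the handler is
# deliberately left running instead of being killed.
def call_with_deadline(app, env, seconds)
  handler = Thread.new { app.call(env) }
  response = handler.join(seconds) ? handler.value : GATEWAY_TIMEOUT
  [response, handler]
end

slow_app = ->(_env) { sleep 0.2; [200, {}, ["late"]] }
fast_app = ->(_env) { [200, {}, ["ok"]] }

response, handler = call_with_deadline(slow_app, {}, 0.05)
puts response[0]     # 504
puts handler.alive?  # true: the handler still occupies its slot
handler.join         # it finishes on its own; the late response is discarded

puts call_with_deadline(fast_app, {}, 0.05)[0][0] # 200
```

The second `puts` is the reason for the advice above: a timed-out handler keeps consuming a worker slot until it returns.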
260
+
261
+ ## Stats
262
+
263
+ `server.stats` returns a live snapshot: the configuration plus counters
264
+ from the native layer (one relaxed atomic per request, no measurable
265
+ cost):
266
+
267
+ ```ruby
268
+ server.stats
269
+ # => {mode: :ractor, lanes: false, workers: 8, threads: 3, batch: 1,
270
+ # respawns: 0, queued: 0, in_flight: 2, served: 1041, rejected: 0,
271
+ # timeouts: 0}
272
+ # plus lane_depths: [...] when lane dispatch is on
273
+ ```
274
+
275
+ From the outside, `kill -USR1 <pid>` prints the same snapshot as one line
276
+ (pair it with `pidfile` to find the pid):
277
+
278
+ ```
279
+ Kino stats: mode=:ractor lanes=false workers=8 threads=3 batch=1 respawns=0 queued=0 in_flight=2 served=1041 rejected=0 timeouts=0
280
+ ```
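For scripts that consume the USR1 line, the format shown above looks like `key=value` pairs with values rendered by `#inspect`. The sketch below rebuilds and parses the sample line under that assumption; `stats_line` and `parse_stats_line` are hypothetical helpers, not Kino API.

```ruby
# Sample snapshot copied from the example above.
SAMPLE_STATS = { mode: :ractor, lanes: false, workers: 8, threads: 3, batch: 1,
                 respawns: 0, queued: 0, in_flight: 2, served: 1041, rejected: 0,
                 timeouts: 0 }.freeze

# Assumed format: key=value pairs using #inspect, which reproduces the
# sample line (note mode=:ractor with the leading colon).
def stats_line(stats)
  "Kino stats: " + stats.map { |key, value| "#{key}=#{value.inspect}" }.join(" ")
end

# Parses such a line back into a Hash of String keys and String values.
def parse_stats_line(line)
  line.delete_prefix("Kino stats: ").split.to_h { |pair| pair.split("=", 2) }
end

line = stats_line(SAMPLE_STATS)
puts line
puts parse_stats_line(line)["served"] # 1041
```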
281
+
282
+ ## Logging
283
+
284
+ With one log line per request, `Kino::Logger` sustained **2.4× the
285
+ throughput of a shared `::Logger`** (151k vs 63k req/s on the benchmark
286
+ box). There are two native pieces. Both write through a lock-free
287
+ channel to a Rust flusher thread, so request threads never take a log
288
+ mutex and never make a write syscall:
289
+
290
+ - **Access log** (`log_requests true`): one line per request to stdout,
291
+ including the 503s that never reach your app. On color terminals the
292
+ lines are tinted by status class: 2xx green, 3xx yellow, 4xx maroon,
293
+ 5xx bright red:
294
+
295
+ ```
296
+ 127.0.0.1 [Tue, 10 Jun 2026 13:39:56 GMT] "GET / HTTP/1.1" 200 0.1ms
297
+ ```
298
+
299
+ - **`Kino::Logger`**: a `::Logger` over the same async sink, for your
300
+ app's own logging (`Kino::Logger.new("log/production.log")`, or no
301
+ argument for stdout). The raw IO-like device is `Kino::Logger::Device`,
302
+ for integrations that want bytes without `::Logger` formatting. The
303
+ device is frozen and Ractor-shareable, so one device serves every
304
+ worker.
305
+
306
+ `Kino::Logger` in a **Rails** app: it is a real `::Logger` subclass, so
307
+ it fits anywhere Rails expects a logger:
308
+
309
+ ```ruby
310
+ # config/environments/production.rb, simplest forms:
311
+ config.logger = Kino::Logger.new # stdout
312
+ config.logger = Kino::Logger.new("log/production.log") # file
313
+ # both file and stdout:
314
+ config.logger = ActiveSupport::BroadcastLogger.new(
315
+ Kino::Logger.new("log/production.log"), Kino::Logger.new
316
+ )
317
+ # tagged logging wraps it like any ::Logger:
318
+ config.logger = ActiveSupport::TaggedLogging.new(Kino::Logger.new)
319
+ ```
320
+
321
+ From a plain **Rack** app, give middleware the logger, or hand
322
+ `Rack::CommonLogger` the raw device (it just calls `write`):
323
+
324
+ ```ruby
325
+ # config.ru
326
+ use Rack::CommonLogger, Kino::Logger::Device.new # access-style app log
327
+ run MyApp
328
+ ```
329
+
330
+ (If you only want request lines, prefer Kino's own `log_requests true`.
331
+ It is free for your Ruby threads, and it also sees the 503s that never
332
+ reach Rack.)
333
+
334
+ Graceful shutdown drains both logs fully. A hard crash can lose the tail
335
+ of the buffer, and when you log faster than the disk can take (over 100k
336
+ lines/s), the sink drops lines instead of blocking request threads.
337
+ These trade-offs are measured in
338
+ [doc/benchmarks.md](doc/benchmarks.md#logging-costs).
339
+
340
+ ## Timer waits
341
+
342
+ `Kino.sleep(seconds)` is a high-resolution sleep on the OS clock with
343
+ the GVL released. MRI's own `sleep` wakes up late inside non-main
344
+ ractors (details and numbers in [doc/benchmarks.md](doc/benchmarks.md)).
345
+ Use `Kino.sleep` for explicit timer waits in handlers. Ordinary blocking
346
+ I/O does not need it.
347
+
348
+ ## Rack 3 compliance
349
+
350
+ The spec suite runs every test app under `Rack::Lint` over real sockets:
351
+ streaming request bodies (forward-only `rack.input`), enumerable and
352
+ callable (full-duplex stream) response bodies, lowercase and multi-value
353
+ headers, HEAD/204 semantics. Full hijack is left out on purpose; it is
354
+ optional in Rack 3.
355
+
356
+ ## Rails
357
+
358
+ Rails (edge) runs on Kino today in `:threaded` mode; see
359
+ `examples/rails-hello`. Ractor-mode Rails is blocked upstream. The exact
360
+ blockers, the `Ruby::Box` findings, and what would unlock it are written
361
+ up in [doc/rails-on-ractors.md](doc/rails-on-ractors.md). The example
362
+ ships a probe script that re-tests against whatever Rails you bundle.
363
+
364
+ ## Development
365
+
366
+ ```sh
367
+ bin/setup
368
+ bundle exec rake # compile, Rust tests, specs, RBS, lint
369
+ RB_SYS_CARGO_PROFILE=dev bundle exec rake compile # fast dev rebuilds
370
+ ```
371
+
372
+ ## Assisted by
373
+
374
+ Claude Code (Mythos, Opus).
375
+
376
+ ## Contributing
377
+
378
+ Bug reports and pull requests are welcome on GitHub at
379
+ https://github.com/yaroslav/kino.
380
+
381
+ ## License
382
+
383
+ The gem is available as open source under the terms of the
384
+ [MIT License](https://opensource.org/licenses/MIT).
data/doc/README.md ADDED
@@ -0,0 +1,6 @@
1
+ # Extra documentation
2
+
3
+ This folder is dedicated to architectural decisions, discussions, and
4
+ benchmark results.
5
+
6
+ Almost all content here is written by agents (Claude Code or Codex).
@@ -0,0 +1,161 @@
1
+ # Architecture
2
+
3
+ ```
4
+ tokio (Rust threads) Ruby
5
+ ┌──────────────────────────┐
6
+ │ accept loop (hyper) │ bounded MPMC ┌─ worker: Ractor × threads ─┐
7
+ │ per request: │ ──── queue ───────► │ loop { │
8
+ │ parse → RequestCtx │ │ env = take_one │ ← blocks with the
9
+ │ queue full → 503 │ ◄─── response ───── │ status,h,b = app.(env) │ per-ractor lock
10
+ │ TLS (rustls) │ ◄─── body chunks ── │ respond / stream │ RELEASED
11
+ └──────────────────────────┘ └────────────────────────────┘
12
+ ```
13
+
14
+ All network I/O lives in Rust on a tokio multi-threaded runtime; hyper
15
+ parses HTTP/1.1 and handles keep-alive; rustls terminates TLS. Ruby never
16
+ touches a socket. Each request becomes a Rust-side `RequestCtx` pushed to a
17
+ bounded flume MPMC queue; Ruby workers pull from it.
18
+
19
+ ## Topology
20
+
21
+ Puma-style two-level: `workers × threads`.
22
+
23
+ - `:ractor` mode—`workers` Ractors, each running `threads` Ruby Threads
24
+ over the same worker loop. Parallel across ractors (each has its own VM
25
+ lock); concurrent within one only for I/O-bound handlers.
26
+ - `:threaded` mode—the same total capacity as plain Threads on the main
27
+ ractor. Runs any Rack app; the GVL serializes CPU work.
28
+ - Identical machinery either way: the flume queue is MPMC, a "worker slot"
29
+ is per-thread, and the worker loop (`lib/kino/worker.rb`) is shared
30
+ verbatim.
31
+ - Experimental `lanes true` replaces the one shared queue with a small
32
+ private queue per worker slot (awake-preferring dispatch, work
33
+ stealing); see [benchmarks](benchmarks.md#lane-dispatch-experimental-lanes-true).
34
+
35
+ ## The Rust ↔ Ruby boundary
36
+
37
+ - **No native (TypedData) handle crosses a ractor boundary.** Worker
38
+ ractors receive plain integers (server id, worker ids) plus the
39
+ Ractor-shareable app; native state lives in a global Rust-side registry
40
+ keyed by those ids. The per-request handle
41
+ (`Kino::Native::Request`, a TypedData object) is created *inside* the
42
+ worker ractor by the take calls (`take_one`/`take_batch`), so its
43
+ ownership is correct by construction.
44
+ - **Blocking discipline:** every blocking native call goes through
45
+ `rb_thread_call_without_gvl` (rb-sys; magnus doesn't wrap it) so a
46
+ blocked worker holds no VM lock. Waits poll an atomic interrupt flag
47
+ between bounded `recv_timeout` ticks; the unblock function (UBF) just
48
+ sets the flag. `flume::Selector` lost wakeups under sustained load
49
+ (workers went permanently deaf to a non-empty queue after ~100k
50
+ requests) and is not used anywhere.
51
+ - **Fast path:** when a request is already queued, `take_one` takes it
52
+ with `try_recv` while still holding the GVL—the release/reacquire pair
53
+ (two scheduler round-trips) is skipped entirely. Under load this is the
54
+ common case.
55
+ - **Fused crossing:** the common complete-body response rides
56
+ `respond_and_take_one`: answer the previous request and take the next in
57
+ one FFI call, ~one crossing per request once the loop is warm. The env
58
+ Hash carries the request handle under `env["kino.request"]`, so no
59
+ per-request pair array exists either.
60
+ - **Env construction:** one FFI call builds the full CGI side of the Rack
61
+ env as a real Hash. Static keys, common methods/protocols and 44 common
62
+ `HTTP_*` header names come from a frozen (and therefore Ractor-shareable)
63
+ string cache built once at init on the main ractor. Frozen keys also
64
+ skip the dup that `Hash#[]=` performs on unfrozen string keys. Only
65
+ `rack.input` is lazy/streaming.
66
+ - **Response path:** the Rack headers Hash is passed through as-is and
67
+ iterated on the Rust side (`RHash#foreach`); header bytes are borrowed
68
+ in place from rooted Ruby strings (safe: GVL held, hyper copies
69
+ immediately). Single-chunk bodies skip the join copy.
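The note on frozen keys in the env-construction bullet relies on standard `Hash#[]=` behaviour, which can be observed from Ruby directly. A small demonstration with made-up header names (the real cache is built on the Rust side):

```ruby
# Hash#[]= copies and freezes an unfrozen String key, but stores an
# already frozen String key as the very same object.
env = {}

unfrozen_key = String.new("HTTP_X_REQUEST_ID")
env[unfrozen_key] = "abc"
stored = env.keys.first
puts stored.equal?(unfrozen_key) # false: the Hash stored a frozen copy
puts stored.frozen?              # true

frozen_key = String.new("HTTP_ACCEPT").freeze
env[frozen_key] = "*/*"
puts env.keys.last.equal?(frozen_key) # true: no copy was made
puts Ractor.shareable?(frozen_key)    # true: a frozen String is shareable
```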
70
+
71
+ ## Backpressure, in both directions
72
+
73
+ - Bounded request queue between tokio and Ruby. When it stays full past
74
+ `queue_timeout`, the client gets an immediate 503 rather than waiting.
75
+ - Request bodies stream through a bounded(8) channel: hyper is only polled
76
+ as fast as Ruby consumes (inbound backpressure costs nothing extra).
77
+ Bodyless requests (most GETs) spawn no forwarder task at all.
78
+ - Response bodies stream through a bounded(8) channel the other way: a
79
+ slow client makes `write_chunk` block—with the GVL released.
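The first bullet (bounded queue, 503 once it has stayed full past `queue_timeout`) can be sketched with Ruby's `SizedQueue`. This is an analogue for illustration: Kino's real queue is the Rust-side flume channel, and `enqueue_or_reject` is a hypothetical helper.

```ruby
# Illustration with SizedQueue; not Kino's implementation.
def enqueue_or_reject(queue, request, queue_timeout)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + queue_timeout
  begin
    queue.push(request, true) # non-blocking push; raises ThreadError when full
    :queued
  rescue ThreadError
    return :rejected_503 if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep 0.001 # brief pause before trying again
    retry
  end
end

queue = SizedQueue.new(1)
puts enqueue_or_reject(queue, :req1, 0.05) # queued
puts enqueue_or_reject(queue, :req2, 0.05) # rejected_503: still full at the deadline

# With a worker draining the queue, the waiting request gets in.
worker = Thread.new { sleep 0.01; queue.pop }
puts enqueue_or_reject(queue, :req3, 0.5)  # queued
worker.join
```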
80
+
81
+ ## Failure handling
82
+
83
+ Three parties can answer a client, coordinated by an atomic
84
+ first-claimant-wins flag on the per-request `Responder`:
85
+
86
+ 1. The app, via the worker loop (normal path; `StandardError` is rescued
87
+ in Ruby and becomes a clean 500).
88
+ 2. The supervisor: each worker ractor has a supervisor thread blocked in
89
+ `Ractor#value`. A hard crash (any `Exception`) wakes it; it immediately
90
+ 500s the crashed ractor's in-flight requests via a `Weak<Responder>`
91
+ side table—not when GC eventually notices—and respawns the ractor
92
+ with fresh slots.
93
+ 3. A `Drop` guard on `RequestCtx` as the universal backstop (GC of an
94
+ abandoned handle, teardown races). The Drop path never touches the Ruby
95
+ API, so it is safe from any thread.
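The first-claimant-wins coordination among the three parties above can be modelled in a few lines. This is a Ruby analogue only: the real `Responder` is Rust and uses an atomic flag, while the hypothetical `ClaimFlag` below uses a `Mutex` to get the same "exactly one winner" property.

```ruby
# Ruby analogue of a first-claimant-wins flag; a Mutex stands in for the
# Rust atomic.
class ClaimFlag
  attr_reader :claimed_by

  def initialize
    @lock = Mutex.new
    @claimed_by = nil
  end

  # Returns true for exactly one caller and false for everyone after it.
  def claim(party)
    @lock.synchronize do
      return false if @claimed_by
      @claimed_by = party
      true
    end
  end
end

flag = ClaimFlag.new
parties = %i[app supervisor drop_guard]
results = parties.map { |party| Thread.new { [party, flag.claim(party)] } }.map(&:value)

winners = results.select { |_party, won| won }
puts winners.size                            # 1
puts winners.first.first == flag.claimed_by  # true
```

Whichever party loses the claim simply does nothing, which is what lets the `Drop` backstop and a late worker response be no-ops.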
96
+
97
+ With `request_timeout` configured, the tokio front-end can additionally
98
+ answer with a 504 on its own when the response head misses the deadline;
99
+ the worker keeps running, and its late response goes nowhere harmlessly:
100
+ the front-end has stopped listening (the oneshot receiver is dropped),
101
+ and the worker's claim makes the Drop backstop a no-op.
102
+
103
+ Client aborts are handled the same way in reverse: hyper drops the request
104
+ future, and a Rust `Drop` guard keeps the in-flight counter honest (a
105
+ plain decrement after an `.await` would never run).
106
+
107
+ ## Graceful shutdown
108
+
109
+ `stop_accepting` → drain until queue + in-flight reach zero or the
110
+ deadline passes → `close_queue` (idle workers see Disconnected and exit) →
111
+ join workers → past deadline: abort remaining clients (a 500, or a
112
+ connection abort mid-stream), interrupt blocked workers, reap
113
+ stragglers → tear down the tokio runtime. Idempotent;
114
+ a second INT/TERM force-exits.
115
+
116
+ ## Timer waits: `Kino.sleep`
117
+
118
+ MRI's `sleep` parks the thread on the VM timer, whose wakeups inside
119
+ non-main ractors are coarse (how coarse is environment-dependent; see
120
+ [benchmarks](benchmarks.md#why-io-lags-in-ractor-mode-on-linux)).
121
+ `Kino.sleep` releases the GVL and waits on the OS clock directly, chunked
122
+ at the interrupt tick so `Thread#kill` and shutdown stay responsive.
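The "chunked at the interrupt tick" structure can be shown in plain Ruby. This sketch only illustrates the control flow: it uses Ruby's own `sleep`, which is exactly what `Kino.sleep` avoids inside non-main ractors, and the names `chunked_sleep`, `tick` and `interrupted` are hypothetical.

```ruby
def monotonic_now
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

# Sleeps up to `seconds`, waking every `tick` seconds to check `interrupted`
# so that a shutdown request is noticed within one tick.
def chunked_sleep(seconds, tick: 0.01, interrupted: -> { false })
  deadline = monotonic_now + seconds
  loop do
    return :interrupted if interrupted.call
    remaining = deadline - monotonic_now
    return :elapsed if remaining <= 0
    sleep([tick, remaining].min)
  end
end

puts chunked_sleep(0.03) # elapsed

stop = false
Thread.new { sleep 0.02; stop = true }
puts chunked_sleep(5, interrupted: -> { stop }) # interrupted, long before 5 s pass
```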
123
+
124
+ ## Why tokio (researched June 2026)
125
+
126
+ - **tokio + hyper**: the bottleneck is the Ruby dispatch boundary, not raw
127
+ I/O throughput; what matters is HTTP correctness, keep-alive, TLS, and
128
+ h2-later—hyper's territory. Cross-platform out of the box.
129
+ - **monoio**: thread-per-core io_uring looks great in echo-server
130
+ benchmarks, but hyper only works through its poll-io compat layer
131
+ (forfeiting io_uring on the hot path), and the share-nothing advantage
132
+ is spent the moment requests fan into an MPMC queue toward Ruby.
133
+ - **compio**: completion-based, cross-platform, production-proven—but no
134
+ first-class HTTP server story yet, and completion-model owned-buffer
135
+ semantics would leak into the request lifecycle design.
136
+ - **ntex**: the strongest alternative—unlike monoio/compio it has a
137
+ first-class HTTP/1.1 + HTTP/2 server stack (TechEmpower top tier) plus
138
+ an io_uring runtime ("neon") on Linux today. Rejected as the default
139
+ for now: its thread-per-core, `Rc`-based `!Send` worker model is
140
+ exactly what our Send-ctx-into-MPMC dispatch opts out of; its own
141
+ request/response/body types would force a conversion seam through
142
+ `Responder` and the streaming path; neon is Linux-only (ntex-on-tokio
143
+ elsewhere forfeits the io_uring win and just trades hyper's
144
+ battle-tested h1 for a less-deployed one); and the realistic gain is
145
+ confined to syscall-bound /plaintext-class traffic—the Ruby boundary,
146
+ not the front-end, is where Kino's time goes. Worth a contained
147
+ feature-flag spike if the Linux plaintext ceiling ever matters
148
+ competitively.
149
+ - **io_uring path**: tokio ships in-tree io_uring as an unstable feature
150
+ (file ops as of 1.52; network expected to follow). `server.rs` isolates
151
+ the runtime, so adopting it later is a contained change—and would
152
+ deliver most of ntex/neon's win without the type seam.
153
+
154
+ ## Versioning of risky dependencies
155
+
156
+ magnus is used for everything except the GVL-release primitives and the
157
+ `rb_ext_ractor_safe` flag, which go straight to rb-sys (magnus wraps
158
+ neither). magnus's lazy TypedData class cache is force-resolved at init
159
+ on the main ractor, so no worker ractor ever races its first resolution;
160
+ the only symbols the crate creates are made during `server_start`, also
161
+ on the main ractor.