kino 0.1.0
- checksums.yaml +7 -0
- data/.yardopts +14 -0
- data/CHANGELOG.md +54 -0
- data/Cargo.lock +993 -0
- data/Cargo.toml +15 -0
- data/LICENSE.txt +21 -0
- data/README.md +384 -0
- data/doc/README.md +6 -0
- data/doc/architecture.md +161 -0
- data/doc/benchmarks.md +321 -0
- data/doc/rails-on-ractors.md +50 -0
- data/doc/why-kino.md +91 -0
- data/exe/kino +26 -0
- data/ext/kino/Cargo.toml +49 -0
- data/ext/kino/build.rs +5 -0
- data/ext/kino/extconf.rb +6 -0
- data/ext/kino/src/env_strings.rs +318 -0
- data/ext/kino/src/gvl.rs +103 -0
- data/ext/kino/src/lib.rs +90 -0
- data/ext/kino/src/logsink.rs +155 -0
- data/ext/kino/src/queue.rs +207 -0
- data/ext/kino/src/registry.rs +268 -0
- data/ext/kino/src/request.rs +432 -0
- data/ext/kino/src/response.rs +214 -0
- data/ext/kino/src/server.rs +621 -0
- data/ext/kino/src/style.rs +87 -0
- data/ext/kino/src/test_support.rs +82 -0
- data/ext/kino/src/timer.rs +57 -0
- data/ext/kino/src/tls.rs +96 -0
- data/lib/kino/check.rb +199 -0
- data/lib/kino/cli.rb +254 -0
- data/lib/kino/configuration.rb +190 -0
- data/lib/kino/errors_stream.rb +25 -0
- data/lib/kino/input.rb +77 -0
- data/lib/kino/logger.rb +56 -0
- data/lib/kino/null_input.rb +37 -0
- data/lib/kino/ractor_supervisor.rb +103 -0
- data/lib/kino/server.rb +271 -0
- data/lib/kino/stream.rb +61 -0
- data/lib/kino/templates/kino.rb.tt +141 -0
- data/lib/kino/version.rb +6 -0
- data/lib/kino/worker.rb +124 -0
- data/lib/kino.rb +53 -0
- data/sig/kino.rbs +178 -0
- metadata +219 -0
data/Cargo.toml
ADDED
|
@@ -0,0 +1,15 @@
|
|
|
1
|
+
# This Cargo.toml is here to let external tools (IDEs, etc.) know that this is
|
|
2
|
+
# a Rust project. Your extension's dependencies should be added to the Cargo.toml
|
|
3
|
+
# in the ext/ directory.
|
|
4
|
+
|
|
5
|
+
[workspace]
|
|
6
|
+
members = ["./ext/kino"]
|
|
7
|
+
resolver = "2"
|
|
8
|
+
|
|
9
|
+
[profile.release]
|
|
10
|
+
# Keep debug symbols in release builds so the final binary stays debuggable.
|
|
11
|
+
debug = true
|
|
12
|
+
opt-level = 3
|
|
13
|
+
lto = "fat"
|
|
14
|
+
codegen-units = 1
|
|
15
|
+
incremental = false
|
data/LICENSE.txt
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
The MIT License (MIT)
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2026 Yaroslav Markin
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in
|
|
13
|
+
all copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
|
|
21
|
+
THE SOFTWARE.
|
data/README.md
ADDED
|
@@ -0,0 +1,384 @@
|
|
|
1
|
+
# Kino
|
|
2
|
+
|
|
3
|
+
**Kino** is a high-performance **Ractor** web server for Ruby 4.0+.
|
|
4
|
+
|
|
5
|
+
[Releases](https://github.com/yaroslav/kino/releases)
|
|
6
|
+
[Documentation](https://rubydoc.info/gems/kino)
|
|
7
|
+
|
|
8
|
+
Ruby threads cannot run Ruby code in parallel, so production setups fork
|
|
9
|
+
a process per core and pay for each copy in memory. Kino runs your code
|
|
10
|
+
on every core in **one small process**. A **Rust** (tokio + hyper)
|
|
11
|
+
front-end owns the network, parallel **Ractors** run your Rack 3 app,
|
|
12
|
+
and a threaded fallback mode runs everything else, Rails included.
|
|
13
|
+
|
|
14
|
+
* **Fast.** On a real 8-core server, every Kino mode is **1.4-2×** ahead
|
|
15
|
+
of a same-topology Puma cluster on I/O-light endpoints. Ractor mode
|
|
16
|
+
also wins on pure CPU. [Benchmarks](#benchmarks) below.
|
|
17
|
+
* **A fraction of the memory.** One process instead of a fork per core:
|
|
18
|
+
about **1/19th of the Puma cluster's memory** under the same load, and
|
|
19
|
+
about 1/8th when serving the Rails hello-world.
|
|
20
|
+
* **Parallel without forking.** Ractor mode runs CPU work **5×** faster
|
|
21
|
+
than Kino's own GVL-bound threaded mode, in the same small process.
|
|
22
|
+
* **Production plumbing included.** Graceful drain, crash supervision
|
|
23
|
+
and respawn, bounded queues with 503 backpressure, request timeouts,
|
|
24
|
+
TLS (rustls), live stats, async access and app logging.
|
|
25
|
+
* **Tells you why.** `kino --check` lists exactly what blocks your app
|
|
26
|
+
from ractor mode, finding by finding, so you do not have to decode
|
|
27
|
+
`Ractor::IsolationError` yourself.
|
|
28
|
+
* **Puma-shaped.** The same `workers × threads` topology, a familiar
|
|
29
|
+
config DSL, a `kino` CLI. If you can run Puma, you can run Kino.
|
|
30
|
+
|
|
31
|
+
**N.B.:** Ractors are officially **experimental** in Ruby 4.0, and so is this server. The threaded mode is solid. Still, Kino aims to be the best way to experiment with Ractors today—and the best Ractor server when they become stable.
|
|
32
|
+
|
|
33
|
+
---
|
|
34
|
+
|
|
35
|
+
## Table of Contents
|
|
36
|
+
|
|
37
|
+
- [Why](#why)
|
|
38
|
+
- [Benchmarks](#benchmarks)
|
|
39
|
+
- [Install](#install)
|
|
40
|
+
- [Usage](#usage)
|
|
41
|
+
- [Config file and CLI](#config-file-and-cli)
|
|
42
|
+
- [`kino --check`](#kino---check)
|
|
43
|
+
- [Request timeouts](#request-timeouts)
|
|
44
|
+
- [Stats](#stats)
|
|
45
|
+
- [Logging](#logging)
|
|
46
|
+
- [Timer waits](#timer-waits)
|
|
47
|
+
- [Rack 3 compliance](#rack-3-compliance)
|
|
48
|
+
- [Rails](#rails)
|
|
49
|
+
|
|
50
|
+
## Why
|
|
51
|
+
|
|
52
|
+
The GVL allows only one Ruby thread to run at a time. To use all cores,
|
|
53
|
+
Ruby servers fork processes, and every fork costs a full copy of the
|
|
54
|
+
app. Ractors do not have this limit: each one has its own lock, so one
|
|
55
|
+
process can run Ruby in parallel. What was missing is a server that
|
|
56
|
+
dispatches requests to them. Ruby 4.0 reworked Ractors (`Ractor::Port`,
|
|
57
|
+
`shareable_proc`, less lock contention) and made this worth building.
|
|
58
|
+
|
|
59
|
+
Why a Ractor server has to be built this way, and which Rust parts make
|
|
60
|
+
Ractors fast here: [doc/why-kino.md](doc/why-kino.md). The full design
|
|
61
|
+
notes live in [doc/architecture.md](doc/architecture.md).
|
|
62
|
+
|
|
63
|
+
## Benchmarks
|
|
64
|
+
|
|
65
|
+
Measured on a real server: AWS **c7a.2xlarge** (8-core AMD EPYC 9R14,
|
|
66
|
+
16 GB, Amazon Linux 2023). This is a realistic app-server size. The same
|
|
67
|
+
Ractor-shareable app runs on every server, Ruby 4.0.5 with YJIT, equal
|
|
68
|
+
topology (8 workers × 3 threads; Puma forks, Kino stays in one process).
|
|
69
|
+
Numbers are req/s by wrk (8-second windows, 64 connections, same host).
|
|
70
|
+
Methodology and the analysis behind every column:
|
|
71
|
+
[doc/benchmarks.md](doc/benchmarks.md).
|
|
72
|
+
|
|
73
|
+
| endpoint | Kino :ractor | + lanes | Kino :threaded | Puma (cluster) |
|
|
74
|
+
|-------------|-------------:|--------:|---------------:|---------------:|
|
|
75
|
+
| /plaintext | 201,472 | **241,501** | 218,348 | 117,838 |
|
|
76
|
+
| /10k | 156,635 | **183,564** | 153,442 | 106,666 |
|
|
77
|
+
| /cpu (fib) | 66,735¹| **70,373** | 13,298 | 58,207 |
|
|
78
|
+
| /io (5 ms) | 4,527²| 4,530 | **4,715** | 4,691 |
|
|
79
|
+
| /io_native | 4,714 | **4,717** | 4,709 | 4,692 |
|
|
80
|
+
|
|
81
|
+
Memory on the same box, RSS under load:
|
|
82
|
+
|
|
83
|
+
| serving | Kino (one process) | Puma cluster (8 workers) |
|
|
84
|
+
|-----------------------|-------------------:|-------------------------:|
|
|
85
|
+
| bench app, :ractor | **57 MB** | 1,078 MB |
|
|
86
|
+
| bench app, :threaded | **50 MB** | 1,078 MB |
|
|
87
|
+
| Rails hello-world | **97 MB** | 797 MB |
|
|
88
|
+
|
|
89
|
+
"+ lanes" is the experimental per-worker-queue dispatcher (`lanes true`).
|
|
90
|
+
It adds up to 20% over the shared queue on this hardware and makes ractor
|
|
91
|
+
mode the fastest Kino configuration. Details:
|
|
92
|
+
[doc/benchmarks.md](doc/benchmarks.md#lane-dispatch-experimental-lanes-true).
|
|
93
|
+
|
|
94
|
+
¹ Stock settings, no tuning. Ractor mode beats the fork cluster on pure
|
|
95
|
+
CPU by +15% (+21% with lanes). Threaded mode shows the GVL ceiling that
|
|
96
|
+
every single-process Ruby server hits. The CPU-tuning recipe that our
|
|
97
|
+
earlier Docker measurements needed makes no difference on real hardware
|
|
98
|
+
(+0.5%); see [doc/benchmarks.md](doc/benchmarks.md#cpu-bound-tuning).
|
|
99
|
+
|
|
100
|
+
² The ractor timer tax is small on real hardware: −4% against threaded
|
|
101
|
+
mode (it was −18% in Docker). Wait-bound throughput is slots ÷ wait, and
|
|
102
|
+
Kino slots are threads, not processes. `workers 32, threads 1` measured
|
|
103
|
+
**5,922 /io (+27% over the cluster) and 6,254 /io_native (+34%)**, still
|
|
104
|
+
one small process. See
|
|
105
|
+
[doc/benchmarks.md](doc/benchmarks.md#why-io-lags-in-ractor-mode-on-linux).
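
The "slots ÷ wait" rule in footnote ² can be checked with a few lines of arithmetic. This is a minimal sketch, assuming every slot is busy for exactly the handler's wait and that nothing else costs time; `wait_bound_ceiling` is a hypothetical helper name, not part of Kino.

```ruby
# Upper bound on wait-bound throughput: each slot finishes one request per
# wait interval, so req/s = slots / wait. Hypothetical helper, not Kino API.
def wait_bound_ceiling(workers:, threads:, wait_ms:)
  slots = workers * threads
  slots * 1000.0 / wait_ms
end

default_topology = wait_bound_ceiling(workers: 8, threads: 3, wait_ms: 5)
wide_topology    = wait_bound_ceiling(workers: 32, threads: 1, wait_ms: 5)

puts default_topology # ceiling for 8 x 3 = 24 slots
puts wide_topology    # ceiling for 32 x 1 = 32 slots
```

With 24 slots the ceiling is 4,800 req/s, which is why every entry in the /io and /io_native rows sits just under it. With 32 slots the ceiling rises to 6,400 req/s, consistent with the 5,922 and 6,254 measured for `workers 32, threads 1`.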
|
|
106
|
+
|
|
107
|
+
A common first idea is to keep your current server and wrap the app in
|
|
108
|
+
a ractor pool. We measured that too (same box; the analysis is in the
|
|
109
|
+
doc):
|
|
110
|
+
|
|
111
|
+
| endpoint | Kino :ractor | Puma + ractor wrapper | Falcon + ractor wrapper |
|
|
112
|
+
|------------|-------------:|----------------------:|------------------------:|
|
|
113
|
+
| /plaintext | **201,472** | 19,425 | 100,624 |
|
|
114
|
+
| /cpu (fib) | **66,735** | 17,106 | 49,083 |
|
|
115
|
+
| /io (5 ms) | **4,527** | 1,447 | 1,549 |
|
|
116
|
+
|
|
117
|
+
In short: ractor mode reaches fork-level CPU parallelism (**5×** Kino's
|
|
118
|
+
own GVL-bound threaded mode) in one process, at about 1/19th of the
|
|
119
|
+
cluster's memory. Every Kino mode is 1.4-2× ahead of the cluster on
|
|
120
|
+
I/O-light endpoints. The macOS numbers (secondary; everything there hits
|
|
121
|
+
the loopback ceiling) and the YJIT × Ractors gotcha are in
|
|
122
|
+
[doc/benchmarks.md](doc/benchmarks.md).
|
|
123
|
+
|
|
124
|
+
Reproduce: `bench/run.sh [seconds] [concurrency]` for the main table,
|
|
125
|
+
`bench/studies.sh` for the follow-ups (CPU recipe, topology, scaling,
|
|
126
|
+
logging, memory).
|
|
127
|
+
|
|
128
|
+
## Install
|
|
129
|
+
|
|
130
|
+
You need Ruby >= 4.0. Add Kino to your application's bundle:
|
|
131
|
+
|
|
132
|
+
```sh
|
|
133
|
+
bundle add kino # or: gem install kino (outside a bundle)
|
|
134
|
+
```
|
|
135
|
+
|
|
136
|
+
or put it in the `Gemfile` yourself:
|
|
137
|
+
|
|
138
|
+
```ruby
|
|
139
|
+
gem "kino", "~> 0.1"
|
|
140
|
+
```
|
|
141
|
+
|
|
142
|
+
Then generate a config and serve:
|
|
143
|
+
|
|
144
|
+
```sh
|
|
145
|
+
bundle exec kino --init # writes kino.rb; every directive documented in place
|
|
146
|
+
bundle exec kino # picks up config.ru + kino.rb, serves on :9292
|
|
147
|
+
```
|
|
148
|
+
|
|
149
|
+
(After a standalone `gem install`, the `kino` command works without
|
|
150
|
+
`bundle exec`.)
|
|
151
|
+
|
|
152
|
+
No Rust compiler needed: released versions ship precompiled native gems
|
|
153
|
+
for Linux (x86_64/aarch64, glibc and musl) and macOS (arm64). On other
|
|
154
|
+
platforms the gem compiles at install time; that needs a Rust toolchain,
|
|
155
|
+
plus clang/libclang on Linux.
|
|
156
|
+
|
|
157
|
+
## Usage
|
|
158
|
+
|
|
159
|
+
```ruby
|
|
160
|
+
require "kino"
|
|
161
|
+
|
|
162
|
+
# Ractor mode needs a Ractor-shareable app: capture nothing, freeze config.
|
|
163
|
+
app = Ractor.shareable_proc do |env|
|
|
164
|
+
[200, { "content-type" => "text/plain" }, ["Hello from #{Ractor.current}"]]
|
|
165
|
+
end
|
|
166
|
+
|
|
167
|
+
Kino::Server.run(app, port: 9292) # traps INT/TERM; Ctrl-C drains gracefully
|
|
168
|
+
```
|
|
169
|
+
|
|
170
|
+
Or embedded, with everything spelled out:
|
|
171
|
+
|
|
172
|
+
```ruby
|
|
173
|
+
require "etc" # for Etc.nprocessors below

server = Kino::Server.new(app,
|
|
174
|
+
bind: "127.0.0.1",
|
|
175
|
+
port: 9292, # 0 = ephemeral; read back via server.port
|
|
176
|
+
workers: Etc.nprocessors, # ractors (parallelism)
|
|
177
|
+
threads: 3, # threads per ractor (I/O concurrency, Puma-style)
|
|
178
|
+
mode: :auto, # :auto | :ractor | :threaded
|
|
179
|
+
queue_depth: 1024, # bounded queue; overflow → 503
|
|
180
|
+
queue_timeout: 1.0, # seconds before 503 on a full queue
|
|
181
|
+
request_timeout: nil, # seconds before a slow response becomes a 504 (nil = off)
|
|
182
|
+
shutdown_timeout: 30, # drain deadline
|
|
183
|
+
tls: { cert: "cert.pem", key: "key.pem" }, # file paths or inline PEM
|
|
184
|
+
)
|
|
185
|
+
server.start
|
|
186
|
+
server.shutdown # graceful: drain → deadline → abort stragglers
|
|
187
|
+
```
|
|
188
|
+
|
|
189
|
+
### Modes
|
|
190
|
+
|
|
191
|
+
- **`:ractor`**: `workers` Ractors × `threads` Threads each. The app must
|
|
192
|
+
be `Ractor.shareable?` (frozen middleware, `shareable_proc` endpoints).
|
|
193
|
+
Forcing `:ractor` with an unshareable app raises
|
|
194
|
+
`Kino::UnshareableAppError`. A crashed ractor returns 500 to its
|
|
195
|
+
in-flight requests right away, then respawns.
|
|
196
|
+
- **`:threaded`**: the same machinery on `workers × threads` plain
|
|
197
|
+
Threads. Runs **any** Rack app, including Rails, today. Parallel for
|
|
198
|
+
I/O, serialized by the GVL for CPU.
|
|
199
|
+
- **`:auto`** (default): `:ractor` when the app is shareable, otherwise
|
|
200
|
+
a warning and `:threaded`. One caveat: a *class* used as a Rack app
|
|
201
|
+
always counts as "shareable" (classes are), even if calling it touches
|
|
202
|
+
unshareable state. Force `:threaded` for those.
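
The `:auto` rule and its class caveat can be reproduced with `Ractor.shareable?` alone. This is a minimal sketch: `pick_mode` and `CounterApp` are hypothetical names used for illustration, `ArgumentError` stands in for `Kino::UnshareableAppError`, and Kino's real selection logic may differ.

```ruby
# Hypothetical stand-in for mode selection; not Kino's implementation.
def pick_mode(app, requested = :auto)
  shareable = Ractor.shareable?(app)
  case requested
  when :threaded then :threaded
  when :ractor
    # Kino raises Kino::UnshareableAppError here; ArgumentError is a stand-in.
    raise ArgumentError, "app is not Ractor-shareable" unless shareable
    :ractor
  else
    shareable ? :ractor : :threaded
  end
end

# A class-style app with class-level state. The class object itself passes
# Ractor.shareable?, even though @hits is an unshareable Hash.
class CounterApp
  @hits = {}

  def self.call(_env)
    [200, { "content-type" => "text/plain" }, ["ok"]]
  end
end

cache = {}
closure_app = ->(_env) { [200, {}, [cache.size.to_s]] } # captures unfrozen state

p pick_mode(closure_app)           # :threaded (a plain lambda is not shareable)
p pick_mode(CounterApp)            # :ractor (the caveat: classes always pass)
p pick_mode(CounterApp, :threaded) # :threaded (the recommended override)
```

The second call shows why the shareability test alone is not enough for class-style apps: the check passes, and the problem only surfaces when a worker ractor reads the class-level state.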
|
|
203
|
+
|
|
204
|
+
## Config file and CLI
|
|
205
|
+
|
|
206
|
+
Settings can live in a Puma-style Ruby DSL file. Precedence: explicit
|
|
207
|
+
kwargs and CLI flags > config file > defaults.
|
|
208
|
+
|
|
209
|
+
```ruby
|
|
210
|
+
# kino.rb
|
|
211
|
+
port 9292
|
|
212
|
+
workers 8
|
|
213
|
+
threads 3
|
|
214
|
+
mode :ractor
|
|
215
|
+
```
|
|
216
|
+
|
|
217
|
+
```sh
|
|
218
|
+
kino --init # write a fully commented sample kino.rb
|
|
219
|
+
kino # config.ru + kino.rb, port 9292
|
|
220
|
+
kino --check # explain whether the app can run in :ractor mode
|
|
221
|
+
kino -C config/kino.rb -p 3000 -w 4 -m ractor my_app.ru
|
|
222
|
+
```
|
|
223
|
+
|
|
224
|
+
The generated sample documents every directive, including the Rails
|
|
225
|
+
settings and the performance notes.
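
The precedence rule above (explicit kwargs and CLI flags > config file > defaults) behaves like a chain of Hash merges. This is a sketch under stated assumptions: `DEFAULTS` holds illustrative values, `resolve_settings` is a hypothetical helper, and Kino's own configuration code may be organised differently.

```ruby
# Illustrative defaults only; Kino's real defaults may differ.
DEFAULTS = { port: 9292, workers: 1, threads: 3, mode: :auto }.freeze

# Later merges win, so the order encodes: defaults < config file < explicit.
# Flags that were not given arrive as nil and are dropped by #compact.
def resolve_settings(file_settings, explicit_settings)
  DEFAULTS.merge(file_settings).merge(explicit_settings.compact)
end

file_settings = { port: 9292, workers: 8, threads: 3, mode: :ractor } # kino.rb
cli_flags     = { port: 3000, workers: 4, threads: nil, mode: nil }   # -p 3000 -w 4

settings = resolve_settings(file_settings, cli_flags)
p settings
```

Here the port and worker count come from the command line, while `threads` and `mode` fall through to the config file.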
|
|
226
|
+
|
|
227
|
+
## `kino --check`
|
|
228
|
+
|
|
229
|
+
When an app cannot run in `:ractor` mode, Kino can tell you why, instead
|
|
230
|
+
of leaving you with a bare `Ractor::IsolationError`. The check changes
|
|
231
|
+
nothing (it does not freeze your objects) and names each blocker:
|
|
232
|
+
captured variables with the place they were defined, instance variables
|
|
233
|
+
by path, and the class-level instance variable trap that catches
|
|
234
|
+
class-style apps:
|
|
235
|
+
|
|
236
|
+
```
|
|
237
|
+
$ kino --check
|
|
238
|
+
check: app is NOT Ractor-shareable
|
|
239
|
+
- app (Proc at app.rb:12)—captures `cache` = {} (Hash) (unshareable)
|
|
240
|
+
- app (HelloApp).@instance—class-level ivar holds #<HelloApp…>—classes
|
|
241
|
+
pass Ractor.shareable?, but reading this from a worker ractor raises
|
|
242
|
+
Ractor::IsolationError on the first request
|
|
243
|
+
hints: freeze config at boot; build endpoints with Ractor.shareable_proc;
|
|
244
|
+
keep per-worker resources in Ractor.store_if_absent; or run mode :threaded.
|
|
245
|
+
```
|
|
246
|
+
|
|
247
|
+
Exit status is 0/1, so it works in CI. The programmatic form is
|
|
248
|
+
`Kino::Check.report(app)`.
|
|
249
|
+
|
|
250
|
+
## Request timeouts
|
|
251
|
+
|
|
252
|
+
`request_timeout: seconds` (or `request_timeout 30` in `kino.rb`) limits
|
|
253
|
+
how long the app may take to produce a response. Past the deadline the
|
|
254
|
+
client gets an immediate **504** while the handler keeps running; its
|
|
255
|
+
late response is dropped without harm. Off by default. The handler is
|
|
256
|
+
deliberately *not* killed, because interrupting arbitrary Ruby mid-flight
|
|
257
|
+
is unsafe. A stuck handler still occupies its worker slot until it
|
|
258
|
+
returns, so set the deadline above your slowest legitimate endpoint and
|
|
259
|
+
watch `stats[:timeouts]`.
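
The deadline semantics can be imitated with plain threads. This is only a model: Kino enforces the deadline in its native front-end, and `call_with_deadline` is a hypothetical helper written for this example.

```ruby
# Models "answer 504 at the deadline, never kill the handler".
GATEWAY_TIMEOUT = [504, { "content-type" => "text/plain" }, ["Gateway Timeout"]].freeze

def call_with_deadline(app, env, seconds)
  handler = Thread.new { app.call(env) }
  # Thread#join returns nil when the timeout elapses before the thread ends.
  response = handler.join(seconds) ? handler.value : GATEWAY_TIMEOUT
  [response, handler] # the handler thread is left running on purpose
end

slow_app = ->(_env) { sleep 0.5; [200, {}, ["late"]] }
fast_app = ->(_env) { [200, {}, ["on time"]] }

slow_response, slow_handler = call_with_deadline(slow_app, {}, 0.1)
still_running = slow_handler.alive? # the slot stays occupied past the deadline
fast_response, = call_with_deadline(fast_app, {}, 0.1)

p slow_response.first # 504
p still_running       # true
p fast_response.first # 200
slow_handler.join     # the late [200, ...] result is simply discarded
```

The `still_running` value is the reason for the advice above: a timed-out handler keeps its slot until it returns, so a deadline that is too tight reduces capacity instead of freeing it.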
|
|
260
|
+
|
|
261
|
+
## Stats
|
|
262
|
+
|
|
263
|
+
`server.stats` returns a live snapshot: the configuration plus counters
|
|
264
|
+
from the native layer (one relaxed atomic per request, no measurable
|
|
265
|
+
cost):
|
|
266
|
+
|
|
267
|
+
```ruby
|
|
268
|
+
server.stats
|
|
269
|
+
# => {mode: :ractor, lanes: false, workers: 8, threads: 3, batch: 1,
|
|
270
|
+
# respawns: 0, queued: 0, in_flight: 2, served: 1041, rejected: 0,
|
|
271
|
+
# timeouts: 0}
|
|
272
|
+
# plus lane_depths: [...] when lane dispatch is on
|
|
273
|
+
```
|
|
274
|
+
|
|
275
|
+
From the outside, `kill -USR1 <pid>` prints the same snapshot as one line
|
|
276
|
+
(pair it with `pidfile` to find the pid):
|
|
277
|
+
|
|
278
|
+
```
|
|
279
|
+
Kino stats: mode=:ractor lanes=false workers=8 threads=3 batch=1 respawns=0 queued=0 in_flight=2 served=1041 rejected=0 timeouts=0
|
|
280
|
+
```
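
For a monitoring script, the one-line snapshot is easy to turn back into a Hash. This is a sketch assuming the format stays exactly as in the sample line above; `parse_stats_line` is a hypothetical helper, not something Kino ships.

```ruby
# Parses the one-line USR1 snapshot into a Hash with symbol keys.
# Purely numeric values become Integers; everything else stays a String.
def parse_stats_line(line)
  line.delete_prefix("Kino stats: ").split.to_h do |pair|
    key, value = pair.split("=", 2)
    [key.to_sym, value.match?(/\A\d+\z/) ? value.to_i : value]
  end
end

line = "Kino stats: mode=:ractor lanes=false workers=8 threads=3 batch=1 " \
       "respawns=0 queued=0 in_flight=2 served=1041 rejected=0 timeouts=0"
stats = parse_stats_line(line)

p stats[:served]
p stats[:mode]
```

Non-numeric fields such as `mode` and `lanes` are kept as the raw strings `":ractor"` and `"false"`; a real script would decide how to interpret them.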
|
|
281
|
+
|
|
282
|
+
## Logging
|
|
283
|
+
|
|
284
|
+
With one log line per request, `Kino::Logger` sustained **2.4× the
|
|
285
|
+
throughput of a shared `::Logger`** (151k vs 63k req/s on the benchmark
|
|
286
|
+
box). There are two native pieces. Both write through a lock-free
|
|
287
|
+
channel to a Rust flusher thread, so request threads never take a log
|
|
288
|
+
mutex and never make a write syscall:
|
|
289
|
+
|
|
290
|
+
- **Access log** (`log_requests true`): one line per request to stdout,
|
|
291
|
+
including the 503s that never reach your app. On color terminals the
|
|
292
|
+
lines are tinted by status class: 2xx green, 3xx yellow, 4xx maroon,
|
|
293
|
+
5xx bright red:
|
|
294
|
+
|
|
295
|
+
```
|
|
296
|
+
127.0.0.1 [Wed, 10 Jun 2026 13:39:56 GMT] "GET / HTTP/1.1" 200 0.1ms
|
|
297
|
+
```
|
|
298
|
+
|
|
299
|
+
- **`Kino::Logger`**: a `::Logger` over the same async sink, for your
|
|
300
|
+
app's own logging (`Kino::Logger.new("log/production.log")`, or no
|
|
301
|
+
argument for stdout). The raw IO-like device is `Kino::Logger::Device`,
|
|
302
|
+
for integrations that want bytes without `::Logger` formatting. The
|
|
303
|
+
device is frozen and Ractor-shareable, so one device serves every
|
|
304
|
+
worker.
|
|
305
|
+
|
|
306
|
+
`Kino::Logger` in a **Rails** app: it is a real `::Logger` subclass, so
|
|
307
|
+
it fits anywhere Rails expects a logger:
|
|
308
|
+
|
|
309
|
+
```ruby
|
|
310
|
+
# config/environments/production.rb, simplest forms:
|
|
311
|
+
config.logger = Kino::Logger.new # stdout
|
|
312
|
+
config.logger = Kino::Logger.new("log/production.log") # file
|
|
313
|
+
# both file and stdout:
|
|
314
|
+
config.logger = ActiveSupport::BroadcastLogger.new(
|
|
315
|
+
Kino::Logger.new("log/production.log"), Kino::Logger.new
|
|
316
|
+
)
|
|
317
|
+
# tagged logging wraps it like any ::Logger:
|
|
318
|
+
config.logger = ActiveSupport::TaggedLogging.new(Kino::Logger.new)
|
|
319
|
+
```
|
|
320
|
+
|
|
321
|
+
From a plain **Rack** app, give middleware the logger, or hand
|
|
322
|
+
`Rack::CommonLogger` the raw device (it just calls `write`):
|
|
323
|
+
|
|
324
|
+
```ruby
|
|
325
|
+
# config.ru
|
|
326
|
+
use Rack::CommonLogger, Kino::Logger::Device.new # access-style app log
|
|
327
|
+
run MyApp
|
|
328
|
+
```
|
|
329
|
+
|
|
330
|
+
(If you only want request lines, prefer Kino's own `log_requests true`.
|
|
331
|
+
It is free for your Ruby threads, and it also sees the 503s that never
|
|
332
|
+
reach Rack.)
|
|
333
|
+
|
|
334
|
+
Graceful shutdown drains both logs fully. A hard crash can lose the tail
|
|
335
|
+
of the buffer, and when you log faster than the disk can take (over 100k
|
|
336
|
+
lines/s), the sink drops lines instead of blocking request threads.
|
|
337
|
+
These trade-offs are measured in
|
|
338
|
+
[doc/benchmarks.md](doc/benchmarks.md#logging-costs).
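
The "drop instead of block" trade-off can be modelled in plain Ruby. This is a sketch, not Kino's sink: the real one is native and lock-free, whereas the hypothetical `DroppingSink` below uses a `SizedQueue` and a `Mutex`, and drains on demand instead of from a flusher thread.

```ruby
require "stringio"

# A bounded log buffer whose writers never wait for the output device.
class DroppingSink
  attr_reader :dropped

  def initialize(io, capacity)
    @io = io
    @queue = SizedQueue.new(capacity)
    @dropped = 0
    @lock = Mutex.new
  end

  # Non-blocking push: a full buffer drops the line and counts the loss.
  def write(line)
    @queue.push(line, true)
    true
  rescue ThreadError
    @lock.synchronize { @dropped += 1 }
    false
  end

  # Stands in for the flusher: writes out everything currently buffered.
  def flush
    @io.write(@queue.pop) until @queue.empty?
  end
end

out = StringIO.new
sink = DroppingSink.new(out, 2)
3.times { |i| sink.write("line #{i}\n") } # the third write finds the buffer full
sink.flush

p sink.dropped
p out.string
```

Counting dropped lines, as this sketch does, is one way to make the loss visible; whether Kino exposes such a counter is not stated here.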
|
|
339
|
+
|
|
340
|
+
## Timer waits
|
|
341
|
+
|
|
342
|
+
`Kino.sleep(seconds)` is a high-resolution sleep on the OS clock with
|
|
343
|
+
the GVL released. MRI's own `sleep` wakes up late inside non-main
|
|
344
|
+
ractors (details and numbers in [doc/benchmarks.md](doc/benchmarks.md)).
|
|
345
|
+
Use `Kino.sleep` for explicit timer waits in handlers. Ordinary blocking
|
|
346
|
+
I/O does not need it.
|
|
347
|
+
|
|
348
|
+
## Rack 3 compliance
|
|
349
|
+
|
|
350
|
+
The spec suite runs every test app under `Rack::Lint` over real sockets:
|
|
351
|
+
streaming request bodies (forward-only `rack.input`), enumerable and
|
|
352
|
+
callable (full-duplex stream) response bodies, lowercase and multi-value
|
|
353
|
+
headers, HEAD/204 semantics. Full hijack is left out on purpose; it is
|
|
354
|
+
optional in Rack 3.
|
|
355
|
+
|
|
356
|
+
## Rails
|
|
357
|
+
|
|
358
|
+
Rails (edge) runs on Kino today in `:threaded` mode; see
|
|
359
|
+
`examples/rails-hello`. Ractor-mode Rails is blocked upstream. The exact
|
|
360
|
+
blockers, the `Ruby::Box` findings, and what would unlock it are written
|
|
361
|
+
up in [doc/rails-on-ractors.md](doc/rails-on-ractors.md). The example
|
|
362
|
+
ships a probe script that re-tests against whatever Rails you bundle.
|
|
363
|
+
|
|
364
|
+
## Development
|
|
365
|
+
|
|
366
|
+
```sh
|
|
367
|
+
bin/setup
|
|
368
|
+
bundle exec rake # compile, Rust tests, specs, RBS, lint
|
|
369
|
+
RB_SYS_CARGO_PROFILE=dev bundle exec rake compile # fast dev rebuilds
|
|
370
|
+
```
|
|
371
|
+
|
|
372
|
+
## Assisted by
|
|
373
|
+
|
|
374
|
+
Claude Code (Mythos, Opus).
|
|
375
|
+
|
|
376
|
+
## Contributing
|
|
377
|
+
|
|
378
|
+
Bug reports and pull requests are welcome on GitHub at
|
|
379
|
+
https://github.com/yaroslav/kino.
|
|
380
|
+
|
|
381
|
+
## License
|
|
382
|
+
|
|
383
|
+
The gem is available as open source under the terms of the
|
|
384
|
+
[MIT License](https://opensource.org/licenses/MIT).
|
data/doc/README.md
ADDED
data/doc/architecture.md
ADDED
|
@@ -0,0 +1,161 @@
|
|
|
1
|
+
# Architecture
|
|
2
|
+
|
|
3
|
+
```
|
|
4
|
+
tokio (Rust threads) Ruby
|
|
5
|
+
┌──────────────────────────┐
|
|
6
|
+
│ accept loop (hyper) │ bounded MPMC ┌─ worker: Ractor × threads ─┐
|
|
7
|
+
│ per request: │ ──── queue ───────► │ loop { │
|
|
8
|
+
│ parse → RequestCtx │ │ env = take_one │ ← blocks with the
|
|
9
|
+
│ queue full → 503 │ ◄─── response ───── │ status,h,b = app.(env) │ per-ractor lock
|
|
10
|
+
│ TLS (rustls) │ ◄─── body chunks ── │ respond / stream │ RELEASED
|
|
11
|
+
└──────────────────────────┘ └────────────────────────────┘
|
|
12
|
+
```
|
|
13
|
+
|
|
14
|
+
All network I/O lives in Rust on a tokio multi-threaded runtime; hyper
|
|
15
|
+
parses HTTP/1.1 and handles keep-alive; rustls terminates TLS. Ruby never
|
|
16
|
+
touches a socket. Each request becomes a Rust-side `RequestCtx` pushed to a
|
|
17
|
+
bounded flume MPMC queue; Ruby workers pull from it.
|
|
18
|
+
|
|
19
|
+
## Topology
|
|
20
|
+
|
|
21
|
+
Puma-style two-level: `workers × threads`.
|
|
22
|
+
|
|
23
|
+
- `:ractor` mode—`workers` Ractors, each running `threads` Ruby Threads
|
|
24
|
+
over the same worker loop. Parallel across ractors (each has its own VM
|
|
25
|
+
lock); concurrent within one only for I/O-bound handlers.
|
|
26
|
+
- `:threaded` mode—the same total capacity as plain Threads on the main
|
|
27
|
+
ractor. Runs any Rack app; the GVL serializes CPU work.
|
|
28
|
+
- Identical machinery either way: the flume queue is MPMC, a "worker slot"
|
|
29
|
+
is per-thread, and the worker loop (`lib/kino/worker.rb`) is shared
|
|
30
|
+
verbatim.
|
|
31
|
+
- Experimental `lanes true` replaces the one shared queue with a small
|
|
32
|
+
private queue per worker slot (awake-preferring dispatch, work
|
|
33
|
+
stealing); see [benchmarks](benchmarks.md#lane-dispatch-experimental-lanes-true).
|
|
34
|
+
|
|
35
|
+
## The Rust ↔ Ruby boundary
|
|
36
|
+
|
|
37
|
+
- **No native (TypedData) handle crosses a ractor boundary.** Worker
|
|
38
|
+
ractors receive plain integers (server id, worker ids) plus the
|
|
39
|
+
Ractor-shareable app; native state lives in a global Rust-side registry
|
|
40
|
+
keyed by those ids. The per-request handle
|
|
41
|
+
(`Kino::Native::Request`, a TypedData object) is created *inside* the
|
|
42
|
+
worker ractor by the take calls (`take_one`/`take_batch`), so its
|
|
43
|
+
ownership is correct by construction.
|
|
44
|
+
- **Blocking discipline:** every blocking native call goes through
|
|
45
|
+
`rb_thread_call_without_gvl` (rb-sys; magnus doesn't wrap it) so a
|
|
46
|
+
blocked worker holds no VM lock. Waits poll an atomic interrupt flag
|
|
47
|
+
between bounded `recv_timeout` ticks; the unblock function (UBF) just
|
|
48
|
+
sets the flag. `flume::Selector` lost wakeups under sustained load
|
|
49
|
+
(workers went permanently deaf to a non-empty queue after ~100k
|
|
50
|
+
requests) and is not used anywhere.
|
|
51
|
+
- **Fast path:** when a request is already queued, `take_one` takes it
|
|
52
|
+
with `try_recv` while still holding the GVL—the release/reacquire pair
|
|
53
|
+
(two scheduler round-trips) is skipped entirely. Under load this is the
|
|
54
|
+
common case.
|
|
55
|
+
- **Fused crossing:** the common complete-body response rides
|
|
56
|
+
`respond_and_take_one`: answer the previous request and take the next in
|
|
57
|
+
one FFI call, ~one crossing per request once the loop is warm. The env
|
|
58
|
+
Hash carries the request handle under `env["kino.request"]`, so no
|
|
59
|
+
per-request pair array exists either.
|
|
60
|
+
- **Env construction:** one FFI call builds the full CGI side of the Rack
|
|
61
|
+
env as a real Hash. Static keys, common methods/protocols and 44 common
|
|
62
|
+
`HTTP_*` header names come from a frozen (and therefore Ractor-shareable)
|
|
63
|
+
string cache built once at init on the main ractor. Frozen keys also
|
|
64
|
+
skip the dup that `Hash#[]=` performs on unfrozen string keys. Only
|
|
65
|
+
`rack.input` is lazy/streaming.
|
|
66
|
+
- **Response path:** the Rack headers Hash is passed through as-is and
|
|
67
|
+
iterated on the Rust side (`RHash#foreach`); header bytes are borrowed
|
|
68
|
+
in place from rooted Ruby strings (safe: GVL held, hyper copies
|
|
69
|
+
immediately). Single-chunk bodies skip the join copy.
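
The note on frozen keys in the env-construction item rests on standard `Hash#[]=` behaviour, which is easy to observe directly. This is a small illustration in plain Ruby, not Kino code; the header names are arbitrary.

```ruby
unfrozen_key = String.new("HTTP_ACCEPT")  # explicitly unfrozen
frozen_key   = "HTTP_USER_AGENT".freeze

env = {}
env[unfrozen_key] = "*/*"
env[frozen_key]   = "wrk"

stored_unfrozen, stored_frozen = env.keys

p stored_unfrozen.equal?(unfrozen_key) # false: the Hash stored a frozen copy
p stored_unfrozen.frozen?              # true
p stored_frozen.equal?(frozen_key)     # true: a frozen key is stored as-is
p Ractor.shareable?(frozen_key)        # true: frozen strings are shareable
p Ractor.shareable?(unfrozen_key)      # false
```

So a cache of frozen strings serves two purposes at once: the keys can be handed to any worker ractor, and inserting them into the env Hash allocates nothing extra.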
|
|
70
|
+
|
|
71
|
+
## Backpressure, in both directions
|
|
72
|
+
|
|
73
|
+
- Bounded request queue between tokio and Ruby. When it stays full past
|
|
74
|
+
`queue_timeout`, the client gets an immediate 503 rather than waiting.
|
|
75
|
+
- Request bodies stream through a bounded(8) channel: hyper is only polled
|
|
76
|
+
as fast as Ruby consumes (inbound backpressure costs nothing extra).
|
|
77
|
+
Bodyless requests (most GETs) spawn no forwarder task at all.
|
|
78
|
+
- Response bodies stream through a bounded(8) channel the other way: a
|
|
79
|
+
slow client makes `write_chunk` block—with the GVL released.
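
The first rule (wait up to `queue_timeout`, then answer 503) can be sketched with a `SizedQueue`. This is a model under stated assumptions: Kino's queue is a bounded flume channel on the Rust side, and `enqueue_or_reject` is a hypothetical helper that polls where the real front-end would await.

```ruby
SERVICE_UNAVAILABLE = [503, { "content-type" => "text/plain" }, ["Service Unavailable"]].freeze

def monotonic_now
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

# Tries to enqueue until the deadline; a queue that stays full yields a 503.
def enqueue_or_reject(queue, request, queue_timeout)
  deadline = monotonic_now + queue_timeout
  begin
    queue.push(request, true) # non-blocking; raises ThreadError when full
    :queued
  rescue ThreadError
    return SERVICE_UNAVAILABLE if monotonic_now >= deadline
    sleep 0.001
    retry
  end
end

queue = SizedQueue.new(1)
first  = enqueue_or_reject(queue, :req_a, 0.01) # fits
second = enqueue_or_reject(queue, :req_b, 0.01) # queue stays full past the timeout
queue.pop                                       # a worker takes :req_a
third  = enqueue_or_reject(queue, :req_c, 0.01) # room again

p first
p second.first
p third
```

Rejecting at the queue keeps the decision cheap: a request that cannot be served soon is turned away before any Ruby code runs for it.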
|
|
80
|
+
|
|
81
|
+
## Failure handling
|
|
82
|
+
|
|
83
|
+
Three parties can answer a client, coordinated by an atomic
|
|
84
|
+
first-claimant-wins flag on the per-request `Responder`:
|
|
85
|
+
|
|
86
|
+
1. The app, via the worker loop (normal path; `StandardError` is rescued
|
|
87
|
+
in Ruby and becomes a clean 500).
|
|
88
|
+
2. The supervisor: each worker ractor has a supervisor thread blocked in
|
|
89
|
+
`Ractor#value`. A hard crash (any `Exception`) wakes it; it immediately
|
|
90
|
+
500s the crashed ractor's in-flight requests via a `Weak<Responder>`
|
|
91
|
+
side table—not when GC eventually notices—and respawns the ractor
|
|
92
|
+
with fresh slots.
|
|
93
|
+
3. A `Drop` guard on `RequestCtx` as the universal backstop (GC of an
|
|
94
|
+
abandoned handle, teardown races). The Drop path never touches the Ruby
|
|
95
|
+
API, so it is safe from any thread.
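
The first-claimant-wins rule can be modelled in a few lines of Ruby. This is an illustration only: the real `Responder` is a Rust type with an atomic flag, while this hypothetical Ruby class of the same name uses a `Mutex` to get the same "exactly one winner" property.

```ruby
# Ruby model of the claim flag: the first party to claim answers the client.
class Responder
  attr_reader :answer

  def initialize
    @lock = Mutex.new
    @answer = nil
  end

  # Returns true only for the first caller; every later claim is a no-op.
  def claim(party, status)
    @lock.synchronize do
      return false if @answer

      @answer = [party, status]
      true
    end
  end
end

responder = Responder.new
results = [
  Thread.new { responder.claim(:app, 200) },
  Thread.new { responder.claim(:supervisor, 500) },
  Thread.new { responder.claim(:drop_guard, 500) },
].map(&:value)

p results.count(true) # 1: exactly one party answers the client
p responder.answer    # whichever party claimed first
```

Which party wins depends on scheduling, but the count of successful claims is always one, and that is the property the three paths above rely on.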
|
|
96
|
+
|
|
97
|
+
With `request_timeout` configured, the tokio front-end can additionally
|
|
98
|
+
answer with a 504 on its own when the response head misses the deadline;
|
|
99
|
+
the worker keeps running, and its late response goes nowhere harmlessly:
|
|
100
|
+
the front-end has stopped listening (the oneshot receiver is dropped),
|
|
101
|
+
and the worker's claim makes the Drop backstop a no-op.
|
|
102
|
+
|
|
103
|
+
Client aborts are handled the same way in reverse: hyper drops the request
|
|
104
|
+
future, and a Rust `Drop` guard keeps the in-flight counter honest (a
|
|
105
|
+
plain decrement after an `.await` would never run).
|
|
106
|
+
|
|
107
|
+
## Graceful shutdown
|
|
108
|
+
|
|
109
|
+
`stop_accepting` → drain until queue + in-flight reach zero or the
|
|
110
|
+
deadline passes → `close_queue` (idle workers see Disconnected and exit) →
|
|
111
|
+
join workers → past deadline: abort remaining clients (a 500, or a
|
|
112
|
+
connection abort mid-stream), interrupt blocked workers, reap
|
|
113
|
+
stragglers → tear down the tokio runtime. Idempotent;
|
|
114
|
+
a second INT/TERM force-exits.
|
|
115
|
+
|
|
116
|
+
## Timer waits: `Kino.sleep`
|
|
117
|
+
|
|
118
|
+
MRI's `sleep` parks the thread on the VM timer, whose wakeups inside
|
|
119
|
+
non-main ractors are coarse (how coarse is environment-dependent; see
|
|
120
|
+
[benchmarks](benchmarks.md#why-io-lags-in-ractor-mode-on-linux)).
|
|
121
|
+
`Kino.sleep` releases the GVL and waits on the OS clock directly, chunked
|
|
122
|
+
at the interrupt tick so `Thread#kill` and shutdown stay responsive.
|
|
123
|
+
|
|
124
|
+
## Why tokio (researched June 2026)
|
|
125
|
+
|
|
126
|
+
- **tokio + hyper**: the bottleneck is the Ruby dispatch boundary, not raw
|
|
127
|
+
I/O throughput; what matters is HTTP correctness, keep-alive, TLS, and
|
|
128
|
+
h2-later—hyper's territory. Cross-platform out of the box.
|
|
129
|
+
- **monoio**: thread-per-core io_uring looks great in echo-server
|
|
130
|
+
benchmarks, but hyper only works through its poll-io compat layer
|
|
131
|
+
(forfeiting io_uring on the hot path), and the share-nothing advantage
|
|
132
|
+
is spent the moment requests fan into an MPMC queue toward Ruby.
|
|
133
|
+
- **compio**: completion-based, cross-platform, production-proven—but no
|
|
134
|
+
first-class HTTP server story yet, and completion-model owned-buffer
|
|
135
|
+
semantics would leak into the request lifecycle design.
|
|
136
|
+
- **ntex**: the strongest alternative—unlike monoio/compio it has a
|
|
137
|
+
first-class HTTP/1.1 + HTTP/2 server stack (TechEmpower top tier) plus
|
|
138
|
+
an io_uring runtime ("neon") on Linux today. Rejected as the default
|
|
139
|
+
for now: its thread-per-core, `Rc`-based `!Send` worker model is
|
|
140
|
+
exactly what our Send-ctx-into-MPMC dispatch opts out of; its own
|
|
141
|
+
request/response/body types would force a conversion seam through
|
|
142
|
+
`Responder` and the streaming path; neon is Linux-only (ntex-on-tokio
|
|
143
|
+
elsewhere forfeits the io_uring win and just trades hyper's
|
|
144
|
+
battle-tested h1 for a less-deployed one); and the realistic gain is
|
|
145
|
+
confined to syscall-bound /plaintext-class traffic—the Ruby boundary,
|
|
146
|
+
not the front-end, is where Kino's time goes. Worth a contained
|
|
147
|
+
feature-flag spike if the Linux plaintext ceiling ever matters
|
|
148
|
+
competitively.
|
|
149
|
+
- **io_uring path**: tokio ships in-tree io_uring as an unstable feature
|
|
150
|
+
(file ops as of 1.52; network expected to follow). `server.rs` isolates
|
|
151
|
+
the runtime, so adopting it later is a contained change—and would
|
|
152
|
+
deliver most of ntex/neon's win without the type seam.
|
|
153
|
+
|
|
154
|
+
## Versioning of risky dependencies
|
|
155
|
+
|
|
156
|
+
magnus is used for everything except the GVL-release primitives and the
|
|
157
|
+
`rb_ext_ractor_safe` flag, which go straight to rb-sys (magnus wraps
|
|
158
|
+
neither). magnus's lazy TypedData class cache is force-resolved at init
|
|
159
|
+
on the main ractor, so no worker ractor ever races its first resolution;
|
|
160
|
+
the only symbols the crate creates are made during `server_start`, also
|
|
161
|
+
on the main ractor.
|