kino 0.1.0 → 0.1.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +37 -0
- data/Cargo.lock +1 -1
- data/README.md +103 -57
- data/doc/benchmarks.md +208 -89
- data/doc/rails-on-ractors.md +5 -4
- data/doc/why-kino.md +8 -8
- data/ext/kino/Cargo.toml +1 -1
- data/ext/kino/src/registry.rs +4 -0
- data/ext/kino/src/request.rs +33 -1
- data/ext/kino/src/server.rs +123 -25
- data/lib/kino/configuration.rb +14 -3
- data/lib/kino/server.rb +21 -1
- data/lib/kino/templates/kino.rb.tt +63 -83
- data/lib/kino/version.rb +1 -1
- data/sig/kino.rbs +2 -0
- metadata +1 -1
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: 0bd8e6e3b295832fa1d87743b4c9a121cdb5687287011b29f1140814fdae0575
|
|
4
|
+
data.tar.gz: f21305459e857d366159ee258d873e2109b7a7715bb0cbe8d23c47e458ae2b33
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: 04e95f9ee2133b4d15bdd2069977e73791a16b033e57a3bdb88478d0048b0bb5d8f56eff1963813bbf8d9399461bb80bf9c33a2039f98805e4f129b3e42b28ef
|
|
7
|
+
data.tar.gz: 8eb131cdbbe5bdbd29d188ab4ff43dbbec2f8c791aacf85586ba69c7208b2f74cc05d4007c469a4de61c4578ef08214e56a4fd080b9ad655af3a01d208b4c752
|
data/CHANGELOG.md
CHANGED
|
@@ -1,3 +1,40 @@
|
|
|
1
|
+
## [0.1.2] - 2026-06-22
|
|
2
|
+
|
|
3
|
+
- Drop a connection that has not sent its complete request headers
|
|
4
|
+
within 15 seconds. Closes a slowloris hole: hyper's built-in header-read
|
|
5
|
+
timeout was inert because the server installed no timer, so a slow-header
|
|
6
|
+
client could tie up a connection (and its tokio task) indefinitely.
|
|
7
|
+
- Cap concurrent connections (new `max_connections` directive). Past the cap,
|
|
8
|
+
new connections wait in the kernel backlog instead of piling up until a
|
|
9
|
+
flood exhausts file descriptors or memory. Defaults to most of the process
|
|
10
|
+
open-file limit (`ulimit -n`), so it scales with the OS limit and only
|
|
11
|
+
engages under a flood.
|
|
12
|
+
- Bound the TLS handshake to 10 seconds. A client that completed the TCP
|
|
13
|
+
connect but stalled the handshake could otherwise hold a connection slot
|
|
14
|
+
indefinitely, since the request and header-read deadlines only begin once
|
|
15
|
+
the handshake finishes.
|
|
16
|
+
- Cap the request body at 50 MB by default (new `max_body_size` directive,
|
|
17
|
+
configurable; nil or 0 disables and delegates to a fronting proxy). An app
|
|
18
|
+
that reads `rack.input` could otherwise be driven to run out of memory by an
|
|
19
|
+
oversized or endless upload. A truthful oversize Content-Length is refused
|
|
20
|
+
with a 413 before the app runs; a chunked or lying client is cut off
|
|
21
|
+
mid-stream once it passes the cap.
|
|
22
|
+
- Bound the idle time between request-body frames to 30 seconds. A client that
|
|
23
|
+
began a request then stalled mid-body would otherwise keep a worker blocked
|
|
24
|
+
in `rack.input.read` indefinitely; now the read raises and the worker
|
|
25
|
+
reclaims its slot. Only a silent client trips it: a steadily-sent body resets
|
|
26
|
+
the deadline each frame, so slow-but-active uploads are unaffected.
|
|
27
|
+
|
|
28
|
+
## [0.1.1] - 2026-06-11
|
|
29
|
+
|
|
30
|
+
- Mode-dependent `threads` default: 1 per worker in :ractor mode (threads
|
|
31
|
+
inside a ractor share its lock and cost a per-request handoff; +16-18%
|
|
32
|
+
on fast handlers, measured on dedicated hardware), 3 in :threaded mode.
|
|
33
|
+
Explicit `threads` always wins; waiting-heavy ractor apps should raise
|
|
34
|
+
`workers` instead.
|
|
35
|
+
- `queue_timeout` default raised from 1 to 5 seconds: a brief burst now
|
|
36
|
+
waits out the spike instead of shedding 503s within a second.
|
|
37
|
+
|
|
1
38
|
## [0.1.0] - 2026-06-11
|
|
2
39
|
|
|
3
40
|
Initial release.
|
data/Cargo.lock
CHANGED
data/README.md
CHANGED
|
@@ -11,14 +11,14 @@ on every core in **one small process**. A **Rust** (tokio + hyper)
|
|
|
11
11
|
front-end owns the network, parallel **Ractors** run your Rack 3 app,
|
|
12
12
|
and a threaded fallback mode runs everything else, Rails included.
|
|
13
13
|
|
|
14
|
-
* **Fast.** On a real 8-core server, every Kino mode is **1.
|
|
15
|
-
of a
|
|
16
|
-
|
|
17
|
-
* **A fraction of the memory.**
|
|
18
|
-
about **
|
|
19
|
-
|
|
20
|
-
|
|
21
|
-
|
|
14
|
+
* **Fast.** On a real 8-core server, every Kino mode is **1.5-2×**
|
|
15
|
+
ahead of a Puma fork cluster on I/O-light endpoints. Ractor mode also
|
|
16
|
+
wins on pure CPU, **30%+**. [Benchmarks](#benchmarks) below.
|
|
17
|
+
* **A fraction of the memory.** Aabout **~7×** on the simplistic bench
|
|
18
|
+
Ractor app, and about **4× less memory** than a Puma cluster serving Rails in fallback threaded mode.
|
|
19
|
+
* **Parallel without forking.** Ractor mode runs CPU work **more than
|
|
20
|
+
5× faster** than Kino's own GVL-bound threaded mode, in the same
|
|
21
|
+
small process.
|
|
22
22
|
* **Production plumbing included.** Graceful drain, crash supervision
|
|
23
23
|
and respawn, bounded queues with 503 backpressure, request timeouts,
|
|
24
24
|
TLS (rustls), live stats, async access and app logging.
|
|
@@ -63,63 +63,108 @@ notes live in [doc/architecture.md](doc/architecture.md).
|
|
|
63
63
|
## Benchmarks
|
|
64
64
|
|
|
65
65
|
Measured on a real server: AWS **c7a.2xlarge** (8-core AMD EPYC 9R14,
|
|
66
|
-
16 GB, Amazon Linux 2023). This is a realistic app-server size.
|
|
67
|
-
|
|
68
|
-
|
|
69
|
-
|
|
70
|
-
|
|
66
|
+
16 GB, Amazon Linux 2023). This is a realistic app-server size.
|
|
67
|
+
|
|
68
|
+
**These tables run a tiny synthetic Rack app**—plaintext, a 10 KB body,
|
|
69
|
+
a CPU-bound `fib`, a 5 ms wait—deliberately small, to measure the server
|
|
70
|
+
rather than an app. It is Ractor-shareable, so Kino runs it in `:ractor`
|
|
71
|
+
mode (and `:threaded` for comparison). **A real Rails app is a different
|
|
72
|
+
story:** it is *not* Ractor-shareable, so it runs only in Kino's
|
|
73
|
+
`:threaded` fallback, with its own numbers—see [Rails](#rails) below.
|
|
74
|
+
Ruby 4.0.5 with YJIT, every server at its defaults: Puma forks 8 workers ×
|
|
75
|
+
3 threads, Kino stays in one process (8 workers; 1 thread each in ractor
|
|
76
|
+
modes, 3 in threaded). Numbers are req/s by wrk (8-second windows, 64
|
|
77
|
+
connections, same host). Methodology:
|
|
71
78
|
[doc/benchmarks.md](doc/benchmarks.md).
|
|
72
79
|
|
|
73
|
-
| endpoint | Kino :ractor | + lanes | Kino :threaded | Puma (cluster) |
|
|
74
|
-
|
|
75
|
-
| /plaintext |
|
|
76
|
-
| /10k |
|
|
77
|
-
| /cpu (fib) |
|
|
78
|
-
| /io (5 ms) |
|
|
79
|
-
| /io_native |
|
|
80
|
+
| endpoint | Kino :ractor | + lanes | :ractor, `workers 32`² | Kino :threaded | Puma (cluster) |
|
|
81
|
+
|-------------|-------------:|--------:|-----------------------:|---------------:|---------------:|
|
|
82
|
+
| /plaintext | 229,534 | **250,222** | 182,997 | 216,994 | 118,176 |
|
|
83
|
+
| /10k | 178,083 | **189,862** | 151,034 | 160,400 | 106,768 |
|
|
84
|
+
| /cpu (fib) | **77,999**¹| 70,885 | 66,100 | 13,429 | 58,006 |
|
|
85
|
+
| /io (5 ms) | 1,552 | 1,551 | **5,888** | 4,709 | 4,693 |
|
|
86
|
+
| /io_native | 1,570 | 1,571 | **6,274** | 4,695 | 4,691 |
|
|
80
87
|
|
|
81
|
-
Memory on the
|
|
88
|
+
Memory tells two different stories depending on the app, both by **PSS**
|
|
89
|
+
(proportional set size; see note) after sustained load.
|
|
82
90
|
|
|
83
|
-
|
|
84
|
-
|
|
85
|
-
|
|
86
|
-
|
|
87
|
-
|
|
91
|
+
**The tiny benchmark app** (Ractor-shareable, so Kino runs it in `:ractor`
|
|
92
|
+
or `:threaded`). Kino is **~7× lighter in :ractor mode, ~10× in :threaded**
|
|
93
|
+
than the Puma cluster — the gap stays large because a trivial app is almost
|
|
94
|
+
all private per-worker heap, which copy-on-write can't share:
|
|
95
|
+
|
|
96
|
+
| tiny app, Kino | Kino (one process) | Puma cluster (8 workers) | ratio |
|
|
97
|
+
|-----------------|-------------------:|-------------------------:|------:|
|
|
98
|
+
| :ractor (8×1) | **148 MB** | 1,068 MB | ~7× |
|
|
99
|
+
| :threaded (8×3) | **107 MB**³| 1,068 MB | ~10× |
|
|
100
|
+
|
|
101
|
+
**A real Rails app** (not Ractor-shareable—Kino's `:threaded` fallback
|
|
102
|
+
only, [below](#rails)). The gap is **~4×**, smaller because Rails' large
|
|
103
|
+
framework *is* shared copy-on-write across Puma's forks:
|
|
104
|
+
|
|
105
|
+
| Rails hello-world | Kino :threaded | Puma cluster (8 workers) | ratio |
|
|
106
|
+
|-------------------|---------------:|-------------------------:|------:|
|
|
107
|
+
| **PSS** | **92 MB** | **389 MB** | ~4× |
|
|
88
108
|
|
|
89
109
|
"+ lanes" is the experimental per-worker-queue dispatcher (`lanes true`).
|
|
90
|
-
It
|
|
91
|
-
mode the fastest Kino configuration. Details:
|
|
110
|
+
It posts the fastest plaintext/10k of any configuration here. Details:
|
|
92
111
|
[doc/benchmarks.md](doc/benchmarks.md#lane-dispatch-experimental-lanes-true).
|
|
93
112
|
|
|
94
113
|
¹ Stock settings, no tuning. Ractor mode beats the fork cluster on pure
|
|
95
|
-
CPU by +
|
|
96
|
-
every single-process Ruby server hits. The CPU-tuning recipe
|
|
97
|
-
|
|
98
|
-
|
|
99
|
-
|
|
100
|
-
|
|
101
|
-
|
|
102
|
-
|
|
103
|
-
|
|
104
|
-
|
|
114
|
+
CPU by +34% (+22% with lanes). Threaded mode shows the GVL ceiling that
|
|
115
|
+
every single-process Ruby server hits. The old CPU-tuning recipe is
|
|
116
|
+
retired: its `threads 1` half **is** the default now, and its
|
|
117
|
+
`tokio_threads 1` half costs −12% on real hardware; see
|
|
118
|
+
[doc/benchmarks.md](doc/benchmarks.md#cpu-bound-tuning).
|
|
119
|
+
|
|
120
|
+
² Wait-bound throughput is slots ÷ wait, and the default columns bring
|
|
121
|
+
8 single-thread workers against the cluster's 24 threads. Kino slots
|
|
122
|
+
are threads, not processes—when your app waits a lot, raise `workers`.
|
|
123
|
+
The `workers 32` column is that tuning: **+25% over the cluster on /io
|
|
124
|
+
(+34% via `Kino.sleep`)** while still ahead of it on pure CPU, all in
|
|
125
|
+
one small process. The cost is the CPU-light rows (32 ractors
|
|
126
|
+
oversubscribe 8 cores); pick the topology your app's wait profile
|
|
127
|
+
needs. See
|
|
105
128
|
[doc/benchmarks.md](doc/benchmarks.md#why-io-lags-in-ractor-mode-on-linux).
|
|
106
129
|
|
|
130
|
+
³ With `MALLOC_ARENA_MAX=2` (the standard Ruby deployment setting;
|
|
131
|
+
Heroku's default). Without it, 24 threads churning 10 KB responses
|
|
132
|
+
through one glibc heap balloon to ~670 MB—an arena-fragmentation
|
|
133
|
+
footgun, not a leak, and ractor mode sidesteps it. See
|
|
134
|
+
[doc/benchmarks.md](doc/benchmarks.md#memory-under-load-and-the-glibc-arena-footgun).
|
|
135
|
+
|
|
107
136
|
A common first idea is to keep your current server and wrap the app in
|
|
108
137
|
a ractor pool. We measured that too (same box; the analysis is in the
|
|
109
138
|
doc):
|
|
110
139
|
|
|
111
|
-
| endpoint | Kino :ractor | Puma + ractor wrapper | Falcon + ractor wrapper |
|
|
112
|
-
|
|
113
|
-
| /plaintext |
|
|
114
|
-
| /cpu (fib) |
|
|
115
|
-
| /io (5 ms) |
|
|
116
|
-
|
|
117
|
-
|
|
118
|
-
|
|
119
|
-
|
|
120
|
-
|
|
121
|
-
|
|
122
|
-
|
|
140
|
+
| endpoint | Kino :ractor (8×3) | Puma + ractor wrapper | Falcon + ractor wrapper |
|
|
141
|
+
|------------|-------------------:|----------------------:|------------------------:|
|
|
142
|
+
| /plaintext | **193,826** | 19,480 | 99,776 |
|
|
143
|
+
| /cpu (fib) | **68,061** | 17,755 | 48,721 |
|
|
144
|
+
| /io (5 ms) | **4,530** | 1,454 | 1,549 |
|
|
145
|
+
|
|
146
|
+
### Rails
|
|
147
|
+
|
|
148
|
+
Rails is not Ractor-shareable today, so Kino serves it in `:threaded`
|
|
149
|
+
fallback — one GVL-bound process. On the same box (`examples/rails-hello`,
|
|
150
|
+
edge Rails, production, 8×5):
|
|
151
|
+
|
|
152
|
+
| Rails hello-world | req/s | memory (PSS) |
|
|
153
|
+
|------------------------------|-------:|-------------:|
|
|
154
|
+
| Kino :threaded (one process) | 2,637 | **92 MB** |
|
|
155
|
+
| Puma cluster (8 workers) | 12,138 | 389 MB |
|
|
156
|
+
|
|
157
|
+
The honest trade-off: Puma's fork cluster uses all 8 cores, so it serves
|
|
158
|
+
~4.6× the throughput — at ~4× the memory. Ractor-mode Rails would close
|
|
159
|
+
the throughput gap at one-process memory cost; the upstream blockers are
|
|
160
|
+
tracked in [doc/rails-on-ractors.md](doc/rails-on-ractors.md).
|
|
161
|
+
|
|
162
|
+
In short: on the tiny synthetic app, ractor mode beats fork-level CPU parallelism (**5.8×** Kino's
|
|
163
|
+
own GVL-bound threaded mode, +34% over the cluster) in one process, at
|
|
164
|
+
about 1/7th of the cluster's memory by PSS (~4× on a real Rails app).
|
|
165
|
+
Every Kino mode is 1.5-2.1× ahead of the cluster on I/O-light endpoints. The macOS numbers
|
|
166
|
+
(secondary; everything there hits the loopback ceiling) and the
|
|
167
|
+
YJIT × Ractors gotcha are in [doc/benchmarks.md](doc/benchmarks.md).
|
|
123
168
|
|
|
124
169
|
Reproduce: `bench/run.sh [seconds] [concurrency]` for the main table,
|
|
125
170
|
`bench/studies.sh` for the follow-ups (CPU recipe, topology, scaling,
|
|
@@ -174,10 +219,10 @@ server = Kino::Server.new(app,
|
|
|
174
219
|
bind: "127.0.0.1",
|
|
175
220
|
port: 9292, # 0 = ephemeral; read back via server.port
|
|
176
221
|
workers: Etc.nprocessors, # ractors (parallelism)
|
|
177
|
-
threads:
|
|
222
|
+
threads: 1, # per worker; ractor default 1, threaded default 3
|
|
178
223
|
mode: :auto, # :auto | :ractor | :threaded
|
|
179
224
|
queue_depth: 1024, # bounded queue; overflow → 503
|
|
180
|
-
queue_timeout:
|
|
225
|
+
queue_timeout: 5.0, # seconds before 503 on a full queue
|
|
181
226
|
request_timeout: nil, # seconds before a slow response becomes a 504 (nil = off)
|
|
182
227
|
shutdown_timeout: 30, # drain deadline
|
|
183
228
|
tls: { cert: "cert.pem", key: "key.pem" }, # file paths or inline PEM
|
|
@@ -210,7 +255,7 @@ kwargs and CLI flags > config file > defaults.
|
|
|
210
255
|
# kino.rb
|
|
211
256
|
port 9292
|
|
212
257
|
workers 8
|
|
213
|
-
threads
|
|
258
|
+
threads 1
|
|
214
259
|
mode :ractor
|
|
215
260
|
```
|
|
216
261
|
|
|
@@ -266,7 +311,7 @@ cost):
|
|
|
266
311
|
|
|
267
312
|
```ruby
|
|
268
313
|
server.stats
|
|
269
|
-
# => {mode: :ractor, lanes: false, workers: 8, threads:
|
|
314
|
+
# => {mode: :ractor, lanes: false, workers: 8, threads: 1, batch: 1,
|
|
270
315
|
# respawns: 0, queued: 0, in_flight: 2, served: 1041, rejected: 0,
|
|
271
316
|
# timeouts: 0}
|
|
272
317
|
# plus lane_depths: [...] when lane dispatch is on
|
|
@@ -276,19 +321,20 @@ From the outside, `kill -USR1 <pid>` prints the same snapshot as one line
|
|
|
276
321
|
(pair it with `pidfile` to find the pid):
|
|
277
322
|
|
|
278
323
|
```
|
|
279
|
-
Kino stats: mode=:ractor lanes=false workers=8 threads=
|
|
324
|
+
Kino stats: mode=:ractor lanes=false workers=8 threads=1 batch=1 respawns=0 queued=0 in_flight=2 served=1041 rejected=0 timeouts=0
|
|
280
325
|
```
|
|
281
326
|
|
|
282
327
|
## Logging
|
|
283
328
|
|
|
284
329
|
With one log line per request, `Kino::Logger` sustained **2.4× the
|
|
285
|
-
throughput of a shared `::Logger`** (
|
|
330
|
+
throughput of a shared `::Logger`** (149k vs 63k req/s on the benchmark
|
|
286
331
|
box). There are two native pieces. Both write through a lock-free
|
|
287
332
|
channel to a Rust flusher thread, so request threads never take a log
|
|
288
333
|
mutex and never make a write syscall:
|
|
289
334
|
|
|
290
335
|
- **Access log** (`log_requests true`): one line per request to stdout,
|
|
291
|
-
including the 503s that never reach your app.
|
|
336
|
+
including the 503s that never reach your app. Recommended in
|
|
337
|
+
development; cheap enough for production. On color terminals the
|
|
292
338
|
lines are tinted by status class: 2xx green, 3xx yellow, 4xx maroon,
|
|
293
339
|
5xx bright red:
|
|
294
340
|
|