RubyGems - kino - Versions diffs - 0.1.0 → 0.1.2 - Mend

kino 0.1.0 → 0.1.2


Files changed (17)

checksums.yaml +4 -4
data/CHANGELOG.md +37 -0
data/Cargo.lock +1 -1
data/README.md +103 -57
data/doc/benchmarks.md +208 -89
data/doc/rails-on-ractors.md +5 -4
data/doc/why-kino.md +8 -8
data/ext/kino/Cargo.toml +1 -1
data/ext/kino/src/registry.rs +4 -0
data/ext/kino/src/request.rs +33 -1
data/ext/kino/src/server.rs +123 -25
data/lib/kino/configuration.rb +14 -3
data/lib/kino/server.rb +21 -1
data/lib/kino/templates/kino.rb.tt +63 -83
data/lib/kino/version.rb +1 -1
data/sig/kino.rbs +2 -0
metadata +1 -1

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: a4e73026c9f3087a58234e0f8fc88be1e407bb7435e8e8a0be43d087f318bcad
-  data.tar.gz: 49581091f865d3ff11227900376ad9499d1bddef77a4b3b0bc5b3b7405d58bd8
+  metadata.gz: 0bd8e6e3b295832fa1d87743b4c9a121cdb5687287011b29f1140814fdae0575
+  data.tar.gz: f21305459e857d366159ee258d873e2109b7a7715bb0cbe8d23c47e458ae2b33
 SHA512:
-  metadata.gz: 3e420380536f5c73a417d20945ad49f7f0a8e46d3cec5feabfde50600aab9dfc18b2e16d64656b2e1529e5e2c08384efabf4be9f36bebfb9cc737a245e348aec
-  data.tar.gz: f0e38db0ea8bb93cfa7825db47ebcf72319e76ac82487ed6efb191d82db86a8c205899f7996f34976b36d1960dd97f3086fa4a5359179ce8c7f35bfa5d8b94f4
+  metadata.gz: 04e95f9ee2133b4d15bdd2069977e73791a16b033e57a3bdb88478d0048b0bb5d8f56eff1963813bbf8d9399461bb80bf9c33a2039f98805e4f129b3e42b28ef
+  data.tar.gz: 8eb131cdbbe5bdbd29d188ab4ff43dbbec2f8c791aacf85586ba69c7208b2f74cc05d4007c469a4de61c4578ef08214e56a4fd080b9ad655af3a01d208b4c752

data/CHANGELOG.md CHANGED

@@ -1,3 +1,40 @@
+## [0.1.2] - 2026-06-22
+- Drop a connection that has not sent its complete request headers
+  within 15 seconds. Closes a slowloris hole: hyper's built-in header-read
+  timeout was inert because the server installed no timer, so a slow-header
+  client could tie up a connection (and its tokio task) indefinitely.
+- Cap concurrent connections (new `max_connections` directive). Past the cap,
+  new connections wait in the kernel backlog instead of piling up until a
+  flood exhausts file descriptors or memory. Defaults to most of the process
+  open-file limit (`ulimit -n`), so it scales with the OS limit and only
+  engages under a flood.
+- Bound the TLS handshake to 10 seconds. A client that completed the TCP
+  connect but stalled the handshake could otherwise hold a connection slot
+  indefinitely, since the request and header-read deadlines only begin once
+  the handshake finishes.
+- Cap the request body at 50 MB by default (new `max_body_size` directive,
+  configurable; nil or 0 disables and delegates to a fronting proxy). An app
+  that reads `rack.input` could otherwise be driven to run out of memory by an
+  oversized or endless upload. A truthful oversize Content-Length is refused
+  with a 413 before the app runs; a chunked or lying client is cut off
+  mid-stream once it passes the cap.
+- Bound the idle time between request-body frames to 30 seconds. A client that
+  began a request then stalled mid-body would otherwise keep a worker blocked
+  in `rack.input.read` indefinitely; now the read raises and the worker
+  reclaims its slot. Only a silent client trips it: a steadily-sent body resets
+  the deadline each frame, so slow-but-active uploads are unaffected.
+## [0.1.1] - 2026-06-11
+- Mode-dependent `threads` default: 1 per worker in :ractor mode (threads
+  inside a ractor share its lock and cost a per-request handoff; +16-18%
+  on fast handlers, measured on dedicated hardware), 3 in :threaded mode.
+  Explicit `threads` always wins; waiting-heavy ractor apps should raise
+  `workers` instead.
+- `queue_timeout` default raised from 1 to 5 seconds: a brief burst now
+  waits out the spike instead of shedding 503s within a second.
 ## [0.1.0] - 2026-06-11
 Initial release.
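
The 0.1.2 body-cap entry above describes two enforcement points: an up-front 413 when the declared Content-Length exceeds the cap, and a mid-stream cutoff for chunked or mis-declared bodies. A minimal Ruby sketch of that decision logic follows. It is an illustration only, not Kino's implementation (the diff shows that work landing in the Rust extension); `BodyCap`, `precheck`, and `consume` are hypothetical names, and treating the cap as a byte count is an assumption.

```ruby
# Hypothetical illustration of the two body-cap checks described in the
# changelog. Not Kino's code; names and the byte unit are assumptions.
class BodyCap
  class TooLarge < StandardError; end

  DEFAULT_CAP = 50 * 1024 * 1024 # "50 MB" read as MiB here; an assumption

  def initialize(cap = DEFAULT_CAP)
    # nil or 0 disables the cap, mirroring the changelog wording.
    @cap = (cap.nil? || cap.zero?) ? nil : cap
    @seen = 0
  end

  # Up-front check: returns 413 when the declared length exceeds the cap,
  # nil when the request may proceed to the app.
  def precheck(content_length)
    return nil if @cap.nil? || content_length.nil?

    content_length > @cap ? 413 : nil
  end

  # Streaming check: count each frame and cut off once the running total
  # passes the cap. Covers chunked bodies and wrong Content-Length values.
  def consume(frame)
    @seen += frame.bytesize
    raise TooLarge, "body exceeded #{@cap} bytes" if @cap && @seen > @cap

    frame
  end
end
```

The two checks are complementary: the precheck rejects an honest oversize upload before any body is read, while the running count is the only defense when no trustworthy length is available.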

data/Cargo.lock CHANGED

@@ -332,7 +332,7 @@ dependencies = [
 [[package]]
 name = "kino"
-version = "0.1.0"
+version = "0.1.2"
 dependencies = [
  "ahash",
  "bytes",

data/README.md CHANGED

@@ -11,14 +11,14 @@ on every core in **one small process**. A **Rust** (tokio + hyper)
 front-end owns the network, parallel **Ractors** run your Rack 3 app,
 and a threaded fallback mode runs everything else, Rails included.
-* **Fast.** On a real 8-core server, every Kino mode is **1.4-2×** ahead
-  of a same-topology Puma cluster on I/O-light endpoints. Ractor mode
-  also wins on pure CPU. [Benchmarks](#benchmarks) below.
-* **A fraction of the memory.** One process instead of a fork per core:
-  about **1/19th of the Puma cluster's memory** under the same load, and
-  about 1/8th when serving the Rails hello-world.
-* **Parallel without forking.** Ractor mode runs CPU work **5×** faster
-  than Kino's own GVL-bound threaded mode, in the same small process.
+* **Fast.** On a real 8-core server, every Kino mode is **1.5-2×**
+  ahead of a Puma fork cluster on I/O-light endpoints. Ractor mode also
+  wins on pure CPU, **30%+**. [Benchmarks](#benchmarks) below.
+* **A fraction of the memory.** About **7× less memory** than the Puma
+  cluster on the tiny benchmark app in ractor mode, and about **4× less**
+  than a Puma cluster serving Rails in threaded fallback mode.
+* **Parallel without forking.** Ractor mode runs CPU work **more than
+  5× faster** than Kino's own GVL-bound threaded mode, in the same
+  small process.
 * **Production plumbing included.** Graceful drain, crash supervision
   and respawn, bounded queues with 503 backpressure, request timeouts,
   TLS (rustls), live stats, async access and app logging.
@@ -63,63 +63,108 @@ notes live in [doc/architecture.md](doc/architecture.md).
 ## Benchmarks
 Measured on a real server: AWS **c7a.2xlarge** (8-core AMD EPYC 9R14,
-16 GB, Amazon Linux 2023). This is a realistic app-server size. The same
-Ractor-shareable app runs on every server, Ruby 4.0.5 with YJIT, equal
-topology (8 workers × 3 threads; Puma forks, Kino stays in one process).
-Numbers are req/s by wrk (8-second windows, 64 connections, same host).
-Methodology and the analysis behind every column:
+16 GB, Amazon Linux 2023). This is a realistic app-server size.
+**These tables run a tiny synthetic Rack app**—plaintext, a 10 KB body,
+a CPU-bound `fib`, a 5 ms wait—deliberately small, to measure the server
+rather than an app. It is Ractor-shareable, so Kino runs it in `:ractor`
+mode (and `:threaded` for comparison). **A real Rails app is a different
+story:** it is *not* Ractor-shareable, so it runs only in Kino's
+`:threaded` fallback, with its own numbers—see [Rails](#rails) below.
+Ruby 4.0.5 with YJIT, every server at its defaults: Puma forks 8 workers ×
+3 threads, Kino stays in one process (8 workers; 1 thread each in ractor
+modes, 3 in threaded). Numbers are req/s by wrk (8-second windows, 64
+connections, same host). Methodology:
 [doc/benchmarks.md](doc/benchmarks.md).
-| endpoint    | Kino :ractor | + lanes | Kino :threaded | Puma (cluster) |
-|-------------|-------------:|--------:|---------------:|---------------:|
-| /plaintext  |      201,472 | **241,501** |    218,348 |        117,838 |
-| /10k        |      156,635 | **183,564** |    153,442 |        106,666 |
-| /cpu (fib)  |       66,735¹| **70,373**  |     13,298 |         58,207 |
-| /io (5 ms)  |        4,527²|   4,530 |      **4,715** |          4,691 |
-| /io_native  |        4,714 | **4,717** |        4,709 |          4,692 |
+| endpoint    | Kino :ractor | + lanes | :ractor, `workers 32`² | Kino :threaded | Puma (cluster) |
+|-------------|-------------:|--------:|-----------------------:|---------------:|---------------:|
+| /plaintext  |      229,534 | **250,222** |         182,997 |        216,994 |        118,176 |
+| /10k        |      178,083 | **189,862** |         151,034 |        160,400 |        106,768 |
+| /cpu (fib)  |   **77,999**¹|  70,885 |          66,100 |         13,429 |         58,006 |
+| /io (5 ms)  |        1,552 |   1,551 |       **5,888** |          4,709 |          4,693 |
+| /io_native  |        1,570 |   1,571 |       **6,274** |          4,695 |          4,691 |
-Memory on the same box, RSS under load:
+Memory tells two different stories depending on the app, both by **PSS**
+(proportional set size; see note) after sustained load.
-| serving               | Kino (one process) | Puma cluster (8 workers) |
-|-----------------------|-------------------:|-------------------------:|
-| bench app, :ractor    |          **57 MB** |                 1,078 MB |
-| bench app, :threaded  |          **50 MB** |                 1,078 MB |
-| Rails hello-world     |          **97 MB** |                   797 MB |
+**The tiny benchmark app** (Ractor-shareable, so Kino runs it in `:ractor`
+or `:threaded`). Kino is **~7× lighter in :ractor mode, ~10× in :threaded**
+than the Puma cluster — the gap stays large because a trivial app is almost
+all private per-worker heap, which copy-on-write can't share:
+| tiny app, Kino  | Kino (one process) | Puma cluster (8 workers) | ratio |
+|-----------------|-------------------:|-------------------------:|------:|
+| :ractor (8×1)   |         **148 MB** |                 1,068 MB |  ~7×  |
+| :threaded (8×3) |         **107 MB**³|                 1,068 MB | ~10×  |
+**A real Rails app** (not Ractor-shareable—Kino's `:threaded` fallback
+only, [below](#rails)). The gap is **~4×**, smaller because Rails' large
+framework *is* shared copy-on-write across Puma's forks:
+| Rails hello-world | Kino :threaded | Puma cluster (8 workers) | ratio |
+|-------------------|---------------:|-------------------------:|------:|
+| **PSS**           |      **92 MB** |               **389 MB** |  ~4×  |
 "+ lanes" is the experimental per-worker-queue dispatcher (`lanes true`).
-It adds +20% over the shared queue on this hardware and makes ractor
-mode the fastest Kino configuration. Details:
+It posts the fastest plaintext/10k of any configuration here. Details:
 [doc/benchmarks.md](doc/benchmarks.md#lane-dispatch-experimental-lanes-true).
 ¹ Stock settings, no tuning. Ractor mode beats the fork cluster on pure
-CPU by +15% (+21% with lanes). Threaded mode shows the GVL ceiling that
-every single-process Ruby server hits. The CPU-tuning recipe that our
-earlier Docker measurements needed makes no difference on real hardware
-(+0.5%); see [doc/benchmarks.md](doc/benchmarks.md#cpu-bound-tuning).
-² The ractor timer tax is small on real hardware: −4% against threaded
-mode (it was −18% in Docker). Wait-bound throughput is slots ÷ wait, and
-Kino slots are threads, not processes. `workers 32, threads 1` measured
-**5,922 /io (+27% over the cluster) and 6,254 /io_native (+34%)**, still
-one small process. See
+CPU by +34% (+22% with lanes). Threaded mode shows the GVL ceiling that
+every single-process Ruby server hits. The old CPU-tuning recipe is
+retired: its `threads 1` half **is** the default now, and its
+`tokio_threads 1` half costs −12% on real hardware; see
+[doc/benchmarks.md](doc/benchmarks.md#cpu-bound-tuning).
+² Wait-bound throughput is slots ÷ wait, and the default columns bring
+8 single-thread workers against the cluster's 24 threads. Kino slots
+are threads, not processes—when your app waits a lot, raise `workers`.
+The `workers 32` column is that tuning: **+25% over the cluster on /io
+(+34% via `Kino.sleep`)** while still ahead of it on pure CPU, all in
+one small process. The cost is the CPU-light rows (32 ractors
+oversubscribe 8 cores); pick the topology your app's wait profile
+needs. See
 [doc/benchmarks.md](doc/benchmarks.md#why-io-lags-in-ractor-mode-on-linux).
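
The "slots ÷ wait" rule in footnote ² can be checked against the /io (5 ms) row of the table. A short Ruby sketch, where `wait_bound_ceiling` is a hypothetical helper and the slot counts are taken from the topologies stated in the text (8×1 for default ractor mode, 8×3 for threaded mode and the Puma cluster, 32×1 for the `workers 32` column):

```ruby
# Upper bound on req/s for a purely wait-bound endpoint: each slot
# (thread) completes one request per wait interval.
def wait_bound_ceiling(slots, wait_ms)
  slots * 1000.0 / wait_ms
end

WAIT_MS = 5 # the 5 ms wait of the /io endpoint

ceilings = {
  "ractor 8x1"   => wait_bound_ceiling(8, WAIT_MS),  # 1600.0
  "threaded 8x3" => wait_bound_ceiling(24, WAIT_MS), # 4800.0
  "ractor 32x1"  => wait_bound_ceiling(32, WAIT_MS), # 6400.0
}

ceilings.each { |name, rps| puts format("%-13s %.0f req/s", name, rps) }
```

The measured /io figures sit just under these ceilings: 1,552 for default ractor mode, 4,709 for threaded mode, 4,693 for the Puma cluster, and 5,888 for `workers 32`.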
+³ With `MALLOC_ARENA_MAX=2` (the standard Ruby deployment setting;
+Heroku's default). Without it, 24 threads churning 10 KB responses
+through one glibc heap balloon to ~670 MB—an arena-fragmentation
+footgun, not a leak, and ractor mode sidesteps it. See
+[doc/benchmarks.md](doc/benchmarks.md#memory-under-load-and-the-glibc-arena-footgun).
 A common first idea is to keep your current server and wrap the app in
 a ractor pool. We measured that too (same box; the analysis is in the
 doc):
-| endpoint   | Kino :ractor | Puma + ractor wrapper | Falcon + ractor wrapper |
-|------------|-------------:|----------------------:|------------------------:|
-| /plaintext |  **201,472** |                19,425 |                 100,624 |
-| /cpu (fib) |   **66,735** |                17,106 |                  49,083 |
-| /io (5 ms) |    **4,527** |                 1,447 |                   1,549 |
-In short: ractor mode reaches fork-level CPU parallelism (**5×** Kino's
-own GVL-bound threaded mode) in one process, at about 1/19th of the
-cluster's memory. Every Kino mode is 1.4-2× ahead of the cluster on
-I/O-light endpoints. The macOS numbers (secondary; everything there hits
-the loopback ceiling) and the YJIT × Ractors gotcha are in
-[doc/benchmarks.md](doc/benchmarks.md).
+| endpoint   | Kino :ractor (8×3) | Puma + ractor wrapper | Falcon + ractor wrapper |
+|------------|-------------------:|----------------------:|------------------------:|
+| /plaintext |        **193,826** |                19,480 |                  99,776 |
+| /cpu (fib) |         **68,061** |                17,755 |                  48,721 |
+| /io (5 ms) |          **4,530** |                 1,454 |                   1,549 |
+### Rails
+Rails is not Ractor-shareable today, so Kino serves it in `:threaded`
+fallback — one GVL-bound process. On the same box (`examples/rails-hello`,
+edge Rails, production, 8×5):
+| Rails hello-world            |  req/s | memory (PSS) |
+|------------------------------|-------:|-------------:|
+| Kino :threaded (one process) |  2,637 |    **92 MB** |
+| Puma cluster (8 workers)     | 12,138 |       389 MB |
+The honest trade-off: Puma's fork cluster uses all 8 cores, so it serves
+~4.6× the throughput — at ~4× the memory. Ractor-mode Rails would close
+the throughput gap at one-process memory cost; the upstream blockers are
+tracked in [doc/rails-on-ractors.md](doc/rails-on-ractors.md).
+In short: on the tiny synthetic app, ractor mode beats the fork cluster
+on pure CPU (+34%, and **5.8×** Kino's own GVL-bound threaded mode) in
+one process, at about 1/7th of the cluster's memory by PSS (about 1/4 on
+a real Rails app).
+Every Kino mode is 1.5-2.1× ahead of the cluster on I/O-light endpoints. The macOS numbers
+(secondary; everything there hits the loopback ceiling) and the
+YJIT × Ractors gotcha are in [doc/benchmarks.md](doc/benchmarks.md).
 Reproduce: `bench/run.sh [seconds] [concurrency]` for the main table,
 `bench/studies.sh` for the follow-ups (CPU recipe, topology, scaling,
@@ -174,10 +219,10 @@ server = Kino::Server.new(app,
   bind: "127.0.0.1",
   port: 9292,                 # 0 = ephemeral; read back via server.port
   workers: Etc.nprocessors,   # ractors (parallelism)
-  threads: 3,                 # threads per ractor (I/O concurrency, Puma-style)
+  threads: 1,                 # per worker; ractor default 1, threaded default 3
   mode: :auto,                # :auto | :ractor | :threaded
   queue_depth: 1024,          # bounded queue; overflow → 503
-  queue_timeout: 1.0,         # seconds before 503 on a full queue
+  queue_timeout: 5.0,         # seconds before 503 on a full queue
   request_timeout: nil,       # seconds before a slow response becomes a 504 (nil = off)
   shutdown_timeout: 30,       # drain deadline
   tls: { cert: "cert.pem", key: "key.pem" },  # file paths or inline PEM
@@ -210,7 +255,7 @@ kwargs and CLI flags > config file > defaults.
 # kino.rb
 port 9292
 workers 8
-threads 3
+threads 1
 mode :ractor
 ```
@@ -266,7 +311,7 @@ cost):
 ```ruby
 server.stats
-# => {mode: :ractor, lanes: false, workers: 8, threads: 3, batch: 1,
+# => {mode: :ractor, lanes: false, workers: 8, threads: 1, batch: 1,
 #     respawns: 0, queued: 0, in_flight: 2, served: 1041, rejected: 0,
 #     timeouts: 0}
 # plus lane_depths: [...] when lane dispatch is on
@@ -276,19 +321,20 @@ From the outside, `kill -USR1 <pid>` prints the same snapshot as one line
 (pair it with `pidfile` to find the pid):
 ```
-Kino stats: mode=:ractor lanes=false workers=8 threads=3 batch=1 respawns=0 queued=0 in_flight=2 served=1041 rejected=0 timeouts=0
+Kino stats: mode=:ractor lanes=false workers=8 threads=1 batch=1 respawns=0 queued=0 in_flight=2 served=1041 rejected=0 timeouts=0
 ```
 ## Logging
 With one log line per request, `Kino::Logger` sustained **2.4× the
-throughput of a shared `::Logger`** (151k vs 63k req/s on the benchmark
+throughput of a shared `::Logger`** (149k vs 63k req/s on the benchmark
 box). There are two native pieces. Both write through a lock-free
 channel to a Rust flusher thread, so request threads never take a log
 mutex and never make a write syscall:
 - **Access log** (`log_requests true`): one line per request to stdout,
-  including the 503s that never reach your app. On color terminals the
+  including the 503s that never reach your app. Recommended in
+  development; cheap enough for production. On color terminals the
   lines are tinted by status class: 2xx green, 3xx yellow, 4xx maroon,
   5xx bright red: