RubyGems: rubino-agent 0.5.2.1 → 0.5.2.2

Files changed (60)

checksums.yaml +4 -4
data/.dockerignore +15 -0
data/CHANGELOG.md +56 -0
data/Dockerfile +56 -0
data/agent.md +112 -0
data/docs/design/bg-shell-pty-port.md +88 -0
data/docs/design/bg-shell-review-refinements.md +65 -0
data/docs/design/bg-shell-ux.md +130 -0
data/docs/tools.md +3 -12
data/lib/rubino/agent/iteration_budget.rb +13 -0
data/lib/rubino/agent/loop.rb +43 -5
data/lib/rubino/agent/prompts/build.txt +3 -5
data/lib/rubino/agent/prompts/memory_guidance.txt +5 -0
data/lib/rubino/agent/prompts/tool_use_enforcement.txt +4 -0
data/lib/rubino/agent/prompts/tool_use_enforcement_google.txt +9 -0
data/lib/rubino/agent/prompts/tool_use_enforcement_openai.txt +48 -0
data/lib/rubino/agent/runner.rb +55 -12
data/lib/rubino/agent/tool_executor.rb +1 -1
data/lib/rubino/cli/chat/idle_card_host.rb +6 -1
data/lib/rubino/cli/chat_command.rb +119 -17
data/lib/rubino/cli/commands.rb +5 -0
data/lib/rubino/commands/handlers/agents.rb +27 -18
data/lib/rubino/commands/handlers/status.rb +6 -3
data/lib/rubino/config/configuration.rb +25 -8
data/lib/rubino/config/defaults.rb +15 -13
data/lib/rubino/context/prompt_assembler.rb +89 -1
data/lib/rubino/context/summary_builder.rb +0 -22
data/lib/rubino/interaction/events.rb +2 -2
data/lib/rubino/interaction/lifecycle.rb +54 -20
data/lib/rubino/llm/ruby_llm_adapter.rb +178 -20
data/lib/rubino/security/redactor.rb +1 -1
data/lib/rubino/session/message.rb +12 -0
data/lib/rubino/tools/background_tasks.rb +107 -12
data/lib/rubino/tools/base.rb +1 -1
data/lib/rubino/tools/read_attachment_tool.rb +52 -54
data/lib/rubino/tools/registry.rb +21 -72
data/lib/rubino/tools/shell_entry_adapter.rb +97 -0
data/lib/rubino/tools/shell_input_tool.rb +1 -1
data/lib/rubino/tools/shell_kill_tool.rb +4 -4
data/lib/rubino/tools/shell_registry.rb +178 -38
data/lib/rubino/tools/shell_tool.rb +45 -5
data/lib/rubino/tools/task_result_tool.rb +4 -1
data/lib/rubino/tools/task_tool.rb +74 -11
data/lib/rubino/tools/vision_tool.rb +1 -1
data/lib/rubino/ui/agent_menu.rb +8 -2
data/lib/rubino/ui/api.rb +11 -0
data/lib/rubino/ui/bottom_composer.rb +24 -11
data/lib/rubino/ui/cli.rb +254 -15
data/lib/rubino/ui/markdown_renderer.rb +4 -1
data/lib/rubino/ui/stdout_proxy.rb +25 -10
data/lib/rubino/ui/streaming_markdown.rb +67 -12
data/lib/rubino/ui/subagent_cards.rb +8 -7
data/lib/rubino/ui/tool_args_stream.rb +143 -0
data/lib/rubino/update_check.rb +10 -2
data/lib/rubino/version.rb +1 -1
metadata +14 -6
data/AGENTS.md +0 -97
data/docs/agents.md +0 -216
data/lib/rubino/jobs/handlers/summarize_session_job.rb +0 -21
data/lib/rubino/tools/summarize_file_tool.rb +0 -194

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 45e009503b875320e560be8ce46459d7c2a70b0c5744043b70f2121c5e747ab2
-  data.tar.gz: c26d1f586ae8578ed3d7298d1f7f8765d16ed5328b6d063d77e0df41d32a710a
+  metadata.gz: ea93727a0527a270cfbad507d459eb7676532dd4f715967c9d6924cc9aa81c82
+  data.tar.gz: c11bf3ca63ed02a7705447fd65cd58f197939631c1c408a0bb7415b59e333c97
 SHA512:
-  metadata.gz: dc1cfd93fb51e6236daf8e839a8df248fdcd299aef3028c258fefc93c585bf15e322b5d41ebf4d1b504795a9d24d6e8668f5dece2089b9454f6432dabcace482
-  data.tar.gz: 5495d4862d7b082826205ca4ecb30701df4a103ce801233df6a3dc91c59ce423ea4cb6cab4e41b9b0327f72e911ad310bf9433aaf6d218d8803bb0ffd56bec65
+  metadata.gz: a0dfb145e9f590745b3cb178768581734109e0ac1d9fb5ec3b10552303e977323b394f96f20af66603d3f921aa180c32c593d77587756755c02195fcce6773e6
+  data.tar.gz: ccaf380590fe7c71c0a3d65f45e949b92ecd2c4f8ee8f00631e46da3b3188910fe49df333431746fc431553d212b6e7c006e4846df22ea9f8c7197f5bf9476c5

data/.dockerignore ADDED

@@ -0,0 +1,15 @@
+# Build context trimming — keep the image small and the build fast. The gem is
+# run from source (lib/ + exe/ + Gemfile), so none of the below is needed at
+# build or run time. .git is the big one (~66M) and the gemspec's `git ls-files`
+# simply yields [] without it, which is harmless (we run via exe/rubino, not the
+# packaged file list).
+.git
+.github
+coverage
+tmp
+pkg
+*.gem
+*.log
+.DS_Store
+node_modules
+spec/tmp

data/CHANGELOG.md CHANGED

@@ -1,5 +1,61 @@
 # Changelog
+## [Unreleased]
+## [0.5.2.2] - 2026-07-01
+### Added
+- **Background shells get the same dev UX as background subagents.** A shell
+  started with `run_in_background: true` now appears in the `↓` picker and the
+  live cards alongside subagents, can be FOCUSED (Enter attaches to a cleared
+  view that live-tails its output and lets you type straight to its stdin), and
+  STOPPED with `/stop`. Interactive shells run on a real PTY, so `y/N` prompts,
+  sudo passwords, and tty-aware programs work where a plain pipe couldn't.
+  `probe` (an instant output snapshot — no LLM call), `steer` (→ stdin), the
+  `/agents` list and the `/status` count all treat shells consistently with
+  subagents. Stopping a subagent **cascade-kills the background shells it
+  opened** (shells the user/main agent opened are left running). Ported from
+  Hermes' `ptyprocess`/`process_registry` model.
+- **H1/H2 headings get breathing room** — a blank line above and below big
+  headings so they break the surrounding prose instead of sitting glued to it;
+  H3+ stay compact.
+### Changed
+- **The per-turn wall clock is disabled by default.** `agent.max_turn_seconds`
+  (was 600s) guillotined legitimate multi-file work on slow local models — a real
+  docs-vs-code audit runs dozens of tool calls over 10+ minutes and was
+  force-summarized into a confused non-answer. The default is now `nil`
+  (disabled); the tool-iteration budget (`max_tool_iterations`, 90) is the
+  runaway guard, and per-tool timeouts bound a hung tool. Hermes parity (its
+  IterationBudget has no clock). Set a positive number to re-arm the clock as a
+  backstop.
+- The background picker header reads "background" (not "subagents") now that it
+  lists background shells alongside subagents.
+### Fixed
+- **Truncated subagents are reported as PARTIAL, not "completed".** A background
+  subagent force-summarized at its budget/time cap used to return its partial
+  progress recap as a normal completion, so the parent — and you — got a false
+  success with no real deliverable. The turn's terminal stop reason now flows out
+  of the loop, and the completion notice, the main-timeline marker, and
+  `task_result` all mark a cut-off child **PARTIAL** with a banner telling the
+  parent the delegated work is unfinished — so it recovers (re-delegates or
+  finishes the work itself) instead of trusting a false completion.
+- **A subagent's own `max_turns` budget is honored again.** `explore`'s per-agent
+  cap (20) was silently dropped — the runner passed `nil` for subagents, so the
+  cap never applied. A subagent now honors its cap and, on reaching it, surfaces
+  the budget-extension request (#574) instead of silently force-summarizing.
+- **`rubino update` now reports the new version correctly.** After `gem update`
+  pulled a newer gem, the command read the version via
+  `Gem::Specification.find_by_name`, which returns the spec ACTIVATED in the
+  running process — so it still saw the old version and wrongly printed "rubino is
+  already up to date" even though the update had installed. It now `Gem.refresh`es
+  and reads the HIGHEST installed version (`find_all_by_name(...).max`), so the
+  post-update message reflects what was actually installed.
 ## [0.5.2.1] - 2026-06-26
 ### Fixed
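
Reviewer note on the `rubino update` entry above: a minimal sketch of the "refresh, then read the highest installed version" idea the changelog describes. `Gem.refresh`, `Gem::Specification.find_all_by_name`, and `Gem::Version` are standard RubyGems APIs; the helper names `highest_version` and `highest_installed_version` are hypothetical and are not taken from the package.

```ruby
require "rubygems"

# Hypothetical helper: pick the highest version from a list of version strings.
# Gem::Version compares segment by segment, so "0.10.0" sorts above "0.9.0".
def highest_version(version_strings)
  version_strings.map { |v| Gem::Version.new(v) }.max
end

# Hypothetical helper: highest INSTALLED version of a gem, or nil when the gem
# is not installed. Gem.refresh drops the cached spec list first, so a gem
# installed after this process started becomes visible.
def highest_installed_version(gem_name)
  Gem.refresh
  Gem::Specification.find_all_by_name(gem_name).map(&:version).max
end
```

Taking the maximum over all installed specs, rather than the single spec activated in the running process, is what lets a post-update message reflect the freshly installed version.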

data/Dockerfile ADDED

@@ -0,0 +1,56 @@
+# Ubuntu-based image that runs rubino-agent FROM SOURCE, for manual testing.
+#
+#   Build:  docker build -t rubino:latest .
+#   Run:    docker run --rm -it \
+#             -v "$PWD":/work \                  # the dir the agent works on
+#             -v "$HOME/.rubino":/root/.rubino \ # reuse your host config + keys
+#             rubino:latest rubino
+#
+# The agent is launched via the `rubino` wrapper from any cwd; mount your project
+# at /work. Secrets are NEVER baked in — provide them by mounting ~/.rubino or
+# passing *_API_KEY env vars (-e RUBINO_API_KEY=...).
+FROM ubuntu:24.04
+ENV DEBIAN_FRONTEND=noninteractive \
+    TERM=xterm-256color \
+    LANG=C.UTF-8 \
+    LC_ALL=C.UTF-8
+# Two groups: (1) runtime tools the agent shells out to — git, ripgrep, sqlite3,
+# tmux, curl, less, procps; (2) the toolchain ruby-build (via mise) needs to
+# COMPILE Ruby 3.3.3 and the native gems (nokogiri / ffi / sqlite3).
+RUN apt-get update && apt-get install -y --no-install-recommends \
+      ca-certificates curl git less procps tmux ripgrep sqlite3 \
+      build-essential autoconf bison \
+      libssl-dev libyaml-dev libreadline-dev zlib1g-dev \
+      libncurses-dev libffi-dev libgdbm-dev libsqlite3-dev \
+    && rm -rf /var/lib/apt/lists/*
+# Ruby 3.3.3 via mise — Ubuntu's apt ships only 3.2, but the repo pins 3.3.3
+# (.ruby-version), so we compile the exact version. Put the install's bin dir
+# straight on PATH (no shims) so ruby/gem/bundler resolve deterministically.
+ENV MISE_DATA_DIR=/opt/mise \
+    PATH=/opt/mise/installs/ruby/3.3.3/bin:/usr/local/bin:$PATH
+RUN curl -fsSL https://mise.run | MISE_INSTALL_PATH=/usr/local/bin/mise sh \
+    && mise install ruby@3.3.3 \
+    && gem install bundler -v 4.0.12
+WORKDIR /app
+COPY . /app
+# The lockfile is resolved on macOS (arm64-darwin); add the Linux platforms so
+# `bundle install` stays in lockstep with the pinned versions instead of
+# re-resolving, then install.
+RUN bundle lock --add-platform x86_64-linux aarch64-linux \
+    && bundle install
+# Run rubino from the source checkout WITHOUT changing the caller's cwd, so the
+# agent operates on the mounted /work dir, not /app. (A source checkout has no
+# working binstub; this wrapper replaces it.)
+RUN printf '#!/usr/bin/env bash\nexport BUNDLE_GEMFILE=/app/Gemfile\nexec bundle exec /app/exe/rubino "$@"\n' \
+      > /usr/local/bin/rubino \
+    && chmod +x /usr/local/bin/rubino
+ENV RUBINO_HOME=/root/.rubino
+RUN mkdir -p /root/.rubino /work
+WORKDIR /work
+CMD ["bash"]

data/agent.md ADDED

@@ -0,0 +1,112 @@
+# agent.md — session state & decisions (handoff)
+> Working handoff for the next agent. NOT the project guide — that's `AGENTS.md`.
+> Branch: `test/pre-release-gate`.
+## How this is run (no Docker)
+- rubino runs from THIS checkout via `~/.local/bin/rubino-dev` (forces ruby
+  3.4.7, `BUNDLE_GEMFILE` = this repo, does NOT change cwd → operates on the dir
+  you invoke it from). Works anywhere on the machine; whatever is checked out
+  here is what runs — no reinstall. The installed gem `0.4.0` is the OLD fallback
+  WITHOUT our fixes — always verify with `rubino-dev`.
+- LLM backend: local OpenAI-compatible server on `127.0.0.1:8000` = **`ds4-serve`**
+  (DeepSeek: `deepseek-v4-flash`, `deepseek-v4-pro`). It **STREAMS tool-call
+  argument deltas** (a file write is on the wire as it generates) AND round-trips
+  `reasoning_content`. Config: `~/.rubino/config.yml` → `model.default:
+  deepseek-v4-flash`, provider `gateway` (openai_compatible, base_url
+  `127.0.0.1:8000/v1`).
+  - ⚠️ **ds4-server is a SINGLE KV slot and has no auto-restart; it CRASHES under
+    large generations.** If a "freeze" reappears, FIRST check it's still up:
+    `curl -s 127.0.0.1:8000/v1/models`. Its log is `/tmp/ds4-server.log` — the
+    single best diagnostic (see below).
+- Verify TUI behavior in a REAL terminal (offline PTY/pyte capture misses
+  raw-mode defects). The fastest objective probe is a PTY driver that timestamps
+  stdout (scratchpad/pty_*.rb in past sessions) + tailing `/tmp/ds4-server.log`.
+## The local-performance work — DONE + validated live (this branch)
+The user's "freeze on the local config" was THREE distinct bugs. All fixed and
+verified live against ds4 (see commits on this branch). Read
+`~/.claude/.../memory/reference_rubino_kv_cache_bust_rootcause.md` +
+`project_rubino_kv_cache_fix.md` for the full diagnosis.
+1. **Cross-turn freeze = KV prefix-cache busting (#608b/#608c).** ds4 only reuses
+   cache on a PURE prefix-extension (`common==live`); any divergence → full
+   `ctx=0..N` re-prefill (grows with context = "freeze after N turns").
+   - **Reasoning replay:** rubino dropped the assistant's `reasoning_content`, so
+     the replay diverged from the server's KV where reasoning was generated.
+     Now persisted (`metadata[:reasoning]`) and replayed as wire
+     `reasoning_content` (Hermes conversation_loop.py:940 parity). Bug found:
+     `extract_thinking` read `response.reasoning` (nonexistent) not `.thinking`.
+     Files: loop.rb, ruby_llm_adapter.rb (normalize_intermediate/rebuild_thinking/
+     load_history), session/message.rb. Effect: turn 2+ 21s → 0.6s.
+   - **Aux off-slot gate:** post-turn memory-extraction/distill ran a divergent
+     no-tools prompt on the SAME slot every turn → evicted the main KV. Now
+     SKIPPED on the interactive REPL when the aux task resolves to the main
+     endpoint (`Configuration#auxiliary_on_main_endpoint?`); extraction happens at
+     session-end flush + compaction (no recall lost — the per-session memory
+     snapshot is frozen anyway). `interactive` flag threaded build_runner(default
+     true)→setup_oneshot(false)→Runner→Lifecycle. A DISTINCT aux endpoint keeps
+     the inter-turn cadence. Files: configuration.rb, lifecycle.rb, runner.rb,
+     chat_command.rb.
+2. **Large-write freeze = dead UI past the preview cap (#608d).** ds4 streams a
+   big `write`'s args for minutes; after the 30-line preview cap `tool_chunk`
+   stopped emitting and the facet was hidden → ~38s of dead screen. Fix (cli.rb):
+   `tool_params_feed`/`tool_chunk` stream the params IN FULL (`full: true`, no
+   cap — the user watches the file land; the cap stays for tool OUTPUT) + an
+   animated facet during arg streaming. Verified: full content shown, UI silence
+   38s → 6.3s. NB the deltas are INCREMENTAL (not cumulative — no O(N²)); the
+   model is just genuinely slow (~17-22 t/s, degrading) for big files.
+3. **`ctx` gauge frozen during a run (#608e).** The bar repainted only at turn
+   boundaries and read persisted messages. Fix: `chat_command#live_status_meter`
+   captures the base once → cheap no-DB lambda on `ui.live_status_provider`; the
+   cli ticker (`refresh_live_ctx_bar`, ~1/s) feeds it `@turn_tok_chars/4` and
+   repaints. `build_status_line` refactored to share `render_status_bar`.
+   Verified: ctx climbs 0k→1.2k during a write.
+### Diagnostic playbook (reuse this)
+- `/tmp/ds4-server.log`: `live kv cache miss … common=N reason=token-mismatch`
+  then `chat ctx=0..N:N prompt done <s>` = a full re-prefill. `common==live` +
+  `ctx=K..N:small done 0.6s` = a cache HIT (what you want). Within-turn tool
+  iterations already hit; the bug is at turn boundaries.
+- Repro multi-turn: `rubino-dev -q "…" --yolo` then `-c -q "…"` (one-shot
+  continues a session) OR a PTY driver for true interactive multi-turn (the gate
+  is REPL-only, so one-shot won't show the aux-eviction fix). `RUBINO_HOME=<tmp>`
+  + a sed'd config isolates variables. A logging TCP proxy (:8999→:8000) captures
+  request/response bodies to confirm delta-vs-cumulative and reasoning replay.
+## Other uncommitted history folded into this branch's tip
+- **Streaming tool-call params UX (#608):** `lib/rubino/ui/tool_args_stream.rb`
+  (single-pass JSON streaming decoder, surfaces string VALUES), adapter
+  `announce_tool_stream` (emits `:tool_preparing` + `:tool_args`), `api.rb`/`cli.rb`
+  sinks, byte-batching in `stdout_proxy.rb`/`cli.rb`. The token/speed footer plan
+  is now partly realized by the live `ctx` gauge (#608e); a tok/s readout is still
+  open.
+- **Per-turn SummarizeSessionJob removed:** the running summary is produced ONLY
+  by threshold-gated compaction (Hermes/Claude-Code parity), never a background
+  job every turn. Handler deleted; `auto_summarize` config + `memory_auto_summarize?`
+  gone. (This ALSO removed one of the per-turn aux-LLM calls — aligned with #608c.)
+## Backlog / NOT done
+- **Fix C (prompt normalization + volatile-to-tail):** NOT needed (cache hits
+  already land without it; the frozen snapshot keeps volatile_tail stable).
+  Offered as optional strict-Hermes parity hardening only.
+- **Large files are genuinely slow on ds4** (model throughput, not rubino). The
+  UI now shows progress; making it FASTER is behavioral (steer toward edits /
+  smaller writes) — not yet done.
+- **tok/s readout** in the footer (the other half of the #608 token-footer plan).
+- 5 PRE-EXISTING host rspec failures (ruby_tool load-path, fresh_home_db schema)
+  — NOT regressions; see `reference_rubino_host_rspec_env_failures`.
+## Constraints / protocol
+- Clean code / DRY; refactor when it keeps things clean.
+- Repo/commit/PR content in English. NO co-author / "Generated with" trailers.
+- Verify with `rubino-dev` against live ds4 before claiming a TUI fix works
+  (offline render misses raw-mode defects).
+## Test status
+Full suite green except the 5 pre-existing host failures above
+(`6376 examples, 5 failures, 8 pending`). Rubocop clean on all touched files.
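
Reviewer note on the KV prefix-cache diagnosis in `agent.md`: the rule it states (the cache is reused only on a pure prefix extension, `common==live`; any divergence forces a full re-prefill) can be sketched as a toy model. This is an illustration under stated assumptions, not the server's implementation: tokens are modeled as plain strings, and the names `common_prefix_length` and `prefix_cache_plan` are hypothetical.

```ruby
# Length of the shared leading run of two token arrays.
def common_prefix_length(cached, incoming)
  limit = [cached.length, incoming.length].min
  (0...limit).find { |i| cached[i] != incoming[i] } || limit
end

# Hypothetical model of the rule: a hit only when the WHOLE cached sequence is
# a prefix of the new prompt; otherwise prefill restarts from position 0.
def prefix_cache_plan(cached, incoming)
  common = common_prefix_length(cached, incoming)
  if common == cached.length
    { hit: true, prefill_from: common }
  else
    { hit: false, prefill_from: 0 }
  end
end
```

In this model, replaying a turn without the reasoning tokens that the server generated diverges in the middle of the cached sequence, which is why persisting and replaying `reasoning_content` turns a miss into a hit.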

data/docs/design/bg-shell-pty-port.md ADDED

@@ -0,0 +1,88 @@
+# Porting Hermes' interactive PTY shell to rubino
+Status: DESIGN (deep-study of Hermes, no impl yet) · Branch: `feat/bg-shell-ux`
+Source studied: `hermes-agent/tools/process_registry.py`, `hermes-agent/tools/terminal_tool.py`,
+`hermes-agent/hermes_cli/pty_bridge.py`.
+## Why PTY (the corrected conclusion)
+A pipe-backed background shell has `stdin=DEVNULL` and can't answer `y/N`, sudo passwords,
+or run TTY-aware/curses programs. Hermes (and Codex `unified_exec`, and the open Claude
+Code FR) all converge on a **PTY**: the process believes it's on a real terminal, and the
+user's keystrokes/answers are written to the PTY master. We follow Hermes.
+## Hermes' model (the algorithm we port, with refs)
+1. **Spawn.** `ProcessRegistry.spawn_local(use_pty=True)` (`process_registry.py:515`) →
+   `ptyprocess.PtyProcess.spawn(cmd, env, ...)` (`:553`); the handle is stored on
+   `ProcessSession._pty` (`:134`). Pipe fallback when ptyprocess is absent. Pipe mode is
+   `stdin=DEVNULL` (`:605`) — deliberately non-interactive.
+2. **Output reader.** `_pty_reader_loop` (`:814`) `pty.read(4096)` until `pty.isalive()` is
+   false; captures `exitstatus`. Feeds `_check_watch_patterns` on each chunk (`:748/784/828`).
+3. **Input primitives.** `write_stdin(id, data)` (`:1184`) → `_pty.write(bytes)`;
+   `submit_stdin(id, data="")` = `write_stdin(data + "\n")` (press Enter, `:1209`);
+   `close_stdin(id)` = EOF without kill (`:1213`).
+4. **Interactive prompt routing.** Thread-local UI callbacks: `set_sudo_password_callback`
+   / `set_approval_callback` (`terminal_tool.py:189-205`). When unset, fall back to
+   `/dev/tty` / `input()`. The CLI registers them so prompts run through the TUI event loop.
+5. **Sudo password.** Detect `sudo` (`_rewrite_real_sudo_invocations`, `:501`), prompt the
+   user with HIDDEN input ("input is hidden", `:404`), cache per scope
+   (`_sudo_password_cache`, scope = session-key / callback-owner / thread, `:205-240`), feed
+   it to the process. Cache cleared on teardown (`_reset_cached_sudo_passwords`).
+6. **Watch patterns.** Regexes scan new output to detect notable lines/prompts; after
+   `WATCH_STRIKE_LIMIT` (3) misses, disable + promote to `notify_on_complete` (`:191-288`).
+## rubino mapping (DRY, faithful, clean-code)
+rubino already has the skeleton: `Tools::ShellRegistry` (pgid tracking + kill),
+`shell_tool` (background spawn), `shell_input`/`shell_output`/`shell_tail`/`shell_kill`.
+Today it is **pipe-only**. The port adds a PTY mode alongside.
+| Hermes | rubino target |
+|--------|---------------|
+| `ptyprocess.PtyProcess.spawn` | Ruby stdlib **`PTY.spawn`** (`require "pty"`) — returns `[reader_io, writer_io, pid]` |
+| `ProcessSession._pty` | a `pty_master`/`pty_pid` field on `ShellRegistry::Entry` (`shell_registry.rb:31`) |
+| `_pty_reader_loop` | the existing `drain_into` reader, reading the PTY master instead of the pipe |
+| `write_stdin/submit_stdin/close_stdin` | extend `ShellRegistry.write_input` + add `submit_input` (`+"\n"`) and `close_input` |
+| sudo/approval UI callbacks | **reuse rubino's existing prompt UI** (`UI::CLI#confirm` / the `question` tool / approval menu) — register a thread/fiber-local "shell input needed" callback that surfaces a masked prompt |
+| `_sudo_password_cache` (per scope) | a per-session masked-secret cache (scope = session id), cleared on teardown; mask in scrollback via the existing `SecretsMask` |
+| watch_patterns | OPTIONAL (slice 3) — a prompt detector (`password:`, `[y/N]`) to auto-surface input without the user attaching |
+### How the USER provides the `y` / password (the goal)
+Two complementary paths, both writing to the same `write_input` PTY primitive:
+- **Attach-and-type (the focus view).** When attached to the shell (the bg-shell-as-
+  `BackgroundTasks`-entry from `bg-shell-ux.md`), your keystrokes/lines route to
+  `ShellRegistry.write_input(id, ...)` → the PTY. You see `[y/N]`, type `y`, it goes in.
+- **Detect-and-prompt (Hermes sudo path, no attach needed).** A registered callback +
+  a prompt detector surface a masked/normal prompt inline ("the shell wants input:
+  `Password:`"); your answer is written to the PTY. Reuses the `question`/approval UI.
+## Stages (each clean-code, spec'd, tmux-verified before the next)
+- **Slice 0 — PTY foundation.** `ShellRegistry` gains a PTY mode (`PTY.spawn`), the reader
+  reads the master, `write_input`/`submit_input`/`close_input` work over the PTY. The model
+  tools (`shell_input`) already call `write_input`, so the agent can drive an interactive
+  bg process. Verify in tmux: `python3 -c "print(input('name? '))"` in bg, `shell_input`
+  "x\n", output shows it. (No user-facing UI yet.)
+- **Slice 1 — SEE + STOP + FOCUS** (from `bg-shell-ux.md`): shell as a `BackgroundTasks`
+  `kind: :shell` entry → card + picker + `/stop` + attach view (clear + live PTY tail).
+- **Slice 2 — USER types in focus.** Attached keystrokes/lines → `write_input` (the PTY).
+  Now you answer `y` yourself in the focus view.
+- **Slice 3 — detect-and-prompt + sudo masked.** Prompt detector + masked password +
+  per-session cache, reusing the `question`/approval UI. The Hermes sudo flow.
+## Gotchas (from the source)
+- `PTY.spawn` makes the child a session leader (PID == PGID) — matches rubino's existing
+  pgid hard-kill, good. But PTY EOF/`Errno::EIO` on child exit must be caught in the reader
+  (Ruby's `PTY` raises `PTY::ChildExited`/`Errno::EIO`).
+- Terminal size: a PTY needs a winsize (`TIOCSWINSZ`); set a sane default (e.g. 120x40) or
+  the attached terminal's size; resize on attach.
+- Masking: the sudo/secret path MUST run answers through `SecretsMask` so the password
+  never lands in scrollback or the output buffer (Hermes hides it; rubino has `SecretsMask`).
+- Don't break the pipe path: keep pipe mode as the default for non-interactive bg work;
+  PTY mode is opt-in (a `pty: true`/`interactive: true` arg, or auto when a prompt is likely).
+- Output: a PTY echoes input back + emits control sequences; the output buffer/tail must
+  strip/normalize (rubino has `ansi_strip`-equivalent? confirm) so the model/user see clean text.
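
Reviewer note on the Slice 0 idea in this design doc: a minimal sketch of spawning a child on a PTY with Ruby's stdlib `PTY.spawn`, writing to its stdin, and draining the master until the child exits. `PTY.spawn`, `IO#winsize=` (from `io/console`), and `IO#readpartial` are standard library calls; the helper name `run_on_pty` and its parameters are hypothetical, and this is not the gem's `ShellRegistry` code.

```ruby
require "pty"
require "io/console"

# Hypothetical helper: run a command on a PTY, optionally feed it input, and
# return everything it printed (including the terminal's echo of the input).
def run_on_pty(*command, input: nil, rows: 40, cols: 120)
  output = String.new # binary buffer; decoded once at the end
  PTY.spawn(*command) do |reader, writer, pid|
    reader.winsize = [rows, cols] # a fresh PTY starts at 0x0
    writer.write(input) if input
    begin
      loop { output << reader.readpartial(4096).b }
    rescue Errno::EIO, EOFError
      # Linux raises EIO on the master once the child side closes;
      # other platforms may report a plain EOF instead.
    end
    Process.wait(pid)
  end
  output.force_encoding(Encoding::UTF_8).scrub
end
```

For example, `run_on_pty("sh", "-c", "read name; echo got:$name", input: "x\n")` returns text containing both the echoed `x` and the child's `got:x` line, which illustrates the echo caveat raised in the gotchas above.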

data/docs/design/bg-shell-review-refinements.md ADDED

@@ -0,0 +1,65 @@
+# Review-driven refinements (Slice 1 design lock-in)
+An adversarial clean-code review of Slice 0 + the bridge plan produced these changes.
+They supersede the "thin adapters" framing in `bg-shell-ux.md` where they conflict.
+## Slice 0 fixes already applied (from review)
+- **close_stdin crash-on-retire (BLOCKER):** EOT only a LIVE child; a dead PTY master
+  raises `Errno::EIO`, so close the fd instead (also reclaiming a leaked `master_w`).
+  Rescue widened to `IOError, Errno::EIO, Errno::EBADF`.
+- **spawn_pty cwd fragility:** `cd … || exit 127\n<cmd>` (own line) — not `cd && (<cmd>)`,
+  which a trailing `#`-comment broke.
+- **winsize:** default `40x120` (a fresh PTY is 0x0).
+- Honest comments (EOT is canonical-mode-only; dropped the dead `PTY::ChildExited` catch).
+## Still-open Slice 0/2 prerequisites (do BEFORE wiring write_input/attach)
+- **PTY echo:** a cooked PTY echoes typed input back into the buffer → doubled text, and a
+  typed password would land in the ring buffer in cleartext. Before the user/agent writes
+  to a PTY: turn `ECHO` off via `io/console` for the secret path, and/or strip the echoed
+  line at the capture seam. Mask through `SecretsMask`.
+- **Control sequences:** a PTY emits `\r\n` + CSI/OSC. `drain_into` only `scrub_utf8`s.
+  For PTY mode, normalize at the capture seam (strip CR, strip non-SGR CSI/OSC) so the
+  model isn't fed escapes and the attach view doesn't paint raw escapes (route the attach
+  renderer through the same `sanitize_terminal_keep_sgr` the cards use — CWE-150).
+## DRY: the `kind:` discriminator is a DATA TAG, not a control switch
+Review verdict: a `case kind` would spray across ≥7 sites (stop_entry, attach view,
+attached-input, cards, menu, watch, completion) — a smell. Instead:
+1. Give the shell's `BackgroundTasks` entry the **same flat fields** the renderers already
+   read (`prompt`=command, `started_at`, synced `status`, `subagent`="shell"). Then
+   `SubagentCards`, `AgentMenu`, `render_agent_watch` need **zero** branches. Replace the
+   literal `"subagent"` strings with one `entry_kind_label(entry)` helper.
+2. Push the genuinely-divergent behavior behind **~4 polymorphic methods on the entry**
+   (or two small duck-typed adapter objects): `#stop`, `#attach_render(ui)`,
+   `#feed_input(text)`, `#live?`. Then `stop_entry` → `entry.stop`, `attach_agent_view` →
+   `entry.attach_render`, `handle_attached_input` → `entry.feed_input`. **No `case kind`
+   in any UI file.** `kind` survives only as the label.
+## Three gaps to handle when registering a shell entry
+`BackgroundTasks#reserve` carries subagent semantics a shell must NOT inherit:
+1. **Concurrency cap:** `reserve` counts against `max_concurrent_total`/depth/per-owner. A
+   shell is not an LLM run — it must register WITHOUT consuming the subagent budget
+   (separate register path, or exempt `kind: :shell` from `running_count`/`refusal_reason`).
+2. **Double completion notice:** `ShellRegistry#notify_completion` ALREADY pushes
+   `[background-shell] finished`. If the BG entry's `complete` also fires a notice, the user
+   gets two. Pick ONE owner (keep ShellRegistry's; the BG entry only syncs status).
+3. **Dead steer_queue:** `reserve` allocates a `steer_queue`; a shell can't steer/probe.
+   Disable steer/probe for `kind: :shell` (route attached input to `feed_input` → stdin).
+## Status sync note
+Two status sources (ShellRegistry `wait_thr`-derived vs BG stored): sync the BG entry to
+terminal only at `notify_completion`. There's a small window where a just-killed shell still
+reads live until the reader thread fires — acceptable, documented.
+## Sandbox/pgid: confirmed intact under PTY.spawn
+`PTY.spawn` setsid's the child → `pgid == pid` (pgroup:true redundant); the sandbox launcher
+still `exec`s bash in place, so the write-jail + pgid-kill are identical to the pipe path.
+Only cwd handling diverged (fixed above).
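
Reviewer note on the "Control sequences" prerequisite in this doc: a minimal sketch of normalizing PTY output at a capture seam by stripping carriage returns, OSC sequences, and every CSI sequence except SGR (color/style, final byte `m`). The function name `normalize_pty_output` and both regex constants are hypothetical stand-ins; the gem's own `sanitize_terminal_keep_sgr` is not shown in this diff and may behave differently.

```ruby
# CSI: ESC [ then parameter bytes (0x30-0x3F), intermediate bytes (0x20-0x2F),
# and one final byte (0x40-0x7E).
CSI_SEQUENCE = /\e\[[0-?]*[ -\/]*[@-~]/
# OSC: ESC ] ... terminated by BEL or by ST (ESC \).
OSC_SEQUENCE = /\e\][^\a\e]*(?:\a|\e\\)/

# Hypothetical sanitizer: keep SGR sequences, drop every other CSI and all OSC
# sequences, and delete carriage returns.
def normalize_pty_output(text)
  text
    .gsub(OSC_SEQUENCE, "")
    .gsub(CSI_SEQUENCE) { |seq| seq.end_with?("m") ? seq : "" }
    .delete("\r")
end
```

Keeping only SGR means colors survive for the attach view while cursor movement, erase, and title-setting sequences never reach the model or the screen.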

data/docs/design/bg-shell-ux.md ADDED

@@ -0,0 +1,130 @@
+# Background shells as first-class background work (see / focus / stop)
+Status: DESIGN (no implementation yet) · Branch: `feat/bg-shell-ux`
+## Goal
+Give a background **shell** the same user-facing affordances a background **subagent**
+already has:
+1. **See** it — a card + a picker row, at a glance.
+2. **Focus** it — attach to a clear, live view of what it's doing.
+3. **Stop** it — `/stop <id>` from the UI.
+Today a background shell lives ONLY in `ShellRegistry`, so it is invisible to every
+user surface. The model can read/tail/kill it via tools (`shell_output`,
+`shell_tail`, `shell_kill`), but the human has no card, no picker entry, no attach,
+no `/stop`.
+## The central reuse lever (why this is mostly DRY, not new UI)
+Three UI surfaces and the control handlers all read **one source of truth**:
+- `UI::CLI#set_subagent_cards` → `BackgroundTasks.instance.running` (`cli.rb:930`)
+- `UI::AgentMenu` picker entries default → `BackgroundTasks.instance.running` (`agent_menu.rb:21`)
+- `BottomComposer` card host → `BackgroundTasks.instance.running` (`bottom_composer.rb:1639`)
+- `/agents`, `/stop`, `auto_resolve_pending` → `BackgroundTasks` lookups
+None of these inspect `subagent`/`runner` to decide whether to show a row — they
+filter purely on `live_status?` (`LIVE_STATUSES = %i[running needs_approval stopping]`).
+**So: anything in `BackgroundTasks#running` automatically gets a card, a picker row,
+and `/stop`.** The whole feature reduces to *register the shell as a `BackgroundTasks`
+entry* + a few thin, kind-aware branches.
+## Architecture
+Add a `kind: :subagent | :shell` discriminator to `BackgroundTasks::Entry`
+(`background_tasks.rb:60`). A background shell gets BOTH:
+- its existing `ShellRegistry::Entry` (process group, output ring, kill, stdin) — unchanged;
+- a NEW linked `BackgroundTasks::Entry` (`kind: :shell`) that carries the SAME `bg_*`
+  id, so the card/picker/stop surfaces light up and `/stop bg_x` already matches
+  `shell_kill`'s id.
+The two entries are bridged 1:1 by id. `ShellRegistry` stays the process owner;
+`BackgroundTasks` becomes the *presentation + control* layer (as it already is for subagents).
+```
+ShellRegistry::Entry  ──(same bg_ id)──  BackgroundTasks::Entry(kind: :shell)
+  pgid, pipes, buffer                      status, card, picker row, /stop
+  read_new / write_input / kill            attach view, completion notice
+```
+### Reuse AS-IS (the shared seams — no shell-specific code)
+1. `BackgroundTasks#running` + `live_status?` / `LIVE_STATUSES` — the liveness oracle
+   that auto-drives cards + picker + composer.
+2. `UI::SubagentCards` row rendering — reads only plain struct fields
+   (`id, status, tool_count, started_at, prompt`); map `prompt`→command.
+3. `UI::AgentMenu` row rendering — reads only `id, subagent, status, budget_request`.
+4. `InputQueue#push_notice` → idle `coalesced_resume` (#561) — shells ALREADY ride
+   this (`shell_registry.rb:372`).
+5. `render_agent_output_tail` / `watch_loop` (`agents.rb:300-328`) — an existing
+   kind-agnostic byte-tail renderer, perfect for the shell attach view.
+6. `stop_entry` (`background_tasks.rb:456`) as the single stop entry-point, dispatched by kind.
+### Thin shell adapters (the only new code — kept minimal)
+1. **Bridge (register + sync).** In `shell_tool.rb#spawn_background` (`:382`), after
+   `ShellRegistry.spawn`, `reserve` a `kind: :shell` `BackgroundTasks` entry with the
+   same id. In `ShellRegistry#notify_completion` (`:357`), flip the linked entry to
+   `:completed`/`:failed` via `complete` (so the card/picker drop it). Status for a
+   shell is DERIVED (`ShellRegistry#status` from `wait_thr`); the bridge keeps the
+   stored `BackgroundTasks` status in sync — single sync point at completion + an
+   optional poll for the live `tool_count`/activity proxy (bytes/lines).
+2. **Attach branch.** In `chat_command.rb#attach_agent_view` (`:3009`), branch on
+   `kind == :shell`: `entry.messages` is empty (no session), so skip session replay
+   and instead render the captured buffer + a polling `read_new` live-tail (reuse the
+   `watch_loop` shape). Attached plain text → `ShellRegistry.write_input` (stdin),
+   not `steer_agent`.
+3. **Stop branch.** In `stop_entry` (`:456`), branch on `kind == :shell`:
+   `Process.kill` the pgid (reuse `ShellKillTool`'s SIGTERM → grace → SIGKILL body,
+   extracted to a shared `ShellRegistry#signal_group`) instead of `runner.cancel!`.
+### Kind-aware copy (cosmetic, one helper)
+`AgentMenu` header/hints ("subagents", "Enter attaches"), `SubagentCards` glyph
+wording, and `Agents` copy ("No background subagents") hardcode "subagent". Introduce
+ONE `entry_kind_label(entry)` → "subagent"/"shell" used by the picker header + card +
+list copy, so a shell row reads right without forking the renderers.
+## Lifecycle & the two-lifetime rule
+A shell has TWO decoupled lifetimes, by design:
+- The `BackgroundTasks` entry goes **terminal** (drops from `running`/cards/picker) the
+  moment the shell exits — so the UI stops showing a dead shell as live.
+- The `ShellRegistry` entry stays **retired** (RETIRED_TTL) so `shell_output` can still
+  fetch the final output for the model.
+Keep them decoupled: completion flips the BackgroundTasks status; retirement is
+ShellRegistry-only.
+## Open decisions (need your call)
+- **D1 — id namespace.** Recommend the shell's `BackgroundTasks` entry **keep its `bg_*`
+  id** (so `/stop bg_x` == `shell_kill bg_x`, one id the user sees everywhere). (Alt:
+  give it `sa_*` — rejected, splits the id space.)
+- **D2 — attach interactivity (scope).** v1 attach = **read-only live tail**; OR v1
+  also routes attached plain-text to the shell's **stdin** (interactive bg process).
+  stdin-steer is a nice win but more surface to test.
+- **D3 — steer/probe on a shell.** Disable for `kind: :shell` (a shell has no model to
+  probe / no steer queue), OR repurpose steer→stdin (ties to D2).
+## Proposed slices (incremental, each independently testable)
+- **Slice 1 — SEE + STOP.** `kind` discriminator + bridge (register/sync) + `stop_entry`
+  shell branch + kind-aware label. Outcome: a bg shell shows a card + picker row and
+  `/stop bg_x` kills it. (Biggest value, smallest surface — pure reuse + 2 thin branches.)
+- **Slice 2 — FOCUS.** `attach_agent_view` shell branch: clear + buffer + polling tail.
+  Outcome: Enter on a shell row attaches to a live output view; `←`/`/back` returns.
+- **Slice 3 — stdin (optional, D2/D3).** Attached plain-text → `shell_input`.
+Each slice: clean-code, DRY (reuse the named seams), spec'd, verified in the QA
+container with a real bg shell (tmux: card visible, `/stop` kills, attach tails live).
+## Non-goals (v1)
+Reworking `ShellRegistry`'s process model; per-shell resource limits; persisting shell
+output to a session Store (shells stay buffer-backed, not transcript-backed).
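
Reviewer sketch (not part of the diffed file): a minimal Ruby model of the two-lifetime rule and the kind-aware label from the design note above. Every class and value here (`UiEntry`, `ShellRecord`, `Bridge`, the 300-second TTL, the contents of `LIVE_STATUSES`) is a hypothetical stand-in rather than the gem's `BackgroundTasks` or `ShellRegistry` API. Only the names `LIVE_STATUSES`, `RETIRED_TTL`, `notify_completion`, `shell_output`, `entry_kind_label` and the `bg_` id prefix are taken from the note.

```ruby
# Hypothetical stand-ins for illustration; not the gem's real classes.
LIVE_STATUSES = %i[pending running].freeze # assumed values
RETIRED_TTL   = 300                         # seconds; assumed value

# UI-facing entry: what cards and the picker would read.
UiEntry = Struct.new(:id, :kind, :status, keyword_init: true) do
  def live?
    LIVE_STATUSES.include?(status)
  end
end

# Process-facing record: keeps the captured output after exit.
ShellRecord = Struct.new(:id, :buffer, :retired_at, keyword_init: true)

class Bridge
  def initialize
    @ui = {}
    @shells = {}
  end

  # Register both records under the same id.
  def spawn(id, kind: :shell)
    @ui[id] = UiEntry.new(id: id, kind: kind, status: :running)
    @shells[id] = ShellRecord.new(id: id, buffer: +"", retired_at: nil)
  end

  def append(id, bytes)
    @shells.fetch(id).buffer << bytes
  end

  # Single sync point: the UI entry goes terminal at once,
  # while the shell record is only marked retired.
  def notify_completion(id, success:, now: Time.now)
    @ui.fetch(id).status = success ? :completed : :failed
    @shells.fetch(id).retired_at = now
  end

  # Liveness oracle: only live entries show up in cards/picker.
  def running
    @ui.values.select(&:live?)
  end

  # Final output stays readable until the retirement TTL has passed.
  def shell_output(id, now: Time.now)
    shell = @shells[id]
    return nil if shell.nil?
    return nil if shell.retired_at && now - shell.retired_at > RETIRED_TTL

    shell.buffer
  end
end

# One helper for the picker header, card and list copy.
def entry_kind_label(entry)
  entry.kind == :shell ? "shell" : "subagent"
end
```

In this model a completed shell leaves `running` immediately, yet `shell_output` keeps returning its buffer until the TTL elapses, which is the decoupling the note asks for.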

data/docs/tools.md CHANGED

@@ -1,10 +1,10 @@
 # Tools Reference
-rubino ships **29 built-in tools** plus dynamic MCP tools (started at boot when `mcp.servers` is configured — see [mcp.md](mcp.md); being server-dependent they are excluded from the drift-checked list below) and custom user-defined tools. Each tool is gated by a `tools.<key>` config flag (opt-out: absent key = enabled, only an explicit `false` disables) and the approval model. The count and list below are drift-checked against the live registry by `spec/docs/tools_doc_drift_spec.rb`.
+rubino ships **28 built-in tools** plus dynamic MCP tools (started at boot when `mcp.servers` is configured — see [mcp.md](mcp.md); being server-dependent they are excluded from the drift-checked list below) and custom user-defined tools. Each tool is gated by a `tools.<key>` config flag (opt-out: absent key = enabled, only an explicit `false` disables) and the approval model. The count and list below are drift-checked against the live registry by `spec/docs/tools_doc_drift_spec.rb`.
-The full list (registration order): `read`, `summarize_file`, `write`, `edit`, `multi_edit`, `grep`, `glob`, `shell`, `shell_output`, `shell_tail`, `shell_input`, `shell_kill`, `ruby`, `apply_patch`, `webfetch`, `websearch`, `question`, `todowrite`, `memory`, `session_search`, `attach_file`, `read_attachment`, `vision`, `skill`, `task`, `task_result`, `task_stop`, `steer`, `probe`.
+The full list (registration order): `read`, `write`, `edit`, `multi_edit`, `grep`, `glob`, `shell`, `shell_output`, `shell_tail`, `shell_input`, `shell_kill`, `ruby`, `apply_patch`, `webfetch`, `websearch`, `question`, `todowrite`, `memory`, `session_search`, `attach_file`, `read_attachment`, `vision`, `skill`, `task`, `task_result`, `task_stop`, `steer`, `probe`.
-Several tools share one config gate, so `rubino tools` shows **24 rows** (config groups), not 29: `webfetch` + `websearch` share `tools.web`, and the whole delegation family (`task`, `task_result`, `task_stop`, `steer`, `probe`) rides on `tools.task` — disabling delegation disables them all.
+Several tools share one config gate, so `rubino tools` shows **23 rows** (config groups), not 28: `webfetch` + `websearch` share `tools.web`, and the whole delegation family (`task`, `task_result`, `task_stop`, `steer`, `probe`) rides on `tools.task` — disabling delegation disables them all.
 ## How tools are gated
@@ -45,15 +45,6 @@ Risk: low
 Parameters: file_path, offset, limit
 ```
-### summarize_file
-Summarize a large text file WITHOUT loading it into the conversation. The file is map-reduced by a separate summarization model; only the final summary returns, so the raw bytes never enter context. Prefer this over `read` for big documents.
-```
-Risk: low
-Parameters: file_path, focus, max_words
-```
 ### write
 Write content to a file, overwriting any existing content. Creates parent directories if needed. Use `edit`/`multi_edit` to modify an existing file in place.
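
Reviewer sketch (not part of the diffed file): the opt-out gating rule and the shared config gates described at the top of `tools.md` can be modelled in a few lines of Ruby. The tool list and the two shared gates (`tools.web`, `tools.task`) come from the doc. The helper names (`gate_for`, `tool_enabled?`, `config_rows`), the flat hash used for config, and the assumption that every other tool's gate key equals its tool name are hypothetical.

```ruby
# Tool names as listed in the updated tools.md (registration order).
TOOLS = %w[
  read write edit multi_edit grep glob
  shell shell_output shell_tail shell_input shell_kill
  ruby apply_patch webfetch websearch question todowrite
  memory session_search attach_file read_attachment vision skill
  task task_result task_stop steer probe
].freeze

# Tools that ride on another tool's config gate, per the doc.
SHARED_GATES = {
  "webfetch" => "web", "websearch" => "web",
  "task" => "task", "task_result" => "task", "task_stop" => "task",
  "steer" => "task", "probe" => "task"
}.freeze

# Assumption: a tool without a shared gate is gated by its own name.
def gate_for(tool)
  SHARED_GATES.fetch(tool, tool)
end

# Opt-out semantics: an absent key means enabled; only an
# explicit `false` disables the tool.
def tool_enabled?(config, tool)
  config.fetch(gate_for(tool), true) != false
end

# One row per distinct config gate.
def config_rows
  TOOLS.map { |tool| gate_for(tool) }.uniq
end
```

Under these assumptions `TOOLS.size` is 28 and `config_rows.size` is 23, matching the counts in the updated doc, and `tool_enabled?({ "task" => false }, "probe")` is false because `probe` rides on the `task` gate.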

data/lib/rubino/agent/iteration_budget.rb CHANGED

@@ -37,6 +37,19 @@ module Rubino
         within_iteration_limit?(iteration) && within_time_limit?
       end
+      # Which rail is blocking the turn RIGHT NOW, so a force-summarized turn can
+      # report WHY it stopped (honest subagent-completion reporting, not a false
+      # "completed"). :iterations when the tool/turn ceiling is spent, :time when
+      # the wall-clock safety-net is, nil when the turn could still continue.
+      # Mirrors #can_continue?'s conjunction — the iteration ceiling is checked
+      # first, matching the order the loop exhausts them.
+      def limiting_factor(iteration)
+        return :iterations unless within_iteration_limit?(iteration)
+        return :time unless within_time_limit?
+        nil
+      end
       # True ONLY when offering the interactive Continue extension would actually
       # help: the SOFT iteration ceiling (@max_tool_iterations) is what's
       # exhausted, and neither non-extendable rail is the blocker (#403).