rubino-agent 0.5.2.1 → 0.5.2.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (60)
  1. checksums.yaml +4 -4
  2. data/.dockerignore +15 -0
  3. data/CHANGELOG.md +56 -0
  4. data/Dockerfile +56 -0
  5. data/agent.md +112 -0
  6. data/docs/design/bg-shell-pty-port.md +88 -0
  7. data/docs/design/bg-shell-review-refinements.md +65 -0
  8. data/docs/design/bg-shell-ux.md +130 -0
  9. data/docs/tools.md +3 -12
  10. data/lib/rubino/agent/iteration_budget.rb +13 -0
  11. data/lib/rubino/agent/loop.rb +43 -5
  12. data/lib/rubino/agent/prompts/build.txt +3 -5
  13. data/lib/rubino/agent/prompts/memory_guidance.txt +5 -0
  14. data/lib/rubino/agent/prompts/tool_use_enforcement.txt +4 -0
  15. data/lib/rubino/agent/prompts/tool_use_enforcement_google.txt +9 -0
  16. data/lib/rubino/agent/prompts/tool_use_enforcement_openai.txt +48 -0
  17. data/lib/rubino/agent/runner.rb +55 -12
  18. data/lib/rubino/agent/tool_executor.rb +1 -1
  19. data/lib/rubino/cli/chat/idle_card_host.rb +6 -1
  20. data/lib/rubino/cli/chat_command.rb +119 -17
  21. data/lib/rubino/cli/commands.rb +5 -0
  22. data/lib/rubino/commands/handlers/agents.rb +27 -18
  23. data/lib/rubino/commands/handlers/status.rb +6 -3
  24. data/lib/rubino/config/configuration.rb +25 -8
  25. data/lib/rubino/config/defaults.rb +15 -13
  26. data/lib/rubino/context/prompt_assembler.rb +89 -1
  27. data/lib/rubino/context/summary_builder.rb +0 -22
  28. data/lib/rubino/interaction/events.rb +2 -2
  29. data/lib/rubino/interaction/lifecycle.rb +54 -20
  30. data/lib/rubino/llm/ruby_llm_adapter.rb +178 -20
  31. data/lib/rubino/security/redactor.rb +1 -1
  32. data/lib/rubino/session/message.rb +12 -0
  33. data/lib/rubino/tools/background_tasks.rb +107 -12
  34. data/lib/rubino/tools/base.rb +1 -1
  35. data/lib/rubino/tools/read_attachment_tool.rb +52 -54
  36. data/lib/rubino/tools/registry.rb +21 -72
  37. data/lib/rubino/tools/shell_entry_adapter.rb +97 -0
  38. data/lib/rubino/tools/shell_input_tool.rb +1 -1
  39. data/lib/rubino/tools/shell_kill_tool.rb +4 -4
  40. data/lib/rubino/tools/shell_registry.rb +178 -38
  41. data/lib/rubino/tools/shell_tool.rb +45 -5
  42. data/lib/rubino/tools/task_result_tool.rb +4 -1
  43. data/lib/rubino/tools/task_tool.rb +74 -11
  44. data/lib/rubino/tools/vision_tool.rb +1 -1
  45. data/lib/rubino/ui/agent_menu.rb +8 -2
  46. data/lib/rubino/ui/api.rb +11 -0
  47. data/lib/rubino/ui/bottom_composer.rb +24 -11
  48. data/lib/rubino/ui/cli.rb +254 -15
  49. data/lib/rubino/ui/markdown_renderer.rb +4 -1
  50. data/lib/rubino/ui/stdout_proxy.rb +25 -10
  51. data/lib/rubino/ui/streaming_markdown.rb +67 -12
  52. data/lib/rubino/ui/subagent_cards.rb +8 -7
  53. data/lib/rubino/ui/tool_args_stream.rb +143 -0
  54. data/lib/rubino/update_check.rb +10 -2
  55. data/lib/rubino/version.rb +1 -1
  56. metadata +14 -6
  57. data/AGENTS.md +0 -97
  58. data/docs/agents.md +0 -216
  59. data/lib/rubino/jobs/handlers/summarize_session_job.rb +0 -21
  60. data/lib/rubino/tools/summarize_file_tool.rb +0 -194
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 45e009503b875320e560be8ce46459d7c2a70b0c5744043b70f2121c5e747ab2
- data.tar.gz: c26d1f586ae8578ed3d7298d1f7f8765d16ed5328b6d063d77e0df41d32a710a
+ metadata.gz: ea93727a0527a270cfbad507d459eb7676532dd4f715967c9d6924cc9aa81c82
+ data.tar.gz: c11bf3ca63ed02a7705447fd65cd58f197939631c1c408a0bb7415b59e333c97
  SHA512:
- metadata.gz: dc1cfd93fb51e6236daf8e839a8df248fdcd299aef3028c258fefc93c585bf15e322b5d41ebf4d1b504795a9d24d6e8668f5dece2089b9454f6432dabcace482
- data.tar.gz: 5495d4862d7b082826205ca4ecb30701df4a103ce801233df6a3dc91c59ce423ea4cb6cab4e41b9b0327f72e911ad310bf9433aaf6d218d8803bb0ffd56bec65
+ metadata.gz: a0dfb145e9f590745b3cb178768581734109e0ac1d9fb5ec3b10552303e977323b394f96f20af66603d3f921aa180c32c593d77587756755c02195fcce6773e6
+ data.tar.gz: ccaf380590fe7c71c0a3d65f45e949b92ecd2c4f8ee8f00631e46da3b3188910fe49df333431746fc431553d212b6e7c006e4846df22ea9f8c7197f5bf9476c5
data/.dockerignore ADDED
@@ -0,0 +1,15 @@
+ # Build context trimming — keep the image small and the build fast. The gem is
+ # run from source (lib/ + exe/ + Gemfile), so none of the below is needed at
+ # build or run time. .git is the big one (~66M) and the gemspec's `git ls-files`
+ # simply yields [] without it, which is harmless (we run via exe/rubino, not the
+ # packaged file list).
+ .git
+ .github
+ coverage
+ tmp
+ pkg
+ *.gem
+ *.log
+ .DS_Store
+ node_modules
+ spec/tmp
data/CHANGELOG.md CHANGED
@@ -1,5 +1,61 @@
  # Changelog
 
+ ## [Unreleased]
+
+ ## [0.5.2.2] - 2026-07-01
+
+ ### Added
+
+ - **Background shells get the same dev UX as background subagents.** A shell
+   started with `run_in_background: true` now appears in the `↓` picker and the
+   live cards alongside subagents, can be FOCUSED (Enter attaches to a cleared
+   view that live-tails its output and lets you type straight to its stdin), and
+   STOPPED with `/stop`. Interactive shells run on a real PTY, so `y/N` prompts,
+   sudo passwords, and tty-aware programs work where a plain pipe couldn't.
+   `probe` (an instant output snapshot — no LLM call), `steer` (→ stdin), the
+   `/agents` list and the `/status` count all treat shells consistently with
+   subagents. Stopping a subagent **cascade-kills the background shells it
+   opened** (shells the user/main agent opened are left running). Ported from
+   Hermes' `ptyprocess`/`process_registry` model.
+ - **H1/H2 headings get breathing room** — a blank line above and below big
+   headings so they break the surrounding prose instead of sitting glued to it;
+   H3+ stay compact.
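The cascade rule in the first bullet (stopping a subagent kills only the shells that subagent opened) amounts to an ownership filter. A minimal Ruby sketch under stated assumptions: `ShellHandle`, its `owner_id` field, and `cascade_kill` are hypothetical names, not rubino's API, and flipping `alive` stands in for the real process-group kill.

```ruby
# Hypothetical record of a background shell and who opened it
# (owner_id is nil when the user or the main agent opened it).
ShellHandle = Struct.new(:id, :owner_id, :alive, keyword_init: true)

# Stops the shells opened by subagent `subagent_id` and returns their ids.
# Shells owned by anyone else, including the user/main agent, keep running.
def cascade_kill(shells, subagent_id)
  victims = shells.select { |shell| shell.alive && shell.owner_id == subagent_id }
  victims.each { |shell| shell.alive = false } # stand-in for a process-group kill
  victims.map(&:id)
end

shells = [
  ShellHandle.new(id: "bg_1", owner_id: "sub_a", alive: true),
  ShellHandle.new(id: "bg_2", owner_id: nil, alive: true)
]
cascade_kill(shells, "sub_a") # => ["bg_1"]; bg_2 keeps running
```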
+
+ ### Changed
+
+ - **The per-turn wall clock is disabled by default.** `agent.max_turn_seconds`
+   (was 600s) guillotined legitimate multi-file work on slow local models — a real
+   docs-vs-code audit runs dozens of tool calls over 10+ minutes and was
+   force-summarized into a confused non-answer. The default is now `nil`
+   (disabled); the tool-iteration budget (`max_tool_iterations`, 90) is the
+   runaway guard, and per-tool timeouts bound a hung tool. Hermes parity (its
+   IterationBudget has no clock). Set a positive number to re-arm the clock as a
+   backstop.
+ - The background picker header reads "background" (not "subagents") now that it
+   lists background shells alongside subagents.
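The guard arrangement described in the first bullet can be sketched as follows, assuming a hypothetical `TurnGuard` class (rubino's `IterationBudget` and loop may be structured differently). The iteration cap always applies; the wall clock is only armed when `max_turn_seconds` is a positive number, matching the new `nil` default. The clock is injected so the behavior can be checked without waiting.

```ruby
# Hypothetical turn guard: the tool-iteration cap is the runaway stop, and the
# wall clock is an optional backstop that stays off while max_turn_seconds is nil.
class TurnGuard
  def initialize(max_tool_iterations: 90, max_turn_seconds: nil,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @max_iterations = max_tool_iterations
    @max_seconds = max_turn_seconds
    @clock = clock
    @started_at = clock.call
    @used = 0
  end

  # Records one tool iteration; returns a stop reason, or nil to keep going.
  def tick!
    @used += 1
    return :iteration_budget if @used > @max_iterations
    return :wall_clock if clock_armed? && elapsed > @max_seconds

    nil
  end

  private

  def clock_armed?
    @max_seconds.is_a?(Numeric) && @max_seconds.positive?
  end

  def elapsed
    @clock.call - @started_at
  end
end
```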
+
+ ### Fixed
+
+ - **Truncated subagents are reported as PARTIAL, not "completed".** A background
+   subagent force-summarized at its budget/time cap used to return its partial
+   progress recap as a normal completion, so the parent — and you — got a false
+   success with no real deliverable. The turn's terminal stop reason now flows out
+   of the loop, and the completion notice, the main-timeline marker, and
+   `task_result` all mark a cut-off child **PARTIAL** with a banner telling the
+   parent the delegated work is unfinished — so it recovers (re-delegates or
+   finishes the work itself) instead of trusting a false completion.
+ - **A subagent's own `max_turns` budget is honored again.** `explore`'s per-agent
+   cap (20) was silently dropped — the runner passed `nil` for subagents, so the
+   cap never applied. A subagent now honors its cap and, on reaching it, surfaces
+   the budget-extension request (#574) instead of silently force-summarizing.
+ - **`rubino update` now reports the new version correctly.** After `gem update`
+   pulled a newer gem, the command read the version via
+   `Gem::Specification.find_by_name`, which returns the spec ACTIVATED in the
+   running process — so it still saw the old version and wrongly printed "rubino is
+   already up to date" even though the update had installed. It now `Gem.refresh`es
+   and reads the HIGHEST installed version (`find_all_by_name(...).max`), so the
+   post-update message reflects what was actually installed.
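The version fix in the last bullet rests on two real RubyGems calls, `Gem.refresh` and `Gem::Specification.find_all_by_name`. A minimal sketch; the helper name `latest_installed_version` is hypothetical, and under `bundle exec` Bundler may limit which specs are visible:

```ruby
require "rubygems"

# Returns the highest installed version of `gem_name`, or nil if none is installed.
# Gem::Specification.find_by_name returns the spec already ACTIVATED in this
# process (the stale one right after `gem update`), so reset the spec cache and
# take the maximum over every installed spec instead.
def latest_installed_version(gem_name)
  Gem.refresh
  Gem::Specification.find_all_by_name(gem_name).map(&:version).max
end
```

For example, `latest_installed_version("rubino-agent")` called after `gem update` should report the newly installed version rather than the one loaded in the running process.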
+
  ## [0.5.2.1] - 2026-06-26
 
  ### Fixed
data/Dockerfile ADDED
@@ -0,0 +1,56 @@
+ # Ubuntu-based image that runs rubino-agent FROM SOURCE, for manual testing.
+ #
+ # Build: docker build -t rubino:latest .
+ # Run:   docker run --rm -it \
+ #          -v "$PWD":/work \                   # the dir the agent works on
+ #          -v "$HOME/.rubino":/root/.rubino \  # reuse your host config + keys
+ #          rubino:latest rubino
+ #
+ # The agent is launched via the `rubino` wrapper from any cwd; mount your project
+ # at /work. Secrets are NEVER baked in — provide them by mounting ~/.rubino or
+ # passing *_API_KEY env vars (-e RUBINO_API_KEY=...).
+ FROM ubuntu:24.04
+
+ ENV DEBIAN_FRONTEND=noninteractive \
+     TERM=xterm-256color \
+     LANG=C.UTF-8 \
+     LC_ALL=C.UTF-8
+
+ # Two groups: (1) runtime tools the agent shells out to — git, ripgrep, sqlite3,
+ # tmux, curl, less, procps; (2) the toolchain ruby-build (via mise) needs to
+ # COMPILE Ruby 3.3.3 and the native gems (nokogiri / ffi / sqlite3).
+ RUN apt-get update && apt-get install -y --no-install-recommends \
+         ca-certificates curl git less procps tmux ripgrep sqlite3 \
+         build-essential autoconf bison \
+         libssl-dev libyaml-dev libreadline-dev zlib1g-dev \
+         libncurses-dev libffi-dev libgdbm-dev libsqlite3-dev \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Ruby 3.3.3 via mise — Ubuntu's apt ships only 3.2, but the repo pins 3.3.3
+ # (.ruby-version), so we compile the exact version. Put the install's bin dir
+ # straight on PATH (no shims) so ruby/gem/bundler resolve deterministically.
+ ENV MISE_DATA_DIR=/opt/mise \
+     PATH=/opt/mise/installs/ruby/3.3.3/bin:/usr/local/bin:$PATH
+ RUN curl -fsSL https://mise.run | MISE_INSTALL_PATH=/usr/local/bin/mise sh \
+     && mise install ruby@3.3.3 \
+     && gem install bundler -v 4.0.12
+
+ WORKDIR /app
+ COPY . /app
+ # The lockfile is resolved on macOS (arm64-darwin); add the Linux platforms so
+ # `bundle install` stays in lockstep with the pinned versions instead of
+ # re-resolving, then install.
+ RUN bundle lock --add-platform x86_64-linux aarch64-linux \
+     && bundle install
+
+ # Run rubino from the source checkout WITHOUT changing the caller's cwd, so the
+ # agent operates on the mounted /work dir, not /app. (A source checkout has no
+ # working binstub; this wrapper replaces it.)
+ RUN printf '#!/usr/bin/env bash\nexport BUNDLE_GEMFILE=/app/Gemfile\nexec bundle exec /app/exe/rubino "$@"\n' \
+         > /usr/local/bin/rubino \
+     && chmod +x /usr/local/bin/rubino
+
+ ENV RUBINO_HOME=/root/.rubino
+ RUN mkdir -p /root/.rubino /work
+ WORKDIR /work
+ CMD ["bash"]
data/agent.md ADDED
@@ -0,0 +1,112 @@
+ # agent.md — session state & decisions (handoff)
+
+ > Working handoff for the next agent. NOT the project guide — that's `AGENTS.md`.
+ > Branch: `test/pre-release-gate`.
+
+ ## How this is run (no Docker)
+
+ - rubino runs from THIS checkout via `~/.local/bin/rubino-dev` (forces ruby
+   3.4.7, `BUNDLE_GEMFILE` = this repo, does NOT change cwd → operates on the dir
+   you invoke it from). Works anywhere on the machine; whatever is checked out
+   here is what runs — no reinstall. The installed gem `0.4.0` is the OLD fallback
+   WITHOUT our fixes — always verify with `rubino-dev`.
+ - LLM backend: local OpenAI-compatible server on `127.0.0.1:8000` = **`ds4-serve`**
+   (DeepSeek: `deepseek-v4-flash`, `deepseek-v4-pro`). It **STREAMS tool-call
+   argument deltas** (a file write is on the wire as it generates) AND round-trips
+   `reasoning_content`. Config: `~/.rubino/config.yml` → `model.default:
+   deepseek-v4-flash`, provider `gateway` (openai_compatible, base_url
+   `127.0.0.1:8000/v1`).
+ - ⚠️ **ds4-server is a SINGLE KV slot and has no auto-restart; it CRASHES under
+   large generations.** If a "freeze" reappears, FIRST check it's still up:
+   `curl -s 127.0.0.1:8000/v1/models`. Its log is `/tmp/ds4-server.log` — the
+   single best diagnostic (see below).
+ - Verify TUI behavior in a REAL terminal (offline PTY/pyte capture misses
+   raw-mode defects). The fastest objective probe is a PTY driver that timestamps
+   stdout (scratchpad/pty_*.rb in past sessions) + tailing `/tmp/ds4-server.log`.
+
+ ## The local-performance work — DONE + validated live (this branch)
+
+ The user's "freeze on the local config" was THREE distinct bugs. All fixed and
+ verified live against ds4 (see commits on this branch). Read
+ `~/.claude/.../memory/reference_rubino_kv_cache_bust_rootcause.md` +
+ `project_rubino_kv_cache_fix.md` for the full diagnosis.
+
+ 1. **Cross-turn freeze = KV prefix-cache busting (#608b/#608c).** ds4 only reuses
+    cache on a PURE prefix-extension (`common==live`); any divergence → full
+    `ctx=0..N` re-prefill (grows with context = "freeze after N turns").
+    - **Reasoning replay:** rubino dropped the assistant's `reasoning_content`, so
+      the replay diverged from the server's KV where reasoning was generated.
+      Now persisted (`metadata[:reasoning]`) and replayed as wire
+      `reasoning_content` (Hermes conversation_loop.py:940 parity). Bug found:
+      `extract_thinking` read `response.reasoning` (nonexistent) not `.thinking`.
+      Files: loop.rb, ruby_llm_adapter.rb (normalize_intermediate/rebuild_thinking/
+      load_history), session/message.rb. Effect: turn 2+ 21s → 0.6s.
+    - **Aux off-slot gate:** post-turn memory-extraction/distill ran a divergent
+      no-tools prompt on the SAME slot every turn → evicted the main KV. Now
+      SKIPPED on the interactive REPL when the aux task resolves to the main
+      endpoint (`Configuration#auxiliary_on_main_endpoint?`); extraction happens at
+      session-end flush + compaction (no recall lost — the per-session memory
+      snapshot is frozen anyway). `interactive` flag threaded build_runner(default
+      true)→setup_oneshot(false)→Runner→Lifecycle. A DISTINCT aux endpoint keeps
+      the inter-turn cadence. Files: configuration.rb, lifecycle.rb, runner.rb,
+      chat_command.rb.
+
+ 2. **Large-write freeze = dead UI past the preview cap (#608d).** ds4 streams a
+    big `write`'s args for minutes; after the 30-line preview cap `tool_chunk`
+    stopped emitting and the facet was hidden → ~38s of dead screen. Fix (cli.rb):
+    `tool_params_feed`/`tool_chunk` stream the params IN FULL (`full: true`, no
+    cap — the user watches the file land; the cap stays for tool OUTPUT) + an
+    animated facet during arg streaming. Verified: full content shown, UI silence
+    38s → 6.3s. NB the deltas are INCREMENTAL (not cumulative — no O(N²)); the
+    model is just genuinely slow (~17-22 t/s, degrading) for big files.
+
+ 3. **`ctx` gauge frozen during a run (#608e).** The bar repainted only at turn
+    boundaries and read persisted messages. Fix: `chat_command#live_status_meter`
+    captures the base once → cheap no-DB lambda on `ui.live_status_provider`; the
+    cli ticker (`refresh_live_ctx_bar`, ~1/s) feeds it `@turn_tok_chars/4` and
+    repaints. `build_status_line` refactored to share `render_status_bar`.
+    Verified: ctx climbs 0k→1.2k during a write.
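The cache rule behind item 1 (reuse only on a pure prefix extension, otherwise a full `ctx=0..N` re-prefill) can be modeled as a token-prefix comparison. This is an illustration over plain integer arrays, not ds4's implementation; `common_prefix_length` and `cache_plan` are hypothetical names chosen to echo the `common`/`live` log fields:

```ruby
# Length of the shared prefix of two token-id sequences.
def common_prefix_length(cached, prompt)
  limit = [cached.length, prompt.length].min
  (0...limit).find { |i| cached[i] != prompt[i] } || limit
end

# Models the rule described for ds4: the KV cache is reused only when the new
# prompt extends the cached tokens unchanged (common == live); any divergence
# costs a full re-prefill of the whole prompt.
def cache_plan(cached, prompt)
  common = common_prefix_length(cached, prompt)
  if common == cached.length
    { hit: true, prefill_tokens: prompt.length - common }
  else
    { hit: false, prefill_tokens: prompt.length }
  end
end

cache_plan([1, 2, 3], [1, 2, 3, 4, 5]) # => {hit: true, prefill_tokens: 2}
cache_plan([1, 2, 3], [1, 9, 3, 4])    # => {hit: false, prefill_tokens: 4}
```

In these terms, dropping `reasoning_content` from the replayed history removes tokens the server had already cached, so `common` stops short of the cached length and every new turn takes the miss branch.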
+
+ ### Diagnostic playbook (reuse this)
+ - `/tmp/ds4-server.log`: `live kv cache miss … common=N reason=token-mismatch`
+   then `chat ctx=0..N:N prompt done <s>` = a full re-prefill. `common==live` +
+   `ctx=K..N:small done 0.6s` = a cache HIT (what you want). Within-turn tool
+   iterations already hit; the bug is at turn boundaries.
+ - Repro multi-turn: `rubino-dev -q "…" --yolo` then `-c -q "…"` (one-shot
+   continues a session) OR a PTY driver for true interactive multi-turn (the gate
+   is REPL-only, so one-shot won't show the aux-eviction fix). `RUBINO_HOME=<tmp>`
+   + a sed'd config isolates variables. A logging TCP proxy (:8999→:8000) captures
+   request/response bodies to confirm delta-vs-cumulative and reasoning replay.
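To scan `/tmp/ds4-server.log` for the pattern described in the first bullet, a small classifier over the `ctx=START..END` field is enough. This sketch assumes only the field shape quoted above; the rest of ds4's log format is not documented here, and `classify_prefill` is a hypothetical helper:

```ruby
# Reads the `ctx=START..END` field quoted in the playbook. START == 0 means the
# prompt was prefilled from scratch; START > 0 means the KV cache already
# covered the first START tokens. Returns nil for lines without the field.
CTX_FIELD = /ctx=(\d+)\.\.(\d+)/

def classify_prefill(line)
  match = CTX_FIELD.match(line)
  return nil unless match

  start = match[1].to_i
  finish = match[2].to_i
  if start.zero?
    { kind: :full_prefill, tokens: finish }
  else
    { kind: :cache_hit, tokens: finish - start }
  end
end
```

For instance, `File.foreach("/tmp/ds4-server.log").filter_map { |line| classify_prefill(line) }` lists every prefill; a `:full_prefill` after the first turn points at a cache bust.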
+
+ ## Other uncommitted history folded into this branch's tip
+ - **Streaming tool-call params UX (#608):** `lib/rubino/ui/tool_args_stream.rb`
+   (single-pass JSON streaming decoder, surfaces string VALUES), adapter
+   `announce_tool_stream` (emits `:tool_preparing` + `:tool_args`), `api.rb`/`cli.rb`
+   sinks, byte-batching in `stdout_proxy.rb`/`cli.rb`. The token/speed footer plan
+   is now partly realized by the live `ctx` gauge (#608e); a tok/s readout is still
+   open.
+ - **Per-turn SummarizeSessionJob removed:** the running summary is produced ONLY
+   by threshold-gated compaction (Hermes/Claude-Code parity), never a background
+   job every turn. Handler deleted; `auto_summarize` config + `memory_auto_summarize?`
+   gone. (This ALSO removed one of the per-turn aux-LLM calls — aligned with #608c.)
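The first bullet describes `tool_args_stream.rb` as a single-pass streaming JSON decoder that surfaces string values. The following is an independent sketch of that idea, not the file's contents: `StringValueStream` is a hypothetical class that keeps its parser state across `feed` calls, skips object keys, and decodes the common escapes. It assumes well-formed JSON and replaces lone surrogate `\uXXXX` escapes rather than pairing them.

```ruby
# Incremental decoder sketch: emits the characters of JSON string VALUES
# (never object keys) as argument chunks arrive, keeping state between feeds.
class StringValueStream
  SIMPLE_ESCAPES = {
    '"' => '"', "\\" => "\\", "/" => "/", "b" => "\b",
    "f" => "\f", "n" => "\n", "r" => "\r", "t" => "\t"
  }.freeze

  def initialize(&on_text)
    @on_text = on_text
    @stack = []       # :object / :array containers we are inside
    @key_next = false # true when the next string in an object is a key
    @in_string = false
    @is_key = false
    @escape = nil     # nil, :pending (after a backslash), or hex digits of \uXXXX
  end

  def feed(chunk)
    chunk.each_char { |ch| @in_string ? string_char(ch) : structural_char(ch) }
  end

  private

  def structural_char(ch)
    case ch
    when "{"
      @stack.push(:object)
      @key_next = true
    when "["
      @stack.push(:array)
      @key_next = false
    when "}", "]"
      @stack.pop
      @key_next = false
    when ":"
      @key_next = false
    when ","
      @key_next = @stack.last == :object
    when '"'
      @in_string = true
      @is_key = @key_next
    end
  end

  def string_char(ch)
    if @escape == :pending
      start_escape(ch)
    elsif @escape
      @escape << ch
      finish_unicode_escape if @escape.length == 4
    elsif ch == "\\"
      @escape = :pending
    elsif ch == '"'
      @in_string = false
    else
      emit(ch)
    end
  end

  def start_escape(ch)
    if ch == "u"
      @escape = +""
    else
      @escape = nil
      emit(SIMPLE_ESCAPES.fetch(ch, ch))
    end
  end

  def finish_unicode_escape
    code = @escape.hex
    @escape = nil
    # A lone surrogate half is not valid UTF-8 on its own; replace it.
    emit(code.between?(0xD800, 0xDFFF) ? "\uFFFD" : code.chr(Encoding::UTF_8))
  end

  def emit(text)
    @on_text.call(text) unless @is_key
  end
end

shown = +""
stream = StringValueStream.new { |text| shown << text }
['{"path": "a.rb", "content": "def x', '\\n  1\\nend"}'].each { |delta| stream.feed(delta) }
shown # => "a.rbdef x\n  1\nend"
```

Because state survives between `feed` calls, each incremental delta is processed once, which keeps the cost linear in the argument size instead of re-parsing the accumulated text on every chunk.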
+
+ ## Backlog / NOT done
+ - **Fix C (prompt normalization + volatile-to-tail):** NOT needed (cache hits
+   already land without it; the frozen snapshot keeps volatile_tail stable).
+   Offered as optional strict-Hermes parity hardening only.
+ - **Large files are genuinely slow on ds4** (model throughput, not rubino). The
+   UI now shows progress; making it FASTER is behavioral (steer toward edits /
+   smaller writes) — not yet done.
+ - **tok/s readout** in the footer (the other half of the #608 token-footer plan).
+ - 5 PRE-EXISTING host rspec failures (ruby_tool load-path, fresh_home_db schema)
+   — NOT regressions; see `reference_rubino_host_rspec_env_failures`.
+
+ ## Constraints / protocol
+ - Clean code / DRY; refactor when it keeps things clean.
+ - Repo/commit/PR content in English. NO co-author / "Generated with" trailers.
+ - Verify with `rubino-dev` against live ds4 before claiming a TUI fix works
+   (offline render misses raw-mode defects).
+
+ ## Test status
+ Full suite green except the 5 pre-existing host failures above
+ (`6376 examples, 5 failures, 8 pending`). Rubocop clean on all touched files.
data/docs/design/bg-shell-pty-port.md ADDED
@@ -0,0 +1,88 @@
+ # Porting Hermes' interactive PTY shell to rubino
+
+ Status: DESIGN (deep-study of Hermes, no impl yet) · Branch: `feat/bg-shell-ux`
+ Source studied: `hermes-agent/tools/process_registry.py`, `hermes-agent/tools/terminal_tool.py`,
+ `hermes-agent/hermes_cli/pty_bridge.py`.
+
+ ## Why PTY (the corrected conclusion)
+
+ A pipe-backed background shell has `stdin=DEVNULL` and can't answer `y/N`, sudo passwords,
+ or run TTY-aware/curses programs. Hermes (and Codex `unified_exec`, and the open Claude
+ Code FR) all converge on a **PTY**: the process believes it's on a real terminal, and the
+ user's keystrokes/answers are written to the PTY master. We follow Hermes.
+
+ ## Hermes' model (the algorithm we port, with refs)
+
+ 1. **Spawn.** `ProcessRegistry.spawn_local(use_pty=True)` (`process_registry.py:515`) →
+    `ptyprocess.PtyProcess.spawn(cmd, env, ...)` (`:553`); the handle is stored on
+    `ProcessSession._pty` (`:134`). Pipe fallback when ptyprocess is absent. Pipe mode is
+    `stdin=DEVNULL` (`:605`) — deliberately non-interactive.
+ 2. **Output reader.** `_pty_reader_loop` (`:814`) `pty.read(4096)` until `pty.isalive()` is
+    false; captures `exitstatus`. Feeds `_check_watch_patterns` on each chunk (`:748/784/828`).
+ 3. **Input primitives.** `write_stdin(id, data)` (`:1184`) → `_pty.write(bytes)`;
+    `submit_stdin(id, data="")` = `write_stdin(data + "\n")` (press Enter, `:1209`);
+    `close_stdin(id)` = EOF without kill (`:1213`).
+ 4. **Interactive prompt routing.** Thread-local UI callbacks: `set_sudo_password_callback`
+    / `set_approval_callback` (`terminal_tool.py:189-205`). When unset, fall back to
+    `/dev/tty` / `input()`. The CLI registers them so prompts run through the TUI event loop.
+ 5. **Sudo password.** Detect `sudo` (`_rewrite_real_sudo_invocations`, `:501`), prompt the
+    user with HIDDEN input ("input is hidden", `:404`), cache per scope
+    (`_sudo_password_cache`, scope = session-key / callback-owner / thread, `:205-240`), feed
+    it to the process. Cache cleared on teardown (`_reset_cached_sudo_passwords`).
+ 6. **Watch patterns.** Regexes scan new output to detect notable lines/prompts; after
+    `WATCH_STRIKE_LIMIT` (3) misses, disable + promote to `notify_on_complete` (`:191-288`).
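Item 5's per-scope password cache, keyed by session id as the rubino mapping proposes, can be sketched as a small mutex-guarded hash. `SecretCache` and its methods are hypothetical; masking in scrollback (via `SecretsMask`) is out of scope here:

```ruby
# Hypothetical per-scope secret cache (scope = session id): ask once per scope,
# reuse the answer, forget it on teardown. The block runs while the lock is
# held, so concurrent lookups wait for the user's answer; fine for a sketch.
class SecretCache
  def initialize
    @secrets = {}
    @lock = Mutex.new
  end

  # Returns the cached secret for `scope`, calling the block only on a miss.
  def fetch(scope)
    @lock.synchronize { @secrets[scope] ||= yield }
  end

  # Teardown for one scope.
  def clear(scope)
    @lock.synchronize { @secrets.delete(scope) }
    nil
  end
end
```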
+
+ ## rubino mapping (DRY, faithful, clean-code)
+
+ rubino already has the skeleton: `Tools::ShellRegistry` (pgid tracking + kill),
+ `shell_tool` (background spawn), `shell_input`/`shell_output`/`shell_tail`/`shell_kill`.
+ Today it is **pipe-only**. The port adds a PTY mode alongside.
+
+ | Hermes | rubino target |
+ |--------|---------------|
+ | `ptyprocess.PtyProcess.spawn` | Ruby stdlib **`PTY.spawn`** (`require "pty"`) — returns `[reader_io, writer_io, pid]` |
+ | `ProcessSession._pty` | a `pty_master`/`pty_pid` field on `ShellRegistry::Entry` (`shell_registry.rb:31`) |
+ | `_pty_reader_loop` | the existing `drain_into` reader, reading the PTY master instead of the pipe |
+ | `write_stdin/submit_stdin/close_stdin` | extend `ShellRegistry.write_input` + add `submit_input` (`+"\n"`) and `close_input` |
+ | sudo/approval UI callbacks | **reuse rubino's existing prompt UI** (`UI::CLI#confirm` / the `question` tool / approval menu) — register a thread/fiber-local "shell input needed" callback that surfaces a masked prompt |
+ | `_sudo_password_cache` (per scope) | a per-session masked-secret cache (scope = session id), cleared on teardown; mask in scrollback via the existing `SecretsMask` |
+ | watch_patterns | OPTIONAL (slice 3) — a prompt detector (`password:`, `[y/N]`) to auto-surface input without the user attaching |
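The `PTY.spawn`, `_pty_reader_loop`, and `write_stdin`/`submit_stdin` rows map onto Ruby's stdlib roughly as below. A minimal sketch, assuming a POSIX `sh` and a hypothetical `PtySession` class (not `ShellRegistry::Entry`). It also gives the PTY a default size right after spawning, which leaves a brief window in which the child could still see 0x0:

```ruby
require "pty"
require "io/console"

# Hypothetical PTY-backed background shell. A reader thread drains the PTY
# master into a byte buffer until the child side closes, which surfaces as
# Errno::EIO on Linux or EOFError on other platforms.
class PtySession
  attr_reader :pid

  def initialize(*cmd, rows: 40, cols: 120)
    @reader, @writer, @pid = PTY.spawn(*cmd)
    @writer.winsize = [rows, cols] # a fresh PTY is 0x0
    @buffer = String.new(encoding: Encoding::BINARY)
    @lock = Mutex.new
    @thread = Thread.new { drain }
  end

  # Raw bytes to the child's stdin (keystrokes, answers).
  def write_input(data)
    @writer.write(data)
  end

  # "Press Enter": the data followed by a newline.
  def submit_input(data = "")
    write_input("#{data}\n")
  end

  # Captured output so far, decoded as UTF-8 with invalid bytes replaced.
  def output
    @lock.synchronize { @buffer.dup }.force_encoding(Encoding::UTF_8).scrub
  end

  # Blocks until the reader hits EOF, reaps the child, returns its Process::Status.
  def wait
    @thread.join
    _, status = Process.wait2(@pid)
    status
  ensure
    [@reader, @writer].each { |io| io.close unless io.closed? }
  end

  private

  def drain
    loop do
      chunk = @reader.readpartial(4096)
      @lock.synchronize { @buffer << chunk }
    end
  rescue EOFError, Errno::EIO
    nil # child side closed: nothing more to read
  end
end
```

The captured text includes the PTY's echo of whatever was typed, which is the echo problem the gotchas below call out.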
+
+ ### How the USER provides the `y` / password (the goal)
+
+ Two complementary paths, both writing to the same `write_input` PTY primitive:
+
+ - **Attach-and-type (the focus view).** When attached to the shell (the bg-shell-as-
+   `BackgroundTasks`-entry from `bg-shell-ux.md`), your keystrokes/lines route to
+   `ShellRegistry.write_input(id, ...)` → the PTY. You see `[y/N]`, type `y`, it goes in.
+ - **Detect-and-prompt (Hermes sudo path, no attach needed).** A registered callback +
+   a prompt detector surface a masked/normal prompt inline ("the shell wants input:
+   `Password:`"); your answer is written to the PTY. Reuses the `question`/approval UI.
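The detect-and-prompt path needs a way to tell that a shell is waiting on `Password:` or `[y/N]`. A hedged sketch: it inspects only the unterminated last line of output, since a prompt waits without printing a newline. `PROMPT_PATTERNS` and `detect_prompt` are hypothetical, and the patterns are illustrative rather than Hermes' or rubino's:

```ruby
# Hypothetical slice-3 detector: classify the shell's unterminated last line.
# A line that already ended in "\n" is not waiting for input.
PROMPT_PATTERNS = {
  secret: /(?:password|passphrase)[^:]*:\s*\z/i,
  confirm: /[\[(]y\/n[\])]\s*\??\s*\z/i
}.freeze

def detect_prompt(output)
  tail = output[/[^\n]*\z/]
  PROMPT_PATTERNS.each { |kind, pattern| return kind if pattern.match?(tail) }
  nil
end
```

A `:secret` result would route to the masked prompt, `:confirm` to a normal one; either answer is then written with `write_input`.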
+
+ ## Stages (each clean-code, spec'd, tmux-verified before the next)
+
+ - **Slice 0 — PTY foundation.** `ShellRegistry` gains a PTY mode (`PTY.spawn`), the reader
+   reads the master, `write_input`/`submit_input`/`close_input` work over the PTY. The model
+   tools (`shell_input`) already call `write_input`, so the agent can drive an interactive
+   bg process. Verify in tmux: `python3 -c "print(input('name? '))"` in bg, `shell_input`
+   "x\n", output shows it. (No user-facing UI yet.)
+ - **Slice 1 — SEE + STOP + FOCUS** (from `bg-shell-ux.md`): shell as a `BackgroundTasks`
+   `kind: :shell` entry → card + picker + `/stop` + attach view (clear + live PTY tail).
+ - **Slice 2 — USER types in focus.** Attached keystrokes/lines → `write_input` (the PTY).
+   Now you answer `y` yourself in the focus view.
+ - **Slice 3 — detect-and-prompt + sudo masked.** Prompt detector + masked password +
+   per-session cache, reusing the `question`/approval UI. The Hermes sudo flow.
+
+ ## Gotchas (from the source)
+
+ - `PTY.spawn` makes the child a session leader (PID == PGID) — matches rubino's existing
+   pgid hard-kill, good. But PTY EOF/`Errno::EIO` on child exit must be caught in the reader
+   (Ruby's `PTY` raises `PTY::ChildExited`/`Errno::EIO`).
+ - Terminal size: a PTY needs a winsize (`TIOCSWINSZ`); set a sane default (e.g. 120x40) or
+   the attached terminal's size; resize on attach.
+ - Masking: the sudo/secret path MUST run answers through `SecretsMask` so the password
+   never lands in scrollback or the output buffer (Hermes hides it; rubino has `SecretsMask`).
+ - Don't break the pipe path: keep pipe mode as the default for non-interactive bg work;
+   PTY mode is opt-in (a `pty: true`/`interactive: true` arg, or auto when a prompt is likely).
+ - Output: a PTY echoes input back + emits control sequences; the output buffer/tail must
+   strip/normalize (rubino has `ansi_strip`-equivalent? confirm) so the model/user see clean text.
data/docs/design/bg-shell-review-refinements.md ADDED
@@ -0,0 +1,65 @@
+ # Review-driven refinements (Slice 1 design lock-in)
+
+ An adversarial clean-code review of Slice 0 + the bridge plan produced these changes.
+ They supersede the "thin adapters" framing in `bg-shell-ux.md` where they conflict.
+
+ ## Slice 0 fixes already applied (from review)
+
+ - **close_stdin crash-on-retire (BLOCKER):** EOT only a LIVE child; a dead PTY master
+   raises `Errno::EIO`, so close the fd instead (also reclaiming a leaked `master_w`).
+   Rescue widened to `IOError, Errno::EIO, Errno::EBADF`.
+ - **spawn_pty cwd fragility:** `cd … || exit 127\n<cmd>` (own line) — not `cd && (<cmd>)`,
+   which a trailing `#`-comment broke.
+ - **winsize:** default `40x120` (a fresh PTY is 0x0).
+ - Honest comments (EOT is canonical-mode-only; dropped the dead `PTY::ChildExited` catch).
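The first two fixes can be sketched in Ruby as follows. `spawn_in_dir` and `close_input` are hypothetical names (not rubino's `spawn_pty`/`close_stdin`), and the `live:` flag is assumed to come from the shell's wait thread:

```ruby
require "pty"
require "shellwords"

# Hypothetical spawn_pty: the command sits on its own line after the `cd`, so a
# trailing `# comment` can only comment out the rest of that line (with
# `cd dir && (<cmd>)` it swallowed the closing parenthesis instead).
def spawn_in_dir(dir, command)
  PTY.spawn("sh", "-c", "cd #{Shellwords.escape(dir)} || exit 127\n#{command}")
end

# Hypothetical close_input: a LIVE child gets EOT (^D), which ends input only in
# canonical (cooked) mode; for a dead child, writing can raise, so close our end.
def close_input(writer, live:)
  if live
    writer.write("\x04")
  else
    writer.close
  end
rescue IOError, Errno::EIO, Errno::EBADF
  writer.close unless writer.closed?
end
```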
+
+ ## Still-open Slice 0/2 prerequisites (do BEFORE wiring write_input/attach)
+
+ - **PTY echo:** a cooked PTY echoes typed input back into the buffer → doubled text, and a
+   typed password would land in the ring buffer in cleartext. Before the user/agent writes
+   to a PTY: turn `ECHO` off via `io/console` for the secret path, and/or strip the echoed
+   line at the capture seam. Mask through `SecretsMask`.
+ - **Control sequences:** a PTY emits `\r\n` + CSI/OSC. `drain_into` only `scrub_utf8`s.
+   For PTY mode, normalize at the capture seam (strip CR, strip non-SGR CSI/OSC) so the
+   model isn't fed escapes and the attach view doesn't paint raw escapes (route the attach
+   renderer through the same `sanitize_terminal_keep_sgr` the cards use — CWE-150).
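A minimal sketch of the capture-seam normalization in the second bullet, assuming hypothetical constant and method names (this is not rubino's `sanitize_terminal_keep_sgr`). It converts CRLF to LF, drops OSC sequences and every CSI sequence except SGR (the ones ending in `m`), then removes stray carriage returns:

```ruby
# OSC: ESC ] ... terminated by BEL or ST (ESC \), e.g. window-title updates.
PTY_OSC = /\e\][^\a\e]*(?:\a|\e\\)/
# CSI: ESC [ parameter bytes, intermediate bytes, one final byte.
PTY_CSI = /\e\[[0-?]*[ -\/]*[@-~]/

def normalize_pty_output(text)
  text
    .gsub("\r\n", "\n")                                     # PTY line endings
    .gsub(PTY_OSC, "")
    .gsub(PTY_CSI) { |seq| seq.end_with?("m") ? seq : "" } # keep SGR colors only
    .delete("\r")                                          # stray CRs from redraws
end
```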
+
+ ## DRY: the `kind:` discriminator is a DATA TAG, not a control switch
+
+ Review verdict: a `case kind` would spray across ≥7 sites (stop_entry, attach view,
+ attached-input, cards, menu, watch, completion) — a smell. Instead:
+
+ 1. Give the shell's `BackgroundTasks` entry the **same flat fields** the renderers already
+    read (`prompt`=command, `started_at`, synced `status`, `subagent`="shell"). Then
+    `SubagentCards`, `AgentMenu`, `render_agent_watch` need **zero** branches. Replace the
+    literal `"subagent"` strings with one `entry_kind_label(entry)` helper.
+ 2. Push the genuinely-divergent behavior behind **~4 polymorphic methods on the entry**
+    (or two small duck-typed adapter objects): `#stop`, `#attach_render(ui)`,
+    `#feed_input(text)`, `#live?`. Then `stop_entry` → `entry.stop`, `attach_agent_view` →
+    `entry.attach_render`, `handle_attached_input` → `entry.feed_input`. **No `case kind`
+    in any UI file.** `kind` survives only as the label.
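The polymorphic shape in point 2 can be sketched with two small classes sharing one interface (Ruby 3.x, using endless methods). All names here are hypothetical, and the `events` log stands in for the real side effects (cancelling a runner, killing a process group, writing to stdin):

```ruby
# Hypothetical shared base: the fields and lifecycle every entry has.
class BackgroundEntry
  attr_reader :id, :events

  def initialize(id)
    @id = id
    @live = true
    @events = []
  end

  def live? = @live

  def stop
    @live = false
    events << stop_event
  end
end

class SubagentEntry < BackgroundEntry
  def kind_label = "subagent"
  def stop_event = :runner_cancelled
  def feed_input(text) = events << [:steer, text]
  def attach_render(ui) = ui << "replay session #{id}"
end

class ShellEntry < BackgroundEntry
  def kind_label = "shell"
  def stop_event = :process_group_killed
  def feed_input(text) = events << [:stdin, text]
  def attach_render(ui) = ui << "live tail #{id}"
end

# UI side: no `case kind` anywhere, only calls through the shared interface.
def entry_kind_label(entry) = entry.kind_label
def stop_entry(entry) = entry.stop
def handle_attached_input(entry, text) = entry.feed_input(text)
```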
+
+ ## Three gaps to handle when registering a shell entry
+
+ `BackgroundTasks#reserve` carries subagent semantics a shell must NOT inherit:
+
+ 1. **Concurrency cap:** `reserve` counts against `max_concurrent_total`/depth/per-owner. A
+    shell is not an LLM run — it must register WITHOUT consuming the subagent budget
+    (separate register path, or exempt `kind: :shell` from `running_count`/`refusal_reason`).
+ 2. **Double completion notice:** `ShellRegistry#notify_completion` ALREADY pushes
+    `[background-shell] finished`. If the BG entry's `complete` also fires a notice, the user
+    gets two. Pick ONE owner (keep ShellRegistry's; the BG entry only syncs status).
+ 3. **Dead steer_queue:** `reserve` allocates a `steer_queue`; a shell can't steer/probe.
+    Disable steer/probe for `kind: :shell` (route attached input to `feed_input` → stdin).
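Gap 1 (shells must not consume the subagent budget) can be illustrated with a hypothetical registry slice in which `running_count` and `refusal_reason` ignore `kind: :shell`. `TaskBudget` and `BudgetEntry` are illustrative names, and the depth/per-owner limits are omitted:

```ruby
# Hypothetical registry slice: shells register like any entry but never count
# toward, or get refused by, the subagent concurrency cap.
BudgetEntry = Struct.new(:id, :kind, :status, keyword_init: true)

class TaskBudget
  Refused = Class.new(StandardError)

  def initialize(max_concurrent_total:)
    @max = max_concurrent_total
    @entries = []
  end

  # Only running LLM runs (subagents) consume the budget.
  def running_count
    @entries.count { |e| e.kind == :subagent && e.status == :running }
  end

  def refusal_reason(kind)
    return nil if kind == :shell # a shell is not an LLM run
    return nil if running_count < @max

    "max_concurrent_total (#{@max}) reached"
  end

  def reserve(id, kind:)
    reason = refusal_reason(kind)
    raise Refused, reason if reason

    BudgetEntry.new(id: id, kind: kind, status: :running).tap { |e| @entries << e }
  end
end
```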
+
+ ## Status sync note
+
+ Two status sources (ShellRegistry `wait_thr`-derived vs BG stored): sync the BG entry to
+ terminal only at `notify_completion`. There's a small window where a just-killed shell still
+ reads live until the reader thread fires — acceptable, documented.
+
+ ## Sandbox/pgid: confirmed intact under PTY.spawn
+
+ `PTY.spawn` setsid's the child → `pgid == pid` (pgroup:true redundant); the sandbox launcher
+ still `exec`s bash in place, so the write-jail + pgid-kill are identical to the pipe path.
+ Only cwd handling diverged (fixed above).
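The `pgid == pid` claim is easy to check directly. This sketch uses the stdlib `PTY.spawn`, compares `Process.getpgid(pid)` with `pid`, then cleans up with a process-group kill (a negative pid), the same mechanism the pipe path relies on. The method name is hypothetical:

```ruby
require "pty"

# Spawns `cmd` on a PTY and reports whether the child leads its own process
# group (pgid == pid). PTY.spawn starts the child in a new session, so it should.
def pty_child_leads_group?(*cmd)
  reader, writer, pid = PTY.spawn(*cmd)
  Process.getpgid(pid) == pid
ensure
  if pid
    begin
      Process.kill("KILL", -pid) # negative pid: signal the whole process group
    rescue Errno::ESRCH
      nil # group already gone
    end
    Process.wait(pid)
  end
  [reader, writer].compact.each(&:close)
end
```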
data/docs/design/bg-shell-ux.md ADDED
@@ -0,0 +1,130 @@
+ # Background shells as first-class background work (see / focus / stop)
+
+ Status: DESIGN (no implementation yet) · Branch: `feat/bg-shell-ux`
+
+ ## Goal
+
+ Give a background **shell** the same user-facing affordances a background **subagent**
+ already has:
+
+ 1. **See** it — a card + a picker row, at a glance.
+ 2. **Focus** it — attach to a clear, live view of what it's doing.
+ 3. **Stop** it — `/stop <id>` from the UI.
+
+ Today a background shell lives ONLY in `ShellRegistry`, so it is invisible to every
+ user surface. The model can read/tail/kill it via tools (`shell_output`,
+ `shell_tail`, `shell_kill`), but the human has no card, no picker entry, no attach,
+ no `/stop`.
+
+ ## The central reuse lever (why this is mostly DRY, not new UI)
+
+ Three UI surfaces and the control handlers all read **one source of truth**:
+
+ - `UI::CLI#set_subagent_cards` → `BackgroundTasks.instance.running` (`cli.rb:930`)
+ - `UI::AgentMenu` picker entries default → `BackgroundTasks.instance.running` (`agent_menu.rb:21`)
+ - `BottomComposer` card host → `BackgroundTasks.instance.running` (`bottom_composer.rb:1639`)
+ - `/agents`, `/stop`, `auto_resolve_pending` → `BackgroundTasks` lookups
+
+ None of these inspect `subagent`/`runner` to decide whether to show a row — they
+ filter purely on `live_status?` (`LIVE_STATUSES = %i[running needs_approval stopping]`).
+
+ **So: anything in `BackgroundTasks#running` automatically gets a card, a picker row,
+ and `/stop`.** The whole feature reduces to *register the shell as a `BackgroundTasks`
+ entry* + a few thin, kind-aware branches.
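Because the surfaces filter on status alone, the lever fits in a few lines. A sketch using the `LIVE_STATUSES` list quoted above; `TaskEntry` and `running_entries` are hypothetical stand-ins for `BackgroundTasks::Entry` and `BackgroundTasks#running`:

```ruby
LIVE_STATUSES = %i[running needs_approval stopping].freeze

# Hypothetical entry: liveness depends only on status, never on kind.
TaskEntry = Struct.new(:id, :kind, :status, keyword_init: true) do
  def live_status?
    LIVE_STATUSES.include?(status)
  end
end

def running_entries(entries)
  entries.select(&:live_status?)
end

entries = [
  TaskEntry.new(id: "sa_1", kind: :subagent, status: :running),
  TaskEntry.new(id: "bg_1", kind: :shell, status: :running),
  TaskEntry.new(id: "bg_2", kind: :shell, status: :completed)
]
running_entries(entries).map(&:id) # => ["sa_1", "bg_1"]
```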
34
+
35
+ ## Architecture
36
+
37
+ Add a `kind: :subagent | :shell` discriminator to `BackgroundTasks::Entry`
38
+ (`background_tasks.rb:60`). A background shell gets BOTH:
39
+
40
+ - its existing `ShellRegistry::Entry` (process group, output ring, kill, stdin) — unchanged;
41
+ - a NEW linked `BackgroundTasks::Entry` (`kind: :shell`) that carries the SAME `bg_*`
42
+ id, so the card/picker/stop surfaces light up and `/stop bg_x` already matches
43
+ `shell_kill`'s id.
44
+
45
+ The two entries are bridged 1:1 by id. `ShellRegistry` stays the process owner;
46
+ `BackgroundTasks` becomes the *presentation + control* layer (as it already is for subagents).
47
+
48
+ ```
49
+ ShellRegistry::Entry ──(same bg_ id)── BackgroundTasks::Entry(kind: :shell)
50
+ pgid, pipes, buffer status, card, picker row, /stop
51
+ read_new / write_input / kill attach view, completion notice
52
+ ```
53
+
54
+ ### Reuse AS-IS (the shared seams — no shell-specific code)
55
+
56
+ 1. `BackgroundTasks#running` + `live_status?` / `LIVE_STATUSES` — the liveness oracle
57
+ that auto-drives cards + picker + composer.
58
+ 2. `UI::SubagentCards` row rendering — reads only plain struct fields
59
+ (`id, status, tool_count, started_at, prompt`); map `prompt`→command.
60
+ 3. `UI::AgentMenu` row rendering — reads only `id, subagent, status, budget_request`.
61
+ 4. `InputQueue#push_notice` → idle `coalesced_resume` (#561) — shells ALREADY ride
62
+ this (`shell_registry.rb:372`).
63
+ 5. `render_agent_output_tail` / `watch_loop` (`agents.rb:300-328`) — an existing
64
+ kind-agnostic byte-tail renderer, perfect for the shell attach view.
65
+ 6. `stop_entry` (`background_tasks.rb:456`) as the single stop entry-point, dispatched by kind.
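A polling live tail only needs cursor-based reads over an append-only buffer. A hedged sketch of that shape, assuming a hypothetical `OutputBuffer` whose `read_new(cursor)` returns the new bytes plus the next cursor; the real shell buffer is a ring, and eviction of old bytes is ignored here.

```ruby
# Hypothetical append-only output buffer with cursor-based reads.
class OutputBuffer
  def initialize
    @data = +""
    @mutex = Mutex.new
  end

  def append(chunk)
    @mutex.synchronize { @data << chunk }
  end

  # Returns [bytes written after `cursor`, next cursor].
  def read_new(cursor)
    @mutex.synchronize { [@data.byteslice(cursor..) || "", @data.bytesize] }
  end
end

# One poll step of a watch-loop style tail: emit only what is new, keep the cursor.
def tail_once(buffer, cursor, out)
  chunk, cursor = buffer.read_new(cursor)
  out << chunk unless chunk.empty?
  cursor
end

buffer = OutputBuffer.new
screen = +""
cursor = 0
buffer.append("compiling...\n")
cursor = tail_once(buffer, cursor, screen)
buffer.append("done\n")
cursor = tail_once(buffer, cursor, screen)
print screen # each line appears exactly once
```

A watch loop would call `tail_once` on a short sleep interval and redraw only when something new arrived; keeping the cursor per viewer lets several attached views tail the same shell independently.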
66
+
67
+ ### Thin shell adapters (the only new code — kept minimal)
68
+
69
+ 1. **Bridge (register + sync).** In `shell_tool.rb#spawn_background` (`:382`), after
70
+ `ShellRegistry.spawn`, `reserve` a `kind: :shell` `BackgroundTasks` entry with the
71
+ same id. In `ShellRegistry#notify_completion` (`:357`), flip the linked entry to
72
+ `:completed`/`:failed` via `complete` (so the card/picker drop it). Status for a
73
+ shell is DERIVED (`ShellRegistry#status` from `wait_thr`); the bridge keeps the
74
+ stored `BackgroundTasks` status in sync — single sync point at completion + an
75
+ optional poll for the live `tool_count`/activity proxy (bytes/lines).
76
+ 2. **Attach branch.** In `chat_command.rb#attach_agent_view` (`:3009`), branch on
77
+ `kind == :shell`: `entry.messages` is empty (no session), so skip session replay
78
+ and instead render the captured buffer + a polling `read_new` live-tail (reuse the
79
+ `watch_loop` shape). Attached plain text → `ShellRegistry.write_input` (stdin),
80
+ not `steer_agent`.
81
+ 3. **Stop branch.** In `stop_entry` (`:456`), branch on `kind == :shell`:
82
+ `Process.kill` the pgid (reuse `ShellKillTool`'s SIGTERM → grace → SIGKILL body,
83
+ extracted to a shared `ShellRegistry#signal_group`) instead of `runner.cancel!`.
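The SIGTERM → grace → SIGKILL body that the shell branch of `stop_entry` would share with `ShellKillTool` might look like the following. This is a sketch, not the extracted `ShellRegistry#signal_group`: the `grace`/`poll` defaults, the return symbols and the `group_alive?` helper are assumptions. It relies on POSIX semantics, where a negative pid addresses a whole process group and signal 0 only checks existence.

```ruby
# Sketch of a SIGTERM -> grace -> SIGKILL helper for a whole process group.
# Defaults and return values are assumptions, not rubino's API.
def signal_group(pgid, grace: 2.0, poll: 0.05)
  Process.kill("TERM", -pgid) # negative pid: every process in the group
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + grace
  while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
    return :terminated unless group_alive?(pgid)
    sleep poll
  end
  Process.kill("KILL", -pgid)
  :killed
rescue Errno::ESRCH
  # The group was already gone (before TERM, or between the last probe and KILL).
  :already_gone
end

# Signal 0 delivers nothing; it only checks that some group member exists.
# Unreaped zombies still count as members, so callers must reap their children.
def group_alive?(pgid)
  Process.kill(0, -pgid)
  true
rescue Errno::ESRCH
  false
end
```

Because an exited but unreaped child still counts as a group member, the caller should reap it, for example by spawning with `Process.spawn(cmd, pgroup: true)` and calling `Process.detach(pid)`; otherwise the probe keeps succeeding and the helper escalates to SIGKILL after the grace period.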
84
+
85
+ ### Kind-aware copy (cosmetic, one helper)
86
+
87
+ `AgentMenu` header/hints ("subagents", "Enter attaches"), `SubagentCards` glyph
88
+ wording, and `Agents` copy ("No background subagents") hardcode "subagent". Introduce
89
+ ONE `entry_kind_label(entry)` → "subagent"/"shell" used by the picker header + card +
90
+ list copy, so a shell row reads right without forking the renderers.
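One possible shape for that helper, as a sketch: `entry_kind_label` is the name proposed above, but the label table, the fallback of a missing `kind` to `:subagent`, and the `picker_header` copy are hypothetical.

```ruby
# Hypothetical stand-in entry; only #kind is read.
LabeledEntry = Struct.new(:id, :kind, keyword_init: true)

# Assumed label table; an unknown kind raises KeyError instead of mislabelling a row.
KIND_LABELS = { subagent: "subagent", shell: "shell" }.freeze

def entry_kind_label(entry)
  # Assumption: entries without a kind (pre-discriminator) are subagents.
  KIND_LABELS.fetch(entry.kind || :subagent)
end

# Illustrative call site: a picker header that reads right for either kind.
def picker_header(entries)
  labels = entries.map { |e| entry_kind_label(e) }.uniq
  labels.size == 1 ? "Background #{labels.first}s" : "Background tasks"
end
```

Keeping the wording in one lookup means a third kind later needs one new table entry rather than edits to each renderer.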
91
+
92
+ ## Lifecycle & the two-lifetime rule
93
+
94
+ A shell has TWO decoupled lifetimes, by design:
95
+
96
+ - The `BackgroundTasks` entry goes **terminal** (drops from `running`/cards/picker) the
97
+ moment the shell exits — so the UI stops showing a dead shell as live.
98
+ - The `ShellRegistry` entry stays **retired** (RETIRED_TTL) so `shell_output` can still
99
+ fetch the final output for the model.
100
+
101
+ Keep them decoupled: completion flips the BackgroundTasks status; retirement is
102
+ ShellRegistry-only.
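A sketch of the `ShellRegistry` half of the rule, assuming an injectable clock and an illustrative `RETIRED_TTL` value (both assumptions, as are the class and method names). Retiring a record stamps it instead of deleting it, so a `shell_output`-style lookup still returns the final output until the TTL lapses; the separate terminal flip of the `BackgroundTasks` entry is not modelled.

```ruby
# Hypothetical model of retirement; names and the TTL value are illustrative only.
class RetiringShellTable
  RETIRED_TTL = 600 # seconds

  Record = Struct.new(:id, :output, :retired_at)

  def initialize(clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @clock = clock
    @records = {}
  end

  def add(id)
    @records[id] = Record.new(id, +"", nil)
  end

  def append(id, text)
    @records.fetch(id).output << text
  end

  # Called when the shell exits: stamp the record rather than dropping it.
  def retire(id)
    @records.fetch(id).retired_at = @clock.call
  end

  # shell_output-style lookup: works while live or within RETIRED_TTL of exit.
  def output(id)
    sweep
    @records[id]&.output
  end

  private

  # Evict only retired records whose TTL has lapsed; live shells are never swept.
  def sweep
    now = @clock.call
    @records.delete_if { |_, r| r.retired_at && now - r.retired_at > RETIRED_TTL }
  end
end
```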
103
+
104
+ ## Open decisions (need your call)
105
+
106
+ - **D1 — id namespace.** Recommend the shell's `BackgroundTasks` entry **keep its `bg_*`
107
+ id** (so `/stop bg_x` == `shell_kill bg_x`, one id the user sees everywhere). (Alt:
108
+ give it `sa_*` — rejected, splits the id space.)
109
+ - **D2 — attach interactivity (scope).** v1 attach = **read-only live tail**; OR v1
110
+ also routes attached plain-text to the shell's **stdin** (interactive bg process).
111
+ stdin-steer is a nice win but more surface to test.
112
+ - **D3 — steer/probe on a shell.** Disable for `kind: :shell` (a shell has no model to
113
+ probe / no steer queue), OR repurpose steer→stdin (ties to D2).
114
+
115
+ ## Proposed slices (incremental, each independently testable)
116
+
117
+ - **Slice 1 — SEE + STOP.** `kind` discriminator + bridge (register/sync) + `stop_entry`
118
+ shell branch + kind-aware label. Outcome: a bg shell shows a card + picker row and
119
+ `/stop bg_x` kills it. (Biggest value, smallest surface — pure reuse + 2 thin branches.)
120
+ - **Slice 2 — FOCUS.** `attach_agent_view` shell branch: clear + buffer + polling tail.
121
+ Outcome: Enter on a shell row attaches to a live output view; `←`/`/back` returns.
122
+ - **Slice 3 — stdin (optional, D2/D3).** Attached plain-text → `shell_input`.
123
+
124
+ Each slice: clean-code, DRY (reuse the named seams), spec'd, verified in the QA
125
+ container with a real bg shell (tmux: card visible, `/stop` kills, attach tails live).
126
+
127
+ ## Non-goals (v1)
128
+
129
+ Reworking `ShellRegistry`'s process model; per-shell resource limits; persisting shell
130
+ output to a session Store (shells stay buffer-backed, not transcript-backed).
data/docs/tools.md CHANGED
@@ -1,10 +1,10 @@
1
1
  # Tools Reference
2
2
 
3
- rubino ships **29 built-in tools** plus dynamic MCP tools (started at boot when `mcp.servers` is configured — see [mcp.md](mcp.md); being server-dependent they are excluded from the drift-checked list below) and custom user-defined tools. Each tool is gated by a `tools.<key>` config flag (opt-out: absent key = enabled, only an explicit `false` disables) and the approval model. The count and list below are drift-checked against the live registry by `spec/docs/tools_doc_drift_spec.rb`.
3
+ rubino ships **28 built-in tools** plus dynamic MCP tools (started at boot when `mcp.servers` is configured — see [mcp.md](mcp.md); being server-dependent they are excluded from the drift-checked list below) and custom user-defined tools. Each tool is gated by a `tools.<key>` config flag (opt-out: absent key = enabled, only an explicit `false` disables) and the approval model. The count and list below are drift-checked against the live registry by `spec/docs/tools_doc_drift_spec.rb`.
4
4
 
5
- The full list (registration order): `read`, `summarize_file`, `write`, `edit`, `multi_edit`, `grep`, `glob`, `shell`, `shell_output`, `shell_tail`, `shell_input`, `shell_kill`, `ruby`, `apply_patch`, `webfetch`, `websearch`, `question`, `todowrite`, `memory`, `session_search`, `attach_file`, `read_attachment`, `vision`, `skill`, `task`, `task_result`, `task_stop`, `steer`, `probe`.
5
+ The full list (registration order): `read`, `write`, `edit`, `multi_edit`, `grep`, `glob`, `shell`, `shell_output`, `shell_tail`, `shell_input`, `shell_kill`, `ruby`, `apply_patch`, `webfetch`, `websearch`, `question`, `todowrite`, `memory`, `session_search`, `attach_file`, `read_attachment`, `vision`, `skill`, `task`, `task_result`, `task_stop`, `steer`, `probe`.
6
6
 
7
- Several tools share one config gate, so `rubino tools` shows **24 rows** (config groups), not 29: `webfetch` + `websearch` share `tools.web`, and the whole delegation family (`task`, `task_result`, `task_stop`, `steer`, `probe`) rides on `tools.task` — disabling delegation disables them all.
7
+ Several tools share one config gate, so `rubino tools` shows **23 rows** (config groups), not 28: `webfetch` + `websearch` share `tools.web`, and the whole delegation family (`task`, `task_result`, `task_stop`, `steer`, `probe`) rides on `tools.task` — disabling delegation disables them all.
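The row count follows from the gate mapping. A quick Ruby check, assuming every tool not covered by a shared gate has its own `tools.<name>` key; `SHARED_GATES` encodes only the two sharing rules stated above.

```ruby
# Registration-order list from the paragraph above.
TOOLS = %w[
  read write edit multi_edit grep glob shell shell_output shell_tail
  shell_input shell_kill ruby apply_patch webfetch websearch question
  todowrite memory session_search attach_file read_attachment vision skill
  task task_result task_stop steer probe
].freeze

# Only the documented sharing rules; everything else is assumed to map to itself.
SHARED_GATES = {
  "webfetch" => "web", "websearch" => "web",
  "task" => "task", "task_result" => "task", "task_stop" => "task",
  "steer" => "task", "probe" => "task"
}.freeze

def gate_for(tool)
  "tools.#{SHARED_GATES.fetch(tool, tool)}"
end

rows = TOOLS.group_by { |t| gate_for(t) }
puts "#{TOOLS.size} tools -> #{rows.size} rows" # prints "28 tools -> 23 rows"
```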
8
8
 
9
9
  ## How tools are gated
10
10
 
@@ -45,15 +45,6 @@ Risk: low
45
45
  Parameters: file_path, offset, limit
46
46
  ```
47
47
 
48
- ### summarize_file
49
-
50
- Summarize a large text file WITHOUT loading it into the conversation. The file is map-reduced by a separate summarization model; only the final summary returns, so the raw bytes never enter context. Prefer this over `read` for big documents.
51
-
52
- ```
53
- Risk: low
54
- Parameters: file_path, focus, max_words
55
- ```
56
-
57
48
  ### write
58
49
 
59
50
  Write content to a file, overwriting any existing content. Creates parent directories if needed. Use `edit`/`multi_edit` to modify an existing file in place.
@@ -37,6 +37,19 @@ module Rubino
37
37
  within_iteration_limit?(iteration) && within_time_limit?
38
38
  end
39
39
 
40
+ # Which rail is blocking the turn RIGHT NOW, so a force-summarized turn can
41
+ # report WHY it stopped (honest subagent-completion reporting, not a false
42
+ # "completed"). :iterations when the tool/turn ceiling is spent, :time when
43
+ # the wall-clock safety-net is, nil when the turn could still continue.
44
+ # Mirrors #can_continue?'s conjunction — the iteration ceiling is checked
45
+ # first, matching the order the loop exhausts them.
46
+ def limiting_factor(iteration)
47
+ return :iterations unless within_iteration_limit?(iteration)
48
+ return :time unless within_time_limit?
49
+
50
+ nil
51
+ end
52
+
40
53
  # True ONLY when offering the interactive Continue extension would actually
41
54
  # help: the SOFT iteration ceiling (@max_tool_iterations) is what's
42
55
  # exhausted, and neither non-extendable rail is the blocker (#403).