RubyGems - rubino-agent - Versions diffs - 0.5.1 → 0.5.2.2 - Mend

rubino-agent 0.5.1 → 0.5.2.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (98)

checksums.yaml +4 -4
data/.dockerignore +15 -0
data/CHANGELOG.md +127 -0
data/Dockerfile +56 -0
data/agent.md +112 -0
data/docs/api/v1.md +2 -0
data/docs/commands.md +3 -6
data/docs/configuration.md +13 -6
data/docs/design/bg-shell-pty-port.md +88 -0
data/docs/design/bg-shell-review-refinements.md +65 -0
data/docs/design/bg-shell-ux.md +130 -0
data/docs/oauth-providers.md +21 -0
data/docs/tools.md +3 -12
data/lib/rubino/agent/iteration_budget.rb +13 -0
data/lib/rubino/agent/loop.rb +43 -5
data/lib/rubino/agent/prompts/build.txt +10 -5
data/lib/rubino/agent/prompts/memory_guidance.txt +5 -0
data/lib/rubino/agent/prompts/tool_use_enforcement.txt +4 -0
data/lib/rubino/agent/prompts/tool_use_enforcement_google.txt +9 -0
data/lib/rubino/agent/prompts/tool_use_enforcement_openai.txt +48 -0
data/lib/rubino/agent/runner.rb +55 -12
data/lib/rubino/agent/tool_executor.rb +1 -1
data/lib/rubino/api/operations/tasks/stop_operation.rb +0 -3
data/lib/rubino/attachments/classify.rb +0 -1
data/lib/rubino/cli/chat/completion_builder.rb +0 -8
data/lib/rubino/cli/chat/idle_card_host.rb +6 -1
data/lib/rubino/cli/chat_command.rb +324 -171
data/lib/rubino/cli/commands.rb +5 -0
data/lib/rubino/commands/built_ins.rb +0 -1
data/lib/rubino/commands/executor.rb +1 -7
data/lib/rubino/commands/handlers/agents.rb +55 -265
data/lib/rubino/commands/handlers/status.rb +6 -3
data/lib/rubino/compression/line_skeleton.rb +1 -1
data/lib/rubino/compression/python_code_skeleton.rb +1 -1
data/lib/rubino/compression/ruby_code_skeleton.rb +1 -1
data/lib/rubino/compression/tree_sitter_code_skeleton.rb +1 -1
data/lib/rubino/config/configuration.rb +47 -18
data/lib/rubino/config/defaults.rb +57 -33
data/lib/rubino/context/prompt_assembler.rb +89 -1
data/lib/rubino/context/summary_builder.rb +0 -22
data/lib/rubino/context/token_budget.rb +0 -5
data/lib/rubino/errors.rb +2 -2
data/lib/rubino/interaction/events.rb +2 -2
data/lib/rubino/interaction/lifecycle.rb +54 -20
data/lib/rubino/llm/anthropic_role_merge.rb +75 -0
data/lib/rubino/llm/error_classifier.rb +34 -1
data/lib/rubino/llm/fake_provider.rb +0 -4
data/lib/rubino/llm/ruby_llm_adapter.rb +222 -59
data/lib/rubino/llm/stream_tool_call_recovery.rb +91 -0
data/lib/rubino/llm/tool_call_recovery.rb +177 -0
data/lib/rubino/memory/sqlite_extraction_prompt.rb +0 -2
data/lib/rubino/memory/store.rb +0 -19
data/lib/rubino/security/pattern_matcher.rb +0 -2
data/lib/rubino/security/redactor.rb +1 -1
data/lib/rubino/security/secret_path.rb +16 -4
data/lib/rubino/session/message.rb +12 -0
data/lib/rubino/skills/registry.rb +16 -2
data/lib/rubino/tools/background_tasks.rb +132 -228
data/lib/rubino/tools/base.rb +1 -17
data/lib/rubino/tools/grep_tool.rb +13 -1
data/lib/rubino/tools/question_tool.rb +3 -4
data/lib/rubino/tools/read_attachment_tool.rb +52 -54
data/lib/rubino/tools/registry.rb +21 -72
data/lib/rubino/tools/shell_entry_adapter.rb +97 -0
data/lib/rubino/tools/shell_input_tool.rb +1 -1
data/lib/rubino/tools/shell_kill_tool.rb +4 -4
data/lib/rubino/tools/shell_registry.rb +178 -38
data/lib/rubino/tools/shell_tool.rb +45 -5
data/lib/rubino/tools/steer_tool.rb +3 -4
data/lib/rubino/tools/task_result_tool.rb +4 -1
data/lib/rubino/tools/task_stop_tool.rb +5 -7
data/lib/rubino/tools/task_tool.rb +81 -35
data/lib/rubino/tools/vision_tool.rb +1 -1
data/lib/rubino/tools/write_tool.rb +22 -2
data/lib/rubino/ui/agent_menu.rb +8 -4
data/lib/rubino/ui/api.rb +11 -0
data/lib/rubino/ui/bottom_composer.rb +240 -374
data/lib/rubino/ui/cli.rb +381 -155
data/lib/rubino/ui/input_history.rb +0 -5
data/lib/rubino/ui/live_region.rb +18 -1
data/lib/rubino/ui/markdown_renderer.rb +51 -4
data/lib/rubino/ui/markdown_repair.rb +114 -0
data/lib/rubino/ui/notifier.rb +4 -10
data/lib/rubino/ui/stdout_proxy.rb +25 -10
data/lib/rubino/ui/streaming_markdown.rb +79 -12
data/lib/rubino/ui/subagent_cards.rb +18 -44
data/lib/rubino/ui/tool_args_stream.rb +143 -0
data/lib/rubino/update_check.rb +10 -2
data/lib/rubino/util/ignore_rules.rb +18 -2
data/lib/rubino/util/secrets_mask.rb +0 -9
data/lib/rubino/version.rb +1 -1
data/lib/rubino.rb +33 -7
data/rubino-agent.gemspec +1 -0
metadata +31 -5
data/AGENTS.md +0 -97
data/docs/agents.md +0 -224
data/lib/rubino/jobs/handlers/summarize_session_job.rb +0 -21
data/lib/rubino/tools/summarize_file_tool.rb +0 -194

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: c1debe685b923c625e0dc4dcf95da3c9fc12fcd6c73bdff71e35164279e62b06
-  data.tar.gz: 5451e122fc13bfdd4ffeba0e680cad9fb6b976dfabe8f9dcfd894e5215ac9688
+  metadata.gz: ea93727a0527a270cfbad507d459eb7676532dd4f715967c9d6924cc9aa81c82
+  data.tar.gz: c11bf3ca63ed02a7705447fd65cd58f197939631c1c408a0bb7415b59e333c97
 SHA512:
-  metadata.gz: bf657914d128053ffa39d7911a5c2c12e491ff45b1907e8bc78a694b8f8d7540a1e8d1182ee5831670af17e5e64b16148202d987ebc9e2c75604a88f78148d36
-  data.tar.gz: eefe6fbbcd977bff1cf8b7a189fdaf73daee9ca6b12ca55b82876a99c55ede529dc78ba3f5146f8b34a7e6e6b6b38dab454c9d6cdcf3baca8e754187f49c6829
+  metadata.gz: a0dfb145e9f590745b3cb178768581734109e0ac1d9fb5ec3b10552303e977323b394f96f20af66603d3f921aa180c32c593d77587756755c02195fcce6773e6
+  data.tar.gz: ccaf380590fe7c71c0a3d65f45e949b92ecd2c4f8ee8f00631e46da3b3188910fe49df333431746fc431553d212b6e7c006e4846df22ea9f8c7197f5bf9476c5
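
The `checksums.yaml` hunk above records the SHA256 and SHA512 digests of the two archives packed inside the `.gem` file (`metadata.gz` and `data.tar.gz`). The following is a minimal Ruby sketch of how such a digest could be checked locally. It assumes the archive has already been extracted from the `.gem` tarball, and `archive_digests` and `checksum_matches?` are hypothetical helper names, not part of RubyGems or rubino.

```ruby
require "digest"

# Hypothetical helper: hex digests of one archive, keyed the same way as
# the top-level sections of checksums.yaml. Digest::*.file streams the
# file, so large archives are not loaded into memory at once.
def archive_digests(path)
  {
    "SHA256" => Digest::SHA256.file(path).hexdigest,
    "SHA512" => Digest::SHA512.file(path).hexdigest
  }
end

# Hypothetical helper: compare one recorded digest against the file on disk.
def checksum_matches?(path, algorithm, expected_hex)
  archive_digests(path).fetch(algorithm) == expected_hex
end
```

A call such as `checksum_matches?("data.tar.gz", "SHA256", recorded_value)` would then confirm that a local archive is the one the registry published, where `recorded_value` is copied from the `+` lines of the hunk.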

data/.dockerignore ADDED

@@ -0,0 +1,15 @@
+# Build context trimming — keep the image small and the build fast. The gem is
+# run from source (lib/ + exe/ + Gemfile), so none of the below is needed at
+# build or run time. .git is the big one (~66M) and the gemspec's `git ls-files`
+# simply yields [] without it, which is harmless (we run via exe/rubino, not the
+# packaged file list).
+.git
+.github
+coverage
+tmp
+pkg
+*.gem
+*.log
+.DS_Store
+node_modules
+spec/tmp

data/CHANGELOG.md CHANGED

@@ -1,5 +1,132 @@
 # Changelog
+## [Unreleased]
+## [0.5.2.2] - 2026-07-01
+### Added
+- **Background shells get the same dev UX as background subagents.** A shell
+  started with `run_in_background: true` now appears in the `↓` picker and the
+  live cards alongside subagents, can be FOCUSED (Enter attaches to a cleared
+  view that live-tails its output and lets you type straight to its stdin), and
+  STOPPED with `/stop`. Interactive shells run on a real PTY, so `y/N` prompts,
+  sudo passwords, and tty-aware programs work where a plain pipe couldn't.
+  `probe` (an instant output snapshot — no LLM call), `steer` (→ stdin), the
+  `/agents` list and the `/status` count all treat shells consistently with
+  subagents. Stopping a subagent **cascade-kills the background shells it
+  opened** (shells the user/main agent opened are left running). Ported from
+  Hermes' `ptyprocess`/`process_registry` model.
+- **H1/H2 headings get breathing room** — a blank line above and below big
+  headings so they break the surrounding prose instead of sitting glued to it;
+  H3+ stay compact.
+### Changed
+- **The per-turn wall clock is disabled by default.** `agent.max_turn_seconds`
+  (was 600s) guillotined legitimate multi-file work on slow local models — a real
+  docs-vs-code audit runs dozens of tool calls over 10+ minutes and was
+  force-summarized into a confused non-answer. The default is now `nil`
+  (disabled); the tool-iteration budget (`max_tool_iterations`, 90) is the
+  runaway guard, and per-tool timeouts bound a hung tool. Hermes parity (its
+  IterationBudget has no clock). Set a positive number to re-arm the clock as a
+  backstop.
+- The background picker header reads "background" (not "subagents") now that it
+  lists background shells alongside subagents.
+### Fixed
+- **Truncated subagents are reported as PARTIAL, not "completed".** A background
+  subagent force-summarized at its budget/time cap used to return its partial
+  progress recap as a normal completion, so the parent — and you — got a false
+  success with no real deliverable. The turn's terminal stop reason now flows out
+  of the loop, and the completion notice, the main-timeline marker, and
+  `task_result` all mark a cut-off child **PARTIAL** with a banner telling the
+  parent the delegated work is unfinished — so it recovers (re-delegates or
+  finishes the work itself) instead of trusting a false completion.
+- **A subagent's own `max_turns` budget is honored again.** `explore`'s per-agent
+  cap (20) was silently dropped — the runner passed `nil` for subagents, so the
+  cap never applied. A subagent now honors its cap and, on reaching it, surfaces
+  the budget-extension request (#574) instead of silently force-summarizing.
+- **`rubino update` now reports the new version correctly.** After `gem update`
+  pulled a newer gem, the command read the version via
+  `Gem::Specification.find_by_name`, which returns the spec ACTIVATED in the
+  running process — so it still saw the old version and wrongly printed "rubino is
+  already up to date" even though the update had installed. It now `Gem.refresh`es
+  and reads the HIGHEST installed version (`find_all_by_name(...).max`), so the
+  post-update message reflects what was actually installed.
+## [0.5.2.1] - 2026-06-26
+### Fixed
+- **Symlinked workspace roots broke three path checks.** Several modules compared
+  a symlink-resolved path against a NON-resolved root, so a workspace reached
+  through a symlink (macOS `/etc` → `/private/etc`, `/var` → `/private/var`, or
+  any symlinked checkout) defeated the match:
+  - **SecretPath** — `secret?("/etc/sudoers")` returned `false` and the
+    `~/.ssh`/`~/.aws`/… credential read-gate classified nothing, silently
+    no-op'ing the write-approval gate and read-block for those paths.
+  - **IgnoreRules** — `git rev-parse --show-toplevel` returns the realpath, so
+    the allowed-set rebase dropped *every* file and the whole tree read as
+    git-ignored; `grep`/`glob` then returned nothing under a symlinked checkout.
+  - **Skills::Registry** — an untrusted repo's project-local `.rubino/skills` was
+    not recognised as project-local, so the trust gate failed to drop it (hostile
+    project skills could load in an untrusted directory).
+  All three now resolve both sides of the comparison through `realpath` /
+  `canonical_path`. Defense-in-depth — not security boundaries.
+- **`grep` Ruby fallback now matches dotfiles.** Without ripgrep on PATH, the
+  fallback globbed `**/<include>` without `FNM_DOTMATCH`, so an include like
+  `*.env` never matched `.env`/`.envrc` — exactly the secret-bearing files. The
+  include glob now matches dotfiles, mirroring `rg --glob`.
+## [0.5.2] - 2026-06-26
+### Added
+- **Live formatted-markdown streaming.** The in-flight model stream now renders
+  as formatted markdown while it arrives (Stage 1), painted as atomic frames via
+  DEC-2026 synchronized output so a fast stream never tears mid-update (Stage 2),
+  with committed code blocks syntax-highlighted through Rouge (Stage 3). (#592,
+  #593, #594)
+- **Leaked tool-call recovery.** Models that emit a tool call as plain text or
+  garbled XML/JSON markup instead of a structured call (MiniMax-M3 and other
+  tool-loop models) now have those calls re-parsed into real `tool_calls` at the
+  transport layer so they actually execute, including a garbled `<invoke">`
+  variant.
+- **`write` content preview.** The `write` tool box now shows a preview of the
+  content being written.
+### Changed
+- Raise the default `max_tool_iterations` from 25 to 90 (Hermes-aligned), so long
+  tool-driven turns no longer hit the ceiling mid-task.
+- Teach the agent (via the build prompt) to read the compressed tool-output
+  markers introduced in 0.5.1.
+### Fixed
+- Render any unterminated code fence as a code box, matching CommonMark's
+  end-of-file fence auto-close, instead of leaking the raw backticks. (#595)
+- Merge consecutive same-role messages on the Anthropic-family wire so the
+  request shape stays valid. (#597)
+- Give MiniMax its full output ceiling so a long thinking block no longer starves
+  the visible output (a root cause of heavy-turn "invalid params" death).
+- Keep a 5xx-wrapped "invalid params" response on the retryable path.
+- Multi-line `ask()` prompts no longer erase terminal scrollback.
+- Exclude synthetic `[harness control]` injections from the rewind picker.
+- Fix an installed-gem launch crash (`uninitialized constant Rubino::TAGLINE`).
+### Removed
+- Drop the dead `server.*` config section, the orphaned `ask_parent` takeover and
+  ask/reply substrate, and dead code surfaced by the post-removal audit.
+### Docs
+- Mark native OAuth as not wired end-to-end (WIP).
 ## [0.5.1] - 2026-06-25
 ### Added
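
The 0.5.2.1 entry in the changelog hunk above describes two path-handling fixes: comparing both sides of a containment check through `realpath`, and letting the `grep` fallback glob match dotfiles. The sketch below illustrates both ideas under stated assumptions. It is not the gem's code; `inside_root?` and `include_glob` are hypothetical names, and only standard-library calls (`File.realpath`, `Dir.glob` with `File::FNM_DOTMATCH`) are used.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: true when `path` lies inside `root`. BOTH sides are
# resolved through File.realpath, so a root reached via a symlink still matches.
def inside_root?(path, root)
  real_path = File.realpath(path)
  real_root = File.realpath(root)
  real_path == real_root || real_path.start_with?(real_root + File::SEPARATOR)
end

# Hypothetical helper: expand an include pattern under `root`.
# FNM_DOTMATCH lets `*` match a leading dot, so "*.env" also finds ".env".
def include_glob(root, include)
  Dir.glob(File.join("**", include), File::FNM_DOTMATCH, base: root).sort
end

Dir.mktmpdir do |dir|
  real = File.join(dir, "checkout")
  link = File.join(dir, "link") # symlinked workspace root
  FileUtils.mkdir_p(real)
  File.symlink(real, link)
  File.write(File.join(real, ".env"), "TOKEN=x\n")
  File.write(File.join(real, "prod.env"), "TOKEN=y\n")

  target = File.join(link, ".env")
  # One-sided comparison: resolved path against the NON-resolved root.
  naive = File.realpath(target).start_with?(link + File::SEPARATOR)
  puts "naive check:    #{naive}"
  puts "resolved check: #{inside_root?(target, link)}"
  puts "include *.env:  #{include_glob(real, '*.env').inspect}"
end
```

The one-sided comparison is the failure mode the changelog names: the resolved path no longer starts with the symlinked root, so the check misses a file that is in fact inside the workspace.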

data/Dockerfile ADDED

@@ -0,0 +1,56 @@
+# Ubuntu-based image that runs rubino-agent FROM SOURCE, for manual testing.
+#
+#   Build:  docker build -t rubino:latest .
+#   Run:    docker run --rm -it \
+#             -v "$PWD":/work \                  # the dir the agent works on
+#             -v "$HOME/.rubino":/root/.rubino \ # reuse your host config + keys
+#             rubino:latest rubino
+#
+# The agent is launched via the `rubino` wrapper from any cwd; mount your project
+# at /work. Secrets are NEVER baked in — provide them by mounting ~/.rubino or
+# passing *_API_KEY env vars (-e RUBINO_API_KEY=...).
+FROM ubuntu:24.04
+ENV DEBIAN_FRONTEND=noninteractive \
+    TERM=xterm-256color \
+    LANG=C.UTF-8 \
+    LC_ALL=C.UTF-8
+# Two groups: (1) runtime tools the agent shells out to — git, ripgrep, sqlite3,
+# tmux, curl, less, procps; (2) the toolchain ruby-build (via mise) needs to
+# COMPILE Ruby 3.3.3 and the native gems (nokogiri / ffi / sqlite3).
+RUN apt-get update && apt-get install -y --no-install-recommends \
+      ca-certificates curl git less procps tmux ripgrep sqlite3 \
+      build-essential autoconf bison \
+      libssl-dev libyaml-dev libreadline-dev zlib1g-dev \
+      libncurses-dev libffi-dev libgdbm-dev libsqlite3-dev \
+    && rm -rf /var/lib/apt/lists/*
+# Ruby 3.3.3 via mise — Ubuntu's apt ships only 3.2, but the repo pins 3.3.3
+# (.ruby-version), so we compile the exact version. Put the install's bin dir
+# straight on PATH (no shims) so ruby/gem/bundler resolve deterministically.
+ENV MISE_DATA_DIR=/opt/mise \
+    PATH=/opt/mise/installs/ruby/3.3.3/bin:/usr/local/bin:$PATH
+RUN curl -fsSL https://mise.run | MISE_INSTALL_PATH=/usr/local/bin/mise sh \
+    && mise install ruby@3.3.3 \
+    && gem install bundler -v 4.0.12
+WORKDIR /app
+COPY . /app
+# The lockfile is resolved on macOS (arm64-darwin); add the Linux platforms so
+# `bundle install` stays in lockstep with the pinned versions instead of
+# re-resolving, then install.
+RUN bundle lock --add-platform x86_64-linux aarch64-linux \
+    && bundle install
+# Run rubino from the source checkout WITHOUT changing the caller's cwd, so the
+# agent operates on the mounted /work dir, not /app. (A source checkout has no
+# working binstub; this wrapper replaces it.)
+RUN printf '#!/usr/bin/env bash\nexport BUNDLE_GEMFILE=/app/Gemfile\nexec bundle exec /app/exe/rubino "$@"\n' \
+      > /usr/local/bin/rubino \
+    && chmod +x /usr/local/bin/rubino
+ENV RUBINO_HOME=/root/.rubino
+RUN mkdir -p /root/.rubino /work
+WORKDIR /work
+CMD ["bash"]

data/agent.md ADDED

@@ -0,0 +1,112 @@
+# agent.md — session state & decisions (handoff)
+> Working handoff for the next agent. NOT the project guide — that's `AGENTS.md`.
+> Branch: `test/pre-release-gate`.
+## How this is run (no Docker)
+- rubino runs from THIS checkout via `~/.local/bin/rubino-dev` (forces ruby
+  3.4.7, `BUNDLE_GEMFILE` = this repo, does NOT change cwd → operates on the dir
+  you invoke it from). Works anywhere on the machine; whatever is checked out
+  here is what runs — no reinstall. The installed gem `0.4.0` is the OLD fallback
+  WITHOUT our fixes — always verify with `rubino-dev`.
+- LLM backend: local OpenAI-compatible server on `127.0.0.1:8000` = **`ds4-serve`**
+  (DeepSeek: `deepseek-v4-flash`, `deepseek-v4-pro`). It **STREAMS tool-call
+  argument deltas** (a file write is on the wire as it generates) AND round-trips
+  `reasoning_content`. Config: `~/.rubino/config.yml` → `model.default:
+  deepseek-v4-flash`, provider `gateway` (openai_compatible, base_url
+  `127.0.0.1:8000/v1`).
+  - ⚠️ **ds4-server is a SINGLE KV slot and has no auto-restart; it CRASHES under
+    large generations.** If a "freeze" reappears, FIRST check it's still up:
+    `curl -s 127.0.0.1:8000/v1/models`. Its log is `/tmp/ds4-server.log` — the
+    single best diagnostic (see below).
+- Verify TUI behavior in a REAL terminal (offline PTY/pyte capture misses
+  raw-mode defects). The fastest objective probe is a PTY driver that timestamps
+  stdout (scratchpad/pty_*.rb in past sessions) + tailing `/tmp/ds4-server.log`.
+## The local-performance work — DONE + validated live (this branch)
+The user's "freeze on the local config" was THREE distinct bugs. All fixed and
+verified live against ds4 (see commits on this branch). Read
+`~/.claude/.../memory/reference_rubino_kv_cache_bust_rootcause.md` +
+`project_rubino_kv_cache_fix.md` for the full diagnosis.
+1. **Cross-turn freeze = KV prefix-cache busting (#608b/#608c).** ds4 only reuses
+   cache on a PURE prefix-extension (`common==live`); any divergence → full
+   `ctx=0..N` re-prefill (grows with context = "freeze after N turns").
+   - **Reasoning replay:** rubino dropped the assistant's `reasoning_content`, so
+     the replay diverged from the server's KV where reasoning was generated.
+     Now persisted (`metadata[:reasoning]`) and replayed as wire
+     `reasoning_content` (Hermes conversation_loop.py:940 parity). Bug found:
+     `extract_thinking` read `response.reasoning` (nonexistent) not `.thinking`.
+     Files: loop.rb, ruby_llm_adapter.rb (normalize_intermediate/rebuild_thinking/
+     load_history), session/message.rb. Effect: turn 2+ 21s → 0.6s.
+   - **Aux off-slot gate:** post-turn memory-extraction/distill ran a divergent
+     no-tools prompt on the SAME slot every turn → evicted the main KV. Now
+     SKIPPED on the interactive REPL when the aux task resolves to the main
+     endpoint (`Configuration#auxiliary_on_main_endpoint?`); extraction happens at
+     session-end flush + compaction (no recall lost — the per-session memory
+     snapshot is frozen anyway). `interactive` flag threaded build_runner(default
+     true)→setup_oneshot(false)→Runner→Lifecycle. A DISTINCT aux endpoint keeps
+     the inter-turn cadence. Files: configuration.rb, lifecycle.rb, runner.rb,
+     chat_command.rb.
+2. **Large-write freeze = dead UI past the preview cap (#608d).** ds4 streams a
+   big `write`'s args for minutes; after the 30-line preview cap `tool_chunk`
+   stopped emitting and the facet was hidden → ~38s of dead screen. Fix (cli.rb):
+   `tool_params_feed`/`tool_chunk` stream the params IN FULL (`full: true`, no
+   cap — the user watches the file land; the cap stays for tool OUTPUT) + an
+   animated facet during arg streaming. Verified: full content shown, UI silence
+   38s → 6.3s. NB the deltas are INCREMENTAL (not cumulative — no O(N²)); the
+   model is just genuinely slow (~17-22 t/s, degrading) for big files.
+3. **`ctx` gauge frozen during a run (#608e).** The bar repainted only at turn
+   boundaries and read persisted messages. Fix: `chat_command#live_status_meter`
+   captures the base once → cheap no-DB lambda on `ui.live_status_provider`; the
+   cli ticker (`refresh_live_ctx_bar`, ~1/s) feeds it `@turn_tok_chars/4` and
+   repaints. `build_status_line` refactored to share `render_status_bar`.
+   Verified: ctx climbs 0k→1.2k during a write.
+### Diagnostic playbook (reuse this)
+- `/tmp/ds4-server.log`: `live kv cache miss … common=N reason=token-mismatch`
+  then `chat ctx=0..N:N prompt done <s>` = a full re-prefill. `common==live` +
+  `ctx=K..N:small done 0.6s` = a cache HIT (what you want). Within-turn tool
+  iterations already hit; the bug is at turn boundaries.
+- Repro multi-turn: `rubino-dev -q "…" --yolo` then `-c -q "…"` (one-shot
+  continues a session) OR a PTY driver for true interactive multi-turn (the gate
+  is REPL-only, so one-shot won't show the aux-eviction fix). `RUBINO_HOME=<tmp>`
+  + a sed'd config isolates variables. A logging TCP proxy (:8999→:8000) captures
+  request/response bodies to confirm delta-vs-cumulative and reasoning replay.
+## Other uncommitted history folded into this branch's tip
+- **Streaming tool-call params UX (#608):** `lib/rubino/ui/tool_args_stream.rb`
+  (single-pass JSON streaming decoder, surfaces string VALUES), adapter
+  `announce_tool_stream` (emits `:tool_preparing` + `:tool_args`), `api.rb`/`cli.rb`
+  sinks, byte-batching in `stdout_proxy.rb`/`cli.rb`. The token/speed footer plan
+  is now partly realized by the live `ctx` gauge (#608e); a tok/s readout is still
+  open.
+- **Per-turn SummarizeSessionJob removed:** the running summary is produced ONLY
+  by threshold-gated compaction (Hermes/Claude-Code parity), never a background
+  job every turn. Handler deleted; `auto_summarize` config + `memory_auto_summarize?`
+  gone. (This ALSO removed one of the per-turn aux-LLM calls — aligned with #608c.)
+## Backlog / NOT done
+- **Fix C (prompt normalization + volatile-to-tail):** NOT needed (cache hits
+  already land without it; the frozen snapshot keeps volatile_tail stable).
+  Offered as optional strict-Hermes parity hardening only.
+- **Large files are genuinely slow on ds4** (model throughput, not rubino). The
+  UI now shows progress; making it FASTER is behavioral (steer toward edits /
+  smaller writes) — not yet done.
+- **tok/s readout** in the footer (the other half of the #608 token-footer plan).
+- 5 PRE-EXISTING host rspec failures (ruby_tool load-path, fresh_home_db schema)
+  — NOT regressions; see `reference_rubino_host_rspec_env_failures`.
+## Constraints / protocol
+- Clean code / DRY; refactor when it keeps things clean.
+- Repo/commit/PR content in English. NO co-author / "Generated with" trailers.
+- Verify with `rubino-dev` against live ds4 before claiming a TUI fix works
+  (offline render misses raw-mode defects).
+## Test status
+Full suite green except the 5 pre-existing host failures above
+(`6376 examples, 5 failures, 8 pending`). Rubocop clean on all touched files.
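
The `agent.md` hunk above attributes the cross-turn freeze to prefix-cache busting: the server reuses its KV cache only when the new request is a pure extension of what it already holds (`common==live`). The toy Ruby model below illustrates that condition. It is an assumption-laden sketch: the arrays of strings stand in for token sequences, and `common_prefix_length` and `cache_hit?` are hypothetical names, not functions of rubino or of the inference server.

```ruby
# Hypothetical helper: number of leading items two sequences share.
def common_prefix_length(cached, incoming)
  n = 0
  n += 1 while n < cached.length && n < incoming.length && cached[n] == incoming[n]
  n
end

# Hypothetical helper: a hit only when the incoming request extends the
# cached sequence without diverging anywhere inside it.
def cache_hit?(cached, incoming)
  common_prefix_length(cached, incoming) == cached.length
end

# What the server holds after turn 1 (stand-in tokens, not real ones).
cached = %w[system user_1 reasoning_1 answer_1]

# Replay that dropped the reasoning diverges at index 2.
without_reasoning = %w[system user_1 answer_1 user_2]
# Replay that keeps the reasoning is a pure extension.
with_reasoning = %w[system user_1 reasoning_1 answer_1 user_2]

puts cache_hit?(cached, without_reasoning)
puts cache_hit?(cached, with_reasoning)
```

In this model, replaying the stored reasoning keeps the second request a strict extension of the first, which is the condition the hunk's diagnostic playbook describes as a cache hit.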

data/docs/api/v1.md CHANGED

@@ -350,6 +350,8 @@ Cooperative cancel of a running task (descendant ask-gates are cancelled too). R
 ## OAuth
+> **WIP — not wired end-to-end.** These endpoints work and store encrypted tokens, but **no tool consumes a connection's token yet** and there is no CLI surface. See the status banner in [`docs/oauth-providers.md`](../oauth-providers.md) (and issue #590: native vs MCP-delegated).
 See [`docs/oauth-providers.md`](../oauth-providers.md) for the full PKCE flow, encryption key requirements, and per-provider setup. The HTTP surface:
 ### `GET /v1/oauth/providers` → 200

data/docs/commands.md CHANGED

@@ -179,7 +179,6 @@ Type these inside `rubino chat`. Generated from `BuiltIns::DESCRIPTIONS` (drift-
 | `/agent` | Switch the primary agent (/agent <name>; a bare /<name> or Tab cycles) |
 | `/agents` | List background subagents; ↓+Enter to attach & steer one live, or steer/probe/view by id |
 | `/tasks` | Alias for /agents |
-| `/reply` | Answer a subagent that is blocked waiting on you (e.g. an approval) |
 | `/stop` | Stop a running subagent (/stop <id>; alias for /agents <id> --stop) |
 | `/jobs` | List the background job queue (status counts); /jobs <id> for detail |
 | `/skills` | List skills; activate one ('none' clears), or enable/disable NAME |
@@ -317,7 +316,7 @@ Read (and set) configuration without leaving the REPL, over the same **effective
 Gets resolve default-valued keys (not just what's in the file), and secret-named keys (`api_key`, tokens, …) render masked — exactly like `rubino config show`. A set writes through `Config::Writer` (the same persist path `/reasoning` and `/think` use) **and** updates the live configuration, so it survives the session and applies from the next turn; consumers that memoize their config (e.g. the memory backend) still need a restart. Typing `/config ` opens a dropdown with the verbs plus the known config keys flattened from the defaults tree; after `get`/`set` the keys complete again.
-### Background subagents: `/agents` and `/reply`
+### Background subagents: `/agents`
 The agent spawns background subagents with its `task` tool; these commands are the human surface over them (full model in [agents.md](agents.md)):
@@ -327,8 +326,6 @@ The agent spawns background subagents with its `task` tool; these commands are t
 /agents <id> --stop           # cancel a running subagent (blocked descendants unwind too)
 /agents <id> steer "note"     # park a note folded into the child's context at its next turn
 /agents <id> probe "question" # ephemeral read-only peek — nothing is saved to the child
-/reply <id> <answer>          # answer a subagent blocked on you (e.g. an approval)
-/reply                        # bare: list the subagents currently blocked on you
 ```
 `/tasks` is an alias for `/agents`.
@@ -337,9 +334,9 @@ The agent spawns background subagents with its `task` tool; these commands are t
 idle prompt to open the subagent picker, arrow to one, and `Enter` to **attach**:
 the screen switches to that agent's own full timeline (its tool calls and what it
 said, replayed) and the prompt becomes scoped — `sa_xxxx ❯`. While attached, just
-type to steer the running child (or answer it if it's blocked on you); `←` on the
+type to steer the running child; `←` on the
 empty prompt (or `/detach`) returns to the main timeline. The scoped prompt makes
-the global `/agents <id> steer/probe` and `/reply <id>` forms redundant — they're
+the global `/agents <id> steer/probe` forms redundant — they're
 the same operations, by id, from anywhere.
 ### Workspace roots: `/add-dir` and `/dirs`

data/docs/configuration.md CHANGED

@@ -227,6 +227,9 @@ display:
   statusbar: true        # the model + context bar under the chat input
   tool_output_preview_lines: 3  # head lines of tool output shown in the transcript (0 = full dump)
   input_max_rows: 8      # chat input grows up to this many rows, then scrolls
+  live_markdown: true    # format the in-flight streamed block live (false = raw live tail)
+  synchronized_output: true  # atomic frames via DEC-2026 BSU/ESU (false = legacy per-write frames)
+  code_highlight: true       # syntax-highlight committed code blocks (Rouge); false = plain
 paste:
   collapse_lines: 5            # pastes longer than this collapse to a placeholder
@@ -246,6 +249,10 @@ context:
 - `display.statusbar` (default `true`) pins a dim one-line bar UNDER the chat input — the session mode first (plus the branch/skill tokens when set), then the resolved model id and context saturation, e.g. `default · MiniMax-M3 · ctx ~8.4k/64k (13%)` (the percentage is omitted below 1%). The mode token is the live mode indicator (the prompt itself is a constant `▍❯ `): dim `default`, yellow `plan`, red `yolo`. Saturation uses the REAL usage the provider reported for the last response when available (the full assembled prompt, recorded by the agent loop), else the same chars/4 estimate compaction runs on (`Context::TokenBudget`); the window comes from `model.context_length` / `context.max_tokens`. It refreshes at turn boundaries (after each turn footer, and on session resume), never per stream delta. The percentage turns yellow at 70% and red at 90%; with no usable window only the token count shows. The bar is omitted off a TTY or on terminals narrower than 40 columns.
 - `display.tool_output_preview_lines` (default `3`) caps how many head lines of each tool's output the transcript shows before a dim `… +N lines (full output → context)` marker. DISPLAY-ONLY: the model always receives the full output (subject to the `tool_output` truncation caps) — only the scrollback rendering collapses. Set `0` to restore the old full dump.
 - `display.input_max_rows` (default `8`) caps how many visual rows the chat input grows to as a long or multi-line prompt wraps; past the cap the input scrolls vertically, keeping the caret row in view.
+- `display.live_markdown` (default `true`) renders the still-streaming (in-flight) block as FORMATTED markdown in the live region — bold, headings, lists and code style as the tokens arrive, with syntax left open by the partial stream repaired (an open code fence shows as a code block, a dangling `**`/`` ` `` span is closed) so no raw marker leaks. Set `false` for the legacy raw rolling-tail that only snaps to styled when the block commits. Display-only; the committed scrollback render is identical either way.
+- `display.synchronized_output` (default `true`) wraps each live-region frame in DEC private mode 2026 (BSU/ESU synchronized output) so a supporting terminal (kitty, WezTerm, tmux ≥3.4, recent xterm.js) buffers the whole clear→commit→redraw sequence and swaps it in one atomic update — no flicker or tearing on multi-step repaints. Terminals without support silently ignore the mode (it degrades cleanly); the escapes are emitted only to a real TTY. Set `false` for the legacy per-write frames.
+- `display.code_highlight` (default `true`) syntax-highlights fenced code blocks by language (via Rouge) in the COMMITTED render — the live tail stays unstyled, so highlighting never blocks the stream (code shows instantly, colours arrive a beat later when the block commits, like Claude Code). Unknown languages, language-less fences, and any failure fall back to the plain code body. Set `false` for plain (uncoloured) code blocks.
+- An **unterminated** code fence at end-of-stream — a fence the model never closed, or closed with a too-short bare run of backticks (e.g. MiniMax-M3 emitting `` against a ``` opener) — is rendered as a code box, matching CommonMark's end-of-document auto-close (§4.5) that every other renderer relies on. The CLI synthesises the close at the opener length (never relaxing the "close ≥ opener" rule), because kramdown does not auto-close an open fence.
 - `paste.collapse_lines` (default `5`) — the file-backed paste pipeline's first tier. Pasting MORE than this many lines into the chat input inserts a single cyan `[Pasted text #N +M lines]` placeholder instead of flooding the composer; the placeholder is one editable token (backspace deletes it whole, you can type around it, it survives ↑ draft recall and Alt+Enter queueing) and expands to the full pasted body when the message is sent — the model sees everything, while the transcript echo keeps the compact placeholder. Pastes at or under the threshold inline as real rows, exactly as before.
 - `paste.file_threshold_tokens` (default `8000`) — the second tier. A paste estimated above this many tokens (chars/4, the same rule compaction uses) is written to `<RUBINO_HOME>/sessions/<session-id>/paste_N.txt` instead of being held inline, and the sent message carries `[Pasted text #N saved to <path> — too large to inline; read it with the read tool]` so the model reads just the parts it needs. The files persist for the session; `/clear-images` does not touch them (it only drops staged image attachments).
@@ -303,7 +310,6 @@ tasks:
   max_children_per_node: 3       # max LIVE direct children per node
   max_concurrent_total: 8        # hard ceiling on total LIVE subagents across the tree
   max_live_probes_per_child: 5   # per-child budget for billed live probes (probe(live: true))
-  ask_parent_timeout: 900        # vestigial: governed the removed child→parent ask channel; no effect now
 ```
 ### tools
@@ -657,13 +663,14 @@ agents:
     mcp_servers: []
 ```
-### server / api
+### api
-```yaml
-server:
-  port: 4820
-  auth: false
+The API server's listen port and bind host come from the CLI, not config:
+`rubino server --port <n>` (or `RUBINO_API_PORT`, default `4820`) and `--host`
+(or `RUBINO_API_HOST`). The bearer token is `RUBINO_API_KEY`. The `api` block
+configures payload caps, rate limiting, and the public-bind gate:
+```yaml
 api:
   max_body_bytes: 5242880        # 5 MB cap on JSON request bodies (413 past this)
   max_upload_bytes: 52428800     # 50 MB cap on multipart uploads

data/docs/design/bg-shell-pty-port.md ADDED

@@ -0,0 +1,88 @@
+# Porting Hermes' interactive PTY shell to rubino
+Status: DESIGN (deep-study of Hermes, no impl yet) · Branch: `feat/bg-shell-ux`
+Source studied: `hermes-agent/tools/process_registry.py`, `hermes-agent/tools/terminal_tool.py`,
+`hermes-agent/hermes_cli/pty_bridge.py`.
+## Why PTY (the corrected conclusion)
+A pipe-backed background shell has `stdin=DEVNULL` and can't answer `y/N`, sudo passwords,
+or run TTY-aware/curses programs. Hermes (and Codex `unified_exec`, and the open Claude
+Code FR) all converge on a **PTY**: the process believes it's on a real terminal, and the
+user's keystrokes/answers are written to the PTY master. We follow Hermes.
+## Hermes' model (the algorithm we port, with refs)
+1. **Spawn.** `ProcessRegistry.spawn_local(use_pty=True)` (`process_registry.py:515`) →
+   `ptyprocess.PtyProcess.spawn(cmd, env, ...)` (`:553`); the handle is stored on
+   `ProcessSession._pty` (`:134`). Pipe fallback when ptyprocess is absent. Pipe mode is
+   `stdin=DEVNULL` (`:605`) — deliberately non-interactive.
+2. **Output reader.** `_pty_reader_loop` (`:814`) calls `pty.read(4096)` until `pty.isalive()` is
+   false; captures `exitstatus`. Feeds `_check_watch_patterns` on each chunk (`:748/784/828`).
+3. **Input primitives.** `write_stdin(id, data)` (`:1184`) → `_pty.write(bytes)`;
+   `submit_stdin(id, data="")` = `write_stdin(data + "\n")` (press Enter, `:1209`);
+   `close_stdin(id)` = EOF without kill (`:1213`).
+4. **Interactive prompt routing.** Thread-local UI callbacks: `set_sudo_password_callback`
+   / `set_approval_callback` (`terminal_tool.py:189-205`). When unset, fall back to
+   `/dev/tty` / `input()`. The CLI registers them so prompts run through the TUI event loop.
+5. **Sudo password.** Detect `sudo` (`_rewrite_real_sudo_invocations`, `:501`), prompt the
+   user with HIDDEN input ("input is hidden", `:404`), cache per scope
+   (`_sudo_password_cache`, scope = session-key / callback-owner / thread, `:205-240`), feed
+   it to the process. Cache cleared on teardown (`_reset_cached_sudo_passwords`).
+6. **Watch patterns.** Regexes scan new output to detect notable lines/prompts; after
+   `WATCH_STRIKE_LIMIT` (3) misses, disable + promote to `notify_on_complete` (`:191-288`).
+## rubino mapping (DRY, faithful, clean-code)
+rubino already has the skeleton: `Tools::ShellRegistry` (pgid tracking + kill),
+`shell_tool` (background spawn), `shell_input`/`shell_output`/`shell_tail`/`shell_kill`.
+Today it is **pipe-only**. The port adds a PTY mode alongside.
+| Hermes | rubino target |
+|--------|---------------|
+| `ptyprocess.PtyProcess.spawn` | Ruby stdlib **`PTY.spawn`** (`require "pty"`) — returns `[reader_io, writer_io, pid]` |
+| `ProcessSession._pty` | a `pty_master`/`pty_pid` field on `ShellRegistry::Entry` (`shell_registry.rb:31`) |
+| `_pty_reader_loop` | the existing `drain_into` reader, reading the PTY master instead of the pipe |
+| `write_stdin/submit_stdin/close_stdin` | extend `ShellRegistry.write_input` + add `submit_input` (`+"\n"`) and `close_input` |
+| sudo/approval UI callbacks | **reuse rubino's existing prompt UI** (`UI::CLI#confirm` / the `question` tool / approval menu) — register a thread/fiber-local "shell input needed" callback that surfaces a masked prompt |
+| `_sudo_password_cache` (per scope) | a per-session masked-secret cache (scope = session id), cleared on teardown; mask in scrollback via the existing `SecretsMask` |
+| watch_patterns | OPTIONAL (slice 3) — a prompt detector (`password:`, `[y/N]`) to auto-surface input without the user attaching |
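The first four rows of the mapping can be sketched with Ruby's stdlib `PTY`. This is a minimal sketch, not the rubino implementation: `PtySession` and its method bodies are hypothetical, and only `PTY.spawn`, `write_input`, and `submit_input` are names taken from the table.

```ruby
require "pty"

# Hypothetical wrapper; rubino's ShellRegistry internals are not shown in this diff.
class PtySession
  attr_reader :pid, :output

  def initialize(cmd)
    @output = +""
    # PTY.spawn returns [reader_io, writer_io, pid] when called without a block.
    @reader, @writer, @pid = PTY.spawn(cmd)
    @thread = Thread.new { drain }
  end

  # Raw write to the PTY, as if the user had typed the bytes.
  def write_input(data)
    @writer.write(data)
    @writer.flush
  end

  # Write a line and press Enter.
  def submit_input(data = "")
    write_input(data + "\n")
  end

  # Block until the reader thread finishes, or the timeout elapses.
  def wait(timeout = 5)
    @thread.join(timeout)
    self
  end

  private

  # Read until the child goes away. On Linux the read fails with Errno::EIO
  # once the child side of the PTY is closed; other platforms may report EOF.
  def drain
    loop { @output << @reader.readpartial(4096) }
  rescue EOFError, Errno::EIO
    # child exited; nothing more to read
  ensure
    begin
      Process.wait(@pid)
    rescue Errno::ECHILD
      # already reaped
    end
  end
end
```

Usage, under the same assumptions: `PtySession.new("sh -c 'read x; echo got:$x'")` followed by `submit_input("y")` answers the prompt. The captured output also contains the echoed `y`, because the PTY echoes input by default.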
+### How the USER provides the `y` / password (the goal)
+Two complementary paths, both writing to the same `write_input` PTY primitive:
+- **Attach-and-type (the focus view).** When attached to the shell (the bg-shell-as-
+  `BackgroundTasks`-entry from `bg-shell-ux.md`), your keystrokes/lines route to
+  `ShellRegistry.write_input(id, ...)` → the PTY. You see `[y/N]`, type `y`, it goes in.
+- **Detect-and-prompt (Hermes sudo path, no attach needed).** A registered callback +
+  a prompt detector surface a masked/normal prompt inline ("the shell wants input:
+  `Password:`"); your answer is written to the PTY. Reuses the `question`/approval UI.
+## Stages (each clean-code, spec'd, tmux-verified before the next)
+- **Slice 0 — PTY foundation.** `ShellRegistry` gains a PTY mode (`PTY.spawn`), the reader
+  reads the master, `write_input`/`submit_input`/`close_input` work over the PTY. The model
+  tools (`shell_input`) already call `write_input`, so the agent can drive an interactive
+  bg process. Verify in tmux: `python3 -c "print(input('name? '))"` in bg, `shell_input`
+  "x\n", output shows it. (No user-facing UI yet.)
+- **Slice 1 — SEE + STOP + FOCUS** (from `bg-shell-ux.md`): shell as a `BackgroundTasks`
+  `kind: :shell` entry → card + picker + `/stop` + attach view (clear + live PTY tail).
+- **Slice 2 — USER types in focus.** Attached keystrokes/lines → `write_input` (the PTY).
+  Now you answer `y` yourself in the focus view.
+- **Slice 3 — detect-and-prompt + sudo masked.** Prompt detector + masked password +
+  per-session cache, reusing the `question`/approval UI. The Hermes sudo flow.
+## Gotchas (from the source)
+- `PTY.spawn` makes the child a session leader (PID == PGID), which matches rubino's existing
+  pgid hard-kill. But EOF/`Errno::EIO` from the PTY master on child exit must be caught in the
+  reader (`PTY::ChildExited` is raised only by `PTY.check(pid, true)`, not by reads).
+- Terminal size: a PTY needs a winsize (`TIOCSWINSZ`); set a sane default (e.g. 120 columns x 40 rows) or
+  the attached terminal's size; resize on attach.
+- Masking: the sudo/secret path MUST run answers through `SecretsMask` so the password
+  never lands in scrollback or the output buffer (Hermes hides it; rubino has `SecretsMask`).
+- Don't break the pipe path: keep pipe mode as the default for non-interactive bg work;
+  PTY mode is opt-in (a `pty: true`/`interactive: true` arg, or auto when a prompt is likely).
+- Output: a PTY echoes input back + emits control sequences; the output buffer/tail must
+  strip/normalize (rubino has `ansi_strip`-equivalent? confirm) so the model/user see clean text.
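The terminal-size gotcha can be illustrated with `io/console`, which exposes the window size as `[rows, cols]`. A minimal sketch, assuming the 120-column by 40-row default mentioned above; `apply_winsize` is a hypothetical helper, and setting the size through the master end is assumed to work on the target platform (it does on Linux).

```ruby
require "pty"
require "io/console"

# Assumed default from the text: 120 columns by 40 rows.
DEFAULT_ROWS = 40
DEFAULT_COLS = 120

# Hypothetical helper: set the window size on a PTY and read it back.
# IO#winsize= takes [rows, cols] and performs the TIOCSWINSZ ioctl.
def apply_winsize(pty_io, rows: DEFAULT_ROWS, cols: DEFAULT_COLS)
  pty_io.winsize = [rows, cols]
  pty_io.winsize
end

# Demo on a bare PTY pair (no child process needed).
master, _slave = PTY.open
size = apply_winsize(master)
```

On attach, the same call could be repeated with the attached terminal's own `winsize` so full-screen programs redraw at the right dimensions.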

data/docs/design/bg-shell-review-refinements.md ADDED

@@ -0,0 +1,65 @@
+# Review-driven refinements (Slice 1 design lock-in)
+An adversarial clean-code review of Slice 0 + the bridge plan produced these changes.
+They supersede the "thin adapters" framing in `bg-shell-ux.md` where they conflict.
+## Slice 0 fixes already applied (from review)
+- **close_stdin crash-on-retire (BLOCKER):** EOT only a LIVE child; a dead PTY master
+  raises `Errno::EIO`, so close the fd instead (also reclaiming a leaked `master_w`).
+  Rescue widened to `IOError, Errno::EIO, Errno::EBADF`.
+- **spawn_pty cwd fragility:** `cd … || exit 127\n<cmd>` (own line) — not `cd && (<cmd>)`,
+  which a trailing `#`-comment broke.
+- **winsize:** default `40x120` (a fresh PTY is 0x0).
+- Honest comments (EOT is canonical-mode-only; dropped the dead `PTY::ChildExited` catch).
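The cwd fix in the list above is easy to see in isolation. With `cd dir && (<cmd>)`, a command ending in a `#` comment swallows the closing parenthesis; putting the command on its own line avoids that. A sketch with a hypothetical helper name (`script_with_cwd`); the real `spawn_pty` is not shown in this diff.

```ruby
require "shellwords"

# Hypothetical helper: prefix a command with a cwd change.
# The command goes on its own line, so a trailing "# comment" in cmd cannot
# comment out any wrapper syntax. A failed cd exits 127 before cmd runs.
def script_with_cwd(cwd, cmd)
  "cd #{Shellwords.escape(cwd)} || exit 127\n#{cmd}"
end
```

For example, `script_with_cwd("/tmp", "ls # list files")` yields a two-line script that still parses, whereas `cd /tmp && (ls # list files)` is a shell syntax error.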
+## Still-open Slice 0/2 prerequisites (do BEFORE wiring write_input/attach)
+- **PTY echo:** a cooked PTY echoes typed input back into the buffer → doubled text, and a
+  typed password would land in the ring buffer in cleartext. Before the user/agent writes
+  to a PTY: turn `ECHO` off via `io/console` for the secret path, and/or strip the echoed
+  line at the capture seam. Mask through `SecretsMask`.
+- **Control sequences:** a PTY emits `\r\n` + CSI/OSC. `drain_into` only `scrub_utf8`s.
+  For PTY mode, normalize at the capture seam (strip CR, strip non-SGR CSI/OSC) so the
+  model isn't fed escapes and the attach view doesn't paint raw escapes (route the attach
+  renderer through the same `sanitize_terminal_keep_sgr` the cards use — CWE-150).
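The control-sequence normalization described above (strip CR, strip non-SGR CSI and OSC, keep colours) could look roughly like this. The regexes and the name `normalize_pty_output` are assumptions for illustration; rubino's own `sanitize_terminal_keep_sgr` is not shown in this diff and may differ.

```ruby
# Assumed patterns, not rubino's implementation.
OSC = /\e\][^\a\e]*(?:\a|\e\\)/   # OSC ... terminated by BEL or ST (ESC \)
CSI = /\e\[[0-?]*[ -\/]*[@-~]/    # any CSI: params, intermediates, final byte
SGR = /\A\e\[[0-9;]*m\z/          # colour/style sequences, which are kept

# Strip OSC, drop every CSI sequence that is not SGR, and remove carriage returns.
def normalize_pty_output(text)
  text
    .gsub(OSC, "")
    .gsub(CSI) { |seq| seq.match?(SGR) ? seq : "" }
    .delete("\r")
end
```

Applying this once at the capture seam would mean both the model-facing buffer and the attach view see the same cleaned text.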
+## DRY: the `kind:` discriminator is a DATA TAG, not a control switch
+Review verdict: a `case kind` would spray across ≥7 sites (stop_entry, attach view,
+attached-input, cards, menu, watch, completion) — a smell. Instead:
+1. Give the shell's `BackgroundTasks` entry the **same flat fields** the renderers already
+   read (`prompt`=command, `started_at`, synced `status`, `subagent`="shell"). Then
+   `SubagentCards`, `AgentMenu`, `render_agent_watch` need **zero** branches. Replace the
+   literal `"subagent"` strings with one `entry_kind_label(entry)` helper.
+2. Push the genuinely-divergent behavior behind **~4 polymorphic methods on the entry**
+   (or two small duck-typed adapter objects): `#stop`, `#attach_render(ui)`,
+   `#feed_input(text)`, `#live?`. Then `stop_entry` → `entry.stop`, `attach_agent_view` →
+   `entry.attach_render`, `handle_attached_input` → `entry.feed_input`. **No `case kind`
+   in any UI file.** `kind` survives only as the label.
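The duck-typed approach from the two points above can be sketched as follows. The four method names (`#stop`, `#attach_render`, `#feed_input`, `#live?`) and `entry_kind_label` come from the text; the class names, constructors, and method bodies are hypothetical stand-ins.

```ruby
# Hypothetical adapters; only the method names are taken from the design note.
class SubagentEntry
  attr_reader :kind, :steer_queue

  def initialize
    @kind = :subagent
    @steer_queue = []
    @live = true
  end

  def stop
    @live = false
  end

  def live?
    @live
  end

  # For a subagent, attached input becomes a steering message.
  def feed_input(text)
    @steer_queue << text
  end

  def attach_render(ui)
    ui << "[subagent] #{@steer_queue.size} steer message(s) queued"
  end
end

class ShellEntry
  attr_reader :kind

  def initialize(stdin)
    @kind = :shell
    @stdin = stdin
    @live = true
  end

  def stop
    @live = false
  end

  def live?
    @live
  end

  # For a shell, attached input goes straight to the process's stdin.
  def feed_input(text)
    @stdin.write(text)
  end

  def attach_render(ui)
    ui << "[shell] live tail"
  end
end

# `kind` survives only as a label.
def entry_kind_label(entry)
  entry.kind.to_s
end

# UI code calls the same methods on either entry, with no `case kind`.
def handle_attached_input(entry, text)
  entry.feed_input(text) if entry.live?
end
```

The UI-side functions never inspect `kind`, which is the property the review asks for.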
+## Three gaps to handle when registering a shell entry
+`BackgroundTasks#reserve` carries subagent semantics a shell must NOT inherit:
+1. **Concurrency cap:** `reserve` counts against `max_concurrent_total`/depth/per-owner. A
+   shell is not an LLM run — it must register WITHOUT consuming the subagent budget
+   (separate register path, or exempt `kind: :shell` from `running_count`/`refusal_reason`).
+2. **Double completion notice:** `ShellRegistry#notify_completion` ALREADY pushes
+   `[background-shell] finished`. If the BG entry's `complete` also fires a notice, the user
+   gets two. Pick ONE owner (keep ShellRegistry's; the BG entry only syncs status).
+3. **Dead steer_queue:** `reserve` allocates a `steer_queue`; a shell can't steer/probe.
+   Disable steer/probe for `kind: :shell` (route attached input to `feed_input` → stdin).
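The first gap, keeping shells out of the subagent budget, might be handled by the exemption variant. This sketch is an assumption about shape only: `running_count` and `refusal_reason` are named in the text, but the `Entry` struct, the signatures, and the refusal message are invented for illustration.

```ruby
# Hypothetical entry record; the real BackgroundTasks entry has more fields.
Entry = Struct.new(:kind, :status, keyword_init: true)

# Count only live subagents; shell entries never consume the budget.
def running_count(entries)
  entries.count { |e| e.status == :running && e.kind != :shell }
end

# Returns nil when a new subagent may start, otherwise a reason string.
def refusal_reason(entries, max_concurrent_total:)
  return nil if running_count(entries) < max_concurrent_total

  "max_concurrent_total (#{max_concurrent_total}) reached"
end
```

A separate register path for shells would achieve the same result without touching the counting code, which is the other option the note lists.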
+## Status sync note
+Two status sources (ShellRegistry `wait_thr`-derived vs BG stored): sync the BG entry to
+terminal only at `notify_completion`. There's a small window where a just-killed shell still
+reads live until the reader thread fires — acceptable, documented.
+## Sandbox/pgid: confirmed intact under PTY.spawn
+`PTY.spawn` setsid's the child → `pgid == pid` (pgroup:true redundant); the sandbox launcher
+still `exec`s bash in place, so the write-jail + pgid-kill are identical to the pipe path.
+Only cwd handling diverged (fixed above).