npm - agent-sh - Versions diffs - 0.15.0 → 0.15.2 - Mend

agent-sh 0.15.0 → 0.15.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (124)

package/dist/agent/agent-loop.js +11 -8
package/dist/agent/events.d.ts +4 -0
package/docs/README.md +14 -0
package/docs/agent.md +398 -0
package/docs/architecture.md +196 -0
package/docs/context-management.md +200 -0
package/docs/extensions.md +951 -0
package/docs/library.md +84 -0
package/docs/troubleshooting.md +65 -0
package/docs/tui-composition.md +294 -0
package/docs/usage.md +306 -0
package/examples/extensions/ash-scheme/package.json +1 -1
package/examples/extensions/ashi/EXTENDING.md +2 -2
package/examples/extensions/ashi/README.md +2 -2
package/examples/extensions/ashi/docs/ui-surface-protocol.md +1 -1
package/examples/extensions/ashi/package.json +5 -3
package/examples/extensions/ashi/src/chat/tool-group.ts +3 -2
package/examples/extensions/ashi/src/cli.ts +9 -8
package/examples/extensions/ashi/src/dialogs.ts +16 -1
package/examples/extensions/ashi/src/events.ts +1 -0
package/examples/extensions/ashi/src/frontend.ts +26 -6
package/examples/extensions/ashi/src/renderer.ts +24 -4
package/examples/extensions/ashi/src/renderers/pi-tui/schema-mount.ts +4 -3
package/examples/extensions/ashi/src/renderers/pi-tui/tool-group.ts +5 -8
package/examples/extensions/ashi/src/ui.ts +11 -0
package/examples/extensions/ashi-ink/package.json +2 -2
package/examples/extensions/claude-code-bridge/package.json +1 -1
package/examples/extensions/opencode-bridge/package.json +1 -1
package/package.json +3 -1
package/src/agent/agent-loop.ts +1566 -0
package/src/agent/entry-format.ts +19 -0
package/src/agent/events.ts +153 -0
package/src/agent/extensions/rolling-history/constants.ts +1 -0
package/src/agent/extensions/rolling-history/index.ts +202 -0
package/src/agent/extensions/rolling-history/recall.ts +131 -0
package/src/agent/extensions/rolling-history/strategy.ts +404 -0
package/src/agent/host-types.ts +192 -0
package/src/agent/index.ts +591 -0
package/src/agent/live-view.ts +279 -0
package/src/agent/llm-client.ts +111 -0
package/src/agent/llm-facade.ts +43 -0
package/src/agent/normalize-args.ts +61 -0
package/src/agent/nuclear-form.ts +382 -0
package/src/agent/providers/deepseek.ts +39 -0
package/src/agent/providers/ollama.ts +92 -0
package/src/agent/providers/openai-compatible.ts +36 -0
package/src/agent/providers/openai.ts +52 -0
package/src/agent/providers/opencode.ts +142 -0
package/src/agent/providers/openrouter.ts +105 -0
package/src/agent/providers/zai-coding-plan.ts +33 -0
package/src/agent/session-store.ts +336 -0
package/src/agent/skills.ts +228 -0
package/src/agent/store.ts +310 -0
package/src/agent/subagent.ts +305 -0
package/src/agent/system-prompt.ts +151 -0
package/src/agent/token-budget.ts +12 -0
package/src/agent/tool-protocol.ts +722 -0
package/src/agent/tool-registry.ts +66 -0
package/src/agent/tools/bash.ts +95 -0
package/src/agent/tools/edit-file.ts +154 -0
package/src/agent/tools/expand-home.ts +7 -0
package/src/agent/tools/glob.ts +108 -0
package/src/agent/tools/grep.ts +228 -0
package/src/agent/tools/list-skills.ts +37 -0
package/src/agent/tools/ls.ts +81 -0
package/src/agent/tools/pwsh.ts +140 -0
package/src/agent/tools/read-file.ts +164 -0
package/src/agent/tools/write-file.ts +72 -0
package/src/agent/types.ts +149 -0
package/src/cli/args.ts +91 -0
package/src/cli/auth/cli.ts +244 -0
package/src/cli/auth/discover.ts +52 -0
package/src/cli/auth/keys.ts +143 -0
package/src/cli/index.ts +295 -0
package/src/cli/init.ts +74 -0
package/src/cli/install.ts +439 -0
package/src/cli/shell-env.ts +68 -0
package/src/cli/subcommands.ts +24 -0
package/src/core/event-bus.ts +252 -0
package/src/core/extension-loader.ts +347 -0
package/src/core/index.ts +152 -0
package/src/core/settings.ts +398 -0
package/src/core/types.ts +61 -0
package/src/extensions/file-autocomplete.ts +71 -0
package/src/extensions/index.ts +38 -0
package/src/extensions/slash-commands/events.ts +14 -0
package/src/extensions/slash-commands/index.ts +269 -0
package/src/shell/events.ts +73 -0
package/src/shell/host-types.ts +150 -0
package/src/shell/index.ts +159 -0
package/src/shell/input-handler.ts +505 -0
package/src/shell/output-parser.ts +156 -0
package/src/shell/shell-context.ts +193 -0
package/src/shell/shell.ts +414 -0
package/src/shell/strategies/bash.ts +83 -0
package/src/shell/strategies/fish.ts +77 -0
package/src/shell/strategies/index.ts +24 -0
package/src/shell/strategies/types.ts +64 -0
package/src/shell/strategies/zsh.ts +92 -0
package/src/shell/terminal.ts +124 -0
package/src/shell/tui-input-view.ts +222 -0
package/src/shell/tui-renderer.ts +1126 -0
package/src/utils/ansi.ts +140 -0
package/src/utils/box-frame.ts +138 -0
package/src/utils/compositor.ts +157 -0
package/src/utils/diff-renderer.ts +829 -0
package/src/utils/diff.ts +244 -0
package/src/utils/executor.ts +305 -0
package/src/utils/file-watcher.ts +110 -0
package/src/utils/floating-panel.ts +1160 -0
package/src/utils/handler-registry.ts +110 -0
package/src/utils/line-editor.ts +636 -0
package/src/utils/markdown.ts +437 -0
package/src/utils/message-utils.ts +113 -0
package/src/utils/package-version.ts +12 -0
package/src/utils/palette.ts +64 -0
package/src/utils/ref-counter.ts +9 -0
package/src/utils/ripgrep-path.ts +17 -0
package/src/utils/shell-output-spill.ts +76 -0
package/src/utils/stream-transform.ts +292 -0
package/src/utils/terminal-buffer.ts +213 -0
package/src/utils/tool-display.ts +315 -0
package/src/utils/tool-interactive.ts +71 -0
package/src/utils/tty.ts +14 -0

package/docs/architecture.md ADDED

@@ -0,0 +1,196 @@
+# Architecture
+agent-sh is a composable agent runtime: a pure kernel that any frontend can drive and any agent backend can plug into, over one shared extension layer. Frontends and backends are both bus-driven components that self-wire to events — the bundled shell is just one frontend among several.
+## Design Philosophy: Pure Kernel + Everything Is an Extension
+The core (`createCore()`) is a frontend-agnostic kernel — it wires up the EventBus, HandlerRegistry, and Compositor with zero knowledge of terminals, PTYs, LLMs, shells, or rendering. **The core has no agent, no LLM client, and no shell coupling.** The built-in agent backend, shell tracking, provider management, TUI rendering, and all other features are loaded as extensions.
+```
+createCore() — pure kernel:
+  │     EventBus          — typed pub/sub + transform pipelines
+  │     HandlerRegistry   — named function registry (define/advise/call)
+  │     Compositor        — routes named render streams to surfaces
+  │     Multi-backend     — coordinates which agent backend is active
+  │     Default `cwd` handler returning `process.cwd()`
+  │
+index.ts — interactive terminal frontend:
+  │     Shell             — PTY lifecycle (delegates to InputHandler + OutputParser)
+  │
+  ├── Agent host (always activated via activateAgent(ctx) before built-ins load):
+  │     ash backend       — provider resolution, LlmClient, lazy AgentLoop
+  │     core tools        — bash/read/write/edit/grep/glob/ls/list_skills registered at activate time
+  │     built-in providers — openrouter, openai, openai-compatible, deepseek (unconditional)
+  │
+  ├── Backend registry (owned by core; backends register via `agent:register-backend`):
+  │     core.activateBackend() — picks the named/persisted/first backend and calls its start()
+  │
+  ├── Built-in extensions (loaded via declarative manifest, individually disableable):
+  │     shell-context     — PTY exchange tracking, cwd advisor, <cwd>/<shell_events> producer
+  │     tui-renderer      — markdown rendering, inline diffs, thinking display, spinner
+  │     slash-commands    — /help, /model, /backend, /thinking, /compact, /context, /reload
+  │     file-autocomplete — @ file path completion
+  │
+  ├── Shared utilities:
+  │     palette           — semantic color system (accent, success, warning, error, muted)
+  │     diff-renderer     — syntax-highlighted diffs (split/unified/summary)
+  │     box-frame         — bordered TUI panels
+  │     tool-display      — width-adaptive tool call rendering + pure spinner
+  │     output-writer     — OutputWriter interface (StdoutWriter, BufferWriter for tests)
+  │     stream-transform  — content block transforms for response pipeline
+  │
+  └── User extensions (opt-in, loaded from -e flag / settings.json / extensions dir):
+        e.g. overlay-agent, interactive-prompts, solarized-theme, latex-images, peer-mesh
+```
+All components communicate exclusively through typed bus events. The backend has no reference to Shell — it emits lifecycle events and the TUI subscribes. Input flows the same way: any frontend emits `agent:submit` and the backend handles it.
+Built-in extensions are loaded from a declarative manifest and can be individually disabled via the `disabledBuiltins` setting in `~/.agent-sh/settings.json`. This means even the built-in agent can be disabled (e.g., for users who only use extension backends like Claude Code).
+**The core works without any frontend.** See [Library](library.md) for embedding agent-sh in your own apps.
+## How It Works
+1. agent-sh spawns a real PTY running your shell (zsh or bash, with your full rc config) and sets up raw stdin passthrough
+2. Built-in extensions load (including the agent backend, which registers via `agent:register-backend`), then user extensions
+3. `activateBackend()` wires the chosen backend to bus events
+4. All keyboard input goes directly to the PTY — zero latency, full terminal compatibility
+5. When you type `>` at the start of a line, agent-sh intercepts and enters agent input mode
+6. On Enter, the query is emitted as `agent:submit` and the active backend decides which tools to use
+7. The backend handles the query — streaming LLM responses, executing tools, emitting events. Read-only tools run in parallel; permission-requiring tools run sequentially.
+8. The TUI renderer extension renders streamed content inline (markdown, diffs, tool calls with tree-style grouping)
+9. When the backend finishes (`agent:processing-done`), normal shell operation resumes
+## Shell ↔ Agent Boundary
+The shell and the agent are **separate worlds** by default. The PTY runs your real shell; the agent runs its tools in isolated child processes. A `cd` by the agent's `bash` tool doesn't change your shell's cwd.
+### Command-boundary detection
+agent-sh injects three invisible OSC sequences into its inner shell — `\e]9999;id=<tag>;PROMPT\a` (precmd), `\e]9997;id=<tag>;<cmd>\a` (preexec), `\e]9998;id=<tag>;READY\a` (prompt rendered). `<tag>` is the process's `instanceId`. The OutputParser reacts only to its own tag; markers with a different tag (or none) are treated as opaque foreground output. That's what keeps a nested agent-sh — for example, an `ash` launched inside an SSH session — from cross-triggering the outer instance's command lifecycle.
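The tag check described above can be sketched as follows. This is a minimal illustration, not the package's `OutputParser`: the `parseMarkers` name, the `Marker` shape, and the choice to strip recognized markers from the passthrough text are all assumptions made for the example.

```typescript
// Hypothetical marker parser; names and return shape are illustrative only.
type MarkerKind = "prompt" | "preexec" | "ready";

interface Marker {
  kind: MarkerKind;
  command?: string; // only set for preexec markers
}

const ESC = "\x1b";
const BEL = "\x07";

// Matches ESC ] <code> ; id=<tag> ; <payload> BEL for the three codes in the text.
const MARKER_RE = new RegExp(
  `${ESC}\\](9999|9997|9998);id=([^;${BEL}]*);([^${BEL}]*)${BEL}`,
  "g",
);

function parseMarkers(
  chunk: string,
  instanceId: string,
): { markers: Marker[]; passthrough: string } {
  const markers: Marker[] = [];
  const passthrough = chunk.replace(
    MARKER_RE,
    (whole: string, code: string, tag: string, payload: string) => {
      // A marker carrying another instance's tag is left in place as opaque output.
      if (tag !== instanceId) return whole;
      if (code === "9999" && payload === "PROMPT") markers.push({ kind: "prompt" });
      else if (code === "9997") markers.push({ kind: "preexec", command: payload });
      else if (code === "9998" && payload === "READY") markers.push({ kind: "ready" });
      else return whole; // right tag, unexpected payload: do not react
      return ""; // assumption: our own markers are removed from visible output
    },
  );
  return { markers, passthrough };
}
```

A marker with no `id=` field does not match the pattern at all, so it also stays in the passthrough text.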
+The connection between the shell and the agent is **context**: each query includes shell context (recent commands, output, cwd). The agent sees what you've been doing but can't touch your shell state.
+Extensions can cross this boundary using `shell:exec-request`. The core event bus makes this easy to wire up — an extension just registers a tool that emits the event and returns the result. We don't include a PTY tool as built-in because the right behavior depends on user preference (confirmation prompts? output capture? restricted commands?). See `examples/extensions/user-shell.ts` for a ready-made implementation.
+The pattern works like this:
+```
+agent calls user_shell({ command: "cd src" })
+  → bus.emitPipeAsync("shell:exec-request", { command })
+    → Shell writes command to PTY
+      → PTY executes in user's real shell
+        → shell:command-done fires with output
+          → result returned to agent
+```
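The flow above might be wired up roughly like this. `MiniBus` is a hypothetical stand-in that models only the `onPipeAsync`/`emitPipeAsync` pair named in the docs, and the `ExecRequest` fields (`output`, `done`) are assumptions; the bundled `examples/extensions/user-shell.ts` is the reference for the real payload.

```typescript
// Hypothetical mini bus: only the async pipe pair named in the docs is modelled.
type AsyncPipe<T> = (payload: T) => T | Promise<T>;

class MiniBus {
  private pipes = new Map<string, AsyncPipe<any>[]>();

  onPipeAsync<T>(event: string, fn: AsyncPipe<T>): void {
    const list = this.pipes.get(event) ?? [];
    list.push(fn);
    this.pipes.set(event, list);
  }

  // Runs each registered transform in order, feeding its result to the next.
  async emitPipeAsync<T>(event: string, payload: T): Promise<T> {
    let current = payload;
    for (const fn of this.pipes.get(event) ?? []) {
      current = await fn(current);
    }
    return current;
  }
}

// Assumed payload shape; the real event's fields may differ.
interface ExecRequest {
  command: string;
  output?: string;
  done?: boolean;
}

const bus = new MiniBus();

// Stand-in for the Shell: a real one would write the command to the PTY and
// wait for shell:command-done before filling in the output.
bus.onPipeAsync<ExecRequest>("shell:exec-request", async (req) => ({
  ...req,
  output: `(pretend PTY output for: ${req.command})`,
  done: true,
}));

// Tool body an extension might register as user_shell.
async function userShell(args: { command: string }): Promise<string> {
  const result = await bus.emitPipeAsync<ExecRequest>("shell:exec-request", {
    command: args.command,
  });
  return result.done ? (result.output ?? "") : "no shell handled the request";
}
```

Because the tool only emits an event and awaits the transformed payload, policy such as confirmation prompts or command restrictions could be added as further pipe handlers without changing the tool itself.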
+## Agent Backend
+The agent backend is a bus-driven component that registers via `agent:register-backend`. The core's multi-backend coordinator manages which backend is active — it has no knowledge of any specific backend's internals.
+### Built-in backend: ash
+The default backend is **ash**, registered from the agent host (`src/agent/index.ts`) when `activateAgent(ctx)` runs. It resolves LLM providers from registered catalogs + settings overlay, configures an `LlmClient`, and registers itself with the core's backend registry by emitting `agent:register-backend`. The `AgentLoop` that drives tool calls is constructed lazily — only when ash's `start()` runs (on `activateBackend("ash")`). See [The Built-in Agent: ash](agent.md) for the full guide.
+The agent host also defines an `llm:invoke` handler that backs the `ctx.agent.llm` facade, so any extension can call `ctx.agent.llm.ask(...)` or `ctx.agent.llm.session(...)` without knowing which backend is active. Backends with no LLM leave `ctx.agent.llm.available` false.
+### Extension Backends
+Extensions can register alternative backends by emitting `agent:register-backend` during activation — this is the same mechanism the built-in agent uses. See [Extensions: Custom Agent Backends](extensions.md#custom-agent-backends) for the full protocol and a working example.
+All backends emit the same bus events. The TUI, extensions, and library consumers don't know which backend is active.
+## Key Extension Points
+The extension system provides several composable primitives for customizing agent-sh. Each is documented in detail in the [Extensions](extensions.md) guide:
+- **[Event Bus](extensions.md#event-bus)** — typed pub/sub (`on`/`emit`), synchronous transform chains (`onPipe`/`emitPipe`), async transform chains (`onPipeAsync`/`emitPipeAsync`), and transform-then-notify (`emitTransform`)
+- **[Custom Agent Backends](extensions.md#custom-agent-backends)** — replace the entire agent backend via `agent:register-backend`
+- **[Named Handlers](extensions.md#named-handlers-advice-system)** — `define`/`advise`/`call` registry for wrapping processing steps (e.g. code block rendering)
+- **[Content Transform Pipeline](extensions.md#content-transform-pipeline)** — typed content blocks (`text`, `code-block`, `image`, `raw`) flow through parsers and post-transforms before rendering
+- **[Custom Input Modes](extensions.md#custom-input-modes)** — register trigger characters (`?`, `>`, etc.) with custom `onSubmit` handlers
+- **[Terminal Buffer & Floating Panel](extensions.md#terminal-buffer--floating-panel)** — headless xterm.js terminal mirror + composited overlay with handler-based rendering customization
+- **[Theming](extensions.md#theming)** — semantic color palette overrides via `setPalette()`
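As an illustration of the `define`/`advise`/`call` idea behind named handlers, here is a small sketch. `MiniHandlerRegistry` and its signatures are assumptions made for the example, not the package's `HandlerRegistry`; the `cwd` handler mirrors the default handler and cwd advisor mentioned earlier, with a fixed string standing in for `process.cwd()`.

```typescript
// Hypothetical registry; the real HandlerRegistry's signatures may differ.
type Handler = (...args: any[]) => any;
type Advice = (next: Handler, ...args: any[]) => any;

class MiniHandlerRegistry {
  private handlers = new Map<string, Handler>();

  // Install the base implementation for a name.
  define(name: string, fn: Handler): void {
    this.handlers.set(name, fn);
  }

  // Wrap the current implementation; the advice decides whether to call it.
  advise(name: string, advice: Advice): void {
    const previous = this.handlers.get(name);
    if (!previous) throw new Error(`no handler defined for "${name}"`);
    this.handlers.set(name, (...args: any[]) => advice(previous, ...args));
  }

  call(name: string, ...args: any[]): any {
    const fn = this.handlers.get(name);
    if (!fn) throw new Error(`no handler defined for "${name}"`);
    return fn(...args);
  }
}

const registry = new MiniHandlerRegistry();

// Default handler (the docs describe it as returning process.cwd()).
registry.define("cwd", () => "/fallback/dir");

// Advisor: prefer a tracked cwd when one is known, else defer to the default.
let trackedCwd: string | undefined;
registry.advise("cwd", (next) => trackedCwd ?? next());
```

Callers keep using `registry.call("cwd")` and never learn whether an advisor is installed, which is what lets a frontend without a PTY skip the advisor entirely.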
+## Project Structure
+```
+agent-sh/
+├── src/
+│   ├── core/                 # Substrate kernel — no LLM, no agent, no shell
+│   │   ├── index.ts          # createCore(), backend registry, extensionContext()
+│   │   ├── types.ts          # CoreContext, CoreConfig
+│   │   ├── event-bus.ts      # Typed EventBus: emit/on, emitPipe, emitPipeAsync, emitTransform
+│   │   ├── settings.ts       # User settings (~/.agent-sh/settings.json)
+│   │   └── extension-loader.ts # Extension loading (-e, settings.json, extensions dir)
+│   │
+│   ├── cli/                  # CLI entry + subcommands (install, init, auth)
+│   │   ├── index.ts          # Interactive terminal entry point
+│   │   ├── subcommands.ts, install.ts, init.ts
+│   │   └── auth/             # Provider API key management
+│   │
+│   ├── shell/                # Shell host — TUI frontend, PTY, compositor, theming
+│   │   ├── index.ts          # registerShellHandlers/activateShell — attaches ctx.shell
+│   │   ├── events.ts         # BusEvents augmentation (shell:*, input:*, compositor:*, autocomplete:request)
+│   │   ├── host-types.ts     # ShellSurface, ShellContext, ExtensionContext, AppConfig
+│   │   ├── shell.ts          # PTY lifecycle + wiring (InputHandler + OutputParser)
+│   │   ├── shell-context.ts  # Shell exchange tracking, cwd advisor, <shell_events>
+│   │   ├── tui-renderer.ts   # Main renderer — writes to compositor streams
+│   │   ├── input-handler.ts  # Keyboard input, agent mode, bus-driven autocomplete
+│   │   ├── output-parser.ts  # OSC parsing, command boundary detection
+│   │   └── tui-input-view.ts # Input rendering + line editor integration
+│   │
+│   ├── agent/                # Agent host — ash backend, providers, tools, skills
+│   │   ├── index.ts          # activateAgent — attaches ctx.agent, registers core tools + ash backend
+│   │   ├── events.ts         # BusEvents augmentation (agent:providers, agent:models-changed, ...)
+│   │   ├── host-types.ts     # AgentSurface, AgentContext, ProviderRegistration, Model, ModelEndpoint
+│   │   ├── types.ts          # AgentBackend, ToolDefinition, ToolResult
+│   │   ├── agent-loop.ts     # ash AgentLoop (constructed lazily in start())
+│   │   ├── llm-client.ts, llm-facade.ts  # ash LLM transport + ctx.agent.llm facade
+│   │   ├── providers/        # openai, openrouter, deepseek, openai-compatible, ollama, opencode, zai-coding-plan
+│   │   ├── token-budget.ts   # Shared constants (RESPONSE_RESERVE, DEFAULT_CONTEXT_WINDOW)
+│   │   ├── tool-registry.ts, tool-protocol.ts
+│   │   ├── live-view.ts       # In-memory messages array + compaction + recall archive
+│   │   ├── store.ts, session-store.ts  # Append-only entry store; session/message persistence
+│   │   ├── nuclear-form.ts, system-prompt.ts
+│   │   ├── skills.ts, subagent.ts
+│   │   └── tools/            # Built-in tool implementations (bash, read/write/edit, grep, glob, ls, ...)
+│   │
+│   ├── extensions/           # Cross-cutting built-ins (loaded via manifest)
+│   │   ├── index.ts          # Declarative manifest + loader
+│   │   ├── slash-commands/   # /reload, /quit, command dispatch; events.ts ships command:* events
+│   │   └── file-autocomplete.ts
+│   │
+│   └── utils/                # Shared primitives
+│       ├── handler-registry.ts # Named function registry (define/advise/call)
+│       ├── compositor.ts       # Routes named render streams to surfaces
+│       ├── terminal-buffer.ts  # Headless xterm.js mirror of the terminal
+│       ├── floating-panel.ts   # Composited floating overlay
+│       ├── executor.ts         # Isolated child process execution
+│       ├── shell-output-spill.ts # Session-tempfile spill for long shell outputs
+│       ├── palette.ts, ansi.ts, diff.ts, diff-renderer.ts
+│       └── (markdown, line-editor, stream-transform, ...)
+│
+├── examples/                 # Example extensions and agent integrations
+│   └── extensions/
+│       ├── overlay-agent.ts     # Ctrl+\ floating overlay agent
+│       ├── interactive-prompts.ts # Permission prompts (opt-in safety)
+│       ├── peer-mesh.ts         # Cross-instance communication
+│       ├── terminal-buffer.ts   # Headless xterm.js terminal mirror extension
+│       ├── tmux-pane.ts         # Tmux side pane output/interactive modes
+│       ├── web-access.ts        # Web search and content extraction
+│       ├── user-shell.ts        # Run commands in the live PTY
+│       ├── questionnaire.ts     # Interactive question prompts
+│       ├── subagents.ts         # Subagent orchestration
+│       ├── solarized-theme.ts   # Theme example
+│       ├── secret-guard.ts      # Secret redaction
+│       ├── latex-images.ts      # LaTeX equation rendering
+│       ├── ollama.ts            # Ollama provider (local + cloud)
+│       ├── claude-code-bridge/  # Claude Code SDK backend
+│       ├── pi-bridge/           # Pi agent backend
+│       ├── ash-mcp-bridge/      # MCP server bridge
+│       └── ash-acp-bridge/      # ACP server (headless core)
+├── docs/                     # Documentation
+├── package.json
+└── tsconfig.json
+```

package/docs/context-management.md ADDED

@@ -0,0 +1,200 @@
+# Context Management
+## What is "context," and why manage it?
+Large language models take text as input and produce text as output. Every model has a **context window** — a hard cap on how much text it can consider at once, measured in tokens (~4 characters each). A modern frontier model might offer 200k or 1M tokens; an older one might offer 8k. The window is always finite, and every token inside it costs money, costs latency, and — as windows grow — can degrade output quality.
+"Context management" is the art of deciding *what* to keep inside that budget, *when* to evict things, and *how* to recover what you've pushed out. Different agents solve this differently. Most chat-style agents sidestep it: you get one window per conversation, and when it fills up you start a new chat. That works when the agent owns the entire interaction.
+**agent-sh is different — it lives inside a terminal**, and terminals don't have sessions.
+## The terminal mental model
+When you use a shell, you never think about "sessions." You run commands, switch between tasks, help a colleague, come back. Shell history is just *there* — always growing, searchable, persisting across restarts. Nobody invokes `/clear` or picks a new chat.
+agent-sh adopts this mental model. The consequences shape everything below:
+1. **No sessions.** There's no new-chat button and no `/clear`. History is continuous and append-only, like `.zsh_history`.
+2. **No workflow guessing.** We don't try to detect topic changes or time gaps — any heuristic that guesses user intent will be wrong often enough to annoy. The only reason to evict content is mechanical: the window filled up.
+3. **Two streams.** Shell activity and agent reasoning are fundamentally different kinds of information; they deserve different mechanisms.
+4. **Model-aware where it matters.** Compaction triggers adapt to the model's real context window, not a hardcoded threshold.
+5. **Strategy is pluggable.** The kernel decides *when* to act; *how* to compact is behind an advisable handler so extensions can install richer strategies without touching core code.
+## The two streams
+### Shell context — "what has the user been doing?"
+Captured and owned by the `shell-context` built-in (`src/shell/shell-context.ts`). Tracks user-initiated PTY activity: shell commands the user ran + their outputs.
+Agent tool outputs are **not** here — those live in the conversation stream. The boundary is strict: if the user typed it at the PTY, it goes into shell context; if the agent called a tool, it goes into the conversation.
+Frontends without a PTY (e.g. ashi, asHub) simply don't load this extension — the agent runs cwd-aware via the default `cwd` handler (`process.cwd()`) and no `<cwd>` / `<shell_events>` envelope is emitted.
+### Conversation — "what has the agent been working on?"
+Owned by `LiveView` (`src/agent/live-view.ts`). This is the OpenAI-shaped messages array (`user` / `assistant` / `tool`) the LLM actually sees. Contains:
+- User messages (queries the user sent to the agent)
+- Assistant messages (the LLM's replies)
+- Tool calls and tool results
+The two streams merge at one point: when the user submits a new query, the current cwd is wrapped inside `<cwd>` and any new shell events inside `<shell_events>` (both nested in the per-query `<query_context>` envelope) and prepended to that user message. After that they are ordinary text inside the conversation array; the events are not duplicated as separate conversation entries.
+## How shell activity reaches the LLM
+Each exchange (a shell command + output) gets a sequential `id` as it's captured. The shell-context extension keeps an internal `lastSeq` cursor — the highest id it has already sent to the model.
+Shell context contributes to the per-query `query-context:build` handler (the `shell-context` extension advises it directly; extensions can equivalently use `ctx.agent.registerContextProducer(name, fn, { mode: "per-query" })`):
+1. The producer always emits `<cwd>...</cwd>` with the live PTY-tracked cwd, so every user message anchors where the agent is right now (immune to compaction confusion over historical cwds).
+2. If there are exchanges with id > `lastSeq`, it appends `<shell_events>...</shell_events>` with the deltas; the cursor then advances to the new high-water mark.
+3. The dispatcher composes the result with any other per-query producer output and wraps the whole bundle in `<query_context>...</query_context>`, prepended to the user's query inside a single user message.
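The three steps above can be sketched as a cursor-based producer plus a wrapper. The class and function names, the `Exchange` shape, and the `$ command` rendering of each event are assumptions for illustration; only the tag names and the `lastSeq` cursor come from the text.

```typescript
// Assumed exchange shape: a sequential id plus the captured command and output.
interface Exchange {
  id: number;
  command: string;
  output: string;
}

class ShellDeltaProducer {
  private lastSeq = 0; // highest exchange id already sent to the model
  private readonly exchanges: Exchange[];
  private readonly getCwd: () => string;

  constructor(exchanges: Exchange[], getCwd: () => string) {
    this.exchanges = exchanges;
    this.getCwd = getCwd;
  }

  // Called once per user query: always emits <cwd>, and <shell_events> only
  // when there are exchanges newer than the cursor.
  produce(): string {
    const parts = [`<cwd>${this.getCwd()}</cwd>`];
    const fresh = this.exchanges.filter((ex) => ex.id > this.lastSeq);
    if (fresh.length > 0) {
      const body = fresh.map((ex) => `$ ${ex.command}\n${ex.output}`).join("\n");
      parts.push(`<shell_events>\n${body}\n</shell_events>`);
      // Advance the cursor to the new high-water mark.
      this.lastSeq = fresh.reduce((max, ex) => Math.max(max, ex.id), this.lastSeq);
    }
    return parts.join("\n");
  }
}

// Wraps all per-query producer output in one envelope ahead of the query text.
function wrapQuery(query: string, producerOutputs: string[]): string {
  return `<query_context>\n${producerOutputs.join("\n")}\n</query_context>\n${query}`;
}
```

Calling `produce()` twice with no new shell activity yields only the `<cwd>` line the second time, which is the once-per-query delta behaviour described above.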
+The delta is sent **once per user query**, not per tool-use step inside the agent loop. Inside the loop (where the LLM calls tools, sees results, calls more tools), no new shell events are injected — injecting mid-loop would break the `tool_call → tool_result` chain some providers require, and per-tool-call shell visibility isn't the right semantic anyway.
+Prior-turn shell events remain visible in later turns because they're embedded in earlier user messages in the conversation history. They are not *re-sent* as fresh bytes: they sit in the unchanged prefix of the prompt, so a provider that caches prompt prefixes can serve them from cache instead of reprocessing them every turn.
+## Handling long shell outputs
+A `find /` or a verbose build can produce megabytes of output. Storing that verbatim in context is wasteful: most of it is never referenced.
+At capture time, if an exchange's output exceeds `shellTruncateThreshold` lines:
+1. The full text is written to `<tmpdir>/agent-sh-<pid>/<id>.out`.
+2. The in-memory exchange keeps only `shellHeadLines` from the top + a marker + `shellTailLines` from the bottom:
+   ```
+   <first 10 lines verbatim>
+   [... 4823 lines truncated — full output at /tmp/agent-sh-12345/42.out; use read_file to expand ...]
+   <last 10 lines verbatim>
+   ```
+3. If the agent needs the full content later, it calls `read_file` on the path — with `offset`/`limit` for pagination on very large files.
+This trades a little disk I/O for a lot of heap and token savings, and gives the user a side benefit: they can `cat /tmp/agent-sh-<pid>/42.out` directly to inspect what was captured, which is handy for debugging.
+The session directory is removed on process exit (including `SIGINT` / `SIGTERM` / `SIGHUP`). Stale directories from crashed sessions are swept lazily the next time agent-sh starts.
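The head/tail step from this section can be sketched as a pure function. The setting names come from the text; the function name, the `Captured` shape, and the marker wording (abbreviated from the example above) are assumptions, and writing the spill file itself is left to the caller.

```typescript
// Setting names are taken from the docs; the values are supplied by the caller.
interface TruncateSettings {
  shellTruncateThreshold: number; // spill when the output has more lines than this
  shellHeadLines: number;
  shellTailLines: number;
}

interface Captured {
  inMemory: string;   // what stays in the in-memory exchange
  spillPath?: string; // set only when the full text should be written to disk
}

function truncateForContext(
  output: string,
  spillPath: string,
  settings: TruncateSettings,
): Captured {
  const lines = output.split("\n");
  if (lines.length <= settings.shellTruncateThreshold) return { inMemory: output };

  const head = lines.slice(0, settings.shellHeadLines);
  const tail = lines.slice(lines.length - settings.shellTailLines);
  const dropped = lines.length - head.length - tail.length;
  // Nothing to drop if head and tail already cover the whole output.
  if (dropped <= 0) return { inMemory: output };

  // Marker wording is illustrative; it names the path so read_file can expand it.
  const marker = `[... ${dropped} lines truncated; full output at ${spillPath}; use read_file to expand ...]`;
  return { inMemory: [...head, marker, ...tail].join("\n"), spillPath };
}
```

Keeping the function free of file I/O means the caller decides where and when the full text is written, and the truncation rule can be tested on its own.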
+## Conversation compaction
+Unlike shell context — which is a per-query delta and stays small — the conversation grows every turn. Without an active strategy it would eventually blow past the model's window. The kernel owns the *trigger*; the **built-in `rolling-history` extension** owns the *strategy* and the *store*. The result is a three-tier scheme designed to feel like shell history. (Headless or bridge backends that don't load the extension keep the live array and the kernel trigger, but have no summary store, recall, or cross-restart history.)
+### Tier 1 — eager capture
+Every time a message is appended to the conversation, the kernel emits a `conversation:message-appended` event. The rolling-history extension listens and, for each message:
+1. Nucleates it into a one-line summary (`nucleate()` in `src/agent/nuclear-form.ts`) and appends that as a persisted `Entry` to its summary **Store**.
+2. Appends an *ephemeral* `recall-cache` child entry holding the full message, so the verbatim text stays expandable for the rest of the process without ever being written to disk.
+3. Links the live message back to its entry id (`conversation:link`, which stamps `meta.entryId`), so a later compaction won't re-summarize it.
+Read-only tool results (`read_file`, `grep`, `glob`, `ls`) are filtered out of the persisted summaries — the agent can just re-run those tools.
+#### The summary store on disk
+The store (`SharedFileStore` in `src/agent/store.ts`) is an append-only JSONL log at `~/.agent-sh/rolling-history/history.jsonl` (`~/.agent-sh` is the config dir, overridable via `AGENT_SH_HOME`). One serialized `Entry` per line — `{ id, parentId?, ts, kind, payload }`, where a summary's payload carries `sum` (the one-liner), optional `body` (full content, capped per kind), and `iid` (the writing instance's id).
+- **Concurrency-safe.** Each entry is written as one short line with a single `O_APPEND` write, so the kernel positions it at the current end of file and concurrent writers do not overwrite each other; multiple agent-sh instances can share one file without a lock. Only front-truncation (which rewrites the file) takes a lock: `history.jsonl.lock`, created via `O_EXCL`, with a 10-second stale-lock timeout to recover from crashes.
+- **Ephemeral entries never touch disk.** The `recall-cache` full-body entries are appended with `{ ephemeral: true }`, a no-op on the file store — they live only in the current process.
+- **Front-truncation.** After each append, the file is checked against the extension's `maxBytes` (default 50MB). Past 150% of the cap, the oldest lines are dropped and the rest rewritten atomically via temp-file + `rename`; the overshoot avoids frequent rewrites.
+- **Reverse-chunked reads.** `readRecent`, `findById`, and `search` stream the file backward in 1MB chunks, stitching lines across boundaries at the byte level so UTF-8 codepoints never split. Search caps at a 20MB scan budget to bound cost on large files.
+The store sits behind a generic `Store` interface (`append` / `findById` / `readRecent` / `search`), so an extension can swap in a different backend (SQLite, remote service) without changing capture or recall.
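The front-truncation rule can be sketched over an in-memory list of lines. This is a simplified model, not the `SharedFileStore` code: the function name is hypothetical, trimming back down to `maxBytes` (rather than some other level) is an assumption, and the default size function counts characters plus a newline, which equals the byte size only for ASCII lines.

```typescript
// Drops the oldest lines only once the log has overshot the cap by 50%,
// so the expensive rewrite does not happen on every append.
function frontTruncate(
  lines: string[],
  maxBytes: number,
  sizeOf: (line: string) => number = (line) => line.length + 1, // ASCII approximation
): string[] {
  let total = lines.reduce((sum, line) => sum + sizeOf(line), 0);
  if (total <= maxBytes * 1.5) return lines; // still inside the overshoot allowance

  const kept = [...lines];
  // Assumption: trim from the front until the log is back under the cap.
  while (kept.length > 0 && total > maxBytes) {
    const dropped = kept.shift() as string;
    total -= sizeOf(dropped);
  }
  return kept;
}
```

In the real store the surviving lines would then be written to a temp file and renamed over the original while holding the lock described above.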
+### Tier 2 — active context
+The live `LiveView` array holds full messages for every turn the LLM currently sees. Alongside it, the rolling-history extension keeps two id-keyed views: the summary Store (one-liners, persisted) and the per-process `recall-cache` (full bodies, ephemeral). So once a turn is evicted from the live array, its summary stays browsable and its full text stays expandable for the rest of the session.
+### Tier 3 — compaction
+The kernel watches estimated prompt size against `autoCompactThreshold × (contextWindow − RESPONSE_RESERVE)` (default threshold `0.5`). When it's crossed (or `/compact` is invoked, or the API returns a context-overflow error), the kernel calls the advisable `conversation:compact` handler with a token target. The rolling-history extension's advisor implements the strategy:
+1. Parse the live array into turns (a turn starts at each user message).
+2. Pin the first turn and the most recent turns — the newest kept verbatim, a band just behind it "slimmed" (read-only tool calls dropped, long tool/assistant bodies trimmed).
+3. Score the remaining middle turns by *priority × recency* (user messages and errors rank highest; large read-only tool results lowest) and evict lowest-first until the estimate is under target.
+4. Replace the evicted span in place with one synthetic block — `[Conversation history — use conversation_recall to expand any entry]` — built from the recent summary lines, topping up summaries for any messages that missed eager capture.
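A rough sketch of the trigger and of steps 1 to 3 follows. The priority weights, the `pinRecent` default, and the chars/4 size estimate are placeholders chosen for the example (the text says the real trigger uses provider-reported token counts), and slimming and summary insertion are omitted.

```typescript
interface Msg {
  role: "user" | "assistant" | "tool";
  content: string;
}

// Trigger from the text: threshold × (contextWindow − RESPONSE_RESERVE).
function compactionTrigger(
  contextWindow: number,
  responseReserve: number,
  autoCompactThreshold = 0.5,
): number {
  return autoCompactThreshold * (contextWindow - responseReserve);
}

// Step 1: a turn starts at each user message.
function parseTurns(messages: Msg[]): Msg[][] {
  const turns: Msg[][] = [];
  let current: Msg[] | undefined;
  for (const m of messages) {
    if (m.role === "user" || current === undefined) {
      current = [];
      turns.push(current);
    }
    current.push(m);
  }
  return turns;
}

// Steps 2-3: pin the first turn and the newest `pinRecent` turns, then evict
// middle turns lowest score first until the estimate is under `target`.
function pickEvictions(turns: Msg[][], target: number, pinRecent = 2): Set<number> {
  const estimate = (turn: Msg[]) =>
    turn.reduce((n, m) => n + Math.ceil(m.content.length / 4), 0); // placeholder estimate
  let total = turns.reduce((n, turn) => n + estimate(turn), 0);

  const candidates: { index: number; score: number; tokens: number }[] = [];
  for (let i = 1; i < turns.length - pinRecent; i++) {
    const turn = turns[i] as Msg[];
    const priority = turn.some((m) => m.role === "tool") ? 1 : 2; // placeholder weights
    const recency = (i + 1) / turns.length;
    candidates.push({ index: i, score: priority * recency, tokens: estimate(turn) });
  }
  candidates.sort((a, b) => a.score - b.score);

  const evicted = new Set<number>();
  for (const c of candidates) {
    if (total <= target) break;
    evicted.add(c.index);
    total -= c.tokens;
  }
  return evicted;
}
```

With a 200,000-token window and an example reserve of 8,000 tokens, `compactionTrigger` gives 96,000 at the default threshold of 0.5; the reserve value here is only an example, not the package's `RESPONSE_RESERVE`.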
+On startup, if `prefetchEntries > 0` (default 50) the extension reads the most recent summary lines from the Store and injects them as a `[Prior session history]` message — so context carries across restarts the way shell history does.
+### Token accounting
+Compaction decisions use **API-grounded** token counts, not a chars/4 heuristic. After each API response, the provider's reported `prompt_tokens` is captured as an anchor. On the next iteration, `estimatePromptTokens()` returns that anchor plus a small local estimate for anything appended since. This keeps the trigger aligned with what the provider actually bills.
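The anchor-plus-delta idea might look like this in miniature. The class and the `recordApiUsage`/`noteAppended` method names are hypothetical, and the chars/4 figure for the local part is an assumption; only `estimatePromptTokens()` and `prompt_tokens` are named in the text.

```typescript
class PromptTokenEstimator {
  private anchor = 0;           // provider-reported prompt_tokens from the last response
  private charsSinceAnchor = 0; // text appended to the conversation since then

  // Call after each API response with the usage figure the provider returned.
  recordApiUsage(promptTokens: number): void {
    this.anchor = promptTokens;
    this.charsSinceAnchor = 0;
  }

  // Call for every message appended after the anchor was taken.
  noteAppended(text: string): void {
    this.charsSinceAnchor += text.length;
  }

  // Anchor plus a small local estimate for the not-yet-measured tail.
  estimatePromptTokens(): number {
    return this.anchor + Math.ceil(this.charsSinceAnchor / 4);
  }
}
```

The heuristic only ever covers the messages added since the last response, so its error stays small relative to the provider-measured anchor.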
+## Two mechanisms that look similar but aren't
+People often conflate shell output truncation and conversation compaction. They're different things:
+| | Shell output truncation | Conversation compaction |
+|---|---|---|
+| **Stream** | Shell context (`<shell_events>` deltas) | Conversation messages array |
+| **When** | Once, at the moment each exchange is captured | On threshold crossing, `/compact`, or overflow retry |
+| **State change** | Permanent: `ex.output` becomes head+tail+path | Permanent: evicted turns collapse to one-liners |
+| **Full-text location** | Tempfile on disk | Ephemeral recall cache + summary store (`~/.agent-sh/rolling-history/history.jsonl`) |
+| **Recovery tool** | `read_file` on the spill path | `conversation_recall` |
+They fire independently. An exchange with a huge output spills as soon as it's captured; conversation compaction may not trigger until many turns later, for unrelated reasons.
+## Recall APIs
+Both streams offer a way to retrieve full content that isn't in live context.
+### Shell output — `read_file` on the spill path
+There's no dedicated shell-recall tool: the spill file is just a normal file. The agent uses `read_file`, which already supports `offset`/`limit` pagination for very large outputs.
+### Conversation — `conversation_recall` tool
+Registered by the built-in `rolling-history` extension (only present when that extension is active; bridges and embedded uses don't ship it):
+- `conversation_recall {"action": "browse"}` — list the 25 most recent summary entries from the store
+- `conversation_recall {"action": "search", "query": "..."}` — regex search across stored entries (one-line summaries plus the ephemeral full-body cache), returning each hit's header and a first-match excerpt
+- `conversation_recall {"action": "expand", "turn_id": "#a1b2c3d4"}` — full content of a specific entry, by the `#id` shown in browse/search output
+Extensions that install a custom compaction strategy can reuse `conversation_recall` or advise it with their own semantics.
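The three call shapes listed above map naturally onto a discriminated union. The sketch below is illustrative only: `RecallArgs` and `parseRecallArgs` are hypothetical names, and the validation rules (non-empty query, leading `#` on ids) are assumptions inferred from the examples rather than documented behavior.

```typescript
// Argument shapes taken from the three calls listed above.
type RecallArgs =
  | { action: "browse" }
  | { action: "search"; query: string }
  | { action: "expand"; turn_id: string };

// Hypothetical validator: narrows untrusted tool input to RecallArgs, or returns null.
function parseRecallArgs(input: unknown): RecallArgs | null {
  if (typeof input !== "object" || input === null) return null;
  const args = input as Record<string, unknown>;
  const query = args["query"];
  const turnId = args["turn_id"];
  switch (args["action"]) {
    case "browse":
      return { action: "browse" };
    case "search":
      // Assumes an empty query is not meaningful.
      return typeof query === "string" && query.length > 0
        ? { action: "search", query }
        : null;
    case "expand":
      // Assumes ids always carry the leading "#" shown in browse/search output.
      return typeof turnId === "string" && turnId.startsWith("#")
        ? { action: "expand", turn_id: turnId }
        : null;
    default:
      return null;
  }
}
```

An extension that advises `conversation_recall` with its own semantics could keep the same three actions and change only what each one returns.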
+## Extension hooks
+| Handler / event | Purpose |
+|---|---|
+| `conversation:compact` *(advisable handler)* | Install a custom compaction strategy. Read the messages array via `conversation:get-messages`, compute a replacement, install it via `conversation:replace-messages`, return `{ before, after, evictedCount }`. |
+| `conversation:message-appended` *(event)* | Fires every time a message is added (user/assistant/tool). Use it to build rolling indexes, summarize in the background, or feed external memory systems. |
+Common override patterns: LLM-summarized compaction (summarize evicted turns before eviction), topic pinning (preserve turns matching pinned keywords), alternate persistence backends (SQLite, vector store, remote service).
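As one example of these patterns, the core of a topic-pinning strategy might look like the sketch below. It covers only the "compute a replacement" step. Reading and installing the array through `conversation:get-messages` and `conversation:replace-messages` is left out because the registration API is not shown in this section. The `Message` shape, the `compactWithPins` name, and the reading of `before`/`after` as token estimates are all assumptions.

```typescript
// Hypothetical message shape; the real messages array type is not shown here.
interface Message {
  role: "user" | "assistant" | "tool";
  content: string;
}

// Return shape named in the table; before/after are assumed to be token estimates.
interface CompactResult {
  before: number;
  after: number;
  evictedCount: number;
}

// Crude local estimate (about 4 chars per token), used only for this sketch.
const estimateTokens = (msgs: Message[]): number =>
  msgs.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0);

// Keeps the last `keepRecent` messages plus any older message mentioning a pinned keyword.
// Note: a real strategy would also need to keep tool calls and their results paired.
function compactWithPins(
  messages: Message[],
  pinned: string[],
  keepRecent = 2,
): { replacement: Message[]; result: CompactResult } {
  const cutoff = Math.max(0, messages.length - keepRecent);
  const isPinned = (m: Message): boolean =>
    pinned.some((k) => m.content.toLowerCase().includes(k.toLowerCase()));
  const replacement = messages.filter((m, i) => i >= cutoff || isPinned(m));
  return {
    replacement,
    result: {
      before: estimateTokens(messages),
      after: estimateTokens(replacement),
      evictedCount: messages.length - replacement.length,
    },
  };
}
```

A handler built on this would pass `replacement` to `conversation:replace-messages` and return `result`.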
+## Slash commands
+| Command | Action |
+|---|---|
+| `/compact` | Fire the `conversation:compact` handler (effective behavior depends on active advisors) |
+| `/context` | Show context budget usage (active tokens, total tokens, budget) |
+| `/history [on\|off\|status]` | Pause/resume writes to the rolling-history store for this session. Recall stays available; the tool and instruction stay registered, so toggling doesn't perturb the tools array or system prompt (LLM prompt cache is preserved). |
+There's no `/clear` — history is continuous by design.
+## Configuration
+All settings live in `~/.agent-sh/settings.json`:
+| Setting | Default | Description |
+|---|---|---|
+| `shellTruncateThreshold` | 20 | Output line count that triggers spill-to-tempfile at capture |
+| `shellHeadLines` | 10 | Lines kept from the top when an output is spilled |
+| `shellTailLines` | 10 | Lines kept from the bottom when an output is spilled |
+| `autoCompactThreshold` | 0.5 | Fraction of available context window that triggers auto-compact |
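To illustrate how the three shell settings interact, here is a minimal sketch of head+tail truncation using the defaults from the table. The function name, the wording of the omission marker, and the choice to spill only when the line count is strictly above the threshold are assumptions. Writing the full output to the spill file is not shown.

```typescript
// Defaults mirror the settings table; the function itself is a hypothetical sketch.
function truncateOutput(
  output: string,
  spillPath: string, // where the full output is assumed to have been written
  threshold = 20,    // shellTruncateThreshold
  headLines = 10,    // shellHeadLines
  tailLines = 10,    // shellTailLines
): string {
  const lines = output.split("\n");
  // Nothing to do for short outputs, or when head + tail already cover everything.
  if (lines.length <= threshold || lines.length <= headLines + tailLines) {
    return output;
  }
  const omitted = lines.length - headLines - tailLines;
  return [
    ...lines.slice(0, headLines),
    `[... ${omitted} lines omitted, full output at ${spillPath} ...]`, // marker format is invented
    ...lines.slice(lines.length - tailLines),
  ].join("\n");
}
```

The agent can then fetch the omitted middle by calling `read_file` on the path embedded in the marker.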
+The `rolling-history` extension reads its own settings, namespaced under `"rolling-history"`:
+| Setting | Default | Description |
+|---|---|---|
+| `maxBytes` | 52428800 | Max size of the summary store, in bytes, before front-truncation (50 MiB) |
+| `prefetchEntries` | 50 | Summary entries injected as `[Prior session history]` on startup (0 disables) |
+## Key files
+| File | Role |
+|---|---|
+| `src/shell/shell-context.ts` | Built-in: shell exchange capture, spill-to-tempfile on long outputs, `<shell_events>` per-query producer, `cwd` handler advisor |
+| `src/utils/shell-output-spill.ts` | Per-pid session dir, cleanup on exit + signals, stale-dir sweep for crashed sessions |
+| `src/agent/live-view.ts` | The live messages array the LLM sees; estimate/replace/link + API-grounded token accounting |
+| `src/agent/nuclear-form.ts` | One-line-summary primitives (nucleate, serialize, priority classification) |
+| `src/agent/store.ts` | `Store` interface + `SharedFileStore`: append-only JSONL with chunked search/tail-read + front-truncation |
+| `src/agent/agent-loop.ts` | Auto-compact trigger, `conversation:*` handler definitions, `conversation:message-appended` emits |
+| `src/agent/extensions/rolling-history/` | The built-in rolling-history extension: eager capture (`strategy.ts`), `conversation:compact` advisor, `conversation_recall` (`recall.ts`), `/history` command (`index.ts`) |
+| `src/agent/index.ts` | `/compact` and `/context` slash commands registered when the ash backend starts |