npm - zidane - Versions diffs - 5.4.2 → 5.5.0 - Mend

zidane 5.4.2 → 5.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (86)

package/README.md +45 -1
package/dist/{agent-DxBoKDba.d.ts → agent-CvImMxMQ.d.ts} +256 -5
package/dist/agent-CvImMxMQ.d.ts.map +1 -0
package/dist/chat.d.ts +137 -16
package/dist/chat.d.ts.map +1 -1
package/dist/chat.js +3 -2
package/dist/contexts/docker.d.ts +1 -1
package/dist/contexts-DhmMlT2W.js +472 -0
package/dist/contexts-DhmMlT2W.js.map +1 -0
package/dist/contexts.d.ts +3 -3
package/dist/contexts.js +1 -1
package/dist/{errors-Byb0F8B9.js → errors-CDwtPIMX.js} +4 -2
package/dist/{errors-Byb0F8B9.js.map → errors-CDwtPIMX.js.map} +1 -1
package/dist/{index-BOtXdQkW.d.ts → index-B0uc2C5x.d.ts} +9 -3
package/dist/index-B0uc2C5x.d.ts.map +1 -0
package/dist/{index-BiO_5Hm4.d.ts → index-CbS75MD3.d.ts} +2 -2
package/dist/index-CbS75MD3.d.ts.map +1 -0
package/dist/{index-B2VOOijU.d.ts → index-CtXksgqb.d.ts} +73 -4
package/dist/index-CtXksgqb.d.ts.map +1 -0
package/dist/index.d.ts +6 -6
package/dist/index.js +11 -11
package/dist/{interpolate-ERgZUxgg.js → interpolate-BaaKaKzN.js} +156 -19
package/dist/interpolate-BaaKaKzN.js.map +1 -0
package/dist/{login-CJbeAadS.js → login-iTy-0wYz.js} +3 -3
package/dist/{login-CJbeAadS.js.map → login-iTy-0wYz.js.map} +1 -1
package/dist/{mcp-DhmmJfxK.js → mcp-CNUbvbsy.js} +2 -2
package/dist/{mcp-DhmmJfxK.js.map → mcp-CNUbvbsy.js.map} +1 -1
package/dist/mcp.d.ts +1 -1
package/dist/mcp.js +1 -1
package/dist/{messages-D0xT979U.js → messages-fTR19Ga6.js} +2 -2
package/dist/{messages-D0xT979U.js.map → messages-fTR19Ga6.js.map} +1 -1
package/dist/{presets-MCcvxiNT.js → presets-h6UWhghO.js} +3 -2
package/dist/presets-h6UWhghO.js.map +1 -0
package/dist/presets.d.ts +2 -2
package/dist/presets.js +1 -1
package/dist/{providers-x3LZByR5.js → providers-G0VBZK9j.js} +4 -4
package/dist/{providers-x3LZByR5.js.map → providers-G0VBZK9j.js.map} +1 -1
package/dist/providers.d.ts +1 -1
package/dist/providers.js +2 -2
package/dist/session/sqlite.d.ts +1 -1
package/dist/session/sqlite.d.ts.map +1 -1
package/dist/session/sqlite.js +2 -1
package/dist/session/sqlite.js.map +1 -1
package/dist/{session-BHZwxmfr.js → session-CbkiJDlH.js} +3 -2
package/dist/session-CbkiJDlH.js.map +1 -0
package/dist/session.d.ts +1 -1
package/dist/session.js +2 -2
package/dist/skills.d.ts +2 -2
package/dist/skills.js +1 -1
package/dist/{tools-BNfyY14s.js → tools-D_icxa-V.js} +813 -284
package/dist/tools-D_icxa-V.js.map +1 -0
package/dist/tools.d.ts +3 -3
package/dist/tools.js +2 -2
package/dist/{transcript-anchors-DonKvoh4.d.ts → transcript-anchors-3FFw2xuk.d.ts} +98 -15
package/dist/transcript-anchors-3FFw2xuk.d.ts.map +1 -0
package/dist/tui.d.ts +29 -5
package/dist/tui.d.ts.map +1 -1
package/dist/tui.js +879 -70
package/dist/tui.js.map +1 -1
package/dist/{turn-operations-TKvy0q29.js → turn-operations-CtgBlBHn.js} +412 -125
package/dist/turn-operations-CtgBlBHn.js.map +1 -0
package/dist/types-IcokUOyC.js.map +1 -1
package/dist/types-KukEp-mi.d.ts +253 -0
package/dist/types-KukEp-mi.d.ts.map +1 -0
package/dist/types.d.ts +4 -4
package/dist/types.js +1 -1
package/docs/ARCHITECTURE.md +37 -3
package/docs/CHAT.md +4 -2
package/docs/RUN_IN_BACKGROUND.md +612 -0
package/docs/SKILL.md +83 -14
package/docs/TUI.md +40 -2
package/package.json +4 -4
package/dist/agent-DxBoKDba.d.ts.map +0 -1
package/dist/contexts-BwiHIr2w.js +0 -129
package/dist/contexts-BwiHIr2w.js.map +0 -1
package/dist/index-B2VOOijU.d.ts.map +0 -1
package/dist/index-BOtXdQkW.d.ts.map +0 -1
package/dist/index-BiO_5Hm4.d.ts.map +0 -1
package/dist/interpolate-ERgZUxgg.js.map +0 -1
package/dist/presets-MCcvxiNT.js.map +0 -1
package/dist/session-BHZwxmfr.js.map +0 -1
package/dist/tools-BNfyY14s.js.map +0 -1
package/dist/transcript-anchors-DonKvoh4.d.ts.map +0 -1
package/dist/turn-operations-TKvy0q29.js.map +0 -1
package/dist/types-Ce78ds4h.d.ts +0 -88
package/dist/types-Ce78ds4h.d.ts.map +0 -1

package/docs/RUN_IN_BACKGROUND.md ADDED

@@ -0,0 +1,612 @@
+# `run_in_background` — design plan
+> Status: **shipped (Phase 1)**. Tracks decisions and edge cases for background shell execution. Originally written as a forward-looking plan after comparing Claude Code's task-unification refactor (`tools/BashTool/BashTool.tsx`, `tools/TaskOutputTool/`, `tools/TaskStopTool/`, `tasks/LocalShellTask/LocalShellTask.tsx`); kept as the source of truth for the design rationale. Deviations from the plan as shipped are noted inline.
+## Problem
+Today's `shell` tool blocks the model's turn for the command's full duration. Correct for `git status`, broken for `npm run dev` / `python train.py` / `tail -f`. We need:
+- Start a long-running process and return control to the model immediately.
+- Let the model read incremental output without busy-loop polling.
+- Let the model kill a process by id.
+- Survive `agent.run()` boundaries (model may want to read across turns).
+- Tear down cleanly on `agent.destroy()` so closing the TUI doesn't orphan processes.
+Claude Code ships this as `Bash(command, run_in_background: true)` returning `{ backgroundTaskId, outputPath }` — output streams to a real file on disk, model reads via `Read`, framework pushes a completion notification on the next turn. We adopt that shape with minor adjustments.
+## Goals
+1. **Start-and-return**: `shell({ command, run_in_background: true })` settles in <100ms with `{ task_id, output_path, pid }`.
+2. **Disk-backed output**: stdout + stderr stream interleaved into one file at `output_path`. Model uses the existing `read_file` tool to inspect — no new poll tool.
+3. **Framework-pushed completion notification**: when a backgrounded task exits, the loop injects a `<task-notification>` block into the next user-turn so the model wakes up knowing the task is done, without polling.
+4. **Process-group kill** on `shell_kill({ task_id })` — same kill-tree guarantee we already have for foreground.
+5. **Agent-lifetime**, not run-lifetime — the task survives every `deactivateAllSkills` pass and run-end teardown until either it exits naturally or `agent.destroy()` fires.
+6. **Per-execution-context implementation** — `ProcessContext` first, others opt in.
+## Non-goals (initial scope)
+- **Stdin to background jobs.** Stdin is closed. Interactive REPLs and prompts (`vim`, npm prompts) are not supported. Mitigated in Phase 2 by a stall-watchdog that detects `(y/n)`-style stagnation and pushes a notification telling the model to kill + re-run with piped input or `--yes` flags.
+- **Persistence across TUI restarts.** Tasks die when the zidane process exits. We don't try to serialize task state to disk and reattach — that's tmux/screen territory. Resumed sessions show no live tasks.
+- **Cross-context teleportation.** A task started in `ProcessContext` doesn't follow if the agent swaps execution context.
+- **Subagent-task unification.** Claude Code's `TaskOutputTool` works on both background bash AND subagents. Conceptually elegant; structurally a different feature (async subagents). Out of scope here. Reconsider as its own RFC.
+- **Auto-backgrounding (KAIROS-style).** Claude Code promotes long-running foreground calls to background after a budget elapses. Magical and stateful — the model has to handle "wait, my foreground call became a task_id mid-flight". Skipped. Model decides upfront.
+## High-level architecture
+```
+                ┌──────────────────────────────────────────────┐
+                │              ExecutionContext                │
+                │                                              │
+   shell    ─┐  │   exec()  (existing, foreground)             │
+             │  │                                              │
+   shell    ─┤  │   execBackground()                           │
+   ({ bg })  │  │     spawn('/bin/sh', ['-c', cmd], detached)  │
+             ├─▶│     stdout/stderr → write stream → file      │
+             │  │     registry: Map<taskId, TaskEntry>         │
+             │  │                                              │
+ shell_kill ─┘  │   killBackground()                           │
+                │     process.kill(-pid, 'SIGTERM')            │
+                │                                              │
+                └──────┬───────────────────────────────────────┘
+                       │
+                       │  on child 'close' →
+                       ▼
+                ┌──────────────────────────────────────────────┐
+                │       Agent                                  │
+                │                                              │
+                │   pendingTaskNotifications: TaskNotif[]      │
+                │     ↑ enqueued on task exit                  │
+                │     ↑ drained at next run() start            │
+                │     ↑ latched off when model already         │
+                │       killed / read the task                 │
+                │                                              │
+                │   inject as <task-notification> in the       │
+                │   leading user-turn content block            │
+                └──────────────────────────────────────────────┘
+```
+Two pieces of plumbing:
+- **`ExecutionContext`** owns the registry and the file. Same shape as foreground `exec` — just doesn't await close.
+- **`Agent`** owns the notification queue. When a context tells it "task `bash_1` exited", it enqueues a notification. On `run()` start (or between batches), the loop injects pending notifications into the prompt.
+## API design
+### Model-facing tools
+**Two model-facing pieces of API**: one flag on `shell`, plus one new `shell_kill` tool.
+```ts
+// Modified shell tool — `run_in_background` flag toggles the return shape.
+shell({
+  command: 'npm run dev',
+  run_in_background: true,    // ← new optional flag
+})
+  → "Started bash_1 (pid 12345). Output: /Users/.../tasks/bash_1.20260523-024147-123.log\n\nThe task is running in the background. Read the output file with `read_file` to inspect progress; you'll receive a <task-notification> when it completes."
+// New tool — terminates a running background task by id.
+shell_kill({ task_id: 'bash_1' })
+  → "Killed bash_1 (exited 143). Output: /Users/.../tasks/bash_1.20260523-024147-123.log"
+```
+The log filename embeds a per-context UTC timestamp (`YYYYMMDD-HHMMSS-mmm`) after the task id, so two contexts sharing the same `tasksDir` (TUI restart, concurrent zidane instances on the same session) never resolve to the same file. The model-facing **task id** stays short (`bash_1`) — only the filesystem path carries the suffix. The model always reads the path verbatim from the spawn result; it never reconstructs it.
+**No `shell_output` / `shell_list`**. The model reads incremental output via the existing `read_file({ path, offset, limit })` and lists active tasks via the TUI's manage-tasks modal (or via `shell_kill` returning a list — TBD).
+**Foreground return shape unchanged** — calling `shell({ command })` without the flag returns `{ output, exit_code }` as today.
+### Disabling background mode
+Background mode is auto-disabled at the **schema level** (the `run_in_background` field is dropped from the `shell` tool's input schema AND the background-mode paragraphs are dropped from its description) when either:
+- `behavior.tasksDir` is **unset** — the host hasn't wired the log dir; no point advertising a flag that would only error.
+- `behavior.disableBackgroundTasks: true` — explicit opt-out for hosts that have `tasksDir` for some other reason but don't want the model spawning background work.
+The model never sees `run_in_background` in either case, so it wastes no turns discovering that the feature doesn't apply. The runtime check in `runBackground` stays as defense-in-depth for forged inputs: a hand-crafted `{ run_in_background: true }` gets a clean error instead of silently falling back to foreground.
+Identity check: the auto-rewrite only fires when the registered shell tool is identity-equal to the framework's exported `shell` constant. Hosts who register a custom shell-named tool keep ownership of their spec; for explicit control, import `createShellTool({ allowBackground })` and register the tailored variant directly. The pattern mirrors `createSpawnTool` / `createToolSearchTool` — same factory shape, same lifecycle.
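The gating rule above can be sketched as follows. This is a minimal illustration, assuming a simplified JSON-schema-like input schema; `Behavior`, `ShellInputSchema`, `allowsBackground`, and `schemaFor` are hypothetical names, and only `tasksDir`, `disableBackgroundTasks`, and `run_in_background` come from the design text.

```typescript
// Hypothetical shapes for illustration only.
interface Behavior {
  tasksDir?: string
  disableBackgroundTasks?: boolean
}

interface ShellInputSchema {
  type: 'object'
  properties: Record<string, unknown>
  required: string[]
}

/** Background mode is advertised only when a log dir is wired and the host has not opted out. */
function allowsBackground(behavior: Behavior): boolean {
  return Boolean(behavior.tasksDir) && behavior.disableBackgroundTasks !== true
}

/** Returns the schema as-is, or a copy without `run_in_background` when the flag must be hidden. */
function schemaFor(behavior: Behavior, full: ShellInputSchema): ShellInputSchema {
  if (allowsBackground(behavior)) return full
  const properties: Record<string, unknown> = {}
  for (const key of Object.keys(full.properties)) {
    if (key !== 'run_in_background') properties[key] = full.properties[key]
  }
  return { ...full, properties }
}
```

The sketch covers only the schema half of the rewrite; dropping the background-mode paragraphs from the description would follow the same condition.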
+### `ExecutionContext` additions
+```ts
+interface ExecutionContext {
+  // …existing fields…
+  /**
+   * Start a process in the background. Settles as soon as `spawn` returns
+   * — does NOT wait for the child to exit. Stdout + stderr stream
+   * interleaved into a single log file at the returned `outputPath`.
+   * Caller reads via the standard `readFile` seam.
+   *
+   * Optional — contexts that don't support backgrounding don't implement
+   * it. The shell tool surfaces "background mode is not supported in
+   * this execution context" when undefined.
+   *
+   * On task exit, calls `onExit` with the final status. Hosts wire this
+   * into the agent's `pendingTaskNotifications` queue so the model gets
+   * a push notification on its next turn (see "Completion notification"
+   * below).
+   */
+  execBackground?: (
+    handle: ExecutionHandle,
+    command: string,
+    options: {
+      cwd?: string
+      env?: Record<string, string>
+      onExit: (final: TaskExitInfo) => void
+    },
+  ) => Promise<{ taskId: string, outputPath: string, pid: number }>
+  /** SIGTERM the whole process group. Idempotent — second kill is a no-op. */
+  killBackground?: (
+    handle: ExecutionHandle,
+    taskId: string,
+  ) => Promise<TaskExitInfo>
+  /** Snapshot of every task (running + exited). */
+  listBackground?: (
+    handle: ExecutionHandle,
+  ) => Promise<readonly TaskEntry[]>
+}
+interface TaskEntry {
+  taskId: string
+  pid: number
+  command: string
+  cwd: string
+  startedAt: number
+  outputPath: string
+  status: 'running' | 'exited' | 'killed'
+  exitCode?: number
+  signal?: NodeJS.Signals
+  /** Total bytes written to the output file so far. */
+  bytesWritten: number
+}
+interface TaskExitInfo {
+  taskId: string
+  status: 'exited' | 'killed'
+  exitCode: number
+  signal?: NodeJS.Signals
+  outputPath: string
+  durationMs: number
+}
+```
+`onExit` is the seam that wires task lifecycle into the agent. ProcessContext fires it from the `child.on('close')` handler. The agent layer translates that into a queued notification.
+### Behavior knobs
+```ts
+behavior: {
+  // …existing…
+  /** Cap on concurrent background tasks per context. Default: 8. */
+  maxBackgroundTasks?: number
+  /**
+   * Default per-task file-size cap. When the output file grows past
+   * this, we truncate from the head (preserving the tail — same
+   * "tail-priority truncation" pattern shell uses). Default: 10 MiB.
+   * Set to 0 to disable.
+   */
+  backgroundOutputCap?: number
+  /**
+   * Stall watchdog — when output stagnates for N ms AND the tail
+   * matches an interactive-prompt regex, push a `<task-notification>`
+   * telling the model the process is likely waiting on stdin.
+   * Default: 45_000 (45 s). Set to 0 to disable.
+   */
+  backgroundStallWatchdogMs?: number
+}
+```
+No `backgroundOnDestroy` knob: destroy always kills. Power users who want tasks to outlive the TUI should use `tmux` / `nohup`; that's not the agent's job. (Dropped from the previous plan.)
+### Tool aliases
+Add a `string[]` `aliases` field to the tool spec so future renames don't break resumed sessions or SDK consumers:
+```ts
+interface ToolSpec {
+  name: string
+  description: string
+  inputSchema: Record<string, unknown>
+  aliases?: readonly string[]   // ← new
+}
+```
+When the loop dispatches by tool name, it walks the alias table as a fallback. We don't ship with any renames today — but it's free future-proofing (Claude Code added it retroactively for `BashOutputTool` → `TaskOutputTool` migration; we avoid the same scramble).
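A minimal sketch of the alias fallback described above, assuming dispatch works over a flat list of specs. `resolveTool` is a hypothetical helper, and the `bash` alias is purely illustrative since the text states no renames ship today.

```typescript
// Mirrors the `ToolSpec` interface from the design text.
interface ToolSpec {
  name: string
  description: string
  inputSchema: Record<string, unknown>
  aliases?: readonly string[]
}

/** An exact name match wins; aliases are consulted only as a fallback. */
function resolveTool(tools: readonly ToolSpec[], requested: string): ToolSpec | undefined {
  const exact = tools.find(tool => tool.name === requested)
  if (exact) return exact
  return tools.find(tool => (tool.aliases ?? []).includes(requested))
}

// Illustrative registry: a tool that an older session might still call `bash`.
const exampleTools: ToolSpec[] = [
  { name: 'shell', description: 'Run a command', inputSchema: {}, aliases: ['bash'] },
  { name: 'shell_kill', description: 'Kill a background task', inputSchema: {} },
]
```

Checking exact names first means an alias can never shadow a tool that is registered under that name.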
+## Completion notification
+The single biggest UX win over polling. When a background task exits, the framework wakes the model on its next turn with a structured `<task-notification>` block.
+### Mechanism
+- **Per-agent queue** `pendingTaskNotifications: TaskNotification[]` initialised once in `createAgent`.
+- **Enqueue on exit**: ProcessContext's `onExit` callback (passed by the agent during `execBackground`) appends to the queue.
+- **Drain at run start**: `agent.run()`'s loop checks the queue early (before building the first turn's wire messages) and prepends `<task-notification>` blocks to the leading user-turn content.
+- **Latch via `task.notified`**: when the model already read the task's output file OR killed it via `shell_kill`, the latch flips and the notification is suppressed. Prevents the "task exited, model already killed it, framework still pushes a notification" double-signal that Claude Code's bug stream documents.
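The queue-plus-latch mechanism in the bullets above might look roughly like this. It is a sketch under simplifying assumptions: the latch is modelled as a `Set` of task ids rather than a `notified` field on each task entry, `TaskNotification` is trimmed to three fields, and `createNotificationQueue` is a hypothetical name.

```typescript
// Trimmed, hypothetical notification shape.
interface TaskNotification {
  taskId: string
  status: 'exited' | 'killed'
  exitCode: number
}

function createNotificationQueue() {
  let pending: TaskNotification[] = []
  const notified = new Set<string>() // stands in for the per-task `notified` latch

  return {
    /** Called from the context's `onExit`; suppressed when the latch is already set. */
    enqueue(notification: TaskNotification): void {
      if (notified.has(notification.taskId)) return
      pending.push(notification)
    },
    /** Called when the model reads the output file or runs `shell_kill`. */
    markNotified(taskId: string): void {
      notified.add(taskId)
      // Drop anything already queued for this task so it is never delivered.
      pending = pending.filter(entry => entry.taskId !== taskId)
    },
    /** Called at `run()` start; sets the latch before handing the entries out. */
    drain(): TaskNotification[] {
      const drained = pending
      for (const entry of drained) notified.add(entry.taskId)
      pending = []
      return drained
    },
  }
}
```

Setting the latch inside `drain` before returning keeps a later duplicate exit signal for the same task from being delivered a second time.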
+### Wire format
+A `<task-notification>` block prepended to the next user-turn content:
+```xml
+<task-notification>
+  <task-id>bash_1</task-id>
+  <status>exited</status>
+  <exit-code>0</exit-code>
+  <output-file>/Users/.../tasks/bash_1.20260523-024147-123.log</output-file>
+  <summary>npm run build (4.2s) — exited 0</summary>
+</task-notification>
+```
+Multiple completed tasks → multiple blocks, in completion order.
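As an illustration of the wire format, the block could be rendered along these lines. `renderTaskNotification` and `renderAll` are hypothetical names, and `escapeXmlText` is a local stand-in for whatever escaping helper the codebase uses, so treat the details as assumptions.

```typescript
// Hypothetical shape carrying just the fields the wire format shows.
interface TaskNotification {
  taskId: string
  status: 'exited' | 'killed'
  exitCode: number
  outputPath: string
  summary: string
}

// Local stand-in for an XML text escaper; the real helper may cover more cases.
function escapeXmlText(value: string): string {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
}

function renderTaskNotification(n: TaskNotification): string {
  return [
    '<task-notification>',
    `  <task-id>${escapeXmlText(n.taskId)}</task-id>`,
    `  <status>${n.status}</status>`,
    `  <exit-code>${n.exitCode}</exit-code>`,
    `  <output-file>${escapeXmlText(n.outputPath)}</output-file>`,
    `  <summary>${escapeXmlText(n.summary)}</summary>`,
    '</task-notification>',
  ].join('\n')
}

/** One block per completed task, in the order the caller supplies (completion order). */
function renderAll(notifications: readonly TaskNotification[]): string {
  return notifications.map(renderTaskNotification).join('\n')
}
```

Escaping matters mostly for `summary`, because it embeds the command string, which can contain `<`, `>`, or `&`.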
+### Prompt-side guidance
+The shell tool's description, when `run_in_background: true` is documented, includes:
+> If your command is long-running and you want to be notified when it finishes, set `run_in_background: true`. No sleep loop needed. You will receive a `<task-notification>` on your next turn with the output file path; `read_file` it to inspect. Do NOT poll the file in a loop while you wait — the notification IS the wake-up.
+Borrows the wording from Claude Code's `BashTool/prompt.ts` (lines 317-319).
+### Within-run waits
+The notification only fires on the **next** `agent.run()`. For "launched a 5-second build, want to read its output before responding in the same turn", the model has two choices:
+1. **Inline foreground**: just don't use background mode. Run synchronously.
+2. **`read_file` after a delay**: not great — sleep-loops are exactly what we're discouraging.
+We deliberately do NOT add a mid-run notification injection or a `wait_task` tool in Phase 1. If users hit this gap often enough that the model starts polling in loops anyway, add `wait_task({ task_id, timeout_ms })` in Phase 2 — it returns when the task exits or the timeout elapses, then the model `read_file`s the output once.
+## State model
+Per `ExecutionContext` instance:
+```ts
+interface TaskState {
+  taskId: string
+  pid: number
+  command: string
+  cwd: string
+  env: Record<string, string>
+  startedAt: number
+  child: ChildProcess          // the spawn() return — never exposed externally
+  outputPath: string           // absolute path to the log file
+  outputStream: WriteStream    // append stream piped from child's stdout+stderr
+  bytesWritten: number
+  status: 'running' | 'exited' | 'killed'
+  exitCode?: number
+  signal?: NodeJS.Signals
+  notified: boolean            // latch — true once the model has been told
+                               // (via auto-notification OR by reading/killing)
+  onExit: (final: TaskExitInfo) => void  // wired by the agent at spawn time
+}
+```
+**File layout**: per-session under `<userDir>/<sessionId>/tasks/<task-id>.<context-timestamp>.log`. Off the project tree, scoped to session lifetime, easy to clean up. The `<context-timestamp>` segment is `YYYYMMDD-HHMMSS-mmm` (UTC, millisecond precision) shared by every task spawned within the same `ExecutionContext` instance — so a directory listing groups cleanly by "which run produced these" and two contexts on the same session never write into the same file. On `agent.destroy()` we close the streams; on session delete (a TUI affordance) we remove the directory.
+**No ring buffer, no cursors**: the model uses `read_file({ offset, limit })` for incremental reads. The file is a normal log file; the existing read tool already does grep / range / pagination.
+**Output cap**: tail-priority truncation when `bytesWritten > backgroundOutputCap`. When the cap trips we don't kill the process (unlike Claude Code's behavior on shell output overflow) — we just stop writing new bytes and append a one-time `…(N bytes dropped from middle)…` marker to the file. Long-running dev servers can write gigabytes of "request handled" logs without bringing us down.
+## Lifecycle
+```
+┌────────────────────┐
+│ start              │  shell({ command, run_in_background: true })
+│                    │  → execBackground spawns the child
+│                    │  → stdout+stderr piped to output file
+│                    │  → registry entry created (status: 'running')
+│                    │  → tool result: { task_id, output_path, pid }
+└──────────┬─────────┘
+           │
+           │  child writes to output file (no in-memory buffer)
+           │
+┌──────────▼─────────┐
+│ inspect            │  model uses read_file({ path: output_path, … })
+│                    │  on its own schedule; no new tool needed
+└──────────┬─────────┘
+           │
+           ├─────────────────┐
+           │                 │
+┌──────────▼────┐   ┌────────▼────────┐
+│ exit          │   │ kill            │  shell_kill({ task_id })
+│ (natural)     │   │ (user/model)    │  → process.kill(-pid, SIGTERM)
+│               │   │                 │  → status: 'killed', exitCode: 143
+│ status:       │   │                 │  → notified = true
+│ 'exited'      │   │                 │
+│ exitCode set  │   └─────────────────┘
+└──────┬────────┘
+       │
+       │  onExit callback → agent.pendingTaskNotifications.push(…)
+       │
+┌──────▼─────────────┐
+│ wake-up            │  next agent.run() begins
+│                    │  → drain queue → prepend <task-notification> blocks
+│                    │  → notified = true (idempotent)
+│                    │  → model sees the notification + output path
+└────────────────────┘
+```
+### Interactions with existing semantics
+| User action | Effect on background task |
+|---|---|
+| `esc abort run` | Background task **not affected**. Run-level abort tears down the in-flight foreground call, not the background registry. The model still sees `<task-notification>` on its next prompt. |
+| `ctrl+k` cancel tool | Same. Cancel-tool is scoped to the call that's in flight; the call that *started* a background task has already returned, so there's nothing to cancel. |
+| `agent.destroy()` | **All background tasks killed** (process group SIGTERM), output streams flushed and closed, registry cleared. No detach option. |
+| Session swap (TUI) | The previous session's tasks are killed (via the agent destroy that fires during teardown). The new session starts with an empty registry. |
+| Session resume after restart | No tasks survive process exit. The output files remain on disk (under the per-session directory) but the registry is empty. |
+## Per-context support matrix
+| Context | Background support | Mechanism |
+|---|---|---|
+| `ProcessContext` | ✅ Phase 1 | spawn + file write streams + group kill |
+| `MockContext` | ✅ Phase 1 | Test stub — fake `TaskState` with manual `onExit` resolve |
+| `DockerContext` | ⏳ Phase 4 maybe | `docker exec -d` + tracking the exec instance — different primitive, needs its own design pass |
+| `SandboxContext` | ⏳ Provider-dependent | Some providers support detached exec; many don't |
+Contexts without `execBackground?` surface a clean "background mode is not supported in this execution context" error when the model sets the flag.
+## TUI surface
+Per Phase 3 work:
+- **Footer chip** showing running task count (`⌁ 2 tasks` style). Hides when zero. Same accent as the active-skills chip (`✦ N skills`).
+- **`ctrl+b` keybind** opens the manage-tasks modal and, on a second press or from within the modal, backgrounds all foreground tools. Borrows Claude Code's `backgroundAll()` semantics so one chord covers both common verbs.
+- **Manage-tasks modal**: list of tasks with id / command / status / runtime / output path. Per-row actions: kill, open output file in editor (OSC 8 link).
+- **Close-warning**: when the user tries to exit the TUI with running tasks, confirm-once dialog ("3 tasks will be killed — continue?"). One Enter to proceed, esc to cancel.
+## Decisions
+The three open questions are resolved. Tentative answers from the prior revision are now load-bearing — anything that diverges is a deliberate redesign and requires another review pass.
+### §1. Notification injection point — leading content block in next user-turn
+When a background task exits, the framework prepends a `<task-notification>` block to the **next user-turn**'s content array (a plain `text` content block carrying the XML). Persisted to `session.turns` as part of that turn; survives history replay; no new block type; the model sees the same thing whether the run is live or being resumed from disk.
+**TUI rendering — synthesized event, not raw XML in the transcript.** The `<task-notification>` tag would be ugly inline. Approach:
+1. Add a new `StreamEvent.kind: 'task-notification'` shape with structured fields (`taskId`, `status`, `exitCode`, `outputPath`, `summary`, `durationMs`). The agent emits this synthesized event when it injects the text block — same as how `'compact-summary'` events get synthesized alongside their underlying turn block today.
+2. The persisted user-turn carries the raw `<task-notification>` XML (for the model + replay correctness). The renderer uses the synthesized event for display.
+3. `eventsFromTurns` (the live↔history reconciliation pass) detects the `<task-notification>` prefix on replay and re-synthesizes the event, so a reloaded session shows the same banner the live one did.
+4. Banner shape: one line, theme-accented by status (`COLOR.dim` for exited-0, `COLOR.warn` for non-zero, `COLOR.error` for killed). Format: `⌁ bash_1 exited (0) · 4.2s · /Users/.../tasks/bash_1.20260523-024147-123.log`. The path carries an OSC 8 hyperlink so terminal emulators that support it can open the log file. Multi-task drains render as N banners in completion order.
+5. The renderer must DEDUPE — if the raw text block is also visible in the transcript via the generic text-block path, we'd double-render. The detection in step 3 strips the text block from generic rendering when a structured event covers it. (Same dedupe pattern the compact-summary code already uses.)
+### §2. File location — `<userDir>/<sessionId>/tasks/<task-id>.<context-timestamp>.log`
+Under zidane's user data dir, scoped per session. Same rationale as the existing session-persistence layout. Cleanup hooks into the session-delete path — when a session is removed, its `tasks/` subdirectory goes with it. The user's project tree stays clean.
+The `<context-timestamp>` is `YYYYMMDD-HHMMSS-mmm` in UTC, computed once at `createProcessContext()` and shared by every task in that context. Rationale:
+- **Collision-free across context restarts.** Without the suffix, a TUI restart on the same session would re-mint `bash_1, bash_2, …` and append into the OLD log files (`flags: 'a'`), producing scrambled output. Two zidane instances on the same session would do the same concurrently. The suffix decouples the model-facing id from the on-disk identity.
+- **Sortable.** Lexical sort of UTC `YYYYMMDD-HHMMSS-mmm` equals chronological sort. `ls tasks/` reads "in run order".
+- **Grouped per run.** Every `bash_N` in the same context shares the same timestamp segment — a human reading the directory listing sees the run boundary.
+- **Filesystem-safe.** Digits + hyphens only.
+- **Helper exports.** `formatContextTimestamp(date: Date): string` and `TASK_LOG_FILENAME_RE: RegExp` live in `src/contexts/process.ts` for tooling that wants to parse the convention.
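The naming convention can be sketched as below. The signature of `formatContextTimestamp` follows the text, but the body is a hypothetical re-implementation, the regex is an assumed reading of `<task-id>.<context-timestamp>.log`, and `taskLogFilename` is an illustrative helper that also applies the `/^bash_\d+$/` id check from the code-quality checklist.

```typescript
function pad(value: number, width: number): string {
  return String(value).padStart(width, '0')
}

/** Formats a date as `YYYYMMDD-HHMMSS-mmm` in UTC. */
function formatContextTimestamp(date: Date): string {
  const day = pad(date.getUTCFullYear(), 4) + pad(date.getUTCMonth() + 1, 2) + pad(date.getUTCDate(), 2)
  const time = pad(date.getUTCHours(), 2) + pad(date.getUTCMinutes(), 2) + pad(date.getUTCSeconds(), 2)
  return `${day}-${time}-${pad(date.getUTCMilliseconds(), 3)}`
}

// Assumed pattern: capture group 1 is the task id, group 2 the context timestamp.
const TASK_LOG_FILENAME_RE = /^(bash_\d+)\.(\d{8}-\d{6}-\d{3})\.log$/

/** Builds the log filename, rejecting ids that do not match the minted shape. */
function taskLogFilename(taskId: string, contextTimestamp: string): string {
  if (!/^bash_\d+$/.test(taskId)) throw new Error(`invalid task id: ${taskId}`)
  return `${taskId}.${contextTimestamp}.log`
}
```

Because every field is zero-padded to a fixed width, sorting the filenames of one task id lexically also sorts them chronologically.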
+### §3. Destroy ordering — tasks first, mirroring "tear down what happens in the session, then what the session needed"
+Borrowing the user's framing: kill the things that operate INSIDE the session (background tasks, pending tool cancels), then tear down the things the session DEPENDED ON (MCP connections, execution handle, skills cache). Teardown is one-shot and sequential, not parallel: saving a few ms of latency isn't worth the synchronization complexity of running the steps concurrently.
+Final pseudo-code:
+```ts
+async function destroy() {
+  if (destroyed) return
+  destroyed = true
+  // ① INSIDE-the-session — work that the session was producing.
+  await killAllBackgroundTasks()        // SIGTERM groups, await close, flush + close WriteStreams
+  for (const c of pendingToolCancels.values())
+    if (!c.signal.aborted) c.abort('agent-destroyed')
+  pendingToolCancels.clear()
+  // ② NEEDED-for-the-session — infra the session was sitting on top of.
+  if (mcpConnection) await mcpConnection.close()
+  if (executionHandle) await executionContext.destroy(executionHandle)
+  skillsCleanup()
+}
+```
+## Code-quality checklist
+A non-exhaustive list of "do not let these slip into the diff" — distilled from the kind of bugs this shape of feature tends to ship with. Each item references the part of the design that's most at risk.
+1. **Notification double-fire (queue + drain race).** Enqueue on `child.on('close')`, drain at next `run()` start. The `notified: boolean` latch on `TaskState` must be set BEFORE we drain the entry, not after, otherwise a concurrent kill / read inside the same microtask can re-emit. Pin with a test that fires `shell_kill` between `onExit` and `run()` start — the kill must win, no notification injected.
+2. **Stream flush race on task exit.** Node's `WriteStream` queues writes; `child.on('close')` can fire while bytes are still queued. Use `stream.end(callback)` and await the callback BEFORE flipping the entry to `'exited'`. Without this, a model that reads the file in the same turn as the notification arrives can see truncated output. Test: write 10 KB, exit, read — full content must be visible.
+3. **File handle leaks on abnormal teardown.** The WriteStream must close on every exit path: natural close, kill, error, agent.destroy. Use a single `closeTask(taskId)` helper that's idempotent and always called from finally / catch / destroy. Test: spawn → kill → destroy → check no FD entries remain (Bun's `Bun.openSync` count or process resource usage).
+4. **Path-traversal hygiene on task IDs.** Even though we mint `bash_${n}` (no user input), validate the id against `/^bash_\d+$/` before joining into a path. Defensive; cheap; pins the invariant.
+5. **PGID-reuse race on long-uptime systems.** `process.kill(-pid, …)` after the kernel has reaped + reassigned the pid hits a different process. The existing kill-tree code catches ESRCH/EPERM; verify the new task-kill path uses the same try/catch and logs nothing on ESRCH (silent is correct — process is already gone).
+6. **Notification queue not cleared on session swap.** The TUI's session-swap path tears the agent down and builds a fresh one. The new agent must NOT inherit pending notifications from the old session. Same pattern as `inFlightTools` / `activeSkillNames` — clear in the teardown handler.
+7. **Don't abstract too early.** Keep `execBackground` / `killBackground` / `listBackground` as direct methods on `ExecutionContext`. NO "BackgroundTaskManager" class or "TaskRegistry" wrapper — the registry is a `Map<string, TaskState>` inside the context's closure, full stop. We can extract a class IF a second context implementation needs one; YAGNI until then.
+8. **Don't duplicate cancellation semantics.** The existing `pendingToolCancels` map is for FOREGROUND-tool cancellation — calls that are still awaited by the loop. Background tasks live in a SEPARATE registry. Two maps, two purposes, never collapse into one. Doing so risks the cancel/kill semantics drifting into each other.
+9. **Persisted XML must round-trip cleanly.** The `<task-notification>` block is plain text inside a `text` content block. Escape any user-provided strings in the summary (the command shows up there). Use the existing `escapeXml` helper from `src/xml.ts`. Test: a command containing `<` / `>` / `&` round-trips through enqueue → inject → persist → replay without corruption.
+10. **Run-end deactivate-all MUST NOT touch task notifications.** The existing skills deactivate-all at run end is a separate concern. The notification drain happens BEFORE the loop builds the first turn's wire messages — earlier in `run()` than `deactivateAllSkills`. Putting them in the same teardown bucket would cause notifications to fire on run-end (wrong — they fire on the NEXT run's start).
+11. **Replay correctness.** `eventsFromTurns` must produce the same StreamEvent stream live and on replay. If we add the `task-notification` synthesized event, replay must re-synthesize it from the raw text block. Add an integration test that loads a session with a previously-persisted notification turn and asserts the event stream contains the structured event.
+12. **No cross-context teleportation assumption.** The task registry lives on the ExecutionContext instance. Swapping contexts is not supported in v1 — document this. If swap happens (host code), the old context's destroy handles its tasks; the new context starts empty. No "migrate" logic.
+13. **Output cap must NOT kill the process.** When the file grows past `backgroundOutputCap`, we drop bytes from the middle and leave a one-time marker in their place. Long-running dev servers must not be killed for being verbose; that is the bug we're avoiding from Claude Code's foreground-shell behavior. Test: spawn a writer that exceeds the cap, verify the process is still running, verify the marker appears, and verify that reading the file returns head + marker + tail.
+14. **`onExit` callback must be at-most-once.** Multiple registration paths (close, error, abort) can race. Use a `settled: boolean` flag on the entry; gate the callback behind it. Same pattern the existing `runSingleToolDispatch`'s `settled` flag uses.
+15. **TUI dedupe of synthesized event vs raw text.** When `eventsFromTurns` walks the persisted turns and finds a `<task-notification>` text block, it must emit ONLY the structured event — NOT both the text block AND the structured event. Otherwise the transcript shows the banner AND the raw XML. The compact-summary code has the same problem and solves it the same way; reuse the pattern.
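
Items 2, 3, and 14 interact, so a combined sketch may help. This is a minimal illustration under stated assumptions, not the shipped `ProcessContext` code: the `TaskState` shape, `settleTask`, and `demo` are hypothetical, and only the `closeTask` name and the `settled` flag come from the checklist. The sketch memoizes the close promise so `closeTask` is idempotent, and it awaits `stream.end(callback)` before flipping the entry to `'exited'`.

```typescript
import { createWriteStream, mkdtempSync, readFileSync } from "node:fs";
import type { WriteStream } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical entry shape; the real TaskState may carry more fields.
interface TaskState {
  stream: WriteStream;
  status: "running" | "exited";
  settled: boolean; // at-most-once latch for onExit (item 14)
  closing?: Promise<void>; // memoized so closeTask stays idempotent (item 3)
}

const tasks = new Map<string, TaskState>();

// Safe to call from close / error / kill / destroy paths, any number of times.
function closeTask(taskId: string): Promise<void> {
  const task = tasks.get(taskId);
  if (!task) return Promise.resolve();
  if (!task.closing) {
    task.closing = new Promise<void>((resolve) => {
      // end(callback) runs once queued writes are flushed (or the stream errored).
      task.stream.end(() => resolve());
    });
  }
  return task.closing;
}

// Flush first, flip status second, notify last, and only once (item 2).
async function settleTask(
  taskId: string,
  onExit: (taskId: string) => void,
): Promise<void> {
  const task = tasks.get(taskId);
  if (!task || task.settled) return;
  task.settled = true;
  await closeTask(taskId);
  task.status = "exited";
  onExit(taskId);
}

// Demo: write 10 KB, settle twice concurrently, then read the file back.
async function demo(): Promise<{ calls: number; bytes: number }> {
  const file = join(mkdtempSync(join(tmpdir(), "bg-task-")), "bash_1.log");
  const stream = createWriteStream(file);
  tasks.set("bash_1", { stream, status: "running", settled: false });
  stream.write("x".repeat(10 * 1024));
  let calls = 0;
  const onExit = () => {
    calls += 1;
  };
  await Promise.all([settleTask("bash_1", onExit), settleTask("bash_1", onExit)]);
  return { calls, bytes: readFileSync(file).length };
}
```

Because the second `settleTask` call returns early on the `settled` latch, racing close / error / abort paths can all call it without coordinating with each other.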
+## Phased rollout
+Each phase MUST land its code, tests, AND doc updates together. A phase isn't "done" until the public-facing surface is documented and the test suite covers the new behavior. No phase ships behind a "we'll write the docs next sprint" flag — that's how docs drift from reality (we just spent a release cycle backfilling the cancellation + skills work).
+### Phase 1 — context primitive + ProcessContext + agent plumbing
+**Code:**
+- `ExecutionContext` adds `execBackground?` / `killBackground?` / `listBackground?` with `TaskEntry` / `TaskExitInfo` types.
+- `ProcessContext` implementation: spawn + WriteStream + group kill + idempotent `closeTask` + `onExit` wiring.
+- `MockContext` stub for tests.
+- `Agent`: `pendingTaskNotifications` queue + drain helper + injection at `run()` start. `notified` latch on `TaskState`. `killAllBackgroundTasks` helper called first in `destroy()`.
+- `Agent.destroy()` reordered per §3 — INSIDE-the-session work first, NEEDED-for-the-session second.
+- New tool: `shell_kill`. The `shell` tool gains the `run_in_background` flag.
+- New hooks: `background:start` (`{ taskId, pid, command }`), `background:exit` (`{ taskId, status, exitCode, outputPath, durationMs }`).
+- StreamEvent: new `kind: 'task-notification'` with structured fields; renderer dedupe vs the raw text block; `eventsFromTurns` re-synthesizes from persisted XML on replay.
+- TUI: banner rendering for the synthesized event (one-line, status-accented, OSC 8 link on output path).
+**Tests:**
+- `execBackground` settles in <100ms, returns valid path + pid.
+- File contents match expected stdout (interleaved with stderr).
+- `killBackground` kills the group — probe via `ps -A`, gated by sandbox-env detection (same skip pattern as the existing kill-tree test).
+- Notification injected at next `agent.run()` start with correct XML payload.
+- `notified` latch: `shell_kill` between exit and next `run()` suppresses the auto-notification.
+- `notified` latch: `read_file` against the output path between exit and next `run()` ALSO suppresses (covered by the latch flip in the read-tracking path).
+- `agent.destroy()` kills all running tasks AND flushes their output streams before returning. Probe via fd count (or via reading the file post-destroy and confirming the trailing bytes are present).
+- Multiple concurrent tasks → multiple notifications in completion order.
+- `eventsFromTurns` re-synthesizes `task-notification` events on replay.
+- TUI banner renders for both live + replayed events with no duplicate text block.
+- Escaped XML round-trip for commands with `<` / `>` / `&`.
+- Cross-test isolation: each test uses a unique session id + temp dir so artifacts don't leak.
+**Docs:**
+- `docs/SKILL.md`: new "Background tasks" subsection under the existing hooks + tools section. Covers the `run_in_background` flag, the notification flow, the file-path read pattern, the two new hooks, and the wire format of the `<task-notification>` block.
+- `docs/ARCHITECTURE.md`: add to the tool-execution diagram the `execBackground` branch and the notification-injection arrow at `run()` start. Add `background:start` / `background:exit` to the hook ordering reference.
+- `docs/TUI.md`: brief note that the synthesized `task-notification` event renders as a banner; full TUI coverage lives in Phase 3.
+- `README.md`: one-paragraph blurb under "Sub-agent Spawning" (sibling concept) introducing `run_in_background: true` with a 5-line example. Mention the notification flow + that polling is unnecessary.
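
To make the escaping and replay requirements concrete, here is a rough sketch of a notification round-trip. The element and attribute layout below is an assumption made for illustration; this plan does not spell out the actual `<task-notification>` wire format. The local `escapeXml` is a stand-in for the existing helper in `src/xml.ts`, and `unescapeXml`, `formatTaskNotification`, and `parseTaskNotification` are hypothetical names.

```typescript
// Hypothetical payload; field names borrowed from the background:exit hook.
interface TaskNotification {
  taskId: string;
  status: string;
  exitCode: number;
  command: string;
  outputPath: string;
}

// Stand-in for the existing escapeXml helper (its real behavior may differ).
function escapeXml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Inverse of escapeXml; "&amp;" is handled last so "&amp;lt;" decodes to "&lt;".
function unescapeXml(s: string): string {
  return s
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&amp;/g, "&");
}

// Assumed layout: ids and numbers as attributes, free text as child elements.
function formatTaskNotification(n: TaskNotification): string {
  return [
    `<task-notification task-id="${n.taskId}" status="${escapeXml(n.status)}" exit-code="${n.exitCode}">`,
    `  <command>${escapeXml(n.command)}</command>`,
    `  <output-path>${escapeXml(n.outputPath)}</output-path>`,
    `</task-notification>`,
  ].join("\n");
}

const NOTIFICATION_PATTERN =
  /<task-notification task-id="(bash_\d+)" status="([^"]*)" exit-code="(-?\d+)">\s*<command>([\s\S]*?)<\/command>\s*<output-path>([\s\S]*?)<\/output-path>\s*<\/task-notification>/;

// Replay side: re-synthesize the structured payload from the persisted text block.
function parseTaskNotification(text: string): TaskNotification | null {
  const m = NOTIFICATION_PATTERN.exec(text);
  if (!m) return null;
  return {
    taskId: m[1],
    status: unescapeXml(m[2]),
    exitCode: Number(m[3]),
    command: unescapeXml(m[4]),
    outputPath: unescapeXml(m[5]),
  };
}
```

A replay path built this way would emit only the parsed structure and drop the raw text block, which is the dedupe behavior the checklist asks for.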
+### Phase 2 — quality-of-life
+**Code:**
+- Stall watchdog: detect 45 s of output stagnation + interactive-prompt regex match, push a one-shot notification telling the model to kill and re-run with piped input or `--yes` flags.
+- Output cap with tail-priority truncation when the file grows past `backgroundOutputCap` (default 10 MiB). MUST NOT kill the process: drop bytes from the middle and leave a marker line in their place.
+- `ToolSpec.aliases?: readonly string[]` field plumbed through the dispatcher's name-resolution path. No renames ship yet; the framework is ready.
+**Tests:**
+- Stall watchdog: write 100 B every 50 s for 5 minutes — no false positive. Write 100 B then `(y/n)` prompt then no more output for 45 s — notification fires exactly once.
+- Output cap: writer exceeds the cap by 10×, process stays alive, file contains head + marker + tail (middle dropped), notification fires on natural exit.
+- Aliases: register a tool with alias `["OldName"]`, dispatch by `OldName`, body runs.
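
The alias test above could be exercised against something like the following. This is a sketch of one possible name-resolution path, not the actual dispatcher: `buildToolIndex` and `dispatch` are hypothetical names, the `run` field is an assumed stand-in for the tool body, and only `ToolSpec.aliases?: readonly string[]` is taken from the plan.

```typescript
interface ToolSpec {
  name: string;
  aliases?: readonly string[];
  run: (input: unknown) => unknown; // assumed stand-in for the tool body
}

// Canonical names are indexed first so an alias can never shadow a real tool.
function buildToolIndex(tools: readonly ToolSpec[]): Map<string, ToolSpec> {
  const index = new Map<string, ToolSpec>();
  for (const tool of tools) {
    if (index.has(tool.name)) throw new Error(`duplicate tool name: ${tool.name}`);
    index.set(tool.name, tool);
  }
  for (const tool of tools) {
    for (const alias of tool.aliases ?? []) {
      if (index.has(alias)) throw new Error(`alias collides with an existing name: ${alias}`);
      index.set(alias, tool);
    }
  }
  return index;
}

// Resolve by canonical name or alias, then run the body.
function dispatch(index: Map<string, ToolSpec>, name: string, input: unknown): unknown {
  const tool = index.get(name);
  if (!tool) throw new Error(`unknown tool: ${name}`);
  return tool.run(input);
}
```

Rejecting collisions at index-build time is a design choice made for this sketch; it keeps resolution a single map lookup with no precedence rules to document.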
+**Docs:**
+- `docs/SKILL.md`: extend the Background tasks section with the stall-watchdog behavior + the truncation marker. Document `behavior.backgroundStallWatchdogMs` and `behavior.backgroundOutputCap`. Add `aliases` to the `ToolSpec` reference.
+- `docs/ARCHITECTURE.md`: update the dispatcher path description to mention the alias fallback.
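
The head + marker + tail layout for the Phase 2 output cap can be illustrated with a pure function over an in-memory buffer. This is a simplified sketch: how the cap is applied to a live file is not specified here, and `truncateMiddle` with its `headBytes` parameter is a hypothetical helper. The marker tag follows the `<output-truncated bytes-dropped="…"/>` form named in this plan.

```typescript
import { Buffer } from "node:buffer";

// Keep the first `headBytes` and the last `cap - headBytes` bytes; replace the
// dropped middle with a marker. The result exceeds `cap` by the marker length.
function truncateMiddle(data: Buffer, cap: number, headBytes: number): Buffer {
  if (data.length <= cap) return data;
  if (headBytes < 0 || headBytes > cap) {
    throw new RangeError("headBytes must be within [0, cap]");
  }
  const tailBytes = cap - headBytes;
  const dropped = data.length - cap;
  const marker = Buffer.from(
    `\n<output-truncated bytes-dropped="${dropped}"/>\n`,
    "utf8",
  );
  return Buffer.concat([
    data.subarray(0, headBytes),
    marker,
    data.subarray(data.length - tailBytes),
  ]);
}
```

A small `headBytes` relative to `cap` would be one way to express tail priority, since most of the budget then goes to the most recent output.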
+### Phase 3 — TUI integration
+**Code:**
+- Footer chip showing running task count (`⌁ N tasks`), hidden when zero. Same shape as the `✦ N skills` chip; same color accent or a distinct one (TBD during impl).
+- `ctrl+b` keybind action. First press opens the manage-tasks modal; second press (or modal action) backgrounds all currently-foreground bash tools via the same kill-tree mechanism repurposed for background promotion.
+- Manage-tasks modal: list (task id, command, status, runtime, output path), per-row kill action, per-row "open output" action (OSC 8 link or `$EDITOR`).
+- Close-warning: when the user requests TUI exit AND running tasks exist, confirm-once dialog with the count.
+**Tests:**
+- Footer chip updates as tasks start + exit (state-driven, no flicker).
+- `ctrl+b` first press opens modal; second press backgrounds all; modal kill action terminates the group.
+- Close-warning fires when tasks are running; doesn't fire when registry is empty.
+**Docs:**
+- `docs/TUI.md`: new "Background tasks" section. Cover the chip, the keybind, the modal, the close-warning. Document the `kind: 'task-notification'` event banner with screenshots / ASCII mockup.
+- `docs/CHAT.md`: if the chat package exposes any new types or events for the TUI integration, document them here.
+### Phase 4 (deferred) — broader context coverage
+**Code:**
+- Docker exec backgrounding (`docker exec -d` + container exec tracking).
+- Sandbox-context providers with detached execution.
+- Subagent-task unification — bigger refactor, standalone RFC if pursued.
+- Orphan-reaping sweep for `ZIDANE_TASK_OWNER` env marker after a SIGKILL of the parent (the resource-leak mitigation from Risk #4).
+**Docs:**
+- Per-context tables in `docs/ARCHITECTURE.md` updated as each lands.
+- `docs/SKILL.md`: support matrix for which contexts honor `run_in_background`.
+## Testing strategy
+### Unit (context layer)
+- `execBackground` returns `{ taskId, outputPath, pid }` in < 100ms.
+- Sequential task IDs (`bash_1`, `bash_2`).
+- Output file is written with interleaved stdout + stderr.
+- `killBackground` sends SIGTERM to the group; probe via `ps -A` (skipped gracefully when `ps` unavailable, same pattern as the existing `kills the whole process tree on abort` test).
+- `onExit` callback fires exactly once per task with correct status.
+- `listBackground` reflects state transitions.
+- Concurrent tasks respect `maxBackgroundTasks` cap.
+- Tail truncation kicks in past `backgroundOutputCap` without killing the process.
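
Several of these unit checks (sequential ids, state transitions, the concurrency cap, and the task-id path guard) can be pictured with a closure-held `Map`, in line with the "no manager class" guidance. The sketch below is illustrative only: `createTaskRegistry`, `outputPathFor`, the `.log` suffix, and the reading of `maxBackgroundTasks` as a cap on running tasks are all assumptions.

```typescript
import { join } from "node:path";

// Hypothetical entry shape for illustration.
interface TaskEntry {
  taskId: string;
  command: string;
  status: "running" | "exited";
  outputPath: string;
}

const TASK_ID_PATTERN = /^bash_\d+$/;

// Validate before joining, even though ids are minted internally.
function outputPathFor(outputDir: string, taskId: string): string {
  if (!TASK_ID_PATTERN.test(taskId)) throw new Error(`invalid task id: ${taskId}`);
  return join(outputDir, `${taskId}.log`);
}

// The registry is a Map inside a closure; no manager class.
function createTaskRegistry(outputDir: string, maxBackgroundTasks: number) {
  const tasks = new Map<string, TaskEntry>();
  let nextId = 1;
  return {
    register(command: string): TaskEntry {
      const running = Array.from(tasks.values()).filter((t) => t.status === "running");
      if (running.length >= maxBackgroundTasks) {
        throw new Error(`background task limit reached (${maxBackgroundTasks})`);
      }
      const taskId = `bash_${nextId++}`;
      const entry: TaskEntry = {
        taskId,
        command,
        status: "running",
        outputPath: outputPathFor(outputDir, taskId),
      };
      tasks.set(taskId, entry);
      return entry;
    },
    markExited(taskId: string): void {
      const entry = tasks.get(taskId);
      if (entry) entry.status = "exited";
    },
    list(): TaskEntry[] {
      return Array.from(tasks.values());
    },
  };
}
```

Counting only running entries against the cap means an exited task frees its slot immediately while still appearing in `list()`.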
+### Integration (agent layer)
+- Notification queue: enqueue on exit, drain on next run-start, format matches the documented XML.
+- `notified` latch: kill / read suppresses the would-be auto-notification.
+- `agent.destroy()` kills all running tasks AND flushes their output streams before returning.
+- Multiple completed tasks → multiple notification blocks in completion order.
+- A model that reads the output file gets a coherent log: no bytes are missing when the read happens right after the task exits, because the stream is flushed before the entry flips to `'exited'`.
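
One way to read the queue and latch behavior in these integration checks is sketched below. It assumes the latch is consulted at drain time, so a kill or read that happens after the exit but before the next run-start suppresses the pending notification; that reading, along with `createNotificationQueue`, `markNotified`, and the `summary` field, is an assumption rather than the shipped design.

```typescript
// Hypothetical per-task state; `summary` stands in for the rendered notification text.
interface NotifiableTask {
  taskId: string;
  notified: boolean;
  summary: string;
}

function createNotificationQueue() {
  const pending: NotifiableTask[] = [];
  return {
    // Called from the task's onExit path; push order is completion order.
    enqueue(task: NotifiableTask): void {
      pending.push(task);
    },
    // Called when the model already learned the outcome (e.g. kill or read).
    markNotified(task: NotifiableTask): void {
      task.notified = true;
    },
    // Called at run() start: empty the queue, skipping latched tasks.
    drain(): string[] {
      const out: string[] = [];
      for (const task of pending.splice(0)) {
        if (task.notified) continue;
        task.notified = true;
        out.push(task.summary);
      }
      return out;
    },
    // Called on session teardown so a fresh agent inherits nothing.
    clear(): void {
      pending.length = 0;
    },
  };
}
```

Setting `notified` inside `drain()` as well means a task is announced at most once even if it were enqueued twice.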
+### E2E (Phase 3)
+- TUI footer chip updates as tasks start and exit.
+- `ctrl+b` first press opens the modal; second press (or modal action) backgrounds all foreground tools.
+- Close-warning fires when running tasks exist.
+## Risks / known limitations
+1. **Process double-fork / setsid()** escapes our process group. A backgrounded command that itself daemonizes (`nginx`, some database servers) won't die under `process.kill(-pid)`. Documented limitation. Mitigation: advise users to background the actual daemon with `nohup` or `disown` if they want the lifetime decoupled from us.
+2. **`stdin` closed** — interactive REPLs / TUIs that expect a controlling terminal will misbehave. The stall watchdog (Phase 2) tries to detect this and tell the model to kill + retry with piped input. Not a silver bullet.
+3. **Output truncation surprises** — when the output cap trips, the model sees a marker mid-file. The marker is loud enough that the model should notice, but a poorly-prompted model could miss it and assume incomplete output is the real output. Mitigated by setting the cap high (10 MiB default) and surfacing the marker as a structured tag (`<output-truncated bytes-dropped="…"/>`) the model can pattern-match.
+4. **Resource leaks on host SIGKILL** — if the zidane process is force-killed (not graceful destroy), `detached: true` children survive and become PPID 1. The user has to `pkill` manually. Could mitigate with an env-var marker (`ZIDANE_TASK_OWNER=<sessionId>`) that lets a next-launch sweep identify our orphans — deferred to Phase 4.
+5. **`task_id` collision across context swaps** — sequential ids reset on `agent.destroy()`. If a session swaps execution contexts mid-flight (not a thing today, but conceivable), the new context starts at `bash_1` and could collide with notifications that reference the old context's ids. Document; ignore unless it becomes a real problem.
+6. **Race between `onExit` and the next `agent.run()`** — task exits, `onExit` enqueues notification, but the agent is mid-tool-batch on a *different* run. The notification waits for the run to complete, then drains at the next run-start. Correct, but means the model can sometimes see a notification one turn later than the kernel saw the exit. Acceptable.
+7. **File handle leaks on `agent.destroy()`** — must close every WriteStream before destroying the execution handle. The destroy ordering above puts task-kill first specifically for this reason; tests need to verify no file handles remain open.
+## Implementation footprint estimate
+| Layer | LOC (rough) |
+|---|---|
+| **Phase 1 — code** | |
+| `src/contexts/types.ts` — three method signatures + `TaskEntry` / `TaskExitInfo` types | ~50 |
+| `src/contexts/process.ts` — registry + spawn + file streams + group kill + `closeTask` | ~300 |
+| `src/contexts/sandbox.ts` / `docker.ts` — typed-undefined stubs | ~20 |
+| `src/agent.ts` — `pendingTaskNotifications` queue, drain at run-start, `<task-notification>` injection, destroy reorder | ~180 |
+| `src/tools/shell.ts` — `run_in_background` flag handling | ~80 |
+| `src/tools/shell-kill.ts` — new tool | ~60 |
+| `src/index.ts` — exports | ~10 |
+| `src/chat/types.ts` — `StreamEvent.kind: 'task-notification'` + fields | ~25 |
+| `src/chat/store.ts` (or wherever `eventsFromTurns` lives) — replay synthesis + dedupe | ~80 |
+| `src/tui/components.tsx` — banner renderer for the new event kind | ~80 |
+| **Phase 1 — tests** | |
+| `test/mock-context.ts` — task stubs | ~60 |
+| `test/background.test.ts` — context + agent + replay E2E | ~500 |
+| **Phase 1 — docs** | |
+| `docs/SKILL.md` / `docs/ARCHITECTURE.md` / `docs/TUI.md` / `README.md` | ~120 |
+| **Phase 2** — stall watchdog + output cap + tool aliases (code + tests + docs) | ~350 |
+| **Phase 3** — TUI chip + manage modal + `ctrl+b` keybind + close-warning (code + tests + docs) | ~400 |
+**Total** for Phases 1–3: ~2.3 KLOC. ~10 % larger than the post-redesign estimate because the synthesized `task-notification` StreamEvent + replay + banner rendering bumped Phase 1; still ~30 % smaller than the original ring-buffer / four-tool plan.
+## Status
+**Phase 1 shipped.** `shell({ run_in_background: true })`, `shell_kill`, `agent.killBackgroundTask`, `background:start` / `background:exit` / `background:reassign` hooks, `<task-notification>` injection + replay, subagent reassignment, TUI banner + cancel-tool modal integration, log-filename timestamping. See the [test files](../test/background.test.ts) and [`docs/SKILL.md` — Background tasks](SKILL.md#background-tasks) for the as-shipped behavior.
+Phases 2–3 (stall watchdog, output cap, manage-tasks modal, `ctrl+b` keybind, close-warning) remain on the backlog. Reopen this doc when picking those up.
+Code-quality checklist (15 items) and per-phase doc-update obligations are load-bearing parts of the plan — not afterthoughts. Each phase ships code + tests + docs together, or it doesn't ship.