npm - @aion0/forge - Versions diffs - 0.10.79 → 0.10.80 - Mend

@aion0/forge 0.10.79 → 0.10.80

Files changed (23)

package/RELEASE_NOTES.md +8 -5
package/app/api/tasks/[id]/hook/stop/route.ts +15 -0
package/app/api/tasks/route.ts +2 -1
package/cli/mw.mjs +7 -5
package/cli/mw.ts +8 -6
package/components/Dashboard.tsx +6 -2
package/components/TaskDetail.tsx +28 -1
package/components/TmuxTaskTerminal.tsx +105 -0
package/components/WebTerminal.tsx +7 -0
package/docs/design_automation_records/Automation Redesign.dc.html +2019 -0
package/docs/design_automation_records/README.md +232 -0
package/lib/chat/agent-loop.ts +6 -0
package/lib/chat/tool-dispatcher.ts +110 -9
package/lib/help-docs/05-pipelines.md +31 -0
package/lib/help-docs/25-chat-tools.md +23 -0
package/lib/pipeline.ts +27 -3
package/lib/task-manager.ts +73 -3
package/lib/task-tmux-backend.ts +625 -0
package/lib/workspace/skill-installer.ts +18 -8
package/package.json +1 -1
package/proxy.ts +5 -4
package/src/core/db/database.ts +1 -0
package/src/types/index.ts +3 -0

package/docs/design_automation_records/README.md ADDED

@@ -0,0 +1,232 @@
+# Handoff: Forge Automation — Pipeline & Task Redesign
+## Overview
+This is a redesign of the **Automation** area of Forge (a CI/agentic-automation product). It restructures the area into four pages with a clearer information architecture and fixes two core pain points: pipeline runs took too many clicks to inspect, and task logs were spread across tabs.
+The four pages:
+1. **Pipeline** — view & edit pipeline *definitions* (steps, commands, triggers). No run data here.
+2. **Pipeline Record** — a flat, filterable table of *all* pipeline execution records across every pipeline. Rows expand inline to show that run's own steps + the failed-step log. Failed runs can be **retried**; running runs can be **cancelled**.
+3. **Task** — tasks are one-shot (created → run once → done), so there is no separate "task definition." This single page combines **Create Task** + a flat, filterable list of recent/running tasks. Rows expand inline into a compact log/result/diff cockpit.
+4. **Task Detail** — a dedicated full page for a single task showing *everything*: command/prompt, full searchable log, lineage back to its pipeline run, a details sidebar, a phases timeline, and Result + Git Diff.
+A persistent **top navigation** (Schedules · Pipeline · Pipeline Record · Task) switches between pages.
+## About the Design Files
+The file in this bundle (`Automation Redesign.dc.html`) is a **design reference created in HTML** — a working prototype showing the intended look and behavior. It is **not production code to copy directly.** It is authored in a bespoke template runtime ("Design Component" / `.dc.html`) used only for prototyping; do not try to reuse that runtime.
+Your task is to **recreate these designs in the target codebase's existing environment** (React, Vue, Svelte, etc.) using its established component library, state patterns, data-fetching, and styling approach. If no front-end environment exists yet, pick the most appropriate framework for the project and implement there. Treat the HTML as the source of truth for layout, spacing, color, typography, copy, and interaction — but wire the data to real APIs and use the app's real components.
+## Fidelity
+**High-fidelity.** Colors, typography, spacing, and interactions are final and intended to be reproduced faithfully. Exact hex values, sizes, and copy are given below. The one caveat: all data in the prototype is **mock data** (modeled on a `fortinet-mantis-bug-fix-batch` pipeline). Replace it with real API data; keep the layouts and visual treatment.
+---
+## Design Tokens
+### Color — surfaces (dark, near-black terminal aesthetic)
+| Token | Hex | Usage |
+|---|---|---|
+| `bg/app` | `#08090b` | App background, content area |
+| `bg/deep` | `#070809` | Log panels, deepest wells |
+| `bg/panel` | `#0b0c0e` | Card panel background |
+| `bg/raised` | `#0d0e11` | Top bar, table headers, section headers |
+| `bg/well` | `#0a0b0d` | Sidebars, output wells |
+| `bg/card` | `#0f1013` | Overview cards |
+| `bg/row-active` | `#121317` / `#15171b` | Selected/expanded row |
+| `bg/input` | `#08090b` | Inputs, search fields |
+| `bg/chip` | `#1a1d22` | Neutral buttons / segmented active |
+### Color — borders & lines
+| Token | Hex | Usage |
+|---|---|---|
+| `border/strong` | `#2a2e35` | Button borders, modal border |
+| `border/default` | `#22252b` | Card borders, input borders |
+| `border/line` | `#1d1f24` | Section dividers |
+| `border/faint` | `#16181c` / `#131519` / `#15171a` | Row dividers, inner hairlines |
+### Color — status & accent
+| Token | Hex | Meaning |
+|---|---|---|
+| `accent/brand` (orange) | `#ff7d2e` | Primary action (Re-run, Create, New), brand mark |
+| `status/passed` (green) | `#46c25a` | passed / done / exit 0 |
+| `status/failed` (red) | `#f15b4a` | failed / cancelled / errors / exit 1 |
+| `status/running` (blue) | `#4f9dff` | running + all hyperlinks / active tab |
+| `status/skipped` (gray) | `#33363d` | skipped / pending step bars |
+| `kind/llm` (purple) | `#a978f0` | LLM-prompt step badge, cost values |
+| `kind/shell` (yellow) | `#d8a23f` | Shell-command step badge |
+Task status colors: done = `#46c25a` (green), failed = `#f15b4a` (red), running = `#4f9dff` (blue, with pulse), other/dim = `#7d828b`.
+### Color — text
+| Token | Hex | Usage |
+|---|---|---|
+| `text/primary` | `#f3f4f6` | Headings, key values |
+| `text/body` | `#cdd0d6` | Default body text |
+| `text/secondary` | `#c2c6cd` / `#c9cdd3` | Row labels |
+| `text/muted` | `#8b9098` / `#9aa0a8` | Descriptions |
+| `text/dim` | `#7d828b` / `#6b7079` | Meta, timestamps |
+| `text/faint` | `#565b63` / `#5e636b` | Uppercase column labels, line numbers |
+| `text/ghost` | `#3f434b` / `#34373d` | Log line-number gutter, separators |
+### Typography
+- **Mono (primary UI + all data):** `'JetBrains Mono', ui-monospace, monospace`. Weights 400/500/600/700. This is the dominant typeface — IDs, logs, table cells, commands, most labels.
+- **Sans (prose only):** `'Inter', sans-serif`. Used for page titles, descriptive paragraphs, and Overview cards.
+- Sizes (px): page title 18–26 / 700; section heading 13–15 / 700; row text 11.5–13; meta 10–11; uppercase eyebrow labels 10–11 / 600 with `letter-spacing: 1px` and `text-transform: uppercase`.
+- Line numbers and log text: 12–12.5px, `line-height: 1.9`.
+### Spacing, radius, effects
+- Radius: cards `12px`; inputs/buttons/chips `7–8px`; pills/badges `5–8px`; quick-filter chips (rounded) `16px`; status dots `50%`.
+- Card border: `1px solid #20232a`, radius `12px`, `overflow: hidden`.
+- Row vertical padding: `10–12px`; cell gap: `12px`.
+- Shadows: modal `0 24px 60px rgba(0,0,0,.5)`; status dots get a glow `0 0 6–8px <color>66/77`.
+- Status-dot glow pattern: `box-shadow: 0 0 7px <color>77`.
+### Keyframes (animations)
+```css
+@keyframes blink   { 0%,100%{opacity:1} 50%{opacity:.25} }          /* live "running" dot */
+@keyframes pulseRed{ 0%,100%{box-shadow:0 0 0 0 rgba(241,91,74,0)} 50%{box-shadow:0 0 0 4px rgba(241,91,74,.18)} } /* running step pulse */
+@keyframes slideIn { from{opacity:0; transform:translateY(4px)} to{opacity:1; transform:none} } /* inline-expand reveal */
+```
+Scrollbars: 9px, thumb `#2b2e34` radius 6px, transparent track.
+---
+## Screens / Views
+### Global chrome
+- **Top app bar** (`#0d0e11`, bottom border `#1d1f24`, padding `11px 20px`): orange rounded logo square (gradient `135deg, #ff7d2e, #ff5c4d`), "Forge / Automation" breadcrumb, and a right-aligned status legend (passed/failed/running/skipped swatches, 9×9px rounded `2px`).
+- **Page top-nav** (inside each page card, `#0d0e11`, bottom border): tabs `Schedules · Pipeline · Pipeline Record · Task`. Active tab = `#f3f4f6` text + 2px bottom border `#4f9dff` + weight 600; inactive = `#6b7079`. (Schedules is inert/out-of-scope.) The Task tab is "active" for both the Task list and the Task Detail page.
+---
+### 1. Pipeline (manage definitions)
+**Purpose:** view and edit pipeline definitions — their ordered steps, each step's type (shell/LLM) and command/prompt, and trigger.
+**Layout:** two-pane inside the page card, min-width 900px.
+- **Left rail (248px, border-right):** header row "Pipelines · 6" + an orange `+ New` button (`#ff7d2e`, text `#0b0c0e`, radius 6px). Scrollable list of pipelines; each item: status dot (last-run color) + name (`12.5px/600`), and a sub-line with pipeline id (mono, `#6b7079`), step count, run count. Selected item: left border `2px #ff7d2e`, bg `#15171b`.
+- **Right detail:** header (padding `18px 22px`, border-bottom) with pipeline name (`19px/700`), mono pipeline id, a `查看运行记录 →` ("View run records →") link (blue) that jumps to Pipeline Record filtered to this pipeline, and an **Edit** toggle button (default: ghost `#1a1d22` border `#2a2e35`; when editing: green `#46c25a` bg, dark text, label "Done"). A meta row shows `trigger · steps · runs · last <status dot>`.
+- **Steps list:** bordered container (`1px #1d1f24`, radius 10). Each step row (`12px 15px`, divider): when editing, a drag handle `⠿` (`#3f434b`) and a delete `✕`; index number (mono, faint); step name (`13px/600`); a kind badge (`shell` = yellow `#d8a23f`, `llm` = purple `#a978f0`, format `1px 7px` radius 5, bg `<color>1c`); and below, the command/prompt in mono `11.5px #7d828b`. When editing, an `+ Add step` link (green) appears above the list.
+**Edit mode is visual only in the prototype** (drag handles, add/delete affordances appear) — implement real editing per the codebase (reorder, inline edit of name/command, add/remove).
+---
+### 2. Pipeline Record (all execution records)
+**Purpose:** one flat table of every run across all pipelines; find a run by condition, expand it to see its own steps + the failed-step log, and retry/cancel.
+**Layout:** page card, the table area is min-width 1040px and **horizontally scrollable**; an **outer wrapper scrolls (overflow:auto, max-height 560px) with the min-width on an inner div** so that a right-pinned actions column can stay visible (see "Pinned actions" below).
+**Filter bar** (padding `13px 18px`, border-bottom, stacked rows, min-width 880px):
+- Search input (flex, mono): placeholder "按 run id / pipeline / 步骤名 筛选…" ("Filter by run id / pipeline / step name…"), with a clear `×` when non-empty.
+- A view toggle segmented control: **平铺 Flat** (default) / **按 pipeline 分组** ("Group by pipeline").
+- A count readout: `<total> runs · <failed> failed` (failed count in red).
+- Segmented filter groups, each a pill-row in a `#0c0d10` bordered container (`padding:3px`, inner pills `6px 12px` radius 7): **Status** (All/Failed/Running/Passed — active colors red/blue/green), **Time** (24h/7d/30d/All — default 7d), **Trigger** (All/◷ Scheduled/☞ Manual/⚡ Webhook), **Pipeline** (All + each pipeline name; active = orange).
+**Column header (sticky top, `#0a0b0d`):** chevron | Status | Pipeline ID | Pipeline | Run ID | Steps | Result | Trigger | Duration | Started | **Actions** (pinned right). Header labels are uppercase faint `10px`.
+**Run row** (flex, gap 12, `11px 16px`, divider; expanded row: left border 2px status color + bg `#121317`):
+- chevron `▸/▾` (faint, width 12)
+- Status cell (width 74): status dot (8×8, glow) + status label in status color, `11px/600`
+- Pipeline ID (width 90, mono `#6b7079`) — only in flat view
+- Pipeline name (width 196, `12px/600`) — only in flat view
+- Run ID (width 80, mono `#c9cdd3`)
+- **Steps bar** (flex, min 64): a thin segmented bar — one `6px` segment per step colored by that step's status (green/red/blue/gray), `gap 1.5px`, radius 1
+- Result/summary (width 150): `"failed · <step>"` (red) / `"running · <step>"` (blue) / `"N steps passed"` (dim)
+- Trigger (width 92): glyph + name, dim
+- Duration (width 64, right), Started (width 62, right)
+- **Pinned Actions cell** (width ~104, `position: sticky; right: 0`, with a left-fading gradient background `linear-gradient(90deg, transparent, #0c0d10 26%)`, `pointer-events:none` on wrapper, `pointer-events:auto` on the button). Shows a quick action: **↻ Retry** (orange-tinted) for failed runs, **■ Cancel** (red-tinted) for running runs, nothing for passed. **This pinned column is essential** — without it the actions scroll off the right edge of the wide table.
+**Group view** (`按 pipeline 分组`): a collapsible header per pipeline (`#0d0e11`): chevron, last-run dot, pipeline name (`13px/700`), run count, a red `N failed` badge, and a 12-cell recent-status sparkline on the right (each `7×14` rounded `2px`). Expanding shows that pipeline's rows (same row component, but without the pipeline id/name columns).
+**Expanded run detail** (inline, `slideIn` animation, bg `#08090b`, left border = run status color):
+- **Actions toolbar** (`#0d0e11`): an "Actions" label + buttons. Failed → `↻ Retry from <step>` + `↻ Re-run all`. Running → `■ Cancel` + `⊙ Follow live`. Passed → `↻ Re-run`. Buttons are neutral chips (`#1a1d22`, border `#2a2e35`).
+- **This run's own steps** (horizontal, scrollable): a label "`N steps · 此 run 自己的步骤 · 点步骤切日志`" ("this run's own steps · click a step to switch logs"), then a row of step pills. Each pill shows a status glyph (✓/✕/◐/–) in status color + step name; the focused pill gets a colored border/bg, and a running pill pulses. Clicking a pill changes the focused step. **Important:** each run renders *its own* step list (runs are heterogeneous: different runs can have different steps), so never assume a fixed step schema.
+- **Focused-step panel** (split): left = the step's **log** (mono `11.5px`, error lines red, `✓` green, `→` blue) with an `Open task detail →` link (blue) that opens the Task Detail page for that step's task; right (width ~268–300) = **Step output** key/value list (`key` faint width ~84–96, `value` mono colored).
+**Defaults:** flat view, time = 7d, the `25b1fc48` (failed apply-fix) run pre-expanded.
+**Retry behavior:** Retry creates a *new* run with status `running` prepended to the list (id like `re1_25b1`), and resets filters to show it. **Cancel** flips a running run to a cancelled state (rendered as failed, status label "cancelled").
+---
+### 3. Task (create + records, one-shot)
+**Purpose:** create a one-shot task and browse recent/running tasks. Tasks have no separate definition — create = run.
+**Header:** Inter title "Task · 任务" ("任务" = "Task") + a small "页面" ("Page") tag; description paragraph explaining tasks are one-shot.
+**Filter bar** (like Pipeline Record, min-width 980px):
+- Search (mono) placeholder "按 task id / 命令 / 步骤 / pipeline 搜索…" ("Search by task id / command / step / pipeline…"), count readout `<n> tasks · <failed> failed`, and an orange **`+ 新建任务`** ("+ New task") button.
+- **Quick filters**: a row of rounded (16px) chips: `全部` (All) / `失败` (Failed, red) / `运行中` (Running, blue) / `scratch` (yellow) / `有费用` (Has cost, purple) / `今天` (Today, blue). Active chip = colored border + `<color>1e` bg + colored text.
+- Segmented filters: **Status** (All/Done/Failed/Running), **Provider** (All + distinct providers), **Time** (24h/7d/30d/All), **Pipeline** (All + names).
+**Column header (sticky):** Status | Task ID | Pipeline · Step | Provider | Command | Cost | Dur | Started | (expand chevron).
+**Task row** (flex, gap 12, `10px 16px`, divider; expanded: bg `#121317` + left border status color):
+- Status cell (width 76): dot + status text in status color
+- Task ID (width 86, mono `#c9cdd3`)
+- Pipeline · Step (width 200): pipeline name on top (`12px`), step name below (`10.5px #6b7079`)
+- Provider (width 80, `11.5px/600`; "scratch" = dim)
+- Command (flex, mono `11px #8b9098`, ellipsized)
+- Cost (width 58, purple, right), Dur (width 58, right), Started (width 62, right)
+- expand chevron `▸/▾` (width 16, right)
+**Inline expanded cockpit** (compact T3, bg `#070809`, left border status color):
+- Header: live dot, `task://<id>`, step name (`12px/700`), status badge; right side: `打开完整详情 ↗` ("Open full details ↗"; blue → opens Task Detail page), plus `↻ Retry` (failed) / `■ Cancel` (running) / `Re-run`.
+- A search + "errors N" toggle row (`#0a0b0d`).
+- **Log** (mono `12.5px`, line numbers in ghost gutter, error lines get red left-gutter `3px` + faint red row bg) with a **right-edge minimap** (13px column; one `5px` mark per line; red marks = error lines).
+- Below: **Result** (mono `11px`, `white-space:pre`) and **Git Diff** side by side (add lines green on `#46c25a14`, del red on `#f15b4a14`, hunk purple, meta dim).
+**Create Task modal** (centered, overlay `rgba(4,5,7,.72)`, panel 560px, `#0e1013`, border `#2a2e35`, radius 14, shadow):
+- Header "新建任务" ("New task") + subtitle "创建即运行 · 一次性" ("Runs on creation · one-shot") + `×`.
+- **Provider / Repo** `<select>` (FortiNAC / scratch / canary / regress / depbump / docs).
+- **Type** segmented: `⌘ Shell` (active = yellow `#d8a23f` bg) / `✦ LLM Prompt` (active = purple `#a978f0` bg).
+- **Command / Prompt** `<textarea>` (min-height 96, mono), placeholder changes with type.
+- Footer: hint text, `取消` ("Cancel"), and an orange **`▶ 创建并运行`** ("▶ Create and run"; disabled/greyed until the field is non-empty).
+- On run: prepend a new task `{ status: running, provider, step: command|prompt, summary: <input> }` to the list, close modal, reset filters so it's visible.
+---
+### 4. Task Detail (full single-task page)
+**Purpose:** the complete view of one task — richer than the inline peek. Reached from Pipeline Record's `Open task detail →` and the Task list's `打开完整详情 ↗`.
+**Layout:** breadcrumb `← Task / Task Detail` above the card; inside the card: top-nav (Task active), header, lineage banner, a main/sidebar split, then Result + Diff full-width. Min-width 900px.
+- **Header** (`18px 22px`, border-bottom): big status dot (11×11, glow), step name (`19px/700` mono), status badge, mono task id; right: **↻ Retry** (failed, solid orange) / **■ Cancel** (running, red-tinted) / **Re-run** (neutral).
+- **Lineage banner** (`#0e1622`, border `#1d2b3d`, clickable): `来自 <pipeline name> › run <run id> (blue) › <step>` ("来自" = "From") + right link `在 Pipeline Record 中查看 ↗` ("View in Pipeline Record ↗"). Clicking jumps to Pipeline Record with that run expanded. Hidden for ad-hoc/scratch tasks.
+- **Main column** (flex 1, border-right):
+  - **Command / Prompt** block: uppercase label, then a bordered well (`#070809`, border `#20232a`, radius 8) with the command in mono `12px`.
+  - **Log** header row: "Log" label + line count + an inline search field + "errors N" toggle.
+  - **Log body** (max-height 300, mono `12.5px/1.95`, line numbers ghost gutter width 26, error rows tinted) + **right minimap** (13px).
+- **Sidebar** (width 280, `#0a0b0d`):
+  - **Details** key/value list: Task ID, Status (colored), Type (shell yellow / LLM purple), Provider/Repo, Exit code (0 green / 1 red / — dim), Started, Duration, Cost (purple), and for LLM tasks: Model (`claude-sonnet-4.6`) + Tokens; for bugfix pipeline: Branch (`fix/...`, blue). Each: faint key (width 108) + mono value.
+  - **Phases** timeline: queued → provision env → execute → collect output, each with a status glyph + label. Failed task: execute = ✕ red, collect = skipped; running: execute = ◐ blue, collect = pending; done: all green.
+- **Result + Git Diff** (full-width, border-top, split): same treatment as the inline cockpit, max-height 200 each.
+---
+## Interactions & Behavior
+- **Navigation:** top-nav tabs and a left rail both set the active page. No full page reload; client-side view switch.
+- **Row expand/collapse:** clicking a Pipeline Record run row or a Task row toggles an inline detail region (`slideIn`, 0.15s). Only one expanded at a time (single `expandedRun` / `expandedTask` id).
+- **Step focus:** inside an expanded run, clicking a step pill swaps the focused-step log/output (per-run focus map).
+- **Cross-links:** Pipeline → Pipeline Record (filtered); Pipeline Record step → Task Detail (mapped to that step's task); Task list/peek → Task Detail; Task Detail lineage → Pipeline Record (run expanded).
+- **Retry:** (runs and tasks) spawns a new `running` record at the top; resets filters so it's visible.
+- **Cancel:** flips a `running` record to cancelled (shown as failed).
+- **Create Task:** modal; run button disabled until command/prompt non-empty; on submit prepend running task.
+- **Search/filter:** all filters compose (AND). Search matches id / name / step / command / provider. Quick-filter chips are shortcuts that set the underlying filter state.
+- **Live "running":** running dots use the `blink` keyframe; running steps/cells use `pulseRed`.
+- **Pinned actions column** in Pipeline Record uses `position: sticky; right: 0` against a full-width scroll wrapper so Retry/Cancel never scroll out of view.
+## State Management
+Single view-model/state object (mirror in the target app's state layer):
+- `activePage`: `'pipe' | 'p1' (pipeline record) | 'trecord' (task) | 'tdetail' (task detail)` (+ legacy exploration views, ignore).
+- Pipeline Record: `recStatus, recPipe, recTime, recTrigger, recSearch, recGroup ('timeline'|'pipeline'), expandedRun, runFocus{runId:stepName}, expandedGroups{}, extraRuns[] (retried), cancelledRuns{}`.
+- Pipeline mgmt: `selectedPipelineId, editMode`.
+- Task: `trStatus, trProvider, trPipe, trTime, trSearch, trCost(bool), expandedTask, createdTasks[], cancelledTasks{}, createModalOpen, createForm{provider,mode,cmd}`.
+- Task Detail: `selectedTaskId`, plus shared log controls `taskSearch, errorsOnly`.
+- Data needs (replace mock): list of pipeline definitions (id, name, trigger, ordered steps each with type + command); list of run records (id, pipeline, status, started, duration, trigger, per-step statuses, focused/failed step, per-step logs + outputs); list of task records (id, status, provider, pipeline+step, command/prompt, cost, duration, started, parent run id, log lines, result JSON, git diff).
+## Assets
+None. No image/icon files — all glyphs are Unicode characters (`✓ ✕ ◐ – ▸ ▾ ↻ ■ ⊙ ⌕ ↗ → › ⠿ ⌘ ✦ ◷ ☞ ⚡`) and the only fonts are Google Fonts **JetBrains Mono** and **Inter**. Substitute the codebase's icon set if preferred; keep meanings.
+## Files
+- `Automation Redesign.dc.html` — the full prototype containing all four final pages (plus the top-nav, Overview, and earlier design explorations behind other nav items, which are reference-only).
+- The earlier exploration rounds live in sibling files in the project (`Automation Redesign — Round 1/2.dc.html`) and are **not** part of this handoff — the four pages above are the settled design.
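
Note on the README's "Search/filter" rule (all filters compose with AND, and search matches id / name / step): the rule could be expressed as one pure predicate over a run record. This is only a minimal sketch under stated assumptions. The `RunRecord` and `RunFilters` shapes, their field names, and `matchesFilters` are hypothetical and do not come from the package; only the AND semantics and the filter groups (Status, Pipeline, Trigger, Time, Search) come from the README.

```typescript
// Hypothetical shapes for illustration only; the package's real run and
// filter types are not shown in this diff.
type RunStatus = 'passed' | 'failed' | 'running';
type Trigger = 'scheduled' | 'manual' | 'webhook';

interface RunRecord {
  id: string;
  pipeline: string;
  status: RunStatus;
  trigger: Trigger;
  steps: string[];   // step names of this run (runs may differ from each other)
  startedAt: number; // epoch milliseconds
}

interface RunFilters {
  status: RunStatus | 'all';
  pipeline: string;        // 'all' or a pipeline name
  trigger: Trigger | 'all';
  maxAgeMs: number | null; // null means the "All" time range
  search: string;
}

// A run is shown only if it passes every active filter (logical AND).
function matchesFilters(run: RunRecord, f: RunFilters, now: number): boolean {
  const q = f.search.trim().toLowerCase();
  const haystack = [run.id, run.pipeline, ...run.steps].join(' ').toLowerCase();
  return (
    (f.status === 'all' || run.status === f.status) &&
    (f.pipeline === 'all' || run.pipeline === f.pipeline) &&
    (f.trigger === 'all' || run.trigger === f.trigger) &&
    (f.maxAgeMs === null || now - run.startedAt <= f.maxAgeMs) &&
    (q === '' || haystack.includes(q))
  );
}
```

Keeping the predicate pure means the quick-filter chips can remain plain shortcuts that write into the same filter object, which is how the README describes them.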

package/lib/chat/agent-loop.ts CHANGED

@@ -402,6 +402,12 @@ function buildSystemPrompt(
     '',
     '- Reply without tools ONLY when no system + no time question is involved.',
     '',
+    '',
+    'Files & task results — NEVER fabricate:',
+    '- A dispatched task\'s real deliverable is usually a FILE it wrote into the project repo (e.g. "docs/report.md"), NOT its result_summary. To get it, call read_project_file({project, path}). read_forge_file only reaches Forge\'s data dir (tmp/scratch/flows/...), NOT project repos.',
+    '- If a file read returns not-found / a tool fails, SAY SO plainly and report the path you tried. NEVER reconstruct, summarize-from-memory, or invent file contents — a guessed report presented as the real one is a serious error.',
+    '- To read any file (a task\'s output, a source file), prefer read_project_file / read_forge_file over spawning a dispatch_task. Never start a task just to cat/display a file.',
+    '',
     'Other:',
     '- Call get_current_time when asked about "now" or "today".',
     'Keep replies short and direct.',

package/lib/chat/tool-dispatcher.ts CHANGED

@@ -214,7 +214,7 @@ const BUILTINS: Record<string, BuiltinHandler> = {
   // required (no default) vs optional (have default) so the agent can omit
   // optional ones rather than passing wrong placeholder values.
   trigger_pipeline: async (input) => {
-    const params = (input as { workflow?: string; input?: Record<string, unknown>; skills?: unknown } | undefined) || {};
+    const params = (input as { workflow?: string; input?: Record<string, unknown>; skills?: unknown; backend?: unknown } | undefined) || {};
     const { listWorkflows, startPipeline, getPipeline, getWorkflow } = await import('../pipeline');
     if (!params.workflow) {
       const workflows = listWorkflows();
@@ -327,7 +327,13 @@ const BUILTINS: Record<string, BuiltinHandler> = {
       }
     }
-    const pipeline = startPipeline(params.workflow, stringInput, { skills: skills.length ? skills : undefined });
+    // Runtime backend override: "use tmux" / "用 tmux 方式" makes every task in
+    // this run use the tmux backend, regardless of the workflow's declared default.
+    const backendOverride = params.backend === 'tmux' || params.backend === 'headless' ? params.backend : undefined;
+    const pipeline = startPipeline(params.workflow, stringInput, {
+      skills: skills.length ? skills : undefined,
+      backend: backendOverride,
+    });
     const fresh = getPipeline(pipeline.id) || pipeline;
     const errors: string[] = [];
     if (pipeline.status === 'failed') {
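
Note on the hunk above: the override accepts only the two known backend names, and any other value becomes `undefined` so that the workflow's own `backend:` setting applies. A minimal standalone sketch of that normalization follows. `normalizeBackend` and the `Backend` type are hypothetical names used for illustration; the diff performs the check inline.

```typescript
type Backend = 'tmux' | 'headless';

// Hypothetical helper mirroring the inline check in trigger_pipeline:
// unknown or misspelled values are dropped rather than passed through.
function normalizeBackend(value: unknown): Backend | undefined {
  if (value === 'tmux') return 'tmux';
  if (value === 'headless') return 'headless';
  return undefined;
}

// Example: build start options the way the hunk does, omitting empty skills.
function buildStartOptions(skills: string[], backend: unknown) {
  return {
    skills: skills.length ? skills : undefined,
    backend: normalizeBackend(backend),
  };
}
```

The comparison is case-sensitive, as in the diff, so a value such as `"TMUX"` would be ignored and the workflow default would be used.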
@@ -522,6 +528,56 @@ const BUILTINS: Record<string, BuiltinHandler> = {
     });
   },
+  // Read a file from inside a PROJECT repo (the working dir a task ran in),
+  // NOT the Forge data dir. This is the companion to read_forge_file: a
+  // dispatch_task that writes `docs/report.md` writes it into the project
+  // repo, which resolveDataPath (read_forge_file) can never reach. Path is
+  // resolved relative to the project root with traversal + sensitive-file
+  // guards so it can't escape the repo or leak secrets.
+  read_project_file: async (input) => {
+    const params = (input as { project?: string; path?: string; filename?: string; as_base64?: boolean } | undefined) || {};
+    const projectName = (params.project || '').trim();
+    const rel = (params.path || params.filename || '').trim();
+    if (!projectName) return JSON.stringify({ ok: false, error: 'project is required (Forge project name — call list_forge_context for valid names)' });
+    if (!rel) return JSON.stringify({ ok: false, error: 'path is required (project-relative, e.g. "docs/report.md")' });
+    const { getProjectInfo } = await import('../projects');
+    const proj = getProjectInfo(projectName);
+    if (!proj) return JSON.stringify({ ok: false, error: `Project not found: ${projectName}. Call list_forge_context for valid names.` });
+    const { resolve, isAbsolute, sep } = await import('node:path');
+    const root = proj.path;
+    if (isAbsolute(rel) || rel.split(/[\\/]/).includes('..')) {
+      return JSON.stringify({ ok: false, error: 'path must be project-relative — no leading "/" and no ".." segments' });
+    }
+    const target = resolve(root, rel);
+    if (target !== root && !target.startsWith(root + sep)) {
+      return JSON.stringify({ ok: false, error: 'resolved path escapes the project directory' });
+    }
+    // Sensitive-file guard: never hand back secrets even from inside a repo.
+    if (/(^|\/)\.git\//.test(rel) || /(^|\/)\.env(\.|$)/i.test(rel) || /\.(pem|key|p12|pfx)$/i.test(rel) || /(^|\/)\.encrypt-key$/.test(rel) || /(^|\/)id_(rsa|ed25519)$/.test(rel)) {
+      return JSON.stringify({ ok: false, error: `Refused: "${rel}" looks like a sensitive file (git internals / env / private key).` });
+    }
+    const { readFile } = await import('node:fs/promises');
+    let buf: Buffer;
+    try { buf = await readFile(target); }
+    catch { return JSON.stringify({ ok: false, error: `File not found in project ${projectName}: ${rel}` }); }
+    if (params.as_base64) {
+      return JSON.stringify({
+        ok: true, project: projectName, path: rel,
+        local_path: target, file_url: `file://${target}`,
+        encoding: 'base64', size_bytes: buf.length, content: buf.toString('base64'),
+      });
+    }
+    const MAX = 256 * 1024;
+    const truncated = buf.length > MAX;
+    return JSON.stringify({
+      ok: true, project: projectName, path: rel,
+      local_path: target, file_url: `file://${target}`,
+      encoding: 'utf-8', size_bytes: buf.length, truncated,
+      content: buf.subarray(0, MAX).toString('utf8'),
+      ...(truncated ? { note: `content truncated to ${MAX} bytes — use as_base64 for the full file` } : {}),
+    });
+  },
   // Extract a zip/tar/gz archive sitting in tmp/ (e.g. one a connector just
   // downloaded via the _files channel) into tmp/<base>-extracted/, then
   // return the file listing so the agent can read_forge_file each entry.
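
Note on `read_project_file` in the hunk above: its two guards (path containment and the sensitive-file patterns) can be read as two small pure functions. The sketch below restates them in isolation, assuming Node's `node:path` module. `resolveInsideRoot` and `looksSensitive` are hypothetical names; the regular expressions are copied from the diff. One deliberate difference: this sketch normalizes `root` with `resolve` first, whereas the diff uses `proj.path` as given.

```typescript
import { isAbsolute, resolve, sep } from 'node:path';

// Hypothetical helper: returns the absolute target path, or null when `rel`
// is absolute, contains a ".." segment, or resolves outside `root`.
function resolveInsideRoot(root: string, rel: string): string | null {
  if (isAbsolute(rel) || rel.split(/[\\/]/).includes('..')) return null;
  const base = resolve(root); // normalization step added in this sketch
  const target = resolve(base, rel);
  if (target !== base && !target.startsWith(base + sep)) return null;
  return target;
}

// Same patterns as the sensitive-file guard in the hunk above.
function looksSensitive(rel: string): boolean {
  return (
    /(^|\/)\.git\//.test(rel) ||
    /(^|\/)\.env(\.|$)/i.test(rel) ||
    /\.(pem|key|p12|pfx)$/i.test(rel) ||
    /(^|\/)\.encrypt-key$/.test(rel) ||
    /(^|\/)id_(rsa|ed25519)$/.test(rel)
  );
}
```

Both checks operate on the path string only. Neither follows symlinks, so a symlink inside the repo that points outside it would not be caught by this sketch.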
@@ -657,7 +713,7 @@ const BUILTINS: Record<string, BuiltinHandler> = {
   // required (defaults to 'scratch' if not given). Returns the task id; the
   // caller can ask "what's the status of task <id>?" later — we don't block.
   dispatch_task: async (input) => {
-    const params = (input as { project?: string; prompt?: string; agent?: string } | undefined) || {};
+    const params = (input as { project?: string; prompt?: string; agent?: string; backend?: string } | undefined) || {};
     if (!params.prompt) return JSON.stringify({ ok: false, error: 'prompt is required' });
     const { getProjectInfo, SCRATCH_PROJECT_NAME } = await import('../projects');
     const projectName = params.project?.trim() || SCRATCH_PROJECT_NAME;
@@ -670,6 +726,7 @@ const BUILTINS: Record<string, BuiltinHandler> = {
       prompt: params.prompt,
       conversationId: '',
       agent: params.agent || undefined,
+      backend: params.backend === 'tmux' ? 'tmux' : undefined,
     });
     return JSON.stringify({
       ok: true,
@@ -701,20 +758,37 @@ const BUILTINS: Record<string, BuiltinHandler> = {
   // so start_watch can poll via done_path="terminal" or done_match
   // {path:"status", equals:"done"}.
   get_task_status: async (input) => {
-    const params = (input as { task_id?: string } | undefined) || {};
+    const params = (input as { task_id?: string; full?: boolean } | undefined) || {};
     if (!params.task_id) return JSON.stringify({ ok: false, error: 'task_id is required (returned by dispatch_task)' });
     const { getTask } = await import('../task-manager');
     const task = getTask(params.task_id);
     if (!task) return JSON.stringify({ ok: false, error: `Task "${params.task_id}" not found` });
-    return JSON.stringify({
+    // result_summary is a short headline (capped in DB). With full:true we also
+    // return the tail of the task's own log so the caller can see the real
+    // narration instead of guessing — but if the task wrote a file, the file is
+    // the deliverable; read it with read_project_file rather than parsing this.
+    const summaryCap = params.full ? 4000 : 1000;
+    const base: Record<string, unknown> = {
       id: task.id,
       status: task.status,
       terminal: task.status === 'done' || task.status === 'failed' || task.status === 'cancelled',
       project: task.projectName,
-      ...(task.resultSummary ? { result_summary: String(task.resultSummary).slice(0, 1000) } : {}),
+      ...(task.resultSummary ? { result_summary: String(task.resultSummary).slice(0, summaryCap) } : {}),
       ...(task.error ? { error: String(task.error).slice(0, 500) } : {}),
       ...(task.completedAt ? { completed_at: task.completedAt } : {}),
-    });
+    };
+    if (params.full) {
+      const { getTaskLogSlice } = await import('../task-manager');
+      const { entries } = getTaskLogSlice(task.id, { limit: 60, truncate: 4000 });
+      const text = entries
+        .map((e: any) => (typeof e.content === 'string' ? e.content : ''))
+        .filter(Boolean)
+        .join('\n')
+        .slice(-12000);
+      if (text) base.output_tail = text;
+      base.note = 'output_tail = the task\'s own log narration (last entries). If the task wrote a file in the project repo (e.g. docs/report.md), THAT file is the real deliverable — read it with read_project_file. Never reconstruct/guess file contents.';
+    }
+    return JSON.stringify(base);
   },
   // List Forge's own help/documentation files so the chat agent can answer
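
Note on `get_task_status` with `full: true` in the hunk above: `output_tail` is built by keeping only string log contents, joining them with newlines, and keeping the last 12000 characters. A minimal sketch of that step follows. The `LogEntry` shape and the `buildOutputTail` name are assumptions for illustration; the real entries come from `getTaskLogSlice`, whose return type is not shown in this diff.

```typescript
// Hypothetical entry shape; only `content` is read, as in the diff.
interface LogEntry {
  content?: unknown;
}

// Keep string contents, drop empty ones, join with newlines, and return
// the last `maxChars` characters (maxChars is expected to be positive).
function buildOutputTail(entries: LogEntry[], maxChars = 12000): string {
  return entries
    .map((e) => (typeof e.content === 'string' ? e.content : ''))
    .filter(Boolean)
    .join('\n')
    .slice(-maxChars);
}
```

Because the slice counts characters from the end, the tail can begin in the middle of a line. The tail is log narration only; as the note string in the diff says, a file the task wrote is read separately with `read_project_file`.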
@@ -963,6 +1037,11 @@ export const BUILTIN_TOOL_DEFS: BuiltinToolDef[] = [
           items: { type: 'string' },
           description: 'Forge skills (by name) to make available to every Claude task inside the pipeline — injected via --append-system-prompt. Pass when the user mentions skill names ("用 git-savvy", "with the code-reviewer skill"). Call list_forge_context to validate names. Omit if the user didn\'t mention any.',
         },
+        backend: {
+          type: 'string',
+          enum: ['tmux', 'headless'],
+          description: 'Execution backend override for EVERY task in this run. Set "tmux" when the user says "use tmux" / "用 tmux 方式" / "subscription mode" (interactive claude, subscription billing). Set "headless" to force default claude -p. OMIT to honor whatever the workflow YAML declares (default headless). This overrides the workflow\'s own backend: field.',
+        },
       },
     },
   },
@@ -987,7 +1066,7 @@ export const BUILTIN_TOOL_DEFS: BuiltinToolDef[] = [
   },
   {
     name: 'dispatch_task',
-    description: 'Dispatch a one-shot background Claude CLI task in a Forge project. EXPENSIVE — spawns a fresh Claude subprocess that reads/edits code in the target project. Use ONLY for genuine codebase work: "analyze X repo and write findings", "run the test suite and summarize", "fix bug in <project>", "refactor module Y". \n\nDO NOT use for: \n  • Saving a file with content YOU already have → use save_tmp_file. \n  • Reading a Forge-owned file (tmp/scratch/flows/prompts/...) → use read_forge_file. \n  • Listing files in <dataDir>/ → use list_forge_files. \n  • Running a pipeline → use trigger_pipeline. \n  • Inspecting a saved task → use get_task_status. \n\nFor "create / write / save a file with this content" the right tool is ALWAYS save_tmp_file — the LLM has the content, no CLI subprocess needed. \n\nIf the user\'s ask is ambiguous (might be a quick save vs a real codebase task), STOP and ask before dispatching — a user reporting "I just wanted a file" after seeing a task spawn is a clear signal you misclassified. \n\nReturns JSON: {ok, task_id, project, status, hint}. The task runs in the background; if the user wants completion notification, follow the hint — call start_watch on get_task_status and STOP polling in this conversation.',
+    description: 'Dispatch a one-shot background Claude CLI task in a Forge project. EXPENSIVE — spawns a fresh Claude subprocess that reads/edits code in the target project. Use ONLY for genuine codebase work: "analyze X repo and write findings", "run the test suite and summarize", "fix bug in <project>", "refactor module Y". \n\nDO NOT use for: \n  • Saving a file with content YOU already have → use save_tmp_file. \n  • Reading a Forge-owned file (tmp/scratch/flows/prompts/...) → use read_forge_file. \n  • Reading a file inside a PROJECT repo (e.g. a report a prior task wrote to docs/X.md, or any source file) → use read_project_file. NEVER spawn a task just to cat/display a file. \n  • Listing files in <dataDir>/ → use list_forge_files. \n  • Running a pipeline → use trigger_pipeline. \n  • Inspecting a saved task → use get_task_status. \n\nFor "create / write / save a file with this content" the right tool is ALWAYS save_tmp_file — the LLM has the content, no CLI subprocess needed. \n\nIf the user\'s ask is ambiguous (might be a quick save vs a real codebase task), STOP and ask before dispatching — a user reporting "I just wanted a file" after seeing a task spawn is a clear signal you misclassified. \n\nReturns JSON: {ok, task_id, project, status, hint}. The task runs in the background; if the user wants completion notification, follow the hint — call start_watch on get_task_status and STOP polling in this conversation.',
     input_schema: {
       type: 'object',
       properties: {
@@ -1003,6 +1082,11 @@ export const BUILTIN_TOOL_DEFS: BuiltinToolDef[] = [
           type: 'string',
           description: 'Optional agent id override. Omit to use the project default.',
         },
+        backend: {
+          type: 'string',
+          enum: ['tmux'],
+          description: 'Set to "tmux" to run via interactive tmux session (subscription billing, no API key needed). Omit for default (claude -p, API billing). Use tmux when the user says "use tmux", "subscription mode", or "interactive mode". Set it based on the CURRENT user request only — do NOT carry a previous task\'s tmux choice onto unrelated follow-ups (e.g. a trivial file read), and prefer read_project_file/read_forge_file over a task entirely for reads.',
+        },
       },
       required: ['prompt'],
     },
@@ -1037,6 +1121,19 @@ export const BUILTIN_TOOL_DEFS: BuiltinToolDef[] = [
       required: ['filename'],
     },
   },
+  {
+    name: 'read_project_file',
+    description: 'Read a file from inside a PROJECT repo — the working directory a dispatch_task ran in. THIS is how you read a file a task produced (e.g. a task that "writes findings to docs/report.md" puts it in the project repo, NOT the Forge data dir — read_forge_file CANNOT reach it). Whenever get_task_status says a task wrote/saved a file at a repo-relative path, read it here. `project` is the Forge project name (same value you pass to dispatch_task/pipeline input.project); `path` is repo-relative ("docs/report.md", "src/foo.ts"). Returns decoded UTF-8 (capped 256KB; as_base64:true for binary). Path traversal (../) and sensitive files (.git internals, .env, private keys) are refused. NEVER fabricate file contents — if this returns not-found, tell the user; do not reconstruct from memory.',
+    input_schema: {
+      type: 'object',
+      properties: {
+        project: { type: 'string', description: 'Forge project name (e.g. "FortiNAC"). Call list_forge_context if unsure of valid names.' },
+        path: { type: 'string', description: 'Repo-relative path, e.g. "docs/mantis-1296959-analysis.md", "src/index.ts". No leading "/" and no ".." segments.' },
+        as_base64: { type: 'boolean', description: 'Return raw bytes base64-encoded instead of decoded UTF-8. Use for binary files.' },
+      },
+      required: ['project', 'path'],
+    },
+  },
   {
     name: 'extract_archive',
     description: 'Unpack an archive sitting in tmp/ (e.g. one a connector just downloaded — owa.download_attachment etc.) into tmp/<base>-extracted/, and return the file listing. USE THIS when an attachment / download is a .zip / .tar / .tar.gz / .tgz / .gz and you need to read what is inside — you have no shell, this runs unzip/tar for you. Then read individual entries with read_forge_file using each returned `path`. Returns JSON {ok, extracted_dir, count, files:[{path,size_bytes}]}.',
@@ -1073,7 +1170,7 @@ export const BUILTIN_TOOL_DEFS: BuiltinToolDef[] = [
   },
   {
     name: 'get_task_status',
-    description: "Check a dispatched Forge task's status + result by id. Pass task_id (returned by dispatch_task). Returns JSON: {id, status: 'queued'|'running'|'done'|'failed'|'cancelled', terminal: bool, project, result_summary?, error?, completed_at?}. For start_watch, use done_path=\"terminal\" (fires on done/failed/cancelled) or done_match={path:\"status\",equals:\"done\"}.",
+    description: "Check a dispatched Forge task's status + result by id. Pass task_id (returned by dispatch_task). Returns JSON: {id, status: 'queued'|'running'|'done'|'failed'|'cancelled', terminal: bool, project, result_summary?, error?, completed_at?, output_tail?}. `result_summary` is a SHORT headline — if the task wrote a file in the project repo (e.g. docs/report.md), read that file with read_project_file; it is the real deliverable. Set full:true to also get `output_tail` (the task's own log narration) when no file was written. NEVER reconstruct/guess a task's output from memory. For start_watch, use done_path=\"terminal\" (fires on done/failed/cancelled) or done_match={path:\"status\",equals:\"done\"}.",
     input_schema: {
       type: 'object',
       properties: {
@@ -1081,6 +1178,10 @@ export const BUILTIN_TOOL_DEFS: BuiltinToolDef[] = [
           type: 'string',
           description: 'Task id (returned by dispatch_task).',
         },
+        full: {
+          type: 'boolean',
+          description: 'Also return `output_tail` (tail of the task log) and a longer result_summary. Use when result_summary is too short to answer and the task did NOT write a file you can read with read_project_file.',
+        },
       },
       required: ['task_id'],
     },
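
The `full: true` branch of `get_task_status` earlier in this file's diff builds `output_tail` by keeping only string log contents, joining them with newlines, and keeping the last 12000 characters. Below is a minimal standalone sketch of that step. The `LogEntry` type and the `buildOutputTail` name are hypothetical, since the real entry type returned by `getTaskLogSlice` is not shown in this diff.

```typescript
// Hypothetical entry shape: the diff only shows that `content` may or may not be a string.
interface LogEntry {
  content: unknown;
}

// Keep string contents only, drop empty ones, join with newlines,
// then keep the last `maxChars` characters (12000 in the diff).
function buildOutputTail(entries: LogEntry[], maxChars: number = 12000): string {
  return entries
    .map((e) => (typeof e.content === 'string' ? e.content : ''))
    .filter(Boolean)
    .join('\n')
    .slice(-maxChars);
}

// Non-string and empty contents are skipped.
const tail = buildOutputTail([
  { content: 'step 1 done' },
  { content: { tool: 'bash' } },
  { content: '' },
  { content: 'wrote docs/report.md' },
]);
console.log(tail);
```

Slicing from the end rather than the start means a long log loses its oldest lines first, which matches the "tail" naming in the tool description.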

package/lib/help-docs/05-pipelines.md CHANGED

@@ -109,6 +109,37 @@ pipeline.
 | `outputs` | Extract results (see Output Extraction) | `[]` |
 | `routes` | Conditional routing to next nodes (see Routing) | `[]` |
 | `max_iterations` | Max loop iterations for routed nodes | `3` |
+| `backend` | `tmux` (interactive claude, subscription billing) or `headless` (`claude -p`). Overrides the workflow-level `backend`. | inherits workflow |
+### Execution backend (`backend: tmux`)
+By default every node runs **headless** (`claude -p`, API billing). Set a
+top-level `backend: tmux` to run all nodes as interactive claude inside a
+dedicated per-node tmux session (subscription billing, no API key). A node can
+override with its own `backend:`.
+```yaml
+name: my-pipeline
+backend: tmux          # all nodes use tmux by default
+nodes:
+  build:
+    project: my-app
+    prompt: "..."
+  deploy:
+    backend: headless  # this one node opts back to claude -p
+    project: my-app
+    prompt: "..."
+```
+Each node still gets its **own** tmux session (`fgt-<taskId>`). Sharing one
+session across nodes is not yet supported.
+**Runtime override from chat.** You don't have to edit the YAML — when you
+fire a pipeline from chat you can say *"fix bug 1234 with the
+mantis-bug-fix pipeline, **use tmux**"* and the assistant passes
+`backend: tmux` to `trigger_pipeline`, switching every task in that run to
+tmux regardless of the workflow's declared default. `backend: headless`
+forces the opposite. Omitting it honors the YAML.
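
The precedence described in this section can be sketched as a small resolver. This is an illustration only: the `resolveBackend` function and the `BackendSources` shape are hypothetical names, and the assumption that a run-level override also beats a node's own `backend:` is inferred from the `trigger_pipeline` tool description ("override for EVERY task in this run"), not confirmed by the `lib/pipeline.ts` changes, which are not shown here.

```typescript
type Backend = 'tmux' | 'headless';

// Hypothetical container for the three places a backend can be declared.
interface BackendSources {
  runOverride?: Backend;     // `backend` passed to trigger_pipeline from chat
  nodeBackend?: Backend;     // `backend:` on a single node in the YAML
  workflowBackend?: Backend; // top-level `backend:` in the YAML
}

// Assumed order: run override, then node, then workflow, then the headless default.
function resolveBackend(sources: BackendSources): Backend {
  return (
    sources.runOverride ??
    sources.nodeBackend ??
    sources.workflowBackend ??
    'headless'
  );
}

// The `deploy` node from the YAML example, with no run-level override.
console.log(resolveBackend({ nodeBackend: 'headless', workflowBackend: 'tmux' }));
```

Under these assumptions, omitting the run-level value leaves the YAML in control, which is the behaviour the section describes for chat-triggered runs.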
 ## Node Modes

package/lib/help-docs/25-chat-tools.md CHANGED

@@ -46,6 +46,24 @@ chat agent can't accidentally leak these into chat context.
 Pass `as_base64: true` for binary files (pdf, images, zip).
+### `read_project_file` — read a file inside a PROJECT repo
+`read_forge_file` only reaches `<dataDir>/`. When a `dispatch_task` writes
+its findings to a repo file (e.g. *"document your analysis in
+`docs/report.md`"*), that file lands in the **project working directory**,
+not the Forge data dir — so `read_forge_file` returns "not found". Use
+`read_project_file({project, path})` instead:
+- `project` — the Forge project name (same value as `dispatch_task` /
+  pipeline `input.project`, e.g. `FortiNAC`).
+- `path` — repo-relative (`docs/report.md`, `src/index.ts`). No leading
+  `/`, no `..` segments.
+Path traversal and sensitive files (`.git/` internals, `.env`, private
+keys) are refused. Returns decoded UTF-8 (capped 256 KB; `as_base64: true`
+for binary). **Never** dispatch a `cat` task to read a file — and never
+reconstruct file contents from memory if the read fails; report the path
+that wasn't found.
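
The path rules for `read_project_file` (no leading `/`, no `..` segments, sensitive files refused) can be sketched as a pure string check. This is not Forge's implementation, which does not appear in this diff. The `isSafeRepoPath` name is hypothetical, and the specific sensitive-file patterns (`.git` directory, `.env` and `.env.*`, `.pem`/`.key` extensions) are assumptions chosen to illustrate the categories the docs name.

```typescript
// Hypothetical validator for a repo-relative path.
function isSafeRepoPath(path: string): boolean {
  if (path.length === 0) return false;

  // Reject absolute paths (leading "/" or "\").
  if (path.startsWith('/') || path.startsWith('\\')) return false;

  const segments = path.split(/[\\/]+/).filter((s) => s.length > 0);
  if (segments.length === 0) return false;

  // Reject path traversal anywhere in the path.
  if (segments.some((s) => s === '..')) return false;

  // Assumed sensitive-file rules; the real list is not shown in the source.
  if (segments.some((s) => s === '.git')) return false;
  const base = segments[segments.length - 1];
  if (base === '.env' || base.startsWith('.env.')) return false;
  if (base.endsWith('.pem') || base.endsWith('.key')) return false;

  return true;
}

for (const p of ['docs/report.md', '../secrets.txt', '.git/config', '.env']) {
  console.log(p, isSafeRepoPath(p));
}
```

A real implementation would likely also resolve the path against the project root and confirm the result stays inside it, since a string check alone does not account for symlinks.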
 ### `list_forge_files` — list files anywhere under `<dataDir>/`
 Pass `dir` as a dataDir-relative subdir (`tmp`, `scratch`, `flows`,
 `connectors/mantis`). Each entry returns `path`, `kind` (file/dir),
@@ -84,6 +102,11 @@ Returns `status`, `terminal`, `result_summary` (truncated to 1KB),
 `error`, `completed_at`. For long-running polls, prefer `start_watch`
 over manual polling — see `24-watch.md`.
+`result_summary` is a short headline. If the task's real output is a file
+it wrote into the repo, read that file with `read_project_file` — that's
+the deliverable. Pass `full: true` to also get `output_tail` (the tail of
+the task's own log) and a longer summary when no file was written.
 ## Pipelines + schedules
 The chat agent owns the full schedule CRUD surface and pipeline triggers: