@glrs-dev/harness-plugin-opencode 0.3.1 → 1.0.0

package/CHANGELOG.md CHANGED
@@ -1,5 +1,195 @@
1
1
  # Changelog
2
2
 
3
+ ## 1.0.0
4
+
5
+ ### Major Changes
6
+
7
+ - [#26](https://github.com/iceglober/glrs/pull/26) [`6cec227`](https://github.com/iceglober/glrs/commit/6cec227eeb4360344a8a5cb9b944f3070459084c) Thanks [@iceglober](https://github.com/iceglober)! - pilot: scorched-earth rollback of worktree isolation — cwd mode is the only execution shape
8
+
9
+ **Breaking change.** The pilot subsystem no longer manages a per-task worktree pool. `pilot build` now runs each task directly in the user's current worktree (`process.cwd()`), committing on HEAD of the user's feature branch after each task's verify passes.
10
+
11
+ User-visible changes:
12
+
13
+ - **Pre-flight safety gate.** `pilot build` refuses to run when the working tree is on `main`/`master`/the remote's default branch, outside a git repo, or has uncommitted changes. This matches `/fresh --yes` semantics.
14
+ - **`setup:` field removed.** Plans that declare a top-level `setup:` array fail `pilot validate` with a friendly message pointing at `src/pilot/AGENTS.md`. Users should run setup manually (install, compose, migrate, seed) before invoking `pilot build`.
15
+ - **CLI verbs removed.** `pilot resume`, `pilot retry`, and `pilot worktrees` are deleted. cwd-mode resume/retry semantics are future work.
16
+ - **No `PILOT_*` env injection.** Verify commands inherit `process.env` verbatim. The COMPOSE_PROJECT_NAME default is gone.
17
+ - **Auto-commit contract preserved.** The worker still auto-commits after each successful task — just on HEAD of the user's current branch instead of a throwaway per-task branch.
18
+
19
+ Internal:
20
+
21
+ - Deleted `src/pilot/worktree/` directory and its `pool.ts`/`git.ts` modules.
22
+ - New `src/pilot/worker/safety-gate.ts` with `checkCwdSafety()`.
23
+ - `enforceTouches()` now takes `cwd` instead of `worktree`.
24
+ - Plan schema uses `.passthrough().superRefine(...)` to surface the friendly setup-removal message alongside standard unknown-key rejection.
25
+ - `pilot-planning` skill is now 9 rules (was 10); `setup-authoring.md` deleted.
26
+
27
+ ### Minor Changes
28
+
29
+ - [#26](https://github.com/iceglober/glrs/pull/26) [`6cec227`](https://github.com/iceglober/glrs/commit/6cec227eeb4360344a8a5cb9b944f3070459084c) Thanks [@iceglober](https://github.com/iceglober)! - pilot: add `pilot build-resume` — continue a partially-completed run
30
+
31
+ When `pilot build` fails mid-run (task failure, stall, abort), previously the only recovery was to rerun from scratch or finish manually. `pilot build-resume` picks up where the run left off:
32
+
33
+ - Discovers the latest non-terminal run in the repo (or honors `--run <id>`).
34
+ - Skips `succeeded` tasks — their commits are already on HEAD.
35
+ - Resets every non-succeeded task (failed/blocked/aborted/running) to `pending` with `attempts=0` and a fresh retry budget. Cost is preserved.
36
+ - Re-marks the run as `running`, clears `finished_at`.
37
+ - Pre-flight: same safety gate as `pilot build` (clean tree, feature branch) PLUS a branch-match check — refuses if the current branch name doesn't equal the branch recorded on any succeeded task from the run. Prevents "I switched branches since" mistakes.
38
+ - Loads the plan from the path recorded on the run row. If the user edited the plan between runs, the resume picks up the edited version.
39
+
40
+ Usage:
41
+
42
+ ```bash
43
+ # resume the latest failed/blocked run in this repo
44
+ pilot build-resume
45
+
46
+ # or target a specific run
47
+ pilot build-resume --run 01KQDEDKGMAF6NGSKNS2H8QB4V
48
+ ```
49
+
50
+ Exit codes:
51
+
52
+ - `0` — resume succeeded (every remaining task completed).
53
+ - `1` — wiring failure, branch mismatch, or safety gate refusal.
54
+ - `2` — no resumable tasks (all succeeded, or no runs found).
55
+ - `3` — resume ran but at least one task failed.
56
+ - `130` — SIGINT.
57
+
58
+ New state accessors: `resetTasksForResume()`, `markRunResumed()`.
59
+
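The exit-code contract for `pilot build-resume` can be read as a small decision function. The sketch below is illustrative only: `resumeExitCode` and the shape of its `outcome` argument are hypothetical names, not part of the package, and the order in which the conditions are checked is an assumption.

```javascript
// Hypothetical helper: maps the outcome of a resume attempt to the exit
// codes listed in the changelog entry. Field names are assumptions.
function resumeExitCode(outcome) {
  if (outcome.interrupted) return 130; // SIGINT
  if (outcome.preflightError) return 1; // wiring failure, branch mismatch, or safety gate refusal
  if (outcome.resumableTasks === 0) return 2; // all tasks succeeded, or no runs found
  if (outcome.failedTasks > 0) return 3; // resume ran but at least one task failed
  return 0; // every remaining task completed
}

// Example: four tasks were reset to pending and one of them failed again.
console.log(resumeExitCode({ resumableTasks: 4, failedTasks: 1 }));
```

A wrapper script could branch on these codes, for instance retrying on `3` but treating `1` as a configuration problem that needs a human.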
60
+ - [#26](https://github.com/iceglober/glrs/pull/26) [`6cec227`](https://github.com/iceglober/glrs/commit/6cec227eeb4360344a8a5cb9b944f3070459084c) Thanks [@iceglober](https://github.com/iceglober)! - pilot: clean the working tree after every task (success OR failure)
61
+
62
+ The worker now guarantees the tree is pristine between tasks. After every task the worker runs `git reset --hard HEAD && git clean -fd` (which leaves gitignored files in place). This makes the tree-clean-between-tasks invariant explicit: `git status --porcelain` is empty before the next task starts.
63
+
64
+ - **Success paths** already had this implicitly via `commitAll`. No behavior change — the reset is a no-op on an already-clean tree.
65
+ - **Failure paths** previously left partial agent edits in the working tree. Now they're reverted. The forensic record of what the failed task did lives in `runs/<runId>/tasks/<taskId>/session.jsonl` — unchanged.
66
+
67
+ Consequences:
68
+
69
+ 1. `pilot build-resume` no longer trips on a dirty tree left behind by the failed run — the failed task's own cleanup already handled it. Resume just works.
70
+ 2. Subsequent tasks in the same run start from a known-clean state. No more "task B silently ran on top of task A's partial edits."
71
+ 3. If the post-task cleanup itself fails (locked ref, permissions), the worker halts the whole run with a clear error and emits a `run.cleanup.failed` event. Subsequent tasks cannot safely run on a mixed tree.
72
+
73
+ Users who need to inspect what a failed task produced should open the session's JSONL log under `~/.glorious/opencode/<repo>/pilot/runs/<runId>/tasks/<taskId>/session.jsonl` — the git diff is no longer the canonical record.
74
+
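A minimal sketch of the post-task cleanup described in this entry, assuming Node's `child_process` is used to call git. The names `cleanTree` and `runGit` are hypothetical; the real worker's function names and error handling are not shown in this diff.

```javascript
const { execFileSync } = require("node:child_process");

// Default runner: executes git in the given directory and returns stdout.
function runGit(cwd, args) {
  return execFileSync("git", args, { cwd, encoding: "utf8" });
}

// Hypothetical cleanup step. The `git` parameter is injectable so the
// logic can be exercised without a real repository.
function cleanTree(cwd, git = runGit) {
  try {
    git(cwd, ["reset", "--hard", "HEAD"]); // drop tracked edits
    git(cwd, ["clean", "-fd"]); // drop untracked files; no -x, so ignored files stay
    const status = git(cwd, ["status", "--porcelain"]);
    if (status.trim() !== "") {
      return { ok: false, reason: "tree still dirty after cleanup" };
    }
    return { ok: true };
  } catch (err) {
    // A caller would halt the run here and emit `run.cleanup.failed`.
    return { ok: false, reason: String(err.message ?? err) };
  }
}
```

Returning a result object instead of throwing keeps the halt-the-run decision with the caller, which is where the changelog places it.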
75
+ - [#26](https://github.com/iceglober/glrs/pull/26) [`6cec227`](https://github.com/iceglober/glrs/commit/6cec227eeb4360344a8a5cb9b944f3070459084c) Thanks [@iceglober](https://github.com/iceglober)! - pilot-planner: accept multi-issue cross-cutting plans as a first-class shape
76
+
77
+ The pilot-planning skill previously encouraged the planner to refuse
78
+ ambitious multi-issue scopes — pushing users to run multiple pilot
79
+ sessions with 3× the setup cost. Skill rework:
80
+
81
+ - `decomposition.md` gains a "Plan sizing" section: 5–30 tasks is the
82
+ sweet spot, and bundling 2–4 related issues into one plan is first-
83
+ class when they share repo + package manager + docker-compose + test
84
+ runner. Cross-references `dag-shape.md`'s "Disconnected" pattern.
85
+ - `SKILL.md` gains a "When to bundle vs. split plans" section placed
86
+ before "When to refuse". The refuse section is rewritten to refuse
87
+ ONLY for underspecified / ambiguous / no-concrete-acceptance work
88
+ (e.g., "refactor auth", "clean up tech debt"), explicitly stating
89
+ plan size, multi-issue scope, and disconnected-subtree shape are
90
+ NOT refusal reasons.
91
+ - `self-review.md` question 6 is rewritten: task-level `cascadeFail`
92
+ only blocks DEPENDENTS of the failing task, not siblings in
93
+ disconnected subtrees. The question now asks whether the dependency
94
+ graph concentrates too much value in one critical task (a real
95
+ anti-pattern), not whether the plan is "too big" (a false one).
96
+
97
+ Observable effect: the planner now bundles cross-cutting work like
98
+ "rule-engine cleanup + cache invalidation + admin UI" into one plan
99
+ instead of refusing the scope.
100
+
101
+ - [#26](https://github.com/iceglober/glrs/pull/26) [`6cec227`](https://github.com/iceglober/glrs/commit/6cec227eeb4360344a8a5cb9b944f3070459084c) Thanks [@iceglober](https://github.com/iceglober)! - pilot: safety gate tolerates framework-owned dirty files (`.opencode/**`, `next-env.d.ts`, etc.)
102
+
103
+ When opencode auto-updates its plugin dep in the background, it bumps `.opencode/package.json` + `.opencode/package-lock.json`. Previously the pilot safety gate rejected those dirty files as "user uncommitted work," blocking `pilot build` on something the user didn't do and couldn't preempt.
104
+
105
+ **Fix:** A new `SAFETY_GATE_TOLERATE` list mirrors the post-task `DEFAULT_TOLERATE` pattern. Dirt ONLY in these paths is allowed; pilot proceeds with a one-line warning showing which framework-owned files were modified. Genuine user dirt (anywhere else) still refuses as before. Mixed dirty trees (framework + user) refuse and surface the user-owned path in the error message.
106
+
107
+ Tolerated paths:
108
+
109
+ - `.opencode/**` — opencode plugin installer churn.
110
+ - `**/next-env.d.ts`, `**/.next/types/**`, `**/.next/dev/types/**` — Next.js artifacts.
111
+ - `**/*.tsbuildinfo` — TypeScript incremental build cache.
112
+ - `**/__snapshots__/**`, `**/*.snap` — test snapshot files.
113
+
114
+ User-visible:
115
+
116
+ - `pilot build` prints `[pilot] working tree has N modified file(s) in framework-owned paths; treating tree as clean:` followed by the first 5 paths before starting.
117
+ - `pilot build-resume` does the same.
118
+
119
+ Also fixed a porcelain-parser bug that ate the leading space off `git status --porcelain` lines; new tests cover the round-trip.
120
+
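The porcelain-parser bug mentioned above is easy to reproduce: in `git status --porcelain` (v1) output, each line is two status characters, one space, then the path, so a line such as ` M file` begins with a significant space. The sketch below shows one way to parse it without trimming. `parsePorcelain` is a hypothetical name, and quoted paths (used by git for unusual characters) are not handled.

```javascript
// Hypothetical parser for `git status --porcelain` (v1) output.
function parsePorcelain(output) {
  const entries = [];
  for (const line of output.split("\n")) {
    if (line.length < 4) continue; // skips the empty string after the final newline
    // Do NOT trim the line: the first column may legitimately be a space.
    const status = line.slice(0, 2);
    let path = line.slice(3);
    const arrow = path.indexOf(" -> ");
    if ((status[0] === "R" || status[0] === "C") && arrow !== -1) {
      path = path.slice(arrow + 4); // keep the destination of a rename/copy
    }
    entries.push({ status, path });
  }
  return entries;
}

console.log(parsePorcelain(" M .opencode/package.json\n?? notes.txt\n"));
```

Trimming each line first would turn ` M .opencode/package.json` into `M .opencode/package.json`, and slicing at a fixed offset would then cut the first character off the path.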
121
+ - [#26](https://github.com/iceglober/glrs/pull/26) [`6cec227`](https://github.com/iceglober/glrs/commit/6cec227eeb4360344a8a5cb9b944f3070459084c) Thanks [@iceglober](https://github.com/iceglober)! - pilot: add `.glrs/hooks/pilot_setup` — repo-level setup hook
122
+
123
+ A user-authored shell script at `.glrs/hooks/pilot_setup` (relative to the repo root) is auto-invoked once at the start of `pilot build` and `pilot build-resume`, before any task runs. Its job is to make the dev stack ready: install deps, start docker services, run migrations, seed data — whatever the plan's verify commands expect to already be running.
124
+
125
+ Contract:
126
+
127
+ - **Missing file → skip silently.** No hook = no setup = the old behavior.
128
+ - **Present + executable → run it.** stdout/stderr stream live to the terminal so the user sees install progress.
129
+ - **Non-zero exit → abort the pilot run.** User fixes their env first.
130
+ - **10-minute timeout → abort.** Prevents hung installs from blocking indefinitely.
131
+ - **Not executable → abort with a clear message** (`chmod +x .glrs/hooks/pilot_setup`).
132
+
133
+ Why this instead of the old plan-level `setup:` field:
134
+
135
+ - It's version-controlled in the user's repo, not LLM-authored.
136
+ - One hook per repo covers every plan — no cross-plan drift.
137
+ - The user controls exactly what runs (no pilot-opinionated defaults).
138
+ - It's idempotent by convention — safe to re-run on resume.
139
+
140
+ Example `.glrs/hooks/pilot_setup`:
141
+
142
+ ```bash
143
+ #!/bin/sh
144
+ set -e
145
+ pnpm install --frozen-lockfile
146
+ docker compose up -d postgres redis
147
+ pnpm prisma migrate dev --skip-generate
148
+ ```
149
+
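The hook contract above maps onto a short sequence of checks. The following is a sketch under stated assumptions, not the package's implementation: `runSetupHook` and its return shape are hypothetical, and it relies on Node's `fs.accessSync` and `child_process.spawnSync` (on Windows, `X_OK` behaves like an existence check, so the executable test is weaker there).

```javascript
const fs = require("node:fs");
const path = require("node:path");
const { spawnSync } = require("node:child_process");

const HOOK_TIMEOUT_MS = 10 * 60 * 1000; // 10-minute limit from the contract

// Hypothetical runner for `.glrs/hooks/pilot_setup`.
function runSetupHook(repoRoot, timeoutMs = HOOK_TIMEOUT_MS) {
  const hook = path.join(repoRoot, ".glrs", "hooks", "pilot_setup");
  if (!fs.existsSync(hook)) return { action: "skipped" }; // missing file: skip silently
  try {
    fs.accessSync(hook, fs.constants.X_OK);
  } catch {
    return { action: "abort", reason: "not executable; run chmod +x .glrs/hooks/pilot_setup" };
  }
  // stdio "inherit" streams the hook's stdout/stderr live to the terminal.
  const result = spawnSync(hook, [], { cwd: repoRoot, stdio: "inherit", timeout: timeoutMs });
  if (result.error) {
    const timedOut = result.error.code === "ETIMEDOUT";
    return { action: "abort", reason: timedOut ? "hook timed out" : result.error.message };
  }
  if (result.status !== 0) {
    return { action: "abort", reason: `hook exited with ${result.status ?? result.signal}` };
  }
  return { action: "ran" };
}
```

Because the hook runs on both `pilot build` and `pilot build-resume`, the script itself should tolerate being run repeatedly, as the entry notes.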
150
+ - [#26](https://github.com/iceglober/glrs/pull/26) [`6cec227`](https://github.com/iceglober/glrs/commit/6cec227eeb4360344a8a5cb9b944f3070459084c) Thanks [@iceglober](https://github.com/iceglober)! - pilot: add `tolerate:` task field + default allowlist for framework-generated files
151
+
152
+ **Problem:** Tasks with verify steps like `next build` would fail touches-enforcement on files the framework itself rewrites (`next-env.d.ts`, `.next/types/**`), not files the agent edited. The fix-loop couldn't recover — reverting the file just made the next verify regenerate it.
153
+
154
+ **Fix:** Two complementary escape hatches.
155
+
156
+ 1. **Built-in default allowlist.** `enforceTouches` now accepts a small, opinionated set of framework-generated globs without requiring plan authors to list them:
157
+
158
+ - `**/next-env.d.ts`
159
+ - `**/.next/types/**`, `**/.next/dev/types/**`
160
+ - `**/*.tsbuildinfo`
161
+ - `**/__snapshots__/**`, `**/*.snap`
162
+
163
+ 2. **Task-level `tolerate:` field.** Plan authors can extend the allowlist per-task for project-specific codegen (prisma/client, graphql/generated, etc.). At enforcement time, `tolerate:` is unioned with `touches:` and the built-in defaults.
164
+
165
+ **Behavior change:** Tasks that previously failed touches-enforcement on these paths will now pass. `touches: []` (verify-only) tasks where ONLY tolerated/default-allowed files change also pass. Real drift (file outside touches + tolerate + defaults) still fails as before.
166
+
167
+ Planner prompt and `pilot-planning/rules/touches-scope.md` both updated with the new `tolerate:` contract and examples.
168
+
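The union rule for `touches:`, `tolerate:`, and the built-in defaults can be sketched as follows. This is an illustration, not the package's `enforceTouches`: `findTouchesViolators` and `globToRegExp` are hypothetical helpers, and the tiny glob converter only understands `*` and `**`, whereas the real code presumably uses a full glob matcher.

```javascript
// Built-in defaults, copied from the list in this changelog entry.
const DEFAULT_TOLERATE = [
  "**/next-env.d.ts",
  "**/.next/types/**",
  "**/.next/dev/types/**",
  "**/*.tsbuildinfo",
  "**/__snapshots__/**",
  "**/*.snap",
];

// Minimal glob-to-RegExp conversion supporting only `**/`, `**`, and `*`.
function globToRegExp(glob) {
  let re = "";
  for (let i = 0; i < glob.length; i++) {
    if (glob.startsWith("**/", i)) {
      re += "(?:.*/)?"; // zero or more leading directories
      i += 2;
    } else if (glob.startsWith("**", i)) {
      re += ".*"; // anything, including slashes
      i += 1;
    } else if (glob[i] === "*") {
      re += "[^/]*"; // anything within one path segment
    } else {
      re += glob[i].replace(/[.+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp(`^${re}$`);
}

// Returns the changed files that fall outside touches + tolerate + defaults.
function findTouchesViolators(changedFiles, task) {
  const allowed = [
    ...(task.touches ?? []),
    ...(task.tolerate ?? []),
    ...DEFAULT_TOLERATE,
  ].map(globToRegExp);
  return changedFiles.filter((file) => !allowed.some((re) => re.test(file)));
}

const task = { touches: ["src/api/**"], tolerate: ["prisma/client/**"] };
console.log(
  findTouchesViolators(
    ["src/api/users.ts", "prisma/client/index.js", "apps/web/next-env.d.ts", "README.md"],
    task
  )
);
```

With this shape, a verify-only task (`touches: []`) passes when the only changed files match the defaults or its own `tolerate:` globs, which is the behavior change the entry describes.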
169
+ - [#26](https://github.com/iceglober/glrs/pull/26) [`6cec227`](https://github.com/iceglober/glrs/commit/6cec227eeb4360344a8a5cb9b944f3070459084c) Thanks [@iceglober](https://github.com/iceglober)! - pilot: inject PILOT\_\* env vars into setup and verify commands
170
+
171
+ Pilot setup and per-task verify commands now run with a fixed set of `PILOT_*` env vars plus a default `COMPOSE_PROJECT_NAME` injected by the harness. This lets plan authors isolate per-worktree local infrastructure (docker-compose projects, host ports, named volumes) so parallel and retried pilot worktrees don't collide with each other or with a developer's background dev stack.
172
+
173
+ Injected vars:
174
+
175
+ - `PILOT_RUN_ID` — ULID of the current run.
176
+ - `PILOT_TASK_ID` — stable task id.
177
+ - `PILOT_SLOT_INDEX` — pool slot index (0 in v0.1).
178
+ - `PILOT_SLOT_SEQ` — unique sequence `= slot_index * 100 + retry_counter`.
179
+ - `PILOT_WORKTREE_DIR` — absolute worktree path.
180
+ - `PILOT_PORT_BASE` — opinionated port base `= 10000 + PILOT_SLOT_SEQ * 100`.
181
+ - `COMPOSE_PROJECT_NAME` — default `pilot-<runIdShort>-<slotSeq>`, only when unset (user/CI intent preserved).
182
+
183
+ Plan authors using docker-compose for local infra no longer need to hand-roll slot-unique project names or port offsets. See `src/skills/pilot-planning/rules/setup-authoring.md` (updated) for a worked example.
184
+
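The arithmetic behind `PILOT_SLOT_SEQ` and `PILOT_PORT_BASE` can be worked through in a few lines. The sketch below is hypothetical (`pilotEnv` and its argument names are not from the package), and how `runIdShort` is derived from the run id is not stated in the entry, so it is passed in as-is. Note that the 1.0.0 major change listed earlier removes this injection, so this illustrates the intermediate behavior only.

```javascript
// Hypothetical builder for the injected environment described in this entry.
function pilotEnv(args, baseEnv = {}) {
  const slotSeq = args.slotIndex * 100 + args.retryCounter;
  const env = {
    ...baseEnv,
    PILOT_RUN_ID: args.runId,
    PILOT_TASK_ID: args.taskId,
    PILOT_SLOT_INDEX: String(args.slotIndex),
    PILOT_SLOT_SEQ: String(slotSeq),
    PILOT_WORKTREE_DIR: args.worktreeDir,
    PILOT_PORT_BASE: String(10000 + slotSeq * 100),
  };
  // Only set a default when the user or CI has not chosen a project name.
  if (env.COMPOSE_PROJECT_NAME === undefined) {
    env.COMPOSE_PROJECT_NAME = `pilot-${args.runIdShort}-${slotSeq}`;
  }
  return env;
}

// Slot 0, second retry: seq = 0 * 100 + 2 = 2, port base = 10000 + 2 * 100 = 10200.
const env = pilotEnv({
  runId: "01KQDEDKGMAF6NGSKNS2H8QB4V",
  runIdShort: "01KQDEDK", // assumption: the shortening rule is not documented here
  taskId: "T1",
  slotIndex: 0,
  retryCounter: 2,
  worktreeDir: "/tmp/worktree",
});
console.log(env.PILOT_SLOT_SEQ, env.PILOT_PORT_BASE, env.COMPOSE_PROJECT_NAME);
```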
185
+ ### Patch Changes
186
+
187
+ - [#27](https://github.com/iceglober/glrs/pull/27) [`cf74f2d`](https://github.com/iceglober/glrs/commit/cf74f2dca60ee099a92a500d90de1c1886b6aed0) Thanks [@iceglober](https://github.com/iceglober)! - chore(changesets): move @glrs-dev/cli and @glrs-dev/harness-plugin-opencode from `linked` to `fixed`
188
+
189
+ The `linked` group synchronizes versions only among packages that are ALREADY being bumped — it does not force a package into a release. A changeset that named only the harness (as most of our changesets do) would ship a new harness on npm without republishing the CLI, even though the CLI vendors the harness `dist/` at build time (`packages/cli/scripts/vendor-harness.ts`). End users running `glrs oc ...` would keep getting the old vendored harness until somebody remembered to write a no-op CLI changeset.
190
+
191
+ Moving the pair to `fixed` guarantees any harness publish drags the CLI along at a matching version, so a fresh CLI tarball always re-vendors the latest harness `dist/`. The trade-off — CLI-only changesets now also force a no-op harness republish — is cheap because CLI-only changes are rare in this repo.
192
+
3
193
  ## 0.3.1
4
194
 
5
195
  ### Patch Changes
@@ -80,7 +80,7 @@ If `task.prompt` says "add lodash to handle deep merging", install it. If the ta
80
80
 
81
81
  If a verify failure clearly points to an environmental issue — `Cannot find module 'X'` where `X` is a workspace/monorepo dep, `node_modules` absent despite a lockfile committed to the repo, a stale build artifact a typecheck depends on — you ARE expected to run the obvious install command BEFORE giving up with STOP.
82
82
 
83
- Recognise these canonical bootstrap commands: `pnpm install`, `bun install`, `npm install`, `npm ci`, `cargo fetch`, `cargo build`. If the plan declared a `setup:` block, treat that block as the canonical list — run those commands verbatim.
83
+ Recognise these canonical bootstrap commands: `pnpm install`, `bun install`, `npm install`, `npm ci`, `cargo fetch`, `cargo build`.
84
84
 
85
85
  The plugin deny list does not block any of these; they are not task-level dependency additions and they do not require lockfile edits.
86
86
 
@@ -111,7 +111,22 @@ If the fix prompt names `touchesViolators`: revert your edits to those files. Us
111
111
  - Plan. The plan is `pilot.yaml`. Each task in it was already designed by the pilot-planner agent. You are not a co-author.
112
112
  - Refactor unrelated code. The task names a scope; respect it. If you see a glaring issue elsewhere, ignore it — that's a separate task for the human.
113
113
  - Add observability/logging beyond what the task asks for. If the task didn't say "add structured logs", don't add structured logs.
114
- - Run the verify commands yourself. The worker runs them after you stop. Running them yourself wastes turns and can leave residue (test artifacts, cached state) that messes up the worker's run.
115
114
  - Apologize, hedge, or narrate. Each turn is a billable opencode session call; chat preamble buys you nothing.
115
+ - **Write TODO, FIXME, HACK, or XXX comments.** Many repos have pre-commit hooks that reject these annotations. The worker commits your work automatically after verify passes; if the commit is blocked by a hook, the task fails. If you need to note future work, put it in the task's output summary, not in a code comment.
116
116
 
117
- You're a focused, fast, pessimistic implementer. Make the change. Stop. The worker will tell you if anything is wrong.
117
+ # Self-verification: run the tests BEFORE you stop
118
+
119
+ **You SHOULD run the task's verify commands yourself during your work session.** The worker runs them formally after you stop, but you should iterate locally first:
120
+
121
+ 1. Write the code.
122
+ 2. Run the verify command(s) listed in the task's `verify:` field.
123
+ 3. If they fail, fix the code and re-run. Iterate until they pass.
124
+ 4. THEN stop.
125
+
126
+ This is faster and cheaper than the worker's retry loop (which requires a full session round-trip per attempt). The worker's formal verify is a gate, not your development loop — arrive at the gate already passing.
127
+
128
+ **How to find the verify commands:** They're in the task kickoff prompt under "Verify commands". Run them exactly as written via bash. They execute in the repo root (cwd).
129
+
130
+ **Exception:** If a verify command requires infrastructure you can't reach (e.g., a running server on a specific port), note that in your output and stop. The worker will handle it.
131
+
132
+ You're a focused, fast, pessimistic implementer. Make the change. Verify it passes. Stop.
@@ -45,13 +45,13 @@ Use Serena and grep to map out:
45
45
  - Existing tests that already cover related code (the verify commands will likely be variations of those).
46
46
  - Existing patterns the change should match.
47
47
  - Any module boundaries that suggest natural task splits.
48
- - **Tooling footprint** — lockfiles, docker-compose services, migration tooling, UI/API/DB test frameworks. You'll use these in Section 3 to propose a `setup:` block and per-surface verify patterns.
48
+ - **Tooling footprint** — lockfiles, docker-compose services, migration tooling, UI/API/DB test frameworks. Understanding these informs your per-surface verify patterns in Section 3.
49
49
 
50
50
  Be thorough here. A planner who shipped a sloppy plan because they only skimmed the codebase wastes hours of pilot-builder time chasing bad scope.
51
51
 
52
52
  ## 3. Apply the planning methodology
53
53
 
54
- The `pilot-planning` skill carries the ten rules. Apply them:
54
+ The `pilot-planning` skill carries the nine rules. Apply them:
55
55
 
56
56
  1. First-principles task framing.
57
57
  2. Decomposition into right-sized tasks.
@@ -61,8 +61,7 @@ The `pilot-planning` skill carries the ten rules. Apply them:
61
61
  6. Optional milestone grouping.
62
62
  7. Self-review.
63
63
  8. Per-task `context:` population (rationale, code pointers, acceptance shorthand).
64
- 9. **Setup-block authoring** — detect lockfiles (pnpm, bun, npm, yarn, Cargo), docker-compose services, and migration tooling (prisma, drizzle-kit, knex, flyway), then propose specific setup commands to the user for confirmation.
65
- 10. **QA-expectations establishment** — detect per-surface test frameworks and propose concrete verify patterns:
64
+ 9. **QA-expectations establishment** — detect per-surface test frameworks and propose concrete verify patterns:
66
65
  - **UI**: Playwright, Cypress, or Vitest browser mode for visual/interaction assertions
67
66
  - **API**: curl against local endpoints or OpenAPI-based contract tests
68
67
  - **DB**: Postgres readiness checks and migration verification (prisma migrate, drizzle-kit push)
@@ -70,7 +69,9 @@ The `pilot-planning` skill carries the ten rules. Apply them:
70
69
  - **Browser-based component**: Storybook or Chromatic visual tests
71
70
  - **CLI**: bin/ smoke tests or `--help` verification
72
71
 
73
- Rules 9 and 10 typically involve ONE bundled `question` tool call to the user: combine setup proposals and per-surface verify proposals into a single round (respecting "talk to the user — once" guidance).
72
+ Rule 9 typically involves ONE bundled `question` tool call to the user for QA verify patterns (respecting "talk to the user — once" guidance).
73
+
74
+ Note: The `setup:` field was removed in the cwd-mode rollback. Plans assume the user's dev stack is already running (install, compose, migrate, seed) before `pilot build` is invoked. Remind the user of this at hand-off.
74
75
 
75
76
  ## 4. Write the YAML
76
77
 
@@ -80,10 +81,6 @@ Required schema (see `src/pilot/plan/schema.ts` for the canonical Zod definition
80
81
 
81
82
  ```yaml
82
83
  name: <human-readable plan name>
83
- setup: # optional — run once per worktree before any task
84
- - pnpm install --frozen-lockfile
85
- - docker compose up -d postgres
86
- - pnpm prisma migrate dev
87
84
  defaults: # optional, override per-task as needed
88
85
  agent: pilot-builder # default
89
86
  model: anthropic/claude-sonnet-4-6
@@ -114,6 +111,17 @@ tasks:
114
111
  touches:
115
112
  - src/api/**
116
113
  - test/api/**
114
+ tolerate: # optional — files that may appear in
115
+ # the diff but aren't part of the task's
116
+ # scope (project-specific codegen,
117
+ # framework side-effects beyond the
118
+ # built-in defaults like next-env.d.ts).
119
+ # Common entries: prisma/client/**,
120
+ # graphql/generated/**, schema.graphql.
121
+ # Built-in defaults already cover
122
+ # next-env.d.ts, .next/types/**,
123
+ # *.tsbuildinfo, __snapshots__/**.
124
+ - prisma/client/**
117
125
  verify:
118
126
  - bun test test/api
119
127
  depends_on: [ ] # other task ids
@@ -154,6 +162,8 @@ Don't elaborate. Don't summarize the plan in chat. The user can read it.
154
162
 
155
163
  - **Asking the human to clarify mid-build.** Don't write tasks whose prompts contain things like "ask the user about X". Pilot is unattended. If you don't know X, either ASK NOW (during the planning session) or design the task to discover X via reading code.
156
164
 
165
+ - **YAML quoting errors in titles/prompts.** If a string contains double quotes, wrap it in single quotes: `title: '"Test rule set" UI + hook'`. If it contains single quotes, use double quotes with escaped inner quotes: `title: "it's a \"test\""`. NEVER write `title: "word" more words` — YAML closes the scalar at the second `"`. Run `pilot validate` after saving; it catches these.
166
+
157
167
  # What "done" looks like
158
168
 
159
169
  A plan that:
@@ -0,0 +1,174 @@
1
+ // src/pilot/state/tasks.ts
2
+ function upsertFromPlan(db, runId, plan) {
3
+ const stmt = db.prepare(
4
+ `INSERT OR IGNORE INTO tasks (run_id, task_id, status) VALUES (?, ?, 'pending')`
5
+ );
6
+ const tx = db.transaction(() => {
7
+ for (const t of plan.tasks) {
8
+ stmt.run(runId, t.id);
9
+ }
10
+ });
11
+ tx();
12
+ }
13
+ function markReady(db, runId, taskId) {
14
+ requireStatus(db, runId, taskId, ["pending"], "ready");
15
+ db.run(
16
+ "UPDATE tasks SET status='ready' WHERE run_id=? AND task_id=?",
17
+ [runId, taskId]
18
+ );
19
+ }
20
+ function markRunning(db, args) {
21
+ requireStatus(db, args.runId, args.taskId, ["ready"], "running");
22
+ const now = args.now ?? Date.now();
23
+ db.run(
24
+ `UPDATE tasks
25
+ SET status='running',
26
+ attempts = attempts + 1,
27
+ session_id = ?,
28
+ branch = ?,
29
+ worktree_path = ?,
30
+ started_at = COALESCE(started_at, ?)
31
+ WHERE run_id=? AND task_id=?`,
32
+ [args.sessionId, args.branch, args.worktreePath, now, args.runId, args.taskId]
33
+ );
34
+ }
35
+ function markSucceeded(db, runId, taskId, now = Date.now()) {
36
+ requireStatus(db, runId, taskId, ["running"], "succeeded");
37
+ db.run(
38
+ `UPDATE tasks
39
+ SET status='succeeded', finished_at=?, last_error=NULL
40
+ WHERE run_id=? AND task_id=?`,
41
+ [now, runId, taskId]
42
+ );
43
+ }
44
+ function markFailed(db, runId, taskId, reason, now = Date.now()) {
45
+ requireStatus(db, runId, taskId, ["running", "ready"], "failed");
46
+ db.run(
47
+ `UPDATE tasks
48
+ SET status='failed', finished_at=?, last_error=?
49
+ WHERE run_id=? AND task_id=?`,
50
+ [now, reason, runId, taskId]
51
+ );
52
+ }
53
+ function markBlocked(db, runId, taskId, reason) {
54
+ requireStatus(db, runId, taskId, ["pending", "ready"], "blocked");
55
+ db.run(
56
+ `UPDATE tasks
57
+ SET status='blocked', last_error=?
58
+ WHERE run_id=? AND task_id=?`,
59
+ [reason, runId, taskId]
60
+ );
61
+ }
62
+ function markAborted(db, runId, taskId, reason, now = Date.now()) {
63
+ requireStatus(db, runId, taskId, ["running", "ready"], "aborted");
64
+ db.run(
65
+ `UPDATE tasks
66
+ SET status='aborted', finished_at=?, last_error=?
67
+ WHERE run_id=? AND task_id=?`,
68
+ [now, reason, runId, taskId]
69
+ );
70
+ }
71
+ function markPending(db, runId, taskId) {
72
+ const cur = getTask(db, runId, taskId);
73
+ if (!cur) {
74
+ throw new Error(
75
+ `markPending: task ${JSON.stringify(taskId)} not found in run ${JSON.stringify(runId)}`
76
+ );
77
+ }
78
+ db.run(
79
+ `UPDATE tasks
80
+ SET status='pending',
81
+ session_id=NULL,
82
+ branch=NULL,
83
+ worktree_path=NULL,
84
+ started_at=NULL,
85
+ finished_at=NULL,
86
+ last_error=NULL
87
+ WHERE run_id=? AND task_id=?`,
88
+ [runId, taskId]
89
+ );
90
+ }
91
+ function setCostUsd(db, runId, taskId, costUsd) {
92
+ if (!Number.isFinite(costUsd) || costUsd < 0) {
93
+ throw new RangeError(`setCostUsd: invalid cost ${costUsd}`);
94
+ }
95
+ db.run(
96
+ "UPDATE tasks SET cost_usd=? WHERE run_id=? AND task_id=?",
97
+ [costUsd, runId, taskId]
98
+ );
99
+ }
100
+ function getTask(db, runId, taskId) {
101
+ return db.query("SELECT * FROM tasks WHERE run_id=? AND task_id=?").get(runId, taskId);
102
+ }
103
+ function listTasks(db, runId) {
104
+ return db.query("SELECT * FROM tasks WHERE run_id=? ORDER BY task_id").all(runId);
105
+ }
106
+ function readyTasks(db, runId) {
107
+ return db.query("SELECT * FROM tasks WHERE run_id=? AND status='ready' ORDER BY task_id").all(runId);
108
+ }
109
+ function countByStatus(db, runId) {
110
+ const rows = db.query("SELECT status, COUNT(*) as n FROM tasks WHERE run_id=? GROUP BY status").all(runId);
111
+ const out = {
112
+ pending: 0,
113
+ ready: 0,
114
+ running: 0,
115
+ succeeded: 0,
116
+ failed: 0,
117
+ blocked: 0,
118
+ aborted: 0
119
+ };
120
+ for (const r of rows) out[r.status] = r.n;
121
+ return out;
122
+ }
123
+ function resetTasksForResume(db, runId) {
124
+ const rows = listTasks(db, runId);
125
+ const resettable = rows.filter((r) => r.status !== "succeeded");
126
+ if (resettable.length === 0) return [];
127
+ const stmt = db.prepare(
128
+ `UPDATE tasks
129
+ SET status='pending',
130
+ attempts=0,
131
+ session_id=NULL,
132
+ last_error=NULL,
133
+ started_at=NULL,
134
+ finished_at=NULL,
135
+ branch=NULL,
136
+ worktree_path=NULL
137
+ WHERE run_id=? AND task_id=? AND status != 'succeeded'`
138
+ );
139
+ const tx = db.transaction(() => {
140
+ for (const r of resettable) stmt.run(runId, r.task_id);
141
+ });
142
+ tx();
143
+ return resettable.map((r) => r.task_id);
144
+ }
145
+ function requireStatus(db, runId, taskId, expected, intended) {
146
+ const row = getTask(db, runId, taskId);
147
+ if (!row) {
148
+ throw new Error(
149
+ `task ${JSON.stringify(taskId)} not found in run ${JSON.stringify(runId)}`
150
+ );
151
+ }
152
+ if (!expected.includes(row.status)) {
153
+ throw new Error(
154
+ `cannot move task ${JSON.stringify(taskId)} from ${row.status} to ${intended} (expected one of: ${expected.join(", ")})`
155
+ );
156
+ }
157
+ }
158
+
159
+ export {
160
+ upsertFromPlan,
161
+ markReady,
162
+ markRunning,
163
+ markSucceeded,
164
+ markFailed,
165
+ markBlocked,
166
+ markAborted,
167
+ markPending,
168
+ setCostUsd,
169
+ getTask,
170
+ listTasks,
171
+ readyTasks,
172
+ countByStatus,
173
+ resetTasksForResume
174
+ };
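The `requireStatus` calls in `src/pilot/state/tasks.ts` above amount to a small state machine for task status. The table below restates those guards in one place as a reading aid; `ALLOWED_FROM` and `canTransition` are hypothetical names, not exports of the module. `markPending` and `resetTasksForResume` deliberately bypass these guards (the latter skips only `succeeded` rows), so they are not represented here.

```javascript
// Target status -> statuses a task may be in before the transition,
// transcribed from the requireStatus() calls in the diff above.
const ALLOWED_FROM = {
  ready: ["pending"],
  running: ["ready"],
  succeeded: ["running"],
  failed: ["running", "ready"],
  blocked: ["pending", "ready"],
  aborted: ["running", "ready"],
};

// Hypothetical helper: true when the guarded mark* function would accept the move.
function canTransition(from, to) {
  return (ALLOWED_FROM[to] ?? []).includes(from);
}

console.log(canTransition("ready", "running"));
console.log(canTransition("pending", "running"));
```

Read this way, a task can only reach `succeeded` through `running`, while `blocked` is reserved for tasks that never started.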
@@ -0,0 +1,67 @@
1
+ // src/pilot/state/runs.ts
2
+ import { ulid } from "ulid";
3
+ function createRun(db, args) {
4
+ const id = ulid();
5
+ const now = args.now ?? Date.now();
6
+ db.run(
7
+ `INSERT INTO runs (id, plan_path, plan_slug, started_at, status)
8
+ VALUES (?, ?, ?, ?, 'pending')`,
9
+ [id, args.planPath, args.slug, now]
10
+ );
11
+ void args.plan;
12
+ return id;
13
+ }
14
+ function markRunRunning(db, runId) {
15
+ const cur = getRun(db, runId);
16
+ if (!cur) throw new Error(`markRunRunning: run ${JSON.stringify(runId)} not found`);
17
+ if (cur.status === "running") return;
18
+ if (cur.status !== "pending") {
19
+ throw new Error(
20
+ `markRunRunning: cannot move run ${JSON.stringify(runId)} from ${cur.status} to running`
21
+ );
22
+ }
23
+ db.run("UPDATE runs SET status='running' WHERE id=?", [runId]);
24
+ }
25
+ function markRunFinished(db, runId, status, now = Date.now()) {
26
+ if (status !== "completed" && status !== "aborted" && status !== "failed") {
27
+ throw new Error(
28
+ `markRunFinished: ${JSON.stringify(status)} is not a terminal status`
29
+ );
30
+ }
31
+ const cur = getRun(db, runId);
32
+ if (!cur) {
33
+ throw new Error(`markRunFinished: run ${JSON.stringify(runId)} not found`);
34
+ }
35
+ db.run("UPDATE runs SET status=?, finished_at=? WHERE id=?", [status, now, runId]);
36
+ }
37
+ function markRunResumed(db, runId) {
38
+ const cur = getRun(db, runId);
39
+ if (!cur) throw new Error(`markRunResumed: run ${JSON.stringify(runId)} not found`);
40
+ if (cur.status === "completed") {
41
+ throw new Error(
42
+ `markRunResumed: run ${JSON.stringify(runId)} is already completed; nothing to resume`
43
+ );
44
+ }
45
+ db.run("UPDATE runs SET status='running', finished_at=NULL WHERE id=?", [runId]);
46
+ }
47
+ function getRun(db, runId) {
48
+ const row = db.query("SELECT * FROM runs WHERE id=?").get(runId);
49
+ return row;
50
+ }
51
+ function listRuns(db, limit = 100) {
52
+ return db.query("SELECT * FROM runs ORDER BY started_at DESC LIMIT ?").all(limit);
53
+ }
54
+ function latestRun(db) {
55
+ const row = db.query("SELECT * FROM runs ORDER BY started_at DESC LIMIT 1").get();
56
+ return row;
57
+ }
58
+
59
+ export {
60
+ createRun,
61
+ markRunRunning,
62
+ markRunFinished,
63
+ markRunResumed,
64
+ getRun,
65
+ listRuns,
66
+ latestRun
67
+ };