@flumecode/runner 0.8.0 → 0.9.0

package/README.md CHANGED
@@ -63,7 +63,7 @@ skipping if that version is already on npm).
  6. Report the summary back (`POST /api/runner/jobs/:id/complete`), which fills in
  the pending agent comment in the thread.
 
- Jobs come in two kinds. **chat** jobs answer a request thread (the flow above).
+ Jobs come in two kinds. **comment** jobs answer a request thread (the flow above).
  **init** jobs bootstrap a repository: they clone the default branch onto a fresh
  `flumecode/init-*` branch, run the `flumecode:document` skill to create the
  `.flumecode/` wiki, and open a PR. A repo must be initialized (from its dashboard
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@flumecode/runner",
- "version": "0.8.0",
+ "version": "0.9.0",
  "type": "module",
  "description": "FlumeCode local runner — claims jobs and drives your local Claude Code against a real checkout.",
  "bin": {
@@ -31,10 +31,11 @@ put it in the prompt, the subagent doesn't have it.
 
  - Spawn each phase with the **Task** tool, `subagent_type: "general-purpose"`.
  - **Model per phase** (pass it as the Task `model` argument):
- - `"sonnet"` — implementation and fixes (the code-writing work).
+ - `"sonnet"` — implementation, fixes, and the Verify step (mechanical
+ command-running; Verify is read-only even though it uses sonnet).
  - `"opus"` — acceptance-criteria review, code-quality review, and the report.
- - **Reviewers are read-only.** Tell every review/report subagent to _inspect and
- report only — never edit, create, or delete files_. Only implementation/fix
+ - **Read-only phases.** Tell every review, Verify, and report subagent to _inspect
+ and report only — never edit, create, or delete files_. Only implementation/fix
  subagents may change the working tree.
  - **No git side effects.** Neither you nor any subagent may commit, push, or open
  a PR. Leave the changes in the working tree; the runner commits + opens the PR
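The model and write-permission rules in the hunk above can be read as a small lookup table. The following is only an illustrative sketch: the `Phase` names, the `PhaseConfig` shape, and the `mayEditFiles` helper are hypothetical and do not come from the package; only the `sonnet`/`opus` assignment and the read-only rule are taken from the text.

```typescript
// Hypothetical phase names; the skill text describes these phases in prose only.
type Phase = "implement" | "fix" | "verify" | "acReview" | "qualityReview" | "report";

interface PhaseConfig {
  model: "sonnet" | "opus"; // value passed as the Task `model` argument
  readOnly: boolean;        // true: the subagent may inspect and report only
}

// Mapping as described in the 0.9.0 text: Verify uses sonnet but is read-only.
const PHASES: Record<Phase, PhaseConfig> = {
  implement:     { model: "sonnet", readOnly: false },
  fix:           { model: "sonnet", readOnly: false },
  verify:        { model: "sonnet", readOnly: true },
  acReview:      { model: "opus",   readOnly: true },
  qualityReview: { model: "opus",   readOnly: true },
  report:        { model: "opus",   readOnly: true },
};

// Only implementation/fix subagents may change the working tree.
function mayEditFiles(phase: Phase): boolean {
  return !PHASES[phase].readOnly;
}
```

Keeping the model choice and the read-only flag as separate fields reflects the point the new text makes: the model used by a phase does not determine whether that phase may write.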
@@ -61,11 +62,35 @@ the next step.
 
  2. **Implement** — Task, `model: "sonnet"`. Give the subagent: the plan steps, a
  pointer to the wiki/orientation, and the coding guidelines (verbatim). Tell it
- to make all the code changes in the working tree to satisfy the plan, keep the
- build and tests green where practical, and end by reporting which files it
- changed and how each step was addressed. It must not commit or push.
-
- 3. **Acceptance-criteria review** — Task, `model: "opus"`, read-only. Give the
+ to make all the code changes in the working tree to satisfy the plan, then
+ self-verify by discovering and running the project's verification commands —
+ checking these sources in order: `package.json` scripts (look for `build`,
+ `typecheck`, `lint`, `test`), `CLAUDE.md`, any `.flumecode/wiki/` page that
+ mentions commands, and `Makefile`. Use whatever is present and appropriate for
+ this repo; do not hardcode specific command strings. Run each discovered
+ command and fix any errors that the edits introduced before returning. If no
+ build/test setup exists in this repo, note that and move on — do not fail. End
+ by reporting: the verification commands it ran and their pass/fail results,
+ which files it changed, and how each plan step was addressed. It must not
+ commit or push.
+
+ 3. **Verify (build & tests)** — Task, `model: "sonnet"`, read-only. This step
+ gives the orchestrator an objective, independent build/test signal before the
+ subjective AC and quality reviews. Tell the subagent to:
+ - Discover the project's verification commands from `package.json` scripts
+ (look for `build`, `typecheck`, `lint`, `test`), `CLAUDE.md`,
+ `.flumecode/wiki/` (any page that mentions commands), and `Makefile`. Use
+ what is present; do not hardcode specific command strings.
+ - Run each discovered command and record: the exact command, whether it passed
+ or failed, and — for any failure — a short excerpt of the failing output
+ (enough to diagnose the problem).
+ - If no build/test setup exists in this repo, say so explicitly and pass the
+ gate.
+ - Return a structured per-check result: command, pass/fail, failing-output
+ excerpt (if any).
+ - Must not edit, create, or delete any files.
+
+ 4. **Acceptance-criteria review** — Task, `model: "opus"`, read-only. Give the
  subagent the full AC list and tell it to verify each one against the actual
  changes (run `git --no-pager diff`, read the changed files, run tests/build if
  useful). For **each** AC it must return: the criterion text verbatim, a verdict
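The command-discovery rule added to the Implement and Verify steps can be illustrated for the `package.json` source alone. This is a minimal sketch under stated assumptions, not the runner's implementation (the steps are carried out by subagents following prose instructions): `discoverFromPackageJson`, `CheckResult`, and `gatePasses` are hypothetical names, and invoking scripts through `npm run` is an assumption, since a repository may use a different package manager. The other sources (`CLAUDE.md`, `.flumecode/wiki/`, `Makefile`) are not covered.

```typescript
// Script names the skill text says to look for, in the order it lists them.
const SCRIPT_NAMES = ["build", "typecheck", "lint", "test"] as const;

// Hypothetical shape for the structured per-check result of the Verify step.
interface CheckResult {
  command: string;          // the exact command that was run
  passed: boolean;          // pass/fail
  failureExcerpt?: string;  // short excerpt of failing output, if any
}

// Returns one command per script that is actually defined in package.json.
// Assumption: scripts are invoked through `npm run`.
function discoverFromPackageJson(packageJsonText: string): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(packageJsonText);
  } catch {
    return []; // unreadable package.json: nothing discovered from this source
  }
  if (typeof parsed !== "object" || parsed === null) return [];
  const scripts = (parsed as { scripts?: unknown }).scripts;
  if (typeof scripts !== "object" || scripts === null) return [];
  const defined = scripts as Record<string, unknown>;
  return SCRIPT_NAMES
    .filter((name) => typeof defined[name] === "string")
    .map((name) => `npm run ${name}`);
}

// An empty result list means no build/test setup was found, which passes the gate.
function gatePasses(results: CheckResult[]): boolean {
  return results.every((r) => r.passed);
}
```

Filtering a fixed list of script names against what the repository defines is one way to honour "use what is present; do not hardcode specific command strings": the names to look for are fixed, but nothing is run unless the repository declares it.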
@@ -82,32 +107,38 @@ the next step.
  to return this as a clean, structured list so you can hand it straight to the
  report step.
 
- 4. **Code-quality review** — Task, `model: "opus"`, read-only. Give the subagent
+ 5. **Code-quality review** — Task, `model: "opus"`, read-only. Give the subagent
  the coding guidelines (verbatim) and tell it to review the changes for
  violations and quality problems, returning concrete findings as
  `file:line — what — why`, each marked **must-fix** or **nice-to-have**.
 
- 5. **Fix loop.** If the AC review reports any _not met_ AC, or the quality review
+ 6. **Fix loop.** If the Verify step (step 3) reports any failing check, the AC
+ review (step 4) reports any _not met_ AC, or the quality review (step 5)
  reports any _must-fix_ finding: spawn an **Implement/fix** subagent (Task,
  `model: "sonnet"`) whose prompt lists exactly those findings and tells it to
- resolve them without regressing the rest. Then re-run only the review(s) that
- failed. Repeat at most **2** times. If something still fails after that, stop
- looping and record the gap honestly in the report — do not hide it.
-
- 6. **Report** — Task, `model: "opus"`, read-only. Give the subagent the plan, the AC
- verdicts (from step 3), and the quality findings, and tell it to run
- `git --no-pager diff` itself as the **single source of truth** for the report.
- Every `evidence` hunk it submits must be copied verbatim from that live diff — it
- must drop or correct any hunk carried over from step 3 that no longer appears in
- the actual diff, and the **Files changed** list must come from
- `git --no-pager diff --stat`, not from what an earlier subagent claimed. **If
- `git --no-pager diff` is empty, the implementation changed nothing:** the report
- must say so plainly — an honest `summary`, no AC marked `met` with evidence — and
- must never describe edits that aren't in the diff. Tell it to submit the
- user-facing report by calling the **`submit_report`** tool — it has that tool
- available. It must call `submit_report` exactly once and must not edit any files.
-
- 7. **Confirm and end.** Once the report subagent has called `submit_report`, you are
+ resolve them without regressing the rest. When a Verify failure triggered the
+ fix, include the failing command(s) and their error output excerpt(s) from the
+ Verify result in the fix subagent's prompt so it has the full context. After
+ each fix iteration, re-run the Verify step (step 3) in addition to any AC or
+ quality review that failed. Repeat at most **2** times. If something still
+ fails after that, stop looping and record the gap honestly in the report — do
+ not hide it.
+
+ 7. **Report** — Task, `model: "opus"`, read-only. Give the subagent the plan, the
+ Verify results (from step 3), the AC verdicts (from step 4), and the quality
+ findings, and tell it to run `git --no-pager diff` itself as the **single
+ source of truth** for the report. Every `evidence` hunk it submits must be
+ copied verbatim from that live diff — it must drop or correct any hunk carried
+ over from step 4 that no longer appears in the actual diff, and the **Files
+ changed** list must come from `git --no-pager diff --stat`, not from what an
+ earlier subagent claimed. **If `git --no-pager diff` is empty, the
+ implementation changed nothing:** the report must say so plainly — an honest
+ `summary`, no AC marked `met` with evidence — and must never describe edits
+ that aren't in the diff. Tell it to submit the user-facing report by calling
+ the **`submit_report`** tool — it has that tool available. It must call
+ `submit_report` exactly once and must not edit any files.
+
+ 8. **Confirm and end.** Once the report subagent has called `submit_report`, you are
  done — end your turn. The runner reads the submitted report, renders it, posts it
  to the thread, and appends the pull-request link. (Your own final text is only a
  fallback if no report was submitted, so make sure the subagent submits one.)
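The revised fix loop has three triggers and a hard cap of two iterations, which can be sketched as plain control flow. Everything named here is hypothetical (`GateState`, `fixLoop`, and the `runFix` and `recheck` callbacks, which stand in for spawning Task subagents); the sketch only mirrors the trigger conditions and the cap stated in the text.

```typescript
// Hypothetical summary of the gate results after one round.
interface GateState {
  verifyFailures: string[];  // failing commands with their output excerpts (step 3)
  unmetCriteria: string[];   // ACs judged "not met" (step 4)
  mustFixFindings: string[]; // quality findings marked must-fix (step 5)
}

const MAX_FIX_ITERATIONS = 2; // "Repeat at most 2 times"

// Everything that still has to be handed to a fix subagent.
function outstanding(state: GateState): string[] {
  return [...state.verifyFailures, ...state.unmetCriteria, ...state.mustFixFindings];
}

// `runFix` receives exactly the outstanding findings. `recheck` is expected to
// re-run Verify plus whichever AC or quality review failed, and return the new state.
function fixLoop(
  initial: GateState,
  runFix: (findings: string[]) => void,
  recheck: (previous: GateState) => GateState,
): { iterations: number; remaining: string[] } {
  let state = initial;
  let iterations = 0;
  while (outstanding(state).length > 0 && iterations < MAX_FIX_ITERATIONS) {
    runFix(outstanding(state));
    state = recheck(state);
    iterations += 1;
  }
  // Anything left in `remaining` is a gap the report has to record honestly.
  return { iterations, remaining: outstanding(state) };
}
```

Returning the leftover findings instead of throwing matches the instruction to stop looping after the cap and carry the gap into the report.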
@@ -120,11 +151,11 @@ The report subagent calls `submit_report` with these fields:
  - **`prose`** — markdown for the remaining sections, using `##` headings:
  **What changed** (the plan steps, each mapped to the concrete changes that satisfy
  it), **Code quality** (the quality-review outcome and anything left as
- nice-to-have), **Files changed** (the list from the diff), **Build / tests** (what
- was run and the result, or why it wasn't run), and **Caveats / follow-ups**
- (anything deferred, unmet, or worth a human's eyes). Do **not** put the
- acceptance-criteria section in `prose`, and do **not** include a PR link — the
- runner adds it.
+ nice-to-have), **Files changed** (the list from the diff), **Build / tests** (lists
+ each verification command and its final pass/fail result, or explains that no
+ build/test setup was found), and **Caveats / follow-ups** (anything deferred,
+ unmet, or worth a human's eyes). Do **not** put the acceptance-criteria section in
+ `prose`, and do **not** include a PR link — the runner adds it.
  - **`acceptanceCriteria`** — one entry per AC from the plan, in plan order, each:
  - `criterion` — the AC text verbatim.
  - `status` — `"met"` / `"not_met"` / `"unclear"`, mirroring the AC review.
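The tightened **Build / tests** requirement (each verification command with its final pass/fail result, or a note that no setup was found) could be rendered along these lines. This is a sketch only: `CheckResult` and `renderBuildTests` are hypothetical names, and the exact wording and bullet layout are assumptions; the source fixes only the `##` heading style and the information the section must carry.

```typescript
// Hypothetical shape of one final verification result.
interface CheckResult {
  command: string;
  passed: boolean;
}

// Builds the markdown for the "Build / tests" section of `prose`.
function renderBuildTests(results: CheckResult[]): string {
  const heading = "## Build / tests";
  if (results.length === 0) {
    // No commands were discovered, so the section says so explicitly.
    return `${heading}\n\nNo build/test setup was found in this repository.`;
  }
  const lines = results.map((r) => `- \`${r.command}\`: ${r.passed ? "pass" : "fail"}`);
  return `${heading}\n\n${lines.join("\n")}`;
}
```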
@@ -138,7 +169,7 @@ The report subagent calls `submit_report` with these fields:
 
  - Delegate through Task subagents; don't implement, review, or write the report
  yourself.
- - Right model per phase: `sonnet` to implement/fix, `opus` to review/report.
+ - Right model per phase: `sonnet` to implement/fix/verify (Verify is read-only), `opus` to review/report.
  - Make every Task prompt self-contained — subagents see only what you give them.
  - Reviewers and the report writer never modify files.
  - Never commit, push, or open a PR.
@@ -65,9 +65,12 @@ essentials:
  - **Scope the work to the request.** This is a fine-tune of an existing
  implementation, not a rebuild. Change only what the user asked for plus what that
  change strictly requires; don't regress the rest of the plan.
- - **Pipeline:** Implement (Task, `model: "sonnet"`) → acceptance/quality review of
- the change (Task, `model: "opus"`, read-only) → fix loop if needed (≤2) → report
- (Task, `model: "opus"`, read-only). Reviewers and the report writer never edit.
+ - **Pipeline:** Implement (self-runs build/tests & fixes its own errors, Task
+ `model: "sonnet"`) → Verify (build/tests, read-only, Task `model: "sonnet"`) →
+ acceptance/quality review (Task `model: "opus"`, read-only) → fix loop if needed
+ (≤2, re-run Verify after each fix) → report (Task `model: "opus"`, read-only).
+ Detailed mechanics (command discovery, Verify step spec, fix-loop trigger
+ conditions) are in `implement-plan/SKILL.md` — read it for the full pipeline.
  - **No git side effects.** Never commit, push, or open a PR — leave the changes in
  the working tree. The runner commits them and updates the existing pull request.
 
 
@@ -76,11 +79,13 @@ essentials:
76
79
  Your last message **is** the comment posted to the plan thread — write it for the
77
80
  user:
78
81
 
79
- - **Implemented:** a short report — what you changed and why, which files, and how
80
- it was verified (build/tests). Base "what changed" and "which files" on the actual
81
- `git --no-pager diff` (`--stat` for the file list), not on what a subagent claimed;
82
- if the diff is empty, say nothing was changed rather than describing edits that
83
- aren't there. The runner appends the pull-request link, so don't add one.
82
+ - **Implemented:** a short report — what you changed and why, which files, and the
83
+ verification results: list each build/test command that was run and its final
84
+ pass/fail result (or note that no build/test setup was found). Base "what changed"
85
+ and "which files" on the actual `git --no-pager diff` (`--stat` for the file
86
+ list), not on what a subagent claimed; if the diff is empty, say nothing was
87
+ changed rather than describing edits that aren't there. The runner appends the
88
+ pull-request link, so don't add one.
84
89
  - **Clarify / push back:** your question or reasoning, as prose (plus any widget).
85
90
  - **Re-plan:** you called `submit_plan`; the rendered plan is posted automatically,
86
91
  so keep any extra reply text minimal.