@flumecode/runner 0.8.0 → 0.9.0
package/README.md
CHANGED
@@ -63,7 +63,7 @@ skipping if that version is already on npm).
 6. Report the summary back (`POST /api/runner/jobs/:id/complete`), which fills in
 the pending agent comment in the thread.
 
-Jobs come in two kinds. **
+Jobs come in two kinds. **comment** jobs answer a request thread (the flow above).
 **init** jobs bootstrap a repository: they clone the default branch onto a fresh
 `flumecode/init-*` branch, run the `flumecode:document` skill to create the
 `.flumecode/` wiki, and open a PR. A repo must be initialized (from its dashboard
package/package.json
CHANGED
@@ -31,10 +31,11 @@ put it in the prompt, the subagent doesn't have it.
 
 - Spawn each phase with the **Task** tool, `subagent_type: "general-purpose"`.
 - **Model per phase** (pass it as the Task `model` argument):
-- `"sonnet"` — implementation and
+- `"sonnet"` — implementation, fixes, and the Verify step (mechanical
+command-running; Verify is read-only even though it uses sonnet).
 - `"opus"` — acceptance-criteria review, code-quality review, and the report.
-- **
-report only — never edit, create, or delete files_. Only implementation/fix
+- **Read-only phases.** Tell every review, Verify, and report subagent to _inspect
+and report only — never edit, create, or delete files_. Only implementation/fix
 subagents may change the working tree.
 - **No git side effects.** Neither you nor any subagent may commit, push, or open
 a PR. Leave the changes in the working tree; the runner commits + opens the PR
@@ -61,11 +62,35 @@ the next step.
 
 2. **Implement** — Task, `model: "sonnet"`. Give the subagent: the plan steps, a
 pointer to the wiki/orientation, and the coding guidelines (verbatim). Tell it
-to make all the code changes in the working tree to satisfy the plan,
-
-
-
-
+to make all the code changes in the working tree to satisfy the plan, then
+self-verify by discovering and running the project's verification commands —
+checking these sources in order: `package.json` scripts (look for `build`,
+`typecheck`, `lint`, `test`), `CLAUDE.md`, any `.flumecode/wiki/` page that
+mentions commands, and `Makefile`. Use whatever is present and appropriate for
+this repo; do not hardcode specific command strings. Run each discovered
+command and fix any errors that the edits introduced before returning. If no
+build/test setup exists in this repo, note that and move on — do not fail. End
+by reporting: the verification commands it ran and their pass/fail results,
+which files it changed, and how each plan step was addressed. It must not
+commit or push.
+
+3. **Verify (build & tests)** — Task, `model: "sonnet"`, read-only. This step
+gives the orchestrator an objective, independent build/test signal before the
+subjective AC and quality reviews. Tell the subagent to:
+- Discover the project's verification commands from `package.json` scripts
+(look for `build`, `typecheck`, `lint`, `test`), `CLAUDE.md`,
+`.flumecode/wiki/` (any page that mentions commands), and `Makefile`. Use
+what is present; do not hardcode specific command strings.
+- Run each discovered command and record: the exact command, whether it passed
+or failed, and — for any failure — a short excerpt of the failing output
+(enough to diagnose the problem).
+- If no build/test setup exists in this repo, say so explicitly and pass the
+gate.
+- Return a structured per-check result: command, pass/fail, failing-output
+excerpt (if any).
+- Must not edit, create, or delete any files.
+
+4. **Acceptance-criteria review** — Task, `model: "opus"`, read-only. Give the
 subagent the full AC list and tell it to verify each one against the actual
 changes (run `git --no-pager diff`, read the changed files, run tests/build if
 useful). For **each** AC it must return: the criterion text verbatim, a verdict
@@ -82,32 +107,38 @@ the next step.
 to return this as a clean, structured list so you can hand it straight to the
 report step.
 
-
+5. **Code-quality review** — Task, `model: "opus"`, read-only. Give the subagent
 the coding guidelines (verbatim) and tell it to review the changes for
 violations and quality problems, returning concrete findings as
 `file:line — what — why`, each marked **must-fix** or **nice-to-have**.
 
-
+6. **Fix loop.** If the Verify step (step 3) reports any failing check, the AC
+review (step 4) reports any _not met_ AC, or the quality review (step 5)
 reports any _must-fix_ finding: spawn an **Implement/fix** subagent (Task,
 `model: "sonnet"`) whose prompt lists exactly those findings and tells it to
-resolve them without regressing the rest.
-
-
-
-
-
-
-
-
-
-`git --no-pager diff
-
-
-
-
-
-
-
+resolve them without regressing the rest. When a Verify failure triggered the
+fix, include the failing command(s) and their error output excerpt(s) from the
+Verify result in the fix subagent's prompt so it has the full context. After
+each fix iteration, re-run the Verify step (step 3) in addition to any AC or
+quality review that failed. Repeat at most **2** times. If something still
+fails after that, stop looping and record the gap honestly in the report — do
+not hide it.
+
+7. **Report** — Task, `model: "opus"`, read-only. Give the subagent the plan, the
+Verify results (from step 3), the AC verdicts (from step 4), and the quality
+findings, and tell it to run `git --no-pager diff` itself as the **single
+source of truth** for the report. Every `evidence` hunk it submits must be
+copied verbatim from that live diff — it must drop or correct any hunk carried
+over from step 4 that no longer appears in the actual diff, and the **Files
+changed** list must come from `git --no-pager diff --stat`, not from what an
+earlier subagent claimed. **If `git --no-pager diff` is empty, the
+implementation changed nothing:** the report must say so plainly — an honest
+`summary`, no AC marked `met` with evidence — and must never describe edits
+that aren't in the diff. Tell it to submit the user-facing report by calling
+the **`submit_report`** tool — it has that tool available. It must call
+`submit_report` exactly once and must not edit any files.
+
+8. **Confirm and end.** Once the report subagent has called `submit_report`, you are
 done — end your turn. The runner reads the submitted report, renders it, posts it
 to the thread, and appends the pull-request link. (Your own final text is only a
 fallback if no report was submitted, so make sure the subagent submits one.)
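
As an illustration of the Verify gate and fix loop described in steps 3 to 6, here is a minimal TypeScript sketch. It is not the runner's implementation: `CheckResult`, `Findings`, `Phases`, `discoverScripts`, `verifyPasses`, and `runFixLoop` are hypothetical names, the phases are modelled as synchronous callbacks rather than real Task calls, and the single `review` callback re-runs every review where the skill text only requires re-running Verify plus the reviews that failed. Discovery covers only `package.json`; the text also lists `CLAUDE.md`, `.flumecode/wiki/`, and `Makefile`.

```typescript
// Hypothetical shape: the skill text only asks for command, pass/fail, and an excerpt.
interface CheckResult {
  command: string;
  passed: boolean;
  failingExcerpt?: string;
}

// Script names the skill text says to look for in package.json.
const KNOWN_SCRIPTS = ["build", "typecheck", "lint", "test"] as const;

// Return the verification scripts that are actually present, in the order above.
function discoverScripts(packageJsonText: string): string[] {
  const pkg = JSON.parse(packageJsonText) as { scripts?: Record<string, string> };
  const scripts = pkg.scripts ?? {};
  return KNOWN_SCRIPTS.filter((name) => name in scripts);
}

// An empty result list means no build/test setup was found, which passes the gate.
function verifyPasses(results: CheckResult[]): boolean {
  return results.every((r) => r.passed);
}

// Everything that can trigger the fix loop (hypothetical grouping).
interface Findings {
  failedChecks: CheckResult[];
  unmetCriteria: string[];
  mustFix: string[];
}

interface Phases {
  review: () => Findings;            // stands in for Verify + AC review + quality review
  fix: (findings: Findings) => void; // stands in for the Implement/fix subagent
}

const MAX_FIX_ITERATIONS = 2;

function hasBlockers(f: Findings): boolean {
  return f.failedChecks.length > 0 || f.unmetCriteria.length > 0 || f.mustFix.length > 0;
}

// Run at most two fix iterations and return whatever still fails afterwards,
// so the caller can record the gap in the report instead of hiding it.
function runFixLoop(phases: Phases): { iterations: number; remaining: Findings } {
  let findings = phases.review();
  let iterations = 0;
  while (hasBlockers(findings) && iterations < MAX_FIX_ITERATIONS) {
    phases.fix(findings);
    iterations += 1;
    findings = phases.review(); // re-check after each fix
  }
  return { iterations, remaining: findings };
}
```

Returning the remaining findings, rather than throwing when the cap is reached, keeps the "record the gap honestly" rule in the caller's hands.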
@@ -120,11 +151,11 @@ The report subagent calls `submit_report` with these fields:
 - **`prose`** — markdown for the remaining sections, using `##` headings:
 **What changed** (the plan steps, each mapped to the concrete changes that satisfy
 it), **Code quality** (the quality-review outcome and anything left as
-nice-to-have), **Files changed** (the list from the diff), **Build / tests** (
-
-
-
-runner adds it.
+nice-to-have), **Files changed** (the list from the diff), **Build / tests** (lists
+each verification command and its final pass/fail result, or explains that no
+build/test setup was found), and **Caveats / follow-ups** (anything deferred,
+unmet, or worth a human's eyes). Do **not** put the acceptance-criteria section in
+`prose`, and do **not** include a PR link — the runner adds it.
 - **`acceptanceCriteria`** — one entry per AC from the plan, in plan order, each:
 - `criterion` — the AC text verbatim.
 - `status` — `"met"` / `"not_met"` / `"unclear"`, mirroring the AC review.
@@ -138,7 +169,7 @@ The report subagent calls `submit_report` with these fields:
 
 - Delegate through Task subagents; don't implement, review, or write the report
 yourself.
-- Right model per phase: `sonnet` to implement/fix, `opus` to review/report.
+- Right model per phase: `sonnet` to implement/fix/verify (Verify is read-only), `opus` to review/report.
 - Make every Task prompt self-contained — subagents see only what you give them.
 - Reviewers and the report writer never modify files.
 - Never commit, push, or open a PR.
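
The model-per-phase and read-only rules in the checklist above could be captured as a small lookup table, sketched here in TypeScript. Only the model strings, `subagent_type: "general-purpose"`, and the read-only rule come from the skill text. The phase names, `PHASES`, `READ_ONLY_NOTICE`, `buildTaskArgs`, and the `prompt` field name are assumptions made for illustration.

```typescript
type Model = "sonnet" | "opus";
type PhaseName = "implement" | "fix" | "verify" | "acReview" | "qualityReview" | "report";

interface PhaseConfig {
  model: Model;
  readOnly: boolean;
}

// One row per pipeline phase: which model runs it and whether it may touch files.
const PHASES: Record<PhaseName, PhaseConfig> = {
  implement: { model: "sonnet", readOnly: false },
  fix: { model: "sonnet", readOnly: false },
  verify: { model: "sonnet", readOnly: true }, // sonnet, but still read-only
  acReview: { model: "opus", readOnly: true },
  qualityReview: { model: "opus", readOnly: true },
  report: { model: "opus", readOnly: true },
};

// Wording adapted from the skill text's read-only instruction.
const READ_ONLY_NOTICE = "Inspect and report only. Never edit, create, or delete files.";

// Build the arguments for one Task call; read-only phases get the notice appended
// so the prompt stays self-contained. The `prompt` field name is an assumption.
function buildTaskArgs(phase: PhaseName, prompt: string) {
  const { model, readOnly } = PHASES[phase];
  return {
    subagent_type: "general-purpose",
    model,
    prompt: readOnly ? `${prompt}\n\n${READ_ONLY_NOTICE}` : prompt,
  };
}
```

Keeping the read-only flag next to the model makes the Verify case explicit: it shares a model with the implement and fix phases but not their write access.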
@@ -65,9 +65,12 @@ essentials:
 - **Scope the work to the request.** This is a fine-tune of an existing
 implementation, not a rebuild. Change only what the user asked for plus what that
 change strictly requires; don't regress the rest of the plan.
-- **Pipeline:** Implement (
-
-(Task
+- **Pipeline:** Implement (self-runs build/tests & fixes its own errors, Task
+`model: "sonnet"`) → Verify (build/tests, read-only, Task `model: "sonnet"`) →
+acceptance/quality review (Task `model: "opus"`, read-only) → fix loop if needed
+(≤2, re-run Verify after each fix) → report (Task `model: "opus"`, read-only).
+Detailed mechanics (command discovery, Verify step spec, fix-loop trigger
+conditions) are in `implement-plan/SKILL.md` — read it for the full pipeline.
 - **No git side effects.** Never commit, push, or open a PR — leave the changes in
 the working tree. The runner commits them and updates the existing pull request.
 
@@ -76,11 +79,13 @@ essentials:
 Your last message **is** the comment posted to the plan thread — write it for the
 user:
 
-- **Implemented:** a short report — what you changed and why, which files, and
-
-
-
-
+- **Implemented:** a short report — what you changed and why, which files, and the
+verification results: list each build/test command that was run and its final
+pass/fail result (or note that no build/test setup was found). Base "what changed"
+and "which files" on the actual `git --no-pager diff` (`--stat` for the file
+list), not on what a subagent claimed; if the diff is empty, say nothing was
+changed rather than describing edits that aren't there. The runner appends the
+pull-request link, so don't add one.
 - **Clarify / push back:** your question or reasoning, as prose (plus any widget).
 - **Re-plan:** you called `submit_plan`; the rendered plan is posted automatically,
 so keep any extra reply text minimal.
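
The "Implemented" message rules above (file list taken from the diff, an explicit statement when the diff is empty, and one pass/fail line per verification command) can be sketched as a small formatter. This is an illustrative TypeScript sketch under stated assumptions, not the skill's own code: `FinalCheck` and `renderImplemented` are hypothetical names, and the caller is assumed to have already captured the output of `git --no-pager diff --stat` as a string.

```typescript
// Hypothetical shape for one verification command and its final result.
interface FinalCheck {
  command: string;
  passed: boolean;
}

// Build the factual part of the final message from the live diff stat and the
// final verification results, never from what a subagent claimed.
function renderImplemented(diffStat: string, checks: FinalCheck[]): string {
  const stat = diffStat.trim();
  const changes =
    stat === ""
      ? "Nothing was changed: the diff is empty."
      : `Files changed:\n${stat}`;
  const verification =
    checks.length === 0
      ? "No build/test setup was found."
      : checks
          .map((c) => `- \`${c.command}\`: ${c.passed ? "pass" : "fail"}`)
          .join("\n");
  // No pull-request link here: the runner appends it.
  return `${changes}\n\nVerification:\n${verification}`;
}
```

The prose explaining what changed and why would still be written around this output; the sketch covers only the parts that must come from the diff and the verification results.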