npm - @flumecode/runner - Versions diffs - 0.8.0 → 0.9.0 - Mend

@flumecode/runner 0.8.0 → 0.9.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (4) hide show

package/README.md +1 -1
package/package.json +1 -1
package/skills-plugin/skills/implement-plan/SKILL.md +65 -34
package/skills-plugin/skills/revise-implementation/SKILL.md +13 -8

package/README.md CHANGED Viewed

@@ -63,7 +63,7 @@ skipping if that version is already on npm).
 6. Report the summary back (`POST /api/runner/jobs/:id/complete`), which fills in
    the pending agent comment in the thread.
-Jobs come in two kinds. **chat** jobs answer a request thread (the flow above).
+Jobs come in two kinds. **comment** jobs answer a request thread (the flow above).
 **init** jobs bootstrap a repository: they clone the default branch onto a fresh
 `flumecode/init-*` branch, run the `flumecode:document` skill to create the
 `.flumecode/` wiki, and open a PR. A repo must be initialized (from its dashboard

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "@flumecode/runner",
-  "version": "0.8.0",
+  "version": "0.9.0",
   "type": "module",
   "description": "FlumeCode local runner — claims jobs and drives your local Claude Code against a real checkout.",
   "bin": {

package/skills-plugin/skills/implement-plan/SKILL.md CHANGED Viewed

@@ -31,10 +31,11 @@ put it in the prompt, the subagent doesn't have it.
 - Spawn each phase with the **Task** tool, `subagent_type: "general-purpose"`.
 - **Model per phase** (pass it as the Task `model` argument):
-  - `"sonnet"` — implementation and fixes (the code-writing work).
+  - `"sonnet"` — implementation, fixes, and the Verify step (mechanical
+    command-running; Verify is read-only even though it uses sonnet).
   - `"opus"` — acceptance-criteria review, code-quality review, and the report.
-- **Reviewers are read-only.** Tell every review/report subagent to _inspect and
-  report only — never edit, create, or delete files_. Only implementation/fix
+- **Read-only phases.** Tell every review, Verify, and report subagent to _inspect
+  and report only — never edit, create, or delete files_. Only implementation/fix
   subagents may change the working tree.
 - **No git side effects.** Neither you nor any subagent may commit, push, or open
   a PR. Leave the changes in the working tree; the runner commits + opens the PR
@@ -61,11 +62,35 @@ the next step.
 2. **Implement** — Task, `model: "sonnet"`. Give the subagent: the plan steps, a
    pointer to the wiki/orientation, and the coding guidelines (verbatim). Tell it
-   to make all the code changes in the working tree to satisfy the plan, keep the
-   build and tests green where practical, and end by reporting which files it
-   changed and how each step was addressed. It must not commit or push.
-3. **Acceptance-criteria review** — Task, `model: "opus"`, read-only. Give the
+   to make all the code changes in the working tree to satisfy the plan, then
+   self-verify by discovering and running the project's verification commands —
+   checking these sources in order: `package.json` scripts (look for `build`,
+   `typecheck`, `lint`, `test`), `CLAUDE.md`, any `.flumecode/wiki/` page that
+   mentions commands, and `Makefile`. Use whatever is present and appropriate for
+   this repo; do not hardcode specific command strings. Run each discovered
+   command and fix any errors that the edits introduced before returning. If no
+   build/test setup exists in this repo, note that and move on — do not fail. End
+   by reporting: the verification commands it ran and their pass/fail results,
+   which files it changed, and how each plan step was addressed. It must not
+   commit or push.
+3. **Verify (build & tests)** — Task, `model: "sonnet"`, read-only. This step
+   gives the orchestrator an objective, independent build/test signal before the
+   subjective AC and quality reviews. Tell the subagent to:
+   - Discover the project's verification commands from `package.json` scripts
+     (look for `build`, `typecheck`, `lint`, `test`), `CLAUDE.md`,
+     `.flumecode/wiki/` (any page that mentions commands), and `Makefile`. Use
+     what is present; do not hardcode specific command strings.
+   - Run each discovered command and record: the exact command, whether it passed
+     or failed, and — for any failure — a short excerpt of the failing output
+     (enough to diagnose the problem).
+   - If no build/test setup exists in this repo, say so explicitly and pass the
+     gate.
+   - Return a structured per-check result: command, pass/fail, failing-output
+     excerpt (if any).
+   - Must not edit, create, or delete any files.
+4. **Acceptance-criteria review** — Task, `model: "opus"`, read-only. Give the
    subagent the full AC list and tell it to verify each one against the actual
    changes (run `git --no-pager diff`, read the changed files, run tests/build if
    useful). For **each** AC it must return: the criterion text verbatim, a verdict
@@ -82,32 +107,38 @@ the next step.
    to return this as a clean, structured list so you can hand it straight to the
    report step.
-4. **Code-quality review** — Task, `model: "opus"`, read-only. Give the subagent
+5. **Code-quality review** — Task, `model: "opus"`, read-only. Give the subagent
    the coding guidelines (verbatim) and tell it to review the changes for
    violations and quality problems, returning concrete findings as
    `file:line — what — why`, each marked **must-fix** or **nice-to-have**.
-5. **Fix loop.** If the AC review reports any _not met_ AC, or the quality review
+6. **Fix loop.** If the Verify step (step 3) reports any failing check, the AC
+   review (step 4) reports any _not met_ AC, or the quality review (step 5)
    reports any _must-fix_ finding: spawn an **Implement/fix** subagent (Task,
    `model: "sonnet"`) whose prompt lists exactly those findings and tells it to
-   resolve them without regressing the rest. Then re-run only the review(s) that
-   failed. Repeat at most **2** times. If something still fails after that, stop
-   looping and record the gap honestly in the report — do not hide it.
-6. **Report** — Task, `model: "opus"`, read-only. Give the subagent the plan, the AC
-   verdicts (from step 3), and the quality findings, and tell it to run
-   `git --no-pager diff` itself as the **single source of truth** for the report.
-   Every `evidence` hunk it submits must be copied verbatim from that live diff — it
-   must drop or correct any hunk carried over from step 3 that no longer appears in
-   the actual diff, and the **Files changed** list must come from
-   `git --no-pager diff --stat`, not from what an earlier subagent claimed. **If
-   `git --no-pager diff` is empty, the implementation changed nothing:** the report
-   must say so plainly — an honest `summary`, no AC marked `met` with evidence — and
-   must never describe edits that aren't in the diff. Tell it to submit the
-   user-facing report by calling the **`submit_report`** tool — it has that tool
-   available. It must call `submit_report` exactly once and must not edit any files.
-7. **Confirm and end.** Once the report subagent has called `submit_report`, you are
+   resolve them without regressing the rest. When a Verify failure triggered the
+   fix, include the failing command(s) and their error output excerpt(s) from the
+   Verify result in the fix subagent's prompt so it has the full context. After
+   each fix iteration, re-run the Verify step (step 3) in addition to any AC or
+   quality review that failed. Repeat at most **2** times. If something still
+   fails after that, stop looping and record the gap honestly in the report — do
+   not hide it.
+7. **Report** — Task, `model: "opus"`, read-only. Give the subagent the plan, the
+   Verify results (from step 3), the AC verdicts (from step 4), and the quality
+   findings, and tell it to run `git --no-pager diff` itself as the **single
+   source of truth** for the report. Every `evidence` hunk it submits must be
+   copied verbatim from that live diff — it must drop or correct any hunk carried
+   over from step 4 that no longer appears in the actual diff, and the **Files
+   changed** list must come from `git --no-pager diff --stat`, not from what an
+   earlier subagent claimed. **If `git --no-pager diff` is empty, the
+   implementation changed nothing:** the report must say so plainly — an honest
+   `summary`, no AC marked `met` with evidence — and must never describe edits
+   that aren't in the diff. Tell it to submit the user-facing report by calling
+   the **`submit_report`** tool — it has that tool available. It must call
+   `submit_report` exactly once and must not edit any files.
+8. **Confirm and end.** Once the report subagent has called `submit_report`, you are
    done — end your turn. The runner reads the submitted report, renders it, posts it
    to the thread, and appends the pull-request link. (Your own final text is only a
    fallback if no report was submitted, so make sure the subagent submits one.)
@@ -120,11 +151,11 @@ The report subagent calls `submit_report` with these fields:
 - **`prose`** — markdown for the remaining sections, using `##` headings:
   **What changed** (the plan steps, each mapped to the concrete changes that satisfy
   it), **Code quality** (the quality-review outcome and anything left as
-  nice-to-have), **Files changed** (the list from the diff), **Build / tests** (what
-  was run and the result, or why it wasn't run), and **Caveats / follow-ups**
-  (anything deferred, unmet, or worth a human's eyes). Do **not** put the
-  acceptance-criteria section in `prose`, and do **not** include a PR link — the
-  runner adds it.
+  nice-to-have), **Files changed** (the list from the diff), **Build / tests** (lists
+  each verification command and its final pass/fail result, or explains that no
+  build/test setup was found), and **Caveats / follow-ups** (anything deferred,
+  unmet, or worth a human's eyes). Do **not** put the acceptance-criteria section in
+  `prose`, and do **not** include a PR link — the runner adds it.
 - **`acceptanceCriteria`** — one entry per AC from the plan, in plan order, each:
   - `criterion` — the AC text verbatim.
   - `status` — `"met"` / `"not_met"` / `"unclear"`, mirroring the AC review.
@@ -138,7 +169,7 @@ The report subagent calls `submit_report` with these fields:
 - Delegate through Task subagents; don't implement, review, or write the report
   yourself.
-- Right model per phase: `sonnet` to implement/fix, `opus` to review/report.
+- Right model per phase: `sonnet` to implement/fix/verify (Verify is read-only), `opus` to review/report.
 - Make every Task prompt self-contained — subagents see only what you give them.
 - Reviewers and the report writer never modify files.
 - Never commit, push, or open a PR.

package/skills-plugin/skills/revise-implementation/SKILL.md CHANGED Viewed

@@ -65,9 +65,12 @@ essentials:
 - **Scope the work to the request.** This is a fine-tune of an existing
   implementation, not a rebuild. Change only what the user asked for plus what that
   change strictly requires; don't regress the rest of the plan.
-- **Pipeline:** Implement (Task, `model: "sonnet"`) → acceptance/quality review of
-  the change (Task, `model: "opus"`, read-only) → fix loop if needed (≤2) → report
-  (Task, `model: "opus"`, read-only). Reviewers and the report writer never edit.
+- **Pipeline:** Implement (self-runs build/tests & fixes its own errors, Task
+  `model: "sonnet"`) → Verify (build/tests, read-only, Task `model: "sonnet"`) →
+  acceptance/quality review (Task `model: "opus"`, read-only) → fix loop if needed
+  (≤2, re-run Verify after each fix) → report (Task `model: "opus"`, read-only).
+  Detailed mechanics (command discovery, Verify step spec, fix-loop trigger
+  conditions) are in `implement-plan/SKILL.md` — read it for the full pipeline.
 - **No git side effects.** Never commit, push, or open a PR — leave the changes in
   the working tree. The runner commits them and updates the existing pull request.
@@ -76,11 +79,13 @@ essentials:
 Your last message **is** the comment posted to the plan thread — write it for the
 user:
-- **Implemented:** a short report — what you changed and why, which files, and how
-  it was verified (build/tests). Base "what changed" and "which files" on the actual
-  `git --no-pager diff` (`--stat` for the file list), not on what a subagent claimed;
-  if the diff is empty, say nothing was changed rather than describing edits that
-  aren't there. The runner appends the pull-request link, so don't add one.
+- **Implemented:** a short report — what you changed and why, which files, and the
+  verification results: list each build/test command that was run and its final
+  pass/fail result (or note that no build/test setup was found). Base "what changed"
+  and "which files" on the actual `git --no-pager diff` (`--stat` for the file
+  list), not on what a subagent claimed; if the diff is empty, say nothing was
+  changed rather than describing edits that aren't there. The runner appends the
+  pull-request link, so don't add one.
 - **Clarify / push back:** your question or reasoning, as prose (plus any widget).
 - **Re-plan:** you called `submit_plan`; the rendered plan is posted automatically,
   so keep any extra reply text minimal.