npm - @flumecode/runner - Versions diffs - 0.11.0 → 0.12.0 - Mend

@flumecode/runner 0.11.0 → 0.12.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (4) hide show

package/dist/cli.js +3 -3
package/package.json +1 -1
package/skills-plugin/skills/implement-plan/SKILL.md +10 -8
package/skills-plugin/skills/request-to-plan/SKILL.md +13 -5

package/dist/cli.js CHANGED Viewed

@@ -738,7 +738,7 @@ var planInputSchema = {
   assumptions: z2.array(z2.string()).describe("Anything decided during planning, including unanswered defaults."),
   steps: z2.array(stepSchema).min(1).describe("Ordered list of changes. Each step says what and why, with file references."),
   acceptanceCriteria: z2.array(z2.string().min(1)).min(2).describe(
-    "Observable conditions that together define done. Each must be checkable \u2014 a behavior, test, or state you could verify \u2014 not a restatement of a step. At least 2 required."
+    "Concrete, deterministically-checkable conditions that together define done. Each names a trigger/precondition and the exact observable result (run X -> output Y; file Z contains W; f(a) returns b) \u2014 no vague adjectives, not a restatement of a step. The set must collectively cover every step's change. At least 2 required."
   ),
   risks: z2.array(z2.string()).describe("Anything that could change the approach."),
   outOfScope: z2.array(z2.string()).describe("What is deliberately not being done.")
@@ -858,13 +858,13 @@ var acVerdictSchema = z3.object({
   status: z3.enum(["met", "not_met", "unclear"]).describe("Verdict for this criterion, verified against the actual diff."),
   rationale: z3.string().min(1).describe("One or two sentences on why the verdict holds."),
   evidence: z3.array(evidenceSchema).describe(
-    "Diff hunks proving the verdict. Include the relevant hunk(s) for a met criterion; may be empty for not_met / unclear."
+    "Diff hunks proving the verdict, copied verbatim from git --no-pager diff. Across ALL criteria the evidence must collectively cover every hunk in the diff \u2014 each changed hunk appears under at least one criterion. Cite the relevant hunk(s) for a met criterion; may be empty for not_met / unclear."
   )
 });
 var reportInputSchema = {
   summary: z3.string().min(1).describe("One or two sentences on what was implemented."),
   prose: z3.string().min(1).describe(
-    "Markdown for the remaining report sections \u2014 What changed, Files changed, Build / tests, and Caveats / follow-ups. Use ## headings. Do NOT include the acceptance-criteria section here (that goes in acceptanceCriteria) and do NOT include the PR link (the runner appends it)."
+    "Markdown for the remaining report sections \u2014 Files changed, Build / tests, and Caveats / follow-ups. Use ## headings. Do NOT include the acceptance-criteria section here (that goes in acceptanceCriteria) and do NOT include the PR link (the runner appends it)."
   ),
   acceptanceCriteria: z3.array(acVerdictSchema).min(1).describe(
     "One entry per acceptance criterion from the plan, in plan order, each with a verdict and the diff evidence behind it."

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "@flumecode/runner",
-  "version": "0.11.0",
+  "version": "0.12.0",
   "type": "module",
   "description": "FlumeCode local runner — claims jobs and drives your local Claude Code against a real checkout.",
   "bin": {

package/skills-plugin/skills/implement-plan/SKILL.md CHANGED Viewed

@@ -96,8 +96,9 @@ the next step.
    useful). For **each** AC it must return: the criterion text verbatim, a verdict
    (**met / not met / unclear**), a one-or-two-sentence rationale, and — this is the
    evidence the report needs — the **exact diff hunk(s)** that prove it, each tagged
-   with its file path (the minimal lines that matter, copied verbatim from
-   `git --no-pager diff`; not the whole file). A _met_ AC should cite at least one
+   with its file path (the hunks that prove it, copied verbatim from
+   `git --no-pager diff`, such that the union of every AC's evidence covers the
+   entire diff — each changed hunk cited under at least one criterion). A _met_ AC should cite at least one
    hunk; _not met_ / _unclear_ may cite none. **Ground every verdict in the actual
    diff:** a criterion may be marked _met_ only if `git --no-pager diff` really
    contains the change that satisfies it, and each cited hunk must be copied verbatim
@@ -105,7 +106,7 @@ the next step.
    implement subagent claimed. If `git --no-pager diff` is empty, the implementation
    produced no changes: no criterion may be _met_, and the review must say so. Tell it
    to return this as a clean, structured list so you can hand it straight to the
-   report step.
+   report step. In addition to per-AC verdicts, cross-check that every hunk in `git --no-pager diff` is cited by at least one AC's evidence; report any uncovered hunk as a coverage gap (signalling a missing AC or an out-of-scope change).
 5. **Code-quality review** — Task, `model: "opus"`, read-only. Give the subagent
    the coding guidelines (verbatim) and tell it to review the changes for
@@ -131,7 +132,7 @@ the next step.
    copied verbatim from that live diff — it must drop or correct any hunk carried
    over from step 4 that no longer appears in the actual diff, and the **Files
    changed** list must come from `git --no-pager diff --stat`, not from what an
-   earlier subagent claimed. **If `git --no-pager diff` is empty, the
+   earlier subagent claimed. Tell it to enumerate all hunks from `git --no-pager diff` and ensure each is attached to ≥1 AC's `evidence`; any hunk mapping to no plan AC goes under `## Caveats / follow-ups` as an explicit unattributed change. **If `git --no-pager diff` is empty, the
    implementation changed nothing:** the report must say so plainly — an honest
    `summary`, no AC marked `met` with evidence — and must never describe edits
    that aren't in the diff. Tell it to submit the user-facing report by calling
@@ -149,13 +150,13 @@ The report subagent calls `submit_report` with these fields:
 - **`summary`** — one or two sentences on what was implemented.
 - **`prose`** — markdown for the remaining sections, using `##` headings:
-  **What changed** (the plan steps, each mapped to the concrete changes that satisfy
-  it), **Code quality** (the quality-review outcome and anything left as
+  **Code quality** (the quality-review outcome and anything left as
   nice-to-have), **Files changed** (the list from the diff), **Build / tests** (lists
   each verification command and its final pass/fail result, or explains that no
   build/test setup was found), and **Caveats / follow-ups** (anything deferred,
-  unmet, or worth a human's eyes). Do **not** put the acceptance-criteria section in
-  `prose`, and do **not** include a PR link — the runner adds it.
+  unmet, or worth a human's eyes — including any diff hunks that map to no plan AC,
+  listed explicitly as unattributed changes). Do **not** put the acceptance-criteria
+  section in `prose`, and do **not** include a PR link — the runner adds it.
 - **`acceptanceCriteria`** — one entry per AC from the plan, in plan order, each:
   - `criterion` — the AC text verbatim.
   - `status` — `"met"` / `"not_met"` / `"unclear"`, mirroring the AC review.
@@ -177,3 +178,4 @@ The report subagent calls `submit_report` with these fields:
   once — not as prose for you to echo. Each acceptance criterion carries the diff
   hunk(s) that prove its verdict, copied verbatim from the live `git --no-pager diff`
   — never fabricated. An empty diff means an honest "nothing changed" report.
+- The report exists so the human reviewer can verify each acceptance criterion is satisfied — the ACs and their diff evidence are the primary review surface.

package/skills-plugin/skills/request-to-plan/SKILL.md CHANGED Viewed

@@ -71,11 +71,19 @@ Field-by-field guidance:
   - **`description`** — what changes and why: the concrete change being made and the rationale for it. Use concrete file references (`path/to/file.ts`) and name the functions/symbols involved.
   - **`pseudoCode`** — an array of `{ file, pseudoCode }` entries. Provide an entry for every file the step touches **except** documentation files (SKILL.md, README.md, wiki pages, etc.). `pseudoCode` is optional in the schema but expected for all non-documentation files. Each entry names the file path and contains pseudo code that precisely describes the changes to make in that file.
 - **`acceptanceCriteria`** — **required; at least 2 items.** Each criterion must
-  be an observable condition you could check after the work is done: a behavior,
-  a test result, or a verifiable state. Together they must fully define "done" —
-  satisfying all of them resolves the request, no more and no less. Do **not**
-  restate a step as a criterion ("the function is updated" is a step; "calling
-  the function with X returns Y" is a criterion).
+  be a concrete, deterministically-checkable condition that a third party can verify
+  without knowing the author's intent. Write each as a trigger/precondition and the
+  exact observable result: `run X → output Y`, `file Z contains W`, `calling f(a) returns b`.
+  No vague adjectives (`robust`, `clean`, `properly`, `works correctly`). The set
+  must be **collectively exhaustive** — every step's intended change is covered by
+  at least one AC. Do **not** restate a step as a criterion.
+  **Good vs bad examples:**
+  - ✅ `grep -rn "What changed" apps/runner/src/report.ts` produces no matches.
+  - ❌ The report is cleaner and no longer mentions 'What changed'. _(vague, not checkable)_
+  - ✅ `pnpm test` in the repo root exits 0 and report.test.ts output contains no failures.
+  - ❌ Tests pass correctly. _(no trigger, no observable result)_
 - **`risks`** — anything that could change the approach or surface a problem.
 - **`outOfScope`** — what you are deliberately not doing.