npm - @kody-ade/kody-engine - Versions diffs - 0.4.106 → 0.4.107 - Mend

@kody-ade/kody-engine 0.4.106 → 0.4.107

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (17) hide show

package/dist/bin/kody.js +49 -2
package/dist/executables/bug/profile.json +1 -1
package/dist/executables/fix/profile.json +1 -1
package/dist/executables/plan/agents/plan-scout.md +3 -0
package/dist/executables/plan/profile.json +1 -1
package/dist/executables/plan/prompt.md +3 -2
package/dist/executables/research/agents/research-scout.md +3 -0
package/dist/executables/research/profile.json +1 -1
package/dist/executables/research/prompt.md +3 -2
package/dist/executables/review/agents/review-correctness.md +3 -0
package/dist/executables/review/agents/review-security.md +3 -0
package/dist/executables/review/agents/review-style.md +3 -0
package/dist/executables/review/profile.json +1 -1
package/dist/executables/review/prompt.md +5 -3
package/dist/plugins/skills/systematic-debugging/SKILL.md +43 -0
package/package.json +1 -1
package/templates/kody.yml +1 -1

package/dist/bin/kody.js CHANGED Viewed

@@ -880,7 +880,7 @@ var init_loadPriorArt = __esm({
 // package.json
 var package_default = {
   name: "@kody-ade/kody-engine",
-  version: "0.4.106",
+  version: "0.4.107",
   description: "kody \u2014 autonomous development engine. Single-session Claude Code agent behind a generic executor + declarative executable profiles.",
   license: "MIT",
   type: "module",
@@ -2711,6 +2711,50 @@ init_issue();
 import { execFileSync as execFileSync30, spawn as spawn6 } from "child_process";
 import * as fs38 from "fs";
 import * as path36 from "path";
+// src/discipline.ts
+var DISCIPLINE = `# Working discipline (applies to this entire task)
+These rules override any instinct to take a shortcut. They exist because this
+work runs unattended \u2014 no human will catch a hand-waved claim before it ships.
+## Prove before you claim
+You do not get to decide you are "done". Your job is to make the work correct
+and to PROVE every claim you make; a separate wrapper verifies the result and
+will re-invoke you with the gap if proof is missing.
+Before writing ANY success or completion statement ("done", "fixed", "passes",
+"works", "verified"):
+1. IDENTIFY the exact command to run, or the exact file:line to read, that
+   would prove the claim.
+2. RUN/READ it fresh in this run. Do not rely on memory of an earlier output
+   or on "it should pass".
+3. READ the full output and exit code (or the actual lines you cited).
+4. Only then make the claim \u2014 and make it WITH the evidence beside it.
+"Great!" / "Perfect!" / "Done!" with nothing checked this turn is the same as
+claiming something you did not verify. Don't.
+## Do not rationalize past a step
+Violating the letter of an instruction is violating its spirit. If you catch
+yourself thinking any of the following, STOP \u2014 it is a red flag, not a green
+light:
+| The thought | The reality |
+|---|---|
+| "This is too simple to test/verify." | Simple changes break callers too; the check is cheap, the silent regression is not. |
+| "I already verified this earlier." | Earlier is not now \u2014 state changed. Run it again. |
+| "The diff looks right, so it works." | Reading code is not running it. Run it. |
+| "I'll skip this one step to save time." | The skipped step is the one that fails later with no human watching. |
+| "It probably passes." | "Probably" is not evidence. Make it certain, or say you could not. |
+## When you genuinely cannot finish
+If you cannot complete or cannot verify something, say so plainly and name what
+is blocking you. An honest "I could not verify X because Y" is correct and
+useful. A confident claim you never checked is the most expensive failure mode
+here \u2014 never substitute it for the truth.`;
+// src/executor.ts
 init_events();
 // src/lifecycleLabels.ts
@@ -3578,6 +3622,7 @@ function loadSubagents(profile) {
       const tools = fm.tools.split(",").map((t) => t.trim()).filter(Boolean);
       if (tools.length > 0) def.tools = tools;
     }
+    if (fm.model) def.model = fm.model;
     agents[fm.name || name] = def;
   }
   return agents;
@@ -11251,7 +11296,9 @@ async function runExecutable(profileName, input) {
       maxTurns: profile.claudeCode.maxTurns,
       maxThinkingTokens: profile.claudeCode.maxThinkingTokens,
       maxTurnTimeoutMs: typeof profile.claudeCode.maxTurnTimeoutSec === "number" ? Math.floor(profile.claudeCode.maxTurnTimeoutSec * 1e3) : void 0,
-      systemPromptAppend: [profile.claudeCode.systemPromptAppend, taskArtifacts?.promptAddendum].filter((s) => typeof s === "string" && s.length > 0).join("\n\n") || void 0,
+      // DISCIPLINE leads so the stable, role-agnostic block sits at the front
+      // of the cacheable system-prompt prefix; profile/task appends follow.
+      systemPromptAppend: [DISCIPLINE, profile.claudeCode.systemPromptAppend, taskArtifacts?.promptAddendum].filter((s) => typeof s === "string" && s.length > 0).join("\n\n") || void 0,
       cacheable: profile.claudeCode.cacheable,
       enableVerifyTool: profile.claudeCode.enableVerifyTool,
       verifyToolMaxAttempts: profile.claudeCode.verifyAttempts ?? null,

package/dist/executables/bug/profile.json CHANGED Viewed

@@ -37,7 +37,7 @@
       "mcp__kody-verify"
     ],
     "hooks": ["block-git"],
-    "skills": [],
+    "skills": ["systematic-debugging"],
     "commands": [],
     "subagents": [],
     "plugins": [],

package/dist/executables/fix/profile.json CHANGED Viewed

@@ -39,7 +39,7 @@
       "mcp__kody-verify"
     ],
     "hooks": ["block-git"],
-    "skills": [],
+    "skills": ["systematic-debugging"],
     "commands": [],
     "subagents": [],
     "plugins": [],

package/dist/executables/plan/agents/plan-scout.md CHANGED Viewed

@@ -18,8 +18,11 @@ Return ONLY a concise findings block — no preamble, no final-plan formatting (
 ```
 AREA: <the area/files you were assigned>
+- status: DONE | NEEDS_CONTEXT | BLOCKED
 - changes: <file:line — current state → target state, exact edit location>
 - pattern to reuse: <sibling path + which idioms/APIs are mirrored, or "new convention because …">
 - API surface: <symbol → definition path, or UNVERIFIED>
 - risks/edge cases/tests: <bullets an implementer must handle>
 ```
+`status`: `DONE` = area fully investigated. `NEEDS_CONTEXT` = you need a file, boundary, or decision the lead must supply before you can finish — say exactly what. `BLOCKED` = the assigned area doesn't exist or the assignment is wrong — say why. Report `NEEDS_CONTEXT`/`BLOCKED` honestly; never pad the block with guesses to look complete.

package/dist/executables/plan/profile.json CHANGED Viewed

@@ -23,7 +23,7 @@
       "Grep",
       "Glob",
       "Bash",
-      "Task"
+      "Agent"
     ],
     "hooks": ["block-write"],
     "skills": [],

package/dist/executables/plan/prompt.md CHANGED Viewed

@@ -62,11 +62,12 @@ If a file you need to read does not exist, say so explicitly in the plan under "
 # Parallel investigation (do this to meet the Research floor faster)
-You have a `plan-scout` subagent available via the `Task` tool. Use it to satisfy the Research floor in parallel:
+You have a `plan-scout` subagent available via the `Agent` tool. Use it to satisfy the Research floor in parallel:
 1. **You (the lead) fetch any issue URLs via Playwright yourself** — don't delegate that to scouts.
-2. Identify the distinct areas/files this change will touch (e.g. "the field component", "the data hook", "the migration", "the tests"). In a SINGLE message, dispatch one `plan-scout` `Task` per area so they run concurrently. Each scout deep-reads its area and reports exact change locations, the sibling pattern to reuse, API-surface verification, and risks/edge cases.
+2. Identify the distinct areas/files this change will touch (e.g. "the field component", "the data hook", "the migration", "the tests"). In a SINGLE message, dispatch one `plan-scout` `Agent` call per area so they run concurrently. Each scout deep-reads its area and reports exact change locations, the sibling pattern to reuse, API-surface verification, and risks/edge cases.
 3. Wait for all scouts, then synthesize their findings into the plan below. Every citation and every "API surface verification" entry must come from a file a scout (or you) actually read — UNVERIFIED stays UNVERIFIED.
+4. **Check each scout's `status`.** A scout that returns `NEEDS_CONTEXT` or `BLOCKED` did not finish its area. Do NOT re-dispatch the same scout with the same instructions — that just burns a turn for the same result. Instead, change something: supply the context it asked for, narrow or redefine its area, or read that area yourself. Never loop an unchanged dispatch.
 For a small single-file change, one scout (or your own reading) is fine — don't manufacture parallelism that isn't there.

package/dist/executables/research/agents/research-scout.md CHANGED Viewed

@@ -17,8 +17,11 @@ Return ONLY a concise findings block — no preamble, no final-doc formatting (t
 ```
 AREA: <the area you were assigned>
+- status: DONE | NEEDS_CONTEXT | BLOCKED
 - findings:
   - <file:line — what's there and why it matters for this issue>
 - patterns to reuse: <sibling module path + one line, or "none found (searched X)">
 - open questions / gaps: <anything an implementer still wouldn't know, or "none">
 ```
+`status`: `DONE` = area fully investigated. `NEEDS_CONTEXT` = you need a file, boundary, or decision the lead must supply before you can finish — say exactly what. `BLOCKED` = the assigned area doesn't exist or the assignment is wrong — say why. Report `NEEDS_CONTEXT`/`BLOCKED` honestly; never pad the block with guesses to look complete.

package/dist/executables/research/profile.json CHANGED Viewed

@@ -22,7 +22,7 @@
       "Grep",
       "Glob",
       "Bash",
-      "Task",
+      "Agent",
       "mcp__playwright"
     ],
     "hooks": ["block-write"],

package/dist/executables/research/prompt.md CHANGED Viewed

@@ -35,11 +35,12 @@ If a prior-art block is present above, scan the diffs and review comments — th
 # Parallel investigation (do this before writing the doc)
-You have a `research-scout` subagent available via the `Task` tool. Use it to investigate the repo in parallel:
+You have a `research-scout` subagent available via the `Agent` tool. Use it to investigate the repo in parallel:
 1. **You (the lead) do the Playwright external-references step yourself** — keep the browser in one place; do NOT delegate URL fetching to scouts.
-2. From the issue, identify 2–4 distinct investigation areas (e.g. "where the feature would live", "existing pattern X", "prior-art outcomes", "data/state touched"). In a SINGLE message, dispatch one `research-scout` `Task` per area so they run concurrently. Give each scout its specific area and the issue context.
+2. From the issue, identify 2–4 distinct investigation areas (e.g. "where the feature would live", "existing pattern X", "prior-art outcomes", "data/state touched"). In a SINGLE message, dispatch one `research-scout` `Agent` call per area so they run concurrently. Give each scout its specific area and the issue context.
 3. Wait for all scouts, then synthesize their findings into the doc below. Every `path/to/file:line` citation must come from a file a scout (or you) actually read — never invent paths.
+4. **Check each scout's `status`.** A scout that returns `NEEDS_CONTEXT` or `BLOCKED` did not finish its area. Do NOT re-dispatch the same scout with the same instructions — that just burns a turn for the same result. Instead, change something: supply the context it asked for, narrow or redefine its area, or read that area yourself. Never loop an unchanged dispatch.
 For a trivial issue where one area suffices, a single scout (or your own reading) is fine — don't manufacture parallelism that isn't there.

package/dist/executables/review/agents/review-correctness.md CHANGED Viewed

@@ -18,9 +18,12 @@ Return ONLY this block — no preamble:
 ```
 CORRECTNESS
+- status: DONE | NEEDS_CONTEXT | BLOCKED
 - severity: BLOCK | WARN | NONE
 - findings:
   - <file:line — concrete bug/regression and how it manifests at runtime, or "None">
 ```
 Use `BLOCK` only for a clear correctness or regression risk (wrong output, broken caller, dropped tested case). Test-coverage gaps that aren't outright bugs are `WARN`.
+`status`: `DONE` = you reviewed the full diff. `NEEDS_CONTEXT` = you need a file or context the lead must supply to finish — say exactly what. `BLOCKED` = you could not read the diff/files at all — say why. Never emit `severity: NONE` to fake a clean review when you were actually blocked; report the block.

package/dist/executables/review/agents/review-security.md CHANGED Viewed

@@ -17,9 +17,12 @@ Return ONLY this block — no preamble:
 ```
 SECURITY
+- status: DONE | NEEDS_CONTEXT | BLOCKED
 - severity: BLOCK | WARN | NONE
 - findings:
   - <file:line — concrete issue and the exploit it enables, or "None">
 ```
 Use `BLOCK` only for a real, exploitable vulnerability introduced by this diff. Pre-existing issues the diff didn't touch are out of scope.
+`status`: `DONE` = you reviewed the full diff. `NEEDS_CONTEXT` = you need a file or context the lead must supply to finish — say exactly what. `BLOCKED` = you could not read the diff/files at all — say why. Never emit `severity: NONE` to fake a clean review when you were actually blocked; report the block.

package/dist/executables/review/agents/review-style.md CHANGED Viewed

@@ -17,9 +17,12 @@ Return ONLY this block — no preamble:
 ```
 STRUCTURE
+- status: DONE | NEEDS_CONTEXT | BLOCKED
 - severity: WARN | NONE
 - findings:
   - <file:line — concrete structural/convention/doc gap and the existing pattern it should follow, or "None">
 ```
 Structure findings never `BLOCK` — they are advisory. Use `WARN` for real gaps, `NONE` otherwise.
+`status`: `DONE` = you reviewed the full diff. `NEEDS_CONTEXT` = you need a file or context the lead must supply to finish — say exactly what. `BLOCKED` = you could not read the diff/files at all — say why. Never emit `severity: NONE` to fake a clean review when you were actually blocked; report the block.

package/dist/executables/review/profile.json CHANGED Viewed

@@ -24,7 +24,7 @@
       "Grep",
       "Glob",
       "Bash",
-      "Task"
+      "Agent"
     ],
     "hooks": ["block-write"],
     "skills": [],

package/dist/executables/review/prompt.md CHANGED Viewed

@@ -16,19 +16,21 @@ Base: {{pr.baseRefName}} ← Head: {{pr.headRefName}}
 # How to run this review
-1. **Fan out in parallel.** In a SINGLE message, issue three `Task` calls — one to each subagent — so they run concurrently:
+1. **Fan out in parallel.** In a SINGLE message, issue three `Agent` calls — one to each subagent — so they run concurrently:
    - `review-security` — security vulnerabilities.
    - `review-correctness` — logic bugs, regressions, test gaps.
    - `review-style` — structure, conventions, duplication, docs.
    Give each subagent the same context: PR #{{pr.number}}, the base/head refs above, and the diff. Instruct each to read the full changed files (not just hunks) before reporting, and to return only its structured block.
-2. **Synthesize.** Once all three return, merge their findings into the single comment below. Resolve the verdict from the worst severity reported:
+2. **Check each reviewer's `status` before trusting its verdict.** A reviewer that returns `NEEDS_CONTEXT` or `BLOCKED` did not actually complete its review — do NOT treat its `severity: NONE` as a clean pass. Do NOT re-dispatch the same reviewer with the same instructions; change something: give it the context it asked for, or note in the comment that this dimension could not be reviewed. A review missing a whole dimension cannot be **PASS**.
+3. **Synthesize.** Once all three have genuinely completed, merge their findings into the single comment below. Resolve the verdict from the worst severity reported:
    - any `BLOCK` (security or correctness) → **FAIL**
    - no BLOCK but any `WARN` → **CONCERNS**
    - all `NONE` → **PASS**
-3. Drop duplicate findings, keep every distinct `file:line` citation. Do not invent citations — only pass through what the subagents reported from files they actually read.
+4. Drop duplicate findings, keep every distinct `file:line` citation. Do not invent citations — only pass through what the subagents reported from files they actually read.
 # Required output

package/dist/plugins/skills/systematic-debugging/SKILL.md ADDED Viewed

@@ -0,0 +1,43 @@
+---
+name: systematic-debugging
+description: Use when a test fails, a build breaks, or runtime behavior is wrong and the cause is not immediately obvious — especially before changing code to "see if it helps".
+---
+# Systematic debugging
+A failing check is a question ("why did this happen?"), not a prompt to start
+editing. Guessing-and-checking edits the code before you understand it; it
+usually masks the symptom, leaves the real bug, and burns turns. Find the cause
+first, then make one targeted fix.
+## The loop
+1. **Reproduce.** Run the exact failing command and read the FULL output and
+   exit code — not a summary, not memory of a past run. If you can't reproduce
+   it, you can't fix it; say so.
+2. **Isolate.** Narrow to the smallest input/file/line that still fails. Read
+   the full failing function and the code that calls it — the cause often sits
+   outside the line the error points at.
+3. **Find the root cause.** State, in one sentence, the actual mechanism: "X is
+   null here because Y never set it." If you can't write that sentence, you
+   don't understand it yet — keep reading, don't start editing.
+4. **Fix the cause, not the symptom.** Make the smallest change that addresses
+   the mechanism you named. Do not swallow the error, loosen the assertion, or
+   special-case the one failing input to make it pass.
+5. **Verify.** Re-run the exact command from step 1, fresh. Confirm it now
+   passes AND that you didn't break a sibling — run the surrounding tests too.
+## Red flags — stop if you think any of these
+| The thought | The reality |
+|---|---|
+| "Let me just try this and see if it works." | You're editing before you understand. Reproduce and isolate first. |
+| "I'll wrap it in try/catch / loosen the test." | That hides the bug, it doesn't fix it. Find why it throws. |
+| "It only fails sometimes, probably flaky." | "Probably" is not a root cause. Isolate the condition that triggers it. |
+| "The fix is obvious from the error message." | Confirm the mechanism by reading the code; error messages mislead. |
+## When you're stuck
+If two genuine attempts at the root cause fail, stop and say what you tried,
+what you ruled out, and what you'd need to get unblocked. An honest dead-end
+beats a symptom-masking patch that ships a latent bug.

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "@kody-ade/kody-engine",
-  "version": "0.4.106",
+  "version": "0.4.107",
   "description": "kody — autonomous development engine. Single-session Claude Code agent behind a generic executor + declarative executable profiles.",
   "license": "MIT",
   "type": "module",

package/templates/kody.yml CHANGED Viewed

@@ -90,4 +90,4 @@ jobs:
           INIT_MESSAGE:  ${{ inputs.message }}
           MODEL:         ${{ inputs.model }}
           DASHBOARD_URL: ${{ inputs.dashboardUrl }}
-        run: npx -y -p @kody-ade/kody-engine@0.4.106 kody-engine
+        run: npx -y -p @kody-ade/kody-engine@0.4.107 kody-engine