npm - @kody-ade/kody-engine - Versions diffs - 0.4.106 → 0.4.108 - Mend

@kody-ade/kody-engine 0.4.106 → 0.4.108

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (17) hide show

package/dist/bin/kody.js +49 -2
package/dist/executables/bug/profile.json +1 -1
package/dist/executables/fix/profile.json +1 -1
package/dist/executables/plan/agents/plan-scout.md +3 -0
package/dist/executables/plan/profile.json +1 -1
package/dist/executables/plan/prompt.md +14 -2
package/dist/executables/research/agents/research-scout.md +3 -0
package/dist/executables/research/profile.json +1 -1
package/dist/executables/research/prompt.md +3 -2
package/dist/executables/review/agents/review-correctness.md +3 -0
package/dist/executables/review/agents/review-security.md +3 -0
package/dist/executables/review/agents/review-style.md +3 -0
package/dist/executables/review/profile.json +1 -1
package/dist/executables/review/prompt.md +27 -3
package/dist/plugins/skills/systematic-debugging/SKILL.md +43 -0
package/package.json +1 -1
package/templates/kody.yml +1 -1

package/dist/bin/kody.js CHANGED Viewed

@@ -880,7 +880,7 @@ var init_loadPriorArt = __esm({
 // package.json
 var package_default = {
   name: "@kody-ade/kody-engine",
-  version: "0.4.106",
+  version: "0.4.108",
   description: "kody \u2014 autonomous development engine. Single-session Claude Code agent behind a generic executor + declarative executable profiles.",
   license: "MIT",
   type: "module",
@@ -2711,6 +2711,50 @@ init_issue();
 import { execFileSync as execFileSync30, spawn as spawn6 } from "child_process";
 import * as fs38 from "fs";
 import * as path36 from "path";
+// src/discipline.ts
+var DISCIPLINE = `# Working discipline (applies to this entire task)
+These rules override any instinct to take a shortcut. They exist because this
+work runs unattended \u2014 no human will catch a hand-waved claim before it ships.
+## Prove before you claim
+You do not get to decide you are "done". Your job is to make the work correct
+and to PROVE every claim you make; a separate wrapper verifies the result and
+will re-invoke you with the gap if proof is missing.
+Before writing ANY success or completion statement ("done", "fixed", "passes",
+"works", "verified"):
+1. IDENTIFY the exact command to run, or the exact file:line to read, that
+   would prove the claim.
+2. RUN/READ it fresh in this run. Do not rely on memory of an earlier output
+   or on "it should pass".
+3. READ the full output and exit code (or the actual lines you cited).
+4. Only then make the claim \u2014 and make it WITH the evidence beside it.
+"Great!" / "Perfect!" / "Done!" with nothing checked this turn is the same as
+claiming something you did not verify. Don't.
+## Do not rationalize past a step
+Violating the letter of an instruction is violating its spirit. If you catch
+yourself thinking any of the following, STOP \u2014 it is a red flag, not a green
+light:
+| The thought | The reality |
+|---|---|
+| "This is too simple to test/verify." | Simple changes break callers too; the check is cheap, the silent regression is not. |
+| "I already verified this earlier." | Earlier is not now \u2014 state changed. Run it again. |
+| "The diff looks right, so it works." | Reading code is not running it. Run it. |
+| "I'll skip this one step to save time." | The skipped step is the one that fails later with no human watching. |
+| "It probably passes." | "Probably" is not evidence. Make it certain, or say you could not. |
+## When you genuinely cannot finish
+If you cannot complete or cannot verify something, say so plainly and name what
+is blocking you. An honest "I could not verify X because Y" is correct and
+useful. A confident claim you never checked is the most expensive failure mode
+here \u2014 never substitute it for the truth.`;
+// src/executor.ts
 init_events();
 // src/lifecycleLabels.ts
@@ -3578,6 +3622,7 @@ function loadSubagents(profile) {
       const tools = fm.tools.split(",").map((t) => t.trim()).filter(Boolean);
       if (tools.length > 0) def.tools = tools;
     }
+    if (fm.model) def.model = fm.model;
     agents[fm.name || name] = def;
   }
   return agents;
@@ -11251,7 +11296,9 @@ async function runExecutable(profileName, input) {
       maxTurns: profile.claudeCode.maxTurns,
       maxThinkingTokens: profile.claudeCode.maxThinkingTokens,
       maxTurnTimeoutMs: typeof profile.claudeCode.maxTurnTimeoutSec === "number" ? Math.floor(profile.claudeCode.maxTurnTimeoutSec * 1e3) : void 0,
-      systemPromptAppend: [profile.claudeCode.systemPromptAppend, taskArtifacts?.promptAddendum].filter((s) => typeof s === "string" && s.length > 0).join("\n\n") || void 0,
+      // DISCIPLINE leads so the stable, role-agnostic block sits at the front
+      // of the cacheable system-prompt prefix; profile/task appends follow.
+      systemPromptAppend: [DISCIPLINE, profile.claudeCode.systemPromptAppend, taskArtifacts?.promptAddendum].filter((s) => typeof s === "string" && s.length > 0).join("\n\n") || void 0,
       cacheable: profile.claudeCode.cacheable,
       enableVerifyTool: profile.claudeCode.enableVerifyTool,
       verifyToolMaxAttempts: profile.claudeCode.verifyAttempts ?? null,

package/dist/executables/bug/profile.json CHANGED Viewed

@@ -37,7 +37,7 @@
       "mcp__kody-verify"
     ],
     "hooks": ["block-git"],
-    "skills": [],
+    "skills": ["systematic-debugging"],
     "commands": [],
     "subagents": [],
     "plugins": [],

package/dist/executables/fix/profile.json CHANGED Viewed

@@ -39,7 +39,7 @@
       "mcp__kody-verify"
     ],
     "hooks": ["block-git"],
-    "skills": [],
+    "skills": ["systematic-debugging"],
     "commands": [],
     "subagents": [],
     "plugins": [],

package/dist/executables/plan/agents/plan-scout.md CHANGED Viewed

@@ -18,8 +18,11 @@ Return ONLY a concise findings block — no preamble, no final-plan formatting (
 ```
 AREA: <the area/files you were assigned>
+- status: DONE | NEEDS_CONTEXT | BLOCKED
 - changes: <file:line — current state → target state, exact edit location>
 - pattern to reuse: <sibling path + which idioms/APIs are mirrored, or "new convention because …">
 - API surface: <symbol → definition path, or UNVERIFIED>
 - risks/edge cases/tests: <bullets an implementer must handle>
 ```
+`status`: `DONE` = area fully investigated. `NEEDS_CONTEXT` = you need a file, boundary, or decision the lead must supply before you can finish — say exactly what. `BLOCKED` = the assigned area doesn't exist or the assignment is wrong — say why. Report `NEEDS_CONTEXT`/`BLOCKED` honestly; never pad the block with guesses to look complete.

package/dist/executables/plan/profile.json CHANGED Viewed

@@ -23,7 +23,7 @@
       "Grep",
       "Glob",
       "Bash",
-      "Task"
+      "Agent"
     ],
     "hooks": ["block-write"],
     "skills": [],

package/dist/executables/plan/prompt.md CHANGED Viewed

@@ -62,11 +62,12 @@ If a file you need to read does not exist, say so explicitly in the plan under "
 # Parallel investigation (do this to meet the Research floor faster)
-You have a `plan-scout` subagent available via the `Task` tool. Use it to satisfy the Research floor in parallel:
+You have a `plan-scout` subagent available via the `Agent` tool. Use it to satisfy the Research floor in parallel:
 1. **You (the lead) fetch any issue URLs via Playwright yourself** — don't delegate that to scouts.
-2. Identify the distinct areas/files this change will touch (e.g. "the field component", "the data hook", "the migration", "the tests"). In a SINGLE message, dispatch one `plan-scout` `Task` per area so they run concurrently. Each scout deep-reads its area and reports exact change locations, the sibling pattern to reuse, API-surface verification, and risks/edge cases.
+2. Identify the distinct areas/files this change will touch (e.g. "the field component", "the data hook", "the migration", "the tests"). In a SINGLE message, dispatch one `plan-scout` `Agent` call per area so they run concurrently. Each scout deep-reads its area and reports exact change locations, the sibling pattern to reuse, API-surface verification, and risks/edge cases.
 3. Wait for all scouts, then synthesize their findings into the plan below. Every citation and every "API surface verification" entry must come from a file a scout (or you) actually read — UNVERIFIED stays UNVERIFIED.
+4. **Check each scout's `status`.** A scout that returns `NEEDS_CONTEXT` or `BLOCKED` did not finish its area. Do NOT re-dispatch the same scout with the same instructions — that just burns a turn for the same result. Instead, change something: supply the context it asked for, narrow or redefine its area, or read that area yourself. Never loop an unchanged dispatch.
 For a small single-file change, one scout (or your own reading) is fine — don't manufacture parallelism that isn't there.
@@ -82,6 +83,15 @@ COMMIT_MSG: plan: <very short title>
 PR_SUMMARY:
 <A deep, detailed implementation plan in markdown with the following sections, in order. Omit a section only if its trigger condition is not met — do not leave placeholders. Depth is expected; brevity for its own sake is not a goal.
+## Requirement coverage
+Enumerate EVERY discrete ask in the issue (and any answered clarifications) as a
+checklist, each mapped to where this plan delivers it:
+ - <verbatim ask> → <the section/file in this plan that addresses it>
+ - <verbatim ask> → ⚠ MISSING — <what's needed, or why it can't be planned>
+Do not finalize a plan that still has a ⚠ MISSING row unless that row is also
+listed under "Ambiguities & assumptions" with a concrete blocker. Silently
+dropping an ask the issue made is a planning failure.
 ## Existing patterns found
 For each major part of the change, name the sibling module in this repo that
 already solves a similar problem and state how this plan reuses it.
@@ -175,6 +185,8 @@ No filler. No marketing language. Depth over brevity.>
 - Read-only. Do NOT modify any file.
 - Do NOT run git or gh commands.
 - No speculative scope — plan only what the issue asks for, but plan it THOROUGHLY.
+- **Deliver the full ask or split it — never silently shrink it.** Planning a reduced version of what the issue requested is the most damaging failure mode. When any of these phrases (or their intent) describe a *stated requirement* rather than a genuine deferred phase, treat it as a BLOCKER: `"v1"`, `"v2 later"`, `"simplified"`, `"basic version"`, `"minimal"`, `"static for now"`, `"hardcoded for now"`, `"placeholder"`, `"stub"`, `"will be wired later"`, `"future enhancement"`. If the full ask is genuinely too large for one plan, output `FAILED: scope too large — split into <sub-issues>` — do NOT quietly plan less than was asked.
+- **Authority limits on narrowing scope.** You may narrow or split scope ONLY for concrete constraints: output/context-token budget, information you cannot obtain, or a dependency conflict. You may NOT narrow scope because a part looks hard, complex, or time-consuming — difficulty is never a license to reduce the ask.
 - **Plan length ≤ ~1500 lines / ~15k tokens.** Larger plans get truncated by output token caps before the closing `DONE` marker — and a truncated plan is worse than a smaller one. If a feature legitimately needs more, output `FAILED: scope too large for single plan — split into <list of sub-issues>` instead of overrunning.
 - If the issue is ambiguous and you cannot make progress without input, output `FAILED: <what's unclear>` instead of a plan.
 - If the Research floor cannot be met because required files are missing or unreadable, output `FAILED: <what could not be read>` instead of a half-blind plan.

package/dist/executables/research/agents/research-scout.md CHANGED Viewed

@@ -17,8 +17,11 @@ Return ONLY a concise findings block — no preamble, no final-doc formatting (t
 ```
 AREA: <the area you were assigned>
+- status: DONE | NEEDS_CONTEXT | BLOCKED
 - findings:
   - <file:line — what's there and why it matters for this issue>
 - patterns to reuse: <sibling module path + one line, or "none found (searched X)">
 - open questions / gaps: <anything an implementer still wouldn't know, or "none">
 ```
+`status`: `DONE` = area fully investigated. `NEEDS_CONTEXT` = you need a file, boundary, or decision the lead must supply before you can finish — say exactly what. `BLOCKED` = the assigned area doesn't exist or the assignment is wrong — say why. Report `NEEDS_CONTEXT`/`BLOCKED` honestly; never pad the block with guesses to look complete.

package/dist/executables/research/profile.json CHANGED Viewed

@@ -22,7 +22,7 @@
       "Grep",
       "Glob",
       "Bash",
-      "Task",
+      "Agent",
       "mcp__playwright"
     ],
     "hooks": ["block-write"],

package/dist/executables/research/prompt.md CHANGED Viewed

@@ -35,11 +35,12 @@ If a prior-art block is present above, scan the diffs and review comments — th
 # Parallel investigation (do this before writing the doc)
-You have a `research-scout` subagent available via the `Task` tool. Use it to investigate the repo in parallel:
+You have a `research-scout` subagent available via the `Agent` tool. Use it to investigate the repo in parallel:
 1. **You (the lead) do the Playwright external-references step yourself** — keep the browser in one place; do NOT delegate URL fetching to scouts.
-2. From the issue, identify 2–4 distinct investigation areas (e.g. "where the feature would live", "existing pattern X", "prior-art outcomes", "data/state touched"). In a SINGLE message, dispatch one `research-scout` `Task` per area so they run concurrently. Give each scout its specific area and the issue context.
+2. From the issue, identify 2–4 distinct investigation areas (e.g. "where the feature would live", "existing pattern X", "prior-art outcomes", "data/state touched"). In a SINGLE message, dispatch one `research-scout` `Agent` call per area so they run concurrently. Give each scout its specific area and the issue context.
 3. Wait for all scouts, then synthesize their findings into the doc below. Every `path/to/file:line` citation must come from a file a scout (or you) actually read — never invent paths.
+4. **Check each scout's `status`.** A scout that returns `NEEDS_CONTEXT` or `BLOCKED` did not finish its area. Do NOT re-dispatch the same scout with the same instructions — that just burns a turn for the same result. Instead, change something: supply the context it asked for, narrow or redefine its area, or read that area yourself. Never loop an unchanged dispatch.
 For a trivial issue where one area suffices, a single scout (or your own reading) is fine — don't manufacture parallelism that isn't there.

package/dist/executables/review/agents/review-correctness.md CHANGED Viewed

@@ -18,9 +18,12 @@ Return ONLY this block — no preamble:
 ```
 CORRECTNESS
+- status: DONE | NEEDS_CONTEXT | BLOCKED
 - severity: BLOCK | WARN | NONE
 - findings:
   - <file:line — concrete bug/regression and how it manifests at runtime, or "None">
 ```
 Use `BLOCK` only for a clear correctness or regression risk (wrong output, broken caller, dropped tested case). Test-coverage gaps that aren't outright bugs are `WARN`.
+`status`: `DONE` = you reviewed the full diff. `NEEDS_CONTEXT` = you need a file or context the lead must supply to finish — say exactly what. `BLOCKED` = you could not read the diff/files at all — say why. Never emit `severity: NONE` to fake a clean review when you were actually blocked; report the block.

package/dist/executables/review/agents/review-security.md CHANGED Viewed

@@ -17,9 +17,12 @@ Return ONLY this block — no preamble:
 ```
 SECURITY
+- status: DONE | NEEDS_CONTEXT | BLOCKED
 - severity: BLOCK | WARN | NONE
 - findings:
   - <file:line — concrete issue and the exploit it enables, or "None">
 ```
 Use `BLOCK` only for a real, exploitable vulnerability introduced by this diff. Pre-existing issues the diff didn't touch are out of scope.
+`status`: `DONE` = you reviewed the full diff. `NEEDS_CONTEXT` = you need a file or context the lead must supply to finish — say exactly what. `BLOCKED` = you could not read the diff/files at all — say why. Never emit `severity: NONE` to fake a clean review when you were actually blocked; report the block.

package/dist/executables/review/agents/review-style.md CHANGED Viewed

@@ -17,9 +17,12 @@ Return ONLY this block — no preamble:
 ```
 STRUCTURE
+- status: DONE | NEEDS_CONTEXT | BLOCKED
 - severity: WARN | NONE
 - findings:
   - <file:line — concrete structural/convention/doc gap and the existing pattern it should follow, or "None">
 ```
 Structure findings never `BLOCK` — they are advisory. Use `WARN` for real gaps, `NONE` otherwise.
+`status`: `DONE` = you reviewed the full diff. `NEEDS_CONTEXT` = you need a file or context the lead must supply to finish — say exactly what. `BLOCKED` = you could not read the diff/files at all — say why. Never emit `severity: NONE` to fake a clean review when you were actually blocked; report the block.

package/dist/executables/review/profile.json CHANGED Viewed

@@ -24,7 +24,7 @@
       "Grep",
       "Glob",
       "Bash",
-      "Task"
+      "Agent"
     ],
     "hooks": ["block-write"],
     "skills": [],

package/dist/executables/review/prompt.md CHANGED Viewed

@@ -16,19 +16,43 @@ Base: {{pr.baseRefName}} ← Head: {{pr.headRefName}}
 # How to run this review
-1. **Fan out in parallel.** In a SINGLE message, issue three `Task` calls — one to each subagent — so they run concurrently:
+1. **Fan out in parallel.** In a SINGLE message, issue three `Agent` calls — one to each subagent — so they run concurrently:
    - `review-security` — security vulnerabilities.
    - `review-correctness` — logic bugs, regressions, test gaps.
    - `review-style` — structure, conventions, duplication, docs.
    Give each subagent the same context: PR #{{pr.number}}, the base/head refs above, and the diff. Instruct each to read the full changed files (not just hunks) before reporting, and to return only its structured block.
-2. **Synthesize.** Once all three return, merge their findings into the single comment below. Resolve the verdict from the worst severity reported:
+2. **Check each reviewer's `status` before trusting its verdict.** A reviewer that returns `NEEDS_CONTEXT` or `BLOCKED` did not actually complete its review — do NOT treat its `severity: NONE` as a clean pass. Do NOT re-dispatch the same reviewer with the same instructions; change something: give it the context it asked for, or note in the comment that this dimension could not be reviewed. A review missing a whole dimension cannot be **PASS**.
+3. **Synthesize.** Once all three have genuinely completed, merge their findings into the single comment below. Resolve the verdict from the worst severity reported:
    - any `BLOCK` (security or correctness) → **FAIL**
    - no BLOCK but any `WARN` → **CONCERNS**
    - all `NONE` → **PASS**
-3. Drop duplicate findings, keep every distinct `file:line` citation. Do not invent citations — only pass through what the subagents reported from files they actually read.
+4. Drop duplicate findings, keep every distinct `file:line` citation. Do not invent citations — only pass through what the subagents reported from files they actually read.
+# Review stance — do not go soft
+Default to skepticism: assume the diff contains a defect until the code proves otherwise, and surface every issue you can demonstrate with a `file:line`. Watch for the ways a reviewer quietly goes easy — each is a failure here:
+- Downgrading a real BLOCK to a WARN or a Suggestion so the review feels less harsh.
+- Accepting "looks right" without confirming the change is actually wired (apply the depth ladder below).
+- Treating a stub or placeholder shipped against a *stated* requirement as acceptable. Phrases like `"v1"`, `"basic version"`, `"simplified"`, `"minimal"`, `"static for now"`, `"hardcoded for now"`, `"placeholder"`, `"stub"`, `"will be wired later"`, `"future enhancement"` — when they describe a behavior the issue actually asked for — are a **FAIL**, not a note.
+- Returning **PASS** when a whole dimension came back `BLOCKED`/`NEEDS_CONTEXT`.
+Severity reflects the risk in the code, never how it feels to report it.
+# Implementation depth — existence is not implementation
+For every change in the diff, don't stop at "the code is there". Walk the ladder:
+1. **Exists** — the function / route / field / component is present.
+2. **Substantive** — it has real logic, not a stub, `TODO`, `return null`, or an echo of its input.
+3. **Wired** — its output is actually consumed: the query result is returned, the fetched response is used, the new config key is read where it matters, the handler is registered/exported, the component is rendered. Grep the symbol's usages to confirm it's consumed, not just defined.
+4. **Functional** — it produces the right result for the issue's cases.
+Missing *wiring* is the most common defect — a query that exists but whose result is never returned, a fetch whose response is ignored, a config default added in code but absent from the schema. A change that reaches only Exists/Substantive but isn't wired is a correctness **FAIL**, not a style note.
 # Required output

package/dist/plugins/skills/systematic-debugging/SKILL.md ADDED Viewed

@@ -0,0 +1,43 @@
+---
+name: systematic-debugging
+description: Use when a test fails, a build breaks, or runtime behavior is wrong and the cause is not immediately obvious — especially before changing code to "see if it helps".
+---
+# Systematic debugging
+A failing check is a question ("why did this happen?"), not a prompt to start
+editing. Guessing-and-checking edits the code before you understand it; it
+usually masks the symptom, leaves the real bug, and burns turns. Find the cause
+first, then make one targeted fix.
+## The loop
+1. **Reproduce.** Run the exact failing command and read the FULL output and
+   exit code — not a summary, not memory of a past run. If you can't reproduce
+   it, you can't fix it; say so.
+2. **Isolate.** Narrow to the smallest input/file/line that still fails. Read
+   the full failing function and the code that calls it — the cause often sits
+   outside the line the error points at.
+3. **Find the root cause.** State, in one sentence, the actual mechanism: "X is
+   null here because Y never set it." If you can't write that sentence, you
+   don't understand it yet — keep reading, don't start editing.
+4. **Fix the cause, not the symptom.** Make the smallest change that addresses
+   the mechanism you named. Do not swallow the error, loosen the assertion, or
+   special-case the one failing input to make it pass.
+5. **Verify.** Re-run the exact command from step 1, fresh. Confirm it now
+   passes AND that you didn't break a sibling — run the surrounding tests too.
+## Red flags — stop if you think any of these
+| The thought | The reality |
+|---|---|
+| "Let me just try this and see if it works." | You're editing before you understand. Reproduce and isolate first. |
+| "I'll wrap it in try/catch / loosen the test." | That hides the bug, it doesn't fix it. Find why it throws. |
+| "It only fails sometimes, probably flaky." | "Probably" is not a root cause. Isolate the condition that triggers it. |
+| "The fix is obvious from the error message." | Confirm the mechanism by reading the code; error messages mislead. |
+## When you're stuck
+If two genuine attempts at the root cause fail, stop and say what you tried,
+what you ruled out, and what you'd need to get unblocked. An honest dead-end
+beats a symptom-masking patch that ships a latent bug.

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "@kody-ade/kody-engine",
-  "version": "0.4.106",
+  "version": "0.4.108",
   "description": "kody — autonomous development engine. Single-session Claude Code agent behind a generic executor + declarative executable profiles.",
   "license": "MIT",
   "type": "module",

package/templates/kody.yml CHANGED Viewed

@@ -90,4 +90,4 @@ jobs:
           INIT_MESSAGE:  ${{ inputs.message }}
           MODEL:         ${{ inputs.model }}
           DASHBOARD_URL: ${{ inputs.dashboardUrl }}
-        run: npx -y -p @kody-ade/kody-engine@0.4.106 kody-engine
+        run: npx -y -p @kody-ade/kody-engine@0.4.108 kody-engine