npm - @kody-ade/kody-engine - Versions diffs - 0.4.107 → 0.4.109 - Mend

@kody-ade/kody-engine 0.4.107 → 0.4.109

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (8)

package/dist/bin/kody.js +1 -1
package/dist/executables/plan/prompt.md +11 -0
package/dist/executables/review/agents/review-architecture.md +33 -0
package/dist/executables/review/agents/review-security.md +5 -2
package/dist/executables/review/profile.json +1 -1
package/dist/executables/review/prompt.md +30 -7
package/package.json +1 -1
package/templates/kody.yml +1 -1

package/dist/bin/kody.js CHANGED

@@ -880,7 +880,7 @@ var init_loadPriorArt = __esm({
 // package.json
 var package_default = {
   name: "@kody-ade/kody-engine",
-  version: "0.4.107",
+  version: "0.4.109",
   description: "kody \u2014 autonomous development engine. Single-session Claude Code agent behind a generic executor + declarative executable profiles.",
   license: "MIT",
   type: "module",

package/dist/executables/plan/prompt.md CHANGED

@@ -83,6 +83,15 @@ COMMIT_MSG: plan: <very short title>
 PR_SUMMARY:
 <A deep, detailed implementation plan in markdown with the following sections, in order. Omit a section only if its trigger condition is not met — do not leave placeholders. Depth is expected; brevity for its own sake is not a goal.
+## Requirement coverage
+Enumerate EVERY discrete ask in the issue (and any answered clarifications) as a
+checklist, each mapped to where this plan delivers it:
+ - <verbatim ask> → <the section/file in this plan that addresses it>
+ - <verbatim ask> → ⚠ MISSING — <what's needed, or why it can't be planned>
+Do not finalize a plan that still has a ⚠ MISSING row unless that row is also
+listed under "Ambiguities & assumptions" with a concrete blocker. Silently
+dropping an ask the issue made is a planning failure.
 ## Existing patterns found
 For each major part of the change, name the sibling module in this repo that
 already solves a similar problem and state how this plan reuses it.
@@ -176,6 +185,8 @@ No filler. No marketing language. Depth over brevity.>
 - Read-only. Do NOT modify any file.
 - Do NOT run git or gh commands.
 - No speculative scope — plan only what the issue asks for, but plan it THOROUGHLY.
+- **Deliver the full ask or split it — never silently shrink it.** Planning a reduced version of what the issue requested is the most damaging failure mode. When any of these phrases (or their intent) describe a *stated requirement* rather than a genuine deferred phase, treat it as a BLOCKER: `"v1"`, `"v2 later"`, `"simplified"`, `"basic version"`, `"minimal"`, `"static for now"`, `"hardcoded for now"`, `"placeholder"`, `"stub"`, `"will be wired later"`, `"future enhancement"`. If the full ask is genuinely too large for one plan, output `FAILED: scope too large — split into <sub-issues>` — do NOT quietly plan less than was asked.
+- **Authority limits on narrowing scope.** You may narrow or split scope ONLY for concrete constraints: output/context-token budget, information you cannot obtain, or a dependency conflict. You may NOT narrow scope because a part looks hard, complex, or time-consuming — difficulty is never a license to reduce the ask.
 - **Plan length ≤ ~1500 lines / ~15k tokens.** Larger plans get truncated by output token caps before the closing `DONE` marker — and a truncated plan is worse than a smaller one. If a feature legitimately needs more, output `FAILED: scope too large for single plan — split into <list of sub-issues>` instead of overrunning.
 - If the issue is ambiguous and you cannot make progress without input, output `FAILED: <what's unclear>` instead of a plan.
 - If the Research floor cannot be met because required files are missing or unreadable, output `FAILED: <what could not be read>` instead of a half-blind plan.
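
The "Requirement coverage" rule added to the plan prompt above could be checked mechanically before a plan is accepted. The following is only a minimal sketch: it assumes the plan is markdown with literal `## Requirement coverage` and `## Ambiguities & assumptions` headings, and that an ask counts as "listed" when its text appears verbatim in the ambiguities section. The helper names `sectionBody` and `unjustifiedMissingAsks` are hypothetical and do not come from the package.

```typescript
// Hypothetical helper: returns the text between "## <heading>" and the next "## " heading.
function sectionBody(markdown: string, heading: string): string {
  const lines = markdown.split("\n");
  const start = lines.findIndex((l) => l.trim() === `## ${heading}`);
  if (start === -1) return "";
  const rest = lines.slice(start + 1);
  const end = rest.findIndex((l) => l.startsWith("## "));
  return (end === -1 ? rest : rest.slice(0, end)).join("\n");
}

// Hypothetical check: asks marked "⚠ MISSING" that the ambiguities section never mentions.
// Assumption: a verbatim substring match is a good enough proxy for "also listed under".
function unjustifiedMissingAsks(plan: string): string[] {
  const coverage = sectionBody(plan, "Requirement coverage");
  const ambiguities = sectionBody(plan, "Ambiguities & assumptions");
  return coverage
    .split("\n")
    .filter((line) => line.includes("⚠ MISSING"))
    .map((line) => line.replace(/^\s*-\s*/, "").split("→")[0].trim())
    .filter((ask) => ask.length > 0 && !ambiguities.includes(ask));
}
```

A non-empty result would correspond to the case the prompt calls a planning failure: an ask that was dropped without a concrete blocker recorded.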

package/dist/executables/review/agents/review-architecture.md ADDED

@@ -0,0 +1,33 @@
+---
+name: review-architecture
+description: Architecture/structure reviewer for structural PRs. Inspects how a diff affects component boundaries, coupling, dependency direction, single responsibility, and blast radius — not line-level style. Returns findings only; never edits files.
+tools: Read, Grep, Glob, Bash
+---
+You are an architecture reviewer examining one pull request. Read-only: never edit files, never run `git`/`gh` write commands. Use Read / Grep / Glob and read-only `git diff` / `git show` to inspect.
+You are dispatched only when a diff is **structural** — it adds/moves/deletes modules, changes a public interface/export, or wires a new dependency between areas. Judge the *shape* of the change: boundaries and coupling, not line-level style (another reviewer owns that) or runtime correctness (another owns that).
+Method:
+- Map what moved: which modules/layers the diff touches and the new dependency edges it introduces. Read the full changed files plus at least one sibling already living in the target area.
+- Then check:
+  - **Single responsibility** — does each new/changed module do one clear job, or has it become a god-module / god-route?
+  - **Dependency direction** — does the new edge point the right way (a shared/core util must not import a feature/app layer; nothing should import "upward")? Flag layering violations and any new import cycle.
+  - **Reuse before rewrite** — does this add a new abstraction where an existing sibling already solves the problem? Name the sibling it should have reused.
+  - **Blast radius** — for a changed public interface, grep its call sites: how many are affected, and were they all updated? A signature/contract change with un-updated callers is a real risk.
+  - **Premature abstraction** — a new layer/interface with a single implementation and no second caller is a smell; say so rather than bless it.
+- Cite real `file:line` from files you actually read. Never invent citations.
+Return ONLY this block — no preamble:
+```
+ARCHITECTURE
+- status: DONE | NEEDS_CONTEXT | BLOCKED
+- severity: BLOCK | WARN | NONE
+- findings:
+  - <file:line — the boundary/coupling/responsibility issue, the existing pattern it should follow, and the concrete risk it creates, or "None">
+```
+Use `BLOCK` only for a structural change with a real, demonstrable risk — a new dependency cycle, a layering violation that breaks a stated invariant, or a public-interface change with un-updated callers. Design preferences with no concrete failure mode are `WARN`. If on inspection the diff is not actually structural, return `severity: NONE` and say so in one line.
+`status`: `DONE` = you reviewed the structural change. `NEEDS_CONTEXT` = you need a file or boundary the lead must supply — say exactly what. `BLOCKED` = you could not read the diff/files at all — say why. Never emit `severity: NONE` to fake a clean review when you were actually blocked; report the block.
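
The fixed return block defined in the new agent file above (a dimension name, then `status`, `severity`, and `findings`) is regular enough to parse. The sketch below illustrates one way a consumer might read it; it assumes the fields always appear in that order, one per line, and the `ParsedBlock` type and `parseReviewBlock` function are hypothetical names, not part of kody-engine.

```typescript
type Status = "DONE" | "NEEDS_CONTEXT" | "BLOCKED";
type Severity = "BLOCK" | "WARN" | "NONE";

// Hypothetical shape for a parsed reviewer block.
interface ParsedBlock {
  dimension: string;
  status: Status;
  severity: Severity;
  findings: string[];
}

// Returns null when the text does not follow the expected four-part layout.
function parseReviewBlock(block: string): ParsedBlock | null {
  const lines = block
    .split("\n")
    .map((l) => l.trimEnd())
    .filter((l) => l.trim() !== "" && !l.trim().startsWith("```"));
  const dimension = (lines[0] ?? "").trim();
  const status = /^- status:\s*(DONE|NEEDS_CONTEXT|BLOCKED)\s*$/.exec(lines[1] ?? "")?.[1];
  const severity = /^- severity:\s*(BLOCK|WARN|NONE)\s*$/.exec(lines[2] ?? "")?.[1];
  if (!dimension || !status || !severity || (lines[3] ?? "").trim() !== "- findings:") {
    return null;
  }
  // A bare "None" entry (quoted or not) means the reviewer had nothing to report.
  const findings = lines
    .slice(4)
    .map((l) => l.replace(/^\s*-\s*/, "").trim())
    .filter((f) => f !== "" && f !== "None" && f !== '"None"');
  return { dimension, status: status as Status, severity: severity as Severity, findings };
}
```

Rejecting malformed blocks with `null`, rather than defaulting to `severity: NONE`, is a design choice that follows the agent's own rule that a clean result must never be faked.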

package/dist/executables/review/agents/review-security.md CHANGED

@@ -11,8 +11,11 @@ Scope yourself strictly to security. Ignore style, naming, and general correctne
 Method:
 - Read the FULL changed files, not just the hunks — a vulnerability often lives outside the diff window.
 - For every request handler, query, or external call in the diff, check: is user input validated? Is it parameterized? Is authorization checked before the sensitive action? Are secrets read from env, not hardcoded?
+- **STRIDE per touched component.** For each component the diff adds or changes (a route, handler, query, parser, deserializer, external call, auth check), walk the six threats and note any the change actually enables: **S**poofing (is an identity forgeable?), **T**ampering (can input/state be mutated in transit or at rest?), **R**epudiation (is a security-relevant action left unlogged?), **I**nformation disclosure (is data leaked via response/log/error?), **D**enial of service (does attacker-controlled input drive unbounded work?), **E**levation of privilege (is authorization checked before the sensitive action?).
 - Cite real `file:line` from files you actually read. Never invent citations.
+Confidence filter — before reporting, suppress false positives. Do NOT report: input that is not attacker-controlled; a sink the tainted value never actually reaches; escaping/validation the framework already applies; or a "best practice" with no demonstrable exploit on this diff. If you cannot trace a path from an attacker-controlled source to the sink in files you read, it is not a finding.
 Return ONLY this block — no preamble:
 ```
@@ -20,9 +23,9 @@ SECURITY
 - status: DONE | NEEDS_CONTEXT | BLOCKED
 - severity: BLOCK | WARN | NONE
 - findings:
-  - <file:line — concrete issue and the exploit it enables, or "None">
+  - <file:line — the issue, the STRIDE category, and a concrete step-by-step exploit path (attacker sends X → reaches Y unchecked → gains Z), or "None">
 ```
-Use `BLOCK` only for a real, exploitable vulnerability introduced by this diff. Pre-existing issues the diff didn't touch are out of scope.
+Every `BLOCK`/`WARN` finding MUST include a concrete exploit path. If you cannot write the step-by-step path, the finding isn't real — drop it. Use `BLOCK` only for a real, exploitable vulnerability introduced by this diff. Pre-existing issues the diff didn't touch are out of scope.
 `status`: `DONE` = you reviewed the full diff. `NEEDS_CONTEXT` = you need a file or context the lead must supply to finish — say exactly what. `BLOCKED` = you could not read the diff/files at all — say why. Never emit `severity: NONE` to fake a clean review when you were actually blocked; report the block.
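
The rule added above, that a `BLOCK`/`WARN` finding without a concrete exploit path should be dropped, can be pictured as a filter over structured findings. This is a sketch under assumptions: the prompt only specifies free text, so the `SecurityFinding` shape is hypothetical, and requiring at least three steps (source, unchecked sink, gain) is one reading of the "attacker sends X → reaches Y unchecked → gains Z" template rather than a stated threshold.

```typescript
type Stride =
  | "Spoofing"
  | "Tampering"
  | "Repudiation"
  | "Information disclosure"
  | "Denial of service"
  | "Elevation of privilege";

// Hypothetical structured form of one finding line.
interface SecurityFinding {
  location: string; // expected as "file:line"
  issue: string;
  stride: Stride;
  exploitPath: string[]; // ordered steps from attacker-controlled source to gain
}

// Keeps only findings that cite a file:line and spell out a full exploit path.
// Assumption: three non-empty steps is the minimum for a "step-by-step" path.
function reportableFindings(findings: SecurityFinding[]): SecurityFinding[] {
  return findings.filter(
    (f) =>
      /^.+:\d+$/.test(f.location) &&
      f.exploitPath.length >= 3 &&
      f.exploitPath.every((step) => step.trim() !== ""),
  );
}
```

A filter like this only enforces the shape of a finding; whether the path is actually reachable still has to be traced in the files the reviewer read.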

package/dist/executables/review/profile.json CHANGED

@@ -29,7 +29,7 @@
     "hooks": ["block-write"],
     "skills": [],
     "commands": [],
-    "subagents": ["review-security", "review-correctness", "review-style"],
+    "subagents": ["review-security", "review-correctness", "review-style", "review-architecture"],
     "plugins": [],
     "mcpServers": []
   },
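
The profile change above registers a fourth subagent, while the review prompt in this release dispatches it only for structural diffs and resolves the verdict from the worst severity reported. The sketch below puts both rules side by side. It rests on stated assumptions: the `ChangedFile` flags are hypothetical inputs that something else would have to compute, and mapping an incomplete reviewer to **CONCERNS** is a guess, since the prompt says only that such a review cannot be **PASS**.

```typescript
// All names below are hypothetical; none are taken from the kody-engine source.
type Status = "DONE" | "NEEDS_CONTEXT" | "BLOCKED";
type Severity = "BLOCK" | "WARN" | "NONE";
type Verdict = "PASS" | "CONCERNS" | "FAIL";

interface ChangedFile {
  path: string;
  status: "added" | "deleted" | "renamed" | "modified";
  isTestOrConfig: boolean; // test-only and config-only changes are not structural
  changesPublicExport: boolean;
  addsCrossAreaImport: boolean;
}

interface ReviewerReport {
  reviewer: string;
  status: Status;
  severity: Severity;
}

const ALWAYS_DISPATCHED = ["review-security", "review-correctness", "review-style"];

// Structural = adds/moves/deletes a module, changes a public export, or wires a new dependency.
function isStructural(files: ChangedFile[]): boolean {
  return files.some(
    (f) =>
      !f.isTestOrConfig &&
      (f.status !== "modified" || f.changesPublicExport || f.addsCrossAreaImport),
  );
}

function selectSubagents(files: ChangedFile[]): string[] {
  return isStructural(files)
    ? [...ALWAYS_DISPATCHED, "review-architecture"]
    : [...ALWAYS_DISPATCHED];
}

// Worst severity wins; a reviewer that did not finish can never contribute to a PASS.
function resolveVerdict(reports: ReviewerReport[]): { verdict: Verdict; unreviewed: string[] } {
  const unreviewed = reports.filter((r) => r.status !== "DONE").map((r) => r.reviewer);
  if (reports.some((r) => r.severity === "BLOCK")) return { verdict: "FAIL", unreviewed };
  if (unreviewed.length > 0 || reports.some((r) => r.severity === "WARN")) {
    return { verdict: "CONCERNS", unreviewed };
  }
  return { verdict: "PASS", unreviewed };
}
```

Returning the `unreviewed` list alongside the verdict leaves room for the comment to note which dimension could not be reviewed, as the prompt asks.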

package/dist/executables/review/prompt.md CHANGED

@@ -16,22 +16,45 @@ Base: {{pr.baseRefName}} ← Head: {{pr.headRefName}}
 # How to run this review
-1. **Fan out in parallel.** In a SINGLE message, issue three `Agent` calls — one to each subagent — so they run concurrently:
-   - `review-security` — security vulnerabilities.
-   - `review-correctness` — logic bugs, regressions, test gaps.
-   - `review-style` — structure, conventions, duplication, docs.
+1. **Fan out in parallel.** In a SINGLE message, issue the `Agent` calls — one per subagent — so they run concurrently:
+   - `review-security` — security vulnerabilities. **Always.**
+   - `review-correctness` — logic bugs, regressions, test gaps. **Always.**
+   - `review-style` — structure, conventions, duplication, docs. **Always.**
+   - `review-architecture` — component boundaries, coupling, dependency direction, blast radius. **Only when the diff is structural**: it adds/moves/deletes modules, changes a public interface/export, or wires a new dependency between areas. Skip it for a localized change (a single function body, a copy tweak, a test-only or config-only diff) — a fourth reviewer with nothing to say only costs time.
    Give each subagent the same context: PR #{{pr.number}}, the base/head refs above, and the diff. Instruct each to read the full changed files (not just hunks) before reporting, and to return only its structured block.
 2. **Check each reviewer's `status` before trusting its verdict.** A reviewer that returns `NEEDS_CONTEXT` or `BLOCKED` did not actually complete its review — do NOT treat its `severity: NONE` as a clean pass. Do NOT re-dispatch the same reviewer with the same instructions; change something: give it the context it asked for, or note in the comment that this dimension could not be reviewed. A review missing a whole dimension cannot be **PASS**.
-3. **Synthesize.** Once all three have genuinely completed, merge their findings into the single comment below. Resolve the verdict from the worst severity reported:
-   - any `BLOCK` (security or correctness) → **FAIL**
+3. **Synthesize.** Once all dispatched subagents have genuinely completed, merge their findings into the single comment below. Resolve the verdict from the worst severity reported:
+   - any `BLOCK` (security, correctness, or architecture) → **FAIL**
    - no BLOCK but any `WARN` → **CONCERNS**
    - all `NONE` → **PASS**
 4. Drop duplicate findings, keep every distinct `file:line` citation. Do not invent citations — only pass through what the subagents reported from files they actually read.
+# Review stance — do not go soft
+Default to skepticism: assume the diff contains a defect until the code proves otherwise, and surface every issue you can demonstrate with a `file:line`. Watch for the ways a reviewer quietly goes easy — each is a failure here:
+- Downgrading a real BLOCK to a WARN or a Suggestion so the review feels less harsh.
+- Accepting "looks right" without confirming the change is actually wired (apply the depth ladder below).
+- Treating a stub or placeholder shipped against a *stated* requirement as acceptable. Phrases like `"v1"`, `"basic version"`, `"simplified"`, `"minimal"`, `"static for now"`, `"hardcoded for now"`, `"placeholder"`, `"stub"`, `"will be wired later"`, `"future enhancement"` — when they describe a behavior the issue actually asked for — are a **FAIL**, not a note.
+- Returning **PASS** when a whole dimension came back `BLOCKED`/`NEEDS_CONTEXT`.
+Severity reflects the risk in the code, never how it feels to report it.
+# Implementation depth — existence is not implementation
+For every change in the diff, don't stop at "the code is there". Walk the ladder:
+1. **Exists** — the function / route / field / component is present.
+2. **Substantive** — it has real logic, not a stub, `TODO`, `return null`, or an echo of its input.
+3. **Wired** — its output is actually consumed: the query result is returned, the fetched response is used, the new config key is read where it matters, the handler is registered/exported, the component is rendered. Grep the symbol's usages to confirm it's consumed, not just defined.
+4. **Functional** — it produces the right result for the issue's cases.
+Missing *wiring* is the most common defect — a query that exists but whose result is never returned, a fetch whose response is ignored, a config default added in code but absent from the schema. A change that reaches only Exists/Substantive but isn't wired is a correctness **FAIL**, not a style note.
 # Required output
 Your FINAL message must be exactly this markdown — no preamble, no DONE/COMMIT_MSG/PR_SUMMARY markers. The entire final message IS the review comment, posted verbatim:
@@ -39,7 +62,7 @@ Your FINAL message must be exactly this markdown — no preamble, no DONE/COMMIT
 ```
 ## Verdict: PASS | CONCERNS | FAIL
-> Reviewed in parallel by 3 subagents (security · correctness · structure).
+> Reviewed in parallel by specialist subagents (security · correctness · structure · architecture when the diff is structural).
 ### Summary
 <2-3 sentences: what this PR does, is the approach sound>

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@kody-ade/kody-engine",
-  "version": "0.4.107",
+  "version": "0.4.109",
   "description": "kody — autonomous development engine. Single-session Claude Code agent behind a generic executor + declarative executable profiles.",
   "license": "MIT",
   "type": "module",

package/templates/kody.yml CHANGED

@@ -90,4 +90,4 @@ jobs:
           INIT_MESSAGE:  ${{ inputs.message }}
           MODEL:         ${{ inputs.model }}
           DASHBOARD_URL: ${{ inputs.dashboardUrl }}
-        run: npx -y -p @kody-ade/kody-engine@0.4.107 kody-engine
+        run: npx -y -p @kody-ade/kody-engine@0.4.109 kody-engine