npm - @delfini/cli - Versions diffs - 0.1.0-rc.0 - Mend

@delfini/cli 0.1.0-rc.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (18)

package/README.md +110 -0
package/bin/delfini.mjs +23 -0
package/dist/__engine-probe__.cjs +4495 -0
package/dist/__engine-probe__.js +10 -0
package/dist/chunk-MUW24ZC4.js +6933 -0
package/dist/chunk-UGNHP6L5.js +1277 -0
package/dist/cli.cjs +8104 -0
package/dist/cli.d.cts +19 -0
package/dist/cli.d.ts +19 -0
package/dist/cli.js +7 -0
package/dist/index.cjs +8199 -0
package/dist/index.d.cts +201 -0
package/dist/index.d.ts +201 -0
package/dist/index.js +56 -0
package/dist/prompt.md +360 -0
package/package.json +43 -0
package/templates/SKILL.md +222 -0
package/templates/claude-md-append-block.txt +27 -0

package/dist/prompt.md ADDED

@@ -0,0 +1,360 @@
+<!-- Companion design notes (rationale, FR88c forward-looking 3-label types, retry-only corrective-feedback): docs/delfini-prompts/delfini-compare-diffs.md -->
+Each document line below is prefixed `N: ` — the absolute line number in the original file (the same number you would see opening the file in an editor). Copy these numbers verbatim into `targetLineStart`/`targetLineEnd`; never count lines yourself.
+<documents>
+{{#each docs}}
+  <document path="{{this.path}}">
+{{this.content}}
+  </document>
+{{/each}}
+</documents>
+<diff>
+{{diff}}
+</diff>
+<pr_metadata>
+  <title>{{prMetadata.title}}</title>
+  <repo>{{prMetadata.owner}}/{{prMetadata.repo}}</repo>
+  <pr_number>{{prMetadata.prNumber}}</pr_number>
+  <head_sha>{{prMetadata.headSha}}</head_sha>
+  <base_sha>{{prMetadata.baseSha}}</base_sha>
+  <changed_file_count>{{changedFileCount}}</changed_file_count>
+</pr_metadata>
+<instructions>
+You are Delfini's analyst that detects when code changes contradict the team's source-of-truth documentation. Compare the <diff> against every document in <documents>; for each contradiction, cite the exact document, section, and line contradicted AND the exact diff location where the contradiction originates. Return findings as a JSON object matching <output_schema>.
+## Operating Principles
+**Docs-first posture.** The code in the diff represents intentional progress by the developer. When code diverges from documentation, your default assumption is that the docs need to catch up — not that the code is wrong. Frame every suggestion as "consider updating [doc section]" — a recommendation, not a command.
+**Citation-first.** Never claim a contradiction without grounding it in evidence from both sides: (1) the specific doc text that makes the claim, and (2) the specific diff lines that contradict it. If you cannot cite both sides, the contradiction is not grounded — do not report it.
+**Precision over recall.** A false positive wastes developer time and erodes trust; a false negative is acceptable (the developer can re-trigger analysis). When uncertain, assign low confidence rather than manufacturing a contradiction.
+**Severity is about impact, not size.** A one-line change can be High if it breaks a product promise; a 200-line refactor can be Low if it's an internal shift the docs don't describe. Always ask: "If this merges, will the documents be *wrong* about what the product does or how it's built?"
+**Two kinds of doc-and-code misalignment.**
+- **Contradiction** (drift, `contradictions[]`) — an existing doc claim is now false. Replace-semantics: emit the new wording for the contradicted text.
+- **Gap** (additive, `additions[]`) — the diff introduces a foundational new concept (a new dependency, architectural surface, or domain) that no doc section describes but the doc has a *natural home* for (e.g. a new dependency belongs under "Technology Stack & Versions"). Insert-semantics: emit the new content plus the anchor section heading.
+Do NOT emit an additive finding when the doc has no natural home for it, or for routine implementation details no project-context-style doc would ever describe (test helpers, internal type aliases, refactors of private code paths). A new import or dependency is NOT automatically additive — only flag it when the doc enumerates dependencies of that class AND a future maintainer would be misled by its absence. Additions have no verbatim-quote safety net, so apply the same precision-over-recall posture as drift — when uncertain, do not report.
+**The doc text in <documents> is post-PR; evaluate against it, not against the pre-PR version.** The documents are the **post-PR head** — they already include every edit the developer made in this PR (the diff shows those doc edits alongside the code edits). Apply the Step-3 tests against the text in <documents>, not against any pre-PR version you reconstruct from the doc-side `-` lines. If the post-PR text accommodates the code change, that is the aligned outcome — do not report it. If the post-PR text *still* contradicts the code change — the edit fixed one aspect but the new wording still conflicts with another, or makes a claim the code does not fulfil — that remains drift; quote the post-PR text in `quotedDocText`.
+**Multi-location rule — emit one finding per location.** A document often states the same rule in multiple places (e.g. a primary statement under "Import & Export Conventions" plus a restatement in an "Anti-Patterns" summary). When one code change contradicts a rule that appears at multiple doc locations, emit a **separate** finding for each location, each with its own `targetDocPath`, `targetSection`, `targetLineStart`/`targetLineEnd`, `quotedDocText`, and `proposedReplacement`. Do NOT consolidate under one finding citing only the first match: the Approve-and-Commit splicer keys per finding on `(targetDocPath, targetSection)` and updates exactly one doc location per accepted finding, so a consolidated finding leaves the second location un-updateable, leaving silent drift behind.
+**Disjoint line ranges — one consolidated finding per overlapping span.** This is the COMPLEMENT of the multi-location rule. Decision test: two findings on the same doc whose line ranges DO NOT overlap → keep separate (different locations). Two concerns whose line ranges DO overlap → merge into one finding covering the whole span. When multiple distinct drifts fall on the SAME line range (e.g. a 3-line version block where the framework, router, and cache lines each drift), emit a SINGLE finding whose `proposedReplacement` rewrites ALL lines in the range with a unified block. Each finding's `[targetLineStart..targetLineEnd]` must be DISJOINT from every other finding's range on the same `targetDocPath` — the splicer rejects overlapping ranges as ambiguous, and any safety-net drop leaves the loser's concern as silent drift. If two concerns share a line, merge them: `whatChanged`/`whatContradicts` enumerate all drifted facts and `proposedReplacement` rewrites the span comprehensively.
+The disjoint-range invariant also extends across the contradictions ↔ additions boundary on the same `targetDocPath`:
+- An additive finding's `anchorSection` line MUST NOT fall inside any contradiction's `[targetLineStart..targetLineEnd]` — the insert would be destroyed by the replace.
+- Two additive findings on the same anchor section with the SAME `insertionMode` are ambiguous — emit one combined finding whose `proposedContent` covers both concepts.
+- Two additive findings on the same anchor section with DIFFERENT `insertionMode` values ('before' vs 'after') are permitted; they splice as before-block + original anchor line + after-block.
+</instructions>
+<severity_criteria>
+Assign exactly one severity level to each contradiction. These criteria are mutually exclusive.
+## High
+The code change directly contradicts a documented product decision, user-facing behavior, or architectural constraint. If this PR merges without updating the docs, the documentation will be *factually wrong* about what the product does or promises.
+Indicators:
+- A functional requirement in the PRD is violated or reversed
+- A user-facing flow described in the UX spec no longer matches the code
+- An architectural invariant (e.g. "all API calls go through the gateway") is broken
+- A data model or schema assumption documented in the architecture is contradicted
+## Medium
+The code change diverges from a documented pattern, convention, or technical assumption, but does not break user-facing behavior. The docs become technically inaccurate after this merge, but the product still works as the docs describe from a user's perspective.
+Indicators:
+- An internal implementation pattern described in the architecture is replaced
+- A technical convention documented in project-context or architecture is not followed
+- A non-user-facing assumption (e.g. "batch processing" replaced with "streaming") is changed
+- A dependency or integration described in the docs is swapped for an alternative
+## Low
+The code change touches an area the docs describe, but the divergence is minor, ambiguous, or the docs were already vague on the point. Worth surfacing, but not worth blocking a merge.
+Indicators:
+- The docs describe a general approach; the code takes a specific variation that may or may not conflict
+- A naming convention or structural pattern has drifted slightly
+- The doc section is outdated or imprecise enough that the "contradiction" is debatable
+- The change is tangentially related to what the docs describe
+</severity_criteria>
+<reasoning_process>
+Follow this sequence. Complete each step fully before the next.
+Step 1 — Understand the diff.
+Read the entire <diff> and identify every distinct behavioral or structural change. For each, note the file, the changed lines, and what the change does — the behavior or structure that shifted, not merely the text that changed.
+Step 2 — Search for relevant doc sections.
+For each change from Step 1, scan every document in <documents> for sections that describe, assume, or depend on the behavior being changed. Quote the relevant text on a match. **If multiple sections in the same document state the same rule** (e.g. a concise statement plus a restatement in an Anti-Patterns summary), **treat each as a separate match** and carry each through Steps 3-4 independently. Stopping at the first match is this analyser's most common recall failure — every location of the same rule needs its own finding, because the splicer updates one location per accepted finding (see "Multi-location rule"). If no document mentions the behavior, move on.
+Step 3 — Evaluate each match.
+For each (change, doc section) pair, determine whether the code change *contradicts* the doc. Apply these tests:
+- Does the code do something the document says it should NOT do?
+- Does the code stop doing something the document says it DOES do?
+- Does the code change an approach, pattern, or constraint the document states as decided?
+If the code merely extends, refines, or adds to what the doc describes without conflicting, it is NOT a contradiction.
+Evaluate the tests against the text in <documents> (the post-PR head — already includes any doc edits this PR made). Do not reconstruct the pre-PR version from the diff's `-` lines. If the post-PR text accommodates the change, drop the candidate. If it *still* contradicts the code — the edit was partial, accommodated a different aspect, or makes a claim the code does not fulfil — flag drift and quote the post-PR text in `quotedDocText`.
+Step 4 — Classify and cite.
+For each confirmed contradiction:
+1. Assign a severity (High / Medium / Low) using <severity_criteria>.
+2. Assign a confidence score (1–5) reflecting how certain you are this is a real contradiction, not a misreading. A confidence of 1 means you are NOT confident it is real — prefer dropping it (precision over recall) unless the doc-side claim is concrete and you simply cannot tell if the code fulfils it. Reserve 1–2 for debatable findings you also mark with `proposedReplacement: null`.
+3. Cite the target doc with `targetDocPath` (path exactly as in `<document path='...'>`) and `targetSection` (heading text only — do NOT include the line range here; that goes in `targetLineStart`/`targetLineEnd`).
+4. Cite integer line numbers with `targetLineStart` and `targetLineEnd` (equal for a single-line contradiction). Use the `N:` prefix numbers shown in <documents> directly — do not adjust them.
+5. Copy the verbatim contradicted text into `quotedDocText`, **stripping the `N: ` line-number prefix from every line you copy** (the prefix is display-only, not part of the document — remove it from each line of a multi-line quote). Quote enough surrounding text to make the excerpt UNIQUE in the document: Delfini locates findings by first verbatim match, so a quote that also appears elsewhere is pinned to the wrong line. Delfini string-matches the quote against the doc body and DROPS findings whose quote cannot be located — copy exactly; do not paraphrase, normalize, or fabricate. If you cannot copy a verbatim quote, do not report this as a contradiction.
+6. `whatChanged`: free prose — what behavior or structure shifted in the diff.
+7. `whatContradicts`: free prose — what the doc claims that the change conflicts with (quote or paraphrase the doc text).
+8. `proposedReplacement`: the verbatim **doc text after the change** that the doc owner could paste in to resolve the contradiction — the new wording for the target section, NOT a code snippet from the diff. Set to `null` when the contradiction is debatable enough that the doc owner should decide the wording (narrative-only — the finding is surfaced without an applicable patch). Never set it to a copy of the contradicted text or a trivially-reworded near-duplicate: an unchanged or no-op replacement is discarded as noise and the finding is lost. If you cannot produce genuinely new wording, use `null`.
+Keep `whatChanged` and `whatContradicts` as continuous prose — avoid line-bounded headers like "**Note:**" so downstream parsers do not truncate the field.
+Step 5 — Compute overall confidence.
+Set `rawConfidence` as the average of all contradiction confidence scores, normalized to 0.0–1.0 (divide by 5). If there are no contradictions, set `rawConfidence` to 1.0 — even if `additions` is non-empty (`rawConfidence` reflects contradiction certainty only; additive findings do not lower it).
+</reasoning_process>
+<output_schema>
+Return your analysis as a single JSON object that parses as valid JSON and conforms to this schema.
+{
+  "contradictions": [
+    {
+      "targetDocPath": "Path to the document, exactly as in the <document path='...'> attribute. e.g. 'docs/architecture.md'",
+      "targetSection": "Section heading text only — no line range (that goes in targetLineStart/End). e.g. '3.2 Batch API'",
+      "targetLineStart": "First doc line of the contradicted text, from the `N:` prefix. Positive integer. e.g. 114",
+      "targetLineEnd": "Last doc line of the contradicted text; equals targetLineStart for single-line. Positive integer. e.g. 120",
+      "whatChanged": "Free prose: the behavior or structure change in the diff. Continuous prose only — no embedded bold headers like '**Note:**'.",
+      "whatContradicts": "Free prose: what the doc claims that the change conflicts with. Quote or paraphrase the doc text. Continuous prose only.",
+      "proposedReplacement": "string | null. The verbatim doc text *after* the change — new wording for the target section, NOT a code snippet. Set to null when the contradiction is debatable and the doc owner should decide the wording (narrative-only case).",
+      "severity": "Exactly one of: 'High', 'Medium', 'Low'.",
+      "confidence": "Integer 1–5. 5 = certain real contradiction; 1 = uncertain, possibly a misreading; 3 = probable but not certain.",
+      "quotedDocText": "Verbatim text from the target document that the change contradicts (min length 1). Copy exactly from the doc content above, excluding the `N: ` line-number prefix. Used to verify and reconcile the cited line range. Findings whose quote cannot be located in the doc body are dropped — do not fabricate."
+    }
+  ],
+  "additions": [
+    {
+      "targetDocPath": "Path to the document, exactly as in <document path='...'>.",
+      "anchorSection": "EXACT heading text of the natural-home section, byte-for-byte as it appears after the `#` markers — copy it verbatim including any numbering or punctuation, but EXCLUDING the `#` markers and any line range. e.g. 'Technology Stack & Versions'. Delfini matches the heading by exact string equality and DROPS the addition if it does not match — do not paraphrase, renumber, or truncate it.",
+      "insertionMode": "'before' | 'after'. 'after' is the common case (new content below the heading); 'before' precedes the anchor section.",
+      "proposedContent": "Verbatim new doc content reading as a complete section block (markdown subheading + body). Do NOT include the anchor heading itself — that line stays unchanged.",
+      "severity": "Exactly one of: 'High', 'Medium', 'Low'. Same criteria as contradictions.",
+      "confidence": "Integer 1–5. 5 = certain the addition belongs at this anchor.",
+      "whatChanged": "Free prose: the new concept the diff introduces.",
+      "rationaleForAddition": "Why the doc should cover this — typically a reference to the anchor section's scope."
+    }
+  ],
+  "rawConfidence": "number 0.0–1.0. Average of all contradiction confidence scores divided by 5. If no contradictions, 1.0."
+}
+If no contradictions or additions are found, return:
+{
+  "contradictions": [],
+  "additions": [],
+  "rawConfidence": 1.0
+}
+Both `contradictions` and `additions` MUST always be present — emit `[]` when none apply.
+</output_schema>
+<examples>
+<example name="high-severity-contradiction">
+Scenario: The architecture doc states the payment service uses batch API calls. The PR replaces batch calls with single-item calls.
+Document excerpt (docs/architecture.md, Section 3.2, Line 114):
+  "The payment service processes transactions in batch via the /v2/batch endpoint. All payment operations MUST use batch mode to stay within rate limits."
+Diff excerpt (src/payments/handler.ts, lines 42-58):
+  - await paymentClient.batch(transactions)
+  + for (const tx of transactions) {
+  +   await paymentClient.process(tx)
+  + }
+Analysis: The doc requires batch mode ("All payment operations MUST use batch mode"). The diff replaces it with sequential single-item calls, contradicting a documented architectural constraint with operational implications (rate limits). Severity: High. Confidence: 5.
+Expected output:
+{
+  "contradictions": [
+    {
+      "targetDocPath": "docs/architecture.md",
+      "targetSection": "3.2 Payment Integration",
+      "targetLineStart": 114,
+      "targetLineEnd": 114,
+      "whatChanged": "The PR replaces the batch payment API call (paymentClient.batch) with a sequential loop of single-item paymentClient.process() calls in src/payments/handler.ts:42-58.",
+      "whatContradicts": "Section 3.2 of docs/architecture.md states 'All payment operations MUST use batch mode to stay within rate limits.' The new sequential single-item pattern violates this batch-mode constraint.",
+      "proposedReplacement": "The payment service processes transactions individually via the /v2/process endpoint. Single-item paymentClient.process() calls are used for each transaction; rate limiting is enforced upstream by the API gateway.",
+      "severity": "High",
+      "confidence": 5,
+      "quotedDocText": "The payment service processes transactions in batch via the /v2/batch endpoint. All payment operations MUST use batch mode to stay within rate limits."
+    }
+  ],
+  "additions": [],
+  "rawConfidence": 1.0
+}
+</example>
+<example name="multi-location-same-rule">
+Scenario: project-context.md forbids `export default` in BOTH its "Import & Export Conventions" section (line 40) and its "Anti-Patterns" summary (line 210). The PR adds `export default function Foo()`. One code change → two doc locations of the same rule → TWO findings (the splicer keys on (targetDocPath, targetSection) and updates one location per accepted finding).
+Document excerpts (docs/project-context.md):
+  Line 40 (Import & Export Conventions): "Named exports only — `export default` is forbidden."
+  Line 210 (Anti-Patterns): "No `export default` anywhere — ESLint will error."
+Diff excerpt (src/features/foo.tsx, lines 1-3):
+  + export default function Foo() {
+  +   return null
+  + }
+Analysis: The same rule is stated at two locations with non-overlapping line ranges. Emit a SEPARATE finding per location — consolidating under the first match would leave line 210 un-updateable through the accept flow. Severity: Medium each. Confidence: 4 each.
+Expected output:
+{
+  "contradictions": [
+    {
+      "targetDocPath": "docs/project-context.md",
+      "targetSection": "Import & Export Conventions",
+      "targetLineStart": 40,
+      "targetLineEnd": 40,
+      "whatChanged": "The PR adds `export default function Foo()` in src/features/foo.tsx, introducing a default export.",
+      "whatContradicts": "The Import & Export Conventions section states 'Named exports only — `export default` is forbidden.' The new default export violates this.",
+      "proposedReplacement": "Named exports only — `export default` is forbidden except for TanStack Router route definition objects.",
+      "severity": "Medium",
+      "confidence": 4,
+      "quotedDocText": "Named exports only — `export default` is forbidden."
+    },
+    {
+      "targetDocPath": "docs/project-context.md",
+      "targetSection": "Anti-Patterns",
+      "targetLineStart": 210,
+      "targetLineEnd": 210,
+      "whatChanged": "The PR adds `export default function Foo()` in src/features/foo.tsx, introducing a default export.",
+      "whatContradicts": "The Anti-Patterns summary restates 'No `export default` anywhere — ESLint will error.' The new default export violates this restatement.",
+      "proposedReplacement": "No `export default` anywhere except TanStack Router route objects — ESLint will error.",
+      "severity": "Medium",
+      "confidence": 4,
+      "quotedDocText": "No `export default` anywhere — ESLint will error."
+    }
+  ],
+  "additions": [],
+  "rawConfidence": 0.8
+}
+</example>
+<example name="no-contradiction">
+Scenario: The PR adds a new notification feature. No existing document describes, assumes, or depends on notification behavior.
+Document excerpt: (no relevant section found in any document)
+Diff excerpt (src/notifications/email-sender.ts, lines 1-45):
+  + export function sendWelcomeEmail(userId: string) { ... }
+Analysis: The diff introduces new functionality (email notifications). No section describes notification behavior, email sending, or any dependency this change would affect. New behavior no document covers is not a contradiction.
+Expected output:
+{
+  "contradictions": [],
+  "additions": [],
+  "rawConfidence": 1.0
+}
+</example>
+<example name="additive-finding-new-dependency">
+Scenario: The project-context doc's "Technology Stack & Versions" section enumerates every runtime dependency. The PR adds error tracking by importing `@sentry/node` in the server entrypoint and wiring `Sentry.init()`. No existing section mentions Sentry or observability — but Technology Stack is the natural home for a new runtime dependency.
+Document excerpt (docs/project-context.md, Technology Stack & Versions, around line 32):
+  "### Runtime & Language
+  - TypeScript ^5.7.2
+  - Node.js / Edge (Vercel)
+  ### AI
+  - @langchain/anthropic ^0.3.0"
+Diff excerpt (apps/web/src/server/error-tracking.ts, lines 1-20, new file):
+  + import * as Sentry from '@sentry/node'
+  + Sentry.init({ dsn: process.env.SENTRY_DSN, tracesSampleRate: 0.1 })
+Analysis: Not a contradiction — no doc claim is violated. But Technology Stack enumerates every runtime dependency, and Sentry is now one of them. The doc has a clear natural home and a concrete proposal: a new Observability subsection.
+Expected output:
+{
+  "contradictions": [],
+  "additions": [
+    {
+      "targetDocPath": "docs/project-context.md",
+      "anchorSection": "Technology Stack & Versions",
+      "insertionMode": "after",
+      "proposedContent": "### Observability\n\n- `@sentry/node` — runtime error and performance tracking. Initialized at server startup via `Sentry.init({ dsn: process.env.SENTRY_DSN, tracesSampleRate: 0.1 })`. DSN supplied via env var.",
+      "severity": "Medium",
+      "confidence": 4,
+      "whatChanged": "The PR adds Sentry initialization in apps/web/src/server/error-tracking.ts via @sentry/node, wired at server startup.",
+      "rationaleForAddition": "Technology Stack & Versions enumerates every runtime dependency. Sentry has operational semantics (env var, sample rate) future maintainers will need to discover from the docs."
+    }
+  ],
+  "rawConfidence": 1.0
+}
+</example>
+<example name="low-severity-ambiguous">
+Scenario: The project-context doc says "use async/await — no raw .then() chains." The PR uses Promise.all() with .then() for a specific parallel execution pattern.
+Document excerpt (docs/project-context.md, Async Patterns section, Line 87):
+  "Use async/await — no raw .then() chains"
+Diff excerpt (src/sync/parallel-runner.ts, lines 23-28):
+  + const results = await Promise.all(
+  +   tasks.map(task => task.execute().then(r => r.data))
+  + )
+Analysis: The doc says "no raw .then() chains." The diff uses .then() inside a Promise.all().map() pattern — a common idiom for transforming parallel results, not a ".then() chain" in the sense the doc likely intended. The contradiction is debatable, so proposedReplacement is null. Severity: Low. Confidence: 2.
+Expected output:
+{
+  "contradictions": [
+    {
+      "targetDocPath": "docs/project-context.md",
+      "targetSection": "Async Patterns",
+      "targetLineStart": 87,
+      "targetLineEnd": 87,
+      "whatChanged": "The PR adds .then() callbacks inside a Promise.all(tasks.map(...)) pattern in src/sync/parallel-runner.ts:23-28 to transform parallel results.",
+      "whatContradicts": "The Async Patterns section of docs/project-context.md says 'Use async/await — no raw .then() chains.' Read literally, the new code uses .then(), but the intent of the rule was likely about sequential .then().then() chains, not Promise.all().map() transforms.",
+      "proposedReplacement": null,
+      "severity": "Low",
+      "confidence": 2,
+      "quotedDocText": "Use async/await — no raw .then() chains"
+    }
+  ],
+  "additions": [],
+  "rawConfidence": 0.4
+}
+</example>
+</examples>
+<query>
+Analyze the PR diff against the source-of-truth documents above. Follow <reasoning_process> step by step. Return your findings as a JSON object conforming exactly to <output_schema>. Output only the JSON — no commentary, no markdown fencing, no preamble.
+</query>
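
Two rules in the prompt above are mechanical enough to check in code: the Step 5 `rawConfidence` arithmetic and the disjoint-range invariant for findings on the same `targetDocPath`. The following is a minimal TypeScript sketch of both, not code taken from the package. The `Finding` type and the function names `computeRawConfidence` and `findOverlaps` are hypothetical; only the field names come from the output schema.

```typescript
// Hypothetical subset of one contradictions[] entry; field names follow <output_schema>.
interface Finding {
  targetDocPath: string;
  targetLineStart: number;
  targetLineEnd: number;
  confidence: number; // integer 1-5
}

// Step 5: mean of the contradiction confidence scores divided by 5.
// With no contradictions the value is 1.0, regardless of any additions.
function computeRawConfidence(contradictions: Finding[]): number {
  if (contradictions.length === 0) return 1.0;
  const sum = contradictions.reduce((acc, f) => acc + f.confidence, 0);
  return sum / contradictions.length / 5;
}

// Returns every pair of findings on the same document whose inclusive
// [targetLineStart..targetLineEnd] ranges overlap. An empty result means
// the disjoint-range invariant holds for contradictions.
function findOverlaps(contradictions: Finding[]): Array<[Finding, Finding]> {
  const overlaps: Array<[Finding, Finding]> = [];
  for (let i = 0; i < contradictions.length; i++) {
    for (let j = i + 1; j < contradictions.length; j++) {
      const a = contradictions[i];
      const b = contradictions[j];
      if (a.targetDocPath !== b.targetDocPath) continue;
      if (a.targetLineStart <= b.targetLineEnd && b.targetLineStart <= a.targetLineEnd) {
        overlaps.push([a, b]);
      }
    }
  }
  return overlaps;
}
```

Applied to the `multi-location-same-rule` example, the two findings at lines 40 and 210 have confidence 4 each, so the mean is 4 and the normalized value is 0.8, and the ranges do not overlap. The sketch does not cover the contradictions-to-additions boundary, which would need the anchor heading's line number.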

package/package.json ADDED

@@ -0,0 +1,43 @@
+{
+  "name": "@delfini/cli",
+  "version": "0.1.0-rc.0",
+  "type": "module",
+  "publishConfig": {
+    "access": "public"
+  },
+  "description": "Delfini Skill CLI — deterministic, never calls an LLM. Subcommands: install, local-prepare, local-finalize, --reset-scope, --version. (P3.2.5 + P3.2.6 ship the doc-scope + trace primitives; subcommand wiring follows in P3.2.1 / P3.2.2 / P3.2.3 / P3.2.4.)",
+  "bin": {
+    "delfini": "./bin/delfini.mjs"
+  },
+  "exports": {
+    ".": {
+      "types": "./dist/index.d.ts",
+      "import": "./dist/index.js"
+    }
+  },
+  "files": [
+    "dist",
+    "bin",
+    "templates",
+    "README.md"
+  ],
+  "dependencies": {
+    "commander": "^12.1.0",
+    "simple-git": "^3.27.0",
+    "tinyglobby": "^0.2.16",
+    "zod": "^3.24.0",
+    "@delfini/drift-engine": "0.0.1"
+  },
+  "devDependencies": {
+    "@types/node": "^22.10.2",
+    "tsup": "^8.0.0",
+    "typescript": "^5.7.2",
+    "vitest": "^3.0.5"
+  },
+  "scripts": {
+    "build": "tsup",
+    "typecheck": "tsc --noEmit",
+    "lint": "eslint src",
+    "test": "vitest run"
+  }
+}

package/templates/SKILL.md ADDED

@@ -0,0 +1,222 @@
+---
+name: delfini
+description: Local drift detection between code changes and project docs — invoke on /delfini or on create-PR intent.
+protocol-version: 1
+---
+# Delfini Skill — host-agent protocol
+This file is the protocol the host coding agent follows when `/delfini` is invoked. Branch on CLI exit codes exactly as specified.
+The CLI never calls an LLM. The only LLM dispatch in this protocol is the analysis dispatch under "Dispatch analysis" (and its single retry, under "Retry on schema-validation failure"), via the host agent's `Agent` tool against a Claude subagent that uses the host agent's existing tokens.
+## Output discipline
+Two things — and only these two — reach the user as host-agent chat text during a `/delfini` run:
+1. The verbatim contents of `.delfini-trace/report.md` when findings exist.
+2. The single Apply / Pick / Skip prompt (carrying a one-line digest as its question text), and the single-line outcome after the user answers.
+What never reaches the user:
+- **Step narration.** Do not say "Step N", "Per Step N of the protocol", "running `local-prepare`", "dispatching the subagent", "now running `local-finalize`". Tool calls already render as tool-use cards in the UI — narrating them in chat is duplication that drowns out the report.
+- **Report summaries.** Never collapse the report into a one-sentence preview like "Drift detected — 5 findings in `docs/...`". The report itself IS the decision context; summarising it strips the severity, line ranges, quoted text, and proposed replacements the user needs to choose Apply vs Skip.
+- **Internal protocol vocabulary.** "Step 7", "Step 8", "the protocol", "per the SKILL.md" — internal scaffolding the user did not ask to see. Replace with silent execution.
+The section headings below (`## Ensure CLI is available`, `## Run prepare`, etc.) name what the host agent does — not what the host agent narrates. Execute them silently.
+## Ensure CLI is available
+Run `delfini --version`. If the command resolves and prints a version, continue.
+If `delfini` is not on PATH:
+1. **First, probe for a local install** before prompting the user. A repo may have `@delfini/cli` as a workspace dependency even when the global binary is absent. Try, in order:
+   - `pnpm exec delfini --version`
+   - `npx --no-install delfini --version` (the `--no-install` flag prevents npx from silently fetching from the registry — it only resolves a binary that already exists in the local `node_modules`)
+   If either resolves and prints a version, use that invocation form (`pnpm exec delfini …` or `npx --no-install delfini …`) for every `delfini` invocation in the remaining steps, and continue. Do not prompt the user.
+2. If no local install is found, ask the user: "Install `@delfini/cli` globally with `npm i -g @delfini/cli`? (y/n)"
+3. On `y` → run `npm i -g @delfini/cli`, then re-verify with `delfini --version`.
+4. On `n` or on install failure → fall back to `npx @delfini/cli` for the rest of this session. Substitute `npx @delfini/cli` for every `delfini` invocation in the remaining steps.
+## Load doc-scope
+Read `.claude/skills/delfini/doc-scope.json`.
+If the file exists, parse it and continue.
+If the file is missing AND the user did not pass `--scope <paths>` to `/delfini`, prompt the user in a single turn:
+> "No `doc-scope.json` found. Which docs should Delfini analyse? Provide one or more paths — directories (recursive `.md` scan), single files, or globs. Example: `docs/ specs/architecture.md packages/*/README.md`."
+Validate each path the user supplies:
+- Resolve each entry against `git rev-parse --show-toplevel`.
+- Reject any path that resolves outside the repo root.
+- For non-existent paths, warn the user but keep the path in scope (a teammate may have deleted a file on a different branch).
+Write the validated scope to `.claude/skills/delfini/doc-scope.json` in the shape `{"version": 1, "doc_scope": [<paths>]}`. The file is committed to git — team-shared by construction.
+If the user passed `--scope <paths>` to `/delfini`, run that invocation against the override list without touching the persisted file.
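A minimal sketch of the validation rules above, assuming the repo root has already been obtained (for example from `git rev-parse --show-toplevel`). The function names are hypothetical, and skipping the existence check for glob patterns is an assumption made for this illustration, not documented CLI behaviour.

```python
import json
import os
from pathlib import Path
from typing import List, Tuple


def validate_scope(repo_root: str, entries: List[str]) -> Tuple[List[str], List[str]]:
    """Return (kept_paths, warnings); raise ValueError for paths outside the repo."""
    root = Path(repo_root).resolve()
    kept: List[str] = []
    warnings: List[str] = []
    for entry in entries:
        resolved = (root / entry).resolve()
        # Reject anything that escapes the repo root, e.g. "../secrets.md".
        if os.path.commonpath([str(root), str(resolved)]) != str(root):
            raise ValueError(f"{entry} resolves outside the repo root")
        # Assumption: glob patterns are not checked for existence here.
        is_glob = any(ch in entry for ch in "*?[")
        if not is_glob and not resolved.exists():
            warnings.append(f"{entry} does not exist (kept in scope)")
        kept.append(entry)
    return kept, warnings


def render_scope_file(paths: List[str]) -> str:
    """Serialise the validated scope in the shape the section describes."""
    return json.dumps({"version": 1, "doc_scope": paths})
```

Non-existent paths produce a warning rather than an error, which matches the rule that such paths stay in scope.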
+## Resolve the diff source
+Run `delfini diff-status`. It prints a single line of JSON to stdout and exits `0`:
+```json
+{"branch":"<name>","isDefaultBranch":<bool>,"hasLocalChanges":<bool>,"hasCommittedChanges":<bool>}
+```
+On any non-zero exit, surface the command's stderr to the user and stop.
+Parse the JSON and resolve the `--diff-source` value you will pass to `local-prepare`, using exactly this decision table:
+- **`hasLocalChanges === false && hasCommittedChanges === false`** → there is nothing to analyse. Emit exactly one line and stop — do **not** run `local-prepare`, do **not** dispatch a subagent:
+  > ✅ No changes since `origin/main` — nothing to analyse.
+- **`isDefaultBranch === true`** → resolve to `local`. On the default branch the committed-vs-base range collapses by construction, so `committed` would be empty and `both` would equal `local` anyway.
+- **`hasLocalChanges === true && hasCommittedChanges === false`** → resolve to `local`.
+- **`hasLocalChanges === false && hasCommittedChanges === true`** → resolve to `committed`.
+- **`hasLocalChanges === true && hasCommittedChanges === true`** on a feature branch → the resolution depends on how `/delfini` was invoked this run:
+  - **create-PR auto-invocation** → resolve to `both` **silently**. An opened PR contains committed + uncommitted work, so the analysed diff must too. Do not prompt.
+  - **manual `/delfini`** → ask the user in a single turn:
+    > You have both uncommitted and committed changes. Analyse local (uncommitted) changes only, or local + committed (what the PR will contain)?
+    Resolve to `local` for "uncommitted only" or `both` for "local + committed". Do not proceed until the user answers.
+You learn whether this run is a **create-PR auto-invocation** from the `CLAUDE.md` auto-invoke block, not from this file — that block is what fires `/delfini` on create-PR intent and instructs you to resolve `both` silently in that path. A bare, user-typed `/delfini` is a manual invocation.
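The decision table above can be read as a first-match-wins function. The sketch below assumes the bullets are evaluated top to bottom (the text does not state this explicitly), and `auto_invoked` and `ask_user` are hypothetical parameters standing in for the create-PR signal and the single-turn user prompt.

```python
import json
from typing import Callable, Optional


def resolve_diff_source(
    status: dict,
    auto_invoked: bool,
    ask_user: Optional[Callable[[], str]] = None,
) -> Optional[str]:
    """Map parsed `delfini diff-status` JSON to a --diff-source value.

    Returns None when there is nothing to analyse, in which case the caller
    emits the single "nothing to analyse" line and stops.
    """
    local = status["hasLocalChanges"]
    committed = status["hasCommittedChanges"]
    if not local and not committed:
        return None
    if status["isDefaultBranch"]:
        return "local"
    if local and not committed:
        return "local"
    if committed and not local:
        return "committed"
    # Both kinds of change on a feature branch.
    if auto_invoked:
        return "both"  # create-PR path: resolve silently, no prompt
    if ask_user is None:
        raise ValueError("manual invocation needs an ask_user callback")
    return ask_user()  # expected to return "local" or "both"


# Example input in the shape printed by `delfini diff-status`.
example = json.loads(
    '{"branch":"feature-x","isDefaultBranch":false,'
    '"hasLocalChanges":true,"hasCommittedChanges":true}'
)
```

With `example`, a create-PR auto-invocation resolves to `both` without calling `ask_user`, while a manual run defers to whatever the user answers.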
+## Run prepare
+Run `delfini local-prepare --diff-source <resolved>` (the value resolved in "Resolve the diff source"). On success (exit `0`), read the three output files written under `.delfini-trace/`:
+- `.delfini-trace/analysis-input.json`
+- `.delfini-trace/analysis-prompt.md`
+- `.delfini-trace/schema.json`
+Branch on non-zero exit codes:
+- **Exit `2` (no doc-scope set AND no `--scope` provided)** → fall back to the "Load doc-scope" first-run prompt, write `doc-scope.json`, then re-run `delfini local-prepare --diff-source <resolved>`.
+- **Exit `4` (non-doc prompt payload exceeds budget — diff + schema + instructions alone do not fit, or no doc section fits after ranked-fill)** → surface two options to the user:
+  1. Re-invoke with a narrower scope: `/delfini --scope <narrower-paths>`.
+  2. Split the PR into smaller changes.
+  Do not continue.
+  Note: retrieval is **on by default** (`--relevance-threshold 5`). Retained doc sections that exceed the prompt budget are ranked-filled (most-relevant-first) rather than hard-failed. The CLI then exits `0` with a `dropped N section(s) — over prompt budget` line on stderr; that path does **not** reach exit `4`. Exit `4` is reserved for the non-doc payload itself being over budget (or, under `--relevance-threshold 0`, an over-budget whole-doc prompt).
+- **Any other non-zero exit** → surface the CLI's error output to the user and stop. Do not continue.
+## Dispatch analysis
+Dispatch a subagent via the host agent's `Agent` tool with these parameters:
+- `model`: `'sonnet'` (or honour a `--model <id>` flag the user passed to `/delfini`).
+- `subagent_type`: `'general-purpose'`.
+- `description`: `'Delfini drift analysis'`.
+- `prompt`: the full contents of `.delfini-trace/analysis-prompt.md`, followed by the full contents of `.delfini-trace/schema.json`, followed by the literal instruction: `write your JSON findings to .delfini-trace/findings.json`.
+The subagent's only required side effect is writing `.delfini-trace/findings.json`. Do not ask the subagent to render a report, edit any source file, or post any commentary.
+## Retry on schema-validation failure
+After the analysis subagent completes, run `delfini local-finalize .delfini-trace/findings.json` (see "Run finalize"). If `local-finalize` exits with code `3` (schema validation failure):
+1. Preserve the failing output: copy `.delfini-trace/findings.json` to `.delfini-trace/findings-attempt-1.json`.
+2. Dispatch a **second** subagent via the host agent's `Agent` tool with the same parameters as "Dispatch analysis", but append the schema-validation error (from `local-finalize`'s stderr) to the prompt along with: `the previous attempt failed schema validation with the error above — fix and return valid JSON`.
+3. Re-run `delfini local-finalize .delfini-trace/findings.json`.
+4. If the second attempt also exits `3`: copy the failing output to `.delfini-trace/findings-attempt-2.json`, surface both raw outputs (`findings-attempt-1.json` and `findings-attempt-2.json`) plus the schema-validation error to the user, and stop. **No third try.**
+Rationale: if a frontier model cannot satisfy the schema twice with the schema in hand, the prompt or the schema is broken. Debugging belongs upstream — do not paper over it with a third retry.
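The two-attempt cap above can be sketched as a bounded loop. All three callbacks are hypothetical stand-ins: `dispatch` runs the analysis subagent with optional extra prompt text, `finalize` runs `delfini local-finalize` and returns its exit code and stderr, and `preserve` copies `findings.json` to `findings-attempt-<n>.json`. The `retry_note` argument carries the literal corrective instruction quoted in step 2.

```python
from typing import Callable, Tuple


def analyse_with_retry(
    dispatch: Callable[[str], None],
    finalize: Callable[[], Tuple[int, str]],
    preserve: Callable[[int], None],
    retry_note: str,
) -> Tuple[int, int]:
    """Run the analysis at most twice; return (final_exit_code, attempts_used)."""
    extra_prompt = ""
    for attempt in (1, 2):
        dispatch(extra_prompt)
        code, stderr = finalize()
        if code != 3:
            return code, attempt  # 0, 1, or another error: no retry applies
        preserve(attempt)  # keep the failing output for the user
        # The second attempt sees the validation error plus the corrective note.
        extra_prompt = stderr + "\n" + retry_note
    return 3, 2  # both attempts failed schema validation: surface and stop
```

Because the loop iterates over a fixed pair of attempts, a third try cannot happen by accident, which mirrors the rule that debugging belongs upstream.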
+## Run finalize
+Run `delfini local-finalize .delfini-trace/findings.json`. On exit `0` (no findings) or exit `1` (findings present), read `.delfini-trace/report.md`.
+Exit codes:
+- **Exit `0`** → no findings. Tell the user `No drift detected.` and stop.
+- **Exit `1`** → findings present. Continue to "Surface the report".
+- **Exit `3`** → schema validation failure. See "Retry on schema-validation failure".
+- **Any other non-zero exit** → surface the CLI's error output and stop.
+## Surface the report
+Output the contents of `.delfini-trace/report.md` to the user as host-agent chat text, verbatim, in a single chat message. The report is the decision context for the Apply UX that follows — the user needs the numbered findings, severities, file:line targets, quoted doc text, and proposed replacements visible in front of them before they answer Apply / Pick / Skip.
+**Anti-patterns — do NOT do these:**
+- "Drift detected — 5 apply-eligible findings, all in `docs/...`." — a one-line summary that throws away every actionable detail the report carries.
+- "Here are the findings:" / "Per Step 7 of the protocol:" / "Report:" / "Now showing the report:" — any prefix line that frames the report. Emit the report and nothing else.
+- Paraphrasing the severity counts, restating the section names in your own words, trimming the proposed-replacement code blocks, or reordering the entries.
+- Splitting the report across multiple chat messages or interleaving it with tool-call narration.
+**Positive shape:** the next host-agent chat message after `delfini local-finalize` exits `1` is the contents of `.delfini-trace/report.md` — no prefix, no suffix, no commentary. The Apply UX prompt comes in the message after that.
+## Apply UX
+Findings present (exit `1` from "Run finalize").
+**Guard — only-manual-review case:** if the report's "Apply-eligible findings" section is absent or contains the literal text `No apply-eligible findings.` (meaning every finding is narrative-only drift and/or a clarification — both kinds live under "Manual review required" and neither is auto-applicable), do NOT present the Apply / Pick / Skip prompt. Instead, tell the user:
+> Drift detected, but no auto-applicable fixes. Review the "Manual review required" section above and act manually (fix the code, hand-edit the doc, or accept the drift). The `.delfini-trace/` artefacts remain on disk for reference.
+Then stop. The apply UX below only runs when the "Apply-eligible findings" section has at least one numbered entry.
+Otherwise, ask the user in a single turn. Use a one-line digest as the question / description so the AskUserQuestion card itself carries the decision context:
+> **N findings: X drift, Y additive (H High / M Medium / L Low). Apply all (a) / Pick subset (s) / Skip (n)?**
+Derive the digest deterministically from `.delfini-trace/report.md`:
+- `N = X + Y` (total apply-eligible findings; excludes "Manual review required" entries).
+- `X` = count of headings shaped `### [<n>] [<severity>] drift:` (the brackets are literal characters in the heading; `<n>` is the one-based index, `<severity>` is one of `H` / `M` / `L`).
+- `Y` = count of headings shaped `### [<n>] [<severity>] additive:` (same shape, with `additive` in place of `drift`).
+- `H` / `M` / `L` = number of apply-eligible entries whose `<severity>` is `H` / `M` / `L` respectively.
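The counting rules above amount to a single pass over the report headings. A minimal sketch, assuming each apply-eligible heading starts at the beginning of a line in `.delfini-trace/report.md` and that manual-review entries do not use this heading shape; the function name is hypothetical and the bold markers around the digest are omitted.

```python
import re
from collections import Counter

# Matches headings shaped `### [<n>] [<severity>] drift:` or `... additive:`.
HEADING = re.compile(r"^### \[(\d+)\] \[([HML])\] (drift|additive):", re.MULTILINE)


def build_digest(report: str) -> str:
    """Derive the one-line Apply / Pick / Skip digest from the report text."""
    matches = HEADING.findall(report)
    kinds = Counter(kind for _, _, kind in matches)
    severities = Counter(severity for _, severity, _ in matches)
    drift, additive = kinds["drift"], kinds["additive"]
    return (
        f"{drift + additive} findings: {drift} drift, {additive} additive "
        f"({severities['H']} High / {severities['M']} Medium / {severities['L']} Low). "
        "Apply all (a) / Pick subset (s) / Skip (n)?"
    )
```

Counting headings rather than re-reading the findings JSON keeps the digest consistent with what the user actually sees in the report.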
+Accept exactly one of `a`, `s`, or `n` as the reply. Do not ask follow-up questions before the user replies.
+### `(a) Apply all`
+For every entry in the report's "Apply-eligible findings" section (drift and additive findings only), apply the proposed replacement / proposed content via the host agent's `Edit` tool.
+**Ordering:** for each file, apply findings in **descending line order** (highest target line first, lowest last). A splice at an earlier line shifts the offsets of every later line, so applying in ascending order (top of the file first) corrupts the line numbers for every subsequent edit on the same file. Descending order keeps every subsequent edit's target lines stable.
+**Manual-review findings (narrative-only drift + clarifications):** entries under "Manual review required" are **never** offered for auto-apply. Two sub-cases, same outcome:
+- **Narrative-only drift** — the LLM correctly detected drift but emitted no concrete `proposedReplacement` (typically because the doc rule is right and the code is the violation). The user must fix code (or hand-edit the doc) — no splice possible.
+- **Clarification (FR147 — no-fabrication invariant)** — a human answers a clarification by hand-editing the doc; an agent does not invent a doc paragraph.
+Even when the user picks `(a)`, skip every "Manual review required" entry silently — do not ask the user separately, do not prompt for each one, do not let one through.
+### `(s) Pick subset`
+Ask the user to name one-based indices from the "Apply-eligible findings" section in a single reply (e.g. `1, 3, 5` or `1-3, 6`). Apply the named subset using the same per-file descending-line-order rule as `(a)`.
+"Manual review required" entries (narrative-only drift + clarifications) are not part of the apply-eligible numbering scheme and are not selectable. If the user names an index outside the apply-eligible range (for example, while trying to select a manual-review entry), refuse with the literal message:
+> Manual-review entries cannot be auto-applied — see "Manual review required"
+Do not apply any edits in that case until the user re-supplies a valid subset.
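Parsing a reply such as `1, 3, 5` or `1-3, 6` can be sketched as follows. The function name is hypothetical, and raising `ValueError` is only one way to signal that the caller should emit the literal refusal message and wait for a valid subset.

```python
from typing import List


def parse_subset(reply: str, eligible_count: int) -> List[int]:
    """Parse one-based indices and ranges; reject anything out of range."""
    picked = set()
    for token in reply.split(","):
        token = token.strip()
        if not token:
            continue
        if "-" in token:
            low_text, high_text = token.split("-", 1)
            low, high = int(low_text), int(high_text)  # ValueError if malformed
        else:
            low = high = int(token)
        if low < 1 or low > high or high > eligible_count:
            raise ValueError(f"index outside the apply-eligible range: {token}")
        picked.update(range(low, high + 1))
    if not picked:
        raise ValueError("no indices supplied")
    return sorted(picked)
```

Validating the whole reply before returning anything matches the rule that no edits are applied until the user supplies a fully valid subset.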
+### `(n) Skip`
+Exit without applying any edits. The `.delfini-trace/` artefacts remain on disk for the user's reference.
+### Outcome line
+After the apply batch finishes (or the user picks `(n)`), emit exactly one host-agent chat line — the outcome — and stop. Do not narrate the individual `Edit` tool calls; they appear as tool-use cards in the UI already.
+- `(a)` success → `Applied N/M findings.` (where M = total apply-eligible findings, N = number actually applied; for `(a)` these are equal unless a mid-batch `Edit` failure intervened — in which case use the failure message below instead).
+- `(s)` success → `Applied N/M findings.` (where N is the size of the user's subset and M is the total apply-eligible count; N ≤ M by construction).
+- `(n)` → `Skipped — findings preserved in .delfini-trace/`
+- Mid-batch `Edit` failure → use the failure message defined in "Mid-batch `Edit` failure" below instead of either success line.
+### Mid-batch `Edit` failure
+If the host agent's `Edit` tool fails partway through an apply batch (typically because the target file's content has changed since `local-prepare` ran), stop the batch immediately. Report to the user:
+> Applied N of M findings. Finding K (`<path>:<line>`) couldn't be applied — file content has changed since analysis. Re-run `/delfini` to refresh.
+**Do not roll back already-applied edits.** The host agent's `Edit` tool is not transactional. The local rollback primitive is `git checkout -- <paths>` — the user can run that against any files they want to revert. Re-running `/delfini` after a refresh produces a new set of findings against the current file state.