npm - contract-driven-delivery - Versions diffs - 2.1.3 → 2.2.1 - Mend

contract-driven-delivery 2.1.3 → 2.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (19) hide show

package/CHANGELOG.md +219 -0
package/README.md +124 -1
package/assets/CLAUDE.template.md +13 -0
package/assets/agents/backend-engineer.md +3 -1
package/assets/agents/frontend-engineer.md +3 -2
package/assets/cdd/conformance.json +16 -0
package/assets/cdd/tier-policy.json +35 -0
package/assets/contracts/api/api-contract.md +26 -0
package/assets/hooks/pre-tool-use-graph-first.sh +65 -0
package/assets/skills/contract-driven-delivery/scripts/validate_api_conformance.py +671 -0
package/assets/skills/contract-driven-delivery/scripts/validate_api_semantic.py +8 -1
package/assets/skills/contract-driven-delivery/scripts/validate_contract_versions.py +4 -0
package/dist/cli/index.js +2118 -491
package/docs/adr/0001-contract-to-openapi-export.md +142 -0
package/docs/adr/0002-schema-carrying-contract-format.md +277 -0
package/docs/adr/0003-code-intelligence-indexing-strategy.md +110 -0
package/docs/api-conformance.md +145 -0
package/docs/openapi-export.md +157 -0
package/package.json +1 -1

package/docs/adr/0001-contract-to-openapi-export.md ADDED Viewed

@@ -0,0 +1,142 @@
+# ADR 0001: Contract → OpenAPI export, and the path to generated clients
+- Status: Accepted (export PoC); Proposed (client generation)
+- Date: 2026-06-01
+- Deciders: maintainer + AI delivery agent
+## Context
+The kit's recurring failure mode is **prose governance**: a rule that only
+works if a human or agent chooses to follow it. PRs #6–#8 converted three such
+rules into mechanical chokepoints (conformance validator, `--with-source`,
+installed graph-first hook). One gap remains, and it is the strongest one.
+`contracts/api/api-contract.md` is called the single source of truth for the
+API, but **nothing generates code from it**. The conformance validator added in
+#6 is *detective*: it parses real backend routes and frontend calls and diffs
+them against the contract table *after* the code is written. That is valuable,
+but it has two structural limits:
+1. **It is heuristic.** As the Codex review rounds on #6 showed, regex route
+   detection has an endless tail of framework-shape edge cases (NestJS
+   `RouterModule`, mounted Express prefixes, Rails' stateful DSL). Each is a
+   patch; none is a proof.
+2. **It catches drift; it cannot prevent it.** A divergent call is written,
+   committed, and only then flagged.
+The category that *prevents* drift is **generation**: if the frontend client (or
+its types) is generated from the contract, a divergent call is unrepresentable —
+it fails to typecheck. No regex, no edge cases. This ADR decides how the kit
+moves toward that without betraying its "generic, ships to any repo" constraint.
+## Decision
+Split the work along the **generic / stack-specific** seam, and only build the
+generic half in the kit core.
+### 1. The kit owns contract → OpenAPI extraction (generic, build now)
+Add `cdd-kit openapi export`, which reads `contracts/api/api-contract.md` and
+emits a **minimal OpenAPI 3.1 skeleton** (`paths`, `operations`, `parameters`
+inferred from path templates, response status placeholders, and the API-style
+metadata as `info`/`description`). This is:
+- **stack-agnostic** — it reads the same markdown table the conformance
+  validator already parses; it produces a standard artifact, not code;
+- **safe** — it never writes into source trees, only emits an OpenAPI document
+  to stdout or a chosen path;
+- **non-authoritative by direction** — the markdown contract remains the SoT
+  that humans/agents edit; OpenAPI is a *projection* of it for tooling. (We do
+  **not** invert this to "OpenAPI is the SoT" — that would require every contract
+  author to write OpenAPI, which contradicts the kit's "non-engineers author
+  contracts" audience.)
+The skeleton is intentionally partial: request/response *schemas* in the
+contract table are free-form prose today (e.g. "User", "User[]"), not JSON
+Schema, so the export cannot fabricate field-level schemas it does not have. It
+emits what is mechanically derivable (path, method, params, status codes,
+auth → security scheme hints) and marks the rest as `TODO`/`x-cdd-unresolved`.
+This honesty is the point — same principle as removing `.rb` from the
+conformance defaults rather than shipping a fake Rails parser.
+### 2. Per-stack client generation stays an opt-in adapter (do NOT build in core)
+Generating a typed FE client from the OpenAPI document is **stack-specific** and
+belongs in the consumer repo, wired by the user with an existing, well-maintained
+generator (`openapi-typescript`, `orval`, `openapi-generator`, …). The kit's
+role is to **produce the seam** (the OpenAPI doc) and **document the wiring**, not
+to ship a universal generator. Shipping one would:
+- recreate the maintenance tail we just escaped, now for every target language;
+- risk generating subtly wrong clients across arbitrary stacks — itself a new
+  drift source, which is antithetical to the kit.
+So: kit exports OpenAPI; user runs their generator of choice in their own CI;
+the generated client makes divergence a compile error. The kit provides a
+documented recipe per common stack, not code.
+### 3. Relationship to the conformance validator
+The OpenAPI export does **not** replace `validate_api_conformance.py`; they are
+complementary along the brownfield axis:
+| | Generated client (preventive) | Conformance validator (detective) |
+|---|---|---|
+| Applicable when | repo owns both sides in a typed stack and can regenerate | brownfield/polyglot, cannot regenerate the client |
+| Strength | divergence is unrepresentable (compile error) | divergence is flagged post-hoc, heuristically |
+| Cost | requires generator wiring per stack | zero per-stack wiring |
+The kit keeps the validator as the universal floor and offers OpenAPI export as
+the on-ramp to the stronger, opt-in preventive path. A repo can also feed the
+exported OpenAPI **back into** conformance checking later (validate real routes
+against the OpenAPI paths instead of the markdown table) — a future unification,
+explicitly out of scope here.
+## Consequences
+**Positive**
+- A standard, tool-consumable artifact is derivable from the contract with zero
+  per-stack assumptions.
+- The kit stays generic; stack-specific risk stays in the consumer repo where it
+  belongs.
+- Clear, honest boundary: the export emits only what the markdown actually
+  determines and flags the rest.
+**Negative / limits**
+- The skeleton is partial until contracts carry field-level schemas. This ADR
+  does not mandate that migration; it leaves request/response bodies as `TODO`.
+- Two artifacts now describe the API (markdown SoT + derived OpenAPI). We accept
+  this because the direction is fixed (markdown → OpenAPI, never reverse-edited),
+  so they cannot become competing sources of truth. `cdd-kit openapi export
+  --check` asserts the committed OpenAPI is in sync, the same way `code-map
+  --check` does (shipped — see the follow-up note below).
+## Scope of the accompanying PoC
+This PR ships **only** the generic export half:
+- `cdd-kit openapi export [--out <path>] [--json|--yaml]` reading
+  `contracts/api/api-contract.md`;
+- path-template → OpenAPI `parameters` inference (`/users/:id`, `/users/{id}`);
+- method/auth/status extraction; API-style metadata into `info`;
+- unresolved request/response schemas emitted as clearly-marked placeholders;
+- tests; docs recipe for wiring `openapi-typescript` in a consumer repo.
+It does **not** ship any client generation or schema authoring format. Those
+remain follow-ups, each its own decision.
+## Follow-up shipped since the PoC
+- **`--check` sync gate** (`cdd-kit openapi export --check --out <path>`):
+  verifies the committed artifact still matches the contract, exiting non-zero on
+  drift. This closes the "two artifacts" risk noted above — the derived OpenAPI
+  can no longer silently fall out of step with the markdown source of truth.
+- **Consumer codegen wiring**: `cdd-kit init` now scaffolds editable
+  `contract:client` / `contract:client:check` npm scripts when a `package.json`
+  is present, materializing the consumer half of the seam as a chokepoint rather
+  than a doc — while keeping the actual generator choice in the consumer repo, as
+  this ADR decided.
+Still deferred: a schema-carrying contract format (so request/response **bodies**
+become preventive too, not just paths/methods), and feeding the exported OpenAPI
+back into `validate_api_conformance.py`.

package/docs/adr/0002-schema-carrying-contract-format.md ADDED Viewed

@@ -0,0 +1,277 @@
+# ADR 0002: Schema-carrying contract format (preventive request/response bodies)
+- Status: Accepted (design); implementation to follow in a separate PR
+- Date: 2026-06-01
+- Deciders: maintainer + AI delivery agent
+- Supersedes: nothing; extends ADR 0001 (Contract → OpenAPI export)
+## Context
+ADR 0001 made **paths and methods** preventive: `cdd-kit openapi export`
+projects the markdown contract into OpenAPI, a consumer generates a typed
+client, and a wrong *URL or verb* becomes a compile error. It deliberately
+stopped there. The contract's `request schema` / `response schema` columns are
+free-form prose today — cells say `User`, `User[]`, `CreateUser` — so the
+exporter cannot fabricate field-level schemas and marks every request body
+`x-cdd-unresolved` (see `src/commands/openapi-export.ts`).
+The consequence is a **half-preventive client**. The generated client knows
+`POST /api/users` exists and returns *something*, but the request body is typed
+`unknown` / `any`. The exact class of bug the kit exists to kill — "the URL is
+right but the payload is wrong" (missing required field, wrong field name,
+wrong type) — is still only catchable *detectively*, if at all. The conformance
+validator (ADR 0001 §3) diffs *routes*, not *body shapes*, so it does not cover
+this either.
+This ADR decides **how a contract can carry field-level schemas** so that
+request/response bodies become preventive too, **without** betraying the two
+constraints ADR 0001 was careful about:
+1. **Non-engineers author contracts.** We cannot require every author to write
+   raw OpenAPI / JSON Schema.
+2. **The markdown contract stays the single source of truth**, projected
+   one-way into OpenAPI; we never reverse-edit the generated artifact.
+## Decision
+### 1. Schemas live in the same contract file, referenced by name
+Add an optional `## Schemas` section to `contracts/api/api-contract.md`. Each
+named schema is a `### <Name>` subsection. The existing endpoint table is
+**unchanged**: a `request schema` / `response schema` cell that already says
+`CreateUser` simply *becomes a reference* to `### CreateUser` when that section
+exists. No new column, no migration of existing rows.
+```markdown
+## Endpoint Requirements
+| method | path | auth | request schema | response schema | errors | tests |
+|---|---|---|---|---|---|---|
+| POST | /api/users | admin | CreateUser | User | 400 | yes |
+## Schemas
+### CreateUser
+| field | type | required | notes |
+|---|---|---|---|
+| email | string | yes | login identity |
+| name  | string | yes | |
+| age   | integer | no | |
+### User
+| field | type | required | notes |
+|---|---|---|---|
+| id    | string | yes | |
+| email | string | yes | |
+| name  | string | yes | |
+```
+This reuses the **exact authoring idiom the audience already uses** for
+endpoints (a markdown table), so the non-engineer constraint holds. One file
+stays the source of truth.
+**Naming rules** (so cell→schema resolution is unambiguous):
+- A schema name must match `^[A-Za-z][A-Za-z0-9_]*$` (the OpenAPI
+  `components.schemas` key charset). A `### Name` heading that does not match is
+  **not** treated as a schema (it is ordinary prose), so unrelated `###`
+  headings in the file are never misread as types.
+- A cell value is **first stripped of a single trailing `[]`** (the existing
+  list shorthand: `User[]`), then the inner name is matched. `User[]` resolves to
+  `### User` and emits an array wrapper (`{ type: array, items: { $ref } }`), per
+  §4. The `[]` lives only in the *cell* grammar; a `### User[]` heading is invalid
+  under the name grammar above and is never a schema. (Only one level of `[]` is
+  recognized; `User[][]` is over the ceiling → Tier B.)
+- After `[]`-stripping, resolution is **exact and case-sensitive**: the inner
+  name resolves only to a `### Name` whose text matches byte-for-byte after
+  trimming. `user` does not resolve to `### User`. This avoids a class of silent
+  mis-binding.
+- **Duplicate `### Name` sections are a hard error** (the export fails, it does
+  not pick one), because a duplicate is an ambiguous source of truth — the same
+  stance the kit takes elsewhere.
+- A rename is just an edit: rename the `### Name` *and* the cells that reference
+  it. **A cell that names a non-existent schema is, by construction,
+  indistinguishable from ordinary prose** — every existing contract's cells are
+  exactly that (prose with no `## Schemas` section) — so it **stays Tier C and is
+  not an error**. `openapi export --check` will **not** flag it: `--check` is
+  byte-equality of the committed artifact against a fresh export, so once the
+  Tier C artifact is committed it stays in sync. There is deliberately **no
+  automatic "did-you-mean" net** for a dangling ref, because adding one would
+  either break the no-migration guarantee (it would fail on every legitimately
+  prose cell) or require guessing intent. Surfacing *suspected* typo-refs is a
+  possible **opt-in strict mode**, called out as a follow-up below — not a
+  promise made by `--check`.
+### 2. Three fidelity tiers, with graceful degradation
+For each `request schema` / `response schema` cell value, the exporter resolves
+in this order:
+| Tier | Author writes | Export emits | Body is |
+|---|---|---|---|
+| **A. Field table** | a `### Name` field sub-table | `components.schemas.Name` (JSON Schema) + `$ref` | **preventive** |
+| **B. Raw escape hatch** | a fenced ` ```json-schema ` block under `### Name` | that JSON Schema verbatim + `$ref` | **preventive** |
+| **C. Unresolved** | nothing (cell is prose, no `### Name`) | **today's markers, unchanged** (see below) | best-effort |
+Tier C is the current behavior, so **every existing contract keeps exporting
+exactly as it does now**. Crucially, "today's behavior" is **not uniform across
+request and response**, and Tier C must preserve each side exactly:
+- an unresolved **request** cell keeps emitting the `requestBody` with
+  `x-cdd-unresolved` (as `buildDoc` does today);
+- an unresolved **response** cell keeps emitting the prose annotation
+  `x-cdd-response-contract` (as `buildDoc` does today) — it does **not** gain an
+  `x-cdd-unresolved` marker.
+Flattening both onto a single `x-cdd-unresolved` marker would rewrite every
+existing unresolved-response artifact and break the byte-for-byte/no-migration
+guarantee. So Tier C is defined as "emit precisely what the current exporter
+emits for this cell"; only resolution to a real `### Name` (Tier A/B) changes
+the output. Adoption is incremental and per-schema: define `### User` and only
+`User` becomes preventive; everything else degrades untouched. This mirrors
+ADR 0001's "emit what is mechanically derivable, mark the rest" honesty — and
+mirrors how client codegen itself is opt-in.
+Tier B exists because the field-table notation has a deliberate ceiling (below):
+power users get a verbatim escape hatch instead of being blocked.
+**A and B are mutually exclusive within one `### Name`.** A section is exactly
+one of: a field table (Tier A), a single fenced ` ```json-schema ` block
+(Tier B), or neither (Tier C). If a section contains **both** a field table and
+a `json-schema` block, the export **fails** rather than guessing precedence —
+the same no-silent-pick rule as duplicate names. This keeps every schema's
+source unambiguous; a power user who needs both a readable table *and* a
+constraint the grammar can't express writes the whole schema in Tier B.
+### 3. The field-table type grammar (Tier A)
+Minimal, closed, table-friendly. The `type` cell accepts:
+| `type` cell | JSON Schema |
+|---|---|
+| `string`, `integer`, `number`, `boolean` | `{ "type": ... }` |
+| `<Name>` (matches another `### Name`) | `{ "$ref": "#/components/schemas/Name" }` |
+| `<T>[]` (T = any of the above) | `{ "type": "array", "items": <T> }` |
+| `enum(a, b, c)` | `{ "type": "string", "enum": [...] }` |
+- `required: yes` adds the field to the schema's `required` array.
+- `notes` becomes `description` (always non-binding prose). An optional `format`
+  column becomes JSON Schema `format` — which is an **optional downstream
+  constraint, not guaranteed-inert metadata**: format-aware tooling (ajv with
+  formats, `openapi-generator`) *does* enforce values like `email`, `uuid`,
+  `date-time` and may narrow generated types. The ADR does not promise `format`
+  is never validated; an author writing `format: email` is choosing a constraint
+  their generator may apply, and should mean it.
+- An unknown `type` value (e.g. `strng`) in a schema that is otherwise Tier A
+  **fails the export** (non-zero exit, naming the schema, field, and bad type).
+  It must **not** degrade just that one field to an `x-cdd-unresolved` marker:
+  extension keywords are ignored by validators and code generators, so a
+  per-field marker leaves the property unconstrained while the operation is still
+  advertised as Tier A/preventive — the client would accept any value for it.
+  Worse, byte-equality `--check` would not catch this, so CI would stay green on
+  a silently weakened client. Failing the whole export is the only outcome
+  consistent with "Tier A means preventive": a schema is either fully resolvable
+  or it is not authored yet (leave the cell prose → Tier C). There is no
+  half-preventive Tier A schema.
+**Common edge cases, decided explicitly** (so authors know the Tier A/Tier B
+line without guessing):
+| Need | Tier A answer |
+|---|---|
+| **Optional vs nullable** | `required: no` means the field *may be absent*. It does **not** make the value `null`. A field that must be present but may hold `null` is a `oneOf`-shaped constraint → **drop to Tier B**. We do not overload the `required` column with two meanings. |
+| **Numeric / non-string enums** | `enum(...)` is **string-only** (the 90% case: status strings). A numeric or mixed-type enum → **Tier B**. The grammar stays closed rather than inventing a per-member type syntax in a table cell. |
+| **Date / time** | `string` with a `format` note (e.g. `date-time`, `date`) — emitted as JSON Schema `format`. Whether it is *enforced* depends on the consumer's generator (see the `format` note above): format-aware tooling will validate/narrow, format-blind tooling treats it as a hint. Author it only when you mean that constraint; if you need shape beyond a single `format`, → **Tier B**. |
+The rule behind all three: Tier A covers objects of scalars, refs, arrays,
+string enums, and a **single `format` per field** (the optional `format`
+column). The moment a field needs a `null`-union, a non-string enum, or a shape
+**beyond a single `format`** (e.g. `format` *plus* a union, or multiple
+constraints the grammar has no column for), that one schema moves to Tier B. A
+lone `date-time`/`email`/`uuid` `format` stays in Tier A — it is exactly what the
+`format` column is for. The field table never grows new columns past this — that
+is the ceiling, by design.
+Anything past this grammar (`oneOf`, discriminated unions, deep nesting beyond
+named refs, tuple types) is **out of scope for Tier A by design** — that is what
+Tier B's raw JSON Schema block is for. We are not rebuilding JSON Schema in
+markdown tables; we are covering the 90% object-of-scalars-and-refs case in the
+audience's own idiom and providing a clean hatch for the rest.
+### 4. What the export changes
+- A `components.schemas` map is populated from resolved `### Name` sections.
+- A resolvable `request schema` cell emits a real
+  `requestBody.content['application/json'].schema = { $ref }` **instead of** the
+  `x-cdd-unresolved` placeholder.
+- A resolvable `response schema` cell emits
+  `responses[code].content['application/json'].schema = { $ref }` (and `User[]`
+  → an array wrapper), upgrading today's `x-cdd-response-contract` annotation
+  from prose to a typed shape.
+- `--check` is unaffected in mechanism: it stays byte-equality of the committed
+  artifact against the freshly-generated projection. More of the document is now
+  determined by the contract, so the sync gate simply covers more.
+Unresolved cells keep emitting exactly the markers they do today, so the
+artifact's shape is a **superset** — no existing field is removed or renamed.
+## Consequences
+**Positive**
+- The payload-shape bug class (`wrong/missing field`) becomes a **compile
+  error** in the generated client — the strongest, non-heuristic guarantee,
+  extended from URLs to bodies.
+- Zero forced migration: contracts without a `## Schemas` section are byte-for-
+  byte unaffected; the `## Schemas` block is purely additive.
+- Authors stay in markdown tables; no one is required to learn OpenAPI.
+- The escape hatch keeps complex APIs unblocked without polluting the common
+  notation.
+**Negative / limits**
+- Field-level precision **is** engineering work. This tier targets the engineer
+  (or the agent acting on their behalf), not the non-engineer contract author —
+  honestly an opt-in enrichment layer, the same status client codegen has. We
+  accept this rather than pretend field types can be authored without rigor.
+- Two notations now describe a body in some repos (Tier A table for simple,
+  Tier B raw for complex). We bound this by making Tier B an explicit fenced
+  block, never an inline cell, so the table never becomes a JSON Schema dumping
+  ground.
+- **Response-body conformance against real code stays out of scope.** Once
+  schemas are real, one *could* check that backend response types match the
+  contract schema — but that requires per-language type extraction, the exact
+  heuristic tail ADR 0001 refused. Generation (preventive) remains the path;
+  code-vs-schema conformance is a separate future decision, not this one.
+- **Schema-level breaking-change diffing** (adding a required request field is
+  breaking) is enabled by this format but **not** built here; it is a follow-up
+  that would consume two exported artifacts and apply the contract's existing
+  `breaking-change-policy`.
+- **Suspected-dangling-ref strict mode** is a possible follow-up, not part of
+  this ADR: an opt-in check that flags cells which *look* like an intended schema
+  ref (e.g. capitalized single-token, near-miss of an existing `### Name`) but
+  resolve to no section. It must stay opt-in precisely because, in the general
+  case, such a cell is indistinguishable from legitimate prose; making it a
+  default would break the no-migration guarantee.
+## Scope of the proposed implementation
+If accepted, the implementing PR ships:
+1. A `## Schemas` parser (`### Name` field sub-tables + Tier B fenced blocks)
+   in the export path, behind the existing markdown reader.
+2. The Tier A type grammar compiler and `components.schemas` emission.
+3. `request schema` / `response schema` cells resolving to `$ref` when defined,
+   degrading to each side's existing marker when not (`x-cdd-unresolved` for
+   requests, `x-cdd-response-contract` for responses); a duplicate `### Name`,
+   a section mixing Tier A + Tier B, or an unknown field type **fails the
+   export** (non-zero, no partial artifact).
+4. Template + docs: a `## Schemas` stub in the **source** template
+   `contracts/api/api-contract.md` (NOT `assets/contracts/...`, which `build.js`
+   generates from `contracts/` via `copy('contracts', 'assets/contracts')`) and a
+   worked example in `docs/openapi-export.md`.
+5. Tests: resolved object, `[]` array, `enum`, nested `$ref`, Tier B passthrough;
+   the **no-migration** cases (undefined name leaves request vs response markers
+   exactly as today) proving existing contracts export unchanged; and the
+   **fail-fast** cases (unknown type, duplicate name, mixed A+B) proving no
+   silently-weakened artifact is ever produced.
+It does **not** ship code-vs-schema conformance or schema breaking-change
+diffing. Those remain follow-ups, each its own decision.

package/docs/adr/0003-code-intelligence-indexing-strategy.md ADDED Viewed

@@ -0,0 +1,110 @@
+# ADR 0003: Code-intelligence indexing strategy (trigger vs background, AST vs LSP)
+- Status: Accepted
+- Date: 2026-06-02
+- Deciders: maintainer + AI delivery agent
+- Relates to: `cdd-kit code-map`, `cdd-kit graph`, `cdd-kit index`, `cdd-kit mcp`
+## Context
+The kit's token-efficiency moat is its deterministic, low-token code
+intelligence: `cdd-kit code-map` parses source into a structural index, the
+native code-graph adds files/symbols/imports/calls, and agents query symbols and
+line ranges (`--with-source`) instead of `Read`-ing whole files. Two design
+questions were never written down and are now load-bearing as the kit positions
+itself for a fully automated, no-human-reviewer workflow:
+1. **What parser tier?** Today the kit uses its own AST scanners (Babel for
+   JS/TS/Vue, a Python subprocess) producing a YAML map + JSON sidecar + native
+   graph. Should it adopt a **Language Server Protocol (LSP)** backend like
+   [Serena](https://github.com/oraios/serena), which gives IDE-grade
+   "go-to-definition" precision across 20+ languages?
+2. **What refresh model?** Today indexing is **trigger-based**: the map is
+   regenerated when a command needs it (`gate`, `index query --refresh`,
+   `doctor --fix`, the pre-commit code-map hook). Should it instead run as a
+   **background daemon** that watches the filesystem and keeps the index live,
+   as Serena, [CocoIndex](https://cocoindex.io/cocoindex-code/), and most
+   tree-sitter-based indexers do?
+### What the field does (2026)
+- **Serena** pairs LLMs with LSP for deterministic symbol resolution. It relies
+  on lazy per-language server startup, **incremental indexing** (only modified
+  files re-index), symbol-table caching, and a serialized background task queue.
+- **CocoIndex / tree-sitter indexers** use a **background file watcher** with
+  debounced (~500 ms) incremental re-parse; only changed AST nodes are
+  re-processed, and unchanged chunks reuse cached work. Content-hash (XXH3)
+  comparison gives ~4× speedup over full re-index, and full-repo rebuilds are
+  explicitly avoided because on large repos they "cost real money and take real
+  hours, and the context is stale on arrival."
+- A recurring independent finding: **LSP, built for interactive human IDE
+  sessions, does not translate cleanly to autonomous agents** — symbol-resolution
+  failures, empty reference searches, and coordinate-precision requirements bite
+  in headless runs. Several agent indexers therefore use *LSP-inspired*
+  tree-sitter + graph (e.g. PageRank over the dependency graph) and report
+  symbol-level awareness at only 8.5–13k tokens, without a live LSP.
+## Decision
+### 1. Stay on native AST scanners; do not adopt an LSP daemon
+The kit keeps its own AST/graph scanners as the default engine. Rationale:
+- **Determinism over precision-at-any-cost.** The kit's value is a *stable,
+  diffable, byte-identical* index that the gate and `--with-source` can rely on.
+  An LSP server's answers vary with workspace state, plugin versions, and
+  warm-up; that is the wrong trade for a mechanical chokepoint.
+- **Autonomous-agent fit.** The published failure modes of LSP-in-agents are
+  exactly our usage pattern (headless, ephemeral containers, no editor).
+- **Zero heavy runtime.** No per-language server processes to install, warm, or
+  babysit inside short-lived CI/agent containers.
+- **External LSP/CodeGraph stays opt-in.** `--engine codegraph` already exists
+  for users who want a heavier external graph; LSP can join later as another
+  opt-in adapter, never the default.
+### 2. Keep trigger-based refresh as the default; add opt-in background watch
+Trigger-based stays the default because it is correct for the dominant
+execution context — **ephemeral containers and one-shot agent runs**, where a
+daemon would build an index that is discarded minutes later. But the trigger
+model has a real gap for **long-lived co-editing sessions** (a human and an agent
+editing the same repo over hours): between triggers the map is stale, and
+re-deriving the whole map on every query is wasteful.
+We close that gap with an **opt-in background mode**, `cdd-kit code-map --watch`:
+- A debounced (default 500 ms, matching field practice) recursive `fs.watch`
+  rebuilds the map after change bursts settle.
+- Self-triggering is avoided by ignoring writes under `.cdd/`.
+- Where recursive `fs.watch` is unavailable (older Linux Node), it falls back to
+  freshness polling using the existing `checkCodeMapFreshness` digest check.
+- It is **never armed automatically** — daemons are a poor default for CI.
+### 3. Make detection cheaper and more honest now; make rebuild incremental next
+Two follow-ups, sequenced:
+- **(shipped here)** Background watch + the existing content-hash freshness check
+  (`# sources-digest` header, verified against `computeSourcesDigest`) already
+  prevent false "stale" verdicts after `git clone` and let watch/poll skip
+  no-op rebuilds.
+- **(next PR)** **Incremental rebuild.** Today `--watch` still rebuilds the whole
+  map per debounce window because the scanners are whole-repo. The high-value
+  follow-up is per-file incremental: keep prior map entries for files whose
+  content hash is unchanged, re-scan only the changed set, and merge. This is
+  the ~4× win the field reports and is the prerequisite for watch to be cheap on
+  large repos. The content-hash sidecar already gives us the per-file digests to
+  build on.
+## Consequences
+- **Positive:** default behaviour unchanged for CI/agents; a real option for
+  live sessions; explicit rationale for *not* chasing LSP; a clear, scoped
+  incremental-rebuild roadmap.
+- **Negative / accepted:** `--watch` on a large repo is currently O(full scan)
+  per change burst until incremental lands — documented, and gated behind an
+  explicit flag so no one pays for it unawares.
+- **Revisit when:** an LSP/tree-sitter adapter shows a *deterministic*,
+  container-friendly mode, or incremental rebuild lands and changes the
+  watch cost profile.