npm - @kiwidata/grimoire - Versions diffs - 0.1.6 → 0.2.1 - Mend

@kiwidata/grimoire 0.1.6 → 0.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (88) hide show

package/AGENTS.md +2 -2
package/README.md +5 -1
package/dist/cli/program.d.ts.map +1 -1
package/dist/cli/program.js +2 -0
package/dist/cli/program.js.map +1 -1
package/dist/commands/comment-lint.d.ts +3 -0
package/dist/commands/comment-lint.d.ts.map +1 -0
package/dist/commands/comment-lint.js +14 -0
package/dist/commands/comment-lint.js.map +1 -0
package/dist/core/branch-check.d.ts.map +1 -1
package/dist/core/branch-check.js +2 -16
package/dist/core/branch-check.js.map +1 -1
package/dist/core/check-complexity.d.ts +4 -0
package/dist/core/check-complexity.d.ts.map +1 -0
package/dist/core/check-complexity.js +79 -0
package/dist/core/check-complexity.js.map +1 -0
package/dist/core/check-llm.d.ts +5 -0
package/dist/core/check-llm.d.ts.map +1 -0
package/dist/core/check-llm.js +108 -0
package/dist/core/check-llm.js.map +1 -0
package/dist/core/check.d.ts +1 -2
package/dist/core/check.d.ts.map +1 -1
package/dist/core/check.js +2 -179
package/dist/core/check.js.map +1 -1
package/dist/core/comment-lint.d.ts +18 -0
package/dist/core/comment-lint.d.ts.map +1 -0
package/dist/core/comment-lint.js +215 -0
package/dist/core/comment-lint.js.map +1 -0
package/dist/core/doc-style.d.ts +1 -0
package/dist/core/doc-style.d.ts.map +1 -1
package/dist/core/doc-style.js +1 -1
package/dist/core/doc-style.js.map +1 -1
package/dist/core/docs.js +1 -1
package/dist/core/docs.js.map +1 -1
package/dist/core/health.js +1 -1
package/dist/core/health.js.map +1 -1
package/dist/core/hooks.js +39 -28
package/dist/core/hooks.js.map +1 -1
package/dist/core/init-config.d.ts +22 -0
package/dist/core/init-config.d.ts.map +1 -0
package/dist/core/init-config.js +118 -0
package/dist/core/init-config.js.map +1 -0
package/dist/core/init-prompts.d.ts +8 -0
package/dist/core/init-prompts.d.ts.map +1 -0
package/dist/core/init-prompts.js +166 -0
package/dist/core/init-prompts.js.map +1 -0
package/dist/core/init.d.ts +0 -1
package/dist/core/init.d.ts.map +1 -1
package/dist/core/init.js +3 -278
package/dist/core/init.js.map +1 -1
package/dist/core/list.js +2 -2
package/dist/core/list.js.map +1 -1
package/dist/core/status.js +2 -2
package/dist/core/status.js.map +1 -1
package/dist/core/update.d.ts.map +1 -1
package/dist/core/update.js +22 -0
package/dist/core/update.js.map +1 -1
package/dist/core/validate.js +1 -1
package/dist/core/validate.js.map +1 -1
package/dist/utils/config.d.ts +2 -0
package/dist/utils/config.d.ts.map +1 -1
package/dist/utils/config.js +4 -0
package/dist/utils/config.js.map +1 -1
package/dist/utils/frontmatter.d.ts +6 -0
package/dist/utils/frontmatter.d.ts.map +1 -0
package/dist/utils/frontmatter.js +13 -0
package/dist/utils/frontmatter.js.map +1 -0
package/dist/utils/paths.d.ts +1 -1
package/dist/utils/paths.d.ts.map +1 -1
package/dist/utils/paths.js +5 -4
package/dist/utils/paths.js.map +1 -1
package/dist/utils/stdin.d.ts +3 -0
package/dist/utils/stdin.d.ts.map +1 -0
package/dist/utils/stdin.js +13 -0
package/dist/utils/stdin.js.map +1 -0
package/package.json +6 -2
package/skills/grimoire-apply/SKILL.md +4 -3
package/skills/grimoire-draft/SKILL.md +147 -215
package/skills/grimoire-plan/SKILL.md +4 -1
package/skills/grimoire-pr/SKILL.md +14 -15
package/skills/grimoire-refactor/SKILL.md +3 -1
package/skills/references/artifact-map.md +2 -1
package/skills/references/code-quality.md +1 -1
package/skills/references/principles.md +4 -3
package/skills/references/refactor-scan-categories.md +15 -0
package/skills/references/review-personas.md +30 -13
package/skills/references/testing-contracts.md +19 -0
package/templates/draft.md +108 -0

package/skills/grimoire-refactor/SKILL.md CHANGED Viewed

@@ -34,7 +34,7 @@ Each debt item in the register follows a structured format influenced by the Cod
 **Required fields:**
 - `id` — unique identifier (debt-NNN, monotonically increasing)
-- `category` — one of: `hotspot`, `structural_bloat`, `data_structure`, `circular_dependency`, `dependency_staleness`, `broken_promise`, `duplication`, `dead_code`, `test_debt`, `pattern_divergence`, `comment_noise`
+- `category` — one of: `hotspot`, `structural_bloat`, `data_structure`, `circular_dependency`, `dependency_staleness`, `broken_promise`, `duplication`, `reinvented_platform`, `dead_code`, `test_debt`, `pattern_divergence`, `comment_noise`
 - `severity` — `high`, `medium`, or `low`
 - `location` — file path (with optional `:line`), or `path ↔ path` for relationships
 - `title` — short human-readable summary
@@ -114,6 +114,7 @@ Run applicable scans from the categories in `../references/refactor-scan-categor
 - **Dependency staleness** — uses `config.tools.dep_audit` or package manager outdated commands
 - **Broken promises** — aged TODO/FIXME/HACK comments via `grep` + `git blame`
 - **Duplication** — textual clones via `config.tools.duplicates` or `grimoire health` (config-driven duplicates metric); plus semantic duplicate detection via `search_graph(semantic_query=[...])` to find re-implementations under different names (requires `codebase-memory-mcp`)
+- **Platform/stdlib reinvention** (YAGNI) — code or a dependency hand-rolling what the stdlib or runtime platform already ships (`moment`→`Intl`, a date-picker lib→`<input type="date">`, a hand-written deep-clone→`structuredClone`); the don't-reinvent-the-wheel principle (principles.md §3) as a scan. See `../references/refactor-scan-categories.md` §2L
 - **Dead code** — uses `config.tools.dead_code` or `codebase-memory-mcp` graph queries
 - **Test debt** — high complexity + low coverage
 - **Pattern divergence** — code that contradicts established codebase patterns; uses `codebase-memory-mcp` peer group analysis + hallucinated reference detection (skip if graph not indexed)
@@ -195,6 +196,7 @@ After the first batch, ask if the user wants to see more or start working on the
 - "This 34-method class has 5 distinct responsibilities — extracting them would make each class testable independently"
 - "This 28-field config type is used in 3 contexts — splitting it eliminates 22 optional fields and makes each usage self-documenting"
 - "Flattening this 4-level nested structure into 2 normalized types would eliminate the deep property chains throughout the codebase"
+- "This `EmailValidator` class (27 lines) is `\"@\" in email` — real validation is the confirmation mail" / "`moment` is imported for one format call — `Intl.DateTimeFormat`, 0 deps" (always name the concrete stdlib/native replacement and the lines or dependency it removes)
 ### 6. Create Grimoire Changes

package/skills/references/artifact-map.md CHANGED Viewed

@@ -8,7 +8,8 @@ Loaded by skills that read a change's specs before acting (`grimoire-plan`, `gri
 Per-change (under `.grimoire/changes/<change-id>/`):
-- **`manifest.md`** — change summary, complexity level, and the Why. Level 3-4 also carry Assumptions, Pre-Mortem, and **Prior Art** (the build-vs-buy rationale).
+- **`draft.md`** — the living design doc the change was designed on (diagram/sketch, decision ledger, pseudo-code, Decided/Open ledger). The single source the other artifacts were **projected** from at the end of `grimoire-draft`. Ephemeral: retained read-only as the agreed-design reference through the pipeline, deleted when `grimoire-apply` clears the change folder. Read it for the *intent and rationale* behind the projected artifacts; the features/constraints/decisions remain the authoritative homes.
+- **`manifest.md`** — change summary, complexity level, and the Why. Level 3-4 also carry Assumptions, Pre-Mortem, and **Prior Art** (the build-vs-buy rationale). Generated from `draft.md` at projection.
 - **`features/*.feature`** — behavioral specifications. Edited live in `features/` on the branch.
 - **decision records** — architectural choices for this change, edited live in `.grimoire/decisions/`, including Cost of Ownership sections.
 - **`tasks.md`** — the implementation plan (present once planned).

package/skills/references/code-quality.md CHANGED Viewed

@@ -67,7 +67,7 @@ Fail: a private function with three guard clauses before its one real line.
 Fail: any local named `data`, `result`, `temp`, `obj`, `item`, or `value` when a more specific name fits.
-### 6. No premature abstraction
+### 6. No premature abstraction (YAGNI)
 Three near-identical copies is acceptable. Extract on the fourth, or when the shared shape is stable and named. Wrong abstractions are harder to undo than duplication.

package/skills/references/principles.md CHANGED Viewed

@@ -52,15 +52,16 @@ grimoire mechanism that duplicates it.
   tracking is a fractured landscape), don't force-adopt one *and* don't build a
   general-purpose clone. Keep any local mechanism narrow and purpose-scoped.
-## 4. Keep it simple (KISS)
+## 4. Keep it simple (KISS / YAGNI)
 The simplest thing that fully solves the *stated* problem wins.
 - Least code, fewest new files, smallest surface area. A few lines in an existing
   file beats a new module. A standard-library call beats a new dependency. Inline
   beats a one-line wrapper.
-- No premature abstraction. No `BaseX`/factory/strategy/config-object for a single
-  caller. No speculative generality "for a future second caller" that doesn't exist.
+- **YAGNI — no premature abstraction.** You aren't gonna need it. No `BaseX`/factory/
+  strategy/config-object for a single caller. No speculative generality "for a future
+  second caller" that doesn't exist. Speculative need → skip it, say so in one line.
 - Solve the problem in front of you, not the imagined one. Non-goals are real scope
   boundaries — do not plan or build past them.
 - **Tell:** an abstraction, indirection, or dependency whose only justification is a

package/skills/references/refactor-scan-categories.md CHANGED Viewed

@@ -252,3 +252,18 @@ Manual triage: open each hit and check whether a multi-line docstring follows. P
 - low = multi-line docstrings on private functions
 **Suggested action:** Delete restatement comments. Move task/PR references to commit history. Trim private function docstrings to one line or remove entirely.
+## 2L. Platform / Stdlib Reinvention (YAGNI)
+Code or a dependency doing a job the standard library or the runtime platform already ships — `principles.md` §3 (don't reinvent the wheel) as a scan. Distinct from duplication (§2g), which finds *internal* re-implementation; this finds reinvention of things that live *outside* the repo.
+**How to scan (qualitative — no single tool):**
+- **Hand-rolled stdlib:** scan for what the language already covers — date math, URL/query parsing, grouping/uniq, deep clone, debounce/throttle, deep-merge, retry, UUID, base64, hashing. Stack-specific homes: Python `functools`/`itertools`/`collections`/`datetime`; JS `Intl`, `URLSearchParams`, `structuredClone`, `Array`/`Object` methods.
+- **Native-platform reinvention:** a JS lib for what HTML/CSS/the browser does (`<input type="date">`, `<dialog>`, `Intl.DateTimeFormat`, `:has()`), app code for what a DB constraint/index enforces, a polyfill for a baseline-supported feature.
+- **Removable dependency:** cross-reference `package.json` / `pyproject.toml` for a dependency whose whole job now lives in the stdlib or platform — `moment` → `Intl`/`Temporal`, `lodash.clonedeep` → `structuredClone`, `left-pad` → `String.padStart`.
+**Flag when** the replacement is a stdlib call or native feature of equal-or-better correctness. **Name the function/feature and the line count (or dependency) it removes** — a finding the author can't act on is noise.
+**Severity:** high = a removable dependency, medium = a multi-line hand-rolled stdlib equivalent, low = a one-or-two-line shrink.
+**Suggested action:** replace with the named stdlib/native feature; note the dependency drop if any.

package/skills/references/review-personas.md CHANGED Viewed

@@ -158,6 +158,8 @@ Each persona below names what it evaluates. The calling skill points the persona
 Skip if the change is purely internal (no user-facing behavior).
+**Anchor — INVEST + outcome over output.** Judge the change as a well-formed unit of value, using the review-relevant slice of INVEST: **V**aluable (solves a real user problem — not gold-plating), **S**mall/scoped (one coherent outcome, not a grab-bag), **T**estable (acceptance is checkable from the artifact in front of you). Review the *outcome* the change targets, not output volume. INVEST is a lens on what's already specified — the PM persona adds no scope of its own; a "nice to have" it invents is gold-plating, the same failure it flags in others.
 Evaluate:
 - **Outcome**: Manifest's Why states the problem and how success is measured? Mechanism vs outcome ("add an endpoint" vs "users can reset passwords")?
 - **Coverage**: Do feature scenarios cover all user-facing behaviors? Missing edge cases, error states, alternate flows?
@@ -172,11 +174,12 @@ Treat accepted decisions as constraints — cite ADR ID before suggesting an ove
 Evaluate:
 - **Build vs Buy** *(design only)*: Was prior art research thorough? If a well-maintained library exists that the manifest doesn't mention, **blocker**.
-- **Simplicity**: Simplest design that solves the problem? Unnecessary abstraction, indirection, premature generalization, config-driven where direct call would do?
+- **Simplicity (YAGNI ladder)**: Walk `../references/principles.md` §4 in order — could it not exist (YAGNI), does the stdlib do it, does a native platform feature cover it, does an installed dep solve it, is it one line? Flag the first rung the code skipped: unnecessary abstraction, indirection, premature generalization, config-driven where a direct call would do. Abstract on the third real use, not the first (**Rule of Three**) — two copies is not yet a pattern. Every finding **names the concrete replacement** (`stdlib: 27-line validator → "@" in email, 1 line`), not just "this seems complex" — a finding the author can't act on is noise.
 - **Architecture**: Decisions sensible for this codebase? Will this paint us into a corner?
 - **Conventions** *(PR/pre-commit)*: Does new code match file layout, naming, and patterns already in the touched areas? Check `.grimoire/docs/<area>.md` if present.
-- **Reuse**: Existing utilities/functions that were re-implemented? `grep` for similar names; check area docs' reusable-code lists.
+- **Reuse / reinvention**: Existing utilities re-implemented (`grep` similar names; area-doc reusable lists), or stdlib / native-platform / installed-dep functionality hand-rolled (principles.md §3 — don't reinvent the wheel). Name what already does the job.
 - **Dead code** *(PR/pre-commit)*: Functions added but not called, imports unused, commented-out code, stubs with no implementation.
+- **Safe deletion / refactor (Chesterton's Fence)**: Before recommending code be deleted, inlined, or "simplified away," state *why it exists* — a guard, a workaround, a non-obvious caller. If you can't explain the fence, don't tear it down — ask. YAGNI removes the speculative, never the load-bearing-but-unobvious.
 - **Scope creep** *(PR/pre-commit)*: Files changed outside the scope implied by the change-id or manifest. Formatting-only changes to unrelated files = noise.
 - **Error handling**: Errors handled at boundaries? Internal code shouldn't be littered with defensive checks; external inputs must be validated.
 - **Tests**: New behaviors have tests? Tests make real assertions (not just `assert true` / mock everything)? Check `./testing-contracts.md` if framework matches.
@@ -205,12 +208,24 @@ For every new entry point, data flow, or trust boundary:
 Skip categories that don't apply.
+#### LINDDUN (privacy) — engage only when briefing data-sensitivity is pii/financial/phi, or the change touches personal data
+STRIDE covers security, not privacy. For flows that collect, store, or share personal data, scan the privacy threats STRIDE under-covers:
+- **Linking / Identifying**: can records be correlated to one person, or a dataset/log de-anonymize someone who shouldn't be identifiable?
+- **Data disclosure (minimization)**: is each collected field minimized, purpose-bound, and retention-limited? (GDPR data-minimization) — excess collection/retention is the finding.
+- **Detecting**: presence/membership inferable via side channels (timing, error-message diffs)?
+- **Non-compliance**: storage/retention violates `project.compliance` or an accepted ADR?
+Most reviews: only **Data disclosure** + **Linking/Identifying** apply — skip the rest unless the flow warrants. (Full LINDDUN: Linking, Identifying, Non-repudiation, Detecting, Disclosure, Unawareness, Non-compliance.)
 #### Code-level scan *(PR/pre-commit only)*
 - **Secrets**: Grep diff for hardcoded keys, tokens, passwords, cloud credentials, JWT secrets. Any hit = **blocker**.
 - **Injection**: Raw SQL with string concatenation, shell-exec with user input, `eval`/`exec`, unsafe deserialization. Tag OWASP + CWE.
 - **Input validation**: New endpoints without schema validation, file uploads without size/type limits, path params used directly in filesystem calls.
 - **Auth**: New routes/handlers missing auth decorators / middleware. Compare against neighbors in same file.
+- **API authorization (OWASP API Top 10 2023)**: new/changed endpoints — object-level authz (**BOLA**: can user A fetch user B's object by id?), property-level (**BOPLA**: mass-assignment / over-exposed response fields), SSRF on user-supplied URLs. The API-specific complement to the generic Top 10.
 - **Dependencies**: New packages — pinned to exact version (no `^`/`~`/`>=`/`*`), lockfile updated and committed with integrity hashes, name is real (typosquat risk), `dep_audit` output clean if committed. Flag packages with zero downloads, recent ownership transfer (~90 days), suspicious new maintainers, or post-install scripts. Unpinned dep or missing lockfile entry on a new package = **blocker** (see `./security-compliance.md` § Supply Chain Defense).
 - **PII**: New logging that could emit PII; new storage of personal data without encryption.
 - **Cross-service auth**: If `context.yml` lists related services, are service-to-service calls authenticated?
@@ -227,20 +242,21 @@ Every security finding gets OWASP 2021 + CWE tags. See CWE quick-reference in `.
 Skip if change is purely internal.
+**Mandate — review the *testing of the spec*, not the spec itself.** QA owns exactly one question: *is what the spec already defines adequately and honestly tested?* It does **not** own feature completeness or scope — the Product Manager owns coverage of user-facing behaviors, the Senior Engineer owns build correctness. QA never expands the requirement. A behavior, edge case, or failure mode the spec does not define is a **scope question routed to the PM**, never a QA finding. YAGNI: demand no test for behavior nobody asked for. This persona is the most prone to over-reach — if a finding would *add* required behavior, it is out of lane; drop it.
 **Coverage-gap routing (apply before recommending any new artifact).** A coverage gap does NOT default to "write a `.feature`." Route each gap to its one home using the feature-file admission test in `../grimoire-draft/SKILL.md` (§ jurisdiction table + the four admission gates):
 - A `.feature` scenario is warranted **only** for an actor-observable behavior that passes all four gates (external actor, observable outcome, domain language, survives reimplementation).
 - An invariant — observability/logging guarantee, perf budget, security control, compliance rule — is a **constraint**. Recommend it be recorded/verified in `.grimoire/docs/constraints.md`, never as a new `.feature`.
 - When a behavior gap belongs in features, the default is **extend an existing feature file** in the same domain — recommend a new file only if no existing file fits, and say which were considered. Don't propose a `.feature` per finding.
 - Test gaps for already-specified behavior are a *missing test*, not a missing feature — recommend the test, not a new spec.
-Evaluate:
-- **Test presence**: Every new user-facing behavior has a test? Every scenario from a linked feature file has step definitions? Missing test = recommend the test; only recommend a new/extended `.feature` if the behavior itself is unspecified *and* passes the admission test.
-- **Test quality**: Tests asserting outputs, or just that code "ran"? Over-mocked tests = red flag.
-- **Negative paths**: For each happy path, is there a failure-path test?
-- **Edge cases**: Empty states, concurrent users, interruptions, boundary values? A missing edge case is a missing test/scenario in the relevant existing feature — not grounds for a new feature file.
-- **Observability**: New feature — how will it be debugged in prod? Structured logs / metrics / error surfaces? Observability is a **constraint**: verify it's asserted in `constraints.md`; do not recommend a `.feature` for it.
-- **Regression risk** *(PR/pre-commit)*: Which existing tests cover the touched code? Were any tests removed or weakened?
-- **Accessibility**: New UI — keyboard nav, aria labels, contrast?
+Evaluate (every check anchored to behavior the spec / feature already defines):
+- **Test presence**: Does every behavior the spec defines have a test? Every scenario in a linked feature file have step definitions? Missing test for a *specified* behavior → recommend the test. A behavior with no spec is not a QA finding — route to PM.
+- **Test quality (FIRST + behavior-not-implementation)**: Tests assert real outputs, not that code "ran"? Tests **F**ast / **I**solated / **R**epeatable / **S**elf-validating / **T**imely? Over-mocked tests that verify the mock instead of the behavior, or tests coupled to implementation internals rather than observable behavior = QA's highest-value finding. This is the lane — spend the attention here.
+- **Negative paths**: Where the spec defines a failure behavior (an error, a rejection, a rollback), is it tested? Do **not** invent failure modes the spec is silent on.
+- **Edge cases (Boundary Value Analysis / Equivalence Partitioning)**: For input the spec gives a range or set, test the boundaries and one value per equivalence class — but only for ranges the spec, a scenario, or a `constraints.md` entry actually names. **BVA bounds "enough edge cases" to the spec's stated values; it does not license inventing scenarios.** An unspecified edge case ("what about concurrent users / interruptions?") is a scope question for the PM — drop it, don't file it as a test gap.
+- **Regression risk** *(PR/pre-commit)*: Which existing tests cover the touched code? Were any removed or weakened? A silently weakened assertion is a **blocker**.
+- **Out of lane — defer, don't duplicate**: Observability is a **constraint** (verify it's in `constraints.md`; never a `.feature`). Accessibility belongs to the Adversarial User personas (§4.7). Note in one line and defer — do not re-file their findings as QA blockers.
 ### 4.5 Data Engineer
@@ -251,11 +267,12 @@ Read:
 - `.grimoire/docs/data/schema.yml` — current baseline
 Evaluate:
-- **Migrations**: Safe on live DB? Adding NOT NULL without default on large table = **blocker**. Renames without two-step migration = **blocker**.
+- **Migrations — name the deployment consequence, don't mandate zero-downtime**: State plainly what the change costs to ship — does this ALTER **lock the table** (a downtime-incurring change)? Is it **backward-incompatible** (old app versions break mid-deploy)? Is it **irreversible**? Zero-downtime is *not* assumed — many projects accept a maintenance window. Surface the cost as a decision for the human ("this rename locks `users` and breaks rolling deploy — confirm a downtime window is acceptable, or split expand→contract to avoid it"), **not** an automatic blocker. **Expand–Contract / Parallel Change** (add new → backfill → switch reads → drop old) is the *option* offered when the project wants zero-downtime, never a requirement. **A downtime-incurring or backward-incompatible migration MUST be flagged in the PR / merge-request body** (a one-line `⚠️ incurs downtime` / `breaking schema change` note) so it's visible at merge — surfacing it only in the review is not enough.
 - **Indexes**: New foreign keys with no index? New query patterns against unindexed columns?
 - **Naming**: New fields follow existing schema conventions?
-- **Backwards compatibility**: Will schema change break existing API consumers, queries, or reports?
-- **Breaking contract**: `data.yml` vs `schema.yml` — removed/renamed/retyped response fields or new required request fields = **blocker** unless migration path documented.
+- **Data minimization (PII)**: New columns holding personal data — each field necessary, purpose-bound, retention-limited, encrypted at rest if sensitive? Excess PII storage = finding. Pairs with the Security persona's LINDDUN Data-disclosure check (§4.3).
+- **Backward / forward compatibility (named outcome, not forced)**: State *whether* the change is backward-compatible (old readers survive) and forward-compatible (old code + new schema holds during deploy). Report the outcome so breaking is a *conscious* choice, not a surprise — the project decides if it's acceptable; don't force compatibility. If broken, it rides the same PR / merge-request flag as a downtime change.
+- **Breaking contract**: `data.yml` vs `schema.yml` — removed/renamed/retyped response fields or new required request fields = **blocker** unless migration path documented. (This is the silent-break exception: breaking a *documented consumer contract* without noting it harms other people's code — distinct from a self-contained downtime choice above.)
 - **Transactions**: Multi-step writes wrapped in a transaction?
 - **External APIs** *(design)*: New API dependency — `schema_ref` pointing to a stable spec? Fallback if API unavailable?

package/skills/references/testing-contracts.md CHANGED Viewed

@@ -18,6 +18,25 @@ Loaded by skills that involve writing tests, mocking external services, or verif
 - When the contract changes, the fixture must change — stale fixtures are false-positive tests
 - Include at least one error response fixture per external API (matching `error_response` in `schema.yml`)
+## Test Data Generation
+**Do not ask the user for test data, sample records, or example scenarios.** Scenarios are derived from the spec (`.feature` scenarios, constraints, contracts); the *data* that exercises them is generated. Asking the user to supply fixtures by hand front-loads the work onto them and produces brittle, hand-curated values. Generate it instead, in this order of preference:
+1. **Project's standard data factory / generator (default).** Check `config.tools` and existing test imports for the tool already in use and follow it:
+   - **Python**: `factory_boy`, `model_bakery`/`model-mommy`, `faker`, `Hypothesis` (property-based)
+   - **JS/TS**: `@faker-js/faker`, `fishery`, `test-data-bot`, `fast-check` (property-based)
+   - **Ruby**: `factory_bot`, `faker`
+   - **Go**: `gofakeit`, table-driven fixtures
+   - **Java/Kotlin**: `instancio`, `easy-random`, `jqwik` (property-based)
+   Build records through the factory, override only the fields the scenario actually pins, and let the tool fill the rest. For invariants ("never accepts a negative amount", "round-trips any valid payload"), prefer **property-based generation** (Hypothesis / fast-check / jqwik) over a handful of literals — it covers the input space the spec describes instead of one example.
+2. **Recorded / fixture responses** for external-API contracts — concrete instances of the `schema.yml` contract (see Fixture Management above). These are captured shapes, not invented ones.
+3. **AI-authored literal test data — last resort, only on explicit instruction.** Hand-writing literal records (the agent inventing `{name: "Acme Corp", amount: 4200, ...}`) is permitted **only when the user explicitly asks for generated test data**, or no factory tooling exists in the project *and* the case needs one specific crafted value (a known edge constant, a regression repro). When you fall back to this, say so — note in the task/test why a factory wasn't used. Never silently invent a dataset.
+No data-factory tool configured and the project has tests? Match whatever those tests already do. Proposing a *new* factory dependency is a plan-stage decision, not something to pull in mid-implementation.
 ## Contract Test Requirements
 Every external API integration needs contract tests that assert:

package/templates/draft.md ADDED Viewed

@@ -0,0 +1,108 @@
+---
+status: draft
+change-id: <kebab-case-verb-led>
+kind: greenfield | refactor
+# NO complexity here. Complexity is an OUTPUT of design — scored at projection
+# (after agreement) and written to manifest.md, never to this file.
+---
+<!--
+  draft.md — the ONE living surface you design a change on.
+  This is where the whole change lives as a single coherent picture: diagram/sketch,
+  rationale, a decision ledger, pseudo-code, and an open-question ledger. You and the
+  user iterate HERE (this is the interview). Nothing is written to features/,
+  constraints.md, or decisions/ until the design is agreed — then it is PROJECTED into
+  those homes. This file is ephemeral: retained read-only as reference through the
+  pipeline, deleted when the change folder is cleared at grimoire-apply finalize. Git
+  history preserves it.
+  Required sections: At a glance · Why · Decisions · Decided / Open.
+  As-needed: Current state (REQUIRED for kind=refactor) · Sketches · Constraints · Cut.
+  Delete the guidance comments and any section that carries no weight for this change.
+-->
+# <change> — draft
+**Date:** <YYYY-MM-DD> · **Provenance:** <prior passes / branches / source docs, if any>
+## At a glance
+<!--
+  Make the whole change graspable in one screen. Pick the medium that fits:
+    - greenfield → an ASCII flow / box diagram of the system or pipeline
+    - refactor   → a pseudo-code sketch of the target shape (annotate with decision IDs)
+  If grimoire-design (Figma) output exists for this change, its visual + component/state
+  material anchors this section.
+-->
+## Why
+<!--
+  greenfield: the objective (what problem, how you'll know it's solved) + non-goals.
+  refactor:   the pain points justifying the change — each a NAMED, LOCATED smell with a
+              file:line breadcrumb, not an abstract complaint.
+-->
+## Current state            <!-- REQUIRED for kind=refactor; omit for greenfield -->
+<!--
+  How the touched system works TODAY, with breadcrumbs to live code. Mandate the codebase
+  graph: index_repository first if needed, then search_graph / trace_path / get_code_snippet
+  for qualified names, callers, and call chains. Follow with a severity-ranked Gaps/drift
+  list — the audit findings that motivate the redesign.
+-->
+## Decisions
+<!--
+  ONE inline ledger. Each row: a stable ID, the decision, and its WHY. Use sub-IDs (D1a)
+  and cross-references (D7 cites D3) freely — this is how coupled decisions stay legible
+  in one place. At projection, each NOVEL decision becomes a MADR (novelty gate applies —
+  obvious tooling picks fold into the baseline ADR, they don't mint a record).
+-->
+| #  | Decision | Why |
+|----|----------|-----|
+| D1 |          |     |
+## Sketches                 <!-- as-needed; expected for refactor -->
+<!--
+  Pseudo-code / key considerations for the target shape. Mark it "not final" — it is shape,
+  not contract. Annotate lines with the decision IDs they realize (# D8).
+-->
+## Constraints              <!-- as-needed -->
+<!--
+  Invariants this change must hold (security / NFR / observability / compliance). One line
+  each: assertion · rationale · how-verified. These project to .grimoire/docs/constraints.md
+  — NOT to a feature file.
+-->
+## Decided / Open
+<!--
+  Two lists. Decided = settled calls (cross-reference the D-IDs). Open = live unknowns.
+  Resolve an Open IN PLACE — rewrite it as `RESOLVED: <answer> (Dn)`, do not delete it.
+  The struck-through trail is the record of the thinking. Design is "done" when Decided is
+  stable and Open is empty-or-deferred.
+-->
+**Decided:**
+-
+**Open:**
+-
+## Cut / deferred           <!-- as-needed; greenfield-leaning -->
+<!--
+  What was deliberately removed or deferred, so it is not silently lost. Table:
+  cut · what it was · why cut · re-add when. Design by subtraction, recorded.
+-->
+| Cut | What it was | Why cut | Re-add when |
+|-----|-------------|---------|-------------|
+|     |             |         |             |