forgecraft-mcp 1.7.0 → 1.8.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (52)
  1. package/README.md +79 -0
  2. package/dist/registry/remote-gates.d.ts +16 -0
  3. package/dist/registry/remote-gates.d.ts.map +1 -1
  4. package/dist/registry/remote-gates.js +56 -0
  5. package/dist/registry/remote-gates.js.map +1 -1
  6. package/dist/registry/sentinel-domain-map.d.ts.map +1 -1
  7. package/dist/registry/sentinel-domain-map.js +16 -9
  8. package/dist/registry/sentinel-domain-map.js.map +1 -1
  9. package/dist/registry/sentinel-renderer.d.ts +13 -8
  10. package/dist/registry/sentinel-renderer.d.ts.map +1 -1
  11. package/dist/registry/sentinel-renderer.js +440 -162
  12. package/dist/registry/sentinel-renderer.js.map +1 -1
  13. package/dist/shared/harness-budget.d.ts +49 -0
  14. package/dist/shared/harness-budget.d.ts.map +1 -0
  15. package/dist/shared/harness-budget.js +123 -0
  16. package/dist/shared/harness-budget.js.map +1 -0
  17. package/dist/shared/hook-installer.d.ts.map +1 -1
  18. package/dist/shared/hook-installer.js +2 -1
  19. package/dist/shared/hook-installer.js.map +1 -1
  20. package/dist/tools/close-cycle-helpers.d.ts +9 -0
  21. package/dist/tools/close-cycle-helpers.d.ts.map +1 -1
  22. package/dist/tools/close-cycle-helpers.js.map +1 -1
  23. package/dist/tools/close-cycle.d.ts.map +1 -1
  24. package/dist/tools/close-cycle.js +29 -0
  25. package/dist/tools/close-cycle.js.map +1 -1
  26. package/dist/tools/contribute-gate.d.ts +30 -4
  27. package/dist/tools/contribute-gate.d.ts.map +1 -1
  28. package/dist/tools/contribute-gate.js +180 -66
  29. package/dist/tools/contribute-gate.js.map +1 -1
  30. package/dist/tools/gate-genesis.d.ts +47 -0
  31. package/dist/tools/gate-genesis.d.ts.map +1 -0
  32. package/dist/tools/gate-genesis.js +241 -0
  33. package/dist/tools/gate-genesis.js.map +1 -0
  34. package/dist/tools/learning-graph.d.ts +31 -0
  35. package/dist/tools/learning-graph.d.ts.map +1 -0
  36. package/dist/tools/learning-graph.js +266 -0
  37. package/dist/tools/learning-graph.js.map +1 -0
  38. package/dist/tools/setup-artifact-writers.d.ts +15 -3
  39. package/dist/tools/setup-artifact-writers.d.ts.map +1 -1
  40. package/dist/tools/setup-artifact-writers.js +149 -13
  41. package/dist/tools/setup-artifact-writers.js.map +1 -1
  42. package/dist/tools/setup-phase2.d.ts +9 -0
  43. package/dist/tools/setup-phase2.d.ts.map +1 -1
  44. package/dist/tools/setup-phase2.js +13 -0
  45. package/dist/tools/setup-phase2.js.map +1 -1
  46. package/dist/tools/setup-project.d.ts +6 -0
  47. package/dist/tools/setup-project.d.ts.map +1 -1
  48. package/dist/tools/setup-project.js +21 -4
  49. package/dist/tools/setup-project.js.map +1 -1
  50. package/package.json +99 -98
  51. package/templates/api/instructions.yaml +50 -188
  52. package/templates/universal/instructions.yaml +194 -1003
@@ -39,158 +39,60 @@ blocks:
39
39
  title: "Dev Environment Hygiene"
40
40
  content: |
41
41
  ## Dev Environment Hygiene
42
-
43
- AI-assisted development can silently fill disk space. These rules are non-negotiable.
44
- A full disk kills every running tool simultaneously: VS Code, Docker, the terminal, the DB.
45
-
46
- ### VS Code Extensions
47
- - Before installing any extension: `code --list-extensions | grep -i <name>`.
48
- - Only install if no version in the required major range is already present.
49
- - Never run `code --install-extension` unconditionally in scripts or setup steps.
50
- - Installing the same extension twice on the same day = a bug in your script.
51
-
52
- ### Docker Containers & Volumes
53
- - Check before creating: `docker ps -a --filter name=<service>` — if it exists, start it, don't create it.
54
- - Prefer `docker compose up` (reuse) over bare `docker run` (always creates new).
55
- - One Compose file per project. Split files for the same project = tech debt.
56
- - Log pruning: run `docker system prune -f` periodically. Never let container logs exceed 500 MB total.
57
- - Time-series or synthetic data volumes: before writing >100 MB, ask whether raw retention,
58
- statistical condensation, or deletion after the run is preferred.
59
- - Synthetic datasets older than 7 days with no code reference: ask to delete.
60
-
61
- ### Python Virtual Environments
62
- - One `.venv` per project root, one per standalone package subdirectory — never more.
63
- - Before creating: check if `.venv/` exists and `python --version` matches the required major.minor.
64
- Recreate only on major version mismatch or explicit user request.
65
- - Never create a venv in a subdirectory unless that directory is a standalone installable package.
66
- - Sanitize dependencies: if `pip list --not-required` reveals packages not in requirements, flag them.
67
-
68
- ### General Install Hygiene
69
- - Before any install/download: check version already installed. Skip if within the required range.
70
- - If project directory disk usage outside of `node_modules/`, `.venv/`, `dist/`, `.next/`
71
- exceeds 2 GB: surface a warning and ask before continuing any file-generating operation.
72
- - Never silently grow the workspace. When uncertain about retention, ask.
42
+ - VS Code: `code --list-extensions | grep -i <name>` before install; skip if in-range. Never install unconditionally in scripts.
43
+ - Docker: `docker ps -a --filter name=<service>` before create — reuse if exists. Prefer `docker compose up` over `docker run`. One Compose file per project.
44
+ - Docker: `docker system prune -f` periodically; container logs < 500 MB total.
45
+ - Data volumes: before writing >100 MB, ask about retention (keep raw, condense, or delete after the run). Synthetic data >7 days old with no code reference: ask to delete.
46
+ - Python: one `.venv` per project root or standalone package — never more. Check `.venv/` + `python --version` before creating; recreate only on major mismatch.
47
+ - Before any install: check installed version, skip if in-range.
48
+ - If project dir usage (excluding `node_modules/`, `.venv/`, `dist/`, `.next/`) > 2 GB: warn and ask before generating files.
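The 2 GB workspace rule can be checked mechanically before any file-generating step. The sketch below is a minimal version for Node.js. It assumes the exclusion list above applies to directories with those names at any depth, it reads "2 GB" as 2 GiB, and it skips symlinks. `workspaceBytes` and `shouldWarn` are hypothetical helpers and are not part of forgecraft-mcp.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Directory names excluded from the size check (taken from the rule above).
const EXCLUDED_DIRS: ReadonlySet<string> = new Set(["node_modules", ".venv", "dist", ".next"]);

// Assumption: the rule's "2 GB" is interpreted as 2 GiB.
const WARN_THRESHOLD_BYTES = 2 * 1024 ** 3;

/** Recursively sums regular-file sizes under `root`, skipping excluded directories. */
function workspaceBytes(root: string): number {
  let total = 0;
  for (const entry of fs.readdirSync(root, { withFileTypes: true })) {
    const full = path.join(root, entry.name);
    if (entry.isDirectory()) {
      if (!EXCLUDED_DIRS.has(entry.name)) total += workspaceBytes(full);
    } else if (entry.isFile()) {
      total += fs.statSync(full).size;
    }
    // Symlinks and special files are ignored to avoid cycles and double counting.
  }
  return total;
}

/** True when the agent should warn and ask before generating more files. */
function shouldWarn(root: string, thresholdBytes: number = WARN_THRESHOLD_BYTES): boolean {
  return workspaceBytes(root) > thresholdBytes;
}
```

Walking the tree with `withFileTypes` avoids a separate `stat` call per directory entry. For very large trees, an OS tool such as `du` with exclusions may be faster, at the cost of portability.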
73
49
 
74
50
  - id: dependency-registry
75
51
  tier: core
76
52
  title: "Dependency Registry"
77
53
  content: |
78
- ## Dependency Registry — AI-Maintained Security Contract
79
-
80
- The project's approved dependency set is a **living GS artifact maintained by the AI
81
- assistant**. It is not a template rule: template authors cannot predict which library
82
- will gain a CVE next quarter. The AI can run an audit at the moment a dependency is
83
- about to be added. This block prescribes that it must.
84
-
85
- ### The registry artifact
86
-
87
- File: **`docs/approved-packages.md`** — emit in P1 alongside schema, tsconfig, package.json.
88
- Update it every time a dependency is added or upgraded. If it exists only in prose or a
89
- README reference, it does not exist.
90
-
91
- ```markdown
92
- # Approved Packages
93
-
94
- | Package | Version range | Purpose | Alternatives rejected | Rationale | Audit status |
95
- |---|---|---|---|---|---|
96
- | example-pkg | ^2.4 | HTTP client | axios (larger bundle), node-fetch (no TS types) | Wide adoption, zero known CVEs | 0 HIGH/CRITICAL |
97
- ```
98
-
99
- The AI populates every row. The registry is the authoritative record of WHY each
100
- dependency was chosen and that it was clean at the time of addition.
101
-
102
- ### Process rules — stack-agnostic
103
-
104
- 1. **Before adding any package**: run the project's audit command (see table below)
105
- with `--dry-run` or equivalent to check the candidate for known CVEs.
106
- - If HIGH or CRITICAL found: choose an alternative and document the rejection.
107
- - If no CVE-free alternative exists: document the accepted risk and create an ADR
108
- naming the approver. Zero-tolerance is the default; exceptions require a record.
109
- 2. **After adding a package**: add a row to `docs/approved-packages.md` with audit status.
110
- 3. **Commit gate**: the pre-commit hook runs the audit command. HIGH or CRITICAL blocks
111
- the commit. If audit is not in the pre-commit hook, the gate does not exist.
112
- 4. **Version pins**: approved version ranges are locked in the lockfile (package-lock.json,
113
- uv.lock, Cargo.lock). The lockfile is committed. Ranges without a lockfile are not pins.
114
-
115
- ### Audit commands by ecosystem
54
+ ## Dependency Registry
55
+ - File: **`docs/approved-packages.md`** — emit in P1, update on every add/upgrade. Columns: Package, Version range, Purpose, Alternatives rejected, Rationale, Audit status.
56
+ - Before adding any package: run the audit command (table) for CVEs. HIGH/CRITICAL → pick an alternative and document the rejection. No clean alternative → ADR naming the approver.
57
+ - After adding: add a row with audit status.
58
+ - Commit gate: pre-commit hook runs audit; HIGH/CRITICAL blocks. Not in the hook = gate does not exist.
59
+ - Version pins live in the committed lockfile (package-lock.json, uv.lock, Cargo.lock).
116
60
 
117
61
  | Ecosystem | Audit command | Threshold |
118
62
  |---|---|---|
119
- | npm / Node.js | `npm audit --audit-level=high` | HIGH or CRITICAL |
120
- | pnpm | `pnpm audit --audit-level=high` | HIGH or CRITICAL |
121
- | yarn | `yarn npm audit --severity high` | HIGH or CRITICAL |
122
- | Python / pip | `pip-audit --fail-on-severity high` | HIGH or CRITICAL |
123
- | Python / uv | `uv audit` | HIGH or CRITICAL |
124
- | Rust | `cargo audit` | HIGH or CRITICAL |
125
- | Go | `govulncheck ./...` | Any directly imported |
126
- | Java / Maven | `mvn dependency-check:check -DfailBuildOnCVSS=7` | CVSS ≥ 7 |
127
- | Ruby | `bundle audit` | HIGH or CRITICAL |
128
-
129
- The correct command for **this project's ecosystem** must appear in the pre-commit hook
130
- emitted in P1. Discovering CVEs at code review is too late.
63
+ | npm | `npm audit --audit-level=high` | HIGH/CRITICAL |
64
+ | pnpm | `pnpm audit --audit-level=high` | HIGH/CRITICAL |
65
+ | yarn | `yarn npm audit --severity high` | HIGH/CRITICAL |
66
+ | pip | `pip-audit --fail-on-severity high` | HIGH/CRITICAL |
67
+ | uv | `uv audit` | HIGH/CRITICAL |
68
+ | Rust | `cargo audit` | HIGH/CRITICAL |
69
+ | Go | `govulncheck ./...` | Any direct |
70
+ | Maven | `mvn dependency-check:check -DfailBuildOnCVSS=7` | CVSS ≥ 7 |
71
+ | Ruby | `bundle audit` | HIGH/CRITICAL |
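The "After adding: add a row" rule can be enforced in the same pre-commit hook as the audit command. The sketch below is a minimal check. It assumes the registry is the Markdown table described above, with the package name in the first column. `registeredPackages` and `unregisteredDependencies` are hypothetical helpers and are not part of forgecraft-mcp. This check complements the ecosystem audit command and does not replace it.

```typescript
/** Minimal subset of package.json that this check reads. */
interface PackageManifest {
  readonly dependencies?: Readonly<Record<string, string>>;
  readonly devDependencies?: Readonly<Record<string, string>>;
}

/** Collects first-column package names from the registry's Markdown table. */
function registeredPackages(registryMarkdown: string): Set<string> {
  const names = new Set<string>();
  for (const line of registryMarkdown.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("|")) continue; // not a table row
    const firstCell = trimmed.split("|")[1]?.trim() ?? "";
    const isSeparator = /^-+$/.test(firstCell.replace(/:/g, "")); // |---| or |:---|
    if (firstCell === "" || firstCell === "Package" || isSeparator) continue; // header rows
    names.add(firstCell.replace(/`/g, "")); // tolerate `backticked` names
  }
  return names;
}

/** Returns declared dependencies that have no row in the registry, sorted by name. */
function unregisteredDependencies(manifest: PackageManifest, registryMarkdown: string): string[] {
  const registered = registeredPackages(registryMarkdown);
  const declared = [
    ...Object.keys(manifest.dependencies ?? {}),
    ...Object.keys(manifest.devDependencies ?? {}),
  ];
  return declared.filter((name) => !registered.has(name)).sort();
}
```

A hook could read `package.json` and `docs/approved-packages.md` with `fs.readFileSync`, call `unregisteredDependencies`, and exit non-zero when the result is non-empty. The hook would run this alongside the ecosystem's audit command from the table.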
131
72
 
132
73
  - id: language-stack-constraints
133
74
  tier: core
134
75
  title: "Language Stack Constraints"
135
76
  content: |
136
- ## Language Stack Constraints — Seed Defaults
137
-
138
- These are **starting defaults for {{language}} projects** — use them to populate the
139
- initial rows of `docs/approved-packages.md` in P1. They are not a permanent approved
140
- list: the AI maintains the registry from here forward, keeps versions current, and
141
- replaces any entry that develops a known CVE. The Dependency Registry block above
142
- governs the process.
143
-
144
- Before adding any dependency not listed here, apply the audit-before-add process.
77
+ ## Language Stack Constraints — Seed Defaults for {{language}}
78
+ Seed rows for `docs/approved-packages.md` in P1; apply audit-before-add for anything not listed.
145
79
 
146
80
  {{#if language_is_typescript}}
147
- ### TypeScript / Node.js — Approved Toolchain
148
-
149
- **Runtime & compiler**
150
- - Node.js: `^20 LTS` minimum. NOT `^16` or `^18` (EOL or near-EOL).
151
- - TypeScript: `^5.4` minimum. `tsconfig.json` must include `"strict": true` AND
152
- `"noUncheckedIndexedAccess": true`. The second flag is required to narrow
153
- `process.env.*` from `string | undefined` at compile time.
154
-
155
- **Linting**
156
- - `eslint@^9` + `@typescript-eslint/eslint-plugin@^8` + `@typescript-eslint/parser@^8`
157
- - NOT `@typescript-eslint@^5` or `^6` — old `minimatch` transitive dep has known CVEs.
158
- - NOT `tslint` — deprecated.
159
-
160
- **Test runner**
161
- - `vitest@^2` (preferred — native ESM, fast, Jest-compatible API) or `jest@^29`.
162
- - NOT `mocha` + `chai` for new projects (weaker TypeScript support).
163
- - NOT `jasmine` (no active maintenance for Node.js use).
164
-
165
- **Formatting**
166
- - `prettier@^3` — configured via `.prettierrc`, integrated with ESLint via
167
- `eslint-config-prettier`. NOT separate manual formatting.
81
+ ### TypeScript / Node.js
82
+ - Node.js `^20 LTS` min. NOT `^16`/`^18` (EOL).
83
+ - TypeScript `^5.4` min. `tsconfig.json`: `"strict": true` AND `"noUncheckedIndexedAccess": true`.
84
+ - `eslint@^9` + `@typescript-eslint/*@^8`. NOT `^5`/`^6` (minimatch CVE). NOT `tslint` (deprecated).
85
+ - `vitest@^2` or `jest@^29`. NOT `mocha`+`chai` (weak TS). NOT `jasmine` (unmaintained).
86
+ - `prettier@^3` via `.prettierrc` + `eslint-config-prettier`.
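The `"noUncheckedIndexedAccess": true` flag changes the type of an indexing expression. Both `xs[i]` on an array and `obj[key]` through an index signature are typed `T | undefined`, so the compiler forces a check before the value is used. The example below is a small illustration. `median` is a hypothetical helper and is not part of the template.

```typescript
/** Median of a list, or undefined for empty input. */
function median(values: readonly number[]): number | undefined {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  // Under the flag this is `number | undefined`; without the check below the
  // compiler rejects using it as a number.
  const upper = sorted[mid];
  if (upper === undefined) return undefined; // empty input
  if (sorted.length % 2 === 1) return upper;
  // Even length >= 2 always has an element at mid - 1, but the type still says
  // it might be missing, so the check stays.
  const lower = sorted[mid - 1];
  return lower === undefined ? undefined : (lower + upper) / 2;
}
```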
168
87
  {{/if}}
169
88
 
170
89
  {{#if language_is_python}}
171
- ### Python — Approved Toolchain
172
-
173
- **Runtime**
174
- - Python `^3.11` minimum. NOT `3.8` or `3.9` for new projects `3.11+` brings
175
- `tomllib` builtin, `ExceptionGroup`, and significant performance improvements.
176
-
177
- **Linting / formatting**
178
- - `ruff@^0.4` — replaces `flake8` + `isort` + `black` with a single 10–100× faster tool.
179
- - NOT separate `flake8` + `isort` + `black` for new projects.
180
-
181
- **Type checking**
182
- - `pyright@^1.1` (strict mode) — same engine as Pylance, best TypedDict support.
183
- - `mypy@^1.9` is acceptable. Strict mode required in both cases.
184
- - NOT unchecked Python — all public functions must be typed.
185
-
186
- **Test runner**
187
- - `pytest@^8` — NOT `unittest` for new projects.
188
- - Async tests: `pytest-asyncio@^0.23`.
189
-
190
- **Dependency management**
191
- - `uv@^0.1` (recommended — fastest resolver) or `poetry@^1.7`.
192
- - ALL dependencies pinned in lockfile (`uv.lock` or `poetry.lock`). Lockfile committed.
193
- - `pip-tools` acceptable for library projects.
90
+ ### Python
91
+ - Python `^3.11` min. NOT `3.8`/`3.9` (no tomllib/ExceptionGroup).
92
+ - `ruff@^0.4`. NOT separate `flake8`+`isort`+`black`.
93
+ - `pyright@^1.1` strict or `mypy@^1.9` strict. All public functions typed.
94
+ - `pytest@^8` (NOT `unittest`); async via `pytest-asyncio@^0.23`.
95
+ - `uv@^0.1` or `poetry@^1.7`. Deps pinned in committed `uv.lock`/`poetry.lock`.
194
96
  {{/if}}
195
97
 
196
98
  - id: production-code-standards
@@ -198,322 +100,114 @@ blocks:
198
100
  title: "Production Code Standards"
199
101
  content: |
200
102
  ## Production Code Standards — NON-NEGOTIABLE
201
-
202
- These apply to ALL code including prototypes. "It's just a prototype" is never a valid
203
- exception. Prototypes become production code within days at CC development speed.
204
-
205
- ### SOLID Principles
206
- - **Single Responsibility**: One module = one reason to change. Use "and" to describe it? Split it.
207
- - **Open/Closed**: Extend via interfaces and composition. Never modify working code for new behavior.
208
- - **Liskov Substitution**: Any interface implementation must be fully swappable. No isinstance checks.
209
- - **Interface Segregation**: Small focused interfaces. No god-interfaces.
210
- - **Dependency Inversion**: Depend on abstractions. Concrete classes are injected, never instantiated
211
- inside business logic. **In practice**: define `IUserRepository`, `IOrderRepository`,
212
- `IEmailSender` etc. as interfaces in the domain/service layer first. Services depend on
213
- the interface. The Prisma/SQL/HTTP concrete implementation lives in the adapter layer and
214
- is injected at the composition root. Emit these interfaces in P1 alongside the schema —
215
- a service that imports a concrete class cannot be unit-tested, cannot be swapped, and
216
- is not Composable.
217
-
218
- ### Zero Hardcoded Values
219
- - ALL configuration through environment variables or config files. No exceptions.
220
- - ALL external URLs, ports, credentials, thresholds, feature flags must be configurable.
221
- - ALL magic numbers must be named constants with documentation.
222
- - Config is validated at startup — fail fast if required values are missing.
223
-
224
- ### Zero Mocks in Application Code
225
- - No mock objects, fake data, or stub responses in source code. Ever.
226
- - Mocks belong ONLY in test files.
227
- - For local dev: create proper interface implementations selected via config.
228
- - No `if DEBUG: return fake_data` patterns. Use dependency injection to swap implementations.
229
- - No TODO/FIXME stubs returning hardcoded values. Use NotImplementedError with a description.
230
-
231
- ### Interfaces First
232
- Before writing any implementation:
233
- 1. Define the interface/protocol/abstract class
234
- 2. Define the data contracts (input/output DTOs)
235
- 3. Write the consuming code against the interface
236
- 4. Write tests against the interface
237
- 5. THEN implement the concrete class
238
-
239
- ### Dependency Injection
240
- - Every service receives dependencies through its constructor.
241
- - A composition root (main.py / app.ts / container) wires everything.
242
- - No service locator pattern. No global singletons. No module-level instances.
243
-
244
- ### Error Handling
245
- - Custom exception hierarchy per module. No bare Exception raises.
246
- - Errors carry context: IDs, timestamps, operation names.
247
- - Fail fast, fail loud. No silent swallowing of exceptions.
248
- - Domain code never returns HTTP status codes — that's the API layer's job.
249
-
250
- ### Modular from Day One
251
- - Feature-based modules over layer-based. Each feature owns its models, service, repository, routes.
252
- - Module dependency graph must be acyclic.
253
- - Every module has a clear public API via {{#if language_is_typescript}}index.ts{{/if}}{{#if language_is_python}}__init__.py{{/if}} exports.
103
+ Apply to ALL code including prototypes.
104
+
105
+ - **SOLID**: SRP (one reason to change), OCP (extend, don't modify), LSP (swappable, no isinstance), ISP (small interfaces), DIP (depend on abstractions; inject concretes at composition root).
106
+ - Define port interfaces (`IUserRepository`, `IEmailSender`) in the domain/service layer in P1; concrete impls live in adapters, injected at the root.
107
+ - Zero hardcoded values: all config via env/config files, validated at startup (fail fast). Magic numbers → named constants.
108
+ - Zero mocks in source: mocks only in test files. No `if DEBUG: return fake_data`. Stubs use NotImplementedError, not hardcoded returns.
109
+ - Interfaces first: interface → DTOs → consuming code → tests → concrete impl.
110
+ - Type-driven design: make illegal states unrepresentable. {{#if language_is_typescript}}Discriminated unions for state, `Result<T,E>` for fallible ops, branded types for validated values; parse external input into typed objects at the boundary (Zod) — never pass raw input inward.{{/if}}{{#if language_is_python}}Tagged unions (`Literal` + dataclass), `NewType` for validated values, frozen dataclasses; parse external input at the boundary (Pydantic) — never pass raw dicts inward.{{/if}}
111
+ - DI via constructor; composition root wires everything. No service locator, global singletons, module-level instances.
112
+ - Error handling: custom exception hierarchy per module, errors carry context (IDs, timestamps, op name), fail loud. Domain never returns HTTP status codes.
113
+ - Feature-based modules (own models/service/repo/routes), acyclic graph, public API via {{#if language_is_typescript}}index.ts{{/if}}{{#if language_is_python}}__init__.py{{/if}}.
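The TypeScript branch of the type-driven design rule can be sketched without any libraries. The sketch shows a hand-rolled `Result<T, E>`, a branded `Email` that can only be produced by a boundary parser, and a discriminated union for order state. The template names Zod for boundary parsing; this sketch substitutes a plain function so that it stays self-contained. All names are hypothetical, and the email regex is a deliberately loose shape check rather than full address validation.

```typescript
// Discriminated union for fallible operations.
type Result<T, E> =
  | { readonly ok: true; readonly value: T }
  | { readonly ok: false; readonly error: E };

// Branded type: a plain string is not assignable to Email.
type Email = string & { readonly __brand: "Email" };

type EmailError =
  | { readonly kind: "empty" }
  | { readonly kind: "malformed"; readonly input: string };

// Loose shape check for illustration only.
const EMAIL_SHAPE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

/** Boundary parser: the only way to obtain an Email. */
function parseEmail(raw: string): Result<Email, EmailError> {
  const trimmed = raw.trim();
  if (trimmed === "") return { ok: false, error: { kind: "empty" } };
  if (!EMAIL_SHAPE.test(trimmed)) return { ok: false, error: { kind: "malformed", input: trimmed } };
  return { ok: true, value: trimmed as Email };
}

// Inner code accepts only the branded type, so raw input cannot reach it.
function welcomeSubject(to: Email): string {
  return `Welcome, ${to}`;
}

// Illegal states are unrepresentable: only a shipped order has a tracking number.
type OrderState =
  | { readonly status: "pending" }
  | { readonly status: "shipped"; readonly trackingNumber: string }
  | { readonly status: "cancelled"; readonly reason: string };

function describeOrder(state: OrderState): string {
  switch (state.status) {
    case "pending":
      return "Awaiting shipment";
    case "shipped":
      return `Shipped (${state.trackingNumber})`;
    case "cancelled":
      return `Cancelled: ${state.reason}`;
    default: {
      // Compile-time exhaustiveness check: adding a new status breaks the build here.
      const unreachable: never = state;
      return unreachable;
    }
  }
}
```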
254
114
 
255
115
  - id: layered-architecture
256
116
  tier: recommended
257
117
  title: "Layered Architecture"
258
118
  content: |
259
119
  ## Layered Architecture (Ports & Adapters / Hexagonal)
120
+ Layers (outer→inner): API/CLI/Handlers (thin, validate+delegate) → Services (orchestrate, depend on ports only) → Domain models (pure, no I/O, no framework) → Port interfaces → Repositories/Adapters (external I/O) → Infrastructure/Config (DI, env).
260
121
 
261
- ```
262
- ┌─────────────────────────────┐
263
- │ API / CLI / Event Handlers │ ← Thin. Validation + delegation only. No logic.
264
- ├─────────────────────────────┤ These are DRIVING ADAPTERS (primary).
265
- │ Services (Business Logic) │ ← Orchestration. Depends on PORT INTERFACES only.
266
- ├─────────────────────────────┤
267
- │ Domain Models │ ← Pure data + behavior. No I/O. No framework imports.
268
- │ (Entities, Value Objects) │ The inner hexagon. Zero external dependencies.
269
- ├─────────────────────────────┤
270
- │ Port Interfaces │ ← Abstract contracts (Repository, Gateway, Notifier).
271
- │ │ Defined by the domain, implemented by adapters.
272
- ├─────────────────────────────┤
273
- │ Repositories / Adapters │ ← DRIVEN ADAPTERS (secondary). All external I/O
274
- │ │ (DB, APIs, files, queues, email, caches).
275
- ├─────────────────────────────┤
276
- │ Infrastructure / Config │ ← DI container, env config, connection factories
277
- └─────────────────────────────┘
278
- ```
122
+ ### Ports & Adapters
123
+ - Ports (`UserRepository`, `PaymentGateway`, `EmailSender`) defined in domain/service layer, never in adapters. Specify WHAT, not HOW.
124
+ - Driving adapters (HTTP/CLI/consumers) call through ports; driven adapters (PostgresUserRepository, StripePaymentGateway) are called through ports.
125
+ - Adapters interchangeable: swap Postgres for InMemory in tests with zero logic changes.
279
126
 
280
- ### Ports (Interfaces owned by the domain)
281
- - **Repository ports**: `UserRepository`, `OrderRepository` (data persistence contracts).
282
- - **Gateway ports**: `PaymentGateway`, `EmailSender` (external service contracts).
283
- - Ports are defined in the domain/service layer, never in the adapter layer.
284
- - Port interfaces specify WHAT, never HOW.
285
-
286
- ### Adapters (Implementations of ports)
287
- - **Driving adapters** (primary): HTTP controllers, CLI handlers, message consumers
288
- — they CALL the application through port interfaces.
289
- - **Driven adapters** (secondary): PostgresUserRepository, StripePaymentGateway,
290
- SESEmailSender — they ARE CALLED BY the application through port interfaces.
291
- - Adapters are interchangeable. Swap `PostgresUserRepository` for `InMemoryUserRepository`
292
- in tests without changing a single line of business logic.
293
-
294
- ### Data Transfer Objects (DTOs)
295
- - Use DTOs at layer boundaries — never pass domain entities to/from the API layer.
296
- - **Request DTOs**: validated at the API boundary ({{#if language_is_typescript}}Zod schema{{/if}}{{#if language_is_python}}Pydantic model{{/if}} → typed object).
297
- - **Response DTOs**: shaped for the consumer, not mirroring the domain model.
298
- - **Domain ↔ Persistence mapping**: repositories map between domain entities and DB rows/documents.
299
- - DTOs are plain data objects — no methods, no behavior, no framework decorators.
127
+ ### DTOs
128
+ - DTOs at every layer boundary; never pass domain entities to/from the API layer.
129
+ - Request DTOs validated at boundary ({{#if language_is_typescript}}Zod{{/if}}{{#if language_is_python}}Pydantic{{/if}}); Response DTOs shaped for consumer; repos map domain ↔ persistence.
130
+ - DTOs are plain data: no methods, no framework decorators.
300
131
 
301
132
  ### Layer Rules
302
- - Never skip layers. API handlers do not call repositories directly.
303
- - Dependencies point INWARD only. Inner layers never import from outer layers.
304
- - Domain models have ZERO external dependencies.
305
- - The domain layer does not know HTTP, SQL, or any framework exists.
133
+ - Never skip layers (handlers never call repos directly). Dependencies point INWARD only.
134
+ - Domain models have ZERO external dependencies; the domain does not know HTTP/SQL/frameworks exist.
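The ports-and-adapters rules above fit into a few dozen lines. In this sketch, the port is owned by the service layer. The service receives the port through its constructor and returns a response DTO. An in-memory driven adapter is wired at a composition root. All names are hypothetical; a `PostgresUserRepository` would implement the same `UserRepository` port without any change to the service.

```typescript
// Domain entity: pure data, no framework imports.
interface User {
  readonly id: string;
  readonly email: string;
  readonly displayName: string;
}

// Port owned by the domain/service layer: it states WHAT, not HOW.
interface UserRepository {
  findById(id: string): Promise<User | undefined>;
  save(user: User): Promise<void>;
}

// Response DTO shaped for the consumer, not a mirror of the entity.
interface UserProfileDto {
  readonly id: string;
  readonly name: string;
}

// Domain error with context; the API layer maps it to an HTTP status.
class UserNotFoundError extends Error {
  readonly userId: string;
  constructor(userId: string) {
    super(`User ${userId} not found`);
    this.name = "UserNotFoundError";
    this.userId = userId;
  }
}

// Service depends only on the port, injected through the constructor.
class UserProfileService {
  private readonly users: UserRepository;
  constructor(users: UserRepository) {
    this.users = users;
  }

  async getProfile(id: string): Promise<UserProfileDto> {
    const user = await this.users.findById(id);
    if (user === undefined) throw new UserNotFoundError(id);
    return { id: user.id, name: user.displayName }; // entity → DTO mapping
  }
}

// Driven adapter for tests and local dev.
class InMemoryUserRepository implements UserRepository {
  private readonly rows = new Map<string, User>();

  findById(id: string): Promise<User | undefined> {
    return Promise.resolve(this.rows.get(id));
  }

  save(user: User): Promise<void> {
    this.rows.set(user.id, user);
    return Promise.resolve();
  }
}

// Composition root: the one place that chooses concrete adapters.
function composeRoot(): { readonly users: UserRepository; readonly profiles: UserProfileService } {
  const users = new InMemoryUserRepository();
  return { users, profiles: new UserProfileService(users) };
}
```

Because `UserProfileService` never imports an adapter, swapping `InMemoryUserRepository` for a database-backed class touches only `composeRoot`.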
306
135
 
307
136
  - id: clean-code-principles
308
137
  tier: recommended
309
138
  title: "Clean Code Principles"
310
139
  content: |
311
140
  ## Clean Code Principles
141
+ - **CQS**: commands change state and return void; queries return data with no side effects. A function does one, not both.
142
+ - **Guard clauses**: handle invalid cases first, return early; happy path at shallowest indent.
143
+ - **Composition over inheritance**: compose via interfaces/delegation; inheritance only for genuine "is-a".
144
+ - **Law of Demeter**: no chaining through objects (`order.getCustomer().getAddress()` — bad); add `order.getShippingCity()`.
145
+ - **Immutability by default**: {{#if language_is_typescript}}`const`/`readonly`, `ReadonlyArray<T>`{{/if}}{{#if language_is_python}}`Final`, `frozen=True` dataclasses, `tuple` over `list`{{/if}}; copy-on-modify; restrict mutable state to smallest scope.
146
+ - **Pure functions**: domain logic/validation/calculation pure; push I/O to adapters.
147
+ - **Factory pattern**: encapsulate construction (`User.create(dto)`); the DI container is the top-level factory.
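Several of these principles can be shown in one small class. The sketch includes guard clauses before the happy path and a void command alongside a side-effect-free query. It also shows a Law of Demeter accessor, copy-on-modify over a `ReadonlyArray`, and a pure calculation function. The `Order`, `LineItem` and `sumLineItems` names are hypothetical, and storing money as integer cents is an assumption made for illustration.

```typescript
interface Address {
  readonly street: string;
  readonly city: string;
}

interface Customer {
  readonly name: string;
  readonly shippingAddress: Address;
}

interface LineItem {
  readonly sku: string;
  readonly unitPriceCents: number;
  readonly quantity: number;
}

class Order {
  private readonly customer: Customer;
  private items: ReadonlyArray<LineItem> = [];

  constructor(customer: Customer) {
    this.customer = customer;
  }

  // Law of Demeter: callers ask the order instead of chaining through customer.address.
  getShippingCity(): string {
    return this.customer.shippingAddress.city;
  }

  // Command: changes state, returns void. Guard clauses reject bad input first.
  addItem(item: LineItem): void {
    if (!Number.isInteger(item.quantity) || item.quantity <= 0) {
      throw new RangeError(`Invalid quantity ${item.quantity} for ${item.sku}`);
    }
    if (!Number.isInteger(item.unitPriceCents) || item.unitPriceCents < 0) {
      throw new RangeError(`Invalid price ${item.unitPriceCents} for ${item.sku}`);
    }
    // Happy path at the shallowest indent; copy-on-modify instead of push().
    this.items = [...this.items, item];
  }

  // Query: returns data, no side effects.
  totalCents(): number {
    return sumLineItems(this.items);
  }
}

// Pure function: same input, same output, trivially testable without mocks.
function sumLineItems(items: ReadonlyArray<LineItem>): number {
  return items.reduce((sum, item) => sum + item.unitPriceCents * item.quantity, 0);
}
```

A rejected command throws before any state changes, so a failed `addItem` leaves the order untouched.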
312
148
 
313
- ### Command-Query Separation (CQS)
314
- - **Commands** change state but return nothing (void).
315
- - **Queries** return data but change nothing (no side effects).
316
- - A function should do one or the other, never both.
317
- - Exception: stack.pop() style operations where separation is impractical — document why.
318
-
319
- ### Guard Clauses & Early Return
320
- - Eliminate deep nesting. Handle invalid cases first, return early.
321
- - The happy path runs at the shallowest indentation level.
322
- - Before:
323
- ```
324
- if (user) {
325
- if (user.isActive) {
326
- if (user.hasPermission) {
327
- // actual logic buried 3 levels deep
328
- ```
329
- - After:
330
- ```
331
- if (!user) throw new NotFoundError(...);
332
- if (!user.isActive) throw new InactiveError(...);
333
- if (!user.hasPermission) throw new ForbiddenError(...);
334
- // actual logic at top level
335
- ```
336
-
337
- ### Composition over Inheritance
338
- - Prefer composing objects via interfaces and delegation over class inheritance.
339
- - Inheritance creates tight coupling and fragile hierarchies.
340
- - Use inheritance ONLY for genuine "is-a" relationships (rare).
341
- - When in doubt, compose: inject a collaborator, don't extend a base class.
342
-
343
- ### Law of Demeter (Principle of Least Knowledge)
344
- - A method should only call methods on: its own object, its parameters, objects it creates,
345
- its direct dependencies.
346
- - Do NOT chain through objects: `order.getCustomer().getAddress().getCity()` — BAD.
347
- - Instead: `order.getShippingCity()` or pass the needed data directly.
348
-
349
- ### Immutability by Default
350
- {{#if language_is_typescript}}- Use `const` over `let`. Use `readonly` on properties and parameters.
351
- - Prefer `ReadonlyArray<T>`, `Readonly<T>`, `ReadonlyMap`, `ReadonlySet`.{{/if}}{{#if language_is_python}}- Use `Final` for constants. Use `frozen=True` on dataclasses.
352
- - Prefer `tuple` over `list` for immutable sequences. Use `MappingProxyType` for immutable dicts.{{/if}}
353
- - When you need to "modify" data, create a new copy with the change.
354
- - Mutable state is the #1 source of bugs. Restrict it to the smallest possible scope.
355
-
356
- ### Pure Functions
357
- - A pure function: same inputs → same outputs, no side effects.
358
- - Domain logic, validation, transformation, and calculation should be pure.
359
- - Side effects (I/O, logging, database) are pushed to the edges (adapters).
360
- - Pure functions are trivially testable — no mocks needed.
361
-
362
- ### Factory Pattern
363
- - Use factories to encapsulate complex object construction.
364
- - Factory methods on the class itself for simple cases: `User.create(dto)`.
365
- - Factory classes/functions when construction involves dependencies or conditional logic.
366
- - Factories are the natural companion to dependency injection — the DI container
367
- IS the top-level factory.
368
-
369
- > **Design reference patterns** (DDD, CQRS, GoF) available on demand via `get_design_reference` tool.
149
+ > Design reference patterns (DDD, CQRS, GoF) on demand via `get_design_reference` tool.
370
150
 
371
151
  - id: twelve-factor-ops
372
152
  tier: optional
373
153
  title: "12-Factor & Operational Readiness"
374
154
  content: |
375
- ## 12-Factor App & Operational Readiness
376
-
377
- ### Configuration
378
- - ALL config comes from environment variables or external config services. Zero config in code.
379
- - Config is validated at startup: fail fast with a clear error if required values are missing.
380
- - `.env.example` committed with every variable documented. `.env` is gitignored.
381
-
382
- ### Stateless Processes
383
- - Application processes are stateless. Session data lives in external stores (Redis, DB).
384
- - Any process can be killed and restarted without data loss.
385
- - File uploads go to object storage (S3, GCS), not local disk.
386
-
387
- ### Port Binding
388
- - The application is self-contained and exports services via port binding.
389
- - No runtime injection of a web server — the app embeds its own (Express, Uvicorn, etc.).
390
-
391
- ### Disposability
392
- - Processes start fast (< 5 seconds) and shut down gracefully.
393
- - SIGTERM triggers: stop accepting new work → finish in-flight requests → close connections → exit.
394
- - Workers use robust job queues so interrupted work is retried, not lost.
395
-
396
- ### Dev/Prod Parity
397
- - Minimize gaps between development and production environments.
398
- - Use the same backing services in dev as prod (same DB engine, same cache).
399
- - Docker / containers recommended for environment parity.
400
-
401
- ### Logs as Event Streams
402
- - The app writes logs to stdout/stderr — never to local files.
403
- - Log aggregation is an ops concern (ELK, Datadog, CloudWatch), not an application concern.
404
- - Structured JSON logs with correlation IDs for tracing across services.
405
-
406
- ### Build, Release, Run
407
- - Strict separation: build (compile + assets), release (build + config), run (execute).
408
- - Every release is immutable and tagged. Rollback = deploy a previous release.
409
- - CI/CD pipeline automates: lint → test → build → deploy with gates at each stage.
155
+ ## 12-Factor & Operational Readiness
156
+ - **Config**: all from env/config services, validated at startup (fail fast). Commit `.env.example`, gitignore `.env`.
157
+ - **Stateless processes**: session data in external stores (Redis/DB); uploads to object storage (S3/GCS), not local disk.
158
+ - **Port binding**: app embeds its own server (Express, Uvicorn); no runtime web-server injection.
159
+ - **Disposability**: start < 5s; SIGTERM → stop intake → drain → close → exit; workers use durable queues.
160
+ - **Dev/prod parity**: same backing services (DB engine, cache); containers recommended.
161
+ - **Logs as streams**: write to stdout/stderr (never files); structured JSON + correlation IDs; aggregation is an ops concern.
162
+ - **Build/release/run**: strict separation; immutable tagged releases; rollback = redeploy prior release.
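The "validated at startup (fail fast)" rule works best if every variable is checked up front and all problems are reported together. The result is a frozen, typed object that is handed to the composition root. The sketch below assumes three hypothetical variables (`PORT`, `DATABASE_URL`, `LOG_LEVEL`), none of which the template prescribes. It deliberately has no in-code defaults, in keeping with the zero-hardcoded-values rule.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";

interface AppConfig {
  readonly port: number;
  readonly databaseUrl: string;
  readonly logLevel: LogLevel;
}

type Env = Readonly<Record<string, string | undefined>>;

const LOG_LEVELS: readonly LogLevel[] = ["debug", "info", "warn", "error"];

class ConfigError extends Error {
  readonly problems: readonly string[];
  constructor(problems: readonly string[]) {
    super(`Invalid configuration:\n- ${problems.join("\n- ")}`);
    this.name = "ConfigError";
    this.problems = problems;
  }
}

function requireVar(env: Env, name: string, problems: string[]): string | undefined {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    problems.push(`${name} is required`);
    return undefined;
  }
  return value;
}

/** Validates every variable, reports all problems at once, and fails fast. */
function loadConfig(env: Env): AppConfig {
  const problems: string[] = [];

  const portRaw = requireVar(env, "PORT", problems);
  const port = portRaw === undefined ? undefined : Number(portRaw);
  if (port !== undefined && (!Number.isInteger(port) || port < 1 || port > 65535)) {
    problems.push(`PORT must be an integer in 1..65535, got "${portRaw}"`);
  }

  const databaseUrl = requireVar(env, "DATABASE_URL", problems);

  const logLevelRaw = requireVar(env, "LOG_LEVEL", problems);
  const logLevel = LOG_LEVELS.find((level) => level === logLevelRaw);
  if (logLevelRaw !== undefined && logLevel === undefined) {
    problems.push(`LOG_LEVEL must be one of ${LOG_LEVELS.join(", ")}, got "${logLevelRaw}"`);
  }

  if (problems.length > 0 || port === undefined || databaseUrl === undefined || logLevel === undefined) {
    throw new ConfigError(problems);
  }
  return Object.freeze({ port, databaseUrl, logLevel });
}
```

At startup, `loadConfig(process.env)` would run once, before any server or worker starts. The same variable names would then be documented in the committed `.env.example`.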
410
163
 
411
164
  - id: cicd-deployment
412
165
  tier: recommended
413
166
  title: "CI/CD & Deployment"
414
167
  content: |
415
168
  ## CI/CD & Deployment
416
-
417
- ### Pipeline
418
- - Every push triggers: lint → type-check → unit tests → build → integration tests.
419
- - Merges to main additionally run: security scan → deploy to staging → smoke tests → promote.
420
- - Pipeline must complete in under 10 minutes. Parallelize test suites, cache dependencies.
421
- - Failed pipelines block merge. No exceptions.
422
-
423
- ### Environments
424
- - Minimum three environments: **development** (local), **staging** (mirrors prod), **production**.
425
- - Environment config is injected — same artifact runs everywhere with different env vars.
426
- - Staging is a faithful replica of production (same provider, same DB engine, same services).
427
-
428
- ### Deployment Strategy
429
- - Default: **rolling deployment** with health checks (zero downtime).
430
- - For critical services: **blue-green** or **canary** with automated rollback on error rate spike.
431
- - Every deploy is tagged with git SHA. Rollback = redeploy a previous SHA.
432
- - Deployment must be one command or one button. No multi-step manual runbooks.
433
-
434
- ### Preview Environments
435
- - Pull requests get ephemeral preview deployments where feasible (Vercel, Netlify, Railway).
436
- - Preview URLs in PR comments for stakeholder review before merge.
169
+ - Pipeline on push: lint → type-check → unit → build → integration. On main: + security scan → staging → smoke → promote. < 10 min (parallelize, cache). Failed pipeline blocks merge.
170
+ - Three environments min: development (local), staging (faithful prod replica), production. Same artifact, injected config.
171
+ - Default rolling deploy with health checks; blue-green/canary for critical services with auto-rollback on error spike.
172
+ - Tag every deploy with git SHA; rollback = redeploy prior SHA. One command/button; no manual runbooks.
173
+ - PRs get ephemeral preview deploys where feasible (Vercel, Netlify, Railway); preview URL in PR comment.
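The canary rule "auto-rollback on error spike" can be sketched as a pure decision function. This sketch assumes error rates are sampled as `errors / requests` over the same window for baseline and canary. The minimum-traffic and margin values are hypothetical policy parameters, not defaults of any particular deploy tool.

```typescript
interface TrafficSample {
  requests: number;
  errors: number;
}

interface CanaryPolicy {
  minRequests: number; // below this, the canary has not seen enough traffic to judge
  maxErrorRateIncrease: number; // absolute margin, e.g. 0.01 = one percentage point
}

type CanaryVerdict = "wait" | "promote" | "rollback";

const errorRate = (s: TrafficSample): number => (s.requests === 0 ? 0 : s.errors / s.requests);

// Roll back when the canary's error rate exceeds the baseline's by more than the margin.
function judgeCanary(baseline: TrafficSample, canary: TrafficSample, policy: CanaryPolicy): CanaryVerdict {
  if (canary.requests < policy.minRequests) return "wait";
  const increase = errorRate(canary) - errorRate(baseline);
  return increase > policy.maxErrorRateIncrease ? "rollback" : "promote";
}
```

A deploy controller would evaluate this on an interval. On `"rollback"`, it would redeploy the previous SHA. The "wait" state keeps a handful of early errors on low traffic from triggering a rollback.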
437
174
 
438
175
  - id: testing-pyramid
439
176
  tier: core
440
177
  title: "Testing Pyramid"
441
178
  content: |
442
179
  ## Testing Pyramid
443
-
444
- ```
445
- / E2E \ ← 5-10% of tests. Core journeys only.
446
- / Integration \ ← 20-30%. Real dependencies at boundaries.
447
- / Unit Tests \ ← 60-75%. Fast, isolated, every public function.
448
- ```
180
+ Unit 60-75% (fast, isolated, every public fn) · Integration 20-30% (real deps at boundaries) · E2E 5-10% (core journeys only).
449
181
 
450
182
  ### Coverage Targets
451
- - Overall minimum: {{coverage_minimum | default: 80}}% line coverage (blocks commit)
452
- - New/changed code: {{coverage_new_code_min | default: 90}}% minimum (measured on diff)
453
- - Critical paths: 95%+ (data pipelines, auth, PHI handling, financial calculations)
454
- - Mutation score (MSI) — overall: ≥ 65% (blocks PR merge)
455
- - Mutation score (MSI) — new/changed code: ≥ 70% (measured on diff)
456
- - Note: Line coverage and mutation score are both required. 80% line coverage can coexist
457
- with 58% MSI when tests execute code without asserting its behavior (confirmed in Shattered
458
- Stars). Run stryker-mutator immediately after writing each test batch, not only pre-release.
459
- Tooling: stryker-mutator (JS/TS), mutmut (Python), Pitest (Java).
183
+ - Overall: {{coverage_minimum | default: 80}}% line (blocks commit). New/changed: {{coverage_new_code_min | default: 90}}% on diff. Critical paths (auth, data pipelines, financial): 95%+.
184
+ - Mutation score (MSI): overall ≥ 65% (blocks PR merge), new/changed ≥ 70% on diff.
185
+ - Line coverage AND MSI both required (80% line can coexist with 58% MSI). Run mutation after each test batch. Tooling: stryker-mutator (JS/TS), mutmut (Python), Pitest (Java).
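The toy mutation run below shows how line coverage and MSI diverge. It is written in plain TypeScript, without Stryker. Each hand-written mutant stands in for one operator change that a real tool would generate. The score is simplified to killed mutants over total mutants. `isAdult` and both suites are hypothetical. Both suites execute every line of `isAdult`, but only one of them asserts anything.

```typescript
type Impl = (age: number) => boolean;
// A suite runs its tests against an implementation and returns true if all pass.
type Suite = (impl: Impl) => boolean;

const isAdult: Impl = (age) => age >= 18;

// Hand-written mutants, one operator change each.
const mutants: Impl[] = [
  (age) => age > 18, // boundary: >= becomes >
  (age) => age < 18, // relational inversion
  () => true, // body replaced with a constant
];

// Executes the code (so it counts toward line coverage) but asserts nothing.
const weakSuite: Suite = (impl) => {
  impl(30);
  return true;
};

// Asserts at the boundary and on both sides of it.
const strongSuite: Suite = (impl) => impl(18) === true && impl(17) === false && impl(30) === true;

// A mutant is "killed" when the suite fails against it.
function mutationScore(suite: Suite): number {
  if (!suite(isAdult)) throw new Error("suite must pass on the original implementation");
  const killed = mutants.filter((mutant) => !suite(mutant)).length;
  return killed / mutants.length;
}

console.log(mutationScore(weakSuite), mutationScore(strongSuite)); // 0 1
```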
460
186
 
461
187
  ### Test Rules
462
- - Every test name is a specification: `test_rejects_duplicate_member_ids` not `test_validation`
463
- - No empty catch blocks. No `assert True`. No tests that can't fail.
464
- - Test files colocated: `[module].test.[ext]` or in `tests/` mirroring src structure.
465
- - Flaky tests are bugs: fix or quarantine, never ignore.
466
- - After writing tests for any module, run Stryker on that module before moving on.
467
- Surviving mutants = missing assertions. Fix before proceeding.
468
-
469
- ### Test Doubles Taxonomy
470
- Use the correct double for the job:
471
- - **Stub**: Returns canned data. No assertions on calls. Use when you need to control input.
472
- - **Spy**: Records calls. Assert after the fact. Use to verify side effects.
473
- - **Fake**: Working implementation with shortcuts (in-memory DB). Use for integration-speed tests.
474
- - **Mock**: Pre-programmed expectations. Assert call patterns. Use sparingly — they couple to implementation.
475
- Prefer stubs and fakes over mocks. Tests that mock everything test nothing.
476
-
477
- ### Test Data Builders
478
- - Use Builder or Factory pattern for test data: `UserBuilder.anAdmin().withName('Alice').build()`.
479
- - One builder per domain entity. Builders provide sensible defaults so tests only specify what matters.
480
- - No raw object literals scattered across tests. Centralize in `tests/fixtures/` or `tests/builders/`.
481
-
482
- ### Property-Based Testing
483
- - For pure functions with wide input ranges, add property tests (fast-check, Hypothesis, QuickCheck).
484
- - Define invariants, not examples: "sorting is idempotent", "encode then decode = identity".
485
- - Property tests complement, not replace, example-based tests.
188
+ - Test name = spec: `test_rejects_duplicate_member_ids` not `test_validation`.
189
+ - No empty catch, no `assert True`, no tests that can't fail.
190
+ - Colocate `[module].test.[ext]` or mirror src in `tests/`. Flaky tests are bugs — fix or quarantine.
191
+ - Run Stryker per module after writing its tests; surviving mutants = missing assertions.
192
+
193
+ ### Test Doubles
194
+ - Stub (canned data), Spy (record+assert calls), Fake (in-memory working impl), Mock (preprogrammed expectations — use sparingly). Prefer stubs/fakes.
195
+ - Test data via Builder/Factory (`UserBuilder.anAdmin().build()`), one per entity, centralized in `tests/builders/`. No scattered literals.
196
+ - Property tests (fast-check, Hypothesis) for pure functions: assert invariants, not examples. Complement example tests.
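This minimal sketch combines a test-data builder with a fake and a spy. `User`, `UserBuilder`, `InMemoryUserRepo`, `makeSpy`, and `registerUser` are hypothetical names that follow the `UserBuilder.anAdmin().build()` shape above.

```typescript
interface User {
  id: string;
  name: string;
  role: "admin" | "member";
}

// Builder: sensible defaults, so each test states only what it cares about.
class UserBuilder {
  private user: User = { id: "u-1", name: "Default User", role: "member" };

  static aMember(): UserBuilder {
    return new UserBuilder();
  }

  static anAdmin(): UserBuilder {
    return new UserBuilder().withRole("admin");
  }

  withName(name: string): this {
    this.user = { ...this.user, name };
    return this;
  }

  withRole(role: User["role"]): this {
    this.user = { ...this.user, role };
    return this;
  }

  build(): User {
    return { ...this.user };
  }
}

interface UserRepo {
  save(user: User): void;
  findById(id: string): User | undefined;
}

// Fake: a working implementation with a shortcut (a Map instead of a database).
class InMemoryUserRepo implements UserRepo {
  private readonly users = new Map<string, User>();

  save(user: User): void {
    this.users.set(user.id, user);
  }

  findById(id: string): User | undefined {
    return this.users.get(id);
  }
}

// Spy: records every call so the test can assert on side effects afterwards.
function makeSpy<T>(): { fn: (arg: T) => void; calls: T[] } {
  const calls: T[] = [];
  return { fn: (arg: T) => { calls.push(arg); }, calls };
}

// Code under test: rejects duplicate ids and notifies only on success.
function registerUser(repo: UserRepo, notify: (user: User) => void, user: User): boolean {
  if (repo.findById(user.id) !== undefined) return false;
  repo.save(user);
  notify(user);
  return true;
}
```

Both factory methods default to id `u-1`. Registering `UserBuilder.aMember().build()` after `UserBuilder.anAdmin().withName("Alice").build()` therefore exercises a duplicate-id case with no extra setup. The spy then confirms that `notify` fired only once.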
486
197
 
487
198
  - id: tdd-methodology
488
199
  tier: core
489
200
  title: "Test-Driven Development"
490
201
  content: |
491
202
  ## Test-Driven Development (TDD)
203
+ - **RED**: write a failing test, run it, confirm it fails (if it passes, the test is wrong).
204
+ - **GREEN**: minimum code to pass. No more.
205
+ - **REFACTOR**: clean up while green; no new behavior.
206
+ Repeat for every feature, function, and bug fix.
492
207
 
493
- ### Red-Green-Refactor: The Only Cycle
494
- 1. **RED**: Write a failing test that describes the desired behavior. Run it. It MUST fail.
495
- If it passes, the test is wrong: it's not testing what you think.
496
- 2. **GREEN**: Write the minimum code to make the test pass. No more.
497
- 3. **REFACTOR**: Clean up while all tests stay green. No new behavior in this step.
498
- Repeat. Every feature, every function, every bug fix follows this cycle.
499
-
500
- ### Tests Are Specifications, Not Confirmations
501
- - Write tests against **expected behavior**, never against current implementation.
502
- - A test that passes on broken code is worse than no test — it provides false confidence.
503
- - Never weaken an assertion to match what the code currently does. If the code disagrees
504
- with the spec, the code is wrong.
505
- - Never write a test suite after the fact that just "locks in" existing behavior without
506
- verifying it's correct.
507
-
508
- ### Bug Fix Protocol
509
- - **Every bug fix starts with a failing test** that reproduces the bug.
510
- - The test must fail before the fix and pass after. No exceptions.
511
- - If you can't write a reproducing test, you don't understand the bug well enough to fix it.
512
-
513
- ### One Behavior Per Test
514
- - Each test verifies exactly one behavior or rule.
515
- - A test with multiple unrelated assertions is testing multiple things — split it.
516
- - Test name = the specification: `rejects_expired_tokens`, not `test_auth`.
208
+ - Tests are specs: write against expected behavior, never current implementation. Never weaken an assertion to match the code. Never write after-the-fact tests that lock in unverified behavior.
209
+ - Bug fix: starts with a failing reproducing test (fails before fix, passes after). Can't reproduce = don't understand it.
210
+ - One behavior per test; test name = spec (`rejects_expired_tokens`, not `test_auth`).
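The following sketch applies the bug-fix protocol to a hypothetical token check. Suppose the reported bug is that a token is still accepted at the exact instant it expires. The reproducing test is named as a spec and targets that boundary. It fails against the buggy comparison and passes after the fix. A second single-behavior test guards the unexpired case. All names and timestamps are illustrative.

```typescript
interface Token {
  expiresAt: number; // epoch milliseconds
}

type Validator = (token: Token, now: number) => boolean;

// Buggy version as reported: >= keeps the token valid at the instant it expires.
const isTokenValidBuggy: Validator = (token, now) => token.expiresAt >= now;
// Fix: a token is valid only strictly before its expiry.
const isTokenValidFixed: Validator = (token, now) => now < token.expiresAt;

// Reproducing test: one behavior, named as the specification.
function rejects_expired_tokens(isValid: Validator): boolean {
  const token: Token = { expiresAt: 1000 };
  return isValid(token, 1000) === false && isValid(token, 1001) === false;
}

// Separate test for a separate behavior, so the fix cannot over-correct unnoticed.
function accepts_unexpired_tokens(isValid: Validator): boolean {
  return isValid({ expiresAt: 1000 }, 999) === true;
}

console.log("before fix:", rejects_expired_tokens(isTokenValidBuggy)); // before fix: false
console.log("after fix:", rejects_expired_tokens(isTokenValidFixed)); // after fix: true
```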
517
211
 
518
212
  - id: tdd-enforcement
519
213
  tier: core
@@ -521,144 +215,47 @@ blocks:
521
215
  content: |
522
216
  ## TDD Enforcement — Forbidden Patterns and Gate Protocol
523
217
 
524
- Instructions describe a process. Gates enforce it. This block defines what is
525
- structurally prohibited, what output is required at each gate, and how the
526
- commit sequence makes the TDD cycle auditable.
527
-
528
- ### Forbidden Patterns (non-negotiable)
529
- The following are architecture violations, not style preferences:
530
- - **NEVER write an implementation file before running and showing a failing test.**
531
- Stating that "the test would fail" is not equivalent to running it. Run it.
532
- - **NEVER write tests after implementation** except for bug fix reproduction tests on
533
- pre-existing code not yet covered. Even then: write the test, show it fails, fix,
534
- show it passes.
535
- - **NEVER weaken an assertion** to make a test pass. If the assertion disagrees with
536
- the output, the implementation is wrong.
537
- - **NEVER skip the refactor phase** because "the code is clean enough." The refactor
538
- phase exists to enforce separation of concerns under green. Skipping it is a
539
- commitment not to separate concerns in that increment.
540
- - **NEVER commit a `feat:` or `fix:` with no corresponding `test:` commit** preceding
541
- it in the same branch. The test commit is the audit trail that the red phase occurred.
542
-
543
- ### The Session Gate Protocol
544
- TDD across a multi-step session requires explicit checkpoints the AI reports and the
545
- human can verify. At each gate, the AI must output the actual test runner output,
546
- not a summary of what it expects.
218
+ ### Forbidden (non-negotiable)
219
+ - NEVER write an implementation file before running and showing a failing test. "Would fail" ≠ ran it.
220
+ - NEVER write tests after implementation (except bug-fix repro on pre-existing code: write, show fail, fix, show pass).
221
+ - NEVER weaken an assertion to pass a test.
222
+ - NEVER skip the refactor phase.
223
+ - NEVER commit `feat:`/`fix:` without a preceding `test:` commit in the same branch.
547
224
 
548
- ```
549
- ┌─────────────────────────────────────────────────────┐
550
- │ PHASE 1: RED │
551
- │ Action: Write test for the specified behavior
552
- │ Gate: Run test — paste full failure output │
553
- │ Block: Cannot proceed until failure is shown │
554
- │ Commit: test(scope): [RED] describe behavior │
555
- └───────────────────┬─────────────────────────────────┘
556
- │ failure confirmed
557
- ┌───────────────────▼─────────────────────────────────┐
558
- │ PHASE 2: GREEN │
559
- │ Action: Write minimum implementation │
560
- │ Gate: Run test — paste full passing output │
561
- │ Block: Cannot proceed until passing is shown │
562
- │ Commit: feat(scope): implement to satisfy test │
563
- └───────────────────┬─────────────────────────────────┘
564
- │ green confirmed
565
- ┌───────────────────▼─────────────────────────────────┐
566
- │ PHASE 3: REFACTOR │
567
- │ Action: Improve structure, not behavior │
568
- │ Gate: Run full suite — paste summary output │
569
- │ Block: Cannot commit if any test regresses │
570
- │ Commit: refactor(scope): clean without behavior │
571
- └─────────────────────────────────────────────────────┘
572
- ```
225
+ ### Gate Protocol — paste actual runner output at each gate, not a summary
226
+ - RED: write test → run, paste full failure → commit `test(scope): [RED] describe behavior`.
227
+ - GREEN: minimum impl → run, paste full pass → commit `feat(scope): implement to satisfy test`.
228
+ - REFACTOR: improve structure → run full suite, paste summary → commit `refactor(scope): clean without behavior`.
573
229
 
574
- ### Commit Sequence as Audit Trail
575
- The git log for any feature must be readable as:
576
- ```
577
- test(cart): [RED] add test for removing last item empties cart
578
- feat(cart): remove last item empties cart
579
- refactor(cart): extract empty-check to CartState predicate
580
- ```
581
- This sequence is auditable. An AI that wrote the `feat:` commit without the preceding
582
- `test:` commit either skipped the red phase entirely or conflated it with implementation.
583
- The commit hook `pre-commit-tdd-check.sh` detects the second pattern before it lands.
584
-
585
- ### Why Instructions Alone Are Not Sufficient
586
- A language model generating in a single context window experiences no time delay between
587
- writing a test and writing an implementation that passes it. The RED phase is structurally
588
- collapsed. The gates above exist precisely to make the phases non-simultaneous:
589
- - The test commit must happen before the implementation can be written.
590
- - The failure output must be produced (by running the code) before the game state is known.
591
- - The model cannot "know" the failure output without actually running the test,
592
- because the failure messages are not in the training distribution for this specific code.
593
- These gates transform TDD from a discipline into a constraint.
230
+ The git log per feature reads as the audit trail (test → feat → refactor). `pre-commit-tdd-check.sh` detects a `feat:` with no preceding `test:`.
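The diff does not show the internals of `pre-commit-tdd-check.sh`. Below is a hypothetical TypeScript sketch of the same rule, applied to a branch's commit subjects listed oldest first. It interprets "preceding" as at least one `test:` commit since the previous `feat:`/`fix:` commit. That reading is stricter than requiring a single `test:` commit anywhere earlier in the branch.

```typescript
// Captures the conventional-commit type, allowing an optional scope and "!" marker.
const TYPE_PATTERN = /^(\w+)(\([^)]*\))?!?:/;

function commitType(subject: string): string | undefined {
  return TYPE_PATTERN.exec(subject)?.[1];
}

// Returns each feat:/fix: subject with no test: commit since the previous feat:/fix:.
function findUntestedChanges(subjectsOldestFirst: string[]): string[] {
  const violations: string[] = [];
  let sawTestSinceLastChange = false;
  for (const subject of subjectsOldestFirst) {
    const type = commitType(subject);
    if (type === "test") {
      sawTestSinceLastChange = true;
    } else if (type === "feat" || type === "fix") {
      if (!sawTestSinceLastChange) violations.push(subject);
      sawTestSinceLastChange = false; // each change consumes its test commit(s)
    }
    // refactor:, docs:, chore: neither satisfy nor reset the requirement
  }
  return violations;
}
```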
594
231
 
595
232
  - id: adversarial-testing
596
233
  tier: core
597
234
  title: "Adversarial Testing Posture"
598
235
  content: |
599
236
  ## Adversarial Testing Posture
600
-
601
- Tests are not documentation of what the code does. Tests are adversarial assertions
602
- that the code does the right thing even when given inputs designed to break it.
603
-
604
- ### The adversarial posture
605
- - Design every test as if the implementation is wrong until proven otherwise.
606
- - Write tests that FAIL on incorrect code — not tests that pass on any reasonable implementation.
607
- - If a test is hard to make fail, the specification is underspecified, not the test.
608
-
609
- ### Name tests as behaviors, not paths
610
- - `rejects_expired_tokens` not `test_validate_token`
611
- - `throws_on_missing_required_field` not `test_error_handling`
612
- - `returns_empty_list_not_null_when_no_results` not `test_query`
613
-
614
- ### Cover the adversarial surface
615
- For every public function or API endpoint, write tests for:
616
- 1. **Valid boundary values**: minimum, maximum, exact-zero, single-element
617
- 2. **Invalid boundary values**: below-minimum, above-maximum, empty, null/undefined
618
- 3. **Constraint violations**: values that look valid but break invariants (negative balance, future birth date)
619
- 4. **Ordering and concurrency**: does order matter? what if called twice?
620
- 5. **Authorization boundaries**: can a user access another user's resource?
621
-
622
- A test suite that only exercises the happy path is documentation, not specification.
623
- Every mutation that survives is a missing adversarial test.
237
+ - Write tests that FAIL on incorrect code, not tests that pass on any reasonable impl. Hard to make fail = underspecified.
238
+ - Name as behaviors: `rejects_expired_tokens` not `test_validate_token`; `returns_empty_list_not_null_when_no_results` not `test_query`.
239
+ - Per public function/endpoint, cover: valid boundaries (min/max/zero/single) · invalid boundaries (below/above/empty/null) · constraint violations (negative balance, future birth date) · ordering/concurrency · authorization boundaries.
240
+ - Happy-path-only suite is documentation, not spec. Every surviving mutant = a missing adversarial test.
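This hedged sketch covers part of the adversarial surface for a hypothetical `withdraw(balance, amount)` function. The tests are table-driven, so each row is one named behavior. The business rules (integer cents, no overdraft) are assumptions for illustration.

```typescript
type WithdrawResult =
  | { ok: true; balance: number }
  | { ok: false; reason: "invalid_amount" | "insufficient_funds" };

// Amounts are integer cents; overdraft is not allowed.
function withdraw(balance: number, amount: number): WithdrawResult {
  if (!Number.isInteger(amount) || amount <= 0) return { ok: false, reason: "invalid_amount" };
  if (amount > balance) return { ok: false, reason: "insufficient_funds" };
  return { ok: true, balance: balance - amount };
}

interface Case {
  name: string;
  balance: number;
  amount: number;
  expected: WithdrawResult;
}

const invalid: WithdrawResult = { ok: false, reason: "invalid_amount" };

const cases: Case[] = [
  // Valid boundaries
  { name: "allows_withdrawing_exact_balance", balance: 100, amount: 100, expected: { ok: true, balance: 0 } },
  { name: "allows_minimum_amount_of_one_cent", balance: 100, amount: 1, expected: { ok: true, balance: 99 } },
  // Invalid boundaries
  { name: "rejects_zero_amount", balance: 100, amount: 0, expected: invalid },
  { name: "rejects_one_cent_above_balance", balance: 100, amount: 101, expected: { ok: false, reason: "insufficient_funds" } },
  { name: "rejects_nan_amount", balance: 100, amount: NaN, expected: invalid },
  // Constraint violations: plausible-looking inputs that would break an invariant
  { name: "rejects_negative_amount_that_would_raise_balance", balance: 100, amount: -50, expected: invalid },
  { name: "rejects_fractional_cents", balance: 100, amount: 0.5, expected: invalid },
];

// Returns the names of failing cases; an empty array means the suite passes.
function failingCases(impl: typeof withdraw): string[] {
  return cases
    .filter((c) => JSON.stringify(impl(c.balance, c.amount)) !== JSON.stringify(c.expected))
    .map((c) => c.name);
}
```

The table covers only the first three categories. Ordering/concurrency and authorization boundaries need stateful fixtures, such as two sequential withdrawals or two users sharing one account store.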
624
241
 
625
242
  - id: property-based-testing
626
243
  tier: recommended
627
244
  title: "Property-Based Testing"
628
245
  content: |
629
246
  ## Property-Based Testing
247
+ Add property tests for: pure functions with wide input domains (serialization, parsing, math, sorting); encoder/decoder pairs (`decode(encode(x)) === x`); sort idempotence (`sort(sort(xs)) === sort(xs)`); financial calculations (bounded results for all valid inputs).
630
248
 
631
- Example-based tests verify that `f(x) = y` for specific known pairs.
632
- Property-based tests verify that invariants hold for ALL inputs the generator can produce.
633
- Both are required. Neither replaces the other.
634
-
635
- ### When to add property tests
636
- - Pure functions with wide input domains (serialization, parsing, math, sorting)
637
- - Functions where "same inputs → same outputs" must hold across edge cases
638
- - Any encoder/decoder pair: `decode(encode(x)) === x` must hold for all x
639
- - Any sort or ranking: `sort(sort(xs))` must equal `sort(xs)` (idempotence)
640
- - Any financial calculation: results must be within bounds for all valid inputs
641
-
642
- ### Ecosystem tools (language-agnostic principle)
643
- Use whatever property testing library matches the project's language:
644
- - TypeScript / JavaScript: `fast-check`
645
- - Python: `hypothesis`
646
- - Java / Kotlin: `jqwik` or `kotest`
647
- - Go: `gopter` or `rapid`
648
- - Rust: `proptest`
649
- - Scala: `scalacheck`
650
-
651
- ### Template invariant structure
652
- ```
653
- property("encode-decode round trip", () => {
654
- forAll(arbitrary_valid_input(), (input) => {
655
- expect(decode(encode(input))).toEqual(input);
656
- });
657
- });
658
- ```
249
+ | Ecosystem | Tool |
250
+ |---|---|
251
+ | TS/JS | `fast-check` |
252
+ | Python | `hypothesis` |
253
+ | Java/Kotlin | `jqwik` / `kotest` |
254
+ | Go | `gopter` / `rapid` |
255
+ | Rust | `proptest` |
256
+ | Scala | `scalacheck` |
659
257
 
660
- If a property test fails with an unexpected input, add that input as a regression example test.
661
- Property failures are bugs, not edge cases to suppress.
258
+ A property failure is a bug: add the failing input as a regression example test; do not suppress it.
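The following dependency-free sketch shows what `fast-check` or `hypothesis` automate. It includes a seeded generator, a runner that returns the first counterexample, and the round-trip and idempotence invariants listed above. It deliberately omits shrinking. Real libraries use shrinking to minimize a failing input before you copy it into a regression test. The run-length `encode`/`decode` pair is a hypothetical codec under test.

```typescript
// Mulberry32: a tiny seeded PRNG, so any failure can be reproduced from its seed.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Generator: arrays of length 0-19 with values in -3..3, so repeated values are common.
function arbitraryIntArray(rand: () => number): number[] {
  const xs: number[] = [];
  const length = Math.floor(rand() * 20);
  for (let i = 0; i < length; i++) xs.push(Math.floor(rand() * 7) - 3);
  return xs;
}

// Runs the property on many generated inputs; returns the first counterexample, if any.
function checkProperty<T>(gen: (rand: () => number) => T, prop: (input: T) => boolean, runs = 200, seed = 42): T | undefined {
  const rand = mulberry32(seed);
  for (let i = 0; i < runs; i++) {
    const input = gen(rand);
    if (!prop(input)) return input;
  }
  return undefined;
}

type RunLength = [number, number]; // [value, count]

function encode(xs: number[]): RunLength[] {
  const runs: RunLength[] = [];
  for (const x of xs) {
    const last = runs[runs.length - 1];
    if (last !== undefined && last[0] === x) last[1] += 1;
    else runs.push([x, 1]);
  }
  return runs;
}

function decode(runs: RunLength[]): number[] {
  const xs: number[] = [];
  for (const [value, count] of runs) {
    for (let i = 0; i < count; i++) xs.push(value);
  }
  return xs;
}

const sameArray = (a: number[], b: number[]): boolean => JSON.stringify(a) === JSON.stringify(b);
const sortNumbers = (xs: number[]): number[] => [...xs].sort((a, b) => a - b);

// Invariants, not examples.
const roundTripFailure = checkProperty(arbitraryIntArray, (xs) => sameArray(decode(encode(xs)), xs));
const idempotenceFailure = checkProperty(arbitraryIntArray, (xs) => sameArray(sortNumbers(sortNumbers(xs)), sortNumbers(xs)));
console.log(roundTripFailure ?? "round trip holds", idempotenceFailure ?? "sort is idempotent");
```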
662
259
 
663
260
  - id: spec-meta-query
664
261
  tier: recommended
@@ -712,139 +309,22 @@ blocks:
712
309
  title: "Commit Protocol"
713
310
  content: |
714
311
  ## Commit Protocol
312
+ - Conventional commits: `feat|fix|refactor|docs|test|chore(scope): description`.
313
+ - A commit must pass: compilation, lint, tests, coverage gate, mutation gate (Stryker on changed modules), anti-pattern scan.
314
+ - Atomic — one logical change per commit. Never combine a behavior change with a refactor.
315
+ - Commit BEFORE any risky refactor. Update Status.md at end of every session.
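This minimal sketch checks a subject line against the conventional-commit template above. It is restricted to the six listed types and requires a scope, as the template shows. It is not `@commitlint/config-conventional`, which accepts additional types such as `perf`, `ci`, and `build` and enforces more rules. The regex and `parseSubject` are illustrative.

```typescript
// type(scope): description, limited to the protocol's six types, scope required.
const COMMIT_SUBJECT = /^(feat|fix|refactor|docs|test|chore)\(([a-z0-9-]+)\): (\S.*)$/;

interface ParsedSubject {
  type: string;
  scope: string;
  description: string;
}

// Returns the parsed subject, or null when the message should be rejected.
function parseSubject(subject: string): ParsedSubject | null {
  const match = COMMIT_SUBJECT.exec(subject);
  if (match === null) return null;
  const [, type, scope, description] = match;
  return { type, scope, description };
}
```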
715
316
 
716
- A commit is a **verified state** of the system — not a save point, not a checkpoint.
717
- A valid commit requires all three: test suite passes, delta is bounded and coherent,
718
- no new anti-patterns introduced.
719
-
720
- - Conventional commits: `feat|fix|refactor|docs|test|chore(scope): description`
721
- - Commits must pass: compilation, lint, tests, coverage gate, mutation score gate (Stryker on changed modules), anti-pattern scan.
722
- - Keep commits atomic — one logical change per commit.
723
- - Commit BEFORE any risky refactor. Tag stable states.
724
- - Update Status.md at the end of every session.
725
-
726
- ### Commit Hooks — Emit, Don't Reference
727
- Commit hooks, commit-message linting, and the CI pipeline must be **emitted as fenced
728
- code blocks** in the first session response — not merely referenced in prose or README
729
- text. A hook that exists only as "you should add a pre-commit hook" in documentation
730
- provides zero enforcement. If the file is not written to disk, the gate does not exist.
731
-
732
- The following files must be emitted for any new project:
733
-
734
- **`package.json`** — add to `scripts` and `devDependencies`:
735
- ```json
736
- "scripts": { "prepare": "husky install" },
737
- "devDependencies": {
738
- "husky": "^9.0.0",
739
- "@commitlint/cli": "^19.0.0",
740
- "@commitlint/config-conventional": "^19.0.0"
741
- }
742
- ```
743
-
744
- **`.husky/pre-commit`**:
745
- ```bash
746
- #!/usr/bin/env sh
747
- . "$(dirname -- "$0")/_/husky.sh"
748
- npx tsc --noEmit && npm run lint && npm test -- --passWithNoTests
749
- ```
750
-
751
- **`.husky/commit-msg`**:
752
- ```bash
753
- #!/usr/bin/env sh
754
- . "$(dirname -- "$0")/_/husky.sh"
755
- npx commitlint --edit "$1"
756
- ```
757
-
758
- **`commitlint.config.js`**:
759
- ```js
760
- module.exports = { extends: ['@commitlint/config-conventional'] };
761
- ```
762
-
763
- ### Linter Config — Emit in P0, Don't Reference
764
- Linter configuration is infrastructure, not application code. It must be committed to the
765
- repo root in the **first response** (P0) alongside hooks and CI config — not added post-hoc.
766
- A linter mentioned only in documentation does not enforce anything.
767
-
768
- **TypeScript / JavaScript** — emit `.eslintrc.json` (or `eslint.config.js` for flat config):
769
- ```json
770
- {
771
- "parser": "@typescript-eslint/parser",
772
- "plugins": ["@typescript-eslint"],
773
- "rules": {
774
- "no-unused-vars": "off",
775
- "@typescript-eslint/no-unused-vars": "error",
776
- "@typescript-eslint/no-explicit-any": "error"
777
- }
778
- }
779
- ```
780
-
781
- **Python** — emit `ruff.toml` (or `[tool.ruff]` section in `pyproject.toml`):
782
- ```toml
783
- [tool.ruff]
784
- select = ["E", "F", "I"]
785
- ignore = []
786
- line-length = 100
787
- ```
788
-
789
- **Go** — emit `.golangci.yaml`:
790
- ```yaml
791
- linters:
792
- enable:
793
- - unused
794
- - govet
795
- - errcheck
796
- ```
797
-
798
- The correct linter config for **this project's language** must be committed to the repo root
799
- in the same response that emits hooks and CI. Discovering lint errors at code review is too late.
800
-
801
- ### CI Pipeline — Emit, Don't Reference
802
- `.github/workflows/ci.yml` must be emitted as a fenced code block in the first response.
803
- A CI configuration described only in documentation does not enforce anything.
804
- Adapt service blocks, branch names, and language-specific commands to the project stack.
805
- The mutation gate step (`npx stryker run` for JS/TS, `mutmut run` for Python, `pitest` for
806
- Java) is non-negotiable — it is the only gate that verifies test quality, not just
807
- test execution. Line coverage at 80% can coexist with 58% mutation score; the mutation
808
- gate catches the difference.
809
-
810
- Minimum CI for a Node.js/TypeScript project:
811
- ```yaml
812
- name: CI
813
- on:
814
- push:
815
- branches: [main, develop]
816
- pull_request:
817
- branches: [main, develop]
818
- jobs:
819
- ci:
820
- runs-on: ubuntu-latest
821
- steps:
822
- - uses: actions/checkout@v4
823
- - uses: actions/setup-node@v4
824
- with:
825
- node-version: '20'
826
- cache: 'npm'
827
- - run: npm ci
828
- - run: npx tsc --noEmit
829
- - run: npm run lint
830
- - run: npm test -- --coverage --passWithNoTests
831
- - name: Mutation gate
832
- run: npx stryker run
833
- ```
317
+ ### One logical change =
318
+ feature+tests · behavior-preserving refactor · spec change + its code change · bug fix + repro test.
834
319
 
835
- ### Commit Message Precision
836
- The commit message is the sentence describing this state in the project's typed corpus.
837
- - `fix bug`: not a sentence; not queryable; useless as episodic memory.
838
- - `fix(auth): reject expired tokens at middleware boundary before service layer invocation`
839
- The AI uses commit history as context in future sessions. Typed, scoped conventional
840
- messages are a queryable episodic record. `wip` and `changes` are not.
320
+ ### Message precision
321
+ - Bad: `fix bug` (not queryable).
322
+ - Good: `fix(auth): reject expired tokens at middleware boundary before service invocation`
323
+ Commit history is episodic memory the AI reads in future sessions; `wip`/`changes` are not.
841
324
 
842
- ### What Constitutes One Logical Change
843
- - A new feature and its tests: one commit.
844
- - A refactor of an existing module that does not change behavior: one commit.
845
- - A spec update (constitution change + the code change it governs): one commit.
846
- - A bug fix with the reproducing test included: one commit.
847
- Never combine a behavior change with a refactor in the same commit.
325
+ ### Emit, Don't Reference (P0/P1)
326
+ Hooks (`.husky/pre-commit`, `.husky/commit-msg`, `commitlint.config.js`), linter config (`.eslintrc.json`/`ruff.toml`/`.golangci.yaml` for this stack), and `.github/workflows/ci.yml` must be written to disk as fenced code blocks in the first response — not referenced in prose. If the file is not on disk, the gate does not exist. The hook stack emits these configs in P0/P1.
327
+ - CI steps: checkout → install → type-check → lint → test --coverage → mutation gate (`stryker run`/`mutmut run`/`pitest`). The mutation gate is non-negotiable: it verifies test quality, not just execution.
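A small audit over the expected paths can make "if the file is not on disk, the gate does not exist" checkable. This is a hypothetical sketch. The path lists mirror the files named above, and the existence check is injected. A real script could pass `fs.existsSync` from `node:fs`, and a test can pass a set lookup.

```typescript
type Stack = "typescript" | "python" | "go";

// Each requirement is satisfied by any one of its alternative paths.
const COMMON_GATES: string[][] = [
  [".husky/pre-commit"],
  [".husky/commit-msg"],
  ["commitlint.config.js"],
  [".github/workflows/ci.yml"],
];

// Linter config per stack. pyproject.toml only counts if it holds a [tool.ruff]
// section, which this file-level check cannot see.
const LINTER_GATES: Record<Stack, string[]> = {
  typescript: [".eslintrc.json", "eslint.config.js"],
  python: ["ruff.toml", "pyproject.toml"],
  go: [".golangci.yaml"],
};

// Returns a readable description of every gate with no file on disk.
function missingGates(stack: Stack, exists: (path: string) => boolean): string[] {
  return [...COMMON_GATES, LINTER_GATES[stack]]
    .filter((alternatives) => !alternatives.some((path) => exists(path)))
    .map((alternatives) => alternatives.join(" or "));
}
```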
848
328
 
849
329
  - id: clarification-protocol
850
330
  tier: core
@@ -868,56 +348,20 @@ blocks:
868
348
  title: "Feature Completion Protocol"
869
349
  content: |
870
350
  ## Feature Completion Protocol
871
- After implementing any feature (new or changed):
872
-
873
- ### 1. Verify (local, pre-commit)
874
- Run: `npx forgecraft-mcp verify .`
875
- (Or `npm test` + manual HTTP check if forgecraft is not installed.)
876
- A feature is not done until verify passes. Do not proceed to docs if it fails.
877
-
878
- ### 2. Commit (code only)
879
- Commit after `verify` passes. This triggers CI and the staging deploy pipeline.
880
- `feat(scope): <description>` — describes the feature, not the docs update.
881
-
882
- ### 3. Deploy to Staging + Smoke Gate
883
- After the CI pipeline deploys to staging, run the smoke suite:
884
- ```
885
- npx playwright test --config playwright.smoke.config.ts --grep @smoke
886
- ```
887
- If smoke fails: **revert the deploy**. Do not proceed to production and do not cascade docs
888
- for a feature that is broken in the deployed environment.
889
-
890
- ### 4. Doc Sync Cascade
891
- Update the following in order — skip any that do not exist in this project:
892
- 1. **spec.md** — update the relevant feature section (APIs, behavior, contract changes)
893
- 2. **docs/adrs/** — add an ADR if a new architectural decision was made
894
- 3. **docs/diagrams/c4-*.md** — update `c4-context.md` or `c4-container.md` if a new
895
- module, container, or external dependency was added. Diagrams must be written to disk
896
- as fenced Mermaid blocks — updating prose that references a diagram is not an update.
897
- 4. **docs/diagrams/sequence-*.md / state-*.md / flow-*.md** — update or create the
898
- relevant diagram file for the changed surface. Sequence diagrams must name real
899
- participants; state diagrams must name real states and transitions; flow diagrams must
900
- have entry/exit nodes and decision diamonds. A file containing only `<!-- UNFILLED -->`
901
- markers is a specification gap, not a completed diagram.
902
- 5. **docs/TechSpec.md** — update module list, API reference, or technology choice sections
903
- 6. **docs/use-cases.md** — update or add use cases if new actor interactions were introduced
904
- 7. **Status.md** — always update: what changed, current state, next steps
351
+ 1. **Verify** (local): `npx forgecraft-mcp verify .` (or `npm test` + manual HTTP check). Not done until it passes; don't proceed to docs on failure.
352
+ 2. **Commit** (code only) after verify passes: `feat(scope): <description>`. Triggers CI + staging deploy.
353
+ 3. **Smoke gate**: after staging deploy, `npx playwright test --config playwright.smoke.config.ts --grep @smoke`. On failure: revert the deploy, do not cascade docs.
354
+ 4. **Doc sync cascade** in order (skip non-existent): spec.md → docs/adrs/ (if new decision) → docs/diagrams/c4-*.md → docs/diagrams/sequence|state|flow-*.md → docs/TechSpec.md → docs/use-cases.md → Status.md (always). Diagrams must be written to disk as real Mermaid (named participants/states/nodes); `<!-- UNFILLED -->` is a gap, not a diagram.
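Capturing the protocol as data lets a script or agent walk it the same way every time. In this hypothetical sketch, the plan is empty unless both verify and the smoke gate passed. Missing docs are skipped, ADRs are included only for a new decision, and `Status.md` is always included. Glob-like entries are treated as opaque keys, not expanded.

```typescript
// Cascade order from the protocol; glob-like entries are opaque keys.
const DOC_CASCADE = [
  "spec.md",
  "docs/adrs/",
  "docs/diagrams/c4-*.md",
  "docs/diagrams/sequence|state|flow-*.md",
  "docs/TechSpec.md",
  "docs/use-cases.md",
  "Status.md",
] as const;

interface CompletionState {
  verifyPassed: boolean;
  smokePassed: boolean;
  newArchitecturalDecision: boolean;
  existingDocs: ReadonlySet<string>;
}

// Returns the docs to update, in order; empty when an earlier gate failed.
function docSyncPlan(state: CompletionState): string[] {
  if (!state.verifyPassed || !state.smokePassed) return []; // never document a broken feature
  return DOC_CASCADE.filter((doc) => {
    if (doc === "Status.md") return true; // always updated
    if (doc === "docs/adrs/" && !state.newArchitecturalDecision) return false;
    return state.existingDocs.has(doc);
  });
}
```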
905
355
 
906
356
  - id: mcp-tooling
907
357
  tier: recommended
908
358
  title: "MCP-Powered Tooling"
909
359
  content: |
910
360
  ## MCP-Powered Tooling
911
- ### CodeSeeker — Graph-Powered Code Intelligence
912
- CodeSeeker builds a knowledge graph of the codebase with hybrid search
913
- (vector + text + path, fused with RRF). Use it for:
914
- - **Semantic search**: "find code that handles errors like this" not just grep.
915
- - **Graph traversal**: imports, calls, extends — follow dependency chains.
916
- - **Coding standards**: auto-detected validation, error handling, and state patterns.
917
- - **Contextual reads**: `get_file_context` returns a file with its related code.
918
- Indexing is automatic on first search (~30s–5min depending on codebase size).
919
- Most valuable on mid-to-large projects (10K+ files) with established patterns.
920
- Install: `npx codeseeker install --vscode` or see https://github.com/jghiringhelli/codeseeker
361
+ ### CodeSeeker — graph-powered code intelligence (hybrid vector+text+path, RRF)
362
+ - Semantic search (beyond grep), graph traversal (imports/calls/extends), auto-detected standards, `get_file_context` contextual reads.
363
+ - Auto-indexes on first search (~30s–5min). Most valuable on 10K+ file projects.
364
+ - Install: `npx codeseeker install --vscode` or see https://github.com/jghiringhelli/codeseeker
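Reciprocal Rank Fusion (RRF) combines several ranked lists into one. Each document's fused score is the sum of `1 / (k + rank)` over every list it appears in. The sketch below is a generic RRF, not CodeSeeker's implementation. `k = 60` is a commonly used default, and the file names are made up.

```typescript
// Fuses ranked result lists (best first) into one ranking with Reciprocal Rank Fusion.
function reciprocalRankFusion(rankings: string[][], k = 60): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores, ([id, score]) => ({ id, score })).sort((a, b) => b.score - a.score);
}

const fused = reciprocalRankFusion([
  ["auth.ts", "token.ts", "session.ts"], // vector search
  ["token.ts", "auth.ts", "logger.ts"], // text search
  ["token.ts", "config.ts"], // path search
]);
console.log(fused.map((r) => r.id));
```

RRF uses only ranks, never raw scores. Results from retrievers with incomparable scoring scales, such as cosine similarity and BM25, can therefore be fused without normalization. A document that ranks moderately well in several lists tends to beat one that ranks first in only one.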
921
365
 
922
366
  - id: engineering-preferences
923
367
  tier: recommended
@@ -938,108 +382,36 @@ blocks:
938
382
  title: "Agent Mechanical Constraints"
939
383
  content: |
940
384
  ## Agent Mechanical Constraints — Non-Negotiable Overrides
941
-
942
- These rules exist because the model has hard constraints that silently produce wrong
943
- results without them. Each override targets a specific failure mode observed in practice.
944
-
945
- ### 1. Step 0: Dead Code First
946
- Before ANY structural refactor on a file > 300 LOC:
947
- 1. Delete all dead props, unused exports, unused imports, and debug logs.
948
- 2. Commit that cleanup separately under `chore(scope): strip dead code`.
949
- 3. Only then begin the real work.
950
- Dead code accelerates context compaction. Every orphaned import is budget spent on
951
- tokens that contribute nothing to the task.
-
- ### 2. Phase Limit — 5 Files Maximum
- Never attempt multi-file refactors in a single response. Break work into explicit phases.
- Each phase must touch **no more than 5 files**. Complete Phase N, verify it compiles and
- all tests pass, then wait for explicit approval before Phase N+1.
- Why: context compaction fires around 167K tokens and discards all intermediate reasoning,
- file reads, and decision chains. Keeping phases small prevents mid-task compaction.
-
- ### 3. Senior Dev Override
- The model's default system-level directives instruct it to "try the simplest approach
- first" and "avoid refactoring beyond what was asked." These are active constraints that
- fight your intent on architectural tasks.
- **Override them explicitly** when quality requires it:
- > "What would a senior, experienced, perfectionist dev reject in code review? Fix all of it."
- This reframes what "done" means. System-level brevity mandates yield to an explicit
- quality bar stated in the prompt.
-
- ### 4. Sub-Agent Parallelism for Large Tasks
- For tasks spanning > 5 independent files, **launch parallel sub-agents** (5–8 files per
- agent). Each agent gets its own isolated context window and compaction cycle.
- Sequential processing of large tasks guarantees context decay by the later files.
- Batching into parallel agents multiplies the effective working memory proportionally.
-
- ### 5. File Read Budget — 2,000-Line Cap
- Each file read is hard-capped at 2,000 lines. Everything past that is silently truncated.
- The model does not know what it didn't see — it will hallucinate the rest.
- **For any file over 500 LOC**: read in sequential chunks using `offset` and `limit`
- parameters. Never assume a single read captured the full file.
-
- ### 6. Tool Result Truncation
- Tool results exceeding ~50,000 characters are truncated to a 2,000-byte preview.
- The model works from the preview and does not know results were cut.
- If any search returns suspiciously few results: re-run it with narrower scope
- (single directory, stricter glob). State explicitly when truncation may have occurred.
-
- ### 7. Grep Is Not an AST
- `grep` is raw text pattern matching. It cannot distinguish a function call from a
- comment, a type reference from a string literal, or an import from one module vs another.
- On any rename or signature change, search **separately** for:
- - Direct calls and references
- - Type-level references (interfaces, generics, `typeof`)
- - String literals containing the name
- - Dynamic imports and `require()` calls
- - Re-exports and barrel file entries (`index.ts`, `__init__.py`)
- - Test files and mocks
- Never assume a single grep caught everything. Verify or expect regressions.
+ 1. **Dead code first**: before any refactor on a file > 300 LOC, strip dead props/exports/imports/logs and commit `chore(scope): strip dead code` separately.
+ 2. **Phase limit 5 files max** per response. Complete a phase, verify compile+tests, await approval before the next. (Compaction fires ~167K tokens and discards intermediate reasoning.)
+ 3. **Senior dev override**: explicitly counter the "simplest approach / don't refactor beyond ask" defaults when quality requires — "What would a perfectionist senior reject in review? Fix all of it."
+ 4. **Sub-agent parallelism**: for > 5 independent files, launch parallel sub-agents (5–8 files each) for isolated context windows.
+ 5. **File read budget 2,000-line cap** (silent truncation beyond). For files > 500 LOC, read in `offset`/`limit` chunks.
+ 6. **Tool result truncation**: results > ~50K chars truncate to a 2K preview. On suspiciously few results, re-run narrower and state when truncation may have occurred.
+ 7. **Grep is not an AST**: on rename/signature change, search separately for direct calls, type-level refs, string literals, dynamic imports/`require()`, re-exports/barrels (`index.ts`/`__init__.py`), and test files/mocks.
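Rules 2 and 5 reduce to simple arithmetic that can be planned before work starts. A minimal sketch, assuming the file list and the file's line count are already known; `planPhases` and `readChunks` are hypothetical helpers, and treating `offset` as the 1-based first line of a chunk is an assumption about the read tool, not something these rules state.

```typescript
// Hypothetical planning helpers for rules 2 and 5; not part of any real tool API.

/** Rule 2: split files into consecutive phases of at most `maxPerPhase` files. */
function planPhases(files: readonly string[], maxPerPhase = 5): string[][] {
  if (!Number.isInteger(maxPerPhase) || maxPerPhase < 1) {
    throw new RangeError("maxPerPhase must be a positive integer");
  }
  const phases: string[][] = [];
  for (let i = 0; i < files.length; i += maxPerPhase) {
    phases.push(files.slice(i, i + maxPerPhase));
  }
  return phases;
}

interface ReadChunk {
  offset: number; // assumed 1-based first line of the chunk
  limit: number; // number of lines to read
}

/** Rule 5: plan sequential chunked reads so no single read nears the 2,000-line cap. */
function readChunks(totalLines: number, limit = 500): ReadChunk[] {
  if (!Number.isInteger(limit) || limit < 1 || limit > 2000) {
    throw new RangeError("limit must be an integer between 1 and 2000");
  }
  const chunks: ReadChunk[] = [];
  for (let offset = 1; offset <= totalLines; offset += limit) {
    chunks.push({ offset, limit: Math.min(limit, totalLines - offset + 1) });
  }
  return chunks;
}

// 12 files -> phases of 5, 5 and 2 files.
console.log(planPhases(Array.from({ length: 12 }, (_, i) => `src/file${i}.ts`)).map((p) => p.length));
// A 1,234-line file -> reads of 500, 500 and 234 lines.
console.log(readChunks(1234));
```

Rule 4's sub-agent batches can reuse `planPhases` with a larger cap such as 8, though the last batch may then hold fewer than 5 files.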
 
  - id: code-generation-verification
  tier: core
  title: "Code Generation — Self-Verify Loop"
  content: |
  ## Code Generation — Verify Before Returning
+ Show the evidence — do not claim without running.
+ 1. **Compile**: `tsc --noEmit` / `mypy` / equiv — 0 errors.
+ 2. **Test suite**: full run (`jest --runInBand`, `pytest`) — 0 failures.
+ 3. **Interface consistency**: when changing a signature, fix ALL callers in the same pass (else oscillation).
+ 4. **DRY check**: duplication < 5% (min-tokens 50) on `src/` — see project-gates.yaml `no-code-duplication`; extract above threshold.
+ 5. **Interface completeness**: every interface method implemented by its concrete class — see `interface-contract-completeness`.
 
- When emitting implementation code across one or more files, the response is not complete
- until the following are true. Show the evidence in your response — do not claim without running.
-
- ### Verification steps (in order)
- 1. **Compile check**: Run `tsc --noEmit` (TypeScript), `mypy` (Python), or equivalent.
- Zero errors required. Do not return with type errors outstanding.
- 2. **Test suite**: Run the full test suite (`jest --runInBand`, `pytest`, etc.).
- Zero failures required. Fix every failure before returning.
- 3. **Interface consistency**: When fixing a compile error in file A, check ALL callers of
- the changed interface. Fixing one side without seeing the other causes oscillation:
- the model fixes `service.ts` (3-param signature) but `routes.ts` still calls it with
- an object — same error reappears inverted next pass.
- 4. **§8 DRY Check**: Run duplication detector on `src/`. Duplicated lines must be < 5%
- (min-tokens 50). Use the tool appropriate for your stack (see project-gates.yaml:
- `no-code-duplication`). If above threshold, extract duplicated logic to a shared utility
- before closing.
- 5. **§9 Interface Completeness**: Every method declared in each interface must be implemented
- by its concrete class. Run static type checking (0 errors required). Use the tool
- appropriate for your stack (see project-gates.yaml: `interface-contract-completeness`).
- If errors exist, implement missing methods before closing.
-
- ### Required evidence in the final response
+ Required evidence:
  ```
  tsc --noEmit: 0 errors
  Jest: 109 passed, 0 failed, 11 suites
  ```
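One way to keep these evidence lines honest is to derive them from captured output instead of typing them. A minimal sketch, assuming the raw text of `tsc --noEmit` and of Jest's default, uncolored summary has already been captured; the helper names are hypothetical and nothing here runs either tool.

```typescript
// Hypothetical helpers: they parse captured text and never run tsc or Jest themselves.

/** Count diagnostics like "src/a.ts(3,7): error TS2322: ..." in `tsc --noEmit` output. */
function countTscErrors(tscOutput: string): number {
  return (tscOutput.match(/\berror TS\d+:/g) ?? []).length;
}

interface JestSummary {
  passed: number;
  failed: number;
  suites: number;
}

/** Pull "<n> <label>" out of a summary line such as "2 failed, 107 passed, 109 total". */
function countLabel(line: string, label: string): number {
  return Number(new RegExp(`(\\d+) ${label}`).exec(line)?.[1] ?? 0);
}

/** Parse Jest's "Tests:" and "Test Suites:" summary lines (assumes uncolored output). */
function parseJestSummary(jestOutput: string): JestSummary {
  const tests = /^Tests:\s+(.*)$/m.exec(jestOutput)?.[1];
  const suites = /^Test Suites:\s+.*?(\d+) total/m.exec(jestOutput)?.[1];
  if (tests === undefined || suites === undefined) {
    // No summary means no evidence; refuse to report a result.
    throw new Error("Jest summary not found");
  }
  return {
    passed: countLabel(tests, "passed"),
    failed: countLabel(tests, "failed"),
    suites: Number(suites),
  };
}

/** Render the two evidence lines in the format shown above. */
function formatEvidence(tscOutput: string, jestOutput: string): string {
  const { passed, failed, suites } = parseJestSummary(jestOutput);
  return [
    `tsc --noEmit: ${countTscErrors(tscOutput)} errors`,
    `Jest: ${passed} passed, ${failed} failed, ${suites} suites`,
  ].join("\n");
}
```

A missing summary throws rather than defaulting to zero, so the response cannot report a green run that was never captured.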
 
- ### Common test setup pitfalls (TypeScript / Prisma)
- - **`prisma db push`, not `prisma migrate deploy`** in test environments.
- `migrate deploy` silently no-ops when no `prisma/migrations/` folder exists,
- leaving all tables absent. `db push --accept-data-loss` syncs `schema.prisma` directly.
- - **`deleteMany` in FK order, not `DROP SCHEMA`**.
- `$executeRawUnsafe('DROP SCHEMA public CASCADE; CREATE SCHEMA public;')` throws
- error 42601 — pg rejects multi-statement queries in prepared statements.
- Use ordered `deleteMany()` calls in `beforeEach` instead.
- - **JWT_SECRET minimum length**: HS256 requires ≥ 32 characters.
- Test secrets like `"test-secret"` (11 chars) cause startup errors.
- Use `"test-secret-that-is-at-least-32-chars"` in test env.
+ ### Test-setup pitfalls (TS/Prisma)
+ - Use `prisma db push --accept-data-loss`, not `migrate deploy` (no-ops without a migrations folder).
+ - Reset via ordered `deleteMany()` in FK order, not `DROP SCHEMA` (pg error 42601 on multi-statement).
+ - JWT_SECRET ≥ 32 chars (HS256) in test env.
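The FK-order reset and the secret-length floor can both be checked in code. A minimal sketch, assuming a hand-written map from each model to the models its foreign keys reference (the `User`/`Post`/`Comment` names are hypothetical); `deletionOrder` and `assertHs256Secret` are illustrative helpers, not Prisma APIs. The 32-byte floor follows RFC 7518, which requires an HS256 key of at least 256 bits.

```typescript
// Hypothetical helpers for the pitfalls above; not part of Prisma.

/**
 * Return models in an order where every child is deleted before any parent it references.
 * `refs` maps each model to the models its foreign keys point at.
 */
function deletionOrder(refs: Readonly<Record<string, readonly string[]>>): string[] {
  const parentsFirst: string[] = [];
  const state = new Map<string, "visiting" | "done">();

  const visit = (model: string): void => {
    const seen = state.get(model);
    if (seen === "done") return;
    // Cycles (including self-relations) have no safe order; handle them by hand.
    if (seen === "visiting") throw new Error(`FK cycle through ${model}`);
    state.set(model, "visiting");
    for (const parent of refs[model] ?? []) visit(parent);
    state.set(model, "done");
    parentsFirst.push(model);
  };

  for (const model of Object.keys(refs)) visit(model);
  // Depth-first order puts parents first; deletion needs the reverse.
  return parentsFirst.reverse();
}

/** HS256 needs a key of at least 256 bits, so reject secrets shorter than 32 bytes. */
function assertHs256Secret(secret: string): void {
  const bytes = new TextEncoder().encode(secret).length;
  if (bytes < 32) {
    throw new Error(`JWT_SECRET is ${bytes} bytes; HS256 needs at least 32`);
  }
}

// Comment references Post and User, Post references User: delete Comment, then Post, then User.
console.log(deletionOrder({ User: [], Post: ["User"], Comment: ["Post", "User"] }));
assertHs256Secret("test-secret-that-is-at-least-32-chars"); // passes: 37 bytes
```

In a Jest `beforeEach`, that order would translate into `await prisma.comment.deleteMany()`, then `prisma.post`, then `prisma.user` for this hypothetical schema.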
 
  - id: known-pitfalls
  tier: core
@@ -1089,8 +461,7 @@ blocks:
  content: |
  ## Testing Architecture
 
- ### Test Types by Scope and Purpose
- Listed from fastest/most-isolated to slowest/most-integrated:
+ ### Test Types by Scope and Purpose (fast/isolated → slow/integrated)
 
  | Type | Description | Tooling |
  |---|---|---|
@@ -1116,19 +487,9 @@ blocks:
  | **Exploratory** | Manual, session-based, scheduled. Charter-driven. Findings become regression tests. | Manual + session notes |
 
  ### Variant Coverage Dimensions
- For each test scope, the following input/condition variants are required:
-
- - **Happy path** — nominal, valid inputs. Necessary but never sufficient.
- - **Sad / Negative path** — correct rejection of invalid input or sequences.
- - **Edge case / BVA** — boundary values: max, min, empty, null, type coercions.
- - **Corner case** — intersection of two or more simultaneous edge conditions. Requires explicit enumeration.
- - **State transition** — valid and invalid state machine transitions. Requires a state diagram as prerequisite.
- - **Equivalence partitioning** — one representative from each equivalence class. Reduces test count without reducing coverage.
- - **Error path** — infrastructure/dependency failure: timeout, 500, DB refused, queue full — conditions the user did not cause.
- - **Security / Adversarial input** — SQL injection, XSS, path traversal, oversized payloads, malformed tokens. Required at every layer touching user-supplied data.
- - **Random / Monkey** — unstructured random input. Subsumed by property-based layer.
+ Happy path · Sad/Negative · Edge/BVA (max/min/empty/null/coercion) · Corner (intersecting edges) · State transition (needs state diagram) · Equivalence partition · Error path (timeout/500/DB refused) · Security/Adversarial (SQLi, XSS, path traversal, oversized, malformed tokens) · Random/Monkey (via property-based).
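For the Edge/BVA variant on a numeric input, the usual boundary set is each bound plus its immediate neighbors. A minimal sketch, assuming an inclusive integer range; `boundaryValues` is a hypothetical helper, and the values that fall outside the range double as Sad/Negative cases.

```typescript
// Hypothetical helper: boundary-value analysis for an inclusive integer range [min, max].
interface BoundaryCase {
  value: number;
  valid: boolean; // false marks a sad-path case the system must reject
}

function boundaryValues(min: number, max: number): BoundaryCase[] {
  if (!Number.isSafeInteger(min) || !Number.isSafeInteger(max) || min > max) {
    throw new RangeError("expected safe integers with min <= max");
  }
  // Each bound plus the value just inside and just outside it; the Set drops overlaps
  // when the range is narrow (for example min === max).
  const candidates = new Set([min - 1, min, min + 1, max - 1, max, max + 1]);
  return Array.from(candidates)
    .sort((a, b) => a - b)
    .map((value) => ({ value, valid: value >= min && value <= max }));
}

// A quantity field accepting 1..100: reject 0 and 101, accept 1, 2, 99 and 100.
for (const { value, valid } of boundaryValues(1, 100)) {
  console.log(`${value}: ${valid ? "accept" : "reject"}`);
}
```

Empty, null and type-coercion cases from the same variant are not numeric and still need their own fixtures.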
 
- **Variant coverage matrix** (✓ = required, ~ = structural constraint, — = not applicable):
+ **Variant coverage matrix** (✓ required, ~ structural, — n/a):
 
  | Variant | Unit | Integration | Contract | API | E2E | Smoke | Chaos |
  |---|---|---|---|---|---|---|---|
@@ -1142,8 +503,7 @@ blocks:
  | Security / Adversarial | — | — | — | ✓ | — | — | ~ always adversarial |
  | Random / Monkey | via property-based | — | — | — | — | — | ✓ |
 
- ### Test Pipeline Mapping
- Each trigger gate accumulates the prior gates. A gate may not be skipped.
+ ### Test Pipeline Mapping (each gate accumulates prior gates; none skippable)
 
  | Trigger | Gate Contents | Target Duration |
  |---|---|---|
@@ -1161,10 +521,7 @@ blocks:
  title: "Active Release Phase Gate"
  content: |
  ## Active Release Phase: {{release_phase | default: development}}
-
- Your current phase determines which test gates are **required now**, not advisory.
- The full taxonomy and trigger mapping are in the Testing section above.
- Read your phase row below and apply every requirement listed.
+ Your phase determines which gates are blocking NOW. Apply every requirement in your phase row.
 
  | Phase | Required now — blocking | Not required yet |
  |---|---|---|
@@ -1175,89 +532,18 @@ blocks:
 
  **Current active phase: `{{release_phase | default: development}}`**
 
- > If the phase is `pre-release` or `release-candidate`:
- > Hardening tests (load, DAST, penetration) are REQUIRED in this session, not deferred.
- > Do not proceed to merge without completing the required gate for your phase.
- > The Testing section above maps each gate to its tooling and target duration.
+ > If `pre-release`/`release-candidate`: hardening tests (load, DAST, penetration) are REQUIRED this session, not deferred. Do not merge without completing your phase's gate.
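The phase rule above can be enforced as a merge guard instead of being left to memory. A minimal sketch, assuming the phase arrives as a string (possibly unset) and that passed hardening gates are collected elsewhere; the gate identifiers `load`, `dast` and `penetration` and the function name are hypothetical.

```typescript
// Hypothetical merge guard for the release-phase rule; gate names are assumptions.
const HARDENING_GATES = ["load", "dast", "penetration"] as const;
type HardeningGate = (typeof HARDENING_GATES)[number];

// Phases in which hardening tests block the merge, per the rule above.
const HARDENING_PHASES = new Set(["pre-release", "release-candidate"]);

/** Hardening gates that still block a merge in `phase`; empty means the merge may proceed. */
function blockingHardeningGates(
  phase: string | undefined,
  passed: ReadonlySet<HardeningGate>,
): HardeningGate[] {
  // Mirror the template default: an unset phase is treated as "development".
  const effective = phase ?? "development";
  if (!HARDENING_PHASES.has(effective)) return [];
  return HARDENING_GATES.filter((gate) => !passed.has(gate));
}

const missing = blockingHardeningGates("release-candidate", new Set<HardeningGate>(["load"]));
if (missing.length > 0) {
  console.log(`Merge blocked: ${missing.join(", ")} not completed this session`);
}
```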
 
  - id: gs-test-techniques
  tier: recommended
  title: "Generative Specification — Testing Techniques"
  content: |
  ## Generative Specification: Testing Techniques
-
- These five techniques are specific to GS practice and extend the standard taxonomy above.
-
- ### Adversarial Test Posture
- The test is a hunter, not a witness.
- - Tests are written to FAIL on incorrect code — to find the input or condition that exposes
- a violation, not to confirm the current behavior.
- - Tests must be written against interfaces, not implementations.
- A test coupled to internal state fails on correct refactors and passes on behavioral violations
- that happen to preserve internal structure. That is the worst outcome.
-
- ### Expose-Store-to-Window (Interactive / Game / Real-Time UIs)
- For applications with a shared state store (Redux, Zustand, Pinia, state machine), expose the
- store to `window` in the test environment:
- ```typescript
- if (process.env.NODE_ENV === 'test') {
- (window as any).__store = store;
- }
- ```
- Playwright tests can then assert both what the screen renders AND what the application believes
- is true — the store's internal state — without coupling assertions to DOM structure. This catches
- the failure class that renders correctly but corrupts internal state (score displays right, stored wrong;
- entity in undefined state not yet manifested as a visual defect).
-
- ### Vertical Chain Test
- A single UI action triggers Playwright, which then:
- 1. Queries the service layer response
- 2. Queries the database state and any affected indexes
- 3. Verifies correct propagation through every boundary the action crosses
- 4. Returns to the UI to confirm the visible outcome matches the stored state
-
- Not a unit test, not a visual check, not a flow test: a chain verification. One trigger, inspected
- at every boundary it crosses. Specify which critical flows receive this treatment in the test
- architecture document. A defect anywhere in the chain (service logic, persistence, index consistency,
- UI rendering) is surfaced in a single pass.
-
- ### Mutation Testing as Adversarial Audit
- An AI-generated test suite carries a structural risk: tests written by a system that knows the
- correct implementation may pass it rather than catch violations of it.
- - Run Stryker (JS/TS) or mutmut (Python) against every AI-generated suite before accepting it.
- - A test that passes a mutant is not testing the contract — it is confirming the absence of one
- specific mutation, no more.
- - Coverage measures what was executed. Mutation score measures what was caught. The second is
- the meaningful metric.
- - Gates: 70% mutation score at PR, 80% at release candidate on changed code.
-
- ### Multimodal Quality Gates (Generative Assets)
- When content is AI-generated (images, audio, video), the acceptance criteria must be executable.
- Manual review at scale is not a pipeline.
-
- **Visual assets (sprite sheets, generated imagery):**
- ```python
- # PCA-based orientation check
- from sklearn.decomposition import PCA
- pca = PCA(n_components=2).fit(ship_pixel_coordinates)
- angle = np.degrees(np.arctan2(*pca.components_[0][::-1]))
- assert abs(angle) <= 15, f"Sprite orientation {angle:.1f}° exceeds 15° tolerance"
-
- # Symmetry check (horizontal flip similarity)
- similarity = ssim(img_half_left, np.fliplr(img_half_right))
- assert similarity >= 0.85, f"Symmetry {similarity:.2f} below 0.85 threshold"
- ```
-
- **Audio assets:**
- - Loudness normalization: assert target LUFS within ±1 dB of spec (pyloudnorm).
- - Frequency profile: no asset competes in the 2–4 kHz presence range during dialogue.
- - Silence detection: reject assets with generation artifacts (> X ms silence in unexpected positions).
-
- **MCP-mediated inspection (judgment-requiring defects):**
- An instrumented game/app state exposed through an MCP server lets a language model
- evaluate whether a running scene satisfies its acceptance criteria without pre-scripting
- every assertion. Feed the model the scene spec + MCP access; it reports violations.
- This addresses defects that are easy to name but hard to encode as assertions.
+ - **Adversarial posture**: write tests to FAIL on incorrect code; against interfaces, not internal state.
+ - **Expose-store-to-window**: for shared-state UIs (Redux/Zustand/Pinia), set `window.__store` in test env so Playwright asserts internal state, not just DOM. Catches "renders right, stored wrong".
+ - **Vertical chain test**: one UI action → assert service response → DB state + indexes → back to UI. Surfaces a defect anywhere in the chain in one pass. Name which critical flows get it.
+ - **Mutation as adversarial audit**: run Stryker (JS/TS) / mutmut (Python) on every AI-generated suite. Gates: 70% at PR, 80% at RC on changed code.
+ - **Multimodal quality gates** (generated assets): executable acceptance, not manual review. Visual: PCA orientation (≤15°), SSIM symmetry (≥0.85). Audio: LUFS ±1 dB, frequency profile, silence detection. MCP-mediated inspection for judgment-requiring defects.
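Two of the gates above reduce to small, checkable computations. A minimal sketch with hypothetical helper names: the mutation score assumes Stryker-style counting (killed and timed-out mutants count as detected; survived and uncovered mutants as undetected), and the orientation check swaps the removed scikit-learn PCA for the closed-form principal axis of the 2x2 covariance matrix, which yields the same axis without the eigenvector sign flip that can turn a horizontal sprite into 180°.

```typescript
// Hypothetical gate helpers; counts and point data come from your own tooling.

interface MutantCounts {
  killed: number;
  timedOut: number;
  survived: number;
  noCoverage: number;
}

/** Percentage of valid mutants detected; 0 when there are none, so the gate cannot pass vacuously. */
function mutationScore(m: MutantCounts): number {
  const detected = m.killed + m.timedOut;
  const valid = detected + m.survived + m.noCoverage;
  return valid === 0 ? 0 : (100 * detected) / valid;
}

// Thresholds from the rule above, applied to changed code.
const MUTATION_GATE = { pr: 70, rc: 80 } as const;

function passesMutationGate(m: MutantCounts, stage: keyof typeof MUTATION_GATE): boolean {
  return mutationScore(m) >= MUTATION_GATE[stage];
}

/**
 * Angle in degrees, within (-90, 90], of the principal axis of 2-D points.
 * Uses tan(2 * theta) = 2 * sxy / (sxx - syy) for the leading eigenvector of the covariance matrix.
 */
function principalAxisDegrees(points: ReadonlyArray<readonly [number, number]>): number {
  const n = points.length;
  if (n < 2) throw new RangeError("need at least two points");
  const mx = points.reduce((sum, [x]) => sum + x, 0) / n;
  const my = points.reduce((sum, [, y]) => sum + y, 0) / n;
  let sxx = 0;
  let syy = 0;
  let sxy = 0;
  for (const [x, y] of points) {
    sxx += (x - mx) ** 2;
    syy += (y - my) ** 2;
    sxy += (x - mx) * (y - my);
  }
  return (0.5 * Math.atan2(2 * sxy, sxx - syy) * 180) / Math.PI;
}

/** Orientation gate: the principal axis must lie within the tolerance of horizontal. */
function orientationOk(points: ReadonlyArray<readonly [number, number]>, toleranceDeg = 15): boolean {
  return Math.abs(principalAxisDegrees(points)) <= toleranceDeg;
}

// 7 killed out of 10 valid mutants: 70%, enough for a PR but not for a release candidate.
const counts = { killed: 7, timedOut: 0, survived: 3, noCoverage: 0 };
console.log(passesMutationGate(counts, "pr"), passesMutationGate(counts, "rc"));
```

With image coordinates (y pointing down) the sign of the angle flips, but its magnitude, and therefore the gate result, does not.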
 
  - id: artifact-grammar
  tier: core
@@ -1357,140 +643,45 @@ blocks:
  title: "ADR Protocol"
  content: |
  ## ADR Protocol — Persistent Memory
-
- Every non-obvious architectural decision produces an ADR before implementation begins.
- An unrecorded architectural decision is a gap in the grammar.
-
- Without an ADR, the AI will "improve" intentional decisions that appear suboptimal
- without context — turning deliberate architectural tradeoffs into silently-introduced drift.
+ Every non-obvious architectural decision produces an ADR before implementation.
 
  ### Format (minimal)
  ```markdown
- # ADR-NNNN: [Decision Title]
-
+ # ADR-NNNN: [Title]
  **Date**: YYYY-MM-DD
  **Status**: Proposed | Accepted | Deprecated | Superseded by ADR-NNNN
-
- ## Context
- What is the situation that requires a decision? What forces are in tension?
-
- ## Decision
- What was decided? State it plainly.
-
- ## Alternatives Considered
- What other options were evaluated and why were they not chosen?
-
- ## Consequences
- What becomes easier or harder as a result of this decision?
- What will the AI need to know to work within this constraint?
+ ## Context / Decision / Alternatives Considered / Consequences
  ```
 
- ### When to Write an ADR
- - Any architectural choice that is not obvious from the code structure
- - Any decision that involves a tradeoff (performance vs. simplicity, security vs. UX)
- - Any decision that was reached after considering alternatives
- - Any decision that future engineers (or AI sessions) might be tempted to "fix"
- - Any change to the architectural constitution itself
-
- ### ADR Directory
- - Path: `docs/adrs/` (zero-padded, kebab-case: `ADR-0001-short-title.md`)
- - ADRs are immutable once Accepted. To change a decision: write a new ADR that supersedes the old one.
- - The old ADR is updated only to add `Superseded by ADR-NNNN` to its status.
-
- ### ADR Stubs — Emit in P1
- When starting a new project, emit ADR stub files as **fenced code blocks** in the first
- response alongside `prisma/schema.prisma`, `tsconfig.json`, and `package.json`.
- ADRs referenced only in a README but not written as files are not present in the project.
- The model cannot reference a file that does not exist. Emit the file.
-
- **Minimum ADRs to emit in P1** (adapt titles to the actual stack chosen):
- - `docs/adrs/ADR-0001-stack.md` — language, runtime, framework, ORM selection and rationale
- - `docs/adrs/ADR-0002-authentication.md` — auth strategy (JWT/session), hashing algorithm and why
- - `docs/adrs/ADR-0003-architecture.md` — layered/hexagonal architecture decision and boundary rules
-
- Each ADR stub must contain real content in `Status`, `Context`, `Decision`, and `Consequences`
- fields — not placeholder text. A stub that says "TBD" is not an ADR.
-
- **ADR reference check:** If your README mentions `docs/adrs/ADR-0001-stack.md`, that file
- must appear as a fenced code block in the same response. A reference to a non-emitted file
- is an Auditable violation — it creates the appearance of traceability without the substance.
-
- Also emit **`CHANGELOG.md`** in P1 with initial content documenting the P1 decisions:
- ```markdown
- # Changelog
- All notable changes to this project will be documented here.
- Format: [Keep a Changelog](https://keepachangelog.com/en/1.1.0/)
-
- ## [Unreleased]
- ### Added
- - Initial project scaffold: layered architecture, Prisma schema, repository interfaces
- - Authentication: JWT + Argon2 (see ADR-0002)
- - Dependency registry: docs/approved-packages.md with audit baseline
- - CI pipeline: lint, type-check, test, npm audit, mutation gate
- - Pre-commit hooks: tsc, lint, audit, test gates
- ```
- A CHANGELOG that exists only as "we will add one" is not Auditable. Write the file.
- Document the P1 decisions immediately — the first entry is not a release entry, it is the
- architectural record of what was built in this session.
+ ### When to write
+ Non-obvious choice · tradeoff (perf vs simplicity, security vs UX) · decided after alternatives · future sessions might "fix" it · any change to the constitution.
 
- ### Session Protocol
- Every session begins by reading the open ADRs. The status of each ADR is the authoritative
- record of what is intentional. A session that modifies an ADR-governed boundary without
- first reading the ADR has produced drift, regardless of whether the code compiles.
+ - Path `docs/adrs/ADR-0001-short-title.md` (zero-padded, kebab). Immutable once Accepted; supersede with a new ADR.
+ - **Emit in P1** as fenced code blocks (a referenced-but-unwritten ADR is an Auditable violation). Minimum: ADR-0001-stack, ADR-0002-authentication (auth strategy + hashing), ADR-0003-architecture. Real content, not "TBD".
+ - Also emit **`CHANGELOG.md`** in P1 (Keep a Changelog format) documenting the P1 decisions under `[Unreleased] > Added`.
+ - Session start: read open ADRs first. Modifying an ADR-governed boundary without reading it = drift, even if it compiles.
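The path convention and the "real content, not TBD" rule are both mechanically checkable. A minimal sketch with hypothetical helpers: `adrPath` builds the zero-padded, kebab-case file name, and `adrProblems` assumes the minimal format above with `## Context`, `## Decision` and `## Consequences` written as separate headings.

```typescript
// Hypothetical ADR helpers based on the conventions above.

/** Build `docs/adrs/ADR-0001-short-title.md` from a number and a title. */
function adrPath(n: number, title: string): string {
  if (!Number.isInteger(n) || n < 1 || n > 9999) throw new RangeError("ADR number must be 1..9999");
  const slug = title.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
  if (slug === "") throw new Error("title has no usable characters");
  return `docs/adrs/ADR-${String(n).padStart(4, "0")}-${slug}.md`;
}

/** Text under `## heading` up to the next level-1 or level-2 heading, or undefined if absent. */
function sectionBody(markdown: string, heading: string): string | undefined {
  const lines = markdown.split("\n");
  const start = lines.findIndex((line) => line.trim() === `## ${heading}`);
  if (start === -1) return undefined;
  const rest = lines.slice(start + 1);
  const end = rest.findIndex((line) => /^#{1,2} /.test(line));
  return (end === -1 ? rest : rest.slice(0, end)).join("\n").trim();
}

/** Problems that make a file a stub rather than an ADR; an empty array means it passes. */
function adrProblems(markdown: string): string[] {
  const problems: string[] = [];
  const status = /^\*\*Status\*\*:[ \t]*(Proposed|Accepted|Deprecated|Superseded by ADR-\d{4})[ \t]*$/m;
  if (!status.test(markdown)) problems.push("missing or invalid Status");
  for (const heading of ["Context", "Decision", "Consequences"]) {
    const body = sectionBody(markdown, heading);
    if (body === undefined) problems.push(`missing ## ${heading}`);
    else if (body === "" || /^tbd\.?$/i.test(body)) problems.push(`## ${heading} has no real content`);
  }
  return problems;
}

console.log(adrPath(1, "Stack")); // docs/adrs/ADR-0001-stack.md
```

An unedited template fails the Status check, because `Proposed | Accepted | ...` is not a single valid status.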
 
  - id: use-case-triple-derivation
  tier: recommended
  title: "Use Case Triple Derivation"
  content: |
  ## Use Cases — Triple Derivation
+ The UC format IS Design by Contract (Meyer): precondition/postcondition/invariant — the contract is the executable spec.
+ One precise use case derives three artifacts:
+ 1. **Implementation contract** — actor/precondition/trigger/postcondition is the spec the service layer is written against.
+ 2. **Acceptance test** — same artifact in test dialect (Playwright/Cucumber). Hard to write the test = underspecified use case.
+ 3. **User documentation** — the same content narrated for a non-technical reader; a rendering pass, not a rewrite.
 
- A use case is not a requirements artifact produced before implementation and superseded by it.
- In a generative specification it is a multi-purpose production rule: a single, precise
- description of an interaction from which three artifacts derive independently and without
- redundancy.
-
- ### The Three Derivations
- 1. **Implementation contract** — The use case names the actor, precondition, trigger, and
- postcondition with enough precision to be unambiguous. This is the specification the
- service layer is written against. When the AI reads a well-formed use case before
- generating the corresponding service method, it has what a human architect would
- communicate in a design review.
-
- 2. **Acceptance test** — The use case and the test scenario are the same artifact expressed
- in different dialects. A Playwright E2E test for a checkout flow is the checkout use case
- transcribed into executable form. A Cucumber scenario in Given-When-Then is the use case
- in declarative test notation. When the use case is precise, the test writes itself.
- **When the test is hard to write, the use case is underspecified.** The test difficulty
- is the diagnostic for underspecification.
-
- 3. **User documentation** — A use case narrated to a non-technical reader (actor, goal,
- precondition, sequence, expected outcome, error cases) is a user manual section.
- The content is identical. The framing is different. A specification with complete use
- cases does not need a separate documentation writing pass — it needs a rendering pass.
-
- ### Use Case Format (minimal)
+ ### Format (minimal)
  ```markdown
  ## UC-NNN: [Action] [Domain Object]
-
- **Actor**: [who initiates]
- **Precondition**: [what must be true before]
- **Trigger**: [what event or action starts the flow]
- **Main Flow**:
- 1. [Step one]
- 2. [Step two]
- **Postcondition**: [what is true after success]
- **Error Cases**:
- - [Condition]: [System response]
- **Acceptance Criteria** (machine-checkable):
- - [ ] [Criterion 1]
- - [ ] [Criterion 2]
+ **Actor**: / **Precondition**: / **Trigger**:
+ **Main Flow**: 1. ... 2. ...
+ **Postcondition**: / **Error Cases**: [Condition]: [response]
+ **Acceptance Criteria** (machine-checkable): - [ ] ...
  ```
 
- ### The Diagnostic Rule
- Before writing any service method, write the use case first. If you cannot state the
- precondition and postcondition precisely, you do not yet understand the behavior well enough
- to implement it correctly. The implementation will be wrong. The use case forces the
- understanding the implementation requires.
+ Diagnostic: write the use case before any service method. Can't state pre/postcondition precisely = don't understand the behavior well enough to implement it.
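The triple derivation becomes concrete when the use case is kept as data and the acceptance-test dialect is rendered from it, with the diagnostic rule acting as a guard. A minimal sketch: the `UseCase` shape mirrors the labels of the minimal format, while `underspecified`, `toGherkin` and the checkout example are hypothetical.

```typescript
// Hypothetical model of the minimal UC format; field names mirror its labels.
interface UseCase {
  id: string;
  title: string;
  actor: string;
  precondition: string;
  trigger: string;
  mainFlow: string[];
  postcondition: string;
  errorCases: { condition: string; response: string }[];
}

/** Diagnostic rule: contract fields that are still blank. */
function underspecified(uc: UseCase): string[] {
  const contract: Record<string, string> = {
    actor: uc.actor,
    precondition: uc.precondition,
    trigger: uc.trigger,
    postcondition: uc.postcondition,
  };
  return Object.keys(contract).filter((field) => contract[field].trim() === "");
}

/** Derivation 2: the same use case rendered as Given-When-Then scenarios. */
function toGherkin(uc: UseCase): string {
  const missing = underspecified(uc);
  if (missing.length > 0) throw new Error(`${uc.id} is underspecified: ${missing.join(", ")}`);
  const lines = [
    `Scenario: ${uc.id} ${uc.title}`,
    `  Given ${uc.precondition}`,
    `  When ${uc.actor} ${uc.trigger}`,
    `  Then ${uc.postcondition}`,
  ];
  // Each error case becomes its own scenario with the failing condition added.
  for (const e of uc.errorCases) {
    lines.push(
      "",
      `Scenario: ${uc.id} ${uc.title} when ${e.condition}`,
      `  Given ${uc.precondition}`,
      `  And ${e.condition}`,
      `  When ${uc.actor} ${uc.trigger}`,
      `  Then ${e.response}`,
    );
  }
  return lines.join("\n");
}

const checkout: UseCase = {
  id: "UC-001",
  title: "Place Order",
  actor: "the customer",
  precondition: "the cart holds one in-stock item",
  trigger: "submits the cart",
  mainFlow: ["validate the cart", "create the order"],
  postcondition: "an order exists for that item",
  errorCases: [{ condition: "the item sells out first", response: "the order is rejected" }],
};
console.log(toGherkin(checkout));
```

The main flow is left to step definitions here; the contract fields alone determine each scenario, which is why a blank precondition or postcondition stops the rendering.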
 
  - id: living-documentation
  tier: recommended