@tgoodington/intuition 10.4.0 → 10.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@tgoodington/intuition",
- "version": "10.4.0",
+ "version": "10.5.0",
  "description": "Domain-adaptive workflow system for Claude Code: prompt, outline, assemble specialist teams, detail with domain experts, build with format producers, test code output. Supports v8 compat (design, engineer, build) and v9 specialist workflows with 14 domain specialists and 6 format producers.",
  "keywords": [
  "claude-code",
@@ -73,25 +73,7 @@ Read these files:
  7. ALL files matching `{context_path}/scratch/*-decisions.json` — decision tiers and chosen options per specialist.
  8. `docs/project_notes/decisions.md` — project-level ADRs.

- From build_report.md, extract:
- - **Files modified** — the scope boundary for testing and fixes
- - **Task results** — which tasks passed/failed build review
- - **Deviations** — any blueprint deviations that may need test coverage
- - **Decision compliance** — any flagged decision issues
- - **Test Deliverables Deferred** — test specs/files that specialists recommended but build skipped (if this section exists)
-
- From blueprints, extract behavioral contracts per module:
- - **Deliverable Specification** (Section 5): function signatures, return schemas (dict keys, types, value ranges), error conditions with exact messages, naming conventions, state transitions
- - **Acceptance Mapping** (Section 6): which AC each deliverable satisfies and how
- - **Producer Handoff** (Section 9): expected file paths, integration points
-
- From test_advisory.md, extract domain test knowledge:
- - Edge cases, critical paths, failure modes, and boundary conditions flagged by specialists
-
- From decisions files, build a decision index:
- - Map each `[USER]` decision to its chosen option
- - Map each `[SPEC]` decision to its chosen option and rationale
- - This index is used in Step 6 for fix boundary checking
+ From these files, extract: **build_report** → files modified (scope boundary), task results, deviations, decision compliance, deferred test deliverables. **Blueprints** → Section 5 behavioral contracts (signatures, return schemas, error conditions, naming), Section 6 AC mapping, Section 9 file paths. **test_advisory** → edge cases, critical paths, failure modes. **Decisions** → index of all [USER] and [SPEC] decisions with chosen options (used in Step 6 boundary checking).
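The decision index described above can be sketched as follows (a minimal sketch; the `id`, `tier`, and `chosen_option` field names are assumptions, since the diff does not show the `*-decisions.json` schema):

```python
import json
from pathlib import Path

def build_decision_index(scratch_dir):
    """Index every [USER]/[SPEC] decision by id for Step 6 fix-boundary lookups.

    Field names (id, tier, chosen_option) are hypothetical; adapt them to the
    actual *-decisions.json schema.
    """
    index = {}
    for path in sorted(Path(scratch_dir).glob("*-decisions.json")):
        for decision in json.loads(path.read_text()):
            index[decision["id"]] = {
                "tier": decision["tier"],          # "[USER]" or "[SPEC]"
                "chosen": decision["chosen_option"],
            }
    return index
```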

  ## STEP 2: RESEARCH (2 Parallel Research Agents)

@@ -111,20 +93,9 @@ Spawn two `intuition-researcher` agents in parallel (both Task calls in a single
  6. **External dependencies** — which external systems each module interacts with (for mocking).
  7. **Existing tests** — search the project for test files matching source file name patterns. Report paths only.

- Output in this format per blueprint:
- ```
- ## {specialist_name} — {blueprint_file}
- ### Module: {file_path as specified in blueprint}
- **Import:** `from {module} import {name}`
- **Interface:** `function_name(param: Type, ...) -> ReturnType`
- **Return schema:** {what the blueprint says it returns — keys, types, values}
- **Error conditions:** {what the blueprint says about errors}
- **Naming conventions:** {patterns}
- **Mocking targets:** {external deps}
- **Existing tests:** {paths or 'None found'}
- ```
+ Output per blueprint as: `## {specialist} — {file}` then per module: Import, Interface, Return schema, Error conditions, Naming conventions, Mocking targets, Existing tests. Mark any unspecified field as 'Not specified in blueprint'.

- CRITICAL: Extract ONLY what the blueprint SPECIFIES. Do not supplement with information from source code. If the blueprint does not specify a return schema, report 'Not specified in blueprint'. The purpose is to capture what the spec says the code SHOULD look like — not what the code actually looks like."
+ CRITICAL: Extract ONLY what the blueprint SPECIFIES, not what the source code does."

  If no blueprints directory exists, fall back to reading source files for structural information only (function signatures, import paths, external dependencies). Use the strict call-signature format: signatures and import paths only, no return value contents, no error messages, no behavioral descriptions.

@@ -167,17 +138,24 @@ If process_flow.md conflicts with actual implementation, check build_report.md f

  ### File-to-Tier Mapping

- For each modified file, determine which test tier drives its testing:
+ | File Type | Primary Tier |
+ |-----------|-------------|
+ | Route / controller | Tier 1 (AC tests via HTTP) |
+ | Engine / orchestrator | Tier 1 (AC tests of engine API) |
+ | Service / provider | Tier 2 (blueprint contract) |
+ | Model / schema | Tier 2 (blueprint contract) |
+ | Utility / helper | Tier 3, or Tier 2 if blueprint specifies |
+ | Configuration / Template | Skip (test indirectly via Tier 1) |
+
+ ### Tier Distribution Minimums
+
+ The test plan MUST satisfy these ratios (calculated against total test count):
+ - **Tier 1 ≥ 40%** — If the plan has fewer than 40% Tier 1 tests, add more AC-level tests before adding Tier 2/3. If there are not enough ACs to reach 40%, document why in the test strategy.
+ - **Tier 3 ≤ 30%** — Coverage gap-fillers must not dominate the suite. If Tier 3 exceeds 30%, cut the lowest-value coverage tests.
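The two ratio gates above can be checked mechanically. A minimal sketch (the function name and dict shape are illustrative, not part of the package):

```python
def check_tier_distribution(counts):
    """Gate a test plan on tier ratios: Tier 1 >= 40%, Tier 3 <= 30% of total."""
    total = sum(counts.values())
    tier1 = counts.get(1, 0) / total
    tier3 = counts.get(3, 0) / total
    return {
        "tier1_ok": tier1 >= 0.40,  # else: add AC-level tests, or document why not
        "tier3_ok": tier3 <= 0.30,  # else: cut the lowest-value coverage tests
    }
```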

- | File Type | Primary Tier | Rationale |
- |-----------|-------------|-----------|
- | Route / controller | Tier 1 (AC tests via HTTP) | ACs describe route behavior — test the route |
- | Engine / orchestrator | Tier 1 (AC tests of engine API) | ACs describe engine outcomes — test the engine |
- | Service / provider | Tier 2 (blueprint contract) | Blueprints specify provider contracts |
- | Model / schema | Tier 2 (blueprint contract) | Blueprints specify data shapes |
- | Utility / helper | Tier 3 (coverage) or Tier 2 (if blueprint specifies) | Only Tier 2 if blueprint has a deliverable spec for it |
- | Configuration | Skip (test indirectly via Tier 1) | Config effects are observable at route/engine level |
- | Template / static | Skip (test indirectly via Tier 1) | Template output is observable in route responses |
+ ### Negative Test Minimums
+
+ At least **30% of Tier 1 and Tier 2 tests** must exercise error/failure/invalid-input paths: invalid inputs, dependency failures (timeout, connection refused), state violations (e.g., stopping a non-running container), missing config. If the spec doesn't describe error behavior, flag it as a spec gap with `# SPEC_AMBIGUOUS`; do NOT skip negative testing.
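A negative-path test in this style might look like the sketch below (the `stop_container` wrapper and its error message are hypothetical stand-ins, not code from this package):

```python
def stop_container(name, running):
    """Hypothetical container wrapper: stopping a non-running container is a state violation."""
    if name not in running:
        raise LookupError(f"Container {name} not found")
    running.discard(name)
    return {"name": name, "status": "stopped"}

def test_stop_unknown_container_raises():
    # Negative path (state violation): stop a container that is not running.
    try:
        stop_container("ghost", running=set())
        raise AssertionError("expected LookupError")
    except LookupError as exc:
        # SPEC_AMBIGUOUS: exact message not specified in spec; asserting current wording
        assert "not found" in str(exc)
```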

  ### Edge Cases, Mocking, and Coverage

@@ -185,34 +163,17 @@ For each modified file, determine which test tier drives its testing:

  **Mock strategy**: Follow project conventions from Step 2. Default: mock external dependencies only. Never mock the unit under test. Tier 1/2 tests mock at system boundaries; Tier 3 may mock internal seams.

+ **Mock depth rule for infrastructure/DevOps projects**: When the project orchestrates external systems (Docker, cloud APIs, CLI tools, databases), pure-mock tests risk testing only mock setup. For each external-system wrapper, at least one Tier 1 test MUST assert **mock interaction depth** — not just return values, but that the mock was called with correct arguments, order, and count per the blueprint spec.
+
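An interaction-depth assertion can be sketched with `unittest.mock` (the `create_network` wrapper and the Docker-style client shape are hypothetical):

```python
from unittest.mock import MagicMock

def create_network(client, name):
    """Hypothetical wrapper around a Docker-style client."""
    client.networks.create(name, driver="bridge")
    return {"network_name": name}

def test_create_network_calls_docker_correctly():
    mock_client = MagicMock()
    result = create_network(mock_client, "myapp-network")
    # A return-value check alone could pass even if the wrapper never touched the client:
    assert result["network_name"] == "myapp-network"
    # Interaction depth: exact arguments and call count against the external system.
    mock_client.networks.create.assert_called_once_with("myapp-network", driver="bridge")
```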
  **Coverage target**: Match existing config threshold, or 80% line coverage for modified files. Focus on decision-heavy code paths (`[USER]` and `[SPEC]` decisions).

  ### Spec Oracle Hierarchy

- Tests derive expected behavior from spec artifacts, NOT from reading source code. Each oracle maps to a test tier:
-
- | Oracle | Spec Source | Drives Test Tier | What it defines |
- |--------|------------|-----------------|-----------------|
- | **Primary** | outline.md acceptance criteria | Tier 1 | Observable outcomes the system must produce |
- | **Secondary** | blueprints (Section 5 + 6) | Tier 2 | Detailed behavioral contracts: return schemas, error tables, naming conventions, state machines |
- | **Tertiary** | process_flow.md | Tier 1 + 2 | Integration seams, cross-component handoffs, state mutations, error propagation |
- | **Advisory** | test_advisory.md | Tier 2 + 3 | Edge cases, critical paths, failure modes (supplements, not replaces, blueprints) |
-
- When a test fails, the failure means the implementation disagrees with the spec — that is a finding, not automatically a bug in either the test or the code. See Step 6 Classify Failures for how to handle this.
+ Tests derive expected behavior from specs, NOT source code. Oracle priority: **outline.md ACs** (Tier 1) → **blueprints Sections 5+6** (Tier 2) → **process_flow.md** (Tier 1+2 integration) → **test_advisory.md** (advisory, Tier 2+3). When a test fails, the implementation disagrees with the spec — classify per Step 6, don't assume either is wrong.

  ### Acceptance Criteria Path Coverage

- For every acceptance criterion in outline.md that describes observable behavior ("displays X", "uses Y for Z", "produces output containing W"):
-
- 1. At least one **Tier 1** test MUST exercise the **actual entry point at the abstraction level the AC describes**. Read the AC carefully to determine the right level:
- - AC mentions HTTP routes or UI behavior → test the route (e.g., `TestClient.post("/admin/container/app/start")`)
- - AC mentions engine or service behavior → test the engine's public API (e.g., `engine.run(context)`)
- - AC mentions CLI output → test the CLI command
- - NEVER satisfy an AC exclusively with a unit test of an internal helper function
- 2. The test MUST assert on the **expected output as described by the spec** (acceptance criterion + blueprint deliverable spec). Every assertion value must be traceable to a spec document.
- 3. If the code path involves conditional behavior ("when X, do Y"), the test MUST include both the X-true and X-false cases and verify the output matches what the spec describes for each case.
-
- Tier 2 tests of internal functions supplement Tier 1 but do NOT substitute for them. Every AC needs Tier 1 coverage.
+ For every AC with observable behavior, at least one Tier 1 test MUST exercise the **actual entry point at the AC's abstraction level** (HTTP route → test the route, engine API → test the engine, CLI → test the command). NEVER satisfy an AC exclusively with a unit test of an internal helper. Assertions MUST match spec-defined expected output. Conditional behavior ("when X, do Y") requires both branches tested. Tier 2 supplements but does NOT substitute for Tier 1.
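Entry-point-level testing can be sketched as follows (the route table and handler are toy stand-ins; a real project would use its HTTP test client):

```python
def _set_status(name, status):
    """Internal helper. A unit test of this alone does NOT satisfy the AC."""
    return {"app": name, "status": status}

# Toy route table standing in for the real HTTP layer.
ROUTES = {("POST", "/admin/container/{app}/start"): lambda app: _set_status(app, "running")}

def dispatch(method, pattern, **params):
    """Entry point at the abstraction level the AC describes."""
    return ROUTES[(method, pattern)](**params)

def test_start_route_reports_running():
    # Tier 1: exercise the route, not the helper, and assert the spec-defined output.
    response = dispatch("POST", "/admin/container/{app}/start", app="myapp")
    assert response == {"app": "myapp", "status": "running"}
```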

  ### Specialist Test Recommendations

@@ -220,7 +181,7 @@ Before finalizing the test plan, review specialist domain knowledge from bluepri
  - **Testability Notes**: Edge cases, critical paths, failure modes, and boundary conditions from each blueprint's Approach section (Section 3, `### Testability Notes` subheading)
  - **Deferred test deliverables**: Any test specs from build_report.md's "Test Deliverables Deferred" section (legacy — older blueprints may still include test files in Producer Handoff)

- Specialists have domain expertise about what should be tested. Incorporate their testability insights into your test plan, but you own the test strategy — use specialist input as advisory, not prescriptive.
+ Incorporate specialist insights as advisory, not prescriptive; you own the test strategy.

  ### Output

@@ -228,11 +189,14 @@ Write the test strategy to `{context_path}/scratch/test_strategy.md`. This serve

  The test strategy document MUST contain:
  - **AC coverage matrix**: For each acceptance criterion, which test(s) cover it, at what tier, and at what abstraction level. Every AC with observable behavior MUST have at least one Tier 1 test.
+ - **Tier distribution**: Total count per tier with percentages. Verify: Tier 1 ≥ 40%, Tier 3 ≤ 30%. If not met, adjust plan before proceeding.
+ - **Negative test inventory**: List each negative/error-path test explicitly. Verify: ≥ 30% of Tier 1/2 tests are negative. If not met, add more error-path tests.
  - Test files to create (path, tier, target source file)
- - Test cases per file (name, tier, what it validates, **which spec artifact defines the expected behavior**, **what the spec says the expected output is**)
- - Mock requirements per file (mock external deps only for Tier 1/2; Tier 3 may mock internal seams)
+ - Test cases per file (name, tier, positive/negative, what it validates, **which spec artifact defines the expected behavior**, **what the spec says the expected output is**)
+ - Mock requirements per file (mock external deps only for Tier 1/2; Tier 3 may mock internal seams). For infra projects: flag files needing mock-depth assertions (call args, call order, call count).
  - Framework command to run tests
  - Estimated test count and distribution by tier
+ - **Mutation spot-check candidates**: 3 source files with highest Tier 1/2 coverage, and one candidate mutation per file
  - Which specialist recommendations were incorporated (and which were skipped, with rationale)
  - Any acceptance criteria where the expected behavior is ambiguous (flagged for potential SPEC_AMBIGUOUS markers)

@@ -246,11 +210,13 @@ Question: "Test plan ready:
  **Framework:** [detected framework]
  **Test files:** [N] files
  **Test cases:** ~[total] tests covering [file count] modified files
- - Tier 1 (AC tests): [N] tests covering [M] of [P] acceptance criteria
+ - Tier 1 (AC tests): [N] tests ([X]% of total, min 40%) covering [M] of [P] acceptance criteria
  - Tier 2 (blueprint contracts): [N] tests
- - Tier 3 (coverage): [N] tests
+ - Tier 3 (coverage): [N] tests ([X]% of total, max 30%)
+ **Negative tests:** [N] of [M] Tier 1/2 tests ([X]%, min 30%)
  **AC coverage:** [M]/[P] acceptance criteria have Tier 1 tests [list any uncovered ACs]
  **Coverage target:** [threshold]%
+ **Post-pass:** Mutation spot-check on 3 files

  Proceed?"

@@ -301,19 +267,13 @@ You are a spec-first test writer. Your tests verify the code does what the SPEC
  - You MUST NOT use Grep or Glob to search source files

  ## ASSERTION SOURCING RULES
- For EVERY assertion that checks a specific value (exact string, number, status code, dict key):
- 1. Add a comment citing the spec source: `# blueprint:{specialist}:L{line} — "{spec quote}"`
- 2. If no spec document defines the expected value: mark `# SPEC_AMBIGUOUS: spec says "{quote}" — value not specified`
+ For EVERY assertion that checks a specific value: add `# blueprint:{specialist}:L{line} — "{spec quote}"`. If no spec defines the value: `# SPEC_AMBIGUOUS: spec says "{quote}" — value not specified`.

- For Tier 1 tests:
- - Test at the abstraction level the AC describes (HTTP routes, CLI output, observable state changes)
- - Mock ONLY external systems (Docker, databases, HTTP clients, cloud APIs) — do NOT mock internal modules
- - Assertions should verify user-observable outcomes, not internal function return values
+ Tier 1: test at AC's abstraction level, mock ONLY external systems, assert user-observable outcomes.
+ Tier 2: test at blueprint's module level, mock external deps per blueprint, assert behavioral contracts.

- For Tier 2 tests:
- - Test at the module level the blueprint describes
- - Mock external dependencies as the blueprint specifies
- - Assertions should verify the behavioral contracts from the blueprint's Deliverable Specification
+ ## ASSERTION DEPTH RULES
+ Prefer DEEP assertions over shallow ones. Instead of `assert result is not None` or `assert "key" in result`, assert specific values: `assert result["network_name"] == "myapp-network"`. For infra/DevOps code: assert mock call arguments, order, and count — not just return values.

  Write the complete test file. Follow existing test style. Do NOT add test infrastructure.
  ```
@@ -349,20 +309,43 @@ For each value-assertion, check:

  Assertions without spec provenance AND without SPEC_AMBIGUOUS markers are **source-derived**. (Tier 3 tests are exempt — they are explicitly implementation-derived.)

- ### Part B: Abstraction Level Coverage
+ ### Part B: Assertion Depth Scoring
+
+ For each Tier 1 and Tier 2 test file, classify every assertion as **shallow** or **deep**:
+
+ | Shallow (low signal) | Deep (high signal) |
+ |---|---|
+ | `is not None` | `== "expected-specific-value"` |
+ | `isinstance(result, dict)` | `result["network_name"] == "myapp-network"` |
+ | `"key" in result` | `mock_docker.run.assert_called_with(image="x", ports={...})` |
+ | `len(result) > 0` | `error.message == "Container myapp not found"` |
+ | `result["success"] == True` (when mock returns True) | `result["status"] == "running"` (verified against spec behavior) |
+
+ **Threshold**: If >50% of assertions in a test file are shallow, flag the file. The test exists but proves almost nothing.
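The shallow/deep split can be approximated with a pattern heuristic. A sketch (the regexes are illustrative and would need tuning per codebase):

```python
import re

# Patterns that mark an assertion as low-signal (illustrative, not exhaustive).
SHALLOW_PATTERNS = [
    r"is not None\b",
    r"\bisinstance\(",
    r"\bin result\b",
    r"len\([^)]*\)\s*>\s*0",
]

def classify_assertion(line):
    return "shallow" if any(re.search(p, line) for p in SHALLOW_PATTERNS) else "deep"

def shallow_ratio(assertion_lines):
    flags = [classify_assertion(line) == "shallow" for line in assertion_lines]
    return sum(flags) / len(flags)  # flag the file when this exceeds 0.5
```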
+
+ **Escalation**: If >30% of ALL Tier 1/2 test files are flagged as shallow-dominant, present via AskUserQuestion:
+
+ ```
+ Header: "Assertion Depth Warning"
+ Question: "[N] of [M] test files have >50% shallow assertions.
+ These tests pass trivially and won't catch real bugs.
+
+ Examples: [list 2-3 worst offenders with their shallow assertion patterns]
+
+ Options: fix shallow tests / accept as-is / skip to Step 6"
+ ```
+
+ If "fix": delegate to `intuition-code-writer` agents with instructions to replace shallow assertions with specific value checks traced to blueprint specs. If the blueprint doesn't specify the value, add a `SPEC_AMBIGUOUS` marker.
+
+ ### Part C: Abstraction Level Coverage

  For each acceptance criterion in outline.md that describes observable behavior:
  1. Check: is there at least one Tier 1 test that exercises the AC at the abstraction level it describes?
  2. If an AC describes HTTP route behavior but the only test is a unit test of an internal function → flag as **abstraction gap**

- Example of an abstraction gap:
- - AC T2.3: "Container operations execute successfully and status updates reflect within the next poll cycle"
- - Only test: `test_start_container_success()` which calls `start_container()` directly and checks `result["success"]`
- - Gap: No test exercises the actual HTTP route `POST /admin/container/{app_name}/start` and verifies the response
-
  ### Reporting

- If Part A finds >20% source-derived assertions OR Part B finds any abstraction gaps, present via AskUserQuestion:
+ If Part A finds >20% source-derived assertions, Part B flags >30% shallow-dominant files, OR Part C finds any abstraction gaps, present via AskUserQuestion:

  ```
  Header: "Spec Compliance Audit"
@@ -394,29 +377,24 @@ Also run `mcp__ide__getDiagnostics` to catch type errors and lint issues in the

  For each failure, classify. The first question is always: **does the spec clearly define the expected behavior the test asserts?**

- | Classification | How to identify | Action |
- |---|---|---|
- | **Test bug** (wrong assertion, incorrect mock, import error) | Test doesn't match the spec it claims to test, or has a structural error | Fix autonomously — `intuition-code-writer` agent |
- | **Spec Violation** (implementation disagrees with spec) | Test asserts spec-defined behavior, implementation returns something different, and the spec is clear and unambiguous | Escalate to user: "Test [name] expects [spec behavior] per [acceptance criterion / blueprint spec], but implementation returns [actual]. Is the spec wrong or the code?" Options: "Fix the code" / "Spec was wrong — update test" / "I'll investigate" |
- | **Spec Ambiguity** (spec underspecified, test assertion is a guess) | Test is marked SPEC_AMBIGUOUS, or the spec doesn't define the expected value precisely enough to write a deterministic assertion | Escalate to user: "Spec doesn't clearly define expected behavior for [scenario]. The code does [X]. Is that correct?" Options: "Yes, that's correct — lock it in" / "No, it should do [other]" / "Skip this test" |
- | **Implementation bug, trivial** (off-by-one, missing null check, typo — 1-3 lines) | Spec is clear, implementation is clearly wrong, fix is small | Fix directly — `intuition-code-writer` agent |
- | **Implementation bug, moderate** (logic error, missing handler — contained to one file) | Spec is clear, implementation is wrong, fix is contained | Fix — `intuition-code-writer` agent with full diagnosis |
- | **Implementation bug, complex** (multi-file structural issue) | Spec is clear, but fix requires architectural changes | Escalate to user |
- | **Fix would violate [USER] decision** | Any tier | STOP — escalate to user immediately |
- | **Fix would violate [SPEC] decision** | Any tier | Note the conflict, proceed with fix (specialist had authority) |
- | **Fix touches files outside build_report scope** | Any tier | Escalate to user (scope creep) |
+ | Classification | Action |
+ |---|---|
+ | **Test bug** (wrong assertion, mock, import) | Fix autonomously — `intuition-code-writer` |
+ | **Spec Violation** (code disagrees with clear spec) | Escalate: "expects [spec] per [source], got [actual]. Fix code / update spec / investigate?" |
+ | **Spec Ambiguity** (SPEC_AMBIGUOUS or underspecified) | Escalate: "Spec unclear for [scenario]. Code does [X]. Correct? Lock in / change / skip?" |
+ | **Impl bug, trivial** (1-3 lines, spec is clear) | Fix directly — `intuition-code-writer` |
+ | **Impl bug, moderate** (one file, spec is clear) | Fix — `intuition-code-writer` with diagnosis |
+ | **Impl bug, complex** (multi-file structural) | Escalate to user |
+ | **Violates [USER] decision** | STOP — escalate immediately |
+ | **Violates [SPEC] decision** | Note conflict, proceed with fix |
+ | **Touches files outside build scope** | Escalate (scope creep) |

  ### Decision Boundary Checking

- Before ANY implementation fix (not test-only fixes):
-
- 1. Read ALL `{context_path}/scratch/*-decisions.json` files + `docs/project_notes/decisions.md`
- 2. Check: does the proposed fix contradict any `[USER]`-tier decision?
- - If YES → STOP. Report the conflict to the user via AskUserQuestion: "Test failure in [file] requires changing [what], but this contradicts your decision on [D{N}: title] where you chose [chosen option]. How should I proceed?" Options: "Change my decision" / "Skip this test" / "I'll fix manually"
- 3. Check: does the proposed fix contradict any `[SPEC]`-tier decision?
- - If YES → note the conflict in the test report, proceed with the fix (specialist decisions are advisory)
- 4. Check: does the fix modify files NOT listed in build_report's "Files Modified" section?
- - If YES → escalate: "Fixing [test] requires modifying [file] which wasn't part of this build. Allow scope expansion?" Options: "Allow this file" / "Skip this test"
+ Before ANY implementation fix (not test-only fixes), read all `{context_path}/scratch/*-decisions.json` + `docs/project_notes/decisions.md`. Check:
+ 1. **[USER] decision conflict** → STOP, escalate via AskUserQuestion with options: "Change decision" / "Skip test" / "Fix manually"
+ 2. **[SPEC] decision conflict** → note in report, proceed with fix
+ 3. **File outside build scope** → escalate: "Allow scope expansion?" / "Skip test"

  ### Fix Cycle

@@ -429,6 +407,29 @@ For each failure:

  After all failures are addressed (fixed or escalated), run the full test suite one final time to verify no regressions.

+ ### Mutation Spot-Check (Post-Pass Gate)
+
+ After the final test run passes, perform a lightweight mutation check to verify the tests can actually detect bugs. This is NOT full mutation testing — it's a targeted sanity check.
+
+ 1. Select **3 source files** with the most Tier 1/2 test coverage (highest test count targeting them).
+ 2. For each file, make ONE small, obvious mutation via an `intuition-code-writer` agent:
+ - Change a return value (e.g., `"running"` → `"stopped"`, `True` → `False`)
+ - Change a string literal (e.g., resource name, error message)
+ - Remove a function call (e.g., comment out a validation step)
+ - The mutation MUST break behavior that at least one test claims to verify
+ 3. Re-run ONLY the tests targeting that file.
+ 4. **Expected result:** At least one test fails per mutation. If a mutation causes zero test failures, the tests covering that file are hollow.
+ 5. **Revert every mutation immediately** after checking (use `git checkout -- {file}` or re-apply the original content).
+
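The mutate-run-revert loop described above could be driven by helpers like this sketch (assumes pytest and git are available; the command strings are illustrative):

```python
import subprocess

def run_targeted_tests(test_path):
    """Run only the tests that target the mutated file; pytest exits nonzero on any failure."""
    proc = subprocess.run(["pytest", test_path, "-q"], capture_output=True, text=True)
    return proc.returncode

def revert_mutation(source_path):
    """Restore the mutated file from git before trying the next mutation."""
    subprocess.run(["git", "checkout", "--", source_path], check=True)

def mutation_caught(returncode):
    # Nonzero means at least one test failed: the mutation was caught.
    # Zero means the mutation survived and the covering tests are hollow.
    return returncode != 0
```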
+ **If any mutation survives** (0 test failures):
+ - Report via AskUserQuestion: "Mutation spot-check: changed [what] in [file] — zero tests caught it. The [N] tests covering this file may be testing mock wiring rather than real behavior. Options: strengthen tests / accept risk / skip"
+ - If "strengthen tests": delegate to `intuition-code-writer` with the specific mutation that survived, and instructions to add a test that would catch it.
+
+ **Track results** in the test report under a new "## Mutation Spot-Check" section:
+ | File | Mutation | Tests Run | Caught? |
+ |------|----------|-----------|---------|
+ | [path] | [what was changed] | [N] | Yes/No |
+

  ## STEP 7: TEST REPORT

  Write `{context_path}/test_report.md`:

@@ -476,6 +477,24 @@ Write `{context_path}/test_report.md`:
  - SPEC_AMBIGUOUS marked: **[N]** (spec underspecified, asserting implementation value)
  - Source-derived (untraced): **[N]** [if any — list examples and user disposition: "accepted as-is" / "fixed"]

+ ## Assertion Depth
+ - Tier 1/2 files audited: **[N]**
+ - Shallow-dominant files (>50% shallow assertions): **[N]** [list any]
+ - User disposition: [fixed / accepted as-is / N/A]
+
+ ## Negative Test Coverage
+ - Tier 1/2 negative tests: **[N]** of **[M]** total Tier 1/2 tests (**[X]%**, target: ≥30%)
+ - Error paths tested: [list categories — invalid input, dependency failure, state violation, etc.]
+
+ ## Mutation Spot-Check
+ | File | Mutation | Tests Run | Caught? |
+ |------|----------|-----------|---------|
+ | [path] | [what was changed] | [N] | Yes/No |
+
+ - Mutations tested: **[N]**
+ - Caught: **[N]**
+ - Survived: **[N]** [list any — with disposition: strengthened / accepted risk]
+

  ## Decision Compliance
  - Checked **[N]** decisions across **[M]** specialist decision logs
  - `[USER]` violations: [count — list any, or "None"]