@tgoodington/intuition 10.3.0 → 10.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@tgoodington/intuition",
-  "version": "10.3.0",
+  "version": "10.4.0",
   "description": "Domain-adaptive workflow system for Claude Code: prompt, outline, assemble specialist teams, detail with domain experts, build with format producers, test code output. Supports v8 compat (design, engineer, build) and v9 specialist workflows with 14 domain specialists and 6 format producers.",
   "keywords": [
     "claude-code",
@@ -44,6 +44,7 @@ Step 2: Analyze test infrastructure (2 parallel intuition-researcher agents)
 Step 3: Design test strategy (self-contained domain reasoning)
 Step 4: Confirm test plan with user
 Step 5: Create tests (delegate to sonnet code-writer subagents)
+ Step 5.5: Spec compliance audit (assertion provenance + abstraction level coverage)
 Step 6: Run tests + fix cycle (debugger-style autonomy)
 Step 7: Write test_report.md
 Step 8: Exit Protocol (state update, completion)
@@ -66,10 +67,11 @@ Read these files:
 1. `{context_path}/build_report.md` — REQUIRED. Extract: files modified, task results, deviations from blueprints, decision compliance notes.
 2. `{context_path}/outline.md` — acceptance criteria per task.
 3. `{context_path}/process_flow.md` (if exists) — end-to-end user flows, component interactions, data paths, error paths. Primary source for designing integration and E2E tests. If this file does not exist (non-code project or Lightweight workflow), proceed without it.
- 4. `{context_path}/test_advisory.md` — compact testability notes extracted by the detail phase (one section per specialist). Read this INSTEAD of all blueprints. If this file does not exist (older workflows), fall back to reading `{context_path}/blueprints/*.md` and extracting Testability Notes from each Approach section.
- 4. `{context_path}/team_assignment.json` — producer assignments (identify code-writer tasks).
- 5. ALL files matching `{context_path}/scratch/*-decisions.json` — decision tiers and chosen options per specialist.
- 6. `docs/project_notes/decisions.md` — project-level ADRs.
+ 4. `{context_path}/test_advisory.md` — compact testability notes: edge cases, critical paths, failure modes per specialist.
+ 5. `{context_path}/blueprints/*.md` — REQUIRED for spec-first testing. Blueprints contain the detailed behavioral contracts that define expected behavior: return schemas, error conditions, API endpoint specs, naming conventions, and state machine definitions. Read ALL blueprints. Focus on Section 5 (Deliverable Specification) and Section 6 (Acceptance Mapping) — these contain the concrete expected behaviors that tests assert against. If no blueprints directory exists, proceed with test_advisory and outline only.
+ 6. `{context_path}/team_assignment.json` — producer assignments (identify code-writer tasks).
+ 7. ALL files matching `{context_path}/scratch/*-decisions.json` — decision tiers and chosen options per specialist.
+ 8. `docs/project_notes/decisions.md` — project-level ADRs.
 
 From build_report.md, extract:
 - **Files modified** — the scope boundary for testing and fixes
@@ -78,9 +80,13 @@ From build_report.md, extract:
 - **Decision compliance** — any flagged decision issues
 - **Test Deliverables Deferred** — test specs/files that specialists recommended but build skipped (if this section exists)
 
- From test_advisory.md (or blueprints as fallback), extract domain test knowledge:
+ From blueprints, extract behavioral contracts per module:
+ - **Deliverable Specification** (Section 5): function signatures, return schemas (dict keys, types, value ranges), error conditions with exact messages, naming conventions, state transitions
+ - **Acceptance Mapping** (Section 6): which AC each deliverable satisfies and how
+ - **Producer Handoff** (Section 9): expected file paths, integration points
+
+ From test_advisory.md, extract domain test knowledge:
 - Edge cases, critical paths, failure modes, and boundary conditions flagged by specialists
- - Any test-relevant domain insights
 
 From decisions files, build a decision index:
 - Map each `[USER]` decision to its chosen option
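The decision-index step above can be sketched in a few lines. This is an illustrative sketch only: the `id`, `tier`, and `chosen_option` keys are assumed field names, since the actual `*-decisions.json` schema is not shown in this document.

```python
import glob
import json

def build_decision_index(scratch_dir):
    """Map each decision id to its tier tag and chosen option."""
    index = {}
    for path in sorted(glob.glob(f"{scratch_dir}/*-decisions.json")):
        with open(path) as fh:
            for decision in json.load(fh):
                # Assumed field names; the real schema is not shown here.
                index[decision["id"]] = {
                    "tier": decision["tier"],
                    "chosen": decision["chosen_option"],
                }
    return index
```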
@@ -94,19 +100,61 @@ Spawn two `intuition-researcher` agents in parallel (both Task calls in a single
 **Agent 1 — Test Infrastructure:**
 "Search the project for test infrastructure. Find: test framework and runner (jest, vitest, mocha, pytest, etc.), test configuration files, existing test directories and naming conventions, mock/fixture patterns, test utility helpers, CI test commands, coverage configuration and thresholds. Report exact paths and configuration values."
 
- **Agent 2 — Interface Extraction:**
- "Read each of these files modified during build: [list files from build_report]. For each file, extract ONLY structural information — do NOT describe implementation logic or return value computations. Report: exported functions/classes/methods with their signatures (name, parameters, types), constructor arguments, import paths, class hierarchies, existing test coverage (search for test files matching the source file name pattern), external dependencies that would need mocking. Output a clean interface stub per file that a test writer could use to call the code without knowing what it does internally."
+ **Agent 2 — Blueprint Interface Extraction:**
+ "Read each blueprint in `{context_path}/blueprints/`. Do NOT read any source code files. For each blueprint, extract from the Deliverable Specification section (Section 5):
+
+ 1. **Specified interfaces** — function/method signatures, class definitions, constructor args as described in the blueprint. Use the blueprint's notation exactly.
+ 2. **Return contracts** — return types, dict key schemas, field names, value ranges, status codes as the blueprint specifies them.
+ 3. **Error contracts** — error conditions, exact error messages, exception types, HTTP status codes as the blueprint specifies.
+ 4. **Naming conventions** — resource naming patterns (e.g., `{app_name}-network`, `{app_name}--db-password`).
+ 5. **File paths** — where the blueprint says each deliverable should live (import paths derive from these).
+ 6. **External dependencies** — which external systems each module interacts with (for mocking).
+ 7. **Existing tests** — search the project for test files matching source file name patterns. Report paths only.
+
+ Output in this format per blueprint:
+ ```
+ ## {specialist_name} — {blueprint_file}
+ ### Module: {file_path as specified in blueprint}
+ **Import:** `from {module} import {name}`
+ **Interface:** `function_name(param: Type, ...) -> ReturnType`
+ **Return schema:** {what the blueprint says it returns — keys, types, values}
+ **Error conditions:** {what the blueprint says about errors}
+ **Naming conventions:** {patterns}
+ **Mocking targets:** {external deps}
+ **Existing tests:** {paths or 'None found'}
+ ```
+
+ CRITICAL: Extract ONLY what the blueprint SPECIFIES. Do not supplement with information from source code. If the blueprint does not specify a return schema, report 'Not specified in blueprint'. The purpose is to capture what the spec says the code SHOULD look like — not what the code actually looks like."
+
+ If no blueprints directory exists, fall back to reading source files for structural information only (function signatures, import paths, external dependencies). Use the strict call-signature format: signatures and import paths only, no return value contents, no error messages, no behavioral descriptions.
 
 ## STEP 3: TEST STRATEGY (Embedded Domain Knowledge)
 
 Using research results from Step 2, design the test plan. This is your internal reasoning — no subagent needed.
 
- ### Test Pyramid
+ ### Spec-Oracle Test Tiers
+
+ Organize tests by what drives the expected behavior, not by technical test type. Tier 1 is mandatory; Tiers 2 and 3 fill coverage gaps.
+
+ **Tier 1 — Acceptance Criteria Tests** (REQUIRED, highest priority)
+ For each AC that describes observable behavior, write at least one test at the **abstraction level the AC describes**:
+ - AC describes route behavior → test the HTTP route, verify the response
+ - AC describes engine/service outcome → test the engine's public API, verify observable output
+ - These tests catch **spec violations** — they answer "did the build produce what the spec required?"
+ - Mock external systems (Docker, Azure, git) but NOT internal modules. Test the full internal call chain.
+
+ **Tier 2 — Blueprint Behavioral Contract Tests** (REQUIRED when blueprints specify detailed contracts)
+ For each behavioral contract in blueprint Deliverable Specifications:
+ - Test specific return schemas, error conditions, naming conventions, state transitions
+ - These tests verify the **detailed behavioral contracts** specialists specified
+ - Test at the module level the blueprint describes (if blueprint specifies `start_container() -> {success, status, error}`, test that function directly)
+ - Mock external dependencies as specified in the blueprint
 
- Prioritize by value:
- - **Unit tests** (highest priority): Pure functions, business logic, data transformations, utility functions. Isolate with mocks for external dependencies only.
- - **Integration tests** (medium priority): API routes, database operations, service interactions, middleware chains. Use real dependencies where feasible, mock externals.
- - **E2E tests** (only if framework exists): Only create if the project already has an E2E framework configured. Never introduce a new E2E framework.
+ **Tier 3 — Coverage Tests** (OPTIONAL, for gap-filling)
+ After Tiers 1 and 2, if coverage target is not met:
+ - Add unit tests for untested helper functions, edge cases, error paths
+ - These tests MAY read source code to discover mockable seams (this is the ONLY tier where source code reading is allowed for test design)
+ - Label these tests clearly: `# Coverage test — not derived from spec`
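A minimal sketch of how the tiers differ in call site and mocking. The `start_container() -> {success, status, error}` contract above is the model; the `docker_start` boundary and the route-handler name are hypothetical stand-ins.

```python
from unittest.mock import patch

# Hypothetical code under test ----------------------------------------
def docker_start(name):
    # External boundary; in production this would be a Docker SDK call.
    raise RuntimeError("no Docker available")

def start_container(name):
    # Tier 2 target: the blueprint-specified contract.
    try:
        docker_start(name)
        return {"success": True, "status": "running", "error": None}
    except RuntimeError as exc:
        return {"success": False, "status": "stopped", "error": str(exc)}

def handle_start_route(app_name):
    # Tier 1 target: the entry point an AC about route behavior describes.
    result = start_container(app_name)
    return (200 if result["success"] else 500), result

# Tier 1: mock ONLY the external system; the full internal chain runs.
with patch(f"{__name__}.docker_start", return_value=None):
    status, body = handle_start_route("app")
    assert status == 200 and body["status"] == "running"

# Tier 2: test the blueprint-specified function directly.
with patch(f"{__name__}.docker_start", return_value=None):
    assert start_container("app") == {"success": True, "status": "running", "error": None}
```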
 
 ### Process Flow Coverage (if process_flow.md exists)
 
@@ -117,51 +165,38 @@ Use process_flow.md to identify cross-component integration boundaries and E2E p
 
 If process_flow.md conflicts with actual implementation, check build_report.md for accepted deviations. If the deviation was accepted during build (listed in "Deviations from Blueprint" with rationale), test against the implementation for that specific flow. If the deviation is NOT listed as accepted, test against process_flow.md and classify any failure as a Spec Violation.
 
- ### File Type Heuristic
+ ### File-to-Tier Mapping
 
- For each modified file, classify the appropriate test type:
+ For each modified file, determine which test tier drives its testing:
 
- | File Type | Test Type | Priority |
- |-----------|-----------|----------|
- | Utility / helper | Unit | High |
- | Model / schema | Integration | High |
- | Route / controller | Integration | High |
- | Component (UI) | Component + Unit | Medium |
- | Service / repository | Integration | Medium |
- | Configuration | Skip (test indirectly) | Low |
- | Migration / seed | Skip (test via integration) | Low |
- | Static asset / style | Skip | None |
+ | File Type | Primary Tier | Rationale |
+ |-----------|-------------|-----------|
+ | Route / controller | Tier 1 (AC tests via HTTP) | ACs describe route behavior — test the route |
+ | Engine / orchestrator | Tier 1 (AC tests of engine API) | ACs describe engine outcomes — test the engine |
+ | Service / provider | Tier 2 (blueprint contract) | Blueprints specify provider contracts |
+ | Model / schema | Tier 2 (blueprint contract) | Blueprints specify data shapes |
+ | Utility / helper | Tier 3 (coverage) or Tier 2 (if blueprint specifies) | Only Tier 2 if blueprint has a deliverable spec for it |
+ | Configuration | Skip (test indirectly via Tier 1) | Config effects are observable at route/engine level |
+ | Template / static | Skip (test indirectly via Tier 1) | Template output is observable in route responses |
 
- ### Edge Case Enumeration
+ ### Edge Cases, Mocking, and Coverage
 
- For each testable interface:
- - **Boundary values**: min, max, zero, negative, empty string, empty array
- - **Null/undefined handling**: missing required fields, null inputs
- - **Error paths**: invalid input, failed external calls, timeout scenarios
- - **Permission edges**: unauthorized access, role boundaries (if applicable)
- - **State transitions**: before/after effects, idempotent operations
+ **Edge cases** to enumerate per interface: boundary values, null/undefined inputs, error paths (invalid input, failed external calls, timeouts), permission edges, state transitions.
 
- ### Mock Strategy
+ **Mock strategy**: Follow project conventions from Step 2. Default: mock external dependencies only. Never mock the unit under test. Tier 1/2 tests mock at system boundaries; Tier 3 may mock internal seams.
 
- Follow project conventions discovered in Step 2:
- - If project uses specific mock patterns (jest.mock, sinon, test doubles) → follow them
- - Default: mock external dependencies only (HTTP clients, databases, file system, third-party APIs)
- - Never mock the unit under test
- - Prefer dependency injection over module mocking when the codebase uses DI
-
- ### Coverage Target
-
- - If project has coverage config → match existing threshold
- - If no config → target 80% line coverage for modified files
- - Focus coverage on decision-heavy code paths (where `[USER]` and `[SPEC]` decisions were implemented)
+ **Coverage target**: Match existing config threshold, or 80% line coverage for modified files. Focus on decision-heavy code paths (`[USER]` and `[SPEC]` decisions).
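A small sketch of "mock at the boundary, never the unit under test", using hypothetical names and an injected client standing in for the external HTTP dependency:

```python
def fetch_price(get, sku):
    # `get` is the injected HTTP boundary (a real client in production).
    return get(f"/prices/{sku}")["amount"]

def price_with_tax(get, sku, rate=0.2):
    # Unit under test: must actually execute in the test.
    return round(fetch_price(get, sku) * (1 + rate), 2)

# Mock at the system boundary: stub the client, let both functions run.
stub_get = lambda path: {"amount": 10.0}
assert price_with_tax(stub_get, "sku-1") == 12.0

# Anti-pattern (never do this): patching price_with_tax itself would mean
# the unit under test never executes and the assertion proves nothing.
```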
 
 ### Spec Oracle Hierarchy
 
- Tests derive their expected behavior from spec artifacts, NOT from reading source code. The oracle has three tiers, used in combination:
+ Tests derive expected behavior from spec artifacts, NOT from reading source code. Each oracle maps to a test tier:
 
- 1. **Acceptance criteria** (outline.md) — Primary oracle for behavioral correctness. Each criterion describes an observable outcome the implementation must produce. Tests assert these outcomes directly.
- 2. **Blueprint deliverable specs** (blueprints or test_advisory.md) — Secondary oracle for domain-specific assertions, edge cases, and expected input/output examples. Use Section 6 (Acceptance Mapping) and Section 9 (Producer Handoff) for concrete expected behaviors.
- 3. **Process flow** (process_flow.md) — Tertiary oracle for integration contracts and cross-component handoffs. Subject to accepted deviations (see Process Flow Coverage above).
+ | Oracle | Spec Source | Drives Test Tier | What it defines |
+ |--------|------------|-----------------|-----------------|
+ | **Primary** | outline.md acceptance criteria | Tier 1 | Observable outcomes the system must produce |
+ | **Secondary** | blueprints (Section 5 + 6) | Tier 2 | Detailed behavioral contracts: return schemas, error tables, naming conventions, state machines |
+ | **Tertiary** | process_flow.md | Tier 1 + 2 | Integration seams, cross-component handoffs, state mutations, error propagation |
+ | **Advisory** | test_advisory.md | Tier 2 + 3 | Edge cases, critical paths, failure modes (supplements, not replaces, blueprints) |
 
 When a test fails, the failure means the implementation disagrees with the spec — that is a finding, not automatically a bug in either the test or the code. See Step 6 Classify Failures for how to handle this.
 
@@ -169,11 +204,15 @@ When a test fails, the failure means the implementation disagrees with the spec
 
 For every acceptance criterion in outline.md that describes observable behavior ("displays X", "uses Y for Z", "produces output containing W"):
 
- 1. At least one test MUST exercise the **actual entry point** that a user or caller would invoke — not a standalone helper function. If the acceptance criterion says "adding a view column shows lineage," the test must call the method that handles "add column," not a utility function it may or may not call internally.
- 2. The test MUST assert on the **expected output as described by the spec** (acceptance criterion + blueprint deliverable spec) — not on whatever the implementation happens to return.
+ 1. At least one **Tier 1** test MUST exercise the **actual entry point at the abstraction level the AC describes**. Read the AC carefully to determine the right level:
+ - AC mentions HTTP routes or UI behavior → test the route (e.g., `TestClient.post("/admin/container/app/start")`)
+ - AC mentions engine or service behavior → test the engine's public API (e.g., `engine.run(context)`)
+ - AC mentions CLI output → test the CLI command
+ - NEVER satisfy an AC exclusively with a unit test of an internal helper function
+ 2. The test MUST assert on the **expected output as described by the spec** (acceptance criterion + blueprint deliverable spec). Every assertion value must be traceable to a spec document.
 3. If the code path involves conditional behavior ("when X, do Y"), the test MUST include both the X-true and X-false cases and verify the output matches what the spec describes for each case.
 
- Tests that only exercise isolated helper functions satisfy unit coverage but do NOT satisfy acceptance criteria coverage. Both are needed.
+ Tier 2 tests of internal functions supplement Tier 1 but do NOT substitute for them. Every AC needs Tier 1 coverage.
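Rule 3 above, sketched with a hypothetical AC and implementation: one test per branch, each asserting the spec-described output for that branch.

```python
def render_banner(user):
    # Hypothetical implementation of the AC "when the user is an admin,
    # show the maintenance banner; otherwise show nothing".
    return "MAINTENANCE" if user.get("is_admin") else ""

# X-true branch: expected value comes from the AC, not from the code.
assert render_banner({"is_admin": True}) == "MAINTENANCE"
# X-false branch: the spec-described output for the other case.
assert render_banner({"is_admin": False}) == ""
```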
 
 ### Specialist Test Recommendations
 
@@ -188,11 +227,12 @@ Specialists have domain expertise about what should be tested. Incorporate their
 Write the test strategy to `{context_path}/scratch/test_strategy.md`. This serves as both an audit trail and a resume marker for crash recovery.
 
 The test strategy document MUST contain:
- - Test files to create (path, type, target source file)
- - Test cases per file (name, type, what it validates, **which spec artifact defines the expected behavior**)
- - Mock requirements per file
+ - **AC coverage matrix**: For each acceptance criterion, which test(s) cover it, at what tier, and at what abstraction level. Every AC with observable behavior MUST have at least one Tier 1 test.
+ - Test files to create (path, tier, target source file)
+ - Test cases per file (name, tier, what it validates, **which spec artifact defines the expected behavior**, **what the spec says the expected output is**)
+ - Mock requirements per file (mock external deps only for Tier 1/2; Tier 3 may mock internal seams)
 - Framework command to run tests
- - Estimated test count and distribution
+ - Estimated test count and distribution by tier
 - Which specialist recommendations were incorporated (and which were skipped, with rationale)
 - Any acceptance criteria where the expected behavior is ambiguous (flagged for potential SPEC_AMBIGUOUS markers)
 
@@ -204,9 +244,12 @@ Present the test plan via AskUserQuestion:
 Question: "Test plan ready:
 
 **Framework:** [detected framework]
- **Test files:** [N] files ([M] unit, [P] integration)
+ **Test files:** [N] files
 **Test cases:** ~[total] tests covering [file count] modified files
- **Key areas:** [2-3 bullet points of most important test targets]
+ - Tier 1 (AC tests): [N] tests covering [M] of [P] acceptance criteria
+ - Tier 2 (blueprint contracts): [N] tests
+ - Tier 3 (coverage): [N] tests
+ **AC coverage:** [M]/[P] acceptance criteria have Tier 1 tests [list any uncovered ACs]
 **Coverage target:** [threshold]%
 
 Proceed?"
@@ -226,33 +269,114 @@ Options:
 
 Delegate test creation to `intuition-code-writer` agents. Parallelize independent test files (multiple Task calls in a single response). Do NOT use `run_in_background` — you MUST wait for ALL subagents to return before proceeding to Step 6.
 
- For each test file, spawn an `intuition-code-writer` agent:
+ For each test file, spawn an `intuition-code-writer` agent with a tier-appropriate prompt:
+
+ ### Tier 1 and Tier 2 Test Writer Prompt
 
 ```
- You are a test writer. Your job is to write tests that verify the code does what the SPEC says — not to verify the code does what the code does.
+ You are a spec-first test writer. Your tests verify the code does what the SPEC says — not what the code happens to do. You will NOT read source code.
 
 **Framework:** [detected framework + version]
 **Test conventions:** [naming pattern, directory structure, import style from Step 2]
 **Mock patterns:** [project's established mock approach from Step 2]
 
- **Interface stub (from Step 2 research):**
- [Paste the interface stub for this source file — signatures, exports, types, import paths. This is how you call the code.]
+ **Blueprint-derived interfaces (from Step 2 research):**
+ [Paste the blueprint interface extraction for this module — signatures, return schemas, error contracts, naming conventions, import paths. This comes from the BLUEPRINT, not from source code.]
 
 **Spec oracle — what the code SHOULD do:**
 - Acceptance criteria: [paste relevant acceptance criteria from outline.md]
- - Blueprint spec: Read [relevant blueprint path] — use Deliverable Specification and Acceptance Mapping sections to determine expected inputs, outputs, and behaviors
- - Flow context (integration/E2E tests only): Read `{context_path}/process_flow.md` (if exists) for cross-component contracts
+ - Blueprint spec: Read [relevant blueprint path] — Section 5 (Deliverable Specification) for detailed contracts, Section 6 (Acceptance Mapping) for AC-to-deliverable mapping
+ - Flow context: Read `{context_path}/process_flow.md` (if exists) for integration seams, state mutations, error propagation paths
+ - Test advisory: [paste relevant section from test_advisory.md] for edge cases and failure modes
 
+ **Test tier:** [Tier 1 or Tier 2]
 **Test file path:** [target test file path]
 **Test cases to implement:**
- [List each test case with: name, type, what it validates per the spec, expected behavior from spec, mock requirements]
+ [List each test case with: name, tier, what it validates per the spec, expected behavior FROM SPEC (quote the source), mock requirements]
+
+ ## FILE ACCESS RULES
+ - You MAY read: blueprint files, outline.md, process_flow.md, test_advisory.md
+ - You MAY read: existing test files in the test directory (for conventions only)
+ - You MUST NOT read source files being tested: [list source file paths]
+ - You MUST NOT use Grep or Glob to search source files
+
+ ## ASSERTION SOURCING RULES
+ For EVERY assertion that checks a specific value (exact string, number, status code, dict key):
+ 1. Add a comment citing the spec source: `# blueprint:{specialist}:L{line} — "{spec quote}"`
+ 2. If no spec document defines the expected value: mark `# SPEC_AMBIGUOUS: spec says "{quote}" — value not specified`
+
+ For Tier 1 tests:
+ - Test at the abstraction level the AC describes (HTTP routes, CLI output, observable state changes)
+ - Mock ONLY external systems (Docker, databases, HTTP clients, cloud APIs) — do NOT mock internal modules
+ - Assertions should verify user-observable outcomes, not internal function return values
+
+ For Tier 2 tests:
+ - Test at the module level the blueprint describes
+ - Mock external dependencies as the blueprint specifies
+ - Assertions should verify the behavioral contracts from the blueprint's Deliverable Specification
+
+ Write the complete test file. Follow existing test style. Do NOT add test infrastructure.
+ ```
+
+ ### Tier 3 Test Writer Prompt (coverage gap-filling only)
 
- CRITICAL: Derive ALL assertions from the spec artifacts above. Do NOT read [source file path] to determine expected return values or behavior. You have the interface stub for structural info (how to call the code). The spec tells you what it should return. If the spec is ambiguous about a specific expected value, write the test with a clear comment marking the assertion as "SPEC_AMBIGUOUS" so the orchestrator can flag it.
+ ```
+ You are a coverage test writer. Your job is to increase test coverage for code paths not covered by Tier 1/2 spec tests.
+
+ **Framework:** [detected framework + version]
+ **Test conventions:** [naming pattern, directory structure, import style from Step 2]
+ **Source file to cover:** Read [source file path] — you MAY read this file to discover testable code paths
+ **Existing coverage gaps:** [list uncovered functions/branches from coverage report]
+ **Test file path:** [target test file path]
 
- Write the complete test file to the specified path. Follow the project's existing test style exactly. Do NOT add test infrastructure (no new packages, no config changes).
+ Label every test with: `# Coverage test — not derived from spec`
+ Write focused unit tests for uncovered code paths. Follow existing test style.
 ```
 
- SYNCHRONIZATION GATE: After all subagents return, verify each test file exists on disk using Glob. If any file is missing, retry that subagent once (foreground) with error context. Do NOT proceed to Step 6 until every planned test file is confirmed on disk.
+ SYNCHRONIZATION GATE: After all subagents return, verify each test file exists on disk using Glob. If any file is missing, retry that subagent once (foreground) with error context. Do NOT proceed to Step 5.5 until every planned test file is confirmed on disk.
+
+ ## STEP 5.5: SPEC COMPLIANCE AUDIT
+
+ Before running tests, verify two things: (A) assertions trace to spec, and (B) ACs are tested at the right abstraction level.
+
+ ### Part A: Assertion Provenance
+
+ For each Tier 1 and Tier 2 test file, identify every assertion that checks a **specific value** (exact strings, status codes, dict keys, field values, call arguments).
+
+ For each value-assertion, check:
+ 1. Does it have a `# blueprint:` or `# SPEC_AMBIGUOUS:` comment citing the source?
+ 2. If no comment, does the value appear in a spec document (outline, blueprint, process_flow, test_advisory)?
+
+ Assertions without spec provenance AND without SPEC_AMBIGUOUS markers are **source-derived**. (Tier 3 tests are exempt — they are explicitly implementation-derived.)
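The two citation forms look like this in practice (hypothetical unit under test; the blueprint line number and spec quotes are illustrative):

```python
def classify_health(status_code):
    # Hypothetical unit under test.
    return "healthy" if status_code == 200 else "degraded"

# Spec-traced value-assertion, using the citation format from Step 5:
# blueprint:platform:L87 — "the health check reports 'healthy' on HTTP 200"
assert classify_health(200) == "healthy"

# Value the spec never pins down: mark it rather than silently asserting
# whatever the implementation returns.
# SPEC_AMBIGUOUS: spec says "non-200 responses are reported as unhealthy";
# the exact label is not specified, asserting current implementation value
assert classify_health(503) == "degraded"
```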
+
+ ### Part B: Abstraction Level Coverage
+
+ For each acceptance criterion in outline.md that describes observable behavior:
+ 1. Check: is there at least one Tier 1 test that exercises the AC at the abstraction level it describes?
+ 2. If an AC describes HTTP route behavior but the only test is a unit test of an internal function → flag as **abstraction gap**
+
+ Example of an abstraction gap:
+ - AC T2.3: "Container operations execute successfully and status updates reflect within the next poll cycle"
+ - Only test: `test_start_container_success()` which calls `start_container()` directly and checks `result["success"]`
+ - Gap: No test exercises the actual HTTP route `POST /admin/container/{app_name}/start` and verifies the response
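A sketch of the Tier 1 test that would close this gap, with a plain handler standing in for the HTTP layer and only the Docker boundary mocked (all names beyond the route path quoted above are hypothetical):

```python
from unittest.mock import patch

STATUS = {"app": "stopped"}  # stand-in for the poll-cycle status cache

def docker_start(name):
    # External boundary (mocked in the test below).
    raise RuntimeError("no Docker available")

def start_route(app_name):
    # Stand-in for the POST /admin/container/{app_name}/start handler.
    try:
        docker_start(app_name)
        STATUS[app_name] = "running"
        return 200, {"success": True}
    except RuntimeError as exc:
        return 500, {"success": False, "error": str(exc)}

# Tier 1 test at the AC's abstraction level: route response + polled status.
with patch(f"{__name__}.docker_start", return_value=None):
    code, body = start_route("app")
    assert code == 200 and body["success"]
    assert STATUS["app"] == "running"  # visible to the next poll cycle
```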
+
+ ### Reporting
+
+ If Part A finds >20% source-derived assertions OR Part B finds any abstraction gaps, present via AskUserQuestion:
+
+ ```
+ Header: "Spec Compliance Audit"
+ Question: "[summary of findings]
+
+ **Provenance:** [N] of [M] Tier 1/2 assertions lack spec citation [if applicable]
+ **Abstraction gaps:** [list ACs with only lower-level coverage] [if applicable]
+
+ Options: fix issues / accept as-is / skip to Step 6"
+ ```
+
+ **If "fix issues":** Delegate to `intuition-code-writer` subagents. For provenance gaps, add spec citations or SPEC_AMBIGUOUS markers. For abstraction gaps, create additional Tier 1 tests at the AC's described abstraction level.
+
+ **If "accept as-is":** Note findings in test report. Proceed to Step 6.
 
 ## STEP 6: RUN TESTS + FIX CYCLE
 
@@ -317,15 +441,16 @@ Write `{context_path}/test_report.md`:
 **Status:** Pass | Partial | Failed
 
 ## Test Summary
- - **Tests created:** [N]
+ - **Tests created:** [N] (Tier 1: [N], Tier 2: [N], Tier 3: [N])
 - **Passing:** [N]
 - **Failing:** [N]
+ - **AC coverage:** [M]/[P] acceptance criteria have Tier 1 tests
 - **Coverage:** [X]% (target: [Y]%)
 
 ## Test Files Created
- | File | Tests | Covers |
- |------|-------|--------|
- | [path] | [count] | [source file — what it tests] |
+ | File | Tier | Tests | Covers |
+ |------|------|-------|--------|
+ | [path] | [1/2/3] | [count] | [what it tests — AC reference or blueprint section] |
 
 ## Failures & Resolutions
 
@@ -345,6 +470,12 @@ Write `{context_path}/test_report.md`:
 |-------|--------|
 | [description] | [why not fixable: USER decision conflict / architectural / scope creep / max retries] |
 
+ ## Assertion Provenance
+ - Value-assertions audited: **[N]**
+ - Spec-traced: **[N]** (value found in outline, blueprint, process_flow, or test_advisory)
+ - SPEC_AMBIGUOUS marked: **[N]** (spec underspecified, asserting implementation value)
+ - Source-derived (untraced): **[N]** [if any — list examples and user disposition: "accepted as-is" / "fixed"]
+
 ## Decision Compliance
 - Checked **[N]** decisions across **[M]** specialist decision logs
 - `[USER]` violations: [count — list any, or "None"]