@tgoodington/intuition 10.3.0 → 10.5.0
- package/package.json +1 -1
- package/skills/intuition-test/SKILL.md +260 -110
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@tgoodington/intuition",
-  "version": "10.3.0",
+  "version": "10.5.0",
   "description": "Domain-adaptive workflow system for Claude Code: prompt, outline, assemble specialist teams, detail with domain experts, build with format producers, test code output. Supports v8 compat (design, engineer, build) and v9 specialist workflows with 14 domain specialists and 6 format producers.",
   "keywords": [
     "claude-code",
package/skills/intuition-test/SKILL.md
CHANGED

@@ -44,6 +44,7 @@ Step 2: Analyze test infrastructure (2 parallel intuition-researcher agents)
 Step 3: Design test strategy (self-contained domain reasoning)
 Step 4: Confirm test plan with user
 Step 5: Create tests (delegate to sonnet code-writer subagents)
+Step 5.5: Spec compliance audit (assertion provenance + abstraction level coverage)
 Step 6: Run tests + fix cycle (debugger-style autonomy)
 Step 7: Write test_report.md
 Step 8: Exit Protocol (state update, completion)
@@ -66,26 +67,13 @@ Read these files:
 1. `{context_path}/build_report.md` — REQUIRED. Extract: files modified, task results, deviations from blueprints, decision compliance notes.
 2. `{context_path}/outline.md` — acceptance criteria per task.
 3. `{context_path}/process_flow.md` (if exists) — end-to-end user flows, component interactions, data paths, error paths. Primary source for designing integration and E2E tests. If this file does not exist (non-code project or Lightweight workflow), proceed without it.
-4. `{context_path}/test_advisory.md` — compact testability notes
-
-
-
-
-
-
-- **Task results** — which tasks passed/failed build review
-- **Deviations** — any blueprint deviations that may need test coverage
-- **Decision compliance** — any flagged decision issues
-- **Test Deliverables Deferred** — test specs/files that specialists recommended but build skipped (if this section exists)
-
-From test_advisory.md (or blueprints as fallback), extract domain test knowledge:
-- Edge cases, critical paths, failure modes, and boundary conditions flagged by specialists
-- Any test-relevant domain insights
-
-From decisions files, build a decision index:
-- Map each `[USER]` decision to its chosen option
-- Map each `[SPEC]` decision to its chosen option and rationale
-- This index is used in Step 6 for fix boundary checking
+4. `{context_path}/test_advisory.md` — compact testability notes: edge cases, critical paths, failure modes per specialist.
+5. `{context_path}/blueprints/*.md` — REQUIRED for spec-first testing. Blueprints contain the detailed behavioral contracts that define expected behavior: return schemas, error conditions, API endpoint specs, naming conventions, and state machine definitions. Read ALL blueprints. Focus on Section 5 (Deliverable Specification) and Section 6 (Acceptance Mapping) — these contain the concrete expected behaviors that tests assert against. If no blueprints directory exists, proceed with test_advisory and outline only.
+6. `{context_path}/team_assignment.json` — producer assignments (identify code-writer tasks).
+7. ALL files matching `{context_path}/scratch/*-decisions.json` — decision tiers and chosen options per specialist.
+8. `docs/project_notes/decisions.md` — project-level ADRs.
+
+From these files, extract: **build_report** → files modified (scope boundary), task results, deviations, decision compliance, deferred test deliverables. **Blueprints** → Section 5 behavioral contracts (signatures, return schemas, error conditions, naming), Section 6 AC mapping, Section 9 file paths. **test_advisory** → edge cases, critical paths, failure modes. **Decisions** → index of all [USER] and [SPEC] decisions with chosen options (used in Step 6 boundary checking).

 ## STEP 2: RESEARCH (2 Parallel Research Agents)

@@ -94,19 +82,50 @@ Spawn two `intuition-researcher` agents in parallel (both Task calls in a single
 **Agent 1 — Test Infrastructure:**
 "Search the project for test infrastructure. Find: test framework and runner (jest, vitest, mocha, pytest, etc.), test configuration files, existing test directories and naming conventions, mock/fixture patterns, test utility helpers, CI test commands, coverage configuration and thresholds. Report exact paths and configuration values."

-**Agent 2 — Interface Extraction:**
-"Read each
+**Agent 2 — Blueprint Interface Extraction:**
+"Read each blueprint in `{context_path}/blueprints/`. Do NOT read any source code files. For each blueprint, extract from the Deliverable Specification section (Section 5):
+
+1. **Specified interfaces** — function/method signatures, class definitions, constructor args as described in the blueprint. Use the blueprint's notation exactly.
+2. **Return contracts** — return types, dict key schemas, field names, value ranges, status codes as the blueprint specifies them.
+3. **Error contracts** — error conditions, exact error messages, exception types, HTTP status codes as the blueprint specifies.
+4. **Naming conventions** — resource naming patterns (e.g., `{app_name}-network`, `{app_name}--db-password`).
+5. **File paths** — where the blueprint says each deliverable should live (import paths derive from these).
+6. **External dependencies** — which external systems each module interacts with (for mocking).
+7. **Existing tests** — search the project for test files matching source file name patterns. Report paths only.
+
+Output per blueprint as: `## {specialist} — {file}` then per module: Import, Interface, Return schema, Error conditions, Naming conventions, Mocking targets, Existing tests. Mark any unspecified field as 'Not specified in blueprint'.
+
+CRITICAL: Extract ONLY what the blueprint SPECIFIES — not what the source code does."
+
+If no blueprints directory exists, fall back to reading source files for structural information only (function signatures, import paths, external dependencies). Use the strict call-signature format: signatures and import paths only, no return value contents, no error messages, no behavioral descriptions.

 ## STEP 3: TEST STRATEGY (Embedded Domain Knowledge)

 Using research results from Step 2, design the test plan. This is your internal reasoning — no subagent needed.

-### Test
+### Spec-Oracle Test Tiers
+
+Organize tests by what drives the expected behavior, not by technical test type. Tier 1 is mandatory; Tiers 2 and 3 fill coverage gaps.
+
+**Tier 1 — Acceptance Criteria Tests** (REQUIRED, highest priority)
+For each AC that describes observable behavior, write at least one test at the **abstraction level the AC describes**:
+- AC describes route behavior → test the HTTP route, verify the response
+- AC describes engine/service outcome → test the engine's public API, verify observable output
+- These tests catch **spec violations** — they answer "did the build produce what the spec required?"
+- Mock external systems (Docker, Azure, git) but NOT internal modules. Test the full internal call chain.
+
+**Tier 2 — Blueprint Behavioral Contract Tests** (REQUIRED when blueprints specify detailed contracts)
+For each behavioral contract in blueprint Deliverable Specifications:
+- Test specific return schemas, error conditions, naming conventions, state transitions
+- These tests verify the **detailed behavioral contracts** specialists specified
+- Test at the module level the blueprint describes (if blueprint specifies `start_container() -> {success, status, error}`, test that function directly)
+- Mock external dependencies as specified in the blueprint

-
-
--
--
+**Tier 3 — Coverage Tests** (OPTIONAL, for gap-filling)
+After Tiers 1 and 2, if coverage target is not met:
+- Add unit tests for untested helper functions, edge cases, error paths
+- These tests MAY read source code to discover mockable seams (this is the ONLY tier where source code reading is allowed for test design)
+- Label these tests clearly: `# Coverage test — not derived from spec`

 ### Process Flow Coverage (if process_flow.md exists)

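To make the Tier 1 rule concrete (drive the entry point the AC describes, mock only the external system), here is a hedged pytest-style sketch. The route handler, the Docker client shape, and the expected payload are invented for illustration and are not part of this skill:

```python
from unittest.mock import Mock

# Hypothetical route handler: the entry point a caller would actually invoke.
def handle_status(request, docker_client):
    container = docker_client.containers.get(request["app_name"])
    return {"status_code": 200, "body": {"status": container.status}}

def test_status_route_reports_running_container():
    # Tier 1: mock ONLY the external system (Docker), not internal modules.
    docker = Mock()
    docker.containers.get.return_value.status = "running"

    response = handle_status({"app_name": "myapp"}, docker)

    # Assert the user-observable outcome the AC describes, not helper internals.
    assert response["status_code"] == 200
    assert response["body"]["status"] == "running"
```

The test exercises `handle_status` itself rather than a helper it may call internally, which is what distinguishes AC coverage from plain unit coverage.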
@@ -117,63 +136,44 @@ Use process_flow.md to identify cross-component integration boundaries and E2E p

 If process_flow.md conflicts with actual implementation, check build_report.md for accepted deviations. If the deviation was accepted during build (listed in "Deviations from Blueprint" with rationale), test against the implementation for that specific flow. If the deviation is NOT listed as accepted, test against process_flow.md and classify any failure as a Spec Violation.

-### File
+### File-to-Tier Mapping

-
+| File Type | Primary Tier |
+|-----------|-------------|
+| Route / controller | Tier 1 (AC tests via HTTP) |
+| Engine / orchestrator | Tier 1 (AC tests of engine API) |
+| Service / provider | Tier 2 (blueprint contract) |
+| Model / schema | Tier 2 (blueprint contract) |
+| Utility / helper | Tier 3, or Tier 2 if blueprint specifies |
+| Configuration / Template | Skip (test indirectly via Tier 1) |

-
-|-----------|-----------|----------|
-| Utility / helper | Unit | High |
-| Model / schema | Integration | High |
-| Route / controller | Integration | High |
-| Component (UI) | Component + Unit | Medium |
-| Service / repository | Integration | Medium |
-| Configuration | Skip (test indirectly) | Low |
-| Migration / seed | Skip (test via integration) | Low |
-| Static asset / style | Skip | None |
+### Tier Distribution Minimums

-
+The test plan MUST satisfy these ratios (calculated against total test count):
+- **Tier 1 ≥ 40%** — If the plan has fewer than 40% Tier 1 tests, add more AC-level tests before adding Tier 2/3. If there are not enough ACs to reach 40%, document why in the test strategy.
+- **Tier 3 ≤ 30%** — Coverage gap-fillers must not dominate the suite. If Tier 3 exceeds 30%, cut the lowest-value coverage tests.

-
-- **Boundary values**: min, max, zero, negative, empty string, empty array
-- **Null/undefined handling**: missing required fields, null inputs
-- **Error paths**: invalid input, failed external calls, timeout scenarios
-- **Permission edges**: unauthorized access, role boundaries (if applicable)
-- **State transitions**: before/after effects, idempotent operations
+### Negative Test Minimums

-
+At least **30% of Tier 1 and Tier 2 tests** must exercise error/failure/invalid-input paths: invalid inputs, dependency failures (timeout, connection refused), state violations (e.g., stopping a non-running container), missing config. If the spec doesn't describe error behavior, flag as spec gap with `# SPEC_AMBIGUOUS` — do NOT skip negative testing.

-
-- If project uses specific mock patterns (jest.mock, sinon, test doubles) → follow them
-- Default: mock external dependencies only (HTTP clients, databases, file system, third-party APIs)
-- Never mock the unit under test
-- Prefer dependency injection over module mocking when the codebase uses DI
+### Edge Cases, Mocking, and Coverage

-
+**Edge cases** to enumerate per interface: boundary values, null/undefined inputs, error paths (invalid input, failed external calls, timeouts), permission edges, state transitions.

-
-- If no config → target 80% line coverage for modified files
-- Focus coverage on decision-heavy code paths (where `[USER]` and `[SPEC]` decisions were implemented)
+**Mock strategy**: Follow project conventions from Step 2. Default: mock external dependencies only. Never mock the unit under test. Tier 1/2 tests mock at system boundaries; Tier 3 may mock internal seams.

-
+**Mock depth rule for infrastructure/DevOps projects**: When the project orchestrates external systems (Docker, cloud APIs, CLI tools, databases), pure-mock tests risk testing only mock setup. For each external-system wrapper, at least one Tier 1 test MUST assert **mock interaction depth** — not just return values, but that the mock was called with correct arguments, order, and count per the blueprint spec.

-
+**Coverage target**: Match existing config threshold, or 80% line coverage for modified files. Focus on decision-heavy code paths (`[USER]` and `[SPEC]` decisions).

-
-2. **Blueprint deliverable specs** (blueprints or test_advisory.md) — Secondary oracle for domain-specific assertions, edge cases, and expected input/output examples. Use Section 6 (Acceptance Mapping) and Section 9 (Producer Handoff) for concrete expected behaviors.
-3. **Process flow** (process_flow.md) — Tertiary oracle for integration contracts and cross-component handoffs. Subject to accepted deviations (see Process Flow Coverage above).
+### Spec Oracle Hierarchy

-
+Tests derive expected behavior from specs, NOT source code. Oracle priority: **outline.md ACs** (Tier 1) → **blueprints Sections 5+6** (Tier 2) → **process_flow.md** (Tier 1+2 integration) → **test_advisory.md** (advisory, Tier 2+3). When a test fails, the implementation disagrees with the spec — classify per Step 6, don't assume either is wrong.

 ### Acceptance Criteria Path Coverage

-For every
-
-1. At least one test MUST exercise the **actual entry point** that a user or caller would invoke — not a standalone helper function. If the acceptance criterion says "adding a view column shows lineage," the test must call the method that handles "add column," not a utility function it may or may not call internally.
-2. The test MUST assert on the **expected output as described by the spec** (acceptance criterion + blueprint deliverable spec) — not on whatever the implementation happens to return.
-3. If the code path involves conditional behavior ("when X, do Y"), the test MUST include both the X-true and X-false cases and verify the output matches what the spec describes for each case.
-
-Tests that only exercise isolated helper functions satisfy unit coverage but do NOT satisfy acceptance criteria coverage. Both are needed.
+For every AC with observable behavior, at least one Tier 1 test MUST exercise the **actual entry point at the AC's abstraction level** (HTTP route → test the route, engine API → test the engine, CLI → test the command). NEVER satisfy an AC exclusively with a unit test of an internal helper. Assertions MUST match spec-defined expected output. Conditional behavior ("when X, do Y") requires both branches tested. Tier 2 supplements but does NOT substitute for Tier 1.

 ### Specialist Test Recommendations

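The tier-distribution and negative-test minimums are plain ratio checks over the planned test list. A minimal sketch, assuming each planned test is recorded as a dict with `tier` and `negative` keys (a shape invented here for illustration, not fixed by the skill):

```python
from collections import Counter

def check_distribution(tests):
    """Validate tier ratios for a planned test list.

    `tests` is a list of dicts with keys "tier" (1, 2, or 3) and
    "negative" (True for error/failure-path tests).
    Returns a list of human-readable violations; empty means the plan holds.
    """
    total = len(tests)
    tiers = Counter(t["tier"] for t in tests)
    violations = []

    # Tier 1 must be at least 40% of the whole suite.
    if tiers[1] / total < 0.40:
        violations.append(f"Tier 1 is {tiers[1] / total:.0%} (< 40%)")

    # Tier 3 gap-fillers must not exceed 30% of the suite.
    if tiers[3] / total > 0.30:
        violations.append(f"Tier 3 is {tiers[3] / total:.0%} (> 30%)")

    # At least 30% of Tier 1/2 tests must exercise negative paths.
    spec_tests = [t for t in tests if t["tier"] in (1, 2)]
    negatives = sum(t["negative"] for t in spec_tests)
    if spec_tests and negatives / len(spec_tests) < 0.30:
        violations.append(f"negative tests are {negatives / len(spec_tests):.0%} of Tier 1/2 (< 30%)")

    return violations
```

Running a check like this against the draft plan before Step 4 makes the "adjust plan before proceeding" gate mechanical.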
@@ -181,18 +181,22 @@ Before finalizing the test plan, review specialist domain knowledge from bluepri
 - **Testability Notes**: Edge cases, critical paths, failure modes, and boundary conditions from each blueprint's Approach section (Section 3, `### Testability Notes` subheading)
 - **Deferred test deliverables**: Any test specs from build_report.md's "Test Deliverables Deferred" section (legacy — older blueprints may still include test files in Producer Handoff)

-
+Incorporate specialist insights as advisory, not prescriptive — you own the test strategy.

 ### Output

 Write the test strategy to `{context_path}/scratch/test_strategy.md`. This serves as both an audit trail and a resume marker for crash recovery.

 The test strategy document MUST contain:
--
--
--
+- **AC coverage matrix**: For each acceptance criterion, which test(s) cover it, at what tier, and at what abstraction level. Every AC with observable behavior MUST have at least one Tier 1 test.
+- **Tier distribution**: Total count per tier with percentages. Verify: Tier 1 ≥ 40%, Tier 3 ≤ 30%. If not met, adjust plan before proceeding.
+- **Negative test inventory**: List each negative/error-path test explicitly. Verify: ≥ 30% of Tier 1/2 tests are negative. If not met, add more error-path tests.
+- Test files to create (path, tier, target source file)
+- Test cases per file (name, tier, positive/negative, what it validates, **which spec artifact defines the expected behavior**, **what the spec says the expected output is**)
+- Mock requirements per file (mock external deps only for Tier 1/2; Tier 3 may mock internal seams). For infra projects: flag files needing mock-depth assertions (call args, call order, call count).
 - Framework command to run tests
-- Estimated test count and distribution
+- Estimated test count and distribution by tier
+- **Mutation spot-check candidates**: 3 source files with highest Tier 1/2 coverage, and one candidate mutation per file
 - Which specialist recommendations were incorporated (and which were skipped, with rationale)
 - Any acceptance criteria where the expected behavior is ambiguous (flagged for potential SPEC_AMBIGUOUS markers)

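The AC coverage matrix is mechanically checkable before the plan is presented. A minimal sketch, assuming the matrix maps each AC id to its covering `(test_name, tier)` pairs (the data shape is an illustration, not part of the skill):

```python
def uncovered_acs(matrix):
    """Find acceptance criteria lacking a Tier 1 test.

    `matrix` maps AC id -> list of (test_name, tier) pairs.
    Every AC with observable behavior needs at least one Tier 1 entry;
    lower-tier coverage alone counts as an abstraction gap.
    """
    return sorted(ac for ac, tests in matrix.items()
                  if not any(tier == 1 for _, tier in tests))
```

An empty return value corresponds to the "[M]/[P] acceptance criteria have Tier 1 tests" line in the Step 4 confirmation reading M == P.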
@@ -204,10 +208,15 @@ Present the test plan via AskUserQuestion:
 Question: "Test plan ready:

 **Framework:** [detected framework]
-**Test files:** [N] files
+**Test files:** [N] files
 **Test cases:** ~[total] tests covering [file count] modified files
-
+- Tier 1 (AC tests): [N] tests ([X]% of total, min 40%) covering [M] of [P] acceptance criteria
+- Tier 2 (blueprint contracts): [N] tests
+- Tier 3 (coverage): [N] tests ([X]% of total, max 30%)
+**Negative tests:** [N] of [M] Tier 1/2 tests ([X]%, min 30%)
+**AC coverage:** [M]/[P] acceptance criteria have Tier 1 tests [list any uncovered ACs]
 **Coverage target:** [threshold]%
+**Post-pass:** Mutation spot-check on 3 files

 Proceed?"

@@ -226,33 +235,131 @@ Options:

 Delegate test creation to `intuition-code-writer` agents. Parallelize independent test files (multiple Task calls in a single response). Do NOT use `run_in_background` — you MUST wait for ALL subagents to return before proceeding to Step 6.

-For each test file, spawn an `intuition-code-writer` agent:
+For each test file, spawn an `intuition-code-writer` agent with a tier-appropriate prompt:
+
+### Tier 1 and Tier 2 Test Writer Prompt

 ```
-You are a test writer. Your
+You are a spec-first test writer. Your tests verify the code does what the SPEC says — not what the code happens to do. You will NOT read source code.

 **Framework:** [detected framework + version]
 **Test conventions:** [naming pattern, directory structure, import style from Step 2]
 **Mock patterns:** [project's established mock approach from Step 2]

-**
-[Paste the interface
+**Blueprint-derived interfaces (from Step 2 research):**
+[Paste the blueprint interface extraction for this module — signatures, return schemas, error contracts, naming conventions, import paths. This comes from the BLUEPRINT, not from source code.]

 **Spec oracle — what the code SHOULD do:**
 - Acceptance criteria: [paste relevant acceptance criteria from outline.md]
-- Blueprint spec: Read [relevant blueprint path] —
-- Flow context
+- Blueprint spec: Read [relevant blueprint path] — Section 5 (Deliverable Specification) for detailed contracts, Section 6 (Acceptance Mapping) for AC-to-deliverable mapping
+- Flow context: Read `{context_path}/process_flow.md` (if exists) for integration seams, state mutations, error propagation paths
+- Test advisory: [paste relevant section from test_advisory.md] for edge cases and failure modes

+**Test tier:** [Tier 1 or Tier 2]
 **Test file path:** [target test file path]
 **Test cases to implement:**
-[List each test case with: name,
+[List each test case with: name, tier, what it validates per the spec, expected behavior FROM SPEC (quote the source), mock requirements]
+
+## FILE ACCESS RULES
+- You MAY read: blueprint files, outline.md, process_flow.md, test_advisory.md
+- You MAY read: existing test files in the test directory (for conventions only)
+- You MUST NOT read source files being tested: [list source file paths]
+- You MUST NOT use Grep or Glob to search source files
+
+## ASSERTION SOURCING RULES
+For EVERY assertion that checks a specific value: add `# blueprint:{specialist}:L{line} — "{spec quote}"`. If no spec defines the value: `# SPEC_AMBIGUOUS: spec says "{quote}" — value not specified`.
+
+Tier 1: test at AC's abstraction level, mock ONLY external systems, assert user-observable outcomes.
+Tier 2: test at blueprint's module level, mock external deps per blueprint, assert behavioral contracts.
+
+## ASSERTION DEPTH RULES
+Prefer DEEP assertions over shallow ones. Instead of `assert result is not None` or `assert "key" in result`, assert specific values: `assert result["network_name"] == "myapp-network"`. For infra/DevOps code: assert mock call arguments, order, and count — not just return values.
+
+Write the complete test file. Follow existing test style. Do NOT add test infrastructure.
+```
+
+### Tier 3 Test Writer Prompt (coverage gap-filling only)
+
+```
+You are a coverage test writer. Your job is to increase test coverage for code paths not covered by Tier 1/2 spec tests.
+
+**Framework:** [detected framework + version]
+**Test conventions:** [naming pattern, directory structure, import style from Step 2]
+**Source file to cover:** Read [source file path] — you MAY read this file to discover testable code paths
+**Existing coverage gaps:** [list uncovered functions/branches from coverage report]
+**Test file path:** [target test file path]
+
+Label every test with: `# Coverage test — not derived from spec`
+Write focused unit tests for uncovered code paths. Follow existing test style.
+```
+
+SYNCHRONIZATION GATE: After all subagents return, verify each test file exists on disk using Glob. If any file is missing, retry that subagent once (foreground) with error context. Do NOT proceed to Step 5.5 until every planned test file is confirmed on disk.
+
+## STEP 5.5: SPEC COMPLIANCE AUDIT

-
+Before running tests, verify two things: (A) assertions trace to spec, and (B) ACs are tested at the right abstraction level.
+
+### Part A: Assertion Provenance
+
+For each Tier 1 and Tier 2 test file, identify every assertion that checks a **specific value** (exact strings, status codes, dict keys, field values, call arguments).
+
+For each value-assertion, check:
+1. Does it have a `# blueprint:` or `# SPEC_AMBIGUOUS:` comment citing the source?
+2. If no comment, does the value appear in a spec document (outline, blueprint, process_flow, test_advisory)?
+
+Assertions without spec provenance AND without SPEC_AMBIGUOUS markers are **source-derived**. (Tier 3 tests are exempt — they are explicitly implementation-derived.)
+
+### Part B: Assertion Depth Scoring
+
+For each Tier 1 and Tier 2 test file, classify every assertion as **shallow** or **deep**:
+
+| Shallow (low signal) | Deep (high signal) |
+|---|---|
+| `is not None` | `== "expected-specific-value"` |
+| `isinstance(result, dict)` | `result["network_name"] == "myapp-network"` |
+| `"key" in result` | `mock_docker.run.assert_called_with(image="x", ports={...})` |
+| `len(result) > 0` | `error.message == "Container myapp not found"` |
+| `result["success"] == True` (when mock returns True) | `result["status"] == "running"` (verified against spec behavior) |
+
+**Threshold**: If >50% of assertions in a test file are shallow, flag the file. The test exists but proves almost nothing.
+
+**Escalation**: If >30% of ALL Tier 1/2 test files are flagged as shallow-dominant, present via AskUserQuestion:
+
+```
+Header: "Assertion Depth Warning"
+Question: "[N] of [M] test files have >50% shallow assertions.
+These tests pass trivially and won't catch real bugs.
+
+Examples: [list 2-3 worst offenders with their shallow assertion patterns]
+
+Options: fix shallow tests / accept as-is / skip to Step 6"
+```
+
+If "fix": delegate to `intuition-code-writer` agents with instructions to replace shallow assertions with specific value checks traced to blueprint specs. If the blueprint doesn't specify the value, add `SPEC_AMBIGUOUS` marker.
+
+### Part C: Abstraction Level Coverage
+
+For each acceptance criterion in outline.md that describes observable behavior:
+1. Check: is there at least one Tier 1 test that exercises the AC at the abstraction level it describes?
+2. If an AC describes HTTP route behavior but the only test is a unit test of an internal function → flag as **abstraction gap**
+
+### Reporting
+
+If Part A finds >20% source-derived assertions, Part B flags >30% shallow-dominant files, OR Part C finds any abstraction gaps, present via AskUserQuestion:

-Write the complete test file to the specified path. Follow the project's existing test style exactly. Do NOT add test infrastructure (no new packages, no config changes).
 ```
+Header: "Spec Compliance Audit"
+Question: "[summary of findings]
+
+**Provenance:** [N] of [M] Tier 1/2 assertions lack spec citation [if applicable]
+**Abstraction gaps:** [list ACs with only lower-level coverage] [if applicable]
+
+Options: fix issues / accept as-is / skip to Step 6"
+```
+
+**If "fix issues":** Delegate to `intuition-code-writer` subagents. For provenance gaps, add spec citations or SPEC_AMBIGUOUS markers. For abstraction gaps, create additional Tier 1 tests at the AC's described abstraction level.

-
+**If "accept as-is":** Note findings in test report. Proceed to Step 6.

 ## STEP 6: RUN TESTS + FIX CYCLE

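A hedged sketch of what the sourcing and depth rules produce in practice; `create_network`, the blueprint line number, and the quoted spec text are hypothetical stand-ins:

```python
from unittest.mock import Mock

# Hypothetical module under test, shaped like a blueprint contract:
# create_network(client, app_name) -> {"success": bool, "network_name": str}
def create_network(client, app_name):
    network = client.networks.create(f"{app_name}-network", driver="bridge")
    return {"success": True, "network_name": network.name}

def test_create_network_follows_naming_convention():
    mock_client = Mock()
    mock_client.networks.create.return_value.name = "myapp-network"

    result = create_network(mock_client, "myapp")

    # DEEP assertion with a (hypothetical) spec citation, per the sourcing rule:
    assert result["network_name"] == "myapp-network"  # blueprint:devops:L42 — "networks are named {app_name}-network"
    # DEEP mock-interaction assertion: correct arguments, not just a return value.
    mock_client.networks.create.assert_called_once_with("myapp-network", driver="bridge")
```

A shallow version of this test (`assert result is not None`) would still pass if the naming convention were broken, which is exactly what Part B of the audit flags.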
@@ -270,29 +377,24 @@ Also run `mcp__ide__getDiagnostics` to catch type errors and lint issues in the
|
|
|
270
377
|
|
|
271
378
|
For each failure, classify. The first question is always: **does the spec clearly define the expected behavior the test asserts?**
|
|
272
379
|
|
|
273
|
-
| Classification |
|
|
274
|
-
|
|
275
|
-
| **Test bug** (wrong assertion,
|
|
276
|
-
| **Spec Violation** (
|
|
277
|
-
| **Spec Ambiguity** (
|
|
278
|
-
| **
|
|
279
|
-
| **
|
|
280
|
-
| **
|
|
281
|
-
| **
|
|
282
|
-
| **
|
|
283
|
-
| **
|
|
380
|
+
| Classification | Action |
|
|
381
|
+
|---|---|
|
|
382
|
+
 | **Test bug** (wrong assertion, mock, import) | Fix autonomously — `intuition-code-writer` |
+| **Spec Violation** (code disagrees with clear spec) | Escalate: "expects [spec] per [source], got [actual]. Fix code / update spec / investigate?" |
+| **Spec Ambiguity** (SPEC_AMBIGUOUS or underspecified) | Escalate: "Spec unclear for [scenario]. Code does [X]. Correct? Lock in / change / skip?" |
+| **Impl bug, trivial** (1-3 lines, spec is clear) | Fix directly — `intuition-code-writer` |
+| **Impl bug, moderate** (one file, spec is clear) | Fix — `intuition-code-writer` with diagnosis |
+| **Impl bug, complex** (multi-file structural) | Escalate to user |
+| **Violates [USER] decision** | STOP — escalate immediately |
+| **Violates [SPEC] decision** | Note conflict, proceed with fix |
+| **Touches files outside build scope** | Escalate (scope creep) |

 ### Decision Boundary Checking

-Before ANY implementation fix (not test-only fixes):
-
-
-
-  - If YES → STOP. Report the conflict to the user via AskUserQuestion: "Test failure in [file] requires changing [what], but this contradicts your decision on [D{N}: title] where you chose [chosen option]. How should I proceed?" Options: "Change my decision" / "Skip this test" / "I'll fix manually"
-3. Check: does the proposed fix contradict any `[SPEC]`-tier decision?
-  - If YES → note the conflict in the test report, proceed with the fix (specialist decisions are advisory)
-4. Check: does the fix modify files NOT listed in build_report's "Files Modified" section?
-  - If YES → escalate: "Fixing [test] requires modifying [file] which wasn't part of this build. Allow scope expansion?" Options: "Allow this file" / "Skip this test"
+Before ANY implementation fix (not test-only fixes), read all `{context_path}/scratch/*-decisions.json` + `docs/project_notes/decisions.md`. Check:
+1. **[USER] decision conflict** → STOP, escalate via AskUserQuestion with options: "Change decision" / "Skip test" / "Fix manually"
+2. **[SPEC] decision conflict** → note in report, proceed with fix
+3. **File outside build scope** → escalate: "Allow scope expansion?" / "Skip test"

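The tier-based triage in the new decision-boundary check can be sketched in code. This is a minimal illustration, not part of the package: the decision-log field names (`id`, `tier`, `files`) are hypothetical assumptions, since the real `*-decisions.json` schema is not shown here.

```python
def classify_conflicts(decisions, fix_files):
    """Partition decisions touching the proposed fix into blocking
    ([USER] tier: stop and escalate) and advisory ([SPEC] tier: note
    in the report and proceed). Field names are illustrative only."""
    blocking, advisory = [], []
    for d in decisions:
        if not set(d.get("files", [])) & set(fix_files):
            continue  # decision does not touch the files this fix modifies
        if d.get("tier") == "USER":
            blocking.append(d)   # check 1: STOP, escalate via AskUserQuestion
        elif d.get("tier") == "SPEC":
            advisory.append(d)   # check 2: advisory, fix proceeds
    return blocking, advisory

# Hypothetical decision logs: one [USER] decision on auth.py, one [SPEC] on client.py
decisions = [
    {"id": "D3", "tier": "USER", "files": ["auth.py"]},
    {"id": "D7", "tier": "SPEC", "files": ["client.py"]},
]
blocking, advisory = classify_conflicts(decisions, ["auth.py"])
# blocking holds D3; advisory is empty because D7 touches no fix file
```

Check 3 (files outside build scope) would be a separate set-difference against build_report's "Files Modified" list rather than a per-decision lookup.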
 ### Fix Cycle

@@ -305,6 +407,29 @@ For each failure:

 After all failures are addressed (fixed or escalated), run the full test suite one final time to verify no regressions.

+### Mutation Spot-Check (Post-Pass Gate)
+
+After the final test run passes, perform a lightweight mutation check to verify the tests can actually detect bugs. This is NOT full mutation testing — it's a targeted sanity check.
+
+1. Select **3 source files** with the most Tier 1/2 test coverage (highest test count targeting them).
+2. For each file, make ONE small, obvious mutation via an `intuition-code-writer` agent:
+   - Change a return value (e.g., `"running"` → `"stopped"`, `True` → `False`)
+   - Change a string literal (e.g., resource name, error message)
+   - Remove a function call (e.g., comment out a validation step)
+   - The mutation MUST break behavior that at least one test claims to verify
+3. Re-run ONLY the tests targeting that file.
+4. **Expected result:** At least one test fails per mutation. If a mutation causes zero test failures, the tests covering that file are hollow.
+5. **Revert every mutation immediately** after checking (use `git checkout -- {file}` or re-apply the original content).
+
+**If any mutation survives** (0 test failures):
+- Report via AskUserQuestion: "Mutation spot-check: changed [what] in [file] — zero tests caught it. The [N] tests covering this file may be testing mock wiring rather than real behavior. Options: strengthen tests / accept risk / skip"
+- If "strengthen tests": delegate to `intuition-code-writer` with the specific mutation that survived, and instructions to add a test that would catch it.
+
+**Track results** in the test report under a new "## Mutation Spot-Check" section:
+| File | Mutation | Tests Run | Caught? |
+|------|----------|-----------|---------|
+| [path] | [what was changed] | [N] | Yes/No |
+
433
|
## STEP 7: TEST REPORT
|
|
309
434
|
|
|
310
435
|
Write `{context_path}/test_report.md`:
|
|
@@ -317,15 +442,16 @@ Write `{context_path}/test_report.md`:
|
|
|
317
442
|
**Status:** Pass | Partial | Failed
|
|
318
443
|
|
|
319
444
|
## Test Summary
|
|
320
|
-
- **Tests created:** [N]
|
|
445
|
+
- **Tests created:** [N] (Tier 1: [N], Tier 2: [N], Tier 3: [N])
|
|
321
446
|
- **Passing:** [N]
|
|
322
447
|
- **Failing:** [N]
|
|
448
|
+
- **AC coverage:** [M]/[P] acceptance criteria have Tier 1 tests
|
|
323
449
|
- **Coverage:** [X]% (target: [Y]%)
|
|
324
450
|
|
|
325
451
|
## Test Files Created
|
|
326
|
-
| File | Tests | Covers |
|
|
327
|
-
|
|
328
|
-
| [path] | [count] | [
|
|
452
|
+
| File | Tier | Tests | Covers |
|
|
453
|
+
|------|------|-------|--------|
|
|
454
|
+
| [path] | [1/2/3] | [count] | [what it tests — AC reference or blueprint section] |
|
|
329
455
|
|
|
330
456
|
## Failures & Resolutions
|
|
331
457
|
|
|
@@ -345,6 +471,30 @@ Write `{context_path}/test_report.md`:
|
|
|
345
471
|
|-------|--------|
|
|
346
472
|
| [description] | [why not fixable: USER decision conflict / architectural / scope creep / max retries] |
|
|
347
473
|
|
|
474
|
+
## Assertion Provenance
|
|
475
|
+
- Value-assertions audited: **[N]**
|
|
476
|
+
- Spec-traced: **[N]** (value found in outline, blueprint, process_flow, or test_advisory)
|
|
477
|
+
- SPEC_AMBIGUOUS marked: **[N]** (spec underspecified, asserting implementation value)
|
|
478
|
+
- Source-derived (untraced): **[N]** [if any — list examples and user disposition: "accepted as-is" / "fixed"]
|
|
479
|
+
|
|
480
|
+
## Assertion Depth
|
|
481
|
+
- Tier 1/2 files audited: **[N]**
|
|
482
|
+
- Shallow-dominant files (>50% shallow assertions): **[N]** [list any]
|
|
483
|
+
- User disposition: [fixed / accepted as-is / N/A]
|
|
484
|
+
|
|
485
|
+
## Negative Test Coverage
|
|
486
|
+
- Tier 1/2 negative tests: **[N]** of **[M]** total Tier 1/2 tests (**[X]%**, target: ≥30%)
|
|
487
|
+
- Error paths tested: [list categories — invalid input, dependency failure, state violation, etc.]
|
|
488
|
+
|
|
489
|
+
## Mutation Spot-Check
|
|
490
|
+
| File | Mutation | Tests Run | Caught? |
|
|
491
|
+
|------|----------|-----------|---------|
|
|
492
|
+
| [path] | [what was changed] | [N] | Yes/No |
|
|
493
|
+
|
|
494
|
+
- Mutations tested: **[N]**
|
|
495
|
+
- Caught: **[N]**
|
|
496
|
+
- Survived: **[N]** [list any — with disposition: strengthened / accepted risk]
|
|
497
|
+
|
|
348
498
|
## Decision Compliance
|
|
349
499
|
- Checked **[N]** decisions across **[M]** specialist decision logs
|
|
350
500
|
- `[USER]` violations: [count — list any, or "None"]
|