@tgoodington/intuition 10.4.0 → 10.6.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/package.json +1 -1
- package/skills/intuition-test/SKILL.md +245 -115
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@tgoodington/intuition",
-  "version": "10.4.0",
+  "version": "10.6.0",
   "description": "Domain-adaptive workflow system for Claude Code: prompt, outline, assemble specialist teams, detail with domain experts, build with format producers, test code output. Supports v8 compat (design, engineer, build) and v9 specialist workflows with 14 domain specialists and 6 format producers.",
   "keywords": [
     "claude-code",

package/skills/intuition-test/SKILL.md
CHANGED

@@ -25,6 +25,8 @@ These are non-negotiable. Violating any of these means the protocol has failed.
 9. You MUST run the Exit Protocol after writing the test report. NEVER route to `/intuition-handoff`.
 10. You MUST update `.project-memory-state.json` as part of the Exit Protocol.
 11. You MUST NOT use `run_in_background` for subagents in Steps 2 and 5. All research and test-creation agents MUST complete before their next step begins.
+12. You MUST NOT create tests for non-code deliverables (SKILL.md files, markdown docs, JSON config, static HTML/CSS). Pattern-matching source content is not testing. Classify deliverables in Step 1.5 and skip non-code files entirely.
+13. You MUST design smoke tests for infrastructure scripts (install, deploy, build, publish scripts) that actually execute the script in an isolated temp environment. Do NOT grep infrastructure script source code.

 ## CONTEXT PATH RESOLUTION

@@ -39,15 +41,16 @@ On startup, before reading any files:
 ## PROTOCOL: COMPLETE FLOW

 ```
-Step 1:
-Step
-Step
-Step
-Step
+Step 1: Read context (state, build_report, blueprints, decisions, outline)
+Step 1.5: Classify deliverables (code / infrastructure-script / non-code)
+Step 2: Analyze test infrastructure (2 parallel intuition-researcher agents)
+Step 3: Design test strategy — code and infrastructure only (self-contained domain reasoning)
+Step 4: Confirm test plan with user (including skipped non-code files)
+Step 5: Create tests (delegate to sonnet code-writer subagents)
 Step 5.5: Spec compliance audit (assertion provenance + abstraction level coverage)
-Step 6:
-Step 7:
-Step 8:
+Step 6: Run tests + fix cycle (debugger-style autonomy)
+Step 7: Write test_report.md
+Step 8: Exit Protocol (state update, completion)
 ```

 ## RESUME LOGIC

@@ -73,25 +76,34 @@ Read these files:
 7. ALL files matching `{context_path}/scratch/*-decisions.json` — decision tiers and chosen options per specialist.
 8. `docs/project_notes/decisions.md` — project-level ADRs.

-From build_report.
-- **Files modified** — the scope boundary for testing and fixes
-- **Task results** — which tasks passed/failed build review
-- **Deviations** — any blueprint deviations that may need test coverage
-- **Decision compliance** — any flagged decision issues
-- **Test Deliverables Deferred** — test specs/files that specialists recommended but build skipped (if this section exists)
+From these files, extract: **build_report** → files modified (scope boundary), task results, deviations, decision compliance, deferred test deliverables. **Blueprints** → Section 5 behavioral contracts (signatures, return schemas, error conditions, naming), Section 6 AC mapping, Section 9 file paths. **test_advisory** → edge cases, critical paths, failure modes. **Decisions** → index of all [USER] and [SPEC] decisions with chosen options (used in Step 6 boundary checking).

-
-- **Deliverable Specification** (Section 5): function signatures, return schemas (dict keys, types, value ranges), error conditions with exact messages, naming conventions, state transitions
-- **Acceptance Mapping** (Section 6): which AC each deliverable satisfies and how
-- **Producer Handoff** (Section 9): expected file paths, integration points
+## STEP 1.5: DELIVERABLE CLASSIFICATION

-
-- Edge cases, critical paths, failure modes, and boundary conditions flagged by specialists
+After reading the build report, classify every output file into one of three categories. This determines what gets tested and how. Files classified as `non-code` are excluded from test design entirely — no structural validation, no grep-based content tests.

-
-
-
-
+| Category | Examples | Test Approach |
+|----------|---------|---------------|
+| **Code** | .py, .js, .ts, .jsx, .tsx, .go, .rs, .java, .rb, .php modules | Unit/integration tests (Tiers 1-3) |
+| **Infrastructure script** | postinstall hooks, deploy scripts, build/publish scripts, CLI tools | Smoke tests (actually execute in isolated temp environment) |
+| **Non-code** | SKILL.md, .md docs, .json config/schema, static .html/.css, .yaml config | **Skip** — not executable, not meaningfully testable |
+
+**Classification rules:**
+1. Read the `Files Modified` section of build_report.md
+2. For each file, classify by extension AND purpose:
+   - SKILL.md files → **non-code** (prompt engineering artifacts — testing them via pattern matching is low-signal and expensive)
+   - Markdown documentation → **non-code**
+   - JSON schema definitions or config-only changes (e.g., adding `"private": true` to package.json) → **non-code**
+   - HTML templates rendered by a server framework (Jinja2, EJS, etc.) → **code** (tested indirectly via route tests, not directly)
+   - Static HTML/CSS with no server logic → **non-code**
+   - Python/JavaScript/TypeScript modules with functions, classes, or route handlers → **code**
+   - Scripts invoked via npm hooks, CLI, or build pipelines → **infrastructure script**
+3. Record the classification for use in Steps 3-5
+
+**If ALL deliverables are non-code**, present via AskUserQuestion:
+"All deliverables are non-code (prompt files, config, documentation). Standard testing does not apply. Options: Skip testing / Proceed anyway"
+
+Default recommendation: Skip testing. Write a minimal test_report.md noting no testable code was produced.

 ## STEP 2: RESEARCH (2 Parallel Research Agents)

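The classification rules in the new Step 1.5 can be sketched as a small helper. A minimal illustration only, not part of the package: the `classify` function, its extension sets, and the `invoked_by_pipeline` flag are invented here, and a real classifier also weighs file purpose, not just extension.

```python
from pathlib import Path

# Hypothetical sketch of the Step 1.5 classification rules.
CODE_EXTS = {".py", ".js", ".ts", ".jsx", ".tsx", ".go", ".rs", ".java", ".rb", ".php"}

def classify(path, invoked_by_pipeline=False):
    """Return 'code', 'infrastructure-script', or 'non-code' for a deliverable."""
    p = Path(path)
    if p.name == "SKILL.md":
        return "non-code"  # prompt artifact: skipped entirely per rule 12
    if invoked_by_pipeline:
        return "infrastructure-script"  # npm hook / CLI / build pipeline script
    if p.suffix in CODE_EXTS:
        return "code"
    # .md docs, .json config, static .html/.css, .yaml all default to skip
    return "non-code"
```

The purpose check matters: the same `.html` file would be `code` if a server framework renders it, which extension alone cannot tell you.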
@@ -111,20 +123,9 @@ Spawn two `intuition-researcher` agents in parallel (both Task calls in a single
 6. **External dependencies** — which external systems each module interacts with (for mocking).
 7. **Existing tests** — search the project for test files matching source file name patterns. Report paths only.

-Output
-```
-## {specialist_name} — {blueprint_file}
-### Module: {file_path as specified in blueprint}
-**Import:** `from {module} import {name}`
-**Interface:** `function_name(param: Type, ...) -> ReturnType`
-**Return schema:** {what the blueprint says it returns — keys, types, values}
-**Error conditions:** {what the blueprint says about errors}
-**Naming conventions:** {patterns}
-**Mocking targets:** {external deps}
-**Existing tests:** {paths or 'None found'}
-```
+Output per blueprint as: `## {specialist} — {file}` then per module: Import, Interface, Return schema, Error conditions, Naming conventions, Mocking targets, Existing tests. Mark any unspecified field as 'Not specified in blueprint'.

-CRITICAL: Extract ONLY what the blueprint SPECIFIES
+CRITICAL: Extract ONLY what the blueprint SPECIFIES — not what the source code does."

 If no blueprints directory exists, fall back to reading source files for structural information only (function signatures, import paths, external dependencies). Use the strict call-signature format: signatures and import paths only, no return value contents, no error messages, no behavioral descriptions.

@@ -167,17 +168,32 @@ If process_flow.md conflicts with actual implementation, check build_report.md f

 ### File-to-Tier Mapping

-
+Only files classified as `code` or `infrastructure-script` in Step 1.5 appear here. Non-code files are excluded entirely — do NOT create structural validation or grep-based tests for them.
+
+| File Type | Category | Test Approach |
+|-----------|----------|---------------|
+| Route / controller | Code | Tier 1 (AC tests via HTTP) |
+| Engine / orchestrator | Code | Tier 1 (AC tests of engine API) |
+| Service / provider | Code | Tier 2 (blueprint contract) |
+| Model / schema | Code | Tier 2 (blueprint contract) |
+| Utility / helper | Code | Tier 3, or Tier 2 if blueprint specifies |
+| Install / deploy / build script | Infrastructure | Smoke test (execute in temp env) |
+| CLI tool | Infrastructure | Smoke test (execute with test args) |
+| Server-rendered template (.html with server logic) | Code | Tested indirectly via Tier 1 route tests |
+| SKILL.md / prompt file | Non-code | **Skip** |
+| Markdown / documentation | Non-code | **Skip** |
+| JSON config / schema-only changes | Non-code | **Skip** |
+| Static HTML / CSS | Non-code | **Skip** |
+
+### Tier Distribution Minimums

-
-
-
-
-
-
-
-| Configuration | Skip (test indirectly via Tier 1) | Config effects are observable at route/engine level |
-| Template / static | Skip (test indirectly via Tier 1) | Template output is observable in route responses |
+The test plan MUST satisfy these ratios (calculated against total test count):
+- **Tier 1 ≥ 40%** — If the plan has fewer than 40% Tier 1 tests, add more AC-level tests before adding Tier 2/3. If there are not enough ACs to reach 40%, document why in the test strategy.
+- **Tier 3 ≤ 30%** — Coverage gap-fillers must not dominate the suite. If Tier 3 exceeds 30%, cut the lowest-value coverage tests.
+
+### Negative Test Minimums
+
+At least **30% of Tier 1 and Tier 2 tests** must exercise error/failure/invalid-input paths: invalid inputs, dependency failures (timeout, connection refused), state violations (e.g., stopping a non-running container), missing config. If the spec doesn't describe error behavior, flag as spec gap with `# SPEC_AMBIGUOUS` — do NOT skip negative testing.

 ### Edge Cases, Mocking, and Coverage

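The tier-distribution and negative-test gates added above are simple ratio checks. A hypothetical sketch (the `check_tier_ratios` helper and its tier-count dict shape are invented for illustration):

```python
def check_tier_ratios(tier_counts, negative_t12):
    """Validate a plan against: Tier 1 >= 40%, Tier 3 <= 30%,
    and negative tests >= 30% of Tier 1/2 tests."""
    total = sum(tier_counts.values())
    t12 = tier_counts.get("tier1", 0) + tier_counts.get("tier2", 0)
    problems = []
    if tier_counts.get("tier1", 0) < 0.4 * total:
        problems.append("Tier 1 below 40%: add AC-level tests before Tier 2/3")
    if tier_counts.get("tier3", 0) > 0.3 * total:
        problems.append("Tier 3 above 30%: cut the lowest-value coverage tests")
    if t12 and negative_t12 < 0.3 * t12:
        problems.append("negative tests below 30% of Tier 1/2")
    return problems

# A plan with 5 Tier 1, 3 Tier 2, 2 Tier 3 and 3 negative tests passes all gates.
assert check_tier_ratios({"tier1": 5, "tier2": 3, "tier3": 2}, negative_t12=3) == []
```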
@@ -185,34 +201,17 @@ For each modified file, determine which test tier drives its testing:

 **Mock strategy**: Follow project conventions from Step 2. Default: mock external dependencies only. Never mock the unit under test. Tier 1/2 tests mock at system boundaries; Tier 3 may mock internal seams.

+**Mock depth rule for infrastructure/DevOps projects**: When the project orchestrates external systems (Docker, cloud APIs, CLI tools, databases), pure-mock tests risk testing only mock setup. For each external-system wrapper, at least one Tier 1 test MUST assert **mock interaction depth** — not just return values, but that the mock was called with correct arguments, order, and count per the blueprint spec.
+
 **Coverage target**: Match existing config threshold, or 80% line coverage for modified files. Focus on decision-heavy code paths (`[USER]` and `[SPEC]` decisions).

 ### Spec Oracle Hierarchy

-Tests derive expected behavior from
-
-| Oracle | Spec Source | Drives Test Tier | What it defines |
-|--------|------------|-----------------|-----------------|
-| **Primary** | outline.md acceptance criteria | Tier 1 | Observable outcomes the system must produce |
-| **Secondary** | blueprints (Section 5 + 6) | Tier 2 | Detailed behavioral contracts: return schemas, error tables, naming conventions, state machines |
-| **Tertiary** | process_flow.md | Tier 1 + 2 | Integration seams, cross-component handoffs, state mutations, error propagation |
-| **Advisory** | test_advisory.md | Tier 2 + 3 | Edge cases, critical paths, failure modes (supplements, not replaces, blueprints) |
-
-When a test fails, the failure means the implementation disagrees with the spec — that is a finding, not automatically a bug in either the test or the code. See Step 6 Classify Failures for how to handle this.
+Tests derive expected behavior from specs, NOT source code. Oracle priority: **outline.md ACs** (Tier 1) → **blueprints Sections 5+6** (Tier 2) → **process_flow.md** (Tier 1+2 integration) → **test_advisory.md** (advisory, Tier 2+3). When a test fails, the implementation disagrees with the spec — classify per Step 6, don't assume either is wrong.

 ### Acceptance Criteria Path Coverage

-For every
-
-1. At least one **Tier 1** test MUST exercise the **actual entry point at the abstraction level the AC describes**. Read the AC carefully to determine the right level:
-   - AC mentions HTTP routes or UI behavior → test the route (e.g., `TestClient.post("/admin/container/app/start")`)
-   - AC mentions engine or service behavior → test the engine's public API (e.g., `engine.run(context)`)
-   - AC mentions CLI output → test the CLI command
-   - NEVER satisfy an AC exclusively with a unit test of an internal helper function
-2. The test MUST assert on the **expected output as described by the spec** (acceptance criterion + blueprint deliverable spec). Every assertion value must be traceable to a spec document.
-3. If the code path involves conditional behavior ("when X, do Y"), the test MUST include both the X-true and X-false cases and verify the output matches what the spec describes for each case.
-
-Tier 2 tests of internal functions supplement Tier 1 but do NOT substitute for them. Every AC needs Tier 1 coverage.
+For every AC with observable behavior, at least one Tier 1 test MUST exercise the **actual entry point at the AC's abstraction level** (HTTP route → test the route, engine API → test the engine, CLI → test the command). NEVER satisfy an AC exclusively with a unit test of an internal helper. Assertions MUST match spec-defined expected output. Conditional behavior ("when X, do Y") requires both branches tested. Tier 2 supplements but does NOT substitute for Tier 1.

 ### Specialist Test Recommendations

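The mock-depth rule added in this hunk can be illustrated with `unittest.mock`. Everything here (`start_app`, the Docker-like client shape, the network naming) is invented for the sketch; the point is that the assertions check call arguments and counts, not just the return value.

```python
from unittest.mock import MagicMock

# Hypothetical wrapper around an external system (names invented for illustration).
def start_app(docker, name):
    docker.networks.create(f"{name}-network", driver="bridge")
    container = docker.containers.run(image=name, detach=True)
    return {"status": "running", "id": container.id}

docker = MagicMock()
docker.containers.run.return_value.id = "abc123"
result = start_app(docker, "myapp")

# Deep assertions: verify mock interaction depth, not just the return value.
docker.networks.create.assert_called_once_with("myapp-network", driver="bridge")
docker.containers.run.assert_called_once_with(image="myapp", detach=True)
assert result == {"status": "running", "id": "abc123"}
```

A pure-return-value test (`assert result["status"] == "running"`) would still pass if the wrapper silently stopped creating the network; the call-argument assertions are what catch that.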
@@ -220,19 +219,46 @@ Before finalizing the test plan, review specialist domain knowledge from bluepri
 - **Testability Notes**: Edge cases, critical paths, failure modes, and boundary conditions from each blueprint's Approach section (Section 3, `### Testability Notes` subheading)
 - **Deferred test deliverables**: Any test specs from build_report.md's "Test Deliverables Deferred" section (legacy — older blueprints may still include test files in Producer Handoff)

-
+Incorporate specialist insights as advisory, not prescriptive — you own the test strategy.
+
+### Smoke Test Design (for infrastructure scripts)
+
+For files classified as `infrastructure-script` in Step 1.5, design smoke tests that **actually execute the script** in an isolated environment. Do NOT write structural validation tests that grep the script's source code — pattern-matching source code catches almost nothing useful and wastes tokens.
+
+**Isolation strategy:**
+- Set `HOME` (or equivalent) to a temp directory to avoid modifying real user data
+- Create required directory structures (source files, config) in the temp environment
+- Clean up after each test (or use the test framework's temp directory support)
+
+**What smoke tests MUST verify:**
+- Script runs without errors (exit code 0) under normal conditions
+- Script creates expected output files and directories
+- Script handles missing prerequisites gracefully (exit code non-zero, meaningful error message)
+- Script preserves data it should preserve (e.g., user config not overwritten on update)
+- Script output matches cross-domain contracts (e.g., generated manifest schema matches the consuming endpoint's expected format)
+
+**What smoke tests MUST NOT do:**
+- Grep source code for variable names, array contents, or string patterns
+- Test internal implementation details — test observable behavior only
+- Validate that specific lines of code exist — that is not testing
+
+Smoke tests count as **Tier 1** if they exercise an acceptance criterion's observable behavior, or **Tier 2** if they verify a blueprint behavioral contract. They follow the same tier distribution and negative test minimums as code tests.

 ### Output

 Write the test strategy to `{context_path}/scratch/test_strategy.md`. This serves as both an audit trail and a resume marker for crash recovery.

 The test strategy document MUST contain:
-- **
+- **Deliverable classification**: List every file from build_report, its category (code / infrastructure-script / non-code), and rationale. Non-code files are listed as skipped with brief reason.
+- **AC coverage matrix**: For each acceptance criterion, which test(s) cover it, at what tier, and at what abstraction level. Every AC with observable behavior MUST have at least one Tier 1 test. ACs that apply exclusively to non-code deliverables should be noted as "not testable — non-code deliverable."
+- **Tier distribution**: Total count per tier with percentages. Verify: Tier 1 ≥ 40%, Tier 3 ≤ 30%. If not met, adjust plan before proceeding.
+- **Negative test inventory**: List each negative/error-path test explicitly. Verify: ≥ 30% of Tier 1/2 tests are negative. If not met, add more error-path tests.
 - Test files to create (path, tier, target source file)
-- Test cases per file (name, tier, what it validates, **which spec artifact defines the expected behavior**, **what the spec says the expected output is**)
-- Mock requirements per file (mock external deps only for Tier 1/2; Tier 3 may mock internal seams)
+- Test cases per file (name, tier, positive/negative, what it validates, **which spec artifact defines the expected behavior**, **what the spec says the expected output is**)
+- Mock requirements per file (mock external deps only for Tier 1/2; Tier 3 may mock internal seams). For infra projects: flag files needing mock-depth assertions (call args, call order, call count).
 - Framework command to run tests
 - Estimated test count and distribution by tier
+- **Mutation spot-check candidates**: 3 source files with highest Tier 1/2 coverage, and one candidate mutation per file (only `code` and `infrastructure-script` files are eligible)
 - Which specialist recommendations were incorporated (and which were skipped, with rationale)
 - Any acceptance criteria where the expected behavior is ambiguous (flagged for potential SPEC_AMBIGUOUS markers)

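The smoke-test pattern described in the new Smoke Test Design section, executing the script against a throwaway `HOME`, looks roughly like this. A self-contained sketch assuming a POSIX shell; the `install.sh` body is invented so the example runs on its own.

```python
import os
import stat
import subprocess
import tempfile
from pathlib import Path

# Hypothetical install script, written out here only so the sketch is self-contained.
SCRIPT = (
    "#!/bin/sh\n"
    'mkdir -p "$HOME/.myapp"\n'
    "echo '{\"version\": 1}' > \"$HOME/.myapp/manifest.json\"\n"
)

with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "install.sh"
    script.write_text(SCRIPT)
    script.chmod(script.stat().st_mode | stat.S_IEXEC)

    fake_home = Path(tmp) / "home"
    fake_home.mkdir()
    env = {**os.environ, "HOME": str(fake_home)}  # isolate: never touch the real $HOME
    proc = subprocess.run([str(script)], env=env, capture_output=True, text=True)

    assert proc.returncode == 0                               # runs cleanly
    assert (fake_home / ".myapp" / "manifest.json").exists()  # creates expected output
```

Note that nothing here reads the script's source: the test observes exit code and filesystem effects only, which is exactly the grep-free behavior the section mandates.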
@@ -244,13 +270,16 @@ Present the test plan via AskUserQuestion:
 Question: "Test plan ready:

 **Framework:** [detected framework]
+**Deliverables:** [N] code files + [N] infrastructure scripts tested, [N] non-code files skipped
 **Test files:** [N] files
-**Test cases:** ~[total] tests covering [file count]
-- Tier 1 (AC tests): [N] tests covering [M] of [P] acceptance criteria
+**Test cases:** ~[total] tests covering [file count] testable files
+- Tier 1 (AC tests): [N] tests ([X]% of total, min 40%) covering [M] of [P] testable acceptance criteria
 - Tier 2 (blueprint contracts): [N] tests
-- Tier 3 (coverage): [N] tests
-**
+- Tier 3 (coverage): [N] tests ([X]% of total, max 30%)
+**Negative tests:** [N] of [M] Tier 1/2 tests ([X]%, min 30%)
+**Skipped (non-code):** [list skipped file types and count, e.g., '7 SKILL.md files, 2 config changes']
 **Coverage target:** [threshold]%
+**Post-pass:** Mutation spot-check on 3 files

 Proceed?"

@@ -301,19 +330,13 @@ You are a spec-first test writer. Your tests verify the code does what the SPEC
 - You MUST NOT use Grep or Glob to search source files

 ## ASSERTION SOURCING RULES
-For EVERY assertion that checks a specific value
-1. Add a comment citing the spec source: `# blueprint:{specialist}:L{line} — "{spec quote}"`
-2. If no spec document defines the expected value: mark `# SPEC_AMBIGUOUS: spec says "{quote}" — value not specified`
+For EVERY assertion that checks a specific value: add `# blueprint:{specialist}:L{line} — "{spec quote}"`. If no spec defines the value: `# SPEC_AMBIGUOUS: spec says "{quote}" — value not specified`.

-
-
-- Mock ONLY external systems (Docker, databases, HTTP clients, cloud APIs) — do NOT mock internal modules
-- Assertions should verify user-observable outcomes, not internal function return values
+Tier 1: test at AC's abstraction level, mock ONLY external systems, assert user-observable outcomes.
+Tier 2: test at blueprint's module level, mock external deps per blueprint, assert behavioral contracts.

-
--
-- Mock external dependencies as the blueprint specifies
-- Assertions should verify the behavioral contracts from the blueprint's Deliverable Specification
+## ASSERTION DEPTH RULES
+Prefer DEEP assertions over shallow ones. Instead of `assert result is not None` or `assert "key" in result`, assert specific values: `assert result["network_name"] == "myapp-network"`. For infra/DevOps code: assert mock call arguments, order, and count — not just return values.

 Write the complete test file. Follow existing test style. Do NOT add test infrastructure.
 ```

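The assertion-depth rule added above, deep assertions carrying spec-provenance comments, can be shown in a few lines. The unit under test and the `blueprint:devops:L{n}` citations are placeholders invented for the sketch; the comment format mirrors the Assertion Sourcing Rules.

```python
# Hypothetical unit under test, invented so the sketch runs standalone.
def container_status(name):
    return {"status": "running", "network_name": f"{name}-network"}

result = container_status("myapp")

# Deep, spec-cited assertions (blueprint line numbers are placeholders):
assert result["status"] == "running"              # blueprint:devops:L42 — "returns status 'running'"
assert result["network_name"] == "myapp-network"  # blueprint:devops:L57 — "network named '{app}-network'"

# By contrast, `assert result is not None` is shallow: it passes for any
# non-None value and proves almost nothing about the behavioral contract.
```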
@@ -333,6 +356,48 @@ Label every test with: `# Coverage test — not derived from spec`
 Write focused unit tests for uncovered code paths. Follow existing test style.
 ```

+### Infrastructure Script Smoke Test Writer Prompt
+
+```
+You are a smoke test writer. Your tests actually EXECUTE the script and verify observable behavior. You do NOT grep source code.
+
+**Framework:** [detected framework + version]
+**Test conventions:** [naming pattern, directory structure, import style from Step 2]
+
+**Script under test:** [script file path]
+**Script purpose:** [what the script does — from build report]
+**Script invocation:** [how the script is run — npm hook, CLI command, etc.]
+
+**Spec oracle — what the script SHOULD do:**
+- Acceptance criteria: [paste relevant ACs]
+- Blueprint spec: Read [relevant blueprint path] — Section 5 for behavioral contracts
+- Cross-domain contracts: [any output schemas consumed by other components]
+
+**Test cases to implement:**
+[List each test case with: name, tier, what it validates, expected observable outcome, isolation requirements]
+
+## ISOLATION RULES
+- Create a temp directory for each test (use the framework's temp directory support)
+- Set HOME or equivalent env vars to temp directory before running the script
+- Create any prerequisite files/directories the script expects in the temp environment
+- NEVER run the script against real user directories (~/.claude/, etc.)
+- Clean up temp directories after each test
+
+## WHAT TO TEST
+- Script exit code under normal conditions (0 = success)
+- Files and directories created by the script (verify existence, verify contents match expected schema)
+- Script behavior when prerequisites are missing (non-zero exit, error message)
+- Data preservation (files that should survive re-runs are not overwritten)
+- Output format matches downstream consumer contracts
+
+## WHAT NOT TO TEST
+- Do NOT read the script's source code to validate its internal structure
+- Do NOT grep for variable names, array contents, or string patterns in source
+- Do NOT test that specific code constructs exist — test what the script DOES
+
+Write the complete test file. Follow existing test style.
+```
+
 SYNCHRONIZATION GATE: After all subagents return, verify each test file exists on disk using Glob. If any file is missing, retry that subagent once (foreground) with error context. Do NOT proceed to Step 5.5 until every planned test file is confirmed on disk.

 ## STEP 5.5: SPEC COMPLIANCE AUDIT

@@ -349,20 +414,43 @@ For each value-assertion, check:

 Assertions without spec provenance AND without SPEC_AMBIGUOUS markers are **source-derived**. (Tier 3 tests are exempt — they are explicitly implementation-derived.)

-### Part B:
+### Part B: Assertion Depth Scoring
+
+For each Tier 1 and Tier 2 test file, classify every assertion as **shallow** or **deep**:
+
+| Shallow (low signal) | Deep (high signal) |
+|---|---|
+| `is not None` | `== "expected-specific-value"` |
+| `isinstance(result, dict)` | `result["network_name"] == "myapp-network"` |
+| `"key" in result` | `mock_docker.run.assert_called_with(image="x", ports={...})` |
+| `len(result) > 0` | `error.message == "Container myapp not found"` |
+| `result["success"] == True` (when mock returns True) | `result["status"] == "running"` (verified against spec behavior) |
+
+**Threshold**: If >50% of assertions in a test file are shallow, flag the file. The test exists but proves almost nothing.
+
+**Escalation**: If >30% of ALL Tier 1/2 test files are flagged as shallow-dominant, present via AskUserQuestion:
+
+```
+Header: "Assertion Depth Warning"
+Question: "[N] of [M] test files have >50% shallow assertions.
+These tests pass trivially and won't catch real bugs.
+
+Examples: [list 2-3 worst offenders with their shallow assertion patterns]
+
+Options: fix shallow tests / accept as-is / skip to Step 6"
+```
+
+If "fix": delegate to `intuition-code-writer` agents with instructions to replace shallow assertions with specific value checks traced to blueprint specs. If the blueprint doesn't specify the value, add `SPEC_AMBIGUOUS` marker.
+
+### Part C: Abstraction Level Coverage

 For each acceptance criterion in outline.md that describes observable behavior:
 1. Check: is there at least one Tier 1 test that exercises the AC at the abstraction level it describes?
 2. If an AC describes HTTP route behavior but the only test is a unit test of an internal function → flag as **abstraction gap**

-Example of an abstraction gap:
-- AC T2.3: "Container operations execute successfully and status updates reflect within the next poll cycle"
-- Only test: `test_start_container_success()` which calls `start_container()` directly and checks `result["success"]`
-- Gap: No test exercises the actual HTTP route `POST /admin/container/{app_name}/start` and verifies the response
-
 ### Reporting

-If Part A finds >20% source-derived assertions OR Part
+If Part A finds >20% source-derived assertions, Part B flags >30% shallow-dominant files, OR Part C finds any abstraction gaps, present via AskUserQuestion:

 ```
 Header: "Spec Compliance Audit"

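Part B's shallow/deep split is essentially pattern counting over assert lines. A rough heuristic sketch only; the `SHALLOW` pattern list is invented here and deliberately incomplete, and a real audit would parse the AST rather than regex raw source.

```python
import re

# Invented, partial list of the shallow patterns from the Part B table.
SHALLOW = [r"is not None", r"isinstance\(", r'"\w+" in ', r"len\([^)]*\)\s*>\s*0"]

def shallow_ratio(test_source):
    """Fraction of assert lines in a test file that match a shallow pattern."""
    asserts = [l for l in test_source.splitlines() if l.strip().startswith("assert")]
    if not asserts:
        return 0.0
    shallow = sum(1 for l in asserts if any(re.search(p, l) for p in SHALLOW))
    return shallow / len(asserts)

# One shallow assertion out of two: ratio 0.5, just under the >50% flag threshold.
sample = 'assert result is not None\nassert result["status"] == "running"'
assert shallow_ratio(sample) == 0.5
```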
@@ -394,29 +482,24 @@ Also run `mcp__ide__getDiagnostics` to catch type errors and lint issues in the
 
 For each failure, classify. The first question is always: **does the spec clearly define the expected behavior the test asserts?**
 
-| Classification |
-
-| **Test bug** (wrong assertion,
-| **Spec Violation** (
-| **Spec Ambiguity** (
-| **
-| **
-| **
-| **
-| **
-| **
+| Classification | Action |
+|---|---|
+| **Test bug** (wrong assertion, mock, import) | Fix autonomously — `intuition-code-writer` |
+| **Spec Violation** (code disagrees with clear spec) | Escalate: "expects [spec] per [source], got [actual]. Fix code / update spec / investigate?" |
+| **Spec Ambiguity** (SPEC_AMBIGUOUS or underspecified) | Escalate: "Spec unclear for [scenario]. Code does [X]. Correct? Lock in / change / skip?" |
+| **Impl bug, trivial** (1-3 lines, spec is clear) | Fix directly — `intuition-code-writer` |
+| **Impl bug, moderate** (one file, spec is clear) | Fix — `intuition-code-writer` with diagnosis |
+| **Impl bug, complex** (multi-file structural) | Escalate to user |
+| **Violates [USER] decision** | STOP — escalate immediately |
+| **Violates [SPEC] decision** | Note conflict, proceed with fix |
+| **Touches files outside build scope** | Escalate (scope creep) |
 
 ### Decision Boundary Checking
 
-Before ANY implementation fix (not test-only fixes):
-
-
-
-  - If YES → STOP. Report the conflict to the user via AskUserQuestion: "Test failure in [file] requires changing [what], but this contradicts your decision on [D{N}: title] where you chose [chosen option]. How should I proceed?" Options: "Change my decision" / "Skip this test" / "I'll fix manually"
-3. Check: does the proposed fix contradict any `[SPEC]`-tier decision?
-  - If YES → note the conflict in the test report, proceed with the fix (specialist decisions are advisory)
-4. Check: does the fix modify files NOT listed in build_report's "Files Modified" section?
-  - If YES → escalate: "Fixing [test] requires modifying [file] which wasn't part of this build. Allow scope expansion?" Options: "Allow this file" / "Skip this test"
+Before ANY implementation fix (not test-only fixes), read all `{context_path}/scratch/*-decisions.json` + `docs/project_notes/decisions.md`. Check:
+1. **[USER] decision conflict** → STOP, escalate via AskUserQuestion with options: "Change decision" / "Skip test" / "Fix manually"
+2. **[SPEC] decision conflict** → note in report, proceed with fix
+3. **File outside build scope** → escalate: "Allow scope expansion?" / "Skip test"
 
 ### Fix Cycle
 
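The decision-boundary scan this hunk introduces can be sketched as a small helper. This is a minimal sketch only: the function name, the `tier`/`title` field names, and the list-of-objects layout of each `*-decisions.json` are illustrative assumptions, not the skill's actual schema.

```python
import glob
import json

def check_decision_conflicts(context_path, touched_topics):
    """Scan specialist decision logs before an implementation fix.

    Hypothetical schema: each scratch log is a JSON array of
    {"id": ..., "tier": "USER" | "SPEC", "title": ...} objects.
    Returns (blocking, advisory): [USER] conflicts stop the fix and are
    escalated; [SPEC] conflicts are noted in the report while the fix
    proceeds, since specialist decisions are advisory.
    """
    blocking, advisory = [], []
    for path in glob.glob(f"{context_path}/scratch/*-decisions.json"):
        with open(path) as f:
            decisions = json.load(f)
        for d in decisions:
            if d["title"] not in touched_topics:
                continue  # decision does not touch the files being fixed
            if d["tier"] == "USER":
                blocking.append(d)   # STOP: escalate via AskUserQuestion
            elif d["tier"] == "SPEC":
                advisory.append(d)   # note the conflict, proceed with fix
    return blocking, advisory
```

The split return mirrors the numbered checks above: anything in `blocking` halts the fix, anything in `advisory` only annotates the report.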
@@ -429,6 +512,29 @@ For each failure:
 
 After all failures are addressed (fixed or escalated), run the full test suite one final time to verify no regressions.
 
+### Mutation Spot-Check (Post-Pass Gate)
+
+After the final test run passes, perform a lightweight mutation check to verify the tests can actually detect bugs. This is NOT full mutation testing — it's a targeted sanity check.
+
+1. Select **3 source files** with the most Tier 1/2 test coverage (highest test count targeting them).
+2. For each file, make ONE small, obvious mutation via an `intuition-code-writer` agent:
+   - Change a return value (e.g., `"running"` → `"stopped"`, `True` → `False`)
+   - Change a string literal (e.g., resource name, error message)
+   - Remove a function call (e.g., comment out a validation step)
+   - The mutation MUST break behavior that at least one test claims to verify
+3. Re-run ONLY the tests targeting that file.
+4. **Expected result:** At least one test fails per mutation. If a mutation causes zero test failures, the tests covering that file are hollow.
+5. **Revert every mutation immediately** after checking (use `git checkout -- {file}` or re-apply the original content).
+
+**If any mutation survives** (0 test failures):
+- Report via AskUserQuestion: "Mutation spot-check: changed [what] in [file] — zero tests caught it. The [N] tests covering this file may be testing mock wiring rather than real behavior. Options: strengthen tests / accept risk / skip"
+- If "strengthen tests": delegate to `intuition-code-writer` with the specific mutation that survived, and instructions to add a test that would catch it.
+
+**Track results** in the test report under a new "## Mutation Spot-Check" section:
+| File | Mutation | Tests Run | Caught? |
+|------|----------|-----------|---------|
+| [path] | [what was changed] | [N] | Yes/No |
+
 ## STEP 7: TEST REPORT
 
 Write `{context_path}/test_report.md`:
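The mutate → run targeted tests → revert loop added in this hunk can be sketched as follows. This is a minimal sketch under stated assumptions: the `run_tests` callable and the in-place text-replace stand in for the `intuition-code-writer` delegation and `git checkout` revert the protocol actually prescribes.

```python
from pathlib import Path

def spot_check(source_file, original, mutant, run_tests):
    """Apply one mutation, run only the targeted tests, then revert.

    `run_tests` is a callable returning a test-runner exit code (e.g. a
    wrapper around running the targeted test file). The mutation counts
    as caught when the exit code is nonzero, i.e. at least one test failed.
    """
    path = Path(source_file)
    clean = path.read_text()
    assert original in clean, "mutation target not found in source"
    path.write_text(clean.replace(original, mutant, 1))  # ONE small mutation
    try:
        caught = run_tests() != 0
    finally:
        path.write_text(clean)  # revert immediately, whatever the outcome
    return caught
```

A surviving mutation is simply `spot_check(...) == False`: every targeted test passed even though the behavior changed, which is exactly the hollow-test signal this gate escalates.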
@@ -444,7 +550,8 @@ Write `{context_path}/test_report.md`:
 - **Tests created:** [N] (Tier 1: [N], Tier 2: [N], Tier 3: [N])
 - **Passing:** [N]
 - **Failing:** [N]
-- **AC coverage:** [M]/[P] acceptance criteria have Tier 1 tests
+- **AC coverage:** [M]/[P] testable acceptance criteria have Tier 1 tests
+- **Skipped deliverables:** [N] non-code files ([list types: SKILL.md, config, etc.])
 - **Coverage:** [X]% (target: [Y]%)
 
 ## Test Files Created
@@ -452,6 +559,11 @@ Write `{context_path}/test_report.md`:
 |------|------|-------|--------|
 | [path] | [1/2/3] | [count] | [what it tests — AC reference or blueprint section] |
 
+## Skipped Deliverables (Non-Code)
+| File | Type | Reason |
+|------|------|--------|
+| [path] | [SKILL.md / config / markdown / etc.] | Non-code — not executable, not testable |
+
 ## Failures & Resolutions
 
 ### [Test name]
@@ -476,6 +588,24 @@ Write `{context_path}/test_report.md`:
 - SPEC_AMBIGUOUS marked: **[N]** (spec underspecified, asserting implementation value)
 - Source-derived (untraced): **[N]** [if any — list examples and user disposition: "accepted as-is" / "fixed"]
 
+## Assertion Depth
+- Tier 1/2 files audited: **[N]**
+- Shallow-dominant files (>50% shallow assertions): **[N]** [list any]
+- User disposition: [fixed / accepted as-is / N/A]
+
+## Negative Test Coverage
+- Tier 1/2 negative tests: **[N]** of **[M]** total Tier 1/2 tests (**[X]%**, target: ≥30%)
+- Error paths tested: [list categories — invalid input, dependency failure, state violation, etc.]
+
+## Mutation Spot-Check
+| File | Mutation | Tests Run | Caught? |
+|------|----------|-----------|---------|
+| [path] | [what was changed] | [N] | Yes/No |
+
+- Mutations tested: **[N]**
+- Caught: **[N]**
+- Survived: **[N]** [list any — with disposition: strengthened / accepted risk]
+
 ## Decision Compliance
 - Checked **[N]** decisions across **[M]** specialist decision logs
 - `[USER]` violations: [count — list any, or "None"]
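The ≥30% negative-test target in the report format reduces to a simple ratio check. A minimal sketch; the function name and tuple return shape are illustrative, not part of the skill.

```python
def negative_coverage(negative, total, target_pct=30.0):
    """Return (meets_target, pct) for the Tier 1/2 negative-test ratio."""
    if total == 0:
        return False, 0.0  # no Tier 1/2 tests at all: the gate cannot pass
    pct = 100.0 * negative / total
    return pct >= target_pct, pct
```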
|