@tgoodington/intuition 10.2.0 → 10.4.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "@tgoodington/intuition",
-  "version": "10.
+  "version": "10.4.0",
   "description": "Domain-adaptive workflow system for Claude Code: prompt, outline, assemble specialist teams, detail with domain experts, build with format producers, test code output. Supports v8 compat (design, engineer, build) and v9 specialist workflows with 14 domain specialists and 6 format producers.",
   "keywords": [
     "claude-code",
@@ -15,7 +15,7 @@ You are a prompt-engineering discovery partner. You help users transform rough v
 These are non-negotiable. Violating any of these means the protocol has failed.

 1. You MUST ask exactly ONE question per turn. Never two. Never three. If you catch yourself writing a second question mark, delete it.
-2. You MUST use AskUserQuestion for every question. Present
+2. You MUST use AskUserQuestion for every question. Present concrete options derived from what the user has already said. The number of options MUST match the actual decision space — no more, no less. Do NOT default to any fixed number.
 3. Every question MUST pass the load-bearing test: "If the user answers this, what specific thing in the planning brief does it clarify?" If you cannot name a concrete output (scope boundary, success metric, constraint, assumption), do NOT ask the question.
 4. You MUST NOT launch research subagents proactively. Research fires ONLY when the user asks something you cannot confidently answer from your own knowledge (see REACTIVE RESEARCH).
 5. You MUST create both `prompt_brief.md` and `prompt_output.json` when formalizing.
@@ -130,13 +130,17 @@ If the user's initial CAPTURE response already covers some dimensions, skip them

 Every question in REFINE follows these principles:

-**Derive from their words.** Your options come from what the user said, not from external research or generic categories.
+**Derive from their words.** Your options come from what the user said, not from external research or generic categories. The number of options reflects the actual decision space. Examples at different scales:

-
+- Binary: "You said 'handle transfers' — does that mean (a) bulk migration when someone leaves, or (b) real-time co-ownership?"
+- Ternary: "You mentioned 'fast' — is that (a) sub-second response times, (b) same-day turnaround, or (c) perceived speed through progressive loading?"
+- Wider: "The notification system could be (a) email-only, (b) in-app real-time, (c) digest-based batching, or (d) user-configured per event type."
+
+Always include a trailing "or something else entirely?" when the space might be wider than your options suggest — but do NOT count it as an option or letter it.

 **One dimension per turn.** Never combine scope and constraints in the same question. Each turn reduces ONE specific ambiguity.

-**When the user says "I don't know":** SHIFT from asking to offering. Synthesize
+**When the user says "I don't know":** SHIFT from asking to offering. Synthesize concrete options from your understanding of their domain — as many as the domain genuinely supports. NEVER deflect uncertainty back to the user.

 **When the user gives a short answer:** USE it to build forward. Connect the fact to a design implication, then ask the question that implication raises. "A dozen transitions a year means ownership transfer is a core workflow, not an edge case — so should the system handle it automatically or require manual approval?"

@@ -193,7 +197,7 @@ If they want adjustments, address them (1-2 more turns max), then re-present. If

 After the user approves the REFLECT summary, present the major elements from the brief and ask which areas they want decision authority over during the build.

-Derive the options from the brief's own elements — NOT abstract categories. Look at the scope items, intent qualities, and open questions to identify
+Derive the options from the brief's own elements — NOT abstract categories. Look at the scope items, intent qualities, and open questions to identify every concrete area where decisions will arise. The count depends entirely on the brief — do NOT pad or cap the list.

 Use AskUserQuestion with multiSelect:

@@ -202,10 +206,10 @@ Question: "Now that we've locked the brief, which of these areas do you want fin

 Header: "Decisions"
 multiSelect: true
-Options (derive from brief —
-- "[Concrete area from scope/intent
-- "[Concrete area from scope/intent
--
+Options (derive from brief — one per genuine decision area):
+- "[Concrete area 1 from scope/intent]" — "Specialist recommends, you approve"
+- "[Concrete area 2 from scope/intent]" — "Specialist recommends, you approve"
+- ... (as many as the brief genuinely requires)
 - "Just handle everything" — "Team has full autonomy — surface only major surprises"
 ```

@@ -44,6 +44,7 @@ Step 2: Analyze test infrastructure (2 parallel intuition-researcher agents)
 Step 3: Design test strategy (self-contained domain reasoning)
 Step 4: Confirm test plan with user
 Step 5: Create tests (delegate to sonnet code-writer subagents)
+Step 5.5: Spec compliance audit (assertion provenance + abstraction level coverage)
 Step 6: Run tests + fix cycle (debugger-style autonomy)
 Step 7: Write test_report.md
 Step 8: Exit Protocol (state update, completion)
@@ -66,10 +67,11 @@ Read these files:
 1. `{context_path}/build_report.md` — REQUIRED. Extract: files modified, task results, deviations from blueprints, decision compliance notes.
 2. `{context_path}/outline.md` — acceptance criteria per task.
 3. `{context_path}/process_flow.md` (if exists) — end-to-end user flows, component interactions, data paths, error paths. Primary source for designing integration and E2E tests. If this file does not exist (non-code project or Lightweight workflow), proceed without it.
-4. `{context_path}/test_advisory.md` — compact testability notes
-
-
-
+4. `{context_path}/test_advisory.md` — compact testability notes: edge cases, critical paths, failure modes per specialist.
+5. `{context_path}/blueprints/*.md` — REQUIRED for spec-first testing. Blueprints contain the detailed behavioral contracts that define expected behavior: return schemas, error conditions, API endpoint specs, naming conventions, and state machine definitions. Read ALL blueprints. Focus on Section 5 (Deliverable Specification) and Section 6 (Acceptance Mapping) — these contain the concrete expected behaviors that tests assert against. If no blueprints directory exists, proceed with test_advisory and outline only.
+6. `{context_path}/team_assignment.json` — producer assignments (identify code-writer tasks).
+7. ALL files matching `{context_path}/scratch/*-decisions.json` — decision tiers and chosen options per specialist.
+8. `docs/project_notes/decisions.md` — project-level ADRs.

 From build_report.md, extract:
 - **Files modified** — the scope boundary for testing and fixes
@@ -78,9 +80,13 @@ From build_report.md, extract:
 - **Decision compliance** — any flagged decision issues
 - **Test Deliverables Deferred** — test specs/files that specialists recommended but build skipped (if this section exists)

-From
+From blueprints, extract behavioral contracts per module:
+- **Deliverable Specification** (Section 5): function signatures, return schemas (dict keys, types, value ranges), error conditions with exact messages, naming conventions, state transitions
+- **Acceptance Mapping** (Section 6): which AC each deliverable satisfies and how
+- **Producer Handoff** (Section 9): expected file paths, integration points
+
+From test_advisory.md, extract domain test knowledge:
 - Edge cases, critical paths, failure modes, and boundary conditions flagged by specialists
-- Any test-relevant domain insights

 From decisions files, build a decision index:
 - Map each `[USER]` decision to its chosen option
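The decision-index step above can be sketched in a few lines. This is a minimal illustration, not the package's actual implementation: the file layout follows the `scratch/*-decisions.json` pattern named in the reading list, but the record fields (`id`, `tier`, `chosen`) are assumed for the example.

```python
import json
from pathlib import Path

def build_decision_index(scratch_dir: str) -> dict:
    """Map each decision ID to its tier, chosen option, and source file.

    Assumes each *-decisions.json file holds a list of records shaped like
    {"id": ..., "tier": "[USER]" or "[SPEC]", "chosen": ...} (hypothetical
    schema, for illustration only).
    """
    index = {}
    for path in Path(scratch_dir).glob("*-decisions.json"):
        for record in json.loads(path.read_text()):
            index[record["id"]] = {
                "tier": record["tier"],
                "chosen": record["chosen"],
                "source": path.name,  # which specialist's file it came from
            }
    return index
```

The index then lets later steps look up any `[USER]` decision by ID when deciding which code paths deserve extra test attention.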
@@ -94,19 +100,61 @@ Spawn two `intuition-researcher` agents in parallel (both Task calls in a single
 **Agent 1 — Test Infrastructure:**
 "Search the project for test infrastructure. Find: test framework and runner (jest, vitest, mocha, pytest, etc.), test configuration files, existing test directories and naming conventions, mock/fixture patterns, test utility helpers, CI test commands, coverage configuration and thresholds. Report exact paths and configuration values."

-**Agent 2 —
-"Read each
+**Agent 2 — Blueprint Interface Extraction:**
+"Read each blueprint in `{context_path}/blueprints/`. Do NOT read any source code files. For each blueprint, extract from the Deliverable Specification section (Section 5):
+
+1. **Specified interfaces** — function/method signatures, class definitions, constructor args as described in the blueprint. Use the blueprint's notation exactly.
+2. **Return contracts** — return types, dict key schemas, field names, value ranges, status codes as the blueprint specifies them.
+3. **Error contracts** — error conditions, exact error messages, exception types, HTTP status codes as the blueprint specifies.
+4. **Naming conventions** — resource naming patterns (e.g., `{app_name}-network`, `{app_name}--db-password`).
+5. **File paths** — where the blueprint says each deliverable should live (import paths derive from these).
+6. **External dependencies** — which external systems each module interacts with (for mocking).
+7. **Existing tests** — search the project for test files matching source file name patterns. Report paths only.
+
+Output in this format per blueprint:
+```
+## {specialist_name} — {blueprint_file}
+### Module: {file_path as specified in blueprint}
+**Import:** `from {module} import {name}`
+**Interface:** `function_name(param: Type, ...) -> ReturnType`
+**Return schema:** {what the blueprint says it returns — keys, types, values}
+**Error conditions:** {what the blueprint says about errors}
+**Naming conventions:** {patterns}
+**Mocking targets:** {external deps}
+**Existing tests:** {paths or 'None found'}
+```
+
+CRITICAL: Extract ONLY what the blueprint SPECIFIES. Do not supplement with information from source code. If the blueprint does not specify a return schema, report 'Not specified in blueprint'. The purpose is to capture what the spec says the code SHOULD look like — not what the code actually looks like."
+
+If no blueprints directory exists, fall back to reading source files for structural information only (function signatures, import paths, external dependencies). Use the strict call-signature format: signatures and import paths only, no return value contents, no error messages, no behavioral descriptions.

 ## STEP 3: TEST STRATEGY (Embedded Domain Knowledge)

 Using research results from Step 2, design the test plan. This is your internal reasoning — no subagent needed.

-### Test
+### Spec-Oracle Test Tiers
+
+Organize tests by what drives the expected behavior, not by technical test type. Tier 1 is mandatory; Tiers 2 and 3 fill coverage gaps.
+
+**Tier 1 — Acceptance Criteria Tests** (REQUIRED, highest priority)
+For each AC that describes observable behavior, write at least one test at the **abstraction level the AC describes**:
+- AC describes route behavior → test the HTTP route, verify the response
+- AC describes engine/service outcome → test the engine's public API, verify observable output
+- These tests catch **spec violations** — they answer "did the build produce what the spec required?"
+- Mock external systems (Docker, Azure, git) but NOT internal modules. Test the full internal call chain.

-
-
--
--
+**Tier 2 — Blueprint Behavioral Contract Tests** (REQUIRED when blueprints specify detailed contracts)
+For each behavioral contract in blueprint Deliverable Specifications:
+- Test specific return schemas, error conditions, naming conventions, state transitions
+- These tests verify the **detailed behavioral contracts** specialists specified
+- Test at the module level the blueprint describes (if blueprint specifies `start_container() -> {success, status, error}`, test that function directly)
+- Mock external dependencies as specified in the blueprint
+
+**Tier 3 — Coverage Tests** (OPTIONAL, for gap-filling)
+After Tiers 1 and 2, if coverage target is not met:
+- Add unit tests for untested helper functions, edge cases, error paths
+- These tests MAY read source code to discover mockable seams (this is the ONLY tier where source code reading is allowed for test design)
+- Label these tests clearly: `# Coverage test — not derived from spec`

 ### Process Flow Coverage (if process_flow.md exists)

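To make the Tier 2 idea concrete, here is a minimal sketch of a blueprint-contract test in the spirit of the `start_container() -> {success, status, error}` example above. The stub implementation, specialist name, and blueprint line numbers are all invented for illustration; only the citation-comment format follows the protocol's convention.

```python
# Stand-in for the module under test. In a real run the test file would
# import it by the path the blueprint specifies; this inline stub exists
# only so the sketch is self-contained.
def start_container(name: str) -> dict:
    return {"success": True, "status": "running", "error": None}

def test_start_container_matches_blueprint_contract():
    result = start_container("app")
    # blueprint:devops:L42 — "returns {success: bool, status: str, error: str|None}"
    assert set(result) == {"success", "status", "error"}
    # blueprint:devops:L44 — "status is 'running' after a successful start"
    assert result["status"] == "running"
    assert result["success"] is True
```

Note that every value-checking assertion carries a spec citation, so the Step 5.5 audit can trace it back to the blueprint rather than to the source code.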
@@ -115,55 +163,56 @@ Use process_flow.md to identify cross-component integration boundaries and E2E p
 - **Error propagation**: For each error path described in Core Flows, design a test that triggers the failure and verifies the described fallback behavior.
 - **State mutations**: For each state mutation listed in Core Flows, verify the mutation occurs and dependents react correctly.

-If process_flow.md conflicts with actual implementation
+If process_flow.md conflicts with actual implementation, check build_report.md for accepted deviations. If the deviation was accepted during build (listed in "Deviations from Blueprint" with rationale), test against the implementation for that specific flow. If the deviation is NOT listed as accepted, test against process_flow.md and classify any failure as a Spec Violation.
+
+### File-to-Tier Mapping

-
+For each modified file, determine which test tier drives its testing:

-
+| File Type | Primary Tier | Rationale |
+|-----------|-------------|-----------|
+| Route / controller | Tier 1 (AC tests via HTTP) | ACs describe route behavior — test the route |
+| Engine / orchestrator | Tier 1 (AC tests of engine API) | ACs describe engine outcomes — test the engine |
+| Service / provider | Tier 2 (blueprint contract) | Blueprints specify provider contracts |
+| Model / schema | Tier 2 (blueprint contract) | Blueprints specify data shapes |
+| Utility / helper | Tier 3 (coverage) or Tier 2 (if blueprint specifies) | Only Tier 2 if blueprint has a deliverable spec for it |
+| Configuration | Skip (test indirectly via Tier 1) | Config effects are observable at route/engine level |
+| Template / static | Skip (test indirectly via Tier 1) | Template output is observable in route responses |

-
-|-----------|-----------|----------|
-| Utility / helper | Unit | High |
-| Model / schema | Integration | High |
-| Route / controller | Integration | High |
-| Component (UI) | Component + Unit | Medium |
-| Service / repository | Integration | Medium |
-| Configuration | Skip (test indirectly) | Low |
-| Migration / seed | Skip (test via integration) | Low |
-| Static asset / style | Skip | None |
+### Edge Cases, Mocking, and Coverage

-
+**Edge cases** to enumerate per interface: boundary values, null/undefined inputs, error paths (invalid input, failed external calls, timeouts), permission edges, state transitions.

-
-- **Boundary values**: min, max, zero, negative, empty string, empty array
-- **Null/undefined handling**: missing required fields, null inputs
-- **Error paths**: invalid input, failed external calls, timeout scenarios
-- **Permission edges**: unauthorized access, role boundaries (if applicable)
-- **State transitions**: before/after effects, idempotent operations
+**Mock strategy**: Follow project conventions from Step 2. Default: mock external dependencies only. Never mock the unit under test. Tier 1/2 tests mock at system boundaries; Tier 3 may mock internal seams.

-
+**Coverage target**: Match existing config threshold, or 80% line coverage for modified files. Focus on decision-heavy code paths (`[USER]` and `[SPEC]` decisions).

-
-- If project uses specific mock patterns (jest.mock, sinon, test doubles) → follow them
-- Default: mock external dependencies only (HTTP clients, databases, file system, third-party APIs)
-- Never mock the unit under test
-- Prefer dependency injection over module mocking when the codebase uses DI
+### Spec Oracle Hierarchy

-
+Tests derive expected behavior from spec artifacts, NOT from reading source code. Each oracle maps to a test tier:

-
-
-
+| Oracle | Spec Source | Drives Test Tier | What it defines |
+|--------|------------|-----------------|-----------------|
+| **Primary** | outline.md acceptance criteria | Tier 1 | Observable outcomes the system must produce |
+| **Secondary** | blueprints (Section 5 + 6) | Tier 2 | Detailed behavioral contracts: return schemas, error tables, naming conventions, state machines |
+| **Tertiary** | process_flow.md | Tier 1 + 2 | Integration seams, cross-component handoffs, state mutations, error propagation |
+| **Advisory** | test_advisory.md | Tier 2 + 3 | Edge cases, critical paths, failure modes (supplements, not replaces, blueprints) |
+
+When a test fails, the failure means the implementation disagrees with the spec — that is a finding, not automatically a bug in either the test or the code. See Step 6 Classify Failures for how to handle this.

 ### Acceptance Criteria Path Coverage

 For every acceptance criterion in outline.md that describes observable behavior ("displays X", "uses Y for Z", "produces output containing W"):

-1. At least one test MUST exercise the **actual entry point
-
-
+1. At least one **Tier 1** test MUST exercise the **actual entry point at the abstraction level the AC describes**. Read the AC carefully to determine the right level:
+   - AC mentions HTTP routes or UI behavior → test the route (e.g., `TestClient.post("/admin/container/app/start")`)
+   - AC mentions engine or service behavior → test the engine's public API (e.g., `engine.run(context)`)
+   - AC mentions CLI output → test the CLI command
+   - NEVER satisfy an AC exclusively with a unit test of an internal helper function
+2. The test MUST assert on the **expected output as described by the spec** (acceptance criterion + blueprint deliverable spec). Every assertion value must be traceable to a spec document.
+3. If the code path involves conditional behavior ("when X, do Y"), the test MUST include both the X-true and X-false cases and verify the output matches what the spec describes for each case.

-
+Tier 2 tests of internal functions supplement Tier 1 but do NOT substitute for them. Every AC needs Tier 1 coverage.

 ### Specialist Test Recommendations

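A minimal sketch of rule 3 above (covering both branches of a conditional AC) at an engine's public API. The engine, its context shape, and the AC quoted in the comments are invented for illustration; the point is the shape: one test per branch, each asserting the spec-described outcome.

```python
# Hypothetical engine under test: an invented AC says "when dry_run is set,
# report planned actions without executing them."
def run_engine(context: dict) -> dict:
    if context.get("dry_run"):
        return {"executed": False, "planned": ["create-network"]}
    return {"executed": True, "planned": []}

def test_engine_dry_run_true():
    result = run_engine({"dry_run": True})
    # outline:AC-3 — "dry run reports planned actions without executing"
    assert result["executed"] is False
    assert result["planned"]  # at least one planned action is reported

def test_engine_dry_run_false():
    result = run_engine({"dry_run": False})
    # outline:AC-3 — "a normal run executes the actions"
    assert result["executed"] is True
```

Both branches of the "when X, do Y" conditional get a test, and both tests exercise the engine's public entry point rather than an internal helper.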
@@ -178,12 +227,14 @@ Specialists have domain expertise about what should be tested. Incorporate their
 Write the test strategy to `{context_path}/scratch/test_strategy.md`. This serves as both an audit trail and a resume marker for crash recovery.

 The test strategy document MUST contain:
--
-- Test
--
+- **AC coverage matrix**: For each acceptance criterion, which test(s) cover it, at what tier, and at what abstraction level. Every AC with observable behavior MUST have at least one Tier 1 test.
+- Test files to create (path, tier, target source file)
+- Test cases per file (name, tier, what it validates, **which spec artifact defines the expected behavior**, **what the spec says the expected output is**)
+- Mock requirements per file (mock external deps only for Tier 1/2; Tier 3 may mock internal seams)
 - Framework command to run tests
-- Estimated test count and distribution
+- Estimated test count and distribution by tier
 - Which specialist recommendations were incorporated (and which were skipped, with rationale)
+- Any acceptance criteria where the expected behavior is ambiguous (flagged for potential SPEC_AMBIGUOUS markers)

 ## STEP 4: USER CONFIRMATION

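The "every observable AC has a Tier 1 test" rule from the strategy document is mechanically checkable. A sketch with an invented matrix shape (the real strategy document is prose/markdown, not JSON; this only illustrates the check):

```python
# Hypothetical AC coverage matrix: each acceptance criterion maps to the
# tests that cover it and the tier of each test.
coverage = {
    "AC-1": [{"test": "test_start_route", "tier": 1}],
    "AC-2": [{"test": "test_schema_contract", "tier": 2}],
}

def acs_missing_tier1(matrix: dict) -> list:
    """Return ACs with no Tier 1 test, i.e. violations of the rule above."""
    return [ac for ac, tests in matrix.items()
            if not any(t["tier"] == 1 for t in tests)]
```

Here `AC-2` would be flagged: its only coverage is a Tier 2 contract test, so it still needs a Tier 1 test at the abstraction level the AC describes.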
@@ -193,9 +244,12 @@ Present the test plan via AskUserQuestion:
 Question: "Test plan ready:

 **Framework:** [detected framework]
-**Test files:** [N] files
+**Test files:** [N] files
 **Test cases:** ~[total] tests covering [file count] modified files
-
+- Tier 1 (AC tests): [N] tests covering [M] of [P] acceptance criteria
+- Tier 2 (blueprint contracts): [N] tests
+- Tier 3 (coverage): [N] tests
+**AC coverage:** [M]/[P] acceptance criteria have Tier 1 tests [list any uncovered ACs]
 **Coverage target:** [threshold]%

 Proceed?"
@@ -215,27 +269,114 @@ Options:

 Delegate test creation to `intuition-code-writer` agents. Parallelize independent test files (multiple Task calls in a single response). Do NOT use `run_in_background` — you MUST wait for ALL subagents to return before proceeding to Step 6.

-For each test file, spawn an `intuition-code-writer` agent:
+For each test file, spawn an `intuition-code-writer` agent with a tier-appropriate prompt:
+
+### Tier 1 and Tier 2 Test Writer Prompt

 ```
-You are a test writer.
+You are a spec-first test writer. Your tests verify the code does what the SPEC says — not what the code happens to do. You will NOT read source code.

 **Framework:** [detected framework + version]
 **Test conventions:** [naming pattern, directory structure, import style from Step 2]
 **Mock patterns:** [project's established mock approach from Step 2]

-**
-
-
+**Blueprint-derived interfaces (from Step 2 research):**
+[Paste the blueprint interface extraction for this module — signatures, return schemas, error contracts, naming conventions, import paths. This comes from the BLUEPRINT, not from source code.]
+
+**Spec oracle — what the code SHOULD do:**
+- Acceptance criteria: [paste relevant acceptance criteria from outline.md]
+- Blueprint spec: Read [relevant blueprint path] — Section 5 (Deliverable Specification) for detailed contracts, Section 6 (Acceptance Mapping) for AC-to-deliverable mapping
+- Flow context: Read `{context_path}/process_flow.md` (if exists) for integration seams, state mutations, error propagation paths
+- Test advisory: [paste relevant section from test_advisory.md] for edge cases and failure modes

+**Test tier:** [Tier 1 or Tier 2]
 **Test file path:** [target test file path]
 **Test cases to implement:**
-[List each test case
+[List each test case with: name, tier, what it validates per the spec, expected behavior FROM SPEC (quote the source), mock requirements]
+
+## FILE ACCESS RULES
+- You MAY read: blueprint files, outline.md, process_flow.md, test_advisory.md
+- You MAY read: existing test files in the test directory (for conventions only)
+- You MUST NOT read source files being tested: [list source file paths]
+- You MUST NOT use Grep or Glob to search source files
+
+## ASSERTION SOURCING RULES
+For EVERY assertion that checks a specific value (exact string, number, status code, dict key):
+1. Add a comment citing the spec source: `# blueprint:{specialist}:L{line} — "{spec quote}"`
+2. If no spec document defines the expected value: mark `# SPEC_AMBIGUOUS: spec says "{quote}" — value not specified`
+
+For Tier 1 tests:
+- Test at the abstraction level the AC describes (HTTP routes, CLI output, observable state changes)
+- Mock ONLY external systems (Docker, databases, HTTP clients, cloud APIs) — do NOT mock internal modules
+- Assertions should verify user-observable outcomes, not internal function return values
+
+For Tier 2 tests:
+- Test at the module level the blueprint describes
+- Mock external dependencies as the blueprint specifies
+- Assertions should verify the behavioral contracts from the blueprint's Deliverable Specification
+
+Write the complete test file. Follow existing test style. Do NOT add test infrastructure.
+```
+
+### Tier 3 Test Writer Prompt (coverage gap-filling only)

-Write the complete test file to the specified path. Follow the project's existing test style exactly. Do NOT add test infrastructure (no new packages, no config changes).
 ```
+You are a coverage test writer. Your job is to increase test coverage for code paths not covered by Tier 1/2 spec tests.

-
+**Framework:** [detected framework + version]
+**Test conventions:** [naming pattern, directory structure, import style from Step 2]
+**Source file to cover:** Read [source file path] — you MAY read this file to discover testable code paths
+**Existing coverage gaps:** [list uncovered functions/branches from coverage report]
+**Test file path:** [target test file path]
+
+Label every test with: `# Coverage test — not derived from spec`
+Write focused unit tests for uncovered code paths. Follow existing test style.
+```
+
+SYNCHRONIZATION GATE: After all subagents return, verify each test file exists on disk using Glob. If any file is missing, retry that subagent once (foreground) with error context. Do NOT proceed to Step 5.5 until every planned test file is confirmed on disk.
+
+## STEP 5.5: SPEC COMPLIANCE AUDIT
+
+Before running tests, verify two things: (A) assertions trace to spec, and (B) ACs are tested at the right abstraction level.
+
+### Part A: Assertion Provenance
+
+For each Tier 1 and Tier 2 test file, identify every assertion that checks a **specific value** (exact strings, status codes, dict keys, field values, call arguments).
+
+For each value-assertion, check:
+1. Does it have a `# blueprint:` or `# SPEC_AMBIGUOUS:` comment citing the source?
+2. If no comment, does the value appear in a spec document (outline, blueprint, process_flow, test_advisory)?
+
+Assertions without spec provenance AND without SPEC_AMBIGUOUS markers are **source-derived**. (Tier 3 tests are exempt — they are explicitly implementation-derived.)
+
+### Part B: Abstraction Level Coverage
+
+For each acceptance criterion in outline.md that describes observable behavior:
+1. Check: is there at least one Tier 1 test that exercises the AC at the abstraction level it describes?
+2. If an AC describes HTTP route behavior but the only test is a unit test of an internal function → flag as **abstraction gap**
+
+Example of an abstraction gap:
+- AC T2.3: "Container operations execute successfully and status updates reflect within the next poll cycle"
+- Only test: `test_start_container_success()` which calls `start_container()` directly and checks `result["success"]`
+- Gap: No test exercises the actual HTTP route `POST /admin/container/{app_name}/start` and verifies the response
+
+### Reporting
+
+If Part A finds >20% source-derived assertions OR Part B finds any abstraction gaps, present via AskUserQuestion:
+
+```
+Header: "Spec Compliance Audit"
+Question: "[summary of findings]
+
+**Provenance:** [N] of [M] Tier 1/2 assertions lack spec citation [if applicable]
+**Abstraction gaps:** [list ACs with only lower-level coverage] [if applicable]
+
+Options: fix issues / accept as-is / skip to Step 6"
+```
+
+**If "fix issues":** Delegate to `intuition-code-writer` subagents. For provenance gaps, add spec citations or SPEC_AMBIGUOUS markers. For abstraction gaps, create additional Tier 1 tests at the AC's described abstraction level.
+
+**If "accept as-is":** Note findings in test report. Proceed to Step 6.

 ## STEP 6: RUN TESTS + FIX CYCLE

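Part A of the audit can be approximated mechanically. A simplified heuristic sketch (assumed behavior: it only checks for a citation on the same line as the assertion, where the real audit described above would also consult the preceding comment line and the spec documents themselves):

```python
import re

# Matches the protocol's citation markers: "# blueprint:..." or "# SPEC_AMBIGUOUS..."
CITED = re.compile(r"#\s*(blueprint:|SPEC_AMBIGUOUS)")

def find_uncited_assertions(test_source: str) -> list:
    """Return line numbers of value-checking assertions with no spec citation.

    A "value-checking" assertion is approximated here as an assert containing
    an equality comparison; the real audit casts a wider net (status codes,
    dict keys, call arguments).
    """
    uncited = []
    for lineno, line in enumerate(test_source.splitlines(), start=1):
        stripped = line.lstrip()
        if stripped.startswith("assert") and "==" in stripped and not CITED.search(line):
            uncited.append(lineno)
    return uncited
```

Running this over each Tier 1/2 test file yields the raw count behind the ">20% source-derived assertions" threshold in the Reporting step.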
@@ -251,17 +392,19 @@ Also run `mcp__ide__getDiagnostics` to catch type errors and lint issues in the
|
|
|
251
392
|
|
|
252
393
|
### Classify Failures
|
|
253
394
|
|
|
254
|
-
For each failure, classify:
|
|
395
|
+
For each failure, classify. The first question is always: **does the spec clearly define the expected behavior the test asserts?**

| Classification | How to identify | Action |
|---|---|---|
| **Test bug** (wrong assertion, incorrect mock, import error) | Test doesn't match the spec it claims to test, or has a structural error | Fix autonomously — `intuition-code-writer` agent |
| **Spec violation** (implementation disagrees with spec) | Test asserts spec-defined behavior, implementation returns something different, and the spec is clear and unambiguous | Escalate to user: "Test [name] expects [spec behavior] per [acceptance criterion / blueprint spec], but implementation returns [actual]. Is the spec wrong or the code?" Options: "Fix the code" / "Spec was wrong — update test" / "I'll investigate" |
| **Spec ambiguity** (spec underspecified, test assertion is a guess) | Test is marked SPEC_AMBIGUOUS, or the spec doesn't define the expected value precisely enough to write a deterministic assertion | Escalate to user: "Spec doesn't clearly define expected behavior for [scenario]. The code does [X]. Is that correct?" Options: "Yes, that's correct — lock it in" / "No, it should do [other]" / "Skip this test" |
| **Implementation bug, trivial** (off-by-one, missing null check, typo — 1-3 lines) | Spec is clear, implementation is clearly wrong, fix is small | Fix directly — `intuition-code-writer` agent |
| **Implementation bug, moderate** (logic error, missing handler — contained to one file) | Spec is clear, implementation is wrong, fix is contained | Fix — `intuition-code-writer` agent with full diagnosis |
| **Implementation bug, complex** (multi-file structural issue) | Spec is clear, but fix requires architectural changes | Escalate to user |
| **Fix would violate [USER] decision** | Any tier | STOP — escalate to user immediately |
| **Fix would violate [SPEC] decision** | Any tier | Note the conflict, proceed with fix (specialist had authority) |
| **Fix touches files outside build_report scope** | Any tier | Escalate to user (scope creep) |
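The decision order in the table can be sketched as a function. This is a hedged simplification, not the protocol itself: the thresholds are reduced to flags, the [USER]/[SPEC]-decision and scope-creep overrides are omitted, and distinguishing a spec violation (where the spec itself may be wrong) from a plain implementation bug remains a judgment call:

```python
def classify_failure(spec_clear: bool, test_matches_spec: bool,
                     spec_may_be_wrong: bool, fix_scope: str) -> str:
    # First question, per the table: does the spec define the behavior?
    if not spec_clear:
        return "spec ambiguity -> escalate"
    if not test_matches_spec:
        return "test bug -> fix autonomously"
    if spec_may_be_wrong:
        return "spec violation -> escalate"
    # Spec is clear and the test is right; the implementation is wrong.
    if fix_scope == "multi-file":
        return "implementation bug, complex -> escalate"
    if fix_scope == "one-file":
        return "implementation bug, moderate -> fix with diagnosis"
    return "implementation bug, trivial -> fix directly"

assert classify_failure(False, True, False, "1-3 lines") == "spec ambiguity -> escalate"
assert classify_failure(True, False, False, "one-file") == "test bug -> fix autonomously"
assert classify_failure(True, True, False, "multi-file").endswith("escalate")
```

The decision-boundary rows ("Any tier") would sit in front of this function as unconditional guards.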

### Decision Boundary Checking

@@ -298,22 +441,24 @@ Write `{context_path}/test_report.md`:

**Status:** Pass | Partial | Failed

## Test Summary
- **Tests created:** [N] (Tier 1: [N], Tier 2: [N], Tier 3: [N])
- **Passing:** [N]
- **Failing:** [N]
- **AC coverage:** [M]/[P] acceptance criteria have Tier 1 tests
- **Coverage:** [X]% (target: [Y]%)

## Test Files Created
| File | Tier | Tests | Covers |
|------|------|-------|--------|
| [path] | [1/2/3] | [count] | [what it tests — AC reference or blueprint section] |

## Failures & Resolutions

### [Test name]
- **Type:** [test bug / spec violation / spec ambiguity / implementation bug — trivial/moderate/complex]
- **Spec source:** [which acceptance criterion, blueprint spec, or process_flow section defined the expected behavior]
- **Root cause:** [description — what the spec says vs. what the implementation does]
- **Resolution:** [fix applied] OR **Escalated:** [reason — spec violation pending user decision / ambiguity / architectural / scope creep / max retries]

## Implementation Fixes Applied
| File | Change | Rationale |

@@ -325,6 +470,12 @@ Write `{context_path}/test_report.md`:

|-------|--------|
| [description] | [why not fixable: USER decision conflict / architectural / scope creep / max retries] |

## Assertion Provenance
- Value-assertions audited: **[N]**
- Spec-traced: **[N]** (value found in outline, blueprint, process_flow, or test_advisory)
- SPEC_AMBIGUOUS marked: **[N]** (spec underspecified, asserting implementation value)
- Source-derived (untraced): **[N]** [if any — list examples and user disposition: "accepted as-is" / "fixed"]
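The three provenance buckets reduce to a simple precedence: a spec citation wins, an explicit SPEC_AMBIGUOUS marker comes next, and everything else is source-derived. A minimal sketch (the function name and flags are illustrative, not part of the protocol):

```python
def assertion_provenance(has_spec_citation: bool, marked_spec_ambiguous: bool) -> str:
    # A spec citation always wins, even if the test is also marked
    # SPEC_AMBIGUOUS; untraced assertions fall through to the bucket
    # that gets surfaced to the user in the Part A audit.
    if has_spec_citation:
        return "spec-traced"
    if marked_spec_ambiguous:
        return "SPEC_AMBIGUOUS"
    return "source-derived"

assert assertion_provenance(True, False) == "spec-traced"
assert assertion_provenance(False, True) == "SPEC_AMBIGUOUS"
assert assertion_provenance(False, False) == "source-derived"
```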

## Decision Compliance
- Checked **[N]** decisions across **[M]** specialist decision logs
- `[USER]` violations: [count — list any, or "None"]