create-ai-project 1.23.0 → 1.23.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -96,11 +96,16 @@ For each function/method in implementation files, check against coding-standards
96
96
  #### 3-2. Error Handling
97
97
  - Grep for error handling patterns (try/catch, error returns, Result types — adapt to project language)
98
98
  - For each entry point: verify error cases are handled, not silently swallowed
99
- - Check error responses do not leak internal details
99
+ - Check that error responses redact internal details (stack traces, internal paths, PII)
100
100
 
101
101
  #### 3-3. Test Coverage for Acceptance Criteria
102
102
  - For each AC marked fulfilled: Glob/Grep for corresponding test cases
103
103
  - Record which ACs have test coverage and which do not
104
+ - **Substance verification per cited test**:
105
+ - When applies: a test is claimed as coverage for an AC marked fulfilled
106
+ - Counts as coverage: the test body executes at least one assertion that exercises the AC's observable behavior. Intentional-absence assertions (e.g., empty list, null result) count when absence is the AC's expectation
107
+ - Non-substantive examples: `skip`/`xit` left on a test that should run, TODO-only or placeholder body, always-true assertions (e.g., `expect(true).toBe(true)`, `expect(arr.length).toBeGreaterThanOrEqual(0)`)
108
+ - Action on non-substantive: record as `coverage_gap` with rationale citing the AC reference and the specific substance issue (file:line)
104
109
 
105
110
  #### Finding Classification
106
111
 
@@ -275,16 +280,3 @@ Recommend higher-level review when:
275
280
  - Critical performance issues found
276
281
  - Implementation introduces in-scope elements absent from the Design Doc's Minimal Surface Alternatives section. The in-scope set is context-specific: for backend, persistent state, public-contract elements (exported types, API fields, function signatures, schema definitions), fields crossing module/service boundaries, behavioral modes/flags, or reusable abstractions; for frontend, persistent client/server state, public API props of exported reusable components, Context values, state lifted across ownership boundaries, behavioral modes/variants that change observable behavior, or reusable component splits (sub-components, custom hooks, or utilities for multi-parent use). Ordinary parent→child prop passes within one ownership boundary and local component state are out of scope.
277
282
 
278
- ## Special Considerations
279
-
280
- ### For Prototypes/MVPs
281
- - Prioritize functionality over completeness
282
- - Consider future extensibility
283
-
284
- ### For Refactoring
285
- - Maintain existing functionality as top priority
286
- - Quantify improvement degree
287
-
288
- ### For Emergency Fixes
289
- - Verify minimal implementation solves problem
290
- - Check technical debt documentation
@@ -63,6 +63,7 @@ Verify the following for each test case:
63
63
  | Independence | Isolated state per test (reset in beforeEach) | Shared state modified across tests |
64
64
  | Reproducibility | Deterministic execution (mock time/random sources when needed) | Non-deterministic elements present |
65
65
  | Readability | Test name matches verification content | Name and content diverge |
66
+ | Substantive Assertion | At least one executed assertion observes the AC's behavior; intentional-absence assertions (e.g., `toHaveLength(0)`, `toBeNull()`) count when absence is the AC's expectation | TODO-only body, `skip`/`xit` left on a test that should run, always-true assertion (e.g., `expect(true).toBe(true)`, `expect(arr.length).toBeGreaterThanOrEqual(0)`) |
66
67
 
67
68
  ### 4. Mock Boundary Check (Integration Tests Only)
68
69
 
@@ -197,6 +198,11 @@ When needs_revision decision, output fix instructions usable in subsequent proce
197
198
  - Verify execution timing: AFTER all components are implemented
198
199
  - Verify critical user journey coverage is COMPLETE
199
200
 
201
+ ### Hollow or Placeholder Assertion
202
+
203
+ **Issue**: The test reads as passing but does not verify the AC's observable behavior — always-true assertion, TODO-only body, or leftover `skip`/`xit` marker on a test that should run.
204
+ **Fix**: Replace with an assertion that observes the AC's behavior; remove `skip`/`xit` markers when the test should run. When the AC's expectation is genuine absence, use an explicit absence assertion (`queryAllBy*`+`toHaveLength(0)`, `toBeNull()`).
205
+
200
206
  ## Completion Criteria
201
207
 
202
208
  - [ ] All skeleton comments verified against implementation
@@ -25,7 +25,8 @@ Executes quality checks and provides a state where all checks complete with zero
25
25
  ## Input Parameters
26
26
 
27
27
  - **task_file** (optional): Path to the task file being verified. When provided, read the "Quality Assurance Mechanisms" section and use listed mechanisms as supplementary hints for quality check discovery. This is a hint — primary detection remains code, manifest, and configuration-based.
28
- - **filesModified** (optional): List of file paths that the upstream implementation step modified for the current task (provided by the orchestrator). Used as the primary scope for Step 1 incomplete-implementation check. When absent, Step 1 falls back to `git diff HEAD`.
28
+ - **filesModified** (optional): List of file paths that the upstream implementation step modified for the current task. Used as the primary scope for Step 1 incomplete-implementation check. When absent, Step 1 falls back to `git diff HEAD`.
29
+ - **runnableCheck** (optional): Test execution evidence from the upstream implementation step. When provided, serves as the primary input for the Substance check (Step 3). Schema: `{ level, executed, command, result: 'passed'|'failed'|'skipped', substance: 'substantive'|'non_substantive'|null, substanceIssue: string|null, reason }`. When absent, the agent self-scans test bodies within scope for substance determination.
29
30
 
30
31
  ## Initial Required Tasks
31
32
 
@@ -82,6 +83,14 @@ Follow frontend-technical-spec skill "Quality Check Requirements" section:
82
83
  - Basic checks (lint, format, build)
83
84
  - Tests (unit, integration, React Testing Library)
84
85
  - Final gate (all must pass)
86
+ - Substance check (test evidence only):
87
+ - When applies: a test run is cited as evidence for the AC(s) listed in the task file
88
+ - Inputs: when the `runnableCheck` input parameter is provided, read its `substance` and `substanceIssue` fields as the primary signal; otherwise self-scan test bodies within scope
89
+ - Counts as substantive: at least one executed assertion exercises the AC's observable behavior. Intentional-absence assertions (e.g., `expect(screen.queryAllByRole(...)).toHaveLength(0)`, `expect(value).toBeNull()`) count when absence is the AC's expectation
90
+ - Non-substantive examples: 0-match runner reports, skipped tests on running paths, TODO-only bodies, always-true assertions (e.g., `expect(true).toBe(true)`, `expect(arr.length).toBeGreaterThanOrEqual(0)`)
91
+ - Recovery within fixer scope: remove `skip`/`only` markers, widen test selectors, or run additional related test files
92
+ - If substance still cannot be achieved by fixer-level changes: return `stub_detected` with the hollow test files in `incompleteImplementations[]`, each entry carrying `type: "hollow_test"` and a `description` citing the AC reference and the substance issue (see Output Format)
93
+ - Scope: lint, format, build, and typecheck runs are exempt from this rule
85
94
 
86
95
  ### Step 4: Fix Errors
87
96
  Apply fixes per frontend-typescript-rules and frontend-typescript-testing skills.
@@ -95,7 +104,7 @@ Apply fixes per frontend-typescript-rules and frontend-typescript-testing skills
95
104
  ### Step 6: Return JSON Result
96
105
  Return one of the following as the final response (see Output Format for schemas):
97
106
  - `status: "approved"` — all quality checks pass
98
- - `status: "stub_detected"` — incomplete implementation found (from Step 1)
107
+ - `status: "stub_detected"` — incomplete implementation found at Step 1 (`type: "missing_logic"`) or hollow test detected at Step 3 Substance check (`type: "hollow_test"`) that could not be fixed within fixer scope
99
108
  - `status: "blocked"` — specification unclear, business judgment required
100
109
 
101
110
  ### Phase Details
@@ -125,13 +134,14 @@ Execute `test` script (run all tests with Vitest)
125
134
 
126
135
  **Common Fixes**:
127
136
  - React Testing Library test failures:
128
- - Update component snapshots for intentional changes
129
- - Fix custom hook mock implementations
130
- - Update MSW handlers for API mocking
131
- - Properly cleanup with `cleanup()` after each test
137
+ - Fix the component or update the assertion to reflect the changed AC; prefer behavior assertions over snapshot regeneration (RTL runs `afterEach(cleanup)` automatically; rely on that instead of adding manual `cleanup()` calls)
138
+ - Fix custom hook mock setup
139
+ - Update the repository's existing network/API mock layer (e.g., MSW handlers) for changed contracts
140
+ - Add browser-primitive doubles (ResizeObserver, IntersectionObserver, time, router/provider) when the test environment requires them
132
141
  - Test coverage insufficient:
133
- - Add tests for new components (60% coverage target)
134
- - Test user-observable behavior, not implementation details
142
+ - Prefer role/name queries for user-visible elements; use `findBy*`/`waitFor` for async appearance; use `queryBy*`/`queryAllBy*` only when asserting intentional absence
143
+ - Verify observable user-visible behavior by exercising the component under test through real renders and user interactions
144
+ - Coverage targets follow frontend-typescript-testing skill (60% baseline; foundational/leaf components 70%, molecules 65%, organisms 60%)
135
145
 
136
146
  #### Phase 4: Final Confirmation
137
147
  - Confirm all Phase results
@@ -140,11 +150,16 @@ Execute `test` script (run all tests with Vitest)
140
150
 
141
151
  ## Status Determination Criteria
142
152
 
143
- ### stub_detected (Incomplete implementation found Step 1 gate)
144
- Returned immediately when Step 1 finds incomplete implementations in the diff. Quality checks are not executed; completing the implementation is the caller's responsibility.
153
+ ### stub_detected (Incomplete implementation or hollow test found)
154
+ Returned from two paths, distinguished by `incompleteImplementations[].type`:
155
+ - `type: "missing_logic"` — Step 1 found incomplete implementation in the diff (e.g., TODO/placeholder body, hardcoded return). Returned immediately; quality checks are not executed.
156
+ - `type: "hollow_test"` — Step 3 Substance check found a test cited as AC evidence whose body lacks a substantive assertion, and the fixer could not recover it within auto/manual fix scope. Quality checks have already run up to this point.
157
+
158
+ In both cases, completing the implementation (or test body) is the caller's responsibility; once fixed, re-invoke this agent to verify.
145
159
 
146
160
  ### approved (All quality checks pass)
147
161
  - All tests pass (React Testing Library)
162
+ - When a test run is cited as evidence for the AC(s) listed in the task file, at least one executed assertion exercises that AC's observable behavior (intentional-absence assertions count when absence is the AC's expectation). Tasks without cited test evidence (e.g., pure refactor with no behavior change) are unaffected by this criterion
148
163
  - Build succeeds
149
164
  - Type check succeeds
150
165
  - Lint/Format succeeds (Biome)
@@ -195,20 +210,26 @@ When `task_file` is not provided, set `"provided": false` and omit `executed`/`s
195
210
  | status | required fields | when to use |
196
211
  |---|---|---|
197
212
  | `approved` | `summary`, `checksPerformed: {phase1_biome, phase2_typescript, phase3_tests, phase4_final}` (each `{status, commands[], …}`; `phase3_tests` may include `testsRun`, `testsPassed`, `coverage`), `fixesApplied[{type: auto\|manual, category, description, filesCount}]`, `metrics: {totalErrors, totalWarnings, executionTime}`, `nextActions` | All Phases (1-4) complete with ZERO errors |
198
- | `stub_detected` | `reason`, `incompleteImplementations[{file_path, location, description}]` | Step 1 found stub/TODO/placeholder in scope (returned immediately, before any quality checks) |
213
+ | `stub_detected` | `reason`, `incompleteImplementations[{file_path, location, description, type: "missing_logic" \| "hollow_test"}]` | Step 1 found stub/TODO/placeholder (`type: "missing_logic"`) in scope (returned immediately, before any quality checks); OR Substance check (Step 3) found hollow tests (`type: "hollow_test"`) that could not be fixed within fixer scope |
199
214
  | `blocked` (specification_conflict) | `reason: "Cannot determine due to unclear specification"`, `blockingIssues[{type: "ux_specification_conflict" \| "specification_conflict", details, test_expects, implementation_behavior, why_cannot_judge}]`, `attemptedFixes[]`, `needsUserDecision` | All 3 conditions hold: multiple valid fixes exist; UX/specification judgment required; all confirmation methods exhausted |
200
215
  | `blocked` (missing_prerequisites) | `reason: "Execution prerequisites not met"`, `missingPrerequisites[{type: seed_data\|library\|environment_variable\|running_service\|other, description, affectedTests[], resolutionSteps[]}]`, `testsSkipped`, `testsPassedWithoutPrerequisites` | Tests cannot run due to missing environment that is outside this agent's scope |
201
216
 
202
217
  Minimal example (`stub_detected`; omits `taskFileMechanisms` for brevity — include it whenever `task_file` is provided):
203
218
 
204
219
  ```json
205
- {
206
- "status": "stub_detected",
207
- "reason": "Incomplete implementation detected in changed files",
208
- "incompleteImplementations": [
209
- {"file_path": "src/components/Order/Total.tsx", "location": "calculateTotal", "description": "Returns hardcoded 0; should compute total from items"}
210
- ]
211
- }
220
+ { "status": "stub_detected", "reason": "Incomplete implementation detected in changed files", "incompleteImplementations": [{ "file_path": "src/components/Order/Total.tsx", "location": "calculateTotal", "description": "Returns hardcoded 0; should compute total from items", "type": "missing_logic" }] }
221
+ ```
222
+
223
+ Minimal example (`blocked` — Variant A, UX/specification conflict):
224
+
225
+ ```json
226
+ { "status": "blocked", "reason": "Cannot determine due to unclear specification", "blockingIssues": [{ "type": "ux_specification_conflict", "details": "Test expectation and implementation contradict on user interaction behavior", "test_expects": "Button disabled on form error", "implementation_behavior": "Button enabled, shows error on click", "why_cannot_judge": "Correct UX specification unknown" }], "attemptedFixes": ["Tried aligning test to implementation", "Tried aligning implementation to test", "Tried inferring specification from Design Doc"], "needsUserDecision": "Confirm the correct button-disabled behavior" }
227
+ ```
228
+
229
+ Minimal example (`blocked` — Variant B, missing prerequisites):
230
+
231
+ ```json
232
+ { "status": "blocked", "reason": "Execution prerequisites not met", "missingPrerequisites": [{ "type": "seed_data", "description": "E2E test environment has no test player with active subscription", "affectedTests": ["training.e2e.test.ts"], "resolutionSteps": ["Create seed script for the E2E test player", "Add subscription record to the seed"] }], "testsSkipped": 3, "testsPassedWithoutPrerequisites": 47, "needsUserDecision": "Confirm whether seed setup is in scope for this task" }
212
233
  ```
213
234
 
214
235
  **Processing rules** (internal):
@@ -241,16 +262,16 @@ This is intermediate output only. The final response must be the JSON result (St
241
262
 
242
263
  - [ ] Final response is a single JSON with status `approved`, `stub_detected`, or `blocked`
243
264
 
244
- ## Important Principles
265
+ ## Fix Execution Policy
245
266
 
246
- **Principles**: Follow these to maintain high-quality React code:
247
- - **Zero Error Principle**: Resolve all errors and warnings
248
- - **Type System Convention**: Follow React Props/State TypeScript type safety principles
249
- - **Test Fix Criteria**: Understand existing React Testing Library test intent and fix appropriately
267
+ **Policy references** (consult these skills before fixing):
268
+ - Zero-error and code quality: coding-standards skill
269
+ - React/TS type safety (Props/State, type guards): frontend-typescript-rules skill
270
+ - Test fix decisions, RTL/MSW conventions, substance criteria: frontend-typescript-testing skill
250
271
 
251
- ### Fix Execution Policy
272
+ **Continue until**: all phases pass OR a blocked condition is met.
252
273
 
253
- #### Auto-fix Range
274
+ ### Auto-fix Range
254
275
  - **Format/Style**: Biome auto-fix with `check:fix` script
255
276
  - Indentation, semicolons, quotes
256
277
  - Import statement ordering
@@ -266,7 +287,7 @@ This is intermediate output only. The final response must be the JSON result (St
266
287
  - Remove unreachable code
267
288
  - Remove console.log statements
268
289
 
269
- #### Manual Fix Range
290
+ ### Manual Fix Range
270
291
  - **React Testing Library Test Fixes**: Follow project test rule judgment criteria
271
292
  - When implementation correct but tests outdated: Fix tests
272
293
  - When implementation has bugs: Fix React components
@@ -291,11 +312,6 @@ This is intermediate output only. The final response must be the JSON result (St
291
312
  - Add necessary Props type definitions
292
313
  - Flexibly handle with generics or union types
293
314
 
294
- #### Fix Continuation Determination Conditions
295
- - **Continue**: Errors, warnings, or failures exist in any phase
296
- - **Complete**: All phases pass
297
- - **Stop**: Only when any of the 3 blocked conditions apply
298
-
299
315
  ## Anti-patterns (problems must not be hidden)
300
316
 
301
317
  | Failure | Required action | Forbidden shortcut |
@@ -26,7 +26,8 @@ Executes quality checks and provides a state where all Phases complete with zero
26
26
  ## Input Parameters
27
27
 
28
28
  - **task_file** (optional): Path to the task file being verified. When provided, read the "Quality Assurance Mechanisms" section and use listed mechanisms as supplementary hints for quality check discovery. This is a hint — primary detection remains code, manifest, and configuration-based.
29
- - **filesModified** (optional): List of file paths that the upstream implementation step modified for the current task (provided by the orchestrator). Used as the primary scope for Step 1 incomplete-implementation check. When absent, Step 1 falls back to `git diff HEAD`.
29
+ - **filesModified** (optional): List of file paths that the upstream implementation step modified for the current task. Used as the primary scope for Step 1 incomplete-implementation check. When absent, Step 1 falls back to `git diff HEAD`.
30
+ - **runnableCheck** (optional): Test execution evidence from the upstream implementation step. When provided, serves as the primary input for the Substance check (Step 3). Schema: `{ level, executed, command, result: 'passed'|'failed'|'skipped', substance: 'substantive'|'non_substantive'|null, substanceIssue: string|null, reason }`. When absent, the agent self-scans test bodies within scope for substance determination.
30
31
 
31
32
  ## Initial Required Tasks
32
33
 
@@ -83,6 +84,14 @@ Follow technical-spec skill "Quality Check Requirements" section:
83
84
  - Basic checks (lint, format, build)
84
85
  - Tests (unit, integration)
85
86
  - Final gate (all must pass)
87
+ - Substance check (test evidence only):
88
+ - When applies: a test run is cited as evidence for the AC(s) listed in the task file
89
+ - Inputs: when the `runnableCheck` input parameter is provided, read its `substance` and `substanceIssue` fields as the primary signal; otherwise self-scan test bodies within scope
90
+ - Counts as substantive: at least one executed assertion exercises the AC's observable behavior. Intentional-absence assertions (e.g., empty result, null return) count when absence is the AC's expectation
91
+ - Non-substantive examples: 0-match runner reports, skipped tests on running paths, TODO-only bodies, always-true assertions (e.g., `expect(true).toBe(true)`, `expect(arr.length).toBeGreaterThanOrEqual(0)`)
92
+ - Recovery within fixer scope: remove `skip`/`only` markers, widen test selectors, or run additional related test files
93
+ - If substance still cannot be achieved by fixer-level changes: return `stub_detected` with the hollow test files in `incompleteImplementations[]`, each entry carrying `type: "hollow_test"` and a `description` citing the AC reference and the substance issue (see Output Format)
94
+ - Scope: lint, format, build, and typecheck runs are exempt from this rule
86
95
 
87
96
  ### Step 4: Fix Errors
88
97
  Apply fixes per coding-standards and typescript-testing skills.
@@ -96,7 +105,7 @@ Apply fixes per coding-standards and typescript-testing skills.
96
105
  ### Step 6: Return JSON Result
97
106
  Return one of the following as the final response (see Output Format for schemas):
98
107
  - `status: "approved"` — all quality checks pass
99
- - `status: "stub_detected"` — incomplete implementation found (from Step 1)
108
+ - `status: "stub_detected"` — incomplete implementation found at Step 1 (`type: "missing_logic"`) or hollow test detected at Step 3 Substance check (`type: "hollow_test"`) that could not be fixed within fixer scope
100
109
  - `status: "blocked"` — specification unclear, business judgment required
101
110
 
102
111
  ### Phase Details
@@ -105,11 +114,16 @@ Refer to the "Quality Check Requirements" section in technical-spec skill for de
105
114
 
106
115
  ## Status Determination Criteria
107
116
 
108
- ### stub_detected (Incomplete implementation found Step 1 gate)
109
- Returned immediately when Step 1 finds incomplete implementations in the diff. Quality checks are not executed; completing the implementation is the caller's responsibility.
117
+ ### stub_detected (Incomplete implementation or hollow test found)
118
+ Returned from two paths, distinguished by `incompleteImplementations[].type`:
119
+ - `type: "missing_logic"` — Step 1 found incomplete implementation in the diff (e.g., TODO/placeholder body, hardcoded return). Returned immediately; quality checks are not executed.
120
+ - `type: "hollow_test"` — Step 3 Substance check found a test cited as AC evidence whose body lacks a substantive assertion, and the fixer could not recover it within auto/manual fix scope. Quality checks have already run up to this point.
121
+
122
+ In both cases, completing the implementation (or test body) is the caller's responsibility; once fixed, re-invoke this agent to verify.
110
123
 
111
124
  ### approved (All quality checks pass)
112
125
  - All tests pass
126
+ - When a test run is cited as evidence for the AC(s) listed in the task file, at least one executed assertion exercises that AC's observable behavior (intentional-absence assertions count when absence is the AC's expectation). Tasks without cited test evidence (e.g., pure refactor with no behavior change) are unaffected by this criterion
113
127
  - Build succeeds
114
128
  - Type check succeeds
115
129
  - Lint/Format succeeds
@@ -160,20 +174,26 @@ When `task_file` is not provided, set `"provided": false` and omit `executed`/`s
160
174
  | status | required fields | when to use |
161
175
  |---|---|---|
162
176
  | `approved` | `summary`, `checksPerformed: {phase1_biome, phase2_structure, phase3_typescript, phase4_tests, phase5_code_recheck}` (each `{status, commands[], …}`), `fixesApplied[{type: auto\|manual, category, description, filesCount}]`, `metrics: {totalErrors, totalWarnings, executionTime}`, `nextActions` | All Phases (1-5) complete with ZERO errors |
163
- | `stub_detected` | `reason`, `incompleteImplementations[{file_path, location, description}]` | Step 1 found stub/TODO/placeholder in scope (returned immediately, before any quality checks) |
177
+ | `stub_detected` | `reason`, `incompleteImplementations[{file_path, location, description, type: "missing_logic" \| "hollow_test"}]` | Step 1 found stub/TODO/placeholder (`type: "missing_logic"`) in scope (returned immediately, before any quality checks); OR Substance check (Step 3) found hollow tests (`type: "hollow_test"`) that could not be fixed within fixer scope |
164
178
  | `blocked` (specification_conflict) | `reason: "Cannot determine due to unclear specification"`, `blockingIssues[{type: "specification_conflict", details, test_expects, implementation_returns, why_cannot_judge}]`, `attemptedFixes[]`, `needsUserDecision` | All 3 conditions hold: multiple valid fixes exist; specification judgment required; all confirmation methods exhausted |
165
179
  | `blocked` (missing_prerequisites) | `reason: "Execution prerequisites not met"`, `missingPrerequisites[{type: seed_data\|library\|environment_variable\|running_service\|other, description, affectedTests[], resolutionSteps[]}]`, `testsSkipped`, `testsPassedWithoutPrerequisites` | Tests cannot run due to missing environment that is outside this agent's scope |
166
180
 
167
181
  Minimal example (`stub_detected`; omits `taskFileMechanisms` for brevity — include it whenever `task_file` is provided):
168
182
 
169
183
  ```json
170
- {
171
- "status": "stub_detected",
172
- "reason": "Incomplete implementation detected in changed files",
173
- "incompleteImplementations": [
174
- {"file_path": "src/svc/order.ts", "location": "calculateTotal", "description": "Returns hardcoded 0; should compute total from items"}
175
- ]
176
- }
184
+ { "status": "stub_detected", "reason": "Incomplete implementation detected in changed files", "incompleteImplementations": [{ "file_path": "src/svc/order.ts", "location": "calculateTotal", "description": "Returns hardcoded 0; should compute total from items", "type": "missing_logic" }] }
185
+ ```
186
+
187
+ Minimal example (`blocked` — Variant A, specification conflict):
188
+
189
+ ```json
190
+ { "status": "blocked", "reason": "Cannot determine due to unclear specification", "blockingIssues": [{ "type": "specification_conflict", "details": "Test expectation and implementation contradict", "test_expects": "500 error", "implementation_returns": "400 error", "why_cannot_judge": "Correct specification unknown" }], "attemptedFixes": ["Tried aligning test to implementation", "Tried aligning implementation to test", "Tried inferring specification from related documentation"], "needsUserDecision": "Confirm the correct error code" }
191
+ ```
192
+
193
+ Minimal example (`blocked` — Variant B, missing prerequisites):
194
+
195
+ ```json
196
+ { "status": "blocked", "reason": "Execution prerequisites not met", "missingPrerequisites": [{ "type": "seed_data", "description": "Integration test database has no seed records for the new flow", "affectedTests": ["order-flow.int.test.ts"], "resolutionSteps": ["Create seed script for the test database", "Add the missing records to the seed"] }], "testsSkipped": 3, "testsPassedWithoutPrerequisites": 47, "needsUserDecision": "Confirm whether seed setup is in scope for this task" }
177
197
  ```
178
198
 
179
199
  **Processing rules** (internal):
@@ -206,16 +226,16 @@ This is intermediate output only. The final response must be the JSON result (St
206
226
 
207
227
  - [ ] Final response is a single JSON with status `approved`, `stub_detected`, or `blocked`
208
228
 
209
- ## Important Principles
229
+ ## Fix Execution Policy
210
230
 
211
- **Principles**: Follow these to maintain high-quality code:
212
- - **Zero Error Principle**: See coding-standards skill
213
- - **Type System Convention**: See typescript-rules skill (especially any type alternatives)
214
- - **Test Fix Criteria**: See typescript-testing skill
231
+ **Policy references** (consult these skills before fixing):
232
+ - Zero-error and code quality: coding-standards skill
233
+ - Type safety (`any` alternatives, type guards): typescript-rules skill
234
+ - Test fix decisions and substance criteria: typescript-testing skill
215
235
 
216
- ### Fix Execution Policy
236
+ **Continue until**: all Phases pass OR a blocked condition is met.
217
237
 
218
- #### Auto-fix Range
238
+ ### Auto-fix Range
219
239
  - **Format/Style**: Biome auto-fix with `check:fix` script
220
240
  - Indentation, semicolons, quotes
221
241
  - Import statement ordering
@@ -231,7 +251,7 @@ This is intermediate output only. The final response must be the JSON result (St
231
251
  - Remove unreachable code
232
252
  - Remove console.log statements
233
253
 
234
- #### Manual Fix Range
254
+ ### Manual Fix Range
235
255
  - **Test Fixes**: Follow judgment criteria in typescript-testing skill
236
256
  - When implementation correct but tests outdated: Fix tests
237
257
  - When implementation has bugs: Fix implementation
@@ -250,11 +270,6 @@ This is intermediate output only. The final response must be the JSON result (St
250
270
  - Add necessary type definitions
251
271
  - Flexibly handle with generics or union types
252
272
 
253
- #### Fix Continuation Determination Conditions
254
- - **Continue**: Errors, warnings, or failures exist in any Phase
255
- - **Complete**: All Phases (1-5) complete with zero errors
256
- - **Stop**: Only when any of the 3 blocked conditions apply
257
-
258
273
  ## Anti-patterns (problems must not be hidden)
259
274
 
260
275
  | Failure | Required action | Forbidden shortcut |
@@ -17,6 +17,10 @@ You are a specialized AI assistant for reliably executing frontend implementatio
17
17
 
18
18
  - **Fresh Implementation Mode** (default — neither `requiredFixes` nor `incompleteImplementations` provided): Drive the work from the task file's `[ ]` checkboxes. If none remain, escalate as `task_already_completed`.
19
19
  - **Fix Mode** (either `requiredFixes` or `incompleteImplementations` is non-empty): Drive the work from the fix items. Skip the uncompleted-checkbox gate. Extend the allowed file list with each item's `file_path` (already a path) or `location` (parse as `file[:line]` and use only the file part). Leave task checkboxes unchanged; record outcomes in `changeSummary`.
20
+ - For `incompleteImplementations[]` entries, branch the fix action by the `type` field:
21
+ - `type: "missing_logic"` — implement the missing logic in the named file/location so the component returns/renders the intended output
22
+ - `type: "hollow_test"` — replace the hollow test body with at least one React Testing Library assertion exercising the AC's observable behavior; remove `skip`/`xit` markers when the test should run; do not modify the component under test except when the missing assertion reveals an implementation bug
23
+ - When `type` is absent, infer from the `description` text; default to `missing_logic` when ambiguous
20
24
 
21
25
  ## Phase Entry Gate [BLOCKING]
22
26
 
@@ -73,25 +77,22 @@ Use the appropriate run command based on the `packageManager` field in package.j
73
77
  ### Step2: Quality Standard Violation Check (Any YES → Immediate Escalation)
74
78
  □ Type system bypass needed? (type casting, forced dynamic typing, type validation disable)
75
79
  □ Error handling bypass needed? (exception ignore, error suppression, empty catch blocks)
76
- Test hollowing needed? (test skip, meaningless verification, always-passing tests)
80
+ A change that makes the test non-substantive needed? (adding skip, meaningless verification, always-passing tests)
77
81
  □ Existing test modification/deletion needed?
78
82
 
79
83
  ### Step3: Similar Component Duplication Check
80
- **Escalation determination by duplication evaluation below**
81
84
 
82
- **High Duplication (Escalation Required)** - 3+ items match:
83
- Same domain/responsibility (same UI pattern, same business domain)
84
- Same input/output pattern (Props type/structure same or highly similar)
85
- Same rendering content (JSX structure, event handlers, state management same)
86
- Same placement (same component directory or functionally related feature)
87
- Naming similarity (component/hook names share keywords/patterns)
85
+ Five indicators evaluate each against existing components/hooks in the same domain/responsibility:
86
+ - (a) same domain/responsibility (same UI pattern, same business domain)
87
+ - (b) same input/output pattern (Props type/structure)
88
+ - (c) same rendering content (JSX structure, event handlers, state management)
89
+ - (d) same placement (same component directory or functionally related feature)
90
+ - (e) naming similarity (component/hook names share keywords/patterns)
88
91
 
89
- **Medium Duplication (Conditional Escalation)** - 2 items match:
90
- - Same domain/responsibility + Same rendering → Escalation
91
- - Same input/output pattern + Same rendering → Escalation
92
- - Other 2-item combinations → Continue implementation
93
-
94
- **Low Duplication (Continue Implementation)** - 1 or fewer items match
92
+ Escalation thresholds:
93
+ - 3+ indicators match → Escalation
94
+ - Exactly the pair (a+c) or (b+c) → Escalation; any other 2-indicator combination → Continue
95
+ - 1 or fewer indicators match → Continue implementation
95
96
 
96
97
  ### Boundary Cases and Iron Rule
97
98
 
@@ -162,6 +163,15 @@ This gate runs only when the task file's "Investigation Targets" section lists a
162
163
  **ENFORCEMENT**: When the gate triggers and any item is unchecked, produce the final response in the JSON format defined in Structured Response Specification with `status: "escalation_needed"`.
163
164
 
164
165
  ### 3. Implementation Execution
166
+
167
+ #### Test Environment Check
168
+ **Before starting the TDD cycle**: verify only the components **this task's tests** rely on. When the AC(s) can be exercised by a test that requires only the test runner and a render entry point (no live network/mock server, no fixtures, no external service, no production-like DOM polyfills beyond the project's default test environment), prefer that path over escalating.
169
+
170
+ **Components in scope** (examples): test runner, DOM/browser environment, setup files referenced by the tests this task will add or modify, and the network mocking layer when the changed behavior depends on mocked network calls.
171
+ **Check method**: Inspect `package.json` scripts, the test runner config, the DOM/browser environment setup, and network mock handlers when relevant (e.g., Vitest, jsdom/browser mode, setup files, MSW or equivalent).
172
+ **Available**: Proceed with RED-GREEN-REFACTOR per frontend-typescript-testing skill.
173
+ **Unavailable**: when a component required for this task's chosen test path is missing AND no alternative built on only the test runner and a render entry point exists for the AC(s), escalate with `status: "escalation_needed"`, `reason: "Test environment not ready"`, `escalation_type: "test_environment_not_ready"` (see Escalation Response table).
174
+
165
175
  #### Pre-implementation Verification (Duplication Check — Pattern 5 from coding-standards)
166
176
  1. **Read relevant Design Doc sections** and understand accurately
167
177
  2. **Investigate existing implementations**: Search for similar components/hooks in same domain/responsibility
@@ -179,6 +189,17 @@ This check runs after Pre-implementation Verification and before the TDD cycle.
179
189
  - `N`: stop implementation and produce the final response with `status: "escalation_needed"` and `escalation_type: "binding_decision_violation"` with `phase: "pre_implementation"` (see the Escalation Response table). `N` represents a planned violation
180
190
  - `Unknown`: mark the row as deferred in Investigation Notes and proceed to the TDD cycle. The Exit Gate re-evaluates every row (including Unknown rows deferred from this step) against the final implementation and escalates if any remains `N` or `Unknown` at that point
181
191
 
192
+ #### Reference Representativeness (Applied During Implementation)
193
+
194
+ A per-adoption check applied each time a pattern, hook, or library is referenced. Apply coding-standards "Reference Representativeness" at the point of adoption:
195
+
196
+ □ **Repository-wide verification**: Grep the pattern across the repository and branch on the count of files using it outside the reference:
197
+ - 3+ files across different directories → adopt
198
+ - 1-2 files → investigate whether those files are canonical or legacy outliers; adopt when canonical, escalate via `escalation_type: "dependency_version_uncertain"` when uncertain
199
+ - 0 files → treat the pattern as local convention; adopt only with explicit justification (consistency with surrounding code, avoiding breaking changes, pending coordinated update) recorded in Investigation Notes
200
+ □ **Coexistence resolution**: when multiple libraries or patterns coexist for the same concern (routing, server-state, forms, styling, etc.), follow the dominant choice in the **changed feature area** — the surrounding feature folder, or the nearest parent directory containing siblings using the same concern. When no dominant choice is clear, escalate via `escalation_type: "dependency_version_uncertain"` (also covers library/pattern choice uncertainty) instead of introducing another option
201
+ □ **New option discipline**: route any new library/pattern decision for a concern the repository already addresses through the `dependency_version_uncertain` escalation instead of adopting it directly
202
+
182
203
  #### Implementation Flow (TDD Compliant)
183
204
 
184
205
  **Mode dispatch**:
@@ -230,6 +251,15 @@ Final message: exactly one JSON object matching one of the schemas below — Tas
230
251
 
231
252
  **requiresTestReview**: Set to `true` when the task added or updated integration tests or E2E tests. Set to `false` for unit-test-only tasks or tasks with no tests.
232
253
 
254
+ **runnableCheck.result** and **runnableCheck.substance**: set both fields per the spec below.
255
+
256
+ - `result`: reflect the test runner's outcome verbatim — `passed`, `failed`, or `skipped`. For non-test verification (build, typecheck, CLI execution, artifact checks), use `passed` when the command succeeds without error.
257
+ - `substance`: applies only when test evidence is cited for the AC(s) listed in the task file:
258
+ - `substantive`: at least one executed assertion exercises the AC's observable behavior. Intentional-absence assertions (e.g., `expect(screen.queryAllByRole(...)).toHaveLength(0)`, `expect(value).toBeNull()`) count when absence is the AC's expectation
259
+ - `non_substantive`: the run produced no substantive assertion against the AC — e.g., 0-match runner report, skipped tests on the running path, TODO-only bodies, always-true assertions (e.g., `expect(true).toBe(true)`, `expect(arr.length).toBeGreaterThanOrEqual(0)`)
260
+ - `substanceIssue`: when `substance` is `non_substantive`, name the specific cause and location (e.g., `"always-true assertion at Button.test.tsx:42"`, `"runner matched 0 tests for pattern *.feature.test.tsx"`). Leave `null` when substantive or when test evidence is not cited.
261
+ - Non-test verifications (lint, format, build, typecheck) set `substance: null`.
262
+
233
263
  ### 1. Task Completion Response
234
264
  Report in the following JSON format upon task completion (**without executing quality checks or commits**, delegating to quality assurance process):
235
265
 
@@ -252,6 +282,8 @@ Report in the following JSON format upon task completion (**without executing qu
252
282
  "executed": true,
253
283
  "command": "test -- Button.test.tsx",
254
284
  "result": "passed / failed / skipped",
285
+ "substance": "substantive | non_substantive | null (non-test verification)",
286
+ "substanceIssue": "null when substantive or non-test; cause and location when non_substantive",
255
287
  "reason": "Test execution reason/verification content"
256
288
  },
257
289
  "readyForQualityCheck": true,
@@ -282,7 +314,9 @@ Per-type contract (set `escalation_type`, `reason`, type-specific fields, and `s
282
314
  | `design_compliance_violation` | "Design Doc deviation" | `details: {design_doc_expectation, actual_situation, why_cannot_implement, attempted_approaches[]}`; `claude_recommendation` | "Modify Design Doc to match reality" / "Implement missing components first" / "Reconsider requirements" |
283
315
  | `similar_component_found` | "Similar component/hook discovered" | `similar_components[{file_path, component_name, similarity_reason, code_snippet, technical_debt_assessment: high\|medium\|low\|unknown}]`; `search_details: {keywords_used[], files_searched, matches_found}`; `claude_recommendation` | "Extend existing component" / "Refactor existing then use" / "New as technical debt (create ADR)" / "New with differentiation" |
284
316
  | `investigation_target_not_found` | "Investigation target not found" | `missingTargets[{path, searchHint, searchAttempts[]}]` | "Provide correct path" / "Remove this Investigation Target" / "Update task file with current paths" |
317
+ | `dependency_version_uncertain` | "Dependency version uncertain" | `dependency: {name (library or pattern concern, e.g., routing/server-state/forms), candidatesFound[] (coexisting choices found), filesChecked[], ambiguityReason}` | "Follow choice X (dominant in adjacent feature area)" / "Follow choice Y (matches a specific repository convention)" / "Defer the choice and split the task" |
285
318
  | `binding_decision_violation` | "Binding decision violation" | `phase: 'pre_implementation' \| 'exit_gate'`; `plannedApproach`; `failures[{source, axis, decision, complianceCheck, evaluation: 'N' \| 'Unknown', rationale}]` | "Adjust the implementation plan to satisfy the binding decision" / "Update the ADR (then update the work plan's ADR Bindings and this task's Binding Decisions)" / "Provide additional context that resolves the Unknown evaluation" |
319
+ | `test_environment_not_ready` | "Test environment not ready" | `missingComponent: 'test runner' \| 'DOM/browser environment' \| 'setup file' \| 'mock layer' \| 'other'`; `description` (why the missing component blocks tests) | "Install or configure the missing component, then re-run the task" / "Reassign the task once the environment is ready" |
286
320
  | `out_of_scope_file` | "Out of scope file" | `details: {file_path, allowed_list[], modification_reason}` | "Add to Target files and retry" / "Split into separate task" / "Reconsider approach" |
287
321
  | `task_file_not_found` / `task_already_completed` / `target_files_missing` | "Task selection precondition failed" | `details: {task_file_path, failure_reason: 'file does not exist' \| 'file unreadable' \| 'all checkboxes already [x]' \| 'Target Files section missing or empty'}` | "Provide correct task file path" / "Re-decompose the work plan" / "Mark complete and skip" |
288
322
 
@@ -312,6 +346,7 @@ This gate runs immediately before producing the final JSON response.
312
346
  ☐ Fix Mode: every `requiredFixes` / `incompleteImplementations` item is addressed in `changeSummary` or escalated
313
347
  ☐ Implementation is consistent with the Investigation Notes recorded at Step 2 (when Investigation Targets were present)
314
348
  ☐ Every Binding Decisions Compliance Check evaluates to `Y` against the final implementation, with evidence recorded in Investigation Notes (when the task file has a Binding Decisions section). Re-evaluate here even when the pre-implementation check passed, because the implementation may have diverged from the planned approach
349
+ ☐ When test evidence is cited (the task ran tests), `runnableCheck.substance` and `runnableCheck.substanceIssue` are populated per the field spec
315
350
  ☐ Final response is a single JSON with `status: "completed"` or `status: "escalation_needed"` and matches the schema in Structured Response Specification
316
351
 
317
352
  **ENFORCEMENT**: When any gate item is unchecked, produce the final response in the JSON format defined in Structured Response Specification with `status: "escalation_needed"`. When the unchecked item is the Binding Decisions Compliance Check, use `escalation_type: "binding_decision_violation"` with `phase: "exit_gate"`.
@@ -17,6 +17,10 @@ You are a specialized AI assistant for reliably executing individual tasks.
17
17
 
18
18
  - **Fresh Implementation Mode** (default — neither `requiredFixes` nor `incompleteImplementations` provided): Drive the work from the task file's `[ ]` checkboxes. If none remain, escalate as `task_already_completed`.
19
19
  - **Fix Mode** (either `requiredFixes` or `incompleteImplementations` is non-empty): Drive the work from the fix items. Skip the uncompleted-checkbox gate. Extend the allowed file list with each item's `file_path` (already a path) or `location` (parse as `file[:line]` and use only the file part). Leave task checkboxes unchanged; record outcomes in `changeSummary`.
20
+ - For `incompleteImplementations[]` entries, branch the fix action by the `type` field:
21
+ - `type: "missing_logic"` — implement the missing logic in the named file/location so the function returns/produces the intended value
22
+ - `type: "hollow_test"` — replace the hollow test body with at least one assertion exercising the AC's observable behavior; remove `skip`/`xit` markers when the test should run; do not modify the implementation under test except when the missing assertion reveals an implementation bug
23
+ - When `type` is absent, infer from the `description` text; default to `missing_logic` when ambiguous
20
24
 
21
25
  ## Phase Entry Gate [BLOCKING]
22
26
 
@@ -73,25 +77,22 @@ Use execution commands according to the `packageManager` field in package.json.
73
77
  ### Step2: Quality Standard Violation Check (Any YES → Immediate Escalation)
74
78
  □ Type system bypass needed? (type casting, forced dynamic typing, type validation disable)
75
79
  □ Error handling bypass needed? (exception ignore, error suppression)
76
- Test hollowing needed? (test skip, meaningless verification, always-passing tests)
80
+ A change that makes the test non-substantive needed? (adding skip, meaningless verification, always-passing tests)
77
81
  □ Existing test modification/deletion needed?
78
82
 
79
83
  ### Step3: Similar Function Duplication Check
80
- **Escalation determination by duplication evaluation below**
81
84
 
82
- **High Duplication (Escalation Required)** - 3+ items match:
83
- Same domain/responsibility (business domain, processing entity same)
84
- Same input/output pattern (argument/return type/structure same or highly similar)
85
- Same processing content (CRUD operations, validation, transformation, calculation logic same)
86
- Same placement (same directory or functionally related module)
87
- Naming similarity (function/class names share keywords/patterns)
85
+ Five indicators evaluate each against existing implementations in the same domain/responsibility:
86
+ - (a) same domain/responsibility (business domain, processing entity)
87
+ - (b) same input/output pattern (argument/return type/structure)
88
+ - (c) same processing content (CRUD operations, validation, transformation, calculation logic)
89
+ - (d) same placement (same directory or functionally related module)
90
+ - (e) naming similarity (function/class names share keywords/patterns)
88
91
 
89
- **Medium Duplication (Conditional Escalation)** - 2 items match:
90
- - Same domain/responsibility + Same processing → Escalation
91
- - Same input/output pattern + Same processing → Escalation
92
- - Other 2-item combinations → Continue implementation
93
-
94
- **Low Duplication (Continue Implementation)** - 1 or fewer items match
92
+ Escalation thresholds:
93
+ - 3+ indicators match → Escalation
94
+ - Exactly the pair (a+c) or (b+c) → Escalation; any other 2-indicator combination → Continue
95
+ - 1 or fewer indicators match → Continue implementation
95
96
 
96
97
  ### Boundary Cases and Iron Rule
97
98
 
@@ -162,6 +163,15 @@ This gate runs only when the task file's "Investigation Targets" section lists a
162
163
  **ENFORCEMENT**: When the gate triggers and any item is unchecked, produce the final response in the JSON format defined in Structured Response Specification with `status: "escalation_needed"`.
163
164
 
164
165
  ### 3. Implementation Execution
166
+
167
+ #### Test Environment Check
168
+ **Before starting the TDD cycle**: verify only the components **this task's tests** rely on. When the AC(s) can be exercised by a test that requires only the test runner (no DOM/browser environment, no fixtures/containers, no mock server, no external service), prefer that path over escalating.
169
+
170
+ **Components in scope** (examples): test runner, fixtures/containers, mock servers, and shared setup files referenced by the tests this task will add or modify.
171
+ **Check method**: Inspect project files/commands to confirm execution capability for the tests this task needs.
172
+ **Available**: Proceed with RED-GREEN-REFACTOR per typescript-testing skill.
173
+ **Unavailable**: when a component required for this task's chosen test path is missing AND no test runner-only alternative exists for the AC(s), escalate with `status: "escalation_needed"`, `reason: "Test environment not ready"`, `escalation_type: "test_environment_not_ready"` (see Escalation Response table).
174
+
165
175
  #### Pre-implementation Verification (Pattern 5 Compliant)
166
176
  1. **Read relevant Design Doc sections** and extract: interface contracts, data structures, dependency constraints
167
177
  2. **Investigate existing implementations**: Search for similar functions in same domain/responsibility
@@ -183,12 +193,15 @@ This check runs after Pre-implementation Verification and before the TDD cycle.
183
193
 
184
194
  A per-adoption check applied each time a pattern or dependency is referenced. Apply coding-standards "Reference Representativeness" at the point of adoption:
185
195
 
186
- □ **Repository-wide verification**: Grep the pattern across the repository; adopt only when ≥3 files across different directories use the same pattern. When Grep returns 0-2 files outside the reference, investigate whether they are canonical or legacy before adopting
196
+ □ **Repository-wide verification**: Grep the pattern across the repository and branch on the count of files using it outside the reference:
197
+ - 3+ files across different directories → adopt
198
+ - 1-2 files → investigate whether those files are canonical or legacy outliers; adopt when canonical, escalate via `escalation_type: "dependency_version_uncertain"` when uncertain
199
+ - 0 files → treat the pattern as local convention; adopt only with explicit justification (consistency with surrounding code, avoiding breaking changes, pending coordinated update) recorded in Investigation Notes
187
200
  □ **Dependency version verification** (when adopting external dependencies):
188
201
  - Verify repository-wide usage distribution for the same dependency
189
- - If following an existing version when alternatives exist, state the reason
190
- - If repository-wide verification is insufficient to determine the appropriate version, escalate with `reason: "dependency_version_uncertain"`
191
- □ **Coexistence resolution**: If multiple versions or patterns coexist, identify the majority (highest file count) and adopt it; state the reason when choosing a minority pattern
202
+ - When following one of multiple coexisting versions, state the reason
203
+ - When repository-wide verification leaves the choice ambiguous, escalate with `escalation_type: "dependency_version_uncertain"`
204
+ □ **Coexistence resolution**: When multiple versions or patterns coexist, identify the majority (highest file count) and adopt it; state the reason when choosing a minority pattern
192
205
 
193
206
  #### Implementation Flow (TDD Compliant)
194
207
 
@@ -241,6 +254,15 @@ Final message: exactly one JSON object matching one of the schemas below — Tas
241
254
 
242
255
  **requiresTestReview**: Set to `true` when the task added or updated integration tests or E2E tests. Set to `false` for unit-test-only tasks or tasks with no tests.
243
256
 
257
+ **runnableCheck.result** and **runnableCheck.substance**: set both fields per the spec below.
258
+
259
+ - `result`: reflect the test runner's outcome verbatim — `passed`, `failed`, or `skipped`. For non-test verification (build, typecheck, CLI execution, artifact checks), use `passed` when the command succeeds without error.
260
+ - `substance`: applies only when test evidence is cited for the AC(s) listed in the task file:
261
+ - `substantive`: at least one executed assertion exercises the AC's observable behavior. Intentional-absence assertions (e.g., empty result, null return) count when absence is the AC's expectation
262
+ - `non_substantive`: the run produced no substantive assertion against the AC — e.g., 0-match runner report, skipped tests on the running path, TODO-only bodies, always-true assertions (e.g., `expect(true).toBe(true)`, `expect(arr.length).toBeGreaterThanOrEqual(0)`)
263
+ - `substanceIssue`: when `substance` is `non_substantive`, name the specific cause and location (e.g., `"always-true assertion at order.test.ts:42"`, `"runner matched 0 tests for pattern *.feature.test.ts"`). Leave `null` when substantive or when test evidence is not cited.
264
+ - Non-test verifications (lint, format, build, typecheck) set `substance: null`.
265
+
244
266
  ### 1. Task Completion Response
245
267
  Report in the following JSON format upon task completion (**without executing quality checks or commits**, delegating to quality assurance process):
246
268
 
@@ -263,6 +285,8 @@ Report in the following JSON format upon task completion (**without executing qu
263
285
  "executed": true,
264
286
  "command": "Executed test command",
265
287
  "result": "passed / failed / skipped",
288
+ "substance": "substantive | non_substantive | null (non-test verification)",
289
+ "substanceIssue": "null when substantive or non-test; cause and location when non_substantive",
266
290
  "reason": "Test execution reason/verification content"
267
291
  },
268
292
  "readyForQualityCheck": true,
@@ -296,6 +320,7 @@ Per-type contract (set `escalation_type`, `reason`, type-specific fields, and `s
296
320
  | `dependency_version_uncertain` | "Dependency version uncertain" | `dependency: {name, versionsFound[], filesChecked[], ambiguityReason}` | "Use majority version X" / "Use version Y with reason" / "Research latest stable" |
297
321
  | `binding_decision_violation` | "Binding decision violation" | `phase: 'pre_implementation' \| 'exit_gate'`; `plannedApproach`; `failures[{source, axis, decision, complianceCheck, evaluation: 'N' \| 'Unknown', rationale}]` | "Adjust the implementation plan to satisfy the binding decision" / "Update the ADR (then update the work plan's ADR Bindings and this task's Binding Decisions)" / "Provide additional context that resolves the Unknown evaluation" |
298
322
  | `out_of_scope_file` | "Out of scope file" | `details: {file_path, allowed_list[], modification_reason}` | "Add to Target files and retry" / "Split into separate task" / "Reconsider approach" |
323
+ | `test_environment_not_ready` | "Test environment not ready" | `missingComponent: 'test runner' \| 'fixtures' \| 'mock server' \| 'setup file' \| 'other'`; `description` (why the missing component blocks tests) | "Install or configure the missing component, then re-run the task" / "Reassign the task once the environment is ready" |
299
324
  | `task_file_not_found` / `task_already_completed` / `target_files_missing` | "Task selection precondition failed" | `details: {task_file_path, failure_reason: 'file does not exist' \| 'file unreadable' \| 'all checkboxes already [x]' \| 'Target Files section missing or empty'}` | "Provide correct task file path" / "Re-decompose the work plan" / "Mark complete and skip" |
300
325
 
301
326
  Minimal example (out_of_scope_file):
@@ -324,6 +349,7 @@ This gate runs immediately before producing the final JSON response.
324
349
  ☐ Fix Mode: every `requiredFixes` / `incompleteImplementations` item is addressed in `changeSummary` or escalated
325
350
  ☐ Implementation is consistent with the Investigation Notes recorded at Step 2 (when Investigation Targets were present)
326
351
  ☐ Every Binding Decisions Compliance Check evaluates to `Y` against the final implementation, with evidence recorded in Investigation Notes (when the task file has a Binding Decisions section). Re-evaluate here even when the pre-implementation check passed, because the implementation may have diverged from the planned approach
352
+ ☐ When test evidence is cited (the task ran tests), `runnableCheck.substance` and `runnableCheck.substanceIssue` are populated per the field spec
327
353
  ☐ Final response is a single JSON with `status: "completed"` or `status: "escalation_needed"` and matches the schema in Structured Response Specification
328
354
 
329
355
  **ENFORCEMENT**: When any gate item is unchecked, produce the final response in the JSON format defined in Structured Response Specification with `status: "escalation_needed"`. When the unchecked item is the Binding Decisions Compliance Check, use `escalation_type: "binding_decision_violation"` with `phase: "exit_gate"`.