codex-workflows 0.2.1 → 0.2.3

Files changed (31)
  1. package/.agents/skills/recipe-add-integration-tests/SKILL.md +2 -2
  2. package/.agents/skills/recipe-build/SKILL.md +1 -1
  3. package/.agents/skills/recipe-diagnose/SKILL.md +20 -4
  4. package/.agents/skills/recipe-front-build/SKILL.md +2 -2
  5. package/.agents/skills/recipe-fullstack-build/SKILL.md +1 -1
  6. package/.agents/skills/recipe-fullstack-implement/SKILL.md +1 -1
  7. package/.agents/skills/recipe-implement/SKILL.md +1 -1
  8. package/.agents/skills/recipe-reverse-engineer/SKILL.md +56 -12
  9. package/.agents/skills/recipe-update-doc/SKILL.md +10 -5
  10. package/.agents/skills/subagents-orchestration-guide/SKILL.md +3 -3
  11. package/.agents/skills/subagents-orchestration-guide/references/monorepo-flow.md +2 -2
  12. package/.codex/agents/code-reviewer.toml +11 -1
  13. package/.codex/agents/code-verifier.toml +58 -21
  14. package/.codex/agents/document-reviewer.toml +4 -2
  15. package/.codex/agents/integration-test-reviewer.toml +4 -0
  16. package/.codex/agents/investigator.toml +20 -17
  17. package/.codex/agents/prd-creator.toml +39 -24
  18. package/.codex/agents/quality-fixer-frontend.toml +15 -7
  19. package/.codex/agents/quality-fixer.toml +15 -7
  20. package/.codex/agents/requirement-analyzer.toml +4 -0
  21. package/.codex/agents/rule-advisor.toml +9 -0
  22. package/.codex/agents/scope-discoverer.toml +67 -29
  23. package/.codex/agents/security-reviewer.toml +4 -0
  24. package/.codex/agents/solver.toml +6 -2
  25. package/.codex/agents/task-executor-frontend.toml +9 -0
  26. package/.codex/agents/task-executor.toml +9 -0
  27. package/.codex/agents/technical-designer-frontend.toml +68 -115
  28. package/.codex/agents/technical-designer.toml +70 -114
  29. package/.codex/agents/verifier.toml +11 -13
  30. package/README.md +2 -2
  31. package/package.json +1 -1
@@ -109,11 +109,11 @@ Check Step 5 result:
109
109
 
110
110
  Spawn quality-fixer agent: "Final quality assurance for test files added in this workflow. Run all tests and verify coverage."
111
111
 
112
- **Expected output**: `approved` (true/false)
112
+ **Expected output**: `status` (`approved`/`blocked`)
113
113
 
114
114
  ### Step 8: Commit
115
115
 
116
- On `approved: true` from quality-fixer:
116
+ On `status: "approved"` from quality-fixer:
117
117
  - MUST commit test files with appropriate message
118
118
  ENFORCEMENT: Commits without quality-fixer approval are invalid.
119
119
 
@@ -80,7 +80,7 @@ For EACH task, YOU MUST:
80
80
  - `approved` -> Proceed to step 4
81
81
  - `readyForQualityCheck: true` -> Proceed to step 4
82
82
  4. **Spawn quality-fixer agent**: "Execute all quality checks and fixes"
83
- 5. **COMMIT on approval**: After `approved: true` from quality-fixer -> Execute git commit
83
+ 5. **COMMIT on approval**: After `status: "approved"` from quality-fixer -> Execute git commit
84
84
 
85
85
  **CRITICAL**: MUST monitor ALL structured responses WITHOUT EXCEPTION and ENSURE every quality gate is passed.
86
86
  ENFORCEMENT: Proceeding past a failed quality gate invalidates all subsequent work.
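
The hunks above replace the boolean `approved` field with a string `status` field. A minimal sketch of how an orchestrator might gate the commit on the new field follows. Only `status` and its values `approved`/`blocked` come from the diff; the names `QualityFixerResult`, `parseQualityFixerResult`, and `shouldCommit` are hypothetical.

```typescript
// Hypothetical type: only the `status` field and its two values appear in the diff.
type QualityFixerResult = { status: "approved" | "blocked" };

// Accepts an already-parsed JSON value and returns a typed result,
// or null when the payload does not carry a recognized `status`.
function parseQualityFixerResult(payload: unknown): QualityFixerResult | null {
  if (typeof payload !== "object" || payload === null) {
    return null;
  }
  const status = (payload as Record<string, unknown>)["status"];
  if (status === "approved") {
    return { status: "approved" };
  }
  if (status === "blocked") {
    return { status: "blocked" };
  }
  return null;
}

// Commit only on status "approved". A legacy payload such as
// { approved: true } has no `status` field and is therefore rejected.
function shouldCommit(payload: unknown): boolean {
  const result = parseQualityFixerResult(payload);
  return result !== null && result.status === "approved";
}
```

Rejecting the legacy shape outright, rather than accepting both, is a design choice of this sketch; it surfaces agents that were not updated to the 0.2.3 response format.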
@@ -83,7 +83,21 @@ Register the following and execute:
83
83
 
84
84
  ### Step 1: Investigation (investigator)
85
85
 
86
- Spawn investigator agent: "Comprehensively collect information related to the following phenomenon. Phenomenon: [Problem reported by user]. Problem essence: [taskEssence]. Investigation focus: [investigationFocus]. Applicable rules: [selectedRules summary]."
86
+ Spawn investigator agent with the following prompt:
87
+
88
+ ```text
89
+ Comprehensively collect information related to the following phenomenon.
90
+
91
+ Phenomenon: [Problem reported by user]
92
+ Problem essence: [taskEssence]
93
+ Investigation focus: [investigationFocus]
94
+ Applicable rules: [selectedRules summary]
95
+
96
+ For change failures, also include:
97
+ - what changed
98
+ - what broke
99
+ - what both areas share
100
+ ```
87
101
 
88
102
  **Expected output**: Evidence matrix, comparison analysis results, causal tracking results, list of unexplored areas, investigation limitations
89
103
 
@@ -92,12 +106,14 @@ Spawn investigator agent: "Comprehensively collect information related to the fo
92
106
  Review investigation output:
93
107
 
94
108
  **Quality Check** (verify output contains the following):
95
- - [ ] comparisonAnalysis
96
- - [ ] causalChain for each hypothesis (reaching stop condition)
109
+ - [ ] `comparisonAnalysis` is present and `normalImplementation` is non-null, or explicitly states that no working implementation was found
110
+ - [ ] causalChain for each hypothesis reaches a stop condition
97
111
  - [ ] causeCategory for each hypothesis
112
+ - [ ] `investigationSources` covers at least 3 distinct source types
113
+ - [ ] each hypothesis has supporting evidence with a concrete source
98
114
  - [ ] Investigation covering investigationFocus items (when provided)
99
115
 
100
- **If quality insufficient**: MUST re-spawn investigator agent specifying missing items
116
+ **If quality insufficient**: MUST re-spawn investigator agent specifying the missing items and include the previous investigation output for context
101
117
  ENFORCEMENT: Proceeding to verifier with incomplete investigation data produces unreliable conclusions.
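
Part of the quality check above can be expressed as a function that lists the missing items to pass back when re-spawning the investigator. This is a sketch under assumptions: the diff names `investigationSources` and `causeCategory`, but the object shapes below (`type` on each source, `supportingEvidence` with a `source` string, `id` on each hypothesis) are hypothetical, and the `comparisonAnalysis` and `causalChain` checks are left out.

```typescript
// All shapes below are assumptions made for illustration.
type Evidence = { source: string };
type Hypothesis = { id: string; causeCategory?: string; supportingEvidence: Evidence[] };
type InvestigationOutput = {
  investigationSources: { type: string }[];
  hypotheses: Hypothesis[];
};

// Returns a human-readable list of checklist items that are not satisfied.
function findMissingItems(output: InvestigationOutput): string[] {
  const missing: string[] = [];

  // "investigationSources covers at least 3 distinct source types"
  const sourceTypes = new Set<string>();
  for (const source of output.investigationSources) {
    sourceTypes.add(source.type);
  }
  if (sourceTypes.size < 3) {
    missing.push("investigationSources: " + sourceTypes.size + " distinct source types, need 3");
  }

  for (const hypothesis of output.hypotheses) {
    // "causeCategory for each hypothesis"
    if (!hypothesis.causeCategory) {
      missing.push("causeCategory for " + hypothesis.id);
    }
    // "each hypothesis has supporting evidence with a concrete source"
    const hasConcreteSource = hypothesis.supportingEvidence.some(
      (evidence) => evidence.source.trim().length > 0
    );
    if (!hasConcreteSource) {
      missing.push("supporting evidence with a concrete source for " + hypothesis.id);
    }
  }
  return missing;
}
```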
102
118
 
103
119
  **design_gap Escalation**:
@@ -74,7 +74,7 @@ Verify generated task files exist in docs/plans/tasks/.
74
74
  Each sub-agent responds in JSON format:
75
75
  - **task-executor-frontend**: status, filesModified, testsAdded, requiresTestReview, readyForQualityCheck
76
76
  - **integration-test-reviewer**: status (approved/needs_revision/blocked), requiredFixes
77
- - **quality-fixer-frontend**: status, checksPerformed, fixesApplied, approved
77
+ - **quality-fixer-frontend**: status, checksPerformed, fixesApplied
78
78
 
79
79
  ### Execution Flow for Each Task
80
80
 
@@ -88,7 +88,7 @@ For EACH task, YOU MUST:
88
88
  - `approved` -> Proceed to step 4
89
89
  - `readyForQualityCheck: true` -> Proceed to step 4
90
90
  4. **Spawn quality-fixer-frontend agent**: "Execute all frontend quality checks and fixes"
91
- 5. **COMMIT on approval**: After `approved: true` from quality-fixer-frontend -> Execute git commit. Use `changeSummary` for commit message.
91
+ 5. **COMMIT on approval**: After `status: "approved"` from quality-fixer-frontend -> Execute git commit. Use `changeSummary` for commit message.
92
92
 
93
93
  **CRITICAL**: MUST monitor ALL structured responses WITHOUT EXCEPTION and ENSURE every quality gate is passed.
94
94
  ENFORCEMENT: Proceeding past a failed quality gate invalidates all subsequent work.
@@ -98,7 +98,7 @@ For EACH task, YOU MUST:
98
98
  - `approved` -> Proceed to step 4
99
99
  - `readyForQualityCheck: true` -> Proceed to step 4
100
100
  4. **Spawn quality-fixer agent** (layer-appropriate per routing table): "Execute all quality checks and fixes"
101
- 5. **COMMIT on approval**: After `approved: true` from quality-fixer -> Execute git commit
101
+ 5. **COMMIT on approval**: After `status: "approved"` from quality-fixer -> Execute git commit
102
102
 
103
103
  **CRITICAL**: MUST monitor ALL structured responses WITHOUT EXCEPTION and ENSURE every quality gate is passed.
104
104
  ENFORCEMENT: Proceeding past a failed quality gate invalidates all subsequent work.
@@ -123,7 +123,7 @@ ENFORCEMENT: Sub-agent prompts missing the constraint suffix MUST be re-issued w
123
123
  1. Execute ONE task completely before starting next (each task goes through the full 4-step cycle individually, using the correct executor per filename pattern)
124
124
  2. Check executor status before quality-fixer (escalation check)
125
125
  3. Quality-fixer MUST run after each executor (no skipping)
126
- 4. Commit MUST execute when quality-fixer returns `approved: true` (do not defer to end)
126
+ 4. Commit MUST execute when quality-fixer returns `status: "approved"` (do not defer to end)
127
127
 
128
128
  ### Security Review (After All Tasks Complete)
129
129
 
@@ -106,7 +106,7 @@ After user grants "batch approval for entire implementation phase", enter autono
106
106
  - `approved` -> Proceed to step 3
107
107
  - Otherwise -> Proceed to step 3
108
108
  3. Spawn quality-fixer (or quality-fixer-frontend) agent: "Quality check and fixes"
109
- 4. git commit -> Execute on `approved: true`
109
+ 4. git commit -> Execute on `status: "approved"`
110
110
 
111
111
  ### Security Review (After All Tasks Complete)
112
112
 
@@ -20,7 +20,7 @@ Target: $ARGUMENTS
20
20
  **Execution Protocol**:
21
21
  1. **Spawn agents for all work** -- your role is to invoke sub-agents, pass data between them, and report results
22
22
  2. **Process one step at a time**: Execute steps sequentially within each unit (2 -> 3 -> 4 -> 5). Each step's output is the required input for the next step. Complete all steps for one unit before starting the next
23
- 3. **Pass `$STEP_N_OUTPUT` as-is** to sub-agents -- the orchestrator bridges data without processing or filtering it
23
+ 3. **Pass `$STEP_N_OUTPUT` as-is** to sub-agents -- the orchestrator bridges data without processing or filtering it, except for steps that explicitly define a deterministic transformation with an input schema, output schema, and mapping rules
24
24
 
25
25
  **Task Registration**: Register phases first, then steps within each phase as you enter it. Track status for each step.
26
26
 
@@ -44,7 +44,7 @@ Ask the user to confirm:
44
44
 
45
45
  ```
46
46
  Phase 1: PRD Generation
47
- Step 1: Scope Discovery (unified, single pass)
47
+ Step 1: Scope Discovery (unified, single pass -> group into PRD units -> human review)
48
48
  Step 2-5: Per-unit loop (Generation -> Verification -> Review -> Revision)
49
49
 
50
50
  Phase 2: Design Doc Generation (if requested)
@@ -67,17 +67,20 @@ Spawn scope-discoverer agent: "Discover functional scope targets in the codebase
67
67
  **Quality Gate**:
68
68
  - At least one unit discovered -> proceed
69
69
  - No units discovered -> ask user for hints
70
+ - `$STEP_1_OUTPUT.prdUnits` exists
71
+ - All `sourceUnits` across `prdUnits` (flattened, deduplicated) match the set of `discoveredUnits` IDs — no unit missing, no unit duplicated
72
+ - Each discovered unit's `unitInventory` has at least one non-empty category. If all categories are empty, re-run discovery with focus on that unit
70
73
 
71
- **[STOP — BLOCKING]** If human review enabled: Present discovered units to user for confirmation.
74
+ **[STOP — BLOCKING]** If human review enabled: Present `$STEP_1_OUTPUT.prdUnits` with their source unit mapping to user for confirmation.
72
75
  **CANNOT proceed until user explicitly confirms.**
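
The `sourceUnits` coverage rule in the quality gate above can be sketched as a set comparison. The field names `prdUnits`, `sourceUnits`, and `discoveredUnits` come from the diff; the `id` fields, the result shape, and the reading of "no unit duplicated" as "each discovered unit ID is referenced exactly once" are assumptions of this sketch.

```typescript
// Hypothetical minimal shapes; real units carry more fields.
type DiscoveredUnit = { id: string };
type PrdUnit = { id: string; sourceUnits: string[] };

type CoverageCheck = {
  missing: string[];    // discovered units that no PRD unit references
  duplicated: string[]; // unit IDs referenced more than once
  unknown: string[];    // referenced IDs that were never discovered
};

function checkSourceUnitCoverage(discovered: DiscoveredUnit[], prdUnits: PrdUnit[]): CoverageCheck {
  // Count how often each unit ID is referenced across all PRD units.
  const referenceCounts = new Map<string, number>();
  for (const prd of prdUnits) {
    for (const id of prd.sourceUnits) {
      referenceCounts.set(id, (referenceCounts.get(id) || 0) + 1);
    }
  }

  const discoveredIds = new Set<string>();
  for (const unit of discovered) {
    discoveredIds.add(unit.id);
  }

  const missing = discovered.map((unit) => unit.id).filter((id) => !referenceCounts.has(id));
  const duplicated: string[] = [];
  const unknown: string[] = [];
  referenceCounts.forEach((count, id) => {
    if (count > 1) duplicated.push(id);
    if (!discoveredIds.has(id)) unknown.push(id);
  });
  return { missing, duplicated, unknown };
}
```

The gate passes when all three arrays are empty.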
73
76
 
74
77
  ### Step 2-5: Per-Unit Processing
75
78
 
76
- **FOR** each unit in `$STEP_1_OUTPUT.discoveredUnits` **(sequential, one unit at a time)**:
79
+ **FOR** each unit in `$STEP_1_OUTPUT.prdUnits` **(sequential, one unit at a time)**:
77
80
 
78
81
  #### Step 2: PRD Generation
79
82
 
80
- Spawn prd-creator agent: "Create reverse-engineered PRD for the following feature. Operation Mode: reverse-engineer. External Scope Provided: true. Feature: $UNIT_NAME. Description: $UNIT_DESCRIPTION. Related Files: $UNIT_RELATED_FILES. Entry Points: $UNIT_ENTRY_POINTS. Skip independent scope discovery. Use provided scope data. Create final version PRD based on code investigation within specified scope."
83
+ Spawn prd-creator agent: "Create reverse-engineered PRD for the following feature. Operation Mode: reverse-engineer. External Scope Provided: true. Feature: $PRD_UNIT_NAME. Description: $PRD_UNIT_DESCRIPTION. Related Files: $PRD_UNIT_COMBINED_RELATED_FILES. Entry Points: $PRD_UNIT_COMBINED_ENTRY_POINTS. Source Units: $PRD_UNIT_SOURCE_UNITS. Use provided scope as an investigation starting point. If tracing entry points reveals directly connected files outside this scope, include them. Create final version PRD based on thorough code investigation."
81
84
 
82
85
  **Store output as**: `$STEP_2_OUTPUT` (PRD path)
83
86
 
@@ -85,12 +88,13 @@ Spawn prd-creator agent: "Create reverse-engineered PRD for the following featur
85
88
 
86
89
  **Prerequisite**: $STEP_2_OUTPUT (PRD path from Step 2)
87
90
 
88
- Spawn code-verifier agent: "Verify consistency between PRD and code implementation. doc_type: prd. document_path: $STEP_2_OUTPUT. code_paths: $UNIT_RELATED_FILES. verbose: false."
91
+ Spawn code-verifier agent: "Verify consistency between PRD and code implementation. doc_type: prd. document_path: $STEP_2_OUTPUT. verbose: false."
89
92
 
90
93
  **Store output as**: `$STEP_3_OUTPUT`
91
94
 
92
95
  **Quality Gate**:
93
- - consistencyScore >= 70 -> proceed to review
96
+ - consistencyScore >= 70 and verifiableClaimCount >= 20 -> proceed to review (guards against shallow verification passes with too few extracted claims)
97
+ - consistencyScore >= 70 and verifiableClaimCount < 20 -> re-run verifier because investigation depth is insufficient
94
98
  - consistencyScore < 70 -> flag for detailed review
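
The three branches of the revised gate can be sketched as a single decision function. The thresholds 70 and 20 come from the diff; the function name and the decision labels are hypothetical.

```typescript
// Hypothetical labels for the three outcomes described in the gate.
type GateDecision = "proceed_to_review" | "rerun_verifier" | "flag_for_detailed_review";

const MIN_CONSISTENCY_SCORE = 70;
const MIN_VERIFIABLE_CLAIMS = 20;

function decideVerificationGate(consistencyScore: number, verifiableClaimCount: number): GateDecision {
  // A low score is flagged regardless of how many claims were extracted.
  if (consistencyScore < MIN_CONSISTENCY_SCORE) {
    return "flag_for_detailed_review";
  }
  // A passing score built on too few claims is treated as a shallow pass.
  if (verifiableClaimCount < MIN_VERIFIABLE_CLAIMS) {
    return "rerun_verifier";
  }
  return "proceed_to_review";
}
```

For example, a score of 85 with 12 claims yields `rerun_verifier`, while the same score with 24 claims yields `proceed_to_review`.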
95
99
 
96
100
  #### Step 4: Review
@@ -130,18 +134,58 @@ ENFORCEMENT: Exceeding 2 revision cycles without flagging produces unreviewed ou
130
134
 
131
135
  ### Step 6: Design Doc Scope Mapping
132
136
 
133
- **No additional discovery required.** Use `$STEP_1_OUTPUT` (scope discovery results) directly.
137
+ **Step type**: Deterministic transformation step executed by the orchestrator.
134
138
 
135
- Each PRD unit from Phase 1 maps to one Design Doc unit (using technical-designer).
139
+ **No additional discovery required.** Use `$STEP_1_OUTPUT.discoveredUnits` (implementation-granularity units) for technical profiles. Use `$STEP_1_OUTPUT.prdUnits[].sourceUnits` to trace which discovered units belong to each PRD unit.
136
140
 
137
- Map `$STEP_1_OUTPUT` units to Design Doc generation targets, carrying forward:
141
+ **Default mapping rule**: Each PRD unit maps to exactly 1 Design Doc unit.
142
+
143
+ Only split one PRD unit into multiple Design Doc units when BOTH are true:
144
+ 1. The source units contain clearly separate technical boundaries with low shared-file overlap
145
+ 2. Separate Design Docs would improve verification clarity (different public interfaces, dependencies, or module groups)
146
+
147
+ If the split conditions are not clearly met, keep 1 PRD unit -> 1 Design Doc unit.
148
+
149
+ Transform `$STEP_1_OUTPUT` into `$STEP_6_OUTPUT` using only the mapping rules in this step.
150
+
151
+ Map PRD units to Design Doc generation targets by resolving each PRD unit's `sourceUnits` back to `$STEP_1_OUTPUT.discoveredUnits`, carrying forward:
138
152
  - `technicalProfile.primaryModules` -> Primary Files
139
153
  - `technicalProfile.publicInterfaces` -> Public Interfaces
140
154
  - `dependencies` -> Dependencies
141
155
  - `relatedFiles` -> Scope boundary
156
+ - `unitInventory` -> Unit Inventory
142
157
 
143
158
  **Store output as**: `$STEP_6_OUTPUT`
144
159
 
160
+ `$STEP_6_OUTPUT` MUST be a JSON array of Design Doc generation targets in the following shape:
161
+
162
+ ```json
163
+ [
164
+ {
165
+ "unitId": "DD-001",
166
+ "parentPrdUnitId": "PRD-001",
167
+ "unitName": "Authentication",
168
+ "unitDescription": "Current implementation for sign-in and session management",
169
+ "sourceUnits": ["UNIT-001", "UNIT-002"],
170
+ "primaryModules": ["src/auth/service.ts", "src/auth/controller.ts"],
171
+ "publicInterfaces": ["AuthService.login()", "AuthController.handleLogin()"],
172
+ "dependencies": ["UNIT-003"],
173
+ "scopeBoundary": ["src/auth/*"],
174
+ "unitInventory": {
175
+ "routes": [],
176
+ "testFiles": [],
177
+ "publicExports": []
178
+ },
179
+ "mappingRationale": "Default 1:1 mapping from PRD unit because technical scope is cohesive"
180
+ }
181
+ ]
182
+ ```
183
+
184
+ **Quality Gate**:
185
+ - Every PRD unit appears in at least one `$STEP_6_OUTPUT` item
186
+ - Every `$STEP_6_OUTPUT` item references only discovered units from its parent PRD unit
187
+ - `mappingRationale` explicitly states whether the mapping is default 1:1 or an intentional split
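
The default 1:1 mapping described in this step can be sketched as a pure function from the Step 1 output to the Step 6 array. The output field names follow the JSON shape above. The input shapes (`id`, `name`, `description`, and the three `unitInventory` categories) are assumptions inferred from the diff, and the split case is deliberately not modeled.

```typescript
// Input shapes are assumptions; unitInventory may have other categories in practice.
type UnitInventory = { routes: string[]; testFiles: string[]; publicExports: string[] };
type DiscoveredUnit = {
  id: string;
  relatedFiles: string[];
  dependencies: string[];
  technicalProfile: { primaryModules: string[]; publicInterfaces: string[] };
  unitInventory: UnitInventory;
};
type PrdUnit = { id: string; name: string; description: string; sourceUnits: string[] };
type DesignDocTarget = {
  unitId: string;
  parentPrdUnitId: string;
  unitName: string;
  unitDescription: string;
  sourceUnits: string[];
  primaryModules: string[];
  publicInterfaces: string[];
  dependencies: string[];
  scopeBoundary: string[];
  unitInventory: UnitInventory;
  mappingRationale: string;
};

// Concatenates one string-array field across units, dropping repeated values.
function mergeField(units: DiscoveredUnit[], pick: (unit: DiscoveredUnit) => string[]): string[] {
  const merged: string[] = [];
  for (const unit of units) {
    for (const value of pick(unit)) {
      if (merged.indexOf(value) === -1) merged.push(value);
    }
  }
  return merged;
}

// Default rule only: each PRD unit becomes exactly one Design Doc target.
function mapToDesignDocTargets(discoveredUnits: DiscoveredUnit[], prdUnits: PrdUnit[]): DesignDocTarget[] {
  const byId = new Map<string, DiscoveredUnit>();
  for (const unit of discoveredUnits) byId.set(unit.id, unit);

  return prdUnits.map((prd, index) => {
    // Resolve sourceUnits back to discovered units; unknown IDs are an error.
    const sources = prd.sourceUnits.map((id) => {
      const unit = byId.get(id);
      if (unit === undefined) throw new Error("Unknown source unit " + id + " in " + prd.id);
      return unit;
    });
    return {
      unitId: "DD-" + ("00" + (index + 1)).slice(-3),
      parentPrdUnitId: prd.id,
      unitName: prd.name,
      unitDescription: prd.description,
      sourceUnits: prd.sourceUnits.slice(),
      primaryModules: mergeField(sources, (u) => u.technicalProfile.primaryModules),
      publicInterfaces: mergeField(sources, (u) => u.technicalProfile.publicInterfaces),
      dependencies: mergeField(sources, (u) => u.dependencies),
      scopeBoundary: mergeField(sources, (u) => u.relatedFiles),
      unitInventory: {
        routes: mergeField(sources, (u) => u.unitInventory.routes),
        testFiles: mergeField(sources, (u) => u.unitInventory.testFiles),
        publicExports: mergeField(sources, (u) => u.unitInventory.publicExports),
      },
      mappingRationale: "Default 1:1 mapping from PRD unit",
    };
  });
}
```

Because every target is built only from its parent's `sourceUnits`, the first two quality-gate conditions hold by construction in this sketch; a split implementation would need to check them explicitly.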
188
+
145
189
  ### Step 7-10: Per-Unit Processing
146
190
 
147
191
  **FOR** each unit in `$STEP_6_OUTPUT` **(sequential, one unit at a time)**:
@@ -150,13 +194,13 @@ Map `$STEP_1_OUTPUT` units to Design Doc generation targets, carrying forward:
150
194
 
151
195
  **Scope**: Document current architecture as-is. This is a documentation task, not a design improvement task.
152
196
 
153
- Spawn technical-designer agent: "Create Design Doc for the following feature based on existing code. Operation Mode: create. Feature: $UNIT_NAME. Description: $UNIT_DESCRIPTION. Primary Files: $UNIT_PRIMARY_MODULES. Public Interfaces: $UNIT_PUBLIC_INTERFACES. Dependencies: $UNIT_DEPENDENCIES. Parent PRD: $APPROVED_PRD_PATH. Document current architecture as-is."
197
+ Spawn technical-designer agent: "Create Design Doc for the following feature based on existing code. Operation Mode: reverse-engineer. Feature: $UNIT_NAME. Description: $UNIT_DESCRIPTION. Primary Files: $UNIT_PRIMARY_MODULES. Public Interfaces: $UNIT_PUBLIC_INTERFACES. Dependencies: $UNIT_DEPENDENCIES. Unit Inventory: $UNIT_INVENTORY. Parent PRD: $APPROVED_PRD_PATH. Document current architecture as-is. Use Unit Inventory as the completeness baseline."
154
198
 
155
199
  **Store output as**: `$STEP_7_OUTPUT`
156
200
 
157
201
  #### Step 8: Code Verification
158
202
 
159
- Spawn code-verifier agent: "Verify consistency between Design Doc and code implementation. doc_type: design-doc. document_path: $STEP_7_OUTPUT. code_paths: $UNIT_PRIMARY_MODULES. verbose: false."
203
+ Spawn code-verifier agent: "Verify consistency between Design Doc and code implementation. doc_type: design-doc. document_path: $STEP_7_OUTPUT. verbose: false."
160
204
 
161
205
  **Store output as**: `$STEP_8_OUTPUT`
162
206
 
@@ -31,7 +31,7 @@ ENFORCEMENT: Skipping document-reviewer risks propagating inconsistencies to dow
31
31
  ```
32
32
  Target document -> [Stop: Confirm changes]
33
33
  |
34
- technical-designer / prd-creator (update mode)
34
+ technical-designer / technical-designer-frontend / prd-creator (update mode)
35
35
  |
36
36
  document-reviewer -> [Stop: Review approval]
37
37
  | (Design Doc only)
@@ -70,15 +70,20 @@ Check for existing documents in docs/design/, docs/prd/, docs/adr/.
70
70
  | Multiple candidates found | Present options to user |
71
71
  | No documents found | Report and end (suggest $recipe-design instead) |
72
72
 
73
- ### Step 2: Document Type Determination
73
+ ### Step 2: Document Type and Layer Determination
74
74
 
75
- Determine type from document path:
75
+ Determine type from document path, then determine the layer to select the correct update agent:
76
76
 
77
77
  | Path Pattern | Type | Update Agent | Notes |
78
78
  |-------------|------|--------------|-------|
79
- | `docs/design/*.md` | Design Doc | technical-designer | - |
79
+ | `docs/design/*.md` | Design Doc | technical-designer or technical-designer-frontend | See layer detection below |
80
80
  | `docs/prd/*.md` | PRD | prd-creator | - |
81
- | `docs/adr/*.md` | ADR | technical-designer | Minor changes: update existing file; Major changes: create new ADR file |
81
+ | `docs/adr/*.md` | ADR | technical-designer or technical-designer-frontend | See layer detection below |
82
+
83
+ **Layer detection** (for Design Doc and ADR):
84
+ Read the document and determine its layer from content signals:
85
+ - **Frontend** (-> technical-designer-frontend): Document title/scope mentions React, components, UI, frontend; or file contains component hierarchy, state management, UI interactions
86
+ - **Backend** (-> technical-designer): All other cases (API, data layer, business logic, infrastructure)
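
As a rough illustration of the layer detection rule, the sketch below matches the frontend signals named above as keywords and falls back to the backend agent otherwise. Keyword matching is an assumption of this sketch; the recipe asks the agent to read the document and judge from content, which a regular expression only approximates. The names `UpdateAgent`, `FRONTEND_SIGNALS`, and `selectUpdateAgent` are hypothetical.

```typescript
type UpdateAgent = "technical-designer" | "technical-designer-frontend";

// Keywords taken from the frontend signals listed in the diff.
// "component hierarchy" and "UI interactions" are covered by "component" and "ui".
const FRONTEND_SIGNALS = /\b(react|components?|ui|frontend|state management)\b/i;

// Frontend when any signal appears as a whole word; backend in all other cases.
function selectUpdateAgent(documentText: string): UpdateAgent {
  return FRONTEND_SIGNALS.test(documentText)
    ? "technical-designer-frontend"
    : "technical-designer";
}
```

A keyword match can misclassify backend documents that use the word "component" loosely, so a result like this is better treated as a hint than as the final routing decision.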
82
87
 
83
88
  **ADR Update Guidance**:
84
89
  - **Minor changes** (clarification, typo fix, small scope adjustment): Update the existing ADR file
@@ -173,10 +173,10 @@ All agents MUST use this vocabulary consistently:
173
173
 
174
174
  ## Structured Response Specification
175
175
 
176
- Subagents respond in JSON format. Key fields for orchestrator decisions:
176
+ Subagents respond in JSON format. The final response from each JSON-returning subagent must be the JSON payload itself, with no trailing prose. Key fields for orchestrator decisions:
177
177
  - **requirement-analyzer**: scale, confidence, affectedLayers, adrRequired, scopeDependencies, questions
178
178
  - **task-executor**: status (escalation_needed/blocked/completed), testsAdded, requiresTestReview
179
- - **quality-fixer**: approved (true/false)
179
+ - **quality-fixer**: status (approved/blocked)
180
180
  - **document-reviewer**: verdict.decision (approved/approved_with_conditions/needs_revision/rejected)
181
181
  - **design-sync**: sync_status (CONFLICTS_FOUND/NO_CONFLICTS) — text format with [SUMMARY] block
182
182
  - **integration-test-reviewer**: status (approved/needs_revision/blocked), requiredFixes
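
The new requirement that the final response be the JSON payload itself, with no trailing prose, lets an orchestrator parse strictly instead of searching the text for a JSON fragment. A minimal sketch, assuming the raw response arrives as a string; `parseSubagentResponse` is a hypothetical name.

```typescript
// Parses a subagent's final response as a bare JSON object.
// JSON.parse throws a SyntaxError when any non-whitespace text precedes
// or follows the payload, so prose around the JSON is rejected.
function parseSubagentResponse(raw: string): Record<string, unknown> {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw.trim());
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    throw new Error("Subagent response is not a bare JSON payload: " + reason);
  }
  // Bare strings, numbers, and arrays are valid JSON but not a structured response.
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("Subagent response must be a JSON object");
  }
  return parsed as Record<string, unknown>;
}
```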
@@ -310,7 +310,7 @@ Stop autonomous execution and escalate to user in the following cases:
310
310
  - `approved`: Proceed to step 3
311
311
  - Otherwise: Proceed to step 3
312
312
  3. quality-fixer: Quality check and fixes
313
- 4. git commit (on `approved: true`)
313
+ 4. git commit (on `status: "approved"`)
314
314
 
315
315
  ## Main Orchestrator Roles
316
316
 
@@ -99,13 +99,13 @@ Each task uses the standard 4-step cycle with layer-appropriate agents:
99
99
  1. task-executor: Implementation
100
100
  2. Escalation check
101
101
  3. quality-fixer: Quality check and fixes
102
- 4. git commit (on approved: true)
102
+ 4. git commit (on status: "approved")
103
103
 
104
104
  ### frontend-task
105
105
  1. task-executor-frontend: Implementation
106
106
  2. Escalation check
107
107
  3. quality-fixer-frontend: Quality check and fixes
108
- 4. git commit (on approved: true)
108
+ 4. git commit (on status: "approved")
109
109
 
110
110
  ### integration-test-reviewer Placement
111
111
 
@@ -89,11 +89,14 @@ Verify against the Design Doc architecture:
89
89
  - No unnecessary duplicate implementations (Pattern 5 from ai-development-guide skill)
90
90
  - Existing codebase analysis section includes similar functionality investigation results
91
91
 
92
- ### 5. Calculate Compliance and Produce Report
92
+ ### 5. Calculate Compliance
93
93
  - Compliance rate = (fulfilled items + 0.5 x partially fulfilled items) / total AC items x 100
94
94
  - Compile all AC statuses, quality issues with specific locations
95
95
  - Determine verdict based on compliance rate
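
The compliance rate formula above can be checked with a short worked example. The status labels and the function name are hypothetical; the formula itself is the one stated in the agent definition.

```typescript
// Hypothetical status labels for one acceptance criterion (AC).
type AcStatus = "fulfilled" | "partially_fulfilled" | "unfulfilled";

// Compliance rate = (fulfilled + 0.5 x partially fulfilled) / total AC items x 100
function complianceRate(statuses: AcStatus[]): number {
  if (statuses.length === 0) {
    throw new Error("Compliance rate is undefined without acceptance criteria");
  }
  let fulfilled = 0;
  let partial = 0;
  for (const status of statuses) {
    if (status === "fulfilled") fulfilled += 1;
    else if (status === "partially_fulfilled") partial += 1;
  }
  // Multiply before dividing to keep the arithmetic exact for typical counts.
  return ((fulfilled + 0.5 * partial) * 100) / statuses.length;
}
```

With 6 fulfilled, 2 partially fulfilled, and 2 unfulfilled criteria, the rate is (6 + 0.5 x 2) / 10 x 100 = 70.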
96
96
 
97
+ ### 6. Return JSON Result
98
+ Return the JSON result as the final response. See Output Format for the schema.
99
+
97
100
  ## Output Format
98
101
 
99
102
  ```json
@@ -136,6 +139,13 @@ Verify against the Design Doc architecture:
136
139
  - Provide solutions, not just problems; quantify wherever possible
137
140
  - Acknowledge good implementations; present improvements as actionable items
138
141
 
142
+ ## Completion Criteria
143
+
144
+ - [ ] All acceptance criteria individually evaluated
145
+ - [ ] Compliance rate calculated
146
+ - [ ] Verdict determined
147
+ - [ ] Final response is the JSON output
148
+
139
149
  ### Escalation Criteria
140
150
  Recommend higher-level review when: Design Doc itself has deficiencies, security concerns discovered, or critical performance issues found.
141
151
 
@@ -52,13 +52,6 @@ Skill Status:
52
52
  This agent outputs **verification results and discrepancy findings only**.
53
53
  Document modification and solution proposals are out of scope for this agent.
54
54
 
55
- ## Core Responsibilities
56
-
57
- 1. **Claim Extraction** - Extract verifiable claims from document
58
- 2. **Multi-source Evidence Collection** - Gather evidence from code, tests, and config
59
- 3. **Consistency Classification** - Classify each claim's implementation status
60
- 4. **Coverage Assessment** - Identify undocumented code and unimplemented specifications
61
-
62
55
  ## Verification Framework
63
56
 
64
57
  ### Claim Categories
@@ -97,28 +90,38 @@ For each claim, classify as one of:
97
90
 
98
91
  ## Execution Steps
99
92
 
100
- ### Step 1: Document Analysis
93
+ ### Step 1: Document Analysis — Section-by-Section Claim Extraction
101
94
 
102
- 1. Read the target document
103
- 2. Extract specific, testable claims
95
+ 1. Read the target document in full
96
+ 2. Process each section individually:
97
+ - Extract all statements that make verifiable claims about code behavior, data structures, file paths, API contracts, or system behavior
98
+ - Record `{ sectionName, claimCount, claims[] }`
99
+ - If a section contains factual statements but yields zero claims, record that explicitly for review
104
100
  3. Categorize each claim
105
101
  4. Note ambiguous claims that cannot be verified
102
+ 5. Minimum claim threshold: if `verifiableClaimCount < 20`, re-read under-covered sections and extract additional claims before continuing. Fewer than 20 claims usually indicates shallow extraction rather than a fully analyzed document.
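
The per-section records from this step can be rolled up into the `claimCoverage` summary and the threshold check. The record shape `{ sectionName, claimCount, claims[] }` and the threshold of 20 come from the diff; representing a claim as a string, counting every extracted claim as verifiable, and the names `summarizeClaims` and `needsReextraction` are assumptions of this sketch.

```typescript
type SectionClaims = { sectionName: string; claimCount: number; claims: string[] };

type ClaimSummary = {
  verifiableClaimCount: number;
  claimCoverage: {
    sectionsAnalyzed: number;
    sectionsWithClaims: number;
    sectionsWithZeroClaims: string[];
  };
  needsReextraction: boolean; // hypothetical flag: true when below the threshold
};

const MIN_VERIFIABLE_CLAIMS = 20;

function summarizeClaims(sections: SectionClaims[]): ClaimSummary {
  let total = 0;
  let withClaims = 0;
  const zeroClaimSections: string[] = [];
  for (const section of sections) {
    total += section.claimCount;
    if (section.claimCount > 0) {
      withClaims += 1;
    } else {
      // Recorded explicitly so a reviewer can confirm the section has no factual statements.
      zeroClaimSections.push(section.sectionName);
    }
  }
  return {
    verifiableClaimCount: total,
    claimCoverage: {
      sectionsAnalyzed: sections.length,
      sectionsWithClaims: withClaims,
      sectionsWithZeroClaims: zeroClaimSections,
    },
    needsReextraction: total < MIN_VERIFIABLE_CLAIMS,
  };
}
```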
106
103
 
107
104
  ### Step 2: Code Scope Identification
108
105
 
109
- 1. Extract file paths mentioned in document
110
- 2. Infer additional relevant paths from context
111
- 3. Build verification target list
106
+ 1. If `code_paths` are provided, use them as a starting point, not a ceiling
107
+ 2. If `code_paths` are not provided, extract file paths from the document and expand scope by searching for referenced identifiers
108
+ 3. Infer additional relevant paths from context
109
+ 4. Build and record the final verification target list
112
110
 
113
111
  ### Step 3: Evidence Collection
114
112
 
115
113
  For each claim:
116
114
 
117
- 1. **Primary Search**: Find direct implementation
115
+ 1. **Primary Search**: Find direct implementation with Read/Grep
118
116
  2. **Secondary Search**: Check test files for expected behavior
119
117
  3. **Tertiary Search**: Review config and type definitions
120
118
 
121
- Record source location and evidence strength for each finding.
119
+ Evidence rules:
120
+ - Record source location and evidence strength for each finding
121
+ - Existence claims must be verified with Grep or file enumeration before reporting
122
+ - Behavioral claims must be backed by reading the implementation, not by naming alone
123
+ - Identifier claims must compare exact strings from code against the document
124
+ - Single-source findings remain low confidence
122
125
 
123
126
  ### Step 4: Consistency Classification
124
127
 
@@ -130,11 +133,19 @@ For each claim with collected evidence:
130
133
  - medium: 2 sources agree
131
134
  - low: 1 source only
132
135
 
133
- ### Step 5: Coverage Assessment
136
+ ### Step 5: Reverse Coverage Assessment — Code-to-Document Direction
137
+
138
+ Perform this step with actual tool-backed enumeration, not memory:
139
+
140
+ 1. Enumerate routes/endpoints in scope and record whether each is documented
141
+ 2. Enumerate test files in scope and record whether their existence is documented
142
+ 3. Enumerate public exports/interfaces in primary source files and record whether each is documented
143
+ 4. Compile undocumented code items from the enumerations
144
+ 5. Compile unimplemented document items from earlier claim verification
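
Once the enumerations are collected, each comparison in this step reduces to a set difference. The sketch below shows one such comparison, assuming the enumeration has already produced a list of names with locations and the document has yielded a list of documented names; `CodeItem`, `EnumerationSummary`, and `summarizeEnumeration` are hypothetical names, and the enumeration itself is out of scope here.

```typescript
// One enumerated item, e.g. a route or a public export, with where it was found.
type CodeItem = { name: string; location: string };

type EnumerationSummary = {
  inCode: number;
  documented: number;
  undocumented: string[]; // formatted as "name (location)"
};

// Compares enumerated items against documented names using exact string matches.
function summarizeEnumeration(items: CodeItem[], documentedNames: string[]): EnumerationSummary {
  const documentedSet = new Set<string>(documentedNames);
  const undocumented: string[] = [];
  let documented = 0;
  for (const item of items) {
    if (documentedSet.has(item.name)) {
      documented += 1;
    } else {
      undocumented.push(item.name + " (" + item.location + ")");
    }
  }
  return { inCode: items.length, documented, undocumented };
}
```

Running this once each for routes, test files, and public exports gives the counts and lists needed for the code-to-document direction.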
134
145
 
135
- 1. **Document Coverage**: What percentage of code is documented?
136
- 2. **Implementation Coverage**: What percentage of specs are implemented?
137
- 3. List undocumented features and unimplemented specs
146
+ ### Step 6: Return JSON Result
147
+
148
+ Return the JSON result as the final response. See Output Format for the schema.
138
149
 
139
150
  ## Output Format
140
151
 
@@ -147,9 +158,16 @@ For each claim with collected evidence:
147
158
  "summary": {
148
159
  "docType": "prd|design-doc",
149
160
  "documentPath": "/path/to/document.md",
161
+ "verifiableClaimCount": 24,
162
+ "matchCount": 20,
150
163
  "consistencyScore": 85,
151
164
  "status": "consistent|mostly_consistent|needs_review|inconsistent"
152
165
  },
166
+ "claimCoverage": {
167
+ "sectionsAnalyzed": 8,
168
+ "sectionsWithClaims": 7,
169
+ "sectionsWithZeroClaims": ["Appendix"]
170
+ },
153
171
  "discrepancies": [
154
172
  {
155
173
  "id": "D001",
@@ -158,9 +176,20 @@ For each claim with collected evidence:
158
176
  "claim": "Brief claim description",
159
177
  "documentLocation": "PRD.md:45",
160
178
  "codeLocation": "src/auth.ts:120",
179
+ "evidence": "Observed implementation or enumeration result",
161
180
  "classification": "What was found"
162
181
  }
163
182
  ],
183
+ "reverseCoverage": {
184
+ "routesInCode": 6,
185
+ "routesDocumented": 5,
186
+ "undocumentedRoutes": ["POST /admin/reindex (src/routes/admin.ts:42)"],
187
+ "testFilesFound": 4,
188
+ "testFilesDocumented": 2,
189
+ "exportsInCode": 12,
190
+ "exportsDocumented": 10,
191
+ "undocumentedExports": ["rebuildSearchIndex (src/search/index.ts:18)"]
192
+ },
164
193
  "coverage": {
165
194
  "documented": ["Feature areas with documentation"],
166
195
  "undocumented": ["Code features lacking documentation"],
@@ -186,6 +215,8 @@ consistencyScore = (matchCount / verifiableClaimCount) * 100
186
215
  - (minorDiscrepancies * 2)
187
216
  ```
188
217
 
218
+ If `verifiableClaimCount < 20`, treat the score as unstable and return to Step 1 before finalizing. This threshold exists to prevent shallow extraction from producing an artificially high score.
219
+
189
220
  | Score | Status | Interpretation |
190
221
  |-------|--------|----------------|
191
222
  | 85-100 | consistent | Document accurately reflects code |
@@ -195,19 +226,25 @@ consistencyScore = (matchCount / verifiableClaimCount) * 100
195
226
 
196
227
  ## Completion Criteria
197
228
 
198
- - [ ] Extracted all verifiable claims from document
229
+ - [ ] Extracted claims section-by-section with per-section counts recorded
230
+ - [ ] `verifiableClaimCount >= 20`
199
231
  - [ ] Collected evidence from multiple sources for each claim
200
232
  - [ ] Classified each claim (match/drift/gap/conflict)
233
+ - [ ] Performed reverse coverage with route, test file, and public export enumeration
201
234
  - [ ] Identified undocumented features in code
202
235
  - [ ] Identified unimplemented specifications
203
236
  - [ ] Calculated consistency score
204
- - [ ] Output in specified format
237
+ - [ ] Final response is the JSON output
205
238
 
206
239
  ## Output Self-Check
207
240
  - [ ] All findings are based on verification evidence (no modifications proposed)
241
+ - [ ] Existence claims are backed by Grep or enumeration evidence
242
+ - [ ] Behavioral claims are backed by reading the actual implementation
243
+ - [ ] Identifier comparisons use exact strings from code
208
244
  - [ ] Each classification cites multiple sources (not single-source)
209
245
  - [ ] Low-confidence classifications are explicitly noted
210
246
  - [ ] Contradicting evidence is documented, not ignored
247
+ - [ ] `reverseCoverage` includes concrete counts from tool-backed enumeration
211
248
 
212
249
  ## Completion Gate [BLOCKING]
213
250
 
@@ -127,13 +127,15 @@ Checklist:
  - [ ] If prior_context_count > 0: Each item has resolution status
  - [ ] If prior_context_count > 0: `prior_context_check` object prepared
  - [ ] Output is valid JSON
+ - [ ] Final response is the JSON output
 
  Complete all items before proceeding to output.
 
- ### Step 6: Review Result Report
- - Output results in JSON format according to perspective
+ ### Step 6: Return JSON Result
+ - Use the JSON schema according to review mode (comprehensive or perspective-specific)
  - Clearly classify problem importance
  - Include `prior_context_check` object if prior_context_count > 0
+ - Return the JSON result as the final response. See Output Format for the schema.
 
  ## Output Format
 
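The recurring new checklist item, "Final response is the JSON output", can be approximated by requiring that the entire response parses as JSON with no surrounding prose. This is a minimal sketch in Python using only the standard `json` module. The function name `is_pure_json_response` is hypothetical, and the requirement that the top-level value be an object is an assumption rather than something the diff states.

```python
import json


def is_pure_json_response(response: str) -> bool:
    """Return True when the whole response parses as a single JSON object."""
    try:
        parsed = json.loads(response)
    except json.JSONDecodeError:
        # Prose before or after the JSON, or a Markdown fence, ends up here
        return False
    # Assumption: the agent result is a JSON object, not a bare array or scalar
    return isinstance(parsed, dict)
```

Leading and trailing whitespace is tolerated by `json.loads`, while a response wrapped in a Markdown code fence is rejected. Whether fenced output should count as valid is not specified in the diff.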
@@ -78,6 +78,9 @@ Evaluate each test for:
  - No shared state
  - No time-dependent logic
 
+ ### 4. Return JSON Result
+ Return the JSON result as the final response. See Output Format for the schema.
+
  ## Output Format
 
  ```json
@@ -137,6 +140,7 @@ Evaluate each test for:
  - [ ] No test interdependencies
  - [ ] Deterministic execution (no random/time dependency)
  - [ ] Test name matches verification content
+ - [ ] Final response is the JSON output
 
  ## Common Issues and Fixes
 
@@ -47,14 +47,6 @@ Skill Status:
  This agent outputs **evidence matrix and factual observations only**.
  Solution derivation is out of scope for this agent.
 
- ## Core Responsibilities
-
- 1. **Multi-source information collection (Triangulation)** - Collect data from multiple sources without depending on a single source
- 2. **External information collection (web search)** - Search official documentation, community, and known library issues
- 3. **Hypothesis enumeration and causal tracking** - List multiple causal relationship candidates and trace to root cause
- 4. **Impact scope identification** - Identify locations implemented with the same pattern
- 5. **Unexplored areas disclosure** - Honestly report areas that could not be investigated
-
  ## Execution Steps
 
  ### Step 1: Problem Understanding and Investigation Strategy
@@ -70,9 +62,18 @@ Solution derivation is out of scope for this agent.
 
  ### Step 2: Information Collection
 
- - **Internal sources**: Code, git history, dependencies, configuration, Design Doc/ADR
- - **External sources (web search)**: Official documentation, Stack Overflow, GitHub Issues, package issue trackers
- - **Comparison analysis**: Differences between working implementation and problematic area (call order, initialization timing, configuration values)
+ Investigate each source type below and record findings even when empty:
+
+ | Source | Minimum Investigation Action |
+ |--------|------------------------------|
+ | Code | Read directly related files and search for the reported symbols, errors, or messages |
+ | git history | Review recent history for affected files and compare working/broken states when applicable |
+ | Dependencies | Inspect package manifests and relevant package versions or changelogs |
+ | Configuration | Read relevant config files and search for related keys across the project |
+ | Design Doc or ADR | Search for matching docs and read them. Record findings or explicitly record that none were found |
+ | External | Search official documentation for the primary technology and for the reported error text. Record findings or explicitly record that no relevant result was found |
+
+ **Comparison analysis**: Differences between working implementation and problematic area (call order, initialization timing, configuration values)
 
  Information source priority:
  1. Comparison with "working implementation" in project
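The new source table asks for a recorded result per source type even when nothing was found. One way to picture that is a report with one entry per source, which separates "investigated, nothing relevant" from "never investigated". This is a minimal sketch in Python. The source names come from the table, while the function `build_source_report` and the entry fields (`investigated`, `findings`, `note`) are hypothetical and not taken from the agent's actual output schema.

```python
# Source types as listed in the added table
SOURCE_TYPES = (
    "Code",
    "git history",
    "Dependencies",
    "Configuration",
    "Design Doc or ADR",
    "External",
)


def build_source_report(findings: dict[str, list[str]]) -> dict[str, dict[str, object]]:
    """Produce one entry per source type, marking empty results explicitly."""
    report: dict[str, dict[str, object]] = {}
    for source in SOURCE_TYPES:
        items = findings.get(source, [])
        report[source] = {
            # A key present with an empty list means "looked, found nothing"
            "investigated": source in findings,
            "findings": items,
            "note": "" if items else "no relevant findings",
        }
    return report
```

For example, `build_source_report({"Code": ["null check missing"], "External": []})` marks `External` as investigated with no relevant findings, and leaves the four remaining sources flagged as not yet investigated.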
@@ -86,16 +87,17 @@ Information source priority:
  - Collect supporting and contradicting evidence for each hypothesis
  - Determine causeCategory: typo / logic_error / missing_constraint / design_gap / external_factor
 
- **Signs of shallow tracking**:
- - Stopping at "~ is not configured" → without tracing why it's not configured
- - Stopping at technical element names → without tracing why that state occurred
+ **Tracking depth check**: Each causal chain must reach a stop condition. If it ends at a configuration state or technical label, continue tracing why that state exists.
 
- ### Step 4: Impact Scope Identification and Output
+ ### Step 4: Impact Scope Identification
 
  - Search for locations implemented with the same pattern (impactScope)
  - Determine recurrenceRisk: low (isolated) / medium (2 or fewer locations) / high (3+ locations or design_gap)
  - Disclose unexplored areas and investigation limitations
- - Output in JSON format
+
+ ### Step 5: Return JSON Result
+
+ Return the JSON result as the final response. See Output Format for the schema.
 
  ## Evidence Strength Classification
 
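The `recurrenceRisk` rule in Step 4 of the hunk above can be written as a small decision function. This is a minimal sketch in Python under one reading of the rule: "isolated" is taken to mean zero other locations sharing the pattern, and "2 or fewer" to mean one or two. The function name `recurrence_risk` and its parameter names are hypothetical.

```python
def recurrence_risk(same_pattern_locations: int, cause_category: str) -> str:
    """Map impact scope and cause category to low / medium / high."""
    if same_pattern_locations < 0:
        raise ValueError("same_pattern_locations must be non-negative")
    # A design gap is treated as high regardless of how many locations were found
    if cause_category == "design_gap" or same_pattern_locations >= 3:
        return "high"
    # Assumption: one or two other locations count as "2 or fewer"
    if same_pattern_locations >= 1:
        return "medium"
    # Assumption: "isolated" means no other location shares the pattern
    return "low"
```

Under this reading, `recurrence_risk(0, "typo")` gives `"low"`, while `recurrence_risk(0, "design_gap")` gives `"high"`.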
@@ -169,10 +171,11 @@ Information source priority:
169
171
 
170
172
  - [ ] Determined problem type and executed diff analysis for change failures
171
173
  - [ ] Output comparisonAnalysis
172
- - [ ] Investigated internal and external sources
174
+ - [ ] Investigated each source type or recorded that it had no relevant findings
173
175
  - [ ] Enumerated 2+ hypotheses with causal tracking, evidence collection, and causeCategory determination for each
174
176
  - [ ] Determined impactScope and recurrenceRisk
175
177
  - [ ] Documented unexplored areas and investigation limitations
178
+ - [ ] Final response is the JSON output
176
179
 
177
180
  ## Output Self-Check
178
181
  - [ ] Multiple hypotheses were evaluated (not just the first plausible one)