codex-workflows 0.2.2 → 0.2.4

@@ -64,16 +64,16 @@ description: "Documentation creation criteria for PRD, ADR, Design Doc, UI Spec,
64
64
  ### UI Specification
65
65
  **Purpose**: Define UI structure, screen transitions, component decomposition, and interaction design
66
66
  **Includes**: Screen list and transitions, component state x display matrix, interaction definitions, AC traceability, existing component reuse map, accessibility requirements
67
- **Excludes**: Technical implementation details, API contracts, test implementation, implementation schedule
67
+ **Excludes**: Technical implementation details, API contracts, test implementation (generated by acceptance-test-generator), implementation schedule
68
68
 
69
69
  ### Design Document
70
70
  **Purpose**: Define technical implementation methods in detail
71
71
  **Includes**: Existing codebase analysis, technical approach, dependencies and constraints, interface/contract definitions, data flow, acceptance criteria, change impact map, code inspection evidence
72
- **Excludes**: Why that technology was chosen (reference ADR), when/who to implement (reference Work Plan)
72
+ **Excludes**: Why that technology was chosen (reference ADR), when/who to implement (reference Work Plan), detailed test strategy and test case selection (generated by acceptance-test-generator from acceptance criteria)
73
73
 
74
74
  ### Work Plan
75
75
  **Purpose**: Implementation task management and progress tracking
76
- **Includes**: Task breakdown, schedule estimates, E2E verification procedures, Phase 4 Quality Assurance Phase (required), progress records
76
+ **Includes**: Task breakdown, schedule estimates, test skeleton file paths, Phase 4 Quality Assurance Phase (required), progress records
77
77
  **Excludes**: Technical rationale, design details
78
78
 
79
79
  **Phase Division Criteria**:
@@ -259,40 +259,15 @@ System Invariants:
259
259
  - Prerequisites: [Required pre-implementations]
260
260
 
261
261
  ### Integration Points
262
- Each integration point requires E2E verification:
263
262
 
264
263
  **Integration Point 1: [Name]**
265
264
  - Components: [Component A] to [Component B]
266
- - Verification: [How to verify integration works]
265
+ - Contract: [Interface/API contract between components]
267
266
 
268
267
  ### Migration Strategy
269
268
 
270
269
  [Technical migration approach, ensuring backward compatibility]
271
270
 
272
- ## Test Strategy
273
-
274
- ### Basic Test Design Policy
275
-
276
- Automatically derive test cases from acceptance criteria:
277
- - Create at least one test case for each acceptance criterion
278
- - Implement measurable standards from acceptance criteria as assertions
279
-
280
- ### Unit Tests
281
-
282
- [Unit testing policy and coverage goals]
283
-
284
- ### Integration Tests
285
-
286
- [Integration testing policy and important test cases]
287
-
288
- ### E2E Tests
289
-
290
- [E2E testing policy]
291
-
292
- ### Performance Tests
293
-
294
- [Performance testing methods and standards]
295
-
296
271
  ## Security Considerations
297
272
 
298
273
  Evaluate the following for this feature's trust boundaries and data flow:
@@ -48,11 +48,6 @@ Related Issue/PR: #XXX (if any)
48
48
  - [ ] [Functional completion criteria]
49
49
  - [ ] [Quality completion criteria]
50
50
 
51
- #### Operational Verification Procedures
52
- 1. [Operation verification steps]
53
- 2. [Expected result verification]
54
- 3. [Performance verification (when applicable)]
55
-
56
51
  ### Phase 2: [Phase Name] (Estimated commits: X)
57
52
  **Purpose**: [What this phase aims to achieve]
58
53
 
@@ -66,11 +61,6 @@ Related Issue/PR: #XXX (if any)
66
61
  - [ ] [Functional completion criteria]
67
62
  - [ ] [Quality completion criteria]
68
63
 
69
- #### Operational Verification Procedures
70
- 1. [Operation verification steps]
71
- 2. [Expected result verification]
72
- 3. [Performance verification (when applicable)]
73
-
74
64
  ### Phase 3: [Phase Name] (Estimated commits: X)
75
65
  **Purpose**: [What this phase aims to achieve]
76
66
 
@@ -84,9 +74,6 @@ Related Issue/PR: #XXX (if any)
84
74
  - [ ] [Functional completion criteria]
85
75
  - [ ] [Quality completion criteria]
86
76
 
87
- #### Operational Verification Procedures
88
- [Copy relevant integration point operational verification from Design Doc]
89
-
90
77
  ### Final Phase: Quality Assurance (Required) (Estimated commits: 1)
91
78
  **Purpose**: Overall quality assurance and Design Doc consistency verification
92
79
 
@@ -94,13 +81,10 @@ Related Issue/PR: #XXX (if any)
94
81
  - [ ] Verify all Design Doc acceptance criteria achieved
95
82
  - [ ] Security review: Verify security considerations from Design Doc are implemented
96
83
  - [ ] Quality checks (types, lint, format)
97
- - [ ] Execute all tests
84
+ - [ ] Execute all tests (including integration/E2E from test skeletons, when provided)
98
85
  - [ ] Coverage 70%+
99
86
  - [ ] Document updates
100
87
 
101
- #### Operational Verification Procedures
102
- [Copy operational verification procedures from Design Doc]
103
-
104
88
  ### Quality Assurance
105
89
  - [ ] Implement staged quality checks (details: refer to ai-development-guide skill)
106
90
  - [ ] All tests pass
@@ -110,7 +94,8 @@ Related Issue/PR: #XXX (if any)
110
94
 
111
95
  ## Completion Criteria
112
96
  - [ ] All phases completed
113
- - [ ] Each phase's operational verification procedures executed
97
+ - [ ] All integration/E2E tests passing (when test skeletons provided)
98
+ - [ ] Acceptance criteria manually verified (when test skeletons are not provided)
114
99
  - [ ] Design Doc acceptance criteria satisfied
115
100
  - [ ] Staged quality checks completed (zero errors)
116
101
  - [ ] All tests pass
@@ -1,6 +1,6 @@
1
1
  ---
2
2
  name: recipe-add-integration-tests
3
- description: "Add integration/E2E tests to existing codebase using Design Doc acceptance criteria."
3
+ description: "Add integration/E2E tests to existing codebase using Design Docs."
4
4
  ---
5
5
 
6
6
  ## Required Skills [LOAD BEFORE EXECUTION]
@@ -26,11 +26,11 @@ description: "Add integration/E2E tests to existing codebase using Design Doc ac
26
26
  - Test review -> Spawn integration-test-reviewer agent
27
27
  - Quality checks -> Spawn quality-fixer agent
28
28
 
29
- Design Doc path: $ARGUMENTS
29
+ Document paths: $ARGUMENTS
30
30
 
31
31
  ## Prerequisites
32
32
 
33
- - Design Doc must exist (created manually or via reverse-engineer)
33
+ - At least one Design Doc must exist (created manually or via reverse-engineer)
34
34
  - Existing implementation to test
35
35
 
36
36
  ## Execution Flow
@@ -39,27 +39,59 @@ Design Doc path: $ARGUMENTS
39
39
 
40
40
  Reference documentation-criteria skill for task file template in Step 3.
41
41
 
42
- ### Step 1: Validate Design Doc
42
+ ### Step 1: Discover and Validate Documents
43
43
 
44
- Verify Design Doc exists at $ARGUMENTS or find the most recent in docs/design/.
44
+ ```bash
45
+ # Verify at least one document path was provided
46
+ test -n "$ARGUMENTS" || { echo "ERROR: No document paths provided"; exit 1; }
47
+
48
+ # Verify provided paths exist
49
+ ls $ARGUMENTS
50
+ ```
51
+
52
+ Use only the user-provided paths in `$ARGUMENTS`. Do not auto-discover additional Design Docs or UI Specs.
53
+
54
+ Classify provided documents by path and filename, using first-match-wins:
55
+ - Path matches `docs/ui-spec/*.md` -> **UI Spec**
56
+ - Path matches `docs/design/*-backend-*.md` or `docs/design/*backend*.md` -> **Design Doc (backend)**
57
+ - Path matches `docs/design/*-frontend-*.md` or `docs/design/*frontend*.md` -> **Design Doc (frontend)**
58
+ - Path matches `docs/design/*.md` and none of the above -> **single-layer Design Doc**
59
+
60
+ If a filename appears to match both backend and frontend, halt and ask the user which layer it belongs to.
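Reviewer note: the first-match-wins classification added above can be sketched in Python as follows. This is a minimal illustration, not code from the package. The names `RULES`, `classify`, and `AmbiguousLayerError` are hypothetical, and raising `ValueError` for an unrecognized path is an assumption, since the recipe does not say what to do in that case.

```python
from fnmatch import fnmatchcase

# Ordered (patterns, label) rules copied from the recipe text; earlier rules win.
# Caveat: fnmatch's "*" also matches "/", so nested paths match these patterns too.
RULES = [
    (("docs/ui-spec/*.md",), "UI Spec"),
    (("docs/design/*-backend-*.md", "docs/design/*backend*.md"), "Design Doc (backend)"),
    (("docs/design/*-frontend-*.md", "docs/design/*frontend*.md"), "Design Doc (frontend)"),
    (("docs/design/*.md",), "single-layer Design Doc"),
]


class AmbiguousLayerError(ValueError):
    """Hypothetical signal for 'halt and ask the user which layer it belongs to'."""


def classify(path: str) -> str:
    """Return the document class for a user-provided path."""
    matches = [
        label
        for patterns, label in RULES
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    ]
    if not matches:
        # Assumption: unrecognized paths are rejected rather than guessed at.
        raise ValueError(f"unrecognized document path: {path}")
    if "Design Doc (backend)" in matches and "Design Doc (frontend)" in matches:
        raise AmbiguousLayerError(path)
    return matches[0]  # first-match-wins
```

A path such as `docs/design/cart-frontend.md` would classify as a frontend Design Doc, while a name containing both `backend` and `frontend` stops the flow instead of silently picking the backend rule.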
45
61
 
46
62
  ### Step 2: Skeleton Generation
47
63
 
48
- Spawn acceptance-test-generator agent: "Generate test skeletons from Design Doc at [path from Step 1]."
64
+ Spawn acceptance-test-generator agent with only the documents that exist from Step 1:
65
+ ```text
66
+ Generate test skeletons from the following documents:
67
+ - Design Doc (backend): [path] <- include only if exists
68
+ - Design Doc (frontend): [path] <- include only if exists
69
+ - UI Spec: [path] <- include only if exists
70
+ ```
49
71
 
50
- **Expected output**: `generatedFiles` containing integration and e2e paths
72
+ **Expected output**: `generatedFiles` as a structured object grouped by layer, for example:
73
+ ```json
74
+ {
75
+ "backend": ["path/to/backend.int.test.ts"],
76
+ "frontend": ["path/to/frontend.int.test.ts"],
77
+ "e2e": ["path/to/flow.e2e.test.ts"]
78
+ }
79
+ ```
51
80
 
52
- ### Step 3: Create Task File [GATE]
81
+ ### Step 3: Create Task Files [GATE]
53
82
 
54
83
  **[STOP — BLOCKING]** Present task file content to user for confirmation before proceeding to implementation.
55
84
  **CANNOT proceed until user explicitly confirms.**
56
85
 
57
- Create task file at: `docs/plans/tasks/integration-tests-YYYYMMDD.md`
86
+ Create one task file per layer, using the monorepo-flow.md naming convention for deterministic agent routing:
87
+ - Backend skeletons exist -> `docs/plans/tasks/integration-tests-backend-task-YYYYMMDD.md`
88
+ - Frontend skeletons exist -> `docs/plans/tasks/integration-tests-frontend-task-YYYYMMDD.md`
89
+ - Single-layer (no backend/frontend distinction) -> `docs/plans/tasks/integration-tests-backend-task-YYYYMMDD.md`
58
90
 
59
- **Template**:
91
+ **Template** (per task file):
60
92
  ```markdown
61
93
  ---
62
- name: Implement integration tests for [feature name]
94
+ name: Implement [layer] integration tests for [feature name]
63
95
  type: test-implementation
64
96
  ---
65
97
 
@@ -69,8 +101,8 @@ Implement test cases defined in skeleton files.
69
101
 
70
102
  ## Target Files
71
103
 
72
- - Skeleton: [path from Step 2 generatedFiles]
73
- - Design Doc: [path from Step 1]
104
+ - Skeleton: [layer-specific paths from Step 2 generatedFiles]
105
+ - Design Doc: [layer-specific Design Doc from Step 1]
74
106
 
75
107
  ## Tasks
76
108
 
@@ -85,17 +117,22 @@ Implement test cases defined in skeleton files.
85
117
  - No quality issues
86
118
  ```
87
119
 
88
- **Output**: "Task file created at [path]. Ready for Step 4."
120
+ **Output**: "Task file(s) created at [path(s)]. Ready for Step 4."
89
121
 
90
122
  ### Step 4: Test Implementation
91
123
 
92
- Spawn task-executor agent: "Implement integration tests. Task file: docs/plans/tasks/integration-tests-YYYYMMDD.md. Implement tests following the task file."
124
+ For each task file from Step 3, invoke task-executor routed by filename pattern:
125
+ - `*-backend-task-*` -> Spawn `task-executor`
126
+ - `*-frontend-task-*` -> Spawn `task-executor-frontend`
127
+ - Prompt: "Task file: [task file path from Step 3]. Implement tests following the task file."
128
+
129
+ Execute one task file at a time through Steps 4 -> 5 -> 6 -> 7 before starting the next.
93
130
 
94
131
  **Expected output**: `status`, `testsAdded`
95
132
 
96
133
  ### Step 5: Test Review
97
134
 
98
- Spawn integration-test-reviewer agent: "Review test quality. Test files: [paths from Step 4 testsAdded]. Skeleton files: [paths from Step 2 generatedFiles]."
135
+ Spawn integration-test-reviewer agent: "Review test quality. Test files: [paths from Step 4 testsAdded]. Skeleton files: [layer-specific paths from Step 2 generatedFiles matching current task's layer]."
99
136
 
100
137
  **Expected output**: `status` (approved/needs_revision), `requiredFixes`
101
138
 
@@ -103,11 +140,14 @@ Spawn integration-test-reviewer agent: "Review test quality. Test files: [paths
103
140
 
104
141
  Check Step 5 result:
105
142
  - `status: approved` -> Mark complete, proceed to Step 7
106
- - `status: needs_revision` -> Spawn task-executor agent: "Fix the following issues in test files: [requiredFixes from Step 5]." Then return to Step 5.
143
+ - `status: needs_revision` -> Spawn the layer-appropriate executor with: "Fix the following issues in test files: [requiredFixes from Step 5]." Then return to Step 5. Maximum 2 revision cycles per task file; if still `needs_revision`, escalate to the user.
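Reviewer note: the bounded revision loop described in this step might look like the sketch below. It assumes one "revision cycle" means one fix followed by one re-review. The `review` and `fix` callables stand in for spawning the reviewer and the layer-appropriate executor, and the return values are hypothetical labels.

```python
from typing import Callable

MAX_REVISION_CYCLES = 2  # limit stated in the recipe, per task file


def review_loop(review: Callable[[], dict], fix: Callable[[list], None]) -> str:
    """Run review -> fix -> review until approved or the cycle limit is hit."""
    result = review()
    cycles = 0
    while result["status"] == "needs_revision":
        if cycles == MAX_REVISION_CYCLES:
            return "escalate_to_user"
        fix(result["requiredFixes"])
        cycles += 1
        result = review()
    if result["status"] != "approved":
        # Assumption: any other status is treated as an error by this sketch.
        raise ValueError(f"unexpected review status: {result['status']}")
    return "approved"
```

Under this reading, a task file that never passes review costs three reviews and two fix attempts before the user is asked to step in.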
107
144
 
108
145
  ### Step 7: Quality Check
109
146
 
110
- Spawn quality-fixer agent: "Final quality assurance for test files added in this workflow. Run all tests and verify coverage."
147
+ Spawn quality-fixer routed by task filename pattern:
148
+ - `*-backend-task-*` -> Spawn `quality-fixer`
149
+ - `*-frontend-task-*` -> Spawn `quality-fixer-frontend`
150
+ - Prompt: "Final quality assurance for test files added in this workflow. Run all tests and verify coverage."
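Reviewer note: Steps 4 and 7 route agents by the same task filename patterns, so the routing can be expressed as one lookup. The sketch below is illustrative only. `ROUTES` and `route` are hypothetical names, and rejecting an unmatched filename is an assumption.

```python
from fnmatch import fnmatchcase

# (filename pattern, executor agent, quality-fixer agent), as listed in Steps 4 and 7.
ROUTES = [
    ("*-backend-task-*", "task-executor", "quality-fixer"),
    ("*-frontend-task-*", "task-executor-frontend", "quality-fixer-frontend"),
]


def route(task_file: str) -> tuple[str, str]:
    """Return (executor, quality_fixer) for a task file path."""
    for pattern, executor, fixer in ROUTES:
        if fnmatchcase(task_file, pattern):
            return executor, fixer
    # Assumption: a task file that matches neither pattern is a naming error.
    raise ValueError(f"task file does not follow the naming convention: {task_file}")
```

Because single-layer projects reuse the `-backend-task-` name in Step 3, they would resolve to `task-executor` and `quality-fixer` without a separate rule.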
111
151
 
112
152
  **Expected output**: `status` (`approved`/`blocked`)
113
153
 
@@ -83,7 +83,21 @@ Register the following and execute:
83
83
 
84
84
  ### Step 1: Investigation (investigator)
85
85
 
86
- Spawn investigator agent: "Comprehensively collect information related to the following phenomenon. Phenomenon: [Problem reported by user]. Problem essence: [taskEssence]. Investigation focus: [investigationFocus]. Applicable rules: [selectedRules summary]."
86
+ Spawn investigator agent with the following prompt:
87
+
88
+ ```text
89
+ Comprehensively collect information related to the following phenomenon.
90
+
91
+ Phenomenon: [Problem reported by user]
92
+ Problem essence: [taskEssence]
93
+ Investigation focus: [investigationFocus]
94
+ Applicable rules: [selectedRules summary]
95
+
96
+ For change failures, also include:
97
+ - what changed
98
+ - what broke
99
+ - what both areas share
100
+ ```
87
101
 
88
102
  **Expected output**: Evidence matrix, comparison analysis results, causal tracking results, list of unexplored areas, investigation limitations
89
103
 
@@ -92,12 +106,14 @@ Spawn investigator agent: "Comprehensively collect information related to the fo
92
106
  Review investigation output:
93
107
 
94
108
  **Quality Check** (verify output contains the following):
95
- - [ ] comparisonAnalysis
96
- - [ ] causalChain for each hypothesis (reaching stop condition)
109
+ - [ ] `comparisonAnalysis` is present and `normalImplementation` is non-null, or explicitly states that no working implementation was found
110
+ - [ ] causalChain for each hypothesis reaches a stop condition
97
111
  - [ ] causeCategory for each hypothesis
112
+ - [ ] `investigationSources` covers at least 3 distinct source types
113
+ - [ ] each hypothesis has supporting evidence with a concrete source
98
114
  - [ ] Investigation covering investigationFocus items (when provided)
99
115
 
100
- **If quality insufficient**: MUST re-spawn investigator agent specifying missing items
116
+ **If quality insufficient**: MUST re-spawn investigator agent specifying the missing items and include the previous investigation output for context
101
117
  ENFORCEMENT: Proceeding to verifier with incomplete investigation data produces unreliable conclusions.
102
118
 
103
119
  **design_gap Escalation**:
@@ -69,6 +69,7 @@ Spawn scope-discoverer agent: "Discover functional scope targets in the codebase
69
69
  - No units discovered -> ask user for hints
70
70
  - `$STEP_1_OUTPUT.prdUnits` exists
71
71
  - All `sourceUnits` across `prdUnits` (flattened, deduplicated) match the set of `discoveredUnits` IDs — no unit missing, no unit duplicated
72
+ - Each discovered unit's `unitInventory` has at least one non-empty category. If all categories are empty, re-run discovery with focus on that unit
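Reviewer note: the new `unitInventory` check amounts to "at least one category list is non-empty". A minimal sketch, assuming each discovered unit is a dict with an `id` field (that field name is an assumption) and a `unitInventory` mapping of category name to list:

```python
def units_needing_rediscovery(discovered_units: list[dict]) -> list[str]:
    """Return IDs of units whose unitInventory has only empty categories."""
    return [
        unit["id"]
        for unit in discovered_units
        # A missing inventory is treated like an all-empty one.
        if not any(unit.get("unitInventory", {}).values())
    ]
```

Any ID returned here would be a candidate for re-running discovery with focus on that unit.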
72
73
 
73
74
  **[STOP — BLOCKING]** If human review enabled: Present `$STEP_1_OUTPUT.prdUnits` with their source unit mapping to user for confirmation.
74
75
  **CANNOT proceed until user explicitly confirms.**
@@ -79,7 +80,7 @@ Spawn scope-discoverer agent: "Discover functional scope targets in the codebase
79
80
 
80
81
  #### Step 2: PRD Generation
81
82
 
82
- Spawn prd-creator agent: "Create reverse-engineered PRD for the following feature. Operation Mode: reverse-engineer. External Scope Provided: true. Feature: $PRD_UNIT_NAME. Description: $PRD_UNIT_DESCRIPTION. Related Files: $PRD_UNIT_COMBINED_RELATED_FILES. Entry Points: $PRD_UNIT_COMBINED_ENTRY_POINTS. Source Units: $PRD_UNIT_SOURCE_UNITS. Skip independent scope discovery. Use provided scope data. Create final version PRD based on code investigation within specified scope."
83
+ Spawn prd-creator agent: "Create reverse-engineered PRD for the following feature. Operation Mode: reverse-engineer. External Scope Provided: true. Feature: $PRD_UNIT_NAME. Description: $PRD_UNIT_DESCRIPTION. Related Files: $PRD_UNIT_COMBINED_RELATED_FILES. Entry Points: $PRD_UNIT_COMBINED_ENTRY_POINTS. Source Units: $PRD_UNIT_SOURCE_UNITS. Use provided scope as an investigation starting point. If tracing entry points reveals directly connected files outside this scope, include them. Create final version PRD based on thorough code investigation."
83
84
 
84
85
  **Store output as**: `$STEP_2_OUTPUT` (PRD path)
85
86
 
@@ -87,12 +88,13 @@ Spawn prd-creator agent: "Create reverse-engineered PRD for the following featur
87
88
 
88
89
  **Prerequisite**: $STEP_2_OUTPUT (PRD path from Step 2)
89
90
 
90
- Spawn code-verifier agent: "Verify consistency between PRD and code implementation. doc_type: prd. document_path: $STEP_2_OUTPUT. code_paths: $PRD_UNIT_COMBINED_RELATED_FILES. verbose: false."
91
+ Spawn code-verifier agent: "Verify consistency between PRD and code implementation. doc_type: prd. document_path: $STEP_2_OUTPUT. verbose: false."
91
92
 
92
93
  **Store output as**: `$STEP_3_OUTPUT`
93
94
 
94
95
  **Quality Gate**:
95
- - consistencyScore >= 70 -> proceed to review
96
+ - consistencyScore >= 70 and verifiableClaimCount >= 20 -> proceed to review (guards against shallow verification passes with too few extracted claims)
97
+ - consistencyScore >= 70 and verifiableClaimCount < 20 -> re-run verifier because investigation depth is insufficient
96
98
  - consistencyScore < 70 -> flag for detailed review
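Reviewer note: the revised quality gate has three outcomes, which the following sketch makes explicit. The function name and the returned labels are hypothetical. The thresholds of 70 and 20 come from the gate text above.

```python
MIN_SCORE = 70
MIN_CLAIMS = 20


def quality_gate(consistency_score: float, verifiable_claim_count: int) -> str:
    """Map verifier output to the next action in the PRD flow."""
    if consistency_score < MIN_SCORE:
        return "flag_for_detailed_review"
    if verifiable_claim_count < MIN_CLAIMS:
        # High score but too few claims: verification was likely shallow.
        return "rerun_verifier"
    return "proceed_to_review"
```

A result with a score of 85 but only 12 extracted claims would therefore be sent back to the verifier instead of passing to review.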
97
99
 
98
100
  #### Step 4: Review
@@ -151,6 +153,7 @@ Map PRD units to Design Doc generation targets by resolving each PRD unit's `sou
151
153
  - `technicalProfile.publicInterfaces` -> Public Interfaces
152
154
  - `dependencies` -> Dependencies
153
155
  - `relatedFiles` -> Scope boundary
156
+ - `unitInventory` -> Unit Inventory
154
157
 
155
158
  **Store output as**: `$STEP_6_OUTPUT`
156
159
 
@@ -168,6 +171,11 @@ Map PRD units to Design Doc generation targets by resolving each PRD unit's `sou
168
171
  "publicInterfaces": ["AuthService.login()", "AuthController.handleLogin()"],
169
172
  "dependencies": ["UNIT-003"],
170
173
  "scopeBoundary": ["src/auth/*"],
174
+ "unitInventory": {
175
+ "routes": [],
176
+ "testFiles": [],
177
+ "publicExports": []
178
+ },
171
179
  "mappingRationale": "Default 1:1 mapping from PRD unit because technical scope is cohesive"
172
180
  }
173
181
  ]
@@ -186,13 +194,13 @@ Map PRD units to Design Doc generation targets by resolving each PRD unit's `sou
186
194
 
187
195
  **Scope**: Document current architecture as-is. This is a documentation task, not a design improvement task.
188
196
 
189
- Spawn technical-designer agent: "Create Design Doc for the following feature based on existing code. Operation Mode: create. Feature: $UNIT_NAME. Description: $UNIT_DESCRIPTION. Primary Files: $UNIT_PRIMARY_MODULES. Public Interfaces: $UNIT_PUBLIC_INTERFACES. Dependencies: $UNIT_DEPENDENCIES. Parent PRD: $APPROVED_PRD_PATH. Document current architecture as-is."
197
+ Spawn technical-designer agent: "Create Design Doc for the following feature based on existing code. Operation Mode: reverse-engineer. Feature: $UNIT_NAME. Description: $UNIT_DESCRIPTION. Primary Files: $UNIT_PRIMARY_MODULES. Public Interfaces: $UNIT_PUBLIC_INTERFACES. Dependencies: $UNIT_DEPENDENCIES. Unit Inventory: $UNIT_INVENTORY. Parent PRD: $APPROVED_PRD_PATH. Document current architecture as-is. Use Unit Inventory as the completeness baseline."
190
198
 
191
199
  **Store output as**: `$STEP_7_OUTPUT`
192
200
 
193
201
  #### Step 8: Code Verification
194
202
 
195
- Spawn code-verifier agent: "Verify consistency between Design Doc and code implementation. doc_type: design-doc. document_path: $STEP_7_OUTPUT. code_paths: $UNIT_PRIMARY_MODULES. verbose: false."
203
+ Spawn code-verifier agent: "Verify consistency between Design Doc and code implementation. doc_type: design-doc. document_path: $STEP_7_OUTPUT. verbose: false."
196
204
 
197
205
  **Store output as**: `$STEP_8_OUTPUT`
198
206
 
@@ -52,13 +52,6 @@ Skill Status:
52
52
  This agent outputs **verification results and discrepancy findings only**.
53
53
  Document modification and solution proposals are out of scope for this agent.
54
54
 
55
- ## Core Responsibilities
56
-
57
- 1. **Claim Extraction** - Extract verifiable claims from document
58
- 2. **Multi-source Evidence Collection** - Gather evidence from code, tests, and config
59
- 3. **Consistency Classification** - Classify each claim's implementation status
60
- 4. **Coverage Assessment** - Identify undocumented code and unimplemented specifications
61
-
62
55
  ## Verification Framework
63
56
 
64
57
  ### Claim Categories
@@ -97,28 +90,38 @@ For each claim, classify as one of:
97
90
 
98
91
  ## Execution Steps
99
92
 
100
- ### Step 1: Document Analysis
93
+ ### Step 1: Document Analysis — Section-by-Section Claim Extraction
101
94
 
102
- 1. Read the target document
103
- 2. Extract specific, testable claims
95
+ 1. Read the target document in full
96
+ 2. Process each section individually:
97
+ - Extract all statements that make verifiable claims about code behavior, data structures, file paths, API contracts, or system behavior
98
+ - Record `{ sectionName, claimCount, claims[] }`
99
+ - If a section contains factual statements but yields zero claims, record that explicitly for review
104
100
  3. Categorize each claim
105
101
  4. Note ambiguous claims that cannot be verified
102
+ 5. Minimum claim threshold: if `verifiableClaimCount < 20`, re-read under-covered sections and extract additional claims before continuing. Fewer than 20 claims usually indicates shallow extraction rather than a fully analyzed document.
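Reviewer note: the per-section records from this step can be rolled up into the `claimCoverage` summary that the output schema later in this diff expects. The sketch below assumes every recorded claim counts as verifiable, which may overstate the count when ambiguous claims are present. `summarize_claims` and the `needsReExtraction` flag are hypothetical.

```python
def summarize_claims(sections: list[dict], min_claims: int = 20) -> dict:
    """Aggregate {sectionName, claimCount, claims[]} records into summary fields."""
    zero_claim_sections = [s["sectionName"] for s in sections if s["claimCount"] == 0]
    total = sum(s["claimCount"] for s in sections)
    return {
        "verifiableClaimCount": total,
        "claimCoverage": {
            "sectionsAnalyzed": len(sections),
            "sectionsWithClaims": len(sections) - len(zero_claim_sections),
            "sectionsWithZeroClaims": zero_claim_sections,
        },
        # Hypothetical flag: below the threshold, re-read under-covered sections.
        "needsReExtraction": total < min_claims,
    }
```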
106
103
 
107
104
  ### Step 2: Code Scope Identification
108
105
 
109
- 1. Extract file paths mentioned in document
110
- 2. Infer additional relevant paths from context
111
- 3. Build verification target list
106
+ 1. If `code_paths` are provided, use them as a starting point, not a ceiling
107
+ 2. If `code_paths` are not provided, extract file paths from the document and expand scope by searching for referenced identifiers
108
+ 3. Infer additional relevant paths from context
109
+ 4. Build and record the final verification target list
112
110
 
113
111
  ### Step 3: Evidence Collection
114
112
 
115
113
  For each claim:
116
114
 
117
- 1. **Primary Search**: Find direct implementation
115
+ 1. **Primary Search**: Find direct implementation with Read/Grep
118
116
  2. **Secondary Search**: Check test files for expected behavior
119
117
  3. **Tertiary Search**: Review config and type definitions
120
118
 
121
- Record source location and evidence strength for each finding.
119
+ Evidence rules:
120
+ - Record source location and evidence strength for each finding
121
+ - Existence claims must be verified with Grep or file enumeration before reporting
122
+ - Behavioral claims must be backed by reading the implementation, not by naming alone
123
+ - Identifier claims must compare exact strings from code against the document
124
+ - Single-source findings remain low confidence
122
125
 
123
126
  ### Step 4: Consistency Classification
124
127
 
@@ -130,11 +133,15 @@ For each claim with collected evidence:
130
133
  - medium: 2 sources agree
131
134
  - low: 1 source only
132
135
 
133
- ### Step 5: Coverage Assessment
136
+ ### Step 5: Reverse Coverage Assessment — Code-to-Document Direction
137
+
138
+ Perform this step with actual tool-backed enumeration, not memory:
134
139
 
135
- 1. **Document Coverage**: What percentage of code is documented?
136
- 2. **Implementation Coverage**: What percentage of specs are implemented?
137
- 3. List undocumented features and unimplemented specs
140
+ 1. Enumerate routes/endpoints in scope and record whether each is documented
141
+ 2. Enumerate test files in scope and record whether their existence is documented
142
+ 3. Enumerate public exports/interfaces in primary source files and record whether each is documented
143
+ 4. Compile undocumented code items from the enumerations
144
+ 5. Compile unimplemented document items from earlier claim verification
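Reviewer note: once the enumerations exist, the `reverseCoverage` object in the output schema is a straightforward set comparison. The sketch below assumes the enumerated items and the documented items are already normalized to comparable strings, and that enumeration itself was done with tools as the step requires. `reverse_coverage` is a hypothetical helper name.

```python
def reverse_coverage(
    routes: list[str],
    test_files: list[str],
    exports: list[str],
    documented: set[str],
) -> dict:
    """Compare enumerated code items against the set of items the document mentions."""
    undocumented_routes = [r for r in routes if r not in documented]
    undocumented_exports = [e for e in exports if e not in documented]
    return {
        "routesInCode": len(routes),
        "routesDocumented": len(routes) - len(undocumented_routes),
        "undocumentedRoutes": undocumented_routes,
        "testFilesFound": len(test_files),
        "testFilesDocumented": sum(1 for t in test_files if t in documented),
        "exportsInCode": len(exports),
        "exportsDocumented": len(exports) - len(undocumented_exports),
        "undocumentedExports": undocumented_exports,
    }
```

Keeping the counts derived from the same lists as the undocumented items makes it harder for the reported numbers to drift from the evidence.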
138
145
 
139
146
  ### Step 6: Return JSON Result
140
147
 
@@ -151,9 +158,16 @@ Return the JSON result as the final response. See Output Format for the schema.
151
158
  "summary": {
152
159
  "docType": "prd|design-doc",
153
160
  "documentPath": "/path/to/document.md",
161
+ "verifiableClaimCount": 24,
162
+ "matchCount": 20,
154
163
  "consistencyScore": 85,
155
164
  "status": "consistent|mostly_consistent|needs_review|inconsistent"
156
165
  },
166
+ "claimCoverage": {
167
+ "sectionsAnalyzed": 8,
168
+ "sectionsWithClaims": 7,
169
+ "sectionsWithZeroClaims": ["Appendix"]
170
+ },
157
171
  "discrepancies": [
158
172
  {
159
173
  "id": "D001",
@@ -162,9 +176,20 @@ Return the JSON result as the final response. See Output Format for the schema.
162
176
  "claim": "Brief claim description",
163
177
  "documentLocation": "PRD.md:45",
164
178
  "codeLocation": "src/auth.ts:120",
179
+ "evidence": "Observed implementation or enumeration result",
165
180
  "classification": "What was found"
166
181
  }
167
182
  ],
183
+ "reverseCoverage": {
184
+ "routesInCode": 6,
185
+ "routesDocumented": 5,
186
+ "undocumentedRoutes": ["POST /admin/reindex (src/routes/admin.ts:42)"],
187
+ "testFilesFound": 4,
188
+ "testFilesDocumented": 2,
189
+ "exportsInCode": 12,
190
+ "exportsDocumented": 10,
191
+ "undocumentedExports": ["rebuildSearchIndex (src/search/index.ts:18)"]
192
+ },
168
193
  "coverage": {
169
194
  "documented": ["Feature areas with documentation"],
170
195
  "undocumented": ["Code features lacking documentation"],
@@ -190,6 +215,8 @@ consistencyScore = (matchCount / verifiableClaimCount) * 100
190
215
  - (minorDiscrepancies * 2)
191
216
  ```
192
217
 
218
+ If `verifiableClaimCount < 20`, treat the score as unstable and return to Step 1 before finalizing. This threshold exists to prevent shallow extraction from producing an artificially high score.
219
+
193
220
  | Score | Status | Interpretation |
194
221
  |-------|--------|----------------|
195
222
  | 85-100 | consistent | Document accurately reflects code |
@@ -199,9 +226,11 @@ consistencyScore = (matchCount / verifiableClaimCount) * 100
199
226
 
200
227
  ## Completion Criteria
201
228
 
202
- - [ ] Extracted all verifiable claims from document
229
+ - [ ] Extracted claims section-by-section with per-section counts recorded
230
+ - [ ] `verifiableClaimCount >= 20`
203
231
  - [ ] Collected evidence from multiple sources for each claim
204
232
  - [ ] Classified each claim (match/drift/gap/conflict)
233
+ - [ ] Performed reverse coverage with route, test file, and public export enumeration
205
234
  - [ ] Identified undocumented features in code
206
235
  - [ ] Identified unimplemented specifications
207
236
  - [ ] Calculated consistency score
@@ -209,9 +238,13 @@ consistencyScore = (matchCount / verifiableClaimCount) * 100
209
238
 
210
239
  ## Output Self-Check
211
240
  - [ ] All findings are based on verification evidence (no modifications proposed)
241
+ - [ ] Existence claims are backed by Grep or enumeration evidence
242
+ - [ ] Behavioral claims are backed by reading the actual implementation
243
+ - [ ] Identifier comparisons use exact strings from code
212
244
  - [ ] Each classification cites multiple sources (not single-source)
213
245
  - [ ] Low-confidence classifications are explicitly noted
214
246
  - [ ] Contradicting evidence is documented, not ignored
247
+ - [ ] `reverseCoverage` includes concrete counts from tool-backed enumeration
215
248
 
216
249
  ## Completion Gate [BLOCKING]
217
250
 
@@ -47,14 +47,6 @@ Skill Status:
47
47
  This agent outputs **evidence matrix and factual observations only**.
48
48
  Solution derivation is out of scope for this agent.
49
49
 
50
- ## Core Responsibilities
51
-
52
- 1. **Multi-source information collection (Triangulation)** - Collect data from multiple sources without depending on a single source
53
- 2. **External information collection (web search)** - Search official documentation, community, and known library issues
54
- 3. **Hypothesis enumeration and causal tracking** - List multiple causal relationship candidates and trace to root cause
55
- 4. **Impact scope identification** - Identify locations implemented with the same pattern
56
- 5. **Unexplored areas disclosure** - Honestly report areas that could not be investigated
57
-
58
50
  ## Execution Steps
59
51
 
60
52
  ### Step 1: Problem Understanding and Investigation Strategy
@@ -70,9 +62,18 @@ Solution derivation is out of scope for this agent.
70
62
 
71
63
  ### Step 2: Information Collection
72
64
 
73
- - **Internal sources**: Code, git history, dependencies, configuration, Design Doc/ADR
74
- - **External sources (web search)**: Official documentation, Stack Overflow, GitHub Issues, package issue trackers
75
- - **Comparison analysis**: Differences between working implementation and problematic area (call order, initialization timing, configuration values)
65
+ Investigate each source type below and record findings even when empty:
66
+
67
+ | Source | Minimum Investigation Action |
68
+ |--------|------------------------------|
69
+ | Code | Read directly related files and search for the reported symbols, errors, or messages |
70
+ | git history | Review recent history for affected files and compare working/broken states when applicable |
71
+ | Dependencies | Inspect package manifests and relevant package versions or changelogs |
72
+ | Configuration | Read relevant config files and search for related keys across the project |
73
+ | Design Doc or ADR | Search for matching docs and read them. Record findings or explicitly record that none were found |
74
+ | External | Search official documentation for the primary technology and for the reported error text. Record findings or explicitly record that no relevant result was found |
75
+
76
+ **Comparison analysis**: Differences between working implementation and problematic area (call order, initialization timing, configuration values)
76
77
 
77
78
  Information source priority:
78
79
  1. Comparison with "working implementation" in project
@@ -86,9 +87,7 @@ Information source priority:
86
87
  - Collect supporting and contradicting evidence for each hypothesis
87
88
  - Determine causeCategory: typo / logic_error / missing_constraint / design_gap / external_factor
88
89
 
89
- **Signs of shallow tracking**:
90
- - Stopping at "~ is not configured" → without tracing why it's not configured
91
- - Stopping at technical element names → without tracing why that state occurred
90
+ **Tracking depth check**: Each causal chain must reach a stop condition. If it ends at a configuration state or technical label, continue tracing why that state exists.
92
91
 
93
92
  ### Step 4: Impact Scope Identification
94
93
 
@@ -172,7 +171,7 @@ Return the JSON result as the final response. See Output Format for the schema.
172
171
 
173
172
  - [ ] Determined problem type and executed diff analysis for change failures
174
173
  - [ ] Output comparisonAnalysis
175
- - [ ] Investigated internal and external sources
174
+ - [ ] Investigated each source type or recorded that it had no relevant findings
176
175
  - [ ] Enumerated 2+ hypotheses with causal tracking, evidence collection, and causeCategory determination for each
177
176
  - [ ] Determined impactScope and recurrenceRisk
178
177
  - [ ] Documented unexplored areas and investigation limitations