codex-workflows 0.2.1 → 0.2.3
- package/.agents/skills/recipe-add-integration-tests/SKILL.md +2 -2
- package/.agents/skills/recipe-build/SKILL.md +1 -1
- package/.agents/skills/recipe-diagnose/SKILL.md +20 -4
- package/.agents/skills/recipe-front-build/SKILL.md +2 -2
- package/.agents/skills/recipe-fullstack-build/SKILL.md +1 -1
- package/.agents/skills/recipe-fullstack-implement/SKILL.md +1 -1
- package/.agents/skills/recipe-implement/SKILL.md +1 -1
- package/.agents/skills/recipe-reverse-engineer/SKILL.md +56 -12
- package/.agents/skills/recipe-update-doc/SKILL.md +10 -5
- package/.agents/skills/subagents-orchestration-guide/SKILL.md +3 -3
- package/.agents/skills/subagents-orchestration-guide/references/monorepo-flow.md +2 -2
- package/.codex/agents/code-reviewer.toml +11 -1
- package/.codex/agents/code-verifier.toml +58 -21
- package/.codex/agents/document-reviewer.toml +4 -2
- package/.codex/agents/integration-test-reviewer.toml +4 -0
- package/.codex/agents/investigator.toml +20 -17
- package/.codex/agents/prd-creator.toml +39 -24
- package/.codex/agents/quality-fixer-frontend.toml +15 -7
- package/.codex/agents/quality-fixer.toml +15 -7
- package/.codex/agents/requirement-analyzer.toml +4 -0
- package/.codex/agents/rule-advisor.toml +9 -0
- package/.codex/agents/scope-discoverer.toml +67 -29
- package/.codex/agents/security-reviewer.toml +4 -0
- package/.codex/agents/solver.toml +6 -2
- package/.codex/agents/task-executor-frontend.toml +9 -0
- package/.codex/agents/task-executor.toml +9 -0
- package/.codex/agents/technical-designer-frontend.toml +68 -115
- package/.codex/agents/technical-designer.toml +70 -114
- package/.codex/agents/verifier.toml +11 -13
- package/README.md +2 -2
- package/package.json +1 -1
|
@@ -109,11 +109,11 @@ Check Step 5 result:
|
|
|
109
109
|
|
|
110
110
|
Spawn quality-fixer agent: "Final quality assurance for test files added in this workflow. Run all tests and verify coverage."
|
|
111
111
|
|
|
112
|
-
**Expected output**: `
|
|
112
|
+
**Expected output**: `status` (`approved`/`blocked`)
|
|
113
113
|
|
|
114
114
|
### Step 8: Commit
|
|
115
115
|
|
|
116
|
-
On `
|
|
116
|
+
On `status: "approved"` from quality-fixer:
|
|
117
117
|
- MUST commit test files with appropriate message
|
|
118
118
|
ENFORCEMENT: Commits without quality-fixer approval are invalid.
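The approval gate in Step 8 can be sketched as follows. This is a minimal illustration only, assuming the quality-fixer response has already been parsed into a dictionary with a top-level `status` field; the function name `may_commit` is hypothetical and not part of the package.

```python
def may_commit(quality_fixer_response: dict) -> bool:
    """Return True only when quality-fixer reported status "approved".

    Assumption: the response is a parsed JSON object whose `status`
    is either "approved" or "blocked", as listed in Expected output.
    """
    return quality_fixer_response.get("status") == "approved"
```

Any other value, including a missing `status`, is treated as not approved, so a malformed response cannot trigger a commit.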
|
|
119
119
|
|
|
@@ -80,7 +80,7 @@ For EACH task, YOU MUST:
|
|
|
80
80
|
- `approved` -> Proceed to step 4
|
|
81
81
|
- `readyForQualityCheck: true` -> Proceed to step 4
|
|
82
82
|
4. **Spawn quality-fixer agent**: "Execute all quality checks and fixes"
|
|
83
|
-
5. **COMMIT on approval**: After `
|
|
83
|
+
5. **COMMIT on approval**: After `status: "approved"` from quality-fixer -> Execute git commit
|
|
84
84
|
|
|
85
85
|
**CRITICAL**: MUST monitor ALL structured responses WITHOUT EXCEPTION and ENSURE every quality gate is passed.
|
|
86
86
|
ENFORCEMENT: Proceeding past a failed quality gate invalidates all subsequent work.
|
|
@@ -83,7 +83,21 @@ Register the following and execute:
|
|
|
83
83
|
|
|
84
84
|
### Step 1: Investigation (investigator)
|
|
85
85
|
|
|
86
|
-
Spawn investigator agent
|
|
86
|
+
Spawn investigator agent with the following prompt:
|
|
87
|
+
|
|
88
|
+
```text
|
|
89
|
+
Comprehensively collect information related to the following phenomenon.
|
|
90
|
+
|
|
91
|
+
Phenomenon: [Problem reported by user]
|
|
92
|
+
Problem essence: [taskEssence]
|
|
93
|
+
Investigation focus: [investigationFocus]
|
|
94
|
+
Applicable rules: [selectedRules summary]
|
|
95
|
+
|
|
96
|
+
For change failures, also include:
|
|
97
|
+
- what changed
|
|
98
|
+
- what broke
|
|
99
|
+
- what both areas share
|
|
100
|
+
```
|
|
87
101
|
|
|
88
102
|
**Expected output**: Evidence matrix, comparison analysis results, causal tracking results, list of unexplored areas, investigation limitations
|
|
89
103
|
|
|
@@ -92,12 +106,14 @@ Spawn investigator agent: "Comprehensively collect information related to the fo
|
|
|
92
106
|
Review investigation output:
|
|
93
107
|
|
|
94
108
|
**Quality Check** (verify output contains the following):
|
|
95
|
-
- [ ] comparisonAnalysis
|
|
96
|
-
- [ ] causalChain for each hypothesis
|
|
109
|
+
- [ ] `comparisonAnalysis` is present and `normalImplementation` is non-null, or explicitly states that no working implementation was found
|
|
110
|
+
- [ ] causalChain for each hypothesis reaches a stop condition
|
|
97
111
|
- [ ] causeCategory for each hypothesis
|
|
112
|
+
- [ ] `investigationSources` covers at least 3 distinct source types
|
|
113
|
+
- [ ] each hypothesis has supporting evidence with a concrete source
|
|
98
114
|
- [ ] Investigation covering investigationFocus items (when provided)
|
|
99
115
|
|
|
100
|
-
**If quality insufficient**: MUST re-spawn investigator agent specifying missing items
|
|
116
|
+
**If quality insufficient**: MUST re-spawn investigator agent specifying the missing items and include the previous investigation output for context
|
|
101
117
|
ENFORCEMENT: Proceeding to verifier with incomplete investigation data produces unreliable conclusions.
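Part of the quality check above could be automated along these lines. This is a sketch under stated assumptions, not the package's implementation: the diff does not show the investigator's output schema, so the shapes used here (`investigationSources` as a list of objects with a `type` field, `hypotheses` as a list of objects with `causeCategory` and a `supportingEvidence` list whose entries carry a `source`) are hypothetical. The returned list is what an orchestrator might pass back when re-spawning the investigator.

```python
def missing_investigation_items(output: dict) -> list:
    """List checklist items that the investigation output fails to satisfy.

    All field shapes below are assumptions made for illustration.
    """
    missing = []

    if not output.get("comparisonAnalysis"):
        missing.append("comparisonAnalysis")

    # At least 3 distinct source types must be covered.
    source_types = {
        s.get("type") for s in output.get("investigationSources", []) if s.get("type")
    }
    if len(source_types) < 3:
        missing.append("investigationSources (fewer than 3 distinct source types)")

    for i, hypothesis in enumerate(output.get("hypotheses", [])):
        if not hypothesis.get("causeCategory"):
            missing.append(f"hypotheses[{i}].causeCategory")
        evidence = hypothesis.get("supportingEvidence", [])
        if not any(e.get("source") for e in evidence):
            missing.append(f"hypotheses[{i}]: supporting evidence with a concrete source")

    return missing
```

An empty result means these particular checks passed; the stop-condition and `investigationFocus` checks still need human or agent judgment.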
|
|
102
118
|
|
|
103
119
|
**design_gap Escalation**:
|
|
@@ -74,7 +74,7 @@ Verify generated task files exist in docs/plans/tasks/.
|
|
|
74
74
|
Each sub-agent responds in JSON format:
|
|
75
75
|
- **task-executor-frontend**: status, filesModified, testsAdded, requiresTestReview, readyForQualityCheck
|
|
76
76
|
- **integration-test-reviewer**: status (approved/needs_revision/blocked), requiredFixes
|
|
77
|
-
- **quality-fixer-frontend**: status, checksPerformed, fixesApplied
|
|
77
|
+
- **quality-fixer-frontend**: status, checksPerformed, fixesApplied
|
|
78
78
|
|
|
79
79
|
### Execution Flow for Each Task
|
|
80
80
|
|
|
@@ -88,7 +88,7 @@ For EACH task, YOU MUST:
|
|
|
88
88
|
- `approved` -> Proceed to step 4
|
|
89
89
|
- `readyForQualityCheck: true` -> Proceed to step 4
|
|
90
90
|
4. **Spawn quality-fixer-frontend agent**: "Execute all frontend quality checks and fixes"
|
|
91
|
-
5. **COMMIT on approval**: After `
|
|
91
|
+
5. **COMMIT on approval**: After `status: "approved"` from quality-fixer-frontend -> Execute git commit. Use `changeSummary` for commit message.
|
|
92
92
|
|
|
93
93
|
**CRITICAL**: MUST monitor ALL structured responses WITHOUT EXCEPTION and ENSURE every quality gate is passed.
|
|
94
94
|
ENFORCEMENT: Proceeding past a failed quality gate invalidates all subsequent work.
|
|
@@ -98,7 +98,7 @@ For EACH task, YOU MUST:
|
|
|
98
98
|
- `approved` -> Proceed to step 4
|
|
99
99
|
- `readyForQualityCheck: true` -> Proceed to step 4
|
|
100
100
|
4. **Spawn quality-fixer agent** (layer-appropriate per routing table): "Execute all quality checks and fixes"
|
|
101
|
-
5. **COMMIT on approval**: After `
|
|
101
|
+
5. **COMMIT on approval**: After `status: "approved"` from quality-fixer -> Execute git commit
|
|
102
102
|
|
|
103
103
|
**CRITICAL**: MUST monitor ALL structured responses WITHOUT EXCEPTION and ENSURE every quality gate is passed.
|
|
104
104
|
ENFORCEMENT: Proceeding past a failed quality gate invalidates all subsequent work.
|
|
@@ -123,7 +123,7 @@ ENFORCEMENT: Sub-agent prompts missing the constraint suffix MUST be re-issued w
|
|
|
123
123
|
1. Execute ONE task completely before starting next (each task goes through the full 4-step cycle individually, using the correct executor per filename pattern)
|
|
124
124
|
2. Check executor status before quality-fixer (escalation check)
|
|
125
125
|
3. Quality-fixer MUST run after each executor (no skipping)
|
|
126
|
-
4. Commit MUST execute when quality-fixer returns `
|
|
126
|
+
4. Commit MUST execute when quality-fixer returns `status: "approved"` (do not defer to end)
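The sequencing rules above can be sketched as a loop that commits each task as soon as it is approved. This is an illustrative sketch, assuming each agent call is represented by a callable that returns a parsed JSON object; `run_tasks`, `execute`, `quality_fix`, and `commit` are hypothetical names standing in for the sub-agent spawns and the git commit.

```python
def run_tasks(tasks, execute, quality_fix, commit):
    """Run tasks one at a time: executor -> escalation check -> quality-fixer -> commit.

    Stops at the first task that needs escalation or is not approved,
    and returns the tasks that were committed.
    """
    committed = []
    for task in tasks:
        result = execute(task)
        # Escalation check happens before quality-fixer is spawned.
        if result.get("status") in ("escalation_needed", "blocked"):
            break
        quality = quality_fix(task)
        if quality.get("status") != "approved":
            break
        # Commit immediately; it is not deferred to the end of the run.
        commit(task)
        committed.append(task)
    return committed
```

Committing inside the loop means a later failure never leaves earlier approved work uncommitted.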
|
|
127
127
|
|
|
128
128
|
### Security Review (After All Tasks Complete)
|
|
129
129
|
|
|
@@ -106,7 +106,7 @@ After user grants "batch approval for entire implementation phase", enter autono
|
|
|
106
106
|
- `approved` -> Proceed to step 3
|
|
107
107
|
- Otherwise -> Proceed to step 3
|
|
108
108
|
3. Spawn quality-fixer (or quality-fixer-frontend) agent: "Quality check and fixes"
|
|
109
|
-
4. git commit -> Execute on `
|
|
109
|
+
4. git commit -> Execute on `status: "approved"`
|
|
110
110
|
|
|
111
111
|
### Security Review (After All Tasks Complete)
|
|
112
112
|
|
|
@@ -20,7 +20,7 @@ Target: $ARGUMENTS
|
|
|
20
20
|
**Execution Protocol**:
|
|
21
21
|
1. **Spawn agents for all work** -- your role is to invoke sub-agents, pass data between them, and report results
|
|
22
22
|
2. **Process one step at a time**: Execute steps sequentially within each unit (2 -> 3 -> 4 -> 5). Each step's output is the required input for the next step. Complete all steps for one unit before starting the next
|
|
23
|
-
3. **Pass `$STEP_N_OUTPUT` as-is** to sub-agents -- the orchestrator bridges data without processing or filtering it
|
|
23
|
+
3. **Pass `$STEP_N_OUTPUT` as-is** to sub-agents -- the orchestrator bridges data without processing or filtering it, except for steps that explicitly define a deterministic transformation with an input schema, output schema, and mapping rules
|
|
24
24
|
|
|
25
25
|
**Task Registration**: Register phases first, then steps within each phase as you enter it. Track status for each step.
|
|
26
26
|
|
|
@@ -44,7 +44,7 @@ Ask the user to confirm:
|
|
|
44
44
|
|
|
45
45
|
```
|
|
46
46
|
Phase 1: PRD Generation
|
|
47
|
-
Step 1: Scope Discovery (unified, single pass)
|
|
47
|
+
Step 1: Scope Discovery (unified, single pass -> group into PRD units -> human review)
|
|
48
48
|
Step 2-5: Per-unit loop (Generation -> Verification -> Review -> Revision)
|
|
49
49
|
|
|
50
50
|
Phase 2: Design Doc Generation (if requested)
|
|
@@ -67,17 +67,20 @@ Spawn scope-discoverer agent: "Discover functional scope targets in the codebase
|
|
|
67
67
|
**Quality Gate**:
|
|
68
68
|
- At least one unit discovered -> proceed
|
|
69
69
|
- No units discovered -> ask user for hints
|
|
70
|
+
- `$STEP_1_OUTPUT.prdUnits` exists
|
|
71
|
+
- All `sourceUnits` across `prdUnits` (flattened, deduplicated) match the set of `discoveredUnits` IDs — no unit missing, no unit duplicated
|
|
72
|
+
- Each discovered unit's `unitInventory` has at least one non-empty category. If all categories are empty, re-run discovery with focus on that unit
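The three added gate conditions amount to a set comparison plus an emptiness check. The sketch below is one possible way to express them, assuming `discoveredUnits` entries carry an `id` and a `unitInventory` dictionary of lists, and `prdUnits` entries carry a `sourceUnits` list of IDs; those shapes and the name `check_scope_gate` are assumptions, since the diff does not show the full scope-discoverer schema.

```python
from collections import Counter


def check_scope_gate(step_1_output: dict) -> list:
    """Return a list of gate violations; an empty list means the gate passes."""
    prd_units = step_1_output.get("prdUnits")
    if not prd_units:
        # Assumption: an empty prdUnits list is treated the same as a missing one.
        return ["prdUnits is missing or empty"]

    discovered = step_1_output.get("discoveredUnits", [])
    discovered_ids = {unit["id"] for unit in discovered}
    # Count how often each discovered-unit ID is referenced across all PRD units.
    counts = Counter(uid for prd in prd_units for uid in prd.get("sourceUnits", []))
    referenced_ids = set(counts)

    problems = []
    for uid in sorted(discovered_ids - referenced_ids):
        problems.append(f"{uid}: not assigned to any PRD unit")
    for uid in sorted(referenced_ids - discovered_ids):
        problems.append(f"{uid}: referenced but not discovered")
    for uid in sorted(counts):
        if counts[uid] > 1:
            problems.append(f"{uid}: assigned {counts[uid]} times")

    for unit in discovered:
        inventory = unit.get("unitInventory", {})
        if not any(inventory.values()):
            problems.append(f"{unit['id']}: unitInventory is empty, re-run discovery")
    return problems
```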
|
|
70
73
|
|
|
71
|
-
**[STOP — BLOCKING]** If human review enabled: Present
|
|
74
|
+
**[STOP — BLOCKING]** If human review enabled: Present `$STEP_1_OUTPUT.prdUnits` with their source unit mapping to user for confirmation.
|
|
72
75
|
**CANNOT proceed until user explicitly confirms.**
|
|
73
76
|
|
|
74
77
|
### Step 2-5: Per-Unit Processing
|
|
75
78
|
|
|
76
|
-
**FOR** each unit in `$STEP_1_OUTPUT.
|
|
79
|
+
**FOR** each unit in `$STEP_1_OUTPUT.prdUnits` **(sequential, one unit at a time)**:
|
|
77
80
|
|
|
78
81
|
#### Step 2: PRD Generation
|
|
79
82
|
|
|
80
|
-
Spawn prd-creator agent: "Create reverse-engineered PRD for the following feature. Operation Mode: reverse-engineer. External Scope Provided: true. Feature: $
|
|
83
|
+
Spawn prd-creator agent: "Create reverse-engineered PRD for the following feature. Operation Mode: reverse-engineer. External Scope Provided: true. Feature: $PRD_UNIT_NAME. Description: $PRD_UNIT_DESCRIPTION. Related Files: $PRD_UNIT_COMBINED_RELATED_FILES. Entry Points: $PRD_UNIT_COMBINED_ENTRY_POINTS. Source Units: $PRD_UNIT_SOURCE_UNITS. Use provided scope as an investigation starting point. If tracing entry points reveals directly connected files outside this scope, include them. Create final version PRD based on thorough code investigation."
|
|
81
84
|
|
|
82
85
|
**Store output as**: `$STEP_2_OUTPUT` (PRD path)
|
|
83
86
|
|
|
@@ -85,12 +88,13 @@ Spawn prd-creator agent: "Create reverse-engineered PRD for the following featur
|
|
|
85
88
|
|
|
86
89
|
**Prerequisite**: $STEP_2_OUTPUT (PRD path from Step 2)
|
|
87
90
|
|
|
88
|
-
Spawn code-verifier agent: "Verify consistency between PRD and code implementation. doc_type: prd. document_path: $STEP_2_OUTPUT.
|
|
91
|
+
Spawn code-verifier agent: "Verify consistency between PRD and code implementation. doc_type: prd. document_path: $STEP_2_OUTPUT. verbose: false."
|
|
89
92
|
|
|
90
93
|
**Store output as**: `$STEP_3_OUTPUT`
|
|
91
94
|
|
|
92
95
|
**Quality Gate**:
|
|
93
|
-
- consistencyScore >= 70 -> proceed to review
|
|
96
|
+
- consistencyScore >= 70 and verifiableClaimCount >= 20 -> proceed to review (guards against shallow verification passes with too few extracted claims)
|
|
97
|
+
- consistencyScore >= 70 and verifiableClaimCount < 20 -> re-run verifier because investigation depth is insufficient
|
|
94
98
|
- consistencyScore < 70 -> flag for detailed review
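The revised gate has three outcomes, which can be written as a small decision function. This is a sketch only; the function name and the returned labels are hypothetical, while the thresholds (70 and 20) come from the gate above.

```python
def verification_gate(consistency_score: float, verifiable_claim_count: int) -> str:
    """Map the code-verifier summary to the next orchestrator action."""
    if consistency_score < 70:
        return "flag_for_detailed_review"
    if verifiable_claim_count < 20:
        # Score looks fine but rests on too few claims: verification was shallow.
        return "rerun_verifier"
    return "proceed_to_review"
```

The score check comes first because the gate flags any score below 70 for detailed review without mentioning the claim count.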
|
|
95
99
|
|
|
96
100
|
#### Step 4: Review
|
|
@@ -130,18 +134,58 @@ ENFORCEMENT: Exceeding 2 revision cycles without flagging produces unreviewed ou
|
|
|
130
134
|
|
|
131
135
|
### Step 6: Design Doc Scope Mapping
|
|
132
136
|
|
|
133
|
-
**
|
|
137
|
+
**Step type**: Deterministic transformation step executed by the orchestrator.
|
|
134
138
|
|
|
135
|
-
|
|
139
|
+
**No additional discovery required.** Use `$STEP_1_OUTPUT.discoveredUnits` (implementation-granularity units) for technical profiles. Use `$STEP_1_OUTPUT.prdUnits[].sourceUnits` to trace which discovered units belong to each PRD unit.
|
|
136
140
|
|
|
137
|
-
|
|
141
|
+
**Default mapping rule**: Each PRD unit maps to exactly 1 Design Doc unit.
|
|
142
|
+
|
|
143
|
+
Only split one PRD unit into multiple Design Doc units when BOTH are true:
|
|
144
|
+
1. The source units contain clearly separate technical boundaries with low shared-file overlap
|
|
145
|
+
2. Separate Design Docs would improve verification clarity (different public interfaces, dependencies, or module groups)
|
|
146
|
+
|
|
147
|
+
If the split conditions are not clearly met, keep 1 PRD unit -> 1 Design Doc unit.
|
|
148
|
+
|
|
149
|
+
Transform `$STEP_1_OUTPUT` into `$STEP_6_OUTPUT` using only the mapping rules in this step.
|
|
150
|
+
|
|
151
|
+
Map PRD units to Design Doc generation targets by resolving each PRD unit's `sourceUnits` back to `$STEP_1_OUTPUT.discoveredUnits`, carrying forward:
|
|
138
152
|
- `technicalProfile.primaryModules` -> Primary Files
|
|
139
153
|
- `technicalProfile.publicInterfaces` -> Public Interfaces
|
|
140
154
|
- `dependencies` -> Dependencies
|
|
141
155
|
- `relatedFiles` -> Scope boundary
|
|
156
|
+
- `unitInventory` -> Unit Inventory
|
|
142
157
|
|
|
143
158
|
**Store output as**: `$STEP_6_OUTPUT`
|
|
144
159
|
|
|
160
|
+
`$STEP_6_OUTPUT` MUST be a JSON array of Design Doc generation targets in the following shape:
|
|
161
|
+
|
|
162
|
+
```json
|
|
163
|
+
[
|
|
164
|
+
{
|
|
165
|
+
"unitId": "DD-001",
|
|
166
|
+
"parentPrdUnitId": "PRD-001",
|
|
167
|
+
"unitName": "Authentication",
|
|
168
|
+
"unitDescription": "Current implementation for sign-in and session management",
|
|
169
|
+
"sourceUnits": ["UNIT-001", "UNIT-002"],
|
|
170
|
+
"primaryModules": ["src/auth/service.ts", "src/auth/controller.ts"],
|
|
171
|
+
"publicInterfaces": ["AuthService.login()", "AuthController.handleLogin()"],
|
|
172
|
+
"dependencies": ["UNIT-003"],
|
|
173
|
+
"scopeBoundary": ["src/auth/*"],
|
|
174
|
+
"unitInventory": {
|
|
175
|
+
"routes": [],
|
|
176
|
+
"testFiles": [],
|
|
177
|
+
"publicExports": []
|
|
178
|
+
},
|
|
179
|
+
"mappingRationale": "Default 1:1 mapping from PRD unit because technical scope is cohesive"
|
|
180
|
+
}
|
|
181
|
+
]
|
|
182
|
+
```
|
|
183
|
+
|
|
184
|
+
**Quality Gate**:
|
|
185
|
+
- Every PRD unit appears in at least one `$STEP_6_OUTPUT` item
|
|
186
|
+
- Every `$STEP_6_OUTPUT` item references only discovered units from its parent PRD unit
|
|
187
|
+
- `mappingRationale` explicitly states whether the mapping is default 1:1 or an intentional split
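The default 1:1 mapping described in this step could look roughly like the following. It is a sketch under assumptions, not the orchestrator's actual logic: PRD units are assumed to expose `id`, `name`, `description`, and `sourceUnits`, and the split case is deliberately left out. The function name and the `DD-NNN` numbering scheme are hypothetical, modeled on the example output.

```python
def map_prd_units_to_design_docs(step_1_output: dict) -> list:
    """Build Design Doc targets using only the default 1 PRD unit -> 1 Design Doc rule."""
    units_by_id = {unit["id"]: unit for unit in step_1_output["discoveredUnits"]}

    def collect(sources, getter):
        # Concatenate per-unit lists, dropping duplicates but keeping order.
        merged = [item for unit in sources for item in getter(unit)]
        return list(dict.fromkeys(merged))

    targets = []
    for index, prd in enumerate(step_1_output["prdUnits"], start=1):
        sources = [units_by_id[uid] for uid in prd["sourceUnits"]]

        inventory = {}
        for unit in sources:
            for category, items in unit.get("unitInventory", {}).items():
                inventory.setdefault(category, []).extend(items)

        targets.append({
            "unitId": f"DD-{index:03d}",
            "parentPrdUnitId": prd["id"],
            "unitName": prd["name"],
            "unitDescription": prd["description"],
            "sourceUnits": list(prd["sourceUnits"]),
            "primaryModules": collect(sources, lambda u: u["technicalProfile"]["primaryModules"]),
            "publicInterfaces": collect(sources, lambda u: u["technicalProfile"]["publicInterfaces"]),
            "dependencies": collect(sources, lambda u: u.get("dependencies", [])),
            "scopeBoundary": collect(sources, lambda u: u.get("relatedFiles", [])),
            "unitInventory": inventory,
            "mappingRationale": "Default 1:1 mapping from PRD unit",
        })
    return targets
```

Because every target is built only from its own PRD unit's `sourceUnits`, the first two quality-gate conditions hold by construction in this simplified case.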
|
|
188
|
+
|
|
145
189
|
### Step 7-10: Per-Unit Processing
|
|
146
190
|
|
|
147
191
|
**FOR** each unit in `$STEP_6_OUTPUT` **(sequential, one unit at a time)**:
|
|
@@ -150,13 +194,13 @@ Map `$STEP_1_OUTPUT` units to Design Doc generation targets, carrying forward:
|
|
|
150
194
|
|
|
151
195
|
**Scope**: Document current architecture as-is. This is a documentation task, not a design improvement task.
|
|
152
196
|
|
|
153
|
-
Spawn technical-designer agent: "Create Design Doc for the following feature based on existing code. Operation Mode:
|
|
197
|
+
Spawn technical-designer agent: "Create Design Doc for the following feature based on existing code. Operation Mode: reverse-engineer. Feature: $UNIT_NAME. Description: $UNIT_DESCRIPTION. Primary Files: $UNIT_PRIMARY_MODULES. Public Interfaces: $UNIT_PUBLIC_INTERFACES. Dependencies: $UNIT_DEPENDENCIES. Unit Inventory: $UNIT_INVENTORY. Parent PRD: $APPROVED_PRD_PATH. Document current architecture as-is. Use Unit Inventory as the completeness baseline."
|
|
154
198
|
|
|
155
199
|
**Store output as**: `$STEP_7_OUTPUT`
|
|
156
200
|
|
|
157
201
|
#### Step 8: Code Verification
|
|
158
202
|
|
|
159
|
-
Spawn code-verifier agent: "Verify consistency between Design Doc and code implementation. doc_type: design-doc. document_path: $STEP_7_OUTPUT.
|
|
203
|
+
Spawn code-verifier agent: "Verify consistency between Design Doc and code implementation. doc_type: design-doc. document_path: $STEP_7_OUTPUT. verbose: false."
|
|
160
204
|
|
|
161
205
|
**Store output as**: `$STEP_8_OUTPUT`
|
|
162
206
|
|
|
@@ -31,7 +31,7 @@ ENFORCEMENT: Skipping document-reviewer risks propagating inconsistencies to dow
|
|
|
31
31
|
```
|
|
32
32
|
Target document -> [Stop: Confirm changes]
|
|
33
33
|
|
|
|
34
|
-
technical-designer / prd-creator (update mode)
|
|
34
|
+
technical-designer / technical-designer-frontend / prd-creator (update mode)
|
|
35
35
|
|
|
|
36
36
|
document-reviewer -> [Stop: Review approval]
|
|
37
37
|
| (Design Doc only)
|
|
@@ -70,15 +70,20 @@ Check for existing documents in docs/design/, docs/prd/, docs/adr/.
|
|
|
70
70
|
| Multiple candidates found | Present options to user |
|
|
71
71
|
| No documents found | Report and end (suggest $recipe-design instead) |
|
|
72
72
|
|
|
73
|
-
### Step 2: Document Type Determination
|
|
73
|
+
### Step 2: Document Type and Layer Determination
|
|
74
74
|
|
|
75
|
-
Determine type from document path:
|
|
75
|
+
Determine type from document path, then determine the layer to select the correct update agent:
|
|
76
76
|
|
|
77
77
|
| Path Pattern | Type | Update Agent | Notes |
|
|
78
78
|
|-------------|------|--------------|-------|
|
|
79
|
-
| `docs/design/*.md` | Design Doc | technical-designer
|
|
79
|
+
| `docs/design/*.md` | Design Doc | technical-designer or technical-designer-frontend | See layer detection below |
|
|
80
80
|
| `docs/prd/*.md` | PRD | prd-creator | - |
|
|
81
|
-
| `docs/adr/*.md` | ADR | technical-designer
|
|
81
|
+
| `docs/adr/*.md` | ADR | technical-designer or technical-designer-frontend | See layer detection below |
|
|
82
|
+
|
|
83
|
+
**Layer detection** (for Design Doc and ADR):
|
|
84
|
+
Read the document and determine its layer from content signals:
|
|
85
|
+
- **Frontend** (-> technical-designer-frontend): Document title/scope mentions React, components, UI, frontend; or file contains component hierarchy, state management, UI interactions
|
|
86
|
+
- **Backend** (-> technical-designer): All other cases (API, data layer, business logic, infrastructure)
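As a rough illustration of the routing rule, layer detection could be approximated with a keyword check. This is a crude stand-in for the content-based judgment the recipe asks the orchestrator to make; the signal list and the function name `select_update_agent` are assumptions chosen for the example.

```python
import re

# Hypothetical signal words drawn from the frontend description above.
FRONTEND_SIGNALS = ("react", "component", "ui", "frontend", "state management")


def select_update_agent(document_text: str) -> str:
    """Pick the update agent for a Design Doc or ADR from simple content signals."""
    text = document_text.lower()
    for signal in FRONTEND_SIGNALS:
        # Whole-word match (optionally plural) so "build" does not count as "ui".
        if re.search(rf"\b{re.escape(signal)}s?\b", text):
            return "technical-designer-frontend"
    # All other cases fall through to the backend designer.
    return "technical-designer"
```

Backend is the default branch, mirroring the rule that anything not clearly frontend goes to technical-designer.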
|
|
82
87
|
|
|
83
88
|
**ADR Update Guidance**:
|
|
84
89
|
- **Minor changes** (clarification, typo fix, small scope adjustment): Update the existing ADR file
|
|
@@ -173,10 +173,10 @@ All agents MUST use this vocabulary consistently:
|
|
|
173
173
|
|
|
174
174
|
## Structured Response Specification
|
|
175
175
|
|
|
176
|
-
Subagents respond in JSON format. Key fields for orchestrator decisions:
|
|
176
|
+
Subagents respond in JSON format. The final response from each JSON-returning subagent must be the JSON payload itself, with no trailing prose. Key fields for orchestrator decisions:
|
|
177
177
|
- **requirement-analyzer**: scale, confidence, affectedLayers, adrRequired, scopeDependencies, questions
|
|
178
178
|
- **task-executor**: status (escalation_needed/blocked/completed), testsAdded, requiresTestReview
|
|
179
|
-
- **quality-fixer**:
|
|
179
|
+
- **quality-fixer**: status (approved/blocked)
|
|
180
180
|
- **document-reviewer**: verdict.decision (approved/approved_with_conditions/needs_revision/rejected)
|
|
181
181
|
- **design-sync**: sync_status (CONFLICTS_FOUND/NO_CONFLICTS) — text format with [SUMMARY] block
|
|
182
182
|
- **integration-test-reviewer**: status (approved/needs_revision/blocked), requiredFixes
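The new "no trailing prose" requirement is easy to enforce on the orchestrator side. The sketch below assumes the subagent's final response arrives as a single string; `parse_final_response` is a hypothetical helper name. It relies on the standard behavior of `json.loads`, which raises `json.JSONDecodeError` (a subclass of `ValueError`) when extra text follows the JSON value.

```python
import json


def parse_final_response(raw: str) -> dict:
    """Parse a subagent's final response, rejecting anything but a bare JSON object."""
    # json.loads tolerates surrounding whitespace but fails on trailing prose.
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object as the final response")
    return payload
```

A response such as `{"status": "approved"}` followed by a sentence of commentary fails to parse, which surfaces the contract violation instead of silently dropping the prose.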
|
|
@@ -310,7 +310,7 @@ Stop autonomous execution and escalate to user in the following cases:
|
|
|
310
310
|
- `approved`: Proceed to step 3
|
|
311
311
|
- Otherwise: Proceed to step 3
|
|
312
312
|
3. quality-fixer: Quality check and fixes
|
|
313
|
-
4. git commit (on `
|
|
313
|
+
4. git commit (on `status: "approved"`)
|
|
314
314
|
|
|
315
315
|
## Main Orchestrator Roles
|
|
316
316
|
|
|
@@ -99,13 +99,13 @@ Each task uses the standard 4-step cycle with layer-appropriate agents:
|
|
|
99
99
|
1. task-executor: Implementation
|
|
100
100
|
2. Escalation check
|
|
101
101
|
3. quality-fixer: Quality check and fixes
|
|
102
|
-
4. git commit (on
|
|
102
|
+
4. git commit (on status: "approved")
|
|
103
103
|
|
|
104
104
|
### frontend-task
|
|
105
105
|
1. task-executor-frontend: Implementation
|
|
106
106
|
2. Escalation check
|
|
107
107
|
3. quality-fixer-frontend: Quality check and fixes
|
|
108
|
-
4. git commit (on
|
|
108
|
+
4. git commit (on status: "approved")
|
|
109
109
|
|
|
110
110
|
### integration-test-reviewer Placement
|
|
111
111
|
|
|
@@ -89,11 +89,14 @@ Verify against the Design Doc architecture:
|
|
|
89
89
|
- No unnecessary duplicate implementations (Pattern 5 from ai-development-guide skill)
|
|
90
90
|
- Existing codebase analysis section includes similar functionality investigation results
|
|
91
91
|
|
|
92
|
-
### 5. Calculate Compliance
|
|
92
|
+
### 5. Calculate Compliance
|
|
93
93
|
- Compliance rate = (fulfilled items + 0.5 x partially fulfilled items) / total AC items x 100
|
|
94
94
|
- Compile all AC statuses, quality issues with specific locations
|
|
95
95
|
- Determine verdict based on compliance rate
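The compliance formula above translates directly into code. This is a minimal sketch; the function and parameter names are chosen for illustration, and the verdict thresholds are left out because they are not shown in this hunk.

```python
def compliance_rate(fulfilled: int, partially_fulfilled: int, total_ac_items: int) -> float:
    """Compliance rate in percent; partially fulfilled items count for half."""
    if total_ac_items <= 0:
        raise ValueError("total_ac_items must be positive")
    return (fulfilled + 0.5 * partially_fulfilled) * 100 / total_ac_items
```

For example, 7 fulfilled and 2 partially fulfilled items out of 10 acceptance criteria give a compliance rate of 80.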
|
|
96
96
|
|
|
97
|
+
### 6. Return JSON Result
|
|
98
|
+
Return the JSON result as the final response. See Output Format for the schema.
|
|
99
|
+
|
|
97
100
|
## Output Format
|
|
98
101
|
|
|
99
102
|
```json
|
|
@@ -136,6 +139,13 @@ Verify against the Design Doc architecture:
|
|
|
136
139
|
- Provide solutions, not just problems; quantify wherever possible
|
|
137
140
|
- Acknowledge good implementations; present improvements as actionable items
|
|
138
141
|
|
|
142
|
+
## Completion Criteria
|
|
143
|
+
|
|
144
|
+
- [ ] All acceptance criteria individually evaluated
|
|
145
|
+
- [ ] Compliance rate calculated
|
|
146
|
+
- [ ] Verdict determined
|
|
147
|
+
- [ ] Final response is the JSON output
|
|
148
|
+
|
|
139
149
|
### Escalation Criteria
|
|
140
150
|
Recommend higher-level review when: Design Doc itself has deficiencies, security concerns discovered, or critical performance issues found.
|
|
141
151
|
|
|
@@ -52,13 +52,6 @@ Skill Status:
|
|
|
52
52
|
This agent outputs **verification results and discrepancy findings only**.
|
|
53
53
|
Document modification and solution proposals are out of scope for this agent.
|
|
54
54
|
|
|
55
|
-
## Core Responsibilities
|
|
56
|
-
|
|
57
|
-
1. **Claim Extraction** - Extract verifiable claims from document
|
|
58
|
-
2. **Multi-source Evidence Collection** - Gather evidence from code, tests, and config
|
|
59
|
-
3. **Consistency Classification** - Classify each claim's implementation status
|
|
60
|
-
4. **Coverage Assessment** - Identify undocumented code and unimplemented specifications
|
|
61
|
-
|
|
62
55
|
## Verification Framework
|
|
63
56
|
|
|
64
57
|
### Claim Categories
|
|
@@ -97,28 +90,38 @@ For each claim, classify as one of:
|
|
|
97
90
|
|
|
98
91
|
## Execution Steps
|
|
99
92
|
|
|
100
|
-
### Step 1: Document Analysis
|
|
93
|
+
### Step 1: Document Analysis — Section-by-Section Claim Extraction
|
|
101
94
|
|
|
102
|
-
1. Read the target document
|
|
103
|
-
2.
|
|
95
|
+
1. Read the target document in full
|
|
96
|
+
2. Process each section individually:
|
|
97
|
+
- Extract all statements that make verifiable claims about code behavior, data structures, file paths, API contracts, or system behavior
|
|
98
|
+
- Record `{ sectionName, claimCount, claims[] }`
|
|
99
|
+
- If a section contains factual statements but yields zero claims, record that explicitly for review
|
|
104
100
|
3. Categorize each claim
|
|
105
101
|
4. Note ambiguous claims that cannot be verified
|
|
102
|
+
5. Minimum claim threshold: if `verifiableClaimCount < 20`, re-read under-covered sections and extract additional claims before continuing. Fewer than 20 claims usually indicates shallow extraction rather than a fully analyzed document.
|
|
106
103
|
|
|
107
104
|
### Step 2: Code Scope Identification
|
|
108
105
|
|
|
109
|
-
1.
|
|
110
|
-
2.
|
|
111
|
-
3.
|
|
106
|
+
1. If `code_paths` are provided, use them as a starting point, not a ceiling
|
|
107
|
+
2. If `code_paths` are not provided, extract file paths from the document and expand scope by searching for referenced identifiers
|
|
108
|
+
3. Infer additional relevant paths from context
|
|
109
|
+
4. Build and record the final verification target list
|
|
112
110
|
|
|
113
111
|
### Step 3: Evidence Collection
|
|
114
112
|
|
|
115
113
|
For each claim:
|
|
116
114
|
|
|
117
|
-
1. **Primary Search**: Find direct implementation
|
|
115
|
+
1. **Primary Search**: Find direct implementation with Read/Grep
|
|
118
116
|
2. **Secondary Search**: Check test files for expected behavior
|
|
119
117
|
3. **Tertiary Search**: Review config and type definitions
|
|
120
118
|
|
|
121
|
-
|
|
119
|
+
Evidence rules:
|
|
120
|
+
- Record source location and evidence strength for each finding
|
|
121
|
+
- Existence claims must be verified with Grep or file enumeration before reporting
|
|
122
|
+
- Behavioral claims must be backed by reading the implementation, not by naming alone
|
|
123
|
+
- Identifier claims must compare exact strings from code against the document
|
|
124
|
+
- Single-source findings remain low confidence
|
|
122
125
|
|
|
123
126
|
### Step 4: Consistency Classification
|
|
124
127
|
|
|
@@ -130,11 +133,19 @@ For each claim with collected evidence:
|
|
|
130
133
|
- medium: 2 sources agree
|
|
131
134
|
- low: 1 source only
|
|
132
135
|
|
|
133
|
-
### Step 5: Coverage Assessment
|
|
136
|
+
### Step 5: Reverse Coverage Assessment — Code-to-Document Direction
|
|
137
|
+
|
|
138
|
+
Perform this step with actual tool-backed enumeration, not memory:
|
|
139
|
+
|
|
140
|
+
1. Enumerate routes/endpoints in scope and record whether each is documented
|
|
141
|
+
2. Enumerate test files in scope and record whether their existence is documented
|
|
142
|
+
3. Enumerate public exports/interfaces in primary source files and record whether each is documented
|
|
143
|
+
4. Compile undocumented code items from the enumerations
|
|
144
|
+
5. Compile unimplemented document items from earlier claim verification
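Once the enumerations exist, the route portion of reverse coverage reduces to a membership test. The sketch below assumes both inputs were produced by tool-backed enumeration (for example Grep results), not recalled from memory; the function name is hypothetical, and the output keys follow the `reverseCoverage` fields in the Output Format.

```python
def route_reverse_coverage(routes_in_code: list, documented_routes: set) -> dict:
    """Compare enumerated routes against the routes the document mentions."""
    undocumented = [route for route in routes_in_code if route not in documented_routes]
    return {
        "routesInCode": len(routes_in_code),
        "routesDocumented": len(routes_in_code) - len(undocumented),
        "undocumentedRoutes": undocumented,
    }
```

The same comparison applies to test files and public exports with their own enumerated lists.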
|
|
134
145
|
|
|
135
|
-
|
|
136
|
-
|
|
137
|
-
|
|
146
|
+
### Step 6: Return JSON Result
|
|
147
|
+
|
|
148
|
+
Return the JSON result as the final response. See Output Format for the schema.
|
|
138
149
|
|
|
139
150
|
## Output Format
|
|
140
151
|
|
|
@@ -147,9 +158,16 @@ For each claim with collected evidence:
|
|
|
147
158
|
"summary": {
|
|
148
159
|
"docType": "prd|design-doc",
|
|
149
160
|
"documentPath": "/path/to/document.md",
|
|
161
|
+
"verifiableClaimCount": 24,
|
|
162
|
+
"matchCount": 20,
|
|
150
163
|
"consistencyScore": 85,
|
|
151
164
|
"status": "consistent|mostly_consistent|needs_review|inconsistent"
|
|
152
165
|
},
|
|
166
|
+
"claimCoverage": {
|
|
167
|
+
"sectionsAnalyzed": 8,
|
|
168
|
+
"sectionsWithClaims": 7,
|
|
169
|
+
"sectionsWithZeroClaims": ["Appendix"]
|
|
170
|
+
},
|
|
153
171
|
"discrepancies": [
|
|
154
172
|
{
|
|
155
173
|
"id": "D001",
|
|
@@ -158,9 +176,20 @@ For each claim with collected evidence:
|
|
|
158
176
|
"claim": "Brief claim description",
|
|
159
177
|
"documentLocation": "PRD.md:45",
|
|
160
178
|
"codeLocation": "src/auth.ts:120",
|
|
179
|
+
"evidence": "Observed implementation or enumeration result",
|
|
161
180
|
"classification": "What was found"
|
|
162
181
|
}
|
|
163
182
|
],
|
|
183
|
+
"reverseCoverage": {
|
|
184
|
+
"routesInCode": 6,
|
|
185
|
+
"routesDocumented": 5,
|
|
186
|
+
"undocumentedRoutes": ["POST /admin/reindex (src/routes/admin.ts:42)"],
|
|
187
|
+
"testFilesFound": 4,
|
|
188
|
+
"testFilesDocumented": 2,
|
|
189
|
+
"exportsInCode": 12,
|
|
190
|
+
"exportsDocumented": 10,
|
|
191
|
+
"undocumentedExports": ["rebuildSearchIndex (src/search/index.ts:18)"]
|
|
192
|
+
},
|
|
164
193
|
"coverage": {
|
|
165
194
|
"documented": ["Feature areas with documentation"],
|
|
166
195
|
"undocumented": ["Code features lacking documentation"],
|
|
@@ -186,6 +215,8 @@ consistencyScore = (matchCount / verifiableClaimCount) * 100
|
|
|
186
215
|
- (minorDiscrepancies * 2)
|
|
187
216
|
```
|
|
188
217
|
|
|
218
|
+
If `verifiableClaimCount < 20`, treat the score as unstable and return to Step 1 before finalizing. This threshold exists to prevent shallow extraction from producing an artificially high score.
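A short sketch shows why a low claim count makes the score unreliable. It computes only the first term of the formula, `(matchCount / verifiableClaimCount) * 100`; the discrepancy penalties are omitted because the diff shows them only in part. The function names and the constant name are assumptions for illustration.

```python
MIN_VERIFIABLE_CLAIMS = 20  # threshold stated in the text above


def base_match_ratio(match_count: int, verifiable_claim_count: int) -> float:
    """First term of consistencyScore only, before any discrepancy penalties."""
    if verifiable_claim_count <= 0:
        raise ValueError("verifiable_claim_count must be positive")
    return match_count * 100 / verifiable_claim_count


def is_score_stable(verifiable_claim_count: int) -> bool:
    """A score is treated as final only when enough claims were extracted."""
    return verifiable_claim_count >= MIN_VERIFIABLE_CLAIMS
```

With 3 matches out of 3 extracted claims the ratio is a perfect 100, yet `is_score_stable(3)` is false; with 20 matches out of 24 the ratio is lower, about 83.3, but it rests on enough claims to be reported.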
|
|
219
|
+
|
|
189
220
|
| Score | Status | Interpretation |
|
|
190
221
|
|-------|--------|----------------|
|
|
191
222
|
| 85-100 | consistent | Document accurately reflects code |
|
|
@@ -195,19 +226,25 @@ consistencyScore = (matchCount / verifiableClaimCount) * 100
|
|
|
195
226
|
|
|
196
227
|
## Completion Criteria
|
|
197
228
|
|
|
198
|
-
- [ ] Extracted
|
|
229
|
+
- [ ] Extracted claims section-by-section with per-section counts recorded
|
|
230
|
+
- [ ] `verifiableClaimCount >= 20`
|
|
199
231
|
- [ ] Collected evidence from multiple sources for each claim
|
|
200
232
|
- [ ] Classified each claim (match/drift/gap/conflict)
|
|
233
|
+
- [ ] Performed reverse coverage with route, test file, and public export enumeration
|
|
201
234
|
- [ ] Identified undocumented features in code
|
|
202
235
|
- [ ] Identified unimplemented specifications
|
|
203
236
|
- [ ] Calculated consistency score
|
|
204
|
-
- [ ]
|
|
237
|
+
- [ ] Final response is the JSON output
|
|
205
238
|
|
|
206
239
|
## Output Self-Check
|
|
207
240
|
- [ ] All findings are based on verification evidence (no modifications proposed)
|
|
241
|
+
- [ ] Existence claims are backed by Grep or enumeration evidence
|
|
242
|
+
- [ ] Behavioral claims are backed by reading the actual implementation
|
|
243
|
+
- [ ] Identifier comparisons use exact strings from code
|
|
208
244
|
- [ ] Each classification cites multiple sources (not single-source)
|
|
209
245
|
- [ ] Low-confidence classifications are explicitly noted
|
|
210
246
|
- [ ] Contradicting evidence is documented, not ignored
|
|
247
|
+
- [ ] `reverseCoverage` includes concrete counts from tool-backed enumeration
|
|
211
248
|
|
|
212
249
|
## Completion Gate [BLOCKING]
|
|
213
250
|
|
|
@@ -127,13 +127,15 @@ Checklist:
|
|
|
127
127
|
- [ ] If prior_context_count > 0: Each item has resolution status
|
|
128
128
|
- [ ] If prior_context_count > 0: `prior_context_check` object prepared
|
|
129
129
|
- [ ] Output is valid JSON
|
|
130
|
+
- [ ] Final response is the JSON output
|
|
130
131
|
|
|
131
132
|
Complete all items before proceeding to output.
|
|
132
133
|
|
|
133
|
-
### Step 6:
|
|
134
|
-
-
|
|
134
|
+
### Step 6: Return JSON Result
|
|
135
|
+
- Use the JSON schema according to review mode (comprehensive or perspective-specific)
|
|
135
136
|
- Clearly classify problem importance
|
|
136
137
|
- Include `prior_context_check` object if prior_context_count > 0
|
|
138
|
+
- Return the JSON result as the final response. See Output Format for the schema.
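As a rough sketch of Step 6, the snippet below serializes a review result and attaches `prior_context_check` only when there are prior context items. The actual schema lives in the Output Format section and is not reproduced here, so the function name and the `items` and `status` keys are hypothetical placeholders.

```python
import json


def build_final_response(result: dict, prior_context_items: list[dict]) -> str:
    """Serialize the review result as the final JSON response.

    Hypothetical shape: `items` and `status` are illustrative keys,
    not the schema defined under Output Format.
    """
    response = dict(result)
    if len(prior_context_items) > 0:
        # Checklist: each prior-context item needs a resolution status.
        for item in prior_context_items:
            if "status" not in item:
                raise ValueError(f"item lacks resolution status: {item}")
        response["prior_context_check"] = {"items": prior_context_items}
    return json.dumps(response)
```

Returning a string that round-trips through `json.loads` is one simple way to satisfy both "Output is valid JSON" and "Final response is the JSON output".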
|
|
137
139
|
|
|
138
140
|
## Output Format
|
|
139
141
|
|
|
@@ -78,6 +78,9 @@ Evaluate each test for:
|
|
|
78
78
|
- No shared state
|
|
79
79
|
- No time-dependent logic
|
|
80
80
|
|
|
81
|
+
### 4. Return JSON Result
|
|
82
|
+
Return the JSON result as the final response. See Output Format for the schema.
|
|
83
|
+
|
|
81
84
|
## Output Format
|
|
82
85
|
|
|
83
86
|
```json
|
|
@@ -137,6 +140,7 @@ Evaluate each test for:
|
|
|
137
140
|
- [ ] No test interdependencies
|
|
138
141
|
- [ ] Deterministic execution (no random/time dependency)
|
|
139
142
|
- [ ] Test name matches verification content
|
|
143
|
+
- [ ] Final response is the JSON output
|
|
140
144
|
|
|
141
145
|
## Common Issues and Fixes
|
|
142
146
|
|
|
@@ -47,14 +47,6 @@ Skill Status:
|
|
|
47
47
|
This agent outputs **evidence matrix and factual observations only**.
|
|
48
48
|
Solution derivation is out of scope for this agent.
|
|
49
49
|
|
|
50
|
-
## Core Responsibilities
|
|
51
|
-
|
|
52
|
-
1. **Multi-source information collection (Triangulation)** - Collect data from multiple sources without depending on a single source
|
|
53
|
-
2. **External information collection (web search)** - Search official documentation, community, and known library issues
|
|
54
|
-
3. **Hypothesis enumeration and causal tracking** - List multiple causal relationship candidates and trace to root cause
|
|
55
|
-
4. **Impact scope identification** - Identify locations implemented with the same pattern
|
|
56
|
-
5. **Unexplored areas disclosure** - Honestly report areas that could not be investigated
|
|
57
|
-
|
|
58
50
|
## Execution Steps
|
|
59
51
|
|
|
60
52
|
### Step 1: Problem Understanding and Investigation Strategy
|
|
@@ -70,9 +62,18 @@ Solution derivation is out of scope for this agent.
|
|
|
70
62
|
|
|
71
63
|
### Step 2: Information Collection
|
|
72
64
|
|
|
73
|
-
|
|
74
|
-
|
|
75
|
-
|
|
65
|
+
Investigate each source type below and record findings even when empty:
|
|
66
|
+
|
|
67
|
+
| Source | Minimum Investigation Action |
|
|
68
|
+
|--------|------------------------------|
|
|
69
|
+
| Code | Read directly related files and search for the reported symbols, errors, or messages |
|
|
70
|
+
| git history | Review recent history for affected files and compare working/broken states when applicable |
|
|
71
|
+
| Dependencies | Inspect package manifests and relevant package versions or changelogs |
|
|
72
|
+
| Configuration | Read relevant config files and search for related keys across the project |
|
|
73
|
+
| Design Doc or ADR | Search for matching docs and read them. Record findings or explicitly record that none were found |
|
|
74
|
+
| External | Search official documentation for the primary technology and for the reported error text. Record findings or explicitly record that no relevant result was found |
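The "record findings even when empty" rule in the table above can be checked mechanically. The sketch below assumes findings are kept in a dictionary keyed by source type, which is an illustrative structure and not part of the agent definition: an empty list means "investigated, nothing relevant found", while a missing key means the source was never investigated.

```python
# Source types copied from the table's first column.
SOURCE_TYPES = (
    "Code",
    "git history",
    "Dependencies",
    "Configuration",
    "Design Doc or ADR",
    "External",
)


def missing_sources(findings: dict[str, list[str]]) -> list[str]:
    """Return source types with no recorded entry at all.

    An empty list is a valid record (no relevant findings);
    only an absent key counts as uninvestigated.
    """
    return [source for source in SOURCE_TYPES if source not in findings]
```

Distinguishing an empty record from a missing one is what lets the later checklist item ("Investigated each source type or recorded that it had no relevant findings") be verified.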
|
|
75
|
+
|
|
76
|
+
**Comparison analysis**: Differences between working implementation and problematic area (call order, initialization timing, configuration values)
|
|
76
77
|
|
|
77
78
|
Information source priority:
|
|
78
79
|
1. Comparison with "working implementation" in project
|
|
@@ -86,16 +87,17 @@ Information source priority:
|
|
|
86
87
|
- Collect supporting and contradicting evidence for each hypothesis
|
|
87
88
|
- Determine causeCategory: typo / logic_error / missing_constraint / design_gap / external_factor
|
|
88
89
|
|
|
89
|
-
**
|
|
90
|
-
- Stopping at "~ is not configured" → without tracing why it's not configured
|
|
91
|
-
- Stopping at technical element names → without tracing why that state occurred
|
|
90
|
+
**Tracking depth check**: Each causal chain must reach a stop condition. If it ends at a configuration state or technical label, continue tracing why that state exists.
|
|
92
91
|
|
|
93
|
-
### Step 4: Impact Scope Identification
|
|
92
|
+
### Step 4: Impact Scope Identification
|
|
94
93
|
|
|
95
94
|
- Search for locations implemented with the same pattern (impactScope)
|
|
96
95
|
- Determine recurrenceRisk: low (isolated) / medium (2 or fewer locations) / high (3+ locations or design_gap)
|
|
97
96
|
- Disclose unexplored areas and investigation limitations
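The recurrenceRisk rule above can be read as a small decision function. The sketch below is one interpretation under a stated assumption: `other_locations` counts additional places that use the same pattern, so 0 is "isolated", 1 or 2 is "medium", and 3 or more is "high". The source does not spell out that boundary, and the function and parameter names are hypothetical.

```python
def recurrence_risk(other_locations: int, cause_category: str) -> str:
    """Map impact scope size and causeCategory to low / medium / high.

    Assumption: other_locations excludes the originally reported site.
    """
    if other_locations < 0:
        raise ValueError("other_locations must be non-negative")
    # A design_gap is high risk regardless of how many locations were found.
    if cause_category == "design_gap" or other_locations >= 3:
        return "high"
    if other_locations >= 1:
        return "medium"
    return "low"
```

Checking `design_gap` first matters because a design-level cause can recur even when the pattern search finds no other locations yet.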
|
|
98
|
-
|
|
97
|
+
|
|
98
|
+
### Step 5: Return JSON Result
|
|
99
|
+
|
|
100
|
+
Return the JSON result as the final response. See Output Format for the schema.
|
|
99
101
|
|
|
100
102
|
## Evidence Strength Classification
|
|
101
103
|
|
|
@@ -169,10 +171,11 @@ Information source priority:
|
|
|
169
171
|
|
|
170
172
|
- [ ] Determined problem type and executed diff analysis for change failures
|
|
171
173
|
- [ ] Output comparisonAnalysis
|
|
172
|
-
- [ ] Investigated
|
|
174
|
+
- [ ] Investigated each source type or recorded that it had no relevant findings
|
|
173
175
|
- [ ] Enumerated 2+ hypotheses with causal tracking, evidence collection, and causeCategory determination for each
|
|
174
176
|
- [ ] Determined impactScope and recurrenceRisk
|
|
175
177
|
- [ ] Documented unexplored areas and investigation limitations
|
|
178
|
+
- [ ] Final response is the JSON output
|
|
176
179
|
|
|
177
180
|
## Output Self-Check
|
|
178
181
|
- [ ] Multiple hypotheses were evaluated (not just the first plausible one)
|