codex-workflows 0.4.2 → 0.4.4
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.agents/skills/documentation-criteria/references/design-template.md +9 -0
- package/.agents/skills/recipe-build/SKILL.md +18 -8
- package/.agents/skills/recipe-front-build/SKILL.md +18 -8
- package/.agents/skills/recipe-front-review/SKILL.md +8 -2
- package/.agents/skills/recipe-fullstack-build/SKILL.md +18 -8
- package/.agents/skills/recipe-fullstack-implement/SKILL.md +18 -8
- package/.agents/skills/recipe-implement/SKILL.md +18 -8
- package/.agents/skills/recipe-review/SKILL.md +8 -2
- package/.agents/skills/subagents-orchestration-guide/SKILL.md +17 -6
- package/.codex/agents/code-reviewer.toml +120 -17
- package/.codex/agents/codebase-analyzer.toml +53 -6
- package/.codex/agents/document-reviewer.toml +3 -0
- package/.codex/agents/technical-designer.toml +6 -2
- package/README.md +54 -35
- package/package.json +4 -1
@@ -241,6 +241,15 @@ Self-evident: internal-only refactoring with identical observable inputs and out
 - **Success criteria**: [Observable outcome that proves correctness]
 - **Failure response**: [What to do if early verification fails]
 
+### Output Comparison (When Changing Existing Observable Behavior, an External Contract, or a Persisted Data Shape)
+
+- **Comparison input**: [Identical input used for both the current and new implementation]
+- **Expected output fields**: [Specific fields, columns, or output format to compare]
+- **Diff method**: [How the outputs are compared, such as field-by-field diff, file diff, or snapshot comparison]
+- **Transformation pipeline coverage**: [Map each listed step from codebase analysis `dataTransformationPipelines` to the comparison that verifies its output. If a step passes data through unchanged, mark it excluded with rationale]
+
+Mark as `N/A` with a brief rationale only when the change does not alter existing observable behavior, an external contract, or a persisted data shape.
+
 ### State Transitions and Invariants (When Applicable)
 
 ```yaml
@@ -98,14 +98,24 @@ ENFORCEMENT: Sub-agent prompts missing the constraint suffix MUST be re-issued w
 
 VERIFY approval status before proceeding. Once confirmed, INITIATE autonomous execution mode.
 
-##
-
-After all task cycles finish, collect all `filesModified` from every task-executor response (deduplicated), then
-1. Spawn
-2.
-
--
-- `
+## Post-Implementation Verification (After All Tasks Complete)
+
+After all task cycles finish, collect all `filesModified` from every task-executor response (deduplicated), then run both verification agents before the completion report:
+1. Spawn code-verifier agent: "Verify implementation consistency against the Design Doc. `doc_type: design-doc`. `document_path`: [path]. `code_paths`: [collected filesModified list]."
+2. Spawn security-reviewer agent: "Design Doc: [path]. Implementation files: [collected filesModified list]. Review security compliance."
+3. Consolidate results:
+- code-verifier passes when `summary.status` is `consistent` or `mostly_consistent`
+- code-verifier fails when `summary.status` is `needs_review` or `inconsistent`
+- security-reviewer passes when `status` is `approved` or `approved_with_notes`
+- security-reviewer fails when `status` is `needs_revision`
+- security-reviewer `blocked` -> Escalate to user
+4. If either verifier fails:
+- Create a single fix task covering verifier discrepancies and security requiredFixes
+- Spawn task-executor with that consolidated task
+- Spawn quality-fixer
+- Re-run only the verifier(s) that failed
+- Maximum retry count is 1 verification fix cycle; if any failed verifier still fails after re-run, escalate to the user
+5. If both verifiers pass -> Proceed to completion report
 
 **[STOP — BLOCKING]** Upon detecting ANY requirement changes, halt execution immediately.
 **CANNOT proceed until user explicitly confirms the change scope.**
@@ -106,14 +106,24 @@ ENFORCEMENT: Sub-agent prompts missing the constraint suffix MUST be re-issued w
 
 VERIFY approval status before proceeding. Once confirmed, INITIATE autonomous execution mode.
 
-##
-
-After all task cycles finish, collect all `filesModified` from every task-executor-frontend response (deduplicated), then
-1. Spawn
-2.
-
--
-- `
+## Post-Implementation Verification (After All Tasks Complete)
+
+After all task cycles finish, collect all `filesModified` from every task-executor-frontend response (deduplicated), then run both verification agents before the completion report:
+1. Spawn code-verifier agent: "Verify implementation consistency against the Design Doc. `doc_type: design-doc`. `document_path`: [path]. `code_paths`: [collected filesModified list]."
+2. Spawn security-reviewer agent: "Design Doc: [path]. Implementation files: [collected filesModified list]. Review security compliance."
+3. Consolidate results:
+- code-verifier passes when `summary.status` is `consistent` or `mostly_consistent`
+- code-verifier fails when `summary.status` is `needs_review` or `inconsistent`
+- security-reviewer passes when `status` is `approved` or `approved_with_notes`
+- security-reviewer fails when `status` is `needs_revision`
+- security-reviewer `blocked` -> Escalate to user
+4. If either verifier fails:
+- Create a single fix task covering verifier discrepancies and security requiredFixes
+- Spawn task-executor-frontend with that consolidated task
+- Spawn quality-fixer-frontend
+- Re-run only the verifier(s) that failed
+- Maximum retry count is 1 verification fix cycle; if any failed verifier still fails after re-run, escalate to the user
+5. If both verifiers pass -> Proceed to completion report
 
 **[STOP -- BLOCKING]** Upon detecting ANY requirement changes, halt execution immediately.
 **CANNOT proceed until user explicitly confirms the change scope.**
@@ -33,7 +33,7 @@ Identify the Design Doc in docs/design/ and check implementation files changed f
 **CANNOT proceed without both a Design Doc and implementation files.**
 
 ### 2. Execute code-reviewer
-Spawn code-reviewer agent: "Validate Design Doc compliance for [design-doc-path]. Implementation files: [git diff file list]. Review mode: full. Return structured JSON report
+Spawn code-reviewer agent: "Validate Design Doc compliance for [design-doc-path]. Implementation files: [git diff file list]. Review mode: full. Return structured JSON report per your Output Format specification."
 
 **Store output as**: `$STEP_2_OUTPUT`
 
@@ -59,10 +59,16 @@ Spawn security-reviewer agent: "Design Doc: [path]. Implementation files: [file
 ```
 Code Compliance: [complianceRate from code-reviewer]
 Verdict: [verdict from code-reviewer]
+Identifier Match Rate: [identifierMatchRate from code-reviewer]
 Acceptance Criteria:
-- [fulfilled] [item]
+- [fulfilled] [item] (confidence: [high/medium/low])
 - [partially_fulfilled] [item]: [gap] — [suggestion]
 - [unfulfilled] [item]: [gap] — [suggestion]
+Identifier Mismatches (show only mismatches; write `None` if all identifiers match):
+- None
+- [identifier]: DD=[designDocValue] Code=[codeValue] at [location] (confidence: [high/medium/low])
+Quality Findings:
+- [category] [location]: [description] — [rationale]
 
 Security Review: [status from security-reviewer]
 Findings by category:
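The new report lines can be produced mechanically from the code-reviewer JSON. A minimal Python sketch, assuming the `identifierVerification` array carries mismatches only (as the code-reviewer Output Format in this release requires); the function name `render_identifier_mismatches` is hypothetical and not part of the package:

```python
def render_identifier_mismatches(report: dict) -> list[str]:
    """Render the 'Identifier Mismatches' lines of the review report.

    Assumes `report` is the parsed code-reviewer JSON. Because
    `identifierVerification` lists mismatches only, an empty (or absent)
    array means every identifier matched and the section reads "None".
    """
    lines = ["Identifier Mismatches:"]
    mismatches = report.get("identifierVerification", [])
    if not mismatches:
        lines.append("- None")
        return lines
    for m in mismatches:
        lines.append(
            f"- {m['identifier']}: DD={m['designDocValue']} "
            f"Code={m['codeValue']} at {m['location']} "
            f"(confidence: {m['confidence']})"
        )
    return lines
```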
@@ -116,14 +116,24 @@ ENFORCEMENT: Sub-agent prompts missing the constraint suffix MUST be re-issued w
 
 VERIFY approval status before proceeding. Once confirmed, INITIATE autonomous execution mode.
 
-##
-
-After all task cycles finish, collect all `filesModified` from every task-executor/task-executor-frontend response (deduplicated), then
-1. Spawn
-2.
-
--
-- `
+## Post-Implementation Verification (After All Tasks Complete)
+
+After all task cycles finish, collect all `filesModified` from every task-executor/task-executor-frontend response (deduplicated), then run both verification agents before the completion report:
+1. Spawn code-verifier once per Design Doc: "Verify implementation consistency against the Design Doc. `doc_type: design-doc`. `document_path`: [single design doc path]. `code_paths`: [collected filesModified list]."
+2. Spawn security-reviewer agent: "Design Doc: [path(s)]. Implementation files: [collected filesModified list]. Review security compliance."
+3. Consolidate results:
+- each code-verifier run passes when `summary.status` is `consistent` or `mostly_consistent`
+- a code-verifier run fails when `summary.status` is `needs_review` or `inconsistent`
+- security-reviewer passes when `status` is `approved` or `approved_with_notes`
+- security-reviewer fails when `status` is `needs_revision`
+- security-reviewer `blocked` -> Escalate to user
+4. If any verifier fails:
+- Create a single fix task covering verifier discrepancies and security requiredFixes
+- Spawn the layer-appropriate task-executor
+- Spawn the layer-appropriate quality-fixer
+- Re-run only the verifier(s) that failed
+- Maximum retry count is 1 verification fix cycle; if any failed verifier still fails after re-run, escalate to the user
+5. If all verifiers pass -> Proceed to completion report
 
 **[STOP -- BLOCKING]** Upon detecting ANY requirement changes, halt execution immediately.
 **CANNOT proceed until user explicitly confirms the change scope.**
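The fullstack recipe now fans code-verifier out once per Design Doc, while the security-reviewer receives all Design Docs in a single call. A rough sketch of that prompt construction, reusing the prompt wording from the hunk above; `build_verifier_prompts` and the comma-joined file list are illustrative assumptions, not how the package actually spawns agents:

```python
def build_verifier_prompts(design_docs: list[str], files_modified: list[str]) -> list[str]:
    """Return one code-verifier prompt per Design Doc, then one security-reviewer prompt.

    Prompt text follows the recipe; joining paths with ", " is an assumption.
    """
    code_paths = ", ".join(files_modified)
    # One code-verifier run per Design Doc, each with a single document_path.
    prompts = [
        "Verify implementation consistency against the Design Doc. "
        f"`doc_type: design-doc`. `document_path`: {doc}. `code_paths`: {code_paths}."
        for doc in design_docs
    ]
    # A single security-reviewer run covering all Design Docs.
    prompts.append(
        f"Design Doc: {', '.join(design_docs)}. Implementation files: {code_paths}. "
        "Review security compliance."
    )
    return prompts
```

Keeping the security review as one call mirrors the `[path(s)]` placeholder in step 2, while step 1 explicitly asks for a single Design Doc path per code-verifier run.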
@@ -127,14 +127,24 @@ ENFORCEMENT: Sub-agent prompts missing the constraint suffix MUST be re-issued w
 3. Quality-fixer MUST run after each executor (no skipping)
 4. Commit MUST execute when quality-fixer returns `status: "approved"` (do not defer to end)
 
-###
-
-After all task cycles finish, collect all `filesModified` from every task-executor/task-executor-frontend response (deduplicated), then
-1. Spawn
-2.
-
--
-- `
+### Post-Implementation Verification (After All Tasks Complete)
+
+After all task cycles finish, collect all `filesModified` from every task-executor/task-executor-frontend response (deduplicated), then run both verification agents before the completion report:
+1. Spawn code-verifier once per Design Doc: "Verify implementation consistency against the Design Doc. `doc_type: design-doc`. `document_path`: [single design doc path]. `code_paths`: [collected filesModified list]."
+2. Spawn security-reviewer agent: "Design Doc: [path(s)]. Implementation files: [collected filesModified list]. Review security compliance."
+3. Consolidate results:
+- each code-verifier run passes when `summary.status` is `consistent` or `mostly_consistent`
+- a code-verifier run fails when `summary.status` is `needs_review` or `inconsistent`
+- security-reviewer passes when `status` is `approved` or `approved_with_notes`
+- security-reviewer fails when `status` is `needs_revision`
+- security-reviewer `blocked` -> Escalate to user
+4. If any verifier fails:
+- Create a single fix task covering verifier discrepancies and security requiredFixes
+- Spawn the layer-appropriate task-executor
+- Spawn the layer-appropriate quality-fixer
+- Re-run only the verifier(s) that failed
+- Maximum retry count is 1 verification fix cycle; if any failed verifier still fails after re-run, escalate to the user
+5. If all verifiers pass -> Proceed to completion report
 
 ### Test Information Communication
 After acceptance-test-generator execution, when calling work-planner, communicate:
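The retry rule (at most one verification fix cycle, re-running only the verifiers that failed) behaves like the loop below. This is a hedged sketch: verifiers are modeled as hypothetical zero-argument callables returning `True` on pass, `apply_fix_cycle` stands in for the consolidated fix task plus executor and quality-fixer, and the security-reviewer `blocked` escalation is left out for brevity.

```python
from typing import Callable, Dict

# Hypothetical interface: a verifier run returns True on pass, False on fail.
Verifier = Callable[[], bool]


def run_post_implementation_verification(
    verifiers: Dict[str, Verifier],
    apply_fix_cycle: Callable[[list], None],
    max_fix_cycles: int = 1,
) -> str:
    """Run every verifier once, then allow at most `max_fix_cycles` fix cycles.

    Returns "complete" when all verifiers pass, otherwise "escalate".
    After a fix cycle, only the verifiers that failed previously are re-run.
    """
    failed = [name for name, run in verifiers.items() if not run()]
    cycles = 0
    while failed:
        if cycles >= max_fix_cycles:
            return "escalate"  # a failed verifier still fails after the re-run
        # One consolidated fix task -> executor -> quality-fixer (delegated to the callback).
        apply_fix_cycle(failed)
        cycles += 1
        failed = [name for name in failed if not verifiers[name]()]
    return "complete"
```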
@@ -108,14 +108,24 @@ After user grants "batch approval for entire implementation phase", enter autono
 3. Spawn quality-fixer (or quality-fixer-frontend) agent: "Quality check and fixes"
 4. git commit -> Execute on `status: "approved"`
 
-###
-
-After all task cycles finish, collect all `filesModified` from every executor response (task-executor and task-executor-frontend, deduplicated), then
-1. Spawn
-2.
-
--
-- `
+### Post-Implementation Verification (After All Tasks Complete)
+
+After all task cycles finish, collect all `filesModified` from every executor response (task-executor and task-executor-frontend, deduplicated), then run both verification agents before the completion report:
+1. Spawn code-verifier agent: "Verify implementation consistency against the Design Doc. `doc_type: design-doc`. `document_path`: [path]. `code_paths`: [collected filesModified list]."
+2. Spawn security-reviewer agent: "Design Doc: [path]. Implementation files: [collected filesModified list]. Review security compliance."
+3. Consolidate results:
+- code-verifier passes when `summary.status` is `consistent` or `mostly_consistent`
+- code-verifier fails when `summary.status` is `needs_review` or `inconsistent`
+- security-reviewer passes when `status` is `approved` or `approved_with_notes`
+- security-reviewer fails when `status` is `needs_revision`
+- security-reviewer `blocked` -> Escalate to user
+4. If either verifier fails:
+- Create a single fix task covering verifier discrepancies and security requiredFixes
+- Spawn the layer-appropriate executor
+- Spawn the layer-appropriate quality-fixer
+- Re-run only the verifier(s) that failed
+- Maximum retry count is 1 verification fix cycle; if any failed verifier still fails after re-run, escalate to the user
+5. If both verifiers pass -> Proceed to completion report
 
 ### Test Information Communication
 After acceptance-test-generator execution, when spawning work-planner, communicate:
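Collecting `filesModified` across executor responses amounts to an order-preserving deduplication. A small sketch, assuming each executor response is a parsed JSON object that may omit the key; the helper name is hypothetical, and keeping first-seen order is an assumption (the recipe only requires deduplication):

```python
def collect_files_modified(executor_responses: list[dict]) -> list[str]:
    """Merge `filesModified` from every executor response without duplicates.

    Dict keys preserve insertion order, so the first occurrence of each
    path determines its position. Responses without the key add nothing.
    """
    seen: dict[str, None] = {}
    for response in executor_responses:
        for path in response.get("filesModified", []):
            seen.setdefault(path, None)
    return list(seen)
```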
@@ -35,7 +35,7 @@ Design Doc (uses most recent if omitted): $ARGUMENTS
 Identify Design Doc in docs/design/ and check implementation files via git diff.
 
 ### Step 2: Execute code-reviewer
-Spawn code-reviewer agent: "Validate Design Doc compliance for the implementation. Design Doc path: [path]. Implementation files: [git diff file list]. Review mode: full. Return structured JSON report
+Spawn code-reviewer agent: "Validate Design Doc compliance for the implementation. Design Doc path: [path]. Implementation files: [git diff file list]. Review mode: full. Return structured JSON report per your Output Format specification."
 
 **Store output as**: `$STEP_2_OUTPUT`
 
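Both review recipes now ask code-reviewer to return its report "per your Output Format specification". One way an orchestrator could sanity-check that response is to parse it and confirm the top-level keys of the code-reviewer Output Format in this release; the helper below is an illustrative sketch, not part of codex-workflows:

```python
import json

# Top-level keys of the code-reviewer Output Format in 0.4.4.
REQUIRED_KEYS = {
    "complianceRate", "identifierMatchRate", "verdict", "acceptanceCriteria",
    "identifierVerification", "qualityFindings", "summary", "nextAction",
}


def parse_code_reviewer_output(raw: str) -> dict:
    """Parse the code-reviewer's final response and check its top-level keys."""
    report = json.loads(raw)
    if not isinstance(report, dict):
        raise ValueError("code-reviewer output must be a JSON object")
    missing = REQUIRED_KEYS - set(report)
    if missing:
        raise ValueError(f"code-reviewer output missing keys: {sorted(missing)}")
    return report
```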
@@ -61,10 +61,16 @@ Spawn security-reviewer agent: "Design Doc: [path]. Implementation files: [file
 ```
 Code Compliance: [complianceRate from code-reviewer]
 Verdict: [verdict from code-reviewer]
+Identifier Match Rate: [identifierMatchRate from code-reviewer]
 Acceptance Criteria:
-- [fulfilled] [item]
+- [fulfilled] [item] (confidence: [high/medium/low])
 - [partially_fulfilled] [item]: [gap] — [suggestion]
 - [unfulfilled] [item]: [gap] — [suggestion]
+Identifier Mismatches (show only mismatches; write `None` if all identifiers match):
+- None
+- [identifier]: DD=[designDocValue] Code=[codeValue] at [location] (confidence: [high/medium/low])
+Quality Findings:
+- [category] [location]: [description] — [rationale]
 
 Security Review: [status from security-reviewer]
 Findings by category:
@@ -69,7 +69,7 @@ The following subagents are available:
 10. **technical-designer**: ADR/Design Doc creation
 11. **work-planner**: Work plan creation from Design Doc and test skeletons
 12. **document-reviewer**: Single document quality and rule compliance check
-13. **code-verifier**: Document-code consistency verification for review inputs
+13. **code-verifier**: Document-code consistency verification for review inputs and post-implementation verification
 14. **design-sync**: Design Doc consistency verification across multiple documents
 15. **acceptance-test-generator**: Generate integration and E2E test skeletons from Design Doc ACs
 
@@ -182,7 +182,7 @@ Subagents respond in JSON format. The final response from each JSON-returning su
 - **task-executor**: status (escalation_needed/completed), escalation_type (design_compliance_violation/similar_function_found/similar_component_found/investigation_target_not_found/out_of_scope_file/test_environment_not_ready), testsAdded, requiresTestReview
 - **quality-fixer**: status (approved/blocked). For blocked responses, discriminate by `reason`: specification conflicts use `blockingIssues[]`; execution prerequisites use `missingPrerequisites[]`, and each item provides its own `resolutionSteps`
 - **document-reviewer**: verdict.decision (approved/approved_with_conditions/needs_revision/rejected)
-- **code-verifier**: summary, discrepancies, reverseCoverage
+- **code-verifier**: summary.status, summary.consistencyScore, discrepancies, reverseCoverage
 - **design-sync**: sync_status (CONFLICTS_FOUND/NO_CONFLICTS) — text format with [SUMMARY] block
 - **integration-test-reviewer**: status (approved/needs_revision/blocked), requiredFixes
 - **security-reviewer**: status (approved/approved_with_notes/needs_revision/blocked), findings, notes, requiredFixes
@@ -252,7 +252,7 @@ When receiving new features or change requests, start with requirement-analyzer.
 ### Design Flow Data Passing
 
 - Pass requirement-analyzer output and original requirements to codebase-analyzer
-- Pass codebase-analyzer JSON to technical-designer or technical-designer-frontend as `Codebase Analysis`
+- Pass codebase-analyzer JSON to technical-designer or technical-designer-frontend as `Codebase Analysis`, including `dataTransformationPipelines` when present
 - Pass Design Doc path to code-verifier
 - Pass code-verifier JSON to document-reviewer as `code_verification`
 
@@ -300,9 +300,9 @@ Batch approval -> Start autonomous execution mode
 -> Orchestrator: Execute git commit
 -> Check remaining tasks:
 - Yes -> next task
-- No -> security-reviewer:
--
--
+- No -> code-verifier + security-reviewer: Post-implementation verification
+- all pass -> Completion report
+- any fail -> layer-appropriate task-executor: Verification fixes -> quality-fixer -> re-run failed verifiers
 - blocked -> Escalate to user
 ```
 
@@ -321,6 +321,16 @@ Use the task loop defined in the autonomous execution diagram above. The canonic
 3. quality-fixer quality gate
 4. git commit on approval
 
+### Post-Implementation Verification Pass/Fail Criteria
+
+| Verifier | Pass | Fail | Blocked |
+|----------|------|------|---------|
+| code-verifier | `summary.status` is `consistent` or `mostly_consistent` | `summary.status` is `needs_review` or `inconsistent` | — |
+| security-reviewer | `status` is `approved` or `approved_with_notes` | `status` is `needs_revision` | `status` is `blocked` |
+
+Re-run only verifiers that failed on the previous verification cycle.
+Maximum retry count is 1 verification fix cycle. If any failed verifier still fails after the re-run, escalate to the user.
+
 ## Main Orchestrator Roles
 
 1. **State Management**: Track current phase, each subagent's state, and next action
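The pass/fail table translates directly into a lookup plus a precedence rule. A minimal sketch under stated assumptions: `blocked` is given priority over `fail`, a status outside the table raises an error, and the names `CRITERIA`, `classify`, and `next_step` are hypothetical. Multiple code-verifier runs (one per Design Doc, as in the fullstack recipes) are passed as separate pairs:

```python
# Status sets transcribed from the pass/fail criteria table.
CRITERIA = {
    "code-verifier": {
        "pass": {"consistent", "mostly_consistent"},
        "fail": {"needs_review", "inconsistent"},
        "blocked": set(),  # the table lists no blocked state for code-verifier
    },
    "security-reviewer": {
        "pass": {"approved", "approved_with_notes"},
        "fail": {"needs_revision"},
        "blocked": {"blocked"},
    },
}


def classify(verifier: str, status: str) -> str:
    """Map one verifier status to "pass", "fail", or "blocked"."""
    for outcome, statuses in CRITERIA[verifier].items():
        if status in statuses:
            return outcome
    raise ValueError(f"{verifier}: status {status!r} is not in the criteria table")


def next_step(results: list[tuple[str, str]]) -> str:
    """Pick the orchestrator's next step from (verifier, status) pairs.

    Giving "blocked" precedence over "fail" is an assumption.
    """
    outcomes = [classify(v, s) for v, s in results]
    if "blocked" in outcomes:
        return "escalate"
    if "fail" in outcomes:
        return "fix_cycle"
    return "completion_report"
```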
@@ -363,6 +373,7 @@ When a Design Doc contains a Verification Strategy section, the orchestrator mus
 - Early verification point (first target, success criteria, failure response)
 
 The resulting work plan must include this summary in its header so the plan remains self-sufficient for downstream task generation and execution planning.
+When the Design Doc includes an `Output Comparison` section, carry forward the comparison input, expected output fields or format, diff method, and transformation pipeline coverage as part of that summary.
 
 ## Important Constraints [MANDATORY]
 
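For the `Output Comparison` summary, the design template names field-by-field diff as one possible diff method. A hedged sketch of that method, assuming both implementations emit a flat mapping from field name to value for the same comparison input; the function name and the `<missing>` placeholder are assumptions:

```python
MISSING = "<missing>"  # placeholder reported when a field is absent from one output


def field_by_field_diff(current: dict, new: dict, expected_fields: list[str]) -> list[tuple]:
    """Compare two outputs produced from the same comparison input.

    Only `expected_fields` are compared; extra fields are ignored.
    Returns (field, current_value, new_value) for each differing field,
    so an empty list means the outputs agree on every expected field.
    """
    diffs = []
    for field in expected_fields:
        in_current, in_new = field in current, field in new
        before = current[field] if in_current else MISSING
        after = new[field] if in_new else MISSING
        # A field present on only one side is a diff even if its value is None.
        if in_current != in_new or before != after:
            diffs.append((field, before, after))
    return diffs
```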
@@ -59,27 +59,79 @@ Skill Status:
 ## Workflow
 
 ### 1. Load Baseline
-Read the Design Doc and extract:
+Read the Design Doc in full and extract:
 - Functional requirements and acceptance criteria (list each AC individually)
 - Architecture design and data flow
+- Interface contracts (function signatures, API endpoints, data structures)
+- Identifier specifications explicitly written in the Design Doc as exact values, literals, labels, or named fields (resource names, endpoint paths, configuration keys, error codes, schema/model names)
 - Error handling policy
 - Non-functional requirements
 
-### 2. Map Implementation to
+### 2. Map Implementation to Design Doc
+
+#### 2-1. Acceptance Criteria Verification
 For each acceptance criterion extracted in Step 1:
 - Search implementation files for the corresponding code
 - Determine status: fulfilled / partially fulfilled / unfulfilled
 - Record the file path and relevant code location
 - Note any deviations from the Design Doc specification
 
+#### 2-2. Identifier Verification
+For each identifier specification extracted in Step 1:
+1. Search implementation files for the exact string
+2. Compare code values against the Design Doc specification
+3. Flag discrepancies such as missing references, misspellings, or inconsistent naming
+4. Evaluate every identifier and update overall totals for matched and mismatched results
+5. Emit only mismatches in `identifierVerification`, with `{ identifier, designDocValue, codeValue, location, match, confidence, evidence }`
+
+Identifier extraction constraints:
+- Only verify identifiers that are explicitly written in the Design Doc as exact values, literals, labeled fields, or code-facing names
+- Do not infer identifiers from descriptive prose, conceptual summaries, or implied naming conventions
+- If the Design Doc names a concept without an exact code-facing value, treat it as a normal Design Doc claim, not an identifier check
+
+#### 2-3. Evidence Collection
+For each acceptance criterion and identifier check:
+1. Primary evidence: direct implementation in source files
+2. Secondary evidence: corresponding tests
+3. Tertiary evidence: config, schemas, or type definitions
+
+`agreeing sources` means multiple sources independently support the same determination about the same acceptance criterion or identifier. Naming overlap alone is NOT agreement; the evidence must support the same behavior, contract, or exact value match.
+
+Assign confidence based on evidence count:
+- high: 3+ agreeing sources
+- medium: 2 agreeing sources
+- low: 1 source only
+
 ### 3. Assess Code Quality
-Read each implementation file and
-
-
-
--
--
--
+Read each implementation file and evaluate:
+
+#### 3-1. Structural Quality
+For each implementation file, read the concrete functions, handlers, or components in scope and evaluate them against the active coding-rules skill:
+- Function organization: flag `maintainability` when a single function mixes multiple distinct concerns such as validation, orchestration, persistence, and presentation formatting
+- Control-flow clarity: flag `maintainability` when branches, nested conditions, or early-exit patterns make the execution path materially difficult to follow
+- Single responsibility adherence: flag `maintainability` when a function or file has more than one primary responsibility
+- Naming clarity: flag `maintainability` when ambiguous names materially obscure intent, domain meaning, or responsibility
+
+#### 3-2. Error Handling and Reliability
+Read error paths and boundary handling directly in the code:
+- Error handling implementation: verify failures are either propagated explicitly or handled with context
+- Explicit failure paths over silent suppression: flag `reliability` when errors are swallowed, converted to defaults without justification, or otherwise hidden from callers and operators
+- Boundary validation: flag `reliability` when external input, deserialized data, or cross-system responses enter important logic without the validation implied by the Design Doc, type contracts, or code boundary shape
+
+#### 3-3. Test Coverage for Acceptance Criteria
+- For each fulfilled AC, check whether tests exercise the expected behavior
+
+Classify each quality finding into one of:
+- `dd_violation`: implementation deviates from the Design Doc
+- `maintainability`: code structure impedes change or comprehension
+- `reliability`: missing safeguards could cause runtime failure
+- `coverage_gap`: acceptance criteria lack meaningful test verification
+
+Each finding MUST include a rationale:
+- `dd_violation`: what the Design Doc says vs what code does
+- `maintainability`: the concrete maintenance or comprehension risk
+- `reliability`: the failure scenario and triggering conditions
+- `coverage_gap`: the untested AC and why coverage matters
 
 ### 4. Check Architecture Compliance
 Verify against the Design Doc architecture:
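The confidence rule counts agreeing sources, not raw mentions. The sketch below assumes each piece of evidence is recorded with a `file:line` location and the determination it supports, and counts distinct locations; judging whether a source truly supports the same behavior (rather than merely sharing a name) remains the reviewer's call. The `Evidence` type and `assign_confidence` are hypothetical, and raising on zero evidence is an assumption the agent definition does not state:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    location: str       # file:line, e.g. "src/routes.ts:12"
    kind: str           # "source" (primary), "test" (secondary), "config" (tertiary)
    determination: str  # what this source supports, e.g. "fulfilled" or "mismatch"


def assign_confidence(evidence: list[Evidence], determination: str) -> str:
    """Map the number of distinct sources agreeing on `determination` to a level."""
    # The same file:line cited twice is one source, not two.
    agreeing = {e.location for e in evidence if e.determination == determination}
    if not agreeing:
        raise ValueError("no evidence supports this determination")
    if len(agreeing) >= 3:
        return "high"
    if len(agreeing) == 2:
        return "medium"
    return "low"
```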
@@ -89,9 +141,10 @@ Verify against the Design Doc architecture:
 - No unnecessary duplicate implementations (Pattern 5 from ai-development-guide skill)
 - Existing codebase analysis section includes similar functionality investigation results
 
-### 5. Calculate Compliance
+### 5. Calculate Compliance and Consolidate
 - Compliance rate = (fulfilled items + 0.5 x partially fulfilled items) / total AC items x 100
--
+- Identifier match rate = matched identifiers / total identifiers x 100
+- Compile all AC statuses, identifier results, and quality findings with specific locations
 - Determine verdict based on compliance rate
 
 ### 6. Return JSON Result
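Worked through in code, the two rates from step 5 look like this. A minimal sketch; returning 100.0 when the Design Doc specifies no identifiers is an assumption, since the agent definition does not say how to report that case:

```python
def compliance_rate(fulfilled: int, partial: int, total: int) -> float:
    """(fulfilled items + 0.5 x partially fulfilled items) / total AC items x 100."""
    if total <= 0:
        raise ValueError("at least one acceptance criterion is required")
    return (fulfilled + 0.5 * partial) / total * 100


def identifier_match_rate(matched: int, total: int) -> float:
    """matched identifiers / total identifiers x 100."""
    if total == 0:
        return 100.0  # assumption: with no identifier specs, nothing can mismatch
    return matched / total * 100
```

For example, 3 fulfilled and 2 partially fulfilled ACs out of 8 give (3 + 1) / 8 x 100 = 50%, and 3 of 4 identifiers matching gives 75%.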
@@ -102,50 +155,100 @@ Return the JSON result as the final response. See Output Format for the schema.
|
|
|
102
155
|
```json
|
|
103
156
|
{
|
|
104
157
|
"complianceRate": "[X]%",
|
|
158
|
+
"identifierMatchRate": "[X]%",
|
|
105
159
|
"verdict": "[pass/needs-improvement/needs-redesign]",
|
|
106
160
|
|
|
107
161
|
"acceptanceCriteria": [
|
|
108
162
|
{
|
|
109
163
|
"item": "[acceptance criteria name]",
|
|
110
164
|
"status": "fulfilled|partially_fulfilled|unfulfilled",
|
|
165
|
+
"confidence": "high|medium|low",
|
|
111
166
|
"location": "[file:line, if implemented]",
|
|
167
|
+
"evidence": ["[source1: file:line]", "[source2: test file:line]"],
|
|
112
168
|
"gap": "[what is missing or deviating, if not fully fulfilled]",
|
|
113
169
|
"suggestion": "[specific fix, if not fully fulfilled]"
|
|
114
170
|
}
|
|
115
171
|
],
|
|
116
172
|
|
|
117
|
-
"
|
|
173
|
+
"identifierVerification": [
+{
+"identifier": "[identifier name]",
+"designDocValue": "[value specified in Design Doc]",
+"codeValue": "[value found in code, or 'not found']",
+"location": "[file:line]",
+"confidence": "high|medium|low",
+"evidence": ["[source1: file:line]", "[source2: config file:line]"],
+"match": false
+}
+],
+
+"qualityFindings": [
 {
-"
-"location": "[filename:function]",
+"category": "dd_violation|maintainability|reliability|coverage_gap",
+"location": "[filename:function or file:line]",
+"description": "[specific issue]",
+"rationale": "[why this matters]",
 "suggestion": "[specific improvement]"
 }
 ],

+"summary": {
+"acsTotal": 0,
+"acsFulfilled": 0,
+"acsPartial": 0,
+"acsUnfulfilled": 0,
+"identifiersTotal": 0,
+"identifiersMatched": 0,
+"lowConfidenceItems": 0,
+"findingsByCategory": {
+"dd_violation": 0,
+"maintainability": 0,
+"reliability": 0,
+"coverage_gap": 0
+}
+},
+
 "nextAction": "[highest priority action needed]"
 }
 ```

+`identifierVerification` MUST include mismatches only. Use `summary.identifiersTotal` and `summary.identifiersMatched` for overall counts.
+
 ## Verdict Criteria

 - **90%+**: pass — Minor adjustments only
 - **70-89%**: needs-improvement — Critical gaps exist
 - **<70%**: needs-redesign — Major revision required

+Lower the verdict by one level only when at least one identifier mismatch has confidence `medium` or `high`.
+
 ## Important Notes

 ### Review Principles
 - Use Design Doc as single source of truth; evaluate independent of implementation context
+- Every finding must include file:line evidence
+- Low-confidence determinations must be explicit
+- Convert abstract skill rules into concrete, code-backed review findings rather than restating the rule alone
 - Provide solutions, not just problems; quantify wherever possible
-- Acknowledge good implementations; present improvements as actionable items

 ## Completion Criteria

-- [ ] All acceptance criteria individually evaluated
-- [ ]
+- [ ] All acceptance criteria individually evaluated with confidence
+- [ ] Identifier specifications verified against implementation
+- [ ] Compliance rate and identifier match rate calculated
+- [ ] Quality findings classified with rationale
 - [ ] Verdict determined
 - [ ] Final response is the JSON output

+## Output Self-Check
+
+- [ ] Every AC determination cites evidence
+- [ ] Identifier comparisons use exact strings from the Design Doc and code
+- [ ] Low-confidence items are explicit
+- [ ] Every quality finding includes category, rationale, and file:line
+- [ ] Every maintainability or reliability finding is backed by code that was actually read, not inferred from naming alone
+- [ ] identifierVerification contains mismatches only, and each mismatch includes confidence and evidence
+
 ### Escalation Criteria
 Recommend higher-level review when: Design Doc itself has deficiencies, security concerns discovered, or critical performance issues found.

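The verdict rules above (percentage bands plus a one-level downgrade for `medium` or `high` confidence identifier mismatches) can be sketched as a small function. This is a hypothetical illustration, not code from the package: `Mismatch`, `check_mismatches`, and `verdict` are invented names, and the compliance-rate formula (fulfilled ACs divided by total ACs, with partial ACs counted as unfulfilled) is an assumption, because the hunk does not say how partial ACs are weighted.

```python
from dataclasses import dataclass, field

# Verdict levels from "Verdict Criteria", ordered from best to worst.
LEVELS = ["pass", "needs-improvement", "needs-redesign"]


@dataclass
class Mismatch:
    """Hypothetical model of one `identifierVerification` entry."""
    identifier: str
    confidence: str  # "high" | "medium" | "low"
    evidence: list[str] = field(default_factory=list)
    match: bool = False


def check_mismatches(items: list[Mismatch]) -> list[str]:
    """Report violations of the rule that the array holds mismatches only,
    each with a valid confidence and at least one evidence entry."""
    problems = []
    for item in items:
        if item.match:
            problems.append(f"{item.identifier}: matches belong in summary counts only")
        if item.confidence not in ("high", "medium", "low"):
            problems.append(f"{item.identifier}: invalid confidence {item.confidence!r}")
        if not item.evidence:
            problems.append(f"{item.identifier}: missing evidence")
    return problems


def verdict(acs_total: int, acs_fulfilled: int, mismatches: list[Mismatch]) -> str:
    if acs_total <= 0:
        raise ValueError("acs_total must be positive")
    # Assumption: partial ACs do not count toward the compliance rate.
    rate = 100.0 * acs_fulfilled / acs_total
    level = 0 if rate >= 90 else 1 if rate >= 70 else 2
    # Lower by one level only for medium- or high-confidence mismatches;
    # needs-redesign is already the lowest level, so it stays there.
    if any(m.confidence in ("medium", "high") for m in mismatches):
        level = min(level + 1, len(LEVELS) - 1)
    return LEVELS[level]


low = Mismatch("MAX_RETRIES", "low", ["src/config.ts:12"])
high = Mismatch("API_BASE_PATH", "high", ["src/routes.ts:40"])
print(verdict(10, 9, [low]))   # pass (a low-confidence mismatch does not lower it)
print(verdict(10, 9, [high]))  # needs-improvement
```

Keeping the downgrade as a separate step after the band lookup mirrors how the rule is written: the percentage decides the starting level, and confidence only decides whether that level moves.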
@@ -70,14 +70,29 @@ Identify what exists, what appears missing, and what deserves close attention in
 ### Step 2: Existing Code Element Discovery

 For each affected file or inferred target file in the selected scope:
-1. Read the file
-
-
+1. Read the file in full and record:
+- public and private/internal interfaces, key functions, methods, classes, types, constants, and configuration use
+- exact names, visibility, and signatures where directly observable
+2. Trace call chains with these scope rules:
+- same module or file: follow internal function and method calls recursively until the chain terminates, delegates externally, or reaches a leaf
+- external dependencies through imports or equivalent declarations: record them as integration points with public interface or contract details only
+3. Enumerate all entry points discovered in the traced scope that receive input from outside the module or file
+4. Mark each entry point as `change-relevant` or `non-relevant` based on requirement scope, affected files, and user-visible behavior
+5. For each `change-relevant` entry point, trace the data transformation pipeline step by step:
+- record how the input changes at each step
+- record intermediate representations or formats when they differ from the final output
+- record external lookups that modify values, including configuration, constants, mapping tables, and reference data
+6. For each `non-relevant` entry point, record at minimum:
+- entry point name and file:line
+- input shape
+- output shape
+7. If additional entry points share the same output path or transformation logic as a `change-relevant` entry point, reclassify them as `change-relevant` and trace them step by step
+8. Search for patterns related to:
 - data access: repository usage, ORM calls, query builders, raw SQL, migration references
 - external service calls: HTTP clients, SDK clients, queue producers or consumers, webhook handlers
 - validation logic: validator functions, schema parsers, assertions, guard clauses, constraint checks
 - user-visible state handling: state stores, reducers, hooks, loading or error states, view-model shaping
-
+9. Record each discovered element with exact file path and line number

 ### Step 3: Data Model Discovery

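Rule 7 in Step 2 above (promote entry points that share an output path or transformation logic with a `change-relevant` one) can be sketched as a fixed-point loop. This is a hypothetical helper, not part of codex-workflows: `EntryPoint`, `output_path`, and `transforms` are invented stand-ins for what the analyzer would record, and treating sharing as transitive (repeating until nothing changes) is an assumption about how the rule should be applied.

```python
from dataclasses import dataclass


@dataclass
class EntryPoint:
    name: str                                 # e.g. "exportCsv (src/export.ts:10)"
    output_path: str                          # hypothetical: where the output ends up
    transforms: frozenset[str] = frozenset()  # hypothetical: transformation functions used
    change_relevant: bool = False


def reclassify(entry_points: list[EntryPoint]) -> list[str]:
    """Promote non-relevant entry points that share an output path or
    transformation logic with a change-relevant one. Repeats until stable,
    which treats sharing as transitive (an assumption).
    Returns promoted names in promotion order."""
    promoted = []
    changed = True
    while changed:
        changed = False
        relevant = [e for e in entry_points if e.change_relevant]
        paths = {e.output_path for e in relevant}
        transforms = set().union(*(e.transforms for e in relevant))
        for e in entry_points:
            if not e.change_relevant and (e.output_path in paths or e.transforms & transforms):
                e.change_relevant = True
                promoted.append(e.name)
                changed = True
    return promoted


eps = [
    EntryPoint("exportCsv", "report.csv", frozenset({"formatAmount"}), True),
    EntryPoint("scheduledExport", "report.csv"),
    EntryPoint("getInvoiceJson", "api/invoice", frozenset({"formatAmount"})),
    EntryPoint("getInvoicePdf", "api/invoice"),
    EntryPoint("healthCheck", "api/health"),
]
print(reclassify(eps))  # ['scheduledExport', 'getInvoiceJson', 'getInvoicePdf']
```

`getInvoicePdf` is only promoted on the second pass, after `getInvoiceJson` has become relevant; a single pass would miss it, which is why the loop runs until no entry point changes.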
@@ -149,6 +164,31 @@ Return the JSON result as the final response.
 ],
 "migrationFiles": ["path/to/migration"]
 },
+"dataTransformationPipelines": [
+{
+"entryPoint": "functionOrMethodName (path/to/file:line)",
+"steps": [
+{
+"order": 1,
+"method": "functionOrMethodName (path/to/file:line)",
+"input": "Input data or format at this step",
+"output": "Output data or format at this step",
+"externalLookups": ["Config.KEY lookup", "Reference table mapping"],
+"transformation": "What changed and why it matters"
+}
+],
+"intermediateFormats": ["Intermediate representation if applicable"],
+"finalOutput": "Final output shape or observable value"
+}
+],
+"entryPointInventory": [
+{
+"entryPoint": "functionOrMethodName (path/to/file:line)",
+"classification": "change-relevant|non-relevant",
+"inputShape": "Input type or shape",
+"outputShape": "Output type or shape"
+}
+],
 "constraints": [
 {
 "type": "validation|business_rule|configuration|assumption",
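The new `dataTransformationPipelines` shape above can be mirrored with dataclasses so a pipeline serializes straight to the JSON keys. This is a sketch under assumptions: the class names, the example entry points and file paths, and the two structural checks in `problems()` are hypothetical; the package only defines the JSON shape.

```python
import json
from dataclasses import asdict, dataclass, field


# Field names deliberately reuse the camelCase JSON keys so that asdict()
# output can be serialized without renaming.
@dataclass
class Step:
    order: int
    method: str
    input: str
    output: str
    externalLookups: list[str] = field(default_factory=list)
    transformation: str = ""


@dataclass
class Pipeline:
    entryPoint: str
    steps: list[Step]
    intermediateFormats: list[str] = field(default_factory=list)
    finalOutput: str = ""

    def problems(self) -> list[str]:
        """Hypothetical structural checks: steps numbered 1..n and a final output recorded."""
        found = []
        orders = [s.order for s in self.steps]
        if orders != list(range(1, len(orders) + 1)):
            found.append(f"{self.entryPoint}: step order should be 1..n, got {orders}")
        if not self.finalOutput:
            found.append(f"{self.entryPoint}: finalOutput is empty")
        return found


pipeline = Pipeline(
    entryPoint="exportInvoices (src/export.ts:14)",
    steps=[
        Step(1, "loadInvoices (src/repo.ts:30)", "date range", "Invoice rows"),
        Step(2, "toCsvRow (src/export.ts:52)", "Invoice rows", "CSV lines",
             externalLookups=["Config.CURRENCY_FORMAT lookup"],
             transformation="amounts formatted per configured currency"),
    ],
    finalOutput="CSV file with one line per invoice",
)
print(json.dumps(asdict(pipeline), indent=2))
```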
@@ -176,11 +216,18 @@ Return the JSON result as the final response.
 ## Completion Criteria

 - [ ] Parsed requirement context and identified analysis categories
-- [ ] Read affected files and
-- [ ]
+- [ ] Read affected files in full and recorded public and private implementation elements with file:line evidence, or documented scope limits in `limitations`
+- [ ] Traced call chains according to the scope rules, or documented incomplete traces in `limitations`
+- [ ] Enumerated all entry points in the traced scope and marked each as `change-relevant` or `non-relevant`
+- [ ] Traced data transformation pipelines step by step for all `change-relevant` entry points that transform incoming data
+- [ ] Recorded at minimum entry point name, input shape, and output shape for all `non-relevant` entry points
+- [ ] Reclassified and traced entry points that share the same output path or transformation logic as a `change-relevant` entry point
+- [ ] Recorded external lookups that modify output values, including configuration, constants, and mapping data
 - [ ] Performed data model discovery when data access patterns were present
 - [ ] Extracted constraints and focus areas with concrete risks
 - [ ] Checked existing tests for coverage signals
+- [ ] Populated `dataTransformationPipelines` for all traced pipelines
+- [ ] Populated `entryPointInventory` for all discovered entry points in the traced scope
 - [ ] Returned valid JSON
 """

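Several of the new completion criteria above can be checked mechanically against a parsed analyzer result. The helper below is hypothetical: it assumes pipelines and inventory entries use the identical `entryPoint` string, and because the criteria only require pipelines for entry points "that transform incoming data", it takes an explicit `passthrough` set for change-relevant entry points that do not.

```python
def check_analysis(result: dict, passthrough: frozenset[str] = frozenset()) -> list[str]:
    """Cross-check entryPointInventory against dataTransformationPipelines.

    Assumes both sections use the same `entryPoint` string. Change-relevant
    entry points listed in `passthrough` (no data transformation) are not
    required to have a traced pipeline.
    """
    problems = []
    traced = {p["entryPoint"] for p in result.get("dataTransformationPipelines", [])}
    for ep in result.get("entryPointInventory", []):
        name = ep.get("entryPoint", "<unnamed>")
        kind = ep.get("classification")
        if kind not in ("change-relevant", "non-relevant"):
            problems.append(f"{name}: unknown classification {kind!r}")
        if not ep.get("inputShape") or not ep.get("outputShape"):
            problems.append(f"{name}: inputShape and outputShape are required")
        if kind == "change-relevant" and name not in traced and name not in passthrough:
            problems.append(f"{name}: change-relevant but no traced pipeline")
    return problems


# Hypothetical analyzer output with two deliberate gaps.
result = {
    "dataTransformationPipelines": [{"entryPoint": "exportInvoices (src/export.ts:14)", "steps": []}],
    "entryPointInventory": [
        {"entryPoint": "exportInvoices (src/export.ts:14)", "classification": "change-relevant",
         "inputShape": "date range", "outputShape": "CSV file"},
        {"entryPoint": "getInvoice (src/api.ts:8)", "classification": "change-relevant",
         "inputShape": "invoice id", "outputShape": "Invoice JSON"},
        {"entryPoint": "healthCheck (src/api.ts:3)", "classification": "non-relevant",
         "inputShape": "none", "outputShape": ""},
    ],
}
for problem in check_analysis(result):
    print(problem)
```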
@@ -98,6 +98,7 @@ For DesignDoc, additionally verify:
 - [ ] Field propagation map present (when fields cross boundaries)
 - [ ] Data-oriented designs contain concrete data design or Test Boundaries content, or an explicit N/A rationale
 - [ ] Verification Strategy section present with correctness definition, target comparison, verification method, observable success indicator, normalized verification timing, and early verification point
+- [ ] Output Comparison section present when the design changes existing observable behavior, an external contract, or a persisted data shape

 #### Gate 1: Quality Assessment (only after Gate 0 passes)

@@ -116,6 +117,7 @@ For DesignDoc, additionally verify:
 - **Data design completeness check**: When the document references persistence, storage, database, repository, query, ORM, migration, table, schema, or column concepts, verify that the Design Doc includes concrete data design content or an explicit N/A rationale. Useful evidence includes schema references, data model notes, or Test Boundaries with data layer strategy
 - **Code-verifier evidence integration**: When `code_verification` is provided, reconcile major or critical discrepancies and undocumented data operations as part of Gate 1 completeness and consistency review
 - **Verification Strategy quality check**: When the Verification Strategy section exists, verify that: (1) correctness definition is specific and measurable, (2) target comparison and observable success indicator are concrete when the change modifies observable behavior, external contracts, integrations, or data flow, (3) internal-only refactoring with identical observable inputs and outputs may use the minimal form, (4) verification method can detect the change's primary risk, (5) verification timing uses the normalized vocabulary or an explicit `N/A` rationale for minimal form, and (6) vertical-slice designs do not defer all verification to the final phase
+- **Output comparison check**: When the Design Doc changes existing observable behavior, an external contract, or a persisted data shape, verify that a concrete output comparison method is defined with identical input, expected output fields or format, and diff method. When upstream analysis includes `dataTransformationPipelines`, each listed step must be mapped to the comparison that verifies it; steps excluded because data passes through unchanged must include rationale. Missing mappings or rationale → `important` issue (category: `completeness`)
 - **Undetermined items review** [MANDATORY]: Every TBD, unknown, or open item MUST include: (1) **owner** — who resolves it, (2) **due** — when it gets resolved (which phase or milestone), (3) **next-phase handling** — how the next phase treats this gap. Missing any of these three → `important` issue

 **Perspective-specific Mode**:
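The "Output comparison check" above (every pipeline step mapped to a comparison, or excluded with a rationale, otherwise an `important` issue with category `completeness`) can be sketched as follows. The `coverage` mapping shape, keyed by each step's `method` string, is a hypothetical representation of the Design Doc's "Transformation pipeline coverage" entries; the package does not define one.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    severity: str
    category: str
    message: str


def check_output_comparison(pipelines: list[dict], coverage: dict[str, dict]) -> list[Issue]:
    """Flag pipeline steps with neither a verifying comparison nor an exclusion rationale.

    `coverage` is a hypothetical mapping from a step's `method` string to either
    {"comparison": "..."} or {"excluded": "<rationale>"}.
    """
    issues = []
    for pipeline in pipelines:
        for step in pipeline["steps"]:
            entry = coverage.get(step["method"], {})
            if not entry.get("comparison") and not entry.get("excluded"):
                issues.append(Issue(
                    "important", "completeness",
                    f"{pipeline['entryPoint']} step {step['order']} ({step['method']}) "
                    "has no comparison mapping or exclusion rationale",
                ))
    return issues


pipelines = [{
    "entryPoint": "exportInvoices (src/export.ts:14)",
    "steps": [
        {"order": 1, "method": "loadInvoices (src/repo.ts:30)"},
        {"order": 2, "method": "toCsvRow (src/export.ts:52)"},
        {"order": 3, "method": "writeFile (src/io.ts:5)"},
    ],
}]
coverage = {
    "toCsvRow (src/export.ts:52)": {"comparison": "field-by-field diff of CSV columns"},
    "writeFile (src/io.ts:5)": {"excluded": ""},  # exclusion without a rationale
}
for issue in check_output_comparison(pipelines, coverage):
    print(issue.severity, issue.category, issue.message)
```

An empty rationale is treated the same as a missing mapping, since the rule requires the rationale itself, not just the exclusion flag.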
@@ -258,6 +260,7 @@ Include in output when `prior_context_count > 0`:
 - [ ] Match of requirements, terminology, numbers between documents
 - [ ] Completeness of required elements in each document
 - [ ] Verification Strategy present with a concrete correctness definition and early verification point
+- [ ] Output Comparison defined when the design changes existing observable behavior, an external contract, or a persisted data shape
 - [ ] Verification Strategy aligns with design type and implementation approach
 - [ ] Compliance with project rules
 - [ ] Technical feasibility and reasonableness of estimates
@@ -90,7 +90,7 @@ Must be performed before Design Doc creation:
 - Record and distinguish between existing implementation locations and planned new locations

 2. **Existing Interface Investigation** (Only when changing existing features)
-- List
+- List every public method of the target service with full signatures
 - Identify call sites using content search with appropriate search patterns

 3. **Similar Functionality Search and Decision** (Pattern 5 prevention from ai-development-guide skill)
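For a Python codebase, "list every public method of the target service with full signatures" can be approximated with the standard `inspect` module. This sketch only applies to Python targets (the workflow itself is language-agnostic), treats a leading underscore as "non-public" by convention, and uses a hypothetical `UserService` class.

```python
import inspect


def public_methods(cls: type) -> dict[str, str]:
    """Map each public method of `cls` (including inherited ones) to its signature."""
    methods = {}
    # isroutine also catches classmethods/staticmethods, not just plain functions.
    for name, member in inspect.getmembers(cls, inspect.isroutine):
        if not name.startswith("_"):
            methods[name] = f"{name}{inspect.signature(member)}"
    return methods


class UserService:  # hypothetical target service
    def get_user(self, user_id: int) -> dict:
        return {"id": user_id}

    def rename(self, user_id: int, name: str) -> None:
        pass

    def _cache_key(self, user_id: int) -> str:
        return f"user:{user_id}"


for signature in public_methods(UserService).values():
    print(signature)
```

Static introspection like this complements, rather than replaces, the content search for call sites in the next bullet.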
@@ -189,7 +189,9 @@ Must be performed when creating Design Doc:
 - For new features, specify acceptance-criteria verification beyond unit tests
 - For extensions, specify regression verification that proves existing behavior is preserved
 - For refactors or rewrites, specify behavioral equivalence verification against the current implementation when applicable
--
+- When the design changes existing observable behavior, an external contract, or a persisted data shape, define a concrete `Output Comparison` method: identical input, expected output fields or format, diff method, and a mapping from each listed pipeline step to the comparison that verifies it
+- When `Codebase Analysis` provides `dataTransformationPipelines`, use them to populate the `Output Comparison` section. Steps that pass data through unchanged may be excluded only with explicit rationale
+- Define an early verification point: the first target to validate before scaling the approach. For changes to existing observable behavior, external contracts, or persisted data shapes, this must be an output comparison of at least one representative case

 ### Change Impact Map【Required】
 Must be included when creating Design Doc:
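The `Output Comparison` method described above (identical input, listed output fields, diff method) can be sketched as a field-by-field diff between the current and new implementations. `summarize_v1`, `summarize_v2`, and the field names are hypothetical; a real comparison would call the actual current and new code paths with a representative case.

```python
from typing import Any, Callable

MISSING = object()  # marks a field absent from one of the outputs


def compare_outputs(
    current: Callable[[Any], dict],
    new: Callable[[Any], dict],
    comparison_input: Any,
    fields: list[str],
) -> dict[str, tuple[Any, Any]]:
    """Run both implementations on the identical input and return
    {field: (current_value, new_value)} for each listed field that differs."""
    before = current(comparison_input)
    after = new(comparison_input)
    diffs = {}
    for name in fields:
        old_value = before.get(name, MISSING)
        new_value = after.get(name, MISSING)
        if old_value != new_value:
            diffs[name] = (old_value, new_value)
    return diffs


# Hypothetical current and new implementations of an invoice summary.
def summarize_v1(order: dict) -> dict:
    return {"id": order["id"], "totalCents": order["qty"] * order["unitCents"], "currency": "USD"}


def summarize_v2(order: dict) -> dict:
    return {"id": order["id"], "totalCents": order["qty"] * order["unitCents"], "currency": "usd"}


sample = {"id": 7, "qty": 3, "unitCents": 110}
print(compare_outputs(summarize_v1, summarize_v2, sample, ["id", "totalCents", "currency"]))
# {'currency': ('USD', 'usd')}
```

Restricting the diff to an explicit field list keeps the comparison aligned with the "Expected output fields" entry in the template, so incidental fields do not produce noise.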
@@ -252,6 +254,7 @@ Confirm and document conflicts with existing systems at each integration point t
 - `dataModel` informs schema references, data contracts, and persistence design
 - `focusAreas` indicate areas requiring deeper design attention
 - `constraints` inform design constraints, assumptions, and risk handling
+- `dataTransformationPipelines` informs the `Output Comparison` section in Verification Strategy
 - Additional investigation should focus on gaps or limitations that the analysis calls out
 - **PRD**: PRD document (if exists)
 - **Documents to Create**: ADR, Design Doc, or both
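Since `dataTransformationPipelines` informs the `Output Comparison` section, a designer could draft the "Transformation pipeline coverage" entries directly from the analyzer output and then fill in each placeholder. This helper and its line format are hypothetical; the bracketed parts are left for the designer to replace with a real comparison or an exclusion rationale.

```python
def coverage_skeleton(pipelines: list[dict]) -> list[str]:
    """Draft one coverage line per pipeline step, in step order (hypothetical helper).
    Bracketed placeholders must be filled in by the designer."""
    lines = []
    for pipeline in pipelines:
        for step in sorted(pipeline["steps"], key=lambda s: s["order"]):
            lines.append(
                f"- {pipeline['entryPoint']} / step {step['order']} {step['method']}: "
                f"verified by [comparison of {step['output']}] or excluded because [rationale]"
            )
    return lines


pipelines = [{
    "entryPoint": "exportInvoices (src/export.ts:14)",
    "steps": [
        {"order": 2, "method": "toCsvRow (src/export.ts:52)", "output": "CSV lines"},
        {"order": 1, "method": "loadInvoices (src/repo.ts:30)", "output": "Invoice rows"},
    ],
}]
print("\n".join(coverage_skeleton(pipelines)))
```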
@@ -352,6 +355,7 @@ Implementation sample creation checklist:
 - [ ] **Data representation decision documented** (when new structures introduced)
 - [ ] **Field propagation map included** (when fields cross boundaries)
 - [ ] **Verification Strategy defined** (correctness definition, target comparison, verification method, observable success indicator, timing, early verification point)
+- [ ] **Output Comparison defined** when changing existing observable behavior, an external contract, or a persisted data shape (comparison input, expected output fields, diff method, and transformation pipeline coverage)

 **Reverse-engineer mode only**:
 - [ ] Every architectural claim cites file:line evidence
package/README.md
CHANGED
@@ -4,9 +4,9 @@
 [](https://developers.openai.com/codex/skills/)
 [](https://opensource.org/licenses/MIT)

-**
+**Structured agentic coding workflows for [OpenAI Codex CLI](https://developers.openai.com/codex/cli)** — specialized AI coding agents plan, implement, test, and review changes with traceable docs, task-level commits, and quality gates.

-Built on the [Agent Skills specification](https://developers.openai.com/codex/skills/) and [Codex subagents](https://developers.openai.com/codex/subagents).
+Built on the [Agent Skills specification](https://developers.openai.com/codex/skills/) and [Codex subagents](https://developers.openai.com/codex/subagents). Designed for long-running tasks, large refactors, and reviewable changes.

 ---

@@ -25,36 +25,45 @@ $recipe-implement Add user authentication with JWT
 
 `$` is Codex CLI's syntax for invoking a skill explicitly. Type `$recipe-` to see all available recipes via tab completion.

-
+Small changes stay lightweight. Larger tasks get structure: requirements → design → task decomposition → TDD implementation → quality gates.
+
+codex-workflows is the Codex-native counterpart of [Claude Code Workflows](https://github.com/shinpr/claude-code-workflows): same document-driven development style, adapted for Codex CLI, subagents, and GPT models.

 ---

 ## Why codex-workflows?

-
-
-
-
+Codex is already strong at one-shot implementation. The problem starts when a change spans multiple files, needs design decisions to stay visible, or has to survive review, testing, and follow-up edits.
+
+For larger tasks, explicit planning changes the job from raw generation into verification against a design, a task breakdown, and acceptance criteria. That matters because review loops are more reliable than first-shot generation once scope and ambiguity grow.
+
+codex-workflows adds the missing structure around those jobs:
+- Traceable artifacts: PRD → Design Doc → Task → Commit
+- Built-in TDD and quality gates before code is ready to commit
+- Agent context separation for large refactors, migrations, and PR-sized changes
+- Diagnosis and reverse-engineering flows for bugs and legacy code
+
+## Not Designed For

-
--
--
-- Large tasks stay structured and reviewable through agent context separation
+- One-shot toy scripts or vibe-coding sessions where speed matters more than traceability
+- Repositories that do not use tests, lint, builds, or reviewable commits
+- Teams that do not want design docs, task breakdowns, or explicit quality gates

 ---

 ## What It Does

-A single request becomes a structured development process:
+A single request becomes a structured development process. The framework chooses the level of ceremony based on scope:
+
+| Scale | File Count | What Happens |
+|-------|------------|-------------|
+| Small | 1-2 | Simplified plan → direct implementation |
+| Medium | 3-5 | Design Doc → work plan → task execution |
+| Large | 6+ | PRD → ADR → Design Doc → test skeletons → work plan → autonomous execution |

-
-2. **Analyze the existing codebase** (dependencies, data layer, risk areas)
-3. **Design** the solution (ADR, Design Doc with acceptance criteria)
-4. **Break it into tasks** (atomic, 1 commit each)
-5. **Implement with tests** (TDD per task)
-6. **Run quality checks** (lint, test, build — no failing checks)
+For larger work, the path usually looks like this: understand the problem, analyze the codebase, design the change, break it into atomic tasks, implement with tests, and run quality checks before commit.

-Each step is handled by a specialized subagent in its own context,
+Each step is handled by a specialized subagent in its own context, using context engineering to prevent context pollution and reduce error accumulation in long-running tasks:

 ```
 User Request
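The scale table in the README hunk above amounts to a lookup from the number of affected files to a workflow. The sketch below encodes only that table; per the FAQ, `$recipe-implement` runs requirement-analyzer first, and whatever other signals it uses are not modeled here. `WORKFLOWS` and `select_scale` are hypothetical names.

```python
# The three rows of the "Scale | File Count | What Happens" table.
WORKFLOWS = {
    "Small": ["Simplified plan", "direct implementation"],
    "Medium": ["Design Doc", "work plan", "task execution"],
    "Large": ["PRD", "ADR", "Design Doc", "test skeletons", "work plan", "autonomous execution"],
}


def select_scale(file_count: int) -> str:
    """Map an estimated number of affected files to a scale (table lookup only)."""
    if file_count < 1:
        raise ValueError("file_count must be at least 1")
    if file_count <= 2:
        return "Small"
    if file_count <= 5:
        return "Medium"
    return "Large"


for count in (1, 4, 8):
    scale = select_scale(count)
    print(count, scale, " -> ".join(WORKFLOWS[scale]))
```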
@@ -96,6 +105,8 @@ Problem → investigator → verifier (ACH + Devil's Advocate) → solver → Ac
 Existing code → scope-discoverer (discoveredUnits + prdUnits) → prd-creator → code-verifier → document-reviewer → Design Docs
 ```

+This works best when repository knowledge is explicit and local. Short `AGENTS.md` files can act as entry points, while design docs, plans, and task files hold the deeper instructions that agents need to execute reliably.
+
 ---

 ## Installation
@@ -103,7 +114,7 @@ Existing code → scope-discoverer (discoveredUnits + prdUnits) → prd-creator
 ### Requirements

 - [Codex CLI](https://developers.openai.com/codex/cli) (latest)
-- Node.js >=
+- Node.js >= 22

 ### Install

@@ -266,16 +277,6 @@ Codex spawns these as needed during recipe execution. Each agent runs in its own
 
 ## How It Works

-### Scale-Based Workflow Selection
-
-The framework automatically determines the right level of ceremony:
-
-| Scale | File Count | What Happens |
-|-------|------------|-------------|
-| Small | 1-2 | Simplified plan → direct implementation |
-| Medium | 3-5 | Design Doc → work plan → task execution |
-| Large | 6+ | PRD → ADR → Design Doc → test skeletons → work plan → autonomous execution |
-
 ### Autonomous Execution Mode

 After work plan approval, the framework enters guided autonomous execution with escalation points:
@@ -287,7 +288,8 @@ After work plan approval, the framework enters guided autonomous execution with
 
 ### Context Separation

-Each subagent runs in a fresh context. This
+Each subagent runs in a fresh context. This context-engineering pattern keeps long-running agentic coding tasks legible and reviewable:
+- generation and verification happen in separate contexts, reducing author bias and carry-over assumptions
 - **document-reviewer** reviews without the author's bias
 - **investigator** collects evidence without confirmation bias
 - **code-reviewer** validates compliance without implementation context
@@ -335,6 +337,14 @@ your-project/
 
 ---

+## Works With
+
+If your requirements already live in Linear or an existing PRD, [linear-prism](https://github.com/shinpr/linear-prism) can decompose them into implementation-ready tasks by reading the codebase, making dependencies explicit, and preserving Design Doc boundaries.
+
+Those tasks can then be passed into `$recipe-design` to enter the design phase with clearer scope and better task visibility.
+
+---
+
 ## FAQ

 **Q: What models does this work with?**
@@ -349,10 +359,6 @@ A: Yes. Edit the TOML files in `.codex/agents/` — change model, sandbox_mode,
 
 A: `$recipe-implement` is the universal entry point. It runs requirement-analyzer first, detects affected layers from the codebase, and automatically routes to backend, frontend, or fullstack flow. `$recipe-fullstack-implement` skips the detection and goes straight into the fullstack flow (separate Design Docs per layer, design-sync, layer-aware task execution). Use `$recipe-implement` when you're not sure; use `$recipe-fullstack-implement` when you know upfront that the feature spans both layers.

-**Q: How does this relate to Claude Code Workflows?**
-
-A: codex-workflows is the Codex-native counterpart of [Claude Code Workflows](https://github.com/shinpr/claude-code-workflows). Same development philosophy, adapted for Codex CLI's subagent architecture and GPT model family.
-
 **Q: Does this work with MCP servers?**

 A: Yes. Codex skills and subagents work alongside [MCP](https://developers.openai.com/codex/mcp) — skills operate at the instruction layer while MCP operates at the tool transport layer. You can add MCP servers to any agent's TOML configuration.
@@ -363,6 +369,19 @@ A: Subagents escalate to the user when they encounter design deviations, ambiguo
 
 ---

+## Design Rationale
+
+<details>
+<summary>Background reading behind the workflow design</summary>
+
+- [Planning Is the Real Superpower of Agentic Coding](https://www.norsica.jp/blog/planning-superpower-agentic-coding) — why explicit planning turns large-task execution from raw generation into verification against a design and task breakdown
+- [Why LLMs Are Bad at 'First Try' and Great at Verification](https://www.norsica.jp/blog/llm-verification-over-generation) — why review loops and session separation are more reliable than first-shot generation on complex work
+- [Stop Putting Everything in AGENTS.md](https://www.norsica.jp/blog/stop-putting-everything-in-agents-md) — why `AGENTS.md` should stay lean while rules, docs, and task instructions live near the point of use
+
+</details>
+
+---
+
 ## License

 MIT License — free to use, modify, and distribute.
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
 "name": "codex-workflows",
-"version": "0.4.
+"version": "0.4.4",
 "description": "Task-oriented agentic coding framework for OpenAI Codex CLI — skills, recipes, and subagents for structured development workflows",
 "license": "MIT",
 "author": "Shinsuke Kagawa",
@@ -22,9 +22,12 @@
 "agent-skills",
 "agentic-coding",
 "ai-coding",
+"ai-coding-agent",
 "subagents",
 "multi-agent",
+"harness-engineering",
 "context-engineering",
+"ai-development-workflow",
 "tdd",
 "code-generation"
 ],