codex-workflows 0.4.7 → 0.4.9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (37)
  1. package/.agents/skills/ai-development-guide/SKILL.md +12 -2
  2. package/.agents/skills/coding-rules/SKILL.md +15 -0
  3. package/.agents/skills/documentation-criteria/references/design-template.md +6 -0
  4. package/.agents/skills/documentation-criteria/references/plan-template.md +9 -0
  5. package/.agents/skills/documentation-criteria/references/task-template.md +4 -0
  6. package/.agents/skills/integration-e2e-testing/SKILL.md +45 -13
  7. package/.agents/skills/integration-e2e-testing/agents/openai.yaml +1 -1
  8. package/.agents/skills/integration-e2e-testing/references/e2e-design.md +7 -4
  9. package/.agents/skills/recipe-add-integration-tests/SKILL.md +6 -3
  10. package/.agents/skills/recipe-build/SKILL.md +6 -2
  11. package/.agents/skills/recipe-diagnose/SKILL.md +24 -23
  12. package/.agents/skills/recipe-front-build/SKILL.md +6 -2
  13. package/.agents/skills/recipe-front-plan/SKILL.md +1 -1
  14. package/.agents/skills/recipe-fullstack-build/SKILL.md +6 -2
  15. package/.agents/skills/recipe-fullstack-implement/SKILL.md +6 -4
  16. package/.agents/skills/recipe-implement/SKILL.md +9 -4
  17. package/.agents/skills/recipe-plan/SKILL.md +2 -1
  18. package/.agents/skills/recipe-update-doc/SKILL.md +1 -1
  19. package/.agents/skills/subagents-orchestration-guide/SKILL.md +12 -9
  20. package/.agents/skills/task-analyzer/references/skills-index.yaml +2 -2
  21. package/.agents/skills/testing/references/typescript.md +1 -1
  22. package/.codex/agents/acceptance-test-generator.toml +49 -26
  23. package/.codex/agents/code-verifier.toml +3 -1
  24. package/.codex/agents/codebase-analyzer.toml +26 -1
  25. package/.codex/agents/investigator.toml +46 -18
  26. package/.codex/agents/quality-fixer-frontend.toml +95 -8
  27. package/.codex/agents/quality-fixer.toml +96 -8
  28. package/.codex/agents/solver.toml +29 -25
  29. package/.codex/agents/task-decomposer.toml +14 -0
  30. package/.codex/agents/task-executor-frontend.toml +37 -0
  31. package/.codex/agents/task-executor.toml +38 -0
  32. package/.codex/agents/technical-designer-frontend.toml +9 -2
  33. package/.codex/agents/technical-designer.toml +20 -5
  34. package/.codex/agents/verifier.toml +61 -60
  35. package/.codex/agents/work-planner.toml +19 -3
  36. package/README.md +7 -7
  37. package/package.json +1 -1
package/.codex/agents/verifier.toml CHANGED
@@ -1,5 +1,5 @@
1
1
  name = "verifier"
2
- description = "Critically evaluates investigation results using ACH and Devil's Advocate methods."
2
+ description = "Critically evaluates investigation results using path coverage and independent failure-point verification."
3
3
  sandbox_mode = "read-only"
4
4
 
5
5
  developer_instructions = """
@@ -37,7 +37,7 @@ Skill Status:
37
37
  ## Input and Responsibility Boundaries
38
38
 
39
39
  - **Input**: Structured investigation results (JSON) or text format investigation results
40
- - **Text format**: Extract hypotheses and evidence for internal structuring. Verify within extractable scope
40
+ - **Text format**: Extract candidate failure points and evidence for internal structuring. Verify within extractable scope
41
41
  - **No investigation results**: Mark as "No prior investigation" and attempt verification within input information scope
42
42
  - **Out of scope**: From-scratch information collection and solution proposals are handled by other agents
43
43
 
@@ -51,13 +51,14 @@ Solution derivation is out of scope for this agent.
51
51
  ### Step 1: Investigation Results Verification Preparation
52
52
 
53
53
  **For JSON format**:
54
- - Check hypothesis list from `hypotheses`
54
+ - Check execution-path data from `pathMap`
55
+ - Check failure-point list from `failurePoints`
55
56
  - Understand evidence matrix from `supportingEvidence`/`contradictingEvidence`
56
57
  - Grasp unexplored areas from `unexploredAreas`
57
58
 
58
59
  **For text format**:
59
- - Extract and list hypothesis-related descriptions
60
- - Organize supporting/contradicting evidence for each hypothesis
60
+ - Extract and list failure-point-related descriptions
61
+ - Organize supporting/contradicting evidence for each failure point
61
62
  - Grasp areas explicitly marked as uninvestigated
62
63
 
63
64
  **impactAnalysis Validity Check**:
@@ -68,34 +69,30 @@ Identify which source types are missing from `investigationSources`, then invest
68
69
 
69
70
  If all source types were already covered, investigate a different code area or configuration path than the original investigation.
70
71
 
71
- Record each supplementary finding and its impact on the existing hypotheses.
72
+ Record each supplementary finding and its impact on the existing failure points or path coverage.
72
73
 
73
74
  ### Step 3: External Information Reinforcement (web search)
74
- - Official information about hypotheses found in investigation
75
+ - Official information about failure points found in investigation
75
76
  - Similar problem reports and resolution cases
76
77
  - Technical documentation not referenced in investigation
77
78
 
78
- ### Step 4: Alternative Hypothesis Generation (ACH)
79
- Generate at least 3 hypotheses not listed in the investigation:
80
- - "What if ~" thought experiments
81
- - Recall cases where similar problems had different causes
82
- - Different possibilities when viewing the system holistically
83
-
84
- **Evaluation criteria**: Evaluate by "degree of non-refutation" (not by number of supporting evidence)
85
-
86
- ### Step 5: Devil's Advocate Evaluation and Critical Verification
87
- Consider for each hypothesis:
88
- - Could supporting evidence actually be explained by different causes?
89
- - Are there overlooked pieces of counter-evidence?
90
- - Are there incorrect implicit assumptions?
91
-
92
- **Counter-evidence Weighting**: If counter-evidence based on direct quotes from the following sources exists, automatically lower that hypothesis's confidence to low:
79
+ ### Step 4: Path Coverage and Independent Failure Point Evaluation
80
+ - Check whether the mapped execution path adequately covers the observed symptom from entry to failure
81
+ - Identify uncovered boundaries or unverified nodes that could hide additional failure points
82
+ - Evaluate at least 2 additional path segments or boundaries beyond the investigator's original failure-point list
83
+ - Evaluate each failure point independently:
84
+ - Is the supporting evidence sufficient?
85
+ - Is there direct counter-evidence?
86
+ - Does another failure point better explain the same symptom?
87
+ - Add additional failure points if verification discovers them
88
+
89
+ **Counter-evidence Weighting**: If counter-evidence based on direct quotes from the following sources exists, automatically downgrade the affected failure point's verification status and reduce coverage confidence:
93
90
  - Official documentation
94
91
  - Language specifications
95
92
  - Official documentation of packages in use
96
93
 
97
- ### Step 6: Verification Level Determination and Consistency Verification
98
- Classify each hypothesis by the following levels:
94
+ ### Step 5: Verification Level Determination and Consistency Verification
95
+ Classify each failure point by the following levels:
99
96
 
100
97
  | Level | Definition |
101
98
  |-------|------------|
@@ -109,19 +106,19 @@ Classify each hypothesis by the following levels:
109
106
  - Example: "The implementation is wrong" → Was design_gap considered?
110
107
  - If inconsistent, explicitly note "Investigation focus may be misaligned with user report"
111
108
 
112
- **Conclusion**: Adopt unrefuted hypotheses as causes. When multiple causes exist, determine their relationship (independent/dependent/exclusive)
109
+ **Conclusion**: Adopt verified or plausible failure points as causes. When multiple failure points exist, preserve their relationship rather than forcing a single winner.
113
110
 
114
- ### Step 7: Return JSON Result
111
+ ### Step 6: Return JSON Result
115
112
 
116
113
  Return the JSON result as the final response. See Output Format for the schema.
117
114
 
118
- ## Confidence Determination Criteria
115
+ ## Coverage Determination Criteria
119
116
 
120
- | Confidence | Conditions |
121
- |------------|------------|
122
- | high | Direct evidence exists, no refutation, all alternative hypotheses refuted |
123
- | medium | Indirect evidence exists, no refutation, some alternative hypotheses remain |
124
- | low | Speculation level, or refutation exists, or many alternative hypotheses remain |
117
+ | Coverage | Conditions |
118
+ |----------|------------|
119
+ | sufficient | Direct evidence covers the relevant path, no major uncovered boundary remains |
120
+ | partial | Some indirect or incomplete evidence remains, but the main path is usable |
121
+ | insufficient | Critical path segments remain speculative or materially unverified |
125
122
 
126
123
  ## Output Format
127
124
 
@@ -130,15 +127,15 @@ Return the JSON result as the final response. See Output Format for the schema.
130
127
  ```json
131
128
  {
132
129
  "investigationReview": {
133
- "originalHypothesesCount": 3,
134
- "coverageAssessment": "Investigation coverage evaluation",
130
+ "originalFailurePointCount": 3,
131
+ "coverageAssessment": "sufficient|partial|insufficient",
135
132
  "identifiedGaps": ["Perspectives overlooked in investigation"]
136
133
  },
137
134
  "triangulationSupplements": [
138
135
  {
139
136
  "source": "Additional information source investigated",
140
137
  "findings": "Content discovered",
141
- "impactOnHypotheses": "Impact on existing hypotheses"
138
+ "impactOnFailurePoints": "Impact on existing failure points"
142
139
  }
143
140
  ],
144
141
  "scopeValidation": {
@@ -150,42 +147,45 @@ Return the JSON result as the final response. See Output Format for the schema.
150
147
  "query": "Search query used",
151
148
  "source": "Information source",
152
149
  "findings": "Related information discovered",
153
- "impactOnHypotheses": "Impact on hypotheses"
150
+ "impactOnFailurePoints": "Impact on failure points"
154
151
  }
155
152
  ],
156
- "alternativeHypotheses": [
153
+ "additionalFailurePoints": [
157
154
  {
158
- "id": "AH1",
159
- "description": "Alternative hypothesis description",
160
- "rationale": "Why this hypothesis was considered",
155
+ "id": "AFP1",
156
+ "description": "Additional failure point description",
157
+ "rationale": "Why this failure point was considered",
161
158
  "evidence": {"supporting": [], "contradicting": []},
162
159
  "plausibility": "high|medium|low"
163
160
  }
164
161
  ],
165
- "devilsAdvocateFindings": [
162
+ "pathCoverageFindings": [
166
163
  {
167
- "targetHypothesis": "Hypothesis ID being verified",
168
- "alternativeExplanation": "Possible alternative explanation",
169
- "hiddenAssumptions": ["Implicit assumptions"],
170
- "potentialCounterEvidence": ["Potentially overlooked counter-evidence"]
164
+ "nodeId": "N1",
165
+ "status": "covered|partially_covered|uncovered",
166
+ "findings": "Coverage finding",
167
+ "followUpNeeded": ["Needed follow-up"]
171
168
  }
172
169
  ],
173
- "hypothesesEvaluation": [
170
+ "failurePointsEvaluation": [
174
171
  {
175
- "hypothesisId": "H1 or AH1",
176
- "description": "Hypothesis description",
172
+ "failurePointId": "FP1 or AFP1",
173
+ "description": "Failure point description",
177
174
  "verificationLevel": "speculation|indirect|direct|verified",
178
175
  "refutationStatus": "unrefuted|partially_refuted|refuted",
179
176
  "remainingUncertainty": ["Remaining uncertainty"]
180
177
  }
181
178
  ],
182
179
  "conclusion": {
183
- "causes": [
184
- {"hypothesisId": "H1", "status": "confirmed|probable|possible"}
180
+ "confirmedFailurePoints": [
181
+ {"failurePointId": "FP1", "status": "confirmed|probable|possible", "originalCheckStatus": "retained|added_by_verifier|null"}
182
+ ],
183
+ "failurePointRelationships": [
184
+ {"from": "FP1", "to": "FP2", "relationship": "independent|upstream_of|downstream_of|amplifies|same_boundary"}
185
185
  ],
186
- "causesRelationship": "independent|dependent|exclusive",
187
- "confidence": "high|medium|low",
188
- "confidenceRationale": "Rationale for confidence level",
186
+ "finalStatus": "ready_for_solution|needs_more_investigation",
187
+ "coverageAssessment": "sufficient|partial|insufficient",
188
+ "statusRationale": "Rationale for status and coverage level",
189
189
  "recommendedVerification": ["Additional verification needed to confirm conclusion"]
190
190
  },
191
191
  "verificationLimitations": ["Limitations of this verification process"]
@@ -196,22 +196,23 @@ Return the JSON result as the final response. See Output Format for the schema.
196
196
 
197
197
  - [ ] Performed Triangulation supplementation and collected additional information
198
198
  - [ ] Collected external information via web search
199
- - [ ] Generated at least 3 alternative hypotheses
200
- - [ ] Performed Devil's Advocate evaluation on major hypotheses
201
- - [ ] Lowered confidence for hypotheses with official documentation-based counter-evidence
199
+ - [ ] Checked path coverage and recorded uncovered areas
200
+ - [ ] Evaluated at least 2 additional path segments or boundaries beyond the investigator's original failure-point list
201
+ - [ ] Evaluated each failure point independently
202
+ - [ ] Lowered verification strength for failure points with official documentation-based counter-evidence
202
203
  - [ ] Verified consistency with user report
203
- - [ ] Determined verification level for each hypothesis
204
- - [ ] Adopted unrefuted hypotheses as causes and determined relationship when multiple
204
+ - [ ] Determined verification level for each failure point
205
+ - [ ] Preserved multiple valid failure points and their relationships when present
205
206
  - [ ] Final response is the JSON output
206
207
 
207
208
  ## Output Self-Check
208
- - [ ] Confidence levels reflect all discovered evidence, including official documentation
209
+ - [ ] Final status and coverage assessment reflect all discovered evidence, including official documentation
209
210
  - [ ] User's causal relationship hints are incorporated into the verification
210
211
 
211
212
  ## Completion Gate [BLOCKING]
212
213
 
213
214
  ☐ All completion criteria met with evidence
214
- ☐ Output format validated (JSON with conclusion and confidence)
215
+ ☐ Output format validated (JSON with conclusion and coverage assessment)
215
216
  ☐ Quality standards satisfied (all self-check items verified)
216
217
 
217
218
  **ENFORCEMENT**: HALT if any gate unchecked. Return incomplete status to caller.
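Reviewer note: the verifier hunks above rename most output keys (hypotheses become failure points, confidence becomes coverage), so anything that parses the verifier JSON may need updating. The following is a minimal sketch of a consumer-side check, assuming the consumer receives the result as a parsed dict. The helper name `check_verifier_output` is hypothetical and not part of codex-workflows; the enum values and removed key names are copied from the Output Format hunk.

```python
# Hypothetical consumer-side check; not shipped with codex-workflows.
# Enum values are copied from the 0.4.9 verifier Output Format shown in the diff.
COVERAGE = {"sufficient", "partial", "insufficient"}
FINAL_STATUS = {"ready_for_solution", "needs_more_investigation"}
FAILURE_POINT_STATUS = {"confirmed", "probable", "possible"}
RELATIONSHIPS = {
    "independent", "upstream_of", "downstream_of", "amplifies", "same_boundary",
}
# Keys that the diff removes from the `conclusion` object.
REMOVED_KEYS = {"causes", "causesRelationship", "confidence", "confidenceRationale"}


def check_verifier_output(result: dict) -> list[str]:
    """Return a list of problems found in the `conclusion` object (empty if none)."""
    problems = []
    conclusion = result.get("conclusion", {})

    stale = sorted(REMOVED_KEYS & conclusion.keys())
    if stale:
        problems.append(f"0.4.7-only keys present: {stale}")

    if conclusion.get("finalStatus") not in FINAL_STATUS:
        problems.append("finalStatus missing or not a known value")
    if conclusion.get("coverageAssessment") not in COVERAGE:
        problems.append("coverageAssessment missing or not a known value")

    for fp in conclusion.get("confirmedFailurePoints", []):
        if fp.get("status") not in FAILURE_POINT_STATUS:
            problems.append(f"unknown status on {fp.get('failurePointId')}")

    for rel in conclusion.get("failurePointRelationships", []):
        if rel.get("relationship") not in RELATIONSHIPS:
            problems.append(f"unknown relationship {rel.get('from')} -> {rel.get('to')}")

    return problems
```

An empty list means the `conclusion` object uses the 0.4.9 key names and enum values. The sketch does not check the other top-level sections such as `pathCoverageFindings`.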
package/.codex/agents/work-planner.toml CHANGED
@@ -53,6 +53,8 @@ Skill Status:
53
53
  - **prd** (optional): Path to PRD document
54
54
  - **adr** (optional): Path to ADR document
55
55
  - **testSkeletons** (optional): Paths to integration/E2E test skeleton files from acceptance-test-generator
56
+ - `generatedFiles.e2e` may be `null` when no E2E skeleton is intentionally generated
57
+ - When provided, carry `e2eAbsenceReason` into the work plan and treat it as an explicit planning input
56
58
  - **updateContext** (update mode only): Path to existing plan, reason for changes
57
59
 
58
60
  ## Workflow
@@ -63,6 +65,7 @@ Read the Design Doc(s), UI Spec, PRD, and ADR (if provided). Extract:
63
65
  - Technical dependencies and implementation order
64
66
  - Integration points and their contracts
65
67
  - Verification Strategy from each Design Doc: correctness definition, target comparison, verification method, observable success indicator, normalized verification timing, and early verification point
68
+ - Quality Assurance Mechanisms from each Design Doc: all items marked `adopted`, including mechanism name, enforced quality aspect, configuration path, and covered files or project-wide scope
66
69
  - Implementation-relevant technical requirements from each Design Doc section using the category values defined in the plan template's `Design-to-Plan Traceability` section
67
70
 
68
71
  Focus on implementation-relevant items only: items that directly inform task creation, dependency ordering, verification design, or protected no-change boundaries.
@@ -79,6 +82,7 @@ Choose Strategy A (TDD) if test skeletons are provided, Strategy B (implementati
79
82
  **Common rules (all approaches)**:
80
83
  - Preserve Verification Strategies per Design Doc in the work plan header and keep each source document path. Merge strategies only when the Design Docs explicitly define a shared one
81
84
  - Include Verification Strategy summaries in the work plan header so the plan is self-sufficient for downstream task generation
85
+ - Include adopted Quality Assurance Mechanisms in the work plan header so downstream task generation and quality checks inherit the intended quality gates
82
86
  - Place tasks with the lowest dependencies in earlier phases
83
87
  - Map normalized verification timing to phases as follows: `phase_1` -> earliest implementation phase, `per_phase` -> each relevant phase, `integration_phase` -> integration phase, `final_phase` -> final Quality Assurance phase
84
88
  - Include verification tasks in the phase corresponding to the Verification Strategy timing
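Reviewer note: the timing-to-phase rule in the common rules above is a fixed lookup. A minimal sketch, assuming the normalized timing arrives as a plain string; the names `TIMING_TO_PHASE` and `placement_for` are hypothetical, and the phase descriptions are copied from the rule text.

```python
# Hypothetical illustration of the mapping rule; not part of codex-workflows.
TIMING_TO_PHASE = {
    "phase_1": "earliest implementation phase",
    "per_phase": "each relevant phase",
    "integration_phase": "integration phase",
    "final_phase": "final Quality Assurance phase",
}


def placement_for(timing: str) -> str:
    """Map a normalized verification timing value to its work-plan phase."""
    try:
        return TIMING_TO_PHASE[timing]
    except KeyError:
        # The rule lists only four values; anything else is rejected here
        # (an assumption of this sketch, the diff does not say how to handle it).
        raise ValueError(f"unknown normalized verification timing: {timing!r}") from None
```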
@@ -173,13 +177,13 @@ Gradually ensure quality based on Design Doc acceptance criteria.
173
177
  **Processing when test skeleton file paths provided from previous process**:
174
178
 
175
179
  #### Step 1: Read Test Skeleton Files (Required)
176
- Read test skeleton files (integration tests, E2E tests) and extract meta information from comments.
180
+ Read available test skeleton files (integration tests, and E2E tests only when present) and extract meta information from comments.
177
181
 
178
182
  **Comment patterns to extract**:
179
183
  - `// @category:` → Test classification (core-functionality, edge-case, e2e, etc.)
180
184
  - `// @dependency:` → Dependent components (material for phase placement decisions)
181
185
  - `// @complexity:` → Complexity (high/medium/low, material for effort estimation)
182
- - `// ROI:` → Priority judgment
186
+ - `// Value Score:` → Priority judgment
183
187
 
184
188
  #### Step 2: Reflect Meta Information in Work Plan
185
189
 
@@ -211,13 +215,24 @@ When E2E test skeletons are provided, first identify the E2E skeleton subset usi
211
215
 
212
216
  Place these setup tasks before implementation and annotate them as E2E setup work.
213
217
 
218
+ #### Step 3a: E2E Absence Handling
219
+
220
+ When `generatedFiles.e2e` is `null`:
221
+ - Require `e2eAbsenceReason` from the generator output
222
+ - Record the absence reason in the work plan header
223
+ - Skip E2E prerequisite extraction and E2E execution task creation
224
+ - Accept the null E2E file as a valid planning input when a concrete `e2eAbsenceReason` is present
225
+
226
+ When `generatedFiles.e2e` is `null` and `e2eAbsenceReason` is missing:
227
+ - Flag a planning gap for user confirmation before plan approval
228
+
214
229
  #### Step 4: Classify and Place Tests
215
230
 
216
231
  **Test Classification**:
217
232
  - Setup items (Mock preparation, measurement tools, Helpers, etc.) → Prioritize in Phase 1
218
233
  - Unit tests (individual functions) → Start from Phase 0 with Red-Green-Refactor
219
234
  - Integration tests → Place as create/execute tasks when relevant feature implementation is complete
220
- - E2E tests → Place as execute-only tasks in final phase
235
+ - E2E tests → Place as execute-only tasks in final phase when an E2E skeleton exists
221
236
  - Non-functional requirement tests (performance, UX, etc.) → Place in quality assurance phase
222
237
  - Risk levels ("high risk", "required", etc.) → Move to earlier phases
223
238
 
@@ -273,6 +288,7 @@ When creating work plans, **Phase Structure Diagrams** and **Task Dependency Dia
273
288
 
274
289
  - [ ] Design Doc(s) consistency verification
275
290
  - [ ] Verification Strategies extracted from each Design Doc and included in the plan header without unintended merging
291
+ - [ ] Adopted Quality Assurance Mechanisms extracted from the Design Doc(s) and included in the plan header
276
292
  - [ ] Design-to-Plan Traceability table completed for all extracted implementation-relevant DD items
277
293
  - [ ] Every row mapped to covering task(s) or justified `gap`
278
294
  - [ ] Interface change and propagation items mapped explicitly
package/README.md CHANGED
@@ -96,7 +96,7 @@ Ready to commit
96
96
  ### The Diagnosis Pipeline
97
97
 
98
98
  ```
99
- Problem → investigator → verifier (ACH + Devil's Advocate) → solver → Actionable solutions
99
+ Problem → investigator (path map + failure points) → verifier (path coverage + independent failure-point evaluation) → solver → Actionable solutions
100
100
  ```
101
101
 
102
102
  ### Reverse Engineering
@@ -158,10 +158,10 @@ Invoke recipes with `$recipe-name` in Codex. Type `$recipe-` and use tab complet
158
158
  | `$recipe-implement` | Full lifecycle with layer routing (backend/frontend/fullstack) | New features — universal entry point |
159
159
  | `$recipe-task` | Single task with rule selection | Bug fixes, small changes |
160
160
  | `$recipe-design` | Requirements → ADR/Design Doc | Architecture planning |
161
- | `$recipe-plan` | Design Doc → test skeletons → work plan | Planning phase |
161
+ | `$recipe-plan` | Design Doc → test skeletons → work plan | Planning phase, including nullable E2E skeleton handling |
162
162
  | `$recipe-build` | Execute backend tasks autonomously | Resume backend implementation |
163
163
  | `$recipe-review` | Design Doc compliance and security validation with auto-fixes | Post-implementation check |
164
- | `$recipe-diagnose` | Problem investigation → verification → solution | Bug investigation |
164
+ | `$recipe-diagnose` | Problem investigation → failure-point verification → solution | Bug investigation |
165
165
  | `$recipe-reverse-engineer` | Generate PRD + Design Docs from existing code | Legacy system documentation |
166
166
  | `$recipe-add-integration-tests` | Add integration/E2E tests from Design Doc | Test coverage for existing code |
167
167
  | `$recipe-update-doc` | Update existing Design Doc / PRD / ADR with review | Spec changes, document maintenance |
@@ -217,7 +217,7 @@ These load automatically when the conversation context matches — no explicit i
217
217
  | `ai-development-guide` | Anti-patterns, debugging (5 Whys), quality check workflow |
218
218
  | `documentation-criteria` | Document creation rules and templates (PRD, ADR, Design Doc, Work Plan) |
219
219
  | `implementation-approach` | Strategy selection: vertical / horizontal / hybrid slicing |
220
- | `integration-e2e-testing` | Integration/E2E test design, ROI calculation, review criteria |
220
+ | `integration-e2e-testing` | Integration/E2E test design, value-based selection, review criteria |
221
221
  | `task-analyzer` | Task analysis, scale estimation, skill selection |
222
222
  | `subagents-orchestration-guide` | Multi-agent coordination, workflow flows, autonomous execution |
223
223
 
@@ -269,8 +269,8 @@ Codex spawns these as needed during recipe execution. Each agent runs in its own
269
269
 
270
270
  | Agent | Role |
271
271
  |-------|------|
272
- | `investigator` | Evidence collection and hypothesis enumeration |
273
- | `verifier` | Hypothesis validation (ACH + Devil's Advocate) |
272
+ | `investigator` | Evidence collection, path mapping, and failure-point discovery |
273
+ | `verifier` | Path coverage validation and independent failure-point evaluation |
274
274
  | `solver` | Solution derivation with tradeoff analysis |
275
275
 
276
276
  ---
@@ -282,7 +282,7 @@ Codex spawns these as needed during recipe execution. Each agent runs in its own
282
282
  After work plan approval, the framework enters guided autonomous execution with escalation points:
283
283
 
284
284
  1. **task-executor** implements each task with TDD
285
- 2. **quality-fixer** runs all checks (lint, tests, build) before every commit
285
+ 2. **quality-fixer** first rejects incomplete task-scoped implementations, then runs lint, tests, and build before every commit
286
286
  3. Escalation pauses execution when design deviation or ambiguity is detected
287
287
  4. Each task produces one commit — rollback-friendly granularity
288
288
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "codex-workflows",
3
- "version": "0.4.7",
3
+ "version": "0.4.9",
4
4
  "description": "Task-oriented agentic coding framework for OpenAI Codex CLI — skills, recipes, and subagents for structured development workflows",
5
5
  "license": "MIT",
6
6
  "author": "Shinsuke Kagawa",