codex-workflows 0.4.7 → 0.4.8

Files changed (27)
  1. package/.agents/skills/integration-e2e-testing/SKILL.md +45 -13
  2. package/.agents/skills/integration-e2e-testing/agents/openai.yaml +1 -1
  3. package/.agents/skills/integration-e2e-testing/references/e2e-design.md +7 -4
  4. package/.agents/skills/recipe-add-integration-tests/SKILL.md +6 -3
  5. package/.agents/skills/recipe-build/SKILL.md +6 -2
  6. package/.agents/skills/recipe-diagnose/SKILL.md +24 -23
  7. package/.agents/skills/recipe-front-build/SKILL.md +6 -2
  8. package/.agents/skills/recipe-front-plan/SKILL.md +1 -1
  9. package/.agents/skills/recipe-fullstack-build/SKILL.md +6 -2
  10. package/.agents/skills/recipe-fullstack-implement/SKILL.md +6 -4
  11. package/.agents/skills/recipe-implement/SKILL.md +9 -4
  12. package/.agents/skills/recipe-plan/SKILL.md +2 -1
  13. package/.agents/skills/recipe-update-doc/SKILL.md +1 -1
  14. package/.agents/skills/subagents-orchestration-guide/SKILL.md +9 -6
  15. package/.agents/skills/task-analyzer/references/skills-index.yaml +2 -2
  16. package/.agents/skills/testing/references/typescript.md +1 -1
  17. package/.codex/agents/acceptance-test-generator.toml +49 -26
  18. package/.codex/agents/code-verifier.toml +3 -1
  19. package/.codex/agents/investigator.toml +46 -18
  20. package/.codex/agents/quality-fixer-frontend.toml +54 -8
  21. package/.codex/agents/quality-fixer.toml +55 -8
  22. package/.codex/agents/solver.toml +29 -25
  23. package/.codex/agents/technical-designer-frontend.toml +9 -2
  24. package/.codex/agents/technical-designer.toml +9 -2
  25. package/.codex/agents/verifier.toml +61 -60
  26. package/.codex/agents/work-planner.toml +16 -3
  27. package/package.json +1 -1
@@ -220,6 +220,13 @@ When a UI Spec exists for the feature (`docs/ui-spec/{feature-name}-ui-spec.md`)
  - Path to existing document
  - Reason for changes
  - Sections needing updates
+ - Before editing changed sections, build a Dependency Inventory for identifiers referenced by the update
+ - Dependency Inventory output format:
+ - `identifier`: exact literal identifier
+ - `source`: codebase | accepted_adr | external
+ - `status`: verified_existing | requires_new_creation | external_dependency
+ - `action`: keep | update_document | create_dependency | confirm_external_reference
+ - In update mode, cross-check prerequisite ADR references against Accepted ADRs only. Cross-Design-Doc consistency is handled by design-sync after the update
 
  - **Reverse-Engineer Context** (reverse-engineer mode only):
  - Primary Files
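The Dependency Inventory fields added in the hunk above can be read as a small record type. The following is a minimal sketch, assuming one record per identifier. The class name `DependencyEntry`, the `problems` method, and the sample identifier are hypothetical and do not appear in the package; only the field names and allowed values are taken from the diff.

```python
from dataclasses import dataclass

# Allowed values copied from the added Dependency Inventory lines.
SOURCES = {"codebase", "accepted_adr", "external"}
STATUSES = {"verified_existing", "requires_new_creation", "external_dependency"}
ACTIONS = {"keep", "update_document", "create_dependency", "confirm_external_reference"}


@dataclass(frozen=True)
class DependencyEntry:
    """One Dependency Inventory row (hypothetical class name)."""

    identifier: str
    source: str
    status: str
    action: str

    def problems(self) -> list[str]:
        """Return validation problems; an empty list means the row is well-formed."""
        found = []
        if not self.identifier.strip():
            found.append("identifier must be a non-empty literal")
        if self.source not in SOURCES:
            found.append(f"unknown source: {self.source}")
        if self.status not in STATUSES:
            found.append(f"unknown status: {self.status}")
        if self.action not in ACTIONS:
            found.append(f"unknown action: {self.action}")
        return found


# Hypothetical sample rows, for illustration only.
entry = DependencyEntry("UserRepository.findById", "codebase", "verified_existing", "keep")
bad = DependencyEntry("", "wiki", "verified_existing", "keep")
```

The sketch only checks each field against its enumerated values. The diff does not state which `status` and `action` combinations are valid, so no cross-field rule is enforced here.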
@@ -309,14 +316,14 @@ Cover happy path, unhappy path, and edge cases including empty and loading state
 
  ### AC Scoping for Autonomous Implementation (Frontend)
 
- **Include** (High automation ROI):
+ **Include** (High automation value):
  - User interaction behavior (button clicks, form submissions, navigation)
  - Rendering correctness (component displays correct data)
  - State management behavior (state updates correctly on user actions)
  - Error handling behavior (error messages displayed to user)
  - Accessibility (keyboard navigation, screen reader support)
 
- **Exclude** (Low ROI in LLM/CI/CD environment):
+ **Exclude** (Low automation value in LLM/CI/CD environment):
  - External API real connections → Use MSW for API mocking instead
  - Performance metrics → Non-deterministic in CI environment
  - Implementation details → Focus on user-observable behavior
@@ -252,6 +252,13 @@ Confirm and document conflicts with existing systems at each integration point t
  - Path to existing document
  - Reason for changes
  - Sections needing updates
+ - Before editing changed sections, build a Dependency Inventory for identifiers referenced by the update
+ - Dependency Inventory output format:
+ - `identifier`: exact literal identifier
+ - `source`: codebase | accepted_adr | external
+ - `status`: verified_existing | requires_new_creation | external_dependency
+ - `action`: keep | update_document | create_dependency | confirm_external_reference
+ - In update mode, cross-check prerequisite ADR references against Accepted ADRs only. Cross-Design-Doc consistency is handled by design-sync after the update
 
  - **Reverse-Engineer Context** (reverse-engineer mode only):
  - Primary Files
@@ -338,13 +345,13 @@ Cover happy path, unhappy path, and edge cases. Place important criteria first.
 
  ### AC Scoping for Autonomous Implementation
 
- **Include** (High automation ROI):
+ **Include** (High automation value):
  - Business logic correctness (calculations, state transitions, data transformations)
  - Data integrity and persistence behavior
  - User-visible functionality completeness
  - Error handling behavior (what user sees/experiences)
 
- **Exclude** (Low ROI in LLM/CI/CD environment):
+ **Exclude** (Low automation value in LLM/CI/CD environment):
  - External service real connections → Use contract/interface verification instead
  - Performance metrics → Non-deterministic in CI, defer to load testing
  - Implementation details (technology choice, algorithms, internal structure) → Focus on observable behavior
@@ -1,5 +1,5 @@
  name = "verifier"
- description = "Critically evaluates investigation results using ACH and Devil's Advocate methods."
+ description = "Critically evaluates investigation results using path coverage and independent failure-point verification."
  sandbox_mode = "read-only"
 
  developer_instructions = """
@@ -37,7 +37,7 @@ Skill Status:
  ## Input and Responsibility Boundaries
 
  - **Input**: Structured investigation results (JSON) or text format investigation results
- - **Text format**: Extract hypotheses and evidence for internal structuring. Verify within extractable scope
+ - **Text format**: Extract candidate failure points and evidence for internal structuring. Verify within extractable scope
  - **No investigation results**: Mark as "No prior investigation" and attempt verification within input information scope
  - **Out of scope**: From-scratch information collection and solution proposals are handled by other agents
 
@@ -51,13 +51,14 @@ Solution derivation is out of scope for this agent.
  ### Step 1: Investigation Results Verification Preparation
 
  **For JSON format**:
- - Check hypothesis list from `hypotheses`
+ - Check execution-path data from `pathMap`
+ - Check failure-point list from `failurePoints`
  - Understand evidence matrix from `supportingEvidence`/`contradictingEvidence`
  - Grasp unexplored areas from `unexploredAreas`
 
  **For text format**:
- - Extract and list hypothesis-related descriptions
- - Organize supporting/contradicting evidence for each hypothesis
+ - Extract and list failure-point-related descriptions
+ - Organize supporting/contradicting evidence for each failure point
  - Grasp areas explicitly marked as uninvestigated
 
  **impactAnalysis Validity Check**:
@@ -68,34 +69,30 @@ Identify which source types are missing from `investigationSources`, then invest
 
  If all source types were already covered, investigate a different code area or configuration path than the original investigation.
 
- Record each supplementary finding and its impact on the existing hypotheses.
+ Record each supplementary finding and its impact on the existing failure points or path coverage.
 
  ### Step 3: External Information Reinforcement (web search)
- - Official information about hypotheses found in investigation
+ - Official information about failure points found in investigation
  - Similar problem reports and resolution cases
  - Technical documentation not referenced in investigation
 
- ### Step 4: Alternative Hypothesis Generation (ACH)
- Generate at least 3 hypotheses not listed in the investigation:
- - "What if ~" thought experiments
- - Recall cases where similar problems had different causes
- - Different possibilities when viewing the system holistically
-
- **Evaluation criteria**: Evaluate by "degree of non-refutation" (not by number of supporting evidence)
-
- ### Step 5: Devil's Advocate Evaluation and Critical Verification
- Consider for each hypothesis:
- - Could supporting evidence actually be explained by different causes?
- - Are there overlooked pieces of counter-evidence?
- - Are there incorrect implicit assumptions?
-
- **Counter-evidence Weighting**: If counter-evidence based on direct quotes from the following sources exists, automatically lower that hypothesis's confidence to low:
+ ### Step 4: Path Coverage and Independent Failure Point Evaluation
+ - Check whether the mapped execution path adequately covers the observed symptom from entry to failure
+ - Identify uncovered boundaries or unverified nodes that could hide additional failure points
+ - Evaluate at least 2 additional path segments or boundaries beyond the investigator's original failure-point list
+ - Evaluate each failure point independently:
+ - Is the supporting evidence sufficient?
+ - Is there direct counter-evidence?
+ - Does another failure point better explain the same symptom?
+ - Add additional failure points if verification discovers them
+
+ **Counter-evidence Weighting**: If counter-evidence based on direct quotes from the following sources exists, automatically downgrade the affected failure point's verification status and reduce coverage confidence:
  - Official documentation
  - Language specifications
  - Official documentation of packages in use
 
- ### Step 6: Verification Level Determination and Consistency Verification
- Classify each hypothesis by the following levels:
+ ### Step 5: Verification Level Determination and Consistency Verification
+ Classify each failure point by the following levels:
 
  | Level | Definition |
  |-------|------------|
@@ -109,19 +106,19 @@ Classify each hypothesis by the following levels:
  - Example: "The implementation is wrong" → Was design_gap considered?
  - If inconsistent, explicitly note "Investigation focus may be misaligned with user report"
 
- **Conclusion**: Adopt unrefuted hypotheses as causes. When multiple causes exist, determine their relationship (independent/dependent/exclusive)
+ **Conclusion**: Adopt verified or plausible failure points as causes. When multiple failure points exist, preserve their relationship rather than forcing a single winner.
 
- ### Step 7: Return JSON Result
+ ### Step 6: Return JSON Result
 
  Return the JSON result as the final response. See Output Format for the schema.
 
- ## Confidence Determination Criteria
+ ## Coverage Determination Criteria
 
- | Confidence | Conditions |
- |------------|------------|
- | high | Direct evidence exists, no refutation, all alternative hypotheses refuted |
- | medium | Indirect evidence exists, no refutation, some alternative hypotheses remain |
- | low | Speculation level, or refutation exists, or many alternative hypotheses remain |
+ | Coverage | Conditions |
+ |----------|------------|
+ | sufficient | Direct evidence covers the relevant path, no major uncovered boundary remains |
+ | partial | Some indirect or incomplete evidence remains, but the main path is usable |
+ | insufficient | Critical path segments remain speculative or materially unverified |
 
  ## Output Format
 
@@ -130,15 +127,15 @@ Return the JSON result as the final response. See Output Format for the schema.
  ```json
  {
  "investigationReview": {
- "originalHypothesesCount": 3,
- "coverageAssessment": "Investigation coverage evaluation",
+ "originalFailurePointCount": 3,
+ "coverageAssessment": "sufficient|partial|insufficient",
  "identifiedGaps": ["Perspectives overlooked in investigation"]
  },
  "triangulationSupplements": [
  {
  "source": "Additional information source investigated",
  "findings": "Content discovered",
- "impactOnHypotheses": "Impact on existing hypotheses"
+ "impactOnFailurePoints": "Impact on existing failure points"
  }
  ],
  "scopeValidation": {
@@ -150,42 +147,45 @@ Return the JSON result as the final response. See Output Format for the schema.
  "query": "Search query used",
  "source": "Information source",
  "findings": "Related information discovered",
- "impactOnHypotheses": "Impact on hypotheses"
+ "impactOnFailurePoints": "Impact on failure points"
  }
  ],
- "alternativeHypotheses": [
+ "additionalFailurePoints": [
  {
- "id": "AH1",
- "description": "Alternative hypothesis description",
- "rationale": "Why this hypothesis was considered",
+ "id": "AFP1",
+ "description": "Additional failure point description",
+ "rationale": "Why this failure point was considered",
  "evidence": {"supporting": [], "contradicting": []},
  "plausibility": "high|medium|low"
  }
  ],
- "devilsAdvocateFindings": [
+ "pathCoverageFindings": [
  {
- "targetHypothesis": "Hypothesis ID being verified",
- "alternativeExplanation": "Possible alternative explanation",
- "hiddenAssumptions": ["Implicit assumptions"],
- "potentialCounterEvidence": ["Potentially overlooked counter-evidence"]
+ "nodeId": "N1",
+ "status": "covered|partially_covered|uncovered",
+ "findings": "Coverage finding",
+ "followUpNeeded": ["Needed follow-up"]
  }
  ],
- "hypothesesEvaluation": [
+ "failurePointsEvaluation": [
  {
- "hypothesisId": "H1 or AH1",
- "description": "Hypothesis description",
+ "failurePointId": "FP1 or AFP1",
+ "description": "Failure point description",
  "verificationLevel": "speculation|indirect|direct|verified",
  "refutationStatus": "unrefuted|partially_refuted|refuted",
  "remainingUncertainty": ["Remaining uncertainty"]
  }
  ],
  "conclusion": {
- "causes": [
- {"hypothesisId": "H1", "status": "confirmed|probable|possible"}
+ "confirmedFailurePoints": [
+ {"failurePointId": "FP1", "status": "confirmed|probable|possible", "originalCheckStatus": "retained|added_by_verifier|null"}
+ ],
+ "failurePointRelationships": [
+ {"from": "FP1", "to": "FP2", "relationship": "independent|upstream_of|downstream_of|amplifies|same_boundary"}
  ],
- "causesRelationship": "independent|dependent|exclusive",
- "confidence": "high|medium|low",
- "confidenceRationale": "Rationale for confidence level",
+ "finalStatus": "ready_for_solution|needs_more_investigation",
+ "coverageAssessment": "sufficient|partial|insufficient",
+ "statusRationale": "Rationale for status and coverage level",
  "recommendedVerification": ["Additional verification needed to confirm conclusion"]
  },
  "verificationLimitations": ["Limitations of this verification process"]
@@ -196,22 +196,23 @@ Return the JSON result as the final response. See Output Format for the schema.
 
  - [ ] Performed Triangulation supplementation and collected additional information
  - [ ] Collected external information via web search
- - [ ] Generated at least 3 alternative hypotheses
- - [ ] Performed Devil's Advocate evaluation on major hypotheses
- - [ ] Lowered confidence for hypotheses with official documentation-based counter-evidence
+ - [ ] Checked path coverage and recorded uncovered areas
+ - [ ] Evaluated at least 2 additional path segments or boundaries beyond the investigator's original failure-point list
+ - [ ] Evaluated each failure point independently
+ - [ ] Lowered verification strength for failure points with official documentation-based counter-evidence
  - [ ] Verified consistency with user report
- - [ ] Determined verification level for each hypothesis
- - [ ] Adopted unrefuted hypotheses as causes and determined relationship when multiple
+ - [ ] Determined verification level for each failure point
+ - [ ] Preserved multiple valid failure points and their relationships when present
  - [ ] Final response is the JSON output
 
  ## Output Self-Check
- - [ ] Confidence levels reflect all discovered evidence, including official documentation
+ - [ ] Final status and coverage assessment reflect all discovered evidence, including official documentation
  - [ ] User's causal relationship hints are incorporated into the verification
 
  ## Completion Gate [BLOCKING]
 
  ☐ All completion criteria met with evidence
- ☐ Output format validated (JSON with conclusion and confidence)
+ ☐ Output format validated (JSON with conclusion and coverage assessment)
  ☐ Quality standards satisfied (all self-check items verified)
 
  **ENFORCEMENT**: HALT if any gate unchecked. Return incomplete status to caller.
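
The verifier hunks above replace the confidence-based `conclusion` with a status-and-coverage shape. A consumer of that JSON might check it roughly as follows. This is a sketch under stated assumptions: the function name `check_conclusion` is hypothetical, the enumerated values are copied from the 0.4.8 schema lines in the diff, and rejecting the removed 0.4.7 fields is an illustrative choice, not documented package behaviour.

```python
# Enumerations copied from the 0.4.8 output schema shown in the diff.
COVERAGE = {"sufficient", "partial", "insufficient"}
FINAL_STATUS = {"ready_for_solution", "needs_more_investigation"}
FP_STATUS = {"confirmed", "probable", "possible"}
RELATIONSHIPS = {"independent", "upstream_of", "downstream_of", "amplifies", "same_boundary"}
# Fields that the diff removes from `conclusion` in 0.4.8.
REMOVED_FIELDS = ("causes", "causesRelationship", "confidence", "confidenceRationale")


def check_conclusion(conclusion: dict) -> list[str]:
    """Return the names of fields that do not match the 0.4.8 shape (hypothetical helper)."""
    errors = []
    if conclusion.get("finalStatus") not in FINAL_STATUS:
        errors.append("finalStatus")
    if conclusion.get("coverageAssessment") not in COVERAGE:
        errors.append("coverageAssessment")
    for fp in conclusion.get("confirmedFailurePoints", []):
        if fp.get("status") not in FP_STATUS:
            errors.append(f"status of {fp.get('failurePointId')}")
    for rel in conclusion.get("failurePointRelationships", []):
        if rel.get("relationship") not in RELATIONSHIPS:
            errors.append(f"relationship {rel.get('from')}->{rel.get('to')}")
    # Assumption: a strict consumer treats leftover 0.4.7 fields as errors.
    for old in REMOVED_FIELDS:
        if old in conclusion:
            errors.append(f"removed field: {old}")
    return errors


sample = {
    "confirmedFailurePoints": [
        {"failurePointId": "FP1", "status": "probable", "originalCheckStatus": "retained"}
    ],
    "failurePointRelationships": [
        {"from": "FP1", "to": "FP2", "relationship": "upstream_of"}
    ],
    "finalStatus": "ready_for_solution",
    "coverageAssessment": "partial",
    "statusRationale": "example",
}
legacy = {"causes": [], "confidence": "high"}
```

The diff does not state how `coverageAssessment` maps to `finalStatus`, so the sketch validates each field on its own and leaves that decision to the agent.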
@@ -53,6 +53,8 @@ Skill Status:
  - **prd** (optional): Path to PRD document
  - **adr** (optional): Path to ADR document
  - **testSkeletons** (optional): Paths to integration/E2E test skeleton files from acceptance-test-generator
+ - `generatedFiles.e2e` may be `null` when no E2E skeleton is intentionally generated
+ - When provided, carry `e2eAbsenceReason` into the work plan and treat it as an explicit planning input
  - **updateContext** (update mode only): Path to existing plan, reason for changes
 
  ## Workflow
@@ -173,13 +175,13 @@ Gradually ensure quality based on Design Doc acceptance criteria.
  **Processing when test skeleton file paths provided from previous process**:
 
  #### Step 1: Read Test Skeleton Files (Required)
- Read test skeleton files (integration tests, E2E tests) and extract meta information from comments.
+ Read available test skeleton files (integration tests, and E2E tests only when present) and extract meta information from comments.
 
  **Comment patterns to extract**:
  - `// @category:` → Test classification (core-functionality, edge-case, e2e, etc.)
  - `// @dependency:` → Dependent components (material for phase placement decisions)
  - `// @complexity:` → Complexity (high/medium/low, material for effort estimation)
- - `// ROI:` → Priority judgment
+ - `// Value Score:` → Priority judgment
 
  #### Step 2: Reflect Meta Information in Work Plan
 
@@ -211,13 +213,24 @@ When E2E test skeletons are provided, first identify the E2E skeleton subset usi
 
  Place these setup tasks before implementation and annotate them as E2E setup work.
 
+ #### Step 3a: E2E Absence Handling
+
+ When `generatedFiles.e2e` is `null`:
+ - Require `e2eAbsenceReason` from the generator output
+ - Record the absence reason in the work plan header
+ - Skip E2E prerequisite extraction and E2E execution task creation
+ - Accept the null E2E file as a valid planning input when a concrete `e2eAbsenceReason` is present
+
+ When `generatedFiles.e2e` is `null` and `e2eAbsenceReason` is missing:
+ - Flag a planning gap for user confirmation before plan approval
+
  #### Step 4: Classify and Place Tests
 
  **Test Classification**:
  - Setup items (Mock preparation, measurement tools, Helpers, etc.) → Prioritize in Phase 1
  - Unit tests (individual functions) → Start from Phase 0 with Red-Green-Refactor
  - Integration tests → Place as create/execute tasks when relevant feature implementation is complete
- - E2E tests → Place as execute-only tasks in final phase
+ - E2E tests → Place as execute-only tasks in final phase when an E2E skeleton exists
  - Non-functional requirement tests (performance, UX, etc.) → Place in quality assurance phase
236
 
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "codex-workflows",
- "version": "0.4.7",
+ "version": "0.4.8",
  "description": "Task-oriented agentic coding framework for OpenAI Codex CLI — skills, recipes, and subagents for structured development workflows",
  "license": "MIT",
  "author": "Shinsuke Kagawa",