codex-workflows 0.4.7 → 0.4.8
- package/.agents/skills/integration-e2e-testing/SKILL.md +45 -13
- package/.agents/skills/integration-e2e-testing/agents/openai.yaml +1 -1
- package/.agents/skills/integration-e2e-testing/references/e2e-design.md +7 -4
- package/.agents/skills/recipe-add-integration-tests/SKILL.md +6 -3
- package/.agents/skills/recipe-build/SKILL.md +6 -2
- package/.agents/skills/recipe-diagnose/SKILL.md +24 -23
- package/.agents/skills/recipe-front-build/SKILL.md +6 -2
- package/.agents/skills/recipe-front-plan/SKILL.md +1 -1
- package/.agents/skills/recipe-fullstack-build/SKILL.md +6 -2
- package/.agents/skills/recipe-fullstack-implement/SKILL.md +6 -4
- package/.agents/skills/recipe-implement/SKILL.md +9 -4
- package/.agents/skills/recipe-plan/SKILL.md +2 -1
- package/.agents/skills/recipe-update-doc/SKILL.md +1 -1
- package/.agents/skills/subagents-orchestration-guide/SKILL.md +9 -6
- package/.agents/skills/task-analyzer/references/skills-index.yaml +2 -2
- package/.agents/skills/testing/references/typescript.md +1 -1
- package/.codex/agents/acceptance-test-generator.toml +49 -26
- package/.codex/agents/code-verifier.toml +3 -1
- package/.codex/agents/investigator.toml +46 -18
- package/.codex/agents/quality-fixer-frontend.toml +54 -8
- package/.codex/agents/quality-fixer.toml +55 -8
- package/.codex/agents/solver.toml +29 -25
- package/.codex/agents/technical-designer-frontend.toml +9 -2
- package/.codex/agents/technical-designer.toml +9 -2
- package/.codex/agents/verifier.toml +61 -60
- package/.codex/agents/work-planner.toml +16 -3
- package/package.json +1 -1
@@ -220,6 +220,13 @@ When a UI Spec exists for the feature (`docs/ui-spec/{feature-name}-ui-spec.md`)
 - Path to existing document
 - Reason for changes
 - Sections needing updates
+- Before editing changed sections, build a Dependency Inventory for identifiers referenced by the update
+- Dependency Inventory output format:
+- `identifier`: exact literal identifier
+- `source`: codebase | accepted_adr | external
+- `status`: verified_existing | requires_new_creation | external_dependency
+- `action`: keep | update_document | create_dependency | confirm_external_reference
+- In update mode, cross-check prerequisite ADR references against Accepted ADRs only. Cross-Design-Doc consistency is handled by design-sync after the update
 
 - **Reverse-Engineer Context** (reverse-engineer mode only):
 - Primary Files
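The Dependency Inventory format in the hunk above lists four fields, each with a closed set of values. The following is a minimal TypeScript sketch of a hypothetical consumer of that output. The type names, the `checkEntry` helper, the status-to-action pairing in `allowedActions`, and the sample identifiers are all illustrative. None of them are defined by codex-workflows.

```typescript
// Hypothetical typing of one Dependency Inventory entry (field values copied from the prompt).
type DependencySource = "codebase" | "accepted_adr" | "external";
type DependencyStatus = "verified_existing" | "requires_new_creation" | "external_dependency";
type DependencyAction = "keep" | "update_document" | "create_dependency" | "confirm_external_reference";

interface DependencyInventoryEntry {
  identifier: string; // exact literal identifier as it appears in code or the ADR
  source: DependencySource;
  status: DependencyStatus;
  action: DependencyAction;
}

// ASSUMPTION: which actions make sense for each status. The prompt only enumerates values.
const allowedActions: Record<DependencyStatus, DependencyAction[]> = {
  verified_existing: ["keep", "update_document"],
  requires_new_creation: ["create_dependency"],
  external_dependency: ["confirm_external_reference"],
};

// Returns human-readable problems; an empty array means the entry looks consistent.
function checkEntry(entry: DependencyInventoryEntry): string[] {
  const problems: string[] = [];
  if (entry.identifier.trim() === "") {
    problems.push("identifier must be an exact literal, not empty");
  }
  if (!allowedActions[entry.status].includes(entry.action)) {
    problems.push(`action "${entry.action}" is unusual for status "${entry.status}"`);
  }
  if (entry.source === "external" && entry.status !== "external_dependency") {
    problems.push("external sources are expected to have status external_dependency");
  }
  return problems;
}

// Hypothetical sample inventory.
const inventory: DependencyInventoryEntry[] = [
  { identifier: "UserRepository.findById", source: "codebase", status: "verified_existing", action: "keep" },
  { identifier: "AuditLogWriter", source: "accepted_adr", status: "requires_new_creation", action: "create_dependency" },
];

for (const entry of inventory) {
  console.log(entry.identifier, checkEntry(entry));
}
```

The pairing table is the main design choice in this sketch. The prompt only lists the allowed values, so a stricter or looser mapping would be equally defensible.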
@@ -309,14 +316,14 @@ Cover happy path, unhappy path, and edge cases including empty and loading state
 
 ### AC Scoping for Autonomous Implementation (Frontend)
 
-**Include** (High automation
+**Include** (High automation value):
 - User interaction behavior (button clicks, form submissions, navigation)
 - Rendering correctness (component displays correct data)
 - State management behavior (state updates correctly on user actions)
 - Error handling behavior (error messages displayed to user)
 - Accessibility (keyboard navigation, screen reader support)
 
-**Exclude** (Low
+**Exclude** (Low automation value in LLM/CI/CD environment):
 - External API real connections → Use MSW for API mocking instead
 - Performance metrics → Non-deterministic in CI environment
 - Implementation details → Focus on user-observable behavior
@@ -252,6 +252,13 @@ Confirm and document conflicts with existing systems at each integration point t
 - Path to existing document
 - Reason for changes
 - Sections needing updates
+- Before editing changed sections, build a Dependency Inventory for identifiers referenced by the update
+- Dependency Inventory output format:
+- `identifier`: exact literal identifier
+- `source`: codebase | accepted_adr | external
+- `status`: verified_existing | requires_new_creation | external_dependency
+- `action`: keep | update_document | create_dependency | confirm_external_reference
+- In update mode, cross-check prerequisite ADR references against Accepted ADRs only. Cross-Design-Doc consistency is handled by design-sync after the update
 
 - **Reverse-Engineer Context** (reverse-engineer mode only):
 - Primary Files
@@ -338,13 +345,13 @@ Cover happy path, unhappy path, and edge cases. Place important criteria first.
 
 ### AC Scoping for Autonomous Implementation
 
-**Include** (High automation
+**Include** (High automation value):
 - Business logic correctness (calculations, state transitions, data transformations)
 - Data integrity and persistence behavior
 - User-visible functionality completeness
 - Error handling behavior (what user sees/experiences)
 
-**Exclude** (Low
+**Exclude** (Low automation value in LLM/CI/CD environment):
 - External service real connections → Use contract/interface verification instead
 - Performance metrics → Non-deterministic in CI, defer to load testing
 - Implementation details (technology choice, algorithms, internal structure) → Focus on observable behavior
@@ -1,5 +1,5 @@
 name = "verifier"
-description = "Critically evaluates investigation results using
+description = "Critically evaluates investigation results using path coverage and independent failure-point verification."
 sandbox_mode = "read-only"
 
 developer_instructions = """
@@ -37,7 +37,7 @@ Skill Status:
 ## Input and Responsibility Boundaries
 
 - **Input**: Structured investigation results (JSON) or text format investigation results
-- **Text format**: Extract
+- **Text format**: Extract candidate failure points and evidence for internal structuring. Verify within extractable scope
 - **No investigation results**: Mark as "No prior investigation" and attempt verification within input information scope
 - **Out of scope**: From-scratch information collection and solution proposals are handled by other agents
 
@@ -51,13 +51,14 @@ Solution derivation is out of scope for this agent.
 ### Step 1: Investigation Results Verification Preparation
 
 **For JSON format**:
-- Check
+- Check execution-path data from `pathMap`
+- Check failure-point list from `failurePoints`
 - Understand evidence matrix from `supportingEvidence`/`contradictingEvidence`
 - Grasp unexplored areas from `unexploredAreas`
 
 **For text format**:
-- Extract and list
-- Organize supporting/contradicting evidence for each
+- Extract and list failure-point-related descriptions
+- Organize supporting/contradicting evidence for each failure point
 - Grasp areas explicitly marked as uninvestigated
 
 **impactAnalysis Validity Check**:
@@ -68,34 +69,30 @@ Identify which source types are missing from `investigationSources`, then invest
 
 If all source types were already covered, investigate a different code area or configuration path than the original investigation.
 
-Record each supplementary finding and its impact on the existing
+Record each supplementary finding and its impact on the existing failure points or path coverage.
 
 ### Step 3: External Information Reinforcement (web search)
-- Official information about
+- Official information about failure points found in investigation
 - Similar problem reports and resolution cases
 - Technical documentation not referenced in investigation
 
-### Step 4:
-
--
--
--
-
-
-
-
-
--
-- Are there overlooked pieces of counter-evidence?
-- Are there incorrect implicit assumptions?
-
-**Counter-evidence Weighting**: If counter-evidence based on direct quotes from the following sources exists, automatically lower that hypothesis's confidence to low:
+### Step 4: Path Coverage and Independent Failure Point Evaluation
+- Check whether the mapped execution path adequately covers the observed symptom from entry to failure
+- Identify uncovered boundaries or unverified nodes that could hide additional failure points
+- Evaluate at least 2 additional path segments or boundaries beyond the investigator's original failure-point list
+- Evaluate each failure point independently:
+- Is the supporting evidence sufficient?
+- Is there direct counter-evidence?
+- Does another failure point better explain the same symptom?
+- Add additional failure points if verification discovers them
+
+**Counter-evidence Weighting**: If counter-evidence based on direct quotes from the following sources exists, automatically downgrade the affected failure point's verification status and reduce coverage confidence:
 - Official documentation
 - Language specifications
 - Official documentation of packages in use
 
-### Step
-Classify each hypothesis by the following levels:
+### Step 5: Verification Level Determination and Consistency Verification
+Classify each failure point by the following levels:
 
 | Level | Definition |
 |-------|------------|
@@ -109,19 +106,19 @@ Classify each hypothesis by the following levels:
 - Example: "The implementation is wrong" → Was design_gap considered?
 - If inconsistent, explicitly note "Investigation focus may be misaligned with user report"
 
-**Conclusion**: Adopt
+**Conclusion**: Adopt verified or plausible failure points as causes. When multiple failure points exist, preserve their relationship rather than forcing a single winner.
 
-### Step
+### Step 6: Return JSON Result
 
 Return the JSON result as the final response. See Output Format for the schema.
 
-##
+## Coverage Determination Criteria
 
-
-
-
-
-
+| Coverage | Conditions |
+|----------|------------|
+| sufficient | Direct evidence covers the relevant path, no major uncovered boundary remains |
+| partial | Some indirect or incomplete evidence remains, but the main path is usable |
+| insufficient | Critical path segments remain speculative or materially unverified |
 
 ## Output Format
 
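The new Coverage Determination Criteria table is qualitative. The following TypeScript sketch shows one way to make it mechanical. It assumes that each `pathCoverageFindings` node also carries a hypothetical `critical` flag, which is not part of the schema. The thresholds below are one illustrative reading of the table, not the verifier's defined behavior.

```typescript
// Node statuses and coverage levels copied from the verifier schema.
type NodeStatus = "covered" | "partially_covered" | "uncovered";
type Coverage = "sufficient" | "partial" | "insufficient";

interface PathNodeFinding {
  nodeId: string;
  status: NodeStatus;
  critical: boolean; // ASSUMPTION: marks nodes on the entry-to-failure path
}

function assessCoverage(findings: PathNodeFinding[]): Coverage {
  // Nothing mapped means nothing was verified.
  if (findings.length === 0) return "insufficient";
  // A critical segment left uncovered keeps the path speculative.
  if (findings.some((f) => f.critical && f.status === "uncovered")) return "insufficient";
  // Every node covered: no major uncovered boundary remains.
  if (findings.every((f) => f.status === "covered")) return "sufficient";
  // Otherwise the main path is usable but some evidence is incomplete.
  return "partial";
}

const example: PathNodeFinding[] = [
  { nodeId: "N1", status: "covered", critical: true },
  { nodeId: "N2", status: "partially_covered", critical: true },
  { nodeId: "N3", status: "uncovered", critical: false },
];
console.log(assessCoverage(example)); // → partial
```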
@@ -130,15 +127,15 @@ Return the JSON result as the final response. See Output Format for the schema.
 ```json
 {
 "investigationReview": {
-"
-"coverageAssessment": "
+"originalFailurePointCount": 3,
+"coverageAssessment": "sufficient|partial|insufficient",
 "identifiedGaps": ["Perspectives overlooked in investigation"]
 },
 "triangulationSupplements": [
 {
 "source": "Additional information source investigated",
 "findings": "Content discovered",
-"
+"impactOnFailurePoints": "Impact on existing failure points"
 }
 ],
 "scopeValidation": {
@@ -150,42 +147,45 @@ Return the JSON result as the final response. See Output Format for the schema.
 "query": "Search query used",
 "source": "Information source",
 "findings": "Related information discovered",
-"
+"impactOnFailurePoints": "Impact on failure points"
 }
 ],
-"
+"additionalFailurePoints": [
 {
-"id": "
-"description": "
-"rationale": "Why this
+"id": "AFP1",
+"description": "Additional failure point description",
+"rationale": "Why this failure point was considered",
 "evidence": {"supporting": [], "contradicting": []},
 "plausibility": "high|medium|low"
 }
 ],
-"
+"pathCoverageFindings": [
 {
-"
-"
-"
-"
+"nodeId": "N1",
+"status": "covered|partially_covered|uncovered",
+"findings": "Coverage finding",
+"followUpNeeded": ["Needed follow-up"]
 }
 ],
-"
+"failurePointsEvaluation": [
 {
-"
-"description": "
+"failurePointId": "FP1 or AFP1",
+"description": "Failure point description",
 "verificationLevel": "speculation|indirect|direct|verified",
 "refutationStatus": "unrefuted|partially_refuted|refuted",
 "remainingUncertainty": ["Remaining uncertainty"]
 }
 ],
 "conclusion": {
-"
-{"
+"confirmedFailurePoints": [
+{"failurePointId": "FP1", "status": "confirmed|probable|possible", "originalCheckStatus": "retained|added_by_verifier|null"}
+],
+"failurePointRelationships": [
+{"from": "FP1", "to": "FP2", "relationship": "independent|upstream_of|downstream_of|amplifies|same_boundary"}
 ],
-"
-"
-"
+"finalStatus": "ready_for_solution|needs_more_investigation",
+"coverageAssessment": "sufficient|partial|insufficient",
+"statusRationale": "Rationale for status and coverage level",
 "recommendedVerification": ["Additional verification needed to confirm conclusion"]
 },
 "verificationLimitations": ["Limitations of this verification process"]
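The verifier's Output Format hunks above replace hypothesis fields with failure-point fields. The following partial TypeScript sketch covers only the fields visible in the diff; `scopeValidation` and the other elided sections are omitted. It makes two assumptions: placeholders such as `"sufficient|partial|insufficient"` are read as string unions, and `null` in `originalCheckStatus` is read as JSON `null`. The sketch also includes a hypothetical `findReferenceErrors` helper that checks whether conclusion entries point at evaluated failure points. The sample data is invented.

```typescript
type Coverage = "sufficient" | "partial" | "insufficient";
type Relationship = "independent" | "upstream_of" | "downstream_of" | "amplifies" | "same_boundary";

interface FailurePointEvaluation {
  failurePointId: string; // "FP1" from the investigator, or "AFP1" added by the verifier
  description: string;
  verificationLevel: "speculation" | "indirect" | "direct" | "verified";
  refutationStatus: "unrefuted" | "partially_refuted" | "refuted";
  remainingUncertainty: string[];
}

// Partial view of the verifier output: only fields shown in the diff.
interface VerifierOutput {
  investigationReview: { originalFailurePointCount: number; coverageAssessment: Coverage; identifiedGaps: string[] };
  failurePointsEvaluation: FailurePointEvaluation[];
  conclusion: {
    confirmedFailurePoints: {
      failurePointId: string;
      status: "confirmed" | "probable" | "possible";
      originalCheckStatus: "retained" | "added_by_verifier" | null;
    }[];
    failurePointRelationships: { from: string; to: string; relationship: Relationship }[];
    finalStatus: "ready_for_solution" | "needs_more_investigation";
    coverageAssessment: Coverage;
    statusRationale: string;
    recommendedVerification: string[];
  };
  verificationLimitations: string[];
}

function findReferenceErrors(out: VerifierOutput): string[] {
  const known = new Set(out.failurePointsEvaluation.map((e) => e.failurePointId));
  const errors: string[] = [];
  for (const c of out.conclusion.confirmedFailurePoints) {
    if (!known.has(c.failurePointId)) errors.push(`confirmed ${c.failurePointId} was never evaluated`);
  }
  for (const r of out.conclusion.failurePointRelationships) {
    for (const id of [r.from, r.to]) {
      if (!known.has(id)) errors.push(`relationship endpoint ${id} was never evaluated`);
    }
  }
  // ASSUMED rule, not stated in the prompt: insufficient coverage should not be reported as ready.
  if (out.conclusion.finalStatus === "ready_for_solution" && out.conclusion.coverageAssessment === "insufficient") {
    errors.push("ready_for_solution conflicts with insufficient coverage");
  }
  return errors;
}

const sample: VerifierOutput = {
  investigationReview: { originalFailurePointCount: 1, coverageAssessment: "partial", identifiedGaps: [] },
  failurePointsEvaluation: [
    { failurePointId: "FP1", description: "Cache key ignores locale", verificationLevel: "direct", refutationStatus: "unrefuted", remainingUncertainty: [] },
    { failurePointId: "AFP1", description: "Retry amplifies stale reads", verificationLevel: "indirect", refutationStatus: "unrefuted", remainingUncertainty: ["No production trace"] },
  ],
  conclusion: {
    confirmedFailurePoints: [
      { failurePointId: "FP1", status: "confirmed", originalCheckStatus: "retained" },
      { failurePointId: "AFP1", status: "probable", originalCheckStatus: "added_by_verifier" },
    ],
    failurePointRelationships: [{ from: "AFP1", to: "FP1", relationship: "amplifies" }],
    finalStatus: "ready_for_solution",
    coverageAssessment: "partial",
    statusRationale: "Main path verified; retry path only indirectly evidenced",
    recommendedVerification: [],
  },
  verificationLimitations: [],
};

console.log(findReferenceErrors(sample)); // → []
```

Keeping multiple failure points plus an explicit relationship list, rather than a single winning hypothesis, is what makes this kind of cross-reference check worthwhile.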
@@ -196,22 +196,23 @@ Return the JSON result as the final response. See Output Format for the schema.
 
 - [ ] Performed Triangulation supplementation and collected additional information
 - [ ] Collected external information via web search
-- [ ]
-- [ ]
-- [ ]
+- [ ] Checked path coverage and recorded uncovered areas
+- [ ] Evaluated at least 2 additional path segments or boundaries beyond the investigator's original failure-point list
+- [ ] Evaluated each failure point independently
+- [ ] Lowered verification strength for failure points with official documentation-based counter-evidence
 - [ ] Verified consistency with user report
-- [ ] Determined verification level for each
-- [ ]
+- [ ] Determined verification level for each failure point
+- [ ] Preserved multiple valid failure points and their relationships when present
 - [ ] Final response is the JSON output
 
 ## Output Self-Check
-- [ ]
+- [ ] Final status and coverage assessment reflect all discovered evidence, including official documentation
 - [ ] User's causal relationship hints are incorporated into the verification
 
 ## Completion Gate [BLOCKING]
 
 ☐ All completion criteria met with evidence
-☐ Output format validated (JSON with conclusion and
+☐ Output format validated (JSON with conclusion and coverage assessment)
 ☐ Quality standards satisfied (all self-check items verified)
 
 **ENFORCEMENT**: HALT if any gate unchecked. Return incomplete status to caller.
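The Completion Gate's enforcement rule says to halt on any unchecked gate and return an incomplete status. That rule can be sketched as a pure TypeScript function. The `GateCheck` shape and the `enforceCompletionGate` name are hypothetical. The gate labels are shortened from the hunk above.

```typescript
interface GateCheck {
  name: string;
  passed: boolean;
}

// Either every gate passed, or the caller gets the list of unchecked gates.
type GateResult = { status: "complete" } | { status: "incomplete"; unchecked: string[] };

function enforceCompletionGate(checks: GateCheck[]): GateResult {
  const unchecked = checks.filter((c) => !c.passed).map((c) => c.name);
  return unchecked.length === 0 ? { status: "complete" } : { status: "incomplete", unchecked };
}

const gates: GateCheck[] = [
  { name: "All completion criteria met with evidence", passed: true },
  { name: "Output format validated", passed: false },
  { name: "Quality standards satisfied", passed: true },
];

const result = enforceCompletionGate(gates);
if (result.status === "incomplete") {
  // HALT: report the incomplete status instead of returning a final answer.
  console.log(`HALT: unchecked gates -> ${result.unchecked.join(", ")}`);
}
```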
@@ -53,6 +53,8 @@ Skill Status:
 - **prd** (optional): Path to PRD document
 - **adr** (optional): Path to ADR document
 - **testSkeletons** (optional): Paths to integration/E2E test skeleton files from acceptance-test-generator
+- `generatedFiles.e2e` may be `null` when no E2E skeleton is intentionally generated
+- When provided, carry `e2eAbsenceReason` into the work plan and treat it as an explicit planning input
 - **updateContext** (update mode only): Path to existing plan, reason for changes
 
 ## Workflow
@@ -173,13 +175,13 @@ Gradually ensure quality based on Design Doc acceptance criteria.
 **Processing when test skeleton file paths provided from previous process**:
 
 #### Step 1: Read Test Skeleton Files (Required)
-Read test skeleton files (integration tests, E2E tests) and extract meta information from comments.
+Read available test skeleton files (integration tests, and E2E tests only when present) and extract meta information from comments.
 
 **Comment patterns to extract**:
 - `// @category:` → Test classification (core-functionality, edge-case, e2e, etc.)
 - `// @dependency:` → Dependent components (material for phase placement decisions)
 - `// @complexity:` → Complexity (high/medium/low, material for effort estimation)
-- `//
+- `// Value Score:` → Priority judgment
 
 #### Step 2: Reflect Meta Information in Work Plan
 
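The comment patterns listed under Step 1 can be extracted with simple line matching. This TypeScript sketch assumes two things: each tag sits on its own `//` line, and its value runs to the end of that line. The `extractMetaComments` helper and the sample skeleton, including its `Value Score` number, are hypothetical.

```typescript
type MetaKey = "category" | "dependency" | "complexity" | "valueScore";

interface MetaComment {
  key: MetaKey;
  value: string;
  line: number; // 1-based line number in the skeleton file
}

// Patterns mirror the list above; tolerance for extra spaces after "//" is an assumption.
const patterns: [MetaKey, RegExp][] = [
  ["category", /^\s*\/\/\s*@category:\s*(.+)$/],
  ["dependency", /^\s*\/\/\s*@dependency:\s*(.+)$/],
  ["complexity", /^\s*\/\/\s*@complexity:\s*(.+)$/],
  ["valueScore", /^\s*\/\/\s*Value Score:\s*(.+)$/],
];

function extractMetaComments(source: string): MetaComment[] {
  const found: MetaComment[] = [];
  source.split(/\r?\n/).forEach((text, index) => {
    for (const [key, pattern] of patterns) {
      const match = pattern.exec(text);
      if (match) {
        found.push({ key, value: match[1].trim(), line: index + 1 });
        break; // assume a line carries at most one meta tag
      }
    }
  });
  return found;
}

// Hypothetical skeleton content.
const skeleton = [
  "// @category: core-functionality",
  "// @dependency: OrderService, PaymentGateway",
  "// @complexity: high",
  "// Value Score: 72",
  "it.todo('charges the card once per order')",
].join("\n");

for (const meta of extractMetaComments(skeleton)) {
  console.log(`${meta.line}: ${meta.key} = ${meta.value}`);
}
```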
@@ -211,13 +213,24 @@ When E2E test skeletons are provided, first identify the E2E skeleton subset usi
 
 Place these setup tasks before implementation and annotate them as E2E setup work.
 
+#### Step 3a: E2E Absence Handling
+
+When `generatedFiles.e2e` is `null`:
+- Require `e2eAbsenceReason` from the generator output
+- Record the absence reason in the work plan header
+- Skip E2E prerequisite extraction and E2E execution task creation
+- Accept the null E2E file as a valid planning input when a concrete `e2eAbsenceReason` is present
+
+When `generatedFiles.e2e` is `null` and `e2eAbsenceReason` is missing:
+- Flag a planning gap for user confirmation before plan approval
+
 #### Step 4: Classify and Place Tests
 
 **Test Classification**:
 - Setup items (Mock preparation, measurement tools, Helpers, etc.) → Prioritize in Phase 1
 - Unit tests (individual functions) → Start from Phase 0 with Red-Green-Refactor
 - Integration tests → Place as create/execute tasks when relevant feature implementation is complete
-- E2E tests → Place as execute-only tasks in final phase
+- E2E tests → Place as execute-only tasks in final phase when an E2E skeleton exists
 - Non-functional requirement tests (performance, UX, etc.) → Place in quality assurance phase
 - Risk levels ("high risk", "required", etc.) → Move to earlier phases
 
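Step 3a reduces to a three-way decision on `generatedFiles.e2e` and `e2eAbsenceReason`. The following TypeScript sketch models a generator output object reduced to those two fields. It treats any non-blank reason as "concrete", a simplification the prompt does not define. The `decideE2E` function and the decision kinds are hypothetical names.

```typescript
// Only the two fields the work-planner prompt mentions; other generator fields are omitted.
interface GeneratorOutput {
  generatedFiles: { e2e: string | null };
  e2eAbsenceReason?: string;
}

type E2EPlanDecision =
  | { kind: "plan_e2e"; skeletonPath: string }
  | { kind: "skip_e2e"; headerNote: string }
  | { kind: "planning_gap"; question: string };

function decideE2E(output: GeneratorOutput): E2EPlanDecision {
  const e2e = output.generatedFiles.e2e;
  if (e2e !== null) {
    // A skeleton exists: E2E execute-only tasks go in the final phase as before.
    return { kind: "plan_e2e", skeletonPath: e2e };
  }
  const reason = output.e2eAbsenceReason?.trim();
  if (reason) {
    // Valid planning input: record the reason in the plan header and create no E2E tasks.
    return { kind: "skip_e2e", headerNote: `E2E skeleton intentionally absent: ${reason}` };
  }
  // null E2E file without a reason: ask the user before the plan is approved.
  return { kind: "planning_gap", question: "No E2E skeleton and no e2eAbsenceReason; confirm before approving the plan." };
}

console.log(decideE2E({ generatedFiles: { e2e: null }, e2eAbsenceReason: "Backend-only change with no user-facing flow" }).kind); // → skip_e2e
console.log(decideE2E({ generatedFiles: { e2e: null } }).kind); // → planning_gap
```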
package/package.json
@@ -1,6 +1,6 @@
 {
 "name": "codex-workflows",
-"version": "0.4.7",
+"version": "0.4.8",
 "description": "Task-oriented agentic coding framework for OpenAI Codex CLI — skills, recipes, and subagents for structured development workflows",
 "license": "MIT",
 "author": "Shinsuke Kagawa",