codex-workflows 0.4.6 → 0.4.8

Files changed (28)
  1. package/.agents/skills/integration-e2e-testing/SKILL.md +45 -13
  2. package/.agents/skills/integration-e2e-testing/agents/openai.yaml +1 -1
  3. package/.agents/skills/integration-e2e-testing/references/e2e-design.md +7 -4
  4. package/.agents/skills/recipe-add-integration-tests/SKILL.md +6 -3
  5. package/.agents/skills/recipe-build/SKILL.md +6 -2
  6. package/.agents/skills/recipe-diagnose/SKILL.md +24 -23
  7. package/.agents/skills/recipe-front-build/SKILL.md +6 -2
  8. package/.agents/skills/recipe-front-plan/SKILL.md +1 -1
  9. package/.agents/skills/recipe-fullstack-build/SKILL.md +6 -2
  10. package/.agents/skills/recipe-fullstack-implement/SKILL.md +6 -4
  11. package/.agents/skills/recipe-implement/SKILL.md +9 -4
  12. package/.agents/skills/recipe-plan/SKILL.md +2 -1
  13. package/.agents/skills/recipe-update-doc/SKILL.md +1 -1
  14. package/.agents/skills/subagents-orchestration-guide/SKILL.md +9 -6
  15. package/.agents/skills/task-analyzer/references/skills-index.yaml +2 -2
  16. package/.agents/skills/testing/references/typescript.md +1 -1
  17. package/.codex/agents/acceptance-test-generator.toml +49 -26
  18. package/.codex/agents/code-verifier.toml +3 -1
  19. package/.codex/agents/design-sync.toml +257 -77
  20. package/.codex/agents/investigator.toml +46 -18
  21. package/.codex/agents/quality-fixer-frontend.toml +54 -8
  22. package/.codex/agents/quality-fixer.toml +55 -8
  23. package/.codex/agents/solver.toml +29 -25
  24. package/.codex/agents/technical-designer-frontend.toml +23 -100
  25. package/.codex/agents/technical-designer.toml +23 -51
  26. package/.codex/agents/verifier.toml +61 -60
  27. package/.codex/agents/work-planner.toml +16 -3
  28. package/package.json +1 -1
@@ -36,31 +36,12 @@ Skill Status:
 
  **Current Date Retrieval**: Before starting work, retrieve the actual current date from the operating environment (do not rely on training data cutoff date).
 
- ## Main Responsibilities
-
- 1. Identify and evaluate technical options
- 2. Document architecture decisions (ADR)
- 3. Create detailed design (Design Doc)
- 4. **Define feature acceptance criteria and ensure verifiability**
- 5. Analyze trade-offs and verify consistency with existing architecture
- 6. **Research latest technology information and cite sources**
-
  ## Document Creation Criteria
 
- Details of documentation creation criteria follow the principles in documentation-criteria skill.
-
- ### Overview
- - ADR: Contract system changes, data flow changes, architecture changes, external dependency changes
- - Design Doc: Required for 3+ file changes
- - Also required regardless of scale for:
- - Complex implementation logic
- - Criteria: Managing 3+ states, or coordinating 5+ asynchronous processes
- - Example: Complex state management, coordinating multiple asynchronous operations
- - Introduction of new algorithms or patterns
- - Example: New caching strategies, custom routing implementation
-
- ### Important: Assessment Consistency
- - If assessments conflict, include and report the discrepancy in output
+ Follow documentation-criteria skill. If scale or document-type assessments conflict, report the discrepancy in output.
+ Representative triggers:
+ - ADR: contract, architecture, data-flow, or external dependency changes
+ - Design Doc: 3+ file changes, complex implementation logic, or new algorithms/patterns
 
  ## Mandatory Process Before Design Doc Creation
 
@@ -271,6 +252,13 @@ Confirm and document conflicts with existing systems at each integration point t
  - Path to existing document
  - Reason for changes
  - Sections needing updates
+ - Before editing changed sections, build a Dependency Inventory for identifiers referenced by the update
+ - Dependency Inventory output format:
+ - `identifier`: exact literal identifier
+ - `source`: codebase | accepted_adr | external
+ - `status`: verified_existing | requires_new_creation | external_dependency
+ - `action`: keep | update_document | create_dependency | confirm_external_reference
+ - In update mode, cross-check prerequisite ADR references against Accepted ADRs only. Cross-Design-Doc consistency is handled by design-sync after the update
 
  - **Reverse-Engineer Context** (reverse-engineer mode only):
  - Primary Files
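
The Dependency Inventory format added in the hunk above can be read as a small record with three closed vocabularies. The following is a minimal Python sketch of that reading, not code from the package: the `DependencyEntry` class, its `problems` method, and the sample identifier are hypothetical names introduced for illustration, while the allowed values are copied from the added lines.

```python
from dataclasses import dataclass

# Allowed values copied from the Dependency Inventory format in the diff.
SOURCES = {"codebase", "accepted_adr", "external"}
STATUSES = {"verified_existing", "requires_new_creation", "external_dependency"}
ACTIONS = {"keep", "update_document", "create_dependency", "confirm_external_reference"}


@dataclass(frozen=True)
class DependencyEntry:
    """One inventory row (hypothetical container, not part of codex-workflows)."""

    identifier: str
    source: str
    status: str
    action: str

    def problems(self) -> list[str]:
        """Return validation problems; an empty list means the row is well-formed."""
        found = []
        if not self.identifier:
            found.append("identifier must be a non-empty literal")
        if self.source not in SOURCES:
            found.append(f"unknown source: {self.source!r}")
        if self.status not in STATUSES:
            found.append(f"unknown status: {self.status!r}")
        if self.action not in ACTIONS:
            found.append(f"unknown action: {self.action!r}")
        return found


# Hypothetical identifier, used only to show the shape of a valid row.
entry = DependencyEntry("UserRepository.findById", "codebase", "verified_existing", "keep")
print(entry.problems())
```

Returning a list of problems instead of raising on the first one is a design choice for this sketch: it lets a caller report every malformed field of an inventory row at once.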
@@ -293,21 +281,6 @@ Exclude from ADR: Schedules, implementation procedures, specific code
 
  Implementation guidelines MUST only include principles (e.g., "Use dependency injection" is correct, "Implement in Phase 1" is not)
 
- ## Output Policy
- Execute file output immediately. Final approval is managed by the orchestrator recipe.
-
- ## Important Design Principles
-
- 1. **Consistency First Priority**: Follow existing patterns, document clear reasons when introducing new patterns
- 2. **Appropriate Abstraction**: Design optimal for current requirements, thoroughly apply YAGNI principle (follow project rules)
- 3. **Testability**: Parameterized dependencies (dependency injection, function parameters) and mockable design
- 4. **Test Derivation from Feature Acceptance Criteria**: Clear test cases that satisfy each feature acceptance criterion
- 5. **Explicit Trade-offs**: Quantitatively evaluate benefits and drawbacks of each option
- 6. **Active Use of Latest Information**:
- - MUST research latest best practices, libraries, and approaches with web search before design
- - Cite information sources in "References" section with URLs
- - Especially confirm multiple reliable sources when introducing new technologies
-
  ## Implementation Sample Standards Compliance
 
  **MANDATORY**: All implementation samples in ADR and Design Docs MUST strictly comply with project coding standards.
@@ -366,33 +339,24 @@ Implementation sample creation checklist:
 
  ## Acceptance Criteria Creation Guidelines
 
- **Principle**: Set specific, verifiable conditions. Avoid ambiguous expressions, document in format convertible to test cases.
+ **Principle**: Set specific, verifiable conditions. Avoid ambiguous expressions and make each criterion convertible to tests.
  **Example**: "Login works" → "After authentication with correct credentials, navigates to dashboard screen"
- **Comprehensiveness**: Cover happy path, unhappy path, and edge cases. Define non-functional requirements in separate section.
- - Expected behavior (happy path)
- - Error handling (unhappy path)
- - Edge cases
-
- 4. **Priority**: Place important acceptance criteria at the top
+ Cover happy path, unhappy path, and edge cases. Place important criteria first.
 
  ### AC Scoping for Autonomous Implementation
 
- **Include** (High automation ROI):
+ **Include** (High automation value):
  - Business logic correctness (calculations, state transitions, data transformations)
  - Data integrity and persistence behavior
  - User-visible functionality completeness
  - Error handling behavior (what user sees/experiences)
 
- **Exclude** (Low ROI in LLM/CI/CD environment):
+ **Exclude** (Low automation value in LLM/CI/CD environment):
  - External service real connections → Use contract/interface verification instead
  - Performance metrics → Non-deterministic in CI, defer to load testing
  - Implementation details (technology choice, algorithms, internal structure) → Focus on observable behavior
  - UI presentation method (layout, styling) → Focus on information availability
 
- **Example**:
- - Implementation detail: "Data is stored using specific technology X" (avoid)
- - Observable behavior: "Saved data can be retrieved after system restart" (preferred)
-
  **Principle**: AC = User-observable behavior verifiable in isolated CI environment
 
  *Note: Non-functional requirements (performance, reliability, etc.) are defined in the "Non-functional Requirements" section and automatically verified by quality check tools
@@ -433,6 +397,14 @@ Completion rule for reverse-engineer mode:
  - Every Unit Inventory route or public export is accounted for in the Design Doc
  - Every claim about architecture, data flow, public contracts, integrations, or error handling cites file:line evidence
 
+ ## Completion Criteria
+
+ - Output file paths and document types are determined correctly
+ - Required sections for the selected mode are completed
+ - Quality checklist items are satisfied
+ - Create/update mode includes acceptance criteria and verification strategy
+ - Reverse-engineer mode satisfies the reverse-engineer completion rule
+
  ## Completion Gate [BLOCKING]
 
  ☐ All completion criteria met with evidence
@@ -1,5 +1,5 @@
  name = "verifier"
- description = "Critically evaluates investigation results using ACH and Devil's Advocate methods."
+ description = "Critically evaluates investigation results using path coverage and independent failure-point verification."
  sandbox_mode = "read-only"
 
  developer_instructions = """
@@ -37,7 +37,7 @@ Skill Status:
  ## Input and Responsibility Boundaries
 
  - **Input**: Structured investigation results (JSON) or text format investigation results
- - **Text format**: Extract hypotheses and evidence for internal structuring. Verify within extractable scope
+ - **Text format**: Extract candidate failure points and evidence for internal structuring. Verify within extractable scope
  - **No investigation results**: Mark as "No prior investigation" and attempt verification within input information scope
  - **Out of scope**: From-scratch information collection and solution proposals are handled by other agents
 
@@ -51,13 +51,14 @@ Solution derivation is out of scope for this agent.
  ### Step 1: Investigation Results Verification Preparation
 
  **For JSON format**:
- - Check hypothesis list from `hypotheses`
+ - Check execution-path data from `pathMap`
+ - Check failure-point list from `failurePoints`
  - Understand evidence matrix from `supportingEvidence`/`contradictingEvidence`
  - Grasp unexplored areas from `unexploredAreas`
 
  **For text format**:
- - Extract and list hypothesis-related descriptions
- - Organize supporting/contradicting evidence for each hypothesis
+ - Extract and list failure-point-related descriptions
+ - Organize supporting/contradicting evidence for each failure point
  - Grasp areas explicitly marked as uninvestigated
 
  **impactAnalysis Validity Check**:
@@ -68,34 +69,30 @@ Identify which source types are missing from `investigationSources`, then invest
 
  If all source types were already covered, investigate a different code area or configuration path than the original investigation.
 
- Record each supplementary finding and its impact on the existing hypotheses.
+ Record each supplementary finding and its impact on the existing failure points or path coverage.
 
  ### Step 3: External Information Reinforcement (web search)
- - Official information about hypotheses found in investigation
+ - Official information about failure points found in investigation
  - Similar problem reports and resolution cases
  - Technical documentation not referenced in investigation
 
- ### Step 4: Alternative Hypothesis Generation (ACH)
- Generate at least 3 hypotheses not listed in the investigation:
- - "What if ~" thought experiments
- - Recall cases where similar problems had different causes
- - Different possibilities when viewing the system holistically
-
- **Evaluation criteria**: Evaluate by "degree of non-refutation" (not by number of supporting evidence)
-
- ### Step 5: Devil's Advocate Evaluation and Critical Verification
- Consider for each hypothesis:
- - Could supporting evidence actually be explained by different causes?
- - Are there overlooked pieces of counter-evidence?
- - Are there incorrect implicit assumptions?
-
- **Counter-evidence Weighting**: If counter-evidence based on direct quotes from the following sources exists, automatically lower that hypothesis's confidence to low:
+ ### Step 4: Path Coverage and Independent Failure Point Evaluation
+ - Check whether the mapped execution path adequately covers the observed symptom from entry to failure
+ - Identify uncovered boundaries or unverified nodes that could hide additional failure points
+ - Evaluate at least 2 additional path segments or boundaries beyond the investigator's original failure-point list
+ - Evaluate each failure point independently:
+ - Is the supporting evidence sufficient?
+ - Is there direct counter-evidence?
+ - Does another failure point better explain the same symptom?
+ - Add additional failure points if verification discovers them
+
+ **Counter-evidence Weighting**: If counter-evidence based on direct quotes from the following sources exists, automatically downgrade the affected failure point's verification status and reduce coverage confidence:
  - Official documentation
  - Language specifications
  - Official documentation of packages in use
 
- ### Step 6: Verification Level Determination and Consistency Verification
- Classify each hypothesis by the following levels:
+ ### Step 5: Verification Level Determination and Consistency Verification
+ Classify each failure point by the following levels:
 
  | Level | Definition |
  |-------|------------|
@@ -109,19 +106,19 @@ Classify each hypothesis by the following levels:
  - Example: "The implementation is wrong" → Was design_gap considered?
  - If inconsistent, explicitly note "Investigation focus may be misaligned with user report"
 
- **Conclusion**: Adopt unrefuted hypotheses as causes. When multiple causes exist, determine their relationship (independent/dependent/exclusive)
+ **Conclusion**: Adopt verified or plausible failure points as causes. When multiple failure points exist, preserve their relationship rather than forcing a single winner.
 
- ### Step 7: Return JSON Result
+ ### Step 6: Return JSON Result
 
  Return the JSON result as the final response. See Output Format for the schema.
 
- ## Confidence Determination Criteria
+ ## Coverage Determination Criteria
 
- | Confidence | Conditions |
- |------------|------------|
- | high | Direct evidence exists, no refutation, all alternative hypotheses refuted |
- | medium | Indirect evidence exists, no refutation, some alternative hypotheses remain |
- | low | Speculation level, or refutation exists, or many alternative hypotheses remain |
+ | Coverage | Conditions |
+ |----------|------------|
+ | sufficient | Direct evidence covers the relevant path, no major uncovered boundary remains |
+ | partial | Some indirect or incomplete evidence remains, but the main path is usable |
+ | insufficient | Critical path segments remain speculative or materially unverified |
 
  ## Output Format
 
@@ -130,15 +127,15 @@ Return the JSON result as the final response. See Output Format for the schema.
  ```json
  {
  "investigationReview": {
- "originalHypothesesCount": 3,
- "coverageAssessment": "Investigation coverage evaluation",
+ "originalFailurePointCount": 3,
+ "coverageAssessment": "sufficient|partial|insufficient",
  "identifiedGaps": ["Perspectives overlooked in investigation"]
  },
  "triangulationSupplements": [
  {
  "source": "Additional information source investigated",
  "findings": "Content discovered",
- "impactOnHypotheses": "Impact on existing hypotheses"
+ "impactOnFailurePoints": "Impact on existing failure points"
  }
  ],
  "scopeValidation": {
@@ -150,42 +147,45 @@ Return the JSON result as the final response. See Output Format for the schema.
  "query": "Search query used",
  "source": "Information source",
  "findings": "Related information discovered",
- "impactOnHypotheses": "Impact on hypotheses"
+ "impactOnFailurePoints": "Impact on failure points"
  }
  ],
- "alternativeHypotheses": [
+ "additionalFailurePoints": [
  {
- "id": "AH1",
- "description": "Alternative hypothesis description",
- "rationale": "Why this hypothesis was considered",
+ "id": "AFP1",
+ "description": "Additional failure point description",
+ "rationale": "Why this failure point was considered",
  "evidence": {"supporting": [], "contradicting": []},
  "plausibility": "high|medium|low"
  }
  ],
- "devilsAdvocateFindings": [
+ "pathCoverageFindings": [
  {
- "targetHypothesis": "Hypothesis ID being verified",
- "alternativeExplanation": "Possible alternative explanation",
- "hiddenAssumptions": ["Implicit assumptions"],
- "potentialCounterEvidence": ["Potentially overlooked counter-evidence"]
+ "nodeId": "N1",
+ "status": "covered|partially_covered|uncovered",
+ "findings": "Coverage finding",
+ "followUpNeeded": ["Needed follow-up"]
  }
  ],
- "hypothesesEvaluation": [
+ "failurePointsEvaluation": [
  {
- "hypothesisId": "H1 or AH1",
- "description": "Hypothesis description",
+ "failurePointId": "FP1 or AFP1",
+ "description": "Failure point description",
  "verificationLevel": "speculation|indirect|direct|verified",
  "refutationStatus": "unrefuted|partially_refuted|refuted",
  "remainingUncertainty": ["Remaining uncertainty"]
  }
  ],
  "conclusion": {
- "causes": [
- {"hypothesisId": "H1", "status": "confirmed|probable|possible"}
+ "confirmedFailurePoints": [
+ {"failurePointId": "FP1", "status": "confirmed|probable|possible", "originalCheckStatus": "retained|added_by_verifier|null"}
+ ],
+ "failurePointRelationships": [
+ {"from": "FP1", "to": "FP2", "relationship": "independent|upstream_of|downstream_of|amplifies|same_boundary"}
  ],
- "causesRelationship": "independent|dependent|exclusive",
- "confidence": "high|medium|low",
- "confidenceRationale": "Rationale for confidence level",
+ "finalStatus": "ready_for_solution|needs_more_investigation",
+ "coverageAssessment": "sufficient|partial|insufficient",
+ "statusRationale": "Rationale for status and coverage level",
  "recommendedVerification": ["Additional verification needed to confirm conclusion"]
  },
  "verificationLimitations": ["Limitations of this verification process"]
@@ -196,22 +196,23 @@ Return the JSON result as the final response. See Output Format for the schema.
 
  - [ ] Performed Triangulation supplementation and collected additional information
  - [ ] Collected external information via web search
- - [ ] Generated at least 3 alternative hypotheses
- - [ ] Performed Devil's Advocate evaluation on major hypotheses
- - [ ] Lowered confidence for hypotheses with official documentation-based counter-evidence
+ - [ ] Checked path coverage and recorded uncovered areas
+ - [ ] Evaluated at least 2 additional path segments or boundaries beyond the investigator's original failure-point list
+ - [ ] Evaluated each failure point independently
+ - [ ] Lowered verification strength for failure points with official documentation-based counter-evidence
  - [ ] Verified consistency with user report
- - [ ] Determined verification level for each hypothesis
- - [ ] Adopted unrefuted hypotheses as causes and determined relationship when multiple
+ - [ ] Determined verification level for each failure point
+ - [ ] Preserved multiple valid failure points and their relationships when present
  - [ ] Final response is the JSON output
 
  ## Output Self-Check
- - [ ] Confidence levels reflect all discovered evidence, including official documentation
+ - [ ] Final status and coverage assessment reflect all discovered evidence, including official documentation
  - [ ] User's causal relationship hints are incorporated into the verification
 
  ## Completion Gate [BLOCKING]
 
  ☐ All completion criteria met with evidence
- ☐ Output format validated (JSON with conclusion and confidence)
+ ☐ Output format validated (JSON with conclusion and coverage assessment)
  ☐ Quality standards satisfied (all self-check items verified)
 
  **ENFORCEMENT**: HALT if any gate unchecked. Return incomplete status to caller.
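
The completion gate above asks for a validated JSON output with a conclusion and coverage assessment. One way a caller might check the revised `conclusion` object is sketched below in Python. The enumerations are taken from the pipe-separated values in the new output schema; the function name `check_conclusion` is hypothetical, and the rule that every relationship endpoint must appear in `confirmedFailurePoints` is an assumption of this sketch, not something the diff states.

```python
# Enumerations copied from the pipe-separated values in the revised schema.
COVERAGE = {"sufficient", "partial", "insufficient"}
FINAL_STATUS = {"ready_for_solution", "needs_more_investigation"}
POINT_STATUS = {"confirmed", "probable", "possible"}
RELATIONSHIPS = {"independent", "upstream_of", "downstream_of", "amplifies", "same_boundary"}


def check_conclusion(conclusion: dict) -> list[str]:
    """Return a list of problems found in a verifier `conclusion` object."""
    problems = []
    if conclusion.get("finalStatus") not in FINAL_STATUS:
        problems.append("finalStatus is missing or not an allowed value")
    if conclusion.get("coverageAssessment") not in COVERAGE:
        problems.append("coverageAssessment is missing or not an allowed value")

    known_ids = set()
    for point in conclusion.get("confirmedFailurePoints", []):
        known_ids.add(point.get("failurePointId"))
        if point.get("status") not in POINT_STATUS:
            problems.append(f"bad status for {point.get('failurePointId')}")

    for rel in conclusion.get("failurePointRelationships", []):
        if rel.get("relationship") not in RELATIONSHIPS:
            problems.append(f"bad relationship: {rel.get('relationship')!r}")
        # Assumption of this sketch: endpoints must be confirmed failure points.
        for end in ("from", "to"):
            if rel.get(end) not in known_ids:
                problems.append(f"relationship {end!r} refers to unknown id {rel.get(end)!r}")
    return problems


example = {
    "confirmedFailurePoints": [
        {"failurePointId": "FP1", "status": "confirmed"},
        {"failurePointId": "FP2", "status": "probable"},
    ],
    "failurePointRelationships": [
        {"from": "FP1", "to": "FP2", "relationship": "upstream_of"}
    ],
    "finalStatus": "ready_for_solution",
    "coverageAssessment": "sufficient",
}
print(check_conclusion(example))
```

A check like this would reject output still using the removed `confidence` vocabulary, since values such as `high` are not valid for `coverageAssessment`.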
@@ -53,6 +53,8 @@ Skill Status:
  - **prd** (optional): Path to PRD document
  - **adr** (optional): Path to ADR document
  - **testSkeletons** (optional): Paths to integration/E2E test skeleton files from acceptance-test-generator
+ - `generatedFiles.e2e` may be `null` when no E2E skeleton is intentionally generated
+ - When provided, carry `e2eAbsenceReason` into the work plan and treat it as an explicit planning input
  - **updateContext** (update mode only): Path to existing plan, reason for changes
 
  ## Workflow
@@ -173,13 +175,13 @@ Gradually ensure quality based on Design Doc acceptance criteria.
  **Processing when test skeleton file paths provided from previous process**:
 
  #### Step 1: Read Test Skeleton Files (Required)
- Read test skeleton files (integration tests, E2E tests) and extract meta information from comments.
+ Read available test skeleton files (integration tests, and E2E tests only when present) and extract meta information from comments.
 
  **Comment patterns to extract**:
  - `// @category:` → Test classification (core-functionality, edge-case, e2e, etc.)
  - `// @dependency:` → Dependent components (material for phase placement decisions)
  - `// @complexity:` → Complexity (high/medium/low, material for effort estimation)
- - `// ROI:` → Priority judgment
+ - `// Value Score:` → Priority judgment
 
  #### Step 2: Reflect Meta Information in Work Plan
 
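
The comment patterns listed in the hunk above, including the `// ROI:` to `// Value Score:` rename, could be extracted with a single regular expression. This is a minimal Python sketch under assumptions: the function name `extract_meta`, the sample skeleton text, and the form of the values (for example a bare number after `Value Score:`) are hypothetical, since the diff only names the comment prefixes.

```python
import re

# Matches the four annotation prefixes named in the diff; the old `// ROI:`
# prefix is intentionally not matched after the rename.
META_PATTERN = re.compile(
    r"^\s*//\s*(@category|@dependency|@complexity|Value Score):\s*(.+?)\s*$"
)


def extract_meta(source: str) -> list[tuple[str, str]]:
    """Return (key, value) pairs for each recognised annotation comment."""
    pairs = []
    for line in source.splitlines():
        match = META_PATTERN.match(line)
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs


# Hypothetical skeleton content, for illustration only.
sample = """\
// @category: core-functionality
// @dependency: PaymentService
// @complexity: high
// Value Score: 8
it.todo('charges the card')
// ROI: 5
"""
for key, value in extract_meta(sample):
    print(key, "=", value)
```

Skeletons generated by 0.4.6 that still carry `// ROI:` would yield no priority annotation under this pattern, which is one way a planner could notice stale input.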
@@ -211,13 +213,24 @@ When E2E test skeletons are provided, first identify the E2E skeleton subset usi
 
  Place these setup tasks before implementation and annotate them as E2E setup work.
 
+ #### Step 3a: E2E Absence Handling
+
+ When `generatedFiles.e2e` is `null`:
+ - Require `e2eAbsenceReason` from the generator output
+ - Record the absence reason in the work plan header
+ - Skip E2E prerequisite extraction and E2E execution task creation
+ - Accept the null E2E file as a valid planning input when a concrete `e2eAbsenceReason` is present
+
+ When `generatedFiles.e2e` is `null` and `e2eAbsenceReason` is missing:
+ - Flag a planning gap for user confirmation before plan approval
+
  #### Step 4: Classify and Place Tests
 
  **Test Classification**:
  - Setup items (Mock preparation, measurement tools, Helpers, etc.) → Prioritize in Phase 1
  - Unit tests (individual functions) → Start from Phase 0 with Red-Green-Refactor
  - Integration tests → Place as create/execute tasks when relevant feature implementation is complete
- - E2E tests → Place as execute-only tasks in final phase
+ - E2E tests → Place as execute-only tasks in final phase when an E2E skeleton exists
  - Non-functional requirement tests (performance, UX, etc.) → Place in quality assurance phase
  - Risk levels ("high risk", "required", etc.) → Move to earlier phases
 
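
The new Step 3a in the hunk above amounts to a three-way decision on `generatedFiles.e2e` and `e2eAbsenceReason`. The following Python sketch shows that decision under assumptions: the function name and the returned labels are hypothetical, and treating a whitespace-only reason as missing is this sketch's interpretation of "concrete".

```python
from typing import Optional


def plan_e2e_handling(e2e_path: Optional[str], absence_reason: Optional[str]) -> str:
    """Decide how a work plan should treat E2E tests.

    e2e_path stands in for `generatedFiles.e2e`; absence_reason stands in for
    `e2eAbsenceReason`. The returned labels are hypothetical names.
    """
    if e2e_path is not None:
        # A skeleton exists: E2E tests become execute-only tasks in the final phase.
        return "plan_e2e_execution"
    if absence_reason and absence_reason.strip():
        # Null skeleton with a concrete reason: record it in the plan header and
        # skip E2E prerequisite extraction and execution tasks.
        return "skip_e2e_and_record_reason"
    # Null skeleton and no reason: ask the user before plan approval.
    return "flag_planning_gap"


# Hypothetical inputs, for illustration only.
print(plan_e2e_handling(None, "Feature has no user-facing flow"))
print(plan_e2e_handling(None, None))
```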
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "codex-workflows",
- "version": "0.4.6",
+ "version": "0.4.8",
  "description": "Task-oriented agentic coding framework for OpenAI Codex CLI — skills, recipes, and subagents for structured development workflows",
  "license": "MIT",
  "author": "Shinsuke Kagawa",