npm - codex-workflows - Versions diffs - 0.4.7 → 0.4.8 - Mend

codex-workflows 0.4.7 → 0.4.8

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (27)

package/.codex/agents/technical-designer-frontend.toml CHANGED

@@ -220,6 +220,13 @@ When a UI Spec exists for the feature (`docs/ui-spec/{feature-name}-ui-spec.md`)
   - Path to existing document
   - Reason for changes
   - Sections needing updates
+  - Before editing changed sections, build a Dependency Inventory for identifiers referenced by the update
+  - Dependency Inventory output format:
+    - `identifier`: exact literal identifier
+    - `source`: codebase | accepted_adr | external
+    - `status`: verified_existing | requires_new_creation | external_dependency
+    - `action`: keep | update_document | create_dependency | confirm_external_reference
+  - In update mode, cross-check prerequisite ADR references against Accepted ADRs only. Cross-Design-Doc consistency is handled by design-sync after the update
 - **Reverse-Engineer Context** (reverse-engineer mode only):
   - Primary Files
@@ -309,14 +316,14 @@ Cover happy path, unhappy path, and edge cases including empty and loading state
 ### AC Scoping for Autonomous Implementation (Frontend)
-**Include** (High automation ROI):
+**Include** (High automation value):
 - User interaction behavior (button clicks, form submissions, navigation)
 - Rendering correctness (component displays correct data)
 - State management behavior (state updates correctly on user actions)
 - Error handling behavior (error messages displayed to user)
 - Accessibility (keyboard navigation, screen reader support)
-**Exclude** (Low ROI in LLM/CI/CD environment):
+**Exclude** (Low automation value in LLM/CI/CD environment):
 - External API real connections → Use MSW for API mocking instead
 - Performance metrics → Non-deterministic in CI environment
 - Implementation details → Focus on user-observable behavior

package/.codex/agents/technical-designer.toml CHANGED

@@ -252,6 +252,13 @@ Confirm and document conflicts with existing systems at each integration point t
   - Path to existing document
   - Reason for changes
   - Sections needing updates
+  - Before editing changed sections, build a Dependency Inventory for identifiers referenced by the update
+  - Dependency Inventory output format:
+    - `identifier`: exact literal identifier
+    - `source`: codebase | accepted_adr | external
+    - `status`: verified_existing | requires_new_creation | external_dependency
+    - `action`: keep | update_document | create_dependency | confirm_external_reference
+  - In update mode, cross-check prerequisite ADR references against Accepted ADRs only. Cross-Design-Doc consistency is handled by design-sync after the update
 - **Reverse-Engineer Context** (reverse-engineer mode only):
   - Primary Files
@@ -338,13 +345,13 @@ Cover happy path, unhappy path, and edge cases. Place important criteria first.
 ### AC Scoping for Autonomous Implementation
-**Include** (High automation ROI):
+**Include** (High automation value):
 - Business logic correctness (calculations, state transitions, data transformations)
 - Data integrity and persistence behavior
 - User-visible functionality completeness
 - Error handling behavior (what user sees/experiences)
-**Exclude** (Low ROI in LLM/CI/CD environment):
+**Exclude** (Low automation value in LLM/CI/CD environment):
 - External service real connections → Use contract/interface verification instead
 - Performance metrics → Non-deterministic in CI, defer to load testing
 - Implementation details (technology choice, algorithms, internal structure) → Focus on observable behavior
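
Both designer agents gain the same Dependency Inventory format in update mode. As a rough illustration of how a consumer might check one inventory entry against the allowed values listed in the diff, here is a minimal Python sketch. The `DependencyEntry` class, the `validate_entry` helper, and the example identifier `UserProfileCard` are hypothetical names introduced for illustration; the package itself only specifies the four fields and their allowed values in prompt text.

```python
from dataclasses import dataclass

# Allowed values copied from the Dependency Inventory output format in the diff.
SOURCES = {"codebase", "accepted_adr", "external"}
STATUSES = {"verified_existing", "requires_new_creation", "external_dependency"}
ACTIONS = {"keep", "update_document", "create_dependency", "confirm_external_reference"}


@dataclass(frozen=True)
class DependencyEntry:
    """One inventory row (hypothetical container, not part of the package)."""
    identifier: str  # exact literal identifier referenced by the update
    source: str
    status: str
    action: str


def validate_entry(entry: DependencyEntry) -> list[str]:
    """Return a list of problems; an empty list means the entry is well-formed."""
    problems = []
    if not entry.identifier.strip():
        problems.append("identifier must be a non-empty literal")
    if entry.source not in SOURCES:
        problems.append(f"unknown source: {entry.source}")
    if entry.status not in STATUSES:
        problems.append(f"unknown status: {entry.status}")
    if entry.action not in ACTIONS:
        problems.append(f"unknown action: {entry.action}")
    return problems


# Hypothetical example entries.
ok_entry = DependencyEntry("UserProfileCard", "codebase", "verified_existing", "keep")
bad_entry = DependencyEntry("UserProfileCard", "design_doc", "verified_existing", "keep")
```

The sketch only checks field values. It does not verify that an identifier actually exists in the codebase or in an Accepted ADR, which is the part the agent is instructed to do.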

package/.codex/agents/verifier.toml CHANGED

@@ -1,5 +1,5 @@
 name = "verifier"
-description = "Critically evaluates investigation results using ACH and Devil's Advocate methods."
+description = "Critically evaluates investigation results using path coverage and independent failure-point verification."
 sandbox_mode = "read-only"
 developer_instructions = """
@@ -37,7 +37,7 @@ Skill Status:
 ## Input and Responsibility Boundaries
 - **Input**: Structured investigation results (JSON) or text format investigation results
-- **Text format**: Extract hypotheses and evidence for internal structuring. Verify within extractable scope
+- **Text format**: Extract candidate failure points and evidence for internal structuring. Verify within extractable scope
 - **No investigation results**: Mark as "No prior investigation" and attempt verification within input information scope
 - **Out of scope**: From-scratch information collection and solution proposals are handled by other agents
@@ -51,13 +51,14 @@ Solution derivation is out of scope for this agent.
 ### Step 1: Investigation Results Verification Preparation
 **For JSON format**:
-- Check hypothesis list from `hypotheses`
+- Check execution-path data from `pathMap`
+- Check failure-point list from `failurePoints`
 - Understand evidence matrix from `supportingEvidence`/`contradictingEvidence`
 - Grasp unexplored areas from `unexploredAreas`
 **For text format**:
-- Extract and list hypothesis-related descriptions
-- Organize supporting/contradicting evidence for each hypothesis
+- Extract and list failure-point-related descriptions
+- Organize supporting/contradicting evidence for each failure point
 - Grasp areas explicitly marked as uninvestigated
 **impactAnalysis Validity Check**:
@@ -68,34 +69,30 @@ Identify which source types are missing from `investigationSources`, then invest
 If all source types were already covered, investigate a different code area or configuration path than the original investigation.
-Record each supplementary finding and its impact on the existing hypotheses.
+Record each supplementary finding and its impact on the existing failure points or path coverage.
 ### Step 3: External Information Reinforcement (web search)
-- Official information about hypotheses found in investigation
+- Official information about failure points found in investigation
 - Similar problem reports and resolution cases
 - Technical documentation not referenced in investigation
-### Step 4: Alternative Hypothesis Generation (ACH)
-Generate at least 3 hypotheses not listed in the investigation:
-- "What if ~" thought experiments
-- Recall cases where similar problems had different causes
-- Different possibilities when viewing the system holistically
-**Evaluation criteria**: Evaluate by "degree of non-refutation" (not by number of supporting evidence)
-### Step 5: Devil's Advocate Evaluation and Critical Verification
-Consider for each hypothesis:
-- Could supporting evidence actually be explained by different causes?
-- Are there overlooked pieces of counter-evidence?
-- Are there incorrect implicit assumptions?
-**Counter-evidence Weighting**: If counter-evidence based on direct quotes from the following sources exists, automatically lower that hypothesis's confidence to low:
+### Step 4: Path Coverage and Independent Failure Point Evaluation
+- Check whether the mapped execution path adequately covers the observed symptom from entry to failure
+- Identify uncovered boundaries or unverified nodes that could hide additional failure points
+- Evaluate at least 2 additional path segments or boundaries beyond the investigator's original failure-point list
+- Evaluate each failure point independently:
+  - Is the supporting evidence sufficient?
+  - Is there direct counter-evidence?
+  - Does another failure point better explain the same symptom?
+- Add additional failure points if verification discovers them
+**Counter-evidence Weighting**: If counter-evidence based on direct quotes from the following sources exists, automatically downgrade the affected failure point's verification status and reduce coverage confidence:
 - Official documentation
 - Language specifications
 - Official documentation of packages in use
-### Step 6: Verification Level Determination and Consistency Verification
-Classify each hypothesis by the following levels:
+### Step 5: Verification Level Determination and Consistency Verification
+Classify each failure point by the following levels:
 | Level | Definition |
 |-------|------------|
@@ -109,19 +106,19 @@ Classify each hypothesis by the following levels:
 - Example: "The implementation is wrong" → Was design_gap considered?
 - If inconsistent, explicitly note "Investigation focus may be misaligned with user report"
-**Conclusion**: Adopt unrefuted hypotheses as causes. When multiple causes exist, determine their relationship (independent/dependent/exclusive)
+**Conclusion**: Adopt verified or plausible failure points as causes. When multiple failure points exist, preserve their relationship rather than forcing a single winner.
-### Step 7: Return JSON Result
+### Step 6: Return JSON Result
 Return the JSON result as the final response. See Output Format for the schema.
-## Confidence Determination Criteria
+## Coverage Determination Criteria
-| Confidence | Conditions |
-|------------|------------|
-| high | Direct evidence exists, no refutation, all alternative hypotheses refuted |
-| medium | Indirect evidence exists, no refutation, some alternative hypotheses remain |
-| low | Speculation level, or refutation exists, or many alternative hypotheses remain |
+| Coverage | Conditions |
+|----------|------------|
+| sufficient | Direct evidence covers the relevant path, no major uncovered boundary remains |
+| partial | Some indirect or incomplete evidence remains, but the main path is usable |
+| insufficient | Critical path segments remain speculative or materially unverified |
 ## Output Format
@@ -130,15 +127,15 @@ Return the JSON result as the final response. See Output Format for the schema.
 ```json
 {
   "investigationReview": {
-    "originalHypothesesCount": 3,
-    "coverageAssessment": "Investigation coverage evaluation",
+    "originalFailurePointCount": 3,
+    "coverageAssessment": "sufficient|partial|insufficient",
     "identifiedGaps": ["Perspectives overlooked in investigation"]
   },
   "triangulationSupplements": [
     {
       "source": "Additional information source investigated",
       "findings": "Content discovered",
-      "impactOnHypotheses": "Impact on existing hypotheses"
+      "impactOnFailurePoints": "Impact on existing failure points"
     }
   ],
   "scopeValidation": {
@@ -150,42 +147,45 @@ Return the JSON result as the final response. See Output Format for the schema.
       "query": "Search query used",
       "source": "Information source",
       "findings": "Related information discovered",
-      "impactOnHypotheses": "Impact on hypotheses"
+      "impactOnFailurePoints": "Impact on failure points"
     }
   ],
-  "alternativeHypotheses": [
+  "additionalFailurePoints": [
     {
-      "id": "AH1",
-      "description": "Alternative hypothesis description",
-      "rationale": "Why this hypothesis was considered",
+      "id": "AFP1",
+      "description": "Additional failure point description",
+      "rationale": "Why this failure point was considered",
       "evidence": {"supporting": [], "contradicting": []},
       "plausibility": "high|medium|low"
     }
   ],
-  "devilsAdvocateFindings": [
+  "pathCoverageFindings": [
     {
-      "targetHypothesis": "Hypothesis ID being verified",
-      "alternativeExplanation": "Possible alternative explanation",
-      "hiddenAssumptions": ["Implicit assumptions"],
-      "potentialCounterEvidence": ["Potentially overlooked counter-evidence"]
+      "nodeId": "N1",
+      "status": "covered|partially_covered|uncovered",
+      "findings": "Coverage finding",
+      "followUpNeeded": ["Needed follow-up"]
     }
   ],
-  "hypothesesEvaluation": [
+  "failurePointsEvaluation": [
     {
-      "hypothesisId": "H1 or AH1",
-      "description": "Hypothesis description",
+      "failurePointId": "FP1 or AFP1",
+      "description": "Failure point description",
       "verificationLevel": "speculation|indirect|direct|verified",
       "refutationStatus": "unrefuted|partially_refuted|refuted",
       "remainingUncertainty": ["Remaining uncertainty"]
     }
   ],
   "conclusion": {
-    "causes": [
-      {"hypothesisId": "H1", "status": "confirmed|probable|possible"}
+    "confirmedFailurePoints": [
+      {"failurePointId": "FP1", "status": "confirmed|probable|possible", "originalCheckStatus": "retained|added_by_verifier|null"}
+    ],
+    "failurePointRelationships": [
+      {"from": "FP1", "to": "FP2", "relationship": "independent|upstream_of|downstream_of|amplifies|same_boundary"}
     ],
-    "causesRelationship": "independent|dependent|exclusive",
-    "confidence": "high|medium|low",
-    "confidenceRationale": "Rationale for confidence level",
+    "finalStatus": "ready_for_solution|needs_more_investigation",
+    "coverageAssessment": "sufficient|partial|insufficient",
+    "statusRationale": "Rationale for status and coverage level",
     "recommendedVerification": ["Additional verification needed to confirm conclusion"]
   },
   "verificationLimitations": ["Limitations of this verification process"]
@@ -196,22 +196,23 @@ Return the JSON result as the final response. See Output Format for the schema.
 - [ ] Performed Triangulation supplementation and collected additional information
 - [ ] Collected external information via web search
-- [ ] Generated at least 3 alternative hypotheses
-- [ ] Performed Devil's Advocate evaluation on major hypotheses
-- [ ] Lowered confidence for hypotheses with official documentation-based counter-evidence
+- [ ] Checked path coverage and recorded uncovered areas
+- [ ] Evaluated at least 2 additional path segments or boundaries beyond the investigator's original failure-point list
+- [ ] Evaluated each failure point independently
+- [ ] Lowered verification strength for failure points with official documentation-based counter-evidence
 - [ ] Verified consistency with user report
-- [ ] Determined verification level for each hypothesis
-- [ ] Adopted unrefuted hypotheses as causes and determined relationship when multiple
+- [ ] Determined verification level for each failure point
+- [ ] Preserved multiple valid failure points and their relationships when present
 - [ ] Final response is the JSON output
 ## Output Self-Check
-- [ ] Confidence levels reflect all discovered evidence, including official documentation
+- [ ] Final status and coverage assessment reflect all discovered evidence, including official documentation
 - [ ] User's causal relationship hints are incorporated into the verification
 ## Completion Gate [BLOCKING]
 ☐ All completion criteria met with evidence
-☐ Output format validated (JSON with conclusion and confidence)
+☐ Output format validated (JSON with conclusion and coverage assessment)
 ☐ Quality standards satisfied (all self-check items verified)
 **ENFORCEMENT**: HALT if any gate unchecked. Return incomplete status to caller.
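
The reworked `conclusion` object replaces the single `confidence` field with `finalStatus`, `coverageAssessment`, and per-failure-point entries. A caller that receives the verifier's JSON could sanity-check those enumerated fields roughly as follows. This is a sketch under assumptions: `check_conclusion` is a hypothetical helper, the sample payload is invented, and only the enum values shown in the diff's Output Format are checked (`originalCheckStatus` is left unchecked because the diff does not say whether `null` is a JSON null or a string).

```python
import json

# Enumerations copied from the 0.4.8 Output Format in the diff.
COVERAGE = {"sufficient", "partial", "insufficient"}
FINAL_STATUS = {"ready_for_solution", "needs_more_investigation"}
FAILURE_POINT_STATUS = {"confirmed", "probable", "possible"}
RELATIONSHIPS = {"independent", "upstream_of", "downstream_of", "amplifies", "same_boundary"}


def check_conclusion(result: dict) -> list[str]:
    """Return problems found in result["conclusion"]; empty list means none."""
    conclusion = result.get("conclusion", {})
    problems = []
    if conclusion.get("finalStatus") not in FINAL_STATUS:
        problems.append("invalid finalStatus")
    if conclusion.get("coverageAssessment") not in COVERAGE:
        problems.append("invalid coverageAssessment")
    for fp in conclusion.get("confirmedFailurePoints", []):
        if fp.get("status") not in FAILURE_POINT_STATUS:
            problems.append(f"invalid status for {fp.get('failurePointId')}")
    for rel in conclusion.get("failurePointRelationships", []):
        if rel.get("relationship") not in RELATIONSHIPS:
            problems.append(f"invalid relationship {rel.get('from')} -> {rel.get('to')}")
    return problems


# Invented sample payload for illustration only.
sample = json.loads("""
{
  "conclusion": {
    "confirmedFailurePoints": [
      {"failurePointId": "FP1", "status": "confirmed"},
      {"failurePointId": "AFP1", "status": "possible"}
    ],
    "failurePointRelationships": [
      {"from": "FP1", "to": "AFP1", "relationship": "upstream_of"}
    ],
    "finalStatus": "needs_more_investigation",
    "coverageAssessment": "partial"
  }
}
""")
```

A payload still using the 0.4.7 fields (`causes`, `confidence`) would fail both top-level checks here, which is one quick way to spot a caller that has not been updated to the new schema.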

package/.codex/agents/work-planner.toml CHANGED

@@ -53,6 +53,8 @@ Skill Status:
 - **prd** (optional): Path to PRD document
 - **adr** (optional): Path to ADR document
 - **testSkeletons** (optional): Paths to integration/E2E test skeleton files from acceptance-test-generator
+  - `generatedFiles.e2e` may be `null` when no E2E skeleton is intentionally generated
+  - When provided, carry `e2eAbsenceReason` into the work plan and treat it as an explicit planning input
 - **updateContext** (update mode only): Path to existing plan, reason for changes
 ## Workflow
@@ -173,13 +175,13 @@ Gradually ensure quality based on Design Doc acceptance criteria.
 **Processing when test skeleton file paths provided from previous process**:
 #### Step 1: Read Test Skeleton Files (Required)
-Read test skeleton files (integration tests, E2E tests) and extract meta information from comments.
+Read available test skeleton files (integration tests, and E2E tests only when present) and extract meta information from comments.
 **Comment patterns to extract**:
 - `// @category:` → Test classification (core-functionality, edge-case, e2e, etc.)
 - `// @dependency:` → Dependent components (material for phase placement decisions)
 - `// @complexity:` → Complexity (high/medium/low, material for effort estimation)
-- `// ROI:` → Priority judgment
+- `// Value Score:` → Priority judgment
 #### Step 2: Reflect Meta Information in Work Plan
@@ -211,13 +213,24 @@ When E2E test skeletons are provided, first identify the E2E skeleton subset usi
 Place these setup tasks before implementation and annotate them as E2E setup work.
+#### Step 3a: E2E Absence Handling
+When `generatedFiles.e2e` is `null`:
+- Require `e2eAbsenceReason` from the generator output
+- Record the absence reason in the work plan header
+- Skip E2E prerequisite extraction and E2E execution task creation
+- Accept the null E2E file as a valid planning input when a concrete `e2eAbsenceReason` is present
+When `generatedFiles.e2e` is `null` and `e2eAbsenceReason` is missing:
+- Flag a planning gap for user confirmation before plan approval
 #### Step 4: Classify and Place Tests
 **Test Classification**:
 - Setup items (Mock preparation, measurement tools, Helpers, etc.) → Prioritize in Phase 1
 - Unit tests (individual functions) → Start from Phase 0 with Red-Green-Refactor
 - Integration tests → Place as create/execute tasks when relevant feature implementation is complete
-- E2E tests → Place as execute-only tasks in final phase
+- E2E tests → Place as execute-only tasks in final phase when an E2E skeleton exists
 - Non-functional requirement tests (performance, UX, etc.) → Place in quality assurance phase
 - Risk levels ("high risk", "required", etc.) → Move to earlier phases
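
The new Step 3a describes three outcomes depending on `generatedFiles.e2e` and `e2eAbsenceReason`. The branching can be sketched in Python as below. The function name `plan_e2e_handling`, the returned dictionary keys, and the example file paths are hypothetical, and treating a whitespace-only reason as missing is an assumed reading of "a concrete `e2eAbsenceReason`"; the package expresses this logic only as prompt instructions.

```python
from typing import Optional


def plan_e2e_handling(generated_files: dict, e2e_absence_reason: Optional[str]) -> dict:
    """Decide how a work plan treats E2E skeletons (illustrative sketch)."""
    if generated_files.get("e2e") is not None:
        # An E2E skeleton exists: extract prerequisites and create execute-only tasks.
        return {"create_e2e_tasks": True, "header_note": None, "planning_gap": False}

    # Assumption: an empty or whitespace-only reason does not count as concrete.
    reason = (e2e_absence_reason or "").strip()
    if reason:
        # Null E2E file with a reason is a valid input: record it in the plan
        # header and skip E2E prerequisite extraction and execution tasks.
        return {
            "create_e2e_tasks": False,
            "header_note": f"E2E absence reason: {reason}",
            "planning_gap": False,
        }

    # Null E2E file without a reason: flag for user confirmation before approval.
    return {"create_e2e_tasks": False, "header_note": None, "planning_gap": True}


# Hypothetical inputs.
with_e2e = plan_e2e_handling({"integration": "tests/a.int.test.ts", "e2e": "tests/a.e2e.test.ts"}, None)
no_e2e_reason = plan_e2e_handling({"integration": "tests/a.int.test.ts", "e2e": None}, "No user-facing flow changed")
no_e2e_gap = plan_e2e_handling({"integration": "tests/a.int.test.ts", "e2e": None}, None)
```

Only the third case blocks plan approval; the second case is accepted as-is, which matches the diff's instruction to treat a null E2E file as a valid planning input when a reason is supplied.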

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "codex-workflows",
-  "version": "0.4.7",
+  "version": "0.4.8",
   "description": "Task-oriented agentic coding framework for OpenAI Codex CLI — skills, recipes, and subagents for structured development workflows",
   "license": "MIT",
   "author": "Shinsuke Kagawa",