npm - codex-workflows - Versions diffs - 0.2.1 → 0.2.3 - Mend

codex-workflows 0.2.1 → 0.2.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (31)

package/.agents/skills/recipe-add-integration-tests/SKILL.md +2 -2
package/.agents/skills/recipe-build/SKILL.md +1 -1
package/.agents/skills/recipe-diagnose/SKILL.md +20 -4
package/.agents/skills/recipe-front-build/SKILL.md +2 -2
package/.agents/skills/recipe-fullstack-build/SKILL.md +1 -1
package/.agents/skills/recipe-fullstack-implement/SKILL.md +1 -1
package/.agents/skills/recipe-implement/SKILL.md +1 -1
package/.agents/skills/recipe-reverse-engineer/SKILL.md +56 -12
package/.agents/skills/recipe-update-doc/SKILL.md +10 -5
package/.agents/skills/subagents-orchestration-guide/SKILL.md +3 -3
package/.agents/skills/subagents-orchestration-guide/references/monorepo-flow.md +2 -2
package/.codex/agents/code-reviewer.toml +11 -1
package/.codex/agents/code-verifier.toml +58 -21
package/.codex/agents/document-reviewer.toml +4 -2
package/.codex/agents/integration-test-reviewer.toml +4 -0
package/.codex/agents/investigator.toml +20 -17
package/.codex/agents/prd-creator.toml +39 -24
package/.codex/agents/quality-fixer-frontend.toml +15 -7
package/.codex/agents/quality-fixer.toml +15 -7
package/.codex/agents/requirement-analyzer.toml +4 -0
package/.codex/agents/rule-advisor.toml +9 -0
package/.codex/agents/scope-discoverer.toml +67 -29
package/.codex/agents/security-reviewer.toml +4 -0
package/.codex/agents/solver.toml +6 -2
package/.codex/agents/task-executor-frontend.toml +9 -0
package/.codex/agents/task-executor.toml +9 -0
package/.codex/agents/technical-designer-frontend.toml +68 -115
package/.codex/agents/technical-designer.toml +70 -114
package/.codex/agents/verifier.toml +11 -13
package/README.md +2 -2
package/package.json +1 -1

package/.agents/skills/recipe-add-integration-tests/SKILL.md CHANGED

@@ -109,11 +109,11 @@ Check Step 5 result:
 Spawn quality-fixer agent: "Final quality assurance for test files added in this workflow. Run all tests and verify coverage."
-**Expected output**: `approved` (true/false)
+**Expected output**: `status` (`approved`/`blocked`)
 ### Step 8: Commit
-On `approved: true` from quality-fixer:
+On `status: "approved"` from quality-fixer:
 - MUST commit test files with appropriate message
 ENFORCEMENT: Commits without quality-fixer approval are invalid.
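The hunk above replaces the boolean `approved` field with a string `status` field. A minimal sketch of how an orchestrator-side check might gate the commit on the new field, assuming the quality-fixer response arrives as a JSON string. The helper name `may_commit` is hypothetical and not part of the package:

```python
import json


def may_commit(raw_response: str) -> bool:
    """Return True only when quality-fixer reports status "approved".

    Hypothetical helper: the package defines the field, not this function.
    """
    payload = json.loads(raw_response)
    # Compare against the string value introduced in 0.2.3. A legacy
    # `approved: true` payload without `status` is deliberately not accepted.
    return payload.get("status") == "approved"
```

Rejecting the legacy shape is a design choice in this sketch; a caller that still has to accept 0.2.1 responses would need an explicit fallback.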

package/.agents/skills/recipe-build/SKILL.md CHANGED

@@ -80,7 +80,7 @@ For EACH task, YOU MUST:
      - `approved` -> Proceed to step 4
    - `readyForQualityCheck: true` -> Proceed to step 4
 4. **Spawn quality-fixer agent**: "Execute all quality checks and fixes"
-5. **COMMIT on approval**: After `approved: true` from quality-fixer -> Execute git commit
+5. **COMMIT on approval**: After `status: "approved"` from quality-fixer -> Execute git commit
 **CRITICAL**: MUST monitor ALL structured responses WITHOUT EXCEPTION and ENSURE every quality gate is passed.
 ENFORCEMENT: Proceeding past a failed quality gate invalidates all subsequent work.

package/.agents/skills/recipe-diagnose/SKILL.md CHANGED

@@ -83,7 +83,21 @@ Register the following and execute:
 ### Step 1: Investigation (investigator)
-Spawn investigator agent: "Comprehensively collect information related to the following phenomenon. Phenomenon: [Problem reported by user]. Problem essence: [taskEssence]. Investigation focus: [investigationFocus]. Applicable rules: [selectedRules summary]."
+Spawn investigator agent with the following prompt:
+```text
+Comprehensively collect information related to the following phenomenon.
+Phenomenon: [Problem reported by user]
+Problem essence: [taskEssence]
+Investigation focus: [investigationFocus]
+Applicable rules: [selectedRules summary]
+For change failures, also include:
+- what changed
+- what broke
+- what both areas share
+```
 **Expected output**: Evidence matrix, comparison analysis results, causal tracking results, list of unexplored areas, investigation limitations
@@ -92,12 +106,14 @@ Spawn investigator agent: "Comprehensively collect information related to the fo
 Review investigation output:
 **Quality Check** (verify output contains the following):
-- [ ] comparisonAnalysis
-- [ ] causalChain for each hypothesis (reaching stop condition)
+- [ ] `comparisonAnalysis` is present and `normalImplementation` is non-null, or explicitly states that no working implementation was found
+- [ ] causalChain for each hypothesis reaches a stop condition
 - [ ] causeCategory for each hypothesis
+- [ ] `investigationSources` covers at least 3 distinct source types
+- [ ] each hypothesis has supporting evidence with a concrete source
 - [ ] Investigation covering investigationFocus items (when provided)
-**If quality insufficient**: MUST re-spawn investigator agent specifying missing items
+**If quality insufficient**: MUST re-spawn investigator agent specifying the missing items and include the previous investigation output for context
 ENFORCEMENT: Proceeding to verifier with incomplete investigation data produces unreliable conclusions.
 **design_gap Escalation**:

package/.agents/skills/recipe-front-build/SKILL.md CHANGED

@@ -74,7 +74,7 @@ Verify generated task files exist in docs/plans/tasks/.
 Each sub-agent responds in JSON format:
 - **task-executor-frontend**: status, filesModified, testsAdded, requiresTestReview, readyForQualityCheck
 - **integration-test-reviewer**: status (approved/needs_revision/blocked), requiredFixes
-- **quality-fixer-frontend**: status, checksPerformed, fixesApplied, approved
+- **quality-fixer-frontend**: status, checksPerformed, fixesApplied
 ### Execution Flow for Each Task
@@ -88,7 +88,7 @@ For EACH task, YOU MUST:
      - `approved` -> Proceed to step 4
    - `readyForQualityCheck: true` -> Proceed to step 4
 4. **Spawn quality-fixer-frontend agent**: "Execute all frontend quality checks and fixes"
-5. **COMMIT on approval**: After `approved: true` from quality-fixer-frontend -> Execute git commit. Use `changeSummary` for commit message.
+5. **COMMIT on approval**: After `status: "approved"` from quality-fixer-frontend -> Execute git commit. Use `changeSummary` for commit message.
 **CRITICAL**: MUST monitor ALL structured responses WITHOUT EXCEPTION and ENSURE every quality gate is passed.
 ENFORCEMENT: Proceeding past a failed quality gate invalidates all subsequent work.

package/.agents/skills/recipe-fullstack-build/SKILL.md CHANGED

@@ -98,7 +98,7 @@ For EACH task, YOU MUST:
      - `approved` -> Proceed to step 4
    - `readyForQualityCheck: true` -> Proceed to step 4
 4. **Spawn quality-fixer agent** (layer-appropriate per routing table): "Execute all quality checks and fixes"
-5. **COMMIT on approval**: After `approved: true` from quality-fixer -> Execute git commit
+5. **COMMIT on approval**: After `status: "approved"` from quality-fixer -> Execute git commit
 **CRITICAL**: MUST monitor ALL structured responses WITHOUT EXCEPTION and ENSURE every quality gate is passed.
 ENFORCEMENT: Proceeding past a failed quality gate invalidates all subsequent work.

package/.agents/skills/recipe-fullstack-implement/SKILL.md CHANGED

@@ -123,7 +123,7 @@ ENFORCEMENT: Sub-agent prompts missing the constraint suffix MUST be re-issued w
 1. Execute ONE task completely before starting next (each task goes through the full 4-step cycle individually, using the correct executor per filename pattern)
 2. Check executor status before quality-fixer (escalation check)
 3. Quality-fixer MUST run after each executor (no skipping)
-4. Commit MUST execute when quality-fixer returns `approved: true` (do not defer to end)
+4. Commit MUST execute when quality-fixer returns `status: "approved"` (do not defer to end)
 ### Security Review (After All Tasks Complete)

package/.agents/skills/recipe-implement/SKILL.md CHANGED

@@ -106,7 +106,7 @@ After user grants "batch approval for entire implementation phase", enter autono
      - `approved` -> Proceed to step 3
    - Otherwise -> Proceed to step 3
 3. Spawn quality-fixer (or quality-fixer-frontend) agent: "Quality check and fixes"
-4. git commit -> Execute on `approved: true`
+4. git commit -> Execute on `status: "approved"`
 ### Security Review (After All Tasks Complete)

package/.agents/skills/recipe-reverse-engineer/SKILL.md CHANGED

@@ -20,7 +20,7 @@ Target: $ARGUMENTS
 **Execution Protocol**:
 1. **Spawn agents for all work** -- your role is to invoke sub-agents, pass data between them, and report results
 2. **Process one step at a time**: Execute steps sequentially within each unit (2 -> 3 -> 4 -> 5). Each step's output is the required input for the next step. Complete all steps for one unit before starting the next
-3. **Pass `$STEP_N_OUTPUT` as-is** to sub-agents -- the orchestrator bridges data without processing or filtering it
+3. **Pass `$STEP_N_OUTPUT` as-is** to sub-agents -- the orchestrator bridges data without processing or filtering it, except for steps that explicitly define a deterministic transformation with an input schema, output schema, and mapping rules
 **Task Registration**: Register phases first, then steps within each phase as you enter it. Track status for each step.
@@ -44,7 +44,7 @@ Ask the user to confirm:
 ```
 Phase 1: PRD Generation
-  Step 1: Scope Discovery (unified, single pass)
+  Step 1: Scope Discovery (unified, single pass -> group into PRD units -> human review)
   Step 2-5: Per-unit loop (Generation -> Verification -> Review -> Revision)
 Phase 2: Design Doc Generation (if requested)
@@ -67,17 +67,20 @@ Spawn scope-discoverer agent: "Discover functional scope targets in the codebase
 **Quality Gate**:
 - At least one unit discovered -> proceed
 - No units discovered -> ask user for hints
+- `$STEP_1_OUTPUT.prdUnits` exists
+- All `sourceUnits` across `prdUnits` (flattened, deduplicated) match the set of `discoveredUnits` IDs — no unit missing, no unit duplicated
+- Each discovered unit's `unitInventory` has at least one non-empty category. If all categories are empty, re-run discovery with focus on that unit
-**[STOP — BLOCKING]** If human review enabled: Present discovered units to user for confirmation.
+**[STOP — BLOCKING]** If human review enabled: Present `$STEP_1_OUTPUT.prdUnits` with their source unit mapping to user for confirmation.
 **CANNOT proceed until user explicitly confirms.**
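The Step 1 quality gate requires that the flattened `sourceUnits` of all `prdUnits` line up with the `discoveredUnits` IDs, with nothing missing and nothing duplicated. A rough sketch of that comparison follows. It assumes each discovered unit carries its identifier under an `id` key, which this diff does not show, and the function name is hypothetical:

```python
from collections import Counter


def check_source_unit_coverage(step1_output: dict) -> dict:
    """Compare flattened prdUnits[].sourceUnits against discoveredUnits IDs."""
    # Assumption: each discovered unit exposes its identifier under "id".
    discovered = {unit["id"] for unit in step1_output["discoveredUnits"]}
    # Count how often each source unit is referenced across all PRD units.
    counts = Counter(
        source_id
        for prd_unit in step1_output["prdUnits"]
        for source_id in prd_unit["sourceUnits"]
    )
    seen = set(counts)
    return {
        "missing": sorted(discovered - seen),      # discovered but unassigned
        "unknown": sorted(seen - discovered),      # referenced but never discovered
        "duplicated": sorted(uid for uid, n in counts.items() if n > 1),
    }
```

Under this reading the gate passes only when all three lists are empty.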
 ### Step 2-5: Per-Unit Processing
-**FOR** each unit in `$STEP_1_OUTPUT.discoveredUnits` **(sequential, one unit at a time)**:
+**FOR** each unit in `$STEP_1_OUTPUT.prdUnits` **(sequential, one unit at a time)**:
 #### Step 2: PRD Generation
-Spawn prd-creator agent: "Create reverse-engineered PRD for the following feature. Operation Mode: reverse-engineer. External Scope Provided: true. Feature: $UNIT_NAME. Description: $UNIT_DESCRIPTION. Related Files: $UNIT_RELATED_FILES. Entry Points: $UNIT_ENTRY_POINTS. Skip independent scope discovery. Use provided scope data. Create final version PRD based on code investigation within specified scope."
+Spawn prd-creator agent: "Create reverse-engineered PRD for the following feature. Operation Mode: reverse-engineer. External Scope Provided: true. Feature: $PRD_UNIT_NAME. Description: $PRD_UNIT_DESCRIPTION. Related Files: $PRD_UNIT_COMBINED_RELATED_FILES. Entry Points: $PRD_UNIT_COMBINED_ENTRY_POINTS. Source Units: $PRD_UNIT_SOURCE_UNITS. Use provided scope as an investigation starting point. If tracing entry points reveals directly connected files outside this scope, include them. Create final version PRD based on thorough code investigation."
 **Store output as**: `$STEP_2_OUTPUT` (PRD path)
@@ -85,12 +88,13 @@ Spawn prd-creator agent: "Create reverse-engineered PRD for the following featur
 **Prerequisite**: $STEP_2_OUTPUT (PRD path from Step 2)
-Spawn code-verifier agent: "Verify consistency between PRD and code implementation. doc_type: prd. document_path: $STEP_2_OUTPUT. code_paths: $UNIT_RELATED_FILES. verbose: false."
+Spawn code-verifier agent: "Verify consistency between PRD and code implementation. doc_type: prd. document_path: $STEP_2_OUTPUT. verbose: false."
 **Store output as**: `$STEP_3_OUTPUT`
 **Quality Gate**:
-- consistencyScore >= 70 -> proceed to review
+- consistencyScore >= 70 and verifiableClaimCount >= 20 -> proceed to review (guards against shallow verification passes with too few extracted claims)
+- consistencyScore >= 70 and verifiableClaimCount < 20 -> re-run verifier because investigation depth is insufficient
 - consistencyScore < 70 -> flag for detailed review
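The three-way gate above can be sketched as a small decision function. This assumes `$STEP_3_OUTPUT` has already been parsed into a dict and that both values live under `summary`, as in the code-verifier output format shown later in this diff. The function name and the returned labels are illustrative, not defined by the package:

```python
def verification_gate(step3_output: dict, min_score: int = 70, min_claims: int = 20) -> str:
    """Map code-verifier summary values to the next orchestrator action."""
    summary = step3_output["summary"]
    # A low score is flagged regardless of how many claims were extracted.
    if summary["consistencyScore"] < min_score:
        return "flag_for_detailed_review"
    # A passing score built on too few claims is treated as shallow verification.
    if summary["verifiableClaimCount"] < min_claims:
        return "rerun_verifier"
    return "proceed_to_review"
```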
 #### Step 4: Review
@@ -130,18 +134,58 @@ ENFORCEMENT: Exceeding 2 revision cycles without flagging produces unreviewed ou
 ### Step 6: Design Doc Scope Mapping
-**No additional discovery required.** Use `$STEP_1_OUTPUT` (scope discovery results) directly.
+**Step type**: Deterministic transformation step executed by the orchestrator.
-Each PRD unit from Phase 1 maps to one Design Doc unit (using technical-designer).
+**No additional discovery required.** Use `$STEP_1_OUTPUT.discoveredUnits` (implementation-granularity units) for technical profiles. Use `$STEP_1_OUTPUT.prdUnits[].sourceUnits` to trace which discovered units belong to each PRD unit.
-Map `$STEP_1_OUTPUT` units to Design Doc generation targets, carrying forward:
+**Default mapping rule**: Each PRD unit maps to exactly 1 Design Doc unit.
+Only split one PRD unit into multiple Design Doc units when BOTH are true:
+1. The source units contain clearly separate technical boundaries with low shared-file overlap
+2. Separate Design Docs would improve verification clarity (different public interfaces, dependencies, or module groups)
+If the split conditions are not clearly met, keep 1 PRD unit -> 1 Design Doc unit.
+Transform `$STEP_1_OUTPUT` into `$STEP_6_OUTPUT` using only the mapping rules in this step.
+Map PRD units to Design Doc generation targets by resolving each PRD unit's `sourceUnits` back to `$STEP_1_OUTPUT.discoveredUnits`, carrying forward:
 - `technicalProfile.primaryModules` -> Primary Files
 - `technicalProfile.publicInterfaces` -> Public Interfaces
 - `dependencies` -> Dependencies
 - `relatedFiles` -> Scope boundary
+- `unitInventory` -> Unit Inventory
 **Store output as**: `$STEP_6_OUTPUT`
+`$STEP_6_OUTPUT` MUST be a JSON array of Design Doc generation targets in the following shape:
+```json
+[
+  {
+    "unitId": "DD-001",
+    "parentPrdUnitId": "PRD-001",
+    "unitName": "Authentication",
+    "unitDescription": "Current implementation for sign-in and session management",
+    "sourceUnits": ["UNIT-001", "UNIT-002"],
+    "primaryModules": ["src/auth/service.ts", "src/auth/controller.ts"],
+    "publicInterfaces": ["AuthService.login()", "AuthController.handleLogin()"],
+    "dependencies": ["UNIT-003"],
+    "scopeBoundary": ["src/auth/*"],
+    "unitInventory": {
+      "routes": [],
+      "testFiles": [],
+      "publicExports": []
+    },
+    "mappingRationale": "Default 1:1 mapping from PRD unit because technical scope is cohesive"
+  }
+]
+```
+**Quality Gate**:
+- Every PRD unit appears in at least one `$STEP_6_OUTPUT` item
+- Every `$STEP_6_OUTPUT` item references only discovered units from its parent PRD unit
+- `mappingRationale` explicitly states whether the mapping is default 1:1 or an intentional split
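A possible mechanical check for the Step 6 quality gate is sketched below. The target fields (`unitId`, `parentPrdUnitId`, `sourceUnits`, `mappingRationale`) come from the JSON shape above. The `id` key on PRD units and the function name are assumptions. The sketch only verifies that `mappingRationale` is non-empty; whether it actually states "default 1:1" or "intentional split" still needs a reader:

```python
def check_design_doc_targets(prd_units: list[dict], targets: list[dict]) -> list[str]:
    """Return a list of gate violations; an empty list means the gate passes."""
    # Assumption: each PRD unit carries its identifier under "id".
    allowed = {unit["id"]: set(unit["sourceUnits"]) for unit in prd_units}
    covered = {target["parentPrdUnitId"] for target in targets}
    problems = []
    # Every PRD unit must appear in at least one Design Doc target.
    for prd_id in allowed:
        if prd_id not in covered:
            problems.append(f"{prd_id}: no Design Doc target")
    for target in targets:
        # A target may reference only units that belong to its parent PRD unit.
        extra = set(target["sourceUnits"]) - allowed.get(target["parentPrdUnitId"], set())
        if extra:
            problems.append(f"{target['unitId']}: units outside parent PRD unit: {sorted(extra)}")
        if not target.get("mappingRationale", "").strip():
            problems.append(f"{target['unitId']}: empty mappingRationale")
    return problems
```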
 ### Step 7-10: Per-Unit Processing
 **FOR** each unit in `$STEP_6_OUTPUT` **(sequential, one unit at a time)**:
@@ -150,13 +194,13 @@ Map `$STEP_1_OUTPUT` units to Design Doc generation targets, carrying forward:
 **Scope**: Document current architecture as-is. This is a documentation task, not a design improvement task.
-Spawn technical-designer agent: "Create Design Doc for the following feature based on existing code. Operation Mode: create. Feature: $UNIT_NAME. Description: $UNIT_DESCRIPTION. Primary Files: $UNIT_PRIMARY_MODULES. Public Interfaces: $UNIT_PUBLIC_INTERFACES. Dependencies: $UNIT_DEPENDENCIES. Parent PRD: $APPROVED_PRD_PATH. Document current architecture as-is."
+Spawn technical-designer agent: "Create Design Doc for the following feature based on existing code. Operation Mode: reverse-engineer. Feature: $UNIT_NAME. Description: $UNIT_DESCRIPTION. Primary Files: $UNIT_PRIMARY_MODULES. Public Interfaces: $UNIT_PUBLIC_INTERFACES. Dependencies: $UNIT_DEPENDENCIES. Unit Inventory: $UNIT_INVENTORY. Parent PRD: $APPROVED_PRD_PATH. Document current architecture as-is. Use Unit Inventory as the completeness baseline."
 **Store output as**: `$STEP_7_OUTPUT`
 #### Step 8: Code Verification
-Spawn code-verifier agent: "Verify consistency between Design Doc and code implementation. doc_type: design-doc. document_path: $STEP_7_OUTPUT. code_paths: $UNIT_PRIMARY_MODULES. verbose: false."
+Spawn code-verifier agent: "Verify consistency between Design Doc and code implementation. doc_type: design-doc. document_path: $STEP_7_OUTPUT. verbose: false."
 **Store output as**: `$STEP_8_OUTPUT`

package/.agents/skills/recipe-update-doc/SKILL.md CHANGED

@@ -31,7 +31,7 @@ ENFORCEMENT: Skipping document-reviewer risks propagating inconsistencies to dow
 ```
 Target document -> [Stop: Confirm changes]
                         |
-              technical-designer / prd-creator (update mode)
+              technical-designer / technical-designer-frontend / prd-creator (update mode)
                         |
               document-reviewer -> [Stop: Review approval]
                         | (Design Doc only)
@@ -70,15 +70,20 @@ Check for existing documents in docs/design/, docs/prd/, docs/adr/.
 | Multiple candidates found | Present options to user |
 | No documents found | Report and end (suggest $recipe-design instead) |
-### Step 2: Document Type Determination
+### Step 2: Document Type and Layer Determination
-Determine type from document path:
+Determine type from document path, then determine the layer to select the correct update agent:
 | Path Pattern | Type | Update Agent | Notes |
 |-------------|------|--------------|-------|
-| `docs/design/*.md` | Design Doc | technical-designer | - |
+| `docs/design/*.md` | Design Doc | technical-designer or technical-designer-frontend | See layer detection below |
 | `docs/prd/*.md` | PRD | prd-creator | - |
-| `docs/adr/*.md` | ADR | technical-designer | Minor changes: update existing file; Major changes: create new ADR file |
+| `docs/adr/*.md` | ADR | technical-designer or technical-designer-frontend | See layer detection below |
+**Layer detection** (for Design Doc and ADR):
+Read the document and determine its layer from content signals:
+- **Frontend** (-> technical-designer-frontend): Document title/scope mentions React, components, UI, frontend; or file contains component hierarchy, state management, UI interactions
+- **Backend** (-> technical-designer): All other cases (API, data layer, business logic, infrastructure)
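The layer detection above is written as a judgment call for the agent reading the document. As a rough illustration only, a keyword heuristic over the same signals could look like the sketch below. The regular expression and the function name are assumptions, not part of the package:

```python
import re

# Signals taken from the layer-detection bullets; the regex itself is an assumption.
FRONTEND_SIGNALS = re.compile(
    r"\b(react|components?|ui|frontend|state management)\b",
    re.IGNORECASE,
)


def select_update_agent(document_text: str) -> str:
    """Pick the update agent for a Design Doc or ADR from content signals."""
    if FRONTEND_SIGNALS.search(document_text):
        return "technical-designer-frontend"
    # Backend is the default for all other cases.
    return "technical-designer"
```

A heuristic like this would misroute a backend document that uses "component" in a generic sense, which is one reason the skill leaves the decision to the agent.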
 **ADR Update Guidance**:
 - **Minor changes** (clarification, typo fix, small scope adjustment): Update the existing ADR file

package/.agents/skills/subagents-orchestration-guide/SKILL.md CHANGED

@@ -173,10 +173,10 @@ All agents MUST use this vocabulary consistently:
 ## Structured Response Specification
-Subagents respond in JSON format. Key fields for orchestrator decisions:
+Subagents respond in JSON format. The final response from each JSON-returning subagent must be the JSON payload itself, with no trailing prose. Key fields for orchestrator decisions:
 - **requirement-analyzer**: scale, confidence, affectedLayers, adrRequired, scopeDependencies, questions
 - **task-executor**: status (escalation_needed/blocked/completed), testsAdded, requiresTestReview
-- **quality-fixer**: approved (true/false)
+- **quality-fixer**: status (approved/blocked)
 - **document-reviewer**: verdict.decision (approved/approved_with_conditions/needs_revision/rejected)
 - **design-sync**: sync_status (CONFLICTS_FOUND/NO_CONFLICTS) — text format with [SUMMARY] block
 - **integration-test-reviewer**: status (approved/needs_revision/blocked), requiredFixes
@@ -310,7 +310,7 @@ Stop autonomous execution and escalate to user in the following cases:
      - `approved`: Proceed to step 3
    - Otherwise: Proceed to step 3
 3. quality-fixer: Quality check and fixes
-4. git commit (on `approved: true`)
+4. git commit (on `status: "approved"`)
 ## Main Orchestrator Roles

package/.agents/skills/subagents-orchestration-guide/references/monorepo-flow.md CHANGED

@@ -99,13 +99,13 @@ Each task uses the standard 4-step cycle with layer-appropriate agents:
 1. task-executor: Implementation
 2. Escalation check
 3. quality-fixer: Quality check and fixes
-4. git commit (on approved: true)
+4. git commit (on status: "approved")
 ### frontend-task
 1. task-executor-frontend: Implementation
 2. Escalation check
 3. quality-fixer-frontend: Quality check and fixes
-4. git commit (on approved: true)
+4. git commit (on status: "approved")
 ### integration-test-reviewer Placement

package/.codex/agents/code-reviewer.toml CHANGED

@@ -89,11 +89,14 @@ Verify against the Design Doc architecture:
 - No unnecessary duplicate implementations (Pattern 5 from ai-development-guide skill)
 - Existing codebase analysis section includes similar functionality investigation results
-### 5. Calculate Compliance and Produce Report
+### 5. Calculate Compliance
 - Compliance rate = (fulfilled items + 0.5 x partially fulfilled items) / total AC items x 100
 - Compile all AC statuses, quality issues with specific locations
 - Determine verdict based on compliance rate
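The compliance-rate formula above is simple enough to check by hand. A small sketch, with a hypothetical function name, where partially fulfilled items count for half:

```python
def compliance_rate(fulfilled: int, partially_fulfilled: int, total: int) -> float:
    """Compliance rate in percent: partially fulfilled AC items count for half."""
    if total <= 0:
        raise ValueError("total AC items must be positive")
    return (fulfilled + 0.5 * partially_fulfilled) * 100 / total
```

For example, 6 fulfilled and 2 partially fulfilled items out of 10 give a rate of 70.0.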
+### 6. Return JSON Result
+Return the JSON result as the final response. See Output Format for the schema.
 ## Output Format
 ```json
@@ -136,6 +139,13 @@ Verify against the Design Doc architecture:
 - Provide solutions, not just problems; quantify wherever possible
 - Acknowledge good implementations; present improvements as actionable items
+## Completion Criteria
+- [ ] All acceptance criteria individually evaluated
+- [ ] Compliance rate calculated
+- [ ] Verdict determined
+- [ ] Final response is the JSON output
 ### Escalation Criteria
 Recommend higher-level review when: Design Doc itself has deficiencies, security concerns discovered, or critical performance issues found.

package/.codex/agents/code-verifier.toml CHANGED

@@ -52,13 +52,6 @@ Skill Status:
 This agent outputs **verification results and discrepancy findings only**.
 Document modification and solution proposals are out of scope for this agent.
-## Core Responsibilities
-1. **Claim Extraction** - Extract verifiable claims from document
-2. **Multi-source Evidence Collection** - Gather evidence from code, tests, and config
-3. **Consistency Classification** - Classify each claim's implementation status
-4. **Coverage Assessment** - Identify undocumented code and unimplemented specifications
 ## Verification Framework
 ### Claim Categories
@@ -97,28 +90,38 @@ For each claim, classify as one of:
 ## Execution Steps
-### Step 1: Document Analysis
+### Step 1: Document Analysis — Section-by-Section Claim Extraction
-1. Read the target document
-2. Extract specific, testable claims
+1. Read the target document in full
+2. Process each section individually:
+   - Extract all statements that make verifiable claims about code behavior, data structures, file paths, API contracts, or system behavior
+   - Record `{ sectionName, claimCount, claims[] }`
+   - If a section contains factual statements but yields zero claims, record that explicitly for review
 3. Categorize each claim
 4. Note ambiguous claims that cannot be verified
+5. Minimum claim threshold: if `verifiableClaimCount < 20`, re-read under-covered sections and extract additional claims before continuing. Fewer than 20 claims usually indicates shallow extraction rather than a fully analyzed document.
 ### Step 2: Code Scope Identification
-1. Extract file paths mentioned in document
-2. Infer additional relevant paths from context
-3. Build verification target list
+1. If `code_paths` are provided, use them as a starting point, not a ceiling
+2. If `code_paths` are not provided, extract file paths from the document and expand scope by searching for referenced identifiers
+3. Infer additional relevant paths from context
+4. Build and record the final verification target list
 ### Step 3: Evidence Collection
 For each claim:
-1. **Primary Search**: Find direct implementation
+1. **Primary Search**: Find direct implementation with Read/Grep
 2. **Secondary Search**: Check test files for expected behavior
 3. **Tertiary Search**: Review config and type definitions
-Record source location and evidence strength for each finding.
+Evidence rules:
+- Record source location and evidence strength for each finding
+- Existence claims must be verified with Grep or file enumeration before reporting
+- Behavioral claims must be backed by reading the implementation, not by naming alone
+- Identifier claims must compare exact strings from code against the document
+- Single-source findings remain low confidence
 ### Step 4: Consistency Classification
@@ -130,11 +133,19 @@ For each claim with collected evidence:
    - medium: 2 sources agree
    - low: 1 source only
-### Step 5: Coverage Assessment
+### Step 5: Reverse Coverage Assessment — Code-to-Document Direction
+Perform this step with actual tool-backed enumeration, not memory:
+1. Enumerate routes/endpoints in scope and record whether each is documented
+2. Enumerate test files in scope and record whether their existence is documented
+3. Enumerate public exports/interfaces in primary source files and record whether each is documented
+4. Compile undocumented code items from the enumerations
+5. Compile unimplemented document items from earlier claim verification
-1. **Document Coverage**: What percentage of code is documented?
-2. **Implementation Coverage**: What percentage of specs are implemented?
-3. List undocumented features and unimplemented specs
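Once the enumerations from the new Step 5 exist, the counts in `reverseCoverage` reduce to set arithmetic. A minimal sketch, assuming the route and export names have already been collected by tool-backed enumeration. The function and parameter names are hypothetical, the output keys follow the Output Format in this diff, and the test-file fields are omitted for brevity:

```python
def reverse_coverage(
    routes_in_code: set[str],
    documented_routes: set[str],
    exports_in_code: set[str],
    documented_exports: set[str],
) -> dict:
    """Build the route and export parts of a reverseCoverage object."""
    return {
        "routesInCode": len(routes_in_code),
        # Only routes that really exist in code count as documented.
        "routesDocumented": len(routes_in_code & documented_routes),
        "undocumentedRoutes": sorted(routes_in_code - documented_routes),
        "exportsInCode": len(exports_in_code),
        "exportsDocumented": len(exports_in_code & documented_exports),
        "undocumentedExports": sorted(exports_in_code - documented_exports),
    }
```

Items that appear in the document but not in code are not reported here; per the steps above they come from the earlier claim verification.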
+### Step 6: Return JSON Result
+Return the JSON result as the final response. See Output Format for the schema.
 ## Output Format
@@ -147,9 +158,16 @@ For each claim with collected evidence:
   "summary": {
     "docType": "prd|design-doc",
     "documentPath": "/path/to/document.md",
+    "verifiableClaimCount": 24,
+    "matchCount": 20,
     "consistencyScore": 85,
     "status": "consistent|mostly_consistent|needs_review|inconsistent"
   },
+  "claimCoverage": {
+    "sectionsAnalyzed": 8,
+    "sectionsWithClaims": 7,
+    "sectionsWithZeroClaims": ["Appendix"]
+  },
   "discrepancies": [
     {
       "id": "D001",
@@ -158,9 +176,20 @@ For each claim with collected evidence:
       "claim": "Brief claim description",
       "documentLocation": "PRD.md:45",
       "codeLocation": "src/auth.ts:120",
+      "evidence": "Observed implementation or enumeration result",
       "classification": "What was found"
     }
   ],
+  "reverseCoverage": {
+    "routesInCode": 6,
+    "routesDocumented": 5,
+    "undocumentedRoutes": ["POST /admin/reindex (src/routes/admin.ts:42)"],
+    "testFilesFound": 4,
+    "testFilesDocumented": 2,
+    "exportsInCode": 12,
+    "exportsDocumented": 10,
+    "undocumentedExports": ["rebuildSearchIndex (src/search/index.ts:18)"]
+  },
   "coverage": {
     "documented": ["Feature areas with documentation"],
     "undocumented": ["Code features lacking documentation"],
@@ -186,6 +215,8 @@ consistencyScore = (matchCount / verifiableClaimCount) * 100
                    - (minorDiscrepancies * 2)
 ```
+If `verifiableClaimCount < 20`, treat the score as unstable and return to Step 1 before finalizing. This threshold exists to prevent shallow extraction from producing an artificially high score.
 | Score | Status | Interpretation |
 |-------|--------|----------------|
 | 85-100 | consistent | Document accurately reflects code |
@@ -195,19 +226,25 @@ consistencyScore = (matchCount / verifiableClaimCount) * 100
 ## Completion Criteria
-- [ ] Extracted all verifiable claims from document
+- [ ] Extracted claims section-by-section with per-section counts recorded
+- [ ] `verifiableClaimCount >= 20`
 - [ ] Collected evidence from multiple sources for each claim
 - [ ] Classified each claim (match/drift/gap/conflict)
+- [ ] Performed reverse coverage with route, test file, and public export enumeration
 - [ ] Identified undocumented features in code
 - [ ] Identified unimplemented specifications
 - [ ] Calculated consistency score
-- [ ] Output in specified format
+- [ ] Final response is the JSON output
 ## Output Self-Check
 - [ ] All findings are based on verification evidence (no modifications proposed)
+- [ ] Existence claims are backed by Grep or enumeration evidence
+- [ ] Behavioral claims are backed by reading the actual implementation
+- [ ] Identifier comparisons use exact strings from code
 - [ ] Each classification cites multiple sources (not single-source)
 - [ ] Low-confidence classifications are explicitly noted
 - [ ] Contradicting evidence is documented, not ignored
+- [ ] `reverseCoverage` includes concrete counts from tool-backed enumeration
 ## Completion Gate [BLOCKING]

package/.codex/agents/document-reviewer.toml CHANGED

@@ -127,13 +127,15 @@ Checklist:
 - [ ] If prior_context_count > 0: Each item has resolution status
 - [ ] If prior_context_count > 0: `prior_context_check` object prepared
 - [ ] Output is valid JSON
+- [ ] Final response is the JSON output
 Complete all items before proceeding to output.
-### Step 6: Review Result Report
-- Output results in JSON format according to perspective
+### Step 6: Return JSON Result
+- Use the JSON schema according to review mode (comprehensive or perspective-specific)
 - Clearly classify problem importance
 - Include `prior_context_check` object if prior_context_count > 0
+- Return the JSON result as the final response. See Output Format for the schema.
 ## Output Format

package/.codex/agents/integration-test-reviewer.toml CHANGED

@@ -78,6 +78,9 @@ Evaluate each test for:
 - No shared state
 - No time-dependent logic
+### 4. Return JSON Result
+Return the JSON result as the final response. See Output Format for the schema.
 ## Output Format
 ```json
@@ -137,6 +140,7 @@ Evaluate each test for:
 - [ ] No test interdependencies
 - [ ] Deterministic execution (no random/time dependency)
 - [ ] Test name matches verification content
+- [ ] Final response is the JSON output
 ## Common Issues and Fixes

package/.codex/agents/investigator.toml CHANGED

@@ -47,14 +47,6 @@ Skill Status:
 This agent outputs **evidence matrix and factual observations only**.
 Solution derivation is out of scope for this agent.
-## Core Responsibilities
-1. **Multi-source information collection (Triangulation)** - Collect data from multiple sources without depending on a single source
-2. **External information collection (web search)** - Search official documentation, community, and known library issues
-3. **Hypothesis enumeration and causal tracking** - List multiple causal relationship candidates and trace to root cause
-4. **Impact scope identification** - Identify locations implemented with the same pattern
-5. **Unexplored areas disclosure** - Honestly report areas that could not be investigated
 ## Execution Steps
 ### Step 1: Problem Understanding and Investigation Strategy
@@ -70,9 +62,18 @@ Solution derivation is out of scope for this agent.
 ### Step 2: Information Collection
-- **Internal sources**: Code, git history, dependencies, configuration, Design Doc/ADR
-- **External sources (web search)**: Official documentation, Stack Overflow, GitHub Issues, package issue trackers
-- **Comparison analysis**: Differences between working implementation and problematic area (call order, initialization timing, configuration values)
+Investigate each source type below and record findings even when empty:
+| Source | Minimum Investigation Action |
+|--------|------------------------------|
+| Code | Read directly related files and search for the reported symbols, errors, or messages |
+| git history | Review recent history for affected files and compare working/broken states when applicable |
+| Dependencies | Inspect package manifests and relevant package versions or changelogs |
+| Configuration | Read relevant config files and search for related keys across the project |
+| Design Doc or ADR | Search for matching docs and read them. Record findings or explicitly record that none were found |
+| External | Search official documentation for the primary technology and for the reported error text. Record findings or explicitly record that no relevant result was found |
+**Comparison analysis**: Differences between working implementation and problematic area (call order, initialization timing, configuration values)
 Information source priority:
 1. Comparison with "working implementation" in project
@@ -86,16 +87,17 @@ Information source priority:
 - Collect supporting and contradicting evidence for each hypothesis
 - Determine causeCategory: typo / logic_error / missing_constraint / design_gap / external_factor
-**Signs of shallow tracking**:
-- Stopping at "~ is not configured" → without tracing why it's not configured
-- Stopping at technical element names → without tracing why that state occurred
+**Tracking depth check**: Each causal chain must reach a stop condition. If it ends at a configuration state or technical label, continue tracing why that state exists.
-### Step 4: Impact Scope Identification and Output
+### Step 4: Impact Scope Identification
 - Search for locations implemented with the same pattern (impactScope)
 - Determine recurrenceRisk: low (isolated) / medium (2 or fewer locations) / high (3+ locations or design_gap)
 - Disclose unexplored areas and investigation limitations
-- Output in JSON format
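The `recurrenceRisk` rule above can be read as a small classifier. The sketch below is one interpretation, not the agent's definition: it assumes the count covers other locations that share the pattern, so 0 means isolated, 1 or 2 means medium, and 3 or more means high. The function name is hypothetical:

```python
def recurrence_risk(same_pattern_locations: int, cause_category: str) -> str:
    """Classify recurrence risk from impact scope size and cause category."""
    # Assumption: the count excludes the location under investigation,
    # so 0 means the problem is isolated.
    if cause_category == "design_gap" or same_pattern_locations >= 3:
        return "high"
    if same_pattern_locations >= 1:
        return "medium"
    return "low"
```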
+### Step 5: Return JSON Result
+Return the JSON result as the final response. See Output Format for the schema.
 ## Evidence Strength Classification
@@ -169,10 +171,11 @@ Information source priority:
 - [ ] Determined problem type and executed diff analysis for change failures
 - [ ] Output comparisonAnalysis
-- [ ] Investigated internal and external sources
+- [ ] Investigated each source type or recorded that it had no relevant findings
 - [ ] Enumerated 2+ hypotheses with causal tracking, evidence collection, and causeCategory determination for each
 - [ ] Determined impactScope and recurrenceRisk
 - [ ] Documented unexplored areas and investigation limitations
+- [ ] Final response is the JSON output
 ## Output Self-Check
 - [ ] Multiple hypotheses were evaluated (not just the first plausible one)