codex-workflows 0.2.2 → 0.2.4 (npm)

Files changed (16)

package/.agents/skills/documentation-criteria/SKILL.md +3 -3
package/.agents/skills/documentation-criteria/references/design-template.md +1 -26
package/.agents/skills/documentation-criteria/references/plan-template.md +3 -18
package/.agents/skills/recipe-add-integration-tests/SKILL.md +58 -18
package/.agents/skills/recipe-diagnose/SKILL.md +20 -4
package/.agents/skills/recipe-reverse-engineer/SKILL.md +13 -5
package/.codex/agents/code-verifier.toml +53 -20
package/.codex/agents/investigator.toml +14 -15
package/.codex/agents/prd-creator.toml +39 -24
package/.codex/agents/scope-discoverer.toml +23 -27
package/.codex/agents/task-decomposer.toml +1 -1
package/.codex/agents/technical-designer-frontend.toml +70 -117
package/.codex/agents/technical-designer.toml +72 -116
package/.codex/agents/verifier.toml +5 -12
package/.codex/agents/work-planner.toml +7 -6
package/package.json +1 -1

package/.agents/skills/documentation-criteria/SKILL.md CHANGED

@@ -64,16 +64,16 @@ description: "Documentation creation criteria for PRD, ADR, Design Doc, UI Spec,
 ### UI Specification
 **Purpose**: Define UI structure, screen transitions, component decomposition, and interaction design
 **Includes**: Screen list and transitions, component state x display matrix, interaction definitions, AC traceability, existing component reuse map, accessibility requirements
-**Excludes**: Technical implementation details, API contracts, test implementation, implementation schedule
+**Excludes**: Technical implementation details, API contracts, test implementation (generated by acceptance-test-generator), implementation schedule
 ### Design Document
 **Purpose**: Define technical implementation methods in detail
 **Includes**: Existing codebase analysis, technical approach, dependencies and constraints, interface/contract definitions, data flow, acceptance criteria, change impact map, code inspection evidence
-**Excludes**: Why that technology was chosen (reference ADR), when/who to implement (reference Work Plan)
+**Excludes**: Why that technology was chosen (reference ADR), when/who to implement (reference Work Plan), detailed test strategy and test case selection (generated by acceptance-test-generator from acceptance criteria)
 ### Work Plan
 **Purpose**: Implementation task management and progress tracking
-**Includes**: Task breakdown, schedule estimates, E2E verification procedures, Phase 4 Quality Assurance Phase (required), progress records
+**Includes**: Task breakdown, schedule estimates, test skeleton file paths, Phase 4 Quality Assurance Phase (required), progress records
 **Excludes**: Technical rationale, design details
 **Phase Division Criteria**:

package/.agents/skills/documentation-criteria/references/design-template.md CHANGED

@@ -259,40 +259,15 @@ System Invariants:
    - Prerequisites: [Required pre-implementations]
 ### Integration Points
-Each integration point requires E2E verification:
 **Integration Point 1: [Name]**
 - Components: [Component A] to [Component B]
-- Verification: [How to verify integration works]
+- Contract: [Interface/API contract between components]
 ### Migration Strategy
 [Technical migration approach, ensuring backward compatibility]
-## Test Strategy
-### Basic Test Design Policy
-Automatically derive test cases from acceptance criteria:
-- Create at least one test case for each acceptance criterion
-- Implement measurable standards from acceptance criteria as assertions
-### Unit Tests
-[Unit testing policy and coverage goals]
-### Integration Tests
-[Integration testing policy and important test cases]
-### E2E Tests
-[E2E testing policy]
-### Performance Tests
-[Performance testing methods and standards]
 ## Security Considerations
 Evaluate the following for this feature's trust boundaries and data flow:

package/.agents/skills/documentation-criteria/references/plan-template.md CHANGED

@@ -48,11 +48,6 @@ Related Issue/PR: #XXX (if any)
 - [ ] [Functional completion criteria]
 - [ ] [Quality completion criteria]
-#### Operational Verification Procedures
-1. [Operation verification steps]
-2. [Expected result verification]
-3. [Performance verification (when applicable)]
 ### Phase 2: [Phase Name] (Estimated commits: X)
 **Purpose**: [What this phase aims to achieve]
@@ -66,11 +61,6 @@ Related Issue/PR: #XXX (if any)
 - [ ] [Functional completion criteria]
 - [ ] [Quality completion criteria]
-#### Operational Verification Procedures
-1. [Operation verification steps]
-2. [Expected result verification]
-3. [Performance verification (when applicable)]
 ### Phase 3: [Phase Name] (Estimated commits: X)
 **Purpose**: [What this phase aims to achieve]
@@ -84,9 +74,6 @@ Related Issue/PR: #XXX (if any)
 - [ ] [Functional completion criteria]
 - [ ] [Quality completion criteria]
-#### Operational Verification Procedures
-[Copy relevant integration point operational verification from Design Doc]
 ### Final Phase: Quality Assurance (Required) (Estimated commits: 1)
 **Purpose**: Overall quality assurance and Design Doc consistency verification
@@ -94,13 +81,10 @@ Related Issue/PR: #XXX (if any)
 - [ ] Verify all Design Doc acceptance criteria achieved
 - [ ] Security review: Verify security considerations from Design Doc are implemented
 - [ ] Quality checks (types, lint, format)
-- [ ] Execute all tests
+- [ ] Execute all tests (including integration/E2E from test skeletons, when provided)
 - [ ] Coverage 70%+
 - [ ] Document updates
-#### Operational Verification Procedures
-[Copy operational verification procedures from Design Doc]
 ### Quality Assurance
 - [ ] Implement staged quality checks (details: refer to ai-development-guide skill)
 - [ ] All tests pass
@@ -110,7 +94,8 @@ Related Issue/PR: #XXX (if any)
 ## Completion Criteria
 - [ ] All phases completed
-- [ ] Each phase's operational verification procedures executed
+- [ ] All integration/E2E tests passing (when test skeletons provided)
+- [ ] Acceptance criteria manually verified (when test skeletons are not provided)
 - [ ] Design Doc acceptance criteria satisfied
 - [ ] Staged quality checks completed (zero errors)
 - [ ] All tests pass

package/.agents/skills/recipe-add-integration-tests/SKILL.md CHANGED

@@ -1,6 +1,6 @@
 ---
 name: recipe-add-integration-tests
-description: "Add integration/E2E tests to existing codebase using Design Doc acceptance criteria."
+description: "Add integration/E2E tests to existing codebase using Design Docs."
 ---
 ## Required Skills [LOAD BEFORE EXECUTION]
@@ -26,11 +26,11 @@ description: "Add integration/E2E tests to existing codebase using Design Doc ac
 - Test review -> Spawn integration-test-reviewer agent
 - Quality checks -> Spawn quality-fixer agent
-Design Doc path: $ARGUMENTS
+Document paths: $ARGUMENTS
 ## Prerequisites
-- Design Doc must exist (created manually or via reverse-engineer)
+- At least one Design Doc must exist (created manually or via reverse-engineer)
 - Existing implementation to test
 ## Execution Flow
@@ -39,27 +39,59 @@ Design Doc path: $ARGUMENTS
 Reference documentation-criteria skill for task file template in Step 3.
-### Step 1: Validate Design Doc
+### Step 1: Discover and Validate Documents
-Verify Design Doc exists at $ARGUMENTS or find the most recent in docs/design/.
+```bash
+# Verify at least one document path was provided
+test -n "$ARGUMENTS" || { echo "ERROR: No document paths provided"; exit 1; }
+# Verify provided paths exist
+ls $ARGUMENTS
+```
+Use only the user-provided paths in `$ARGUMENTS`. Do not auto-discover additional Design Docs or UI Specs.
+Classify provided documents by path and filename, using first-match-wins:
+- Path matches `docs/ui-spec/*.md` -> **UI Spec**
+- Path matches `docs/design/*-backend-*.md` or `docs/design/*backend*.md` -> **Design Doc (backend)**
+- Path matches `docs/design/*-frontend-*.md` or `docs/design/*frontend*.md` -> **Design Doc (frontend)**
+- Path matches `docs/design/*.md` and none of the above -> **single-layer Design Doc**
+If a filename appears to match both backend and frontend, halt and ask the user which layer it belongs to.
 ### Step 2: Skeleton Generation
-Spawn acceptance-test-generator agent: "Generate test skeletons from Design Doc at [path from Step 1]."
+Spawn acceptance-test-generator agent with only the documents that exist from Step 1:
+```text
+Generate test skeletons from the following documents:
+- Design Doc (backend): [path]    <- include only if exists
+- Design Doc (frontend): [path]   <- include only if exists
+- UI Spec: [path]                 <- include only if exists
+```
-**Expected output**: `generatedFiles` containing integration and e2e paths
+**Expected output**: `generatedFiles` as a structured object grouped by layer, for example:
+```json
+{
+  "backend": ["path/to/backend.int.test.ts"],
+  "frontend": ["path/to/frontend.int.test.ts"],
+  "e2e": ["path/to/flow.e2e.test.ts"]
+}
+```
-### Step 3: Create Task File [GATE]
+### Step 3: Create Task Files [GATE]
 **[STOP — BLOCKING]** Present task file content to user for confirmation before proceeding to implementation.
 **CANNOT proceed until user explicitly confirms.**
-Create task file at: `docs/plans/tasks/integration-tests-YYYYMMDD.md`
+Create one task file per layer, using the monorepo-flow.md naming convention for deterministic agent routing:
+- Backend skeletons exist -> `docs/plans/tasks/integration-tests-backend-task-YYYYMMDD.md`
+- Frontend skeletons exist -> `docs/plans/tasks/integration-tests-frontend-task-YYYYMMDD.md`
+- Single-layer (no backend/frontend distinction) -> `docs/plans/tasks/integration-tests-backend-task-YYYYMMDD.md`
-**Template**:
+**Template** (per task file):
 ```markdown
 ---
-name: Implement integration tests for [feature name]
+name: Implement [layer] integration tests for [feature name]
 type: test-implementation
 ---
@@ -69,8 +101,8 @@ Implement test cases defined in skeleton files.
 ## Target Files
-- Skeleton: [path from Step 2 generatedFiles]
-- Design Doc: [path from Step 1]
+- Skeleton: [layer-specific paths from Step 2 generatedFiles]
+- Design Doc: [layer-specific Design Doc from Step 1]
 ## Tasks
@@ -85,17 +117,22 @@ Implement test cases defined in skeleton files.
 - No quality issues
 ```
-**Output**: "Task file created at [path]. Ready for Step 4."
+**Output**: "Task file(s) created at [path(s)]. Ready for Step 4."
 ### Step 4: Test Implementation
-Spawn task-executor agent: "Implement integration tests. Task file: docs/plans/tasks/integration-tests-YYYYMMDD.md. Implement tests following the task file."
+For each task file from Step 3, invoke task-executor routed by filename pattern:
+- `*-backend-task-*` -> Spawn `task-executor`
+- `*-frontend-task-*` -> Spawn `task-executor-frontend`
+- Prompt: "Task file: [task file path from Step 3]. Implement tests following the task file."
+Execute one task file at a time through Steps 4 -> 5 -> 6 -> 7 before starting the next.
 **Expected output**: `status`, `testsAdded`
 ### Step 5: Test Review
-Spawn integration-test-reviewer agent: "Review test quality. Test files: [paths from Step 4 testsAdded]. Skeleton files: [paths from Step 2 generatedFiles]."
+Spawn integration-test-reviewer agent: "Review test quality. Test files: [paths from Step 4 testsAdded]. Skeleton files: [layer-specific paths from Step 2 generatedFiles matching current task's layer]."
 **Expected output**: `status` (approved/needs_revision), `requiredFixes`
@@ -103,11 +140,14 @@ Spawn integration-test-reviewer agent: "Review test quality. Test files: [paths
 Check Step 5 result:
 - `status: approved` -> Mark complete, proceed to Step 7
-- `status: needs_revision` -> Spawn task-executor agent: "Fix the following issues in test files: [requiredFixes from Step 5]." Then return to Step 5.
+- `status: needs_revision` -> Spawn the layer-appropriate executor with: "Fix the following issues in test files: [requiredFixes from Step 5]." Then return to Step 5. Maximum 2 revision cycles per task file; if still `needs_revision`, escalate to the user.
 ### Step 7: Quality Check
-Spawn quality-fixer agent: "Final quality assurance for test files added in this workflow. Run all tests and verify coverage."
+Spawn quality-fixer routed by task filename pattern:
+- `*-backend-task-*` -> Spawn `quality-fixer`
+- `*-frontend-task-*` -> Spawn `quality-fixer-frontend`
+- Prompt: "Final quality assurance for test files added in this workflow. Run all tests and verify coverage."
 **Expected output**: `status` (`approved`/`blocked`)
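
The first-match-wins classification from Step 1 and the filename-based executor routing from Step 4 can be sketched in Python as follows. This is an illustrative sketch, not code shipped in the package: the function names, the returned labels, and the choice to raise on unrecognized paths are assumptions, and `fnmatchcase` lets `*` match `/`, which may be looser than the recipe intends.

```python
from fnmatch import fnmatchcase

# Glob patterns copied from Step 1 of the updated recipe.
UI_SPEC = ["docs/ui-spec/*.md"]
BACKEND = ["docs/design/*-backend-*.md", "docs/design/*backend*.md"]
FRONTEND = ["docs/design/*-frontend-*.md", "docs/design/*frontend*.md"]
SINGLE_LAYER = ["docs/design/*.md"]


def _matches(path: str, patterns: list[str]) -> bool:
    return any(fnmatchcase(path, pattern) for pattern in patterns)


def classify(path: str) -> str:
    """Classify a user-provided document path (hypothetical helper)."""
    if _matches(path, UI_SPEC):
        return "ui-spec"
    is_backend = _matches(path, BACKEND)
    is_frontend = _matches(path, FRONTEND)
    if is_backend and is_frontend:
        # The recipe says to halt and ask the user in this case.
        raise ValueError(f"ambiguous layer, ask the user: {path}")
    if is_backend:
        return "design-backend"
    if is_frontend:
        return "design-frontend"
    if _matches(path, SINGLE_LAYER):
        return "design-single-layer"
    # Assumption: the recipe does not say what to do with other paths.
    raise ValueError(f"unrecognized document path: {path}")


def route_executor(task_file: str) -> str:
    """Pick the executor agent from the task file name, as in Step 4."""
    name = task_file.rsplit("/", 1)[-1]
    if fnmatchcase(name, "*-backend-task-*"):
        return "task-executor"
    if fnmatchcase(name, "*-frontend-task-*"):
        return "task-executor-frontend"
    raise ValueError(f"no routing rule for task file: {task_file}")
```

Because single-layer task files reuse the `integration-tests-backend-task-YYYYMMDD.md` name, they route to `task-executor` without a separate rule.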

package/.agents/skills/recipe-diagnose/SKILL.md CHANGED

@@ -83,7 +83,21 @@ Register the following and execute:
 ### Step 1: Investigation (investigator)
-Spawn investigator agent: "Comprehensively collect information related to the following phenomenon. Phenomenon: [Problem reported by user]. Problem essence: [taskEssence]. Investigation focus: [investigationFocus]. Applicable rules: [selectedRules summary]."
+Spawn investigator agent with the following prompt:
+```text
+Comprehensively collect information related to the following phenomenon.
+Phenomenon: [Problem reported by user]
+Problem essence: [taskEssence]
+Investigation focus: [investigationFocus]
+Applicable rules: [selectedRules summary]
+For change failures, also include:
+- what changed
+- what broke
+- what both areas share
+```
 **Expected output**: Evidence matrix, comparison analysis results, causal tracking results, list of unexplored areas, investigation limitations
@@ -92,12 +106,14 @@ Spawn investigator agent: "Comprehensively collect information related to the fo
 Review investigation output:
 **Quality Check** (verify output contains the following):
-- [ ] comparisonAnalysis
-- [ ] causalChain for each hypothesis (reaching stop condition)
+- [ ] `comparisonAnalysis` is present and `normalImplementation` is non-null, or explicitly states that no working implementation was found
+- [ ] causalChain for each hypothesis reaches a stop condition
 - [ ] causeCategory for each hypothesis
+- [ ] `investigationSources` covers at least 3 distinct source types
+- [ ] each hypothesis has supporting evidence with a concrete source
 - [ ] Investigation covering investigationFocus items (when provided)
-**If quality insufficient**: MUST re-spawn investigator agent specifying missing items
+**If quality insufficient**: MUST re-spawn investigator agent specifying the missing items and include the previous investigation output for context
 ENFORCEMENT: Proceeding to verifier with incomplete investigation data produces unreliable conclusions.
 **design_gap Escalation**:
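
Part of the expanded Quality Check could be automated along the following lines. The diff does not show the investigator's output schema, so the shapes used here are assumptions for illustration only: `investigationSources` as a list of objects with a `type` key, `hypotheses` as a list of objects with `causeCategory` and `supportingEvidence`, and the function name itself. The stop-condition check on each `causalChain` is left out because its representation is not visible in the diff.

```python
from typing import Any


def missing_quality_items(output: dict[str, Any]) -> list[str]:
    """Return the checklist items that an investigation output fails.

    Hypothetical helper; field shapes are assumed, not taken from the package.
    """
    missing: list[str] = []

    # Any non-null value passes, including an explicit note that no
    # working implementation was found.
    comparison = output.get("comparisonAnalysis")
    if not isinstance(comparison, dict) or comparison.get("normalImplementation") is None:
        missing.append("comparisonAnalysis.normalImplementation")

    source_types = {
        source.get("type")
        for source in output.get("investigationSources", [])
        if source.get("type")
    }
    if len(source_types) < 3:
        missing.append("investigationSources (need 3+ distinct source types)")

    for index, hypothesis in enumerate(output.get("hypotheses", [])):
        if not hypothesis.get("causeCategory"):
            missing.append(f"hypotheses[{index}].causeCategory")
        evidence = hypothesis.get("supportingEvidence", [])
        if not any(item.get("source") for item in evidence):
            missing.append(f"hypotheses[{index}] supporting evidence with a concrete source")

    return missing
```

A non-empty result corresponds to the "quality insufficient" branch: the listed items, together with the previous investigation output, would go into the re-spawn prompt.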

package/.agents/skills/recipe-reverse-engineer/SKILL.md CHANGED

@@ -69,6 +69,7 @@ Spawn scope-discoverer agent: "Discover functional scope targets in the codebase
 - No units discovered -> ask user for hints
 - `$STEP_1_OUTPUT.prdUnits` exists
 - All `sourceUnits` across `prdUnits` (flattened, deduplicated) match the set of `discoveredUnits` IDs — no unit missing, no unit duplicated
+- Each discovered unit's `unitInventory` has at least one non-empty category. If all categories are empty, re-run discovery with focus on that unit
 **[STOP — BLOCKING]** If human review enabled: Present `$STEP_1_OUTPUT.prdUnits` with their source unit mapping to user for confirmation.
 **CANNOT proceed until user explicitly confirms.**
@@ -79,7 +80,7 @@ Spawn scope-discoverer agent: "Discover functional scope targets in the codebase
 #### Step 2: PRD Generation
-Spawn prd-creator agent: "Create reverse-engineered PRD for the following feature. Operation Mode: reverse-engineer. External Scope Provided: true. Feature: $PRD_UNIT_NAME. Description: $PRD_UNIT_DESCRIPTION. Related Files: $PRD_UNIT_COMBINED_RELATED_FILES. Entry Points: $PRD_UNIT_COMBINED_ENTRY_POINTS. Source Units: $PRD_UNIT_SOURCE_UNITS. Skip independent scope discovery. Use provided scope data. Create final version PRD based on code investigation within specified scope."
+Spawn prd-creator agent: "Create reverse-engineered PRD for the following feature. Operation Mode: reverse-engineer. External Scope Provided: true. Feature: $PRD_UNIT_NAME. Description: $PRD_UNIT_DESCRIPTION. Related Files: $PRD_UNIT_COMBINED_RELATED_FILES. Entry Points: $PRD_UNIT_COMBINED_ENTRY_POINTS. Source Units: $PRD_UNIT_SOURCE_UNITS. Use provided scope as an investigation starting point. If tracing entry points reveals directly connected files outside this scope, include them. Create final version PRD based on thorough code investigation."
 **Store output as**: `$STEP_2_OUTPUT` (PRD path)
@@ -87,12 +88,13 @@ Spawn prd-creator agent: "Create reverse-engineered PRD for the following featur
 **Prerequisite**: $STEP_2_OUTPUT (PRD path from Step 2)
-Spawn code-verifier agent: "Verify consistency between PRD and code implementation. doc_type: prd. document_path: $STEP_2_OUTPUT. code_paths: $PRD_UNIT_COMBINED_RELATED_FILES. verbose: false."
+Spawn code-verifier agent: "Verify consistency between PRD and code implementation. doc_type: prd. document_path: $STEP_2_OUTPUT. verbose: false."
 **Store output as**: `$STEP_3_OUTPUT`
 **Quality Gate**:
-- consistencyScore >= 70 -> proceed to review
+- consistencyScore >= 70 and verifiableClaimCount >= 20 -> proceed to review (guards against shallow verification passes with too few extracted claims)
+- consistencyScore >= 70 and verifiableClaimCount < 20 -> re-run verifier because investigation depth is insufficient
 - consistencyScore < 70 -> flag for detailed review
 #### Step 4: Review
@@ -151,6 +153,7 @@ Map PRD units to Design Doc generation targets by resolving each PRD unit's `sou
 - `technicalProfile.publicInterfaces` -> Public Interfaces
 - `dependencies` -> Dependencies
 - `relatedFiles` -> Scope boundary
+- `unitInventory` -> Unit Inventory
 **Store output as**: `$STEP_6_OUTPUT`
@@ -168,6 +171,11 @@ Map PRD units to Design Doc generation targets by resolving each PRD unit's `sou
     "publicInterfaces": ["AuthService.login()", "AuthController.handleLogin()"],
     "dependencies": ["UNIT-003"],
     "scopeBoundary": ["src/auth/*"],
+    "unitInventory": {
+      "routes": [],
+      "testFiles": [],
+      "publicExports": []
+    },
     "mappingRationale": "Default 1:1 mapping from PRD unit because technical scope is cohesive"
   }
 ]
@@ -186,13 +194,13 @@ Map PRD units to Design Doc generation targets by resolving each PRD unit's `sou
 **Scope**: Document current architecture as-is. This is a documentation task, not a design improvement task.
-Spawn technical-designer agent: "Create Design Doc for the following feature based on existing code. Operation Mode: create. Feature: $UNIT_NAME. Description: $UNIT_DESCRIPTION. Primary Files: $UNIT_PRIMARY_MODULES. Public Interfaces: $UNIT_PUBLIC_INTERFACES. Dependencies: $UNIT_DEPENDENCIES. Parent PRD: $APPROVED_PRD_PATH. Document current architecture as-is."
+Spawn technical-designer agent: "Create Design Doc for the following feature based on existing code. Operation Mode: reverse-engineer. Feature: $UNIT_NAME. Description: $UNIT_DESCRIPTION. Primary Files: $UNIT_PRIMARY_MODULES. Public Interfaces: $UNIT_PUBLIC_INTERFACES. Dependencies: $UNIT_DEPENDENCIES. Unit Inventory: $UNIT_INVENTORY. Parent PRD: $APPROVED_PRD_PATH. Document current architecture as-is. Use Unit Inventory as the completeness baseline."
 **Store output as**: `$STEP_7_OUTPUT`
 #### Step 8: Code Verification
-Spawn code-verifier agent: "Verify consistency between Design Doc and code implementation. doc_type: design-doc. document_path: $STEP_7_OUTPUT. code_paths: $UNIT_PRIMARY_MODULES. verbose: false."
+Spawn code-verifier agent: "Verify consistency between Design Doc and code implementation. doc_type: design-doc. document_path: $STEP_7_OUTPUT. verbose: false."
 **Store output as**: `$STEP_8_OUTPUT`
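
The revised Step 3 Quality Gate is a three-way decision on `consistencyScore` and `verifiableClaimCount`. A minimal sketch follows; the function name and the returned labels are hypothetical, while the thresholds 70 and 20 come from the diff.

```python
def quality_gate(consistency_score: float, verifiable_claim_count: int) -> str:
    """Map code-verifier summary values to the next action (illustrative labels)."""
    if consistency_score < 70:
        # A low score is flagged regardless of how many claims were extracted.
        return "flag_for_detailed_review"
    if verifiable_claim_count < 20:
        # A high score built on too few claims suggests shallow verification.
        return "rerun_verifier"
    return "proceed_to_review"
```

For example, a score of 85 with 12 extracted claims yields `rerun_verifier`, while the same score with 24 claims yields `proceed_to_review`.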

package/.codex/agents/code-verifier.toml CHANGED

@@ -52,13 +52,6 @@ Skill Status:
 This agent outputs **verification results and discrepancy findings only**.
 Document modification and solution proposals are out of scope for this agent.
-## Core Responsibilities
-1. **Claim Extraction** - Extract verifiable claims from document
-2. **Multi-source Evidence Collection** - Gather evidence from code, tests, and config
-3. **Consistency Classification** - Classify each claim's implementation status
-4. **Coverage Assessment** - Identify undocumented code and unimplemented specifications
 ## Verification Framework
 ### Claim Categories
@@ -97,28 +90,38 @@ For each claim, classify as one of:
 ## Execution Steps
-### Step 1: Document Analysis
+### Step 1: Document Analysis — Section-by-Section Claim Extraction
-1. Read the target document
-2. Extract specific, testable claims
+1. Read the target document in full
+2. Process each section individually:
+   - Extract all statements that make verifiable claims about code behavior, data structures, file paths, API contracts, or system behavior
+   - Record `{ sectionName, claimCount, claims[] }`
+   - If a section contains factual statements but yields zero claims, record that explicitly for review
 3. Categorize each claim
 4. Note ambiguous claims that cannot be verified
+5. Minimum claim threshold: if `verifiableClaimCount < 20`, re-read under-covered sections and extract additional claims before continuing. Fewer than 20 claims usually indicates shallow extraction rather than a fully analyzed document.
 ### Step 2: Code Scope Identification
-1. Extract file paths mentioned in document
-2. Infer additional relevant paths from context
-3. Build verification target list
+1. If `code_paths` are provided, use them as a starting point, not a ceiling
+2. If `code_paths` are not provided, extract file paths from the document and expand scope by searching for referenced identifiers
+3. Infer additional relevant paths from context
+4. Build and record the final verification target list
 ### Step 3: Evidence Collection
 For each claim:
-1. **Primary Search**: Find direct implementation
+1. **Primary Search**: Find direct implementation with Read/Grep
 2. **Secondary Search**: Check test files for expected behavior
 3. **Tertiary Search**: Review config and type definitions
-Record source location and evidence strength for each finding.
+Evidence rules:
+- Record source location and evidence strength for each finding
+- Existence claims must be verified with Grep or file enumeration before reporting
+- Behavioral claims must be backed by reading the implementation, not by naming alone
+- Identifier claims must compare exact strings from code against the document
+- Single-source findings remain low confidence
 ### Step 4: Consistency Classification
@@ -130,11 +133,15 @@ For each claim with collected evidence:
    - medium: 2 sources agree
    - low: 1 source only
-### Step 5: Coverage Assessment
+### Step 5: Reverse Coverage Assessment — Code-to-Document Direction
+Perform this step with actual tool-backed enumeration, not memory:
-1. **Document Coverage**: What percentage of code is documented?
-2. **Implementation Coverage**: What percentage of specs are implemented?
-3. List undocumented features and unimplemented specs
+1. Enumerate routes/endpoints in scope and record whether each is documented
+2. Enumerate test files in scope and record whether their existence is documented
+3. Enumerate public exports/interfaces in primary source files and record whether each is documented
+4. Compile undocumented code items from the enumerations
+5. Compile unimplemented document items from earlier claim verification
 ### Step 6: Return JSON Result
@@ -151,9 +158,16 @@ Return the JSON result as the final response. See Output Format for the schema.
   "summary": {
     "docType": "prd|design-doc",
     "documentPath": "/path/to/document.md",
+    "verifiableClaimCount": 24,
+    "matchCount": 20,
     "consistencyScore": 85,
     "status": "consistent|mostly_consistent|needs_review|inconsistent"
   },
+  "claimCoverage": {
+    "sectionsAnalyzed": 8,
+    "sectionsWithClaims": 7,
+    "sectionsWithZeroClaims": ["Appendix"]
+  },
   "discrepancies": [
     {
       "id": "D001",
@@ -162,9 +176,20 @@ Return the JSON result as the final response. See Output Format for the schema.
       "claim": "Brief claim description",
       "documentLocation": "PRD.md:45",
       "codeLocation": "src/auth.ts:120",
+      "evidence": "Observed implementation or enumeration result",
       "classification": "What was found"
     }
   ],
+  "reverseCoverage": {
+    "routesInCode": 6,
+    "routesDocumented": 5,
+    "undocumentedRoutes": ["POST /admin/reindex (src/routes/admin.ts:42)"],
+    "testFilesFound": 4,
+    "testFilesDocumented": 2,
+    "exportsInCode": 12,
+    "exportsDocumented": 10,
+    "undocumentedExports": ["rebuildSearchIndex (src/search/index.ts:18)"]
+  },
   "coverage": {
     "documented": ["Feature areas with documentation"],
     "undocumented": ["Code features lacking documentation"],
@@ -190,6 +215,8 @@ consistencyScore = (matchCount / verifiableClaimCount) * 100
                    - (minorDiscrepancies * 2)
 ```
+If `verifiableClaimCount < 20`, treat the score as unstable and return to Step 1 before finalizing. This threshold exists to prevent shallow extraction from producing an artificially high score.
 | Score | Status | Interpretation |
 |-------|--------|----------------|
 | 85-100 | consistent | Document accurately reflects code |
@@ -199,9 +226,11 @@ consistencyScore = (matchCount / verifiableClaimCount) * 100
 ## Completion Criteria
-- [ ] Extracted all verifiable claims from document
+- [ ] Extracted claims section-by-section with per-section counts recorded
+- [ ] `verifiableClaimCount >= 20`
 - [ ] Collected evidence from multiple sources for each claim
 - [ ] Classified each claim (match/drift/gap/conflict)
+- [ ] Performed reverse coverage with route, test file, and public export enumeration
 - [ ] Identified undocumented features in code
 - [ ] Identified unimplemented specifications
 - [ ] Calculated consistency score
@@ -209,9 +238,13 @@ consistencyScore = (matchCount / verifiableClaimCount) * 100
 ## Output Self-Check
 - [ ] All findings are based on verification evidence (no modifications proposed)
+- [ ] Existence claims are backed by Grep or enumeration evidence
+- [ ] Behavioral claims are backed by reading the actual implementation
+- [ ] Identifier comparisons use exact strings from code
 - [ ] Each classification cites multiple sources (not single-source)
 - [ ] Low-confidence classifications are explicitly noted
 - [ ] Contradicting evidence is documented, not ignored
+- [ ] `reverseCoverage` includes concrete counts from tool-backed enumeration
 ## Completion Gate [BLOCKING]
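
The per-section records from Step 1 (`{ sectionName, claimCount, claims[] }`) can be rolled up into the new `claimCoverage` block roughly as shown below. This sketch assumes that `verifiableClaimCount` is the sum of the per-section counts, which the diff does not state explicitly; the function name and the `needsReExtraction` flag are hypothetical.

```python
def summarize_claim_coverage(sections: list[dict]) -> dict:
    """Aggregate per-section claim records into summary fields (illustrative)."""
    zero_claim_sections = [s["sectionName"] for s in sections if s["claimCount"] == 0]
    total_claims = sum(s["claimCount"] for s in sections)
    return {
        "verifiableClaimCount": total_claims,
        "claimCoverage": {
            "sectionsAnalyzed": len(sections),
            "sectionsWithClaims": len(sections) - len(zero_claim_sections),
            "sectionsWithZeroClaims": zero_claim_sections,
        },
        # Hypothetical flag: below 20 claims the agent returns to Step 1.
        "needsReExtraction": total_claims < 20,
    }
```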

package/.codex/agents/investigator.toml CHANGED

@@ -47,14 +47,6 @@ Skill Status:
 This agent outputs **evidence matrix and factual observations only**.
 Solution derivation is out of scope for this agent.
-## Core Responsibilities
-1. **Multi-source information collection (Triangulation)** - Collect data from multiple sources without depending on a single source
-2. **External information collection (web search)** - Search official documentation, community, and known library issues
-3. **Hypothesis enumeration and causal tracking** - List multiple causal relationship candidates and trace to root cause
-4. **Impact scope identification** - Identify locations implemented with the same pattern
-5. **Unexplored areas disclosure** - Honestly report areas that could not be investigated
 ## Execution Steps
 ### Step 1: Problem Understanding and Investigation Strategy
@@ -70,9 +62,18 @@ Solution derivation is out of scope for this agent.
 ### Step 2: Information Collection
-- **Internal sources**: Code, git history, dependencies, configuration, Design Doc/ADR
-- **External sources (web search)**: Official documentation, Stack Overflow, GitHub Issues, package issue trackers
-- **Comparison analysis**: Differences between working implementation and problematic area (call order, initialization timing, configuration values)
+Investigate each source type below and record findings even when empty:
+| Source | Minimum Investigation Action |
+|--------|------------------------------|
+| Code | Read directly related files and search for the reported symbols, errors, or messages |
+| git history | Review recent history for affected files and compare working/broken states when applicable |
+| Dependencies | Inspect package manifests and relevant package versions or changelogs |
+| Configuration | Read relevant config files and search for related keys across the project |
+| Design Doc or ADR | Search for matching docs and read them. Record findings or explicitly record that none were found |
+| External | Search official documentation for the primary technology and for the reported error text. Record findings or explicitly record that no relevant result was found |
+**Comparison analysis**: Differences between working implementation and problematic area (call order, initialization timing, configuration values)
 Information source priority:
 1. Comparison with "working implementation" in project
@@ -86,9 +87,7 @@ Information source priority:
 - Collect supporting and contradicting evidence for each hypothesis
 - Determine causeCategory: typo / logic_error / missing_constraint / design_gap / external_factor
-**Signs of shallow tracking**:
-- Stopping at "~ is not configured" → without tracing why it's not configured
-- Stopping at technical element names → without tracing why that state occurred
+**Tracking depth check**: Each causal chain must reach a stop condition. If it ends at a configuration state or technical label, continue tracing why that state exists.
 ### Step 4: Impact Scope Identification
@@ -172,7 +171,7 @@ Return the JSON result as the final response. See Output Format for the schema.
 - [ ] Determined problem type and executed diff analysis for change failures
 - [ ] Output comparisonAnalysis
-- [ ] Investigated internal and external sources
+- [ ] Investigated each source type or recorded that it had no relevant findings
 - [ ] Enumerated 2+ hypotheses with causal tracking, evidence collection, and causeCategory determination for each
 - [ ] Determined impactScope and recurrenceRisk
 - [ ] Documented unexplored areas and investigation limitations