npm - codex-workflows - Versions diffs - 0.6.4 → 0.6.6 - Mend

codex-workflows 0.6.4 → 0.6.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (23)

package/.agents/skills/documentation-criteria/references/plan-template.md +27 -0
package/.agents/skills/documentation-criteria/references/task-template.md +12 -0
package/.agents/skills/recipe-build/SKILL.md +20 -6
package/.agents/skills/recipe-front-build/SKILL.md +20 -6
package/.agents/skills/recipe-front-plan/SKILL.md +12 -0
package/.agents/skills/recipe-front-review/SKILL.md +2 -1
package/.agents/skills/recipe-fullstack-build/SKILL.md +20 -6
package/.agents/skills/recipe-implement/SKILL.md +1 -1
package/.agents/skills/recipe-plan/SKILL.md +14 -1
package/.agents/skills/recipe-review/SKILL.md +2 -1
package/.agents/skills/subagents-orchestration-guide/SKILL.md +15 -4
package/.agents/skills/subagents-orchestration-guide/references/monorepo-flow.md +12 -4
package/.codex/agents/acceptance-test-generator.toml +12 -2
package/.codex/agents/code-reviewer.toml +6 -1
package/.codex/agents/document-reviewer.toml +18 -1
package/.codex/agents/integration-test-reviewer.toml +22 -2
package/.codex/agents/quality-fixer-frontend.toml +28 -91
package/.codex/agents/quality-fixer.toml +7 -0
package/.codex/agents/task-decomposer.toml +16 -0
package/.codex/agents/task-executor-frontend.toml +49 -48
package/.codex/agents/task-executor.toml +41 -44
package/.codex/agents/work-planner.toml +28 -1
package/package.json +1 -1

package/.codex/agents/integration-test-reviewer.toml CHANGED

@@ -64,6 +64,8 @@ Extract the following annotation patterns from the test file using the project's
 - `@dependency:` → Dependencies
 - `@real-dependency:` → Dependencies expected to stay real in integration coverage
 - `Verification items:` → Expected verification items (if present)
+- `Primary failure mode:` → Regression the test must detect
+- `Proof obligation:` → Boundary, state, and mock-rationale obligations the test must satisfy
 ### 2. Implementation Verification
 For each test case:
@@ -76,10 +78,23 @@ Evaluate each test for:
 - Clear Arrange section (setup)
 - Single Act (action)
 - Meaningful Assert (verification)
+- Substantive assertion:
+  - Passed condition: at least one executed assertion observes the AC's behavior
+  - Non-substantive examples: skipped tests, `skip`/`xit`, placeholder/TODO-only bodies, 0-match runner reports, grep-only matches without behavior verification, or always-passing assertions (for example `expect(true).toBe(true)` or `expect(arr.length).toBeGreaterThanOrEqual(0)`)
+  - Intentional absence: counts when absence is the AC expectation
 - No shared state
 - No time-dependent logic
-### 4. Return JSON Result
+### 4. Claim Proof Adequacy
+Confirm each test proves its acceptance criterion claim, not merely that code ran. Record a `proof_insufficient` issue for each unmet obligation:
+- The test would fail under the stated primary failure mode because an assertion observes the promised behavior.
+- When the claim involves a public, integration, browser, process, service, or persistence boundary, the test exercises that boundary rather than a substitute input that bypasses it.
+- When the claim involves state change, side effect, rollback, non-mutating mode, idempotency, or persistence, the test asserts observable state before the action, performs the action, and asserts observable state after the action.
+- Each mocked boundary is external to the behavior under test, with the boundary under test left real and a comment explaining why the mock is permitted.
+- Integration and E2E tests use bounded fixtures and assert outcomes that hold regardless of shared state, real data volume, or execution order.
+### 5. Return JSON Result
 Return the JSON result as the final response. See Output Format for the schema.
 ## Output Format
@@ -98,7 +113,7 @@ Return the JSON result as the final response. See Output Format for the schema.
   "qualityIssues": [
     {
       "testName": "[test name]",
-      "issueType": "skeleton_mismatch|aaa_violation|independence_violation|mock_boundary|readability",
+      "issueType": "skeleton_mismatch|aaa_violation|independence_violation|mock_boundary|proof_insufficient|readability",
       "severity": "high|medium|low",
       "description": "[specific issue]",
       "skeletonExpected": "[what skeleton specified]",
@@ -136,6 +151,7 @@ Return the JSON result as the final response. See Output Format for the schema.
 - [ ] Every test has corresponding skeleton comment
 - [ ] Observable result from Behavior is asserted
 - [ ] All Verification items are covered
+- [ ] Each test proves the claim by failing under the primary failure mode and exercising the stated boundary
 - [ ] No internal component mocking in integration tests
 - [ ] Clear Arrange/Act/Assert separation
 - [ ] No test interdependencies
@@ -165,6 +181,10 @@ Return the JSON result as the final response. See Output Format for the schema.
 **Issue**: Tests share state or depend on execution order
 **Fix**: Reset state in beforeEach, make each test self-contained
+### Hollow or Placeholder Assertion
+**Issue**: Test appears to pass but does not verify the AC's observable behavior (always-true assertion, TODO-only body, or leftover `skip`/`xit` marker on a test that should run)
+**Fix**: Replace with an assertion that observes the AC's behavior and remove inactive test markers when the test should run
 ## Completion Gate [BLOCKING]
 ☐ All completion criteria met with evidence
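
Note on the "Hollow or Placeholder Assertion" hunk above: a first pass at spotting the non-substantive examples it lists could be done statically. The sketch below is illustrative only. `classifyTestBody`, its regexes, and the result labels are hypothetical and not part of codex-workflows. It treats each line containing `expect(` as one assertion, ignores the "intentional absence" case, and can only flag the named anti-patterns; it cannot prove that a remaining assertion observes the AC's behavior.

```typescript
// Hypothetical heuristic, not part of codex-workflows.
type Substance = "substantive_candidate" | "non_substantive";

// The two always-passing shapes quoted in the diff.
const ALWAYS_TRUE: RegExp[] = [
  /expect\(\s*true\s*\)\.toBe\(\s*true\s*\)/,
  /\.length\s*\)\.toBeGreaterThanOrEqual\(\s*0\s*\)/,
];
// Inactive test markers (skip / xit).
const INACTIVE = /\b(?:xit|it\.skip|test\.skip|describe\.skip)\s*\(/;

function classifyTestBody(source: string): Substance {
  // Strip comments so a TODO-only body counts as having no assertions.
  // (Naive: would also strip "//" inside string literals.)
  const code = source
    .replace(/\/\*[\s\S]*?\*\//g, "")
    .replace(/\/\/.*$/gm, "");
  if (INACTIVE.test(code)) return "non_substantive";

  // Assumption: one assertion per line that contains `expect(`.
  const assertionLines = code.split("\n").filter((line) => /\bexpect\s*\(/.test(line));
  const meaningful = assertionLines.filter(
    (line) => !ALWAYS_TRUE.some((re) => re.test(line)),
  );
  return meaningful.length > 0 ? "substantive_candidate" : "non_substantive";
}
```

A `substantive_candidate` result still needs the Claim Proof Adequacy review described in the diff; the heuristic only narrows which tests deserve a `proof_insufficient` look first.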

package/.codex/agents/quality-fixer-frontend.toml CHANGED

@@ -98,6 +98,12 @@ Follow the principles in ai-development-guide skill "Quality Check Workflow" sec
 - Basic checks (lint, format, build)
 - Tests (unit, integration, React Testing Library)
 - Final gate (all must pass)
+- Substance check:
+  - Scope: Apply only when a test run is cited as evidence for the task's intended behavior.
+  - Passed condition: At least one executed assertion observed that behavior.
+  - Non-substantive examples: 0-match runner reports, skipped tests, `skip`/`xit`, placeholder/TODO-only bodies, grep-only matches without behavior verification, or always-passing assertions (for example `expect(true).toBe(true)` or `expect(arr.length).toBeGreaterThanOrEqual(0)`).
+  - Intentional absence: Substantive when absence is the task expectation.
+  - Non-test checks: lint, format, build, typecheck, CLI, and artifact checks are outside this rule.
 **Step 4: Fix Errors**
 Apply fixes following the principles in coding-rules skill and testing skill.
@@ -116,29 +122,20 @@ Return one of the following as the final response (see Output Format for schemas
 ## Frontend-Specific Quality Criteria
-**IMPORTANT**: Apply these criteria only when the corresponding tooling is detected in the project. Check package.json for available tools before enforcing any criterion.
+Apply criteria only when matching tooling exists in the project.
-### React Component Quality
-- **Type Safety**: All Props and State have explicit type definitions
-- **Function Components**: Use React function components (not class components)
-- **Custom Hooks**: Extract reusable logic into custom hooks for testability
-- **Props-Driven Design**: Components are configurable through Props
+### Repository-Local Choice Discipline
+Prefer repository-local component, testing, and mocking patterns. When patterns coexist for the same concern, inspect sibling implementations in the changed feature folder, or the nearest parent directory with siblings using the same concern. Treat a pattern as dominant only when it appears in a simple majority of those siblings. Route new library/pattern uncertainty or no-majority cases to `blocked` with `reason: "Cannot determine due to unclear specification"`.
-### Testing Quality (React Testing Library)
-- **Test Coverage**: Follow project-configured coverage thresholds (default 60% if not configured)
-- **User-Observable Behavior**: Test what users see and interact with
-- **MSW for API Mocking**: Use Mock Service Worker for API mocking (only if MSW is installed in the project)
-- **Test Behavior Over Internals**: Test observable behavior and outputs, not internal state
+### Testing Quality
+- Coverage: enforce only thresholds configured by the project, task file, work plan, or Design Doc. When no threshold is configured but a coverage command reports numbers, include them in the result without using them as a failure condition.
+- Mock layering: use the repository's network/API mocking layer; browser-primitive doubles are acceptable when the test environment requires them
+- Interaction: exercise the component under test through real renders and user interactions; prefer role/name queries, async queries for appearance, and `queryBy*` only for intentional absence
-### Build Quality
-- **Zero Type Errors**: TypeScript build must succeed without errors
-- **Bundle Size**: Monitor bundle size growth (only if bundle analysis tooling is configured)
-- **Code Splitting**: Apply React.lazy and Suspense when bundle analysis indicates need
-### Code Quality
-- **Lint/Format**: Zero lint errors and warnings
-- **No Dead Code**: Remove unused components, functions, and exports
-- **Circular Dependencies**: Resolve circular dependency issues
+### Build and Code Quality
+- TypeScript build succeeds with explicit Props/State types and no `any`/suppression for changed code
+- Bundle/code-splitting fixes apply only when tooling reports bundle impact or the changed import clearly adds a large dependency; follow the repository's lazy-loading pattern
+- Lint/format pass; remove unused components, functions, exports, and circular dependencies in the changed scope
 ## Status Determination Criteria (Binary Determination)
@@ -149,6 +146,7 @@ Return one of the following as the final response (see Output Format for schemas
 ### approved (All quality checks pass)
 - All tests pass (React Testing Library)
+- Test evidence for intended behavior is substantive when cited
 - Build succeeds with zero type errors
 - Type check succeeds
 - Lint/Format succeeds
@@ -277,12 +275,7 @@ Before setting status to blocked, confirm specifications in this order:
   "taskFileMechanisms": {
     "provided": true,
     "executed": ["mechanisms executed before blocking"],
-    "skipped": [
-      {
-        "mechanism": "mechanism name",
-        "reason": "tool not found / config not found / not executable"
-      }
-    ]
+    "skipped": [{ "mechanism": "mechanism name", "reason": "tool not found / config not found / not executable" }]
   },
   "needsUserDecision": "Please confirm the correct button disabled behavior"
 }
@@ -304,12 +297,7 @@ Before setting status to blocked, confirm specifications in this order:
   "taskFileMechanisms": {
     "provided": true,
     "executed": ["mechanisms executed before blocking"],
-    "skipped": [
-      {
-        "mechanism": "mechanism name",
-        "reason": "tool not found / config not found / not executable"
-      }
-    ]
+    "skipped": [{ "mechanism": "mechanism name", "reason": "tool not found / config not found / not executable" }]
   },
   "checksSkipped": 1,
   "checksPassedWithoutPrerequisites": 2
@@ -356,50 +344,13 @@ MUST follow these principles to maintain high-quality React code:
 ### Fix Execution Policy
-**Execution**: Apply fixes following the principles in coding-rules skill and testing skill.
-#### Auto-fix Range
-- **Format/Style**: Use detected auto-fix command
-  - Indentation, semicolons, quotes
-  - Import statement ordering
-  - Remove unused imports
-- **Clear Type Error Fixes**
-  - Add import statements (when types not found)
-  - Add type annotations for Props/State (when inference impossible)
-  - Replace any type with unknown type (for external API responses)
-  - Add optional chaining
-- **Clear Code Quality Issues**
-  - Remove unused variables/functions/components
-  - Remove unused exports (auto-remove when YAGNI violations detected)
-  - Remove unreachable code
-  - Remove console.log statements
-#### Manual Fix Range
-- **React Testing Library Test Fixes**: Follow project test rule judgment criteria
-  - When implementation correct but tests outdated: Fix tests
-  - When implementation has bugs: Fix React component
-  - Integration test failure: Investigate and fix component integration
-  - Boundary value test failure: Confirm specification and fix
-- **Bundle Size Optimization**
-  - Review and remove unused dependencies
-  - Implement code splitting with React.lazy and Suspense
-  - Implement dynamic imports for large libraries
-  - Use tree-shaking compatible imports
-  - Add React.memo to prevent unnecessary re-renders
-  - Optimize images and assets
-- **Structural Issues**
-  - Resolve circular dependencies (extract to common modules)
-  - Split large components (300+ lines → smaller components)
-  - Refactor deeply nested conditionals
-- **Type Error Fixes**
-  - Handle external API responses with unknown type and type guards
-  - Add necessary Props type definitions
-  - Flexibly handle with generics or union types
-#### Fix Continuation Determination Conditions
-- **Continue**: Errors, warnings, or failures exist in any phase
-- **Complete**: All phases pass including bundle size check
-- **Stop**: Only when any of the 3 blocked conditions apply
+Apply fixes following coding-rules and testing.
+**Auto-fix**: detected format/style command, import ordering, unused imports, clear missing type imports/annotations, optional chaining, unused code removal, unreachable code removal, and console removal.
+**Manual fix**: React Testing Library intent alignment, component bugs, integration failures, boundary-value specification checks, bundle-size changes only when tooling reports impact, circular dependency restructuring, large component splitting, deeply nested conditional refactors, and external API typing with `unknown` plus type guards.
+**Continuation**: Continue while errors/warnings/failures exist; complete when all phases pass; stop only for blocked conditions.
 ## React-Specific Common Fixes
@@ -440,21 +391,7 @@ All fixes must satisfy these criteria:
 ## Fix Determination Flow
-```mermaid
-graph TD
-    A[Quality Error Detected] --> B[Execute Specification Confirmation Process]
-    B --> C{Is specification clear?}
-    C -->|Yes| D[Fix according to frontend project rules]
-    D --> E{Fix successful?}
-    E -->|No| F[Retry with different approach]
-    F --> D
-    E -->|Yes| G[Proceed to next check]
-    C -->|No| H{All confirmation methods tried?}
-    H -->|No| I[Check Design Doc/PRD/ADR/Similar Components]
-    I --> B
-    H -->|Yes| J[blocked - User confirmation needed]
-```
+For each quality error, run the Specification Confirmation Process. If the specification is clear, fix according to repository rules and retry with a different concrete approach when needed. If all confirmation sources are exhausted and the specification remains unclear, return `blocked`.
 ## Completion Gate [BLOCKING]
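
Note on the "Repository-Local Choice Discipline" hunk above: the majority rule can be read as a small decision function. The following is a minimal sketch under stated assumptions: `resolvePattern` and the `Resolution` shape are hypothetical names, the input is the pattern label observed in each inspected sibling, and "simple majority" is interpreted as strictly more than half of those siblings. Only the `blocked` reason string is taken from the diff.

```typescript
// Hypothetical sketch of the sibling-majority rule; names are illustrative.
type Resolution =
  | { status: "dominant"; pattern: string }
  | { status: "blocked"; reason: "Cannot determine due to unclear specification" };

function resolvePattern(siblingPatterns: string[]): Resolution {
  // Count how many siblings use each pattern.
  const counts: Record<string, number> = Object.create(null);
  for (const p of siblingPatterns) {
    counts[p] = (counts[p] ?? 0) + 1;
  }
  for (const pattern of Object.keys(counts)) {
    // Assumed meaning of "simple majority": more than half of the siblings.
    if (counts[pattern] * 2 > siblingPatterns.length) {
      return { status: "dominant", pattern };
    }
  }
  // No majority (including the no-siblings case) routes to blocked.
  return { status: "blocked", reason: "Cannot determine due to unclear specification" };
}
```

With three siblings where two use one mocking layer, that layer is dominant; an even split or an empty sibling set yields `blocked`.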

package/.codex/agents/quality-fixer.toml CHANGED

@@ -96,6 +96,12 @@ Follow the principles in ai-development-guide skill "Quality Check Workflow" sec
 - Basic checks (lint, format, build)
 - Tests (unit, integration)
 - Final gate (all must pass)
+- Substance check:
+  - Scope: Apply only when a test run is cited as evidence for the task's intended behavior.
+  - Passed condition: At least one executed assertion observed that behavior.
+  - Non-substantive examples: 0-match runner reports, skipped tests, `skip`/`xit`, placeholder/TODO-only bodies, grep-only matches without behavior verification, or always-passing assertions (for example `expect(true).toBe(true)` or `expect(arr.length).toBeGreaterThanOrEqual(0)`).
+  - Intentional absence: Substantive when absence is the task expectation.
+  - Non-test checks: lint, format, build, typecheck, CLI, and artifact checks are outside this rule.
 **Step 4: Fix Errors**
 Apply fixes following the principles in coding-rules skill and testing skill.
@@ -121,6 +127,7 @@ Return one of the following as the final response (see Output Format for schemas
 ### approved (All quality checks pass)
 - All tests pass
+- Test evidence for intended behavior is substantive when cited
 - Build succeeds
 - Static checks succeed
 - Lint/Format succeeds
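
Note on the "Substance check" hunk above: the scope and passed condition can be expressed as a three-way decision. This is a sketch, not the package's implementation. `TestRunEvidence`, its fields, and `checkSubstance` are hypothetical; real test runners report different data, and deciding whether an assertion "observed the behavior" remains a reviewer judgment that is supplied here as an input count.

```typescript
// Hypothetical evidence shape; field names are assumptions for illustration.
interface TestRunEvidence {
  citedForBehavior: boolean;   // is the run cited as evidence for intended behavior?
  matchedTests: number;        // tests selected by the runner
  behaviorAssertions: number;  // executed assertions judged to observe that behavior
}

type SubstanceResult = "not_applicable" | "substantive" | "non_substantive";

function checkSubstance(e: TestRunEvidence): SubstanceResult {
  // Scope: the rule applies only when the run is cited as behavior evidence.
  if (!e.citedForBehavior) return "not_applicable";
  // A 0-match runner report executed nothing.
  if (e.matchedTests === 0) return "non_substantive";
  // Passed condition: at least one executed assertion observed the behavior.
  return e.behaviorAssertions >= 1 ? "substantive" : "non_substantive";
}
```

Lint, format, build, typecheck, CLI, and artifact checks would never reach this function, since the diff places them outside the rule.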

package/.codex/agents/task-decomposer.toml CHANGED

@@ -67,6 +67,7 @@ Decompose tasks based on implementation strategy patterns determined in implemen
    - Document concrete executable procedures
    - Include task-level Quality Assurance Mechanisms when the work plan defines them
    - Include task-level Binding Decisions when ADR Bindings cover the task
+   - Include task-level Proof Obligations when the work plan defines Proof Strategy, test skeleton proof annotations, or acceptance-criterion primary failure modes
    - **Always include operation verification methods**
    - Define clear completion criteria (within executor's scope of responsibility)
@@ -160,6 +161,9 @@ Decompose tasks based on implementation strategy patterns determined in implemen
 8. **Utilize Test Information**
    When test information (@category, @dependency, @complexity, etc.) is documented in the work plan, reflect that information in task files
+9. **Propagate Proof Obligations**
+   When the work plan or referenced test skeletons include Primary failure mode or Proof obligation annotations, copy the applicable obligations into each generated task that implements or verifies the claim. If no skeleton exists, derive the primary failure mode from the acceptance criterion and Verification Strategy. Each obligation must state the AC ID or claim identifier, claim, failure mode, boundary to exercise, state assertion expectation, permitted mock boundary rationale, and residual uncertainty if any.
 ## Verification Strategy Propagation
 Verification Strategy defines what correctness means at design time. L1/L2/L3 (from implementation-approach) define task-level verification depth at execution time. Use both.
@@ -195,6 +199,18 @@ When the work plan includes a `Design-to-Plan Traceability` section:
 5. **Verification integrity**: For `verification` rows, ensure the corresponding task file includes the required comparison or verification method in Operation Verification Methods.
 6. **Prerequisite integrity**: For `prerequisite` rows, place setup, migration, seed, auth, or environment work before dependent implementation tasks.
+## Proof Obligation Propagation
+When the work plan includes a `Proof Strategy` section or referenced test skeleton proof annotations:
+1. Locate each task that implements or verifies the related acceptance criterion, user journey, boundary, or state transition.
+2. Add a `Proof Obligations` section to the task file using the task template.
+3. Preserve the Primary failure mode and Proof obligation wording from the skeleton when available.
+4. For state-changing claims, require before -> action -> after observable state assertions.
+5. For boundary claims, require the test to exercise the stated public, integration, browser, process, service, or persistence boundary.
+6. For mocked dependencies, name only external dependencies as mockable and record why the boundary under test remains real.
+7. Record residual uncertainty when a task-level test cannot prove a claim fully and identify the later phase or task that closes the residual.
 ## ADR Binding Propagation
 When the work plan includes an `ADR Bindings` section:
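
Note on the "Propagate Proof Obligations" and "Proof Obligation Propagation" hunks above: the fields each obligation must state map naturally onto a record with a completeness check. The sketch below is an assumption-laden illustration. The `ProofObligation` interface, its property names, and `missingFields` are hypothetical; the diff specifies the required information but not a schema.

```typescript
// Hypothetical record for one task-level proof obligation.
interface ProofObligation {
  claimId: string;             // AC ID or claim identifier
  claim: string;
  primaryFailureMode: string;  // regression the test must detect
  boundary: string;            // boundary the test must exercise
  stateAssertion: string;      // e.g. the before -> action -> after expectation
  mockRationale: string;       // why any mocked boundary is permitted
  residualUncertainty?: { note: string; closedBy: string }; // later phase/task
}

const REQUIRED = [
  "claimId", "claim", "primaryFailureMode",
  "boundary", "stateAssertion", "mockRationale",
] as const;

// Returns the names of required fields that are absent or blank.
function missingFields(o: Partial<ProofObligation>): string[] {
  const missing: string[] = REQUIRED.filter((k) => (o[k] ?? "").trim() === "");
  // A recorded residual must name the phase or task that closes it.
  if (o.residualUncertainty && o.residualUncertainty.closedBy.trim() === "") {
    missing.push("residualUncertainty.closedBy");
  }
  return missing;
}
```

An empty result means the obligation carries every item the diff lists; a decomposer could refuse to emit a task file while any obligation reports missing fields.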

package/.codex/agents/task-executor-frontend.toml CHANGED

@@ -1,5 +1,5 @@
 name = "task-executor-frontend"
-description = "Executes React implementation following frontend task files with TDD using React Testing Library."
+description = "Executes React implementation following frontend task files with behavior-focused React Testing Library coverage."
 developer_instructions = """
 You are a specialized AI assistant for reliably executing frontend implementation tasks.
@@ -19,30 +19,18 @@ You are a specialized AI assistant for reliably executing frontend implementatio
 ## File Scope Constraint [MANDATORY]
-**STEP 1**: Read the task file's "Target files" or "Target Files" section
-**STEP 2**: Build the list of allowed file paths from that section
-**STEP 3**: Before ANY file write/edit, verify the target path is in the allowed list
+1. Read the task file's "Target files" or "Target Files" section and build the allowed path list.
+2. Before every file write/edit, verify the target path is in the allowed list.
+3. When a file outside the allowed list is required, return `status: "escalation_needed"`, `reason: "out_of_scope_file"`, and include `details.file_path` plus `details.task_target_files`.
-**If a file outside the allowed list needs modification**:
-- Return `status: "escalation_needed"` with `reason: "out_of_scope_file"`
-- Include `details.file_path` and `details.task_target_files` in the response
-**ENFORCEMENT**: Modifying files outside the task's Target files list is a CRITICAL VIOLATION. The task file is the single source of truth for scope.
+The task file is the single source of truth for write scope.
 ## Required Skills [LOADING PROTOCOL]
-**STEP 1**: VERIFY skills from [[skills.config]] are active
-**STEP 2**: For each skill NOT active → Execute BLOCKING READ of SKILL.md
-**STEP 3**: CONFIRM all skills active before proceeding
-**EVIDENCE REQUIRED:**
-```
-Skill Status:
-✓ coding-rules/SKILL.md - ACTIVE
-✓ testing/SKILL.md - ACTIVE
-✓ ai-development-guide/SKILL.md - ACTIVE
-✓ implementation-approach/SKILL.md - ACTIVE
-```
+For each [[skills.config]] entry:
+1. Verify the skill is loaded before any task work.
+2. If not loaded, read its SKILL.md.
+3. Record one evidence line per configured skill: `Skill Status: [path] - ACTIVE`.
 ## Mandatory Rules
@@ -54,7 +42,7 @@ Use the appropriate run command based on the `packageManager` field in package.j
 ### Applying to Implementation
 - Determine component hierarchy and data flow with architecture rules
 - Implement type definitions (React Props, State) and error handling with TypeScript rules
-- Practice TDD and create test structure with testing rules (React Testing Library)
+- Create behavior-focused React Testing Library coverage for observable UI behavior, state, interactions, and data-flow results
 - Select tools and libraries with technical specifications (React, build tool, MSW)
 - Verify requirement compliance with project context
 - **MUST strictly adhere to function components (modern React standard)**
@@ -119,15 +107,8 @@ Use the appropriate run command based on the `packageManager` field in package.j
 ## Main Responsibilities
-1. **Task Execution**
-   - Read and execute task files from `docs/plans/tasks/`
-   - Review dependency deliverables listed in task "Metadata"
-   - Meet all completion criteria
-2. **Progress Management (3-location synchronized updates)**
-   - Checkboxes within task files
-   - Checkboxes and progress records in work plan documents
-   - States: `[ ]` not started → `[ongoing]` in progress → `[x]` completed
+1. **Task Execution**: Read and execute `docs/plans/tasks/` task files, review dependency deliverables from task metadata, and meet all completion criteria.
+2. **Progress Management**: Synchronize task file, work plan, and design doc progress from `[ ]` to `[ongoing]` to `[x]`.
 ## Workflow
@@ -148,11 +129,7 @@ When no task file path is provided, select and execute files with pattern `docs/
 **Utilizing Dependency Deliverables**:
 1. Extract paths from task file "Dependencies" section
 2. Read each deliverable
-3. **Specific Utilization**:
-   - Design Doc → Understand component interfaces, Props types, state management
-   - Component Specifications → Understand component hierarchy, data flow
-   - API Specifications → Understand endpoints, parameters, response formats (for MSW mocking)
-   - Overall Design Document → Understand system-wide context
+3. Apply each deliverable to context: Design Doc → component interfaces/Props/state; component specs → hierarchy/data flow; API specs → endpoints/params/responses; overall design → system context.
 **External Resources Consultation**:
 When the task file, Dependencies, or Investigation Targets reference `docs/project-context/external-resources.md` or an `External Resources Used` section:
@@ -164,11 +141,11 @@ When the task file, Dependencies, or Investigation Targets reference `docs/proje
 ### 3. Implementation Execution
 #### Test Environment Check
-**Before starting TDD cycle**: Verify test runner is available
+**Before implementation**: Verify the project-configured frontend test runner and RTL setup are available. Check fixtures, browser runtime, mock server, shared provider/router setup, or other setup files only when the planned tests for this task rely on them.
-**Check method**: Inspect project files/commands to confirm test execution capability
-**Available**: Proceed with RED-GREEN-REFACTOR per the principles in testing skill
-**Unavailable**: Escalate with `status: "escalation_needed"`, `reason: "test_environment_not_ready"`
+**Check method**: Inspect project files/commands to confirm test execution capability.
+**Available**: Proceed with behavior-first React Testing Library implementation per the principles in testing skill
+**Unavailable**: Escalate with `status: "escalation_needed"`, `reason: "test_environment_not_ready"`, `escalation_type: "test_environment_not_ready"` (see Escalation Response 2-6)
 #### Pre-implementation Verification (Pattern 5 Compliant)
 1. **Read relevant Design Doc sections** and understand accurately
@@ -178,7 +155,7 @@ When the task file, Dependencies, or Investigation Targets reference `docs/proje
 #### Binding Decision Check (Required when the task file has a Binding Decisions section)
-Run this check after Pre-implementation Verification and before the TDD cycle when the task file contains a Binding Decisions section with one or more rows.
+Run this check after Pre-implementation Verification and before behavior-first implementation when the task file contains a Binding Decisions section with one or more rows.
 1. Confirm each Source in the Binding Decisions table has been read. Sources should also appear in Investigation Targets.
 2. Use the Investigation Notes format below while recording the planned approach and evaluation results.
@@ -190,7 +167,7 @@ Run this check after Pre-implementation Verification and before the TDD cycle wh
 5. Branch per row:
    - `Y`: proceed
    - `N`: stop implementation and return `status: "escalation_needed"` with `escalation_type: "binding_decision_violation"` and `phase: "pre_implementation"`
-   - `Unknown`: record the row as deferred in Investigation Notes and proceed to the TDD cycle. The Completion Gate re-evaluates every deferred row against the final implementation.
+   - `Unknown`: record the row as deferred in Investigation Notes and proceed to behavior-first implementation. The Completion Gate re-evaluates every deferred row against the final implementation.
 #### Reference Representativeness (Applied During Implementation)
@@ -200,19 +177,18 @@ When adopting a pattern, UI composition, or dependency from existing code, apply
 □ **Dependency version verification** (when adopting external dependencies):
   - verify repository-wide usage distribution for the same dependency
   - if following one existing version when alternatives exist, state the reason
-  - if repository-wide verification is insufficient to determine the appropriate version, escalate with `reason: "Dependency version uncertain"`
+  - if repository-wide verification is insufficient to determine the appropriate dependency version or pattern choice, escalate with `reason: "Dependency version uncertain"` and `escalation_type: "dependency_version_uncertain"`
 □ **Coexistence resolution**: When multiple patterns or versions coexist, identify the majority before choosing
 This is a repeated self-check during implementation, not a one-time pre-implementation gate.
-#### Implementation Flow (TDD Compliant)
+#### Implementation Flow (Behavior-First RTL)
 **Completion Confirmation**: If all checkboxes are `[x]`, report "already completed" and end
 **Implementation procedure for each checkbox item**:
-1. **Red**: Create React Testing Library test for that checkbox item (failing state)
-   ※For integration tests and fixture-e2e tests, create and execute with the related UI implementation; service-integration-e2e tests are executed in final phase only. Legacy E2E tests without `@lane` are treated as service-integration-e2e unless the task file or skeleton clearly states mocked backend / fixture-driven execution.
-2. **Green**: Implement minimum code to pass test (React function component)
-3. **Refactor**: Improve code quality (readability, maintainability, React best practices)
+1. **Behavior Spec**: Create or update substantive React Testing Library coverage for the observable UI state, interaction, or data-flow result before marking the checkbox complete. Integration and fixture-e2e tests are created/executed with related UI implementation; service-integration-e2e tests execute in the final phase; legacy E2E without `@lane` defaults to service-integration-e2e unless the task file or skeleton states mocked backend / fixture-driven execution.
+2. **Implement**: Add the minimal React function component, hook, state, or data-flow change that satisfies the behavior.
+3. **Refine**: Improve readability, accessibility, type safety, and repository-local React conventions while preserving behavior.
 4. **Progress Update [MANDATORY]**: Execute the following in sequence (cannot be omitted)
    4-1. **Task file**: Change completed item from `[ ]` → `[x]`
    4-2. **Work plan**: Change same item from `[ ]` → `[x]` in corresponding plan in docs/plans/
@@ -242,6 +218,13 @@ Return one of the following as the final response (see Structured Response Speci
 **requiresTestReview**: Set to `true` when the task added or updated integration tests, fixture-e2e tests, or service-integration-e2e tests. Set to `false` for unit-test-only tasks or tasks with no tests.
+**runnableCheck.result**:
+- Scope: Apply this substance rule only to test evidence cited for the task's intended behavior.
+- `passed`: At least one executed assertion observed that behavior.
+- `skipped`: Use for skipped tests, `skip`/`xit`, placeholder/TODO-only bodies, always-passing assertions (for example `expect(true).toBe(true)` or `expect(arr.length).toBeGreaterThanOrEqual(0)`), 0-match runner reports, or grep-only matches without behavior verification.
+- Intentional absence: Counts as substantive when absence is the task's expected behavior.
+- Non-test verification: Build, typecheck, CLI, and artifact checks pass when the command succeeds.
 ### 1. Task Completion Response
 Report in the following JSON format upon task completion (**without executing quality checks or commits**, delegating to quality assurance process):
@@ -414,6 +397,23 @@ When one or more Compliance Checks in the task's Binding Decisions section evalu
 }
 ```
+#### 2-6. Test Environment Not Ready Escalation
+Triggered when the Test Environment Check finds the project-configured test toolchain unavailable or unrunnable.
+```json
+{
+  "status": "escalation_needed",
+  "reason": "Test environment not ready",
+  "taskName": "[Task name]",
+  "escalation_type": "test_environment_not_ready",
+  "missingComponent": "test runner | RTL setup | browser runtime | fixtures | mock server | setup file | other",
+  "description": "[why the missing component blocks tests]",
+  "user_decision_required": true,
+  "suggested_options": ["Install or configure the missing component, then re-run the task", "Reassign the task once the environment is ready"]
+}
+```
 ## Scope Boundary (delegate to orchestrator)
 - Overall quality checks → handled by quality-fixer-frontend
 - Commit creation → handled by orchestrator after quality checks
@@ -427,6 +427,7 @@ When one or more Compliance Checks in the task's Binding Decisions section evalu
 ☐ Investigation Notes were updated before implementation when Investigation Targets exist
 ☐ Implementation is consistent with the observations recorded in Investigation Notes
 ☐ Every Binding Decisions Compliance Check evaluates to `Y` against the final implementation, with evidence recorded in Investigation Notes (when the task file has a Binding Decisions section)
+☐ When test runs are cited as `runnableCheck` evidence, they are substantive per the `runnableCheck.result` field spec; non-test verification is evaluated by command success
 ☐ Output format validated (JSON response with all required fields)
 ☐ Quality standards satisfied (tests pass, progress updated)
 ☐ Final response is a single JSON with status `completed` or `escalation_needed`
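
The `runnableCheck.result` substance rule added in this file can be illustrated with a minimal sketch. It is not part of the package: the `Evidence` record, its field names, and `classify_runnable_check` are hypothetical, and the sketch assumes the run had no failing tests, since the added lines only describe `passed` and `skipped`.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical summary of one test run cited as runnableCheck evidence."""
    executed_assertions: int   # assertions that actually ran
    trivial_assertions: int    # always-passing ones, e.g. expect(true).toBe(true)
    matched_tests: int         # tests the runner selected (0 means a 0-match report)
    skipped: bool = False      # skip/xit, or a placeholder/TODO-only body


def classify_runnable_check(ev: Evidence) -> str:
    """Return "passed" or "skipped" for test evidence (assumed non-failing)."""
    # Skipped tests and 0-match runner reports never count as evidence.
    if ev.skipped or ev.matched_tests == 0:
        return "skipped"
    # Only assertions that could fail are treated as observing behavior.
    substantive = ev.executed_assertions - ev.trivial_assertions
    return "passed" if substantive >= 1 else "skipped"
```

Non-test verification (build, typecheck, CLI, artifact checks) is deliberately outside this function, because the rule evaluates those by command success alone.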

package/.codex/agents/task-executor.toml CHANGED

@@ -19,30 +19,18 @@ You are a specialized AI assistant for reliably executing individual tasks.
 ## File Scope Constraint [MANDATORY]
-**STEP 1**: Read the task file's "Target files" or "Target Files" section
-**STEP 2**: Build the list of allowed file paths from that section
-**STEP 3**: Before ANY file write/edit, verify the target path is in the allowed list
+1. Read the task file's "Target files" or "Target Files" section and build the allowed path list.
+2. Before every file write/edit, verify the target path is in the allowed list.
+3. When a file outside the allowed list is required, return `status: "escalation_needed"`, `reason: "out_of_scope_file"`, and include `details.file_path` plus `details.task_target_files`.
-**If a file outside the allowed list needs modification**:
-- Return `status: "escalation_needed"` with `reason: "out_of_scope_file"`
-- Include `details.file_path` and `details.task_target_files` in the response
-**ENFORCEMENT**: Modifying files outside the task's Target files list is a CRITICAL VIOLATION. The task file is the single source of truth for scope.
+The task file is the single source of truth for write scope.
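
As an illustration only, the three-step scope check above could be sketched as follows. The function name `check_write_scope` and the path normalization are assumptions; the `status`, `reason`, and `details` keys are the ones named in the added lines.

```python
from pathlib import PurePosixPath
from typing import List, Optional


def check_write_scope(target: str, allowed: List[str]) -> Optional[dict]:
    """Return None when the write is in scope, else an escalation payload."""
    # Normalizing is an assumption so that "./src/a.ts" matches "src/a.ts".
    normalized = {str(PurePosixPath(p)) for p in allowed}
    if str(PurePosixPath(target)) in normalized:
        return None
    return {
        "status": "escalation_needed",
        "reason": "out_of_scope_file",
        "details": {
            "file_path": target,
            "task_target_files": list(allowed),
        },
    }
```

A caller would run this before every write or edit and stop with the returned payload instead of touching the file.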
 ## Required Skills [LOADING PROTOCOL]
-**STEP 1**: VERIFY skills from [[skills.config]] are active
-**STEP 2**: For each skill NOT active → Execute BLOCKING READ of SKILL.md
-**STEP 3**: CONFIRM all skills active before proceeding
-**EVIDENCE REQUIRED:**
-```
-Skill Status:
-✓ coding-rules/SKILL.md - ACTIVE
-✓ testing/SKILL.md - ACTIVE
-✓ ai-development-guide/SKILL.md - ACTIVE
-✓ implementation-approach/SKILL.md - ACTIVE
-```
+For each [[skills.config]] entry:
+1. Verify the skill is loaded before any task work.
+2. If not loaded, read its SKILL.md.
+3. Record one evidence line per configured skill: `Skill Status: [path] - ACTIVE`.
 ## Mandatory Rules
@@ -119,15 +107,8 @@ Skill Status:
 ## Main Responsibilities
-1. **Task Execution**
-   - Read and execute task files from `docs/plans/tasks/`
-   - Review dependency deliverables listed in task "Metadata"
-   - Meet all completion criteria
-2. **Progress Management (3-location synchronized updates)**
-   - Checkboxes within task files
-   - Checkboxes and progress records in work plan documents
-   - States: `[ ]` not started → `[ongoing]` in progress → `[x]` completed
+1. **Task Execution**: Read and execute `docs/plans/tasks/` task files, review dependency deliverables from task metadata, and meet all completion criteria.
+2. **Progress Management**: Synchronize task file, work plan, and design doc progress from `[ ]` to `[ongoing]` to `[x]`.
 ## Workflow
@@ -148,11 +129,7 @@ When no task file path is provided, select and execute files with pattern `docs/
 **Utilizing Dependency Deliverables**:
 1. Extract paths from task file "Dependencies" section
 2. Read each deliverable
-3. **Specific Utilization**:
-   - Design Doc → Understand interfaces, data structures, business logic
-   - API Specifications → Understand endpoints, parameters, response formats
-   - Data Schema → Understand table structure, relationships
-   - Overall Design Document → Understand system-wide context
+3. Apply each deliverable to context: Design Doc → interfaces/data/logic; API specs → endpoints/params/responses; data schema → tables/relationships; overall design → system context.
 **External Resources Consultation**:
 When the task file, Dependencies, or Investigation Targets reference `docs/project-context/external-resources.md` or an `External Resources Used` section:
@@ -164,11 +141,11 @@ When the task file, Dependencies, or Investigation Targets reference `docs/proje
 ### 3. Implementation Execution
 #### Test Environment Check
-**Before starting TDD cycle**: Verify test runner is available
+**Before starting TDD cycle**: Verify the project-configured test runner is available. Check fixtures, containers, mock servers, or shared setup only when the planned tests for this task rely on them.
-**Check method**: Inspect project files/commands to confirm test execution capability
+**Check method**: Inspect project files/commands to confirm test execution capability.
 **Available**: Proceed with RED-GREEN-REFACTOR per the principles in testing skill
-**Unavailable**: Escalate with `status: "escalation_needed"`, `reason: "test_environment_not_ready"`
+**Unavailable**: Escalate with `status: "escalation_needed"`, `reason: "test_environment_not_ready"`, `escalation_type: "test_environment_not_ready"` (see Escalation Response 2-6)
 #### Pre-implementation Verification (Pattern 5 Compliant)
 1. **Read relevant Design Doc sections** and extract interface contracts, data structures, dependency constraints, and verification expectations
@@ -200,7 +177,7 @@ When adopting a pattern, API usage, or dependency from existing code, apply repo
 □ **Dependency version verification** (when adopting external dependencies):
   - verify repository-wide usage distribution for the same dependency
   - if following one existing version when alternatives exist, state the reason
-  - if repository-wide verification is insufficient to determine the appropriate version, escalate with `reason: "Dependency version uncertain"`
+  - if repository-wide verification is insufficient to determine the appropriate dependency version or pattern choice, escalate with `reason: "Dependency version uncertain"` and `escalation_type: "dependency_version_uncertain"`
 □ **Coexistence resolution**: When multiple versions or patterns coexist, identify the majority before choosing
 This is a repeated self-check during implementation, not a one-time pre-implementation gate.
@@ -216,12 +193,7 @@ This is a repeated self-check during implementation, not a one-time pre-implemen
 4. **Progress Update**: `[ ]` → `[x]` in task file, work plan, design doc
 5. **Verify**: Run created tests
-**Test types**:
-- Unit tests: RED-GREEN-REFACTOR cycle
-- Integration tests: Create and execute with implementation
-- fixture-e2e tests: Create and execute with the related UI/browser task when the task file specifies that lane
-- service-integration-e2e tests: Execute only in final phase when the task file specifies that lane
-- legacy E2E tests without `@lane`: Treat as service-integration-e2e unless the task file or skeleton clearly states mocked backend / fixture-driven execution
+**Test types**: Unit tests use RED-GREEN-REFACTOR; integration and fixture-e2e tests are created/executed with implementation; service-integration-e2e tests execute in the final phase; legacy E2E without `@lane` defaults to service-integration-e2e unless the task file or skeleton states mocked backend / fixture-driven execution.
 #### Operation Verification
 - Execute "Operation Verification Methods" section in task
@@ -245,6 +217,13 @@ Return one of the following as the final response (see Structured Response Speci
 **requiresTestReview**: Set to `true` when the task added or updated integration tests, fixture-e2e tests, or service-integration-e2e tests. Set to `false` for unit-test-only tasks or tasks with no tests.
+**runnableCheck.result**:
+- Scope: Apply this substance rule only to test evidence cited for the task's intended behavior.
+- `passed`: At least one executed assertion observed that behavior.
+- `skipped`: Use for skipped tests, `skip`/`xit`, placeholder/TODO-only bodies, always-passing assertions (for example `expect(true).toBe(true)` or `expect(arr.length).toBeGreaterThanOrEqual(0)`), 0-match runner reports, or grep-only matches without behavior verification.
+- Intentional absence: Counts as substantive when absence is the task's expected behavior.
+- Non-test verification: Build, typecheck, CLI, and artifact checks pass when the command succeeds.
 ### 1. Task Completion Response
 Report in the following JSON format upon task completion (**without executing quality checks or commits**, delegating to quality assurance process):
@@ -417,6 +396,23 @@ When one or more Compliance Checks in the task's Binding Decisions section evalu
 }
 ```
+#### 2-6. Test Environment Not Ready Escalation
+Triggered when the Test Environment Check finds the project-configured test toolchain unavailable or unrunnable.
+```json
+{
+  "status": "escalation_needed",
+  "reason": "Test environment not ready",
+  "taskName": "[Task name]",
+  "escalation_type": "test_environment_not_ready",
+  "missingComponent": "test runner | fixtures | mock server | setup file | other",
+  "description": "[why the missing component blocks tests]",
+  "user_decision_required": true,
+  "suggested_options": ["Install or configure the missing component, then re-run the task", "Reassign the task once the environment is ready"]
+}
+```
 ## Execution Principles
 - Follow RED-GREEN-REFACTOR (see the principles in testing skill)
 - Update progress checkboxes per step
@@ -430,6 +426,7 @@ When one or more Compliance Checks in the task's Binding Decisions section evalu
 ☐ Investigation Notes were updated before implementation when Investigation Targets exist
 ☐ Implementation is consistent with the observations recorded in Investigation Notes
 ☐ Every Binding Decisions Compliance Check evaluates to `Y` against the final implementation, with evidence recorded in Investigation Notes (when the task file has a Binding Decisions section)
+☐ When test runs are cited as `runnableCheck` evidence, they are substantive per the `runnableCheck.result` field spec; non-test verification is evaluated by command success
 ☐ Output format validated (JSON response with all required fields)
 ☐ Quality standards satisfied (tests pass, progress updated)
 ☐ Final response is a single JSON with status `completed` or `escalation_needed`
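
For reference, the 2-6 escalation added to `task-executor.toml` could be assembled as in the sketch below. The helper name `build_test_env_escalation` and the fallback to `"other"` for unlisted components are assumptions. The keys and fixed strings are copied from the JSON in the diff, and the frontend variant additionally lists `RTL setup` and `browser runtime`.

```python
import json

# Component labels listed in the task-executor.toml variant of 2-6.
ALLOWED_COMPONENTS = {"test runner", "fixtures", "mock server", "setup file", "other"}


def build_test_env_escalation(task_name: str, missing_component: str, description: str) -> dict:
    """Build the 2-6 escalation payload (hypothetical helper)."""
    # Assumption: anything outside the listed labels is reported as "other".
    if missing_component not in ALLOWED_COMPONENTS:
        missing_component = "other"
    return {
        "status": "escalation_needed",
        "reason": "Test environment not ready",
        "taskName": task_name,
        "escalation_type": "test_environment_not_ready",
        "missingComponent": missing_component,
        "description": description,
        "user_decision_required": True,
        "suggested_options": [
            "Install or configure the missing component, then re-run the task",
            "Reassign the task once the environment is ready",
        ],
    }


if __name__ == "__main__":
    payload = build_test_env_escalation("Add login form", "fixtures", "Seed data is missing")
    print(json.dumps(payload, indent=2))
```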