npm - codex-workflows - Versions diffs - 0.6.4 → 0.6.6 - Mend

codex-workflows 0.6.4 → 0.6.6


Files changed (23)

package/.agents/skills/documentation-criteria/references/plan-template.md +27 -0
package/.agents/skills/documentation-criteria/references/task-template.md +12 -0
package/.agents/skills/recipe-build/SKILL.md +20 -6
package/.agents/skills/recipe-front-build/SKILL.md +20 -6
package/.agents/skills/recipe-front-plan/SKILL.md +12 -0
package/.agents/skills/recipe-front-review/SKILL.md +2 -1
package/.agents/skills/recipe-fullstack-build/SKILL.md +20 -6
package/.agents/skills/recipe-implement/SKILL.md +1 -1
package/.agents/skills/recipe-plan/SKILL.md +14 -1
package/.agents/skills/recipe-review/SKILL.md +2 -1
package/.agents/skills/subagents-orchestration-guide/SKILL.md +15 -4
package/.agents/skills/subagents-orchestration-guide/references/monorepo-flow.md +12 -4
package/.codex/agents/acceptance-test-generator.toml +12 -2
package/.codex/agents/code-reviewer.toml +6 -1
package/.codex/agents/document-reviewer.toml +18 -1
package/.codex/agents/integration-test-reviewer.toml +22 -2
package/.codex/agents/quality-fixer-frontend.toml +28 -91
package/.codex/agents/quality-fixer.toml +7 -0
package/.codex/agents/task-decomposer.toml +16 -0
package/.codex/agents/task-executor-frontend.toml +49 -48
package/.codex/agents/task-executor.toml +41 -44
package/.codex/agents/work-planner.toml +28 -1
package/package.json +1 -1

package/.agents/skills/documentation-criteria/references/plan-template.md CHANGED

@@ -5,8 +5,16 @@ Type: feature|fix|refactor
 Estimated Duration: X days
 Estimated Impact: X files
 Related Issue/PR: #XXX (if any)
+Review Scope: [planned-files scope derived from Design Doc and task targets; for a revision plan over existing work, base branch + diff range]
 Implementation Readiness: pending
+## WorkPlan Review
+This section records the review gate state for the exact plan content. Set `Status: pending` when the plan is created or materially updated. The orchestrator treats only `Status: approved` with `Conditions: none` as reviewed.
+- **Status**: pending|approved
+- **Conditions**: none
 ## Related Documents
 - Design Doc(s):
   - [docs/design/XXX.md]
@@ -35,6 +43,10 @@ Repeat this block for each Design Doc when multiple Design Docs exist. Preserve
 - **Success criteria**: [extracted from Design Doc]
 - **Failure response**: [extracted from Design Doc]
+### Proof Strategy
+- **Proof obligation source**: [test skeleton annotations (`Primary failure mode`, `Proof obligation`) when skeletons exist; otherwise each acceptance criterion's primary failure mode derived from the Design Doc]
+- **Per-task propagation**: every task that implements or verifies a claim records the AC ID or claim identifier in Proof Obligations (see task template) so downstream review can judge whether tests prove the claim, not merely run
 ## Quality Assurance Mechanisms (from Design Docs)
 Adopted quality gates for the change area. Each task in this plan must satisfy the applicable mechanisms.
@@ -69,6 +81,21 @@ Map each Design Doc technical requirement to the task or phase that covers it. U
 - Merge duplicate restatements of the same obligation from multiple DD sections into one row and cite the primary section in `DD Section`
 - Keep `scope-boundary` rows concrete: name the protected file group, component boundary, contract, or workflow that must remain unchanged
+## Failure Mode Checklist
+Domain-independent failure categories this implementation must guard against. Enumerate all eight categories, mark which apply, and list a covering task for each that applies; keep category names generic and place project-specific detail in task descriptions or notes.
+| Category | Applies? | Covered By Task(s) |
+|----------|----------|--------------------|
+| same-value | yes/no | [P1-T1] |
+| no-op | yes/no | |
+| empty input | yes/no | |
+| invalid option | yes/no | |
+| missing config | yes/no | |
+| unavailable boundary | yes/no | |
+| shared-state dependency | yes/no | |
+| rollback-only visibility | yes/no | |
 ## UI Spec Component -> Task Mapping
 Include this section when a UI Spec is among the inputs. Map each UI component section to the task(s) that implement it so task-decomposer can pass the exact UI Spec context to executor tasks. Omit this section when no UI Spec exists.
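
The Failure Mode Checklist added above requires all eight categories to be enumerated and every applicable category to name a covering task. A minimal sketch of that rule follows. The function name `check_failure_mode_table` and the message wording are hypothetical and not part of the package; only the category names and column layout come from the template.

```python
# Hypothetical helper: validates a Failure Mode Checklist table as described
# in plan-template.md. Names and messages are illustrative assumptions.
EXPECTED_CATEGORIES = [
    "same-value",
    "no-op",
    "empty input",
    "invalid option",
    "missing config",
    "unavailable boundary",
    "shared-state dependency",
    "rollback-only visibility",
]


def check_failure_mode_table(table_md: str) -> list:
    """Return a list of problems found in the checklist table (empty if none)."""
    rows = {}
    for line in table_md.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip malformed rows, the header row, and the |---| separator row.
        if len(cells) != 3 or cells[0] == "Category" or set(cells[0]) <= {"-"}:
            continue
        rows[cells[0]] = (cells[1].lower(), cells[2])

    problems = []
    for category in EXPECTED_CATEGORIES:
        if category not in rows:
            problems.append(f"missing category: {category}")
            continue
        applies, tasks = rows[category]
        if applies not in ("yes", "no"):
            problems.append(f"{category}: Applies? must be yes or no")
        elif applies == "yes" and not tasks:
            problems.append(f"{category}: applies but has no covering task")
    return problems
```

An unfilled template row such as `| no-op | yes/no | |` is reported as a problem, because `yes/no` is a placeholder rather than a decision.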

package/.agents/skills/documentation-criteria/references/task-template.md CHANGED

@@ -58,9 +58,21 @@ Brief observations recorded after reading Investigation Targets:
 - **Failure response**: [What to do if verification fails]
 - **Verification level**: [L1 unit/local verification, L2 integration verification, or L3 end-to-end verification]
+## Proof Obligations
+(Include one entry per acceptance criterion, user journey, boundary, or state transition this task implements or verifies. Derive from test skeleton annotations when present; otherwise derive from the acceptance criterion's primary failure mode.)
+- **AC / Claim ID**: [AC-XXX, user journey identifier, boundary identifier, or task claim identifier]
+- **Claim**: [behavior the acceptance criterion or task promises]
+- **Primary failure mode**: [regression the test should turn red on]
+- **Boundary to exercise**: [public/integration/browser/process/service/persistence boundary, or "in-process unit"]
+- **State assertion**: [observable state before -> action -> after for state-changing claims; "N/A" otherwise]
+- **Mock boundary rationale**: [which external boundaries may be mocked and why; "none" when all real]
+- **Residual**: [what this task-level proof leaves unestablished, and which later task or phase closes it]
 ## Completion Criteria
+- [ ] All listed AC / Claim IDs are implemented or verified by this task
 - [ ] All added tests pass
 - [ ] Operation verified per Operation Verification Methods above
+- [ ] Each Proof Obligation is met: the test turns red under its primary failure mode and exercises the stated boundary
 - [ ] Deliverables created (for research/design tasks)
 - [ ] When Binding Decisions exist, every Compliance Check evaluates to `Y` against the final implementation, with evidence recorded in Investigation Notes
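
The new completion criterion asks that a test "turns red under its primary failure mode" and that state-changing claims assert observable state before -> action -> after. The sketch below illustrates that idea with an in-process unit boundary. Every name in it (`rename_user`, `rename_user_noop`, `state_assertion_holds`) is a hypothetical example and does not appear in the package.

```python
# Illustrative only: a state assertion that distinguishes a working
# implementation from a "no-op" regression. All names are hypothetical.

def rename_user(store: dict, user_id: str, new_name: str) -> None:
    """Behavior under test: persist a new display name."""
    store[user_id] = new_name


def rename_user_noop(store: dict, user_id: str, new_name: str) -> None:
    """Primary failure mode: the call succeeds but changes nothing."""
    return None


def state_assertion_holds(rename) -> bool:
    """Check before -> action -> after; False means the test would turn red."""
    store = {"u1": "Ada"}
    before = store["u1"]            # observable state before
    rename(store, "u1", "Grace")    # action
    after = store["u1"]             # observable state after
    return before == "Ada" and after == "Grace"
```

A test that only checked that `rename` returned without raising would pass for both implementations, so it would run without proving the claim.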

package/.agents/skills/recipe-build/SKILL.md CHANGED

@@ -55,14 +55,28 @@ Analyze task file existence state and determine the action required:
 | State | Criteria | Next Action |
 |-------|----------|-------------|
 | Tasks exist | Consumed Task Set is non-empty | User's execution instruction serves as batch approval -> Enter autonomous execution immediately |
-| No tasks + plan exists | Consumed Task Set is empty but plan exists | Confirm with user -> spawn task-decomposer |
+| No tasks + plan exists + reviewed plan | Consumed Task Set is empty and WorkPlan Review records `Status: approved`, `Conditions: none` | Confirm with user -> spawn task-decomposer |
+| No tasks + small simplified plan | Consumed Task Set is empty, plan exists, and the plan references no Design Doc | Confirm with user -> spawn task-decomposer |
+| No tasks + plan exists + unreviewed plan | Consumed Task Set is empty, the plan references a Design Doc, and WorkPlan Review is absent, pending, conditional, or not approved | Run work plan review, then confirm with user -> spawn task-decomposer |
 | Neither exists | No plan or task files | Error: Prerequisites not met |
 ## Task Decomposition Phase (Conditional)
-When task files don't exist:
+When task files don't exist, the plan references a Design Doc, and the WorkPlan Review section is absent, pending, conditional, or not approved:
-### 1. User Confirmation
+### 1. Work Plan Review
+Spawn document-reviewer agent: "Review the work plan before task decomposition. doc_type: WorkPlan. target: docs/plans/[plan-name].md. mode: composite. Review semantic traceability to the Design Doc, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
+Branch on `verdict.decision`:
+- `approved` -> spawn work-planner in update mode once to record `Status: approved` and `Conditions: none` in WorkPlan Review, then continue to user confirmation
+- `approved_with_conditions` -> stop before task decomposition and report that the work plan needs update via recipe-plan
+- `needs_revision` -> stop before task decomposition and report that the work plan needs update via recipe-plan
+- `rejected` -> stop before task decomposition and present the blocking findings to the user
+When task files don't exist and the WorkPlan Review section records `Status: approved` and `Conditions: none`, skip Work Plan Review and continue to user confirmation.
+### 2. User Confirmation
 ```
 No task files found.
 Work plan: docs/plans/[plan-name].md
@@ -70,10 +84,10 @@ Work plan: docs/plans/[plan-name].md
 Generate tasks from the work plan? (y/n):
 ```
-### 2. Task Decomposition (if approved)
+### 3. Task Decomposition (if approved)
 Spawn task-decomposer agent: "Read work plan at docs/plans/[plan-name].md and decompose into atomic tasks. Output: Individual task files in docs/plans/tasks/. Granularity: 1 task = 1 commit = independently executable."
-### 3. Verify Generation
+### 4. Verify Generation
 Recompute the Consumed Task Set and verify it is non-empty.
 ## Pre-execution Checklist
@@ -123,7 +137,7 @@ VERIFY approval status before proceeding. Once confirmed, INITIATE autonomous ex
 ## Post-Implementation Verification (After All Tasks Complete)
 After all task cycles finish, collect all `filesModified` from every task-executor response (deduplicated), then run both verification agents before the completion report:
-1. Spawn code-verifier agent: "Verify implementation consistency against the Design Doc. `doc_type: design-doc`. `document_path`: [path]. `code_paths`: [collected filesModified list]."
+1. Spawn code-verifier agent: "Verify implementation consistency against the Design Doc. `doc_type: design-doc`. `document_path`: [path]. `code_paths`: [collected filesModified list]. Work Plan Review Scope: [Review Scope value from the active work plan, used only to confirm the collected file set is complete]."
 2. Spawn security-reviewer agent: "Design Doc: [path]. Implementation files: [collected filesModified list]. Review security compliance."
 3. Consolidate results:
    - code-verifier passes when `summary.status` is `consistent` or `mostly_consistent`
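
The revised state table and the `verdict.decision` branching in this recipe-build hunk can be read as two small decision functions. The following is a sketch under that reading, not the package's implementation. The function names, parameter names, and returned strings are assumptions chosen for illustration.

```python
# Hypothetical sketch of the recipe-build decision logic after 0.6.6.
from typing import Optional


def next_action(
    task_count: int,
    plan_exists: bool,
    references_design_doc: bool,
    review_status: Optional[str] = None,
    review_conditions: Optional[str] = None,
) -> str:
    """Map the task/plan state to the next orchestrator action."""
    if task_count > 0:
        return "execute"                      # Tasks exist
    if not plan_exists:
        return "error: prerequisites not met"  # Neither exists
    if not references_design_doc:
        return "confirm -> decompose"         # Small simplified plan
    if review_status == "approved" and review_conditions == "none":
        return "confirm -> decompose"         # Reviewed plan
    # Review section absent, pending, conditional, or not approved.
    return "review -> confirm -> decompose"


def after_review(decision: str) -> str:
    """Branch on document-reviewer's verdict.decision in a build recipe."""
    if decision == "approved":
        return "record approval, then confirm with user"
    if decision in ("approved_with_conditions", "needs_revision"):
        return "stop: work plan needs update via recipe-plan"
    if decision == "rejected":
        return "stop: present blocking findings"
    raise ValueError(f"unknown verdict.decision: {decision}")
```

Unlike the planning recipes, the build recipe does not loop on `needs_revision`. It stops and hands the plan back to the planning flow.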

package/.agents/skills/recipe-front-build/SKILL.md CHANGED

@@ -55,14 +55,28 @@ Analyze task file existence state and determine the action required:
 | State | Criteria | Next Action |
 |-------|----------|-------------|
 | Tasks exist | Consumed Task Set is non-empty | User's execution instruction serves as batch approval -> Enter autonomous execution immediately |
-| No tasks + plan exists | Consumed Task Set is empty but plan exists | Confirm with user -> spawn task-decomposer |
+| No tasks + plan exists + reviewed plan | Consumed Task Set is empty and WorkPlan Review records `Status: approved`, `Conditions: none` | Confirm with user -> spawn task-decomposer |
+| No tasks + small simplified plan | Consumed Task Set is empty, plan exists, and the plan references no Design Doc | Confirm with user -> spawn task-decomposer |
+| No tasks + plan exists + unreviewed plan | Consumed Task Set is empty, the plan references a Design Doc, and WorkPlan Review is absent, pending, conditional, or not approved | Run work plan review, then confirm with user -> spawn task-decomposer |
 | Neither exists | No plan or task files | Error: Prerequisites not met |
 ## Task Decomposition Phase (Conditional)
-When task files don't exist:
+When task files don't exist, the plan references a Design Doc, and the WorkPlan Review section is absent, pending, conditional, or not approved:
-### 1. User Confirmation
+### 1. Work Plan Review
+Spawn document-reviewer agent: "Review the frontend work plan before task decomposition. doc_type: WorkPlan. target: docs/plans/[plan-name].md. mode: composite. Review semantic traceability to the Design Doc and UI Spec, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
+Branch on `verdict.decision`:
+- `approved` -> spawn work-planner in update mode once to record `Status: approved` and `Conditions: none` in WorkPlan Review, then continue to user confirmation
+- `approved_with_conditions` -> stop before task decomposition and report that the work plan needs update via recipe-front-plan
+- `needs_revision` -> stop before task decomposition and report that the work plan needs update via recipe-front-plan
+- `rejected` -> stop before task decomposition and present the blocking findings to the user
+When task files don't exist and the WorkPlan Review section records `Status: approved` and `Conditions: none`, skip Work Plan Review and continue to user confirmation.
+### 2. User Confirmation
 ```
 No task files found.
 Work plan: docs/plans/[plan-name].md
@@ -70,10 +84,10 @@ Work plan: docs/plans/[plan-name].md
 Generate tasks from the work plan? (y/n):
 ```
-### 2. Task Decomposition (if approved)
+### 3. Task Decomposition (if approved)
 Spawn task-decomposer agent: "Read work plan at docs/plans/[plan-name].md and decompose into atomic tasks. Output: Individual task files in docs/plans/tasks/. Granularity: 1 task = 1 commit = independently executable"
-### 3. Verify Generation
+### 4. Verify Generation
 Recompute the Consumed Task Set and verify it is non-empty.
 ## Pre-execution Checklist
@@ -131,7 +145,7 @@ VERIFY approval status before proceeding. Once confirmed, INITIATE autonomous ex
 ## Post-Implementation Verification (After All Tasks Complete)
 After all task cycles finish, collect all `filesModified` from every task-executor-frontend response (deduplicated), then run both verification agents before the completion report:
-1. Spawn code-verifier agent: "Verify implementation consistency against the Design Doc. `doc_type: design-doc`. `document_path`: [path]. `code_paths`: [collected filesModified list]."
+1. Spawn code-verifier agent: "Verify implementation consistency against the Design Doc. `doc_type: design-doc`. `document_path`: [path]. `code_paths`: [collected filesModified list]. Work Plan Review Scope: [Review Scope value from the active work plan, used only to confirm the collected file set is complete]."
 2. Spawn security-reviewer agent: "Design Doc: [path]. Implementation files: [collected filesModified list]. Review security compliance."
 3. Consolidate results:
    - code-verifier passes when `summary.status` is `consistent` or `mostly_consistent`

package/.agents/skills/recipe-front-plan/SKILL.md CHANGED

@@ -20,6 +20,7 @@ description: "Create frontend work plan from design document with test skeleton
 **Execution Method**:
 - Test skeleton generation -> performed by acceptance-test-generator
 - Work plan creation -> performed by work-planner
+- Work plan review -> performed by document-reviewer
 Orchestrator spawns agents and passes structured data between them.
@@ -29,6 +30,7 @@ Orchestrator spawns agents and passes structured data between them.
 - Design document selection
 - Test skeleton generation with acceptance-test-generator
 - Work plan creation with work-planner
+- Work plan review with document-reviewer
 - Plan approval obtainment
 **Responsibility Boundary**: This skill completes with work plan approval.
@@ -50,6 +52,15 @@ Spawn acceptance-test-generator agent: "Generate test skeletons from Design Doc
 ### Step 3: Work Plan Creation
 Spawn work-planner agent: "Create work plan from Design Doc at [path]. Integration test file: [path from step 2]. fixture-e2e test file: [path from step 2 or null]. service-integration-e2e test file: [path from step 2 or null]. E2E absence reasons by lane: [values from step 2 when an E2E lane is null]. Integration tests are created with each phase implementation, fixture-e2e runs alongside UI implementation, service-integration-e2e runs only in the final phase when a service E2E file exists. Include `Implementation Readiness: pending` in the work plan header."
+### Step 4: Work Plan Review
+Spawn document-reviewer agent: "Review the frontend work plan. doc_type: WorkPlan. target: docs/plans/[plan-name].md. mode: composite. Review semantic traceability to the Design Doc and UI Spec, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
+Branch on `verdict.decision`:
+- `approved` -> spawn work-planner in update mode once to record `Status: approved` and `Conditions: none` in WorkPlan Review, then proceed to Step 5
+- `approved_with_conditions` or `needs_revision` -> spawn work-planner in update mode with the findings or conditions, then repeat Step 4. Use max 2 revision iterations as defined by the `needs_revision` row in subagents-orchestration-guide Approval Status Vocabulary.
+- `rejected` -> stop and present the blocking findings to the user.
+### Step 5: Plan Approval
 **[STOP -- BLOCKING]** Interact with user to complete plan and obtain approval for plan content. Clarify specific implementation steps and risks.
 **CANNOT proceed until user explicitly approves the work plan.**
@@ -60,6 +71,7 @@ ENFORCEMENT: Plan content MUST be approved before declaring completion. Unapprov
 - [ ] Design document selected
 - [ ] Test skeletons generated
 - [ ] Work plan created
+- [ ] Work plan reviewed via document-reviewer
 - [ ] User approved plan content
 ## Output Example

package/.agents/skills/recipe-front-review/SKILL.md CHANGED

@@ -31,12 +31,13 @@ Design Doc (uses most recent if omitted): $ARGUMENTS
 ### 1. Prerequisite Check
 Identify the Design Doc in docs/design/ and check implementation files changed from the default branch (detect via `git symbolic-ref refs/remotes/origin/HEAD` or fall back to current branch diff).
+If a single active work plan is explicitly provided or unambiguously resolved for that Design Doc, read its `Review Scope` line. Otherwise set `Work Plan: none` and `Review Scope: none`; do not infer.
 **[STOP -- BLOCKING]** If no Design Doc or implementation files found, notify user and halt.
 **CANNOT proceed without both a Design Doc and implementation files.**
 ### 2. Execute code-reviewer
-Spawn code-reviewer agent: "Validate Design Doc compliance for [design-doc-path]. Implementation files: [git diff file list]. Review mode: full. Return structured JSON report per your Output Format specification."
+Spawn code-reviewer agent: "Validate Design Doc compliance for [design-doc-path]. Work Plan: [resolved work plan path or none]. Review Scope: [literal Review Scope value or none]. Implementation files: [git diff file list]. Review mode: full. Return structured JSON report per your Output Format specification."
 **Store output as**: `$STEP_2_OUTPUT`

package/.agents/skills/recipe-fullstack-build/SKILL.md CHANGED

@@ -65,14 +65,28 @@ Analyze task file existence state and determine the action required:
 | State | Criteria | Next Action |
 |-------|----------|-------------|
 | Tasks exist | Consumed Task Set is non-empty | User's execution instruction serves as batch approval -> Enter autonomous execution immediately |
-| No tasks + plan exists | Consumed Task Set is empty but plan exists | Confirm with user -> spawn task-decomposer |
+| No tasks + plan exists + reviewed plan | Consumed Task Set is empty and WorkPlan Review records `Status: approved`, `Conditions: none` | Confirm with user -> spawn task-decomposer |
+| No tasks + small simplified plan | Consumed Task Set is empty, plan exists, and the plan references no Design Doc | Confirm with user -> spawn task-decomposer |
+| No tasks + plan exists + unreviewed plan | Consumed Task Set is empty, the plan references a Design Doc, and WorkPlan Review is absent, pending, conditional, or not approved | Run work plan review, then confirm with user -> spawn task-decomposer |
 | Neither exists | No plan or task files | Error: Prerequisites not met |
 ## Task Decomposition Phase (Conditional)
-When task files don't exist:
+When task files don't exist, the plan references a Design Doc, and the WorkPlan Review section is absent, pending, conditional, or not approved:
-### 1. User Confirmation
+### 1. Work Plan Review
+Spawn document-reviewer agent: "Review the fullstack work plan before task decomposition. doc_type: WorkPlan. target: docs/plans/[plan-name].md. mode: composite. Review semantic traceability to all Design Docs, UI Spec when present, cross-layer boundary coverage, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
+Branch on `verdict.decision`:
+- `approved` -> spawn work-planner in update mode once to record `Status: approved` and `Conditions: none` in WorkPlan Review, then continue to user confirmation
+- `approved_with_conditions` -> stop before task decomposition and report that the work plan needs update via recipe-plan or the fullstack planning flow
+- `needs_revision` -> stop before task decomposition and report that the work plan needs update via recipe-plan or the fullstack planning flow
+- `rejected` -> stop before task decomposition and present the blocking findings to the user
+When task files don't exist and the WorkPlan Review section records `Status: approved` and `Conditions: none`, skip Work Plan Review and continue to user confirmation.
+### 2. User Confirmation
 ```
 No task files found.
 Work plan: docs/plans/[plan-name].md
@@ -80,10 +94,10 @@ Work plan: docs/plans/[plan-name].md
 Generate tasks from the work plan? (y/n):
 ```
-### 2. Task Decomposition (if approved)
+### 3. Task Decomposition (if approved)
 Spawn task-decomposer agent: "Read work plan at docs/plans/[plan-name].md and decompose into atomic tasks. Output: Individual task files in docs/plans/tasks/. Granularity: 1 task = 1 commit = independently executable. Use layer-aware naming: {plan}-backend-task-{n}.md, {plan}-frontend-task-{n}.md based on target file paths."
-### 3. Verify Generation
+### 4. Verify Generation
 Recompute the Consumed Task Set and verify it is non-empty.
 ## Pre-execution Checklist
@@ -141,7 +155,7 @@ VERIFY approval status before proceeding. Once confirmed, INITIATE autonomous ex
 ## Post-Implementation Verification (After All Tasks Complete)
 After all task cycles finish, collect all `filesModified` from every task-executor/task-executor-frontend response (deduplicated), then run both verification agents before the completion report:
-1. Spawn code-verifier once per Design Doc: "Verify implementation consistency against the Design Doc. `doc_type: design-doc`. `document_path`: [single design doc path]. `code_paths`: [collected filesModified list]."
+1. Spawn code-verifier once per Design Doc: "Verify implementation consistency against the Design Doc. `doc_type: design-doc`. `document_path`: [single design doc path]. `code_paths`: [collected filesModified list]. Work Plan Review Scope: [Review Scope value from the active work plan, used only to confirm the collected file set is complete]."
 2. Spawn security-reviewer agent: "Design Doc: [path(s)]. Implementation files: [collected filesModified list]. Review security compliance."
 3. Consolidate results:
    - each code-verifier run passes when `summary.status` is `consistent` or `mostly_consistent`

package/.agents/skills/recipe-implement/SKILL.md CHANGED

@@ -69,7 +69,7 @@ Follow subagents-orchestration-guide skill Large/Medium/Small scale flow exactly
 **STEP 3**: Spawn technical-designer-frontend agent → spawn document-reviewer agent → spawn design-sync agent.
 **[STOP — BLOCKING]** Present Frontend Design Doc for user approval. **CANNOT proceed until user explicitly confirms.**
-**STEP 4**: Spawn acceptance-test-generator agent → spawn work-planner agent.
+**STEP 4**: Spawn acceptance-test-generator agent → spawn work-planner agent → spawn document-reviewer agent with `doc_type: WorkPlan`.
 **[STOP — BLOCKING]** Present Work Plan for user approval. **CANNOT proceed until user explicitly confirms.**
 **STEP 5**: Run implementation readiness preflight.

package/.agents/skills/recipe-plan/SKILL.md CHANGED

@@ -33,6 +33,7 @@ ENFORCEMENT: Work-planner spawned without test skeleton data (when tests were re
 - Design document selection
 - Test skeleton generation with acceptance-test-generator
 - Work plan creation with work-planner
+- Work plan review with document-reviewer
 - Plan approval obtainment
 **Responsibility Boundary**: This skill completes with work plan approval.
@@ -53,7 +54,18 @@ Present options if multiple exist (can be specified with $ARGUMENTS).
 ### Step 3: Work Plan Creation
 - Spawn work-planner agent: "Create work plan from design document at [design-doc-path]. Include deliverables from previous process according to subagents-orchestration-guide skill coordination specification. If `generatedFiles.fixtureE2e` or `generatedFiles.serviceE2e` is null, use the corresponding `e2eAbsenceReason` and accept the null E2E lane as a valid planning input. Include `Implementation Readiness: pending` in the work plan header."
-- Interact with user to complete plan and obtain approval for plan content
+### Step 4: Work Plan Review
+Spawn document-reviewer agent: "Review the work plan. doc_type: WorkPlan. target: docs/plans/[plan-name].md. mode: composite. Review semantic traceability to the Design Doc, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
+Branch on `verdict.decision`:
+- `approved` -> spawn work-planner in update mode once to record `Status: approved` and `Conditions: none` in WorkPlan Review, then proceed to Step 5
+- `approved_with_conditions` or `needs_revision` -> spawn work-planner in update mode with the findings or conditions, then repeat Step 4. Use max 2 revision iterations as defined by the `needs_revision` row in subagents-orchestration-guide Approval Status Vocabulary.
+- `rejected` -> stop and present the blocking findings to the user.
+### Step 5: Plan Approval
+- Present the reviewed work plan to the user for batch approval
+- If the user requests changes, spawn work-planner in update mode and re-run Step 4
 - Clarify specific implementation steps and risks
 **Scope**: Up to work plan creation and obtaining approval for plan content.
@@ -63,6 +75,7 @@ Present options if multiple exist (can be specified with $ARGUMENTS).
 - [ ] Design document identified and selected
 - [ ] Integration/E2E test skeleton generation confirmed with user (generated if requested)
 - [ ] Work plan created via work-planner
+- [ ] Work plan reviewed via document-reviewer
 - [ ] Plan content approved by user
 - [ ] All stopping points honored with user confirmation
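
Step 4 in this recipe-plan hunk describes a review loop: re-run the review after each work-planner update, with at most 2 revision iterations. A minimal sketch of that loop follows. The `review` and `revise` callables stand in for the document-reviewer and work-planner agents and are hypothetical. The `"escalate"` outcome when the limit is exhausted is an assumption, since this diff does not show the `needs_revision` row that defines it.

```python
# Hypothetical sketch of the Step 4 review loop; not the package's code.
from typing import Callable, Tuple

MAX_REVISIONS = 2  # "max 2 revision iterations" per the recipe text


def run_review_loop(
    plan: str,
    review: Callable[[str], dict],
    revise: Callable[[str, list], str],
) -> Tuple[str, str]:
    """Return (outcome, plan) where outcome is approved, rejected, or escalate."""
    revisions = 0
    while True:
        verdict = review(plan)
        decision = verdict["decision"]
        if decision == "approved":
            # The recipe then records Status: approved / Conditions: none.
            return "approved", plan
        if decision == "rejected":
            return "rejected", plan
        if decision in ("approved_with_conditions", "needs_revision"):
            if revisions >= MAX_REVISIONS:
                return "escalate", plan  # assumed behavior at the limit
            plan = revise(plan, verdict.get("conditions", []))
            revisions += 1
            continue
        raise ValueError(f"unknown verdict.decision: {decision}")
```

Treating `approved_with_conditions` the same as `needs_revision` matches the recipe text, which keeps conditions from being carried into later phases.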

package/.agents/skills/recipe-review/SKILL.md CHANGED

@@ -36,9 +36,10 @@ Design Doc (uses most recent if omitted): $ARGUMENTS
 ### Step 1: Prerequisite Check
 Identify Design Doc in docs/design/ and check implementation files via git diff.
+If a single active work plan is explicitly provided or unambiguously resolved for that Design Doc, read its `Review Scope` line. Otherwise set `Work Plan: none` and `Review Scope: none`; do not infer.
 ### Step 2: Execute code-reviewer
-Spawn code-reviewer agent: "Validate Design Doc compliance for the implementation. Design Doc path: [path]. Implementation files: [git diff file list]. Review mode: full. Return structured JSON report per your Output Format specification."
+Spawn code-reviewer agent: "Validate Design Doc compliance for the implementation. Design Doc path: [path]. Work Plan: [resolved work plan path or none]. Review Scope: [literal Review Scope value or none]. Implementation files: [git diff file list]. Review mode: full. Return structured JSON report per your Output Format specification."
 **Store output as**: `$STEP_2_OUTPUT`
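
The added prerequisite step resolves `Review Scope` only when exactly one active work plan is known, and otherwise passes `none` without inferring. A sketch of that rule follows. The function name and the `active_plans` mapping (path to plan text) are hypothetical. Returning the plan path with `Review Scope: none` when a resolved plan has no such line is an assumption that the source does not spell out.

```python
# Hypothetical sketch of Review Scope resolution for the review recipes.
import re
from typing import Dict, Tuple


def resolve_review_scope(active_plans: Dict[str, str]) -> Tuple[str, str]:
    """Return (work_plan, review_scope); both are "none" unless unambiguous."""
    if len(active_plans) != 1:
        # Zero or several candidate plans: do not infer.
        return "none", "none"
    ((path, text),) = active_plans.items()
    match = re.search(r"^Review Scope:[ \t]*(.+)$", text, flags=re.MULTILINE)
    if match is None:
        return path, "none"  # assumption: plan resolved, scope line absent
    return path, match.group(1).strip()
```

The returned scope is the literal header value, which is what the code-reviewer prompt in Step 2 expects.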

package/.agents/skills/subagents-orchestration-guide/SKILL.md CHANGED

@@ -140,7 +140,7 @@ Autonomous execution MUST stop and wait for user input at these points.
 | UI Spec | After document-reviewer completes UI Spec review (frontend/fullstack) | Approve UI Spec |
 | ADR | After document-reviewer completes ADR review (if ADR created) | Approve ADR |
 | Design | After design-sync completes consistency verification | Approve Design Doc |
-| Work Plan | After work-planner creates plan | Batch approval for implementation phase |
+| Work Plan | After document-reviewer completes WorkPlan review for Medium/Large, or after simplified plan creation for Small | Batch approval for implementation phase |
 **ENFORCEMENT**: After batch approval, autonomous execution proceeds without stops until completion or escalation. Skipping stop points is a CRITICAL VIOLATION.
@@ -164,6 +164,16 @@ Handling rules:
 **ENFORCEMENT**: Using any status value outside this vocabulary is a VIOLATION.
+### WorkPlan Review State [MANDATORY]
+Medium and Large work plans must contain a `WorkPlan Review` section. Small simplified plans are exempt because they have no Design Doc to trace against. The plan is reviewed only when that section records `Status: approved` and `Conditions: none`.
+Handling rules:
+- After WorkPlan review returns `approved`, invoke work-planner in update mode once to record the review section, without changing implementation content.
+- Treat WorkPlan `approved_with_conditions` the same as `needs_revision`: return to work-planner in update mode with the conditions, then re-review. Conditions must not be carried into task decomposition or implementation readiness.
+- A material work plan update resets `WorkPlan Review` to `Status: pending`.
+- Standalone build recipes apply WorkPlan review only before task decomposition, not after task files already exist.
 ## Scale Determination and Document Requirements
 | Scale | File Count | PRD | ADR | Design Doc | Work Plan |
@@ -185,7 +195,7 @@ Subagents respond in JSON format. The final response from each JSON-returning su
 | `requirement-analyzer` | `scale`, `confidence`, `affectedLayers`, `adrRequired`, `scopeDependencies`, `questions` |
 | `codebase-analyzer` | `focusAreas`, `dataModel`, `qualityAssurance`, `dataTransformationPipelines`, `limitations` |
 | `ui-analyzer` | `externalResources`, `componentStructure`, `propsPatterns`, `cssLayout`, `stateDisplay`, `focusAreas`, `candidateWriteSet`, `limitations` |
-| `task-executor*` | `status`, `escalation_type` (`design_compliance_violation`, `similar_function_found`, `similar_component_found`, `investigation_target_not_found`, `out_of_scope_file`, `dependency_version_uncertain`, `binding_decision_violation`), `filesModified`, `requiresTestReview` |
+| `task-executor*` | `status`, `escalation_type` (`design_compliance_violation`, `similar_function_found`, `similar_component_found`, `investigation_target_not_found`, `out_of_scope_file`, `dependency_version_uncertain`, `binding_decision_violation`, `test_environment_not_ready`), `filesModified`, `requiresTestReview` |
 | `quality-fixer*` | `status`, `reason`, `stubFindings`, `blockingIssues`, `missingPrerequisites` |
 | `document-reviewer` | `verdict.decision`, `verdict.conditions` |
 | `code-verifier` | `summary.status`, `discrepancies`, `reverseCoverage` |
@@ -242,8 +252,8 @@ Always start with `requirement-analyzer`, then follow the minimum flow required
 | Scale | Required flow |
 |-------|---------------|
-| Large | `requirement-analyzer` **[Stop]** -> `prd-creator` -> `document-reviewer` **[Stop]** -> optional `ui-spec-designer` + `document-reviewer` **[Stop]** -> optional ADR + `document-reviewer` **[Stop]** -> `codebase-analyzer` -> `technical-designer*` -> `code-verifier` -> `document-reviewer` -> `design-sync` **[Stop]** -> `acceptance-test-generator` -> `work-planner` **[Stop]** -> `task-decomposer` |
-| Medium | `requirement-analyzer` **[Stop]** -> `codebase-analyzer` -> optional `ui-spec-designer` + `document-reviewer` **[Stop]** -> `technical-designer*` -> `code-verifier` -> `document-reviewer` -> `design-sync` **[Stop]** -> `acceptance-test-generator` -> `work-planner` **[Stop]** -> `task-decomposer` |
+| Large | `requirement-analyzer` **[Stop]** -> `prd-creator` -> `document-reviewer` **[Stop]** -> optional `ui-spec-designer` + `document-reviewer` **[Stop]** -> optional ADR + `document-reviewer` **[Stop]** -> `codebase-analyzer` -> `technical-designer*` -> `code-verifier` -> `document-reviewer` -> `design-sync` **[Stop]** -> `acceptance-test-generator` -> `work-planner` -> `document-reviewer` (doc_type: WorkPlan) **[Stop]** -> `task-decomposer` |
+| Medium | `requirement-analyzer` **[Stop]** -> `codebase-analyzer` -> optional `ui-spec-designer` + `document-reviewer` **[Stop]** -> `technical-designer*` -> `code-verifier` -> `document-reviewer` -> `design-sync` **[Stop]** -> `acceptance-test-generator` -> `work-planner` -> `document-reviewer` (doc_type: WorkPlan) **[Stop]** -> `task-decomposer` |
 | Small | `requirement-analyzer` **[Stop]** -> simplified plan **[Stop: Batch approval]** -> direct implementation |
 Flow rules:
@@ -253,6 +263,7 @@ Flow rules:
 - Pass `codebase-analyzer` output to the designer as `Codebase Analysis`
 - Pass Design Doc path to `code-verifier`, then pass `code_verification` to `document-reviewer`
 - Fullstack layer sequencing is defined in `references/monorepo-flow.md`
+- Run WorkPlan review after every Medium/Large work plan creation or update and before batch approval. On `needs_revision` or WorkPlan `approved_with_conditions`, return to `work-planner` in update mode and re-review for max 2 revision iterations as defined by the `needs_revision` row in Approval Status Vocabulary. On `rejected`, halt and escalate to the user.
 ## Autonomous Execution Mode
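
The WorkPlan review rule added in this hunk can be read as a bounded loop. The following is a minimal sketch, not the package's implementation: the verdict names come from the diff, while `ReviewResult`, `Outcome`, `runWorkPlanReview`, and the `review`/`revise` callbacks are hypothetical. Escalating to the user once the two-revision limit is reached is an assumption, because the `needs_revision` row of the Approval Status Vocabulary is not shown in this diff.

```typescript
// Verdict names are taken from the diff; all other names are hypothetical.
type Verdict = "approved" | "approved_with_conditions" | "needs_revision" | "rejected";

interface ReviewResult {
  decision: Verdict;
  conditions: string[];
}

type Outcome =
  | { kind: "ready_for_batch_approval"; revisions: number }
  | { kind: "escalate_to_user"; reason: string; revisions: number };

// "max 2 revision iterations" from the flow rule.
const MAX_REVISIONS = 2;

function runWorkPlanReview(
  review: () => ReviewResult,             // stands in for spawning document-reviewer
  revise: (conditions: string[]) => void, // stands in for work-planner in update mode
): Outcome {
  let revisions = 0;
  for (;;) {
    const result = review();
    if (result.decision === "approved") {
      return { kind: "ready_for_batch_approval", revisions };
    }
    if (result.decision === "rejected") {
      return { kind: "escalate_to_user", reason: "rejected", revisions };
    }
    // needs_revision and approved_with_conditions both route back to the planner.
    if (revisions >= MAX_REVISIONS) {
      // Assumption: exhausting the limit escalates rather than approving.
      return { kind: "escalate_to_user", reason: "revision limit reached", revisions };
    }
    revise(result.conditions);
    revisions += 1;
  }
}
```

In this sketch `approved_with_conditions` never reaches batch approval directly, which matches the rule that WorkPlan conditions go back through `work-planner` first.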

package/.agents/skills/subagents-orchestration-guide/references/monorepo-flow.md CHANGED

@@ -10,7 +10,7 @@ This reference defines the orchestration flow for projects spanning multiple lay
 ## Design Phase
-### Large Scale Fullstack (6+ Files) - 15 Steps
+### Large Scale Fullstack (6+ Files) - 16 Steps
 | Step | Agent | Purpose | Output |
 |------|-------|---------|--------|
@@ -28,9 +28,10 @@ This reference defines the orchestration flow for projects spanning multiple lay
 | 12 | document-reviewer x2 | Review each Design Doc with verification evidence | Reviews |
 | 13 | design-sync | Cross-layer consistency verification (source: frontend Design Doc) **[Stop]** | Sync status |
 | 14 | acceptance-test-generator | Integration/E2E test skeleton from cross-layer contracts | Test skeletons |
-| 15 | work-planner | Work plan from all Design Docs **[Stop: Batch approval]** | Work plan |
+| 15 | work-planner | Work plan from all Design Docs | Work plan |
+| 16 | document-reviewer | WorkPlan review **[Stop: Batch approval]** | Approval |
-### Medium Scale Fullstack (3-5 Files) - 13 Steps
+### Medium Scale Fullstack (3-5 Files) - 14 Steps
 | Step | Agent | Purpose | Output |
 |------|-------|---------|--------|
@@ -46,7 +47,8 @@ This reference defines the orchestration flow for projects spanning multiple lay
 | 10 | document-reviewer x2 | Review each Design Doc with verification evidence | Reviews |
 | 11 | design-sync | Cross-layer consistency verification (source: frontend Design Doc) **[Stop]** | Sync status |
 | 12 | acceptance-test-generator | Integration/E2E test skeleton from cross-layer contracts | Test skeletons |
-| 13 | work-planner | Work plan from all Design Docs **[Stop: Batch approval]** | Work plan |
+| 13 | work-planner | Work plan from all Design Docs | Work plan |
+| 14 | document-reviewer | WorkPlan review **[Stop: Batch approval]** | Approval |
 ### Parallelization in Multi-Agent Steps
@@ -101,6 +103,12 @@ Spawn work-planner with all Design Docs:
 work-planner's existing Integration Complete criteria naturally covers cross-layer verification when given multiple Design Docs.
+After work-planner creates or updates the plan, spawn document-reviewer:
+> "Review the fullstack work plan. doc_type: WorkPlan. target: [work plan path]. mode: composite. Review semantic traceability to all Design Docs, UI Spec when present, cross-layer boundary coverage, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
+On `needs_revision` or `approved_with_conditions`, return to work-planner in update mode and re-review for max 2 revision iterations as defined by the `needs_revision` row in Approval Status Vocabulary. On `rejected`, halt and escalate to the user. Stop for batch approval only after WorkPlan review returns `approved` and the plan's `WorkPlan Review` section records `Status: approved` with `Conditions: none`.
 ## Task Decomposition Phase
 task-decomposer follows standard decomposition from the work plan. The key addition is the **layer-aware naming convention**:

package/.codex/agents/acceptance-test-generator.toml CHANGED

@@ -192,6 +192,8 @@ Adapt comment syntax to the project's language when generating annotations.
   // @dependency: PaymentService, OrderRepository, Database
   // @real-dependency: OrderRepository, Database
   // @complexity: high
+  // Primary failure mode: payment succeeds but the order row is absent or unpersisted
+  // Proof obligation: assert order persistence after successful payment while keeping OrderRepository and Database real; only the external payment gateway may be mocked
   [Test: 'AC1: Successful payment creates persisted order with correct status']
   // AC1-error: "Payment failure shows user-friendly error message"
@@ -200,6 +202,8 @@ Adapt comment syntax to the project's language when generating annotations.
   // @category: core-functionality
   // @dependency: PaymentService, ErrorHandler
   // @complexity: medium
+  // Primary failure mode: payment failure still creates an order or hides the user-facing error
+  // Proof obligation: assert the visible error and the unchanged order state after a failed payment; mock only the external payment gateway failure
   [Test: 'AC1: Failed payment displays error without creating order']
 ```
@@ -221,6 +225,8 @@ Adapt comment syntax to the project's language when generating annotations.
   // @lane: fixture-e2e
   // @dependency: full-ui (mocked backend)
   // @complexity: medium
+  // Primary failure mode: undo banner appears but the dismissed card is not restored
+  // Proof obligation: assert browser-visible state before dismissal, after dismissal, and after undo using fixture-controlled backend state
   [Test: 'User Journey: Dismiss and undo restores the card']
 ```
@@ -242,6 +248,8 @@ Adapt comment syntax to the project's language when generating annotations.
   // @lane: service-integration-e2e
   // @dependency: full-system
   // @complexity: high
+  // Primary failure mode: checkout appears successful but the persisted order or confirmation event is missing
+  // Proof obligation: exercise the full local service stack and assert persisted order state plus confirmation event after checkout
   [Test: 'User Journey: Complete product purchase persists order and emits confirmation']
 ```
@@ -297,13 +305,15 @@ Each test case MUST have the following standard annotations for test implementat
 - **@lane**: integration | fixture-e2e | service-integration-e2e
 - **@dependency**: none | [component names] | full-ui (mocked backend) | full-system
 - **@complexity**: low | medium | high
+- **Primary failure mode**: the specific regression that should make the implemented test fail
+- **Proof obligation**: what the implemented test must assert to prove the claim, including the boundary to exercise, before/action/after state for state-changing claims, and which boundaries may be mocked with rationale
-These annotations are used when planning and prioritizing test implementation.
+These annotations are used when planning and prioritizing test implementation. Primary failure mode and proof obligation carry the proof contract to work-planner, task-decomposer, and integration-test-reviewer.
 ## Constraints and Quality Standards
 **Mandatory Compliance**:
-- Output test skeletons only: verification points, expected results, and pass criteria
+- Output test skeletons only: verification points, expected results, pass criteria, primary failure mode, and proof obligation
 - Downstream consumers treat these skeletons as design artifacts rather than runnable tests
 - Clearly state verification points, expected results, and pass criteria for each test
 - Preserve original AC statements in comments (ensure traceability)
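
To illustrate how a `Primary failure mode` / `Proof obligation` pair could constrain an implemented test, here is a small sketch based on the AC1 skeleton annotations above. Every name in it (`InMemoryOrderRepository`, `PaymentGateway`, `checkout`, `provesPersistedOrder`, `checkoutWithoutPersist`) is hypothetical. The in-memory repository stands in for the real OrderRepository and Database only so the sketch runs on its own; a real suite would keep those boundaries real, as the annotation requires.

```typescript
interface Order {
  id: string;
  status: "paid";
}

// Stand-in for the real OrderRepository + Database boundary (assumption for this sketch).
class InMemoryOrderRepository {
  private rows = new Map<string, Order>();
  save(order: Order): void {
    this.rows.set(order.id, order);
  }
  findById(id: string): Order | undefined {
    return this.rows.get(id);
  }
}

// The only boundary the proof obligation allows to be mocked.
interface PaymentGateway {
  charge(orderId: string): boolean;
}

function checkout(orderId: string, gateway: PaymentGateway, repo: InMemoryOrderRepository): boolean {
  if (!gateway.charge(orderId)) return false;
  repo.save({ id: orderId, status: "paid" });
  return true;
}

// Proof obligation as a check: a successful payment alone is not enough,
// the order row must be observable through the repository afterwards.
function provesPersistedOrder(checkoutImpl: typeof checkout): boolean {
  const repo = new InMemoryOrderRepository();
  const gateway: PaymentGateway = { charge: () => true }; // mocked external gateway
  const before = repo.findById("o-1");
  const paid = checkoutImpl("o-1", gateway, repo);
  const after = repo.findById("o-1");
  return before === undefined && paid && after?.status === "paid";
}

// Regression matching the primary failure mode: payment succeeds, nothing is persisted.
const checkoutWithoutPersist: typeof checkout = (orderId, gateway) => gateway.charge(orderId);
```

The check passes for `checkout` and fails for `checkoutWithoutPersist`, which is the property the annotations are meant to carry downstream: the test goes red under its stated failure mode.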

package/.codex/agents/code-reviewer.toml CHANGED

@@ -53,7 +53,7 @@ Skill Status:
 ## Input Parameters
 - **designDoc**: Path to the Design Doc (or multiple paths for fullstack features)
-- **implementationFiles**: List of files to review (or git diff range)
+- **implementationFiles**: List of files to review (or git diff range). When a Work Plan is provided and implementationFiles is omitted or ambiguous, derive the review file set from the plan's `Review Scope` value; for revision plans, use the recorded base branch plus diff range.
 - **reviewMode**: `full` (default) | `acceptance` | `architecture`
 ## Workflow
@@ -120,6 +120,11 @@ Read error paths and boundary handling directly in the code:
 #### 3-3. Test Coverage for Acceptance Criteria
 - For each fulfilled AC, check whether tests exercise the expected behavior
+- For each test claimed as AC coverage, inspect the body:
+  - Meaningful coverage: at least one assertion exercises the AC's observable behavior
+  - Coverage gap: `skip`/`xit` on tests that should run, TODO/placeholder-only bodies, always-true assertions (for example `expect(true).toBe(true)` or `expect(arr.length).toBeGreaterThanOrEqual(0)`), 0-match runner reports, or grep-only matches without behavior verification
+  - Intentional absence: meaningful when absence is the AC expectation
+  - Proof adequacy: a covered test should fail under the AC's primary failure mode and should exercise the claimed boundary rather than a substitute input that bypasses it. A test that would stay green if the claimed behavior regressed is a `coverage_gap` with rationale naming the unproven failure mode.
 Classify each quality finding into one of:
 - `dd_violation`: implementation deviates from the Design Doc
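
Some of the coverage-gap signals listed in step 3-3 can be approximated with text heuristics. The sketch below is an assumption about how a first-pass filter might look, not what code-reviewer does: `checkTestBody`, `GAP_PATTERNS`, and the reason strings are hypothetical, and the patterns assume Jest-style syntax. It cannot judge proof adequacy or intentional absence, so those still need a reviewer reading the test body.

```typescript
type CoverageFinding = "meaningful" | "coverage_gap";

interface TestBodyCheck {
  finding: CoverageFinding;
  reasons: string[];
}

// Hypothetical heuristics for the gap signals named in the diff.
const GAP_PATTERNS: Array<{ reason: string; pattern: RegExp }> = [
  { reason: "skipped test", pattern: /\b(?:xit|xdescribe)\s*\(|\b(?:it|test|describe)\.skip\s*\(/ },
  { reason: "always-true assertion", pattern: /expect\(\s*true\s*\)\.toBe\(\s*true\s*\)/ },
  { reason: "always-true assertion", pattern: /\.length\s*\)\.toBeGreaterThanOrEqual\(\s*0\s*\)/ },
];

function checkTestBody(source: string): TestBodyCheck {
  const reasons = GAP_PATTERNS
    .filter(({ pattern }) => pattern.test(source))
    .map(({ reason }) => reason);

  // Drop comments so a TODO mentioning expect() does not count as an assertion.
  const withoutComments = source
    .replace(/\/\/.*$/gm, "")
    .replace(/\/\*[\s\S]*?\*\//g, "");
  if (!/\bexpect\s*\(/.test(withoutComments)) {
    reasons.push("no assertions (TODO or placeholder body)");
  }

  return { finding: reasons.length === 0 ? "meaningful" : "coverage_gap", reasons };
}
```

A `meaningful` result here only means that no cheap red flag was found; whether the assertion exercises the AC's observable behavior is a separate judgment.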

package/.codex/agents/document-reviewer.toml CHANGED

@@ -49,7 +49,7 @@ Skill Status:
   - `composite`: Composite perspective review (recommended) - Verifies structure, implementation, and completeness in one execution
   - When unspecified: Comprehensive review
-- **doc_type**: Document type (`PRD`/`ADR`/`UISpec`/`DesignDoc`)
+- **doc_type**: Document type (`PRD`/`ADR`/`UISpec`/`DesignDoc`/`WorkPlan`)
 - **target**: Document path to review
 - **codebase_analysis**: codebase-analyzer JSON used to create the target document (optional)
 - **ui_analysis**: ui-analyzer JSON used to create the target document (optional)
@@ -84,6 +84,7 @@ Skill Status:
   - When `codebase_analysis` is provided, use `analysisScope`, `existingElements`, `constraints`, `qualityAssurance`, `focusAreas`, and `limitations` as source evidence for scope, feasibility, and completeness checks
   - When `ui_analysis` is provided, use `componentStructure`, `propsPatterns`, `cssLayout`, `stateDisplay`, `displayConditions`, `accessibility`, and `candidateWriteSet` as source evidence for UI scope, feasibility, and completeness checks
   - When `code_verification` is provided, use its discrepancies and reverse coverage as pre-verified evidence during review
+- For WorkPlan: confirm the plan carries the artifacts the semantic gate is judged against: WorkPlan Review, Review Scope, Design-to-Plan Traceability, Verification Strategy summary, Proof Strategy, Failure Mode Checklist, and Quality Assurance Mechanisms. Read the referenced Design Doc(s), UI Spec, ADRs, and test skeletons when listed so coverage can be checked against source artifacts.
 ### Step 2: Target Document Collection
 - Load document specified by target
@@ -105,6 +106,14 @@ For DesignDoc, additionally verify:
 - [ ] Output Comparison section present when the design changes existing observable behavior, an external contract, or a persisted data shape
 - [ ] Minimal Surface Alternatives section present with one entry per new in-scope element as defined by coding-rules "Minimum Surface Terms" when the design proposes new implementation surface. If none are introduced, the section is marked N/A with rationale. Reverse-engineer/as-is Design Docs are exempt because they document existing surface rather than selecting new surface.
+For WorkPlan, additionally verify:
+- [ ] WorkPlan Review section present
+- [ ] Review Scope recorded as planned-files scope, or base branch plus diff range for revision plans
+- [ ] Design-to-Plan Traceability table present
+- [ ] Verification Strategy summary and Proof Strategy present
+- [ ] Failure Mode Checklist present
+- [ ] Final phase includes Quality Assurance covering acceptance criteria achievement and required checks
 #### Gate 1: Quality Assessment (only after Gate 0 passes)
 **Comprehensive Review Mode**:
@@ -124,6 +133,14 @@ For DesignDoc, additionally verify:
 - **Verification Strategy quality check**: When the Verification Strategy section exists, verify that: (1) correctness definition is specific and measurable, (2) target comparison and observable success indicator are concrete when the change modifies observable behavior, external contracts, integrations, or data flow, (3) internal-only refactoring with identical observable inputs and outputs may use the minimal form, (4) verification method can detect the change's primary risk, (5) verification timing uses the normalized vocabulary or an explicit `N/A` rationale for minimal form, and (6) vertical-slice designs do not defer all verification to the final phase
 - **Output comparison check**: When the Design Doc changes existing observable behavior, an external contract, or a persisted data shape, verify that a concrete output comparison method is defined with identical input, expected output fields or format, and diff method. When upstream analysis includes `dataTransformationPipelines`, each listed step must be mapped to the comparison that verifies it; steps excluded because data passes through unchanged must include rationale. Missing mappings or rationale → `important` issue (category: `completeness`)
 - **Minimal Surface Alternatives check**: Applies when the Design Doc proposes new in-scope elements as defined by coding-rules "Minimum Surface Terms". Reverse-engineer/as-is Design Docs are exempt. Missing or empty section when the trigger fires → `critical` issue (category: `completeness`). For each entry verify: (1) Step 1 lists at least one AC ID or accepted technical constraint from the Design Doc or referenced UI Spec; speculative-only linkage → `critical` issue (category: `compliance`). (2) Steps 2-3 include at least one subtractive alternative such as derive, compute on demand, keep at caller, reuse existing, or do not introduce new state/mode/abstraction; missing subtractive alternative → `important` issue (category: `compliance`). (3) Step 4 selects the smallest alternative or names a current requirement smaller alternatives fail to satisfy; primary rationale based on coding-rules subjective-only rationales → `critical` issue (category: `compliance`). (4) Step 5 records rejected alternatives with brief rationale; missing rejected alternatives log → `important` issue (category: `completeness`)
+- **WorkPlan semantic gate**:
+  - Coverage is checked where each item lives in the plan: each acceptance criterion is covered by a task whose Completion Criteria or Proof Obligations reference the AC ID or claim identifier; each data contract, state transition, boundary, prerequisite, and protected scope item has a Design-to-Plan Traceability row mapped to a task or an explicit out-of-scope entry. Missing coverage is a `critical` issue (category: `completeness`).
+  - Distinguish the cause for an uncovered acceptance criterion: when the source Design Doc supports it but no task maps to it, classify as a plan omission (`critical`, fixable by re-planning); when the source document or inputs give it no basis, classify as `rejected` because re-planning cannot invent the missing source requirement.
+  - Early verification must sit in an early phase rather than only the final phase. Deferral to final phase without rationale is an `important` issue (category: `consistency`).
+  - Each cross-boundary, public-boundary, browser-boundary, or persisted-state change names a task that verifies it through the real boundary. Missing real-boundary coverage is an `important` issue (category: `completeness`).
+  - Each traceability table present (Design-to-Plan, UI Spec Component, Connection Map, ADR Bindings) is filled to the granularity needed to resolve the target task. Under-specified rows are `important` issues (category: `completeness`).
+  - The Failure Mode Checklist covers applicable domain-independent categories: same-value, no-op, empty input, invalid option, missing config, unavailable boundary, shared-state dependency, rollback-only visibility. Missing applicable categories are `recommended` issues (category: `completeness`).
+  - Verdict mapping: any WorkPlan semantic-gate `critical` issue forces `needs_revision`, except a coverage gap traceable to missing or contradictory source documents or inputs forces `rejected`. Important-only issues may return `approved_with_conditions`, but orchestration must route WorkPlan conditions back through work-planner update before batch approval or task decomposition.
 - **Undetermined items review** [MANDATORY]: Every TBD, unknown, or open item MUST include: (1) **owner** — who resolves it, (2) **due** — when it gets resolved (which phase or milestone), (3) **next-phase handling** — how the next phase treats this gap. Missing any of these three → `important` issue
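
The verdict mapping in the WorkPlan semantic gate can be summarized as a small decision function. This is a sketch under stated assumptions: `WorkPlanIssue`, `causedByMissingSource`, and `mapWorkPlanVerdict` are hypothetical names. The diff says important-only issues "may" return `approved_with_conditions`; the sketch always does so. Returning `approved` for recommended-only findings, and letting `rejected` take precedence when critical issues of both kinds are present, are also assumptions.

```typescript
type Severity = "critical" | "important" | "recommended";

type WorkPlanVerdict = "approved" | "approved_with_conditions" | "needs_revision" | "rejected";

interface WorkPlanIssue {
  severity: Severity;
  // Hypothetical flag: the gap traces to missing or contradictory source documents or inputs.
  causedByMissingSource?: boolean;
}

function mapWorkPlanVerdict(issues: WorkPlanIssue[]): WorkPlanVerdict {
  const critical = issues.filter((issue) => issue.severity === "critical");

  // Re-planning cannot invent a missing source requirement, so this case is rejected.
  if (critical.some((issue) => issue.causedByMissingSource === true)) return "rejected";

  // Any other critical issue is fixable by re-planning.
  if (critical.length > 0) return "needs_revision";

  // Important-only issues: conditions still go back through work-planner before approval.
  if (issues.some((issue) => issue.severity === "important")) return "approved_with_conditions";

  return "approved";
}
```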
 **Perspective-specific Mode**: