npm - codex-workflows - Versions diffs - 0.6.5 → 0.6.7 - Mend

codex-workflows 0.6.5 → 0.6.7

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (23)

package/.agents/skills/documentation-criteria/references/plan-template.md +27 -0
package/.agents/skills/documentation-criteria/references/task-template.md +12 -0
package/.agents/skills/recipe-build/SKILL.md +20 -6
package/.agents/skills/recipe-front-build/SKILL.md +20 -6
package/.agents/skills/recipe-front-plan/SKILL.md +12 -0
package/.agents/skills/recipe-front-review/SKILL.md +2 -1
package/.agents/skills/recipe-fullstack-build/SKILL.md +20 -6
package/.agents/skills/recipe-implement/SKILL.md +1 -1
package/.agents/skills/recipe-plan/SKILL.md +14 -1
package/.agents/skills/recipe-review/SKILL.md +2 -1
package/.agents/skills/subagents-orchestration-guide/SKILL.md +14 -3
package/.agents/skills/subagents-orchestration-guide/references/monorepo-flow.md +12 -4
package/.codex/agents/acceptance-test-generator.toml +25 -5
package/.codex/agents/code-reviewer.toml +5 -1
package/.codex/agents/document-reviewer.toml +18 -1
package/.codex/agents/integration-test-reviewer.toml +14 -2
package/.codex/agents/task-decomposer.toml +16 -0
package/.codex/agents/task-executor-frontend.toml +12 -15
package/.codex/agents/task-executor.toml +12 -15
package/.codex/agents/technical-designer-frontend.toml +10 -81
package/.codex/agents/technical-designer.toml +10 -94
package/.codex/agents/work-planner.toml +28 -1
package/package.json +1 -1

package/.agents/skills/documentation-criteria/references/plan-template.md CHANGED

@@ -5,8 +5,16 @@ Type: feature|fix|refactor
 Estimated Duration: X days
 Estimated Impact: X files
 Related Issue/PR: #XXX (if any)
+Review Scope: [planned-files scope derived from Design Doc and task targets; for a revision plan over existing work, base branch + diff range]
 Implementation Readiness: pending
+## WorkPlan Review
+This section records the review gate state for the exact plan content. Set `Status: pending` when the plan is created or materially updated. The orchestrator treats only `Status: approved` with `Conditions: none` as reviewed.
+- **Status**: pending|approved
+- **Conditions**: none
 ## Related Documents
 - Design Doc(s):
   - [docs/design/XXX.md]
@@ -35,6 +43,10 @@ Repeat this block for each Design Doc when multiple Design Docs exist. Preserve
 - **Success criteria**: [extracted from Design Doc]
 - **Failure response**: [extracted from Design Doc]
+### Proof Strategy
+- **Proof obligation source**: [test skeleton annotations (`Primary failure mode`, `Proof obligation`) when skeletons exist; otherwise each acceptance criterion's primary failure mode derived from the Design Doc]
+- **Per-task propagation**: every task that implements or verifies a claim records the AC ID or claim identifier in Proof Obligations (see task template) so downstream review can judge whether tests prove the claim, not merely run
 ## Quality Assurance Mechanisms (from Design Docs)
 Adopted quality gates for the change area. Each task in this plan must satisfy the applicable mechanisms.
@@ -69,6 +81,21 @@ Map each Design Doc technical requirement to the task or phase that covers it. U
 - Merge duplicate restatements of the same obligation from multiple DD sections into one row and cite the primary section in `DD Section`
 - Keep `scope-boundary` rows concrete: name the protected file group, component boundary, contract, or workflow that must remain unchanged
+## Failure Mode Checklist
+Domain-independent failure categories this implementation must guard against. Enumerate all eight categories, mark which apply, and list a covering task for each that applies; keep category names generic and place project-specific detail in task descriptions or notes.
+| Category | Applies? | Covered By Task(s) |
+|----------|----------|--------------------|
+| same-value | yes/no | [P1-T1] |
+| no-op | yes/no | |
+| empty input | yes/no | |
+| invalid option | yes/no | |
+| missing config | yes/no | |
+| unavailable boundary | yes/no | |
+| shared-state dependency | yes/no | |
+| rollback-only visibility | yes/no | |
 ## UI Spec Component -> Task Mapping
 Include this section when a UI Spec is among the inputs. Map each UI component section to the task(s) that implement it so task-decomposer can pass the exact UI Spec context to executor tasks. Omit this section when no UI Spec exists.
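
The Failure Mode Checklist added in this hunk requires all eight categories to be enumerated and every applicable category to name a covering task. A minimal sketch of that rule follows, assuming the table is available as Markdown text; the function name `check_failure_mode_table` and the wording of the problem messages are hypothetical and not part of the package.

```python
# The eight category names, copied from the Failure Mode Checklist template.
CATEGORIES = [
    "same-value", "no-op", "empty input", "invalid option",
    "missing config", "unavailable boundary",
    "shared-state dependency", "rollback-only visibility",
]


def check_failure_mode_table(markdown: str) -> list[str]:
    """Return a list of problems found in a Failure Mode Checklist table.

    Hypothetical helper: the package describes the rule in prose only.
    """
    rows = {}
    for line in markdown.splitlines():
        # Split a table row such as "| no-op | yes | [P1-T2] |" into cells.
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) == 3 and cells[0] in CATEGORIES:
            rows[cells[0]] = (cells[1].lower(), cells[2])

    problems = []
    for name in CATEGORIES:
        if name not in rows:
            problems.append(f"missing category: {name}")
            continue
        applies, tasks = rows[name]
        if applies not in ("yes", "no"):
            # An unfilled "yes/no" placeholder lands here.
            problems.append(f"{name}: Applies? must be yes or no")
        elif applies == "yes" and not tasks:
            problems.append(f"{name}: applies but no covering task listed")
    return problems
```

An empty result means every category is present and every applicable one is covered; the unfilled template itself would be reported because its `yes/no` placeholders are not yet decided.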

package/.agents/skills/documentation-criteria/references/task-template.md CHANGED

@@ -58,9 +58,21 @@ Brief observations recorded after reading Investigation Targets:
 - **Failure response**: [What to do if verification fails]
 - **Verification level**: [L1 unit/local verification, L2 integration verification, or L3 end-to-end verification]
+## Proof Obligations
+(Include one entry per acceptance criterion, user journey, boundary, or state transition this task implements or verifies. Derive from test skeleton annotations when present; otherwise derive from the acceptance criterion's primary failure mode.)
+- **AC / Claim ID**: [AC-XXX, user journey identifier, boundary identifier, or task claim identifier]
+- **Claim**: [behavior the acceptance criterion or task promises]
+- **Primary failure mode**: [regression the test should turn red on]
+- **Boundary to exercise**: [public/integration/browser/process/service/persistence boundary, or "in-process unit"]
+- **State assertion**: [observable state before -> action -> after for state-changing claims; "N/A" otherwise]
+- **Mock boundary rationale**: [which external boundaries may be mocked and why; "none" when all real]
+- **Residual**: [what this task-level proof leaves unestablished, and which later task or phase closes it]
 ## Completion Criteria
+- [ ] All listed AC / Claim IDs are implemented or verified by this task
 - [ ] All added tests pass
 - [ ] Operation verified per Operation Verification Methods above
+- [ ] Each Proof Obligation is met: the test turns red under its primary failure mode and exercises the stated boundary
 - [ ] Deliverables created (for research/design tasks)
 - [ ] When Binding Decisions exist, every Compliance Check evaluates to `Y` against the final implementation, with evidence recorded in Investigation Notes
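
The Proof Obligations entry added to the task template has seven fields. One way to picture an entry is as a small record with a completeness check. This is only a sketch: the class name, the field names, and the rule that an empty value or a leftover `[placeholder]` counts as unfilled are assumptions for illustration, not something the package defines.

```python
from dataclasses import dataclass, fields


@dataclass
class ProofObligation:
    """One Proof Obligations entry; field names are hypothetical."""
    claim_id: str                 # AC / Claim ID
    claim: str                    # Claim
    primary_failure_mode: str     # Primary failure mode
    boundary: str                 # Boundary to exercise, or "in-process unit"
    state_assertion: str          # "N/A" when the claim does not change state
    mock_boundary_rationale: str  # "none" when all boundaries are real
    residual: str                 # Residual


def unfilled_fields(ob: ProofObligation) -> list[str]:
    """Names of fields left empty or still holding a [placeholder]."""
    missing = []
    for f in fields(ob):
        value = getattr(ob, f.name).strip()
        if not value or (value.startswith("[") and value.endswith("]")):
            missing.append(f.name)
    return missing
```

A task author or reviewer could run such a check before treating the "Each Proof Obligation is met" completion criterion as assessable.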

package/.agents/skills/recipe-build/SKILL.md CHANGED

@@ -55,14 +55,28 @@ Analyze task file existence state and determine the action required:
 | State | Criteria | Next Action |
 |-------|----------|-------------|
 | Tasks exist | Consumed Task Set is non-empty | User's execution instruction serves as batch approval -> Enter autonomous execution immediately |
-| No tasks + plan exists | Consumed Task Set is empty but plan exists | Confirm with user -> spawn task-decomposer |
+| No tasks + plan exists + reviewed plan | Consumed Task Set is empty and WorkPlan Review records `Status: approved`, `Conditions: none` | Confirm with user -> spawn task-decomposer |
+| No tasks + small simplified plan | Consumed Task Set is empty, plan exists, and the plan references no Design Doc | Confirm with user -> spawn task-decomposer |
+| No tasks + plan exists + unreviewed plan | Consumed Task Set is empty, the plan references a Design Doc, and WorkPlan Review is absent, pending, conditional, or not approved | Run work plan review, then confirm with user -> spawn task-decomposer |
 | Neither exists | No plan or task files | Error: Prerequisites not met |
 ## Task Decomposition Phase (Conditional)
-When task files don't exist:
+When task files don't exist, the plan references a Design Doc, and the WorkPlan Review section is absent, pending, conditional, or not approved:
-### 1. User Confirmation
+### 1. Work Plan Review
+Spawn document-reviewer agent: "Review the work plan before task decomposition. doc_type: WorkPlan. target: docs/plans/[plan-name].md. mode: composite. Review semantic traceability to the Design Doc, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
+Branch on `verdict.decision`:
+- `approved` -> spawn work-planner in update mode once to record `Status: approved` and `Conditions: none` in WorkPlan Review, then continue to user confirmation
+- `approved_with_conditions` -> stop before task decomposition and report that the work plan needs update via recipe-plan
+- `needs_revision` -> stop before task decomposition and report that the work plan needs update via recipe-plan
+- `rejected` -> stop before task decomposition and present the blocking findings to the user
+When task files don't exist and the WorkPlan Review section records `Status: approved` and `Conditions: none`, skip Work Plan Review and continue to user confirmation.
+### 2. User Confirmation
 ```
 No task files found.
 Work plan: docs/plans/[plan-name].md
@@ -70,10 +84,10 @@ Work plan: docs/plans/[plan-name].md
 Generate tasks from the work plan? (y/n):
 ```
-### 2. Task Decomposition (if approved)
+### 3. Task Decomposition (if approved)
 Spawn task-decomposer agent: "Read work plan at docs/plans/[plan-name].md and decompose into atomic tasks. Output: Individual task files in docs/plans/tasks/. Granularity: 1 task = 1 commit = independently executable."
-### 3. Verify Generation
+### 4. Verify Generation
 Recompute the Consumed Task Set and verify it is non-empty.
 ## Pre-execution Checklist
@@ -123,7 +137,7 @@ VERIFY approval status before proceeding. Once confirmed, INITIATE autonomous ex
 ## Post-Implementation Verification (After All Tasks Complete)
 After all task cycles finish, collect all `filesModified` from every task-executor response (deduplicated), then run both verification agents before the completion report:
-1. Spawn code-verifier agent: "Verify implementation consistency against the Design Doc. `doc_type: design-doc`. `document_path`: [path]. `code_paths`: [collected filesModified list]."
+1. Spawn code-verifier agent: "Verify implementation consistency against the Design Doc. `doc_type: design-doc`. `document_path`: [path]. `code_paths`: [collected filesModified list]. Work Plan Review Scope: [Review Scope value from the active work plan, used only to confirm the collected file set is complete]."
 2. Spawn security-reviewer agent: "Design Doc: [path]. Implementation files: [collected filesModified list]. Review security compliance."
 3. Consolidate results:
    - code-verifier passes when `summary.status` is `consistent` or `mostly_consistent`
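
The revised state table in recipe-build reduces to a small decision function. The sketch below assumes the `Status` and `Conditions` bullets appear only inside the WorkPlan Review section and that the caller already knows whether the plan references a Design Doc; `NextAction`, `is_reviewed`, and `next_action` are hypothetical names, since the package expresses this logic as instructions to the orchestrator rather than as code.

```python
import re
from enum import Enum
from typing import Optional


class NextAction(Enum):
    EXECUTE = "enter autonomous execution"
    CONFIRM_THEN_DECOMPOSE = "confirm with user, then spawn task-decomposer"
    REVIEW_THEN_CONFIRM = "run work plan review, then confirm with user"
    ERROR = "error: prerequisites not met"


def is_reviewed(plan_text: str) -> bool:
    """True only for Status: approved together with Conditions: none."""
    status = re.search(r"^- \*\*Status\*\*:\s*(.+)$", plan_text, re.M)
    conditions = re.search(r"^- \*\*Conditions\*\*:\s*(.+)$", plan_text, re.M)
    return bool(
        status and conditions
        and status.group(1).strip() == "approved"
        and conditions.group(1).strip() == "none"
    )


def next_action(has_tasks: bool, plan_text: Optional[str],
                references_design_doc: bool) -> NextAction:
    """Mirror the four-row state table (plus the error row)."""
    if has_tasks:
        return NextAction.EXECUTE
    if plan_text is None:
        return NextAction.ERROR
    # A reviewed plan, or a small simplified plan with no Design Doc,
    # goes straight to user confirmation.
    if is_reviewed(plan_text) or not references_design_doc:
        return NextAction.CONFIRM_THEN_DECOMPOSE
    return NextAction.REVIEW_THEN_CONFIRM
```

An absent, pending, or conditional review section all fall through to the review branch, which matches the "absent, pending, conditional, or not approved" wording in the table.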

package/.agents/skills/recipe-front-build/SKILL.md CHANGED

@@ -55,14 +55,28 @@ Analyze task file existence state and determine the action required:
 | State | Criteria | Next Action |
 |-------|----------|-------------|
 | Tasks exist | Consumed Task Set is non-empty | User's execution instruction serves as batch approval -> Enter autonomous execution immediately |
-| No tasks + plan exists | Consumed Task Set is empty but plan exists | Confirm with user -> spawn task-decomposer |
+| No tasks + plan exists + reviewed plan | Consumed Task Set is empty and WorkPlan Review records `Status: approved`, `Conditions: none` | Confirm with user -> spawn task-decomposer |
+| No tasks + small simplified plan | Consumed Task Set is empty, plan exists, and the plan references no Design Doc | Confirm with user -> spawn task-decomposer |
+| No tasks + plan exists + unreviewed plan | Consumed Task Set is empty, the plan references a Design Doc, and WorkPlan Review is absent, pending, conditional, or not approved | Run work plan review, then confirm with user -> spawn task-decomposer |
 | Neither exists | No plan or task files | Error: Prerequisites not met |
 ## Task Decomposition Phase (Conditional)
-When task files don't exist:
+When task files don't exist, the plan references a Design Doc, and the WorkPlan Review section is absent, pending, conditional, or not approved:
-### 1. User Confirmation
+### 1. Work Plan Review
+Spawn document-reviewer agent: "Review the frontend work plan before task decomposition. doc_type: WorkPlan. target: docs/plans/[plan-name].md. mode: composite. Review semantic traceability to the Design Doc and UI Spec, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
+Branch on `verdict.decision`:
+- `approved` -> spawn work-planner in update mode once to record `Status: approved` and `Conditions: none` in WorkPlan Review, then continue to user confirmation
+- `approved_with_conditions` -> stop before task decomposition and report that the work plan needs update via recipe-front-plan
+- `needs_revision` -> stop before task decomposition and report that the work plan needs update via recipe-front-plan
+- `rejected` -> stop before task decomposition and present the blocking findings to the user
+When task files don't exist and the WorkPlan Review section records `Status: approved` and `Conditions: none`, skip Work Plan Review and continue to user confirmation.
+### 2. User Confirmation
 ```
 No task files found.
 Work plan: docs/plans/[plan-name].md
@@ -70,10 +84,10 @@ Work plan: docs/plans/[plan-name].md
 Generate tasks from the work plan? (y/n):
 ```
-### 2. Task Decomposition (if approved)
+### 3. Task Decomposition (if approved)
 Spawn task-decomposer agent: "Read work plan at docs/plans/[plan-name].md and decompose into atomic tasks. Output: Individual task files in docs/plans/tasks/. Granularity: 1 task = 1 commit = independently executable"
-### 3. Verify Generation
+### 4. Verify Generation
 Recompute the Consumed Task Set and verify it is non-empty.
 ## Pre-execution Checklist
@@ -131,7 +145,7 @@ VERIFY approval status before proceeding. Once confirmed, INITIATE autonomous ex
 ## Post-Implementation Verification (After All Tasks Complete)
 After all task cycles finish, collect all `filesModified` from every task-executor-frontend response (deduplicated), then run both verification agents before the completion report:
-1. Spawn code-verifier agent: "Verify implementation consistency against the Design Doc. `doc_type: design-doc`. `document_path`: [path]. `code_paths`: [collected filesModified list]."
+1. Spawn code-verifier agent: "Verify implementation consistency against the Design Doc. `doc_type: design-doc`. `document_path`: [path]. `code_paths`: [collected filesModified list]. Work Plan Review Scope: [Review Scope value from the active work plan, used only to confirm the collected file set is complete]."
 2. Spawn security-reviewer agent: "Design Doc: [path]. Implementation files: [collected filesModified list]. Review security compliance."
 3. Consolidate results:
    - code-verifier passes when `summary.status` is `consistent` or `mostly_consistent`

package/.agents/skills/recipe-front-plan/SKILL.md CHANGED

@@ -20,6 +20,7 @@ description: "Create frontend work plan from design document with test skeleton
 **Execution Method**:
 - Test skeleton generation -> performed by acceptance-test-generator
 - Work plan creation -> performed by work-planner
+- Work plan review -> performed by document-reviewer
 Orchestrator spawns agents and passes structured data between them.
@@ -29,6 +30,7 @@ Orchestrator spawns agents and passes structured data between them.
 - Design document selection
 - Test skeleton generation with acceptance-test-generator
 - Work plan creation with work-planner
+- Work plan review with document-reviewer
 - Plan approval obtainment
 **Responsibility Boundary**: This skill completes with work plan approval.
@@ -50,6 +52,15 @@ Spawn acceptance-test-generator agent: "Generate test skeletons from Design Doc
 ### Step 3: Work Plan Creation
 Spawn work-planner agent: "Create work plan from Design Doc at [path]. Integration test file: [path from step 2]. fixture-e2e test file: [path from step 2 or null]. service-integration-e2e test file: [path from step 2 or null]. E2E absence reasons by lane: [values from step 2 when an E2E lane is null]. Integration tests are created with each phase implementation, fixture-e2e runs alongside UI implementation, service-integration-e2e runs only in the final phase when a service E2E file exists. Include `Implementation Readiness: pending` in the work plan header."
+### Step 4: Work Plan Review
+Spawn document-reviewer agent: "Review the frontend work plan. doc_type: WorkPlan. target: docs/plans/[plan-name].md. mode: composite. Review semantic traceability to the Design Doc and UI Spec, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
+Branch on `verdict.decision`:
+- `approved` -> spawn work-planner in update mode once to record `Status: approved` and `Conditions: none` in WorkPlan Review, then proceed to Step 5
+- `approved_with_conditions` or `needs_revision` -> spawn work-planner in update mode with the findings or conditions, then repeat Step 4. Use max 2 revision iterations as defined by the `needs_revision` row in subagents-orchestration-guide Approval Status Vocabulary.
+- `rejected` -> stop and present the blocking findings to the user.
+### Step 5: Plan Approval
 **[STOP -- BLOCKING]** Interact with user to complete plan and obtain approval for plan content. Clarify specific implementation steps and risks.
 **CANNOT proceed until user explicitly approves the work plan.**
@@ -60,6 +71,7 @@ ENFORCEMENT: Plan content MUST be approved before declaring completion. Unapprov
 - [ ] Design document selected
 - [ ] Test skeletons generated
 - [ ] Work plan created
+- [ ] Work plan reviewed via document-reviewer
 - [ ] User approved plan content
 ## Output Example

package/.agents/skills/recipe-front-review/SKILL.md CHANGED

@@ -31,12 +31,13 @@ Design Doc (uses most recent if omitted): $ARGUMENTS
 ### 1. Prerequisite Check
 Identify the Design Doc in docs/design/ and check implementation files changed from the default branch (detect via `git symbolic-ref refs/remotes/origin/HEAD` or fall back to current branch diff).
+If a single active work plan is explicitly provided or unambiguously resolved for that Design Doc, read its `Review Scope` line. Otherwise set `Work Plan: none` and `Review Scope: none`; do not infer.
 **[STOP -- BLOCKING]** If no Design Doc or implementation files found, notify user and halt.
 **CANNOT proceed without both a Design Doc and implementation files.**
 ### 2. Execute code-reviewer
-Spawn code-reviewer agent: "Validate Design Doc compliance for [design-doc-path]. Implementation files: [git diff file list]. Review mode: full. Return structured JSON report per your Output Format specification."
+Spawn code-reviewer agent: "Validate Design Doc compliance for [design-doc-path]. Work Plan: [resolved work plan path or none]. Review Scope: [literal Review Scope value or none]. Implementation files: [git diff file list]. Review mode: full. Return structured JSON report per your Output Format specification."
 **Store output as**: `$STEP_2_OUTPUT`

package/.agents/skills/recipe-fullstack-build/SKILL.md CHANGED

@@ -65,14 +65,28 @@ Analyze task file existence state and determine the action required:
 | State | Criteria | Next Action |
 |-------|----------|-------------|
 | Tasks exist | Consumed Task Set is non-empty | User's execution instruction serves as batch approval -> Enter autonomous execution immediately |
-| No tasks + plan exists | Consumed Task Set is empty but plan exists | Confirm with user -> spawn task-decomposer |
+| No tasks + plan exists + reviewed plan | Consumed Task Set is empty and WorkPlan Review records `Status: approved`, `Conditions: none` | Confirm with user -> spawn task-decomposer |
+| No tasks + small simplified plan | Consumed Task Set is empty, plan exists, and the plan references no Design Doc | Confirm with user -> spawn task-decomposer |
+| No tasks + plan exists + unreviewed plan | Consumed Task Set is empty, the plan references a Design Doc, and WorkPlan Review is absent, pending, conditional, or not approved | Run work plan review, then confirm with user -> spawn task-decomposer |
 | Neither exists | No plan or task files | Error: Prerequisites not met |
 ## Task Decomposition Phase (Conditional)
-When task files don't exist:
+When task files don't exist, the plan references a Design Doc, and the WorkPlan Review section is absent, pending, conditional, or not approved:
-### 1. User Confirmation
+### 1. Work Plan Review
+Spawn document-reviewer agent: "Review the fullstack work plan before task decomposition. doc_type: WorkPlan. target: docs/plans/[plan-name].md. mode: composite. Review semantic traceability to all Design Docs, UI Spec when present, cross-layer boundary coverage, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
+Branch on `verdict.decision`:
+- `approved` -> spawn work-planner in update mode once to record `Status: approved` and `Conditions: none` in WorkPlan Review, then continue to user confirmation
+- `approved_with_conditions` -> stop before task decomposition and report that the work plan needs update via recipe-plan or the fullstack planning flow
+- `needs_revision` -> stop before task decomposition and report that the work plan needs update via recipe-plan or the fullstack planning flow
+- `rejected` -> stop before task decomposition and present the blocking findings to the user
+When task files don't exist and the WorkPlan Review section records `Status: approved` and `Conditions: none`, skip Work Plan Review and continue to user confirmation.
+### 2. User Confirmation
 ```
 No task files found.
 Work plan: docs/plans/[plan-name].md
@@ -80,10 +94,10 @@ Work plan: docs/plans/[plan-name].md
 Generate tasks from the work plan? (y/n):
 ```
-### 2. Task Decomposition (if approved)
+### 3. Task Decomposition (if approved)
 Spawn task-decomposer agent: "Read work plan at docs/plans/[plan-name].md and decompose into atomic tasks. Output: Individual task files in docs/plans/tasks/. Granularity: 1 task = 1 commit = independently executable. Use layer-aware naming: {plan}-backend-task-{n}.md, {plan}-frontend-task-{n}.md based on target file paths."
-### 3. Verify Generation
+### 4. Verify Generation
 Recompute the Consumed Task Set and verify it is non-empty.
 ## Pre-execution Checklist
@@ -141,7 +155,7 @@ VERIFY approval status before proceeding. Once confirmed, INITIATE autonomous ex
 ## Post-Implementation Verification (After All Tasks Complete)
 After all task cycles finish, collect all `filesModified` from every task-executor/task-executor-frontend response (deduplicated), then run both verification agents before the completion report:
-1. Spawn code-verifier once per Design Doc: "Verify implementation consistency against the Design Doc. `doc_type: design-doc`. `document_path`: [single design doc path]. `code_paths`: [collected filesModified list]."
+1. Spawn code-verifier once per Design Doc: "Verify implementation consistency against the Design Doc. `doc_type: design-doc`. `document_path`: [single design doc path]. `code_paths`: [collected filesModified list]. Work Plan Review Scope: [Review Scope value from the active work plan, used only to confirm the collected file set is complete]."
 2. Spawn security-reviewer agent: "Design Doc: [path(s)]. Implementation files: [collected filesModified list]. Review security compliance."
 3. Consolidate results:
    - each code-verifier run passes when `summary.status` is `consistent` or `mostly_consistent`

package/.agents/skills/recipe-implement/SKILL.md CHANGED

@@ -69,7 +69,7 @@ Follow subagents-orchestration-guide skill Large/Medium/Small scale flow exactly
 **STEP 3**: Spawn technical-designer-frontend agent → spawn document-reviewer agent → spawn design-sync agent.
 **[STOP — BLOCKING]** Present Frontend Design Doc for user approval. **CANNOT proceed until user explicitly confirms.**
-**STEP 4**: Spawn acceptance-test-generator agent → spawn work-planner agent.
+**STEP 4**: Spawn acceptance-test-generator agent → spawn work-planner agent → spawn document-reviewer agent with `doc_type: WorkPlan`.
 **[STOP — BLOCKING]** Present Work Plan for user approval. **CANNOT proceed until user explicitly confirms.**
 **STEP 5**: Run implementation readiness preflight.

package/.agents/skills/recipe-plan/SKILL.md CHANGED

@@ -33,6 +33,7 @@ ENFORCEMENT: Work-planner spawned without test skeleton data (when tests were re
 - Design document selection
 - Test skeleton generation with acceptance-test-generator
 - Work plan creation with work-planner
+- Work plan review with document-reviewer
 - Plan approval obtainment
 **Responsibility Boundary**: This skill completes with work plan approval.
@@ -53,7 +54,18 @@ Present options if multiple exist (can be specified with $ARGUMENTS).
 ### Step 3: Work Plan Creation
 - Spawn work-planner agent: "Create work plan from design document at [design-doc-path]. Include deliverables from previous process according to subagents-orchestration-guide skill coordination specification. If `generatedFiles.fixtureE2e` or `generatedFiles.serviceE2e` is null, use the corresponding `e2eAbsenceReason` and accept the null E2E lane as a valid planning input. Include `Implementation Readiness: pending` in the work plan header."
-- Interact with user to complete plan and obtain approval for plan content
+### Step 4: Work Plan Review
+Spawn document-reviewer agent: "Review the work plan. doc_type: WorkPlan. target: docs/plans/[plan-name].md. mode: composite. Review semantic traceability to the Design Doc, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
+Branch on `verdict.decision`:
+- `approved` -> spawn work-planner in update mode once to record `Status: approved` and `Conditions: none` in WorkPlan Review, then proceed to Step 5
+- `approved_with_conditions` or `needs_revision` -> spawn work-planner in update mode with the findings or conditions, then repeat Step 4. Use max 2 revision iterations as defined by the `needs_revision` row in subagents-orchestration-guide Approval Status Vocabulary.
+- `rejected` -> stop and present the blocking findings to the user.
+### Step 5: Plan Approval
+- Present the reviewed work plan to the user for batch approval
+- If the user requests changes, spawn work-planner in update mode and re-run Step 4
 - Clarify specific implementation steps and risks
 **Scope**: Up to work plan creation and obtaining approval for plan content.
@@ -63,6 +75,7 @@ Present options if multiple exist (can be specified with $ARGUMENTS).
 - [ ] Design document identified and selected
 - [ ] Integration/E2E test skeleton generation confirmed with user (generated if requested)
 - [ ] Work plan created via work-planner
+- [ ] Work plan reviewed via document-reviewer
 - [ ] Plan content approved by user
 - [ ] All stopping points honored with user confirmation
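
Step 4 of recipe-plan describes a review loop: approve and record, revise and re-review up to two times, or stop on rejection. The sketch below models that loop with callbacks standing in for the spawned agents. The callback names and return strings are hypothetical, and escalating to the user once the two revisions are used up is an assumption, because the Approval Status Vocabulary row that defines that case is not part of this diff.

```python
from typing import Callable

MAX_REVISIONS = 2  # "max 2 revision iterations" in the recipe text


def review_loop(review: Callable[[], str],
                revise: Callable[[], None],
                record_approval: Callable[[], None]) -> str:
    """Run review; revise and re-review at most MAX_REVISIONS times.

    review          -> stands in for document-reviewer, returns verdict.decision
    revise          -> stands in for work-planner in update mode with findings
    record_approval -> stands in for work-planner recording Status: approved
    """
    revisions = 0
    while True:
        decision = review()
        if decision == "approved":
            record_approval()
            return "proceed_to_plan_approval"
        if decision == "rejected":
            return "stop_and_report"
        if decision not in ("approved_with_conditions", "needs_revision"):
            raise ValueError(f"unknown decision: {decision}")
        if revisions == MAX_REVISIONS:
            # Assumption: the excerpt does not say what happens here.
            return "escalate_to_user"
        revise()
        revisions += 1
```

Note that the standalone build recipes in this release do not loop: there, `approved_with_conditions` and `needs_revision` stop before task decomposition and send the user back to the planning recipe.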

package/.agents/skills/recipe-review/SKILL.md CHANGED

@@ -36,9 +36,10 @@ Design Doc (uses most recent if omitted): $ARGUMENTS
 ### Step 1: Prerequisite Check
 Identify Design Doc in docs/design/ and check implementation files via git diff.
+If a single active work plan is explicitly provided or unambiguously resolved for that Design Doc, read its `Review Scope` line. Otherwise set `Work Plan: none` and `Review Scope: none`; do not infer.
 ### Step 2: Execute code-reviewer
-Spawn code-reviewer agent: "Validate Design Doc compliance for the implementation. Design Doc path: [path]. Implementation files: [git diff file list]. Review mode: full. Return structured JSON report per your Output Format specification."
+Spawn code-reviewer agent: "Validate Design Doc compliance for the implementation. Design Doc path: [path]. Work Plan: [resolved work plan path or none]. Review Scope: [literal Review Scope value or none]. Implementation files: [git diff file list]. Review mode: full. Return structured JSON report per your Output Format specification."
 **Store output as**: `$STEP_2_OUTPUT`
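
The new prerequisite rule passes a literal `Review Scope` value to code-reviewer only when exactly one work plan is resolved, and otherwise passes `none` without inferring. A minimal sketch of that rule follows, assuming candidate plans arrive as a mapping from path to file text; `resolve_review_scope` is a hypothetical name, and returning `none` for the scope when the single plan has no `Review Scope` line is an assumption the recipe does not spell out.

```python
import re


def resolve_review_scope(candidate_plans: dict[str, str]) -> tuple[str, str]:
    """Map {plan path: plan text} to (Work Plan, Review Scope) prompt values."""
    if len(candidate_plans) != 1:
        # Zero or several candidates: ambiguous, so do not infer.
        return ("none", "none")
    path, text = next(iter(candidate_plans.items()))
    match = re.search(r"^Review Scope:\s*(.+)$", text, re.M)
    # Pass the value through literally; never rewrite or expand it.
    scope = match.group(1).strip() if match else "none"
    return (path, scope)
```

The two returned strings correspond to the `Work Plan:` and `Review Scope:` slots in the code-reviewer prompt.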

package/.agents/skills/subagents-orchestration-guide/SKILL.md CHANGED

@@ -140,7 +140,7 @@ Autonomous execution MUST stop and wait for user input at these points.
 | UI Spec | After document-reviewer completes UI Spec review (frontend/fullstack) | Approve UI Spec |
 | ADR | After document-reviewer completes ADR review (if ADR created) | Approve ADR |
 | Design | After design-sync completes consistency verification | Approve Design Doc |
-| Work Plan | After work-planner creates plan | Batch approval for implementation phase |
+| Work Plan | After document-reviewer completes WorkPlan review for Medium/Large, or after simplified plan creation for Small | Batch approval for implementation phase |
 **ENFORCEMENT**: After batch approval, autonomous execution proceeds without stops until completion or escalation. Skipping stop points is a CRITICAL VIOLATION.
@@ -164,6 +164,16 @@ Handling rules:
 **ENFORCEMENT**: Using any status value outside this vocabulary is a VIOLATION.
+### WorkPlan Review State [MANDATORY]
+Medium and Large work plans must contain a `WorkPlan Review` section. Small simplified plans are exempt because they have no Design Doc to trace against. The plan is reviewed only when that section records `Status: approved` and `Conditions: none`.
+Handling rules:
+- After WorkPlan review returns `approved`, invoke work-planner in update mode once to record the review section, without changing implementation content.
+- Treat WorkPlan `approved_with_conditions` the same as `needs_revision`: return to work-planner in update mode with the conditions, then re-review. Conditions must not be carried into task decomposition or implementation readiness.
+- A material work plan update resets `WorkPlan Review` to `Status: pending`.
+- Standalone build recipes apply WorkPlan review only before task decomposition, not after task files already exist.
 ## Scale Determination and Document Requirements
 | Scale | File Count | PRD | ADR | Design Doc | Work Plan |
@@ -242,8 +252,8 @@ Always start with `requirement-analyzer`, then follow the minimum flow required
 | Scale | Required flow |
 |-------|---------------|
-| Large | `requirement-analyzer` **[Stop]** -> `prd-creator` -> `document-reviewer` **[Stop]** -> optional `ui-spec-designer` + `document-reviewer` **[Stop]** -> optional ADR + `document-reviewer` **[Stop]** -> `codebase-analyzer` -> `technical-designer*` -> `code-verifier` -> `document-reviewer` -> `design-sync` **[Stop]** -> `acceptance-test-generator` -> `work-planner` **[Stop]** -> `task-decomposer` |
-| Medium | `requirement-analyzer` **[Stop]** -> `codebase-analyzer` -> optional `ui-spec-designer` + `document-reviewer` **[Stop]** -> `technical-designer*` -> `code-verifier` -> `document-reviewer` -> `design-sync` **[Stop]** -> `acceptance-test-generator` -> `work-planner` **[Stop]** -> `task-decomposer` |
+| Large | `requirement-analyzer` **[Stop]** -> `prd-creator` -> `document-reviewer` **[Stop]** -> optional `ui-spec-designer` + `document-reviewer` **[Stop]** -> optional ADR + `document-reviewer` **[Stop]** -> `codebase-analyzer` -> `technical-designer*` -> `code-verifier` -> `document-reviewer` -> `design-sync` **[Stop]** -> `acceptance-test-generator` -> `work-planner` -> `document-reviewer` (doc_type: WorkPlan) **[Stop]** -> `task-decomposer` |
+| Medium | `requirement-analyzer` **[Stop]** -> `codebase-analyzer` -> optional `ui-spec-designer` + `document-reviewer` **[Stop]** -> `technical-designer*` -> `code-verifier` -> `document-reviewer` -> `design-sync` **[Stop]** -> `acceptance-test-generator` -> `work-planner` -> `document-reviewer` (doc_type: WorkPlan) **[Stop]** -> `task-decomposer` |
 | Small | `requirement-analyzer` **[Stop]** -> simplified plan **[Stop: Batch approval]** -> direct implementation |
 Flow rules:
@@ -253,6 +263,7 @@ Flow rules:
 - Pass `codebase-analyzer` output to the designer as `Codebase Analysis`
 - Pass Design Doc path to `code-verifier`, then pass `code_verification` to `document-reviewer`
 - Fullstack layer sequencing is defined in `references/monorepo-flow.md`
+- Run WorkPlan review after every Medium/Large work plan creation or update and before batch approval. On `needs_revision` or WorkPlan `approved_with_conditions`, return to `work-planner` in update mode and re-review for max 2 revision iterations as defined by the `needs_revision` row in Approval Status Vocabulary. On `rejected`, halt and escalate to the user.
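The WorkPlan review rule added above can be read as a small loop. The following is a minimal sketch, not the package's implementation: `run_workplan_review`, the `review` and `revise` callables, and the `"escalate"` outcome label are hypothetical names, and escalating once the two revision iterations are used up is an assumption, because the `needs_revision` row of the Approval Status Vocabulary is not shown in this diff.

```python
from typing import Callable

MAX_REVISION_ITERATIONS = 2  # "max 2 revision iterations" from the flow rule
REVISION_STATUSES = {"needs_revision", "approved_with_conditions"}


def run_workplan_review(
    review: Callable[[str], str],
    revise: Callable[[str, str], str],
    plan: str,
) -> tuple[str, str]:
    """Return (outcome, plan), where outcome is 'approved' or 'escalate'.

    review(plan) stands in for document-reviewer (doc_type: WorkPlan);
    revise(plan, status) stands in for work-planner in update mode.
    """
    revisions = 0
    while True:
        status = review(plan)
        if status == "approved":
            return "approved", plan
        if status == "rejected":
            # On rejected: halt and escalate to the user.
            return "escalate", plan
        if status not in REVISION_STATUSES:
            raise ValueError(f"unknown review status: {status}")
        # approved_with_conditions is treated the same as needs_revision.
        if revisions >= MAX_REVISION_ITERATIONS:
            # Assumption: an exhausted revision budget also escalates.
            return "escalate", plan
        plan = revise(plan, status)
        revisions += 1
```

Every revision is followed by a fresh review, so a plan never reaches task decomposition with conditions still attached.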
 ## Autonomous Execution Mode

package/.agents/skills/subagents-orchestration-guide/references/monorepo-flow.md CHANGED

@@ -10,7 +10,7 @@ This reference defines the orchestration flow for projects spanning multiple lay
 ## Design Phase
-### Large Scale Fullstack (6+ Files) - 15 Steps
+### Large Scale Fullstack (6+ Files) - 16 Steps
 | Step | Agent | Purpose | Output |
 |------|-------|---------|--------|
@@ -28,9 +28,10 @@ This reference defines the orchestration flow for projects spanning multiple lay
 | 12 | document-reviewer x2 | Review each Design Doc with verification evidence | Reviews |
 | 13 | design-sync | Cross-layer consistency verification (source: frontend Design Doc) **[Stop]** | Sync status |
 | 14 | acceptance-test-generator | Integration/E2E test skeleton from cross-layer contracts | Test skeletons |
-| 15 | work-planner | Work plan from all Design Docs **[Stop: Batch approval]** | Work plan |
+| 15 | work-planner | Work plan from all Design Docs | Work plan |
+| 16 | document-reviewer | WorkPlan review **[Stop: Batch approval]** | Approval |
-### Medium Scale Fullstack (3-5 Files) - 13 Steps
+### Medium Scale Fullstack (3-5 Files) - 14 Steps
 | Step | Agent | Purpose | Output |
 |------|-------|---------|--------|
@@ -46,7 +47,8 @@ This reference defines the orchestration flow for projects spanning multiple lay
 | 10 | document-reviewer x2 | Review each Design Doc with verification evidence | Reviews |
 | 11 | design-sync | Cross-layer consistency verification (source: frontend Design Doc) **[Stop]** | Sync status |
 | 12 | acceptance-test-generator | Integration/E2E test skeleton from cross-layer contracts | Test skeletons |
-| 13 | work-planner | Work plan from all Design Docs **[Stop: Batch approval]** | Work plan |
+| 13 | work-planner | Work plan from all Design Docs | Work plan |
+| 14 | document-reviewer | WorkPlan review **[Stop: Batch approval]** | Approval |
 ### Parallelization in Multi-Agent Steps
@@ -101,6 +103,12 @@ Spawn work-planner with all Design Docs:
 work-planner's existing Integration Complete criteria naturally covers cross-layer verification when given multiple Design Docs.
+After work-planner creates or updates the plan, spawn document-reviewer:
+> "Review the fullstack work plan. doc_type: WorkPlan. target: [work plan path]. mode: composite. Review semantic traceability to all Design Docs, UI Spec when present, cross-layer boundary coverage, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
+On `needs_revision` or `approved_with_conditions`, return to work-planner in update mode and re-review for max 2 revision iterations as defined by the `needs_revision` row in Approval Status Vocabulary. On `rejected`, halt and escalate to the user. Stop for batch approval only after WorkPlan review returns `approved` and the plan's `WorkPlan Review` section records `Status: approved` with `Conditions: none`.
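The batch-approval gate in the paragraph above has two conditions: the review must return `approved`, and the plan itself must record `Status: approved` with `Conditions: none`. The sketch below assumes the plan is markdown with a `WorkPlan Review` heading followed by plain `Status:` and `Conditions:` lines. That layout and the function name `ready_for_batch_approval` are assumptions for illustration, since the template's exact format is not shown here.

```python
import re


def ready_for_batch_approval(review_status: str, plan_text: str) -> bool:
    """True only when the review returned 'approved' AND the plan records it."""
    if review_status != "approved":
        return False

    # Locate the "WorkPlan Review" section (assumed to be a markdown heading)
    # and capture everything up to the next heading or the end of the file.
    section_match = re.search(
        r"^#+[ \t]*WorkPlan Review[ \t]*$(.*?)(?=^#+\s|\Z)",
        plan_text,
        flags=re.M | re.S,
    )
    if section_match is None:
        return False
    section = section_match.group(1)

    status = re.search(r"^[-* \t]*Status:[ \t]*(\S+)", section, flags=re.M)
    conditions = re.search(r"^[-* \t]*Conditions:[ \t]*(.+?)[ \t]*$", section, flags=re.M)
    if status is None or conditions is None:
        return False

    return status.group(1) == "approved" and conditions.group(1).lower() == "none"
```

Checking both the reviewer output and the recorded section guards against a plan that was materially updated after its last review, which resets the recorded status to `pending`.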
 ## Task Decomposition Phase
 task-decomposer follows standard decomposition from the work plan. The key addition is the **layer-aware naming convention**:

package/.codex/agents/acceptance-test-generator.toml CHANGED

@@ -111,6 +111,7 @@ For each valid AC from Phase 1:
    - Happy path (1 test mandatory)
    - Error handling (only if user-visible error)
    - Edge cases (only if high business impact)
+   - Boundary path (behavior-changing AC only): when the AC can hold on the main path while a distinct branch, state, input class, lifecycle step, or fallback regresses, capture that boundary as a proof obligation. Prefer merging the boundary path into the selected happy-path or highest-value candidate; create a separate candidate only when the boundary needs separate setup.
 2. **Classify test level**:
    - Integration test candidate (feature-level interaction)
@@ -167,7 +168,8 @@ Value score and E2E selection rules are defined in **integration-e2e-testing ski
 4. Reserve 1 service-integration-e2e slot only when the journey needs real cross-service verification
 5. Fill remaining fixture-e2e budget with candidates that satisfy `Value Score >= 20`
 6. Fill remaining service-integration-e2e budget with candidates that satisfy `Value Score > 50`
-7. If a lane emits no tests, return its generated file as `null` with a concrete lane-specific absence reason
+7. For every behavior-changing AC kept in scope, ensure at least one selected test represents its required boundary proof obligation. Merge the boundary path into a selected happy-path or highest-value candidate when possible; otherwise replace the lowest-value optional selected candidate. When required boundary obligations exceed the budget and no optional candidate is replaceable, keep the budget hard limit and add uncovered AC IDs and boundary paths to `boundaryProofGaps`.
+8. If a lane emits no tests, return its generated file as `null` with a concrete lane-specific absence reason
 ```
 **Output**: Final test set
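The new step 7 can be sketched as a pass over the already selected candidates. This is an illustrative reading rather than the agent's actual logic. `Candidate`, `Obligation`, `apply_boundary_obligations`, and the `optional` and `needs_separate_setup` flags are hypothetical names. Appending a new candidate while budget remains is an assumption, as is treating a candidate that already carries a merged boundary path as not replaceable.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    name: str
    ac_id: str
    value_score: int
    optional: bool = True  # hypothetical flag: False for mandatory tests
    boundary_paths: set[str] = field(default_factory=set)


@dataclass
class Obligation:
    ac_id: str
    boundary_path: str
    needs_separate_setup: bool = False


def apply_boundary_obligations(selected, obligations, budget):
    """Return (final selection, boundaryProofGaps) without exceeding budget."""
    selected = list(selected)
    gaps = []
    for ob in obligations:
        hosts = [c for c in selected if c.ac_id == ob.ac_id]
        if any(ob.boundary_path in c.boundary_paths for c in hosts):
            continue  # already proven by a selected test
        if hosts and not ob.needs_separate_setup:
            # Prefer merging into the highest-value selected candidate.
            max(hosts, key=lambda c: c.value_score).boundary_paths.add(ob.boundary_path)
            continue
        new = Candidate(f"{ob.ac_id}-boundary", ob.ac_id, 0, optional=False,
                        boundary_paths={ob.boundary_path})
        if len(selected) < budget:
            selected.append(new)  # assumption: spare budget is used first
            continue
        replaceable = [c for c in selected if c.optional and not c.boundary_paths]
        if replaceable:
            victim = min(replaceable, key=lambda c: c.value_score)
            selected[selected.index(victim)] = new
            continue
        # Budget is a hard limit: record the gap instead of adding a test.
        gaps.append({
            "acId": ob.ac_id,
            "boundaryPath": ob.boundary_path,
            "reason": "budget_insufficient_for_boundary_proof",
        })
    return selected, gaps
```

The gap entries use the same three keys as the `boundaryProofGaps` field that this diff adds to the generator's response.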
@@ -192,6 +194,8 @@ Adapt comment syntax to the project's language when generating annotations.
   // @dependency: PaymentService, OrderRepository, Database
   // @real-dependency: OrderRepository, Database
   // @complexity: high
+  // Primary failure mode: payment succeeds but the order row is absent or unpersisted
+  // Proof obligation: assert order persistence after successful payment while keeping OrderRepository and Database real; only the external payment gateway may be mocked
   [Test: 'AC1: Successful payment creates persisted order with correct status']
   // AC1-error: "Payment failure shows user-friendly error message"
@@ -200,6 +204,8 @@ Adapt comment syntax to the project's language when generating annotations.
   // @category: core-functionality
   // @dependency: PaymentService, ErrorHandler
   // @complexity: medium
+  // Primary failure mode: payment failure still creates an order or hides the user-facing error
+  // Proof obligation: assert the visible error and the unchanged order state after a failed payment; mock only the external payment gateway failure
   [Test: 'AC1: Failed payment displays error without creating order']
 ```
@@ -221,6 +227,8 @@ Adapt comment syntax to the project's language when generating annotations.
   // @lane: fixture-e2e
   // @dependency: full-ui (mocked backend)
   // @complexity: medium
+  // Primary failure mode: undo banner appears but the dismissed card is not restored
+  // Proof obligation: assert browser-visible state before dismissal, after dismissal, and after undo using fixture-controlled backend state
   [Test: 'User Journey: Dismiss and undo restores the card']
 ```
@@ -242,6 +250,8 @@ Adapt comment syntax to the project's language when generating annotations.
   // @lane: service-integration-e2e
   // @dependency: full-system
   // @complexity: high
+  // Primary failure mode: checkout appears successful but the persisted order or confirmation event is missing
+  // Proof obligation: exercise the full local service stack and assert persisted order state plus confirmation event after checkout
   [Test: 'User Journey: Complete product purchase persists order and emits confirmation']
 ```
@@ -264,7 +274,8 @@ Adapt comment syntax to the project's language when generating annotations.
   "e2eAbsenceReason": {
     "fixtureE2e": "all_e2e_candidates_below_threshold",
     "serviceE2e": "no_real_service_dependency"
-  }
+  },
+  "boundaryProofGaps": []
 }
 ```
@@ -285,7 +296,14 @@ Adapt comment syntax to the project's language when generating annotations.
   "e2eAbsenceReason": {
     "fixtureE2e": null,
     "serviceE2e": null
-  }
+  },
+  "boundaryProofGaps": [
+    {
+      "acId": "[AC-XXX]",
+      "boundaryPath": "[branch/state/input/lifecycle/fallback/visibility path]",
+      "reason": "budget_insufficient_for_boundary_proof"
+    }
+  ]
 }
 ```
@@ -297,13 +315,15 @@ Each test case MUST have the following standard annotations for test implementat
 - **@lane**: integration | fixture-e2e | service-integration-e2e
 - **@dependency**: none | [component names] | full-ui (mocked backend) | full-system
 - **@complexity**: low | medium | high
+- **Primary failure mode**: the specific regression that should make the implemented test fail
+- **Proof obligation**: what the implemented test must assert to prove the claim, including the boundary to exercise, before/action/after state for state-changing claims, and which boundaries may be mocked with rationale. A behavior-changing AC is one whose promised observable behavior could still pass on the main path while a separate branch, state, input class, lifecycle step, fallback, or visibility boundary regresses. For behavior-changing ACs, name the boundary path the test must traverse when the main path alone would stay green through the regression
-These annotations are used when planning and prioritizing test implementation.
+These annotations are used when planning and prioritizing test implementation. Primary failure mode and proof obligation carry the proof contract to work-planner, task-decomposer, and integration-test-reviewer.
 ## Constraints and Quality Standards
 **Mandatory Compliance**:
-- Output test skeletons only: verification points, expected results, and pass criteria
+- Output test skeletons only: verification points, expected results, pass criteria, primary failure mode, and proof obligation
 - Downstream consumers treat these skeletons as design artifacts rather than runnable tests
 - Clearly state verification points, expected results, and pass criteria for each test
 - Preserve original AC statements in comments (ensure traceability)

package/.codex/agents/code-reviewer.toml CHANGED

@@ -53,7 +53,7 @@ Skill Status:
 ## Input Parameters
 - **designDoc**: Path to the Design Doc (or multiple paths for fullstack features)
-- **implementationFiles**: List of files to review (or git diff range)
+- **implementationFiles**: List of files to review (or git diff range). When a Work Plan is provided and implementationFiles is omitted or ambiguous, derive the review file set from the plan's `Review Scope` value; for revision plans, use the recorded base branch plus diff range.
 - **reviewMode**: `full` (default) | `acceptance` | `architecture`
 ## Workflow
@@ -75,6 +75,9 @@ For each acceptance criterion extracted in Step 1:
 - Determine status: fulfilled / partially fulfilled / unfulfilled
 - Record the file path and relevant code location
 - Note any deviations from the Design Doc specification
+- For behavior-changing ACs, confirm the evidence covers main and boundary paths. Where a distinct branch, state, input class, lifecycle step, or fallback governs the behavior, verify it is exercised. Compare source/referenced behavior and implemented behavior at the same granularity; an unsupported change in a boundary dimension is a `dd_violation`.
+- Confirm the implementation keeps the core mechanism the AC, Design Doc, or referenced materials require. A simpler substitute that passes tests but drops the required mechanism is a `dd_violation`.
+- For changes to persisted, shared, or externally observable state, identify the publication boundary where the new state becomes observable to another process, component, user, or later step. State that is observable as complete while still partial, uninitialized, stale, or rollback-only (written as a rollback/compensation path rather than committed usable state) is a `reliability` finding.
 #### 2-2. Identifier Verification
 For each identifier specification extracted in Step 1:
@@ -124,6 +127,7 @@ Read error paths and boundary handling directly in the code:
   - Meaningful coverage: at least one assertion exercises the AC's observable behavior
   - Coverage gap: `skip`/`xit` on tests that should run, TODO/placeholder-only bodies, always-true assertions (for example `expect(true).toBe(true)` or `expect(arr.length).toBeGreaterThanOrEqual(0)`), 0-match runner reports, or grep-only matches without behavior verification
   - Intentional absence: meaningful when absence is the AC expectation
+  - Proof adequacy: a covered test should fail under the AC's primary failure mode and should exercise the claimed boundary rather than a substitute input that bypasses it. A test that would stay green if the claimed behavior regressed is a `coverage_gap` with rationale naming the unproven failure mode.
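The proof adequacy check can be illustrated with a hand-written regressed variant, similar in spirit to mutation testing. Everything in this sketch is hypothetical: the toy `checkout` function, its regressed twin, and the `classify` helper with its `adequate` and `failing_test` labels. Only `coverage_gap` comes from the reviewer's vocabulary. The scenario mirrors the payment example in the test generator's annotations, where payment succeeds but the order is never persisted.

```python
from typing import Callable

Repo = dict[str, str]
Checkout = Callable[[str, Repo], str]


def checkout(order_id: str, repo: Repo) -> str:
    """Correct behavior: persist the order, then report success."""
    repo[order_id] = "paid"
    return "success"


def checkout_regressed(order_id: str, repo: Repo) -> str:
    """Primary failure mode: reports success but never persists the order."""
    return "success"


def weak_test(impl: Checkout) -> bool:
    # Asserts only the return value, so it never touches the persistence boundary.
    return impl("o-1", {}) == "success"


def strong_test(impl: Checkout) -> bool:
    # Asserts the persisted state after the action, as the proof obligation asks.
    repo: Repo = {}
    return impl("o-1", repo) == "success" and repo.get("o-1") == "paid"


def classify(test: Callable[[Checkout], bool], good: Checkout, regressed: Checkout) -> str:
    """Label a test by whether it fails when the claimed behavior regresses."""
    if not test(good):
        return "failing_test"
    return "coverage_gap" if test(regressed) else "adequate"


print(classify(weak_test, checkout, checkout_regressed))    # coverage_gap
print(classify(strong_test, checkout, checkout_regressed))  # adequate
```

The weak test stays green under the regression, which is exactly the case the reviewer is told to report with a rationale naming the unproven failure mode.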
 Classify each quality finding into one of:
 - `dd_violation`: implementation deviates from the Design Doc