codex-workflows 0.6.5 → 0.6.7

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -5,8 +5,16 @@ Type: feature|fix|refactor
  Estimated Duration: X days
  Estimated Impact: X files
  Related Issue/PR: #XXX (if any)
+ Review Scope: [planned-files scope derived from Design Doc and task targets; for a revision plan over existing work, base branch + diff range]
  Implementation Readiness: pending
 
+ ## WorkPlan Review
+
+ This section records the review gate state for the exact plan content. Set `Status: pending` when the plan is created or materially updated. The orchestrator treats only `Status: approved` with `Conditions: none` as reviewed.
+
+ - **Status**: pending|approved
+ - **Conditions**: none
+
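The gate above counts a plan as reviewed only when the `WorkPlan Review` section reads `Status: approved` together with `Conditions: none`. A minimal Python sketch of that check follows; `parse_workplan_review` and `is_reviewed` are hypothetical helpers (not part of codex-workflows), and the regex parsing assumes the `- **Status**: ...` bullet format shown in the template.

```python
import re
from typing import Dict, Optional

# Hypothetical helper, not shipped with codex-workflows. Captures everything
# between the "## WorkPlan Review" heading and the next level-2 heading.
SECTION_RE = re.compile(r"^## WorkPlan Review[ \t]*$(.*?)(?=^## |\Z)", re.M | re.S)


def parse_workplan_review(plan_text: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the Status/Conditions values, or None when the section is absent."""
    section = SECTION_RE.search(plan_text)
    if section is None:
        return None
    fields: Dict[str, Optional[str]] = {}
    for key in ("Status", "Conditions"):
        # Assumes the "- **Key**: value" bullet layout from the template.
        match = re.search(rf"^- \*\*{key}\*\*:[ \t]*(.+?)[ \t]*$", section.group(1), re.M)
        fields[key.lower()] = match.group(1) if match else None
    return fields


def is_reviewed(plan_text: str) -> bool:
    """Only `Status: approved` together with `Conditions: none` counts as reviewed."""
    review = parse_workplan_review(plan_text)
    return (
        review is not None
        and review["status"] == "approved"
        and review["conditions"] == "none"
    )


plan = """# Work Plan

## WorkPlan Review

- **Status**: approved
- **Conditions**: none

## Related Documents
"""
print(is_reviewed(plan))                                 # True
print(is_reviewed(plan.replace("approved", "pending")))  # False
```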
  ## Related Documents
  - Design Doc(s):
  - [docs/design/XXX.md]
@@ -35,6 +43,10 @@ Repeat this block for each Design Doc when multiple Design Docs exist. Preserve
  - **Success criteria**: [extracted from Design Doc]
  - **Failure response**: [extracted from Design Doc]
 
+ ### Proof Strategy
+ - **Proof obligation source**: [test skeleton annotations (`Primary failure mode`, `Proof obligation`) when skeletons exist; otherwise each acceptance criterion's primary failure mode derived from the Design Doc]
+ - **Per-task propagation**: every task that implements or verifies a claim records the AC ID or claim identifier in Proof Obligations (see task template) so downstream review can judge whether tests prove the claim, not merely run
+
  ## Quality Assurance Mechanisms (from Design Docs)
 
  Adopted quality gates for the change area. Each task in this plan must satisfy the applicable mechanisms.
@@ -69,6 +81,21 @@ Map each Design Doc technical requirement to the task or phase that covers it. U
  - Merge duplicate restatements of the same obligation from multiple DD sections into one row and cite the primary section in `DD Section`
  - Keep `scope-boundary` rows concrete: name the protected file group, component boundary, contract, or workflow that must remain unchanged
 
+ ## Failure Mode Checklist
+
+ Domain-independent failure categories this implementation must guard against. Enumerate all eight categories, mark which apply, and list a covering task for each that applies; keep category names generic and place project-specific detail in task descriptions or notes.
+
+ | Category | Applies? | Covered By Task(s) |
+ |----------|----------|--------------------|
+ | same-value | yes/no | [P1-T1] |
+ | no-op | yes/no | |
+ | empty input | yes/no | |
+ | invalid option | yes/no | |
+ | missing config | yes/no | |
+ | unavailable boundary | yes/no | |
+ | shared-state dependency | yes/no | |
+ | rollback-only visibility | yes/no | |
+
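The Failure Mode Checklist rules (all eight categories listed, every applicable category covered by at least one task) lend themselves to a mechanical lint. The sketch below is illustrative only: `parse_checklist` and `checklist_gaps` are hypothetical names, and it assumes `Applies?` is filled with a literal `yes` or `no` and that task IDs in the last column are comma-separated.

```python
from typing import Dict, List, Tuple

# The eight generic categories named in the Failure Mode Checklist template.
CATEGORIES = (
    "same-value",
    "no-op",
    "empty input",
    "invalid option",
    "missing config",
    "unavailable boundary",
    "shared-state dependency",
    "rollback-only visibility",
)

Row = Tuple[bool, List[str]]  # (applies?, covering task IDs)


def parse_checklist(markdown: str) -> Dict[str, Row]:
    """Read `| Category | Applies? | Covered By Task(s) |` rows from a markdown table."""
    rows: Dict[str, Row] = {}
    for line in markdown.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Skip non-table lines, the header row, and the |---| separator row.
        if len(cells) != 3 or cells[0] in ("", "Category") or set(cells[0]) <= {"-"}:
            continue
        category, applies, covered = cells
        tasks = [task.strip() for task in covered.split(",") if task.strip()]
        rows[category] = (applies.lower() == "yes", tasks)
    return rows


def checklist_gaps(rows: Dict[str, Row]) -> List[str]:
    """List missing categories, unknown categories, and applicable ones with no task."""
    gaps = [f"missing category: {c}" for c in CATEGORIES if c not in rows]
    gaps += [f"unknown category: {c}" for c in rows if c not in CATEGORIES]
    gaps += [
        f"no covering task: {c}"
        for c, (applies, tasks) in rows.items()
        if applies and not tasks
    ]
    return gaps


table = """| Category | Applies? | Covered By Task(s) |
|----------|----------|--------------------|
| same-value | yes | P1-T1 |
| no-op | no | |
| empty input | yes | |
"""
for gap in checklist_gaps(parse_checklist(table)):
    print(gap)
```

For this partial table the lint reports the five unlisted categories plus the uncovered `empty input` row; keeping the category names generic, as the template asks, is what makes a fixed list like `CATEGORIES` workable.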
  ## UI Spec Component -> Task Mapping
 
  Include this section when a UI Spec is among the inputs. Map each UI component section to the task(s) that implement it so task-decomposer can pass the exact UI Spec context to executor tasks. Omit this section when no UI Spec exists.
@@ -58,9 +58,21 @@ Brief observations recorded after reading Investigation Targets:
  - **Failure response**: [What to do if verification fails]
  - **Verification level**: [L1 unit/local verification, L2 integration verification, or L3 end-to-end verification]
 
+ ## Proof Obligations
+ (Include one entry per acceptance criterion, user journey, boundary, or state transition this task implements or verifies. Derive from test skeleton annotations when present; otherwise derive from the acceptance criterion's primary failure mode.)
+ - **AC / Claim ID**: [AC-XXX, user journey identifier, boundary identifier, or task claim identifier]
+ - **Claim**: [behavior the acceptance criterion or task promises]
+ - **Primary failure mode**: [regression the test should turn red on]
+ - **Boundary to exercise**: [public/integration/browser/process/service/persistence boundary, or "in-process unit"]
+ - **State assertion**: [observable state before -> action -> after for state-changing claims; "N/A" otherwise]
+ - **Mock boundary rationale**: [which external boundaries may be mocked and why; "none" when all real]
+ - **Residual**: [what this task-level proof leaves unestablished, and which later task or phase closes it]
+
  ## Completion Criteria
+ - [ ] All listed AC / Claim IDs are implemented or verified by this task
  - [ ] All added tests pass
  - [ ] Operation verified per Operation Verification Methods above
+ - [ ] Each Proof Obligation is met: the test turns red under its primary failure mode and exercises the stated boundary
  - [ ] Deliverables created (for research/design tasks)
  - [ ] When Binding Decisions exist, every Compliance Check evaluates to `Y` against the final implementation, with evidence recorded in Investigation Notes
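One way to read the new Proof Obligations block together with its completion criterion is as a per-claim record plus a check that each claim's test actually turned red when its primary failure mode was injected, and that it ran at the stated boundary. The dataclasses and `unmet_obligations` below are a hypothetical sketch; how that red/green evidence is gathered (for example by mutating the code under test) is an assumption outside the template.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ProofObligation:
    """Mirrors the fields of one Proof Obligations entry in the task template."""
    claim_id: str                 # AC / Claim ID, e.g. "AC-001"
    claim: str
    primary_failure_mode: str
    boundary: str                 # e.g. "persistence" or "in-process unit"
    state_assertion: str = "N/A"
    mock_boundary_rationale: str = "none"
    residual: str = ""


@dataclass
class ProofEvidence:
    """Hypothetical evidence gathered for one claim."""
    red_under_failure_mode: bool  # did the test fail once the failure mode was injected?
    boundary_exercised: str


def unmet_obligations(
    obligations: List[ProofObligation], evidence: Dict[str, ProofEvidence]
) -> List[str]:
    """Return claim IDs whose test never turned red or ran at a different boundary."""
    unmet = []
    for obligation in obligations:
        found = evidence.get(obligation.claim_id)
        if (
            found is None
            or not found.red_under_failure_mode
            or found.boundary_exercised != obligation.boundary
        ):
            unmet.append(obligation.claim_id)
    return unmet


obligations = [
    ProofObligation("AC-001", "Saving a profile persists it", "write silently dropped", "persistence"),
    ProofObligation("AC-002", "Empty names are rejected", "empty name accepted", "in-process unit"),
]
evidence = {
    "AC-001": ProofEvidence(True, "persistence"),
    "AC-002": ProofEvidence(False, "in-process unit"),  # stayed green: proves nothing
}
print(unmet_obligations(obligations, evidence))  # ['AC-002']
```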
 
@@ -55,14 +55,28 @@ Analyze task file existence state and determine the action required:
  | State | Criteria | Next Action |
  |-------|----------|-------------|
  | Tasks exist | Consumed Task Set is non-empty | User's execution instruction serves as batch approval -> Enter autonomous execution immediately |
- | No tasks + plan exists | Consumed Task Set is empty but plan exists | Confirm with user -> spawn task-decomposer |
+ | No tasks + plan exists + reviewed plan | Consumed Task Set is empty and WorkPlan Review records `Status: approved`, `Conditions: none` | Confirm with user -> spawn task-decomposer |
+ | No tasks + small simplified plan | Consumed Task Set is empty, plan exists, and the plan references no Design Doc | Confirm with user -> spawn task-decomposer |
+ | No tasks + plan exists + unreviewed plan | Consumed Task Set is empty, the plan references a Design Doc, and WorkPlan Review is absent, pending, conditional, or not approved | Run work plan review, then confirm with user -> spawn task-decomposer |
  | Neither exists | No plan or task files | Error: Prerequisites not met |
 
  ## Task Decomposition Phase (Conditional)
 
- When task files don't exist:
+ When task files don't exist, the plan references a Design Doc, and the WorkPlan Review section is absent, pending, conditional, or not approved:
 
- ### 1. User Confirmation
+ ### 1. Work Plan Review
+
+ Spawn document-reviewer agent: "Review the work plan before task decomposition. doc_type: WorkPlan. target: docs/plans/[plan-name].md. mode: composite. Review semantic traceability to the Design Doc, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
+
+ Branch on `verdict.decision`:
+ - `approved` -> spawn work-planner in update mode once to record `Status: approved` and `Conditions: none` in WorkPlan Review, then continue to user confirmation
+ - `approved_with_conditions` -> stop before task decomposition and report that the work plan needs update via recipe-plan
+ - `needs_revision` -> stop before task decomposition and report that the work plan needs update via recipe-plan
+ - `rejected` -> stop before task decomposition and present the blocking findings to the user
+
+ When task files don't exist and the WorkPlan Review section records `Status: approved` and `Conditions: none`, skip Work Plan Review and continue to user confirmation.
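The revised state table and the Work Plan Review branch can be condensed into two small decision functions. This is a hypothetical Python sketch for readers, not code shipped in codex-workflows: the returned strings are made-up labels for the actions the recipe describes, and `plan_recipe` stands in for `recipe-plan`, `recipe-front-plan`, or the fullstack planning flow depending on which build recipe is running.

```python
from typing import Optional


def next_action(
    task_count: int,
    plan_exists: bool,
    references_design_doc: bool,
    review_status: Optional[str] = None,
    review_conditions: Optional[str] = None,
) -> str:
    """Map the task-file existence table to a next action (labels are illustrative)."""
    if task_count > 0:
        # Tasks exist: the user's execution instruction is the batch approval.
        return "execute"
    if not plan_exists:
        return "error: prerequisites not met"
    if not references_design_doc:
        # Small simplified plan: no Design Doc to trace against, so no review gate.
        return "confirm, then decompose"
    if review_status == "approved" and review_conditions == "none":
        return "confirm, then decompose"
    # Review section absent, pending, conditional, or not approved.
    return "review, then confirm, then decompose"


def on_review_verdict(decision: str, plan_recipe: str = "recipe-plan") -> str:
    """Branch on verdict.decision inside a build recipe; there is no revision loop here."""
    if decision == "approved":
        return "record Status: approved / Conditions: none, then confirm with user"
    if decision in ("approved_with_conditions", "needs_revision"):
        return f"stop: work plan needs update via {plan_recipe}"
    if decision == "rejected":
        return "stop: present blocking findings"
    raise ValueError(f"unknown verdict.decision: {decision!r}")


print(next_action(0, True, True, review_status="pending"))
print(on_review_verdict("needs_revision", "recipe-front-plan"))
```

Treating a conditional approval as a stop, rather than carrying the conditions forward, keeps task decomposition from starting on a plan whose open conditions no one has resolved.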
+
+ ### 2. User Confirmation
  ```
  No task files found.
  Work plan: docs/plans/[plan-name].md
@@ -70,10 +84,10 @@ Work plan: docs/plans/[plan-name].md
  Generate tasks from the work plan? (y/n):
  ```
 
- ### 2. Task Decomposition (if approved)
+ ### 3. Task Decomposition (if approved)
  Spawn task-decomposer agent: "Read work plan at docs/plans/[plan-name].md and decompose into atomic tasks. Output: Individual task files in docs/plans/tasks/. Granularity: 1 task = 1 commit = independently executable."
 
- ### 3. Verify Generation
+ ### 4. Verify Generation
  Recompute the Consumed Task Set and verify it is non-empty.
 
  ## Pre-execution Checklist
@@ -123,7 +137,7 @@ VERIFY approval status before proceeding. Once confirmed, INITIATE autonomous ex
  ## Post-Implementation Verification (After All Tasks Complete)
 
  After all task cycles finish, collect all `filesModified` from every task-executor response (deduplicated), then run both verification agents before the completion report:
- 1. Spawn code-verifier agent: "Verify implementation consistency against the Design Doc. `doc_type: design-doc`. `document_path`: [path]. `code_paths`: [collected filesModified list]."
+ 1. Spawn code-verifier agent: "Verify implementation consistency against the Design Doc. `doc_type: design-doc`. `document_path`: [path]. `code_paths`: [collected filesModified list]. Work Plan Review Scope: [Review Scope value from the active work plan, used only to confirm the collected file set is complete]."
  2. Spawn security-reviewer agent: "Design Doc: [path]. Implementation files: [collected filesModified list]. Review security compliance."
  3. Consolidate results:
  - code-verifier passes when `summary.status` is `consistent` or `mostly_consistent`
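The verification step deduplicates `filesModified` across task-executor responses and now passes the plan's Review Scope only as a completeness check. A minimal sketch of both parts, assuming (hypothetically) that Review Scope has already been reduced to a list of glob patterns; the template leaves its exact format open, and the helper names are invented for illustration.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List


def collect_files_modified(responses: Iterable[Dict]) -> List[str]:
    """Union of every response's filesModified, de-duplicated, first-seen order kept."""
    seen: Dict[str, None] = {}
    for response in responses:
        for path in response.get("filesModified", []):
            seen.setdefault(path, None)
    return list(seen)


def scope_entries_without_changes(collected: List[str], review_scope: List[str]) -> List[str]:
    """Scope patterns that no collected file matches: a hint the file set is incomplete.

    Note: fnmatch's `*` also matches `/`, so "src/*.ts" covers nested paths too.
    """
    return [
        pattern
        for pattern in review_scope
        if not any(fnmatchcase(path, pattern) for path in collected)
    ]


responses = [
    {"filesModified": ["src/auth/login.ts", "src/auth/session.ts"]},
    {"filesModified": ["src/auth/session.ts", "src/auth/logout.ts"]},
    {"filesModified": []},
]
files = collect_files_modified(responses)
print(files)  # ['src/auth/login.ts', 'src/auth/session.ts', 'src/auth/logout.ts']
print(scope_entries_without_changes(files, ["src/auth/*.ts", "docs/api/auth.md"]))  # ['docs/api/auth.md']
```

The scope check only flags gaps; in keeping with the recipe wording, `code_paths` is still the collected `filesModified` list, and the scope is not used to widen or narrow it.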
@@ -55,14 +55,28 @@ Analyze task file existence state and determine the action required:
  | State | Criteria | Next Action |
  |-------|----------|-------------|
  | Tasks exist | Consumed Task Set is non-empty | User's execution instruction serves as batch approval -> Enter autonomous execution immediately |
- | No tasks + plan exists | Consumed Task Set is empty but plan exists | Confirm with user -> spawn task-decomposer |
+ | No tasks + plan exists + reviewed plan | Consumed Task Set is empty and WorkPlan Review records `Status: approved`, `Conditions: none` | Confirm with user -> spawn task-decomposer |
+ | No tasks + small simplified plan | Consumed Task Set is empty, plan exists, and the plan references no Design Doc | Confirm with user -> spawn task-decomposer |
+ | No tasks + plan exists + unreviewed plan | Consumed Task Set is empty, the plan references a Design Doc, and WorkPlan Review is absent, pending, conditional, or not approved | Run work plan review, then confirm with user -> spawn task-decomposer |
  | Neither exists | No plan or task files | Error: Prerequisites not met |
 
  ## Task Decomposition Phase (Conditional)
 
- When task files don't exist:
+ When task files don't exist, the plan references a Design Doc, and the WorkPlan Review section is absent, pending, conditional, or not approved:
 
- ### 1. User Confirmation
+ ### 1. Work Plan Review
+
+ Spawn document-reviewer agent: "Review the frontend work plan before task decomposition. doc_type: WorkPlan. target: docs/plans/[plan-name].md. mode: composite. Review semantic traceability to the Design Doc and UI Spec, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
+
+ Branch on `verdict.decision`:
+ - `approved` -> spawn work-planner in update mode once to record `Status: approved` and `Conditions: none` in WorkPlan Review, then continue to user confirmation
+ - `approved_with_conditions` -> stop before task decomposition and report that the work plan needs update via recipe-front-plan
+ - `needs_revision` -> stop before task decomposition and report that the work plan needs update via recipe-front-plan
+ - `rejected` -> stop before task decomposition and present the blocking findings to the user
+
+ When task files don't exist and the WorkPlan Review section records `Status: approved` and `Conditions: none`, skip Work Plan Review and continue to user confirmation.
+
+ ### 2. User Confirmation
  ```
  No task files found.
  Work plan: docs/plans/[plan-name].md
@@ -70,10 +84,10 @@ Work plan: docs/plans/[plan-name].md
  Generate tasks from the work plan? (y/n):
  ```
 
- ### 2. Task Decomposition (if approved)
+ ### 3. Task Decomposition (if approved)
  Spawn task-decomposer agent: "Read work plan at docs/plans/[plan-name].md and decompose into atomic tasks. Output: Individual task files in docs/plans/tasks/. Granularity: 1 task = 1 commit = independently executable"
 
- ### 3. Verify Generation
+ ### 4. Verify Generation
  Recompute the Consumed Task Set and verify it is non-empty.
 
  ## Pre-execution Checklist
@@ -131,7 +145,7 @@ VERIFY approval status before proceeding. Once confirmed, INITIATE autonomous ex
  ## Post-Implementation Verification (After All Tasks Complete)
 
  After all task cycles finish, collect all `filesModified` from every task-executor-frontend response (deduplicated), then run both verification agents before the completion report:
- 1. Spawn code-verifier agent: "Verify implementation consistency against the Design Doc. `doc_type: design-doc`. `document_path`: [path]. `code_paths`: [collected filesModified list]."
+ 1. Spawn code-verifier agent: "Verify implementation consistency against the Design Doc. `doc_type: design-doc`. `document_path`: [path]. `code_paths`: [collected filesModified list]. Work Plan Review Scope: [Review Scope value from the active work plan, used only to confirm the collected file set is complete]."
  2. Spawn security-reviewer agent: "Design Doc: [path]. Implementation files: [collected filesModified list]. Review security compliance."
  3. Consolidate results:
  - code-verifier passes when `summary.status` is `consistent` or `mostly_consistent`
@@ -20,6 +20,7 @@ description: "Create frontend work plan from design document with test skeleton
  **Execution Method**:
  - Test skeleton generation -> performed by acceptance-test-generator
  - Work plan creation -> performed by work-planner
+ - Work plan review -> performed by document-reviewer
 
  Orchestrator spawns agents and passes structured data between them.
 
@@ -29,6 +30,7 @@ Orchestrator spawns agents and passes structured data between them.
  - Design document selection
  - Test skeleton generation with acceptance-test-generator
  - Work plan creation with work-planner
+ - Work plan review with document-reviewer
  - Plan approval obtainment
 
  **Responsibility Boundary**: This skill completes with work plan approval.
@@ -50,6 +52,15 @@ Spawn acceptance-test-generator agent: "Generate test skeletons from Design Doc
  ### Step 3: Work Plan Creation
  Spawn work-planner agent: "Create work plan from Design Doc at [path]. Integration test file: [path from step 2]. fixture-e2e test file: [path from step 2 or null]. service-integration-e2e test file: [path from step 2 or null]. E2E absence reasons by lane: [values from step 2 when an E2E lane is null]. Integration tests are created with each phase implementation, fixture-e2e runs alongside UI implementation, service-integration-e2e runs only in the final phase when a service E2E file exists. Include `Implementation Readiness: pending` in the work plan header."
 
+ ### Step 4: Work Plan Review
+ Spawn document-reviewer agent: "Review the frontend work plan. doc_type: WorkPlan. target: docs/plans/[plan-name].md. mode: composite. Review semantic traceability to the Design Doc and UI Spec, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
+
+ Branch on `verdict.decision`:
+ - `approved` -> spawn work-planner in update mode once to record `Status: approved` and `Conditions: none` in WorkPlan Review, then proceed to Step 5
+ - `approved_with_conditions` or `needs_revision` -> spawn work-planner in update mode with the findings or conditions, then repeat Step 4. Use max 2 revision iterations as defined by the `needs_revision` row in subagents-orchestration-guide Approval Status Vocabulary.
+ - `rejected` -> stop and present the blocking findings to the user.
62
+
+ ### Step 5: Plan Approval
  **[STOP -- BLOCKING]** Interact with user to complete plan and obtain approval for plan content. Clarify specific implementation steps and risks.
  **CANNOT proceed until user explicitly approves the work plan.**
 
@@ -60,6 +71,7 @@ ENFORCEMENT: Plan content MUST be approved before declaring completion. Unapprov
  - [ ] Design document selected
  - [ ] Test skeletons generated
  - [ ] Work plan created
+ - [ ] Work plan reviewed via document-reviewer
  - [ ] User approved plan content
 
  ## Output Example
@@ -31,12 +31,13 @@ Design Doc (uses most recent if omitted): $ARGUMENTS
 
  ### 1. Prerequisite Check
  Identify the Design Doc in docs/design/ and check implementation files changed from the default branch (detect via `git symbolic-ref refs/remotes/origin/HEAD` or fall back to current branch diff).
+ If a single active work plan is explicitly provided or unambiguously resolved for that Design Doc, read its `Review Scope` line. Otherwise set `Work Plan: none` and `Review Scope: none`; do not infer.
 
  **[STOP -- BLOCKING]** If no Design Doc or implementation files found, notify user and halt.
  **CANNOT proceed without both a Design Doc and implementation files.**
 
  ### 2. Execute code-reviewer
- Spawn code-reviewer agent: "Validate Design Doc compliance for [design-doc-path]. Implementation files: [git diff file list]. Review mode: full. Return structured JSON report per your Output Format specification."
+ Spawn code-reviewer agent: "Validate Design Doc compliance for [design-doc-path]. Work Plan: [resolved work plan path or none]. Review Scope: [literal Review Scope value or none]. Implementation files: [git diff file list]. Review mode: full. Return structured JSON report per your Output Format specification."
 
  **Store output as**: `$STEP_2_OUTPUT`
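The new Prerequisite Check rule (use a work plan only when one is explicitly given or is the single active plan for the Design Doc; otherwise pass `none` and do not infer) can be sketched as follows. `resolve_review_inputs` is a hypothetical helper: it assumes the caller has already collected the active plans tied to the Design Doc, it expects an explicitly provided plan to be among them, and returning `none` for the scope when the chosen plan has no `Review Scope:` line is this sketch's choice rather than documented behavior.

```python
import re
from typing import Dict, Optional, Tuple

REVIEW_SCOPE_RE = re.compile(r"^Review Scope:[ \t]*(.+?)[ \t]*$", re.M)


def resolve_review_inputs(
    explicit_plan: Optional[str], active_plans: Dict[str, str]
) -> Tuple[str, str]:
    """Return (work plan path, literal Review Scope) or ("none", "none").

    active_plans maps plan paths to plan text for plans tied to the Design Doc.
    """
    if explicit_plan is not None:
        path = explicit_plan if explicit_plan in active_plans else None
    elif len(active_plans) == 1:
        path = next(iter(active_plans))
    else:
        path = None  # zero or several candidates: do not guess
    if path is None:
        return "none", "none"
    match = REVIEW_SCOPE_RE.search(active_plans[path])
    return path, (match.group(1) if match else "none")


plans = {"docs/plans/auth.md": "# Auth\nReview Scope: src/auth/**\n"}
print(resolve_review_inputs(None, plans))
```

The two returned strings map directly onto the `Work Plan:` and `Review Scope:` fields of the code-reviewer prompt.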
 
@@ -65,14 +65,28 @@ Analyze task file existence state and determine the action required:
  | State | Criteria | Next Action |
  |-------|----------|-------------|
  | Tasks exist | Consumed Task Set is non-empty | User's execution instruction serves as batch approval -> Enter autonomous execution immediately |
- | No tasks + plan exists | Consumed Task Set is empty but plan exists | Confirm with user -> spawn task-decomposer |
+ | No tasks + plan exists + reviewed plan | Consumed Task Set is empty and WorkPlan Review records `Status: approved`, `Conditions: none` | Confirm with user -> spawn task-decomposer |
+ | No tasks + small simplified plan | Consumed Task Set is empty, plan exists, and the plan references no Design Doc | Confirm with user -> spawn task-decomposer |
+ | No tasks + plan exists + unreviewed plan | Consumed Task Set is empty, the plan references a Design Doc, and WorkPlan Review is absent, pending, conditional, or not approved | Run work plan review, then confirm with user -> spawn task-decomposer |
  | Neither exists | No plan or task files | Error: Prerequisites not met |
 
  ## Task Decomposition Phase (Conditional)
 
- When task files don't exist:
+ When task files don't exist, the plan references a Design Doc, and the WorkPlan Review section is absent, pending, conditional, or not approved:
 
- ### 1. User Confirmation
+ ### 1. Work Plan Review
+
+ Spawn document-reviewer agent: "Review the fullstack work plan before task decomposition. doc_type: WorkPlan. target: docs/plans/[plan-name].md. mode: composite. Review semantic traceability to all Design Docs, UI Spec when present, cross-layer boundary coverage, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
+
+ Branch on `verdict.decision`:
+ - `approved` -> spawn work-planner in update mode once to record `Status: approved` and `Conditions: none` in WorkPlan Review, then continue to user confirmation
+ - `approved_with_conditions` -> stop before task decomposition and report that the work plan needs update via recipe-plan or the fullstack planning flow
+ - `needs_revision` -> stop before task decomposition and report that the work plan needs update via recipe-plan or the fullstack planning flow
+ - `rejected` -> stop before task decomposition and present the blocking findings to the user
+
+ When task files don't exist and the WorkPlan Review section records `Status: approved` and `Conditions: none`, skip Work Plan Review and continue to user confirmation.
+
+ ### 2. User Confirmation
  ```
  No task files found.
  Work plan: docs/plans/[plan-name].md
@@ -80,10 +94,10 @@ Work plan: docs/plans/[plan-name].md
  Generate tasks from the work plan? (y/n):
  ```
 
- ### 2. Task Decomposition (if approved)
+ ### 3. Task Decomposition (if approved)
  Spawn task-decomposer agent: "Read work plan at docs/plans/[plan-name].md and decompose into atomic tasks. Output: Individual task files in docs/plans/tasks/. Granularity: 1 task = 1 commit = independently executable. Use layer-aware naming: {plan}-backend-task-{n}.md, {plan}-frontend-task-{n}.md based on target file paths."
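The fullstack decomposer prompt asks for layer-aware task file names derived from target file paths. The following is a hypothetical sketch of that naming rule; the `frontend/` and `apps/web/` prefixes are placeholder assumptions (real monorepo layouts vary), and rejecting a task whose targets span both layers is this sketch's choice, not documented behavior.

```python
from typing import Iterable, Sequence

# Placeholder prefixes: adjust to the monorepo's actual frontend roots.
FRONTEND_PREFIXES: Sequence[str] = ("frontend/", "apps/web/")


def layer_for(paths: Iterable[str], frontend_prefixes: Sequence[str] = FRONTEND_PREFIXES) -> str:
    """Classify a task's target files as 'frontend' or 'backend'."""
    layers = {
        "frontend" if path.startswith(tuple(frontend_prefixes)) else "backend"
        for path in paths
    }
    if len(layers) != 1:
        raise ValueError(f"expected targets from exactly one layer, got {sorted(layers)}")
    return layers.pop()


def task_file_name(plan: str, n: int, paths: Iterable[str]) -> str:
    """Build {plan}-backend-task-{n}.md or {plan}-frontend-task-{n}.md."""
    return f"{plan}-{layer_for(paths)}-task-{n}.md"


print(task_file_name("checkout", 1, ["frontend/src/Cart.tsx"]))
print(task_file_name("checkout", 2, ["api/orders.py", "api/db.py"]))
```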
 
- ### 3. Verify Generation
+ ### 4. Verify Generation
  Recompute the Consumed Task Set and verify it is non-empty.
 
  ## Pre-execution Checklist
@@ -141,7 +155,7 @@ VERIFY approval status before proceeding. Once confirmed, INITIATE autonomous ex
  ## Post-Implementation Verification (After All Tasks Complete)
 
  After all task cycles finish, collect all `filesModified` from every task-executor/task-executor-frontend response (deduplicated), then run both verification agents before the completion report:
- 1. Spawn code-verifier once per Design Doc: "Verify implementation consistency against the Design Doc. `doc_type: design-doc`. `document_path`: [single design doc path]. `code_paths`: [collected filesModified list]."
+ 1. Spawn code-verifier once per Design Doc: "Verify implementation consistency against the Design Doc. `doc_type: design-doc`. `document_path`: [single design doc path]. `code_paths`: [collected filesModified list]. Work Plan Review Scope: [Review Scope value from the active work plan, used only to confirm the collected file set is complete]."
  2. Spawn security-reviewer agent: "Design Doc: [path(s)]. Implementation files: [collected filesModified list]. Review security compliance."
  3. Consolidate results:
  - each code-verifier run passes when `summary.status` is `consistent` or `mostly_consistent`
@@ -69,7 +69,7 @@ Follow subagents-orchestration-guide skill Large/Medium/Small scale flow exactly
  **STEP 3**: Spawn technical-designer-frontend agent → spawn document-reviewer agent → spawn design-sync agent.
  **[STOP — BLOCKING]** Present Frontend Design Doc for user approval. **CANNOT proceed until user explicitly confirms.**
 
- **STEP 4**: Spawn acceptance-test-generator agent → spawn work-planner agent.
+ **STEP 4**: Spawn acceptance-test-generator agent → spawn work-planner agent → spawn document-reviewer agent with `doc_type: WorkPlan`.
  **[STOP — BLOCKING]** Present Work Plan for user approval. **CANNOT proceed until user explicitly confirms.**
 
  **STEP 5**: Run implementation readiness preflight.
@@ -33,6 +33,7 @@ ENFORCEMENT: Work-planner spawned without test skeleton data (when tests were re
  - Design document selection
  - Test skeleton generation with acceptance-test-generator
  - Work plan creation with work-planner
+ - Work plan review with document-reviewer
  - Plan approval obtainment
 
  **Responsibility Boundary**: This skill completes with work plan approval.
@@ -53,7 +54,18 @@ Present options if multiple exist (can be specified with $ARGUMENTS).
 
  ### Step 3: Work Plan Creation
  - Spawn work-planner agent: "Create work plan from design document at [design-doc-path]. Include deliverables from previous process according to subagents-orchestration-guide skill coordination specification. If `generatedFiles.fixtureE2e` or `generatedFiles.serviceE2e` is null, use the corresponding `e2eAbsenceReason` and accept the null E2E lane as a valid planning input. Include `Implementation Readiness: pending` in the work plan header."
- - Interact with user to complete plan and obtain approval for plan content
+
+ ### Step 4: Work Plan Review
+ Spawn document-reviewer agent: "Review the work plan. doc_type: WorkPlan. target: docs/plans/[plan-name].md. mode: composite. Review semantic traceability to the Design Doc, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
+
+ Branch on `verdict.decision`:
+ - `approved` -> spawn work-planner in update mode once to record `Status: approved` and `Conditions: none` in WorkPlan Review, then proceed to Step 5
+ - `approved_with_conditions` or `needs_revision` -> spawn work-planner in update mode with the findings or conditions, then repeat Step 4. Use max 2 revision iterations as defined by the `needs_revision` row in subagents-orchestration-guide Approval Status Vocabulary.
+ - `rejected` -> stop and present the blocking findings to the user.
+
+ ### Step 5: Plan Approval
+ - Present the reviewed work plan to the user for batch approval
+ - If the user requests changes, spawn work-planner in update mode and re-run Step 4
  - Clarify specific implementation steps and risks
 
  **Scope**: Up to work plan creation and obtaining approval for plan content.
@@ -63,6 +75,7 @@ Present options if multiple exist (can be specified with $ARGUMENTS).
  - [ ] Design document identified and selected
  - [ ] Integration/E2E test skeleton generation confirmed with user (generated if requested)
  - [ ] Work plan created via work-planner
+ - [ ] Work plan reviewed via document-reviewer
  - [ ] Plan content approved by user
  - [ ] All stopping points honored with user confirmation
 
@@ -36,9 +36,10 @@ Design Doc (uses most recent if omitted): $ARGUMENTS
 
  ### Step 1: Prerequisite Check
  Identify Design Doc in docs/design/ and check implementation files via git diff.
+ If a single active work plan is explicitly provided or unambiguously resolved for that Design Doc, read its `Review Scope` line. Otherwise set `Work Plan: none` and `Review Scope: none`; do not infer.
 
  ### Step 2: Execute code-reviewer
- Spawn code-reviewer agent: "Validate Design Doc compliance for the implementation. Design Doc path: [path]. Implementation files: [git diff file list]. Review mode: full. Return structured JSON report per your Output Format specification."
+ Spawn code-reviewer agent: "Validate Design Doc compliance for the implementation. Design Doc path: [path]. Work Plan: [resolved work plan path or none]. Review Scope: [literal Review Scope value or none]. Implementation files: [git diff file list]. Review mode: full. Return structured JSON report per your Output Format specification."
 
  **Store output as**: `$STEP_2_OUTPUT`
 
@@ -140,7 +140,7 @@ Autonomous execution MUST stop and wait for user input at these points.
  | UI Spec | After document-reviewer completes UI Spec review (frontend/fullstack) | Approve UI Spec |
  | ADR | After document-reviewer completes ADR review (if ADR created) | Approve ADR |
  | Design | After design-sync completes consistency verification | Approve Design Doc |
- | Work Plan | After work-planner creates plan | Batch approval for implementation phase |
+ | Work Plan | After document-reviewer completes WorkPlan review for Medium/Large, or after simplified plan creation for Small | Batch approval for implementation phase |
 
  **ENFORCEMENT**: After batch approval, autonomous execution proceeds without stops until completion or escalation. Skipping stop points is a CRITICAL VIOLATION.
 
@@ -164,6 +164,16 @@ Handling rules:
 
  **ENFORCEMENT**: Using any status value outside this vocabulary is a VIOLATION.
 
+ ### WorkPlan Review State [MANDATORY]
+
+ Medium and Large work plans must contain a `WorkPlan Review` section. Small simplified plans are exempt because they have no Design Doc to trace against. The plan is reviewed only when that section records `Status: approved` and `Conditions: none`.
+
+ Handling rules:
+ - After WorkPlan review returns `approved`, invoke work-planner in update mode once to record the review section, without changing implementation content.
+ - Treat WorkPlan `approved_with_conditions` the same as `needs_revision`: return to work-planner in update mode with the conditions, then re-review. Conditions must not be carried into task decomposition or implementation readiness.
+ - A material work plan update resets `WorkPlan Review` to `Status: pending`.
+ - Standalone build recipes apply WorkPlan review only before task decomposition, not after task files already exist.
+
  ## Scale Determination and Document Requirements
 
  | Scale | File Count | PRD | ADR | Design Doc | Work Plan |
@@ -242,8 +252,8 @@ Always start with `requirement-analyzer`, then follow the minimum flow required
 
  | Scale | Required flow |
  |-------|---------------|
- | Large | `requirement-analyzer` **[Stop]** -> `prd-creator` -> `document-reviewer` **[Stop]** -> optional `ui-spec-designer` + `document-reviewer` **[Stop]** -> optional ADR + `document-reviewer` **[Stop]** -> `codebase-analyzer` -> `technical-designer*` -> `code-verifier` -> `document-reviewer` -> `design-sync` **[Stop]** -> `acceptance-test-generator` -> `work-planner` **[Stop]** -> `task-decomposer` |
- | Medium | `requirement-analyzer` **[Stop]** -> `codebase-analyzer` -> optional `ui-spec-designer` + `document-reviewer` **[Stop]** -> `technical-designer*` -> `code-verifier` -> `document-reviewer` -> `design-sync` **[Stop]** -> `acceptance-test-generator` -> `work-planner` **[Stop]** -> `task-decomposer` |
+ | Large | `requirement-analyzer` **[Stop]** -> `prd-creator` -> `document-reviewer` **[Stop]** -> optional `ui-spec-designer` + `document-reviewer` **[Stop]** -> optional ADR + `document-reviewer` **[Stop]** -> `codebase-analyzer` -> `technical-designer*` -> `code-verifier` -> `document-reviewer` -> `design-sync` **[Stop]** -> `acceptance-test-generator` -> `work-planner` -> `document-reviewer` (doc_type: WorkPlan) **[Stop]** -> `task-decomposer` |
+ | Medium | `requirement-analyzer` **[Stop]** -> `codebase-analyzer` -> optional `ui-spec-designer` + `document-reviewer` **[Stop]** -> `technical-designer*` -> `code-verifier` -> `document-reviewer` -> `design-sync` **[Stop]** -> `acceptance-test-generator` -> `work-planner` -> `document-reviewer` (doc_type: WorkPlan) **[Stop]** -> `task-decomposer` |
  | Small | `requirement-analyzer` **[Stop]** -> simplified plan **[Stop: Batch approval]** -> direct implementation |
248
258
 
249
259
  Flow rules:
@@ -253,6 +263,7 @@ Flow rules:
253
263
  - Pass `codebase-analyzer` output to the designer as `Codebase Analysis`
254
264
  - Pass Design Doc path to `code-verifier`, then pass `code_verification` to `document-reviewer`
255
265
  - Fullstack layer sequencing is defined in `references/monorepo-flow.md`
266
+ - Run WorkPlan review after every Medium/Large work plan creation or update and before batch approval. On `needs_revision` or WorkPlan `approved_with_conditions`, return to `work-planner` in update mode and re-review, up to a maximum of 2 revision iterations as defined by the `needs_revision` row in Approval Status Vocabulary. On `rejected`, halt and escalate to the user.
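The WorkPlan review rule above amounts to a bounded revision loop. The TypeScript sketch below is a hedged illustration, not the workflow's orchestrator: `runWorkPlanReviewGate`, `reviewPlan`, and `revisePlan` are hypothetical stand-ins for spawning `document-reviewer` and `work-planner`, and escalating to the user once the revision cap is exhausted is an assumption, since the rule defers that detail to the Approval Status Vocabulary.

```typescript
// Review statuses named in the flow rule.
type ReviewStatus = "approved" | "approved_with_conditions" | "needs_revision" | "rejected";

type GateOutcome =
  | { kind: "ready_for_batch_approval"; revisions: number }
  | { kind: "escalate"; reason: "rejected" | "revision_limit_reached"; revisions: number };

// Cap taken from the rule ("a maximum of 2 revision iterations").
const MAX_REVISIONS = 2;

function runWorkPlanReviewGate(
  reviewPlan: () => ReviewStatus, // hypothetical: spawns document-reviewer with doc_type: WorkPlan
  revisePlan: () => void,        // hypothetical: spawns work-planner in update mode
): GateOutcome {
  let revisions = 0;
  for (;;) {
    const status = reviewPlan();
    if (status === "approved") return { kind: "ready_for_batch_approval", revisions };
    if (status === "rejected") return { kind: "escalate", reason: "rejected", revisions };
    // needs_revision or approved_with_conditions: revise and re-review while under the cap.
    // Assumption: exhausting the cap escalates to the user; the rule does not say so explicitly.
    if (revisions >= MAX_REVISIONS) {
      return { kind: "escalate", reason: "revision_limit_reached", revisions };
    }
    revisePlan();
    revisions += 1;
  }
}
```

Note that `approved_with_conditions` is treated like `needs_revision` here, matching the rule that only an unconditional `approved` ends the loop.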
256
267
 
257
268
  ## Autonomous Execution Mode
258
269
 
@@ -10,7 +10,7 @@ This reference defines the orchestration flow for projects spanning multiple lay
10
10
 
11
11
  ## Design Phase
12
12
 
13
- ### Large Scale Fullstack (6+ Files) - 15 Steps
13
+ ### Large Scale Fullstack (6+ Files) - 16 Steps
14
14
 
15
15
  | Step | Agent | Purpose | Output |
16
16
  |------|-------|---------|--------|
@@ -28,9 +28,10 @@ This reference defines the orchestration flow for projects spanning multiple lay
28
28
  | 12 | document-reviewer x2 | Review each Design Doc with verification evidence | Reviews |
29
29
  | 13 | design-sync | Cross-layer consistency verification (source: frontend Design Doc) **[Stop]** | Sync status |
30
30
  | 14 | acceptance-test-generator | Integration/E2E test skeleton from cross-layer contracts | Test skeletons |
31
- | 15 | work-planner | Work plan from all Design Docs **[Stop: Batch approval]** | Work plan |
31
+ | 15 | work-planner | Work plan from all Design Docs | Work plan |
32
+ | 16 | document-reviewer | WorkPlan review **[Stop: Batch approval]** | Approval |
32
33
 
33
- ### Medium Scale Fullstack (3-5 Files) - 13 Steps
34
+ ### Medium Scale Fullstack (3-5 Files) - 14 Steps
34
35
 
35
36
  | Step | Agent | Purpose | Output |
36
37
  |------|-------|---------|--------|
@@ -46,7 +47,8 @@ This reference defines the orchestration flow for projects spanning multiple lay
46
47
  | 10 | document-reviewer x2 | Review each Design Doc with verification evidence | Reviews |
47
48
  | 11 | design-sync | Cross-layer consistency verification (source: frontend Design Doc) **[Stop]** | Sync status |
48
49
  | 12 | acceptance-test-generator | Integration/E2E test skeleton from cross-layer contracts | Test skeletons |
49
- | 13 | work-planner | Work plan from all Design Docs **[Stop: Batch approval]** | Work plan |
50
+ | 13 | work-planner | Work plan from all Design Docs | Work plan |
51
+ | 14 | document-reviewer | WorkPlan review **[Stop: Batch approval]** | Approval |
50
52
 
51
53
  ### Parallelization in Multi-Agent Steps
52
54
 
@@ -101,6 +103,12 @@ Spawn work-planner with all Design Docs:
101
103
 
102
104
  work-planner's existing Integration Complete criteria naturally cover cross-layer verification when given multiple Design Docs.
103
105
 
106
+ After work-planner creates or updates the plan, spawn document-reviewer:
107
+
108
+ > "Review the fullstack work plan. doc_type: WorkPlan. target: [work plan path]. mode: composite. Review semantic traceability to all Design Docs, UI Spec when present, cross-layer boundary coverage, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
109
+
110
+ On `needs_revision` or `approved_with_conditions`, return to work-planner in update mode and re-review, up to a maximum of 2 revision iterations as defined by the `needs_revision` row in Approval Status Vocabulary. On `rejected`, halt and escalate to the user. Stop for batch approval only after WorkPlan review returns `approved` and the plan's `WorkPlan Review` section records `Status: approved` with `Conditions: none`.
111
+
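The final gate reads the plan itself, not only the reviewer's verdict. A minimal sketch of that check, assuming the plan's `WorkPlan Review` section records its state as `- **Status**:` and `- **Conditions**:` bullet lines; `isWorkPlanReviewed` is a hypothetical helper and its Markdown parsing is deliberately naive.

```typescript
// Hypothetical helper: reads the `## WorkPlan Review` section of a work plan
// and reports whether it counts as reviewed (Status: approved AND Conditions: none).
function isWorkPlanReviewed(planMarkdown: string): boolean {
  const lines = planMarkdown.split("\n");
  const start = lines.findIndex((l) => l.trim() === "## WorkPlan Review");
  if (start === -1) return false; // no review section: treat as not reviewed

  // The section runs until the next level-2 heading or the end of the file.
  let end = lines.findIndex((l, i) => i > start && l.startsWith("## "));
  if (end === -1) end = lines.length;
  const section = lines.slice(start + 1, end);

  // Returns the text after "- **Name**:" or undefined when the field is absent.
  const field = (name: string): string | undefined => {
    const prefix = `- **${name}**:`;
    const line = section.find((l) => l.trim().startsWith(prefix));
    return line?.trim().slice(prefix.length).trim();
  };

  return field("Status") === "approved" && field("Conditions") === "none";
}
```

An unfilled template value such as `pending|approved` fails the check, so a freshly created plan is never mistaken for a reviewed one.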
104
112
  ## Task Decomposition Phase
105
113
 
106
114
  task-decomposer follows standard decomposition from the work plan. The key addition is the **layer-aware naming convention**:
@@ -111,6 +111,7 @@ For each valid AC from Phase 1:
111
111
  - Happy path (1 test mandatory)
112
112
  - Error handling (only if user-visible error)
113
113
  - Edge cases (only if high business impact)
114
+ - Boundary path (behavior-changing AC only): when the AC can hold on the main path while a distinct branch, state, input class, lifecycle step, or fallback regresses, capture that boundary as a proof obligation. Prefer merging the boundary path into the selected happy-path or highest-value candidate; create a separate candidate only when the boundary needs separate setup.
114
115
 
115
116
  2. **Classify test level**:
116
117
  - Integration test candidate (feature-level interaction)
@@ -167,7 +168,8 @@ Value score and E2E selection rules are defined in **integration-e2e-testing ski
167
168
  4. Reserve 1 service-integration-e2e slot only when the journey needs real cross-service verification
168
169
  5. Fill remaining fixture-e2e budget with candidates that satisfy `Value Score >= 20`
169
170
  6. Fill remaining service-integration-e2e budget with candidates that satisfy `Value Score > 50`
170
- 7. If a lane emits no tests, return its generated file as `null` with a concrete lane-specific absence reason
171
+ 7. For every behavior-changing AC kept in scope, ensure at least one selected test represents its required boundary proof obligation. Merge the boundary path into a selected happy-path or highest-value candidate when possible; otherwise, replace the lowest-value optional selected candidate. When required boundary obligations exceed the budget and no optional candidate is replaceable, keep the hard budget limit and add the uncovered AC IDs and boundary paths to `boundaryProofGaps`.
172
+ 8. If a lane emits no tests, return its generated file as `null` with a concrete lane-specific absence reason
171
173
  ```
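Step 7 can be sketched as a pass over an already-selected test set. This TypeScript sketch is illustrative only: the `Candidate` shape, the `optional` flag, the `canMerge` predicate, and `enforceBoundaryProofs` are assumptions; the merge, then replace, then record order, the hard budget limit, and the `boundaryProofGaps` entry fields follow step 7 and the output format.

```typescript
interface Candidate {
  id: string;
  acId: string;
  valueScore: number;
  optional: boolean;           // assumption: false for mandatory picks and boundary proofs
  coveredBoundaries: string[]; // boundary paths this test traverses
}

interface BoundaryObligation { acId: string; boundaryPath: string; }

interface BoundaryProofGap {
  acId: string;
  boundaryPath: string;
  reason: "budget_insufficient_for_boundary_proof";
}

function enforceBoundaryProofs(
  selected: Candidate[],
  obligations: BoundaryObligation[],
  budget: number,
  canMerge: (c: Candidate, o: BoundaryObligation) => boolean, // hypothetical: false when separate setup is needed
): { selected: Candidate[]; boundaryProofGaps: BoundaryProofGap[] } {
  // Work on copies so the caller's selection is not mutated.
  const result: Candidate[] = selected.map((c) => ({ ...c, coveredBoundaries: [...c.coveredBoundaries] }));
  const gaps: BoundaryProofGap[] = [];
  const proves = (c: Candidate, o: BoundaryObligation) =>
    c.acId === o.acId && c.coveredBoundaries.includes(o.boundaryPath);

  for (const o of obligations) {
    if (result.some((c) => proves(c, o))) continue;

    // 1. Merge into the highest-value selected candidate for the same AC.
    const host = result
      .filter((c) => c.acId === o.acId && canMerge(c, o))
      .sort((a, b) => b.valueScore - a.valueScore)[0];
    if (host) {
      host.coveredBoundaries.push(o.boundaryPath);
      continue;
    }

    // 2. Otherwise add a dedicated boundary test, evicting the lowest-value
    //    optional pick (one that carries no boundary proof) when the budget is full.
    const boundaryTest: Candidate = {
      id: `${o.acId}:${o.boundaryPath}`,
      acId: o.acId,
      valueScore: 0,
      optional: false,
      coveredBoundaries: [o.boundaryPath],
    };
    if (result.length < budget) {
      result.push(boundaryTest);
      continue;
    }
    const victim = result
      .filter((c) => c.optional && !obligations.some((other) => proves(c, other)))
      .sort((a, b) => a.valueScore - b.valueScore)[0];
    if (victim) {
      result.splice(result.indexOf(victim), 1, boundaryTest);
      continue;
    }

    // 3. The budget is a hard limit: record the gap instead of exceeding it.
    gaps.push({ ...o, reason: "budget_insufficient_for_boundary_proof" });
  }
  return { selected: result, boundaryProofGaps: gaps };
}
```

Passing `canMerge` as a predicate keeps the sketch neutral about how merge feasibility is judged; the text only says a separate candidate is needed when the boundary requires separate setup.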
172
174
 
173
175
  **Output**: Final test set
@@ -192,6 +194,8 @@ Adapt comment syntax to the project's language when generating annotations.
192
194
  // @dependency: PaymentService, OrderRepository, Database
193
195
  // @real-dependency: OrderRepository, Database
194
196
  // @complexity: high
197
+ // Primary failure mode: payment succeeds but the order row is absent or unpersisted
198
+ // Proof obligation: assert order persistence after successful payment while keeping OrderRepository and Database real; only the external payment gateway may be mocked
195
199
  [Test: 'AC1: Successful payment creates persisted order with correct status']
196
200
 
197
201
  // AC1-error: "Payment failure shows user-friendly error message"
@@ -200,6 +204,8 @@ Adapt comment syntax to the project's language when generating annotations.
200
204
  // @category: core-functionality
201
205
  // @dependency: PaymentService, ErrorHandler
202
206
  // @complexity: medium
207
+ // Primary failure mode: payment failure still creates an order or hides the user-facing error
208
+ // Proof obligation: assert the visible error and the unchanged order state after a failed payment; mock only the external payment gateway failure
203
209
  [Test: 'AC1: Failed payment displays error without creating order']
204
210
  ```
205
211
 
@@ -221,6 +227,8 @@ Adapt comment syntax to the project's language when generating annotations.
221
227
  // @lane: fixture-e2e
222
228
  // @dependency: full-ui (mocked backend)
223
229
  // @complexity: medium
230
+ // Primary failure mode: undo banner appears but the dismissed card is not restored
231
+ // Proof obligation: assert browser-visible state before dismissal, after dismissal, and after undo using fixture-controlled backend state
224
232
  [Test: 'User Journey: Dismiss and undo restores the card']
225
233
  ```
226
234
 
@@ -242,6 +250,8 @@ Adapt comment syntax to the project's language when generating annotations.
242
250
  // @lane: service-integration-e2e
243
251
  // @dependency: full-system
244
252
  // @complexity: high
253
+ // Primary failure mode: checkout appears successful but the persisted order or confirmation event is missing
254
+ // Proof obligation: exercise the full local service stack and assert persisted order state plus confirmation event after checkout
245
255
  [Test: 'User Journey: Complete product purchase persists order and emits confirmation']
246
256
  ```
247
257
 
@@ -264,7 +274,8 @@ Adapt comment syntax to the project's language when generating annotations.
264
274
  "e2eAbsenceReason": {
265
275
  "fixtureE2e": "all_e2e_candidates_below_threshold",
266
276
  "serviceE2e": "no_real_service_dependency"
267
- }
277
+ },
278
+ "boundaryProofGaps": []
268
279
  }
269
280
  ```
270
281
 
@@ -285,7 +296,14 @@ Adapt comment syntax to the project's language when generating annotations.
285
296
  "e2eAbsenceReason": {
286
297
  "fixtureE2e": null,
287
298
  "serviceE2e": null
288
- }
299
+ },
300
+ "boundaryProofGaps": [
301
+ {
302
+ "acId": "[AC-XXX]",
303
+ "boundaryPath": "[branch/state/input/lifecycle/fallback/visibility path]",
304
+ "reason": "budget_insufficient_for_boundary_proof"
305
+ }
306
+ ]
289
307
  }
290
308
  ```
291
309
 
@@ -297,13 +315,15 @@ Each test case MUST have the following standard annotations for test implementat
297
315
  - **@lane**: integration | fixture-e2e | service-integration-e2e
298
316
  - **@dependency**: none | [component names] | full-ui (mocked backend) | full-system
299
317
  - **@complexity**: low | medium | high
318
+ - **Primary failure mode**: the specific regression that should make the implemented test fail
319
+ - **Proof obligation**: what the implemented test must assert to prove the claim, including the boundary to exercise, before/action/after state for state-changing claims, and which boundaries may be mocked with rationale. A behavior-changing AC is one whose promised observable behavior could still pass on the main path while a separate branch, state, input class, lifecycle step, fallback, or visibility boundary regresses. For behavior-changing ACs, name the boundary path the test must traverse when the main path alone would stay green through the regression
300
320
 
301
- These annotations are used when planning and prioritizing test implementation.
321
+ These annotations are used when planning and prioritizing test implementation. Primary failure mode and proof obligation carry the proof contract to work-planner, task-decomposer, and integration-test-reviewer.
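Because the two new annotations carry the proof contract downstream, a consumer could flag skeletons that omit them. A hedged TypeScript sketch, assuming `//` comment syntax and the `[Test: '...']` marker used in the skeleton examples; `checkProofAnnotations` is a hypothetical helper, not part of the package.

```typescript
// The two proof-contract labels from the annotation list.
const PROOF_LABELS = ["Primary failure mode", "Proof obligation"];

interface AnnotationReport {
  test: string;      // name taken from the [Test: '...'] marker
  missing: string[]; // proof labels that are absent or empty
}

// Hypothetical checker: collects `//` comment lines until the next
// [Test: '...'] marker, then checks that both proof labels carry text.
function checkProofAnnotations(skeleton: string): AnnotationReport[] {
  const reports: AnnotationReport[] = [];
  let comments: string[] = [];
  for (const raw of skeleton.split("\n")) {
    const line = raw.trim();
    if (line.startsWith("//")) {
      comments.push(line.slice(2).trim());
      continue;
    }
    const marker = line.match(/^\[Test: '(.*)'\]$/);
    if (!marker) continue; // blank lines and other content are ignored
    const missing = PROOF_LABELS.filter(
      (label) =>
        !comments.some((c) => c.startsWith(`${label}:`) && c.slice(label.length + 1).trim() !== ""),
    );
    reports.push({ test: marker[1], missing });
    comments = [];
  }
  return reports;
}
```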
302
322
 
303
323
  ## Constraints and Quality Standards
304
324
 
305
325
  **Mandatory Compliance**:
306
- - Output test skeletons only: verification points, expected results, and pass criteria
326
+ - Output test skeletons only: verification points, expected results, pass criteria, primary failure mode, and proof obligation
307
327
  - Downstream consumers treat these skeletons as design artifacts rather than runnable tests
308
328
  - Clearly state verification points, expected results, and pass criteria for each test
309
329
  - Preserve original AC statements in comments (ensure traceability)
@@ -53,7 +53,7 @@ Skill Status:
53
53
  ## Input Parameters
54
54
 
55
55
  - **designDoc**: Path to the Design Doc (or multiple paths for fullstack features)
56
- - **implementationFiles**: List of files to review (or git diff range)
56
+ - **implementationFiles**: List of files to review (or git diff range). When a Work Plan is provided and implementationFiles is omitted or ambiguous, derive the review file set from the plan's `Review Scope` value; for revision plans, use the recorded base branch plus diff range.
57
57
  - **reviewMode**: `full` (default) | `acceptance` | `architecture`
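A sketch of the fallback order for the review file set, assuming the plan's free-text `Review Scope` has already been parsed into either a planned file list or a base branch plus diff range. `resolveReviewFiles` and both union types are hypothetical; detecting an "ambiguous" `implementationFiles` value is not modeled, so only an omitted or empty list triggers the fallback.

```typescript
// Assumed parsed forms of the plan's Review Scope value.
type ReviewScope =
  | { kind: "planned_files"; files: string[] }
  | { kind: "revision"; baseBranch: string; diffRange: string };

type ReviewFileSource =
  | { kind: "files"; files: string[] }
  | { kind: "diff"; baseBranch: string; diffRange: string };

// Hypothetical resolver: explicit implementationFiles win; otherwise fall back to Review Scope.
function resolveReviewFiles(
  implementationFiles: string[] | undefined,
  reviewScope: ReviewScope | undefined,
): ReviewFileSource {
  if (implementationFiles && implementationFiles.length > 0) {
    return { kind: "files", files: implementationFiles };
  }
  if (!reviewScope) {
    throw new Error("No implementationFiles and no Work Plan Review Scope to derive them from");
  }
  // Revision plans review the recorded base branch plus diff range.
  return reviewScope.kind === "planned_files"
    ? { kind: "files", files: reviewScope.files }
    : { kind: "diff", baseBranch: reviewScope.baseBranch, diffRange: reviewScope.diffRange };
}
```

A diff-range result could then be expanded to file names with `git diff --name-only`, which accepts a revision range.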
58
58
 
59
59
  ## Workflow
@@ -75,6 +75,9 @@ For each acceptance criterion extracted in Step 1:
75
75
  - Determine status: fulfilled / partially fulfilled / unfulfilled
76
76
  - Record the file path and relevant code location
77
77
  - Note any deviations from the Design Doc specification
78
+ - For behavior-changing ACs, confirm the evidence covers main and boundary paths. Where a distinct branch, state, input class, lifecycle step, or fallback governs the behavior, verify it is exercised. Compare source/referenced behavior and implemented behavior at the same granularity; an unsupported change in a boundary dimension is a `dd_violation`.
79
+ - Confirm the implementation keeps the core mechanism the AC, Design Doc, or referenced materials require. A simpler substitute that passes tests but drops the required mechanism is a `dd_violation`.
80
+ - For changes to persisted, shared, or externally observable state, identify the publication boundary where the new state becomes observable to another process, component, user, or later step. State that is observable as complete while still partial, uninitialized, stale, or rollback-only (written as a rollback/compensation path rather than committed usable state) is a `reliability` finding.
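The publication-boundary check is easiest to see with shared state that other components observe. In this hedged TypeScript sketch, `SharedOrders`, its listener mechanism, and both `create*` methods are hypothetical: `createUnsafe` publishes an order already marked complete and fills it afterward, so a subscriber observes complete-but-empty state (the `reliability` case), while `createSafe` assembles the record before crossing the boundary.

```typescript
interface Order { id: string; items: string[]; complete: boolean; }
type Listener = (order: Readonly<Order>) => void;

class SharedOrders {
  private readonly orders = new Map<string, Order>();
  private readonly listeners: Listener[] = [];

  subscribe(listener: Listener): void {
    this.listeners.push(listener);
  }

  get(id: string): Readonly<Order> | undefined {
    return this.orders.get(id);
  }

  // Publication boundary: the record enters shared state and observers see a snapshot.
  private publish(order: Order): void {
    this.orders.set(order.id, order);
    for (const l of this.listeners) l({ ...order, items: [...order.items] });
  }

  // Flawed: observers see an order marked complete while its items are still missing.
  createUnsafe(id: string, items: string[]): void {
    const order: Order = { id, items: [], complete: true };
    this.publish(order);
    order.items.push(...items);
  }

  // Safer: assemble the full record privately, then cross the boundary once.
  createSafe(id: string, items: string[]): void {
    this.publish({ id, items: [...items], complete: true });
  }
}
```

`get` returns a fully populated order in both cases, which is why this defect is found by locating the publication boundary rather than by inspecting final state.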
78
81
 
79
82
  #### 2-2. Identifier Verification
80
83
  For each identifier specification extracted in Step 1:
@@ -124,6 +127,7 @@ Read error paths and boundary handling directly in the code:
124
127
  - Meaningful coverage: at least one assertion exercises the AC's observable behavior
125
128
  - Coverage gap: `skip`/`xit` on tests that should run, TODO/placeholder-only bodies, always-true assertions (for example `expect(true).toBe(true)` or `expect(arr.length).toBeGreaterThanOrEqual(0)`), 0-match runner reports, or grep-only matches without behavior verification
126
129
  - Intentional absence: meaningful when absence is the AC expectation
130
+ - Proof adequacy: a covered test should fail under the AC's primary failure mode and should exercise the claimed boundary rather than a substitute input that bypasses it. A test that would stay green if the claimed behavior regressed is a `coverage_gap` with rationale naming the unproven failure mode.
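Proof adequacy is essentially a hand-written mutation check: run the test against the correct behavior and against a variant exhibiting the primary failure mode. The shipping-threshold AC, the `correct` and `regressed` implementations, and `provesClaim` below are hypothetical and serve only to show a main-path test that stays green under a boundary regression.

```typescript
// Hypothetical AC: "Shipping is free for orders of 50 or more; otherwise it costs 5."
// The boundary that matters is the exact threshold (subtotal === 50).
type ShippingFn = (subtotal: number) => number;

const correct: ShippingFn = (subtotal) => (subtotal >= 50 ? 0 : 5);
// Primary failure mode: the threshold comparison regresses from >= to >.
const regressed: ShippingFn = (subtotal) => (subtotal > 50 ? 0 : 5);

// A test is modeled as a predicate over the implementation: true means "passes".
type ProofTest = (impl: ShippingFn) => boolean;

const mainPathOnly: ProofTest = (impl) => impl(120) === 0 && impl(10) === 5;
const withBoundary: ProofTest = (impl) => mainPathOnly(impl) && impl(50) === 0;

// Adequate proof: passes on the correct implementation AND fails under the failure mode.
function provesClaim(test: ProofTest): boolean {
  return test(correct) && !test(regressed);
}

console.log(provesClaim(mainPathOnly)); // false: stays green under the regression, a coverage_gap
console.log(provesClaim(withBoundary)); // true
```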
127
131
 
128
132
  Classify each quality finding into one of:
129
133
  - `dd_violation`: implementation deviates from the Design Doc