codex-workflows 0.6.5 → 0.6.6

@@ -5,8 +5,16 @@ Type: feature|fix|refactor
  Estimated Duration: X days
  Estimated Impact: X files
  Related Issue/PR: #XXX (if any)
+ Review Scope: [planned-files scope derived from Design Doc and task targets; for a revision plan over existing work, base branch + diff range]
  Implementation Readiness: pending
 
+ ## WorkPlan Review
+
+ This section records the review gate state for the exact plan content. Set `Status: pending` when the plan is created or materially updated. The orchestrator treats only `Status: approved` with `Conditions: none` as reviewed.
+
+ - **Status**: pending|approved
+ - **Conditions**: none
+
  ## Related Documents
  - Design Doc(s):
  - [docs/design/XXX.md]
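The review gate added in the hunk above can be read as a strict predicate over the plan text. The following is a minimal Python sketch, assuming the section is stored as Markdown exactly as the template shows. The function name `is_plan_reviewed` and the regex-based parsing are illustrative assumptions and are not part of the package.

```python
import re


def is_plan_reviewed(plan_text: str) -> bool:
    """Return True only for `Status: approved` together with `Conditions: none`.

    Hypothetical helper: the package does not ship this function.
    """
    # Isolate the "## WorkPlan Review" section, up to the next "## " heading.
    section = re.search(
        r"^## WorkPlan Review\s*$(.*?)(?=^## |\Z)",
        plan_text,
        re.M | re.S,
    )
    if section is None:
        return False  # an absent section counts as unreviewed

    # Collect "- **Key**: value" bullets from the section body.
    fields = dict(
        re.findall(r"^- \*\*(\w+)\*\*:[ \t]*(.+?)[ \t]*$", section.group(1), re.M)
    )
    return fields.get("Status") == "approved" and fields.get("Conditions") == "none"
```

Under this reading, the untouched template value `pending|approved` is treated as unreviewed, because only the literal value `approved` passes.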
@@ -35,6 +43,10 @@ Repeat this block for each Design Doc when multiple Design Docs exist. Preserve
  - **Success criteria**: [extracted from Design Doc]
  - **Failure response**: [extracted from Design Doc]
 
+ ### Proof Strategy
+ - **Proof obligation source**: [test skeleton annotations (`Primary failure mode`, `Proof obligation`) when skeletons exist; otherwise each acceptance criterion's primary failure mode derived from the Design Doc]
+ - **Per-task propagation**: every task that implements or verifies a claim records the AC ID or claim identifier in Proof Obligations (see task template) so downstream review can judge whether tests prove the claim, not merely run
+
  ## Quality Assurance Mechanisms (from Design Docs)
 
  Adopted quality gates for the change area. Each task in this plan must satisfy the applicable mechanisms.
@@ -69,6 +81,21 @@ Map each Design Doc technical requirement to the task or phase that covers it. U
  - Merge duplicate restatements of the same obligation from multiple DD sections into one row and cite the primary section in `DD Section`
  - Keep `scope-boundary` rows concrete: name the protected file group, component boundary, contract, or workflow that must remain unchanged
 
+ ## Failure Mode Checklist
+
+ Domain-independent failure categories this implementation must guard against. Enumerate all eight categories, mark which apply, and list a covering task for each that applies; keep category names generic and place project-specific detail in task descriptions or notes.
+
+ | Category | Applies? | Covered By Task(s) |
+ |----------|----------|--------------------|
+ | same-value | yes/no | [P1-T1] |
+ | no-op | yes/no | |
+ | empty input | yes/no | |
+ | invalid option | yes/no | |
+ | missing config | yes/no | |
+ | unavailable boundary | yes/no | |
+ | shared-state dependency | yes/no | |
+ | rollback-only visibility | yes/no | |
+
  ## UI Spec Component -> Task Mapping
 
  Include this section when a UI Spec is among the inputs. Map each UI component section to the task(s) that implement it so task-decomposer can pass the exact UI Spec context to executor tasks. Omit this section when no UI Spec exists.
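The Failure Mode Checklist rules in the hunk above (all eight categories enumerated, each applicable one covered by a task) lend themselves to a mechanical check. The sketch below is one possible reading in Python. The function name `check_failure_mode_table`, the problem message strings, and the assumption that the table is a plain three-column Markdown table are all hypothetical.

```python
# The eight category names are copied from the template in the diff.
CATEGORIES = [
    "same-value",
    "no-op",
    "empty input",
    "invalid option",
    "missing config",
    "unavailable boundary",
    "shared-state dependency",
    "rollback-only visibility",
]


def check_failure_mode_table(table_md: str) -> list:
    """Return a list of problems found in a Failure Mode Checklist table."""
    rows = {}
    for line in table_md.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 3:
            continue  # not a three-column table row
        if cells[0] == "Category" or set(cells[0]) <= {"-"}:
            continue  # header or separator row
        rows[cells[0]] = (cells[1].lower(), cells[2])

    problems = []
    for category in CATEGORIES:
        if category not in rows:
            problems.append(f"missing category: {category}")
            continue
        applies, tasks = rows[category]
        if applies not in ("yes", "no"):
            problems.append(f"undecided category: {category}")
        elif applies == "yes" and not tasks:
            problems.append(f"no covering task: {category}")
    return problems
```

A row still holding the template placeholder `yes/no` is reported as undecided, since the template asks the planner to mark which categories apply.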
@@ -58,9 +58,21 @@ Brief observations recorded after reading Investigation Targets:
  - **Failure response**: [What to do if verification fails]
  - **Verification level**: [L1 unit/local verification, L2 integration verification, or L3 end-to-end verification]
 
+ ## Proof Obligations
+ (Include one entry per acceptance criterion, user journey, boundary, or state transition this task implements or verifies. Derive from test skeleton annotations when present; otherwise derive from the acceptance criterion's primary failure mode.)
+ - **AC / Claim ID**: [AC-XXX, user journey identifier, boundary identifier, or task claim identifier]
+ - **Claim**: [behavior the acceptance criterion or task promises]
+ - **Primary failure mode**: [regression the test should turn red on]
+ - **Boundary to exercise**: [public/integration/browser/process/service/persistence boundary, or "in-process unit"]
+ - **State assertion**: [observable state before -> action -> after for state-changing claims; "N/A" otherwise]
+ - **Mock boundary rationale**: [which external boundaries may be mocked and why; "none" when all real]
+ - **Residual**: [what this task-level proof leaves unestablished, and which later task or phase closes it]
+
  ## Completion Criteria
+ - [ ] All listed AC / Claim IDs are implemented or verified by this task
  - [ ] All added tests pass
  - [ ] Operation verified per Operation Verification Methods above
+ - [ ] Each Proof Obligation is met: the test turns red under its primary failure mode and exercises the stated boundary
  - [ ] Deliverables created (for research/design tasks)
  - [ ] When Binding Decisions exist, every Compliance Check evaluates to `Y` against the final implementation, with evidence recorded in Investigation Notes
 
@@ -55,14 +55,28 @@ Analyze task file existence state and determine the action required:
  | State | Criteria | Next Action |
  |-------|----------|-------------|
  | Tasks exist | Consumed Task Set is non-empty | User's execution instruction serves as batch approval -> Enter autonomous execution immediately |
- | No tasks + plan exists | Consumed Task Set is empty but plan exists | Confirm with user -> spawn task-decomposer |
+ | No tasks + plan exists + reviewed plan | Consumed Task Set is empty and WorkPlan Review records `Status: approved`, `Conditions: none` | Confirm with user -> spawn task-decomposer |
+ | No tasks + small simplified plan | Consumed Task Set is empty, plan exists, and the plan references no Design Doc | Confirm with user -> spawn task-decomposer |
+ | No tasks + plan exists + unreviewed plan | Consumed Task Set is empty, the plan references a Design Doc, and WorkPlan Review is absent, pending, conditional, or not approved | Run work plan review, then confirm with user -> spawn task-decomposer |
  | Neither exists | No plan or task files | Error: Prerequisites not met |
 
  ## Task Decomposition Phase (Conditional)
 
- When task files don't exist:
+ When task files don't exist, the plan references a Design Doc, and the WorkPlan Review section is absent, pending, conditional, or not approved:
 
- ### 1. User Confirmation
+ ### 1. Work Plan Review
+
+ Spawn document-reviewer agent: "Review the work plan before task decomposition. doc_type: WorkPlan. target: docs/plans/[plan-name].md. mode: composite. Review semantic traceability to the Design Doc, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
+
+ Branch on `verdict.decision`:
+ - `approved` -> spawn work-planner in update mode once to record `Status: approved` and `Conditions: none` in WorkPlan Review, then continue to user confirmation
+ - `approved_with_conditions` -> stop before task decomposition and report that the work plan needs update via recipe-plan
+ - `needs_revision` -> stop before task decomposition and report that the work plan needs update via recipe-plan
+ - `rejected` -> stop before task decomposition and present the blocking findings to the user
+
+ When task files don't exist and the WorkPlan Review section records `Status: approved` and `Conditions: none`, skip Work Plan Review and continue to user confirmation.
+
+ ### 2. User Confirmation
  ```
  No task files found.
  Work plan: docs/plans/[plan-name].md
@@ -70,10 +84,10 @@ Work plan: docs/plans/[plan-name].md
  Generate tasks from the work plan? (y/n):
  ```
 
- ### 2. Task Decomposition (if approved)
+ ### 3. Task Decomposition (if approved)
  Spawn task-decomposer agent: "Read work plan at docs/plans/[plan-name].md and decompose into atomic tasks. Output: Individual task files in docs/plans/tasks/. Granularity: 1 task = 1 commit = independently executable."
 
- ### 3. Verify Generation
+ ### 4. Verify Generation
  Recompute the Consumed Task Set and verify it is non-empty.
 
  ## Pre-execution Checklist
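The revised state table and the `verdict.decision` branches in the build recipe above can be summarised as two small decision functions. This is a sketch in Python under stated assumptions: the function names, parameter names, and returned action strings are hypothetical labels, and only the branching itself is taken from the diff.

```python
from typing import Optional


def next_action(
    tasks_exist: bool,
    plan_exists: bool,
    references_design_doc: bool,
    review_status: Optional[str],
    review_conditions: Optional[str],
) -> str:
    """Map the task/plan state to the next orchestrator action."""
    if tasks_exist:
        return "execute"  # execution instruction serves as batch approval
    if not plan_exists:
        return "error: prerequisites not met"
    reviewed = review_status == "approved" and review_conditions == "none"
    if reviewed or not references_design_doc:
        # Reviewed plan, or small simplified plan with no Design Doc.
        return "confirm -> decompose"
    # Review section absent, pending, conditional, or not approved.
    return "review -> confirm -> decompose"


def after_review(decision: str) -> str:
    """Branch on verdict.decision inside a standalone build recipe."""
    if decision == "approved":
        return "record approval -> confirm"
    if decision in ("approved_with_conditions", "needs_revision"):
        return "stop: plan needs update"
    if decision == "rejected":
        return "stop: present blocking findings"
    raise ValueError(f"unknown decision: {decision}")
```

Note that in this recipe both `approved_with_conditions` and `needs_revision` stop before decomposition rather than looping back to the planner.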
@@ -123,7 +137,7 @@ VERIFY approval status before proceeding. Once confirmed, INITIATE autonomous ex
  ## Post-Implementation Verification (After All Tasks Complete)
 
  After all task cycles finish, collect all `filesModified` from every task-executor response (deduplicated), then run both verification agents before the completion report:
- 1. Spawn code-verifier agent: "Verify implementation consistency against the Design Doc. `doc_type: design-doc`. `document_path`: [path]. `code_paths`: [collected filesModified list]."
+ 1. Spawn code-verifier agent: "Verify implementation consistency against the Design Doc. `doc_type: design-doc`. `document_path`: [path]. `code_paths`: [collected filesModified list]. Work Plan Review Scope: [Review Scope value from the active work plan, used only to confirm the collected file set is complete]."
  2. Spawn security-reviewer agent: "Design Doc: [path]. Implementation files: [collected filesModified list]. Review security compliance."
  3. Consolidate results:
  - code-verifier passes when `summary.status` is `consistent` or `mostly_consistent`
@@ -55,14 +55,28 @@ Analyze task file existence state and determine the action required:
  | State | Criteria | Next Action |
  |-------|----------|-------------|
  | Tasks exist | Consumed Task Set is non-empty | User's execution instruction serves as batch approval -> Enter autonomous execution immediately |
- | No tasks + plan exists | Consumed Task Set is empty but plan exists | Confirm with user -> spawn task-decomposer |
+ | No tasks + plan exists + reviewed plan | Consumed Task Set is empty and WorkPlan Review records `Status: approved`, `Conditions: none` | Confirm with user -> spawn task-decomposer |
+ | No tasks + small simplified plan | Consumed Task Set is empty, plan exists, and the plan references no Design Doc | Confirm with user -> spawn task-decomposer |
+ | No tasks + plan exists + unreviewed plan | Consumed Task Set is empty, the plan references a Design Doc, and WorkPlan Review is absent, pending, conditional, or not approved | Run work plan review, then confirm with user -> spawn task-decomposer |
  | Neither exists | No plan or task files | Error: Prerequisites not met |
 
  ## Task Decomposition Phase (Conditional)
 
- When task files don't exist:
+ When task files don't exist, the plan references a Design Doc, and the WorkPlan Review section is absent, pending, conditional, or not approved:
 
- ### 1. User Confirmation
+ ### 1. Work Plan Review
+
+ Spawn document-reviewer agent: "Review the frontend work plan before task decomposition. doc_type: WorkPlan. target: docs/plans/[plan-name].md. mode: composite. Review semantic traceability to the Design Doc and UI Spec, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
+
+ Branch on `verdict.decision`:
+ - `approved` -> spawn work-planner in update mode once to record `Status: approved` and `Conditions: none` in WorkPlan Review, then continue to user confirmation
+ - `approved_with_conditions` -> stop before task decomposition and report that the work plan needs update via recipe-front-plan
+ - `needs_revision` -> stop before task decomposition and report that the work plan needs update via recipe-front-plan
+ - `rejected` -> stop before task decomposition and present the blocking findings to the user
+
+ When task files don't exist and the WorkPlan Review section records `Status: approved` and `Conditions: none`, skip Work Plan Review and continue to user confirmation.
+
+ ### 2. User Confirmation
  ```
  No task files found.
  Work plan: docs/plans/[plan-name].md
@@ -70,10 +84,10 @@ Work plan: docs/plans/[plan-name].md
  Generate tasks from the work plan? (y/n):
  ```
 
- ### 2. Task Decomposition (if approved)
+ ### 3. Task Decomposition (if approved)
  Spawn task-decomposer agent: "Read work plan at docs/plans/[plan-name].md and decompose into atomic tasks. Output: Individual task files in docs/plans/tasks/. Granularity: 1 task = 1 commit = independently executable"
 
- ### 3. Verify Generation
+ ### 4. Verify Generation
  Recompute the Consumed Task Set and verify it is non-empty.
 
  ## Pre-execution Checklist
@@ -131,7 +145,7 @@ VERIFY approval status before proceeding. Once confirmed, INITIATE autonomous ex
  ## Post-Implementation Verification (After All Tasks Complete)
 
  After all task cycles finish, collect all `filesModified` from every task-executor-frontend response (deduplicated), then run both verification agents before the completion report:
- 1. Spawn code-verifier agent: "Verify implementation consistency against the Design Doc. `doc_type: design-doc`. `document_path`: [path]. `code_paths`: [collected filesModified list]."
+ 1. Spawn code-verifier agent: "Verify implementation consistency against the Design Doc. `doc_type: design-doc`. `document_path`: [path]. `code_paths`: [collected filesModified list]. Work Plan Review Scope: [Review Scope value from the active work plan, used only to confirm the collected file set is complete]."
  2. Spawn security-reviewer agent: "Design Doc: [path]. Implementation files: [collected filesModified list]. Review security compliance."
  3. Consolidate results:
  - code-verifier passes when `summary.status` is `consistent` or `mostly_consistent`
@@ -20,6 +20,7 @@ description: "Create frontend work plan from design document with test skeleton
  **Execution Method**:
  - Test skeleton generation -> performed by acceptance-test-generator
  - Work plan creation -> performed by work-planner
+ - Work plan review -> performed by document-reviewer
 
  Orchestrator spawns agents and passes structured data between them.
 
@@ -29,6 +30,7 @@ Orchestrator spawns agents and passes structured data between them.
  - Design document selection
  - Test skeleton generation with acceptance-test-generator
  - Work plan creation with work-planner
+ - Work plan review with document-reviewer
  - Plan approval obtainment
 
  **Responsibility Boundary**: This skill completes with work plan approval.
@@ -50,6 +52,15 @@ Spawn acceptance-test-generator agent: "Generate test skeletons from Design Doc
  ### Step 3: Work Plan Creation
  Spawn work-planner agent: "Create work plan from Design Doc at [path]. Integration test file: [path from step 2]. fixture-e2e test file: [path from step 2 or null]. service-integration-e2e test file: [path from step 2 or null]. E2E absence reasons by lane: [values from step 2 when an E2E lane is null]. Integration tests are created with each phase implementation, fixture-e2e runs alongside UI implementation, service-integration-e2e runs only in the final phase when a service E2E file exists. Include `Implementation Readiness: pending` in the work plan header."
 
+ ### Step 4: Work Plan Review
+ Spawn document-reviewer agent: "Review the frontend work plan. doc_type: WorkPlan. target: docs/plans/[plan-name].md. mode: composite. Review semantic traceability to the Design Doc and UI Spec, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
+
+ Branch on `verdict.decision`:
+ - `approved` -> spawn work-planner in update mode once to record `Status: approved` and `Conditions: none` in WorkPlan Review, then proceed to Step 5
+ - `approved_with_conditions` or `needs_revision` -> spawn work-planner in update mode with the findings or conditions, then repeat Step 4. Use max 2 revision iterations as defined by the `needs_revision` row in subagents-orchestration-guide Approval Status Vocabulary.
+ - `rejected` -> stop and present the blocking findings to the user.
+
+ ### Step 5: Plan Approval
  **[STOP -- BLOCKING]** Interact with user to complete plan and obtain approval for plan content. Clarify specific implementation steps and risks.
  **CANNOT proceed until user explicitly approves the work plan.**
 
@@ -60,6 +71,7 @@ ENFORCEMENT: Plan content MUST be approved before declaring completion. Unapprov
  - [ ] Design document selected
  - [ ] Test skeletons generated
  - [ ] Work plan created
+ - [ ] Work plan reviewed via document-reviewer
  - [ ] User approved plan content
 
  ## Output Example
@@ -31,12 +31,13 @@ Design Doc (uses most recent if omitted): $ARGUMENTS
 
  ### 1. Prerequisite Check
  Identify the Design Doc in docs/design/ and check implementation files changed from the default branch (detect via `git symbolic-ref refs/remotes/origin/HEAD` or fall back to current branch diff).
+ If a single active work plan is explicitly provided or unambiguously resolved for that Design Doc, read its `Review Scope` line. Otherwise set `Work Plan: none` and `Review Scope: none`; do not infer.
 
  **[STOP -- BLOCKING]** If no Design Doc or implementation files found, notify user and halt.
  **CANNOT proceed without both a Design Doc and implementation files.**
 
  ### 2. Execute code-reviewer
- Spawn code-reviewer agent: "Validate Design Doc compliance for [design-doc-path]. Implementation files: [git diff file list]. Review mode: full. Return structured JSON report per your Output Format specification."
+ Spawn code-reviewer agent: "Validate Design Doc compliance for [design-doc-path]. Work Plan: [resolved work plan path or none]. Review Scope: [literal Review Scope value or none]. Implementation files: [git diff file list]. Review mode: full. Return structured JSON report per your Output Format specification."
 
  **Store output as**: `$STEP_2_OUTPUT`
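The "do not infer" rule for `Review Scope` in the prerequisite check above amounts to an all-or-nothing resolution. The following Python sketch shows one way to express it, assuming the caller has already gathered the candidate work plans for the Design Doc as a mapping of path to file text. The function name `resolve_review_scope`, that input shape, and the handling of a plan that lacks a `Review Scope` line are assumptions for illustration.

```python
import re
from typing import Dict, Tuple


def resolve_review_scope(candidates: Dict[str, str]) -> Tuple[str, str]:
    """Return (work_plan, review_scope) to pass to code-reviewer.

    `candidates` maps work plan path -> file text (hypothetical input shape).
    """
    if len(candidates) != 1:
        # Zero or several candidates: ambiguous, so nothing is inferred.
        return ("none", "none")

    path, text = next(iter(candidates.items()))
    match = re.search(r"^Review Scope:[ \t]*(.+?)[ \t]*$", text, re.M)
    # Assumption: a plan without the line keeps its path but reports no scope.
    return (path, match.group(1) if match else "none")
```

The returned scope is the literal line value, which matches the prompt field "literal Review Scope value or none".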
 
@@ -65,14 +65,28 @@ Analyze task file existence state and determine the action required:
  | State | Criteria | Next Action |
  |-------|----------|-------------|
  | Tasks exist | Consumed Task Set is non-empty | User's execution instruction serves as batch approval -> Enter autonomous execution immediately |
- | No tasks + plan exists | Consumed Task Set is empty but plan exists | Confirm with user -> spawn task-decomposer |
+ | No tasks + plan exists + reviewed plan | Consumed Task Set is empty and WorkPlan Review records `Status: approved`, `Conditions: none` | Confirm with user -> spawn task-decomposer |
+ | No tasks + small simplified plan | Consumed Task Set is empty, plan exists, and the plan references no Design Doc | Confirm with user -> spawn task-decomposer |
+ | No tasks + plan exists + unreviewed plan | Consumed Task Set is empty, the plan references a Design Doc, and WorkPlan Review is absent, pending, conditional, or not approved | Run work plan review, then confirm with user -> spawn task-decomposer |
  | Neither exists | No plan or task files | Error: Prerequisites not met |
 
  ## Task Decomposition Phase (Conditional)
 
- When task files don't exist:
+ When task files don't exist, the plan references a Design Doc, and the WorkPlan Review section is absent, pending, conditional, or not approved:
 
- ### 1. User Confirmation
+ ### 1. Work Plan Review
+
+ Spawn document-reviewer agent: "Review the fullstack work plan before task decomposition. doc_type: WorkPlan. target: docs/plans/[plan-name].md. mode: composite. Review semantic traceability to all Design Docs, UI Spec when present, cross-layer boundary coverage, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
+
+ Branch on `verdict.decision`:
+ - `approved` -> spawn work-planner in update mode once to record `Status: approved` and `Conditions: none` in WorkPlan Review, then continue to user confirmation
+ - `approved_with_conditions` -> stop before task decomposition and report that the work plan needs update via recipe-plan or the fullstack planning flow
+ - `needs_revision` -> stop before task decomposition and report that the work plan needs update via recipe-plan or the fullstack planning flow
+ - `rejected` -> stop before task decomposition and present the blocking findings to the user
+
+ When task files don't exist and the WorkPlan Review section records `Status: approved` and `Conditions: none`, skip Work Plan Review and continue to user confirmation.
+
+ ### 2. User Confirmation
  ```
  No task files found.
  Work plan: docs/plans/[plan-name].md
@@ -80,10 +94,10 @@ Work plan: docs/plans/[plan-name].md
  Generate tasks from the work plan? (y/n):
  ```
 
- ### 2. Task Decomposition (if approved)
+ ### 3. Task Decomposition (if approved)
  Spawn task-decomposer agent: "Read work plan at docs/plans/[plan-name].md and decompose into atomic tasks. Output: Individual task files in docs/plans/tasks/. Granularity: 1 task = 1 commit = independently executable. Use layer-aware naming: {plan}-backend-task-{n}.md, {plan}-frontend-task-{n}.md based on target file paths."
 
- ### 3. Verify Generation
+ ### 4. Verify Generation
  Recompute the Consumed Task Set and verify it is non-empty.
 
  ## Pre-execution Checklist
@@ -141,7 +155,7 @@ VERIFY approval status before proceeding. Once confirmed, INITIATE autonomous ex
  ## Post-Implementation Verification (After All Tasks Complete)
 
  After all task cycles finish, collect all `filesModified` from every task-executor/task-executor-frontend response (deduplicated), then run both verification agents before the completion report:
- 1. Spawn code-verifier once per Design Doc: "Verify implementation consistency against the Design Doc. `doc_type: design-doc`. `document_path`: [single design doc path]. `code_paths`: [collected filesModified list]."
+ 1. Spawn code-verifier once per Design Doc: "Verify implementation consistency against the Design Doc. `doc_type: design-doc`. `document_path`: [single design doc path]. `code_paths`: [collected filesModified list]. Work Plan Review Scope: [Review Scope value from the active work plan, used only to confirm the collected file set is complete]."
  2. Spawn security-reviewer agent: "Design Doc: [path(s)]. Implementation files: [collected filesModified list]. Review security compliance."
  3. Consolidate results:
  - each code-verifier run passes when `summary.status` is `consistent` or `mostly_consistent`
@@ -69,7 +69,7 @@ Follow subagents-orchestration-guide skill Large/Medium/Small scale flow exactly
  **STEP 3**: Spawn technical-designer-frontend agent → spawn document-reviewer agent → spawn design-sync agent.
  **[STOP — BLOCKING]** Present Frontend Design Doc for user approval. **CANNOT proceed until user explicitly confirms.**
 
- **STEP 4**: Spawn acceptance-test-generator agent → spawn work-planner agent.
+ **STEP 4**: Spawn acceptance-test-generator agent → spawn work-planner agent → spawn document-reviewer agent with `doc_type: WorkPlan`.
  **[STOP — BLOCKING]** Present Work Plan for user approval. **CANNOT proceed until user explicitly confirms.**
 
  **STEP 5**: Run implementation readiness preflight.
@@ -33,6 +33,7 @@ ENFORCEMENT: Work-planner spawned without test skeleton data (when tests were re
  - Design document selection
  - Test skeleton generation with acceptance-test-generator
  - Work plan creation with work-planner
+ - Work plan review with document-reviewer
  - Plan approval obtainment
 
  **Responsibility Boundary**: This skill completes with work plan approval.
@@ -53,7 +54,18 @@ Present options if multiple exist (can be specified with $ARGUMENTS).
 
  ### Step 3: Work Plan Creation
  - Spawn work-planner agent: "Create work plan from design document at [design-doc-path]. Include deliverables from previous process according to subagents-orchestration-guide skill coordination specification. If `generatedFiles.fixtureE2e` or `generatedFiles.serviceE2e` is null, use the corresponding `e2eAbsenceReason` and accept the null E2E lane as a valid planning input. Include `Implementation Readiness: pending` in the work plan header."
- - Interact with user to complete plan and obtain approval for plan content
+
+ ### Step 4: Work Plan Review
+ Spawn document-reviewer agent: "Review the work plan. doc_type: WorkPlan. target: docs/plans/[plan-name].md. mode: composite. Review semantic traceability to the Design Doc, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
+
+ Branch on `verdict.decision`:
+ - `approved` -> spawn work-planner in update mode once to record `Status: approved` and `Conditions: none` in WorkPlan Review, then proceed to Step 5
+ - `approved_with_conditions` or `needs_revision` -> spawn work-planner in update mode with the findings or conditions, then repeat Step 4. Use max 2 revision iterations as defined by the `needs_revision` row in subagents-orchestration-guide Approval Status Vocabulary.
+ - `rejected` -> stop and present the blocking findings to the user.
+
+ ### Step 5: Plan Approval
+ - Present the reviewed work plan to the user for batch approval
+ - If the user requests changes, spawn work-planner in update mode and re-run Step 4
  - Clarify specific implementation steps and risks
 
  **Scope**: Up to work plan creation and obtaining approval for plan content.
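The Step 4 loop in the planning recipe above (review, revise on `approved_with_conditions` or `needs_revision`, at most 2 revision iterations) can be sketched as a bounded loop. In this Python sketch the three callables stand in for spawning document-reviewer and work-planner and are hypothetical. What happens once the revision budget is exhausted is defined in the Approval Status Vocabulary, which this diff does not show, so the `escalate_to_user` outcome below is an assumption.

```python
from typing import Callable

MAX_REVISIONS = 2  # "max 2 revision iterations" from the recipe text


def run_review_gate(
    review: Callable[[], dict],
    revise: Callable[[dict], None],
    record_approval: Callable[[], None],
) -> str:
    """Drive the WorkPlan review loop and return the resulting outcome label."""
    revisions = 0
    while True:
        verdict = review()  # stands in for spawning document-reviewer
        decision = verdict["decision"]
        if decision == "approved":
            record_approval()  # work-planner update mode, run once
            return "proceed_to_plan_approval"
        if decision == "rejected":
            return "stop_present_findings"
        if decision in ("approved_with_conditions", "needs_revision"):
            if revisions >= MAX_REVISIONS:
                # Assumption: budget exhausted means escalation to the user.
                return "escalate_to_user"
            revise(verdict)  # work-planner update mode with findings/conditions
            revisions += 1
            continue
        raise ValueError(f"unknown decision: {decision}")
```

Treating `approved_with_conditions` exactly like `needs_revision` keeps conditions from leaking into task decomposition, which is the behaviour the orchestration guide hunk later in this diff asks for.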
@@ -63,6 +75,7 @@ Present options if multiple exist (can be specified with $ARGUMENTS).
  - [ ] Design document identified and selected
  - [ ] Integration/E2E test skeleton generation confirmed with user (generated if requested)
  - [ ] Work plan created via work-planner
+ - [ ] Work plan reviewed via document-reviewer
  - [ ] Plan content approved by user
  - [ ] All stopping points honored with user confirmation
 
@@ -36,9 +36,10 @@ Design Doc (uses most recent if omitted): $ARGUMENTS
 
  ### Step 1: Prerequisite Check
  Identify Design Doc in docs/design/ and check implementation files via git diff.
+ If a single active work plan is explicitly provided or unambiguously resolved for that Design Doc, read its `Review Scope` line. Otherwise set `Work Plan: none` and `Review Scope: none`; do not infer.
 
  ### Step 2: Execute code-reviewer
- Spawn code-reviewer agent: "Validate Design Doc compliance for the implementation. Design Doc path: [path]. Implementation files: [git diff file list]. Review mode: full. Return structured JSON report per your Output Format specification."
+ Spawn code-reviewer agent: "Validate Design Doc compliance for the implementation. Design Doc path: [path]. Work Plan: [resolved work plan path or none]. Review Scope: [literal Review Scope value or none]. Implementation files: [git diff file list]. Review mode: full. Return structured JSON report per your Output Format specification."
 
  **Store output as**: `$STEP_2_OUTPUT`
 
@@ -140,7 +140,7 @@ Autonomous execution MUST stop and wait for user input at these points.
  | UI Spec | After document-reviewer completes UI Spec review (frontend/fullstack) | Approve UI Spec |
  | ADR | After document-reviewer completes ADR review (if ADR created) | Approve ADR |
  | Design | After design-sync completes consistency verification | Approve Design Doc |
- | Work Plan | After work-planner creates plan | Batch approval for implementation phase |
+ | Work Plan | After document-reviewer completes WorkPlan review for Medium/Large, or after simplified plan creation for Small | Batch approval for implementation phase |
 
  **ENFORCEMENT**: After batch approval, autonomous execution proceeds without stops until completion or escalation. Skipping stop points is a CRITICAL VIOLATION.
 
@@ -164,6 +164,16 @@ Handling rules:
 
  **ENFORCEMENT**: Using any status value outside this vocabulary is a VIOLATION.
 
+ ### WorkPlan Review State [MANDATORY]
+
+ Medium and Large work plans must contain a `WorkPlan Review` section. Small simplified plans are exempt because they have no Design Doc to trace against. The plan is reviewed only when that section records `Status: approved` and `Conditions: none`.
+
+ Handling rules:
+ - After WorkPlan review returns `approved`, invoke work-planner in update mode once to record the review section, without changing implementation content.
+ - Treat WorkPlan `approved_with_conditions` the same as `needs_revision`: return to work-planner in update mode with the conditions, then re-review. Conditions must not be carried into task decomposition or implementation readiness.
+ - A material work plan update resets `WorkPlan Review` to `Status: pending`.
+ - Standalone build recipes apply WorkPlan review only before task decomposition, not after task files already exist.
+
  ## Scale Determination and Document Requirements
 
  | Scale | File Count | PRD | ADR | Design Doc | Work Plan |
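The handling rule in the hunk above that a material work plan update resets `WorkPlan Review` to `Status: pending` can be pictured as a small text rewrite. This Python sketch assumes the Markdown layout from the plan template. The function name `reset_review_to_pending` is hypothetical, and deciding whether an update counts as material is left to the caller, since the diff does not define that test.

```python
import re


def reset_review_to_pending(plan_text: str) -> str:
    """Rewrite the WorkPlan Review status bullet to `pending`.

    Hypothetical helper; text without the section is returned unchanged.
    """
    # Capture everything from the section heading through "- **Status**:",
    # then replace only the status value that follows it.
    pattern = r"(^## WorkPlan Review\s*$.*?^- \*\*Status\*\*:)[ \t]*\S+"
    return re.sub(pattern, r"\1 pending", plan_text, count=1, flags=re.M | re.S)
```

The `Conditions` bullet is left as written, because the template only names `Status` in the reset rule.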
@@ -242,8 +252,8 @@ Always start with `requirement-analyzer`, then follow the minimum flow required
 
  | Scale | Required flow |
  |-------|---------------|
- | Large | `requirement-analyzer` **[Stop]** -> `prd-creator` -> `document-reviewer` **[Stop]** -> optional `ui-spec-designer` + `document-reviewer` **[Stop]** -> optional ADR + `document-reviewer` **[Stop]** -> `codebase-analyzer` -> `technical-designer*` -> `code-verifier` -> `document-reviewer` -> `design-sync` **[Stop]** -> `acceptance-test-generator` -> `work-planner` **[Stop]** -> `task-decomposer` |
- | Medium | `requirement-analyzer` **[Stop]** -> `codebase-analyzer` -> optional `ui-spec-designer` + `document-reviewer` **[Stop]** -> `technical-designer*` -> `code-verifier` -> `document-reviewer` -> `design-sync` **[Stop]** -> `acceptance-test-generator` -> `work-planner` **[Stop]** -> `task-decomposer` |
+ | Large | `requirement-analyzer` **[Stop]** -> `prd-creator` -> `document-reviewer` **[Stop]** -> optional `ui-spec-designer` + `document-reviewer` **[Stop]** -> optional ADR + `document-reviewer` **[Stop]** -> `codebase-analyzer` -> `technical-designer*` -> `code-verifier` -> `document-reviewer` -> `design-sync` **[Stop]** -> `acceptance-test-generator` -> `work-planner` -> `document-reviewer` (doc_type: WorkPlan) **[Stop]** -> `task-decomposer` |
+ | Medium | `requirement-analyzer` **[Stop]** -> `codebase-analyzer` -> optional `ui-spec-designer` + `document-reviewer` **[Stop]** -> `technical-designer*` -> `code-verifier` -> `document-reviewer` -> `design-sync` **[Stop]** -> `acceptance-test-generator` -> `work-planner` -> `document-reviewer` (doc_type: WorkPlan) **[Stop]** -> `task-decomposer` |
  | Small | `requirement-analyzer` **[Stop]** -> simplified plan **[Stop: Batch approval]** -> direct implementation |
 
  Flow rules:
@@ -253,6 +263,7 @@ Flow rules:
253
263
  - Pass `codebase-analyzer` output to the designer as `Codebase Analysis`
254
264
  - Pass Design Doc path to `code-verifier`, then pass `code_verification` to `document-reviewer`
255
265
  - Fullstack layer sequencing is defined in `references/monorepo-flow.md`
266
+ - Run WorkPlan review after every Medium/Large work plan creation or update and before batch approval. On `needs_revision` or WorkPlan `approved_with_conditions`, return to `work-planner` in update mode and re-review for max 2 revision iterations as defined by the `needs_revision` row in Approval Status Vocabulary. On `rejected`, halt and escalate to the user.
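The WorkPlan review rule above amounts to a bounded revise-and-re-review loop. The following is only a minimal sketch of that control flow: `run_workplan_review`, `review`, and `revise` are hypothetical names, and escalating once the two revision iterations are used up is an assumption, since the Approval Status Vocabulary row that defines that case is not shown in this diff.

```python
from typing import Callable

MAX_REVISIONS = 2  # "max 2 revision iterations" from the flow rule


def run_workplan_review(review: Callable[[], str], revise: Callable[[], None]) -> str:
    """Drive the review loop and return 'approved' or 'halt_and_escalate'.

    review() stands in for spawning document-reviewer (doc_type: WorkPlan);
    revise() stands in for returning to work-planner in update mode.
    """
    revisions = 0
    while True:
        verdict = review()
        if verdict == "approved":
            return "approved"
        if verdict == "rejected":
            return "halt_and_escalate"
        if verdict in ("needs_revision", "approved_with_conditions"):
            if revisions >= MAX_REVISIONS:
                # Assumption: an exhausted revision budget escalates to the user.
                return "halt_and_escalate"
            revise()
            revisions += 1
            continue
        raise ValueError(f"unknown verdict: {verdict!r}")
```

Treating `approved_with_conditions` the same as `needs_revision` mirrors the rule that WorkPlan conditions go back through `work-planner` rather than being carried into batch approval.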
256
267
 
257
268
  ## Autonomous Execution Mode
258
269
 
@@ -10,7 +10,7 @@ This reference defines the orchestration flow for projects spanning multiple lay
10
10
 
11
11
  ## Design Phase
12
12
 
13
- ### Large Scale Fullstack (6+ Files) - 15 Steps
13
+ ### Large Scale Fullstack (6+ Files) - 16 Steps
14
14
 
15
15
  | Step | Agent | Purpose | Output |
16
16
  |------|-------|---------|--------|
@@ -28,9 +28,10 @@ This reference defines the orchestration flow for projects spanning multiple lay
28
28
  | 12 | document-reviewer x2 | Review each Design Doc with verification evidence | Reviews |
29
29
  | 13 | design-sync | Cross-layer consistency verification (source: frontend Design Doc) **[Stop]** | Sync status |
30
30
  | 14 | acceptance-test-generator | Integration/E2E test skeleton from cross-layer contracts | Test skeletons |
31
- | 15 | work-planner | Work plan from all Design Docs **[Stop: Batch approval]** | Work plan |
31
+ | 15 | work-planner | Work plan from all Design Docs | Work plan |
32
+ | 16 | document-reviewer | WorkPlan review **[Stop: Batch approval]** | Approval |
32
33
 
33
- ### Medium Scale Fullstack (3-5 Files) - 13 Steps
34
+ ### Medium Scale Fullstack (3-5 Files) - 14 Steps
34
35
 
35
36
  | Step | Agent | Purpose | Output |
36
37
  |------|-------|---------|--------|
@@ -46,7 +47,8 @@ This reference defines the orchestration flow for projects spanning multiple lay
46
47
  | 10 | document-reviewer x2 | Review each Design Doc with verification evidence | Reviews |
47
48
  | 11 | design-sync | Cross-layer consistency verification (source: frontend Design Doc) **[Stop]** | Sync status |
48
49
  | 12 | acceptance-test-generator | Integration/E2E test skeleton from cross-layer contracts | Test skeletons |
49
- | 13 | work-planner | Work plan from all Design Docs **[Stop: Batch approval]** | Work plan |
50
+ | 13 | work-planner | Work plan from all Design Docs | Work plan |
51
+ | 14 | document-reviewer | WorkPlan review **[Stop: Batch approval]** | Approval |
50
52
 
51
53
  ### Parallelization in Multi-Agent Steps
52
54
 
@@ -101,6 +103,12 @@ Spawn work-planner with all Design Docs:
101
103
 
102
104
  work-planner's existing Integration Complete criteria naturally cover cross-layer verification when given multiple Design Docs.
103
105
 
106
+ After work-planner creates or updates the plan, spawn document-reviewer:
107
+
108
+ > "Review the fullstack work plan. doc_type: WorkPlan. target: [work plan path]. mode: composite. Review semantic traceability to all Design Docs, UI Spec when present, cross-layer boundary coverage, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
109
+
110
+ On `needs_revision` or `approved_with_conditions`, return to work-planner in update mode and re-review for max 2 revision iterations as defined by the `needs_revision` row in Approval Status Vocabulary. On `rejected`, halt and escalate to the user. Stop for batch approval only after WorkPlan review returns `approved` and the plan's `WorkPlan Review` section records `Status: approved` with `Conditions: none`.
111
+
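The batch-approval condition above depends on what the plan's `WorkPlan Review` section records. As a rough illustration, the check could look like the sketch below. `workplan_is_reviewed` is a hypothetical helper, and it assumes the section uses the `- **Status**: ...` and `- **Conditions**: ...` bullet layout from the plan template.

```python
import re


def workplan_is_reviewed(plan_text: str) -> bool:
    """Return True only for `Status: approved` together with `Conditions: none`."""
    # Capture the body of the "## WorkPlan Review" section up to the next H2 heading.
    section = re.search(
        r"^## WorkPlan Review\s*$(.*?)(?=^## |\Z)", plan_text, re.M | re.S
    )
    if section is None:
        return False
    fields = dict(
        re.findall(
            r"^- \*\*(Status|Conditions)\*\*:\s*(.+?)\s*$", section.group(1), re.M
        )
    )
    return fields.get("Status") == "approved" and fields.get("Conditions") == "none"
```

A plan that still carries `Status: pending`, or `Status: approved` with any condition text other than `none`, is treated as not reviewed.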
104
112
  ## Task Decomposition Phase
105
113
 
106
114
  task-decomposer follows standard decomposition from the work plan. The key addition is the **layer-aware naming convention**:
@@ -192,6 +192,8 @@ Adapt comment syntax to the project's language when generating annotations.
192
192
  // @dependency: PaymentService, OrderRepository, Database
193
193
  // @real-dependency: OrderRepository, Database
194
194
  // @complexity: high
195
+ // Primary failure mode: payment succeeds but the order row is absent or unpersisted
196
+ // Proof obligation: assert order persistence after successful payment while keeping OrderRepository and Database real; only the external payment gateway may be mocked
195
197
  [Test: 'AC1: Successful payment creates persisted order with correct status']
196
198
 
197
199
  // AC1-error: "Payment failure shows user-friendly error message"
@@ -200,6 +202,8 @@ Adapt comment syntax to the project's language when generating annotations.
200
202
  // @category: core-functionality
201
203
  // @dependency: PaymentService, ErrorHandler
202
204
  // @complexity: medium
205
+ // Primary failure mode: payment failure still creates an order or hides the user-facing error
206
+ // Proof obligation: assert the visible error and the unchanged order state after a failed payment; mock only the external payment gateway failure
203
207
  [Test: 'AC1: Failed payment displays error without creating order']
204
208
  ```
205
209
 
@@ -221,6 +225,8 @@ Adapt comment syntax to the project's language when generating annotations.
221
225
  // @lane: fixture-e2e
222
226
  // @dependency: full-ui (mocked backend)
223
227
  // @complexity: medium
228
+ // Primary failure mode: undo banner appears but the dismissed card is not restored
229
+ // Proof obligation: assert browser-visible state before dismissal, after dismissal, and after undo using fixture-controlled backend state
224
230
  [Test: 'User Journey: Dismiss and undo restores the card']
225
231
  ```
226
232
 
@@ -242,6 +248,8 @@ Adapt comment syntax to the project's language when generating annotations.
242
248
  // @lane: service-integration-e2e
243
249
  // @dependency: full-system
244
250
  // @complexity: high
251
+ // Primary failure mode: checkout appears successful but the persisted order or confirmation event is missing
252
+ // Proof obligation: exercise the full local service stack and assert persisted order state plus confirmation event after checkout
245
253
  [Test: 'User Journey: Complete product purchase persists order and emits confirmation']
246
254
  ```
247
255
 
@@ -297,13 +305,15 @@ Each test case MUST have the following standard annotations for test implementat
297
305
  - **@lane**: integration | fixture-e2e | service-integration-e2e
298
306
  - **@dependency**: none | [component names] | full-ui (mocked backend) | full-system
299
307
  - **@complexity**: low | medium | high
308
+ - **Primary failure mode**: the specific regression that should make the implemented test fail
309
+ - **Proof obligation**: what the implemented test must assert to prove the claim, including the boundary to exercise, before/action/after state for state-changing claims, and which boundaries may be mocked with rationale
300
310
 
301
- These annotations are used when planning and prioritizing test implementation.
311
+ These annotations are used when planning and prioritizing test implementation. Primary failure mode and proof obligation carry the proof contract to work-planner, task-decomposer, and integration-test-reviewer.
302
312
 
303
313
  ## Constraints and Quality Standards
304
314
 
305
315
  **Mandatory Compliance**:
306
- - Output test skeletons only: verification points, expected results, and pass criteria
316
+ - Output test skeletons only: verification points, expected results, pass criteria, primary failure mode, and proof obligation
307
317
  - Downstream consumers treat these skeletons as design artifacts rather than runnable tests
308
318
  - Clearly state verification points, expected results, and pass criteria for each test
309
319
  - Preserve original AC statements in comments (ensure traceability)
@@ -53,7 +53,7 @@ Skill Status:
53
53
  ## Input Parameters
54
54
 
55
55
  - **designDoc**: Path to the Design Doc (or multiple paths for fullstack features)
56
- - **implementationFiles**: List of files to review (or git diff range)
56
+ - **implementationFiles**: List of files to review (or git diff range). When a Work Plan is provided and implementationFiles is omitted or ambiguous, derive the review file set from the plan's `Review Scope` value; for revision plans, use the recorded base branch plus diff range.
57
57
  - **reviewMode**: `full` (default) | `acceptance` | `architecture`
58
58
 
59
59
  ## Workflow
@@ -124,6 +124,7 @@ Read error paths and boundary handling directly in the code:
124
124
  - Meaningful coverage: at least one assertion exercises the AC's observable behavior
125
125
  - Coverage gap: `skip`/`xit` on tests that should run, TODO/placeholder-only bodies, always-true assertions (for example `expect(true).toBe(true)` or `expect(arr.length).toBeGreaterThanOrEqual(0)`), 0-match runner reports, or grep-only matches without behavior verification
126
126
  - Intentional absence: meaningful when absence is the AC expectation
127
+ - Proof adequacy: a covered test should fail under the AC's primary failure mode and should exercise the claimed boundary rather than a substitute input that bypasses it. A test that would stay green if the claimed behavior regressed is a `coverage_gap` with rationale naming the unproven failure mode.
127
128
 
128
129
  Classify each quality finding into one of:
129
130
  - `dd_violation`: implementation deviates from the Design Doc
@@ -49,7 +49,7 @@ Skill Status:
49
49
  - `composite`: Composite perspective review (recommended) - Verifies structure, implementation, and completeness in one execution
50
50
  - When unspecified: Comprehensive review
51
51
 
52
- - **doc_type**: Document type (`PRD`/`ADR`/`UISpec`/`DesignDoc`)
52
+ - **doc_type**: Document type (`PRD`/`ADR`/`UISpec`/`DesignDoc`/`WorkPlan`)
53
53
  - **target**: Document path to review
54
54
  - **codebase_analysis**: codebase-analyzer JSON used to create the target document (optional)
55
55
  - **ui_analysis**: ui-analyzer JSON used to create the target document (optional)
@@ -84,6 +84,7 @@ Skill Status:
84
84
  - When `codebase_analysis` is provided, use `analysisScope`, `existingElements`, `constraints`, `qualityAssurance`, `focusAreas`, and `limitations` as source evidence for scope, feasibility, and completeness checks
85
85
  - When `ui_analysis` is provided, use `componentStructure`, `propsPatterns`, `cssLayout`, `stateDisplay`, `displayConditions`, `accessibility`, and `candidateWriteSet` as source evidence for UI scope, feasibility, and completeness checks
86
86
  - When `code_verification` is provided, use its discrepancies and reverse coverage as pre-verified evidence during review
87
+ - For WorkPlan: confirm the plan carries the artifacts the semantic gate is judged against: WorkPlan Review, Review Scope, Design-to-Plan Traceability, Verification Strategy summary, Proof Strategy, Failure Mode Checklist, and Quality Assurance Mechanisms. Read the referenced Design Doc(s), UI Spec, ADRs, and test skeletons when listed so coverage can be checked against source artifacts.
87
88
 
88
89
  ### Step 2: Target Document Collection
89
90
  - Load document specified by target
@@ -105,6 +106,14 @@ For DesignDoc, additionally verify:
105
106
  - [ ] Output Comparison section present when the design changes existing observable behavior, an external contract, or a persisted data shape
106
107
  - [ ] Minimal Surface Alternatives section present with one entry per new in-scope element as defined by coding-rules "Minimum Surface Terms" when the design proposes new implementation surface. If none are introduced, the section is marked N/A with rationale. Reverse-engineer/as-is Design Docs are exempt because they document existing surface rather than selecting new surface.
107
108
 
109
+ For WorkPlan, additionally verify:
110
+ - [ ] WorkPlan Review section present
111
+ - [ ] Review Scope recorded as planned-files scope, or base branch plus diff range for revision plans
112
+ - [ ] Design-to-Plan Traceability table present
113
+ - [ ] Verification Strategy summary and Proof Strategy present
114
+ - [ ] Failure Mode Checklist present
115
+ - [ ] Final phase includes Quality Assurance covering acceptance criteria achievement and required checks
116
+
108
117
  #### Gate 1: Quality Assessment (only after Gate 0 passes)
109
118
 
110
119
  **Comprehensive Review Mode**:
@@ -124,6 +133,14 @@ For DesignDoc, additionally verify:
124
133
  - **Verification Strategy quality check**: When the Verification Strategy section exists, verify that: (1) correctness definition is specific and measurable, (2) target comparison and observable success indicator are concrete when the change modifies observable behavior, external contracts, integrations, or data flow, (3) internal-only refactoring with identical observable inputs and outputs may use the minimal form, (4) verification method can detect the change's primary risk, (5) verification timing uses the normalized vocabulary or an explicit `N/A` rationale for minimal form, and (6) vertical-slice designs do not defer all verification to the final phase
125
134
  - **Output comparison check**: When the Design Doc changes existing observable behavior, an external contract, or a persisted data shape, verify that a concrete output comparison method is defined with identical input, expected output fields or format, and diff method. When upstream analysis includes `dataTransformationPipelines`, each listed step must be mapped to the comparison that verifies it; steps excluded because data passes through unchanged must include rationale. Missing mappings or rationale → `important` issue (category: `completeness`)
126
135
  - **Minimal Surface Alternatives check**: Applies when the Design Doc proposes new in-scope elements as defined by coding-rules "Minimum Surface Terms". Reverse-engineer/as-is Design Docs are exempt. Missing or empty section when the trigger fires → `critical` issue (category: `completeness`). For each entry verify: (1) Step 1 lists at least one AC ID or accepted technical constraint from the Design Doc or referenced UI Spec; speculative-only linkage → `critical` issue (category: `compliance`). (2) Steps 2-3 include at least one subtractive alternative such as derive, compute on demand, keep at caller, reuse existing, or do not introduce new state/mode/abstraction; missing subtractive alternative → `important` issue (category: `compliance`). (3) Step 4 selects the smallest alternative or names a current requirement smaller alternatives fail to satisfy; primary rationale based on coding-rules subjective-only rationales → `critical` issue (category: `compliance`). (4) Step 5 records rejected alternatives with brief rationale; missing rejected alternatives log → `important` issue (category: `completeness`)
136
+ - **WorkPlan semantic gate**:
137
+ - Coverage is checked where each item lives in the plan: each acceptance criterion is covered by a task whose Completion Criteria or Proof Obligations reference the AC ID or claim identifier; each data contract, state transition, boundary, prerequisite, and protected scope item has a Design-to-Plan Traceability row mapped to a task or an explicit out-of-scope entry. Missing coverage is a `critical` issue (category: `completeness`).
138
+ - Distinguish the cause for an uncovered acceptance criterion: when the source Design Doc supports it but no task maps to it, classify as a plan omission (`critical`, fixable by re-planning); when the source document or inputs give it no basis, classify as `rejected` because re-planning cannot invent the missing source requirement.
139
+ - Early verification must sit in an early phase rather than only the final phase. Deferral to final phase without rationale is an `important` issue (category: `consistency`).
140
+ - Each cross-boundary, public-boundary, browser-boundary, or persisted-state change names a task that verifies it through the real boundary. Missing real-boundary coverage is an `important` issue (category: `completeness`).
141
+ - Each traceability table present (Design-to-Plan, UI Spec Component, Connection Map, ADR Bindings) is filled to the granularity needed to resolve the target task. Under-specified rows are `important` issues (category: `completeness`).
142
+ - The Failure Mode Checklist covers applicable domain-independent categories: same-value, no-op, empty input, invalid option, missing config, unavailable boundary, shared-state dependency, rollback-only visibility. Missing applicable categories are `recommended` issues (category: `completeness`).
143
+ - Verdict mapping: any WorkPlan semantic-gate `critical` issue forces `needs_revision`, except a coverage gap traceable to missing or contradictory source documents or inputs forces `rejected`. Important-only issues may return `approved_with_conditions`, but orchestration must route WorkPlan conditions back through work-planner update before batch approval or task decomposition.
127
144
  - **Undetermined items review** [MANDATORY]: Every TBD, unknown, or open item MUST include: (1) **owner** — who resolves it, (2) **due** — when it gets resolved (which phase or milestone), (3) **next-phase handling** — how the next phase treats this gap. Missing any of these three → `important` issue
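The verdict mapping in the WorkPlan semantic gate above can be read as a small decision function. This is a sketch under stated assumptions, not the reviewer's actual logic: the `Issue` type and `workplan_verdict` are hypothetical, `rejected` is assumed to take precedence when source-gap and ordinary critical issues are mixed, and plans with only `recommended` issues are assumed to be `approved`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    severity: str  # "critical" | "important" | "recommended"
    # True when the gap traces to missing or contradictory source documents or inputs.
    source_gap: bool = False


def workplan_verdict(issues: list[Issue]) -> str:
    """Map semantic-gate issues to a review verdict."""
    critical = [issue for issue in issues if issue.severity == "critical"]
    if any(issue.source_gap for issue in critical):
        # Re-planning cannot invent a missing source requirement.
        return "rejected"
    if critical:
        return "needs_revision"
    if any(issue.severity == "important" for issue in issues):
        # The source says important-only issues "may" return this verdict.
        return "approved_with_conditions"
    return "approved"
```

Note that `approved_with_conditions` is not a terminal state for a WorkPlan: orchestration still routes the conditions back through a `work-planner` update.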
128
145
 
129
146
  **Perspective-specific Mode**:
@@ -64,6 +64,8 @@ Extract the following annotation patterns from the test file using the project's
64
64
  - `@dependency:` → Dependencies
65
65
  - `@real-dependency:` → Dependencies expected to stay real in integration coverage
66
66
  - `Verification items:` → Expected verification items (if present)
67
+ - `Primary failure mode:` → Regression the test must detect
68
+ - `Proof obligation:` → Boundary, state, and mock-rationale obligations the test must satisfy
67
69
 
68
70
  ### 2. Implementation Verification
69
71
  For each test case:
@@ -83,7 +85,16 @@ Evaluate each test for:
83
85
  - No shared state
84
86
  - No time-dependent logic
85
87
 
86
- ### 4. Return JSON Result
88
+ ### 4. Claim Proof Adequacy
89
+
90
+ Confirm each test proves its acceptance criterion claim, not merely that code ran. Record a `proof_insufficient` issue for each unmet obligation:
91
+ - The test would fail under the stated primary failure mode because an assertion observes the promised behavior.
92
+ - When the claim involves a public, integration, browser, process, service, or persistence boundary, the test exercises that boundary rather than a substitute input that bypasses it.
93
+ - When the claim involves state change, side effect, rollback, non-mutating mode, idempotency, or persistence, the test asserts observable state before the action, performs the action, and asserts observable state after the action.
94
+ - Each mocked boundary is external to the behavior under test, with the boundary under test left real and a comment explaining why the mock is permitted.
95
+ - Integration and E2E tests use bounded fixtures and assert outcomes that hold regardless of shared state, real data volume, or execution order.
96
+
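To make the obligations above concrete, here is a hedged sketch of tests that would satisfy them for the payment example used in the skeleton annotations. Every name (`OrderRepository`, `FakeGateway`, `checkout`, `PaymentDeclined`) is hypothetical, and an in-memory SQLite database is assumed to be an acceptable real persistence boundary for illustration.

```python
import sqlite3


class PaymentDeclined(Exception):
    pass


class OrderRepository:
    """Persistence boundary under test; kept real (SQLite) rather than mocked."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS orders (id TEXT PRIMARY KEY, status TEXT NOT NULL)"
        )

    def save(self, order_id: str, status: str) -> None:
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT INTO orders (id, status) VALUES (?, ?)", (order_id, status)
            )

    def status_of(self, order_id: str):
        row = self.conn.execute(
            "SELECT status FROM orders WHERE id = ?", (order_id,)
        ).fetchone()
        return row[0] if row else None


class FakeGateway:
    """Mock permitted: the payment gateway is external to the behavior under test."""

    def __init__(self, succeed: bool) -> None:
        self.succeed = succeed

    def charge(self, amount: int) -> None:
        if not self.succeed:
            raise PaymentDeclined("card declined")


def checkout(repo: OrderRepository, gateway: FakeGateway, order_id: str, amount: int) -> None:
    gateway.charge(amount)  # raises before anything is persisted
    repo.save(order_id, "paid")


def test_successful_payment_persists_order() -> None:
    repo = OrderRepository(sqlite3.connect(":memory:"))
    assert repo.status_of("order-1") is None  # before
    checkout(repo, FakeGateway(succeed=True), "order-1", 100)  # action
    assert repo.status_of("order-1") == "paid"  # after


def test_failed_payment_creates_no_order() -> None:
    repo = OrderRepository(sqlite3.connect(":memory:"))
    assert repo.status_of("order-1") is None  # before
    declined = False
    try:
        checkout(repo, FakeGateway(succeed=False), "order-1", 100)  # action
    except PaymentDeclined:
        declined = True
    assert declined
    assert repo.status_of("order-1") is None  # after: state unchanged
```

Both tests would go red under their primary failure mode: removing the `repo.save` call breaks the first, and saving before charging breaks the second.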
97
+ ### 5. Return JSON Result
87
98
  Return the JSON result as the final response. See Output Format for the schema.
88
99
 
89
100
  ## Output Format
@@ -102,7 +113,7 @@ Return the JSON result as the final response. See Output Format for the schema.
102
113
  "qualityIssues": [
103
114
  {
104
115
  "testName": "[test name]",
105
- "issueType": "skeleton_mismatch|aaa_violation|independence_violation|mock_boundary|readability",
116
+ "issueType": "skeleton_mismatch|aaa_violation|independence_violation|mock_boundary|proof_insufficient|readability",
106
117
  "severity": "high|medium|low",
107
118
  "description": "[specific issue]",
108
119
  "skeletonExpected": "[what skeleton specified]",
@@ -140,6 +151,7 @@ Return the JSON result as the final response. See Output Format for the schema.
140
151
  - [ ] Every test has corresponding skeleton comment
141
152
  - [ ] Observable result from Behavior is asserted
142
153
  - [ ] All Verification items are covered
154
+ - [ ] Each test proves the claim by failing under the primary failure mode and exercising the stated boundary
143
155
  - [ ] No internal component mocking in integration tests
144
156
  - [ ] Clear Arrange/Act/Assert separation
145
157
  - [ ] No test interdependencies
@@ -67,6 +67,7 @@ Decompose tasks based on implementation strategy patterns determined in implemen
67
67
  - Document concrete executable procedures
68
68
  - Include task-level Quality Assurance Mechanisms when the work plan defines them
69
69
  - Include task-level Binding Decisions when ADR Bindings cover the task
70
+ - Include task-level Proof Obligations when the work plan defines Proof Strategy, test skeleton proof annotations, or acceptance-criterion primary failure modes
70
71
  - **Always include operation verification methods**
71
72
  - Define clear completion criteria (within executor's scope of responsibility)
72
73
 
@@ -160,6 +161,9 @@ Decompose tasks based on implementation strategy patterns determined in implemen
160
161
  8. **Utilize Test Information**
161
162
  When test information (@category, @dependency, @complexity, etc.) is documented in the work plan, reflect that information in task files
162
163
 
164
+ 9. **Propagate Proof Obligations**
165
+ When the work plan or referenced test skeletons include Primary failure mode or Proof obligation annotations, copy the applicable obligations into each generated task that implements or verifies the claim. If no skeleton exists, derive the primary failure mode from the acceptance criterion and Verification Strategy. Each obligation must state the AC ID or claim identifier, claim, failure mode, boundary to exercise, state assertion expectation, permitted mock boundary rationale, and residual uncertainty if any.
166
+
163
167
  ## Verification Strategy Propagation
164
168
 
165
169
  Verification Strategy defines what correctness means at design time. L1/L2/L3 (from implementation-approach) define task-level verification depth at execution time. Use both.
@@ -195,6 +199,18 @@ When the work plan includes a `Design-to-Plan Traceability` section:
195
199
  5. **Verification integrity**: For `verification` rows, ensure the corresponding task file includes the required comparison or verification method in Operation Verification Methods.
196
200
  6. **Prerequisite integrity**: For `prerequisite` rows, place setup, migration, seed, auth, or environment work before dependent implementation tasks.
197
201
 
202
+ ## Proof Obligation Propagation
203
+
204
+ When the work plan includes a `Proof Strategy` section or referenced test skeleton proof annotations:
205
+
206
+ 1. Locate each task that implements or verifies the related acceptance criterion, user journey, boundary, or state transition.
207
+ 2. Add a `Proof Obligations` section to the task file using the task template.
208
+ 3. Preserve the Primary failure mode and Proof obligation wording from the skeleton when available.
209
+ 4. For state-changing claims, require before -> action -> after observable state assertions.
210
+ 5. For boundary claims, require the test to exercise the stated public, integration, browser, process, service, or persistence boundary.
211
+ 6. For mocked dependencies, name only external dependencies as mockable and record why the boundary under test remains real.
212
+ 7. Record residual uncertainty when a task-level test cannot prove a claim fully and identify the later phase or task that closes the residual.
213
+
198
214
  ## ADR Binding Propagation
199
215
 
200
216
  When the work plan includes an `ADR Bindings` section:
@@ -46,6 +46,8 @@ Skill Status:
46
46
  8. Propagate UI Spec component and runtime boundary context into the plan so task decomposition can pass it to executors
47
47
  9. Map implementation-binding ADR decisions to constrained tasks
48
48
  10. Document in progress-trackable format
49
+ 11. Carry proof obligations from test skeletons or acceptance criteria into the work plan so task files and reviews can judge whether tests prove the claim, not merely run
50
+ 12. Record WorkPlan Review status as pending when creating or materially updating a plan
49
51
 
50
52
  ## Input Parameters
51
53
 
@@ -70,6 +72,7 @@ Read the Design Doc(s), UI Spec, PRD, and ADR (if provided). Extract:
70
72
  - Quality Assurance Mechanisms from each Design Doc: all items marked `adopted`, including mechanism name, enforced quality aspect, configuration path, and covered files or project-wide scope
71
73
  - Implementation-relevant technical requirements from each Design Doc section using the category values defined in the plan template's `Design-to-Plan Traceability` section
72
74
  - Prerequisite ADR references from each Design Doc and implementation-binding decisions from each resolved ADR
75
+ - Primary failure modes from acceptance criteria or generated test skeletons
73
76
 
74
77
  Focus on implementation-relevant items only: items that directly inform task creation, dependency ordering, verification design, or protected no-change boundaries.
75
78
  Extract `scope-boundary` rows from explicit non-target statements such as `No Ripple Effect`, compatibility constraints, or protected boundaries that must remain unchanged.
@@ -88,8 +91,11 @@ Choose Strategy A (TDD) if test skeletons are provided, Strategy B (implementati
88
91
  ### 4. Compose Phases
89
92
 
90
93
  **Common rules (all approaches)**:
94
+ - Record Review Scope in the plan header as the planned file scope derived from the Design Doc and task targets. For revision plans, record the base branch and diff range.
95
+ - Include a `WorkPlan Review` section. Set `Status: pending` and `Conditions: none`. When invoked in update mode after a WorkPlan review verdict, only `approved` with no conditions may set `Status: approved`.
91
96
  - Preserve Verification Strategies per Design Doc in the work plan header and keep each source document path. Merge strategies only when the Design Docs explicitly define a shared one
92
97
  - Include Verification Strategy summaries in the work plan header so the plan is self-sufficient for downstream task generation
98
+ - Include a Proof Strategy in the work plan header. When test skeletons exist, use their Primary failure mode and Proof obligation annotations as the proof obligation source; otherwise derive each claim's primary failure mode from the acceptance criterion and Design Doc verification strategy.
93
99
  - Include adopted Quality Assurance Mechanisms in the work plan header so downstream task generation and quality checks inherit the intended quality gates
94
100
  - Place tasks with the lowest dependencies in earlier phases
95
101
  - Map normalized verification timing to phases as follows: `phase_1` -> earliest implementation phase, `per_phase` -> each relevant phase, `integration_phase` -> integration phase, `final_phase` -> final Quality Assurance phase
@@ -171,8 +177,22 @@ Mapping rules:
171
177
  - Acceptance criteria and required user-visible behaviors belong in `Design-to-Plan Traceability`; `ADR Bindings` covers structural implementation constraints.
172
178
  - If an ADR decision constrains the design but no task covers it, add a justified gap and flag it for user confirmation before plan approval.
173
179
 
180
+ ### 5d. Build Failure Mode Checklist
181
+
182
+ Populate the plan template's `Failure Mode Checklist` before finalizing tasks. Enumerate all eight domain-independent categories, mark whether each applies, and list the task IDs that cover applicable categories. Keep category names generic and put project-specific details in task descriptions or notes.
183
+
184
+ Categories:
185
+ - same-value
186
+ - no-op
187
+ - empty input
188
+ - invalid option
189
+ - missing config
190
+ - unavailable boundary
191
+ - shared-state dependency
192
+ - rollback-only visibility
193
+
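The checklist rule above (enumerate all eight categories, mark applicability, list covering task IDs) is easy to check mechanically. The sketch below assumes a hypothetical in-memory shape for the checklist, a dict keyed by category with `applies` and `tasks` entries; the plan template's actual layout is not shown in this diff.

```python
FAILURE_MODE_CATEGORIES = (
    "same-value",
    "no-op",
    "empty input",
    "invalid option",
    "missing config",
    "unavailable boundary",
    "shared-state dependency",
    "rollback-only visibility",
)


def checklist_problems(checklist: dict[str, dict]) -> list[str]:
    """Return one message per category that is missing or applies without a task."""
    problems = []
    for category in FAILURE_MODE_CATEGORIES:
        entry = checklist.get(category)
        if entry is None:
            problems.append(f"{category}: not enumerated")
        elif entry.get("applies") and not entry.get("tasks"):
            problems.append(f"{category}: applies but no covering task")
    return problems
```

An empty result means every category is enumerated and each applicable one names at least one task ID.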
174
194
  ### 6. Define Tasks with Completion Criteria
175
- For each task, derive completion criteria from Design Doc acceptance criteria. Apply the 3-element completion definition (Implementation Complete, Quality Complete, Integration Complete).
195
+ For each task, derive completion criteria from Design Doc acceptance criteria. Each task that covers an acceptance criterion must explicitly name the AC ID or claim identifier in Completion Criteria or Proof Obligations. Apply the 3-element completion definition (Implementation Complete, Quality Complete, Integration Complete).
176
196
 
177
197
  ### 7. Produce Work Plan Document
178
198
  Write the work plan following the plan template from documentation-criteria skill. Include Phase Structure Diagram and Task Dependency Diagram (mermaid).
@@ -244,6 +264,8 @@ Read available test skeleton files (integration tests, and E2E tests only when p
244
264
  - `// @dependency:` → Dependent components (material for phase placement decisions)
245
265
  - `// @complexity:` → Complexity (high/medium/low, material for effort estimation)
246
266
  - `// Value Score:` → Priority judgment
267
+ - `// Primary failure mode:` → Regression the implemented test must detect
268
+ - `// Proof obligation:` → Boundary, state, and mock-rationale obligations the implemented test must satisfy
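As an illustration of reading the two proof annotations listed above, the sketch below collects them per test case. It assumes the skeleton layout shown in the acceptance-test-generator examples (comment lines followed by a `[Test: '...']` marker) and accepts `//` or `#` comment prefixes. `extract_proof_annotations` is a hypothetical helper, not part of the package.

```python
import re

_ANNOTATION = re.compile(
    r"^\s*(?://|#)\s*(Primary failure mode|Proof obligation):\s*(.+?)\s*$"
)
_TEST = re.compile(r"\[Test:\s*'(.+)'\]")


def extract_proof_annotations(skeleton: str) -> dict[str, dict[str, str]]:
    """Map each test name to the proof annotations written directly above it."""
    result: dict[str, dict[str, str]] = {}
    pending: dict[str, str] = {}
    for line in skeleton.splitlines():
        annotation = _ANNOTATION.match(line)
        if annotation:
            pending[annotation.group(1)] = annotation.group(2)
            continue
        test = _TEST.search(line)
        if test:
            # Attach the annotations gathered so far, then reset for the next test.
            result[test.group(1)] = pending
            pending = {}
    return result
```

A test whose entry lacks either key has no proof contract to carry into the plan's Proof Strategy.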
247
269
 
248
270
  #### Step 2: Reflect Meta Information in Work Plan
249
271
 
@@ -257,6 +279,11 @@ Read available test skeleton files (integration tests, and E2E tests only when p
257
279
  - `// @complexity: high` → Subdivide tasks or estimate higher effort
258
280
  - `// @complexity: low` → Consider combining multiple tests into one task
259
281
 
282
+ 3. **Proof Strategy Propagation**
283
+ - Copy Primary failure mode and Proof obligation annotations into the plan's Proof Strategy
284
+ - Map each proof obligation to the task that implements or verifies the corresponding acceptance criterion or user journey
285
+ - When a proof obligation cannot be fully satisfied until a later phase, record the residual and the closing task
286
+
260
287
  #### Step 3: Extract Environment Prerequisites from E2E Skeletons
261
288
 
262
289
  When E2E test skeletons are provided, first identify the E2E skeleton subset using file naming conventions such as `*.fixture.e2e.test.*`, `*.service.e2e.test.*`, or `*.e2e.test.*`, then scan only those files for environment prerequisites in two stages:
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "codex-workflows",
3
- "version": "0.6.5",
3
+ "version": "0.6.6",
4
4
  "description": "Task-oriented agentic coding framework for OpenAI Codex CLI — skills, recipes, and subagents for structured development workflows",
5
5
  "license": "MIT",
6
6
  "author": "Shinsuke Kagawa",