codex-workflows 0.6.8 → 0.7.0

Files changed (41)
  1. package/.agents/skills/ai-development-guide/SKILL.md +5 -3
  2. package/.agents/skills/ai-development-guide/references/frontend.md +11 -19
  3. package/.agents/skills/coding-rules/references/typescript.md +17 -12
  4. package/.agents/skills/documentation-criteria/SKILL.md +1 -1
  5. package/.agents/skills/documentation-criteria/references/design-template.md +16 -5
  6. package/.agents/skills/documentation-criteria/references/plan-template.md +19 -5
  7. package/.agents/skills/documentation-criteria/references/task-template.md +19 -1
  8. package/.agents/skills/recipe-build/SKILL.md +1 -1
  9. package/.agents/skills/recipe-front-build/SKILL.md +1 -1
  10. package/.agents/skills/recipe-front-plan/SKILL.md +1 -1
  11. package/.agents/skills/recipe-fullstack-build/SKILL.md +1 -1
  12. package/.agents/skills/recipe-plan/SKILL.md +1 -1
  13. package/.agents/skills/recipe-prepare-implementation/SKILL.md +2 -1
  14. package/.agents/skills/subagents-orchestration-guide/SKILL.md +2 -2
  15. package/.agents/skills/subagents-orchestration-guide/references/monorepo-flow.md +1 -1
  16. package/.agents/skills/testing/SKILL.md +5 -5
  17. package/.agents/skills/testing/references/typescript.md +2 -6
  18. package/.codex/agents/acceptance-test-generator.toml +2 -44
  19. package/.codex/agents/code-reviewer.toml +12 -57
  20. package/.codex/agents/code-verifier.toml +1 -47
  21. package/.codex/agents/codebase-analyzer.toml +1 -106
  22. package/.codex/agents/design-sync.toml +2 -64
  23. package/.codex/agents/document-reviewer.toml +8 -81
  24. package/.codex/agents/integration-test-reviewer.toml +1 -26
  25. package/.codex/agents/investigator.toml +1 -73
  26. package/.codex/agents/quality-fixer-frontend.toml +4 -105
  27. package/.codex/agents/quality-fixer.toml +4 -122
  28. package/.codex/agents/requirement-analyzer.toml +1 -29
  29. package/.codex/agents/rule-advisor.toml +1 -79
  30. package/.codex/agents/scope-discoverer.toml +1 -70
  31. package/.codex/agents/security-reviewer.toml +1 -19
  32. package/.codex/agents/solver.toml +5 -54
  33. package/.codex/agents/task-decomposer.toml +47 -4
  34. package/.codex/agents/task-executor-frontend.toml +37 -144
  35. package/.codex/agents/task-executor.toml +37 -144
  36. package/.codex/agents/technical-designer-frontend.toml +8 -0
  37. package/.codex/agents/technical-designer.toml +10 -1
  38. package/.codex/agents/ui-analyzer.toml +1 -157
  39. package/.codex/agents/verifier.toml +2 -65
  40. package/.codex/agents/work-planner.toml +30 -9
  41. package/package.json +1 -1
@@ -165,7 +165,9 @@ To isolate problems, attempt reproduction with minimal code:
165
165
  - Replace external dependencies with mocks
166
166
  - Create minimal configuration that reproduces problem
167
167
 
168
- ### 4. Debug Log Output
168
+ ### 4. Debug Log Output (temporary)
169
+ Add structured debug logs to isolate the issue, then remove them before commit.
170
+
169
171
  ```
170
172
  Pattern: Structured logging with context
171
173
  {
@@ -204,7 +206,7 @@ Universal quality assurance phases applicable to all languages:
204
206
  ### Phase 3: Testing
205
207
  1. **Unit Tests**: Run all unit tests
206
208
  2. **Integration Tests**: Run integration tests
207
- 3. **Test Coverage**: Measure and verify coverage meets standards
209
+ 3. **Test Coverage**: Measure coverage when configured and use it to find gaps
208
210
  4. **E2E Tests**: Run end-to-end tests
209
211
 
210
212
  ### Phase 4: Final Quality Gate [MANDATORY]
@@ -212,7 +214,7 @@ All checks MUST pass before proceeding:
212
214
  - Zero static analysis errors
213
215
  - Build succeeds
214
216
  - All tests pass
215
- - Coverage meets threshold
217
+ - Coverage threshold passes when the project, task file, work plan, or Design Doc defines one. When no threshold is configured, use coverage output only to identify untested critical paths.
216
218
 
217
219
  **ENFORCEMENT**: Cannot proceed with ANY quality check failures — fix ALL errors before marking task complete
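
The revised coverage rule in this hunk can be read as a three-way decision. The following is a minimal sketch of that reading, not code from the package: `CoverageGateInput`, `CoverageGateResult`, and `evaluateCoverageGate` are hypothetical names introduced for illustration.

```typescript
// Hypothetical shapes: none of these names exist in the package.
interface CoverageGateInput {
  measuredPercent: number;          // coverage reported by the test runner
  thresholdPercent?: number;        // set by the project, task file, work plan, or Design Doc
  uncoveredCriticalPaths: string[]; // critical paths with no tests
}

type CoverageGateResult =
  | { kind: "pass" }
  | { kind: "fail"; reason: string }
  | { kind: "informational"; gaps: string[] };

function evaluateCoverageGate(input: CoverageGateInput): CoverageGateResult {
  if (input.thresholdPercent === undefined) {
    // No configured threshold: coverage does not block, it only surfaces gaps.
    return { kind: "informational", gaps: input.uncoveredCriticalPaths };
  }
  if (input.measuredPercent >= input.thresholdPercent) {
    return { kind: "pass" };
  }
  return {
    kind: "fail",
    reason: `coverage ${input.measuredPercent}% is below the configured ${input.thresholdPercent}%`,
  };
}
```

Under this reading, the 0.6.8 behavior corresponds to always supplying a threshold, while 0.7.0 blocks only when one is defined somewhere.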
218
220
 
@@ -75,28 +75,19 @@ console.log('DEBUG:', {
75
75
 
76
76
  ## Frontend Quality Check Workflow
77
77
 
78
- Use the appropriate run command based on the `packageManager` field in package.json.
78
+ Read `package.json` scripts and run them with the project's package manager from the `packageManager` field. Map the phases below using the script names declared in `package.json`.
79
79
 
80
- ### Common Commands
81
- - `dev` - Development server
82
- - `build` - Production build
83
- - `preview` - Preview production build
84
- - `type-check` - Type check (no emit)
85
-
86
- ### Quality Check Phases
87
- **Phase 1-3: Basic Checks**
88
- - `check` - Linter + formatter (Biome, ESLint, Prettier, etc.)
89
- - `build` - TypeScript build
90
-
91
- **Phase 4-5: Tests and Final Confirmation**
92
- - `test` - Test execution
93
- - `test:coverage:fresh` - Coverage measurement (fresh cache)
94
- - `check:all` - Overall integrated check
80
+ ### Phases
81
+ 1. **Lint/format** - the project's formatter and linter, such as Biome or ESLint plus Prettier
82
+ 2. **Type check** - type check without emit when the project has a dedicated command
83
+ 3. **Build** - production build
84
+ 4. **Test** - unit and integration tests
85
+ 5. **Coverage** - coverage run when configured or when the task added or changed behavior
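
One way to map the phases above onto the scripts a project declares is sketched below. This is an illustration under assumptions: the candidate script names in `CANDIDATES` and the `planCommands` helper are hypothetical, and a real project may name its scripts differently.

```typescript
type Phase = "lint" | "typecheck" | "build" | "test" | "coverage";

interface PackageJson {
  packageManager?: string;            // e.g. "pnpm@9.1.0"
  scripts?: Record<string, string>;
}

// Assumed candidate names per phase; adjust to the project's own scripts.
const CANDIDATES: Record<Phase, string[]> = {
  lint: ["check", "lint"],
  typecheck: ["type-check", "typecheck"],
  build: ["build"],
  test: ["test"],
  coverage: ["test:coverage", "coverage"],
};

function planCommands(pkg: PackageJson): Partial<Record<Phase, string>> {
  // "pnpm@9.1.0" -> "pnpm"; fall back to npm when the field is absent.
  const manager = pkg.packageManager?.split("@")[0] || "npm";
  const scripts = pkg.scripts ?? {};
  const plan: Partial<Record<Phase, string>> = {};
  for (const phase of Object.keys(CANDIDATES) as Phase[]) {
    // Use the first candidate that the project actually declares.
    const name = CANDIDATES[phase].find((candidate) => candidate in scripts);
    if (name) plan[phase] = `${manager} run ${name}`;
  }
  return plan;
}
```

Phases with no matching script are simply absent from the result, which matches the "when the project has a dedicated command" wording for the type-check phase.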
95
86
 
96
87
  ### Troubleshooting
97
- - **Port in use error**: Run `cleanup:processes` script if available
98
- - **Cache issues**: Run tests with fresh cache option
99
- - **Dependency errors**: Clean reinstall dependencies
88
+ - **Port already in use**: stop the stale dev, preview, or test process holding the port
89
+ - **Stale cache**: re-run with the project's fresh or clean-cache option
90
+ - **Dependency errors**: clean reinstall dependencies
100
91
 
101
92
  ## Frontend Technical Decisions
102
93
 
@@ -108,6 +99,7 @@ Use the appropriate run command based on the `packageManager` field in package.j
108
99
  ### Performance vs Readability
109
100
  - Prioritize readability unless clear bottleneck exists
110
101
  - Measure before optimizing (use React DevTools Profiler, not guesses)
102
+ - When React Compiler is enabled, routine memoization is automatic. Use manual memoization only for a measured bottleneck or stable reference identity required by third-party APIs or effect dependencies.
111
103
  - Document reason with comments when optimizing
112
104
 
113
105
  ## Frontend Impact Analysis
@@ -62,6 +62,11 @@ function isUser(value: unknown): value is User {
62
62
  - **Component Hierarchy**: Follow the project's existing component architecture. Use Atoms > Molecules > Organisms > Templates > Pages only when the project adopts Atomic Design.
63
63
  - **Co-location**: Place tests, styles, and related files alongside components
64
64
 
65
+ **Server/Client Boundary (RSC frameworks only, such as Next.js App Router)**
66
+ - Default to server components for data fetching and rendering. Isolate interactivity behind a `"use client"` boundary at the smallest scope that needs it.
67
+ - Keep browser-only APIs such as `window`, `localStorage`, and event handlers inside client components.
68
+ - Skip this section for client-only SPAs with no server-component runtime.
69
+
65
70
  **State Management Patterns**
66
71
  - **Local State**: `useState` for component-specific state
67
72
  - **Context API**: For sharing state across component tree (theme, auth, etc.)
@@ -95,19 +100,18 @@ setUsers(users)
95
100
  - Type-safe: Always define Props type explicitly
96
101
 
97
102
  **Environment Variables**
98
- - **Use build tool's environment variable system**: `process.env` does not work in browsers
99
- - Centrally manage environment variables through configuration layer
100
- - Implement proper type safety and default value handling
103
+ - **Use the build tool's env accessor**: read client-side env through the bundler's exposed accessor, such as Vite `import.meta.env` or Next.js/CRA prefixed `process.env`.
104
+ - **Only prefixed vars reach the client**: build tools expose only vars carrying their public prefix. Match the project's bundler, such as Vite `VITE_`, Next.js `NEXT_PUBLIC_`, or CRA `REACT_APP_`.
105
+ - Centrally manage environment variables through a typed configuration layer with defaults.
101
106
 
102
107
  ```typescript
103
- // Build tool environment variables (public values only)
108
+ // Client-exposed env must carry the bundler's public prefix, or it is undefined in the browser.
109
+ // Vite: import.meta.env.VITE_API_URL
110
+ // Next.js: process.env.NEXT_PUBLIC_API_URL
104
111
  const config = {
105
- apiUrl: import.meta.env.API_URL || 'http://localhost:3000',
106
- appName: import.meta.env.APP_NAME || 'My App'
112
+ apiUrl: import.meta.env.VITE_API_URL || 'http://localhost:3000',
113
+ appName: import.meta.env.VITE_APP_NAME || 'My App'
107
114
  }
108
-
109
- // Does not work in frontend
110
- // const apiUrl = process.env.API_URL
111
115
  ```
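
The "typed configuration layer with defaults" bullet could look roughly like the sketch below. `readPublicEnv` and `buildConfig` are hypothetical helpers, not Vite or Next.js APIs, and the `VITE_` prefix is used only because the diff's own example targets Vite.

```typescript
// Hypothetical helper; not part of any bundler's API.
type EnvSource = Record<string, string | undefined>;

function readPublicEnv(env: EnvSource, prefix: string, name: string, fallback: string): string {
  if (!name.startsWith(prefix)) {
    // Unprefixed vars are not exposed to client bundles, so fail loudly at startup.
    throw new Error(`${name} lacks the public prefix ${prefix}`);
  }
  const value = env[name];
  return value === undefined || value === "" ? fallback : value;
}

// In a Vite app the argument would be import.meta.env (an assumption about the host project).
function buildConfig(env: EnvSource) {
  return {
    apiUrl: readPublicEnv(env, "VITE_", "VITE_API_URL", "http://localhost:3000"),
    appName: readPublicEnv(env, "VITE_", "VITE_APP_NAME", "My App"),
  } as const;
}
```

Taking the env source as a parameter keeps the layer testable outside the bundler and makes a missing prefix an explicit error instead of a silent `undefined`.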
112
116
 
113
117
  **Security (Client-side Constraints)**
@@ -118,7 +122,7 @@ const config = {
118
122
 
119
123
  ```typescript
120
124
  // Bad: API key exposed in browser
121
- // const apiKey = import.meta.env.API_KEY
125
+ // const apiKey = import.meta.env.VITE_API_KEY
122
126
  // const response = await fetch(`https://api.example.com/data?key=${apiKey}`)
123
127
 
124
128
  // Good: Backend manages secrets, frontend accesses via proxy
@@ -132,6 +136,7 @@ const response = await fetch('/api/data') // Backend handles API key authenticat
132
136
  - Promise Handling: Always use `async/await`
133
137
  - Error Handling: Always handle with `try-catch` or Error Boundary
134
138
  - Type Definition: Explicitly define return value types (e.g., `Promise<Result>`)
139
+ - Effect race/cleanup: guard `useEffect` data fetches against out-of-order responses and post-unmount state updates with `AbortController`, a mounted/stale flag, or a server-state library such as React Query or SWR.
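
The new effect race/cleanup rule can be illustrated without React. The sketch below is a framework-free approximation: `createRequestGuard` and `applyIfFresh` are hypothetical helpers, and in a component `begin()` would run at the top of the effect body while `cancel()` would run in the effect cleanup.

```typescript
// Hypothetical helpers, not part of React or the package.
function createRequestGuard() {
  let current: AbortController | null = null;
  return {
    // Start a new request and cancel the previous in-flight one.
    begin(): AbortSignal {
      current?.abort();
      current = new AbortController();
      return current.signal;
    },
    // Cancel on unmount so late responses are ignored.
    cancel(): void {
      current?.abort();
      current = null;
    },
  };
}

// Apply a response only if its request has not been superseded or cancelled.
function applyIfFresh<T>(signal: AbortSignal, value: T, commit: (value: T) => void): boolean {
  if (signal.aborted) return false;
  commit(value);
  return true;
}
```

The signal returned by `begin()` can also be passed to `fetch(url, { signal })` so the superseded request is aborted at the network level, not just ignored.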
135
140
 
136
141
  **Format Rules**
137
142
  - Semicolon omission (follow project formatter settings)
@@ -209,10 +214,10 @@ Never include sensitive information (password, token, apiKey, secret, creditCard
209
214
 
210
215
  ## Performance Optimization
211
216
 
212
- - Component Memoization: Use React.memo for expensive components
217
+ - Automatic memoization: when React Compiler is enabled, rely on it. Use manual `React.memo`, `useMemo`, or `useCallback` only for a profiler-confirmed bottleneck or stable reference identity required by third-party APIs or effect dependencies.
213
218
  - State Optimization: Minimize re-renders with proper state structure
214
219
  - Lazy Loading: Use React.lazy and Suspense for code splitting
215
- - Bundle Size: Monitor with the `build` script and keep under 500KB
220
+ - Bundle Size: Monitor with the build script against the project's budget
216
221
 
217
222
  ## Non-functional Requirements
218
223
 
@@ -81,7 +81,7 @@ description: "Documentation creation criteria for PRD, ADR, Design Doc, UI Spec,
81
81
 
82
82
  ### Work Plan
83
83
  **Purpose**: Implementation task management and progress tracking
84
- **Scope**: Task breakdown, dependencies, schedule estimates, test skeleton file paths, Verification Strategy summaries from each Design Doc, Design-to-Plan Traceability mapping for implementation-relevant technical requirements, ADR Bindings for implementation-binding ADR decisions, final Quality Assurance phase, and progress tracking only. Technical rationale belongs in ADR and design details belong in Design Doc.
84
+ **Scope**: Task breakdown, dependencies, schedule estimates, test skeleton file paths, Verification Strategy summaries from each Design Doc, Design-to-Plan Traceability mapping for implementation-relevant technical requirements, Reference Contract Values for binding observable Design Doc values, ADR Bindings for implementation-binding ADR decisions, final Quality Assurance phase, and progress tracking only. Technical rationale belongs in ADR and design details belong in Design Doc.
85
85
 
86
86
  **Phase Division Criteria**:
87
87
 
@@ -237,11 +237,12 @@ Rejected Alternatives Log is element-level. Future Extensibility below is design
237
237
  // Record major contract/interface definitions here
238
238
  ```
239
239
 
240
- ### Data Contract
240
+ ### Data Contracts
241
241
 
242
- #### Component 1
242
+ #### [Component or Boundary] (repeat per component/boundary)
243
243
 
244
244
  ```yaml
245
+ Contract: [interface / function / API / schema name]
245
246
  Input:
246
247
  Type: [Data shape, contract, or schema]
247
248
  Preconditions: [Required items, format constraints]
@@ -256,6 +257,14 @@ Invariants:
256
257
  - [Conditions that remain unchanged before and after processing]
257
258
  ```
258
259
 
260
+ ### Observable Contract Values (When Applicable)
261
+
262
+ Use this section when the design defines observable values the implementation must reproduce exactly. Omit it when the Design Doc has no such values.
263
+
264
+ | Contract Type | Required Observable Value |
265
+ |---------------|---------------------------|
266
+ | structure-order / derived-display / state-lifecycle-negative | [Exact column/field/label set and order, derived display rule, or condition where persisted/restored/cached/derived state remains unused] |
267
+
259
268
  ### Test Boundaries
260
269
 
261
270
  #### Mock Boundary Decisions
@@ -274,9 +283,11 @@ Invariants:
274
283
 
275
284
  ### Field Propagation Map (When Fields Cross Boundaries)
276
285
 
277
- | Field | Boundary | Status | Detail |
278
- |-------|----------|--------|--------|
279
- | [field name] | [Component A to B] | preserved / transformed / dropped | [logic or reason] |
286
+ A boundary includes a serialized boundary: a value encoded on one side and parsed on the other through a medium such as a query string, CLI argument, environment variable, config entry, message payload, storage key, or file. For those rows, record the exact encoded representation and how the consumer parses it. Use "-" only when the row is not a serialized boundary.
287
+
288
+ | Field | Boundary | Status | Serialized Format | Consumer Parse Rule | Detail |
289
+ |-------|----------|--------|-------------------|---------------------|--------|
290
+ | [field name] | [Component A to B] | preserved / transformed / dropped | [exact representation the producer emits when serialized; "-" otherwise] | [how the consumer decodes and validates it; "-" otherwise] | [logic or reason] |
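
To make the two new columns concrete, here is a small sketch of one serialized boundary: a value encoded into a query string on one side and parsed on the other. The `sort` field and its `<field>:<direction>` format are hypothetical, chosen only to show what "Serialized Format" and "Consumer Parse Rule" would each describe.

```typescript
// Hypothetical field and format: a `sort` value serialized as "<field>:<direction>".
type Sort = { field: string; direction: "asc" | "desc" };

// Producer side: this exact representation belongs in the Serialized Format column.
function encodeSort(sort: Sort): string {
  return new URLSearchParams({ sort: `${sort.field}:${sort.direction}` }).toString();
}

// Consumer side: this decode-and-validate rule belongs in the Consumer Parse Rule column.
function parseSort(query: string): Sort | null {
  const raw = new URLSearchParams(query).get("sort");
  if (raw === null) return null;
  const [field, direction, ...rest] = raw.split(":");
  if (!field || rest.length > 0) return null;
  if (direction !== "asc" && direction !== "desc") return null;
  return { field, direction };
}
```

Recording both halves in the map is what lets a reviewer notice when the producer and consumer disagree, for example on the separator or on the allowed direction values.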
280
291
 
281
292
  ## Verification Strategy
282
293
 
@@ -81,6 +81,18 @@ Map each Design Doc technical requirement to the task or phase that covers it. U
81
81
  - Merge duplicate restatements of the same obligation from multiple DD sections into one row and cite the primary section in `DD Section`
82
82
  - Keep `scope-boundary` rows concrete: name the protected file group, component boundary, contract, or workflow that must remain unchanged
83
83
 
84
+ ## Reference Contract Values
85
+
86
+ Include this section when a Traceability row's DD Item encodes a binding observable value the implementation must reproduce exactly: a column/label set and order, a derived-display rule where one field determines another display value, or a state-lifecycle negative that states when persisted or derived state must stay unused. Serialized boundaries belong in the Connection Map / Field Propagation Map. When a value qualifies for both this table and a serialized boundary, record it only in the Connection Map. ADR-derived structural decisions belong in ADR Bindings.
87
+
88
+ The Traceability table records coverage. This table carries the required value verbatim so the covering task can check the exact contract.
89
+
90
+ | Design Doc (section) | Contract Type | Required Observable Value (verbatim) | Covered By Task(s) | Gap Status | Notes |
91
+ |----------------------|---------------|--------------------------------------|--------------------|------------|-------|
92
+ | docs/design/xxx-design.md (Section name) | structure-order / derived-display / state-lifecycle-negative | [Exact value copied from the Design Doc] | [P1-T1] | covered | |
93
+
94
+ **Gap Status values**: `covered` (mapped to one or more tasks), `gap` (no task exists yet; set Covered By Task(s) to `-`, include justification in Notes, and require user confirmation before plan approval)
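
The Gap Status rules above are mechanical enough to check in code. The sketch below assumes a hypothetical `ContractRow` shape that mirrors the table columns; neither it nor `checkRow` ships with the package.

```typescript
// Hypothetical row shape mirroring the table columns.
interface ContractRow {
  source: string;
  requiredValue: string;
  coveredBy: string[]; // task IDs such as "P1-T1"; empty when the cell is "-"
  gapStatus: "covered" | "gap";
  notes: string;
}

function checkRow(row: ContractRow, knownTaskIds: Set<string>): string[] {
  const problems: string[] = [];
  if (row.gapStatus === "covered") {
    // A covered row must point at one or more tasks that exist in the plan.
    if (row.coveredBy.length === 0) problems.push("covered row lists no task");
    for (const id of row.coveredBy) {
      if (!knownTaskIds.has(id)) problems.push(`unknown task ${id}`);
    }
  } else {
    // A gap row carries "-" for tasks and a justification in Notes.
    if (row.coveredBy.length > 0) problems.push("gap row must set Covered By Task(s) to -");
    if (row.notes.trim() === "") problems.push("gap row needs a justification in Notes");
  }
  return problems;
}
```

An empty result means the row satisfies the stated rules; user confirmation for `gap` rows remains a human step that this check does not replace.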
95
+
84
96
  ## Failure Mode Checklist
85
97
 
86
98
  Domain-independent failure categories this implementation must guard against. Enumerate all eight categories, mark which apply, and list a covering task for each that applies; keep category names generic and place project-specific detail in task descriptions or notes.
@@ -125,11 +137,13 @@ One row represents one independently checkable binding decision. A single ADR ca
125
137
 
126
138
  ## Connection Map
127
139
 
128
- Include this section when implementation crosses runtime, process, deployment, or service boundaries. Omit it when the change stays inside one runtime or only uses in-process package imports.
140
+ Include this section when implementation crosses runtime, process, deployment, or service boundaries, or when a value is serialized and parsed across a boundary within one runtime through a query string, route parameter, form post, CLI argument, environment variable, config entry, message payload, storage key, or file.
141
+
142
+ For serialized boundaries, fill Serialized Format and Consumer Parse Rule with concrete values. Use "-" only for non-serialized external signals where the Expected Signal fully captures the boundary contract.
129
143
 
130
- | Boundary | Caller / Producer | Callee / Consumer | Expected Signal | Covered By Task(s) |
131
- |----------|-------------------|-------------------|-----------------|--------------------|
132
- | [e.g. "web client -> API"] | [module/package initiating request or message] | [module/package receiving request or message] | [Observable evidence, e.g. HTTP 200 matching schema X] | [P1-T1, P1-T2] |
144
+ | Boundary | Caller / Producer | Callee / Consumer | Serialized Format | Consumer Parse Rule | Expected Signal | Covered By Task(s) |
145
+ |----------|-------------------|-------------------|-------------------|---------------------|-----------------|--------------------|
146
+ | [producing side -> consuming side] | [module/package initiating request or message] | [module/package receiving request or message] | [exact representation the producer emits, or "-"] | [how the consumer decodes and validates it, or "-"] | [Observable evidence, e.g. HTTP 200 matching schema X] | [P1-T1, P1-T2] |
133
147
 
134
148
  ## Objective
135
149
  [Why this change is necessary, what problem it solves]
@@ -243,7 +257,7 @@ This phase is required for all implementation approaches.
243
257
  - [ ] Security review: Verify security considerations from Design Doc are implemented
244
258
  - [ ] Quality checks (types, lint, format)
245
259
  - [ ] Execute all tests (including integration/E2E from test skeletons, when provided)
246
- - [ ] Coverage 70%+
260
+ - [ ] Coverage threshold passes when configured
247
261
  - [ ] Document updates
248
262
 
249
263
  ### Quality Assurance
@@ -17,6 +17,13 @@ Metadata:
17
17
  Files to read before starting implementation. Use concrete file paths, optionally with a section/function hint:
18
18
  - [e.g., src/orders/checkout.ts (processOrder function)]
19
19
 
20
+ ## Change Category
21
+ (Include this field only when the task is a bug fix, regression, state-change, or boundary-change. Omit otherwise.)
22
+
23
+ `Change Category: <one or more of bug-fix, regression, state-change, boundary-change, comma-separated>`
24
+
25
+ When present, sweep cases sharing the same path, contract, persisted state, or external boundary for the same class of defect during the Red Phase.
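
A tool that consumes task files might read the new field as sketched below. This is an assumption about how the line could be parsed, not behavior defined by the package; `parseChangeCategory` is a hypothetical helper, and it tolerates the line appearing with or without surrounding backticks.

```typescript
const CATEGORIES = ["bug-fix", "regression", "state-change", "boundary-change"] as const;
type ChangeCategory = (typeof CATEGORIES)[number];

// Hypothetical parser; returns [] when the task file omits the field.
function parseChangeCategory(taskFile: string): ChangeCategory[] {
  const match = taskFile.match(/^`?Change Category:\s*([^`\n]+)`?\s*$/m);
  if (!match) return [];
  return (match[1] ?? "")
    .split(",")
    .map((value) => value.trim())
    // Keep only the four documented categories.
    .filter((value): value is ChangeCategory =>
      (CATEGORIES as readonly string[]).includes(value));
}
```

A non-empty result is what would trigger the Red Phase sweep described above; an empty result means the sweep step is skipped.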
26
+
20
27
  ## Binding Decisions
21
28
  (Include this section when the work plan's ADR Bindings table covers this task. Omit otherwise.)
22
29
 
@@ -26,14 +33,24 @@ Each row is an ADR decision the implementation in this task must comply with.
26
33
  |--------|------|----------|------------------|
27
34
  | docs/adr/ADR-XXXX-title.md (§ <Source Section>) | [Axis value copied verbatim from the work plan's ADR Bindings row] | [Binding decision copied from the work plan's ADR Bindings row] | [Y/N-answerable positive predicate that evaluates whether the planned and final implementation satisfy the decision] |
28
35
 
36
+ ## Reference Contracts
37
+ (Include this section when the work plan's Reference Contract Values table covers this task. Omit otherwise.)
38
+
39
+ Each row is a Design Doc-derived observable contract the implementation in this task must reproduce exactly. Serialized boundaries are carried by Boundary Context from the work plan's Connection Map. ADR-derived structural decisions are carried by Binding Decisions above.
40
+
41
+ | Source | Contract Type | Required Observable Value | Compliance Check |
42
+ |--------|---------------|---------------------------|------------------|
43
+ | docs/design/xxx-design.md (§ Section name) | structure-order / derived-display / state-lifecycle-negative | [Required Observable Value copied verbatim from the work plan row] | [Y/N-answerable positive predicate that evaluates whether the planned and final implementation reproduces the value] |
44
+
29
45
  ## Investigation Notes
30
46
  Brief observations recorded after reading Investigation Targets:
31
47
  - [path] - [interfaces, control/data flow, state transitions, side effects relevant to this task]
32
- - When Binding Decisions exist, record the planned implementation approach and each Compliance Check result here.
48
+ - When Binding Decisions or Reference Contracts exist, record the planned implementation approach and each Compliance Check result here.
33
49
 
34
50
  ## Implementation Steps (TDD: Red-Green-Refactor)
35
51
  ### 1. Red Phase
36
52
  - [ ] Read all Investigation Targets and update Investigation Notes
53
+ - [ ] (When Change Category is set) Sweep adjacent cases sharing the same path, contract, persisted state, or external boundary for the same class of defect; fold any in-scope residual into failing tests
37
54
  - [ ] Review dependency deliverables (if any)
38
55
  - [ ] Verify/create contract definitions
39
56
  - [ ] Write failing tests
@@ -75,6 +92,7 @@ Brief observations recorded after reading Investigation Targets:
75
92
  - [ ] Each Proof Obligation is met: the test turns red under its primary failure mode and exercises the stated boundary
76
93
  - [ ] Deliverables created (for research/design tasks)
77
94
  - [ ] When Binding Decisions exist, every Compliance Check evaluates to `Y` against the final implementation, with evidence recorded in Investigation Notes
95
+ - [ ] When Reference Contracts exist, every Compliance Check evaluates to `Y` against the final implementation, with evidence recorded in Investigation Notes
78
96
 
79
97
  ## Notes
80
98
  - Impact scope: [Areas where changes may propagate]
@@ -73,7 +73,7 @@ When task files don't exist, the plan references a Design Doc, and the WorkPlan
73
73
 
74
74
  ### 1. Work Plan Review
75
75
 
76
- Spawn document-reviewer agent: "Review the work plan before task decomposition. doc_type: WorkPlan. target: docs/plans/[plan-name].md. mode: composite. Review semantic traceability to the Design Doc, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
76
+ Spawn document-reviewer agent: "Review the work plan before task decomposition. doc_type: WorkPlan. target: docs/plans/[plan-name].md. mode: composite. Review semantic traceability to the Design Doc, Reference Contract Values fidelity, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
77
77
 
78
78
  Branch on `verdict.decision`:
79
79
  - `approved` -> spawn work-planner in update mode once to record `Status: approved` and `Conditions: none` in WorkPlan Review, then continue to user confirmation
@@ -73,7 +73,7 @@ When task files don't exist, the plan references a Design Doc, and the WorkPlan
73
73
 
74
74
  ### 1. Work Plan Review
75
75
 
76
- Spawn document-reviewer agent: "Review the frontend work plan before task decomposition. doc_type: WorkPlan. target: docs/plans/[plan-name].md. mode: composite. Review semantic traceability to the Design Doc and UI Spec, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
76
+ Spawn document-reviewer agent: "Review the frontend work plan before task decomposition. doc_type: WorkPlan. target: docs/plans/[plan-name].md. mode: composite. Review semantic traceability to the Design Doc and UI Spec, Reference Contract Values fidelity, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
77
77
 
78
78
  Branch on `verdict.decision`:
79
79
  - `approved` -> spawn work-planner in update mode once to record `Status: approved` and `Conditions: none` in WorkPlan Review, then continue to user confirmation
@@ -53,7 +53,7 @@ Spawn acceptance-test-generator agent: "Generate test skeletons from Design Doc
53
53
  Spawn work-planner agent: "Create work plan from Design Doc at [path]. Integration test file: [path from step 2]. fixture-e2e test file: [path from step 2 or null]. service-integration-e2e test file: [path from step 2 or null]. E2E absence reasons by lane: [values from step 2 when an E2E lane is null]. Integration tests are created with each phase implementation, fixture-e2e runs alongside UI implementation, service-integration-e2e runs only in the final phase when a service E2E file exists. Include `Implementation Readiness: pending` in the work plan header."
54
54
 
55
55
  ### Step 4: Work Plan Review
56
- Spawn document-reviewer agent: "Review the frontend work plan. doc_type: WorkPlan. target: docs/plans/[plan-name].md. mode: composite. Review semantic traceability to the Design Doc and UI Spec, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
56
+ Spawn document-reviewer agent: "Review the frontend work plan. doc_type: WorkPlan. target: docs/plans/[plan-name].md. mode: composite. Review semantic traceability to the Design Doc and UI Spec, Reference Contract Values fidelity, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
57
57
 
58
58
  Branch on `verdict.decision`:
59
59
  - `approved` -> spawn work-planner in update mode once to record `Status: approved` and `Conditions: none` in WorkPlan Review, then proceed to Step 5
@@ -83,7 +83,7 @@ When task files don't exist, the plan references a Design Doc, and the WorkPlan
83
83
 
84
84
  ### 1. Work Plan Review
85
85
 
86
- Spawn document-reviewer agent: "Review the fullstack work plan before task decomposition. doc_type: WorkPlan. target: docs/plans/[plan-name].md. mode: composite. Review semantic traceability to all Design Docs, UI Spec when present, cross-layer boundary coverage, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
86
+ Spawn document-reviewer agent: "Review the fullstack work plan before task decomposition. doc_type: WorkPlan. target: docs/plans/[plan-name].md. mode: composite. Review semantic traceability to all Design Docs, UI Spec when present, Reference Contract Values fidelity, cross-layer boundary coverage, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
87
87
 
88
88
  Branch on `verdict.decision`:
89
89
  - `approved` -> spawn work-planner in update mode once to record `Status: approved` and `Conditions: none` in WorkPlan Review, then continue to user confirmation
@@ -56,7 +56,7 @@ Present options if multiple exist (can be specified with $ARGUMENTS).
56
56
  - Spawn work-planner agent: "Create work plan from design document at [design-doc-path]. Include deliverables from previous process according to subagents-orchestration-guide skill coordination specification. If `generatedFiles.fixtureE2e` or `generatedFiles.serviceE2e` is null, use the corresponding `e2eAbsenceReason` and accept the null E2E lane as a valid planning input. Include `Implementation Readiness: pending` in the work plan header."
57
57
 
58
58
  ### Step 4: Work Plan Review
59
- Spawn document-reviewer agent: "Review the work plan. doc_type: WorkPlan. target: docs/plans/[plan-name].md. mode: composite. Review semantic traceability to the Design Doc, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
59
+ Spawn document-reviewer agent: "Review the work plan. doc_type: WorkPlan. target: docs/plans/[plan-name].md. mode: composite. Review semantic traceability to the Design Doc, Reference Contract Values fidelity, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
60
60
 
61
61
  Branch on `verdict.decision`:
62
62
  - `approved` -> spawn work-planner in update mode once to record `Status: approved` and `Conditions: none` in WorkPlan Review, then proceed to Step 5
@@ -31,7 +31,7 @@ Each criterion produces `pass`, `fail`, or `not_applicable`, with file:line evid
31
31
 
32
32
  | ID | Criterion | Pass Evidence |
33
33
  |----|-----------|---------------|
34
- | R1 | Verification Strategy and ADR Binding references resolve | Every command, file path, function, endpoint, fixture, seed, and test reference in the work plan's Verification Strategies either exists now or is the deliverable of a task in the plan; every ADR Bindings source path resolves; every ADR Bindings `covered` row references existing task IDs |
34
+ | R1 | Verification Strategy and binding references resolve | Every command, file path, function, endpoint, fixture, seed, and test reference in the work plan's Verification Strategies either exists now or is the deliverable of a task in the plan; every Reference Contract Values `covered` row references existing task IDs; every Reference Contract Values `gap` row has Notes with user-confirmation handling; every ADR Bindings source path resolves; every ADR Bindings `covered` row references existing task IDs |
35
35
  | R2 | E2E prerequisites are addressed | For each fixture-e2e or service-integration-e2e skeleton, every noted precondition is present in the codebase or covered by a Phase 0 task |
36
36
  | R3 | Phase 1 observability exists | The first implementation phase includes at least one operation verification method executable at task completion using existing files, prior Phase 0 deliverables, or the task's own output |
37
37
  | R4 | UI rendering surface exists | When the plan implements UI components, a fixture entry, dev route, Storybook story, preview harness, or equivalent render surface exists or is covered by a Phase 0 task |
@@ -47,6 +47,7 @@ Read the work plan passed in `$ARGUMENTS`; if absent, select the most recent non
47
47
  - Verification Strategies
48
48
  - Quality Assurance Mechanisms
49
49
  - Design-to-Plan Traceability
50
+ - Reference Contract Values
50
51
  - ADR Bindings
51
52
  - UI Spec Component -> Task Mapping
52
53
  - Connection Map
@@ -219,9 +219,9 @@ Work plans use the header line `Implementation Readiness: <status>`.
219
219
 
220
220
  Use this procedure after work-plan approval and before autonomous task execution when the flow needs to verify implementation readiness. The procedure supplies the evidence needed for user decisions; prompts for approval only after concrete failing criteria and proposed prep tasks are known.
221
221
 
222
- 1. Load the approved work plan exact path and extract Verification Strategies, Quality Assurance Mechanisms, Design-to-Plan Traceability, ADR Bindings, UI Spec Component -> Task Mapping, Connection Map, test skeleton references, E2E absence reasons, phase structure, referenced Design Docs, ADRs, and UI Specs.
222
+ 1. Load the approved work plan exact path and extract Verification Strategies, Quality Assurance Mechanisms, Design-to-Plan Traceability, Reference Contract Values, ADR Bindings, UI Spec Component -> Task Mapping, Connection Map, test skeleton references, E2E absence reasons, phase structure, referenced Design Docs, ADRs, and UI Specs.
223
223
  2. Evaluate these criteria with evidence:
224
- - R1 Verification Strategy and ADR Binding references resolve
224
+ - R1 Verification Strategy and binding references resolve
225
225
  - R2 E2E prerequisites are addressed
226
226
  - R3 Phase 1 observability exists
227
227
  - R4 UI rendering surface exists when UI work is present
@@ -105,7 +105,7 @@ work-planner's existing Integration Complete criteria naturally covers cross-lay
105
105
 
106
106
  After work-planner creates or updates the plan, spawn document-reviewer:
107
107
 
108
- > "Review the fullstack work plan. doc_type: WorkPlan. target: [work plan path]. mode: composite. Review semantic traceability to all Design Docs, UI Spec when present, cross-layer boundary coverage, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
108
+ > "Review the fullstack work plan. doc_type: WorkPlan. target: [work plan path]. mode: composite. Review semantic traceability to all Design Docs, UI Spec when present, Reference Contract Values fidelity, cross-layer boundary coverage, early verification placement, real-boundary verification coverage, Proof Strategy, Failure Mode Checklist, Review Scope, and Quality Assurance coverage."
109
109
 
110
110
  On `needs_revision` or `approved_with_conditions`, return to work-planner in update mode and re-review for max 2 revision iterations as defined by the `needs_revision` row in Approval Status Vocabulary. On `rejected`, halt and escalate to the user. Stop for batch approval only after WorkPlan review returns `approved` and the plan's `WorkPlan Review` section records `Status: approved` with `Conditions: none`.
111
111
 
@@ -52,11 +52,11 @@ For language-specific testing patterns, also read:
52
52
 
53
53
  ## Quality Requirements [MANDATORY]
54
54
 
55
- ### Coverage Standards
55
+ ### Coverage
56
56
 
57
- - **Minimum 80% code coverage** for production code
58
- - Prioritize critical paths and business logic
59
- - Use coverage as a guide, not a goal
57
+ - Treat coverage as a diagnostic signal for finding untested areas, not a target. Targets get gamed into trivial tests.
58
+ - Concentrate tests on critical paths, business logic, and behavior whose regression would matter.
59
+ - Prioritize meaningful assertions over the coverage number. Any required threshold comes from the project's CI, task file, work plan, or Design Doc.
60
60
 
61
61
  ### Test Characteristics
62
62
 
@@ -279,7 +279,7 @@ Always test:
279
279
  ☐ All tests pass
280
280
  ☐ No tests skipped or commented
281
281
  ☐ No debug code left in tests
282
- ☐ Test coverage meets standards (≥ 80%)
282
+ ☐ Coverage threshold passes when the project, task file, work plan, or Design Doc defines one
283
283
  ☐ Tests run in reasonable time
284
284
 
285
285
  ### Zero Tolerance Policy
@@ -10,13 +10,9 @@ import { render, screen } from '@testing-library/react'
10
10
  import userEvent from '@testing-library/user-event'
11
11
  ```
12
12
 
13
- ### Coverage Requirements
13
+ ### Where to Concentrate Test Rigor
14
14
 
15
- - **Overall minimum**: 60%
16
- - **Atomic Design projects**: Atoms 70%+, Molecules 65%+, Organisms 60%+
17
- - **Other component architectures**: Keep 60% as the baseline and raise foundational or highly reused components to 70%+
18
- - **Custom Hooks**: 65%+
19
- - **Utils**: 70%+
15
+ Test foundational, high-reuse units the hardest: shared components, custom hooks, utilities, and business rules reused across features carry the widest blast radius. Higher-composition surfaces such as pages and organisms lean more on integration or E2E coverage. Any numeric threshold is the project's CI, task file, work plan, or Design Doc config.
20
16
 
21
17
  ### Test Types
22
18
 
@@ -265,53 +265,11 @@ A skeleton is committed before its implementation exists, so its committed form
265
265
  ### Generation Report
266
266
 
267
267
  ```json
268
- {
269
- "status": "completed",
270
- "feature": "[feature name]",
271
- "generatedFiles": {
272
- "integration": "[path]/[feature].int.test.[ext]",
273
- "fixtureE2e": null,
274
- "serviceE2e": null
275
- },
276
- "budgetUsage": {
277
- "integration": "2/3",
278
- "fixtureE2e": "0/3",
279
- "serviceE2e": "0/2"
280
- },
281
- "e2eAbsenceReason": {
282
- "fixtureE2e": "all_e2e_candidates_below_threshold",
283
- "serviceE2e": "no_real_service_dependency"
284
- },
285
- "boundaryProofGaps": []
286
- }
268
+ {"status":"completed","feature":"[feature name]","generatedFiles":{"integration":"[path]/[feature].int.test.[ext]","fixtureE2e":null,"serviceE2e":null},"budgetUsage":{"integration":"2/3","fixtureE2e":"0/3","serviceE2e":"0/2"},"e2eAbsenceReason":{"fixtureE2e":"all_e2e_candidates_below_threshold","serviceE2e":"no_real_service_dependency"},"boundaryProofGaps":[]}
287
269
  ```
288
270
 
289
271
  ```json
290
- {
291
- "status": "completed",
292
- "feature": "[feature name]",
293
- "generatedFiles": {
294
- "integration": "[path]/[feature].int.test.[ext]",
295
- "fixtureE2e": "[path]/[feature].fixture.e2e.test.[ext]",
296
- "serviceE2e": "[path]/[feature].service.e2e.test.[ext]"
297
- },
298
- "budgetUsage": {
299
- "integration": "2/3",
300
- "fixtureE2e": "1/3",
301
- "serviceE2e": "1/2"
302
- },
303
- "e2eAbsenceReason": {
304
- "fixtureE2e": null,
305
- "serviceE2e": null
306
- },
307
- "boundaryProofGaps": [
308
- {
309
- "acId": "[AC-XXX]",
310
- "boundaryPath": "[branch/state/input/lifecycle/fallback/visibility path]",
311
- "reason": "budget_insufficient_for_boundary_proof"
312
- }
313
- ]
314
- }
272
+ {"status":"completed","feature":"[feature name]","generatedFiles":{"integration":"[path]/[feature].int.test.[ext]","fixtureE2e":"[path]/[feature].fixture.e2e.test.[ext]","serviceE2e":"[path]/[feature].service.e2e.test.[ext]"},"budgetUsage":{"integration":"2/3","fixtureE2e":"1/3","serviceE2e":"1/2"},"e2eAbsenceReason":{"fixtureE2e":null,"serviceE2e":null},"boundaryProofGaps":[{"acId":"[AC-XXX]","boundaryPath":"[branch/state/input/lifecycle/fallback/visibility path]","reason":"budget_insufficient_for_boundary_proof"}]}
315
273
  ```
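
The compacted Generation Report keeps the same fields as the multi-line form, so a consumer can still parse it directly. The sketch below is illustrative only: the `GenerationReport` interface, `parseBudget`, and `laneInconsistencies` are hypothetical names, and the rule that a `null` generated file is paired with a non-null absence reason is inferred from the two examples above, not stated by the package.

```typescript
// Hypothetical types mirroring the fields shown in the Generation Report examples.
type E2eLane = "fixtureE2e" | "serviceE2e";

interface GenerationReport {
  status: string;
  feature: string;
  generatedFiles: { integration: string | null; fixtureE2e: string | null; serviceE2e: string | null };
  budgetUsage: { integration: string; fixtureE2e: string; serviceE2e: string };
  e2eAbsenceReason: { fixtureE2e: string | null; serviceE2e: string | null };
  boundaryProofGaps: { acId: string; boundaryPath: string; reason: string }[];
}

// Split a "used/limit" budget string such as "2/3" into numbers.
function parseBudget(usage: string): { used: number; limit: number } {
  const [used, limit] = usage.split("/").map(Number);
  return { used, limit };
}

// Assumed rule (inferred from the examples): a lane either has a generated
// file or an absence reason, never both and never neither.
function laneInconsistencies(report: GenerationReport): E2eLane[] {
  const lanes: E2eLane[] = ["fixtureE2e", "serviceE2e"];
  return lanes.filter(
    (lane) => (report.generatedFiles[lane] === null) !== (report.e2eAbsenceReason[lane] !== null),
  );
}

// The first example report from the diff, parsed from its single-line form.
const compactReport = JSON.parse(
  '{"status":"completed","feature":"[feature name]","generatedFiles":{"integration":"[path]/[feature].int.test.[ext]","fixtureE2e":null,"serviceE2e":null},"budgetUsage":{"integration":"2/3","fixtureE2e":"0/3","serviceE2e":"0/2"},"e2eAbsenceReason":{"fixtureE2e":"all_e2e_candidates_below_threshold","serviceE2e":"no_real_service_dependency"},"boundaryProofGaps":[]}',
) as GenerationReport;

const integrationBudget = parseBudget(compactReport.budgetUsage.integration);
const inconsistentLanes = laneInconsistencies(compactReport);
```

Here `integrationBudget` holds the integration lane's used and limit counts, and `inconsistentLanes` is empty because both E2E lanes have no file and do carry an absence reason.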
316
274
 
317
275
  ## Test Meta Information Assignment
@@ -64,6 +64,7 @@ Read the Design Doc in full and extract:
64
64
  - Architecture design and data flow
65
65
  - Interface contracts (function signatures, API endpoints, data structures)
66
66
  - Identifier specifications explicitly written in the Design Doc as exact values, literals, labels, or named fields (resource names, endpoint paths, configuration keys, error codes, schema/model names)
67
+ - Binding observable contracts: use the Design Doc's `Observable Contract Values` table as the primary source when present; otherwise extract column/field/label sets and order, derived-display rules, and state-lifecycle negatives from the Design Doc. Also extract Field Propagation Map rows that carry a Serialized Format and Consumer Parse Rule
67
68
  - Error handling policy
68
69
  - Non-functional requirements
69
70
 
@@ -78,6 +79,7 @@ For each acceptance criterion extracted in Step 1:
78
79
  - For behavior-changing ACs, confirm the evidence covers main and boundary paths. Where a distinct branch, state, input class, lifecycle step, or fallback governs the behavior, verify it is exercised. Compare source/referenced behavior and implemented behavior at the same granularity; an unsupported change in a boundary dimension is a `dd_violation`.
79
80
  - Confirm the implementation keeps the core mechanism the AC, Design Doc, or referenced materials require. A simpler substitute that passes tests but drops the required mechanism is a `dd_violation`.
80
81
  - For changes to persisted, shared, or externally observable state, identify the publication boundary where the new state becomes observable to another process, component, user, or later step. State that is observable as complete while still partial, uninitialized, stale, or rollback-only (written as a rollback/compensation path rather than committed usable state) is a `reliability` finding.
82
+ - When the reviewed task has `Change Category` set to `bug-fix`, `regression`, `state-change`, or `boundary-change`, check cases sharing its path, contract, persisted state, or external boundary. When no task field is present, classify the change from the diff itself. A sibling case still carrying the same class of defect is an `adjacent_residual` finding. When the task file is in scope, also read Investigation Notes for residuals the executor recorded as outside Target Files; verify each recorded residual and report in-scope unresolved residuals as `adjacent_residual`.
81
83
 
82
84
  #### 2-2. Identifier Verification
83
85
  For each identifier specification extracted in Step 1:
@@ -105,6 +107,13 @@ Assign confidence based on evidence count:
105
107
  - medium: 2 agreeing sources
106
108
  - low: 1 source only
107
109
 
110
+ #### 2-4. Reference Contract and Boundary Verification
111
+
112
+ Run this independently of the AC loop so observable contracts without dedicated ACs are verified.
113
+
114
+ 1. For each binding observable value extracted in Step 1 (column/field/label set and order, derived-display rule, state-lifecycle negative), verify the implementation reproduces it exactly. A deviation is a `dd_violation` whose rationale names it a reference contract gap and states the required observable value versus the implemented value.
115
+ 2. For each Field Propagation Map serialized boundary extracted in Step 1 (Serialized Format and Consumer Parse Rule), verify the producer emits the recorded representation and the consumer parses it by the recorded rule. A mismatch is a `dd_violation` whose rationale names it a boundary contract gap and states what the producer emits versus what the consumer parses.
116
+
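
The two checks in step 2-4 can be sketched as small comparison functions. This is a minimal illustration under assumed shapes: `ContractFinding`, `SerializedBoundary`, `checkColumnOrder`, and `checkBoundary` are hypothetical names, and the integer-cents example row is invented for the demonstration rather than taken from any Design Doc.

```typescript
// Hypothetical shapes for illustration; none of these names come from the package.
interface ContractFinding {
  category: "dd_violation";
  description: string;
  rationale: string;
}

// One Field Propagation Map row that carries a serialized boundary (assumed shape).
interface SerializedBoundary {
  field: string;
  recordedFormat: RegExp; // the recorded Serialized Format
  produce: (value: number) => string; // what the producer emits
  parse: (raw: string) => number; // how the consumer parses it
}

// Check 1: the implemented column/field/label set and order must match exactly.
function checkColumnOrder(required: string[], implemented: string[]): ContractFinding[] {
  const same =
    required.length === implemented.length &&
    required.every((name, i) => name === implemented[i]);
  if (same) return [];
  return [{
    category: "dd_violation",
    description: "Column set or order deviates from the Design Doc",
    rationale: `reference contract gap: required [${required.join(", ")}], implemented [${implemented.join(", ")}]`,
  }];
}

// Check 2: the producer emits the recorded format and the consumer recovers the value.
function checkBoundary(row: SerializedBoundary, sample: number): ContractFinding[] {
  const findings: ContractFinding[] = [];
  const emitted = row.produce(sample);
  if (!row.recordedFormat.test(emitted)) {
    findings.push({
      category: "dd_violation",
      description: `Producer output for ${row.field} does not match the recorded format`,
      rationale: `boundary contract gap: producer emits "${emitted}", recorded format is ${row.recordedFormat}`,
    });
  }
  const parsed = row.parse(emitted);
  if (parsed !== sample) {
    findings.push({
      category: "dd_violation",
      description: `Consumer does not recover the value of ${row.field}`,
      rationale: `boundary contract gap: producer emits "${emitted}", consumer parses ${parsed}`,
    });
  }
  return findings;
}

// Invented example: the contract records integer cents, the producer emits decimal units.
const centsRow: SerializedBoundary = {
  field: "amount",
  recordedFormat: /^\d+$/,
  produce: (v) => (v / 100).toFixed(2),
  parse: (raw) => Number.parseInt(raw, 10),
};
const boundaryFindings = checkBoundary(centsRow, 1250);
```

In this example the producer emits `12.50` where integer cents are recorded, so both the format check and the round-trip check report a `dd_violation` with a boundary contract gap rationale.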
108
117
  ### 3. Assess Code Quality
109
118
  Read each implementation file and evaluate:
110
119
 
@@ -134,12 +143,14 @@ Classify each quality finding into one of:
134
143
  - `maintainability`: code structure impedes change or comprehension
135
144
  - `reliability`: missing safeguards could cause runtime failure
136
145
  - `coverage_gap`: acceptance criteria lack meaningful test verification
146
+ - `adjacent_residual`: a case sharing the change's path, contract, persisted state, or external boundary still carries the same class of defect
137
147
 
138
148
  Each finding MUST include a rationale:
139
149
  - `dd_violation`: what the Design Doc says vs what code does
140
150
  - `maintainability`: the concrete maintenance or comprehension risk
141
151
  - `reliability`: the failure scenario and triggering conditions
142
152
  - `coverage_gap`: the untested AC and why coverage matters
153
+ - `adjacent_residual`: which adjacent case shares the path, contract, persisted state, or external boundary and how it still exhibits the defect class
143
154
 
144
155
  ### 4. Check Architecture Compliance
145
156
  Verify against the Design Doc architecture:
@@ -161,63 +172,7 @@ Return the JSON result as the final response. See Output Format for the schema.
161
172
  ## Output Format
162
173
 
163
174
  ```json
164
- {
165
- "complianceRate": "[X]%",
166
- "identifierMatchRate": "[X]%",
167
- "verdict": "[pass/needs-improvement/needs-redesign]",
168
-
169
- "acceptanceCriteria": [
170
- {
171
- "item": "[acceptance criteria name]",
172
- "status": "fulfilled|partially_fulfilled|unfulfilled",
173
- "confidence": "high|medium|low",
174
- "location": "[file:line, if implemented]",
175
- "evidence": ["[source1: file:line]", "[source2: test file:line]"],
176
- "gap": "[what is missing or deviating, if not fully fulfilled]",
177
- "suggestion": "[specific fix, if not fully fulfilled]"
178
- }
179
- ],
180
-
181
- "identifierVerification": [
182
- {
183
- "identifier": "[identifier name]",
184
- "designDocValue": "[value specified in Design Doc]",
185
- "codeValue": "[value found in code, or 'not found']",
186
- "location": "[file:line]",
187
- "confidence": "high|medium|low",
188
- "evidence": ["[source1: file:line]", "[source2: config file:line]"],
189
- "match": false
190
- }
191
- ],
192
-
193
- "qualityFindings": [
194
- {
195
- "category": "dd_violation|maintainability|reliability|coverage_gap",
196
- "location": "[filename:function or file:line]",
197
- "description": "[specific issue]",
198
- "rationale": "[why this matters]",
199
- "suggestion": "[specific improvement]"
200
- }
201
- ],
202
-
203
- "summary": {
204
- "acsTotal": 0,
205
- "acsFulfilled": 0,
206
- "acsPartial": 0,
207
- "acsUnfulfilled": 0,
208
- "identifiersTotal": 0,
209
- "identifiersMatched": 0,
210
- "lowConfidenceItems": 0,
211
- "findingsByCategory": {
212
- "dd_violation": 0,
213
- "maintainability": 0,
214
- "reliability": 0,
215
- "coverage_gap": 0
216
- }
217
- },
218
-
219
- "nextAction": "[highest priority action needed]"
220
- }
175
+ {"complianceRate":"[X]%","identifierMatchRate":"[X]%","verdict":"[pass/needs-improvement/needs-redesign]","acceptanceCriteria":[{"item":"[acceptance criteria name]","status":"fulfilled|partially_fulfilled|unfulfilled","confidence":"high|medium|low","location":"[file:line, if implemented]","evidence":["[source1: file:line]","[source2: test file:line]"],"gap":"[what is missing or deviating, if not fully fulfilled]","suggestion":"[specific fix, if not fully fulfilled]"}],"identifierVerification":[{"identifier":"[identifier name]","designDocValue":"[value specified in Design Doc]","codeValue":"[value found in code, or 'not found']","location":"[file:line]","confidence":"high|medium|low","evidence":["[source1: file:line]","[source2: config file:line]"],"match":false}],"qualityFindings":[{"category":"dd_violation|maintainability|reliability|coverage_gap|adjacent_residual","location":"[filename:function or file:line]","description":"[specific issue]","rationale":"[why this matters]","suggestion":"[specific improvement]"}],"summary":{"acsTotal":0,"acsFulfilled":0,"acsPartial":0,"acsUnfulfilled":0,"identifiersTotal":0,"identifiersMatched":0,"lowConfidenceItems":0,"findingsByCategory":{"dd_violation":0,"maintainability":0,"reliability":0,"coverage_gap":0,"adjacent_residual":0}},"nextAction":"[highest priority action needed]"}
221
176
  ```
222
177
 
223
178
  `identifierVerification` MUST include mismatches only. Use `summary.identifiersTotal` and `summary.identifiersMatched` for overall counts.
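
A consumer of this output could type the finding categories, including the new `adjacent_residual` value, and derive `summary.findingsByCategory` and the identifier counts from the raw lists. The sketch below is an assumption-laden illustration: `QualityFinding`, `IdentifierCheck`, `tallyByCategory`, and `summarizeIdentifiers` are hypothetical helper names and only mirror the field names shown in the schema.

```typescript
// Category values copied from the schema; the type and helper names are hypothetical.
type FindingCategory =
  | "dd_violation"
  | "maintainability"
  | "reliability"
  | "coverage_gap"
  | "adjacent_residual";

interface QualityFinding {
  category: FindingCategory;
  location: string;
  description: string;
  rationale: string;
  suggestion: string;
}

interface IdentifierCheck {
  identifier: string;
  designDocValue: string;
  codeValue: string;
  match: boolean;
}

// Build summary.findingsByCategory, starting every category at zero.
function tallyByCategory(findings: QualityFinding[]): Record<FindingCategory, number> {
  const counts: Record<FindingCategory, number> = {
    dd_violation: 0,
    maintainability: 0,
    reliability: 0,
    coverage_gap: 0,
    adjacent_residual: 0,
  };
  for (const finding of findings) counts[finding.category] += 1;
  return counts;
}

// Report only mismatches, while the summary counts cover every checked identifier.
function summarizeIdentifiers(checks: IdentifierCheck[]) {
  return {
    identifierVerification: checks.filter((check) => !check.match),
    identifiersTotal: checks.length,
    identifiersMatched: checks.filter((check) => check.match).length,
  };
}

// Placeholder data for illustration only.
const sampleFindings: QualityFinding[] = [
  { category: "adjacent_residual", location: "a.ts:10", description: "d", rationale: "r", suggestion: "s" },
  { category: "adjacent_residual", location: "b.ts:22", description: "d", rationale: "r", suggestion: "s" },
  { category: "reliability", location: "c.ts:5", description: "d", rationale: "r", suggestion: "s" },
];
const sampleCounts = tallyByCategory(sampleFindings);
const sampleIdentifiers = summarizeIdentifiers([
  { identifier: "endpoint", designDocValue: "/v1/items", codeValue: "/v1/items", match: true },
  { identifier: "errorCode", designDocValue: "E_LIMIT", codeValue: "not found", match: false },
  { identifier: "configKey", designDocValue: "retry.max", codeValue: "retry.max", match: true },
]);
```

With the placeholder data, `sampleCounts.adjacent_residual` is 2 and `sampleIdentifiers.identifierVerification` contains only the single mismatching entry, while the totals still account for all three identifiers.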