@zigrivers/scaffold 2.38.1 → 2.44.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (201)
  1. package/README.md +10 -7
  2. package/dist/cli/commands/build.js +4 -4
  3. package/dist/cli/commands/build.js.map +1 -1
  4. package/dist/cli/commands/check.test.js +11 -8
  5. package/dist/cli/commands/check.test.js.map +1 -1
  6. package/dist/cli/commands/complete.d.ts.map +1 -1
  7. package/dist/cli/commands/complete.js +2 -1
  8. package/dist/cli/commands/complete.js.map +1 -1
  9. package/dist/cli/commands/complete.test.js +4 -1
  10. package/dist/cli/commands/complete.test.js.map +1 -1
  11. package/dist/cli/commands/dashboard.js +4 -4
  12. package/dist/cli/commands/dashboard.js.map +1 -1
  13. package/dist/cli/commands/knowledge.js +2 -2
  14. package/dist/cli/commands/knowledge.js.map +1 -1
  15. package/dist/cli/commands/knowledge.test.js +5 -12
  16. package/dist/cli/commands/knowledge.test.js.map +1 -1
  17. package/dist/cli/commands/list.d.ts +1 -1
  18. package/dist/cli/commands/list.d.ts.map +1 -1
  19. package/dist/cli/commands/list.js +84 -3
  20. package/dist/cli/commands/list.js.map +1 -1
  21. package/dist/cli/commands/list.test.js +82 -0
  22. package/dist/cli/commands/list.test.js.map +1 -1
  23. package/dist/cli/commands/next.test.js +4 -1
  24. package/dist/cli/commands/next.test.js.map +1 -1
  25. package/dist/cli/commands/reset.d.ts.map +1 -1
  26. package/dist/cli/commands/reset.js +5 -2
  27. package/dist/cli/commands/reset.js.map +1 -1
  28. package/dist/cli/commands/reset.test.js +4 -1
  29. package/dist/cli/commands/reset.test.js.map +1 -1
  30. package/dist/cli/commands/rework.d.ts.map +1 -1
  31. package/dist/cli/commands/rework.js +3 -2
  32. package/dist/cli/commands/rework.js.map +1 -1
  33. package/dist/cli/commands/run.d.ts.map +1 -1
  34. package/dist/cli/commands/run.js +28 -13
  35. package/dist/cli/commands/run.js.map +1 -1
  36. package/dist/cli/commands/run.test.js +1 -1
  37. package/dist/cli/commands/run.test.js.map +1 -1
  38. package/dist/cli/commands/skip.d.ts.map +1 -1
  39. package/dist/cli/commands/skip.js +2 -1
  40. package/dist/cli/commands/skip.js.map +1 -1
  41. package/dist/cli/commands/skip.test.js +4 -1
  42. package/dist/cli/commands/skip.test.js.map +1 -1
  43. package/dist/cli/commands/status.d.ts.map +1 -1
  44. package/dist/cli/commands/status.js +88 -4
  45. package/dist/cli/commands/status.js.map +1 -1
  46. package/dist/cli/commands/version.d.ts.map +1 -1
  47. package/dist/cli/commands/version.js +22 -3
  48. package/dist/cli/commands/version.js.map +1 -1
  49. package/dist/cli/commands/version.test.js +42 -0
  50. package/dist/cli/commands/version.test.js.map +1 -1
  51. package/dist/cli/output/context.test.js +14 -13
  52. package/dist/cli/output/context.test.js.map +1 -1
  53. package/dist/cli/output/interactive.js +4 -4
  54. package/dist/cli/output/json.d.ts +1 -0
  55. package/dist/cli/output/json.d.ts.map +1 -1
  56. package/dist/cli/output/json.js +14 -1
  57. package/dist/cli/output/json.js.map +1 -1
  58. package/dist/config/loader.d.ts.map +1 -1
  59. package/dist/config/loader.js +10 -3
  60. package/dist/config/loader.js.map +1 -1
  61. package/dist/config/loader.test.js +28 -0
  62. package/dist/config/loader.test.js.map +1 -1
  63. package/dist/core/assembly/engine.d.ts.map +1 -1
  64. package/dist/core/assembly/engine.js +6 -1
  65. package/dist/core/assembly/engine.js.map +1 -1
  66. package/dist/e2e/init.test.js +3 -0
  67. package/dist/e2e/init.test.js.map +1 -1
  68. package/dist/index.js +2 -1
  69. package/dist/index.js.map +1 -1
  70. package/dist/project/adopt.test.js +3 -0
  71. package/dist/project/adopt.test.js.map +1 -1
  72. package/dist/project/claude-md.d.ts.map +1 -1
  73. package/dist/project/claude-md.js +2 -1
  74. package/dist/project/claude-md.js.map +1 -1
  75. package/dist/project/detector.js +3 -3
  76. package/dist/project/detector.js.map +1 -1
  77. package/dist/project/signals.d.ts +1 -0
  78. package/dist/project/signals.d.ts.map +1 -1
  79. package/dist/state/decision-logger.d.ts.map +1 -1
  80. package/dist/state/decision-logger.js +7 -4
  81. package/dist/state/decision-logger.js.map +1 -1
  82. package/dist/state/lock-manager.js +1 -1
  83. package/dist/state/lock-manager.js.map +1 -1
  84. package/dist/state/lock-manager.test.js +27 -3
  85. package/dist/state/lock-manager.test.js.map +1 -1
  86. package/dist/state/state-manager.d.ts.map +1 -1
  87. package/dist/state/state-manager.js +6 -0
  88. package/dist/state/state-manager.js.map +1 -1
  89. package/dist/state/state-manager.test.js +7 -0
  90. package/dist/state/state-manager.test.js.map +1 -1
  91. package/dist/types/assembly.d.ts +2 -0
  92. package/dist/types/assembly.d.ts.map +1 -1
  93. package/dist/utils/eligible.d.ts +8 -0
  94. package/dist/utils/eligible.d.ts.map +1 -0
  95. package/dist/utils/eligible.js +36 -0
  96. package/dist/utils/eligible.js.map +1 -0
  97. package/dist/validation/config-validator.test.js +15 -13
  98. package/dist/validation/config-validator.test.js.map +1 -1
  99. package/dist/validation/index.test.js +1 -1
  100. package/dist/wizard/wizard.d.ts.map +1 -1
  101. package/dist/wizard/wizard.js +1 -0
  102. package/dist/wizard/wizard.js.map +1 -1
  103. package/dist/wizard/wizard.test.js +2 -0
  104. package/dist/wizard/wizard.test.js.map +1 -1
  105. package/knowledge/core/automated-review-tooling.md +4 -4
  106. package/knowledge/core/eval-craft.md +44 -0
  107. package/knowledge/core/multi-model-review-dispatch.md +8 -0
  108. package/knowledge/core/system-architecture.md +39 -0
  109. package/knowledge/core/task-decomposition.md +53 -0
  110. package/knowledge/core/testing-strategy.md +160 -0
  111. package/knowledge/finalization/implementation-playbook.md +24 -7
  112. package/knowledge/product/prd-craft.md +41 -0
  113. package/knowledge/review/review-adr.md +1 -1
  114. package/knowledge/review/review-api-design.md +1 -1
  115. package/knowledge/review/review-database-design.md +1 -1
  116. package/knowledge/review/review-domain-modeling.md +1 -1
  117. package/knowledge/review/review-implementation-tasks.md +1 -1
  118. package/knowledge/review/review-methodology.md +1 -1
  119. package/knowledge/review/review-operations.md +1 -1
  120. package/knowledge/review/review-prd.md +1 -1
  121. package/knowledge/review/review-security.md +1 -1
  122. package/knowledge/review/review-system-architecture.md +1 -1
  123. package/knowledge/review/review-testing-strategy.md +1 -1
  124. package/knowledge/review/review-user-stories.md +1 -1
  125. package/knowledge/review/review-ux-specification.md +1 -1
  126. package/knowledge/review/review-vision.md +1 -1
  127. package/knowledge/tools/post-implementation-review-methodology.md +107 -0
  128. package/knowledge/validation/critical-path-analysis.md +13 -0
  129. package/knowledge/validation/implementability-review.md +14 -0
  130. package/package.json +2 -1
  131. package/pipeline/architecture/review-architecture.md +8 -5
  132. package/pipeline/architecture/system-architecture.md +9 -3
  133. package/pipeline/build/multi-agent-resume.md +21 -7
  134. package/pipeline/build/multi-agent-start.md +22 -7
  135. package/pipeline/build/new-enhancement.md +20 -12
  136. package/pipeline/build/quick-task.md +18 -11
  137. package/pipeline/build/single-agent-resume.md +20 -6
  138. package/pipeline/build/single-agent-start.md +24 -8
  139. package/pipeline/consolidation/claude-md-optimization.md +8 -4
  140. package/pipeline/consolidation/workflow-audit.md +9 -5
  141. package/pipeline/decisions/adrs.md +7 -3
  142. package/pipeline/decisions/review-adrs.md +8 -5
  143. package/pipeline/environment/ai-memory-setup.md +6 -2
  144. package/pipeline/environment/automated-pr-review.md +79 -12
  145. package/pipeline/environment/design-system.md +9 -6
  146. package/pipeline/environment/dev-env-setup.md +8 -5
  147. package/pipeline/environment/git-workflow.md +16 -13
  148. package/pipeline/finalization/apply-fixes-and-freeze.md +10 -5
  149. package/pipeline/finalization/developer-onboarding-guide.md +10 -3
  150. package/pipeline/finalization/implementation-playbook.md +13 -4
  151. package/pipeline/foundation/beads.md +8 -5
  152. package/pipeline/foundation/coding-standards.md +13 -10
  153. package/pipeline/foundation/project-structure.md +16 -13
  154. package/pipeline/foundation/tdd.md +9 -4
  155. package/pipeline/foundation/tech-stack.md +7 -5
  156. package/pipeline/integration/add-e2e-testing.md +12 -8
  157. package/pipeline/modeling/domain-modeling.md +9 -7
  158. package/pipeline/modeling/review-domain-modeling.md +8 -6
  159. package/pipeline/parity/platform-parity-review.md +9 -6
  160. package/pipeline/planning/implementation-plan-review.md +10 -7
  161. package/pipeline/planning/implementation-plan.md +41 -9
  162. package/pipeline/pre/create-prd.md +7 -4
  163. package/pipeline/pre/innovate-prd.md +12 -8
  164. package/pipeline/pre/innovate-user-stories.md +10 -7
  165. package/pipeline/pre/review-prd.md +12 -10
  166. package/pipeline/pre/review-user-stories.md +12 -9
  167. package/pipeline/pre/user-stories.md +7 -4
  168. package/pipeline/quality/create-evals.md +6 -3
  169. package/pipeline/quality/operations.md +7 -3
  170. package/pipeline/quality/review-operations.md +12 -5
  171. package/pipeline/quality/review-security.md +11 -6
  172. package/pipeline/quality/review-testing.md +11 -6
  173. package/pipeline/quality/security.md +6 -2
  174. package/pipeline/quality/story-tests.md +14 -9
  175. package/pipeline/specification/api-contracts.md +9 -3
  176. package/pipeline/specification/database-schema.md +8 -2
  177. package/pipeline/specification/review-api.md +10 -4
  178. package/pipeline/specification/review-database.md +8 -3
  179. package/pipeline/specification/review-ux.md +9 -3
  180. package/pipeline/specification/ux-spec.md +9 -4
  181. package/pipeline/validation/critical-path-walkthrough.md +10 -5
  182. package/pipeline/validation/cross-phase-consistency.md +9 -4
  183. package/pipeline/validation/decision-completeness.md +8 -3
  184. package/pipeline/validation/dependency-graph-validation.md +8 -3
  185. package/pipeline/validation/implementability-dry-run.md +9 -5
  186. package/pipeline/validation/scope-creep-check.md +11 -6
  187. package/pipeline/validation/traceability-matrix.md +10 -5
  188. package/pipeline/vision/create-vision.md +7 -4
  189. package/pipeline/vision/innovate-vision.md +11 -8
  190. package/pipeline/vision/review-vision.md +15 -12
  191. package/skills/multi-model-dispatch/SKILL.md +6 -5
  192. package/skills/scaffold-runner/SKILL.md +47 -3
  193. package/tools/dashboard.md +53 -0
  194. package/tools/post-implementation-review.md +655 -0
  195. package/tools/prompt-pipeline.md +160 -0
  196. package/tools/release.md +440 -0
  197. package/tools/review-pr.md +229 -0
  198. package/tools/session-analyzer.md +299 -0
  199. package/tools/update.md +113 -0
  200. package/tools/version-bump.md +290 -0
  201. package/tools/version.md +82 -0
@@ -41,7 +41,7 @@ about ecosystem maturity, alternatives, and gotchas.
  - (mvp) Every choice is a decision, not a menu of options
  - (mvp) Quick Reference section lists every dependency with version
  - (deep) Each technology choice documents AI compatibility assessment (training data availability, convention strength); total direct dependencies counted and justified
- - (depth 4+) Multi-model recommendations cross-referenced agreements flagged as high-confidence, disagreements flagged for human decision
+ - (depth 4+) Multi-model recommendations synthesized: Consensus (all models agree), Majority (2+ models agree), or Divergent (models disagree — present to user for decision)

  ## Methodology Scaling
  - **deep**: Comprehensive research with competitive analysis for each category.
@@ -51,10 +51,12 @@ about ecosystem maturity, alternatives, and gotchas.
  to Claude-only enhanced research.
  - **mvp**: Core stack decisions only (language, framework, database, test runner).
  Brief rationale. Quick Reference with versions. 2-3 pages.
- - **custom:depth(1-5)**: Depth 1-2: MVP decisions. Depth 3: add infrastructure
- and tooling. Depth 4: add AI compatibility analysis + one external model
- (if CLI available). Depth 5: full competitive analysis and upgrade strategy
- + multi-model with cross-referencing.
+ - **custom:depth(1-5)**:
+ - Depth 1: Core stack decisions only (language, framework, database). Brief rationale. 1 page.
+ - Depth 2: Depth 1 + test runner choice and Quick Reference with versions. 2-3 pages.
+ - Depth 3: Add infrastructure, tooling, and developer experience recommendations.
+ - Depth 4: Add AI compatibility analysis + one external model research (if CLI available).
+ - Depth 5: Full competitive analysis per category, upgrade strategy, + multi-model with cross-referencing.

  ## Mode Detection
  Update mode if docs/tech-stack.md exists. In update mode: never change a
@@ -5,7 +5,7 @@ summary: "Detects whether your project is web or mobile, then configures Playwri
  phase: "integration"
  order: 410
  dependencies: [git-workflow, tdd]
- outputs: [tests/screenshots/, maestro/]
+ outputs: [tests/screenshots/, maestro/, playwright.config.ts]
  reads: [coding-standards, user-stories]
  conditional: "if-needed"
  knowledge-base: [testing-strategy]
@@ -39,13 +39,14 @@ Outputs vary by detected platform:
  - (mvp) (web) Playwright config uses framework-specific dev server command and port
  - (mvp) (web) Smoke test passes (navigate, screenshot, close)
  - (mvp) (mobile) Maestro CLI installed, sample flow executes, screenshot captured
- - (mobile) testID naming convention defined and documented
+ - (mvp) (mobile) testID naming convention defined and documented
  - (mvp) E2E section in tdd-standards.md distinguishes when to use E2E vs unit tests
- - Baseline screenshots committed, current screenshots gitignored
- - CLAUDE.md contains browser/mobile testing section
- - tdd-standards.md E2E section updated with when-to-use guidance
+ - (mvp) Baseline screenshots committed, current screenshots gitignored
+ - (mvp) CLAUDE.md contains browser/mobile testing section
+ - (mvp) tdd-standards.md E2E section updated with when-to-use guidance
  - (deep) CI integration configured for E2E test execution
  - (deep) Sub-flows defined for common user journeys
+ - (deep) Smoke test names and intent are consistent between Playwright and Maestro

  ## Methodology Scaling
  - **deep**: Full setup for all detected platforms. All visual testing patterns,
@@ -53,9 +54,12 @@ Outputs vary by detected platform:
  common journeys, and comprehensive documentation updates.
  - **mvp**: Basic config and smoke test for detected platform. Minimal docs
  updates. Two viewports for web, single platform for mobile.
- - **custom:depth(1-5)**: Depth 1-2: config + smoke test. Depth 3: add patterns,
- naming, testID rules. Depth 4: add CI integration, both mobile platforms.
- Depth 5: full suite with baseline management and sub-flows.
+ - **custom:depth(1-5)**:
+ - Depth 1: Config + smoke test for primary platform only
+ - Depth 2: Config + smoke test with basic viewport/device coverage
+ - Depth 3: Add patterns, naming conventions, and testID rules
+ - Depth 4: Add CI integration and both mobile platforms
+ - Depth 5: Full suite with baseline management, sub-flows, and cross-platform consistency

  ## Conditional Evaluation
  Enable when: tech-stack.md indicates a web frontend (Playwright) or mobile app
@@ -35,13 +35,12 @@ and aggregate boundaries. User actions reveal the domain model.
  - docs/domain-models/index.md — overview of all domains and their relationships

  ## Quality Criteria
- - (mvp) Every PRD feature maps to at least one domain
+ - (mvp) Every PRD feature maps to >= 1 domain
  - (mvp) Entity relationships are explicit (not implied)
  - (mvp) Each aggregate boundary documents: the invariant it protects, the consistency boundary it enforces, and why included entities must change together
  - (deep) Domain events cover all state transitions
- - (deep) Each invariant is phrased as a boolean condition checkable in code (e.g., `order.total >= 0`, `user.email matches /^[^@]+@[^@]+$/`), not a narrative description
- - Ubiquitous language is consistent across all domain models
- - (mvp) All entity and concept names used consistently across domain model files (ubiquitous language enforced)
+ - (mvp) Each invariant is expressible as a runtime-checkable condition (assertion, validation rule, or database constraint) (e.g., `order.total >= 0`, `user.email matches /^[^@]+@[^@]+$/`), not a narrative description
+ - (mvp) Every entity name in one domain-model file uses the same name (no synonyms) in all other domain-model files
  - (deep) Cross-aggregate event flows documented for every state change that crosses aggregate boundaries
  - (deep) Cross-domain relationships are documented at context boundaries

@@ -51,9 +50,12 @@ and aggregate boundaries. User actions reveal the domain model.
  relationships between bounded contexts. Separate file per domain.
  - **mvp**: Key entities and their relationships in a single file. Core business
  rules listed. Enough to inform architecture decisions.
- - **custom:depth(1-5)**: Depth 1-2: single-file entity overview. Depth 3: separate
- files per domain with entities and events. Depth 4-5: full DDD approach with
- context maps and detailed invariants.
+ - **custom:depth(1-5)**:
+ - Depth 1: single-file entity list with key relationships.
+ - Depth 2: single-file entity overview with attributes and core business rules.
+ - Depth 3: separate files per domain with entities, events, and aggregate boundaries.
+ - Depth 4: full DDD approach with context maps, detailed invariants, and domain event flows.
+ - Depth 5: full DDD approach with cross-context integration contracts and sequence diagrams for all cross-aggregate flows.

  ## Mode Detection
  If docs/domain-models/ exists, operate in update mode: read existing models,
@@ -31,14 +31,14 @@ independent review validation.

  ## Quality Criteria
  - (mvp) All review passes executed with findings documented
- - (mvp) Every finding categorized by severity (P0-P3)
+ - (mvp) Every finding categorized by severity (P0-P3). Severity definitions: P0 = Breaks downstream work. P1 = Prevents quality milestone. P2 = Known tech debt. P3 = Polish.
  - (mvp) Fix plan created for P0 and P1 findings
  - (mvp) Fixes applied and re-validated
  - (mvp) Downstream readiness confirmed (decisions phase can proceed)
  - (mvp) Entity coverage verified (every PRD feature maps to at least one entity)
  - (deep) Aggregate boundaries verified (each aggregate protects at least one invariant)
  - (deep) Ubiquitous language consistency verified across all domain model files
- - (depth 4+) Multi-model findings synthesized with consensus/disagreement analysis
+ - (depth 4+) Multi-model findings synthesized: Consensus (all models agree), Majority (2+ models agree), or Divergent (models disagree — present to user for decision)

  ## Methodology Scaling
  - **deep**: All review passes from the knowledge base. Full findings report
@@ -46,10 +46,12 @@ independent review validation.
  review dispatched to Codex and Gemini if available, with graceful fallback
  to Claude-only enhanced review.
  - **mvp**: Quick consistency check. Focus on blocking issues only.
- - **custom:depth(1-5)**: Depth 1-2: blocking issues only. Depth 3: add coverage
- and consistency passes. Depth 4: full multi-pass review + one external model
- (if CLI available). Depth 5: full multi-pass review + multi-model with
- reconciliation.
+ - **custom:depth(1-5)**:
+ - Depth 1: single pass blocking issues only (entity coverage against PRD).
+ - Depth 2: two passes entity coverage + ubiquitous language consistency.
+ - Depth 3: four passes — entity coverage, ubiquitous language, aggregate boundary validation, and cross-domain consistency.
+ - Depth 4: all review passes + one external model (if CLI available).
+ - Depth 5: all review passes + multi-model with reconciliation.

  ## Mode Detection
  If docs/reviews/review-domain-modeling.md exists, this is a re-review. Read previous
@@ -56,8 +56,9 @@ Skip when the project targets a single platform only.
  - (deep) Navigation patterns appropriate per platform (sidebar vs. tab bar, etc.)
  - (deep) Offline/connectivity handling addressed per platform (if applicable)
  - (deep) Web version is treated as first-class (not afterthought) if PRD specifies it
- - Fix plan documented for all P0/P1 findings with specific document and section to update
- - (depth 4+) Multi-model findings synthesized with consensus/disagreement analysis
+ - (mvp) Every finding categorized P0-P3 (P0 = Breaks downstream work. P1 = Prevents quality milestone. P2 = Known tech debt. P3 = Polish.)
+ - (mvp) Fix plan documented for all P0/P1 findings with specific document and section to update
+ - (depth 4+) Multi-model findings synthesized: Consensus (all models agree), Majority (2+ models agree), or Divergent (models disagree — present to user for decision)

  ## Methodology Scaling
  - **deep**: Comprehensive platform audit across all documents, feature parity
@@ -67,10 +68,12 @@ Skip when the project targets a single platform only.
  to Claude-only enhanced review.
  - **mvp**: Quick check of user stories and tech-stack for platform coverage.
  Identify top 3 platform gaps. Skip detailed feature parity matrix.
- - **custom:depth(1-5)**: Depth 1-2: user stories platform check. Depth 3: add
- tech-stack and coding-standards. Depth 4: add feature parity matrix + one
- external model (if CLI available). Depth 5: full suite across all documents
- + multi-model with reconciliation.
+ - **custom:depth(1-5)**:
+ - Depth 1: User stories platform check only (1 review pass)
+ - Depth 2: Two-pass check — first pass validates user stories against platform constraints; second pass identifies implicit assumptions that could block native implementation (e.g., assumed web APIs not available on mobile).
+ - Depth 3: Add tech-stack and coding-standards platform audit (3 review passes)
+ - Depth 4: Add feature parity matrix + one external model if CLI available (3 review passes + external dispatch)
+ - Depth 5: Full suite across all documents + multi-model with reconciliation (3 review passes + multi-model synthesis)

  ## Mode Detection
  Update mode if docs/reviews/platform-parity-review.md exists. In update mode:
@@ -5,7 +5,7 @@ summary: "Verifies every feature has implementation tasks, no task is too large
  phase: "planning"
  order: 1220
  dependencies: [implementation-plan]
- outputs: [docs/reviews/review-tasks.md, docs/reviews/implementation-plan/task-coverage.json, docs/reviews/implementation-plan/review-summary.md]
+ outputs: [docs/reviews/review-tasks.md, docs/reviews/implementation-plan/task-coverage.json, docs/reviews/implementation-plan/review-summary.md, docs/reviews/implementation-plan/codex-review.json, docs/reviews/implementation-plan/gemini-review.json]
  conditional: null
  knowledge-base: [review-methodology, review-implementation-tasks, task-decomposition, multi-model-review-dispatch, review-step-template]
  ---
@@ -20,8 +20,8 @@ and produce a structured coverage matrix and review summary.

  ## Inputs
  - docs/implementation-plan.md (required) — tasks to review
- - docs/system-architecture.md (required) — for coverage checking
- - docs/domain-models/ (required) — for completeness
+ - docs/system-architecture.md (required at deep; optional — not available in MVP) — for coverage checking
+ - docs/domain-models/ (required at deep; optional — not available in MVP) — for completeness
  - docs/user-stories.md (required) — for AC coverage mapping
  - docs/plan.md (required) — for traceability
  - docs/project-structure.md (required) — for file contention analysis
@@ -51,9 +51,12 @@ and produce a structured coverage matrix and review summary.
  - **deep**: Full multi-pass review with multi-model validation. AC coverage
  matrix. Independent Codex/Gemini dispatches. Detailed reconciliation report.
  - **mvp**: Coverage check only. No external model dispatch.
- - **custom:depth(1-5)**: Depth 1-2: coverage check. Depth 3: add dependency
- analysis and AC coverage matrix. Depth 4: add one external model. Depth 5:
- full multi-model with reconciliation.
+ - **custom:depth(1-5)**:
+ - Depth 1: architecture coverage check (every component has tasks).
+ - Depth 2: coverage check plus DAG validation and agent executability rules.
+ - Depth 3: add dependency analysis, AC coverage matrix, and task sizing audit.
+ - Depth 4: add one external model review (Codex or Gemini).
+ - Depth 5: full multi-model review with reconciliation and detailed findings report.

  ## Mode Detection
  Re-review mode if previous review exists. If multi-model review artifacts exist
@@ -61,7 +64,7 @@ under docs/reviews/implementation-plan/, preserve prior findings still valid.

  ## Update Mode Specifics

- - **Detect**: `docs/reviews/review-implementation-plan.md` exists with tracking comment
+ - **Detect**: `docs/reviews/review-tasks.md` exists with tracking comment
  - **Preserve**: Prior findings still valid, resolution decisions, multi-model review artifacts
  - **Triggers**: Upstream artifact changed since last review (compare tracking comment dates)
  - **Conflict resolution**: Previously resolved findings reappearing = regression; flag and re-evaluate
@@ -1,6 +1,6 @@
  ---
  name: implementation-plan
- description: Break architecture into implementable tasks with dependencies
+ description: Break deliverables into implementable tasks with dependencies, ordered by priority and dependencies
  summary: "Breaks your user stories and architecture into concrete tasks — each scoped to ~150 lines of code and 3 files max, with clear acceptance criteria, no ambiguous decisions, and explicit dependencies."
  phase: "planning"
  order: 1210
@@ -8,7 +8,7 @@ dependencies: [tdd, operations, security, review-architecture, create-evals]
  outputs: [docs/implementation-plan.md]
  reads: [create-prd, story-tests, database-schema, api-contracts, ux-spec]
  conditional: null
- knowledge-base: [task-decomposition]
+ knowledge-base: [task-decomposition, system-architecture]
  ---

  ## Purpose
@@ -37,17 +37,18 @@ The primary mapping is Story → Task(s), with PRD as the traceability root.
  assignment recommendations

  ## Quality Criteria
- - (mvp) Every architecture component has implementation tasks
- - (mvp) Task dependencies form a valid DAG (no cycles)
- - (mvp) Each task produces ~150 lines of net-new application code (excluding tests and generated files)
+ - (deep) Every architecture component has implementation tasks
+ - (mvp) Every user story has implementation tasks
+ - (mvp) Task dependencies form a valid DAG (no cycles, verified by checking no task depends on a later-ordered task)
+ - (mvp) Each task produces 150 +/- 50 lines of net-new application code (excluding tests and generated files)
  - (mvp) Tasks include acceptance criteria (how to know it's done)
  - (mvp) Tasks incorporate testing requirements from the testing strategy
  - (deep) Tasks reference corresponding test skeletons from tests/acceptance/ where applicable
  - (deep) Tasks incorporate security controls from the security review where applicable
  - (deep) Tasks incorporate operational requirements (monitoring, deployment) where applicable
- - (deep) Critical path is identified
  - (deep) Parallelization opportunities are marked with wave plan
- - (mvp) Every user story maps to at least one task
+ - (mvp) Every user story maps to >= 1 task
+ - (mvp) Every PRD feature maps to >= 1 user story, and every user story maps to >= 1 task (transitive traceability)
  - (deep) High-risk tasks are flagged with risk type and mitigation
  - (deep) Wave summary produced with agent allocation recommendation
  - (mvp) No task modifies more than 3 application files (test files excluded; exceptions require justification)
@@ -64,8 +65,39 @@ The primary mapping is Story → Task(s), with PRD as the traceability root.
  Each task has a brief description, rough size estimate, and key dependency.
  Enough to start working sequentially. Skip architecture decomposition —
  work directly from user story acceptance criteria.
- - **custom:depth(1-5)**: Depth 1-2: ordered list. Depth 3: add dependencies
- and sizing. Depth 4-5: full breakdown with parallelization.
+ - **custom:depth(1-5)**:
+ - Depth 1: ordered task list derived from PRD features only.
+ - Depth 2: ordered list with rough size estimates per task.
+ - Depth 3: add explicit dependencies and sizing (150-line budget, 3-file rule).
+ - Depth 4: full breakdown with dependency graph and parallelization plan.
+ - Depth 5: full breakdown with parallelization, wave assignments, agent allocation, and critical path analysis.
+
+ ## MVP-Specific Guidance (No Architecture Available)
+
+ At MVP depth, the system architecture document does not exist. Task decomposition
+ must work directly from user stories without explicit component definitions.
+
+ **How to decompose stories into tasks without architecture:**
+
+ 1. **Derive implicit layers from tech stack**: Read docs/tech-stack.md. For a web
+ app: API layer (backend), UI layer (frontend), Data layer (database). Each
+ story typically decomposes into one task per affected layer.
+
+ 2. **Map each story to layers**: "User can register" → 3 tasks: API endpoint,
+ UI form, database table. "User can view dashboard" → 2 tasks: API data
+ endpoint, UI display component.
+
+ 3. **Use acceptance criteria to define task boundaries**: Each AC (Given/When/Then)
+ maps to test cases. Group test cases by layer. Each layer's test cases become
+ one task.
+
+ > **Note**: If user stories are one-liner bullets without Given/When/Then ACs (MVP depth 1–2), derive task boundaries directly from the story text instead: treat each story's success condition as defining one task scope. Infer implied acceptance criteria from the story description before decomposing into tasks.
+
+ 4. **Order tasks by dependency**: Database migrations first, then API endpoints,
+ then UI components (bottom-up).
+
+ 5. **Split within layers when tasks exceed 150 lines**: Happy path in one task,
+ validation/error handling in another, edge cases in a third.

  ## Mode Detection
  Check for docs/implementation-plan.md. If it exists, operate in update mode:
@@ -28,7 +28,7 @@ throughout the entire pipeline.
  ## Quality Criteria
  - (mvp) Problem statement names a specific user group, a specific pain point, and a falsifiable hypothesis about the solution
  - (mvp) Target users are identified with their needs
- - (mvp) Features are scoped with clear boundaries (what's in, what's out)
+ - (mvp) Each feature defines at least one explicit out-of-scope item (what it does NOT do) in addition to what it does
  - (mvp) Success criteria are measurable
  - (mvp) Each non-functional requirement has a measurable target or threshold (e.g., 'page load < 2s', 'WCAG AA')
  - (mvp) No two sections contain contradictory statements about the same concept
@@ -40,9 +40,12 @@ throughout the entire pipeline.
  delivery plan. 15-20 pages.
  - **mvp**: Problem statement, core features list, primary user description,
  success criteria. 1-2 pages. Just enough to start building.
- - **custom:depth(1-5)**: Depth 1-2: MVP-style. Depth 3: add user personas
- and feature prioritization. Depth 4-5: full competitive analysis and
- phased delivery.
+ - **custom:depth(1-5)**:
+ - Depth 1: MVP-style problem statement, core features list, primary user. 1 page.
+ - Depth 2: MVP + success criteria and basic constraints. 1-2 pages.
+ - Depth 3: Add user personas and feature prioritization (MoSCoW). 3-5 pages.
+ - Depth 4: Add competitive analysis, risk assessment, and phased delivery plan. 8-12 pages.
+ - Depth 5: Full PRD with competitive analysis, phased delivery, and detailed non-functional requirements. 15-20 pages.

  ## Mode Detection
  If docs/plan.md exists, operate in update mode: read existing content, identify
@@ -8,6 +8,7 @@ dependencies: [review-prd]
  outputs: [docs/prd-innovation.md, docs/plan.md, docs/reviews/prd-innovation/review-summary.md, docs/reviews/prd-innovation/codex-review.json, docs/reviews/prd-innovation/gemini-review.json]
  conditional: "if-needed"
  knowledge-base: [prd-innovation, prd-craft, multi-model-review-dispatch]
+ reads: [review-prd]
  ---

  ## Purpose
@@ -35,12 +36,13 @@ creative opportunities and competitive insights.
  ## Quality Criteria
  - (mvp) Enhancements are feature-level, not UX-level polish
  - (mvp) Each suggestion has a cost estimate (trivial/moderate/significant)
- - (mvp) Each suggestion has a clear user benefit and impact assessment
+ - (mvp) Each suggestion specifies: the problem it solves for a specific user type, the expected behavior change, and cost estimate (trivial/moderate/significant)
  - (mvp) Each approved innovation includes: problem it solves, target users, scope boundaries, and success criteria
  - (mvp) PRD scope boundaries are respected — no uncontrolled scope creep
- - User approval is obtained before modifying the PRD
- - User approval for each accepted innovation documented as a question-response pair with timestamp (e.g., "Q: Accept feature X? A: Yes — 2025-01-15T14:30Z")
- - (depth 4+) Multi-model suggestions deduplicated and synthesized with unique ideas from each model highlighted
+ - (mvp) User approval is obtained before modifying the PRD
+ - (mvp) User approval for each accepted innovation documented as a question-response pair with timestamp (e.g., "Q: Accept feature X? A: Yes — 2025-01-15T14:30Z")
+ - (mvp) Each innovation marked with approval status: approved, deferred, or rejected, with user decision timestamp
+ - (depth 4+) Multi-model innovation suggestions synthesized: Consensus (all models propose similar direction), Majority (2+ models agree), or Divergent (models disagree — present all perspectives to user for selection)

  ## Methodology Scaling
  - **deep**: Full innovation pass across all categories (competitive research,
@@ -49,10 +51,12 @@ creative opportunities and competitive insights.
  innovation dispatched to Codex and Gemini if available, with graceful
  fallback to Claude-only enhanced brainstorming.
  - **mvp**: Not applicable — this step is conditional and skipped in MVP.
- - **custom:depth(1-5)**: Depth 1-2: skip (not enough context for meaningful innovation at this depth). Depth 3: quick scan
- for obvious gaps and missing expected features. Depth 4: full innovation
- pass + one external model (if CLI available). Depth 5: full innovation pass
- + multi-model with deduplication and synthesis.
+ - **custom:depth(1-5)**:
+ - Depth 1: Skip (not enough context for meaningful innovation at this depth).
+ - Depth 2: Minimal pass: generate 1–2 brief innovation concepts for the most distinctive PRD feature only; no market analysis or positioning required.
+ - Depth 3: Quick scan for obvious gaps and missing expected features.
+ - Depth 4: Full innovation pass across all categories + one external model (if CLI available).
+ - Depth 5: Full innovation pass + multi-model with deduplication and synthesis.

  ## Conditional Evaluation
  Enable when: project has a competitive landscape section in plan.md, user explicitly
@@ -5,7 +5,7 @@ summary: "Identifies UX enhancement opportunities — progressive disclosure, sm
  phase: "pre"
  order: 160
  dependencies: [review-user-stories]
- outputs: [docs/user-stories-innovation.md, docs/reviews/user-stories-innovation/review-summary.md, docs/reviews/user-stories-innovation/codex-review.json, docs/reviews/user-stories-innovation/gemini-review.json]
+ outputs: [docs/user-stories-innovation.md, docs/user-stories.md, docs/reviews/user-stories-innovation/review-summary.md, docs/reviews/user-stories-innovation/codex-review.json, docs/reviews/user-stories-innovation/gemini-review.json]
  conditional: "if-needed"
  knowledge-base: [user-stories, user-story-innovation, multi-model-review-dispatch]
  ---
@@ -39,8 +39,9 @@ enhancement opportunities.
  - (mvp) Each suggestion has a clear user benefit
  - (mvp) Approved enhancements are integrated into existing stories (not new stories)
  - (mvp) PRD scope boundaries are respected — no scope creep
- - User approval for each accepted innovation documented as a question-response pair with timestamp (e.g., "Q: Accept enhancement X? A: Yes — 2025-01-15T14:30Z")
- - (depth 4+) Multi-model suggestions deduplicated and synthesized with unique ideas from each model highlighted
+ - (mvp) User approval for each accepted innovation documented as a question-response pair with timestamp (e.g., "Q: Accept enhancement X? A: Yes — 2025-01-15T14:30Z")
+ - (mvp) Each innovation marked with approval status: approved, deferred, or rejected, with user decision timestamp
+ - (depth 4+) Multi-model innovation suggestions synthesized: Consensus (all models propose similar direction), Majority (2+ models agree), or Divergent (models disagree — present all perspectives to user for selection)

  ## Methodology Scaling
  - **deep**: Full innovation pass across all three categories (high-value
@@ -49,10 +50,12 @@ enhancement opportunities.
  innovation dispatched to Codex and Gemini if available, with graceful
  fallback to Claude-only enhanced brainstorming.
  - **mvp**: Not applicable — this step is conditional and skipped in MVP.
- - **custom:depth(1-5)**: Depth 1-2: skip (not enough context for meaningful innovation at this depth). Depth 3: quick
- scan for obvious improvements. Depth 4: full innovation pass + one external
- model (if CLI available). Depth 5: full innovation pass + multi-model with
- deduplication and synthesis.
+ - **custom:depth(1-5)**:
+ - Depth 1: Skip (not enough context for meaningful innovation at this depth).
+ - Depth 2: Minimal pass: generate 1–2 brief innovation concepts for the most distinctive user story only; no full Given/When/Then elaboration required.
+ - Depth 3: Quick scan for obvious UX improvements and low-hanging enhancements.
+ - Depth 4: Full innovation pass across all three categories + one external model (if CLI available).
+ - Depth 5: Full innovation pass + multi-model with deduplication and synthesis.

  ## Conditional Evaluation
  Enable when: user stories review identifies UX gaps, project targets a consumer-facing
@@ -32,12 +32,12 @@ independent review validation.

  ## Quality Criteria
  - (mvp) Passes 1-2 executed with findings documented
- - All review passes executed with findings documented
- - Every finding categorized by severity (P0-P3)
- - Fix plan created for P0 and P1 findings
- - Fixes applied and re-validated
+ - (deep) All review passes executed with findings documented
+ - (mvp) Every finding categorized by severity: P0 = Breaks downstream work. P1 = Prevents quality milestone. P2 = Known tech debt. P3 = Polish.
+ - (mvp) Fix plan created for P0 and P1 findings
+ - (mvp) Fixes applied and re-validated
  - (mvp) Downstream readiness confirmed (User Stories can proceed)
- - (depth 4+) Multi-model findings synthesized with consensus/disagreement analysis
+ - (depth 4+) Multi-model findings synthesized: Consensus (all models agree), Majority (2+ models agree), or Divergent (models disagree — present to user for decision)

  ## Methodology Scaling
  - **deep**: All 8 review passes from the knowledge base. Full findings report
@@ -46,10 +46,12 @@ independent review validation.
  to Claude-only enhanced review.
  - **mvp**: Passes 1-2 only (Problem Statement Rigor, Persona Coverage). Focus
  on blocking gaps — requirements too vague to write stories from.
- - **custom:depth(1-5)**: Depth 1-2: passes 1-2 only (Problem Statement Rigor,
- Persona Coverage). Depth 3: passes 1-4 (add Feature Scoping, Success
- Criteria). Depth 4: all 8 passes + one external model review (if CLI
- available). Depth 5: all 8 passes + multi-model review with reconciliation.
+ - **custom:depth(1-5)**:
+ - Depth 1: Pass 1 only (Problem Statement Rigor). One review pass.
+ - Depth 2: Passes 1-2 (Problem Statement Rigor, Persona Coverage). Two review passes.
+ - Depth 3: Passes 1-4 (add Feature Scoping, Success Criteria). Four review passes.
+ - Depth 4: All 8 passes + one external model review (if CLI available).
+ - Depth 5: All 8 passes + multi-model review with reconciliation.

  ## Mode Detection
  If docs/reviews/pre-review-prd.md exists, this is a re-review. Read previous
@@ -59,7 +61,7 @@ findings still valid.

  ## Update Mode Specifics

- - **Detect**: `docs/reviews/review-prd.md` exists with tracking comment
+ - **Detect**: `docs/reviews/pre-review-prd.md` exists with tracking comment
  - **Preserve**: Prior findings still valid, resolution decisions, multi-model review artifacts
  - **Triggers**: Upstream artifact changed since last review (compare tracking comment dates)
  - **Conflict resolution**: Previously resolved findings reappearing = regression; flag and re-evaluate
@@ -5,7 +5,7 @@ summary: "Verifies every PRD feature maps to at least one story, checks that acc
  phase: "pre"
  order: 150
  dependencies: [user-stories]
- outputs: [docs/reviews/pre-review-user-stories.md, docs/reviews/user-stories/requirements-index.md, docs/reviews/user-stories/coverage.json, docs/reviews/user-stories/review-summary.md]
+ outputs: [docs/reviews/pre-review-user-stories.md, docs/reviews/user-stories/requirements-index.md, docs/reviews/user-stories/coverage.json, docs/reviews/user-stories/review-summary.md, docs/reviews/user-stories/codex-review.json, docs/reviews/user-stories/gemini-review.json]
  conditional: null
  knowledge-base: [review-methodology, review-user-stories, multi-model-review-dispatch, review-step-template]
  ---
@@ -35,14 +35,14 @@ independent coverage validation.

  ## Quality Criteria
  - (mvp) Pass 1 (PRD coverage) executed with findings documented
- - All review passes executed with findings documented
- - Every finding categorized by severity (P0-P3)
- - Fix plan created for P0 and P1 findings
- - Fixes applied and re-validated
+ - (deep) All review passes executed with findings documented
+ - (mvp) Every finding categorized by severity: P0 = Breaks downstream work. P1 = Prevents quality milestone. P2 = Known tech debt. P3 = Polish.
+ - (mvp) Fix plan created for P0 and P1 findings
+ - (mvp) Fixes applied and re-validated
  - (mvp) Every story has at least one testable acceptance criterion, and every PRD feature maps to at least one story
  - (depth 4+) Every atomic PRD requirement has a REQ-xxx ID in the requirements index
  - (depth 4+) Coverage matrix maps every REQ to at least one US (100% coverage target)
- - (depth 4+) Multi-model findings synthesized with consensus/disagreement analysis
+ - (depth 4+) Multi-model findings synthesized: Consensus (all models agree), Majority (2+ models agree), or Divergent (models disagree — present to user for decision)

  ## Methodology Scaling
  - **deep**: All 6 review passes from the knowledge base. Full findings report
@@ -51,9 +51,12 @@ independent coverage validation.
  Gemini if available, with graceful fallback to Claude-only enhanced review.
  - **mvp**: Pass 1 only (PRD coverage). Focus on blocking gaps — PRD features
  with no corresponding story.
- - **custom:depth(1-5)**: Depth 1: pass 1 only. Depth 2: passes 1-2.
- Depth 3: passes 1-4. Depth 4: all 6 passes + requirements index + coverage
- matrix. Depth 5: all of depth 4 + multi-model review (if CLIs available).
+ - **custom:depth(1-5)**:
+ - Depth 1: Pass 1 only (PRD coverage). One review pass.
+ - Depth 2: Passes 1-2 (PRD coverage, acceptance criteria quality). Two review passes.
+ - Depth 3: Passes 1-4 (add story independence, INVEST criteria). Four review passes.
+ - Depth 4: All 6 passes + requirements index + coverage matrix + one external model (if CLI available).
+ - Depth 5: All of depth 4 + multi-model review with reconciliation (if CLIs available).

  ## Mode Detection
  If docs/reviews/pre-review-user-stories.md exists, this is a re-review. Read
@@ -29,7 +29,7 @@ task decomposition downstream.
  ## Quality Criteria
  - (mvp) Every PRD feature maps to at least one user story
  - (deep) Stories follow INVEST criteria (Independent, Negotiable, Valuable, Estimable, Small, Testable)
- - (mvp) Acceptance criteria are testable — unambiguous pass/fail
+ - (mvp) Acceptance criteria are testable — unambiguous pass/fail: (a) free of vague adjectives like 'valid', 'properly', 'quickly', and (b) specific about inputs and expected outputs
  - (deep) No story has more than 7 acceptance criteria
  - (mvp) Every PRD persona is represented in at least one story
  - (mvp) Stories describe user behavior, not implementation details
@@ -41,9 +41,12 @@ task decomposition downstream.
  examples, story-to-domain-event mapping for domain modeling consumption.
  - **mvp**: Flat list of one-liner stories grouped by PRD section. One bullet
  per story for the primary success condition. No epics, no scope boundaries.
- - **custom:depth(1-5)**: Depth 1-2: flat list with brief acceptance criteria.
- Depth 3: full template with IDs, epics, Given/When/Then. Depth 4-5: add
- dependency mapping, traceability, UI/UX notes, story splitting rationale.
+ - **custom:depth(1-5)**:
+ - Depth 1: Flat list of one-liner stories grouped by PRD section. One bullet per story.
+ - Depth 2: Flat list with brief acceptance criteria (1-2 criteria per story).
+ - Depth 3: Full template with story IDs, epics, Given/When/Then acceptance criteria.
+ - Depth 4: Add dependency mapping, traceability to PRD features, and UI/UX notes.
+ - Depth 5: Full suite with story splitting rationale, persona journey maps, and story-to-domain-event mapping.

  ## Mode Detection
  If docs/user-stories.md exists, operate in update mode: read existing stories,
@@ -59,12 +59,12 @@ Conditional (generated when source doc exists):
  Supporting:
  - tests/evals/helpers.* — shared utilities
  - docs/eval-standards.md — documents what is and isn't checked
- - make eval target added to Makefile/package.json
+ - make eval target (or equivalent build command) added to project build configuration

  ## Quality Criteria
  - (mvp) Consistency + Structure evals generated
  - (mvp) Evals use the project's own test framework from docs/tech-stack.md
- - (mvp) All generated evals pass on the current codebase (no false positives)
+ - (mvp) All generated evals pass on the current codebase when exclusion mechanisms are applied
  - (mvp) Eval results are binary PASS/FAIL, not scores
  - (mvp) make eval is separate from make test and make check (opt-in for CI)
  - (deep) All applicable eval categories generated including security, API, DB, accessibility (conditional on source doc existence)
@@ -72,7 +72,9 @@ Supporting:
  - (deep) docs/eval-standards.md explicitly documents what evals do NOT check
  - (deep) Full eval suite runs in under 30 seconds
  - (mvp) `make eval` (or equivalent) runs and all generated evals pass
+ - (mvp) All core eval categories (consistency, structure, adherence, coverage, cross-doc) are generated
  - (deep) Eval false-positive assessment: each eval category documents at least one scenario where valid code might incorrectly fail, with exclusion mechanism
+ - (deep) Every conditional eval category with a source document is generated

  ## Methodology Scaling
  - **deep**: All 13 eval categories (conditional on doc existence). Stack-specific
@@ -80,7 +82,8 @@ Supporting:
  conformance. API contract validation. Security patterns. Full suite.
  - **mvp**: Consistency + Structure only. Skip everything else.
  - **custom:depth(1-5)**:
- - Depth 1-2: Consistency + Structure
+ - Depth 1: Consistency + Structure only
+ - Depth 2: Consistency + Structure with stack-specific patterns
  - Depth 3: Add Adherence + Cross-doc
  - Depth 4: Add Coverage + Architecture + Config + Error handling
  - Depth 5: All 13 categories (Security, API, Database, Accessibility, Performance)
@@ -39,7 +39,7 @@ development setup rather than redefining it.
  - (deep) Health check endpoints defined with expected response codes and latency bounds
  - (deep) Log aggregation strategy specifies retention period and searchable fields
  - (deep) Each alert threshold documents: the metric, threshold value, business impact if crossed, and mitigation action
- - References docs/dev-setup.md for local dev — does not redefine it
+ - (mvp) References docs/dev-setup.md for local dev — does not redefine it
  - (deep) Incident response process defined
  - (deep) Recovery Time Objective (RTO) and Recovery Point Objective (RPO) documented for each critical service
  - (deep) Secret rotation procedure documented and tested
@@ -48,8 +48,12 @@ development setup rather than redefining it.
  - **deep**: Full runbook. Deployment topology diagrams. Monitoring dashboard
  specs. Alert playbooks. DR plan. Capacity planning.
  - **mvp**: Deploy command. Basic monitoring. Rollback procedure.
- - **custom:depth(1-5)**: Depth 1-2: MVP-style. Depth 3: add monitoring and
- alerts. Depth 4-5: full runbook with DR.
+ - **custom:depth(1-5)**:
+ - Depth 1: Deploy command and basic rollback procedure.
+ - Depth 2: Add basic monitoring metrics (latency, error rate, saturation).
+ - Depth 3: Add alert thresholds, incident response outline, health check endpoints.
+ - Depth 4: Full runbook with deployment topology, monitoring dashboards, and DR plan.
+ - Depth 5: Full runbook with capacity planning, secret rotation testing, and multi-region considerations.

  ## Mode Detection
  Check for docs/operations-runbook.md. If it exists, operate in update mode: