@zigrivers/scaffold 2.38.1 → 2.44.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (201)
  1. package/README.md +10 -7
  2. package/dist/cli/commands/build.js +4 -4
  3. package/dist/cli/commands/build.js.map +1 -1
  4. package/dist/cli/commands/check.test.js +11 -8
  5. package/dist/cli/commands/check.test.js.map +1 -1
  6. package/dist/cli/commands/complete.d.ts.map +1 -1
  7. package/dist/cli/commands/complete.js +2 -1
  8. package/dist/cli/commands/complete.js.map +1 -1
  9. package/dist/cli/commands/complete.test.js +4 -1
  10. package/dist/cli/commands/complete.test.js.map +1 -1
  11. package/dist/cli/commands/dashboard.js +4 -4
  12. package/dist/cli/commands/dashboard.js.map +1 -1
  13. package/dist/cli/commands/knowledge.js +2 -2
  14. package/dist/cli/commands/knowledge.js.map +1 -1
  15. package/dist/cli/commands/knowledge.test.js +5 -12
  16. package/dist/cli/commands/knowledge.test.js.map +1 -1
  17. package/dist/cli/commands/list.d.ts +1 -1
  18. package/dist/cli/commands/list.d.ts.map +1 -1
  19. package/dist/cli/commands/list.js +84 -3
  20. package/dist/cli/commands/list.js.map +1 -1
  21. package/dist/cli/commands/list.test.js +82 -0
  22. package/dist/cli/commands/list.test.js.map +1 -1
  23. package/dist/cli/commands/next.test.js +4 -1
  24. package/dist/cli/commands/next.test.js.map +1 -1
  25. package/dist/cli/commands/reset.d.ts.map +1 -1
  26. package/dist/cli/commands/reset.js +5 -2
  27. package/dist/cli/commands/reset.js.map +1 -1
  28. package/dist/cli/commands/reset.test.js +4 -1
  29. package/dist/cli/commands/reset.test.js.map +1 -1
  30. package/dist/cli/commands/rework.d.ts.map +1 -1
  31. package/dist/cli/commands/rework.js +3 -2
  32. package/dist/cli/commands/rework.js.map +1 -1
  33. package/dist/cli/commands/run.d.ts.map +1 -1
  34. package/dist/cli/commands/run.js +28 -13
  35. package/dist/cli/commands/run.js.map +1 -1
  36. package/dist/cli/commands/run.test.js +1 -1
  37. package/dist/cli/commands/run.test.js.map +1 -1
  38. package/dist/cli/commands/skip.d.ts.map +1 -1
  39. package/dist/cli/commands/skip.js +2 -1
  40. package/dist/cli/commands/skip.js.map +1 -1
  41. package/dist/cli/commands/skip.test.js +4 -1
  42. package/dist/cli/commands/skip.test.js.map +1 -1
  43. package/dist/cli/commands/status.d.ts.map +1 -1
  44. package/dist/cli/commands/status.js +88 -4
  45. package/dist/cli/commands/status.js.map +1 -1
  46. package/dist/cli/commands/version.d.ts.map +1 -1
  47. package/dist/cli/commands/version.js +22 -3
  48. package/dist/cli/commands/version.js.map +1 -1
  49. package/dist/cli/commands/version.test.js +42 -0
  50. package/dist/cli/commands/version.test.js.map +1 -1
  51. package/dist/cli/output/context.test.js +14 -13
  52. package/dist/cli/output/context.test.js.map +1 -1
  53. package/dist/cli/output/interactive.js +4 -4
  54. package/dist/cli/output/json.d.ts +1 -0
  55. package/dist/cli/output/json.d.ts.map +1 -1
  56. package/dist/cli/output/json.js +14 -1
  57. package/dist/cli/output/json.js.map +1 -1
  58. package/dist/config/loader.d.ts.map +1 -1
  59. package/dist/config/loader.js +10 -3
  60. package/dist/config/loader.js.map +1 -1
  61. package/dist/config/loader.test.js +28 -0
  62. package/dist/config/loader.test.js.map +1 -1
  63. package/dist/core/assembly/engine.d.ts.map +1 -1
  64. package/dist/core/assembly/engine.js +6 -1
  65. package/dist/core/assembly/engine.js.map +1 -1
  66. package/dist/e2e/init.test.js +3 -0
  67. package/dist/e2e/init.test.js.map +1 -1
  68. package/dist/index.js +2 -1
  69. package/dist/index.js.map +1 -1
  70. package/dist/project/adopt.test.js +3 -0
  71. package/dist/project/adopt.test.js.map +1 -1
  72. package/dist/project/claude-md.d.ts.map +1 -1
  73. package/dist/project/claude-md.js +2 -1
  74. package/dist/project/claude-md.js.map +1 -1
  75. package/dist/project/detector.js +3 -3
  76. package/dist/project/detector.js.map +1 -1
  77. package/dist/project/signals.d.ts +1 -0
  78. package/dist/project/signals.d.ts.map +1 -1
  79. package/dist/state/decision-logger.d.ts.map +1 -1
  80. package/dist/state/decision-logger.js +7 -4
  81. package/dist/state/decision-logger.js.map +1 -1
  82. package/dist/state/lock-manager.js +1 -1
  83. package/dist/state/lock-manager.js.map +1 -1
  84. package/dist/state/lock-manager.test.js +27 -3
  85. package/dist/state/lock-manager.test.js.map +1 -1
  86. package/dist/state/state-manager.d.ts.map +1 -1
  87. package/dist/state/state-manager.js +6 -0
  88. package/dist/state/state-manager.js.map +1 -1
  89. package/dist/state/state-manager.test.js +7 -0
  90. package/dist/state/state-manager.test.js.map +1 -1
  91. package/dist/types/assembly.d.ts +2 -0
  92. package/dist/types/assembly.d.ts.map +1 -1
  93. package/dist/utils/eligible.d.ts +8 -0
  94. package/dist/utils/eligible.d.ts.map +1 -0
  95. package/dist/utils/eligible.js +36 -0
  96. package/dist/utils/eligible.js.map +1 -0
  97. package/dist/validation/config-validator.test.js +15 -13
  98. package/dist/validation/config-validator.test.js.map +1 -1
  99. package/dist/validation/index.test.js +1 -1
  100. package/dist/wizard/wizard.d.ts.map +1 -1
  101. package/dist/wizard/wizard.js +1 -0
  102. package/dist/wizard/wizard.js.map +1 -1
  103. package/dist/wizard/wizard.test.js +2 -0
  104. package/dist/wizard/wizard.test.js.map +1 -1
  105. package/knowledge/core/automated-review-tooling.md +4 -4
  106. package/knowledge/core/eval-craft.md +44 -0
  107. package/knowledge/core/multi-model-review-dispatch.md +8 -0
  108. package/knowledge/core/system-architecture.md +39 -0
  109. package/knowledge/core/task-decomposition.md +53 -0
  110. package/knowledge/core/testing-strategy.md +160 -0
  111. package/knowledge/finalization/implementation-playbook.md +24 -7
  112. package/knowledge/product/prd-craft.md +41 -0
  113. package/knowledge/review/review-adr.md +1 -1
  114. package/knowledge/review/review-api-design.md +1 -1
  115. package/knowledge/review/review-database-design.md +1 -1
  116. package/knowledge/review/review-domain-modeling.md +1 -1
  117. package/knowledge/review/review-implementation-tasks.md +1 -1
  118. package/knowledge/review/review-methodology.md +1 -1
  119. package/knowledge/review/review-operations.md +1 -1
  120. package/knowledge/review/review-prd.md +1 -1
  121. package/knowledge/review/review-security.md +1 -1
  122. package/knowledge/review/review-system-architecture.md +1 -1
  123. package/knowledge/review/review-testing-strategy.md +1 -1
  124. package/knowledge/review/review-user-stories.md +1 -1
  125. package/knowledge/review/review-ux-specification.md +1 -1
  126. package/knowledge/review/review-vision.md +1 -1
  127. package/knowledge/tools/post-implementation-review-methodology.md +107 -0
  128. package/knowledge/validation/critical-path-analysis.md +13 -0
  129. package/knowledge/validation/implementability-review.md +14 -0
  130. package/package.json +2 -1
  131. package/pipeline/architecture/review-architecture.md +8 -5
  132. package/pipeline/architecture/system-architecture.md +9 -3
  133. package/pipeline/build/multi-agent-resume.md +21 -7
  134. package/pipeline/build/multi-agent-start.md +22 -7
  135. package/pipeline/build/new-enhancement.md +20 -12
  136. package/pipeline/build/quick-task.md +18 -11
  137. package/pipeline/build/single-agent-resume.md +20 -6
  138. package/pipeline/build/single-agent-start.md +24 -8
  139. package/pipeline/consolidation/claude-md-optimization.md +8 -4
  140. package/pipeline/consolidation/workflow-audit.md +9 -5
  141. package/pipeline/decisions/adrs.md +7 -3
  142. package/pipeline/decisions/review-adrs.md +8 -5
  143. package/pipeline/environment/ai-memory-setup.md +6 -2
  144. package/pipeline/environment/automated-pr-review.md +79 -12
  145. package/pipeline/environment/design-system.md +9 -6
  146. package/pipeline/environment/dev-env-setup.md +8 -5
  147. package/pipeline/environment/git-workflow.md +16 -13
  148. package/pipeline/finalization/apply-fixes-and-freeze.md +10 -5
  149. package/pipeline/finalization/developer-onboarding-guide.md +10 -3
  150. package/pipeline/finalization/implementation-playbook.md +13 -4
  151. package/pipeline/foundation/beads.md +8 -5
  152. package/pipeline/foundation/coding-standards.md +13 -10
  153. package/pipeline/foundation/project-structure.md +16 -13
  154. package/pipeline/foundation/tdd.md +9 -4
  155. package/pipeline/foundation/tech-stack.md +7 -5
  156. package/pipeline/integration/add-e2e-testing.md +12 -8
  157. package/pipeline/modeling/domain-modeling.md +9 -7
  158. package/pipeline/modeling/review-domain-modeling.md +8 -6
  159. package/pipeline/parity/platform-parity-review.md +9 -6
  160. package/pipeline/planning/implementation-plan-review.md +10 -7
  161. package/pipeline/planning/implementation-plan.md +41 -9
  162. package/pipeline/pre/create-prd.md +7 -4
  163. package/pipeline/pre/innovate-prd.md +12 -8
  164. package/pipeline/pre/innovate-user-stories.md +10 -7
  165. package/pipeline/pre/review-prd.md +12 -10
  166. package/pipeline/pre/review-user-stories.md +12 -9
  167. package/pipeline/pre/user-stories.md +7 -4
  168. package/pipeline/quality/create-evals.md +6 -3
  169. package/pipeline/quality/operations.md +7 -3
  170. package/pipeline/quality/review-operations.md +12 -5
  171. package/pipeline/quality/review-security.md +11 -6
  172. package/pipeline/quality/review-testing.md +11 -6
  173. package/pipeline/quality/security.md +6 -2
  174. package/pipeline/quality/story-tests.md +14 -9
  175. package/pipeline/specification/api-contracts.md +9 -3
  176. package/pipeline/specification/database-schema.md +8 -2
  177. package/pipeline/specification/review-api.md +10 -4
  178. package/pipeline/specification/review-database.md +8 -3
  179. package/pipeline/specification/review-ux.md +9 -3
  180. package/pipeline/specification/ux-spec.md +9 -4
  181. package/pipeline/validation/critical-path-walkthrough.md +10 -5
  182. package/pipeline/validation/cross-phase-consistency.md +9 -4
  183. package/pipeline/validation/decision-completeness.md +8 -3
  184. package/pipeline/validation/dependency-graph-validation.md +8 -3
  185. package/pipeline/validation/implementability-dry-run.md +9 -5
  186. package/pipeline/validation/scope-creep-check.md +11 -6
  187. package/pipeline/validation/traceability-matrix.md +10 -5
  188. package/pipeline/vision/create-vision.md +7 -4
  189. package/pipeline/vision/innovate-vision.md +11 -8
  190. package/pipeline/vision/review-vision.md +15 -12
  191. package/skills/multi-model-dispatch/SKILL.md +6 -5
  192. package/skills/scaffold-runner/SKILL.md +47 -3
  193. package/tools/dashboard.md +53 -0
  194. package/tools/post-implementation-review.md +655 -0
  195. package/tools/prompt-pipeline.md +160 -0
  196. package/tools/release.md +440 -0
  197. package/tools/review-pr.md +229 -0
  198. package/tools/session-analyzer.md +299 -0
  199. package/tools/update.md +113 -0
  200. package/tools/version-bump.md +290 -0
  201. package/tools/version.md +82 -0
@@ -41,7 +41,7 @@ about ecosystem maturity, alternatives, and gotchas.
  - (mvp) Every choice is a decision, not a menu of options
  - (mvp) Quick Reference section lists every dependency with version
  - (deep) Each technology choice documents AI compatibility assessment (training data availability, convention strength); total direct dependencies counted and justified
- - (depth 4+) Multi-model recommendations cross-referenced agreements flagged as high-confidence, disagreements flagged for human decision
+ - (depth 4+) Multi-model recommendations synthesized: Consensus (all models agree), Majority (2+ models agree), or Divergent (models disagree — present to user for decision)

  ## Methodology Scaling
  - **deep**: Comprehensive research with competitive analysis for each category.
@@ -51,10 +51,12 @@ about ecosystem maturity, alternatives, and gotchas.
  to Claude-only enhanced research.
  - **mvp**: Core stack decisions only (language, framework, database, test runner).
  Brief rationale. Quick Reference with versions. 2-3 pages.
- - **custom:depth(1-5)**: Depth 1-2: MVP decisions. Depth 3: add infrastructure
- and tooling. Depth 4: add AI compatibility analysis + one external model
- (if CLI available). Depth 5: full competitive analysis and upgrade strategy
- + multi-model with cross-referencing.
+ - **custom:depth(1-5)**:
+ - Depth 1: Core stack decisions only (language, framework, database). Brief rationale. 1 page.
+ - Depth 2: Depth 1 + test runner choice and Quick Reference with versions. 2-3 pages.
+ - Depth 3: Add infrastructure, tooling, and developer experience recommendations.
+ - Depth 4: Add AI compatibility analysis + one external model research (if CLI available).
+ - Depth 5: Full competitive analysis per category, upgrade strategy, + multi-model with cross-referencing.

  ## Mode Detection
  Update mode if docs/tech-stack.md exists. In update mode: never change a
@@ -5,7 +5,7 @@ summary: "Detects whether your project is web or mobile, then configures Playwri
  phase: "integration"
  order: 410
  dependencies: [git-workflow, tdd]
- outputs: [tests/screenshots/, maestro/]
+ outputs: [tests/screenshots/, maestro/, playwright.config.ts]
  reads: [coding-standards, user-stories]
  conditional: "if-needed"
  knowledge-base: [testing-strategy]
@@ -39,13 +39,14 @@ Outputs vary by detected platform:
  - (mvp) (web) Playwright config uses framework-specific dev server command and port
  - (mvp) (web) Smoke test passes (navigate, screenshot, close)
  - (mvp) (mobile) Maestro CLI installed, sample flow executes, screenshot captured
- - (mobile) testID naming convention defined and documented
+ - (mvp) (mobile) testID naming convention defined and documented
  - (mvp) E2E section in tdd-standards.md distinguishes when to use E2E vs unit tests
- - Baseline screenshots committed, current screenshots gitignored
- - CLAUDE.md contains browser/mobile testing section
- - tdd-standards.md E2E section updated with when-to-use guidance
+ - (mvp) Baseline screenshots committed, current screenshots gitignored
+ - (mvp) CLAUDE.md contains browser/mobile testing section
+ - (mvp) tdd-standards.md E2E section updated with when-to-use guidance
  - (deep) CI integration configured for E2E test execution
  - (deep) Sub-flows defined for common user journeys
+ - (deep) Smoke test names and intent are consistent between Playwright and Maestro

  ## Methodology Scaling
  - **deep**: Full setup for all detected platforms. All visual testing patterns,
@@ -53,9 +54,12 @@ Outputs vary by detected platform:
  common journeys, and comprehensive documentation updates.
  - **mvp**: Basic config and smoke test for detected platform. Minimal docs
  updates. Two viewports for web, single platform for mobile.
- - **custom:depth(1-5)**: Depth 1-2: config + smoke test. Depth 3: add patterns,
- naming, testID rules. Depth 4: add CI integration, both mobile platforms.
- Depth 5: full suite with baseline management and sub-flows.
+ - **custom:depth(1-5)**:
+ - Depth 1: Config + smoke test for primary platform only
+ - Depth 2: Config + smoke test with basic viewport/device coverage
+ - Depth 3: Add patterns, naming conventions, and testID rules
+ - Depth 4: Add CI integration and both mobile platforms
+ - Depth 5: Full suite with baseline management, sub-flows, and cross-platform consistency

  ## Conditional Evaluation
  Enable when: tech-stack.md indicates a web frontend (Playwright) or mobile app
@@ -35,13 +35,12 @@ and aggregate boundaries. User actions reveal the domain model.
  - docs/domain-models/index.md — overview of all domains and their relationships

  ## Quality Criteria
- - (mvp) Every PRD feature maps to at least one domain
+ - (mvp) Every PRD feature maps to >= 1 domain
  - (mvp) Entity relationships are explicit (not implied)
  - (mvp) Each aggregate boundary documents: the invariant it protects, the consistency boundary it enforces, and why included entities must change together
  - (deep) Domain events cover all state transitions
- - (deep) Each invariant is phrased as a boolean condition checkable in code (e.g., `order.total >= 0`, `user.email matches /^[^@]+@[^@]+$/`), not a narrative description
- - Ubiquitous language is consistent across all domain models
- - (mvp) All entity and concept names used consistently across domain model files (ubiquitous language enforced)
+ - (mvp) Each invariant is expressible as a runtime-checkable condition (assertion, validation rule, or database constraint) (e.g., `order.total >= 0`, `user.email matches /^[^@]+@[^@]+$/`), not a narrative description
+ - (mvp) Every entity name in one domain-model file uses the same name (no synonyms) in all other domain-model files
  - (deep) Cross-aggregate event flows documented for every state change that crosses aggregate boundaries
  - (deep) Cross-domain relationships are documented at context boundaries

@@ -51,9 +50,12 @@ and aggregate boundaries. User actions reveal the domain model.
  relationships between bounded contexts. Separate file per domain.
  - **mvp**: Key entities and their relationships in a single file. Core business
  rules listed. Enough to inform architecture decisions.
- - **custom:depth(1-5)**: Depth 1-2: single-file entity overview. Depth 3: separate
- files per domain with entities and events. Depth 4-5: full DDD approach with
- context maps and detailed invariants.
+ - **custom:depth(1-5)**:
+ - Depth 1: single-file entity list with key relationships.
+ - Depth 2: single-file entity overview with attributes and core business rules.
+ - Depth 3: separate files per domain with entities, events, and aggregate boundaries.
+ - Depth 4: full DDD approach with context maps, detailed invariants, and domain event flows.
+ - Depth 5: full DDD approach with cross-context integration contracts and sequence diagrams for all cross-aggregate flows.

  ## Mode Detection
  If docs/domain-models/ exists, operate in update mode: read existing models,
@@ -31,14 +31,14 @@ independent review validation.

  ## Quality Criteria
  - (mvp) All review passes executed with findings documented
- - (mvp) Every finding categorized by severity (P0-P3)
+ - (mvp) Every finding categorized by severity (P0-P3). Severity definitions: P0 = Breaks downstream work. P1 = Prevents quality milestone. P2 = Known tech debt. P3 = Polish.
  - (mvp) Fix plan created for P0 and P1 findings
  - (mvp) Fixes applied and re-validated
  - (mvp) Downstream readiness confirmed (decisions phase can proceed)
  - (mvp) Entity coverage verified (every PRD feature maps to at least one entity)
  - (deep) Aggregate boundaries verified (each aggregate protects at least one invariant)
  - (deep) Ubiquitous language consistency verified across all domain model files
- - (depth 4+) Multi-model findings synthesized with consensus/disagreement analysis
+ - (depth 4+) Multi-model findings synthesized: Consensus (all models agree), Majority (2+ models agree), or Divergent (models disagree — present to user for decision)

  ## Methodology Scaling
  - **deep**: All review passes from the knowledge base. Full findings report
@@ -46,10 +46,12 @@ independent review validation.
  review dispatched to Codex and Gemini if available, with graceful fallback
  to Claude-only enhanced review.
  - **mvp**: Quick consistency check. Focus on blocking issues only.
- - **custom:depth(1-5)**: Depth 1-2: blocking issues only. Depth 3: add coverage
- and consistency passes. Depth 4: full multi-pass review + one external model
- (if CLI available). Depth 5: full multi-pass review + multi-model with
- reconciliation.
+ - **custom:depth(1-5)**:
+ - Depth 1: single pass blocking issues only (entity coverage against PRD).
+ - Depth 2: two passes entity coverage + ubiquitous language consistency.
+ - Depth 3: four passes — entity coverage, ubiquitous language, aggregate boundary validation, and cross-domain consistency.
+ - Depth 4: all review passes + one external model (if CLI available).
+ - Depth 5: all review passes + multi-model with reconciliation.

  ## Mode Detection
  If docs/reviews/review-domain-modeling.md exists, this is a re-review. Read previous
@@ -56,8 +56,9 @@ Skip when the project targets a single platform only.
  - (deep) Navigation patterns appropriate per platform (sidebar vs. tab bar, etc.)
  - (deep) Offline/connectivity handling addressed per platform (if applicable)
  - (deep) Web version is treated as first-class (not afterthought) if PRD specifies it
- - Fix plan documented for all P0/P1 findings with specific document and section to update
- - (depth 4+) Multi-model findings synthesized with consensus/disagreement analysis
+ - (mvp) Every finding categorized P0-P3 (P0 = Breaks downstream work. P1 = Prevents quality milestone. P2 = Known tech debt. P3 = Polish.)
+ - (mvp) Fix plan documented for all P0/P1 findings with specific document and section to update
+ - (depth 4+) Multi-model findings synthesized: Consensus (all models agree), Majority (2+ models agree), or Divergent (models disagree — present to user for decision)

  ## Methodology Scaling
  - **deep**: Comprehensive platform audit across all documents, feature parity
@@ -67,10 +68,12 @@ Skip when the project targets a single platform only.
  to Claude-only enhanced review.
  - **mvp**: Quick check of user stories and tech-stack for platform coverage.
  Identify top 3 platform gaps. Skip detailed feature parity matrix.
- - **custom:depth(1-5)**: Depth 1-2: user stories platform check. Depth 3: add
- tech-stack and coding-standards. Depth 4: add feature parity matrix + one
- external model (if CLI available). Depth 5: full suite across all documents
- + multi-model with reconciliation.
+ - **custom:depth(1-5)**:
+ - Depth 1: User stories platform check only (1 review pass)
+ - Depth 2: Two-pass check — first pass validates user stories against platform constraints; second pass identifies implicit assumptions that could block native implementation (e.g., assumed web APIs not available on mobile).
+ - Depth 3: Add tech-stack and coding-standards platform audit (3 review passes)
+ - Depth 4: Add feature parity matrix + one external model if CLI available (3 review passes + external dispatch)
+ - Depth 5: Full suite across all documents + multi-model with reconciliation (3 review passes + multi-model synthesis)

  ## Mode Detection
  Update mode if docs/reviews/platform-parity-review.md exists. In update mode:
@@ -5,7 +5,7 @@ summary: "Verifies every feature has implementation tasks, no task is too large
  phase: "planning"
  order: 1220
  dependencies: [implementation-plan]
- outputs: [docs/reviews/review-tasks.md, docs/reviews/implementation-plan/task-coverage.json, docs/reviews/implementation-plan/review-summary.md]
+ outputs: [docs/reviews/review-tasks.md, docs/reviews/implementation-plan/task-coverage.json, docs/reviews/implementation-plan/review-summary.md, docs/reviews/implementation-plan/codex-review.json, docs/reviews/implementation-plan/gemini-review.json]
  conditional: null
  knowledge-base: [review-methodology, review-implementation-tasks, task-decomposition, multi-model-review-dispatch, review-step-template]
  ---
@@ -20,8 +20,8 @@ and produce a structured coverage matrix and review summary.

  ## Inputs
  - docs/implementation-plan.md (required) — tasks to review
- - docs/system-architecture.md (required) — for coverage checking
- - docs/domain-models/ (required) — for completeness
+ - docs/system-architecture.md (required at deep; optional — not available in MVP) — for coverage checking
+ - docs/domain-models/ (required at deep; optional — not available in MVP) — for completeness
  - docs/user-stories.md (required) — for AC coverage mapping
  - docs/plan.md (required) — for traceability
  - docs/project-structure.md (required) — for file contention analysis
@@ -51,9 +51,12 @@ and produce a structured coverage matrix and review summary.
  - **deep**: Full multi-pass review with multi-model validation. AC coverage
  matrix. Independent Codex/Gemini dispatches. Detailed reconciliation report.
  - **mvp**: Coverage check only. No external model dispatch.
- - **custom:depth(1-5)**: Depth 1-2: coverage check. Depth 3: add dependency
- analysis and AC coverage matrix. Depth 4: add one external model. Depth 5:
- full multi-model with reconciliation.
+ - **custom:depth(1-5)**:
+ - Depth 1: architecture coverage check (every component has tasks).
+ - Depth 2: coverage check plus DAG validation and agent executability rules.
+ - Depth 3: add dependency analysis, AC coverage matrix, and task sizing audit.
+ - Depth 4: add one external model review (Codex or Gemini).
+ - Depth 5: full multi-model review with reconciliation and detailed findings report.

  ## Mode Detection
  Re-review mode if previous review exists. If multi-model review artifacts exist
@@ -61,7 +64,7 @@ under docs/reviews/implementation-plan/, preserve prior findings still valid.

  ## Update Mode Specifics

- - **Detect**: `docs/reviews/review-implementation-plan.md` exists with tracking comment
+ - **Detect**: `docs/reviews/review-tasks.md` exists with tracking comment
  - **Preserve**: Prior findings still valid, resolution decisions, multi-model review artifacts
  - **Triggers**: Upstream artifact changed since last review (compare tracking comment dates)
  - **Conflict resolution**: Previously resolved findings reappearing = regression; flag and re-evaluate
@@ -1,6 +1,6 @@
  ---
  name: implementation-plan
- description: Break architecture into implementable tasks with dependencies
+ description: Break deliverables into implementable tasks with dependencies, ordered by priority and dependencies
  summary: "Breaks your user stories and architecture into concrete tasks — each scoped to ~150 lines of code and 3 files max, with clear acceptance criteria, no ambiguous decisions, and explicit dependencies."
  phase: "planning"
  order: 1210
@@ -8,7 +8,7 @@ dependencies: [tdd, operations, security, review-architecture, create-evals]
  outputs: [docs/implementation-plan.md]
  reads: [create-prd, story-tests, database-schema, api-contracts, ux-spec]
  conditional: null
- knowledge-base: [task-decomposition]
+ knowledge-base: [task-decomposition, system-architecture]
  ---

  ## Purpose
@@ -37,17 +37,18 @@ The primary mapping is Story → Task(s), with PRD as the traceability root.
  assignment recommendations

  ## Quality Criteria
- - (mvp) Every architecture component has implementation tasks
- - (mvp) Task dependencies form a valid DAG (no cycles)
- - (mvp) Each task produces ~150 lines of net-new application code (excluding tests and generated files)
+ - (deep) Every architecture component has implementation tasks
+ - (mvp) Every user story has implementation tasks
+ - (mvp) Task dependencies form a valid DAG (no cycles, verified by checking no task depends on a later-ordered task)
+ - (mvp) Each task produces 150 +/- 50 lines of net-new application code (excluding tests and generated files)
  - (mvp) Tasks include acceptance criteria (how to know it's done)
  - (mvp) Tasks incorporate testing requirements from the testing strategy
  - (deep) Tasks reference corresponding test skeletons from tests/acceptance/ where applicable
  - (deep) Tasks incorporate security controls from the security review where applicable
  - (deep) Tasks incorporate operational requirements (monitoring, deployment) where applicable
- - (deep) Critical path is identified
  - (deep) Parallelization opportunities are marked with wave plan
- - (mvp) Every user story maps to at least one task
+ - (mvp) Every user story maps to >= 1 task
+ - (mvp) Every PRD feature maps to >= 1 user story, and every user story maps to >= 1 task (transitive traceability)
  - (deep) High-risk tasks are flagged with risk type and mitigation
  - (deep) Wave summary produced with agent allocation recommendation
  - (mvp) No task modifies more than 3 application files (test files excluded; exceptions require justification)
@@ -64,8 +65,39 @@ The primary mapping is Story → Task(s), with PRD as the traceability root.
  Each task has a brief description, rough size estimate, and key dependency.
  Enough to start working sequentially. Skip architecture decomposition —
  work directly from user story acceptance criteria.
- - **custom:depth(1-5)**: Depth 1-2: ordered list. Depth 3: add dependencies
- and sizing. Depth 4-5: full breakdown with parallelization.
+ - **custom:depth(1-5)**:
+ - Depth 1: ordered task list derived from PRD features only.
+ - Depth 2: ordered list with rough size estimates per task.
+ - Depth 3: add explicit dependencies and sizing (150-line budget, 3-file rule).
+ - Depth 4: full breakdown with dependency graph and parallelization plan.
+ - Depth 5: full breakdown with parallelization, wave assignments, agent allocation, and critical path analysis.
+
+ ## MVP-Specific Guidance (No Architecture Available)
+
+ At MVP depth, the system architecture document does not exist. Task decomposition
+ must work directly from user stories without explicit component definitions.
+
+ **How to decompose stories into tasks without architecture:**
+
+ 1. **Derive implicit layers from tech stack**: Read docs/tech-stack.md. For a web
+ app: API layer (backend), UI layer (frontend), Data layer (database). Each
+ story typically decomposes into one task per affected layer.
+
+ 2. **Map each story to layers**: "User can register" → 3 tasks: API endpoint,
+ UI form, database table. "User can view dashboard" → 2 tasks: API data
+ endpoint, UI display component.
+
+ 3. **Use acceptance criteria to define task boundaries**: Each AC (Given/When/Then)
+ maps to test cases. Group test cases by layer. Each layer's test cases become
+ one task.
+
+ > **Note**: If user stories are one-liner bullets without Given/When/Then ACs (MVP depth 1–2), derive task boundaries directly from the story text instead: treat each story's success condition as defining one task scope. Infer implied acceptance criteria from the story description before decomposing into tasks.
+
+ 4. **Order tasks by dependency**: Database migrations first, then API endpoints,
+ then UI components (bottom-up).
+
+ 5. **Split within layers when tasks exceed 150 lines**: Happy path in one task,
+ validation/error handling in another, edge cases in a third.

  ## Mode Detection
  Check for docs/implementation-plan.md. If it exists, operate in update mode:
@@ -28,7 +28,7 @@ throughout the entire pipeline.
  ## Quality Criteria
  - (mvp) Problem statement names a specific user group, a specific pain point, and a falsifiable hypothesis about the solution
  - (mvp) Target users are identified with their needs
- - (mvp) Features are scoped with clear boundaries (what's in, what's out)
+ - (mvp) Each feature defines at least one explicit out-of-scope item (what it does NOT do) in addition to what it does
  - (mvp) Success criteria are measurable
  - (mvp) Each non-functional requirement has a measurable target or threshold (e.g., 'page load < 2s', 'WCAG AA')
  - (mvp) No two sections contain contradictory statements about the same concept
@@ -40,9 +40,12 @@ throughout the entire pipeline.
  delivery plan. 15-20 pages.
  - **mvp**: Problem statement, core features list, primary user description,
  success criteria. 1-2 pages. Just enough to start building.
- - **custom:depth(1-5)**: Depth 1-2: MVP-style. Depth 3: add user personas
- and feature prioritization. Depth 4-5: full competitive analysis and
- phased delivery.
+ - **custom:depth(1-5)**:
+ - Depth 1: MVP-style problem statement, core features list, primary user. 1 page.
+ - Depth 2: MVP + success criteria and basic constraints. 1-2 pages.
+ - Depth 3: Add user personas and feature prioritization (MoSCoW). 3-5 pages.
+ - Depth 4: Add competitive analysis, risk assessment, and phased delivery plan. 8-12 pages.
+ - Depth 5: Full PRD with competitive analysis, phased delivery, and detailed non-functional requirements. 15-20 pages.

  ## Mode Detection
  If docs/plan.md exists, operate in update mode: read existing content, identify
@@ -8,6 +8,7 @@ dependencies: [review-prd]
  outputs: [docs/prd-innovation.md, docs/plan.md, docs/reviews/prd-innovation/review-summary.md, docs/reviews/prd-innovation/codex-review.json, docs/reviews/prd-innovation/gemini-review.json]
  conditional: "if-needed"
  knowledge-base: [prd-innovation, prd-craft, multi-model-review-dispatch]
+ reads: [review-prd]
  ---

  ## Purpose
@@ -35,12 +36,13 @@ creative opportunities and competitive insights.
  ## Quality Criteria
  - (mvp) Enhancements are feature-level, not UX-level polish
  - (mvp) Each suggestion has a cost estimate (trivial/moderate/significant)
- - (mvp) Each suggestion has a clear user benefit and impact assessment
+ - (mvp) Each suggestion specifies: the problem it solves for a specific user type, the expected behavior change, and cost estimate (trivial/moderate/significant)
  - (mvp) Each approved innovation includes: problem it solves, target users, scope boundaries, and success criteria
  - (mvp) PRD scope boundaries are respected — no uncontrolled scope creep
- - User approval is obtained before modifying the PRD
- - User approval for each accepted innovation documented as a question-response pair with timestamp (e.g., "Q: Accept feature X? A: Yes — 2025-01-15T14:30Z")
- - (depth 4+) Multi-model suggestions deduplicated and synthesized with unique ideas from each model highlighted
+ - (mvp) User approval is obtained before modifying the PRD
+ - (mvp) User approval for each accepted innovation documented as a question-response pair with timestamp (e.g., "Q: Accept feature X? A: Yes — 2025-01-15T14:30Z")
+ - (mvp) Each innovation marked with approval status: approved, deferred, or rejected, with user decision timestamp
+ - (depth 4+) Multi-model innovation suggestions synthesized: Consensus (all models propose similar direction), Majority (2+ models agree), or Divergent (models disagree — present all perspectives to user for selection)

  ## Methodology Scaling
  - **deep**: Full innovation pass across all categories (competitive research,
@@ -49,10 +51,12 @@ creative opportunities and competitive insights.
  innovation dispatched to Codex and Gemini if available, with graceful
  fallback to Claude-only enhanced brainstorming.
  - **mvp**: Not applicable — this step is conditional and skipped in MVP.
- - **custom:depth(1-5)**: Depth 1-2: skip (not enough context for meaningful innovation at this depth). Depth 3: quick scan
- for obvious gaps and missing expected features. Depth 4: full innovation
- pass + one external model (if CLI available). Depth 5: full innovation pass
- + multi-model with deduplication and synthesis.
+ - **custom:depth(1-5)**:
+ - Depth 1: Skip (not enough context for meaningful innovation at this depth).
+ - Depth 2: Minimal pass: generate 1–2 brief innovation concepts for the most distinctive PRD feature only; no market analysis or positioning required.
+ - Depth 3: Quick scan for obvious gaps and missing expected features.
+ - Depth 4: Full innovation pass across all categories + one external model (if CLI available).
+ - Depth 5: Full innovation pass + multi-model with deduplication and synthesis.

  ## Conditional Evaluation
  Enable when: project has a competitive landscape section in plan.md, user explicitly
@@ -5,7 +5,7 @@ summary: "Identifies UX enhancement opportunities — progressive disclosure, sm
  phase: "pre"
  order: 160
  dependencies: [review-user-stories]
- outputs: [docs/user-stories-innovation.md, docs/reviews/user-stories-innovation/review-summary.md, docs/reviews/user-stories-innovation/codex-review.json, docs/reviews/user-stories-innovation/gemini-review.json]
+ outputs: [docs/user-stories-innovation.md, docs/user-stories.md, docs/reviews/user-stories-innovation/review-summary.md, docs/reviews/user-stories-innovation/codex-review.json, docs/reviews/user-stories-innovation/gemini-review.json]
  conditional: "if-needed"
  knowledge-base: [user-stories, user-story-innovation, multi-model-review-dispatch]
  ---
@@ -39,8 +39,9 @@ enhancement opportunities.
  - (mvp) Each suggestion has a clear user benefit
  - (mvp) Approved enhancements are integrated into existing stories (not new stories)
  - (mvp) PRD scope boundaries are respected — no scope creep
- - User approval for each accepted innovation documented as a question-response pair with timestamp (e.g., "Q: Accept enhancement X? A: Yes — 2025-01-15T14:30Z")
- - (depth 4+) Multi-model suggestions deduplicated and synthesized with unique ideas from each model highlighted
+ - (mvp) User approval for each accepted innovation documented as a question-response pair with timestamp (e.g., "Q: Accept enhancement X? A: Yes — 2025-01-15T14:30Z")
+ - (mvp) Each innovation marked with approval status: approved, deferred, or rejected, with user decision timestamp
+ - (depth 4+) Multi-model innovation suggestions synthesized: Consensus (all models propose similar direction), Majority (2+ models agree), or Divergent (models disagree — present all perspectives to user for selection)

  ## Methodology Scaling
  - **deep**: Full innovation pass across all three categories (high-value
@@ -49,10 +50,12 @@ enhancement opportunities.
  innovation dispatched to Codex and Gemini if available, with graceful
  fallback to Claude-only enhanced brainstorming.
  - **mvp**: Not applicable — this step is conditional and skipped in MVP.
- - **custom:depth(1-5)**: Depth 1-2: skip (not enough context for meaningful innovation at this depth). Depth 3: quick
- scan for obvious improvements. Depth 4: full innovation pass + one external
- model (if CLI available). Depth 5: full innovation pass + multi-model with
- deduplication and synthesis.
+ - **custom:depth(1-5)**:
+ - Depth 1: Skip (not enough context for meaningful innovation at this depth).
+ - Depth 2: Minimal pass: generate 1–2 brief innovation concepts for the most distinctive user story only; no full Given/When/Then elaboration required.
+ - Depth 3: Quick scan for obvious UX improvements and low-hanging enhancements.
+ - Depth 4: Full innovation pass across all three categories + one external model (if CLI available).
+ - Depth 5: Full innovation pass + multi-model with deduplication and synthesis.

  ## Conditional Evaluation
  Enable when: user stories review identifies UX gaps, project targets a consumer-facing
@@ -32,12 +32,12 @@ independent review validation.

  ## Quality Criteria
  - (mvp) Passes 1-2 executed with findings documented
- - All review passes executed with findings documented
- - Every finding categorized by severity (P0-P3)
- - Fix plan created for P0 and P1 findings
- - Fixes applied and re-validated
+ - (deep) All review passes executed with findings documented
+ - (mvp) Every finding categorized by severity: P0 = Breaks downstream work. P1 = Prevents quality milestone. P2 = Known tech debt. P3 = Polish.
+ - (mvp) Fix plan created for P0 and P1 findings
+ - (mvp) Fixes applied and re-validated
  - (mvp) Downstream readiness confirmed (User Stories can proceed)
- - (depth 4+) Multi-model findings synthesized with consensus/disagreement analysis
+ - (depth 4+) Multi-model findings synthesized: Consensus (all models agree), Majority (2+ models agree), or Divergent (models disagree — present to user for decision)

  ## Methodology Scaling
  - **deep**: All 8 review passes from the knowledge base. Full findings report
@@ -46,10 +46,12 @@ independent review validation.
  to Claude-only enhanced review.
  - **mvp**: Passes 1-2 only (Problem Statement Rigor, Persona Coverage). Focus
  on blocking gaps — requirements too vague to write stories from.
- - **custom:depth(1-5)**: Depth 1-2: passes 1-2 only (Problem Statement Rigor,
- Persona Coverage). Depth 3: passes 1-4 (add Feature Scoping, Success
- Criteria). Depth 4: all 8 passes + one external model review (if CLI
- available). Depth 5: all 8 passes + multi-model review with reconciliation.
+ - **custom:depth(1-5)**:
+ - Depth 1: Pass 1 only (Problem Statement Rigor). One review pass.
+ - Depth 2: Passes 1-2 (Problem Statement Rigor, Persona Coverage). Two review passes.
+ - Depth 3: Passes 1-4 (add Feature Scoping, Success Criteria). Four review passes.
+ - Depth 4: All 8 passes + one external model review (if CLI available).
+ - Depth 5: All 8 passes + multi-model review with reconciliation.

  ## Mode Detection
  If docs/reviews/pre-review-prd.md exists, this is a re-review. Read previous
@@ -59,7 +61,7 @@ findings still valid.

  ## Update Mode Specifics

- - **Detect**: `docs/reviews/review-prd.md` exists with tracking comment
+ - **Detect**: `docs/reviews/pre-review-prd.md` exists with tracking comment
  - **Preserve**: Prior findings still valid, resolution decisions, multi-model review artifacts
  - **Triggers**: Upstream artifact changed since last review (compare tracking comment dates)
  - **Conflict resolution**: Previously resolved findings reappearing = regression; flag and re-evaluate
@@ -5,7 +5,7 @@ summary: "Verifies every PRD feature maps to at least one story, checks that acc
  phase: "pre"
  order: 150
  dependencies: [user-stories]
- outputs: [docs/reviews/pre-review-user-stories.md, docs/reviews/user-stories/requirements-index.md, docs/reviews/user-stories/coverage.json, docs/reviews/user-stories/review-summary.md]
+ outputs: [docs/reviews/pre-review-user-stories.md, docs/reviews/user-stories/requirements-index.md, docs/reviews/user-stories/coverage.json, docs/reviews/user-stories/review-summary.md, docs/reviews/user-stories/codex-review.json, docs/reviews/user-stories/gemini-review.json]
  conditional: null
  knowledge-base: [review-methodology, review-user-stories, multi-model-review-dispatch, review-step-template]
  ---
@@ -35,14 +35,14 @@ independent coverage validation.

  ## Quality Criteria
  - (mvp) Pass 1 (PRD coverage) executed with findings documented
- - All review passes executed with findings documented
- - Every finding categorized by severity (P0-P3)
- - Fix plan created for P0 and P1 findings
- - Fixes applied and re-validated
+ - (deep) All review passes executed with findings documented
+ - (mvp) Every finding categorized by severity: P0 = Breaks downstream work. P1 = Prevents quality milestone. P2 = Known tech debt. P3 = Polish.
+ - (mvp) Fix plan created for P0 and P1 findings
+ - (mvp) Fixes applied and re-validated
  - (mvp) Every story has at least one testable acceptance criterion, and every PRD feature maps to at least one story
  - (depth 4+) Every atomic PRD requirement has a REQ-xxx ID in the requirements index
  - (depth 4+) Coverage matrix maps every REQ to at least one US (100% coverage target)
- - (depth 4+) Multi-model findings synthesized with consensus/disagreement analysis
+ - (depth 4+) Multi-model findings synthesized: Consensus (all models agree), Majority (2+ models agree), or Divergent (models disagree — present to user for decision)

  ## Methodology Scaling
  - **deep**: All 6 review passes from the knowledge base. Full findings report
@@ -51,9 +51,12 @@ independent coverage validation.
  Gemini if available, with graceful fallback to Claude-only enhanced review.
  - **mvp**: Pass 1 only (PRD coverage). Focus on blocking gaps — PRD features
  with no corresponding story.
- - **custom:depth(1-5)**: Depth 1: pass 1 only. Depth 2: passes 1-2.
- Depth 3: passes 1-4. Depth 4: all 6 passes + requirements index + coverage
- matrix. Depth 5: all of depth 4 + multi-model review (if CLIs available).
+ - **custom:depth(1-5)**:
+ - Depth 1: Pass 1 only (PRD coverage). One review pass.
+ - Depth 2: Passes 1-2 (PRD coverage, acceptance criteria quality). Two review passes.
+ - Depth 3: Passes 1-4 (add story independence, INVEST criteria). Four review passes.
+ - Depth 4: All 6 passes + requirements index + coverage matrix + one external model (if CLI available).
+ - Depth 5: All of depth 4 + multi-model review with reconciliation (if CLIs available).

  ## Mode Detection
  If docs/reviews/pre-review-user-stories.md exists, this is a re-review. Read
@@ -29,7 +29,7 @@ task decomposition downstream.
  ## Quality Criteria
  - (mvp) Every PRD feature maps to at least one user story
  - (deep) Stories follow INVEST criteria (Independent, Negotiable, Valuable, Estimable, Small, Testable)
- - (mvp) Acceptance criteria are testable — unambiguous pass/fail
+ - (mvp) Acceptance criteria are testable — unambiguous pass/fail: (a) free of vague adjectives like 'valid', 'properly', 'quickly', and (b) specific about inputs and expected outputs
  - (deep) No story has more than 7 acceptance criteria
  - (mvp) Every PRD persona is represented in at least one story
  - (mvp) Stories describe user behavior, not implementation details
@@ -41,9 +41,12 @@ task decomposition downstream.
  examples, story-to-domain-event mapping for domain modeling consumption.
  - **mvp**: Flat list of one-liner stories grouped by PRD section. One bullet
  per story for the primary success condition. No epics, no scope boundaries.
- - **custom:depth(1-5)**: Depth 1-2: flat list with brief acceptance criteria.
- Depth 3: full template with IDs, epics, Given/When/Then. Depth 4-5: add
- dependency mapping, traceability, UI/UX notes, story splitting rationale.
+ - **custom:depth(1-5)**:
+ - Depth 1: Flat list of one-liner stories grouped by PRD section. One bullet per story.
+ - Depth 2: Flat list with brief acceptance criteria (1-2 criteria per story).
+ - Depth 3: Full template with story IDs, epics, Given/When/Then acceptance criteria.
+ - Depth 4: Add dependency mapping, traceability to PRD features, and UI/UX notes.
+ - Depth 5: Full suite with story splitting rationale, persona journey maps, and story-to-domain-event mapping.

  ## Mode Detection
  If docs/user-stories.md exists, operate in update mode: read existing stories,
@@ -59,12 +59,12 @@ Conditional (generated when source doc exists):
  Supporting:
  - tests/evals/helpers.* — shared utilities
  - docs/eval-standards.md — documents what is and isn't checked
- - make eval target added to Makefile/package.json
+ - make eval target (or equivalent build command) added to project build configuration

  ## Quality Criteria
  - (mvp) Consistency + Structure evals generated
  - (mvp) Evals use the project's own test framework from docs/tech-stack.md
- - (mvp) All generated evals pass on the current codebase (no false positives)
+ - (mvp) All generated evals pass on the current codebase when exclusion mechanisms are applied
  - (mvp) Eval results are binary PASS/FAIL, not scores
  - (mvp) make eval is separate from make test and make check (opt-in for CI)
  - (deep) All applicable eval categories generated including security, API, DB, accessibility (conditional on source doc existence)
@@ -72,7 +72,9 @@ Supporting:
  - (deep) docs/eval-standards.md explicitly documents what evals do NOT check
  - (deep) Full eval suite runs in under 30 seconds
  - (mvp) `make eval` (or equivalent) runs and all generated evals pass
+ - (mvp) All core eval categories (consistency, structure, adherence, coverage, cross-doc) are generated
  - (deep) Eval false-positive assessment: each eval category documents at least one scenario where valid code might incorrectly fail, with exclusion mechanism
+ - (deep) Every conditional eval category with a source document is generated

  ## Methodology Scaling
  - **deep**: All 13 eval categories (conditional on doc existence). Stack-specific
@@ -80,7 +82,8 @@ Supporting:
  conformance. API contract validation. Security patterns. Full suite.
  - **mvp**: Consistency + Structure only. Skip everything else.
  - **custom:depth(1-5)**:
- - Depth 1-2: Consistency + Structure
+ - Depth 1: Consistency + Structure only
+ - Depth 2: Consistency + Structure with stack-specific patterns
  - Depth 3: Add Adherence + Cross-doc
  - Depth 4: Add Coverage + Architecture + Config + Error handling
  - Depth 5: All 13 categories (Security, API, Database, Accessibility, Performance)
@@ -39,7 +39,7 @@ development setup rather than redefining it.
  - (deep) Health check endpoints defined with expected response codes and latency bounds
  - (deep) Log aggregation strategy specifies retention period and searchable fields
  - (deep) Each alert threshold documents: the metric, threshold value, business impact if crossed, and mitigation action
- - References docs/dev-setup.md for local dev — does not redefine it
+ - (mvp) References docs/dev-setup.md for local dev — does not redefine it
  - (deep) Incident response process defined
  - (deep) Recovery Time Objective (RTO) and Recovery Point Objective (RPO) documented for each critical service
  - (deep) Secret rotation procedure documented and tested
@@ -48,8 +48,12 @@ development setup rather than redefining it.
  - **deep**: Full runbook. Deployment topology diagrams. Monitoring dashboard
  specs. Alert playbooks. DR plan. Capacity planning.
  - **mvp**: Deploy command. Basic monitoring. Rollback procedure.
- - **custom:depth(1-5)**: Depth 1-2: MVP-style. Depth 3: add monitoring and
- alerts. Depth 4-5: full runbook with DR.
+ - **custom:depth(1-5)**:
+ - Depth 1: Deploy command and basic rollback procedure.
+ - Depth 2: Add basic monitoring metrics (latency, error rate, saturation).
+ - Depth 3: Add alert thresholds, incident response outline, health check endpoints.
+ - Depth 4: Full runbook with deployment topology, monitoring dashboards, and DR plan.
+ - Depth 5: Full runbook with capacity planning, secret rotation testing, and multi-region considerations.

  ## Mode Detection
  Check for docs/operations-runbook.md. If it exists, operate in update mode: