npm - @zigrivers/scaffold - Versions diffs - 2.1.2 → 2.28.1 - Mend

@zigrivers/scaffold 2.1.2 → 2.28.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (97)

package/README.md +272 -59
package/knowledge/core/adr-craft.md +53 -0
package/knowledge/core/ai-memory-management.md +246 -0
package/knowledge/core/api-design.md +4 -0
package/knowledge/core/claude-md-patterns.md +254 -0
package/knowledge/core/coding-conventions.md +246 -0
package/knowledge/core/database-design.md +4 -0
package/knowledge/core/design-system-tokens.md +465 -0
package/knowledge/core/dev-environment.md +223 -0
package/knowledge/core/domain-modeling.md +4 -0
package/knowledge/core/eval-craft.md +1008 -0
package/knowledge/core/multi-model-review-dispatch.md +250 -0
package/knowledge/core/operations-runbook.md +37 -226
package/knowledge/core/project-structure-patterns.md +231 -0
package/knowledge/core/review-step-template.md +247 -0
package/knowledge/core/{security-review.md → security-best-practices.md} +5 -1
package/knowledge/core/task-decomposition.md +57 -34
package/knowledge/core/task-tracking.md +225 -0
package/knowledge/core/tech-stack-selection.md +214 -0
package/knowledge/core/testing-strategy.md +63 -70
package/knowledge/core/user-stories.md +69 -60
package/knowledge/core/user-story-innovation.md +57 -0
package/knowledge/core/ux-specification.md +5 -148
package/knowledge/finalization/apply-fixes-and-freeze.md +165 -14
package/knowledge/product/prd-craft.md +55 -34
package/knowledge/review/review-adr.md +32 -0
package/knowledge/review/{review-api-contracts.md → review-api-design.md} +34 -1
package/knowledge/review/{review-database-schema.md → review-database-design.md} +27 -1
package/knowledge/review/review-domain-modeling.md +33 -0
package/knowledge/review/review-implementation-tasks.md +50 -0
package/knowledge/review/review-operations.md +55 -0
package/knowledge/review/review-prd.md +33 -0
package/knowledge/review/review-security.md +53 -0
package/knowledge/review/review-system-architecture.md +28 -0
package/knowledge/review/review-testing-strategy.md +51 -0
package/knowledge/review/review-user-stories.md +54 -0
package/knowledge/review/{review-ux-spec.md → review-ux-specification.md} +37 -1
package/methodology/custom-defaults.yml +32 -3
package/methodology/deep.yml +32 -3
package/methodology/mvp.yml +32 -3
package/package.json +2 -1
package/pipeline/architecture/review-architecture.md +18 -6
package/pipeline/architecture/system-architecture.md +14 -2
package/pipeline/consolidation/claude-md-optimization.md +73 -0
package/pipeline/consolidation/workflow-audit.md +73 -0
package/pipeline/decisions/adrs.md +14 -2
package/pipeline/decisions/review-adrs.md +18 -5
package/pipeline/environment/ai-memory-setup.md +70 -0
package/pipeline/environment/automated-pr-review.md +70 -0
package/pipeline/environment/design-system.md +73 -0
package/pipeline/environment/dev-env-setup.md +65 -0
package/pipeline/environment/git-workflow.md +71 -0
package/pipeline/finalization/apply-fixes-and-freeze.md +1 -1
package/pipeline/finalization/developer-onboarding-guide.md +1 -1
package/pipeline/finalization/implementation-playbook.md +3 -3
package/pipeline/foundation/beads.md +68 -0
package/pipeline/foundation/coding-standards.md +68 -0
package/pipeline/foundation/project-structure.md +69 -0
package/pipeline/foundation/tdd.md +60 -0
package/pipeline/foundation/tech-stack.md +74 -0
package/pipeline/integration/add-e2e-testing.md +65 -0
package/pipeline/modeling/domain-modeling.md +14 -2
package/pipeline/modeling/review-domain-modeling.md +18 -5
package/pipeline/parity/platform-parity-review.md +70 -0
package/pipeline/planning/implementation-plan-review.md +56 -0
package/pipeline/planning/{implementation-tasks.md → implementation-plan.md} +29 -9
package/pipeline/pre/create-prd.md +13 -4
package/pipeline/pre/innovate-prd.md +37 -8
package/pipeline/pre/innovate-user-stories.md +38 -7
package/pipeline/pre/review-prd.md +18 -6
package/pipeline/pre/review-user-stories.md +23 -6
package/pipeline/pre/user-stories.md +12 -2
package/pipeline/quality/create-evals.md +102 -0
package/pipeline/quality/operations.md +38 -13
package/pipeline/quality/review-operations.md +17 -5
package/pipeline/quality/review-security.md +17 -5
package/pipeline/quality/review-testing.md +20 -8
package/pipeline/quality/security.md +25 -3
package/pipeline/quality/story-tests.md +73 -0
package/pipeline/specification/api-contracts.md +17 -2
package/pipeline/specification/database-schema.md +17 -2
package/pipeline/specification/review-api.md +18 -6
package/pipeline/specification/review-database.md +18 -6
package/pipeline/specification/review-ux.md +19 -7
package/pipeline/specification/ux-spec.md +29 -10
package/pipeline/validation/critical-path-walkthrough.md +34 -7
package/pipeline/validation/cross-phase-consistency.md +34 -7
package/pipeline/validation/decision-completeness.md +34 -7
package/pipeline/validation/dependency-graph-validation.md +34 -7
package/pipeline/validation/implementability-dry-run.md +34 -7
package/pipeline/validation/scope-creep-check.md +34 -7
package/pipeline/validation/traceability-matrix.md +34 -7
package/skills/multi-model-dispatch/SKILL.md +326 -0
package/skills/scaffold-pipeline/SKILL.md +195 -0
package/skills/scaffold-runner/SKILL.md +465 -0
package/pipeline/planning/review-tasks.md +0 -38
package/pipeline/quality/testing-strategy.md +0 -42

package/pipeline/specification/review-ux.md CHANGED

@@ -2,11 +2,11 @@
 name: review-ux
 description: Review UX specification for completeness and usability
 phase: "specification"
-order: 18
+order: 860
 dependencies: [ux-spec]
-outputs: [docs/reviews/review-ux.md]
+outputs: [docs/reviews/review-ux.md, docs/reviews/ux/review-summary.md, docs/reviews/ux/codex-review.json, docs/reviews/ux/gemini-review.json]
 conditional: "if-needed"
-knowledge-base: [review-methodology, review-ux-spec]
+knowledge-base: [review-methodology, review-ux-specification]
 ---
 ## Purpose
@@ -14,14 +14,20 @@ Review UX specification targeting UX-specific failure modes: user journey gaps,
 accessibility issues, incomplete interaction states, design system inconsistencies,
 and missing error states.
+At depth 4+, dispatches to external AI models (Codex, Gemini) for
+independent review validation.
 ## Inputs
 - docs/ux-spec.md (required) — spec to review
-- docs/prd.md (required) — for journey coverage
+- docs/plan.md (required) — for journey coverage
 - docs/api-contracts.md (optional) — for data shape alignment
 ## Expected Outputs
 - docs/reviews/review-ux.md — findings and resolution log
 - docs/ux-spec.md — updated with fixes
+- docs/reviews/ux/review-summary.md (depth 4+) — multi-model review synthesis
+- docs/reviews/ux/codex-review.json (depth 4+, if available) — raw Codex findings
+- docs/reviews/ux/gemini-review.json (depth 4+, if available) — raw Gemini findings
 ## Quality Criteria
 - User journey coverage verified against PRD
@@ -29,10 +35,16 @@ and missing error states.
 - All interaction states covered
 - Design system consistency verified
 - Error states present for all failure-capable actions
+- (depth 4+) Multi-model findings synthesized with consensus/disagreement analysis
 ## Methodology Scaling
-- **deep**: Full multi-pass review. **mvp**: Journey coverage only.
-- **custom:depth(1-5)**: Scale passes with depth.
+- **deep**: Full multi-pass review. Multi-model review dispatched to Codex and
+  Gemini if available, with graceful fallback to Claude-only enhanced review.
+  **mvp**: Journey coverage only.
+- **custom:depth(1-5)**: Depth 1-3: scale passes with depth. Depth 4: full
+  review + one external model (if CLI available). Depth 5: full review +
+  multi-model with reconciliation.
 ## Mode Detection
-Re-review mode if previous review exists.
+Re-review mode if previous review exists. If multi-model review artifacts exist
+under docs/reviews/ux/, preserve prior findings still valid.
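The depth rules in the Methodology Scaling hunk above can be read as a small decision table. The following is a minimal sketch of that reading, not the package's implementation: `ReviewPlan`, `plan_review`, the `available_clis` parameter, and the preference order between Codex and Gemini are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewPlan:
    """What a review step would run at a given depth (hypothetical structure)."""
    full_review: bool
    external_models: list = field(default_factory=list)
    reconcile: bool = False


def plan_review(depth: int, available_clis: set) -> ReviewPlan:
    """Map a custom depth (1-5) to a review plan, following the scaling text."""
    if not 1 <= depth <= 5:
        raise ValueError("depth must be between 1 and 5")
    if depth <= 3:
        # Depth 1-3: passes scale with depth, no external dispatch.
        return ReviewPlan(full_review=False)
    # Preference order is an assumption; the source does not rank the models.
    candidates = [m for m in ("codex", "gemini") if m in available_clis]
    if depth == 4:
        # Depth 4: full review plus one external model, if a CLI is available.
        return ReviewPlan(full_review=True, external_models=candidates[:1])
    # Depth 5: full review plus multi-model dispatch with reconciliation.
    # With no external CLI this degrades to a Claude-only plan (assumed
    # interpretation of the "graceful fallback" wording).
    return ReviewPlan(full_review=True, external_models=candidates,
                      reconcile=bool(candidates))
```

Passing the set of available CLIs in as an argument keeps the sketch free of any claim about how the real tool detects Codex or Gemini.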

package/pipeline/specification/ux-spec.md CHANGED

@@ -1,8 +1,8 @@
 ---
 name: ux-spec
-description: Specify UI/UX design including design system
+description: Specify user flows, interaction states, component architecture, accessibility, and responsive behavior
 phase: "specification"
-order: 17
+order: 850
 dependencies: [review-architecture]
 outputs: [docs/ux-spec.md]
 conditional: "if-needed"
@@ -10,25 +10,28 @@ knowledge-base: [ux-specification]
 ---
 ## Purpose
-Define the user experience specification: user flows, wireframes, component
-hierarchy, interaction patterns, and design system (tokens, components, patterns).
-This is the visual and interaction blueprint for the frontend.
+Define the user experience specification: user flows, interaction state machines,
+component architecture (hierarchy and data flow), accessibility requirements, and
+responsive behavior. This is the interaction and behavior blueprint for the frontend.
+Visual tokens and component appearance are defined in `docs/design-system.md` — this
+step consumes those tokens, it does not redefine them.
 ## Inputs
-- docs/prd.md (required) — user requirements and personas
+- docs/plan.md (required) — user requirements and personas
 - docs/system-architecture.md (required) — frontend architecture
 - docs/api-contracts.md (optional) — data shapes for UI components
 - docs/user-stories.md (required) — user journeys driving flow design
+- docs/design-system.md (optional) — design tokens and component visual specs to reference
 ## Expected Outputs
 - docs/ux-spec.md — UX specification with flows, components, design system
 ## Quality Criteria
-- Every PRD user journey has a corresponding flow
+- Every PRD user journey has a corresponding flow with all states documented
 - Component hierarchy covers all UI states (loading, error, empty, populated)
-- Design system defines tokens (colors, spacing, typography) and base components
+- References design tokens from docs/design-system.md (does not redefine them)
 - Accessibility requirements documented (WCAG level, keyboard nav, screen readers)
-- Responsive breakpoints defined with behavior per breakpoint
+- Responsive breakpoints defined with layout behavior per breakpoint
 - Error states documented for every user action that can fail
 ## Methodology Scaling
@@ -40,4 +43,20 @@ This is the visual and interaction blueprint for the frontend.
   system. Depth 4-5: full specification with accessibility.
 ## Mode Detection
-Update mode if spec exists.
+Check for docs/ux-spec.md. If it exists, operate in update mode: read existing
+flows and component hierarchy, diff against updated user stories and system
+architecture. Preserve existing interaction patterns, state machines, and
+component data flow definitions. Add new flows for new user stories or features.
+Update component hierarchy if architecture changed frontend structure. Never
+remove documented accessibility requirements.
+## Update Mode Specifics
+- **Detect prior artifact**: docs/ux-spec.md exists
+- **Preserve**: existing user flows, interaction state machines, component
+  hierarchy, accessibility requirements, responsive breakpoint definitions
+- **Triggers for update**: user stories added or changed, architecture changed
+  frontend components, design system tokens updated, API contracts changed
+  data shapes available to UI
+- **Conflict resolution**: if a user story was rewritten, update its flow
+  in-place rather than creating a duplicate; reconcile component hierarchy
+  changes with existing state machine definitions
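The update-mode rules added above (detect the prior artifact, preserve existing flows, update a rewritten story's flow in place rather than duplicating it) can be sketched as follows. This is an illustration under assumptions: flows are modeled as a dictionary keyed by a hypothetical story ID, and `detect_mode` and `merge_flows` are invented names, not part of the package.

```python
from pathlib import Path


def detect_mode(project_root: Path) -> str:
    """Return "update" when docs/ux-spec.md already exists, else "create"."""
    spec = project_root / "docs" / "ux-spec.md"
    return "update" if spec.is_file() else "create"


def merge_flows(existing: dict, incoming: dict) -> dict:
    """Merge flows keyed by a hypothetical story ID.

    Existing flows are preserved, a rewritten story replaces its flow under
    the same key (keeping its position), and flows for new stories are
    appended at the end.
    """
    merged = dict(existing)   # preserve everything already documented
    merged.update(incoming)   # same key: update in place; new key: append
    return merged
```

Keying by story ID is what prevents duplicates here: a rewritten story maps to the same key, so its flow is overwritten instead of added a second time.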

package/pipeline/validation/critical-path-walkthrough.md CHANGED

@@ -2,9 +2,9 @@
 name: critical-path-walkthrough
 description: Walk critical user journeys end-to-end across all specs
 phase: "validation"
-order: 30
-dependencies: [review-tasks, review-security]
-outputs: [docs/validation/critical-path-walkthrough.md]
+order: 1340
+dependencies: [implementation-plan-review, review-security]
+outputs: [docs/validation/critical-path-walkthrough.md, docs/validation/critical-path-walkthrough/review-summary.md, docs/validation/critical-path-walkthrough/codex-review.json, docs/validation/critical-path-walkthrough/gemini-review.json]
 conditional: null
 knowledge-base: [critical-path-analysis]
 ---
@@ -16,22 +16,49 @@ architecture components, database operations, and implementation tasks.
 Use story acceptance criteria as the definition of "correct behavior" when
 verifying completeness and consistency at every layer.
+At depth 4+, dispatches to external AI models (Codex, Gemini) for
+independent journey walkthroughs — different models catch different
+spec gaps along the critical path.
 ## Inputs
-- All phase output artifacts (docs/prd.md, docs/domain-models/, docs/adrs/,
+- All phase output artifacts (docs/plan.md, docs/domain-models/, docs/adrs/,
   docs/system-architecture.md, etc.)
 ## Expected Outputs
 - docs/validation/critical-path-walkthrough.md — findings report
+- docs/validation/critical-path-walkthrough/review-summary.md (depth 4+) — multi-model validation synthesis
+- docs/validation/critical-path-walkthrough/codex-review.json (depth 4+, if available) — raw Codex findings
+- docs/validation/critical-path-walkthrough/gemini-review.json (depth 4+, if available) — raw Gemini findings
 ## Quality Criteria
 - Analysis is comprehensive (not superficial)
 - Findings are actionable (specific file, section, and issue)
 - Severity categorization (P0-P3)
+- (depth 4+) Multi-model findings synthesized with consensus/disagreement analysis
+## Finding Disposition
+- **P0 (blocking)**: Must be resolved before proceeding to implementation. Create
+  fix tasks and re-run affected upstream steps.
+- **P1 (critical)**: Should be resolved; proceeding requires explicit risk acceptance
+  documented in an ADR. Flag to project lead.
+- **P2 (medium)**: Document in implementation plan as tech debt. May defer to
+  post-launch with tracking issue.
+- **P3 (minor)**: Log for future improvement. No action required before implementation.
+Findings are reported in the validation output file with severity, affected artifact,
+and recommended resolution. P0/P1 findings block the implementation-plan step from
+proceeding without acknowledgment.
 ## Methodology Scaling
-- **deep**: Exhaustive analysis with all sub-checks.
+- **deep**: Exhaustive analysis with all sub-checks. Multi-model validation
+  dispatched to Codex and Gemini if available, with graceful fallback to
+  Claude-only enhanced validation.
 - **mvp**: High-level scan for blocking issues only.
-- **custom:depth(1-5)**: Scale thoroughness with depth.
+- **custom:depth(1-5)**: Depth 1-3: scale thoroughness with depth. Depth 4:
+  full analysis + one external model (if CLI available). Depth 5: full
+  analysis + multi-model with reconciliation.
 ## Mode Detection
-Not applicable — validation always runs fresh against current artifacts.
+Not applicable — validation always runs fresh against current artifacts. If
+multi-model artifacts exist under docs/validation/critical-path-walkthrough/,
+they are regenerated each run.
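The Finding Disposition section added in this file says P0/P1 findings block the next step "without acknowledgment". A minimal sketch of such a gate is shown below; the `Finding` fields and the `acknowledged` flag are hypothetical, since the diff does not define a data format for findings.

```python
from dataclasses import dataclass

# Severities that need acknowledgment before proceeding, per the text above.
BLOCKING = {"P0", "P1"}


@dataclass
class Finding:
    severity: str               # "P0" through "P3"
    artifact: str               # affected artifact, for example a docs/ path
    resolution: str             # recommended resolution
    acknowledged: bool = False  # hypothetical flag; the source names no field


def unacknowledged_blockers(findings: list) -> list:
    """Return the P0/P1 findings that nobody has acknowledged yet."""
    return [f for f in findings
            if f.severity in BLOCKING and not f.acknowledged]


def may_proceed(findings: list) -> bool:
    """True when no unacknowledged P0/P1 finding remains."""
    return not unacknowledged_blockers(findings)
```

P2 and P3 findings never block in this sketch, which matches the "document as tech debt" and "log for future improvement" dispositions.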

package/pipeline/validation/cross-phase-consistency.md CHANGED

@@ -2,9 +2,9 @@
 name: cross-phase-consistency
 description: Audit naming, assumptions, data flows, interface contracts across all phases
 phase: "validation"
-order: 27
-dependencies: [review-tasks, review-security]
-outputs: [docs/validation/cross-phase-consistency.md]
+order: 1310
+dependencies: [implementation-plan-review, review-security]
+outputs: [docs/validation/cross-phase-consistency.md, docs/validation/cross-phase-consistency/review-summary.md, docs/validation/cross-phase-consistency/codex-review.json, docs/validation/cross-phase-consistency/gemini-review.json]
 conditional: null
 knowledge-base: [cross-phase-consistency]
 ---
@@ -14,22 +14,49 @@ Audit naming, assumptions, data flows, interface contracts across all phases.
 Ensure consistent terminology, compatible assumptions, and aligned interfaces
 between every pair of phase artifacts.
+At depth 4+, dispatches to external AI models (Codex, Gemini) for
+independent consistency validation — different models catch different
+drift patterns.
 ## Inputs
-- All phase output artifacts (docs/prd.md, docs/domain-models/, docs/adrs/,
+- All phase output artifacts (docs/plan.md, docs/domain-models/, docs/adrs/,
   docs/system-architecture.md, etc.)
 ## Expected Outputs
 - docs/validation/cross-phase-consistency.md — findings report
+- docs/validation/cross-phase-consistency/review-summary.md (depth 4+) — multi-model validation synthesis
+- docs/validation/cross-phase-consistency/codex-review.json (depth 4+, if available) — raw Codex findings
+- docs/validation/cross-phase-consistency/gemini-review.json (depth 4+, if available) — raw Gemini findings
 ## Quality Criteria
 - Analysis is comprehensive (not superficial)
 - Findings are actionable (specific file, section, and issue)
 - Severity categorization (P0-P3)
+- (depth 4+) Multi-model findings synthesized with consensus/disagreement analysis
+## Finding Disposition
+- **P0 (blocking)**: Must be resolved before proceeding to implementation. Create
+  fix tasks and re-run affected upstream steps.
+- **P1 (critical)**: Should be resolved; proceeding requires explicit risk acceptance
+  documented in an ADR. Flag to project lead.
+- **P2 (medium)**: Document in implementation plan as tech debt. May defer to
+  post-launch with tracking issue.
+- **P3 (minor)**: Log for future improvement. No action required before implementation.
+Findings are reported in the validation output file with severity, affected artifact,
+and recommended resolution. P0/P1 findings block the implementation-plan step from
+proceeding without acknowledgment.
 ## Methodology Scaling
-- **deep**: Exhaustive analysis with all sub-checks.
+- **deep**: Exhaustive analysis with all sub-checks. Multi-model validation
+  dispatched to Codex and Gemini if available, with graceful fallback to
+  Claude-only enhanced validation.
 - **mvp**: High-level scan for blocking issues only.
-- **custom:depth(1-5)**: Scale thoroughness with depth.
+- **custom:depth(1-5)**: Depth 1-3: scale thoroughness with depth. Depth 4:
+  full analysis + one external model (if CLI available). Depth 5: full
+  analysis + multi-model with reconciliation.
 ## Mode Detection
-Not applicable — validation always runs fresh against current artifacts.
+Not applicable — validation always runs fresh against current artifacts. If
+multi-model artifacts exist under docs/validation/cross-phase-consistency/,
+they are regenerated each run.
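The new `outputs` entries follow one path pattern across the validation steps: a findings report, plus a per-step directory holding a summary and one raw JSON file per external model at depth 4+. The helper below is a sketch of that pattern only; `expected_outputs` and its `models` parameter are invented for illustration.

```python
def expected_outputs(step: str, depth: int, models=("codex", "gemini")) -> list:
    """List the artifact paths a validation step would write at a given depth.

    `models` is the set of external models that actually ran; the caller
    decides this (for example a single model at depth 4).
    """
    base = f"docs/validation/{step}"
    outputs = [f"{base}.md"]  # findings report, written at every depth
    if depth >= 4:
        # Multi-model artifacts only exist at depth 4 and above.
        outputs.append(f"{base}/review-summary.md")
        outputs += [f"{base}/{model}-review.json" for model in models]
    return outputs
```

For example, `expected_outputs("cross-phase-consistency", 5)` yields the same four paths listed in this file's updated front matter.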

package/pipeline/validation/decision-completeness.md CHANGED

@@ -2,9 +2,9 @@
 name: decision-completeness
 description: Verify all decisions are recorded, justified, non-contradictory
 phase: "validation"
-order: 29
-dependencies: [review-tasks, review-security]
-outputs: [docs/validation/decision-completeness.md]
+order: 1330
+dependencies: [implementation-plan-review, review-security]
+outputs: [docs/validation/decision-completeness.md, docs/validation/decision-completeness/review-summary.md, docs/validation/decision-completeness/codex-review.json, docs/validation/decision-completeness/gemini-review.json]
 conditional: null
 knowledge-base: [decision-completeness]
 ---
@@ -15,22 +15,49 @@ significant architectural and technology decision has a corresponding ADR,
 that no two ADRs contradict each other, and that all decisions have clear
 rationale.
+At depth 4+, dispatches to external AI models (Codex, Gemini) for
+independent decision audit — different models surface different implicit
+decisions.
 ## Inputs
-- All phase output artifacts (docs/prd.md, docs/domain-models/, docs/adrs/,
+- All phase output artifacts (docs/plan.md, docs/domain-models/, docs/adrs/,
   docs/system-architecture.md, etc.)
 ## Expected Outputs
 - docs/validation/decision-completeness.md — findings report
+- docs/validation/decision-completeness/review-summary.md (depth 4+) — multi-model validation synthesis
+- docs/validation/decision-completeness/codex-review.json (depth 4+, if available) — raw Codex findings
+- docs/validation/decision-completeness/gemini-review.json (depth 4+, if available) — raw Gemini findings
 ## Quality Criteria
 - Analysis is comprehensive (not superficial)
 - Findings are actionable (specific file, section, and issue)
 - Severity categorization (P0-P3)
+- (depth 4+) Multi-model findings synthesized with consensus/disagreement analysis
+## Finding Disposition
+- **P0 (blocking)**: Must be resolved before proceeding to implementation. Create
+  fix tasks and re-run affected upstream steps.
+- **P1 (critical)**: Should be resolved; proceeding requires explicit risk acceptance
+  documented in an ADR. Flag to project lead.
+- **P2 (medium)**: Document in implementation plan as tech debt. May defer to
+  post-launch with tracking issue.
+- **P3 (minor)**: Log for future improvement. No action required before implementation.
+Findings are reported in the validation output file with severity, affected artifact,
+and recommended resolution. P0/P1 findings block the implementation-plan step from
+proceeding without acknowledgment.
 ## Methodology Scaling
-- **deep**: Exhaustive analysis with all sub-checks.
+- **deep**: Exhaustive analysis with all sub-checks. Multi-model validation
+  dispatched to Codex and Gemini if available, with graceful fallback to
+  Claude-only enhanced validation.
 - **mvp**: High-level scan for blocking issues only.
-- **custom:depth(1-5)**: Scale thoroughness with depth.
+- **custom:depth(1-5)**: Depth 1-3: scale thoroughness with depth. Depth 4:
+  full analysis + one external model (if CLI available). Depth 5: full
+  analysis + multi-model with reconciliation.
 ## Mode Detection
-Not applicable — validation always runs fresh against current artifacts.
+Not applicable — validation always runs fresh against current artifacts. If
+multi-model artifacts exist under docs/validation/decision-completeness/,
+they are regenerated each run.

package/pipeline/validation/dependency-graph-validation.md CHANGED

@@ -2,9 +2,9 @@
 name: dependency-graph-validation
 description: Verify task dependency graphs are acyclic, complete, correctly ordered
 phase: "validation"
-order: 32
-dependencies: [review-tasks, review-security]
-outputs: [docs/validation/dependency-graph-validation.md]
+order: 1360
+dependencies: [implementation-plan-review, review-security]
+outputs: [docs/validation/dependency-graph-validation.md, docs/validation/dependency-graph-validation/review-summary.md, docs/validation/dependency-graph-validation/codex-review.json, docs/validation/dependency-graph-validation/gemini-review.json]
 conditional: null
 knowledge-base: [dependency-validation]
 ---
@@ -15,22 +15,49 @@ Validate that the implementation task dependency graph forms a valid DAG,
 that all dependencies are satisfied before dependent tasks, and that no
 critical tasks are missing from the graph.
+At depth 4+, dispatches to external AI models (Codex, Gemini) for
+independent graph validation — different models catch different ordering
+and completeness issues.
 ## Inputs
-- All phase output artifacts (docs/prd.md, docs/domain-models/, docs/adrs/,
+- All phase output artifacts (docs/plan.md, docs/domain-models/, docs/adrs/,
   docs/system-architecture.md, etc.)
 ## Expected Outputs
 - docs/validation/dependency-graph-validation.md — findings report
+- docs/validation/dependency-graph-validation/review-summary.md (depth 4+) — multi-model validation synthesis
+- docs/validation/dependency-graph-validation/codex-review.json (depth 4+, if available) — raw Codex findings
+- docs/validation/dependency-graph-validation/gemini-review.json (depth 4+, if available) — raw Gemini findings
 ## Quality Criteria
 - Analysis is comprehensive (not superficial)
 - Findings are actionable (specific file, section, and issue)
 - Severity categorization (P0-P3)
+- (depth 4+) Multi-model findings synthesized with consensus/disagreement analysis
+## Finding Disposition
+- **P0 (blocking)**: Must be resolved before proceeding to implementation. Create
+  fix tasks and re-run affected upstream steps.
+- **P1 (critical)**: Should be resolved; proceeding requires explicit risk acceptance
+  documented in an ADR. Flag to project lead.
+- **P2 (medium)**: Document in implementation plan as tech debt. May defer to
+  post-launch with tracking issue.
+- **P3 (minor)**: Log for future improvement. No action required before implementation.
+Findings are reported in the validation output file with severity, affected artifact,
+and recommended resolution. P0/P1 findings block the implementation-plan step from
+proceeding without acknowledgment.
 ## Methodology Scaling
-- **deep**: Exhaustive analysis with all sub-checks.
+- **deep**: Exhaustive analysis with all sub-checks. Multi-model validation
+  dispatched to Codex and Gemini if available, with graceful fallback to
+  Claude-only enhanced validation.
 - **mvp**: High-level scan for blocking issues only.
-- **custom:depth(1-5)**: Scale thoroughness with depth.
+- **custom:depth(1-5)**: Depth 1-3: scale thoroughness with depth. Depth 4:
+  full analysis + one external model (if CLI available). Depth 5: full
+  analysis + multi-model with reconciliation.
 ## Mode Detection
-Not applicable — validation always runs fresh against current artifacts.
+Not applicable — validation always runs fresh against current artifacts. If
+multi-model artifacts exist under docs/validation/dependency-graph-validation/,
+they are regenerated each run.
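This step's purpose is to confirm that the task dependency graph is a valid DAG with no missing tasks. One way to check both properties is Python's standard `graphlib` module, sketched below. The input shape (task name mapped to its prerequisite set) and the `validate_task_graph` name are assumptions; the package does not prescribe an algorithm.

```python
from graphlib import CycleError, TopologicalSorter


def validate_task_graph(deps: dict) -> dict:
    """Check a task graph given as {task: set of prerequisite tasks}.

    Returns the prerequisites that are referenced but never defined, a valid
    execution order (prerequisites first) if the graph is acyclic, and the
    detected cycle otherwise.
    """
    referenced = {d for prereqs in deps.values() for d in prereqs}
    missing = sorted(referenced - set(deps))
    try:
        order = list(TopologicalSorter(deps).static_order())
        cycle = None
    except CycleError as exc:
        # graphlib reports the cycle as the second element of exc.args.
        order, cycle = None, exc.args[1]
    return {"missing": missing, "order": order, "cycle": cycle}
```

A non-empty `missing` list or a non-`None` `cycle` would each be a candidate P0 finding under the disposition rules above, though that mapping is a judgment call rather than something the diff states.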

package/pipeline/validation/implementability-dry-run.md CHANGED

@@ -2,9 +2,9 @@
 name: implementability-dry-run
 description: Dry-run specs as implementing agent, catching ambiguity
 phase: "validation"
-order: 31
-dependencies: [review-tasks, review-security]
-outputs: [docs/validation/implementability-dry-run.md]
+order: 1350
+dependencies: [implementation-plan-review, review-security]
+outputs: [docs/validation/implementability-dry-run.md, docs/validation/implementability-dry-run/review-summary.md, docs/validation/implementability-dry-run/codex-review.json, docs/validation/implementability-dry-run/gemini-review.json]
 conditional: null
 knowledge-base: [implementability-review]
 ---
@@ -15,22 +15,49 @@ AI agent would experience when picking up each implementation task: are the
 inputs clear, are the acceptance criteria testable, are there ambiguities
 that would force the agent to guess?
+At depth 4+, dispatches to external AI models (Codex, Gemini) for
+independent dry-runs — different models encounter different ambiguities
+when simulating implementation.
 ## Inputs
-- All phase output artifacts (docs/prd.md, docs/domain-models/, docs/adrs/,
+- All phase output artifacts (docs/plan.md, docs/domain-models/, docs/adrs/,
   docs/system-architecture.md, etc.)
 ## Expected Outputs
 - docs/validation/implementability-dry-run.md — findings report
+- docs/validation/implementability-dry-run/review-summary.md (depth 4+) — multi-model validation synthesis
+- docs/validation/implementability-dry-run/codex-review.json (depth 4+, if available) — raw Codex findings
+- docs/validation/implementability-dry-run/gemini-review.json (depth 4+, if available) — raw Gemini findings
 ## Quality Criteria
 - Analysis is comprehensive (not superficial)
 - Findings are actionable (specific file, section, and issue)
 - Severity categorization (P0-P3)
+- (depth 4+) Multi-model findings synthesized with consensus/disagreement analysis
+## Finding Disposition
+- **P0 (blocking)**: Must be resolved before proceeding to implementation. Create
+  fix tasks and re-run affected upstream steps.
+- **P1 (critical)**: Should be resolved; proceeding requires explicit risk acceptance
+  documented in an ADR. Flag to project lead.
+- **P2 (medium)**: Document in implementation plan as tech debt. May defer to
+  post-launch with tracking issue.
+- **P3 (minor)**: Log for future improvement. No action required before implementation.
+Findings are reported in the validation output file with severity, affected artifact,
+and recommended resolution. P0/P1 findings block the implementation-plan step from
+proceeding without acknowledgment.
 ## Methodology Scaling
-- **deep**: Exhaustive analysis with all sub-checks.
+- **deep**: Exhaustive analysis with all sub-checks. Multi-model validation
+  dispatched to Codex and Gemini if available, with graceful fallback to
+  Claude-only enhanced validation.
 - **mvp**: High-level scan for blocking issues only.
-- **custom:depth(1-5)**: Scale thoroughness with depth.
+- **custom:depth(1-5)**: Depth 1-3: scale thoroughness with depth. Depth 4:
+  full analysis + one external model (if CLI available). Depth 5: full
+  analysis + multi-model with reconciliation.
 ## Mode Detection
-Not applicable — validation always runs fresh against current artifacts.
+Not applicable — validation always runs fresh against current artifacts. If
+multi-model artifacts exist under docs/validation/implementability-dry-run/,
+they are regenerated each run.
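The quality criterion "multi-model findings synthesized with consensus/disagreement analysis" could be approached as a grouping problem. The sketch below assumes each model's findings are dictionaries with `artifact` and `issue` keys; that schema is hypothetical, because the diff does not show the structure of `codex-review.json` or `gemini-review.json`, and exact string matching on the issue text is a deliberate simplification.

```python
from collections import defaultdict


def synthesize(findings_by_model: dict) -> dict:
    """Split findings into consensus (all models) and disagreement (some)."""
    seen = defaultdict(set)
    for model, findings in findings_by_model.items():
        for finding in findings:
            # Two findings count as "the same" only on an exact match.
            seen[(finding["artifact"], finding["issue"])].add(model)

    all_models = set(findings_by_model)
    result = {"consensus": [], "disagreement": []}
    for (artifact, issue), models in sorted(seen.items()):
        bucket = "consensus" if models == all_models else "disagreement"
        result[bucket].append(
            {"artifact": artifact, "issue": issue, "models": sorted(models)}
        )
    return result
```

A real synthesis would likely need fuzzier matching, since independent models rarely phrase the same issue identically.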

package/pipeline/validation/scope-creep-check.md CHANGED

@@ -2,9 +2,9 @@
 name: scope-creep-check
 description: Verify specs stay aligned to PRD boundaries
 phase: "validation"
-order: 33
-dependencies: [review-tasks, review-security]
-outputs: [docs/validation/scope-creep-check.md]
+order: 1370
+dependencies: [implementation-plan-review, review-security]
+outputs: [docs/validation/scope-creep-check.md, docs/validation/scope-creep-check/review-summary.md, docs/validation/scope-creep-check/codex-review.json, docs/validation/scope-creep-check/gemini-review.json]
 conditional: null
 knowledge-base: [scope-management]
 ---
@@ -17,22 +17,49 @@ should not introduce features not in the PRD — UX-level enhancements are
 allowed only via the innovation step with explicit user approval. Flag any
 scope expansion for explicit approval.
+At depth 4+, dispatches to external AI models (Codex, Gemini) for
+independent scope analysis — different models interpret PRD boundaries
+differently, surfacing subtle creep.
 ## Inputs
-- All phase output artifacts (docs/prd.md, docs/domain-models/, docs/adrs/,
+- All phase output artifacts (docs/plan.md, docs/domain-models/, docs/adrs/,
   docs/system-architecture.md, etc.)
 ## Expected Outputs
 - docs/validation/scope-creep-check.md — findings report
+- docs/validation/scope-creep-check/review-summary.md (depth 4+) — multi-model validation synthesis
+- docs/validation/scope-creep-check/codex-review.json (depth 4+, if available) — raw Codex findings
+- docs/validation/scope-creep-check/gemini-review.json (depth 4+, if available) — raw Gemini findings
 ## Quality Criteria
 - Analysis is comprehensive (not superficial)
 - Findings are actionable (specific file, section, and issue)
 - Severity categorization (P0-P3)
+- (depth 4+) Multi-model findings synthesized with consensus/disagreement analysis
+## Finding Disposition
+- **P0 (blocking)**: Must be resolved before proceeding to implementation. Create
+  fix tasks and re-run affected upstream steps.
+- **P1 (critical)**: Should be resolved; proceeding requires explicit risk acceptance
+  documented in an ADR. Flag to project lead.
+- **P2 (medium)**: Document in implementation plan as tech debt. May defer to
+  post-launch with tracking issue.
+- **P3 (minor)**: Log for future improvement. No action required before implementation.
+Findings are reported in the validation output file with severity, affected artifact,
+and recommended resolution. P0/P1 findings block the implementation-plan step from
+proceeding without acknowledgment.
 ## Methodology Scaling
-- **deep**: Exhaustive analysis with all sub-checks.
+- **deep**: Exhaustive analysis with all sub-checks. Multi-model validation
+  dispatched to Codex and Gemini if available, with graceful fallback to
+  Claude-only enhanced validation.
 - **mvp**: High-level scan for blocking issues only.
-- **custom:depth(1-5)**: Scale thoroughness with depth.
+- **custom:depth(1-5)**: Depth 1-3: scale thoroughness with depth. Depth 4:
+  full analysis + one external model (if CLI available). Depth 5: full
+  analysis + multi-model with reconciliation.
 ## Mode Detection
-Not applicable — validation always runs fresh against current artifacts.
+Not applicable — validation always runs fresh against current artifacts. If
+multi-model artifacts exist under docs/validation/scope-creep-check/,
+they are regenerated each run.
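The depth scaling described in this step can be sketched as a planning function. This is an illustration under stated assumptions, not the package's code: the executable names `codex` and `gemini`, the function names, and the returned dictionary keys are all hypothetical, and applying the Claude-only fallback to custom depths 4 and 5 is an assumption drawn from the wording of the **deep** entry.

```python
import shutil

# Assumed executable names for the external model CLIs.
EXTERNAL_CLIS = ("codex", "gemini")


def available_models(which=shutil.which):
    """Return the external CLIs found on PATH, in preference order."""
    return [name for name in EXTERNAL_CLIS if which(name) is not None]


def plan_validation(depth, available):
    """Map a custom depth (1-5) to a dispatch plan."""
    if not 1 <= depth <= 5:
        raise ValueError("depth must be between 1 and 5")
    if depth <= 3:
        # Depths 1-3 only scale thoroughness; no external dispatch.
        return {"external_models": [], "reconcile": False, "claude_only_fallback": False}
    wanted = 1 if depth == 4 else len(EXTERNAL_CLIS)
    chosen = list(available)[:wanted]
    return {
        "external_models": chosen,
        # Reconciliation needs at least two independent external reviews.
        "reconcile": depth == 5 and len(chosen) > 1,
        # With no external CLI installed, fall back to a Claude-only pass.
        "claude_only_fallback": not chosen,
    }
```

A caller might run `plan_validation(5, available_models())` before each validation run; because the multi-model artifacts are regenerated each run, the plan does not need to inspect earlier output.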

package/pipeline/validation/traceability-matrix.md CHANGED

@@ -2,9 +2,9 @@
 name: traceability-matrix
 description: Build traceability from PRD requirements through architecture to implementation tasks
 phase: "validation"
-order: 28
-dependencies: [review-tasks, review-security]
-outputs: [docs/validation/traceability-matrix.md]
+order: 1320
+dependencies: [implementation-plan-review, review-security]
+outputs: [docs/validation/traceability-matrix.md, docs/validation/traceability-matrix/review-summary.md, docs/validation/traceability-matrix/codex-review.json, docs/validation/traceability-matrix/gemini-review.json]
 conditional: null
 knowledge-base: [traceability]
 ---
@@ -15,22 +15,49 @@ to implementation tasks. Verify the full chain: PRD → User Stories → Domain
 Model → Architecture → Tasks, with no orphans in either direction. Every PRD
 requirement must trace to at least one story, every story to at least one task.
+At depth 4+, dispatches to external AI models (Codex, Gemini) for
+independent traceability validation — different models catch different
+coverage gaps.
 ## Inputs
-- All phase output artifacts (docs/prd.md, docs/domain-models/, docs/adrs/,
+- All phase output artifacts (docs/plan.md, docs/domain-models/, docs/adrs/,
   docs/system-architecture.md, etc.)
 ## Expected Outputs
 - docs/validation/traceability-matrix.md — findings report
+- docs/validation/traceability-matrix/review-summary.md (depth 4+) — multi-model validation synthesis
+- docs/validation/traceability-matrix/codex-review.json (depth 4+, if available) — raw Codex findings
+- docs/validation/traceability-matrix/gemini-review.json (depth 4+, if available) — raw Gemini findings
 ## Quality Criteria
 - Analysis is comprehensive (not superficial)
 - Findings are actionable (specific file, section, and issue)
 - Severity categorization (P0-P3)
+- (depth 4+) Multi-model findings synthesized with consensus/disagreement analysis
+## Finding Disposition
+- **P0 (blocking)**: Must be resolved before proceeding to implementation. Create
+  fix tasks and re-run affected upstream steps.
+- **P1 (critical)**: Should be resolved; proceeding requires explicit risk acceptance
+  documented in an ADR. Flag to project lead.
+- **P2 (medium)**: Document in implementation plan as tech debt. May defer to
+  post-launch with tracking issue.
+- **P3 (minor)**: Log for future improvement. No action required before implementation.
+Findings are reported in the validation output file with severity, affected artifact,
+and recommended resolution. P0/P1 findings block the implementation-plan step from
+proceeding without acknowledgment.
 ## Methodology Scaling
-- **deep**: Exhaustive analysis with all sub-checks.
+- **deep**: Exhaustive analysis with all sub-checks. Multi-model validation
+  dispatched to Codex and Gemini if available, with graceful fallback to
+  Claude-only enhanced validation.
 - **mvp**: High-level scan for blocking issues only.
-- **custom:depth(1-5)**: Scale thoroughness with depth.
+- **custom:depth(1-5)**: Depth 1-3: scale thoroughness with depth. Depth 4:
+  full analysis + one external model (if CLI available). Depth 5: full
+  analysis + multi-model with reconciliation.
 ## Mode Detection
-Not applicable — validation always runs fresh against current artifacts.
+Not applicable — validation always runs fresh against current artifacts. If
+multi-model artifacts exist under docs/validation/traceability-matrix/,
+they are regenerated each run.
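The "no orphans in either direction" rule from the traceability-matrix description can be checked one link at a time along the chain. The sketch below assumes trace links are available as `(parent_id, child_id)` pairs; the function name, the result keys, and the `REQ-`/`US-` identifiers are hypothetical and do not come from the package.

```python
def find_orphans(parents, children, links):
    """Report items on either side of one chain link that lack a trace.

    parents, children: iterables of item ids (e.g. requirements and stories).
    links: iterable of (parent_id, child_id) pairs.
    """
    parent_ids, child_ids = set(parents), set(children)
    # Ignore links that point at ids missing from either side.
    valid = {(p, c) for p, c in links if p in parent_ids and c in child_ids}
    return {
        # Forward orphans: a requirement with no story, a story with no task.
        "untraced_forward": sorted(parent_ids - {p for p, _ in valid}),
        # Backward orphans: a story with no requirement, a task with no story.
        "untraced_backward": sorted(child_ids - {c for _, c in valid}),
    }


requirements = ["REQ-1", "REQ-2"]
stories = ["US-1", "US-2"]
req_to_story = [("REQ-1", "US-1")]
report = find_orphans(requirements, stories, req_to_story)
```

Running the same function on story-to-task links covers the next hop; a full check would repeat it for each adjacent pair in PRD → User Stories → Domain Model → Architecture → Tasks.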