npm - @zigrivers/scaffold - Versions diffs - 3.14.0 → 3.16.0 - Mend

@zigrivers/scaffold 3.14.0 → 3.16.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (122)

package/README.md +50 -21
package/content/knowledge/core/automated-review-tooling.md +21 -26
package/content/knowledge/core/multi-model-review-dispatch.md +30 -55
package/content/knowledge/research/research-architecture.md +385 -0
package/content/knowledge/research/research-conventions.md +248 -0
package/content/knowledge/research/research-dev-environment.md +303 -0
package/content/knowledge/research/research-experiment-loop.md +429 -0
package/content/knowledge/research/research-experiment-tracking.md +336 -0
package/content/knowledge/research/research-ml-architecture-search.md +383 -0
package/content/knowledge/research/research-ml-evaluation.md +407 -0
package/content/knowledge/research/research-ml-experiment-tracking.md +466 -0
package/content/knowledge/research/research-ml-training-patterns.md +413 -0
package/content/knowledge/research/research-observability.md +395 -0
package/content/knowledge/research/research-overfitting-prevention.md +306 -0
package/content/knowledge/research/research-project-structure.md +264 -0
package/content/knowledge/research/research-quant-backtesting.md +326 -0
package/content/knowledge/research/research-quant-market-data.md +366 -0
package/content/knowledge/research/research-quant-metrics.md +335 -0
package/content/knowledge/research/research-quant-requirements.md +223 -0
package/content/knowledge/research/research-quant-risk.md +469 -0
package/content/knowledge/research/research-quant-strategy-patterns.md +412 -0
package/content/knowledge/research/research-requirements.md +201 -0
package/content/knowledge/research/research-security.md +374 -0
package/content/knowledge/research/research-sim-compute-management.md +538 -0
package/content/knowledge/research/research-sim-engine-patterns.md +448 -0
package/content/knowledge/research/research-sim-parameter-spaces.md +425 -0
package/content/knowledge/research/research-sim-validation.md +456 -0
package/content/knowledge/research/research-testing.md +334 -0
package/content/methodology/research-ml-research.yml +23 -0
package/content/methodology/research-overlay.yml +65 -0
package/content/methodology/research-quant-finance.yml +29 -0
package/content/methodology/research-simulation.yml +23 -0
package/content/tools/post-implementation-review.md +36 -7
package/content/tools/review-code.md +33 -8
package/content/tools/review-pr.md +79 -95
package/dist/cli/commands/adopt.d.ts.map +1 -1
package/dist/cli/commands/adopt.js +22 -1
package/dist/cli/commands/adopt.js.map +1 -1
package/dist/cli/commands/adopt.serialization.test.js +41 -0
package/dist/cli/commands/adopt.serialization.test.js.map +1 -1
package/dist/cli/commands/init.d.ts +4 -0
package/dist/cli/commands/init.d.ts.map +1 -1
package/dist/cli/commands/init.js +32 -2
package/dist/cli/commands/init.js.map +1 -1
package/dist/cli/init-flag-families.d.ts +6 -1
package/dist/cli/init-flag-families.d.ts.map +1 -1
package/dist/cli/init-flag-families.js +32 -1
package/dist/cli/init-flag-families.js.map +1 -1
package/dist/cli/init-flag-families.test.js +47 -0
package/dist/cli/init-flag-families.test.js.map +1 -1
package/dist/config/schema.d.ts +272 -16
package/dist/config/schema.d.ts.map +1 -1
package/dist/config/schema.js +25 -1
package/dist/config/schema.js.map +1 -1
package/dist/config/schema.test.js +103 -3
package/dist/config/schema.test.js.map +1 -1
package/dist/core/assembly/overlay-loader.d.ts +12 -0
package/dist/core/assembly/overlay-loader.d.ts.map +1 -1
package/dist/core/assembly/overlay-loader.js +30 -0
package/dist/core/assembly/overlay-loader.js.map +1 -1
package/dist/core/assembly/overlay-loader.test.js +66 -1
package/dist/core/assembly/overlay-loader.test.js.map +1 -1
package/dist/core/assembly/overlay-state-resolver.d.ts.map +1 -1
package/dist/core/assembly/overlay-state-resolver.js +48 -19
package/dist/core/assembly/overlay-state-resolver.js.map +1 -1
package/dist/core/assembly/overlay-state-resolver.test.js +80 -0
package/dist/core/assembly/overlay-state-resolver.test.js.map +1 -1
package/dist/e2e/project-type-overlays.test.js +119 -0
package/dist/e2e/project-type-overlays.test.js.map +1 -1
package/dist/project/adopt.d.ts.map +1 -1
package/dist/project/adopt.js +3 -1
package/dist/project/adopt.js.map +1 -1
package/dist/project/detectors/disambiguate.js +1 -1
package/dist/project/detectors/disambiguate.js.map +1 -1
package/dist/project/detectors/index.d.ts.map +1 -1
package/dist/project/detectors/index.js +2 -1
package/dist/project/detectors/index.js.map +1 -1
package/dist/project/detectors/ml.d.ts.map +1 -1
package/dist/project/detectors/ml.js +2 -6
package/dist/project/detectors/ml.js.map +1 -1
package/dist/project/detectors/research.d.ts +4 -0
package/dist/project/detectors/research.d.ts.map +1 -0
package/dist/project/detectors/research.js +141 -0
package/dist/project/detectors/research.js.map +1 -0
package/dist/project/detectors/research.test.d.ts +2 -0
package/dist/project/detectors/research.test.d.ts.map +1 -0
package/dist/project/detectors/research.test.js +235 -0
package/dist/project/detectors/research.test.js.map +1 -0
package/dist/project/detectors/shared-signals.d.ts +3 -0
package/dist/project/detectors/shared-signals.d.ts.map +1 -0
package/dist/project/detectors/shared-signals.js +9 -0
package/dist/project/detectors/shared-signals.js.map +1 -0
package/dist/project/detectors/types.d.ts +6 -2
package/dist/project/detectors/types.d.ts.map +1 -1
package/dist/project/detectors/types.js.map +1 -1
package/dist/types/config.d.ts +7 -1
package/dist/types/config.d.ts.map +1 -1
package/dist/wizard/copy/core.d.ts.map +1 -1
package/dist/wizard/copy/core.js +4 -0
package/dist/wizard/copy/core.js.map +1 -1
package/dist/wizard/copy/index.d.ts.map +1 -1
package/dist/wizard/copy/index.js +2 -0
package/dist/wizard/copy/index.js.map +1 -1
package/dist/wizard/copy/research.d.ts +3 -0
package/dist/wizard/copy/research.d.ts.map +1 -0
package/dist/wizard/copy/research.js +27 -0
package/dist/wizard/copy/research.js.map +1 -0
package/dist/wizard/copy/types.d.ts +5 -1
package/dist/wizard/copy/types.d.ts.map +1 -1
package/dist/wizard/flags.d.ts +7 -1
package/dist/wizard/flags.d.ts.map +1 -1
package/dist/wizard/questions.d.ts +4 -2
package/dist/wizard/questions.d.ts.map +1 -1
package/dist/wizard/questions.js +27 -1
package/dist/wizard/questions.js.map +1 -1
package/dist/wizard/questions.test.js +51 -0
package/dist/wizard/questions.test.js.map +1 -1
package/dist/wizard/wizard.d.ts +3 -2
package/dist/wizard/wizard.d.ts.map +1 -1
package/dist/wizard/wizard.js +3 -1
package/dist/wizard/wizard.js.map +1 -1
package/package.json +1 -1

package/README.md CHANGED

@@ -29,7 +29,7 @@ Either way, Scaffold constructs the prompt and the target AI tool does the work.
 **Assembly engine** — At execution time, Scaffold builds a 7-section prompt from: system metadata, the meta-prompt, knowledge base entries, project context (artifacts from prior steps), methodology settings, layered instructions, and depth-specific execution guidance.
-**Knowledge base** — 194 domain expertise entries in `content/knowledge/` organized in sixteen categories (core, product, review, validation, finalization, execution, tools, game, web-app, backend, cli, library, mobile-app, data-pipeline, ml, browser-extension) covering testing strategy, domain modeling, API design, security best practices, eval craft, TDD execution, task claiming, worktree management, release management, rendering strategies, data stores, CLI patterns, game engines, library bundling, mobile deployment, batch and streaming pipelines, model training and serving, browser extension manifests and service workers, and more. These get injected into prompts based on each step's `knowledge-base` frontmatter field. Knowledge files with a `## Deep Guidance` section are optimized for CLI assembly — only the deep guidance content is loaded, avoiding redundancy with the prompt text. Teams can add project-local overrides in `.scaffold/knowledge/` that layer on top of the global entries.
+**Knowledge base** — 222 domain expertise entries in `content/knowledge/` organized in seventeen categories (core, product, review, validation, finalization, execution, tools, game, web-app, backend, cli, library, mobile-app, data-pipeline, ml, browser-extension, research) covering testing strategy, domain modeling, API design, security best practices, eval craft, TDD execution, task claiming, worktree management, release management, rendering strategies, data stores, CLI patterns, game engines, library bundling, mobile deployment, batch and streaming pipelines, model training and serving, browser extension manifests and service workers, and more. These get injected into prompts based on each step's `knowledge-base` frontmatter field. Knowledge files with a `## Deep Guidance` section are optimized for CLI assembly — only the deep guidance content is loaded, avoiding redundancy with the prompt text. Teams can add project-local overrides in `.scaffold/knowledge/` that layer on top of the global entries.
 **Methodology presets** — Three built-in presets control which steps run and how deep the analysis goes:
 - **deep** (depth 5) — all steps enabled, exhaustive analysis
@@ -368,7 +368,7 @@ Every `scaffold init` wizard question can be answered via CLI flags, making scaf
 | `--depth` | 1-5 | Custom methodology depth (requires `--methodology custom`) |
 | `--adapters` | comma-sep | AI adapters: claude-code, codex, gemini |
 | `--traits` | comma-sep | Project traits: web, mobile |
-| `--project-type` | string | web-app, mobile-app, backend, cli, library, game, data-pipeline, ml, browser-extension |
+| `--project-type` | string | web-app, mobile-app, backend, cli, library, game, data-pipeline, ml, browser-extension, research |
 | `--auto` | boolean | Non-interactive mode (uses Zod defaults for unset flags) |
 #### Web-App Config Flags (require `--project-type web-app` or auto-set it)
@@ -445,6 +445,15 @@ Every `scaffold init` wizard question can be answered via CLI flags, making scaf
 | `--ext-content-script` | boolean | `--ext-content-script` / `--no-ext-content-script` |
 | `--ext-background-worker` | boolean | `--ext-background-worker` / `--no-ext-background-worker` |
+#### Research Config Flags (require `--project-type research` or auto-set it)
+| Flag | Type | Values |
+|------|------|--------|
+| `--research-driver` | string | code-driven, config-driven, api-driven, notebook-driven |
+| `--research-interaction` | string | autonomous, checkpoint-gated, human-guided |
+| `--research-domain` | string | none, quant-finance, ml-research, simulation |
+| `--research-tracking` | boolean | `--research-tracking` / `--no-research-tracking` |
 #### Game Config Flags (require `--project-type game` or auto-set it)
 | Flag | Type | Values |
@@ -467,9 +476,9 @@ Every `scaffold init` wizard question can be answered via CLI flags, making scaf
 - **Flag > auto > interactive**: Flags always take highest precedence. `--auto --engine unreal` uses defaults for everything except engine.
 - **Partial flags + interactive**: Provide some flags and the wizard asks only the remaining questions. `scaffold init --project-type game --engine unreal` prompts interactively for multiplayer, platforms, etc.
-- **Type-specific flags auto-set project type**: `--engine unity` automatically sets `--project-type game`, `--web-rendering ssr` sets `--project-type web-app`, `--backend-api-style rest` sets `--project-type backend`, `--cli-interactivity hybrid` sets `--project-type cli`, `--lib-visibility public` sets `--project-type library`, `--mobile-platform ios` sets `--project-type mobile-app`, `--pipeline-processing batch` sets `--project-type data-pipeline`, `--ml-phase training` sets `--project-type ml`, `--ext-manifest 3` sets `--project-type browser-extension`. Error if conflicting type.
-- **Cannot mix flag families**: `--web-rendering ssr --backend-api-style rest` is an error. Each flag family (`--web-*`, `--backend-*`, `--cli-*`, `--lib-*`, `--mobile-*`, `--pipeline-*`, `--ml-*`, `--ext-*`, game) is exclusive.
-- **Validation**: `--depth` requires `--methodology custom`. `--online-services` requires `--multiplayer online` or `hybrid`. SSR/hybrid rendering is incompatible with static deploy target. Session auth requires server state (not static). ML inference projects must specify a serving pattern. Browser extensions must declare at least one capability (UI surface, content script, or background worker).
+- **Type-specific flags auto-set project type**: `--engine unity` automatically sets `--project-type game`, `--web-rendering ssr` sets `--project-type web-app`, `--backend-api-style rest` sets `--project-type backend`, `--cli-interactivity hybrid` sets `--project-type cli`, `--lib-visibility public` sets `--project-type library`, `--mobile-platform ios` sets `--project-type mobile-app`, `--pipeline-processing batch` sets `--project-type data-pipeline`, `--ml-phase training` sets `--project-type ml`, `--ext-manifest 3` sets `--project-type browser-extension`, `--research-driver code-driven` sets `--project-type research`. Error if conflicting type.
+- **Cannot mix flag families**: `--web-rendering ssr --backend-api-style rest` is an error. Each flag family (`--web-*`, `--backend-*`, `--cli-*`, `--lib-*`, `--mobile-*`, `--pipeline-*`, `--ml-*`, `--research-*`, `--ext-*`, game) is exclusive.
+- **Validation**: `--depth` requires `--methodology custom`. `--online-services` requires `--multiplayer online` or `hybrid`. SSR/hybrid rendering is incompatible with static deploy target. Session auth requires server state (not static). ML inference projects must specify a serving pattern. Browser extensions must declare at least one capability (UI surface, content script, or background worker). Notebook-driven research cannot be fully autonomous.
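This hunk adds three research rules: `--research-*` flags auto-set the project type, the `--research-*` family cannot be mixed with other families, and notebook-driven research cannot be autonomous. The TypeScript sketch below illustrates those rules under stated assumptions. It is not the package's code in `dist/cli/init-flag-families.js`. Flags are modeled as a hypothetical `Flags` record keyed by flag name without the leading `--`. The game family is left out because it has no shared prefix.

```typescript
// Hypothetical flag model: keys are flag names without "--", e.g. "research-driver".
type Flags = Record<string, string | boolean>;

// Prefix-to-project-type table taken from the README's auto-set rules.
// The game family (--engine, --multiplayer, ...) has no common prefix and is omitted.
const FAMILY_PREFIXES: ReadonlyArray<readonly [string, string]> = [
  ["web-", "web-app"],
  ["backend-", "backend"],
  ["cli-", "cli"],
  ["lib-", "library"],
  ["mobile-", "mobile-app"],
  ["pipeline-", "data-pipeline"],
  ["ml-", "ml"],
  ["research-", "research"],
  ["ext-", "browser-extension"],
];

// Returns the effective project type; throws on mixed families or a conflicting --project-type.
function resolveProjectType(flags: Flags): string | undefined {
  const families = new Set<string>();
  for (const name of Object.keys(flags)) {
    const match = FAMILY_PREFIXES.find(([prefix]) => name.startsWith(prefix));
    if (match) families.add(match[1]);
  }
  if (families.size > 1) {
    throw new Error(`Cannot mix flag families: ${Array.from(families).join(", ")}`);
  }
  const implied: string | undefined = Array.from(families)[0];
  const explicitFlag = flags["project-type"];
  const explicit = typeof explicitFlag === "string" ? explicitFlag : undefined;
  if (explicit !== undefined && implied !== undefined && explicit !== implied) {
    throw new Error(`--project-type ${explicit} conflicts with implied type ${implied}`);
  }
  return explicit ?? implied;
}

// Research-specific rule from the README: notebook-driven research cannot be fully autonomous.
function validateResearch(flags: Flags): void {
  if (
    flags["research-driver"] === "notebook-driven" &&
    flags["research-interaction"] === "autonomous"
  ) {
    throw new Error("Notebook-driven research cannot be fully autonomous");
  }
}
```

The prefixes are stored as a data table. Adding a family, as 3.16.0 does for `research-`, is therefore a one-line change in this sketch.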
 #### CI Examples
@@ -550,6 +559,16 @@ scaffold init --auto --methodology mvp --project-type browser-extension \
   --ext-manifest 3 --ext-ui-surfaces devtools \
   --no-ext-content-script
+# Autonomous quant-finance research (trading strategy optimization)
+scaffold init --auto --methodology deep --project-type research \
+  --research-driver code-driven --research-interaction autonomous \
+  --research-domain quant-finance
+# Checkpoint-gated ML architecture search
+scaffold init --auto --methodology deep --project-type research \
+  --research-driver config-driven --research-interaction checkpoint-gated \
+  --research-domain ml-research
 # Multiplayer mobile game with Unity
 scaffold init --project-type game --methodology deep --auto \
   --engine unity --multiplayer online --target-platforms ios,android \
@@ -576,7 +595,7 @@ Scaffold supports **project-type overlays** — domain-specific knowledge and pi
 - **Injects domain knowledge** into existing pipeline steps (e.g., SSR caching strategies into `tech-stack`, API pagination patterns into `coding-standards`)
-The game overlay additionally adjusts step enablement, remaps artifact references, and adds dependency overrides (because game development has fundamentally different artifacts). The web-app, backend, CLI, library, mobile-app, data-pipeline, ML, and browser-extension overlays are **knowledge-only** — they inject domain expertise into existing steps without changing which steps run or how they depend on each other.
+The game overlay additionally adjusts step enablement, remaps artifact references, and adds dependency overrides (because game development has fundamentally different artifacts). The web-app, backend, CLI, library, mobile-app, data-pipeline, ML, browser-extension, and research overlays are **knowledge-only** — they inject domain expertise into existing steps without changing which steps run or how they depend on each other. The research type additionally supports **domain sub-overlays** (quant-finance, ml-research, simulation) that layer domain-specific knowledge on top of the core research overlay.
 Overlays are composable with methodology presets. An MVP web-app gets fewer steps at lower depth; a deep backend project gets exhaustive analysis of every architectural decision.
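The domain sub-overlays described above correspond to methodology files added in this release: `research-overlay.yml`, `research-quant-finance.yml`, `research-ml-research.yml`, and `research-simulation.yml`. The sketch below shows one way a loader might select and layer them. It assumes a hypothetical `OverlayFile` shape that carries only a list of knowledge entry names. The real `overlay-loader.js` format does not appear in this diff.

```typescript
type ResearchDomain = "none" | "quant-finance" | "ml-research" | "simulation";

// Hypothetical, simplified overlay shape: only the knowledge entries it injects.
interface OverlayFile {
  name: string;
  knowledge: string[];
}

// Core research overlay first, then the domain sub-overlay (if any).
// File names follow the methodology files listed in this release.
function overlayFilesFor(domain: ResearchDomain): string[] {
  const files = ["research-overlay.yml"];
  if (domain !== "none") files.push(`research-${domain}.yml`);
  return files;
}

// Layer knowledge entries in load order, skipping duplicates so earlier overlays keep their position.
function layerKnowledge(overlays: readonly OverlayFile[]): string[] {
  const seen = new Set<string>();
  const merged: string[] = [];
  for (const overlay of overlays) {
    for (const entry of overlay.knowledge) {
      if (!seen.has(entry)) {
        seen.add(entry);
        merged.push(entry);
      }
    }
  }
  return merged;
}
```

With `--research-domain quant-finance`, this sketch loads the core overlay first and then `research-quant-finance.yml`. Entries such as `research-quant-backtesting` are added on top of core entries like `research-experiment-loop`.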
@@ -590,6 +609,7 @@ Overlays are composable with methodology presets. An MVP web-app gets fewer step
 | `data-pipeline` | `data-pipeline-overlay.yml` | 12 entries (architecture, batch and streaming patterns, orchestration, schema management, quality, testing, security) | Processing model, orchestration, data quality strategy, schema management, data catalog |
 | `ml` | `ml-overlay.yml` | 12 entries (architecture, training and serving patterns, experiment tracking, model evaluation, observability, testing, security) | Project phase, model type, serving pattern, experiment tracking |
 | `browser-extension` | `browser-extension-overlay.yml` | 12 entries (architecture, manifest configuration, service workers, content scripts, cross-browser, store submission, testing, security) | Manifest version, UI surfaces, content script, background worker |
+| `research` | `research-overlay.yml` + domain sub-overlays | 25 entries (experiment loop, tracking, overfitting prevention, backtesting, risk metrics, architecture search, simulation) | Experiment driver, interaction mode, domain, experiment tracking |
 | `game` | `game-overlay.yml` | 24 entries (engines, networking, audio, VR/AR, economy, save systems, certification) | Engine, multiplayer, platforms, economy, narrative, and 6 more |
 ### Game Development
@@ -675,7 +695,7 @@ These answers control which conditional steps activate. A single-player puzzle g
 #### Multi-type Detection
-`scaffold adopt` detects 9 project types from manifest files and directory layouts:
+`scaffold adopt` detects 10 project types from manifest files and directory layouts:
 | Type | Key Signals |
 |------|-------------|
@@ -688,6 +708,7 @@ These answers control which conditional steps activate. A single-player puzzle g
 | `data-pipeline` | `dags/` dir, Airflow/Prefect/Dagster deps, Spark configs |
 | `ml` | `training/`/`models/` dirs, PyTorch/TensorFlow deps, MLflow configs |
 | `browser-extension` | `manifest.json` with `manifest_version` field |
+| `research` | `program.md` + `results.tsv`, backtest/strategy files with trading deps, optimization deps + experiment dirs, simulation framework deps |
 Each detector returns a confidence tier (high/medium/low) with evidence trails. Override detection with `--project-type <type>`.
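The `research` signals in the detection table can be expressed as a small scoring function. The sketch below is hypothetical in three ways. The dependency names are real Python packages chosen as illustrations, not the lists in `dist/project/detectors/research.js`. The `experiments/` directory name is assumed. The tier assigned to each signal is a guess at how high, medium, and low might be distributed.

```typescript
type Tier = "high" | "medium" | "low";

interface Detection {
  type: "research";
  tier: Tier;
  evidence: string[];
}

// Illustrative dependency names only; NOT the lists scaffold's detector uses.
const TRADING_DEPS = ["backtrader", "vectorbt", "zipline"];
const OPTIMIZATION_DEPS = ["optuna", "hyperopt"];
const SIMULATION_DEPS = ["simpy", "mesa"];

// files: project-relative paths; deps: dependency names from a manifest (both assumed inputs).
function detectResearch(files: readonly string[], deps: readonly string[]): Detection | null {
  const evidence: string[] = [];
  const tiers: Tier[] = [];
  const depsFrom = (list: readonly string[]) => list.filter((d) => deps.includes(d));

  // Signal: program.md together with results.tsv (assumed to be the strongest).
  if (files.includes("program.md") && files.includes("results.tsv")) {
    tiers.push("high");
    evidence.push("program.md + results.tsv");
  }
  // Signal: backtest/strategy files alongside trading deps.
  const trading = depsFrom(TRADING_DEPS);
  const strategyFiles = files.filter((f) => /backtest|strategy/i.test(f));
  if (trading.length > 0 && strategyFiles.length > 0) {
    tiers.push("medium");
    evidence.push(`trading deps (${trading.join(", ")}) + ${strategyFiles.join(", ")}`);
  }
  // Signal: optimization deps plus an experiments/ directory (directory name assumed).
  const optimization = depsFrom(OPTIMIZATION_DEPS);
  if (optimization.length > 0 && files.some((f) => f.startsWith("experiments/"))) {
    tiers.push("medium");
    evidence.push(`optimization deps (${optimization.join(", ")}) + experiments/`);
  }
  // Signal: simulation framework deps on their own.
  const simulation = depsFrom(SIMULATION_DEPS);
  if (simulation.length > 0) {
    tiers.push("low");
    evidence.push(`simulation deps (${simulation.join(", ")})`);
  }

  if (tiers.length === 0) return null;
  // Report the strongest tier seen, along with every piece of evidence collected.
  const tier = (["high", "medium", "low"] as const).find((t) => tiers.includes(t)) as Tier;
  return { type: "research", tier, evidence };
}
```

The function collects every matched signal into `evidence`, matching the evidence trails the README mentions. The returned tier reflects only the strongest match.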
@@ -926,23 +947,22 @@ You don't need both — Scaffold works with whichever CLIs are available. Having
 #### How mmr Works
 ```
-mmr review --pr 47           ──→  Dispatches to all channels in background
-                                   Returns job ID immediately
-                                   Agent continues working
-mmr status mmr-a1b2c3        ──→  Poll progress (which channels done?)
-                                   Exit code: 0=done, 1=running, 4=failed
+# Recommended: single-command pipeline (--sync)
+mmr review --pr 47 --sync    ──→  Dispatches to all channels
+                                   Runs compensating passes for unavailable channels
+                                   Parses outputs, reconciles findings
+                                   Applies severity gate, derives verdict
+                                   Exit code: 0=pass, 2=blocked, 3=needs-decision
-mmr results mmr-a1b2c3       ──→  Reconcile findings across channels
-                                   Run compensating passes for unavailable channels
-                                   Apply severity gate
-                                   Output unified findings
-                                   Exit code: 0=passed, 2=gate failed, 3=degraded
+# Alternative: step-by-step (for async workflows)
+mmr review --pr 47           ──→  Dispatch and await all channels
+mmr results mmr-a1b2c3       ──→  Reconcile findings, output verdict
 ```
 **Key features:**
-- **Async job model** — reviews run in background processes. The agent fires `mmr review` and continues working. No blocking for 4-6 minutes.
+- **--sync mode** — single-command pipeline: dispatch, parse, reconcile, verdict. The recommended entry point for agents and CI.
+- **Compensating passes** — when a channel is unavailable, a Claude-based review focused on that channel's strength area runs automatically.
 - **Per-channel auth verification** — checks authentication before every dispatch. Auth failures are never silent — `mmr` tells you exactly what expired and the command to fix it.
 - **Immutable core prompt** — every channel gets the same severity definitions (P0-P3), output format spec (JSON), and review criteria. No prompt drift between channels.
 - **Automated reconciliation** — when two channels flag the same location, that's consensus (high confidence). When only one channel flags something, it's unique (medium confidence). P0 from any single source is always high confidence.
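The **Automated reconciliation** bullet, together with the compensating-pass rule in `automated-review-tooling.md` further down, can be sketched as a grouping step. The sketch rests on these assumptions:

- It uses a hypothetical `Finding` shape.
- Findings match on an exact `location` string.
- When channels disagree, the most severe rating wins.
- Only real (non-compensating) channels count toward consensus.
- A P0 counts as high confidence even when it comes from a compensating pass; the text does not settle that case.

```typescript
type Severity = "P0" | "P1" | "P2" | "P3";

// Hypothetical finding shape; mmr's real JSON output format is not shown in this diff.
interface Finding {
  channel: string;       // e.g. "codex", "gemini", "claude"
  location: string;      // e.g. "src/auth.ts:42"
  severity: Severity;
  compensating: boolean; // true if produced by a compensating pass
}

interface Reconciled {
  location: string;
  severity: Severity;
  channels: string[];
  confidence: "high" | "medium";
}

const SEVERITY_ORDER: Severity[] = ["P0", "P1", "P2", "P3"];

function reconcile(findings: readonly Finding[]): Reconciled[] {
  // Group findings that point at the same location.
  const byLocation = new Map<string, Finding[]>();
  for (const f of findings) {
    const group = byLocation.get(f.location) ?? [];
    group.push(f);
    byLocation.set(f.location, group);
  }

  const out: Reconciled[] = [];
  byLocation.forEach((group, location) => {
    // Keep the most severe rating reported for this location.
    const severity = SEVERITY_ORDER.find((s) => group.some((f) => f.severity === s)) as Severity;
    // Only real (non-compensating) channels count toward consensus.
    const realChannels = new Set(group.filter((f) => !f.compensating).map((f) => f.channel));
    const consensus = realChannels.size >= 2;
    out.push({
      location,
      severity,
      channels: Array.from(new Set(group.map((f) => f.channel))),
      confidence: consensus || severity === "P0" ? "high" : "medium",
    });
  });
  return out;
}
```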
@@ -1058,6 +1078,14 @@ You can also adjust per-channel timeouts, the default severity threshold, and na
 **After creating a PR:**
 ```bash
+# Recommended: single-command review
+mmr review --pr 47 --sync --focus "auth flow, session handling"
+# → Full review output with verdict and findings
+# Or with text output for readability:
+mmr review --pr 47 --sync --format text
+# Step-by-step (when you want to continue working while review runs):
 mmr review --pr 47 --focus "auth flow, session handling"
 # → Job mmr-a1b2c3 started. 2/2 channels dispatched.
 ```
@@ -1316,7 +1344,7 @@ scaffold dashboard
 ## Knowledge System
-Scaffold ships with 194 domain expertise entries organized in sixteen categories:
+Scaffold ships with 222 domain expertise entries organized in sixteen categories:
 - **core/** (26 entries) — eval craft, testing strategy, domain modeling, API design, database design, system architecture, ADR craft, security best practices, operations, task decomposition, user stories, UX specification, design system tokens, user story innovation, AI memory management, coding conventions, tech stack selection, project structure patterns, task tracking, CLAUDE.md patterns, multi-model review dispatch, review step template, dev environment, git workflow patterns, automated review tooling, vision craft
 - **product/** (5 entries) — PRD craft, PRD innovation, gap analysis, vision craft, vision innovation
@@ -1334,6 +1362,7 @@ Scaffold ships with 194 domain expertise entries organized in sixteen categories
 - **data-pipeline/** (12 entries) — batch/streaming/hybrid patterns, orchestration (DAG/event-driven/scheduled), data quality, schema management, lineage, pipeline testing
 - **ml/** (12 entries) — training and inference patterns, model types (classical/deep-learning/llm), serving patterns, experiment tracking, model evaluation, MLOps observability
 - **browser-extension/** (12 entries) — Manifest V3, content scripts, service workers, cross-browser compatibility, extension security, store submission
+- **research/** (25 entries) — experiment loop architecture, parameter optimization, overfitting prevention, experiment tracking, security/sandboxing; domain knowledge for quant-finance (backtesting, risk metrics, market data, strategy patterns), ML-research (architecture search, ablation studies, evaluation), and simulation (engine integration, parameter spaces, compute management)
 Each pipeline step declares which knowledge entries it needs in its frontmatter. The assembly engine injects them automatically. Knowledge files with a `## Deep Guidance` section are optimized for the CLI — only the deep guidance content is loaded into the assembled prompt, skipping the summary to avoid redundancy with the prompt text.
@@ -1540,7 +1569,7 @@ All build inputs live under `content/`:
 content/
 ├── pipeline/         # 60 meta-prompts organized by 16 phases (phases 0-15, including build)
 ├── tools/            # 10 tool meta-prompts (stateless, category: tool)
-├── knowledge/        # 194 domain expertise entries (core, product, review, validation, finalization, execution, tools, game, web-app, backend, cli, library, mobile-app, data-pipeline, ml, browser-extension)
+├── knowledge/        # 222 domain expertise entries (core, product, review, validation, finalization, execution, tools, game, web-app, backend, cli, library, mobile-app, data-pipeline, ml, browser-extension)
 ├── methodology/      # 3 YAML presets (deep, mvp, custom)
 └── skills/           # Skill templates with {{markers}} for multi-platform resolution (includes mmr)
 ```

package/content/knowledge/core/automated-review-tooling.md CHANGED

@@ -1,6 +1,6 @@
 ---
 name: automated-review-tooling
-description: Patterns for setting up automated PR code review using AI models (Codex, Gemini) via local CLI, including dual-model review, reconciliation, and CI integration
+description: Patterns for automated PR code review using AI CLI tools (Codex, Gemini, Claude) — orchestration, reconciliation, compensating passes, and CI integration
 topics: [code-review, automation, codex, gemini, pull-requests, ci-cd, review-tooling]
 ---
@@ -24,10 +24,10 @@ These are the authoritative verdict definitions. Tool files (`review-code.md`, `
 | Verdict | Condition |
 |---------|-----------|
-| `pass` | All configured channels ran, no unresolved P0/P1/P2 |
-| `degraded-pass` | Channels skipped, compensated, or have non-full coverage (e.g., partial timeout), no unresolved P0/P1/P2 |
-| `blocked` | Unresolved P0/P1/P2 after 3 fix rounds |
-| `needs-user-decision` | Contradictions or unresolvable findings |
+| `pass` | All channels completed, no unresolved P0/P1/P2 |
+| `degraded-pass` | Some channels unavailable, compensating passes ran, no unresolved P0/P1/P2 |
+| `blocked` | Findings at or above fix threshold remain unresolved |
+| `needs-user-decision` | No channels completed — insufficient data for a determination |
 **Verdict precedence:** `needs-user-decision` > `blocked` > `degraded-pass` > `pass`. When multiple conditions apply, the higher-precedence verdict wins.
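The verdict table and precedence rule can be combined into one function that checks conditions from highest to lowest precedence, so the first match wins. The sketch below uses the new (added) definitions. It also uses a hypothetical `ReviewState` shape. It assumes compensating passes count as channels that produced output, which the table does not state explicitly.

```typescript
type Verdict = "pass" | "degraded-pass" | "blocked" | "needs-user-decision";

// Hypothetical summary of one review run; field names are illustrative.
interface ReviewState {
  channelsWithOutput: number;           // channels that completed, counting compensating passes (assumption)
  channelsUnavailable: number;          // channels that were unavailable and compensated
  unresolvedAtOrAboveThreshold: number; // unresolved findings at or above the fix threshold (e.g. P0/P1/P2)
}

// Checks run in precedence order: needs-user-decision > blocked > degraded-pass > pass.
function deriveVerdict(s: ReviewState): Verdict {
  if (s.channelsWithOutput === 0) return "needs-user-decision";
  if (s.unresolvedAtOrAboveThreshold > 0) return "blocked";
  if (s.channelsUnavailable > 0) return "degraded-pass";
  return "pass";
}
```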
@@ -35,13 +35,13 @@ These are the authoritative verdict definitions. Tool files (`review-code.md`, `
 #### Status Model
-`compensating` is a **coverage label** applied to a channel's output, not a replacement for the root-cause status. Each channel retains its root-cause status (`not_installed`, `auth_failed`, `auth_timeout`, `failed`) AND gains a coverage label (`compensating (X-equivalent)`) when a compensating pass ran. The fix cycle uses the **root-cause status** to decide whether to retry (never retry `not_installed`, `auth_failed`, `auth_timeout`). The report uses the **coverage label** to show the reader what ran.
+`compensating` is a **coverage label** applied to a channel's output, not a replacement for the root-cause status. Each channel retains its root-cause status (`not_installed`, `auth_failed`, `timeout`, `failed`) AND gains a coverage label (`compensating (X-equivalent)`) when a compensating pass ran. The fix cycle uses the **root-cause status** to decide whether to retry (never retry `not_installed`, `auth_failed`, `timeout`). The report uses the **coverage label** to show the reader what ran.
 #### Compensating Passes
-When an external channel (Codex or Gemini) is unavailable, run a compensating Claude self-review pass:
+When a channel (Codex or Gemini) is unavailable, the CLI dispatches a compensating pass via `claude -p`:
-- Same prompt structure as the missing channel, executed as a Claude self-review pass.
+- Same prompt structure as the missing channel, executed as a `claude -p` dispatch.
 - Labeled `[compensating: Codex-equivalent]` or `[compensating: Gemini-equivalent]` in the review summary.
 - Missing Codex → focus on implementation correctness, security, API contracts.
 - Missing Gemini → focus on architectural patterns, design reasoning, broad context.
@@ -49,8 +49,6 @@ When an external channel (Codex or Gemini) is unavailable, run a compensating Cl
 - Compensating-pass findings are **single-source confidence** — they do NOT raise to high confidence even if they agree with another channel's findings.
 - Normal mandatory-fix thresholds apply: P0/P1/P2 findings from compensating passes still require fixing.
-**Superpowers channel:** No compensating pass needed — Superpowers is a Claude subagent and is always available. If the Superpowers plugin is not installed, run available external CLIs and warn the user that review coverage is reduced.
 #### Foreground-Only Execution
 Always run Codex and Gemini CLI commands as foreground Bash calls. Never use `run_in_background`, `&`, or `nohup`. Background execution produces empty or truncated output from Codex and Gemini CLIs. Multiple foreground calls can still run in parallel if the tool runner supports parallel tool invocations.
@@ -67,7 +65,7 @@ Reconciliation normalizes findings from all channels (real and compensating) to
 The reconciliation output is a deduplicated list of findings with confidence scores. High-confidence findings (agreed by 2+ real channels) are actionable without further discussion. Low-confidence findings (single-source, or from compensating passes) still require action at P0/P1/P2 but should be noted as lower-confidence in the review summary.
-Findings that appear in all three channels (Codex, Gemini, Superpowers) are considered maximum-confidence and should be surfaced first in the review summary. Findings that appear in only one channel should include the channel name in the finding description to help the developer assess confidence independently.
+Findings that appear in all three channels (Codex, Gemini, Claude) are considered maximum-confidence and should be surfaced first in the review summary. Findings that appear in only one channel should include the channel name in the finding description to help the developer assess confidence independently.
 ```bash
 # Orchestration reconciliation workflow
@@ -80,16 +78,15 @@ Findings that appear in all three channels (Codex, Gemini, Superpowers) are cons
 ### Channel Dispatch Pattern and Orchestration
-Each external channel (Codex, Gemini) follows the same dispatch pattern: check installation, check auth, then dispatch as a foreground call. If any step fails, record the root-cause status, queue a compensating pass, and continue to the next channel. The Superpowers channel is always available as a Claude subagent and does not require installation or auth checks.
+Each channel (Codex, Gemini, Claude) follows the same dispatch pattern: check installation, check auth, then dispatch as a foreground call. If any step fails, record the root-cause status, queue a compensating pass (for Codex/Gemini), and continue to the next channel.
 ```bash
 # Channel dispatch pattern
-# For each external channel (Codex, Gemini):
+# For each channel (codex, gemini, claude):
 #   1. command -v <tool> >/dev/null 2>&1 || { status=not_installed; queue_compensating; continue; }
 #   2. <auth_check> || { status=auth_failed; queue_compensating; continue; }
 #   3. <dispatch_foreground> || { status=failed; queue_compensating; continue; }
-# For Superpowers: dispatch subagent (always available)
-# After all: run queued compensating passes → reconcile → verdict
+# After all: run queued compensating passes (via claude -p) → reconcile → verdict
 ```
 After all channels and compensating passes complete, run the reconciliation workflow above and apply the verdict decision flow. Channel results and compensating-pass labels must be preserved in the review output for auditability — do not collapse or omit them even when findings are empty.
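The dispatch comment above and the fix-cycle channel rule later in this file can be combined into one hypothetical sketch. Injected probe functions stand in for `command -v`, the auth check, and the foreground CLI call, so nothing here shells out. `runCompensating` stands in for the `claude -p` dispatch. Following the Status Model section, the sketch keeps each channel's root-cause status and records the coverage label separately.

```typescript
type ChannelName = "codex" | "gemini" | "claude";
type RootStatus = "completed" | "not_installed" | "auth_failed" | "timeout" | "failed";

// Hypothetical probes standing in for `command -v <tool>`, the auth check, and the dispatch.
interface ChannelProbe {
  installed(): boolean;
  authenticated(): boolean;
  dispatch(): "completed" | "timeout" | "failed";
}

interface ChannelResult {
  channel: ChannelName;
  status: RootStatus; // root-cause status is kept even when a compensating pass ran
  coverage?: string;  // e.g. "compensating (Codex-equivalent)"
}

const CHANNELS: readonly ChannelName[] = ["codex", "gemini", "claude"];
const LABEL: Record<"codex" | "gemini", string> = { codex: "Codex", gemini: "Gemini" };

function dispatchAll(
  probes: Record<ChannelName, ChannelProbe>,
  runCompensating: (missing: "codex" | "gemini") => boolean, // stands in for a `claude -p` call
): ChannelResult[] {
  const results: ChannelResult[] = [];
  const queued: Array<"codex" | "gemini"> = [];
  for (const channel of CHANNELS) {
    const probe = probes[channel];
    let status: RootStatus;
    if (!probe.installed()) status = "not_installed";
    else if (!probe.authenticated()) status = "auth_failed";
    else status = probe.dispatch();
    results.push({ channel, status });
    // Only Codex and Gemini get compensating passes.
    if (status !== "completed" && channel !== "claude") queued.push(channel);
  }
  // Run queued compensating passes after all channels, then attach the coverage label.
  for (const missing of queued) {
    if (runCompensating(missing)) {
      const result = results.find((r) => r.channel === missing);
      if (result) result.coverage = `compensating (${LABEL[missing]}-equivalent)`;
    }
  }
  return results;
}

// Fix-cycle rule: re-run channels that completed and compensating passes that ran.
// not_installed, auth_failed, timeout and failed channels are never retried directly.
function rerunPlan(results: readonly ChannelResult[]): string[] {
  const plan: string[] = [];
  for (const r of results) {
    if (r.status === "completed") plan.push(r.channel);
    else if (r.coverage) plan.push(`${r.channel}: ${r.coverage}`);
  }
  return plan;
}
```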
@@ -99,14 +96,14 @@ After all channels and compensating passes complete, run the reconciliation work
 When Codex is unavailable (not installed or auth failure), the orchestration proceeds as follows:
 1. The installation check (`command -v codex`) fails. Codex channel status is set to `not_installed`.
-2. A compensating Codex-equivalent pass is queued: a Claude self-review focused on implementation correctness, security, and API contracts.
-3. Gemini and Superpowers channels run normally.
+2. A compensating Codex-equivalent pass is queued: a `claude -p` dispatch focused on implementation correctness, security, and API contracts.
+3. Gemini and Claude channels run normally.
 4. The compensating pass runs, producing findings labeled `[compensating: Codex-equivalent]`.
-5. Reconciliation merges findings from all three sources (Gemini, Superpowers, compensating-Codex).
+5. Reconciliation merges findings from all three sources (Gemini, Claude, compensating-Codex).
 6. Maximum achievable verdict is `degraded-pass` because a real channel was absent.
 7. The review summary notes: "Codex channel: not_installed (compensating: Codex-equivalent pass ran)."
-**Fix-cycle channel rule:** Only re-run channels that originally completed or ran as compensating passes. `failed` channels are covered by their compensating pass and are not retried during fix rounds. Never retry a channel with status `not_installed`, `auth_failed`, or `auth_timeout` — these indicate persistent environment conditions that will not resolve between fix rounds.
+**Fix-cycle channel rule:** Only re-run channels that originally completed or ran as compensating passes. `failed` channels are covered by their compensating pass and are not retried during fix rounds. Never retry a channel with status `not_installed`, `auth_failed`, or `timeout` — these indicate persistent environment conditions that will not resolve between fix rounds.
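The fix-cycle rule amounts to a status filter. The sketch makes three assumptions:

- The `channel=status` argument format is hypothetical.
- The `<channel>-compensating` output name is hypothetical.
- Per the text above, only Codex and Gemini have compensating passes.

```bash
# Decide what to re-run in a fix round, given "channel=status" pairs (hypothetical format).
# Prints the channel name for completed channels and "<channel>-compensating"
# for Codex/Gemini channels whose compensating pass stood in for them.
fix_round_targets() {
  local pair ch st
  for pair in "$@"; do
    ch="${pair%%=*}"
    st="${pair#*=}"
    case "$st" in
      completed)
        echo "$ch" ;;
      failed|not_installed|auth_failed|timeout)
        # Never retry the real channel; re-run its compensating pass instead
        case "$ch" in codex|gemini) echo "${ch}-compensating" ;; esac ;;
    esac
  done
}
```

For example, `fix_round_targets codex=not_installed gemini=completed claude=completed` selects the Codex-equivalent pass plus the two real channels.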
 ### Verdict Decision Flow
@@ -114,19 +111,17 @@ Apply the following evaluation order to determine the final verdict. The first m
 ```
 Verdict evaluation order:
-1. Any contradictions or unresolvable findings? → needs-user-decision
+1. No channels completed? → needs-user-decision
 2. Any unresolved P0/P1/P2 after 3 fix rounds? → blocked
 3. Any channel not at full coverage? → degraded-pass
 4. All channels completed, no unresolved P0/P1/P2? → pass
 ```
-A "contradiction" exists when two channels report opposite conclusions about the same code location — for example, Codex flags a function as insecure while Gemini explicitly approves it. Contradictions cannot be resolved by the agent alone and must be surfaced to the user.
-A channel is "not at full coverage" when: it ran as a compensating pass instead of a real tool, it timed out partially, or the Superpowers plugin is not installed and available channels do not cover the full diff.
+A channel is "not at full coverage" when it ran as a compensating pass instead of a real tool or when it timed out.
-**Verdict precedence reminder:** `needs-user-decision` > `blocked` > `degraded-pass` > `pass`. If multiple conditions apply simultaneously (for example, both a contradiction and an unresolved P0 exist), the higher-precedence verdict wins.
+**Verdict precedence reminder:** `needs-user-decision` > `blocked` > `degraded-pass` > `pass`. When multiple conditions apply simultaneously, the higher-precedence verdict wins.
-The verdict is always computed after all fix rounds are exhausted — do not emit a partial verdict mid-cycle. If a fix round resolves all P0/P1/P2 findings and no contradictions remain, the verdict upgrades from `blocked` to `pass` or `degraded-pass` depending on channel coverage. This upgrade must be verified explicitly by re-running the reconciliation step after each fix round, not assumed from the fact that fixes were applied.
+The verdict is always computed after all fix rounds are exhausted — do not emit a partial verdict mid-cycle. If a fix round resolves all P0/P1/P2 findings, the verdict upgrades from `blocked` to `pass` or `degraded-pass` depending on channel coverage. This upgrade must be verified explicitly by re-running the reconciliation step after each fix round, not assumed from the fact that fixes were applied.
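The evaluation order maps directly onto an if/elif chain, so earlier rules take precedence. The sketch uses a hypothetical three-argument interface and makes two assumptions:

- "Completed" means the `completed` terminal state.
- The caller counts unresolved P0/P1/P2 findings only after the last fix round.

```bash
# Verdict evaluation order; the first matching rule wins (hypothetical interface).
#   $1  number of channels in the `completed` state
#   $2  unresolved P0/P1/P2 findings after the final fix round
#   $3  1 if any channel is not at full coverage, else 0
compute_verdict() {
  local completed="$1" unresolved="$2" degraded="$3"
  if [ "$completed" -eq 0 ]; then
    echo needs-user-decision
  elif [ "$unresolved" -gt 0 ]; then
    echo blocked
  elif [ "$degraded" -eq 1 ]; then
    echo degraded-pass
  else
    echo pass
  fi
}
```

The branch order matches the precedence `needs-user-decision` > `blocked` > `degraded-pass` > `pass`. A review with both an unresolved P0 and a compensating pass therefore reports `blocked`.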
 ### Security-Focused Review Checklist
@@ -197,4 +192,4 @@ When external CLIs are unavailable, the degraded-mode behavior defined in the Su
 5. When both external channels are unavailable, note "All findings are single-model (Claude only). External validation was unavailable." in the review summary.
 6. Never silently drop unavailable channels — always record the channel status and compensating coverage label in the review output.
-**Superpowers channel exception:** Superpowers is a Claude subagent and requires no external CLI or auth. It is always available as long as the Superpowers plugin is installed in the Claude Code environment. If the plugin is not installed, run available external CLIs and warn the user that review coverage is reduced — but do not run a compensating pass for Superpowers (the compensating-pass mechanism only applies to external CLIs that have an installation/auth gate).
+**Claude CLI channel:** Claude CLI handles its own auth and is generally always available. The compensating-pass mechanism applies to external CLIs (Codex, Gemini) that have an installation/auth gate. When Codex or Gemini is unavailable, a compensating pass is dispatched via `claude -p` with a focused prompt targeting the missing channel's strength area.
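A compensating pass needs two things: a prompt aimed at the missing channel's strength area, and a status note for the review summary. A sketch of both follows:

- The prompt wording is illustrative.
- The focus areas come from the Codex walkthrough above and from the model-selection table in multi-model-review-dispatch.md.
- The note format copies the Codex example.
- The piped `claude -p` call is shown only as a comment.

```bash
# Build a focused prompt for a compensating pass (wording is illustrative).
compensating_prompt() {
  local focus
  case "$1" in
    codex)  focus="implementation correctness, security, and API contracts" ;;
    gemini) focus="design reasoning, architectural patterns, and broad context" ;;
    *)      return 1 ;;  # only external CLIs get compensating passes
  esac
  printf 'Review the following diff, focusing on %s. Respond with a JSON array of findings.\n' "$focus"
}

# Summary note for a channel that was replaced by a compensating pass.
summary_note() {  # $1 = channel display name, $2 = root-cause status
  printf '%s channel: %s (compensating: %s-equivalent pass ran).\n' "$1" "$2" "$1"
}

# Example dispatch (not run here):
#   git diff | claude -p "$(compensating_prompt codex)"
```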

package/content/knowledge/core/multi-model-review-dispatch.md CHANGED

@@ -1,27 +1,18 @@
 ---
 name: multi-model-review-dispatch
-description: Patterns for dispatching reviews to external AI models (Codex, Gemini) at depth 4+, including fallback strategies and finding reconciliation
-topics: [multi-model, code-review, depth-scaling, codex, gemini, review-synthesis]
+description: Patterns for dispatching reviews to AI CLI tools (Codex, Gemini, Claude), including fallback strategies and finding reconciliation
+topics: [multi-model, code-review, codex, gemini, claude, review-synthesis]
 ---
 # Multi-Model Review Dispatch
-At higher methodology depths (4+), reviews benefit from independent validation by external AI models. Different models have different blind spots — Codex excels at code-centric analysis while Gemini brings strength in design and architectural reasoning. Dispatching to multiple models and reconciling their findings produces higher-quality reviews than any single model alone. This knowledge covers when to dispatch, how to dispatch, how to handle failures, and how to reconcile disagreements.
+Reviews benefit from independent validation by multiple AI models. Different models have different blind spots — Codex excels at code-centric analysis, Gemini brings strength in design and architectural reasoning, and Claude provides plan alignment and code quality assessment. Dispatching to multiple models and reconciling their findings produces higher-quality reviews than any single model alone. This knowledge covers how to dispatch, how to handle failures, and how to reconcile disagreements.
 ## Summary
 ### When to Dispatch
-Multi-model review activates at depth 4+ in the methodology scaling system:
-| Depth | Review Approach |
-|-------|----------------|
-| 1-2 | Claude-only, reduced pass count |
-| 3 | Claude-only, full pass count |
-| 4 | Full passes + one external model (if available) |
-| 5 | Full passes + multi-model with reconciliation |
-Dispatch is always optional. If no external model CLI is available, the review proceeds as a Claude-only enhanced review with additional self-review passes to partially compensate.
+Multi-model review runs all enabled channels on every review. The MMR CLI (`mmr review --sync`) is the primary entry point and handles dispatch, parsing, reconciliation, and verdict derivation automatically.
 ### Model Selection
@@ -29,15 +20,16 @@ Dispatch is always optional. If no external model CLI is available, the review p
 |-------|----------|----------|
 | **Codex** (OpenAI) | Code analysis, implementation correctness, API contract validation | Code reviews, security reviews, API reviews, database schema reviews |
 | **Gemini** (Google) | Design reasoning, architectural patterns, broad context understanding | Architecture reviews, PRD reviews, UX reviews, domain model reviews |
+| **Claude** (Anthropic) | Plan alignment, code quality, testing thoroughness | Code reviews, plan verification, test coverage |
-When both models are available at depth 5, dispatch to both and reconcile. At depth 4, choose the model best suited to the artifact type.
+All enabled channels run on every review. When a channel is unavailable, a compensating pass is dispatched via `claude -p` focused on the missing channel's strength area.
 ### Graceful Fallback
 External models are never required. The fallback chain:
 1. Attempt dispatch to selected model(s)
 2. If CLI unavailable → skip that model, note in report
-3. If timeout → use partial results if any, note incompleteness
+3. If timeout → CLI kills the process; no partial output preserved; compensating pass runs
 4. If all external models fail → Claude-only enhanced review (additional self-review passes)
 The review never blocks on external model availability.
@@ -82,15 +74,15 @@ If auth fails, report status `auth_failed` and surface recovery to the user:
 - Codex: "Codex auth expired — run `! codex login` to re-authenticate"
 - Gemini: "Gemini auth expired — run `! gemini -p \"hello\"` to re-authenticate"
-If auth check times out (~5 seconds), retry once. If still failing, report `auth_timeout`.
+If auth check times out (~5 seconds), retry once. If still failing, report `timeout`.
 If auth succeeds, report `ready` and proceed to dispatch.
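The auth-check timeout and single retry can be sketched with GNU coreutils `timeout`. It exits with status 124 when its limit is hit; on macOS, Homebrew's coreutils installs it as `gtimeout`. A few assumptions apply:

- The probe command is whatever the caller passes; no specific Codex or Gemini subcommand is assumed.
- The `AUTH_TIMEOUT` override is hypothetical.

```bash
# Run an auth probe with a ~5 second limit, retrying once only if it timed out.
# Prints ready, auth_failed, or timeout. AUTH_TIMEOUT is a hypothetical override.
check_auth() {
  local attempt rc
  for attempt in 1 2; do
    rc=0
    timeout "${AUTH_TIMEOUT:-5}" "$@" >/dev/null 2>&1 || rc=$?
    if [ "$rc" -eq 0 ]; then echo ready; return 0; fi
    # Any non-timeout failure is a real auth failure; do not retry it
    if [ "$rc" -ne 124 ]; then echo auth_failed; return 1; fi
  done
  echo timeout
  return 1
}
```

Usage: `st=$(check_auth my_auth_probe)`, where `my_auth_probe` is a placeholder for the channel's own auth check.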
 **Post-dispatch terminal states:**
 - `completed` — channel produced results, use normally
-- `partial_timeout` — partial output before timeout; use what was received, note incompleteness. Does NOT trigger compensating pass.
-- `failed` — crashed or unparseable output; triggers compensating pass.
+- `timeout` — channel exceeded time limit; CLI kills the process and marks it as `timeout`; triggers compensating pass
+- `failed` — crashed or unparseable output; triggers compensating pass
-Verdict impact: `partial_timeout` and `failed` channels mean the review is degraded. Maximum verdict is `degraded-pass` when any channel has a non-`completed` terminal state.
+Verdict impact: `timeout` and `failed` channels mean the review is degraded. Maximum verdict is `degraded-pass` when any channel has a non-`completed` terminal state.
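The terminal-state rules reduce to two small checks. The sketch covers only the post-dispatch states listed above; pre-dispatch failures (`not_installed`, `auth_failed`) queue their compensating pass earlier. The function names are illustrative.

```bash
# Exit 0 if a post-dispatch terminal state triggers a compensating pass.
needs_compensating() {
  case "$1" in
    timeout|failed) return 0 ;;
    *)              return 1 ;;
  esac
}

# Print 1 if any channel state is not `completed` (verdict capped at degraded-pass), else 0.
is_degraded() {
  local st
  for st in "$@"; do
    [ "$st" = completed ] || { echo 1; return; }
  done
  echo 0
}
```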
 #### Prompt Formatting
@@ -126,10 +118,12 @@ Respond with a JSON array of findings:
     "severity": "P0|P1|P2|P3",
     "category": "coverage|consistency|correctness|completeness",
     "location": "section or line reference",
-    "finding": "description of the issue",
+    "description": "description of the issue",
     "suggestion": "recommended fix"
   }
 ]
+Note: `id` and `category` are optional — the CLI auto-generates IDs (F-001, F-002, ...) when omitted.
 ```
 #### Output Parsing
@@ -137,15 +131,11 @@ Respond with a JSON array of findings:
 External model output is parsed as JSON. Handle common parsing issues:
 - Strip markdown code fences (```json ... ```) if the model wraps output
 - Handle trailing commas in JSON arrays
-- Validate that each finding has the required fields (severity, category, finding)
+- Validate that each finding has the required fields (severity, location, description, suggestion)
 - Discard malformed entries rather than failing the entire parse
-Store raw output for audit:
-```
-docs/reviews/{artifact}/codex-review.json   — raw Codex findings
-docs/reviews/{artifact}/gemini-review.json  — raw Gemini findings
-docs/reviews/{artifact}/review-summary.md   — reconciled synthesis
-```
+The CLI stores raw output at `~/.mmr/jobs/{job-id}/` per channel. Review results
+are available via `mmr results <job-id>`.
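The first two parsing fixes can be done with standard text tools before the output reaches a real JSON parser. This is a naive sketch with two limitations:

- It would also strip a comma that sits directly before `]` or `}` inside a string value.
- Field validation and discarding malformed entries are left to a JSON parser.

```bash
# Clean raw model output before JSON parsing (naive sketch).
clean_model_output() {
  # Drop markdown fence lines such as ```json and ```
  grep -v '^[[:space:]]*```' |
    # Join lines so a trailing comma before a closing bracket on the next line is visible
    tr '\n' ' ' |
    # Remove a comma followed only by whitespace before ] or }
    sed -E 's/,[[:space:]]*([]}])/\1/g'
}
```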
 ### Timeout Handling
@@ -158,14 +148,7 @@ External model calls can hang or take unreasonably long. Set reasonable timeouts
 | Medium artifact review (2000-10000 words) | 120 seconds | Needs more processing time |
 | Large artifact review (>10000 words) | 180 seconds | Maximum reasonable wait |
-#### Partial Result Handling
-If a timeout occurs mid-response:
-1. Check if the partial output contains valid JSON entries
-2. If yes, use the valid entries and note "partial results" in the report
-3. If no, treat as a model failure and fall back
-Never wait indefinitely. A review that completes in 3 minutes with Claude-only findings is better than one that blocks for 10 minutes waiting for an external model.
+Never wait indefinitely. A review that completes in 3 minutes with Claude-only findings is better than one that blocks for 10 minutes waiting for an external model. When a channel times out, the CLI kills the process — no partial output is preserved. A compensating pass runs in its place.
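The timeout table and the kill-on-timeout rule can be sketched together. This excerpt does not show the small-artifact row, so its value comes from a hypothetical `SMALL_ARTIFACT_TIMEOUT` variable that the caller must set. The sketch relies on GNU `timeout`, which kills the command and exits with 124. The function names and argument order are illustrative.

```bash
# Pick a per-channel timeout (seconds) from the artifact's word count.
timeout_for_words() {
  local words="$1"
  if [ "$words" -gt 10000 ]; then
    echo 180
  elif [ "$words" -ge 2000 ]; then
    echo 120
  else
    echo "${SMALL_ARTIFACT_TIMEOUT:?set SMALL_ARTIFACT_TIMEOUT for artifacts under 2000 words}"
  fi
}

# Run one channel under the limit. usage: run_channel <seconds> <output-file> <command...>
run_channel() {
  local secs="$1" out_file="$2" rc=0
  shift 2
  timeout "$secs" "$@" > "$out_file" || rc=$?
  case "$rc" in
    0)   echo completed ;;
    124) rm -f "$out_file"; echo timeout ;;  # discard partial output; queue a compensating pass
    *)   rm -f "$out_file"; echo failed ;;
  esac
}
```

Deleting the output file on timeout mirrors the rule that no partial output is kept. A timed-out channel contributes nothing, and its compensating pass supplies the coverage.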
 ### Finding Reconciliation
@@ -224,9 +207,9 @@ When synthesizing multi-model findings, classify each finding:
 # Multi-Model Review Summary: [Artifact Name]
 ## Models Used
-- Claude (primary reviewer)
-- Codex (external, depth 4+) — [available/unavailable/timeout]
-- Gemini (external, depth 5) — [available/unavailable/timeout]
+- Claude CLI — [available/unavailable/timeout]
+- Codex CLI — [available/unavailable/timeout]
+- Gemini CLI — [available/unavailable/timeout]
 ## Consensus Findings
 | # | Severity | Finding | Models | Confidence |
@@ -251,14 +234,10 @@ or areas where external models provided unique value]
 #### Raw JSON Preservation
-Always preserve the raw JSON output from external models, even after reconciliation. The raw findings serve as an audit trail and enable re-analysis if the reconciliation logic is later improved.
+Always preserve the raw JSON output from each channel, even after reconciliation. The raw findings serve as an audit trail and enable re-analysis if the reconciliation logic is later improved.
-```
-docs/reviews/{artifact}/
-  codex-review.json     — raw output from Codex
-  gemini-review.json    — raw output from Gemini
-  review-summary.md     — reconciled synthesis
-```
+The CLI stores raw output at `~/.mmr/jobs/{job-id}/` with per-channel result files.
+Results are accessible via `mmr results <job-id>`.
 ### Quality Gates
@@ -266,20 +245,18 @@ Minimum standards for a multi-model review to be considered complete:
 | Gate | Threshold | Rationale |
 |------|-----------|-----------|
-| Minimum finding count | At least 3 findings across all models | A review with zero findings likely missed something |
-| Coverage threshold | Every review pass has at least one finding or explicit "no issues found" note | Ensures all passes were actually executed |
+| Coverage threshold | Every channel has at least one finding or explicit "no issues found" note | Ensures all channels were actually executed |
 | Reconciliation completeness | All cross-model disagreements have documented resolutions | No unresolved conflicts |
-| Raw output preserved | JSON files exist for all models that were dispatched | Audit trail |
+| Raw output preserved | Per-channel results exist for all dispatched channels | Audit trail |
-If the primary Claude review produces zero findings and external models are unavailable, the review should explicitly note this as unusual and recommend a targeted re-review at a later stage.
+Zero findings across all channels is a valid outcome when the diff is clean.
 #### Degraded-Mode Gate Adaptation
 When channels are skipped and compensating passes are used:
-- **Minimum finding count** gate: compensating passes count toward the total but are not treated as separate external channels for consensus purposes.
 - **Reconciliation completeness** gate (cross-model disagreement documentation): applies whenever 2+ distinct model perspectives participate (Claude + one external counts). N/A only when Claude is the sole perspective (no external models and no compensating passes that introduce genuinely different framing).
-- **Coverage threshold** gate: compensating passes satisfy the "every pass has at least one finding or explicit no-issues note" requirement.
+- **Coverage threshold** gate: compensating passes satisfy the "every channel has at least one finding or explicit no-issues note" requirement.
 - The reconciled output must record which channels were real, which were compensating, and which were skipped, so the orchestration layer can apply appropriate verdict logic.
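The coverage gate can be checked mechanically once each channel's record notes whether it was real, compensating, or skipped. The sketch makes two assumptions:

- The `channel:kind:findings:no_issues` record format is hypothetical.
- Skipped channels are excluded from the gate because their compensating pass is checked in their place.

```bash
# Coverage gate check. Each argument: "channel:kind:findings:no_issues"
# kind is real, compensating, or skipped; no_issues is yes when an explicit
# "no issues found" note was recorded (hypothetical record format).
coverage_gate() {
  local rec ch kind findings no_issues rc=0
  for rec in "$@"; do
    IFS=: read -r ch kind findings no_issues <<< "$rec"
    [ "$kind" = skipped ] && continue   # its compensating pass is gated instead
    if [ "$findings" -eq 0 ] && [ "$no_issues" != yes ]; then
      echo "coverage gate failed: $ch ($kind) has no findings and no explicit no-issues note"
      rc=1
    fi
  done
  return "$rc"
}
```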
 ### Common Anti-Patterns
@@ -288,12 +265,10 @@ When channels are skipped and compensating passes are used:
 **Ignoring disagreements.** Two models disagree, and the reviewer picks one without analysis. Fix: disagreements are the most valuable signal in multi-model review. They identify areas of genuine ambiguity or complexity. Always investigate and document the resolution.
-**Dispatching at low depth.** Running external model reviews at depth 1-2 where the review scope is intentionally minimal. The external model does a full analysis anyway, producing findings that are out of scope. Fix: only dispatch at depth 4+. Lower depths use Claude-only review with reduced pass count.
-**No fallback plan.** The review pipeline assumes external models are always available. When Codex is down, the review fails entirely. Fix: external dispatch is always optional. The fallback to Claude-only enhanced review must be implemented and tested.
+**No fallback plan.** The review pipeline assumes external models are always available. When Codex is down, the review fails entirely. Fix: external dispatch is always optional. The CLI automatically dispatches compensating passes via `claude -p` when channels are unavailable.
 **Over-weighting consensus.** Two models agree on a finding, so it must be correct. But both models may share the same bias (e.g., both flag a pattern as an anti-pattern that is actually appropriate for this project's constraints). Fix: consensus increases confidence but does not guarantee correctness. All findings still require artifact-level verification.
 **Dispatching the full pipeline context.** Sending the entire project context (all docs, all code) to the external model. This exceeds context limits and dilutes focus. Fix: send only the artifact under review and the minimal upstream context needed for that specific review.
-**Ignoring partial results.** A model times out after producing 3 of 5 findings. The reviewer discards all results because the review is "incomplete." Fix: partial results are still valuable. Include them with a note about incompleteness. Three real findings are better than zero.
+**Treating a timeout as a silent skip.** A channel times out and the reviewer proceeds without documenting it. Fix: when a channel times out, record the root-cause status as `timeout`, queue a compensating pass, and include it in the review summary. The CLI kills timed-out processes — no partial output is available, but the compensating pass ensures coverage.