npm - @zigrivers/scaffold - Versions diffs - 3.22.0 → 3.24.0 - Mend

@zigrivers/scaffold 3.22.0 → 3.24.0

Files changed (111)

package/README.md +44 -23
package/content/knowledge/core/automated-review-tooling.md +3 -3
package/content/knowledge/core/multi-model-review-dispatch.md +13 -4
package/content/knowledge/data-science/README.md +23 -0
package/content/knowledge/data-science/data-science-architecture.md +163 -0
package/content/knowledge/data-science/data-science-conventions.md +233 -0
package/content/knowledge/data-science/data-science-data-versioning.md +198 -0
package/content/knowledge/data-science/data-science-dev-environment.md +159 -0
package/content/knowledge/data-science/data-science-experiment-tracking.md +194 -0
package/content/knowledge/data-science/data-science-model-evaluation.md +160 -0
package/content/knowledge/data-science/data-science-notebook-discipline.md +170 -0
package/content/knowledge/data-science/data-science-observability.md +161 -0
package/content/knowledge/data-science/data-science-project-structure.md +178 -0
package/content/knowledge/data-science/data-science-reproducibility.md +164 -0
package/content/knowledge/data-science/data-science-requirements.md +151 -0
package/content/knowledge/data-science/data-science-security.md +151 -0
package/content/knowledge/data-science/data-science-testing.md +183 -0
package/content/knowledge/ml/README.md +10 -0
package/content/methodology/data-science-overlay.yml +39 -0
package/content/pipeline/build/multi-agent-resume.md +7 -6
package/content/pipeline/build/multi-agent-start.md +7 -6
package/content/pipeline/build/single-agent-resume.md +7 -6
package/content/pipeline/build/single-agent-start.md +7 -6
package/content/pipeline/environment/automated-pr-review.md +79 -27
package/content/skills/mmr/SKILL.md +72 -2
package/content/skills/scaffold-runner/SKILL.md +65 -19
package/content/tools/review-code.md +74 -16
package/content/tools/review-pr.md +25 -6
package/dist/cli/commands/check.d.ts.map +1 -1
package/dist/cli/commands/check.js +28 -17
package/dist/cli/commands/check.js.map +1 -1
package/dist/config/schema.d.ts +672 -126
package/dist/config/schema.d.ts.map +1 -1
package/dist/config/schema.js +8 -0
package/dist/config/schema.js.map +1 -1
package/dist/config/schema.test.js +2 -2
package/dist/config/schema.test.js.map +1 -1
package/dist/config/validators/data-science.d.ts +4 -0
package/dist/config/validators/data-science.d.ts.map +1 -0
package/dist/config/validators/data-science.js +15 -0
package/dist/config/validators/data-science.js.map +1 -0
package/dist/config/validators/index.d.ts.map +1 -1
package/dist/config/validators/index.js +2 -0
package/dist/config/validators/index.js.map +1 -1
package/dist/core/assembly/knowledge-loader.d.ts.map +1 -1
package/dist/core/assembly/knowledge-loader.js +6 -0
package/dist/core/assembly/knowledge-loader.js.map +1 -1
package/dist/core/assembly/knowledge-loader.test.js +34 -0
package/dist/core/assembly/knowledge-loader.test.js.map +1 -1
package/dist/e2e/project-type-overlays.test.js +73 -0
package/dist/e2e/project-type-overlays.test.js.map +1 -1
package/dist/project/adopt.d.ts.map +1 -1
package/dist/project/adopt.js +3 -1
package/dist/project/adopt.js.map +1 -1
package/dist/project/detectors/coverage.test.d.ts +2 -0
package/dist/project/detectors/coverage.test.d.ts.map +1 -0
package/dist/project/detectors/coverage.test.js +78 -0
package/dist/project/detectors/coverage.test.js.map +1 -0
package/dist/project/detectors/data-science.d.ts +4 -0
package/dist/project/detectors/data-science.d.ts.map +1 -0
package/dist/project/detectors/data-science.js +32 -0
package/dist/project/detectors/data-science.js.map +1 -0
package/dist/project/detectors/data-science.test.d.ts +2 -0
package/dist/project/detectors/data-science.test.d.ts.map +1 -0
package/dist/project/detectors/data-science.test.js +62 -0
package/dist/project/detectors/data-science.test.js.map +1 -0
package/dist/project/detectors/disambiguate.d.ts +2 -0
package/dist/project/detectors/disambiguate.d.ts.map +1 -1
package/dist/project/detectors/disambiguate.js +3 -2
package/dist/project/detectors/disambiguate.js.map +1 -1
package/dist/project/detectors/disambiguate.test.js +10 -1
package/dist/project/detectors/disambiguate.test.js.map +1 -1
package/dist/project/detectors/index.d.ts.map +1 -1
package/dist/project/detectors/index.js +2 -0
package/dist/project/detectors/index.js.map +1 -1
package/dist/project/detectors/library.d.ts.map +1 -1
package/dist/project/detectors/library.js +1 -0
package/dist/project/detectors/library.js.map +1 -1
package/dist/project/detectors/resolve-detection.test.js +31 -0
package/dist/project/detectors/resolve-detection.test.js.map +1 -1
package/dist/project/detectors/types.d.ts +6 -2
package/dist/project/detectors/types.d.ts.map +1 -1
package/dist/project/detectors/types.js.map +1 -1
package/dist/types/config.d.ts +8 -1
package/dist/types/config.d.ts.map +1 -1
package/dist/wizard/copy/core.d.ts.map +1 -1
package/dist/wizard/copy/core.js +4 -0
package/dist/wizard/copy/core.js.map +1 -1
package/dist/wizard/copy/data-science.d.ts +3 -0
package/dist/wizard/copy/data-science.d.ts.map +1 -0
package/dist/wizard/copy/data-science.js +15 -0
package/dist/wizard/copy/data-science.js.map +1 -0
package/dist/wizard/copy/index.d.ts.map +1 -1
package/dist/wizard/copy/index.js +2 -0
package/dist/wizard/copy/index.js.map +1 -1
package/dist/wizard/copy/types.d.ts +5 -1
package/dist/wizard/copy/types.d.ts.map +1 -1
package/dist/wizard/copy/types.test-d.js +7 -0
package/dist/wizard/copy/types.test-d.js.map +1 -1
package/dist/wizard/questions.d.ts +2 -1
package/dist/wizard/questions.d.ts.map +1 -1
package/dist/wizard/questions.js +9 -1
package/dist/wizard/questions.js.map +1 -1
package/dist/wizard/questions.test.js +14 -0
package/dist/wizard/questions.test.js.map +1 -1
package/dist/wizard/wizard.d.ts.map +1 -1
package/dist/wizard/wizard.js +1 -0
package/dist/wizard/wizard.js.map +1 -1
package/package.json +1 -1
package/skills/mmr/SKILL.md +72 -2
package/skills/scaffold-runner/SKILL.md +65 -19

package/README.md CHANGED

@@ -29,7 +29,7 @@ Either way, Scaffold constructs the prompt and the target AI tool does the work.
 **Assembly engine** — At execution time, Scaffold builds a 7-section prompt from: system metadata, the meta-prompt, knowledge base entries, project context (artifacts from prior steps), methodology settings, layered instructions, and depth-specific execution guidance.
-**Knowledge base** — 222 domain expertise entries in `content/knowledge/` organized in seventeen categories (core, product, review, validation, finalization, execution, tools, game, web-app, backend, cli, library, mobile-app, data-pipeline, ml, browser-extension, research) covering testing strategy, domain modeling, API design, security best practices, eval craft, TDD execution, task claiming, worktree management, release management, rendering strategies, data stores, CLI patterns, game engines, library bundling, mobile deployment, batch and streaming pipelines, model training and serving, browser extension manifests and service workers, and more. These get injected into prompts based on each step's `knowledge-base` frontmatter field. Knowledge files with a `## Deep Guidance` section are optimized for CLI assembly — only the deep guidance content is loaded, avoiding redundancy with the prompt text. Teams can add project-local overrides in `.scaffold/knowledge/` that layer on top of the global entries.
+**Knowledge base** — 235 domain expertise entries in `content/knowledge/` organized in eighteen categories (core, product, review, validation, finalization, execution, tools, game, web-app, backend, cli, library, mobile-app, data-pipeline, ml, browser-extension, research, data-science) covering testing strategy, domain modeling, API design, security best practices, eval craft, TDD execution, task claiming, worktree management, release management, rendering strategies, data stores, CLI patterns, game engines, library bundling, mobile deployment, batch and streaming pipelines, model training and serving, browser extension manifests and service workers, data-science reproducibility and notebook discipline, and more. These get injected into prompts based on each step's `knowledge-base` frontmatter field. Knowledge files with a `## Deep Guidance` section are optimized for CLI assembly — only the deep guidance content is loaded, avoiding redundancy with the prompt text. Teams can add project-local overrides in `.scaffold/knowledge/` that layer on top of the global entries.
 **Methodology presets** — Three built-in presets control which steps run and how deep the analysis goes:
 - **deep** (depth 5) — all steps enabled, exhaustive analysis
@@ -38,7 +38,7 @@ Either way, Scaffold constructs the prompt and the target AI tool does the work.
 **Depth scale** (1-5) — Controls how thorough each step's output is, from "focus on the core deliverable" (1) to "explore all angles, tradeoffs, and edge cases" (5). Depth resolves with 4-level precedence: CLI flag > step override > custom default > preset default.
-**Multi-model validation** — At depth 4-5, all 19 review and validation steps can dispatch independent reviews to Codex and/or Gemini CLIs. Two independent models catch more blind spots than one. When both CLIs are available, findings are reconciled by confidence level (both agree = high confidence, single model P0 = still actionable). When a channel is unavailable, a compensating Claude self-review pass runs in its place (labeled `[compensating: Codex-equivalent]` or `[compensating: Gemini-equivalent]`, single-source confidence). CLI commands must always run in the foreground — background execution produces empty output. See the [Multi-Model Review](#multi-model-review) section.
+**Multi-model validation** — At depth 4-5, review and validation steps can dispatch independent reviews to the three MMR CLI channels (Codex, Gemini, Claude) via the `mmr` CLI, plus the Superpowers code-reviewer agent as a complementary 4th channel on wrapper invocations (`scaffold run review-pr`, `scaffold run review-code`). The MMR-backed wrappers are the preferred path; some older depth-5 validation steps still dispatch Codex/Gemini directly via the `multi-model-dispatch` skill (migration in progress). Multiple independent models catch more blind spots than one. Findings are reconciled by confidence level (multiple channels agree = high confidence, single channel P0 = still actionable). When Codex or Gemini is unavailable, a compensating Claude self-review pass runs in its place (labeled `[compensating: Codex-equivalent]` or `[compensating: Gemini-equivalent]`, single-source confidence); there is no compensating pass when Claude itself is unavailable — the review simply proceeds with the remaining channels. CLI commands must always run in the foreground — background execution produces empty output. `mmr review` also supports non-PR targets (staged changes, branch diff, specific files) — see the [Multi-Model Review](#multi-model-review) section.
 **State management** — Pipeline progress is tracked in `.scaffold/state.json` with atomic file writes and crash recovery. An advisory lock prevents concurrent runs. Decisions are logged to an append-only `decisions.jsonl`. Pressing Ctrl+C during any command exits cleanly with an informative message — no stack traces, no orphaned locks, no corrupted state.
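
Reviewer note: the 4-level depth precedence quoted in this hunk (CLI flag > step override > custom default > preset default) can be sketched as follows. This is a minimal illustration under stated assumptions, not Scaffold's actual implementation; `DepthSources` and `resolveDepth` are hypothetical names that do not appear in the package.

```typescript
// Hypothetical sketch of the precedence described in the README:
// CLI flag > step override > custom default > preset default.
type Depth = 1 | 2 | 3 | 4 | 5;

interface DepthSources {
  cliFlag?: Depth;       // value passed on the command line, if any
  stepOverride?: Depth;  // per-step override, if any
  customDefault?: Depth; // default of a custom methodology, if any
  presetDefault: Depth;  // always present, e.g. 5 for "deep", 1 for "mvp"
}

function resolveDepth(sources: DepthSources): Depth {
  // The first defined value in precedence order wins.
  return (
    sources.cliFlag ??
    sources.stepOverride ??
    sources.customDefault ??
    sources.presetDefault
  );
}
```

Using `??` rather than `||` keeps the chain correct even if a future depth value were falsy; with the current 1-5 scale either operator would behave the same.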
@@ -368,7 +368,7 @@ Every `scaffold init` wizard question can be answered via CLI flags, making scaf
 | `--depth` | 1-5 | Custom methodology depth (requires `--methodology custom`) |
 | `--adapters` | comma-sep | AI adapters: claude-code, codex, gemini |
 | `--traits` | comma-sep | Project traits: web, mobile |
-| `--project-type` | string | web-app, mobile-app, backend, cli, library, game, data-pipeline, ml, browser-extension, research |
+| `--project-type` | string | web-app, mobile-app, backend, cli, library, game, data-pipeline, ml, browser-extension, research, data-science |
 | `--auto` | boolean | Non-interactive mode (uses Zod defaults for unset flags) |
 #### Web-App Config Flags (require `--project-type web-app` or auto-set it)
@@ -457,6 +457,14 @@ Every `scaffold init` wizard question can be answered via CLI flags, making scaf
 | `--research-domain` | string | none, quant-finance, ml-research, simulation |
 | `--research-tracking` | boolean | `--research-tracking` / `--no-research-tracking` |
+#### Data Science Config (`--project-type data-science`)
+Data science has one forward-compatible config field in the schema, defaulted automatically — no CLI flags are needed in v1:
+| Config field | Values | Notes |
+|------|------|--------|
+| `dataScienceConfig.audience` | `solo` | Default (applied by the wizard and `--auto`). Covers the DS-1 audience (solo / small-team, local-first, prototyping). A future DS-2 release will extend the enum with `'platform'` (platform-engineered / larger-team DS) additively, without breaking existing configs. |
 #### Game Config Flags (require `--project-type game` or auto-set it)
 | Flag | Type | Values |
@@ -506,7 +514,7 @@ during assembly.
 - **Flag > auto > interactive**: Flags always take highest precedence. `--auto --engine unreal` uses defaults for everything except engine.
 - **Partial flags + interactive**: Provide some flags and the wizard asks only the remaining questions. `scaffold init --project-type game --engine unreal` prompts interactively for multiplayer, platforms, etc.
-- **Type-specific flags auto-set project type**: `--engine unity` automatically sets `--project-type game`, `--web-rendering ssr` sets `--project-type web-app`, `--backend-api-style rest` sets `--project-type backend`, `--cli-interactivity hybrid` sets `--project-type cli`, `--lib-visibility public` sets `--project-type library`, `--mobile-platform ios` sets `--project-type mobile-app`, `--pipeline-processing batch` sets `--project-type data-pipeline`, `--ml-phase training` sets `--project-type ml`, `--ext-manifest 3` sets `--project-type browser-extension`, `--research-driver code-driven` sets `--project-type research`. Error if conflicting type.
+- **Type-specific flags auto-set project type**: `--engine unity` automatically sets `--project-type game`, `--web-rendering ssr` sets `--project-type web-app`, `--backend-api-style rest` sets `--project-type backend`, `--cli-interactivity hybrid` sets `--project-type cli`, `--lib-visibility public` sets `--project-type library`, `--mobile-platform ios` sets `--project-type mobile-app`, `--pipeline-processing batch` sets `--project-type data-pipeline`, `--ml-phase training` sets `--project-type ml`, `--ext-manifest 3` sets `--project-type browser-extension`, `--research-driver code-driven` sets `--project-type research`. Error if conflicting type. (Data science currently has no dedicated CLI flags — pass `--project-type data-science` directly.)
 - **Cannot mix flag families**: `--web-rendering ssr --backend-api-style rest` is an error. Each flag family (`--web-*`, `--backend-*`, `--cli-*`, `--lib-*`, `--mobile-*`, `--pipeline-*`, `--ml-*`, `--research-*`, `--ext-*`, game) is exclusive.
 - **Validation**: `--depth` requires `--methodology custom`. `--online-services` requires `--multiplayer online` or `hybrid`. SSR/hybrid rendering is incompatible with static deploy target. Session auth requires server state (not static). ML inference projects must specify a serving pattern. Browser extensions must declare at least one capability (UI surface, content script, or background worker). Notebook-driven research cannot be fully autonomous.
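
Reviewer note: the "type-specific flags auto-set project type" and "cannot mix flag families" rules in this hunk can be sketched as below. This is an illustrative sketch only; `familyOf` and `inferProjectType` are hypothetical names, the prefix table is taken from the flag families the README lists, and the game family (which has no shared prefix) is represented by just two of the flags the README mentions.

```typescript
// Prefix -> project type, following the flag families named in the README.
const FAMILY_PREFIXES: Array<[string, string]> = [
  ["--web-", "web-app"],
  ["--backend-", "backend"],
  ["--cli-", "cli"],
  ["--lib-", "library"],
  ["--mobile-", "mobile-app"],
  ["--pipeline-", "data-pipeline"],
  ["--ml-", "ml"],
  ["--research-", "research"],
  ["--ext-", "browser-extension"],
];

// Game flags share no prefix; this partial list is an assumption for the sketch.
const GAME_FLAGS = new Set(["--engine", "--multiplayer"]);

function familyOf(flag: string): string | undefined {
  if (GAME_FLAGS.has(flag)) return "game";
  const hit = FAMILY_PREFIXES.find(([prefix]) => flag.startsWith(prefix));
  return hit?.[1];
}

function inferProjectType(flags: string[], explicit?: string): string | undefined {
  let inferred = explicit;
  for (const flag of flags) {
    const type = familyOf(flag);
    if (type === undefined) continue; // general flag such as --depth or --auto
    if (inferred !== undefined && inferred !== type) {
      // Covers both a mismatch with --project-type and mixed flag families.
      throw new Error(`${flag} conflicts with project type ${inferred}`);
    }
    inferred = type;
  }
  return inferred;
}
```

Because data science has no dedicated flags, it only ever reaches this function through the explicit `--project-type data-science` value.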
@@ -599,6 +607,9 @@ scaffold init --auto --methodology deep --project-type research \
   --research-driver config-driven --research-interaction checkpoint-gated \
   --research-domain ml-research
+# Solo / small-team data science project (reproducibility-first)
+scaffold init --auto --methodology deep --project-type data-science
 # Multiplayer mobile game with Unity
 scaffold init --project-type game --methodology deep --auto \
   --engine unity --multiplayer online --target-platforms ios,android \
@@ -625,7 +636,7 @@ Scaffold supports **project-type overlays** — domain-specific knowledge and pi
 - **Injects domain knowledge** into existing pipeline steps (e.g., SSR caching strategies into `tech-stack`, API pagination patterns into `coding-standards`)
-The game overlay additionally adjusts step enablement, remaps artifact references, and adds dependency overrides (because game development has fundamentally different artifacts). The web-app, backend, CLI, library, mobile-app, data-pipeline, ML, browser-extension, and research overlays are **knowledge-only** — they inject domain expertise into existing steps without changing which steps run or how they depend on each other. The research type additionally supports **domain sub-overlays** (quant-finance, ml-research, simulation) that layer domain-specific knowledge on top of the core research overlay, and the backend type supports a `fintech` sub-overlay. Both research and backend accept `domain` as either a single string or an array (e.g. `domain: ['quant-finance', 'simulation']`) for stacking multiple sub-overlays; the wizard and CLI flags remain single-select in v1, so multi-domain stacking requires hand-editing `.scaffold/config.yml`.
+The game overlay additionally adjusts step enablement, remaps artifact references, and adds dependency overrides (because game development has fundamentally different artifacts). The web-app, backend, CLI, library, mobile-app, data-pipeline, ML, browser-extension, research, and data-science overlays are **knowledge-only** — they inject domain expertise into existing steps without changing which steps run or how they depend on each other. The research type additionally supports **domain sub-overlays** (quant-finance, ml-research, simulation) that layer domain-specific knowledge on top of the core research overlay, and the backend type supports a `fintech` sub-overlay. Both research and backend accept `domain` as either a single string or an array (e.g. `domain: ['quant-finance', 'simulation']`) for stacking multiple sub-overlays; the wizard and CLI flags remain single-select in v1, so multi-domain stacking requires hand-editing `.scaffold/config.yml`.
 Overlays are composable with methodology presets. An MVP web-app gets fewer steps at lower depth; a deep backend project gets exhaustive analysis of every architectural decision.
@@ -640,6 +651,7 @@ Overlays are composable with methodology presets. An MVP web-app gets fewer step
 | `ml` | `ml-overlay.yml` | 12 entries (architecture, training and serving patterns, experiment tracking, model evaluation, observability, testing, security) | Project phase, model type, serving pattern, experiment tracking |
 | `browser-extension` | `browser-extension-overlay.yml` | 12 entries (architecture, manifest configuration, service workers, content scripts, cross-browser, store submission, testing, security) | Manifest version, UI surfaces, content script, background worker |
 | `research` | `research-overlay.yml` + domain sub-overlays | 25 entries (experiment loop, tracking, overfitting prevention, backtesting, risk metrics, architecture search, simulation) | Experiment driver, interaction mode, domain, experiment tracking |
+| `data-science` | `data-science-overlay.yml` | 13 entries (reproducibility, experiment tracking, notebook discipline, model evaluation, data versioning, dev environment, observability, project structure, conventions, requirements, security, testing, architecture) | Audience (`solo` default; `platform` reserved for DS-2) |
 | `game` | `game-overlay.yml` | 24 entries (engines, networking, audio, VR/AR, economy, save systems, certification) | Engine, multiplayer, platforms, economy, narrative, and 6 more |
 ### Game Development
@@ -725,7 +737,7 @@ These answers control which conditional steps activate. A single-player puzzle g
 #### Multi-type Detection
-`scaffold adopt` detects 10 project types from manifest files and directory layouts:
+`scaffold adopt` detects 11 project types from manifest files and directory layouts:
 | Type | Key Signals |
 |------|-------------|
@@ -739,6 +751,7 @@ These answers control which conditional steps activate. A single-player puzzle g
 | `ml` | `training/`/`models/` dirs, PyTorch/TensorFlow deps, MLflow configs |
 | `browser-extension` | `manifest.json` with `manifest_version` field |
 | `research` | `program.md` + `results.tsv`, backtest/strategy files with trading deps, optimization deps + experiment dirs, simulation framework deps |
+| `data-science` | Marimo signals required (`marimo` dep or `.marimo.toml`); DVC (`dvc.yaml`, `.dvc/config`, `dvc` py dep) is supplementary evidence only. Low-tier; defers to `ml` / `research` / `data-pipeline` when those match at medium/high tier |
 Each detector returns a confidence tier (high/medium/low) with evidence trails. Override detection with `--project-type <type>`.
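
Reviewer note: the new `data-science` detection row describes two rules: Marimo signals are required (DVC only adds evidence), and the low-tier match defers to `ml`, `research`, or `data-pipeline` when those match at medium or high tier. The sketch below illustrates that logic under assumptions; the `Signals` and `Match` shapes and both function names are hypothetical and are not taken from `dist/project/detectors/data-science.js`.

```typescript
type Tier = "high" | "medium" | "low";

interface Match {
  type: string;
  tier: Tier;
  evidence: string[]; // human-readable evidence trail
}

// Hypothetical pre-computed signals; the real detector inspects the project itself.
interface Signals {
  marimoDep: boolean;
  marimoToml: boolean;
  dvcYaml: boolean;
  dvcConfig: boolean;
  dvcDep: boolean;
}

function detectDataScience(s: Signals): Match | null {
  const evidence: string[] = [];
  if (s.marimoDep) evidence.push("marimo dependency");
  if (s.marimoToml) evidence.push(".marimo.toml");
  // DVC alone is not sufficient: without a Marimo signal there is no match.
  if (evidence.length === 0) return null;
  if (s.dvcYaml) evidence.push("dvc.yaml");
  if (s.dvcConfig) evidence.push(".dvc/config");
  if (s.dvcDep) evidence.push("dvc dependency");
  return { type: "data-science", tier: "low", evidence };
}

function defersTo(others: Match[]): Match | undefined {
  const preferred = new Set(["ml", "research", "data-pipeline"]);
  // A low-tier match from another detector does not displace data-science.
  return others.find((m) => preferred.has(m.type) && m.tier !== "low");
}
```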
@@ -813,7 +826,7 @@ Claude sets up your local dev environment with one-command startup and live relo
 | `dev-env-setup` | Claude configures your project so `make dev` (or equivalent) starts everything — dev server with live reload, local database, environment variables — and documents the setup in a getting-started guide. |
 | `design-system` | Claude creates a visual language — color palette (WCAG-compliant), typography scale, spacing system, component patterns — and generates working theme config files for your frontend framework. *(web apps only)* |
 | `git-workflow` | Claude sets up your branching strategy, commit message format, PR workflow, CI pipeline with lint and test jobs, and worktree scripts so multiple AI agents can work in parallel without conflicts. |
-| `automated-pr-review` | Claude configures automated code review — using Codex and/or Gemini CLIs for dual-model review when available, or an external bot — with severity definitions and review criteria tailored to your project. *(optional)* |
+| `automated-pr-review` | Claude configures automated code review — three-CLI MMR dispatch (Codex, Gemini, Claude) plus Superpowers code-reviewer as a complementary 4th channel via the scaffold wrappers, with severity definitions and review criteria tailored to your project. Covers PRs and non-PR targets (local code, diffs, files). *(optional)* |
 | `ai-memory-setup` | Claude extracts conventions from your docs into path-scoped rule files that load automatically, optimizes CLAUDE.md with a pointer pattern, and optionally sets up persistent cross-session memory. |
 ### Phase 4 — Testing Integration (integration)
@@ -1201,7 +1214,9 @@ channels:
     command: claude -p
     auth:
       check: "claude -p 'respond with ok' 2>/dev/null"
-      timeout: 5
+      # Claude's auth probe is a full LLM round-trip (not a local status
+      # check) and routinely takes 9-14s, so 20s is the realistic default.
+      timeout: 20
       failure_exit_codes: [1]
       recovery: "Run: claude login"
@@ -1215,7 +1230,10 @@ channels:
       NO_BROWSER: "true"
     auth:
       check: "NO_BROWSER=true gemini -p 'respond with ok' -o json 2>&1"
-      timeout: 5
+      # Gemini's auth probe is also a full LLM round-trip; same reasoning
+      # as Claude. Codex stays at the 5s default (see below) because its
+      # check is a local file probe.
+      timeout: 20
       failure_exit_codes: [41]
       recovery: "Run: gemini -p 'hello' (interactive, opens browser)"
     timeout: 360     # Gemini tends to be slower
@@ -1272,11 +1290,12 @@ When multiple channels return findings, mmr applies consensus rules:
 | Scenario | Confidence | Action |
 |----------|-----------|--------|
-| Both models flag the same issue | **High** | Fix immediately |
-| Both models approve | **High** | Proceed confidently |
-| One flags P0, other approves | **High** | Fix it (P0 is critical) |
-| One flags P1, other approves | **Medium** | Review before fixing |
-| Models contradict each other | **Low** | Present both to user |
+| 2+ channels flag the same issue | **High** | Fix immediately |
+| All channels approve | **High** | Proceed confidently |
+| One channel flags P0, others approve | **High** | Fix it (P0 is critical) |
+| One channel flags P1, others approve | **Medium** | Review before fixing |
+| Channels contradict each other | **Low** | Present all perspectives to user |
+| Compensating-pass P0/P1/P2 finding | **Single-source** | Fix per normal thresholds, label as compensating |
 Scaffold verifies CLI authentication before every dispatch. If a token has expired, it tells you and provides the command to re-authenticate — it never silently skips a review.
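
Reviewer note: the updated consensus table can be read as a small decision function over the per-channel verdicts for a single issue. The sketch below is an interpretation of the table rows, not code from `mmr`; `ChannelVerdict` and `reconcile` are hypothetical names. Cases the table does not define (contradictory findings, a lone non-compensating P2, or a mix of compensating and regular flags) are deliberately left unclassified rather than guessed.

```typescript
type Severity = "P0" | "P1" | "P2";
type Confidence = "high" | "medium" | "single-source";

interface ChannelVerdict {
  channel: string;        // e.g. "codex", "gemini", "claude"
  flagged?: Severity;     // undefined means the channel approved
  compensating?: boolean; // true for a compensating Claude pass
}

// Returns undefined for scenarios this sketch does not model.
function reconcile(verdicts: ChannelVerdict[]): Confidence | undefined {
  const flags = verdicts.filter((v) => v.flagged !== undefined);
  const [first, ...rest] = flags;
  if (first === undefined) return "high";          // all channels approve
  if (rest.length > 0) return "high";              // 2+ channels flag the same issue
  if (first.compensating) return "single-source";  // compensating-pass finding
  if (first.flagged === "P0") return "high";       // one P0, others approve
  if (first.flagged === "P1") return "medium";     // one P1, others approve
  return undefined;                                // not covered by the table
}
```

The table's "Low" row (channels contradict each other) needs a notion of conflicting findings that a flagged/approved verdict cannot express, which is why it is not returned here.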
@@ -1289,15 +1308,15 @@ At depth 1-3, reviews are Claude-only — still thorough with multiple passes, b
 ### What You Need
 - **Depth 4 or 5** — set during `scaffold init` or override per step
-- **At least one additional CLI** — Codex or Gemini (or both for triple-model review)
-- **Valid authentication** — Scaffold checks before every dispatch and tells you if credentials need refreshing
+- **At least one additional CLI** — Codex, Gemini, and/or Claude CLI. All three dispatched independently as MMR channels when available. Missing Codex or Gemini channels fall back to compensating Claude passes (labeled `[compensating: Codex-equivalent]` / `[compensating: Gemini-equivalent]`, single-source confidence); if Claude itself is unavailable, the review proceeds with the remaining channels — MMR does not compensate for a missing Claude channel.
+- **Valid authentication** — Scaffold checks before every dispatch (run `mmr config test` to pre-flight all three at once) and tells you if credentials need refreshing
 ## Methodology Presets
 Not every project needs all 60 steps. Choose a methodology when you run `scaffold init`:
 ### deep (depth 5)
-All steps enabled. Comprehensive analysis of every angle — domain modeling, ADRs, security review, traceability matrix, the works. At depth 4-5, review steps dispatch to Codex/Gemini CLIs for multi-model validation. Best for complex systems, team projects, or when you want thorough documentation.
+All steps enabled. Comprehensive analysis of every angle — domain modeling, ADRs, security review, traceability matrix, the works. At depth 4-5, review steps dispatch to the three MMR CLI channels (Codex, Gemini, Claude) for multi-model validation, with the Superpowers code-reviewer agent added as a complementary 4th channel via the scaffold wrappers. Best for complex systems, team projects, or when you want thorough documentation.
 ### mvp (depth 1)
 Only 7 critical steps: create-prd, review-prd, user-stories, review-user-stories, tdd, implementation-plan, and implementation-playbook. Minimal ceremony — get to code fast. Best for prototypes, hackathons, or solo projects.
@@ -1359,7 +1378,8 @@ scaffold check add-e2e-testing
 # → Applicable: yes | Platform: web | Brownfield: no | Mode: fresh
 scaffold check automated-pr-review
-# → Applicable: yes | GitHub remote: yes | Available CLIs: codex, gemini | Recommended: local-cli (dual-model)
+# → Applicable: yes | GitHub remote: yes | Available CLIs: codex, gemini, claude | Recommended: local-cli (three-CLI MMR review)
+# (suffix is `(three-CLI MMR review)` / `(two-CLI MMR review)` / `(single-CLI review)` based on how many of codex/gemini/claude are detected)
 scaffold check ai-memory-setup
 # → Rules: no | MCP server: none | Hooks: none | Mode: fresh
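
Reviewer note: the new comment in this hunk says the recommendation suffix depends on how many of codex/gemini/claude are detected. A minimal sketch of that mapping follows; `recommendationSuffix` is a hypothetical name, and returning `null` when no CLI is detected is an assumption, since the README does not show that case.

```typescript
const MMR_CLIS = ["codex", "gemini", "claude"];

function recommendationSuffix(detected: string[]): string | null {
  // Count only the three CLIs the README names; ignore anything else.
  const count = MMR_CLIS.filter((cli) => detected.includes(cli)).length;
  switch (count) {
    case 3:
      return "(three-CLI MMR review)";
    case 2:
      return "(two-CLI MMR review)";
    case 1:
      return "(single-CLI review)";
    default:
      return null; // assumption: no suffix when none of the CLIs is available
  }
}
```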
@@ -1374,7 +1394,7 @@ scaffold dashboard
 ## Knowledge System
-Scaffold ships with 222 domain expertise entries organized in sixteen categories:
+Scaffold ships with 235 domain expertise entries organized in eighteen categories:
 - **core/** (26 entries) — eval craft, testing strategy, domain modeling, API design, database design, system architecture, ADR craft, security best practices, operations, task decomposition, user stories, UX specification, design system tokens, user story innovation, AI memory management, coding conventions, tech stack selection, project structure patterns, task tracking, CLAUDE.md patterns, multi-model review dispatch, review step template, dev environment, git workflow patterns, automated review tooling, vision craft
 - **product/** (5 entries) — PRD craft, PRD innovation, gap analysis, vision craft, vision innovation
@@ -1393,6 +1413,7 @@ Scaffold ships with 222 domain expertise entries organized in sixteen categories
 - **ml/** (12 entries) — training and inference patterns, model types (classical/deep-learning/llm), serving patterns, experiment tracking, model evaluation, MLOps observability
 - **browser-extension/** (12 entries) — Manifest V3, content scripts, service workers, cross-browser compatibility, extension security, store submission
 - **research/** (25 entries) — experiment loop architecture, parameter optimization, overfitting prevention, experiment tracking, security/sandboxing; domain knowledge for quant-finance (backtesting, risk metrics, market data, strategy patterns), ML-research (architecture search, ablation studies, evaluation), and simulation (engine integration, parameter spaces, compute management)
+- **data-science/** (13 entries) — reproducibility, experiment tracking, notebook discipline, model evaluation, data versioning, dev environment (Marimo/Jupyter/Hex), observability, project structure, conventions, requirements, security, testing, architecture
 Each pipeline step declares which knowledge entries it needs in its frontmatter. The assembly engine injects them automatically. Knowledge files with a `## Deep Guidance` section are optimized for the CLI — only the deep guidance content is loaded into the assembled prompt, skipping the summary to avoid redundancy with the prompt text.
@@ -1438,9 +1459,9 @@ These are orthogonal to the pipeline — usable at any time, not tied to pipelin
 | `scaffold run update` | Update Scaffold to the latest version. |
 | `scaffold run dashboard` | Open a visual progress dashboard in your browser. |
 | `scaffold run prompt-pipeline` | Print the full pipeline reference table. |
-| `scaffold run review-code` | Run all 3 code review channels on local code before commit or push. |
-| `scaffold run review-pr` | Run all 3 code review channels (Codex CLI, Gemini CLI, Superpowers) on a PR. |
-| `scaffold run post-implementation-review` | Full 3-channel codebase review after an AI agent completes all tasks — checks requirements coverage, security, architecture alignment, and more. |
+| `scaffold run review-code` | Run all 3 MMR CLI review channels (Codex CLI, Gemini CLI, Claude CLI) on tracked local code (committed branch diff + staged + unstaged — no untracked files) before commit or push, plus Superpowers code-reviewer as a complementary 4th channel. |
+| `scaffold run review-pr` | Run all 3 MMR CLI review channels (Codex CLI, Gemini CLI, Claude CLI) on a PR, plus Superpowers code-reviewer as a complementary 4th channel. Also usable on non-PR targets (staged changes, branch diff, specific files) via `mmr review` directly. |
+| `scaffold run post-implementation-review` | Full codebase review (Codex CLI + Gemini CLI + Superpowers code-reviewer — note: does not currently include Claude CLI as a standard channel) after an AI agent completes all tasks — checks requirements coverage, security, architecture alignment, and more. |
 | `scaffold run spark` | Explore and expand a raw project idea through Socratic questioning, competitive research, and innovation expansion. Produces a `docs/spark-brief.md` that feeds into `create-vision`. At depth 4+, dispatches to external models for independent research and adversarial red-teaming. |
 | `scaffold run session-analyzer` | Analyze Claude Code session logs for patterns and insights. |
@@ -1599,7 +1620,7 @@ All build inputs live under `content/`:
 content/
 ├── pipeline/         # 60 meta-prompts organized by 16 phases (phases 0-15, including build)
 ├── tools/            # 10 tool meta-prompts (stateless, category: tool)
-├── knowledge/        # 222 domain expertise entries (core, product, review, validation, finalization, execution, tools, game, web-app, backend, cli, library, mobile-app, data-pipeline, ml, browser-extension)
+├── knowledge/        # 235 domain expertise entries (core, product, review, validation, finalization, execution, tools, game, web-app, backend, cli, library, mobile-app, data-pipeline, ml, browser-extension, research, data-science)
 ├── methodology/      # 3 YAML presets (deep, mvp, custom)
 └── skills/           # Skill templates with {{markers}} for multi-platform resolution (includes mmr)
 ```

package/content/knowledge/core/automated-review-tooling.md CHANGED

@@ -1,12 +1,12 @@
 ---
 name: automated-review-tooling
-description: Patterns for automated PR code review using AI CLI tools (Codex, Gemini, Claude) — orchestration, reconciliation, compensating passes, and CI integration
-topics: [code-review, automation, codex, gemini, pull-requests, ci-cd, review-tooling]
+description: Patterns for automated code review using AI CLI tools (Codex, Gemini, Claude) — three-CLI MMR orchestration plus Superpowers 4th channel in wrappers, reconciliation, compensating passes, PR + non-PR targets, and CI integration
+topics: [code-review, automation, codex, gemini, claude, pull-requests, non-pr-review, mmr, ci-cd, review-tooling]
 ---
 # Automated Review Tooling
-Automated PR review leverages AI models to provide consistent, thorough code review without manual reviewer bottlenecks. This knowledge covers the local CLI approach (no GitHub Actions), dual-model review patterns, and integration with the PR workflow.
+Automated code review leverages AI models to provide consistent, thorough code review without manual reviewer bottlenecks. This knowledge covers the local CLI approach (no GitHub Actions), the three-channel MMR orchestration (Codex + Gemini + Claude) with the Superpowers code-reviewer added as a complementary 4th channel by the scaffold MMR wrappers, and integration with both PR and non-PR review targets (local code, branch diffs, specific files).
 ## Summary

package/content/knowledge/core/multi-model-review-dispatch.md CHANGED

@@ -48,7 +48,7 @@ When an AI agent dispatches CLI reviews via a tool runner (Claude Code Bash tool
 Before dispatching, verify the model CLI is installed and authenticated using a two-step process that produces distinct statuses for the orchestration layer:
-**Step 1 — Installation check:**
+**Step 1 — Installation check** (all three MMR channels):
 ```bash
 # Codex: not found -> status: "not_installed"
@@ -56,6 +56,9 @@ command -v codex >/dev/null 2>&1
 # Gemini: not found -> status: "not_installed"
 command -v gemini >/dev/null 2>&1
+# Claude CLI: not found -> status: "not_installed"
+command -v claude >/dev/null 2>&1
 ```
 If the CLI is not found, report status `not_installed` to the orchestration layer. Do not prompt the user to install it.
@@ -64,17 +67,23 @@ If the CLI is not found, report status `not_installed` to the orchestration laye
 ```bash
 # Codex: fail -> status: "auth_failed"
-codex login status 2>/dev/null
+codex login status 2>/dev/null                                       # local file probe
 # Gemini: exit 41 -> status: "auth_failed"
-NO_BROWSER=true gemini -p "respond with ok" -o json 2>&1
+NO_BROWSER=true gemini -p "respond with ok" -o json 2>&1             # full LLM round-trip
+# Claude CLI: non-zero -> status: "auth_failed"
+claude -p "respond with ok" 2>/dev/null                              # full LLM round-trip
 ```
+Prefer `mmr config test` as a single-command pre-flight that runs all three checks and emits structured JSON.
 If auth fails, report status `auth_failed` and surface recovery to the user:
 - Codex: "Codex auth expired — run `! codex login` to re-authenticate"
 - Gemini: "Gemini auth expired — run `! gemini -p \"hello\"` to re-authenticate"
+- Claude CLI: "Claude CLI auth expired — run `! claude login` to re-authenticate"
-If auth check times out (~5 seconds), retry once. If still failing, report `timeout`.
+Auth-check timeouts: Codex's check is a local file probe so the default is 5s; Gemini's and Claude's are full LLM round-trips and routinely take 9-14s, so MMR's built-in defaults use 20s for those two. If a check times out, retry once. If still failing, report `timeout`.
 If auth succeeds, report `ready` and proceed to dispatch.
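The two-step check above can be sketched in Python. This is a minimal illustration, not MMR's implementation: the `AUTH_CHECKS` table and the `preflight` function are hypothetical names, while the commands, timeouts, retry-once rule, and status strings come from the text above. One simplifying assumption is that any non-zero exit is reported as `auth_failed`, whereas the text singles out exit code 41 for Gemini.

```python
import os
import shutil
import subprocess

# Commands and timeouts follow the text above; the table itself is a
# hypothetical structure for illustration, not part of MMR.
AUTH_CHECKS = {
    "codex": (["codex", "login", "status"], 5, {}),
    "gemini": (["gemini", "-p", "respond with ok", "-o", "json"], 20,
               {"NO_BROWSER": "true"}),
    "claude": (["claude", "-p", "respond with ok"], 20, {}),
}


def preflight(channel, which=shutil.which, run=subprocess.run):
    """Return one of: not_installed, auth_failed, timeout, ready."""
    argv, timeout_s, extra_env = AUTH_CHECKS[channel]
    # Step 1: installation check (equivalent to `command -v <cli>`).
    if which(argv[0]) is None:
        return "not_installed"
    # Step 2: auth check, retried once if it times out.
    env = {**os.environ, **extra_env}
    for _attempt in range(2):
        try:
            result = run(argv, env=env, timeout=timeout_s, capture_output=True)
        except subprocess.TimeoutExpired:
            continue
        # Assumption: any non-zero exit is treated as an auth failure.
        return "ready" if result.returncode == 0 else "auth_failed"
    return "timeout"
```

The `which` and `run` parameters default to the real stdlib functions and exist only so the status mapping can be exercised without the CLIs installed.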
 **Post-dispatch terminal states:**

package/content/knowledge/data-science/README.md ADDED

@@ -0,0 +1,23 @@
+# `data-science/` knowledge
+Solo / small-team data-science domain knowledge injected into universal pipeline
+steps by `content/methodology/data-science-overlay.yml`.
+## Lockstep pairs with `ml/`
+Five documents here mirror documents in `content/knowledge/ml/`. The two
+overlays never compose at runtime (a user picks exactly one project type), but
+edits to one side of a pair should trigger review of the other to prevent
+recommendation drift over time:
+| `data-science/`                         | `ml/`                            |
+| --------------------------------------- | -------------------------------- |
+| `data-science-experiment-tracking.md`   | `ml-experiment-tracking.md`      |
+| `data-science-model-evaluation.md`      | `ml-model-evaluation.md`         |
+| `data-science-observability.md`         | `ml-observability.md`            |
+| `data-science-requirements.md`          | `ml-requirements.md`             |
+| `data-science-conventions.md`           | `ml-conventions.md`              |
+`ml/` targets production training and serving systems. `data-science/` targets
+solo / small-team analytics and prototyping. Tool picks may diverge where the
+audience justifies it (e.g. MLflow self-hosted vs managed W&B).

package/content/knowledge/data-science/data-science-architecture.md ADDED

@@ -0,0 +1,163 @@
+---
+name: data-science-architecture
+description: Local-first architecture for solo and small-team data science — notebook exploration, src/ promotion, idempotent entrypoint pipelines, Polars vs Pandas choice, and artifact separation
+topics: [data-science, architecture, polars, pandas, notebook-promotion]
+---
+"Architecture" sounds heavy for a single analyst opening a notebook, but it is the one decision that separates work a collaborator can rerun tomorrow from a pile of ad-hoc scripts that only you can coax back to life. Solo DS work is local-first, reproducibility-first, and almost never needs Airflow or a Kubernetes cluster. What it needs is a coherent shape that scales from "a single notebook" to "a pipeline a teammate can clone and run" — and a clear story about where raw data, intermediate data, models, and reports each live. This doc lays out that shape and the small set of conventions that make it hold together.
+## Summary
+Architect a solo DS project as layers: exploratory notebooks on top, reusable functions in `src/`, unit tests in `tests/`, and a thin entrypoint module that composes those functions into a reproducible run. Use Polars for datasets >1 GB or >10M rows and Pandas for everything smaller where scikit-learn / seaborn compatibility matters. Runs happen via `uv run python -m <project>.pipeline`, with no scheduler needed. Pipelines are idempotent functions that move data from `data/raw/` to `data/interim/` to `data/processed/`, emitting models to `models/` and reports to `reports/`. This shape deliberately does not solve distributed data, production serving, or real-time inference; when those become real, graduate to Prefect / Dagster and cross over to `ml-serving-patterns.md`.
+## Deep Guidance
+### The layered shape
+The entire architecture is five layers, each with a single responsibility:
+```
+┌──────────────────────────────────────────────────────────────┐
+│ notebooks/               exploration, narrative, charts       │
+│   ↓  (promote stable code)                                    │
+├──────────────────────────────────────────────────────────────┤
+│ src/<project>/           typed, importable functions          │
+│   ↓  (test every function you ship)                           │
+├──────────────────────────────────────────────────────────────┤
+│ tests/                   pytest smoke + unit tests            │
+│   ↓  (functions compose into a run)                           │
+├──────────────────────────────────────────────────────────────┤
+│ src/pipeline.py          entrypoint: load→features→train→save │
+│   ↓  (run produces artifacts)                                 │
+├──────────────────────────────────────────────────────────────┤
+│ data/ models/ reports/   outputs, gitignored or DVC-tracked   │
+└──────────────────────────────────────────────────────────────┘
+```
+Read top-to-bottom it is the promotion path; read bottom-to-top it is the dependency graph. A notebook may import from `src/` but `src/` must never import from a notebook. Tests depend only on `src/`. The entrypoint (`pipeline.py`) is itself a module under `src/`, not a loose script at the repo root — keeping it importable lets you exercise it end-to-end in tests with a tiny fixture dataset.
+### Polars vs Pandas
+Pick the DataFrame library based on data size and ecosystem needs, not on what's trendy. Rule of thumb:
+| Dimension            | Pandas                                | Polars                                  |
+|----------------------|---------------------------------------|-----------------------------------------|
+| Rows                 | <10M comfortably                      | 10M–1B on a single machine              |
+| In-memory size       | <1 GB                                 | 1 GB – ~RAM/2                           |
+| Execution            | Eager, single-threaded                | Lazy + multi-threaded, Arrow-native     |
+| Ecosystem            | scikit-learn, seaborn, plotly, statsmodels | Native; interop via `.to_pandas()` |
+| API stability        | Mature, huge Stack Overflow corpus    | Younger, faster-moving                  |
+**Default to Pandas** when you are in sklearn / statsmodels / seaborn territory with small-to-medium data — ecosystem friction is the dominant cost. **Reach for Polars** when you are doing heavy group-bys, joins, or window functions on datasets where Pandas starts swapping or takes minutes per cell. The two libraries express the same group-by almost identically:
+```python
+import polars as pl
+# Pandas: here `df` is a pandas.DataFrame
+(df
+ .groupby("customer_id")
+ .agg(total_spend=("amount", "sum"), tx_count=("amount", "count"))
+ .reset_index())
+# Polars: here `df` is a polars.DataFrame; the lazy query only runs at .collect()
+(df.lazy()
+ .group_by("customer_id")
+ .agg(pl.col("amount").sum().alias("total_spend"),
+      pl.col("amount").count().alias("tx_count"))
+ .collect())
+```
+Mixing is fine: load with Polars, do the fast aggregation, then `.to_pandas()` right before feeding a scikit-learn estimator. Avoid the trap of half-converting the codebase — pick one as the default for a given project and document it.
+### Notebook to pipeline promotion
+Every piece of code starts life in a notebook. The discipline is knowing when to move it:
+1. You copy-paste a cell into a second notebook → promote.
+2. A transformation has a non-trivial branch (try/except, conditional handling) → promote.
+3. You want to unit-test it → promote (you can't test a notebook cell cleanly).
+Promotion is a four-step move: extract the cell into `src/<project>/features/engineer.py` as a typed function, add a pytest in `tests/`, replace the notebook cell with an `import`, and enable autoreload (`%load_ext autoreload`, then `%autoreload 2`) so subsequent edits live-reload without a kernel restart.
+```python
+# src/<project>/features/engineer.py
+import polars as pl
+def add_tenure_bucket(df: pl.DataFrame, *, today: str) -> pl.DataFrame:
+    """Bucket customers by days since signup into short / medium / long tenure."""
+    return df.with_columns(
+        ((pl.lit(today).str.to_date() - pl.col("signup_date")).dt.total_days())
+        .alias("tenure_days")
+    ).with_columns(
+        pl.when(pl.col("tenure_days") < 90).then(pl.lit("short"))
+          .when(pl.col("tenure_days") < 365).then(pl.lit("medium"))
+          .otherwise(pl.lit("long"))
+          .alias("tenure_bucket")
+    )
+```
+The notebook now reads `from <project>.features.engineer import add_tenure_bucket` and the function is covered by `tests/test_engineer.py` with a six-row fixture. This is the single most important habit in a DS codebase — see `data-science-project-structure.md` for the directory layout it slots into.
+### Idempotent pipeline entrypoints
+The pipeline is a thin composition layer: one function per stage, each one idempotent (same inputs → same outputs, safe to rerun). It lives at `src/<project>/pipeline.py` and exposes a `run(cfg)` function that a small `main()` CLI wraps:
+```python
+# src/<project>/pipeline.py
+import argparse
+import yaml  # PyYAML
+from pathlib import Path
+from <project>.ingestion import load_transactions
+from <project>.validation import validate_schema
+from <project>.features.engineer import build_features
+from <project>.training import train_model
+from <project>.evaluation import evaluate
+from <project>.io import save_model, save_report
+def run(cfg: dict) -> None:
+    run_id = cfg["run_name"]
+    raw = load_transactions(cfg["data"]["raw_path"])
+    validate_schema(raw, cfg["data"]["schema"])
+    processed = build_features(raw, cfg["features"])
+    processed.write_parquet(Path(cfg["data"]["processed_path"]))
+    model, metrics = train_model(processed, cfg["model"])
+    report = evaluate(model, processed, cfg["evaluation"])
+    save_model(model, f"models/{run_id}.joblib")
+    save_report(report, f"reports/{run_id}.html")
+def main() -> None:
+    ap = argparse.ArgumentParser()
+    ap.add_argument("--config", required=True, type=Path)
+    args = ap.parse_args()
+    run(yaml.safe_load(args.config.read_text()))
+if __name__ == "__main__":
+    main()
+```
+Invoke it with `uv run python -m <project>.pipeline --config configs/baseline.yaml`. Idempotence means: each stage writes to a deterministic path based on the config, and re-running over existing outputs is a no-op (or an overwrite of identical content). That property is what lets a teammate — or future-you — rerun the pipeline confidently without inspecting every intermediate.
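One way to get the "re-running is a no-op" property is sketched below, using only the standard library. The names `fingerprint` and `run_stage`, and the `.fingerprint` sidecar file, are hypothetical and not part of the pipeline above; the sketch assumes each stage reads one input file and writes one output file.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def fingerprint(input_path: Path, params: dict) -> str:
    """Hash the input bytes together with the stage parameters."""
    digest = hashlib.sha256(input_path.read_bytes())
    digest.update(json.dumps(params, sort_keys=True).encode())
    return digest.hexdigest()


def run_stage(input_path: Path, output_path: Path, params: dict,
              transform: Callable[[bytes, dict], bytes]) -> bool:
    """Run `transform` unless an up-to-date output already exists.

    Returns True if the stage did work, False if it was a no-op.
    """
    # Hypothetical sidecar file recording what the output was built from.
    marker = output_path.with_name(output_path.name + ".fingerprint")
    current = fingerprint(input_path, params)
    if output_path.exists() and marker.exists() and marker.read_text() == current:
        return False  # same input and params: nothing to do
    output_path.parent.mkdir(parents=True, exist_ok=True)
    output_path.write_bytes(transform(input_path.read_bytes(), params))
    marker.write_text(current)
    return True
```

Because the output path and the fingerprint depend only on the input file and the config parameters, a second run with unchanged inputs skips the work, and a run with changed parameters rebuilds the output. For inputs too large to read in one call, the hash would need to be computed in chunks.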
+### Where outputs go
+Artifacts follow a strict directory contract so a run never scatters files:
+| Artifact               | Path                                | Notes                                    |
+|------------------------|-------------------------------------|------------------------------------------|
+| Immutable source data  | `data/raw/`                         | Never written to after initial ingest    |
+| Cached partial transforms | `data/interim/`                  | Safe to delete; regenerable from raw     |
+| Analysis-ready datasets | `data/processed/`                  | Consumed by training                     |
+| Predictions            | `data/processed/predictions/`       | Keeps inference outputs alongside data   |
+| Trained models         | `models/<run_id>.joblib`            | DVC or git-lfs pointer tracked           |
+| Rendered reports       | `reports/<run_id>.html`             | HTML / markdown summaries                |
+| Figures                | `reports/figures/<run_id>/`         | PNG / SVG charts                         |
+The rule: **paths come from config, never hard-coded in code**. `cfg["output"]["model_path"]` lives in the YAML; `"models/baseline_v1.joblib"` never appears as a string literal inside `training.py`. That is what lets a single pipeline module serve every run variant.
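A minimal sketch of the paths-from-config rule follows. `EXAMPLE_CFG` stands in for what `yaml.safe_load` would return; only the `model_path` key is named in the text, so `report_path`, `figures_dir`, the `{run_id}` placeholder convention, and the `resolve_outputs` helper are all assumptions made for illustration.

```python
from pathlib import Path

# Stand-in for a parsed YAML config. Only `model_path` appears in the text;
# the other keys and the {run_id} placeholder are hypothetical.
EXAMPLE_CFG = {
    "run_name": "baseline_v1",
    "output": {
        "model_path": "models/{run_id}.joblib",
        "report_path": "reports/{run_id}.html",
        "figures_dir": "reports/figures/{run_id}/",
    },
}


def resolve_outputs(cfg: dict) -> dict[str, Path]:
    """Expand every output template in the config for this run."""
    run_id = cfg["run_name"]
    return {key: Path(template.format(run_id=run_id))
            for key, template in cfg["output"].items()}
```

With this shape, stage code asks for `resolve_outputs(cfg)["model_path"]` and never spells out a concrete file name, so switching run variants means switching config files only.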
+### When to outgrow this
+This architecture covers the 0-to-100GB, one-to-three-contributors slot. Signals you are leaving that slot:
+- Data no longer fits on a laptop (>100 GB, or streaming sources) → Spark, DuckDB+S3, or a warehouse-side pipeline.
+- You need scheduled / triggered runs with retries, alerting, observability → Prefect, Dagster, or Airflow.
+- The model must serve real-time predictions with SLA → cross over to `ml-serving-patterns.md` for online inference, feature stores, and the training-serving split.
+- Multiple people are editing the pipeline concurrently → promote `configs/` to a registry, add a model registry (MLflow), and start writing ADRs under `docs/adr/`.
+- The team wants experiment tracking beyond a CSV of metrics → MLflow Tracking or Weights & Biases.
+Do not preemptively adopt any of these. Installing Dagster for a weekly notebook is a classic small-team failure mode — the operational tax (scheduler, DB, UI, auth) dwarfs the benefit. Graduate one piece at a time, and only when the pain is concrete. The layered shape above is deliberately the smallest coherent thing; resist making it bigger until the evidence demands it.