@zigrivers/scaffold 3.22.0 → 3.24.0
- package/README.md +44 -23
- package/content/knowledge/core/automated-review-tooling.md +3 -3
- package/content/knowledge/core/multi-model-review-dispatch.md +13 -4
- package/content/knowledge/data-science/README.md +23 -0
- package/content/knowledge/data-science/data-science-architecture.md +163 -0
- package/content/knowledge/data-science/data-science-conventions.md +233 -0
- package/content/knowledge/data-science/data-science-data-versioning.md +198 -0
- package/content/knowledge/data-science/data-science-dev-environment.md +159 -0
- package/content/knowledge/data-science/data-science-experiment-tracking.md +194 -0
- package/content/knowledge/data-science/data-science-model-evaluation.md +160 -0
- package/content/knowledge/data-science/data-science-notebook-discipline.md +170 -0
- package/content/knowledge/data-science/data-science-observability.md +161 -0
- package/content/knowledge/data-science/data-science-project-structure.md +178 -0
- package/content/knowledge/data-science/data-science-reproducibility.md +164 -0
- package/content/knowledge/data-science/data-science-requirements.md +151 -0
- package/content/knowledge/data-science/data-science-security.md +151 -0
- package/content/knowledge/data-science/data-science-testing.md +183 -0
- package/content/knowledge/ml/README.md +10 -0
- package/content/methodology/data-science-overlay.yml +39 -0
- package/content/pipeline/build/multi-agent-resume.md +7 -6
- package/content/pipeline/build/multi-agent-start.md +7 -6
- package/content/pipeline/build/single-agent-resume.md +7 -6
- package/content/pipeline/build/single-agent-start.md +7 -6
- package/content/pipeline/environment/automated-pr-review.md +79 -27
- package/content/skills/mmr/SKILL.md +72 -2
- package/content/skills/scaffold-runner/SKILL.md +65 -19
- package/content/tools/review-code.md +74 -16
- package/content/tools/review-pr.md +25 -6
- package/dist/cli/commands/check.d.ts.map +1 -1
- package/dist/cli/commands/check.js +28 -17
- package/dist/cli/commands/check.js.map +1 -1
- package/dist/config/schema.d.ts +672 -126
- package/dist/config/schema.d.ts.map +1 -1
- package/dist/config/schema.js +8 -0
- package/dist/config/schema.js.map +1 -1
- package/dist/config/schema.test.js +2 -2
- package/dist/config/schema.test.js.map +1 -1
- package/dist/config/validators/data-science.d.ts +4 -0
- package/dist/config/validators/data-science.d.ts.map +1 -0
- package/dist/config/validators/data-science.js +15 -0
- package/dist/config/validators/data-science.js.map +1 -0
- package/dist/config/validators/index.d.ts.map +1 -1
- package/dist/config/validators/index.js +2 -0
- package/dist/config/validators/index.js.map +1 -1
- package/dist/core/assembly/knowledge-loader.d.ts.map +1 -1
- package/dist/core/assembly/knowledge-loader.js +6 -0
- package/dist/core/assembly/knowledge-loader.js.map +1 -1
- package/dist/core/assembly/knowledge-loader.test.js +34 -0
- package/dist/core/assembly/knowledge-loader.test.js.map +1 -1
- package/dist/e2e/project-type-overlays.test.js +73 -0
- package/dist/e2e/project-type-overlays.test.js.map +1 -1
- package/dist/project/adopt.d.ts.map +1 -1
- package/dist/project/adopt.js +3 -1
- package/dist/project/adopt.js.map +1 -1
- package/dist/project/detectors/coverage.test.d.ts +2 -0
- package/dist/project/detectors/coverage.test.d.ts.map +1 -0
- package/dist/project/detectors/coverage.test.js +78 -0
- package/dist/project/detectors/coverage.test.js.map +1 -0
- package/dist/project/detectors/data-science.d.ts +4 -0
- package/dist/project/detectors/data-science.d.ts.map +1 -0
- package/dist/project/detectors/data-science.js +32 -0
- package/dist/project/detectors/data-science.js.map +1 -0
- package/dist/project/detectors/data-science.test.d.ts +2 -0
- package/dist/project/detectors/data-science.test.d.ts.map +1 -0
- package/dist/project/detectors/data-science.test.js +62 -0
- package/dist/project/detectors/data-science.test.js.map +1 -0
- package/dist/project/detectors/disambiguate.d.ts +2 -0
- package/dist/project/detectors/disambiguate.d.ts.map +1 -1
- package/dist/project/detectors/disambiguate.js +3 -2
- package/dist/project/detectors/disambiguate.js.map +1 -1
- package/dist/project/detectors/disambiguate.test.js +10 -1
- package/dist/project/detectors/disambiguate.test.js.map +1 -1
- package/dist/project/detectors/index.d.ts.map +1 -1
- package/dist/project/detectors/index.js +2 -0
- package/dist/project/detectors/index.js.map +1 -1
- package/dist/project/detectors/library.d.ts.map +1 -1
- package/dist/project/detectors/library.js +1 -0
- package/dist/project/detectors/library.js.map +1 -1
- package/dist/project/detectors/resolve-detection.test.js +31 -0
- package/dist/project/detectors/resolve-detection.test.js.map +1 -1
- package/dist/project/detectors/types.d.ts +6 -2
- package/dist/project/detectors/types.d.ts.map +1 -1
- package/dist/project/detectors/types.js.map +1 -1
- package/dist/types/config.d.ts +8 -1
- package/dist/types/config.d.ts.map +1 -1
- package/dist/wizard/copy/core.d.ts.map +1 -1
- package/dist/wizard/copy/core.js +4 -0
- package/dist/wizard/copy/core.js.map +1 -1
- package/dist/wizard/copy/data-science.d.ts +3 -0
- package/dist/wizard/copy/data-science.d.ts.map +1 -0
- package/dist/wizard/copy/data-science.js +15 -0
- package/dist/wizard/copy/data-science.js.map +1 -0
- package/dist/wizard/copy/index.d.ts.map +1 -1
- package/dist/wizard/copy/index.js +2 -0
- package/dist/wizard/copy/index.js.map +1 -1
- package/dist/wizard/copy/types.d.ts +5 -1
- package/dist/wizard/copy/types.d.ts.map +1 -1
- package/dist/wizard/copy/types.test-d.js +7 -0
- package/dist/wizard/copy/types.test-d.js.map +1 -1
- package/dist/wizard/questions.d.ts +2 -1
- package/dist/wizard/questions.d.ts.map +1 -1
- package/dist/wizard/questions.js +9 -1
- package/dist/wizard/questions.js.map +1 -1
- package/dist/wizard/questions.test.js +14 -0
- package/dist/wizard/questions.test.js.map +1 -1
- package/dist/wizard/wizard.d.ts.map +1 -1
- package/dist/wizard/wizard.js +1 -0
- package/dist/wizard/wizard.js.map +1 -1
- package/package.json +1 -1
- package/skills/mmr/SKILL.md +72 -2
- package/skills/scaffold-runner/SKILL.md +65 -19
package/README.md CHANGED
@@ -29,7 +29,7 @@ Either way, Scaffold constructs the prompt and the target AI tool does the work.
 
 **Assembly engine** — At execution time, Scaffold builds a 7-section prompt from: system metadata, the meta-prompt, knowledge base entries, project context (artifacts from prior steps), methodology settings, layered instructions, and depth-specific execution guidance.
 
-**Knowledge base** —
+**Knowledge base** — 235 domain expertise entries in `content/knowledge/` organized in eighteen categories (core, product, review, validation, finalization, execution, tools, game, web-app, backend, cli, library, mobile-app, data-pipeline, ml, browser-extension, research, data-science) covering testing strategy, domain modeling, API design, security best practices, eval craft, TDD execution, task claiming, worktree management, release management, rendering strategies, data stores, CLI patterns, game engines, library bundling, mobile deployment, batch and streaming pipelines, model training and serving, browser extension manifests and service workers, data-science reproducibility and notebook discipline, and more. These get injected into prompts based on each step's `knowledge-base` frontmatter field. Knowledge files with a `## Deep Guidance` section are optimized for CLI assembly — only the deep guidance content is loaded, avoiding redundancy with the prompt text. Teams can add project-local overrides in `.scaffold/knowledge/` that layer on top of the global entries.
 
 **Methodology presets** — Three built-in presets control which steps run and how deep the analysis goes:
 - **deep** (depth 5) — all steps enabled, exhaustive analysis
@@ -38,7 +38,7 @@ Either way, Scaffold constructs the prompt and the target AI tool does the work.
 
 **Depth scale** (1-5) — Controls how thorough each step's output is, from "focus on the core deliverable" (1) to "explore all angles, tradeoffs, and edge cases" (5). Depth resolves with 4-level precedence: CLI flag > step override > custom default > preset default.
 
-**Multi-model validation** — At depth 4-5,
+**Multi-model validation** — At depth 4-5, review and validation steps can dispatch independent reviews to the three MMR CLI channels (Codex, Gemini, Claude) via the `mmr` CLI, plus the Superpowers code-reviewer agent as a complementary 4th channel on wrapper invocations (`scaffold run review-pr`, `scaffold run review-code`). The MMR-backed wrappers are the preferred path; some older depth-5 validation steps still dispatch Codex/Gemini directly via the `multi-model-dispatch` skill (migration in progress). Multiple independent models catch more blind spots than one. Findings are reconciled by confidence level (multiple channels agree = high confidence, single channel P0 = still actionable). When Codex or Gemini is unavailable, a compensating Claude self-review pass runs in its place (labeled `[compensating: Codex-equivalent]` or `[compensating: Gemini-equivalent]`, single-source confidence); there is no compensating pass when Claude itself is unavailable — the review simply proceeds with the remaining channels. CLI commands must always run in the foreground — background execution produces empty output. `mmr review` also supports non-PR targets (staged changes, branch diff, specific files) — see the [Multi-Model Review](#multi-model-review) section.
 
 **State management** — Pipeline progress is tracked in `.scaffold/state.json` with atomic file writes and crash recovery. An advisory lock prevents concurrent runs. Decisions are logged to an append-only `decisions.jsonl`. Pressing Ctrl+C during any command exits cleanly with an informative message — no stack traces, no orphaned locks, no corrupted state.
 
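As a rough illustration of the 4-level depth precedence stated in the hunk above (CLI flag > step override > custom default > preset default), the resolution could be sketched as below. The `DepthSources` shape and the `resolveDepth` name are hypothetical and not taken from Scaffold's source; only the precedence order comes from the README text.

```typescript
// Hypothetical sketch: names and shapes are illustrative, not Scaffold's API.
type Depth = 1 | 2 | 3 | 4 | 5;

interface DepthSources {
  cliFlag?: Depth;       // value passed on the command line, if any
  stepOverride?: Depth;  // per-step override, if any
  customDefault?: Depth; // default from a custom methodology, if any
  presetDefault: Depth;  // always present: the preset's own depth
}

// Return the first defined value, walking from highest to lowest precedence.
function resolveDepth(sources: DepthSources): Depth {
  return (
    sources.cliFlag ??
    sources.stepOverride ??
    sources.customDefault ??
    sources.presetDefault
  );
}

// No CLI flag here, so the step override wins over both defaults.
console.log(resolveDepth({ stepOverride: 2, customDefault: 4, presetDefault: 5 })); // prints 2
```

Using `??` rather than `||` keeps the chain correct even if a future source could legitimately hold a falsy value.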
@@ -368,7 +368,7 @@ Every `scaffold init` wizard question can be answered via CLI flags, making scaf
 | `--depth` | 1-5 | Custom methodology depth (requires `--methodology custom`) |
 | `--adapters` | comma-sep | AI adapters: claude-code, codex, gemini |
 | `--traits` | comma-sep | Project traits: web, mobile |
-| `--project-type` | string | web-app, mobile-app, backend, cli, library, game, data-pipeline, ml, browser-extension, research |
+| `--project-type` | string | web-app, mobile-app, backend, cli, library, game, data-pipeline, ml, browser-extension, research, data-science |
 | `--auto` | boolean | Non-interactive mode (uses Zod defaults for unset flags) |
 
 #### Web-App Config Flags (require `--project-type web-app` or auto-set it)
@@ -457,6 +457,14 @@ Every `scaffold init` wizard question can be answered via CLI flags, making scaf
 | `--research-domain` | string | none, quant-finance, ml-research, simulation |
 | `--research-tracking` | boolean | `--research-tracking` / `--no-research-tracking` |
 
+#### Data Science Config (`--project-type data-science`)
+
+Data science has one forward-compatible config field in the schema, defaulted automatically — no CLI flags are needed in v1:
+
+| Config field | Values | Notes |
+|------|------|--------|
+| `dataScienceConfig.audience` | `solo` | Default (applied by the wizard and `--auto`). Covers the DS-1 audience (solo / small-team, local-first, prototyping). A future DS-2 release will extend the enum with `'platform'` (platform-engineered / larger-team DS) additively, without breaking existing configs. |
+
 #### Game Config Flags (require `--project-type game` or auto-set it)
 
 | Flag | Type | Values |
@@ -506,7 +514,7 @@ during assembly.
 
 - **Flag > auto > interactive**: Flags always take highest precedence. `--auto --engine unreal` uses defaults for everything except engine.
 - **Partial flags + interactive**: Provide some flags and the wizard asks only the remaining questions. `scaffold init --project-type game --engine unreal` prompts interactively for multiplayer, platforms, etc.
-- **Type-specific flags auto-set project type**: `--engine unity` automatically sets `--project-type game`, `--web-rendering ssr` sets `--project-type web-app`, `--backend-api-style rest` sets `--project-type backend`, `--cli-interactivity hybrid` sets `--project-type cli`, `--lib-visibility public` sets `--project-type library`, `--mobile-platform ios` sets `--project-type mobile-app`, `--pipeline-processing batch` sets `--project-type data-pipeline`, `--ml-phase training` sets `--project-type ml`, `--ext-manifest 3` sets `--project-type browser-extension`, `--research-driver code-driven` sets `--project-type research`. Error if conflicting type.
+- **Type-specific flags auto-set project type**: `--engine unity` automatically sets `--project-type game`, `--web-rendering ssr` sets `--project-type web-app`, `--backend-api-style rest` sets `--project-type backend`, `--cli-interactivity hybrid` sets `--project-type cli`, `--lib-visibility public` sets `--project-type library`, `--mobile-platform ios` sets `--project-type mobile-app`, `--pipeline-processing batch` sets `--project-type data-pipeline`, `--ml-phase training` sets `--project-type ml`, `--ext-manifest 3` sets `--project-type browser-extension`, `--research-driver code-driven` sets `--project-type research`. Error if conflicting type. (Data science currently has no dedicated CLI flags — pass `--project-type data-science` directly.)
 - **Cannot mix flag families**: `--web-rendering ssr --backend-api-style rest` is an error. Each flag family (`--web-*`, `--backend-*`, `--cli-*`, `--lib-*`, `--mobile-*`, `--pipeline-*`, `--ml-*`, `--research-*`, `--ext-*`, game) is exclusive.
 - **Validation**: `--depth` requires `--methodology custom`. `--online-services` requires `--multiplayer online` or `hybrid`. SSR/hybrid rendering is incompatible with static deploy target. Session auth requires server state (not static). ML inference projects must specify a serving pattern. Browser extensions must declare at least one capability (UI surface, content script, or background worker). Notebook-driven research cannot be fully autonomous.
 
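The "type-specific flags auto-set project type" and "cannot mix flag families" rules in the hunk above can be pictured with a small lookup. This is a minimal sketch under stated assumptions: the flag-to-type pairs are the ones listed in the README, while `FLAG_TO_TYPE`, `inferProjectType`, and the error wording are hypothetical and not Scaffold's actual implementation.

```typescript
// Hypothetical sketch; the pairs below are copied from the README bullet,
// everything else (names, error text) is illustrative.
const FLAG_TO_TYPE: Record<string, string> = {
  "--engine": "game",
  "--web-rendering": "web-app",
  "--backend-api-style": "backend",
  "--cli-interactivity": "cli",
  "--lib-visibility": "library",
  "--mobile-platform": "mobile-app",
  "--pipeline-processing": "data-pipeline",
  "--ml-phase": "ml",
  "--ext-manifest": "browser-extension",
  "--research-driver": "research",
};

// Derive the project type from the flags that were passed, starting from an
// explicit --project-type value if one was given. Throws on a conflict.
function inferProjectType(flags: string[], explicitType?: string): string | undefined {
  let inferred = explicitType;
  for (const flag of flags) {
    const implied: string | undefined = FLAG_TO_TYPE[flag];
    if (implied === undefined) continue; // not a type-specific flag
    if (inferred !== undefined && inferred !== implied) {
      throw new Error(`${flag} implies "${implied}", which conflicts with "${inferred}"`);
    }
    inferred = implied;
  }
  return inferred;
}
```

Under this sketch, `inferProjectType(["--engine"])` yields `"game"`, and a data-science project has to pass its type explicitly because no entry in the table maps to it.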
@@ -599,6 +607,9 @@ scaffold init --auto --methodology deep --project-type research \
 --research-driver config-driven --research-interaction checkpoint-gated \
 --research-domain ml-research
 
+# Solo / small-team data science project (reproducibility-first)
+scaffold init --auto --methodology deep --project-type data-science
+
 # Multiplayer mobile game with Unity
 scaffold init --project-type game --methodology deep --auto \
 --engine unity --multiplayer online --target-platforms ios,android \
@@ -625,7 +636,7 @@ Scaffold supports **project-type overlays** — domain-specific knowledge and pi
 
 - **Injects domain knowledge** into existing pipeline steps (e.g., SSR caching strategies into `tech-stack`, API pagination patterns into `coding-standards`)
 
-The game overlay additionally adjusts step enablement, remaps artifact references, and adds dependency overrides (because game development has fundamentally different artifacts). The web-app, backend, CLI, library, mobile-app, data-pipeline, ML, browser-extension, and
+The game overlay additionally adjusts step enablement, remaps artifact references, and adds dependency overrides (because game development has fundamentally different artifacts). The web-app, backend, CLI, library, mobile-app, data-pipeline, ML, browser-extension, research, and data-science overlays are **knowledge-only** — they inject domain expertise into existing steps without changing which steps run or how they depend on each other. The research type additionally supports **domain sub-overlays** (quant-finance, ml-research, simulation) that layer domain-specific knowledge on top of the core research overlay, and the backend type supports a `fintech` sub-overlay. Both research and backend accept `domain` as either a single string or an array (e.g. `domain: ['quant-finance', 'simulation']`) for stacking multiple sub-overlays; the wizard and CLI flags remain single-select in v1, so multi-domain stacking requires hand-editing `.scaffold/config.yml`.
 
 Overlays are composable with methodology presets. An MVP web-app gets fewer steps at lower depth; a deep backend project gets exhaustive analysis of every architectural decision.
 
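Because the hunk above says `domain` may be either a single string or an array, a consumer of the config would likely normalize it to a list before stacking sub-overlays. The helper below is a sketch only: `normalizeDomains` is a hypothetical name, and treating the value `none` as "no sub-overlay" is an assumption based on the `--research-domain` values listed elsewhere in the README.

```typescript
// Hypothetical helper, not taken from Scaffold's source.
type DomainInput = string | string[] | undefined;

// Turn a single-string or array `domain` value into an ordered, de-duplicated list.
function normalizeDomains(domain: DomainInput): string[] {
  if (domain === undefined) return [];
  const list = Array.isArray(domain) ? domain : [domain];
  const result: string[] = [];
  for (const d of list) {
    // Assumption: "none" means no sub-overlay, so it is dropped.
    if (d !== "none" && result.indexOf(d) === -1) result.push(d);
  }
  return result;
}
```

With this shape, `normalizeDomains("quant-finance")` and `normalizeDomains(["quant-finance"])` produce the same one-element list, so downstream overlay code only has to handle arrays.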
@@ -640,6 +651,7 @@ Overlays are composable with methodology presets. An MVP web-app gets fewer step
 | `ml` | `ml-overlay.yml` | 12 entries (architecture, training and serving patterns, experiment tracking, model evaluation, observability, testing, security) | Project phase, model type, serving pattern, experiment tracking |
 | `browser-extension` | `browser-extension-overlay.yml` | 12 entries (architecture, manifest configuration, service workers, content scripts, cross-browser, store submission, testing, security) | Manifest version, UI surfaces, content script, background worker |
 | `research` | `research-overlay.yml` + domain sub-overlays | 25 entries (experiment loop, tracking, overfitting prevention, backtesting, risk metrics, architecture search, simulation) | Experiment driver, interaction mode, domain, experiment tracking |
+| `data-science` | `data-science-overlay.yml` | 13 entries (reproducibility, experiment tracking, notebook discipline, model evaluation, data versioning, dev environment, observability, project structure, conventions, requirements, security, testing, architecture) | Audience (`solo` default; `platform` reserved for DS-2) |
 | `game` | `game-overlay.yml` | 24 entries (engines, networking, audio, VR/AR, economy, save systems, certification) | Engine, multiplayer, platforms, economy, narrative, and 6 more |
 
 ### Game Development
@@ -725,7 +737,7 @@ These answers control which conditional steps activate. A single-player puzzle g
 
 #### Multi-type Detection
 
-`scaffold adopt` detects
+`scaffold adopt` detects 11 project types from manifest files and directory layouts:
 
 | Type | Key Signals |
 |------|-------------|
@@ -739,6 +751,7 @@ These answers control which conditional steps activate. A single-player puzzle g
 | `ml` | `training/`/`models/` dirs, PyTorch/TensorFlow deps, MLflow configs |
 | `browser-extension` | `manifest.json` with `manifest_version` field |
 | `research` | `program.md` + `results.tsv`, backtest/strategy files with trading deps, optimization deps + experiment dirs, simulation framework deps |
+| `data-science` | Marimo signals required (`marimo` dep or `.marimo.toml`); DVC (`dvc.yaml`, `.dvc/config`, `dvc` py dep) is supplementary evidence only. Low-tier; defers to `ml` / `research` / `data-pipeline` when those match at medium/high tier |
 
 Each detector returns a confidence tier (high/medium/low) with evidence trails. Override detection with `--project-type <type>`.
 
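The `data-science` row added above packs three rules into one cell: a Marimo signal is required, DVC only adds evidence, and the match is low-tier and yields to `ml` / `research` / `data-pipeline` at medium or high tier. A minimal sketch of that logic follows; it is not the detector shipped in `dist/project/detectors/data-science.js`, and the `Signals`, `Match`, `detectDataScience`, and `applyDeferral` names are hypothetical.

```typescript
// Hypothetical sketch of the detection rules described in the table row.
type Tier = "high" | "medium" | "low";

interface Match {
  type: string;
  tier: Tier;
  evidence: string[];
}

interface Signals {
  pyDeps: string[]; // Python dependency names found in the project
  files: string[];  // relative paths present in the project
}

function detectDataScience(signals: Signals): Match | null {
  const evidence: string[] = [];
  if (signals.pyDeps.indexOf("marimo") !== -1) evidence.push("marimo dep");
  if (signals.files.indexOf(".marimo.toml") !== -1) evidence.push(".marimo.toml");
  if (evidence.length === 0) return null; // a Marimo signal is required

  // DVC only extends the evidence trail; it never produces a match by itself.
  for (const f of ["dvc.yaml", ".dvc/config"]) {
    if (signals.files.indexOf(f) !== -1) evidence.push(f);
  }
  if (signals.pyDeps.indexOf("dvc") !== -1) evidence.push("dvc dep");
  return { type: "data-science", tier: "low", evidence };
}

// Drop the data-science match when ml / research / data-pipeline matched at
// medium or high tier.
function applyDeferral(matches: Match[]): Match[] {
  const stronger = matches.some(
    (m) => ["ml", "research", "data-pipeline"].indexOf(m.type) !== -1 && m.tier !== "low",
  );
  return stronger ? matches.filter((m) => m.type !== "data-science") : matches;
}
```

Keeping the deferral in a separate pass means the detector itself stays a pure function of the project's signals, which matches the README's description of each detector returning a tier plus an evidence trail.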
@@ -813,7 +826,7 @@ Claude sets up your local dev environment with one-command startup and live relo
 | `dev-env-setup` | Claude configures your project so `make dev` (or equivalent) starts everything — dev server with live reload, local database, environment variables — and documents the setup in a getting-started guide. |
 | `design-system` | Claude creates a visual language — color palette (WCAG-compliant), typography scale, spacing system, component patterns — and generates working theme config files for your frontend framework. *(web apps only)* |
 | `git-workflow` | Claude sets up your branching strategy, commit message format, PR workflow, CI pipeline with lint and test jobs, and worktree scripts so multiple AI agents can work in parallel without conflicts. |
-| `automated-pr-review` | Claude configures automated code review —
+| `automated-pr-review` | Claude configures automated code review — three-CLI MMR dispatch (Codex, Gemini, Claude) plus Superpowers code-reviewer as a complementary 4th channel via the scaffold wrappers, with severity definitions and review criteria tailored to your project. Covers PRs and non-PR targets (local code, diffs, files). *(optional)* |
 | `ai-memory-setup` | Claude extracts conventions from your docs into path-scoped rule files that load automatically, optimizes CLAUDE.md with a pointer pattern, and optionally sets up persistent cross-session memory. |
 
 ### Phase 4 — Testing Integration (integration)
@@ -1201,7 +1214,9 @@ channels:
 command: claude -p
 auth:
 check: "claude -p 'respond with ok' 2>/dev/null"
-
+# Claude's auth probe is a full LLM round-trip (not a local status
+# check) and routinely takes 9-14s, so 20s is the realistic default.
+timeout: 20
 failure_exit_codes: [1]
 recovery: "Run: claude login"
 
@@ -1215,7 +1230,10 @@ channels:
 NO_BROWSER: "true"
 auth:
 check: "NO_BROWSER=true gemini -p 'respond with ok' -o json 2>&1"
-
+# Gemini's auth probe is also a full LLM round-trip; same reasoning
+# as Claude. Codex stays at the 5s default (see below) because its
+# check is a local file probe.
+timeout: 20
 failure_exit_codes: [41]
 recovery: "Run: gemini -p 'hello' (interactive, opens browser)"
 timeout: 360 # Gemini tends to be slower
@@ -1272,11 +1290,12 @@ When multiple channels return findings, mmr applies consensus rules:
 
 | Scenario | Confidence | Action |
 |----------|-----------|--------|
-|
-|
-| One flags P0,
-| One flags P1,
-|
+| 2+ channels flag the same issue | **High** | Fix immediately |
+| All channels approve | **High** | Proceed confidently |
+| One channel flags P0, others approve | **High** | Fix it (P0 is critical) |
+| One channel flags P1, others approve | **Medium** | Review before fixing |
+| Channels contradict each other | **Low** | Present all perspectives to user |
+| Compensating-pass P0/P1/P2 finding | **Single-source** | Fix per normal thresholds, label as compensating |
 
 Scaffold verifies CLI authentication before every dispatch. If a token has expired, it tells you and provides the command to re-authenticate — it never silently skips a review.
 
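The consensus table in the hunk above maps fairly directly onto a classification function. The sketch below is an illustration only, not `mmr`'s reconciliation code: the `Finding` shape and `confidenceFor` name are hypothetical, the order in which the rows are checked is an assumption, and the table does not say what a lone P2 from a regular channel should receive, so the sketch assumes "medium" for that case.

```typescript
// Hypothetical sketch of the consensus rules; not mmr's actual implementation.
type Severity = "P0" | "P1" | "P2";
type Confidence = "high" | "medium" | "low" | "single-source";

interface Finding {
  severity: Severity;
  flaggedBy: string[];   // channels that reported this issue
  compensating: boolean; // true when it came from a compensating Claude pass
  contradicted: boolean; // true when another channel explicitly disagrees
}

function confidenceFor(f: Finding): Confidence {
  if (f.compensating) return "single-source"; // compensating-pass row
  if (f.contradicted) return "low";           // channels contradict each other
  if (f.flaggedBy.length >= 2) return "high"; // 2+ channels flag the same issue
  if (f.severity === "P0") return "high";     // a lone P0 is still critical
  return "medium";                            // lone P1 (lone P2: assumed)
}
```

The "all channels approve" row has no finding to classify, so it is left out of the function; a caller would treat an empty findings list as that case.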
@@ -1289,15 +1308,15 @@ At depth 1-3, reviews are Claude-only — still thorough with multiple passes, b
 ### What You Need
 
 - **Depth 4 or 5** — set during `scaffold init` or override per step
-- **At least one additional CLI** — Codex or Gemini (
-- **Valid authentication** — Scaffold checks before every dispatch and tells you if credentials need refreshing
+- **At least one additional CLI** — Codex, Gemini, and/or Claude CLI. All three dispatched independently as MMR channels when available. Missing Codex or Gemini channels fall back to compensating Claude passes (labeled `[compensating: Codex-equivalent]` / `[compensating: Gemini-equivalent]`, single-source confidence); if Claude itself is unavailable, the review proceeds with the remaining channels — MMR does not compensate for a missing Claude channel.
+- **Valid authentication** — Scaffold checks before every dispatch (run `mmr config test` to pre-flight all three at once) and tells you if credentials need refreshing
 
 ## Methodology Presets
 
 Not every project needs all 60 steps. Choose a methodology when you run `scaffold init`:
 
 ### deep (depth 5)
-All steps enabled. Comprehensive analysis of every angle — domain modeling, ADRs, security review, traceability matrix, the works. At depth 4-5, review steps dispatch to Codex
+All steps enabled. Comprehensive analysis of every angle — domain modeling, ADRs, security review, traceability matrix, the works. At depth 4-5, review steps dispatch to the three MMR CLI channels (Codex, Gemini, Claude) for multi-model validation, with the Superpowers code-reviewer agent added as a complementary 4th channel via the scaffold wrappers. Best for complex systems, team projects, or when you want thorough documentation.
 
 ### mvp (depth 1)
 Only 7 critical steps: create-prd, review-prd, user-stories, review-user-stories, tdd, implementation-plan, and implementation-playbook. Minimal ceremony — get to code fast. Best for prototypes, hackathons, or solo projects.
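The channel fallback rule from the "What You Need" list in the hunk above (compensating passes for a missing Codex or Gemini, none for a missing Claude) can be sketched as a small planning function. `planChannels` is a hypothetical name and the sketch is not MMR's dispatcher; the README also does not say whether a compensating pass still runs when the Claude CLI is missing as well, so the sketch simply assumes it does.

```typescript
// Hypothetical sketch of channel planning with compensating passes.
type Channel = "codex" | "gemini" | "claude";

// Returns the ordered list of review passes to run, using the README's labels
// for compensating passes.
function planChannels(available: Channel[]): string[] {
  const has = (c: Channel): boolean => available.indexOf(c) !== -1;
  const plan: string[] = [];
  plan.push(has("codex") ? "codex" : "[compensating: Codex-equivalent]");
  plan.push(has("gemini") ? "gemini" : "[compensating: Gemini-equivalent]");
  // A missing Claude channel is not compensated; the review proceeds without it.
  if (has("claude")) plan.push("claude");
  return plan;
}
```

For example, with only Gemini and Claude installed the plan would be a Codex-equivalent compensating pass followed by the two real channels.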
@@ -1359,7 +1378,8 @@ scaffold check add-e2e-testing
 # → Applicable: yes | Platform: web | Brownfield: no | Mode: fresh
 
 scaffold check automated-pr-review
-# → Applicable: yes | GitHub remote: yes | Available CLIs: codex, gemini | Recommended: local-cli (
+# → Applicable: yes | GitHub remote: yes | Available CLIs: codex, gemini, claude | Recommended: local-cli (three-CLI MMR review)
+# (suffix is `(three-CLI MMR review)` / `(two-CLI MMR review)` / `(single-CLI review)` based on how many of codex/gemini/claude are detected)
 
 scaffold check ai-memory-setup
 # → Rules: no | MCP server: none | Hooks: none | Mode: fresh
@@ -1374,7 +1394,7 @@ scaffold dashboard
 
 ## Knowledge System
 
-Scaffold ships with
+Scaffold ships with 235 domain expertise entries organized in eighteen categories:
 
 - **core/** (26 entries) — eval craft, testing strategy, domain modeling, API design, database design, system architecture, ADR craft, security best practices, operations, task decomposition, user stories, UX specification, design system tokens, user story innovation, AI memory management, coding conventions, tech stack selection, project structure patterns, task tracking, CLAUDE.md patterns, multi-model review dispatch, review step template, dev environment, git workflow patterns, automated review tooling, vision craft
 - **product/** (5 entries) — PRD craft, PRD innovation, gap analysis, vision craft, vision innovation
@@ -1393,6 +1413,7 @@ Scaffold ships with 222 domain expertise entries organized in sixteen categories
 - **ml/** (12 entries) — training and inference patterns, model types (classical/deep-learning/llm), serving patterns, experiment tracking, model evaluation, MLOps observability
 - **browser-extension/** (12 entries) — Manifest V3, content scripts, service workers, cross-browser compatibility, extension security, store submission
 - **research/** (25 entries) — experiment loop architecture, parameter optimization, overfitting prevention, experiment tracking, security/sandboxing; domain knowledge for quant-finance (backtesting, risk metrics, market data, strategy patterns), ML-research (architecture search, ablation studies, evaluation), and simulation (engine integration, parameter spaces, compute management)
+- **data-science/** (13 entries) — reproducibility, experiment tracking, notebook discipline, model evaluation, data versioning, dev environment (Marimo/Jupyter/Hex), observability, project structure, conventions, requirements, security, testing, architecture
 
 Each pipeline step declares which knowledge entries it needs in its frontmatter. The assembly engine injects them automatically. Knowledge files with a `## Deep Guidance` section are optimized for the CLI — only the deep guidance content is loaded into the assembled prompt, skipping the summary to avoid redundancy with the prompt text.
 
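The `## Deep Guidance` loading rule described in the hunk above can be pictured as a simple section extractor. This is a sketch under stated assumptions, not the code in `dist/core/assembly/knowledge-loader.js`: `extractDeepGuidance` is a hypothetical name, ending the section at the next level-2 heading is an assumption, and so is loading the whole file when the section is absent.

```typescript
// Hypothetical sketch of "only the deep guidance content is loaded".
function extractDeepGuidance(markdown: string): string {
  const kept: string[] = [];
  let inside = false;
  let found = false;
  for (const line of markdown.split("\n")) {
    if (line.trim() === "## Deep Guidance") {
      inside = true;
      found = true;
      continue; // the heading itself is not part of the loaded content
    }
    // Assumption: the section ends at the next level-2 heading.
    if (inside && line.indexOf("## ") === 0) inside = false;
    if (inside) kept.push(line);
  }
  // Assumption: files without the section are loaded whole.
  return found ? kept.join("\n").trim() : markdown;
}
```

Level-3 and deeper headings inside the section are kept, since only a line starting with exactly `## ` closes it in this sketch.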
@@ -1438,9 +1459,9 @@ These are orthogonal to the pipeline — usable at any time, not tied to pipelin
 | `scaffold run update` | Update Scaffold to the latest version. |
 | `scaffold run dashboard` | Open a visual progress dashboard in your browser. |
 | `scaffold run prompt-pipeline` | Print the full pipeline reference table. |
-| `scaffold run review-code` | Run all 3
-| `scaffold run review-pr` | Run all 3
-| `scaffold run post-implementation-review` | Full
+| `scaffold run review-code` | Run all 3 MMR CLI review channels (Codex CLI, Gemini CLI, Claude CLI) on tracked local code (committed branch diff + staged + unstaged — no untracked files) before commit or push, plus Superpowers code-reviewer as a complementary 4th channel. |
+| `scaffold run review-pr` | Run all 3 MMR CLI review channels (Codex CLI, Gemini CLI, Claude CLI) on a PR, plus Superpowers code-reviewer as a complementary 4th channel. Also usable on non-PR targets (staged changes, branch diff, specific files) via `mmr review` directly. |
+| `scaffold run post-implementation-review` | Full codebase review (Codex CLI + Gemini CLI + Superpowers code-reviewer — note: does not currently include Claude CLI as a standard channel) after an AI agent completes all tasks — checks requirements coverage, security, architecture alignment, and more. |
 | `scaffold run spark` | Explore and expand a raw project idea through Socratic questioning, competitive research, and innovation expansion. Produces a `docs/spark-brief.md` that feeds into `create-vision`. At depth 4+, dispatches to external models for independent research and adversarial red-teaming. |
 | `scaffold run session-analyzer` | Analyze Claude Code session logs for patterns and insights. |
 
@@ -1599,7 +1620,7 @@ All build inputs live under `content/`:
 content/
 ├── pipeline/ # 60 meta-prompts organized by 16 phases (phases 0-15, including build)
 ├── tools/ # 10 tool meta-prompts (stateless, category: tool)
-├── knowledge/ #
+├── knowledge/ # 235 domain expertise entries (core, product, review, validation, finalization, execution, tools, game, web-app, backend, cli, library, mobile-app, data-pipeline, ml, browser-extension, research, data-science)
 ├── methodology/ # 3 YAML presets (deep, mvp, custom)
 └── skills/ # Skill templates with {{markers}} for multi-platform resolution (includes mmr)
 ```
@@ -1,12 +1,12 @@
 ---
 name: automated-review-tooling
-description: Patterns for automated
-topics: [code-review, automation, codex, gemini, pull-requests, ci-cd, review-tooling]
+description: Patterns for automated code review using AI CLI tools (Codex, Gemini, Claude) — three-CLI MMR orchestration plus Superpowers 4th channel in wrappers, reconciliation, compensating passes, PR + non-PR targets, and CI integration
+topics: [code-review, automation, codex, gemini, claude, pull-requests, non-pr-review, mmr, ci-cd, review-tooling]
 ---
 
 # Automated Review Tooling
 
-Automated
+Automated code review leverages AI models to provide consistent, thorough code review without manual reviewer bottlenecks. This knowledge covers the local CLI approach (no GitHub Actions), the three-channel MMR orchestration (Codex + Gemini + Claude) with the Superpowers code-reviewer added as a complementary 4th channel by the scaffold MMR wrappers, and integration with both PR and non-PR review targets (local code, branch diffs, specific files).
 
 ## Summary
 
@@ -48,7 +48,7 @@ When an AI agent dispatches CLI reviews via a tool runner (Claude Code Bash tool
|
|
|
48
48
|
|
|
49
49
|
Before dispatching, verify the model CLI is installed and authenticated using a two-step process that produces distinct statuses for the orchestration layer:
|
|
50
50
|
|
|
51
|
-
**Step 1 — Installation check
|
|
51
|
+
**Step 1 — Installation check** (all three MMR channels):
|
|
52
52
|
|
|
53
53
|
```bash
|
|
54
54
|
# Codex: not found -> status: "not_installed"
|
|
@@ -56,6 +56,9 @@ command -v codex >/dev/null 2>&1
|
|
|
56
56
|
|
|
57
57
|
# Gemini: not found -> status: "not_installed"
|
|
58
58
|
command -v gemini >/dev/null 2>&1
|
|
59
|
+
|
|
60
|
+
# Claude CLI: not found -> status: "not_installed"
|
|
61
|
+
command -v claude >/dev/null 2>&1
|
|
59
62
|
```
|
|
60
63
|
|
|
61
64
|
If the CLI is not found, report status `not_installed` to the orchestration layer. Do not prompt the user to install it.
|
|
@@ -64,17 +67,23 @@ If the CLI is not found, report status `not_installed` to the orchestration laye
|
|
|
64
67
|
|
|
65
68
|
```bash
|
|
66
69
|
# Codex: fail -> status: "auth_failed"
|
|
67
|
-
codex login status 2>/dev/null
|
|
70
|
+
codex login status 2>/dev/null # local file probe
|
|
68
71
|
|
|
69
72
|
# Gemini: exit 41 -> status: "auth_failed"
|
|
70
|
-
NO_BROWSER=true gemini -p "respond with ok" -o json 2>&1
|
|
73
|
+
NO_BROWSER=true gemini -p "respond with ok" -o json 2>&1 # full LLM round-trip
|
|
74
|
+
|
|
75
|
+
# Claude CLI: non-zero -> status: "auth_failed"
|
|
76
|
+
claude -p "respond with ok" 2>/dev/null # full LLM round-trip
|
|
71
77
|
```
|
|
72
78
|
|
|
79
|
+
Prefer `mmr config test` as a single-command pre-flight that runs all three checks and emits structured JSON.
|
|
80
|
+
|
|
73
81
|
If auth fails, report status `auth_failed` and surface recovery to the user:
|
|
74
82
|
- Codex: "Codex auth expired — run `! codex login` to re-authenticate"
|
|
75
83
|
- Gemini: "Gemini auth expired — run `! gemini -p \"hello\"` to re-authenticate"
|
|
84
|
+
- Claude CLI: "Claude CLI auth expired — run `! claude login` to re-authenticate"
|
|
76
85
|
|
|
77
|
-
If
|
|
86
|
+
Auth-check timeouts: Codex's check is a local file probe so the default is 5s; Gemini's and Claude's are full LLM round-trips and routinely take 9-14s, so MMR's built-in defaults use 20s for those two. If a check times out, retry once. If still failing, report `timeout`.
|
|
78
87
|
If auth succeeds, report `ready` and proceed to dispatch.
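
The two-step pre-flight above can be sketched in Python roughly as follows. This is an illustrative sketch only, not MMR's implementation: the `preflight` function name and its parameters are hypothetical, and it simplifies the Gemini rule by treating any non-zero exit (not just exit 41) as `auth_failed`.

```python
import shutil
import subprocess


def preflight(command: str, auth_args: list[str], timeout_s: float) -> str:
    """Return one of: not_installed, auth_failed, timeout, ready.

    Hypothetical helper; the status names come from the text above.
    """
    # Step 1: installation check, the equivalent of `command -v <cli>`.
    if shutil.which(command) is None:
        return "not_installed"

    # Step 2: auth check. A timed-out check is retried once before giving up.
    for _attempt in range(2):
        try:
            result = subprocess.run(
                [command, *auth_args],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            continue  # first timeout: retry; second timeout: fall through
        # Simplification: any non-zero exit is reported as an auth failure.
        return "ready" if result.returncode == 0 else "auth_failed"
    return "timeout"
```

Under these assumptions, a caller would pass the short 5s budget for the Codex file probe and the 20s budget for the Gemini and Claude round-trips, for example `preflight("codex", ["login", "status"], 5.0)`.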
|
|
79
88
|
|
|
80
89
|
**Post-dispatch terminal states:**
|
|
@@ -0,0 +1,23 @@
|
|
|
1
|
+
# `data-science/` knowledge
|
|
2
|
+
|
|
3
|
+
Solo / small-team data-science domain knowledge injected into universal pipeline
|
|
4
|
+
steps by `content/methodology/data-science-overlay.yml`.
|
|
5
|
+
|
|
6
|
+
## Lockstep pairs with `ml/`
|
|
7
|
+
|
|
8
|
+
Five documents here mirror documents in `content/knowledge/ml/`. The two
|
|
9
|
+
overlays never compose at runtime (a user picks exactly one project type), but
|
|
10
|
+
edits to one side of a pair should trigger review of the other to prevent
|
|
11
|
+
recommendation drift over time:
|
|
12
|
+
|
|
13
|
+
| `data-science/` | `ml/` |
|
|
14
|
+
| --------------------------------------- | -------------------------------- |
|
|
15
|
+
| `data-science-experiment-tracking.md` | `ml-experiment-tracking.md` |
|
|
16
|
+
| `data-science-model-evaluation.md` | `ml-model-evaluation.md` |
|
|
17
|
+
| `data-science-observability.md` | `ml-observability.md` |
|
|
18
|
+
| `data-science-requirements.md` | `ml-requirements.md` |
|
|
19
|
+
| `data-science-conventions.md` | `ml-conventions.md` |
|
|
20
|
+
|
|
21
|
+
`ml/` targets production training and serving systems. `data-science/` targets
|
|
22
|
+
solo / small-team analytics and prototyping. Tool picks may diverge where the
|
|
23
|
+
audience justifies it (e.g. MLflow self-hosted vs managed W&B).
|
|
@@ -0,0 +1,163 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: data-science-architecture
|
|
3
|
+
description: Local-first architecture for solo and small-team data science — notebook exploration, src/ promotion, idempotent entrypoint pipelines, Polars vs Pandas choice, and artifact separation
|
|
4
|
+
topics: [data-science, architecture, polars, pandas, notebook-promotion]
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
"Architecture" sounds heavy for a single analyst opening a notebook, but it is the one decision that separates work a collaborator can rerun tomorrow from a pile of ad-hoc scripts that only you can coax back to life. Solo DS work is local-first, reproducibility-first, and almost never needs Airflow or a Kubernetes cluster. What it needs is a coherent shape that scales from "a single notebook" to "a pipeline a teammate can clone and run" — and a clear story about where raw data, intermediate data, models, and reports each live. This doc lays out that shape and the small set of conventions that make it hold together.
|
|
8
|
+
|
|
9
|
+
## Summary
|
|
10
|
+
|
|
11
|
+
Architect a solo DS project as layers: exploratory notebooks on top, reusable functions in `src/`, unit tests in `tests/`, and a thin entrypoint module that composes those functions into a reproducible run. Use Polars for datasets >1 GB or >10M rows and Pandas for everything smaller where scikit-learn / seaborn compatibility matters. Runs happen via `uv run python -m <project>.pipeline`; no scheduler is needed. Pipelines are idempotent functions that move data from `data/raw/` to `data/interim/` to `data/processed/`, emitting models to `models/` and reports to `reports/`. This shape deliberately does not solve distributed data, production serving, or real-time inference. When those become real, graduate to Prefect / Dagster and cross over to `ml-serving-patterns.md`.
|
|
12
|
+
|
|
13
|
+
## Deep Guidance
|
|
14
|
+
|
|
15
|
+
### The layered shape
|
|
16
|
+
|
|
17
|
+
The entire architecture is five layers, each with a single responsibility:
|
|
18
|
+
|
|
19
|
+
```
|
|
20
|
+
┌──────────────────────────────────────────────────────────────┐
|
|
21
|
+
│ notebooks/ exploration, narrative, charts │
|
|
22
|
+
│ ↓ (promote stable code) │
|
|
23
|
+
├──────────────────────────────────────────────────────────────┤
|
|
24
|
+
│ src/<project>/ typed, importable functions │
|
|
25
|
+
│ ↓ (test every function you ship) │
|
|
26
|
+
├──────────────────────────────────────────────────────────────┤
|
|
27
|
+
│ tests/ pytest smoke + unit tests │
|
|
28
|
+
│ ↓ (functions compose into a run) │
|
|
29
|
+
├──────────────────────────────────────────────────────────────┤
|
|
30
|
+
│ src/pipeline.py entrypoint: load→features→train→save │
|
|
31
|
+
│ ↓ (run produces artifacts) │
|
|
32
|
+
├──────────────────────────────────────────────────────────────┤
|
|
33
|
+
│ data/ models/ reports/ outputs, gitignored or DVC-tracked │
|
|
34
|
+
└──────────────────────────────────────────────────────────────┘
|
|
35
|
+
```
|
|
36
|
+
|
|
37
|
+
Read top-to-bottom it is the promotion path; read bottom-to-top it is the dependency graph. A notebook may import from `src/` but `src/` must never import from a notebook. Tests depend only on `src/`. The entrypoint (`pipeline.py`) is itself a module under `src/`, not a loose script at the repo root — keeping it importable lets you exercise it end-to-end in tests with a tiny fixture dataset.
|
|
38
|
+
|
|
39
|
+
### Polars vs Pandas
|
|
40
|
+
|
|
41
|
+
Pick the DataFrame library based on data size and ecosystem needs, not on what's trendy. Rule of thumb:
|
|
42
|
+
|
|
43
|
+
| Dimension | Pandas | Polars |
|
|
44
|
+
|----------------------|---------------------------------------|-----------------------------------------|
|
|
45
|
+
| Rows | <10M comfortably | 10M–1B on a single machine |
|
|
46
|
+
| In-memory size | <1 GB | 1 GB – ~RAM/2 |
|
|
47
|
+
| Execution | Eager, single-threaded | Lazy + multi-threaded, Arrow-native |
|
|
48
|
+
| Ecosystem | scikit-learn, seaborn, plotly, statsmodels | Native; interop via `.to_pandas()` |
|
|
49
|
+
| API stability | Mature, huge Stack Overflow corpus | Younger, faster-moving |
|
|
50
|
+
|
|
51
|
+
**Default to Pandas** when you are in sklearn / statsmodels / seaborn territory with small-to-medium data — ecosystem friction is the dominant cost. **Reach for Polars** when you are doing heavy group-bys, joins, or window functions on datasets where Pandas starts swapping or takes minutes per cell. The two libraries express the same group-by almost identically:
|
|
52
|
+
|
|
53
|
+
```python
|
|
54
|
+
# Pandas
|
|
55
|
+
(df
|
|
56
|
+
.groupby("customer_id")
|
|
57
|
+
.agg(total_spend=("amount", "sum"), tx_count=("amount", "count"))
|
|
58
|
+
.reset_index())
|
|
59
|
+
|
|
60
|
+
# Polars (lazy; the final .collect() executes the query; assumes `import polars as pl`)
|
|
61
|
+
(df.lazy()
|
|
62
|
+
.group_by("customer_id")
|
|
63
|
+
.agg(pl.col("amount").sum().alias("total_spend"),
|
|
64
|
+
pl.col("amount").count().alias("tx_count"))
|
|
65
|
+
.collect())
|
|
66
|
+
```
|
|
67
|
+
|
|
68
|
+
Mixing is fine: load with Polars, do the fast aggregation, then `.to_pandas()` right before feeding a scikit-learn estimator. Avoid the trap of half-converting the codebase — pick one as the default for a given project and document it.
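
The rule of thumb from the table above can be written down as a tiny helper, which is one way to "document the default" in code. This is a sketch under the thresholds stated in this doc (10M rows, 1 GB in memory); the function name `pick_dataframe_library` is hypothetical.

```python
def pick_dataframe_library(n_rows: int, size_gb: float) -> str:
    """Suggest a default DataFrame library from rough dataset size.

    Thresholds mirror the rule of thumb in this doc; they are guidance,
    not hard limits.
    """
    ROW_THRESHOLD = 10_000_000  # above this, Pandas stops being comfortable
    SIZE_THRESHOLD_GB = 1.0     # in-memory size, not on-disk size

    if n_rows > ROW_THRESHOLD or size_gb > SIZE_THRESHOLD_GB:
        return "polars"
    return "pandas"
```

A project could call this once in a setup notebook and record the answer in its README, rather than re-deciding per notebook.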
|
|
69
|
+
|
|
70
|
+
### Notebook to pipeline promotion
|
|
71
|
+
|
|
72
|
+
Every piece of code starts life in a notebook. The discipline is knowing when to move it:
|
|
73
|
+
|
|
74
|
+
1. You copy-paste a cell into a second notebook → promote.
|
|
75
|
+
2. A transformation has a non-trivial branch (try/except, conditional handling) → promote.
|
|
76
|
+
3. You want to unit-test it → promote (you can't test a notebook cell cleanly).
|
|
77
|
+
|
|
78
|
+
Promotion is a four-step move: extract the cell into `src/<project>/features/engineer.py` as a typed function, add a pytest in `tests/`, replace the notebook cell with an `import`, and turn on autoreload (`%load_ext autoreload`, then `%autoreload 2`) so subsequent edits live-reload without a kernel restart.
|
|
79
|
+
|
|
80
|
+
```python
|
|
81
|
+
# src/<project>/features/engineer.py
|
|
82
|
+
import polars as pl
|
|
83
|
+
|
|
84
|
+
def add_tenure_bucket(df: pl.DataFrame, *, today: str) -> pl.DataFrame:
|
|
85
|
+
"""Bucket customers by days since signup into short / medium / long tenure."""
|
|
86
|
+
return df.with_columns(
|
|
87
|
+
((pl.lit(today).str.to_date() - pl.col("signup_date")).dt.total_days())
|
|
88
|
+
.alias("tenure_days")
|
|
89
|
+
).with_columns(
|
|
90
|
+
pl.when(pl.col("tenure_days") < 90).then(pl.lit("short"))
|
|
91
|
+
.when(pl.col("tenure_days") < 365).then(pl.lit("medium"))
|
|
92
|
+
.otherwise(pl.lit("long"))
|
|
93
|
+
.alias("tenure_bucket")
|
|
94
|
+
)
|
|
95
|
+
```
|
|
96
|
+
|
|
97
|
+
The notebook now reads `from <project>.features.engineer import add_tenure_bucket` and the function is covered by `tests/test_engineer.py` with a six-row fixture. This is the single most important habit in a DS codebase — see `data-science-project-structure.md` for the directory layout it slots into.
|
|
98
|
+
|
|
99
|
+
### Idempotent pipeline entrypoints
|
|
100
|
+
|
|
101
|
+
The pipeline is a thin composition layer: one function per stage, each one idempotent (same inputs → same outputs, safe to rerun). It lives at `src/<project>/pipeline.py` and exposes a `run(cfg)` function that a small `main()` CLI wraps:
|
|
102
|
+
|
|
103
|
+
```python
|
|
104
|
+
# src/<project>/pipeline.py
|
|
105
|
+
import argparse
import yaml  # provided by the PyYAML package
|
|
106
|
+
from pathlib import Path
|
|
107
|
+
from <project>.ingestion import load_transactions
|
|
108
|
+
from <project>.validation import validate_schema
|
|
109
|
+
from <project>.features.engineer import build_features
|
|
110
|
+
from <project>.training import train_model
|
|
111
|
+
from <project>.evaluation import evaluate
|
|
112
|
+
from <project>.io import save_model, save_report
|
|
113
|
+
|
|
114
|
+
def run(cfg: dict) -> None:
|
|
115
|
+
run_id = cfg["run_name"]
|
|
116
|
+
raw = load_transactions(cfg["data"]["raw_path"])
|
|
117
|
+
validate_schema(raw, cfg["data"]["schema"])
|
|
118
|
+
processed = build_features(raw, cfg["features"])
|
|
119
|
+
processed.write_parquet(Path(cfg["data"]["processed_path"]))
|
|
120
|
+
model, metrics = train_model(processed, cfg["model"])
|
|
121
|
+
report = evaluate(model, processed, cfg["evaluation"])
|
|
122
|
+
save_model(model, f"models/{run_id}.joblib")
|
|
123
|
+
save_report(report, f"reports/{run_id}.html")
|
|
124
|
+
|
|
125
|
+
def main() -> None:
|
|
126
|
+
ap = argparse.ArgumentParser()
|
|
127
|
+
ap.add_argument("--config", required=True, type=Path)
|
|
128
|
+
args = ap.parse_args()
|
|
129
|
+
run(yaml.safe_load(args.config.read_text()))
|
|
130
|
+
|
|
131
|
+
if __name__ == "__main__":
|
|
132
|
+
main()
|
|
133
|
+
```
|
|
134
|
+
|
|
135
|
+
Invoke it with `uv run python -m <project>.pipeline --config configs/baseline.yaml`. Idempotence means: each stage writes to a deterministic path based on the config, and re-running over existing outputs is a no-op (or an overwrite of identical content). That property is what lets a teammate — or future-you — rerun the pipeline confidently without inspecting every intermediate.
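
One possible way to get the "deterministic path, rerun is a no-op" property is to derive the output filename from a fingerprint of the stage's config and skip the work when that file already exists. The sketch below uses only the standard library; `config_fingerprint` and `run_stage` are hypothetical names, and the stage output is plain text purely to keep the example small.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def config_fingerprint(cfg: dict) -> str:
    """Stable short hash of a config dict (key order does not matter)."""
    canonical = json.dumps(cfg, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def run_stage(
    name: str,
    cfg: dict,
    out_dir: Path,
    compute: Callable[[dict], str],
) -> tuple[Path, bool]:
    """Run a stage at most once per config.

    Returns the output path and whether any work was actually done.
    """
    out_path = out_dir / f"{name}-{config_fingerprint(cfg)}.txt"
    if out_path.exists():
        return out_path, False  # rerun over existing output: no-op

    out_dir.mkdir(parents=True, exist_ok=True)
    # Write to a temp file first so an interrupted run never leaves a
    # half-written file at the final path.
    tmp_path = out_path.with_suffix(".tmp")
    tmp_path.write_text(compute(cfg), encoding="utf-8")
    tmp_path.replace(out_path)
    return out_path, True
```

Changing any config value changes the fingerprint and therefore the path, so a new variant never silently reuses a stale output.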
|
|
136
|
+
|
|
137
|
+
### Where outputs go
|
|
138
|
+
|
|
139
|
+
Artifacts follow a strict directory contract so a run never scatters files:
|
|
140
|
+
|
|
141
|
+
| Artifact | Path | Notes |
|
|
142
|
+
|------------------------|-------------------------------------|------------------------------------------|
|
|
143
|
+
| Immutable source data | `data/raw/` | Never written to after initial ingest |
|
|
144
|
+
| Cached partial transforms | `data/interim/` | Safe to delete; regenerable from raw |
|
|
145
|
+
| Analysis-ready datasets | `data/processed/` | Consumed by training |
|
|
146
|
+
| Predictions | `data/processed/predictions/` | Keeps inference outputs alongside data |
|
|
147
|
+
| Trained models | `models/<run_id>.joblib` | DVC or git-lfs pointer tracked |
|
|
148
|
+
| Rendered reports | `reports/<run_id>.html` | HTML / markdown summaries |
|
|
149
|
+
| Figures | `reports/figures/<run_id>/` | PNG / SVG charts |
|
|
150
|
+
|
|
151
|
+
The rule: **paths come from config, never hard-coded in code**. `cfg["output"]["model_path"]` lives in the YAML; `"models/baseline_v1.joblib"` never appears as a string literal inside `training.py`. That is what lets a single pipeline module serve every run variant.
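
As a minimal sketch of that rule, output paths can be resolved once from the loaded config and then passed to each stage. Only `cfg["output"]["model_path"]` and `run_name` appear in this doc; the `report_path` key, the `{run_id}` placeholder convention, and the `resolve_output_paths` name are assumptions made for illustration.

```python
from pathlib import Path


def resolve_output_paths(cfg: dict, root: Path = Path(".")) -> dict[str, Path]:
    """Expand the path templates under cfg["output"] for this run.

    Assumes each value is a string that may contain a {run_id} placeholder.
    """
    run_id = cfg["run_name"]
    return {
        key: root / template.format(run_id=run_id)
        for key, template in cfg["output"].items()
    }


# Example config as it might look after yaml.safe_load (hypothetical keys).
example_cfg = {
    "run_name": "baseline_v1",
    "output": {
        "model_path": "models/{run_id}.joblib",
        "report_path": "reports/{run_id}.html",
    },
}
paths = resolve_output_paths(example_cfg)
```

With this shape, `training.py` receives `paths["model_path"]` as an argument and never needs to know the directory layout.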
|
|
152
|
+
|
|
153
|
+
### When to outgrow this
|
|
154
|
+
|
|
155
|
+
This architecture covers the 0-to-100 GB, one-to-three-contributor slot. Signals that you are leaving that slot:
|
|
156
|
+
|
|
157
|
+
- Data no longer fits on a laptop (>100 GB, or streaming sources) → Spark, DuckDB+S3, or a warehouse-side pipeline.
|
|
158
|
+
- You need scheduled / triggered runs with retries, alerting, observability → Prefect, Dagster, or Airflow.
|
|
159
|
+
- The model must serve real-time predictions with SLA → cross over to `ml-serving-patterns.md` for online inference, feature stores, and the training-serving split.
|
|
160
|
+
- Multiple people are editing the pipeline concurrently → promote `configs/` to a registry, add a model registry (MLflow), and start writing ADRs under `docs/adr/`.
|
|
161
|
+
- The team wants experiment tracking beyond a CSV of metrics → MLflow Tracking or Weights & Biases.
|
|
162
|
+
|
|
163
|
+
Do not preemptively adopt any of these. Installing Dagster for a weekly notebook is a classic small-team failure mode — the operational tax (scheduler, DB, UI, auth) dwarfs the benefit. Graduate one piece at a time, and only when the pain is concrete. The layered shape above is deliberately the smallest coherent thing; resist making it bigger until the evidence demands it.
|