npm - @zigrivers/scaffold - Versions diffs - 3.22.0 → 3.23.0 - Mend

@zigrivers/scaffold 3.22.0 → 3.23.0

Files changed (95)

package/README.md +21 -7
package/content/knowledge/data-science/README.md +23 -0
package/content/knowledge/data-science/data-science-architecture.md +163 -0
package/content/knowledge/data-science/data-science-conventions.md +233 -0
package/content/knowledge/data-science/data-science-data-versioning.md +198 -0
package/content/knowledge/data-science/data-science-dev-environment.md +159 -0
package/content/knowledge/data-science/data-science-experiment-tracking.md +194 -0
package/content/knowledge/data-science/data-science-model-evaluation.md +160 -0
package/content/knowledge/data-science/data-science-notebook-discipline.md +170 -0
package/content/knowledge/data-science/data-science-observability.md +161 -0
package/content/knowledge/data-science/data-science-project-structure.md +178 -0
package/content/knowledge/data-science/data-science-reproducibility.md +164 -0
package/content/knowledge/data-science/data-science-requirements.md +151 -0
package/content/knowledge/data-science/data-science-security.md +151 -0
package/content/knowledge/data-science/data-science-testing.md +183 -0
package/content/knowledge/ml/README.md +10 -0
package/content/methodology/data-science-overlay.yml +39 -0
package/dist/config/schema.d.ts +672 -126
package/dist/config/schema.d.ts.map +1 -1
package/dist/config/schema.js +8 -0
package/dist/config/schema.js.map +1 -1
package/dist/config/schema.test.js +2 -2
package/dist/config/schema.test.js.map +1 -1
package/dist/config/validators/data-science.d.ts +4 -0
package/dist/config/validators/data-science.d.ts.map +1 -0
package/dist/config/validators/data-science.js +15 -0
package/dist/config/validators/data-science.js.map +1 -0
package/dist/config/validators/index.d.ts.map +1 -1
package/dist/config/validators/index.js +2 -0
package/dist/config/validators/index.js.map +1 -1
package/dist/core/assembly/knowledge-loader.d.ts.map +1 -1
package/dist/core/assembly/knowledge-loader.js +6 -0
package/dist/core/assembly/knowledge-loader.js.map +1 -1
package/dist/core/assembly/knowledge-loader.test.js +34 -0
package/dist/core/assembly/knowledge-loader.test.js.map +1 -1
package/dist/e2e/project-type-overlays.test.js +73 -0
package/dist/e2e/project-type-overlays.test.js.map +1 -1
package/dist/project/adopt.d.ts.map +1 -1
package/dist/project/adopt.js +3 -1
package/dist/project/adopt.js.map +1 -1
package/dist/project/detectors/coverage.test.d.ts +2 -0
package/dist/project/detectors/coverage.test.d.ts.map +1 -0
package/dist/project/detectors/coverage.test.js +78 -0
package/dist/project/detectors/coverage.test.js.map +1 -0
package/dist/project/detectors/data-science.d.ts +4 -0
package/dist/project/detectors/data-science.d.ts.map +1 -0
package/dist/project/detectors/data-science.js +32 -0
package/dist/project/detectors/data-science.js.map +1 -0
package/dist/project/detectors/data-science.test.d.ts +2 -0
package/dist/project/detectors/data-science.test.d.ts.map +1 -0
package/dist/project/detectors/data-science.test.js +62 -0
package/dist/project/detectors/data-science.test.js.map +1 -0
package/dist/project/detectors/disambiguate.d.ts +2 -0
package/dist/project/detectors/disambiguate.d.ts.map +1 -1
package/dist/project/detectors/disambiguate.js +3 -2
package/dist/project/detectors/disambiguate.js.map +1 -1
package/dist/project/detectors/disambiguate.test.js +10 -1
package/dist/project/detectors/disambiguate.test.js.map +1 -1
package/dist/project/detectors/index.d.ts.map +1 -1
package/dist/project/detectors/index.js +2 -0
package/dist/project/detectors/index.js.map +1 -1
package/dist/project/detectors/library.d.ts.map +1 -1
package/dist/project/detectors/library.js +1 -0
package/dist/project/detectors/library.js.map +1 -1
package/dist/project/detectors/resolve-detection.test.js +31 -0
package/dist/project/detectors/resolve-detection.test.js.map +1 -1
package/dist/project/detectors/types.d.ts +6 -2
package/dist/project/detectors/types.d.ts.map +1 -1
package/dist/project/detectors/types.js.map +1 -1
package/dist/types/config.d.ts +8 -1
package/dist/types/config.d.ts.map +1 -1
package/dist/wizard/copy/core.d.ts.map +1 -1
package/dist/wizard/copy/core.js +4 -0
package/dist/wizard/copy/core.js.map +1 -1
package/dist/wizard/copy/data-science.d.ts +3 -0
package/dist/wizard/copy/data-science.d.ts.map +1 -0
package/dist/wizard/copy/data-science.js +15 -0
package/dist/wizard/copy/data-science.js.map +1 -0
package/dist/wizard/copy/index.d.ts.map +1 -1
package/dist/wizard/copy/index.js +2 -0
package/dist/wizard/copy/index.js.map +1 -1
package/dist/wizard/copy/types.d.ts +5 -1
package/dist/wizard/copy/types.d.ts.map +1 -1
package/dist/wizard/copy/types.test-d.js +7 -0
package/dist/wizard/copy/types.test-d.js.map +1 -1
package/dist/wizard/questions.d.ts +2 -1
package/dist/wizard/questions.d.ts.map +1 -1
package/dist/wizard/questions.js +9 -1
package/dist/wizard/questions.js.map +1 -1
package/dist/wizard/questions.test.js +14 -0
package/dist/wizard/questions.test.js.map +1 -1
package/dist/wizard/wizard.d.ts.map +1 -1
package/dist/wizard/wizard.js +1 -0
package/dist/wizard/wizard.js.map +1 -1
package/package.json +1 -1

package/README.md CHANGED

@@ -29,7 +29,7 @@ Either way, Scaffold constructs the prompt and the target AI tool does the work.
 **Assembly engine** — At execution time, Scaffold builds a 7-section prompt from: system metadata, the meta-prompt, knowledge base entries, project context (artifacts from prior steps), methodology settings, layered instructions, and depth-specific execution guidance.
-**Knowledge base** — 222 domain expertise entries in `content/knowledge/` organized in seventeen categories (core, product, review, validation, finalization, execution, tools, game, web-app, backend, cli, library, mobile-app, data-pipeline, ml, browser-extension, research) covering testing strategy, domain modeling, API design, security best practices, eval craft, TDD execution, task claiming, worktree management, release management, rendering strategies, data stores, CLI patterns, game engines, library bundling, mobile deployment, batch and streaming pipelines, model training and serving, browser extension manifests and service workers, and more. These get injected into prompts based on each step's `knowledge-base` frontmatter field. Knowledge files with a `## Deep Guidance` section are optimized for CLI assembly — only the deep guidance content is loaded, avoiding redundancy with the prompt text. Teams can add project-local overrides in `.scaffold/knowledge/` that layer on top of the global entries.
+**Knowledge base** — 235 domain expertise entries in `content/knowledge/` organized in eighteen categories (core, product, review, validation, finalization, execution, tools, game, web-app, backend, cli, library, mobile-app, data-pipeline, ml, browser-extension, research, data-science) covering testing strategy, domain modeling, API design, security best practices, eval craft, TDD execution, task claiming, worktree management, release management, rendering strategies, data stores, CLI patterns, game engines, library bundling, mobile deployment, batch and streaming pipelines, model training and serving, browser extension manifests and service workers, data-science reproducibility and notebook discipline, and more. These get injected into prompts based on each step's `knowledge-base` frontmatter field. Knowledge files with a `## Deep Guidance` section are optimized for CLI assembly — only the deep guidance content is loaded, avoiding redundancy with the prompt text. Teams can add project-local overrides in `.scaffold/knowledge/` that layer on top of the global entries.
 **Methodology presets** — Three built-in presets control which steps run and how deep the analysis goes:
 - **deep** (depth 5) — all steps enabled, exhaustive analysis
@@ -368,7 +368,7 @@ Every `scaffold init` wizard question can be answered via CLI flags, making scaf
 | `--depth` | 1-5 | Custom methodology depth (requires `--methodology custom`) |
 | `--adapters` | comma-sep | AI adapters: claude-code, codex, gemini |
 | `--traits` | comma-sep | Project traits: web, mobile |
-| `--project-type` | string | web-app, mobile-app, backend, cli, library, game, data-pipeline, ml, browser-extension, research |
+| `--project-type` | string | web-app, mobile-app, backend, cli, library, game, data-pipeline, ml, browser-extension, research, data-science |
 | `--auto` | boolean | Non-interactive mode (uses Zod defaults for unset flags) |
 #### Web-App Config Flags (require `--project-type web-app` or auto-set it)
@@ -457,6 +457,14 @@ Every `scaffold init` wizard question can be answered via CLI flags, making scaf
 | `--research-domain` | string | none, quant-finance, ml-research, simulation |
 | `--research-tracking` | boolean | `--research-tracking` / `--no-research-tracking` |
+#### Data Science Config (`--project-type data-science`)
+Data science has one forward-compatible config field in the schema, defaulted automatically — no CLI flags are needed in v1:
+| Config field | Values | Notes |
+|------|------|--------|
+| `dataScienceConfig.audience` | `solo` | Default (applied by the wizard and `--auto`). Covers the DS-1 audience (solo / small-team, local-first, prototyping). A future DS-2 release will extend the enum with `'platform'` (platform-engineered / larger-team DS) additively, without breaking existing configs. |
 #### Game Config Flags (require `--project-type game` or auto-set it)
 | Flag | Type | Values |
@@ -506,7 +514,7 @@ during assembly.
 - **Flag > auto > interactive**: Flags always take highest precedence. `--auto --engine unreal` uses defaults for everything except engine.
 - **Partial flags + interactive**: Provide some flags and the wizard asks only the remaining questions. `scaffold init --project-type game --engine unreal` prompts interactively for multiplayer, platforms, etc.
-- **Type-specific flags auto-set project type**: `--engine unity` automatically sets `--project-type game`, `--web-rendering ssr` sets `--project-type web-app`, `--backend-api-style rest` sets `--project-type backend`, `--cli-interactivity hybrid` sets `--project-type cli`, `--lib-visibility public` sets `--project-type library`, `--mobile-platform ios` sets `--project-type mobile-app`, `--pipeline-processing batch` sets `--project-type data-pipeline`, `--ml-phase training` sets `--project-type ml`, `--ext-manifest 3` sets `--project-type browser-extension`, `--research-driver code-driven` sets `--project-type research`. Error if conflicting type.
+- **Type-specific flags auto-set project type**: `--engine unity` automatically sets `--project-type game`, `--web-rendering ssr` sets `--project-type web-app`, `--backend-api-style rest` sets `--project-type backend`, `--cli-interactivity hybrid` sets `--project-type cli`, `--lib-visibility public` sets `--project-type library`, `--mobile-platform ios` sets `--project-type mobile-app`, `--pipeline-processing batch` sets `--project-type data-pipeline`, `--ml-phase training` sets `--project-type ml`, `--ext-manifest 3` sets `--project-type browser-extension`, `--research-driver code-driven` sets `--project-type research`. Error if conflicting type. (Data science currently has no dedicated CLI flags — pass `--project-type data-science` directly.)
 - **Cannot mix flag families**: `--web-rendering ssr --backend-api-style rest` is an error. Each flag family (`--web-*`, `--backend-*`, `--cli-*`, `--lib-*`, `--mobile-*`, `--pipeline-*`, `--ml-*`, `--research-*`, `--ext-*`, game) is exclusive.
 - **Validation**: `--depth` requires `--methodology custom`. `--online-services` requires `--multiplayer online` or `hybrid`. SSR/hybrid rendering is incompatible with static deploy target. Session auth requires server state (not static). ML inference projects must specify a serving pattern. Browser extensions must declare at least one capability (UI surface, content script, or background worker). Notebook-driven research cannot be fully autonomous.
@@ -599,6 +607,9 @@ scaffold init --auto --methodology deep --project-type research \
   --research-driver config-driven --research-interaction checkpoint-gated \
   --research-domain ml-research
+# Solo / small-team data science project (reproducibility-first)
+scaffold init --auto --methodology deep --project-type data-science
 # Multiplayer mobile game with Unity
 scaffold init --project-type game --methodology deep --auto \
   --engine unity --multiplayer online --target-platforms ios,android \
@@ -625,7 +636,7 @@ Scaffold supports **project-type overlays** — domain-specific knowledge and pi
 - **Injects domain knowledge** into existing pipeline steps (e.g., SSR caching strategies into `tech-stack`, API pagination patterns into `coding-standards`)
-The game overlay additionally adjusts step enablement, remaps artifact references, and adds dependency overrides (because game development has fundamentally different artifacts). The web-app, backend, CLI, library, mobile-app, data-pipeline, ML, browser-extension, and research overlays are **knowledge-only** — they inject domain expertise into existing steps without changing which steps run or how they depend on each other. The research type additionally supports **domain sub-overlays** (quant-finance, ml-research, simulation) that layer domain-specific knowledge on top of the core research overlay, and the backend type supports a `fintech` sub-overlay. Both research and backend accept `domain` as either a single string or an array (e.g. `domain: ['quant-finance', 'simulation']`) for stacking multiple sub-overlays; the wizard and CLI flags remain single-select in v1, so multi-domain stacking requires hand-editing `.scaffold/config.yml`.
+The game overlay additionally adjusts step enablement, remaps artifact references, and adds dependency overrides (because game development has fundamentally different artifacts). The web-app, backend, CLI, library, mobile-app, data-pipeline, ML, browser-extension, research, and data-science overlays are **knowledge-only** — they inject domain expertise into existing steps without changing which steps run or how they depend on each other. The research type additionally supports **domain sub-overlays** (quant-finance, ml-research, simulation) that layer domain-specific knowledge on top of the core research overlay, and the backend type supports a `fintech` sub-overlay. Both research and backend accept `domain` as either a single string or an array (e.g. `domain: ['quant-finance', 'simulation']`) for stacking multiple sub-overlays; the wizard and CLI flags remain single-select in v1, so multi-domain stacking requires hand-editing `.scaffold/config.yml`.
 Overlays are composable with methodology presets. An MVP web-app gets fewer steps at lower depth; a deep backend project gets exhaustive analysis of every architectural decision.
@@ -640,6 +651,7 @@ Overlays are composable with methodology presets. An MVP web-app gets fewer step
 | `ml` | `ml-overlay.yml` | 12 entries (architecture, training and serving patterns, experiment tracking, model evaluation, observability, testing, security) | Project phase, model type, serving pattern, experiment tracking |
 | `browser-extension` | `browser-extension-overlay.yml` | 12 entries (architecture, manifest configuration, service workers, content scripts, cross-browser, store submission, testing, security) | Manifest version, UI surfaces, content script, background worker |
 | `research` | `research-overlay.yml` + domain sub-overlays | 25 entries (experiment loop, tracking, overfitting prevention, backtesting, risk metrics, architecture search, simulation) | Experiment driver, interaction mode, domain, experiment tracking |
+| `data-science` | `data-science-overlay.yml` | 13 entries (reproducibility, experiment tracking, notebook discipline, model evaluation, data versioning, dev environment, observability, project structure, conventions, requirements, security, testing, architecture) | Audience (`solo` default; `platform` reserved for DS-2) |
 | `game` | `game-overlay.yml` | 24 entries (engines, networking, audio, VR/AR, economy, save systems, certification) | Engine, multiplayer, platforms, economy, narrative, and 6 more |
 ### Game Development
@@ -725,7 +737,7 @@ These answers control which conditional steps activate. A single-player puzzle g
 #### Multi-type Detection
-`scaffold adopt` detects 10 project types from manifest files and directory layouts:
+`scaffold adopt` detects 11 project types from manifest files and directory layouts:
 | Type | Key Signals |
 |------|-------------|
@@ -739,6 +751,7 @@ These answers control which conditional steps activate. A single-player puzzle g
 | `ml` | `training/`/`models/` dirs, PyTorch/TensorFlow deps, MLflow configs |
 | `browser-extension` | `manifest.json` with `manifest_version` field |
 | `research` | `program.md` + `results.tsv`, backtest/strategy files with trading deps, optimization deps + experiment dirs, simulation framework deps |
+| `data-science` | Marimo signals required (`marimo` dep or `.marimo.toml`); DVC (`dvc.yaml`, `.dvc/config`, `dvc` py dep) is supplementary evidence only. Low-tier; defers to `ml` / `research` / `data-pipeline` when those match at medium/high tier |
 Each detector returns a confidence tier (high/medium/low) with evidence trails. Override detection with `--project-type <type>`.
@@ -1374,7 +1387,7 @@ scaffold dashboard
 ## Knowledge System
-Scaffold ships with 222 domain expertise entries organized in sixteen categories:
+Scaffold ships with 235 domain expertise entries organized in eighteen categories:
 - **core/** (26 entries) — eval craft, testing strategy, domain modeling, API design, database design, system architecture, ADR craft, security best practices, operations, task decomposition, user stories, UX specification, design system tokens, user story innovation, AI memory management, coding conventions, tech stack selection, project structure patterns, task tracking, CLAUDE.md patterns, multi-model review dispatch, review step template, dev environment, git workflow patterns, automated review tooling, vision craft
 - **product/** (5 entries) — PRD craft, PRD innovation, gap analysis, vision craft, vision innovation
@@ -1393,6 +1406,7 @@ Scaffold ships with 222 domain expertise entries organized in sixteen categories
 - **ml/** (12 entries) — training and inference patterns, model types (classical/deep-learning/llm), serving patterns, experiment tracking, model evaluation, MLOps observability
 - **browser-extension/** (12 entries) — Manifest V3, content scripts, service workers, cross-browser compatibility, extension security, store submission
 - **research/** (25 entries) — experiment loop architecture, parameter optimization, overfitting prevention, experiment tracking, security/sandboxing; domain knowledge for quant-finance (backtesting, risk metrics, market data, strategy patterns), ML-research (architecture search, ablation studies, evaluation), and simulation (engine integration, parameter spaces, compute management)
+- **data-science/** (13 entries) — reproducibility, experiment tracking, notebook discipline, model evaluation, data versioning, dev environment (Marimo/Jupyter/Hex), observability, project structure, conventions, requirements, security, testing, architecture
 Each pipeline step declares which knowledge entries it needs in its frontmatter. The assembly engine injects them automatically. Knowledge files with a `## Deep Guidance` section are optimized for the CLI — only the deep guidance content is loaded into the assembled prompt, skipping the summary to avoid redundancy with the prompt text.
@@ -1599,7 +1613,7 @@ All build inputs live under `content/`:
 content/
 ├── pipeline/         # 60 meta-prompts organized by 16 phases (phases 0-15, including build)
 ├── tools/            # 10 tool meta-prompts (stateless, category: tool)
-├── knowledge/        # 222 domain expertise entries (core, product, review, validation, finalization, execution, tools, game, web-app, backend, cli, library, mobile-app, data-pipeline, ml, browser-extension)
+├── knowledge/        # 235 domain expertise entries (core, product, review, validation, finalization, execution, tools, game, web-app, backend, cli, library, mobile-app, data-pipeline, ml, browser-extension, research, data-science)
 ├── methodology/      # 3 YAML presets (deep, mvp, custom)
 └── skills/           # Skill templates with {{markers}} for multi-platform resolution (includes mmr)
 ```
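
The `data-science` row added to the Multi-type Detection table in this README diff states a compact rule: a Marimo signal is required, DVC only adds supporting evidence, the match is low-tier, and it defers to `ml` / `research` / `data-pipeline` when one of those matches at medium or high tier. The following is a minimal Python sketch of that rule as written, not the package's implementation (which ships as compiled JavaScript under `dist/project/detectors/`). The `Match` type, both function names, and the evidence strings are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

# Types the README says data-science defers to at medium/high tier.
DEFER_TO = {"ml", "research", "data-pipeline"}


@dataclass
class Match:
    """Hypothetical detector result: a type, a confidence tier, and evidence."""
    project_type: str
    tier: str  # "low" | "medium" | "high"
    evidence: list = field(default_factory=list)


def detect_data_science(py_deps: set, files: set) -> Optional[Match]:
    """Return a low-tier match only when a Marimo signal is present."""
    evidence = []
    if "marimo" in py_deps:
        evidence.append("marimo dependency")
    if ".marimo.toml" in files:
        evidence.append(".marimo.toml")
    if not evidence:
        return None  # DVC on its own is supplementary, never sufficient

    # Supplementary DVC evidence, recorded but not required.
    if "dvc" in py_deps:
        evidence.append("dvc dependency")
    for name in ("dvc.yaml", ".dvc/config"):
        if name in files:
            evidence.append(name)
    return Match("data-science", "low", evidence)


def data_science_defers(other_matches: list) -> bool:
    """True when an ml / research / data-pipeline match is medium or high tier."""
    return any(
        m.project_type in DEFER_TO and m.tier in ("medium", "high")
        for m in other_matches
    )
```

Under these assumptions a repository with only `dvc.yaml` produces no `data-science` match, and a Marimo project that also matches `ml` at medium tier resolves to `ml` unless `--project-type data-science` is passed explicitly.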

package/content/knowledge/data-science/README.md ADDED

@@ -0,0 +1,23 @@
+# `data-science/` knowledge
+Solo / small-team data-science domain knowledge injected into universal pipeline
+steps by `content/methodology/data-science-overlay.yml`.
+## Lockstep pairs with `ml/`
+Five documents here mirror documents in `content/knowledge/ml/`. The two
+overlays never compose at runtime (a user picks exactly one project type), but
+edits to one side of a pair should trigger review of the other to prevent
+recommendation drift over time:
+| `data-science/`                         | `ml/`                            |
+| --------------------------------------- | -------------------------------- |
+| `data-science-experiment-tracking.md`   | `ml-experiment-tracking.md`      |
+| `data-science-model-evaluation.md`      | `ml-model-evaluation.md`         |
+| `data-science-observability.md`         | `ml-observability.md`            |
+| `data-science-requirements.md`          | `ml-requirements.md`             |
+| `data-science-conventions.md`           | `ml-conventions.md`              |
+`ml/` targets production training and serving systems. `data-science/` targets
+solo / small-team analytics and prototyping. Tool picks may diverge where the
+audience justifies it (e.g. MLflow self-hosted vs managed W&B).
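
The lockstep-pairs rule in the README added above (an edit to one side of a pair should trigger review of the other) lends itself to a small check over a list of changed paths. The sketch below is one possible way to do that and is not part of the package: the helper name `counterparts_to_review` is hypothetical, and it assumes changed files are given as repo-relative POSIX paths. The pair table is copied from the README.

```python
from pathlib import PurePosixPath

# Pairs as listed in content/knowledge/data-science/README.md.
PAIRS = {
    "data-science-experiment-tracking.md": "ml-experiment-tracking.md",
    "data-science-model-evaluation.md": "ml-model-evaluation.md",
    "data-science-observability.md": "ml-observability.md",
    "data-science-requirements.md": "ml-requirements.md",
    "data-science-conventions.md": "ml-conventions.md",
}

DS_DIR = PurePosixPath("content/knowledge/data-science")
ML_DIR = PurePosixPath("content/knowledge/ml")


def counterparts_to_review(changed: list) -> list:
    """Return paired documents that were NOT edited alongside their partner."""
    ds_to_ml = {str(DS_DIR / ds): str(ML_DIR / ml) for ds, ml in PAIRS.items()}
    ml_to_ds = {ml: ds for ds, ml in ds_to_ml.items()}
    changed_set = set(changed)

    to_review = []
    for path in changed:
        other = ds_to_ml.get(path) or ml_to_ds.get(path)
        # Flag the partner only if it was left untouched in this change set.
        if other and other not in changed_set and other not in to_review:
            to_review.append(other)
    return to_review
```

A change set that touches both sides of a pair returns an empty list; the check only surfaces the untouched half.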

package/content/knowledge/data-science/data-science-architecture.md ADDED

@@ -0,0 +1,163 @@
+---
+name: data-science-architecture
+description: Local-first architecture for solo and small-team data science — notebook exploration, src/ promotion, idempotent entrypoint pipelines, Polars vs Pandas choice, and artifact separation
+topics: [data-science, architecture, polars, pandas, notebook-promotion]
+---
+"Architecture" sounds heavy for a single analyst opening a notebook, but it is the one decision that separates work a collaborator can rerun tomorrow from a pile of ad-hoc scripts that only you can coax back to life. Solo DS work is local-first, reproducibility-first, and almost never needs Airflow or a Kubernetes cluster. What it needs is a coherent shape that scales from "a single notebook" to "a pipeline a teammate can clone and run" — and a clear story about where raw data, intermediate data, models, and reports each live. This doc lays out that shape and the small set of conventions that make it hold together.
+## Summary
+Architect a solo DS project as layers: exploratory notebooks on top, reusable functions in `src/`, unit tests in `tests/`, and a thin entrypoint script that composes those functions into a reproducible run. Use Polars for datasets >1 GB or >10M rows and Pandas for everything smaller where scikit-learn / seaborn compatibility matters. Runs happen via `uv run python -m src.pipeline` — no scheduler needed. Pipelines are idempotent functions that move data from `data/raw/` to `data/interim/` to `data/processed/`, emitting models to `models/` and reports to `reports/`. This shape deliberately does not solve distributed data, production serving, or real-time inference — when those become real, graduate to Prefect / Dagster and cross over to `ml-serving-patterns.md`.
+## Deep Guidance
+### The layered shape
+The entire architecture is five layers, each with a single responsibility:
+```
+┌──────────────────────────────────────────────────────────────┐
+│ notebooks/               exploration, narrative, charts       │
+│   ↓  (promote stable code)                                    │
+├──────────────────────────────────────────────────────────────┤
+│ src/<project>/           typed, importable functions          │
+│   ↓  (test every function you ship)                           │
+├──────────────────────────────────────────────────────────────┤
+│ tests/                   pytest smoke + unit tests            │
+│   ↓  (functions compose into a run)                           │
+├──────────────────────────────────────────────────────────────┤
+│ src/pipeline.py          entrypoint: load→features→train→save │
+│   ↓  (run produces artifacts)                                 │
+├──────────────────────────────────────────────────────────────┤
+│ data/ models/ reports/   outputs, gitignored or DVC-tracked   │
+└──────────────────────────────────────────────────────────────┘
+```
+Read top-to-bottom it is the promotion path; read bottom-to-top it is the dependency graph. A notebook may import from `src/` but `src/` must never import from a notebook. Tests depend only on `src/`. The entrypoint (`pipeline.py`) is itself a module under `src/`, not a loose script at the repo root — keeping it importable lets you exercise it end-to-end in tests with a tiny fixture dataset.
+### Polars vs Pandas
+Pick the DataFrame library based on data size and ecosystem needs, not on what's trendy. Rule of thumb:
+| Dimension            | Pandas                                | Polars                                  |
+|----------------------|---------------------------------------|-----------------------------------------|
+| Rows                 | <10M comfortably                      | 10M–1B on a single machine              |
+| In-memory size       | <1 GB                                 | 1 GB – ~RAM/2                           |
+| Execution            | Eager, single-threaded                | Lazy + multi-threaded, Arrow-native     |
+| Ecosystem            | scikit-learn, seaborn, plotly, statsmodels | Native; interop via `.to_pandas()` |
+| API stability        | Mature, huge Stack Overflow corpus    | Younger, faster-moving                  |
+**Default to Pandas** when you are in sklearn / statsmodels / seaborn territory with small-to-medium data — ecosystem friction is the dominant cost. **Reach for Polars** when you are doing heavy group-bys, joins, or window functions on datasets where Pandas starts swapping or takes minutes per cell. The two libraries express the same group-by almost identically:
+```python
+# Pandas
+(df
+ .groupby("customer_id")
+ .agg(total_spend=("amount", "sum"), tx_count=("amount", "count"))
+ .reset_index())
+# Polars (lazy; the final .collect() executes the query)
+(df.lazy()
+ .group_by("customer_id")
+ .agg(pl.col("amount").sum().alias("total_spend"),
+      pl.col("amount").count().alias("tx_count"))
+ .collect())
+```
+Mixing is fine: load with Polars, do the fast aggregation, then `.to_pandas()` right before feeding a scikit-learn estimator. Avoid the trap of half-converting the codebase — pick one as the default for a given project and document it.
+### Notebook to pipeline promotion
+Every piece of code starts life in a notebook. The discipline is knowing when to move it:
+1. You copy-paste a cell into a second notebook → promote.
+2. A transformation has a non-trivial branch (try/except, conditional handling) → promote.
+3. You want to unit-test it → promote (you can't test a notebook cell cleanly).
+Promotion is a four-step move: extract the cell into `src/<project>/features/engineer.py` as a typed function, add a pytest in `tests/`, replace the notebook cell with an `import`, and turn on `%autoreload 2` so subsequent edits live-reload without a kernel restart.
+```python
+# src/<project>/features/engineer.py
+import polars as pl
+def add_tenure_bucket(df: pl.DataFrame, *, today: str) -> pl.DataFrame:
+    """Bucket customers by days since signup into short / medium / long tenure."""
+    return df.with_columns(
+        ((pl.lit(today).str.to_date() - pl.col("signup_date")).dt.total_days())
+        .alias("tenure_days")
+    ).with_columns(
+        pl.when(pl.col("tenure_days") < 90).then(pl.lit("short"))
+          .when(pl.col("tenure_days") < 365).then(pl.lit("medium"))
+          .otherwise(pl.lit("long"))
+          .alias("tenure_bucket")
+    )
+```
+The notebook now reads `from <project>.features.engineer import add_tenure_bucket` and the function is covered by `tests/test_engineer.py` with a six-row fixture. This is the single most important habit in a DS codebase — see `data-science-project-structure.md` for the directory layout it slots into.
+### Idempotent pipeline entrypoints
+The pipeline is a thin composition layer: one function per stage, each one idempotent (same inputs → same outputs, safe to rerun). It lives at `src/<project>/pipeline.py` and exposes a `run(cfg)` function that the `main()` CLI entrypoint wraps:
+```python
+# src/<project>/pipeline.py
+import argparse, yaml
+from pathlib import Path
+from <project>.ingestion import load_transactions
+from <project>.validation import validate_schema
+from <project>.features.engineer import build_features
+from <project>.training import train_model
+from <project>.evaluation import evaluate
+from <project>.io import save_model, save_report
+def run(cfg: dict) -> None:
+    run_id = cfg["run_name"]
+    raw = load_transactions(cfg["data"]["raw_path"])
+    validate_schema(raw, cfg["data"]["schema"])
+    processed = build_features(raw, cfg["features"])
+    processed.write_parquet(Path(cfg["data"]["processed_path"]))
+    model, metrics = train_model(processed, cfg["model"])
+    report = evaluate(model, processed, cfg["evaluation"])
+    save_model(model, f"models/{run_id}.joblib")
+    save_report(report, f"reports/{run_id}.html")
+def main() -> None:
+    ap = argparse.ArgumentParser()
+    ap.add_argument("--config", required=True, type=Path)
+    args = ap.parse_args()
+    run(yaml.safe_load(args.config.read_text()))
+if __name__ == "__main__":
+    main()
+```
+Invoke it with `uv run python -m <project>.pipeline --config configs/baseline.yaml`. Idempotence means: each stage writes to a deterministic path based on the config, and re-running over existing outputs is a no-op (or an overwrite of identical content). That property is what lets a teammate — or future-you — rerun the pipeline confidently without inspecting every intermediate.
+### Where outputs go
+Artifacts follow a strict directory contract so a run never scatters files:
+| Artifact               | Path                                | Notes                                    |
+|------------------------|-------------------------------------|------------------------------------------|
+| Immutable source data  | `data/raw/`                         | Never written to after initial ingest    |
+| Cached partial transforms | `data/interim/`                  | Safe to delete; regenerable from raw     |
+| Analysis-ready datasets | `data/processed/`                  | Consumed by training                     |
+| Predictions            | `data/processed/predictions/`       | Keeps inference outputs alongside data   |
+| Trained models         | `models/<run_id>.joblib`            | DVC or git-lfs pointer tracked           |
+| Rendered reports       | `reports/<run_id>.html`             | HTML / markdown summaries                |
+| Figures                | `reports/figures/<run_id>/`         | PNG / SVG charts                         |
+The rule: **paths come from config, never hard-coded in code**. `cfg["output"]["model_path"]` lives in the YAML; `"models/baseline_v1.joblib"` never appears as a string literal inside `training.py`. That is what lets a single pipeline module serve every run variant.
+### When to outgrow this
+This architecture covers the 0-to-100GB, one-to-three-contributors slot. Signals you are leaving that slot:
+- Data no longer fits on a laptop (>100 GB, or streaming sources) → Spark, DuckDB+S3, or a warehouse-side pipeline.
+- You need scheduled / triggered runs with retries, alerting, observability → Prefect, Dagster, or Airflow.
+- The model must serve real-time predictions under an SLA → cross over to `ml-serving-patterns.md` for online inference, feature stores, and the training-serving split.
+- Multiple people are editing the pipeline concurrently → promote `configs/` to a registry, add a model registry (MLflow), and start writing ADRs under `docs/adr/`.
+- The team wants experiment tracking beyond a CSV of metrics → MLflow Tracking or Weights & Biases.
+Do not preemptively adopt any of these. Installing Dagster for a weekly notebook is a classic small-team failure mode — the operational tax (scheduler, DB, UI, auth) dwarfs the benefit. Graduate one piece at a time, and only when the pain is concrete. The layered shape above is deliberately the smallest coherent thing; resist making it bigger until the evidence demands it.

package/content/knowledge/data-science/data-science-conventions.md ADDED

@@ -0,0 +1,233 @@
+---
+name: data-science-conventions
+description: Python coding conventions for solo data-science work — ruff for lint+format, pragmatic type hints, pyproject.toml as single config source, import ordering, module layout, naming, and docstrings
+topics: [data-science, conventions, python, ruff, type-hints]
+---
+Solo data-science code drifts faster than any other kind of Python: half of it lives in notebooks, the other half migrates into scripts, and nothing stays stable long enough to earn a style review. Consistent conventions are the only thing that keeps cognitive load bounded when you come back to a project after two months. Encode them in tooling (`ruff`, `pyproject.toml`) so they run on save — not on willpower — and the notebook→script promotion path stays smooth instead of becoming a cleanup tax.
+## Summary
+Use `ruff` as the single lint + format tool: `ruff format` is Black-compatible and replaces Black, so do not install both. Apply type hints pragmatically: typed on any function another module imports, omitted on throwaway notebook helpers. Centralize all project and tool configuration in `pyproject.toml`, one file for build metadata, dependencies, ruff, and pytest. Use `ruff`/`isort`-style import sections (stdlib → third-party → local), a `src/` layout (not a flat one) with a clear module split, and docstrings sized to the consumer: one-liners for internal helpers, full Google/NumPy style for anything a teammate will call without reading the source.
+## Deep Guidance
+### Linter + formatter (ruff)
+`ruff` is the only Python linter/formatter a solo DS project needs. It replaces `flake8`, `isort`, `pyupgrade`, `pydocstyle`, `pylint` (mostly), and — via `ruff format` — Black. It is an order of magnitude faster than the tools it replaces, configured in one `[tool.ruff]` block, and has no plugin-management overhead. Do not layer Black on top: `ruff format` implements the same formatting contract, and running both just causes churn.
+```toml
+# pyproject.toml
+[tool.ruff]
+line-length = 100
+target-version = "py311"
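+# Bare names such as "data" or "models" match a directory of that name anywhere in the
+# tree, which would also skip src/<package>/data and src/<package>/models, so they are
+# left out here; the top-level data/ and models/ directories hold no Python to lint.
+extend-exclude = ["notebooks/_scratch"]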
+[tool.ruff.lint]
+select = [
+  "E",   # pycodestyle errors
+  "W",   # pycodestyle warnings
+  "F",   # pyflakes
+  "I",   # isort (import sorting)
+  "N",   # pep8-naming
+  "UP",  # pyupgrade
+  "B",   # flake8-bugbear
+  "C90", # mccabe complexity
+  "D",   # pydocstyle
+]
+ignore = [
+  "D100",  # missing docstring in public module — noisy for scripts
+  "D104",  # missing docstring in public package
+  "E501",  # line-too-long — formatter handles it
+]
+[tool.ruff.lint.per-file-ignores]
+# Notebooks and experiment scripts get a lighter hand
+"notebooks/**/*.py" = ["D", "N806", "E402"]
+"scripts/**/*.py"  = ["D"]
+"tests/**/*.py"    = ["D"]
+[tool.ruff.lint.pydocstyle]
+convention = "google"
+[tool.ruff.lint.mccabe]
+max-complexity = 12
+[tool.ruff.format]
+quote-style = "double"
+indent-style = "space"
+```
+**Tradeoff**: notebook and exploration code legitimately breaks rules that production code should not — uppercase variable names (`X_train`), imports after executable code, no docstrings. The `per-file-ignores` block disables the rules that fight notebook workflows without weakening the defaults for `src/`. Do not globally ignore `D` or `N` just to silence notebook noise.
+Run on save (editor integration) and as a pre-commit hook. In CI, run `ruff check .` and `ruff format --check .` — the `--check` flag fails instead of rewriting.
+### Type hints
+Python is dynamically typed, and pretending otherwise in exploratory code wastes time. The rule is **import boundary = type boundary**: if another module imports the function, type it. Notebook-local helpers and inline lambdas do not need annotations.
+```python
+# src/features/encoders.py — imported by training and serving, fully typed
+from __future__ import annotations
+
+import numpy as np
+import pandas as pd
+
+
+def target_encode(
+    series: pd.Series,
+    target: pd.Series,
+    smoothing: float = 10.0,
+) -> pd.Series:
+    """Smoothed target encoding for a categorical feature.
+
+    Args:
+        series: Categorical feature values (any hashable dtype).
+        target: Numeric target aligned to `series` by index.
+        smoothing: Prior weight; higher values pull rare categories
+            toward the global mean.
+
+    Returns:
+        Series of encoded floats aligned to `series.index`.
+    """
+    global_mean = target.mean()
+    agg = target.groupby(series).agg(["mean", "count"])
+    weight = agg["count"] / (agg["count"] + smoothing)
+    encoding = weight * agg["mean"] + (1 - weight) * global_mean
+    return series.map(encoding).astype(np.float64)
+```
+```python
+# notebooks/03_eda.py — throwaway scratch, no annotations needed
+def quick_hist(col):
+    return df[col].value_counts().head(20)
+for c in cat_cols:
+    print(c, quick_hist(c).to_dict())
+```
+Practical rules:
+- Type every function exported from `src/` — parameters and return.
+- Type dataclasses and `TypedDict` schemas that describe data contracts (row shapes, config dicts).
+- Skip annotations on notebook cells, inline closures, and private helpers inside a single script.
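+- Use `from __future__ import annotations` at the top of every `src/` file. It stores annotations as strings instead of evaluating them, so forward references just work. It does not skip imports on its own: to keep an expensive annotation-only dependency (`torch.Tensor`, `pd.DataFrame`) off the import path, also move that import under `if TYPE_CHECKING:`.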
+- Do not run `mypy --strict` on a solo DS project. Run it on `src/` with `--ignore-missing-imports` if you want a safety net, and do not bother with notebooks.
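The rules above might look like this in a `src/` module. `FeatureRow` and `rows_to_matrix` are hypothetical names chosen for illustration; the `TYPE_CHECKING` guard is the standard-library way to keep an annotation-only import out of the runtime import path.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, TypedDict

if TYPE_CHECKING:
    # Seen only by type checkers; never imported at runtime.
    from collections.abc import Sequence


class FeatureRow(TypedDict):
    """Shape of one model-input row (a data contract)."""

    customer_id: str
    tenure_months: int
    monthly_spend: float


def rows_to_matrix(rows: Sequence[FeatureRow]) -> list[list[float]]:
    """Project the numeric columns of each row into a feature vector."""
    return [[float(r["tenure_months"]), float(r["monthly_spend"])] for r in rows]
```

The `Sequence` annotation is never evaluated at runtime because of the `__future__` import, so the guarded import is safe. For a cheap standard-library type this buys little; the pattern pays off when the guarded import is a heavy third-party package.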
+### Project layout and pyproject.toml
+One `pyproject.toml` at the repo root configures the build, dependencies, lint, format, and tests. Do not scatter config across `setup.cfg`, `.flake8`, `.isort.cfg`, and `pytest.ini` — everything lives in `pyproject.toml`.
+```toml
+# pyproject.toml
+[build-system]
+requires = ["hatchling"]
+build-backend = "hatchling.build"
+[project]
+name = "churn-model"
+version = "0.1.0"
+description = "Customer churn prediction — feature pipeline, training, and serving."
+requires-python = ">=3.11"
+dependencies = [
+  "pandas>=2.1",
+  "numpy>=1.26",
+  "scikit-learn>=1.4",
+  "pydantic>=2.5",
+]
+[project.optional-dependencies]
+dev = [
+  "ruff>=0.3",
+  "pytest>=8.0",
+  "pytest-cov>=4.1",
+  "ipykernel>=6.29",
+]
+[tool.ruff]
+line-length = 100
+target-version = "py311"
+# ... (see ruff section above)
+[tool.pytest.ini_options]
+testpaths = ["tests"]
+addopts = "-ra --strict-markers --cov=src --cov-report=term-missing"
+markers = [
+  "slow: marks tests as slow (deselect with '-m \"not slow\"')",
+]
+```
+Repo layout:
+```
+churn-model/
+  pyproject.toml
+  README.md
+  src/
+    churn_model/
+      __init__.py
+      data/          # loaders, schemas, splits
+      features/      # transformers, encoders, selection
+      models/        # model definitions and wrappers
+      training/      # train loops, CV runners
+      evaluation/    # metrics, diagnostics
+      serving/       # inference helpers
+  notebooks/
+    01_data_audit.ipynb
+    02_feature_exploration.ipynb
+  tests/
+    test_features.py
+    test_training.py
+  configs/
+    base.yaml
+```
+Use a `src/` layout (not flat) so imports always go through the installed package — this prevents the "works in notebook, breaks in test" failure mode where `from my_module import x` resolves from the CWD instead of the package.
+### Import ordering
+`ruff` with rule `I` enforces `isort`-compatible sections automatically. The contract:
+1. Future imports (`from __future__ import annotations`)
+2. Standard library
+3. Third-party
+4. First-party (your package)
+5. Local relative (`from .utils import ...`)
+One blank line between sections, alphabetical within each. Do not hand-maintain this — `ruff check --fix` sorts imports in milliseconds.
+```python
+from __future__ import annotations
+
+import json
+from pathlib import Path
+
+import numpy as np
+import pandas as pd
+from sklearn.model_selection import KFold
+
+from churn_model.data import load_raw
+from churn_model.features import target_encode
+
+from .utils import timed
+```
+### Naming and docstrings
+Naming rubric (enforced by `ruff` rule `N`):
+- **Modules/files**: `snake_case.py` (`feature_store.py`, not `FeatureStore.py`).
+- **Functions/variables**: `snake_case` (`compute_auc`, `n_splits`).
+- **Classes**: `PascalCase` (`ChurnDataset`, `TargetEncoder`).
+- **Constants**: `UPPER_SNAKE_CASE` at module top level (`DEFAULT_SEED = 42`, `FEATURE_COLUMNS: tuple[str, ...] = (...)`).
+- **Private**: single leading underscore (`_internal_helper`). Double underscore only when you specifically want name-mangling inside a class.
+- **Type variables**: `PascalCase` with suffix (`ModelT = TypeVar("ModelT")`).
+- **Feature matrices and targets**: `X`, `y`, `X_train`, `y_test` are the one permitted uppercase exception. This is ML convention, and `ruff` can be told to allow it by ignoring `N806` (uppercase local variables) and `N803` (uppercase argument names) in model/training modules.
+Docstring style sizing — match the cost of writing the docstring to the consumer:
+- **Terse one-liner** for private helpers and obvious utilities. `"""Return the 95th percentile of non-null values."""` is enough.
+- **Full Google-style** (Args/Returns/Raises) for any public function in `src/features/`, `src/models/`, or `src/serving/` — anything a teammate or future-you will call without opening the source. See the `target_encode` example above.
+- **Module docstring** on every `src/` module: one sentence describing what lives there. Skip on `scripts/` and `notebooks/`.
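+- **Class docstring** covers the class contract; `__init__` args go in the class docstring, not a separate `__init__` docstring. This is the Google convention. Note that with `D` selected, `ruff` may still report `D107` (missing docstring in `__init__`) under `convention = "google"`; add `D107` to `ignore` if it does.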
+Pick Google **or** NumPy style — not both — and set it in `[tool.ruff.lint.pydocstyle]`. Google is more compact and reads better in IDE hover; NumPy is better when you have long parameter descriptions with math. For solo DS, Google is the default recommendation.