@zigrivers/scaffold 3.22.0 → 3.24.0

Files changed (111)
  1. package/README.md +44 -23
  2. package/content/knowledge/core/automated-review-tooling.md +3 -3
  3. package/content/knowledge/core/multi-model-review-dispatch.md +13 -4
  4. package/content/knowledge/data-science/README.md +23 -0
  5. package/content/knowledge/data-science/data-science-architecture.md +163 -0
  6. package/content/knowledge/data-science/data-science-conventions.md +233 -0
  7. package/content/knowledge/data-science/data-science-data-versioning.md +198 -0
  8. package/content/knowledge/data-science/data-science-dev-environment.md +159 -0
  9. package/content/knowledge/data-science/data-science-experiment-tracking.md +194 -0
  10. package/content/knowledge/data-science/data-science-model-evaluation.md +160 -0
  11. package/content/knowledge/data-science/data-science-notebook-discipline.md +170 -0
  12. package/content/knowledge/data-science/data-science-observability.md +161 -0
  13. package/content/knowledge/data-science/data-science-project-structure.md +178 -0
  14. package/content/knowledge/data-science/data-science-reproducibility.md +164 -0
  15. package/content/knowledge/data-science/data-science-requirements.md +151 -0
  16. package/content/knowledge/data-science/data-science-security.md +151 -0
  17. package/content/knowledge/data-science/data-science-testing.md +183 -0
  18. package/content/knowledge/ml/README.md +10 -0
  19. package/content/methodology/data-science-overlay.yml +39 -0
  20. package/content/pipeline/build/multi-agent-resume.md +7 -6
  21. package/content/pipeline/build/multi-agent-start.md +7 -6
  22. package/content/pipeline/build/single-agent-resume.md +7 -6
  23. package/content/pipeline/build/single-agent-start.md +7 -6
  24. package/content/pipeline/environment/automated-pr-review.md +79 -27
  25. package/content/skills/mmr/SKILL.md +72 -2
  26. package/content/skills/scaffold-runner/SKILL.md +65 -19
  27. package/content/tools/review-code.md +74 -16
  28. package/content/tools/review-pr.md +25 -6
  29. package/dist/cli/commands/check.d.ts.map +1 -1
  30. package/dist/cli/commands/check.js +28 -17
  31. package/dist/cli/commands/check.js.map +1 -1
  32. package/dist/config/schema.d.ts +672 -126
  33. package/dist/config/schema.d.ts.map +1 -1
  34. package/dist/config/schema.js +8 -0
  35. package/dist/config/schema.js.map +1 -1
  36. package/dist/config/schema.test.js +2 -2
  37. package/dist/config/schema.test.js.map +1 -1
  38. package/dist/config/validators/data-science.d.ts +4 -0
  39. package/dist/config/validators/data-science.d.ts.map +1 -0
  40. package/dist/config/validators/data-science.js +15 -0
  41. package/dist/config/validators/data-science.js.map +1 -0
  42. package/dist/config/validators/index.d.ts.map +1 -1
  43. package/dist/config/validators/index.js +2 -0
  44. package/dist/config/validators/index.js.map +1 -1
  45. package/dist/core/assembly/knowledge-loader.d.ts.map +1 -1
  46. package/dist/core/assembly/knowledge-loader.js +6 -0
  47. package/dist/core/assembly/knowledge-loader.js.map +1 -1
  48. package/dist/core/assembly/knowledge-loader.test.js +34 -0
  49. package/dist/core/assembly/knowledge-loader.test.js.map +1 -1
  50. package/dist/e2e/project-type-overlays.test.js +73 -0
  51. package/dist/e2e/project-type-overlays.test.js.map +1 -1
  52. package/dist/project/adopt.d.ts.map +1 -1
  53. package/dist/project/adopt.js +3 -1
  54. package/dist/project/adopt.js.map +1 -1
  55. package/dist/project/detectors/coverage.test.d.ts +2 -0
  56. package/dist/project/detectors/coverage.test.d.ts.map +1 -0
  57. package/dist/project/detectors/coverage.test.js +78 -0
  58. package/dist/project/detectors/coverage.test.js.map +1 -0
  59. package/dist/project/detectors/data-science.d.ts +4 -0
  60. package/dist/project/detectors/data-science.d.ts.map +1 -0
  61. package/dist/project/detectors/data-science.js +32 -0
  62. package/dist/project/detectors/data-science.js.map +1 -0
  63. package/dist/project/detectors/data-science.test.d.ts +2 -0
  64. package/dist/project/detectors/data-science.test.d.ts.map +1 -0
  65. package/dist/project/detectors/data-science.test.js +62 -0
  66. package/dist/project/detectors/data-science.test.js.map +1 -0
  67. package/dist/project/detectors/disambiguate.d.ts +2 -0
  68. package/dist/project/detectors/disambiguate.d.ts.map +1 -1
  69. package/dist/project/detectors/disambiguate.js +3 -2
  70. package/dist/project/detectors/disambiguate.js.map +1 -1
  71. package/dist/project/detectors/disambiguate.test.js +10 -1
  72. package/dist/project/detectors/disambiguate.test.js.map +1 -1
  73. package/dist/project/detectors/index.d.ts.map +1 -1
  74. package/dist/project/detectors/index.js +2 -0
  75. package/dist/project/detectors/index.js.map +1 -1
  76. package/dist/project/detectors/library.d.ts.map +1 -1
  77. package/dist/project/detectors/library.js +1 -0
  78. package/dist/project/detectors/library.js.map +1 -1
  79. package/dist/project/detectors/resolve-detection.test.js +31 -0
  80. package/dist/project/detectors/resolve-detection.test.js.map +1 -1
  81. package/dist/project/detectors/types.d.ts +6 -2
  82. package/dist/project/detectors/types.d.ts.map +1 -1
  83. package/dist/project/detectors/types.js.map +1 -1
  84. package/dist/types/config.d.ts +8 -1
  85. package/dist/types/config.d.ts.map +1 -1
  86. package/dist/wizard/copy/core.d.ts.map +1 -1
  87. package/dist/wizard/copy/core.js +4 -0
  88. package/dist/wizard/copy/core.js.map +1 -1
  89. package/dist/wizard/copy/data-science.d.ts +3 -0
  90. package/dist/wizard/copy/data-science.d.ts.map +1 -0
  91. package/dist/wizard/copy/data-science.js +15 -0
  92. package/dist/wizard/copy/data-science.js.map +1 -0
  93. package/dist/wizard/copy/index.d.ts.map +1 -1
  94. package/dist/wizard/copy/index.js +2 -0
  95. package/dist/wizard/copy/index.js.map +1 -1
  96. package/dist/wizard/copy/types.d.ts +5 -1
  97. package/dist/wizard/copy/types.d.ts.map +1 -1
  98. package/dist/wizard/copy/types.test-d.js +7 -0
  99. package/dist/wizard/copy/types.test-d.js.map +1 -1
  100. package/dist/wizard/questions.d.ts +2 -1
  101. package/dist/wizard/questions.d.ts.map +1 -1
  102. package/dist/wizard/questions.js +9 -1
  103. package/dist/wizard/questions.js.map +1 -1
  104. package/dist/wizard/questions.test.js +14 -0
  105. package/dist/wizard/questions.test.js.map +1 -1
  106. package/dist/wizard/wizard.d.ts.map +1 -1
  107. package/dist/wizard/wizard.js +1 -0
  108. package/dist/wizard/wizard.js.map +1 -1
  109. package/package.json +1 -1
  110. package/skills/mmr/SKILL.md +72 -2
  111. package/skills/scaffold-runner/SKILL.md +65 -19
@@ -0,0 +1,151 @@
1
+ ---
2
+ name: data-science-security
3
+ description: Practical security guardrails for solo / small-team data-science work — PII masking at ingest, credential hygiene with direnv and 1Password, data classification tiers, notebook output stripping, and a note on model memorization
4
+ topics: [data-science, security, pii, secrets, data-classification]
5
+ ---
6
+
7
+ DS work has elevated security risk because analysis code routinely touches raw customer data before anyone has had a chance to sanitize it. A notebook can render real names, emails, and account numbers inline, then get committed to git, emailed to a stakeholder, or pasted into Slack without a second thought. Prediction caches and CSV exports quietly duplicate sensitive rows into `data/` subdirectories. Credentials for warehouses and cloud buckets get dropped into `.env` files or — worse — directly into a notebook cell. The blast radius of a sloppy DS workflow is larger than people assume, and the mitigations are not exotic: they are cheap, boring habits that need to be enforced by tooling.
8
+
9
+ ## Summary
10
+
11
+ Mask `PII` at the ingest boundary so downstream notebooks and logs never see raw identifiers — hash emails, truncate names, drop free-text you do not need. Never commit `secrets`; keep local credentials in a gitignored `direnv` `.envrc.local` or, better, inject them at runtime with `1Password` CLI (`op run --`) so they are never written to disk. Classify every dataset as public / internal / confidential / restricted and let the tier decide where it lives — restricted data stays in the warehouse, confidential gets gitignored, internal lives on a shared drive, public is public. Strip notebook outputs with `nbstripout` as a pre-commit hook (or switch to Marimo's `.py` notebooks, which do not embed outputs at all). For fine-tuned or RAG models, assume training data can leak back out through generations and scrub accordingly.
12
+
13
+ ## Deep Guidance
14
+
15
+ ### Handling PII
16
+
17
+ Identify `PII` at the ingest boundary, not inside your analysis code. The rule is: once a column has left the ingest layer, it should either be pseudonymized (hashed, truncated, bucketed) or stripped. Free-text fields (support tickets, chat logs, notes) are the worst offenders — if the analysis does not require them, drop them. If it does, run them through a scrubber like Presidio or a simple regex pass before they land in a DataFrame.
18
+
19
+ Typical categories to handle:
20
+
21
+ - **Direct identifiers** — name, email, phone, SSN, account number, precise address. Hash or drop.
22
+ - **Quasi-identifiers** — ZIP + age + gender can re-identify an individual in a surprisingly small population. Bucket aggressively (age → 10-year bands, ZIP → first 3 digits).
23
+ - **Sensitive attributes** — health, financial, biometric. Treat as restricted (see classification below) and keep out of local files entirely.
24
+ - **Free-text** — run through a scrubber or drop unless the analysis genuinely needs the prose.
25
+
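The bucketing advice for quasi-identifiers can be sketched with the standard library alone. The helper names (`age_band`, `zip3`, `coarsen`) and the record keys are hypothetical, chosen for illustration; they are not part of the package:

```python
# Hypothetical helpers (not from the package): coarsen quasi-identifiers
# so that age and ZIP can no longer be combined to single out a person.

def age_band(age: int, width: int = 10) -> str:
    """Map an exact age to a band label such as '30-39'."""
    if age < 0:
        raise ValueError("age must be non-negative")
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def zip3(zip_code: str) -> str:
    """Keep only the first three digits of a US ZIP code."""
    digits = "".join(ch for ch in zip_code if ch.isdigit())
    if len(digits) < 3:
        raise ValueError("ZIP code needs at least three digits")
    return digits[:3]


def coarsen(record: dict) -> dict:
    """Return a copy of record with exact age and ZIP replaced by buckets."""
    out = {k: v for k, v in record.items() if k not in ("age", "zip")}
    out["age_band"] = age_band(record["age"])
    out["zip3"] = zip3(record["zip"])
    return out
```

Applying this at ingest, next to the masking helper, keeps exact ages and full ZIP codes out of every downstream DataFrame.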
26
+ A minimal masking helper for structured data:
27
+
28
+ ```python
+ # src/pii.py
+ import hashlib
+ import pandas as pd
+
+ def _hash_email(email: str, salt: str) -> str:
+     """Deterministic, salted hash: same email maps to same token for joins."""
+     if pd.isna(email):
+         return ""
+     return hashlib.sha256(f"{salt}:{email.lower().strip()}".encode()).hexdigest()[:16]
+
+ def mask_customer_frame(df: pd.DataFrame, salt: str) -> pd.DataFrame:
+     out = df.copy()
+     if "email" in out:
+         out["email_id"] = out["email"].map(lambda e: _hash_email(e, salt))
+         out = out.drop(columns=["email"])
+     if "full_name" in out:
+         # keep first initial for rough demographic analysis, drop the rest
+         out["name_initial"] = out["full_name"].str[:1]
+         out = out.drop(columns=["full_name"])
+     # drop anything we never need
+     for col in ("phone", "ssn", "address", "dob"):
+         if col in out:
+             out = out.drop(columns=[col])
+     return out
+ ```
54
+
55
+ Pair this with a `pandera` schema check on the training-ready DataFrame that asserts sensitive columns are absent: no bare `email` column, no `ssn` column, no `phone` column. That way a future change that accidentally reintroduces raw PII fails loudly in CI instead of slipping through silently:
56
+
57
+ ```python
+ import pandera.pandas as pa
+
+ TrainingSchema = pa.DataFrameSchema(
+     columns={
+         "email_id": pa.Column(str),
+         "name_initial": pa.Column(str, nullable=True),
+         "signup_month": pa.Column("datetime64[ns]"),
+     },
+     strict=True,  # reject any column not listed
+ )
+
+ # extra defensive: denylist raw-PII names in case strict=True is relaxed later
+ # (df is the training-ready DataFrame being checked)
+ _FORBIDDEN = {"email", "full_name", "phone", "ssn", "address", "dob"}
+ assert not (_FORBIDDEN & set(df.columns)), f"raw PII leaked: {_FORBIDDEN & set(df.columns)}"
+ ```
73
+
74
+ Run this check at the boundary between ingest and modeling, and again before anything gets written to a prediction cache or exported as a report.
75
+
76
+ ### Credential hygiene
77
+
78
+ Never commit `secrets`. There are two patterns worth using locally; pick one per project and be consistent.
79
+
80
+ **Pattern 1 — `direnv` with a gitignored `.envrc.local`:**
81
+
82
+ ```bash
83
+ # .envrc (committed — references local overrides)
84
+ dotenv_if_exists .envrc.local
85
+
86
+ # .envrc.local (gitignored — real values live here)
87
+ export WAREHOUSE_URL="postgres://analytics:REAL_PASSWORD@warehouse.internal/prod"
88
+ export AWS_PROFILE="ds-read"
89
+ ```
90
+
91
+ Add `.envrc.local` and `.env*` to `.gitignore`. `direnv` loads these exports automatically when you `cd` into the project.
92
+
93
+ **Pattern 2 — `1Password` CLI with `op run`:**
94
+
95
+ ```bash
96
+ # .env.1password (committed — references, not values)
97
+ WAREHOUSE_URL=op://DS/warehouse-prod/connection_url
98
+ OPENAI_API_KEY=op://DS/openai/api_key
99
+
100
+ # run any command with secrets injected at runtime
101
+ op run --env-file=.env.1password -- python src/train.py
102
+ op run --env-file=.env.1password -- jupyter lab
103
+ ```
104
+
105
+ `op run` substitutes the `op://` references with real values in the child process's environment and never writes them to disk. The committed `.env.1password` file is safe to share because it contains only vault paths, not secrets. This is the stronger pattern when more than one person needs access — you manage grants in 1Password instead of passing `.envrc.local` files around.
106
+
107
+ In production, secrets live in the platform's secret manager (AWS Secrets Manager, GCP Secret Manager, HashiCorp Vault) and get injected into the runtime the same way. The governing rule: **if it would go in a `.env` file, it goes in 1Password; if it would go in a secret manager in prod, it stays there — don't duplicate a copy onto your laptop.**
108
+
109
+ A few hygiene rules that follow from this:
110
+
111
+ - Never paste an API key into a notebook cell, even temporarily. Cells get autosaved, checkpointed, and sometimes committed.
112
+ - Never print a credential to logs — wrap secret-carrying objects in types that redact on `__repr__` (Pydantic's `SecretStr`, for example).
113
+ - Rotate any credential that has ever touched your clipboard, a chat window, or a screen share.
114
+ - Run a pre-commit scanner (`gitleaks` or `detect-secrets`) so a stray key cannot get committed even when someone bypasses the `.envrc.local` pattern.
115
+
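The "redact on `__repr__`" rule can be illustrated without any dependency. The `Redacted` class below is a hypothetical stand-in written for this sketch; in a real project Pydantic's `SecretStr` fills the same role and exposes the value through `get_secret_value()`:

```python
class Redacted:
    """Hypothetical wrapper that hides a secret from logs and tracebacks."""

    __slots__ = ("_value",)

    def __init__(self, value: str) -> None:
        self._value = value

    def reveal(self) -> str:
        """Return the real value; call only where the secret is actually used."""
        return self._value

    def __repr__(self) -> str:
        # Fixed-width mask so the output does not leak the secret's length.
        return "Redacted('**********')"

    # print(), logging, and f-strings go through __str__, so mask it as well.
    __str__ = __repr__


token = Redacted("sk-live-123")
print(f"using token {token}")  # the raw value never reaches the log line
```

The point of the design is that accidental exposure (a stray `print`, a logged config object) shows the mask, while deliberate use requires an explicit `reveal()` call that is easy to grep for in review.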
116
+ ### Data classification
117
+
118
+ Classify every dataset against a four-tier rubric and let the tier drive storage and access:
119
+
120
+ - **Public** — already on the internet (open datasets, published benchmarks). Can live anywhere, including git.
121
+ - **Internal** — non-sensitive company data (aggregated metrics, anonymized cohorts). Shared private drive or object store with team-level access. Do not commit to git.
122
+ - **Confidential** — business-sensitive but not regulated (revenue breakdowns, customer segments, unreleased product data). Gitignored `data/` directory locally; encrypted bucket with narrow ACL for sharing. Never in notebooks you paste into Slack.
123
+ - **Restricted** — regulated or high-risk PII (health records, payment data, government IDs, raw customer identifiers). Stays in the warehouse or source bucket — **do not download**. Run analysis server-side (dbt model, warehouse notebook, SQL-only pipeline) and only materialize aggregates locally.
124
+
125
+ The mapping matters more than the labels. The point of classification is that "can I keep a CSV of this on my laptop?" has a predetermined answer instead of a per-dataset judgment call made while tired.
126
+
127
+ Record the classification alongside the data — a one-line `data/README.md` entry per source (`customers_raw: restricted, warehouse-only`) is enough. When a new teammate or a future-you adds a pull, the constraint is visible without having to ask.
128
+
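Because the classification lives in plain text, a small script can check it. The sketch below assumes every `data/README.md` entry follows the `name: tier, note` shape shown above and that local files are named after their source; both the function names and that naming convention are assumptions made for illustration:

```python
from pathlib import PurePath

TIERS = ("public", "internal", "confidential", "restricted")


def parse_classifications(text: str) -> dict[str, str]:
    """Parse 'name: tier, note' lines into a {name: tier} mapping."""
    tiers: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip().lstrip("-* ").strip("`")
        if ":" not in line:
            continue  # headings, blank lines, free-form notes
        name, rest = line.split(":", 1)
        tier = rest.split(",", 1)[0].strip().lower()
        if tier in TIERS:
            tiers[name.strip().strip("`")] = tier
    return tiers


def local_violations(tiers: dict[str, str], local_files: list[str]) -> list[str]:
    """Return restricted source names that have a same-named local file."""
    stems = {PurePath(path).stem for path in local_files}
    return sorted(name for name, tier in tiers.items()
                  if tier == "restricted" and name in stems)
```

Run from a pre-commit hook or a CI step, a non-empty result from `local_violations` would mean a restricted source has been downloaded despite the "warehouse-only" rule.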
129
+ ### Notebook output hygiene
130
+
131
+ A Jupyter `.ipynb` file is a JSON blob that embeds every cell's rendered output, which means a single `df.head()` on a customer table commits 5 real customer rows to git forever. Strip outputs with `nbstripout` as a pre-commit hook:
132
+
133
+ ```yaml
+ # .pre-commit-config.yaml
+ repos:
+   - repo: https://github.com/kynan/nbstripout
+     rev: 0.7.1
+     hooks:
+       - id: nbstripout
+         files: \.ipynb$
+ ```
142
+
143
+ Install once with `pre-commit install` and every `git commit` scrubs outputs automatically. Pair with a Jupyter config (`jupyter_notebook_config.py`) that disables output saving entirely if you want belt-and-braces.
144
+
145
+ Marimo's `.py`-format notebooks sidestep this problem — they are regular Python files, outputs never get persisted in the notebook, and diffs are reviewable like ordinary code. If you have not picked a notebook format for a new project, prefer Marimo; see `data-science-notebook-discipline` for the broader tradeoffs.
146
+
147
+ Whichever format you pick, also keep prediction caches, CSV exports, and ad-hoc scratch files out of git — a broad `data/` and `outputs/` entry in `.gitignore` prevents the most common leak: a confidential sample dataset getting committed as an "example."
148
+
149
+ ### A word on model memorization
150
+
151
+ Fine-tuned LLMs and RAG systems can reproduce training data verbatim under the right prompt. If your fine-tune corpus or retrieval index contains PII, assume it can leak. Mitigations, in order of strength: scrub PII from the corpus before training or indexing (reuse the masking helper above); host the model privately so prompts and responses stay inside your perimeter; apply output filtering to block regex-detectable identifiers on the way out. Do not fine-tune a public base model on raw customer data and then expose it on an open endpoint — that is the failure mode worth avoiding.
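The weakest mitigation in that list, output filtering, is also the easiest to prototype. This is a minimal sketch with illustrative patterns for emails, US SSNs, and US-style phone numbers; the pattern set and the `redact_output` name are assumptions, and a regex pass will miss names, addresses, and anything not in a fixed format:

```python
import re

# Illustrative patterns only; a real deployment would need a broader,
# locale-aware set or a dedicated scrubber.
_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def redact_output(text: str) -> str:
    """Replace regex-detectable identifiers in model output with labels."""
    for label, pattern in _PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Wrapping every model response in a call like `redact_output(response_text)` gives a last line of defense, but it does not replace scrubbing the corpus before training or indexing.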
@@ -0,0 +1,183 @@
1
+ ---
2
+ name: data-science-testing
3
+ description: Testing strategy for solo DS code — pytest for pure functions, pandera for DataFrame schemas at test time and at ingest boundaries, and committed CSV fixtures for deterministic tests
4
+ topics: [data-science, testing, pytest, pandera]
5
+ ---
6
+
7
+ Data-science code rots quietly. A notebook cell that worked on Tuesday's snapshot silently breaks on Friday's because an upstream column was renamed, a dtype shifted from `int64` to `float64`, or a categorical grew a new level nobody tested for. Refactors that move feature logic out of a notebook into `src/` routinely regress because there was no test pinning the old behavior. Tests catch these failures at the line that introduced them instead of at the end of a three-hour pipeline run.
8
+
9
+ ## Summary
10
+
11
+ Treat DS testing as three separate layers with distinct tools. Use `pytest` for pure-function unit tests — feature engineering, metric calculations, preprocessing helpers in `src/`. Use `pandera` for DataFrame-level contracts: schemas assert column names, dtypes, value ranges, and non-null expectations, and those same schemas run both in tests and at runtime at ingest boundaries. Use committed CSV fixtures in `tests/fixtures/` loaded through pytest fixtures for deterministic, reviewable test data. Keep this doc's scope to CODE correctness; model quality (AUC, calibration, drift) belongs in `data-science-model-evaluation.md`.
12
+
13
+ ## Deep Guidance
14
+
15
+ ### Unit tests with pytest
16
+
17
+ Every helper in `src/` that transforms data is a pure function candidate for `pytest`. Arrange small inputs, act by calling the function, assert on the output. If a helper reaches for a database or filesystem, push that I/O out to the caller so the core logic stays testable without mocks.
18
+
19
+ ```python
+ # tests/test_features.py
+ import numpy as np
+ import pandas as pd
+ import pytest
+ from src.features import impute_missing_ages
+
+ class TestImputeMissingAges:
+     def test_fills_nan_with_median(self):
+         df = pd.DataFrame({"age": [10.0, 20.0, 30.0, np.nan]})
+         result = impute_missing_ages(df)
+         assert result["age"].isna().sum() == 0
+         assert result.loc[3, "age"] == 20.0  # median of [10, 20, 30]
+
+     def test_preserves_non_null_values(self):
+         df = pd.DataFrame({"age": [10.0, 20.0, 30.0, np.nan]})
+         result = impute_missing_ages(df)
+         pd.testing.assert_series_equal(
+             result.loc[:2, "age"], df.loc[:2, "age"], check_names=False
+         )
+
+     def test_all_nan_raises(self):
+         df = pd.DataFrame({"age": [np.nan, np.nan]})
+         with pytest.raises(ValueError, match="all-null"):
+             impute_missing_ages(df)
+ ```
45
+
46
+ Run with `pytest -q`. Add `--cov=src` via `pytest-cov` once the project has more than a handful of helpers; aim for coverage on feature-engineering and metrics modules, not notebooks.
47
+
48
+ Four rules keep this layer productive:
49
+
50
+ - **Name tests after the behavior, not the function**: `test_fills_nan_with_median` beats `test_impute_missing_ages_1`. The name is the failure message when CI turns red.
51
+ - **One assertion family per test**: a test checks either output values, or output shape, or error behavior — not all three. Split into three tests. Failures point at the broken property immediately.
52
+ - **Use `pd.testing.assert_frame_equal` and `np.testing.assert_allclose`**: never compare DataFrames with `==` or floats with exact equality. Pass `rtol`/`atol` explicitly so the tolerance is visible in the test.
53
+ - **Mark slow tests**: decorate any test that loads a non-trivial dataset with `@pytest.mark.slow` and run the default suite with `-m "not slow"` so `pytest` stays under ~5 seconds on save.
54
+
55
+ ### Data-frame validation with pandera
56
+
57
+ Column drift is the single most common source of silent DS bugs. `pandera` encodes a DataFrame contract once and reuses it as a test assertion and a runtime guard at ingest boundaries — the moment a CSV, parquet file, or API response becomes a DataFrame.
58
+
59
+ ```python
+ # src/schemas.py
+ import pandera.pandas as pa
+ from pandera.typing import Series
+
+ class CustomersSchema(pa.DataFrameModel):
+     customer_id: Series[int] = pa.Field(unique=True, ge=0)
+     age: Series[float] = pa.Field(ge=0, le=120, nullable=True)
+     signup_date: Series[pa.DateTime]
+     segment: Series[str] = pa.Field(isin=["free", "pro", "enterprise"])
+
+     class Config:
+         strict = True  # reject unexpected columns
+
+ # src/ingest.py: runtime validation at the boundary
+ import pandas as pd
+ from src.schemas import CustomersSchema
+
+ def load_customers(path: str) -> pd.DataFrame:
+     df = pd.read_csv(path, parse_dates=["signup_date"])
+     return CustomersSchema.validate(df)  # raises SchemaError on violation
+ ```
80
+
81
+ The same schema doubles as a test fixture contract:
82
+
83
+ ```python
+ # tests/test_ingest.py
+ import pandas as pd
+ import pytest
+ from pandera.errors import SchemaError
+ from src.ingest import load_customers
+
+ def test_rejects_invalid_segment(tmp_path):
+     bad = tmp_path / "bad.csv"
+     # age is written as 30.0 so it parses as float64 and matches the schema;
+     # the unknown segment "vip" is then the only violation
+     bad.write_text("customer_id,age,signup_date,segment\n1,30.0,2024-01-01,vip\n")
+     with pytest.raises(SchemaError, match="segment"):
+         load_customers(str(bad))
+ ```
96
+
97
+ Prefer `schema.validate(df)` calls over the `@pa.check_input` decorator — explicit validation is easier to trace in stack traces and does not hide behind import-time decoration.
98
+
99
+ Three patterns make pandera pay off:
100
+
101
+ - **Validate once at the boundary, trust downstream**: call `Schema.validate(df)` inside `load_customers`, `load_orders`, or whatever function first produces a DataFrame. Downstream code can then assume columns, dtypes, and ranges without re-checking.
102
+ - **Use `lazy=True` during development**: `Schema.validate(df, lazy=True)` collects every violation and raises a single `SchemaErrors` (plural) instead of failing on the first, which makes fixing a bad CSV much faster.
103
+ - **Version schemas alongside migrations**: when a column renames or a new category lands, update the schema in the same PR as the code change. Schema drift caught in code review is cheaper than schema drift caught in production.
104
+
105
+ ### Fixtures: deterministic test data
106
+
107
+ Random DataFrames in tests produce flaky failures that are painful to debug. Commit small, hand-curated CSVs to `tests/fixtures/` and load them through pytest `fixture` functions. The CSVs are reviewable in PRs, the fixtures are reusable across test modules.
108
+
109
+ ```python
+ # tests/conftest.py
+ from pathlib import Path
+ import pandas as pd
+ import pytest
+
+ FIXTURES = Path(__file__).parent / "fixtures"
+
+ @pytest.fixture
+ def customers_df() -> pd.DataFrame:
+     return pd.read_csv(FIXTURES / "customers_small.csv", parse_dates=["signup_date"])
+
+ @pytest.fixture(params=["customers_empty.csv", "customers_one_row.csv", "customers_small.csv"])
+ def customers_edge_cases(request) -> pd.DataFrame:
+     return pd.read_csv(FIXTURES / request.param, parse_dates=["signup_date"])
+ ```
125
+
126
+ Use `@pytest.mark.parametrize` to cover multiple scenarios without duplicating test bodies:
127
+
128
+ ```python
+ @pytest.mark.parametrize(
+     "segment,expected_discount",
+     [("free", 0.0), ("pro", 0.1), ("enterprise", 0.2)],
+ )
+ def test_discount_by_segment(segment, expected_discount):
+     assert compute_discount(segment) == expected_discount
+ ```
136
+
137
+ Keep fixture CSVs under ~50 rows. Anything larger belongs in a `data/` directory and should be generated or downloaded, not committed.
138
+
139
+ When a test genuinely needs a larger or procedurally generated DataFrame, build it deterministically with a seeded RNG inside a fixture — never inline, and never with the global `np.random` state:
140
+
141
+ ```python
+ import numpy as np
+
+ @pytest.fixture
+ def synthetic_transactions() -> pd.DataFrame:
+     rng = np.random.default_rng(seed=42)  # local generator, not global np.random
+     n = 1000
+     return pd.DataFrame({
+         "user_id": rng.integers(0, 100, size=n),
+         "amount": rng.lognormal(mean=3.0, sigma=1.0, size=n),
+         "ts": pd.date_range("2024-01-01", periods=n, freq="1h"),
+     })
+ ```
152
+
153
+ A fixed seed means the same test always sees the same data, so flaky-failure postmortems are possible instead of "must have been a weird random sample."
154
+
155
+ ### Running the suite
156
+
157
+ Layout and commands stay boring on purpose:
158
+
159
+ ```
+ tests/
+   conftest.py            # shared fixtures
+   fixtures/              # small committed CSVs
+     customers_small.csv
+     customers_empty.csv
+   test_features.py       # pytest for src/features.py
+   test_ingest.py         # pandera + ingest tests
+   test_metrics.py        # pytest for src/metrics.py
+ ```
169
+
170
+ Wire `pytest` into `pyproject.toml` so `pytest` alone runs the right suite:
171
+
172
+ ```toml
173
+ [tool.pytest.ini_options]
174
+ testpaths = ["tests"]
175
+ addopts = "-q --strict-markers -m 'not slow'"
176
+ markers = ["slow: tests that load non-trivial data"]
177
+ ```
178
+
179
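+ Run the fast suite on every save (a file-watcher like `pytest-watch` helps). Before each commit, run the slow tests with `pytest -m slow` or everything with `pytest -m "slow or not slow"`; because `addopts` already passes `-m 'not slow'`, a bare `pytest` never runs the slow tests, and a `-m` given on the command line is the one that takes effect. In CI, run the full suite unconditionally.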
180
+
181
+ ### What NOT to test
182
+
183
+ Don't unit-test `pandas`, `numpy`, or `pandera` themselves — assume upstream libraries work and pin versions in `pyproject.toml` to catch surprises via dependency bumps, not your own test suite. Don't assert on model quality metrics here (AUC, precision, calibration); those live in `data-science-model-evaluation.md` and run on held-out data, not fixtures. Don't write tests that require a live database, S3 bucket, or trained model file — those belong in integration tests run out-of-band, not the fast `pytest` suite a developer runs on save. And don't test notebooks directly; if a notebook cell has logic worth testing, extract it to `src/` first, then test the function.
@@ -0,0 +1,10 @@
1
+ # `ml/` knowledge
2
+
3
+ Production machine-learning domain knowledge injected into universal pipeline
4
+ steps by `content/methodology/ml-overlay.yml`.
5
+
6
+ ## Lockstep pairs with `data-science/`
7
+
8
+ Five documents here mirror documents in `content/knowledge/data-science/`. See
9
+ `content/knowledge/data-science/README.md` for the full pair table. Edits to
10
+ one side should trigger review of the other to prevent recommendation drift.
@@ -0,0 +1,39 @@
1
+ # methodology/data-science-overlay.yml
+ name: data-science
+ description: >
+   Data science overlay — injects solo / small-team data science domain
+   knowledge into existing pipeline steps for local-first, reproducibility-first
+   analytical work and model prototyping.
+ project-type: data-science
+
+ knowledge-overrides:
+   # Foundational
+   create-prd: { append: [data-science-requirements] }
+   user-stories: { append: [data-science-requirements] }
+   coding-standards: { append: [data-science-conventions, data-science-notebook-discipline] }
+   project-structure: { append: [data-science-project-structure] }
+   dev-env-setup: { append: [data-science-dev-environment] }
+   git-workflow: { append: [data-science-reproducibility] }
+
+   # Architecture & Design
+   system-architecture: { append: [data-science-architecture] }
+   tech-stack: { append: [data-science-architecture, data-science-dev-environment] }
+   adrs: { append: [data-science-architecture] }
+   domain-modeling: { append: [data-science-data-versioning] }
+   database-schema: { append: [data-science-data-versioning] }
+   security: { append: [data-science-security] }
+   operations: { append: [data-science-experiment-tracking, data-science-observability, data-science-reproducibility] }
+
+   # Testing
+   tdd: { append: [data-science-testing] }
+   create-evals: { append: [data-science-testing, data-science-model-evaluation] }
+
+   # Reviews
+   review-architecture: { append: [data-science-architecture] }
+   review-database: { append: [data-science-data-versioning] }
+   review-security: { append: [data-science-security] }
+   review-operations: { append: [data-science-experiment-tracking, data-science-observability] }
+   review-testing: { append: [data-science-testing, data-science-model-evaluation] }
+
+   # Planning
+   implementation-plan: { append: [data-science-architecture] }
@@ -177,14 +177,15 @@ Once in-progress work is complete (or if there was none):
 
  4. **Run code reviews (MANDATORY)**
  - Run the review-pr tool: `scaffold run review-pr` (CLI) or `/scaffold:review-pr` (plugin)
- - This runs **all three** review channels on the PR diff:
+ - This runs the three MMR CLI channels on the PR diff plus the Superpowers code-reviewer agent as a complementary 4th channel reconciled through `mmr reconcile`:
  1. **Codex CLI**: `codex exec --skip-git-repo-check -s read-only --ephemeral "REVIEW_PROMPT" 2>/dev/null`
  2. **Gemini CLI**: `NO_BROWSER=true gemini -p "REVIEW_PROMPT" --output-format json --approval-mode yolo 2>/dev/null`
- 3. **Superpowers code-reviewer**: dispatch `superpowers:code-reviewer` subagent with BASE_SHA and HEAD_SHA
- - Verify auth before each CLI (`codex login status`, `NO_BROWSER=true gemini -p "respond with ok" -o json`)
- - All three channels must execute (skip only if a tool is genuinely not installed)
+ 3. **Claude CLI**: `claude -p "REVIEW_PROMPT" --output-format json 2>/dev/null`
+ 4. **Superpowers code-reviewer** (4th channel): dispatch `superpowers:code-reviewer` subagent with BASE_SHA and HEAD_SHA
+ - Verify auth before each CLI (`mmr config test` pre-flights all three at once)
+ - All four channels should execute. Missing Codex or Gemini → MMR runs a compensating Claude pass in its place (degraded-pass verdict). Missing Claude CLI → review proceeds without compensation.
  - Fix any P0/P1/P2 findings before proceeding
- - Do NOT move to the next task until all channels have run
+ - Do NOT move to the next task until the review completes
 
  5. **Between-task cleanup**
  - `git fetch origin --prune && git clean -fd`
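The new fallback rule in this hunk (a missing Codex or Gemini is covered by a compensating Claude pass, while a missing Claude CLI is not compensated) can be sketched as a small planning function. This is an illustrative model only: `plan_review`, `ReviewPlan`, and the pass labels are hypothetical names, and skipping a channel when Claude is also unavailable is an assumption the diff does not spell out.

```python
from dataclasses import dataclass, field
from typing import List, Set

CLI_CHANNELS = ("codex", "gemini", "claude")


@dataclass
class ReviewPlan:
    passes: List[str] = field(default_factory=list)       # review passes to run, in order
    compensated: List[str] = field(default_factory=list)  # channels covered by a Claude pass
    skipped: List[str] = field(default_factory=list)      # channels with no replacement


def plan_review(installed: Set[str]) -> ReviewPlan:
    """Model the channel fallback described in the diff (not MMR's real code)."""
    plan = ReviewPlan()
    claude_available = "claude" in installed
    for channel in CLI_CHANNELS:
        if channel in installed:
            plan.passes.append(channel)
        elif channel != "claude" and claude_available:
            # Missing Codex or Gemini: Claude runs in its place (degraded-pass verdict).
            plan.passes.append(f"claude (compensating for {channel})")
            plan.compensated.append(channel)
        else:
            # Missing Claude, or (assumption) nothing available to compensate with.
            plan.skipped.append(channel)
    # The Superpowers agent is dispatched as the complementary 4th channel.
    plan.passes.append("superpowers:code-reviewer")
    return plan


no_codex = plan_review({"gemini", "claude"})
no_claude = plan_review({"codex", "gemini"})
```

In this model a non-empty `compensated` list corresponds to the degraded-pass case, and a non-empty `skipped` list to a review that proceeds without compensation.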
@@ -238,7 +239,7 @@ Once in-progress work is complete (or if there was none):
  5. **TDD is not optional** — Continue the red-green-refactor cycle for any in-progress work.
  6. **Quality gates before PR** — Never create a PR with failing checks.
  7. **Honor pre-push review when requested** — If the user or project workflow asks for pre-push multi-model review, run `scaffold run review-code` after quality gates and before `git push`.
- 8. **Code review before next task** — After creating a PR, run all three review channels (Codex CLI, Gemini CLI, Superpowers code-reviewer) and fix all P0/P1/P2 findings before moving on.
+ 8. **Code review before next task** — After creating a PR, run `scaffold run review-pr`: three CLI channels (Codex CLI, Gemini CLI, Claude CLI) via MMR plus the Superpowers code-reviewer agent as a complementary 4th channel. Fix all P0/P1/P2 findings before moving on.
  9. **Follow CLAUDE.md** — It is the authority on project conventions and commands.
 
  ---
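The rule that all P0/P1/P2 findings must be fixed before moving on amounts to a severity gate. A minimal sketch, assuming reconciled findings arrive as dictionaries with a `severity` key (a hypothetical shape, as is the `P3` level used for the non-blocking example):

```python
from typing import Dict, List

BLOCKING_SEVERITIES = {"P0", "P1", "P2"}


def blocking_findings(findings: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the findings that must be fixed before the next task starts."""
    return [f for f in findings if f["severity"] in BLOCKING_SEVERITIES]


# Hypothetical findings for illustration only.
findings = [
    {"severity": "P1", "title": "Unchecked null in parser"},
    {"severity": "P3", "title": "Rename helper for clarity"},
]
blockers = blocking_findings(findings)
can_proceed = not blockers
```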
@@ -181,14 +181,15 @@ For each task:
 
  8. **Run code reviews (MANDATORY)**
  - Run the review-pr tool: `scaffold run review-pr` (CLI) or `/scaffold:review-pr` (plugin)
- - This runs **all three** review channels on the PR diff:
+ - This runs the three MMR CLI channels on the PR diff plus the Superpowers code-reviewer agent as a complementary 4th channel reconciled through `mmr reconcile`:
  1. **Codex CLI**: `codex exec --skip-git-repo-check -s read-only --ephemeral "REVIEW_PROMPT" 2>/dev/null`
  2. **Gemini CLI**: `NO_BROWSER=true gemini -p "REVIEW_PROMPT" --output-format json --approval-mode yolo 2>/dev/null`
- 3. **Superpowers code-reviewer**: dispatch `superpowers:code-reviewer` subagent with BASE_SHA and HEAD_SHA
- - Verify auth before each CLI (`codex login status`, `NO_BROWSER=true gemini -p "respond with ok" -o json`)
- - All three channels must execute (skip only if a tool is genuinely not installed)
+ 3. **Claude CLI**: `claude -p "REVIEW_PROMPT" --output-format json 2>/dev/null`
+ 4. **Superpowers code-reviewer** (4th channel): dispatch `superpowers:code-reviewer` subagent with BASE_SHA and HEAD_SHA
+ - Verify auth before each CLI (`mmr config test` pre-flights all three at once)
+ - All four channels should execute. Missing Codex or Gemini → MMR runs a compensating Claude pass in its place (degraded-pass verdict). Missing Claude CLI → review proceeds without compensation.
  - Fix any P0/P1/P2 findings before proceeding
- - Do NOT move to the next task until all channels have run
+ - Do NOT move to the next task until the review completes
 
  9. **Between-task cleanup**
  - `git fetch origin --prune && git clean -fd`
@@ -230,7 +231,7 @@ For each task:
  4. **TDD is not optional** — Write failing tests before implementation. No exceptions.
  5. **Quality gates before PR** — Never create a PR with failing checks.
  6. **Honor pre-push review when requested** — If the user or project workflow asks for pre-push multi-model review, run `scaffold run review-code` after quality gates and before `git push`.
- 7. **Code review before next task** — After creating a PR, run all three review channels (Codex CLI, Gemini CLI, Superpowers code-reviewer) and fix all P0/P1/P2 findings before moving on.
+ 7. **Code review before next task** — After creating a PR, run `scaffold run review-pr`: three CLI channels (Codex CLI, Gemini CLI, Claude CLI) via MMR plus the Superpowers code-reviewer agent as a complementary 4th channel. Fix all P0/P1/P2 findings before moving on.
  8. **Avoid task conflicts** — Check what other agents are working on before claiming.
  9. **Follow CLAUDE.md** — It is the authority on project conventions and commands.
 
@@ -154,14 +154,15 @@ Once in-progress work is complete (or if there was none):
 
  4. **Run code reviews (MANDATORY)**
  - Run the review-pr tool: `scaffold run review-pr` (CLI) or `/scaffold:review-pr` (plugin)
- - This runs **all three** review channels on the PR diff:
+ - This runs the three MMR CLI channels on the PR diff plus the Superpowers code-reviewer agent as a complementary 4th channel reconciled through `mmr reconcile`:
  1. **Codex CLI**: `codex exec --skip-git-repo-check -s read-only --ephemeral "REVIEW_PROMPT" 2>/dev/null`
  2. **Gemini CLI**: `NO_BROWSER=true gemini -p "REVIEW_PROMPT" --output-format json --approval-mode yolo 2>/dev/null`
- 3. **Superpowers code-reviewer**: dispatch `superpowers:code-reviewer` subagent with BASE_SHA and HEAD_SHA
- - Verify auth before each CLI (`codex login status`, `NO_BROWSER=true gemini -p "respond with ok" -o json`)
- - All three channels must execute (skip only if a tool is genuinely not installed)
+ 3. **Claude CLI**: `claude -p "REVIEW_PROMPT" --output-format json 2>/dev/null`
+ 4. **Superpowers code-reviewer** (4th channel): dispatch `superpowers:code-reviewer` subagent with BASE_SHA and HEAD_SHA
+ - Verify auth before each CLI (`mmr config test` pre-flights all three at once)
+ - All four channels should execute. Missing Codex or Gemini → MMR runs a compensating Claude pass in its place (degraded-pass verdict). Missing Claude CLI → review proceeds without compensation.
  - Fix any P0/P1/P2 findings before proceeding
- - Do NOT move to the next task until all channels have run
+ - Do NOT move to the next task until the review completes
 
  5. **Claim next task**
  - Return to main: `git checkout main && git pull origin main`
@@ -203,7 +204,7 @@ Once in-progress work is complete (or if there was none):
  4. **TDD is not optional** — Continue the red-green-refactor cycle for any in-progress work.
  5. **Quality gates before PR** — Never create a PR with failing checks.
  6. **Honor pre-push review when requested** — If the user or project workflow asks for pre-push multi-model review, run `scaffold run review-code` after quality gates and before `git push`.
- 7. **Code review before next task** — After creating a PR, run all three review channels (Codex CLI, Gemini CLI, Superpowers code-reviewer) and fix all P0/P1/P2 findings before moving on.
+ 7. **Code review before next task** — After creating a PR, run `scaffold run review-pr`: three CLI channels (Codex CLI, Gemini CLI, Claude CLI) via MMR plus the Superpowers code-reviewer agent as a complementary 4th channel. Fix all P0/P1/P2 findings before moving on.
  8. **Follow CLAUDE.md** — It is the authority on project conventions and commands.
 
  ---
@@ -160,14 +160,15 @@ For each task:
 
  8. **Run code reviews (MANDATORY)**
  - Run the review-pr tool: `scaffold run review-pr` (CLI) or `/scaffold:review-pr` (plugin)
- - This runs **all three** review channels on the PR diff:
+ - This runs the three MMR CLI channels on the PR diff plus the Superpowers code-reviewer agent as a complementary 4th channel reconciled through `mmr reconcile`:
  1. **Codex CLI**: `codex exec --skip-git-repo-check -s read-only --ephemeral "REVIEW_PROMPT" 2>/dev/null`
  2. **Gemini CLI**: `NO_BROWSER=true gemini -p "REVIEW_PROMPT" --output-format json --approval-mode yolo 2>/dev/null`
- 3. **Superpowers code-reviewer**: dispatch `superpowers:code-reviewer` subagent with BASE_SHA and HEAD_SHA
- - Verify auth before each CLI (`codex login status`, `NO_BROWSER=true gemini -p "respond with ok" -o json`)
- - All three channels must execute (skip only if a tool is genuinely not installed)
+ 3. **Claude CLI**: `claude -p "REVIEW_PROMPT" --output-format json 2>/dev/null`
+ 4. **Superpowers code-reviewer** (4th channel): dispatch `superpowers:code-reviewer` subagent with BASE_SHA and HEAD_SHA
+ - Verify auth before each CLI (`mmr config test` pre-flights all three at once)
+ - All four channels should execute. Missing Codex or Gemini → MMR runs a compensating Claude pass in its place (degraded-pass verdict). Missing Claude CLI → review proceeds without compensation.
  - Fix any P0/P1/P2 findings before proceeding
- - Do NOT move to the next task until all channels have run
+ - Do NOT move to the next task until the review completes
 
  9. **Update status**
  - If Beads: task status is managed via `bd` commands
@@ -201,7 +202,7 @@ For each task:
  2. **One task at a time** — Complete the current task fully before starting the next.
  3. **Quality gates before PR** — Never create a PR with failing checks.
  4. **Honor pre-push review when requested** — If the user or project workflow asks for pre-push multi-model review, run `scaffold run review-code` after quality gates and before `git push`.
- 5. **Code review before next task** — After creating a PR, run all three review channels (Codex CLI, Gemini CLI, Superpowers code-reviewer) and fix all P0/P1/P2 findings before moving on.
+ 5. **Code review before next task** — After creating a PR, run `scaffold run review-pr`: three CLI channels (Codex CLI, Gemini CLI, Claude CLI) via MMR plus the Superpowers code-reviewer agent as a complementary 4th channel. Fix all P0/P1/P2 findings before moving on.
  6. **Update status immediately** — Mark tasks complete as soon as review passes.
  7. **Consult lessons.md** — Check for relevant anti-patterns before each task.
  8. **Follow CLAUDE.md** — It is the authority on project conventions and commands.