@zigrivers/scaffold 3.8.0 → 3.9.1

Files changed (70)
  1. package/README.md +73 -8
  2. package/content/knowledge/browser-extension/browser-extension-architecture.md +195 -0
  3. package/content/knowledge/browser-extension/browser-extension-content-scripts.md +264 -0
  4. package/content/knowledge/browser-extension/browser-extension-conventions.md +156 -0
  5. package/content/knowledge/browser-extension/browser-extension-cross-browser.md +229 -0
  6. package/content/knowledge/browser-extension/browser-extension-dev-environment.md +247 -0
  7. package/content/knowledge/browser-extension/browser-extension-manifest.md +220 -0
  8. package/content/knowledge/browser-extension/browser-extension-project-structure.md +183 -0
  9. package/content/knowledge/browser-extension/browser-extension-requirements.md +107 -0
  10. package/content/knowledge/browser-extension/browser-extension-security.md +202 -0
  11. package/content/knowledge/browser-extension/browser-extension-service-workers.md +265 -0
  12. package/content/knowledge/browser-extension/browser-extension-store-submission.md +155 -0
  13. package/content/knowledge/browser-extension/browser-extension-testing.md +270 -0
  14. package/content/knowledge/data-pipeline/data-pipeline-architecture.md +175 -0
  15. package/content/knowledge/data-pipeline/data-pipeline-batch-patterns.md +263 -0
  16. package/content/knowledge/data-pipeline/data-pipeline-conventions.md +176 -0
  17. package/content/knowledge/data-pipeline/data-pipeline-dev-environment.md +350 -0
  18. package/content/knowledge/data-pipeline/data-pipeline-orchestration.md +291 -0
  19. package/content/knowledge/data-pipeline/data-pipeline-project-structure.md +257 -0
  20. package/content/knowledge/data-pipeline/data-pipeline-quality.md +324 -0
  21. package/content/knowledge/data-pipeline/data-pipeline-requirements.md +145 -0
  22. package/content/knowledge/data-pipeline/data-pipeline-schema-management.md +295 -0
  23. package/content/knowledge/data-pipeline/data-pipeline-security.md +326 -0
  24. package/content/knowledge/data-pipeline/data-pipeline-streaming-patterns.md +280 -0
  25. package/content/knowledge/data-pipeline/data-pipeline-testing.md +406 -0
  26. package/content/knowledge/ml/ml-architecture.md +172 -0
  27. package/content/knowledge/ml/ml-conventions.md +209 -0
  28. package/content/knowledge/ml/ml-dev-environment.md +299 -0
  29. package/content/knowledge/ml/ml-experiment-tracking.md +285 -0
  30. package/content/knowledge/ml/ml-model-evaluation.md +256 -0
  31. package/content/knowledge/ml/ml-observability.md +253 -0
  32. package/content/knowledge/ml/ml-project-structure.md +216 -0
  33. package/content/knowledge/ml/ml-requirements.md +138 -0
  34. package/content/knowledge/ml/ml-security.md +188 -0
  35. package/content/knowledge/ml/ml-serving-patterns.md +243 -0
  36. package/content/knowledge/ml/ml-testing.md +301 -0
  37. package/content/knowledge/ml/ml-training-patterns.md +269 -0
  38. package/content/methodology/browser-extension-overlay.yml +82 -0
  39. package/content/methodology/data-pipeline-overlay.yml +70 -0
  40. package/content/methodology/ml-overlay.yml +70 -0
  41. package/dist/cli/commands/init.d.ts +13 -0
  42. package/dist/cli/commands/init.d.ts.map +1 -1
  43. package/dist/cli/commands/init.js +122 -2
  44. package/dist/cli/commands/init.js.map +1 -1
  45. package/dist/cli/commands/init.test.js +120 -0
  46. package/dist/cli/commands/init.test.js.map +1 -1
  47. package/dist/config/schema.d.ts +864 -48
  48. package/dist/config/schema.d.ts.map +1 -1
  49. package/dist/config/schema.js +53 -0
  50. package/dist/config/schema.js.map +1 -1
  51. package/dist/config/schema.test.js +166 -3
  52. package/dist/config/schema.test.js.map +1 -1
  53. package/dist/core/assembly/overlay-loader.test.js +33 -0
  54. package/dist/core/assembly/overlay-loader.test.js.map +1 -1
  55. package/dist/e2e/project-type-overlays.test.d.ts +2 -2
  56. package/dist/e2e/project-type-overlays.test.js +499 -33
  57. package/dist/e2e/project-type-overlays.test.js.map +1 -1
  58. package/dist/types/config.d.ts +10 -1
  59. package/dist/types/config.d.ts.map +1 -1
  60. package/dist/wizard/questions.d.ts +17 -1
  61. package/dist/wizard/questions.d.ts.map +1 -1
  62. package/dist/wizard/questions.js +75 -1
  63. package/dist/wizard/questions.js.map +1 -1
  64. package/dist/wizard/questions.test.js +167 -0
  65. package/dist/wizard/questions.test.js.map +1 -1
  66. package/dist/wizard/wizard.d.ts +13 -0
  67. package/dist/wizard/wizard.d.ts.map +1 -1
  68. package/dist/wizard/wizard.js +17 -1
  69. package/dist/wizard/wizard.js.map +1 -1
  70. package/package.json +1 -1
@@ -0,0 +1,209 @@
1
+ ---
2
+ name: ml-conventions
3
+ description: Experiment naming, model versioning, reproducibility via random seeds, config-as-code patterns, and team conventions for ML projects
4
+ topics: [ml, conventions, reproducibility, versioning, config, experiments]
5
+ ---
6
+
7
+ ML projects without conventions degenerate into chaos within weeks: unnamed experiments with lost hyperparameters, models named `model_v2_final_FINAL.pkl`, and results that cannot be reproduced. In conventional software, compilers, type checkers, and module boundaries impose some structure; ML workflows are often loose scripts and notebooks that stay comprehensible only through disciplined conventions. Establish these conventions at project start and encode them in tooling so they are followed by default rather than by willpower.
8
+
9
+ ## Summary
10
+
11
+ ML conventions cover experiment naming (structured, searchable identifiers), model versioning (semantic or content-addressed), reproducibility (seeding all random sources, recording environment), and config-as-code (no magic numbers in code, all hyperparameters in config files). These conventions are not optional hygiene — they are the infrastructure that makes ML engineering a repeatable discipline rather than a research lottery.
12
+
13
+ ## Deep Guidance
14
+
15
+ ### Experiment Naming
16
+
17
+ Every training run that produces artifacts or results must have a unique, human-readable identifier. Ad-hoc names like `test`, `v2`, or `new_model` are unusable at scale:
18
+
19
+ **Recommended format**: `{model_type}-{dataset}-{date}-{purpose}[-{variant}]`
20
+
21
+ Examples:
22
+ - `resnet50-imagenet-20240315-baseline`
23
+ - `bert-sst2-20240315-lr-sweep`
24
+ - `xgboost-churn-20240320-feature-v3`
25
+ - `gpt2-reviews-20240322-dropout-ablation`
26
+
27
+ **Rules**:
28
+ - All lowercase, hyphen-separated (no spaces, no underscores)
29
+ - Date in `YYYYMMDD` format (sorts chronologically)
30
+ - Purpose is human-readable and specific — not `experiment1` or `test`
31
+ - Variant suffix for ablations and sweeps (`-v2`, `-no-dropout`, `-lr-1e-3`)
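
The naming rules above can be enforced in code rather than by memory. A minimal sketch, assuming a hypothetical helper `make_experiment_name` that the training entry point calls before creating the tracker run; the list of vague purposes is illustrative, not exhaustive:

```python
import re
from datetime import date
from typing import Optional

# Lowercase alphanumeric words joined by single hyphens (no spaces, no underscores).
SEGMENT_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")
# Placeholder purposes that the convention calls out as unusable (illustrative list).
VAGUE_PURPOSE_RE = re.compile(r"^(?:test|tmp|new|new-model|experiment\d*)$")


def make_experiment_name(
    model_type: str,
    dataset: str,
    purpose: str,
    variant: Optional[str] = None,
    run_date: Optional[date] = None,
) -> str:
    """Build {model_type}-{dataset}-{YYYYMMDD}-{purpose}[-{variant}]."""
    if VAGUE_PURPOSE_RE.match(purpose):
        raise ValueError(f"purpose {purpose!r} is too vague to be searchable")
    stamp = (run_date or date.today()).strftime("%Y%m%d")  # sorts chronologically
    parts = [model_type, dataset, stamp, purpose]
    if variant:
        parts.append(variant)
    for part in parts:
        if not SEGMENT_RE.match(part):
            raise ValueError(f"segment {part!r} must be lowercase and hyphen-separated")
    return "-".join(parts)
```

Raising at launch means a typo such as `ResNet50` or `lr_sweep` fails immediately instead of polluting the experiment tracker.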
32
+
33
+ Many teams rely on auto-generated identifiers (MLflow, for example, assigns each run a UUID-based run ID automatically) and use tags and metadata for search. This works as a secondary system, but always add a human-readable display name as well.
34
+
35
+ ### Model Versioning
36
+
37
+ Model versioning is distinct from experiment tracking. A version is a production artifact; an experiment is a training run:
38
+
39
+ **Semantic versioning for models**:
40
+ - `v{major}.{minor}.{patch}` — consistent with software versioning
41
+ - Major: Breaking change in model interface (input/output schema, preprocessing contract)
42
+ - Minor: Meaningful accuracy improvement or new feature support
43
+ - Patch: Bug fix, minor data update, no interface change
44
+
45
+ **Content-addressed versioning** (used by DVC; the MLflow Model Registry instead assigns an incrementing version number to each registered model):
46
+ - Models are identified by a hash of their weights + config
47
+ - Prevents accidental overwriting
48
+ - Enables exact traceability: "which weights produced this prediction?"
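
As an illustration of content addressing, the sketch below derives a fingerprint from serialized weights and a canonical JSON encoding of the config. It assumes the weights are already available as bytes (for example, a checkpoint file's contents) and uses SHA-256; DVC tracks files by an MD5 content hash, so this is not DVC's implementation, and `model_fingerprint` is a hypothetical name:

```python
import hashlib
import json
from typing import Any, Mapping


def model_fingerprint(weights: bytes, config: Mapping[str, Any]) -> str:
    """SHA-256 over serialized weights plus a canonical JSON form of the config."""
    digest = hashlib.sha256()
    # Length-prefix the weights so the weights/config boundary is unambiguous.
    digest.update(len(weights).to_bytes(8, "big"))
    digest.update(weights)
    # sort_keys and fixed separators make the encoding independent of dict order.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    digest.update(canonical.encode("utf-8"))
    return digest.hexdigest()
```

A registry can then refuse to store a second artifact under an existing fingerprint, which is what prevents silent overwrites.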
49
+
50
+ **Registry-based model lifecycle**:
51
+ ```
52
+ Staging → Validation → Production → Archived
53
+ ```
54
+ - Never promote directly to Production — always pass through Staging validation
55
+ - Keep at least one previous Production version for instant rollback
56
+ - Document promotion reason: "Promoted: +2.3% AUC on Q1 eval set, latency within budget"
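
The lifecycle rules can be encoded as a small state machine. This is a hypothetical in-memory registry (`ModelRegistry` is not the MLflow API) that only allows one-step transitions along Staging → Validation → Production → Archived, requires a promotion reason, and keeps the superseded Production version archived for rollback:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# One-step transitions only: there is no path from "staging" straight to "production".
NEXT_STAGE = {"staging": "validation", "validation": "production", "production": "archived"}


@dataclass
class ModelRegistry:
    """Hypothetical in-memory registry illustrating the lifecycle rules."""

    stages: Dict[str, str] = field(default_factory=dict)  # version -> current stage
    log: List[Tuple[str, str, str]] = field(default_factory=list)  # (version, stage, reason)
    previous_production: Optional[str] = None

    def register(self, version: str) -> None:
        self.stages[version] = "staging"

    def _current_production(self) -> List[str]:
        return [v for v, stage in self.stages.items() if stage == "production"]

    def promote(self, version: str, reason: str) -> str:
        if not reason.strip():
            raise ValueError("a promotion reason is required")
        target = NEXT_STAGE.get(self.stages[version])
        if target is None:
            raise ValueError(f"{version} is archived and cannot be promoted")
        if target == "production":
            # Archive (not delete) the outgoing model so it stays available for rollback.
            for old in self._current_production():
                self.stages[old] = "archived"
                self.previous_production = old
        self.stages[version] = target
        self.log.append((version, target, reason))
        return target

    def rollback(self) -> str:
        """Swap the current Production version with the previously archived one."""
        if self.previous_production is None:
            raise ValueError("no previous Production version to roll back to")
        restored = self.previous_production
        current = self._current_production()
        for old in current:
            self.stages[old] = "archived"
        self.stages[restored] = "production"
        self.previous_production = current[0] if current else None
        self.log.append((restored, "production", "rollback"))
        return restored
```

Keeping the reason in the same log entry as the transition means the audit trail and the state can never disagree.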
57
+
58
+ ### Reproducibility
59
+
60
+ ML reproducibility means: given the same code, data, and config, the same model is produced. Achieve it through four controls:
61
+
62
+ **1. Random seed management**
63
+
64
+ Set all random sources before any computation:
65
+ ```python
66
+ import random
67
+ import numpy as np
68
+ import torch
69
+
70
+ def set_seed(seed: int) -> None:
71
+     random.seed(seed)
72
+     np.random.seed(seed)
73
+     torch.manual_seed(seed)
74
+     torch.cuda.manual_seed_all(seed)
75
+     # For full determinism (may impact performance)
76
+     torch.backends.cudnn.deterministic = True
77
+     torch.backends.cudnn.benchmark = False
78
+ ```
79
+
80
+ Record the seed in experiment config. Default seed: `42` (or any fixed value — consistency matters more than the value). When running hyperparameter sweeps with multiple seeds, record all seeds and report mean ± std.
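
For the multi-seed case, a small helper keeps the reported summary consistent across runs. A stdlib-only sketch; the function name and output format are illustrative, and it uses the sample standard deviation (n − 1 denominator):

```python
import statistics
from typing import Mapping


def summarize_seeds(scores_by_seed: Mapping[int, float], digits: int = 4) -> str:
    """Format a metric across seeds as 'mean ± std' and list the seeds used."""
    if len(scores_by_seed) < 2:
        raise ValueError("need at least two seeds to estimate a standard deviation")
    values = list(scores_by_seed.values())
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    seeds = ", ".join(str(seed) for seed in sorted(scores_by_seed))
    return f"{mean:.{digits}f} ± {std:.{digits}f} (n={len(values)}, seeds: {seeds})"
```

For example, `summarize_seeds({42: 0.94, 0: 0.90, 1: 0.92})` yields `0.9200 ± 0.0200 (n=3, seeds: 0, 1, 42)`, so the seeds travel with the number.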
81
+
82
+ **2. Dependency pinning**
83
+
84
+ Pin all dependencies to exact versions:
85
+ ```toml
86
+ # pyproject.toml (Poetry)
87
+ [tool.poetry.dependencies]
88
+ python = "3.11.4"
89
+ torch = "2.1.0"
90
+ transformers = "4.35.2"
91
+ numpy = "1.26.0"
92
+ ```
93
+
94
+ Use `poetry.lock` or `requirements.txt` generated by `pip freeze`. Never use unpinned dependencies (`torch>=2.0`) in a training environment.
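
A lightweight CI check can catch unpinned entries before they reach a training environment. This sketch is illustrative (`unpinned_requirements` is a hypothetical name): it treats anything other than an exact `==` pin as unpinned and skips blank lines, comments, and pip option lines such as `-r` and `--extra-index-url`:

```python
import re
from typing import List

# A requirement counts as pinned only if it starts with "name[extras]==version".
PINNED_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?==[^=<>!~,;\s]+")


def unpinned_requirements(text: str) -> List[str]:
    """Return requirement lines that are not pinned to an exact version."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop inline comments
        if not line or line.startswith("-"):  # blank lines and pip options
            continue
        if not PINNED_RE.match(line):
            unpinned.append(line)
    return unpinned
```

Running it over `requirements-frozen.txt` in CI and failing on a non-empty result turns the "never use unpinned dependencies" rule into a gate.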
95
+
96
+ **3. Data versioning**
97
+
98
+ Record the exact dataset version used for each training run:
99
+ - DVC: content-addressed data with `dvc add` and `.dvc` pointers
100
+ - Dataset registry: log dataset name + version + hash in experiment metadata
101
+ - SQL-based datasets: log the query hash and execution timestamp
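
A minimal sketch of the metadata these bullets describe, assuming the dataset is a local file and that the resulting dictionaries are logged to whichever experiment tracker the project uses (all function and key names are hypothetical):

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dataset_metadata(name: str, version: str, path: Path) -> Dict[str, str]:
    """Registry-style record: dataset name + version + content hash."""
    return {"dataset_name": name, "dataset_version": version, "dataset_sha256": file_sha256(path)}


def sql_dataset_metadata(query: str) -> Dict[str, str]:
    """Query hash + execution timestamp for SQL-derived datasets."""
    # Collapsing whitespace keeps reformatting from changing the hash
    # (a simplification: it also collapses whitespace inside string literals).
    normalized = " ".join(query.split())
    return {
        "query_sha256": hashlib.sha256(normalized.encode("utf-8")).hexdigest(),
        "executed_at": datetime.now(timezone.utc).isoformat(),
    }
```

Because a query can return different rows over time, the query hash plus timestamp identifies what was run, not the exact rows; snapshotting the result is the stronger guarantee.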
102
+
103
+ **4. Environment reproducibility**
104
+
105
+ Capture the full environment:
106
+ ```bash
107
+ # Save environment
108
+ conda env export > environment.yml
109
+ pip freeze > requirements-frozen.txt
110
+
111
+ # Record GPU driver and CUDA version
112
+ nvidia-smi --query-gpu=driver_version,name --format=csv
113
+ nvcc --version
114
+ ```
115
+
116
+ For full environment isolation, use Docker. The Dockerfile is the environment specification.
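
The shell commands above can be complemented by recording the Python side of the environment with every run. A stdlib-only sketch; the function name and the package list passed to it are assumptions, and GPU driver details still come from `nvidia-smi`:

```python
import platform
from importlib import metadata
from typing import Dict, Iterable, Optional


def capture_environment(packages: Iterable[str]) -> Dict[str, object]:
    """Collect interpreter, OS, and installed package versions for run metadata."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # record absence explicitly instead of skipping
    return {
        "python_version": platform.python_version(),
        "python_implementation": platform.python_implementation(),
        "os": platform.platform(),
        "machine": platform.machine(),
        "packages": versions,
    }
```

A call such as `capture_environment(["torch", "numpy", "transformers"])` returns a JSON-serializable dict that can be logged as a run artifact.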
117
+
118
+ ### Config-as-Code
119
+
120
+ No magic numbers in code. Every hyperparameter, data path, and training setting belongs in a config file:
121
+
122
+ **Bad** (magic numbers scattered in code):
123
+ ```python
124
+ optimizer = Adam(model.parameters(), lr=0.001)
125
+ scheduler = CosineAnnealingLR(optimizer, T_max=100)
126
+ train_loader = DataLoader(dataset, batch_size=32, num_workers=4)
127
+ ```
128
+
129
+ **Good** (config-driven):
130
+ ```yaml
131
+ # configs/train.yaml
132
+ training:
133
+   seed: 42
134
+   epochs: 100
135
+   batch_size: 32
136
+   num_workers: 4
137
+
138
+ optimizer:
139
+   type: adam
140
+   lr: 1.0e-3
141
+   weight_decay: 1.0e-4
142
+
143
+ scheduler:
144
+   type: cosine_annealing
145
+   t_max: 100
146
+ ```
147
+
148
+ ```python
149
+ # src/training/train.py (excerpt: DictConfig is from omegaconf; model and build_* helpers are defined elsewhere)
150
+ def train(cfg: DictConfig) -> None:
151
+     set_seed(cfg.training.seed)
152
+     optimizer = build_optimizer(model, cfg.optimizer)
153
+     scheduler = build_scheduler(optimizer, cfg.scheduler)
154
+ ```
155
+
156
+ Use **Hydra** (Meta's framework, built on OmegaConf) or **OmegaConf** directly for hierarchical config management with command-line override support:
157
+ ```bash
158
+ # Override from CLI without changing config files
159
+ python train.py optimizer.lr=1e-4 training.batch_size=64
160
+ ```
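
To make the override semantics concrete, here is a stdlib-only sketch of what a dotted `key=value` override does to a nested config. It is not Hydra's implementation: `apply_overrides` is a hypothetical helper that loosely mimics Hydra's default of rejecting keys that do not already exist, and parses values with `ast.literal_eval`, falling back to a plain string:

```python
import ast
import copy
from typing import Any, Dict, Iterable


def apply_overrides(cfg: Dict[str, Any], overrides: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of cfg with dotted overrides such as 'optimizer.lr=1e-4' applied."""
    result = copy.deepcopy(cfg)  # never mutate the loaded base config
    for item in overrides:
        key, sep, raw = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        *parents, leaf = key.split(".")
        node = result
        for part in parents:
            node = node[part]  # KeyError for an unknown config group
        if leaf not in node:
            raise KeyError(f"unknown config key {key!r}")
        try:
            node[leaf] = ast.literal_eval(raw)  # "64" -> int, "1e-4" -> float, "True" -> bool
        except (ValueError, SyntaxError):
            node[leaf] = raw  # bare words such as "sgd" stay strings
    return result
```

Rejecting unknown keys matters: a typo like `optimzer.lr=1e-4` should fail loudly rather than silently train with the default learning rate.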
161
+
162
+ **Config file organization**:
163
+ ```
164
+ configs/
165
+   base.yaml          # Default config for all experiments
166
+   model/
167
+     resnet50.yaml
168
+     vit-b16.yaml
169
+   data/
170
+     imagenet.yaml
171
+     cifar10.yaml
172
+   training/
173
+     fast.yaml        # Low-epoch for debugging
174
+     full.yaml        # Production training
175
+ ```
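
Composition over these files amounts to layering mappings: later layers win, and nested sections are merged key by key. Hydra handles this through its defaults list; the stdlib sketch below only illustrates the merge semantics, assuming each YAML file has already been loaded into a plain dict (`deep_merge` and `compose` are hypothetical names):

```python
import copy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where override wins, merging nested dicts recursively."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def compose(base: Dict[str, Any], *layers: Dict[str, Any]) -> Dict[str, Any]:
    """Apply layers left to right, e.g. base.yaml, then model/..., then training/..."""
    cfg = copy.deepcopy(base)
    for layer in layers:
        cfg = deep_merge(cfg, layer)
    return cfg
```

Because `training/fast.yaml` only needs to state what differs (for example, fewer epochs), each group file stays small and the base file remains the single source of defaults.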
176
+
177
+ ### Code and Notebook Conventions
178
+
179
+ **Notebooks are for exploration, not production**:
180
+ - Notebooks belong in `notebooks/` — never in `src/`
181
+ - Notebook outputs must be cleared before committing (for example with `nbstripout`) so large outputs never land in git
182
+ - Meaningful results from notebooks are refactored into `src/` modules with tests
183
+
184
+ **Module structure conventions**:
185
+ - `src/data/` — dataset classes, data loaders, preprocessing transforms
186
+ - `src/models/` — model architectures (no training logic)
187
+ - `src/training/` — training loop, loss functions, callbacks
188
+ - `src/evaluation/` — metrics, evaluation runners
189
+ - `src/serving/` — inference code, prediction pipelines
190
+
191
+ **Naming conventions**:
192
+ - Files: `snake_case.py`
193
+ - Classes: `PascalCase` (e.g., `ResNet50Classifier`, `ChurnDataset`)
194
+ - Functions: `snake_case` (e.g., `compute_f1_score`, `load_checkpoint`)
195
+ - Constants: `UPPER_SNAKE_CASE` (e.g., `MAX_SEQ_LENGTH = 512`)
196
+ - Config keys: `snake_case` in YAML
197
+
198
+ ### Checklist Before Starting a Training Run
199
+
200
+ ```
201
+ [ ] Experiment name follows naming convention
202
+ [ ] Random seed set and recorded in config
203
+ [ ] Config file committed (not just command-line overrides)
204
+ [ ] Dataset version recorded
205
+ [ ] Experiment tracker (MLflow/W&B) initialized with run metadata
206
+ [ ] Code committed to git (note the commit SHA in the experiment)
207
+ [ ] Output directory created and named consistently
208
+ [ ] Hardware/environment recorded
209
+ ```
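
Parts of this checklist can be enforced automatically at launch. A minimal sketch, assuming the training script gathers run metadata into a dictionary; the key names in `REQUIRED_FIELDS` are hypothetical and should be adapted to the project's tracker:

```python
import re
from typing import Any, Dict, List

# Hypothetical metadata keys mapped to the checklist item each one evidences.
REQUIRED_FIELDS = {
    "experiment_name": "Experiment name follows naming convention",
    "seed": "Random seed set and recorded in config",
    "config_path": "Config file committed",
    "dataset_version": "Dataset version recorded",
    "tracker_run_id": "Experiment tracker initialized with run metadata",
    "git_sha": "Code committed to git",
    "output_dir": "Output directory created and named consistently",
    "environment": "Hardware/environment recorded",
}


def missing_checklist_items(run: Dict[str, Any]) -> List[str]:
    """Return the checklist items that the run metadata does not satisfy."""
    missing = []
    for key, item in REQUIRED_FIELDS.items():
        value = run.get(key)
        # Compare against None/"" so that a legitimate seed of 0 still counts.
        if value is None or value == "":
            missing.append(item)
    sha = run.get("git_sha")
    if sha and not re.fullmatch(r"[0-9a-f]{40}", str(sha)):
        missing.append("Code committed to git (git_sha is not a full commit SHA)")
    if run.get("git_dirty"):
        missing.append("Code committed to git (working tree has uncommitted changes)")
    return missing
```

The training entry point can refuse to start when the returned list is non-empty, which makes the checklist the default path rather than a manual step.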
@@ -0,0 +1,299 @@
1
+ ---
2
+ name: ml-dev-environment
3
+ description: Conda/Poetry environment setup, Jupyter integration, GPU detection and configuration, and Docker for reproducible ML development
4
+ topics: [ml, dev-environment, conda, poetry, jupyter, gpu, docker, reproducibility]
5
+ ---
6
+
7
+ ML development environments are more complex than those of typical software projects: GPU drivers, CUDA toolkits, Python packages with native extensions, and Jupyter notebook infrastructure all need to align. A broken environment costs hours and blocks the whole team. Invest in environment standardisation upfront; the payoff is that every team member can reproduce results and CI pipelines match local runs.
8
+
9
+ ## Summary
10
+
11
+ Prefer Conda for ML projects when GPU and CUDA management is required; use Poetry for pure-Python projects or as the Python dependency manager on top of Conda. Configure Jupyter as a managed service rather than ad-hoc invocations. Detect GPU availability programmatically and handle CPU fallback gracefully. Use Docker to capture the full environment for reproducible training runs and production serving.
12
+
13
+ ## Deep Guidance
14
+
15
+ ### Conda vs. Poetry: When to Use Each
16
+
17
+ **Conda** is the right choice when:
18
+ - Managing CUDA toolkit and cuDNN versions (Conda installs the CUDA user-space libraries without root; the NVIDIA GPU driver itself must still be installed on the host)
19
+ - Working with packages that have complex native dependencies (PyTorch, TensorFlow, OpenCV)
20
+ - The Python version itself must be isolated (not just packages)
21
+ - Team uses multiple ML frameworks with conflicting dependencies
22
+
23
+ **Poetry** is the right choice when:
24
+ - The project is pure Python, or all native dependencies are available via pip
25
+ - Strict dependency locking and reproducible installs are required
26
+ - Publishing a library (Poetry handles packaging well)
27
+ - Already using a Conda environment for CUDA and want finer control over Python packages
28
+
29
+ **Common hybrid pattern**: Conda manages Python version and CUDA; Poetry manages Python package dependencies inside the Conda environment.
30
+
31
+ ### Conda Environment Setup
32
+
33
+ ```yaml
34
+ # environment.yml — commit to git
35
+ name: myproject
36
+ channels:
37
+   - pytorch
38
+   - nvidia
39
+   - conda-forge
40
+   - defaults
41
+ dependencies:
42
+   - python=3.11
43
+   - cuda-toolkit=12.1
44
+   - cudnn=8.9
45
+   - pip>=23.0
46
+   - pip:
+       - --extra-index-url https://download.pytorch.org/whl/cu121  # hosts the +cu121 wheels (not on PyPI)
47
+       - torch==2.1.0+cu121
48
+       - torchvision==0.16.0+cu121
49
+       - -r requirements.txt  # or use pyproject.toml
50
+ ```
51
+
52
+ ```bash
53
+ # Create and activate
54
+ conda env create -f environment.yml
55
+ conda activate myproject
56
+
57
+ # Update after environment.yml changes
58
+ conda env update -f environment.yml --prune
59
+
60
+ # Export current state (for exact reproducibility audit)
61
+ conda env export > environment-lock.yml
62
+ ```
63
+
64
+ **Critical**: Pin exact versions in `environment.yml`. `pytorch>=2.0` is not a reproducible spec.
65
+
66
+ ### Poetry Setup (Python Dependencies)
67
+
68
+ ```bash
69
+ # Initialize
70
+ poetry init
71
+
72
+ # Add dependencies
73
+ poetry add torch==2.1.0 transformers==4.35.2
74
+ poetry add --group dev pytest black mypy
75
+
76
+ # Install (the venv goes in Poetry's cache dir; run `poetry config virtualenvs.in-project true` first to get ./.venv)
77
+ poetry install
78
+
79
+ # Run in the managed venv
80
+ poetry run python train.py
81
+ poetry run pytest
82
+ ```
83
+
84
+ `pyproject.toml` example:
85
+ ```toml
86
+ [tool.poetry]
87
+ name = "myproject"
88
+ version = "0.1.0"
89
+ description = "ML project"
90
+
91
+ [tool.poetry.dependencies]
92
+ python = "^3.11"
93
+ torch = "2.1.0"
94
+ transformers = "4.35.2"
95
+ hydra-core = "1.3.2"
96
+ mlflow = "2.9.2"
97
+
98
+ [tool.poetry.group.dev.dependencies]
99
+ pytest = "7.4.3"
100
+ black = "23.11.0"
101
+ mypy = "1.7.0"
102
+ nbstripout = "0.6.1"
103
+ ```
104
+
105
+ ### GPU Detection and Configuration
106
+
107
+ Always detect GPU availability at runtime and handle CPU fallback:
108
+
109
+ ```python
110
+ # src/utils/device.py
111
+ import torch
112
+ import logging
113
+
114
+ logger = logging.getLogger(__name__)
115
+
116
+ def get_device(prefer_gpu: bool = True) -> torch.device:
117
+     """Return the best available device with logging."""
118
+     if prefer_gpu and torch.cuda.is_available():
119
+         device = torch.device("cuda")
120
+         gpu_name = torch.cuda.get_device_name(0)
121
+         gpu_memory = torch.cuda.get_device_properties(0).total_memory / 1e9
122
+         logger.info(f"Using GPU: {gpu_name} ({gpu_memory:.1f} GB)")
123
+     elif prefer_gpu and torch.backends.mps.is_available():
124
+         # Apple Silicon
125
+         device = torch.device("mps")
126
+         logger.info("Using Apple MPS device")
127
+     else:
128
+         device = torch.device("cpu")
129
+         logger.info("Using CPU (GPU not available or not requested)")
130
+     return device
131
+
132
+ def log_gpu_memory() -> None:
133
+     """Log current GPU memory usage."""
134
+     if torch.cuda.is_available():
135
+         allocated = torch.cuda.memory_allocated() / 1e9
136
+         reserved = torch.cuda.memory_reserved() / 1e9
137
+         logger.debug(f"GPU memory: {allocated:.2f} GB allocated, {reserved:.2f} GB reserved")
138
+ ```
139
+
140
+ **CUDA version compatibility**: each PyTorch release is built against specific CUDA versions, so pick a CUDA version your PyTorch release supports:
141
+
142
+ | PyTorch | CUDA | cuDNN |
143
+ |---------|------|-------|
144
+ | 2.1.x | 12.1, 11.8 | 8.x |
145
+ | 2.0.x | 11.7, 11.8 | 8.x |
146
+
147
+ Check compatibility at pytorch.org before pinning.
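
Since the table is easy to misread, a pin can be checked against it in CI. A sketch under stated assumptions: the mapping copies only the two rows above, the `+cuXYZ` parsing assumes PyTorch's local-version wheel naming (for example `2.1.0+cu121`), and the function names are hypothetical. At runtime, `torch.version.cuda` reports the CUDA version the installed build targets:

```python
from typing import Dict, Optional, Set

# Copied from the table above; extend from pytorch.org before relying on it.
SUPPORTED_CUDA: Dict[str, Set[str]] = {
    "2.1": {"12.1", "11.8"},
    "2.0": {"11.7", "11.8"},
}


def cuda_from_wheel_tag(torch_version: str) -> Optional[str]:
    """'2.1.0+cu121' -> '12.1'; returns None when there is no +cuXYZ tag."""
    _, _, local = torch_version.partition("+")
    digits = local[2:]
    if not local.startswith("cu") or len(digits) < 2 or not digits.isdigit():
        return None
    return f"{digits[:-1]}.{digits[-1]}"


def is_supported_pair(torch_version: str, cuda_version: Optional[str] = None) -> bool:
    """Check a PyTorch/CUDA pair against SUPPORTED_CUDA."""
    cuda = cuda_version or cuda_from_wheel_tag(torch_version)
    if cuda is None:
        raise ValueError("pass cuda_version or use a +cuXYZ wheel tag")
    major_minor = ".".join(torch_version.split("+", 1)[0].split(".")[:2])
    if major_minor not in SUPPORTED_CUDA:
        raise KeyError(f"no compatibility data for PyTorch {major_minor}")
    return cuda in SUPPORTED_CUDA[major_minor]
```

Raising on an unknown PyTorch series, rather than returning `False`, forces the table to be updated when the framework pin moves.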
148
+
149
+ **Multi-GPU setup** (training only — not for development):
150
+ ```python
151
+ # Detect available GPUs
152
+ n_gpus = torch.cuda.device_count()
153
+ if n_gpus > 1:
154
+     model = torch.nn.DataParallel(model)  # Simple, for research
155
+     # Or for production: use DistributedDataParallel (see ml-training-patterns)
156
+ ```
157
+
158
+ ### Jupyter Integration
159
+
160
+ Run Jupyter as a managed kernel rather than an ad-hoc server:
161
+
162
+ ```bash
163
+ # Install Jupyter in the project environment
164
+ poetry add --group dev jupyter jupyterlab ipykernel
165
+
166
+ # Register the project venv as a named Jupyter kernel
167
+ poetry run python -m ipykernel install --user --name myproject --display-name "MyProject (Python 3.11)"
168
+
169
+ # Launch JupyterLab
170
+ poetry run jupyter lab
171
+ ```
172
+
173
+ Notebooks that select this kernel then run in the same environment as the source code.
174
+
175
+ **Recommended Jupyter extensions and tools**:
176
+ - `nbstripout`: a git filter (enable with `nbstripout --install`) that strips outputs before they are committed
177
+ - `jupyterlab-git` — git integration in the UI
178
+ - `jupyterlab-lsp` — language server (autocomplete, type hints)
179
+
180
+ **VS Code Jupyter integration** (recommended over browser-based):
181
+ ```json
182
+ // .vscode/settings.json
183
+ {
184
+   "jupyter.notebookFileRoot": "${workspaceFolder}",
188
+   "python.defaultInterpreterPath": "${workspaceFolder}/.venv/bin/python"
189
+ }
190
+ ```
191
+
192
+ ### Docker for Reproducibility
193
+
194
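+ Docker captures the entire user-space environment (OS libraries, CUDA runtime, Python, and packages); the NVIDIA driver still comes from the host via the NVIDIA Container Toolkit. Use it for: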
195
+ - CI training runs
196
+ - Sharing experiments with collaborators who have different local setups
197
+ - Production serving (identical environment to training)
198
+
199
+ **Base `Dockerfile` for ML training**:
200
+ ```dockerfile
201
+ # Use NVIDIA's official CUDA base image
202
+ FROM nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04
203
+
204
+ # Set Python version
205
+ ENV PYTHON_VERSION=3.11
206
+ ENV DEBIAN_FRONTEND=noninteractive
207
+
208
+ RUN apt-get update && apt-get install -y \
209
+ python${PYTHON_VERSION} \
210
+ python3-pip \
211
+ git \
212
+ && rm -rf /var/lib/apt/lists/*
213
+
214
+ RUN ln -s /usr/bin/python${PYTHON_VERSION} /usr/bin/python
215
+
216
+ # Install Poetry
217
+ RUN pip install poetry==1.7.1
218
+ ENV POETRY_NO_INTERACTION=1 \
219
+     POETRY_VIRTUALENVS_IN_PROJECT=true
220
+
221
+ WORKDIR /app
222
+
223
+ # Install dependencies (cached layer)
224
+ COPY pyproject.toml poetry.lock ./
225
+ RUN poetry install --no-root --without dev
226
+
227
+ # Copy source
228
+ COPY src/ ./src/
229
+ COPY configs/ ./configs/
230
+
231
+ # Install the project itself
232
+ RUN poetry install --without dev
233
+
234
+ ENTRYPOINT ["poetry", "run", "python", "-m", "src.training.train"]
235
+ ```
236
+
237
+ **Docker Compose for development**:
238
+ ```yaml
239
+ # docker-compose.yml
240
+ services:
241
+   train:
242
+     build: .
243
+     volumes:
244
+       - ./data:/app/data
245
+       - ./models:/app/models
246
+       - ./configs:/app/configs
247
+     environment:
248
+       - MLFLOW_TRACKING_URI=http://mlflow:5000
249
+     deploy:
250
+       resources:
251
+         reservations:
252
+           devices:
253
+             - driver: nvidia
254
+               count: all
255
+               capabilities: [gpu]
256
+
257
+   mlflow:
258
+     image: ghcr.io/mlflow/mlflow:v2.9.2
+     command: mlflow server --host 0.0.0.0 --port 5000 --backend-store-uri /mlflow/mlruns
259
+     ports:
260
+       - "5000:5000"
261
+     volumes:
262
+       - ./mlruns:/mlflow/mlruns
263
+ ```
264
+
265
+ ### Makefile Task Runner
266
+
267
+ Encode common tasks in a `Makefile` to eliminate "how do I run this?" questions:
268
+
269
+ ```makefile
270
+ .PHONY: env train eval test lint clean
271
+
272
+ env:
273
+ 	conda env create -f environment.yml || conda env update -f environment.yml --prune
274
+
275
+ train:
276
+ 	poetry run python -m src.training.train $(ARGS)
277
+
278
+ eval:
279
+ 	poetry run python -m src.evaluation.evaluator $(ARGS)
280
+
281
+ test:
282
+ 	poetry run pytest tests/ -v
283
+
284
+ lint:
285
+ 	poetry run black --check src/ tests/
286
+ 	poetry run mypy src/
287
+
288
+ clean:
289
+ 	find . -type f -name "*.pyc" -delete
290
+ 	find . -type d -name "__pycache__" -delete
291
+ 	rm -rf .pytest_cache/
292
+ ```
293
+
294
+ Usage:
295
+ ```bash
296
+ make env # Set up environment
297
+ make train ARGS="optimizer.lr=1e-4"
298
+ make test
299
+ ```