@zigrivers/scaffold 3.13.0 → 3.15.0

Files changed (180)
  1. package/README.md +32 -10
  2. package/content/knowledge/research/research-architecture.md +385 -0
  3. package/content/knowledge/research/research-conventions.md +248 -0
  4. package/content/knowledge/research/research-dev-environment.md +303 -0
  5. package/content/knowledge/research/research-experiment-loop.md +429 -0
  6. package/content/knowledge/research/research-experiment-tracking.md +336 -0
  7. package/content/knowledge/research/research-ml-architecture-search.md +383 -0
  8. package/content/knowledge/research/research-ml-evaluation.md +407 -0
  9. package/content/knowledge/research/research-ml-experiment-tracking.md +466 -0
  10. package/content/knowledge/research/research-ml-training-patterns.md +413 -0
  11. package/content/knowledge/research/research-observability.md +395 -0
  12. package/content/knowledge/research/research-overfitting-prevention.md +306 -0
  13. package/content/knowledge/research/research-project-structure.md +264 -0
  14. package/content/knowledge/research/research-quant-backtesting.md +326 -0
  15. package/content/knowledge/research/research-quant-market-data.md +366 -0
  16. package/content/knowledge/research/research-quant-metrics.md +335 -0
  17. package/content/knowledge/research/research-quant-requirements.md +223 -0
  18. package/content/knowledge/research/research-quant-risk.md +469 -0
  19. package/content/knowledge/research/research-quant-strategy-patterns.md +412 -0
  20. package/content/knowledge/research/research-requirements.md +201 -0
  21. package/content/knowledge/research/research-security.md +374 -0
  22. package/content/knowledge/research/research-sim-compute-management.md +538 -0
  23. package/content/knowledge/research/research-sim-engine-patterns.md +448 -0
  24. package/content/knowledge/research/research-sim-parameter-spaces.md +425 -0
  25. package/content/knowledge/research/research-sim-validation.md +456 -0
  26. package/content/knowledge/research/research-testing.md +334 -0
  27. package/content/methodology/research-ml-research.yml +23 -0
  28. package/content/methodology/research-overlay.yml +65 -0
  29. package/content/methodology/research-quant-finance.yml +29 -0
  30. package/content/methodology/research-simulation.yml +23 -0
  31. package/dist/cli/commands/adopt.d.ts.map +1 -1
  32. package/dist/cli/commands/adopt.js +30 -8
  33. package/dist/cli/commands/adopt.js.map +1 -1
  34. package/dist/cli/commands/adopt.serialization.test.js +49 -0
  35. package/dist/cli/commands/adopt.serialization.test.js.map +1 -1
  36. package/dist/cli/commands/adopt.test.js +8 -0
  37. package/dist/cli/commands/adopt.test.js.map +1 -1
  38. package/dist/cli/commands/build.d.ts.map +1 -1
  39. package/dist/cli/commands/build.js +191 -180
  40. package/dist/cli/commands/build.js.map +1 -1
  41. package/dist/cli/commands/complete.d.ts.map +1 -1
  42. package/dist/cli/commands/complete.js +16 -12
  43. package/dist/cli/commands/complete.js.map +1 -1
  44. package/dist/cli/commands/complete.test.js +14 -5
  45. package/dist/cli/commands/complete.test.js.map +1 -1
  46. package/dist/cli/commands/init.d.ts +4 -0
  47. package/dist/cli/commands/init.d.ts.map +1 -1
  48. package/dist/cli/commands/init.js +75 -51
  49. package/dist/cli/commands/init.js.map +1 -1
  50. package/dist/cli/commands/init.test.js +33 -27
  51. package/dist/cli/commands/init.test.js.map +1 -1
  52. package/dist/cli/commands/reset.d.ts.map +1 -1
  53. package/dist/cli/commands/reset.js +44 -40
  54. package/dist/cli/commands/reset.js.map +1 -1
  55. package/dist/cli/commands/reset.test.js +42 -20
  56. package/dist/cli/commands/reset.test.js.map +1 -1
  57. package/dist/cli/commands/rework.d.ts.map +1 -1
  58. package/dist/cli/commands/rework.js +16 -12
  59. package/dist/cli/commands/rework.js.map +1 -1
  60. package/dist/cli/commands/rework.test.js +12 -3
  61. package/dist/cli/commands/rework.test.js.map +1 -1
  62. package/dist/cli/commands/run.d.ts.map +1 -1
  63. package/dist/cli/commands/run.js +318 -298
  64. package/dist/cli/commands/run.js.map +1 -1
  65. package/dist/cli/commands/run.test.js +92 -120
  66. package/dist/cli/commands/run.test.js.map +1 -1
  67. package/dist/cli/commands/skip.d.ts.map +1 -1
  68. package/dist/cli/commands/skip.js +19 -15
  69. package/dist/cli/commands/skip.js.map +1 -1
  70. package/dist/cli/commands/skip.test.js +22 -11
  71. package/dist/cli/commands/skip.test.js.map +1 -1
  72. package/dist/cli/commands/update.d.ts.map +1 -1
  73. package/dist/cli/commands/update.js +3 -1
  74. package/dist/cli/commands/update.js.map +1 -1
  75. package/dist/cli/commands/update.test.js +8 -4
  76. package/dist/cli/commands/update.test.js.map +1 -1
  77. package/dist/cli/commands/version.d.ts.map +1 -1
  78. package/dist/cli/commands/version.js +3 -1
  79. package/dist/cli/commands/version.js.map +1 -1
  80. package/dist/cli/commands/version.test.js +9 -5
  81. package/dist/cli/commands/version.test.js.map +1 -1
  82. package/dist/cli/index.d.ts.map +1 -1
  83. package/dist/cli/index.js +2 -0
  84. package/dist/cli/index.js.map +1 -1
  85. package/dist/cli/init-flag-families.d.ts +6 -1
  86. package/dist/cli/init-flag-families.d.ts.map +1 -1
  87. package/dist/cli/init-flag-families.js +32 -1
  88. package/dist/cli/init-flag-families.js.map +1 -1
  89. package/dist/cli/init-flag-families.test.js +47 -0
  90. package/dist/cli/init-flag-families.test.js.map +1 -1
  91. package/dist/cli/output/interactive.d.ts +1 -0
  92. package/dist/cli/output/interactive.d.ts.map +1 -1
  93. package/dist/cli/output/interactive.js +5 -0
  94. package/dist/cli/output/interactive.js.map +1 -1
  95. package/dist/cli/shutdown.d.ts +51 -0
  96. package/dist/cli/shutdown.d.ts.map +1 -0
  97. package/dist/cli/shutdown.js +199 -0
  98. package/dist/cli/shutdown.js.map +1 -0
  99. package/dist/cli/shutdown.test.d.ts +2 -0
  100. package/dist/cli/shutdown.test.d.ts.map +1 -0
  101. package/dist/cli/shutdown.test.js +316 -0
  102. package/dist/cli/shutdown.test.js.map +1 -0
  103. package/dist/config/schema.d.ts +272 -16
  104. package/dist/config/schema.d.ts.map +1 -1
  105. package/dist/config/schema.js +25 -1
  106. package/dist/config/schema.js.map +1 -1
  107. package/dist/config/schema.test.js +103 -3
  108. package/dist/config/schema.test.js.map +1 -1
  109. package/dist/core/assembly/overlay-loader.d.ts +12 -0
  110. package/dist/core/assembly/overlay-loader.d.ts.map +1 -1
  111. package/dist/core/assembly/overlay-loader.js +30 -0
  112. package/dist/core/assembly/overlay-loader.js.map +1 -1
  113. package/dist/core/assembly/overlay-loader.test.js +66 -1
  114. package/dist/core/assembly/overlay-loader.test.js.map +1 -1
  115. package/dist/core/assembly/overlay-state-resolver.d.ts.map +1 -1
  116. package/dist/core/assembly/overlay-state-resolver.js +48 -19
  117. package/dist/core/assembly/overlay-state-resolver.js.map +1 -1
  118. package/dist/core/assembly/overlay-state-resolver.test.js +80 -0
  119. package/dist/core/assembly/overlay-state-resolver.test.js.map +1 -1
  120. package/dist/e2e/init.test.js +5 -4
  121. package/dist/e2e/init.test.js.map +1 -1
  122. package/dist/e2e/project-type-overlays.test.js +119 -0
  123. package/dist/e2e/project-type-overlays.test.js.map +1 -1
  124. package/dist/project/adopt.d.ts.map +1 -1
  125. package/dist/project/adopt.js +3 -1
  126. package/dist/project/adopt.js.map +1 -1
  127. package/dist/project/detectors/disambiguate.js +1 -1
  128. package/dist/project/detectors/disambiguate.js.map +1 -1
  129. package/dist/project/detectors/index.d.ts.map +1 -1
  130. package/dist/project/detectors/index.js +2 -1
  131. package/dist/project/detectors/index.js.map +1 -1
  132. package/dist/project/detectors/ml.d.ts.map +1 -1
  133. package/dist/project/detectors/ml.js +2 -6
  134. package/dist/project/detectors/ml.js.map +1 -1
  135. package/dist/project/detectors/research.d.ts +4 -0
  136. package/dist/project/detectors/research.d.ts.map +1 -0
  137. package/dist/project/detectors/research.js +141 -0
  138. package/dist/project/detectors/research.js.map +1 -0
  139. package/dist/project/detectors/research.test.d.ts +2 -0
  140. package/dist/project/detectors/research.test.d.ts.map +1 -0
  141. package/dist/project/detectors/research.test.js +235 -0
  142. package/dist/project/detectors/research.test.js.map +1 -0
  143. package/dist/project/detectors/shared-signals.d.ts +3 -0
  144. package/dist/project/detectors/shared-signals.d.ts.map +1 -0
  145. package/dist/project/detectors/shared-signals.js +9 -0
  146. package/dist/project/detectors/shared-signals.js.map +1 -0
  147. package/dist/project/detectors/types.d.ts +6 -2
  148. package/dist/project/detectors/types.d.ts.map +1 -1
  149. package/dist/project/detectors/types.js.map +1 -1
  150. package/dist/state/lock-manager.d.ts +1 -0
  151. package/dist/state/lock-manager.d.ts.map +1 -1
  152. package/dist/state/lock-manager.js +1 -1
  153. package/dist/state/lock-manager.js.map +1 -1
  154. package/dist/types/config.d.ts +7 -1
  155. package/dist/types/config.d.ts.map +1 -1
  156. package/dist/wizard/copy/core.d.ts.map +1 -1
  157. package/dist/wizard/copy/core.js +4 -0
  158. package/dist/wizard/copy/core.js.map +1 -1
  159. package/dist/wizard/copy/index.d.ts.map +1 -1
  160. package/dist/wizard/copy/index.js +2 -0
  161. package/dist/wizard/copy/index.js.map +1 -1
  162. package/dist/wizard/copy/research.d.ts +3 -0
  163. package/dist/wizard/copy/research.d.ts.map +1 -0
  164. package/dist/wizard/copy/research.js +27 -0
  165. package/dist/wizard/copy/research.js.map +1 -0
  166. package/dist/wizard/copy/types.d.ts +5 -1
  167. package/dist/wizard/copy/types.d.ts.map +1 -1
  168. package/dist/wizard/flags.d.ts +7 -1
  169. package/dist/wizard/flags.d.ts.map +1 -1
  170. package/dist/wizard/questions.d.ts +4 -2
  171. package/dist/wizard/questions.d.ts.map +1 -1
  172. package/dist/wizard/questions.js +27 -1
  173. package/dist/wizard/questions.js.map +1 -1
  174. package/dist/wizard/questions.test.js +51 -0
  175. package/dist/wizard/questions.test.js.map +1 -1
  176. package/dist/wizard/wizard.d.ts +3 -2
  177. package/dist/wizard/wizard.d.ts.map +1 -1
  178. package/dist/wizard/wizard.js +3 -1
  179. package/dist/wizard/wizard.js.map +1 -1
  180. package/package.json +1 -1
@@ -0,0 +1,248 @@
1
+ ---
2
+ name: research-conventions
3
+ description: Coding conventions for research projects including experiment branching, result naming, config management, and reproducibility standards
4
+ topics: [research, conventions, git, branching, reproducibility, config-management]
5
+ ---
6
+
7
+ Research code has a unique lifecycle: most code is written to be tried and discarded. A trading strategy that underperforms is reverted. A hyperparameter sweep that converges to a local minimum is abandoned. The conventions must make this try-and-discard cycle fast and safe while preserving a complete audit trail of what was tried and why it was kept or discarded.
8
+
9
+ ## Summary
10
+
11
+ Use git branches as the state machine for experiment lifecycle (try, evaluate, keep/revert). Name branches, results, and configs with a consistent scheme that encodes the experiment ID, hypothesis, and timestamp. Pin every dependency and seed every random source for reproducibility. Separate experiment code (disposable) from infrastructure code (durable) in the repository structure. Use structured config files (YAML/TOML) instead of command-line argument sprawl.
12
+
13
+ ## Deep Guidance
14
+
15
+ ### Git as Experiment State Machine
16
+
17
+ The experiment loop uses git as its state management layer. Each experiment run is a branch. The decision to keep or discard is a merge or branch deletion:
18
+
19
+ ```
20
+ main (stable baseline)
21
+ |
22
+ +-- exp/001-momentum-lookback-20 (try → evaluate → keep → merge)
23
+ |
24
+ +-- exp/002-momentum-lookback-10 (try → evaluate → discard → delete)
25
+ |
26
+ +-- exp/003-mean-revert-rsi (try → evaluate → keep → merge)
27
+ ```
28
+
29
+ **Branch naming convention**: `exp/{NNN}-{short-description}`
30
+ - `NNN`: Zero-padded sequential experiment number
31
+ - `short-description`: Kebab-case summary of what is being tested
32
+ - Examples: `exp/001-adaptive-lookback`, `exp/042-ensemble-top3`
33
+
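The naming convention above can be kept machine-checkable with a small helper. This is a minimal sketch, not part of the package: `parse_branch`, `format_branch`, and `BRANCH_RE` are hypothetical names, and it assumes experiment numbers have at least three digits and descriptions use only lowercase letters, digits, and hyphens.

```python
import re
from typing import NamedTuple

# Assumed pattern for `exp/{NNN}-{short-description}` (kebab-case description).
BRANCH_RE = re.compile(r"exp/(?P<number>\d{3,})-(?P<description>[a-z0-9]+(?:-[a-z0-9]+)*)")


class ExperimentBranch(NamedTuple):
    number: int
    description: str


def parse_branch(name: str) -> ExperimentBranch:
    """Split an experiment branch name into its number and description."""
    match = BRANCH_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"Not an experiment branch: {name!r}")
    return ExperimentBranch(int(match["number"]), match["description"])


def format_branch(number: int, description: str) -> str:
    """Build a branch name with a zero-padded, three-digit experiment number."""
    return f"exp/{number:03d}-{description}"
```

A runner or pre-commit hook could call `parse_branch` on the current branch and refuse to start an experiment when the name does not match.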
34
+ **Workflow**:
35
+ ```bash
36
+ # Start a new experiment
37
+ git checkout main
38
+ git checkout -b exp/015-rsi-threshold-sweep
39
+
40
+ # ... agent modifies code, runs experiment ...
41
+
42
+ # Experiment succeeded — merge to main
43
+ git checkout main
44
+ git merge --no-ff exp/015-rsi-threshold-sweep -m "exp/015: RSI threshold 30/70 Sharpe=1.6"
45
+
46
+ # Experiment failed — discard
47
+ git branch -D exp/015-rsi-threshold-sweep
48
+ # Alternative to the plain delete above: tag the branch for reference first, then delete it
49
+ git tag archive/exp/015-rsi-threshold-sweep exp/015-rsi-threshold-sweep
50
+ git branch -D exp/015-rsi-threshold-sweep
51
+ ```
52
+
53
+ **Commit message convention for experiments**:
54
+ ```
55
+ exp/015: RSI threshold sweep
56
+
57
+ Hypothesis: RSI overbought/oversold thresholds of 30/70 will outperform
58
+ the default 20/80 on 2020-2023 equity data.
59
+
60
+ Result: Sharpe=1.6, MaxDD=11%, 247 trades
61
+ Decision: KEEP — new best by Sharpe, DD within guardrail
62
+ ```
63
+
64
+ ### Result Naming
65
+
66
+ Every experiment run produces artifacts. Use a consistent naming scheme:
67
+
68
+ ```
+ results/
+   exp-001/
+     config.yml           # Exact config used for this run
+     metrics.json         # Final metrics
+     metrics_history.csv  # Per-iteration metrics
+     artifacts/           # Model checkpoints, plots, etc.
+     log.txt              # Full stdout/stderr
+   exp-002/
+     ...
78
+ ```
79
+
80
+ **File naming rules**:
81
+ - Directories: `exp-{NNN}` matching the git branch number
82
+ - Timestamps in filenames when multiple runs share an experiment: `exp-001-20240315T143022`
83
+ - Never use spaces or special characters in result paths
84
+ - Metrics files are always JSON (machine-readable) or CSV (tabular)
85
+
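The directory rules above could be encoded in one function so that every run derives its output path the same way. This is a sketch under assumptions: `result_dir` is a hypothetical helper, and the timestamp format is inferred from the `exp-001-20240315T143022` example.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def result_dir(results_root: str, exp_number: int, run_time: Optional[datetime] = None) -> Path:
    """Return the result directory for an experiment run.

    Without run_time the path is `exp-{NNN}`; with it, a timestamp suffix is
    appended so that multiple runs of one experiment do not collide.
    """
    name = f"exp-{exp_number:03d}"
    if run_time is not None:
        name += f"-{run_time:%Y%m%dT%H%M%S}"
    return Path(results_root) / name


# Example: a single run, then a timestamped run of the same experiment
single = result_dir("results", 1)
repeated = result_dir("results", 1, datetime(2024, 3, 15, 14, 30, 22))
```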
86
+ ### Config Management
87
+
88
+ Research projects accumulate dozens of configuration parameters. Manage them with structured config files, not argument sprawl:
89
+
90
+ ```yaml
+ # configs/base.yml: shared defaults
+ experiment:
+   seed: 42
+   num_runs: 100
+   patience: 20
+
+ data:
+   source: "data/prices.parquet"
+   train_start: "2015-01-01"
+   train_end: "2019-12-31"
+   test_start: "2020-01-01"
+   test_end: "2023-12-31"
+
+ logging:
+   level: INFO
+   results_dir: "results"
107
+ ```
108
+
109
+ ```yaml
+ # configs/exp-015-rsi-sweep.yml: experiment-specific overrides
+ _base_: base.yml
+
+ strategy:
+   type: "rsi_threshold"
+   params:
+     overbought: 70
+     oversold: 30
+     lookback: 14
+
+ experiment:
+   num_runs: 200  # Override base
122
+ ```
123
+
124
+ **Config loading pattern** (merge base + override):
125
+
126
+ ```python
+ # src/config.py
+ import yaml
+ from pathlib import Path
+ from typing import Any
+
+ def load_config(config_path: str) -> dict[str, Any]:
+     """Load config with base inheritance."""
+     with open(config_path) as f:
+         # An empty YAML file parses to None; treat it as an empty config
+         config = yaml.safe_load(f) or {}
+
+     # Resolve base config inheritance (the base path is relative to this file)
+     if "_base_" in config:
+         base_path = Path(config_path).parent / config.pop("_base_")
+         base = load_config(str(base_path))
+         return deep_merge(base, config)
+
+     return config
+
+ def deep_merge(base: dict, override: dict) -> dict:
+     """Recursively merge override into base."""
+     result = base.copy()
+     for key, value in override.items():
+         if key in result and isinstance(result[key], dict) and isinstance(value, dict):
+             # Both sides are mappings: merge key by key
+             result[key] = deep_merge(result[key], value)
+         else:
+             # Scalars and lists in the override replace the base value
+             result[key] = value
+     return result
155
+ ```
156
+
157
+ ### Reproducibility Standards
158
+
159
+ Every experiment must be reproducible. This means another researcher (or the same agent in a future session) can re-run the experiment and get the same result:
160
+
161
+ **Mandatory reproducibility checklist**:
162
+
163
+ 1. **Seed everything**: Random number generators, data shuffling, model initialization.
164
+ ```python
+ import random
+ import numpy as np
+
+ def set_seed(seed: int) -> None:
+     random.seed(seed)
+     np.random.seed(seed)
+     # Framework-specific seeding
+     try:
+         import torch
+         torch.manual_seed(seed)
+         torch.cuda.manual_seed_all(seed)
+         torch.backends.cudnn.deterministic = True
+         torch.backends.cudnn.benchmark = False
+     except ImportError:
+         pass
180
+ ```
181
+
182
+ 2. **Pin dependencies**: Use exact versions, not ranges.
183
+ ```
184
+ # requirements.txt — pinned
185
+ numpy==1.26.4
186
+ pandas==2.2.1
187
+ scikit-learn==1.4.1.post1
188
+ optuna==3.5.0
189
+ ```
190
+
191
+ 3. **Record environment**: Capture the full environment at experiment start.
192
+ ```python
+ import subprocess
+ import platform
+ import json
+
+ def capture_environment() -> dict:
+     return {
+         "python": platform.python_version(),
+         "platform": platform.platform(),
+         "pip_freeze": subprocess.check_output(
+             ["pip", "freeze"], text=True
+         ).strip().split("\n"),
+         "git_sha": subprocess.check_output(
+             ["git", "rev-parse", "HEAD"], text=True
+         ).strip(),
+         "git_dirty": bool(subprocess.check_output(
+             ["git", "status", "--porcelain"], text=True
+         ).strip()),
+     }
211
+ ```
212
+
213
+ 4. **Never modify data in place**: Raw data is immutable. Processed data is derived and can be regenerated from raw data + processing code.
214
+
215
+ 5. **Config-as-code**: The experiment config file (committed to git) must fully define the experiment. No "I changed that parameter manually."
216
+
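One way to back up the config-as-code rule is to store a fingerprint of the resolved config next to the results, so a later run can detect that a parameter changed outside git. This is a sketch, not part of the package: `config_fingerprint` is a hypothetical helper, and it assumes the resolved config contains only JSON-serializable values (strings, numbers, booleans, lists, dicts).

```python
import hashlib
import json
from typing import Any, Dict


def config_fingerprint(config: Dict[str, Any]) -> str:
    """Return a SHA-256 hex digest of the config, independent of key order."""
    # Sorting keys and fixing separators gives one canonical serialization
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Example: the same settings in a different key order hash identically
a = config_fingerprint({"experiment": {"seed": 42, "num_runs": 200}})
b = config_fingerprint({"experiment": {"num_runs": 200, "seed": 42}})
c = config_fingerprint({"experiment": {"seed": 43, "num_runs": 200}})
```

Here `a` and `b` are equal while `c` differs, so writing the digest into `metrics.json` would make silent parameter edits visible when runs are compared.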
217
+ ### Code Organization Conventions
218
+
219
+ Separate durable infrastructure code from disposable experiment code:
220
+
221
+ | Category | Location | Lifecycle |
222
+ |----------|----------|-----------|
223
+ | Experiment runner | `src/runner/` | Durable — rarely changes |
224
+ | Evaluation framework | `src/evaluation/` | Durable — rarely changes |
225
+ | Data loading | `src/data/` | Durable — rarely changes |
226
+ | Strategy/model code | `src/strategies/` or `src/models/` | Disposable — changes every experiment |
227
+ | Config files | `configs/` | Per-experiment |
228
+ | Results | `results/` | Per-experiment output |
229
+
230
+ **Import hygiene**: Experiment code imports from infrastructure code, never the reverse. The runner does not import specific strategies -- it discovers them via a registry or config-specified entry point.
231
+
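The registry approach mentioned above could look like the following. This is a minimal sketch under assumptions: `STRATEGY_REGISTRY`, `register_strategy`, `build_strategy`, and the `RsiThreshold` class are hypothetical names, and the config shape follows the `strategy.type` / `strategy.params` layout from the experiment config example.

```python
from typing import Any, Callable, Dict

# Hypothetical registry: maps the `strategy.type` config value to a factory.
STRATEGY_REGISTRY: Dict[str, Callable[..., Any]] = {}


def register_strategy(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that records a strategy factory under a config-visible name."""
    def decorator(factory: Callable[..., Any]) -> Callable[..., Any]:
        if name in STRATEGY_REGISTRY:
            raise ValueError(f"Strategy already registered: {name}")
        STRATEGY_REGISTRY[name] = factory
        return factory
    return decorator


def build_strategy(config: Dict[str, Any]) -> Any:
    """Runner-side lookup: the runner never imports a concrete strategy."""
    spec = config["strategy"]
    try:
        factory = STRATEGY_REGISTRY[spec["type"]]
    except KeyError:
        raise ValueError(f"Unknown strategy type: {spec['type']!r}") from None
    return factory(**spec.get("params", {}))


# Experiment-side code (disposable): registers itself with the infrastructure.
@register_strategy("rsi_threshold")
class RsiThreshold:
    def __init__(self, overbought: int, oversold: int, lookback: int) -> None:
        self.overbought = overbought
        self.oversold = oversold
        self.lookback = lookback


strategy = build_strategy({
    "strategy": {
        "type": "rsi_threshold",
        "params": {"overbought": 70, "oversold": 30, "lookback": 14},
    }
})
```

The dependency points one way only: strategy modules import the registry, and the runner resolves strategies by the name given in the config.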
232
+ ### Code Style for Research
233
+
234
+ - **Type hints everywhere**: Even in experiment code. Catches bugs early in a fast-iteration cycle.
235
+ - **Docstrings on public functions**: Especially for metric computation (document the formula).
236
+ - **No notebooks in git**: Notebooks are for interactive exploration. Convert to scripts before committing. If notebook-driven experiments are required, use `nbstripout` to strip outputs before committing.
237
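+ - **Linting**: Use `ruff` for fast linting. Research code can relax some style rules (for example, unused imports during exploration) but should still enforce correctness rules such as undefined names. Type errors are outside `ruff`'s scope and are caught by a type checker such as `mypy`.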
238
+
239
+ ```toml
+ # pyproject.toml
+ [tool.ruff]
+ line-length = 100
+
+ [tool.ruff.lint]
+ select = ["E", "F", "W", "I"]  # pycodestyle errors, Pyflakes, pycodestyle warnings, isort
+ ignore = ["E501"]              # Allow long lines in research code
+
+ [tool.ruff.lint.isort]
+ known-first-party = ["src"]
248
+ ```
@@ -0,0 +1,303 @@
1
+ ---
2
+ name: research-dev-environment
3
+ description: Development tooling for research projects including virtual environments, dependency management, GPU setup, and data access configuration
4
+ topics: [research, dev-environment, dependencies, virtual-env, gpu, data-access, tooling]
5
+ ---
6
+
7
+ Research dev environments have stricter reproducibility requirements than typical application development. A trading strategy that produces different results on a different machine is useless -- the environment itself is a variable that must be controlled. At the same time, research environments need flexibility for rapid iteration: installing new packages, switching between CPU and GPU, and accessing large datasets must be frictionless.
8
+
9
+ ## Summary
10
+
11
+ Use `uv` (preferred) or `pip` with pinned dependencies in a virtual environment for reproducible Python dependency management. Lock the full dependency tree (not just direct dependencies). Configure GPU access and CUDA versions explicitly when applicable. Set up data access credentials via environment variables (never in code or config files). Use a Makefile with standard targets (`setup`, `run`, `test`) so that both humans and agents can operate the environment identically.
12
+
13
+ ## Deep Guidance
14
+
15
+ ### Python Environment Setup
16
+
17
+ **Recommended: `uv` for dependency management**
18
+
19
+ `uv` is a fast Python package manager that resolves dependencies deterministically and records the result in a lockfile:
20
+
21
+ ```bash
22
+ # Install uv
23
+ curl -LsSf https://astral.sh/uv/install.sh | sh
24
+
25
+ # Create a new project
26
+ uv init research-project
27
+ cd research-project
28
+
29
+ # Add dependencies
30
+ uv add numpy pandas scikit-learn optuna
31
+ uv add --dev pytest ruff mypy
32
+
33
+ # The lockfile (uv.lock) is auto-generated and pinned
34
+ # Commit both pyproject.toml and uv.lock
35
+ ```
36
+
37
+ ```toml
+ # pyproject.toml
+ [project]
+ name = "research-project"
+ version = "0.1.0"
+ requires-python = ">=3.11"
+ dependencies = [
+     "numpy>=1.26",
+     "pandas>=2.2",
+     "scikit-learn>=1.4",
+     "optuna>=3.5",
+     "pyyaml>=6.0",
+     "structlog>=24.1",
+ ]
+
+ [project.optional-dependencies]
+ gpu = ["torch>=2.2"]
+ tracking = ["mlflow>=2.11"]
+ notebooks = ["jupyter>=1.0", "papermill>=2.5"]
+
+ [dependency-groups]
+ dev = [
+     "pytest>=8.0",
+     "ruff>=0.3",
+     "mypy>=1.9",
+ ]
63
+ ```
64
+
65
+ **Alternative: `pip` with `requirements.txt`**
66
+
67
+ If `uv` is not available, use `pip` with fully pinned requirements:
68
+
69
+ ```bash
70
+ python -m venv .venv
71
+ source .venv/bin/activate
72
+ pip install -r requirements.txt
73
+ ```
74
+
75
+ ```
76
+ # requirements.txt — fully pinned (generated by pip freeze)
77
+ numpy==1.26.4
78
+ pandas==2.2.1
79
+ scikit-learn==1.4.1.post1
80
+ optuna==3.5.0
81
+ PyYAML==6.0.1
82
+ structlog==24.1.0
83
+ ```
84
+
85
+ **Alternative: `conda` for complex native dependencies**
86
+
87
+ Use conda when the project requires system-level native libraries (CUDA toolkit, MKL, OpenBLAS) that pip cannot manage:
88
+
89
+ ```yaml
+ # environment.yml
+ name: research
+ channels:
+   - conda-forge
+   - defaults
+ dependencies:
+   - python=3.11
+   - numpy=1.26
+   - pandas=2.2
+   - scikit-learn=1.4
+   - cudatoolkit=12.1  # Native dependency
+   - pip:
+       - optuna==3.5.0  # pip packages within conda env
103
+ ```
104
+
105
+ ### GPU Configuration
106
+
107
+ For research projects that use GPU acceleration (ML model training, simulation, etc.):
108
+
109
+ ```python
+ # src/gpu.py
+ import os
+ import logging
+
+ logger = logging.getLogger(__name__)
+
+ def configure_gpu(config: dict) -> str:
+     """Configure GPU access. Returns device string."""
+     if not config.get("gpu", {}).get("enabled", False):
+         logger.info("GPU disabled by config, using CPU")
+         return "cpu"
+
+     # Restrict visible GPUs (useful for multi-GPU machines)
+     gpu_ids = config.get("gpu", {}).get("device_ids", [0])
+     os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
+
+     try:
+         import torch
+         if torch.cuda.is_available():
+             # CUDA_VISIBLE_DEVICES renumbers the visible GPUs from 0,
+             # so the first requested GPU is always index 0 in this process
+             device = "cuda:0"
+             logger.info(
+                 "GPU configured: %s (%s, %.1f GB)",
+                 device,
+                 torch.cuda.get_device_name(0),
+                 torch.cuda.get_device_properties(0).total_memory / 1e9,
+             )
+             return device
+         else:
+             logger.warning("CUDA not available, falling back to CPU")
+             return "cpu"
+     except ImportError:
+         logger.warning("PyTorch not installed, using CPU")
+         return "cpu"
143
+ ```
144
+
145
+ **GPU config in YAML**:
146
+ ```yaml
+ gpu:
+   enabled: true
+   device_ids: [0]
+   memory_fraction: 0.8  # Limit GPU memory usage
151
+ ```
152
+
153
+ ### Data Access Configuration
154
+
155
+ Data credentials are managed via environment variables, never committed to git:
156
+
157
+ ```bash
158
+ # .env (gitignored)
159
+ DATA_SOURCE_PATH=/mnt/data/research
160
+ DATABASE_URL=postgresql://user:pass@host:5432/research
161
+ AWS_PROFILE=research-data
162
+ POLYGON_API_KEY=pk_xxx # Market data API
163
+ ```
164
+
165
+ ```python
+ # src/data/credentials.py
+ import os
+ from dataclasses import dataclass
+
+ @dataclass
+ class DataCredentials:
+     """Data access credentials loaded from environment."""
+     data_path: str
+     database_url: str | None = None
+     api_key: str | None = None
+
+     @classmethod
+     def from_env(cls) -> "DataCredentials":
+         data_path = os.environ.get("DATA_SOURCE_PATH", "data/raw")
+         if not os.path.exists(data_path):
+             raise EnvironmentError(
+                 f"DATA_SOURCE_PATH={data_path} does not exist. "
+                 "Set DATA_SOURCE_PATH to your data directory."
+             )
+         return cls(
+             data_path=data_path,
+             database_url=os.environ.get("DATABASE_URL"),
+             api_key=os.environ.get("POLYGON_API_KEY"),
+         )
190
+ ```
191
+
192
+ ### Makefile for Environment Management
193
+
194
+ ```makefile
+ .PHONY: setup setup-gpu run test lint clean
+
+ PYTHON ?= python3
+ UV := $(shell command -v uv 2>/dev/null)
+
+ setup: ## Set up development environment
+ ifdef UV
+ 	uv sync
+ 	uv sync --group dev
+ else
+ 	$(PYTHON) -m venv .venv
+ 	.venv/bin/pip install -r requirements.txt
+ 	.venv/bin/pip install -r requirements-dev.txt
+ endif
+ 	@echo "Environment ready. Activate with: source .venv/bin/activate"
+
+ setup-gpu: setup ## Set up with GPU dependencies
+ ifdef UV
+ 	uv sync --extra gpu
+ else
+ 	.venv/bin/pip install -r requirements-gpu.txt
+ endif
+
+ run: ## Run experiment (usage: make run CONFIG=configs/exp-001.yml)
+ 	$(PYTHON) -m src.runner.experiment_runner --config $(CONFIG)
+
+ test: ## Run test suite
+ 	$(PYTHON) -m pytest tests/ -v --tb=short
+
+ lint: ## Lint and type-check
+ 	ruff check src/ tests/
+ 	mypy src/ --ignore-missing-imports
+
+ clean: ## Clean generated artifacts
+ 	rm -rf .venv/ __pycache__/ .mypy_cache/ .pytest_cache/
+ 	find . -name '*.pyc' -delete
231
+ ```
232
+
233
+ ### IDE Configuration
234
+
235
+ **VS Code** (`.vscode/settings.json`):
236
+ ```json
237
+ {
238
+ "python.defaultInterpreterPath": ".venv/bin/python",
239
+ "python.analysis.typeCheckingMode": "basic",
240
+ "editor.formatOnSave": true,
241
+ "[python]": {
242
+ "editor.defaultFormatter": "charliermarsh.ruff"
243
+ },
244
+ "python.testing.pytestEnabled": true,
245
+ "python.testing.pytestArgs": ["tests/"]
246
+ }
247
+ ```
248
+
249
+ ### Environment Verification Script
250
+
251
+ Run this at the start of every experiment to verify the environment:
252
+
253
+ ```python
+ # scripts/verify_env.py
+ """Verify that the research environment is correctly configured."""
+ import importlib
+ import os
+ import sys
+
+ REQUIRED_PACKAGES = [
+     "numpy", "pandas", "sklearn", "optuna", "yaml", "structlog",
+ ]
+
+ def verify():
+     errors = []
+
+     # Python version
+     if sys.version_info < (3, 11):
+         errors.append(f"Python >= 3.11 required, got {sys.version}")
+
+     # Required packages
+     for pkg in REQUIRED_PACKAGES:
+         try:
+             importlib.import_module(pkg)
+         except ImportError:
+             errors.append(f"Missing required package: {pkg}")
+
+     # Data access
+     data_path = os.environ.get("DATA_SOURCE_PATH", "data/raw")
+     if not os.path.exists(data_path):
+         errors.append(f"Data path not found: {data_path}")
+
+     if errors:
+         print("Environment verification FAILED:")
+         for e in errors:
+             print(f"  - {e}")
+         sys.exit(1)
+     else:
+         print("Environment verification passed.")
+
+ if __name__ == "__main__":
+     verify()
293
+ ```
294
+
295
+ ### Dependency Update Strategy
296
+
297
+ Research projects should update dependencies cautiously:
298
+
299
+ 1. **Lock everything**: Both direct and transitive dependencies are pinned.
300
+ 2. **Update on a schedule**: Not every commit. Weekly or per-milestone.
301
+ 3. **Test after update**: Run the full test suite after any dependency change.
302
+ 4. **Document breaking changes**: If a dependency update changes experiment results, document which runs were affected.
303
+ 5. **Separate experiment deps from infra deps**: Changing the plotting library should not affect experiment reproducibility. Use optional dependency groups.