@zigrivers/scaffold 3.14.0 → 3.16.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (122)
  1. package/README.md +50 -21
  2. package/content/knowledge/core/automated-review-tooling.md +21 -26
  3. package/content/knowledge/core/multi-model-review-dispatch.md +30 -55
  4. package/content/knowledge/research/research-architecture.md +385 -0
  5. package/content/knowledge/research/research-conventions.md +248 -0
  6. package/content/knowledge/research/research-dev-environment.md +303 -0
  7. package/content/knowledge/research/research-experiment-loop.md +429 -0
  8. package/content/knowledge/research/research-experiment-tracking.md +336 -0
  9. package/content/knowledge/research/research-ml-architecture-search.md +383 -0
  10. package/content/knowledge/research/research-ml-evaluation.md +407 -0
  11. package/content/knowledge/research/research-ml-experiment-tracking.md +466 -0
  12. package/content/knowledge/research/research-ml-training-patterns.md +413 -0
  13. package/content/knowledge/research/research-observability.md +395 -0
  14. package/content/knowledge/research/research-overfitting-prevention.md +306 -0
  15. package/content/knowledge/research/research-project-structure.md +264 -0
  16. package/content/knowledge/research/research-quant-backtesting.md +326 -0
  17. package/content/knowledge/research/research-quant-market-data.md +366 -0
  18. package/content/knowledge/research/research-quant-metrics.md +335 -0
  19. package/content/knowledge/research/research-quant-requirements.md +223 -0
  20. package/content/knowledge/research/research-quant-risk.md +469 -0
  21. package/content/knowledge/research/research-quant-strategy-patterns.md +412 -0
  22. package/content/knowledge/research/research-requirements.md +201 -0
  23. package/content/knowledge/research/research-security.md +374 -0
  24. package/content/knowledge/research/research-sim-compute-management.md +538 -0
  25. package/content/knowledge/research/research-sim-engine-patterns.md +448 -0
  26. package/content/knowledge/research/research-sim-parameter-spaces.md +425 -0
  27. package/content/knowledge/research/research-sim-validation.md +456 -0
  28. package/content/knowledge/research/research-testing.md +334 -0
  29. package/content/methodology/research-ml-research.yml +23 -0
  30. package/content/methodology/research-overlay.yml +65 -0
  31. package/content/methodology/research-quant-finance.yml +29 -0
  32. package/content/methodology/research-simulation.yml +23 -0
  33. package/content/tools/post-implementation-review.md +36 -7
  34. package/content/tools/review-code.md +33 -8
  35. package/content/tools/review-pr.md +79 -95
  36. package/dist/cli/commands/adopt.d.ts.map +1 -1
  37. package/dist/cli/commands/adopt.js +22 -1
  38. package/dist/cli/commands/adopt.js.map +1 -1
  39. package/dist/cli/commands/adopt.serialization.test.js +41 -0
  40. package/dist/cli/commands/adopt.serialization.test.js.map +1 -1
  41. package/dist/cli/commands/init.d.ts +4 -0
  42. package/dist/cli/commands/init.d.ts.map +1 -1
  43. package/dist/cli/commands/init.js +32 -2
  44. package/dist/cli/commands/init.js.map +1 -1
  45. package/dist/cli/init-flag-families.d.ts +6 -1
  46. package/dist/cli/init-flag-families.d.ts.map +1 -1
  47. package/dist/cli/init-flag-families.js +32 -1
  48. package/dist/cli/init-flag-families.js.map +1 -1
  49. package/dist/cli/init-flag-families.test.js +47 -0
  50. package/dist/cli/init-flag-families.test.js.map +1 -1
  51. package/dist/config/schema.d.ts +272 -16
  52. package/dist/config/schema.d.ts.map +1 -1
  53. package/dist/config/schema.js +25 -1
  54. package/dist/config/schema.js.map +1 -1
  55. package/dist/config/schema.test.js +103 -3
  56. package/dist/config/schema.test.js.map +1 -1
  57. package/dist/core/assembly/overlay-loader.d.ts +12 -0
  58. package/dist/core/assembly/overlay-loader.d.ts.map +1 -1
  59. package/dist/core/assembly/overlay-loader.js +30 -0
  60. package/dist/core/assembly/overlay-loader.js.map +1 -1
  61. package/dist/core/assembly/overlay-loader.test.js +66 -1
  62. package/dist/core/assembly/overlay-loader.test.js.map +1 -1
  63. package/dist/core/assembly/overlay-state-resolver.d.ts.map +1 -1
  64. package/dist/core/assembly/overlay-state-resolver.js +48 -19
  65. package/dist/core/assembly/overlay-state-resolver.js.map +1 -1
  66. package/dist/core/assembly/overlay-state-resolver.test.js +80 -0
  67. package/dist/core/assembly/overlay-state-resolver.test.js.map +1 -1
  68. package/dist/e2e/project-type-overlays.test.js +119 -0
  69. package/dist/e2e/project-type-overlays.test.js.map +1 -1
  70. package/dist/project/adopt.d.ts.map +1 -1
  71. package/dist/project/adopt.js +3 -1
  72. package/dist/project/adopt.js.map +1 -1
  73. package/dist/project/detectors/disambiguate.js +1 -1
  74. package/dist/project/detectors/disambiguate.js.map +1 -1
  75. package/dist/project/detectors/index.d.ts.map +1 -1
  76. package/dist/project/detectors/index.js +2 -1
  77. package/dist/project/detectors/index.js.map +1 -1
  78. package/dist/project/detectors/ml.d.ts.map +1 -1
  79. package/dist/project/detectors/ml.js +2 -6
  80. package/dist/project/detectors/ml.js.map +1 -1
  81. package/dist/project/detectors/research.d.ts +4 -0
  82. package/dist/project/detectors/research.d.ts.map +1 -0
  83. package/dist/project/detectors/research.js +141 -0
  84. package/dist/project/detectors/research.js.map +1 -0
  85. package/dist/project/detectors/research.test.d.ts +2 -0
  86. package/dist/project/detectors/research.test.d.ts.map +1 -0
  87. package/dist/project/detectors/research.test.js +235 -0
  88. package/dist/project/detectors/research.test.js.map +1 -0
  89. package/dist/project/detectors/shared-signals.d.ts +3 -0
  90. package/dist/project/detectors/shared-signals.d.ts.map +1 -0
  91. package/dist/project/detectors/shared-signals.js +9 -0
  92. package/dist/project/detectors/shared-signals.js.map +1 -0
  93. package/dist/project/detectors/types.d.ts +6 -2
  94. package/dist/project/detectors/types.d.ts.map +1 -1
  95. package/dist/project/detectors/types.js.map +1 -1
  96. package/dist/types/config.d.ts +7 -1
  97. package/dist/types/config.d.ts.map +1 -1
  98. package/dist/wizard/copy/core.d.ts.map +1 -1
  99. package/dist/wizard/copy/core.js +4 -0
  100. package/dist/wizard/copy/core.js.map +1 -1
  101. package/dist/wizard/copy/index.d.ts.map +1 -1
  102. package/dist/wizard/copy/index.js +2 -0
  103. package/dist/wizard/copy/index.js.map +1 -1
  104. package/dist/wizard/copy/research.d.ts +3 -0
  105. package/dist/wizard/copy/research.d.ts.map +1 -0
  106. package/dist/wizard/copy/research.js +27 -0
  107. package/dist/wizard/copy/research.js.map +1 -0
  108. package/dist/wizard/copy/types.d.ts +5 -1
  109. package/dist/wizard/copy/types.d.ts.map +1 -1
  110. package/dist/wizard/flags.d.ts +7 -1
  111. package/dist/wizard/flags.d.ts.map +1 -1
  112. package/dist/wizard/questions.d.ts +4 -2
  113. package/dist/wizard/questions.d.ts.map +1 -1
  114. package/dist/wizard/questions.js +27 -1
  115. package/dist/wizard/questions.js.map +1 -1
  116. package/dist/wizard/questions.test.js +51 -0
  117. package/dist/wizard/questions.test.js.map +1 -1
  118. package/dist/wizard/wizard.d.ts +3 -2
  119. package/dist/wizard/wizard.d.ts.map +1 -1
  120. package/dist/wizard/wizard.js +3 -1
  121. package/dist/wizard/wizard.js.map +1 -1
  122. package/package.json +1 -1
package/content/knowledge/research/research-architecture.md
@@ -0,0 +1,385 @@
---
name: research-architecture
description: Experiment runner architecture including pluggable experiment and evaluation interfaces, state management patterns, and result persistence
topics: [research, architecture, experiment-runner, state-management, interfaces, persistence]
---

The experiment runner is the central architectural component of a research project. It orchestrates the loop of loading configuration, executing experiments, evaluating results, and deciding whether to keep or discard each run. The runner must be completely decoupled from the specific experiment logic (strategies, models, parameter spaces) so that it can drive any experiment without modification. This separation is what makes autonomous iteration possible -- the agent modifies experiment code while the runner infrastructure remains stable.

## Summary

Build the experiment runner around three pluggable interfaces: Strategy (executes an experiment given config), Evaluator (computes metrics from raw results), and Tracker (records results for comparison). Use a state manager to track the current best result, iteration history, and budget consumption. Persist all state to disk so that the runner can resume after crashes. The runner never imports specific strategy code -- it discovers strategies via a registry or config-specified entry point.

## Deep Guidance

### Core Architecture

```
                   ┌──────────────────────┐
                   │   ExperimentRunner   │
                   │  ┌────────────────┐  │
  Config ─────────►│  │ State Manager  │  │
                   │  │ (best, history)│  │
                   │  └───────┬────────┘  │
                   │          │           │
                   │  ┌───────▼────────┐  │
                   │  │ Budget Checker │  │
                   │  └───────┬────────┘  │
                   │          │           │
                   │  ┌───────▼────────┐  │
                   │  │    Strategy    │◄─┼── Registry lookup
                   │  │  (pluggable)   │  │
                   │  └───────┬────────┘  │
                   │          │           │
                   │  ┌───────▼────────┐  │
                   │  │   Evaluator    │  │
                   │  │  (pluggable)   │  │
                   │  └───────┬────────┘  │
                   │          │           │
                   │  ┌───────▼────────┐  │
                   │  │    Tracker     │  │
                   │  │  (pluggable)   │  │
                   │  └────────────────┘  │
                   └──────────────────────┘
```

### Pluggable Interface Design

The three core interfaces use Python's `Protocol` type for structural subtyping. This means strategies do not need to inherit from a base class -- they only need to implement the required methods:

```python
# src/interfaces.py
from typing import Protocol, Any, runtime_checkable


@runtime_checkable
class Strategy(Protocol):
    """Interface for experiment execution strategies."""

    @property
    def name(self) -> str:
        """Unique identifier for this strategy."""
        ...

    def execute(self, config: dict[str, Any]) -> dict[str, Any]:
        """
        Execute the experiment and return raw results.

        Args:
            config: Experiment configuration dict.

        Returns:
            Raw results dict. Structure is strategy-specific but must
            contain enough information for the Evaluator to compute metrics.
        """
        ...


@runtime_checkable
class Evaluator(Protocol):
    """Interface for result evaluation."""

    def evaluate(self, raw_results: dict[str, Any]) -> dict[str, float]:
        """
        Compute metrics from raw experiment results.

        Args:
            raw_results: Output from Strategy.execute().

        Returns:
            Dict mapping metric names to float values.
        """
        ...

    def is_improvement(self, current: dict[str, float],
                       best: dict[str, float]) -> bool:
        """
        Determine if current results improve on the best so far.

        Args:
            current: Metrics from the current run.
            best: Metrics from the best run so far.

        Returns:
            True if current should replace best.
        """
        ...


@runtime_checkable
class Tracker(Protocol):
    """Interface for experiment result tracking."""

    def log_run(self, run_id: str, config: dict, metrics: dict[str, float],
                artifacts: dict[str, Any] | None = None) -> None:
        """Record a single experiment run."""
        ...

    def get_history(self) -> list[dict]:
        """Return all recorded runs."""
        ...
```
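
To show what structural subtyping buys in practice, here is a minimal, self-contained sketch. It re-declares a trimmed copy of the `Evaluator` protocol so that it runs without `src/interfaces.py`, then defines a hypothetical `SharpeEvaluator` that never inherits from it yet still passes an `isinstance` check. The `equity_curve` input key, the metric names `sharpe` and `max_drawdown`, the 252 periods per year and the 20% drawdown guardrail are all illustrative assumptions. Note that `@runtime_checkable` only checks that the methods exist. It does not check their signatures.

```python
import math
import statistics
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Evaluator(Protocol):
    """Trimmed copy of the Evaluator protocol, so this sketch runs standalone."""

    def evaluate(self, raw_results: dict[str, Any]) -> dict[str, float]: ...

    def is_improvement(self, current: dict[str, float],
                       best: dict[str, float]) -> bool: ...


class SharpeEvaluator:
    """Hypothetical evaluator: no base class, just the required methods."""

    def __init__(self, max_drawdown_limit: float = 0.20, periods_per_year: int = 252):
        self.max_drawdown_limit = max_drawdown_limit
        self.periods_per_year = periods_per_year

    def evaluate(self, raw_results: dict[str, Any]) -> dict[str, float]:
        equity = raw_results["equity_curve"]
        # Simple per-period returns from consecutive equity values
        returns = [b / a - 1.0 for a, b in zip(equity, equity[1:])]
        stdev = statistics.stdev(returns) if len(returns) > 1 else 0.0
        # Annualized Sharpe ratio (risk-free rate assumed to be zero)
        sharpe = (statistics.mean(returns) / stdev * math.sqrt(self.periods_per_year)
                  if stdev > 0 else 0.0)
        # Max drawdown: largest fractional drop from a running peak
        peak, max_dd = equity[0], 0.0
        for value in equity:
            peak = max(peak, value)
            max_dd = max(max_dd, 1.0 - value / peak)
        return {"sharpe": sharpe, "max_drawdown": max_dd}

    def is_improvement(self, current: dict[str, float],
                       best: dict[str, float]) -> bool:
        # Higher Sharpe wins, but only while drawdown stays within the guardrail
        return (current["max_drawdown"] <= self.max_drawdown_limit
                and current["sharpe"] > best["sharpe"])


evaluator = SharpeEvaluator()
print(isinstance(evaluator, Evaluator))  # True: structural match, no inheritance
```

Putting the guardrail inside `is_improvement` keeps the keep/discard policy next to the metric definitions. The runner stays policy-free.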

### Strategy Registry

The registry pattern allows the runner to instantiate strategies by name without importing them directly:

```python
# src/strategies/registry.py
from typing import Type
from src.interfaces import Strategy


class StrategyRegistry:
    """Registry for experiment strategy classes."""

    # Shared class-level mapping: strategy name -> strategy class
    _registry: dict[str, Type[Strategy]] = {}

    @classmethod
    def register(cls, name: str):
        """Decorator to register a strategy class."""
        def decorator(strategy_cls: Type[Strategy]):
            if name in cls._registry:
                raise ValueError(f"Strategy '{name}' already registered")
            cls._registry[name] = strategy_cls
            return strategy_cls
        return decorator

    @classmethod
    def get(cls, name: str) -> Type[Strategy]:
        """Look up a strategy by name."""
        if name not in cls._registry:
            available = ", ".join(sorted(cls._registry.keys()))
            raise KeyError(
                f"Strategy '{name}' not found. Available: {available}"
            )
        return cls._registry[name]

    @classmethod
    def list_strategies(cls) -> list[str]:
        """Return the names of all registered strategies."""
        return sorted(cls._registry.keys())


# Usage in a strategy file:
# src/strategies/momentum.py
from src.strategies.registry import StrategyRegistry


@StrategyRegistry.register("momentum_crossover")
class MomentumCrossover:
    name = "momentum_crossover"

    def __init__(self, lookback: int = 20, **kwargs):
        self.lookback = lookback

    def execute(self, config: dict) -> dict:
        # ... run the momentum crossover strategy, filling in these results ...
        trades: list[dict] = []
        equity: list[float] = []
        return {"trades": trades, "equity_curve": equity}
```
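
A class is registered only when the module containing its `@StrategyRegistry.register(...)` decorator is imported. The runner therefore still needs some way to import every strategy module once at startup, without naming any of them. One possible approach is sketched below. It assumes strategies live in an importable package such as `src.strategies`, and it walks that package with `pkgutil.iter_modules` and imports each submodule via `importlib`. The function name `import_all_strategies` is hypothetical.

```python
import importlib
import pkgutil
from types import ModuleType


def import_all_strategies(package_name: str = "src.strategies") -> list[str]:
    """Import every direct submodule of `package_name` so decorators run.

    Returns the fully qualified names of the imported modules, sorted.
    """
    package: ModuleType = importlib.import_module(package_name)
    imported = []
    # __path__ exists only on packages; iter_modules lists direct children
    for module_info in pkgutil.iter_modules(package.__path__):
        full_name = f"{package_name}.{module_info.name}"
        importlib.import_module(full_name)  # side effect: registration
        imported.append(full_name)
    return sorted(imported)
```

You could call this once before `StrategyRegistry.get(...)`, for example at the top of `ExperimentRunner.__init__`. The runner's source would then contain no import that names a specific strategy.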

### State Management

The state manager tracks the experiment loop's progress and enables resume-after-crash:

```python
# src/runner/state.py
import json
from pathlib import Path
from dataclasses import dataclass, field, asdict
from typing import Any


@dataclass
class RunRecord:
    """Record of a single experiment run."""
    run_id: str
    config: dict[str, Any]
    metrics: dict[str, float]
    is_best: bool = False
    decision: str = ""  # "keep" or "discard"
    reason: str = ""


@dataclass
class ExperimentState:
    """Persistent state for the experiment loop."""
    experiment_id: str
    total_runs: int = 0
    best_run: RunRecord | None = None
    history: list[RunRecord] = field(default_factory=list)
    runs_since_improvement: int = 0

    def record_run(self, run: RunRecord) -> None:
        """Record a completed run and update state."""
        self.total_runs += 1
        self.history.append(run)

        if run.is_best:
            self.best_run = run
            self.runs_since_improvement = 0
        else:
            self.runs_since_improvement += 1

    def save(self, path: Path) -> None:
        """Persist state to disk for crash recovery."""
        path.parent.mkdir(parents=True, exist_ok=True)
        with open(path, "w") as f:
            json.dump(asdict(self), f, indent=2, default=str)

    @classmethod
    def load(cls, path: Path) -> "ExperimentState":
        """Load state from disk. Returns empty state if file missing."""
        if not path.exists():
            return cls(experiment_id="unknown")
        with open(path) as f:
            data = json.load(f)
        state = cls(experiment_id=data["experiment_id"])
        state.total_runs = data["total_runs"]
        state.runs_since_improvement = data["runs_since_improvement"]
        state.history = [RunRecord(**r) for r in data["history"]]
        if data["best_run"]:
            state.best_run = RunRecord(**data["best_run"])
        return state
```
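
`ExperimentState.save` writes `state.json` in place. A crash partway through `json.dump` can therefore leave a truncated file that `load` then fails to parse, which is the very situation the state file is meant to survive. A common remedy is to write to a temporary file in the same directory and then swap it into place with `os.replace`, sketched below as a hypothetical helper `atomic_write_json`. Python documents that swap as an atomic operation on POSIX systems when both paths are on the same filesystem.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any


def atomic_write_json(path: Path, data: Any) -> None:
    """Write JSON so readers see either the old file or the complete new one."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Temp file in the same directory keeps the final rename on one filesystem
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f, indent=2, default=str)
            f.flush()
            os.fsync(f.fileno())  # push bytes to disk before the swap
        os.replace(tmp_name, path)  # rename over the previous state file
    except BaseException:
        # Remove the partial temp file; the previous state file is untouched
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

With this helper, the body of `ExperimentState.save` could reduce to `atomic_write_json(path, asdict(self))`.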

### The Experiment Runner

The runner ties the interfaces together:

```python
# src/runner/experiment_runner.py
import logging
from pathlib import Path
from src.interfaces import Strategy, Evaluator, Tracker
from src.runner.state import ExperimentState, RunRecord
from src.runner.budget import IterationBudget
from src.config import load_config
from src.seed import set_seed
from src.strategies.registry import StrategyRegistry

logger = logging.getLogger(__name__)


class ExperimentRunner:
    def __init__(self, config_path: str):
        self.config = load_config(config_path)
        self.experiment_id = Path(config_path).stem
        self.results_dir = Path(self.config["logging"]["results_dir"]) / self.experiment_id

        # Load pluggable components. _build_evaluator() and _build_tracker()
        # are project-specific factory methods (not shown) that construct the
        # configured Evaluator and Tracker implementations.
        strategy_cls = StrategyRegistry.get(self.config["strategy"]["type"])
        self.strategy: Strategy = strategy_cls(**self.config["strategy"].get("params", {}))
        self.evaluator: Evaluator = self._build_evaluator()
        self.tracker: Tracker = self._build_tracker()
        self.budget = IterationBudget(**self.config.get("budget", {}))

        # Load or initialize state (resumes from state.json if present)
        self.state_path = self.results_dir / "state.json"
        self.state = ExperimentState.load(self.state_path)
        self.state.experiment_id = self.experiment_id

    def run_loop(self) -> ExperimentState:
        """Run the full experiment loop until budget exhaustion or convergence."""
        logger.info("Starting experiment %s (resuming from run %d)",
                    self.experiment_id, self.state.total_runs)

        while True:
            # Check budget
            exhausted, reason = self.budget.is_exhausted(
                runs=self.state.total_runs,
                runs_since_improvement=self.state.runs_since_improvement,
            )
            if exhausted:
                logger.info("Stopping: %s", reason)
                break

            # Execute one iteration with a distinct, reproducible seed
            run_id = f"run-{self.state.total_runs + 1:04d}"
            set_seed(self.config["experiment"]["seed"] + self.state.total_runs)

            try:
                raw_results = self.strategy.execute(self.config)
                metrics = self.evaluator.evaluate(raw_results)
            except Exception as e:
                logger.exception("Run %s failed", run_id)
                # Record the failure so it consumes budget; a bare `continue`
                # would retry forever if the failure is deterministic.
                self.state.record_run(RunRecord(
                    run_id=run_id,
                    config=self.config,
                    metrics={},
                    decision="error",
                    reason=str(e),
                ))
                self.state.save(self.state_path)
                continue

            # Evaluate improvement
            is_best = (
                self.state.best_run is None
                or self.evaluator.is_improvement(metrics, self.state.best_run.metrics)
            )
            decision = "keep" if is_best else "discard"

            run = RunRecord(
                run_id=run_id,
                config=self.config,
                metrics=metrics,
                is_best=is_best,
                decision=decision,
                reason="New best" if is_best else "No improvement",
            )

            # Record and persist
            self.state.record_run(run)
            self.tracker.log_run(run_id, self.config, metrics)
            self.state.save(self.state_path)

            logger.info(
                "Run %s: %s (metrics: %s)",
                run_id, decision,
                {k: f"{v:.4f}" for k, v in metrics.items()},
            )

        return self.state
```

### Result Persistence

Results are persisted at two levels:

1. **Per-run**: Each run's config, metrics, and artifacts are saved to `results/{experiment_id}/{run_id}/`.
2. **Experiment state**: The full experiment state (history, best run, budget consumption) is saved to `results/{experiment_id}/state.json` after every run.

```python
# src/tracking/file_tracker.py
import json
from pathlib import Path
from typing import Any


class FileTracker:
    """Simple file-based experiment tracker.

    Satisfies the Tracker protocol structurally; no inheritance needed.
    """

    def __init__(self, results_dir: str):
        self.results_dir = Path(results_dir)
        self.results_dir.mkdir(parents=True, exist_ok=True)

    def log_run(self, run_id: str, config: dict, metrics: dict[str, float],
                artifacts: dict[str, Any] | None = None) -> None:
        run_dir = self.results_dir / run_id
        run_dir.mkdir(parents=True, exist_ok=True)

        with open(run_dir / "config.json", "w") as f:
            json.dump(config, f, indent=2, default=str)
        with open(run_dir / "metrics.json", "w") as f:
            json.dump(metrics, f, indent=2)

        if artifacts:
            artifact_dir = run_dir / "artifacts"
            artifact_dir.mkdir(exist_ok=True)
            # Every artifact is serialized as JSON regardless of its name;
            # binary artifacts (checkpoints, plots) need a different writer.
            for name, data in artifacts.items():
                with open(artifact_dir / name, "w") as f:
                    json.dump(data, f, indent=2, default=str)

    def get_history(self) -> list[dict]:
        """Return runs that have a metrics.json, ordered by run directory name."""
        runs = []
        for run_dir in sorted(self.results_dir.iterdir()):
            if run_dir.is_dir() and (run_dir / "metrics.json").exists():
                with open(run_dir / "metrics.json") as f:
                    metrics = json.load(f)
                runs.append({"run_id": run_dir.name, "metrics": metrics})
        return runs
```
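
`get_history()` returns plain dicts of the form `{"run_id": ..., "metrics": {...}}`, so comparing runs needs no tracker-specific code. Below is a small, hypothetical helper that ranks recorded runs by one metric. The metric name and whether higher is better are supplied by the caller, and the sample history is toy input only.

```python
from typing import Any


def leaderboard(history: list[dict[str, Any]], metric: str,
                higher_is_better: bool = True, top_k: int = 5) -> list[tuple[str, float]]:
    """Rank runs by `metric`, skipping runs that did not record it."""
    scored = [
        (run["run_id"], run["metrics"][metric])
        for run in history
        if metric in run.get("metrics", {})
    ]
    scored.sort(key=lambda item: item[1], reverse=higher_is_better)
    return scored[:top_k]


history = [
    {"run_id": "run-0001", "metrics": {"sharpe": 0.8}},
    {"run_id": "run-0002", "metrics": {"sharpe": 1.4}},
    {"run_id": "run-0003", "metrics": {}},  # toy run with no recorded metric
]
print(leaderboard(history, "sharpe"))  # [('run-0002', 1.4), ('run-0001', 0.8)]
```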

### Architecture Decision: When to Use Each Driver

| Driver | Architecture Pattern | Use When |
|--------|---------------------|----------|
| Code-driven | Git state machine, agent modifies source | Exploring algorithmic variations, strategy development |
| Config-driven | Fixed runner, parameterised configs | Hyperparameter sweeps, systematic parameter search |
| API-driven | Client wrapper, parameter serialization | External backtest engines, cloud simulation APIs |
| Notebook-driven | Papermill execution, cell-level tracking | Exploratory research, visualization-heavy analysis |

The runner architecture remains the same across all drivers. What changes is the Strategy implementation: code-driven strategies contain the algorithm directly, config-driven strategies delegate to a parameterised engine, API-driven strategies wrap HTTP calls, and notebook-driven strategies use papermill to execute notebooks.
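
To make the config-driven and API-driven cases concrete, here is a sketch of two Strategy implementations that plug into the same runner unchanged. Both take their parameters as constructor keyword arguments, mirroring the runner's `strategy_cls(**params)` call. Several pieces are hypothetical stand-ins so that the sketch runs without a real backtest engine or network: `toy_engine` stands in for a parameterised engine, and `fake_post` stands in for an HTTP client. The `endpoint` parameter is likewise hypothetical. A real API client would replace `fake_post` with its own HTTP call.

```python
import json
from typing import Any, Callable


def toy_engine(lookback: int = 20) -> dict[str, Any]:
    """Stand-in for a parameterised simulation or backtest engine."""
    return {"equity_curve": [100.0, 100.0 + lookback]}


def fake_post(url: str, body: bytes) -> bytes:
    """Stand-in for an HTTP POST: echoes the request instead of calling a server."""
    return json.dumps({"url": url, "params": json.loads(body)}).encode()


class ConfigDrivenStrategy:
    """Config-driven: experiments differ only in the params given to a fixed engine."""
    name = "config_driven"

    def __init__(self, **params: Any):
        self.params = params  # e.g. config["strategy"]["params"], passed by the runner

    def execute(self, config: dict[str, Any]) -> dict[str, Any]:
        return toy_engine(**self.params)


class ApiDrivenStrategy:
    """API-driven: serializes params and sends them to an external service."""
    name = "api_driven"

    def __init__(self, endpoint: str,
                 transport: Callable[[str, bytes], bytes] = fake_post, **params: Any):
        self.endpoint = endpoint      # hypothetical service URL from config
        self.transport = transport    # injected so tests need no network
        self.params = params

    def execute(self, config: dict[str, Any]) -> dict[str, Any]:
        body = json.dumps(self.params).encode()
        return json.loads(self.transport(self.endpoint, body))


print(ConfigDrivenStrategy(lookback=5).execute({}))
# {'equity_curve': [100.0, 105.0]}
```

Injecting the transport (rather than calling an HTTP library inside `execute`) is a design choice that keeps the API-driven strategy testable offline. The runner never needs to know which driver it is talking to.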
package/content/knowledge/research/research-conventions.md
@@ -0,0 +1,248 @@
---
name: research-conventions
description: Coding conventions for research projects including experiment branching, result naming, config management, and reproducibility standards
topics: [research, conventions, git, branching, reproducibility, config-management]
---

Research code has a unique lifecycle: most code is written to be tried and discarded. A trading strategy that underperforms is reverted. A hyperparameter sweep that converges to a local minimum is abandoned. The conventions must make this try-and-discard cycle fast and safe while preserving a complete audit trail of what was tried and why it was kept or discarded.

## Summary

Use git branches as the state machine for experiment lifecycle (try, evaluate, keep/revert). Name branches, results, and configs with a consistent scheme that encodes the experiment ID, hypothesis, and timestamp. Pin every dependency and seed every random source for reproducibility. Separate experiment code (disposable) from infrastructure code (durable) in the repository structure. Use structured config files (YAML/TOML) instead of command-line argument sprawl.

## Deep Guidance

### Git as Experiment State Machine

The experiment loop uses git as its state management layer. Each experiment run is a branch. The decision to keep or discard is a merge or branch deletion:

```
main (stable baseline)
  |
  +-- exp/001-momentum-lookback-20   (try → evaluate → keep → merge)
  |
  +-- exp/002-momentum-lookback-10   (try → evaluate → discard → delete)
  |
  +-- exp/003-mean-revert-rsi        (try → evaluate → keep → merge)
```

**Branch naming convention**: `exp/{NNN}-{short-description}`
- `NNN`: Zero-padded sequential experiment number
- `short-description`: Kebab-case summary of what is being tested
- Examples: `exp/001-adaptive-lookback`, `exp/042-ensemble-top3`
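
The naming convention can be applied mechanically. Below is a minimal sketch with hypothetical `experiment_slug` and `branch_name` helpers. The slug rule they assume is one reasonable reading of "kebab-case": lowercase ASCII letters and digits, with everything else collapsed into single hyphens.

```python
import re


def experiment_slug(description: str) -> str:
    """Kebab-case: lowercase, runs of non-alphanumerics collapsed to one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")


def branch_name(number: int, description: str) -> str:
    """exp/{NNN}-{short-description}, with NNN zero-padded to three digits."""
    return f"exp/{number:03d}-{experiment_slug(description)}"


print(branch_name(15, "RSI threshold sweep"))  # exp/015-rsi-threshold-sweep
```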

**Workflow**:
```bash
# Start a new experiment
git checkout main
git checkout -b exp/015-rsi-threshold-sweep

# ... agent modifies code, runs experiment ...

# Experiment succeeded: merge to main
git checkout main
git merge --no-ff exp/015-rsi-threshold-sweep -m "exp/015: RSI threshold 30/70 Sharpe=1.6"

# Experiment failed: leave the branch first (git refuses to delete
# the currently checked-out branch), then discard it
git checkout main
git branch -D exp/015-rsi-threshold-sweep
# Alternatively, keep it for reference by tagging it before deleting:
git tag archive/exp/015-rsi-threshold-sweep exp/015-rsi-threshold-sweep
git branch -D exp/015-rsi-threshold-sweep
```

**Commit message convention for experiments**:
```
exp/015: RSI threshold sweep

Hypothesis: RSI oversold/overbought thresholds of 30/70 will outperform
the default 20/80 on 2020-2023 equity data.

Result: Sharpe=1.6, MaxDD=11%, 247 trades
Decision: KEEP (new best by Sharpe, DD within guardrail)
```

### Result Naming

Every experiment run produces artifacts. Use a consistent naming scheme:

```
results/
  exp-001/
    config.yml            # Exact config used for this run
    metrics.json          # Final metrics
    metrics_history.csv   # Per-iteration metrics
    artifacts/            # Model checkpoints, plots, etc.
    log.txt               # Full stdout/stderr
  exp-002/
    ...
```

**File naming rules**:
- Directories: `exp-{NNN}` matching the git branch number
- Timestamps in filenames when multiple runs share an experiment: `exp-001-20240315T143022`
- Never use spaces or special characters in result paths
- Metrics files are always JSON (machine-readable) or CSV (tabular)

### Config Management

Research projects accumulate dozens of configuration parameters. Manage them with structured config files, not argument sprawl:

```yaml
# configs/base.yml -- shared defaults
experiment:
  seed: 42
  num_runs: 100
  patience: 20

data:
  source: "data/prices.parquet"
  train_start: "2015-01-01"
  train_end: "2019-12-31"
  test_start: "2020-01-01"
  test_end: "2023-12-31"

logging:
  level: INFO
  results_dir: "results"
```

```yaml
# configs/exp-015-rsi-sweep.yml -- experiment-specific overrides
_base_: base.yml

strategy:
  type: "rsi_threshold"
  params:
    overbought: 70
    oversold: 30
    lookback: 14

experiment:
  num_runs: 200  # Override base
```

**Config loading pattern** (merge base + override):

```python
# src/config.py
import yaml  # PyYAML
from pathlib import Path
from typing import Any


def load_config(config_path: str) -> dict[str, Any]:
    """Load a YAML config, resolving `_base_` inheritance recursively."""
    with open(config_path) as f:
        # safe_load returns None for an empty file; treat that as an empty config
        config = yaml.safe_load(f) or {}

    # Resolve base config inheritance (base path is relative to this file)
    if "_base_" in config:
        base_path = Path(config_path).parent / config.pop("_base_")
        base = load_config(str(base_path))
        return deep_merge(base, config)

    return config


def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge override into a copy of base (nested dicts merge, other values replace)."""
    result = base.copy()
    for key, value in override.items():
        if key in result and isinstance(result[key], dict) and isinstance(value, dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = value
    return result
```
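
After the base and override are merged, the resulting dict is the complete definition of the run. One way to tie results back to that definition is sketched here as a hypothetical `config_fingerprint` helper. It hashes a canonical JSON encoding (sorted keys, fixed separators), so the same merged config always yields the same short ID, regardless of key order in the YAML files.

```python
import hashlib
import json
from typing import Any


def config_fingerprint(config: dict[str, Any], length: int = 12) -> str:
    """Stable short hash of a merged config; key order does not matter."""
    # sort_keys applies recursively, so nested dicts are canonicalized too
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


a = {"experiment": {"seed": 42, "num_runs": 200}, "strategy": {"type": "rsi_threshold"}}
b = {"strategy": {"type": "rsi_threshold"}, "experiment": {"num_runs": 200, "seed": 42}}
print(config_fingerprint(a) == config_fingerprint(b))  # True
```

If the fingerprint is stored next to each run's metrics file, a parameter that was changed by hand shows up as a different ID instead of silently blending into earlier results.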

### Reproducibility Standards

Every experiment must be reproducible. This means another researcher (or the same agent in a future session) can re-run the experiment and get the same result:

**Mandatory reproducibility checklist**:

1. **Seed everything**: Random number generators, data shuffling, model initialization.
   ```python
   import random
   import numpy as np


   def set_seed(seed: int) -> None:
       random.seed(seed)
       np.random.seed(seed)
       # Framework-specific seeding (skipped if PyTorch is not installed)
       try:
           import torch
           torch.manual_seed(seed)
           torch.cuda.manual_seed_all(seed)
           torch.backends.cudnn.deterministic = True
           torch.backends.cudnn.benchmark = False
       except ImportError:
           pass
   ```

2. **Pin dependencies**: Use exact versions, not ranges.
   ```
   # requirements.txt -- pinned
   numpy==1.26.4
   pandas==2.2.1
   scikit-learn==1.4.1
   optuna==3.5.0
   ```

3. **Record environment**: Capture the full environment at experiment start.
   ```python
   import platform
   import subprocess
   import sys


   def capture_environment() -> dict:
       """Snapshot interpreter, installed packages, and git state."""
       return {
           "python": platform.python_version(),
           "platform": platform.platform(),
           # Run pip through the current interpreter so the freeze matches this environment
           "pip_freeze": subprocess.check_output(
               [sys.executable, "-m", "pip", "freeze"], text=True
           ).splitlines(),
           "git_sha": subprocess.check_output(
               ["git", "rev-parse", "HEAD"], text=True
           ).strip(),
           # Uncommitted changes mean the SHA alone does not reproduce the code
           "git_dirty": bool(subprocess.check_output(
               ["git", "status", "--porcelain"], text=True
           ).strip()),
       }
   ```

4. **Never modify data in place**: Raw data is immutable. Processed data is derived and can be regenerated from raw data + processing code.

5. **Config-as-code**: The experiment config file (committed to git) must fully define the experiment. No "I changed that parameter manually."
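
The "never modify data in place" rule can be enforced rather than just stated. The sketch below assumes raw files live under a single directory such as `data/raw/`; that path, the manifest format and the function names are all hypothetical. The idea is to record a SHA-256 digest of every raw file once, then recheck the digests at experiment start, so an in-place edit is caught before it can contaminate a result.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large raw datasets need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(raw_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to raw_dir, POSIX style) to its SHA-256."""
    return {
        p.relative_to(raw_dir).as_posix(): file_sha256(p)
        for p in sorted(raw_dir.rglob("*"))
        if p.is_file()
    }


def changed_files(raw_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Files modified, added, or removed since the manifest was recorded."""
    current = build_manifest(raw_dir)
    keys = set(current) | set(manifest)
    return sorted(k for k in keys if current.get(k) != manifest.get(k))
```

Committing the manifest (for example as JSON) alongside the code, and having the runner refuse to start when `changed_files` returns anything, turns the rule into a check.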

### Code Organization Conventions

Separate durable infrastructure code from disposable experiment code:

| Category | Location | Lifecycle |
|----------|----------|-----------|
| Experiment runner | `src/runner/` | Durable (rarely changes) |
| Evaluation framework | `src/evaluation/` | Durable (rarely changes) |
| Data loading | `src/data/` | Durable (rarely changes) |
| Strategy/model code | `src/strategies/` or `src/models/` | Disposable (changes every experiment) |
| Config files | `configs/` | Per-experiment |
| Results | `results/` | Per-experiment output |

**Import hygiene**: Experiment code imports from infrastructure code, never the reverse. The runner does not import specific strategies -- it discovers them via a registry or config-specified entry point.
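
The import-hygiene rule can also be checked automatically, for example from the test suite. The sketch below uses the standard-library `ast` module to list the imports in an infrastructure file that reach into experiment code. The forbidden prefixes `src.strategies` and `src.models` and the allow-listed `src.strategies.registry` follow the layout in the table above, and the function names are hypothetical. The check only sees absolute imports written literally in the source, not dynamic `importlib` calls or relative imports.

```python
import ast


def _matches(name: str, prefixes: tuple[str, ...]) -> bool:
    """True if `name` equals a prefix or is a submodule/attribute of one."""
    return any(name == p or name.startswith(p + ".") for p in prefixes)


def forbidden_imports(source: str,
                      forbidden: tuple[str, ...] = ("src.strategies", "src.models"),
                      allowed: tuple[str, ...] = ("src.strategies.registry",)) -> list[str]:
    """Return imported names in `source` that infrastructure code must not use."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Check "X.Y" for "from X import Y", so importing the registry passes
            names = [f"{node.module}.{alias.name}" for alias in node.names]
        else:
            continue  # not an import, or a relative import
        found.extend(n for n in names if _matches(n, forbidden) and not _matches(n, allowed))
    return found


runner_source = """
from src.strategies.registry import StrategyRegistry
from src.strategies.momentum import MomentumCrossover
import src.models.transformer
"""
print(forbidden_imports(runner_source))
# ['src.strategies.momentum.MomentumCrossover', 'src.models.transformer']
```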

### Code Style for Research

- **Type hints everywhere**: Even in experiment code. Catches bugs early in a fast-iteration cycle.
- **Docstrings on public functions**: Especially for metric computation (document the formula).
- **No notebooks in git**: Notebooks are for interactive exploration. Convert to scripts before committing. If notebook-driven experiments are required, use `nbstripout` to strip outputs before committing.
- **Linting**: Use `ruff` for fast linting. Research code skips some style rules (such as unused imports during exploration) but enforces correctness rules (undefined names, syntax errors). Ruff is not a type checker, so catching type errors needs a separate tool such as mypy or pyright.

```toml
# pyproject.toml
[tool.ruff]
line-length = 100

[tool.ruff.lint]
select = ["E", "F", "W", "I"]  # pycodestyle errors, Pyflakes, pycodestyle warnings, isort
ignore = [
    "E501",  # Allow long lines in research code
    "F401",  # Tolerate unused imports during exploration
]

[tool.ruff.lint.isort]
known-first-party = ["src"]
```