@zigrivers/scaffold 3.14.0 → 3.16.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +50 -21
- package/content/knowledge/core/automated-review-tooling.md +21 -26
- package/content/knowledge/core/multi-model-review-dispatch.md +30 -55
- package/content/knowledge/research/research-architecture.md +385 -0
- package/content/knowledge/research/research-conventions.md +248 -0
- package/content/knowledge/research/research-dev-environment.md +303 -0
- package/content/knowledge/research/research-experiment-loop.md +429 -0
- package/content/knowledge/research/research-experiment-tracking.md +336 -0
- package/content/knowledge/research/research-ml-architecture-search.md +383 -0
- package/content/knowledge/research/research-ml-evaluation.md +407 -0
- package/content/knowledge/research/research-ml-experiment-tracking.md +466 -0
- package/content/knowledge/research/research-ml-training-patterns.md +413 -0
- package/content/knowledge/research/research-observability.md +395 -0
- package/content/knowledge/research/research-overfitting-prevention.md +306 -0
- package/content/knowledge/research/research-project-structure.md +264 -0
- package/content/knowledge/research/research-quant-backtesting.md +326 -0
- package/content/knowledge/research/research-quant-market-data.md +366 -0
- package/content/knowledge/research/research-quant-metrics.md +335 -0
- package/content/knowledge/research/research-quant-requirements.md +223 -0
- package/content/knowledge/research/research-quant-risk.md +469 -0
- package/content/knowledge/research/research-quant-strategy-patterns.md +412 -0
- package/content/knowledge/research/research-requirements.md +201 -0
- package/content/knowledge/research/research-security.md +374 -0
- package/content/knowledge/research/research-sim-compute-management.md +538 -0
- package/content/knowledge/research/research-sim-engine-patterns.md +448 -0
- package/content/knowledge/research/research-sim-parameter-spaces.md +425 -0
- package/content/knowledge/research/research-sim-validation.md +456 -0
- package/content/knowledge/research/research-testing.md +334 -0
- package/content/methodology/research-ml-research.yml +23 -0
- package/content/methodology/research-overlay.yml +65 -0
- package/content/methodology/research-quant-finance.yml +29 -0
- package/content/methodology/research-simulation.yml +23 -0
- package/content/tools/post-implementation-review.md +36 -7
- package/content/tools/review-code.md +33 -8
- package/content/tools/review-pr.md +79 -95
- package/dist/cli/commands/adopt.d.ts.map +1 -1
- package/dist/cli/commands/adopt.js +22 -1
- package/dist/cli/commands/adopt.js.map +1 -1
- package/dist/cli/commands/adopt.serialization.test.js +41 -0
- package/dist/cli/commands/adopt.serialization.test.js.map +1 -1
- package/dist/cli/commands/init.d.ts +4 -0
- package/dist/cli/commands/init.d.ts.map +1 -1
- package/dist/cli/commands/init.js +32 -2
- package/dist/cli/commands/init.js.map +1 -1
- package/dist/cli/init-flag-families.d.ts +6 -1
- package/dist/cli/init-flag-families.d.ts.map +1 -1
- package/dist/cli/init-flag-families.js +32 -1
- package/dist/cli/init-flag-families.js.map +1 -1
- package/dist/cli/init-flag-families.test.js +47 -0
- package/dist/cli/init-flag-families.test.js.map +1 -1
- package/dist/config/schema.d.ts +272 -16
- package/dist/config/schema.d.ts.map +1 -1
- package/dist/config/schema.js +25 -1
- package/dist/config/schema.js.map +1 -1
- package/dist/config/schema.test.js +103 -3
- package/dist/config/schema.test.js.map +1 -1
- package/dist/core/assembly/overlay-loader.d.ts +12 -0
- package/dist/core/assembly/overlay-loader.d.ts.map +1 -1
- package/dist/core/assembly/overlay-loader.js +30 -0
- package/dist/core/assembly/overlay-loader.js.map +1 -1
- package/dist/core/assembly/overlay-loader.test.js +66 -1
- package/dist/core/assembly/overlay-loader.test.js.map +1 -1
- package/dist/core/assembly/overlay-state-resolver.d.ts.map +1 -1
- package/dist/core/assembly/overlay-state-resolver.js +48 -19
- package/dist/core/assembly/overlay-state-resolver.js.map +1 -1
- package/dist/core/assembly/overlay-state-resolver.test.js +80 -0
- package/dist/core/assembly/overlay-state-resolver.test.js.map +1 -1
- package/dist/e2e/project-type-overlays.test.js +119 -0
- package/dist/e2e/project-type-overlays.test.js.map +1 -1
- package/dist/project/adopt.d.ts.map +1 -1
- package/dist/project/adopt.js +3 -1
- package/dist/project/adopt.js.map +1 -1
- package/dist/project/detectors/disambiguate.js +1 -1
- package/dist/project/detectors/disambiguate.js.map +1 -1
- package/dist/project/detectors/index.d.ts.map +1 -1
- package/dist/project/detectors/index.js +2 -1
- package/dist/project/detectors/index.js.map +1 -1
- package/dist/project/detectors/ml.d.ts.map +1 -1
- package/dist/project/detectors/ml.js +2 -6
- package/dist/project/detectors/ml.js.map +1 -1
- package/dist/project/detectors/research.d.ts +4 -0
- package/dist/project/detectors/research.d.ts.map +1 -0
- package/dist/project/detectors/research.js +141 -0
- package/dist/project/detectors/research.js.map +1 -0
- package/dist/project/detectors/research.test.d.ts +2 -0
- package/dist/project/detectors/research.test.d.ts.map +1 -0
- package/dist/project/detectors/research.test.js +235 -0
- package/dist/project/detectors/research.test.js.map +1 -0
- package/dist/project/detectors/shared-signals.d.ts +3 -0
- package/dist/project/detectors/shared-signals.d.ts.map +1 -0
- package/dist/project/detectors/shared-signals.js +9 -0
- package/dist/project/detectors/shared-signals.js.map +1 -0
- package/dist/project/detectors/types.d.ts +6 -2
- package/dist/project/detectors/types.d.ts.map +1 -1
- package/dist/project/detectors/types.js.map +1 -1
- package/dist/types/config.d.ts +7 -1
- package/dist/types/config.d.ts.map +1 -1
- package/dist/wizard/copy/core.d.ts.map +1 -1
- package/dist/wizard/copy/core.js +4 -0
- package/dist/wizard/copy/core.js.map +1 -1
- package/dist/wizard/copy/index.d.ts.map +1 -1
- package/dist/wizard/copy/index.js +2 -0
- package/dist/wizard/copy/index.js.map +1 -1
- package/dist/wizard/copy/research.d.ts +3 -0
- package/dist/wizard/copy/research.d.ts.map +1 -0
- package/dist/wizard/copy/research.js +27 -0
- package/dist/wizard/copy/research.js.map +1 -0
- package/dist/wizard/copy/types.d.ts +5 -1
- package/dist/wizard/copy/types.d.ts.map +1 -1
- package/dist/wizard/flags.d.ts +7 -1
- package/dist/wizard/flags.d.ts.map +1 -1
- package/dist/wizard/questions.d.ts +4 -2
- package/dist/wizard/questions.d.ts.map +1 -1
- package/dist/wizard/questions.js +27 -1
- package/dist/wizard/questions.js.map +1 -1
- package/dist/wizard/questions.test.js +51 -0
- package/dist/wizard/questions.test.js.map +1 -1
- package/dist/wizard/wizard.d.ts +3 -2
- package/dist/wizard/wizard.d.ts.map +1 -1
- package/dist/wizard/wizard.js +3 -1
- package/dist/wizard/wizard.js.map +1 -1
- package/package.json +1 -1
package/content/knowledge/research/research-architecture.md
@@ -0,0 +1,385 @@
---
name: research-architecture
description: Experiment runner architecture including pluggable experiment and evaluation interfaces, state management patterns, and result persistence
topics: [research, architecture, experiment-runner, state-management, interfaces, persistence]
---

The experiment runner is the central architectural component of a research project. It orchestrates the loop of loading configuration, executing experiments, evaluating results, and deciding whether to keep or discard each run. The runner must be completely decoupled from the specific experiment logic (strategies, models, parameter spaces) so that it can drive any experiment without modification. This separation is what makes autonomous iteration possible -- the agent modifies experiment code while the runner infrastructure remains stable.

## Summary

Build the experiment runner around three pluggable interfaces: Strategy (executes an experiment given config), Evaluator (computes metrics from raw results), and Tracker (records results for comparison). Use a state manager to track the current best result, iteration history, and budget consumption. Persist all state to disk so that the runner can resume after crashes. The runner never imports specific strategy code -- it discovers strategies via a registry or config-specified entry point.

## Deep Guidance

### Core Architecture

```
                  ┌──────────────────────┐
                  │   ExperimentRunner   │
                  │  ┌────────────────┐  │
Config ──────────►│  │ State Manager  │  │
                  │  │ (best, history)│  │
                  │  └───────┬────────┘  │
                  │          │           │
                  │  ┌───────▼────────┐  │
                  │  │ Budget Checker │  │
                  │  └───────┬────────┘  │
                  │          │           │
                  │  ┌───────▼────────┐  │
                  │  │    Strategy    │◄─┼── Registry lookup
                  │  │  (pluggable)   │  │
                  │  └───────┬────────┘  │
                  │          │           │
                  │  ┌───────▼────────┐  │
                  │  │   Evaluator    │  │
                  │  │  (pluggable)   │  │
                  │  └───────┬────────┘  │
                  │          │           │
                  │  ┌───────▼────────┐  │
                  │  │    Tracker     │  │
                  │  │  (pluggable)   │  │
                  │  └────────────────┘  │
                  └──────────────────────┘
```

### Pluggable Interface Design

The three core interfaces use Python's Protocol type for structural subtyping. This means strategies do not need to inherit from a base class -- they only need to implement the required methods:

```python
# src/interfaces.py
from typing import Protocol, Any, runtime_checkable

@runtime_checkable
class Strategy(Protocol):
    """Interface for experiment execution strategies."""

    @property
    def name(self) -> str:
        """Unique identifier for this strategy."""
        ...

    def execute(self, config: dict[str, Any]) -> dict[str, Any]:
        """
        Execute the experiment and return raw results.

        Args:
            config: Experiment configuration dict.

        Returns:
            Raw results dict. Structure is strategy-specific but must
            contain enough information for the Evaluator to compute metrics.
        """
        ...

@runtime_checkable
class Evaluator(Protocol):
    """Interface for result evaluation."""

    def evaluate(self, raw_results: dict[str, Any]) -> dict[str, float]:
        """
        Compute metrics from raw experiment results.

        Args:
            raw_results: Output from Strategy.execute().

        Returns:
            Dict mapping metric names to float values.
        """
        ...

    def is_improvement(self, current: dict[str, float],
                       best: dict[str, float]) -> bool:
        """
        Determine if current results improve on the best so far.

        Args:
            current: Metrics from the current run.
            best: Metrics from the best run so far.

        Returns:
            True if current should replace best.
        """
        ...

@runtime_checkable
class Tracker(Protocol):
    """Interface for experiment result tracking."""

    def log_run(self, run_id: str, config: dict, metrics: dict[str, float],
                artifacts: dict[str, Any] | None = None) -> None:
        """Record a single experiment run."""
        ...

    def get_history(self) -> list[dict]:
        """Return all recorded runs."""
        ...
```
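
To make the structural-subtyping point concrete, here is a minimal sketch of a class that satisfies an `Evaluator`-shaped protocol without inheriting from it. `MeanReturnEvaluator`, `NotAnEvaluator`, and the `"returns"` key are hypothetical names invented for this illustration, and the protocol is redeclared locally only so the snippet runs on its own.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Evaluator(Protocol):
    """Local copy of the Evaluator protocol so this sketch is self-contained."""

    def evaluate(self, raw_results: dict[str, Any]) -> dict[str, float]: ...

    def is_improvement(self, current: dict[str, float],
                       best: dict[str, float]) -> bool: ...


class MeanReturnEvaluator:
    """Hypothetical evaluator: scores a run by its mean per-period return."""

    def evaluate(self, raw_results: dict[str, Any]) -> dict[str, float]:
        returns = raw_results["returns"]  # assumed key, not defined by the source
        return {"mean_return": sum(returns) / len(returns)}

    def is_improvement(self, current: dict[str, float],
                       best: dict[str, float]) -> bool:
        return current["mean_return"] > best["mean_return"]


class NotAnEvaluator:
    """Implements only one of the two required methods."""

    def evaluate(self, raw_results: dict[str, Any]) -> dict[str, float]:
        return {}


evaluator = MeanReturnEvaluator()
conforms = isinstance(evaluator, Evaluator)       # True: both methods are present
rejected = isinstance(NotAnEvaluator(), Evaluator)  # False: is_improvement is missing
metrics = evaluator.evaluate({"returns": [0.01, 0.03, -0.01]})
```

Note that `isinstance` against a `runtime_checkable` protocol only checks that the member names exist; it does not validate signatures or return types, so a static type checker is still the stronger guarantee.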

### Strategy Registry

The registry pattern allows the runner to instantiate strategies by name without importing them directly:

```python
# src/strategies/registry.py
from typing import Type
from src.interfaces import Strategy

class StrategyRegistry:
    """Registry for experiment strategy classes."""

    _registry: dict[str, Type[Strategy]] = {}

    @classmethod
    def register(cls, name: str):
        """Decorator to register a strategy class."""
        def decorator(strategy_cls: Type[Strategy]):
            if name in cls._registry:
                raise ValueError(f"Strategy '{name}' already registered")
            cls._registry[name] = strategy_cls
            return strategy_cls
        return decorator

    @classmethod
    def get(cls, name: str) -> Type[Strategy]:
        """Look up a strategy by name."""
        if name not in cls._registry:
            available = ", ".join(sorted(cls._registry.keys()))
            raise KeyError(
                f"Strategy '{name}' not found. Available: {available}"
            )
        return cls._registry[name]

    @classmethod
    def list_strategies(cls) -> list[str]:
        return sorted(cls._registry.keys())


# Usage in a strategy file:
# src/strategies/momentum.py
from src.strategies.registry import StrategyRegistry

@StrategyRegistry.register("momentum_crossover")
class MomentumCrossover:
    name = "momentum_crossover"

    def __init__(self, lookback: int = 20, **kwargs):
        self.lookback = lookback

    def execute(self, config: dict) -> dict:
        # ... run the momentum crossover strategy ...
        trades: list = []  # placeholder: populated by the strategy logic above
        equity: list = []  # placeholder: equity curve produced by the run
        return {"trades": trades, "equity_curve": equity}
```
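
The summary also mentions a config-specified entry point as an alternative to the registry. The source does not show that path, so the following is only a sketch under assumptions: the `"module:attribute"` string format and the `load_entry_point` helper are hypothetical, and the demonstration resolves a standard-library class rather than a real strategy.

```python
import importlib
from typing import Any


def load_entry_point(spec: str) -> Any:
    """Resolve a 'package.module:attribute' string to the named object."""
    module_name, sep, attr = spec.partition(":")
    if not sep or not module_name or not attr:
        raise ValueError(f"Expected 'module:attribute', got {spec!r}")
    module = importlib.import_module(module_name)
    try:
        return getattr(module, attr)
    except AttributeError as exc:
        raise ImportError(f"{module_name!r} has no attribute {attr!r}") from exc


# Demonstrated with a standard-library class; a project would instead pass
# something like "src.strategies.momentum:MomentumCrossover" from its config.
strategy_cls = load_entry_point("collections:OrderedDict")
instance = strategy_cls(lookback=20)
```

Either mechanism keeps the runner free of direct strategy imports. The registry fails early on duplicate names, while an entry-point string avoids having to import every strategy module just to populate the registry.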

### State Management

The state manager tracks the experiment loop's progress and enables resume-after-crash:

```python
# src/runner/state.py
import json
from pathlib import Path
from dataclasses import dataclass, field, asdict
from typing import Any

@dataclass
class RunRecord:
    """Record of a single experiment run."""
    run_id: str
    config: dict[str, Any]
    metrics: dict[str, float]
    is_best: bool = False
    decision: str = ""  # "keep" or "discard"
    reason: str = ""

@dataclass
class ExperimentState:
    """Persistent state for the experiment loop."""
    experiment_id: str
    total_runs: int = 0
    best_run: RunRecord | None = None
    history: list[RunRecord] = field(default_factory=list)
    runs_since_improvement: int = 0

    def record_run(self, run: RunRecord) -> None:
        """Record a completed run and update state."""
        self.total_runs += 1
        self.history.append(run)

        if run.is_best:
            self.best_run = run
            self.runs_since_improvement = 0
        else:
            self.runs_since_improvement += 1

    def save(self, path: Path) -> None:
        """Persist state to disk for crash recovery."""
        path.parent.mkdir(parents=True, exist_ok=True)
        with open(path, "w") as f:
            json.dump(asdict(self), f, indent=2, default=str)

    @classmethod
    def load(cls, path: Path) -> "ExperimentState":
        """Load state from disk. Returns empty state if file missing."""
        if not path.exists():
            return cls(experiment_id="unknown")
        with open(path) as f:
            data = json.load(f)
        state = cls(experiment_id=data["experiment_id"])
        state.total_runs = data["total_runs"]
        state.runs_since_improvement = data["runs_since_improvement"]
        state.history = [RunRecord(**r) for r in data["history"]]
        if data["best_run"]:
            state.best_run = RunRecord(**data["best_run"])
        return state
```
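
Because the state file exists for crash recovery, a crash in the middle of `save` could leave a truncated `state.json`. One common mitigation, sketched here as an assumption and not as part of the original module, is to write to a temporary file in the same directory and then rename it over the target. `atomic_write_json` is a hypothetical helper name.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any


def atomic_write_json(path: Path, payload: dict[str, Any]) -> None:
    """Write JSON so readers see either the old file or the complete new one."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the target directory so the rename stays on one
    # filesystem, which is what lets os.replace swap it in atomically.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f, indent=2, default=str)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_name, path)
    except BaseException:
        # Clean up the partial temp file, then re-raise the original error.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

A `save` method could call this with `asdict(self)` instead of opening the target path directly; the on-disk format stays the same.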

### The Experiment Runner

The runner ties the interfaces together:

```python
# src/runner/experiment_runner.py
import logging
from pathlib import Path
from src.interfaces import Strategy, Evaluator, Tracker
from src.runner.state import ExperimentState, RunRecord
from src.runner.budget import IterationBudget
from src.config import load_config
from src.seed import set_seed, capture_environment
from src.strategies.registry import StrategyRegistry

logger = logging.getLogger(__name__)

class ExperimentRunner:
    def __init__(self, config_path: str):
        self.config = load_config(config_path)
        self.experiment_id = Path(config_path).stem
        self.results_dir = Path(self.config["logging"]["results_dir"]) / self.experiment_id

        # Load pluggable components
        # (_build_evaluator and _build_tracker are project-specific factories, not shown)
        strategy_cls = StrategyRegistry.get(self.config["strategy"]["type"])
        self.strategy: Strategy = strategy_cls(**self.config["strategy"].get("params", {}))
        self.evaluator: Evaluator = self._build_evaluator()
        self.tracker: Tracker = self._build_tracker()
        self.budget = IterationBudget(**self.config.get("budget", {}))

        # Load or initialize state
        self.state_path = self.results_dir / "state.json"
        self.state = ExperimentState.load(self.state_path)
        self.state.experiment_id = self.experiment_id

    def run_loop(self) -> ExperimentState:
        """Run the full experiment loop until budget exhaustion or convergence."""
        logger.info("Starting experiment %s (resuming from run %d)",
                    self.experiment_id, self.state.total_runs)

        while True:
            # Check budget
            exhausted, reason = self.budget.is_exhausted(
                runs=self.state.total_runs,
                runs_since_improvement=self.state.runs_since_improvement,
            )
            if exhausted:
                logger.info("Stopping: %s", reason)
                break

            # Execute one iteration
            run_id = f"run-{self.state.total_runs + 1:04d}"
            set_seed(self.config["experiment"]["seed"] + self.state.total_runs)

            try:
                raw_results = self.strategy.execute(self.config)
                metrics = self.evaluator.evaluate(raw_results)
            except Exception:
                # Count the failed run against the budget; otherwise a
                # deterministic failure would retry the same run forever.
                logger.exception("Run %s failed", run_id)
                self.state.total_runs += 1
                self.state.runs_since_improvement += 1
                self.state.save(self.state_path)
                continue

            # Evaluate improvement
            is_best = (
                self.state.best_run is None
                or self.evaluator.is_improvement(metrics, self.state.best_run.metrics)
            )
            decision = "keep" if is_best else "discard"

            run = RunRecord(
                run_id=run_id,
                config=self.config,
                metrics=metrics,
                is_best=is_best,
                decision=decision,
                reason=f"{'New best' if is_best else 'No improvement'}",
            )

            # Record and persist
            self.state.record_run(run)
            self.tracker.log_run(run_id, self.config, metrics)
            self.state.save(self.state_path)

            logger.info(
                "Run %s: %s (metrics: %s)",
                run_id, decision,
                {k: f"{v:.4f}" for k, v in metrics.items()},
            )

        return self.state
```
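
The runner imports `IterationBudget` from `src.runner.budget`, but that module is not shown. The sketch below is one plausible shape for it, inferred only from the call `is_exhausted(runs=..., runs_since_improvement=...)` returning an `(exhausted, reason)` pair. The field names `max_runs` and `patience` and their defaults are assumptions.

```python
from dataclasses import dataclass


@dataclass
class IterationBudget:
    """Hypothetical budget: stop on a run cap or after a stall in improvement."""

    max_runs: int = 100  # assumed field: hard cap on total runs
    patience: int = 20   # assumed field: allowed runs without a new best

    def is_exhausted(self, runs: int,
                     runs_since_improvement: int) -> tuple[bool, str]:
        """Return (exhausted, human-readable reason)."""
        if runs >= self.max_runs:
            return True, f"run budget exhausted ({runs}/{self.max_runs})"
        if runs_since_improvement >= self.patience:
            return True, (
                f"no improvement in {runs_since_improvement} runs "
                f"(patience={self.patience})"
            )
        return False, ""


budget = IterationBudget(max_runs=5, patience=3)
status_fresh = budget.is_exhausted(runs=0, runs_since_improvement=0)
status_stalled = budget.is_exhausted(runs=4, runs_since_improvement=3)
status_capped = budget.is_exhausted(runs=5, runs_since_improvement=0)
```

Because the budget is evaluated from persisted counters (`total_runs`, `runs_since_improvement`), a resumed runner picks up the same stopping decision it would have reached before the crash.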

### Result Persistence

Results are persisted at two levels:

1. **Per-run**: Each run's config, metrics, and artifacts are saved to `results/{experiment_id}/{run_id}/`.
2. **Experiment state**: The full experiment state (history, best run, budget consumption) is saved to `results/{experiment_id}/state.json` after every run.

```python
# src/tracking/file_tracker.py
import json
from pathlib import Path
from src.interfaces import Tracker

class FileTracker:
    """Simple file-based experiment tracker."""

    def __init__(self, results_dir: str):
        self.results_dir = Path(results_dir)
        self.results_dir.mkdir(parents=True, exist_ok=True)

    def log_run(self, run_id: str, config: dict, metrics: dict[str, float],
                artifacts: dict | None = None) -> None:
        run_dir = self.results_dir / run_id
        run_dir.mkdir(parents=True, exist_ok=True)

        with open(run_dir / "config.json", "w") as f:
            json.dump(config, f, indent=2, default=str)
        with open(run_dir / "metrics.json", "w") as f:
            json.dump(metrics, f, indent=2)

        if artifacts:
            artifact_dir = run_dir / "artifacts"
            artifact_dir.mkdir(exist_ok=True)
            for name, data in artifacts.items():
                with open(artifact_dir / name, "w") as f:
                    json.dump(data, f, indent=2, default=str)

    def get_history(self) -> list[dict]:
        runs = []
        for run_dir in sorted(self.results_dir.iterdir()):
            if run_dir.is_dir() and (run_dir / "metrics.json").exists():
                with open(run_dir / "metrics.json") as f:
                    metrics = json.load(f)
                runs.append({"run_id": run_dir.name, "metrics": metrics})
        return runs
```

### Architecture Decision: When to Use Each Driver

| Driver | Architecture Pattern | Use When |
|--------|---------------------|----------|
| Code-driven | Git state machine, agent modifies source | Exploring algorithmic variations, strategy development |
| Config-driven | Fixed runner, parameterised configs | Hyperparameter sweeps, systematic parameter search |
| API-driven | Client wrapper, parameter serialization | External backtest engines, cloud simulation APIs |
| Notebook-driven | Papermill execution, cell-level tracking | Exploratory research, visualization-heavy analysis |

The runner architecture remains the same across all drivers. What changes is the Strategy implementation: code-driven strategies contain the algorithm directly, config-driven strategies delegate to a parameterised engine, API-driven strategies wrap HTTP calls, and notebook-driven strategies use papermill to execute notebooks.

package/content/knowledge/research/research-conventions.md
@@ -0,0 +1,248 @@
---
name: research-conventions
description: Coding conventions for research projects including experiment branching, result naming, config management, and reproducibility standards
topics: [research, conventions, git, branching, reproducibility, config-management]
---

Research code has a unique lifecycle: most code is written to be tried and discarded. A trading strategy that underperforms is reverted. A hyperparameter sweep that converges to a local minimum is abandoned. The conventions must make this try-and-discard cycle fast and safe while preserving a complete audit trail of what was tried and why it was kept or discarded.

## Summary

Use git branches as the state machine for experiment lifecycle (try, evaluate, keep/revert). Name branches, results, and configs with a consistent scheme that encodes the experiment ID, hypothesis, and timestamp. Pin every dependency and seed every random source for reproducibility. Separate experiment code (disposable) from infrastructure code (durable) in the repository structure. Use structured config files (YAML/TOML) instead of command-line argument sprawl.

## Deep Guidance

### Git as Experiment State Machine

The experiment loop uses git as its state management layer. Each experiment run is a branch. The decision to keep or discard is a merge or branch deletion:

```
main (stable baseline)
  |
  +-- exp/001-momentum-lookback-20  (try → evaluate → keep → merge)
  |
  +-- exp/002-momentum-lookback-10  (try → evaluate → discard → delete)
  |
  +-- exp/003-mean-revert-rsi       (try → evaluate → keep → merge)
```

**Branch naming convention**: `exp/{NNN}-{short-description}`
- `NNN`: Zero-padded sequential experiment number
- `short-description`: Kebab-case summary of what is being tested
- Examples: `exp/001-adaptive-lookback`, `exp/042-ensemble-top3`

**Workflow**:
```bash
# Start a new experiment
git checkout main
git checkout -b exp/015-rsi-threshold-sweep

# ... agent modifies code, runs experiment ...

# Outcome A: experiment succeeded, merge to main
git checkout main
git merge --no-ff exp/015-rsi-threshold-sweep -m "exp/015: RSI threshold 30/70 Sharpe=1.6"

# Outcome B: experiment failed, discard the branch
git branch -D exp/015-rsi-threshold-sweep
# Alternative to outcome B: tag the branch for reference, then delete it
git tag archive/exp/015-rsi-threshold-sweep exp/015-rsi-threshold-sweep
git branch -D exp/015-rsi-threshold-sweep
```

**Commit message convention for experiments**:
```
exp/015: RSI threshold sweep

Hypothesis: RSI overbought/oversold thresholds of 30/70 will outperform
the default 20/80 on 2020-2023 equity data.

Result: Sharpe=1.6, MaxDD=11%, 247 trades
Decision: KEEP -- new best by Sharpe, DD within guardrail
```
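
The branch naming convention above is mechanical enough to generate and validate in code. The helpers below are a sketch only: `make_branch_name`, `parse_branch_name`, and the exact regular expression are hypothetical and merely encode the `exp/{NNN}-{short-description}` rule as stated.

```python
import re

# exp/<zero-padded number, at least three digits>-<kebab-case description>
BRANCH_RE = re.compile(r"^exp/(\d{3,})-([a-z0-9]+(?:-[a-z0-9]+)*)$")


def make_branch_name(number: int, description: str) -> str:
    """Build a branch name such as 'exp/015-rsi-threshold-sweep'."""
    # Lowercase, then collapse every run of other characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    if not slug:
        raise ValueError("description must contain at least one letter or digit")
    return f"exp/{number:03d}-{slug}"


def parse_branch_name(branch: str) -> tuple[int, str]:
    """Split an experiment branch into (number, description), or raise."""
    match = BRANCH_RE.match(branch)
    if match is None:
        raise ValueError(f"not an experiment branch: {branch!r}")
    return int(match.group(1)), match.group(2)


branch = make_branch_name(15, "RSI threshold sweep")  # "exp/015-rsi-threshold-sweep"
parsed = parse_branch_name("exp/042-ensemble-top3")   # (42, "ensemble-top3")
```

Parsing the number back out is what lets result directories (`exp-{NNN}`) stay aligned with the branch that produced them.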

### Result Naming

Every experiment run produces artifacts. Use a consistent naming scheme:

```
results/
  exp-001/
    config.yml           # Exact config used for this run
    metrics.json         # Final metrics
    metrics_history.csv  # Per-iteration metrics
    artifacts/           # Model checkpoints, plots, etc.
    log.txt              # Full stdout/stderr
  exp-002/
    ...
```

**File naming rules**:
- Directories: `exp-{NNN}` matching the git branch number
- Timestamps in filenames when multiple runs share an experiment: `exp-001-20240315T143022`
- Never use spaces or special characters in result paths
- Metrics files are always JSON (machine-readable) or CSV (tabular)

### Config Management

Research projects accumulate dozens of configuration parameters. Manage them with structured config files, not argument sprawl:

```yaml
# configs/base.yml: shared defaults
experiment:
  seed: 42
  num_runs: 100
  patience: 20

data:
  source: "data/prices.parquet"
  train_start: "2015-01-01"
  train_end: "2019-12-31"
  test_start: "2020-01-01"
  test_end: "2023-12-31"

logging:
  level: INFO
  results_dir: "results"
```

```yaml
# configs/exp-015-rsi-sweep.yml: experiment-specific overrides
_base_: base.yml

strategy:
  type: "rsi_threshold"
  params:
    overbought: 70
    oversold: 30
    lookback: 14

experiment:
  num_runs: 200  # Override base
```

**Config loading pattern** (merge base + override):

```python
# src/config.py
import yaml
from pathlib import Path
from typing import Any

def load_config(config_path: str) -> dict[str, Any]:
    """Load config with base inheritance."""
    with open(config_path) as f:
        # safe_load returns None for an empty file; fall back to an empty dict
        config = yaml.safe_load(f) or {}

    # Resolve base config inheritance (the base path is relative to this file)
    if "_base_" in config:
        base_path = Path(config_path).parent / config.pop("_base_")
        base = load_config(str(base_path))
        return deep_merge(base, config)

    return config

def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge override into base."""
    result = base.copy()
    for key, value in override.items():
        if key in result and isinstance(result[key], dict) and isinstance(value, dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = value
    return result
```
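
To see what the merge produces, the snippet below applies the same `deep_merge` logic to in-memory dicts that mirror the two YAML files above. `deep_merge` is repeated here only so the example runs on its own.

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge override into base (same logic as src/config.py)."""
    result = base.copy()
    for key, value in override.items():
        if key in result and isinstance(result[key], dict) and isinstance(value, dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = value
    return result


base = {
    "experiment": {"seed": 42, "num_runs": 100, "patience": 20},
    "logging": {"level": "INFO", "results_dir": "results"},
}
override = {
    "strategy": {"type": "rsi_threshold", "params": {"overbought": 70, "oversold": 30}},
    "experiment": {"num_runs": 200},
}

merged = deep_merge(base, override)
# merged["experiment"] -> {"seed": 42, "num_runs": 200, "patience": 20}
# base["experiment"]["num_runs"] is still 100: merged sections are rebuilt,
# not mutated in place.
```

One caveat worth knowing: sections the override never touches (here `logging`) are shared by reference between `base` and `merged`, because `dict.copy()` is shallow. If configs are mutated after loading, a `copy.deepcopy` of the result avoids surprises.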

### Reproducibility Standards

Every experiment must be reproducible. This means another researcher (or the same agent in a future session) can re-run the experiment and get the same result:

**Mandatory reproducibility checklist**:

1. **Seed everything**: Random number generators, data shuffling, model initialization.
   ```python
   import random
   import numpy as np

   def set_seed(seed: int) -> None:
       random.seed(seed)
       np.random.seed(seed)
       # Framework-specific seeding
       try:
           import torch
           torch.manual_seed(seed)
           torch.cuda.manual_seed_all(seed)
           torch.backends.cudnn.deterministic = True
           torch.backends.cudnn.benchmark = False
       except ImportError:
           pass
   ```

2. **Pin dependencies**: Use exact versions, not ranges.
   ```
   # requirements.txt: pinned
   numpy==1.26.4
   pandas==2.2.1
   scikit-learn==1.4.1
   optuna==3.5.0
   ```

3. **Record environment**: Capture the full environment at experiment start.
   ```python
   import subprocess
   import platform
   import json

   def capture_environment() -> dict:
       return {
           "python": platform.python_version(),
           "platform": platform.platform(),
           "pip_freeze": subprocess.check_output(
               ["pip", "freeze"], text=True
           ).strip().split("\n"),
           "git_sha": subprocess.check_output(
               ["git", "rev-parse", "HEAD"], text=True
           ).strip(),
           "git_dirty": bool(subprocess.check_output(
               ["git", "status", "--porcelain"], text=True
           ).strip()),
       }
   ```

4. **Never modify data in place**: Raw data is immutable. Processed data is derived and can be regenerated from raw data + processing code.

5. **Config-as-code**: The experiment config file (committed to git) must fully define the experiment. No "I changed that parameter manually."
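
Item 4 of the checklist (raw data is immutable) can be enforced rather than just assumed. One option, sketched here and not prescribed by the source, is to record a SHA-256 digest of each raw file and verify it before a run. `file_sha256` and `verify_raw_data` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_raw_data(path: Path, expected_sha256: str) -> None:
    """Raise if the raw data file no longer matches its recorded digest."""
    actual = file_sha256(path)
    if actual != expected_sha256:
        raise RuntimeError(
            f"{path} changed: expected {expected_sha256[:12]}..., got {actual[:12]}..."
        )
```

Storing the digest alongside the output of `capture_environment()` would tie each result to the exact bytes it was computed from.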

### Code Organization Conventions

Separate durable infrastructure code from disposable experiment code:

| Category | Location | Lifecycle |
|----------|----------|-----------|
| Experiment runner | `src/runner/` | Durable (rarely changes) |
| Evaluation framework | `src/evaluation/` | Durable (rarely changes) |
| Data loading | `src/data/` | Durable (rarely changes) |
| Strategy/model code | `src/strategies/` or `src/models/` | Disposable (changes every experiment) |
| Config files | `configs/` | Per-experiment |
| Results | `results/` | Per-experiment output |

**Import hygiene**: Experiment code imports from infrastructure code, never the reverse. The runner does not import specific strategies -- it discovers them via a registry or config-specified entry point.

### Code Style for Research

- **Type hints everywhere**: Even in experiment code. Catches bugs early in a fast-iteration cycle.
- **Docstrings on public functions**: Especially for metric computation (document the formula).
- **No notebooks in git**: Notebooks are for interactive exploration. Convert to scripts before committing. If notebook-driven experiments are required, use `nbstripout` to strip outputs before committing.
- **Linting**: Use `ruff` for fast linting. Research code skips some style rules (unused imports during exploration) but enforces correctness rules (undefined variables, type errors).

```toml
# pyproject.toml
[tool.ruff]
line-length = 100

# Rule selection lives under [tool.ruff.lint] in current ruff releases
[tool.ruff.lint]
select = ["E", "F", "W", "I"]  # Errors, pyflakes, warnings, isort
ignore = ["E501"]              # Allow long lines in research code

[tool.ruff.lint.isort]
known-first-party = ["src"]
```