@zigrivers/scaffold 3.14.0 → 3.16.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +50 -21
- package/content/knowledge/core/automated-review-tooling.md +21 -26
- package/content/knowledge/core/multi-model-review-dispatch.md +30 -55
- package/content/knowledge/research/research-architecture.md +385 -0
- package/content/knowledge/research/research-conventions.md +248 -0
- package/content/knowledge/research/research-dev-environment.md +303 -0
- package/content/knowledge/research/research-experiment-loop.md +429 -0
- package/content/knowledge/research/research-experiment-tracking.md +336 -0
- package/content/knowledge/research/research-ml-architecture-search.md +383 -0
- package/content/knowledge/research/research-ml-evaluation.md +407 -0
- package/content/knowledge/research/research-ml-experiment-tracking.md +466 -0
- package/content/knowledge/research/research-ml-training-patterns.md +413 -0
- package/content/knowledge/research/research-observability.md +395 -0
- package/content/knowledge/research/research-overfitting-prevention.md +306 -0
- package/content/knowledge/research/research-project-structure.md +264 -0
- package/content/knowledge/research/research-quant-backtesting.md +326 -0
- package/content/knowledge/research/research-quant-market-data.md +366 -0
- package/content/knowledge/research/research-quant-metrics.md +335 -0
- package/content/knowledge/research/research-quant-requirements.md +223 -0
- package/content/knowledge/research/research-quant-risk.md +469 -0
- package/content/knowledge/research/research-quant-strategy-patterns.md +412 -0
- package/content/knowledge/research/research-requirements.md +201 -0
- package/content/knowledge/research/research-security.md +374 -0
- package/content/knowledge/research/research-sim-compute-management.md +538 -0
- package/content/knowledge/research/research-sim-engine-patterns.md +448 -0
- package/content/knowledge/research/research-sim-parameter-spaces.md +425 -0
- package/content/knowledge/research/research-sim-validation.md +456 -0
- package/content/knowledge/research/research-testing.md +334 -0
- package/content/methodology/research-ml-research.yml +23 -0
- package/content/methodology/research-overlay.yml +65 -0
- package/content/methodology/research-quant-finance.yml +29 -0
- package/content/methodology/research-simulation.yml +23 -0
- package/content/tools/post-implementation-review.md +36 -7
- package/content/tools/review-code.md +33 -8
- package/content/tools/review-pr.md +79 -95
- package/dist/cli/commands/adopt.d.ts.map +1 -1
- package/dist/cli/commands/adopt.js +22 -1
- package/dist/cli/commands/adopt.js.map +1 -1
- package/dist/cli/commands/adopt.serialization.test.js +41 -0
- package/dist/cli/commands/adopt.serialization.test.js.map +1 -1
- package/dist/cli/commands/init.d.ts +4 -0
- package/dist/cli/commands/init.d.ts.map +1 -1
- package/dist/cli/commands/init.js +32 -2
- package/dist/cli/commands/init.js.map +1 -1
- package/dist/cli/init-flag-families.d.ts +6 -1
- package/dist/cli/init-flag-families.d.ts.map +1 -1
- package/dist/cli/init-flag-families.js +32 -1
- package/dist/cli/init-flag-families.js.map +1 -1
- package/dist/cli/init-flag-families.test.js +47 -0
- package/dist/cli/init-flag-families.test.js.map +1 -1
- package/dist/config/schema.d.ts +272 -16
- package/dist/config/schema.d.ts.map +1 -1
- package/dist/config/schema.js +25 -1
- package/dist/config/schema.js.map +1 -1
- package/dist/config/schema.test.js +103 -3
- package/dist/config/schema.test.js.map +1 -1
- package/dist/core/assembly/overlay-loader.d.ts +12 -0
- package/dist/core/assembly/overlay-loader.d.ts.map +1 -1
- package/dist/core/assembly/overlay-loader.js +30 -0
- package/dist/core/assembly/overlay-loader.js.map +1 -1
- package/dist/core/assembly/overlay-loader.test.js +66 -1
- package/dist/core/assembly/overlay-loader.test.js.map +1 -1
- package/dist/core/assembly/overlay-state-resolver.d.ts.map +1 -1
- package/dist/core/assembly/overlay-state-resolver.js +48 -19
- package/dist/core/assembly/overlay-state-resolver.js.map +1 -1
- package/dist/core/assembly/overlay-state-resolver.test.js +80 -0
- package/dist/core/assembly/overlay-state-resolver.test.js.map +1 -1
- package/dist/e2e/project-type-overlays.test.js +119 -0
- package/dist/e2e/project-type-overlays.test.js.map +1 -1
- package/dist/project/adopt.d.ts.map +1 -1
- package/dist/project/adopt.js +3 -1
- package/dist/project/adopt.js.map +1 -1
- package/dist/project/detectors/disambiguate.js +1 -1
- package/dist/project/detectors/disambiguate.js.map +1 -1
- package/dist/project/detectors/index.d.ts.map +1 -1
- package/dist/project/detectors/index.js +2 -1
- package/dist/project/detectors/index.js.map +1 -1
- package/dist/project/detectors/ml.d.ts.map +1 -1
- package/dist/project/detectors/ml.js +2 -6
- package/dist/project/detectors/ml.js.map +1 -1
- package/dist/project/detectors/research.d.ts +4 -0
- package/dist/project/detectors/research.d.ts.map +1 -0
- package/dist/project/detectors/research.js +141 -0
- package/dist/project/detectors/research.js.map +1 -0
- package/dist/project/detectors/research.test.d.ts +2 -0
- package/dist/project/detectors/research.test.d.ts.map +1 -0
- package/dist/project/detectors/research.test.js +235 -0
- package/dist/project/detectors/research.test.js.map +1 -0
- package/dist/project/detectors/shared-signals.d.ts +3 -0
- package/dist/project/detectors/shared-signals.d.ts.map +1 -0
- package/dist/project/detectors/shared-signals.js +9 -0
- package/dist/project/detectors/shared-signals.js.map +1 -0
- package/dist/project/detectors/types.d.ts +6 -2
- package/dist/project/detectors/types.d.ts.map +1 -1
- package/dist/project/detectors/types.js.map +1 -1
- package/dist/types/config.d.ts +7 -1
- package/dist/types/config.d.ts.map +1 -1
- package/dist/wizard/copy/core.d.ts.map +1 -1
- package/dist/wizard/copy/core.js +4 -0
- package/dist/wizard/copy/core.js.map +1 -1
- package/dist/wizard/copy/index.d.ts.map +1 -1
- package/dist/wizard/copy/index.js +2 -0
- package/dist/wizard/copy/index.js.map +1 -1
- package/dist/wizard/copy/research.d.ts +3 -0
- package/dist/wizard/copy/research.d.ts.map +1 -0
- package/dist/wizard/copy/research.js +27 -0
- package/dist/wizard/copy/research.js.map +1 -0
- package/dist/wizard/copy/types.d.ts +5 -1
- package/dist/wizard/copy/types.d.ts.map +1 -1
- package/dist/wizard/flags.d.ts +7 -1
- package/dist/wizard/flags.d.ts.map +1 -1
- package/dist/wizard/questions.d.ts +4 -2
- package/dist/wizard/questions.d.ts.map +1 -1
- package/dist/wizard/questions.js +27 -1
- package/dist/wizard/questions.js.map +1 -1
- package/dist/wizard/questions.test.js +51 -0
- package/dist/wizard/questions.test.js.map +1 -1
- package/dist/wizard/wizard.d.ts +3 -2
- package/dist/wizard/wizard.d.ts.map +1 -1
- package/dist/wizard/wizard.js +3 -1
- package/dist/wizard/wizard.js.map +1 -1
- package/package.json +1 -1
@@ -0,0 +1,336 @@
---
name: research-experiment-tracking
description: Experiment results logging including structured result formats, run comparison, reproducibility tracking, and artifact management
topics: [research, experiment-tracking, results, comparison, reproducibility, artifacts, mlflow]
---

Experiment tracking is the difference between research and random exploration. Without structured logging of what was tried, what resulted, and what was decided, a research project becomes impossible to audit, reproduce, or learn from. Tracking must capture the full context of every run: the exact config, the environment, the metrics, and the keep/discard decision with its rationale.

## Summary

Log every experiment run with its complete config, environment snapshot, metrics, and decision. Use structured formats (JSON for metrics, CSV for time series, YAML or JSON for configs) that are both human-readable and machine-parseable. Implement run comparison utilities that rank runs by primary metric and highlight configuration differences. For larger projects, integrate MLflow or Weights & Biases for web-based dashboards and artifact storage. Always store enough information to reproduce any run from scratch.

## Deep Guidance

### Structured Result Format

Every experiment run produces a result record with six sections (identity, configuration, environment, metrics, decision, and artifacts):

```python
# src/tracking/result_schema.py
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any

@dataclass
class RunResult:
    """Complete record of a single experiment run."""
    # Identity
    run_id: str
    experiment_id: str
    timestamp: str = field(default_factory=lambda: datetime.now().isoformat())

    # Configuration (frozen snapshot)
    config: dict[str, Any] = field(default_factory=dict)

    # Environment (for reproducibility)
    environment: dict[str, Any] = field(default_factory=dict)

    # Metrics
    metrics: dict[str, float] = field(default_factory=dict)
    metric_history: list[dict[str, float]] = field(default_factory=list)

    # Decision
    decision: str = ""  # "keep" or "discard"
    decision_reason: str = ""
    is_best: bool = False

    # Artifacts (paths to saved files)
    artifact_paths: dict[str, str] = field(default_factory=dict)
```
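
A record like this is only useful if it survives a trip to disk unchanged. The sketch below shows one way to check that, using `dataclasses.asdict` and the `json` module. `MiniRunResult` is a hypothetical, trimmed stand-in for `RunResult` so the example runs on its own; `to_json` and `from_json` are illustrative helper names, not part of the package.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class MiniRunResult:
    """Hypothetical, trimmed stand-in for RunResult (same field style)."""
    run_id: str
    experiment_id: str
    config: dict[str, Any] = field(default_factory=dict)
    metrics: dict[str, float] = field(default_factory=dict)
    decision: str = ""  # "keep" or "discard"
    decision_reason: str = ""


def to_json(result: MiniRunResult) -> str:
    """Serialize the whole record; sort_keys keeps the output stable for diffs."""
    return json.dumps(asdict(result), indent=2, sort_keys=True)


def from_json(text: str) -> MiniRunResult:
    """Rebuild the record; an unknown key raises TypeError instead of being ignored."""
    return MiniRunResult(**json.loads(text))


record = MiniRunResult(
    run_id="run-001",
    experiment_id="exp-baseline",
    config={"model": {"lr": 0.01, "layers": 2}},
    metrics={"val_loss": 0.42},
    decision="keep",
    decision_reason="first baseline",
)
restored = from_json(to_json(record))
```

Because dataclasses compare field by field, `restored == record` is a cheap round-trip check to keep in a test suite.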

### File-Based Tracking

For small to medium projects (under ~1000 runs), file-based tracking is sufficient and has zero infrastructure dependencies:

```python
# src/tracking/file_tracker.py
import csv
import json
from pathlib import Path

from src.tracking.result_schema import RunResult

class FileExperimentTracker:
    """File-based experiment tracker with no external dependencies."""

    def __init__(self, results_dir: str):
        self.results_dir = Path(results_dir)
        self.results_dir.mkdir(parents=True, exist_ok=True)
        self.leaderboard_path = self.results_dir / "leaderboard.csv"

    def log_run(self, result: RunResult) -> Path:
        """Log a complete run result to disk."""
        run_dir = self.results_dir / result.run_id
        run_dir.mkdir(parents=True, exist_ok=True)

        # Save config snapshot
        with open(run_dir / "config.json", "w") as f:
            json.dump(result.config, f, indent=2, default=str)

        # Save environment
        with open(run_dir / "environment.json", "w") as f:
            json.dump(result.environment, f, indent=2)

        # Save metrics
        with open(run_dir / "metrics.json", "w") as f:
            json.dump(result.metrics, f, indent=2)

        # Save metric history (if available)
        if result.metric_history:
            with open(run_dir / "metric_history.csv", "w", newline="") as f:
                writer = csv.DictWriter(f, fieldnames=list(result.metric_history[0].keys()))
                writer.writeheader()
                writer.writerows(result.metric_history)

        # Save decision
        with open(run_dir / "decision.json", "w") as f:
            json.dump({
                "decision": result.decision,
                "reason": result.decision_reason,
                "is_best": result.is_best,
            }, f, indent=2)

        # Update leaderboard
        self._update_leaderboard(result)

        return run_dir

    def _update_leaderboard(self, result: RunResult) -> None:
        """Append to the CSV leaderboard for quick comparison."""
        exists = (self.leaderboard_path.exists()
                  and self.leaderboard_path.stat().st_size > 0)
        fieldnames = ["run_id", "timestamp", "decision"] + sorted(result.metrics.keys())
        if exists:
            # Reuse the header already on disk so appended rows stay aligned
            # with it. Metrics missing from that header are dropped
            # (extrasaction="ignore"); metrics this run lacks are left blank.
            with open(self.leaderboard_path, newline="") as f:
                fieldnames = next(csv.reader(f), fieldnames)

        with open(self.leaderboard_path, "a", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
            if not exists:
                writer.writeheader()
            writer.writerow({
                "run_id": result.run_id,
                "timestamp": result.timestamp,
                "decision": result.decision,
                **result.metrics,
            })

    def load_run(self, run_id: str) -> RunResult:
        """Load a run result from disk."""
        run_dir = self.results_dir / run_id
        with open(run_dir / "config.json") as f:
            config = json.load(f)
        with open(run_dir / "environment.json") as f:
            environment = json.load(f)
        with open(run_dir / "metrics.json") as f:
            metrics = json.load(f)
        with open(run_dir / "decision.json") as f:
            decision = json.load(f)

        # experiment_id and timestamp are not written per run by log_run, so
        # they cannot be restored here; timestamp defaults to the load time.
        return RunResult(
            run_id=run_id,
            experiment_id="",
            config=config,
            environment=environment,
            metrics=metrics,
            decision=decision["decision"],
            decision_reason=decision["reason"],
            is_best=decision["is_best"],
        )

    def get_leaderboard(self, sort_by: str = "", ascending: bool = False) -> list[dict]:
        """Load and sort the leaderboard."""
        if not self.leaderboard_path.exists():
            return []
        with open(self.leaderboard_path, newline="") as f:
            rows = list(csv.DictReader(f))
        if sort_by and rows:
            # Blank or non-numeric cells sort last in either direction.
            missing = float("inf") if ascending else float("-inf")

            def sort_key(row: dict) -> float:
                try:
                    return float(row.get(sort_by) or "")
                except ValueError:
                    return missing

            rows.sort(key=sort_key, reverse=not ascending)
        return rows
```
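
The leaderboard header is written once, when the file is created, so a later run that reports a metric the first run did not have has no column to land in. One possible way to handle that, sketched below under stated assumptions, is to widen the header and rewrite the file whenever a new column appears. `append_leaderboard_row` and `BASE_COLUMNS` are hypothetical names, the rewrite is not atomic, and the sketch treats the leaderboard purely as an index (the per-run directories stay untouched).

```python
import csv
import tempfile
from pathlib import Path

BASE_COLUMNS = ["run_id", "timestamp", "decision"]


def append_leaderboard_row(path: Path, row: dict) -> list[str]:
    """Append a row, widening the header when the row has new columns.

    Returns the header in effect after the write.
    """
    header: list[str] = []
    existing_rows: list[dict] = []
    if path.exists() and path.stat().st_size > 0:
        with open(path, newline="") as f:
            reader = csv.DictReader(f)
            header = list(reader.fieldnames or [])
            existing_rows = list(reader)

    base = header or BASE_COLUMNS
    new_columns = sorted(k for k in row if k not in base)

    if header and not new_columns:
        # Fast path: the header already covers this row, so a plain append is safe.
        with open(path, "a", newline="") as f:
            csv.DictWriter(f, fieldnames=header, restval="").writerow(row)
        return header

    # Slow path: rewrite with the widened header; older rows get blank cells.
    widened = base + new_columns
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=widened, restval="")
        writer.writeheader()
        writer.writerows(existing_rows)
        writer.writerow(row)
    return widened


# Usage: the second run introduces a metric the first run never reported.
with tempfile.TemporaryDirectory() as tmp:
    board = Path(tmp) / "leaderboard.csv"
    append_leaderboard_row(
        board,
        {"run_id": "r1", "timestamp": "t1", "decision": "keep", "sharpe": 1.2},
    )
    final_header = append_leaderboard_row(
        board,
        {"run_id": "r2", "timestamp": "t2", "decision": "discard",
         "sharpe": 0.9, "max_drawdown": 0.15},
    )
    with open(board, newline="") as f:
        rows = list(csv.DictReader(f))
```

Rewriting is proportional to the size of the leaderboard, which is acceptable at the scale file-based tracking targets; for larger projects a tracking server is the better fit.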

### Run Comparison

Compare runs to understand what configuration changes produced which metric changes:

```python
# src/tracking/comparison.py
from typing import Any

def compare_runs(run_a: dict[str, Any], run_b: dict[str, Any]) -> dict[str, Any]:
    """Compare two runs, highlighting config and metric differences."""
    config_diff = diff_dicts(run_a["config"], run_b["config"])
    metrics_a, metrics_b = run_a["metrics"], run_b["metrics"]
    metric_diff = {
        k: {
            "a": metrics_a.get(k),
            "b": metrics_b.get(k),
            # A delta only makes sense when both runs report the metric.
            "delta": (
                metrics_b[k] - metrics_a[k]
                if k in metrics_a and k in metrics_b
                else None
            ),
        }
        for k in sorted(set(metrics_a) | set(metrics_b))
    }
    return {
        "config_diff": config_diff,
        "metric_diff": metric_diff,
    }

def diff_dicts(a: dict, b: dict, prefix: str = "") -> list[dict]:
    """Recursively diff two dicts, returning changed keys."""
    diffs = []
    all_keys = set(a) | set(b)
    for key in sorted(all_keys):
        path = f"{prefix}.{key}" if prefix else key
        val_a = a.get(key)
        val_b = b.get(key)
        if isinstance(val_a, dict) and isinstance(val_b, dict):
            diffs.extend(diff_dicts(val_a, val_b, path))
        elif val_a != val_b:
            diffs.append({"path": path, "old": val_a, "new": val_b})
    return diffs

def rank_runs(runs: list[dict], metric: str, direction: str = "maximize") -> list[dict]:
    """Rank runs by a metric."""
    reverse = direction == "maximize"
    return sorted(
        runs,
        # Runs that lack the metric sort last in either direction.
        key=lambda r: r["metrics"].get(metric, float("-inf") if reverse else float("inf")),
        reverse=reverse,
    )
```
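
Ranking and diffing are often used together: find the best run, then ask how every other run's config differs from it. The sketch below is one way to combine the two ideas. It is self-contained, so it uses its own `flatten` helper (dotted keys) instead of the functions above; `diff_against_best` and the sample runs are hypothetical.

```python
from typing import Any


def flatten(d: dict, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into dotted keys: {"model": {"lr": 1}} -> {"model.lr": 1}."""
    flat: dict[str, Any] = {}
    for key, value in d.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_against_best(runs: list[dict], metric: str,
                      direction: str = "maximize") -> tuple[dict, list[dict]]:
    """Return the best run and, for every other run, its config changes vs the best."""
    if not runs:
        raise ValueError("at least one run is required")
    maximize = direction == "maximize"
    missing = float("-inf") if maximize else float("inf")
    ranked = sorted(runs, key=lambda r: r["metrics"].get(metric, missing),
                    reverse=maximize)
    best, rest = ranked[0], ranked[1:]
    best_cfg = flatten(best["config"])

    report = []
    for run in rest:
        cfg = flatten(run["config"])
        changed = {
            key: {"best": best_cfg.get(key), "run": cfg.get(key)}
            for key in sorted(set(best_cfg) | set(cfg))
            if best_cfg.get(key) != cfg.get(key)
        }
        report.append({
            "run_id": run["run_id"],
            "value": run["metrics"].get(metric),  # None when the metric is missing
            "changed": changed,
        })
    return best, report


runs = [
    {"run_id": "a", "config": {"model": {"lr": 0.1, "depth": 3}}, "metrics": {"accuracy": 0.81}},
    {"run_id": "b", "config": {"model": {"lr": 0.01, "depth": 3}}, "metrics": {"accuracy": 0.88}},
    {"run_id": "c", "config": {"model": {"lr": 0.01, "depth": 5}}, "metrics": {}},
]
best, report = diff_against_best(runs, "accuracy")
```

The report stays ordered by rank, so the first entries are the near misses whose config differences are usually the most informative.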

### MLflow Integration

For projects with many runs or team collaboration, MLflow provides a web UI and artifact store:

```python
# src/tracking/mlflow_tracker.py
import mlflow
from typing import Any

class MLflowTracker:
    """MLflow-backed experiment tracker."""

    def __init__(self, experiment_name: str, tracking_uri: str = "sqlite:///mlruns.db"):
        mlflow.set_tracking_uri(tracking_uri)
        mlflow.set_experiment(experiment_name)

    def log_run(self, run_id: str, config: dict[str, Any],
                metrics: dict[str, float], artifacts: dict[str, str] | None = None,
                decision: str = "") -> None:
        with mlflow.start_run(run_name=run_id):
            # Log config as parameters (flattened)
            flat_config = self._flatten(config)
            mlflow.log_params(flat_config)

            # Log metrics
            for name, value in metrics.items():
                mlflow.log_metric(name, value)

            # Log decision as tag
            mlflow.set_tag("decision", decision)

            # Log artifacts
            if artifacts:
                for path in artifacts.values():
                    mlflow.log_artifact(path)

    @staticmethod
    def _flatten(d: dict, prefix: str = "") -> dict[str, str]:
        """Flatten nested dict for MLflow params (which are flat)."""
        items = {}
        for k, v in d.items():
            key = f"{prefix}.{k}" if prefix else k
            if isinstance(v, dict):
                items.update(MLflowTracker._flatten(v, key))
            else:
                items[key] = str(v)
        return items
```

### Reproducibility Tracking

Every run must capture enough context to be reproduced. The minimum reproducibility record:

```python
# src/tracking/reproducibility.py
import hashlib
import json
import platform
import subprocess
import sys

def create_reproducibility_record(config: dict, data_path: str) -> dict:
    """Create a record sufficient to reproduce this experiment run."""
    return {
        # Software
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "pip_freeze": _pip_freeze(),
        # Code
        "git_sha": _git_sha(),
        "git_dirty": _git_is_dirty(),
        "git_branch": _git_branch(),
        # Data
        "data_hash": _hash_file(data_path) if data_path else None,
        # Config (hash of the complete config; key order does not matter)
        "config_hash": hashlib.sha256(
            json.dumps(config, sort_keys=True, default=str).encode()
        ).hexdigest(),
    }

def _pip_freeze() -> list[str]:
    # Run pip through the current interpreter so the list matches this environment.
    return subprocess.check_output(
        [sys.executable, "-m", "pip", "freeze"], text=True
    ).strip().splitlines()

def _git_sha() -> str:
    return subprocess.check_output(
        ["git", "rev-parse", "HEAD"], text=True
    ).strip()

def _git_is_dirty() -> bool:
    # Any output from --porcelain means uncommitted or untracked changes.
    return bool(subprocess.check_output(
        ["git", "status", "--porcelain"], text=True
    ).strip())

def _git_branch() -> str:
    return subprocess.check_output(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"], text=True
    ).strip()

def _hash_file(path: str) -> str:
    # Hash in chunks so large data files do not have to fit in memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()
```
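
A record is only half of reproducibility; the other half is checking it before a re-run. The sketch below compares a stored record against a freshly created one and lists what differs. The field names mirror the record built above, while `reproducibility_mismatches`, `STRICT_FIELDS`, and the sample values are hypothetical.

```python
from typing import Any

# Fields compared for exact equality; names mirror the reproducibility record.
STRICT_FIELDS = ("git_sha", "data_hash", "config_hash", "python_version")


def reproducibility_mismatches(stored: dict[str, Any],
                               current: dict[str, Any]) -> list[str]:
    """List human-readable differences between two reproducibility records."""
    problems: list[str] = []
    for name in STRICT_FIELDS:
        if stored.get(name) != current.get(name):
            problems.append(
                f"{name}: stored={stored.get(name)!r} current={current.get(name)!r}"
            )

    # A dirty tree means the recorded SHA does not fully identify the code.
    if stored.get("git_dirty"):
        problems.append("git_dirty: stored run had uncommitted changes")
    if current.get("git_dirty"):
        problems.append("git_dirty: current tree has uncommitted changes")

    # Report packages whose pinned line appears on only one side.
    stored_pkgs = set(stored.get("pip_freeze", []))
    current_pkgs = set(current.get("pip_freeze", []))
    for line in sorted(stored_pkgs ^ current_pkgs):
        side = "stored only" if line in stored_pkgs else "current only"
        problems.append(f"pip_freeze ({side}): {line}")
    return problems


stored = {
    "python_version": "3.11.4",
    "git_sha": "abc123",
    "git_dirty": False,
    "data_hash": "d1",
    "config_hash": "c1",
    "pip_freeze": ["numpy==1.26.0", "pandas==2.1.0"],
}
current = dict(stored, git_sha="def456",
               pip_freeze=["numpy==1.26.4", "pandas==2.1.0"])
problems = reproducibility_mismatches(stored, current)
```

An empty list means the environment, code, data, and config all match the stored run; anything else is worth resolving, or at least logging, before trusting a reproduced number.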

### Artifact Management

Artifacts are files produced during experiment execution that are too large or complex for JSON metrics:

| Artifact Type | Format | Storage |
|---------------|--------|---------|
| Model checkpoints | `.pt`, `.pkl`, `.joblib` | `results/{run}/artifacts/` |
| Equity curves | `.csv`, `.parquet` | `results/{run}/artifacts/` |
| Plots | `.png`, `.svg` | `results/{run}/artifacts/` |
| Logs | `.txt`, `.log` | `results/{run}/log.txt` |
| Configs | `.yml`, `.json` | `results/{run}/config.json` |

**Storage strategy**:
- Small artifacts (< 10 MB): Store in the run directory
- Large artifacts (10 MB or more): Store in cloud storage (S3, GCS) with a reference in the run record
- Transient artifacts (intermediate checkpoints): Delete after the run unless retention is explicitly requested
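
The size rule in the storage strategy can be sketched as a small routing function. This is a minimal illustration under stated assumptions: `store_artifact` is a hypothetical helper, a file of exactly 10 MB is treated as large, the bucket URI is made up, and nothing is uploaded (the function only records where a large file should go, leaving the upload to whichever cloud client the project uses).

```python
import shutil
import tempfile
from pathlib import Path

SIZE_THRESHOLD_BYTES = 10 * 1024 * 1024  # 10 MB cutoff from the storage strategy


def store_artifact(src: Path, run_dir: Path, remote_prefix: str,
                   threshold: int = SIZE_THRESHOLD_BYTES) -> dict[str, str]:
    """Return a run-record entry describing where the artifact lives.

    Files below the threshold are copied into run_dir/artifacts/. Larger files
    are not copied or uploaded; only their intended remote URI is recorded.
    """
    if src.stat().st_size < threshold:
        dest_dir = run_dir / "artifacts"
        dest_dir.mkdir(parents=True, exist_ok=True)
        dest = dest_dir / src.name
        shutil.copy2(src, dest)
        return {"name": src.name, "storage": "local", "location": str(dest)}

    uri = f"{remote_prefix.rstrip('/')}/{run_dir.name}/{src.name}"
    return {"name": src.name, "storage": "remote", "location": uri}


# Usage with a tiny threshold so the demo does not need multi-megabyte files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    small = root / "plot.png"
    small.write_bytes(b"x" * 100)
    large = root / "model.pt"
    large.write_bytes(b"x" * 5000)

    run_dir = root / "results" / "run-001"
    prefix = "s3://example-bucket/experiments"  # hypothetical bucket
    small_entry = store_artifact(small, run_dir, prefix, threshold=1024)
    large_entry = store_artifact(large, run_dir, prefix, threshold=1024)
    small_copied = (run_dir / "artifacts" / "plot.png").exists()
```

The returned entries have the same shape for both cases, so they can be stored under `artifact_paths`-style keys in the run record without the reader needing to know where the file ended up.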

### Tracking Best Practices

1. **Log everything, filter later**: It is cheaper to log too much than to re-run an experiment because you forgot to record a parameter.
2. **Structured formats only**: JSON and CSV, never unstructured text logs for metrics. Text logs are for debugging, not analysis.
3. **Immutable run records**: Once a run is recorded, never modify its metrics or config. If a metric was computed incorrectly, add a new metric column rather than editing the old one.
4. **Leaderboard as index**: Maintain a single CSV leaderboard that can be loaded into pandas for quick analysis. Do not rely on scanning individual run directories.
5. **Version the tracking schema**: If you add new metrics or change the format, version the schema so old runs remain parseable.
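
Schema versioning (practice 5) can be as light as a `schema_version` field plus a chain of upgrade functions applied at load time. The sketch below assumes a made-up history in which version 2 renamed `reason` to `decision_reason`; every name in it (`upgrade_record`, `MIGRATIONS`, the version numbers) is hypothetical. Upgrades return copies, so the stored record is never edited, in line with practice 3.

```python
from typing import Any, Callable

CURRENT_SCHEMA_VERSION = 2  # hypothetical: v2 renamed "reason" to "decision_reason"


def _v0_to_v1(record: dict[str, Any]) -> dict[str, Any]:
    # Records written before versioning carry no marker; stamp them as v1.
    return {**record, "schema_version": 1}


def _v1_to_v2(record: dict[str, Any]) -> dict[str, Any]:
    upgraded = {k: v for k, v in record.items() if k != "reason"}
    upgraded["decision_reason"] = record.get("reason", "")
    upgraded["schema_version"] = 2
    return upgraded


# Each entry upgrades a record from the keyed version to the next one.
MIGRATIONS: dict[int, Callable[[dict[str, Any]], dict[str, Any]]] = {
    0: _v0_to_v1,
    1: _v1_to_v2,
}


def upgrade_record(record: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the record upgraded to CURRENT_SCHEMA_VERSION."""
    upgraded = dict(record)
    version = upgraded.get("schema_version", 0)
    if version > CURRENT_SCHEMA_VERSION:
        raise ValueError(f"record has newer schema version {version}")
    while version < CURRENT_SCHEMA_VERSION:
        upgraded = MIGRATIONS[version](upgraded)
        version = upgraded["schema_version"]
    return upgraded


old_record = {"decision": "keep", "reason": "beat baseline", "is_best": True}
new_record = upgrade_record(old_record)
```

Readers then only ever deal with the current shape, while the files on disk keep whatever version they were written with.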