@zigrivers/scaffold 3.14.0 → 3.15.0

Files changed (117)
  1. package/README.md +31 -9
  2. package/content/knowledge/research/research-architecture.md +385 -0
  3. package/content/knowledge/research/research-conventions.md +248 -0
  4. package/content/knowledge/research/research-dev-environment.md +303 -0
  5. package/content/knowledge/research/research-experiment-loop.md +429 -0
  6. package/content/knowledge/research/research-experiment-tracking.md +336 -0
  7. package/content/knowledge/research/research-ml-architecture-search.md +383 -0
  8. package/content/knowledge/research/research-ml-evaluation.md +407 -0
  9. package/content/knowledge/research/research-ml-experiment-tracking.md +466 -0
  10. package/content/knowledge/research/research-ml-training-patterns.md +413 -0
  11. package/content/knowledge/research/research-observability.md +395 -0
  12. package/content/knowledge/research/research-overfitting-prevention.md +306 -0
  13. package/content/knowledge/research/research-project-structure.md +264 -0
  14. package/content/knowledge/research/research-quant-backtesting.md +326 -0
  15. package/content/knowledge/research/research-quant-market-data.md +366 -0
  16. package/content/knowledge/research/research-quant-metrics.md +335 -0
  17. package/content/knowledge/research/research-quant-requirements.md +223 -0
  18. package/content/knowledge/research/research-quant-risk.md +469 -0
  19. package/content/knowledge/research/research-quant-strategy-patterns.md +412 -0
  20. package/content/knowledge/research/research-requirements.md +201 -0
  21. package/content/knowledge/research/research-security.md +374 -0
  22. package/content/knowledge/research/research-sim-compute-management.md +538 -0
  23. package/content/knowledge/research/research-sim-engine-patterns.md +448 -0
  24. package/content/knowledge/research/research-sim-parameter-spaces.md +425 -0
  25. package/content/knowledge/research/research-sim-validation.md +456 -0
  26. package/content/knowledge/research/research-testing.md +334 -0
  27. package/content/methodology/research-ml-research.yml +23 -0
  28. package/content/methodology/research-overlay.yml +65 -0
  29. package/content/methodology/research-quant-finance.yml +29 -0
  30. package/content/methodology/research-simulation.yml +23 -0
  31. package/dist/cli/commands/adopt.d.ts.map +1 -1
  32. package/dist/cli/commands/adopt.js +22 -1
  33. package/dist/cli/commands/adopt.js.map +1 -1
  34. package/dist/cli/commands/adopt.serialization.test.js +41 -0
  35. package/dist/cli/commands/adopt.serialization.test.js.map +1 -1
  36. package/dist/cli/commands/init.d.ts +4 -0
  37. package/dist/cli/commands/init.d.ts.map +1 -1
  38. package/dist/cli/commands/init.js +32 -2
  39. package/dist/cli/commands/init.js.map +1 -1
  40. package/dist/cli/init-flag-families.d.ts +6 -1
  41. package/dist/cli/init-flag-families.d.ts.map +1 -1
  42. package/dist/cli/init-flag-families.js +32 -1
  43. package/dist/cli/init-flag-families.js.map +1 -1
  44. package/dist/cli/init-flag-families.test.js +47 -0
  45. package/dist/cli/init-flag-families.test.js.map +1 -1
  46. package/dist/config/schema.d.ts +272 -16
  47. package/dist/config/schema.d.ts.map +1 -1
  48. package/dist/config/schema.js +25 -1
  49. package/dist/config/schema.js.map +1 -1
  50. package/dist/config/schema.test.js +103 -3
  51. package/dist/config/schema.test.js.map +1 -1
  52. package/dist/core/assembly/overlay-loader.d.ts +12 -0
  53. package/dist/core/assembly/overlay-loader.d.ts.map +1 -1
  54. package/dist/core/assembly/overlay-loader.js +30 -0
  55. package/dist/core/assembly/overlay-loader.js.map +1 -1
  56. package/dist/core/assembly/overlay-loader.test.js +66 -1
  57. package/dist/core/assembly/overlay-loader.test.js.map +1 -1
  58. package/dist/core/assembly/overlay-state-resolver.d.ts.map +1 -1
  59. package/dist/core/assembly/overlay-state-resolver.js +48 -19
  60. package/dist/core/assembly/overlay-state-resolver.js.map +1 -1
  61. package/dist/core/assembly/overlay-state-resolver.test.js +80 -0
  62. package/dist/core/assembly/overlay-state-resolver.test.js.map +1 -1
  63. package/dist/e2e/project-type-overlays.test.js +119 -0
  64. package/dist/e2e/project-type-overlays.test.js.map +1 -1
  65. package/dist/project/adopt.d.ts.map +1 -1
  66. package/dist/project/adopt.js +3 -1
  67. package/dist/project/adopt.js.map +1 -1
  68. package/dist/project/detectors/disambiguate.js +1 -1
  69. package/dist/project/detectors/disambiguate.js.map +1 -1
  70. package/dist/project/detectors/index.d.ts.map +1 -1
  71. package/dist/project/detectors/index.js +2 -1
  72. package/dist/project/detectors/index.js.map +1 -1
  73. package/dist/project/detectors/ml.d.ts.map +1 -1
  74. package/dist/project/detectors/ml.js +2 -6
  75. package/dist/project/detectors/ml.js.map +1 -1
  76. package/dist/project/detectors/research.d.ts +4 -0
  77. package/dist/project/detectors/research.d.ts.map +1 -0
  78. package/dist/project/detectors/research.js +141 -0
  79. package/dist/project/detectors/research.js.map +1 -0
  80. package/dist/project/detectors/research.test.d.ts +2 -0
  81. package/dist/project/detectors/research.test.d.ts.map +1 -0
  82. package/dist/project/detectors/research.test.js +235 -0
  83. package/dist/project/detectors/research.test.js.map +1 -0
  84. package/dist/project/detectors/shared-signals.d.ts +3 -0
  85. package/dist/project/detectors/shared-signals.d.ts.map +1 -0
  86. package/dist/project/detectors/shared-signals.js +9 -0
  87. package/dist/project/detectors/shared-signals.js.map +1 -0
  88. package/dist/project/detectors/types.d.ts +6 -2
  89. package/dist/project/detectors/types.d.ts.map +1 -1
  90. package/dist/project/detectors/types.js.map +1 -1
  91. package/dist/types/config.d.ts +7 -1
  92. package/dist/types/config.d.ts.map +1 -1
  93. package/dist/wizard/copy/core.d.ts.map +1 -1
  94. package/dist/wizard/copy/core.js +4 -0
  95. package/dist/wizard/copy/core.js.map +1 -1
  96. package/dist/wizard/copy/index.d.ts.map +1 -1
  97. package/dist/wizard/copy/index.js +2 -0
  98. package/dist/wizard/copy/index.js.map +1 -1
  99. package/dist/wizard/copy/research.d.ts +3 -0
  100. package/dist/wizard/copy/research.d.ts.map +1 -0
  101. package/dist/wizard/copy/research.js +27 -0
  102. package/dist/wizard/copy/research.js.map +1 -0
  103. package/dist/wizard/copy/types.d.ts +5 -1
  104. package/dist/wizard/copy/types.d.ts.map +1 -1
  105. package/dist/wizard/flags.d.ts +7 -1
  106. package/dist/wizard/flags.d.ts.map +1 -1
  107. package/dist/wizard/questions.d.ts +4 -2
  108. package/dist/wizard/questions.d.ts.map +1 -1
  109. package/dist/wizard/questions.js +27 -1
  110. package/dist/wizard/questions.js.map +1 -1
  111. package/dist/wizard/questions.test.js +51 -0
  112. package/dist/wizard/questions.test.js.map +1 -1
  113. package/dist/wizard/wizard.d.ts +3 -2
  114. package/dist/wizard/wizard.d.ts.map +1 -1
  115. package/dist/wizard/wizard.js +3 -1
  116. package/dist/wizard/wizard.js.map +1 -1
  117. package/package.json +1 -1
@@ -0,0 +1,336 @@
1
+ ---
2
+ name: research-experiment-tracking
3
+ description: Experiment results logging including structured result formats, run comparison, reproducibility tracking, and artifact management
4
+ topics: [research, experiment-tracking, results, comparison, reproducibility, artifacts, mlflow]
5
+ ---
6
+
7
+ Experiment tracking is the difference between research and random exploration. Without structured logging of what was tried, what resulted, and what was decided, a research project becomes impossible to audit, reproduce, or learn from. Tracking must capture the full context of every run: the exact config, the environment, the metrics, and the keep/discard decision with its rationale.
8
+
9
+ ## Summary
10
+
11
+ Log every experiment run with its complete config, environment snapshot, metrics, and decision. Use structured formats (JSON for metrics, CSV for time series, YAML for configs) that are both human-readable and machine-parseable. Implement run comparison utilities that rank runs by primary metric and highlight configuration differences. For larger projects, integrate MLflow or Weights & Biases for web-based dashboards and artifact storage. Always store enough information to reproduce any run from scratch.
12
+
13
+ ## Deep Guidance
14
+
15
+ ### Structured Result Format
16
+
17
+ Every experiment run produces a result record with six groups of fields: identity, configuration, environment, metrics, decision, and artifacts.
18
+
19
+ ```python
+ # src/tracking/result_schema.py
+ from dataclasses import dataclass, field
+ from datetime import datetime, timezone
+ from typing import Any
+
+ @dataclass
+ class RunResult:
+     """Complete record of a single experiment run."""
+     # Identity
+     run_id: str
+     experiment_id: str
+     # Timezone-aware UTC timestamp, so runs logged on different machines sort consistently
+     timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())
+
+     # Configuration (frozen snapshot)
+     config: dict[str, Any] = field(default_factory=dict)
+
+     # Environment (for reproducibility)
+     environment: dict[str, Any] = field(default_factory=dict)
+
+     # Metrics
+     metrics: dict[str, float] = field(default_factory=dict)
+     metric_history: list[dict[str, float]] = field(default_factory=list)
+
+     # Decision
+     decision: str = ""  # "keep" or "discard"
+     decision_reason: str = ""
+     is_best: bool = False
+
+     # Artifacts (paths to saved files)
+     artifact_paths: dict[str, str] = field(default_factory=dict)
+ ```
51
+
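A record like this is only useful if it survives a round trip through a structured format. The following is a minimal sketch of that round trip, assuming a trimmed-down, hypothetical `MiniRunResult` stand-in (not the full `RunResult`) and hypothetical helper names `to_json` and `from_json`:

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class MiniRunResult:
    """Hypothetical, trimmed-down stand-in for RunResult (illustration only)."""
    run_id: str
    experiment_id: str
    config: dict[str, Any] = field(default_factory=dict)
    metrics: dict[str, float] = field(default_factory=dict)
    decision: str = ""


def to_json(result: MiniRunResult) -> str:
    # asdict() recurses into nested dataclasses and dicts;
    # default=str is a fallback for values json cannot encode natively.
    return json.dumps(asdict(result), indent=2, sort_keys=True, default=str)


def from_json(text: str) -> MiniRunResult:
    # Works because every field is a plain JSON-compatible type.
    return MiniRunResult(**json.loads(text))


record = MiniRunResult(
    run_id="run-001",
    experiment_id="exp-lr-sweep",
    config={"lr": 0.01, "model": {"layers": 2}},
    metrics={"val_loss": 0.42},
    decision="keep",
)
restored = from_json(to_json(record))
```

Keeping every field JSON-compatible is what makes the reverse direction a one-liner; richer types (paths, datetimes) would need explicit conversion on load.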
52
+ ### File-Based Tracking
53
+
54
+ For small to medium projects (under ~1000 runs), file-based tracking is sufficient and has zero infrastructure dependencies:
55
+
56
+ ```python
+ # src/tracking/file_tracker.py
+ import json
+ import csv
+ from pathlib import Path
+ from src.tracking.result_schema import RunResult
+
+ class FileExperimentTracker:
+     """File-based experiment tracker with no external dependencies."""
+
+     def __init__(self, results_dir: str):
+         self.results_dir = Path(results_dir)
+         self.results_dir.mkdir(parents=True, exist_ok=True)
+         self.leaderboard_path = self.results_dir / "leaderboard.csv"
+
+     def log_run(self, result: RunResult) -> Path:
+         """Log a complete run result to disk."""
+         run_dir = self.results_dir / result.run_id
+         run_dir.mkdir(parents=True, exist_ok=True)
+
+         # Save config snapshot
+         with open(run_dir / "config.json", "w") as f:
+             json.dump(result.config, f, indent=2, default=str)
+
+         # Save environment
+         with open(run_dir / "environment.json", "w") as f:
+             json.dump(result.environment, f, indent=2)
+
+         # Save metrics
+         with open(run_dir / "metrics.json", "w") as f:
+             json.dump(result.metrics, f, indent=2)
+
+         # Save metric history (if available)
+         if result.metric_history:
+             with open(run_dir / "metric_history.csv", "w", newline="") as f:
+                 writer = csv.DictWriter(f, fieldnames=result.metric_history[0].keys())
+                 writer.writeheader()
+                 writer.writerows(result.metric_history)
+
+         # Save decision
+         with open(run_dir / "decision.json", "w") as f:
+             json.dump({
+                 "decision": result.decision,
+                 "reason": result.decision_reason,
+                 "is_best": result.is_best,
+             }, f, indent=2)
+
+         # Update leaderboard
+         self._update_leaderboard(result)
+
+         return run_dir
+
+     def _update_leaderboard(self, result: RunResult) -> None:
+         """Append to the CSV leaderboard for quick comparison."""
+         path = self.leaderboard_path
+         exists = path.exists() and path.stat().st_size > 0
+         fieldnames = ["run_id", "timestamp", "decision"] + sorted(result.metrics.keys())
+         if exists:
+             # Reuse the header already on disk so appended rows stay aligned with it.
+             # Metrics missing from that header are dropped (extrasaction="ignore").
+             with open(path, newline="") as f:
+                 fieldnames = next(csv.reader(f), fieldnames)
+
+         with open(path, "a", newline="") as f:
+             writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
+             if not exists:
+                 writer.writeheader()
+             writer.writerow({
+                 "run_id": result.run_id,
+                 "timestamp": result.timestamp,
+                 "decision": result.decision,
+                 **result.metrics,
+             })
+
+     def load_run(self, run_id: str) -> RunResult:
+         """Load a run result from disk."""
+         run_dir = self.results_dir / run_id
+         with open(run_dir / "config.json") as f:
+             config = json.load(f)
+         with open(run_dir / "environment.json") as f:
+             environment = json.load(f)
+         with open(run_dir / "metrics.json") as f:
+             metrics = json.load(f)
+         with open(run_dir / "decision.json") as f:
+             decision = json.load(f)
+
+         # Note: log_run does not persist experiment_id or timestamp in the run
+         # directory, so experiment_id is empty and timestamp defaults to load time.
+         return RunResult(
+             run_id=run_id,
+             experiment_id="",
+             config=config,
+             environment=environment,
+             metrics=metrics,
+             decision=decision["decision"],
+             decision_reason=decision["reason"],
+             is_best=decision["is_best"],
+         )
+
+     def get_leaderboard(self, sort_by: str = "", ascending: bool = False) -> list[dict]:
+         """Load the leaderboard, optionally sorted by a numeric column."""
+         if not self.leaderboard_path.exists():
+             return []
+         with open(self.leaderboard_path, newline="") as f:
+             rows = list(csv.DictReader(f))
+         if sort_by and rows:
+             # Rows with no value in the sort column (blank CSV cell) go last
+             # instead of crashing float() or being silently treated as 0.
+             scored = [r for r in rows if r.get(sort_by) not in (None, "")]
+             unscored = [r for r in rows if r.get(sort_by) in (None, "")]
+             scored.sort(key=lambda r: float(r[sort_by]), reverse=not ascending)
+             rows = scored + unscored
+         return rows
+ ```
156
+
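Because the leaderboard is derived from the per-run files, it can be regenerated if it is lost or if runs start reporting different metric names. The sketch below assumes the `metrics.json` and `decision.json` layout written by the tracker above; the function name `rebuild_leaderboard` is hypothetical, and the demo writes two throwaway runs to a temporary directory.

```python
import csv
import json
import tempfile
from pathlib import Path


def rebuild_leaderboard(results_dir: Path) -> list[str]:
    """Regenerate leaderboard.csv from per-run JSON files; returns the column names."""
    rows = []
    for run_dir in sorted(p for p in results_dir.iterdir() if p.is_dir()):
        metrics_file = run_dir / "metrics.json"
        decision_file = run_dir / "decision.json"
        if not (metrics_file.exists() and decision_file.exists()):
            continue  # skip incomplete run directories
        metrics = json.loads(metrics_file.read_text())
        decision = json.loads(decision_file.read_text())
        rows.append({"run_id": run_dir.name,
                     "decision": decision.get("decision", ""),
                     **metrics})

    # Use the union of metric names so no run's metrics are dropped.
    metric_names = sorted({k for row in rows for k in row} - {"run_id", "decision"})
    fieldnames = ["run_id", "decision"] + metric_names
    with open(results_dir / "leaderboard.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return fieldnames


# Demo with two made-up runs that report different metrics.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for run_id, metrics in [("run-a", {"loss": 0.5}),
                            ("run-b", {"loss": 0.4, "accuracy": 0.9})]:
        run_dir = root / run_id
        run_dir.mkdir()
        (run_dir / "metrics.json").write_text(json.dumps(metrics))
        (run_dir / "decision.json").write_text(json.dumps({"decision": "keep"}))
    columns = rebuild_leaderboard(root)
    with open(root / "leaderboard.csv", newline="") as f:
        leaderboard_rows = list(csv.DictReader(f))
```

Runs that lack a metric get an empty cell rather than a shifted column, which keeps the file loadable by any CSV reader.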
157
+ ### Run Comparison
158
+
159
+ Compare runs to understand what configuration changes produced which metric changes:
160
+
161
+ ```python
+ # src/tracking/comparison.py
+ from typing import Any
+
+ def compare_runs(run_a: dict[str, Any], run_b: dict[str, Any]) -> dict[str, Any]:
+     """Compare two runs, highlighting config and metric differences."""
+     config_diff = diff_dicts(run_a["config"], run_b["config"])
+     metric_diff = {}
+     for k in sorted(set(run_a["metrics"]) | set(run_b["metrics"])):
+         a = run_a["metrics"].get(k)
+         b = run_b["metrics"].get(k)
+         metric_diff[k] = {
+             "a": a,
+             "b": b,
+             # A delta is only meaningful when both runs report the metric
+             "delta": b - a if a is not None and b is not None else None,
+         }
+     return {
+         "config_diff": config_diff,
+         "metric_diff": metric_diff,
+     }
+
+ def diff_dicts(a: dict, b: dict, prefix: str = "") -> list[dict]:
+     """Recursively diff two dicts, returning changed keys."""
+     diffs = []
+     all_keys = set(a) | set(b)
+     for key in sorted(all_keys):
+         path = f"{prefix}.{key}" if prefix else key
+         val_a = a.get(key)
+         val_b = b.get(key)
+         if isinstance(val_a, dict) and isinstance(val_b, dict):
+             diffs.extend(diff_dicts(val_a, val_b, path))
+         elif val_a != val_b:
+             diffs.append({"path": path, "old": val_a, "new": val_b})
+     return diffs
+
+ def rank_runs(runs: list[dict], metric: str, direction: str = "maximize") -> list[dict]:
+     """Rank runs by a metric; runs missing the metric sort last."""
+     reverse = direction == "maximize"
+     return sorted(
+         runs,
+         key=lambda r: r["metrics"].get(metric, float("-inf") if reverse else float("inf")),
+         reverse=reverse,
+     )
+ ```
204
+
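Pairwise diffs become unwieldy once there are more than a handful of runs. One possible complement, sketched here under the assumption that each run is a dict with a nested `config`, is to list only the config keys that vary across a whole set of runs. The names `flatten` and `varying_config_keys` and the sample values are hypothetical.

```python
from typing import Any


def flatten(d: dict, prefix: str = "") -> dict[str, Any]:
    """Flatten a nested dict into dotted paths, e.g. {"model": {"layers": 2}} -> {"model.layers": 2}."""
    flat: dict[str, Any] = {}
    for key, value in d.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def varying_config_keys(runs: list[dict]) -> dict[str, list]:
    """Return config paths whose value differs in at least one run, with per-run values."""
    flat_configs = [flatten(run["config"]) for run in runs]
    all_keys = sorted({key for cfg in flat_configs for key in cfg})
    varying = {}
    for key in all_keys:
        values = [cfg.get(key) for cfg in flat_configs]  # None when a run lacks the key
        if any(v != values[0] for v in values[1:]):
            varying[key] = values
    return varying


# Made-up sample runs for illustration.
runs = [
    {"config": {"lr": 0.01, "model": {"layers": 2, "dropout": 0.1}}, "metrics": {"val_loss": 0.42}},
    {"config": {"lr": 0.001, "model": {"layers": 2, "dropout": 0.1}}, "metrics": {"val_loss": 0.39}},
    {"config": {"lr": 0.001, "model": {"layers": 4, "dropout": 0.1}}, "metrics": {"val_loss": 0.35}},
]
table = varying_config_keys(runs)
```

Constant settings such as `model.dropout` drop out of the result, leaving only the knobs that could explain a metric change.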
205
+ ### MLflow Integration
206
+
207
+ For projects with many runs or team collaboration, MLflow provides a web UI and artifact store:
208
+
209
+ ```python
+ # src/tracking/mlflow_tracker.py
+ import mlflow
+ from typing import Any
+
+ class MLflowTracker:
+     """MLflow-backed experiment tracker."""
+
+     def __init__(self, experiment_name: str, tracking_uri: str = "sqlite:///mlruns.db"):
+         mlflow.set_tracking_uri(tracking_uri)
+         mlflow.set_experiment(experiment_name)
+
+     def log_run(self, run_id: str, config: dict[str, Any],
+                 metrics: dict[str, float], artifacts: dict[str, str] | None = None,
+                 decision: str = "") -> None:
+         with mlflow.start_run(run_name=run_id):
+             # Log config as parameters (flattened)
+             flat_config = self._flatten(config)
+             mlflow.log_params(flat_config)
+
+             # Log metrics
+             for name, value in metrics.items():
+                 mlflow.log_metric(name, value)
+
+             # Log decision as tag
+             mlflow.set_tag("decision", decision)
+
+             # Log artifacts (only the file paths are used; the dict keys are labels)
+             if artifacts:
+                 for path in artifacts.values():
+                     mlflow.log_artifact(path)
+
+     @staticmethod
+     def _flatten(d: dict, prefix: str = "") -> dict[str, str]:
+         """Flatten nested dict for MLflow params (which are flat)."""
+         items = {}
+         for k, v in d.items():
+             key = f"{prefix}.{k}" if prefix else k
+             if isinstance(v, dict):
+                 items.update(MLflowTracker._flatten(v, key))
+             else:
+                 items[key] = str(v)
+         return items
+ ```
254
+
255
+ ### Reproducibility Tracking
256
+
257
+ Every run must capture enough context to be reproduced. The minimum reproducibility record:
258
+
259
+ ```python
+ # src/tracking/reproducibility.py
+ import subprocess
+ import platform
+ import hashlib
+ import json
+ import sys
+
+ def create_reproducibility_record(config: dict, data_path: str | None) -> dict:
+     """Create a record sufficient to reproduce this experiment run."""
+     return {
+         # Software
+         "python_version": platform.python_version(),
+         "platform": platform.platform(),
+         "pip_freeze": _pip_freeze(),
+         # Code
+         "git_sha": _git_sha(),
+         "git_dirty": _git_is_dirty(),
+         "git_branch": _git_branch(),
+         # Data
+         "data_hash": _hash_file(data_path) if data_path else None,
+         # Config (hash of the complete snapshot; default=str mirrors how
+         # the file tracker serializes non-JSON values)
+         "config_hash": hashlib.sha256(
+             json.dumps(config, sort_keys=True, default=str).encode()
+         ).hexdigest(),
+     }
+
+ def _pip_freeze() -> list[str]:
+     # Call pip through the running interpreter so the list matches this
+     # environment, not whichever `pip` happens to be first on PATH.
+     return subprocess.check_output(
+         [sys.executable, "-m", "pip", "freeze"], text=True
+     ).strip().splitlines()
+
+ def _git_sha() -> str:
+     return subprocess.check_output(
+         ["git", "rev-parse", "HEAD"], text=True
+     ).strip()
+
+ def _git_is_dirty() -> bool:
+     # Any output from --porcelain means uncommitted or untracked changes
+     return bool(subprocess.check_output(
+         ["git", "status", "--porcelain"], text=True
+     ).strip())
+
+ def _git_branch() -> str:
+     return subprocess.check_output(
+         ["git", "rev-parse", "--abbrev-ref", "HEAD"], text=True
+     ).strip()
+
+ def _hash_file(path: str) -> str:
+     # Stream the file in chunks so large datasets do not need to fit in memory
+     h = hashlib.sha256()
+     with open(path, "rb") as f:
+         for chunk in iter(lambda: f.read(8192), b""):
+             h.update(chunk)
+     return h.hexdigest()
+ ```
312
+
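A stored record pays off when it is checked before a re-run. The following sketch compares a recorded dict against one captured in the current environment and lists what has drifted. It assumes the key names used in the record above; the function name `check_reproducibility` and all sample values are hypothetical.

```python
def check_reproducibility(recorded: dict, current: dict,
                          strict_keys: tuple[str, ...] = ("git_sha", "data_hash", "config_hash"),
                          ) -> list[str]:
    """Return human-readable mismatches between a recorded run and the current environment."""
    problems = []
    if recorded.get("git_dirty"):
        # A dirty tree means the recorded SHA does not fully identify the code.
        problems.append("recorded run had uncommitted changes")
    for key in strict_keys:
        if recorded.get(key) != current.get(key):
            problems.append(
                f"{key} differs: recorded={recorded.get(key)!r} current={current.get(key)!r}"
            )
    # Symmetric difference catches added, removed, and re-pinned packages.
    changed = sorted(set(recorded.get("pip_freeze", [])) ^ set(current.get("pip_freeze", [])))
    if changed:
        problems.append(f"package set differs: {changed}")
    return problems


# Made-up sample records for illustration.
recorded = {
    "git_sha": "abc123",
    "git_dirty": False,
    "data_hash": "d1",
    "config_hash": "c1",
    "pip_freeze": ["numpy==1.26.4"],
}
current = {**recorded, "git_sha": "def456", "pip_freeze": ["numpy==2.0.0"]}
problems = check_reproducibility(recorded, current)
```

An empty list means the code, data, config, and package set all match the recorded run; anything else is a reason to expect different results.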
313
+ ### Artifact Management
314
+
315
+ Artifacts are files produced during experiment execution that are too large or complex for JSON metrics:
316
+
317
+ | Artifact Type | Format | Storage |
318
+ |---------------|--------|---------|
319
+ | Model checkpoints | `.pt`, `.pkl`, `.joblib` | `results/{run}/artifacts/` |
320
+ | Equity curves | `.csv`, `.parquet` | `results/{run}/artifacts/` |
321
+ | Plots | `.png`, `.svg` | `results/{run}/artifacts/` |
322
+ | Logs | `.txt`, `.log` | `results/{run}/log.txt` |
323
+ | Configs | `.yml`, `.json` | `results/{run}/config.json` |
324
+
325
+ **Storage strategy**:
326
+ - Small artifacts (< 10 MB): Store in the run directory
327
+ - Large artifacts (10 MB or larger): Store in cloud storage (S3, GCS) with a reference in the run record
328
+ - Transient artifacts (intermediate checkpoints): Delete after the run unless explicitly requested
329
+
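The storage strategy above can be sketched as a small routing function. This is an illustration only: the name `plan_artifact_storage`, the `s3://example-bucket/experiments` prefix, and the reading of 10 MB as 10 × 1024 × 1024 bytes are assumptions, and the function only plans where a file should go without uploading or deleting anything.

```python
import tempfile
from pathlib import Path

# Assumption: "10 MB" is interpreted as 10 * 1024 * 1024 bytes.
SIZE_THRESHOLD_BYTES = 10 * 1024 * 1024


def plan_artifact_storage(path: Path, run_id: str, transient: bool = False,
                          threshold_bytes: int = SIZE_THRESHOLD_BYTES,
                          remote_prefix: str = "s3://example-bucket/experiments") -> dict:
    """Decide where an artifact should live; performs no I/O beyond reading the file size."""
    size = path.stat().st_size
    if transient:
        # Intermediate checkpoints are removed unless explicitly kept.
        return {"action": "delete_after_run", "size_bytes": size}
    if size < threshold_bytes:
        return {"action": "store_local",
                "location": f"results/{run_id}/artifacts/{path.name}",
                "size_bytes": size}
    # Large files go to remote storage; the run record keeps only this reference.
    return {"action": "store_remote",
            "location": f"{remote_prefix}/{run_id}/{path.name}",
            "size_bytes": size}


with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "equity_curve.csv"
    artifact.write_text("day,equity\n1,100.0\n")
    small_plan = plan_artifact_storage(artifact, "run-001")
    # Lower the threshold to exercise the remote branch without writing a 10 MB file.
    large_plan = plan_artifact_storage(artifact, "run-001", threshold_bytes=4)
    transient_plan = plan_artifact_storage(artifact, "run-001", transient=True)
```

Returning a plan rather than moving files keeps the decision testable and leaves the actual upload to whatever storage client the project uses.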
330
+ ### Tracking Best Practices
331
+
332
+ 1. **Log everything, filter later**: It is cheaper to log too much than to re-run an experiment because you forgot to record a parameter.
333
+ 2. **Structured formats only**: JSON and CSV, never unstructured text logs for metrics. Text logs are for debugging, not analysis.
334
+ 3. **Immutable run records**: Once a run is recorded, never modify its metrics or config. If a metric was computed incorrectly, add a new metric column rather than editing the old one.
335
+ 4. **Leaderboard as index**: Maintain a single CSV leaderboard that can be loaded into pandas for quick analysis. Do not rely on scanning individual run directories.
336
+ 5. **Version the tracking schema**: If you add new metrics or change the format, version the schema so old runs remain parseable.
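
Schema versioning (practice 5) can be sketched as a `schema_version` field plus a chain of small migration functions applied on load. Everything here is hypothetical: the version numbers, the `upgrade_record` helper, and the example change (renaming `reason` to `decision_reason`) are illustrative, not part of the tracker above.

```python
CURRENT_SCHEMA_VERSION = 2


def _v1_to_v2(record: dict) -> dict:
    """Hypothetical migration: v2 renames "reason" to "decision_reason"."""
    upgraded = dict(record)
    upgraded["decision_reason"] = upgraded.pop("reason", "")
    upgraded["schema_version"] = 2
    return upgraded


# Maps a version to the function that upgrades it by one step.
MIGRATIONS = {1: _v1_to_v2}


def upgrade_record(record: dict) -> dict:
    """Return a copy of the record upgraded to the current schema version."""
    upgraded = dict(record)  # never mutate the stored record
    version = upgraded.get("schema_version", 1)  # unversioned records count as v1
    if version > CURRENT_SCHEMA_VERSION:
        raise ValueError(f"record schema {version} is newer than supported {CURRENT_SCHEMA_VERSION}")
    while version < CURRENT_SCHEMA_VERSION:
        upgraded = MIGRATIONS[version](upgraded)
        version = upgraded["schema_version"]
    return upgraded


old_record = {"decision": "keep", "reason": "beat baseline", "is_best": True}
new_record = upgrade_record(old_record)
```

Upgrading on read leaves the files on disk untouched, which is consistent with treating run records as immutable (practice 3).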