@zigrivers/scaffold 3.13.0 → 3.15.0

Files changed (180)
  1. package/README.md +32 -10
  2. package/content/knowledge/research/research-architecture.md +385 -0
  3. package/content/knowledge/research/research-conventions.md +248 -0
  4. package/content/knowledge/research/research-dev-environment.md +303 -0
  5. package/content/knowledge/research/research-experiment-loop.md +429 -0
  6. package/content/knowledge/research/research-experiment-tracking.md +336 -0
  7. package/content/knowledge/research/research-ml-architecture-search.md +383 -0
  8. package/content/knowledge/research/research-ml-evaluation.md +407 -0
  9. package/content/knowledge/research/research-ml-experiment-tracking.md +466 -0
  10. package/content/knowledge/research/research-ml-training-patterns.md +413 -0
  11. package/content/knowledge/research/research-observability.md +395 -0
  12. package/content/knowledge/research/research-overfitting-prevention.md +306 -0
  13. package/content/knowledge/research/research-project-structure.md +264 -0
  14. package/content/knowledge/research/research-quant-backtesting.md +326 -0
  15. package/content/knowledge/research/research-quant-market-data.md +366 -0
  16. package/content/knowledge/research/research-quant-metrics.md +335 -0
  17. package/content/knowledge/research/research-quant-requirements.md +223 -0
  18. package/content/knowledge/research/research-quant-risk.md +469 -0
  19. package/content/knowledge/research/research-quant-strategy-patterns.md +412 -0
  20. package/content/knowledge/research/research-requirements.md +201 -0
  21. package/content/knowledge/research/research-security.md +374 -0
  22. package/content/knowledge/research/research-sim-compute-management.md +538 -0
  23. package/content/knowledge/research/research-sim-engine-patterns.md +448 -0
  24. package/content/knowledge/research/research-sim-parameter-spaces.md +425 -0
  25. package/content/knowledge/research/research-sim-validation.md +456 -0
  26. package/content/knowledge/research/research-testing.md +334 -0
  27. package/content/methodology/research-ml-research.yml +23 -0
  28. package/content/methodology/research-overlay.yml +65 -0
  29. package/content/methodology/research-quant-finance.yml +29 -0
  30. package/content/methodology/research-simulation.yml +23 -0
  31. package/dist/cli/commands/adopt.d.ts.map +1 -1
  32. package/dist/cli/commands/adopt.js +30 -8
  33. package/dist/cli/commands/adopt.js.map +1 -1
  34. package/dist/cli/commands/adopt.serialization.test.js +49 -0
  35. package/dist/cli/commands/adopt.serialization.test.js.map +1 -1
  36. package/dist/cli/commands/adopt.test.js +8 -0
  37. package/dist/cli/commands/adopt.test.js.map +1 -1
  38. package/dist/cli/commands/build.d.ts.map +1 -1
  39. package/dist/cli/commands/build.js +191 -180
  40. package/dist/cli/commands/build.js.map +1 -1
  41. package/dist/cli/commands/complete.d.ts.map +1 -1
  42. package/dist/cli/commands/complete.js +16 -12
  43. package/dist/cli/commands/complete.js.map +1 -1
  44. package/dist/cli/commands/complete.test.js +14 -5
  45. package/dist/cli/commands/complete.test.js.map +1 -1
  46. package/dist/cli/commands/init.d.ts +4 -0
  47. package/dist/cli/commands/init.d.ts.map +1 -1
  48. package/dist/cli/commands/init.js +75 -51
  49. package/dist/cli/commands/init.js.map +1 -1
  50. package/dist/cli/commands/init.test.js +33 -27
  51. package/dist/cli/commands/init.test.js.map +1 -1
  52. package/dist/cli/commands/reset.d.ts.map +1 -1
  53. package/dist/cli/commands/reset.js +44 -40
  54. package/dist/cli/commands/reset.js.map +1 -1
  55. package/dist/cli/commands/reset.test.js +42 -20
  56. package/dist/cli/commands/reset.test.js.map +1 -1
  57. package/dist/cli/commands/rework.d.ts.map +1 -1
  58. package/dist/cli/commands/rework.js +16 -12
  59. package/dist/cli/commands/rework.js.map +1 -1
  60. package/dist/cli/commands/rework.test.js +12 -3
  61. package/dist/cli/commands/rework.test.js.map +1 -1
  62. package/dist/cli/commands/run.d.ts.map +1 -1
  63. package/dist/cli/commands/run.js +318 -298
  64. package/dist/cli/commands/run.js.map +1 -1
  65. package/dist/cli/commands/run.test.js +92 -120
  66. package/dist/cli/commands/run.test.js.map +1 -1
  67. package/dist/cli/commands/skip.d.ts.map +1 -1
  68. package/dist/cli/commands/skip.js +19 -15
  69. package/dist/cli/commands/skip.js.map +1 -1
  70. package/dist/cli/commands/skip.test.js +22 -11
  71. package/dist/cli/commands/skip.test.js.map +1 -1
  72. package/dist/cli/commands/update.d.ts.map +1 -1
  73. package/dist/cli/commands/update.js +3 -1
  74. package/dist/cli/commands/update.js.map +1 -1
  75. package/dist/cli/commands/update.test.js +8 -4
  76. package/dist/cli/commands/update.test.js.map +1 -1
  77. package/dist/cli/commands/version.d.ts.map +1 -1
  78. package/dist/cli/commands/version.js +3 -1
  79. package/dist/cli/commands/version.js.map +1 -1
  80. package/dist/cli/commands/version.test.js +9 -5
  81. package/dist/cli/commands/version.test.js.map +1 -1
  82. package/dist/cli/index.d.ts.map +1 -1
  83. package/dist/cli/index.js +2 -0
  84. package/dist/cli/index.js.map +1 -1
  85. package/dist/cli/init-flag-families.d.ts +6 -1
  86. package/dist/cli/init-flag-families.d.ts.map +1 -1
  87. package/dist/cli/init-flag-families.js +32 -1
  88. package/dist/cli/init-flag-families.js.map +1 -1
  89. package/dist/cli/init-flag-families.test.js +47 -0
  90. package/dist/cli/init-flag-families.test.js.map +1 -1
  91. package/dist/cli/output/interactive.d.ts +1 -0
  92. package/dist/cli/output/interactive.d.ts.map +1 -1
  93. package/dist/cli/output/interactive.js +5 -0
  94. package/dist/cli/output/interactive.js.map +1 -1
  95. package/dist/cli/shutdown.d.ts +51 -0
  96. package/dist/cli/shutdown.d.ts.map +1 -0
  97. package/dist/cli/shutdown.js +199 -0
  98. package/dist/cli/shutdown.js.map +1 -0
  99. package/dist/cli/shutdown.test.d.ts +2 -0
  100. package/dist/cli/shutdown.test.d.ts.map +1 -0
  101. package/dist/cli/shutdown.test.js +316 -0
  102. package/dist/cli/shutdown.test.js.map +1 -0
  103. package/dist/config/schema.d.ts +272 -16
  104. package/dist/config/schema.d.ts.map +1 -1
  105. package/dist/config/schema.js +25 -1
  106. package/dist/config/schema.js.map +1 -1
  107. package/dist/config/schema.test.js +103 -3
  108. package/dist/config/schema.test.js.map +1 -1
  109. package/dist/core/assembly/overlay-loader.d.ts +12 -0
  110. package/dist/core/assembly/overlay-loader.d.ts.map +1 -1
  111. package/dist/core/assembly/overlay-loader.js +30 -0
  112. package/dist/core/assembly/overlay-loader.js.map +1 -1
  113. package/dist/core/assembly/overlay-loader.test.js +66 -1
  114. package/dist/core/assembly/overlay-loader.test.js.map +1 -1
  115. package/dist/core/assembly/overlay-state-resolver.d.ts.map +1 -1
  116. package/dist/core/assembly/overlay-state-resolver.js +48 -19
  117. package/dist/core/assembly/overlay-state-resolver.js.map +1 -1
  118. package/dist/core/assembly/overlay-state-resolver.test.js +80 -0
  119. package/dist/core/assembly/overlay-state-resolver.test.js.map +1 -1
  120. package/dist/e2e/init.test.js +5 -4
  121. package/dist/e2e/init.test.js.map +1 -1
  122. package/dist/e2e/project-type-overlays.test.js +119 -0
  123. package/dist/e2e/project-type-overlays.test.js.map +1 -1
  124. package/dist/project/adopt.d.ts.map +1 -1
  125. package/dist/project/adopt.js +3 -1
  126. package/dist/project/adopt.js.map +1 -1
  127. package/dist/project/detectors/disambiguate.js +1 -1
  128. package/dist/project/detectors/disambiguate.js.map +1 -1
  129. package/dist/project/detectors/index.d.ts.map +1 -1
  130. package/dist/project/detectors/index.js +2 -1
  131. package/dist/project/detectors/index.js.map +1 -1
  132. package/dist/project/detectors/ml.d.ts.map +1 -1
  133. package/dist/project/detectors/ml.js +2 -6
  134. package/dist/project/detectors/ml.js.map +1 -1
  135. package/dist/project/detectors/research.d.ts +4 -0
  136. package/dist/project/detectors/research.d.ts.map +1 -0
  137. package/dist/project/detectors/research.js +141 -0
  138. package/dist/project/detectors/research.js.map +1 -0
  139. package/dist/project/detectors/research.test.d.ts +2 -0
  140. package/dist/project/detectors/research.test.d.ts.map +1 -0
  141. package/dist/project/detectors/research.test.js +235 -0
  142. package/dist/project/detectors/research.test.js.map +1 -0
  143. package/dist/project/detectors/shared-signals.d.ts +3 -0
  144. package/dist/project/detectors/shared-signals.d.ts.map +1 -0
  145. package/dist/project/detectors/shared-signals.js +9 -0
  146. package/dist/project/detectors/shared-signals.js.map +1 -0
  147. package/dist/project/detectors/types.d.ts +6 -2
  148. package/dist/project/detectors/types.d.ts.map +1 -1
  149. package/dist/project/detectors/types.js.map +1 -1
  150. package/dist/state/lock-manager.d.ts +1 -0
  151. package/dist/state/lock-manager.d.ts.map +1 -1
  152. package/dist/state/lock-manager.js +1 -1
  153. package/dist/state/lock-manager.js.map +1 -1
  154. package/dist/types/config.d.ts +7 -1
  155. package/dist/types/config.d.ts.map +1 -1
  156. package/dist/wizard/copy/core.d.ts.map +1 -1
  157. package/dist/wizard/copy/core.js +4 -0
  158. package/dist/wizard/copy/core.js.map +1 -1
  159. package/dist/wizard/copy/index.d.ts.map +1 -1
  160. package/dist/wizard/copy/index.js +2 -0
  161. package/dist/wizard/copy/index.js.map +1 -1
  162. package/dist/wizard/copy/research.d.ts +3 -0
  163. package/dist/wizard/copy/research.d.ts.map +1 -0
  164. package/dist/wizard/copy/research.js +27 -0
  165. package/dist/wizard/copy/research.js.map +1 -0
  166. package/dist/wizard/copy/types.d.ts +5 -1
  167. package/dist/wizard/copy/types.d.ts.map +1 -1
  168. package/dist/wizard/flags.d.ts +7 -1
  169. package/dist/wizard/flags.d.ts.map +1 -1
  170. package/dist/wizard/questions.d.ts +4 -2
  171. package/dist/wizard/questions.d.ts.map +1 -1
  172. package/dist/wizard/questions.js +27 -1
  173. package/dist/wizard/questions.js.map +1 -1
  174. package/dist/wizard/questions.test.js +51 -0
  175. package/dist/wizard/questions.test.js.map +1 -1
  176. package/dist/wizard/wizard.d.ts +3 -2
  177. package/dist/wizard/wizard.d.ts.map +1 -1
  178. package/dist/wizard/wizard.js +3 -1
  179. package/dist/wizard/wizard.js.map +1 -1
  180. package/package.json +1 -1
@@ -0,0 +1,429 @@
1
+ ---
2
+ name: research-experiment-loop
3
+ description: Autonomous experiment loop patterns including hypothesis-execute-evaluate-keep/discard cycle, iteration control, budget management, and early stopping
4
+ topics: [research, experiment-loop, autonomous, iteration, budget, early-stopping, hypothesis]
5
+ ---
6
+
7
+ The experiment loop is the defining pattern of research projects: an agent iteratively generates hypotheses, executes experiments, evaluates results, and makes keep/discard decisions. The loop can run in one of three modes: autonomous (the agent decides everything), checkpoint-gated (the agent pauses for human review at intervals), or human-guided (the human decides what to try and the agent executes it). The loop's correctness depends on proper iteration control, budget enforcement, and state management. Without these, an autonomous agent can iterate indefinitely or lose track of what has already been tried.
8
+
9
+ ## Summary
10
+
11
+ Implement the experiment loop as a state machine with four phases: hypothesize (select what to try next), execute (run the experiment), evaluate (compute metrics and compare to baseline), and decide (keep or discard based on success criteria). Enforce iteration budgets (run count, wall time, compute cost) and early stopping (convergence detection, diminishing returns). Persist full loop state to disk after every iteration so the loop can resume after interruption. For autonomous mode, implement safety limits that cannot be overridden by the agent.
12
+
13
+ ## Deep Guidance
14
+
15
+ ### The Four-Phase Loop
16
+
+ ```
+ ┌────────────────┐
+ │  Hypothesize   │◄─────────────────────────┐
+ │  (what next?)  │                          │
+ └───────┬────────┘                          │
+         │                                   │
+ ┌───────▼────────┐                          │
+ │    Execute     │                          │
+ │    (run it)    │                          │
+ └───────┬────────┘                          │
+         │                                   │
+ ┌───────▼────────┐                          │
+ │    Evaluate    │                          │
+ │   (measure)    │                          │
+ └───────┬────────┘                          │
+         │                                   │
+ ┌───────▼────────┐     ┌──────────┐         │
+ │     Decide     │────►│   Keep   │─────────┤
+ │ (keep/discard) │     └──────────┘         │
+ └───────┬────────┘     ┌──────────┐         │
+         └─────────────►│ Discard  │─────────┘
+                        └──────────┘
+ ```
40
+
41
+ ### State Machine Implementation
42
+
+ ```python
+ # src/loop/state_machine.py
+ from enum import Enum, auto
+ from dataclasses import dataclass, field
+ from typing import Any
+ import time
+
+ class LoopPhase(Enum):
+     HYPOTHESIZE = auto()
+     EXECUTE = auto()
+     EVALUATE = auto()
+     DECIDE = auto()
+     STOPPED = auto()
+
+ @dataclass
+ class LoopState:
+     """Full state of the experiment loop, persisted after every transition."""
+     phase: LoopPhase = LoopPhase.HYPOTHESIZE
+     iteration: int = 0
+     current_hypothesis: dict[str, Any] | None = None
+     current_results: dict[str, Any] | None = None
+     current_metrics: dict[str, float] | None = None
+     best_metrics: dict[str, float] | None = None
+     best_hypothesis: dict[str, Any] | None = None
+     best_iteration: int = 0
+     history: list[dict] = field(default_factory=list)
+     start_time: float = field(default_factory=time.time)
+     stop_reason: str = ""
+
+ class ExperimentLoop:
+     """State machine for the experiment loop."""
+
+     def __init__(self, strategy, evaluator, budget, tracker,
+                  state: LoopState | None = None):
+         self.strategy = strategy
+         self.evaluator = evaluator
+         self.budget = budget
+         self.tracker = tracker
+         self.state = state or LoopState()
+
+     def step(self) -> LoopPhase:
+         """Execute one phase transition. Returns the new phase."""
+         match self.state.phase:
+             case LoopPhase.HYPOTHESIZE:
+                 return self._hypothesize()
+             case LoopPhase.EXECUTE:
+                 return self._execute()
+             case LoopPhase.EVALUATE:
+                 return self._evaluate()
+             case LoopPhase.DECIDE:
+                 return self._decide()
+             case LoopPhase.STOPPED:
+                 return LoopPhase.STOPPED
+
+     def run(self) -> LoopState:
+         """Run the loop until stopped."""
+         while self.state.phase != LoopPhase.STOPPED:
+             self.step()
+             self.tracker.save_state(self.state)
+         return self.state
+
+     def _hypothesize(self) -> LoopPhase:
+         """Generate the next hypothesis to test."""
+         # Check budget before starting a new iteration
+         exhausted, reason = self.budget.check(self.state)
+         if exhausted:
+             self.state.stop_reason = reason
+             self.state.phase = LoopPhase.STOPPED
+             return LoopPhase.STOPPED
+
+         self.state.iteration += 1
+         self.state.current_hypothesis = self.strategy.next_hypothesis(self.state)
+         self.state.phase = LoopPhase.EXECUTE
+         return LoopPhase.EXECUTE
+
+     def _execute(self) -> LoopPhase:
+         """Execute the current hypothesis."""
+         self.state.current_results = self.strategy.execute(
+             self.state.current_hypothesis
+         )
+         self.state.phase = LoopPhase.EVALUATE
+         return LoopPhase.EVALUATE
+
+     def _evaluate(self) -> LoopPhase:
+         """Evaluate execution results."""
+         self.state.current_metrics = self.evaluator.evaluate(
+             self.state.current_results
+         )
+         self.state.phase = LoopPhase.DECIDE
+         return LoopPhase.DECIDE
+
+     def _decide(self) -> LoopPhase:
+         """Decide whether to keep or discard the current run."""
+         is_improvement = (
+             self.state.best_metrics is None
+             or self.evaluator.is_improvement(
+                 self.state.current_metrics, self.state.best_metrics
+             )
+         )
+
+         decision = "keep" if is_improvement else "discard"
+
+         if is_improvement:
+             self.state.best_metrics = self.state.current_metrics
+             self.state.best_hypothesis = self.state.current_hypothesis
+             self.state.best_iteration = self.state.iteration
+
+         # Record to history
+         self.state.history.append({
+             "iteration": self.state.iteration,
+             "hypothesis": self.state.current_hypothesis,
+             "metrics": self.state.current_metrics,
+             "decision": decision,
+         })
+
+         self.tracker.log_decision(
+             iteration=self.state.iteration,
+             decision=decision,
+             metrics=self.state.current_metrics,
+         )
+
+         # Reset for next iteration
+         self.state.current_hypothesis = None
+         self.state.current_results = None
+         self.state.current_metrics = None
+         self.state.phase = LoopPhase.HYPOTHESIZE
+         return LoopPhase.HYPOTHESIZE
+ ```
171
+
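The state machine above calls `evaluator.evaluate(results)` and `evaluator.is_improvement(current, best)` but does not show an evaluator. The following is a minimal sketch of one possible shape, assuming runs are compared on a single primary metric. The class name `PrimaryMetricEvaluator` and the `metric`, `maximize`, and `min_delta` fields are hypothetical and not part of the package.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class PrimaryMetricEvaluator:
    """Hypothetical evaluator: compares runs on one primary metric."""
    metric: str = "accuracy"   # assumed metric name
    maximize: bool = True      # set False for losses and error rates
    min_delta: float = 0.0     # smallest change that counts as improvement

    def evaluate(self, results: dict[str, Any]) -> dict[str, float]:
        """Keep only the numeric entries of the raw results as metrics."""
        return {
            key: float(value)
            for key, value in results.items()
            if isinstance(value, (int, float)) and not isinstance(value, bool)
        }

    def is_improvement(self, current: dict[str, float],
                       best: dict[str, float]) -> bool:
        """Return True if `current` beats `best` by more than `min_delta`."""
        if self.metric not in current:
            return False  # a run without the primary metric never wins
        if self.metric not in best:
            return True
        delta = current[self.metric] - best[self.metric]
        if not self.maximize:
            delta = -delta
        return delta > self.min_delta
```

Making the direction explicit keeps the keep/discard decision correct for metrics where lower is better, which a bare `>` comparison would get wrong.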
172
+ ### Git-Based Keep/Discard (Code-Driven)
173
+
174
+ For code-driven experiments, git is the state machine. The agent creates a branch, modifies code, runs the experiment, and either merges (keep) or deletes the branch (discard):
175
+
+ ```python
+ # src/loop/git_state.py
+ import subprocess
+ import logging
+
+ logger = logging.getLogger(__name__)
+
+ class GitExperimentState:
+     """Git-based state management for code-driven experiments."""
+
+     def __init__(self, base_branch: str = "main"):
+         self.base_branch = base_branch
+
+     def start_experiment(self, experiment_id: str) -> str:
+         """Create a new experiment branch."""
+         branch = f"exp/{experiment_id}"
+         subprocess.run(
+             ["git", "checkout", "-b", branch, self.base_branch],
+             check=True, capture_output=True,
+         )
+         logger.info("Created experiment branch: %s", branch)
+         return branch
+
+     def keep(self, branch: str, message: str) -> None:
+         """Merge experiment branch into the base branch (keep decision)."""
+         subprocess.run(
+             ["git", "checkout", self.base_branch],
+             check=True, capture_output=True,
+         )
+         subprocess.run(
+             ["git", "merge", "--no-ff", branch, "-m", message],
+             check=True, capture_output=True,
+         )
+         subprocess.run(
+             ["git", "branch", "-d", branch],
+             check=True, capture_output=True,
+         )
+         logger.info("Kept experiment: %s", branch)
+
+     def discard(self, branch: str) -> None:
+         """Delete experiment branch (discard decision)."""
+         subprocess.run(
+             ["git", "checkout", self.base_branch],
+             check=True, capture_output=True,
+         )
+         # Tag for reference before deleting
+         tag = f"archive/{branch}"
+         subprocess.run(
+             ["git", "tag", tag, branch],
+             capture_output=True,  # Don't fail if tag exists
+         )
+         subprocess.run(
+             ["git", "branch", "-D", branch],
+             check=True, capture_output=True,
+         )
+         logger.info("Discarded experiment: %s (tagged as %s)", branch, tag)
+
+     def revert_to_baseline(self) -> None:
+         """Hard reset to the base branch (emergency recovery)."""
+         # --force discards uncommitted changes that would otherwise block the checkout
+         subprocess.run(
+             ["git", "checkout", "--force", self.base_branch],
+             check=True, capture_output=True,
+         )
+         subprocess.run(
+             ["git", "reset", "--hard", self.base_branch],
+             check=True, capture_output=True,
+         )
+         logger.warning("Reverted to baseline: %s", self.base_branch)
+ ```
245
+
246
+ ### Interaction Modes
247
+
248
+ **Autonomous mode**: The loop runs without human intervention. Safety limits are critical:
249
+
+ ```python
+ # Autonomous mode: hard safety limits
+ AUTONOMOUS_LIMITS = {
+     "max_runs": 1000,               # Absolute maximum, non-overridable
+     "max_wall_hours": 72,           # 3-day hard cap
+     "max_cost_usd": 500,            # Cost ceiling
+     "max_consecutive_errors": 10,   # Stop if 10 runs fail in a row
+ }
+ ```
259
+
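A plain dict can be mutated by any code that imports it, so the limits above are only non-overridable by convention. One way to tighten this inside a single process is sketched below, assuming the same four limit names. `HARD_LIMITS` and `clamp_to_hard_limits` are hypothetical names, and this guards against accidental in-process mutation only, not against an agent that can edit the source file.

```python
from types import MappingProxyType

# Read-only view of the limits: item assignment raises TypeError.
HARD_LIMITS = MappingProxyType({
    "max_runs": 1000,
    "max_wall_hours": 72,
    "max_cost_usd": 500,
    "max_consecutive_errors": 10,
})


def clamp_to_hard_limits(requested: dict) -> dict:
    """Return effective limits: requested values capped by HARD_LIMITS.

    Unknown keys are rejected; missing keys fall back to the hard limit.
    """
    unknown = set(requested) - set(HARD_LIMITS)
    if unknown:
        raise KeyError(f"Unknown limit(s): {sorted(unknown)}")
    return {
        name: min(requested.get(name, ceiling), ceiling)
        for name, ceiling in HARD_LIMITS.items()
    }
```

With this shape, a strategy or agent may request tighter limits (for example `{"max_cost_usd": 50}`) but a request above a ceiling is silently reduced to the ceiling.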
260
+ **Checkpoint-gated mode**: The loop pauses for human review at intervals. The checkpoint gate is a blocking call that waits for human input:
261
+
+ ```python
+ # src/loop/checkpoint.py
+ import logging
+
+ logger = logging.getLogger(__name__)
+
+ class CheckpointGate:
+     """Pause the experiment loop for human review."""
+
+     def __init__(self, interval: int = 100):
+         self.interval = interval
+
+     def should_checkpoint(self, iteration: int) -> bool:
+         return iteration > 0 and iteration % self.interval == 0
+
+     def checkpoint(self, state: "LoopState") -> bool:
+         """Present state to human, return True to continue, False to stop."""
+         print(f"\n{'='*60}")
+         print(f"CHECKPOINT - Iteration {state.iteration}")
+         print(f"Best so far: {state.best_metrics}")
+         print(f"Best found at iteration: {state.best_iteration}")
+         print(f"Runs since improvement: {state.iteration - state.best_iteration}")
+         print(f"{'='*60}")
+
+         # Re-prompt until the answer is recognized
+         while True:
+             response = input("Continue? [y/n/s(kip to next checkpoint)]: ").lower()
+             if response in ("y", "yes"):
+                 return True
+             elif response in ("n", "no"):
+                 return False
+             elif response in ("s", "skip"):
+                 # As written, "skip" behaves the same as "yes"
+                 return True
+ ```
295
+
296
+ **Human-guided mode**: The human provides the hypothesis, the agent executes and evaluates. The loop does not auto-generate hypotheses -- it waits for input:
297
+
+ ```python
+ # Human-guided: the strategy's next_hypothesis() prompts the user
+ class HumanGuidedStrategy:
+     def next_hypothesis(self, state):
+         print(f"\nCurrent best: {state.best_metrics}")
+         print("Enter next experiment parameters (or 'quit'):")
+         # ... interactive parameter input ...
+ ```
306
+
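The interactive input step is elided above. As an illustration only, here is a sketch of one way to read parameters, assuming the human types space-separated `key=value` pairs. `parse_hypothesis` and `PromptingStrategy` are hypothetical names, and returning `None` on `quit` is an assumption: the loop shown earlier would need to treat `None` as a stop signal.

```python
import json
from typing import Any, Callable, Optional


def parse_hypothesis(line: str) -> Optional[dict[str, Any]]:
    """Parse 'key=value key=value' into a dict; return None on 'quit'.

    Values are decoded as JSON when possible (numbers, true/false),
    otherwise they are kept as plain strings.
    """
    line = line.strip()
    if line.lower() in ("quit", "q"):
        return None
    params: dict[str, Any] = {}
    for token in line.split():
        key, sep, raw = token.partition("=")
        if not sep or not key:
            raise ValueError(f"Expected key=value, got {token!r}")
        try:
            params[key] = json.loads(raw)
        except json.JSONDecodeError:
            params[key] = raw
    return params


class PromptingStrategy:
    """Hypothetical human-guided strategy with an injectable line reader."""

    def __init__(self, read_line: Callable[[str], str] = input):
        self.read_line = read_line

    def next_hypothesis(self, state) -> Optional[dict[str, Any]]:
        best = getattr(state, "best_metrics", None)
        print(f"\nCurrent best: {best}")
        return parse_hypothesis(
            self.read_line("Enter next experiment parameters (or 'quit'): ")
        )
```

Injecting `read_line` instead of calling `input()` directly keeps the strategy testable without a terminal.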
307
+ ### Early Stopping
308
+
309
+ Early stopping detects when further iteration is unlikely to produce meaningful improvements:
310
+
+ ```python
+ # src/loop/early_stopping.py
+ import numpy as np
+ from typing import Optional
+
+ class EarlyStoppingMonitor:
+     """Monitor experiment progress and detect when to stop.
+
+     Assumes a higher metric value is better.
+     """
+
+     def __init__(self, patience: int = 50, min_delta: float = 1e-4,
+                  convergence_window: int = 20):
+         self.patience = patience
+         self.min_delta = min_delta
+         self.convergence_window = convergence_window
+         self.best_value: Optional[float] = None
+         self.wait: int = 0
+         self.history: list[float] = []
+
+     def update(self, value: float) -> tuple[bool, str]:
+         """Update with new metric value. Returns (should_stop, reason)."""
+         self.history.append(value)
+
+         # Check patience (no improvement for N iterations)
+         if self.best_value is None or value > self.best_value + self.min_delta:
+             self.best_value = value
+             self.wait = 0
+         else:
+             self.wait += 1
+             if self.wait >= self.patience:
+                 return True, f"No improvement for {self.patience} iterations"
+
+         # Check convergence (metric has plateaued): compare the mean of the
+         # most recent window with the mean of the window before it
+         if len(self.history) >= self.convergence_window * 2:
+             recent = np.array(self.history[-self.convergence_window:])
+             prior = np.array(
+                 self.history[-2 * self.convergence_window:-self.convergence_window]
+             )
+             if abs(recent.mean() - prior.mean()) < self.min_delta:
+                 return True, "Metric has converged (plateau detected)"
+
+         return False, ""
+ ```
352
+
353
+ ### Budget Enforcement
354
+
355
+ Budget limits must be enforced at the loop level, not delegated to the strategy. The strategy should not be able to override budget limits:
356
+
+ ```python
+ # src/loop/budget.py
+ import time
+ from dataclasses import dataclass, field
+ from datetime import timedelta
+
+ @dataclass
+ class BudgetEnforcer:
+     """Enforce hard limits on experiment iteration."""
+     max_runs: int = 500
+     max_wall_seconds: float = 48 * 3600  # 48 hours
+     max_consecutive_errors: int = 10
+     patience: int = 50
+
+     # Defaults to construction time so check() does not report an
+     # immediate time-limit stop when start() was never called
+     _start_time: float = field(default_factory=time.time)
+     _consecutive_errors: int = 0
+
+     def start(self) -> None:
+         self._start_time = time.time()
+         self._consecutive_errors = 0
+
+     def record_success(self) -> None:
+         self._consecutive_errors = 0
+
+     def record_error(self) -> None:
+         self._consecutive_errors += 1
+
+     def check(self, state) -> tuple[bool, str]:
+         """Check all budget constraints. Returns (exhausted, reason)."""
+         if state.iteration >= self.max_runs:
+             return True, f"Run limit: {state.iteration}/{self.max_runs}"
+
+         elapsed = time.time() - self._start_time
+         if elapsed >= self.max_wall_seconds:
+             return True, f"Time limit: {timedelta(seconds=int(elapsed))}"
+
+         if self._consecutive_errors >= self.max_consecutive_errors:
+             return True, f"Error limit: {self._consecutive_errors} consecutive failures"
+
+         runs_since = state.iteration - state.best_iteration
+         if runs_since >= self.patience:
+             return True, f"Patience: {runs_since} runs without improvement"
+
+         return False, ""
+ ```
402
+
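The enforcer exposes `record_success()` and `record_error()`, but the state machine shown earlier never calls them, so the consecutive-error limit only takes effect if the runner catches failed runs and reports them. The following is a self-contained sketch of that wiring, written as a standalone function rather than as a change to `ExperimentLoop`. The name `run_with_error_budget` and the history record layout are assumptions for illustration.

```python
from typing import Any, Callable, Iterable


def run_with_error_budget(
    execute: Callable[[Any], Any],
    hypotheses: Iterable[Any],
    max_consecutive_errors: int = 10,
) -> tuple[list[dict], str]:
    """Run hypotheses in order, stopping after too many failures in a row.

    Returns (history, stop_reason). A success resets the error streak,
    mirroring the record_success() / record_error() pair.
    """
    history: list[dict] = []
    consecutive_errors = 0
    for iteration, hypothesis in enumerate(hypotheses, start=1):
        try:
            result = execute(hypothesis)
        except Exception as exc:  # a failed run must not kill the loop
            consecutive_errors += 1
            history.append({"iteration": iteration, "decision": "error",
                            "error": repr(exc)})
            if consecutive_errors >= max_consecutive_errors:
                return history, (
                    f"Error limit: {consecutive_errors} consecutive failures"
                )
            continue
        consecutive_errors = 0
        history.append({"iteration": iteration, "decision": "ran",
                        "result": result})
    return history, "Hypotheses exhausted"
```

Recording failed runs in the history, rather than dropping them, also keeps a trace of which hypotheses crashed so they are not retried blindly.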
403
+ ### Crash Recovery
404
+
405
+ The loop must be resumable. After every iteration, persist the full state:
406
+
+ ```python
+ # Resume pattern
+ import logging
+ from pathlib import Path
+
+ logger = logging.getLogger(__name__)
+
+ # Assumes LoopState provides a load() classmethod (not shown above) and that
+ # strategy, evaluator, budget, and tracker have already been constructed.
+ state_path = Path("results/exp-001/loop_state.json")
+ if state_path.exists():
+     state = LoopState.load(state_path)
+     logger.info("Resuming from iteration %d", state.iteration)
+ else:
+     state = LoopState()
+     logger.info("Starting fresh experiment loop")
+
+ loop = ExperimentLoop(strategy, evaluator, budget, tracker, state=state)
+ final_state = loop.run()
+ ```
420
+
421
+ The key invariant: **state is persisted after every phase transition, so a completed decide phase is always on disk before the next hypothesize phase begins**. A crash during execute or evaluate therefore loses at most the one in-flight run, and the loop resumes from the last persisted phase.
422
+
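The resume pattern relies on a state file that is never half-written. A minimal sketch of crash-safe persistence follows, assuming JSON on a local filesystem and using a simplified stand-in for `LoopState`. `PersistedState` and its `save`/`load` methods are hypothetical and not the package's implementation; the phase is stored by name because an `Enum` member is not directly JSON-serializable.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Any, Optional


@dataclass
class PersistedState:
    """Simplified stand-in for LoopState (phase stored by name)."""
    phase: str = "HYPOTHESIZE"
    iteration: int = 0
    best_metrics: Optional[dict[str, float]] = None
    best_iteration: int = 0
    history: list[dict[str, Any]] = field(default_factory=list)

    def save(self, path: Path) -> None:
        """Write JSON to a temp file, then atomically rename it into place."""
        path.parent.mkdir(parents=True, exist_ok=True)
        fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as handle:
                json.dump(asdict(self), handle)
                handle.flush()
                os.fsync(handle.fileno())
            os.replace(tmp_name, path)  # atomic on the same filesystem
        except BaseException:
            if os.path.exists(tmp_name):
                os.unlink(tmp_name)
            raise

    @classmethod
    def load(cls, path: Path) -> "PersistedState":
        """Rebuild the state from a file written by save()."""
        with path.open(encoding="utf-8") as handle:
            return cls(**json.load(handle))
```

Because the rename replaces the old file in one step, a crash during `save()` leaves either the previous complete state or the new complete state on disk, never a truncated file.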
423
+ ### Anti-Patterns
424
+
425
+ - **No budget limits**: The loop runs forever. Always set a max_runs limit.
426
+ - **Budget in strategy code**: The strategy overrides budget limits. Budget enforcement must be in the runner.
427
+ - **No state persistence**: A crash loses all progress. Save state after every iteration.
428
+ - **Hypothesis depends on results order**: If the next hypothesis depends on the order results were computed (not just their values), the loop is not reproducible.
429
+ - **Shared mutable state between iterations**: Each iteration must be independent. The only shared state is the loop state (best result, history), never mutable global variables.
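
The last two anti-patterns can be avoided by making hypothesis generation a pure function of the persisted loop state. As a sketch under stated assumptions, the example below derives a fresh random generator from a base seed and the iteration number, so the proposal for iteration N is the same on every run and after every resume. `iteration_rng`, `propose_hypothesis`, and the two-parameter search space are hypothetical.

```python
import hashlib
import random


def iteration_rng(base_seed: int, iteration: int) -> random.Random:
    """Return an RNG that depends only on (base_seed, iteration)."""
    digest = hashlib.sha256(f"{base_seed}:{iteration}".encode()).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def propose_hypothesis(base_seed: int, iteration: int) -> dict:
    """Hypothetical sampler over an assumed two-parameter search space."""
    rng = iteration_rng(base_seed, iteration)
    return {
        "learning_rate": 10 ** rng.uniform(-5, -1),  # log-uniform sample
        "num_layers": rng.randint(1, 8),
    }
```

No module-level generator is shared between iterations, so replaying iteration 7 in isolation yields the same hypothesis as reaching it after six earlier runs.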