@zigrivers/scaffold 3.14.0 → 3.16.0
- package/README.md +50 -21
- package/content/knowledge/core/automated-review-tooling.md +21 -26
- package/content/knowledge/core/multi-model-review-dispatch.md +30 -55
- package/content/knowledge/research/research-architecture.md +385 -0
- package/content/knowledge/research/research-conventions.md +248 -0
- package/content/knowledge/research/research-dev-environment.md +303 -0
- package/content/knowledge/research/research-experiment-loop.md +429 -0
- package/content/knowledge/research/research-experiment-tracking.md +336 -0
- package/content/knowledge/research/research-ml-architecture-search.md +383 -0
- package/content/knowledge/research/research-ml-evaluation.md +407 -0
- package/content/knowledge/research/research-ml-experiment-tracking.md +466 -0
- package/content/knowledge/research/research-ml-training-patterns.md +413 -0
- package/content/knowledge/research/research-observability.md +395 -0
- package/content/knowledge/research/research-overfitting-prevention.md +306 -0
- package/content/knowledge/research/research-project-structure.md +264 -0
- package/content/knowledge/research/research-quant-backtesting.md +326 -0
- package/content/knowledge/research/research-quant-market-data.md +366 -0
- package/content/knowledge/research/research-quant-metrics.md +335 -0
- package/content/knowledge/research/research-quant-requirements.md +223 -0
- package/content/knowledge/research/research-quant-risk.md +469 -0
- package/content/knowledge/research/research-quant-strategy-patterns.md +412 -0
- package/content/knowledge/research/research-requirements.md +201 -0
- package/content/knowledge/research/research-security.md +374 -0
- package/content/knowledge/research/research-sim-compute-management.md +538 -0
- package/content/knowledge/research/research-sim-engine-patterns.md +448 -0
- package/content/knowledge/research/research-sim-parameter-spaces.md +425 -0
- package/content/knowledge/research/research-sim-validation.md +456 -0
- package/content/knowledge/research/research-testing.md +334 -0
- package/content/methodology/research-ml-research.yml +23 -0
- package/content/methodology/research-overlay.yml +65 -0
- package/content/methodology/research-quant-finance.yml +29 -0
- package/content/methodology/research-simulation.yml +23 -0
- package/content/tools/post-implementation-review.md +36 -7
- package/content/tools/review-code.md +33 -8
- package/content/tools/review-pr.md +79 -95
- package/dist/cli/commands/adopt.d.ts.map +1 -1
- package/dist/cli/commands/adopt.js +22 -1
- package/dist/cli/commands/adopt.js.map +1 -1
- package/dist/cli/commands/adopt.serialization.test.js +41 -0
- package/dist/cli/commands/adopt.serialization.test.js.map +1 -1
- package/dist/cli/commands/init.d.ts +4 -0
- package/dist/cli/commands/init.d.ts.map +1 -1
- package/dist/cli/commands/init.js +32 -2
- package/dist/cli/commands/init.js.map +1 -1
- package/dist/cli/init-flag-families.d.ts +6 -1
- package/dist/cli/init-flag-families.d.ts.map +1 -1
- package/dist/cli/init-flag-families.js +32 -1
- package/dist/cli/init-flag-families.js.map +1 -1
- package/dist/cli/init-flag-families.test.js +47 -0
- package/dist/cli/init-flag-families.test.js.map +1 -1
- package/dist/config/schema.d.ts +272 -16
- package/dist/config/schema.d.ts.map +1 -1
- package/dist/config/schema.js +25 -1
- package/dist/config/schema.js.map +1 -1
- package/dist/config/schema.test.js +103 -3
- package/dist/config/schema.test.js.map +1 -1
- package/dist/core/assembly/overlay-loader.d.ts +12 -0
- package/dist/core/assembly/overlay-loader.d.ts.map +1 -1
- package/dist/core/assembly/overlay-loader.js +30 -0
- package/dist/core/assembly/overlay-loader.js.map +1 -1
- package/dist/core/assembly/overlay-loader.test.js +66 -1
- package/dist/core/assembly/overlay-loader.test.js.map +1 -1
- package/dist/core/assembly/overlay-state-resolver.d.ts.map +1 -1
- package/dist/core/assembly/overlay-state-resolver.js +48 -19
- package/dist/core/assembly/overlay-state-resolver.js.map +1 -1
- package/dist/core/assembly/overlay-state-resolver.test.js +80 -0
- package/dist/core/assembly/overlay-state-resolver.test.js.map +1 -1
- package/dist/e2e/project-type-overlays.test.js +119 -0
- package/dist/e2e/project-type-overlays.test.js.map +1 -1
- package/dist/project/adopt.d.ts.map +1 -1
- package/dist/project/adopt.js +3 -1
- package/dist/project/adopt.js.map +1 -1
- package/dist/project/detectors/disambiguate.js +1 -1
- package/dist/project/detectors/disambiguate.js.map +1 -1
- package/dist/project/detectors/index.d.ts.map +1 -1
- package/dist/project/detectors/index.js +2 -1
- package/dist/project/detectors/index.js.map +1 -1
- package/dist/project/detectors/ml.d.ts.map +1 -1
- package/dist/project/detectors/ml.js +2 -6
- package/dist/project/detectors/ml.js.map +1 -1
- package/dist/project/detectors/research.d.ts +4 -0
- package/dist/project/detectors/research.d.ts.map +1 -0
- package/dist/project/detectors/research.js +141 -0
- package/dist/project/detectors/research.js.map +1 -0
- package/dist/project/detectors/research.test.d.ts +2 -0
- package/dist/project/detectors/research.test.d.ts.map +1 -0
- package/dist/project/detectors/research.test.js +235 -0
- package/dist/project/detectors/research.test.js.map +1 -0
- package/dist/project/detectors/shared-signals.d.ts +3 -0
- package/dist/project/detectors/shared-signals.d.ts.map +1 -0
- package/dist/project/detectors/shared-signals.js +9 -0
- package/dist/project/detectors/shared-signals.js.map +1 -0
- package/dist/project/detectors/types.d.ts +6 -2
- package/dist/project/detectors/types.d.ts.map +1 -1
- package/dist/project/detectors/types.js.map +1 -1
- package/dist/types/config.d.ts +7 -1
- package/dist/types/config.d.ts.map +1 -1
- package/dist/wizard/copy/core.d.ts.map +1 -1
- package/dist/wizard/copy/core.js +4 -0
- package/dist/wizard/copy/core.js.map +1 -1
- package/dist/wizard/copy/index.d.ts.map +1 -1
- package/dist/wizard/copy/index.js +2 -0
- package/dist/wizard/copy/index.js.map +1 -1
- package/dist/wizard/copy/research.d.ts +3 -0
- package/dist/wizard/copy/research.d.ts.map +1 -0
- package/dist/wizard/copy/research.js +27 -0
- package/dist/wizard/copy/research.js.map +1 -0
- package/dist/wizard/copy/types.d.ts +5 -1
- package/dist/wizard/copy/types.d.ts.map +1 -1
- package/dist/wizard/flags.d.ts +7 -1
- package/dist/wizard/flags.d.ts.map +1 -1
- package/dist/wizard/questions.d.ts +4 -2
- package/dist/wizard/questions.d.ts.map +1 -1
- package/dist/wizard/questions.js +27 -1
- package/dist/wizard/questions.js.map +1 -1
- package/dist/wizard/questions.test.js +51 -0
- package/dist/wizard/questions.test.js.map +1 -1
- package/dist/wizard/wizard.d.ts +3 -2
- package/dist/wizard/wizard.d.ts.map +1 -1
- package/dist/wizard/wizard.js +3 -1
- package/dist/wizard/wizard.js.map +1 -1
- package/package.json +1 -1
|
@@ -0,0 +1,429 @@
---
name: research-experiment-loop
description: Autonomous experiment loop patterns including hypothesis-execute-evaluate-keep/discard cycle, iteration control, budget management, and early stopping
topics: [research, experiment-loop, autonomous, iteration, budget, early-stopping, hypothesis]
---

The experiment loop is the defining pattern of research projects: an agent iteratively generates hypotheses, executes experiments, evaluates results, and makes keep/discard decisions. The loop can run in three modes: autonomous (the agent decides everything), checkpoint-gated (the agent pauses for human review at intervals), or human-guided (a human decides what to try and the agent executes it). The loop's correctness depends on proper iteration control, budget enforcement, and state management. Without these, an autonomous agent can iterate indefinitely or lose track of what has already been tried.

## Summary

Implement the experiment loop as a state machine with four phases: hypothesize (select what to try next), execute (run the experiment), evaluate (compute metrics and compare to baseline), and decide (keep or discard based on success criteria). Enforce iteration budgets (run count, wall time, compute cost) and early stopping (convergence detection, diminishing returns). Persist the full loop state to disk after every iteration so the loop can resume after an interruption. For autonomous mode, implement safety limits that the agent cannot override.

## Deep Guidance

### The Four-Phase Loop

```
┌──────────────┐
│  Hypothesize │◄──────────────────────────┐
│ (what next?) │                           │
└──────┬───────┘                           │
       │                                   │
┌──────▼───────┐                           │
│   Execute    │                           │
│   (run it)   │                           │
└──────┬───────┘                           │
       │                                   │
┌──────▼───────┐                           │
│   Evaluate   │                           │
│  (measure)   │                           │
└──────┬───────┘                           │
       │                                   │
┌──────▼───────┐     ┌──────────┐          │
│    Decide    │────►│   Keep   │──────────┤
│(keep/discard)│     └──────────┘          │
└──────┬───────┘     ┌──────────┐          │
       └────────────►│ Discard  │──────────┘
                     └──────────┘
```

### State Machine Implementation

```python
# src/loop/state_machine.py
from enum import Enum, auto
from dataclasses import dataclass, field
from typing import Any
import time

class LoopPhase(Enum):
    HYPOTHESIZE = auto()
    EXECUTE = auto()
    EVALUATE = auto()
    DECIDE = auto()
    STOPPED = auto()

@dataclass
class LoopState:
    """Full state of the experiment loop, persisted after every transition."""
    phase: LoopPhase = LoopPhase.HYPOTHESIZE
    iteration: int = 0
    current_hypothesis: dict[str, Any] | None = None
    current_results: dict[str, Any] | None = None
    current_metrics: dict[str, float] | None = None
    best_metrics: dict[str, float] | None = None
    best_hypothesis: dict[str, Any] | None = None
    best_iteration: int = 0
    history: list[dict] = field(default_factory=list)
    start_time: float = field(default_factory=time.time)
    stop_reason: str = ""

class ExperimentLoop:
    """State machine for the experiment loop."""

    def __init__(self, strategy, evaluator, budget, tracker,
                 state: LoopState | None = None):
        self.strategy = strategy
        self.evaluator = evaluator
        self.budget = budget
        self.tracker = tracker
        self.state = state or LoopState()

    def step(self) -> LoopPhase:
        """Execute one phase transition. Returns the new phase."""
        match self.state.phase:
            case LoopPhase.HYPOTHESIZE:
                return self._hypothesize()
            case LoopPhase.EXECUTE:
                return self._execute()
            case LoopPhase.EVALUATE:
                return self._evaluate()
            case LoopPhase.DECIDE:
                return self._decide()
            case LoopPhase.STOPPED:
                return LoopPhase.STOPPED

    def run(self) -> LoopState:
        """Run the loop until stopped."""
        while self.state.phase != LoopPhase.STOPPED:
            self.step()
            self.tracker.save_state(self.state)
        return self.state

    def _hypothesize(self) -> LoopPhase:
        """Generate the next hypothesis to test."""
        # Check budget before starting a new iteration
        exhausted, reason = self.budget.check(self.state)
        if exhausted:
            self.state.stop_reason = reason
            self.state.phase = LoopPhase.STOPPED
            return LoopPhase.STOPPED

        self.state.iteration += 1
        self.state.current_hypothesis = self.strategy.next_hypothesis(self.state)
        self.state.phase = LoopPhase.EXECUTE
        return LoopPhase.EXECUTE

    def _execute(self) -> LoopPhase:
        """Execute the current hypothesis."""
        self.state.current_results = self.strategy.execute(
            self.state.current_hypothesis
        )
        self.state.phase = LoopPhase.EVALUATE
        return LoopPhase.EVALUATE

    def _evaluate(self) -> LoopPhase:
        """Evaluate execution results."""
        self.state.current_metrics = self.evaluator.evaluate(
            self.state.current_results
        )
        self.state.phase = LoopPhase.DECIDE
        return LoopPhase.DECIDE

    def _decide(self) -> LoopPhase:
        """Decide whether to keep or discard the current run."""
        is_improvement = (
            self.state.best_metrics is None
            or self.evaluator.is_improvement(
                self.state.current_metrics, self.state.best_metrics
            )
        )

        decision = "keep" if is_improvement else "discard"

        if is_improvement:
            self.state.best_metrics = self.state.current_metrics
            self.state.best_hypothesis = self.state.current_hypothesis
            self.state.best_iteration = self.state.iteration

        # Record to history
        self.state.history.append({
            "iteration": self.state.iteration,
            "hypothesis": self.state.current_hypothesis,
            "metrics": self.state.current_metrics,
            "decision": decision,
        })

        self.tracker.log_decision(
            iteration=self.state.iteration,
            decision=decision,
            metrics=self.state.current_metrics,
        )

        # Reset for next iteration
        self.state.current_hypothesis = None
        self.state.current_results = None
        self.state.current_metrics = None
        self.state.phase = LoopPhase.HYPOTHESIZE
        return LoopPhase.HYPOTHESIZE
```
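The loop calls four collaborators (`strategy`, `evaluator`, `budget`, `tracker`) without defining them. The sketch below spells out the interfaces those calls imply as `typing.Protocol` classes. The protocol names and the toy `GridStrategy` and `MaxScoreEvaluator` classes are assumptions made for illustration; only the method names and call shapes are taken from the `ExperimentLoop` code.

```python
from typing import Any, Protocol, runtime_checkable

@runtime_checkable
class Strategy(Protocol):
    def next_hypothesis(self, state: Any) -> dict[str, Any]: ...
    def execute(self, hypothesis: dict[str, Any]) -> dict[str, Any]: ...

@runtime_checkable
class Evaluator(Protocol):
    def evaluate(self, results: dict[str, Any]) -> dict[str, float]: ...
    def is_improvement(self, current: dict[str, float],
                       best: dict[str, float]) -> bool: ...

@runtime_checkable
class Budget(Protocol):
    def check(self, state: Any) -> tuple[bool, str]: ...

@runtime_checkable
class Tracker(Protocol):
    def save_state(self, state: Any) -> None: ...
    def log_decision(self, iteration: int, decision: str,
                     metrics: dict[str, float]) -> None: ...

class GridStrategy:
    """Toy strategy (hypothetical): walks a fixed list of learning rates."""

    def __init__(self, learning_rates: list[float]):
        self.learning_rates = learning_rates

    def next_hypothesis(self, state: Any) -> dict[str, Any]:
        # The loop increments state.iteration before calling this, so it is 1-based
        index = (state.iteration - 1) % len(self.learning_rates)
        return {"lr": self.learning_rates[index]}

    def execute(self, hypothesis: dict[str, Any]) -> dict[str, Any]:
        # Stand-in for a real run: the score peaks when lr == 0.01
        return {"score": -abs(hypothesis["lr"] - 0.01)}

class MaxScoreEvaluator:
    """Toy evaluator (hypothetical): a higher score is better."""

    def evaluate(self, results: dict[str, Any]) -> dict[str, float]:
        return {"score": results["score"]}

    def is_improvement(self, current: dict[str, float],
                       best: dict[str, float]) -> bool:
        return current["score"] > best["score"]
```

Because the protocols are structural, any object with matching methods can be passed to `ExperimentLoop` without inheriting from them.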

### Git-Based Keep/Discard (Code-Driven)

For code-driven experiments, git is the state machine. The agent creates a branch, modifies code, runs the experiment, and either merges the branch (keep) or deletes it (discard). Discarded branches are tagged first so their code stays retrievable:

```python
# src/loop/git_state.py
import subprocess
import logging

logger = logging.getLogger(__name__)

class GitExperimentState:
    """Git-based state management for code-driven experiments."""

    def __init__(self, base_branch: str = "main"):
        self.base_branch = base_branch

    def start_experiment(self, experiment_id: str) -> str:
        """Create a new experiment branch."""
        branch = f"exp/{experiment_id}"
        subprocess.run(
            ["git", "checkout", "-b", branch, self.base_branch],
            check=True, capture_output=True,
        )
        logger.info("Created experiment branch: %s", branch)
        return branch

    def keep(self, branch: str, message: str) -> None:
        """Merge experiment branch into the base branch (keep decision)."""
        subprocess.run(
            ["git", "checkout", self.base_branch],
            check=True, capture_output=True,
        )
        subprocess.run(
            ["git", "merge", "--no-ff", branch, "-m", message],
            check=True, capture_output=True,
        )
        subprocess.run(
            ["git", "branch", "-d", branch],
            check=True, capture_output=True,
        )
        logger.info("Kept experiment: %s", branch)

    def discard(self, branch: str) -> None:
        """Delete experiment branch (discard decision)."""
        subprocess.run(
            ["git", "checkout", self.base_branch],
            check=True, capture_output=True,
        )
        # Tag for reference before deleting
        tag = f"archive/{branch}"
        subprocess.run(
            ["git", "tag", tag, branch],
            capture_output=True,  # check omitted: don't fail if the tag exists
        )
        subprocess.run(
            ["git", "branch", "-D", branch],
            check=True, capture_output=True,
        )
        logger.info("Discarded experiment: %s (tagged as %s)", branch, tag)

    def revert_to_baseline(self) -> None:
        """Hard reset to the base branch (emergency recovery)."""
        # --force discards local changes; a plain checkout aborts when
        # uncommitted edits would be overwritten, which defeats recovery.
        subprocess.run(
            ["git", "checkout", "--force", self.base_branch],
            check=True, capture_output=True,
        )
        subprocess.run(
            ["git", "reset", "--hard", self.base_branch],
            check=True, capture_output=True,
        )
        logger.warning("Reverted to baseline: %s", self.base_branch)
```
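One way to connect the decide phase to these git operations is a small dispatcher. The sketch below is an assumption about how the pieces might be wired together, not part of the original module: `apply_decision` and the `ExperimentVcs` protocol are hypothetical names, and the recording fake stands in for `GitExperimentState` so the example runs without a repository.

```python
from typing import Protocol

class ExperimentVcs(Protocol):
    """The two GitExperimentState methods the dispatcher relies on."""
    def keep(self, branch: str, message: str) -> None: ...
    def discard(self, branch: str) -> None: ...

def apply_decision(vcs: ExperimentVcs, branch: str, is_improvement: bool,
                   metrics: dict[str, float]) -> str:
    """Merge the branch on improvement; otherwise archive and delete it."""
    if is_improvement:
        # Put the metrics in the merge message so the history is self-describing
        summary = ", ".join(f"{k}={v:.4g}" for k, v in sorted(metrics.items()))
        vcs.keep(branch, f"Keep {branch}: {summary}")
        return "keep"
    vcs.discard(branch)
    return "discard"

class RecordingVcs:
    """Test double (hypothetical) that records calls instead of running git."""

    def __init__(self) -> None:
        self.calls: list[tuple] = []

    def keep(self, branch: str, message: str) -> None:
        self.calls.append(("keep", branch, message))

    def discard(self, branch: str) -> None:
        self.calls.append(("discard", branch))

vcs = RecordingVcs()
apply_decision(vcs, "exp/001", True, {"accuracy": 0.912})
apply_decision(vcs, "exp/002", False, {"accuracy": 0.874})
```

Keeping the dispatcher separate from the git wrapper means the keep/discard logic can be tested without touching a working tree.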

### Interaction Modes

**Autonomous mode**: The loop runs without human intervention. Safety limits are critical:

```python
# Autonomous mode: hard safety limits
AUTONOMOUS_LIMITS = {
    "max_runs": 1000,              # Absolute maximum, non-overridable
    "max_wall_hours": 72,          # 3-day hard cap
    "max_cost_usd": 500,           # Cost ceiling
    "max_consecutive_errors": 10,  # Stop if 10 runs fail in a row
}
```
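A plain dict can still be mutated at runtime, so "non-overridable" needs some enforcement. The following is a minimal sketch, assuming the limits live in a single process: `HARD_LIMITS` and `effective_limits` are hypothetical names, and `types.MappingProxyType` only guards against accidental mutation. It is not a security boundary against code that deliberately works around it.

```python
from types import MappingProxyType

# Read-only view: item assignment on HARD_LIMITS raises TypeError.
HARD_LIMITS = MappingProxyType({
    "max_runs": 1000,
    "max_wall_hours": 72,
    "max_cost_usd": 500,
    "max_consecutive_errors": 10,
})

def effective_limits(requested: dict[str, float]) -> dict[str, float]:
    """Clamp requested limits so that none exceeds its hard cap.

    Unknown keys are rejected; missing keys fall back to the hard cap.
    """
    unknown = set(requested) - set(HARD_LIMITS)
    if unknown:
        raise KeyError(f"Unknown limit(s): {sorted(unknown)}")
    return {
        name: min(requested.get(name, cap), cap)
        for name, cap in HARD_LIMITS.items()
    }

# A request may tighten a limit but never loosen it.
limits = effective_limits({"max_runs": 5000, "max_cost_usd": 50})
```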

**Checkpoint-gated mode**: The loop pauses for human review at intervals. The checkpoint gate is a blocking call that waits for human input:

```python
# src/loop/checkpoint.py
import logging

logger = logging.getLogger(__name__)

class CheckpointGate:
    """Pause the experiment loop for human review."""

    def __init__(self, interval: int = 100):
        self.interval = interval

    def should_checkpoint(self, iteration: int) -> bool:
        return iteration > 0 and iteration % self.interval == 0

    def checkpoint(self, state: "LoopState") -> bool:
        """Present state to human, return True to continue, False to stop."""
        print(f"\n{'='*60}")
        print(f"CHECKPOINT - Iteration {state.iteration}")
        print(f"Best so far: {state.best_metrics}")
        print(f"Best found at iteration: {state.best_iteration}")
        print(f"Runs since improvement: {state.iteration - state.best_iteration}")
        print(f"{'='*60}")

        while True:
            response = input(
                "Continue? [y/n/s(kip to next checkpoint)]: "
            ).strip().lower()
            if response in ("y", "yes"):
                return True
            elif response in ("n", "no"):
                logger.info("Stopped by human at iteration %d", state.iteration)
                return False
            elif response in ("s", "skip"):
                # Same effect as "y": the loop runs until the next checkpoint
                return True
            # Any other input falls through and re-prompts
```
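The gate is not wired into `ExperimentLoop.run()` in the listing above. As a rough sketch of one possible wiring, the driver below consults a gate between iterations. `run_gated` and its parameters are hypothetical; `ask` stands in for `CheckpointGate.checkpoint` and is scripted here so the example runs without blocking on `input()`.

```python
from typing import Callable

def run_gated(step: Callable[[int], None], max_iterations: int,
              interval: int, ask: Callable[[int], bool]) -> tuple[int, str]:
    """Run `step` up to max_iterations, calling `ask` every `interval` runs.

    Returns (iterations_completed, stop_reason).
    """
    for iteration in range(1, max_iterations + 1):
        step(iteration)
        at_checkpoint = iteration % interval == 0
        if at_checkpoint and not ask(iteration):
            return iteration, "Stopped by human at checkpoint"
    return max_iterations, "Run limit reached"

completed: list[int] = []
# Scripted reviewer: approve the first checkpoint, stop at the second.
answers = iter([True, False])
result = run_gated(completed.append, max_iterations=10, interval=3,
                   ask=lambda iteration: next(answers))
```

The gate is checked only after a full iteration completes, so a human stop never interrupts a run that is still executing.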

**Human-guided mode**: The human provides the hypothesis; the agent executes and evaluates it. The loop does not auto-generate hypotheses. It waits for input:

```python
# Human-guided: the strategy's next_hypothesis() prompts the user
class HumanGuidedStrategy:
    def next_hypothesis(self, state):
        print(f"\nCurrent best: {state.best_metrics}")
        print("Enter next experiment parameters (or 'quit'):")
        # ... interactive parameter input ...
```
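The interactive input step is left open above. As one illustration, the sketch below parses a line of `key=value` pairs into a hypothesis dict. The input format, the `parse_hypothesis` name, and the convention of returning `None` on `quit` are all assumptions; the source does not specify how parameters are entered.

```python
import ast
from typing import Any, Optional

def parse_hypothesis(line: str) -> Optional[dict[str, Any]]:
    """Parse 'key=value key=value' input; return None when the user quits."""
    line = line.strip()
    if line.lower() in ("quit", "q"):
        return None
    params: dict[str, Any] = {}
    for token in line.split():
        key, sep, raw = token.partition("=")
        if not sep or not key:
            raise ValueError(f"Expected key=value, got {token!r}")
        try:
            # literal_eval handles ints, floats, booleans, and quoted strings
            params[key] = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            params[key] = raw  # anything else is kept as a plain string
    return params

hypothesis = parse_hypothesis("lr=0.01 layers=3 optimizer=adam")
```

Using `ast.literal_eval` rather than `eval` keeps the prompt from executing arbitrary expressions typed by the user.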

### Early Stopping

Early stopping detects when further iteration is unlikely to produce meaningful improvements. The monitor below assumes a higher-is-better metric:

```python
# src/loop/early_stopping.py
import numpy as np
from typing import Optional

class EarlyStoppingMonitor:
    """Monitor experiment progress and detect when to stop."""

    def __init__(self, patience: int = 50, min_delta: float = 1e-4,
                 convergence_window: int = 20):
        self.patience = patience
        self.min_delta = min_delta
        self.convergence_window = convergence_window
        self.best_value: Optional[float] = None
        self.wait: int = 0
        self.history: list[float] = []

    def update(self, value: float) -> tuple[bool, str]:
        """Update with new metric value. Returns (should_stop, reason)."""
        self.history.append(value)

        # Check patience (no improvement for N iterations).
        # An improvement must beat the best value by more than min_delta.
        if self.best_value is None or value > self.best_value + self.min_delta:
            self.best_value = value
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                return True, f"No improvement for {self.patience} iterations"

        # Check convergence (metric has plateaued): compare the mean of the
        # most recent window against the mean of the window before it.
        if len(self.history) >= self.convergence_window * 2:
            recent = np.array(self.history[-self.convergence_window:])
            prior = np.array(
                self.history[-2 * self.convergence_window:-self.convergence_window]
            )
            if abs(recent.mean() - prior.mean()) < self.min_delta:
                return True, "Metric has converged (plateau detected)"

        return False, ""
```
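The patience check above compares with `value > best + min_delta`, which only suits metrics where higher is better. For a loss or error metric, one option is to flip the sign before comparing. The class below is a sketch of that idea; `DirectionalPatience` and its `mode` parameter are hypothetical and cover only the patience rule, not the plateau check.

```python
from typing import Optional

class DirectionalPatience:
    """Patience check that supports both maximized and minimized metrics."""

    def __init__(self, patience: int, min_delta: float = 1e-4,
                 mode: str = "max"):
        if mode not in ("max", "min"):
            raise ValueError("mode must be 'max' or 'min'")
        self.patience = patience
        self.min_delta = min_delta
        self.sign = 1.0 if mode == "max" else -1.0
        self.best: Optional[float] = None
        self.wait = 0

    def update(self, value: float) -> bool:
        """Record a value; return True when patience is exhausted."""
        score = self.sign * value  # after the flip, larger is always better
        if self.best is None or score > self.best + self.min_delta:
            self.best = score
            self.wait = 0
            return False
        self.wait += 1
        return self.wait >= self.patience

# A loss that stalls near 0.70: the last three values fail to improve on
# the best by more than min_delta, so the third stalled run triggers a stop.
monitor = DirectionalPatience(patience=3, mode="min")
losses = [0.90, 0.70, 0.71, 0.70, 0.69995]
stops = [monitor.update(loss) for loss in losses]
```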

### Budget Enforcement

Budget limits must be enforced at the loop level, not delegated to the strategy, so that the strategy has no way to override them:

```python
# src/loop/budget.py
import time
from dataclasses import dataclass, field
from datetime import timedelta

@dataclass
class BudgetEnforcer:
    """Enforce hard limits on experiment iteration."""
    max_runs: int = 500
    max_wall_seconds: float = 48 * 3600  # 48 hours
    max_consecutive_errors: int = 10
    patience: int = 50

    # The clock starts at construction. With a default of 0.0, check() would
    # report the time limit as exhausted whenever start() had not been called.
    _start_time: float = field(default_factory=time.time)
    _consecutive_errors: int = 0

    def start(self) -> None:
        """Reset the wall clock and the error counter."""
        self._start_time = time.time()
        self._consecutive_errors = 0

    def record_success(self) -> None:
        self._consecutive_errors = 0

    def record_error(self) -> None:
        self._consecutive_errors += 1

    def check(self, state) -> tuple[bool, str]:
        """Check all budget constraints. Returns (exhausted, reason)."""
        if state.iteration >= self.max_runs:
            return True, f"Run limit: {state.iteration}/{self.max_runs}"

        elapsed = time.time() - self._start_time
        if elapsed >= self.max_wall_seconds:
            return True, f"Time limit: {timedelta(seconds=int(elapsed))}"

        if self._consecutive_errors >= self.max_consecutive_errors:
            return True, f"Error limit: {self._consecutive_errors} consecutive failures"

        runs_since = state.iteration - state.best_iteration
        if runs_since >= self.patience:
            return True, f"Patience: {runs_since} runs without improvement"

        return False, ""
```
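The enforcer above covers run count, wall time, errors, and patience, but not compute cost, which the summary also lists as a budget. A cost ceiling could be tracked along the same lines. The sketch below is an assumption, not part of the original module: `CostBudget`, its fields, and the idea of passing an estimate for the next run are all hypothetical, and how per-run cost is measured is left to the caller.

```python
from dataclasses import dataclass

@dataclass
class CostBudget:
    """Track cumulative spend against a ceiling (hypothetical helper)."""
    max_cost_usd: float = 500.0
    spent_usd: float = 0.0

    def record(self, cost_usd: float) -> None:
        """Add the cost of a finished run to the running total."""
        if cost_usd < 0:
            raise ValueError("cost must be non-negative")
        self.spent_usd += cost_usd

    def check(self, next_run_estimate_usd: float = 0.0) -> tuple[bool, str]:
        """Return (exhausted, reason), counting the estimated next run."""
        projected = self.spent_usd + next_run_estimate_usd
        if projected >= self.max_cost_usd:
            return True, f"Cost limit: ${projected:.2f}/${self.max_cost_usd:.2f}"
        return False, ""

cost_budget = CostBudget(max_cost_usd=10.0)
cost_budget.record(4.0)
cost_budget.record(3.5)
can_continue = cost_budget.check()          # 7.50 spent, under the ceiling
would_overrun = cost_budget.check(3.0)      # 10.50 projected, over the ceiling
```

Checking the projected total before a run starts stops the loop ahead of the overrun rather than after it.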

### Crash Recovery

The loop must be resumable. Persist the full state as the loop runs, and on startup load any saved state before constructing the loop:

```python
# Resume pattern
import logging
from pathlib import Path

logger = logging.getLogger(__name__)

# LoopState.load() is not defined in the state machine listing; it is assumed
# to deserialize whatever tracker.save_state() wrote.
state_path = Path("results/exp-001/loop_state.json")
if state_path.exists():
    state = LoopState.load(state_path)
    logger.info("Resuming from iteration %d", state.iteration)
else:
    state = LoopState()
    logger.info("Starting fresh experiment loop")

# strategy, evaluator, budget, and tracker are constructed elsewhere
loop = ExperimentLoop(strategy, evaluator, budget, tracker, state=state)
final_state = loop.run()
```
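The resume pattern relies on a `load` method and a matching save step that are not shown. The following is a minimal sketch of JSON round-tripping with an atomic write, under stated assumptions: `PersistedState` is a trimmed, hypothetical stand-in for `LoopState`, the phase is stored by enum name, and every field is assumed to be JSON-serializable.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field
from enum import Enum, auto
from pathlib import Path
from typing import Any, Optional

class Phase(Enum):
    HYPOTHESIZE = auto()
    EXECUTE = auto()
    EVALUATE = auto()
    DECIDE = auto()
    STOPPED = auto()

@dataclass
class PersistedState:
    """Trimmed stand-in for LoopState with JSON round-tripping."""
    phase: Phase = Phase.HYPOTHESIZE
    iteration: int = 0
    best_metrics: Optional[dict[str, float]] = None
    best_iteration: int = 0
    history: list[dict[str, Any]] = field(default_factory=list)

    def save(self, path: Path) -> None:
        payload = asdict(self)
        payload["phase"] = self.phase.name  # Enum members are not JSON-serializable
        path.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temp file in the same directory, then rename over the
        # target, so a crash mid-write never leaves a truncated state file.
        fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
        with os.fdopen(fd, "w") as handle:
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, path)

    @classmethod
    def load(cls, path: Path) -> "PersistedState":
        payload = json.loads(path.read_text())
        payload["phase"] = Phase[payload["phase"]]  # look the member up by name
        return cls(**payload)

with tempfile.TemporaryDirectory() as tmp:
    state_path = Path(tmp) / "exp-001" / "loop_state.json"
    original = PersistedState(phase=Phase.EXECUTE, iteration=7,
                              best_metrics={"accuracy": 0.91}, best_iteration=5)
    original.save(state_path)
    restored = PersistedState.load(state_path)
```

Storing the phase alongside the counters is what lets a resumed loop re-enter at the interrupted phase rather than restarting the iteration.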

The key invariant: **state is persisted after every decide phase, before the next hypothesize phase**. The `run()` method shown earlier is stricter and saves after every phase transition. Either way, a crash during execute or evaluate loses at most one run, and the loop resumes at the correct phase.

### Anti-Patterns

- **No budget limits**: The loop runs forever. Always set a `max_runs` limit.
- **Budget in strategy code**: A strategy that owns its budget can override the limits. Budget enforcement must live in the loop runner.
- **No state persistence**: A crash loses all progress. Save state after every iteration.
- **Hypothesis depends on results order**: If the next hypothesis depends on the order in which results were computed (not just their values), the loop is not reproducible.
- **Shared mutable state between iterations**: Each iteration must be independent. The only shared state is the loop state (best result, history), never mutable global variables.
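The last two anti-patterns both come down to hidden state. One common remedy for randomized strategies is to derive a per-iteration seed from a base seed and the iteration number, so each iteration's sampling depends only on values stored in the loop state. The sketch below illustrates this; `iteration_seed`, `sample_hypothesis`, and the parameter ranges are hypothetical.

```python
import hashlib
import random

def iteration_seed(base_seed: int, iteration: int) -> int:
    """Derive a stable per-iteration seed from the base seed and iteration."""
    digest = hashlib.sha256(f"{base_seed}:{iteration}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def sample_hypothesis(base_seed: int, iteration: int) -> dict[str, float]:
    """Sample hyperparameters using only the seed and the iteration number."""
    # A local Random instance avoids touching the global random state
    rng = random.Random(iteration_seed(base_seed, iteration))
    return {
        "lr": 10 ** rng.uniform(-5, -1),   # log-uniform learning rate
        "dropout": rng.uniform(0.0, 0.5),
    }

first = sample_hypothesis(base_seed=42, iteration=3)
again = sample_hypothesis(base_seed=42, iteration=3)
other = sample_hypothesis(base_seed=42, iteration=4)
```

Re-running iteration 3 after a crash yields the same hypothesis as the first attempt, regardless of what other iterations ran in between.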