@zigrivers/scaffold 3.14.0 → 3.16.0

Files changed (122)
  1. package/README.md +50 -21
  2. package/content/knowledge/core/automated-review-tooling.md +21 -26
  3. package/content/knowledge/core/multi-model-review-dispatch.md +30 -55
  4. package/content/knowledge/research/research-architecture.md +385 -0
  5. package/content/knowledge/research/research-conventions.md +248 -0
  6. package/content/knowledge/research/research-dev-environment.md +303 -0
  7. package/content/knowledge/research/research-experiment-loop.md +429 -0
  8. package/content/knowledge/research/research-experiment-tracking.md +336 -0
  9. package/content/knowledge/research/research-ml-architecture-search.md +383 -0
  10. package/content/knowledge/research/research-ml-evaluation.md +407 -0
  11. package/content/knowledge/research/research-ml-experiment-tracking.md +466 -0
  12. package/content/knowledge/research/research-ml-training-patterns.md +413 -0
  13. package/content/knowledge/research/research-observability.md +395 -0
  14. package/content/knowledge/research/research-overfitting-prevention.md +306 -0
  15. package/content/knowledge/research/research-project-structure.md +264 -0
  16. package/content/knowledge/research/research-quant-backtesting.md +326 -0
  17. package/content/knowledge/research/research-quant-market-data.md +366 -0
  18. package/content/knowledge/research/research-quant-metrics.md +335 -0
  19. package/content/knowledge/research/research-quant-requirements.md +223 -0
  20. package/content/knowledge/research/research-quant-risk.md +469 -0
  21. package/content/knowledge/research/research-quant-strategy-patterns.md +412 -0
  22. package/content/knowledge/research/research-requirements.md +201 -0
  23. package/content/knowledge/research/research-security.md +374 -0
  24. package/content/knowledge/research/research-sim-compute-management.md +538 -0
  25. package/content/knowledge/research/research-sim-engine-patterns.md +448 -0
  26. package/content/knowledge/research/research-sim-parameter-spaces.md +425 -0
  27. package/content/knowledge/research/research-sim-validation.md +456 -0
  28. package/content/knowledge/research/research-testing.md +334 -0
  29. package/content/methodology/research-ml-research.yml +23 -0
  30. package/content/methodology/research-overlay.yml +65 -0
  31. package/content/methodology/research-quant-finance.yml +29 -0
  32. package/content/methodology/research-simulation.yml +23 -0
  33. package/content/tools/post-implementation-review.md +36 -7
  34. package/content/tools/review-code.md +33 -8
  35. package/content/tools/review-pr.md +79 -95
  36. package/dist/cli/commands/adopt.d.ts.map +1 -1
  37. package/dist/cli/commands/adopt.js +22 -1
  38. package/dist/cli/commands/adopt.js.map +1 -1
  39. package/dist/cli/commands/adopt.serialization.test.js +41 -0
  40. package/dist/cli/commands/adopt.serialization.test.js.map +1 -1
  41. package/dist/cli/commands/init.d.ts +4 -0
  42. package/dist/cli/commands/init.d.ts.map +1 -1
  43. package/dist/cli/commands/init.js +32 -2
  44. package/dist/cli/commands/init.js.map +1 -1
  45. package/dist/cli/init-flag-families.d.ts +6 -1
  46. package/dist/cli/init-flag-families.d.ts.map +1 -1
  47. package/dist/cli/init-flag-families.js +32 -1
  48. package/dist/cli/init-flag-families.js.map +1 -1
  49. package/dist/cli/init-flag-families.test.js +47 -0
  50. package/dist/cli/init-flag-families.test.js.map +1 -1
  51. package/dist/config/schema.d.ts +272 -16
  52. package/dist/config/schema.d.ts.map +1 -1
  53. package/dist/config/schema.js +25 -1
  54. package/dist/config/schema.js.map +1 -1
  55. package/dist/config/schema.test.js +103 -3
  56. package/dist/config/schema.test.js.map +1 -1
  57. package/dist/core/assembly/overlay-loader.d.ts +12 -0
  58. package/dist/core/assembly/overlay-loader.d.ts.map +1 -1
  59. package/dist/core/assembly/overlay-loader.js +30 -0
  60. package/dist/core/assembly/overlay-loader.js.map +1 -1
  61. package/dist/core/assembly/overlay-loader.test.js +66 -1
  62. package/dist/core/assembly/overlay-loader.test.js.map +1 -1
  63. package/dist/core/assembly/overlay-state-resolver.d.ts.map +1 -1
  64. package/dist/core/assembly/overlay-state-resolver.js +48 -19
  65. package/dist/core/assembly/overlay-state-resolver.js.map +1 -1
  66. package/dist/core/assembly/overlay-state-resolver.test.js +80 -0
  67. package/dist/core/assembly/overlay-state-resolver.test.js.map +1 -1
  68. package/dist/e2e/project-type-overlays.test.js +119 -0
  69. package/dist/e2e/project-type-overlays.test.js.map +1 -1
  70. package/dist/project/adopt.d.ts.map +1 -1
  71. package/dist/project/adopt.js +3 -1
  72. package/dist/project/adopt.js.map +1 -1
  73. package/dist/project/detectors/disambiguate.js +1 -1
  74. package/dist/project/detectors/disambiguate.js.map +1 -1
  75. package/dist/project/detectors/index.d.ts.map +1 -1
  76. package/dist/project/detectors/index.js +2 -1
  77. package/dist/project/detectors/index.js.map +1 -1
  78. package/dist/project/detectors/ml.d.ts.map +1 -1
  79. package/dist/project/detectors/ml.js +2 -6
  80. package/dist/project/detectors/ml.js.map +1 -1
  81. package/dist/project/detectors/research.d.ts +4 -0
  82. package/dist/project/detectors/research.d.ts.map +1 -0
  83. package/dist/project/detectors/research.js +141 -0
  84. package/dist/project/detectors/research.js.map +1 -0
  85. package/dist/project/detectors/research.test.d.ts +2 -0
  86. package/dist/project/detectors/research.test.d.ts.map +1 -0
  87. package/dist/project/detectors/research.test.js +235 -0
  88. package/dist/project/detectors/research.test.js.map +1 -0
  89. package/dist/project/detectors/shared-signals.d.ts +3 -0
  90. package/dist/project/detectors/shared-signals.d.ts.map +1 -0
  91. package/dist/project/detectors/shared-signals.js +9 -0
  92. package/dist/project/detectors/shared-signals.js.map +1 -0
  93. package/dist/project/detectors/types.d.ts +6 -2
  94. package/dist/project/detectors/types.d.ts.map +1 -1
  95. package/dist/project/detectors/types.js.map +1 -1
  96. package/dist/types/config.d.ts +7 -1
  97. package/dist/types/config.d.ts.map +1 -1
  98. package/dist/wizard/copy/core.d.ts.map +1 -1
  99. package/dist/wizard/copy/core.js +4 -0
  100. package/dist/wizard/copy/core.js.map +1 -1
  101. package/dist/wizard/copy/index.d.ts.map +1 -1
  102. package/dist/wizard/copy/index.js +2 -0
  103. package/dist/wizard/copy/index.js.map +1 -1
  104. package/dist/wizard/copy/research.d.ts +3 -0
  105. package/dist/wizard/copy/research.d.ts.map +1 -0
  106. package/dist/wizard/copy/research.js +27 -0
  107. package/dist/wizard/copy/research.js.map +1 -0
  108. package/dist/wizard/copy/types.d.ts +5 -1
  109. package/dist/wizard/copy/types.d.ts.map +1 -1
  110. package/dist/wizard/flags.d.ts +7 -1
  111. package/dist/wizard/flags.d.ts.map +1 -1
  112. package/dist/wizard/questions.d.ts +4 -2
  113. package/dist/wizard/questions.d.ts.map +1 -1
  114. package/dist/wizard/questions.js +27 -1
  115. package/dist/wizard/questions.js.map +1 -1
  116. package/dist/wizard/questions.test.js +51 -0
  117. package/dist/wizard/questions.test.js.map +1 -1
  118. package/dist/wizard/wizard.d.ts +3 -2
  119. package/dist/wizard/wizard.d.ts.map +1 -1
  120. package/dist/wizard/wizard.js +3 -1
  121. package/dist/wizard/wizard.js.map +1 -1
  122. package/package.json +1 -1
@@ -0,0 +1,334 @@
+ ---
+ name: research-testing
+ description: Testing experiment loops including determinism tests, result validation, integration tests for experiment pipelines, and regression baselines
+ topics: [research, testing, determinism, validation, integration-tests, regression, tdd]
+ ---
+
+ Research code is notoriously undertested because "the results are stochastic" feels like a valid excuse. It is not. The experiment runner, evaluation framework, data pipeline, and state management are all deterministic and must be tested rigorously. The stochastic parts (experiment outcomes) require seed-based determinism tests and statistical validation. Untested experiment loops produce unreliable results that waste compute and mislead researchers.
+
+ ## Summary
+
+ Test research projects at four levels: determinism tests (same seed produces same results), component tests (runner, evaluator, tracker work correctly in isolation), integration tests (full experiment loop produces valid results on fixture data), and regression tests (new code changes do not alter previously established baselines). Use pytest with fixtures for small datasets and mocked external dependencies. Run tests on every commit: fast tests in pre-commit, slow integration tests in CI.
+
+ ## Deep Guidance
+
+ ### Determinism Tests
+
+ The most important property of a research system is that, given the same seed and config, it produces identical results:
+
+ ```python
+ # tests/test_determinism.py
+ import copy
+
+ import pytest
+ from src.runner.experiment_runner import ExperimentRunner
+ from src.seed import set_seed
+
+ class TestDeterminism:
+     def test_same_seed_same_results(self, tmp_path, fixture_config):
+         """Two runs with the same seed must produce identical metrics."""
+         # Deep copy so nested dicts are not shared with the fixture
+         config = copy.deepcopy(fixture_config)
+         config["experiment"]["seed"] = 42
+         config["logging"]["results_dir"] = str(tmp_path)
+
+         # Run 1
+         set_seed(42)
+         runner1 = ExperimentRunner(config)
+         result1 = runner1.run_single()
+
+         # Run 2
+         set_seed(42)
+         runner2 = ExperimentRunner(config)
+         result2 = runner2.run_single()
+
+         assert result1.metrics == result2.metrics, (
+             f"Non-deterministic results:\n"
+             f"  Run 1: {result1.metrics}\n"
+             f"  Run 2: {result2.metrics}"
+         )
+
+     def test_different_seeds_different_results(self, tmp_path, fixture_config):
+         """Different seeds should produce different results (not trivially constant)."""
+         config = copy.deepcopy(fixture_config)
+         config["logging"]["results_dir"] = str(tmp_path)
+
+         # Set the seed in the config as well as globally, mirroring the
+         # same-seed test; otherwise both runs share the fixture's seed.
+         config["experiment"]["seed"] = 42
+         set_seed(42)
+         runner1 = ExperimentRunner(config)
+         result1 = runner1.run_single()
+
+         config["experiment"]["seed"] = 123
+         set_seed(123)
+         runner2 = ExperimentRunner(config)
+         result2 = runner2.run_single()
+
+         assert result1.metrics != result2.metrics, (
+             "Different seeds produced identical results -- "
+             "strategy may be ignoring the seed"
+         )
+
+     def test_seed_isolation_between_runs(self, tmp_path, fixture_config):
+         """Each run in the loop must use an independent seed."""
+         config = copy.deepcopy(fixture_config)
+         config["experiment"]["seed"] = 42
+         config["experiment"]["num_runs"] = 5
+         config["logging"]["results_dir"] = str(tmp_path)
+
+         runner = ExperimentRunner(config)
+         state = runner.run_loop()
+
+         # Verify all runs produced different metrics (not re-using the same seed)
+         metric_values = [r["metrics"]["primary"] for r in state.history]
+         assert len(set(str(v) for v in metric_values)) > 1, (
+             "All runs produced identical metrics -- seed may not be incremented"
+         )
+ ```
+
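The tests above import `set_seed` from `src.seed`, which this diff does not show. A minimal sketch of what such a helper might look like, assuming only the standard library `random` module needs seeding (a project using NumPy or PyTorch would seed those as well). `derive_run_seed` and `noisy_metric` are hypothetical names used only for illustration:

```python
import random


def set_seed(seed: int) -> None:
    """Seed the stdlib RNG. Hypothetical stand-in for src.seed.set_seed."""
    random.seed(seed)


def derive_run_seed(base_seed: int, run_index: int) -> int:
    """Hypothetical helper: give each run in a loop a distinct, reproducible seed."""
    return base_seed + run_index


def noisy_metric(seed: int) -> float:
    """Toy 'experiment': a metric that depends only on the seed."""
    set_seed(seed)
    return sum(random.gauss(0.0, 1.0) for _ in range(100))


if __name__ == "__main__":
    run_seeds = [derive_run_seed(42, i) for i in range(5)]
    metrics = [noisy_metric(s) for s in run_seeds]
    # Same seed reproduces the metric; distinct per-run seeds vary it.
    print(noisy_metric(42) == noisy_metric(42))
    print(len(set(metrics)) > 1)
```

Deriving per-run seeds from one base seed keeps the whole loop reproducible while still letting the seed-isolation test observe variation between runs.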
+ ### Component Tests
+
+ Test each component of the experiment system in isolation:
+
+ ```python
+ # tests/test_evaluator.py
+ import pytest
+ from src.evaluation.evaluator import MetricEvaluator
+
+ class TestMetricEvaluator:
+     @pytest.fixture
+     def evaluator(self):
+         return MetricEvaluator(
+             primary_metric="sharpe_ratio",
+             direction="maximize",
+         )
+
+     def test_evaluate_returns_expected_metrics(self, evaluator):
+         """Evaluator must return all configured metrics."""
+         raw_results = {
+             "returns": [0.01, -0.005, 0.02, -0.01, 0.015],
+             "trades": 5,
+         }
+         metrics = evaluator.evaluate(raw_results)
+         assert "sharpe_ratio" in metrics
+         assert "max_drawdown" in metrics
+         assert "num_trades" in metrics
+         assert isinstance(metrics["sharpe_ratio"], float)
+
+     def test_is_improvement_maximization(self, evaluator):
+         """Higher primary metric should be an improvement when maximizing."""
+         current = {"sharpe_ratio": 1.5, "max_drawdown": 0.1}
+         best = {"sharpe_ratio": 1.2, "max_drawdown": 0.12}
+         assert evaluator.is_improvement(current, best) is True
+
+     def test_is_not_improvement(self, evaluator):
+         """Lower primary metric should not be an improvement when maximizing."""
+         current = {"sharpe_ratio": 1.0, "max_drawdown": 0.1}
+         best = {"sharpe_ratio": 1.5, "max_drawdown": 0.12}
+         assert evaluator.is_improvement(current, best) is False
+
+     def test_evaluate_empty_results_raises(self, evaluator):
+         """Empty results must raise a clear error, not return NaN."""
+         with pytest.raises(ValueError, match="empty"):
+             evaluator.evaluate({"returns": [], "trades": 0})
+
+
+ # tests/test_state.py
+ import pytest
+ import json
+ from pathlib import Path
+ from src.runner.state import ExperimentState, RunRecord
+
+ class TestExperimentState:
+     def test_save_and_load_roundtrip(self, tmp_path):
+         """State must survive a save/load cycle."""
+         state = ExperimentState(experiment_id="test-001")
+         run = RunRecord(
+             run_id="run-0001",
+             config={"strategy": {"type": "momentum"}},
+             metrics={"sharpe_ratio": 1.5},
+             is_best=True,
+             decision="keep",
+         )
+         state.record_run(run)
+
+         path = tmp_path / "state.json"
+         state.save(path)
+         loaded = ExperimentState.load(path)
+
+         assert loaded.experiment_id == "test-001"
+         assert loaded.total_runs == 1
+         assert loaded.best_run.metrics == {"sharpe_ratio": 1.5}
+
+     def test_runs_since_improvement_tracking(self):
+         """State must track runs since last improvement."""
+         state = ExperimentState(experiment_id="test")
+
+         # First run is always best
+         state.record_run(RunRecord(
+             run_id="1", config={}, metrics={"m": 1.0}, is_best=True, decision="keep",
+         ))
+         assert state.runs_since_improvement == 0
+
+         # Non-improvement increments counter
+         state.record_run(RunRecord(
+             run_id="2", config={}, metrics={"m": 0.5}, is_best=False, decision="discard",
+         ))
+         assert state.runs_since_improvement == 1
+
+         # New best resets counter
+         state.record_run(RunRecord(
+             run_id="3", config={}, metrics={"m": 2.0}, is_best=True, decision="keep",
+         ))
+         assert state.runs_since_improvement == 0
+ ```
+
+ ### Integration Tests
+
+ Integration tests run the full experiment loop on small fixture data:
+
+ ```python
+ # tests/test_integration.py
+ import pytest
+ from pathlib import Path
+ from src.runner.experiment_runner import ExperimentRunner
+ from src.loop.state_machine import ExperimentLoop, LoopState
+
+ class TestExperimentLoopIntegration:
+     @pytest.fixture
+     def small_config(self, tmp_path):
+         return {
+             "experiment": {"seed": 42, "num_runs": 10},
+             "strategy": {"type": "mock_strategy", "params": {}},
+             "budget": {"max_runs": 10, "patience": 5},
+             "logging": {"results_dir": str(tmp_path / "results")},
+         }
+
+     def test_loop_runs_to_completion(self, small_config, tmp_path):
+         """Loop must complete within budget and produce valid state."""
+         runner = ExperimentRunner(small_config)
+         state = runner.run_loop()
+
+         assert state.total_runs <= 10
+         assert state.best_run is not None
+         assert len(state.history) == state.total_runs
+
+     def test_loop_persists_state(self, small_config, tmp_path):
+         """State file must exist and be loadable after loop completes."""
+         runner = ExperimentRunner(small_config)
+         runner.run_loop()
+
+         state_path = Path(small_config["logging"]["results_dir"]) / "state.json"
+         assert state_path.exists()
+
+         loaded = LoopState.load(state_path)
+         assert loaded.iteration > 0
+
+     def test_loop_resume_after_interruption(self, small_config, tmp_path):
+         """Loop must resume correctly from persisted state."""
+         config = small_config.copy()
+         config["budget"]["max_runs"] = 20
+
+         # Run 10 iterations
+         runner1 = ExperimentRunner(config)
+         runner1.budget.max_runs = 10
+         state1 = runner1.run_loop()
+         assert state1.total_runs == 10
+
+         # Resume from saved state, run 10 more
+         runner2 = ExperimentRunner(config)
+         state2 = runner2.run_loop()
+         assert state2.total_runs == 20
+
+     def test_results_directory_structure(self, small_config, tmp_path):
+         """Each run must create the expected result files."""
+         runner = ExperimentRunner(small_config)
+         runner.run_loop()
+
+         results_dir = Path(small_config["logging"]["results_dir"])
+         run_dirs = sorted(d for d in results_dir.iterdir()
+                           if d.is_dir() and d.name.startswith("run-"))
+
+         assert len(run_dirs) > 0
+         for run_dir in run_dirs:
+             assert (run_dir / "config.json").exists()
+             assert (run_dir / "metrics.json").exists()
+ ```
+
+ ### Regression Baselines
+
+ Establish metric baselines so that code changes do not silently degrade results:
+
+ ```python
+ # tests/test_regression.py
+ import pytest
+ import json
+ from pathlib import Path
+
+ BASELINE_PATH = Path("tests/fixtures/baselines/metrics_baseline.json")
+
+ class TestRegressionBaseline:
+     # Function-scoped: `fixture_config` (tests/conftest.py) depends on the
+     # function-scoped `tmp_path`, which a class-scoped fixture cannot request.
+     @pytest.fixture
+     def current_metrics(self, fixture_config):
+         """Run the standard benchmark and return metrics."""
+         from src.runner.experiment_runner import ExperimentRunner
+         runner = ExperimentRunner(fixture_config)
+         state = runner.run_loop()
+         return state.best_run.metrics
+
+     @pytest.fixture(scope="class")
+     def baseline_metrics(self):
+         """Load the committed baseline metrics."""
+         with open(BASELINE_PATH) as f:
+             return json.load(f)
+
+     def test_primary_metric_no_regression(self, current_metrics, baseline_metrics):
+         """Primary metric must not regress beyond tolerance."""
+         tolerance = 0.05  # 5% relative tolerance
+         baseline = baseline_metrics["sharpe_ratio"]
+         current = current_metrics["sharpe_ratio"]
+         # abs() keeps the tolerance band below the baseline even if it is negative
+         assert current >= baseline - abs(baseline) * tolerance, (
+             f"Regression: sharpe_ratio {current:.4f} < baseline {baseline:.4f} "
+             f"(tolerance: {tolerance:.0%})"
+         )
+ ```
+
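The relative-tolerance comparison can be factored into a small helper that also covers metrics where lower is better. A sketch, assuming a hypothetical `within_tolerance` function and `direction` argument that are not part of the package:

```python
def within_tolerance(current: float, baseline: float,
                     tolerance: float = 0.05,
                     direction: str = "maximize") -> bool:
    """Return True if `current` has not regressed past the tolerance band.

    The band is `abs(baseline) * tolerance`, so it stays on the correct
    side of the baseline even when the baseline is negative.
    """
    band = abs(baseline) * tolerance
    if direction == "maximize":
        return current >= baseline - band
    if direction == "minimize":
        return current <= baseline + band
    raise ValueError(f"unknown direction: {direction!r}")
```

A regression test could then assert `within_tolerance(current, baseline)` and keep the failure message in the test itself.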
+ ### Test Fixtures
+
+ ```python
+ # tests/conftest.py
+ import pytest
+
+ @pytest.fixture
+ def fixture_config(tmp_path):
+     """Minimal config for fast tests."""
+     return {
+         "experiment": {"seed": 42, "num_runs": 5},
+         "strategy": {"type": "mock_strategy", "params": {}},
+         "data": {"source": "tests/fixtures/small_data.csv"},
+         "budget": {"max_runs": 5, "patience": 3},
+         "logging": {"results_dir": str(tmp_path / "results")},
+     }
+
+ @pytest.fixture
+ def mock_strategy():
+     """Strategy that returns predictable results for testing."""
+     class MockStrategy:
+         name = "mock_strategy"
+         _call_count = 0
+
+         def execute(self, config):
+             self._call_count += 1
+             return {
+                 "returns": [0.01 * self._call_count, -0.005, 0.02],
+                 "trades": self._call_count * 10,
+             }
+
+         def next_hypothesis(self, state):
+             return {"param": state.iteration}
+
+     return MockStrategy()
+ ```
+
+ ### Testing Best Practices for Research
+
+ - **Fast tests in pre-commit**: Determinism and component tests must run in < 10 seconds.
+ - **Slow tests in CI**: Integration tests with actual experiment execution run in CI only (mark with `@pytest.mark.slow`).
+ - **Mock external resources**: Mock file I/O, API calls, and database connections in unit tests. Integration tests may use real file I/O with `tmp_path`.
+ - **Test the loop termination**: Verify that every stopping condition actually stops the loop. Budget exhaustion, patience, convergence, and error limits must all be tested.
+ - **Test crash recovery**: Simulate a crash by persisting state mid-loop, then verify the loop resumes correctly.
+ - **Baseline updates are deliberate**: Updating regression baselines requires a commit message explaining why the baseline changed. Never auto-update baselines in CI.
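The loop-termination practice above can be pinned down with one small test per stopping condition. A sketch using a toy loop; `run_until_stopped` and its `max_runs` and `patience` parameters are hypothetical stand-ins for the real experiment loop, which this diff does not show:

```python
from typing import Callable, List, Tuple


def run_until_stopped(next_metric: Callable[[int], float],
                      max_runs: int, patience: int) -> Tuple[List[float], str]:
    """Toy loop: stop on budget exhaustion or after `patience` runs without a new best."""
    history: List[float] = []
    best = float("-inf")
    runs_since_improvement = 0
    while len(history) < max_runs:
        metric = next_metric(len(history))
        history.append(metric)
        if metric > best:
            best = metric
            runs_since_improvement = 0
        else:
            runs_since_improvement += 1
        if runs_since_improvement >= patience:
            return history, "patience"
    return history, "budget"


def test_stops_on_budget():
    # Every run improves, so only the budget can stop the loop.
    history, reason = run_until_stopped(lambda i: float(i), max_runs=10, patience=3)
    assert reason == "budget"
    assert len(history) == 10


def test_stops_on_patience():
    # Only the first run improves; the next three exhaust patience.
    history, reason = run_until_stopped(lambda i: 1.0 if i == 0 else 0.0,
                                        max_runs=10, patience=3)
    assert reason == "patience"
    assert len(history) == 4
```

Returning the stop reason alongside the history lets each test assert why the loop ended, not only that it ended.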
@@ -0,0 +1,23 @@
+ # methodology/research-ml-research.yml
+ name: research-ml-research
+ description: >
+   ML-research domain sub-overlay — adds architecture search, training
+   patterns, and evaluation knowledge for ML research projects.
+ project-type: research
+ domain: ml-research
+
+ knowledge-overrides:
+   system-architecture:
+     append: [research-ml-architecture-search, research-ml-training-patterns]
+   operations:
+     append: [research-ml-experiment-tracking]
+   tdd:
+     append: [research-ml-evaluation]
+   create-evals:
+     append: [research-ml-evaluation]
+   review-architecture:
+     append: [research-ml-architecture-search]
+   review-testing:
+     append: [research-ml-evaluation]
+   implementation-plan:
+     append: [research-ml-architecture-search]
@@ -0,0 +1,65 @@
+ # methodology/research-overlay.yml
+ name: research
+ description: >
+   Research overlay — injects research domain knowledge into existing
+   pipeline steps for experiment loop architecture, tracking, evaluation,
+   overfitting prevention, and domain-specific patterns.
+ project-type: research
+
+ # ---------------------------------------------------------------------------
+ # knowledge-overrides
+ # ---------------------------------------------------------------------------
+ # Map research knowledge entries into existing pipeline steps so that
+ # experiment loop domain expertise is injected during prompt assembly.
+ knowledge-overrides:
+   # Foundational (6 steps)
+   create-prd:
+     append: [research-requirements]
+   user-stories:
+     append: [research-requirements]
+   coding-standards:
+     append: [research-conventions]
+   project-structure:
+     append: [research-project-structure]
+   dev-env-setup:
+     append: [research-dev-environment]
+   git-workflow:
+     append: [research-conventions]
+
+   # Architecture & Design (6 steps)
+   system-architecture:
+     append: [research-architecture, research-experiment-loop]
+   tech-stack:
+     append: [research-architecture]
+   adrs:
+     append: [research-architecture]
+   domain-modeling:
+     append: [research-experiment-loop]
+   security:
+     append: [research-security]
+   operations:
+     append: [research-experiment-tracking, research-observability]
+
+   # Testing (4 steps)
+   tdd:
+     append: [research-testing, research-overfitting-prevention]
+   add-e2e-testing:
+     append: [research-testing]
+   create-evals:
+     append: [research-testing, research-overfitting-prevention]
+   story-tests:
+     append: [research-testing]
+
+   # Reviews (4 steps)
+   review-architecture:
+     append: [research-architecture, research-experiment-loop]
+   review-security:
+     append: [research-security]
+   review-operations:
+     append: [research-experiment-tracking, research-observability]
+   review-testing:
+     append: [research-testing, research-overfitting-prevention]
+
+   # Planning (1 step)
+   implementation-plan:
+     append: [research-architecture]
@@ -0,0 +1,29 @@
+ # methodology/research-quant-finance.yml
+ name: research-quant-finance
+ description: >
+   Quant-finance domain sub-overlay — adds trading strategy, backtesting,
+   risk analysis, and market data knowledge to research projects.
+ project-type: research
+ domain: quant-finance
+
+ knowledge-overrides:
+   create-prd:
+     append: [research-quant-requirements]
+   system-architecture:
+     append: [research-quant-backtesting, research-quant-strategy-patterns]
+   domain-modeling:
+     append: [research-quant-market-data]
+   security:
+     append: [research-quant-risk]
+   operations:
+     append: [research-quant-metrics]
+   tdd:
+     append: [research-quant-backtesting]
+   create-evals:
+     append: [research-quant-metrics, research-quant-backtesting]
+   review-architecture:
+     append: [research-quant-backtesting, research-quant-strategy-patterns]
+   review-testing:
+     append: [research-quant-backtesting]
+   implementation-plan:
+     append: [research-quant-backtesting, research-quant-strategy-patterns]
1
+ # methodology/research-simulation.yml
2
+ name: research-simulation
3
+ description: >
4
+ Simulation domain sub-overlay — adds physics/materials simulation engine,
5
+ parameter space, and compute management knowledge.
6
+ project-type: research
7
+ domain: simulation
8
+
9
+ knowledge-overrides:
10
+ system-architecture:
11
+ append: [research-sim-engine-patterns, research-sim-parameter-spaces]
12
+ domain-modeling:
13
+ append: [research-sim-parameter-spaces]
14
+ operations:
15
+ append: [research-sim-compute-management]
16
+ tdd:
17
+ append: [research-sim-validation]
18
+ create-evals:
19
+ append: [research-sim-validation, research-sim-parameter-spaces]
20
+ review-architecture:
21
+ append: [research-sim-engine-patterns]
22
+ implementation-plan:
23
+ append: [research-sim-engine-patterns]
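One way to read these overlay files together: the base `research` overlay and a domain sub-overlay each contribute `append` lists per pipeline step. The sketch below shows how such lists might be combined, assuming the base overlay is applied first and duplicate entries are dropped. The actual merge logic lives in the package's overlay loader and state resolver, which are not reproduced here, so `merge_knowledge_overrides` is hypothetical:

```python
from typing import Dict, List

Overrides = Dict[str, Dict[str, List[str]]]


def merge_knowledge_overrides(*overlays: Overrides) -> Dict[str, List[str]]:
    """Concatenate per-step `append` lists in overlay order, dropping duplicates."""
    merged: Dict[str, List[str]] = {}
    for overlay in overlays:
        for step, override in overlay.items():
            entries = merged.setdefault(step, [])
            for name in override.get("append", []):
                if name not in entries:
                    entries.append(name)
    return merged


# Excerpts of the `tdd` and `operations` steps from the overlay files above.
base = {
    "tdd": {"append": ["research-testing", "research-overfitting-prevention"]},
    "operations": {"append": ["research-experiment-tracking", "research-observability"]},
}
simulation = {
    "tdd": {"append": ["research-sim-validation"]},
    "operations": {"append": ["research-sim-compute-management"]},
}

merged = merge_knowledge_overrides(base, simulation)
print(merged["tdd"])
```

Under these assumptions, a simulation project's `tdd` step would see the two base testing entries followed by `research-sim-validation`.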
@@ -26,7 +26,7 @@ comprehensive quality check before releasing or handing off the project.
  The three channels are:
  1. **Codex CLI** — Implementation correctness, security, API contracts
  2. **Gemini CLI** — Design reasoning, architectural patterns, broad context
- 3. **Superpowers code-reviewer** — Plan alignment, code quality, testing
+ 3. **Claude CLI** — Plan alignment, code quality, testing
 
  ## Inputs
 
@@ -191,6 +191,7 @@ Return ALL findings as valid JSON:
  {
  "severity": "P0|P1|P2|P3",
  "category": "architecture-alignment|security|error-handling|test-coverage|complexity|dependencies",
+ "location": "relative/path/to/file.ts:42",
  "file": "relative/path/to/file.ts",
  "line": 42,
  "description": "Specific description of the issue",
@@ -226,7 +227,7 @@ If not installed: queue a compensating pass (implementation correctness, securit
  codex login status 2>/dev/null && echo "codex authenticated" || echo "codex NOT authenticated"
  ```
 
- If not authenticated: tell the user "Codex auth expired. Run: `! codex login`". Do NOT silently skip. Wait for re-auth and retry once. If auth cannot be recovered (auth_timeout or user declines): queue a compensating pass (implementation correctness, security, API contracts, labeled `[compensating: Codex-equivalent]`).
+ If not authenticated: tell the user "Codex auth expired. Run: `! codex login`". Do NOT silently skip. Wait for re-auth and retry once. If auth cannot be recovered (timeout or user declines): queue a compensating pass (implementation correctness, security, API contracts, labeled `[compensating: Codex-equivalent]`).
 
  If Codex fails during execution (non-zero exit, malformed output, timeout): queue a compensating pass with the same focus and label.
 
@@ -254,7 +255,7 @@ If not installed: queue a compensating pass (architectural patterns, design reas
  NO_BROWSER=true gemini -p "respond with ok" -o json 2>&1
  ```
 
- If exit code is 41: tell the user "Gemini auth expired. Run: `! gemini -p \"hello\"`". Do NOT silently skip. Wait for re-auth and retry once. If auth cannot be recovered (auth_timeout or user declines): queue a compensating pass (architectural patterns, design reasoning, broad context, labeled `[compensating: Gemini-equivalent]`).
+ If exit code is 41: tell the user "Gemini auth expired. Run: `! gemini -p \"hello\"`". Do NOT silently skip. Wait for re-auth and retry once. If auth cannot be recovered (timeout or user declines): queue a compensating pass (architectural patterns, design reasoning, broad context, labeled `[compensating: Gemini-equivalent]`).
 
  If Gemini fails during execution (non-zero exit, malformed output, timeout): queue a compensating pass with the same focus and label.
 
@@ -291,6 +292,7 @@ surfaces to this format before returning):
  {
  "severity": "P0|P1|P2|P3",
  "category": "architecture-alignment|security|error-handling|test-coverage|complexity|dependencies",
+ "location": "relative/path/to/file.ts:42",
  "file": "relative/path/to/file.ts",
  "line": 42,
  "description": "Specific description of the issue",
@@ -300,6 +302,10 @@ surfaces to this format before returning):
  }
  ```
 
+ **MMR compatibility:** The `location` field (`file:line` format) is required for
+ `mmr reconcile` injection. The `file` and `line` fields are retained for backward
+ compatibility with direct channel consumers.
+
  Store as `SUPERPOWERS_PHASE1_FINDINGS`.
 
  ### Step 5: Run Phase 2 — Parallel User Story Review
@@ -441,8 +447,8 @@ before returning. Then return all three channels' findings plus channel status:
  {
  "story": "[STORY_TITLE]",
  "channel_status": {
- "codex": { "root_cause": "null|not_installed|auth_failed|auth_timeout|failed", "coverage_status": "full|compensating" },
- "gemini": { "root_cause": "null|not_installed|auth_failed|auth_timeout|failed", "coverage_status": "full|compensating" },
+ "codex": { "root_cause": "null|not_installed|auth_failed|timeout|failed", "coverage_status": "full|compensating" },
+ "gemini": { "root_cause": "null|not_installed|auth_failed|timeout|failed", "coverage_status": "full|compensating" },
  "superpowers": { "root_cause": null, "coverage_status": "full" }
  },
  "codex": { "findings": [...] },
@@ -453,6 +459,29 @@ before returning. Then return all three channels' findings plus channel status:
 
  Collect findings from all subagents. Store as `PHASE2_FINDINGS`.
 
+ ### Step 5e: Optional — Inject Findings into MMR for Unified Reconciliation
+
+ If an MMR job exists (e.g., from a prior `mmr review` run on the same branch), the
+ agent can inject its post-implementation review findings into MMR for unified
+ reconciliation across all channels:
+
+ ```bash
+ # Inject Phase 1 and Phase 2 findings into an existing MMR job
+ # Write agent findings to a temp file for mmr reconcile
+ echo "$AGENT_FINDINGS" > /tmp/agent-findings.json
+ mmr reconcile "$JOB_ID" --channel superpowers --input /tmp/agent-findings.json
+ ```
+
+ All findings injected via `mmr reconcile` must use MMR-compatible schema: each
+ finding needs `severity` (P0-P3), `location` (file:line), and `description`
+ (`suggestion` is optional). The strict validator will reject findings with
+ missing or invalid required fields.
+
+ This step is optional — post-implementation review is a full-codebase review (not
+ diff-only), so it operates independently of `mmr review`. Use `mmr reconcile` only
+ when you want to merge post-implementation findings into an existing MMR job for a
+ single unified verdict.
+
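The required fields described in Step 5e can be checked locally before calling `mmr reconcile`. A sketch of such a pre-check: `validate_finding` and `with_location` are hypothetical, the `file:line` pattern is an assumption based on the `relative/path/to/file.ts:42` example, and MMR's own strict validator may apply different or additional rules:

```python
import re
from typing import Any, Dict, List

SEVERITIES = ("P0", "P1", "P2", "P3")
# Assumed shape: a non-empty path, a colon, then a positive line number.
LOCATION_RE = re.compile(r"^.+:[1-9]\d*$")


def validate_finding(finding: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the finding looks valid."""
    problems: List[str] = []
    if finding.get("severity") not in SEVERITIES:
        problems.append("severity must be one of P0, P1, P2, P3")
    location = finding.get("location")
    if not isinstance(location, str) or not LOCATION_RE.match(location):
        problems.append("location must look like file:line")
    description = finding.get("description")
    if not isinstance(description, str) or not description.strip():
        problems.append("description must be a non-empty string")
    return problems


def with_location(finding: Dict[str, Any]) -> Dict[str, Any]:
    """Derive `location` from the legacy `file` and `line` fields when it is missing."""
    if "location" not in finding and "file" in finding and "line" in finding:
        return {**finding, "location": f"{finding['file']}:{finding['line']}"}
    return finding
```

Running `with_location` first lets findings that only carry the backward-compatible `file` and `line` fields pass the same check.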
456
485
  ### Step 6: Consolidate Findings
457
486
 
458
487
  Merge all findings from Phase 1 (`CODEX_PHASE1_FINDINGS`, `GEMINI_PHASE1_FINDINGS`,
@@ -656,9 +685,9 @@ the user they require manual attention before the project is ready to release.
656
685
  | Codex not installed (`command -v` fails) | Queue compensating pass (implementation correctness, security, API contracts, labeled `[compensating: Codex-equivalent]`); document as "not_installed" in report |
657
686
  | Gemini not installed (`command -v` fails) | Queue compensating pass (architectural patterns, design reasoning, broad context, labeled `[compensating: Gemini-equivalent]`); document as "not_installed" in report |
658
687
  | Codex auth expired — user recovers | Re-run auth check; proceed with full Codex channel |
659
- | Codex auth expired — user declines or auth_timeout | Queue compensating pass (implementation correctness, security, API contracts, labeled `[compensating: Codex-equivalent]`); document as "auth_failed" or "auth_timeout" in report |
688
+ | Codex auth expired — user declines or timeout | Queue compensating pass (implementation correctness, security, API contracts, labeled `[compensating: Codex-equivalent]`); document as "auth_failed" or "timeout" in report |
660
689
  | Gemini auth expired (exit 41) — user recovers | Re-run auth check; proceed with full Gemini channel |
661
- | Gemini auth expired — user declines or auth_timeout | Queue compensating pass (architectural patterns, design reasoning, broad context, labeled `[compensating: Gemini-equivalent]`); document as "auth_failed" or "auth_timeout" in report |
690
+ | Gemini auth expired — user declines or timeout | Queue compensating pass (architectural patterns, design reasoning, broad context, labeled `[compensating: Gemini-equivalent]`); document as "auth_failed" or "timeout" in report |
662
691
  | Channel fails during execution (non-zero exit, malformed output, timeout) | Queue compensating pass for that channel with same focus and label; document root cause in report |
663
692
  | Both external CLIs unavailable (any combination of not_installed / auth failure) | Run all compensating passes plus Superpowers code-reviewer; report coverage as "degraded-coverage"; warn user that review coverage is reduced |
664
693
  | Superpowers unavailable | Document as "unavailable" in report; proceed with remaining channels; Superpowers is a Claude subagent and should always be available |
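For the Codex and Gemini rows of the table above, the root cause determines the coverage status. The sketch below shows that mapping in POSIX shell. `coverage_for` is a hypothetical helper, and it assumes that a `null` root cause means the channel ran normally (as in the `superpowers` status example) and that every other listed cause leads to a compensating pass. Legacy labels such as `auth_timeout` are rejected, matching the rename to `timeout` in this diff.

```shell
# Hypothetical helper: map a Codex/Gemini root_cause to a coverage_status.
# Usage: coverage_for ROOT_CAUSE
coverage_for() {
  case "$1" in
    null)
      # No failure recorded: the channel itself produced the review
      echo "full"
      ;;
    not_installed|auth_failed|timeout|failed)
      # The channel did not run to completion: a compensating pass covers it
      echo "compensating"
      ;;
    *)
      # Anything else (for example the old "auth_timeout") is not a known value
      echo "unrecognized root cause: $1" >&2
      return 1
      ;;
  esac
}
```

For example, `coverage_for timeout` prints `compensating`, while `coverage_for auth_timeout` fails with a message on stderr.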
@@ -23,7 +23,7 @@ anything leaves the machine.
23
23
  The three channels are:
24
24
  1. **Codex CLI** — implementation correctness, security, API contracts
25
25
  2. **Gemini CLI** — architectural patterns, broad-context reasoning
26
- 3. **Superpowers code-reviewer** — Claude subagent review of code quality, tests, and plan alignment
26
+ 3. **Claude CLI** — Claude subagent review of code quality, tests, and plan alignment
27
27
 
28
28
  ## Inputs
29
29
 
@@ -46,6 +46,31 @@ The three channels are:
46
46
 
47
47
  ## Instructions
48
48
 
49
+ ### Primary: MMR CLI + Agent Reconcile
50
+
51
+ When the MMR CLI is installed, use it as the primary entry point:
52
+
53
+ ```bash
54
+ # Staged changes
55
+ mmr review --staged --sync --format json
56
+
57
+ # Branch diff against main
58
+ mmr review --base main --sync --format json
59
+ ```
60
+
61
+ After the CLI review completes, dispatch the agent's code-reviewer skill (4th channel) and inject findings into the MMR job for unified reconciliation:
62
+
63
+ ```bash
64
+ # job_id is captured from mmr review --sync --format json output
65
+ # Write agent findings to a temp file for mmr reconcile
66
+ echo "$AGENT_FINDINGS" > /tmp/agent-findings.json
67
+ mmr reconcile "$JOB_ID" --channel superpowers --input /tmp/agent-findings.json
68
+ ```
69
+
70
+ The agent's review output must use MMR-compatible finding schema: each finding needs `severity` (P0-P3), `location` (file:line), and `description` (`suggestion` is optional).
71
+
72
+ If `mmr` is not installed (`command -v mmr` fails), fall back to the manual multi-channel flow below.
73
+
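The fallback decision described above can be sketched as a small POSIX shell function. `choose_review_flow` and its output labels `primary` and `manual` are hypothetical names used only for illustration; the only behavior taken from the text is the `command -v` availability check.

```shell
# Hypothetical helper: pick the review entry point based on CLI availability.
# Usage: choose_review_flow [CLI_NAME]   (CLI_NAME defaults to mmr)
choose_review_flow() {
  cli="${1:-mmr}"
  if command -v "$cli" >/dev/null 2>&1; then
    # CLI found on PATH: use the MMR CLI + agent reconcile path
    echo "primary"
  else
    # CLI missing: fall back to the manual multi-channel flow
    echo "manual"
  fi
}
```

Taking the CLI name as a parameter keeps the check easy to exercise without `mmr` installed.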
49
74
  ### Step 1: Detect Mode
50
75
 
51
76
  Parse `$ARGUMENTS` and set:
@@ -173,7 +198,7 @@ codex login status 2>/dev/null
173
198
  - If `codex` is not installed: skip this channel and record root-cause `not_installed`
174
199
  - If auth fails: tell the user to run `! codex login`, retry after recovery, and if recovery is not possible, record root-cause `auth_failed` and continue with the remaining channels
175
200
 
176
- If auth cannot be recovered, or if Codex is not installed, queue a compensating Claude self-review pass focused on implementation correctness, security, and API contracts. Label findings as `[compensating: Codex-equivalent]`. If auth check times out (~5s), retry once; if still failing, record `auth timeout` and queue compensating pass. This pass runs after all channel dispatch attempts complete.
201
+ If auth cannot be recovered, or if Codex is not installed, queue a compensating Claude self-review pass focused on implementation correctness, security, and API contracts. Label findings as `[compensating: Codex-equivalent]`. If auth check times out (~5s), retry once; if still failing, record `timeout` and queue compensating pass. This pass runs after all channel dispatch attempts complete.
177
202
 
178
203
  Build the prompt in a temporary file and pass it over stdin:
179
204
 
@@ -209,9 +234,9 @@ NO_BROWSER=true gemini -p "$(cat "$PROMPT_FILE")" --output-format json --approva
209
234
 
210
235
  If the CLI exits with a non-zero code, produces malformed/unparseable output, or is killed by the tool runner timeout, record root-cause `failed` and queue a compensating pass for that channel.
211
236
 
212
- #### Channel 3: Superpowers code-reviewer
237
+ #### Channel 3: Claude CLI
213
238
 
214
- Dispatch the `superpowers:code-reviewer` subagent.
239
+ Dispatch via `claude -p` with the review prompt.
215
240
 
216
241
  - If explicit refs are being reviewed, provide `BASE_SHA` and `HEAD_SHA`
217
242
  - Otherwise provide:
@@ -297,7 +322,7 @@ Otherwise:
297
322
  3. Repeat for up to 3 fix rounds
298
323
  4. If any finding remains unresolved after 3 rounds, stop with verdict `needs-user-decision`
299
324
 
300
- **Fix cycle channel rule:** Re-run only channels that originally completed or ran as compensating passes. Never retry a channel marked `not installed`, `auth failed`, or `auth timeout` during fix rounds — its availability does not change within a session.
325
+ **Fix cycle channel rule:** Re-run only channels that originally completed or ran as compensating passes. Never retry a channel marked `not_installed`, `auth_failed`, or `timeout` during fix rounds — its availability does not change within a session.
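One way to read the fix cycle channel rule is as a per-channel decision made at the start of each fix round. The sketch below is a hypothetical POSIX shell helper, `fix_round_action`, with made-up action labels. The rule does not name `failed`, so treating it like the other unavailable states is an assumption, based on the compensating pass queued for execution failures.

```shell
# Hypothetical helper: decide what to re-run for a channel in a fix round.
# Usage: fix_round_action ROOT_CAUSE
fix_round_action() {
  case "$1" in
    completed)
      # The channel ran successfully the first time, so re-run it
      echo "rerun-channel"
      ;;
    not_installed|auth_failed|timeout)
      # Never retry the channel itself; re-run its compensating pass
      echo "rerun-compensating"
      ;;
    failed)
      # Assumption: handled like the states above (not stated by the rule)
      echo "rerun-compensating"
      ;;
    *)
      echo "unrecognized root cause: $1" >&2
      return 1
      ;;
  esac
}
```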
301
326
 
302
327
  ### Step 8: Final Verdict
303
328
 
@@ -321,9 +346,9 @@ Output a concise summary in this format:
321
346
  [scope label]
322
347
 
323
348
  ### Channels Executed
324
- - Codex CLI — root cause: [completed / not installed / auth failed / auth timeout / failed], coverage: [full / compensating (Codex-equivalent)]
325
- - Gemini CLI — root cause: [completed / not installed / auth failed / auth timeout / failed], coverage: [full / compensating (Gemini-equivalent)]
326
- - Superpowers code-reviewer — [completed / failed]
349
+ - Codex CLI — root cause: [completed / not_installed / auth_failed / timeout / failed], coverage: [full / compensating (Codex-equivalent)]
350
+ - Gemini CLI — root cause: [completed / not_installed / auth_failed / timeout / failed], coverage: [full / compensating (Gemini-equivalent)]
351
+ - Claude CLI — root cause: [completed / not_installed / auth_failed / timeout / failed], coverage: [full / compensating]
327
352
 
328
353
  ### Findings
329
354
  [consensus findings first, then single-source findings]