npm - @zigrivers/scaffold - Versions diffs - 3.14.0 → 3.16.0 - Mend

@zigrivers/scaffold 3.14.0 → 3.16.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (122)

package/README.md +50 -21
package/content/knowledge/core/automated-review-tooling.md +21 -26
package/content/knowledge/core/multi-model-review-dispatch.md +30 -55
package/content/knowledge/research/research-architecture.md +385 -0
package/content/knowledge/research/research-conventions.md +248 -0
package/content/knowledge/research/research-dev-environment.md +303 -0
package/content/knowledge/research/research-experiment-loop.md +429 -0
package/content/knowledge/research/research-experiment-tracking.md +336 -0
package/content/knowledge/research/research-ml-architecture-search.md +383 -0
package/content/knowledge/research/research-ml-evaluation.md +407 -0
package/content/knowledge/research/research-ml-experiment-tracking.md +466 -0
package/content/knowledge/research/research-ml-training-patterns.md +413 -0
package/content/knowledge/research/research-observability.md +395 -0
package/content/knowledge/research/research-overfitting-prevention.md +306 -0
package/content/knowledge/research/research-project-structure.md +264 -0
package/content/knowledge/research/research-quant-backtesting.md +326 -0
package/content/knowledge/research/research-quant-market-data.md +366 -0
package/content/knowledge/research/research-quant-metrics.md +335 -0
package/content/knowledge/research/research-quant-requirements.md +223 -0
package/content/knowledge/research/research-quant-risk.md +469 -0
package/content/knowledge/research/research-quant-strategy-patterns.md +412 -0
package/content/knowledge/research/research-requirements.md +201 -0
package/content/knowledge/research/research-security.md +374 -0
package/content/knowledge/research/research-sim-compute-management.md +538 -0
package/content/knowledge/research/research-sim-engine-patterns.md +448 -0
package/content/knowledge/research/research-sim-parameter-spaces.md +425 -0
package/content/knowledge/research/research-sim-validation.md +456 -0
package/content/knowledge/research/research-testing.md +334 -0
package/content/methodology/research-ml-research.yml +23 -0
package/content/methodology/research-overlay.yml +65 -0
package/content/methodology/research-quant-finance.yml +29 -0
package/content/methodology/research-simulation.yml +23 -0
package/content/tools/post-implementation-review.md +36 -7
package/content/tools/review-code.md +33 -8
package/content/tools/review-pr.md +79 -95
package/dist/cli/commands/adopt.d.ts.map +1 -1
package/dist/cli/commands/adopt.js +22 -1
package/dist/cli/commands/adopt.js.map +1 -1
package/dist/cli/commands/adopt.serialization.test.js +41 -0
package/dist/cli/commands/adopt.serialization.test.js.map +1 -1
package/dist/cli/commands/init.d.ts +4 -0
package/dist/cli/commands/init.d.ts.map +1 -1
package/dist/cli/commands/init.js +32 -2
package/dist/cli/commands/init.js.map +1 -1
package/dist/cli/init-flag-families.d.ts +6 -1
package/dist/cli/init-flag-families.d.ts.map +1 -1
package/dist/cli/init-flag-families.js +32 -1
package/dist/cli/init-flag-families.js.map +1 -1
package/dist/cli/init-flag-families.test.js +47 -0
package/dist/cli/init-flag-families.test.js.map +1 -1
package/dist/config/schema.d.ts +272 -16
package/dist/config/schema.d.ts.map +1 -1
package/dist/config/schema.js +25 -1
package/dist/config/schema.js.map +1 -1
package/dist/config/schema.test.js +103 -3
package/dist/config/schema.test.js.map +1 -1
package/dist/core/assembly/overlay-loader.d.ts +12 -0
package/dist/core/assembly/overlay-loader.d.ts.map +1 -1
package/dist/core/assembly/overlay-loader.js +30 -0
package/dist/core/assembly/overlay-loader.js.map +1 -1
package/dist/core/assembly/overlay-loader.test.js +66 -1
package/dist/core/assembly/overlay-loader.test.js.map +1 -1
package/dist/core/assembly/overlay-state-resolver.d.ts.map +1 -1
package/dist/core/assembly/overlay-state-resolver.js +48 -19
package/dist/core/assembly/overlay-state-resolver.js.map +1 -1
package/dist/core/assembly/overlay-state-resolver.test.js +80 -0
package/dist/core/assembly/overlay-state-resolver.test.js.map +1 -1
package/dist/e2e/project-type-overlays.test.js +119 -0
package/dist/e2e/project-type-overlays.test.js.map +1 -1
package/dist/project/adopt.d.ts.map +1 -1
package/dist/project/adopt.js +3 -1
package/dist/project/adopt.js.map +1 -1
package/dist/project/detectors/disambiguate.js +1 -1
package/dist/project/detectors/disambiguate.js.map +1 -1
package/dist/project/detectors/index.d.ts.map +1 -1
package/dist/project/detectors/index.js +2 -1
package/dist/project/detectors/index.js.map +1 -1
package/dist/project/detectors/ml.d.ts.map +1 -1
package/dist/project/detectors/ml.js +2 -6
package/dist/project/detectors/ml.js.map +1 -1
package/dist/project/detectors/research.d.ts +4 -0
package/dist/project/detectors/research.d.ts.map +1 -0
package/dist/project/detectors/research.js +141 -0
package/dist/project/detectors/research.js.map +1 -0
package/dist/project/detectors/research.test.d.ts +2 -0
package/dist/project/detectors/research.test.d.ts.map +1 -0
package/dist/project/detectors/research.test.js +235 -0
package/dist/project/detectors/research.test.js.map +1 -0
package/dist/project/detectors/shared-signals.d.ts +3 -0
package/dist/project/detectors/shared-signals.d.ts.map +1 -0
package/dist/project/detectors/shared-signals.js +9 -0
package/dist/project/detectors/shared-signals.js.map +1 -0
package/dist/project/detectors/types.d.ts +6 -2
package/dist/project/detectors/types.d.ts.map +1 -1
package/dist/project/detectors/types.js.map +1 -1
package/dist/types/config.d.ts +7 -1
package/dist/types/config.d.ts.map +1 -1
package/dist/wizard/copy/core.d.ts.map +1 -1
package/dist/wizard/copy/core.js +4 -0
package/dist/wizard/copy/core.js.map +1 -1
package/dist/wizard/copy/index.d.ts.map +1 -1
package/dist/wizard/copy/index.js +2 -0
package/dist/wizard/copy/index.js.map +1 -1
package/dist/wizard/copy/research.d.ts +3 -0
package/dist/wizard/copy/research.d.ts.map +1 -0
package/dist/wizard/copy/research.js +27 -0
package/dist/wizard/copy/research.js.map +1 -0
package/dist/wizard/copy/types.d.ts +5 -1
package/dist/wizard/copy/types.d.ts.map +1 -1
package/dist/wizard/flags.d.ts +7 -1
package/dist/wizard/flags.d.ts.map +1 -1
package/dist/wizard/questions.d.ts +4 -2
package/dist/wizard/questions.d.ts.map +1 -1
package/dist/wizard/questions.js +27 -1
package/dist/wizard/questions.js.map +1 -1
package/dist/wizard/questions.test.js +51 -0
package/dist/wizard/questions.test.js.map +1 -1
package/dist/wizard/wizard.d.ts +3 -2
package/dist/wizard/wizard.d.ts.map +1 -1
package/dist/wizard/wizard.js +3 -1
package/dist/wizard/wizard.js.map +1 -1
package/package.json +1 -1

package/content/knowledge/research/research-testing.md ADDED

@@ -0,0 +1,334 @@
+---
+name: research-testing
+description: Testing experiment loops including determinism tests, result validation, integration tests for experiment pipelines, and regression baselines
+topics: [research, testing, determinism, validation, integration-tests, regression, tdd]
+---
+Research code is notoriously undertested because "the results are stochastic" feels like a valid excuse. It is not. The experiment runner, evaluation framework, data pipeline, and state management are all deterministic and must be tested rigorously. The stochastic parts (experiment outcomes) require seed-based determinism tests and statistical validation. Untested experiment loops produce unreliable results that waste compute and mislead researchers.
+## Summary
+Test research projects at four levels: determinism tests (same seed produces same results), component tests (runner, evaluator, tracker work correctly in isolation), integration tests (full experiment loop produces valid results on fixture data), and regression tests (new code changes do not alter previously established baselines). Use pytest with fixtures for small datasets and mocked external dependencies. Run tests on every commit -- fast tests in pre-commit, slow integration tests in CI.
+## Deep Guidance
+### Determinism Tests
+The most important property of a research system: given the same seed and config, it must produce identical results:
+```python
+# tests/test_determinism.py
+import copy
+import pytest
+from src.runner.experiment_runner import ExperimentRunner
+from src.seed import set_seed
+class TestDeterminism:
+    def test_same_seed_same_results(self, tmp_path, fixture_config):
+        """Two runs with the same seed must produce identical metrics."""
+        # Deep copy so the nested dicts of the shared fixture are not mutated
+        config = copy.deepcopy(fixture_config)
+        config["experiment"]["seed"] = 42
+        config["logging"]["results_dir"] = str(tmp_path)
+        # Run 1
+        set_seed(42)
+        runner1 = ExperimentRunner(config)
+        result1 = runner1.run_single()
+        # Run 2
+        set_seed(42)
+        runner2 = ExperimentRunner(config)
+        result2 = runner2.run_single()
+        assert result1.metrics == result2.metrics, (
+            f"Non-deterministic results:\n"
+            f"  Run 1: {result1.metrics}\n"
+            f"  Run 2: {result2.metrics}"
+        )
+    def test_different_seeds_different_results(self, tmp_path, fixture_config):
+        """Different seeds should produce different results (not trivially constant)."""
+        results = []
+        for seed in (42, 123):
+            # Set the seed in the config as well as globally, in case the
+            # runner re-seeds from config["experiment"]["seed"]
+            config = copy.deepcopy(fixture_config)
+            config["experiment"]["seed"] = seed
+            config["logging"]["results_dir"] = str(tmp_path / f"seed-{seed}")
+            set_seed(seed)
+            results.append(ExperimentRunner(config).run_single())
+        assert results[0].metrics != results[1].metrics, (
+            "Different seeds produced identical results -- "
+            "strategy may be ignoring the seed"
+        )
+    def test_seed_isolation_between_runs(self, tmp_path, fixture_config):
+        """Each run in the loop must use an independent seed."""
+        config = copy.deepcopy(fixture_config)
+        config["experiment"]["seed"] = 42
+        config["experiment"]["num_runs"] = 5
+        config["logging"]["results_dir"] = str(tmp_path)
+        runner = ExperimentRunner(config)
+        state = runner.run_loop()
+        # Verify the runs did not all produce identical metrics (i.e. the seed is not reused)
+        metric_values = [run.metrics for run in state.history]
+        assert len({str(v) for v in metric_values}) > 1, (
+            "All runs produced identical metrics -- seed may not be incremented"
+        )
+```
+### Component Tests
+Test each component of the experiment system in isolation:
+```python
+# tests/test_evaluator.py
+import pytest
+from src.evaluation.evaluator import MetricEvaluator
+class TestMetricEvaluator:
+    @pytest.fixture
+    def evaluator(self):
+        return MetricEvaluator(
+            primary_metric="sharpe_ratio",
+            direction="maximize",
+        )
+    def test_evaluate_returns_expected_metrics(self, evaluator):
+        """Evaluator must return all configured metrics."""
+        raw_results = {
+            "returns": [0.01, -0.005, 0.02, -0.01, 0.015],
+            "trades": 5,
+        }
+        metrics = evaluator.evaluate(raw_results)
+        assert "sharpe_ratio" in metrics
+        assert "max_drawdown" in metrics
+        assert "num_trades" in metrics
+        assert isinstance(metrics["sharpe_ratio"], float)
+    def test_is_improvement_maximization(self, evaluator):
+        """Higher primary metric should be an improvement when maximizing."""
+        current = {"sharpe_ratio": 1.5, "max_drawdown": 0.1}
+        best = {"sharpe_ratio": 1.2, "max_drawdown": 0.12}
+        assert evaluator.is_improvement(current, best) is True
+    def test_is_not_improvement(self, evaluator):
+        """Lower primary metric should not be an improvement when maximizing."""
+        current = {"sharpe_ratio": 1.0, "max_drawdown": 0.1}
+        best = {"sharpe_ratio": 1.5, "max_drawdown": 0.12}
+        assert evaluator.is_improvement(current, best) is False
+    def test_evaluate_empty_results_raises(self, evaluator):
+        """Empty results must raise a clear error, not return NaN."""
+        with pytest.raises(ValueError, match="empty"):
+            evaluator.evaluate({"returns": [], "trades": 0})
+# tests/test_state.py
+import pytest
+import json
+from pathlib import Path
+from src.runner.state import ExperimentState, RunRecord
+class TestExperimentState:
+    def test_save_and_load_roundtrip(self, tmp_path):
+        """State must survive a save/load cycle."""
+        state = ExperimentState(experiment_id="test-001")
+        run = RunRecord(
+            run_id="run-0001",
+            config={"strategy": {"type": "momentum"}},
+            metrics={"sharpe_ratio": 1.5},
+            is_best=True,
+            decision="keep",
+        )
+        state.record_run(run)
+        path = tmp_path / "state.json"
+        state.save(path)
+        loaded = ExperimentState.load(path)
+        assert loaded.experiment_id == "test-001"
+        assert loaded.total_runs == 1
+        assert loaded.best_run.metrics == {"sharpe_ratio": 1.5}
+    def test_runs_since_improvement_tracking(self):
+        """State must track runs since last improvement."""
+        state = ExperimentState(experiment_id="test")
+        # First run is always best
+        state.record_run(RunRecord(
+            run_id="1", config={}, metrics={"m": 1.0}, is_best=True, decision="keep",
+        ))
+        assert state.runs_since_improvement == 0
+        # Non-improvement increments counter
+        state.record_run(RunRecord(
+            run_id="2", config={}, metrics={"m": 0.5}, is_best=False, decision="discard",
+        ))
+        assert state.runs_since_improvement == 1
+        # New best resets counter
+        state.record_run(RunRecord(
+            run_id="3", config={}, metrics={"m": 2.0}, is_best=True, decision="keep",
+        ))
+        assert state.runs_since_improvement == 0
+```
+### Integration Tests
+Integration tests run the full experiment loop on small fixture data:
+```python
+# tests/test_integration.py
+import pytest
+from pathlib import Path
+from src.runner.experiment_runner import ExperimentRunner
+from src.loop.state_machine import LoopState
+class TestExperimentLoopIntegration:
+    @pytest.fixture
+    def small_config(self, tmp_path):
+        return {
+            "experiment": {"seed": 42, "num_runs": 10},
+            "strategy": {"type": "mock_strategy", "params": {}},
+            "budget": {"max_runs": 10, "patience": 5},
+            "logging": {"results_dir": str(tmp_path / "results")},
+        }
+    def test_loop_runs_to_completion(self, small_config, tmp_path):
+        """Loop must complete within budget and produce valid state."""
+        runner = ExperimentRunner(small_config)
+        state = runner.run_loop()
+        assert state.total_runs <= 10
+        assert state.best_run is not None
+        assert len(state.history) == state.total_runs
+    def test_loop_persists_state(self, small_config, tmp_path):
+        """State file must exist and be loadable after loop completes."""
+        runner = ExperimentRunner(small_config)
+        runner.run_loop()
+        state_path = Path(small_config["logging"]["results_dir"]) / "state.json"
+        assert state_path.exists()
+        loaded = LoopState.load(state_path)
+        assert loaded.iteration > 0
+    def test_loop_resume_after_interruption(self, small_config, tmp_path):
+        """Loop must resume correctly from persisted state."""
+        # Build a new config rather than mutating the fixture's nested "budget" dict
+        config = {**small_config, "budget": {**small_config["budget"], "max_runs": 20}}
+        # Run 10 iterations
+        runner1 = ExperimentRunner(config)
+        runner1.budget.max_runs = 10
+        state1 = runner1.run_loop()
+        assert state1.total_runs == 10
+        # Resume from saved state, run 10 more
+        runner2 = ExperimentRunner(config)
+        state2 = runner2.run_loop()
+        assert state2.total_runs == 20
+    def test_results_directory_structure(self, small_config, tmp_path):
+        """Each run must create the expected result files."""
+        runner = ExperimentRunner(small_config)
+        runner.run_loop()
+        results_dir = Path(small_config["logging"]["results_dir"])
+        run_dirs = sorted(d for d in results_dir.iterdir()
+                          if d.is_dir() and d.name.startswith("run-"))
+        assert len(run_dirs) > 0
+        for run_dir in run_dirs:
+            assert (run_dir / "config.json").exists()
+            assert (run_dir / "metrics.json").exists()
+```
+### Regression Baselines
+Establish metric baselines so that code changes do not silently degrade results:
+```python
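+# tests/test_regression.py
+import json
+from pathlib import Path
+import pytest
+from src.runner.experiment_runner import ExperimentRunner
+# Resolve relative to this file so the test does not depend on the working directory
+BASELINE_PATH = Path(__file__).parent / "fixtures" / "baselines" / "metrics_baseline.json"
+class TestRegressionBaseline:
+    @pytest.fixture(scope="class")
+    def current_metrics(self, tmp_path_factory):
+        """Run the standard benchmark once per class and return the best run's metrics."""
+        # A class-scoped fixture cannot request the function-scoped tmp_path or the
+        # small_config fixture defined on the integration test class, so build the
+        # benchmark config here with a directory from tmp_path_factory.
+        results_dir = tmp_path_factory.mktemp("regression") / "results"
+        config = {
+            "experiment": {"seed": 42, "num_runs": 10},
+            "strategy": {"type": "mock_strategy", "params": {}},
+            "budget": {"max_runs": 10, "patience": 5},
+            "logging": {"results_dir": str(results_dir)},
+        }
+        runner = ExperimentRunner(config)
+        state = runner.run_loop()
+        return state.best_run.metrics
+    @pytest.fixture(scope="class")
+    def baseline_metrics(self):
+        """Load the committed baseline metrics."""
+        with open(BASELINE_PATH) as f:
+            return json.load(f)
+    def test_primary_metric_no_regression(self, current_metrics, baseline_metrics):
+        """Primary metric must not regress beyond tolerance."""
+        tolerance = 0.05  # 5% relative tolerance
+        baseline = baseline_metrics["sharpe_ratio"]
+        current = current_metrics["sharpe_ratio"]
+        # Scale by abs(baseline) so the threshold stays below a negative baseline
+        threshold = baseline - tolerance * abs(baseline)
+        assert current >= threshold, (
+            f"Regression: sharpe_ratio {current:.4f} < threshold {threshold:.4f} "
+            f"(baseline {baseline:.4f}, tolerance: {tolerance:.0%})"
+        )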
+```
+### Test Fixtures
+```python
+# tests/conftest.py
+import pytest
+@pytest.fixture
+def fixture_config(tmp_path):
+    """Minimal config for fast tests."""
+    return {
+        "experiment": {"seed": 42, "num_runs": 5},
+        "strategy": {"type": "mock_strategy", "params": {}},
+        "data": {"source": "tests/fixtures/small_data.csv"},
+        "budget": {"max_runs": 5, "patience": 3},
+        "logging": {"results_dir": str(tmp_path / "results")},
+    }
+@pytest.fixture
+def mock_strategy():
+    """Strategy that returns predictable results for testing."""
+    class MockStrategy:
+        name = "mock_strategy"
+        _call_count = 0
+        def execute(self, config):
+            self._call_count += 1
+            return {
+                "returns": [0.01 * self._call_count, -0.005, 0.02],
+                "trades": self._call_count * 10,
+            }
+        def next_hypothesis(self, state):
+            return {"param": state.iteration}
+    return MockStrategy()
+```
+### Testing Best Practices for Research
+- **Fast tests in pre-commit**: Determinism and component tests must run in < 10 seconds.
+- **Slow tests in CI**: Integration tests with actual experiment execution run in CI only (mark with `@pytest.mark.slow`).
+- **Mock external resources**: Mock file I/O, API calls, and database connections in unit tests. Integration tests may use real file I/O with `tmp_path`.
+- **Test the loop termination**: Verify that every stopping condition actually stops the loop. Budget exhaustion, patience, convergence, and error limits must all be tested.
+- **Test crash recovery**: Simulate a crash by persisting state mid-loop, then verify the loop resumes correctly.
+- **Baseline updates are deliberate**: Updating regression baselines requires a commit message explaining why the baseline changed. Never auto-update baselines in CI.
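The determinism and regression checks in this file can be illustrated without the project's `src` modules. In the sketch below, `toy_experiment` is a hypothetical stand-in for `ExperimentRunner.run_single()` and `within_regression_tolerance` is an illustrative helper, neither taken from the package. It shows two assumptions in action: routing all randomness through one seeded RNG makes the same-seed and different-seed assertions hold, and scaling the tolerance by `abs(baseline)` keeps the regression check meaningful when a maximized metric such as a Sharpe ratio is negative.

```python
import random
from typing import Dict


def toy_experiment(seed: int, num_samples: int = 100) -> Dict[str, float]:
    """Hypothetical stand-in for a single experiment run.

    All randomness comes from one local RNG seeded from `seed`, so no global
    random state leaks between runs.
    """
    rng = random.Random(seed)
    returns = [rng.gauss(0.001, 0.01) for _ in range(num_samples)]
    mean = sum(returns) / num_samples
    var = sum((r - mean) ** 2 for r in returns) / (num_samples - 1)
    return {"mean_return": mean, "volatility": var ** 0.5}


def within_regression_tolerance(current: float, baseline: float, tolerance: float = 0.05) -> bool:
    """Return True if a maximized metric has not dropped more than `tolerance`
    (relative) below the baseline.

    Using abs(baseline) keeps the threshold below the baseline even when the
    baseline is negative; baseline * (1 - tolerance) would not.
    """
    return current >= baseline - tolerance * abs(baseline)
```

For a baseline of -1.0 this accepts -1.04 and rejects -1.1, whereas the `baseline * (1 - tolerance)` form would demand at least -0.95, i.e. better than the baseline itself.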

package/content/methodology/research-ml-research.yml ADDED

@@ -0,0 +1,23 @@
+# methodology/research-ml-research.yml
+name: research-ml-research
+description: >
+  ML-research domain sub-overlay — adds architecture search, training
+  patterns, and evaluation knowledge for ML research projects.
+project-type: research
+domain: ml-research
+knowledge-overrides:
+  system-architecture:
+    append: [research-ml-architecture-search, research-ml-training-patterns]
+  operations:
+    append: [research-ml-experiment-tracking]
+  tdd:
+    append: [research-ml-evaluation]
+  create-evals:
+    append: [research-ml-evaluation]
+  review-architecture:
+    append: [research-ml-architecture-search]
+  review-testing:
+    append: [research-ml-evaluation]
+  implementation-plan:
+    append: [research-ml-architecture-search]

package/content/methodology/research-overlay.yml ADDED

@@ -0,0 +1,65 @@
+# methodology/research-overlay.yml
+name: research
+description: >
+  Research overlay — injects research domain knowledge into existing
+  pipeline steps for experiment loop architecture, tracking, evaluation,
+  overfitting prevention, and domain-specific patterns.
+project-type: research
+# ---------------------------------------------------------------------------
+# knowledge-overrides
+# ---------------------------------------------------------------------------
+# Map research knowledge entries into existing pipeline steps so that
+# experiment loop domain expertise is injected during prompt assembly.
+knowledge-overrides:
+  # Foundational (6 steps)
+  create-prd:
+    append: [research-requirements]
+  user-stories:
+    append: [research-requirements]
+  coding-standards:
+    append: [research-conventions]
+  project-structure:
+    append: [research-project-structure]
+  dev-env-setup:
+    append: [research-dev-environment]
+  git-workflow:
+    append: [research-conventions]
+  # Architecture & Design (6 steps)
+  system-architecture:
+    append: [research-architecture, research-experiment-loop]
+  tech-stack:
+    append: [research-architecture]
+  adrs:
+    append: [research-architecture]
+  domain-modeling:
+    append: [research-experiment-loop]
+  security:
+    append: [research-security]
+  operations:
+    append: [research-experiment-tracking, research-observability]
+  # Testing (4 steps)
+  tdd:
+    append: [research-testing, research-overfitting-prevention]
+  add-e2e-testing:
+    append: [research-testing]
+  create-evals:
+    append: [research-testing, research-overfitting-prevention]
+  story-tests:
+    append: [research-testing]
+  # Reviews (4 steps)
+  review-architecture:
+    append: [research-architecture, research-experiment-loop]
+  review-security:
+    append: [research-security]
+  review-operations:
+    append: [research-experiment-tracking, research-observability]
+  review-testing:
+    append: [research-testing, research-overfitting-prevention]
+  # Planning (1 step)
+  implementation-plan:
+    append: [research-architecture]

package/content/methodology/research-quant-finance.yml ADDED

@@ -0,0 +1,29 @@
+# methodology/research-quant-finance.yml
+name: research-quant-finance
+description: >
+  Quant-finance domain sub-overlay — adds trading strategy, backtesting,
+  risk analysis, and market data knowledge to research projects.
+project-type: research
+domain: quant-finance
+knowledge-overrides:
+  create-prd:
+    append: [research-quant-requirements]
+  system-architecture:
+    append: [research-quant-backtesting, research-quant-strategy-patterns]
+  domain-modeling:
+    append: [research-quant-market-data]
+  security:
+    append: [research-quant-risk]
+  operations:
+    append: [research-quant-metrics]
+  tdd:
+    append: [research-quant-backtesting]
+  create-evals:
+    append: [research-quant-metrics, research-quant-backtesting]
+  review-architecture:
+    append: [research-quant-backtesting, research-quant-strategy-patterns]
+  review-testing:
+    append: [research-quant-backtesting]
+  implementation-plan:
+    append: [research-quant-backtesting, research-quant-strategy-patterns]

package/content/methodology/research-simulation.yml ADDED

@@ -0,0 +1,23 @@
+# methodology/research-simulation.yml
+name: research-simulation
+description: >
+  Simulation domain sub-overlay — adds physics/materials simulation engine,
+  parameter space, and compute management knowledge.
+project-type: research
+domain: simulation
+knowledge-overrides:
+  system-architecture:
+    append: [research-sim-engine-patterns, research-sim-parameter-spaces]
+  domain-modeling:
+    append: [research-sim-parameter-spaces]
+  operations:
+    append: [research-sim-compute-management]
+  tdd:
+    append: [research-sim-validation]
+  create-evals:
+    append: [research-sim-validation, research-sim-parameter-spaces]
+  review-architecture:
+    append: [research-sim-engine-patterns]
+  implementation-plan:
+    append: [research-sim-engine-patterns]
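These YAML files do not spell out how the base `research` overlay and a domain sub-overlay (which share `project-type: research`) combine when both apply to the same step. The sketch below is a hypothetical model, not scaffold's overlay loader: it assumes the base overlay's `append` list is applied first, then the sub-overlay's, and that an entry already present is not added twice. The dictionaries copy a subset of entries from `research-overlay.yml` and `research-quant-finance.yml`; `base` stands in for a step's own default knowledge, which these files do not show.

```python
from typing import Dict, List

# step name -> {"append": [knowledge entries]}, mirroring the YAML shape
Overrides = Dict[str, Dict[str, List[str]]]

# Subset of research-overlay.yml
RESEARCH_OVERLAY: Overrides = {
    "system-architecture": {"append": ["research-architecture", "research-experiment-loop"]},
    "tdd": {"append": ["research-testing", "research-overfitting-prevention"]},
    "create-evals": {"append": ["research-testing", "research-overfitting-prevention"]},
}

# Subset of research-quant-finance.yml
QUANT_FINANCE: Overrides = {
    "system-architecture": {"append": ["research-quant-backtesting", "research-quant-strategy-patterns"]},
    "tdd": {"append": ["research-quant-backtesting"]},
    "create-evals": {"append": ["research-quant-metrics", "research-quant-backtesting"]},
}


def effective_knowledge(step: str, base: List[str], *layers: Overrides) -> List[str]:
    """Apply each layer's `append` list for `step` in order, skipping duplicates.

    The ordering and de-duplication rules are assumptions for illustration.
    """
    result = list(base)
    for layer in layers:
        for entry in layer.get(step, {}).get("append", []):
            if entry not in result:
                result.append(entry)
    return result
```

Under these assumptions, `effective_knowledge("tdd", [], RESEARCH_OVERLAY, QUANT_FINANCE)` yields `["research-testing", "research-overfitting-prevention", "research-quant-backtesting"]`, and a step with no overrides in either layer keeps only its `base` list.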

package/content/tools/post-implementation-review.md CHANGED

@@ -26,7 +26,7 @@ comprehensive quality check before releasing or handing off the project.
 The three channels are:
 1. **Codex CLI** — Implementation correctness, security, API contracts
 2. **Gemini CLI** — Design reasoning, architectural patterns, broad context
-3. **Superpowers code-reviewer** — Plan alignment, code quality, testing
+3. **Claude CLI** — Plan alignment, code quality, testing
 ## Inputs
@@ -191,6 +191,7 @@ Return ALL findings as valid JSON:
     {
       "severity": "P0|P1|P2|P3",
       "category": "architecture-alignment|security|error-handling|test-coverage|complexity|dependencies",
+      "location": "relative/path/to/file.ts:42",
       "file": "relative/path/to/file.ts",
       "line": 42,
       "description": "Specific description of the issue",
@@ -226,7 +227,7 @@ If not installed: queue a compensating pass (implementation correctness, securit
 codex login status 2>/dev/null && echo "codex authenticated" || echo "codex NOT authenticated"
 ```
-If not authenticated: tell the user "Codex auth expired. Run: `! codex login`". Do NOT silently skip. Wait for re-auth and retry once. If auth cannot be recovered (auth_timeout or user declines): queue a compensating pass (implementation correctness, security, API contracts, labeled `[compensating: Codex-equivalent]`).
+If not authenticated: tell the user "Codex auth expired. Run: `! codex login`". Do NOT silently skip. Wait for re-auth and retry once. If auth cannot be recovered (timeout or user declines): queue a compensating pass (implementation correctness, security, API contracts, labeled `[compensating: Codex-equivalent]`).
 If Codex fails during execution (non-zero exit, malformed output, timeout): queue a compensating pass with the same focus and label.
@@ -254,7 +255,7 @@ If not installed: queue a compensating pass (architectural patterns, design reas
 NO_BROWSER=true gemini -p "respond with ok" -o json 2>&1
 ```
-If exit code is 41: tell the user "Gemini auth expired. Run: `! gemini -p \"hello\"`". Do NOT silently skip. Wait for re-auth and retry once. If auth cannot be recovered (auth_timeout or user declines): queue a compensating pass (architectural patterns, design reasoning, broad context, labeled `[compensating: Gemini-equivalent]`).
+If exit code is 41: tell the user "Gemini auth expired. Run: `! gemini -p \"hello\"`". Do NOT silently skip. Wait for re-auth and retry once. If auth cannot be recovered (timeout or user declines): queue a compensating pass (architectural patterns, design reasoning, broad context, labeled `[compensating: Gemini-equivalent]`).
 If Gemini fails during execution (non-zero exit, malformed output, timeout): queue a compensating pass with the same focus and label.
@@ -291,6 +292,7 @@ surfaces to this format before returning):
     {
       "severity": "P0|P1|P2|P3",
       "category": "architecture-alignment|security|error-handling|test-coverage|complexity|dependencies",
+      "location": "relative/path/to/file.ts:42",
       "file": "relative/path/to/file.ts",
       "line": 42,
       "description": "Specific description of the issue",
@@ -300,6 +302,10 @@ surfaces to this format before returning):
 }
 ```
+**MMR compatibility:** The `location` field (`file:line` format) is required for
+`mmr reconcile` injection. The `file` and `line` fields are retained for backward
+compatibility with direct channel consumers.
 Store as `SUPERPOWERS_PHASE1_FINDINGS`.
 ### Step 5: Run Phase 2 — Parallel User Story Review
@@ -441,8 +447,8 @@ before returning. Then return all three channels' findings plus channel status:
 {
   "story": "[STORY_TITLE]",
   "channel_status": {
-    "codex": { "root_cause": "null|not_installed|auth_failed|auth_timeout|failed", "coverage_status": "full|compensating" },
-    "gemini": { "root_cause": "null|not_installed|auth_failed|auth_timeout|failed", "coverage_status": "full|compensating" },
+    "codex": { "root_cause": "null|not_installed|auth_failed|timeout|failed", "coverage_status": "full|compensating" },
+    "gemini": { "root_cause": "null|not_installed|auth_failed|timeout|failed", "coverage_status": "full|compensating" },
     "superpowers": { "root_cause": null, "coverage_status": "full" }
   },
   "codex": { "findings": [...] },
@@ -453,6 +459,29 @@ before returning. Then return all three channels' findings plus channel status:
 Collect findings from all subagents. Store as `PHASE2_FINDINGS`.
+### Step 5e: Optional — Inject Findings into MMR for Unified Reconciliation
+If an MMR job exists (e.g., from a prior `mmr review` run on the same branch), the
+agent can inject its post-implementation review findings into MMR for unified
+reconciliation across all channels:
+```bash
+# Inject Phase 1 and Phase 2 findings into an existing MMR job
+# Write agent findings to a temp file for mmr reconcile
+echo "$AGENT_FINDINGS" > /tmp/agent-findings.json
+mmr reconcile "$JOB_ID" --channel superpowers --input /tmp/agent-findings.json
+```
+All findings injected via `mmr reconcile` must use the MMR-compatible schema: each
+finding needs `severity` (P0-P3), `location` (file:line), and `description`
+(`suggestion` is optional). The strict validator will reject findings with
+missing or invalid required fields.
+This step is optional — post-implementation review is a full-codebase review (not
+diff-only), so it operates independently of `mmr review`. Use `mmr reconcile` only
+when you want to merge post-implementation findings into an existing MMR job for a
+single unified verdict.
 ### Step 6: Consolidate Findings
 Merge all findings from Phase 1 (`CODEX_PHASE1_FINDINGS`, `GEMINI_PHASE1_FINDINGS`,
@@ -656,9 +685,9 @@ the user they require manual attention before the project is ready to release.
 | Codex not installed (`command -v` fails) | Queue compensating pass (implementation correctness, security, API contracts, labeled `[compensating: Codex-equivalent]`); document as "not_installed" in report |
 | Gemini not installed (`command -v` fails) | Queue compensating pass (architectural patterns, design reasoning, broad context, labeled `[compensating: Gemini-equivalent]`); document as "not_installed" in report |
 | Codex auth expired — user recovers | Re-run auth check; proceed with full Codex channel |
-| Codex auth expired — user declines or auth_timeout | Queue compensating pass (implementation correctness, security, API contracts, labeled `[compensating: Codex-equivalent]`); document as "auth_failed" or "auth_timeout" in report |
+| Codex auth expired — user declines or timeout | Queue compensating pass (implementation correctness, security, API contracts, labeled `[compensating: Codex-equivalent]`); document as "auth_failed" or "timeout" in report |
 | Gemini auth expired (exit 41) — user recovers | Re-run auth check; proceed with full Gemini channel |
-| Gemini auth expired — user declines or auth_timeout | Queue compensating pass (architectural patterns, design reasoning, broad context, labeled `[compensating: Gemini-equivalent]`); document as "auth_failed" or "auth_timeout" in report |
+| Gemini auth expired — user declines or timeout | Queue compensating pass (architectural patterns, design reasoning, broad context, labeled `[compensating: Gemini-equivalent]`); document as "auth_failed" or "timeout" in report |
 | Channel fails during execution (non-zero exit, malformed output, timeout) | Queue compensating pass for that channel with same focus and label; document root cause in report |
 | Both external CLIs unavailable (any combination of not_installed / auth failure) | Run all compensating passes plus Superpowers code-reviewer; report coverage as "degraded-coverage"; warn user that review coverage is reduced |
 | Superpowers unavailable | Document as "unavailable" in report; proceed with remaining channels; Superpowers is a Claude subagent and should always be available |
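
The fallback table comes down to two decisions: which label a compensating pass gets, and when overall coverage is reported as degraded. Below is a minimal bash sketch with hypothetical function names. It treats `not_installed`, `auth_failed`, and `timeout` as "unavailable", following the table's "any combination of not_installed / auth failure" row.

```bash
#!/usr/bin/env bash
# Sketch of the fallback table as shell logic. Function names are hypothetical.

# Root causes that make an external CLI unavailable for the session.
is_unavailable() {
  case "$1" in
    not_installed|auth_failed|timeout) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the compensating-pass label, or nothing if the channel completed.
compensating_label() {
  local channel="$1" root_cause="$2"
  [ "$root_cause" = "completed" ] && return 0
  case "$channel" in
    codex)  printf '%s\n' "[compensating: Codex-equivalent]" ;;
    gemini) printf '%s\n' "[compensating: Gemini-equivalent]" ;;
  esac
}

# Print "degraded-coverage" only when BOTH external CLIs are unavailable.
coverage_warning() {
  if is_unavailable "$1" && is_unavailable "$2"; then
    printf '%s\n' "degraded-coverage"
  fi
}
```

Usage: `compensating_label codex not_installed` prints the Codex label, and `coverage_warning not_installed timeout` prints `degraded-coverage`.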

package/content/tools/review-code.md CHANGED

@@ -23,7 +23,7 @@ anything leaves the machine.
 The three channels are:
 1. **Codex CLI** — implementation correctness, security, API contracts
 2. **Gemini CLI** — architectural patterns, broad-context reasoning
-3. **Superpowers code-reviewer** — Claude subagent review of code quality, tests, and plan alignment
+3. **Claude CLI** — Claude subagent review of code quality, tests, and plan alignment
 ## Inputs
@@ -46,6 +46,31 @@ The three channels are:
 ## Instructions
+### Primary: MMR CLI + Agent Reconcile
+When the MMR CLI is installed, use it as the primary entry point:
+```bash
+# Staged changes
+mmr review --staged --sync --format json
+# Branch diff against main
+mmr review --base main --sync --format json
+```
+After the CLI review completes, dispatch the agent's code-reviewer skill (4th channel) and inject findings into the MMR job for unified reconciliation:
+```bash
+# job_id is captured from mmr review --sync --format json output
+# Write agent findings to a temp file for mmr reconcile
+echo "$AGENT_FINDINGS" > /tmp/agent-findings.json
+mmr reconcile "$JOB_ID" --channel superpowers --input /tmp/agent-findings.json
+```
+The agent's review output must use MMR-compatible finding schema: each finding needs `severity` (P0-P3), `location` (file:line), and `description` (`suggestion` is optional).
+If `mmr` is not installed (`command -v mmr` fails), fall back to the manual multi-channel flow below.
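The sketch below covers the `command -v mmr` check and the temp-file step. Both function names are hypothetical. `write_findings_file` calls `mktemp` instead of writing to a fixed `/tmp/agent-findings.json` path, so concurrent runs cannot overwrite each other's findings.

```bash
#!/usr/bin/env bash
# Hypothetical helper: choose the review flow based on whether mmr is on PATH.
review_flow() {
  if command -v mmr >/dev/null 2>&1; then
    printf 'mmr\n'
  else
    printf 'manual\n'
  fi
}

# Hypothetical helper: write findings to a private temp file and print its path.
# printf is used instead of echo so backslashes in the JSON are not interpreted.
write_findings_file() {
  local file
  file="$(mktemp)" || return 1
  printf '%s\n' "$1" > "$file"
  printf '%s\n' "$file"
}
```

For example, run `FINDINGS_FILE="$(write_findings_file "$AGENT_FINDINGS")"`. Then pass `--input "$FINDINGS_FILE"` to the same `mmr reconcile "$JOB_ID" --channel superpowers` call shown above, and remove the file with `rm -f "$FINDINGS_FILE"` afterwards.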
 ### Step 1: Detect Mode
 Parse `$ARGUMENTS` and set:
@@ -173,7 +198,7 @@ codex login status 2>/dev/null
 - If `codex` is not installed: skip this channel and record root-cause `not_installed`
 - If auth fails: tell the user to run `! codex login`, retry after recovery, and if recovery is not possible, record root-cause `auth_failed` and continue with the remaining channels
-If auth cannot be recovered, or if Codex is not installed, queue a compensating Claude self-review pass focused on implementation correctness, security, and API contracts. Label findings as `[compensating: Codex-equivalent]`. If auth check times out (~5s), retry once; if still failing, record `auth timeout` and queue compensating pass. This pass runs after all channel dispatch attempts complete.
+If auth cannot be recovered, or if Codex is not installed, queue a compensating Claude self-review pass focused on implementation correctness, security, and API contracts. Label findings as `[compensating: Codex-equivalent]`. If auth check times out (~5s), retry once; if still failing, record `timeout` and queue compensating pass. This pass runs after all channel dispatch attempts complete.
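The timeout-and-retry rule can be written as a small classifier. This sketch rests on several assumptions. GNU coreutils `timeout` must be on `PATH`; it exits with status 124 when the limit is hit. The helper name and its `ok` result are hypothetical. Any failure other than a timeout is treated as `auth_failed`, and the interactive `! codex login` recovery step is left out.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify an auth check into root-cause labels.
# Prints one of: ok / not_installed / auth_failed / timeout
# Usage: auth_root_cause CLI [ARGS...]   e.g. auth_root_cause codex login status
# Assumes GNU coreutils `timeout` (exit status 124 means the limit was hit).
auth_root_cause() {
  local cli="$1"; shift
  if ! command -v "$cli" >/dev/null 2>&1; then
    printf 'not_installed\n'; return 0
  fi
  local status
  for _ in 1 2; do            # first attempt plus one retry, only on timeout
    status=0
    timeout "${AUTH_TIMEOUT:-5}" "$cli" "$@" >/dev/null 2>&1 || status=$?
    if [ "$status" -eq 0 ]; then printf 'ok\n'; return 0; fi
    if [ "$status" -ne 124 ]; then printf 'auth_failed\n'; return 0; fi
  done
  printf 'timeout\n'
}
```

`auth_root_cause codex login status` runs the same check as the step above, with a 5-second limit. Set `AUTH_TIMEOUT` to change the limit.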
 Build the prompt in a temporary file and pass it over stdin:
@@ -209,9 +234,9 @@ NO_BROWSER=true gemini -p "$(cat "$PROMPT_FILE")" --output-format json --approva
 If the CLI exits with a non-zero code, produces malformed/unparseable output, or is killed by the tool runner timeout, record root-cause `failed` and queue a compensating pass for that channel.
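The failure rule can be sketched as a classifier that runs after a channel finishes. It assumes three things: the channel's stdout was captured to a file, the channel was asked for JSON (as the Gemini invocation is, via `--output-format json`), and `python3` is available for the well-formedness check. `jq empty` would work equally well for that check. The helper name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify a finished channel run as completed or failed.
# Usage: dispatch_root_cause EXIT_CODE OUTPUT_FILE
dispatch_root_cause() {
  local exit_code="$1" output_file="$2"
  # Any non-zero exit counts, including a runner kill (e.g. timeout's 124).
  if [ "$exit_code" -ne 0 ]; then
    printf 'failed\n'; return 0
  fi
  # Empty or unparseable output is also a failure.
  if ! python3 -m json.tool < "$output_file" >/dev/null 2>&1; then
    printf 'failed\n'; return 0
  fi
  printf 'completed\n'
}
```

A `failed` result is the cue to queue that channel's compensating pass.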
-#### Channel 3: Superpowers code-reviewer
+#### Channel 3: Claude CLI
-Dispatch the `superpowers:code-reviewer` subagent.
+Dispatch via `claude -p` with the review prompt.
 - If explicit refs are being reviewed, provide `BASE_SHA` and `HEAD_SHA`
 - Otherwise provide:
@@ -297,7 +322,7 @@ Otherwise:
 3. Repeat for up to 3 fix rounds
 4. If any finding remains unresolved after 3 rounds, stop with verdict `needs-user-decision`
-**Fix cycle channel rule:** Re-run only channels that originally completed or ran as compensating passes. Never retry a channel marked `not installed`, `auth failed`, or `auth timeout` during fix rounds — its availability does not change within a session.
+**Fix cycle channel rule:** Re-run only channels that originally completed or ran as compensating passes. Never retry a channel marked `not_installed`, `auth_failed`, or `timeout` during fix rounds — its availability does not change within a session.
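Here is one reading of the fix-cycle channel rule, sketched in bash. A channel that completed is re-run. A channel that fell back to a compensating pass has only that pass repeated, and its CLI is never retried. Treating `failed` the same way as the three session-level causes is an assumption, based on the fact that such channels also ran a compensating pass. The helper name and the `CHANNEL:ROOT_CAUSE` argument format are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper for the fix-cycle channel rule.
# Each argument is CHANNEL:ROOT_CAUSE, using the summary's root-cause labels.
# Prints "CHANNEL rerun" or "CHANNEL compensating", one line per channel.
fix_round_plan() {
  local entry channel cause
  for entry in "$@"; do
    channel="${entry%%:*}"   # text before the first colon
    cause="${entry#*:}"      # text after the first colon
    case "$cause" in
      completed) printf '%s rerun\n' "$channel" ;;
      # Never retry the CLI itself; repeat only its compensating pass.
      not_installed|auth_failed|timeout|failed) printf '%s compensating\n' "$channel" ;;
    esac
  done
}
```

For example, `fix_round_plan codex:completed gemini:timeout` prints `codex rerun` followed by `gemini compensating`.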
 ### Step 8: Final Verdict
@@ -321,9 +346,9 @@ Output a concise summary in this format:
 [scope label]
 ### Channels Executed
-- Codex CLI — root cause: [completed / not installed / auth failed / auth timeout / failed], coverage: [full / compensating (Codex-equivalent)]
-- Gemini CLI — root cause: [completed / not installed / auth failed / auth timeout / failed], coverage: [full / compensating (Gemini-equivalent)]
-- Superpowers code-reviewer — [completed / failed]
+- Codex CLI — root cause: [completed / not_installed / auth_failed / timeout / failed], coverage: [full / compensating (Codex-equivalent)]
+- Gemini CLI — root cause: [completed / not_installed / auth_failed / timeout / failed], coverage: [full / compensating (Gemini-equivalent)]
+- Claude CLI — root cause: [completed / not_installed / auth_failed / timeout / failed], coverage: [full / compensating]
 ### Findings
 [consensus findings first, then single-source findings]