npm - claude-turing - Versions diffs - 4.7.0 → 4.8.1 - Mend

claude-turing 4.7.0 → 4.8.1


Files changed (172)

package/.claude-plugin/plugin.json +2 -2
package/README.md +1 -1
package/agents/ml-evaluator.md +4 -4
package/agents/ml-researcher.md +2 -2
package/bin/turing-init.sh +2 -2
package/commands/ablate.md +3 -4
package/commands/annotate.md +2 -3
package/commands/archive.md +2 -3
package/commands/audit.md +3 -4
package/commands/baseline.md +3 -4
package/commands/brief.md +5 -6
package/commands/budget.md +3 -4
package/commands/calibrate.md +3 -4
package/commands/card.md +3 -4
package/commands/changelog.md +2 -3
package/commands/checkpoint.md +3 -4
package/commands/cite.md +2 -3
package/commands/compare.md +1 -2
package/commands/counterfactual.md +2 -3
package/commands/curriculum.md +3 -4
package/commands/design.md +3 -4
package/commands/diagnose.md +4 -5
package/commands/diff.md +3 -4
package/commands/distill.md +3 -4
package/commands/doctor.md +2 -3
package/commands/ensemble.md +3 -4
package/commands/explore.md +4 -5
package/commands/export.md +3 -4
package/commands/feature.md +3 -4
package/commands/flashback.md +2 -3
package/commands/fork.md +3 -4
package/commands/frontier.md +3 -4
package/commands/init.md +5 -6
package/commands/leak.md +3 -4
package/commands/lit.md +3 -4
package/commands/logbook.md +5 -6
package/commands/merge.md +2 -3
package/commands/mode.md +1 -2
package/commands/onboard.md +2 -3
package/commands/paper.md +3 -4
package/commands/plan.md +2 -3
package/commands/poster.md +3 -4
package/commands/postmortem.md +2 -3
package/commands/preflight.md +5 -6
package/commands/present.md +2 -3
package/commands/profile.md +3 -4
package/commands/prune.md +2 -3
package/commands/quantize.md +2 -3
package/commands/queue.md +3 -4
package/commands/registry.md +2 -3
package/commands/regress.md +3 -4
package/commands/replay.md +2 -3
package/commands/report.md +3 -4
package/commands/reproduce.md +3 -4
package/commands/retry.md +3 -4
package/commands/review.md +2 -3
package/commands/rules/loop-protocol.md +11 -11
package/commands/sanity.md +3 -4
package/commands/scale.md +4 -5
package/commands/search.md +2 -3
package/commands/seed.md +3 -4
package/commands/sensitivity.md +3 -4
package/commands/share.md +2 -3
package/commands/simulate.md +2 -3
package/commands/status.md +1 -2
package/commands/stitch.md +3 -4
package/commands/suggest.md +5 -6
package/commands/surgery.md +2 -3
package/commands/sweep.md +8 -9
package/commands/template.md +2 -3
package/commands/train.md +5 -6
package/commands/transfer.md +3 -4
package/commands/trend.md +2 -3
package/commands/try.md +4 -5
package/commands/turing.md +3 -3
package/commands/update.md +2 -3
package/commands/validate.md +4 -5
package/commands/warm.md +3 -4
package/commands/watch.md +4 -5
package/commands/whatif.md +2 -3
package/commands/xray.md +3 -4
package/config/commands.yaml +75 -75
package/package.json +3 -2
package/skills/turing/SKILL.md +3 -3
package/skills/turing/ablate/SKILL.md +3 -4
package/skills/turing/annotate/SKILL.md +2 -3
package/skills/turing/archive/SKILL.md +2 -3
package/skills/turing/audit/SKILL.md +3 -4
package/skills/turing/baseline/SKILL.md +3 -4
package/skills/turing/brief/SKILL.md +5 -6
package/skills/turing/budget/SKILL.md +3 -4
package/skills/turing/calibrate/SKILL.md +3 -4
package/skills/turing/card/SKILL.md +3 -4
package/skills/turing/changelog/SKILL.md +2 -3
package/skills/turing/checkpoint/SKILL.md +3 -4
package/skills/turing/cite/SKILL.md +2 -3
package/skills/turing/compare/SKILL.md +1 -2
package/skills/turing/counterfactual/SKILL.md +2 -3
package/skills/turing/curriculum/SKILL.md +3 -4
package/skills/turing/design/SKILL.md +3 -4
package/skills/turing/diagnose/SKILL.md +4 -5
package/skills/turing/diff/SKILL.md +3 -4
package/skills/turing/distill/SKILL.md +3 -4
package/skills/turing/doctor/SKILL.md +2 -3
package/skills/turing/ensemble/SKILL.md +3 -4
package/skills/turing/explore/SKILL.md +4 -5
package/skills/turing/export/SKILL.md +3 -4
package/skills/turing/feature/SKILL.md +3 -4
package/skills/turing/flashback/SKILL.md +2 -3
package/skills/turing/fork/SKILL.md +3 -4
package/skills/turing/frontier/SKILL.md +3 -4
package/skills/turing/init/SKILL.md +5 -6
package/skills/turing/leak/SKILL.md +3 -4
package/skills/turing/lit/SKILL.md +3 -4
package/skills/turing/logbook/SKILL.md +5 -6
package/skills/turing/merge/SKILL.md +2 -3
package/skills/turing/mode/SKILL.md +1 -2
package/skills/turing/onboard/SKILL.md +2 -3
package/skills/turing/paper/SKILL.md +3 -4
package/skills/turing/plan/SKILL.md +2 -3
package/skills/turing/poster/SKILL.md +3 -4
package/skills/turing/postmortem/SKILL.md +2 -3
package/skills/turing/preflight/SKILL.md +5 -6
package/skills/turing/present/SKILL.md +2 -3
package/skills/turing/profile/SKILL.md +3 -4
package/skills/turing/prune/SKILL.md +2 -3
package/skills/turing/quantize/SKILL.md +2 -3
package/skills/turing/queue/SKILL.md +3 -4
package/skills/turing/registry/SKILL.md +2 -3
package/skills/turing/regress/SKILL.md +3 -4
package/skills/turing/replay/SKILL.md +2 -3
package/skills/turing/report/SKILL.md +3 -4
package/skills/turing/reproduce/SKILL.md +3 -4
package/skills/turing/retry/SKILL.md +3 -4
package/skills/turing/review/SKILL.md +2 -3
package/skills/turing/rules/loop-protocol.md +11 -11
package/skills/turing/sanity/SKILL.md +3 -4
package/skills/turing/scale/SKILL.md +4 -5
package/skills/turing/search/SKILL.md +2 -3
package/skills/turing/seed/SKILL.md +3 -4
package/skills/turing/sensitivity/SKILL.md +3 -4
package/skills/turing/share/SKILL.md +2 -3
package/skills/turing/simulate/SKILL.md +2 -3
package/skills/turing/status/SKILL.md +1 -2
package/skills/turing/stitch/SKILL.md +3 -4
package/skills/turing/suggest/SKILL.md +5 -6
package/skills/turing/surgery/SKILL.md +2 -3
package/skills/turing/sweep/SKILL.md +8 -9
package/skills/turing/template/SKILL.md +2 -3
package/skills/turing/train/SKILL.md +5 -6
package/skills/turing/transfer/SKILL.md +3 -4
package/skills/turing/trend/SKILL.md +2 -3
package/skills/turing/try/SKILL.md +4 -5
package/skills/turing/update/SKILL.md +2 -3
package/skills/turing/validate/SKILL.md +4 -5
package/skills/turing/warm/SKILL.md +3 -4
package/skills/turing/watch/SKILL.md +4 -5
package/skills/turing/whatif/SKILL.md +2 -3
package/skills/turing/xray/SKILL.md +3 -4
package/src/command-registry.js +12 -0
package/src/install.js +4 -3
package/src/sync-commands-layout.js +149 -0
package/src/sync-skills-layout.js +4 -133
package/templates/README.md +5 -8
package/templates/program.md +18 -18
package/templates/pyproject.toml +10 -0
package/templates/requirements.txt +4 -1
package/templates/scripts/generate_onboarding.py +1 -1
package/templates/scripts/post-train-hook.sh +7 -8
package/templates/scripts/scaffold.py +24 -26
package/templates/scripts/stop-hook.sh +2 -3
package/templates/scripts/turing-run-python.sh +9 -0

package/commands/replay.md CHANGED

@@ -1,7 +1,6 @@
 ---
 name: replay
 description: Experiment replay — re-run a historical experiment with current infrastructure to test if old approaches do better now.
-disable-model-invocation: true
 argument-hint: "<exp-id> [--with-current-data] [--with-current-preprocessing]"
 allowed-tools: Read, Bash(*), Grep, Glob
 ---
@@ -9,8 +8,8 @@ allowed-tools: Read, Bash(*), Grep, Glob
 Should you revisit old ideas? Infrastructure changes may make failed approaches work now.
 ## Steps
-1. **Activate environment:** `source .venv/bin/activate`
-2. **Run:** `python scripts/experiment_replay.py $ARGUMENTS`
+1. **Sync environment:** `uv sync`
+2. **Run:** `uv run python scripts/experiment_replay.py $ARGUMENTS`
 3. **Modes:** default (current code+data), --with-current-data, --with-current-preprocessing
 4. **Report:** original vs replayed metrics, delta, verdict
 5. **Saved output:** `experiments/replays/`

package/commands/report.md CHANGED

@@ -1,9 +1,8 @@
 ---
 name: report
 description: Generate a markdown research report from experiment history — structured for sharing, archiving, or including in documentation. More detailed than a brief, less visual than a poster.
-disable-model-invocation: true
 argument-hint: "[--since YYYY-MM-DD] [--output path]"
-allowed-tools: Read, Bash(python scripts/*:*, source .venv/bin/activate:*, mkdir:*), Grep, Glob
+allowed-tools: Read, Bash(uv run python scripts/*:*, uv sync:*, mkdir:*), Grep, Glob
 ---
 Generate a structured markdown research report summarizing the experiment campaign.
@@ -15,12 +14,12 @@ Generate a structured markdown research report summarizing the experiment campai
 Use the logbook generator in markdown mode as the data backbone:
 ```bash
-source .venv/bin/activate && python scripts/generate_logbook.py --format markdown
+uv run python scripts/generate_logbook.py --format markdown
 ```
 Also gather supplementary data:
 ```bash
-source .venv/bin/activate && python scripts/generate_brief.py
+uv run python scripts/generate_brief.py
 cat experiment_state.yaml 2>/dev/null || true
 cat RESEARCH_PLAN.md 2>/dev/null || true
 ```

package/commands/reproduce.md CHANGED

@@ -1,7 +1,6 @@
 ---
 name: reproduce
 description: Verify reproducibility of a specific experiment by re-running from logged config and checking metrics fall within tolerance.
-disable-model-invocation: true
 argument-hint: "<exp-id> [--tolerance 0.02] [--strict] [--runs 3]"
 allowed-tools: Read, Bash(*), Grep, Glob
 ---
@@ -10,9 +9,9 @@ Verify that a logged experiment can be reproduced with consistent results.
 ## Steps
-1. **Activate environment:**
+1. **Sync environment:**
    ```bash
-   source .venv/bin/activate
+   uv sync
    ```
 2. **Parse arguments from `$ARGUMENTS`:**
@@ -23,7 +22,7 @@ Verify that a logged experiment can be reproduced with consistent results.
 3. **Run reproducibility verification:**
    ```bash
-   python scripts/reproduce_experiment.py $ARGUMENTS
+   uv run python scripts/reproduce_experiment.py $ARGUMENTS
    ```
 4. **Report results:**
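
Reviewer note: the tolerance check that `reproduce` describes can be pictured with the minimal sketch below. It is not the code in `scripts/reproduce_experiment.py`, which this diff does not show. It assumes `--tolerance` is an absolute difference per metric; the function name and the metric names are hypothetical, and `--strict` and `--runs` are not modeled.

```python
def within_tolerance(original: dict[str, float],
                     rerun: dict[str, float],
                     tolerance: float = 0.02) -> dict[str, bool]:
    """Flag each logged metric as reproduced or not (absolute tolerance, an assumption)."""
    return {
        # A metric missing from the re-run counts as not reproduced.
        name: name in rerun and abs(rerun[name] - value) <= tolerance
        for name, value in original.items()
    }


# Hypothetical metrics: accuracy drifts by 0.005, f1 drifts by 0.04.
report = within_tolerance(
    {"accuracy": 0.910, "f1": 0.880},
    {"accuracy": 0.915, "f1": 0.840},
)
```

With the default tolerance of 0.02, only `accuracy` would count as reproduced here.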

package/commands/retry.md CHANGED

@@ -1,7 +1,6 @@
 ---
 name: retry
 description: Smart failure recovery — auto-diagnose crash type and retry with targeted fix. OOM → halve batch. NaN → add clipping.
-disable-model-invocation: true
 argument-hint: "<exp-id> [--max-attempts 3]"
 allowed-tools: Read, Bash(*), Grep, Glob
 ---
@@ -10,9 +9,9 @@ Auto-diagnose and recover from experiment failures.
 ## Steps
-1. **Activate environment:**
+1. **Sync environment:**
    ```bash
-   source .venv/bin/activate
+   uv sync
    ```
 2. **Parse arguments from `$ARGUMENTS`:**
@@ -22,7 +21,7 @@ Auto-diagnose and recover from experiment failures.
 3. **Run smart retry:**
    ```bash
-   python scripts/smart_retry.py $ARGUMENTS
+   uv run python scripts/smart_retry.py $ARGUMENTS
    ```
 4. **Report results:**
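
Reviewer note: the description's "OOM → halve batch. NaN → add clipping." rule could look roughly like the sketch below. This is only an illustration, not the logic of `scripts/smart_retry.py`. The failure labels, the config keys `batch_size` and `grad_clip`, and the clipping value are all hypothetical.

```python
def plan_retry(failure: str, config: dict) -> dict:
    """Return a patched copy of the config for the next attempt (illustrative only)."""
    patched = dict(config)
    if failure == "oom":
        # Out of memory: halve the batch size, never going below 1.
        patched["batch_size"] = max(1, config["batch_size"] // 2)
    elif failure == "nan":
        # NaN loss: enable gradient clipping (1.0 is an assumed value).
        patched["grad_clip"] = 1.0
    else:
        raise ValueError(f"no targeted fix known for failure type: {failure}")
    return patched


next_config = plan_retry("oom", {"batch_size": 64})
```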

package/commands/review.md CHANGED

@@ -1,7 +1,6 @@
 ---
 name: review
 description: Peer review simulation — generate likely reviewer objections with severity ratings and fix commands.
-disable-model-invocation: true
 argument-hint: "[--venue neurips|icml|general] [--harsh]"
 allowed-tools: Read, Bash(*), Grep, Glob
 ---
@@ -9,8 +8,8 @@ allowed-tools: Read, Bash(*), Grep, Glob
 Simulate a conference reviewer before you submit. Each weakness links to the command that fixes it.
 ## Steps
-1. `source .venv/bin/activate`
-2. `python scripts/simulate_review.py $ARGUMENTS`
+1. `uv sync`
+2. `uv run python scripts/simulate_review.py $ARGUMENTS`
 3. **Saved:** `experiments/reviews/`
 ## Examples

package/commands/rules/loop-protocol.md CHANGED

@@ -16,9 +16,9 @@ The autoresearch harness enforces a strict separation between the **hypothesis s
 ## Execution Rules
-- **ALWAYS redirect training output:** `python train.py > run.log 2>&1`
+- **ALWAYS redirect training output:** `uv run python train.py > run.log 2>&1`
 - **ALWAYS parse metrics with grep** between `---` delimiters: `grep -A 10 "^---" run.log | head -10`
-- **ALWAYS activate the venv first:** `source .venv/bin/activate`
+- **ALWAYS run Python through uv:** `uv run python ...`
 - **NEVER install new packages** without human approval
 ## Git Discipline
@@ -40,16 +40,16 @@ The autoresearch harness enforces a strict separation between the **hypothesis s
 ## Sweep Workflow
-1. Generate queue: `python scripts/sweep.py`
-2. Check status: `python scripts/sweep.py --status`
-3. Get next: `python scripts/sweep.py --next`
+1. Generate queue: `uv run python scripts/sweep.py`
+2. Check status: `uv run python scripts/sweep.py --status`
+3. Get next: `uv run python scripts/sweep.py --next`
 4. Apply overrides, create branch, run training
-5. Mark: `python scripts/sweep.py --mark <name> complete|failed`
+5. Mark: `uv run python scripts/sweep.py --mark <name> complete|failed`
 6. Repeat until queue is empty
 ## Logging Rules
-- **Log every experiment** to `experiments/log.jsonl` via `python scripts/log_experiment.py` — kept and discarded alike.
+- **Log every experiment** to `experiments/log.jsonl` via `uv run python scripts/log_experiment.py` — kept and discarded alike.
 - **Include all metrics, config, and description** of the hypothesis and its outcome.
 ## Convergence Rules
@@ -64,11 +64,11 @@ The researcher agent's Bash access is restricted to a whitelist of necessary com
 | Allowed Pattern | Purpose |
 |-----------------|---------|
-| `python train.py:*` | Execute training |
-| `python scripts/*:*` | Run utility scripts (logging, metrics, sweep) |
+| `uv run python train.py:*` | Execute training |
+| `uv run python scripts/*:*` | Run utility scripts (logging, metrics, sweep) |
 | `git:*` | Branch, commit, merge, reset operations |
-| `source .venv/bin/activate:*` | Virtual environment activation |
-| `pip:*` | Package installation (requires human approval) |
+| `uv sync:*` | Virtual environment activation |
+| `uv add:*` | Package installation (requires human approval) |
 **Blocked by omission:** `cat`, `head`, `tail`, `less` (prevents reading hidden files via shell), `curl`, `wget` (prevents data exfiltration), arbitrary command execution.
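
Reviewer note: to see why `cat` or `curl` end up "blocked by omission" under the new whitelist, the sketch below may help. It assumes a `prefix:*` pattern means "the command is the prefix, optionally followed by arguments" and that a `*` inside the prefix is a glob. The real matcher in the harness may behave differently, and `is_allowed` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Patterns copied from the 4.8.1 side of the table above.
ALLOWED = [
    "uv run python train.py:*",
    "uv run python scripts/*:*",
    "git:*",
    "uv sync:*",
    "uv add:*",
]


def is_allowed(command: str, patterns: list[str] = ALLOWED) -> bool:
    """Return True if the command matches any allowed prefix pattern (assumed semantics)."""
    for pattern in patterns:
        prefix = pattern.removesuffix(":*")
        # Match the bare prefix, or the prefix followed by a space and arguments.
        if fnmatchcase(command, prefix) or fnmatchcase(command, prefix + " *"):
            return True
    return False
```

Under these assumptions, `python train.py` without the `uv run` prefix no longer matches any pattern, which is the practical effect of this hunk.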

package/commands/sanity.md CHANGED

@@ -1,7 +1,6 @@
 ---
 name: sanity
 description: Pre-training sanity checks — catch broken data loaders, misconfigured losses, and dead gradients in 30 seconds before wasting hours.
-disable-model-invocation: true
 argument-hint: "[--quick] [--verbose]"
 allowed-tools: Read, Bash(*), Grep, Glob
 ---
@@ -10,9 +9,9 @@ Run a battery of fast checks before committing to a full training run. Catches w
 ## Steps
-1. **Activate environment:**
+1. **Sync environment:**
    ```bash
-   source .venv/bin/activate
+   uv sync
    ```
 2. **Parse arguments from `$ARGUMENTS`:**
@@ -22,7 +21,7 @@ Run a battery of fast checks before committing to a full training run. Catches w
 3. **Run sanity checks:**
    ```bash
-   python scripts/sanity_checks.py $ARGUMENTS
+   uv run python scripts/sanity_checks.py $ARGUMENTS
    ```
 4. **Checks performed:**

package/commands/scale.md CHANGED

@@ -1,7 +1,6 @@
 ---
 name: scale
 description: Scaling law estimator — run small experiments at different sizes, fit a power law, and predict full-scale performance before committing compute.
-disable-model-invocation: true
 argument-hint: "[--axis data|compute|params] [--points 4] [--analyze results.yaml]"
 allowed-tools: Read, Bash(*), Grep, Glob
 ---
@@ -10,9 +9,9 @@ Predict full-scale performance from a handful of small experiments. Answers "is
 ## Steps
-1. **Activate environment:**
+1. **Sync environment:**
    ```bash
-   source .venv/bin/activate
+   uv sync
    ```
 2. **Parse arguments from `$ARGUMENTS`:**
@@ -25,11 +24,11 @@ Predict full-scale performance from a handful of small experiments. Answers "is
 3. **Plan or analyze:**
    - **Plan mode (default):** generates scale point configs to run
      ```bash
-     python scripts/scaling_estimator.py --axis data --points 4
+     uv run python scripts/scaling_estimator.py --axis data --points 4
      ```
    - **Analyze mode:** fits power law to completed results
      ```bash
-     python scripts/scaling_estimator.py --analyze experiments/scaling/results.yaml
+     uv run python scripts/scaling_estimator.py --analyze experiments/scaling/results.yaml
      ```
 4. **Scaling axes:**
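
Reviewer note: "fits power law to completed results" can be illustrated with an ordinary least-squares fit in log-log space, as sketched below. This is not necessarily how `scripts/scaling_estimator.py` does it. It assumes a positive, error-like metric of the form `y = a * x**b` with no irreducible offset, and the function names and data are made up for the example.

```python
import math


def fit_power_law(sizes: list[float], scores: list[float]) -> tuple[float, float]:
    """Fit y = a * x**b by linear regression on (log x, log y); returns (a, b)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in scores]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), slope


def predict(a: float, b: float, size: float) -> float:
    """Extrapolate the fitted law to a larger scale."""
    return a * size ** b


# Synthetic points generated from error = 2.0 * size**-0.5 (illustrative data only).
sizes = [1000, 2000, 4000, 8000]
errors = [2.0 * s ** -0.5 for s in sizes]
a, b = fit_power_law(sizes, errors)
```

On noise-free synthetic data like this the fit recovers the generating exponent; real runs will be noisier, so the extrapolation should be read as an estimate.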

package/commands/search.md CHANGED

@@ -1,7 +1,6 @@
 ---
 name: search
 description: Natural language experiment search — query with text + structured filters over 200+ experiments.
-disable-model-invocation: true
 argument-hint: "<query> [--filter \"accuracy>0.85\"] [--limit 10]"
 allowed-tools: Read, Bash(*), Grep, Glob
 ---
@@ -9,8 +8,8 @@ allowed-tools: Read, Bash(*), Grep, Glob
 Find specific experiments in a large history with natural language and structured filters.
 ## Steps
-1. **Activate environment:** `source .venv/bin/activate`
-2. **Run:** `python scripts/experiment_search.py $ARGUMENTS`
+1. **Sync environment:** `uv sync`
+2. **Run:** `uv run python scripts/experiment_search.py $ARGUMENTS`
 3. **Filters:** `accuracy>0.85`, `status:kept`, `family:baseline`, `date:last-week`
 4. **Report:** ranked table of matching experiments

package/commands/seed.md CHANGED

@@ -1,7 +1,6 @@
 ---
 name: seed
 description: Run multi-seed study on an experiment to compute mean/std/CI and flag seed-sensitive results. Prevents publishing lucky seeds.
-disable-model-invocation: true
 argument-hint: "[N] [--quick] [--exp-id <id>]"
 allowed-tools: Read, Bash(*), Grep, Glob
 ---
@@ -10,9 +9,9 @@ Run a multi-seed study to verify that experiment results are robust across rando
 ## Steps
-1. **Activate environment:**
+1. **Sync environment:**
    ```bash
-   source .venv/bin/activate
+   uv sync
    ```
 2. **Parse arguments from `$ARGUMENTS`:**
@@ -23,7 +22,7 @@ Run a multi-seed study to verify that experiment results are robust across rando
 3. **Run seed study:**
    ```bash
-   python scripts/seed_runner.py $ARGUMENTS
+   uv run python scripts/seed_runner.py $ARGUMENTS
    ```
 4. **Report results:**
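
Reviewer note: the "mean/std/CI" summary in the description could be computed along the lines of the sketch below. It is an assumption-laden illustration, not the contents of `scripts/seed_runner.py`. It uses the sample standard deviation and a normal-approximation 95% interval (z = 1.96); for a handful of seeds a t-based interval would be wider. The function name and scores are hypothetical.

```python
import math
import statistics


def summarize_seeds(scores: list[float], z: float = 1.96) -> dict[str, float]:
    """Summarize one metric across seeds: mean, sample std, and an approximate CI."""
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)  # sample standard deviation, needs >= 2 scores
    half_width = z * std / math.sqrt(len(scores))
    return {
        "mean": mean,
        "std": std,
        "ci_low": mean - half_width,
        "ci_high": mean + half_width,
    }


# Hypothetical accuracies from three seeds.
summary = summarize_seeds([0.90, 0.92, 0.94])
```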

package/commands/sensitivity.md CHANGED

@@ -1,7 +1,6 @@
 ---
 name: sensitivity
 description: Hyperparameter sensitivity analysis — rank parameters by impact, identify which matter and which are noise.
-disable-model-invocation: true
 argument-hint: "[exp-id] [--params learning_rate,max_depth]"
 allowed-tools: Read, Bash(*), Grep, Glob
 ---
@@ -10,9 +9,9 @@ Which hyperparameters actually matter? Stop wasting time on the ones that don't.
 ## Steps
-1. **Activate environment:**
+1. **Sync environment:**
    ```bash
-   source .venv/bin/activate
+   uv sync
    ```
 2. **Parse arguments from `$ARGUMENTS`:**
@@ -22,7 +21,7 @@ Which hyperparameters actually matter? Stop wasting time on the ones that don't.
 3. **Run sensitivity analysis:**
    ```bash
-   python scripts/sensitivity_analysis.py $ARGUMENTS
+   uv run python scripts/sensitivity_analysis.py $ARGUMENTS
    ```
 4. **Report includes:**

package/commands/share.md CHANGED

@@ -1,7 +1,6 @@
 ---
 name: share
 description: Experiment packaging — portable archive with config, metrics, seed study, annotations, reproduction instructions.
-disable-model-invocation: true
 argument-hint: "<exp-ids...> [--include model,figures,code]"
 allowed-tools: Read, Bash(*), Grep, Glob
 ---
@@ -9,8 +8,8 @@ allowed-tools: Read, Bash(*), Grep, Glob
 Package experiments for collaborator handoff or paper supplementary material.
 ## Steps
-1. `source .venv/bin/activate`
-2. `python scripts/package_experiments.py $ARGUMENTS`
+1. `uv sync`
+2. `uv run python scripts/package_experiments.py $ARGUMENTS`
 3. **Saved:** `exports/packages/<name>/`
 ## Examples

package/commands/simulate.md CHANGED

@@ -1,7 +1,6 @@
 ---
 name: simulate
 description: Experiment outcome prediction — predict which configs will beat the current best before running them.
-disable-model-invocation: true
 argument-hint: "[--configs configs.yaml] [--top-k 5] [--threshold 0.001]"
 allowed-tools: Read, Bash(*), Grep, Glob
 ---
@@ -9,8 +8,8 @@ allowed-tools: Read, Bash(*), Grep, Glob
 Predict outcomes before spending compute. Ranks proposed configs and recommends which to run vs skip.
 ## Steps
-1. `source .venv/bin/activate`
-2. `python scripts/experiment_simulator.py $ARGUMENTS`
+1. `uv sync`
+2. `uv run python scripts/experiment_simulator.py $ARGUMENTS`
 3. **Saved:** `experiments/simulations/`
 ## How it works

package/commands/status.md CHANGED

@@ -1,7 +1,6 @@
 ---
 name: status
 description: Show current ML experiment status — best model, recent experiments, convergence state, and trend analysis. Delegates to @ml-evaluator for read-only safety.
-disable-model-invocation: true
 allowed-tools: Read, Bash(*), Grep, Glob
 ---
@@ -11,7 +10,7 @@ Show the current state of the ML training pipeline. This is an observation-only
 1. **Run metrics display:**
    ```bash
-   source .venv/bin/activate && python scripts/show_metrics.py --last 10
+   uv run python scripts/show_metrics.py --last 10
    ```
 2. **Summarize for the user:**

package/commands/stitch.md CHANGED

@@ -1,7 +1,6 @@
 ---
 name: stitch
 description: Pipeline composition — decompose ML pipelines into swappable stages. Show, swap, cache, and run stages independently.
-disable-model-invocation: true
 argument-hint: "<show|swap|cache|run> [stage] [--from exp-id]"
 allowed-tools: Read, Bash(*), Grep, Glob
 ---
@@ -10,9 +9,9 @@ Decompose your ML pipeline into stages that can be independently varied, cached,
 ## Steps
-1. **Activate environment:**
+1. **Sync environment:**
    ```bash
-   source .venv/bin/activate
+   uv sync
    ```
 2. **Parse arguments from `$ARGUMENTS`:**
@@ -24,7 +23,7 @@ Decompose your ML pipeline into stages that can be independently varied, cached,
 3. **Run pipeline manager:**
    ```bash
-   python scripts/pipeline_manager.py $ARGUMENTS
+   uv run python scripts/pipeline_manager.py $ARGUMENTS
    ```
 4. **Report results:**

package/commands/suggest.md CHANGED

@@ -1,9 +1,8 @@
 ---
 name: suggest
 description: Literature-grounded model selection. Reads the ML task context, searches recent literature, and suggests model architectures worth trying — with citations. Suggestions are auto-queued as hypotheses.
-disable-model-invocation: true
 argument-hint: "[task description override]"
-allowed-tools: Read, Write, Bash(python scripts/*:*, source .venv/bin/activate:*), Grep, Glob, WebSearch, WebFetch
+allowed-tools: Read, Write, Bash(uv run python scripts/*:*, uv sync:*), Grep, Glob, WebSearch, WebFetch
 ---
 Suggest model architectures for the current ML task. Supports two strategies:
@@ -26,7 +25,7 @@ cat config.yaml
 ```
 ```bash
-source .venv/bin/activate && python scripts/show_metrics.py --last 10 2>/dev/null || echo "No experiments yet"
+uv run python scripts/show_metrics.py --last 10 2>/dev/null || echo "No experiments yet"
 ```
 If `$ARGUMENTS` is provided, use that as the task description. Otherwise, infer from `config.yaml` (model type, primary metric, data source, target column).
@@ -67,7 +66,7 @@ From the literature, synthesize **3-5 concrete model architecture suggestions**.
 For each suggestion, add to the hypothesis queue:
 ```bash
-source .venv/bin/activate && python scripts/manage_hypotheses.py add "<model>: <rationale> (source: <citation>)" --priority medium --source literature
+uv run python scripts/manage_hypotheses.py add "<model>: <rationale> (source: <citation>)" --priority medium --source literature
 ```
 ### 5. Show Results
@@ -106,7 +105,7 @@ Same detection logic as the literature strategy — find `config.yaml` + `train.
 ### 2. Run Tree Search
 ```bash
-source .venv/bin/activate && python scripts/treequest_suggest.py \
+uv run python scripts/treequest_suggest.py \
     --log experiments/log.jsonl \
     --config config.yaml \
     --top 5 \
@@ -121,7 +120,7 @@ If TreeQuest is not installed, the script automatically falls back to greedy bes
 For each result from the tree search, queue as a hypothesis:
 ```bash
-source .venv/bin/activate && python scripts/manage_hypotheses.py add "<description>" --priority medium --source treequest
+uv run python scripts/manage_hypotheses.py add "<description>" --priority medium --source treequest
 ```
 ### 4. Show Results

package/commands/surgery.md CHANGED

@@ -1,7 +1,6 @@
 ---
 name: surgery
 description: Architecture modification — add/remove layers, widen/narrow, swap activations, inject skip connections. Specify what to change, system handles how.
-disable-model-invocation: true
 argument-hint: "<exp-id> --op <operation> [args...]"
 allowed-tools: Read, Bash(*), Grep, Glob
 ---
@@ -10,8 +9,8 @@ Programmatic architecture changes with auto warm-start from existing weights.
 ## Steps
-1. **Activate environment:** `source .venv/bin/activate`
-2. **Run:** `python scripts/architecture_surgery.py $ARGUMENTS`
+1. **Sync environment:** `uv sync`
+2. **Run:** `uv run python scripts/architecture_surgery.py $ARGUMENTS`
 3. **Operations:** add-layer, remove-layer, widen, narrow, swap-activation, add-skip, add-norm, deepen, swap-objective
 4. **For tree models:** deepen (increase max_depth), widen (more estimators), swap-objective
 5. **Report:** operation details, config changes, parameter count delta, warm-start source

package/commands/sweep.md CHANGED

@@ -1,40 +1,39 @@
 ---
 name: sweep
 description: Generate and run a systematic hyperparameter sweep. Computes the cartesian product of configured parameter ranges and processes the queue sequentially with full experiment logging.
-disable-model-invocation: true
 argument-hint: "[sweep_config.yaml]"
-allowed-tools: Read, Write, Edit, Bash(python train.py:*, python scripts/*:*, git:*, source .venv/bin/activate:*, pip:*), Grep, Glob
+allowed-tools: Read, Write, Edit, Bash(uv run python train.py:*, uv run python scripts/*:*, git:*, uv sync:*, uv add:*), Grep, Glob
 ---
 Run a systematic hyperparameter sweep using the sweep configuration.
 ## Steps
-1. **Activate environment:**
+1. **Sync environment:**
    ```bash
-   source .venv/bin/activate
+   uv sync
    ```
 2. **Resolve config:** Use `$ARGUMENTS` as sweep config path, or default to `sweep_config.yaml`.
 3. **Generate queue** (if not already generated):
    ```bash
-   python scripts/sweep.py [sweep_config.yaml]
+   uv run python scripts/sweep.py [sweep_config.yaml]
    ```
 4. **Check queue status:**
    ```bash
-   python scripts/sweep.py --status
+   uv run python scripts/sweep.py --status
    ```
 5. **Process queue sequentially:**
-   - Get next: `python scripts/sweep.py --next`
+   - Get next: `uv run python scripts/sweep.py --next`
    - Apply config overrides to `config.yaml`
    - Create experiment branch: `git checkout -b exp/NNN-description`
-   - Run training: `python train.py > run.log 2>&1`
+   - Run training: `uv run python train.py > run.log 2>&1`
    - Parse metrics: `grep -A 10 "^---" run.log | head -10`
    - Log the experiment
-   - Mark complete: `python scripts/sweep.py --mark <name> complete`
+   - Mark complete: `uv run python scripts/sweep.py --mark <name> complete`
    - If improved, merge to main. If not, return to main.
    - Repeat until queue is empty
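
Reviewer note: the "cartesian product of configured parameter ranges" mentioned in the description can be sketched as below. This shows the general technique only, not the code in `scripts/sweep.py`. The function name and the parameter ranges are illustrative, and the real queue presumably also tracks names and status for `--next` and `--mark`.

```python
from itertools import product


def build_queue(ranges: dict[str, list]) -> list[dict]:
    """Expand per-parameter value lists into one config override per combination."""
    names = list(ranges)
    return [
        dict(zip(names, combo))
        for combo in product(*(ranges[name] for name in names))
    ]


# Two learning rates times three depths gives six queue entries.
queue = build_queue({"learning_rate": [0.01, 0.1], "max_depth": [3, 5, 7]})
```

The queue size is the product of the range lengths, so adding one more parameter with several values multiplies the number of training runs.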

package/commands/template.md CHANGED

@@ -1,7 +1,6 @@
 ---
 name: template
 description: Experiment template library — save winning configs as reusable templates, apply to new projects.
-disable-model-invocation: true
 argument-hint: "<save|list|apply|share> [--name name] [--from exp-id]"
 allowed-tools: Read, Bash(*), Grep, Glob
 ---
@@ -9,8 +8,8 @@ allowed-tools: Read, Bash(*), Grep, Glob
 Turn your best experiment configs into reusable recipes that persist across projects.
 ## Steps
-1. **Activate environment:** `source .venv/bin/activate`
-2. **Run:** `python scripts/experiment_templates.py $ARGUMENTS`
+1. **Sync environment:** `uv sync`
+2. **Run:** `uv run python scripts/experiment_templates.py $ARGUMENTS`
 3. **Operations:** save (from experiment), list (all templates), apply (to current project), share (export)
 4. **Stored at:** `~/.turing/templates/` (cross-project)

package/commands/train.md CHANGED

@@ -1,9 +1,8 @@
 ---
 name: train
 description: Run the autonomous ML experiment loop. Iteratively hypothesizes, trains, evaluates, and decides — keeping only improvements. Implements the autoresearch pattern with formal convergence detection and git-disciplined rollback.
-disable-model-invocation: true
 argument-hint: "[max_iterations]"
-allowed-tools: Read, Write, Edit, Bash(python train.py:*, python scripts/*:*, git:*, source .venv/bin/activate:*, pip:*), Grep, Glob
+allowed-tools: Read, Write, Edit, Bash(uv run python train.py:*, uv run python scripts/*:*, git:*, uv sync:*, uv add:*), Grep, Glob
 ---
 You are an autonomous ML researcher. Your goal: iteratively improve a model by following the experiment loop protocol — the scientific method applied to machine learning.
@@ -27,9 +26,9 @@ Read `program.md` in the ML project directory for the complete protocol. Follow
 1. **Restore memory:** Read `.claude/agent-memory/ml-researcher-{project_name}/MEMORY.md` for prior observations and best results.
 2. **Read protocol:** Read `program.md` completely — it defines the experiment loop, constraints, and output format.
-3. **Bootstrap data:** Check for training data at `config.yaml` → `data.source`. If no splits exist, run `python prepare.py`.
-4. **Bootstrap venv:** `test -d .venv || (python3 -m venv .venv && source .venv/bin/activate && pip install -r requirements.txt)`
-5. **Assess state:** `source .venv/bin/activate && python scripts/show_metrics.py --last 5`
+3. **Bootstrap data:** Check for training data at `config.yaml` → `data.source`. If no splits exist, run `uv run python prepare.py`.
+4. **Bootstrap uv environment:** `uv sync`
+5. **Assess state:** `uv run python scripts/show_metrics.py --last 5`
 6. **Begin the loop** from program.md.
 ## The Loop
@@ -48,7 +47,7 @@ Use `@ml-evaluator` for analysis tasks. It is read-only (no Write/Edit) and cann
 ## Context Management
-- Redirect all training output: `python train.py > run.log 2>&1`
+- Redirect all training output: `uv run python train.py > run.log 2>&1`
 - Parse metrics with grep, never read full output
 - Persist observations to MEMORY.md after each experiment
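
Reviewer note: the protocol parses metrics with `grep` between `---` delimiters rather than reading the full log. The sketch below does a comparable extraction in Python. The log layout is an assumption: the diff does not show what `train.py` prints, so the `key: value` lines and the metric names here are hypothetical.

```python
def parse_metrics(log_text: str) -> dict[str, float]:
    """Collect numeric `key: value` lines from the first block enclosed by `---` lines."""
    metrics: dict[str, float] = {}
    inside = False
    for line in log_text.splitlines():
        if line.strip() == "---":
            if inside:
                break  # closing delimiter: stop reading
            inside = True
            continue
        if inside and ":" in line:
            key, _, value = line.partition(":")
            try:
                metrics[key.strip()] = float(value)
            except ValueError:
                pass  # skip non-numeric entries
    return metrics


# Assumed log layout, for illustration only.
sample_log = "epoch 10 done\n---\naccuracy: 0.91\nloss: 0.25\n---\nsaved model\n"
metrics = parse_metrics(sample_log)
```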

package/commands/transfer.md CHANGED

@@ -1,7 +1,6 @@
 ---
 name: transfer
 description: Cross-project knowledge transfer — find similar prior projects and surface what worked. Builds institutional ML memory.
-disable-model-invocation: true
 argument-hint: "[--from project-path] [--auto]"
 allowed-tools: Read, Bash(*), Grep, Glob
 ---
@@ -10,9 +9,9 @@ Find similar prior projects and surface what worked. "Last time you had tabular
 ## Steps
-1. **Activate environment:**
+1. **Sync environment:**
    ```bash
-   source .venv/bin/activate
+   uv sync
    ```
 2. **Parse arguments from `$ARGUMENTS`:**
@@ -23,7 +22,7 @@ Find similar prior projects and surface what worked. "Last time you had tabular
 3. **Run knowledge transfer:**
    ```bash
-   python scripts/knowledge_transfer.py $ARGUMENTS
+   uv run python scripts/knowledge_transfer.py $ARGUMENTS
    ```
 4. **Report includes:**

package/commands/trend.md CHANGED

@@ -1,7 +1,6 @@
 ---
 name: trend
 description: Long-term trend analysis — improvement velocity, family ROI, diminishing returns detection, strategic research direction.
-disable-model-invocation: true
 argument-hint: "[--window 30d] [--metric accuracy]"
 allowed-tools: Read, Bash(*), Grep, Glob
 ---
@@ -9,8 +8,8 @@ allowed-tools: Read, Bash(*), Grep, Glob
 See the arc of your research, not just the latest results. Strategic view over 100+ experiments.
 ## Steps
-1. **Activate environment:** `source .venv/bin/activate`
-2. **Run:** `python scripts/trend_analysis.py $ARGUMENTS`
+1. **Sync environment:** `uv sync`
+2. **Run:** `uv run python scripts/trend_analysis.py $ARGUMENTS`
 3. **Report:** improvement velocity over time windows, family ROI ranking, diminishing returns prediction, phase transitions
 4. **Saved output:** `experiments/trends/trend-*.yaml`

package/commands/try.md CHANGED

@@ -1,9 +1,8 @@
 ---
 name: try
 description: Inject a hypothesis into the agent's experiment queue. This is how research taste reaches the agent — the human selects which coins to flip, the agent flips them.
-disable-model-invocation: true
 argument-hint: "<hypothesis description>"
-allowed-tools: Read, Write, Edit, Bash(python scripts/*:*, source .venv/bin/activate:*), Grep, Glob
+allowed-tools: Read, Write, Edit, Bash(uv run python scripts/*:*, uv sync:*), Grep, Glob
 ---
 Inject a human hypothesis into the experiment queue for the next `/turing:train` iteration.
@@ -16,18 +15,18 @@ This is the taste-leverage mechanism: you provide judgment about what's worth tr
 2. **Check for archetype syntax.** If the argument starts with `archetype:`, expand it:
    ```bash
-   source .venv/bin/activate && python scripts/manage_hypotheses.py add --archetype <name> --priority high --source human
+   uv run python scripts/manage_hypotheses.py add --archetype <name> --priority high --source human
    ```
    Otherwise, use the raw description:
    ```bash
-   source .venv/bin/activate && python scripts/manage_hypotheses.py add "$ARGUMENTS" --priority high --source human
+   uv run python scripts/manage_hypotheses.py add "$ARGUMENTS" --priority high --source human
    ```
 3. **Confirm** with the hypothesis ID and instructions:
    - "Queued as hyp-NNN (high priority, human-injected)"
    - "The agent will prioritize this on the next `/turing:train` iteration"
-   - Show current queue: `python scripts/manage_hypotheses.py list --status queued`
+   - Show current queue: `uv run python scripts/manage_hypotheses.py list --status queued`
 ## Examples