npm - claude-turing - Versions diffs - 4.7.0 → 4.8.1 - Mend

claude-turing 4.7.0 → 4.8.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (172)

package/.claude-plugin/plugin.json +2 -2
package/README.md +1 -1
package/agents/ml-evaluator.md +4 -4
package/agents/ml-researcher.md +2 -2
package/bin/turing-init.sh +2 -2
package/commands/ablate.md +3 -4
package/commands/annotate.md +2 -3
package/commands/archive.md +2 -3
package/commands/audit.md +3 -4
package/commands/baseline.md +3 -4
package/commands/brief.md +5 -6
package/commands/budget.md +3 -4
package/commands/calibrate.md +3 -4
package/commands/card.md +3 -4
package/commands/changelog.md +2 -3
package/commands/checkpoint.md +3 -4
package/commands/cite.md +2 -3
package/commands/compare.md +1 -2
package/commands/counterfactual.md +2 -3
package/commands/curriculum.md +3 -4
package/commands/design.md +3 -4
package/commands/diagnose.md +4 -5
package/commands/diff.md +3 -4
package/commands/distill.md +3 -4
package/commands/doctor.md +2 -3
package/commands/ensemble.md +3 -4
package/commands/explore.md +4 -5
package/commands/export.md +3 -4
package/commands/feature.md +3 -4
package/commands/flashback.md +2 -3
package/commands/fork.md +3 -4
package/commands/frontier.md +3 -4
package/commands/init.md +5 -6
package/commands/leak.md +3 -4
package/commands/lit.md +3 -4
package/commands/logbook.md +5 -6
package/commands/merge.md +2 -3
package/commands/mode.md +1 -2
package/commands/onboard.md +2 -3
package/commands/paper.md +3 -4
package/commands/plan.md +2 -3
package/commands/poster.md +3 -4
package/commands/postmortem.md +2 -3
package/commands/preflight.md +5 -6
package/commands/present.md +2 -3
package/commands/profile.md +3 -4
package/commands/prune.md +2 -3
package/commands/quantize.md +2 -3
package/commands/queue.md +3 -4
package/commands/registry.md +2 -3
package/commands/regress.md +3 -4
package/commands/replay.md +2 -3
package/commands/report.md +3 -4
package/commands/reproduce.md +3 -4
package/commands/retry.md +3 -4
package/commands/review.md +2 -3
package/commands/rules/loop-protocol.md +11 -11
package/commands/sanity.md +3 -4
package/commands/scale.md +4 -5
package/commands/search.md +2 -3
package/commands/seed.md +3 -4
package/commands/sensitivity.md +3 -4
package/commands/share.md +2 -3
package/commands/simulate.md +2 -3
package/commands/status.md +1 -2
package/commands/stitch.md +3 -4
package/commands/suggest.md +5 -6
package/commands/surgery.md +2 -3
package/commands/sweep.md +8 -9
package/commands/template.md +2 -3
package/commands/train.md +5 -6
package/commands/transfer.md +3 -4
package/commands/trend.md +2 -3
package/commands/try.md +4 -5
package/commands/turing.md +3 -3
package/commands/update.md +2 -3
package/commands/validate.md +4 -5
package/commands/warm.md +3 -4
package/commands/watch.md +4 -5
package/commands/whatif.md +2 -3
package/commands/xray.md +3 -4
package/config/commands.yaml +75 -75
package/package.json +3 -2
package/skills/turing/SKILL.md +3 -3
package/skills/turing/ablate/SKILL.md +3 -4
package/skills/turing/annotate/SKILL.md +2 -3
package/skills/turing/archive/SKILL.md +2 -3
package/skills/turing/audit/SKILL.md +3 -4
package/skills/turing/baseline/SKILL.md +3 -4
package/skills/turing/brief/SKILL.md +5 -6
package/skills/turing/budget/SKILL.md +3 -4
package/skills/turing/calibrate/SKILL.md +3 -4
package/skills/turing/card/SKILL.md +3 -4
package/skills/turing/changelog/SKILL.md +2 -3
package/skills/turing/checkpoint/SKILL.md +3 -4
package/skills/turing/cite/SKILL.md +2 -3
package/skills/turing/compare/SKILL.md +1 -2
package/skills/turing/counterfactual/SKILL.md +2 -3
package/skills/turing/curriculum/SKILL.md +3 -4
package/skills/turing/design/SKILL.md +3 -4
package/skills/turing/diagnose/SKILL.md +4 -5
package/skills/turing/diff/SKILL.md +3 -4
package/skills/turing/distill/SKILL.md +3 -4
package/skills/turing/doctor/SKILL.md +2 -3
package/skills/turing/ensemble/SKILL.md +3 -4
package/skills/turing/explore/SKILL.md +4 -5
package/skills/turing/export/SKILL.md +3 -4
package/skills/turing/feature/SKILL.md +3 -4
package/skills/turing/flashback/SKILL.md +2 -3
package/skills/turing/fork/SKILL.md +3 -4
package/skills/turing/frontier/SKILL.md +3 -4
package/skills/turing/init/SKILL.md +5 -6
package/skills/turing/leak/SKILL.md +3 -4
package/skills/turing/lit/SKILL.md +3 -4
package/skills/turing/logbook/SKILL.md +5 -6
package/skills/turing/merge/SKILL.md +2 -3
package/skills/turing/mode/SKILL.md +1 -2
package/skills/turing/onboard/SKILL.md +2 -3
package/skills/turing/paper/SKILL.md +3 -4
package/skills/turing/plan/SKILL.md +2 -3
package/skills/turing/poster/SKILL.md +3 -4
package/skills/turing/postmortem/SKILL.md +2 -3
package/skills/turing/preflight/SKILL.md +5 -6
package/skills/turing/present/SKILL.md +2 -3
package/skills/turing/profile/SKILL.md +3 -4
package/skills/turing/prune/SKILL.md +2 -3
package/skills/turing/quantize/SKILL.md +2 -3
package/skills/turing/queue/SKILL.md +3 -4
package/skills/turing/registry/SKILL.md +2 -3
package/skills/turing/regress/SKILL.md +3 -4
package/skills/turing/replay/SKILL.md +2 -3
package/skills/turing/report/SKILL.md +3 -4
package/skills/turing/reproduce/SKILL.md +3 -4
package/skills/turing/retry/SKILL.md +3 -4
package/skills/turing/review/SKILL.md +2 -3
package/skills/turing/rules/loop-protocol.md +11 -11
package/skills/turing/sanity/SKILL.md +3 -4
package/skills/turing/scale/SKILL.md +4 -5
package/skills/turing/search/SKILL.md +2 -3
package/skills/turing/seed/SKILL.md +3 -4
package/skills/turing/sensitivity/SKILL.md +3 -4
package/skills/turing/share/SKILL.md +2 -3
package/skills/turing/simulate/SKILL.md +2 -3
package/skills/turing/status/SKILL.md +1 -2
package/skills/turing/stitch/SKILL.md +3 -4
package/skills/turing/suggest/SKILL.md +5 -6
package/skills/turing/surgery/SKILL.md +2 -3
package/skills/turing/sweep/SKILL.md +8 -9
package/skills/turing/template/SKILL.md +2 -3
package/skills/turing/train/SKILL.md +5 -6
package/skills/turing/transfer/SKILL.md +3 -4
package/skills/turing/trend/SKILL.md +2 -3
package/skills/turing/try/SKILL.md +4 -5
package/skills/turing/update/SKILL.md +2 -3
package/skills/turing/validate/SKILL.md +4 -5
package/skills/turing/warm/SKILL.md +3 -4
package/skills/turing/watch/SKILL.md +4 -5
package/skills/turing/whatif/SKILL.md +2 -3
package/skills/turing/xray/SKILL.md +3 -4
package/src/command-registry.js +12 -0
package/src/install.js +4 -3
package/src/sync-commands-layout.js +149 -0
package/src/sync-skills-layout.js +4 -133
package/templates/README.md +5 -8
package/templates/program.md +18 -18
package/templates/pyproject.toml +10 -0
package/templates/requirements.txt +4 -1
package/templates/scripts/generate_onboarding.py +1 -1
package/templates/scripts/post-train-hook.sh +7 -8
package/templates/scripts/scaffold.py +24 -26
package/templates/scripts/stop-hook.sh +2 -3
package/templates/scripts/turing-run-python.sh +9 -0

package/commands/explore.md CHANGED

@@ -1,9 +1,8 @@
 ---
 name: explore
 description: Tree-search-guided hypothesis exploration using AB-MCTS. Explores the space of experiment ideas as a search tree, scored by the critique engine. Discovers non-obvious refinement chains that linear suggestion cannot find.
-disable-model-invocation: true
 argument-hint: "[ml/project] [--iterations N] [--top N] [--strategy abmcts-a|abmcts-m|greedy]"
-allowed-tools: Read, Write, Bash(python scripts/*:*, source .venv/bin/activate:*), Grep, Glob
+allowed-tools: Read, Write, Bash(uv run python scripts/*:*, uv sync:*), Grep, Glob
 ---
 Explore the hypothesis space using tree search. Instead of suggesting independent ideas, this builds and searches a tree of refinement chains — each node is a hypothesis scored by novelty, feasibility, and expected impact.
@@ -32,7 +31,7 @@ Extract from `$ARGUMENTS`:
 ### 1. Assess Current State
 ```bash
-source .venv/bin/activate && python scripts/show_metrics.py --last 10 2>/dev/null || echo "No experiments yet"
+uv run python scripts/show_metrics.py --last 10 2>/dev/null || echo "No experiments yet"
 ```
 Read `config.yaml` to understand the current model and metric.
@@ -40,7 +39,7 @@ Read `config.yaml` to understand the current model and metric.
 ### 2. Run Tree Search
 ```bash
-source .venv/bin/activate && python scripts/treequest_suggest.py \
+uv run python scripts/treequest_suggest.py \
     --log experiments/log.jsonl \
     --config config.yaml \
     --top <N> \
@@ -59,7 +58,7 @@ The script will:
 For each result, add to the hypothesis queue:
 ```bash
-source .venv/bin/activate && python scripts/manage_hypotheses.py add "<description>" \
+uv run python scripts/manage_hypotheses.py add "<description>" \
     --priority medium --source treequest
 ```
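The edits in this file follow a pattern that repeats across most of the 172 files. The `disable-model-invocation: true` frontmatter line is dropped, `source .venv/bin/activate && python` collapses into `uv run python`, a standalone activation becomes `uv sync`, and the `allowed-tools` Bash patterns are rewritten to match. The sketch below reproduces that rewrite as plain string substitution. It is inferred from the diff and is not the package's own tooling; `migrate_to_uv` is a hypothetical name.

```python
import re


def migrate_to_uv(text: str) -> str:
    """Rewrite venv-activation idioms into their uv equivalents, line by line."""
    out = []
    for line in text.splitlines():
        # The frontmatter flag is removed outright.
        if line.strip() == "disable-model-invocation: true":
            continue
        # Step headings are renamed.
        line = line.replace("**Activate environment:**", "**Sync environment:**")
        # Chained activation + interpreter call collapses into one `uv run`.
        line = line.replace("source .venv/bin/activate && python", "uv run python")
        # A standalone activation (including the `:*` permission form in
        # allowed-tools) becomes `uv sync`.
        line = line.replace("source .venv/bin/activate", "uv sync")
        # Remaining bare `python scripts/...` calls gain the `uv run` prefix;
        # the lookbehind skips calls that were already rewritten above.
        line = re.sub(r"(?<!run )\bpython (?=scripts/)", "uv run python ", line)
        out.append(line)
    return "\n".join(out)
```

Lines outside these patterns, such as `python -c "..."` or `python3 <templates_dir>/scripts/scaffold.py`, pass through unchanged and would need a manual edit.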

package/commands/export.md CHANGED

@@ -1,7 +1,6 @@
 ---
 name: export
 description: Export model to production format with equivalence verification, latency benchmarking, and deployment model card.
-disable-model-invocation: true
 argument-hint: "[exp-id] [--format joblib|xgboost_json|onnx|torchscript|tflite]"
 allowed-tools: Read, Bash(*), Grep, Glob
 ---
@@ -10,9 +9,9 @@ Export a trained model to a production-ready format.
 ## Steps
-1. **Activate environment:**
+1. **Sync environment:**
    ```bash
-   source .venv/bin/activate
+   uv sync
    ```
 2. **Parse arguments from `$ARGUMENTS`:**
@@ -24,7 +23,7 @@ Export a trained model to a production-ready format.
 3. **Run export pipeline:**
    ```bash
-   python scripts/export_model.py $ARGUMENTS
+   uv run python scripts/export_model.py $ARGUMENTS
    ```
 4. **Report results:**
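The export command's description promises equivalence verification between the trained model and its exported form. One common way to check this is to run both on the same inputs and compare predictions within a tolerance. The sketch assumes predictions are already available as float lists; `verify_equivalence` and its default `rtol`/`atol` values are hypothetical and are not taken from `scripts/export_model.py`.

```python
import math
from typing import Sequence


def verify_equivalence(reference: Sequence[float], exported: Sequence[float],
                       rtol: float = 1e-5, atol: float = 1e-6) -> dict:
    """Compare predictions from the original model and the exported artifact."""
    if len(reference) != len(exported):
        raise ValueError("prediction lengths differ")
    pairs = list(zip(reference, exported))
    max_abs_diff = max((abs(r - e) for r, e in pairs), default=0.0)
    # Every prediction must agree within the relative OR absolute tolerance.
    equivalent = all(math.isclose(r, e, rel_tol=rtol, abs_tol=atol) for r, e in pairs)
    return {"equivalent": equivalent, "max_abs_diff": max_abs_diff}
```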

package/commands/feature.md CHANGED

@@ -1,7 +1,6 @@
 ---
 name: feature
 description: Automated feature selection — multi-method importance consensus, redundancy detection, and interaction feature generation.
-disable-model-invocation: true
 argument-hint: "[--method all|importance] [--top-k 20]"
 allowed-tools: Read, Bash(*), Grep, Glob
 ---
@@ -10,9 +9,9 @@ Systematically evaluate which features matter and which are noise.
 ## Steps
-1. **Activate environment:**
+1. **Sync environment:**
    ```bash
-   source .venv/bin/activate
+   uv sync
    ```
 2. **Parse arguments from `$ARGUMENTS`:**
@@ -22,7 +21,7 @@ Systematically evaluate which features matter and which are noise.
 3. **Run feature analysis:**
    ```bash
-   python scripts/feature_intelligence.py $ARGUMENTS
+   uv run python scripts/feature_intelligence.py $ARGUMENTS
    ```
 4. **Report includes:**
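Redundancy detection, as named in the feature command's description, is often done by flagging feature pairs whose absolute Pearson correlation exceeds a threshold. A minimal pure-Python sketch, assuming features are numeric columns keyed by name; the 0.95 threshold and the function names are illustrative and not necessarily what `scripts/feature_intelligence.py` uses.

```python
import math
from itertools import combinations


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns 0.0 for a constant column."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def redundant_pairs(columns: dict[str, list[float]], threshold: float = 0.95):
    """List (feature_a, feature_b, r) for every pair with |r| >= threshold."""
    pairs = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            pairs.append((a, b, round(r, 4)))
    return pairs
```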

package/commands/flashback.md CHANGED

@@ -1,7 +1,6 @@
 ---
 name: flashback
 description: Session context restoration — "where was I?" summary after days away. Current best, pending hypotheses, last session, annotations.
-disable-model-invocation: true
 argument-hint: "[--days 7] [--last 10]"
 allowed-tools: Read, Bash(*), Grep, Glob
 ---
@@ -9,8 +8,8 @@ allowed-tools: Read, Bash(*), Grep, Glob
 Come back to a project after a week and start working in 10 seconds instead of 30 minutes.
 ## Steps
-1. **Activate environment:** `source .venv/bin/activate`
-2. **Run:** `python scripts/session_flashback.py $ARGUMENTS`
+1. **Sync environment:** `uv sync`
+2. **Run:** `uv run python scripts/session_flashback.py $ARGUMENTS`
 3. **Report:** current best, last session experiments, pending hypotheses, annotations, budget, suggested next action
 4. **Saved output:** `experiments/flashbacks/flashback-*.yaml`

package/commands/fork.md CHANGED

@@ -1,7 +1,6 @@
 ---
 name: fork
 description: Branch an experiment into parallel tracks — run both A and B, report the winner.
-disable-model-invocation: true
 argument-hint: "<exp-id> --branches \"approach A\" \"approach B\" [--auto-promote]"
 allowed-tools: Read, Bash(*), Grep, Glob
 ---
@@ -10,9 +9,9 @@ Fork an experiment into parallel branches and compare results.
 ## Steps
-1. **Activate environment:**
+1. **Sync environment:**
    ```bash
-   source .venv/bin/activate
+   uv sync
    ```
 2. **Parse arguments from `$ARGUMENTS`:**
@@ -22,7 +21,7 @@ Fork an experiment into parallel branches and compare results.
 3. **Run fork:**
    ```bash
-   python scripts/fork_experiment.py $ARGUMENTS
+   uv run python scripts/fork_experiment.py $ARGUMENTS
    ```
 4. **Report results:**

package/commands/frontier.md CHANGED

@@ -1,7 +1,6 @@
 ---
 name: frontier
 description: Visualize Pareto frontier across multiple objectives — answers "which model is actually best?" when there are tradeoffs.
-disable-model-invocation: true
 argument-hint: "[--metrics \"accuracy,train_seconds,n_params\"] [--ascii]"
 allowed-tools: Read, Bash(*), Grep, Glob
 ---
@@ -10,9 +9,9 @@ Visualize the Pareto frontier across multiple objectives from experiment history
 ## Steps
-1. **Activate environment:**
+1. **Sync environment:**
    ```bash
-   source .venv/bin/activate
+   uv sync
    ```
 2. **Parse arguments from `$ARGUMENTS`:**
@@ -22,7 +21,7 @@ Visualize the Pareto frontier across multiple objectives from experiment history
 3. **Run Pareto analysis:**
    ```bash
-   python scripts/pareto_frontier.py $ARGUMENTS
+   uv run python scripts/pareto_frontier.py $ARGUMENTS
    ```
 4. **Report results:**
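A run sits on the Pareto frontier when no other run is at least as good on every metric and strictly better on at least one. The sketch below applies that definition to the three metrics in the frontier command's `--metrics` example, assuming accuracy is maximized while `train_seconds` and `n_params` are minimized. The run dictionaries are hypothetical stand-ins for entries in the experiment log.

```python
def dominates(a: dict, b: dict, directions: dict[str, str]) -> bool:
    """True if run `a` is no worse than `b` on every metric and strictly better on one."""
    strictly_better = False
    for metric, direction in directions.items():
        x, y = a[metric], b[metric]
        if direction == "min":
            # Flip the sign so that larger is always better below.
            x, y = -x, -y
        if x < y:
            return False
        if x > y:
            strictly_better = True
    return strictly_better


def pareto_frontier(runs: list[dict], directions: dict[str, str]) -> list[dict]:
    """Keep only the runs that no other run dominates."""
    return [r for r in runs
            if not any(dominates(o, r, directions) for o in runs if o is not r)]


directions = {"accuracy": "max", "train_seconds": "min", "n_params": "min"}
runs = [
    {"id": "exp-001", "accuracy": 0.91, "train_seconds": 120, "n_params": 2_000_000},
    {"id": "exp-002", "accuracy": 0.88, "train_seconds": 15, "n_params": 150_000},
    {"id": "exp-003", "accuracy": 0.87, "train_seconds": 30, "n_params": 300_000},
]
print([r["id"] for r in pareto_frontier(runs, directions)])  # ['exp-001', 'exp-002']
```

Here `exp-003` drops out because `exp-002` is more accurate, faster, and smaller, while neither of the other two runs beats the other on every axis.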

package/commands/init.md CHANGED

@@ -1,7 +1,6 @@
 ---
 name: init
-description: Initialize a new ML project with the Turing autoresearch harness. Scaffolds the full experiment infrastructure — immutable evaluation pipeline, agent-editable training code, structured logging, convergence detection hooks, and a Python virtual environment. Use --plan to generate a research plan.
-disable-model-invocation: true
+description: Initialize a new ML project with the Turing autoresearch harness. Scaffolds the full experiment infrastructure — immutable evaluation pipeline, agent-editable training code, structured logging, convergence detection hooks, and a uv-managed Python environment. Use --plan to generate a research plan.
 argument-hint: "[project_name] [--plan]"
 allowed-tools: Read, Write, Edit, Bash(*), Grep, Glob, WebSearch, WebFetch
 ---
@@ -24,7 +23,7 @@ Ask the user for the following (or accept from `$ARGUMENTS` if provided as JSON)
 Once you have all 6 values, delegate to the unified scaffolding script:
 ```bash
-python3 <templates_dir>/scripts/scaffold.py \
+uv run python <templates_dir>/scripts/scaffold.py \
     --project-name "<project_name>" \
     --target-metric "<target_metric>" \
     --metric-direction "<metric_direction>" \
@@ -39,7 +38,7 @@ The scaffold script handles everything in a single atomic operation:
 - Creates data/, experiments/, models/ directories
 - Sets up agent memory at `.claude/agent-memory/ml-researcher-{project_name}/MEMORY.md`
 - Configures Claude Code hooks in `.claude/settings.local.json`
-- Creates Python virtual environment and installs requirements
+- Runs `uv sync` from the ML directory when uv is available
 - Verifies all placeholders were replaced (fails loudly if any remain)
 ## Locating Templates
@@ -58,7 +57,7 @@ node_modules/claude-turing/templates/
 Example command:
 ```bash
-python3 ~/.claude/commands/turing/templates/scripts/scaffold.py \
+uv run python ~/.claude/commands/turing/templates/scripts/scaffold.py \
     --project-name "<project_name>" \
     --target-metric "<target_metric>" \
     --metric-direction "<metric_direction>" \
@@ -72,7 +71,7 @@ python3 ~/.claude/commands/turing/templates/scripts/scaffold.py \
 Report what was created:
 - The separation: READ-ONLY (`prepare.py`, `evaluate.py`) vs AGENT-EDITABLE (`train.py`)
-- Next steps: add data to the configured data source path, run `python prepare.py`, then `/turing:train`
+- Next steps: add data to the configured data source path, run `uv run python prepare.py`, then `/turing:train`
 - The taste-leverage loop: `/turing:try` to inject hypotheses, `/turing:brief` for intelligence reports
 ## Research Plan Generation (--plan flag)

package/commands/leak.md CHANGED

@@ -1,7 +1,6 @@
 ---
 name: leak
 description: Targeted leakage detection — probe for data leakage with single-feature tests, correlation checks, and train/test overlap detection.
-disable-model-invocation: true
 argument-hint: "[--deep] [--features feature_1,feature_2]"
 allowed-tools: Read, Bash(*), Grep, Glob
 ---
@@ -10,9 +9,9 @@ Actively probe for data leakage. The #1 cause of "too good to be true" results.
 ## Steps
-1. **Activate environment:**
+1. **Sync environment:**
    ```bash
-   source .venv/bin/activate
+   uv sync
    ```
 2. **Parse arguments from `$ARGUMENTS`:**
@@ -22,7 +21,7 @@ Actively probe for data leakage. The #1 cause of "too good to be true" results.
 3. **Run leakage scan:**
    ```bash
-   python scripts/leakage_detector.py $ARGUMENTS
+   uv run python scripts/leakage_detector.py $ARGUMENTS
    ```
 4. **Checks performed:**
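Of the leakage checks in the leak command's description, train/test overlap detection is the most mechanical. You hash each row's feature values and look for test rows whose hash also appears in training. The sketch assumes rows are dicts and that the label column is named `target` (a hypothetical name). Excluding the label catches duplicated rows even when their labels disagree.

```python
import hashlib
import json


def row_fingerprint(row: dict, exclude: tuple[str, ...] = ("target",)) -> str:
    """Stable hash of a row's feature values, ignoring excluded columns."""
    payload = {k: row[k] for k in sorted(row) if k not in exclude}
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()


def train_test_overlap(train_rows: list[dict], test_rows: list[dict],
                       exclude: tuple[str, ...] = ("target",)) -> dict:
    """Report which test rows also appear (by features) in the training set."""
    train_hashes = {row_fingerprint(r, exclude) for r in train_rows}
    dupes = [i for i, r in enumerate(test_rows) if row_fingerprint(r, exclude) in train_hashes]
    fraction = len(dupes) / len(test_rows) if test_rows else 0.0
    return {"n_overlap": len(dupes), "fraction": fraction, "test_indices": dupes}
```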

package/commands/lit.md CHANGED

@@ -1,7 +1,6 @@
 ---
 name: lit
 description: Literature search scoped to the current experiment domain — find papers, SOTA baselines, and related work without leaving the terminal.
-disable-model-invocation: true
 argument-hint: "<query> | --baseline | --related <exp-id>"
 allowed-tools: Read, Bash(*), Grep, Glob, WebSearch
 ---
@@ -10,9 +9,9 @@ Search the literature for papers, baselines, and related work.
 ## Steps
-1. **Activate environment:**
+1. **Sync environment:**
    ```bash
-   source .venv/bin/activate
+   uv sync
    ```
 2. **Parse arguments from `$ARGUMENTS`:**
@@ -24,7 +23,7 @@ Search the literature for papers, baselines, and related work.
 3. **Run literature search:**
    ```bash
-   python scripts/literature_search.py $ARGUMENTS
+   uv run python scripts/literature_search.py $ARGUMENTS
    ```
 4. **Report results:**

package/commands/logbook.md CHANGED

@@ -1,9 +1,8 @@
 ---
 name: logbook
 description: Generate a research logbook showing the full experiment narrative — hypotheses proposed, experiments run, decisions made, and progress over time. Outputs HTML (with interactive chart) or markdown.
-disable-model-invocation: true
 argument-hint: "[--since YYYY-MM-DD] [--format html|markdown] [--output path]"
-allowed-tools: Read, Bash(python scripts/*:*, source .venv/bin/activate:*, mkdir:*), Grep, Glob
+allowed-tools: Read, Bash(uv run python scripts/*:*, uv sync:*, mkdir:*), Grep, Glob
 ---
 Generate a research logbook that captures the full narrative of the experiment campaign.
@@ -12,7 +11,7 @@ Generate a research logbook that captures the full narrative of the experiment c
 1. **Generate the logbook:**
    ```bash
-   source .venv/bin/activate && python scripts/generate_logbook.py
+   uv run python scripts/generate_logbook.py
    ```
    **With options from `$ARGUMENTS`:**
@@ -23,13 +22,13 @@ Generate a research logbook that captures the full narrative of the experiment c
    **Common usage:**
    ```bash
    # HTML logbook with interactive trajectory chart
-   source .venv/bin/activate && python scripts/generate_logbook.py --output logbook.html
+   uv run python scripts/generate_logbook.py --output logbook.html
    # Markdown for embedding in docs or READMEs
-   source .venv/bin/activate && python scripts/generate_logbook.py --format markdown --output logbook.md
+   uv run python scripts/generate_logbook.py --format markdown --output logbook.md
    # Last week's activity
-   source .venv/bin/activate && python scripts/generate_logbook.py --since 2026-03-24 --output logbook.html
+   uv run python scripts/generate_logbook.py --since 2026-03-24 --output logbook.html
    ```
 2. **Present the result:**

package/commands/merge.md CHANGED

@@ -1,7 +1,6 @@
 ---
 name: merge
 description: Model merging — average weights from multiple checkpoints into a single model (soups, TIES, DARE). Free accuracy, zero latency cost.
-disable-model-invocation: true
 argument-hint: "<exp-ids...> [--method uniform|greedy|ties|dare]"
 allowed-tools: Read, Bash(*), Grep, Glob
 ---
@@ -10,8 +9,8 @@ Combine model weights (not predictions) into a single, better model with no late
 ## Steps
-1. **Activate environment:** `source .venv/bin/activate`
-2. **Run:** `python scripts/model_merger.py $ARGUMENTS`
+1. **Sync environment:** `uv sync`
+2. **Run:** `uv run python scripts/model_merger.py $ARGUMENTS`
 3. **Methods:** uniform soup (simple average), greedy soup (include only if improves), TIES (trim+elect+merge), DARE (drop+rescale)
 4. **Report:** compatibility check, per-model metrics, method comparison, improvement delta
 5. **Saved output:** `experiments/merges/merge-*.yaml`
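Uniform and greedy soups, two of the merge methods listed above, can be sketched on plain parameter dicts. The sketch assumes every checkpoint shares parameter names and shapes (the compatibility check) and that `score` evaluates a merged model on held-out data. Both helpers are illustrative, and they do not cover TIES or DARE.

```python
from typing import Callable

Weights = dict[str, list[float]]


def uniform_soup(models: list[Weights]) -> Weights:
    """Average every parameter element-wise across compatible checkpoints."""
    ref = models[0]
    for m in models[1:]:
        if m.keys() != ref.keys() or any(len(m[k]) != len(ref[k]) for k in ref):
            raise ValueError("incompatible checkpoints: parameter names or shapes differ")
    n = len(models)
    return {k: [sum(vals) / n for vals in zip(*(m[k] for m in models))] for k in ref}


def greedy_soup(models: list[Weights], score: Callable[[Weights], float]) -> Weights:
    """Add checkpoints best-first, keeping each one only if the soup does not get worse."""
    ranked = sorted(models, key=score, reverse=True)
    ingredients = [ranked[0]]
    best = score(ranked[0])
    for candidate in ranked[1:]:
        trial_score = score(uniform_soup(ingredients + [candidate]))
        if trial_score >= best:
            ingredients.append(candidate)
            best = trial_score
    return uniform_soup(ingredients)
```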

package/commands/mode.md CHANGED

@@ -1,7 +1,6 @@
 ---
 name: mode
 description: Set the research strategy mode — explore (try new things), exploit (refine what works), or replicate (verify results). Drives novelty guard policy and agent behavior.
-disable-model-invocation: true
 argument-hint: "<explore|exploit|replicate>"
 ---
@@ -21,7 +20,7 @@ Set the research mode for the current project. The mode determines how the novel
 2. **Update experiment state:**
    ```bash
-   source .venv/bin/activate
+   uv sync
    python -c "
    import yaml
    from pathlib import Path

package/commands/onboard.md CHANGED

@@ -1,7 +1,6 @@
 ---
 name: onboard
 description: Project onboarding — generate a walkthrough for new collaborators. Task, history, decisions, next steps.
-disable-model-invocation: true
 argument-hint: "[--audience researcher|engineer|stakeholder] [--depth brief|full]"
 allowed-tools: Read, Bash(*), Grep, Glob
 ---
@@ -9,8 +8,8 @@ allowed-tools: Read, Bash(*), Grep, Glob
 5-minute read that replaces a 1-hour onboarding meeting.
 ## Steps
-1. `source .venv/bin/activate`
-2. `python scripts/generate_onboarding.py $ARGUMENTS`
+1. `uv sync`
+2. `uv run python scripts/generate_onboarding.py $ARGUMENTS`
 3. **Saved:** `ONBOARDING.md`
 ## Examples

package/commands/paper.md CHANGED

@@ -1,7 +1,6 @@
 ---
 name: paper
 description: Draft mechanical paper sections (setup, results, ablation, hyperparameters) from experiment logs. LaTeX and markdown output.
-disable-model-invocation: true
 argument-hint: "[--sections setup,results,ablation] [--format latex|markdown]"
 allowed-tools: Read, Bash(*), Grep, Glob
 ---
@@ -10,9 +9,9 @@ Draft paper sections directly from experiment data.
 ## Steps
-1. **Activate environment:**
+1. **Sync environment:**
    ```bash
-   source .venv/bin/activate
+   uv sync
    ```
 2. **Parse arguments from `$ARGUMENTS`:**
@@ -21,7 +20,7 @@ Draft paper sections directly from experiment data.
 3. **Run paper drafting:**
    ```bash
-   python scripts/draft_paper_sections.py $ARGUMENTS
+   uv run python scripts/draft_paper_sections.py $ARGUMENTS
    ```
 4. **Report results:**

package/commands/plan.md CHANGED

@@ -1,7 +1,6 @@
 ---
 name: plan
 description: Research planning assistant — design a strategic experiment campaign with budget-aware ROI allocation.
-disable-model-invocation: true
 argument-hint: "[--budget 20] [--goal \"maximize F1 for production\"]"
 allowed-tools: Read, Bash(*), Grep, Glob
 ---
@@ -9,8 +8,8 @@ allowed-tools: Read, Bash(*), Grep, Glob
 Design the next N experiments strategically, not randomly. Allocates budget by expected ROI.
 ## Steps
-1. `source .venv/bin/activate`
-2. `python scripts/research_planner.py $ARGUMENTS`
+1. `uv sync`
+2. `uv run python scripts/research_planner.py $ARGUMENTS`
 3. **Saved:** `experiments/plans/`
 ## How it works

package/commands/poster.md CHANGED

@@ -1,9 +1,8 @@
 ---
 name: poster
 description: Generate a single-page HTML research poster summarizing the experiment campaign — best result, trajectory, key findings, and methodology. Adapted from posterskill's self-contained HTML architecture.
-disable-model-invocation: true
 argument-hint: "[title override]"
-allowed-tools: Read, Write, Edit, Bash(python scripts/*:*, source .venv/bin/activate:*, mkdir:*, open:*), Grep, Glob
+allowed-tools: Read, Write, Edit, Bash(uv run python scripts/*:*, uv sync:*, mkdir:*, open:*), Grep, Glob
 ---
 Generate a research poster summarizing the experiment campaign as a single self-contained HTML file. Adapted from [posterskill](https://github.com/ethanweber/posterskill)'s architecture — no build step, works when opened as `file://`.
@@ -16,8 +15,8 @@ Read the experiment history and project context:
 ```bash
 cat config.yaml
-source .venv/bin/activate && python scripts/generate_brief.py
-source .venv/bin/activate && python scripts/show_metrics.py --last 20
+uv run python scripts/generate_brief.py
+uv run python scripts/show_metrics.py --last 20
 cat experiment_state.yaml 2>/dev/null || true
 cat RESEARCH_PLAN.md 2>/dev/null || true
 ```

package/commands/postmortem.md CHANGED

@@ -1,7 +1,6 @@
 ---
 name: postmortem
 description: Failure postmortem — diagnose why experiments stopped improving and get actionable next steps.
-disable-model-invocation: true
 argument-hint: "[--window 10] [--auto-trigger 5]"
 allowed-tools: Read, Bash(*), Grep, Glob
 ---
@@ -9,8 +8,8 @@ allowed-tools: Read, Bash(*), Grep, Glob
 When experiments stop improving, find out why. Diagnoses search space exhaustion, config errors, data issues, metric ceilings, and noise floors.
 ## Steps
-1. `source .venv/bin/activate`
-2. `python scripts/failure_postmortem.py $ARGUMENTS`
+1. `uv sync`
+2. `uv run python scripts/failure_postmortem.py $ARGUMENTS`
 3. **Saved:** `experiments/postmortems/`
 ## Diagnosis categories
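One simple way to decide that experiments have stopped improving is to compare the best metric in the last `--window` runs against the best from before that window. The sketch below works under that assumption. Its `noise_floor` parameter is a hypothetical stand-in for the noise-floor diagnosis and is not documented in this diff.

```python
def plateau_report(metrics: list[float], window: int = 10,
                   direction: str = "max", noise_floor: float = 0.0) -> dict:
    """Flag a stall when the last `window` runs fail to beat the earlier best by more than noise."""
    if len(metrics) <= window:
        return {"stalled": False, "reason": "not enough history"}
    history, recent = metrics[:-window], metrics[-window:]
    if direction == "max":
        prior_best, recent_best = max(history), max(recent)
        gain = recent_best - prior_best
    else:
        prior_best, recent_best = min(history), min(recent)
        gain = prior_best - recent_best
    return {"stalled": gain <= noise_floor, "gain": gain, "prior_best": prior_best}
```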

package/commands/preflight.md CHANGED

@@ -1,30 +1,29 @@
 ---
 name: preflight
 description: Pre-flight resource check — estimates VRAM, RAM, and disk requirements before running ML training. Compares against available system resources and issues PASS/WARN/FAIL verdict. Use before training to catch OOM errors before they happen.
-disable-model-invocation: true
 argument-hint: "[--model-type torch] [--params 10M] [--batch-size 32]"
-allowed-tools: Read, Bash(python scripts/*:*, source .venv/bin/activate:*, nvidia-smi:*), Grep, Glob
+allowed-tools: Read, Bash(uv run python scripts/*:*, uv sync:*, nvidia-smi:*), Grep, Glob
 ---
 Check whether the current system has enough resources to run the planned experiment.
 ## Steps
-1. **Activate environment:**
+1. **Sync environment:**
    ```bash
-   source .venv/bin/activate
+   uv sync
    ```
 2. **Run preflight check:**
    If `$ARGUMENTS` is empty (auto-detect from config.yaml):
    ```bash
-   python scripts/preflight.py
+   uv run python scripts/preflight.py
    ```
    If `$ARGUMENTS` contains flags:
    ```bash
-   python scripts/preflight.py $ARGUMENTS
+   uv run python scripts/preflight.py $ARGUMENTS
    ```
 3. **Interpret the verdict:**
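The sketch below is a rough illustration of how a PASS/WARN/FAIL verdict could be derived from `--params`. It counts four FP32 copies per parameter (weights, gradients, and two Adam moment buffers) and ignores activations, which in practice grow with `--batch-size`. The 80% headroom threshold and all function names are assumptions, not the logic of `scripts/preflight.py`.

```python
def parse_count(text: str) -> int:
    """Parse parameter counts such as '10M', '1.5B', '350K' or '1200'."""
    multipliers = {"K": 10**3, "M": 10**6, "B": 10**9}
    text = text.strip().upper()
    if text and text[-1] in multipliers:
        return int(float(text[:-1]) * multipliers[text[-1]])
    return int(text)


def estimate_training_bytes(n_params: int, bytes_per_value: int = 4, copies: int = 4) -> int:
    """Weights + gradients + two Adam moments in FP32; activations are NOT included."""
    return n_params * bytes_per_value * copies


def verdict(required_bytes: int, available_bytes: int, headroom: float = 0.8) -> str:
    """PASS with headroom to spare, WARN if it barely fits, FAIL if it cannot fit."""
    if required_bytes <= headroom * available_bytes:
        return "PASS"
    if required_bytes <= available_bytes:
        return "WARN"
    return "FAIL"


need = estimate_training_bytes(parse_count("10M"))
print(need / 1e6, "MB ->", verdict(need, 8 * 1024**3))  # 160.0 MB -> PASS
```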

package/commands/present.md CHANGED

@@ -1,7 +1,6 @@
 ---
 name: present
 description: Presentation figure generation — training curves, comparison charts, ablation tables, Pareto plots, sensitivity heatmaps.
-disable-model-invocation: true
 argument-hint: "[--figures training,comparison] [--style light|dark|poster]"
 allowed-tools: Read, Bash(*), Grep, Glob
 ---
@@ -9,8 +8,8 @@ allowed-tools: Read, Bash(*), Grep, Glob
 Generate presentation-ready figure specifications from experiment data in seconds.
 ## Steps
-1. **Activate environment:** `source .venv/bin/activate`
-2. **Run:** `python scripts/generate_figures.py $ARGUMENTS`
+1. **Sync environment:** `uv sync`
+2. **Run:** `uv run python scripts/generate_figures.py $ARGUMENTS`
 3. **Figure types:** training, comparison, ablation, pareto, sensitivity
 4. **Styles:** light (papers), dark (demos), poster (large fonts)
 5. **Saved output:** `paper/figures/`

package/commands/profile.md CHANGED

@@ -1,7 +1,6 @@
 ---
 name: profile
 description: Profile a training run — timing breakdown, memory usage, throughput, bottleneck detection with actionable recommendations.
-disable-model-invocation: true
 argument-hint: "[exp-id] [--seed 42]"
 allowed-tools: Read, Bash(*), Grep, Glob
 ---
@@ -10,9 +9,9 @@ Profile a training run to identify performance bottlenecks.
 ## Steps
-1. **Activate environment:**
+1. **Sync environment:**
    ```bash
-   source .venv/bin/activate
+   uv sync
    ```
 2. **Parse arguments from `$ARGUMENTS`:**
@@ -21,7 +20,7 @@ Profile a training run to identify performance bottlenecks.
 3. **Run profiling:**
    ```bash
-   python scripts/profile_training.py $ARGUMENTS
+   uv run python scripts/profile_training.py $ARGUMENTS
    ```
 4. **Report results:**
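A timing breakdown with bottleneck detection can be built from a context manager that accumulates wall-clock time per phase. The sketch uses `time.perf_counter`. The phase names and the `PhaseTimer` class are hypothetical and do not describe how `scripts/profile_training.py` instruments training.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Optional


class PhaseTimer:
    """Accumulate wall-clock seconds per named training phase."""

    def __init__(self) -> None:
        self.totals: dict[str, float] = defaultdict(float)

    @contextmanager
    def phase(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record elapsed time even if the phase raises.
            self.totals[name] += time.perf_counter() - start

    def breakdown(self) -> dict[str, float]:
        """Fraction of total measured time spent in each phase, largest first."""
        total = sum(self.totals.values())
        if total == 0:
            return {}
        ranked = sorted(self.totals.items(), key=lambda kv: kv[1], reverse=True)
        return {name: secs / total for name, secs in ranked}

    def bottleneck(self) -> Optional[str]:
        return max(self.totals, key=self.totals.get) if self.totals else None


timer = PhaseTimer()
for _ in range(3):
    with timer.phase("data_loading"):
        time.sleep(0.02)  # stand-in for a slow data loader
    with timer.phase("forward_backward"):
        time.sleep(0.001)  # stand-in for the compute step
print(timer.bottleneck(), timer.breakdown())
```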

package/commands/prune.md CHANGED

@@ -1,7 +1,6 @@
 ---
 name: prune
 description: Weight pruning — measure accuracy at different sparsity levels, find the knee point, produce a smaller/faster model.
-disable-model-invocation: true
 argument-hint: "<exp-id> [--sparsity 0.5,0.75,0.9] [--method magnitude|structured|lottery]"
 allowed-tools: Read, Bash(*), Grep, Glob
 ---
@@ -10,8 +9,8 @@ Remove redundant weights for faster inference and smaller models.
 ## Steps
-1. **Activate environment:** `source .venv/bin/activate`
-2. **Run:** `python scripts/model_pruning.py $ARGUMENTS`
+1. **Sync environment:** `uv sync`
+2. **Run:** `uv run python scripts/model_pruning.py $ARGUMENTS`
 3. **Methods:** magnitude (zero small weights), structured (remove neurons), lottery (iterative with rewind)
 4. **For tree models:** progressively reduces n_estimators
 5. **Report:** sparsity sweep table, knee point, recommended sparsity
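Magnitude pruning zeroes the smallest-magnitude weights until the requested sparsity is reached. The sketch below prunes a flat weight list. It then picks a "knee" as the highest sparsity whose metric stays within a tolerance of the dense baseline. That knee definition and the 0.01 tolerance are assumptions, and the package may locate the knee differently.

```python
def magnitude_prune(weights: list[float], sparsity: float) -> list[float]:
    """Unstructured pruning: zero the `sparsity` fraction of weights with the smallest |w|."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def find_knee(sweep: dict[float, float], max_drop: float = 0.01) -> float:
    """Highest sparsity whose metric is within `max_drop` of the dense (0.0) baseline."""
    baseline = sweep[0.0]
    acceptable = [s for s, metric in sweep.items() if baseline - metric <= max_drop]
    return max(acceptable)
```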

package/commands/quantize.md CHANGED

@@ -1,7 +1,6 @@
 ---
 name: quantize
 description: Post-training quantization — FP32→INT8/FP16, measure accuracy loss, 2-4x speedup with <0.5% accuracy loss.
-disable-model-invocation: true
 argument-hint: "<exp-id> [--precision int8|fp16|dynamic]"
 allowed-tools: Read, Bash(*), Grep, Glob
 ---
@@ -10,8 +9,8 @@ Quantize for production. Lowest-effort optimization: 2-4x speedup, 2-4x memory r
 ## Steps
-1. **Activate environment:** `source .venv/bin/activate`
-2. **Run:** `python scripts/model_quantization.py $ARGUMENTS`
+1. **Sync environment:** `uv sync`
+2. **Run:** `uv run python scripts/model_quantization.py $ARGUMENTS`
 3. **Precision levels:** FP32 (baseline), FP16 (GPU), INT8 dynamic (simplest), INT8 static (best accuracy)
 4. **Report:** precision comparison table, recommended level, QAT suggestion if needed
 5. **Saved output:** `experiments/quantization/<exp-id>-quantization.yaml`
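The INT8 levels rest on mapping floats to 8-bit integers through a scale factor. The sketch below shows symmetric per-tensor quantization, where `scale = max|x| / 127` and each value becomes `round(x / scale)`. It is a generic illustration rather than the method in `scripts/model_quantization.py`, and it omits the activation-range calibration that static INT8 requires.

```python
def quantize_int8(values: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor INT8: scale = max|x| / 127, q = round(x / scale)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Clamp defensively; with this scale, values already land in [-127, 127].
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    return [x * scale for x in q]


weights = [0.82, -0.41, 0.05, -0.003, 0.0]
q, scale = quantize_int8(weights)
max_err = max(abs(w - d) for w, d in zip(weights, dequantize(q, scale)))
print(q, max_err)
```

Storing one byte per value instead of four is where the roughly 4x memory reduction over FP32 comes from, and the round-trip error per value is bounded by about half the scale.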

package/commands/queue.md CHANGED

@@ -1,7 +1,6 @@
 ---
 name: queue
 description: Queue experiments for batch execution with priority ordering and dependency chains. Load the queue, walk away, read the summary.
-disable-model-invocation: true
 argument-hint: "<add|list|run|pause|clear> [description] [--priority high] [--after q-001]"
 allowed-tools: Read, Bash(*), Grep, Glob
 ---
@@ -10,9 +9,9 @@ Manage the experiment queue for unattended batch execution.
 ## Steps
-1. **Activate environment:**
+1. **Sync environment:**
    ```bash
-   source .venv/bin/activate
+   uv sync
    ```
 2. **Parse arguments from `$ARGUMENTS`:**
@@ -24,7 +23,7 @@ Manage the experiment queue for unattended batch execution.
 3. **Run queue manager:**
    ```bash
-   python scripts/experiment_queue.py $ARGUMENTS
+   uv run python scripts/experiment_queue.py $ARGUMENTS
    ```
 4. **Report results by action:**
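Priority ordering with dependency chains amounts to a topological sort in which, among items whose dependencies have finished, the highest priority runs first. The sketch below uses `heapq` and assumes each queue item is a dict with an `id`, an optional `priority` (`high`, `medium`, `low`), and an optional `after` list. These field names mirror the CLI flags but are otherwise hypothetical.

```python
import heapq

# Hypothetical ranking: lower rank runs first.
PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}


def execution_order(items: list[dict]) -> list[str]:
    """Return queue ids by priority, never starting an item before its `after` deps."""
    position = {it["id"]: i for i, it in enumerate(items)}
    priority = {it["id"]: PRIORITY_RANK[it.get("priority", "medium")] for it in items}
    waiting_on = {it["id"]: set(it.get("after", [])) for it in items}
    dependents: dict[str, list[str]] = {qid: [] for qid in position}
    for it in items:
        for dep in set(it.get("after", [])):
            if dep not in position:
                raise ValueError(f"{it['id']} depends on unknown item {dep}")
            dependents[dep].append(it["id"])

    # Heap entries: (priority rank, insertion order as tie-breaker, id).
    ready = [(priority[q], position[q], q) for q, deps in waiting_on.items() if not deps]
    heapq.heapify(ready)
    order = []
    while ready:
        _, _, qid = heapq.heappop(ready)
        order.append(qid)
        for child in dependents[qid]:
            waiting_on[child].discard(qid)
            if not waiting_on[child]:
                heapq.heappush(ready, (priority[child], position[child], child))
    if len(order) != len(items):
        raise ValueError("dependency cycle detected in queue")
    return order
```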

package/commands/registry.md CHANGED

@@ -1,7 +1,6 @@
 ---
 name: registry
 description: Model registry — track, promote, and govern the model lifecycle from candidate to production.
-disable-model-invocation: true
 argument-hint: "[list|register|promote|demote|archive|history] [exp-id] [stage]"
 allowed-tools: Read, Bash(*), Grep, Glob
 ---
@@ -9,8 +8,8 @@ allowed-tools: Read, Bash(*), Grep, Glob
 Track which model is production, staging, candidate, or archived. Promotion requires passing gates.
 ## Steps
-1. `source .venv/bin/activate`
-2. `python scripts/model_lifecycle.py $ARGUMENTS`
+1. `uv sync`
+2. `uv run python scripts/model_lifecycle.py $ARGUMENTS`
 3. **Registry:** `experiments/registry.yaml`
 ## Promotion gates
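The lifecycle described in the registry command can be modeled as a small state machine over the stages candidate, staging, and production, with archived as a terminal state. The sketch assumes a single production model at a time (the incumbent is archived on promotion) and reduces the promotion gates to a boolean. Both are assumptions, because the actual gates are not shown in this diff.

```python
STAGES = ("candidate", "staging", "production")


def promote(registry: dict[str, str], exp_id: str, gates_passed: bool) -> dict[str, str]:
    """Move `exp_id` one stage up; returns a new exp-id -> stage mapping."""
    stage = registry[exp_id]
    if stage not in STAGES[:-1]:
        raise ValueError(f"{exp_id} is {stage!r} and cannot be promoted")
    if not gates_passed:
        raise PermissionError(f"{exp_id} did not pass the promotion gates")
    target = STAGES[STAGES.index(stage) + 1]
    updated = dict(registry)
    if target == "production":
        # Assumption: only one production model; the incumbent is archived.
        for other, s in registry.items():
            if s == "production":
                updated[other] = "archived"
    updated[exp_id] = target
    return updated
```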

package/commands/regress.md CHANGED

@@ -1,7 +1,6 @@
 ---
 name: regress
 description: Performance regression gate — re-run best experiment after code/dependency changes and verify metrics haven't degraded.
-disable-model-invocation: true
 argument-hint: "[--tolerance 0.01] [--against exp-id] [--quick]"
 allowed-tools: Read, Bash(*), Grep, Glob
 ---
@@ -10,9 +9,9 @@ CI for your model. After any change to code, dependencies, or data, verify metri
 ## Steps
-1. **Activate environment:**
+1. **Sync environment:**
    ```bash
-   source .venv/bin/activate
+   uv sync
    ```
 2. **Parse arguments from `$ARGUMENTS`:**
@@ -24,7 +23,7 @@ CI for your model. After any change to code, dependencies, or data, verify metri
 3. **Run regression gate:**
    ```bash
-   python scripts/regression_gate.py $ARGUMENTS
+   uv run python scripts/regression_gate.py $ARGUMENTS
    ```
 4. **Report results:**