claude-turing 4.8.0 → 4.8.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (166)
  1. package/.claude-plugin/plugin.json +1 -1
  2. package/README.md +1 -1
  3. package/agents/ml-evaluator.md +4 -4
  4. package/agents/ml-researcher.md +2 -2
  5. package/bin/turing-init.sh +2 -2
  6. package/commands/ablate.md +3 -3
  7. package/commands/annotate.md +2 -2
  8. package/commands/archive.md +2 -2
  9. package/commands/audit.md +3 -3
  10. package/commands/baseline.md +3 -3
  11. package/commands/brief.md +5 -5
  12. package/commands/budget.md +3 -3
  13. package/commands/calibrate.md +3 -3
  14. package/commands/card.md +3 -3
  15. package/commands/changelog.md +2 -2
  16. package/commands/checkpoint.md +3 -3
  17. package/commands/cite.md +2 -2
  18. package/commands/compare.md +1 -1
  19. package/commands/counterfactual.md +2 -2
  20. package/commands/curriculum.md +3 -3
  21. package/commands/design.md +3 -3
  22. package/commands/diagnose.md +4 -4
  23. package/commands/diff.md +3 -3
  24. package/commands/distill.md +3 -3
  25. package/commands/doctor.md +2 -2
  26. package/commands/ensemble.md +3 -3
  27. package/commands/explore.md +4 -4
  28. package/commands/export.md +3 -3
  29. package/commands/feature.md +3 -3
  30. package/commands/flashback.md +2 -2
  31. package/commands/fork.md +3 -3
  32. package/commands/frontier.md +3 -3
  33. package/commands/init.md +5 -5
  34. package/commands/leak.md +3 -3
  35. package/commands/lit.md +3 -3
  36. package/commands/logbook.md +5 -5
  37. package/commands/merge.md +2 -2
  38. package/commands/mode.md +1 -1
  39. package/commands/onboard.md +2 -2
  40. package/commands/paper.md +3 -3
  41. package/commands/plan.md +2 -2
  42. package/commands/poster.md +3 -3
  43. package/commands/postmortem.md +2 -2
  44. package/commands/preflight.md +5 -5
  45. package/commands/present.md +2 -2
  46. package/commands/profile.md +3 -3
  47. package/commands/prune.md +2 -2
  48. package/commands/quantize.md +2 -2
  49. package/commands/queue.md +3 -3
  50. package/commands/registry.md +2 -2
  51. package/commands/regress.md +3 -3
  52. package/commands/replay.md +2 -2
  53. package/commands/report.md +3 -3
  54. package/commands/reproduce.md +3 -3
  55. package/commands/retry.md +3 -3
  56. package/commands/review.md +2 -2
  57. package/commands/rules/loop-protocol.md +11 -11
  58. package/commands/sanity.md +3 -3
  59. package/commands/scale.md +4 -4
  60. package/commands/search.md +2 -2
  61. package/commands/seed.md +3 -3
  62. package/commands/sensitivity.md +3 -3
  63. package/commands/share.md +2 -2
  64. package/commands/simulate.md +2 -2
  65. package/commands/status.md +1 -1
  66. package/commands/stitch.md +3 -3
  67. package/commands/suggest.md +5 -5
  68. package/commands/surgery.md +2 -2
  69. package/commands/sweep.md +8 -8
  70. package/commands/template.md +2 -2
  71. package/commands/train.md +5 -5
  72. package/commands/transfer.md +3 -3
  73. package/commands/trend.md +2 -2
  74. package/commands/try.md +4 -4
  75. package/commands/update.md +2 -2
  76. package/commands/validate.md +4 -4
  77. package/commands/warm.md +3 -3
  78. package/commands/watch.md +4 -4
  79. package/commands/whatif.md +2 -2
  80. package/commands/xray.md +3 -3
  81. package/config/commands.yaml +1 -1
  82. package/package.json +1 -1
  83. package/skills/turing/ablate/SKILL.md +3 -3
  84. package/skills/turing/annotate/SKILL.md +2 -2
  85. package/skills/turing/archive/SKILL.md +2 -2
  86. package/skills/turing/audit/SKILL.md +3 -3
  87. package/skills/turing/baseline/SKILL.md +3 -3
  88. package/skills/turing/brief/SKILL.md +5 -5
  89. package/skills/turing/budget/SKILL.md +3 -3
  90. package/skills/turing/calibrate/SKILL.md +3 -3
  91. package/skills/turing/card/SKILL.md +3 -3
  92. package/skills/turing/changelog/SKILL.md +2 -2
  93. package/skills/turing/checkpoint/SKILL.md +3 -3
  94. package/skills/turing/cite/SKILL.md +2 -2
  95. package/skills/turing/compare/SKILL.md +1 -1
  96. package/skills/turing/counterfactual/SKILL.md +2 -2
  97. package/skills/turing/curriculum/SKILL.md +3 -3
  98. package/skills/turing/design/SKILL.md +3 -3
  99. package/skills/turing/diagnose/SKILL.md +4 -4
  100. package/skills/turing/diff/SKILL.md +3 -3
  101. package/skills/turing/distill/SKILL.md +3 -3
  102. package/skills/turing/doctor/SKILL.md +2 -2
  103. package/skills/turing/ensemble/SKILL.md +3 -3
  104. package/skills/turing/explore/SKILL.md +4 -4
  105. package/skills/turing/export/SKILL.md +3 -3
  106. package/skills/turing/feature/SKILL.md +3 -3
  107. package/skills/turing/flashback/SKILL.md +2 -2
  108. package/skills/turing/fork/SKILL.md +3 -3
  109. package/skills/turing/frontier/SKILL.md +3 -3
  110. package/skills/turing/init/SKILL.md +5 -5
  111. package/skills/turing/leak/SKILL.md +3 -3
  112. package/skills/turing/lit/SKILL.md +3 -3
  113. package/skills/turing/logbook/SKILL.md +5 -5
  114. package/skills/turing/merge/SKILL.md +2 -2
  115. package/skills/turing/mode/SKILL.md +1 -1
  116. package/skills/turing/onboard/SKILL.md +2 -2
  117. package/skills/turing/paper/SKILL.md +3 -3
  118. package/skills/turing/plan/SKILL.md +2 -2
  119. package/skills/turing/poster/SKILL.md +3 -3
  120. package/skills/turing/postmortem/SKILL.md +2 -2
  121. package/skills/turing/preflight/SKILL.md +5 -5
  122. package/skills/turing/present/SKILL.md +2 -2
  123. package/skills/turing/profile/SKILL.md +3 -3
  124. package/skills/turing/prune/SKILL.md +2 -2
  125. package/skills/turing/quantize/SKILL.md +2 -2
  126. package/skills/turing/queue/SKILL.md +3 -3
  127. package/skills/turing/registry/SKILL.md +2 -2
  128. package/skills/turing/regress/SKILL.md +3 -3
  129. package/skills/turing/replay/SKILL.md +2 -2
  130. package/skills/turing/report/SKILL.md +3 -3
  131. package/skills/turing/reproduce/SKILL.md +3 -3
  132. package/skills/turing/retry/SKILL.md +3 -3
  133. package/skills/turing/review/SKILL.md +2 -2
  134. package/skills/turing/rules/loop-protocol.md +11 -11
  135. package/skills/turing/sanity/SKILL.md +3 -3
  136. package/skills/turing/scale/SKILL.md +4 -4
  137. package/skills/turing/search/SKILL.md +2 -2
  138. package/skills/turing/seed/SKILL.md +3 -3
  139. package/skills/turing/sensitivity/SKILL.md +3 -3
  140. package/skills/turing/share/SKILL.md +2 -2
  141. package/skills/turing/simulate/SKILL.md +2 -2
  142. package/skills/turing/status/SKILL.md +1 -1
  143. package/skills/turing/stitch/SKILL.md +3 -3
  144. package/skills/turing/suggest/SKILL.md +5 -5
  145. package/skills/turing/surgery/SKILL.md +2 -2
  146. package/skills/turing/sweep/SKILL.md +8 -8
  147. package/skills/turing/template/SKILL.md +2 -2
  148. package/skills/turing/train/SKILL.md +5 -5
  149. package/skills/turing/transfer/SKILL.md +3 -3
  150. package/skills/turing/trend/SKILL.md +2 -2
  151. package/skills/turing/try/SKILL.md +4 -4
  152. package/skills/turing/update/SKILL.md +2 -2
  153. package/skills/turing/validate/SKILL.md +4 -4
  154. package/skills/turing/warm/SKILL.md +3 -3
  155. package/skills/turing/watch/SKILL.md +4 -4
  156. package/skills/turing/whatif/SKILL.md +2 -2
  157. package/skills/turing/xray/SKILL.md +3 -3
  158. package/templates/README.md +5 -8
  159. package/templates/program.md +18 -18
  160. package/templates/pyproject.toml +10 -0
  161. package/templates/requirements.txt +4 -1
  162. package/templates/scripts/generate_onboarding.py +1 -1
  163. package/templates/scripts/post-train-hook.sh +7 -8
  164. package/templates/scripts/scaffold.py +24 -26
  165. package/templates/scripts/stop-hook.sh +2 -3
  166. package/templates/scripts/turing-run-python.sh +9 -0
package/commands/init.md CHANGED
@@ -1,6 +1,6 @@
 ---
 name: init
-description: Initialize a new ML project with the Turing autoresearch harness. Scaffolds the full experiment infrastructure — immutable evaluation pipeline, agent-editable training code, structured logging, convergence detection hooks, and a Python virtual environment. Use --plan to generate a research plan.
+description: Initialize a new ML project with the Turing autoresearch harness. Scaffolds the full experiment infrastructure — immutable evaluation pipeline, agent-editable training code, structured logging, convergence detection hooks, and a uv-managed Python environment. Use --plan to generate a research plan.
 argument-hint: "[project_name] [--plan]"
 allowed-tools: Read, Write, Edit, Bash(*), Grep, Glob, WebSearch, WebFetch
 ---
@@ -23,7 +23,7 @@ Ask the user for the following (or accept from `$ARGUMENTS` if provided as JSON)
 Once you have all 6 values, delegate to the unified scaffolding script:
 
 ```bash
-python3 <templates_dir>/scripts/scaffold.py \
+uv run python <templates_dir>/scripts/scaffold.py \
   --project-name "<project_name>" \
   --target-metric "<target_metric>" \
   --metric-direction "<metric_direction>" \
@@ -38,7 +38,7 @@ The scaffold script handles everything in a single atomic operation:
 - Creates data/, experiments/, models/ directories
 - Sets up agent memory at `.claude/agent-memory/ml-researcher-{project_name}/MEMORY.md`
 - Configures Claude Code hooks in `.claude/settings.local.json`
-- Creates Python virtual environment and installs requirements
+- Runs `uv sync` from the ML directory when uv is available
 - Verifies all placeholders were replaced (fails loudly if any remain)
 
 ## Locating Templates
@@ -57,7 +57,7 @@ node_modules/claude-turing/templates/
 Example command:
 
 ```bash
-python3 ~/.claude/commands/turing/templates/scripts/scaffold.py \
+uv run python ~/.claude/commands/turing/templates/scripts/scaffold.py \
   --project-name "<project_name>" \
   --target-metric "<target_metric>" \
   --metric-direction "<metric_direction>" \
@@ -71,7 +71,7 @@ python3 ~/.claude/commands/turing/templates/scripts/scaffold.py \
 
 Report what was created:
 - The separation: READ-ONLY (`prepare.py`, `evaluate.py`) vs AGENT-EDITABLE (`train.py`)
-- Next steps: add data to the configured data source path, run `python prepare.py`, then `/turing:train`
+- Next steps: add data to the configured data source path, run `uv run python prepare.py`, then `/turing:train`
 - The taste-leverage loop: `/turing:try` to inject hypotheses, `/turing:brief` for intelligence reports
 
 ## Research Plan Generation (--plan flag)
package/commands/leak.md CHANGED
@@ -9,9 +9,9 @@ Actively probe for data leakage. The #1 cause of "too good to be true" results.
 
 ## Steps
 
-1. **Activate environment:**
+1. **Sync environment:**
    ```bash
-   source .venv/bin/activate
+   uv sync
    ```
 
 2. **Parse arguments from `$ARGUMENTS`:**
@@ -21,7 +21,7 @@ Actively probe for data leakage. The #1 cause of "too good to be true" results.
 
 3. **Run leakage scan:**
    ```bash
-   python scripts/leakage_detector.py $ARGUMENTS
+   uv run python scripts/leakage_detector.py $ARGUMENTS
    ```
 
 4. **Checks performed:**
package/commands/lit.md CHANGED
@@ -9,9 +9,9 @@ Search the literature for papers, baselines, and related work.
 
 ## Steps
 
-1. **Activate environment:**
+1. **Sync environment:**
    ```bash
-   source .venv/bin/activate
+   uv sync
    ```
 
 2. **Parse arguments from `$ARGUMENTS`:**
@@ -23,7 +23,7 @@ Search the literature for papers, baselines, and related work.
 
 3. **Run literature search:**
    ```bash
-   python scripts/literature_search.py $ARGUMENTS
+   uv run python scripts/literature_search.py $ARGUMENTS
    ```
 
 4. **Report results:**
package/commands/logbook.md CHANGED
@@ -2,7 +2,7 @@
 name: logbook
 description: Generate a research logbook showing the full experiment narrative — hypotheses proposed, experiments run, decisions made, and progress over time. Outputs HTML (with interactive chart) or markdown.
 argument-hint: "[--since YYYY-MM-DD] [--format html|markdown] [--output path]"
-allowed-tools: Read, Bash(python scripts/*:*, source .venv/bin/activate:*, mkdir:*), Grep, Glob
+allowed-tools: Read, Bash(uv run python scripts/*:*, uv sync:*, mkdir:*), Grep, Glob
 ---
 
 Generate a research logbook that captures the full narrative of the experiment campaign.
@@ -11,7 +11,7 @@ Generate a research logbook that captures the full narrative of the experiment c
 
 1. **Generate the logbook:**
    ```bash
-   source .venv/bin/activate && python scripts/generate_logbook.py
+   uv run python scripts/generate_logbook.py
    ```
 
 **With options from `$ARGUMENTS`:**
@@ -22,13 +22,13 @@ Generate a research logbook that captures the full narrative of the experiment c
 **Common usage:**
 ```bash
 # HTML logbook with interactive trajectory chart
-source .venv/bin/activate && python scripts/generate_logbook.py --output logbook.html
+uv run python scripts/generate_logbook.py --output logbook.html
 
 # Markdown for embedding in docs or READMEs
-source .venv/bin/activate && python scripts/generate_logbook.py --format markdown --output logbook.md
+uv run python scripts/generate_logbook.py --format markdown --output logbook.md
 
 # Last week's activity
-source .venv/bin/activate && python scripts/generate_logbook.py --since 2026-03-24 --output logbook.html
+uv run python scripts/generate_logbook.py --since 2026-03-24 --output logbook.html
 ```
 
 2. **Present the result:**
package/commands/merge.md CHANGED
@@ -9,8 +9,8 @@ Combine model weights (not predictions) into a single, better model with no late
 
 ## Steps
 
-1. **Activate environment:** `source .venv/bin/activate`
-2. **Run:** `python scripts/model_merger.py $ARGUMENTS`
+1. **Sync environment:** `uv sync`
+2. **Run:** `uv run python scripts/model_merger.py $ARGUMENTS`
 3. **Methods:** uniform soup (simple average), greedy soup (include only if improves), TIES (trim+elect+merge), DARE (drop+rescale)
 4. **Report:** compatibility check, per-model metrics, method comparison, improvement delta
 5. **Saved output:** `experiments/merges/merge-*.yaml`
package/commands/mode.md CHANGED
@@ -20,7 +20,7 @@ Set the research mode for the current project. The mode determines how the novel
 
 2. **Update experiment state:**
    ```bash
-   source .venv/bin/activate
+   uv sync
    python -c "
    import yaml
    from pathlib import Path
package/commands/onboard.md CHANGED
@@ -8,8 +8,8 @@ allowed-tools: Read, Bash(*), Grep, Glob
 5-minute read that replaces a 1-hour onboarding meeting.
 
 ## Steps
-1. `source .venv/bin/activate`
-2. `python scripts/generate_onboarding.py $ARGUMENTS`
+1. `uv sync`
+2. `uv run python scripts/generate_onboarding.py $ARGUMENTS`
 3. **Saved:** `ONBOARDING.md`
 
 ## Examples
package/commands/paper.md CHANGED
@@ -9,9 +9,9 @@ Draft paper sections directly from experiment data.
 
 ## Steps
 
-1. **Activate environment:**
+1. **Sync environment:**
    ```bash
-   source .venv/bin/activate
+   uv sync
    ```
 
 2. **Parse arguments from `$ARGUMENTS`:**
@@ -20,7 +20,7 @@ Draft paper sections directly from experiment data.
 
 3. **Run paper drafting:**
    ```bash
-   python scripts/draft_paper_sections.py $ARGUMENTS
+   uv run python scripts/draft_paper_sections.py $ARGUMENTS
    ```
 
 4. **Report results:**
package/commands/plan.md CHANGED
@@ -8,8 +8,8 @@ allowed-tools: Read, Bash(*), Grep, Glob
 Design the next N experiments strategically, not randomly. Allocates budget by expected ROI.
 
 ## Steps
-1. `source .venv/bin/activate`
-2. `python scripts/research_planner.py $ARGUMENTS`
+1. `uv sync`
+2. `uv run python scripts/research_planner.py $ARGUMENTS`
 3. **Saved:** `experiments/plans/`
 
 ## How it works
package/commands/poster.md CHANGED
@@ -2,7 +2,7 @@
 name: poster
 description: Generate a single-page HTML research poster summarizing the experiment campaign — best result, trajectory, key findings, and methodology. Adapted from posterskill's self-contained HTML architecture.
 argument-hint: "[title override]"
-allowed-tools: Read, Write, Edit, Bash(python scripts/*:*, source .venv/bin/activate:*, mkdir:*, open:*), Grep, Glob
+allowed-tools: Read, Write, Edit, Bash(uv run python scripts/*:*, uv sync:*, mkdir:*, open:*), Grep, Glob
 ---
 
 Generate a research poster summarizing the experiment campaign as a single self-contained HTML file. Adapted from [posterskill](https://github.com/ethanweber/posterskill)'s architecture — no build step, works when opened as `file://`.
@@ -15,8 +15,8 @@ Read the experiment history and project context:
 
 ```bash
 cat config.yaml
-source .venv/bin/activate && python scripts/generate_brief.py
-source .venv/bin/activate && python scripts/show_metrics.py --last 20
+uv run python scripts/generate_brief.py
+uv run python scripts/show_metrics.py --last 20
 cat experiment_state.yaml 2>/dev/null || true
 cat RESEARCH_PLAN.md 2>/dev/null || true
 ```
package/commands/postmortem.md CHANGED
@@ -8,8 +8,8 @@ allowed-tools: Read, Bash(*), Grep, Glob
 When experiments stop improving, find out why. Diagnoses search space exhaustion, config errors, data issues, metric ceilings, and noise floors.
 
 ## Steps
-1. `source .venv/bin/activate`
-2. `python scripts/failure_postmortem.py $ARGUMENTS`
+1. `uv sync`
+2. `uv run python scripts/failure_postmortem.py $ARGUMENTS`
 3. **Saved:** `experiments/postmortems/`
 
 ## Diagnosis categories
package/commands/preflight.md CHANGED
@@ -2,28 +2,28 @@
 name: preflight
 description: Pre-flight resource check — estimates VRAM, RAM, and disk requirements before running ML training. Compares against available system resources and issues PASS/WARN/FAIL verdict. Use before training to catch OOM errors before they happen.
 argument-hint: "[--model-type torch] [--params 10M] [--batch-size 32]"
-allowed-tools: Read, Bash(python scripts/*:*, source .venv/bin/activate:*, nvidia-smi:*), Grep, Glob
+allowed-tools: Read, Bash(uv run python scripts/*:*, uv sync:*, nvidia-smi:*), Grep, Glob
 ---
 
 Check whether the current system has enough resources to run the planned experiment.
 
 ## Steps
 
-1. **Activate environment:**
+1. **Sync environment:**
    ```bash
-   source .venv/bin/activate
+   uv sync
    ```
 
 2. **Run preflight check:**
 
 If `$ARGUMENTS` is empty (auto-detect from config.yaml):
 ```bash
-python scripts/preflight.py
+uv run python scripts/preflight.py
 ```
 
 If `$ARGUMENTS` contains flags:
 ```bash
-python scripts/preflight.py $ARGUMENTS
+uv run python scripts/preflight.py $ARGUMENTS
 ```
 
 3. **Interpret the verdict:**
package/commands/present.md CHANGED
@@ -8,8 +8,8 @@ allowed-tools: Read, Bash(*), Grep, Glob
 Generate presentation-ready figure specifications from experiment data in seconds.
 
 ## Steps
-1. **Activate environment:** `source .venv/bin/activate`
-2. **Run:** `python scripts/generate_figures.py $ARGUMENTS`
+1. **Sync environment:** `uv sync`
+2. **Run:** `uv run python scripts/generate_figures.py $ARGUMENTS`
 3. **Figure types:** training, comparison, ablation, pareto, sensitivity
 4. **Styles:** light (papers), dark (demos), poster (large fonts)
 5. **Saved output:** `paper/figures/`
package/commands/profile.md CHANGED
@@ -9,9 +9,9 @@ Profile a training run to identify performance bottlenecks.
 
 ## Steps
 
-1. **Activate environment:**
+1. **Sync environment:**
    ```bash
-   source .venv/bin/activate
+   uv sync
    ```
 
 2. **Parse arguments from `$ARGUMENTS`:**
@@ -20,7 +20,7 @@ Profile a training run to identify performance bottlenecks.
 
 3. **Run profiling:**
    ```bash
-   python scripts/profile_training.py $ARGUMENTS
+   uv run python scripts/profile_training.py $ARGUMENTS
    ```
 
 4. **Report results:**
package/commands/prune.md CHANGED
@@ -9,8 +9,8 @@ Remove redundant weights for faster inference and smaller models.
 
 ## Steps
 
-1. **Activate environment:** `source .venv/bin/activate`
-2. **Run:** `python scripts/model_pruning.py $ARGUMENTS`
+1. **Sync environment:** `uv sync`
+2. **Run:** `uv run python scripts/model_pruning.py $ARGUMENTS`
 3. **Methods:** magnitude (zero small weights), structured (remove neurons), lottery (iterative with rewind)
 4. **For tree models:** progressively reduces n_estimators
 5. **Report:** sparsity sweep table, knee point, recommended sparsity
package/commands/quantize.md CHANGED
@@ -9,8 +9,8 @@ Quantize for production. Lowest-effort optimization: 2-4x speedup, 2-4x memory r
 
 ## Steps
 
-1. **Activate environment:** `source .venv/bin/activate`
-2. **Run:** `python scripts/model_quantization.py $ARGUMENTS`
+1. **Sync environment:** `uv sync`
+2. **Run:** `uv run python scripts/model_quantization.py $ARGUMENTS`
 3. **Precision levels:** FP32 (baseline), FP16 (GPU), INT8 dynamic (simplest), INT8 static (best accuracy)
 4. **Report:** precision comparison table, recommended level, QAT suggestion if needed
 5. **Saved output:** `experiments/quantization/<exp-id>-quantization.yaml`
package/commands/queue.md CHANGED
@@ -9,9 +9,9 @@ Manage the experiment queue for unattended batch execution.
 
 ## Steps
 
-1. **Activate environment:**
+1. **Sync environment:**
    ```bash
-   source .venv/bin/activate
+   uv sync
    ```
 
 2. **Parse arguments from `$ARGUMENTS`:**
@@ -23,7 +23,7 @@ Manage the experiment queue for unattended batch execution.
 
 3. **Run queue manager:**
    ```bash
-   python scripts/experiment_queue.py $ARGUMENTS
+   uv run python scripts/experiment_queue.py $ARGUMENTS
    ```
 
 4. **Report results by action:**
package/commands/registry.md CHANGED
@@ -8,8 +8,8 @@ allowed-tools: Read, Bash(*), Grep, Glob
 Track which model is production, staging, candidate, or archived. Promotion requires passing gates.
 
 ## Steps
-1. `source .venv/bin/activate`
-2. `python scripts/model_lifecycle.py $ARGUMENTS`
+1. `uv sync`
+2. `uv run python scripts/model_lifecycle.py $ARGUMENTS`
 3. **Registry:** `experiments/registry.yaml`
 
 ## Promotion gates
package/commands/regress.md CHANGED
@@ -9,9 +9,9 @@ CI for your model. After any change to code, dependencies, or data, verify metri
 
 ## Steps
 
-1. **Activate environment:**
+1. **Sync environment:**
    ```bash
-   source .venv/bin/activate
+   uv sync
    ```
 
 2. **Parse arguments from `$ARGUMENTS`:**
@@ -23,7 +23,7 @@ CI for your model. After any change to code, dependencies, or data, verify metri
 
 3. **Run regression gate:**
    ```bash
-   python scripts/regression_gate.py $ARGUMENTS
+   uv run python scripts/regression_gate.py $ARGUMENTS
    ```
 
 4. **Report results:**
package/commands/replay.md CHANGED
@@ -8,8 +8,8 @@ allowed-tools: Read, Bash(*), Grep, Glob
 Should you revisit old ideas? Infrastructure changes may make failed approaches work now.
 
 ## Steps
-1. **Activate environment:** `source .venv/bin/activate`
-2. **Run:** `python scripts/experiment_replay.py $ARGUMENTS`
+1. **Sync environment:** `uv sync`
+2. **Run:** `uv run python scripts/experiment_replay.py $ARGUMENTS`
 3. **Modes:** default (current code+data), --with-current-data, --with-current-preprocessing
 4. **Report:** original vs replayed metrics, delta, verdict
 5. **Saved output:** `experiments/replays/`
package/commands/report.md CHANGED
@@ -2,7 +2,7 @@
 name: report
 description: Generate a markdown research report from experiment history — structured for sharing, archiving, or including in documentation. More detailed than a brief, less visual than a poster.
 argument-hint: "[--since YYYY-MM-DD] [--output path]"
-allowed-tools: Read, Bash(python scripts/*:*, source .venv/bin/activate:*, mkdir:*), Grep, Glob
+allowed-tools: Read, Bash(uv run python scripts/*:*, uv sync:*, mkdir:*), Grep, Glob
 ---
 
 Generate a structured markdown research report summarizing the experiment campaign.
@@ -14,12 +14,12 @@ Generate a structured markdown research report summarizing the experiment campai
 
 Use the logbook generator in markdown mode as the data backbone:
 
 ```bash
-source .venv/bin/activate && python scripts/generate_logbook.py --format markdown
+uv run python scripts/generate_logbook.py --format markdown
 ```
 
 Also gather supplementary data:
 ```bash
-source .venv/bin/activate && python scripts/generate_brief.py
+uv run python scripts/generate_brief.py
 cat experiment_state.yaml 2>/dev/null || true
 cat RESEARCH_PLAN.md 2>/dev/null || true
 ```
package/commands/reproduce.md CHANGED
@@ -9,9 +9,9 @@ Verify that a logged experiment can be reproduced with consistent results.
 
 ## Steps
 
-1. **Activate environment:**
+1. **Sync environment:**
    ```bash
-   source .venv/bin/activate
+   uv sync
    ```
 
 2. **Parse arguments from `$ARGUMENTS`:**
@@ -22,7 +22,7 @@ Verify that a logged experiment can be reproduced with consistent results.
 
 3. **Run reproducibility verification:**
    ```bash
-   python scripts/reproduce_experiment.py $ARGUMENTS
+   uv run python scripts/reproduce_experiment.py $ARGUMENTS
    ```
 
 4. **Report results:**
package/commands/retry.md CHANGED
@@ -9,9 +9,9 @@ Auto-diagnose and recover from experiment failures.
 
 ## Steps
 
-1. **Activate environment:**
+1. **Sync environment:**
    ```bash
-   source .venv/bin/activate
+   uv sync
    ```
 
 2. **Parse arguments from `$ARGUMENTS`:**
@@ -21,7 +21,7 @@ Auto-diagnose and recover from experiment failures.
 
 3. **Run smart retry:**
    ```bash
-   python scripts/smart_retry.py $ARGUMENTS
+   uv run python scripts/smart_retry.py $ARGUMENTS
    ```
 
 4. **Report results:**
package/commands/review.md CHANGED
@@ -8,8 +8,8 @@ allowed-tools: Read, Bash(*), Grep, Glob
 Simulate a conference reviewer before you submit. Each weakness links to the command that fixes it.
 
 ## Steps
-1. `source .venv/bin/activate`
-2. `python scripts/simulate_review.py $ARGUMENTS`
+1. `uv sync`
+2. `uv run python scripts/simulate_review.py $ARGUMENTS`
 3. **Saved:** `experiments/reviews/`
 
 ## Examples
package/commands/rules/loop-protocol.md CHANGED
@@ -16,9 +16,9 @@ The autoresearch harness enforces a strict separation between the **hypothesis s
 
 ## Execution Rules
 
-- **ALWAYS redirect training output:** `python train.py > run.log 2>&1`
+- **ALWAYS redirect training output:** `uv run python train.py > run.log 2>&1`
 - **ALWAYS parse metrics with grep** between `---` delimiters: `grep -A 10 "^---" run.log | head -10`
-- **ALWAYS activate the venv first:** `source .venv/bin/activate`
+- **ALWAYS run Python through uv:** `uv run python ...`
 - **NEVER install new packages** without human approval
 
 ## Git Discipline
@@ -40,16 +40,16 @@ The autoresearch harness enforces a strict separation between the **hypothesis s
 
 ## Sweep Workflow
 
-1. Generate queue: `python scripts/sweep.py`
-2. Check status: `python scripts/sweep.py --status`
-3. Get next: `python scripts/sweep.py --next`
+1. Generate queue: `uv run python scripts/sweep.py`
+2. Check status: `uv run python scripts/sweep.py --status`
+3. Get next: `uv run python scripts/sweep.py --next`
 4. Apply overrides, create branch, run training
-5. Mark: `python scripts/sweep.py --mark <name> complete|failed`
+5. Mark: `uv run python scripts/sweep.py --mark <name> complete|failed`
 6. Repeat until queue is empty
 
 ## Logging Rules
 
-- **Log every experiment** to `experiments/log.jsonl` via `python scripts/log_experiment.py` — kept and discarded alike.
+- **Log every experiment** to `experiments/log.jsonl` via `uv run python scripts/log_experiment.py` — kept and discarded alike.
 - **Include all metrics, config, and description** of the hypothesis and its outcome.
 
 ## Convergence Rules
@@ -64,11 +64,11 @@ The researcher agent's Bash access is restricted to a whitelist of necessary com
 
 | Allowed Pattern | Purpose |
 |-----------------|---------|
-| `python train.py:*` | Execute training |
-| `python scripts/*:*` | Run utility scripts (logging, metrics, sweep) |
+| `uv run python train.py:*` | Execute training |
+| `uv run python scripts/*:*` | Run utility scripts (logging, metrics, sweep) |
 | `git:*` | Branch, commit, merge, reset operations |
-| `source .venv/bin/activate:*` | Virtual environment activation |
-| `pip:*` | Package installation (requires human approval) |
+| `uv sync:*` | Virtual environment activation |
+| `uv add:*` | Package installation (requires human approval) |
 
 **Blocked by omission:** `cat`, `head`, `tail`, `less` (prevents reading hidden files via shell), `curl`, `wget` (prevents data exfiltration), arbitrary command execution.
 
package/commands/sanity.md CHANGED
@@ -9,9 +9,9 @@ Run a battery of fast checks before committing to a full training run. Catches w
 
 ## Steps
 
-1. **Activate environment:**
+1. **Sync environment:**
    ```bash
-   source .venv/bin/activate
+   uv sync
    ```
 
 2. **Parse arguments from `$ARGUMENTS`:**
@@ -21,7 +21,7 @@ Run a battery of fast checks before committing to a full training run. Catches w
 
 3. **Run sanity checks:**
    ```bash
-   python scripts/sanity_checks.py $ARGUMENTS
+   uv run python scripts/sanity_checks.py $ARGUMENTS
    ```
 
 4. **Checks performed:**
package/commands/scale.md CHANGED
@@ -9,9 +9,9 @@ Predict full-scale performance from a handful of small experiments. Answers "is
 
 ## Steps
 
-1. **Activate environment:**
+1. **Sync environment:**
    ```bash
-   source .venv/bin/activate
+   uv sync
    ```
 
 2. **Parse arguments from `$ARGUMENTS`:**
@@ -24,11 +24,11 @@ Predict full-scale performance from a handful of small experiments. Answers "is
 3. **Plan or analyze:**
    - **Plan mode (default):** generates scale point configs to run
      ```bash
-     python scripts/scaling_estimator.py --axis data --points 4
+     uv run python scripts/scaling_estimator.py --axis data --points 4
      ```
    - **Analyze mode:** fits power law to completed results
      ```bash
-     python scripts/scaling_estimator.py --analyze experiments/scaling/results.yaml
+     uv run python scripts/scaling_estimator.py --analyze experiments/scaling/results.yaml
      ```
 
 4. **Scaling axes:**
package/commands/search.md CHANGED
@@ -8,8 +8,8 @@ allowed-tools: Read, Bash(*), Grep, Glob
 Find specific experiments in a large history with natural language and structured filters.
 
 ## Steps
-1. **Activate environment:** `source .venv/bin/activate`
-2. **Run:** `python scripts/experiment_search.py $ARGUMENTS`
+1. **Sync environment:** `uv sync`
+2. **Run:** `uv run python scripts/experiment_search.py $ARGUMENTS`
 3. **Filters:** `accuracy>0.85`, `status:kept`, `family:baseline`, `date:last-week`
 4. **Report:** ranked table of matching experiments
 
package/commands/seed.md CHANGED
@@ -9,9 +9,9 @@ Run a multi-seed study to verify that experiment results are robust across rando
 
 ## Steps
 
-1. **Activate environment:**
+1. **Sync environment:**
    ```bash
-   source .venv/bin/activate
+   uv sync
    ```
 
 2. **Parse arguments from `$ARGUMENTS`:**
@@ -22,7 +22,7 @@ Run a multi-seed study to verify that experiment results are robust across rando
 
 3. **Run seed study:**
    ```bash
-   python scripts/seed_runner.py $ARGUMENTS
+   uv run python scripts/seed_runner.py $ARGUMENTS
    ```
 
 4. **Report results:**
package/commands/sensitivity.md CHANGED
@@ -9,9 +9,9 @@ Which hyperparameters actually matter? Stop wasting time on the ones that don't.
 
 ## Steps
 
-1. **Activate environment:**
+1. **Sync environment:**
    ```bash
-   source .venv/bin/activate
+   uv sync
    ```
 
 2. **Parse arguments from `$ARGUMENTS`:**
@@ -21,7 +21,7 @@ Which hyperparameters actually matter? Stop wasting time on the ones that don't.
 
 3. **Run sensitivity analysis:**
    ```bash
-   python scripts/sensitivity_analysis.py $ARGUMENTS
+   uv run python scripts/sensitivity_analysis.py $ARGUMENTS
    ```
 
 4. **Report includes:**
package/commands/share.md CHANGED
@@ -8,8 +8,8 @@ allowed-tools: Read, Bash(*), Grep, Glob
 Package experiments for collaborator handoff or paper supplementary material.
 
 ## Steps
-1. `source .venv/bin/activate`
-2. `python scripts/package_experiments.py $ARGUMENTS`
+1. `uv sync`
+2. `uv run python scripts/package_experiments.py $ARGUMENTS`
 3. **Saved:** `exports/packages/<name>/`
 
 ## Examples
package/commands/simulate.md CHANGED
@@ -8,8 +8,8 @@ allowed-tools: Read, Bash(*), Grep, Glob
 Predict outcomes before spending compute. Ranks proposed configs and recommends which to run vs skip.
 
 ## Steps
-1. `source .venv/bin/activate`
-2. `python scripts/experiment_simulator.py $ARGUMENTS`
+1. `uv sync`
+2. `uv run python scripts/experiment_simulator.py $ARGUMENTS`
 3. **Saved:** `experiments/simulations/`
 
 ## How it works