claude-turing 4.8.0 → 4.8.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (166)
  1. package/.claude-plugin/plugin.json +1 -1
  2. package/README.md +1 -1
  3. package/agents/ml-evaluator.md +4 -4
  4. package/agents/ml-researcher.md +2 -2
  5. package/bin/turing-init.sh +2 -2
  6. package/commands/ablate.md +3 -3
  7. package/commands/annotate.md +2 -2
  8. package/commands/archive.md +2 -2
  9. package/commands/audit.md +3 -3
  10. package/commands/baseline.md +3 -3
  11. package/commands/brief.md +5 -5
  12. package/commands/budget.md +3 -3
  13. package/commands/calibrate.md +3 -3
  14. package/commands/card.md +3 -3
  15. package/commands/changelog.md +2 -2
  16. package/commands/checkpoint.md +3 -3
  17. package/commands/cite.md +2 -2
  18. package/commands/compare.md +1 -1
  19. package/commands/counterfactual.md +2 -2
  20. package/commands/curriculum.md +3 -3
  21. package/commands/design.md +3 -3
  22. package/commands/diagnose.md +4 -4
  23. package/commands/diff.md +3 -3
  24. package/commands/distill.md +3 -3
  25. package/commands/doctor.md +2 -2
  26. package/commands/ensemble.md +3 -3
  27. package/commands/explore.md +4 -4
  28. package/commands/export.md +3 -3
  29. package/commands/feature.md +3 -3
  30. package/commands/flashback.md +2 -2
  31. package/commands/fork.md +3 -3
  32. package/commands/frontier.md +3 -3
  33. package/commands/init.md +5 -5
  34. package/commands/leak.md +3 -3
  35. package/commands/lit.md +3 -3
  36. package/commands/logbook.md +5 -5
  37. package/commands/merge.md +2 -2
  38. package/commands/mode.md +1 -1
  39. package/commands/onboard.md +2 -2
  40. package/commands/paper.md +3 -3
  41. package/commands/plan.md +2 -2
  42. package/commands/poster.md +3 -3
  43. package/commands/postmortem.md +2 -2
  44. package/commands/preflight.md +5 -5
  45. package/commands/present.md +2 -2
  46. package/commands/profile.md +3 -3
  47. package/commands/prune.md +2 -2
  48. package/commands/quantize.md +2 -2
  49. package/commands/queue.md +3 -3
  50. package/commands/registry.md +2 -2
  51. package/commands/regress.md +3 -3
  52. package/commands/replay.md +2 -2
  53. package/commands/report.md +3 -3
  54. package/commands/reproduce.md +3 -3
  55. package/commands/retry.md +3 -3
  56. package/commands/review.md +2 -2
  57. package/commands/rules/loop-protocol.md +11 -11
  58. package/commands/sanity.md +3 -3
  59. package/commands/scale.md +4 -4
  60. package/commands/search.md +2 -2
  61. package/commands/seed.md +3 -3
  62. package/commands/sensitivity.md +3 -3
  63. package/commands/share.md +2 -2
  64. package/commands/simulate.md +2 -2
  65. package/commands/status.md +1 -1
  66. package/commands/stitch.md +3 -3
  67. package/commands/suggest.md +5 -5
  68. package/commands/surgery.md +2 -2
  69. package/commands/sweep.md +8 -8
  70. package/commands/template.md +2 -2
  71. package/commands/train.md +5 -5
  72. package/commands/transfer.md +3 -3
  73. package/commands/trend.md +2 -2
  74. package/commands/try.md +4 -4
  75. package/commands/update.md +2 -2
  76. package/commands/validate.md +4 -4
  77. package/commands/warm.md +3 -3
  78. package/commands/watch.md +4 -4
  79. package/commands/whatif.md +2 -2
  80. package/commands/xray.md +3 -3
  81. package/config/commands.yaml +1 -1
  82. package/package.json +1 -1
  83. package/skills/turing/ablate/SKILL.md +3 -3
  84. package/skills/turing/annotate/SKILL.md +2 -2
  85. package/skills/turing/archive/SKILL.md +2 -2
  86. package/skills/turing/audit/SKILL.md +3 -3
  87. package/skills/turing/baseline/SKILL.md +3 -3
  88. package/skills/turing/brief/SKILL.md +5 -5
  89. package/skills/turing/budget/SKILL.md +3 -3
  90. package/skills/turing/calibrate/SKILL.md +3 -3
  91. package/skills/turing/card/SKILL.md +3 -3
  92. package/skills/turing/changelog/SKILL.md +2 -2
  93. package/skills/turing/checkpoint/SKILL.md +3 -3
  94. package/skills/turing/cite/SKILL.md +2 -2
  95. package/skills/turing/compare/SKILL.md +1 -1
  96. package/skills/turing/counterfactual/SKILL.md +2 -2
  97. package/skills/turing/curriculum/SKILL.md +3 -3
  98. package/skills/turing/design/SKILL.md +3 -3
  99. package/skills/turing/diagnose/SKILL.md +4 -4
  100. package/skills/turing/diff/SKILL.md +3 -3
  101. package/skills/turing/distill/SKILL.md +3 -3
  102. package/skills/turing/doctor/SKILL.md +2 -2
  103. package/skills/turing/ensemble/SKILL.md +3 -3
  104. package/skills/turing/explore/SKILL.md +4 -4
  105. package/skills/turing/export/SKILL.md +3 -3
  106. package/skills/turing/feature/SKILL.md +3 -3
  107. package/skills/turing/flashback/SKILL.md +2 -2
  108. package/skills/turing/fork/SKILL.md +3 -3
  109. package/skills/turing/frontier/SKILL.md +3 -3
  110. package/skills/turing/init/SKILL.md +5 -5
  111. package/skills/turing/leak/SKILL.md +3 -3
  112. package/skills/turing/lit/SKILL.md +3 -3
  113. package/skills/turing/logbook/SKILL.md +5 -5
  114. package/skills/turing/merge/SKILL.md +2 -2
  115. package/skills/turing/mode/SKILL.md +1 -1
  116. package/skills/turing/onboard/SKILL.md +2 -2
  117. package/skills/turing/paper/SKILL.md +3 -3
  118. package/skills/turing/plan/SKILL.md +2 -2
  119. package/skills/turing/poster/SKILL.md +3 -3
  120. package/skills/turing/postmortem/SKILL.md +2 -2
  121. package/skills/turing/preflight/SKILL.md +5 -5
  122. package/skills/turing/present/SKILL.md +2 -2
  123. package/skills/turing/profile/SKILL.md +3 -3
  124. package/skills/turing/prune/SKILL.md +2 -2
  125. package/skills/turing/quantize/SKILL.md +2 -2
  126. package/skills/turing/queue/SKILL.md +3 -3
  127. package/skills/turing/registry/SKILL.md +2 -2
  128. package/skills/turing/regress/SKILL.md +3 -3
  129. package/skills/turing/replay/SKILL.md +2 -2
  130. package/skills/turing/report/SKILL.md +3 -3
  131. package/skills/turing/reproduce/SKILL.md +3 -3
  132. package/skills/turing/retry/SKILL.md +3 -3
  133. package/skills/turing/review/SKILL.md +2 -2
  134. package/skills/turing/rules/loop-protocol.md +11 -11
  135. package/skills/turing/sanity/SKILL.md +3 -3
  136. package/skills/turing/scale/SKILL.md +4 -4
  137. package/skills/turing/search/SKILL.md +2 -2
  138. package/skills/turing/seed/SKILL.md +3 -3
  139. package/skills/turing/sensitivity/SKILL.md +3 -3
  140. package/skills/turing/share/SKILL.md +2 -2
  141. package/skills/turing/simulate/SKILL.md +2 -2
  142. package/skills/turing/status/SKILL.md +1 -1
  143. package/skills/turing/stitch/SKILL.md +3 -3
  144. package/skills/turing/suggest/SKILL.md +5 -5
  145. package/skills/turing/surgery/SKILL.md +2 -2
  146. package/skills/turing/sweep/SKILL.md +8 -8
  147. package/skills/turing/template/SKILL.md +2 -2
  148. package/skills/turing/train/SKILL.md +5 -5
  149. package/skills/turing/transfer/SKILL.md +3 -3
  150. package/skills/turing/trend/SKILL.md +2 -2
  151. package/skills/turing/try/SKILL.md +4 -4
  152. package/skills/turing/update/SKILL.md +2 -2
  153. package/skills/turing/validate/SKILL.md +4 -4
  154. package/skills/turing/warm/SKILL.md +3 -3
  155. package/skills/turing/watch/SKILL.md +4 -4
  156. package/skills/turing/whatif/SKILL.md +2 -2
  157. package/skills/turing/xray/SKILL.md +3 -3
  158. package/templates/README.md +5 -8
  159. package/templates/program.md +18 -18
  160. package/templates/pyproject.toml +10 -0
  161. package/templates/requirements.txt +4 -1
  162. package/templates/scripts/generate_onboarding.py +1 -1
  163. package/templates/scripts/post-train-hook.sh +7 -8
  164. package/templates/scripts/scaffold.py +24 -26
  165. package/templates/scripts/stop-hook.sh +2 -3
  166. package/templates/scripts/turing-run-python.sh +9 -0
@@ -8,8 +8,8 @@ allowed-tools: Read, Bash(*), Grep, Glob
 What would need to change to flip this prediction? Minimum-change counterfactual for individual predictions.
 
 ## Steps
-1. `source .venv/bin/activate`
-2. `python scripts/counterfactual_explanation.py $ARGUMENTS`
+1. `uv sync`
+2. `uv run python scripts/counterfactual_explanation.py $ARGUMENTS`
 3. **Saved:** `experiments/counterfactuals/`
 
 ## Methods
@@ -9,9 +9,9 @@ Does the order your model sees data matter? Find out systematically.
 
 ## Steps
 
-1. **Activate environment:**
+1. **Sync environment:**
 ```bash
-source .venv/bin/activate
+uv sync
 ```
 
 2. **Parse arguments from `$ARGUMENTS`:**
@@ -21,7 +21,7 @@ Does the order your model sees data matter? Find out systematically.
 
 3. **Run curriculum analysis:**
 ```bash
-python scripts/curriculum_optimizer.py $ARGUMENTS
+uv run python scripts/curriculum_optimizer.py $ARGUMENTS
 ```
 
 4. **Strategies tested:**
@@ -2,7 +2,7 @@
 name: design
 description: Generate a structured experiment design for a hypothesis. Reads experiment history, searches literature for methodology, produces a scored design document at experiments/designs/.
 argument-hint: "<hypothesis-id or description>"
-allowed-tools: Read, Write, Bash(python scripts/*:*, source .venv/bin/activate:*, mkdir:*), Grep, Glob, WebSearch, WebFetch
+allowed-tools: Read, Write, Bash(uv run python scripts/*:*, uv sync:*, mkdir:*), Grep, Glob, WebSearch, WebFetch
 ---
 
 Front-load the thinking before the coding. Given a hypothesis, produce a structured experiment design grounded in methodology from the literature.
@@ -13,7 +13,7 @@ Front-load the thinking before the coding. Given a hypothesis, produce a structu
 
 If `$ARGUMENTS` matches `hyp-NNN`, load the hypothesis:
 ```bash
-source .venv/bin/activate && python scripts/manage_hypotheses.py show $ARGUMENTS
+uv run python scripts/manage_hypotheses.py show $ARGUMENTS
 ```
 
 If freeform text, use it directly as the hypothesis description.
@@ -23,7 +23,7 @@ Read the current config and experiment state:
 cat config.yaml
 ```
 ```bash
-source .venv/bin/activate && python scripts/show_metrics.py --last 10 2>/dev/null || echo "No experiments yet"
+uv run python scripts/show_metrics.py --last 10 2>/dev/null || echo "No experiments yet"
 ```
 ```bash
 cat experiment_state.yaml 2>/dev/null || echo "No experiment state yet"
@@ -9,15 +9,15 @@ Analyze where and why the model fails, beyond aggregate metrics.
 
 ## Steps
 
-1. **Activate environment:**
+1. **Sync environment:**
 ```bash
-source .venv/bin/activate
+uv sync
 ```
 
 2. **Generate predictions if needed:**
 Check if `experiments/predictions/exp-NNN-preds.yaml` exists. If not, run:
 ```bash
-python train.py --predict-only --output experiments/predictions/
+uv run python train.py --predict-only --output experiments/predictions/
 ```
 The predictions file must contain `y_true`, `y_pred`, `task_type`, and optionally `features`.
 
@@ -28,7 +28,7 @@ Analyze where and why the model fails, beyond aggregate metrics.
 
 4. **Run error analysis:**
 ```bash
-python scripts/diagnose_errors.py $ARGUMENTS
+uv run python scripts/diagnose_errors.py $ARGUMENTS
 ```
 
 5. **Report results:**
@@ -9,9 +9,9 @@ Deep diagnostic comparison of two experiments. Goes beyond "which metric is high
 
 ## Steps
 
-1. **Activate environment:**
+1. **Sync environment:**
 ```bash
-source .venv/bin/activate
+uv sync
 ```
 
 2. **Parse arguments from `$ARGUMENTS`:**
@@ -21,7 +21,7 @@ Deep diagnostic comparison of two experiments. Goes beyond "which metric is high
 
 3. **Run deep comparison:**
 ```bash
-python scripts/experiment_diff.py $ARGUMENTS
+uv run python scripts/experiment_diff.py $ARGUMENTS
 ```
 
 4. **Report results — the diff includes:**
@@ -9,9 +9,9 @@ Compress a large model into a smaller, faster one for production. Measures the a
 
 ## Steps
 
-1. **Activate environment:**
+1. **Sync environment:**
 ```bash
-source .venv/bin/activate
+uv sync
 ```
 
 2. **Parse arguments from `$ARGUMENTS`:**
@@ -23,7 +23,7 @@ Compress a large model into a smaller, faster one for production. Measures the a
 
 3. **Run distillation planner:**
 ```bash
-python scripts/model_distiller.py $ARGUMENTS
+uv run python scripts/model_distiller.py $ARGUMENTS
 ```
 
 4. **Report includes:**
@@ -8,8 +8,8 @@ allowed-tools: Read, Bash(*), Grep, Glob
 Is Turing healthy? Check everything and get a score.
 
 ## Steps
-1. `source .venv/bin/activate`
-2. `python scripts/harness_doctor.py $ARGUMENTS`
+1. `uv sync`
+2. `uv run python scripts/harness_doctor.py $ARGUMENTS`
 3. **Saved:** `experiments/doctor/`
 
 ## Checks
@@ -9,9 +9,9 @@ Build ensembles from your best experiments automatically. Often yields 1-3% impr
 
 ## Steps
 
-1. **Activate environment:**
+1. **Sync environment:**
 ```bash
-source .venv/bin/activate
+uv sync
 ```
 
 2. **Parse arguments from `$ARGUMENTS`:**
@@ -22,7 +22,7 @@ Build ensembles from your best experiments automatically. Often yields 1-3% impr
 
 3. **Run ensemble construction:**
 ```bash
-python scripts/build_ensemble.py $ARGUMENTS
+uv run python scripts/build_ensemble.py $ARGUMENTS
 ```
 
 4. **Report results:**
@@ -2,7 +2,7 @@
 name: explore
 description: Tree-search-guided hypothesis exploration using AB-MCTS. Explores the space of experiment ideas as a search tree, scored by the critique engine. Discovers non-obvious refinement chains that linear suggestion cannot find.
 argument-hint: "[ml/project] [--iterations N] [--top N] [--strategy abmcts-a|abmcts-m|greedy]"
-allowed-tools: Read, Write, Bash(python scripts/*:*, source .venv/bin/activate:*), Grep, Glob
+allowed-tools: Read, Write, Bash(uv run python scripts/*:*, uv sync:*), Grep, Glob
 ---
 
 Explore the hypothesis space using tree search. Instead of suggesting independent ideas, this builds and searches a tree of refinement chains — each node is a hypothesis scored by novelty, feasibility, and expected impact.
@@ -31,7 +31,7 @@ Extract from `$ARGUMENTS`:
 ### 1. Assess Current State
 
 ```bash
-source .venv/bin/activate && python scripts/show_metrics.py --last 10 2>/dev/null || echo "No experiments yet"
+uv run python scripts/show_metrics.py --last 10 2>/dev/null || echo "No experiments yet"
 ```
 
 Read `config.yaml` to understand the current model and metric.
@@ -39,7 +39,7 @@ Read `config.yaml` to understand the current model and metric.
 ### 2. Run Tree Search
 
 ```bash
-source .venv/bin/activate && python scripts/treequest_suggest.py \
+uv run python scripts/treequest_suggest.py \
   --log experiments/log.jsonl \
   --config config.yaml \
   --top <N> \
@@ -58,7 +58,7 @@ The script will:
 For each result, add to the hypothesis queue:
 
 ```bash
-source .venv/bin/activate && python scripts/manage_hypotheses.py add "<description>" \
+uv run python scripts/manage_hypotheses.py add "<description>" \
   --priority medium --source treequest
 ```
 
@@ -9,9 +9,9 @@ Export a trained model to a production-ready format.
 
 ## Steps
 
-1. **Activate environment:**
+1. **Sync environment:**
 ```bash
-source .venv/bin/activate
+uv sync
 ```
 
 2. **Parse arguments from `$ARGUMENTS`:**
@@ -23,7 +23,7 @@ Export a trained model to a production-ready format.
 
 3. **Run export pipeline:**
 ```bash
-python scripts/export_model.py $ARGUMENTS
+uv run python scripts/export_model.py $ARGUMENTS
 ```
 
 4. **Report results:**
@@ -9,9 +9,9 @@ Systematically evaluate which features matter and which are noise.
 
 ## Steps
 
-1. **Activate environment:**
+1. **Sync environment:**
 ```bash
-source .venv/bin/activate
+uv sync
 ```
 
 2. **Parse arguments from `$ARGUMENTS`:**
@@ -21,7 +21,7 @@ Systematically evaluate which features matter and which are noise.
 
 3. **Run feature analysis:**
 ```bash
-python scripts/feature_intelligence.py $ARGUMENTS
+uv run python scripts/feature_intelligence.py $ARGUMENTS
 ```
 
 4. **Report includes:**
@@ -8,8 +8,8 @@ allowed-tools: Read, Bash(*), Grep, Glob
 Come back to a project after a week and start working in 10 seconds instead of 30 minutes.
 
 ## Steps
-1. **Activate environment:** `source .venv/bin/activate`
-2. **Run:** `python scripts/session_flashback.py $ARGUMENTS`
+1. **Sync environment:** `uv sync`
+2. **Run:** `uv run python scripts/session_flashback.py $ARGUMENTS`
 3. **Report:** current best, last session experiments, pending hypotheses, annotations, budget, suggested next action
 4. **Saved output:** `experiments/flashbacks/flashback-*.yaml`
 
@@ -9,9 +9,9 @@ Fork an experiment into parallel branches and compare results.
 
 ## Steps
 
-1. **Activate environment:**
+1. **Sync environment:**
 ```bash
-source .venv/bin/activate
+uv sync
 ```
 
 2. **Parse arguments from `$ARGUMENTS`:**
@@ -21,7 +21,7 @@ Fork an experiment into parallel branches and compare results.
 
 3. **Run fork:**
 ```bash
-python scripts/fork_experiment.py $ARGUMENTS
+uv run python scripts/fork_experiment.py $ARGUMENTS
 ```
 
 4. **Report results:**
@@ -9,9 +9,9 @@ Visualize the Pareto frontier across multiple objectives from experiment history
 
 ## Steps
 
-1. **Activate environment:**
+1. **Sync environment:**
 ```bash
-source .venv/bin/activate
+uv sync
 ```
 
 2. **Parse arguments from `$ARGUMENTS`:**
@@ -21,7 +21,7 @@ Visualize the Pareto frontier across multiple objectives from experiment history
 
 3. **Run Pareto analysis:**
 ```bash
-python scripts/pareto_frontier.py $ARGUMENTS
+uv run python scripts/pareto_frontier.py $ARGUMENTS
 ```
 
 4. **Report results:**
@@ -1,6 +1,6 @@
 ---
 name: init
-description: Initialize a new ML project with the Turing autoresearch harness. Scaffolds the full experiment infrastructure — immutable evaluation pipeline, agent-editable training code, structured logging, convergence detection hooks, and a Python virtual environment. Use --plan to generate a research plan.
+description: Initialize a new ML project with the Turing autoresearch harness. Scaffolds the full experiment infrastructure — immutable evaluation pipeline, agent-editable training code, structured logging, convergence detection hooks, and a uv-managed Python environment. Use --plan to generate a research plan.
 argument-hint: "[project_name] [--plan]"
 allowed-tools: Read, Write, Edit, Bash(*), Grep, Glob, WebSearch, WebFetch
 ---
@@ -23,7 +23,7 @@ Ask the user for the following (or accept from `$ARGUMENTS` if provided as JSON)
 Once you have all 6 values, delegate to the unified scaffolding script:
 
 ```bash
-python3 <templates_dir>/scripts/scaffold.py \
+uv run python <templates_dir>/scripts/scaffold.py \
   --project-name "<project_name>" \
   --target-metric "<target_metric>" \
   --metric-direction "<metric_direction>" \
@@ -38,7 +38,7 @@ The scaffold script handles everything in a single atomic operation:
 - Creates data/, experiments/, models/ directories
 - Sets up agent memory at `.claude/agent-memory/ml-researcher-{project_name}/MEMORY.md`
 - Configures Claude Code hooks in `.claude/settings.local.json`
-- Creates Python virtual environment and installs requirements
+- Runs `uv sync` from the ML directory when uv is available
 - Verifies all placeholders were replaced (fails loudly if any remain)
 
 ## Locating Templates
@@ -57,7 +57,7 @@ node_modules/claude-turing/templates/
 Example command:
 
 ```bash
-python3 ~/.claude/commands/turing/templates/scripts/scaffold.py \
+uv run python ~/.claude/commands/turing/templates/scripts/scaffold.py \
   --project-name "<project_name>" \
   --target-metric "<target_metric>" \
   --metric-direction "<metric_direction>" \
@@ -71,7 +71,7 @@ python3 ~/.claude/commands/turing/templates/scripts/scaffold.py \
 
 Report what was created:
 - The separation: READ-ONLY (`prepare.py`, `evaluate.py`) vs AGENT-EDITABLE (`train.py`)
-- Next steps: add data to the configured data source path, run `python prepare.py`, then `/turing:train`
+- Next steps: add data to the configured data source path, run `uv run python prepare.py`, then `/turing:train`
 - The taste-leverage loop: `/turing:try` to inject hypotheses, `/turing:brief` for intelligence reports
 
 ## Research Plan Generation (--plan flag)
@@ -9,9 +9,9 @@ Actively probe for data leakage. The #1 cause of "too good to be true" results.
 
 ## Steps
 
-1. **Activate environment:**
+1. **Sync environment:**
 ```bash
-source .venv/bin/activate
+uv sync
 ```
 
 2. **Parse arguments from `$ARGUMENTS`:**
@@ -21,7 +21,7 @@ Actively probe for data leakage. The #1 cause of "too good to be true" results.
 
 3. **Run leakage scan:**
 ```bash
-python scripts/leakage_detector.py $ARGUMENTS
+uv run python scripts/leakage_detector.py $ARGUMENTS
 ```
 
 4. **Checks performed:**
@@ -9,9 +9,9 @@ Search the literature for papers, baselines, and related work.
 
 ## Steps
 
-1. **Activate environment:**
+1. **Sync environment:**
 ```bash
-source .venv/bin/activate
+uv sync
 ```
 
 2. **Parse arguments from `$ARGUMENTS`:**
@@ -23,7 +23,7 @@ Search the literature for papers, baselines, and related work.
 
 3. **Run literature search:**
 ```bash
-python scripts/literature_search.py $ARGUMENTS
+uv run python scripts/literature_search.py $ARGUMENTS
 ```
 
 4. **Report results:**
@@ -2,7 +2,7 @@
 name: logbook
 description: Generate a research logbook showing the full experiment narrative — hypotheses proposed, experiments run, decisions made, and progress over time. Outputs HTML (with interactive chart) or markdown.
 argument-hint: "[--since YYYY-MM-DD] [--format html|markdown] [--output path]"
-allowed-tools: Read, Bash(python scripts/*:*, source .venv/bin/activate:*, mkdir:*), Grep, Glob
+allowed-tools: Read, Bash(uv run python scripts/*:*, uv sync:*, mkdir:*), Grep, Glob
 ---
 
 Generate a research logbook that captures the full narrative of the experiment campaign.
@@ -11,7 +11,7 @@ Generate a research logbook that captures the full narrative of the experiment c
 
 1. **Generate the logbook:**
 ```bash
-source .venv/bin/activate && python scripts/generate_logbook.py
+uv run python scripts/generate_logbook.py
 ```
 
 **With options from `$ARGUMENTS`:**
@@ -22,13 +22,13 @@ Generate a research logbook that captures the full narrative of the experiment c
 **Common usage:**
 ```bash
 # HTML logbook with interactive trajectory chart
-source .venv/bin/activate && python scripts/generate_logbook.py --output logbook.html
+uv run python scripts/generate_logbook.py --output logbook.html
 
 # Markdown for embedding in docs or READMEs
-source .venv/bin/activate && python scripts/generate_logbook.py --format markdown --output logbook.md
+uv run python scripts/generate_logbook.py --format markdown --output logbook.md
 
 # Last week's activity
-source .venv/bin/activate && python scripts/generate_logbook.py --since 2026-03-24 --output logbook.html
+uv run python scripts/generate_logbook.py --since 2026-03-24 --output logbook.html
 ```
 
 2. **Present the result:**
@@ -9,8 +9,8 @@ Combine model weights (not predictions) into a single, better model with no late
 
 ## Steps
 
-1. **Activate environment:** `source .venv/bin/activate`
-2. **Run:** `python scripts/model_merger.py $ARGUMENTS`
+1. **Sync environment:** `uv sync`
+2. **Run:** `uv run python scripts/model_merger.py $ARGUMENTS`
 3. **Methods:** uniform soup (simple average), greedy soup (include only if improves), TIES (trim+elect+merge), DARE (drop+rescale)
 4. **Report:** compatibility check, per-model metrics, method comparison, improvement delta
 5. **Saved output:** `experiments/merges/merge-*.yaml`
@@ -20,7 +20,7 @@ Set the research mode for the current project. The mode determines how the novel
 
 2. **Update experiment state:**
 ```bash
-source .venv/bin/activate
+uv sync
 python -c "
 import yaml
 from pathlib import Path
@@ -8,8 +8,8 @@ allowed-tools: Read, Bash(*), Grep, Glob
 5-minute read that replaces a 1-hour onboarding meeting.
 
 ## Steps
-1. `source .venv/bin/activate`
-2. `python scripts/generate_onboarding.py $ARGUMENTS`
+1. `uv sync`
+2. `uv run python scripts/generate_onboarding.py $ARGUMENTS`
 3. **Saved:** `ONBOARDING.md`
 
 ## Examples
@@ -9,9 +9,9 @@ Draft paper sections directly from experiment data.
 
 ## Steps
 
-1. **Activate environment:**
+1. **Sync environment:**
 ```bash
-source .venv/bin/activate
+uv sync
 ```
 
 2. **Parse arguments from `$ARGUMENTS`:**
@@ -20,7 +20,7 @@ Draft paper sections directly from experiment data.
 
 3. **Run paper drafting:**
 ```bash
-python scripts/draft_paper_sections.py $ARGUMENTS
+uv run python scripts/draft_paper_sections.py $ARGUMENTS
 ```
 
 4. **Report results:**
@@ -8,8 +8,8 @@ allowed-tools: Read, Bash(*), Grep, Glob
 Design the next N experiments strategically, not randomly. Allocates budget by expected ROI.
 
 ## Steps
-1. `source .venv/bin/activate`
-2. `python scripts/research_planner.py $ARGUMENTS`
+1. `uv sync`
+2. `uv run python scripts/research_planner.py $ARGUMENTS`
 3. **Saved:** `experiments/plans/`
 
 ## How it works
@@ -2,7 +2,7 @@
 name: poster
 description: Generate a single-page HTML research poster summarizing the experiment campaign — best result, trajectory, key findings, and methodology. Adapted from posterskill's self-contained HTML architecture.
 argument-hint: "[title override]"
-allowed-tools: Read, Write, Edit, Bash(python scripts/*:*, source .venv/bin/activate:*, mkdir:*, open:*), Grep, Glob
+allowed-tools: Read, Write, Edit, Bash(uv run python scripts/*:*, uv sync:*, mkdir:*, open:*), Grep, Glob
 ---
 
 Generate a research poster summarizing the experiment campaign as a single self-contained HTML file. Adapted from [posterskill](https://github.com/ethanweber/posterskill)'s architecture — no build step, works when opened as `file://`.
@@ -15,8 +15,8 @@ Read the experiment history and project context:
 
 ```bash
 cat config.yaml
-source .venv/bin/activate && python scripts/generate_brief.py
-source .venv/bin/activate && python scripts/show_metrics.py --last 20
+uv run python scripts/generate_brief.py
+uv run python scripts/show_metrics.py --last 20
 cat experiment_state.yaml 2>/dev/null || true
 cat RESEARCH_PLAN.md 2>/dev/null || true
 ```
@@ -8,8 +8,8 @@ allowed-tools: Read, Bash(*), Grep, Glob
 When experiments stop improving, find out why. Diagnoses search space exhaustion, config errors, data issues, metric ceilings, and noise floors.
 
 ## Steps
-1. `source .venv/bin/activate`
-2. `python scripts/failure_postmortem.py $ARGUMENTS`
+1. `uv sync`
+2. `uv run python scripts/failure_postmortem.py $ARGUMENTS`
 3. **Saved:** `experiments/postmortems/`
 
 ## Diagnosis categories
@@ -2,28 +2,28 @@
 name: preflight
 description: Pre-flight resource check — estimates VRAM, RAM, and disk requirements before running ML training. Compares against available system resources and issues PASS/WARN/FAIL verdict. Use before training to catch OOM errors before they happen.
 argument-hint: "[--model-type torch] [--params 10M] [--batch-size 32]"
-allowed-tools: Read, Bash(python scripts/*:*, source .venv/bin/activate:*, nvidia-smi:*), Grep, Glob
+allowed-tools: Read, Bash(uv run python scripts/*:*, uv sync:*, nvidia-smi:*), Grep, Glob
 ---
 
 Check whether the current system has enough resources to run the planned experiment.
 
 ## Steps
 
-1. **Activate environment:**
+1. **Sync environment:**
 ```bash
-source .venv/bin/activate
+uv sync
 ```
 
 2. **Run preflight check:**
 
 If `$ARGUMENTS` is empty (auto-detect from config.yaml):
 ```bash
-python scripts/preflight.py
+uv run python scripts/preflight.py
 ```
 
 If `$ARGUMENTS` contains flags:
 ```bash
-python scripts/preflight.py $ARGUMENTS
+uv run python scripts/preflight.py $ARGUMENTS
 ```
 
 3. **Interpret the verdict:**
@@ -8,8 +8,8 @@ allowed-tools: Read, Bash(*), Grep, Glob
 Generate presentation-ready figure specifications from experiment data in seconds.
 
 ## Steps
-1. **Activate environment:** `source .venv/bin/activate`
-2. **Run:** `python scripts/generate_figures.py $ARGUMENTS`
+1. **Sync environment:** `uv sync`
+2. **Run:** `uv run python scripts/generate_figures.py $ARGUMENTS`
 3. **Figure types:** training, comparison, ablation, pareto, sensitivity
 4. **Styles:** light (papers), dark (demos), poster (large fonts)
 5. **Saved output:** `paper/figures/`
@@ -9,9 +9,9 @@ Profile a training run to identify performance bottlenecks.
 
 ## Steps
 
-1. **Activate environment:**
+1. **Sync environment:**
 ```bash
-source .venv/bin/activate
+uv sync
 ```
 
 2. **Parse arguments from `$ARGUMENTS`:**
@@ -20,7 +20,7 @@ Profile a training run to identify performance bottlenecks.
 
 3. **Run profiling:**
 ```bash
-python scripts/profile_training.py $ARGUMENTS
+uv run python scripts/profile_training.py $ARGUMENTS
 ```
 
 4. **Report results:**
@@ -9,8 +9,8 @@ Remove redundant weights for faster inference and smaller models.
 
 ## Steps
 
-1. **Activate environment:** `source .venv/bin/activate`
-2. **Run:** `python scripts/model_pruning.py $ARGUMENTS`
+1. **Sync environment:** `uv sync`
+2. **Run:** `uv run python scripts/model_pruning.py $ARGUMENTS`
 3. **Methods:** magnitude (zero small weights), structured (remove neurons), lottery (iterative with rewind)
 4. **For tree models:** progressively reduces n_estimators
 5. **Report:** sparsity sweep table, knee point, recommended sparsity
@@ -9,8 +9,8 @@ Quantize for production. Lowest-effort optimization: 2-4x speedup, 2-4x memory r
 
 ## Steps
 
-1. **Activate environment:** `source .venv/bin/activate`
-2. **Run:** `python scripts/model_quantization.py $ARGUMENTS`
+1. **Sync environment:** `uv sync`
+2. **Run:** `uv run python scripts/model_quantization.py $ARGUMENTS`
 3. **Precision levels:** FP32 (baseline), FP16 (GPU), INT8 dynamic (simplest), INT8 static (best accuracy)
 4. **Report:** precision comparison table, recommended level, QAT suggestion if needed
 5. **Saved output:** `experiments/quantization/<exp-id>-quantization.yaml`
@@ -9,9 +9,9 @@ Manage the experiment queue for unattended batch execution.
 
 ## Steps
 
-1. **Activate environment:**
+1. **Sync environment:**
 ```bash
-source .venv/bin/activate
+uv sync
 ```
 
 2. **Parse arguments from `$ARGUMENTS`:**
@@ -23,7 +23,7 @@ Manage the experiment queue for unattended batch execution.
 
 3. **Run queue manager:**
 ```bash
-python scripts/experiment_queue.py $ARGUMENTS
+uv run python scripts/experiment_queue.py $ARGUMENTS
 ```
 
 4. **Report results by action:**
@@ -8,8 +8,8 @@ allowed-tools: Read, Bash(*), Grep, Glob
 Track which model is production, staging, candidate, or archived. Promotion requires passing gates.
 
 ## Steps
-1. `source .venv/bin/activate`
-2. `python scripts/model_lifecycle.py $ARGUMENTS`
+1. `uv sync`
+2. `uv run python scripts/model_lifecycle.py $ARGUMENTS`
 3. **Registry:** `experiments/registry.yaml`
 
 ## Promotion gates