claude-turing 4.5.0 → 4.7.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (260)
  1. package/.claude-plugin/marketplace.json +18 -0
  2. package/.claude-plugin/plugin.json +2 -2
  3. package/README.md +1 -1
  4. package/commands/turing.md +85 -77
  5. package/config/commands.yaml +928 -0
  6. package/package.json +11 -4
  7. package/skills/turing/SKILL.md +180 -0
  8. package/skills/turing/ablate/SKILL.md +47 -0
  9. package/skills/turing/annotate/SKILL.md +23 -0
  10. package/skills/turing/archive/SKILL.md +23 -0
  11. package/skills/turing/audit/SKILL.md +56 -0
  12. package/skills/turing/baseline/SKILL.md +45 -0
  13. package/skills/turing/brief/SKILL.md +95 -0
  14. package/skills/turing/budget/SKILL.md +52 -0
  15. package/skills/turing/calibrate/SKILL.md +47 -0
  16. package/skills/turing/card/SKILL.md +36 -0
  17. package/skills/turing/changelog/SKILL.md +22 -0
  18. package/skills/turing/checkpoint/SKILL.md +47 -0
  19. package/skills/turing/cite/SKILL.md +23 -0
  20. package/skills/turing/compare/SKILL.md +24 -0
  21. package/skills/turing/counterfactual/SKILL.md +27 -0
  22. package/skills/turing/curriculum/SKILL.md +43 -0
  23. package/skills/turing/design/SKILL.md +97 -0
  24. package/skills/turing/diagnose/SKILL.md +52 -0
  25. package/skills/turing/diff/SKILL.md +48 -0
  26. package/skills/turing/distill/SKILL.md +56 -0
  27. package/skills/turing/doctor/SKILL.md +31 -0
  28. package/skills/turing/ensemble/SKILL.md +54 -0
  29. package/skills/turing/explore/SKILL.md +107 -0
  30. package/skills/turing/export/SKILL.md +48 -0
  31. package/skills/turing/feature/SKILL.md +42 -0
  32. package/skills/turing/flashback/SKILL.md +22 -0
  33. package/skills/turing/fork/SKILL.md +40 -0
  34. package/skills/turing/frontier/SKILL.md +45 -0
  35. package/skills/turing/init/SKILL.md +154 -0
  36. package/skills/turing/leak/SKILL.md +47 -0
  37. package/skills/turing/lit/SKILL.md +47 -0
  38. package/skills/turing/logbook/SKILL.md +51 -0
  39. package/skills/turing/merge/SKILL.md +24 -0
  40. package/skills/turing/mode/SKILL.md +43 -0
  41. package/skills/turing/onboard/SKILL.md +20 -0
  42. package/skills/turing/paper/SKILL.md +44 -0
  43. package/skills/turing/plan/SKILL.md +27 -0
  44. package/skills/turing/poster/SKILL.md +89 -0
  45. package/skills/turing/postmortem/SKILL.md +28 -0
  46. package/skills/turing/preflight/SKILL.md +75 -0
  47. package/skills/turing/present/SKILL.md +23 -0
  48. package/skills/turing/profile/SKILL.md +43 -0
  49. package/skills/turing/prune/SKILL.md +26 -0
  50. package/skills/turing/quantize/SKILL.md +24 -0
  51. package/skills/turing/queue/SKILL.md +48 -0
  52. package/skills/turing/registry/SKILL.md +31 -0
  53. package/skills/turing/regress/SKILL.md +53 -0
  54. package/skills/turing/replay/SKILL.md +23 -0
  55. package/skills/turing/report/SKILL.md +97 -0
  56. package/skills/turing/reproduce/SKILL.md +48 -0
  57. package/skills/turing/retry/SKILL.md +41 -0
  58. package/skills/turing/review/SKILL.md +20 -0
  59. package/skills/turing/rules/loop-protocol.md +91 -0
  60. package/skills/turing/sanity/SKILL.md +48 -0
  61. package/skills/turing/scale/SKILL.md +55 -0
  62. package/skills/turing/search/SKILL.md +22 -0
  63. package/skills/turing/seed/SKILL.md +47 -0
  64. package/skills/turing/sensitivity/SKILL.md +41 -0
  65. package/skills/turing/share/SKILL.md +20 -0
  66. package/skills/turing/simulate/SKILL.md +28 -0
  67. package/skills/turing/status/SKILL.md +24 -0
  68. package/skills/turing/stitch/SKILL.md +49 -0
  69. package/skills/turing/suggest/SKILL.md +159 -0
  70. package/skills/turing/surgery/SKILL.md +27 -0
  71. package/skills/turing/sweep/SKILL.md +45 -0
  72. package/skills/turing/template/SKILL.md +22 -0
  73. package/skills/turing/train/SKILL.md +75 -0
  74. package/skills/turing/transfer/SKILL.md +54 -0
  75. package/skills/turing/trend/SKILL.md +21 -0
  76. package/skills/turing/try/SKILL.md +63 -0
  77. package/skills/turing/update/SKILL.md +27 -0
  78. package/skills/turing/validate/SKILL.md +34 -0
  79. package/skills/turing/warm/SKILL.md +53 -0
  80. package/skills/turing/watch/SKILL.md +60 -0
  81. package/skills/turing/whatif/SKILL.md +31 -0
  82. package/skills/turing/xray/SKILL.md +43 -0
  83. package/src/command-registry.js +160 -0
  84. package/src/install.js +8 -34
  85. package/src/sync-skills-layout.js +149 -0
  86. package/src/verify.js +5 -88
  87. package/templates/__pycache__/evaluate.cpython-312.pyc +0 -0
  88. package/templates/__pycache__/evaluate.cpython-314.pyc +0 -0
  89. package/templates/__pycache__/prepare.cpython-312.pyc +0 -0
  90. package/templates/__pycache__/prepare.cpython-314.pyc +0 -0
  91. package/templates/features/__pycache__/__init__.cpython-312.pyc +0 -0
  92. package/templates/features/__pycache__/__init__.cpython-314.pyc +0 -0
  93. package/templates/features/__pycache__/featurizers.cpython-312.pyc +0 -0
  94. package/templates/features/__pycache__/featurizers.cpython-314.pyc +0 -0
  95. package/templates/scripts/__pycache__/__init__.cpython-312.pyc +0 -0
  96. package/templates/scripts/__pycache__/__init__.cpython-314.pyc +0 -0
  97. package/templates/scripts/__pycache__/ablation_study.cpython-312.pyc +0 -0
  98. package/templates/scripts/__pycache__/ablation_study.cpython-314.pyc +0 -0
  99. package/templates/scripts/__pycache__/architecture_surgery.cpython-312.pyc +0 -0
  100. package/templates/scripts/__pycache__/architecture_surgery.cpython-314.pyc +0 -0
  101. package/templates/scripts/__pycache__/budget_manager.cpython-312.pyc +0 -0
  102. package/templates/scripts/__pycache__/budget_manager.cpython-314.pyc +0 -0
  103. package/templates/scripts/__pycache__/build_ensemble.cpython-312.pyc +0 -0
  104. package/templates/scripts/__pycache__/build_ensemble.cpython-314.pyc +0 -0
  105. package/templates/scripts/__pycache__/calibration.cpython-312.pyc +0 -0
  106. package/templates/scripts/__pycache__/calibration.cpython-314.pyc +0 -0
  107. package/templates/scripts/__pycache__/check_convergence.cpython-312.pyc +0 -0
  108. package/templates/scripts/__pycache__/check_convergence.cpython-314.pyc +0 -0
  109. package/templates/scripts/__pycache__/checkpoint_manager.cpython-312.pyc +0 -0
  110. package/templates/scripts/__pycache__/checkpoint_manager.cpython-314.pyc +0 -0
  111. package/templates/scripts/__pycache__/citation_manager.cpython-312.pyc +0 -0
  112. package/templates/scripts/__pycache__/citation_manager.cpython-314.pyc +0 -0
  113. package/templates/scripts/__pycache__/cost_frontier.cpython-312.pyc +0 -0
  114. package/templates/scripts/__pycache__/cost_frontier.cpython-314.pyc +0 -0
  115. package/templates/scripts/__pycache__/counterfactual_explanation.cpython-312.pyc +0 -0
  116. package/templates/scripts/__pycache__/counterfactual_explanation.cpython-314.pyc +0 -0
  117. package/templates/scripts/__pycache__/critique_hypothesis.cpython-312.pyc +0 -0
  118. package/templates/scripts/__pycache__/critique_hypothesis.cpython-314.pyc +0 -0
  119. package/templates/scripts/__pycache__/curriculum_optimizer.cpython-312.pyc +0 -0
  120. package/templates/scripts/__pycache__/curriculum_optimizer.cpython-314.pyc +0 -0
  121. package/templates/scripts/__pycache__/diagnose_errors.cpython-312.pyc +0 -0
  122. package/templates/scripts/__pycache__/diagnose_errors.cpython-314.pyc +0 -0
  123. package/templates/scripts/__pycache__/draft_paper_sections.cpython-312.pyc +0 -0
  124. package/templates/scripts/__pycache__/draft_paper_sections.cpython-314.pyc +0 -0
  125. package/templates/scripts/__pycache__/equivalence_checker.cpython-312.pyc +0 -0
  126. package/templates/scripts/__pycache__/equivalence_checker.cpython-314.pyc +0 -0
  127. package/templates/scripts/__pycache__/experiment_annotations.cpython-312.pyc +0 -0
  128. package/templates/scripts/__pycache__/experiment_annotations.cpython-314.pyc +0 -0
  129. package/templates/scripts/__pycache__/experiment_archive.cpython-312.pyc +0 -0
  130. package/templates/scripts/__pycache__/experiment_archive.cpython-314.pyc +0 -0
  131. package/templates/scripts/__pycache__/experiment_diff.cpython-312.pyc +0 -0
  132. package/templates/scripts/__pycache__/experiment_diff.cpython-314.pyc +0 -0
  133. package/templates/scripts/__pycache__/experiment_index.cpython-312.pyc +0 -0
  134. package/templates/scripts/__pycache__/experiment_index.cpython-314.pyc +0 -0
  135. package/templates/scripts/__pycache__/experiment_queue.cpython-312.pyc +0 -0
  136. package/templates/scripts/__pycache__/experiment_queue.cpython-314.pyc +0 -0
  137. package/templates/scripts/__pycache__/experiment_replay.cpython-312.pyc +0 -0
  138. package/templates/scripts/__pycache__/experiment_replay.cpython-314.pyc +0 -0
  139. package/templates/scripts/__pycache__/experiment_search.cpython-312.pyc +0 -0
  140. package/templates/scripts/__pycache__/experiment_search.cpython-314.pyc +0 -0
  141. package/templates/scripts/__pycache__/experiment_simulator.cpython-312.pyc +0 -0
  142. package/templates/scripts/__pycache__/experiment_simulator.cpython-314.pyc +0 -0
  143. package/templates/scripts/__pycache__/experiment_templates.cpython-312.pyc +0 -0
  144. package/templates/scripts/__pycache__/experiment_templates.cpython-314.pyc +0 -0
  145. package/templates/scripts/__pycache__/export_card.cpython-312.pyc +0 -0
  146. package/templates/scripts/__pycache__/export_card.cpython-314.pyc +0 -0
  147. package/templates/scripts/__pycache__/export_formats.cpython-312.pyc +0 -0
  148. package/templates/scripts/__pycache__/export_formats.cpython-314.pyc +0 -0
  149. package/templates/scripts/__pycache__/failure_postmortem.cpython-312.pyc +0 -0
  150. package/templates/scripts/__pycache__/failure_postmortem.cpython-314.pyc +0 -0
  151. package/templates/scripts/__pycache__/feature_intelligence.cpython-312.pyc +0 -0
  152. package/templates/scripts/__pycache__/feature_intelligence.cpython-314.pyc +0 -0
  153. package/templates/scripts/__pycache__/fork_experiment.cpython-312.pyc +0 -0
  154. package/templates/scripts/__pycache__/fork_experiment.cpython-314.pyc +0 -0
  155. package/templates/scripts/__pycache__/generate_baselines.cpython-312.pyc +0 -0
  156. package/templates/scripts/__pycache__/generate_baselines.cpython-314.pyc +0 -0
  157. package/templates/scripts/__pycache__/generate_brief.cpython-312.pyc +0 -0
  158. package/templates/scripts/__pycache__/generate_brief.cpython-314.pyc +0 -0
  159. package/templates/scripts/__pycache__/generate_changelog.cpython-312.pyc +0 -0
  160. package/templates/scripts/__pycache__/generate_changelog.cpython-314.pyc +0 -0
  161. package/templates/scripts/__pycache__/generate_figures.cpython-312.pyc +0 -0
  162. package/templates/scripts/__pycache__/generate_figures.cpython-314.pyc +0 -0
  163. package/templates/scripts/__pycache__/generate_logbook.cpython-312.pyc +0 -0
  164. package/templates/scripts/__pycache__/generate_logbook.cpython-314.pyc +0 -0
  165. package/templates/scripts/__pycache__/generate_model_card.cpython-312.pyc +0 -0
  166. package/templates/scripts/__pycache__/generate_model_card.cpython-314.pyc +0 -0
  167. package/templates/scripts/__pycache__/generate_onboarding.cpython-312.pyc +0 -0
  168. package/templates/scripts/__pycache__/generate_onboarding.cpython-314.pyc +0 -0
  169. package/templates/scripts/__pycache__/harness_doctor.cpython-312.pyc +0 -0
  170. package/templates/scripts/__pycache__/harness_doctor.cpython-314.pyc +0 -0
  171. package/templates/scripts/__pycache__/incremental_update.cpython-312.pyc +0 -0
  172. package/templates/scripts/__pycache__/incremental_update.cpython-314.pyc +0 -0
  173. package/templates/scripts/__pycache__/knowledge_transfer.cpython-312.pyc +0 -0
  174. package/templates/scripts/__pycache__/knowledge_transfer.cpython-314.pyc +0 -0
  175. package/templates/scripts/__pycache__/latency_benchmark.cpython-312.pyc +0 -0
  176. package/templates/scripts/__pycache__/latency_benchmark.cpython-314.pyc +0 -0
  177. package/templates/scripts/__pycache__/leakage_detector.cpython-312.pyc +0 -0
  178. package/templates/scripts/__pycache__/leakage_detector.cpython-314.pyc +0 -0
  179. package/templates/scripts/__pycache__/literature_search.cpython-312.pyc +0 -0
  180. package/templates/scripts/__pycache__/literature_search.cpython-314.pyc +0 -0
  181. package/templates/scripts/__pycache__/log_experiment.cpython-312.pyc +0 -0
  182. package/templates/scripts/__pycache__/log_experiment.cpython-314.pyc +0 -0
  183. package/templates/scripts/__pycache__/manage_hypotheses.cpython-312.pyc +0 -0
  184. package/templates/scripts/__pycache__/manage_hypotheses.cpython-314.pyc +0 -0
  185. package/templates/scripts/__pycache__/methodology_audit.cpython-312.pyc +0 -0
  186. package/templates/scripts/__pycache__/methodology_audit.cpython-314.pyc +0 -0
  187. package/templates/scripts/__pycache__/model_distiller.cpython-312.pyc +0 -0
  188. package/templates/scripts/__pycache__/model_distiller.cpython-314.pyc +0 -0
  189. package/templates/scripts/__pycache__/model_lifecycle.cpython-312.pyc +0 -0
  190. package/templates/scripts/__pycache__/model_lifecycle.cpython-314.pyc +0 -0
  191. package/templates/scripts/__pycache__/model_merger.cpython-312.pyc +0 -0
  192. package/templates/scripts/__pycache__/model_merger.cpython-314.pyc +0 -0
  193. package/templates/scripts/__pycache__/model_pruning.cpython-312.pyc +0 -0
  194. package/templates/scripts/__pycache__/model_pruning.cpython-314.pyc +0 -0
  195. package/templates/scripts/__pycache__/model_quantization.cpython-312.pyc +0 -0
  196. package/templates/scripts/__pycache__/model_quantization.cpython-314.pyc +0 -0
  197. package/templates/scripts/__pycache__/model_xray.cpython-312.pyc +0 -0
  198. package/templates/scripts/__pycache__/model_xray.cpython-314.pyc +0 -0
  199. package/templates/scripts/__pycache__/novelty_guard.cpython-312.pyc +0 -0
  200. package/templates/scripts/__pycache__/novelty_guard.cpython-314.pyc +0 -0
  201. package/templates/scripts/__pycache__/package_experiments.cpython-312.pyc +0 -0
  202. package/templates/scripts/__pycache__/package_experiments.cpython-314.pyc +0 -0
  203. package/templates/scripts/__pycache__/pareto_frontier.cpython-312.pyc +0 -0
  204. package/templates/scripts/__pycache__/pareto_frontier.cpython-314.pyc +0 -0
  205. package/templates/scripts/__pycache__/parse_metrics.cpython-312.pyc +0 -0
  206. package/templates/scripts/__pycache__/parse_metrics.cpython-314.pyc +0 -0
  207. package/templates/scripts/__pycache__/pipeline_manager.cpython-312.pyc +0 -0
  208. package/templates/scripts/__pycache__/pipeline_manager.cpython-314.pyc +0 -0
  209. package/templates/scripts/__pycache__/profile_training.cpython-312.pyc +0 -0
  210. package/templates/scripts/__pycache__/profile_training.cpython-314.pyc +0 -0
  211. package/templates/scripts/__pycache__/regression_gate.cpython-312.pyc +0 -0
  212. package/templates/scripts/__pycache__/regression_gate.cpython-314.pyc +0 -0
  213. package/templates/scripts/__pycache__/reproduce_experiment.cpython-312.pyc +0 -0
  214. package/templates/scripts/__pycache__/reproduce_experiment.cpython-314.pyc +0 -0
  215. package/templates/scripts/__pycache__/research_planner.cpython-312.pyc +0 -0
  216. package/templates/scripts/__pycache__/research_planner.cpython-314.pyc +0 -0
  217. package/templates/scripts/__pycache__/sanity_checks.cpython-312.pyc +0 -0
  218. package/templates/scripts/__pycache__/sanity_checks.cpython-314.pyc +0 -0
  219. package/templates/scripts/__pycache__/scaffold.cpython-312.pyc +0 -0
  220. package/templates/scripts/__pycache__/scaffold.cpython-314.pyc +0 -0
  221. package/templates/scripts/__pycache__/scaling_estimator.cpython-312.pyc +0 -0
  222. package/templates/scripts/__pycache__/scaling_estimator.cpython-314.pyc +0 -0
  223. package/templates/scripts/__pycache__/seed_runner.cpython-312.pyc +0 -0
  224. package/templates/scripts/__pycache__/seed_runner.cpython-314.pyc +0 -0
  225. package/templates/scripts/__pycache__/sensitivity_analysis.cpython-312.pyc +0 -0
  226. package/templates/scripts/__pycache__/sensitivity_analysis.cpython-314.pyc +0 -0
  227. package/templates/scripts/__pycache__/session_flashback.cpython-312.pyc +0 -0
  228. package/templates/scripts/__pycache__/session_flashback.cpython-314.pyc +0 -0
  229. package/templates/scripts/__pycache__/show_experiment_tree.cpython-312.pyc +0 -0
  230. package/templates/scripts/__pycache__/show_experiment_tree.cpython-314.pyc +0 -0
  231. package/templates/scripts/__pycache__/show_families.cpython-312.pyc +0 -0
  232. package/templates/scripts/__pycache__/show_families.cpython-314.pyc +0 -0
  233. package/templates/scripts/__pycache__/simulate_review.cpython-312.pyc +0 -0
  234. package/templates/scripts/__pycache__/simulate_review.cpython-314.pyc +0 -0
  235. package/templates/scripts/__pycache__/smart_retry.cpython-312.pyc +0 -0
  236. package/templates/scripts/__pycache__/smart_retry.cpython-314.pyc +0 -0
  237. package/templates/scripts/__pycache__/statistical_compare.cpython-312.pyc +0 -0
  238. package/templates/scripts/__pycache__/statistical_compare.cpython-314.pyc +0 -0
  239. package/templates/scripts/__pycache__/suggest_next.cpython-312.pyc +0 -0
  240. package/templates/scripts/__pycache__/suggest_next.cpython-314.pyc +0 -0
  241. package/templates/scripts/__pycache__/sweep.cpython-312.pyc +0 -0
  242. package/templates/scripts/__pycache__/sweep.cpython-314.pyc +0 -0
  243. package/templates/scripts/__pycache__/synthesize_decision.cpython-312.pyc +0 -0
  244. package/templates/scripts/__pycache__/synthesize_decision.cpython-314.pyc +0 -0
  245. package/templates/scripts/__pycache__/training_monitor.cpython-312.pyc +0 -0
  246. package/templates/scripts/__pycache__/training_monitor.cpython-314.pyc +0 -0
  247. package/templates/scripts/__pycache__/treequest_suggest.cpython-312.pyc +0 -0
  248. package/templates/scripts/__pycache__/treequest_suggest.cpython-314.pyc +0 -0
  249. package/templates/scripts/__pycache__/trend_analysis.cpython-312.pyc +0 -0
  250. package/templates/scripts/__pycache__/trend_analysis.cpython-314.pyc +0 -0
  251. package/templates/scripts/__pycache__/turing_io.cpython-312.pyc +0 -0
  252. package/templates/scripts/__pycache__/turing_io.cpython-314.pyc +0 -0
  253. package/templates/scripts/__pycache__/update_state.cpython-312.pyc +0 -0
  254. package/templates/scripts/__pycache__/update_state.cpython-314.pyc +0 -0
  255. package/templates/scripts/__pycache__/verify_placeholders.cpython-312.pyc +0 -0
  256. package/templates/scripts/__pycache__/verify_placeholders.cpython-314.pyc +0 -0
  257. package/templates/scripts/__pycache__/warm_start.cpython-312.pyc +0 -0
  258. package/templates/scripts/__pycache__/warm_start.cpython-314.pyc +0 -0
  259. package/templates/scripts/__pycache__/whatif_engine.cpython-312.pyc +0 -0
  260. package/templates/scripts/__pycache__/whatif_engine.cpython-314.pyc +0 -0
@@ -0,0 +1,36 @@
+ ---
+ name: card
+ description: Generate a standardized model card documenting the trained model — type, performance, training data, limitations, intended use, and artifact contract.
+ disable-model-invocation: true
+ allowed-tools: Read, Bash(python scripts/*:*, source .venv/bin/activate:*), Grep, Glob
+ ---
+
+ You generate a standardized model card from the experiment log, model contract, and config.
+
+ ## Steps
+
+ 1. **Activate the virtual environment:**
+ ```bash
+ source .venv/bin/activate
+ ```
+
+ 2. **Run the model card generator:**
+ ```bash
+ python scripts/generate_model_card.py --config config.yaml --log experiments/log.jsonl --contract model_contract.md --output MODEL_CARD.md
+ ```
+
+ 3. **Read and present the generated card:**
+ - Read `MODEL_CARD.md` and display it to the user.
+ - If no experiments exist yet, inform the user and show the skeleton card.
+
+ 4. **Suggest next steps:**
+ - Review the **Ethical Considerations** section and fill in bias, fairness, and impact notes.
+ - Review the **Intended Use** section and document what the model is NOT intended for.
+ - If limitations mention overfitting, suggest running `/turing:validate` for stability checks.
+ - If the card looks complete, suggest committing it to version control.
+
+ ## Error Handling
+
+ - If `config.yaml` is missing, tell the user to run `/turing:init` first.
+ - If `experiments/log.jsonl` is missing or empty, generate a skeleton card and note that training is needed.
+ - If `.venv` doesn't exist, try `python3 scripts/generate_model_card.py` directly.
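The skeleton-card fallback above hinges on one decision: which run, if any, to document. A minimal sketch of that selection step, assuming a JSONL log with `id` and `metric` fields (the real `log.jsonl` schema is not shown in this diff):

```python
import json

def best_run(log_lines):
    """Pick the highest-scoring run from JSONL experiment-log lines;
    return None when the log is empty, signalling a skeleton card."""
    runs = [json.loads(line) for line in log_lines if line.strip()]
    return max(runs, key=lambda r: r["metric"]) if runs else None

log = [
    '{"id": "exp-001", "metric": 0.81}',
    '{"id": "exp-002", "metric": 0.87}',
]
top = best_run(log)    # the run the card would describe
empty = best_run([])   # empty log: fall back to the skeleton card
```

An empty or missing log therefore degrades gracefully instead of failing, which matches the error-handling notes above.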
@@ -0,0 +1,22 @@
+ ---
+ name: changelog
+ description: Model changelog generation — auto-generate human-readable progress narrative from experiment history for stakeholders.
+ disable-model-invocation: true
+ argument-hint: "[--since exp-id|date] [--audience technical|stakeholder]"
+ allowed-tools: Read, Bash(*), Grep, Glob
+ ---
+
+ Translate experiment logs into a narrative that PMs and stakeholders can read in 2 minutes.
+
+ ## Steps
+ 1. **Activate environment:** `source .venv/bin/activate`
+ 2. **Run:** `python scripts/generate_changelog.py $ARGUMENTS`
+ 3. **Audience:** technical (experiment IDs, configs), stakeholder (plain English, percentages)
+ 4. **Saved output:** `paper/CHANGELOG.md`
+
+ ## Examples
+ ```
+ /turing:changelog                        # Full changelog
+ /turing:changelog --audience stakeholder # Non-technical summary
+ /turing:changelog --since exp-042        # Since specific experiment
+ ```
@@ -0,0 +1,47 @@
+ ---
+ name: checkpoint
+ description: Smart checkpoint management — list, prune (Pareto-based), average top-K, resume from any point, disk usage stats.
+ disable-model-invocation: true
+ argument-hint: "<list|prune|average|resume|stats> [exp-id] [--top 3] [--dry-run]"
+ allowed-tools: Read, Bash(*), Grep, Glob
+ ---
+
+ Manage model checkpoints intelligently using Pareto dominance.
+
+ ## Steps
+
+ 1. **Activate environment:**
+ ```bash
+ source .venv/bin/activate
+ ```
+
+ 2. **Parse arguments from `$ARGUMENTS`:**
+ - First word is the action: `list`, `prune`, `average`, `resume`, `stats`
+ - `resume` requires an experiment ID as second argument
+ - `--top 3` sets the number of checkpoints for averaging
+ - `--dry-run` previews pruning without deleting
+
+ 3. **Run checkpoint manager:**
+ ```bash
+ python scripts/checkpoint_manager.py $ARGUMENTS
+ ```
+
+ 4. **Report results by action:**
+ - **list:** Table of all checkpoints with metrics, size, and Pareto status
+ - **prune:** Removes dominated checkpoints, reports space saved
+ - **average:** Lists top-K checkpoints for weight averaging
+ - **resume:** Locates checkpoint for a specific experiment
+ - **stats:** Disk usage summary by total, average, and model type
+
+ 5. **Saved output:** report written to `experiments/checkpoints/checkpoint-report.yaml`
+
+ ## Examples
+
+ ```
+ /turing:checkpoint list            # Show all checkpoints
+ /turing:checkpoint stats           # Disk usage summary
+ /turing:checkpoint prune --dry-run # Preview what would be pruned
+ /turing:checkpoint prune           # Remove dominated checkpoints
+ /turing:checkpoint average --top 5 # Top 5 for averaging
+ /turing:checkpoint resume exp-042  # Resume from checkpoint
+ ```
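Pareto-based pruning, as the checkpoint skill describes it, keeps a checkpoint only when no other checkpoint beats it on every axis at once. A hypothetical sketch under the assumption of one quality metric and one size cost (the packaged `checkpoint_manager.py` is not shown in this diff, and its actual metric fields may differ):

```python
def dominates(a, b, metrics=("accuracy",), cost="size_mb"):
    """True if checkpoint a is at least as good as b on every metric,
    no larger on disk, and strictly better somewhere."""
    better_somewhere = False
    for m in metrics:
        if a[m] < b[m]:
            return False
        if a[m] > b[m]:
            better_somewhere = True
    if a[cost] > b[cost]:
        return False
    if a[cost] < b[cost]:
        better_somewhere = True
    return better_somewhere

def pareto_prune(checkpoints, **kw):
    """Keep only checkpoints not dominated by any other checkpoint."""
    return [c for c in checkpoints
            if not any(dominates(o, c, **kw) for o in checkpoints if o is not c)]

ckpts = [
    {"id": "exp-001", "accuracy": 0.90, "size_mb": 120},
    {"id": "exp-002", "accuracy": 0.92, "size_mb": 110},  # dominates exp-001
    {"id": "exp-003", "accuracy": 0.89, "size_mb": 40},   # much smaller: kept
]
kept = pareto_prune(ckpts)
```

Under this rule a worse-but-smaller checkpoint survives, which is why `prune` can free disk without discarding every non-best run.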
@@ -0,0 +1,23 @@
+ ---
+ name: cite
+ description: Citation & attribution manager — track papers, datasets, methods. Audit for missing citations, generate BibTeX.
+ disable-model-invocation: true
+ argument-hint: "<add|list|check|bib> [--key Chen2016 --title XGBoost --url ...]"
+ allowed-tools: Read, Bash(*), Grep, Glob
+ ---
+
+ Track which papers and methods influenced each experiment. Catch missing citations before submission.
+
+ ## Steps
+ 1. **Activate environment:** `source .venv/bin/activate`
+ 2. **Run:** `python scripts/citation_manager.py $ARGUMENTS`
+ 3. **Operations:** add (associate citation with experiment), list (group by type), check (audit missing), bib (BibTeX)
+ 4. **Stored in:** `experiments/citations.yaml`
+
+ ## Examples
+ ```
+ /turing:cite add exp-042 --key Chen2016 --title "XGBoost" --type method --url "https://arxiv.org/abs/1603.02754"
+ /turing:cite list
+ /turing:cite check # Audit for missing citations
+ /turing:cite bib # Generate BibTeX
+ ```
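The `bib` operation presumably serializes the tracked entries into BibTeX. A minimal sketch of that formatting step, assuming a flat key/title/url record per citation (the real `citations.yaml` schema and `citation_manager.py` are not part of this diff):

```python
def to_bibtex(key, title, url, entry_type="misc"):
    """Format one tracked citation as a BibTeX entry string."""
    return (
        f"@{entry_type}{{{key},\n"
        f"  title = {{{title}}},\n"
        f"  howpublished = {{\\url{{{url}}}}},\n"
        f"}}"
    )

entry = to_bibtex("Chen2016", "XGBoost", "https://arxiv.org/abs/1603.02754")
```

A real exporter would map citation types (method, dataset, paper) to proper BibTeX entry types rather than defaulting everything to `@misc`.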
@@ -0,0 +1,24 @@
+ ---
+ name: compare
+ description: Compare two ML experiment runs side-by-side — metrics, configuration deltas, and a verdict on which approach is more promising.
+ disable-model-invocation: true
+ argument-hint: "<exp-id-1> <exp-id-2>"
+ allowed-tools: Read, Bash(*), Grep, Glob
+ ---
+
+ Compare two ML experiment runs side-by-side to understand what changed and why one performed better.
+
+ ## Steps
+
+ 1. **Run comparison:**
+ ```bash
+ source .venv/bin/activate && python scripts/compare_runs.py $1 $2
+ ```
+
+ 2. **Analyze the delta:**
+ - **Metric differences:** all configured metrics for both runs
+ - **Configuration delta:** what changed (model type, hyperparameters, features)
+ - **Causal analysis:** which changes likely caused the metric difference
+ - **Verdict:** which approach is more promising for future experiments
+
+ 3. **If either ID is missing:** report the error and suggest `/turing:status` to see available experiment IDs.
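The configuration-delta step above can be sketched as a keyed difference over two flat config dicts (an illustration only; `compare_runs.py` itself is not included in this diff, and the field names are assumptions):

```python
def config_delta(cfg_a, cfg_b):
    """Return {key: (value_in_a, value_in_b)} for every key whose value
    differs between two flat experiment configs. None marks an absent key
    (so a genuine None value is indistinguishable in this sketch)."""
    keys = set(cfg_a) | set(cfg_b)
    return {k: (cfg_a.get(k), cfg_b.get(k))
            for k in sorted(keys) if cfg_a.get(k) != cfg_b.get(k)}

a = {"model": "xgboost", "lr": 0.1, "max_depth": 6}
b = {"model": "xgboost", "lr": 0.05, "max_depth": 6, "subsample": 0.8}
delta = config_delta(a, b)
```

Restricting the report to changed keys is what makes the causal-analysis step tractable: only the listed keys can explain the metric difference.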
@@ -0,0 +1,27 @@
+ ---
+ name: counterfactual
+ description: Input-level counterfactual explanations — find the smallest input change to flip a prediction.
+ disable-model-invocation: true
+ argument-hint: "<exp-id> --sample <index> [--target <class>]"
+ allowed-tools: Read, Bash(*), Grep, Glob
+ ---
+
+ What would need to change to flip this prediction? Minimum-change counterfactual for individual predictions.
+
+ ## Steps
+ 1. `source .venv/bin/activate`
+ 2. `python scripts/counterfactual_explanation.py $ARGUMENTS`
+ 3. **Saved:** `experiments/counterfactuals/`
+
+ ## Methods
+ - **Greedy perturbation:** change one feature at a time, find minimum flip
+ - **Prototype-based:** find nearest training sample from target class
+ - Both methods run and the best (smallest distance) is selected
+
+ ## Examples
+ ```
+ /turing:counterfactual exp-042 --sample 1247
+ /turing:counterfactual exp-042 --sample 1247 --target 0
+ /turing:counterfactual exp-042 --batch-misclassified
+ /turing:counterfactual exp-042 --sample 500 --json
+ ```
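Greedy perturbation, the first method listed, can be sketched on a toy model as follows (the packaged `counterfactual_explanation.py` is not shown in this diff; the candidate grid and toy classifier here are assumptions for illustration):

```python
def greedy_counterfactual(predict, x, target, grid):
    """Try single-feature edits from a candidate grid; return the edit
    with the smallest absolute change that flips the prediction."""
    best = None
    for i, values in grid.items():
        for v in values:
            cand = list(x)
            cand[i] = v
            if predict(cand) == target:
                dist = abs(v - x[i])
                if best is None or dist < best[0]:
                    best = (dist, i, v)
    return best  # (distance, feature_index, new_value), or None if no flip found

# Toy classifier: class 1 iff the two features sum past 1.0
predict = lambda x: int(x[0] + x[1] > 1.0)
x = [0.2, 0.4]                     # currently predicted 0
grid = {0: [0.7, 0.9], 1: [0.85]}  # candidate replacement values
dist, idx, val = greedy_counterfactual(predict, x, target=1, grid=grid)
```

A prototype-based method would instead return the nearest real training sample from the target class; running both and keeping the smaller distance, as the skill describes, hedges against greedy search producing an unrealistic input.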
@@ -0,0 +1,43 @@
+ ---
+ name: curriculum
+ description: Training curriculum optimization — order data by difficulty, compare easy-to-hard vs hard-to-easy vs self-paced strategies.
+ disable-model-invocation: true
+ argument-hint: "[exp-id] [--strategies easy-to-hard,random]"
+ allowed-tools: Read, Bash(*), Grep, Glob
+ ---
+
+ Does the order in which your model sees data matter? Find out systematically.
+
+ ## Steps
+
+ 1. **Activate environment:**
+ ```bash
+ source .venv/bin/activate
+ ```
+
+ 2. **Parse arguments from `$ARGUMENTS`:**
+ - Optional experiment ID
+ - `--strategies "easy_to_hard,hard_to_easy,self_paced,random"` — strategies to test
+ - `--json` — raw JSON output
+
+ 3. **Run curriculum analysis:**
+ ```bash
+ python scripts/curriculum_optimizer.py $ARGUMENTS
+ ```
+
+ 4. **Strategies tested:**
+ - **Random:** standard shuffling (control)
+ - **Easy-to-hard:** classic curriculum learning
+ - **Hard-to-easy:** anti-curriculum
+ - **Self-paced:** start easy, gradually include harder samples
+
+ 5. **Report includes:** strategy comparison table with metric, convergence epoch, and speedup vs random; impossible sample detection (likely mislabeled)
+
+ 6. **Saved output:** report in `experiments/curriculum/<exp-id>-curriculum.yaml`
+
+ ## Examples
+
+ ```
+ /turing:curriculum exp-042                          # All strategies
+ /turing:curriculum --strategies easy_to_hard,random # Specific strategies
+ ```
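The strategies in step 4 differ only in how a per-sample difficulty score is turned into a presentation order. A sketch under the assumption that difficulty comes from a cheap baseline model's per-sample loss (the packaged `curriculum_optimizer.py` is not shown here, and self-paced scheduling is omitted for brevity):

```python
def curriculum_order(samples, difficulty, strategy="easy_to_hard"):
    """Return samples reordered by their difficulty scores for the
    given strategy (easy_to_hard, hard_to_easy, or random control)."""
    order = sorted(range(len(samples)), key=lambda i: difficulty[i])
    if strategy == "hard_to_easy":
        order.reverse()
    elif strategy == "random":
        import random
        random.shuffle(order)  # control: standard shuffling
    return [samples[i] for i in order]

samples    = ["a", "b", "c", "d"]
difficulty = [0.9, 0.1, 0.5, 0.3]  # e.g. baseline-model loss per sample
easy_first = curriculum_order(samples, difficulty)
hard_first = curriculum_order(samples, difficulty, "hard_to_easy")
```

Samples that stay high-loss under every ordering are the "impossible sample" candidates the report flags as likely mislabeled.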
@@ -0,0 +1,97 @@
+ ---
+ name: design
+ description: Generate a structured experiment design for a hypothesis. Reads experiment history, searches literature for methodology, produces a scored design document at experiments/designs/.
+ disable-model-invocation: true
+ argument-hint: "<hypothesis-id or description>"
+ allowed-tools: Read, Write, Bash(python scripts/*:*, source .venv/bin/activate:*, mkdir:*), Grep, Glob, WebSearch, WebFetch
+ ---
+
+ Front-load the thinking before the coding. Given a hypothesis, produce a structured experiment design grounded in methodology from the literature.
+
+ ## Steps
+
+ ### 1. Load Context
+
+ If `$ARGUMENTS` matches `hyp-NNN`, load the hypothesis:
+ ```bash
+ source .venv/bin/activate && python scripts/manage_hypotheses.py show $ARGUMENTS
+ ```
+
+ If freeform text, use it directly as the hypothesis description.
+
+ Read the current config and experiment state:
+ ```bash
+ cat config.yaml
+ ```
+ ```bash
+ source .venv/bin/activate && python scripts/show_metrics.py --last 10 2>/dev/null || echo "No experiments yet"
+ ```
+ ```bash
+ cat experiment_state.yaml 2>/dev/null || echo "No experiment state yet"
+ ```
+
+ ### 2. Search for Methodology
+
+ Use `WebSearch` to find 2-3 papers or articles describing how to implement the proposed change effectively. Target:
+ - The specific technique in the hypothesis (e.g., "LightGBM dart boosting implementation best practices")
+ - Common pitfalls for this type of change
+ - Benchmark results showing expected improvement range
+
+ Use `WebFetch` on the most relevant results to extract specific methodology details: hyperparameter recommendations, training procedures, evaluation approaches.
+
+ ### 3. Write the Design Document
+
+ Create `experiments/designs/<hyp-id>-design.md` (or `experiments/designs/adhoc-<date>-design.md` for freeform hypotheses):
+
+ ```bash
+ mkdir -p experiments/designs
+ ```
+
+ Write with this structure:
+
+ ```markdown
+ # Experiment Design: <hypothesis summary>
+
+ ## Hypothesis
+ <full description>
+
+ ## Objective
+ <what we're testing, stated as a falsifiable claim>
+
+ ## Method
+ <specific changes, grounded in literature findings>
+
+ ## Literature Support
+ - <source 1>: <what it says about this approach>
+ - <source 2>: <relevant finding>
+
+ ## Implementation Plan
+ ### Changes to train.py
+ <concrete code changes needed>
+
+ ### Changes to config.yaml (if any)
+ <hyperparameter values to set, with rationale from literature>
+
+ ## Expected Outcome
+ - **Success:** <metric > threshold, specific number>
+ - **Failure:** <what would disprove the hypothesis>
+
+ ## Risks
+ <specific pitfalls from literature, not generic "might not work">
+
+ ## Estimated Runs
+ <how many iterations>
+ ```
+
+ ### 4. Self-Critique
+
+ Review the design:
+ - Is the implementation plan specific enough for the researcher agent to execute without ambiguity?
+ - Does the expected outcome have a concrete metric threshold?
+ - Are risks actionable?
+
+ Score each dimension 1-10 (feasibility, novelty, clarity). If any < 7, revise that section. Max 2 revision rounds.
+
+ ### 5. Report
+
+ Display the design summary with scores and file location. The researcher agent can read the design during `/turing:train`.
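The step-4 revise loop is a simple gate: score three dimensions, revise whatever falls below 7, and repeat at most twice. As a sketch (the scoring and revision themselves are done by the model; the `revise` callback here is a stand-in, not part of the package):

```python
def critique_gate(scores, revise, min_score=7, max_rounds=2):
    """Revise every dimension scoring below min_score, for at most
    max_rounds rounds; return the final scores."""
    for _ in range(max_rounds):
        weak = [d for d, s in scores.items() if s < min_score]
        if not weak:
            break
        for d in weak:
            scores[d] = revise(d, scores[d])
    return scores

scores = {"feasibility": 6, "novelty": 8, "clarity": 9}
final = critique_gate(scores, revise=lambda dim, s: s + 2)
```

Capping the loop at two rounds keeps the design phase from stalling on a dimension that revision cannot fix.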
@@ -0,0 +1,52 @@
---
name: diagnose
description: Error analysis — cluster failure cases, identify systematic failure modes, and suggest targeted fixes with auto-queued hypotheses.
disable-model-invocation: true
argument-hint: "[exp-id] [--auto-queue] [--top 5]"
allowed-tools: Read, Bash(*), Grep, Glob
---

Analyze where and why the model fails, beyond aggregate metrics.

## Steps

1. **Activate environment:**
   ```bash
   source .venv/bin/activate
   ```

2. **Generate predictions if needed:**
   Check if `experiments/predictions/exp-NNN-preds.yaml` exists. If not, run:
   ```bash
   python train.py --predict-only --output experiments/predictions/
   ```
   The predictions file must contain `y_true`, `y_pred`, `task_type`, and optionally `features`.

3. **Parse arguments from `$ARGUMENTS`:**
   - First argument can be an experiment ID (e.g., `exp-042`); defaults to the best experiment
   - `--auto-queue` auto-queues hypotheses from failure modes into `hypotheses.yaml`
   - `--top 5` limits output to the top N failure modes (default 5)

4. **Run error analysis:**
   ```bash
   python scripts/diagnose_errors.py $ARGUMENTS
   ```

5. **Report results:**
   - **Classification:** confusion matrix, most-confused pairs, per-class P/R/F1, low-recall classes
   - **Regression:** residual stats, P90/P95 errors, feature-range bias, systematic bias
   - **Failure modes:** ranked by impact, with suggested fixes
   - **Auto-hypotheses:** if `--auto-queue`, shows queued hypotheses targeting weaknesses
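The classification half of this report reduces to per-class counting over `y_true`/`y_pred`. A minimal sketch of the low-recall check, assuming plain label lists (an illustration, not the actual `diagnose_errors.py` implementation):

```python
from collections import Counter

def per_class_recall(y_true, y_pred):
    """Recall per class: of all true instances of c, the fraction predicted c."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / n for c, n in support.items()}

def low_recall_classes(y_true, y_pred, threshold=0.5):
    """Classes the model systematically misses: candidates for targeted fixes."""
    recall = per_class_recall(y_true, y_pred)
    return sorted(c for c, r in recall.items() if r < threshold)

# Toy example: the model never recovers "bird".
y_true = ["cat", "cat", "dog", "dog", "bird", "bird"]
y_pred = ["cat", "cat", "dog", "cat", "cat", "dog"]
```

A class with high aggregate accuracy can still appear here, which is the point of step 5's per-class breakdown.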

6. **Saved output:** report written to `experiments/diagnoses/exp-NNN-diagnosis.yaml`

7. **If no predictions file exists:** instruct the user to run the model on the validation set first.

## Examples

```
/turing:diagnose               # Analyze best experiment
/turing:diagnose exp-042       # Specific experiment
/turing:diagnose --auto-queue  # Queue fix hypotheses
/turing:diagnose --top 10      # Top 10 failure modes
```
@@ -0,0 +1,48 @@
---
name: diff
description: Deep experiment comparison — config diffs, metric significance, per-class regressions, training curve divergence, feature importance shifts.
disable-model-invocation: true
argument-hint: "<exp-a> <exp-b> [--code] [--json]"
allowed-tools: Read, Bash(*), Grep, Glob
---

Deep diagnostic comparison of two experiments. Goes beyond "which metric is higher" to show where, when, and why two experiments diverge.

## Steps

1. **Activate environment:**
   ```bash
   source .venv/bin/activate
   ```

2. **Parse arguments from `$ARGUMENTS`:**
   - First two arguments are experiment IDs (required), e.g. `exp-042 exp-053`
   - `--code` includes a git diff of train.py between the two experiments' commits
   - `--json` outputs raw JSON instead of markdown

3. **Run deep comparison:**
   ```bash
   python scripts/experiment_diff.py $ARGUMENTS
   ```

4. **Report results — the diff includes:**
   - **Config diff:** which hyperparameters changed, with magnitude (e.g., `max_depth: 6 → 8 (+33%)`)
   - **Metric diff:** all metrics with deltas and statistical significance (if seed studies exist)
   - **Per-class diff:** which classes improved/regressed — flags regressions hidden by aggregate improvement
   - **Training curve divergence:** the epoch where the two experiments' loss/metric curves separate
   - **Feature importance shifts:** which features gained/lost importance
   - **Code diff (`--code`):** git diff of train.py between the two commits
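The config-diff-with-magnitude format shown above can be sketched in a few lines. This is an illustration of the format, not the actual `experiment_diff.py` code:

```python
def config_diff(old: dict, new: dict) -> dict:
    """Changed hyperparameters with magnitude, e.g. {'max_depth': '6 -> 8 (+33%)'}."""
    out = {}
    for key in sorted(set(old) | set(new)):
        a, b = old.get(key), new.get(key)
        if a == b:
            continue  # unchanged; not part of the diff
        if isinstance(a, (int, float)) and isinstance(b, (int, float)) and a != 0:
            pct = (b - a) / abs(a) * 100  # relative change for numeric params
            out[key] = f"{a} -> {b} ({pct:+.0f}%)"
        else:
            out[key] = f"{a} -> {b}"  # categorical or newly added params
    return out
```

Surfacing the relative change matters because a `+2` step means very different things for `max_depth` and for `n_estimators`.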

5. **Saved output:** report written to `experiments/diffs/<exp-a>-vs-<exp-b>.yaml`

6. **If experiment ID not found:** list available experiment IDs from `experiments/log.jsonl`

7. **If no training pipeline exists:** suggest `/turing:init` first.

## Examples

```
/turing:diff exp-042 exp-053         # Full diagnostic comparison
/turing:diff exp-042 exp-053 --code  # Include train.py code changes
/turing:diff exp-001 exp-010 --json  # Raw JSON output
```
@@ -0,0 +1,56 @@
---
name: distill
description: Model compression via distillation — train a smaller student model to match a larger teacher's predictions.
disable-model-invocation: true
argument-hint: "<teacher-exp-id> [--compression 4] [--method soft_labels]"
allowed-tools: Read, Bash(*), Grep, Glob
---

Compress a large model into a smaller, faster one for production. Measures the accuracy/size/latency tradeoff.

## Steps

1. **Activate environment:**
   ```bash
   source .venv/bin/activate
   ```

2. **Parse arguments from `$ARGUMENTS`:**
   - First argument is the teacher experiment ID (required)
   - `--compression 4` — compression ratio (default: 4x)
   - `--method soft_labels|feature_matching|dataset_distillation` — distillation method
   - `--target-latency 5` — auto-adjust compression to meet a latency target (ms)
   - `--json` — raw JSON output

3. **Run distillation planner:**
   ```bash
   python scripts/model_distiller.py $ARGUMENTS
   ```

4. **Report includes:**
   - Teacher model metrics
   - Auto-selected student architecture (fewer trees/layers/width)
   - Estimated size reduction and latency improvement
   - Distillation configuration (temperature, alpha, loss function)
   - Verdict: EXCELLENT / ACCEPTABLE / MARGINAL / TOO MUCH LOSS

5. **Student selection by model type:**
   - **Tree models:** fewer estimators, shallower depth
   - **Neural networks:** fewer layers, narrower hidden dims
   - **scikit-learn:** simpler model family (RandomForest → DecisionTree)

6. **Distillation methods:**
   - **soft_labels:** train on the teacher's probability outputs with temperature scaling
   - **feature_matching:** align intermediate representations (neural only)
   - **dataset_distillation:** train on teacher-labeled synthetic data
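The core of the soft_labels method is a cross-entropy between temperature-softened teacher and student distributions. A minimal pure-Python sketch under that assumption (a real trainer would use its framework's loss functions and combine this with the hard-label loss via alpha):

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over logits; higher temperature flattens the distribution."""
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_label_loss(teacher_logits, student_logits, temperature=4.0):
    """Cross-entropy between softened teacher and student distributions.

    Minimized when the student's distribution matches the teacher's, so the
    student learns the teacher's relative confidences, not just its argmax.
    """
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)
    return -sum(ti * math.log(si) for ti, si in zip(t, s))
```

The temperature is what makes distillation work: at T > 1 the teacher's near-zero probabilities become informative "dark knowledge" about class similarity.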

7. **Saved output:** report written to `experiments/distillations/distill-<exp-id>.yaml`

## Examples

```
/turing:distill exp-042                           # 4x compression, soft labels
/turing:distill exp-042 --compression 8           # Aggressive compression
/turing:distill exp-042 --method feature_matching # Neural feature alignment
/turing:distill exp-042 --target-latency 5        # Meet 5ms latency target
```
@@ -0,0 +1,31 @@
---
name: doctor
description: Harness self-diagnosis — check environment, project, resources, and git state. Auto-fix common issues.
disable-model-invocation: true
argument-hint: "[--fix] [--verbose] [--json]"
allowed-tools: Read, Bash(*), Grep, Glob
---

Is Turing healthy? Check everything and get a score.

## Steps
1. `source .venv/bin/activate`
2. `python scripts/harness_doctor.py $ARGUMENTS`
3. **Saved:** `experiments/doctor/`

## Checks
- **Environment:** Python version, venv status
- **Dependencies:** all required packages importable
- **Config:** config.yaml valid with required fields
- **Experiment log:** JSONL integrity, corrupt line detection
- **Scripts:** train.py, prepare.py, evaluate.py exist and parse
- **Disk space:** warn if <1GB free
- **Git state:** uncommitted changes to critical files
- **Claude hooks:** `.claude/settings.local.json` hook group schema; `--fix` migrates legacy bare command hooks
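Several of these checks reduce to one-liners against the stdlib. A sketch assuming the harness's file layout (`run_checks` is hypothetical; the actual `harness_doctor.py` checks more and formats a scored report):

```python
import shutil
import sys
from pathlib import Path

def run_checks(root: Path = Path(".")) -> dict:
    """A few doctor-style checks, each reported as pass/fail."""
    free_gb = shutil.disk_usage(root).free / 1e9
    return {
        "python_ok": sys.version_info >= (3, 9),       # assumed minimum version
        "venv_present": (root / ".venv").exists(),
        "config_present": (root / "config.yaml").exists(),
        "disk_ok": free_gb >= 1.0,                     # warn below 1 GB free
    }

checks = run_checks()
score = 100 * sum(checks.values()) // len(checks)  # booleans sum to pass count
```

A percentage of passing checks is one plausible way to produce the health score the command reports.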

## Examples
```
/turing:doctor
/turing:doctor --fix
/turing:doctor --verbose --json
```
@@ -0,0 +1,54 @@
---
name: ensemble
description: Automated ensemble construction — combines top-K models via voting, stacking, and blending for zero-cost improvement.
disable-model-invocation: true
argument-hint: "[--top-k 5] [--methods voting,stacking,blending]"
allowed-tools: Read, Bash(*), Grep, Glob
---

Build ensembles from your best experiments automatically. Often yields a 1-3% improvement with zero additional training.

## Steps

1. **Activate environment:**
   ```bash
   source .venv/bin/activate
   ```

2. **Parse arguments from `$ARGUMENTS`:**
   - `--top-k 5` — number of top models to include (default: 5)
   - `--methods voting,stacking,blending` — ensemble methods to try
   - `--predictions-dir experiments/predictions` — directory with saved predictions
   - `--json` — raw JSON output

3. **Run ensemble construction:**
   ```bash
   python scripts/build_ensemble.py $ARGUMENTS
   ```

4. **Report results:**
   - Table of all ensemble methods tried, with metric deltas vs the best single model
   - Best ensemble method highlighted with improvement amount
   - Diversity analysis: prediction correlation matrix, diversity assessment
   - Base model summary: which experiments were combined

5. **Ensemble methods:**
   - **Voting:** majority vote (classification) or mean (regression)
   - **Weighted voting:** weights proportional to individual model performance
   - **Stacking:** cross-validated meta-learner (ridge/logistic) on out-of-fold predictions
   - **Blending:** holdout-based meta-learner (simpler, less data-efficient)
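Weighted voting is the simplest of these. A sketch for classification, with weights taken from each model's validation score (`weighted_vote` is a hypothetical helper, not the actual `build_ensemble.py` code):

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Classification ensemble: each model's vote counts proportionally to its weight.

    predictions: one prediction list per model, all the same length.
    weights: one weight per model, e.g. its validation accuracy.
    """
    out = []
    for i in range(len(predictions[0])):
        tally = defaultdict(float)
        for preds, w in zip(predictions, weights):
            tally[preds[i]] += w  # accumulate weighted votes for this sample
        out.append(max(tally, key=tally.get))
    return out

# Three models disagree on sample 2; the higher-weighted pair wins.
models = [["a", "b", "a"], ["a", "a", "b"], ["b", "a", "a"]]
weights = [0.9, 0.8, 0.7]
```

Equal weights recover plain majority voting; the diversity analysis in step 4 matters because highly correlated models add weight without adding information.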

6. **Prerequisites:** experiments must have saved predictions in `experiments/predictions/`. Each experiment needs `<exp-id>-predictions.npy` and a shared `labels.npy`.

7. **If no predictions exist:** suggest saving predictions during training by adding prediction logging to `evaluate.py`.

8. **Saved output:** report written to `experiments/ensembles/ensemble-*.yaml`

## Examples

```
/turing:ensemble                            # Default: top-5, all methods
/turing:ensemble --top-k 3                  # Top-3 models only
/turing:ensemble --methods voting,stacking  # Specific methods
/turing:ensemble --json                     # Machine-readable output
```
@@ -0,0 +1,107 @@
---
name: explore
description: Tree-search-guided hypothesis exploration using AB-MCTS. Explores the space of experiment ideas as a search tree, scored by the critique engine. Discovers non-obvious refinement chains that linear suggestion cannot find.
disable-model-invocation: true
argument-hint: "[ml/project] [--iterations N] [--top N] [--strategy abmcts-a|abmcts-m|greedy]"
allowed-tools: Read, Write, Bash(python scripts/*:*, source .venv/bin/activate:*), Grep, Glob
---

Explore the hypothesis space using tree search. Instead of suggesting independent ideas, this builds and searches a tree of refinement chains — each node is a hypothesis scored by novelty, feasibility, and expected impact.

## Project Detection

0. **Detect project directory:**
   - If `$ARGUMENTS` contains a path (e.g., `ml/coding`), use that as the project directory
   - Else if cwd contains `config.yaml` and `train.py`, use cwd
   - Else search for `ml/*/` subdirectories containing `config.yaml`
     - If exactly one is found, use it
     - If multiple are found, list them and ask the user which to target
   - All subsequent commands run from the detected project directory

## Parse Options

Extract from `$ARGUMENTS`:
- `--iterations N` — search depth (default: 30)
- `--top N` — number of results to return (default: 5)
- `--strategy` — algorithm choice: `abmcts-a` (default), `abmcts-m` (Bayesian), or `greedy` (no TreeQuest needed)
- `--seeds-only` — just show generated seeds without running the search
- `--json` — output as JSON for programmatic use

## Steps

### 1. Assess Current State

```bash
source .venv/bin/activate && python scripts/show_metrics.py --last 10 2>/dev/null || echo "No experiments yet"
```

Read `config.yaml` to understand the current model and metric.

### 2. Run Tree Search

```bash
source .venv/bin/activate && python scripts/treequest_suggest.py \
  --log experiments/log.jsonl \
  --config config.yaml \
  --top <N> \
  --iterations <N> \
  --strategy <strategy>
```

The script will:
- Generate seed hypotheses from the config and experiment history
- Run AB-MCTS (or the greedy fallback) over the hypothesis tree
- Score each node using the critique engine
- Return the top-K ranked, deduplicated hypotheses
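The greedy fallback is easy to picture: score the seeds, then repeatedly refine the current best node. A toy sketch where `score` and `refine` stand in for the critique engine and hypothesis refinement (not the actual `treequest_suggest.py` code, which adds dedup and the AB-MCTS strategies):

```python
def greedy_search(seeds, score, refine, iterations=30):
    """Expand the highest-scoring node each round; return all nodes ranked by score."""
    tree = {h: score(h) for h in seeds}  # node -> critique score
    for _ in range(iterations):
        best = max(tree, key=tree.get)
        child = refine(best)             # one refinement step off the best node
        if child not in tree:
            tree[child] = score(child)
    return sorted(tree, key=tree.get, reverse=True)

# Toy problem: hypotheses are numbers, refining adds 1, the critique prefers larger.
ranked = greedy_search(seeds=[1, 5, 3], score=lambda h: h,
                       refine=lambda h: h + 1, iterations=4)
```

Because each child is a refinement of its parent, the top results form a chain rather than a set of independent ideas, which is the behavior the table below contrasts with `/turing:suggest`.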
56
+
57
+ ### 3. Queue Best Hypotheses
58
+
59
+ For each result, add to the hypothesis queue:
60
+
61
+ ```bash
62
+ source .venv/bin/activate && python scripts/manage_hypotheses.py add "<description>" \
63
+ --priority medium --source treequest
64
+ ```
65
+
66
+ ### 4. Show Results
67
+
68
+ Display the search output and confirm queuing:
69
+
70
+ ```
71
+ TreeQuest Hypothesis Exploration (AB-MCTS-A)
72
+ ============================================
73
+ Nodes explored: 35
74
+ Top 5 hypotheses by critique score:
75
+
76
+ 1. [PROCEED] (score: 7.8/10)
77
+ Switch to LightGBM with dart boosting; additionally add polynomial features
78
+ Novelty: 8 Feasibility: 9 Impact: 7
79
+ -> Queued as hyp-NNN
80
+
81
+ 2. [PROCEED] (score: 7.2/10)
82
+ Use low learning rate (0.01) with 2000 estimators; additionally add L2 regularization
83
+ Novelty: 7 Feasibility: 8 Impact: 7
84
+ Depth: 1 (refined from parent)
85
+ -> Queued as hyp-NNN
86
+
87
+ ...
88
+
89
+ Queued N hypotheses. Run /turing:train to test them.
90
+ ```
91
+
92
+ ## How It Differs From /turing:suggest
93
+
94
+ | | `/turing:suggest` | `/turing:explore` |
95
+ |---|---|---|
96
+ | **Source** | Web literature search | Tree search over critique scores |
97
+ | **Strategy** | Independent suggestions | Refinement chains (parent -> child) |
98
+ | **Requires internet** | Yes | No |
99
+ | **Discovers** | What papers recommend | What combinations score well |
100
+ | **Best for** | Early-stage exploration | Mid-experiment optimization |
101
+
102
+ ## Integration
103
+
104
+ - Results feed into `hypotheses.yaml` — the next `/turing:train` picks them up
105
+ - `/turing:brief` shows queued treequest-sourced hypotheses
106
+ - `/turing:suggest --strategy treequest` is an alias for this command
107
+ - Human can override priority: `/turing:try` always takes precedence