claude-turing 4.6.0 → 4.8.0

Files changed (333)
  1. package/.claude-plugin/plugin.json +2 -2
  2. package/README.md +1 -1
  3. package/commands/ablate.md +0 -1
  4. package/commands/annotate.md +0 -1
  5. package/commands/archive.md +0 -1
  6. package/commands/audit.md +0 -1
  7. package/commands/baseline.md +0 -1
  8. package/commands/brief.md +0 -1
  9. package/commands/budget.md +0 -1
  10. package/commands/calibrate.md +0 -1
  11. package/commands/card.md +0 -1
  12. package/commands/changelog.md +0 -1
  13. package/commands/checkpoint.md +0 -1
  14. package/commands/cite.md +0 -1
  15. package/commands/compare.md +0 -1
  16. package/commands/counterfactual.md +0 -1
  17. package/commands/curriculum.md +0 -1
  18. package/commands/design.md +0 -1
  19. package/commands/diagnose.md +0 -1
  20. package/commands/diff.md +0 -1
  21. package/commands/distill.md +0 -1
  22. package/commands/doctor.md +0 -1
  23. package/commands/ensemble.md +0 -1
  24. package/commands/explore.md +0 -1
  25. package/commands/export.md +0 -1
  26. package/commands/feature.md +0 -1
  27. package/commands/flashback.md +0 -1
  28. package/commands/fork.md +0 -1
  29. package/commands/frontier.md +0 -1
  30. package/commands/init.md +0 -1
  31. package/commands/leak.md +0 -1
  32. package/commands/lit.md +0 -1
  33. package/commands/logbook.md +0 -1
  34. package/commands/merge.md +0 -1
  35. package/commands/mode.md +0 -1
  36. package/commands/onboard.md +0 -1
  37. package/commands/paper.md +0 -1
  38. package/commands/plan.md +0 -1
  39. package/commands/poster.md +0 -1
  40. package/commands/postmortem.md +0 -1
  41. package/commands/preflight.md +0 -1
  42. package/commands/present.md +0 -1
  43. package/commands/profile.md +0 -1
  44. package/commands/prune.md +0 -1
  45. package/commands/quantize.md +0 -1
  46. package/commands/queue.md +0 -1
  47. package/commands/registry.md +0 -1
  48. package/commands/regress.md +0 -1
  49. package/commands/replay.md +0 -1
  50. package/commands/report.md +0 -1
  51. package/commands/reproduce.md +0 -1
  52. package/commands/retry.md +0 -1
  53. package/commands/review.md +0 -1
  54. package/commands/sanity.md +0 -1
  55. package/commands/scale.md +0 -1
  56. package/commands/search.md +0 -1
  57. package/commands/seed.md +0 -1
  58. package/commands/sensitivity.md +0 -1
  59. package/commands/share.md +0 -1
  60. package/commands/simulate.md +0 -1
  61. package/commands/status.md +0 -1
  62. package/commands/stitch.md +0 -1
  63. package/commands/suggest.md +0 -1
  64. package/commands/surgery.md +0 -1
  65. package/commands/sweep.md +0 -1
  66. package/commands/template.md +0 -1
  67. package/commands/train.md +0 -1
  68. package/commands/transfer.md +0 -1
  69. package/commands/trend.md +0 -1
  70. package/commands/try.md +0 -1
  71. package/commands/turing.md +3 -3
  72. package/commands/update.md +0 -1
  73. package/commands/validate.md +0 -1
  74. package/commands/warm.md +0 -1
  75. package/commands/watch.md +0 -1
  76. package/commands/whatif.md +0 -1
  77. package/commands/xray.md +0 -1
  78. package/config/commands.yaml +74 -74
  79. package/package.json +10 -3
  80. package/skills/turing/SKILL.md +180 -0
  81. package/skills/turing/ablate/SKILL.md +46 -0
  82. package/skills/turing/annotate/SKILL.md +22 -0
  83. package/skills/turing/archive/SKILL.md +22 -0
  84. package/skills/turing/audit/SKILL.md +55 -0
  85. package/skills/turing/baseline/SKILL.md +44 -0
  86. package/skills/turing/brief/SKILL.md +94 -0
  87. package/skills/turing/budget/SKILL.md +51 -0
  88. package/skills/turing/calibrate/SKILL.md +46 -0
  89. package/skills/turing/card/SKILL.md +35 -0
  90. package/skills/turing/changelog/SKILL.md +21 -0
  91. package/skills/turing/checkpoint/SKILL.md +46 -0
  92. package/skills/turing/cite/SKILL.md +22 -0
  93. package/skills/turing/compare/SKILL.md +23 -0
  94. package/skills/turing/counterfactual/SKILL.md +26 -0
  95. package/skills/turing/curriculum/SKILL.md +42 -0
  96. package/skills/turing/design/SKILL.md +96 -0
  97. package/skills/turing/diagnose/SKILL.md +51 -0
  98. package/skills/turing/diff/SKILL.md +47 -0
  99. package/skills/turing/distill/SKILL.md +55 -0
  100. package/skills/turing/doctor/SKILL.md +30 -0
  101. package/skills/turing/ensemble/SKILL.md +53 -0
  102. package/skills/turing/explore/SKILL.md +106 -0
  103. package/skills/turing/export/SKILL.md +47 -0
  104. package/skills/turing/feature/SKILL.md +41 -0
  105. package/skills/turing/flashback/SKILL.md +21 -0
  106. package/skills/turing/fork/SKILL.md +39 -0
  107. package/skills/turing/frontier/SKILL.md +44 -0
  108. package/skills/turing/init/SKILL.md +153 -0
  109. package/skills/turing/leak/SKILL.md +46 -0
  110. package/skills/turing/lit/SKILL.md +46 -0
  111. package/skills/turing/logbook/SKILL.md +50 -0
  112. package/skills/turing/merge/SKILL.md +23 -0
  113. package/skills/turing/mode/SKILL.md +42 -0
  114. package/skills/turing/onboard/SKILL.md +19 -0
  115. package/skills/turing/paper/SKILL.md +43 -0
  116. package/skills/turing/plan/SKILL.md +26 -0
  117. package/skills/turing/poster/SKILL.md +88 -0
  118. package/skills/turing/postmortem/SKILL.md +27 -0
  119. package/skills/turing/preflight/SKILL.md +74 -0
  120. package/skills/turing/present/SKILL.md +22 -0
  121. package/skills/turing/profile/SKILL.md +42 -0
  122. package/skills/turing/prune/SKILL.md +25 -0
  123. package/skills/turing/quantize/SKILL.md +23 -0
  124. package/skills/turing/queue/SKILL.md +47 -0
  125. package/skills/turing/registry/SKILL.md +30 -0
  126. package/skills/turing/regress/SKILL.md +52 -0
  127. package/skills/turing/replay/SKILL.md +22 -0
  128. package/skills/turing/report/SKILL.md +96 -0
  129. package/skills/turing/reproduce/SKILL.md +47 -0
  130. package/skills/turing/retry/SKILL.md +40 -0
  131. package/skills/turing/review/SKILL.md +19 -0
  132. package/skills/turing/rules/loop-protocol.md +91 -0
  133. package/skills/turing/sanity/SKILL.md +47 -0
  134. package/skills/turing/scale/SKILL.md +54 -0
  135. package/skills/turing/search/SKILL.md +21 -0
  136. package/skills/turing/seed/SKILL.md +46 -0
  137. package/skills/turing/sensitivity/SKILL.md +40 -0
  138. package/skills/turing/share/SKILL.md +19 -0
  139. package/skills/turing/simulate/SKILL.md +27 -0
  140. package/skills/turing/status/SKILL.md +23 -0
  141. package/skills/turing/stitch/SKILL.md +48 -0
  142. package/skills/turing/suggest/SKILL.md +158 -0
  143. package/skills/turing/surgery/SKILL.md +26 -0
  144. package/skills/turing/sweep/SKILL.md +44 -0
  145. package/skills/turing/template/SKILL.md +21 -0
  146. package/skills/turing/train/SKILL.md +74 -0
  147. package/skills/turing/transfer/SKILL.md +53 -0
  148. package/skills/turing/trend/SKILL.md +20 -0
  149. package/skills/turing/try/SKILL.md +62 -0
  150. package/skills/turing/update/SKILL.md +26 -0
  151. package/skills/turing/validate/SKILL.md +33 -0
  152. package/skills/turing/warm/SKILL.md +52 -0
  153. package/skills/turing/watch/SKILL.md +59 -0
  154. package/skills/turing/whatif/SKILL.md +30 -0
  155. package/skills/turing/xray/SKILL.md +42 -0
  156. package/src/command-registry.js +21 -0
  157. package/src/install.js +4 -3
  158. package/src/sync-commands-layout.js +149 -0
  159. package/src/sync-skills-layout.js +20 -0
  160. package/templates/__pycache__/evaluate.cpython-312.pyc +0 -0
  161. package/templates/__pycache__/evaluate.cpython-314.pyc +0 -0
  162. package/templates/__pycache__/prepare.cpython-312.pyc +0 -0
  163. package/templates/__pycache__/prepare.cpython-314.pyc +0 -0
  164. package/templates/features/__pycache__/__init__.cpython-312.pyc +0 -0
  165. package/templates/features/__pycache__/__init__.cpython-314.pyc +0 -0
  166. package/templates/features/__pycache__/featurizers.cpython-312.pyc +0 -0
  167. package/templates/features/__pycache__/featurizers.cpython-314.pyc +0 -0
  168. package/templates/scripts/__pycache__/__init__.cpython-312.pyc +0 -0
  169. package/templates/scripts/__pycache__/__init__.cpython-314.pyc +0 -0
  170. package/templates/scripts/__pycache__/ablation_study.cpython-312.pyc +0 -0
  171. package/templates/scripts/__pycache__/ablation_study.cpython-314.pyc +0 -0
  172. package/templates/scripts/__pycache__/architecture_surgery.cpython-312.pyc +0 -0
  173. package/templates/scripts/__pycache__/architecture_surgery.cpython-314.pyc +0 -0
  174. package/templates/scripts/__pycache__/budget_manager.cpython-312.pyc +0 -0
  175. package/templates/scripts/__pycache__/budget_manager.cpython-314.pyc +0 -0
  176. package/templates/scripts/__pycache__/build_ensemble.cpython-312.pyc +0 -0
  177. package/templates/scripts/__pycache__/build_ensemble.cpython-314.pyc +0 -0
  178. package/templates/scripts/__pycache__/calibration.cpython-312.pyc +0 -0
  179. package/templates/scripts/__pycache__/calibration.cpython-314.pyc +0 -0
  180. package/templates/scripts/__pycache__/check_convergence.cpython-312.pyc +0 -0
  181. package/templates/scripts/__pycache__/check_convergence.cpython-314.pyc +0 -0
  182. package/templates/scripts/__pycache__/checkpoint_manager.cpython-312.pyc +0 -0
  183. package/templates/scripts/__pycache__/checkpoint_manager.cpython-314.pyc +0 -0
  184. package/templates/scripts/__pycache__/citation_manager.cpython-312.pyc +0 -0
  185. package/templates/scripts/__pycache__/citation_manager.cpython-314.pyc +0 -0
  186. package/templates/scripts/__pycache__/cost_frontier.cpython-312.pyc +0 -0
  187. package/templates/scripts/__pycache__/cost_frontier.cpython-314.pyc +0 -0
  188. package/templates/scripts/__pycache__/counterfactual_explanation.cpython-312.pyc +0 -0
  189. package/templates/scripts/__pycache__/counterfactual_explanation.cpython-314.pyc +0 -0
  190. package/templates/scripts/__pycache__/critique_hypothesis.cpython-312.pyc +0 -0
  191. package/templates/scripts/__pycache__/critique_hypothesis.cpython-314.pyc +0 -0
  192. package/templates/scripts/__pycache__/curriculum_optimizer.cpython-312.pyc +0 -0
  193. package/templates/scripts/__pycache__/curriculum_optimizer.cpython-314.pyc +0 -0
  194. package/templates/scripts/__pycache__/diagnose_errors.cpython-312.pyc +0 -0
  195. package/templates/scripts/__pycache__/diagnose_errors.cpython-314.pyc +0 -0
  196. package/templates/scripts/__pycache__/draft_paper_sections.cpython-312.pyc +0 -0
  197. package/templates/scripts/__pycache__/draft_paper_sections.cpython-314.pyc +0 -0
  198. package/templates/scripts/__pycache__/equivalence_checker.cpython-312.pyc +0 -0
  199. package/templates/scripts/__pycache__/equivalence_checker.cpython-314.pyc +0 -0
  200. package/templates/scripts/__pycache__/experiment_annotations.cpython-312.pyc +0 -0
  201. package/templates/scripts/__pycache__/experiment_annotations.cpython-314.pyc +0 -0
  202. package/templates/scripts/__pycache__/experiment_archive.cpython-312.pyc +0 -0
  203. package/templates/scripts/__pycache__/experiment_archive.cpython-314.pyc +0 -0
  204. package/templates/scripts/__pycache__/experiment_diff.cpython-312.pyc +0 -0
  205. package/templates/scripts/__pycache__/experiment_diff.cpython-314.pyc +0 -0
  206. package/templates/scripts/__pycache__/experiment_index.cpython-312.pyc +0 -0
  207. package/templates/scripts/__pycache__/experiment_index.cpython-314.pyc +0 -0
  208. package/templates/scripts/__pycache__/experiment_queue.cpython-312.pyc +0 -0
  209. package/templates/scripts/__pycache__/experiment_queue.cpython-314.pyc +0 -0
  210. package/templates/scripts/__pycache__/experiment_replay.cpython-312.pyc +0 -0
  211. package/templates/scripts/__pycache__/experiment_replay.cpython-314.pyc +0 -0
  212. package/templates/scripts/__pycache__/experiment_search.cpython-312.pyc +0 -0
  213. package/templates/scripts/__pycache__/experiment_search.cpython-314.pyc +0 -0
  214. package/templates/scripts/__pycache__/experiment_simulator.cpython-312.pyc +0 -0
  215. package/templates/scripts/__pycache__/experiment_simulator.cpython-314.pyc +0 -0
  216. package/templates/scripts/__pycache__/experiment_templates.cpython-312.pyc +0 -0
  217. package/templates/scripts/__pycache__/experiment_templates.cpython-314.pyc +0 -0
  218. package/templates/scripts/__pycache__/export_card.cpython-312.pyc +0 -0
  219. package/templates/scripts/__pycache__/export_card.cpython-314.pyc +0 -0
  220. package/templates/scripts/__pycache__/export_formats.cpython-312.pyc +0 -0
  221. package/templates/scripts/__pycache__/export_formats.cpython-314.pyc +0 -0
  222. package/templates/scripts/__pycache__/failure_postmortem.cpython-312.pyc +0 -0
  223. package/templates/scripts/__pycache__/failure_postmortem.cpython-314.pyc +0 -0
  224. package/templates/scripts/__pycache__/feature_intelligence.cpython-312.pyc +0 -0
  225. package/templates/scripts/__pycache__/feature_intelligence.cpython-314.pyc +0 -0
  226. package/templates/scripts/__pycache__/fork_experiment.cpython-312.pyc +0 -0
  227. package/templates/scripts/__pycache__/fork_experiment.cpython-314.pyc +0 -0
  228. package/templates/scripts/__pycache__/generate_baselines.cpython-312.pyc +0 -0
  229. package/templates/scripts/__pycache__/generate_baselines.cpython-314.pyc +0 -0
  230. package/templates/scripts/__pycache__/generate_brief.cpython-312.pyc +0 -0
  231. package/templates/scripts/__pycache__/generate_brief.cpython-314.pyc +0 -0
  232. package/templates/scripts/__pycache__/generate_changelog.cpython-312.pyc +0 -0
  233. package/templates/scripts/__pycache__/generate_changelog.cpython-314.pyc +0 -0
  234. package/templates/scripts/__pycache__/generate_figures.cpython-312.pyc +0 -0
  235. package/templates/scripts/__pycache__/generate_figures.cpython-314.pyc +0 -0
  236. package/templates/scripts/__pycache__/generate_logbook.cpython-312.pyc +0 -0
  237. package/templates/scripts/__pycache__/generate_logbook.cpython-314.pyc +0 -0
  238. package/templates/scripts/__pycache__/generate_model_card.cpython-312.pyc +0 -0
  239. package/templates/scripts/__pycache__/generate_model_card.cpython-314.pyc +0 -0
  240. package/templates/scripts/__pycache__/generate_onboarding.cpython-312.pyc +0 -0
  241. package/templates/scripts/__pycache__/generate_onboarding.cpython-314.pyc +0 -0
  242. package/templates/scripts/__pycache__/harness_doctor.cpython-312.pyc +0 -0
  243. package/templates/scripts/__pycache__/harness_doctor.cpython-314.pyc +0 -0
  244. package/templates/scripts/__pycache__/incremental_update.cpython-312.pyc +0 -0
  245. package/templates/scripts/__pycache__/incremental_update.cpython-314.pyc +0 -0
  246. package/templates/scripts/__pycache__/knowledge_transfer.cpython-312.pyc +0 -0
  247. package/templates/scripts/__pycache__/knowledge_transfer.cpython-314.pyc +0 -0
  248. package/templates/scripts/__pycache__/latency_benchmark.cpython-312.pyc +0 -0
  249. package/templates/scripts/__pycache__/latency_benchmark.cpython-314.pyc +0 -0
  250. package/templates/scripts/__pycache__/leakage_detector.cpython-312.pyc +0 -0
  251. package/templates/scripts/__pycache__/leakage_detector.cpython-314.pyc +0 -0
  252. package/templates/scripts/__pycache__/literature_search.cpython-312.pyc +0 -0
  253. package/templates/scripts/__pycache__/literature_search.cpython-314.pyc +0 -0
  254. package/templates/scripts/__pycache__/log_experiment.cpython-312.pyc +0 -0
  255. package/templates/scripts/__pycache__/log_experiment.cpython-314.pyc +0 -0
  256. package/templates/scripts/__pycache__/manage_hypotheses.cpython-312.pyc +0 -0
  257. package/templates/scripts/__pycache__/manage_hypotheses.cpython-314.pyc +0 -0
  258. package/templates/scripts/__pycache__/methodology_audit.cpython-312.pyc +0 -0
  259. package/templates/scripts/__pycache__/methodology_audit.cpython-314.pyc +0 -0
  260. package/templates/scripts/__pycache__/model_distiller.cpython-312.pyc +0 -0
  261. package/templates/scripts/__pycache__/model_distiller.cpython-314.pyc +0 -0
  262. package/templates/scripts/__pycache__/model_lifecycle.cpython-312.pyc +0 -0
  263. package/templates/scripts/__pycache__/model_lifecycle.cpython-314.pyc +0 -0
  264. package/templates/scripts/__pycache__/model_merger.cpython-312.pyc +0 -0
  265. package/templates/scripts/__pycache__/model_merger.cpython-314.pyc +0 -0
  266. package/templates/scripts/__pycache__/model_pruning.cpython-312.pyc +0 -0
  267. package/templates/scripts/__pycache__/model_pruning.cpython-314.pyc +0 -0
  268. package/templates/scripts/__pycache__/model_quantization.cpython-312.pyc +0 -0
  269. package/templates/scripts/__pycache__/model_quantization.cpython-314.pyc +0 -0
  270. package/templates/scripts/__pycache__/model_xray.cpython-312.pyc +0 -0
  271. package/templates/scripts/__pycache__/model_xray.cpython-314.pyc +0 -0
  272. package/templates/scripts/__pycache__/novelty_guard.cpython-312.pyc +0 -0
  273. package/templates/scripts/__pycache__/novelty_guard.cpython-314.pyc +0 -0
  274. package/templates/scripts/__pycache__/package_experiments.cpython-312.pyc +0 -0
  275. package/templates/scripts/__pycache__/package_experiments.cpython-314.pyc +0 -0
  276. package/templates/scripts/__pycache__/pareto_frontier.cpython-312.pyc +0 -0
  277. package/templates/scripts/__pycache__/pareto_frontier.cpython-314.pyc +0 -0
  278. package/templates/scripts/__pycache__/parse_metrics.cpython-312.pyc +0 -0
  279. package/templates/scripts/__pycache__/parse_metrics.cpython-314.pyc +0 -0
  280. package/templates/scripts/__pycache__/pipeline_manager.cpython-312.pyc +0 -0
  281. package/templates/scripts/__pycache__/pipeline_manager.cpython-314.pyc +0 -0
  282. package/templates/scripts/__pycache__/profile_training.cpython-312.pyc +0 -0
  283. package/templates/scripts/__pycache__/profile_training.cpython-314.pyc +0 -0
  284. package/templates/scripts/__pycache__/regression_gate.cpython-312.pyc +0 -0
  285. package/templates/scripts/__pycache__/regression_gate.cpython-314.pyc +0 -0
  286. package/templates/scripts/__pycache__/reproduce_experiment.cpython-312.pyc +0 -0
  287. package/templates/scripts/__pycache__/reproduce_experiment.cpython-314.pyc +0 -0
  288. package/templates/scripts/__pycache__/research_planner.cpython-312.pyc +0 -0
  289. package/templates/scripts/__pycache__/research_planner.cpython-314.pyc +0 -0
  290. package/templates/scripts/__pycache__/sanity_checks.cpython-312.pyc +0 -0
  291. package/templates/scripts/__pycache__/sanity_checks.cpython-314.pyc +0 -0
  292. package/templates/scripts/__pycache__/scaffold.cpython-312.pyc +0 -0
  293. package/templates/scripts/__pycache__/scaffold.cpython-314.pyc +0 -0
  294. package/templates/scripts/__pycache__/scaling_estimator.cpython-312.pyc +0 -0
  295. package/templates/scripts/__pycache__/scaling_estimator.cpython-314.pyc +0 -0
  296. package/templates/scripts/__pycache__/seed_runner.cpython-312.pyc +0 -0
  297. package/templates/scripts/__pycache__/seed_runner.cpython-314.pyc +0 -0
  298. package/templates/scripts/__pycache__/sensitivity_analysis.cpython-312.pyc +0 -0
  299. package/templates/scripts/__pycache__/sensitivity_analysis.cpython-314.pyc +0 -0
  300. package/templates/scripts/__pycache__/session_flashback.cpython-312.pyc +0 -0
  301. package/templates/scripts/__pycache__/session_flashback.cpython-314.pyc +0 -0
  302. package/templates/scripts/__pycache__/show_experiment_tree.cpython-312.pyc +0 -0
  303. package/templates/scripts/__pycache__/show_experiment_tree.cpython-314.pyc +0 -0
  304. package/templates/scripts/__pycache__/show_families.cpython-312.pyc +0 -0
  305. package/templates/scripts/__pycache__/show_families.cpython-314.pyc +0 -0
  306. package/templates/scripts/__pycache__/simulate_review.cpython-312.pyc +0 -0
  307. package/templates/scripts/__pycache__/simulate_review.cpython-314.pyc +0 -0
  308. package/templates/scripts/__pycache__/smart_retry.cpython-312.pyc +0 -0
  309. package/templates/scripts/__pycache__/smart_retry.cpython-314.pyc +0 -0
  310. package/templates/scripts/__pycache__/statistical_compare.cpython-312.pyc +0 -0
  311. package/templates/scripts/__pycache__/statistical_compare.cpython-314.pyc +0 -0
  312. package/templates/scripts/__pycache__/suggest_next.cpython-312.pyc +0 -0
  313. package/templates/scripts/__pycache__/suggest_next.cpython-314.pyc +0 -0
  314. package/templates/scripts/__pycache__/sweep.cpython-312.pyc +0 -0
  315. package/templates/scripts/__pycache__/sweep.cpython-314.pyc +0 -0
  316. package/templates/scripts/__pycache__/synthesize_decision.cpython-312.pyc +0 -0
  317. package/templates/scripts/__pycache__/synthesize_decision.cpython-314.pyc +0 -0
  318. package/templates/scripts/__pycache__/training_monitor.cpython-312.pyc +0 -0
  319. package/templates/scripts/__pycache__/training_monitor.cpython-314.pyc +0 -0
  320. package/templates/scripts/__pycache__/treequest_suggest.cpython-312.pyc +0 -0
  321. package/templates/scripts/__pycache__/treequest_suggest.cpython-314.pyc +0 -0
  322. package/templates/scripts/__pycache__/trend_analysis.cpython-312.pyc +0 -0
  323. package/templates/scripts/__pycache__/trend_analysis.cpython-314.pyc +0 -0
  324. package/templates/scripts/__pycache__/turing_io.cpython-312.pyc +0 -0
  325. package/templates/scripts/__pycache__/turing_io.cpython-314.pyc +0 -0
  326. package/templates/scripts/__pycache__/update_state.cpython-312.pyc +0 -0
  327. package/templates/scripts/__pycache__/update_state.cpython-314.pyc +0 -0
  328. package/templates/scripts/__pycache__/verify_placeholders.cpython-312.pyc +0 -0
  329. package/templates/scripts/__pycache__/verify_placeholders.cpython-314.pyc +0 -0
  330. package/templates/scripts/__pycache__/warm_start.cpython-312.pyc +0 -0
  331. package/templates/scripts/__pycache__/warm_start.cpython-314.pyc +0 -0
  332. package/templates/scripts/__pycache__/whatif_engine.cpython-312.pyc +0 -0
  333. package/templates/scripts/__pycache__/whatif_engine.cpython-314.pyc +0 -0
@@ -0,0 +1,91 @@
1
+ # Autoresearch Loop Protocol Rules
2
+
3
+ These rules govern the autonomous ML experiment loop. They are non-negotiable safety constraints that preserve the integrity of the experimental process.
4
+
5
+ ## The Fundamental Separation
6
+
7
+ The autoresearch harness enforces a strict separation between the **hypothesis space** (what the agent can change) and the **measurement apparatus** (how results are evaluated). This separation is the architectural invariant that makes autonomous experimentation trustworthy.
8
+
9
+ | Layer | Files | Agent Access | Rationale |
10
+ |-------|-------|-------------|-----------|
11
+ | Hidden | `evaluate.py` | NONE — do not read, write, or reference | Reading evaluation code enables seed exploitation and metric gaming |
12
+ | Measurement | `prepare.py` | READ-ONLY | Data loading is visible but immutable |
13
+ | Hypothesis | `train.py` | READ-WRITE | All experimental changes go here |
14
+ | Configuration | `config.yaml` | READ-WRITE | Hyperparameter changes without code changes |
15
+ | Features | `features/featurizers.py` | READ-ONLY | Modify how `train.py` *uses* featurizers instead |
16
+
17
+ ## Execution Rules
18
+
19
+ - **ALWAYS redirect training output:** `python train.py > run.log 2>&1`
20
+ - **ALWAYS parse metrics with grep** between `---` delimiters: `grep -A 10 "^---" run.log | head -10`
21
+ - **ALWAYS activate the venv first:** `source .venv/bin/activate`
22
+ - **NEVER install new packages** without human approval
23
+
24
+ ## Git Discipline
25
+
26
+ ### Per-Experiment Branches (preferred)
27
+
28
+ - **Create branch before each experiment:** `git checkout -b exp/{NNN}-{short-description}`
29
+ - **Commit changes on the branch:** `git commit -am "exp: {description}"`
30
+ - **Run the experiment on the branch**
31
+ - **If improved:** `git checkout main && git merge exp/{NNN}-{short-description}`. Copy model to `models/best/`.
32
+ - **If NOT improved:** `git checkout main`. Branch preserved for comparison.
33
+ - **Keep all experiment branches** — they preserve code variants for later analysis.
34
+
35
+ ### Fallback: Commit/Revert (mid-sweep)
36
+
37
+ - **ALWAYS commit before running:** `git commit -am "exp: {description}"`
38
+ - **If improved:** keep commit, copy model to `models/best/`
39
+ - **If NOT improved:** `git reset --hard HEAD~1`
40
+
41
+ ## Sweep Workflow
42
+
43
+ 1. Generate queue: `python scripts/sweep.py`
44
+ 2. Check status: `python scripts/sweep.py --status`
45
+ 3. Get next: `python scripts/sweep.py --next`
46
+ 4. Apply overrides, create branch, run training
47
+ 5. Mark: `python scripts/sweep.py --mark <name> complete|failed`
48
+ 6. Repeat until queue is empty
49
+
50
+ ## Logging Rules
51
+
52
+ - **Log every experiment** to `experiments/log.jsonl` via `python scripts/log_experiment.py` — kept and discarded alike.
53
+ - **Include all metrics, config, and description** of the hypothesis and its outcome.
54
+
55
+ ## Convergence Rules
56
+
57
+ - **N consecutive non-improvements** (from `config.yaml` `convergence.patience`) with less than threshold relative gain = STOP.
58
+ - **max_iterations** (if provided) overrides convergence.
59
+ - **Always report** final best model, metrics, and recommended next steps when stopping.
60
+
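The convergence rules above can be read as a small stopping predicate. The following is a minimal sketch under stated assumptions, not the harness's actual logic: the function name `should_stop`, the higher-is-better metric, and the reading of "overrides" (when `max_iterations` is given, it alone decides when to stop) are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def should_stop(
    scores: Sequence[float],
    patience: int,
    min_rel_gain: float,
    max_iterations: Optional[int] = None,
) -> bool:
    """Hypothetical stopping check; assumes a higher score is better."""
    # Assumed reading of "overrides": an explicit iteration cap replaces
    # the patience-based rule entirely.
    if max_iterations is not None:
        return len(scores) >= max_iterations
    if not scores:
        return False

    best = scores[0]
    stale = 0  # consecutive experiments without a sufficient gain
    for score in scores[1:]:
        # Relative gain over the best so far; fall back to the absolute
        # difference when the best score is exactly zero.
        gain = (score - best) / abs(best) if best != 0 else score - best
        if gain >= min_rel_gain:
            stale = 0
        else:
            stale += 1
        best = max(best, score)
    return stale >= patience


history = [0.80, 0.85, 0.851, 0.852, 0.850]
print(should_stop(history, patience=3, min_rel_gain=0.01))
```

In this sketch a gain smaller than the threshold still counts as a non-improvement, so a run that creeps upward by tiny amounts eventually stops.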
61
+ ## Tool Restrictions
62
+
63
+ The researcher agent's Bash access is restricted to a whitelist of necessary commands:
64
+
65
+ | Allowed Pattern | Purpose |
66
+ |-----------------|---------|
67
+ | `python train.py:*` | Execute training |
68
+ | `python scripts/*:*` | Run utility scripts (logging, metrics, sweep) |
69
+ | `git:*` | Branch, commit, merge, reset operations |
70
+ | `source .venv/bin/activate:*` | Virtual environment activation |
71
+ | `pip:*` | Package installation (requires human approval) |
72
+
73
+ **Blocked by omission:** `cat`, `head`, `tail`, `less` (prevents reading hidden files via shell), `curl`, `wget` (prevents data exfiltration), arbitrary command execution.
74
+
75
+ The agent's Read tool is separately governed by the file access tiers above — hidden files are denied at the tool level.
76
+
77
+ ## Reproducibility Rules
78
+
79
+ Every experiment must be fully reproducible. The training template handles this automatically, but the agent must not subvert it:
80
+
81
+ - **NEVER use unseeded randomness.** All random state flows from `config.yaml → data.random_state`. The `pin_all_seeds()` function in `train.py` sets stdlib `random`, `numpy`, `PYTHONHASHSEED`, and `torch`/`cuda` seeds from this single source.
82
+ - **NEVER modify seeds mid-experiment.** If you need a different seed, use the `--seed` flag for multi-run comparison (Phase 2.1). Do not hardcode seeds in `train.py`.
83
+ - **Environment is captured automatically.** `train_metadata.json` records the Python version, package versions, platform, GPU info, and a config hash. Do not modify this recording, because behavioral probes use it.
84
+ - **Config snapshot:** The config at training time is stored inside the model artifact (`model.joblib` contains the full config dict). For any saved model, the exact configuration can be recovered.
85
+ - **If adding new dependencies** (requires human approval), note that the environment capture in `train_metadata.json` will automatically record the new package version.
86
+
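To make the single-source seeding rule concrete, here is a hedged sketch of what a function like `pin_all_seeds()` might do. It is not the implementation shipped in `train.py`: the signature and the optional-import handling are assumptions, and only the list of seeded libraries comes from the rules above.

```python
import os
import random


def pin_all_seeds(seed: int) -> None:
    """Hypothetical sketch: seed every RNG from one integer."""
    # PYTHONHASHSEED is read when an interpreter starts, so setting it here
    # only affects subprocesses launched afterwards, not the current process.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)

    # numpy and torch are seeded only if they are installed.
    try:
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass

    try:
        import torch

        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass


pin_all_seeds(42)
first = random.random()
pin_all_seeds(42)
second = random.random()
print(first == second)
```

Calling the function once at the top of a run, with the value read from `data.random_state`, keeps every library on the same seed without hardcoding one in the training code.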
87
+ ## Safety
88
+
89
+ - Do not modify files outside the ML project directory.
90
+ - Do not delete experiment logs or model archives.
91
+ - If something breaks unexpectedly, stop and report — do not auto-fix evaluation infrastructure.
@@ -0,0 +1,47 @@
1
+ ---
2
+ name: sanity
3
+ description: Pre-training sanity checks — catch broken data loaders, misconfigured losses, and dead gradients in 30 seconds before wasting hours.
4
+ argument-hint: "[--quick] [--verbose]"
5
+ allowed-tools: Read, Bash(*), Grep, Glob
6
+ ---
7
+
8
+ Run a battery of fast checks before committing to a full training run. Catches wiring bugs in seconds.
9
+
10
+ ## Steps
11
+
12
+ 1. **Activate environment:**
13
+ ```bash
14
+ source .venv/bin/activate
15
+ ```
16
+
17
+ 2. **Parse arguments from `$ARGUMENTS`:**
18
+ - `--quick` — skip single-batch overfit test (fastest, ~5 seconds)
19
+ - `--verbose` — show detailed check output
20
+ - `--json` — raw JSON output
21
+
22
+ 3. **Run sanity checks:**
23
+ ```bash
24
+ python scripts/sanity_checks.py $ARGUMENTS
25
+ ```
26
+
27
+ 4. **Checks performed:**
28
+ - **Data pipeline** (critical): first batch loads, shapes match, no NaN/Inf
29
+ - **Initial loss** (high): loss at initialization matches theory (e.g., -log(1/C) for cross-entropy)
30
+ - **Gradient flow** (high): all parameters have non-zero, non-exploding gradients
31
+ - **Single-batch overfit** (critical): model can memorize 1 batch in 50 steps — if not, something is broken
32
+ - **Output validation** (high): predictions are non-NaN, non-constant, reasonable range
33
+ - **Config consistency** (medium): learning rate, batch size in reasonable ranges
34
+
35
+ 5. **Verdicts:**
36
+ - **PASS** — safe to proceed
37
+ - **PASS (with warnings)** — review before training
38
+ - **FAIL** — do not proceed, fix issues first
39
+
40
+ 6. **Saved output:** report in `experiments/sanity/sanity-*.yaml`
41
+
42
+ ## Examples
43
+
44
+ ```
45
+ /turing:sanity # Full check (~30 seconds)
46
+ /turing:sanity --quick # Skip overfit test (~5 seconds)
47
+ ```
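The initial-loss check in `sanity/SKILL.md` rests on a simple identity: a classifier that predicts uniformly over C classes has cross-entropy -log(1/C) = log(C). A minimal sketch of such a check follows; the function names and the 20% tolerance are assumptions for illustration and are not taken from `scripts/sanity_checks.py`.

```python
import math


def expected_initial_ce_loss(num_classes: int) -> float:
    """Cross-entropy of a uniform prediction over num_classes classes."""
    if num_classes < 2:
        raise ValueError("need at least two classes")
    return math.log(num_classes)  # equals -log(1 / num_classes)


def initial_loss_ok(observed: float, num_classes: int, rel_tol: float = 0.2) -> bool:
    """Hypothetical check: observed loss within rel_tol of the theoretical value."""
    expected = expected_initial_ce_loss(num_classes)
    return abs(observed - expected) <= rel_tol * expected


print(round(expected_initial_ce_loss(10), 4))
print(initial_loss_ok(2.35, num_classes=10))
print(initial_loss_ok(6.0, num_classes=10))
```

An initial loss far above log(C) usually points at a wiring problem such as mislabeled targets or a badly scaled output layer, which is why the check is cheap to run before training.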
@@ -0,0 +1,54 @@
1
+ ---
2
+ name: scale
3
+ description: Scaling law estimator — run small experiments at different sizes, fit a power law, and predict full-scale performance before committing compute.
4
+ argument-hint: "[--axis data|compute|params] [--points 4] [--analyze results.yaml]"
5
+ allowed-tools: Read, Bash(*), Grep, Glob
6
+ ---
7
+
8
+ Predict full-scale performance from a handful of small experiments. Answers "is it worth training on the full dataset?" in 30 minutes instead of 3 days.
9
+
10
+ ## Steps
11
+
12
+ 1. **Activate environment:**
13
+ ```bash
14
+ source .venv/bin/activate
15
+ ```
16
+
17
+ 2. **Parse arguments from `$ARGUMENTS`:**
18
+ - `--axis data|compute|params` — scaling axis (default: data)
19
+ - `--points 4` — number of scale points (default: 4)
20
+ - `--analyze results.yaml` — analyze existing results instead of planning
21
+ - `--plot` — include ASCII scaling plot
22
+ - `--json` — raw JSON output
23
+
24
+ 3. **Plan or analyze:**
25
+ - **Plan mode (default):** generates scale point configs to run
26
+ ```bash
27
+ python scripts/scaling_estimator.py --axis data --points 4
28
+ ```
29
+ - **Analyze mode:** fits power law to completed results
30
+ ```bash
31
+ python scripts/scaling_estimator.py --analyze experiments/scaling/results.yaml
32
+ ```
33
+
34
+ 4. **Scaling axes:**
35
+ - **data:** train on 10%, 25%, 50%, 75% of dataset
36
+ - **compute:** train for 10%, 25%, 50%, 75% of max epochs
37
+ - **params:** scale model size (fewer estimators, shallower depth)
38
+
39
+ 5. **After planning:** run each scale point experiment, record results in YAML, then use `--analyze` to fit the curve
40
+
41
+ 6. **Report includes:**
42
+ - Power law fit: `metric = a × n^b` with R²
43
+ - Predictions for 100%, 150%, 200% scale
44
+ - Verdict: DIMINISHING RETURNS / MARGINAL GAINS / WORTH SCALING
45
+
46
+ 7. **Saved output:** report written to `experiments/scaling/scale-YYYY-MM-DD.yaml`
47
+
48
+ ## Examples
49
+
50
+ ```
51
+ /turing:scale # Plan: data axis, 4 points
52
+ /turing:scale --axis compute --points 3 # Plan: compute axis, 3 points
53
+ /turing:scale --analyze results.yaml --plot # Analyze with ASCII plot
54
+ ```
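The power-law fit named in `scale/SKILL.md`, `metric = a × n^b`, becomes a straight line after taking logarithms: log(metric) = log(a) + b·log(n). The sketch below fits that line by ordinary least squares using only the standard library. It is an illustration under assumptions, not the code in `scripts/scaling_estimator.py`: the function name is hypothetical, all inputs must be positive, and R² is computed in log space.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(n: Sequence[float], metric: Sequence[float]) -> Tuple[float, float, float]:
    """Fit metric = a * n**b by least squares on logs; returns (a, b, r2)."""
    xs = [math.log(v) for v in n]
    ys = [math.log(v) for v in metric]
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    log_a = mean_y - b * mean_x
    # Coefficient of determination of the straight-line fit in log space.
    ss_res = sum((y - (log_a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return math.exp(log_a), b, r2


# Synthetic scale points at 10%, 25%, 50%, 75% of the data (made-up values
# generated from metric = 2 * n**0.5, used only to exercise the fit).
fractions = [0.10, 0.25, 0.50, 0.75]
values = [2.0 * f ** 0.5 for f in fractions]
a, b, r2 = fit_power_law(fractions, values)
print(a * 1.0 ** b)  # extrapolated metric at 100% scale
```

Extrapolating to 150% or 200% scale is the same expression with `n` set to 1.5 or 2.0; a low R² is a signal that the power-law assumption may not hold for the chosen axis.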
@@ -0,0 +1,21 @@
1
+ ---
2
+ name: search
3
+ description: Natural language experiment search — query with text + structured filters over 200+ experiments.
4
+ argument-hint: "<query> [--filter \"accuracy>0.85\"] [--limit 10]"
5
+ allowed-tools: Read, Bash(*), Grep, Glob
6
+ ---
7
+
8
+ Find specific experiments in a large history with natural language and structured filters.
9
+
10
+ ## Steps
11
+ 1. **Activate environment:** `source .venv/bin/activate`
12
+ 2. **Run:** `python scripts/experiment_search.py $ARGUMENTS`
13
+ 3. **Filters:** `accuracy>0.85`, `status:kept`, `family:baseline`, `date:last-week`
14
+ 4. **Report:** ranked table of matching experiments
15
+
16
+ ## Examples
17
+ ```
18
+ /turing:search "LightGBM high accuracy" --filter "accuracy>0.85"
19
+ /turing:search "failed neural net" --filter "status:discarded"
20
+ /turing:search "last week" --limit 5
21
+ ```
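
The filter strings shown above suggest two shapes: a numeric comparison (`accuracy>0.85`) and an exact `field:value` match (`status:kept`). The sketch below is one way such strings could be turned into predicates. The grammar, the `parse_filter` name, and the dict-shaped experiment records are assumptions, not the behavior of `scripts/experiment_search.py`, and relative dates such as `date:last-week` are not handled.

```python
import operator
import re

# Two-character operators are listed first in the regex so ">=" is not read as ">".
_COMPARATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}
_NUMERIC = re.compile(r"(\w+)\s*(>=|<=|>|<)\s*([-+]?\d*\.?\d+)")


def parse_filter(expr):
    """Return a predicate over an experiment dict (hypothetical record shape)."""
    match = _NUMERIC.fullmatch(expr)
    if match:
        field = match.group(1)
        compare = _COMPARATORS[match.group(2)]
        value = float(match.group(3))
        # Records missing the field never match a numeric filter.
        return lambda exp: field in exp and compare(exp[field], value)
    field, sep, value = expr.partition(":")
    if not sep:
        raise ValueError(f"unrecognized filter: {expr!r}")
    return lambda exp: str(exp.get(field)) == value
```

A search would then keep only the experiments for which every parsed predicate returns `True`.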
@@ -0,0 +1,46 @@
1
+ ---
2
+ name: seed
3
+ description: Run multi-seed study on an experiment to compute mean/std/CI and flag seed-sensitive results. Prevents publishing lucky seeds.
4
+ argument-hint: "[N] [--quick] [--exp-id <id>]"
5
+ allowed-tools: Read, Bash(*), Grep, Glob
6
+ ---
7
+
8
+ Run a multi-seed study to verify that experiment results are robust across random seeds.
9
+
10
+ ## Steps
11
+
12
+ 1. **Activate environment:**
13
+ ```bash
14
+ source .venv/bin/activate
15
+ ```
16
+
17
+ 2. **Parse arguments from `$ARGUMENTS`:**
18
+ - A bare number (e.g., `5`) sets the seed count
19
+ - `--quick` runs 3 seeds instead of the default 5
20
+ - `--exp-id exp-042` targets a specific experiment (defaults to best)
21
+ - `--seed-list 42,123,456` uses specific seed values
22
+
23
+ 3. **Run seed study:**
24
+ ```bash
25
+ python scripts/seed_runner.py $ARGUMENTS
26
+ ```
27
+
28
+ 4. **Report results:**
29
+ - Show the per-seed results table
30
+ - Show mean +/- std with 95% CI
31
+ - **STABLE (CV < 5%):** result is robust, safe to report
32
+ - **SEED-SENSITIVE (CV >= 5%):** result varies too much across seeds — do not report single-seed numbers
33
+ - If seed-sensitive, recommend reporting as mean +/- std over N seeds
34
+
35
+ 5. **Saved output:** results are written to `experiments/seed_studies/exp-NNN-seeds.yaml`
36
+
37
+ 6. **If no training pipeline exists:** suggest `/turing:init` first.
38
+
39
+ ## Examples
40
+
41
+ ```
42
+ /turing:seed # 5 seeds on best experiment
43
+ /turing:seed --quick # 3 seeds for fast check
44
+ /turing:seed 10 # 10 seeds for thorough study
45
+ /turing:seed --exp-id exp-042 # Specific experiment
46
+ ```
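
The STABLE / SEED-SENSITIVE verdict above rests on the coefficient of variation (CV = std / |mean|). A minimal sketch of that summary follows. It is not the code in `scripts/seed_runner.py`: the `summarize_seeds` name is hypothetical, the sample standard deviation (n - 1) is assumed, and the 95% CI uses the normal approximation (1.96 standard errors), which is narrower than a t-interval for small seed counts.

```python
import math
import statistics


def summarize_seeds(values, cv_threshold=0.05):
    """Summarize one metric across seeds (hypothetical helper).

    values: the metric from each seed run; needs at least two entries.
    """
    n = len(values)
    mean = statistics.mean(values)
    std = statistics.stdev(values)                 # sample std (n - 1)
    half_width = 1.96 * std / math.sqrt(n)         # normal-approximation 95% CI
    cv = std / abs(mean) if mean != 0 else math.inf
    verdict = "STABLE" if cv < cv_threshold else "SEED-SENSITIVE"
    return {
        "n": n,
        "mean": mean,
        "std": std,
        "ci95": (mean - half_width, mean + half_width),
        "cv": cv,
        "verdict": verdict,
    }
```

For example, five accuracies clustered around 0.90 with a spread of about 0.01 yield a CV under 1% and a STABLE verdict, while three runs at 0.5, 0.7, and 0.9 yield a CV near 29% and a SEED-SENSITIVE verdict.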
@@ -0,0 +1,40 @@
1
+ ---
2
+ name: sensitivity
3
+ description: Hyperparameter sensitivity analysis — rank parameters by impact, identify which matter and which are noise.
4
+ argument-hint: "[exp-id] [--params learning_rate,max_depth]"
5
+ allowed-tools: Read, Bash(*), Grep, Glob
6
+ ---
7
+
8
+ Which hyperparameters actually matter? Stop wasting time on the ones that don't.
9
+
10
+ ## Steps
11
+
12
+ 1. **Activate environment:**
13
+ ```bash
14
+ source .venv/bin/activate
15
+ ```
16
+
17
+ 2. **Parse arguments from `$ARGUMENTS`:**
18
+ - Optional experiment ID
19
+ - `--params "learning_rate,max_depth"` — specific parameters to analyze
20
+ - `--json` — raw JSON output
21
+
22
+ 3. **Run sensitivity analysis:**
23
+ ```bash
24
+ python scripts/sensitivity_analysis.py $ARGUMENTS
25
+ ```
26
+
27
+ 4. **Report includes:**
28
+ - Per-parameter sensitivity ranking: HIGH / MED / LOW / NONE
29
+ - Metric range for each parameter sweep
30
+ - Monotonicity detection (does the metric move in one direction, or is there a sweet spot?)
31
+ - Recommendations: focus tuning on X, stop tuning Y
32
+
33
+ 5. **Saved output:** report in `experiments/sensitivity/<exp-id>-sensitivity.yaml`
34
+
35
+ ## Examples
36
+
37
+ ```
38
+ /turing:sensitivity exp-042 # All tunable params
39
+ /turing:sensitivity --params "learning_rate,max_depth" # Specific params
40
+ ```
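
Two of the report items above, the metric range per parameter sweep and the monotonicity check, can be sketched in a few lines. This is an illustration under assumptions: the `sweep_summary` and `rank_parameters` names and the `(param_value, metric)` input shape are hypothetical, and the cutoffs that `scripts/sensitivity_analysis.py` uses for HIGH / MED / LOW / NONE are not given in the source, so no buckets are assigned here.

```python
def sweep_summary(sweep):
    """Return (metric_range, is_monotonic) for one parameter sweep.

    sweep: list of (param_value, metric) pairs (hypothetical input shape).
    """
    metrics = [metric for _, metric in sorted(sweep)]   # order by parameter value
    steps = [later - earlier for earlier, later in zip(metrics, metrics[1:])]
    is_monotonic = all(s >= 0 for s in steps) or all(s <= 0 for s in steps)
    return max(metrics) - min(metrics), is_monotonic


def rank_parameters(sweeps):
    """Order parameter names from largest to smallest metric range."""
    return sorted(sweeps, key=lambda name: sweep_summary(sweeps[name])[0], reverse=True)
```

A non-monotonic sweep with a large range is the "sweet spot" case: the metric peaks at an interior value, so that parameter is worth tuning carefully.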
@@ -0,0 +1,19 @@
1
+ ---
2
+ name: share
3
+ description: Experiment packaging — portable archive with config, metrics, seed study, annotations, reproduction instructions.
4
+ argument-hint: "<exp-ids...> [--include model,figures,code]"
5
+ allowed-tools: Read, Bash(*), Grep, Glob
6
+ ---
7
+
8
+ Package experiments for collaborator handoff or paper supplementary material.
9
+
10
+ ## Steps
11
+ 1. `source .venv/bin/activate`
12
+ 2. `python scripts/package_experiments.py $ARGUMENTS`
13
+ 3. **Saved:** `exports/packages/<name>/`
14
+
15
+ ## Examples
16
+ ```
17
+ /turing:share exp-089
18
+ /turing:share exp-042 exp-089 --include model,figures
19
+ ```
@@ -0,0 +1,27 @@
1
+ ---
2
+ name: simulate
3
+ description: Experiment outcome prediction — predict which configs will beat the current best before running them.
4
+ argument-hint: "[--configs configs.yaml] [--top-k 5] [--threshold 0.001]"
5
+ allowed-tools: Read, Bash(*), Grep, Glob
6
+ ---
7
+
8
+ Predict outcomes before spending compute. Ranks proposed configs and recommends which to run vs skip.
9
+
10
+ ## Steps
11
+ 1. `source .venv/bin/activate`
12
+ 2. `python scripts/experiment_simulator.py $ARGUMENTS`
13
+ 3. **Saved:** `experiments/simulations/`
14
+
15
+ ## How it works
16
+ - Builds a surrogate model from experiment history (weighted k-NN)
17
+ - Predicts metric for each proposed config
18
+ - Applies novelty penalty for configs far from training distribution
19
+ - Ranks and filters: only recommend configs predicted to improve
20
+
21
+ ## Examples
22
+ ```
23
+ /turing:simulate --configs sweep_configs.yaml
24
+ /turing:simulate --configs candidates.yaml --top-k 3
25
+ /turing:simulate --configs proposals.yaml --threshold 0.005
26
+ /turing:simulate --configs sweep.yaml --json
27
+ ```
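
The "How it works" list above (weighted k-NN surrogate, novelty penalty, rank and filter) can be sketched as follows. This is not the code in `scripts/experiment_simulator.py`. The function names, the inverse-distance weighting, the linear penalty on distance to the nearest past experiment, and the reading of `--threshold` as a minimum predicted improvement over the current best are all assumptions made for illustration. Configs are assumed to be already encoded as numeric tuples.

```python
import math


def knn_predict(history, config, k=3, novelty_weight=0.1):
    """Predict a metric for config from past (features, metric) pairs.

    Returns (penalized_prediction, raw_prediction, nearest_distance).
    Hypothetical helper: inverse-distance weights, linear novelty penalty.
    """
    neighbours = sorted((math.dist(x, config), y) for x, y in history)[:k]
    weights = [1.0 / (d + 1e-9) for d, _ in neighbours]   # closer points count more
    raw = sum(w * y for w, (_, y) in zip(weights, neighbours)) / sum(weights)
    nearest = neighbours[0][0]
    # Configs far from anything tried before are trusted less.
    return raw - novelty_weight * nearest, raw, nearest


def recommend(history, candidates, best_so_far, threshold=0.001, top_k=5):
    """Names of candidates predicted to beat best_so_far by more than threshold."""
    scored = [(knn_predict(history, cfg)[0], name) for name, cfg in candidates.items()]
    kept = [(score, name) for score, name in scored if score - best_so_far > threshold]
    return [name for _, name in sorted(kept, reverse=True)[:top_k]]
```

With this shape, a candidate that sits on top of a strong past experiment is recommended, while one far outside the explored region is penalized and skipped even if its nearest neighbours scored well.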
@@ -0,0 +1,23 @@
1
+ ---
2
+ name: status
3
+ description: Show current ML experiment status — best model, recent experiments, convergence state, and trend analysis. Delegates to @ml-evaluator for read-only safety.
4
+ allowed-tools: Read, Bash(*), Grep, Glob
5
+ ---
6
+
7
+ Show the current state of the ML training pipeline. This is an observation-only operation — no code is modified.
8
+
9
+ ## Steps
10
+
11
+ 1. **Run metrics display:**
12
+ ```bash
13
+ source .venv/bin/activate && python scripts/show_metrics.py --last 10
14
+ ```
15
+
16
+ 2. **Summarize for the user:**
17
+ - **Best model:** type, key metrics, experiment ID
18
+ - **Total experiments:** count from the log
19
+ - **Convergence state:** consecutive non-improvements vs patience threshold
20
+ - **Trend:** improving, plateauing, or regressing?
21
+ - **Recommendation:** continue training, try a different approach, or declare convergence
22
+
23
+ 3. **If no experiments exist:** report that the pipeline is ready but untrained. Suggest `/turing:train`.
@@ -0,0 +1,48 @@
1
+ ---
2
+ name: stitch
3
+ description: Pipeline composition — decompose ML pipelines into swappable stages. Show, swap, cache, and run stages independently.
4
+ argument-hint: "<show|swap|cache|run> [stage] [--from exp-id]"
5
+ allowed-tools: Read, Bash(*), Grep, Glob
6
+ ---
7
+
8
+ Decompose your ML pipeline into stages that can be independently varied, cached, and reused across experiments.
9
+
10
+ ## Steps
11
+
12
+ 1. **Activate environment:**
13
+ ```bash
14
+ source .venv/bin/activate
15
+ ```
16
+
17
+ 2. **Parse arguments from `$ARGUMENTS`:**
18
+ - First argument is the action: `show`, `swap`, `cache`, `run`
19
+ - `show` — display pipeline stages with hash and cache status
20
+ - `swap <stage> --from <exp-id>` — replace a stage with one from another experiment
21
+ - `cache` — save intermediate stage outputs to disk
22
+ - `run` — execute pipeline, skipping cached stages
23
+
24
+ 3. **Run pipeline manager:**
25
+ ```bash
26
+ python scripts/pipeline_manager.py $ARGUMENTS
27
+ ```
28
+
29
+ 4. **Report results:**
30
+ - **show:** numbered stage list with description, content hash, and cache status
31
+ - **swap:** what changed, old vs new stage config, updated pipeline
32
+ - **cache:** per-stage cache paths and status
33
+ - **run:** which stages will be skipped (cached) vs re-run
34
+
35
+ 5. **Stage types:** preprocess, features, model, postprocess (configurable in `config.yaml` under `pipeline.stages`)
36
+
37
+ 6. **Cache benefit:** when only the model stage changes, preprocessing and feature engineering are skipped — experiments run faster
38
+
39
+ 7. **If no pipeline config:** falls back to the default 4-stage pipeline
40
+
41
+ ## Examples
42
+
43
+ ```
44
+ /turing:stitch show # Display pipeline stages
45
+ /turing:stitch swap model --from exp-031 # Keep features, swap model
46
+ /turing:stitch cache # Cache intermediate outputs
47
+ /turing:stitch run # Run with cached stages
48
+ ```
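
The content hash and cache status described above can be illustrated with a chained hash: each stage's hash covers its own config plus the hash of the stage before it, so a change invalidates that stage and everything downstream while leaving upstream stages cached. This is a sketch under assumptions, not the logic of `scripts/pipeline_manager.py`. The function names, the SHA-256 choice, the chaining scheme, and the example config keys are all hypothetical.

```python
import hashlib
import json


def stage_hashes(stages):
    """Map stage name -> content hash for an ordered list of (name, config)."""
    hashes = {}
    upstream = ""
    for name, config in stages:
        # sort_keys makes the hash independent of dict key order.
        payload = upstream + json.dumps(config, sort_keys=True)
        upstream = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        hashes[name] = upstream
    return hashes


def stages_to_rerun(stages, cached_hashes):
    """Names of stages whose current hash differs from the cached one."""
    current = stage_hashes(stages)
    return [name for name, _ in stages if cached_hashes.get(name) != current[name]]


# Example configs are made up for illustration.
base = [
    ("preprocess", {"scaler": "standard"}),
    ("features", {"polynomial": False}),
    ("model", {"type": "lightgbm"}),
    ("postprocess", {"threshold": 0.5}),
]
cached = stage_hashes(base)
swapped = base[:2] + [("model", {"type": "xgboost"})] + base[3:]
print(stages_to_rerun(swapped, cached))
```

Swapping only the model config leaves `preprocess` and `features` cached, which matches the cache benefit described in step 6.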
@@ -0,0 +1,158 @@
1
+ ---
2
+ name: suggest
3
+ description: Literature-grounded model selection. Reads the ML task context, searches recent literature, and suggests model architectures worth trying — with citations. Suggestions are auto-queued as hypotheses.
4
+ argument-hint: "[task description override]"
5
+ allowed-tools: Read, Write, Bash(python scripts/*:*, source .venv/bin/activate:*), Grep, Glob, WebSearch, WebFetch
6
+ ---
7
+
8
+ Suggest model architectures for the current ML task. Supports two strategies:
9
+
10
+ - **literature** (default): Web search for recent papers, synthesize grounded suggestions with citations.
11
+ - **treequest**: Tree-search-guided hypothesis exploration using AB-MCTS over the critique scoring function. Explores refinement chains that literature search cannot find.
12
+
13
+ ## Strategy Detection
14
+
15
+ If `$ARGUMENTS` contains `--strategy treequest` or the bare word `treequest`, use the TreeQuest strategy below. Otherwise, use the default literature strategy.
16
+
17
+ ## Steps (Literature Strategy — default)
18
+
19
+ ### 1. Understand the Task
20
+
21
+ Read the project config and recent experiment history to understand the task:
22
+
23
+ ```bash
24
+ cat config.yaml
25
+ ```
26
+
27
+ ```bash
28
+ source .venv/bin/activate && python scripts/show_metrics.py --last 10 2>/dev/null || echo "No experiments yet"
29
+ ```
30
+
31
+ If `$ARGUMENTS` is provided, use that as the task description. Otherwise, infer from `config.yaml` (model type, primary metric, data source, target column).
32
+
33
+ From the config and any task description, identify the key task properties:
34
+ - Data type (tabular, time series, image, text, etc.)
35
+ - Objective (classification, regression, generation, etc.)
36
+ - Special constraints (imbalanced classes, small dataset, real-time, interpretability, etc.)
37
+ - Current model family and what's been tried
38
+
39
+ ### 2. Search Literature
40
+
41
+ Use `WebSearch` to find recent papers and benchmark results. Run 3-5 searches targeting:
42
+
43
+ 1. **Model comparison for this task type:** e.g., "best models for tabular classification benchmark 2024"
44
+ 2. **Current model alternatives:** e.g., "LightGBM vs XGBoost vs CatBoost tabular data"
45
+ 3. **Task-specific techniques:** e.g., "handling class imbalance gradient boosting"
46
+
47
+ For each search, use `WebFetch` on the top 1-2 results to extract specific model recommendations, benchmark numbers, and methodology.
48
+
49
+ Focus on:
50
+ - Recent work (2023-2026) with empirical comparisons
51
+ - Benchmark studies and surveys
52
+ - arXiv papers or reputable ML blogs with concrete results
53
+
54
+ ### 3. Synthesize Suggestions
55
+
56
+ From the literature, synthesize **3-5 concrete model architecture suggestions**. Each must include:
57
+
58
+ - **Model architecture:** specific (e.g., "LightGBM with GOSS sampling", not "try a different model")
59
+ - **Why:** one-sentence rationale grounded in what the literature says
60
+ - **Citation:** paper or source that supports this
61
+ - **Expected impact:** high/medium/low based on how well it fits this task
62
+ - **Implementation hint:** what to change in `train.py` (one concrete line)
63
+
64
+ ### 4. Queue as Hypotheses
65
+
66
+ For each suggestion, add to the hypothesis queue:
67
+
68
+ ```bash
69
+ source .venv/bin/activate && python scripts/manage_hypotheses.py add "<model>: <rationale> (source: <citation>)" --priority medium --source literature
70
+ ```
71
+
72
+ ### 5. Show Results
73
+
74
+ ```
75
+ Literature-Grounded Model Suggestions
76
+ ======================================
77
+
78
+ Task: <task description>
79
+ Current: <current model> (<current metric>=<value>)
80
+ Sources consulted: <N papers/articles>
81
+
82
+ 1. [HIGH] <technique>
83
+ Why: <one-sentence rationale with citation>
84
+ Source: <URL>
85
+ Change: <specific train.py change>
86
+ → Queued as hyp-NNN
87
+
88
+ 2. [MEDIUM] ...
89
+
90
+ Queued N hypotheses. Run /turing:train to test them.
91
+ ```
92
+
93
+ ## Fallback (Literature Strategy)
94
+
95
+ If web search returns insufficient results, suggest model families from `config/taxonomy.toml` based on what hasn't been tried yet. Note that suggestions are taxonomy-based, not literature-backed, and queue with `--source taxonomy`.
96
+
97
+ ## Steps (TreeQuest Strategy)
98
+
99
+ When using `--strategy treequest`:
100
+
101
+ ### 1. Detect Project Directory
102
+
103
+ Same detection logic as the literature strategy — find `config.yaml` + `train.py`.
104
+
105
+ ### 2. Run Tree Search
106
+
107
+ ```bash
108
+ source .venv/bin/activate && python scripts/treequest_suggest.py \
109
+ --log experiments/log.jsonl \
110
+ --config config.yaml \
111
+ --top 5 \
112
+ --iterations 30 \
113
+ --strategy abmcts-a
114
+ ```
115
+
116
+ If TreeQuest is not installed, the script automatically falls back to greedy best-first search.
117
+
118
+ ### 3. Queue Results
119
+
120
+ For each result from the tree search, queue as a hypothesis:
121
+
122
+ ```bash
123
+ source .venv/bin/activate && python scripts/manage_hypotheses.py add "<description>" --priority medium --source treequest
124
+ ```
125
+
126
+ ### 4. Show Results
127
+
128
+ Display the tree search output and confirm hypotheses were queued:
129
+
130
+ ```
131
+ TreeQuest Hypothesis Exploration (AB-MCTS-A)
132
+ ============================================
133
+ Nodes explored: 35
134
+ Top 5 hypotheses by critique score:
135
+
136
+ 1. [PROCEED] (score: 7.8/10)
137
+ Switch to LightGBM with dart boosting; additionally add polynomial features
138
+ Novelty: 8 Feasibility: 9 Impact: 7
139
+
140
+ ...
141
+
142
+ Queued N hypotheses. Run /turing:train to test them.
143
+ ```
144
+
145
+ ### TreeQuest Options
146
+
147
+ Pass additional flags via `$ARGUMENTS`:
148
+ - `--iterations N` — search depth (default: 30)
149
+ - `--top N` — number of results (default: 5)
150
+ - `--strategy abmcts-m` — use Bayesian mixed model variant (requires PyMC)
151
+ - `--greedy` — force greedy fallback without TreeQuest
152
+
153
+ ## Integration
154
+
155
+ - Suggestions feed into `hypotheses.yaml` — the next `/turing:train` picks them up
156
+ - `/turing:brief` shows queued literature-sourced and treequest-sourced hypotheses
157
+ - `/turing:explore` runs the TreeQuest search as a standalone command
158
+ - A human can override priority: `/turing:try` always takes precedence
@@ -0,0 +1,26 @@
1
+ ---
2
+ name: surgery
3
+ description: Architecture modification — add/remove layers, widen/narrow, swap activations, inject skip connections. Specify what to change, system handles how.
4
+ argument-hint: "<exp-id> --op <operation> [args...]"
5
+ allowed-tools: Read, Bash(*), Grep, Glob
6
+ ---
7
+
8
+ Programmatic architecture changes with auto warm-start from existing weights.
9
+
10
+ ## Steps
11
+
12
+ 1. **Activate environment:** `source .venv/bin/activate`
13
+ 2. **Run:** `python scripts/architecture_surgery.py $ARGUMENTS`
14
+ 3. **Operations:** add-layer, remove-layer, widen, narrow, swap-activation, add-skip, add-norm, deepen, swap-objective
15
+ 4. **For tree models:** deepen (increase max_depth), widen (more estimators), swap-objective
16
+ 5. **Report:** operation details, config changes, parameter count delta, warm-start source
17
+ 6. **Saved output:** `experiments/surgery/<exp-id>-<op>.yaml`
18
+
19
+ ## Examples
20
+
21
+ ```
22
+ /turing:surgery exp-042 --op widen 2 # 2x wider hidden layers
23
+ /turing:surgery exp-042 --op add-layer # Insert a layer
24
+ /turing:surgery exp-042 --op swap-activation relu gelu # ReLU → GELU
25
+ /turing:surgery exp-042 --op deepen # Deeper trees
26
+ ```
@@ -0,0 +1,44 @@
1
+ ---
2
+ name: sweep
3
+ description: Generate and run a systematic hyperparameter sweep. Computes the cartesian product of configured parameter ranges and processes the queue sequentially with full experiment logging.
4
+ argument-hint: "[sweep_config.yaml]"
5
+ allowed-tools: Read, Write, Edit, Bash(python train.py:*, python scripts/*:*, git:*, source .venv/bin/activate:*, pip:*), Grep, Glob
6
+ ---
7
+
8
+ Run a systematic hyperparameter sweep using the sweep configuration.
9
+
10
+ ## Steps
11
+
12
+ 1. **Activate environment:**
13
+ ```bash
14
+ source .venv/bin/activate
15
+ ```
16
+
17
+ 2. **Resolve config:** Use `$ARGUMENTS` as sweep config path, or default to `sweep_config.yaml`.
18
+
19
+ 3. **Generate queue** (if not already generated):
20
+ ```bash
21
+ python scripts/sweep.py [sweep_config.yaml]
22
+ ```
23
+
24
+ 4. **Check queue status:**
25
+ ```bash
26
+ python scripts/sweep.py --status
27
+ ```
28
+
29
+ 5. **Process queue sequentially:**
30
+ - Get next: `python scripts/sweep.py --next`
31
+ - Apply config overrides to `config.yaml`
32
+ - Create experiment branch: `git checkout -b exp/NNN-description`
33
+ - Run training: `python train.py > run.log 2>&1`
34
+ - Parse metrics: `grep -A 10 "^---" run.log | head -10`
35
+ - Log the experiment
36
+ - Mark complete: `python scripts/sweep.py --mark <name> complete`
37
+ - If the metric improved, merge the experiment branch into main. If not, switch back to main without merging.
38
+ - Repeat until queue is empty
39
+
40
+ 6. **Report** final results with best configuration found.
41
+
42
+ ## Rules
43
+
44
+ Follow the same safety constraints as `/turing:train` — see `rules/loop-protocol.md`.
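
The sweep description above refers to the cartesian product of the configured parameter ranges. A minimal sketch of that expansion, using the standard library, is shown below. The `expand_grid` name and the dict-of-lists input are assumptions; the actual layout of `sweep_config.yaml` and the queue format used by `scripts/sweep.py` are not shown in the source.

```python
import itertools


def expand_grid(param_ranges):
    """Expand {param: [values...]} into one config dict per combination."""
    names = list(param_ranges)
    value_lists = (param_ranges[name] for name in names)
    return [dict(zip(names, combo)) for combo in itertools.product(*value_lists)]


# Hypothetical ranges: 2 learning rates x 3 depths = 6 queued configs.
queue = expand_grid({"learning_rate": [0.01, 0.1], "max_depth": [4, 6, 8]})
for overrides in queue:
    print(overrides)
```

The queue size is the product of the range lengths, so it grows quickly as parameters are added; that is worth checking before starting a sequential run.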
@@ -0,0 +1,21 @@
1
+ ---
2
+ name: template
3
+ description: Experiment template library — save winning configs as reusable templates, apply to new projects.
4
+ argument-hint: "<save|list|apply|share> [--name name] [--from exp-id]"
5
+ allowed-tools: Read, Bash(*), Grep, Glob
6
+ ---
7
+
8
+ Turn your best experiment configs into reusable recipes that persist across projects.
9
+
10
+ ## Steps
11
+ 1. **Activate environment:** `source .venv/bin/activate`
12
+ 2. **Run:** `python scripts/experiment_templates.py $ARGUMENTS`
13
+ 3. **Operations:** save (from experiment), list (all templates), apply (to current project), share (export)
14
+ 4. **Stored at:** `~/.turing/templates/` (cross-project)
15
+
16
+ ## Examples
17
+ ```
18
+ /turing:template save --from exp-042 --name "tabular-xgboost-v2"
19
+ /turing:template list
20
+ /turing:template apply tabular-xgboost-v2
21
+ ```