claude-turing 4.6.0 → 4.8.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (333)
  1. package/.claude-plugin/plugin.json +2 -2
  2. package/README.md +1 -1
  3. package/commands/ablate.md +0 -1
  4. package/commands/annotate.md +0 -1
  5. package/commands/archive.md +0 -1
  6. package/commands/audit.md +0 -1
  7. package/commands/baseline.md +0 -1
  8. package/commands/brief.md +0 -1
  9. package/commands/budget.md +0 -1
  10. package/commands/calibrate.md +0 -1
  11. package/commands/card.md +0 -1
  12. package/commands/changelog.md +0 -1
  13. package/commands/checkpoint.md +0 -1
  14. package/commands/cite.md +0 -1
  15. package/commands/compare.md +0 -1
  16. package/commands/counterfactual.md +0 -1
  17. package/commands/curriculum.md +0 -1
  18. package/commands/design.md +0 -1
  19. package/commands/diagnose.md +0 -1
  20. package/commands/diff.md +0 -1
  21. package/commands/distill.md +0 -1
  22. package/commands/doctor.md +0 -1
  23. package/commands/ensemble.md +0 -1
  24. package/commands/explore.md +0 -1
  25. package/commands/export.md +0 -1
  26. package/commands/feature.md +0 -1
  27. package/commands/flashback.md +0 -1
  28. package/commands/fork.md +0 -1
  29. package/commands/frontier.md +0 -1
  30. package/commands/init.md +0 -1
  31. package/commands/leak.md +0 -1
  32. package/commands/lit.md +0 -1
  33. package/commands/logbook.md +0 -1
  34. package/commands/merge.md +0 -1
  35. package/commands/mode.md +0 -1
  36. package/commands/onboard.md +0 -1
  37. package/commands/paper.md +0 -1
  38. package/commands/plan.md +0 -1
  39. package/commands/poster.md +0 -1
  40. package/commands/postmortem.md +0 -1
  41. package/commands/preflight.md +0 -1
  42. package/commands/present.md +0 -1
  43. package/commands/profile.md +0 -1
  44. package/commands/prune.md +0 -1
  45. package/commands/quantize.md +0 -1
  46. package/commands/queue.md +0 -1
  47. package/commands/registry.md +0 -1
  48. package/commands/regress.md +0 -1
  49. package/commands/replay.md +0 -1
  50. package/commands/report.md +0 -1
  51. package/commands/reproduce.md +0 -1
  52. package/commands/retry.md +0 -1
  53. package/commands/review.md +0 -1
  54. package/commands/sanity.md +0 -1
  55. package/commands/scale.md +0 -1
  56. package/commands/search.md +0 -1
  57. package/commands/seed.md +0 -1
  58. package/commands/sensitivity.md +0 -1
  59. package/commands/share.md +0 -1
  60. package/commands/simulate.md +0 -1
  61. package/commands/status.md +0 -1
  62. package/commands/stitch.md +0 -1
  63. package/commands/suggest.md +0 -1
  64. package/commands/surgery.md +0 -1
  65. package/commands/sweep.md +0 -1
  66. package/commands/template.md +0 -1
  67. package/commands/train.md +0 -1
  68. package/commands/transfer.md +0 -1
  69. package/commands/trend.md +0 -1
  70. package/commands/try.md +0 -1
  71. package/commands/turing.md +3 -3
  72. package/commands/update.md +0 -1
  73. package/commands/validate.md +0 -1
  74. package/commands/warm.md +0 -1
  75. package/commands/watch.md +0 -1
  76. package/commands/whatif.md +0 -1
  77. package/commands/xray.md +0 -1
  78. package/config/commands.yaml +74 -74
  79. package/package.json +10 -3
  80. package/skills/turing/SKILL.md +180 -0
  81. package/skills/turing/ablate/SKILL.md +46 -0
  82. package/skills/turing/annotate/SKILL.md +22 -0
  83. package/skills/turing/archive/SKILL.md +22 -0
  84. package/skills/turing/audit/SKILL.md +55 -0
  85. package/skills/turing/baseline/SKILL.md +44 -0
  86. package/skills/turing/brief/SKILL.md +94 -0
  87. package/skills/turing/budget/SKILL.md +51 -0
  88. package/skills/turing/calibrate/SKILL.md +46 -0
  89. package/skills/turing/card/SKILL.md +35 -0
  90. package/skills/turing/changelog/SKILL.md +21 -0
  91. package/skills/turing/checkpoint/SKILL.md +46 -0
  92. package/skills/turing/cite/SKILL.md +22 -0
  93. package/skills/turing/compare/SKILL.md +23 -0
  94. package/skills/turing/counterfactual/SKILL.md +26 -0
  95. package/skills/turing/curriculum/SKILL.md +42 -0
  96. package/skills/turing/design/SKILL.md +96 -0
  97. package/skills/turing/diagnose/SKILL.md +51 -0
  98. package/skills/turing/diff/SKILL.md +47 -0
  99. package/skills/turing/distill/SKILL.md +55 -0
  100. package/skills/turing/doctor/SKILL.md +30 -0
  101. package/skills/turing/ensemble/SKILL.md +53 -0
  102. package/skills/turing/explore/SKILL.md +106 -0
  103. package/skills/turing/export/SKILL.md +47 -0
  104. package/skills/turing/feature/SKILL.md +41 -0
  105. package/skills/turing/flashback/SKILL.md +21 -0
  106. package/skills/turing/fork/SKILL.md +39 -0
  107. package/skills/turing/frontier/SKILL.md +44 -0
  108. package/skills/turing/init/SKILL.md +153 -0
  109. package/skills/turing/leak/SKILL.md +46 -0
  110. package/skills/turing/lit/SKILL.md +46 -0
  111. package/skills/turing/logbook/SKILL.md +50 -0
  112. package/skills/turing/merge/SKILL.md +23 -0
  113. package/skills/turing/mode/SKILL.md +42 -0
  114. package/skills/turing/onboard/SKILL.md +19 -0
  115. package/skills/turing/paper/SKILL.md +43 -0
  116. package/skills/turing/plan/SKILL.md +26 -0
  117. package/skills/turing/poster/SKILL.md +88 -0
  118. package/skills/turing/postmortem/SKILL.md +27 -0
  119. package/skills/turing/preflight/SKILL.md +74 -0
  120. package/skills/turing/present/SKILL.md +22 -0
  121. package/skills/turing/profile/SKILL.md +42 -0
  122. package/skills/turing/prune/SKILL.md +25 -0
  123. package/skills/turing/quantize/SKILL.md +23 -0
  124. package/skills/turing/queue/SKILL.md +47 -0
  125. package/skills/turing/registry/SKILL.md +30 -0
  126. package/skills/turing/regress/SKILL.md +52 -0
  127. package/skills/turing/replay/SKILL.md +22 -0
  128. package/skills/turing/report/SKILL.md +96 -0
  129. package/skills/turing/reproduce/SKILL.md +47 -0
  130. package/skills/turing/retry/SKILL.md +40 -0
  131. package/skills/turing/review/SKILL.md +19 -0
  132. package/skills/turing/rules/loop-protocol.md +91 -0
  133. package/skills/turing/sanity/SKILL.md +47 -0
  134. package/skills/turing/scale/SKILL.md +54 -0
  135. package/skills/turing/search/SKILL.md +21 -0
  136. package/skills/turing/seed/SKILL.md +46 -0
  137. package/skills/turing/sensitivity/SKILL.md +40 -0
  138. package/skills/turing/share/SKILL.md +19 -0
  139. package/skills/turing/simulate/SKILL.md +27 -0
  140. package/skills/turing/status/SKILL.md +23 -0
  141. package/skills/turing/stitch/SKILL.md +48 -0
  142. package/skills/turing/suggest/SKILL.md +158 -0
  143. package/skills/turing/surgery/SKILL.md +26 -0
  144. package/skills/turing/sweep/SKILL.md +44 -0
  145. package/skills/turing/template/SKILL.md +21 -0
  146. package/skills/turing/train/SKILL.md +74 -0
  147. package/skills/turing/transfer/SKILL.md +53 -0
  148. package/skills/turing/trend/SKILL.md +20 -0
  149. package/skills/turing/try/SKILL.md +62 -0
  150. package/skills/turing/update/SKILL.md +26 -0
  151. package/skills/turing/validate/SKILL.md +33 -0
  152. package/skills/turing/warm/SKILL.md +52 -0
  153. package/skills/turing/watch/SKILL.md +59 -0
  154. package/skills/turing/whatif/SKILL.md +30 -0
  155. package/skills/turing/xray/SKILL.md +42 -0
  156. package/src/command-registry.js +21 -0
  157. package/src/install.js +4 -3
  158. package/src/sync-commands-layout.js +149 -0
  159. package/src/sync-skills-layout.js +20 -0
  160. package/templates/__pycache__/evaluate.cpython-312.pyc +0 -0
  161. package/templates/__pycache__/evaluate.cpython-314.pyc +0 -0
  162. package/templates/__pycache__/prepare.cpython-312.pyc +0 -0
  163. package/templates/__pycache__/prepare.cpython-314.pyc +0 -0
  164. package/templates/features/__pycache__/__init__.cpython-312.pyc +0 -0
  165. package/templates/features/__pycache__/__init__.cpython-314.pyc +0 -0
  166. package/templates/features/__pycache__/featurizers.cpython-312.pyc +0 -0
  167. package/templates/features/__pycache__/featurizers.cpython-314.pyc +0 -0
  168. package/templates/scripts/__pycache__/__init__.cpython-312.pyc +0 -0
  169. package/templates/scripts/__pycache__/__init__.cpython-314.pyc +0 -0
  170. package/templates/scripts/__pycache__/ablation_study.cpython-312.pyc +0 -0
  171. package/templates/scripts/__pycache__/ablation_study.cpython-314.pyc +0 -0
  172. package/templates/scripts/__pycache__/architecture_surgery.cpython-312.pyc +0 -0
  173. package/templates/scripts/__pycache__/architecture_surgery.cpython-314.pyc +0 -0
  174. package/templates/scripts/__pycache__/budget_manager.cpython-312.pyc +0 -0
  175. package/templates/scripts/__pycache__/budget_manager.cpython-314.pyc +0 -0
  176. package/templates/scripts/__pycache__/build_ensemble.cpython-312.pyc +0 -0
  177. package/templates/scripts/__pycache__/build_ensemble.cpython-314.pyc +0 -0
  178. package/templates/scripts/__pycache__/calibration.cpython-312.pyc +0 -0
  179. package/templates/scripts/__pycache__/calibration.cpython-314.pyc +0 -0
  180. package/templates/scripts/__pycache__/check_convergence.cpython-312.pyc +0 -0
  181. package/templates/scripts/__pycache__/check_convergence.cpython-314.pyc +0 -0
  182. package/templates/scripts/__pycache__/checkpoint_manager.cpython-312.pyc +0 -0
  183. package/templates/scripts/__pycache__/checkpoint_manager.cpython-314.pyc +0 -0
  184. package/templates/scripts/__pycache__/citation_manager.cpython-312.pyc +0 -0
  185. package/templates/scripts/__pycache__/citation_manager.cpython-314.pyc +0 -0
  186. package/templates/scripts/__pycache__/cost_frontier.cpython-312.pyc +0 -0
  187. package/templates/scripts/__pycache__/cost_frontier.cpython-314.pyc +0 -0
  188. package/templates/scripts/__pycache__/counterfactual_explanation.cpython-312.pyc +0 -0
  189. package/templates/scripts/__pycache__/counterfactual_explanation.cpython-314.pyc +0 -0
  190. package/templates/scripts/__pycache__/critique_hypothesis.cpython-312.pyc +0 -0
  191. package/templates/scripts/__pycache__/critique_hypothesis.cpython-314.pyc +0 -0
  192. package/templates/scripts/__pycache__/curriculum_optimizer.cpython-312.pyc +0 -0
  193. package/templates/scripts/__pycache__/curriculum_optimizer.cpython-314.pyc +0 -0
  194. package/templates/scripts/__pycache__/diagnose_errors.cpython-312.pyc +0 -0
  195. package/templates/scripts/__pycache__/diagnose_errors.cpython-314.pyc +0 -0
  196. package/templates/scripts/__pycache__/draft_paper_sections.cpython-312.pyc +0 -0
  197. package/templates/scripts/__pycache__/draft_paper_sections.cpython-314.pyc +0 -0
  198. package/templates/scripts/__pycache__/equivalence_checker.cpython-312.pyc +0 -0
  199. package/templates/scripts/__pycache__/equivalence_checker.cpython-314.pyc +0 -0
  200. package/templates/scripts/__pycache__/experiment_annotations.cpython-312.pyc +0 -0
  201. package/templates/scripts/__pycache__/experiment_annotations.cpython-314.pyc +0 -0
  202. package/templates/scripts/__pycache__/experiment_archive.cpython-312.pyc +0 -0
  203. package/templates/scripts/__pycache__/experiment_archive.cpython-314.pyc +0 -0
  204. package/templates/scripts/__pycache__/experiment_diff.cpython-312.pyc +0 -0
  205. package/templates/scripts/__pycache__/experiment_diff.cpython-314.pyc +0 -0
  206. package/templates/scripts/__pycache__/experiment_index.cpython-312.pyc +0 -0
  207. package/templates/scripts/__pycache__/experiment_index.cpython-314.pyc +0 -0
  208. package/templates/scripts/__pycache__/experiment_queue.cpython-312.pyc +0 -0
  209. package/templates/scripts/__pycache__/experiment_queue.cpython-314.pyc +0 -0
  210. package/templates/scripts/__pycache__/experiment_replay.cpython-312.pyc +0 -0
  211. package/templates/scripts/__pycache__/experiment_replay.cpython-314.pyc +0 -0
  212. package/templates/scripts/__pycache__/experiment_search.cpython-312.pyc +0 -0
  213. package/templates/scripts/__pycache__/experiment_search.cpython-314.pyc +0 -0
  214. package/templates/scripts/__pycache__/experiment_simulator.cpython-312.pyc +0 -0
  215. package/templates/scripts/__pycache__/experiment_simulator.cpython-314.pyc +0 -0
  216. package/templates/scripts/__pycache__/experiment_templates.cpython-312.pyc +0 -0
  217. package/templates/scripts/__pycache__/experiment_templates.cpython-314.pyc +0 -0
  218. package/templates/scripts/__pycache__/export_card.cpython-312.pyc +0 -0
  219. package/templates/scripts/__pycache__/export_card.cpython-314.pyc +0 -0
  220. package/templates/scripts/__pycache__/export_formats.cpython-312.pyc +0 -0
  221. package/templates/scripts/__pycache__/export_formats.cpython-314.pyc +0 -0
  222. package/templates/scripts/__pycache__/failure_postmortem.cpython-312.pyc +0 -0
  223. package/templates/scripts/__pycache__/failure_postmortem.cpython-314.pyc +0 -0
  224. package/templates/scripts/__pycache__/feature_intelligence.cpython-312.pyc +0 -0
  225. package/templates/scripts/__pycache__/feature_intelligence.cpython-314.pyc +0 -0
  226. package/templates/scripts/__pycache__/fork_experiment.cpython-312.pyc +0 -0
  227. package/templates/scripts/__pycache__/fork_experiment.cpython-314.pyc +0 -0
  228. package/templates/scripts/__pycache__/generate_baselines.cpython-312.pyc +0 -0
  229. package/templates/scripts/__pycache__/generate_baselines.cpython-314.pyc +0 -0
  230. package/templates/scripts/__pycache__/generate_brief.cpython-312.pyc +0 -0
  231. package/templates/scripts/__pycache__/generate_brief.cpython-314.pyc +0 -0
  232. package/templates/scripts/__pycache__/generate_changelog.cpython-312.pyc +0 -0
  233. package/templates/scripts/__pycache__/generate_changelog.cpython-314.pyc +0 -0
  234. package/templates/scripts/__pycache__/generate_figures.cpython-312.pyc +0 -0
  235. package/templates/scripts/__pycache__/generate_figures.cpython-314.pyc +0 -0
  236. package/templates/scripts/__pycache__/generate_logbook.cpython-312.pyc +0 -0
  237. package/templates/scripts/__pycache__/generate_logbook.cpython-314.pyc +0 -0
  238. package/templates/scripts/__pycache__/generate_model_card.cpython-312.pyc +0 -0
  239. package/templates/scripts/__pycache__/generate_model_card.cpython-314.pyc +0 -0
  240. package/templates/scripts/__pycache__/generate_onboarding.cpython-312.pyc +0 -0
  241. package/templates/scripts/__pycache__/generate_onboarding.cpython-314.pyc +0 -0
  242. package/templates/scripts/__pycache__/harness_doctor.cpython-312.pyc +0 -0
  243. package/templates/scripts/__pycache__/harness_doctor.cpython-314.pyc +0 -0
  244. package/templates/scripts/__pycache__/incremental_update.cpython-312.pyc +0 -0
  245. package/templates/scripts/__pycache__/incremental_update.cpython-314.pyc +0 -0
  246. package/templates/scripts/__pycache__/knowledge_transfer.cpython-312.pyc +0 -0
  247. package/templates/scripts/__pycache__/knowledge_transfer.cpython-314.pyc +0 -0
  248. package/templates/scripts/__pycache__/latency_benchmark.cpython-312.pyc +0 -0
  249. package/templates/scripts/__pycache__/latency_benchmark.cpython-314.pyc +0 -0
  250. package/templates/scripts/__pycache__/leakage_detector.cpython-312.pyc +0 -0
  251. package/templates/scripts/__pycache__/leakage_detector.cpython-314.pyc +0 -0
  252. package/templates/scripts/__pycache__/literature_search.cpython-312.pyc +0 -0
  253. package/templates/scripts/__pycache__/literature_search.cpython-314.pyc +0 -0
  254. package/templates/scripts/__pycache__/log_experiment.cpython-312.pyc +0 -0
  255. package/templates/scripts/__pycache__/log_experiment.cpython-314.pyc +0 -0
  256. package/templates/scripts/__pycache__/manage_hypotheses.cpython-312.pyc +0 -0
  257. package/templates/scripts/__pycache__/manage_hypotheses.cpython-314.pyc +0 -0
  258. package/templates/scripts/__pycache__/methodology_audit.cpython-312.pyc +0 -0
  259. package/templates/scripts/__pycache__/methodology_audit.cpython-314.pyc +0 -0
  260. package/templates/scripts/__pycache__/model_distiller.cpython-312.pyc +0 -0
  261. package/templates/scripts/__pycache__/model_distiller.cpython-314.pyc +0 -0
  262. package/templates/scripts/__pycache__/model_lifecycle.cpython-312.pyc +0 -0
  263. package/templates/scripts/__pycache__/model_lifecycle.cpython-314.pyc +0 -0
  264. package/templates/scripts/__pycache__/model_merger.cpython-312.pyc +0 -0
  265. package/templates/scripts/__pycache__/model_merger.cpython-314.pyc +0 -0
  266. package/templates/scripts/__pycache__/model_pruning.cpython-312.pyc +0 -0
  267. package/templates/scripts/__pycache__/model_pruning.cpython-314.pyc +0 -0
  268. package/templates/scripts/__pycache__/model_quantization.cpython-312.pyc +0 -0
  269. package/templates/scripts/__pycache__/model_quantization.cpython-314.pyc +0 -0
  270. package/templates/scripts/__pycache__/model_xray.cpython-312.pyc +0 -0
  271. package/templates/scripts/__pycache__/model_xray.cpython-314.pyc +0 -0
  272. package/templates/scripts/__pycache__/novelty_guard.cpython-312.pyc +0 -0
  273. package/templates/scripts/__pycache__/novelty_guard.cpython-314.pyc +0 -0
  274. package/templates/scripts/__pycache__/package_experiments.cpython-312.pyc +0 -0
  275. package/templates/scripts/__pycache__/package_experiments.cpython-314.pyc +0 -0
  276. package/templates/scripts/__pycache__/pareto_frontier.cpython-312.pyc +0 -0
  277. package/templates/scripts/__pycache__/pareto_frontier.cpython-314.pyc +0 -0
  278. package/templates/scripts/__pycache__/parse_metrics.cpython-312.pyc +0 -0
  279. package/templates/scripts/__pycache__/parse_metrics.cpython-314.pyc +0 -0
  280. package/templates/scripts/__pycache__/pipeline_manager.cpython-312.pyc +0 -0
  281. package/templates/scripts/__pycache__/pipeline_manager.cpython-314.pyc +0 -0
  282. package/templates/scripts/__pycache__/profile_training.cpython-312.pyc +0 -0
  283. package/templates/scripts/__pycache__/profile_training.cpython-314.pyc +0 -0
  284. package/templates/scripts/__pycache__/regression_gate.cpython-312.pyc +0 -0
  285. package/templates/scripts/__pycache__/regression_gate.cpython-314.pyc +0 -0
  286. package/templates/scripts/__pycache__/reproduce_experiment.cpython-312.pyc +0 -0
  287. package/templates/scripts/__pycache__/reproduce_experiment.cpython-314.pyc +0 -0
  288. package/templates/scripts/__pycache__/research_planner.cpython-312.pyc +0 -0
  289. package/templates/scripts/__pycache__/research_planner.cpython-314.pyc +0 -0
  290. package/templates/scripts/__pycache__/sanity_checks.cpython-312.pyc +0 -0
  291. package/templates/scripts/__pycache__/sanity_checks.cpython-314.pyc +0 -0
  292. package/templates/scripts/__pycache__/scaffold.cpython-312.pyc +0 -0
  293. package/templates/scripts/__pycache__/scaffold.cpython-314.pyc +0 -0
  294. package/templates/scripts/__pycache__/scaling_estimator.cpython-312.pyc +0 -0
  295. package/templates/scripts/__pycache__/scaling_estimator.cpython-314.pyc +0 -0
  296. package/templates/scripts/__pycache__/seed_runner.cpython-312.pyc +0 -0
  297. package/templates/scripts/__pycache__/seed_runner.cpython-314.pyc +0 -0
  298. package/templates/scripts/__pycache__/sensitivity_analysis.cpython-312.pyc +0 -0
  299. package/templates/scripts/__pycache__/sensitivity_analysis.cpython-314.pyc +0 -0
  300. package/templates/scripts/__pycache__/session_flashback.cpython-312.pyc +0 -0
  301. package/templates/scripts/__pycache__/session_flashback.cpython-314.pyc +0 -0
  302. package/templates/scripts/__pycache__/show_experiment_tree.cpython-312.pyc +0 -0
  303. package/templates/scripts/__pycache__/show_experiment_tree.cpython-314.pyc +0 -0
  304. package/templates/scripts/__pycache__/show_families.cpython-312.pyc +0 -0
  305. package/templates/scripts/__pycache__/show_families.cpython-314.pyc +0 -0
  306. package/templates/scripts/__pycache__/simulate_review.cpython-312.pyc +0 -0
  307. package/templates/scripts/__pycache__/simulate_review.cpython-314.pyc +0 -0
  308. package/templates/scripts/__pycache__/smart_retry.cpython-312.pyc +0 -0
  309. package/templates/scripts/__pycache__/smart_retry.cpython-314.pyc +0 -0
  310. package/templates/scripts/__pycache__/statistical_compare.cpython-312.pyc +0 -0
  311. package/templates/scripts/__pycache__/statistical_compare.cpython-314.pyc +0 -0
  312. package/templates/scripts/__pycache__/suggest_next.cpython-312.pyc +0 -0
  313. package/templates/scripts/__pycache__/suggest_next.cpython-314.pyc +0 -0
  314. package/templates/scripts/__pycache__/sweep.cpython-312.pyc +0 -0
  315. package/templates/scripts/__pycache__/sweep.cpython-314.pyc +0 -0
  316. package/templates/scripts/__pycache__/synthesize_decision.cpython-312.pyc +0 -0
  317. package/templates/scripts/__pycache__/synthesize_decision.cpython-314.pyc +0 -0
  318. package/templates/scripts/__pycache__/training_monitor.cpython-312.pyc +0 -0
  319. package/templates/scripts/__pycache__/training_monitor.cpython-314.pyc +0 -0
  320. package/templates/scripts/__pycache__/treequest_suggest.cpython-312.pyc +0 -0
  321. package/templates/scripts/__pycache__/treequest_suggest.cpython-314.pyc +0 -0
  322. package/templates/scripts/__pycache__/trend_analysis.cpython-312.pyc +0 -0
  323. package/templates/scripts/__pycache__/trend_analysis.cpython-314.pyc +0 -0
  324. package/templates/scripts/__pycache__/turing_io.cpython-312.pyc +0 -0
  325. package/templates/scripts/__pycache__/turing_io.cpython-314.pyc +0 -0
  326. package/templates/scripts/__pycache__/update_state.cpython-312.pyc +0 -0
  327. package/templates/scripts/__pycache__/update_state.cpython-314.pyc +0 -0
  328. package/templates/scripts/__pycache__/verify_placeholders.cpython-312.pyc +0 -0
  329. package/templates/scripts/__pycache__/verify_placeholders.cpython-314.pyc +0 -0
  330. package/templates/scripts/__pycache__/warm_start.cpython-312.pyc +0 -0
  331. package/templates/scripts/__pycache__/warm_start.cpython-314.pyc +0 -0
  332. package/templates/scripts/__pycache__/whatif_engine.cpython-312.pyc +0 -0
  333. package/templates/scripts/__pycache__/whatif_engine.cpython-314.pyc +0 -0
@@ -0,0 +1,46 @@
+ ---
+ name: checkpoint
+ description: Smart checkpoint management — list, prune (Pareto-based), average top-K, resume from any point, disk usage stats.
+ argument-hint: "<list|prune|average|resume|stats> [exp-id] [--top 3] [--dry-run]"
+ allowed-tools: Read, Bash(*), Grep, Glob
+ ---
+
+ Manage model checkpoints intelligently using Pareto dominance.
+
+ ## Steps
+
+ 1. **Activate environment:**
+ ```bash
+ source .venv/bin/activate
+ ```
+
+ 2. **Parse arguments from `$ARGUMENTS`:**
+ - First word is the action: `list`, `prune`, `average`, `resume`, `stats`
+ - `resume` requires an experiment ID as second argument
+ - `--top 3` sets the number of checkpoints for averaging
+ - `--dry-run` previews pruning without deleting
+
+ 3. **Run checkpoint manager:**
+ ```bash
+ python scripts/checkpoint_manager.py $ARGUMENTS
+ ```
+
+ 4. **Report results by action:**
+ - **list:** Table of all checkpoints with metrics, size, and Pareto status
+ - **prune:** Removes dominated checkpoints, reports space saved
+ - **average:** Lists top-K checkpoints for weight averaging
+ - **resume:** Locates checkpoint for a specific experiment
+ - **stats:** Disk usage summary by total, average, and model type
+
+ 5. **Saved output:** report written to `experiments/checkpoints/checkpoint-report.yaml`
+
+ ## Examples
+
+ ```
+ /turing:checkpoint list # Show all checkpoints
+ /turing:checkpoint stats # Disk usage summary
+ /turing:checkpoint prune --dry-run # Preview what would be pruned
+ /turing:checkpoint prune # Remove dominated checkpoints
+ /turing:checkpoint average --top 5 # Top 5 for averaging
+ /turing:checkpoint resume exp-042 # Resume from checkpoint
+ ```
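The Pareto-based pruning this command describes can be sketched in a few lines. This is an illustrative sketch only; the function names and the metric names (`accuracy`, `loss`, `size_mb`) are assumptions, not taken from `checkpoint_manager.py`:

```python
# A checkpoint is dominated if some other checkpoint is at least as good on
# every metric and strictly better on at least one; prune removes those.

def dominates(a: dict, b: dict,
              higher_is_better=("accuracy",),
              lower_is_better=("loss", "size_mb")) -> bool:
    """True if checkpoint a Pareto-dominates checkpoint b."""
    at_least_as_good = (all(a[m] >= b[m] for m in higher_is_better)
                        and all(a[m] <= b[m] for m in lower_is_better))
    strictly_better = (any(a[m] > b[m] for m in higher_is_better)
                       or any(a[m] < b[m] for m in lower_is_better))
    return at_least_as_good and strictly_better

def pareto_front(checkpoints: list) -> list:
    """Keep only checkpoints not dominated by any other checkpoint."""
    return [c for c in checkpoints
            if not any(dominates(o, c) for o in checkpoints if o is not c)]
```

Under this sketch, `prune` would delete everything outside `pareto_front(...)`, and `--dry-run` would only print that set.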
@@ -0,0 +1,22 @@
+ ---
+ name: cite
+ description: Citation & attribution manager — track papers, datasets, methods. Audit for missing citations, generate BibTeX.
+ argument-hint: "<add|list|check|bib> [--key Chen2016 --title XGBoost --url ...]"
+ allowed-tools: Read, Bash(*), Grep, Glob
+ ---
+
+ Track which papers and methods influenced each experiment. Catch missing citations before submission.
+
+ ## Steps
+ 1. **Activate environment:** `source .venv/bin/activate`
+ 2. **Run:** `python scripts/citation_manager.py $ARGUMENTS`
+ 3. **Operations:** add (associate citation with experiment), list (group by type), check (audit missing), bib (BibTeX)
+ 4. **Stored in:** `experiments/citations.yaml`
+
+ ## Examples
+ ```
+ /turing:cite add exp-042 --key Chen2016 --title "XGBoost" --type method --url "https://arxiv.org/abs/1603.02754"
+ /turing:cite list
+ /turing:cite check # Audit for missing citations
+ /turing:cite bib # Generate BibTeX
+ ```
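The `bib` operation's rendering step can be sketched as follows. A hypothetical helper, not from `citation_manager.py`; the field set and `@misc` default are assumptions:

```python
def to_bibtex(key, title, author=None, year=None, url=None, entry_type="misc"):
    """Render one citation record as a BibTeX entry, skipping empty fields."""
    fields = {"title": title, "author": author, "year": year, "url": url}
    body = ",\n".join(f"  {k} = {{{v}}}" for k, v in fields.items() if v)
    return f"@{entry_type}{{{key},\n{body}\n}}"
```

For the `Chen2016` record above, this would emit an `@misc{Chen2016, ...}` entry with only the title and url fields populated.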
@@ -0,0 +1,23 @@
+ ---
+ name: compare
+ description: Compare two ML experiment runs side-by-side — metrics, configuration deltas, and a verdict on which approach is more promising.
+ argument-hint: "<exp-id-1> <exp-id-2>"
+ allowed-tools: Read, Bash(*), Grep, Glob
+ ---
+
+ Compare two ML experiment runs side-by-side to understand what changed and why one performed better.
+
+ ## Steps
+
+ 1. **Run comparison:**
+ ```bash
+ source .venv/bin/activate && python scripts/compare_runs.py $0 $1
+ ```
+
+ 2. **Analyze the delta:**
+ - **Metric differences:** all configured metrics for both runs
+ - **Configuration delta:** what changed (model type, hyperparameters, features)
+ - **Causal analysis:** which changes likely caused the metric difference
+ - **Verdict:** which approach is more promising for future experiments
+
+ 3. **If either ID is missing:** report the error and suggest `/turing:status` to see available experiment IDs.
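The configuration-delta step above amounts to a keyed diff of the two run configs. A minimal sketch, assuming flat config dicts (`compare_runs.py` internals are not shown in this diff):

```python
def config_delta(cfg_a: dict, cfg_b: dict) -> dict:
    """Return {key: (old, new)} for every setting that differs between runs.

    Keys present in only one config appear with None on the other side.
    """
    keys = set(cfg_a) | set(cfg_b)
    return {k: (cfg_a.get(k), cfg_b.get(k))
            for k in sorted(keys)
            if cfg_a.get(k) != cfg_b.get(k)}
```

The resulting `(old, new)` pairs are what the causal analysis then tries to link to the observed metric difference.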
@@ -0,0 +1,26 @@
+ ---
+ name: counterfactual
+ description: Input-level counterfactual explanations — find the smallest input change to flip a prediction.
+ argument-hint: "<exp-id> --sample <index> [--target <class>]"
+ allowed-tools: Read, Bash(*), Grep, Glob
+ ---
+
+ What would need to change to flip this prediction? Minimum-change counterfactual for individual predictions.
+
+ ## Steps
+ 1. `source .venv/bin/activate`
+ 2. `python scripts/counterfactual_explanation.py $ARGUMENTS`
+ 3. **Saved:** `experiments/counterfactuals/`
+
+ ## Methods
+ - **Greedy perturbation:** change one feature at a time, find minimum flip
+ - **Prototype-based:** find nearest training sample from target class
+ - Both methods run and the best (smallest distance) is selected
+
+ ## Examples
+ ```
+ /turing:counterfactual exp-042 --sample 1247
+ /turing:counterfactual exp-042 --sample 1247 --target 0
+ /turing:counterfactual exp-042 --batch-misclassified
+ /turing:counterfactual exp-042 --sample 500 --json
+ ```
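The greedy-perturbation method listed above can be sketched like this. An illustrative sketch under assumed interfaces (`model_predict`, a per-feature candidate grid); it is not the package's implementation:

```python
def greedy_counterfactual(model_predict, x, target, candidates):
    """Try changing one feature at a time; return the single-feature edit
    (index, new_value, distance) that flips the prediction to `target`
    with the smallest absolute change, or None if no edit flips it.
    """
    best = None
    for i, values in candidates.items():   # feature index -> values to try
        for v in values:
            x2 = list(x)
            x2[i] = v
            if model_predict(x2) == target:
                dist = abs(v - x[i])
                if best is None or dist < best[2]:
                    best = (i, v, dist)
    return best
```

The prototype-based method would instead search the training set for the nearest sample already predicted as `target`; whichever of the two yields the smaller distance is reported.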
@@ -0,0 +1,42 @@
+ ---
+ name: curriculum
+ description: Training curriculum optimization — order data by difficulty, compare easy-to-hard vs hard-to-easy vs self-paced strategies.
+ argument-hint: "[exp-id] [--strategies easy-to-hard,random]"
+ allowed-tools: Read, Bash(*), Grep, Glob
+ ---
+
+ Does the order your model sees data matter? Find out systematically.
+
+ ## Steps
+
+ 1. **Activate environment:**
+ ```bash
+ source .venv/bin/activate
+ ```
+
+ 2. **Parse arguments from `$ARGUMENTS`:**
+ - Optional experiment ID
+ - `--strategies "easy_to_hard,hard_to_easy,self_paced,random"` — strategies to test
+ - `--json` — raw JSON output
+
+ 3. **Run curriculum analysis:**
+ ```bash
+ python scripts/curriculum_optimizer.py $ARGUMENTS
+ ```
+
+ 4. **Strategies tested:**
+ - **Random:** standard shuffling (control)
+ - **Easy-to-hard:** classic curriculum learning
+ - **Hard-to-easy:** anti-curriculum
+ - **Self-paced:** start easy, gradually include harder samples
+
+ 5. **Report includes:** strategy comparison table with metric, convergence epoch, and speedup vs random; impossible sample detection (likely mislabeled)
+
+ 6. **Saved output:** report in `experiments/curriculum/<exp-id>-curriculum.yaml`
+
+ ## Examples
+
+ ```
+ /turing:curriculum exp-042 # All strategies
+ /turing:curriculum --strategies easy_to_hard,random # Specific strategies
+ ```
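The core of these strategies is just an ordering of the training data by some difficulty proxy. A minimal sketch, assuming per-sample loss from a reference model as that proxy (the package's actual scoring is not shown in this diff):

```python
import random

def order_by_difficulty(samples, losses, strategy="easy_to_hard"):
    """Order samples by per-sample loss (a common difficulty proxy)."""
    order = sorted(range(len(samples)), key=lambda i: losses[i])  # easy first
    if strategy == "hard_to_easy":
        order.reverse()
    elif strategy == "random":
        random.Random(0).shuffle(order)  # seeded shuffle as the control arm
    return [samples[i] for i in order]
```

Self-paced learning extends this by starting with only the low-loss prefix of the easy-first ordering and widening the loss threshold as training progresses.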
@@ -0,0 +1,96 @@
+ ---
+ name: design
+ description: Generate a structured experiment design for a hypothesis. Reads experiment history, searches literature for methodology, produces a scored design document at experiments/designs/.
+ argument-hint: "<hypothesis-id or description>"
+ allowed-tools: Read, Write, Bash(python scripts/*:*, source .venv/bin/activate:*, mkdir:*), Grep, Glob, WebSearch, WebFetch
+ ---
+
+ Front-load the thinking before the coding. Given a hypothesis, produce a structured experiment design grounded in methodology from the literature.
+
+ ## Steps
+
+ ### 1. Load Context
+
+ If `$ARGUMENTS` matches `hyp-NNN`, load the hypothesis:
+ ```bash
+ source .venv/bin/activate && python scripts/manage_hypotheses.py show $ARGUMENTS
+ ```
+
+ If freeform text, use it directly as the hypothesis description.
+
+ Read the current config and experiment state:
+ ```bash
+ cat config.yaml
+ ```
+ ```bash
+ source .venv/bin/activate && python scripts/show_metrics.py --last 10 2>/dev/null || echo "No experiments yet"
+ ```
+ ```bash
+ cat experiment_state.yaml 2>/dev/null || echo "No experiment state yet"
+ ```
+
+ ### 2. Search for Methodology
+
+ Use `WebSearch` to find 2-3 papers or articles describing how to implement the proposed change effectively. Target:
+ - The specific technique in the hypothesis (e.g., "LightGBM dart boosting implementation best practices")
+ - Common pitfalls for this type of change
+ - Benchmark results showing expected improvement range
+
+ Use `WebFetch` on the most relevant results to extract specific methodology details: hyperparameter recommendations, training procedures, evaluation approaches.
+
+ ### 3. Write the Design Document
+
+ Create `experiments/designs/<hyp-id>-design.md` (or `experiments/designs/adhoc-<date>-design.md` for freeform hypotheses):
+
+ ```bash
+ mkdir -p experiments/designs
+ ```
+
+ Write with this structure:
+
+ ```markdown
+ # Experiment Design: <hypothesis summary>
+
+ ## Hypothesis
+ <full description>
+
+ ## Objective
+ <what we're testing, stated as a falsifiable claim>
+
+ ## Method
+ <specific changes, grounded in literature findings>
+
+ ## Literature Support
+ - <source 1>: <what it says about this approach>
+ - <source 2>: <relevant finding>
+
+ ## Implementation Plan
+ ### Changes to train.py
+ <concrete code changes needed>
+
+ ### Changes to config.yaml (if any)
+ <hyperparameter values to set, with rationale from literature>
+
+ ## Expected Outcome
+ - **Success:** <metric > threshold, specific number>
+ - **Failure:** <what would disprove the hypothesis>
+
+ ## Risks
+ <specific pitfalls from literature, not generic "might not work">
+
+ ## Estimated Runs
+ <how many iterations>
+ ```
+
+ ### 4. Self-Critique
+
+ Review the design:
+ - Is the implementation plan specific enough for the researcher agent to execute without ambiguity?
+ - Does the expected outcome have a concrete metric threshold?
+ - Are risks actionable?
+
+ Score each dimension 1-10 (feasibility, novelty, clarity). If any < 7, revise that section. Max 2 revision rounds.
+
+ ### 5. Report
+
+ Display the design summary with scores and file location. The researcher agent can read the design during `/turing:train`.
@@ -0,0 +1,51 @@
---
name: diagnose
description: Error analysis — cluster failure cases, identify systematic failure modes, and suggest targeted fixes with auto-queued hypotheses.
argument-hint: "[exp-id] [--auto-queue] [--top 5]"
allowed-tools: Read, Bash(*), Grep, Glob
---

Analyze where and why the model fails, beyond aggregate metrics.

## Steps

1. **Activate environment:**
   ```bash
   source .venv/bin/activate
   ```

2. **Generate predictions if needed:**
   Check if `experiments/predictions/exp-NNN-preds.yaml` exists. If not, run:
   ```bash
   python train.py --predict-only --output experiments/predictions/
   ```
   The predictions file must contain `y_true`, `y_pred`, `task_type`, and optionally `features`.

3. **Parse arguments from `$ARGUMENTS`:**
   - First argument can be an experiment ID (e.g., `exp-042`); defaults to the best experiment
   - `--auto-queue` auto-queues hypotheses from failure modes into `hypotheses.yaml`
   - `--top 5` limits output to the top N failure modes (default 5)

4. **Run error analysis:**
   ```bash
   python scripts/diagnose_errors.py $ARGUMENTS
   ```

5. **Report results:**
   - **Classification:** confusion matrix, most-confused pairs, per-class P/R/F1, low-recall classes
   - **Regression:** residual stats, P90/P95 errors, feature-range bias, systematic bias
   - **Failure modes:** ranked by impact, with suggested fixes
   - **Auto-hypotheses:** if `--auto-queue`, shows queued hypotheses targeting weaknesses

6. **Saved output:** report written to `experiments/diagnoses/exp-NNN-diagnosis.yaml`

7. **If no predictions file exists:** instruct the user to run the model on the validation set first.
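The low-recall-class check in step 5 can be sketched with plain counts (an illustration only; `diagnose_errors.py`'s actual implementation and thresholds are assumptions here):

```python
from collections import Counter

def per_class_recall(y_true, y_pred):
    """Recall per class: correct predictions / true occurrences."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {cls: hits[cls] / n for cls, n in support.items()}

def low_recall_classes(y_true, y_pred, threshold=0.5):
    """Flag classes whose recall falls below the threshold."""
    recalls = per_class_recall(y_true, y_pred)
    return sorted(c for c, r in recalls.items() if r < threshold)

y_true = ["cat", "cat", "dog", "dog", "dog", "bird"]
y_pred = ["cat", "dog", "dog", "dog", "cat", "cat"]
print(low_recall_classes(y_true, y_pred))  # → ['bird']
```

Classes that are never predicted correctly, like `bird` above, are exactly the failure modes the aggregate accuracy hides.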

## Examples

```
/turing:diagnose                # Analyze best experiment
/turing:diagnose exp-042        # Specific experiment
/turing:diagnose --auto-queue   # Queue fix hypotheses
/turing:diagnose --top 10       # Top 10 failure modes
```
@@ -0,0 +1,47 @@
---
name: diff
description: Deep experiment comparison — config diffs, metric significance, per-class regressions, training curve divergence, feature importance shifts.
argument-hint: "<exp-a> <exp-b> [--code] [--json]"
allowed-tools: Read, Bash(*), Grep, Glob
---

Deep diagnostic comparison of two experiments. Goes beyond "which metric is higher" to show where, when, and why two experiments diverge.

## Steps

1. **Activate environment:**
   ```bash
   source .venv/bin/activate
   ```

2. **Parse arguments from `$ARGUMENTS`:**
   - First two arguments are experiment IDs (required), e.g. `exp-042 exp-053`
   - `--code` includes a git diff of `train.py` between the two experiments' commits
   - `--json` outputs raw JSON instead of markdown

3. **Run deep comparison:**
   ```bash
   python scripts/experiment_diff.py $ARGUMENTS
   ```

4. **Report results — the diff includes:**
   - **Config diff:** which hyperparameters changed, with magnitude (e.g., `max_depth: 6 → 8 (+33%)`)
   - **Metric diff:** all metrics with deltas and statistical significance (if seed studies exist)
   - **Per-class diff:** which classes improved/regressed — flags regressions hidden by aggregate improvement
   - **Training curve divergence:** the epoch where the two experiments' loss/metric curves separate
   - **Feature importance shifts:** which features gained/lost importance
   - **Code diff (`--code`):** git diff of `train.py` between the two commits

5. **Saved output:** report written to `experiments/diffs/<exp-a>-vs-<exp-b>.yaml`

6. **If an experiment ID is not found:** list available experiment IDs from `experiments/log.jsonl`

7. **If no training pipeline exists:** suggest `/turing:init` first.
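One plausible way to locate the divergence epoch from step 4 (an illustrative sketch; the real `experiment_diff.py` logic and its tolerance are assumptions):

```python
def divergence_epoch(curve_a, curve_b, rel_tol=0.05):
    """First epoch where two per-epoch metric curves separate by more
    than rel_tol relative to the mean magnitude of the two values."""
    for epoch, (a, b) in enumerate(zip(curve_a, curve_b)):
        scale = (abs(a) + abs(b)) / 2 or 1.0  # avoid division by zero
        if abs(a - b) / scale > rel_tol:
            return epoch
    return None  # curves never separated

loss_a = [0.90, 0.70, 0.55, 0.45, 0.40]
loss_b = [0.90, 0.69, 0.54, 0.38, 0.30]
print(divergence_epoch(loss_a, loss_b))  # → 3
```

Knowing the curves track each other for three epochs and then split points the investigation at what kicks in around epoch 3 (e.g., a scheduler step or the capacity change taking effect).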

## Examples

```
/turing:diff exp-042 exp-053          # Full diagnostic comparison
/turing:diff exp-042 exp-053 --code   # Include train.py code changes
/turing:diff exp-001 exp-010 --json   # Raw JSON output
```
@@ -0,0 +1,55 @@
---
name: distill
description: Model compression via distillation — train a smaller student model to match a larger teacher's predictions.
argument-hint: "<teacher-exp-id> [--compression 4] [--method soft_labels]"
allowed-tools: Read, Bash(*), Grep, Glob
---

Compress a large model into a smaller, faster one for production. Measures the accuracy/size/latency tradeoff.

## Steps

1. **Activate environment:**
   ```bash
   source .venv/bin/activate
   ```

2. **Parse arguments from `$ARGUMENTS`:**
   - First argument is the teacher experiment ID (required)
   - `--compression 4` — compression ratio (default: 4x)
   - `--method soft_labels|feature_matching|dataset_distillation` — distillation method
   - `--target-latency 5` — auto-adjust compression to meet a latency target (ms)
   - `--json` — raw JSON output

3. **Run distillation planner:**
   ```bash
   python scripts/model_distiller.py $ARGUMENTS
   ```

4. **Report includes:**
   - Teacher model metrics
   - Auto-selected student architecture (fewer trees/layers/width)
   - Estimated size reduction and latency improvement
   - Distillation configuration (temperature, alpha, loss function)
   - Verdict: EXCELLENT / ACCEPTABLE / MARGINAL / TOO MUCH LOSS

5. **Student selection by model type:**
   - **Tree models:** fewer estimators, shallower depth
   - **Neural networks:** fewer layers, narrower hidden dims
   - **scikit-learn:** simpler model family (RandomForest → DecisionTree)

6. **Distillation methods:**
   - **soft_labels:** train on the teacher's probability outputs with temperature scaling
   - **feature_matching:** align intermediate representations (neural only)
   - **dataset_distillation:** train on teacher-labeled synthetic data

7. **Saved output:** report written to `experiments/distillations/distill-<exp-id>.yaml`
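The temperature scaling behind `soft_labels` can be sketched with the stdlib alone (a minimal illustration; the temperature value and helper name are assumptions, not `model_distiller.py`'s API):

```python
import math

def softmax_with_temperature(logits, T=4.0):
    """Soften a teacher's logits: higher T spreads probability mass
    across classes, exposing inter-class similarity for the student."""
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

teacher_logits = [8.0, 2.0, 0.5]
hard = softmax_with_temperature(teacher_logits, T=1.0)
soft = softmax_with_temperature(teacher_logits, T=4.0)
# At T=1 the top class dominates almost completely; at T=4 the
# runner-up classes carry enough mass for the student to learn from.
print([round(p, 3) for p in hard])  # → [0.997, 0.002, 0.001]
print([round(p, 3) for p in soft])  # → [0.726, 0.162, 0.111]
```

The student is then trained against these softened targets (typically mixed with the hard labels via an alpha weight, as in the reported distillation configuration).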

## Examples

```
/turing:distill exp-042                            # 4x compression, soft labels
/turing:distill exp-042 --compression 8            # Aggressive compression
/turing:distill exp-042 --method feature_matching  # Neural feature alignment
/turing:distill exp-042 --target-latency 5         # Meet 5ms latency target
```
@@ -0,0 +1,30 @@
---
name: doctor
description: Harness self-diagnosis — check environment, project, resources, and git state. Auto-fix common issues.
argument-hint: "[--fix] [--verbose]"
allowed-tools: Read, Bash(*), Grep, Glob
---

Is Turing healthy? Check everything and get a score.

## Steps
1. `source .venv/bin/activate`
2. `python scripts/harness_doctor.py $ARGUMENTS`
3. **Saved:** `experiments/doctor/`

## Checks
- **Environment:** Python version, venv status
- **Dependencies:** all required packages importable
- **Config:** config.yaml valid with required fields
- **Experiment log:** JSONL integrity, corrupt line detection
- **Scripts:** train.py, prepare.py, evaluate.py exist and parse
- **Disk space:** warn if <1GB free
- **Git state:** uncommitted changes to critical files
- **Claude hooks:** `.claude/settings.local.json` hook group schema; `--fix` migrates legacy bare command hooks
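The experiment-log integrity check can be sketched like this (illustrative; `harness_doctor.py`'s actual checks, and how `--fix` handles flagged lines, are assumptions):

```python
import json

def check_jsonl(path):
    """Return (valid_count, corrupt_line_numbers) for a JSONL file."""
    valid, corrupt = 0, []
    with open(path, encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # tolerate blank lines without counting them
            try:
                json.loads(line)
                valid += 1
            except json.JSONDecodeError:
                corrupt.append(lineno)
    return valid, corrupt
```

A healthy `experiments/log.jsonl` yields an empty corrupt list; a repair pass could then quarantine the flagged line numbers rather than discard the whole log.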

## Examples
```
/turing:doctor
/turing:doctor --fix
/turing:doctor --verbose --json
```
@@ -0,0 +1,53 @@
---
name: ensemble
description: Automated ensemble construction — combines top-K models via voting, stacking, and blending for zero-cost improvement.
argument-hint: "[--top-k 5] [--methods voting,stacking,blending]"
allowed-tools: Read, Bash(*), Grep, Glob
---

Build ensembles from your best experiments automatically. Often yields a 1-3% improvement with zero additional training.

## Steps

1. **Activate environment:**
   ```bash
   source .venv/bin/activate
   ```

2. **Parse arguments from `$ARGUMENTS`:**
   - `--top-k 5` — number of top models to include (default: 5)
   - `--methods voting,stacking,blending` — ensemble methods to try
   - `--predictions-dir experiments/predictions` — directory with saved predictions
   - `--json` — raw JSON output

3. **Run ensemble construction:**
   ```bash
   python scripts/build_ensemble.py $ARGUMENTS
   ```

4. **Report results:**
   - Table of all ensemble methods tried, with metric deltas vs the best single model
   - Best ensemble method highlighted with its improvement amount
   - Diversity analysis: prediction correlation matrix, diversity assessment
   - Base model summary: which experiments were combined

5. **Ensemble methods:**
   - **Voting:** majority vote (classification) or mean (regression)
   - **Weighted voting:** weights proportional to individual model performance
   - **Stacking:** cross-validated meta-learner (ridge/logistic) on out-of-fold predictions
   - **Blending:** holdout-based meta-learner (simpler, but less data-efficient)

6. **Prerequisites:** experiments must have saved predictions in `experiments/predictions/`. Each experiment needs `<exp-id>-predictions.npy` and a shared `labels.npy`.

7. **If no predictions exist:** suggest saving predictions during training by adding prediction logging to `evaluate.py`.

8. **Saved output:** report written to `experiments/ensembles/ensemble-*.yaml`
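Weighted voting from step 5 can be sketched as follows (a minimal stdlib illustration; `build_ensemble.py` operates on the saved `.npy` arrays rather than Python lists):

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Combine per-model class predictions, weighting each model's
    vote by its weight (e.g., its validation metric)."""
    combined = []
    for per_model in zip(*predictions):  # one tuple of votes per sample
        tally = defaultdict(float)
        for label, w in zip(per_model, weights):
            tally[label] += w
        combined.append(max(tally, key=tally.get))
    return combined

# Three models' predictions on four samples, weighted by val accuracy
preds = [
    ["a", "b", "a", "c"],   # model 1, weight 0.90
    ["a", "b", "b", "c"],   # model 2, weight 0.85
    ["b", "b", "a", "a"],   # model 3, weight 0.70
]
print(weighted_vote(preds, [0.90, 0.85, 0.70]))  # → ['a', 'b', 'a', 'c']
```

Note sample 3: the two stronger models disagree, and the weights, not a raw head count, decide the tie-break.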

## Examples

```
/turing:ensemble                              # Default: top-5, all methods
/turing:ensemble --top-k 3                    # Top-3 models only
/turing:ensemble --methods voting,stacking    # Specific methods
/turing:ensemble --json                       # Machine-readable output
```
@@ -0,0 +1,106 @@
---
name: explore
description: Tree-search-guided hypothesis exploration using AB-MCTS. Explores the space of experiment ideas as a search tree, scored by the critique engine. Discovers non-obvious refinement chains that linear suggestion cannot find.
argument-hint: "[ml/project] [--iterations N] [--top N] [--strategy abmcts-a|abmcts-m|greedy]"
allowed-tools: Read, Write, Bash(python scripts/*:*, source .venv/bin/activate:*), Grep, Glob
---

Explore the hypothesis space using tree search. Instead of suggesting independent ideas, this builds and searches a tree of refinement chains — each node is a hypothesis scored by novelty, feasibility, and expected impact.

## Project Detection

0. **Detect project directory:**
   - If `$ARGUMENTS` contains a path (e.g., `ml/coding`), use that as the project directory
   - Else if cwd contains `config.yaml` and `train.py`, use cwd
   - Else search for `ml/*/` subdirectories containing `config.yaml`
     - If exactly one is found, use it
     - If multiple are found, list them and ask the user which to target
   - All subsequent commands run from the detected project directory

## Parse Options

Extract from `$ARGUMENTS`:
- `--iterations N` — search depth (default: 30)
- `--top N` — number of results to return (default: 5)
- `--strategy` — algorithm choice: `abmcts-a` (default), `abmcts-m` (Bayesian), or `greedy` (no TreeQuest needed)
- `--seeds-only` — just show generated seeds without running search
- `--json` — output as JSON for programmatic use

## Steps

### 1. Assess Current State

```bash
source .venv/bin/activate && python scripts/show_metrics.py --last 10 2>/dev/null || echo "No experiments yet"
```

Read `config.yaml` to understand the current model and metric.

### 2. Run Tree Search

```bash
source .venv/bin/activate && python scripts/treequest_suggest.py \
  --log experiments/log.jsonl \
  --config config.yaml \
  --top <top> \
  --iterations <iterations> \
  --strategy <strategy>
```

The script will:
- Generate seed hypotheses from the config and experiment history
- Run AB-MCTS (or the greedy fallback) over the hypothesis tree
- Score each node using the critique engine
- Return the top-K ranked, deduplicated hypotheses

### 3. Queue Best Hypotheses

For each result, add to the hypothesis queue:

```bash
source .venv/bin/activate && python scripts/manage_hypotheses.py add "<description>" \
  --priority medium --source treequest
```

### 4. Show Results

Display the search output and confirm queuing:

```
TreeQuest Hypothesis Exploration (AB-MCTS-A)
============================================
Nodes explored: 35
Top 5 hypotheses by critique score:

1. [PROCEED] (score: 7.8/10)
   Switch to LightGBM with dart boosting; additionally add polynomial features
   Novelty: 8  Feasibility: 9  Impact: 7
   -> Queued as hyp-NNN

2. [PROCEED] (score: 7.2/10)
   Use low learning rate (0.01) with 2000 estimators; additionally add L2 regularization
   Novelty: 7  Feasibility: 8  Impact: 7
   Depth: 1 (refined from parent)
   -> Queued as hyp-NNN

...

Queued N hypotheses. Run /turing:train to test them.
```
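The greedy fallback can be pictured as a best-first loop over the hypothesis tree (a toy sketch; the string hypotheses and the `refine`/`score` stand-ins below are placeholders for `treequest_suggest.py`'s seed generator and the critique engine):

```python
import heapq
import itertools

def greedy_search(seeds, refine, score, iterations=30, top=5):
    """Best-first search over hypothesis refinements: repeatedly expand
    the highest-scoring node, keeping a global ranking of visited nodes."""
    counter = itertools.count()  # tie-breaker so heapq never compares hypotheses
    frontier = [(-score(h), next(counter), h) for h in seeds]
    heapq.heapify(frontier)
    seen, ranked = set(seeds), []
    for _ in range(iterations):
        if not frontier:
            break
        neg_score, _, hyp = heapq.heappop(frontier)
        ranked.append((-neg_score, hyp))
        for child in refine(hyp):          # refinement chain: parent -> child
            if child not in seen:
                seen.add(child)
                heapq.heappush(frontier, (-score(child), next(counter), child))
    ranked.sort(reverse=True)
    return ranked[:top]

# Toy stand-ins: "refinement" appends a tweak (up to a depth limit),
# and the score rewards longer refinement chains.
def refine(h):
    return [h + f" + tweak{i}" for i in range(2)] if h.count("+") < 2 else []

def score(h):
    return 5 + h.count("+")

print(greedy_search(["use lightgbm"], refine, score, iterations=10, top=3))
```

The real search differs mainly in the scorer (the novelty/feasibility/impact critique) and, for AB-MCTS, in balancing exploration against exploitation instead of always expanding the current best.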

## How It Differs From /turing:suggest

| | `/turing:suggest` | `/turing:explore` |
|---|---|---|
| **Source** | Web literature search | Tree search over critique scores |
| **Strategy** | Independent suggestions | Refinement chains (parent -> child) |
| **Requires internet** | Yes | No |
| **Discovers** | What papers recommend | What combinations score well |
| **Best for** | Early-stage exploration | Mid-experiment optimization |

## Integration

- Results feed into `hypotheses.yaml` — the next `/turing:train` picks them up
- `/turing:brief` shows queued treequest-sourced hypotheses
- `/turing:suggest --strategy treequest` is an alias for this command
- A human can override priority: `/turing:try` always takes precedence
@@ -0,0 +1,47 @@
---
name: export
description: Export model to production format with equivalence verification, latency benchmarking, and deployment model card.
argument-hint: "[exp-id] [--format joblib|xgboost_json|onnx|torchscript|tflite]"
allowed-tools: Read, Bash(*), Grep, Glob
---

Export a trained model to a production-ready format.

## Steps

1. **Activate environment:**
   ```bash
   source .venv/bin/activate
   ```

2. **Parse arguments from `$ARGUMENTS`:**
   - First argument can be an experiment ID (e.g., `exp-042`); defaults to the best experiment
   - `--format joblib|xgboost_json|onnx|torchscript|tflite` specifies the export format (auto-detected if omitted)
   - `--skip-equivalence` skips the inference equivalence check
   - `--skip-latency` skips the latency benchmark
   - `--samples 100` sets the test sample count

3. **Run export pipeline:**
   ```bash
   python scripts/export_model.py $ARGUMENTS
   ```

4. **Report results:**
   - **Export:** format, file size, output path, dependencies
   - **Equivalence:** verdict (equivalent/approximately_equivalent/divergent), max delta
   - **Latency:** p50/p95/p99 ms, speedup vs original
   - **Model Card:** metrics, seed study, equivalence, latency, dependencies

5. **Output:** exported model + `model_card.yaml` written to `exports/exp-NNN/`

6. **If model file not found:** suggest checking the `models/best/` directory.
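The equivalence verdict in step 4 can be sketched as a max-delta comparison (illustrative; `export_model.py`'s actual tolerances are assumptions here):

```python
def equivalence_verdict(original_preds, exported_preds,
                        exact_tol=1e-9, approx_tol=1e-4):
    """Classify an exported model by the largest prediction delta
    against the original model on the same test samples."""
    max_delta = max(abs(a - b) for a, b in zip(original_preds, exported_preds))
    if max_delta <= exact_tol:
        verdict = "equivalent"
    elif max_delta <= approx_tol:
        verdict = "approximately_equivalent"
    else:
        verdict = "divergent"
    return verdict, max_delta

# An exported model that differs only by float32 rounding
print(equivalence_verdict([0.12, 0.87, 0.55], [0.12, 0.87000002, 0.55]))
```

Format conversions that change numeric precision (e.g., float64 to float32) typically land in `approximately_equivalent`; a `divergent` verdict suggests the export itself went wrong.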

## Examples

```
/turing:export                                    # Best experiment, default format
/turing:export exp-042                            # Specific experiment
/turing:export --format xgboost_json              # Native XGBoost JSON
/turing:export --format onnx                      # ONNX format
/turing:export --skip-equivalence --skip-latency  # Fast export
+ ```