@researai/deepscientist 1.5.17 → 1.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (894)
  1. package/AGENTS.md +309 -130
  2. package/AISB/catalog/aisb.b1.agentic_coding.yaml +244 -0
  3. package/AISB/catalog/aisb.b10.climate_earth.yaml +235 -0
  4. package/AISB/catalog/aisb.b11.model_efficiency.yaml +231 -0
  5. package/AISB/catalog/aisb.b12.embodied_ai.yaml +238 -0
  6. package/AISB/catalog/aisb.b2.agent_systems.yaml +229 -0
  7. package/AISB/catalog/aisb.b3.self_evolving_rl.yaml +237 -0
  8. package/AISB/catalog/aisb.b4.lm_reasoning.yaml +240 -0
  9. package/AISB/catalog/aisb.b5.math_proof.yaml +235 -0
  10. package/AISB/catalog/aisb.b6.research_process.yaml +243 -0
  11. package/AISB/catalog/aisb.b7.multimodal_fusion.yaml +232 -0
  12. package/AISB/catalog/aisb.b8.lifesci_drug.yaml +275 -0
  13. package/AISB/catalog/aisb.b9.material_science.yaml +237 -0
  14. package/AISB/catalog/aisb.t3.001_savvy.yaml +159 -0
  15. package/AISB/catalog/aisb.t3.001_savvy.zh.yaml +121 -0
  16. package/AISB/catalog/aisb.t3.002_pinet.yaml +189 -0
  17. package/AISB/catalog/aisb.t3.002_pinet.zh.yaml +130 -0
  18. package/AISB/catalog/aisb.t3.004_decentralattn.yaml +184 -0
  19. package/AISB/catalog/aisb.t3.004_decentralattn.zh.yaml +153 -0
  20. package/AISB/catalog/aisb.t3.005_tsae.yaml +193 -0
  21. package/AISB/catalog/aisb.t3.005_tsae.zh.yaml +139 -0
  22. package/AISB/catalog/aisb.t3.006_physense.yaml +194 -0
  23. package/AISB/catalog/aisb.t3.006_physense.zh.yaml +118 -0
  24. package/AISB/catalog/aisb.t3.007_reasoningiqa.yaml +169 -0
  25. package/AISB/catalog/aisb.t3.007_reasoningiqa.zh.yaml +133 -0
  26. package/AISB/catalog/aisb.t3.008_meanflows.yaml +188 -0
  27. package/AISB/catalog/aisb.t3.008_meanflows.zh.yaml +140 -0
  28. package/AISB/catalog/aisb.t3.009_scoremissing.yaml +179 -0
  29. package/AISB/catalog/aisb.t3.009_scoremissing.zh.yaml +119 -0
  30. package/AISB/catalog/aisb.t3.010_suitabilityfilter.yaml +221 -0
  31. package/AISB/catalog/aisb.t3.010_suitabilityfilter.zh.yaml +141 -0
  32. package/AISB/catalog/aisb.t3.011_osd.yaml +206 -0
  33. package/AISB/catalog/aisb.t3.011_osd.zh.yaml +163 -0
  34. package/AISB/catalog/aisb.t3.012_efficientqat.yaml +206 -0
  35. package/AISB/catalog/aisb.t3.012_efficientqat.zh.yaml +159 -0
  36. package/AISB/catalog/aisb.t3.013_appl.yaml +152 -0
  37. package/AISB/catalog/aisb.t3.013_appl.zh.yaml +126 -0
  38. package/AISB/catalog/aisb.t3.014_piguard.yaml +207 -0
  39. package/AISB/catalog/aisb.t3.014_piguard.zh.yaml +164 -0
  40. package/AISB/catalog/aisb.t3.015_frspec.yaml +209 -0
  41. package/AISB/catalog/aisb.t3.015_frspec.zh.yaml +163 -0
  42. package/AISB/catalog/aisb.t3.016_mathfusion.yaml +166 -0
  43. package/AISB/catalog/aisb.t3.016_mathfusion.zh.yaml +145 -0
  44. package/AISB/catalog/aisb.t3.017_multimodalglp.yaml +171 -0
  45. package/AISB/catalog/aisb.t3.017_multimodalglp.zh.yaml +122 -0
  46. package/AISB/catalog/aisb.t3.018_cotsynth.yaml +206 -0
  47. package/AISB/catalog/aisb.t3.018_cotsynth.zh.yaml +162 -0
  48. package/AISB/catalog/aisb.t3.019_dyscaleut.yaml +211 -0
  49. package/AISB/catalog/aisb.t3.019_dyscaleut.zh.yaml +148 -0
  50. package/AISB/catalog/aisb.t3.020_aristotle.yaml +173 -0
  51. package/AISB/catalog/aisb.t3.020_aristotle.zh.yaml +119 -0
  52. package/AISB/catalog/aisb.t3.021_tokenrecycling.yaml +160 -0
  53. package/AISB/catalog/aisb.t3.021_tokenrecycling.zh.yaml +129 -0
  54. package/AISB/catalog/aisb.t3.022_chainofreasoning.yaml +204 -0
  55. package/AISB/catalog/aisb.t3.022_chainofreasoning.zh.yaml +161 -0
  56. package/AISB/catalog/aisb.t3.023_guidedembed.yaml +211 -0
  57. package/AISB/catalog/aisb.t3.023_guidedembed.zh.yaml +189 -0
  58. package/AISB/catalog/aisb.t3.024_outputcentric.yaml +148 -0
  59. package/AISB/catalog/aisb.t3.024_outputcentric.zh.yaml +131 -0
  60. package/AISB/catalog/aisb.t3.025_deeper.yaml +143 -0
  61. package/AISB/catalog/aisb.t3.025_deeper.zh.yaml +116 -0
  62. package/AISB/catalog/aisb.t3.026_gartkg.yaml +195 -0
  63. package/AISB/catalog/aisb.t3.026_gartkg.zh.yaml +127 -0
  64. package/AISB/catalog/aisb.t3.027_citeeval.yaml +182 -0
  65. package/AISB/catalog/aisb.t3.027_citeeval.zh.yaml +135 -0
  66. package/AISB/catalog/aisb.t3.028_sbam.yaml +206 -0
  67. package/AISB/catalog/aisb.t3.028_sbam.zh.yaml +166 -0
  68. package/AISB/catalog/aisb.t3.029_cdqgeoembed.yaml +224 -0
  69. package/AISB/catalog/aisb.t3.029_cdqgeoembed.zh.yaml +142 -0
  70. package/AISB/catalog/aisb.t3.030_processrm.yaml +211 -0
  71. package/AISB/catalog/aisb.t3.030_processrm.zh.yaml +166 -0
  72. package/AISB/catalog/aisb.t3.031_circuitstability.yaml +172 -0
  73. package/AISB/catalog/aisb.t3.031_circuitstability.zh.yaml +134 -0
  74. package/AISB/catalog/aisb.t3.032_ptsolver.yaml +169 -0
  75. package/AISB/catalog/aisb.t3.032_ptsolver.zh.yaml +135 -0
  76. package/AISB/catalog/aisb.t3.033_gcse.yaml +144 -0
  77. package/AISB/catalog/aisb.t3.033_gcse.zh.yaml +126 -0
  78. package/AISB/catalog/aisb.t3.034_ensemblewm.yaml +183 -0
  79. package/AISB/catalog/aisb.t3.034_ensemblewm.zh.yaml +146 -0
  80. package/AISB/catalog/aisb.t3.035_moralvalueswa.yaml +207 -0
  81. package/AISB/catalog/aisb.t3.035_moralvalueswa.zh.yaml +165 -0
  82. package/AISB/catalog/aisb.t3.036_weakstrongpref.yaml +210 -0
  83. package/AISB/catalog/aisb.t3.036_weakstrongpref.zh.yaml +194 -0
  84. package/AISB/catalog/aisb.t3.037_dementiamask.yaml +172 -0
  85. package/AISB/catalog/aisb.t3.037_dementiamask.zh.yaml +132 -0
  86. package/AISB/catalog/aisb.t3.038_tinysam.yaml +284 -0
  87. package/AISB/catalog/aisb.t3.038_tinysam.zh.yaml +240 -0
  88. package/AISB/catalog/aisb.t3.039_calf.yaml +224 -0
  89. package/AISB/catalog/aisb.t3.039_calf.zh.yaml +194 -0
  90. package/AISB/catalog/aisb.t3.040_graniteguardian.yaml +199 -0
  91. package/AISB/catalog/aisb.t3.040_graniteguardian.zh.yaml +174 -0
  92. package/AISB/catalog/aisb.t3.041_amdm.yaml +149 -0
  93. package/AISB/catalog/aisb.t3.041_amdm.zh.yaml +137 -0
  94. package/AISB/catalog/aisb.t3.042_xpatch.yaml +216 -0
  95. package/AISB/catalog/aisb.t3.042_xpatch.zh.yaml +182 -0
  96. package/AISB/catalog/aisb.t3.043_vhm.yaml +268 -0
  97. package/AISB/catalog/aisb.t3.043_vhm.zh.yaml +193 -0
  98. package/AISB/catalog/aisb.t3.044_rgvi.yaml +224 -0
  99. package/AISB/catalog/aisb.t3.044_rgvi.zh.yaml +176 -0
  100. package/AISB/catalog/aisb.t3.045_pslstm.yaml +203 -0
  101. package/AISB/catalog/aisb.t3.045_pslstm.zh.yaml +179 -0
  102. package/AISB/catalog/aisb.t3.046_nonstatts.yaml +208 -0
  103. package/AISB/catalog/aisb.t3.046_nonstatts.zh.yaml +194 -0
  104. package/AISB/catalog/aisb.t3.047_timepfn.yaml +156 -0
  105. package/AISB/catalog/aisb.t3.047_timepfn.zh.yaml +124 -0
  106. package/AISB/catalog/aisb.t3.048_proxyspex.yaml +148 -0
  107. package/AISB/catalog/aisb.t3.048_proxyspex.zh.yaml +125 -0
  108. package/AISB/catalog/aisb.t3.049_hogwildinference.yaml +183 -0
  109. package/AISB/catalog/aisb.t3.049_hogwildinference.zh.yaml +138 -0
  110. package/AISB/catalog/aisb.t3.050_causalpfn.yaml +214 -0
  111. package/AISB/catalog/aisb.t3.050_causalpfn.zh.yaml +190 -0
  112. package/AISB/catalog/aisb.t3.051_flashtp.yaml +169 -0
  113. package/AISB/catalog/aisb.t3.051_flashtp.zh.yaml +124 -0
  114. package/AISB/catalog/aisb.t3.052_nsdiff.yaml +155 -0
  115. package/AISB/catalog/aisb.t3.052_nsdiff.zh.yaml +138 -0
  116. package/AISB/catalog/aisb.t3.053_k2vae.yaml +158 -0
  117. package/AISB/catalog/aisb.t3.053_k2vae.zh.yaml +132 -0
  118. package/AISB/catalog/aisb.t3.054_timebase.yaml +178 -0
  119. package/AISB/catalog/aisb.t3.054_timebase.zh.yaml +158 -0
  120. package/AISB/catalog/aisb.t3.055_csbrain.yaml +238 -0
  121. package/AISB/catalog/aisb.t3.055_csbrain.zh.yaml +184 -0
  122. package/AISB/catalog/aisb.t3.056_infosam.yaml +224 -0
  123. package/AISB/catalog/aisb.t3.056_infosam.zh.yaml +189 -0
  124. package/AISB/catalog/aisb.t3.057_mdreid.yaml +129 -0
  125. package/AISB/catalog/aisb.t3.057_mdreid.zh.yaml +117 -0
  126. package/AISB/catalog/aisb.t3.058_mindglitch.yaml +171 -0
  127. package/AISB/catalog/aisb.t3.058_mindglitch.zh.yaml +145 -0
  128. package/AISB/catalog/aisb.t3.059_selfsupervised.yaml +154 -0
  129. package/AISB/catalog/aisb.t3.059_selfsupervised.zh.yaml +125 -0
  130. package/AISB/catalog/aisb.t3.060_iaggad.yaml +121 -0
  131. package/AISB/catalog/aisb.t3.060_iaggad.zh.yaml +100 -0
  132. package/AISB/catalog/aisb.t3.061_hsgkn.yaml +136 -0
  133. package/AISB/catalog/aisb.t3.061_hsgkn.zh.yaml +113 -0
  134. package/AISB/catalog/aisb.t3.062_visionts.yaml +237 -0
  135. package/AISB/catalog/aisb.t3.062_visionts.zh.yaml +216 -0
  136. package/AISB/catalog/aisb.t3.063_tsrag.yaml +162 -0
  137. package/AISB/catalog/aisb.t3.063_tsrag.zh.yaml +138 -0
  138. package/AISB/catalog/aisb.t3.064_pir.yaml +221 -0
  139. package/AISB/catalog/aisb.t3.064_pir.zh.yaml +197 -0
  140. package/AISB/catalog/aisb.t3.065_proteinbinding.yaml +234 -0
  141. package/AISB/catalog/aisb.t3.065_proteinbinding.zh.yaml +167 -0
  142. package/AISB/catalog/aisb.t3.066_tropicalattention.yaml +267 -0
  143. package/AISB/catalog/aisb.t3.066_tropicalattention.zh.yaml +229 -0
  144. package/AISB/catalog/aisb.t3.067_kanad.yaml +193 -0
  145. package/AISB/catalog/aisb.t3.067_kanad.zh.yaml +167 -0
  146. package/AISB/catalog/aisb.t3.068_sempo.yaml +187 -0
  147. package/AISB/catalog/aisb.t3.068_sempo.zh.yaml +148 -0
  148. package/AISB/catalog/aisb.t3.069_treehfd.yaml +129 -0
  149. package/AISB/catalog/aisb.t3.069_treehfd.zh.yaml +111 -0
  150. package/AISB/catalog/aisb.t3.070_certifiedunlearning.yaml +224 -0
  151. package/AISB/catalog/aisb.t3.070_certifiedunlearning.zh.yaml +171 -0
  152. package/AISB/catalog/aisb.t3.071_neuralmjd.yaml +142 -0
  153. package/AISB/catalog/aisb.t3.071_neuralmjd.zh.yaml +120 -0
  154. package/AISB/catalog/aisb.t3.072_fedgmt.yaml +181 -0
  155. package/AISB/catalog/aisb.t3.072_fedgmt.zh.yaml +158 -0
  156. package/AISB/catalog/aisb.t3.073_rld.yaml +161 -0
  157. package/AISB/catalog/aisb.t3.073_rld.zh.yaml +129 -0
  158. package/AISB/catalog/aisb.t3.074_lsvi.yaml +163 -0
  159. package/AISB/catalog/aisb.t3.074_lsvi.zh.yaml +129 -0
  160. package/AISB/catalog/aisb.t3.075_treeslicedentropy.yaml +201 -0
  161. package/AISB/catalog/aisb.t3.075_treeslicedentropy.zh.yaml +148 -0
  162. package/AISB/catalog/aisb.t3.076_aanet.yaml +169 -0
  163. package/AISB/catalog/aisb.t3.076_aanet.zh.yaml +129 -0
  164. package/AISB/catalog/aisb.t3.077_cmnn.yaml +199 -0
  165. package/AISB/catalog/aisb.t3.077_cmnn.zh.yaml +165 -0
  166. package/AISB/catalog/aisb.t3.078_conformalanomaly.yaml +146 -0
  167. package/AISB/catalog/aisb.t3.078_conformalanomaly.zh.yaml +117 -0
  168. package/AISB/catalog/aisb.t3.079_dpfkmeans.yaml +131 -0
  169. package/AISB/catalog/aisb.t3.079_dpfkmeans.zh.yaml +104 -0
  170. package/AISB/catalog/aisb.t3.080_latentscorereweight.yaml +169 -0
  171. package/AISB/catalog/aisb.t3.080_latentscorereweight.zh.yaml +123 -0
  172. package/AISB/catalog/aisb.t3.081_qmamba.yaml +150 -0
  173. package/AISB/catalog/aisb.t3.081_qmamba.zh.yaml +117 -0
  174. package/AISB/catalog/aisb.t3.082_onlinellmrouting.yaml +160 -0
  175. package/AISB/catalog/aisb.t3.082_onlinellmrouting.zh.yaml +133 -0
  176. package/AISB/catalog/aisb.t3.083_starformer.yaml +178 -0
  177. package/AISB/catalog/aisb.t3.083_starformer.zh.yaml +140 -0
  178. package/AISB/catalog/aisb.t3.084_ift.yaml +139 -0
  179. package/AISB/catalog/aisb.t3.084_ift.zh.yaml +111 -0
  180. package/AISB/catalog/aisb.t3.085_neuralsurv.yaml +183 -0
  181. package/AISB/catalog/aisb.t3.085_neuralsurv.zh.yaml +143 -0
  182. package/AISB/catalog/aisb.t3.086_stella.yaml +197 -0
  183. package/AISB/catalog/aisb.t3.086_stella.zh.yaml +142 -0
  184. package/AISB/catalog/aisb.t3.087_moses.yaml +167 -0
  185. package/AISB/catalog/aisb.t3.087_moses.zh.yaml +132 -0
  186. package/AISB/catalog/aisb.t3.088_channelnorm.yaml +140 -0
  187. package/AISB/catalog/aisb.t3.088_channelnorm.zh.yaml +109 -0
  188. package/AISB/catalog/aisb.t3.089_causalvelocity.yaml +730 -0
  189. package/AISB/catalog/aisb.t3.089_causalvelocity.zh.yaml +668 -0
  190. package/AISB/catalog/aisb.t3.090_rstib.yaml +144 -0
  191. package/AISB/catalog/aisb.t3.090_rstib.zh.yaml +109 -0
  192. package/AISB/catalog/aisb.t3.091_timeawarecausal.yaml +132 -0
  193. package/AISB/catalog/aisb.t3.091_timeawarecausal.zh.yaml +107 -0
  194. package/AISB/catalog/aisb.t3.092_kmeanslocalopt.yaml +138 -0
  195. package/AISB/catalog/aisb.t3.092_kmeanslocalopt.zh.yaml +110 -0
  196. package/AISB/catalog/aisb.t3.093_fedwmsam.yaml +134 -0
  197. package/AISB/catalog/aisb.t3.093_fedwmsam.zh.yaml +106 -0
  198. package/AISB/catalog/aisb.t3.094_boundre.yaml +147 -0
  199. package/AISB/catalog/aisb.t3.094_boundre.zh.yaml +114 -0
  200. package/AISB/catalog/aisb.t3.095_fastfeaturecp.yaml +153 -0
  201. package/AISB/catalog/aisb.t3.095_fastfeaturecp.zh.yaml +118 -0
  202. package/AISB/catalog/aisb.t3.096_m3svm.yaml +189 -0
  203. package/AISB/catalog/aisb.t3.096_m3svm.zh.yaml +149 -0
  204. package/AISB/catalog/aisb.t3.097_wassersteintl.yaml +212 -0
  205. package/AISB/catalog/aisb.t3.097_wassersteintl.zh.yaml +169 -0
  206. package/AISB/catalog/aisb.t3.098_xmahalanobis.yaml +171 -0
  207. package/AISB/catalog/aisb.t3.098_xmahalanobis.zh.yaml +127 -0
  208. package/AISB/catalog/aisb.t3.099_ollalanding.yaml +248 -0
  209. package/AISB/catalog/aisb.t3.099_ollalanding.zh.yaml +182 -0
  210. package/AISB/catalog/aisb.t3.100_invmissingdata.yaml +179 -0
  211. package/AISB/catalog/aisb.t3.100_invmissingdata.zh.yaml +150 -0
  212. package/AISB/catalog/aisb.t3.101_acia.yaml +164 -0
  213. package/AISB/catalog/aisb.t3.101_acia.zh.yaml +109 -0
  214. package/AISB/catalog/aisb.t3.102_stochasticff.yaml +178 -0
  215. package/AISB/catalog/aisb.t3.102_stochasticff.zh.yaml +130 -0
  216. package/AISB/catalog/aisb.t3.103_qdcp.yaml +150 -0
  217. package/AISB/catalog/aisb.t3.103_qdcp.zh.yaml +116 -0
  218. package/AISB/catalog/aisb.t3.104_balancedactiveinf.yaml +137 -0
  219. package/AISB/catalog/aisb.t3.104_balancedactiveinf.zh.yaml +104 -0
  220. package/AISB/catalog/aisb.t3.105_binaryclasseval.yaml +161 -0
  221. package/AISB/catalog/aisb.t3.105_binaryclasseval.zh.yaml +130 -0
  222. package/AISB/image/001_aisb.t3.001_savvy.jpg +0 -0
  223. package/AISB/image/002_aisb.t3.002_pinet.jpg +0 -0
  224. package/AISB/image/003_aisb.t3.003_dmsqd.jpg +0 -0
  225. package/AISB/image/004_aisb.t3.004_decentralattn.jpg +0 -0
  226. package/AISB/image/005_aisb.t3.005_tsae.jpg +0 -0
  227. package/AISB/image/006_aisb.t3.006_physense.jpg +0 -0
  228. package/AISB/image/007_aisb.t3.007_reasoningiqa.jpg +0 -0
  229. package/AISB/image/008_aisb.t3.008_meanflows.jpg +0 -0
  230. package/AISB/image/009_aisb.t3.009_scoremissing.jpg +0 -0
  231. package/AISB/image/010_aisb.t3.010_suitabilityfilter.jpg +0 -0
  232. package/AISB/image/011_aisb.t3.011_osd.jpg +0 -0
  233. package/AISB/image/012_aisb.t3.012_efficientqat.jpg +0 -0
  234. package/AISB/image/013_aisb.t3.013_appl.jpg +0 -0
  235. package/AISB/image/014_aisb.t3.014_piguard.jpg +0 -0
  236. package/AISB/image/015_aisb.t3.015_frspec.jpg +0 -0
  237. package/AISB/image/016_aisb.t3.016_mathfusion.jpg +0 -0
  238. package/AISB/image/017_aisb.t3.017_multimodalglp.jpg +0 -0
  239. package/AISB/image/018_aisb.t3.018_cotsynth.jpg +0 -0
  240. package/AISB/image/019_aisb.t3.019_dyscaleut.jpg +0 -0
  241. package/AISB/image/020_aisb.t3.020_aristotle.jpg +0 -0
  242. package/AISB/image/021_aisb.t3.021_tokenrecycling.jpg +0 -0
  243. package/AISB/image/022_aisb.t3.022_chainofreasoning.jpg +0 -0
  244. package/AISB/image/023_aisb.t3.023_guidedembed.jpg +0 -0
  245. package/AISB/image/024_aisb.t3.024_outputcentric.jpg +0 -0
  246. package/AISB/image/025_aisb.t3.025_deeper.jpg +0 -0
  247. package/AISB/image/026_aisb.t3.026_gartkg.jpg +0 -0
  248. package/AISB/image/027_aisb.t3.027_citeeval.jpg +0 -0
  249. package/AISB/image/028_aisb.t3.028_sbam.jpg +0 -0
  250. package/AISB/image/029_aisb.t3.029_cdqgeoembed.jpg +0 -0
  251. package/AISB/image/030_aisb.t3.030_processrm.jpg +0 -0
  252. package/AISB/image/031_aisb.t3.031_circuitstability.jpg +0 -0
  253. package/AISB/image/032_aisb.t3.032_ptsolver.jpg +0 -0
  254. package/AISB/image/033_aisb.t3.033_gcse.jpg +0 -0
  255. package/AISB/image/034_aisb.t3.034_ensemblewm.jpg +0 -0
  256. package/AISB/image/035_aisb.t3.035_moralvalueswa.jpg +0 -0
  257. package/AISB/image/036_aisb.t3.036_weakstrongpref.jpg +0 -0
  258. package/AISB/image/037_aisb.t3.037_dementiamask.jpg +0 -0
  259. package/AISB/image/038_aisb.t3.038_tinysam.jpg +0 -0
  260. package/AISB/image/039_aisb.t3.039_calf.jpg +0 -0
  261. package/AISB/image/040_aisb.t3.040_graniteguardian.jpg +0 -0
  262. package/AISB/image/041_aisb.t3.041_amdm.jpg +0 -0
  263. package/AISB/image/042_aisb.t3.042_xpatch.jpg +0 -0
  264. package/AISB/image/043_aisb.t3.043_vhm.jpg +0 -0
  265. package/AISB/image/044_aisb.t3.044_rgvi.jpg +0 -0
  266. package/AISB/image/045_aisb.t3.045_pslstm.jpg +0 -0
  267. package/AISB/image/046_aisb.t3.046_nonstatts.jpg +0 -0
  268. package/AISB/image/047_aisb.t3.047_timepfn.jpg +0 -0
  269. package/AISB/image/048_aisb.t3.048_proxyspex.jpg +0 -0
  270. package/AISB/image/049_aisb.t3.049_hogwildinference.jpg +0 -0
  271. package/AISB/image/050_aisb.t3.050_causalpfn.jpg +0 -0
  272. package/AISB/image/051_aisb.t3.051_flashtp.jpg +0 -0
  273. package/AISB/image/052_aisb.t3.052_nsdiff.jpg +0 -0
  274. package/AISB/image/053_aisb.t3.053_k2vae.jpg +0 -0
  275. package/AISB/image/054_aisb.t3.054_timebase.jpg +0 -0
  276. package/AISB/image/055_aisb.t3.055_csbrain.jpg +0 -0
  277. package/AISB/image/056_aisb.t3.056_infosam.jpg +0 -0
  278. package/AISB/image/057_aisb.t3.057_mdreid.jpg +0 -0
  279. package/AISB/image/058_aisb.t3.058_mindglitch.jpg +0 -0
  280. package/AISB/image/059_aisb.t3.059_selfsupervised.jpg +0 -0
  281. package/AISB/image/060_aisb.t3.060_iaggad.jpg +0 -0
  282. package/AISB/image/061_aisb.t3.061_hsgkn.jpg +0 -0
  283. package/AISB/image/062_aisb.t3.062_visionts.jpg +0 -0
  284. package/AISB/image/063_aisb.t3.063_tsrag.jpg +0 -0
  285. package/AISB/image/064_aisb.t3.064_pir.jpg +0 -0
  286. package/AISB/image/065_aisb.t3.065_proteinbinding.jpg +0 -0
  287. package/AISB/image/066_aisb.t3.066_tropicalattention.jpg +0 -0
  288. package/AISB/image/067_aisb.t3.067_kanad.jpg +0 -0
  289. package/AISB/image/068_aisb.t3.068_sempo.jpg +0 -0
  290. package/AISB/image/069_aisb.t3.069_treehfd.jpg +0 -0
  291. package/AISB/image/070_aisb.t3.070_certifiedunlearning.jpg +0 -0
  292. package/AISB/image/071_aisb.t3.071_neuralmjd.jpg +0 -0
  293. package/AISB/image/072_aisb.t3.072_fedgmt.jpg +0 -0
  294. package/AISB/image/073_aisb.t3.073_rld.jpg +0 -0
  295. package/AISB/image/074_aisb.t3.074_lsvi.jpg +0 -0
  296. package/AISB/image/075_aisb.t3.075_treeslicedentropy.jpg +0 -0
  297. package/AISB/image/076_aisb.t3.076_aanet.jpg +0 -0
  298. package/AISB/image/077_aisb.t3.077_cmnn.jpg +0 -0
  299. package/AISB/image/078_aisb.t3.078_conformalanomaly.jpg +0 -0
  300. package/AISB/image/079_aisb.t3.079_dpfkmeans.jpg +0 -0
  301. package/AISB/image/080_aisb.t3.080_latentscorereweight.jpg +0 -0
  302. package/AISB/image/081_aisb.t3.081_qmamba.jpg +0 -0
  303. package/AISB/image/082_aisb.t3.082_onlinellmrouting.jpg +0 -0
  304. package/AISB/image/083_aisb.t3.083_starformer.jpg +0 -0
  305. package/AISB/image/084_aisb.t3.084_ift.jpg +0 -0
  306. package/AISB/image/085_aisb.t3.085_neuralsurv.jpg +0 -0
  307. package/AISB/image/086_aisb.t3.086_stella.jpg +0 -0
  308. package/AISB/image/087_aisb.t3.087_moses.jpg +0 -0
  309. package/AISB/image/088_aisb.t3.088_channelnorm.jpg +0 -0
  310. package/AISB/image/089_aisb.t3.089_causalvelocity.jpg +0 -0
  311. package/AISB/image/090_aisb.t3.090_rstib.jpg +0 -0
  312. package/AISB/image/091_aisb.t3.091_timeawarecausal.jpg +0 -0
  313. package/AISB/image/092_aisb.t3.092_kmeanslocalopt.jpg +0 -0
  314. package/AISB/image/093_aisb.t3.093_fedwmsam.jpg +0 -0
  315. package/AISB/image/094_aisb.t3.094_boundre.jpg +0 -0
  316. package/AISB/image/095_aisb.t3.095_fastfeaturecp.jpg +0 -0
  317. package/AISB/image/096_aisb.t3.096_m3svm.jpg +0 -0
  318. package/AISB/image/097_aisb.t3.097_wassersteintl.jpg +0 -0
  319. package/AISB/image/098_aisb.t3.098_xmahalanobis.jpg +0 -0
  320. package/AISB/image/099_aisb.t3.099_ollalanding.jpg +0 -0
  321. package/AISB/image/100_aisb.t3.100_invmissingdata.jpg +0 -0
  322. package/AISB/image/101_aisb.t3.101_acia.jpg +0 -0
  323. package/AISB/image/102_aisb.t3.102_stochasticff.jpg +0 -0
  324. package/AISB/image/103_aisb.t3.103_qdcp.jpg +0 -0
  325. package/AISB/image/104_aisb.t3.104_balancedactiveinf.jpg +0 -0
  326. package/AISB/image/105_aisb.t3.105_binaryclasseval.jpg +0 -0
  327. package/AISB/image/106_aisb.t1.reasoning_lite.jpg +0 -0
  328. package/AISB/image/107_aisb.t2.paper_audit.jpg +0 -0
  329. package/AISB/image/108_aisb.t3.multi_gpu_search.jpg +0 -0
  330. package/AISB/image/109_aisb.t3.tdc_admet.jpg +0 -0
  331. package/AISB/image/aisb.b1.agentic_coding.svg +16 -0
  332. package/AISB/image/aisb.b10.climate_earth.svg +16 -0
  333. package/AISB/image/aisb.b11.model_efficiency.svg +16 -0
  334. package/AISB/image/aisb.b12.embodied_ai.svg +16 -0
  335. package/AISB/image/aisb.b2.agent_systems.svg +16 -0
  336. package/AISB/image/aisb.b3.self_evolving_rl.svg +16 -0
  337. package/AISB/image/aisb.b4.lm_reasoning.svg +16 -0
  338. package/AISB/image/aisb.b5.math_proof.svg +16 -0
  339. package/AISB/image/aisb.b6.research_process.svg +16 -0
  340. package/AISB/image/aisb.b7.multimodal_fusion.svg +16 -0
  341. package/AISB/image/aisb.b8.lifesci_drug.svg +16 -0
  342. package/AISB/image/aisb.b9.material_science.svg +16 -0
  343. package/README.md +132 -11
  344. package/bin/ds.js +376 -49
  345. package/docs/en/00_QUICK_START.md +135 -18
  346. package/docs/en/01_SETTINGS_REFERENCE.md +468 -96
  347. package/docs/en/02_START_RESEARCH_GUIDE.md +26 -5
  348. package/docs/en/03_QQ_CONNECTOR_GUIDE.md +14 -3
  349. package/docs/en/04_LINGZHU_CONNECTOR_GUIDE.md +2 -0
  350. package/docs/en/05_TUI_GUIDE.md +171 -2
  351. package/docs/en/07_MEMORY_AND_MCP.md +38 -2
  352. package/docs/en/09_DOCTOR.md +64 -4
  353. package/docs/en/10_WEIXIN_CONNECTOR_GUIDE.md +38 -1
  354. package/docs/en/11_LICENSE_AND_RISK.md +4 -0
  355. package/docs/en/12_GUIDED_WORKFLOW_TOUR.md +15 -0
  356. package/docs/en/14_PROMPT_SKILLS_AND_MCP_GUIDE.md +9 -0
  357. package/docs/en/15_CODEX_PROVIDER_SETUP.md +622 -187
  358. package/docs/en/16_TELEGRAM_CONNECTOR_GUIDE.md +14 -0
  359. package/docs/en/17_WHATSAPP_CONNECTOR_GUIDE.md +14 -0
  360. package/docs/en/18_FEISHU_CONNECTOR_GUIDE.md +14 -0
  361. package/docs/en/21_LOCAL_MODEL_BACKENDS_GUIDE.md +105 -2
  362. package/docs/en/22_BENCHSTORE_YAML_REFERENCE.md +469 -0
  363. package/docs/en/23_BENCHSTORE_GITHUB_RELEASES_SPEC.md +316 -0
  364. package/docs/en/24_CLAUDE_CODE_PROVIDER_SETUP.md +469 -0
  365. package/docs/en/25_OPENCODE_PROVIDER_SETUP.md +653 -0
  366. package/docs/en/26_CITATION_AND_ATTRIBUTION.md +119 -0
  367. package/docs/en/27_KIMI_CODE_PROVIDER_SETUP.md +180 -0
  368. package/docs/en/28_DISCORD_CONNECTOR_GUIDE.md +61 -0
  369. package/docs/en/29_SLACK_CONNECTOR_GUIDE.md +60 -0
  370. package/docs/en/30_SETTINGS_CONTROL_CENTER_GUIDE.md +371 -0
  371. package/docs/en/{19_LOCAL_BROWSER_AUTH.md → 31_LOCAL_BROWSER_AUTH.md} +1 -1
  372. package/docs/en/32_WINDOWS_WSL2_DEPLOYMENT_GUIDE.md +273 -0
  373. package/docs/en/33_WORKSPACE_EXPLORER_QA.md +121 -0
  374. package/docs/en/91_DEVELOPMENT.md +29 -0
  375. package/docs/en/99_ACKNOWLEDGEMENTS.md +24 -19
  376. package/docs/en/README.md +44 -7
  377. package/docs/images/admin/admin-connectors-health-en.png +0 -0
  378. package/docs/images/admin/admin-controllers-en.png +0 -0
  379. package/docs/images/admin/admin-diagnostics-en.png +0 -0
  380. package/docs/images/admin/admin-errors-en.png +0 -0
  381. package/docs/images/admin/admin-issues-en.png +0 -0
  382. package/docs/images/admin/admin-logs-en.png +0 -0
  383. package/docs/images/admin/admin-quest-detail-en.png +0 -0
  384. package/docs/images/admin/admin-quests-en.png +0 -0
  385. package/docs/images/admin/admin-repairs-en.png +0 -0
  386. package/docs/images/admin/admin-runtime-en.png +0 -0
  387. package/docs/images/admin/admin-search-en.png +0 -0
  388. package/docs/images/admin/admin-stats-en.png +0 -0
  389. package/docs/images/admin/admin-summary-en.png +0 -0
  390. package/docs/images/connectors/connector-discord-en.png +0 -0
  391. package/docs/images/connectors/connector-feishu-en.png +0 -0
  392. package/docs/images/connectors/connector-lingzhu-en.png +0 -0
  393. package/docs/images/connectors/connector-qq-en.png +0 -0
  394. package/docs/images/connectors/connector-slack-en.png +0 -0
  395. package/docs/images/connectors/connector-telegram-en.png +0 -0
  396. package/docs/images/connectors/connector-weixin-en.png +0 -0
  397. package/docs/images/connectors/connector-whatsapp-en.png +0 -0
  398. package/docs/images/settings/settings-baselines-en.png +0 -0
  399. package/docs/images/settings/settings-config-en.png +0 -0
  400. package/docs/images/settings/settings-connectors-overview-en.png +0 -0
  401. package/docs/images/settings/settings-deepxiv-en.png +0 -0
  402. package/docs/images/settings/settings-mcp-servers-en.png +0 -0
  403. package/docs/images/settings/settings-plugins-en.png +0 -0
  404. package/docs/images/settings/settings-runners-en.png +0 -0
  405. package/docs/zh/00_QUICK_START.md +92 -17
  406. package/docs/zh/01_SETTINGS_REFERENCE.md +219 -98
  407. package/docs/zh/02_START_RESEARCH_GUIDE.md +26 -5
  408. package/docs/zh/05_TUI_GUIDE.md +171 -2
  409. package/docs/zh/07_MEMORY_AND_MCP.md +29 -2
  410. package/docs/zh/09_DOCTOR.md +39 -4
  411. package/docs/zh/10_WEIXIN_CONNECTOR_GUIDE.md +24 -1
  412. package/docs/zh/11_LICENSE_AND_RISK.md +4 -0
  413. package/docs/zh/12_GUIDED_WORKFLOW_TOUR.md +15 -0
  414. package/docs/zh/14_PROMPT_SKILLS_AND_MCP_GUIDE.md +9 -0
  415. package/docs/zh/15_CODEX_PROVIDER_SETUP.md +550 -188
  416. package/docs/zh/21_LOCAL_MODEL_BACKENDS_GUIDE.md +105 -2
  417. package/docs/zh/22_BENCHSTORE_YAML_REFERENCE.md +459 -0
  418. package/docs/zh/23_BENCHSTORE_GITHUB_RELEASES_SPEC.md +287 -0
  419. package/docs/zh/23_CLAUDE_RUNNER_GUIDE.md +103 -0
  420. package/docs/zh/24_CLAUDE_CODE_PROVIDER_SETUP.md +460 -0
  421. package/docs/zh/25_OPENCODE_PROVIDER_SETUP.md +660 -0
  422. package/docs/zh/26_CITATION_AND_ATTRIBUTION.md +102 -0
  423. package/docs/zh/27_KIMI_CODE_PROVIDER_SETUP.md +51 -0
  424. package/docs/zh/{19_LOCAL_BROWSER_AUTH.md → 31_LOCAL_BROWSER_AUTH.md} +1 -1
  425. package/docs/zh/32_WINDOWS_WSL2_DEPLOYMENT_GUIDE.md +264 -0
  426. package/docs/zh/33_WORKSPACE_EXPLORER_QA.md +127 -0
  427. package/docs/zh/99_ACKNOWLEDGEMENTS.md +23 -19
  428. package/docs/zh/README.md +29 -7
  429. package/install.sh +122 -16
  430. package/package.json +4 -1
  431. package/pyproject.toml +2 -1
  432. package/src/deepscientist/__init__.py +1 -1
  433. package/src/deepscientist/acp/envelope.py +13 -0
  434. package/src/deepscientist/admin/__init__.py +3 -0
  435. package/src/deepscientist/admin/charts.py +681 -0
  436. package/src/deepscientist/admin/logs.py +119 -0
  437. package/src/deepscientist/admin/repairs.py +217 -0
  438. package/src/deepscientist/admin/service.py +1310 -0
  439. package/src/deepscientist/admin/system_info.py +700 -0
  440. package/src/deepscientist/admin/tasks.py +465 -0
  441. package/src/deepscientist/admin/tool_metrics.py +600 -0
  442. package/src/deepscientist/artifact/guidance.py +8 -4
  443. package/src/deepscientist/artifact/schemas.py +115 -0
  444. package/src/deepscientist/artifact/service.py +4268 -260
  445. package/src/deepscientist/bash_exec/monitor.py +30 -3
  446. package/src/deepscientist/bash_exec/service.py +134 -1
  447. package/src/deepscientist/benchstore/__init__.py +4 -0
  448. package/src/deepscientist/benchstore/prompt_builder.py +224 -0
  449. package/src/deepscientist/benchstore/service.py +1716 -0
  450. package/src/deepscientist/channels/weixin_ilink.py +8 -1
  451. package/src/deepscientist/cli.py +92 -17
  452. package/src/deepscientist/codex_cli_compat.py +2 -2
  453. package/src/deepscientist/config/models.py +82 -11
  454. package/src/deepscientist/config/service.py +927 -91
  455. package/src/deepscientist/connector/weixin_support.py +48 -17
  456. package/src/deepscientist/daemon/api/handlers.py +697 -210
  457. package/src/deepscientist/daemon/api/router.py +76 -1
  458. package/src/deepscientist/daemon/app.py +1054 -51
  459. package/src/deepscientist/diagnostics/runner_failures.py +147 -0
  460. package/src/deepscientist/doctor.py +212 -65
  461. package/src/deepscientist/evidence_packets.py +590 -0
  462. package/src/deepscientist/home.py +52 -4
  463. package/src/deepscientist/kimi_cli_compat.py +50 -0
  464. package/src/deepscientist/latex_runtime.py +2 -2
  465. package/src/deepscientist/mcp/context.py +2 -0
  466. package/src/deepscientist/mcp/schemas.py +114 -0
  467. package/src/deepscientist/mcp/server.py +1566 -126
  468. package/src/deepscientist/memory/service.py +203 -16
  469. package/src/deepscientist/process_control.py +8 -1
  470. package/src/deepscientist/prompts/builder.py +836 -92
  471. package/src/deepscientist/quest/__init__.py +2 -2
  472. package/src/deepscientist/quest/layout.py +12 -1
  473. package/src/deepscientist/quest/node_traces.py +10 -0
  474. package/src/deepscientist/quest/service.py +1430 -139
  475. package/src/deepscientist/quest/stage_views.py +1 -1
  476. package/src/deepscientist/runners/__init__.py +18 -0
  477. package/src/deepscientist/runners/base.py +89 -1
  478. package/src/deepscientist/runners/builtins.py +13 -1
  479. package/src/deepscientist/runners/claude.py +391 -0
  480. package/src/deepscientist/runners/codex.py +421 -21
  481. package/src/deepscientist/runners/codex_telemetry.py +127 -0
  482. package/src/deepscientist/runners/kimi.py +334 -0
  483. package/src/deepscientist/runners/metadata.py +68 -0
  484. package/src/deepscientist/runners/opencode.py +414 -0
  485. package/src/deepscientist/runners/runtime_overrides.py +100 -0
  486. package/src/deepscientist/runners/simple_cli.py +538 -0
  487. package/src/deepscientist/runtime_storage.py +303 -0
  488. package/src/deepscientist/shared.py +61 -16
  489. package/src/deepscientist/skills/installer.py +37 -0
  490. package/src/deepscientist/skills/registry.py +2 -0
  491. package/src/deepscientist/tinytex.py +2 -2
  492. package/src/deepscientist/tui.py +10 -3
  493. package/src/prompts/benchstore/system.md +77 -0
  494. package/src/prompts/connectors/qq.md +33 -2
  495. package/src/prompts/connectors/weixin.md +208 -23
  496. package/src/prompts/contracts/admin_ops.md +74 -0
  497. package/src/prompts/contracts/admin_ops_knowledge.md +138 -0
  498. package/src/prompts/contracts/shared_interaction.md +5 -11
  499. package/src/prompts/start_setup/system.md +422 -0
  500. package/src/prompts/system.md +409 -315
  501. package/src/prompts/system_copilot.md +88 -12
  502. package/src/skills/analysis-campaign/SKILL.md +239 -578
  503. package/src/skills/analysis-campaign/references/artifact-flow-examples.md +102 -0
  504. package/src/skills/analysis-campaign/references/boundary-cases.md +98 -0
  505. package/src/skills/analysis-campaign/references/campaign-checklist-template.md +39 -24
  506. package/src/skills/analysis-campaign/references/campaign-design.md +26 -10
  507. package/src/skills/analysis-campaign/references/campaign-plan-template.md +53 -54
  508. package/src/skills/analysis-campaign/references/operational-guidance.md +97 -0
  509. package/src/skills/analysis-campaign/references/writing-facing-slice-examples.md +10 -20
  510. package/src/skills/baseline/SKILL.md +183 -461
  511. package/src/skills/baseline/references/artifact-flow-examples.md +106 -0
  512. package/src/skills/baseline/references/artifact-payload-examples.md +1 -1
  513. package/src/skills/baseline/references/baseline-checklist-template.md +27 -35
  514. package/src/skills/baseline/references/baseline-plan-template.md +37 -76
  515. package/src/skills/baseline/references/boundary-cases.md +86 -0
  516. package/src/skills/baseline/references/codebase-audit-checklist.md +2 -6
  517. package/src/skills/baseline/references/comparability-contract.md +7 -12
  518. package/src/skills/baseline/references/operational-guidance.md +56 -0
  519. package/src/skills/baseline/references/route-selection.md +5 -25
  520. package/src/skills/decision/SKILL.md +113 -306
  521. package/src/skills/decision/references/checkpoint-memory-template.md +47 -0
  522. package/src/skills/decision/references/operational-guidance.md +94 -0
  523. package/src/skills/decision/references/research-route-criteria.md +7 -8
  524. package/src/skills/decision/references/strategic-decision-template.md +13 -26
  525. package/src/skills/experiment/SKILL.md +132 -670
  526. package/src/skills/experiment/references/execution-playbook.md +374 -0
  527. package/src/skills/experiment/references/main-experiment-checklist-template.md +26 -2
  528. package/src/skills/experiment/references/main-experiment-plan-template.md +28 -17
  529. package/src/skills/experiment/references/operational-guidance.md +108 -0
  530. package/src/skills/finalize/SKILL.md +62 -0
  531. package/src/skills/finalize/references/checkpoint-memory-template.md +49 -0
  532. package/src/skills/finalize/references/resume-packet-template.md +7 -0
  533. package/src/skills/idea/SKILL.md +228 -15
  534. package/src/skills/idea/references/controlled-brainstorming-playbook.md +78 -0
  535. package/src/skills/idea/references/current-board-packet-template.md +61 -0
  536. package/src/skills/idea/references/high-value-idea-sourcing.md +119 -0
  537. package/src/skills/idea/references/idea-generation-playbook.md +21 -0
  538. package/src/skills/idea/references/idea-thinking-flow.md +6 -0
  539. package/src/skills/idea/references/literature-survey-template.md +3 -0
  540. package/src/skills/idea/references/objective-contract-template.md +54 -0
  541. package/src/skills/idea/references/outline-seeding-example.md +56 -0
  542. package/src/skills/idea/references/pre-idea-draft-template.md +105 -0
  543. package/src/skills/idea/references/related-work-playbook.md +75 -2
  544. package/src/skills/idea/references/research-history-playbook.md +114 -0
  545. package/src/skills/idea/references/selection-gate.md +58 -6
  546. package/src/skills/intake-audit/SKILL.md +43 -2
  547. package/src/skills/intake-audit/references/state-audit-template.md +10 -0
  548. package/src/skills/nature-data/SKILL.md +128 -0
  549. package/src/skills/nature-data/UPSTREAM_LICENSE.txt +21 -0
  550. package/src/skills/nature-data/agents/openai.yaml +4 -0
  551. package/src/skills/nature-data/references/chinese-author-alignment.md +84 -0
  552. package/src/skills/nature-data/references/fair-metadata-checklist.md +105 -0
  553. package/src/skills/nature-data/references/policy-principles.md +103 -0
  554. package/src/skills/nature-data/references/repository-and-identifiers.md +96 -0
  555. package/src/skills/nature-data/references/source-basis.md +54 -0
  556. package/src/skills/nature-data/references/statement-patterns.md +153 -0
  557. package/src/skills/nature-figure/SKILL.md +197 -0
  558. package/src/skills/nature-figure/UPSTREAM_LICENSE.txt +21 -0
  559. package/src/skills/nature-figure/agents/openai.yaml +4 -0
  560. package/src/skills/nature-figure/evals/evals.json +37 -0
  561. package/src/skills/nature-figure/references/api.md +428 -0
  562. package/src/skills/nature-figure/references/backend-selection.md +100 -0
  563. package/src/skills/nature-figure/references/chart-types.md +281 -0
  564. package/src/skills/nature-figure/references/common-patterns.md +349 -0
  565. package/src/skills/nature-figure/references/design-theory.md +436 -0
  566. package/src/skills/nature-figure/references/figure-contract.md +93 -0
  567. package/src/skills/nature-figure/references/nature-2026-observations.md +112 -0
  568. package/src/skills/nature-figure/references/qa-contract.md +119 -0
  569. package/src/skills/nature-figure/references/r-template-index.md +66 -0
  570. package/src/skills/nature-figure/references/r-workflow.md +161 -0
  571. package/src/skills/nature-figure/references/tutorials.md +250 -0
  572. package/src/skills/nature-paper2ppt/SKILL.md +507 -0
  573. package/src/skills/nature-paper2ppt/UPSTREAM_LICENSE.txt +21 -0
  574. package/src/skills/nature-paper2ppt/agents/openai.yaml +4 -0
  575. package/src/skills/nature-polishing/SKILL.md +385 -0
  576. package/src/skills/nature-polishing/UPSTREAM_LICENSE.txt +21 -0
  577. package/src/skills/nature-polishing/agents/openai.yaml +4 -0
  578. package/src/skills/nature-polishing/references/phrasebank-playbook.md +162 -0
  579. package/src/skills/nature-polishing/references/section-moves.md +240 -0
  580. package/src/skills/nature-polishing/references/style-guardrails.md +94 -0
  581. package/src/skills/nature-polishing/references/writing-strategy.md +148 -0
  582. package/src/skills/optimize/SKILL.md +177 -1568
  583. package/src/skills/optimize/references/brief-shaping-playbook.md +95 -0
  584. package/src/skills/optimize/references/candidate-board-template.md +13 -0
  585. package/src/skills/optimize/references/candidate-ranking-template.md +51 -0
  586. package/src/skills/optimize/references/codegen-route-playbook.md +50 -0
  587. package/src/skills/optimize/references/debug-response-template.md +29 -0
  588. package/src/skills/optimize/references/frontier-review-template.md +32 -0
  589. package/src/skills/optimize/references/fusion-playbook.md +36 -0
  590. package/src/skills/optimize/references/method-brief-template.md +73 -0
  591. package/src/skills/optimize/references/operational-guidance.md +621 -0
  592. package/src/skills/optimize/references/optimization-memory-template.md +30 -0
  593. package/src/skills/optimize/references/optimize-checklist-template.md +18 -0
  594. package/src/skills/optimize/references/plateau-response-playbook.md +28 -0
  595. package/src/skills/optimize/references/prompt-patterns.md +49 -0
  596. package/src/skills/paper-outline/SKILL.md +227 -0
  597. package/src/skills/paper-outline/references/outline-patterns.md +87 -0
  598. package/src/skills/paper-plot/SKILL.md +79 -0
  599. package/src/skills/paper-plot/agents/openai.yaml +4 -0
  600. package/src/skills/paper-plot/references/bar_grouped_hatch.md +96 -0
  601. package/src/skills/paper-plot/references/bar_paired_delta.md +72 -0
  602. package/src/skills/paper-plot/references/line_confidence_band.md +75 -0
  603. package/src/skills/paper-plot/references/line_loss_with_inset.md +65 -0
  604. package/src/skills/paper-plot/references/line_training_curve.md +44 -0
  605. package/src/skills/paper-plot/references/radar_dual_series.md +59 -0
  606. package/src/skills/paper-plot/references/scatter_broken_axis.md +59 -0
  607. package/src/skills/paper-plot/references/scatter_tsne_cluster.md +72 -0
  608. package/src/skills/paper-plot/scripts/bar_memevolve.py +109 -0
  609. package/src/skills/paper-plot/scripts/bar_spice.py +166 -0
  610. package/src/skills/paper-plot/scripts/line_aime.py +94 -0
  611. package/src/skills/paper-plot/scripts/line_loss_inset.py +157 -0
  612. package/src/skills/paper-plot/scripts/line_selfdistill.py +168 -0
  613. package/src/skills/paper-plot/scripts/radar_dora.py +151 -0
  614. package/src/skills/paper-plot/scripts/scatter_break.py +169 -0
  615. package/src/skills/paper-plot/scripts/scatter_tsne.py +133 -0
  616. package/src/skills/rebuttal/SKILL.md +9 -0
  617. package/src/skills/references/tool-usage-by-stage.md +438 -0
  618. package/src/skills/review/SKILL.md +105 -7
  619. package/src/skills/science/PROVENANCE.md +44 -0
  620. package/src/skills/science/SKILL.md +137 -0
  621. package/src/skills/science/references/artifact-science-tool.md +110 -0
  622. package/src/skills/science/references/claim-type-discipline.md +56 -0
  623. package/src/skills/science/references/domain-index.md +422 -0
  624. package/src/skills/science/references/hpc-via-bash-exec.md +42 -0
  625. package/src/skills/science/references/package-check-playbook.md +64 -0
  626. package/src/skills/science/references/package-index.min.json +3616 -0
  627. package/src/skills/science/references/packages/abinit.md +80 -0
  628. package/src/skills/science/references/packages/acts.md +73 -0
  629. package/src/skills/science/references/packages/aiida-core.md +80 -0
  630. package/src/skills/science/references/packages/alamode.md +80 -0
  631. package/src/skills/science/references/packages/amuse.md +88 -0
  632. package/src/skills/science/references/packages/anndata.md +88 -0
  633. package/src/skills/science/references/packages/arbor.md +80 -0
  634. package/src/skills/science/references/packages/arc.md +73 -0
  635. package/src/skills/science/references/packages/astropy.md +88 -0
  636. package/src/skills/science/references/packages/astroquery.md +88 -0
  637. package/src/skills/science/references/packages/atomate2.md +80 -0
  638. package/src/skills/science/references/packages/atomsmltr.md +73 -0
  639. package/src/skills/science/references/packages/awkward.md +73 -0
  640. package/src/skills/science/references/packages/batman.md +88 -0
  641. package/src/skills/science/references/packages/biopython.md +88 -0
  642. package/src/skills/science/references/packages/bloqade.md +73 -0
  643. package/src/skills/science/references/packages/brian2.md +73 -0
  644. package/src/skills/science/references/packages/bullet3.md +73 -0
  645. package/src/skills/science/references/packages/calculix.md +80 -0
  646. package/src/skills/science/references/packages/cantera.md +73 -0
  647. package/src/skills/science/references/packages/cavity-md-ipi.md +80 -0
  648. package/src/skills/science/references/packages/ccdproc.md +88 -0
  649. package/src/skills/science/references/packages/celerite2.md +88 -0
  650. package/src/skills/science/references/packages/cellrank.md +73 -0
  651. package/src/skills/science/references/packages/cesm.md +80 -0
  652. package/src/skills/science/references/packages/chemicals.md +73 -0
  653. package/src/skills/science/references/packages/chempy.md +73 -0
  654. package/src/skills/science/references/packages/cirq.md +73 -0
  655. package/src/skills/science/references/packages/coffea.md +73 -0
  656. package/src/skills/science/references/packages/cp2k.md +88 -0
  657. package/src/skills/science/references/packages/custodian.md +80 -0
  658. package/src/skills/science/references/packages/dart.md +73 -0
  659. package/src/skills/science/references/packages/datamol.md +88 -0
  660. package/src/skills/science/references/packages/dd4hep.md +73 -0
  661. package/src/skills/science/references/packages/dealii.md +80 -0
  662. package/src/skills/science/references/packages/deepchem.md +88 -0
  663. package/src/skills/science/references/packages/delphes.md +73 -0
  664. package/src/skills/science/references/packages/devito.md +80 -0
  665. package/src/skills/science/references/packages/dftb.md +88 -0
  666. package/src/skills/science/references/packages/dftd4.md +88 -0
  667. package/src/skills/science/references/packages/dftk-jl.md +80 -0
  668. package/src/skills/science/references/packages/dolfinx.md +80 -0
  669. package/src/skills/science/references/packages/drake.md +73 -0
  670. package/src/skills/science/references/packages/dumux.md +73 -0
  671. package/src/skills/science/references/packages/elk.md +80 -0
  672. package/src/skills/science/references/packages/elmerfem.md +80 -0
  673. package/src/skills/science/references/packages/enzo-e.md +88 -0
  674. package/src/skills/science/references/packages/espresso.md +80 -0
  675. package/src/skills/science/references/packages/exoplanet.md +88 -0
  676. package/src/skills/science/references/packages/fairroot.md +73 -0
  677. package/src/skills/science/references/packages/fbpic.md +80 -0
  678. package/src/skills/science/references/packages/fdtdbath-meep.md +80 -0
  679. package/src/skills/science/references/packages/geant4.md +73 -0
  680. package/src/skills/science/references/packages/geosx.md +80 -0
  681. package/src/skills/science/references/packages/gprmax.md +80 -0
  682. package/src/skills/science/references/packages/gromacs.md +80 -0
  683. package/src/skills/science/references/packages/gwaslab.md +73 -0
  684. package/src/skills/science/references/packages/gz-sim.md +73 -0
  685. package/src/skills/science/references/packages/hail.md +88 -0
  686. package/src/skills/science/references/packages/hiphive.md +80 -0
  687. package/src/skills/science/references/packages/hoomd-blue.md +80 -0
  688. package/src/skills/science/references/packages/itensor.md +73 -0
  689. package/src/skills/science/references/packages/itensors-jl.md +73 -0
  690. package/src/skills/science/references/packages/jdftx.md +73 -0
  691. package/src/skills/science/references/packages/jobflow.md +80 -0
  692. package/src/skills/science/references/packages/kadanoffbaym-jl.md +73 -0
  693. package/src/skills/science/references/packages/kite.md +80 -0
  694. package/src/skills/science/references/packages/kratos.md +80 -0
  695. package/src/skills/science/references/packages/kwant.md +73 -0
  696. package/src/skills/science/references/packages/lammps.md +80 -0
  697. package/src/skills/science/references/packages/lightkurve.md +88 -0
  698. package/src/skills/science/references/packages/limix.md +73 -0
  699. package/src/skills/science/references/packages/maxwelllink.md +80 -0
  700. package/src/skills/science/references/packages/mcdc.md +73 -0
  701. package/src/skills/science/references/packages/meep.md +80 -0
  702. package/src/skills/science/references/packages/mfem.md +80 -0
  703. package/src/skills/science/references/packages/mitgcm.md +73 -0
  704. package/src/skills/science/references/packages/modflow6.md +73 -0
  705. package/src/skills/science/references/packages/molecool.md +73 -0
  706. package/src/skills/science/references/packages/mom6.md +73 -0
  707. package/src/skills/science/references/packages/moose.md +80 -0
  708. package/src/skills/science/references/packages/mpas-model.md +73 -0
  709. package/src/skills/science/references/packages/mujoco.md +73 -0
  710. package/src/skills/science/references/packages/mumax3.md +73 -0
  711. package/src/skills/science/references/packages/nekrs.md +80 -0
  712. package/src/skills/science/references/packages/nessi.md +73 -0
  713. package/src/skills/science/references/packages/nest-simulator.md +73 -0
  714. package/src/skills/science/references/packages/netket.md +73 -0
  715. package/src/skills/science/references/packages/neuron.md +73 -0
  716. package/src/skills/science/references/packages/nextflow.md +88 -0
  717. package/src/skills/science/references/packages/nwchem.md +88 -0
  718. package/src/skills/science/references/packages/openbabel.md +88 -0
  719. package/src/skills/science/references/packages/openems.md +80 -0
  720. package/src/skills/science/references/packages/openff-toolkit.md +88 -0
  721. package/src/skills/science/references/packages/openfoam-dev.md +80 -0
  722. package/src/skills/science/references/packages/openmc.md +73 -0
  723. package/src/skills/science/references/packages/openmm.md +80 -0
  724. package/src/skills/science/references/packages/openmoc.md +73 -0
  725. package/src/skills/science/references/packages/openmx.md +80 -0
  726. package/src/skills/science/references/packages/opensees.md +80 -0
  727. package/src/skills/science/references/packages/opensn.md +80 -0
  728. package/src/skills/science/references/packages/opm-simulators.md +73 -0
  729. package/src/skills/science/references/packages/oqupy.md +73 -0
  730. package/src/skills/science/references/packages/packmol.md +80 -0
  731. package/src/skills/science/references/packages/palabos.md +80 -0
  732. package/src/skills/science/references/packages/parflow.md +80 -0
  733. package/src/skills/science/references/packages/pennylane.md +88 -0
  734. package/src/skills/science/references/packages/perceval.md +73 -0
  735. package/src/skills/science/references/packages/phono3py.md +73 -0
  736. package/src/skills/science/references/packages/phonopy.md +73 -0
  737. package/src/skills/science/references/packages/photutils.md +88 -0
  738. package/src/skills/science/references/packages/picongpu.md +80 -0
  739. package/src/skills/science/references/packages/plink-ng.md +88 -0
  740. package/src/skills/science/references/packages/precice.md +73 -0
  741. package/src/skills/science/references/packages/psc.md +80 -0
  742. package/src/skills/science/references/packages/psi4.md +88 -0
  743. package/src/skills/science/references/packages/pybinding.md +73 -0
  744. package/src/skills/science/references/packages/pyfr.md +80 -0
  745. package/src/skills/science/references/packages/pyhf.md +73 -0
  746. package/src/skills/science/references/packages/pyiron_base.md +80 -0
  747. package/src/skills/science/references/packages/pylcp.md +73 -0
  748. package/src/skills/science/references/packages/pylith.md +80 -0
  749. package/src/skills/science/references/packages/pynbody.md +88 -0
  750. package/src/skills/science/references/packages/pysam.md +88 -0
  751. package/src/skills/science/references/packages/pyscf.md +88 -0
  752. package/src/skills/science/references/packages/q-e.md +73 -0
  753. package/src/skills/science/references/packages/qibo.md +73 -0
  754. package/src/skills/science/references/packages/qiskit.md +73 -0
  755. package/src/skills/science/references/packages/quantica-jl.md +73 -0
  756. package/src/skills/science/references/packages/quantumoptics-jl.md +73 -0
  757. package/src/skills/science/references/packages/quimb.md +73 -0
  758. package/src/skills/science/references/packages/qulacs.md +73 -0
  759. package/src/skills/science/references/packages/qutip.md +73 -0
  760. package/src/skills/science/references/packages/rdkit.md +88 -0
  761. package/src/skills/science/references/packages/rmg-py.md +73 -0
  762. package/src/skills/science/references/packages/root.md +73 -0
  763. package/src/skills/science/references/packages/scanpy.md +88 -0
  764. package/src/skills/science/references/packages/scikit-allel.md +88 -0
  765. package/src/skills/science/references/packages/scikit-bio.md +88 -0
  766. package/src/skills/science/references/packages/scqubits.md +73 -0
  767. package/src/skills/science/references/packages/scuff-em.md +80 -0
  768. package/src/skills/science/references/packages/scvi-tools.md +73 -0
  769. package/src/skills/science/references/packages/seissol.md +73 -0
  770. package/src/skills/science/references/packages/sfepy.md +80 -0
  771. package/src/skills/science/references/packages/sisl.md +73 -0
  772. package/src/skills/science/references/packages/smilei.md +80 -0
  773. package/src/skills/science/references/packages/snakemake.md +88 -0
  774. package/src/skills/science/references/packages/specfem3d-globe.md +80 -0
  775. package/src/skills/science/references/packages/specutils.md +88 -0
  776. package/src/skills/science/references/packages/spglib.md +80 -0
  777. package/src/skills/science/references/packages/squidpy.md +88 -0
  778. package/src/skills/science/references/packages/starry.md +88 -0
  779. package/src/skills/science/references/packages/strawberryfields.md +73 -0
  780. package/src/skills/science/references/packages/su2.md +80 -0
  781. package/src/skills/science/references/packages/sunny-jl.md +73 -0
  782. package/src/skills/science/references/packages/sw4.md +73 -0
  783. package/src/skills/science/references/packages/swift.md +88 -0
  784. package/src/skills/science/references/packages/tdnegf.md +73 -0
  785. package/src/skills/science/references/packages/tenpy.md +73 -0
  786. package/src/skills/science/references/packages/thermo.md +73 -0
  787. package/src/skills/science/references/packages/tkwant.md +73 -0
  788. package/src/skills/science/references/packages/tvb-root.md +73 -0
  789. package/src/skills/science/references/packages/uproot5.md +73 -0
  790. package/src/skills/science/references/packages/vampire.md +80 -0
  791. package/src/skills/science/references/packages/wannier_tools.md +73 -0
  792. package/src/skills/science/references/packages/warpx.md +80 -0
  793. package/src/skills/science/references/packages/wrf.md +73 -0
  794. package/src/skills/science/references/packages/xtb.md +88 -0
  795. package/src/skills/science/references/packages/yt.md +73 -0
  796. package/src/skills/science/references/science-task-brief-template.md +71 -0
  797. package/src/skills/scout/SKILL.md +83 -425
  798. package/src/skills/scout/references/literature-scout-template.md +5 -24
  799. package/src/skills/scout/references/operational-guidance.md +191 -0
  800. package/src/skills/scout/references/paper-triage-playbook.md +11 -35
  801. package/src/skills/write/SKILL.md +744 -1246
  802. package/src/skills/write/references/experiments_analysis_patterns.md +129 -0
  803. package/src/skills/write/references/oral_package_patterns.md +252 -0
  804. package/src/skills/write/references/oral_writing_principles.md +291 -0
  805. package/src/skills/write/references/section_rewrite_checklist.md +234 -0
  806. package/src/tui/dist/app/AppContainer.js +1314 -27
  807. package/src/tui/dist/components/Composer.js +26 -1
  808. package/src/tui/dist/components/ConfigScreen.js +2 -1
  809. package/src/tui/dist/components/InputPrompt.js +25 -9
  810. package/src/tui/dist/components/MainContent.js +18 -3
  811. package/src/tui/dist/components/QuestScreen.js +3 -2
  812. package/src/tui/dist/components/UtilityScreen.js +37 -0
  813. package/src/tui/dist/hooks/useSafeInput.js +10 -0
  814. package/src/tui/dist/index.js +13 -1
  815. package/src/tui/dist/layouts/DefaultAppLayout.js +11 -8
  816. package/src/tui/dist/lib/api.js +89 -1
  817. package/src/tui/package.json +1 -1
  818. package/src/ui/dist/assets/{AnalysisPlugin-BCKAfjba.js → AnalysisPlugin-CA94NGmI.js} +1 -1
  819. package/src/ui/dist/assets/CliPlugin-DHBzphZU.js +79 -0
  820. package/src/ui/dist/assets/CodeEditorPlugin-BOFwD2rn.js +2 -0
  821. package/src/ui/dist/assets/{CodeViewerPlugin-CbaFRrUU.js → CodeViewerPlugin-CqDpgjik.js} +4 -4
  822. package/src/ui/dist/assets/{DocViewerPlugin-DAjLVeQD.js → DocViewerPlugin-UDBgt8-4.js} +3 -3
  823. package/src/ui/dist/assets/GitCommitViewerPlugin-BmHtZ0bZ.js +6 -0
  824. package/src/ui/dist/assets/{GitDiffViewerPlugin-CQACjoAA.js → GitDiffViewerPlugin-CAxjNorQ.js} +2 -2
  825. package/src/ui/dist/assets/{GitSnapshotViewer-0r4nLPke.js → GitSnapshotViewer-CweA6VON.js} +2 -2
  826. package/src/ui/dist/assets/{ImageViewerPlugin-nBOmI2v_.js → ImageViewerPlugin-C8wHGvGN.js} +5 -5
  827. package/src/ui/dist/assets/LabPlugin-COyyLUol.js +32 -0
  828. package/src/ui/dist/assets/{LatexPlugin-ZwtV8pIp.js → LatexPlugin-BQjAaA5J.js} +4 -4
  829. package/src/ui/dist/assets/{MarkdownViewerPlugin-DKqVfKyW.js → MarkdownViewerPlugin-Dy1NE2dI.js} +3 -3
  830. package/src/ui/dist/assets/{MarketplacePlugin-BwxStZ9D.js → MarketplacePlugin-DMIZtEJ2.js} +2 -2
  831. package/src/ui/dist/assets/NotebookEditor-CFHMq_Qt.js +91 -0
  832. package/src/ui/dist/assets/{NotebookEditor-DB9N_T9q.js → NotebookEditor-WFyd8Ybt.js} +3 -3
  833. package/src/ui/dist/assets/{PdfLoader-eWBONbQP.js → PdfLoader-CLE5u5TS.js} +3 -3
  834. package/src/ui/dist/assets/{PdfMarkdownPlugin-D22YOZL3.js → PdfMarkdownPlugin-_iNK_H83.js} +1 -1
  835. package/src/ui/dist/assets/PdfViewerPlugin-DgWsbInT.js +22 -0
  836. package/src/ui/dist/assets/SearchPlugin-DrZmn5iw.js +11 -0
  837. package/src/ui/dist/assets/{TextViewerPlugin-C5xqeeUH.js → TextViewerPlugin-D1-T3aC7.js} +4 -4
  838. package/src/ui/dist/assets/branding/runner-claude.svg +107 -0
  839. package/src/ui/dist/assets/branding/runner-codex.svg +10 -0
  840. package/src/ui/dist/assets/branding/runner-kimi.svg +14 -0
  841. package/src/ui/dist/assets/branding/runner-opencode.svg +7 -0
  842. package/src/ui/dist/assets/cli-store-CoZ-x5Ip.js +1 -0
  843. package/src/ui/dist/assets/{code-WlFHE7z_.js → code-DbsmSd3Y.js} +1 -1
  844. package/src/ui/dist/assets/file-diff-panel-DsvyRz47.js +1 -0
  845. package/src/ui/dist/assets/{wrap-text-BC-Hltpd.js → file-jump-queue-DeQBikaw.js} +3 -3
  846. package/src/ui/dist/assets/{file-socket-CfQPKQKj.js → file-socket-DA5XIx88.js} +1 -1
  847. package/src/ui/dist/assets/fonts/ds-fonts.css +50 -4
  848. package/src/ui/dist/assets/images/deepxiv/register-guide.png +0 -0
  849. package/src/ui/dist/assets/index-39vY9LmZ.js +1 -0
  850. package/src/ui/dist/assets/{index-CwNu1aH4.js → index-BsO46tJA.js} +1 -1
  851. package/src/ui/dist/assets/index-CHzJ2xtB.js +3530 -0
  852. package/src/ui/dist/assets/index-DH-zxoZ3.css +33 -0
  853. package/src/ui/dist/assets/{plugin-notebook-HbW2K-1c.js → plugin-notebook-JRhysCqj.js} +2 -2
  854. package/src/ui/dist/assets/{project-sync-C9IdzdZW.js → project-sync-DPmWKmKD.js} +1 -1
  855. package/src/ui/dist/assets/{zoom-out-E_gaeAxL.js → zoom-out-DAukFWen.js} +3 -3
  856. package/src/ui/dist/index.html +3 -3
  857. package/src/skills/analysis-campaign/references/artifact-orchestration.md +0 -58
  858. package/src/skills/baseline/references/memory-playbook.md +0 -40
  859. package/src/skills/baseline/references/publishable-baseline-package.md +0 -30
  860. package/src/skills/write/references/outline-evidence-contract-example.md +0 -107
  861. package/src/skills/write/references/paper-experiment-matrix-template.md +0 -131
  862. package/src/skills/write/references/paper-section-playbook.md +0 -64
  863. package/src/skills/write/references/reviewer-first-writing.md +0 -64
  864. package/src/skills/write/references/revision-checklist.md +0 -70
  865. package/src/skills/write/references/section-contracts.md +0 -82
  866. package/src/skills/write/references/sentence-level-proofing.md +0 -49
  867. package/src/ui/dist/assets/AiManusChatView-Bv-Z8YpU.js +0 -204
  868. package/src/ui/dist/assets/CliPlugin-BCKcpc35.js +0 -109
  869. package/src/ui/dist/assets/CodeEditorPlugin-DbOfSJ8K.js +0 -2
  870. package/src/ui/dist/assets/GitCommitViewerPlugin-CIUqbUDO.js +0 -1
  871. package/src/ui/dist/assets/LabCopilotPanel-BHxOxF4z.js +0 -14
  872. package/src/ui/dist/assets/LabPlugin-BKoZGs95.js +0 -22
  873. package/src/ui/dist/assets/NotebookEditor-BEQhaQbt.js +0 -81
  874. package/src/ui/dist/assets/PdfViewerPlugin-c-RK9DLM.js +0 -17
  875. package/src/ui/dist/assets/SearchPlugin-CxF9ytAx.js +0 -16
  876. package/src/ui/dist/assets/VNCViewer-BoLGLnHz.js +0 -11
  877. package/src/ui/dist/assets/bot-DREQOxzP.js +0 -6
  878. package/src/ui/dist/assets/chevron-up-C9Qpx4DE.js +0 -6
  879. package/src/ui/dist/assets/file-content-BZMz3RYp.js +0 -1
  880. package/src/ui/dist/assets/file-diff-panel-CQhw0jS2.js +0 -1
  881. package/src/ui/dist/assets/file-jump-queue-DA-SdG__.js +0 -1
  882. package/src/ui/dist/assets/git-commit-horizontal-DxZ8DCZh.js +0 -6
  883. package/src/ui/dist/assets/image-Bgl4VIyx.js +0 -6
  884. package/src/ui/dist/assets/index-BpV6lusQ.css +0 -33
  885. package/src/ui/dist/assets/index-CBNVuWcP.js +0 -2496
  886. package/src/ui/dist/assets/index-DrUnlf6K.js +0 -1
  887. package/src/ui/dist/assets/index-NW-h8VzN.js +0 -1
  888. package/src/ui/dist/assets/pdf-effect-queue-J8OnM0jE.js +0 -6
  889. package/src/ui/dist/assets/popover-CLc0pPP8.js +0 -1
  890. package/src/ui/dist/assets/select-Cs2PmzwL.js +0 -11
  891. package/src/ui/dist/assets/sigma-ClKcHAXm.js +0 -6
  892. package/src/ui/dist/assets/trash-DwpbFr3w.js +0 -11
  893. package/src/ui/dist/assets/useCliAccess-NQ8m0Let.js +0 -1
  894. package/src/ui/dist/assets/useFileDiffOverlay-FuhcnKiw.js +0 -1
@@ -7,87 +7,122 @@ skill_role: stage
  # Experiment
 
  Use this skill for the main evidence-producing runs of the quest.
+ The goal is to turn one selected route into one trustworthy measured result with the smallest valid amount of execution.
 
- ## Interaction discipline
+ ## Match signals
 
- - Follow the shared interaction contract injected by the system prompt.
- - For ordinary active work, prefer a concise progress update once work has crossed roughly 6 tool calls with a human-meaningful delta, and do not drift beyond roughly 12 tool calls or about 8 minutes without a user-visible update.
- - Keep ordinary subtask completions concise. When a main experiment actually finishes or reaches a stage-significant checkpoint, upgrade to a richer `artifact.interact(kind='milestone', reply_mode='threaded', ...)` report rather than another short progress line.
- - That richer experiment-stage milestone report should normally cover: what run finished, the headline result versus baseline or expectation, the main caveat, and the exact recommended next action.
- - That richer milestone report is still normally non-blocking. If the next route is already justified locally, continue automatically after reporting rather than idling for acknowledgment.
- - If the active communication surface is QQ and QQ milestone media is enabled in config, a completed main experiment may attach one summary PNG to that richer milestone update.
- - That PNG should be a connector-facing report chart, not a raw debug plot and not a draft paper figure.
- - Do not auto-send every training curve, per-step plot, or intermediate slice image.
- - Preferred connector-chart palettes are Morandi-like and restrained:
- - `sage-clay`: `#E7E1D6`, `#B7A99A`, `#7F8F84` for the default QQ summary look
- - `mist-stone`: `#F3EEE8`, `#D8D1C7`, `#8A9199` for conservative summaries
- - `dust-rose`: `#F2E9E6`, `#D8C3BC`, `#B88C8C` for secondary comparisons only
- - Connector-facing chart requirements:
- - white or near-white background
- - low saturation, no neon colors
- - one primary accent plus one neutral comparison color whenever possible
- - simple legend, light grid, readable labels, and no dashboard clutter
- - summarize only the evidence needed for the milestone
- - Default chart choice:
- - line chart for training / budget / step trends
- - bar chart for a small number of categorical end-point comparisons
- - point-range chart when uncertainty or seed spread matters
- - If the figure encodes ordered magnitude, use a sequential muted palette; if it encodes signed delta around a reference, use a diverging muted palette with a neutral midpoint.
- - Avoid rainbow / jet-like colormaps, 3D effects, and over-annotated dashboards.
- - If the chart may later be reused in the paper, export a vector copy (`pdf` or `svg`) alongside the connector `png`.
- - If the figure matters beyond transient debugging, open `figure-polish/SKILL.md` and follow its render-inspect-revise workflow before treating the image as final.
- - If plotting in Python, reuse the fixed Morandi plotting starter from the system prompt rather than inventing a new bright style for each run.
- - If the runtime starts an auto-continue turn with no new user message, continue from the current run state, logs, artifacts, and active requirements instead of replaying the previous user turn.
- - Progress message templates are references only. Adapt to the actual context and vary wording so messages feel human, respectful, and non-robotic.
- - If a threaded user reply arrives, interpret it relative to the latest experiment progress update before assuming the task changed completely.
- - Hard execution rule: every terminal command in this stage must go through `bash_exec`; do not use any other terminal path for smoke tests, real runs, Git, Python, package-manager, or file-inspection commands.
- - Prefer `bash_exec` for experiment commands so each run gets a durable session id, quest-local log folder, and later `read/list/kill` control.
- - For meaningful long-running runs, include the estimated next reply time or next check-in window whenever it is defensible.
+ Use `experiment` when:
 
- ## Tool discipline
+ - a baseline is accepted
+ - an idea has been selected
+ - the evaluation contract is explicit
+ - the quest is ready for implementation and measurement rather than framing, route selection, or writing
 
- - **Do not use native `shell_command` / `command_execution` in this skill.**
- - **All smoke tests, real runs, shell, CLI, Python, bash, node, git, npm, uv, and environment work must go through `bash_exec(...)`.**
- - **For git work inside the current quest repository or worktree, prefer `artifact.git(...)` before raw shell git commands.**
- - **If a scratch repository or isolated test environment is needed, create and drive it through `bash_exec(...)`, not native shell tools.**
+ Do not use `experiment` when:
+
+ - the baseline gate is unresolved
+ - the idea stage still has unresolved tradeoffs
+ - the main need is writing or follow-up analysis rather than a main run
+ - the real problem is still route choice, baseline recovery, or open-ended optimization rather than one bounded measured run
 
- ## Stage purpose
+ ## One-sentence summary
 
- The experiment stage should turn a selected idea into auditable evidence.
- It should preserve the strongest old experiment-planning and execution discipline:
+ Turn one selected route into one trustworthy measured result with the smallest valid amount of execution, then record and route from the evidence.
 
- - define the run contract before execution
- - keep the run comparable to baseline
- - capture configs, commands, logs, and metrics
- - report both success and failure honestly
- - route the next action through an explicit decision
+ ## Quick workflow
 
- The experiment stage is not just "run code".
- It is the stage that converts an idea contract into evidence that other stages can trust.
67
- It is also the stage that should decide the next route once the measured result exists.
68
- Within the user's explicit constraints, maximize valid evidence per unit time and compute.
69
- Prefer equivalence-preserving efficiency upgrades first: larger safe batch size, mixed precision, gradient accumulation, dataloader workers, cache reuse, checkpoint resume, precomputed features, and smaller pilots.
70
- If a proposed efficiency change alters optimization dynamics, effective budget, or baseline comparability, treat it as a real experiment change and record it as such.
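The equivalence test in the paragraph above can be made concrete. This is a minimal sketch. It assumes that "equivalence-preserving" means the effective batch size and the total number of examples seen both stay fixed. That is an interpretation, not a rule stated in the skill. `TrainingBudget`, its fields, and `is_equivalence_preserving` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingBudget:
    """Hypothetical summary of the settings that shape optimization dynamics."""

    micro_batch_size: int
    grad_accum_steps: int
    num_devices: int
    optimizer_steps: int

    @property
    def effective_batch_size(self) -> int:
        # Examples contributing to one optimizer update.
        return self.micro_batch_size * self.grad_accum_steps * self.num_devices

    @property
    def examples_seen(self) -> int:
        # Total training budget expressed in examples.
        return self.effective_batch_size * self.optimizer_steps


def is_equivalence_preserving(before: TrainingBudget, after: TrainingBudget) -> bool:
    """True only if the change keeps effective batch and total budget unchanged.

    Anything else should be recorded as a real experiment change.
    """
    return (
        before.effective_batch_size == after.effective_batch_size
        and before.examples_seen == after.examples_seen
    )
```

In this sketch, trading four accumulation steps for a four-times-larger micro batch passes the check. Enlarging the micro batch without lowering accumulation fails it, because the effective batch changes.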
34
+ - Recover the selected idea, accepted baseline, metric contract, and current workspace before implementation.
35
+ - Keep the selected idea summarized in `1-2` sentences, then write a minimal code-change map before touching broad code.
36
+ - Define the null hypothesis, alternative hypothesis, research question, research type, research objective, experimental setup, experimental results, experimental analysis, and experimental conclusions as the run matures.
37
+ - Run only the checks needed to maximize valid evidence per unit time and compute.
38
+ - Use equivalence-preserving efficiency upgrades when they keep baseline comparability intact; for `comparison_ready`, `verify-local-existing`, attach, or import should usually beat full reproduction.
39
+ - If an efficiency change affects baseline comparability, treat it as a real experiment change.
40
+ - Prefer one clean implementation pass and one real run over repeated half-runs when the route is already concrete.
41
+ - Implement according to the current `PLAN.md`; revise the plan before changing the route.
43
+ - Extra metrics are allowed, but missing required metrics are not.
45
+ - If a useful non-canonical metric appears, record it as supplementary output rather than replacing the canonical comparator.
46
+ - In algorithm-first work, `experiment` is the execution surface of `optimize`; results then return to `optimize` or `decision` for frontier review.
47
+ - End with a concise `1-2` sentence outcome summary, `evaluation_summary`, `claim_update`, `baseline_relation`, `failure_mode`, and `next_action`.
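The metric rules in the list above say that extra metrics are allowed but missing required metrics are not, and that non-canonical metrics stay supplementary. Those rules amount to a simple partition. This is a minimal sketch. It assumes the required keys have already been read (for example from `active_baseline_metric_contract_json`) into a plain collection of strings. `split_metrics` and `MissingRequiredMetrics` are hypothetical names.

```python
from __future__ import annotations

from typing import Iterable, Mapping


class MissingRequiredMetrics(ValueError):
    """Raised when a submission omits a required baseline metric key."""


def split_metrics(
    submitted: Mapping[str, float], required_keys: Iterable[str]
) -> tuple[dict[str, float], dict[str, float]]:
    """Partition submitted metrics into canonical and supplementary groups."""
    required = set(required_keys)
    missing = sorted(required.difference(submitted))
    if missing:
        # A missing required metric blocks the submission outright.
        raise MissingRequiredMetrics(f"missing required metrics: {missing}")
    canonical = {key: submitted[key] for key in sorted(required)}
    # Extra metrics are kept, but only as supplementary output.
    supplementary = {k: v for k, v in submitted.items() if k not in required}
    return canonical, supplementary
```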
71
48
 
72
- Use `references/evidence-ladder.md` when deciding whether the current package is merely executable, solid enough to carry the main claim, or already in the stage where broader polish is justified.
49
+ ## Required plan and checklist
73
50
 
74
- Completing one main run is not quest completion.
75
- After reporting the run, keep moving to iterate, analyze, write, or finalize unless a genuine blocking decision remains.
51
+ Use `PLAN.md` and `CHECKLIST.md` when the run is non-trivial, expensive, or branch/worktree-sensitive.
52
+ If the plan or checklist is stale, revise `PLAN.md` before spending more code or compute.
53
+ Keep a rolling, durable run log that captures command ids, output paths, metric changes, and blocker changes.
76
54
 
77
- When the quest is algorithm-first, treat `experiment` as the execution surface of `optimize`, not as the terminal goal of the workflow.
78
- After a measured result, the default next move is frontier review and optimize-side route selection rather than paper packaging.
55
+ The planning surface should cover:
79
56
 
80
- ## Quick workflow
57
+ - selected idea summarized in `1-2` sentences
58
+ - minimal code-change map
59
+ - experiment tier: `auxiliary/dev` or `main/test`
60
+ - minimum -> solid -> maximum evidence target
61
+ - significance-testing plan when statistical claims are likely
62
+ - `references/main-experiment-plan-template.md`
63
+ - `references/main-experiment-checklist-template.md`
64
+
65
+ Incremental-recording rule: record the run contract early, update it as evidence arrives, and do not wait until the end to reconstruct what happened.
66
+
67
+ ## Control workflow
68
+
69
+ 1. Lock the run contract.
70
+ Make explicit the research question, baseline reference, dataset/split, metric keys, stop condition, abandonment condition, and expected outputs.
71
+ 2. Implement only the minimum hypothesis-bound change.
72
+ Keep the baseline read-only and avoid unrelated cleanup or hidden scope expansion.
73
+ 3. Run a bounded smoke or pilot only when the command path, output schema, or evaluator wiring are still unverified.
74
+ 4. Execute and monitor the real run honestly.
75
+ Preserve commands, configs, logs, outputs, comparability, and the last-known-good state.
76
+ 5. Validate and record the result.
77
+ Check metric completeness and comparability, then call `artifact.record_main_experiment(...)` and choose the next route.
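Step 1 of the control workflow ("lock the run contract") maps naturally onto an immutable record. This is a minimal sketch. It assumes that an empty string or empty tuple means "not yet decided". `RunContract` and `unresolved_fields` are hypothetical names, and the fields follow the list in step 1.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass(frozen=True)  # frozen: a locked contract cannot be mutated in place
class RunContract:
    research_question: str
    baseline_reference: str
    dataset_split: str
    metric_keys: tuple[str, ...]
    stop_condition: str
    abandonment_condition: str
    expected_outputs: tuple[str, ...]

    def unresolved_fields(self) -> list[str]:
        """Names of fields that are still empty and must be settled before running."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]
```

Because the record is frozen, a material contract change has to produce a new `RunContract` rather than silently editing the old one. That fits the rule that contract changes should be recorded durably.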
78
+
79
+ ## AVOID / pitfalls
80
+
81
+ - Do not confuse smoke or pilot success with main evidence.
82
+ - Do not silently change dataset, split, metric definition, evaluator logic, or baseline comparison recipe.
83
+ - Do not retry without a real route, code, command, environment, or evidence change.
84
+ - Do not claim success before durable outputs exist and `artifact.record_main_experiment(...)` succeeds.
85
+ - Do not record a durable main experiment from an idea branch, quest root branch, or paper branch as if that were the final result node.
86
+ - Do not disguise idea search or route revision as a routine rerun.
87
+ - Do not keep rerunning after the next route is already clear.
88
+
89
+ ## Constraints
90
+
91
+ - All smoke tests, real runs, shell, CLI, Python, bash, node, git, npm, uv, and environment work must go through `bash_exec(...)`.
92
+ - For git work inside the current quest repository or worktree, prefer `artifact.git(...)` before raw shell git commands.
93
+ - Keep the accepted baseline reference read-only.
94
+ - If `active_baseline_metric_contract_json` exists, required baseline metric keys must still be covered unless a concrete deviation is durably recorded.
95
+ - Durable main experiments should land on their own `run/*` branch or an equivalent isolated run surface.
96
+ - If an active paper line or selected outline already exists, a recorded main experiment should be synchronized into the current paper contract instead of living only as a run artifact.
97
+ - In algorithm-first work, after each main run, return to `optimize` or `decision` for frontier review before launching another large run.
98
+ - Main-run evidence is not complete until `artifact.record_main_experiment(...)` succeeds.
99
+
100
+ ## Validation
81
101
 
82
- Treat this as the short run-order summary. The detailed run contract, execution rules, and recording rules remain in `Workflow`.
102
+ Before `experiment` can end, all applicable checks should be true:
83
103
 
84
- 1. Restate the selected idea in `1-2` sentences and confirm the baseline comparison contract.
85
- 2. Before substantial code edits or the real main run, create `PLAN.md` and `CHECKLIST.md`.
86
- 3. Materialize or confirm a dedicated child `run/*` branch/worktree for this main experiment line; one durable main experiment should map to one run branch and one Canvas node.
87
- 4. Use `PLAN.md` to lock the concrete run path, and use `CHECKLIST.md` as the living control surface while planning, implementing, pilot testing, running, and validating.
88
- 5. Run a bounded smoke test or pilot before the real long run, then launch the real run with durable logging and monitor it through `bash_exec`.
89
- 6. Once the route is concrete, prefer one clean implementation pass, one bounded smoke or pilot run, and then one normal main run; retry only after a concrete failure, invalidity, or genuinely new evidence justifies another attempt.
90
- 7. Revise the plan if implementation, comparability, runtime, or route assumptions change materially, and close each real main-run milestone with a concise `1-2` sentence summary that says what was tested, whether performance improved / worsened / stayed mixed, and the exact next action.
104
+ - outputs correspond to the intended code and config
105
+ - required metric keys are present and finite
106
+ - baseline comparison is still comparable, or the deviation is explicit
107
+ - the claim is classified as `supported`, `refuted`, or `inconclusive`
108
+ - the run manifest includes exact command, config, seed, and environment snapshot
109
+ - `evaluation_summary` exists with the six stable fields the next stage needs
110
+ - if a paper line is active, the run is visible through the current paper contract rows rather than only through the run artifact
111
+ - `artifact.record_main_experiment(...)` succeeded
112
+ - the next route is explicit
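Two of the validation items above can be checked mechanically: "required metric keys are present and finite" and "the run manifest includes exact command, config, seed, and environment snapshot". This is a minimal sketch. It assumes metrics and the manifest are plain dictionaries, and the manifest key names in `MANIFEST_FIELDS` are hypothetical. The remaining items (claim classification, paper-contract sync, next route) still need judgment.

```python
import math
from typing import Any, Iterable, Mapping

# Hypothetical manifest key names; adapt to the real run_manifest.json schema.
MANIFEST_FIELDS = ("command", "config", "seed", "environment")


def validation_problems(
    metrics: Mapping[str, Any],
    required_keys: Iterable[str],
    manifest: Mapping[str, Any],
) -> list[str]:
    """Return human-readable problems; an empty list means these checks pass."""
    problems = []
    for key in sorted(set(required_keys)):
        if key not in metrics:
            problems.append(f"missing metric: {key}")
            continue
        value = metrics[key]
        # bool is a subclass of int, so exclude it explicitly.
        numeric = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not numeric or not math.isfinite(value):
            problems.append(f"non-finite or non-numeric metric: {key}={value!r}")
    for name in MANIFEST_FIELDS:
        # A seed of 0 is valid, so only None and "" count as absent.
        if manifest.get(name) is None or manifest.get(name) == "":
            problems.append(f"manifest missing: {name}")
    return problems
```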
113
+
114
+ ## Interaction discipline
115
+
116
+ Follow the shared interaction contract injected by the system prompt.
117
+ Keep run updates brief unless the measured result, blocker state, or next route changed materially.
118
+ For ordinary active work, prefer a concise progress update once work has crossed roughly 6 tool calls with a human-meaningful delta, and do not drift beyond roughly 12 tool calls or about 8 minutes without a user-visible update.
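The update cadence above combines a soft trigger with a hard ceiling. The soft trigger is about 6 tool calls plus a meaningful delta. The hard ceiling is about 12 tool calls or about 8 minutes. This is a minimal sketch that treats the approximate thresholds as exact constants, which is an assumption. `should_send_update` is a hypothetical helper.

```python
SOFT_TOOL_CALLS = 6  # "roughly 6 tool calls" treated as an exact threshold
HARD_TOOL_CALLS = 12  # "roughly 12 tool calls"
HARD_SECONDS = 8 * 60  # "about 8 minutes"


def should_send_update(
    tool_calls_since_update: int,
    seconds_since_update: float,
    has_meaningful_delta: bool,
) -> bool:
    """Decide whether a user-visible progress update is due."""
    # Hard ceiling: never drift past it silently, even without a new delta.
    if tool_calls_since_update >= HARD_TOOL_CALLS or seconds_since_update >= HARD_SECONDS:
        return True
    # Soft trigger: enough work done and something worth telling the user.
    return has_meaningful_delta and tool_calls_since_update >= SOFT_TOOL_CALLS
```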
119
+
120
+ ## Tool discipline
121
+
122
+ - **Do not use native `shell_command` / `command_execution` in this skill.**
123
+ - **All smoke tests, real runs, shell, CLI, Python, bash, node, git, npm, uv, and environment work must go through `bash_exec(...)`.**
124
+ - **For git work inside the current quest repository or worktree, prefer `artifact.git(...)` before raw shell git commands.**
125
+ - **If a scratch repository or isolated test environment is needed, create and drive it through `bash_exec(...)`, not native shell tools.**
91
126
 
92
127
  ## Non-negotiable rules
93
128
 
@@ -99,78 +134,7 @@ Treat this as the short run-order summary. The detailed run contract, execution
99
134
  - Implement the claimed mechanism, not a convenient shortcut that changes the theory.
100
135
  - Keep the baseline reference read-only.
101
136
  - Avoid asking the user to fix the environment unless there is no credible agent-side path left.
102
- - Do not record a durable main experiment from an idea branch, quest root branch, or paper branch as if that were the final result node; every durable main experiment should land on its own `run/*` branch.
103
- - After each `artifact.record_main_experiment(...)`, route from the measured result:
104
- - if paper mode is enabled, decide whether to strengthen evidence, analyze, or write
105
- - if paper mode is disabled, prefer iterate / revise-idea / branch over default writing
106
- - In algorithm-first work, after each main run, return to `optimize` or `decision` for frontier review before launching another large run.
107
-
108
- ## Experiment mental guardrails
109
-
110
- - Baseline reproduction is not wasted time; untrusted comparison is wasted time.
111
- - Failed runs are still data when the delta and diagnosis are recorded clearly.
112
- - Suspiciously good results deserve the same skepticism as obvious failures.
113
- - Change less, learn more.
114
- - If a retry does not add new evidence, it is budget burn rather than progress.
115
-
116
- ## Use when
117
-
118
- - a baseline is accepted
119
- - an idea has been selected
120
- - the evaluation contract is explicit
121
- - the quest is ready for implementation and measurement
122
-
123
- ## Do not use when
124
-
125
- - the baseline gate is unresolved
126
- - the idea stage still has unresolved tradeoffs
127
- - the main need is writing or follow-up analysis rather than a main run
128
-
129
- ## Preconditions and gate
130
-
131
- Before a main run starts, confirm:
132
-
133
- - selected idea or hypothesis
134
- - baseline reference
135
- - dataset and split
136
- - primary metric
137
- - stop condition
138
- - resource budget
139
- - dedicated `run/*` target branch or isolated worktree for this exact main experiment
140
- - exact output location
141
- - required metric keys for acceptance
142
- - minimal experiment and abandonment condition from the idea stage
143
-
144
- If any of these are materially unknown, stop and resolve them through `decision`.
145
-
146
- ## Required plan and checklist
147
-
148
- Before substantial implementation work or a real main run, create a quest-visible `PLAN.md` and `CHECKLIST.md`.
149
-
150
- - Use `references/main-experiment-plan-template.md` as the canonical structure for `PLAN.md`.
151
- - Use `references/main-experiment-checklist-template.md` as the canonical structure for `CHECKLIST.md`.
152
- - `PLAN.md` should lead with the selected idea summarized in `1-2` sentences, put the user's explicit requirements and non-negotiable constraints first, and then make the run contract concrete: baseline and comparability rules, safe efficiency levers, code touchpoints, minimal code-change map, smoke / pilot path, full-run path, fallback options, monitoring and sleep rules, expected outputs, and a revision log.
153
- - `CHECKLIST.md` is the living execution list; update it during planning, implementation, smoke testing, main execution, validation, and every material route change.
154
- - If the code path, comparability contract, runtime strategy, or execution route changes materially, revise `PLAN.md` before spending more code or compute.
155
- - The later `RUN.md`, `summary.md`, and artifact payloads remain required outputs, but `PLAN.md` and `CHECKLIST.md` are the canonical planning-and-control surface before and during execution.
156
- - Once `PLAN.md` makes the implementation route concrete, do not keep reshaping code and commands speculatively. The normal default is one bounded smoke or pilot run and then one real run, with retries only after a documented failure, invalidity, or new evidence that changes the expected outcome.
157
-
158
- ## Working-boundary rules
159
-
160
- Only modify the active quest workspace for this experiment line.
161
-
162
- - treat the accepted baseline workspace as read-only
163
- - do not derive branch or worktree assumptions from guesswork
164
- - keep all durable outputs inside the quest
165
- - if the runtime gives an explicit worktree path, use it exactly
166
-
167
- ## Resource and environment rules
168
-
169
- - Follow the explicit resource assignment if one exists.
170
- - If GPU assignment is explicit, respect it exactly and record it in the run manifest.
171
- - Do not silently consume extra GPUs or broaden resource scope.
172
- - Capture enough environment information that the run can later be reconstructed.
173
- - If a new dependency appears necessary, record it as a risk and prefer a fallback if possible.
137
+ - After each `artifact.record_main_experiment(...)`, route from the measured result instead of stopping at "run finished".
174
138
 
175
139
  ## Truth sources
176
140
 
@@ -193,424 +157,45 @@ Do not claim run success without durable outputs.
193
157
  A meaningful experiment pass should leave behind:
194
158
 
195
159
  - a run directory under `artifacts/experiment/<run_id>/` or the quest-equivalent canonical location
196
- - `artifact_manifest.json`
197
- - `run_manifest.json`
198
- - `metrics.json`
199
- - `metrics.md`
200
- - `summary.md`
201
- - `runlog.summary.md`
202
160
  - durable command, config, and log pointers
203
161
  - exported shell log, typically `bash.log`
204
162
  - a run artifact with explicit deltas versus baseline
205
163
  - a decision about what should happen next
206
164
 
207
- Recommended additional files:
208
-
209
- - `claim_validation.md`
210
- - environment snapshot files such as:
211
- - Python version
212
- - package freeze
213
- - GPU info when applicable
214
- - a live execution note or rolling run log when the experiment spans multiple implementation or execution steps
215
-
216
- `run_manifest.json` should capture at least:
217
-
218
- - `run_id`
219
- - quest / branch context
220
- - baseline reference or commit
221
- - full commands
222
- - config paths and key resolved hyperparameters
223
- - dataset identifier or version
224
- - seeds
225
- - environment snapshot paths
226
- - start time, end time, and final status
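The manifest fields listed above can be written together with the environment snapshot files. A missing capture tool is recorded as a gap, not skipped silently. This is a minimal sketch using only the Python standard library. It assumes the caller passes run-specific values (run id, branch, commands, seeds, and so on) in a plain dictionary. `write_run_manifest` and the snapshot file names are hypothetical.

```python
import json
import platform
import shutil
import subprocess
import sys
from datetime import datetime, timezone
from importlib import metadata
from pathlib import Path


def _dist_label(dist: metadata.Distribution) -> str:
    name = dist.metadata["Name"] if "Name" in dist.metadata else "unknown"
    return f"{name}=={dist.version}"


def write_run_manifest(run_dir: Path, fields: dict, status: str) -> Path:
    """Write run_manifest.json plus environment snapshot files into run_dir.

    `fields` should carry run_id, branch, baseline_ref, commands, config_paths,
    hyperparameters, dataset_id, seeds and started_at (hypothetical key names).
    """
    run_dir.mkdir(parents=True, exist_ok=True)
    snapshots, gaps = [], []

    # Python version and platform.
    py_path = run_dir / "python_version.txt"
    py_path.write_text(f"{sys.version}\n{platform.platform()}\n")
    snapshots.append(py_path.name)

    # Package freeze via importlib.metadata, without shelling out to pip.
    freeze_path = run_dir / "package_freeze.txt"
    freeze = sorted(_dist_label(d) for d in metadata.distributions())
    freeze_path.write_text("\n".join(freeze) + "\n")
    snapshots.append(freeze_path.name)

    # GPU info only when nvidia-smi exists; otherwise record the gap.
    if shutil.which("nvidia-smi"):
        gpu = subprocess.run(["nvidia-smi"], capture_output=True, text=True, check=False)
        gpu_path = run_dir / "gpu_info.txt"
        gpu_path.write_text(gpu.stdout)
        snapshots.append(gpu_path.name)
    else:
        gaps.append("nvidia-smi unavailable; GPU info not captured")

    manifest = dict(fields)
    manifest.update(
        environment_snapshot_paths=snapshots,
        environment_gaps=gaps,
        ended_at=datetime.now(timezone.utc).isoformat(),
        final_status=status,
    )
    out = run_dir / "run_manifest.json"
    # default=str serializes values such as datetime objects in `fields`.
    out.write_text(json.dumps(manifest, indent=2, sort_keys=True, default=str))
    return out
```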
165
+ For the exact run-manifest fields, checklist template, and detailed recording contract, use the references listed below.
227
166
 
228
- If a command needed for environment capture is unavailable, record that gap in the manifest and summary.
167
+ ## Evidence ladder note
229
168
 
230
- ## Workflow
231
-
232
- ### 1. Define the run contract
233
-
234
- Before implementation or execution, state:
235
-
236
- - `run_id`
237
- - experiment tier: `auxiliary/dev` or `main/test`
238
- - research question
239
- - null hypothesis
240
- - alternative hypothesis
241
- - hypothesis
242
- - baseline id or variant
243
- - metric targets
244
- - expected changed files
245
- - expected outputs
246
- - stop condition
247
- - compute or runtime budget
248
- - minimal experiment
249
- - abandonment condition
250
- - strongest alternative hypothesis
251
- - exact metric keys that will decide success or failure
252
-
253
- Prefer to write this contract first in `PLAN.md` using `references/main-experiment-plan-template.md`, then keep the current execution state visible in `CHECKLIST.md` using `references/main-experiment-checklist-template.md`.
254
-
255
- For substantial runs, also record the following seven experiment fields early and keep them updated during execution:
256
-
257
- 1. research question
258
- 2. research type
259
- 3. research objective
260
- 4. experimental setup
261
- 5. experimental results
262
- 6. experimental analysis
263
- 7. experimental conclusions
264
-
265
- If the run contract changes materially later, record the change durably.
266
-
267
- Treat the run contract as a research question contract, not only an execution checklist.
268
- Before coding, be able to explain:
269
-
270
- - why this run is the best current route rather than the main alternatives
271
- - what observation would count as a real answer to the research question
272
- - what result would force a downgrade, retry, or route change
273
- - what confounder would make the run non-comparable even if it finishes successfully
274
-
275
- If multiple candidate experiment packages exist, prefer the one with the best balance of:
276
-
277
- - technical feasibility
278
- - research importance
279
- - methodological rigor
280
-
281
- Do not choose a package only because it sounds ambitious.
282
-
283
- For paper-facing lines, default to this evidence ladder:
169
+ Use `references/evidence-ladder.md` when deciding whether the current package is merely executable, solid enough to carry the main claim, or already in the stage where broader polish is justified.
284
170
 
285
- - `auxiliary/dev`
286
- - clarify parameters, settings, mechanisms, or diagnostics
287
- - `main/test`
288
- - carry the core comparison the paper will rely on
289
- - `minimum -> solid -> maximum`
290
- - first make the result executable and comparable
291
- - then make it strong enough to carry the claim
292
- - only then spend effort on broader supporting polish
171
+ The default ladder is:
293
172
 
294
- ### 2. Run a preflight check
173
+ - `minimum`: executable and comparable
174
+ - `solid`: strong enough to carry the main claim
175
+ - `maximum`: broader supporting polish after the main claim is already credible
295
176
 
296
- Before editing or executing:
177
+ Do not spend effort on `maximum` before the line is at least `solid`.
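The ladder above is an ordered gate. This is a minimal sketch. It assumes work should target at most the next rung, which is a slight generalization of the stated rule; the stated rule only forbids `maximum` before `solid`. `EvidenceLevel` and `may_work_toward` are hypothetical names.

```python
from enum import IntEnum


class EvidenceLevel(IntEnum):
    NONE = 0
    MINIMUM = 1  # executable and comparable
    SOLID = 2  # strong enough to carry the main claim
    MAXIMUM = 3  # broader supporting polish


def may_work_toward(current: EvidenceLevel, target: EvidenceLevel) -> bool:
    """Allow effort only on the next rung, so MAXIMUM requires SOLID first."""
    return target <= current + 1
```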
297
178
 
298
- - confirm the dataset path, version, and split contract
299
- - confirm the baseline metrics reference
300
- - if durable state exposes `active_baseline_metric_contract_json`, read that JSON file before planning commands or comparisons
301
- - treat `active_baseline_metric_contract_json` as the default authoritative baseline comparison contract unless you record a concrete reason to override it
302
- - confirm the selected idea claim and code-level plan
303
- - look up prior incidents or repeated failure patterns when available
304
- - confirm output directories and naming
305
- - confirm that the intended run still matches the current quest decision
179
+ ## Planning note
306
180
 
307
- If a repeated failure pattern already exists, apply the mitigation first and record that choice.
181
+ Use quest or workspace planning files only when they help control a non-trivial run.
182
+ Otherwise keep the run contract small and move to the first decisive execution step.
308
183
 
309
- Also confirm before comparison work:
310
-
311
- - the baseline verification is trustworthy enough
312
- - the planned comparison still uses the same metric contract
313
- - the metric keys and primary metric still match `active_baseline_metric_contract_json` when that file is available
314
- - every main experiment submission still covers all required baseline metric ids from `active_baseline_metric_contract_json`; extra metrics are allowed, but missing required metrics are not
315
- - the required baseline metrics still use the same evaluation code and metric definitions; if an extra evaluator is genuinely necessary, record it as supplementary output rather than replacing the canonical comparator
316
- - if the run is `main/test` and superiority is likely to be claimed, define the significance-testing plan before execution rather than after seeing the numbers
317
- - if `Result/metric.md` was used during the run, treat it as optional scratch memory only and reconcile it against the final submitted metrics before `artifact.record_main_experiment(...)`
318
-
319
- Before you begin a substantial run, send a concise threaded `artifact.interact(kind='progress', ...)` update naming:
184
+ ## Operational guidance
320
185
 
321
- - the run contract you are about to execute
322
- - the main evidence it is testing
323
- - the expected durable outputs
324
- - the next checkpoint for reporting back
186
+ This skill keeps the control surface up front.
187
+ For the longer operational notes, read the references:
325
188
 
326
- ### 2.1 Diagnostic mode trigger
189
+ - `references/main-experiment-plan-template.md`
190
+ - `references/main-experiment-checklist-template.md`
191
+ - `references/execution-playbook.md`
192
+ - `references/operational-guidance.md`
327
193
 
328
- Switch from ordinary execution mode into diagnosis mode when any of the following becomes true:
329
-
330
- - two retries in a row add no new evidence or no interpretable delta
331
- - the baseline gap is much larger than expected and the cause is unclear
332
- - the metrics are suspiciously strong, suspiciously identical to baseline, or highly unstable
333
- - logs, checkpoints, or intermediate outputs conflict with the claimed behavior
334
-
335
- In diagnosis mode:
336
-
337
- - stop brute-force retrying
338
- - prefer the smallest discriminative test that can separate competing hypotheses
339
- - resolve obvious environment or data-contract issues before launching another comparison run
340
- - make the diagnosis goal explicit: explain the behavior, not just "try something else"
341
-
342
- ### 3. Confirm the execution workspace
343
-
344
- The normal experiment workspace is the current active idea worktree returned by `artifact.submit_idea(...)`.
345
-
346
- - do not create a fresh manual branch for the main experiment unless recovery or debugging truly requires it
347
- - implement and run inside the current active idea workspace
348
- - if the idea package changes materially before execution, submit a new durable idea branch with `artifact.submit_idea(mode='create', lineage_intent='continue_line'|'branch_alternative', ...)` instead of silently mutating the old node
349
- - after a real main run finishes, record it with `artifact.record_main_experiment(...)` before moving to analysis or writing
350
- - once that durable main result exists, treat the branch as a fixed round node; a later new optimization round should usually compare foundations and create a new `continue_line` child branch or `branch_alternative` sibling-like branch
351
- - after `artifact.record_main_experiment(...)`, if QQ milestone media is enabled and the metrics are stable enough to summarize honestly, prefer one concise summary PNG over multiple attachments
352
-
353
- ### 4. Implement the minimum required change
354
-
355
- Implementation rules:
356
-
357
- - keep the change hypothesis-bound
358
- - prefer small, explainable edits
359
- - avoid unrelated cleanup during a main run
360
- - record which files matter for later review
361
- - preserve theory fidelity between the idea claim and the code change
362
- - add robustness checks when the mechanism risks NaN, inf, or unstable behavior
363
- - implement according to the current `PLAN.md` instead of repeatedly improvising a new method after each small observation
364
- - avoid repeated code churn between the smoke test and the real run unless the smoke test exposes a specific problem that the next change is meant to fix
365
-
366
- Prefer to complete one experiment cleanly before expanding to the next, unless parallel execution is explicitly justified and isolated.
367
- For substantial experiment packages, the default is one experiment at a time, with each one reaching a recoverable recorded state before the next begins.
194
+ Use them when:
368
195
 
369
- Retry-delta discipline:
370
-
371
- - unless the current state is completely non-executable, change only one major variable per retry
372
- - if broader recovery is unavoidable, record exactly which layer changed: data, preprocessing, model, objective, optimization, evaluation, or environment
373
- - before each retry, state the expected effect and the fastest falsification signal
374
- - if the retry produced no interpretable delta, do not treat it as meaningful evidence about the underlying research hypothesis
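The retry-delta discipline above can be enforced by comparing the previous and current attempt layer by layer. This is a minimal sketch. It assumes each attempt is described by a dictionary keyed by the seven layers named in the list. `check_retry` is a hypothetical helper.

```python
from __future__ import annotations

LAYERS = ("data", "preprocessing", "model", "objective", "optimization", "evaluation", "environment")


def changed_layers(previous: dict, current: dict) -> list[str]:
    """Layers whose description differs between two attempts, in canonical order."""
    return [layer for layer in LAYERS if previous.get(layer) != current.get(layer)]


def check_retry(previous: dict, current: dict, non_executable: bool = False) -> list[str]:
    """Return the changed layers to record; reject multi-layer retries.

    Broader changes are allowed only when the previous state was completely
    non-executable, and the returned list should then be recorded explicitly.
    """
    changed = changed_layers(previous, current)
    if len(changed) > 1 and not non_executable:
        raise ValueError(f"retry changes more than one major layer: {changed}")
    return changed
```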
375
-
376
- ### 5. Execute the run
377
-
378
- Run with auditable commands and durable outputs.
379
-
380
- Execution rules:
381
-
382
- - use non-interactive commands
383
- - prefer `bash_exec` instead of ephemeral shell invocations
384
- - use the intended dataset and split
385
- - keep logs durable
386
- - report progress for long runs
387
- - avoid silent metric-definition changes
388
- - do not drift away from `active_baseline_metric_contract_json` silently when that file exists
389
- - avoid silently changing the baseline comparison recipe
390
- - run the full agreed evaluation, not only a smoke test
391
-
392
- You may do a quick sanity run first, but if the stage goal is a real experiment you must continue to the real evaluation unless the run is blocked and recorded.
393
-
394
- Pilot-before-scale rule:
395
-
396
- - start with a bounded pilot when the modification is non-trivial
397
- - use the pilot to catch implementation mistakes early
398
- - record pilot outcomes explicitly
399
- - do not mistake pilot success for final evidence
400
-
401
- Incremental-recording rule:
402
-
403
- - do not wait until the end to reconstruct the run from memory
404
- - update the durable run note after:
405
- - contract definition
406
- - important code changes
407
- - pilot validation
408
- - full execution checkpoints
409
- - post-run analysis
410
- - update `CHECKLIST.md` alongside those durable notes so the current execution frontier is obvious without replaying the whole log
411
- - include timestamps when they materially help reconstruction
412
- - preserve failed attempts, anomalies, and partial outcomes rather than overwriting them
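The incremental-recording rule asks for an append-only history: timestamped entries, with failed attempts preserved rather than overwritten. This is a minimal sketch. It assumes a JSON Lines file as the durable run note. `append_run_note` and the phase names are hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical phase labels mirroring the update points listed above.
PHASES = ("contract", "code_change", "pilot", "checkpoint", "post_run")


def append_run_note(note_path: Path, phase: str, message: str, outcome: str = "ok") -> None:
    """Append one timestamped entry; earlier entries are never rewritten."""
    if phase not in PHASES:
        raise ValueError(f"unknown phase {phase!r}; expected one of {PHASES}")
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "phase": phase,
        "outcome": outcome,
        "message": message,
    }
    note_path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" appends, so failed attempts and anomalies stay in the history.
    with note_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
```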
413
-
414
- Last-known-good rule:
415
-
416
- - keep track of the most recent state that was executable, comparable, and explainable
417
- - when a new attempt breaks that state, debug forward from the last-known-good point instead of stacking more speculative edits on top of the broken state
418
- - if the last-known-good state is unclear, reconstruct it before spending more budget on new hypotheses
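The last-known-good rule defines "good" as executable, comparable, and explainable at once. A tracker for it only needs to scan attempts newest-first. This is a minimal sketch. It assumes each attempt is labeled by something like a commit hash or config id. `Attempt` and `AttemptLog` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Attempt:
    label: str  # e.g. a commit hash or config id (hypothetical)
    executable: bool
    comparable: bool
    explainable: bool

    @property
    def good(self) -> bool:
        return self.executable and self.comparable and self.explainable


@dataclass
class AttemptLog:
    attempts: list[Attempt] = field(default_factory=list)

    def record(self, attempt: Attempt) -> None:
        self.attempts.append(attempt)

    def last_known_good(self) -> Attempt | None:
        """Most recent attempt meeting all three criteria, or None if unclear."""
        for attempt in reversed(self.attempts):
            if attempt.good:
                return attempt
        return None
```

A `None` result corresponds to the last bullet above: reconstruct the last-known-good state before spending more budget.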
419
-
420
- ### 5.1 Long-running command protocol
421
-
422
- For commands that may run longer than a few minutes:
423
-
424
- - before the real long run, execute a bounded smoke test or pilot that validates command paths, outputs, and basic metrics
- - once the smoke test passes, launch the real run with `bash_exec(mode='detach', ...)` and normally leave `timeout_seconds` unset for that long run
- - monitor through durable logs rather than only live terminal output
- - `bash_exec(mode='read', id=...)` returns the full rendered log when it is 2000 lines or fewer; for longer logs it returns the first 500 lines plus the last 1500 lines and a hint to inspect omitted sections with `start` and `tail`
- - if the middle of a long saved log matters, inspect that omitted region with `bash_exec(mode='read', id=..., start=..., tail=...)`
- - use `bash_exec(mode='list')` and `bash_exec(mode='read', id=..., tail_limit=..., order='desc')` to monitor or revisit managed commands while focusing on the newest evidence first
- - after the first read, prefer `bash_exec(mode='read', id=..., after_seq=last_seen_seq, tail_limit=..., order='asc')` so later checks only fetch new evidence
- - if you need to recover ids or sanity-check the active session ordering, use `bash_exec(mode='history')`
- - launch important runs with a structured `comment` such as `{stage, goal, action, expected_signal, next_check}`
- - use `silent_seconds`, `progress_age_seconds`, `signal_age_seconds`, and `watchdog_overdue` from `bash_exec(mode='list'|'read', ...)` as your default watchdog signals
- - use an explicit wait-and-check loop such as:
- - wait about `60s`, then inspect logs
- - wait about `120s`, then inspect logs
- - wait about `300s`, then inspect logs
- - wait about `600s`, then inspect logs
- - wait about `1800s`, then inspect logs
- - then keep checking about every `1800s` while the run is still active
- - if needed, use an explicit bounded wait such as `bash_exec(command='sleep 60', mode='await', timeout_seconds=70)` or `bash_exec(mode='await', id=..., timeout_seconds=...)` between checks
- - canonical sleep choice:
- - if you only need wall-clock waiting between checks, use `bash_exec(command='sleep N', mode='await', timeout_seconds=N+buffer, ...)`
- - keep a real buffer on that sleep timeout; do not set `timeout_seconds` exactly equal to `N`
- - if you are waiting on an already running managed session, prefer `bash_exec(mode='await', id=..., timeout_seconds=...)` instead of starting a new sleep command
- - after every completed sleep / await cycle, inspect logs first; only send `artifact.interact(kind='progress', ...)` when the user-visible state, frontier, blocker status, or ETA materially changed
- - after the first meaningful signal and then at real checkpoints (e.g., completion, recovery, blocker, or a materially widened comparable surface), keep those progress updates going rather than waiting silently
- - if the run is clearly invalid, wedged, or superseded, stop it with `bash_exec(mode='kill', id=..., wait=true, timeout_seconds=...)`; if it must die immediately, add `force=true`, record the reason, fix the issue, and relaunch cleanly
- - do not report completion until logs and output files both confirm completion
-
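The monitoring rules above reduce to a little arithmetic: an escalating check cadence, a sleep timeout that always carries a buffer, and a head/tail window for long logs. The following is a minimal Python sketch of that arithmetic only; it does not call `bash_exec`. The helper names `check_delays`, `sleep_timeout`, and `read_window` are hypothetical, and the 10-second default buffer is an assumption taken from the `sleep 60` / `timeout_seconds=70` example rather than a stated rule.

```python
from itertools import islice
from typing import Iterator, Optional, Tuple

# Escalating wait-and-check cadence from the protocol, in seconds.
ESCALATION = (60, 120, 300, 600, 1800)
STEADY_INTERVAL = 1800  # keep checking about this often while the run is active


def check_delays() -> Iterator[int]:
    """Yield the wait before each successive log inspection (never ends)."""
    yield from ESCALATION
    while True:
        yield STEADY_INTERVAL


def sleep_timeout(n: int, buffer: int = 10) -> int:
    """Timeout for a `sleep n` await; the 10 s default buffer is an assumption."""
    if buffer <= 0:
        raise ValueError("timeout_seconds must be strictly greater than n")
    return n + buffer


def read_window(total_lines: int, limit: int = 2000, head: int = 500,
                tail: int = 1500) -> Tuple[range, range, Optional[range]]:
    """Model which 0-based line indices a default read would show.

    Returns (head_range, tail_range, omitted_range). A log of `limit` lines
    or fewer is shown whole, so the tail range is empty and nothing is omitted.
    """
    if total_lines <= limit:
        return range(total_lines), range(0), None
    return (range(head),
            range(total_lines - tail, total_lines),
            range(head, total_lines - tail))


if __name__ == "__main__":
    print(list(islice(check_delays(), 7)))  # [60, 120, 300, 600, 1800, 1800, 1800]
    print(sleep_timeout(60))                # 70
    _, _, omitted = read_window(5000)
    print(omitted)                          # range(500, 3500)
```

The omitted range is the region the protocol says to inspect explicitly with `start` and `tail`; how those parameters map onto line indices is not specified in the text, so the sketch stops at identifying the gap.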
- Always preserve the managed `bash_exec` log and export it into the experiment artifact directory when the run artifact is written.
-
- ### 5.2 Progress marker protocol
-
- Long loops should emit structured progress markers rather than noisy raw progress bars.
-
- - use single-line JSON progress markers
- - keep them throttled
- - treat them as UI signals, not narrative prose
- - do not paste raw progress lines into summaries
- - when possible include `eta` in seconds and `next_reply_at` or `next_check_at` so web/TUI can show the next expected update
-
- If you control the code, prefer a throttled `tqdm`-style progress reporter for the run itself and pair it with concise structured `__DS_PROGRESS__` lines when feasible so monitoring remains machine-readable.
-
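A throttled emitter for the single-line JSON markers might look like the sketch below. The `__DS_PROGRESS__` prefix comes from the text, but the exact layout (prefix, one space, then one compact JSON object) and the `step` / `total` fields are assumptions; `next_check_at` is written as an epoch timestamp here, which is also an assumption. Adjust these to whatever the monitoring side actually parses.

```python
import json
import time
from typing import Callable, Optional

PREFIX = "__DS_PROGRESS__"  # marker name from the protocol; line layout is assumed


class ProgressMarker:
    """Emit at most one single-line JSON progress marker per interval."""

    def __init__(self, total: int, min_interval: float = 30.0,
                 clock: Callable[[], float] = time.time,
                 sink: Callable[[str], None] = print) -> None:
        self.total = total
        self.min_interval = min_interval
        self.clock = clock
        self.sink = sink
        self.start = clock()
        self.last_emit: Optional[float] = None

    def update(self, step: int, force: bool = False) -> Optional[str]:
        """Emit a marker for `step` unless throttled; return the line or None."""
        now = self.clock()
        if (not force and self.last_emit is not None
                and now - self.last_emit < self.min_interval):
            return None  # throttled: too soon since the previous marker
        elapsed = now - self.start
        # Linear ETA extrapolation in seconds; None until a step has finished.
        eta = round(elapsed / step * (self.total - step)) if step > 0 else None
        payload = {
            "step": step,
            "total": self.total,
            "eta": eta,
            "next_check_at": round(now + self.min_interval),
        }
        line = f"{PREFIX} {json.dumps(payload, separators=(',', ':'))}"
        self.sink(line)
        self.last_emit = now
        return line
```

Injecting `clock` and `sink` keeps the throttling testable without real sleeps, and a `tqdm` bar can run alongside it for human-readable output while these lines stay machine-readable.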
- ### 6. Validate the outputs
-
- After the run, verify:
-
- - outputs correspond to the intended code/config
- - metrics are complete and interpretable
- - comparison to baseline is fair
- - any failure mode or confounder is visible
- - required metric keys are present and finite
- - the result can be mapped back to the original claim
- - the summary states a clear go or no-go recommendation
-
- Create a durable claim-validation record that maps:
-
- - claim
- - metric key
- - expected direction
- - observed result
- - verdict:
- - `supported`
- - `refuted`
- - `inconclusive`
-
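One way to make the claim-validation record mechanical is to derive the verdict from the expected direction, the observed delta against the baseline, and an uncertainty band. This sketch assumes a single scalar metric and treats any delta within a caller-chosen `margin` as `inconclusive`; the `margin` rule, the `validate_claim` helper, and the field names are illustrative assumptions, not part of the artifact API. It also rejects missing or non-finite metric values, matching the "present and finite" check in the list above.

```python
import math
from dataclasses import dataclass
from typing import Dict, Literal

Direction = Literal["higher", "lower"]
Verdict = Literal["supported", "refuted", "inconclusive"]


@dataclass(frozen=True)
class ClaimValidation:
    claim: str
    metric_key: str
    expected_direction: Direction
    observed: float
    baseline: float
    verdict: Verdict


def validate_claim(claim: str, metric_key: str, expected_direction: Direction,
                   run_metrics: Dict[str, float], baseline_metrics: Dict[str, float],
                   margin: float) -> ClaimValidation:
    """Build a claim-validation record; `margin` is an assumed noise band."""
    for name, metrics in (("run", run_metrics), ("baseline", baseline_metrics)):
        value = metrics.get(metric_key)
        if value is None or not math.isfinite(value):
            raise ValueError(f"{name} metric {metric_key!r} is missing or non-finite")
    observed, baseline = run_metrics[metric_key], baseline_metrics[metric_key]
    delta = observed - baseline
    # Positive `signed` means the change went in the expected direction.
    signed = delta if expected_direction == "higher" else -delta
    if abs(delta) <= margin:
        verdict = "inconclusive"
    elif signed > 0:
        verdict = "supported"
    else:
        verdict = "refuted"
    return ClaimValidation(claim, metric_key, expected_direction,
                           observed, baseline, verdict)
```

A fixed margin is the simplest stand-in for "strong enough"; for claim-carrying runs, replacing it with a significance test or confidence interval over repeated seeds would fit the comparability checks that follow better.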
488
- Also verify baseline comparability before claiming deltas:
489
-
490
- - was the baseline verification stable?
491
- - was the evaluation path the same?
492
- - are the compared metric keys identical?
493
- - if the run is claim-carrying, are the significance results or uncertainty estimates strong enough for main-text use?
494
- - do known caveats make the delta weaker than it first appears?
495
-
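Most of the comparability questions can be turned into a pre-flight check that runs before any delta is reported. This is a sketch under assumptions: each side is described by a hypothetical dict with `eval_path`, `metrics`, and (for the baseline) a `baseline_verified` flag. The function only lists problems. It does not judge significance or caveats, which still need a human or a statistical test.

```python
import math
from typing import Any, Dict, List


def comparability_problems(run: Dict[str, Any], baseline: Dict[str, Any]) -> List[str]:
    """Return reasons a run-vs-baseline delta should not be claimed yet."""
    problems: List[str] = []
    if not baseline.get("baseline_verified", False):
        problems.append("baseline verification was not stable")
    if run.get("eval_path") != baseline.get("eval_path"):
        problems.append("evaluation paths differ")
    run_keys = set(run.get("metrics", {}))
    base_keys = set(baseline.get("metrics", {}))
    if run_keys != base_keys:
        # Symmetric difference: keys present on only one side.
        problems.append(f"metric keys differ: {sorted(run_keys ^ base_keys)}")
    for key in sorted(run_keys & base_keys):
        values = (run["metrics"][key], baseline["metrics"][key])
        if not all(isinstance(v, (int, float)) and math.isfinite(v) for v in values):
            problems.append(f"non-finite value for {key}")
    return problems
```

An empty list means only that the mechanical checks passed; the last two questions in the list still have to be answered explicitly.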
496
- ### 7. Record the run
497
-
498
- Every meaningful main run must be recorded through `artifact.record_main_experiment(...)`.
499
-
500
- That call is responsible for writing:
501
-
502
- - `experiments/main/<run_id>/RUN.md`
503
- - `experiments/main/<run_id>/RESULT.json`
504
- - the durable `run` artifact payload
505
- - baseline comparisons
506
- - breakthrough status derived by the system
507
-
508
- `artifact.record_main_experiment(...)` should include at least:
509
-
510
- - `run_id`
511
- - title
512
- - hypothesis
513
- - setup
514
- - execution
515
- - results
516
- - conclusion
517
- - baseline reference
518
- - `metrics_summary`
519
- - `metric_rows` when available
520
- - the metric contract actually used
521
- - verdict
522
- - evidence paths
523
- - changed files
524
- - relevant config paths when applicable
525
- - `evaluation_summary` with exactly these six fields:
526
- - `takeaway`
527
- - `claim_update`
528
- - `baseline_relation`
529
- - `comparability`
530
- - `failure_mode`
531
- - `next_action`
532
-
533
- Use `evaluation_summary` as the short structured judgment layer on top of the longer narrative fields:
534
-
535
- - `takeaway`: one sentence the next reader can reuse directly
536
- - `claim_update`: `strengthens`, `weakens`, `narrows`, or `neutral`
537
- - `baseline_relation`: `better`, `worse`, `mixed`, or `not_comparable`
538
- - `comparability`: `high`, `medium`, or `low`
539
- - `failure_mode`: `none`, `implementation`, `evaluation`, `environment`, or `direction`
540
- - `next_action`: the immediate route such as `continue`, `revise_idea`, `analysis_campaign`, `write`, or `stop`
541
-
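Because `evaluation_summary` has exactly six fields, four of them with closed value sets, it can be checked locally before calling `artifact.record_main_experiment(...)`. This is a minimal sketch, assuming the summary is a plain dict. Treating `next_action` as open-ended (the text says "such as") and requiring `takeaway` to be a non-empty string are this sketch's own interpretations, and `evaluation_summary_errors` is a hypothetical helper, not part of the artifact API.

```python
from typing import Dict, List

ALLOWED = {
    "claim_update": {"strengthens", "weakens", "narrows", "neutral"},
    "baseline_relation": {"better", "worse", "mixed", "not_comparable"},
    "comparability": {"high", "medium", "low"},
    "failure_mode": {"none", "implementation", "evaluation", "environment", "direction"},
}
REQUIRED = {"takeaway", "claim_update", "baseline_relation",
            "comparability", "failure_mode", "next_action"}


def evaluation_summary_errors(summary: Dict[str, str]) -> List[str]:
    """List problems with an evaluation_summary dict; empty means it looks valid."""
    errors: List[str] = []
    keys = set(summary)
    if keys != REQUIRED:
        errors.append(f"missing: {sorted(REQUIRED - keys)}, "
                      f"unexpected: {sorted(keys - REQUIRED)}")
    for field, allowed in ALLOWED.items():
        value = summary.get(field)
        if value is not None and value not in allowed:
            errors.append(f"{field}={value!r} not in {sorted(allowed)}")
    takeaway = summary.get("takeaway", "")
    if not isinstance(takeaway, str) or not takeaway.strip():
        errors.append("takeaway must be a non-empty sentence")
    if not summary.get("next_action"):
        errors.append("next_action must name the immediate route")
    return errors
```

Returning a list instead of raising on the first problem lets one call surface every issue before the record is written.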
542
- After `artifact.record_main_experiment(...)` succeeds, do not assume the same branch should absorb the next round by default.
543
- Interpret the measured result first, then either:
544
-
545
- - launch analysis from this branch, or
546
- - compare candidate foundations and create the next child research branch
547
-
548
- Use `artifact.create_analysis_campaign(...)` only when the extra slices have clear academic or claim-level value relative to their resource cost.
549
- If the main need is simply to continue optimization from a measured result, prefer a new durable child idea branch instead of an expensive analysis package by reflex.
550
- If the extra work should happen on an older durable branch rather than the current head, first switch the runtime back there with `artifact.activate_branch(...)`, then launch the analysis campaign from that activated workspace.
551
-
552
- When `artifact.record_main_experiment(...)` succeeds, send a richer threaded `artifact.interact(kind='milestone', ...)` update rather than a generic one-line progress ping.
553
- Lead that milestone with a concise `1-2` sentence outcome summary before expanding into more detail.
554
- That milestone should state:
555
-
556
- - the research question that was tested
557
- - the primary result and baseline delta
558
- - whether the run supports, weakens, or leaves the idea inconclusive
559
- - the main caveat or confidence note that still matters
560
- - the exact recommended next move
561
-
562
- Do not treat a main run as durably complete until `artifact.record_main_experiment(...)` succeeds.
563
-
564
- Recommended per-run documentation fields:
565
-
566
- 1. research question
567
- 2. research type
568
- 3. research objective
569
- 4. experimental setup
570
- 5. experimental results
571
- 6. experimental analysis
572
- 7. experimental conclusions
573
-
574
- These seven fields should be progressively filled as the run advances, not only at final packaging time.
575
-
576
- `RUN.md` should make it easy for later stages to answer:
577
-
578
- - what changed?
579
- - how can this run be reproduced?
580
- - what are the main results?
581
- - why did it work or fail?
582
- - what should happen next?
583
-
584
- When the run is analysis-heavy or meant to fill a writing evidence gap, prefer a structured summary with:
585
-
586
- 1. research question
587
- 2. research type
588
- 3. objective and success criteria
589
- 4. setup
590
- 5. results
591
- 6. analysis
592
- 7. conclusion
593
-
594
- Recording rules:
595
-
596
- - record results incrementally, not only at the end
597
- - include timestamps when helpful
598
- - include failed attempts, partial runs, and unexpected outcomes
599
- - do not leave placeholder sections for later if the information is already known
600
- - report exactly what happened, not what you hoped would happen
601
-
602
- ### 8. Decide the next move
603
-
604
- The experiment stage should normally end with one of:
605
-
606
- - continue the current line
607
- - branch a new line
608
- - launch an analysis campaign
609
- - move to writing
610
- - reset or stop
611
-
612
- Do not let the stage end without an explicit next direction.
613
- If analysis is selected, record why the expected information gain is strong enough to justify the added compute, time, or annotation budget.
196
+ - the run contract is non-trivial
197
+ - the long-running protocol or monitoring cadence matters
198
+ - the exact manifest, artifact, memory, or charting rules matter
614
199
 
615
200
  ## Run-quality rules
616
201
 
@@ -622,7 +207,7 @@ A credible main run should satisfy:
  - outcome can be explained by the intended intervention or its failure
  - commands, configs, and seeds are reconstructable
  - environment context is reconstructable
- - frontend or later readers can trace code and diff context to command, logs, and metrics
+ - later readers can trace code and diff context to command, logs, and metrics
 
  If the result is confounded, say so directly.
 
@@ -640,108 +225,6 @@ Before marking the run complete, verify all of the following:
 
  If these checks fail, record the run as partial or blocked rather than pretending it is complete.
 
- ## Memory rules
-
- Stage-start requirement:
-
- - begin every experiment pass with `memory.list_recent(scope='quest', limit=5)`
- - then run at least one experiment-relevant `memory.search(...)` before a new run, retry, or material execution change
- - if several idea or experiment lines exist, narrow retrieval to the current `idea_id`, `branch`, and `run_id`; do not casually reuse memory from another idea line unless you are explicitly comparing lines
-
- Write to memory only when the lesson is reusable, such as:
-
- - experiment failure patterns
- - stable implementation lessons
- - evaluation pitfalls
- - validated mechanism scope and caveats
-
- The canonical record of the run itself belongs in `artifact`, not only in memory.
-
- Preferred memory usage:
-
- - quest `ideas`:
- - the current idea contract and claim boundary
- - quest `decisions`:
- - run-scope choices
- - retry or branch decisions
- - stop conditions that must not drift
- - quest `episodes`:
- - failed runs
- - debugging episodes
- - suspicious-result investigations
- - repeated infrastructure or resource failures
- - quest `knowledge`:
- - validated mechanism scope
- - evaluation caveats
- - stable implementation lessons worth reusing in later runs of this quest
- - global `knowledge`:
- - reusable debugging heuristics
- - stable reproducibility lessons
- - cross-quest experiment design playbooks
- - global `templates`:
- - run-manifest patterns
- - claim-validation templates
- - experiment summary templates
-
- Use tags to refine retrieval when helpful, for example:
-
- - `stage:experiment`
- - `type:failure-pattern`
- - `type:metric-contract`
- - `type:claim-validation`
- - `topic:<mechanism>`
-
- When calling `memory.write(...)`, pass `tags` as an array like `["stage:experiment", "type:failure-pattern", "topic:<mechanism>"]`, not as one comma-joined string.
-
- Recommended read timing:
-
- - before the first run:
- - consult quest `ideas`, `decisions`, and relevant `knowledge`
- - before a retry:
- - search quest `episodes` first
- - before changing execution strategy materially:
- - re-check quest `decisions`
- - after suspicious results:
- - consult recent `episodes` and stable debugging `knowledge`
-
- Stage-end requirement:
-
- - successful runs should leave at least one reusable knowledge note if the lesson generalizes
- - failed or partial runs should leave an incident note when the failure pattern is reusable
- - every experiment `memory.write(...)` must state whether the outcome was `success`, `partial`, or `failure`
- - every experiment `memory.write(...)` should also include the current `idea_id`, `branch`, and `run_id` so later retrieval does not mix different experiment lines
-
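The tag and stage-end rules for `memory.write(...)` are easy to get wrong by hand. The sketch below assembles a payload dict that follows them: tags as a list rather than one comma-joined string, an explicit `outcome` of `success`, `partial`, or `failure`, and the `idea_id`, `branch`, and `run_id` scope fields. The payload shape and both helper names are hypothetical; only the rules they enforce come from the text above.

```python
from typing import Dict, Iterable, List, Union

OUTCOMES = ("success", "partial", "failure")


def normalize_tags(tags: Union[str, Iterable[str]]) -> List[str]:
    """Return tags as a list, splitting an accidental comma-joined string."""
    if isinstance(tags, str):
        tags = tags.split(",")
    result: List[str] = []
    for tag in (t.strip() for t in tags):
        if tag and tag not in result:  # drop empties and duplicates, keep order
            result.append(tag)
    return result


def build_memory_payload(content: str, outcome: str, idea_id: str, branch: str,
                         run_id: str, tags: Union[str, Iterable[str]]) -> Dict[str, object]:
    """Assemble a hypothetical memory.write payload that follows the stage-end rules."""
    if outcome not in OUTCOMES:
        raise ValueError(f"outcome must be one of {OUTCOMES}")
    for name, value in (("idea_id", idea_id), ("branch", branch), ("run_id", run_id)):
        if not value:
            raise ValueError(f"{name} is required so retrieval does not mix experiment lines")
    return {
        "content": content,
        "outcome": outcome,
        "idea_id": idea_id,
        "branch": branch,
        "run_id": run_id,
        "tags": normalize_tags(tags),
    }
```

Accepting a comma-joined string and splitting it is a convenience for callers; the payload that leaves the helper always carries a list.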
- ## Artifact rules
-
- Typical artifact sequence:
-
- - progress artifact for long runs
- - `artifact.record_main_experiment(...)` at main-run completion
- - milestone or report artifact for major findings
- - decision artifact to choose next stage
-
- Preferred artifact choices:
-
- - use `progress` for long-running execution updates
- - use `artifact.record_main_experiment(...)` for each meaningful completed main experiment
- - use `run` for analysis slice records when `artifact.record_analysis_slice(...)` writes them
- - use `report` for:
- - analysis-rich summaries
- - suspicious-result investigations
- - post-run interpretation
- - use `milestone` when a major stage checkpoint is reached
- - use `decision` for:
- - continue
- - branch
- - analysis
- - write
- - reset
- - stop
- - use `approval` when an explicit user approval is captured for an expensive or risky run change
-
- Use `artifact.checkpoint(...)` when code evolution is meaningful and should be preserved in Git.
- After a meaningful experiment checkpoint or completion, emit `artifact.interact(kind='progress' | 'milestone', ...)` so the user sees the concrete result and next step.
-
  ## Failure and blocked handling
 
  A failed main run is still useful if it is explained well.
@@ -750,7 +233,7 @@ Record:
 
  - what was attempted
  - where the failure occurred
- - whether the failure is likely methodological or infrastructural
+ - whether the failure was methodological or infrastructural
  - what retry, branch, or reset is justified
  - the single best next action
 
@@ -771,31 +254,8 @@ Also classify the broader failure layer when possible:
  - environment
  - direction
 
- Do not collapse these into one bucket.
- A direction should only be treated as failing when repeated, well-instrumented evidence still points to underperformance after implementation, evaluation, and environment explanations have been checked.
-
- Blocked experiment states commonly include:
-
- - missing baseline reference
- - unknown metric contract
- - environment failure
- - run failed before producing metrics
- - metrics produced but not comparable
-
- When results are suspicious, such as identical to baseline, implausibly perfect, or inconsistent across repeats, diagnose systematically:
-
- 1. fix the subset and seeds
- 2. isolate preprocessing, tokenization, model init, training, and evaluation one by one
- 3. compare intermediate outputs on the same inputs
- 4. align inputs first, then outputs, then metrics
-
- Default diagnosis loop:
-
- 1. collect the concrete failure or anomaly cases
- 2. identify the last-known-good comparable state
- 3. define the smallest delta between working and broken states
- 4. write `2-4` concrete hypotheses
- 5. run the cheapest discriminative check before another full retry
+ Blocked experiment states commonly include missing baseline reference, unknown metric contract, environment failure, run failure before metrics, or metrics that are not comparable.
+ When results are suspicious, fix the subset and seeds, isolate preprocessing/model/training/evaluation one by one, compare intermediate outputs on the same inputs, and run the cheapest discriminative check before another full retry.
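Comparing intermediate outputs on the same inputs, stage by stage, amounts to finding the earliest stage where a known-good run and a suspect run disagree, which is often the cheapest discriminative check. A minimal sketch, assuming each stage's intermediate output has already been captured as a list of floats keyed by stage name; the stage names, the `first_divergence` helper, and the default tolerance are illustrative assumptions.

```python
import math
from typing import Dict, Optional, Sequence

# Illustrative stage order: earlier stages are checked first.
STAGES = ("inputs", "preprocessing", "model_init", "training", "evaluation", "metrics")


def first_divergence(good: Dict[str, Sequence[float]],
                     suspect: Dict[str, Sequence[float]],
                     stages: Sequence[str] = STAGES,
                     rel_tol: float = 1e-6) -> Optional[str]:
    """Return the earliest stage whose captured outputs differ, or None if all match."""
    for stage in stages:
        a, b = good.get(stage), suspect.get(stage)
        if a is None and b is None:
            continue  # stage not captured on either side; nothing to compare
        if a is None or b is None:
            return stage  # captured on one side only: treat as a divergence
        if len(a) != len(b) or not all(
                math.isclose(x, y, rel_tol=rel_tol, abs_tol=1e-12)
                for x, y in zip(a, b)):
            return stage
    return None
```

Checking stages in order means a mismatch in preprocessing is reported before any downstream metric difference it would cause, which keeps the next hypothesis narrow.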
 
  ## Exit criteria
 
@@ -803,4 +263,6 @@ Exit the experiment stage once one of the following is durably true:
 
  - a main run is completed and recorded
  - the run failed and the blocker is durably recorded
- - the next step is clearly `analysis-campaign`, `write`, another `experiment`, or `reset`
+ - the next step is clearly `analysis-campaign`, `write`, another `experiment`, `optimize`, or `reset`
+
+ A good experiment pass leaves one interpretable result or one explicit blocker, not another vague promise to rerun later.