@researai/deepscientist 1.5.17 → 1.6.0
This diff shows the changes between the two publicly released versions of this package as they appear in the public registry. It is provided for informational purposes only.
- package/AGENTS.md +309 -130
- package/AISB/catalog/aisb.b1.agentic_coding.yaml +244 -0
- package/AISB/catalog/aisb.b10.climate_earth.yaml +235 -0
- package/AISB/catalog/aisb.b11.model_efficiency.yaml +231 -0
- package/AISB/catalog/aisb.b12.embodied_ai.yaml +238 -0
- package/AISB/catalog/aisb.b2.agent_systems.yaml +229 -0
- package/AISB/catalog/aisb.b3.self_evolving_rl.yaml +237 -0
- package/AISB/catalog/aisb.b4.lm_reasoning.yaml +240 -0
- package/AISB/catalog/aisb.b5.math_proof.yaml +235 -0
- package/AISB/catalog/aisb.b6.research_process.yaml +243 -0
- package/AISB/catalog/aisb.b7.multimodal_fusion.yaml +232 -0
- package/AISB/catalog/aisb.b8.lifesci_drug.yaml +275 -0
- package/AISB/catalog/aisb.b9.material_science.yaml +237 -0
- package/AISB/catalog/aisb.t3.001_savvy.yaml +159 -0
- package/AISB/catalog/aisb.t3.001_savvy.zh.yaml +121 -0
- package/AISB/catalog/aisb.t3.002_pinet.yaml +189 -0
- package/AISB/catalog/aisb.t3.002_pinet.zh.yaml +130 -0
- package/AISB/catalog/aisb.t3.004_decentralattn.yaml +184 -0
- package/AISB/catalog/aisb.t3.004_decentralattn.zh.yaml +153 -0
- package/AISB/catalog/aisb.t3.005_tsae.yaml +193 -0
- package/AISB/catalog/aisb.t3.005_tsae.zh.yaml +139 -0
- package/AISB/catalog/aisb.t3.006_physense.yaml +194 -0
- package/AISB/catalog/aisb.t3.006_physense.zh.yaml +118 -0
- package/AISB/catalog/aisb.t3.007_reasoningiqa.yaml +169 -0
- package/AISB/catalog/aisb.t3.007_reasoningiqa.zh.yaml +133 -0
- package/AISB/catalog/aisb.t3.008_meanflows.yaml +188 -0
- package/AISB/catalog/aisb.t3.008_meanflows.zh.yaml +140 -0
- package/AISB/catalog/aisb.t3.009_scoremissing.yaml +179 -0
- package/AISB/catalog/aisb.t3.009_scoremissing.zh.yaml +119 -0
- package/AISB/catalog/aisb.t3.010_suitabilityfilter.yaml +221 -0
- package/AISB/catalog/aisb.t3.010_suitabilityfilter.zh.yaml +141 -0
- package/AISB/catalog/aisb.t3.011_osd.yaml +206 -0
- package/AISB/catalog/aisb.t3.011_osd.zh.yaml +163 -0
- package/AISB/catalog/aisb.t3.012_efficientqat.yaml +206 -0
- package/AISB/catalog/aisb.t3.012_efficientqat.zh.yaml +159 -0
- package/AISB/catalog/aisb.t3.013_appl.yaml +152 -0
- package/AISB/catalog/aisb.t3.013_appl.zh.yaml +126 -0
- package/AISB/catalog/aisb.t3.014_piguard.yaml +207 -0
- package/AISB/catalog/aisb.t3.014_piguard.zh.yaml +164 -0
- package/AISB/catalog/aisb.t3.015_frspec.yaml +209 -0
- package/AISB/catalog/aisb.t3.015_frspec.zh.yaml +163 -0
- package/AISB/catalog/aisb.t3.016_mathfusion.yaml +166 -0
- package/AISB/catalog/aisb.t3.016_mathfusion.zh.yaml +145 -0
- package/AISB/catalog/aisb.t3.017_multimodalglp.yaml +171 -0
- package/AISB/catalog/aisb.t3.017_multimodalglp.zh.yaml +122 -0
- package/AISB/catalog/aisb.t3.018_cotsynth.yaml +206 -0
- package/AISB/catalog/aisb.t3.018_cotsynth.zh.yaml +162 -0
- package/AISB/catalog/aisb.t3.019_dyscaleut.yaml +211 -0
- package/AISB/catalog/aisb.t3.019_dyscaleut.zh.yaml +148 -0
- package/AISB/catalog/aisb.t3.020_aristotle.yaml +173 -0
- package/AISB/catalog/aisb.t3.020_aristotle.zh.yaml +119 -0
- package/AISB/catalog/aisb.t3.021_tokenrecycling.yaml +160 -0
- package/AISB/catalog/aisb.t3.021_tokenrecycling.zh.yaml +129 -0
- package/AISB/catalog/aisb.t3.022_chainofreasoning.yaml +204 -0
- package/AISB/catalog/aisb.t3.022_chainofreasoning.zh.yaml +161 -0
- package/AISB/catalog/aisb.t3.023_guidedembed.yaml +211 -0
- package/AISB/catalog/aisb.t3.023_guidedembed.zh.yaml +189 -0
- package/AISB/catalog/aisb.t3.024_outputcentric.yaml +148 -0
- package/AISB/catalog/aisb.t3.024_outputcentric.zh.yaml +131 -0
- package/AISB/catalog/aisb.t3.025_deeper.yaml +143 -0
- package/AISB/catalog/aisb.t3.025_deeper.zh.yaml +116 -0
- package/AISB/catalog/aisb.t3.026_gartkg.yaml +195 -0
- package/AISB/catalog/aisb.t3.026_gartkg.zh.yaml +127 -0
- package/AISB/catalog/aisb.t3.027_citeeval.yaml +182 -0
- package/AISB/catalog/aisb.t3.027_citeeval.zh.yaml +135 -0
- package/AISB/catalog/aisb.t3.028_sbam.yaml +206 -0
- package/AISB/catalog/aisb.t3.028_sbam.zh.yaml +166 -0
- package/AISB/catalog/aisb.t3.029_cdqgeoembed.yaml +224 -0
- package/AISB/catalog/aisb.t3.029_cdqgeoembed.zh.yaml +142 -0
- package/AISB/catalog/aisb.t3.030_processrm.yaml +211 -0
- package/AISB/catalog/aisb.t3.030_processrm.zh.yaml +166 -0
- package/AISB/catalog/aisb.t3.031_circuitstability.yaml +172 -0
- package/AISB/catalog/aisb.t3.031_circuitstability.zh.yaml +134 -0
- package/AISB/catalog/aisb.t3.032_ptsolver.yaml +169 -0
- package/AISB/catalog/aisb.t3.032_ptsolver.zh.yaml +135 -0
- package/AISB/catalog/aisb.t3.033_gcse.yaml +144 -0
- package/AISB/catalog/aisb.t3.033_gcse.zh.yaml +126 -0
- package/AISB/catalog/aisb.t3.034_ensemblewm.yaml +183 -0
- package/AISB/catalog/aisb.t3.034_ensemblewm.zh.yaml +146 -0
- package/AISB/catalog/aisb.t3.035_moralvalueswa.yaml +207 -0
- package/AISB/catalog/aisb.t3.035_moralvalueswa.zh.yaml +165 -0
- package/AISB/catalog/aisb.t3.036_weakstrongpref.yaml +210 -0
- package/AISB/catalog/aisb.t3.036_weakstrongpref.zh.yaml +194 -0
- package/AISB/catalog/aisb.t3.037_dementiamask.yaml +172 -0
- package/AISB/catalog/aisb.t3.037_dementiamask.zh.yaml +132 -0
- package/AISB/catalog/aisb.t3.038_tinysam.yaml +284 -0
- package/AISB/catalog/aisb.t3.038_tinysam.zh.yaml +240 -0
- package/AISB/catalog/aisb.t3.039_calf.yaml +224 -0
- package/AISB/catalog/aisb.t3.039_calf.zh.yaml +194 -0
- package/AISB/catalog/aisb.t3.040_graniteguardian.yaml +199 -0
- package/AISB/catalog/aisb.t3.040_graniteguardian.zh.yaml +174 -0
- package/AISB/catalog/aisb.t3.041_amdm.yaml +149 -0
- package/AISB/catalog/aisb.t3.041_amdm.zh.yaml +137 -0
- package/AISB/catalog/aisb.t3.042_xpatch.yaml +216 -0
- package/AISB/catalog/aisb.t3.042_xpatch.zh.yaml +182 -0
- package/AISB/catalog/aisb.t3.043_vhm.yaml +268 -0
- package/AISB/catalog/aisb.t3.043_vhm.zh.yaml +193 -0
- package/AISB/catalog/aisb.t3.044_rgvi.yaml +224 -0
- package/AISB/catalog/aisb.t3.044_rgvi.zh.yaml +176 -0
- package/AISB/catalog/aisb.t3.045_pslstm.yaml +203 -0
- package/AISB/catalog/aisb.t3.045_pslstm.zh.yaml +179 -0
- package/AISB/catalog/aisb.t3.046_nonstatts.yaml +208 -0
- package/AISB/catalog/aisb.t3.046_nonstatts.zh.yaml +194 -0
- package/AISB/catalog/aisb.t3.047_timepfn.yaml +156 -0
- package/AISB/catalog/aisb.t3.047_timepfn.zh.yaml +124 -0
- package/AISB/catalog/aisb.t3.048_proxyspex.yaml +148 -0
- package/AISB/catalog/aisb.t3.048_proxyspex.zh.yaml +125 -0
- package/AISB/catalog/aisb.t3.049_hogwildinference.yaml +183 -0
- package/AISB/catalog/aisb.t3.049_hogwildinference.zh.yaml +138 -0
- package/AISB/catalog/aisb.t3.050_causalpfn.yaml +214 -0
- package/AISB/catalog/aisb.t3.050_causalpfn.zh.yaml +190 -0
- package/AISB/catalog/aisb.t3.051_flashtp.yaml +169 -0
- package/AISB/catalog/aisb.t3.051_flashtp.zh.yaml +124 -0
- package/AISB/catalog/aisb.t3.052_nsdiff.yaml +155 -0
- package/AISB/catalog/aisb.t3.052_nsdiff.zh.yaml +138 -0
- package/AISB/catalog/aisb.t3.053_k2vae.yaml +158 -0
- package/AISB/catalog/aisb.t3.053_k2vae.zh.yaml +132 -0
- package/AISB/catalog/aisb.t3.054_timebase.yaml +178 -0
- package/AISB/catalog/aisb.t3.054_timebase.zh.yaml +158 -0
- package/AISB/catalog/aisb.t3.055_csbrain.yaml +238 -0
- package/AISB/catalog/aisb.t3.055_csbrain.zh.yaml +184 -0
- package/AISB/catalog/aisb.t3.056_infosam.yaml +224 -0
- package/AISB/catalog/aisb.t3.056_infosam.zh.yaml +189 -0
- package/AISB/catalog/aisb.t3.057_mdreid.yaml +129 -0
- package/AISB/catalog/aisb.t3.057_mdreid.zh.yaml +117 -0
- package/AISB/catalog/aisb.t3.058_mindglitch.yaml +171 -0
- package/AISB/catalog/aisb.t3.058_mindglitch.zh.yaml +145 -0
- package/AISB/catalog/aisb.t3.059_selfsupervised.yaml +154 -0
- package/AISB/catalog/aisb.t3.059_selfsupervised.zh.yaml +125 -0
- package/AISB/catalog/aisb.t3.060_iaggad.yaml +121 -0
- package/AISB/catalog/aisb.t3.060_iaggad.zh.yaml +100 -0
- package/AISB/catalog/aisb.t3.061_hsgkn.yaml +136 -0
- package/AISB/catalog/aisb.t3.061_hsgkn.zh.yaml +113 -0
- package/AISB/catalog/aisb.t3.062_visionts.yaml +237 -0
- package/AISB/catalog/aisb.t3.062_visionts.zh.yaml +216 -0
- package/AISB/catalog/aisb.t3.063_tsrag.yaml +162 -0
- package/AISB/catalog/aisb.t3.063_tsrag.zh.yaml +138 -0
- package/AISB/catalog/aisb.t3.064_pir.yaml +221 -0
- package/AISB/catalog/aisb.t3.064_pir.zh.yaml +197 -0
- package/AISB/catalog/aisb.t3.065_proteinbinding.yaml +234 -0
- package/AISB/catalog/aisb.t3.065_proteinbinding.zh.yaml +167 -0
- package/AISB/catalog/aisb.t3.066_tropicalattention.yaml +267 -0
- package/AISB/catalog/aisb.t3.066_tropicalattention.zh.yaml +229 -0
- package/AISB/catalog/aisb.t3.067_kanad.yaml +193 -0
- package/AISB/catalog/aisb.t3.067_kanad.zh.yaml +167 -0
- package/AISB/catalog/aisb.t3.068_sempo.yaml +187 -0
- package/AISB/catalog/aisb.t3.068_sempo.zh.yaml +148 -0
- package/AISB/catalog/aisb.t3.069_treehfd.yaml +129 -0
- package/AISB/catalog/aisb.t3.069_treehfd.zh.yaml +111 -0
- package/AISB/catalog/aisb.t3.070_certifiedunlearning.yaml +224 -0
- package/AISB/catalog/aisb.t3.070_certifiedunlearning.zh.yaml +171 -0
- package/AISB/catalog/aisb.t3.071_neuralmjd.yaml +142 -0
- package/AISB/catalog/aisb.t3.071_neuralmjd.zh.yaml +120 -0
- package/AISB/catalog/aisb.t3.072_fedgmt.yaml +181 -0
- package/AISB/catalog/aisb.t3.072_fedgmt.zh.yaml +158 -0
- package/AISB/catalog/aisb.t3.073_rld.yaml +161 -0
- package/AISB/catalog/aisb.t3.073_rld.zh.yaml +129 -0
- package/AISB/catalog/aisb.t3.074_lsvi.yaml +163 -0
- package/AISB/catalog/aisb.t3.074_lsvi.zh.yaml +129 -0
- package/AISB/catalog/aisb.t3.075_treeslicedentropy.yaml +201 -0
- package/AISB/catalog/aisb.t3.075_treeslicedentropy.zh.yaml +148 -0
- package/AISB/catalog/aisb.t3.076_aanet.yaml +169 -0
- package/AISB/catalog/aisb.t3.076_aanet.zh.yaml +129 -0
- package/AISB/catalog/aisb.t3.077_cmnn.yaml +199 -0
- package/AISB/catalog/aisb.t3.077_cmnn.zh.yaml +165 -0
- package/AISB/catalog/aisb.t3.078_conformalanomaly.yaml +146 -0
- package/AISB/catalog/aisb.t3.078_conformalanomaly.zh.yaml +117 -0
- package/AISB/catalog/aisb.t3.079_dpfkmeans.yaml +131 -0
- package/AISB/catalog/aisb.t3.079_dpfkmeans.zh.yaml +104 -0
- package/AISB/catalog/aisb.t3.080_latentscorereweight.yaml +169 -0
- package/AISB/catalog/aisb.t3.080_latentscorereweight.zh.yaml +123 -0
- package/AISB/catalog/aisb.t3.081_qmamba.yaml +150 -0
- package/AISB/catalog/aisb.t3.081_qmamba.zh.yaml +117 -0
- package/AISB/catalog/aisb.t3.082_onlinellmrouting.yaml +160 -0
- package/AISB/catalog/aisb.t3.082_onlinellmrouting.zh.yaml +133 -0
- package/AISB/catalog/aisb.t3.083_starformer.yaml +178 -0
- package/AISB/catalog/aisb.t3.083_starformer.zh.yaml +140 -0
- package/AISB/catalog/aisb.t3.084_ift.yaml +139 -0
- package/AISB/catalog/aisb.t3.084_ift.zh.yaml +111 -0
- package/AISB/catalog/aisb.t3.085_neuralsurv.yaml +183 -0
- package/AISB/catalog/aisb.t3.085_neuralsurv.zh.yaml +143 -0
- package/AISB/catalog/aisb.t3.086_stella.yaml +197 -0
- package/AISB/catalog/aisb.t3.086_stella.zh.yaml +142 -0
- package/AISB/catalog/aisb.t3.087_moses.yaml +167 -0
- package/AISB/catalog/aisb.t3.087_moses.zh.yaml +132 -0
- package/AISB/catalog/aisb.t3.088_channelnorm.yaml +140 -0
- package/AISB/catalog/aisb.t3.088_channelnorm.zh.yaml +109 -0
- package/AISB/catalog/aisb.t3.089_causalvelocity.yaml +730 -0
- package/AISB/catalog/aisb.t3.089_causalvelocity.zh.yaml +668 -0
- package/AISB/catalog/aisb.t3.090_rstib.yaml +144 -0
- package/AISB/catalog/aisb.t3.090_rstib.zh.yaml +109 -0
- package/AISB/catalog/aisb.t3.091_timeawarecausal.yaml +132 -0
- package/AISB/catalog/aisb.t3.091_timeawarecausal.zh.yaml +107 -0
- package/AISB/catalog/aisb.t3.092_kmeanslocalopt.yaml +138 -0
- package/AISB/catalog/aisb.t3.092_kmeanslocalopt.zh.yaml +110 -0
- package/AISB/catalog/aisb.t3.093_fedwmsam.yaml +134 -0
- package/AISB/catalog/aisb.t3.093_fedwmsam.zh.yaml +106 -0
- package/AISB/catalog/aisb.t3.094_boundre.yaml +147 -0
- package/AISB/catalog/aisb.t3.094_boundre.zh.yaml +114 -0
- package/AISB/catalog/aisb.t3.095_fastfeaturecp.yaml +153 -0
- package/AISB/catalog/aisb.t3.095_fastfeaturecp.zh.yaml +118 -0
- package/AISB/catalog/aisb.t3.096_m3svm.yaml +189 -0
- package/AISB/catalog/aisb.t3.096_m3svm.zh.yaml +149 -0
- package/AISB/catalog/aisb.t3.097_wassersteintl.yaml +212 -0
- package/AISB/catalog/aisb.t3.097_wassersteintl.zh.yaml +169 -0
- package/AISB/catalog/aisb.t3.098_xmahalanobis.yaml +171 -0
- package/AISB/catalog/aisb.t3.098_xmahalanobis.zh.yaml +127 -0
- package/AISB/catalog/aisb.t3.099_ollalanding.yaml +248 -0
- package/AISB/catalog/aisb.t3.099_ollalanding.zh.yaml +182 -0
- package/AISB/catalog/aisb.t3.100_invmissingdata.yaml +179 -0
- package/AISB/catalog/aisb.t3.100_invmissingdata.zh.yaml +150 -0
- package/AISB/catalog/aisb.t3.101_acia.yaml +164 -0
- package/AISB/catalog/aisb.t3.101_acia.zh.yaml +109 -0
- package/AISB/catalog/aisb.t3.102_stochasticff.yaml +178 -0
- package/AISB/catalog/aisb.t3.102_stochasticff.zh.yaml +130 -0
- package/AISB/catalog/aisb.t3.103_qdcp.yaml +150 -0
- package/AISB/catalog/aisb.t3.103_qdcp.zh.yaml +116 -0
- package/AISB/catalog/aisb.t3.104_balancedactiveinf.yaml +137 -0
- package/AISB/catalog/aisb.t3.104_balancedactiveinf.zh.yaml +104 -0
- package/AISB/catalog/aisb.t3.105_binaryclasseval.yaml +161 -0
- package/AISB/catalog/aisb.t3.105_binaryclasseval.zh.yaml +130 -0
- package/AISB/image/001_aisb.t3.001_savvy.jpg +0 -0
- package/AISB/image/002_aisb.t3.002_pinet.jpg +0 -0
- package/AISB/image/003_aisb.t3.003_dmsqd.jpg +0 -0
- package/AISB/image/004_aisb.t3.004_decentralattn.jpg +0 -0
- package/AISB/image/005_aisb.t3.005_tsae.jpg +0 -0
- package/AISB/image/006_aisb.t3.006_physense.jpg +0 -0
- package/AISB/image/007_aisb.t3.007_reasoningiqa.jpg +0 -0
- package/AISB/image/008_aisb.t3.008_meanflows.jpg +0 -0
- package/AISB/image/009_aisb.t3.009_scoremissing.jpg +0 -0
- package/AISB/image/010_aisb.t3.010_suitabilityfilter.jpg +0 -0
- package/AISB/image/011_aisb.t3.011_osd.jpg +0 -0
- package/AISB/image/012_aisb.t3.012_efficientqat.jpg +0 -0
- package/AISB/image/013_aisb.t3.013_appl.jpg +0 -0
- package/AISB/image/014_aisb.t3.014_piguard.jpg +0 -0
- package/AISB/image/015_aisb.t3.015_frspec.jpg +0 -0
- package/AISB/image/016_aisb.t3.016_mathfusion.jpg +0 -0
- package/AISB/image/017_aisb.t3.017_multimodalglp.jpg +0 -0
- package/AISB/image/018_aisb.t3.018_cotsynth.jpg +0 -0
- package/AISB/image/019_aisb.t3.019_dyscaleut.jpg +0 -0
- package/AISB/image/020_aisb.t3.020_aristotle.jpg +0 -0
- package/AISB/image/021_aisb.t3.021_tokenrecycling.jpg +0 -0
- package/AISB/image/022_aisb.t3.022_chainofreasoning.jpg +0 -0
- package/AISB/image/023_aisb.t3.023_guidedembed.jpg +0 -0
- package/AISB/image/024_aisb.t3.024_outputcentric.jpg +0 -0
- package/AISB/image/025_aisb.t3.025_deeper.jpg +0 -0
- package/AISB/image/026_aisb.t3.026_gartkg.jpg +0 -0
- package/AISB/image/027_aisb.t3.027_citeeval.jpg +0 -0
- package/AISB/image/028_aisb.t3.028_sbam.jpg +0 -0
- package/AISB/image/029_aisb.t3.029_cdqgeoembed.jpg +0 -0
- package/AISB/image/030_aisb.t3.030_processrm.jpg +0 -0
- package/AISB/image/031_aisb.t3.031_circuitstability.jpg +0 -0
- package/AISB/image/032_aisb.t3.032_ptsolver.jpg +0 -0
- package/AISB/image/033_aisb.t3.033_gcse.jpg +0 -0
- package/AISB/image/034_aisb.t3.034_ensemblewm.jpg +0 -0
- package/AISB/image/035_aisb.t3.035_moralvalueswa.jpg +0 -0
- package/AISB/image/036_aisb.t3.036_weakstrongpref.jpg +0 -0
- package/AISB/image/037_aisb.t3.037_dementiamask.jpg +0 -0
- package/AISB/image/038_aisb.t3.038_tinysam.jpg +0 -0
- package/AISB/image/039_aisb.t3.039_calf.jpg +0 -0
- package/AISB/image/040_aisb.t3.040_graniteguardian.jpg +0 -0
- package/AISB/image/041_aisb.t3.041_amdm.jpg +0 -0
- package/AISB/image/042_aisb.t3.042_xpatch.jpg +0 -0
- package/AISB/image/043_aisb.t3.043_vhm.jpg +0 -0
- package/AISB/image/044_aisb.t3.044_rgvi.jpg +0 -0
- package/AISB/image/045_aisb.t3.045_pslstm.jpg +0 -0
- package/AISB/image/046_aisb.t3.046_nonstatts.jpg +0 -0
- package/AISB/image/047_aisb.t3.047_timepfn.jpg +0 -0
- package/AISB/image/048_aisb.t3.048_proxyspex.jpg +0 -0
- package/AISB/image/049_aisb.t3.049_hogwildinference.jpg +0 -0
- package/AISB/image/050_aisb.t3.050_causalpfn.jpg +0 -0
- package/AISB/image/051_aisb.t3.051_flashtp.jpg +0 -0
- package/AISB/image/052_aisb.t3.052_nsdiff.jpg +0 -0
- package/AISB/image/053_aisb.t3.053_k2vae.jpg +0 -0
- package/AISB/image/054_aisb.t3.054_timebase.jpg +0 -0
- package/AISB/image/055_aisb.t3.055_csbrain.jpg +0 -0
- package/AISB/image/056_aisb.t3.056_infosam.jpg +0 -0
- package/AISB/image/057_aisb.t3.057_mdreid.jpg +0 -0
- package/AISB/image/058_aisb.t3.058_mindglitch.jpg +0 -0
- package/AISB/image/059_aisb.t3.059_selfsupervised.jpg +0 -0
- package/AISB/image/060_aisb.t3.060_iaggad.jpg +0 -0
- package/AISB/image/061_aisb.t3.061_hsgkn.jpg +0 -0
- package/AISB/image/062_aisb.t3.062_visionts.jpg +0 -0
- package/AISB/image/063_aisb.t3.063_tsrag.jpg +0 -0
- package/AISB/image/064_aisb.t3.064_pir.jpg +0 -0
- package/AISB/image/065_aisb.t3.065_proteinbinding.jpg +0 -0
- package/AISB/image/066_aisb.t3.066_tropicalattention.jpg +0 -0
- package/AISB/image/067_aisb.t3.067_kanad.jpg +0 -0
- package/AISB/image/068_aisb.t3.068_sempo.jpg +0 -0
- package/AISB/image/069_aisb.t3.069_treehfd.jpg +0 -0
- package/AISB/image/070_aisb.t3.070_certifiedunlearning.jpg +0 -0
- package/AISB/image/071_aisb.t3.071_neuralmjd.jpg +0 -0
- package/AISB/image/072_aisb.t3.072_fedgmt.jpg +0 -0
- package/AISB/image/073_aisb.t3.073_rld.jpg +0 -0
- package/AISB/image/074_aisb.t3.074_lsvi.jpg +0 -0
- package/AISB/image/075_aisb.t3.075_treeslicedentropy.jpg +0 -0
- package/AISB/image/076_aisb.t3.076_aanet.jpg +0 -0
- package/AISB/image/077_aisb.t3.077_cmnn.jpg +0 -0
- package/AISB/image/078_aisb.t3.078_conformalanomaly.jpg +0 -0
- package/AISB/image/079_aisb.t3.079_dpfkmeans.jpg +0 -0
- package/AISB/image/080_aisb.t3.080_latentscorereweight.jpg +0 -0
- package/AISB/image/081_aisb.t3.081_qmamba.jpg +0 -0
- package/AISB/image/082_aisb.t3.082_onlinellmrouting.jpg +0 -0
- package/AISB/image/083_aisb.t3.083_starformer.jpg +0 -0
- package/AISB/image/084_aisb.t3.084_ift.jpg +0 -0
- package/AISB/image/085_aisb.t3.085_neuralsurv.jpg +0 -0
- package/AISB/image/086_aisb.t3.086_stella.jpg +0 -0
- package/AISB/image/087_aisb.t3.087_moses.jpg +0 -0
- package/AISB/image/088_aisb.t3.088_channelnorm.jpg +0 -0
- package/AISB/image/089_aisb.t3.089_causalvelocity.jpg +0 -0
- package/AISB/image/090_aisb.t3.090_rstib.jpg +0 -0
- package/AISB/image/091_aisb.t3.091_timeawarecausal.jpg +0 -0
- package/AISB/image/092_aisb.t3.092_kmeanslocalopt.jpg +0 -0
- package/AISB/image/093_aisb.t3.093_fedwmsam.jpg +0 -0
- package/AISB/image/094_aisb.t3.094_boundre.jpg +0 -0
- package/AISB/image/095_aisb.t3.095_fastfeaturecp.jpg +0 -0
- package/AISB/image/096_aisb.t3.096_m3svm.jpg +0 -0
- package/AISB/image/097_aisb.t3.097_wassersteintl.jpg +0 -0
- package/AISB/image/098_aisb.t3.098_xmahalanobis.jpg +0 -0
- package/AISB/image/099_aisb.t3.099_ollalanding.jpg +0 -0
- package/AISB/image/100_aisb.t3.100_invmissingdata.jpg +0 -0
- package/AISB/image/101_aisb.t3.101_acia.jpg +0 -0
- package/AISB/image/102_aisb.t3.102_stochasticff.jpg +0 -0
- package/AISB/image/103_aisb.t3.103_qdcp.jpg +0 -0
- package/AISB/image/104_aisb.t3.104_balancedactiveinf.jpg +0 -0
- package/AISB/image/105_aisb.t3.105_binaryclasseval.jpg +0 -0
- package/AISB/image/106_aisb.t1.reasoning_lite.jpg +0 -0
- package/AISB/image/107_aisb.t2.paper_audit.jpg +0 -0
- package/AISB/image/108_aisb.t3.multi_gpu_search.jpg +0 -0
- package/AISB/image/109_aisb.t3.tdc_admet.jpg +0 -0
- package/AISB/image/aisb.b1.agentic_coding.svg +16 -0
- package/AISB/image/aisb.b10.climate_earth.svg +16 -0
- package/AISB/image/aisb.b11.model_efficiency.svg +16 -0
- package/AISB/image/aisb.b12.embodied_ai.svg +16 -0
- package/AISB/image/aisb.b2.agent_systems.svg +16 -0
- package/AISB/image/aisb.b3.self_evolving_rl.svg +16 -0
- package/AISB/image/aisb.b4.lm_reasoning.svg +16 -0
- package/AISB/image/aisb.b5.math_proof.svg +16 -0
- package/AISB/image/aisb.b6.research_process.svg +16 -0
- package/AISB/image/aisb.b7.multimodal_fusion.svg +16 -0
- package/AISB/image/aisb.b8.lifesci_drug.svg +16 -0
- package/AISB/image/aisb.b9.material_science.svg +16 -0
- package/README.md +132 -11
- package/bin/ds.js +376 -49
- package/docs/en/00_QUICK_START.md +135 -18
- package/docs/en/01_SETTINGS_REFERENCE.md +468 -96
- package/docs/en/02_START_RESEARCH_GUIDE.md +26 -5
- package/docs/en/03_QQ_CONNECTOR_GUIDE.md +14 -3
- package/docs/en/04_LINGZHU_CONNECTOR_GUIDE.md +2 -0
- package/docs/en/05_TUI_GUIDE.md +171 -2
- package/docs/en/07_MEMORY_AND_MCP.md +38 -2
- package/docs/en/09_DOCTOR.md +64 -4
- package/docs/en/10_WEIXIN_CONNECTOR_GUIDE.md +38 -1
- package/docs/en/11_LICENSE_AND_RISK.md +4 -0
- package/docs/en/12_GUIDED_WORKFLOW_TOUR.md +15 -0
- package/docs/en/14_PROMPT_SKILLS_AND_MCP_GUIDE.md +9 -0
- package/docs/en/15_CODEX_PROVIDER_SETUP.md +622 -187
- package/docs/en/16_TELEGRAM_CONNECTOR_GUIDE.md +14 -0
- package/docs/en/17_WHATSAPP_CONNECTOR_GUIDE.md +14 -0
- package/docs/en/18_FEISHU_CONNECTOR_GUIDE.md +14 -0
- package/docs/en/21_LOCAL_MODEL_BACKENDS_GUIDE.md +105 -2
- package/docs/en/22_BENCHSTORE_YAML_REFERENCE.md +469 -0
- package/docs/en/23_BENCHSTORE_GITHUB_RELEASES_SPEC.md +316 -0
- package/docs/en/24_CLAUDE_CODE_PROVIDER_SETUP.md +469 -0
- package/docs/en/25_OPENCODE_PROVIDER_SETUP.md +653 -0
- package/docs/en/26_CITATION_AND_ATTRIBUTION.md +119 -0
- package/docs/en/27_KIMI_CODE_PROVIDER_SETUP.md +180 -0
- package/docs/en/28_DISCORD_CONNECTOR_GUIDE.md +61 -0
- package/docs/en/29_SLACK_CONNECTOR_GUIDE.md +60 -0
- package/docs/en/30_SETTINGS_CONTROL_CENTER_GUIDE.md +371 -0
- package/docs/en/{19_LOCAL_BROWSER_AUTH.md → 31_LOCAL_BROWSER_AUTH.md} +1 -1
- package/docs/en/32_WINDOWS_WSL2_DEPLOYMENT_GUIDE.md +273 -0
- package/docs/en/33_WORKSPACE_EXPLORER_QA.md +121 -0
- package/docs/en/91_DEVELOPMENT.md +29 -0
- package/docs/en/99_ACKNOWLEDGEMENTS.md +24 -19
- package/docs/en/README.md +44 -7
- package/docs/images/admin/admin-connectors-health-en.png +0 -0
- package/docs/images/admin/admin-controllers-en.png +0 -0
- package/docs/images/admin/admin-diagnostics-en.png +0 -0
- package/docs/images/admin/admin-errors-en.png +0 -0
- package/docs/images/admin/admin-issues-en.png +0 -0
- package/docs/images/admin/admin-logs-en.png +0 -0
- package/docs/images/admin/admin-quest-detail-en.png +0 -0
- package/docs/images/admin/admin-quests-en.png +0 -0
- package/docs/images/admin/admin-repairs-en.png +0 -0
- package/docs/images/admin/admin-runtime-en.png +0 -0
- package/docs/images/admin/admin-search-en.png +0 -0
- package/docs/images/admin/admin-stats-en.png +0 -0
- package/docs/images/admin/admin-summary-en.png +0 -0
- package/docs/images/connectors/connector-discord-en.png +0 -0
- package/docs/images/connectors/connector-feishu-en.png +0 -0
- package/docs/images/connectors/connector-lingzhu-en.png +0 -0
- package/docs/images/connectors/connector-qq-en.png +0 -0
- package/docs/images/connectors/connector-slack-en.png +0 -0
- package/docs/images/connectors/connector-telegram-en.png +0 -0
- package/docs/images/connectors/connector-weixin-en.png +0 -0
- package/docs/images/connectors/connector-whatsapp-en.png +0 -0
- package/docs/images/settings/settings-baselines-en.png +0 -0
- package/docs/images/settings/settings-config-en.png +0 -0
- package/docs/images/settings/settings-connectors-overview-en.png +0 -0
- package/docs/images/settings/settings-deepxiv-en.png +0 -0
- package/docs/images/settings/settings-mcp-servers-en.png +0 -0
- package/docs/images/settings/settings-plugins-en.png +0 -0
- package/docs/images/settings/settings-runners-en.png +0 -0
- package/docs/zh/00_QUICK_START.md +92 -17
- package/docs/zh/01_SETTINGS_REFERENCE.md +219 -98
- package/docs/zh/02_START_RESEARCH_GUIDE.md +26 -5
- package/docs/zh/05_TUI_GUIDE.md +171 -2
- package/docs/zh/07_MEMORY_AND_MCP.md +29 -2
- package/docs/zh/09_DOCTOR.md +39 -4
- package/docs/zh/10_WEIXIN_CONNECTOR_GUIDE.md +24 -1
- package/docs/zh/11_LICENSE_AND_RISK.md +4 -0
- package/docs/zh/12_GUIDED_WORKFLOW_TOUR.md +15 -0
- package/docs/zh/14_PROMPT_SKILLS_AND_MCP_GUIDE.md +9 -0
- package/docs/zh/15_CODEX_PROVIDER_SETUP.md +550 -188
- package/docs/zh/21_LOCAL_MODEL_BACKENDS_GUIDE.md +105 -2
- package/docs/zh/22_BENCHSTORE_YAML_REFERENCE.md +459 -0
- package/docs/zh/23_BENCHSTORE_GITHUB_RELEASES_SPEC.md +287 -0
- package/docs/zh/23_CLAUDE_RUNNER_GUIDE.md +103 -0
- package/docs/zh/24_CLAUDE_CODE_PROVIDER_SETUP.md +460 -0
- package/docs/zh/25_OPENCODE_PROVIDER_SETUP.md +660 -0
- package/docs/zh/26_CITATION_AND_ATTRIBUTION.md +102 -0
- package/docs/zh/27_KIMI_CODE_PROVIDER_SETUP.md +51 -0
- package/docs/zh/{19_LOCAL_BROWSER_AUTH.md → 31_LOCAL_BROWSER_AUTH.md} +1 -1
- package/docs/zh/32_WINDOWS_WSL2_DEPLOYMENT_GUIDE.md +264 -0
- package/docs/zh/33_WORKSPACE_EXPLORER_QA.md +127 -0
- package/docs/zh/99_ACKNOWLEDGEMENTS.md +23 -19
- package/docs/zh/README.md +29 -7
- package/install.sh +122 -16
- package/package.json +4 -1
- package/pyproject.toml +2 -1
- package/src/deepscientist/__init__.py +1 -1
- package/src/deepscientist/acp/envelope.py +13 -0
- package/src/deepscientist/admin/__init__.py +3 -0
- package/src/deepscientist/admin/charts.py +681 -0
- package/src/deepscientist/admin/logs.py +119 -0
- package/src/deepscientist/admin/repairs.py +217 -0
- package/src/deepscientist/admin/service.py +1310 -0
- package/src/deepscientist/admin/system_info.py +700 -0
- package/src/deepscientist/admin/tasks.py +465 -0
- package/src/deepscientist/admin/tool_metrics.py +600 -0
- package/src/deepscientist/artifact/guidance.py +8 -4
- package/src/deepscientist/artifact/schemas.py +115 -0
- package/src/deepscientist/artifact/service.py +4268 -260
- package/src/deepscientist/bash_exec/monitor.py +30 -3
- package/src/deepscientist/bash_exec/service.py +134 -1
- package/src/deepscientist/benchstore/__init__.py +4 -0
- package/src/deepscientist/benchstore/prompt_builder.py +224 -0
- package/src/deepscientist/benchstore/service.py +1716 -0
- package/src/deepscientist/channels/weixin_ilink.py +8 -1
- package/src/deepscientist/cli.py +92 -17
- package/src/deepscientist/codex_cli_compat.py +2 -2
- package/src/deepscientist/config/models.py +82 -11
- package/src/deepscientist/config/service.py +927 -91
- package/src/deepscientist/connector/weixin_support.py +48 -17
- package/src/deepscientist/daemon/api/handlers.py +697 -210
- package/src/deepscientist/daemon/api/router.py +76 -1
- package/src/deepscientist/daemon/app.py +1054 -51
- package/src/deepscientist/diagnostics/runner_failures.py +147 -0
- package/src/deepscientist/doctor.py +212 -65
- package/src/deepscientist/evidence_packets.py +590 -0
- package/src/deepscientist/home.py +52 -4
- package/src/deepscientist/kimi_cli_compat.py +50 -0
- package/src/deepscientist/latex_runtime.py +2 -2
- package/src/deepscientist/mcp/context.py +2 -0
- package/src/deepscientist/mcp/schemas.py +114 -0
- package/src/deepscientist/mcp/server.py +1566 -126
- package/src/deepscientist/memory/service.py +203 -16
- package/src/deepscientist/process_control.py +8 -1
- package/src/deepscientist/prompts/builder.py +836 -92
- package/src/deepscientist/quest/__init__.py +2 -2
- package/src/deepscientist/quest/layout.py +12 -1
- package/src/deepscientist/quest/node_traces.py +10 -0
- package/src/deepscientist/quest/service.py +1430 -139
- package/src/deepscientist/quest/stage_views.py +1 -1
- package/src/deepscientist/runners/__init__.py +18 -0
- package/src/deepscientist/runners/base.py +89 -1
- package/src/deepscientist/runners/builtins.py +13 -1
- package/src/deepscientist/runners/claude.py +391 -0
- package/src/deepscientist/runners/codex.py +421 -21
- package/src/deepscientist/runners/codex_telemetry.py +127 -0
- package/src/deepscientist/runners/kimi.py +334 -0
- package/src/deepscientist/runners/metadata.py +68 -0
- package/src/deepscientist/runners/opencode.py +414 -0
- package/src/deepscientist/runners/runtime_overrides.py +100 -0
- package/src/deepscientist/runners/simple_cli.py +538 -0
- package/src/deepscientist/runtime_storage.py +303 -0
- package/src/deepscientist/shared.py +61 -16
- package/src/deepscientist/skills/installer.py +37 -0
- package/src/deepscientist/skills/registry.py +2 -0
- package/src/deepscientist/tinytex.py +2 -2
- package/src/deepscientist/tui.py +10 -3
- package/src/prompts/benchstore/system.md +77 -0
- package/src/prompts/connectors/qq.md +33 -2
- package/src/prompts/connectors/weixin.md +208 -23
- package/src/prompts/contracts/admin_ops.md +74 -0
- package/src/prompts/contracts/admin_ops_knowledge.md +138 -0
- package/src/prompts/contracts/shared_interaction.md +5 -11
- package/src/prompts/start_setup/system.md +422 -0
- package/src/prompts/system.md +409 -315
- package/src/prompts/system_copilot.md +88 -12
- package/src/skills/analysis-campaign/SKILL.md +239 -578
- package/src/skills/analysis-campaign/references/artifact-flow-examples.md +102 -0
- package/src/skills/analysis-campaign/references/boundary-cases.md +98 -0
- package/src/skills/analysis-campaign/references/campaign-checklist-template.md +39 -24
- package/src/skills/analysis-campaign/references/campaign-design.md +26 -10
- package/src/skills/analysis-campaign/references/campaign-plan-template.md +53 -54
- package/src/skills/analysis-campaign/references/operational-guidance.md +97 -0
- package/src/skills/analysis-campaign/references/writing-facing-slice-examples.md +10 -20
- package/src/skills/baseline/SKILL.md +183 -461
- package/src/skills/baseline/references/artifact-flow-examples.md +106 -0
- package/src/skills/baseline/references/artifact-payload-examples.md +1 -1
- package/src/skills/baseline/references/baseline-checklist-template.md +27 -35
- package/src/skills/baseline/references/baseline-plan-template.md +37 -76
- package/src/skills/baseline/references/boundary-cases.md +86 -0
- package/src/skills/baseline/references/codebase-audit-checklist.md +2 -6
- package/src/skills/baseline/references/comparability-contract.md +7 -12
- package/src/skills/baseline/references/operational-guidance.md +56 -0
- package/src/skills/baseline/references/route-selection.md +5 -25
- package/src/skills/decision/SKILL.md +113 -306
- package/src/skills/decision/references/checkpoint-memory-template.md +47 -0
- package/src/skills/decision/references/operational-guidance.md +94 -0
- package/src/skills/decision/references/research-route-criteria.md +7 -8
- package/src/skills/decision/references/strategic-decision-template.md +13 -26
- package/src/skills/experiment/SKILL.md +132 -670
- package/src/skills/experiment/references/execution-playbook.md +374 -0
- package/src/skills/experiment/references/main-experiment-checklist-template.md +26 -2
- package/src/skills/experiment/references/main-experiment-plan-template.md +28 -17
- package/src/skills/experiment/references/operational-guidance.md +108 -0
- package/src/skills/finalize/SKILL.md +62 -0
- package/src/skills/finalize/references/checkpoint-memory-template.md +49 -0
- package/src/skills/finalize/references/resume-packet-template.md +7 -0
- package/src/skills/idea/SKILL.md +228 -15
- package/src/skills/idea/references/controlled-brainstorming-playbook.md +78 -0
- package/src/skills/idea/references/current-board-packet-template.md +61 -0
- package/src/skills/idea/references/high-value-idea-sourcing.md +119 -0
- package/src/skills/idea/references/idea-generation-playbook.md +21 -0
- package/src/skills/idea/references/idea-thinking-flow.md +6 -0
- package/src/skills/idea/references/literature-survey-template.md +3 -0
- package/src/skills/idea/references/objective-contract-template.md +54 -0
- package/src/skills/idea/references/outline-seeding-example.md +56 -0
- package/src/skills/idea/references/pre-idea-draft-template.md +105 -0
- package/src/skills/idea/references/related-work-playbook.md +75 -2
- package/src/skills/idea/references/research-history-playbook.md +114 -0
- package/src/skills/idea/references/selection-gate.md +58 -6
- package/src/skills/intake-audit/SKILL.md +43 -2
- package/src/skills/intake-audit/references/state-audit-template.md +10 -0
- package/src/skills/nature-data/SKILL.md +128 -0
- package/src/skills/nature-data/UPSTREAM_LICENSE.txt +21 -0
- package/src/skills/nature-data/agents/openai.yaml +4 -0
- package/src/skills/nature-data/references/chinese-author-alignment.md +84 -0
- package/src/skills/nature-data/references/fair-metadata-checklist.md +105 -0
- package/src/skills/nature-data/references/policy-principles.md +103 -0
- package/src/skills/nature-data/references/repository-and-identifiers.md +96 -0
- package/src/skills/nature-data/references/source-basis.md +54 -0
- package/src/skills/nature-data/references/statement-patterns.md +153 -0
- package/src/skills/nature-figure/SKILL.md +197 -0
- package/src/skills/nature-figure/UPSTREAM_LICENSE.txt +21 -0
- package/src/skills/nature-figure/agents/openai.yaml +4 -0
- package/src/skills/nature-figure/evals/evals.json +37 -0
- package/src/skills/nature-figure/references/api.md +428 -0
- package/src/skills/nature-figure/references/backend-selection.md +100 -0
- package/src/skills/nature-figure/references/chart-types.md +281 -0
- package/src/skills/nature-figure/references/common-patterns.md +349 -0
- package/src/skills/nature-figure/references/design-theory.md +436 -0
- package/src/skills/nature-figure/references/figure-contract.md +93 -0
- package/src/skills/nature-figure/references/nature-2026-observations.md +112 -0
- package/src/skills/nature-figure/references/qa-contract.md +119 -0
- package/src/skills/nature-figure/references/r-template-index.md +66 -0
- package/src/skills/nature-figure/references/r-workflow.md +161 -0
- package/src/skills/nature-figure/references/tutorials.md +250 -0
- package/src/skills/nature-paper2ppt/SKILL.md +507 -0
- package/src/skills/nature-paper2ppt/UPSTREAM_LICENSE.txt +21 -0
- package/src/skills/nature-paper2ppt/agents/openai.yaml +4 -0
- package/src/skills/nature-polishing/SKILL.md +385 -0
- package/src/skills/nature-polishing/UPSTREAM_LICENSE.txt +21 -0
- package/src/skills/nature-polishing/agents/openai.yaml +4 -0
- package/src/skills/nature-polishing/references/phrasebank-playbook.md +162 -0
- package/src/skills/nature-polishing/references/section-moves.md +240 -0
- package/src/skills/nature-polishing/references/style-guardrails.md +94 -0
- package/src/skills/nature-polishing/references/writing-strategy.md +148 -0
- package/src/skills/optimize/SKILL.md +177 -1568
- package/src/skills/optimize/references/brief-shaping-playbook.md +95 -0
- package/src/skills/optimize/references/candidate-board-template.md +13 -0
- package/src/skills/optimize/references/candidate-ranking-template.md +51 -0
- package/src/skills/optimize/references/codegen-route-playbook.md +50 -0
- package/src/skills/optimize/references/debug-response-template.md +29 -0
- package/src/skills/optimize/references/frontier-review-template.md +32 -0
- package/src/skills/optimize/references/fusion-playbook.md +36 -0
- package/src/skills/optimize/references/method-brief-template.md +73 -0
- package/src/skills/optimize/references/operational-guidance.md +621 -0
- package/src/skills/optimize/references/optimization-memory-template.md +30 -0
- package/src/skills/optimize/references/optimize-checklist-template.md +18 -0
- package/src/skills/optimize/references/plateau-response-playbook.md +28 -0
- package/src/skills/optimize/references/prompt-patterns.md +49 -0
- package/src/skills/paper-outline/SKILL.md +227 -0
- package/src/skills/paper-outline/references/outline-patterns.md +87 -0
- package/src/skills/paper-plot/SKILL.md +79 -0
- package/src/skills/paper-plot/agents/openai.yaml +4 -0
- package/src/skills/paper-plot/references/bar_grouped_hatch.md +96 -0
- package/src/skills/paper-plot/references/bar_paired_delta.md +72 -0
- package/src/skills/paper-plot/references/line_confidence_band.md +75 -0
- package/src/skills/paper-plot/references/line_loss_with_inset.md +65 -0
- package/src/skills/paper-plot/references/line_training_curve.md +44 -0
- package/src/skills/paper-plot/references/radar_dual_series.md +59 -0
- package/src/skills/paper-plot/references/scatter_broken_axis.md +59 -0
- package/src/skills/paper-plot/references/scatter_tsne_cluster.md +72 -0
- package/src/skills/paper-plot/scripts/bar_memevolve.py +109 -0
- package/src/skills/paper-plot/scripts/bar_spice.py +166 -0
- package/src/skills/paper-plot/scripts/line_aime.py +94 -0
- package/src/skills/paper-plot/scripts/line_loss_inset.py +157 -0
- package/src/skills/paper-plot/scripts/line_selfdistill.py +168 -0
- package/src/skills/paper-plot/scripts/radar_dora.py +151 -0
- package/src/skills/paper-plot/scripts/scatter_break.py +169 -0
- package/src/skills/paper-plot/scripts/scatter_tsne.py +133 -0
- package/src/skills/rebuttal/SKILL.md +9 -0
- package/src/skills/references/tool-usage-by-stage.md +438 -0
- package/src/skills/review/SKILL.md +105 -7
- package/src/skills/science/PROVENANCE.md +44 -0
- package/src/skills/science/SKILL.md +137 -0
- package/src/skills/science/references/artifact-science-tool.md +110 -0
- package/src/skills/science/references/claim-type-discipline.md +56 -0
- package/src/skills/science/references/domain-index.md +422 -0
- package/src/skills/science/references/hpc-via-bash-exec.md +42 -0
- package/src/skills/science/references/package-check-playbook.md +64 -0
- package/src/skills/science/references/package-index.min.json +3616 -0
- package/src/skills/science/references/packages/abinit.md +80 -0
- package/src/skills/science/references/packages/acts.md +73 -0
- package/src/skills/science/references/packages/aiida-core.md +80 -0
- package/src/skills/science/references/packages/alamode.md +80 -0
- package/src/skills/science/references/packages/amuse.md +88 -0
- package/src/skills/science/references/packages/anndata.md +88 -0
- package/src/skills/science/references/packages/arbor.md +80 -0
- package/src/skills/science/references/packages/arc.md +73 -0
- package/src/skills/science/references/packages/astropy.md +88 -0
- package/src/skills/science/references/packages/astroquery.md +88 -0
- package/src/skills/science/references/packages/atomate2.md +80 -0
- package/src/skills/science/references/packages/atomsmltr.md +73 -0
- package/src/skills/science/references/packages/awkward.md +73 -0
- package/src/skills/science/references/packages/batman.md +88 -0
- package/src/skills/science/references/packages/biopython.md +88 -0
- package/src/skills/science/references/packages/bloqade.md +73 -0
- package/src/skills/science/references/packages/brian2.md +73 -0
- package/src/skills/science/references/packages/bullet3.md +73 -0
- package/src/skills/science/references/packages/calculix.md +80 -0
- package/src/skills/science/references/packages/cantera.md +73 -0
- package/src/skills/science/references/packages/cavity-md-ipi.md +80 -0
- package/src/skills/science/references/packages/ccdproc.md +88 -0
- package/src/skills/science/references/packages/celerite2.md +88 -0
- package/src/skills/science/references/packages/cellrank.md +73 -0
- package/src/skills/science/references/packages/cesm.md +80 -0
- package/src/skills/science/references/packages/chemicals.md +73 -0
- package/src/skills/science/references/packages/chempy.md +73 -0
- package/src/skills/science/references/packages/cirq.md +73 -0
- package/src/skills/science/references/packages/coffea.md +73 -0
- package/src/skills/science/references/packages/cp2k.md +88 -0
- package/src/skills/science/references/packages/custodian.md +80 -0
- package/src/skills/science/references/packages/dart.md +73 -0
- package/src/skills/science/references/packages/datamol.md +88 -0
- package/src/skills/science/references/packages/dd4hep.md +73 -0
- package/src/skills/science/references/packages/dealii.md +80 -0
- package/src/skills/science/references/packages/deepchem.md +88 -0
- package/src/skills/science/references/packages/delphes.md +73 -0
- package/src/skills/science/references/packages/devito.md +80 -0
- package/src/skills/science/references/packages/dftb.md +88 -0
- package/src/skills/science/references/packages/dftd4.md +88 -0
- package/src/skills/science/references/packages/dftk-jl.md +80 -0
- package/src/skills/science/references/packages/dolfinx.md +80 -0
- package/src/skills/science/references/packages/drake.md +73 -0
- package/src/skills/science/references/packages/dumux.md +73 -0
- package/src/skills/science/references/packages/elk.md +80 -0
- package/src/skills/science/references/packages/elmerfem.md +80 -0
- package/src/skills/science/references/packages/enzo-e.md +88 -0
- package/src/skills/science/references/packages/espresso.md +80 -0
- package/src/skills/science/references/packages/exoplanet.md +88 -0
- package/src/skills/science/references/packages/fairroot.md +73 -0
- package/src/skills/science/references/packages/fbpic.md +80 -0
- package/src/skills/science/references/packages/fdtdbath-meep.md +80 -0
- package/src/skills/science/references/packages/geant4.md +73 -0
- package/src/skills/science/references/packages/geosx.md +80 -0
- package/src/skills/science/references/packages/gprmax.md +80 -0
- package/src/skills/science/references/packages/gromacs.md +80 -0
- package/src/skills/science/references/packages/gwaslab.md +73 -0
- package/src/skills/science/references/packages/gz-sim.md +73 -0
- package/src/skills/science/references/packages/hail.md +88 -0
- package/src/skills/science/references/packages/hiphive.md +80 -0
- package/src/skills/science/references/packages/hoomd-blue.md +80 -0
- package/src/skills/science/references/packages/itensor.md +73 -0
- package/src/skills/science/references/packages/itensors-jl.md +73 -0
- package/src/skills/science/references/packages/jdftx.md +73 -0
- package/src/skills/science/references/packages/jobflow.md +80 -0
- package/src/skills/science/references/packages/kadanoffbaym-jl.md +73 -0
- package/src/skills/science/references/packages/kite.md +80 -0
- package/src/skills/science/references/packages/kratos.md +80 -0
- package/src/skills/science/references/packages/kwant.md +73 -0
- package/src/skills/science/references/packages/lammps.md +80 -0
- package/src/skills/science/references/packages/lightkurve.md +88 -0
- package/src/skills/science/references/packages/limix.md +73 -0
- package/src/skills/science/references/packages/maxwelllink.md +80 -0
- package/src/skills/science/references/packages/mcdc.md +73 -0
- package/src/skills/science/references/packages/meep.md +80 -0
- package/src/skills/science/references/packages/mfem.md +80 -0
- package/src/skills/science/references/packages/mitgcm.md +73 -0
- package/src/skills/science/references/packages/modflow6.md +73 -0
- package/src/skills/science/references/packages/molecool.md +73 -0
- package/src/skills/science/references/packages/mom6.md +73 -0
- package/src/skills/science/references/packages/moose.md +80 -0
- package/src/skills/science/references/packages/mpas-model.md +73 -0
- package/src/skills/science/references/packages/mujoco.md +73 -0
- package/src/skills/science/references/packages/mumax3.md +73 -0
- package/src/skills/science/references/packages/nekrs.md +80 -0
- package/src/skills/science/references/packages/nessi.md +73 -0
- package/src/skills/science/references/packages/nest-simulator.md +73 -0
- package/src/skills/science/references/packages/netket.md +73 -0
- package/src/skills/science/references/packages/neuron.md +73 -0
- package/src/skills/science/references/packages/nextflow.md +88 -0
- package/src/skills/science/references/packages/nwchem.md +88 -0
- package/src/skills/science/references/packages/openbabel.md +88 -0
- package/src/skills/science/references/packages/openems.md +80 -0
- package/src/skills/science/references/packages/openff-toolkit.md +88 -0
- package/src/skills/science/references/packages/openfoam-dev.md +80 -0
- package/src/skills/science/references/packages/openmc.md +73 -0
- package/src/skills/science/references/packages/openmm.md +80 -0
- package/src/skills/science/references/packages/openmoc.md +73 -0
- package/src/skills/science/references/packages/openmx.md +80 -0
- package/src/skills/science/references/packages/opensees.md +80 -0
- package/src/skills/science/references/packages/opensn.md +80 -0
- package/src/skills/science/references/packages/opm-simulators.md +73 -0
- package/src/skills/science/references/packages/oqupy.md +73 -0
- package/src/skills/science/references/packages/packmol.md +80 -0
- package/src/skills/science/references/packages/palabos.md +80 -0
- package/src/skills/science/references/packages/parflow.md +80 -0
- package/src/skills/science/references/packages/pennylane.md +88 -0
- package/src/skills/science/references/packages/perceval.md +73 -0
- package/src/skills/science/references/packages/phono3py.md +73 -0
- package/src/skills/science/references/packages/phonopy.md +73 -0
- package/src/skills/science/references/packages/photutils.md +88 -0
- package/src/skills/science/references/packages/picongpu.md +80 -0
- package/src/skills/science/references/packages/plink-ng.md +88 -0
- package/src/skills/science/references/packages/precice.md +73 -0
- package/src/skills/science/references/packages/psc.md +80 -0
- package/src/skills/science/references/packages/psi4.md +88 -0
- package/src/skills/science/references/packages/pybinding.md +73 -0
- package/src/skills/science/references/packages/pyfr.md +80 -0
- package/src/skills/science/references/packages/pyhf.md +73 -0
- package/src/skills/science/references/packages/pyiron_base.md +80 -0
- package/src/skills/science/references/packages/pylcp.md +73 -0
- package/src/skills/science/references/packages/pylith.md +80 -0
- package/src/skills/science/references/packages/pynbody.md +88 -0
- package/src/skills/science/references/packages/pysam.md +88 -0
- package/src/skills/science/references/packages/pyscf.md +88 -0
- package/src/skills/science/references/packages/q-e.md +73 -0
- package/src/skills/science/references/packages/qibo.md +73 -0
- package/src/skills/science/references/packages/qiskit.md +73 -0
- package/src/skills/science/references/packages/quantica-jl.md +73 -0
- package/src/skills/science/references/packages/quantumoptics-jl.md +73 -0
- package/src/skills/science/references/packages/quimb.md +73 -0
- package/src/skills/science/references/packages/qulacs.md +73 -0
- package/src/skills/science/references/packages/qutip.md +73 -0
- package/src/skills/science/references/packages/rdkit.md +88 -0
- package/src/skills/science/references/packages/rmg-py.md +73 -0
- package/src/skills/science/references/packages/root.md +73 -0
- package/src/skills/science/references/packages/scanpy.md +88 -0
- package/src/skills/science/references/packages/scikit-allel.md +88 -0
- package/src/skills/science/references/packages/scikit-bio.md +88 -0
- package/src/skills/science/references/packages/scqubits.md +73 -0
- package/src/skills/science/references/packages/scuff-em.md +80 -0
- package/src/skills/science/references/packages/scvi-tools.md +73 -0
- package/src/skills/science/references/packages/seissol.md +73 -0
- package/src/skills/science/references/packages/sfepy.md +80 -0
- package/src/skills/science/references/packages/sisl.md +73 -0
- package/src/skills/science/references/packages/smilei.md +80 -0
- package/src/skills/science/references/packages/snakemake.md +88 -0
- package/src/skills/science/references/packages/specfem3d-globe.md +80 -0
- package/src/skills/science/references/packages/specutils.md +88 -0
- package/src/skills/science/references/packages/spglib.md +80 -0
- package/src/skills/science/references/packages/squidpy.md +88 -0
- package/src/skills/science/references/packages/starry.md +88 -0
- package/src/skills/science/references/packages/strawberryfields.md +73 -0
- package/src/skills/science/references/packages/su2.md +80 -0
- package/src/skills/science/references/packages/sunny-jl.md +73 -0
- package/src/skills/science/references/packages/sw4.md +73 -0
- package/src/skills/science/references/packages/swift.md +88 -0
- package/src/skills/science/references/packages/tdnegf.md +73 -0
- package/src/skills/science/references/packages/tenpy.md +73 -0
- package/src/skills/science/references/packages/thermo.md +73 -0
- package/src/skills/science/references/packages/tkwant.md +73 -0
- package/src/skills/science/references/packages/tvb-root.md +73 -0
- package/src/skills/science/references/packages/uproot5.md +73 -0
- package/src/skills/science/references/packages/vampire.md +80 -0
- package/src/skills/science/references/packages/wannier_tools.md +73 -0
- package/src/skills/science/references/packages/warpx.md +80 -0
- package/src/skills/science/references/packages/wrf.md +73 -0
- package/src/skills/science/references/packages/xtb.md +88 -0
- package/src/skills/science/references/packages/yt.md +73 -0
- package/src/skills/science/references/science-task-brief-template.md +71 -0
- package/src/skills/scout/SKILL.md +83 -425
- package/src/skills/scout/references/literature-scout-template.md +5 -24
- package/src/skills/scout/references/operational-guidance.md +191 -0
- package/src/skills/scout/references/paper-triage-playbook.md +11 -35
- package/src/skills/write/SKILL.md +744 -1246
- package/src/skills/write/references/experiments_analysis_patterns.md +129 -0
- package/src/skills/write/references/oral_package_patterns.md +252 -0
- package/src/skills/write/references/oral_writing_principles.md +291 -0
- package/src/skills/write/references/section_rewrite_checklist.md +234 -0
- package/src/tui/dist/app/AppContainer.js +1314 -27
- package/src/tui/dist/components/Composer.js +26 -1
- package/src/tui/dist/components/ConfigScreen.js +2 -1
- package/src/tui/dist/components/InputPrompt.js +25 -9
- package/src/tui/dist/components/MainContent.js +18 -3
- package/src/tui/dist/components/QuestScreen.js +3 -2
- package/src/tui/dist/components/UtilityScreen.js +37 -0
- package/src/tui/dist/hooks/useSafeInput.js +10 -0
- package/src/tui/dist/index.js +13 -1
- package/src/tui/dist/layouts/DefaultAppLayout.js +11 -8
- package/src/tui/dist/lib/api.js +89 -1
- package/src/tui/package.json +1 -1
- package/src/ui/dist/assets/{AnalysisPlugin-BCKAfjba.js → AnalysisPlugin-CA94NGmI.js} +1 -1
- package/src/ui/dist/assets/CliPlugin-DHBzphZU.js +79 -0
- package/src/ui/dist/assets/CodeEditorPlugin-BOFwD2rn.js +2 -0
- package/src/ui/dist/assets/{CodeViewerPlugin-CbaFRrUU.js → CodeViewerPlugin-CqDpgjik.js} +4 -4
- package/src/ui/dist/assets/{DocViewerPlugin-DAjLVeQD.js → DocViewerPlugin-UDBgt8-4.js} +3 -3
- package/src/ui/dist/assets/GitCommitViewerPlugin-BmHtZ0bZ.js +6 -0
- package/src/ui/dist/assets/{GitDiffViewerPlugin-CQACjoAA.js → GitDiffViewerPlugin-CAxjNorQ.js} +2 -2
- package/src/ui/dist/assets/{GitSnapshotViewer-0r4nLPke.js → GitSnapshotViewer-CweA6VON.js} +2 -2
- package/src/ui/dist/assets/{ImageViewerPlugin-nBOmI2v_.js → ImageViewerPlugin-C8wHGvGN.js} +5 -5
- package/src/ui/dist/assets/LabPlugin-COyyLUol.js +32 -0
- package/src/ui/dist/assets/{LatexPlugin-ZwtV8pIp.js → LatexPlugin-BQjAaA5J.js} +4 -4
- package/src/ui/dist/assets/{MarkdownViewerPlugin-DKqVfKyW.js → MarkdownViewerPlugin-Dy1NE2dI.js} +3 -3
- package/src/ui/dist/assets/{MarketplacePlugin-BwxStZ9D.js → MarketplacePlugin-DMIZtEJ2.js} +2 -2
- package/src/ui/dist/assets/NotebookEditor-CFHMq_Qt.js +91 -0
- package/src/ui/dist/assets/{NotebookEditor-DB9N_T9q.js → NotebookEditor-WFyd8Ybt.js} +3 -3
- package/src/ui/dist/assets/{PdfLoader-eWBONbQP.js → PdfLoader-CLE5u5TS.js} +3 -3
- package/src/ui/dist/assets/{PdfMarkdownPlugin-D22YOZL3.js → PdfMarkdownPlugin-_iNK_H83.js} +1 -1
- package/src/ui/dist/assets/PdfViewerPlugin-DgWsbInT.js +22 -0
- package/src/ui/dist/assets/SearchPlugin-DrZmn5iw.js +11 -0
- package/src/ui/dist/assets/{TextViewerPlugin-C5xqeeUH.js → TextViewerPlugin-D1-T3aC7.js} +4 -4
- package/src/ui/dist/assets/branding/runner-claude.svg +107 -0
- package/src/ui/dist/assets/branding/runner-codex.svg +10 -0
- package/src/ui/dist/assets/branding/runner-kimi.svg +14 -0
- package/src/ui/dist/assets/branding/runner-opencode.svg +7 -0
- package/src/ui/dist/assets/cli-store-CoZ-x5Ip.js +1 -0
- package/src/ui/dist/assets/{code-WlFHE7z_.js → code-DbsmSd3Y.js} +1 -1
- package/src/ui/dist/assets/file-diff-panel-DsvyRz47.js +1 -0
- package/src/ui/dist/assets/{wrap-text-BC-Hltpd.js → file-jump-queue-DeQBikaw.js} +3 -3
- package/src/ui/dist/assets/{file-socket-CfQPKQKj.js → file-socket-DA5XIx88.js} +1 -1
- package/src/ui/dist/assets/fonts/ds-fonts.css +50 -4
- package/src/ui/dist/assets/images/deepxiv/register-guide.png +0 -0
- package/src/ui/dist/assets/index-39vY9LmZ.js +1 -0
- package/src/ui/dist/assets/{index-CwNu1aH4.js → index-BsO46tJA.js} +1 -1
- package/src/ui/dist/assets/index-CHzJ2xtB.js +3530 -0
- package/src/ui/dist/assets/index-DH-zxoZ3.css +33 -0
- package/src/ui/dist/assets/{plugin-notebook-HbW2K-1c.js → plugin-notebook-JRhysCqj.js} +2 -2
- package/src/ui/dist/assets/{project-sync-C9IdzdZW.js → project-sync-DPmWKmKD.js} +1 -1
- package/src/ui/dist/assets/{zoom-out-E_gaeAxL.js → zoom-out-DAukFWen.js} +3 -3
- package/src/ui/dist/index.html +3 -3
- package/src/skills/analysis-campaign/references/artifact-orchestration.md +0 -58
- package/src/skills/baseline/references/memory-playbook.md +0 -40
- package/src/skills/baseline/references/publishable-baseline-package.md +0 -30
- package/src/skills/write/references/outline-evidence-contract-example.md +0 -107
- package/src/skills/write/references/paper-experiment-matrix-template.md +0 -131
- package/src/skills/write/references/paper-section-playbook.md +0 -64
- package/src/skills/write/references/reviewer-first-writing.md +0 -64
- package/src/skills/write/references/revision-checklist.md +0 -70
- package/src/skills/write/references/section-contracts.md +0 -82
- package/src/skills/write/references/sentence-level-proofing.md +0 -49
- package/src/ui/dist/assets/AiManusChatView-Bv-Z8YpU.js +0 -204
- package/src/ui/dist/assets/CliPlugin-BCKcpc35.js +0 -109
- package/src/ui/dist/assets/CodeEditorPlugin-DbOfSJ8K.js +0 -2
- package/src/ui/dist/assets/GitCommitViewerPlugin-CIUqbUDO.js +0 -1
- package/src/ui/dist/assets/LabCopilotPanel-BHxOxF4z.js +0 -14
- package/src/ui/dist/assets/LabPlugin-BKoZGs95.js +0 -22
- package/src/ui/dist/assets/NotebookEditor-BEQhaQbt.js +0 -81
- package/src/ui/dist/assets/PdfViewerPlugin-c-RK9DLM.js +0 -17
- package/src/ui/dist/assets/SearchPlugin-CxF9ytAx.js +0 -16
- package/src/ui/dist/assets/VNCViewer-BoLGLnHz.js +0 -11
- package/src/ui/dist/assets/bot-DREQOxzP.js +0 -6
- package/src/ui/dist/assets/chevron-up-C9Qpx4DE.js +0 -6
- package/src/ui/dist/assets/file-content-BZMz3RYp.js +0 -1
- package/src/ui/dist/assets/file-diff-panel-CQhw0jS2.js +0 -1
- package/src/ui/dist/assets/file-jump-queue-DA-SdG__.js +0 -1
- package/src/ui/dist/assets/git-commit-horizontal-DxZ8DCZh.js +0 -6
- package/src/ui/dist/assets/image-Bgl4VIyx.js +0 -6
- package/src/ui/dist/assets/index-BpV6lusQ.css +0 -33
- package/src/ui/dist/assets/index-CBNVuWcP.js +0 -2496
- package/src/ui/dist/assets/index-DrUnlf6K.js +0 -1
- package/src/ui/dist/assets/index-NW-h8VzN.js +0 -1
- package/src/ui/dist/assets/pdf-effect-queue-J8OnM0jE.js +0 -6
- package/src/ui/dist/assets/popover-CLc0pPP8.js +0 -1
- package/src/ui/dist/assets/select-Cs2PmzwL.js +0 -11
- package/src/ui/dist/assets/sigma-ClKcHAXm.js +0 -6
- package/src/ui/dist/assets/trash-DwpbFr3w.js +0 -11
- package/src/ui/dist/assets/useCliAccess-NQ8m0Let.js +0 -1
- package/src/ui/dist/assets/useFileDiffOverlay-FuhcnKiw.js +0 -1
|
@@ -7,87 +7,122 @@ skill_role: stage
|
|
|
7
7
|
# Experiment
|
|
8
8
|
|
|
9
9
|
Use this skill for the main evidence-producing runs of the quest.
|
|
10
|
+
The goal is to turn one selected route into one trustworthy measured result with the smallest valid amount of execution.
|
|
10
11
|
|
|
11
|
-
##
|
|
12
|
+
## Match signals
|
|
12
13
|
|
|
13
|
-
|
|
14
|
-
- For ordinary active work, prefer a concise progress update once work has crossed roughly 6 tool calls with a human-meaningful delta, and do not drift beyond roughly 12 tool calls or about 8 minutes without a user-visible update.
|
|
15
|
-
- Keep ordinary subtask completions concise. When a main experiment actually finishes or reaches a stage-significant checkpoint, upgrade to a richer `artifact.interact(kind='milestone', reply_mode='threaded', ...)` report rather than another short progress line.
|
|
16
|
-
- That richer experiment-stage milestone report should normally cover: what run finished, the headline result versus baseline or expectation, the main caveat, and the exact recommended next action.
|
|
17
|
-
- That richer milestone report is still normally non-blocking. If the next route is already justified locally, continue automatically after reporting rather than idling for acknowledgment.
|
|
18
|
-
- If the active communication surface is QQ and QQ milestone media is enabled in config, a completed main experiment may attach one summary PNG to that richer milestone update.
|
|
19
|
-
- That PNG should be a connector-facing report chart, not a raw debug plot and not a draft paper figure.
|
|
20
|
-
- Do not auto-send every training curve, per-step plot, or intermediate slice image.
|
|
21
|
-
- Preferred connector-chart palettes are Morandi-like and restrained:
|
|
22
|
-
- `sage-clay`: `#E7E1D6`, `#B7A99A`, `#7F8F84` for the default QQ summary look
|
|
23
|
-
- `mist-stone`: `#F3EEE8`, `#D8D1C7`, `#8A9199` for conservative summaries
|
|
24
|
-
- `dust-rose`: `#F2E9E6`, `#D8C3BC`, `#B88C8C` for secondary comparisons only
|
|
25
|
-
- Connector-facing chart requirements:
|
|
26
|
-
- white or near-white background
|
|
27
|
-
- low saturation, no neon colors
|
|
28
|
-
- one primary accent plus one neutral comparison color whenever possible
|
|
29
|
-
- simple legend, light grid, readable labels, and no dashboard clutter
|
|
30
|
-
- summarize only the evidence needed for the milestone
|
|
31
|
-
- Default chart choice:
|
|
32
|
-
- line chart for training / budget / step trends
|
|
33
|
-
- bar chart for a small number of categorical end-point comparisons
|
|
34
|
-
- point-range chart when uncertainty or seed spread matters
|
|
35
|
-
- If the figure encodes ordered magnitude, use a sequential muted palette; if it encodes signed delta around a reference, use a diverging muted palette with a neutral midpoint.
|
|
36
|
-
- Avoid rainbow / jet-like colormaps, 3D effects, and over-annotated dashboards.
|
|
37
|
-
- If the chart may later be reused in the paper, export a vector copy (`pdf` or `svg`) alongside the connector `png`.
|
|
38
|
-
- If the figure matters beyond transient debugging, open `figure-polish/SKILL.md` and follow its render-inspect-revise workflow before treating the image as final.
|
|
39
|
-
- If plotting in Python, reuse the fixed Morandi plotting starter from the system prompt rather than inventing a new bright style for each run.
|
|
40
|
-
- If the runtime starts an auto-continue turn with no new user message, continue from the current run state, logs, artifacts, and active requirements instead of replaying the previous user turn.
|
|
41
|
-
- Progress message templates are references only. Adapt to the actual context and vary wording so messages feel human, respectful, and non-robotic.
|
|
42
|
-
- If a threaded user reply arrives, interpret it relative to the latest experiment progress update before assuming the task changed completely.
|
|
43
|
-
- Hard execution rule: every terminal command in this stage must go through `bash_exec`; do not use any other terminal path for smoke tests, real runs, Git, Python, package-manager, or file-inspection commands.
|
|
44
|
-
- Prefer `bash_exec` for experiment commands so each run gets a durable session id, quest-local log folder, and later `read/list/kill` control.
|
|
45
|
-
- For meaningful long-running runs, include the estimated next reply time or next check-in window whenever it is defensible.
|
|
14
|
+
Use `experiment` when:
|
|
46
15
|
|
|
47
|
-
|
|
16
|
+
- a baseline is accepted
|
|
17
|
+
- an idea has been selected
|
|
18
|
+
- the evaluation contract is explicit
|
|
19
|
+
- the quest is ready for implementation and measurement rather than framing, route selection, or writing
|
|
48
20
|
|
|
49
|
-
|
|
50
|
-
|
|
51
|
-
-
|
|
52
|
-
-
|
|
21
|
+
Do not use `experiment` when:
|
|
22
|
+
|
|
23
|
+
- the baseline gate is unresolved
|
|
24
|
+
- the idea stage still has unresolved tradeoffs
|
|
25
|
+
- the main need is writing or follow-up analysis rather than a main run
|
|
26
|
+
- the real problem is still route choice, baseline recovery, or open-ended optimization rather than one bounded measured run
|
|
53
27
|
|
|
54
|
-
##
|
|
28
|
+
## One-sentence summary
|
|
55
29
|
|
|
56
|
-
|
|
57
|
-
It should preserve the strongest old experiment-planning and execution discipline:
|
|
30
|
+
Turn one selected route into one trustworthy measured result with the smallest valid amount of execution, then record and route from the evidence.
|
|
58
31
|
|
|
59
|
-
|
|
60
|
-
- keep the run comparable to baseline
|
|
61
|
-
- capture configs, commands, logs, and metrics
|
|
62
|
-
- report both success and failure honestly
|
|
63
|
-
- route the next action through an explicit decision
|
|
32
|
+
## Quick workflow
|
|
64
33
|
|
|
65
|
-
|
|
66
|
-
|
|
67
|
-
|
|
68
|
-
|
|
69
|
-
|
|
70
|
-
If
|
|
34
|
+
- Recover the selected idea, accepted baseline, metric contract, and current workspace before implementation.
|
|
35
|
+
- Keep the selected idea summarized in `1-2` sentences, then write a minimal code-change map before touching broad code.
|
|
36
|
+
- Define the null hypothesis, alternative hypothesis, research question, research type, research objective, experimental setup, experimental results, experimental analysis, and experimental conclusions as the run matures.
|
|
37
|
+
- Run only the checks needed to maximize valid evidence per unit time and compute.
|
|
38
|
+
- Use equivalence-preserving efficiency upgrades when they preserve baseline comparability; for `comparison_ready`, `verify-local-existing`, attach, or import should usually beat full reproduction.
|
|
39
|
+
- If an efficiency change affects baseline comparability, treat it as a real experiment change.
|
|
40
|
+
- Prefer one clean implementation pass and one real run over repeated half-runs when the route is already concrete.
|
|
41
|
+
- Implement according to the current `PLAN.md`; revise the plan before changing the route.
|
|
42
|
+
- implement according to the current `PLAN.md`
|
|
43
|
+
- Extra metrics are allowed, but missing required metrics are not.
|
|
44
|
+
- extra metrics are allowed, but missing required metrics are not
|
|
45
|
+
- If a useful non-canonical metric appears, record it as supplementary output rather than replacing the canonical comparator.
|
|
46
|
+
- In algorithm-first work, `experiment` is the execution surface of `optimize`, then results return to `optimize` or `decision` for frontier review.
|
|
47
|
+
- End with a concise `1-2` sentence outcome summary, `evaluation_summary`, `claim_update`, `baseline_relation`, `failure_mode`, and `next_action`.
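
The metric rules in this workflow (required metrics must be present, extra metrics are recorded as supplementary) could be checked mechanically before a result is recorded. The sketch below is only an illustration under assumed names: `split_metrics`, `validate_run`, and the metric keys in the usage note are hypothetical and are not part of the package.

```python
from typing import Dict, List, Mapping, Sequence, Tuple


def split_metrics(
    required: Sequence[str], reported: Mapping[str, float]
) -> Tuple[List[str], List[str]]:
    """Return (missing required keys, supplementary keys) for one run."""
    missing = [key for key in required if key not in reported]
    supplementary = sorted(key for key in reported if key not in required)
    return missing, supplementary


def validate_run(
    required: Sequence[str], reported: Mapping[str, float]
) -> Dict[str, Dict[str, float]]:
    """Reject runs with missing required metrics; keep extras separately.

    Hypothetical helper: the canonical comparator is never replaced by a
    supplementary metric, it is only reported alongside it.
    """
    missing, supplementary = split_metrics(required, reported)
    if missing:
        raise ValueError(f"missing required metrics: {missing}")
    return {
        "canonical": {key: reported[key] for key in required},
        "supplementary": {key: reported[key] for key in supplementary},
    }
```

For example, `validate_run(["accuracy"], {"accuracy": 0.9, "latency_ms": 12.0})` keeps `accuracy` as the canonical metric and files `latency_ms` under supplementary output, while a run that omits `accuracy` raises an error instead of being recorded.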
|
|
71
48
|
|
|
72
|
-
|
|
49
|
+
## Required plan and checklist
|
|
73
50
|
|
|
74
|
-
|
|
75
|
-
|
|
51
|
+
Use `PLAN.md` and `CHECKLIST.md` when the run is non-trivial, expensive, or branch/worktree-sensitive.
|
|
52
|
+
If the plan or checklist is stale, revise `PLAN.md` before spending more code or compute.
|
|
53
|
+
Keep a rolling run log or rolling durable experiment log that captures command ids, output paths, metric changes, and blocker changes.
|
|
76
54
|
|
|
77
|
-
|
|
78
|
-
After a measured result, the default next move is frontier review and optimize-side route selection rather than paper packaging.
|
|
55
|
+
The planning surface should cover:
|
|
79
56
|
|
|
80
|
-
|
|
57
|
+
- selected idea summarized in `1-2` sentences
|
|
58
|
+
- minimal code-change map
|
|
59
|
+
- experiment tier: `auxiliary/dev` or `main/test`
|
|
60
|
+
- minimum -> solid -> maximum evidence target
|
|
61
|
+
- significance-testing plan when statistical claims are likely
|
|
62
|
+
- references/main-experiment-plan-template.md
|
|
63
|
+
- references/main-experiment-checklist-template.md
|
|
64
|
+
|
|
65
|
+
Incremental-recording rule: record the run contract early, update it as evidence arrives, and do not wait until the end to reconstruct what happened.
|
|
66
|
+
|
|
67
|
+
## Control workflow
|
|
68
|
+
|
|
69
|
+
1. Lock the run contract.
|
|
70
|
+
Make explicit the research question, baseline reference, dataset/split, metric keys, stop condition, abandonment condition, and expected outputs.
|
|
71
|
+
2. Implement only the minimum hypothesis-bound change.
|
|
72
|
+
Keep the baseline read-only and avoid unrelated cleanup or hidden scope expansion.
|
|
73
|
+
3. Run a bounded smoke or pilot only when the command path, output schema, or evaluator wiring is still unverified.
|
|
74
|
+
4. Execute and monitor the real run honestly.
|
|
75
|
+
Preserve commands, configs, logs, outputs, comparability, and the last-known-good state.
|
|
76
|
+
5. Validate and record the result.
|
|
77
|
+
Check metric completeness and comparability, then call `artifact.record_main_experiment(...)` and choose the next route.
|
|
78
|
+
|
|
79
|
+
## AVOID / pitfalls
|
|
80
|
+
|
|
81
|
+
- Do not confuse smoke or pilot success with main evidence.
|
|
82
|
+
- Do not silently change dataset, split, metric definition, evaluator logic, or baseline comparison recipe.
|
|
83
|
+
- Do not retry without a real route, code, command, environment, or evidence change.
|
|
84
|
+
- Do not claim success before durable outputs exist and `artifact.record_main_experiment(...)` succeeds.
|
|
85
|
+
- Do not record a durable main experiment from an idea branch, quest root branch, or paper branch as if that were the final result node.
|
|
86
|
+
- Do not disguise idea search or route revision as a routine rerun.
|
|
87
|
+
- Do not keep rerunning after the next route is already clear.
|
|
88
|
+
|
|
89
|
+
## Constraints
|
|
90
|
+
|
|
91
|
+
- All smoke tests, real runs, shell, CLI, Python, bash, node, git, npm, uv, and environment work must go through `bash_exec(...)`.
|
|
92
|
+
- For git work inside the current quest repository or worktree, prefer `artifact.git(...)` before raw shell git commands.
|
|
93
|
+
- Keep the accepted baseline reference read-only.
|
|
94
|
+
- If `active_baseline_metric_contract_json` exists, required baseline metric keys must still be covered unless a concrete deviation is durably recorded.
|
|
95
|
+
- Durable main experiments should land on their own `run/*` branch or an equivalent isolated run surface.
|
|
96
|
+
- If an active paper line or selected outline already exists, a recorded main experiment should be synchronized into the current paper contract instead of living only as a run artifact.
|
|
97
|
+
- In algorithm-first work, after each main run, return to `optimize` or `decision` for frontier review before launching another large run.
|
|
98
|
+
- Main-run evidence is not complete until `artifact.record_main_experiment(...)` succeeds.
|
|
99
|
+
|
|
100
|
+
## Validation
|
|
81
101
|
|
|
82
|
-
|
|
102
|
+
Before `experiment` can end, all applicable checks should be true:
|
|
83
103
|
|
|
84
|
-
|
|
85
|
-
|
|
86
|
-
|
|
87
|
-
|
|
88
|
-
|
|
89
|
-
|
|
90
|
-
|
|
104
|
+
- outputs correspond to the intended code and config
|
|
105
|
+
- required metric keys are present and finite
|
|
106
|
+
- baseline comparison is still comparable, or the deviation is explicit
|
|
107
|
+
- the claim is classified as `supported`, `refuted`, or `inconclusive`
|
|
108
|
+
- the run manifest includes exact command, config, seed, and environment snapshot
|
|
109
|
+
- `evaluation_summary` exists with the six stable fields the next stage needs
|
|
110
|
+
- if a paper line is active, the run is visible through the current paper contract rows rather than only through the run artifact
|
|
111
|
+
- `artifact.record_main_experiment(...)` succeeded
|
|
112
|
+
- the next route is explicit
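
A few of these checks are mechanical enough to script. The following is a minimal sketch, not part of the skill itself: `check_run` and its arguments are hypothetical names, and it covers only the "required metric keys are present and finite" and claim-classification items.

```python
import math

# Verdict labels taken from the checklist above.
ALLOWED_VERDICTS = {"supported", "refuted", "inconclusive"}


def check_run(metrics, required_keys, verdict):
    """Return a list of problems; an empty list means these checks pass.

    `metrics` is a plain dict of metric key -> value (hypothetical shape).
    Extra keys are ignored, because extra metrics are allowed.
    """
    problems = []
    for key in required_keys:
        value = metrics.get(key)
        if value is None:
            problems.append(f"missing required metric: {key}")
        elif (
            isinstance(value, bool)
            or not isinstance(value, (int, float))
            or not math.isfinite(value)
        ):
            problems.append(f"non-numeric or non-finite metric: {key}={value!r}")
    if verdict not in ALLOWED_VERDICTS:
        problems.append(f"unclassified claim verdict: {verdict!r}")
    return problems
```

An empty result only says that these two items hold; comparability, the run manifest, and the recording call still need their own checks.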
|
|
113
|
+
|
|
114
|
+
## Interaction discipline
|
|
115
|
+
|
|
116
|
+
Follow the shared interaction contract injected by the system prompt.
|
|
117
|
+
Keep run updates brief unless the measured result, blocker state, or next route changed materially.
|
|
118
|
+
For ordinary active work, prefer a concise progress update once work has crossed roughly 6 tool calls with a human-meaningful delta, and do not drift beyond roughly 12 tool calls or about 8 minutes without a user-visible update.
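
The cadence rule can be read as a small predicate. This is an illustrative sketch only: the function name and the idea of counting since the last user-visible update are assumptions, and the thresholds are the "roughly" values quoted above rather than hard limits.

```python
# Approximate thresholds from the rule above; treat them as soft guidance.
SOFT_TOOL_CALLS = 6
HARD_TOOL_CALLS = 12
HARD_MINUTES = 8


def update_due(tool_calls, minutes_elapsed, meaningful_delta):
    """Decide whether a user-visible update is due.

    `tool_calls` and `minutes_elapsed` are counted since the last
    user-visible update (an assumption of this sketch).
    """
    # Hard ceiling: do not drift past ~12 tool calls or ~8 minutes.
    if tool_calls >= HARD_TOOL_CALLS or minutes_elapsed >= HARD_MINUTES:
        return True
    # Soft trigger: ~6 tool calls, but only with something worth reporting.
    return tool_calls >= SOFT_TOOL_CALLS and meaningful_delta
```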
|
|
119
|
+
|
|
120
|
+
## Tool discipline
|
|
121
|
+
|
|
122
|
+
- **Do not use native `shell_command` / `command_execution` in this skill.**
|
|
123
|
+
- **All smoke tests, real runs, shell, CLI, Python, bash, node, git, npm, uv, and environment work must go through `bash_exec(...)`.**
|
|
124
|
+
- **For git work inside the current quest repository or worktree, prefer `artifact.git(...)` before raw shell git commands.**
|
|
125
|
+
- **If a scratch repository or isolated test environment is needed, create and drive it through `bash_exec(...)`, not native shell tools.**
|
|
91
126
|
|
|
92
127
|
## Non-negotiable rules
|
|
93
128
|
|
|
@@ -99,78 +134,7 @@ Treat this as the short run-order summary. The detailed run contract, execution
|
|
|
99
134
|
- Implement the claimed mechanism, not a convenient shortcut that changes the theory.
|
|
100
135
|
- Keep the baseline reference read-only.
|
|
101
136
|
- Avoid asking the user to fix the environment unless there is no credible agent-side path left.
|
|
102
|
-
-
|
|
103
|
-
- After each `artifact.record_main_experiment(...)`, route from the measured result:
|
|
104
|
-
- if paper mode is enabled, decide whether to strengthen evidence, analyze, or write
|
|
105
|
-
- if paper mode is disabled, prefer iterate / revise-idea / branch over default writing
|
|
106
|
-
- In algorithm-first work, after each main run, return to `optimize` or `decision` for frontier review before launching another large run.
|
|
107
|
-
|
|
108
|
-
## Experiment mental guardrails
|
|
109
|
-
|
|
110
|
-
- Baseline reproduction is not wasted time; untrusted comparison is wasted time.
|
|
111
|
-
- Failed runs are still data when the delta and diagnosis are recorded clearly.
|
|
112
|
-
- Suspiciously good results deserve the same skepticism as obvious failures.
|
|
113
|
-
- Change less, learn more.
|
|
114
|
-
- If a retry does not add new evidence, it is budget burn rather than progress.
|
|
115
|
-
|
|
116
|
-
## Use when
|
|
117
|
-
|
|
118
|
-
- a baseline is accepted
|
|
119
|
-
- an idea has been selected
|
|
120
|
-
- the evaluation contract is explicit
|
|
121
|
-
- the quest is ready for implementation and measurement
|
|
122
|
-
|
|
123
|
-
## Do not use when
|
|
124
|
-
|
|
125
|
-
- the baseline gate is unresolved
|
|
126
|
-
- the idea stage still has unresolved tradeoffs
|
|
127
|
-
- the main need is writing or follow-up analysis rather than a main run
|
|
128
|
-
|
|
129
|
-
## Preconditions and gate
|
|
130
|
-
|
|
131
|
-
Before a main run starts, confirm:
|
|
132
|
-
|
|
133
|
-
- selected idea or hypothesis
|
|
134
|
-
- baseline reference
|
|
135
|
-
- dataset and split
|
|
136
|
-
- primary metric
|
|
137
|
-
- stop condition
|
|
138
|
-
- resource budget
|
|
139
|
-
- dedicated `run/*` target branch or isolated worktree for this exact main experiment
|
|
140
|
-
- exact output location
|
|
141
|
-
- required metric keys for acceptance
|
|
142
|
-
- minimal experiment and abandonment condition from the idea stage
|
|
143
|
-
|
|
144
|
-
If any of these are materially unknown, stop and resolve them through `decision`.
|
|
145
|
-
|
|
146
|
-
## Required plan and checklist
|
|
147
|
-
|
|
148
|
-
Before substantial implementation work or a real main run, create a quest-visible `PLAN.md` and `CHECKLIST.md`.
|
|
149
|
-
|
|
150
|
-
- Use `references/main-experiment-plan-template.md` as the canonical structure for `PLAN.md`.
|
|
151
|
-
- Use `references/main-experiment-checklist-template.md` as the canonical structure for `CHECKLIST.md`.
|
|
152
|
-
- `PLAN.md` should lead with the selected idea summarized in `1-2` sentences, put the user's explicit requirements and non-negotiable constraints first, and then make the run contract concrete: baseline and comparability rules, safe efficiency levers, code touchpoints, minimal code-change map, smoke / pilot path, full-run path, fallback options, monitoring and sleep rules, expected outputs, and a revision log.
|
|
153
|
-
- `CHECKLIST.md` is the living execution list; update it during planning, implementation, smoke testing, main execution, validation, and every material route change.
|
|
154
|
-
- If the code path, comparability contract, runtime strategy, or execution route changes materially, revise `PLAN.md` before spending more code or compute.
|
|
155
|
-
- The later `RUN.md`, `summary.md`, and artifact payloads remain required outputs, but `PLAN.md` and `CHECKLIST.md` are the canonical planning-and-control surface before and during execution.
|
|
156
|
-
- Once `PLAN.md` makes the implementation route concrete, do not keep reshaping code and commands speculatively. The normal default is one bounded smoke or pilot run and then one real run, with retries only after a documented failure, invalidity, or new evidence that changes the expected outcome.
|
|
157
|
-
|
|
158
|
-
## Working-boundary rules
|
|
159
|
-
|
|
160
|
-
Only modify the active quest workspace for this experiment line.
|
|
161
|
-
|
|
162
|
-
- treat the accepted baseline workspace as read-only
|
|
163
|
-
- do not derive branch or worktree assumptions from guesswork
|
|
164
|
-
- keep all durable outputs inside the quest
|
|
165
|
-
- if the runtime gives an explicit worktree path, use it exactly
|
|
166
|
-
|
|
167
|
-
## Resource and environment rules
|
|
168
|
-
|
|
169
|
-
- Follow the explicit resource assignment if one exists.
|
|
170
|
-
- If GPU assignment is explicit, respect it exactly and record it in the run manifest.
|
|
171
|
-
- Do not silently consume extra GPUs or broaden resource scope.
|
|
172
|
-
- Capture enough environment information that the run can later be reconstructed.
|
|
173
|
-
- If a new dependency appears necessary, record it as a risk and prefer a fallback if possible.
|
|
137
|
+
- After each `artifact.record_main_experiment(...)`, route from the measured result instead of stopping at “run finished”.
|
|
174
138
|
|
|
175
139
|
## Truth sources
|
|
176
140
|
|
|
@@ -193,424 +157,45 @@ Do not claim run success without durable outputs.
|
|
|
193
157
|
A meaningful experiment pass should leave behind:
|
|
194
158
|
|
|
195
159
|
- a run directory under `artifacts/experiment/<run_id>/` or the quest-equivalent canonical location
|
|
196
|
-
- `artifact_manifest.json`
|
|
197
|
-
- `run_manifest.json`
|
|
198
|
-
- `metrics.json`
|
|
199
|
-
- `metrics.md`
|
|
200
|
-
- `summary.md`
|
|
201
|
-
- `runlog.summary.md`
|
|
202
160
|
- durable command, config, and log pointers
|
|
203
161
|
- exported shell log, typically `bash.log`
|
|
204
162
|
- a run artifact with explicit deltas versus baseline
|
|
205
163
|
- a decision about what should happen next
|
|
206
164
|
|
|
207
|
-
|
|
208
|
-
|
|
209
|
-
- `claim_validation.md`
|
|
210
|
-
- environment snapshot files such as:
|
|
211
|
-
- Python version
|
|
212
|
-
- package freeze
|
|
213
|
-
- GPU info when applicable
|
|
214
|
-
- a live execution note or rolling run log when the experiment spans multiple implementation or execution steps
|
|
215
|
-
|
|
216
|
-
`run_manifest.json` should capture at least:
|
|
217
|
-
|
|
218
|
-
- `run_id`
|
|
219
|
-
- quest / branch context
|
|
220
|
-
- baseline reference or commit
|
|
221
|
-
- full commands
|
|
222
|
-
- config paths and key resolved hyperparameters
|
|
223
|
-
- dataset identifier or version
|
|
224
|
-
- seeds
|
|
225
|
-
- environment snapshot paths
|
|
226
|
-
- start time, end time, and final status
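
As a rough illustration of the list above, a manifest could be assembled as a plain dictionary before it is written to `run_manifest.json`. Only `run_id` is a key name taken from the text; every other key and the `build_run_manifest` helper are hypothetical, chosen to mirror the bullets one for one.

```python
import json


def build_run_manifest(*, run_id, branch, baseline_ref, commands, config_paths,
                       hyperparameters, dataset, seeds, env_snapshot_paths,
                       started_at, ended_at, status):
    """Collect the minimum manifest fields into a JSON-serializable dict."""
    if not commands:
        raise ValueError("a run manifest needs at least one full command")
    return {
        "run_id": run_id,
        "branch": branch,                      # quest / branch context
        "baseline_ref": baseline_ref,          # baseline reference or commit
        "commands": list(commands),            # full commands, not abbreviations
        "config_paths": list(config_paths),
        "hyperparameters": dict(hyperparameters),  # key resolved values
        "dataset": dataset,                    # identifier or version
        "seeds": list(seeds),
        "env_snapshot_paths": list(env_snapshot_paths),
        "started_at": started_at,
        "ended_at": ended_at,
        "status": status,                      # final status
    }


def dump_manifest(manifest):
    """Serialize with stable key order so diffs between runs stay readable."""
    return json.dumps(manifest, indent=2, sort_keys=True)
```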
|
|
165
|
+
For the exact run-manifest fields, checklist template, and detailed recording contract, use the references listed below.
|
|
227
166
|
|
|
228
|
-
|
|
167
|
+
## Evidence ladder note
|
|
229
168
|
|
|
230
|
-
|
|
231
|
-
|
|
232
|
-
### 1. Define the run contract
|
|
233
|
-
|
|
234
|
-
Before implementation or execution, state:
|
|
235
|
-
|
|
236
|
-
- `run_id`
|
|
237
|
-
- experiment tier: `auxiliary/dev` or `main/test`
|
|
238
|
-
- research question
|
|
239
|
-
- null hypothesis
|
|
240
|
-
- alternative hypothesis
|
|
241
|
-
- hypothesis
|
|
242
|
-
- baseline id or variant
|
|
243
|
-
- metric targets
|
|
244
|
-
- expected changed files
|
|
245
|
-
- expected outputs
|
|
246
|
-
- stop condition
|
|
247
|
-
- compute or runtime budget
|
|
248
|
-
- minimal experiment
|
|
249
|
-
- abandonment condition
|
|
250
|
-
- strongest alternative hypothesis
|
|
251
|
-
- exact metric keys that will decide success or failure
|
|
252
|
-
|
|
253
|
-
Prefer to write this contract first in `PLAN.md` using `references/main-experiment-plan-template.md`, then keep the current execution state visible in `CHECKLIST.md` using `references/main-experiment-checklist-template.md`.
|
|
254
|
-
|
|
255
|
-
For substantial runs, also record the following seven experiment fields early and keep them updated during execution:
|
|
256
|
-
|
|
257
|
-
1. research question
|
|
258
|
-
2. research type
|
|
259
|
-
3. research objective
|
|
260
|
-
4. experimental setup
|
|
261
|
-
5. experimental results
|
|
262
|
-
6. experimental analysis
|
|
263
|
-
7. experimental conclusions
|
|
264
|
-
|
|
265
|
-
If the run contract changes materially later, record the change durably.
|
|
266
|
-
|
|
267
|
-
Treat the run contract as a research question contract, not only an execution checklist.
|
|
268
|
-
Before coding, be able to explain:
|
|
269
|
-
|
|
270
|
-
- why this run is the best current route rather than the main alternatives
|
|
271
|
-
- what observation would count as a real answer to the research question
|
|
272
|
-
- what result would force a downgrade, retry, or route change
|
|
273
|
-
- what confounder would make the run non-comparable even if it finishes successfully
|
|
274
|
-
|
|
275
|
-
If multiple candidate experiment packages exist, prefer the one with the best balance of:
|
|
276
|
-
|
|
277
|
-
- technical feasibility
|
|
278
|
-
- research importance
|
|
279
|
-
- methodological rigor
|
|
280
|
-
|
|
281
|
-
Do not choose a package only because it sounds ambitious.
|
|
282
|
-
|
|
283
|
-
For paper-facing lines, default to this evidence ladder:
|
|
169
|
+
Use `references/evidence-ladder.md` when deciding whether the current package is merely executable, solid enough to carry the main claim, or already in the stage where broader polish is justified.
|
|
284
170
|
|
|
285
|
-
|
|
286
|
-
- clarify parameters, settings, mechanisms, or diagnostics
|
|
287
|
-
- `main/test`
|
|
288
|
-
- carry the core comparison the paper will rely on
|
|
289
|
-
- `minimum -> solid -> maximum`
|
|
290
|
-
- first make the result executable and comparable
|
|
291
|
-
- then make it strong enough to carry the claim
|
|
292
|
-
- only then spend effort on broader supporting polish
|
|
171
|
+
The default ladder is:
|
|
293
172
|
|
|
294
|
-
|
|
173
|
+
- `minimum`: executable and comparable
|
|
174
|
+
- `solid`: strong enough to carry the main claim
|
|
175
|
+
- `maximum`: broader supporting polish after the main claim is already credible
|
|
295
176
|
|
|
296
|
-
|
|
177
|
+
Do not spend for `maximum` before the line is at least `solid`.
|
|
297
178
|
|
|
298
|
-
|
|
299
|
-
- confirm the baseline metrics reference
|
|
300
|
-
- if durable state exposes `active_baseline_metric_contract_json`, read that JSON file before planning commands or comparisons
|
|
301
|
-
- treat `active_baseline_metric_contract_json` as the default authoritative baseline comparison contract unless you record a concrete reason to override it
|
|
302
|
-
- confirm the selected idea claim and code-level plan
|
|
303
|
-
- look up prior incidents or repeated failure patterns when available
|
|
304
|
-
- confirm output directories and naming
|
|
305
|
-
- confirm that the intended run still matches the current quest decision
|
|
179
|
+
## Planning note
|
|
306
180
|
|
|
307
|
-
|
|
181
|
+
Use quest or workspace planning files only when they help control a non-trivial run.
|
|
182
|
+
Otherwise keep the run contract small and move to the first decisive execution step.
|
|
308
183
|
|
|
309
|
-
|
|
310
|
-
|
|
311
|
-
- the baseline verification is trustworthy enough
|
|
312
|
-
- the planned comparison still uses the same metric contract
|
|
313
|
-
- the metric keys and primary metric still match `active_baseline_metric_contract_json` when that file is available
|
|
314
|
-
- every main experiment submission still covers all required baseline metric ids from `active_baseline_metric_contract_json`; extra metrics are allowed, but missing required metrics are not
|
|
315
|
-
- the required baseline metrics still use the same evaluation code and metric definitions; if an extra evaluator is genuinely necessary, record it as supplementary output rather than replacing the canonical comparator
|
|
316
|
-
- if the run is `main/test` and superiority is likely to be claimed, define the significance-testing plan before execution rather than after seeing the numbers
|
|
317
|
-
- if `Result/metric.md` was used during the run, treat it as optional scratch memory only and reconcile it against the final submitted metrics before `artifact.record_main_experiment(...)`
|
|
318
|
-
|
|
319
|
-
Before you begin a substantial run, send a concise threaded `artifact.interact(kind='progress', ...)` update naming:
|
|
184
|
+
## Operational guidance
|
|
320
185
|
|
|
321
|
-
|
|
322
|
-
|
|
323
|
-
- the expected durable outputs
|
|
324
|
-
- the next checkpoint for reporting back
|
|
186
|
+
The main skill keeps the control surface in front.
|
|
187
|
+
For the longer operational notes, read the references:
|
|
325
188
|
|
|
326
|
-
|
|
189
|
+
- `references/main-experiment-plan-template.md`
|
|
190
|
+
- `references/main-experiment-checklist-template.md`
|
|
191
|
+
- `references/execution-playbook.md`
|
|
192
|
+
- `references/operational-guidance.md`
|
|
327
193
|
|
|
328
|
-
|
|
329
|
-
|
|
330
|
-
- two retries in a row add no new evidence or no interpretable delta
|
|
331
|
-
- the baseline gap is much larger than expected and the cause is unclear
|
|
332
|
-
- the metrics are suspiciously strong, suspiciously identical to baseline, or highly unstable
|
|
333
|
-
- logs, checkpoints, or intermediate outputs conflict with the claimed behavior
|
|
334
|
-
|
|
335
|
-
In diagnosis mode:
|
|
336
|
-
|
|
337
|
-
- stop brute-force retrying
|
|
338
|
-
- prefer the smallest discriminative test that can separate competing hypotheses
|
|
339
|
-
- resolve obvious environment or data-contract issues before launching another comparison run
|
|
340
|
-
- make the diagnosis goal explicit: explain the behavior, not just "try something else"
|
|
341
|
-
|
|
342
|
-
### 3. Confirm the execution workspace
|
|
343
|
-
|
|
344
|
-
The normal experiment workspace is the current active idea worktree returned by `artifact.submit_idea(...)`.
|
|
345
|
-
|
|
346
|
-
- do not create a fresh manual branch for the main experiment unless recovery or debugging truly requires it
|
|
347
|
-
- implement and run inside the current active idea workspace
|
|
348
|
-
- if the idea package changes materially before execution, submit a new durable idea branch with `artifact.submit_idea(mode='create', lineage_intent='continue_line'|'branch_alternative', ...)` instead of silently mutating the old node
|
|
349
|
-
- after a real main run finishes, record it with `artifact.record_main_experiment(...)` before moving to analysis or writing
|
|
350
|
-
- once that durable main result exists, treat the branch as a fixed round node; a later new optimization round should usually compare foundations and create a new `continue_line` child branch or `branch_alternative` sibling-like branch
|
|
351
|
-
- after `artifact.record_main_experiment(...)`, if QQ milestone media is enabled and the metrics are stable enough to summarize honestly, prefer one concise summary PNG over multiple attachments
|
|
352
|
-
|
|
353
|
-
### 4. Implement the minimum required change
|
|
354
|
-
|
|
355
|
-
Implementation rules:
|
|
356
|
-
|
|
357
|
-
- keep the change hypothesis-bound
|
|
358
|
-
- prefer small, explainable edits
|
|
359
|
-
- avoid unrelated cleanup during a main run
|
|
360
|
-
- record which files matter for later review
|
|
361
|
-
- preserve theory fidelity between the idea claim and the code change
|
|
362
|
-
- add robustness checks when the mechanism risks NaN, inf, or unstable behavior
|
|
363
|
-
- implement according to the current `PLAN.md` instead of repeatedly improvising a new method after each small observation
|
|
364
|
-
- avoid repeated code churn between the smoke test and the real run unless the smoke test exposes a specific problem that the next change is meant to fix
|
|
365
|
-
|
|
366
|
-
Prefer to complete one experiment cleanly before expanding to the next, unless parallel execution is explicitly justified and isolated.
|
|
367
|
-
For substantial experiment packages, the default is one experiment at a time, with each one reaching a recoverable recorded state before the next begins.
|
|
194
|
+
Use them when:
|
|
368
195
|
|
|
369
|
-
|
|
370
|
-
|
|
371
|
-
-
|
|
372
|
-
- if broader recovery is unavoidable, record exactly which layer changed: data, preprocessing, model, objective, optimization, evaluation, or environment
|
|
373
|
-
- before each retry, state the expected effect and the fastest falsification signal
|
|
374
|
-
- if the retry produced no interpretable delta, do not treat it as meaningful evidence about the underlying research hypothesis
|
|
375
|
-
|
|
376
|
-
### 5. Execute the run
|
|
377
|
-
|
|
378
|
-
Run with auditable commands and durable outputs.
|
|
379
|
-
|
|
380
|
-
Execution rules:
|
|
381
|
-
|
|
382
|
-
- use non-interactive commands
|
|
383
|
-
- prefer `bash_exec` instead of ephemeral shell invocations
|
|
384
|
-
- use the intended dataset and split
|
|
385
|
-
- keep logs durable
|
|
386
|
-
- report progress for long runs
|
|
387
|
-
- avoid silent metric-definition changes
|
|
388
|
-
- do not drift away from `active_baseline_metric_contract_json` silently when that file exists
|
|
389
|
-
- avoid silently changing the baseline comparison recipe
|
|
390
|
-
- run the full agreed evaluation, not only a smoke test
|
|
391
|
-
|
|
392
|
-
You may do a quick sanity run first, but if the stage goal is a real experiment you must continue to the real evaluation unless the run is blocked and recorded.
|
|
393
|
-
|
|
394
|
-
Pilot-before-scale rule:
|
|
395
|
-
|
|
396
|
-
- start with a bounded pilot when the modification is non-trivial
|
|
397
|
-
- use the pilot to catch implementation mistakes early
|
|
398
|
-
- record pilot outcomes explicitly
|
|
399
|
-
- do not mistake pilot success for final evidence
|
|
400
|
-
|
|
401
|
-
Incremental-recording rule:
|
|
402
|
-
|
|
403
|
-
- do not wait until the end to reconstruct the run from memory
|
|
404
|
-
- update the durable run note after:
|
|
405
|
-
- contract definition
|
|
406
|
-
- important code changes
|
|
407
|
-
- pilot validation
|
|
408
|
-
- full execution checkpoints
|
|
409
|
-
- post-run analysis
|
|
410
|
-
- update `CHECKLIST.md` alongside those durable notes so the current execution frontier is obvious without replaying the whole log
|
|
411
|
-
- include timestamps when they materially help reconstruction
|
|
412
|
-
- preserve failed attempts, anomalies, and partial outcomes rather than overwriting them
|
|
413
|
-
|
|
414
|
-
Last-known-good rule:
|
|
415
|
-
|
|
416
|
-
- keep track of the most recent state that was executable, comparable, and explainable
|
|
417
|
-
- when a new attempt breaks that state, debug forward from the last-known-good point instead of stacking more speculative edits on top of the broken state
|
|
418
|
-
- if the last-known-good state is unclear, reconstruct it before spending more budget on new hypotheses
|
|
419
|
-
|
|
420
|
-
### 5.1 Long-running command protocol
|
|
421
|
-
|
|
422
|
-
For commands that may run longer than a few minutes:
|
|
423
|
-
|
|
424
|
-
- before the real long run, execute a bounded smoke test or pilot that validates command paths, outputs, and basic metrics
|
|
425
|
-
- once the smoke test passes, launch the real run with `bash_exec(mode='detach', ...)` and normally leave `timeout_seconds` unset for that long run
|
|
426
|
-
- monitor through durable logs rather than only live terminal output
|
|
427
|
-
- `bash_exec(mode='read', id=...)` returns the full rendered log when it is 2000 lines or fewer; for longer logs it returns the first 500 lines plus the last 1500 lines and a hint to inspect omitted sections with `start` and `tail`
|
|
428
|
-
- if the middle of a long saved log matters, inspect that omitted region with `bash_exec(mode='read', id=..., start=..., tail=...)`
|
|
429
|
-
- use `bash_exec(mode='list')` and `bash_exec(mode='read', id=..., tail_limit=..., order='desc')` to monitor or revisit managed commands while focusing on the newest evidence first
|
|
430
|
-
- after the first read, prefer `bash_exec(mode='read', id=..., after_seq=last_seen_seq, tail_limit=..., order='asc')` so later checks only fetch new evidence
|
|
431
|
-
- if you need to recover ids or sanity-check the active session ordering, use `bash_exec(mode='history')`
|
|
432
|
-
- launch important runs with a structured `comment` such as `{stage, goal, action, expected_signal, next_check}`
|
|
433
|
-
- use `silent_seconds`, `progress_age_seconds`, `signal_age_seconds`, and `watchdog_overdue` from `bash_exec(mode='list'|'read', ...)` as your default watchdog signals
|
|
434
|
-
- use an explicit wait-and-check loop such as:
|
|
435
|
-
- wait about `60s`, then inspect logs
|
|
436
|
-
- wait about `120s`, then inspect logs
|
|
437
|
-
- wait about `300s`, then inspect logs
|
|
438
|
-
- wait about `600s`, then inspect logs
|
|
439
|
-
- wait about `1800s`, then inspect logs
|
|
440
|
-
- then keep checking about every `1800s` while the run is still active
|
|
441
|
-
- if needed, use an explicit bounded wait such as `bash_exec(command='sleep 60', mode='await', timeout_seconds=70)` or `bash_exec(mode='await', id=..., timeout_seconds=...)` between checks
|
|
442
|
-
- canonical sleep choice:
|
|
443
|
-
- if you only need wall-clock waiting between checks, use `bash_exec(command='sleep N', mode='await', timeout_seconds=N+buffer, ...)`
|
|
444
|
-
- keep a real buffer on that sleep timeout; do not set `timeout_seconds` exactly equal to `N`
|
|
445
|
-
- if you are waiting on an already running managed session, prefer `bash_exec(mode='await', id=..., timeout_seconds=...)` instead of starting a new sleep command
|
|
446
|
-
- after every completed sleep / await cycle, inspect logs first; only send `artifact.interact(kind='progress', ...)` when the user-visible state, frontier, blocker status, or ETA materially changed
|
|
447
|
-
- after the first meaningful signal and then at real checkpoints (e.g., completion, recovery, blocker, or a materially widened comparable surface), keep those progress updates going rather than waiting silently
|
|
448
|
-
- if the run is clearly invalid, wedged, or superseded, stop it with `bash_exec(mode='kill', id=..., wait=true, timeout_seconds=...)`; if it must die immediately, add `force=true`, record the reason, fix the issue, and relaunch cleanly
|
|
449
|
-
- do not report completion until logs and output files both confirm completion
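
The timing rules in this protocol can be sketched in plain Python. Nothing here calls `bash_exec`; the helper names are hypothetical, the default buffer of 10 seconds is taken from the `sleep 60` / `timeout_seconds=70` example above, and the 1-based line numbering in `omitted_range` is an assumption.

```python
from itertools import chain, islice, repeat

# Wait-and-check intervals in seconds, as listed in the protocol above.
INITIAL_WAITS = (60, 120, 300, 600, 1800)
STEADY_WAIT = 1800


def wait_schedule():
    """Yield the ramp-up waits once, then the steady interval forever."""
    return chain(INITIAL_WAITS, repeat(STEADY_WAIT))


def sleep_timeout(seconds, buffer=10):
    """Timeout for a `sleep N` wait: always N plus a real buffer, never N."""
    if buffer <= 0:
        raise ValueError("the timeout needs a positive buffer over the sleep")
    return seconds + buffer


def omitted_range(total_lines, head=500, tail=1500):
    """Return the (first, last) omitted line numbers of a default log read.

    Logs of `head + tail` lines or fewer are shown in full, so the
    result is None in that case.
    """
    if total_lines <= head + tail:
        return None
    return (head + 1, total_lines - tail)


# First seven waits of a long run.
first_waits = list(islice(wait_schedule(), 7))
```

`omitted_range` only indicates which region a follow-up read with `start` and `tail` would need to cover; it does not reproduce the tool's output.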
|
|
450
|
-
|
|
451
|
-
Always preserve the managed `bash_exec` log and export it into the experiment artifact directory when the run artifact is written.
|
|
452
|
-
|
|
453
|
-
### 5.2 Progress marker protocol
|
|
454
|
-
|
|
455
|
-
Long loops should emit structured progress markers rather than noisy raw progress bars.
|
|
456
|
-
|
|
457
|
-
- use single-line JSON progress markers
|
|
458
|
-
- keep them throttled
|
|
459
|
-
- treat them as UI signals, not narrative prose
|
|
460
|
-
- do not paste raw progress lines into summaries
|
|
461
|
-
- when possible include `eta` in seconds and `next_reply_at` or `next_check_at` so web/TUI can show the next expected update
|
|
462
|
-
|
|
463
|
-
If you control the code, prefer a throttled `tqdm`-style progress reporter for the run itself and pair it with concise structured `__DS_PROGRESS__` lines when feasible so monitoring remains machine-readable.
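
A throttled reporter along these lines might look like the sketch below. The marker name `__DS_PROGRESS__` and the `eta` field come from the text above; the exact line layout (marker, one space, compact JSON), the `step` and `total` fields, and the `ProgressReporter` class are assumptions made for illustration.

```python
import json
import time

PREFIX = "__DS_PROGRESS__"


class ProgressReporter:
    """Emit single-line JSON progress markers at a bounded rate."""

    def __init__(self, min_interval=30.0, clock=time.monotonic, emit=print):
        self._min_interval = min_interval
        self._clock = clock      # injectable so the throttle can be tested
        self._emit = emit        # defaults to stdout
        self._last = None

    def report(self, step, total, eta_seconds, force=False):
        """Emit a marker unless one was emitted too recently.

        Returns True when a line was written, False when throttled.
        """
        now = self._clock()
        throttled = (
            self._last is not None and now - self._last < self._min_interval
        )
        if throttled and not force:
            return False
        self._last = now
        payload = {"step": step, "total": total, "eta": eta_seconds}
        # Compact separators keep the marker on one short line.
        self._emit(f"{PREFIX} {json.dumps(payload, separators=(',', ':'))}")
        return True
```

Passing `force=True` on the final step ensures the last state is visible even when the throttle window has not elapsed.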
|
|
464
|
-
|
|
465
|
-
### 6. Validate the outputs
|
|
466
|
-
|
|
467
|
-
After the run, verify:
|
|
468
|
-
|
|
469
|
-
- outputs correspond to the intended code/config
|
|
470
|
-
- metrics are complete and interpretable
|
|
471
|
-
- comparison to baseline is fair
|
|
472
|
-
- any failure mode or confounder is visible
|
|
473
|
-
- required metric keys are present and finite
|
|
474
|
-
- the result can be mapped back to the original claim
|
|
475
|
-
- the summary states a clear go or no-go recommendation
|
|
476
|
-
|
|
477
|
-
Create a durable claim-validation record that maps:
|
|
478
|
-
|
|
479
|
-
- claim
|
|
480
|
-
- metric key
|
|
481
|
-
- expected direction
|
|
482
|
-
- observed result
|
|
483
|
-
- verdict:
|
|
484
|
-
- `supported`
|
|
485
|
-
- `refuted`
|
|
486
|
-
- `inconclusive`
|
|
487
|
-
|
|
488
|
-
Also verify baseline comparability before claiming deltas:
|
|
489
|
-
|
|
490
|
-
- was the baseline verification stable?
|
|
491
|
-
- was the evaluation path the same?
|
|
492
|
-
- are the compared metric keys identical?
|
|
493
|
-
- if the run is claim-carrying, are the significance results or uncertainty estimates strong enough for main-text use?
|
|
494
|
-
- do known caveats make the delta weaker than it first appears?
|
|
495
|
-
|
|
496
|
-
### 7. Record the run
|
|
497
|
-
|
|
498
|
-
Every meaningful main run must be recorded through `artifact.record_main_experiment(...)`.
|
|
499
|
-
|
|
500
|
-
That call is responsible for writing:
|
|
501
|
-
|
|
502
|
-
- `experiments/main/<run_id>/RUN.md`
|
|
503
|
-
- `experiments/main/<run_id>/RESULT.json`
|
|
504
|
-
- the durable `run` artifact payload
|
|
505
|
-
- baseline comparisons
|
|
506
|
-
- breakthrough status derived by the system
|
|
507
|
-
|
|
508
|
-
`artifact.record_main_experiment(...)` should include at least:
|
|
509
|
-
|
|
510
|
-
- `run_id`
|
|
511
|
-
- title
|
|
512
|
-
- hypothesis
|
|
513
|
-
- setup
|
|
514
|
-
- execution
|
|
515
|
-
- results
|
|
516
|
-
- conclusion
|
|
517
|
-
- baseline reference
|
|
518
|
-
- `metrics_summary`
|
|
519
|
-
- `metric_rows` when available
|
|
520
|
-
- the metric contract actually used
|
|
521
|
-
- verdict
|
|
522
|
-
- evidence paths
|
|
523
|
-
- changed files
|
|
524
|
-
- relevant config paths when applicable
|
|
525
|
-
- `evaluation_summary` with exactly these six fields:
|
|
526
|
-
- `takeaway`
|
|
527
|
-
- `claim_update`
|
|
528
|
-
- `baseline_relation`
|
|
529
|
-
- `comparability`
|
|
530
|
-
- `failure_mode`
|
|
531
|
-
- `next_action`
|
|
532
|
-
|
|
533
|
-
Use `evaluation_summary` as the short structured judgment layer on top of the longer narrative fields:
|
|
534
|
-
|
|
535
|
-
- `takeaway`: one sentence the next reader can reuse directly
|
|
536
|
-
- `claim_update`: `strengthens`, `weakens`, `narrows`, or `neutral`
|
|
537
|
-
- `baseline_relation`: `better`, `worse`, `mixed`, or `not_comparable`
|
|
538
|
-
- `comparability`: `high`, `medium`, or `low`
|
|
539
|
-
- `failure_mode`: `none`, `implementation`, `evaluation`, `environment`, or `direction`
|
|
540
|
-
- `next_action`: the immediate route such as `continue`, `revise_idea`, `analysis_campaign`, `write`, or `stop`
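
Because the field set and most value vocabularies are fixed, an `evaluation_summary` can be checked before it is submitted. The sketch below is a hypothetical helper, not part of the `artifact` API; it treats `next_action` as open-ended because the text lists its values only as examples.

```python
# The six field names listed above.
FIELDS = (
    "takeaway", "claim_update", "baseline_relation",
    "comparability", "failure_mode", "next_action",
)

# Closed vocabularies as listed above.
ALLOWED = {
    "claim_update": {"strengthens", "weakens", "narrows", "neutral"},
    "baseline_relation": {"better", "worse", "mixed", "not_comparable"},
    "comparability": {"high", "medium", "low"},
    "failure_mode": {
        "none", "implementation", "evaluation", "environment", "direction",
    },
}


def validate_evaluation_summary(summary):
    """Return a list of problems; an empty list means the summary is well-formed."""
    errors = []
    for field in FIELDS:
        if field not in summary:
            errors.append(f"missing field: {field}")
    for field in sorted(set(summary) - set(FIELDS)):
        errors.append(f"unexpected field: {field}")  # exactly six fields
    for field, allowed in ALLOWED.items():
        if field in summary and summary[field] not in allowed:
            errors.append(f"invalid {field}: {summary[field]!r}")
    for field in ("takeaway", "next_action"):
        value = summary.get(field)
        if field in summary and not (isinstance(value, str) and value.strip()):
            errors.append(f"empty {field}")
    return errors
```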
|
|
541
|
-
|
|
542
|
-
After `artifact.record_main_experiment(...)` succeeds, do not assume the same branch should absorb the next round by default.
|
|
543
|
-
Interpret the measured result first, then either:
|
|
544
|
-
|
|
545
|
-
- launch analysis from this branch, or
|
|
546
|
-
- compare candidate foundations and create the next child research branch
|
|
547
|
-
|
|
548
|
-
Use `artifact.create_analysis_campaign(...)` only when the extra slices have clear academic or claim-level value relative to their resource cost.
|
|
549
|
-
If the main need is simply to continue optimization from a measured result, prefer a new durable child idea branch instead of an expensive analysis package by reflex.
|
|
550
|
-
If the extra work should happen on an older durable branch rather than the current head, first switch the runtime back there with `artifact.activate_branch(...)`, then launch the analysis campaign from that activated workspace.
|
|
551
|
-
|
|
552
|
-
When `artifact.record_main_experiment(...)` succeeds, send a richer threaded `artifact.interact(kind='milestone', ...)` update rather than a generic one-line progress ping.
|
|
553
|
-
Lead that milestone with a concise `1-2` sentence outcome summary before expanding into more detail.
|
|
554
|
-
That milestone should state:
|
|
555
|
-
|
|
556
|
-
- the research question that was tested
|
|
557
|
-
- the primary result and baseline delta
|
|
558
|
-
- whether the run supports the idea, weakens it, or leaves it inconclusive
|
|
559
|
-
- the main caveat or confidence note that still matters
|
|
560
|
-
- the exact recommended next move
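
One way to keep milestone updates consistent is to assemble the message text from named parts. The sketch below is illustrative only: `MilestoneUpdate` and its field names are hypothetical, and it builds just the message string. How that string is passed to `artifact.interact(kind='milestone', ...)` is not specified here and is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class MilestoneUpdate:
    """Hypothetical container for the five items a milestone should state."""

    summary: str            # the 1-2 sentence outcome summary that leads the message
    research_question: str  # the research question that was tested
    result_and_delta: str   # the primary result and baseline delta
    idea_status: str        # e.g. "supports", "weakens", or "inconclusive"
    caveat: str             # the main caveat or confidence note
    next_move: str          # the exact recommended next move

    def render(self) -> str:
        """Return the message text: summary first, then one bullet per item."""
        lines = [
            self.summary.strip(),
            "",
            f"- Question tested: {self.research_question}",
            f"- Result vs. baseline: {self.result_and_delta}",
            f"- Effect on the idea: {self.idea_status}",
            f"- Caveat: {self.caveat}",
            f"- Recommended next move: {self.next_move}",
        ]
        return "\n".join(lines)
```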
|
|
561
|
-
|
|
562
|
-
Do not treat a main run as durably complete until `artifact.record_main_experiment(...)` succeeds.
|
|
563
|
-
|
|
564
|
-
Recommended per-run documentation fields:
|
|
565
|
-
|
|
566
|
-
1. research question
|
|
567
|
-
2. research type
|
|
568
|
-
3. research objective
|
|
569
|
-
4. experimental setup
|
|
570
|
-
5. experimental results
|
|
571
|
-
6. experimental analysis
|
|
572
|
-
7. experimental conclusions
|
|
573
|
-
|
|
574
|
-
These seven fields should be progressively filled as the run advances, not only at final packaging time.
|
|
575
|
-
|
|
576
|
-
`RUN.md` should make it easy for later stages to answer:
|
|
577
|
-
|
|
578
|
-
- what changed?
|
|
579
|
-
- how can this run be reproduced?
|
|
580
|
-
- what are the main results?
|
|
581
|
-
- why did it work or fail?
|
|
582
|
-
- what should happen next?
|
|
583
|
-
|
|
584
|
-
When the run is analysis-heavy or meant to fill a writing evidence gap, prefer a structured summary with:
|
|
585
|
-
|
|
586
|
-
1. research question
|
|
587
|
-
2. research type
|
|
588
|
-
3. objective and success criteria
|
|
589
|
-
4. setup
|
|
590
|
-
5. results
|
|
591
|
-
6. analysis
|
|
592
|
-
7. conclusion
|
|
593
|
-
|
|
594
|
-
Recording rules:
|
|
595
|
-
|
|
596
|
-
- record results incrementally, not only at the end
|
|
597
|
-
- include timestamps when helpful
|
|
598
|
-
- include failed attempts, partial runs, and unexpected outcomes
|
|
599
|
-
- do not leave placeholder sections for later if the information is already known
|
|
600
|
-
- report exactly what happened, not what you hoped would happen
|
|
601
|
-
|
|
602
|
-
### 8. Decide the next move
|
|
603
|
-
|
|
604
|
-
The experiment stage should normally end with one of:
|
|
605
|
-
|
|
606
|
-
- continue the current line
|
|
607
|
-
- branch a new line
|
|
608
|
-
- launch an analysis campaign
|
|
609
|
-
- move to writing
|
|
610
|
-
- reset or stop
|
|
611
|
-
|
|
612
|
-
Do not let the stage end without an explicit next direction.
|
|
613
|
-
If analysis is selected, record why the expected information gain is strong enough to justify the added compute, time, or annotation budget.
|
|
196
|
+
- the run contract is non-trivial
|
|
197
|
+
- the long-running protocol or monitoring cadence matters
|
|
198
|
+
- the exact manifest, artifact, memory, or charting rules matter
|
|
614
199
|
|
|
615
200
|
## Run-quality rules
|
|
616
201
|
|
|
@@ -622,7 +207,7 @@ A credible main run should satisfy:
|
|
|
622
207
|
- outcome can be explained by the intended intervention or its failure
|
|
623
208
|
- commands, configs, and seeds are reconstructable
|
|
624
209
|
- environment context is reconstructable
|
|
625
|
-
-
|
|
210
|
+
- later readers can trace code and diff context to command, logs, and metrics
|
|
626
211
|
|
|
627
212
|
If the result is confounded, say so directly.
|
|
628
213
|
|
|
@@ -640,108 +225,6 @@ Before marking the run complete, verify all of the following:
|
|
|
640
225
|
|
|
641
226
|
If these checks fail, record the run as partial or blocked rather than pretending it is complete.
|
|
642
227
|
|
|
643
|
-
## Memory rules
|
|
644
|
-
|
|
645
|
-
Stage-start requirement:
|
|
646
|
-
|
|
647
|
-
- begin every experiment pass with `memory.list_recent(scope='quest', limit=5)`
|
|
648
|
-
- then run at least one experiment-relevant `memory.search(...)` before a new run, retry, or material execution change
|
|
649
|
-
- if several idea or experiment lines exist, narrow retrieval to the current `idea_id`, `branch`, and `run_id`; do not casually reuse memory from another idea line unless you are explicitly comparing lines
|
|
650
|
-
|
|
651
|
-
Write to memory only when the lesson is reusable, such as:
|
|
652
|
-
|
|
653
|
-
- experiment failure patterns
|
|
654
|
-
- stable implementation lessons
|
|
655
|
-
- evaluation pitfalls
|
|
656
|
-
- validated mechanism scope and caveats
|
|
657
|
-
|
|
658
|
-
The canonical record of the run itself belongs in `artifact`, not only in memory.
|
|
659
|
-
|
|
660
|
-
Preferred memory usage:
|
|
661
|
-
|
|
662
|
-
- quest `ideas`:
|
|
663
|
-
- the current idea contract and claim boundary
|
|
664
|
-
- quest `decisions`:
|
|
665
|
-
- run-scope choices
|
|
666
|
-
- retry or branch decisions
|
|
667
|
-
- stop conditions that must not drift
|
|
668
|
-
- quest `episodes`:
|
|
669
|
-
- failed runs
|
|
670
|
-
- debugging episodes
|
|
671
|
-
- suspicious-result investigations
|
|
672
|
-
- repeated infrastructure or resource failures
|
|
673
|
-
- quest `knowledge`:
|
|
674
|
-
- validated mechanism scope
|
|
675
|
-
- evaluation caveats
|
|
676
|
-
- stable implementation lessons worth reusing in later runs of this quest
|
|
677
|
-
- global `knowledge`:
|
|
678
|
-
- reusable debugging heuristics
|
|
679
|
-
- stable reproducibility lessons
|
|
680
|
-
- cross-quest experiment design playbooks
|
|
681
|
-
- global `templates`:
|
|
682
|
-
- run-manifest patterns
|
|
683
|
-
- claim-validation templates
|
|
684
|
-
- experiment summary templates
|
|
685
|
-
|
|
686
|
-
Use tags to refine retrieval when helpful, for example:
|
|
687
|
-
|
|
688
|
-
- `stage:experiment`
|
|
689
|
-
- `type:failure-pattern`
|
|
690
|
-
- `type:metric-contract`
|
|
691
|
-
- `type:claim-validation`
|
|
692
|
-
- `topic:<mechanism>`
|
|
693
|
-
|
|
694
|
-
When calling `memory.write(...)`, pass `tags` as an array like `["stage:experiment", "type:failure-pattern", "topic:<mechanism>"]`, not as one comma-joined string.
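
A small guard can catch the comma-joined mistake before the call is made. This is a sketch under assumptions: `normalize_tags` is a hypothetical helper, not part of the `memory` API, and it only prepares the value that would be passed as `tags`.

```python
def normalize_tags(tags) -> list:
    """Return tags as a clean list of strings.

    Accepts either a list of tags or, defensively, one comma-joined string,
    and returns a de-duplicated list in first-seen order.
    """
    if isinstance(tags, str):
        # A single string such as "a, b" is split into separate tags.
        tags = tags.split(",")
    cleaned = []
    for tag in tags:
        tag = tag.strip()
        if tag and tag not in cleaned:
            cleaned.append(tag)
    return cleaned
```

For example, `normalize_tags("stage:experiment, type:failure-pattern")` yields a two-element list rather than one long string.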
|
|
695
|
-
|
|
696
|
-
Recommended read timing:
|
|
697
|
-
|
|
698
|
-
- before the first run:
|
|
699
|
-
- consult quest `ideas`, `decisions`, and relevant `knowledge`
|
|
700
|
-
- before a retry:
|
|
701
|
-
- search quest `episodes` first
|
|
702
|
-
- before changing execution strategy materially:
|
|
703
|
-
- re-check quest `decisions`
|
|
704
|
-
- after suspicious results:
|
|
705
|
-
- consult recent `episodes` and stable debugging `knowledge`
|
|
706
|
-
|
|
707
|
-
Stage-end requirement:
|
|
708
|
-
|
|
709
|
-
- successful runs should leave at least one reusable knowledge note if the lesson generalizes
|
|
710
|
-
- failed or partial runs should leave an incident note when the failure pattern is reusable
|
|
711
|
-
- every experiment `memory.write(...)` must state whether the outcome was `success`, `partial`, or `failure`
|
|
712
|
-
- every experiment `memory.write(...)` should also include the current `idea_id`, `branch`, and `run_id` so later retrieval does not mix different experiment lines
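
The last two requirements can be enforced by building the note metadata in one place. The following is a minimal sketch: `build_note_metadata` and the returned dict layout are assumptions for illustration, and the real `memory.write(...)` parameters may differ.

```python
VALID_OUTCOMES = ("success", "partial", "failure")


def build_note_metadata(outcome: str, idea_id: str, branch: str, run_id: str) -> dict:
    """Return metadata for an experiment memory note, or raise ValueError.

    The outcome must be one of the three stated values, and all three
    identifiers must be non-empty so later retrieval can separate lines.
    """
    if outcome not in VALID_OUTCOMES:
        raise ValueError(f"outcome must be one of {VALID_OUTCOMES}, got {outcome!r}")
    identifiers = {"idea_id": idea_id, "branch": branch, "run_id": run_id}
    missing = [name for name, value in identifiers.items() if not value]
    if missing:
        raise ValueError(f"missing identifiers: {missing}")
    return {"outcome": outcome, **identifiers}
```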
|
|
713
|
-
|
|
714
|
-
## Artifact rules
|
|
715
|
-
|
|
716
|
-
Typical artifact sequence:
|
|
717
|
-
|
|
718
|
-
- progress artifact for long runs
|
|
719
|
-
- `artifact.record_main_experiment(...)` at main-run completion
|
|
720
|
-
- milestone or report artifact for major findings
|
|
721
|
-
- decision artifact to choose next stage
|
|
722
|
-
|
|
723
|
-
Preferred artifact choices:
|
|
724
|
-
|
|
725
|
-
- use `progress` for long-running execution updates
|
|
726
|
-
- use `artifact.record_main_experiment(...)` for each meaningful completed main experiment
|
|
727
|
-
- use `run` for analysis slice records when `artifact.record_analysis_slice(...)` writes them
|
|
728
|
-
- use `report` for:
|
|
729
|
-
- analysis-rich summaries
|
|
730
|
-
- suspicious-result investigations
|
|
731
|
-
- post-run interpretation
|
|
732
|
-
- use `milestone` when a major stage checkpoint is reached
|
|
733
|
-
- use `decision` for:
|
|
734
|
-
- continue
|
|
735
|
-
- branch
|
|
736
|
-
- analysis
|
|
737
|
-
- write
|
|
738
|
-
- reset
|
|
739
|
-
- stop
|
|
740
|
-
- use `approval` when an explicit user approval is captured for an expensive or risky run change
|
|
741
|
-
|
|
742
|
-
Use `artifact.checkpoint(...)` when code evolution is meaningful and should be preserved in Git.
|
|
743
|
-
After a meaningful experiment checkpoint or completion, emit `artifact.interact(kind='progress' | 'milestone', ...)` so the user sees the concrete result and next step.
|
|
744
|
-
|
|
745
228
|
## Failure and blocked handling
|
|
746
229
|
|
|
747
230
|
A failed main run is still useful if it is explained well.
|
|
@@ -750,7 +233,7 @@ Record:
|
|
|
750
233
|
|
|
751
234
|
- what was attempted
|
|
752
235
|
- where the failure occurred
|
|
753
|
-
- whether the failure
|
|
236
|
+
- whether the failure was methodological or infrastructural
|
|
754
237
|
- what retry, branch, or reset is justified
|
|
755
238
|
- the single best next action
|
|
756
239
|
|
|
@@ -771,31 +254,8 @@ Also classify the broader failure layer when possible:
|
|
|
771
254
|
- environment
|
|
772
255
|
- direction
|
|
773
256
|
|
|
774
|
-
|
|
775
|
-
|
|
776
|
-
|
|
777
|
-
Blocked experiment states commonly include:
|
|
778
|
-
|
|
779
|
-
- missing baseline reference
|
|
780
|
-
- unknown metric contract
|
|
781
|
-
- environment failure
|
|
782
|
-
- run failed before producing metrics
|
|
783
|
-
- metrics produced but not comparable
|
|
784
|
-
|
|
785
|
-
When results are suspicious, such as identical to baseline, implausibly perfect, or inconsistent across repeats, diagnose systematically:
|
|
786
|
-
|
|
787
|
-
1. fix the subset and seeds
|
|
788
|
-
2. isolate preprocessing, tokenization, model init, training, and evaluation one by one
|
|
789
|
-
3. compare intermediate outputs on the same inputs
|
|
790
|
-
4. align inputs first, then outputs, then metrics
|
|
791
|
-
|
|
792
|
-
Default diagnosis loop:
|
|
793
|
-
|
|
794
|
-
1. collect the concrete failure or anomaly cases
|
|
795
|
-
2. identify the last-known-good comparable state
|
|
796
|
-
3. define the smallest delta between working and broken states
|
|
797
|
-
4. write `2-4` concrete hypotheses
|
|
798
|
-
5. run the cheapest discriminative check before another full retry
|
|
257
|
+
Blocked experiment states commonly include missing baseline reference, unknown metric contract, environment failure, run failure before metrics, or metrics that are not comparable.
|
|
258
|
+
When results are suspicious, fix the subset and seeds, isolate preprocessing/model/training/evaluation one by one, compare intermediate outputs on the same inputs, and run the cheapest discriminative check before another full retry.
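
The stage-by-stage isolation described above can be sketched as a small comparison loop. Everything here is hypothetical: `divergent_stages` and the `(name, function)` stage representation are illustrative, and the sketch assumes each stage is deterministic once the subset and seeds are fixed. Both pipelines receive the same input at every stage (the reference pipeline's running value), so a difference points at that stage alone.

```python
from typing import Any, Callable, List, Sequence, Tuple

# A stage is a (name, function) pair, e.g. ("tokenize", tokenize_fn).
Stage = Tuple[str, Callable[[Any], Any]]


def divergent_stages(reference: Sequence[Stage], candidate: Sequence[Stage], sample: Any) -> List[str]:
    """Return the names of stages whose outputs differ on identical inputs."""
    if len(reference) != len(candidate):
        raise ValueError("pipelines must have the same number of stages")
    diverged = []
    value = sample
    for (ref_name, ref_fn), (cand_name, cand_fn) in zip(reference, candidate):
        if ref_name != cand_name:
            raise ValueError(f"stage mismatch: {ref_name!r} vs {cand_name!r}")
        ref_out = ref_fn(value)
        cand_out = cand_fn(value)  # same input as the reference stage
        if ref_out != cand_out:
            diverged.append(ref_name)
        value = ref_out  # keep later stages aligned on the reference output
    return diverged
```

Running this on a small fixed sample is usually far cheaper than a full retry, which is the point of checking the cheapest discriminative signal first.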
|
|
799
259
|
|
|
800
260
|
## Exit criteria
|
|
801
261
|
|
|
@@ -803,4 +263,6 @@ Exit the experiment stage once one of the following is durably true:
|
|
|
803
263
|
|
|
804
264
|
- a main run is completed and recorded
|
|
805
265
|
- the run failed and the blocker is durably recorded
|
|
806
|
-
- the next step is clearly `analysis-campaign`, `write`, another `experiment`, or `reset`
|
|
266
|
+
- the next step is clearly `analysis-campaign`, `write`, another `experiment`, `optimize`, or `reset`
|
|
267
|
+
|
|
268
|
+
A good experiment pass leaves one interpretable result or one explicit blocker, not another vague promise to rerun later.
|