@researai/deepscientist 1.5.16 → 1.6.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/AGENTS.md +309 -130
- package/AISB/catalog/aisb.b1.agentic_coding.yaml +244 -0
- package/AISB/catalog/aisb.b10.climate_earth.yaml +235 -0
- package/AISB/catalog/aisb.b11.model_efficiency.yaml +231 -0
- package/AISB/catalog/aisb.b12.embodied_ai.yaml +238 -0
- package/AISB/catalog/aisb.b2.agent_systems.yaml +229 -0
- package/AISB/catalog/aisb.b3.self_evolving_rl.yaml +237 -0
- package/AISB/catalog/aisb.b4.lm_reasoning.yaml +240 -0
- package/AISB/catalog/aisb.b5.math_proof.yaml +235 -0
- package/AISB/catalog/aisb.b6.research_process.yaml +243 -0
- package/AISB/catalog/aisb.b7.multimodal_fusion.yaml +232 -0
- package/AISB/catalog/aisb.b8.lifesci_drug.yaml +275 -0
- package/AISB/catalog/aisb.b9.material_science.yaml +237 -0
- package/AISB/catalog/aisb.t3.001_savvy.yaml +159 -0
- package/AISB/catalog/aisb.t3.001_savvy.zh.yaml +121 -0
- package/AISB/catalog/aisb.t3.002_pinet.yaml +189 -0
- package/AISB/catalog/aisb.t3.002_pinet.zh.yaml +130 -0
- package/AISB/catalog/aisb.t3.004_decentralattn.yaml +184 -0
- package/AISB/catalog/aisb.t3.004_decentralattn.zh.yaml +153 -0
- package/AISB/catalog/aisb.t3.005_tsae.yaml +193 -0
- package/AISB/catalog/aisb.t3.005_tsae.zh.yaml +139 -0
- package/AISB/catalog/aisb.t3.006_physense.yaml +194 -0
- package/AISB/catalog/aisb.t3.006_physense.zh.yaml +118 -0
- package/AISB/catalog/aisb.t3.007_reasoningiqa.yaml +169 -0
- package/AISB/catalog/aisb.t3.007_reasoningiqa.zh.yaml +133 -0
- package/AISB/catalog/aisb.t3.008_meanflows.yaml +188 -0
- package/AISB/catalog/aisb.t3.008_meanflows.zh.yaml +140 -0
- package/AISB/catalog/aisb.t3.009_scoremissing.yaml +179 -0
- package/AISB/catalog/aisb.t3.009_scoremissing.zh.yaml +119 -0
- package/AISB/catalog/aisb.t3.010_suitabilityfilter.yaml +221 -0
- package/AISB/catalog/aisb.t3.010_suitabilityfilter.zh.yaml +141 -0
- package/AISB/catalog/aisb.t3.011_osd.yaml +206 -0
- package/AISB/catalog/aisb.t3.011_osd.zh.yaml +163 -0
- package/AISB/catalog/aisb.t3.012_efficientqat.yaml +206 -0
- package/AISB/catalog/aisb.t3.012_efficientqat.zh.yaml +159 -0
- package/AISB/catalog/aisb.t3.013_appl.yaml +152 -0
- package/AISB/catalog/aisb.t3.013_appl.zh.yaml +126 -0
- package/AISB/catalog/aisb.t3.014_piguard.yaml +207 -0
- package/AISB/catalog/aisb.t3.014_piguard.zh.yaml +164 -0
- package/AISB/catalog/aisb.t3.015_frspec.yaml +209 -0
- package/AISB/catalog/aisb.t3.015_frspec.zh.yaml +163 -0
- package/AISB/catalog/aisb.t3.016_mathfusion.yaml +166 -0
- package/AISB/catalog/aisb.t3.016_mathfusion.zh.yaml +145 -0
- package/AISB/catalog/aisb.t3.017_multimodalglp.yaml +171 -0
- package/AISB/catalog/aisb.t3.017_multimodalglp.zh.yaml +122 -0
- package/AISB/catalog/aisb.t3.018_cotsynth.yaml +206 -0
- package/AISB/catalog/aisb.t3.018_cotsynth.zh.yaml +162 -0
- package/AISB/catalog/aisb.t3.019_dyscaleut.yaml +211 -0
- package/AISB/catalog/aisb.t3.019_dyscaleut.zh.yaml +148 -0
- package/AISB/catalog/aisb.t3.020_aristotle.yaml +173 -0
- package/AISB/catalog/aisb.t3.020_aristotle.zh.yaml +119 -0
- package/AISB/catalog/aisb.t3.021_tokenrecycling.yaml +160 -0
- package/AISB/catalog/aisb.t3.021_tokenrecycling.zh.yaml +129 -0
- package/AISB/catalog/aisb.t3.022_chainofreasoning.yaml +204 -0
- package/AISB/catalog/aisb.t3.022_chainofreasoning.zh.yaml +161 -0
- package/AISB/catalog/aisb.t3.023_guidedembed.yaml +211 -0
- package/AISB/catalog/aisb.t3.023_guidedembed.zh.yaml +189 -0
- package/AISB/catalog/aisb.t3.024_outputcentric.yaml +148 -0
- package/AISB/catalog/aisb.t3.024_outputcentric.zh.yaml +131 -0
- package/AISB/catalog/aisb.t3.025_deeper.yaml +143 -0
- package/AISB/catalog/aisb.t3.025_deeper.zh.yaml +116 -0
- package/AISB/catalog/aisb.t3.026_gartkg.yaml +195 -0
- package/AISB/catalog/aisb.t3.026_gartkg.zh.yaml +127 -0
- package/AISB/catalog/aisb.t3.027_citeeval.yaml +182 -0
- package/AISB/catalog/aisb.t3.027_citeeval.zh.yaml +135 -0
- package/AISB/catalog/aisb.t3.028_sbam.yaml +206 -0
- package/AISB/catalog/aisb.t3.028_sbam.zh.yaml +166 -0
- package/AISB/catalog/aisb.t3.029_cdqgeoembed.yaml +224 -0
- package/AISB/catalog/aisb.t3.029_cdqgeoembed.zh.yaml +142 -0
- package/AISB/catalog/aisb.t3.030_processrm.yaml +211 -0
- package/AISB/catalog/aisb.t3.030_processrm.zh.yaml +166 -0
- package/AISB/catalog/aisb.t3.031_circuitstability.yaml +172 -0
- package/AISB/catalog/aisb.t3.031_circuitstability.zh.yaml +134 -0
- package/AISB/catalog/aisb.t3.032_ptsolver.yaml +169 -0
- package/AISB/catalog/aisb.t3.032_ptsolver.zh.yaml +135 -0
- package/AISB/catalog/aisb.t3.033_gcse.yaml +144 -0
- package/AISB/catalog/aisb.t3.033_gcse.zh.yaml +126 -0
- package/AISB/catalog/aisb.t3.034_ensemblewm.yaml +183 -0
- package/AISB/catalog/aisb.t3.034_ensemblewm.zh.yaml +146 -0
- package/AISB/catalog/aisb.t3.035_moralvalueswa.yaml +207 -0
- package/AISB/catalog/aisb.t3.035_moralvalueswa.zh.yaml +165 -0
- package/AISB/catalog/aisb.t3.036_weakstrongpref.yaml +210 -0
- package/AISB/catalog/aisb.t3.036_weakstrongpref.zh.yaml +194 -0
- package/AISB/catalog/aisb.t3.037_dementiamask.yaml +172 -0
- package/AISB/catalog/aisb.t3.037_dementiamask.zh.yaml +132 -0
- package/AISB/catalog/aisb.t3.038_tinysam.yaml +284 -0
- package/AISB/catalog/aisb.t3.038_tinysam.zh.yaml +240 -0
- package/AISB/catalog/aisb.t3.039_calf.yaml +224 -0
- package/AISB/catalog/aisb.t3.039_calf.zh.yaml +194 -0
- package/AISB/catalog/aisb.t3.040_graniteguardian.yaml +199 -0
- package/AISB/catalog/aisb.t3.040_graniteguardian.zh.yaml +174 -0
- package/AISB/catalog/aisb.t3.041_amdm.yaml +149 -0
- package/AISB/catalog/aisb.t3.041_amdm.zh.yaml +137 -0
- package/AISB/catalog/aisb.t3.042_xpatch.yaml +216 -0
- package/AISB/catalog/aisb.t3.042_xpatch.zh.yaml +182 -0
- package/AISB/catalog/aisb.t3.043_vhm.yaml +268 -0
- package/AISB/catalog/aisb.t3.043_vhm.zh.yaml +193 -0
- package/AISB/catalog/aisb.t3.044_rgvi.yaml +224 -0
- package/AISB/catalog/aisb.t3.044_rgvi.zh.yaml +176 -0
- package/AISB/catalog/aisb.t3.045_pslstm.yaml +203 -0
- package/AISB/catalog/aisb.t3.045_pslstm.zh.yaml +179 -0
- package/AISB/catalog/aisb.t3.046_nonstatts.yaml +208 -0
- package/AISB/catalog/aisb.t3.046_nonstatts.zh.yaml +194 -0
- package/AISB/catalog/aisb.t3.047_timepfn.yaml +156 -0
- package/AISB/catalog/aisb.t3.047_timepfn.zh.yaml +124 -0
- package/AISB/catalog/aisb.t3.048_proxyspex.yaml +148 -0
- package/AISB/catalog/aisb.t3.048_proxyspex.zh.yaml +125 -0
- package/AISB/catalog/aisb.t3.049_hogwildinference.yaml +183 -0
- package/AISB/catalog/aisb.t3.049_hogwildinference.zh.yaml +138 -0
- package/AISB/catalog/aisb.t3.050_causalpfn.yaml +214 -0
- package/AISB/catalog/aisb.t3.050_causalpfn.zh.yaml +190 -0
- package/AISB/catalog/aisb.t3.051_flashtp.yaml +169 -0
- package/AISB/catalog/aisb.t3.051_flashtp.zh.yaml +124 -0
- package/AISB/catalog/aisb.t3.052_nsdiff.yaml +155 -0
- package/AISB/catalog/aisb.t3.052_nsdiff.zh.yaml +138 -0
- package/AISB/catalog/aisb.t3.053_k2vae.yaml +158 -0
- package/AISB/catalog/aisb.t3.053_k2vae.zh.yaml +132 -0
- package/AISB/catalog/aisb.t3.054_timebase.yaml +178 -0
- package/AISB/catalog/aisb.t3.054_timebase.zh.yaml +158 -0
- package/AISB/catalog/aisb.t3.055_csbrain.yaml +238 -0
- package/AISB/catalog/aisb.t3.055_csbrain.zh.yaml +184 -0
- package/AISB/catalog/aisb.t3.056_infosam.yaml +224 -0
- package/AISB/catalog/aisb.t3.056_infosam.zh.yaml +189 -0
- package/AISB/catalog/aisb.t3.057_mdreid.yaml +129 -0
- package/AISB/catalog/aisb.t3.057_mdreid.zh.yaml +117 -0
- package/AISB/catalog/aisb.t3.058_mindglitch.yaml +171 -0
- package/AISB/catalog/aisb.t3.058_mindglitch.zh.yaml +145 -0
- package/AISB/catalog/aisb.t3.059_selfsupervised.yaml +154 -0
- package/AISB/catalog/aisb.t3.059_selfsupervised.zh.yaml +125 -0
- package/AISB/catalog/aisb.t3.060_iaggad.yaml +121 -0
- package/AISB/catalog/aisb.t3.060_iaggad.zh.yaml +100 -0
- package/AISB/catalog/aisb.t3.061_hsgkn.yaml +136 -0
- package/AISB/catalog/aisb.t3.061_hsgkn.zh.yaml +113 -0
- package/AISB/catalog/aisb.t3.062_visionts.yaml +237 -0
- package/AISB/catalog/aisb.t3.062_visionts.zh.yaml +216 -0
- package/AISB/catalog/aisb.t3.063_tsrag.yaml +162 -0
- package/AISB/catalog/aisb.t3.063_tsrag.zh.yaml +138 -0
- package/AISB/catalog/aisb.t3.064_pir.yaml +221 -0
- package/AISB/catalog/aisb.t3.064_pir.zh.yaml +197 -0
- package/AISB/catalog/aisb.t3.065_proteinbinding.yaml +234 -0
- package/AISB/catalog/aisb.t3.065_proteinbinding.zh.yaml +167 -0
- package/AISB/catalog/aisb.t3.066_tropicalattention.yaml +267 -0
- package/AISB/catalog/aisb.t3.066_tropicalattention.zh.yaml +229 -0
- package/AISB/catalog/aisb.t3.067_kanad.yaml +193 -0
- package/AISB/catalog/aisb.t3.067_kanad.zh.yaml +167 -0
- package/AISB/catalog/aisb.t3.068_sempo.yaml +187 -0
- package/AISB/catalog/aisb.t3.068_sempo.zh.yaml +148 -0
- package/AISB/catalog/aisb.t3.069_treehfd.yaml +129 -0
- package/AISB/catalog/aisb.t3.069_treehfd.zh.yaml +111 -0
- package/AISB/catalog/aisb.t3.070_certifiedunlearning.yaml +224 -0
- package/AISB/catalog/aisb.t3.070_certifiedunlearning.zh.yaml +171 -0
- package/AISB/catalog/aisb.t3.071_neuralmjd.yaml +142 -0
- package/AISB/catalog/aisb.t3.071_neuralmjd.zh.yaml +120 -0
- package/AISB/catalog/aisb.t3.072_fedgmt.yaml +181 -0
- package/AISB/catalog/aisb.t3.072_fedgmt.zh.yaml +158 -0
- package/AISB/catalog/aisb.t3.073_rld.yaml +161 -0
- package/AISB/catalog/aisb.t3.073_rld.zh.yaml +129 -0
- package/AISB/catalog/aisb.t3.074_lsvi.yaml +163 -0
- package/AISB/catalog/aisb.t3.074_lsvi.zh.yaml +129 -0
- package/AISB/catalog/aisb.t3.075_treeslicedentropy.yaml +201 -0
- package/AISB/catalog/aisb.t3.075_treeslicedentropy.zh.yaml +148 -0
- package/AISB/catalog/aisb.t3.076_aanet.yaml +169 -0
- package/AISB/catalog/aisb.t3.076_aanet.zh.yaml +129 -0
- package/AISB/catalog/aisb.t3.077_cmnn.yaml +199 -0
- package/AISB/catalog/aisb.t3.077_cmnn.zh.yaml +165 -0
- package/AISB/catalog/aisb.t3.078_conformalanomaly.yaml +146 -0
- package/AISB/catalog/aisb.t3.078_conformalanomaly.zh.yaml +117 -0
- package/AISB/catalog/aisb.t3.079_dpfkmeans.yaml +131 -0
- package/AISB/catalog/aisb.t3.079_dpfkmeans.zh.yaml +104 -0
- package/AISB/catalog/aisb.t3.080_latentscorereweight.yaml +169 -0
- package/AISB/catalog/aisb.t3.080_latentscorereweight.zh.yaml +123 -0
- package/AISB/catalog/aisb.t3.081_qmamba.yaml +150 -0
- package/AISB/catalog/aisb.t3.081_qmamba.zh.yaml +117 -0
- package/AISB/catalog/aisb.t3.082_onlinellmrouting.yaml +160 -0
- package/AISB/catalog/aisb.t3.082_onlinellmrouting.zh.yaml +133 -0
- package/AISB/catalog/aisb.t3.083_starformer.yaml +178 -0
- package/AISB/catalog/aisb.t3.083_starformer.zh.yaml +140 -0
- package/AISB/catalog/aisb.t3.084_ift.yaml +139 -0
- package/AISB/catalog/aisb.t3.084_ift.zh.yaml +111 -0
- package/AISB/catalog/aisb.t3.085_neuralsurv.yaml +183 -0
- package/AISB/catalog/aisb.t3.085_neuralsurv.zh.yaml +143 -0
- package/AISB/catalog/aisb.t3.086_stella.yaml +197 -0
- package/AISB/catalog/aisb.t3.086_stella.zh.yaml +142 -0
- package/AISB/catalog/aisb.t3.087_moses.yaml +167 -0
- package/AISB/catalog/aisb.t3.087_moses.zh.yaml +132 -0
- package/AISB/catalog/aisb.t3.088_channelnorm.yaml +140 -0
- package/AISB/catalog/aisb.t3.088_channelnorm.zh.yaml +109 -0
- package/AISB/catalog/aisb.t3.089_causalvelocity.yaml +730 -0
- package/AISB/catalog/aisb.t3.089_causalvelocity.zh.yaml +668 -0
- package/AISB/catalog/aisb.t3.090_rstib.yaml +144 -0
- package/AISB/catalog/aisb.t3.090_rstib.zh.yaml +109 -0
- package/AISB/catalog/aisb.t3.091_timeawarecausal.yaml +132 -0
- package/AISB/catalog/aisb.t3.091_timeawarecausal.zh.yaml +107 -0
- package/AISB/catalog/aisb.t3.092_kmeanslocalopt.yaml +138 -0
- package/AISB/catalog/aisb.t3.092_kmeanslocalopt.zh.yaml +110 -0
- package/AISB/catalog/aisb.t3.093_fedwmsam.yaml +134 -0
- package/AISB/catalog/aisb.t3.093_fedwmsam.zh.yaml +106 -0
- package/AISB/catalog/aisb.t3.094_boundre.yaml +147 -0
- package/AISB/catalog/aisb.t3.094_boundre.zh.yaml +114 -0
- package/AISB/catalog/aisb.t3.095_fastfeaturecp.yaml +153 -0
- package/AISB/catalog/aisb.t3.095_fastfeaturecp.zh.yaml +118 -0
- package/AISB/catalog/aisb.t3.096_m3svm.yaml +189 -0
- package/AISB/catalog/aisb.t3.096_m3svm.zh.yaml +149 -0
- package/AISB/catalog/aisb.t3.097_wassersteintl.yaml +212 -0
- package/AISB/catalog/aisb.t3.097_wassersteintl.zh.yaml +169 -0
- package/AISB/catalog/aisb.t3.098_xmahalanobis.yaml +171 -0
- package/AISB/catalog/aisb.t3.098_xmahalanobis.zh.yaml +127 -0
- package/AISB/catalog/aisb.t3.099_ollalanding.yaml +248 -0
- package/AISB/catalog/aisb.t3.099_ollalanding.zh.yaml +182 -0
- package/AISB/catalog/aisb.t3.100_invmissingdata.yaml +179 -0
- package/AISB/catalog/aisb.t3.100_invmissingdata.zh.yaml +150 -0
- package/AISB/catalog/aisb.t3.101_acia.yaml +164 -0
- package/AISB/catalog/aisb.t3.101_acia.zh.yaml +109 -0
- package/AISB/catalog/aisb.t3.102_stochasticff.yaml +178 -0
- package/AISB/catalog/aisb.t3.102_stochasticff.zh.yaml +130 -0
- package/AISB/catalog/aisb.t3.103_qdcp.yaml +150 -0
- package/AISB/catalog/aisb.t3.103_qdcp.zh.yaml +116 -0
- package/AISB/catalog/aisb.t3.104_balancedactiveinf.yaml +137 -0
- package/AISB/catalog/aisb.t3.104_balancedactiveinf.zh.yaml +104 -0
- package/AISB/catalog/aisb.t3.105_binaryclasseval.yaml +161 -0
- package/AISB/catalog/aisb.t3.105_binaryclasseval.zh.yaml +130 -0
- package/AISB/image/001_aisb.t3.001_savvy.jpg +0 -0
- package/AISB/image/002_aisb.t3.002_pinet.jpg +0 -0
- package/AISB/image/003_aisb.t3.003_dmsqd.jpg +0 -0
- package/AISB/image/004_aisb.t3.004_decentralattn.jpg +0 -0
- package/AISB/image/005_aisb.t3.005_tsae.jpg +0 -0
- package/AISB/image/006_aisb.t3.006_physense.jpg +0 -0
- package/AISB/image/007_aisb.t3.007_reasoningiqa.jpg +0 -0
- package/AISB/image/008_aisb.t3.008_meanflows.jpg +0 -0
- package/AISB/image/009_aisb.t3.009_scoremissing.jpg +0 -0
- package/AISB/image/010_aisb.t3.010_suitabilityfilter.jpg +0 -0
- package/AISB/image/011_aisb.t3.011_osd.jpg +0 -0
- package/AISB/image/012_aisb.t3.012_efficientqat.jpg +0 -0
- package/AISB/image/013_aisb.t3.013_appl.jpg +0 -0
- package/AISB/image/014_aisb.t3.014_piguard.jpg +0 -0
- package/AISB/image/015_aisb.t3.015_frspec.jpg +0 -0
- package/AISB/image/016_aisb.t3.016_mathfusion.jpg +0 -0
- package/AISB/image/017_aisb.t3.017_multimodalglp.jpg +0 -0
- package/AISB/image/018_aisb.t3.018_cotsynth.jpg +0 -0
- package/AISB/image/019_aisb.t3.019_dyscaleut.jpg +0 -0
- package/AISB/image/020_aisb.t3.020_aristotle.jpg +0 -0
- package/AISB/image/021_aisb.t3.021_tokenrecycling.jpg +0 -0
- package/AISB/image/022_aisb.t3.022_chainofreasoning.jpg +0 -0
- package/AISB/image/023_aisb.t3.023_guidedembed.jpg +0 -0
- package/AISB/image/024_aisb.t3.024_outputcentric.jpg +0 -0
- package/AISB/image/025_aisb.t3.025_deeper.jpg +0 -0
- package/AISB/image/026_aisb.t3.026_gartkg.jpg +0 -0
- package/AISB/image/027_aisb.t3.027_citeeval.jpg +0 -0
- package/AISB/image/028_aisb.t3.028_sbam.jpg +0 -0
- package/AISB/image/029_aisb.t3.029_cdqgeoembed.jpg +0 -0
- package/AISB/image/030_aisb.t3.030_processrm.jpg +0 -0
- package/AISB/image/031_aisb.t3.031_circuitstability.jpg +0 -0
- package/AISB/image/032_aisb.t3.032_ptsolver.jpg +0 -0
- package/AISB/image/033_aisb.t3.033_gcse.jpg +0 -0
- package/AISB/image/034_aisb.t3.034_ensemblewm.jpg +0 -0
- package/AISB/image/035_aisb.t3.035_moralvalueswa.jpg +0 -0
- package/AISB/image/036_aisb.t3.036_weakstrongpref.jpg +0 -0
- package/AISB/image/037_aisb.t3.037_dementiamask.jpg +0 -0
- package/AISB/image/038_aisb.t3.038_tinysam.jpg +0 -0
- package/AISB/image/039_aisb.t3.039_calf.jpg +0 -0
- package/AISB/image/040_aisb.t3.040_graniteguardian.jpg +0 -0
- package/AISB/image/041_aisb.t3.041_amdm.jpg +0 -0
- package/AISB/image/042_aisb.t3.042_xpatch.jpg +0 -0
- package/AISB/image/043_aisb.t3.043_vhm.jpg +0 -0
- package/AISB/image/044_aisb.t3.044_rgvi.jpg +0 -0
- package/AISB/image/045_aisb.t3.045_pslstm.jpg +0 -0
- package/AISB/image/046_aisb.t3.046_nonstatts.jpg +0 -0
- package/AISB/image/047_aisb.t3.047_timepfn.jpg +0 -0
- package/AISB/image/048_aisb.t3.048_proxyspex.jpg +0 -0
- package/AISB/image/049_aisb.t3.049_hogwildinference.jpg +0 -0
- package/AISB/image/050_aisb.t3.050_causalpfn.jpg +0 -0
- package/AISB/image/051_aisb.t3.051_flashtp.jpg +0 -0
- package/AISB/image/052_aisb.t3.052_nsdiff.jpg +0 -0
- package/AISB/image/053_aisb.t3.053_k2vae.jpg +0 -0
- package/AISB/image/054_aisb.t3.054_timebase.jpg +0 -0
- package/AISB/image/055_aisb.t3.055_csbrain.jpg +0 -0
- package/AISB/image/056_aisb.t3.056_infosam.jpg +0 -0
- package/AISB/image/057_aisb.t3.057_mdreid.jpg +0 -0
- package/AISB/image/058_aisb.t3.058_mindglitch.jpg +0 -0
- package/AISB/image/059_aisb.t3.059_selfsupervised.jpg +0 -0
- package/AISB/image/060_aisb.t3.060_iaggad.jpg +0 -0
- package/AISB/image/061_aisb.t3.061_hsgkn.jpg +0 -0
- package/AISB/image/062_aisb.t3.062_visionts.jpg +0 -0
- package/AISB/image/063_aisb.t3.063_tsrag.jpg +0 -0
- package/AISB/image/064_aisb.t3.064_pir.jpg +0 -0
- package/AISB/image/065_aisb.t3.065_proteinbinding.jpg +0 -0
- package/AISB/image/066_aisb.t3.066_tropicalattention.jpg +0 -0
- package/AISB/image/067_aisb.t3.067_kanad.jpg +0 -0
- package/AISB/image/068_aisb.t3.068_sempo.jpg +0 -0
- package/AISB/image/069_aisb.t3.069_treehfd.jpg +0 -0
- package/AISB/image/070_aisb.t3.070_certifiedunlearning.jpg +0 -0
- package/AISB/image/071_aisb.t3.071_neuralmjd.jpg +0 -0
- package/AISB/image/072_aisb.t3.072_fedgmt.jpg +0 -0
- package/AISB/image/073_aisb.t3.073_rld.jpg +0 -0
- package/AISB/image/074_aisb.t3.074_lsvi.jpg +0 -0
- package/AISB/image/075_aisb.t3.075_treeslicedentropy.jpg +0 -0
- package/AISB/image/076_aisb.t3.076_aanet.jpg +0 -0
- package/AISB/image/077_aisb.t3.077_cmnn.jpg +0 -0
- package/AISB/image/078_aisb.t3.078_conformalanomaly.jpg +0 -0
- package/AISB/image/079_aisb.t3.079_dpfkmeans.jpg +0 -0
- package/AISB/image/080_aisb.t3.080_latentscorereweight.jpg +0 -0
- package/AISB/image/081_aisb.t3.081_qmamba.jpg +0 -0
- package/AISB/image/082_aisb.t3.082_onlinellmrouting.jpg +0 -0
- package/AISB/image/083_aisb.t3.083_starformer.jpg +0 -0
- package/AISB/image/084_aisb.t3.084_ift.jpg +0 -0
- package/AISB/image/085_aisb.t3.085_neuralsurv.jpg +0 -0
- package/AISB/image/086_aisb.t3.086_stella.jpg +0 -0
- package/AISB/image/087_aisb.t3.087_moses.jpg +0 -0
- package/AISB/image/088_aisb.t3.088_channelnorm.jpg +0 -0
- package/AISB/image/089_aisb.t3.089_causalvelocity.jpg +0 -0
- package/AISB/image/090_aisb.t3.090_rstib.jpg +0 -0
- package/AISB/image/091_aisb.t3.091_timeawarecausal.jpg +0 -0
- package/AISB/image/092_aisb.t3.092_kmeanslocalopt.jpg +0 -0
- package/AISB/image/093_aisb.t3.093_fedwmsam.jpg +0 -0
- package/AISB/image/094_aisb.t3.094_boundre.jpg +0 -0
- package/AISB/image/095_aisb.t3.095_fastfeaturecp.jpg +0 -0
- package/AISB/image/096_aisb.t3.096_m3svm.jpg +0 -0
- package/AISB/image/097_aisb.t3.097_wassersteintl.jpg +0 -0
- package/AISB/image/098_aisb.t3.098_xmahalanobis.jpg +0 -0
- package/AISB/image/099_aisb.t3.099_ollalanding.jpg +0 -0
- package/AISB/image/100_aisb.t3.100_invmissingdata.jpg +0 -0
- package/AISB/image/101_aisb.t3.101_acia.jpg +0 -0
- package/AISB/image/102_aisb.t3.102_stochasticff.jpg +0 -0
- package/AISB/image/103_aisb.t3.103_qdcp.jpg +0 -0
- package/AISB/image/104_aisb.t3.104_balancedactiveinf.jpg +0 -0
- package/AISB/image/105_aisb.t3.105_binaryclasseval.jpg +0 -0
- package/AISB/image/106_aisb.t1.reasoning_lite.jpg +0 -0
- package/AISB/image/107_aisb.t2.paper_audit.jpg +0 -0
- package/AISB/image/108_aisb.t3.multi_gpu_search.jpg +0 -0
- package/AISB/image/109_aisb.t3.tdc_admet.jpg +0 -0
- package/AISB/image/aisb.b1.agentic_coding.svg +16 -0
- package/AISB/image/aisb.b10.climate_earth.svg +16 -0
- package/AISB/image/aisb.b11.model_efficiency.svg +16 -0
- package/AISB/image/aisb.b12.embodied_ai.svg +16 -0
- package/AISB/image/aisb.b2.agent_systems.svg +16 -0
- package/AISB/image/aisb.b3.self_evolving_rl.svg +16 -0
- package/AISB/image/aisb.b4.lm_reasoning.svg +16 -0
- package/AISB/image/aisb.b5.math_proof.svg +16 -0
- package/AISB/image/aisb.b6.research_process.svg +16 -0
- package/AISB/image/aisb.b7.multimodal_fusion.svg +16 -0
- package/AISB/image/aisb.b8.lifesci_drug.svg +16 -0
- package/AISB/image/aisb.b9.material_science.svg +16 -0
- package/README.md +196 -32
- package/bin/ds.js +924 -66
- package/docs/en/00_QUICK_START.md +195 -18
- package/docs/en/01_SETTINGS_REFERENCE.md +468 -96
- package/docs/en/02_START_RESEARCH_GUIDE.md +26 -5
- package/docs/en/03_QQ_CONNECTOR_GUIDE.md +14 -3
- package/docs/en/04_LINGZHU_CONNECTOR_GUIDE.md +2 -0
- package/docs/en/05_TUI_GUIDE.md +171 -2
- package/docs/en/07_MEMORY_AND_MCP.md +38 -2
- package/docs/en/09_DOCTOR.md +78 -7
- package/docs/en/10_WEIXIN_CONNECTOR_GUIDE.md +38 -1
- package/docs/en/11_LICENSE_AND_RISK.md +4 -0
- package/docs/en/12_GUIDED_WORKFLOW_TOUR.md +15 -0
- package/docs/en/14_PROMPT_SKILLS_AND_MCP_GUIDE.md +9 -0
- package/docs/en/15_CODEX_PROVIDER_SETUP.md +624 -180
- package/docs/en/16_TELEGRAM_CONNECTOR_GUIDE.md +14 -0
- package/docs/en/17_WHATSAPP_CONNECTOR_GUIDE.md +14 -0
- package/docs/en/18_FEISHU_CONNECTOR_GUIDE.md +14 -0
- package/docs/en/21_LOCAL_MODEL_BACKENDS_GUIDE.md +386 -0
- package/docs/en/22_BENCHSTORE_YAML_REFERENCE.md +469 -0
- package/docs/en/23_BENCHSTORE_GITHUB_RELEASES_SPEC.md +316 -0
- package/docs/en/24_CLAUDE_CODE_PROVIDER_SETUP.md +469 -0
- package/docs/en/25_OPENCODE_PROVIDER_SETUP.md +653 -0
- package/docs/en/26_CITATION_AND_ATTRIBUTION.md +119 -0
- package/docs/en/27_KIMI_CODE_PROVIDER_SETUP.md +180 -0
- package/docs/en/28_DISCORD_CONNECTOR_GUIDE.md +61 -0
- package/docs/en/29_SLACK_CONNECTOR_GUIDE.md +60 -0
- package/docs/en/30_SETTINGS_CONTROL_CENTER_GUIDE.md +371 -0
- package/docs/en/{19_LOCAL_BROWSER_AUTH.md → 31_LOCAL_BROWSER_AUTH.md} +1 -1
- package/docs/en/32_WINDOWS_WSL2_DEPLOYMENT_GUIDE.md +273 -0
- package/docs/en/33_WORKSPACE_EXPLORER_QA.md +121 -0
- package/docs/en/91_DEVELOPMENT.md +266 -0
- package/docs/en/99_ACKNOWLEDGEMENTS.md +24 -19
- package/docs/en/README.md +48 -7
- package/docs/images/admin/admin-connectors-health-en.png +0 -0
- package/docs/images/admin/admin-controllers-en.png +0 -0
- package/docs/images/admin/admin-diagnostics-en.png +0 -0
- package/docs/images/admin/admin-errors-en.png +0 -0
- package/docs/images/admin/admin-issues-en.png +0 -0
- package/docs/images/admin/admin-logs-en.png +0 -0
- package/docs/images/admin/admin-quest-detail-en.png +0 -0
- package/docs/images/admin/admin-quests-en.png +0 -0
- package/docs/images/admin/admin-repairs-en.png +0 -0
- package/docs/images/admin/admin-runtime-en.png +0 -0
- package/docs/images/admin/admin-search-en.png +0 -0
- package/docs/images/admin/admin-stats-en.png +0 -0
- package/docs/images/admin/admin-summary-en.png +0 -0
- package/docs/images/connectors/connector-discord-en.png +0 -0
- package/docs/images/connectors/connector-feishu-en.png +0 -0
- package/docs/images/connectors/connector-lingzhu-en.png +0 -0
- package/docs/images/connectors/connector-qq-en.png +0 -0
- package/docs/images/connectors/connector-slack-en.png +0 -0
- package/docs/images/connectors/connector-telegram-en.png +0 -0
- package/docs/images/connectors/connector-weixin-en.png +0 -0
- package/docs/images/connectors/connector-whatsapp-en.png +0 -0
- package/docs/images/settings/settings-baselines-en.png +0 -0
- package/docs/images/settings/settings-config-en.png +0 -0
- package/docs/images/settings/settings-connectors-overview-en.png +0 -0
- package/docs/images/settings/settings-deepxiv-en.png +0 -0
- package/docs/images/settings/settings-mcp-servers-en.png +0 -0
- package/docs/images/settings/settings-plugins-en.png +0 -0
- package/docs/images/settings/settings-runners-en.png +0 -0
- package/docs/zh/00_QUICK_START.md +142 -18
- package/docs/zh/01_SETTINGS_REFERENCE.md +219 -98
- package/docs/zh/02_START_RESEARCH_GUIDE.md +26 -5
- package/docs/zh/05_TUI_GUIDE.md +171 -2
- package/docs/zh/07_MEMORY_AND_MCP.md +29 -2
- package/docs/zh/09_DOCTOR.md +54 -8
- package/docs/zh/10_WEIXIN_CONNECTOR_GUIDE.md +24 -1
- package/docs/zh/11_LICENSE_AND_RISK.md +4 -0
- package/docs/zh/12_GUIDED_WORKFLOW_TOUR.md +15 -0
- package/docs/zh/14_PROMPT_SKILLS_AND_MCP_GUIDE.md +9 -0
- package/docs/zh/15_CODEX_PROVIDER_SETUP.md +552 -181
- package/docs/zh/21_LOCAL_MODEL_BACKENDS_GUIDE.md +384 -0
- package/docs/zh/22_BENCHSTORE_YAML_REFERENCE.md +459 -0
- package/docs/zh/23_BENCHSTORE_GITHUB_RELEASES_SPEC.md +287 -0
- package/docs/zh/23_CLAUDE_RUNNER_GUIDE.md +103 -0
- package/docs/zh/24_CLAUDE_CODE_PROVIDER_SETUP.md +460 -0
- package/docs/zh/25_OPENCODE_PROVIDER_SETUP.md +660 -0
- package/docs/zh/26_CITATION_AND_ATTRIBUTION.md +102 -0
- package/docs/zh/27_KIMI_CODE_PROVIDER_SETUP.md +51 -0
- package/docs/zh/{19_LOCAL_BROWSER_AUTH.md → 31_LOCAL_BROWSER_AUTH.md} +1 -1
- package/docs/zh/32_WINDOWS_WSL2_DEPLOYMENT_GUIDE.md +264 -0
- package/docs/zh/33_WORKSPACE_EXPLORER_QA.md +127 -0
- package/docs/zh/99_ACKNOWLEDGEMENTS.md +23 -19
- package/docs/zh/README.md +33 -7
- package/install.sh +168 -20
- package/package.json +5 -1
- package/pyproject.toml +2 -1
- package/src/deepscientist/__init__.py +1 -1
- package/src/deepscientist/acp/envelope.py +13 -0
- package/src/deepscientist/admin/__init__.py +3 -0
- package/src/deepscientist/admin/charts.py +681 -0
- package/src/deepscientist/admin/logs.py +119 -0
- package/src/deepscientist/admin/repairs.py +217 -0
- package/src/deepscientist/admin/service.py +1310 -0
- package/src/deepscientist/admin/system_info.py +700 -0
- package/src/deepscientist/admin/tasks.py +465 -0
- package/src/deepscientist/admin/tool_metrics.py +600 -0
- package/src/deepscientist/artifact/guidance.py +8 -4
- package/src/deepscientist/artifact/schemas.py +115 -0
- package/src/deepscientist/artifact/service.py +4268 -260
- package/src/deepscientist/bash_exec/monitor.py +30 -3
- package/src/deepscientist/bash_exec/service.py +134 -1
- package/src/deepscientist/benchstore/__init__.py +4 -0
- package/src/deepscientist/benchstore/prompt_builder.py +224 -0
- package/src/deepscientist/benchstore/service.py +1716 -0
- package/src/deepscientist/bridges/connectors.py +8 -2
- package/src/deepscientist/channels/weixin_ilink.py +8 -1
- package/src/deepscientist/cli.py +92 -17
- package/src/deepscientist/codex_cli_compat.py +187 -74
- package/src/deepscientist/config/models.py +82 -11
- package/src/deepscientist/config/service.py +1077 -93
- package/src/deepscientist/connector/weixin_support.py +48 -17
- package/src/deepscientist/daemon/api/handlers.py +827 -235
- package/src/deepscientist/daemon/api/router.py +81 -1
- package/src/deepscientist/daemon/app.py +1512 -85
- package/src/deepscientist/diagnostics/__init__.py +6 -0
- package/src/deepscientist/diagnostics/runner_failures.py +277 -0
- package/src/deepscientist/doctor.py +407 -56
- package/src/deepscientist/evidence_packets.py +590 -0
- package/src/deepscientist/home.py +52 -4
- package/src/deepscientist/kimi_cli_compat.py +50 -0
- package/src/deepscientist/latex_runtime.py +2 -2
- package/src/deepscientist/mcp/context.py +2 -0
- package/src/deepscientist/mcp/schemas.py +114 -0
- package/src/deepscientist/mcp/server.py +1566 -126
- package/src/deepscientist/memory/service.py +203 -16
- package/src/deepscientist/process_control.py +8 -1
- package/src/deepscientist/prompts/builder.py +850 -88
- package/src/deepscientist/quest/__init__.py +2 -2
- package/src/deepscientist/quest/layout.py +12 -1
- package/src/deepscientist/quest/node_traces.py +10 -0
- package/src/deepscientist/quest/service.py +1852 -161
- package/src/deepscientist/quest/stage_views.py +1 -1
- package/src/deepscientist/runners/__init__.py +18 -0
- package/src/deepscientist/runners/base.py +89 -1
- package/src/deepscientist/runners/builtins.py +13 -1
- package/src/deepscientist/runners/claude.py +391 -0
- package/src/deepscientist/runners/codex.py +480 -35
- package/src/deepscientist/runners/codex_telemetry.py +127 -0
- package/src/deepscientist/runners/kimi.py +334 -0
- package/src/deepscientist/runners/metadata.py +68 -0
- package/src/deepscientist/runners/opencode.py +414 -0
- package/src/deepscientist/runners/runtime_overrides.py +100 -0
- package/src/deepscientist/runners/simple_cli.py +538 -0
- package/src/deepscientist/runtime_storage.py +303 -0
- package/src/deepscientist/shared.py +80 -16
- package/src/deepscientist/skills/installer.py +37 -0
- package/src/deepscientist/skills/registry.py +2 -0
- package/src/deepscientist/tinytex.py +2 -2
- package/src/deepscientist/tui.py +10 -3
- package/src/prompts/benchstore/system.md +77 -0
- package/src/prompts/connectors/qq.md +33 -2
- package/src/prompts/connectors/weixin.md +208 -23
- package/src/prompts/contracts/admin_ops.md +74 -0
- package/src/prompts/contracts/admin_ops_knowledge.md +138 -0
- package/src/prompts/contracts/shared_interaction.md +5 -10
- package/src/prompts/start_setup/system.md +422 -0
- package/src/prompts/system.md +411 -304
- package/src/prompts/system_copilot.md +89 -0
- package/src/skills/analysis-campaign/SKILL.md +239 -578
- package/src/skills/analysis-campaign/references/artifact-flow-examples.md +102 -0
- package/src/skills/analysis-campaign/references/boundary-cases.md +98 -0
- package/src/skills/analysis-campaign/references/campaign-checklist-template.md +39 -24
- package/src/skills/analysis-campaign/references/campaign-design.md +26 -10
- package/src/skills/analysis-campaign/references/campaign-plan-template.md +53 -54
- package/src/skills/analysis-campaign/references/operational-guidance.md +97 -0
- package/src/skills/analysis-campaign/references/writing-facing-slice-examples.md +10 -20
- package/src/skills/baseline/SKILL.md +183 -461
- package/src/skills/baseline/references/artifact-flow-examples.md +106 -0
- package/src/skills/baseline/references/artifact-payload-examples.md +1 -1
- package/src/skills/baseline/references/baseline-checklist-template.md +27 -35
- package/src/skills/baseline/references/baseline-plan-template.md +37 -76
- package/src/skills/baseline/references/boundary-cases.md +86 -0
- package/src/skills/baseline/references/codebase-audit-checklist.md +2 -6
- package/src/skills/baseline/references/comparability-contract.md +7 -12
- package/src/skills/baseline/references/operational-guidance.md +56 -0
- package/src/skills/baseline/references/route-selection.md +5 -25
- package/src/skills/decision/SKILL.md +113 -306
- package/src/skills/decision/references/checkpoint-memory-template.md +47 -0
- package/src/skills/decision/references/operational-guidance.md +94 -0
- package/src/skills/decision/references/research-route-criteria.md +7 -8
- package/src/skills/decision/references/strategic-decision-template.md +13 -26
- package/src/skills/experiment/SKILL.md +132 -670
- package/src/skills/experiment/references/execution-playbook.md +374 -0
- package/src/skills/experiment/references/main-experiment-checklist-template.md +26 -2
- package/src/skills/experiment/references/main-experiment-plan-template.md +28 -17
- package/src/skills/experiment/references/operational-guidance.md +108 -0
- package/src/skills/finalize/SKILL.md +62 -0
- package/src/skills/finalize/references/checkpoint-memory-template.md +49 -0
- package/src/skills/finalize/references/resume-packet-template.md +7 -0
- package/src/skills/idea/SKILL.md +228 -15
- package/src/skills/idea/references/controlled-brainstorming-playbook.md +78 -0
- package/src/skills/idea/references/current-board-packet-template.md +61 -0
- package/src/skills/idea/references/high-value-idea-sourcing.md +119 -0
- package/src/skills/idea/references/idea-generation-playbook.md +21 -0
- package/src/skills/idea/references/idea-thinking-flow.md +6 -0
- package/src/skills/idea/references/literature-survey-template.md +3 -0
- package/src/skills/idea/references/objective-contract-template.md +54 -0
- package/src/skills/idea/references/outline-seeding-example.md +56 -0
- package/src/skills/idea/references/pre-idea-draft-template.md +105 -0
- package/src/skills/idea/references/related-work-playbook.md +75 -2
- package/src/skills/idea/references/research-history-playbook.md +114 -0
- package/src/skills/idea/references/selection-gate.md +58 -6
- package/src/skills/intake-audit/SKILL.md +43 -2
- package/src/skills/intake-audit/references/state-audit-template.md +10 -0
- package/src/skills/nature-data/SKILL.md +128 -0
- package/src/skills/nature-data/UPSTREAM_LICENSE.txt +21 -0
- package/src/skills/nature-data/agents/openai.yaml +4 -0
- package/src/skills/nature-data/references/chinese-author-alignment.md +84 -0
- package/src/skills/nature-data/references/fair-metadata-checklist.md +105 -0
- package/src/skills/nature-data/references/policy-principles.md +103 -0
- package/src/skills/nature-data/references/repository-and-identifiers.md +96 -0
- package/src/skills/nature-data/references/source-basis.md +54 -0
- package/src/skills/nature-data/references/statement-patterns.md +153 -0
- package/src/skills/nature-figure/SKILL.md +197 -0
- package/src/skills/nature-figure/UPSTREAM_LICENSE.txt +21 -0
- package/src/skills/nature-figure/agents/openai.yaml +4 -0
- package/src/skills/nature-figure/evals/evals.json +37 -0
- package/src/skills/nature-figure/references/api.md +428 -0
- package/src/skills/nature-figure/references/backend-selection.md +100 -0
- package/src/skills/nature-figure/references/chart-types.md +281 -0
- package/src/skills/nature-figure/references/common-patterns.md +349 -0
- package/src/skills/nature-figure/references/design-theory.md +436 -0
- package/src/skills/nature-figure/references/figure-contract.md +93 -0
- package/src/skills/nature-figure/references/nature-2026-observations.md +112 -0
- package/src/skills/nature-figure/references/qa-contract.md +119 -0
- package/src/skills/nature-figure/references/r-template-index.md +66 -0
- package/src/skills/nature-figure/references/r-workflow.md +161 -0
- package/src/skills/nature-figure/references/tutorials.md +250 -0
- package/src/skills/nature-paper2ppt/SKILL.md +507 -0
- package/src/skills/nature-paper2ppt/UPSTREAM_LICENSE.txt +21 -0
- package/src/skills/nature-paper2ppt/agents/openai.yaml +4 -0
- package/src/skills/nature-polishing/SKILL.md +385 -0
- package/src/skills/nature-polishing/UPSTREAM_LICENSE.txt +21 -0
- package/src/skills/nature-polishing/agents/openai.yaml +4 -0
- package/src/skills/nature-polishing/references/phrasebank-playbook.md +162 -0
- package/src/skills/nature-polishing/references/section-moves.md +240 -0
- package/src/skills/nature-polishing/references/style-guardrails.md +94 -0
- package/src/skills/nature-polishing/references/writing-strategy.md +148 -0
- package/src/skills/optimize/SKILL.md +177 -1568
- package/src/skills/optimize/references/brief-shaping-playbook.md +95 -0
- package/src/skills/optimize/references/candidate-board-template.md +13 -0
- package/src/skills/optimize/references/candidate-ranking-template.md +51 -0
- package/src/skills/optimize/references/codegen-route-playbook.md +50 -0
- package/src/skills/optimize/references/debug-response-template.md +29 -0
- package/src/skills/optimize/references/frontier-review-template.md +32 -0
- package/src/skills/optimize/references/fusion-playbook.md +36 -0
- package/src/skills/optimize/references/method-brief-template.md +73 -0
- package/src/skills/optimize/references/operational-guidance.md +621 -0
- package/src/skills/optimize/references/optimization-memory-template.md +30 -0
- package/src/skills/optimize/references/optimize-checklist-template.md +18 -0
- package/src/skills/optimize/references/plateau-response-playbook.md +28 -0
- package/src/skills/optimize/references/prompt-patterns.md +49 -0
- package/src/skills/paper-outline/SKILL.md +227 -0
- package/src/skills/paper-outline/references/outline-patterns.md +87 -0
- package/src/skills/paper-plot/SKILL.md +79 -0
- package/src/skills/paper-plot/agents/openai.yaml +4 -0
- package/src/skills/paper-plot/references/bar_grouped_hatch.md +96 -0
- package/src/skills/paper-plot/references/bar_paired_delta.md +72 -0
- package/src/skills/paper-plot/references/line_confidence_band.md +75 -0
- package/src/skills/paper-plot/references/line_loss_with_inset.md +65 -0
- package/src/skills/paper-plot/references/line_training_curve.md +44 -0
- package/src/skills/paper-plot/references/radar_dual_series.md +59 -0
- package/src/skills/paper-plot/references/scatter_broken_axis.md +59 -0
- package/src/skills/paper-plot/references/scatter_tsne_cluster.md +72 -0
- package/src/skills/paper-plot/scripts/bar_memevolve.py +109 -0
- package/src/skills/paper-plot/scripts/bar_spice.py +166 -0
- package/src/skills/paper-plot/scripts/line_aime.py +94 -0
- package/src/skills/paper-plot/scripts/line_loss_inset.py +157 -0
- package/src/skills/paper-plot/scripts/line_selfdistill.py +168 -0
- package/src/skills/paper-plot/scripts/radar_dora.py +151 -0
- package/src/skills/paper-plot/scripts/scatter_break.py +169 -0
- package/src/skills/paper-plot/scripts/scatter_tsne.py +133 -0
- package/src/skills/rebuttal/SKILL.md +9 -0
- package/src/skills/references/tool-usage-by-stage.md +438 -0
- package/src/skills/review/SKILL.md +105 -7
- package/src/skills/science/PROVENANCE.md +44 -0
- package/src/skills/science/SKILL.md +137 -0
- package/src/skills/science/references/artifact-science-tool.md +110 -0
- package/src/skills/science/references/claim-type-discipline.md +56 -0
- package/src/skills/science/references/domain-index.md +422 -0
- package/src/skills/science/references/hpc-via-bash-exec.md +42 -0
- package/src/skills/science/references/package-check-playbook.md +64 -0
- package/src/skills/science/references/package-index.min.json +3616 -0
- package/src/skills/science/references/packages/abinit.md +80 -0
- package/src/skills/science/references/packages/acts.md +73 -0
- package/src/skills/science/references/packages/aiida-core.md +80 -0
- package/src/skills/science/references/packages/alamode.md +80 -0
- package/src/skills/science/references/packages/amuse.md +88 -0
- package/src/skills/science/references/packages/anndata.md +88 -0
- package/src/skills/science/references/packages/arbor.md +80 -0
- package/src/skills/science/references/packages/arc.md +73 -0
- package/src/skills/science/references/packages/astropy.md +88 -0
- package/src/skills/science/references/packages/astroquery.md +88 -0
- package/src/skills/science/references/packages/atomate2.md +80 -0
- package/src/skills/science/references/packages/atomsmltr.md +73 -0
- package/src/skills/science/references/packages/awkward.md +73 -0
- package/src/skills/science/references/packages/batman.md +88 -0
- package/src/skills/science/references/packages/biopython.md +88 -0
- package/src/skills/science/references/packages/bloqade.md +73 -0
- package/src/skills/science/references/packages/brian2.md +73 -0
- package/src/skills/science/references/packages/bullet3.md +73 -0
- package/src/skills/science/references/packages/calculix.md +80 -0
- package/src/skills/science/references/packages/cantera.md +73 -0
- package/src/skills/science/references/packages/cavity-md-ipi.md +80 -0
- package/src/skills/science/references/packages/ccdproc.md +88 -0
- package/src/skills/science/references/packages/celerite2.md +88 -0
- package/src/skills/science/references/packages/cellrank.md +73 -0
- package/src/skills/science/references/packages/cesm.md +80 -0
- package/src/skills/science/references/packages/chemicals.md +73 -0
- package/src/skills/science/references/packages/chempy.md +73 -0
- package/src/skills/science/references/packages/cirq.md +73 -0
- package/src/skills/science/references/packages/coffea.md +73 -0
- package/src/skills/science/references/packages/cp2k.md +88 -0
- package/src/skills/science/references/packages/custodian.md +80 -0
- package/src/skills/science/references/packages/dart.md +73 -0
- package/src/skills/science/references/packages/datamol.md +88 -0
- package/src/skills/science/references/packages/dd4hep.md +73 -0
- package/src/skills/science/references/packages/dealii.md +80 -0
- package/src/skills/science/references/packages/deepchem.md +88 -0
- package/src/skills/science/references/packages/delphes.md +73 -0
- package/src/skills/science/references/packages/devito.md +80 -0
- package/src/skills/science/references/packages/dftb.md +88 -0
- package/src/skills/science/references/packages/dftd4.md +88 -0
- package/src/skills/science/references/packages/dftk-jl.md +80 -0
- package/src/skills/science/references/packages/dolfinx.md +80 -0
- package/src/skills/science/references/packages/drake.md +73 -0
- package/src/skills/science/references/packages/dumux.md +73 -0
- package/src/skills/science/references/packages/elk.md +80 -0
- package/src/skills/science/references/packages/elmerfem.md +80 -0
- package/src/skills/science/references/packages/enzo-e.md +88 -0
- package/src/skills/science/references/packages/espresso.md +80 -0
- package/src/skills/science/references/packages/exoplanet.md +88 -0
- package/src/skills/science/references/packages/fairroot.md +73 -0
- package/src/skills/science/references/packages/fbpic.md +80 -0
- package/src/skills/science/references/packages/fdtdbath-meep.md +80 -0
- package/src/skills/science/references/packages/geant4.md +73 -0
- package/src/skills/science/references/packages/geosx.md +80 -0
- package/src/skills/science/references/packages/gprmax.md +80 -0
- package/src/skills/science/references/packages/gromacs.md +80 -0
- package/src/skills/science/references/packages/gwaslab.md +73 -0
- package/src/skills/science/references/packages/gz-sim.md +73 -0
- package/src/skills/science/references/packages/hail.md +88 -0
- package/src/skills/science/references/packages/hiphive.md +80 -0
- package/src/skills/science/references/packages/hoomd-blue.md +80 -0
- package/src/skills/science/references/packages/itensor.md +73 -0
- package/src/skills/science/references/packages/itensors-jl.md +73 -0
- package/src/skills/science/references/packages/jdftx.md +73 -0
- package/src/skills/science/references/packages/jobflow.md +80 -0
- package/src/skills/science/references/packages/kadanoffbaym-jl.md +73 -0
- package/src/skills/science/references/packages/kite.md +80 -0
- package/src/skills/science/references/packages/kratos.md +80 -0
- package/src/skills/science/references/packages/kwant.md +73 -0
- package/src/skills/science/references/packages/lammps.md +80 -0
- package/src/skills/science/references/packages/lightkurve.md +88 -0
- package/src/skills/science/references/packages/limix.md +73 -0
- package/src/skills/science/references/packages/maxwelllink.md +80 -0
- package/src/skills/science/references/packages/mcdc.md +73 -0
- package/src/skills/science/references/packages/meep.md +80 -0
- package/src/skills/science/references/packages/mfem.md +80 -0
- package/src/skills/science/references/packages/mitgcm.md +73 -0
- package/src/skills/science/references/packages/modflow6.md +73 -0
- package/src/skills/science/references/packages/molecool.md +73 -0
- package/src/skills/science/references/packages/mom6.md +73 -0
- package/src/skills/science/references/packages/moose.md +80 -0
- package/src/skills/science/references/packages/mpas-model.md +73 -0
- package/src/skills/science/references/packages/mujoco.md +73 -0
- package/src/skills/science/references/packages/mumax3.md +73 -0
- package/src/skills/science/references/packages/nekrs.md +80 -0
- package/src/skills/science/references/packages/nessi.md +73 -0
- package/src/skills/science/references/packages/nest-simulator.md +73 -0
- package/src/skills/science/references/packages/netket.md +73 -0
- package/src/skills/science/references/packages/neuron.md +73 -0
- package/src/skills/science/references/packages/nextflow.md +88 -0
- package/src/skills/science/references/packages/nwchem.md +88 -0
- package/src/skills/science/references/packages/openbabel.md +88 -0
- package/src/skills/science/references/packages/openems.md +80 -0
- package/src/skills/science/references/packages/openff-toolkit.md +88 -0
- package/src/skills/science/references/packages/openfoam-dev.md +80 -0
- package/src/skills/science/references/packages/openmc.md +73 -0
- package/src/skills/science/references/packages/openmm.md +80 -0
- package/src/skills/science/references/packages/openmoc.md +73 -0
- package/src/skills/science/references/packages/openmx.md +80 -0
- package/src/skills/science/references/packages/opensees.md +80 -0
- package/src/skills/science/references/packages/opensn.md +80 -0
- package/src/skills/science/references/packages/opm-simulators.md +73 -0
- package/src/skills/science/references/packages/oqupy.md +73 -0
- package/src/skills/science/references/packages/packmol.md +80 -0
- package/src/skills/science/references/packages/palabos.md +80 -0
- package/src/skills/science/references/packages/parflow.md +80 -0
- package/src/skills/science/references/packages/pennylane.md +88 -0
- package/src/skills/science/references/packages/perceval.md +73 -0
- package/src/skills/science/references/packages/phono3py.md +73 -0
- package/src/skills/science/references/packages/phonopy.md +73 -0
- package/src/skills/science/references/packages/photutils.md +88 -0
- package/src/skills/science/references/packages/picongpu.md +80 -0
- package/src/skills/science/references/packages/plink-ng.md +88 -0
- package/src/skills/science/references/packages/precice.md +73 -0
- package/src/skills/science/references/packages/psc.md +80 -0
- package/src/skills/science/references/packages/psi4.md +88 -0
- package/src/skills/science/references/packages/pybinding.md +73 -0
- package/src/skills/science/references/packages/pyfr.md +80 -0
- package/src/skills/science/references/packages/pyhf.md +73 -0
- package/src/skills/science/references/packages/pyiron_base.md +80 -0
- package/src/skills/science/references/packages/pylcp.md +73 -0
- package/src/skills/science/references/packages/pylith.md +80 -0
- package/src/skills/science/references/packages/pynbody.md +88 -0
- package/src/skills/science/references/packages/pysam.md +88 -0
- package/src/skills/science/references/packages/pyscf.md +88 -0
- package/src/skills/science/references/packages/q-e.md +73 -0
- package/src/skills/science/references/packages/qibo.md +73 -0
- package/src/skills/science/references/packages/qiskit.md +73 -0
- package/src/skills/science/references/packages/quantica-jl.md +73 -0
- package/src/skills/science/references/packages/quantumoptics-jl.md +73 -0
- package/src/skills/science/references/packages/quimb.md +73 -0
- package/src/skills/science/references/packages/qulacs.md +73 -0
- package/src/skills/science/references/packages/qutip.md +73 -0
- package/src/skills/science/references/packages/rdkit.md +88 -0
- package/src/skills/science/references/packages/rmg-py.md +73 -0
- package/src/skills/science/references/packages/root.md +73 -0
- package/src/skills/science/references/packages/scanpy.md +88 -0
- package/src/skills/science/references/packages/scikit-allel.md +88 -0
- package/src/skills/science/references/packages/scikit-bio.md +88 -0
- package/src/skills/science/references/packages/scqubits.md +73 -0
- package/src/skills/science/references/packages/scuff-em.md +80 -0
- package/src/skills/science/references/packages/scvi-tools.md +73 -0
- package/src/skills/science/references/packages/seissol.md +73 -0
- package/src/skills/science/references/packages/sfepy.md +80 -0
- package/src/skills/science/references/packages/sisl.md +73 -0
- package/src/skills/science/references/packages/smilei.md +80 -0
- package/src/skills/science/references/packages/snakemake.md +88 -0
- package/src/skills/science/references/packages/specfem3d-globe.md +80 -0
- package/src/skills/science/references/packages/specutils.md +88 -0
- package/src/skills/science/references/packages/spglib.md +80 -0
- package/src/skills/science/references/packages/squidpy.md +88 -0
- package/src/skills/science/references/packages/starry.md +88 -0
- package/src/skills/science/references/packages/strawberryfields.md +73 -0
- package/src/skills/science/references/packages/su2.md +80 -0
- package/src/skills/science/references/packages/sunny-jl.md +73 -0
- package/src/skills/science/references/packages/sw4.md +73 -0
- package/src/skills/science/references/packages/swift.md +88 -0
- package/src/skills/science/references/packages/tdnegf.md +73 -0
- package/src/skills/science/references/packages/tenpy.md +73 -0
- package/src/skills/science/references/packages/thermo.md +73 -0
- package/src/skills/science/references/packages/tkwant.md +73 -0
- package/src/skills/science/references/packages/tvb-root.md +73 -0
- package/src/skills/science/references/packages/uproot5.md +73 -0
- package/src/skills/science/references/packages/vampire.md +80 -0
- package/src/skills/science/references/packages/wannier_tools.md +73 -0
- package/src/skills/science/references/packages/warpx.md +80 -0
- package/src/skills/science/references/packages/wrf.md +73 -0
- package/src/skills/science/references/packages/xtb.md +88 -0
- package/src/skills/science/references/packages/yt.md +73 -0
- package/src/skills/science/references/science-task-brief-template.md +71 -0
- package/src/skills/scout/SKILL.md +83 -425
- package/src/skills/scout/references/literature-scout-template.md +5 -24
- package/src/skills/scout/references/operational-guidance.md +191 -0
- package/src/skills/scout/references/paper-triage-playbook.md +11 -35
- package/src/skills/write/SKILL.md +744 -1246
- package/src/skills/write/references/experiments_analysis_patterns.md +129 -0
- package/src/skills/write/references/oral_package_patterns.md +252 -0
- package/src/skills/write/references/oral_writing_principles.md +291 -0
- package/src/skills/write/references/section_rewrite_checklist.md +234 -0
- package/src/tui/dist/app/AppContainer.js +1314 -27
- package/src/tui/dist/components/Composer.js +26 -1
- package/src/tui/dist/components/ConfigScreen.js +2 -1
- package/src/tui/dist/components/InputPrompt.js +25 -9
- package/src/tui/dist/components/MainContent.js +18 -3
- package/src/tui/dist/components/QuestScreen.js +3 -2
- package/src/tui/dist/components/UtilityScreen.js +37 -0
- package/src/tui/dist/hooks/useSafeInput.js +10 -0
- package/src/tui/dist/index.js +13 -1
- package/src/tui/dist/layouts/DefaultAppLayout.js +11 -8
- package/src/tui/dist/lib/api.js +89 -1
- package/src/tui/package.json +1 -1
- package/src/ui/dist/assets/{AnalysisPlugin-DnSm0GZn.js → AnalysisPlugin-CA94NGmI.js} +1 -1
- package/src/ui/dist/assets/CliPlugin-DHBzphZU.js +79 -0
- package/src/ui/dist/assets/CodeEditorPlugin-BOFwD2rn.js +2 -0
- package/src/ui/dist/assets/{CodeViewerPlugin-itb0tltR.js → CodeViewerPlugin-CqDpgjik.js} +4 -4
- package/src/ui/dist/assets/{DocViewerPlugin-DqKkiCI6.js → DocViewerPlugin-UDBgt8-4.js} +3 -3
- package/src/ui/dist/assets/GitCommitViewerPlugin-BmHtZ0bZ.js +6 -0
- package/src/ui/dist/assets/{GitDiffViewerPlugin-DxL2ezFG.js → GitDiffViewerPlugin-CAxjNorQ.js} +2 -2
- package/src/ui/dist/assets/{GitSnapshotViewer-B_RQm1YZ.js → GitSnapshotViewer-CweA6VON.js} +2 -2
- package/src/ui/dist/assets/{ImageViewerPlugin-tHqlXY3n.js → ImageViewerPlugin-C8wHGvGN.js} +5 -5
- package/src/ui/dist/assets/LabPlugin-COyyLUol.js +32 -0
- package/src/ui/dist/assets/{LatexPlugin-B495DTXC.js → LatexPlugin-BQjAaA5J.js} +4 -4
- package/src/ui/dist/assets/{MarkdownViewerPlugin-DG28-61B.js → MarkdownViewerPlugin-Dy1NE2dI.js} +3 -3
- package/src/ui/dist/assets/{MarketplacePlugin-BiOGT-Kj.js → MarketplacePlugin-DMIZtEJ2.js} +2 -2
- package/src/ui/dist/assets/NotebookEditor-CFHMq_Qt.js +91 -0
- package/src/ui/dist/assets/{NotebookEditor-CVsj8h_T.js → NotebookEditor-WFyd8Ybt.js} +23 -23
- package/src/ui/dist/assets/{PdfLoader-CASDQmxJ.js → PdfLoader-CLE5u5TS.js} +3 -3
- package/src/ui/dist/assets/{PdfMarkdownPlugin-BFhwoKsY.js → PdfMarkdownPlugin-_iNK_H83.js} +1 -1
- package/src/ui/dist/assets/PdfViewerPlugin-DgWsbInT.js +22 -0
- package/src/ui/dist/assets/SearchPlugin-DrZmn5iw.js +11 -0
- package/src/ui/dist/assets/{TextViewerPlugin-CB4DYfWO.js → TextViewerPlugin-D1-T3aC7.js} +4 -4
- package/src/ui/dist/assets/branding/runner-claude.svg +107 -0
- package/src/ui/dist/assets/branding/runner-codex.svg +10 -0
- package/src/ui/dist/assets/branding/runner-kimi.svg +14 -0
- package/src/ui/dist/assets/branding/runner-opencode.svg +7 -0
- package/src/ui/dist/assets/cli-store-CoZ-x5Ip.js +1 -0
- package/src/ui/dist/assets/{code-DLC6G24T.js → code-DbsmSd3Y.js} +1 -1
- package/src/ui/dist/assets/file-diff-panel-DsvyRz47.js +1 -0
- package/src/ui/dist/assets/{wrap-text-CwMn-iqb.js → file-jump-queue-DeQBikaw.js} +3 -3
- package/src/ui/dist/assets/{file-socket-Cu4Qln7Y.js → file-socket-DA5XIx88.js} +1 -1
- package/src/ui/dist/assets/fonts/ds-fonts.css +50 -4
- package/src/ui/dist/assets/images/deepxiv/register-guide.png +0 -0
- package/src/ui/dist/assets/index-39vY9LmZ.js +1 -0
- package/src/ui/dist/assets/{index-wQ7RIIRd.js → index-BsO46tJA.js} +1 -1
- package/src/ui/dist/assets/index-CHzJ2xtB.js +3530 -0
- package/src/ui/dist/assets/index-DH-zxoZ3.css +33 -0
- package/src/ui/dist/assets/{plugin-notebook-HbW2K-1c.js → plugin-notebook-JRhysCqj.js} +2 -2
- package/src/ui/dist/assets/{project-sync-CsX08Qno.js → project-sync-DPmWKmKD.js} +1 -1
- package/src/ui/dist/assets/{zoom-out-R-GWEhzS.js → zoom-out-DAukFWen.js} +3 -3
- package/src/ui/dist/index.html +3 -3
- package/src/skills/analysis-campaign/references/artifact-orchestration.md +0 -58
- package/src/skills/baseline/references/memory-playbook.md +0 -40
- package/src/skills/baseline/references/publishable-baseline-package.md +0 -30
- package/src/skills/write/references/outline-evidence-contract-example.md +0 -107
- package/src/skills/write/references/paper-experiment-matrix-template.md +0 -131
- package/src/skills/write/references/paper-section-playbook.md +0 -64
- package/src/skills/write/references/reviewer-first-writing.md +0 -64
- package/src/skills/write/references/revision-checklist.md +0 -70
- package/src/skills/write/references/section-contracts.md +0 -82
- package/src/skills/write/references/sentence-level-proofing.md +0 -49
- package/src/ui/dist/assets/AiManusChatView-COFACy7V.js +0 -204
- package/src/ui/dist/assets/CliPlugin-CvwCmDQ5.js +0 -109
- package/src/ui/dist/assets/CodeEditorPlugin-cOqSa0xq.js +0 -2
- package/src/ui/dist/assets/GitCommitViewerPlugin-DVgNHBCS.js +0 -1
- package/src/ui/dist/assets/LabCopilotPanel-ClMbq5Yu.js +0 -14
- package/src/ui/dist/assets/LabPlugin-L_SuE8ow.js +0 -22
- package/src/ui/dist/assets/NotebookEditor-C-4Kt1p9.js +0 -81
- package/src/ui/dist/assets/PdfViewerPlugin-DcOzU9vd.js +0 -17
- package/src/ui/dist/assets/SearchPlugin-CHj7M58O.js +0 -16
- package/src/ui/dist/assets/VNCViewer-CjlbyCB3.js +0 -11
- package/src/ui/dist/assets/bot-CFkZY-JP.js +0 -6
- package/src/ui/dist/assets/chevron-up-Dq5ofbht.js +0 -6
- package/src/ui/dist/assets/file-content-Dv4LoZec.js +0 -1
- package/src/ui/dist/assets/file-diff-panel-Denq-lC3.js +0 -1
- package/src/ui/dist/assets/file-jump-queue-DA-SdG__.js +0 -1
- package/src/ui/dist/assets/git-commit-horizontal-BUh6G52n.js +0 -6
- package/src/ui/dist/assets/image-B9HUUddG.js +0 -6
- package/src/ui/dist/assets/index-B2B1sg-M.js +0 -1
- package/src/ui/dist/assets/index-Cgla8biy.css +0 -33
- package/src/ui/dist/assets/index-DRyx7vAc.js +0 -1
- package/src/ui/dist/assets/index-Gbl53BNp.js +0 -2496
- package/src/ui/dist/assets/pdf-effect-queue-ZtnHFCAi.js +0 -6
- package/src/ui/dist/assets/popover-DL6h35vr.js +0 -1
- package/src/ui/dist/assets/select-DvmXt1yY.js +0 -11
- package/src/ui/dist/assets/sigma-7jpXazui.js +0 -6
- package/src/ui/dist/assets/trash-xA7kFt8i.js +0 -11
- package/src/ui/dist/assets/useCliAccess-DsMwDjOp.js +0 -1
- package/src/ui/dist/assets/useFileDiffOverlay-FuhcnKiw.js +0 -1
|
@@ -6,446 +6,226 @@ skill_role: stage
|
|
|
6
6
|
|
|
7
7
|
# Baseline
|
|
8
8
|
|
|
9
|
-
|
|
10
|
-
The target is one
|
|
9
|
+
Use this skill to secure one trustworthy comparator and then get out of the way.
|
|
10
|
+
The target is one accepted baseline line, not an endless reproduction diary.
|
|
11
11
|
|
|
12
|
-
##
|
|
13
|
-
|
|
14
|
-
- Follow the shared interaction contract injected by the system prompt.
|
|
15
|
-
- Keep ordinary setup and debugging updates concise.
|
|
16
|
-
- Use richer milestone updates only when the baseline becomes trusted, caveated, blocked, waived, or route-changing.
|
|
17
|
-
- Hard execution rule: every terminal command in this stage must go through `bash_exec`; do not use any other terminal path for setup, reproduction, monitoring, verification, Git, Python, package-manager, or file-inspection commands.
|
|
18
|
-
- Prefer `bash_exec` for setup, reproduction, monitoring, and verification commands so the baseline line stays durable and auditable.
|
|
19
|
-
|
|
20
|
-
## Tool discipline
|
|
21
|
-
|
|
22
|
-
- **Do not use native `shell_command` / `command_execution` in this skill.**
|
|
23
|
-
- **All shell, CLI, Python, bash, node, git, npm, uv, and environment work must go through `bash_exec(...)`.**
|
|
24
|
-
- **For git work inside the current quest repository or worktree, prefer `artifact.git(...)` before raw shell git commands.**
|
|
25
|
-
- **If a generic git smoke test is needed outside the quest repo, use `bash_exec(...)` in an isolated scratch repository.**
|
|
26
|
-
|
|
27
|
-
## Non-negotiable rules
|
|
28
|
-
|
|
29
|
-
- no fabricated metrics, logs, run status, or success claims
|
|
30
|
-
- do not skip baseline steps or silently simplify the route when that would change trust or comparability
|
|
31
|
-
- do not claim a baseline is ready before verification is complete
|
|
32
|
-
- do not infer missing commands, scripts, or parameters when the uncertainty could change the result
|
|
33
|
-
- any unavoidable guess must be written down explicitly with expected impact
|
|
34
|
-
- use web search for discovering papers or repos, but use `artifact.arxiv(paper_id=..., full_text=False)` for actually reading a source arXiv paper when it exists
|
|
35
|
-
- set `full_text=True` only when the short form is insufficient
|
|
36
|
-
- for Python baselines, environment setup should be standardized around `uv`
|
|
37
|
-
|
|
38
|
-
## Stage purpose
|
|
39
|
-
|
|
40
|
-
The baseline stage should produce a usable reference point through one of four routes:
|
|
41
|
-
|
|
42
|
-
1. attach an existing reusable baseline
|
|
43
|
-
2. import a reusable baseline package
|
|
44
|
-
3. reproduce a baseline from source
|
|
45
|
-
4. repair a broken or stale baseline
|
|
46
|
-
|
|
47
|
-
Keep the classic control flow:
|
|
48
|
-
|
|
49
|
-
1. analysis
|
|
50
|
-
2. setup
|
|
51
|
-
3. execution
|
|
52
|
-
4. verification
|
|
53
|
-
|
|
54
|
-
These are control gates, not paperwork walls.
|
|
55
|
-
|
|
56
|
-
## Quick workflow
|
|
57
|
-
|
|
58
|
-
1. Read the source paper and source repo first, or record exactly what is missing and why.
|
|
59
|
-
2. Choose the lightest trustworthy route: attach, import, reproduce, or repair.
|
|
60
|
-
3. Start with the fast path whenever the current baseline object, command path, and acceptance target are already clear enough to validate cheaply.
|
|
61
|
-
4. Before substantial baseline setup, code edits, or a real baseline run, create `PLAN.md` and `CHECKLIST.md`; short-form files are enough for simple fast-path work.
|
|
62
|
-
5. Keep one dominant phase visible: analysis -> setup -> execution -> verification.
|
|
63
|
-
6. Prefer one clean implementation pass, one smoke test, and then one normal baseline run.
|
|
64
|
-
7. Retry only when smoke, verification, or runtime evidence shows a concrete failure or incompatibility.
|
|
65
|
-
8. Close the stage by confirming or waiving the gate, then hand off with a concise `1-2` sentence summary of trust status and next anchor.
|
|
66
|
-
|
|
67
|
-
## Fast-path first
|
|
68
|
-
|
|
69
|
-
Default to the lightest baseline path that can still establish a trustworthy comparison.
|
|
70
|
-
Default to a fast path when it can establish trust with less work.
|
|
71
|
-
|
|
72
|
-
Fast path is the default when any of the following is true:
|
|
73
|
-
|
|
74
|
-
- `requested_baseline_ref` or `confirmed_baseline_ref` already points to the active baseline object
|
|
75
|
-
- the route is clearly `attach` or `import`
|
|
76
|
-
- the repo entrypoint, dataset or split, and metric contract are already concrete enough to validate cheaply
|
|
77
|
-
- reproduction requires no meaningful code changes and the main uncertainty is only whether the command still runs
|
|
78
|
-
|
|
79
|
-
Fast path means:
|
|
80
|
-
|
|
81
|
-
- do not restart broad baseline discovery by default
|
|
82
|
-
- do not front-load a full codebase audit when the entrypoint is already concrete
|
|
83
|
-
- use a minimal `PLAN.md`, a minimal `CHECKLIST.md`, one bounded smoke test when needed, and then one real validation or run
|
|
84
|
-
- default to reuse-and-verify when runtime already attached a concrete baseline
|
|
85
|
-
|
|
86
|
-
Escalate from fast path to fuller audit only when:
|
|
87
|
-
|
|
88
|
-
- the paper and repo disagree materially
|
|
89
|
-
- the real run or eval entrypoint is unclear
|
|
90
|
-
- code changes are likely required
|
|
91
|
-
- the contract spans multiple metrics, datasets, subtasks, or splits that still need interpretation
|
|
92
|
-
- the same failure class reappears after one documented autonomous fix
|
|
93
|
-
- the quest is trying to publish a reusable global baseline rather than only clear the current gate
|
|
12
|
+
## Match signals
|
|
94
13
|
|
|
95
|
-
|
|
14
|
+
Use `baseline` when:
|
|
96
15
|
|
|
97
16
|
- no credible baseline exists yet
|
|
98
17
|
- the current baseline is unverified or stale
|
|
99
18
|
- the user already has a baseline package that should be attached or imported
|
|
19
|
+
- a local code path or local service should be verified as the comparator
|
|
100
20
|
- a reproduction failed earlier and now needs repair
|
|
101
21
|
- the quest resumed and the baseline trust state is unclear
|
|
102
22
|
|
|
103
|
-
|
|
104
|
-
|
|
105
|
-
- the quest already has a verified active baseline and the next move is ideation or execution
|
|
106
|
-
- the user explicitly waived the baseline gate and that waiver is durably recorded
|
|
107
|
-
|
|
108
|
-
## Stage gate
|
|
109
|
-
|
|
110
|
-
Do not proceed to comparison-heavy downstream work unless one of the following is durably true:
|
|
111
|
-
|
|
112
|
-
- a baseline has been attached and accepted
|
|
113
|
-
- a baseline has been imported and accepted
|
|
114
|
-
- a baseline reproduction has completed and been verified
|
|
115
|
-
- an explicit waiver decision exists with a clear reason
|
|
116
|
-
|
|
117
|
-
Operationally:
|
|
118
|
-
|
|
119
|
-
- call `artifact.confirm_baseline(...)` once the accepted baseline root and trusted comparison contract are clear
|
|
120
|
-
- call `artifact.waive_baseline(...)` when the quest must continue without a baseline
|
|
121
|
-
- attach, import, or publish alone do not open the downstream gate
|
|
122
|
-
-## Required plan and checklist
-
-Before substantial baseline setup, code edits, or a real baseline run, create a quest-visible `PLAN.md` and `CHECKLIST.md`.
-
-- Use `references/baseline-plan-template.md` as the canonical structure for `PLAN.md`.
-- Use `references/baseline-checklist-template.md` as the canonical structure for `CHECKLIST.md`.
-- `analysis_plan.md` and `REPRO_CHECKLIST.md` remain acceptable compatibility alias files when an older quest already depends on them.
-- For fast-path attach/import/prebound validation or a simple reproduce path with no expected code changes, short-form `PLAN.md` and `CHECKLIST.md` are enough.
-- The plan should put the user's explicit requirements and non-negotiable constraints first.
-- Then record the chosen route, source identity, command path, expected outputs, acceptance condition, safe efficiency levers, main risks, and fallback.
-- If the route, commands, source package, fallback path, or trust judgment changes materially, revise `PLAN.md` before continuing.
-- Once the route is concrete, stop reshaping code and commands speculatively.
-
-Default retry discipline:
-
-- do not rerun the same unchanged smoke command just to reconfirm the same fact
-- treat one autonomous retry for the same failure class as the normal upper bound
-- if the same failure class appears again, switch explicitly into `repair`, record `blocked`, or route through `decision`
-
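The retry discipline reduces to two pieces of bookkeeping. One refuses an unchanged rerun. The other allows a single autonomous retry per failure class before escalating. This sketch works under those assumptions only. `RetryPolicy`, the failure-class labels, and the environment-fingerprint string are illustrative and do not come from the package.

```python
from collections import Counter


class RetryPolicy:
    """Hypothetical tracker for the default retry discipline."""

    def __init__(self, max_autonomous_retries: int = 1):
        self.max_autonomous_retries = max_autonomous_retries
        self.failures = Counter()  # failure class -> occurrences seen so far
        self.attempted = set()     # (command, environment fingerprint) pairs already run

    def should_run(self, command: str, env_fingerprint: str) -> bool:
        """Refuse to rerun an unchanged command in an unchanged environment."""
        key = (command, env_fingerprint)
        if key in self.attempted:
            return False
        self.attempted.add(key)
        return True

    def on_failure(self, failure_class: str) -> str:
        """Allow one autonomous retry per failure class, then escalate."""
        self.failures[failure_class] += 1
        if self.failures[failure_class] <= self.max_autonomous_retries:
            return "retry"
        # Same failure class again: switch to repair, record blocked, or route via decision.
        return "escalate"
```

A "retry" here still has to change something, such as the command or the environment. `should_run` rejects an identical rerun.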
-## Required durable outputs
-
-The baseline stage should usually leave behind:
-
-- a baseline directory under `baselines/local/` or `baselines/imported/`
-- `PLAN.md` and `CHECKLIST.md`
-- a verification note or report
-- command, config, environment, and metrics pointers
-- a baseline artifact
-- a confirmed baseline gate via `artifact.confirm_baseline(...)`, or an explicit waiver via `artifact.waive_baseline(...)`
-- an optional registry publication if the baseline is reusable beyond this quest
-
-For simple attach/import flows or a straightforward reproduce flow, do not stall just to precreate every optional note file.
-
-Useful optional notes:
-
-- `setup.md`
-- `execution.md`
-- `verification.md`
-- `STRUCTURE.md` when the layout is non-obvious
-
-## File-by-file contract
-
-- `PLAN.md` or compatibility alias `analysis_plan.md` is the required route contract before substantial setup, code edits, or a real run; it should state the route, source identity, command path, expected outputs, acceptance condition, main risks, and fallback.
-- `CHECKLIST.md` or compatibility alias `REPRO_CHECKLIST.md` is the required living state tracker; it should show whether the baseline object, smoke decision, real run decision, and final accept / block / waive outcome are explicit.
-- `setup.md` is optional unless environment or layout choices are non-trivial; if used, record the working directory, environment route, important config paths, source revision, and notable setup deviations.
-- `execution.md` is optional unless the run is long, multi-step, or rerun-heavy; if used, record the launched commands, durable log paths, checkpoints, exit state, and any reruns or repairs.
-- `verification.md` is optional as a filename but required in substance before acceptance or blocked closeout; either this file or an equivalent report should record trusted metrics, expected-versus-observed comparison, caveats, canonical output paths, and the next anchor.
-- `STRUCTURE.md` becomes required when the workspace layout, mounts, symlinks, or generated outputs are non-obvious or meant for reuse; it should map the important directories and say which paths are canonical.
-- `attachment.yaml` is required for attached or imported baselines under `baselines/imported/`; preserve source identity, selected variant when relevant, and attachment provenance there.
-- `<baseline_root>/json/metric_contract.json` is the canonical accepted comparison contract; once the baseline is accepted, do not leave the authoritative metric surface only in chat, memory, or prose.
-- `Result/metric.md` is scratch-only; it may help during execution, but it is never the final source of truth.
-
-Minimum stability rules:
-
-- before the first real run, leave one durable note with the chosen route, expected command path, target outputs, and main risks
-- after each smoke test or real run, record what actually happened and whether the route still looks viable
-- before acceptance, leave a clear verification note and baseline gate decision
-- every accepted baseline should leave one accepted baseline artifact
-- every blocked baseline line should leave one blocked report and one next-step decision
-- if one rolling note is enough for a simple baseline line, use it
-
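Under the 1.5.16 file-by-file contract in the removed lines, a required record counts as present under either its primary name or its compatibility alias. Attached or imported baselines additionally need `attachment.yaml`. The helper below is hypothetical. It takes a set of filenames, for example `{p.name for p in Path(workdir).iterdir()}`, and leaves the on-disk location of those files to the caller as an assumption.

```python
# Required records in the 1.5.16 contract, each with its accepted compatibility alias.
REQUIRED_RECORDS = {
    "plan": ("PLAN.md", "analysis_plan.md"),
    "checklist": ("CHECKLIST.md", "REPRO_CHECKLIST.md"),
}


def missing_records(existing_files: set, imported: bool) -> list:
    """Hypothetical helper: name each required record absent under every accepted filename."""
    missing = [
        record
        for record, names in REQUIRED_RECORDS.items()
        if not any(name in existing_files for name in names)
    ]
    # attachment.yaml is required for attached or imported baselines.
    if imported and "attachment.yaml" not in existing_files:
        missing.append("attachment.yaml")
    return missing


print(missing_records({"analysis_plan.md", "CHECKLIST.md"}, imported=False))  # []
```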
-## Durable path contract
-
-Use the real runtime paths consistently.
-
-Quest-local paths:
-
-- reproduced baseline root: `<quest_root>/baselines/local/<baseline_id>/`
-- attached or imported baseline root: `<quest_root>/baselines/imported/<baseline_id>/`
-- attachment record: `<quest_root>/baselines/imported/<baseline_id>/attachment.yaml`
-- canonical baseline metric contract JSON: `<baseline_root>/json/metric_contract.json`
-- baseline artifact record: `<quest_root>/artifacts/baselines/<artifact_id>.json`
-- baseline reports: `<quest_root>/artifacts/reports/<artifact_id>.json`
-- confirmed baseline reference: `quest.yaml -> confirmed_baseline_ref`
-
-Global reusable registry paths:
-
-- baseline registry index: `~/DeepScientist/config/baselines/index.jsonl`
-- canonical baseline entry: `~/DeepScientist/config/baselines/entries/<baseline_id>.yaml`
+Do not use `baseline` when:

-
-
-- `baseline_id` should be short, stable, and filesystem-safe
-- use letters, digits, `.`, `_`, or `-`
-- do not use spaces, `/`, `\\`, or `..`
-- if one codebase contains multiple comparable baselines, prefer one `baseline_id` with structured variants instead of inventing many near-duplicate entries
-- when variants exist, keep `default_variant_id`, `baseline_variants`, and per-variant metric summaries stable enough that later `experiment` and `write` stages can cite them directly
-
-Do not invent parallel durable locations when these runtime contracts already exist.
-Do not leave the authoritative metric contract only in chat, memory, or prose once the baseline is accepted.
-
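The quest-local path contract and the `baseline_id` rules fit in a few small helpers. This sketch assumes the directory layout listed in the path contract. `validate_baseline_id`, `baseline_root`, and `metric_contract_path` are hypothetical names. The regular expression encodes only the allowed characters named in the rules, plus the explicit ban on `..`.

```python
import re
from pathlib import Path

# Letters, digits, ".", "_", "-" only; this also rules out spaces, "/" and backslashes.
_ALLOWED = re.compile(r"[A-Za-z0-9._-]+")


def validate_baseline_id(baseline_id: str) -> str:
    """Hypothetical check for a short, filesystem-safe baseline_id."""
    if not _ALLOWED.fullmatch(baseline_id) or ".." in baseline_id:
        raise ValueError(f"unsafe baseline_id: {baseline_id!r}")
    return baseline_id


def baseline_root(quest_root: Path, baseline_id: str, imported: bool) -> Path:
    """<quest_root>/baselines/{local|imported}/<baseline_id>/"""
    kind = "imported" if imported else "local"
    return quest_root / "baselines" / kind / validate_baseline_id(baseline_id)


def metric_contract_path(root: Path) -> Path:
    """<baseline_root>/json/metric_contract.json"""
    return root / "json" / "metric_contract.json"


# Usage: resolve the canonical contract file for a reproduced baseline.
contract_file = metric_contract_path(baseline_root(Path("/quests/q1"), "resnet50.v1", imported=False))
```

Validating inside `baseline_root` means an unsafe id never reaches the filesystem path.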
-If a baseline is reproduced only because an analysis campaign needs an extra comparator:
+- a verified active baseline already exists and the next move is obviously `idea`, `experiment`, `write`, or `finalize`
+- the baseline gate was already explicitly waived for the current route

--
-- treat it as a supplementary analysis baseline unless the quest explicitly promotes it into the canonical gate
-- do not call `artifact.confirm_baseline(...)` for that supplementary case unless the quest truly intends to replace the canonical baseline
+## One-sentence summary

-
+Secure the lightest trustworthy comparator, make the comparison contract explicit, then confirm, waive, or block the baseline and stop.

-
+## Control workflow

-
-
-
-
+1. Choose the current acceptance target and the lightest route that can satisfy it.
+Prefer `attach`, `import`, or `verify-local-existing` before full reproduction.
+2. Make the comparator identity and core metric contract explicit.
+Record task, dataset, split, evaluation path, required metric ids, metric directions, source identity, and known deviations.
+3. Collect only the evidence needed to establish comparability.
+Do not widen into broad codebase audit or heavy reruns unless the lighter route cannot be trusted.
+4. Verify before acceptance.
+Check that outputs are real, metrics trace to real evidence, and the intended dataset/split and metric definitions match the contract.
+Explicitly verify the comparator and metric contract before treating the baseline gate as open.
+5. Close the gate explicitly.
+Call `artifact.confirm_baseline(...)`, call `artifact.waive_baseline(...)`, or record an explicit blocker and next route.
+When an already accepted baseline needs a deliberate second-pass refresh after verified code, variant, or canonical metric changes, prefer `artifact.overwrite_baseline(...)` over pretending the update is just a first confirmation.

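Step 5 of the control workflow picks one of three closing moves, plus an overwrite for a deliberate refresh of an already accepted baseline. This is a minimal sketch that assumes the caller already knows whether verification passed. `close_action` is hypothetical. It returns the name of the call to make and does not invoke the real `artifact` tools.

```python
def close_action(verified: bool, refreshing_accepted: bool = False,
                 waiver_reason: str = "", next_route: str = "") -> str:
    """Hypothetical: choose the closing move for the baseline gate (returns a label only)."""
    if verified:
        # A deliberate second-pass refresh of an accepted baseline is an overwrite,
        # not a first confirmation.
        if refreshing_accepted:
            return "artifact.overwrite_baseline"
        return "artifact.confirm_baseline"
    if waiver_reason.strip():
        return "artifact.waive_baseline"
    if not next_route:
        raise ValueError("a blocked baseline must record the next route")
    return f"blocked -> {next_route}"
```

The function raises on a blocked close-out with no next route. This mirrors the rule that a blocker is always recorded together with an explicit next route.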
-##
+## AVOID / pitfalls

-
+- Do not default to full source reproduction when reuse or verify-local-existing is already sufficient.
+- Do not treat attach, import, or publish alone as baseline acceptance.
+- Do not accept metrics that are fabricated, copied from the paper, or not traceable to real outputs, logs, or service responses.
+- Do not silently normalize away deviations in dataset, split, metric definition, evaluation path, or source identity.
+- Do not keep doing baseline work after the current acceptance target is already satisfied.
+- Do not repeat the same failure class without new evidence, code changes, environment changes, or a route change.

-
-2. import
-3. reproduce
-4. repair
+## Constraints

-
+- Routes, templates, filenames, smoke tests, and environment choices are tactics; the hard requirement is objective evidence sufficient to accept, waive, block, or switch the route.
+- Do not treat templates, filenames, `uv`, smoke tests, detached runs, or the phase order as required paths.
+- Durable records are required in substance, not in fixed filenames.
+- `PLAN.md`, `CHECKLIST.md`, `setup.md`, `execution.md`, `verification.md`, `analysis_plan.md`, and `REPRO_CHECKLIST.md` are allowed compatibility surfaces, not mandatory success paths.
+- `<baseline_root>/json/metric_contract.json` is the canonical accepted comparison contract.
+- Accepted baselines still require `artifact.confirm_baseline(...)`.
+- Waived baselines still require `artifact.waive_baseline(...)`.
+- Attach/import/publish alone do not open the downstream gate.
+- Later stages must not need to guess the active comparator, trusted metrics, or main caveats.

-##
+## Validation

-
+Before `baseline` can end, all applicable checks should be true:

-
+- comparator identity is explicit and stable enough to cite later
+- task, dataset, split, evaluation path, required metric ids, metric directions, source identity, and known deviations are durably recorded
+- trusted metric values or trusted output pointers trace to real files, logs, service responses, or source artifacts
+- verification checked the intended dataset/split and metric definitions
+- the accepted comparison contract exists at `<baseline_root>/json/metric_contract.json`
+- the route ended in `artifact.confirm_baseline(...)`, `artifact.waive_baseline(...)`, or an explicit blocked state with next-step routing

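Once the route's state is recorded, the Validation list can be checked mechanically. The sketch assumes a hypothetical flat `record` dict whose field names are illustrative. It treats the contract fields and the `json/metric_contract.json` file as applicable only to a confirmed baseline. A waiver instead needs a recorded reason, and a blocked state needs a next route.

```python
from pathlib import Path

# Illustrative field names for the recorded comparison contract.
CONTRACT_FIELDS = (
    "comparator_id", "task", "dataset", "split", "evaluation_path",
    "required_metric_ids", "metric_directions", "source_identity", "known_deviations",
)


def closeout_problems(record: dict, baseline_root: Path) -> list:
    """Hypothetical end-of-stage check; returns the unmet Validation items."""
    state = record.get("end_state")
    if state not in {"confirmed", "waived", "blocked"}:
        return ["route did not end in confirm, waive, or blocked"]
    if state == "waived":
        return [] if record.get("waiver_reason") else ["waiver has no recorded reason"]
    if state == "blocked":
        return [] if record.get("next_route") else ["blocked without next-step routing"]
    # Confirmed: every contract field must be present (an empty deviations list is fine).
    problems = [f"missing {name}" for name in CONTRACT_FIELDS if name not in record]
    if not (baseline_root / "json" / "metric_contract.json").is_file():
        problems.append("missing json/metric_contract.json")
    return problems
```

Fields are checked by presence rather than truthiness. An explicitly empty `known_deviations` therefore still counts as recorded.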
-
-- dataset and split contract
-- metric contract
-- source baseline identity
-- source code path
-- expected run command or evaluation path
-- expected paper or repo numbers when they exist
-- local resource constraints
-
-Default analysis discipline:
+## Interaction discipline

-
-
-- identify the real run or evaluation entrypoint
-- identify the dataset or split and metric contract
-- identify likely environment blockers
-- define the cheapest credible smoke test
+Follow the shared interaction contract injected by the system prompt.
+Keep baseline updates brief unless trust state, blocker state, route, cost, or user-facing risk changed materially.

-
+## Tool discipline

-
+- **Do not use native `shell_command` / `command_execution` in this skill.**
+- **All shell, CLI, Python, bash, node, git, npm, uv, and environment work must go through `bash_exec(...)`.**
+- **For git work inside the current quest repository or worktree, prefer `artifact.git(...)` before raw shell git commands.**
+- **If a generic git smoke test is needed outside the quest repo, use `bash_exec(...)` in an isolated scratch repository.**
+- Use web search for discovering papers or repos, but use `artifact.arxiv(paper_id=..., full_text=False)` for actually reading a source arXiv paper when it exists.
+- Set `full_text=True` only when the short form is insufficient.

-
-- end-to-end data flow
-- evaluation path and metric computation path
-- obvious environment assumptions
-- obvious bottlenecks or incompatibilities
+## Authority and freedom

-
+The agent owns the execution path.
+It may choose the workspace layout, environment manager, command order, debugging route, smoke strategy, local paths, and whether the best route is attach, import, verify-local-existing, reproduce, or repair.

-
-
-- the main weaknesses or bottlenecks likely to matter for this quest
+Ask the user only when the next move depends on a real scope, cost, permission, data-access, or scientific-preference decision that cannot be inferred from the quest contract.
+Ordinary route, path, environment, and debugging choices are autonomous unless they change the accepted comparison meaning.

-
+## Comparator-first rule

-The
+The baseline stage is comparator-first, not reproduction-first.
+For `comparison_ready`, the default question is:

-
+- what is the lightest trustworthy comparator?

-
+not:

--
-- import: place the imported baseline metadata under the quest and confirm the package is readable
-- reproduce: prepare the baseline work directory, commands, config pointers, and environment notes
-- repair: identify the precise broken point before rerunning blindly
+- how do I reproduce the whole source package most completely?

-
+Default to the lightest baseline path that can still support a fair downstream comparison.
+Default to a fast path when it can establish trust with less work.
+Do not restart broad discovery or front-load a full codebase audit when the comparator, command path, and metric contract are already concrete.
+When this applies, do not front-load a full codebase audit.
+In that fast-path state, do not restart broad baseline discovery by default.
+Do not require a fresh memory pass for every fast-path validation; use memory when it prevents repeated work or clarifies stale route state.
+In short, do not require a fresh memory pass for every fast-path validation.
+A bounded smoke test is usually helpful only when command path, environment viability, evaluator wiring, or output schema is still unclear.
+Treat smoke/pilot work as a `0-2` default budget, and remember not to repeat an unchanged check without new evidence.
+When resuming a previously blocked or ambiguous route, recover the relevant memory before trusting the old path again.

-
+If runtime already exposes `requested_baseline_ref` or a matching `confirmed_baseline_ref`, default to reuse-and-verify.
+Escalate to fuller audit, reproduction, or repair only when no concrete comparator, command path, or core comparability surface can be trusted yet.

-
--
-- install dependencies with `uv pip install ...`
-- run setup, smoke tests, and real commands through `uv run ...`
+For route examples and boundary cases, read `references/route-selection.md`, `references/artifact-flow-examples.md`, and `references/boundary-cases.md`.
+Use `references/baseline-plan-template.md` and `references/baseline-checklist-template.md` when a baseline route is complex enough to need durable planning surfaces.

-
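The comparator-first rule implies an order of preference, not a fixed ritual. This sketch assumes a hypothetical `runtime` dict and a map of which routes are currently available. Two details are assumptions: placing `reproduce` before `repair`, and reading the `0-2` smoke budget as a hard cap of two.

```python
# Lightest routes first; the relative order of reproduce and repair is an assumption.
ROUTE_ORDER = ("attach", "import", "verify-local-existing", "reproduce", "repair")
SMOKE_BUDGET = 2  # the "0-2" default budget, treated here as a hard cap


def choose_route(runtime: dict, available: dict) -> str:
    """Hypothetical comparator-first route choice."""
    # A pre-bound or matching confirmed baseline means reuse-and-verify, not rediscovery.
    if runtime.get("requested_baseline_ref") or runtime.get("confirmed_baseline_ref"):
        return "reuse-and-verify"
    for route in ROUTE_ORDER:
        if available.get(route):
            return route
    return "blocked"


def smoke_allowed(smoke_runs_so_far: int, unclear: set) -> bool:
    """Smoke only while command path, environment, evaluator wiring, or output schema is unclear."""
    return bool(unclear) and smoke_runs_so_far < SMOKE_BUDGET
```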
+## Acceptance targets

--
--
--
--
--
+- `comparison_ready`: the default target; one comparator is trustworthy enough for downstream comparison, and the core metric contract is durably recorded
+- `paper_repro_ready`: the baseline is strong enough to support paper-facing reproduction or comparison claims
+- `registry_publishable`: the baseline package is reusable and clean enough to publish as a durable baseline package
+- `blocked`: the current route cannot clear the gate cleanly, and the next move is explicit
+- `waived`: the quest must continue without a baseline, and the reason is durably recorded

-
+Not every baseline needs paper-grade exact reproduction.
+A verified attached, imported, or local-existing comparator can be enough when the acceptance target is only `comparison_ready`.

-
-- `uv venv --python 3.11`
-- `uv pip install -r requirements.txt`
-- `uv run python scripts/smoke_test.py`
-- `uv run python train.py --config ...`
+## Hard acceptance gates

-
+Baseline success means later stages can compare against one accepted comparator without guessing task, data, split, metric, source, command or evaluation path, provenance, or caveats.

-
-- working directory
-- config files
-- command template
-- expected outputs
-- known deviations from paper or source
-- the chosen `uv` route and Python version
+A baseline is successful only when all applicable gates are true:

-
+- the comparator identity is explicit and stable enough for later stages to cite
+- the task, dataset, split, evaluation path, required metric ids, metric directions, source identity, and known deviations are durably recorded
+- trusted metric values or trusted output pointers are traceable to real files, logs, service responses, source artifacts, or an accepted registry/package record
+- verification checked that the evidence came from the intended dataset/split and metric definitions
+- the accepted comparison contract is written to `<baseline_root>/json/metric_contract.json`
+- the baseline gate is opened with `artifact.confirm_baseline(...)`, or intentionally bypassed with `artifact.waive_baseline(...)`

-
-
+Once a comparison-ready baseline is durably confirmed, baseline should usually stop immediately.
+Once a comparison-ready baseline is durably confirmed, baseline should usually stop immediately and hand off to the next scientific step.
+Any extra baseline work after that must name one explicit unresolved comparison risk it is meant to remove.

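The stop rule after confirmation is a small guard. Baseline work continues only when the extra work names one unresolved comparison risk. `next_step_after_confirm` and its return labels are hypothetical, used only for illustration.

```python
def next_step_after_confirm(comparison_ready_confirmed: bool, unresolved_risk: str = "") -> str:
    """Hypothetical stop-rule guard for work after a confirmed comparison-ready baseline."""
    if not comparison_ready_confirmed:
        return "continue baseline"
    risk = unresolved_risk.strip()
    if risk:
        # Extra work is allowed only to remove one named comparison risk.
        return f"continue baseline to remove: {risk}"
    return "hand off"
```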
-##
+## Route success criteria

-
+Choose the route that maximizes trust per unit time and compute; do not follow a fixed ritual.
+Keep one dominant baseline route active at a time.
+If a lighter route already satisfies the current acceptance target, stop there.

-
+- `attach` succeeds when baseline identity, provenance, trusted outputs pointer, core metric contract, and accepted baseline artifact are explicit
+- `import` succeeds when the package is materialized/readable inside the quest, `attachment.yaml` or equivalent provenance exists, and trusted outputs or metrics are traceable
+- `verify-local-existing` succeeds when the concrete local path or service, exact command or evaluation endpoint, output location, required metrics, and core metric contract are verified
+- `reproduce` succeeds when source identity, command or evaluation path, expected outputs, verification evidence, deviations, and metric contract are explicit
+- `repair` succeeds when the broken point is identified, a bounded fix or route change is made, rerun or re-read evidence supports the new trust state, and the result is accepted or blocked

-
--
-- avoid uncontrolled side experiments during baseline establishment
-- checkpoint only explainable, minimal code changes
-- prefer equivalence-preserving efficiency gains such as larger safe batch size, cache reuse, checkpoint resume, and parallel downloads or workers
-- do not use an efficiency lever if it changes accepted baseline meaning, effective evaluation contract, or trust judgment
+Prefer reuse over redundant reproduction, but prefer reproduction or repair when reuse would still leave the baseline incomparable.
+Do not replace a working comparison-ready comparator with a heavier route merely because the heavier route feels cleaner or more complete.

-
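Each route's success criterion is a set of facts that must be explicit before the route counts as done. The labels below paraphrase the route success criteria and are hypothetical keys, not fields from any DeepScientist schema. The helper reports which facts are still missing for a given route.

```python
# Hypothetical fact labels paraphrasing each route's success criterion.
ROUTE_REQUIREMENTS = {
    "attach": {"identity", "provenance", "trusted_outputs_pointer",
               "core_metric_contract", "accepted_baseline_artifact"},
    "import": {"materialized_in_quest", "provenance_record", "traceable_outputs_or_metrics"},
    "verify-local-existing": {"local_path_or_service", "command_or_endpoint", "output_location",
                              "required_metrics", "core_metric_contract"},
    "reproduce": {"source_identity", "command_or_eval_path", "expected_outputs",
                  "verification_evidence", "deviations", "metric_contract"},
    "repair": {"broken_point", "bounded_fix_or_route_change", "supporting_evidence",
               "accepted_or_blocked"},
}


def route_gaps(route: str, established: set) -> set:
    """Return the success facts for `route` that are not yet explicit."""
    return ROUTE_REQUIREMENTS[route] - set(established)
```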
+## Objective evidence requirements

-
-- once the smoke test passes, launch the real baseline reproduction with `bash_exec(mode='detach', ...)`
-- monitor by forward progress instead of by short-window completion anxiety
-- do not report final success until the command actually finished and the expected result files exist
-- if you need to recover ids or inspect session state, use `bash_exec(mode='history')` or `bash_exec(mode='list')`
-- `bash_exec(mode='read', id=...)` returns the full saved log when it is `2000 lines or fewer`; for longer logs, inspect omitted middle windows with `start` and `tail`
-- during monitoring, prefer `bash_exec(mode='read', id=..., tail_limit=..., order='desc')`, and after the first read prefer incremental checks with `after_seq=last_seen_seq`
-- use `silent_seconds`, `progress_age_seconds`, `signal_age_seconds`, and `watchdog_overdue` as the default staleness clues
-- if a run is clearly invalid, wedged, or superseded, stop it with `bash_exec(mode='kill', id=..., wait=true, timeout_seconds=...)`, document why, and relaunch cleanly
-- do not let more than the `30-minute visibility bound` pass without a real inspection and a `next expected update time`
-- when the baseline code is under your control, prefer a throttled `tqdm` progress reporter and periodic `__DS_PROGRESS__` markers when feasible
+The final evidence should cover these facts before acceptance:

-
+- comparator candidate and baseline id
+- source paper, source repo, source commit/version/tag, local service identity, or registry/package identity as applicable
+- task identity
+- dataset identity and split contract
+- evaluation script, evaluation endpoint, or evaluation path
+- required metric keys for the current downstream comparison
+- metric directions
+- metric values or trusted output pointers
+- environment and hardware facts that materially affect comparability
+- known deviations from the paper, source package, local reference, or selected target
+- verification verdict and caveats

-
-
-
+Unless the user explicitly specifies otherwise, treat the original paper's evaluation protocol as the canonical starting point.
+If later `experiment` work would still have to guess the comparison contract, the baseline is not ready.
+For a compact verdict rubric, read `references/comparability-contract.md`.

-##
+## Verification

 Verification is mandatory before baseline acceptance.

 Verify:

-- the run actually finished
+- the run, service call, package import, or trusted-output inspection actually finished
 - the reported metrics came from the intended dataset and split
--
-- the result is comparable to the paper, source repo, or selected target
--
+- metric definitions and directions match the quest contract
+- the result is comparable to the paper, source repo, local comparator, registry package, or selected target
+- deviations are explicitly stated rather than silently normalized away

 Classify the outcome as one of:

 - `verified_match`
 - `verified_close`
 - `verified_diverged`
+- `trusted_with_caveats`
 - `broken`

-Verification
+Verification should explicitly separate likely implementation mismatch, environment mismatch, data or split mismatch, expected stochastic variance, and unexplained divergence when those distinctions matter.

-
-- environment mismatch
-- data or split mismatch
-- expected stochastic variance
-- unexplained divergence
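For a single metric, the outcome classes can be assigned from an observed value and a reference value. The relative-gap tolerances in this sketch (`match_tol`, `close_tol`) are illustrative placeholders, not thresholds from the skill. `trusted_with_caveats` is used only when a real output exists but there is no reference number to compare against. A real verdict also depends on the mismatch analysis described in the Verification section.

```python
from typing import Optional


def classify_outcome(observed: Optional[float], expected: Optional[float],
                     match_tol: float = 0.001, close_tol: float = 0.01) -> str:
    """Hypothetical verdict for one metric; tolerances are illustrative relative gaps."""
    if observed is None:
        return "broken"  # no real output to trust
    if expected is None:
        return "trusted_with_caveats"  # real output, but no reference number to compare against
    gap = abs(observed - expected) / max(abs(expected), 1e-12)
    if gap <= match_tol:
        return "verified_match"
    if gap <= close_tol:
        return "verified_close"
    return "verified_diverged"
```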
+## Core metric contract

-
-
-- whether the baseline is trustworthy enough for downstream comparison
-- whether the result is reusable beyond this quest
-- whether another repair or rerun is justified
-- whether the line should stop here and hand off
-
-A verification report should be self-contained enough that a later stage can answer:
-
-- what was used
-- how it was obtained: attach, import, reproduce, or repair
-- what commands and configs were used
-- what metrics are trusted
-- what caveats remain
-- whether the result is reusable beyond this quest
-
-## Baseline comparability contract
-
-The baseline stage is not complete just because something ran.
-It is complete when later stages can compare against it fairly.
-
-Before declaring a baseline usable, make the comparability contract explicit:
+Before declaring a baseline usable, make the core comparison contract explicit:

 - task identity
-- dataset identity and
-- split contract
-- preprocessing boundary
+- dataset identity and split contract
 - evaluation script or evaluation path
-- required metric keys
+- required metric keys for the current downstream comparison
 - metric directions
-- seed policy when relevant
 - source commit or source package identity
 - known deviations from the source reference

-
-
-
-
-## Feasibility and trust classes
-
-Before acceptance, classify feasibility as one of:
-
-- `full_reproducible`
-- `degraded_but_acceptable`
-- `blocked`
-
-And classify downstream trust as one of:
-
-- `verified`
-- `partially_verified`
-- `operational_but_incomparable`
-- `failed`
-
-Do not silently upgrade a degraded or merely operational result into a normal trusted baseline.
-
-## Minimum baseline artifact content
+`<baseline_root>/json/metric_contract.json` is the canonical accepted comparison contract.
+The comparison-ready minimum still requires `<baseline_root>/json/metric_contract.json`.
+A core contract is enough to confirm a `comparison_ready` baseline; expand it later when paper claims, registry publication, or variant-heavy comparison need more coverage.

 The accepted baseline artifact should include at least:

@@ -460,111 +240,46 @@ The accepted baseline artifact should include at least:
 - `source`
 - `summary`

-If variants exist, also include:
-
-- `default_variant_id`
-- `baseline_variants`
-
 Metric-contract rules:

-- if the accepted baseline contract includes multiple metrics, datasets, subtasks, or splits, record all of them in `<baseline_root>/json/metric_contract.json`
 - keep `primary_metric` as the headline metric only; do not let it erase the rest of the comparison surface
--
+- submit canonical `metrics_summary` as a flat top-level dictionary keyed by the paper-facing metric ids
 - every canonical baseline metric entry should include `description`, either `derivation` or `origin_path`, and `source_ref`
+- mark only the currently required canonical metrics as required; additional metrics can be added later or kept supplementary
+- if the accepted baseline contract already needs multiple metrics, datasets, subtasks, or splits, record them in `<baseline_root>/json/metric_contract.json`
 - if the paper reports both aggregate and per-dataset or per-task results, preserve both whenever feasible through `metrics_summary` plus structured rows rather than one cherry-picked scalar
 - if the source package already has a richer leaderboard table, structured result file, or `json/metric_contract.json`, reuse that richer contract instead of hand-writing a thinner one that keeps only one averaged scalar
 - `Result/metric.md` is optional temporary scratch memory only; reconcile against it before calling `artifact.confirm_baseline(...)`, but do not treat it as a required durable file
+- for stable accepted payload shapes, read `references/artifact-payload-examples.md`

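The metric-contract rules translate into a few structural checks on `json/metric_contract.json`. The payload shape used here is an assumption: a top-level `primary_metric` id, a flat `metrics_summary` dict, and a hypothetical `metrics` list of per-metric entries keyed by `metric_id`. The stable accepted shapes live in `references/artifact-payload-examples.md`, which this sketch does not reproduce.

```python
import json


def load_contract(path: str) -> dict:
    """Read a metric contract JSON file from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def metric_contract_problems(contract: dict) -> list:
    """Hypothetical structural checks for json/metric_contract.json."""
    summary = contract.get("metrics_summary")
    if not isinstance(summary, dict):
        return ["metrics_summary must be a top-level dict keyed by metric id"]
    # metrics_summary should stay flat: metric id -> value, no nested groups.
    problems = [
        f"metrics_summary[{metric_id!r}] is nested; keep it flat"
        for metric_id, value in summary.items()
        if isinstance(value, dict)
    ]
    if contract.get("primary_metric") not in summary:
        problems.append("primary_metric is not one of the metrics_summary keys")
    for entry in contract.get("metrics", []):  # assumed per-metric entry list
        metric_id = entry.get("metric_id", "?")
        if not entry.get("description"):
            problems.append(f"{metric_id}: missing description")
        if not (entry.get("derivation") or entry.get("origin_path")):
            problems.append(f"{metric_id}: needs derivation or origin_path")
        if not entry.get("source_ref"):
            problems.append(f"{metric_id}: missing source_ref")
    return problems
```

Running the check before `artifact.confirm_baseline(...)` catches a contract that reduced the comparison surface to one nested or unlabeled scalar.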
-##
-
-Use the registry deliberately, not as an afterthought.
-
-If the result is reusable beyond the current quest:
-
-- publish it through `artifact.publish_baseline(...)`
-- ensure the payload includes identity, provenance, trusted metrics, and any variant structure
-- set `publish_global: true` only when verification is complete and reuse is justified
-
-If the current quest should reuse an existing baseline:
-
-- attach it through `artifact.attach_baseline(...)`
-- preserve the selected `baseline_id`
-- preserve the selected `variant_id` when one is used
-- keep the attachment durable under `baselines/imported/`
-
-If runtime state already includes `requested_baseline_ref` or a matching `confirmed_baseline_ref`:
-
-- default to reuse-and-verify, not rediscovery
-- treat a creation-time pre-bound baseline as the active starting point unless you find a concrete incompatibility
-- do not rerun broad baseline scouting or full reproduction just because the stage name is `baseline`
-
-For a clearer attach/import/reproduce/repair rubric, read `references/route-selection.md`.
-For reusable-package expectations, read `references/publishable-baseline-package.md`.
-
-## Workspace and branch rules
-
-- treat the baseline workspace as a system-managed reproduction surface, not an unrelated sandbox
-- avoid creating a nested authoritative Git lifecycle inside the baseline workspace
-- use the quest branch unless isolation is genuinely needed
-- if baseline setup is risky or intrusive, prepare an isolated branch or worktree first and record why
-- do not proliferate branches without a reason
-
-## Memory rules
-
-Stage-start requirement:
-
-- by default, begin every baseline pass with `memory.list_recent(scope='quest', limit=5)`
-- then run at least one baseline-relevant `memory.search(...)` before new baseline analysis, repair, or rerun work
-- fast-path exception: if the quest already exposes a clear `requested_baseline_ref` or `confirmed_baseline_ref` and the immediate task is only to validate or reattach that concrete baseline, you may skip broad retrieval
|
|
255
|
+
## Operational guidance
|
|
519
256
|
|
|
520
|
-
|
|
257
|
+
The main skill file keeps the control surface up front.
|
|
258
|
+
For the longer operational notes, read `references/operational-guidance.md`.
|
|
521
259
|
|
|
522
|
-
-
|
|
523
|
-
- environment
|
|
524
|
-
-
|
|
525
|
-
- verification caveats
|
|
526
|
-
- attach vs import vs reproduce vs repair rationale
|
|
260
|
+
- use it when you need the exact durable route record shape
|
|
261
|
+
- use it when you need detailed execution tactics or environment tactics
|
|
262
|
+
- use it when reuse or memory handling materially affects the route
|
|
527
263
|
|
|
528
|
-
|
|
264
|
+
## Negative cases and stop rules
|
|
529
265
|
|
|
530
|
-
|
|
266
|
+
Do not accept a baseline when:
|
|
531
267
|
|
|
532
|
-
-
|
|
533
|
-
|
|
534
|
-
|
|
535
|
-
|
|
536
|
-
|
|
537
|
-
|
|
538
|
-
-
|
|
539
|
-
-
|
|
540
|
-
-
|
|
541
|
-
- `baseline` only for an accepted baseline record
|
|
542
|
-
|
|
543
|
-
For stable field shapes, read `references/artifact-payload-examples.md`.
|
|
544
|
-
|
|
545
|
-
The baseline handoff should make these items obvious:
|
|
546
|
-
|
|
547
|
-
- `baseline_id`
|
|
548
|
-
- `baseline_variant_id` when relevant
|
|
549
|
-
- route used: attach, import, reproduce, or repair
|
|
550
|
-
- trusted metrics
|
|
551
|
-
- canonical metric contract JSON path
|
|
552
|
-
- verification outcome
|
|
553
|
-
- reusable or quest-local only
|
|
554
|
-
- canonical output paths
|
|
555
|
-
- main caveats
|
|
556
|
-
- recommended next anchor
|
|
557
|
-
|
|
558
|
-
If this packet is not obvious from the accepted artifact plus verification note, the baseline line is not stable enough yet.
|
|
559
|
-
|
|
560
|
-
## Failure and blocked handling
|
|
268
|
+
- metrics are fabricated, copied, or paraphrased without provenance
|
|
269
|
+
- metrics are copied from a paper while the acceptance target requires local verification
|
|
270
|
+
- dataset, split, metric direction, or evaluation path is materially unknown
|
|
271
|
+
- outputs exist but cannot be tied to the intended command, source, comparator, package, or service
|
|
272
|
+
- a local run completed but used a materially different protocol without a recorded caveat
|
|
273
|
+
- source code was modified in a way that changes baseline scope without recording the deviation
|
|
274
|
+
- a package imports but trusted metrics or outputs are not traceable
|
|
275
|
+
- later experiment work would still need to guess the required baseline metric ids
|
|
276
|
+
- the same failure class reappears without new evidence, code changes, environment changes, or a route change
|
|
561
277
|
|
|
278
|
+
If the same failure class appears again without new evidence, code changes, environment changes, or a route change, stop looping and route through `repair`, `decision`, `blocked`, `waive`, or one bounded clarification.
|
|
562
279
|
Do not hide failures.
|
|
563
|
-
|
|
564
|
-
If blocked, record the class explicitly:
|
|
280
|
+
If blocked, record the class explicitly when possible:
|
|
565
281
|
|
|
566
282
|
- `missing_source`
|
|
567
|
-
- `missing_code`
|
|
568
283
|
- `missing_metric_contract`
|
|
569
284
|
- `environment_infeasible`
|
|
570
285
|
- `command_unknown`
|
|
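The rejection rules in the hunk above lean on `json/metric_contract.json`. Their goal is that later experiment work never has to guess which baseline metric ids are required. The diff does not show that file's schema, so the sketch below assumes a hypothetical shape:

- a `metrics` list whose entries carry `metric_id`, `direction`, and `source`;
- optional per-dataset `rows` that cite those ids.

It checks a few of the listed "do not accept" conditions against that shape. The field names, the direction labels, and the `check_metric_contract` helper are illustrative assumptions, not the package's API.

```python
import json
from typing import Any

# Hypothetical schema: these field names are assumptions for illustration,
# not the package's documented metric_contract.json format.
REQUIRED_METRIC_FIELDS = ("metric_id", "direction", "source")
VALID_DIRECTIONS = {"higher_is_better", "lower_is_better"}


def check_metric_contract(contract: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means no blocker was found."""
    problems: list[str] = []
    metrics = contract.get("metrics") or []
    if not metrics:
        # Later experiment work would have to guess the metric ids.
        problems.append("no metrics recorded")
    seen: set[str] = set()
    for i, metric in enumerate(metrics):
        # Identity, direction, and provenance must all be present.
        for name in REQUIRED_METRIC_FIELDS:
            if not metric.get(name):
                problems.append(f"metrics[{i}] missing {name}")
        direction = metric.get("direction")
        if direction and direction not in VALID_DIRECTIONS:
            problems.append(f"metrics[{i}] has unknown direction {direction!r}")
        metric_id = metric.get("metric_id")
        if metric_id in seen:
            problems.append(f"duplicate metric_id {metric_id!r}")
        if metric_id:
            seen.add(metric_id)
    # Per-dataset or per-task rows, if present, must cite a declared metric id.
    for j, row in enumerate(contract.get("rows") or []):
        if row.get("metric_id") not in seen:
            problems.append(f"rows[{j}] cites undeclared metric {row.get('metric_id')!r}")
    return problems


if __name__ == "__main__":
    # Toy contract with made-up values, used only to exercise the checks.
    example = json.loads("""{
      "metrics": [{"metric_id": "acc", "direction": "higher_is_better", "source": "local_run"}],
      "rows": [{"metric_id": "acc", "dataset": "dev", "value": 0.71},
               {"metric_id": "f1", "dataset": "dev", "value": 0.64}]
    }""")
    print(check_metric_contract(example))
```

Returning a list of problems, rather than raising on the first one, matches the handoff expectation that every caveat is recorded before a baseline is accepted.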
@@ -576,29 +291,36 @@ A blocked result must state:
|
|
|
576
291
|
- what failed
|
|
577
292
|
- what was tried
|
|
578
293
|
- which paths or logs show the issue
|
|
579
|
-
- whether the next best move is attach, import, retry, repair, reset, or ask the user
|
|
294
|
+
- whether the next best move is attach, import, retry, repair, reset, waive, or ask the user
|
|
580
295
|
|
|
581
|
-
|
|
296
|
+
Bounded autonomous fixes are acceptable only when they do not change confirmed scope, metrics, permissions, resource assumptions, or scientific meaning.
|
|
297
|
+
Reasonable bounded fixes include missing dependency installs, wrong dataset paths, permission fixes on scripts, obvious environment activation mistakes, and conservative batch-size reductions for OOM.
|
|
582
298
|
|
|
583
|
-
|
|
584
|
-
|
|
585
|
-
|
|
586
|
-
- reasonable batch-size reductions for OOM
|
|
587
|
-
- obvious environment activation mistakes
|
|
299
|
+
## Baseline id and variant rules
|
|
300
|
+
|
|
301
|
+
Keep baseline identifiers and variant names stable enough that later stages can cite the same comparator without guesswork.
|
|
588
302
|
|
|
589
|
-
|
|
303
|
+
- keep `baseline_id` short, stable, and filesystem-safe
|
|
304
|
+
- prefer one baseline id with stable variant names over many near-duplicate ids
|
|
305
|
+
- if multiple comparators exist, mark which one is the primary downstream baseline
|
|
590
306
|
|
|
591
307
|
## Exit criteria
|
|
592
308
|
|
|
593
|
-
Exit
|
|
309
|
+
Exit once one of these is durably true:
|
|
594
310
|
|
|
595
311
|
- a baseline is attached and accepted
|
|
596
312
|
- an imported baseline is accepted
|
|
313
|
+
- a verified local-existing comparator is accepted
|
|
597
314
|
- a reproduced baseline is verified and accepted
|
|
315
|
+
- a repaired baseline is verified and accepted
|
|
598
316
|
- a broken route has been declared blocked and a next decision is recorded
|
|
317
|
+
- a waiver decision explicitly leaves the baseline gate
|
|
318
|
+
- a route change is recorded because the previous route is no longer the best trust-per-cost path
|
|
599
319
|
|
|
600
320
|
Typical next anchors:
|
|
601
321
|
|
|
602
322
|
- `idea`
|
|
603
323
|
- `experiment` in tightly scoped follow-on cases
|
|
604
324
|
- `decision` if the baseline line remains contested
|
|
325
|
+
|
|
326
|
+
A good baseline pass leaves one trusted comparator, one explicit blocker, or one explicit route change, not a vague promise to keep rechecking the baseline.
|
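The second hunk asks for two things:

- a blocked result that states what failed, what was tried, which paths or logs show the issue, and the next best move;
- a `baseline_id` that is short, stable, and filesystem-safe.

The sketch below encodes both under stated assumptions. The id regex is one hypothetical reading of "filesystem-safe". The `BlockedResult` class and its field names are illustrative and are not part of the package. `ask_user` stands in for "ask the user".

```python
import re
from dataclasses import dataclass, field

# Assumed reading of "short, stable, filesystem-safe": lowercase letters,
# digits, '.', '_' or '-', starting with a letter or digit, at most 64 chars.
BASELINE_ID_PATTERN = re.compile(r"[a-z0-9][a-z0-9._-]{0,63}")

# Next moves named for a blocked result in the revised text ("ask_user" is a stand-in).
NEXT_MOVES = {"attach", "import", "retry", "repair", "reset", "waive", "ask_user"}


def is_valid_baseline_id(baseline_id: str) -> bool:
    """True if the id matches the assumed filesystem-safe pattern."""
    return BASELINE_ID_PATTERN.fullmatch(baseline_id) is not None


@dataclass
class BlockedResult:
    """Hypothetical record of the fields a blocked result must state."""
    blocker_class: str         # e.g. "missing_source", "command_unknown"
    what_failed: str
    what_was_tried: list[str]
    evidence_paths: list[str]  # paths or logs that show the issue
    next_move: str
    problems: list[str] = field(default_factory=list, init=False)

    def __post_init__(self) -> None:
        # Collect every missing element instead of stopping at the first one.
        if not self.what_failed:
            self.problems.append("what_failed is empty")
        if not self.what_was_tried:
            self.problems.append("no attempts recorded")
        if not self.evidence_paths:
            self.problems.append("no paths or logs cited")
        if self.next_move not in NEXT_MOVES:
            self.problems.append(f"unknown next move {self.next_move!r}")

    @property
    def is_complete(self) -> bool:
        """A blocked result counts as an explicit exit only when nothing is missing."""
        return not self.problems
```

A complete `BlockedResult` corresponds to the exit criterion "a broken route has been declared blocked and a next decision is recorded". An incomplete one is closer to the vague promise to keep rechecking that the closing line warns against.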