@researai/deepscientist 1.5.17 → 1.6.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/AGENTS.md +309 -130
- package/AISB/catalog/aisb.b1.agentic_coding.yaml +244 -0
- package/AISB/catalog/aisb.b10.climate_earth.yaml +235 -0
- package/AISB/catalog/aisb.b11.model_efficiency.yaml +231 -0
- package/AISB/catalog/aisb.b12.embodied_ai.yaml +238 -0
- package/AISB/catalog/aisb.b2.agent_systems.yaml +229 -0
- package/AISB/catalog/aisb.b3.self_evolving_rl.yaml +237 -0
- package/AISB/catalog/aisb.b4.lm_reasoning.yaml +240 -0
- package/AISB/catalog/aisb.b5.math_proof.yaml +235 -0
- package/AISB/catalog/aisb.b6.research_process.yaml +243 -0
- package/AISB/catalog/aisb.b7.multimodal_fusion.yaml +232 -0
- package/AISB/catalog/aisb.b8.lifesci_drug.yaml +275 -0
- package/AISB/catalog/aisb.b9.material_science.yaml +237 -0
- package/AISB/catalog/aisb.t3.001_savvy.yaml +159 -0
- package/AISB/catalog/aisb.t3.001_savvy.zh.yaml +121 -0
- package/AISB/catalog/aisb.t3.002_pinet.yaml +189 -0
- package/AISB/catalog/aisb.t3.002_pinet.zh.yaml +130 -0
- package/AISB/catalog/aisb.t3.004_decentralattn.yaml +184 -0
- package/AISB/catalog/aisb.t3.004_decentralattn.zh.yaml +153 -0
- package/AISB/catalog/aisb.t3.005_tsae.yaml +193 -0
- package/AISB/catalog/aisb.t3.005_tsae.zh.yaml +139 -0
- package/AISB/catalog/aisb.t3.006_physense.yaml +194 -0
- package/AISB/catalog/aisb.t3.006_physense.zh.yaml +118 -0
- package/AISB/catalog/aisb.t3.007_reasoningiqa.yaml +169 -0
- package/AISB/catalog/aisb.t3.007_reasoningiqa.zh.yaml +133 -0
- package/AISB/catalog/aisb.t3.008_meanflows.yaml +188 -0
- package/AISB/catalog/aisb.t3.008_meanflows.zh.yaml +140 -0
- package/AISB/catalog/aisb.t3.009_scoremissing.yaml +179 -0
- package/AISB/catalog/aisb.t3.009_scoremissing.zh.yaml +119 -0
- package/AISB/catalog/aisb.t3.010_suitabilityfilter.yaml +221 -0
- package/AISB/catalog/aisb.t3.010_suitabilityfilter.zh.yaml +141 -0
- package/AISB/catalog/aisb.t3.011_osd.yaml +206 -0
- package/AISB/catalog/aisb.t3.011_osd.zh.yaml +163 -0
- package/AISB/catalog/aisb.t3.012_efficientqat.yaml +206 -0
- package/AISB/catalog/aisb.t3.012_efficientqat.zh.yaml +159 -0
- package/AISB/catalog/aisb.t3.013_appl.yaml +152 -0
- package/AISB/catalog/aisb.t3.013_appl.zh.yaml +126 -0
- package/AISB/catalog/aisb.t3.014_piguard.yaml +207 -0
- package/AISB/catalog/aisb.t3.014_piguard.zh.yaml +164 -0
- package/AISB/catalog/aisb.t3.015_frspec.yaml +209 -0
- package/AISB/catalog/aisb.t3.015_frspec.zh.yaml +163 -0
- package/AISB/catalog/aisb.t3.016_mathfusion.yaml +166 -0
- package/AISB/catalog/aisb.t3.016_mathfusion.zh.yaml +145 -0
- package/AISB/catalog/aisb.t3.017_multimodalglp.yaml +171 -0
- package/AISB/catalog/aisb.t3.017_multimodalglp.zh.yaml +122 -0
- package/AISB/catalog/aisb.t3.018_cotsynth.yaml +206 -0
- package/AISB/catalog/aisb.t3.018_cotsynth.zh.yaml +162 -0
- package/AISB/catalog/aisb.t3.019_dyscaleut.yaml +211 -0
- package/AISB/catalog/aisb.t3.019_dyscaleut.zh.yaml +148 -0
- package/AISB/catalog/aisb.t3.020_aristotle.yaml +173 -0
- package/AISB/catalog/aisb.t3.020_aristotle.zh.yaml +119 -0
- package/AISB/catalog/aisb.t3.021_tokenrecycling.yaml +160 -0
- package/AISB/catalog/aisb.t3.021_tokenrecycling.zh.yaml +129 -0
- package/AISB/catalog/aisb.t3.022_chainofreasoning.yaml +204 -0
- package/AISB/catalog/aisb.t3.022_chainofreasoning.zh.yaml +161 -0
- package/AISB/catalog/aisb.t3.023_guidedembed.yaml +211 -0
- package/AISB/catalog/aisb.t3.023_guidedembed.zh.yaml +189 -0
- package/AISB/catalog/aisb.t3.024_outputcentric.yaml +148 -0
- package/AISB/catalog/aisb.t3.024_outputcentric.zh.yaml +131 -0
- package/AISB/catalog/aisb.t3.025_deeper.yaml +143 -0
- package/AISB/catalog/aisb.t3.025_deeper.zh.yaml +116 -0
- package/AISB/catalog/aisb.t3.026_gartkg.yaml +195 -0
- package/AISB/catalog/aisb.t3.026_gartkg.zh.yaml +127 -0
- package/AISB/catalog/aisb.t3.027_citeeval.yaml +182 -0
- package/AISB/catalog/aisb.t3.027_citeeval.zh.yaml +135 -0
- package/AISB/catalog/aisb.t3.028_sbam.yaml +206 -0
- package/AISB/catalog/aisb.t3.028_sbam.zh.yaml +166 -0
- package/AISB/catalog/aisb.t3.029_cdqgeoembed.yaml +224 -0
- package/AISB/catalog/aisb.t3.029_cdqgeoembed.zh.yaml +142 -0
- package/AISB/catalog/aisb.t3.030_processrm.yaml +211 -0
- package/AISB/catalog/aisb.t3.030_processrm.zh.yaml +166 -0
- package/AISB/catalog/aisb.t3.031_circuitstability.yaml +172 -0
- package/AISB/catalog/aisb.t3.031_circuitstability.zh.yaml +134 -0
- package/AISB/catalog/aisb.t3.032_ptsolver.yaml +169 -0
- package/AISB/catalog/aisb.t3.032_ptsolver.zh.yaml +135 -0
- package/AISB/catalog/aisb.t3.033_gcse.yaml +144 -0
- package/AISB/catalog/aisb.t3.033_gcse.zh.yaml +126 -0
- package/AISB/catalog/aisb.t3.034_ensemblewm.yaml +183 -0
- package/AISB/catalog/aisb.t3.034_ensemblewm.zh.yaml +146 -0
- package/AISB/catalog/aisb.t3.035_moralvalueswa.yaml +207 -0
- package/AISB/catalog/aisb.t3.035_moralvalueswa.zh.yaml +165 -0
- package/AISB/catalog/aisb.t3.036_weakstrongpref.yaml +210 -0
- package/AISB/catalog/aisb.t3.036_weakstrongpref.zh.yaml +194 -0
- package/AISB/catalog/aisb.t3.037_dementiamask.yaml +172 -0
- package/AISB/catalog/aisb.t3.037_dementiamask.zh.yaml +132 -0
- package/AISB/catalog/aisb.t3.038_tinysam.yaml +284 -0
- package/AISB/catalog/aisb.t3.038_tinysam.zh.yaml +240 -0
- package/AISB/catalog/aisb.t3.039_calf.yaml +224 -0
- package/AISB/catalog/aisb.t3.039_calf.zh.yaml +194 -0
- package/AISB/catalog/aisb.t3.040_graniteguardian.yaml +199 -0
- package/AISB/catalog/aisb.t3.040_graniteguardian.zh.yaml +174 -0
- package/AISB/catalog/aisb.t3.041_amdm.yaml +149 -0
- package/AISB/catalog/aisb.t3.041_amdm.zh.yaml +137 -0
- package/AISB/catalog/aisb.t3.042_xpatch.yaml +216 -0
- package/AISB/catalog/aisb.t3.042_xpatch.zh.yaml +182 -0
- package/AISB/catalog/aisb.t3.043_vhm.yaml +268 -0
- package/AISB/catalog/aisb.t3.043_vhm.zh.yaml +193 -0
- package/AISB/catalog/aisb.t3.044_rgvi.yaml +224 -0
- package/AISB/catalog/aisb.t3.044_rgvi.zh.yaml +176 -0
- package/AISB/catalog/aisb.t3.045_pslstm.yaml +203 -0
- package/AISB/catalog/aisb.t3.045_pslstm.zh.yaml +179 -0
- package/AISB/catalog/aisb.t3.046_nonstatts.yaml +208 -0
- package/AISB/catalog/aisb.t3.046_nonstatts.zh.yaml +194 -0
- package/AISB/catalog/aisb.t3.047_timepfn.yaml +156 -0
- package/AISB/catalog/aisb.t3.047_timepfn.zh.yaml +124 -0
- package/AISB/catalog/aisb.t3.048_proxyspex.yaml +148 -0
- package/AISB/catalog/aisb.t3.048_proxyspex.zh.yaml +125 -0
- package/AISB/catalog/aisb.t3.049_hogwildinference.yaml +183 -0
- package/AISB/catalog/aisb.t3.049_hogwildinference.zh.yaml +138 -0
- package/AISB/catalog/aisb.t3.050_causalpfn.yaml +214 -0
- package/AISB/catalog/aisb.t3.050_causalpfn.zh.yaml +190 -0
- package/AISB/catalog/aisb.t3.051_flashtp.yaml +169 -0
- package/AISB/catalog/aisb.t3.051_flashtp.zh.yaml +124 -0
- package/AISB/catalog/aisb.t3.052_nsdiff.yaml +155 -0
- package/AISB/catalog/aisb.t3.052_nsdiff.zh.yaml +138 -0
- package/AISB/catalog/aisb.t3.053_k2vae.yaml +158 -0
- package/AISB/catalog/aisb.t3.053_k2vae.zh.yaml +132 -0
- package/AISB/catalog/aisb.t3.054_timebase.yaml +178 -0
- package/AISB/catalog/aisb.t3.054_timebase.zh.yaml +158 -0
- package/AISB/catalog/aisb.t3.055_csbrain.yaml +238 -0
- package/AISB/catalog/aisb.t3.055_csbrain.zh.yaml +184 -0
- package/AISB/catalog/aisb.t3.056_infosam.yaml +224 -0
- package/AISB/catalog/aisb.t3.056_infosam.zh.yaml +189 -0
- package/AISB/catalog/aisb.t3.057_mdreid.yaml +129 -0
- package/AISB/catalog/aisb.t3.057_mdreid.zh.yaml +117 -0
- package/AISB/catalog/aisb.t3.058_mindglitch.yaml +171 -0
- package/AISB/catalog/aisb.t3.058_mindglitch.zh.yaml +145 -0
- package/AISB/catalog/aisb.t3.059_selfsupervised.yaml +154 -0
- package/AISB/catalog/aisb.t3.059_selfsupervised.zh.yaml +125 -0
- package/AISB/catalog/aisb.t3.060_iaggad.yaml +121 -0
- package/AISB/catalog/aisb.t3.060_iaggad.zh.yaml +100 -0
- package/AISB/catalog/aisb.t3.061_hsgkn.yaml +136 -0
- package/AISB/catalog/aisb.t3.061_hsgkn.zh.yaml +113 -0
- package/AISB/catalog/aisb.t3.062_visionts.yaml +237 -0
- package/AISB/catalog/aisb.t3.062_visionts.zh.yaml +216 -0
- package/AISB/catalog/aisb.t3.063_tsrag.yaml +162 -0
- package/AISB/catalog/aisb.t3.063_tsrag.zh.yaml +138 -0
- package/AISB/catalog/aisb.t3.064_pir.yaml +221 -0
- package/AISB/catalog/aisb.t3.064_pir.zh.yaml +197 -0
- package/AISB/catalog/aisb.t3.065_proteinbinding.yaml +234 -0
- package/AISB/catalog/aisb.t3.065_proteinbinding.zh.yaml +167 -0
- package/AISB/catalog/aisb.t3.066_tropicalattention.yaml +267 -0
- package/AISB/catalog/aisb.t3.066_tropicalattention.zh.yaml +229 -0
- package/AISB/catalog/aisb.t3.067_kanad.yaml +193 -0
- package/AISB/catalog/aisb.t3.067_kanad.zh.yaml +167 -0
- package/AISB/catalog/aisb.t3.068_sempo.yaml +187 -0
- package/AISB/catalog/aisb.t3.068_sempo.zh.yaml +148 -0
- package/AISB/catalog/aisb.t3.069_treehfd.yaml +129 -0
- package/AISB/catalog/aisb.t3.069_treehfd.zh.yaml +111 -0
- package/AISB/catalog/aisb.t3.070_certifiedunlearning.yaml +224 -0
- package/AISB/catalog/aisb.t3.070_certifiedunlearning.zh.yaml +171 -0
- package/AISB/catalog/aisb.t3.071_neuralmjd.yaml +142 -0
- package/AISB/catalog/aisb.t3.071_neuralmjd.zh.yaml +120 -0
- package/AISB/catalog/aisb.t3.072_fedgmt.yaml +181 -0
- package/AISB/catalog/aisb.t3.072_fedgmt.zh.yaml +158 -0
- package/AISB/catalog/aisb.t3.073_rld.yaml +161 -0
- package/AISB/catalog/aisb.t3.073_rld.zh.yaml +129 -0
- package/AISB/catalog/aisb.t3.074_lsvi.yaml +163 -0
- package/AISB/catalog/aisb.t3.074_lsvi.zh.yaml +129 -0
- package/AISB/catalog/aisb.t3.075_treeslicedentropy.yaml +201 -0
- package/AISB/catalog/aisb.t3.075_treeslicedentropy.zh.yaml +148 -0
- package/AISB/catalog/aisb.t3.076_aanet.yaml +169 -0
- package/AISB/catalog/aisb.t3.076_aanet.zh.yaml +129 -0
- package/AISB/catalog/aisb.t3.077_cmnn.yaml +199 -0
- package/AISB/catalog/aisb.t3.077_cmnn.zh.yaml +165 -0
- package/AISB/catalog/aisb.t3.078_conformalanomaly.yaml +146 -0
- package/AISB/catalog/aisb.t3.078_conformalanomaly.zh.yaml +117 -0
- package/AISB/catalog/aisb.t3.079_dpfkmeans.yaml +131 -0
- package/AISB/catalog/aisb.t3.079_dpfkmeans.zh.yaml +104 -0
- package/AISB/catalog/aisb.t3.080_latentscorereweight.yaml +169 -0
- package/AISB/catalog/aisb.t3.080_latentscorereweight.zh.yaml +123 -0
- package/AISB/catalog/aisb.t3.081_qmamba.yaml +150 -0
- package/AISB/catalog/aisb.t3.081_qmamba.zh.yaml +117 -0
- package/AISB/catalog/aisb.t3.082_onlinellmrouting.yaml +160 -0
- package/AISB/catalog/aisb.t3.082_onlinellmrouting.zh.yaml +133 -0
- package/AISB/catalog/aisb.t3.083_starformer.yaml +178 -0
- package/AISB/catalog/aisb.t3.083_starformer.zh.yaml +140 -0
- package/AISB/catalog/aisb.t3.084_ift.yaml +139 -0
- package/AISB/catalog/aisb.t3.084_ift.zh.yaml +111 -0
- package/AISB/catalog/aisb.t3.085_neuralsurv.yaml +183 -0
- package/AISB/catalog/aisb.t3.085_neuralsurv.zh.yaml +143 -0
- package/AISB/catalog/aisb.t3.086_stella.yaml +197 -0
- package/AISB/catalog/aisb.t3.086_stella.zh.yaml +142 -0
- package/AISB/catalog/aisb.t3.087_moses.yaml +167 -0
- package/AISB/catalog/aisb.t3.087_moses.zh.yaml +132 -0
- package/AISB/catalog/aisb.t3.088_channelnorm.yaml +140 -0
- package/AISB/catalog/aisb.t3.088_channelnorm.zh.yaml +109 -0
- package/AISB/catalog/aisb.t3.089_causalvelocity.yaml +730 -0
- package/AISB/catalog/aisb.t3.089_causalvelocity.zh.yaml +668 -0
- package/AISB/catalog/aisb.t3.090_rstib.yaml +144 -0
- package/AISB/catalog/aisb.t3.090_rstib.zh.yaml +109 -0
- package/AISB/catalog/aisb.t3.091_timeawarecausal.yaml +132 -0
- package/AISB/catalog/aisb.t3.091_timeawarecausal.zh.yaml +107 -0
- package/AISB/catalog/aisb.t3.092_kmeanslocalopt.yaml +138 -0
- package/AISB/catalog/aisb.t3.092_kmeanslocalopt.zh.yaml +110 -0
- package/AISB/catalog/aisb.t3.093_fedwmsam.yaml +134 -0
- package/AISB/catalog/aisb.t3.093_fedwmsam.zh.yaml +106 -0
- package/AISB/catalog/aisb.t3.094_boundre.yaml +147 -0
- package/AISB/catalog/aisb.t3.094_boundre.zh.yaml +114 -0
- package/AISB/catalog/aisb.t3.095_fastfeaturecp.yaml +153 -0
- package/AISB/catalog/aisb.t3.095_fastfeaturecp.zh.yaml +118 -0
- package/AISB/catalog/aisb.t3.096_m3svm.yaml +189 -0
- package/AISB/catalog/aisb.t3.096_m3svm.zh.yaml +149 -0
- package/AISB/catalog/aisb.t3.097_wassersteintl.yaml +212 -0
- package/AISB/catalog/aisb.t3.097_wassersteintl.zh.yaml +169 -0
- package/AISB/catalog/aisb.t3.098_xmahalanobis.yaml +171 -0
- package/AISB/catalog/aisb.t3.098_xmahalanobis.zh.yaml +127 -0
- package/AISB/catalog/aisb.t3.099_ollalanding.yaml +248 -0
- package/AISB/catalog/aisb.t3.099_ollalanding.zh.yaml +182 -0
- package/AISB/catalog/aisb.t3.100_invmissingdata.yaml +179 -0
- package/AISB/catalog/aisb.t3.100_invmissingdata.zh.yaml +150 -0
- package/AISB/catalog/aisb.t3.101_acia.yaml +164 -0
- package/AISB/catalog/aisb.t3.101_acia.zh.yaml +109 -0
- package/AISB/catalog/aisb.t3.102_stochasticff.yaml +178 -0
- package/AISB/catalog/aisb.t3.102_stochasticff.zh.yaml +130 -0
- package/AISB/catalog/aisb.t3.103_qdcp.yaml +150 -0
- package/AISB/catalog/aisb.t3.103_qdcp.zh.yaml +116 -0
- package/AISB/catalog/aisb.t3.104_balancedactiveinf.yaml +137 -0
- package/AISB/catalog/aisb.t3.104_balancedactiveinf.zh.yaml +104 -0
- package/AISB/catalog/aisb.t3.105_binaryclasseval.yaml +161 -0
- package/AISB/catalog/aisb.t3.105_binaryclasseval.zh.yaml +130 -0
- package/AISB/image/001_aisb.t3.001_savvy.jpg +0 -0
- package/AISB/image/002_aisb.t3.002_pinet.jpg +0 -0
- package/AISB/image/003_aisb.t3.003_dmsqd.jpg +0 -0
- package/AISB/image/004_aisb.t3.004_decentralattn.jpg +0 -0
- package/AISB/image/005_aisb.t3.005_tsae.jpg +0 -0
- package/AISB/image/006_aisb.t3.006_physense.jpg +0 -0
- package/AISB/image/007_aisb.t3.007_reasoningiqa.jpg +0 -0
- package/AISB/image/008_aisb.t3.008_meanflows.jpg +0 -0
- package/AISB/image/009_aisb.t3.009_scoremissing.jpg +0 -0
- package/AISB/image/010_aisb.t3.010_suitabilityfilter.jpg +0 -0
- package/AISB/image/011_aisb.t3.011_osd.jpg +0 -0
- package/AISB/image/012_aisb.t3.012_efficientqat.jpg +0 -0
- package/AISB/image/013_aisb.t3.013_appl.jpg +0 -0
- package/AISB/image/014_aisb.t3.014_piguard.jpg +0 -0
- package/AISB/image/015_aisb.t3.015_frspec.jpg +0 -0
- package/AISB/image/016_aisb.t3.016_mathfusion.jpg +0 -0
- package/AISB/image/017_aisb.t3.017_multimodalglp.jpg +0 -0
- package/AISB/image/018_aisb.t3.018_cotsynth.jpg +0 -0
- package/AISB/image/019_aisb.t3.019_dyscaleut.jpg +0 -0
- package/AISB/image/020_aisb.t3.020_aristotle.jpg +0 -0
- package/AISB/image/021_aisb.t3.021_tokenrecycling.jpg +0 -0
- package/AISB/image/022_aisb.t3.022_chainofreasoning.jpg +0 -0
- package/AISB/image/023_aisb.t3.023_guidedembed.jpg +0 -0
- package/AISB/image/024_aisb.t3.024_outputcentric.jpg +0 -0
- package/AISB/image/025_aisb.t3.025_deeper.jpg +0 -0
- package/AISB/image/026_aisb.t3.026_gartkg.jpg +0 -0
- package/AISB/image/027_aisb.t3.027_citeeval.jpg +0 -0
- package/AISB/image/028_aisb.t3.028_sbam.jpg +0 -0
- package/AISB/image/029_aisb.t3.029_cdqgeoembed.jpg +0 -0
- package/AISB/image/030_aisb.t3.030_processrm.jpg +0 -0
- package/AISB/image/031_aisb.t3.031_circuitstability.jpg +0 -0
- package/AISB/image/032_aisb.t3.032_ptsolver.jpg +0 -0
- package/AISB/image/033_aisb.t3.033_gcse.jpg +0 -0
- package/AISB/image/034_aisb.t3.034_ensemblewm.jpg +0 -0
- package/AISB/image/035_aisb.t3.035_moralvalueswa.jpg +0 -0
- package/AISB/image/036_aisb.t3.036_weakstrongpref.jpg +0 -0
- package/AISB/image/037_aisb.t3.037_dementiamask.jpg +0 -0
- package/AISB/image/038_aisb.t3.038_tinysam.jpg +0 -0
- package/AISB/image/039_aisb.t3.039_calf.jpg +0 -0
- package/AISB/image/040_aisb.t3.040_graniteguardian.jpg +0 -0
- package/AISB/image/041_aisb.t3.041_amdm.jpg +0 -0
- package/AISB/image/042_aisb.t3.042_xpatch.jpg +0 -0
- package/AISB/image/043_aisb.t3.043_vhm.jpg +0 -0
- package/AISB/image/044_aisb.t3.044_rgvi.jpg +0 -0
- package/AISB/image/045_aisb.t3.045_pslstm.jpg +0 -0
- package/AISB/image/046_aisb.t3.046_nonstatts.jpg +0 -0
- package/AISB/image/047_aisb.t3.047_timepfn.jpg +0 -0
- package/AISB/image/048_aisb.t3.048_proxyspex.jpg +0 -0
- package/AISB/image/049_aisb.t3.049_hogwildinference.jpg +0 -0
- package/AISB/image/050_aisb.t3.050_causalpfn.jpg +0 -0
- package/AISB/image/051_aisb.t3.051_flashtp.jpg +0 -0
- package/AISB/image/052_aisb.t3.052_nsdiff.jpg +0 -0
- package/AISB/image/053_aisb.t3.053_k2vae.jpg +0 -0
- package/AISB/image/054_aisb.t3.054_timebase.jpg +0 -0
- package/AISB/image/055_aisb.t3.055_csbrain.jpg +0 -0
- package/AISB/image/056_aisb.t3.056_infosam.jpg +0 -0
- package/AISB/image/057_aisb.t3.057_mdreid.jpg +0 -0
- package/AISB/image/058_aisb.t3.058_mindglitch.jpg +0 -0
- package/AISB/image/059_aisb.t3.059_selfsupervised.jpg +0 -0
- package/AISB/image/060_aisb.t3.060_iaggad.jpg +0 -0
- package/AISB/image/061_aisb.t3.061_hsgkn.jpg +0 -0
- package/AISB/image/062_aisb.t3.062_visionts.jpg +0 -0
- package/AISB/image/063_aisb.t3.063_tsrag.jpg +0 -0
- package/AISB/image/064_aisb.t3.064_pir.jpg +0 -0
- package/AISB/image/065_aisb.t3.065_proteinbinding.jpg +0 -0
- package/AISB/image/066_aisb.t3.066_tropicalattention.jpg +0 -0
- package/AISB/image/067_aisb.t3.067_kanad.jpg +0 -0
- package/AISB/image/068_aisb.t3.068_sempo.jpg +0 -0
- package/AISB/image/069_aisb.t3.069_treehfd.jpg +0 -0
- package/AISB/image/070_aisb.t3.070_certifiedunlearning.jpg +0 -0
- package/AISB/image/071_aisb.t3.071_neuralmjd.jpg +0 -0
- package/AISB/image/072_aisb.t3.072_fedgmt.jpg +0 -0
- package/AISB/image/073_aisb.t3.073_rld.jpg +0 -0
- package/AISB/image/074_aisb.t3.074_lsvi.jpg +0 -0
- package/AISB/image/075_aisb.t3.075_treeslicedentropy.jpg +0 -0
- package/AISB/image/076_aisb.t3.076_aanet.jpg +0 -0
- package/AISB/image/077_aisb.t3.077_cmnn.jpg +0 -0
- package/AISB/image/078_aisb.t3.078_conformalanomaly.jpg +0 -0
- package/AISB/image/079_aisb.t3.079_dpfkmeans.jpg +0 -0
- package/AISB/image/080_aisb.t3.080_latentscorereweight.jpg +0 -0
- package/AISB/image/081_aisb.t3.081_qmamba.jpg +0 -0
- package/AISB/image/082_aisb.t3.082_onlinellmrouting.jpg +0 -0
- package/AISB/image/083_aisb.t3.083_starformer.jpg +0 -0
- package/AISB/image/084_aisb.t3.084_ift.jpg +0 -0
- package/AISB/image/085_aisb.t3.085_neuralsurv.jpg +0 -0
- package/AISB/image/086_aisb.t3.086_stella.jpg +0 -0
- package/AISB/image/087_aisb.t3.087_moses.jpg +0 -0
- package/AISB/image/088_aisb.t3.088_channelnorm.jpg +0 -0
- package/AISB/image/089_aisb.t3.089_causalvelocity.jpg +0 -0
- package/AISB/image/090_aisb.t3.090_rstib.jpg +0 -0
- package/AISB/image/091_aisb.t3.091_timeawarecausal.jpg +0 -0
- package/AISB/image/092_aisb.t3.092_kmeanslocalopt.jpg +0 -0
- package/AISB/image/093_aisb.t3.093_fedwmsam.jpg +0 -0
- package/AISB/image/094_aisb.t3.094_boundre.jpg +0 -0
- package/AISB/image/095_aisb.t3.095_fastfeaturecp.jpg +0 -0
- package/AISB/image/096_aisb.t3.096_m3svm.jpg +0 -0
- package/AISB/image/097_aisb.t3.097_wassersteintl.jpg +0 -0
- package/AISB/image/098_aisb.t3.098_xmahalanobis.jpg +0 -0
- package/AISB/image/099_aisb.t3.099_ollalanding.jpg +0 -0
- package/AISB/image/100_aisb.t3.100_invmissingdata.jpg +0 -0
- package/AISB/image/101_aisb.t3.101_acia.jpg +0 -0
- package/AISB/image/102_aisb.t3.102_stochasticff.jpg +0 -0
- package/AISB/image/103_aisb.t3.103_qdcp.jpg +0 -0
- package/AISB/image/104_aisb.t3.104_balancedactiveinf.jpg +0 -0
- package/AISB/image/105_aisb.t3.105_binaryclasseval.jpg +0 -0
- package/AISB/image/106_aisb.t1.reasoning_lite.jpg +0 -0
- package/AISB/image/107_aisb.t2.paper_audit.jpg +0 -0
- package/AISB/image/108_aisb.t3.multi_gpu_search.jpg +0 -0
- package/AISB/image/109_aisb.t3.tdc_admet.jpg +0 -0
- package/AISB/image/aisb.b1.agentic_coding.svg +16 -0
- package/AISB/image/aisb.b10.climate_earth.svg +16 -0
- package/AISB/image/aisb.b11.model_efficiency.svg +16 -0
- package/AISB/image/aisb.b12.embodied_ai.svg +16 -0
- package/AISB/image/aisb.b2.agent_systems.svg +16 -0
- package/AISB/image/aisb.b3.self_evolving_rl.svg +16 -0
- package/AISB/image/aisb.b4.lm_reasoning.svg +16 -0
- package/AISB/image/aisb.b5.math_proof.svg +16 -0
- package/AISB/image/aisb.b6.research_process.svg +16 -0
- package/AISB/image/aisb.b7.multimodal_fusion.svg +16 -0
- package/AISB/image/aisb.b8.lifesci_drug.svg +16 -0
- package/AISB/image/aisb.b9.material_science.svg +16 -0
- package/README.md +132 -11
- package/bin/ds.js +376 -49
- package/docs/en/00_QUICK_START.md +135 -18
- package/docs/en/01_SETTINGS_REFERENCE.md +468 -96
- package/docs/en/02_START_RESEARCH_GUIDE.md +26 -5
- package/docs/en/03_QQ_CONNECTOR_GUIDE.md +14 -3
- package/docs/en/04_LINGZHU_CONNECTOR_GUIDE.md +2 -0
- package/docs/en/05_TUI_GUIDE.md +171 -2
- package/docs/en/07_MEMORY_AND_MCP.md +38 -2
- package/docs/en/09_DOCTOR.md +64 -4
- package/docs/en/10_WEIXIN_CONNECTOR_GUIDE.md +38 -1
- package/docs/en/11_LICENSE_AND_RISK.md +4 -0
- package/docs/en/12_GUIDED_WORKFLOW_TOUR.md +15 -0
- package/docs/en/14_PROMPT_SKILLS_AND_MCP_GUIDE.md +9 -0
- package/docs/en/15_CODEX_PROVIDER_SETUP.md +622 -187
- package/docs/en/16_TELEGRAM_CONNECTOR_GUIDE.md +14 -0
- package/docs/en/17_WHATSAPP_CONNECTOR_GUIDE.md +14 -0
- package/docs/en/18_FEISHU_CONNECTOR_GUIDE.md +14 -0
- package/docs/en/21_LOCAL_MODEL_BACKENDS_GUIDE.md +105 -2
- package/docs/en/22_BENCHSTORE_YAML_REFERENCE.md +469 -0
- package/docs/en/23_BENCHSTORE_GITHUB_RELEASES_SPEC.md +316 -0
- package/docs/en/24_CLAUDE_CODE_PROVIDER_SETUP.md +469 -0
- package/docs/en/25_OPENCODE_PROVIDER_SETUP.md +653 -0
- package/docs/en/26_CITATION_AND_ATTRIBUTION.md +119 -0
- package/docs/en/27_KIMI_CODE_PROVIDER_SETUP.md +180 -0
- package/docs/en/28_DISCORD_CONNECTOR_GUIDE.md +61 -0
- package/docs/en/29_SLACK_CONNECTOR_GUIDE.md +60 -0
- package/docs/en/30_SETTINGS_CONTROL_CENTER_GUIDE.md +371 -0
- package/docs/en/{19_LOCAL_BROWSER_AUTH.md → 31_LOCAL_BROWSER_AUTH.md} +1 -1
- package/docs/en/32_WINDOWS_WSL2_DEPLOYMENT_GUIDE.md +273 -0
- package/docs/en/33_WORKSPACE_EXPLORER_QA.md +121 -0
- package/docs/en/91_DEVELOPMENT.md +29 -0
- package/docs/en/99_ACKNOWLEDGEMENTS.md +24 -19
- package/docs/en/README.md +44 -7
- package/docs/images/admin/admin-connectors-health-en.png +0 -0
- package/docs/images/admin/admin-controllers-en.png +0 -0
- package/docs/images/admin/admin-diagnostics-en.png +0 -0
- package/docs/images/admin/admin-errors-en.png +0 -0
- package/docs/images/admin/admin-issues-en.png +0 -0
- package/docs/images/admin/admin-logs-en.png +0 -0
- package/docs/images/admin/admin-quest-detail-en.png +0 -0
- package/docs/images/admin/admin-quests-en.png +0 -0
- package/docs/images/admin/admin-repairs-en.png +0 -0
- package/docs/images/admin/admin-runtime-en.png +0 -0
- package/docs/images/admin/admin-search-en.png +0 -0
- package/docs/images/admin/admin-stats-en.png +0 -0
- package/docs/images/admin/admin-summary-en.png +0 -0
- package/docs/images/connectors/connector-discord-en.png +0 -0
- package/docs/images/connectors/connector-feishu-en.png +0 -0
- package/docs/images/connectors/connector-lingzhu-en.png +0 -0
- package/docs/images/connectors/connector-qq-en.png +0 -0
- package/docs/images/connectors/connector-slack-en.png +0 -0
- package/docs/images/connectors/connector-telegram-en.png +0 -0
- package/docs/images/connectors/connector-weixin-en.png +0 -0
- package/docs/images/connectors/connector-whatsapp-en.png +0 -0
- package/docs/images/settings/settings-baselines-en.png +0 -0
- package/docs/images/settings/settings-config-en.png +0 -0
- package/docs/images/settings/settings-connectors-overview-en.png +0 -0
- package/docs/images/settings/settings-deepxiv-en.png +0 -0
- package/docs/images/settings/settings-mcp-servers-en.png +0 -0
- package/docs/images/settings/settings-plugins-en.png +0 -0
- package/docs/images/settings/settings-runners-en.png +0 -0
- package/docs/zh/00_QUICK_START.md +92 -17
- package/docs/zh/01_SETTINGS_REFERENCE.md +219 -98
- package/docs/zh/02_START_RESEARCH_GUIDE.md +26 -5
- package/docs/zh/05_TUI_GUIDE.md +171 -2
- package/docs/zh/07_MEMORY_AND_MCP.md +29 -2
- package/docs/zh/09_DOCTOR.md +39 -4
- package/docs/zh/10_WEIXIN_CONNECTOR_GUIDE.md +24 -1
- package/docs/zh/11_LICENSE_AND_RISK.md +4 -0
- package/docs/zh/12_GUIDED_WORKFLOW_TOUR.md +15 -0
- package/docs/zh/14_PROMPT_SKILLS_AND_MCP_GUIDE.md +9 -0
- package/docs/zh/15_CODEX_PROVIDER_SETUP.md +550 -188
- package/docs/zh/21_LOCAL_MODEL_BACKENDS_GUIDE.md +105 -2
- package/docs/zh/22_BENCHSTORE_YAML_REFERENCE.md +459 -0
- package/docs/zh/23_BENCHSTORE_GITHUB_RELEASES_SPEC.md +287 -0
- package/docs/zh/23_CLAUDE_RUNNER_GUIDE.md +103 -0
- package/docs/zh/24_CLAUDE_CODE_PROVIDER_SETUP.md +460 -0
- package/docs/zh/25_OPENCODE_PROVIDER_SETUP.md +660 -0
- package/docs/zh/26_CITATION_AND_ATTRIBUTION.md +102 -0
- package/docs/zh/27_KIMI_CODE_PROVIDER_SETUP.md +51 -0
- package/docs/zh/{19_LOCAL_BROWSER_AUTH.md → 31_LOCAL_BROWSER_AUTH.md} +1 -1
- package/docs/zh/32_WINDOWS_WSL2_DEPLOYMENT_GUIDE.md +264 -0
- package/docs/zh/33_WORKSPACE_EXPLORER_QA.md +127 -0
- package/docs/zh/99_ACKNOWLEDGEMENTS.md +23 -19
- package/docs/zh/README.md +29 -7
- package/install.sh +122 -16
- package/package.json +4 -1
- package/pyproject.toml +2 -1
- package/src/deepscientist/__init__.py +1 -1
- package/src/deepscientist/acp/envelope.py +13 -0
- package/src/deepscientist/admin/__init__.py +3 -0
- package/src/deepscientist/admin/charts.py +681 -0
- package/src/deepscientist/admin/logs.py +119 -0
- package/src/deepscientist/admin/repairs.py +217 -0
- package/src/deepscientist/admin/service.py +1310 -0
- package/src/deepscientist/admin/system_info.py +700 -0
- package/src/deepscientist/admin/tasks.py +465 -0
- package/src/deepscientist/admin/tool_metrics.py +600 -0
- package/src/deepscientist/artifact/guidance.py +8 -4
- package/src/deepscientist/artifact/schemas.py +115 -0
- package/src/deepscientist/artifact/service.py +4268 -260
- package/src/deepscientist/bash_exec/monitor.py +30 -3
- package/src/deepscientist/bash_exec/service.py +134 -1
- package/src/deepscientist/benchstore/__init__.py +4 -0
- package/src/deepscientist/benchstore/prompt_builder.py +224 -0
- package/src/deepscientist/benchstore/service.py +1716 -0
- package/src/deepscientist/channels/weixin_ilink.py +8 -1
- package/src/deepscientist/cli.py +92 -17
- package/src/deepscientist/codex_cli_compat.py +2 -2
- package/src/deepscientist/config/models.py +82 -11
- package/src/deepscientist/config/service.py +927 -91
- package/src/deepscientist/connector/weixin_support.py +48 -17
- package/src/deepscientist/daemon/api/handlers.py +697 -210
- package/src/deepscientist/daemon/api/router.py +76 -1
- package/src/deepscientist/daemon/app.py +1054 -51
- package/src/deepscientist/diagnostics/runner_failures.py +147 -0
- package/src/deepscientist/doctor.py +212 -65
- package/src/deepscientist/evidence_packets.py +590 -0
- package/src/deepscientist/home.py +52 -4
- package/src/deepscientist/kimi_cli_compat.py +50 -0
- package/src/deepscientist/latex_runtime.py +2 -2
- package/src/deepscientist/mcp/context.py +2 -0
- package/src/deepscientist/mcp/schemas.py +114 -0
- package/src/deepscientist/mcp/server.py +1566 -126
- package/src/deepscientist/memory/service.py +203 -16
- package/src/deepscientist/process_control.py +8 -1
- package/src/deepscientist/prompts/builder.py +836 -92
- package/src/deepscientist/quest/__init__.py +2 -2
- package/src/deepscientist/quest/layout.py +12 -1
- package/src/deepscientist/quest/node_traces.py +10 -0
- package/src/deepscientist/quest/service.py +1430 -139
- package/src/deepscientist/quest/stage_views.py +1 -1
- package/src/deepscientist/runners/__init__.py +18 -0
- package/src/deepscientist/runners/base.py +89 -1
- package/src/deepscientist/runners/builtins.py +13 -1
- package/src/deepscientist/runners/claude.py +391 -0
- package/src/deepscientist/runners/codex.py +421 -21
- package/src/deepscientist/runners/codex_telemetry.py +127 -0
- package/src/deepscientist/runners/kimi.py +334 -0
- package/src/deepscientist/runners/metadata.py +68 -0
- package/src/deepscientist/runners/opencode.py +414 -0
- package/src/deepscientist/runners/runtime_overrides.py +100 -0
- package/src/deepscientist/runners/simple_cli.py +538 -0
- package/src/deepscientist/runtime_storage.py +303 -0
- package/src/deepscientist/shared.py +61 -16
- package/src/deepscientist/skills/installer.py +37 -0
- package/src/deepscientist/skills/registry.py +2 -0
- package/src/deepscientist/tinytex.py +2 -2
- package/src/deepscientist/tui.py +10 -3
- package/src/prompts/benchstore/system.md +77 -0
- package/src/prompts/connectors/qq.md +33 -2
- package/src/prompts/connectors/weixin.md +208 -23
- package/src/prompts/contracts/admin_ops.md +74 -0
- package/src/prompts/contracts/admin_ops_knowledge.md +138 -0
- package/src/prompts/contracts/shared_interaction.md +5 -11
- package/src/prompts/start_setup/system.md +422 -0
- package/src/prompts/system.md +409 -315
- package/src/prompts/system_copilot.md +88 -12
- package/src/skills/analysis-campaign/SKILL.md +239 -578
- package/src/skills/analysis-campaign/references/artifact-flow-examples.md +102 -0
- package/src/skills/analysis-campaign/references/boundary-cases.md +98 -0
- package/src/skills/analysis-campaign/references/campaign-checklist-template.md +39 -24
- package/src/skills/analysis-campaign/references/campaign-design.md +26 -10
- package/src/skills/analysis-campaign/references/campaign-plan-template.md +53 -54
- package/src/skills/analysis-campaign/references/operational-guidance.md +97 -0
- package/src/skills/analysis-campaign/references/writing-facing-slice-examples.md +10 -20
- package/src/skills/baseline/SKILL.md +183 -461
- package/src/skills/baseline/references/artifact-flow-examples.md +106 -0
- package/src/skills/baseline/references/artifact-payload-examples.md +1 -1
- package/src/skills/baseline/references/baseline-checklist-template.md +27 -35
- package/src/skills/baseline/references/baseline-plan-template.md +37 -76
- package/src/skills/baseline/references/boundary-cases.md +86 -0
- package/src/skills/baseline/references/codebase-audit-checklist.md +2 -6
- package/src/skills/baseline/references/comparability-contract.md +7 -12
- package/src/skills/baseline/references/operational-guidance.md +56 -0
- package/src/skills/baseline/references/route-selection.md +5 -25
- package/src/skills/decision/SKILL.md +113 -306
- package/src/skills/decision/references/checkpoint-memory-template.md +47 -0
- package/src/skills/decision/references/operational-guidance.md +94 -0
- package/src/skills/decision/references/research-route-criteria.md +7 -8
- package/src/skills/decision/references/strategic-decision-template.md +13 -26
- package/src/skills/experiment/SKILL.md +132 -670
- package/src/skills/experiment/references/execution-playbook.md +374 -0
- package/src/skills/experiment/references/main-experiment-checklist-template.md +26 -2
- package/src/skills/experiment/references/main-experiment-plan-template.md +28 -17
- package/src/skills/experiment/references/operational-guidance.md +108 -0
- package/src/skills/finalize/SKILL.md +62 -0
- package/src/skills/finalize/references/checkpoint-memory-template.md +49 -0
- package/src/skills/finalize/references/resume-packet-template.md +7 -0
- package/src/skills/idea/SKILL.md +228 -15
- package/src/skills/idea/references/controlled-brainstorming-playbook.md +78 -0
- package/src/skills/idea/references/current-board-packet-template.md +61 -0
- package/src/skills/idea/references/high-value-idea-sourcing.md +119 -0
- package/src/skills/idea/references/idea-generation-playbook.md +21 -0
- package/src/skills/idea/references/idea-thinking-flow.md +6 -0
- package/src/skills/idea/references/literature-survey-template.md +3 -0
- package/src/skills/idea/references/objective-contract-template.md +54 -0
- package/src/skills/idea/references/outline-seeding-example.md +56 -0
- package/src/skills/idea/references/pre-idea-draft-template.md +105 -0
- package/src/skills/idea/references/related-work-playbook.md +75 -2
- package/src/skills/idea/references/research-history-playbook.md +114 -0
- package/src/skills/idea/references/selection-gate.md +58 -6
- package/src/skills/intake-audit/SKILL.md +43 -2
- package/src/skills/intake-audit/references/state-audit-template.md +10 -0
- package/src/skills/nature-data/SKILL.md +128 -0
- package/src/skills/nature-data/UPSTREAM_LICENSE.txt +21 -0
- package/src/skills/nature-data/agents/openai.yaml +4 -0
- package/src/skills/nature-data/references/chinese-author-alignment.md +84 -0
- package/src/skills/nature-data/references/fair-metadata-checklist.md +105 -0
- package/src/skills/nature-data/references/policy-principles.md +103 -0
- package/src/skills/nature-data/references/repository-and-identifiers.md +96 -0
- package/src/skills/nature-data/references/source-basis.md +54 -0
- package/src/skills/nature-data/references/statement-patterns.md +153 -0
- package/src/skills/nature-figure/SKILL.md +197 -0
- package/src/skills/nature-figure/UPSTREAM_LICENSE.txt +21 -0
- package/src/skills/nature-figure/agents/openai.yaml +4 -0
- package/src/skills/nature-figure/evals/evals.json +37 -0
- package/src/skills/nature-figure/references/api.md +428 -0
- package/src/skills/nature-figure/references/backend-selection.md +100 -0
- package/src/skills/nature-figure/references/chart-types.md +281 -0
- package/src/skills/nature-figure/references/common-patterns.md +349 -0
- package/src/skills/nature-figure/references/design-theory.md +436 -0
- package/src/skills/nature-figure/references/figure-contract.md +93 -0
- package/src/skills/nature-figure/references/nature-2026-observations.md +112 -0
- package/src/skills/nature-figure/references/qa-contract.md +119 -0
- package/src/skills/nature-figure/references/r-template-index.md +66 -0
- package/src/skills/nature-figure/references/r-workflow.md +161 -0
- package/src/skills/nature-figure/references/tutorials.md +250 -0
- package/src/skills/nature-paper2ppt/SKILL.md +507 -0
- package/src/skills/nature-paper2ppt/UPSTREAM_LICENSE.txt +21 -0
- package/src/skills/nature-paper2ppt/agents/openai.yaml +4 -0
- package/src/skills/nature-polishing/SKILL.md +385 -0
- package/src/skills/nature-polishing/UPSTREAM_LICENSE.txt +21 -0
- package/src/skills/nature-polishing/agents/openai.yaml +4 -0
- package/src/skills/nature-polishing/references/phrasebank-playbook.md +162 -0
- package/src/skills/nature-polishing/references/section-moves.md +240 -0
- package/src/skills/nature-polishing/references/style-guardrails.md +94 -0
- package/src/skills/nature-polishing/references/writing-strategy.md +148 -0
- package/src/skills/optimize/SKILL.md +177 -1568
- package/src/skills/optimize/references/brief-shaping-playbook.md +95 -0
- package/src/skills/optimize/references/candidate-board-template.md +13 -0
- package/src/skills/optimize/references/candidate-ranking-template.md +51 -0
- package/src/skills/optimize/references/codegen-route-playbook.md +50 -0
- package/src/skills/optimize/references/debug-response-template.md +29 -0
- package/src/skills/optimize/references/frontier-review-template.md +32 -0
- package/src/skills/optimize/references/fusion-playbook.md +36 -0
- package/src/skills/optimize/references/method-brief-template.md +73 -0
- package/src/skills/optimize/references/operational-guidance.md +621 -0
- package/src/skills/optimize/references/optimization-memory-template.md +30 -0
- package/src/skills/optimize/references/optimize-checklist-template.md +18 -0
- package/src/skills/optimize/references/plateau-response-playbook.md +28 -0
- package/src/skills/optimize/references/prompt-patterns.md +49 -0
- package/src/skills/paper-outline/SKILL.md +227 -0
- package/src/skills/paper-outline/references/outline-patterns.md +87 -0
- package/src/skills/paper-plot/SKILL.md +79 -0
- package/src/skills/paper-plot/agents/openai.yaml +4 -0
- package/src/skills/paper-plot/references/bar_grouped_hatch.md +96 -0
- package/src/skills/paper-plot/references/bar_paired_delta.md +72 -0
- package/src/skills/paper-plot/references/line_confidence_band.md +75 -0
- package/src/skills/paper-plot/references/line_loss_with_inset.md +65 -0
- package/src/skills/paper-plot/references/line_training_curve.md +44 -0
- package/src/skills/paper-plot/references/radar_dual_series.md +59 -0
- package/src/skills/paper-plot/references/scatter_broken_axis.md +59 -0
- package/src/skills/paper-plot/references/scatter_tsne_cluster.md +72 -0
- package/src/skills/paper-plot/scripts/bar_memevolve.py +109 -0
- package/src/skills/paper-plot/scripts/bar_spice.py +166 -0
- package/src/skills/paper-plot/scripts/line_aime.py +94 -0
- package/src/skills/paper-plot/scripts/line_loss_inset.py +157 -0
- package/src/skills/paper-plot/scripts/line_selfdistill.py +168 -0
- package/src/skills/paper-plot/scripts/radar_dora.py +151 -0
- package/src/skills/paper-plot/scripts/scatter_break.py +169 -0
- package/src/skills/paper-plot/scripts/scatter_tsne.py +133 -0
- package/src/skills/rebuttal/SKILL.md +9 -0
- package/src/skills/references/tool-usage-by-stage.md +438 -0
- package/src/skills/review/SKILL.md +105 -7
- package/src/skills/science/PROVENANCE.md +44 -0
- package/src/skills/science/SKILL.md +137 -0
- package/src/skills/science/references/artifact-science-tool.md +110 -0
- package/src/skills/science/references/claim-type-discipline.md +56 -0
- package/src/skills/science/references/domain-index.md +422 -0
- package/src/skills/science/references/hpc-via-bash-exec.md +42 -0
- package/src/skills/science/references/package-check-playbook.md +64 -0
- package/src/skills/science/references/package-index.min.json +3616 -0
- package/src/skills/science/references/packages/abinit.md +80 -0
- package/src/skills/science/references/packages/acts.md +73 -0
- package/src/skills/science/references/packages/aiida-core.md +80 -0
- package/src/skills/science/references/packages/alamode.md +80 -0
- package/src/skills/science/references/packages/amuse.md +88 -0
- package/src/skills/science/references/packages/anndata.md +88 -0
- package/src/skills/science/references/packages/arbor.md +80 -0
- package/src/skills/science/references/packages/arc.md +73 -0
- package/src/skills/science/references/packages/astropy.md +88 -0
- package/src/skills/science/references/packages/astroquery.md +88 -0
- package/src/skills/science/references/packages/atomate2.md +80 -0
- package/src/skills/science/references/packages/atomsmltr.md +73 -0
- package/src/skills/science/references/packages/awkward.md +73 -0
- package/src/skills/science/references/packages/batman.md +88 -0
- package/src/skills/science/references/packages/biopython.md +88 -0
- package/src/skills/science/references/packages/bloqade.md +73 -0
- package/src/skills/science/references/packages/brian2.md +73 -0
- package/src/skills/science/references/packages/bullet3.md +73 -0
- package/src/skills/science/references/packages/calculix.md +80 -0
- package/src/skills/science/references/packages/cantera.md +73 -0
- package/src/skills/science/references/packages/cavity-md-ipi.md +80 -0
- package/src/skills/science/references/packages/ccdproc.md +88 -0
- package/src/skills/science/references/packages/celerite2.md +88 -0
- package/src/skills/science/references/packages/cellrank.md +73 -0
- package/src/skills/science/references/packages/cesm.md +80 -0
- package/src/skills/science/references/packages/chemicals.md +73 -0
- package/src/skills/science/references/packages/chempy.md +73 -0
- package/src/skills/science/references/packages/cirq.md +73 -0
- package/src/skills/science/references/packages/coffea.md +73 -0
- package/src/skills/science/references/packages/cp2k.md +88 -0
- package/src/skills/science/references/packages/custodian.md +80 -0
- package/src/skills/science/references/packages/dart.md +73 -0
- package/src/skills/science/references/packages/datamol.md +88 -0
- package/src/skills/science/references/packages/dd4hep.md +73 -0
- package/src/skills/science/references/packages/dealii.md +80 -0
- package/src/skills/science/references/packages/deepchem.md +88 -0
- package/src/skills/science/references/packages/delphes.md +73 -0
- package/src/skills/science/references/packages/devito.md +80 -0
- package/src/skills/science/references/packages/dftb.md +88 -0
- package/src/skills/science/references/packages/dftd4.md +88 -0
- package/src/skills/science/references/packages/dftk-jl.md +80 -0
- package/src/skills/science/references/packages/dolfinx.md +80 -0
- package/src/skills/science/references/packages/drake.md +73 -0
- package/src/skills/science/references/packages/dumux.md +73 -0
- package/src/skills/science/references/packages/elk.md +80 -0
- package/src/skills/science/references/packages/elmerfem.md +80 -0
- package/src/skills/science/references/packages/enzo-e.md +88 -0
- package/src/skills/science/references/packages/espresso.md +80 -0
- package/src/skills/science/references/packages/exoplanet.md +88 -0
- package/src/skills/science/references/packages/fairroot.md +73 -0
- package/src/skills/science/references/packages/fbpic.md +80 -0
- package/src/skills/science/references/packages/fdtdbath-meep.md +80 -0
- package/src/skills/science/references/packages/geant4.md +73 -0
- package/src/skills/science/references/packages/geosx.md +80 -0
- package/src/skills/science/references/packages/gprmax.md +80 -0
- package/src/skills/science/references/packages/gromacs.md +80 -0
- package/src/skills/science/references/packages/gwaslab.md +73 -0
- package/src/skills/science/references/packages/gz-sim.md +73 -0
- package/src/skills/science/references/packages/hail.md +88 -0
- package/src/skills/science/references/packages/hiphive.md +80 -0
- package/src/skills/science/references/packages/hoomd-blue.md +80 -0
- package/src/skills/science/references/packages/itensor.md +73 -0
- package/src/skills/science/references/packages/itensors-jl.md +73 -0
- package/src/skills/science/references/packages/jdftx.md +73 -0
- package/src/skills/science/references/packages/jobflow.md +80 -0
- package/src/skills/science/references/packages/kadanoffbaym-jl.md +73 -0
- package/src/skills/science/references/packages/kite.md +80 -0
- package/src/skills/science/references/packages/kratos.md +80 -0
- package/src/skills/science/references/packages/kwant.md +73 -0
- package/src/skills/science/references/packages/lammps.md +80 -0
- package/src/skills/science/references/packages/lightkurve.md +88 -0
- package/src/skills/science/references/packages/limix.md +73 -0
- package/src/skills/science/references/packages/maxwelllink.md +80 -0
- package/src/skills/science/references/packages/mcdc.md +73 -0
- package/src/skills/science/references/packages/meep.md +80 -0
- package/src/skills/science/references/packages/mfem.md +80 -0
- package/src/skills/science/references/packages/mitgcm.md +73 -0
- package/src/skills/science/references/packages/modflow6.md +73 -0
- package/src/skills/science/references/packages/molecool.md +73 -0
- package/src/skills/science/references/packages/mom6.md +73 -0
- package/src/skills/science/references/packages/moose.md +80 -0
- package/src/skills/science/references/packages/mpas-model.md +73 -0
- package/src/skills/science/references/packages/mujoco.md +73 -0
- package/src/skills/science/references/packages/mumax3.md +73 -0
- package/src/skills/science/references/packages/nekrs.md +80 -0
- package/src/skills/science/references/packages/nessi.md +73 -0
- package/src/skills/science/references/packages/nest-simulator.md +73 -0
- package/src/skills/science/references/packages/netket.md +73 -0
- package/src/skills/science/references/packages/neuron.md +73 -0
- package/src/skills/science/references/packages/nextflow.md +88 -0
- package/src/skills/science/references/packages/nwchem.md +88 -0
- package/src/skills/science/references/packages/openbabel.md +88 -0
- package/src/skills/science/references/packages/openems.md +80 -0
- package/src/skills/science/references/packages/openff-toolkit.md +88 -0
- package/src/skills/science/references/packages/openfoam-dev.md +80 -0
- package/src/skills/science/references/packages/openmc.md +73 -0
- package/src/skills/science/references/packages/openmm.md +80 -0
- package/src/skills/science/references/packages/openmoc.md +73 -0
- package/src/skills/science/references/packages/openmx.md +80 -0
- package/src/skills/science/references/packages/opensees.md +80 -0
- package/src/skills/science/references/packages/opensn.md +80 -0
- package/src/skills/science/references/packages/opm-simulators.md +73 -0
- package/src/skills/science/references/packages/oqupy.md +73 -0
- package/src/skills/science/references/packages/packmol.md +80 -0
- package/src/skills/science/references/packages/palabos.md +80 -0
- package/src/skills/science/references/packages/parflow.md +80 -0
- package/src/skills/science/references/packages/pennylane.md +88 -0
- package/src/skills/science/references/packages/perceval.md +73 -0
- package/src/skills/science/references/packages/phono3py.md +73 -0
- package/src/skills/science/references/packages/phonopy.md +73 -0
- package/src/skills/science/references/packages/photutils.md +88 -0
- package/src/skills/science/references/packages/picongpu.md +80 -0
- package/src/skills/science/references/packages/plink-ng.md +88 -0
- package/src/skills/science/references/packages/precice.md +73 -0
- package/src/skills/science/references/packages/psc.md +80 -0
- package/src/skills/science/references/packages/psi4.md +88 -0
- package/src/skills/science/references/packages/pybinding.md +73 -0
- package/src/skills/science/references/packages/pyfr.md +80 -0
- package/src/skills/science/references/packages/pyhf.md +73 -0
- package/src/skills/science/references/packages/pyiron_base.md +80 -0
- package/src/skills/science/references/packages/pylcp.md +73 -0
- package/src/skills/science/references/packages/pylith.md +80 -0
- package/src/skills/science/references/packages/pynbody.md +88 -0
- package/src/skills/science/references/packages/pysam.md +88 -0
- package/src/skills/science/references/packages/pyscf.md +88 -0
- package/src/skills/science/references/packages/q-e.md +73 -0
- package/src/skills/science/references/packages/qibo.md +73 -0
- package/src/skills/science/references/packages/qiskit.md +73 -0
- package/src/skills/science/references/packages/quantica-jl.md +73 -0
- package/src/skills/science/references/packages/quantumoptics-jl.md +73 -0
- package/src/skills/science/references/packages/quimb.md +73 -0
- package/src/skills/science/references/packages/qulacs.md +73 -0
- package/src/skills/science/references/packages/qutip.md +73 -0
- package/src/skills/science/references/packages/rdkit.md +88 -0
- package/src/skills/science/references/packages/rmg-py.md +73 -0
- package/src/skills/science/references/packages/root.md +73 -0
- package/src/skills/science/references/packages/scanpy.md +88 -0
- package/src/skills/science/references/packages/scikit-allel.md +88 -0
- package/src/skills/science/references/packages/scikit-bio.md +88 -0
- package/src/skills/science/references/packages/scqubits.md +73 -0
- package/src/skills/science/references/packages/scuff-em.md +80 -0
- package/src/skills/science/references/packages/scvi-tools.md +73 -0
- package/src/skills/science/references/packages/seissol.md +73 -0
- package/src/skills/science/references/packages/sfepy.md +80 -0
- package/src/skills/science/references/packages/sisl.md +73 -0
- package/src/skills/science/references/packages/smilei.md +80 -0
- package/src/skills/science/references/packages/snakemake.md +88 -0
- package/src/skills/science/references/packages/specfem3d-globe.md +80 -0
- package/src/skills/science/references/packages/specutils.md +88 -0
- package/src/skills/science/references/packages/spglib.md +80 -0
- package/src/skills/science/references/packages/squidpy.md +88 -0
- package/src/skills/science/references/packages/starry.md +88 -0
- package/src/skills/science/references/packages/strawberryfields.md +73 -0
- package/src/skills/science/references/packages/su2.md +80 -0
- package/src/skills/science/references/packages/sunny-jl.md +73 -0
- package/src/skills/science/references/packages/sw4.md +73 -0
- package/src/skills/science/references/packages/swift.md +88 -0
- package/src/skills/science/references/packages/tdnegf.md +73 -0
- package/src/skills/science/references/packages/tenpy.md +73 -0
- package/src/skills/science/references/packages/thermo.md +73 -0
- package/src/skills/science/references/packages/tkwant.md +73 -0
- package/src/skills/science/references/packages/tvb-root.md +73 -0
- package/src/skills/science/references/packages/uproot5.md +73 -0
- package/src/skills/science/references/packages/vampire.md +80 -0
- package/src/skills/science/references/packages/wannier_tools.md +73 -0
- package/src/skills/science/references/packages/warpx.md +80 -0
- package/src/skills/science/references/packages/wrf.md +73 -0
- package/src/skills/science/references/packages/xtb.md +88 -0
- package/src/skills/science/references/packages/yt.md +73 -0
- package/src/skills/science/references/science-task-brief-template.md +71 -0
- package/src/skills/scout/SKILL.md +83 -425
- package/src/skills/scout/references/literature-scout-template.md +5 -24
- package/src/skills/scout/references/operational-guidance.md +191 -0
- package/src/skills/scout/references/paper-triage-playbook.md +11 -35
- package/src/skills/write/SKILL.md +744 -1246
- package/src/skills/write/references/experiments_analysis_patterns.md +129 -0
- package/src/skills/write/references/oral_package_patterns.md +252 -0
- package/src/skills/write/references/oral_writing_principles.md +291 -0
- package/src/skills/write/references/section_rewrite_checklist.md +234 -0
- package/src/tui/dist/app/AppContainer.js +1314 -27
- package/src/tui/dist/components/Composer.js +26 -1
- package/src/tui/dist/components/ConfigScreen.js +2 -1
- package/src/tui/dist/components/InputPrompt.js +25 -9
- package/src/tui/dist/components/MainContent.js +18 -3
- package/src/tui/dist/components/QuestScreen.js +3 -2
- package/src/tui/dist/components/UtilityScreen.js +37 -0
- package/src/tui/dist/hooks/useSafeInput.js +10 -0
- package/src/tui/dist/index.js +13 -1
- package/src/tui/dist/layouts/DefaultAppLayout.js +11 -8
- package/src/tui/dist/lib/api.js +89 -1
- package/src/tui/package.json +1 -1
- package/src/ui/dist/assets/{AnalysisPlugin-BCKAfjba.js → AnalysisPlugin-CA94NGmI.js} +1 -1
- package/src/ui/dist/assets/CliPlugin-DHBzphZU.js +79 -0
- package/src/ui/dist/assets/CodeEditorPlugin-BOFwD2rn.js +2 -0
- package/src/ui/dist/assets/{CodeViewerPlugin-CbaFRrUU.js → CodeViewerPlugin-CqDpgjik.js} +4 -4
- package/src/ui/dist/assets/{DocViewerPlugin-DAjLVeQD.js → DocViewerPlugin-UDBgt8-4.js} +3 -3
- package/src/ui/dist/assets/GitCommitViewerPlugin-BmHtZ0bZ.js +6 -0
- package/src/ui/dist/assets/{GitDiffViewerPlugin-CQACjoAA.js → GitDiffViewerPlugin-CAxjNorQ.js} +2 -2
- package/src/ui/dist/assets/{GitSnapshotViewer-0r4nLPke.js → GitSnapshotViewer-CweA6VON.js} +2 -2
- package/src/ui/dist/assets/{ImageViewerPlugin-nBOmI2v_.js → ImageViewerPlugin-C8wHGvGN.js} +5 -5
- package/src/ui/dist/assets/LabPlugin-COyyLUol.js +32 -0
- package/src/ui/dist/assets/{LatexPlugin-ZwtV8pIp.js → LatexPlugin-BQjAaA5J.js} +4 -4
- package/src/ui/dist/assets/{MarkdownViewerPlugin-DKqVfKyW.js → MarkdownViewerPlugin-Dy1NE2dI.js} +3 -3
- package/src/ui/dist/assets/{MarketplacePlugin-BwxStZ9D.js → MarketplacePlugin-DMIZtEJ2.js} +2 -2
- package/src/ui/dist/assets/NotebookEditor-CFHMq_Qt.js +91 -0
- package/src/ui/dist/assets/{NotebookEditor-DB9N_T9q.js → NotebookEditor-WFyd8Ybt.js} +3 -3
- package/src/ui/dist/assets/{PdfLoader-eWBONbQP.js → PdfLoader-CLE5u5TS.js} +3 -3
- package/src/ui/dist/assets/{PdfMarkdownPlugin-D22YOZL3.js → PdfMarkdownPlugin-_iNK_H83.js} +1 -1
- package/src/ui/dist/assets/PdfViewerPlugin-DgWsbInT.js +22 -0
- package/src/ui/dist/assets/SearchPlugin-DrZmn5iw.js +11 -0
- package/src/ui/dist/assets/{TextViewerPlugin-C5xqeeUH.js → TextViewerPlugin-D1-T3aC7.js} +4 -4
- package/src/ui/dist/assets/branding/runner-claude.svg +107 -0
- package/src/ui/dist/assets/branding/runner-codex.svg +10 -0
- package/src/ui/dist/assets/branding/runner-kimi.svg +14 -0
- package/src/ui/dist/assets/branding/runner-opencode.svg +7 -0
- package/src/ui/dist/assets/cli-store-CoZ-x5Ip.js +1 -0
- package/src/ui/dist/assets/{code-WlFHE7z_.js → code-DbsmSd3Y.js} +1 -1
- package/src/ui/dist/assets/file-diff-panel-DsvyRz47.js +1 -0
- package/src/ui/dist/assets/{wrap-text-BC-Hltpd.js → file-jump-queue-DeQBikaw.js} +3 -3
- package/src/ui/dist/assets/{file-socket-CfQPKQKj.js → file-socket-DA5XIx88.js} +1 -1
- package/src/ui/dist/assets/fonts/ds-fonts.css +50 -4
- package/src/ui/dist/assets/images/deepxiv/register-guide.png +0 -0
- package/src/ui/dist/assets/index-39vY9LmZ.js +1 -0
- package/src/ui/dist/assets/{index-CwNu1aH4.js → index-BsO46tJA.js} +1 -1
- package/src/ui/dist/assets/index-CHzJ2xtB.js +3530 -0
- package/src/ui/dist/assets/index-DH-zxoZ3.css +33 -0
- package/src/ui/dist/assets/{plugin-notebook-HbW2K-1c.js → plugin-notebook-JRhysCqj.js} +2 -2
- package/src/ui/dist/assets/{project-sync-C9IdzdZW.js → project-sync-DPmWKmKD.js} +1 -1
- package/src/ui/dist/assets/{zoom-out-E_gaeAxL.js → zoom-out-DAukFWen.js} +3 -3
- package/src/ui/dist/index.html +3 -3
- package/src/skills/analysis-campaign/references/artifact-orchestration.md +0 -58
- package/src/skills/baseline/references/memory-playbook.md +0 -40
- package/src/skills/baseline/references/publishable-baseline-package.md +0 -30
- package/src/skills/write/references/outline-evidence-contract-example.md +0 -107
- package/src/skills/write/references/paper-experiment-matrix-template.md +0 -131
- package/src/skills/write/references/paper-section-playbook.md +0 -64
- package/src/skills/write/references/reviewer-first-writing.md +0 -64
- package/src/skills/write/references/revision-checklist.md +0 -70
- package/src/skills/write/references/section-contracts.md +0 -82
- package/src/skills/write/references/sentence-level-proofing.md +0 -49
- package/src/ui/dist/assets/AiManusChatView-Bv-Z8YpU.js +0 -204
- package/src/ui/dist/assets/CliPlugin-BCKcpc35.js +0 -109
- package/src/ui/dist/assets/CodeEditorPlugin-DbOfSJ8K.js +0 -2
- package/src/ui/dist/assets/GitCommitViewerPlugin-CIUqbUDO.js +0 -1
- package/src/ui/dist/assets/LabCopilotPanel-BHxOxF4z.js +0 -14
- package/src/ui/dist/assets/LabPlugin-BKoZGs95.js +0 -22
- package/src/ui/dist/assets/NotebookEditor-BEQhaQbt.js +0 -81
- package/src/ui/dist/assets/PdfViewerPlugin-c-RK9DLM.js +0 -17
- package/src/ui/dist/assets/SearchPlugin-CxF9ytAx.js +0 -16
- package/src/ui/dist/assets/VNCViewer-BoLGLnHz.js +0 -11
- package/src/ui/dist/assets/bot-DREQOxzP.js +0 -6
- package/src/ui/dist/assets/chevron-up-C9Qpx4DE.js +0 -6
- package/src/ui/dist/assets/file-content-BZMz3RYp.js +0 -1
- package/src/ui/dist/assets/file-diff-panel-CQhw0jS2.js +0 -1
- package/src/ui/dist/assets/file-jump-queue-DA-SdG__.js +0 -1
- package/src/ui/dist/assets/git-commit-horizontal-DxZ8DCZh.js +0 -6
- package/src/ui/dist/assets/image-Bgl4VIyx.js +0 -6
- package/src/ui/dist/assets/index-BpV6lusQ.css +0 -33
- package/src/ui/dist/assets/index-CBNVuWcP.js +0 -2496
- package/src/ui/dist/assets/index-DrUnlf6K.js +0 -1
- package/src/ui/dist/assets/index-NW-h8VzN.js +0 -1
- package/src/ui/dist/assets/pdf-effect-queue-J8OnM0jE.js +0 -6
- package/src/ui/dist/assets/popover-CLc0pPP8.js +0 -1
- package/src/ui/dist/assets/select-Cs2PmzwL.js +0 -11
- package/src/ui/dist/assets/sigma-ClKcHAXm.js +0 -6
- package/src/ui/dist/assets/trash-DwpbFr3w.js +0 -11
- package/src/ui/dist/assets/useCliAccess-NQ8m0Let.js +0 -1
- package/src/ui/dist/assets/useFileDiffOverlay-FuhcnKiw.js +0 -1
|
@@ -7,427 +7,108 @@ skill_role: stage
|
|
|
7
7
|
# Optimize
|
|
8
8
|
|
|
9
9
|
Use this skill for algorithm-first quests where the goal is the strongest justified optimization result rather than paper packaging.
|
|
10
|
+
The goal is to move the frontier by one justified step at a time, not to generate a large pile of low-information candidates.
|
|
10
11
|
|
|
11
|
-
|
|
12
|
-
It does not replace the normal quest runtime. It tells you how to use the existing DeepScientist artifact, memory, bash_exec, Git, and worktree mechanisms as an optimization system.
|
|
12
|
+
## Match signals
|
|
13
13
|
|
|
14
|
-
|
|
15
|
-
|
|
16
|
-
- Follow the shared interaction contract injected by the system prompt.
|
|
17
|
-
- For ordinary active work, prefer a concise progress update once work has crossed roughly 6 tool calls with a human-meaningful delta, and do not drift beyond roughly 12 tool calls or about 8 minutes without a user-visible update.
|
|
18
|
-
- Ordinary candidate creation, smoke checks, and route updates should stay concise.
|
|
19
|
-
- Use richer milestone updates only when a candidate is promoted, a strong run finishes, the frontier shifts materially, or a fusion/debug route becomes the new main path.
|
|
20
|
-
- When the user asks for the current optimization state, answer from the frontier and durable artifacts rather than from chat memory.
|
|
21
|
-
- Hard execution rule: every terminal command in this stage must go through `bash_exec`; do not use any other terminal path for smoke checks, quick validations, long runs, Git, Python, package-manager, or file-inspection commands.
|
|
22
|
-
|
|
23
|
-
## Stage purpose
|
|
24
|
-
|
|
25
|
-
The optimize stage should do four things:
|
|
26
|
-
|
|
27
|
-
1. turn loose ideas into candidate briefs
|
|
28
|
-
2. rank and promote only the strongest briefs into durable lines
|
|
29
|
-
3. manage candidate attempts within a durable line
|
|
30
|
-
4. choose when to explore, exploit, fuse, debug, or stop
|
|
31
|
-
|
|
32
|
-
This skill is especially appropriate when `startup_contract.need_research_paper = false`.
|
|
33
|
-
|
|
34
|
-
Treat `optimize` as one stable stage skill with six internal submodes:
|
|
35
|
-
|
|
36
|
-
- `brief`
|
|
37
|
-
- `rank`
|
|
38
|
-
- `seed`
|
|
39
|
-
- `loop`
|
|
40
|
-
- `fusion`
|
|
41
|
-
- `debug`
|
|
42
|
-
|
|
43
|
-
Do not treat these as separate public skills.
|
|
44
|
-
Treat them as internal execution modes inside one optimize workflow.
|
|
45
|
-
|
|
46
|
-
InternAgent maps most naturally onto the `brief` and `rank` side of this stage.
|
|
47
|
-
MLEvolve maps most naturally onto the `seed`, `loop`, `fusion`, and `debug` side of this stage.
|
|
48
|
-
Do not collapse those two layers into one vague "optimize more" loop.
|
|
49
|
-
|
|
50
|
-
## Required working files
|
|
51
|
-
|
|
52
|
-
Before broad optimization search or candidate management becomes substantial, maintain these quest-visible control files:
|
|
53
|
-
|
|
54
|
-
- `OPTIMIZE_CHECKLIST.md`
|
|
55
|
-
- `CANDIDATE_BOARD.md`
|
|
56
|
-
|
|
57
|
-
Use:
|
|
58
|
-
|
|
59
|
-
- the integrated `optimize checklist template` appendix section
|
|
60
|
-
- the integrated `candidate board template` appendix section
|
|
61
|
-
|
|
62
|
-
`OPTIMIZE_CHECKLIST.md` is the execution control surface.
|
|
63
|
-
It should track:
|
|
64
|
-
|
|
65
|
-
- current frontier mode
|
|
66
|
-
- current optimize submode
|
|
67
|
-
- candidate brief count
|
|
68
|
-
- promoted line count
|
|
69
|
-
- current smoke queue
|
|
70
|
-
- current full-eval queue
|
|
71
|
-
- stagnation / fusion checks
|
|
72
|
-
- next concrete action
|
|
73
|
-
|
|
74
|
-
`CANDIDATE_BOARD.md` is the compact candidate ledger.
|
|
75
|
-
It should track:
|
|
76
|
-
|
|
77
|
-
- candidate id
|
|
78
|
-
- candidate type: brief or implementation attempt
|
|
79
|
-
- parent line or parent candidate
|
|
80
|
-
- strategy: explore / exploit / fusion / debug
|
|
81
|
-
- status
|
|
82
|
-
- expected gain
|
|
83
|
-
- observed result
|
|
84
|
-
- promote / archive recommendation
|
|
85
|
-
|
|
86
|
-
## Required MCP-driven workflow
|
|
87
|
-
|
|
88
|
-
Treat this as the concrete optimize workflow. Do not skip these steps just because the quest is algorithm-first.
|
|
89
|
-
|
|
90
|
-
### 1. Recover the optimization state first
|
|
91
|
-
|
|
92
|
-
At the start of each meaningful optimize pass, use this order unless a stronger local reason exists:
|
|
93
|
-
|
|
94
|
-
1. `artifact.get_optimization_frontier(...)`
|
|
95
|
-
2. `memory.list_recent(scope='quest', limit=5)`
|
|
96
|
-
3. `memory.search(...)`
|
|
97
|
-
4. `artifact.get_quest_state(detail='summary')`
|
|
98
|
-
5. `artifact.read_quest_documents(...)` when exact durable wording matters
|
|
99
|
-
|
|
100
|
-
Do not create new candidates before the frontier, recent optimization lessons, and current runtime refs are checked.
|
|
101
|
-
If the frontier is missing or obviously stale, recover that state before proposing more work.
|
|
102
|
-
|
|
103
|
-
### 2. Shape candidate briefs before branch promotion
|
|
104
|
-
|
|
105
|
-
When the next direction is still fuzzy, do not jump straight into code or branch creation.
|
|
106
|
-
First turn the direction into a compact candidate brief.
|
|
107
|
-
|
|
108
|
-
The brief-shaping sequence is:
|
|
109
|
-
|
|
110
|
-
1. clarify the bottleneck, constraints, and comparability boundary
|
|
111
|
-
2. identify the incumbent or baseline that this brief must beat or complement
|
|
112
|
-
3. generate a small differentiated slate, usually `2-3` serious approaches
|
|
113
|
-
4. compare them on one shared surface
|
|
114
|
-
5. recommend exactly one lead brief
|
|
115
|
-
6. self-check the recommended brief before submission
|
|
116
|
-
|
|
117
|
-
Every serious brief should answer:
|
|
118
|
-
|
|
119
|
-
- bottleneck
|
|
120
|
-
- why_current_line_is_limited
|
|
121
|
-
- mechanism
|
|
122
|
-
- why_now
|
|
123
|
-
- keep_unchanged
|
|
124
|
-
- expected_gain
|
|
125
|
-
- implementation_surface
|
|
126
|
-
- main_risks
|
|
127
|
-
|
|
128
|
-
The durable call for this step is usually:
|
|
129
|
-
|
|
130
|
-
- `artifact.submit_idea(mode='create', submission_mode='candidate', ...)`
|
|
131
|
-
|
|
132
|
-
Use `idea` when the mechanism family itself is still unresolved.
|
|
133
|
-
Use `optimize` when the family is already chosen and the work is now branchless brief shaping, ranking, or within-line search.
|
|
134
|
-
|
|
135
|
-
### 3. Rank candidate briefs on one explicit surface
|
|
136
|
-
|
|
137
|
-
Before promoting a line, compare the serious briefs on one shared ranking surface.
|
|
138
|
-
At minimum evaluate:
|
|
139
|
-
|
|
140
|
-
- expected information gain
|
|
141
|
-
- feasibility in current repo
|
|
142
|
-
- comparability against baseline
|
|
143
|
-
- implementation surface
|
|
144
|
-
- novelty or distinctiveness
|
|
145
|
-
- family diversity
|
|
146
|
-
- change-layer diversity
|
|
147
|
-
- incumbent-improvement potential
|
|
148
|
-
- failure risk
|
|
149
|
-
|
|
150
|
-
Then state:
|
|
151
|
-
|
|
152
|
-
- winner justification
|
|
153
|
-
- non-winner defer / reject reasons
|
|
154
|
-
- promotion cap: how many lines should actually be promoted now
|
|
155
|
-
|
|
156
|
-
Do not promote every plausible brief.
|
|
157
|
-
Default rule: promote only `1-3` candidate briefs, and usually fewer.
|
|
158
|
-
|
|
159
|
-
The durable call for this step is one of:
|
|
160
|
-
|
|
161
|
-
- `artifact.submit_idea(mode='create', submission_mode='line', source_candidate_id=..., ...)`
|
|
162
|
-
- `artifact.record(payload={'kind': 'decision', 'action': 'branch'|'continue'|'stop', ...})`
|
|
163
|
-
|
|
164
|
-
### 4. Hand off promoted lines into experiment cleanly
|
|
165
|
-
|
|
166
|
-
Once a brief is promoted, the next main work belongs to `experiment`, not to vague optimize chatter.
|
|
167
|
-
Before substantial implementation or compute:
|
|
168
|
-
|
|
169
|
-
- activate or confirm the intended durable line
|
|
170
|
-
- update `OPTIMIZE_CHECKLIST.md`
|
|
171
|
-
- update `CANDIDATE_BOARD.md`
|
|
172
|
-
- create or revise `PLAN.md`
|
|
173
|
-
- create or revise `CHECKLIST.md`
|
|
174
|
-
- define the smoke queue and full-eval queue explicitly
|
|
175
|
-
|
|
176
|
-
Then hand off into `experiment` for:
|
|
177
|
-
|
|
178
|
-
- one clean implementation pass
|
|
179
|
-
- one bounded smoke or pilot run
|
|
180
|
-
- one real measured main run
|
|
181
|
-
|
|
182
|
-
Do not keep reshaping the method after the run contract is already concrete.
|
|
183
|
-
|
|
184
|
-
### 5. Record every meaningful result durably
|
|
185
|
-
|
|
186
|
-
Use these artifact forms consistently:
|
|
187
|
-
|
|
188
|
-
- candidate brief:
|
|
189
|
-
- `artifact.submit_idea(..., submission_mode='candidate')`
|
|
190
|
-
- durable optimization line:
|
|
191
|
-
- `artifact.submit_idea(..., submission_mode='line')`
|
|
192
|
-
- implementation-level candidate attempt inside one line:
|
|
193
|
-
- `artifact.record(payload={'kind': 'report', 'report_type': 'optimization_candidate', ...})`
|
|
194
|
-
- real measured main result:
|
|
195
|
-
- `artifact.record_main_experiment(...)`
|
|
196
|
-
- route change after the result:
|
|
197
|
-
- `artifact.record(payload={'kind': 'decision', 'action': 'iterate'|'branch'|'continue'|'stop', ...})`
|
|
198
|
-
|
|
199
|
-
Do not treat chat summaries as substitutes for these durable records.
|
|
200
|
-
|
|
201
|
-
### 6. Manage process lifecycle explicitly
|
|
202
|
-
|
|
203
|
-
Optimize uses the same long-run process discipline as `experiment`.
|
|
204
|
-
|
|
205
|
-
- Use `bash_exec` for smoke checks, quick validations, and long runs.
|
|
206
|
-
- Before launching a new run, inspect current managed sessions first.
|
|
207
|
-
- Do not start a duplicate process for the same purpose if a valid live session already exists.
|
|
208
|
-
- Use bounded smoke before long runs unless direct quick validation is already cheap and equally informative.
|
|
209
|
-
- Use `bash_exec(mode='detach', ...)` for long runs and monitor with `list/read/await`.
|
|
210
|
-
- Read logs before retrying a failed or suspicious run; do not relaunch blindly.
|
|
211
|
-
- Kill only on explicit invalidity, supersession, or checked no-progress conditions.
|
|
212
|
-
- After pause, resume, or daemon recovery, recover session state before spawning new runs.
|
|
14
|
+
Use `optimize` when:
|
|
213
15
|
|
|
214
|
-
|
|
215
|
-
|
|
216
|
-
|
|
217
|
-
|
|
218
|
-
|
|
219
|
-
2. compare the result against the incumbent and backlog
|
|
220
|
-
3. choose exactly one dominant next action:
|
|
221
|
-
- explore
|
|
222
|
-
- exploit
|
|
223
|
-
- fusion
|
|
224
|
-
- debug
|
|
225
|
-
- stop
|
|
226
|
-
4. record that route durably
|
|
227
|
-
|
|
228
|
-
Do not treat one candidate creation, one smoke pass, or one detached launch as stage completion.
|
|
229
|
-
|
|
230
|
-
## Integrated templates and playbooks
|
|
231
|
-
|
|
232
|
-
Use the following integrated structures directly inside this skill. They replace the old optimize reference files conceptually, even if those files still exist on disk.
|
|
233
|
-
|
|
234
|
-
### Candidate brief template
|
|
235
|
-
|
|
236
|
-
Every serious candidate brief should include:
|
|
237
|
-
|
|
238
|
-
- title
|
|
239
|
-
- bottleneck
|
|
240
|
-
- why_current_line_is_limited
|
|
241
|
-
- mechanism
|
|
242
|
-
- mechanism_family
|
|
243
|
-
- change_layer: `Tier1` / `Tier2` / `Tier3`
|
|
244
|
-
- source_lens
|
|
245
|
-
- keep_unchanged
|
|
246
|
-
- expected_gain
|
|
247
|
-
- implementation_surface
|
|
248
|
-
- risks
|
|
249
|
-
- foundation
|
|
250
|
-
- promote_now
|
|
251
|
-
- next_target
|
|
252
|
-
|
|
253
|
-
### Brief-shaping playbook
|
|
254
|
-
|
|
255
|
-
Use this when a candidate direction is still fuzzy and needs to become a ranking-ready brief.
|
|
256
|
-
|
|
257
|
-
- clarify the concrete bottleneck before widening
|
|
258
|
-
- resolve the evaluation or comparability boundary
|
|
259
|
-
- identify the main hard constraint
|
|
260
|
-
- identify the current incumbent
|
|
261
|
-
- generate only a small differentiated slate
|
|
262
|
-
- compare on one shared surface
|
|
263
|
-
- recommend exactly one lead brief
|
|
264
|
-
- self-check for ambiguity, overlap, and weak justification
|
|
265
|
-
|
|
266
|
-
### Candidate ranking template
|
|
267
|
-
|
|
268
|
-
When several briefs compete, produce:
|
|
269
|
-
|
|
270
|
-
- candidate set
|
|
271
|
-
- ranking scope
|
|
272
|
-
- comparison surface
|
|
273
|
-
- ranked candidates with score summary, why each ranks there, and promote / hold / reject
|
|
274
|
-
- winner justification
|
|
275
|
-
- non-winner notes
|
|
276
|
-
- promotion cap
|
|
277
|
-
|
|
278
|
-
### Candidate board template
|
|
279
|
-
|
|
280
|
-
`CANDIDATE_BOARD.md` should expose at least these columns:
|
|
281
|
-
|
|
282
|
-
- candidate id
|
|
283
|
-
- level: `brief` or `implementation`
|
|
284
|
-
- parent
|
|
285
|
-
- strategy
|
|
286
|
-
- status
|
|
287
|
-
- expected gain
|
|
288
|
-
- observed result
|
|
289
|
-
- promote / archive recommendation
|
|
290
|
-
|
|
291
|
-
### Optimize checklist template
|
|
292
|
-
|
|
293
|
-
`OPTIMIZE_CHECKLIST.md` should track at least:
|
|
294
|
-
|
|
295
|
-
- frontier has been refreshed
|
|
296
|
-
- primary optimize submode chosen
|
|
297
|
-
- current route mode chosen
|
|
298
|
-
- recent optimization memory reviewed
|
|
299
|
-
- brief slate checked for family diversity
|
|
300
|
-
- candidate briefs updated or confirmed
|
|
301
|
-
- candidate ranking updated
|
|
302
|
-
- promotion decision made
|
|
303
|
-
- current implementation pool recorded
|
|
304
|
-
- smoke queue defined
|
|
305
|
-
- full-eval queue defined
|
|
306
|
-
- failures classified
|
|
307
|
-
- stagnation check performed
|
|
308
|
-
- fusion eligibility checked
|
|
309
|
-
- next concrete action written
|
|
310
|
-
|
|
311
|
-
### Frontier review template
|
|
312
|
-
|
|
313
|
-
Whenever route choice is unclear, write down:
|
|
314
|
-
|
|
315
|
-
- current frontier
|
|
316
|
-
- evidence summary
|
|
317
|
-
- route choice
|
|
318
|
-
- active optimize submode
|
|
319
|
-
- immediate next action
|
|
320
|
-
|
|
321
|
-
### Code-generation route playbook
|
|
322
|
-
|
|
323
|
-
Choose one route deliberately:
|
|
324
|
-
|
|
325
|
-
- brief-only when the direction is still unclear
|
|
326
|
-
- stepwise generation for first substantial implementation of a new line
|
|
327
|
-
- diff / patch generation for improve / exploit / debug / most fusion work
|
|
328
|
-
- full rewrite only when the current implementation is structurally broken or mismatched
|
|
329
|
-
|
|
330
|
-
Do not jump to a rewrite merely because one local patch failed.
|
|
331
|
-
|
|
332
|
-
### Debug response template
|
|
333
|
-
|
|
334
|
-
When a candidate fails but still looks strategically valuable, record:
|
|
335
|
-
|
|
336
|
-
- error
|
|
337
|
-
- retrieved memory
|
|
338
|
-
- root cause
|
|
339
|
-
- minimal fix
|
|
340
|
-
- keep unchanged
|
|
341
|
-
- next check
|
|
342
|
-
- archive threshold
|
|
343
|
-
|
|
344
|
-
### Fusion playbook
|
|
16
|
+
- the quest is algorithm-first
|
|
17
|
+
- the baseline gate is already confirmed or waived
|
|
18
|
+
- the task has at least one plausible optimization direction
|
|
19
|
+
- multiple candidate directions exist and the system should rank them before promotion
|
|
20
|
+
- a durable line exists and the next step is to manage explore, exploit, fusion, debug, or stop
|
|
345
21
|
|
|
346
|
-
|
|
22
|
+
Do not use `optimize` when:
|
|
347
23
|
|
|
348
|
-
-
|
|
349
|
-
-
|
|
350
|
-
-
|
|
351
|
-
-
|
|
352
|
-
- what bounded first validation step should run before any broad rollout?
|
|
24
|
+
- the baseline gate is unresolved
|
|
25
|
+
- the main need is a paper draft, rebuttal, review, or finalize task
|
|
26
|
+
- the quest is still in broad literature scouting with no concrete optimization handle
|
|
27
|
+
- the real blocker is still idea-family selection rather than bounded optimization search inside an accepted family
|
|
353
28
|
|
|
354
|
-
|
|
29
|
+
## One-sentence summary
|
|
355
30
|
|
|
356
|
-
|
|
31
|
+
Recover the current frontier, choose one optimize submode, advance one justified move, then record the new frontier or explicit stop condition.
|
|
357
32
|
|
|
358
|
-
|
|
33
|
+
## Control workflow
|
|
359
34
|
|
|
360
|
-
|
|
361
|
-
|
|
362
|
-
|
|
363
|
-
|
|
364
|
-
|
|
365
|
-
|
|
35
|
+
1. Recover the current frontier and recent durable optimization state.
|
|
36
|
+
Read the frontier, recent memory, and current quest state before creating or promoting anything.
|
|
37
|
+
2. Choose exactly one primary optimize submode for this pass.
|
|
38
|
+
Keep the pass legible: one dominant optimize move, not several unrelated route changes.
|
|
39
|
+
3. Keep the candidate slate or active pool small and differentiated.
|
|
40
|
+
If the direction is still fuzzy, shape and rank branchless candidate briefs; if a durable line already exists, manage a bounded implementation pool inside that line.
|
|
41
|
+
4. Promote or execute only bounded candidates with explicit evidence criteria.
|
|
42
|
+
Promote only the strongest briefs into durable lines, and record implementation-level attempts separately from durable line creation.
|
|
43
|
+
5. Route from evidence to exactly one dominant next action.
|
|
44
|
+
End in `explore`, `exploit`, `fusion`, `debug`, or `stop`, and record that route durably.
|
|
366
45
|
|
|
367
|
-
|
|
46
|
+
## AVOID / pitfalls
|
|
368
47
|
|
|
369
|
-
|
|
48
|
+
- Do not treat every patch or micro-attempt as a new durable idea line.
|
|
49
|
+
- Do not create a new Git branch or worktree for every implementation-level candidate.
|
|
50
|
+
- Do not create a new Git branch/worktree for every implementation-level candidate.
|
|
51
|
+
- Do not promote every plausible brief.
|
|
52
|
+
- Do not keep widening the frontier once a small serious slate already exists.
|
|
53
|
+
- Do not let one optimize pass mix multiple major route changes.
|
|
54
|
+
- Do not keep selecting the same familiar mechanism family after repeated non-improving results.
|
|
55
|
+
- Do not drift into paper-outline, bundle, or finalize work by default while this stage is active.
|
|
56
|
+
- Do not treat one candidate creation or one smoke pass as stage completion.
|
|
370
57
|
|
|
371
|
-
|
|
372
|
-
2. identify the most likely root cause
|
|
373
|
-
3. choose one larger route change:
|
|
374
|
-
- widen search
|
|
375
|
-
- promote a stronger alternative
|
|
376
|
-
- fuse
|
|
377
|
-
- debug
|
|
378
|
-
- stop
|
|
379
|
-
4. record one explicit non-repeat rule
|
|
58
|
+
## Constraints
|
|
380
59
|
|
|
381
|
-
|
|
60
|
+
- Use these three object levels consistently:
|
|
61
|
+
- candidate brief
|
|
62
|
+
- durable optimization line
|
|
63
|
+
- implementation-level candidate attempt
|
|
64
|
+
- Keep exactly one primary optimize submode active for the current meaningful pass.
|
|
65
|
+
- Keep only one bottom-layer optimize move truly in progress at a time.
|
|
66
|
+
- Before deciding the next route, call `artifact.get_optimization_frontier(...)` when available and use it as the primary optimization-state summary.
|
|
67
|
+
- Candidate briefs should use `artifact.submit_idea(..., submission_mode='candidate')`.
|
|
68
|
+
- Durable lines should use `artifact.submit_idea(..., submission_mode='line')`.
|
|
69
|
+
- Only promote a candidate brief into a durable line when it has enough expected value, differentiation, and execution path clarity to deserve branch/worktree state.
|
|
70
|
+
- Implementation-level candidate attempts inside one durable line should use `artifact.record(... report_type='optimization_candidate' ...)`.
|
|
71
|
+
- Real measured line results should use `artifact.record_main_experiment(...)`.
|
|
72
|
+
- All terminal work in this stage must go through `bash_exec(...)`.
|
|
382
73
|
|
|
383
|
-
|
|
74
|
+
## Validation
|
|
384
75
|
|
|
385
|
-
|
|
76
|
+
Before `optimize` can end, all applicable checks should be true:
|
|
386
77
|
|
|
387
|
-
-
|
|
388
|
-
-
|
|
389
|
-
-
|
|
390
|
-
-
|
|
391
|
-
-
|
|
392
|
-
-
|
|
78
|
+
- the frontier was refreshed
|
|
79
|
+
- the active optimize submode is explicit
|
|
80
|
+
- the candidate board and optimize checklist reflect the current state
|
|
81
|
+
- promoted lines are justified and bounded
|
|
82
|
+
- every live candidate has status and next action
|
|
83
|
+
- every major success, failure, promotion, or route change is durably recorded
|
|
84
|
+
- the pass ends with one durable next action or stop condition
|
|
393
85
|
|
|
394
|
-
|
|
86
|
+
## Interaction discipline
|
|
395
87
|
|
|
396
|
-
-
|
|
397
|
-
-
|
|
398
|
-
-
|
|
399
|
-
-
|
|
400
|
-
-
|
|
88
|
+
- Follow the shared interaction contract injected by the system prompt.
|
|
89
|
+
- For ordinary active work, prefer a concise progress update once work has crossed roughly 6 tool calls with a human-meaningful delta, and do not drift beyond roughly 12 tool calls or about 8 minutes without a user-visible update.
|
|
90
|
+
- Ordinary candidate creation, smoke checks, and route updates should stay concise.
|
|
91
|
+
- Use richer milestone updates only when a candidate is promoted, a strong run finishes, the frontier shifts materially, or a fusion/debug route becomes the new main path.
|
|
92
|
+
- When the user asks for the current optimization state, answer from the frontier and durable artifacts rather than from chat memory.
|
|
93
|
+
- Every terminal command in this stage must go through `bash_exec`; do not use any other terminal path for smoke checks, quick validations, long runs, Git, Python, package-manager, or file-inspection commands.
|
|
401
94
|
|
|
402
|
-
##
|
|
95
|
+
## Working surfaces
|
|
403
96
|
|
|
404
|
-
|
|
405
|
-
- Do not create a new Git branch/worktree for every implementation-level candidate.
|
|
406
|
-
- Use `artifact.submit_idea(..., submission_mode='candidate')` for candidate briefs that should be ranked before promotion.
|
|
407
|
-
- Use `artifact.submit_idea(..., submission_mode='line')` only for directions that deserve a durable optimization line and branch/worktree.
|
|
408
|
-
- Use `artifact.record(payload={'kind': 'report', 'report_type': 'optimization_candidate', ...})` for implementation-level candidate attempts inside one durable line.
|
|
409
|
-
- Before deciding the next route, call `artifact.get_optimization_frontier(...)` when available and use it as the primary optimization-state summary.
|
|
410
|
-
- Keep all major optimization successes and failures durable through artifacts and memory.
|
|
411
|
-
- Do not drift into paper-outline, bundle, or finalize work by default while this stage is active.
|
|
412
|
-
- Do not convert ranking uncertainty into premature branch creation.
|
|
413
|
-
- Do not treat an implementation-level candidate report as a new durable optimization line.
|
|
414
|
-
- Do not keep widening the frontier once a small serious slate already exists.
|
|
415
|
-
- Do not let one optimize pass mix multiple major route changes.
|
|
416
|
-
One pass may inspect several possibilities, but it should finish with one dominant next action.
|
|
97
|
+
Before broad optimization search or candidate management becomes substantial, maintain these quest-visible control files:
|
|
417
98
|
|
|
418
|
-
|
|
99
|
+
- quest-root `plan.md` as the research map and loop tracker for the whole quest
|
|
100
|
+
- workspace `PLAN.md` as the active optimize-node contract
|
|
101
|
+
- `OPTIMIZE_CHECKLIST.md` as the optimize-specific execution frontier
|
|
102
|
+
- workspace `CHECKLIST.md` as a mirror of the immediate next move when it exists
|
|
103
|
+
- `CANDIDATE_BOARD.md` as the compact candidate ledger
|
|
419
104
|
|
|
420
|
-
|
|
421
|
-
- the baseline gate is already confirmed or waived
|
|
422
|
-
- the task has at least one plausible optimization direction
|
|
423
|
-
- multiple candidate directions exist and the system should rank them before promotion
|
|
424
|
-
- a durable line exists and the next step is to manage explore / exploit / fuse / debug
|
|
105
|
+
Use these templates:
|
|
425
106
|
|
|
426
|
-
|
|
107
|
+
- `references/optimize-checklist-template.md`
|
|
108
|
+
- `references/candidate-board-template.md`
|
|
427
109
|
|
|
428
|
-
|
|
429
|
-
|
|
430
|
-
- the quest is still in broad literature scouting with no concrete optimization handle
|
|
110
|
+
`optimize` is the looped search controller for algorithm-first quests, not a replacement for the quest-level roadmap.
|
|
111
|
+
When a result becomes the new incumbent, plateaus, or stops, update quest-root `plan.md` so the next loop edge is explicit.
|
|
431
112
|
|
|
432
113
|
## Core object model
|
|
433
114
|
|
|
@@ -435,230 +116,42 @@ Use these three object levels consistently:
|
|
|
435
116
|
|
|
436
117
|
1. candidate brief
|
|
437
118
|
`artifact.submit_idea(mode='create', submission_mode='candidate', ...)`
|
|
438
|
-
|
|
439
|
-
|
|
119
|
+
Record a possible direction or method brief without opening a branch yet.
|
|
440
120
|
2. durable optimization line
|
|
441
121
|
`artifact.submit_idea(mode='create', submission_mode='line', ...)`
|
|
442
|
-
|
|
443
|
-
|
|
122
|
+
Open a real branch or worktree and make it a formal optimization path.
|
|
444
123
|
3. implementation-level candidate attempt
|
|
445
124
|
`artifact.record(payload={'kind': 'report', 'report_type': 'optimization_candidate', ...})`
|
|
446
|
-
|
|
447
|
-
|
|
448
|
-
## Recommended workflow
|
|
449
|
-
|
|
450
|
-
1. Read the current frontier and recent durable state.
|
|
451
|
-
2. If only loose candidate directions exist, create or refine candidate briefs first.
|
|
452
|
-
3. Rank the candidate briefs and promote only the best `1-3` into durable lines.
|
|
453
|
-
4. Inside a durable line, generate a small candidate pool, then run bounded smoke checks before full evaluations.
|
|
454
|
-
5. Record each implementation-level attempt durably with status, change plan, and result.
|
|
455
|
-
6. After each real result, decide whether to explore, exploit, fuse, debug, or stop.
|
|
456
|
-
7. Write optimization lessons to memory before leaving the stage.
|
|
457
|
-
|
|
458
|
-
At the start of each meaningful optimize pass, update `OPTIMIZE_CHECKLIST.md` before spending significant code or compute.
|
|
459
|
-
|
|
460
|
-
## Mandatory first-call sequence
|
|
125
|
+
Record one within-line attempt such as one patch, one smoke candidate, one debug candidate, or one fusion candidate.
|
|
461
126
|
|
|
462
|
-
|
|
127
|
+
Use `artifact.record(payload={'kind': 'decision', ...})` when the frontier route changes, a line is promoted, a line is stopped, or the next optimize submode is selected.
|
|
463
128
|
|
|
464
|
-
|
|
465
|
-
2. `memory.search(...)`
|
|
466
|
-
3. `artifact.get_quest_state(detail='summary')`
|
|
467
|
-
4. `artifact.read_quest_documents(...)` when exact durable wording matters
|
|
129
|
+
## Optimize submodes
|
|
468
130
|
|
|
469
|
-
|
|
470
|
-
|
|
471
|
-
## Stage-start requirement
|
|
472
|
-
|
|
473
|
-
Stage-start requirement:
|
|
474
|
-
|
|
475
|
-
- run `memory.list_recent(scope='quest', limit=5)`
|
|
476
|
-
- run at least one `memory.search(...)`
|
|
477
|
-
- read `artifact.get_optimization_frontier(...)`
|
|
478
|
-
- update `OPTIMIZE_CHECKLIST.md`
|
|
479
|
-
|
|
480
|
-
If the frontier is missing or obviously stale, recover that state before proposing more work.
|
|
131
|
+
Treat `optimize` as one stable stage skill with six internal submodes:
|
|
481
132
|
|
|
482
|
-
|
|
133
|
+
- `brief`: turn loose directions into compact candidate briefs
|
|
134
|
+
- `rank`: compare briefs on one shared surface and choose promotion candidates
|
|
135
|
+
- `seed`: create a small implementation-level pool inside one durable line
|
|
136
|
+
- `loop`: advance one durable line with bounded smoke/full-eval/record actions
|
|
137
|
+
- `fusion`: combine complementary strengths from multiple lines
|
|
138
|
+
- `debug`: rescue a strategically valuable candidate blocked by a concrete failure mode
|
|
483
139
|
|
|
484
|
-
|
|
140
|
+
Do not treat these as separate public skills.
|
|
141
|
+
Treat them as internal execution modes inside one optimize workflow.
|
|
485
142
|
|
|
486
143
|
Default selection order:
|
|
487
144
|
|
|
488
|
-
1. `fusion`
|
|
489
|
-
|
|
490
|
-
|
|
491
|
-
|
|
492
|
-
|
|
493
|
-
|
|
494
|
-
4. `brief`
|
|
495
|
-
- when the candidate-brief slate is too thin or too weak
|
|
496
|
-
5. `seed`
|
|
497
|
-
- when a durable line exists but there is no live implementation-candidate pool
|
|
498
|
-
6. `loop`
|
|
499
|
-
- when a live candidate pool or leading durable line already exists and the main need is bounded execution progress
|
|
500
|
-
|
|
501
|
-
Do not bounce among submodes repeatedly in one pass.
|
|
502
|
-
If the best submode changes after new evidence appears, record that route shift explicitly.
|
|
503
|
-
|
|
504
|
-
## Candidate brief protocol
|
|
505
|
-
|
|
506
|
-
When a direction is interesting but not yet worthy of a new branch:
|
|
507
|
-
|
|
508
|
-
- create a candidate brief with `submission_mode='candidate'`
|
|
509
|
-
- keep it branchless
|
|
510
|
-
- record enough structure that later ranking or promotion is possible
|
|
511
|
-
|
|
512
|
-
Good candidate-brief fields include:
|
|
513
|
-
|
|
514
|
-
- title
|
|
515
|
-
- problem
|
|
516
|
-
- hypothesis
|
|
517
|
-
- mechanism
|
|
518
|
-
- mechanism_family
|
|
519
|
-
- change_layer
|
|
520
|
-
- source_lens
|
|
521
|
-
- expected_gain
|
|
522
|
-
- risks
|
|
523
|
-
- decision_reason
|
|
524
|
-
- foundation_ref
|
|
525
|
-
- lineage_intent
|
|
526
|
-
|
|
527
|
-
Do not promote every candidate automatically.
|
|
528
|
-
|
|
529
|
-
Use the integrated `method brief template` section for the minimum acceptable candidate-brief structure.
|
|
530
|
-
Use the integrated `brief shaping playbook` section when the brief is still too vague, too implementation-first, or too collapsed onto one familiar mechanism.
|
|
531
|
-
|
|
532
|
-
Candidate briefs should explicitly answer:
|
|
533
|
-
|
|
534
|
-
- WHAT bottleneck is being targeted?
|
|
535
|
-
- WHY is the current line limited?
|
|
536
|
-
- HOW does this mechanism address the limitation?
|
|
537
|
-
- WHAT must remain unchanged for comparability?
|
|
538
|
-
|
|
539
|
-
If the brief cannot answer those four questions clearly, it is not ready for promotion or implementation.
|
|
540
|
-
|
|
541
|
-
Treat a candidate brief as the DeepScientist form of a method brief.
|
|
542
|
-
It should sit between "idea intuition" and "code implementation".
|
|
543
|
-
|
|
544
|
-
Preserve this brief-shaping discipline:
|
|
545
|
-
|
|
546
|
-
1. clarify the bottleneck, constraints, and comparability boundary first
|
|
547
|
-
2. generate a small differentiated slate, usually `2-3` serious approaches
|
|
548
|
-
3. recommend one approach with explicit tradeoffs against the alternatives
|
|
549
|
-
4. self-check the winning brief for ambiguity, overlap, and weak justification before submission
|
|
550
|
-
|
|
551
|
-
Do not jump from "interesting intuition" to branch creation.
|
|
552
|
-
Do not jump from "I know how to code this" to "this deserves promotion."
|
|
553
|
-
|
|
554
|
-
When running the `brief` submode:
|
|
555
|
-
|
|
556
|
-
- produce only `2-4` serious candidate briefs by default
|
|
557
|
-
- ask or answer the minimum clarifying questions needed to remove ambiguity around bottleneck, constraint fit, and comparability
|
|
558
|
-
- explicitly keep one incumbent-compatible refinement when possible
|
|
559
|
-
- explicitly keep one orthogonal alternative when possible
|
|
560
|
-
- explicitly keep one broader lens or paradigm shift candidate when possible
|
|
561
|
-
- avoid generating several renamed variants of the same mechanism
|
|
562
|
-
- prefer mechanism-level distinctness over volume
|
|
563
|
-
- present the differentiated slate on one shared comparison surface before choosing a recommended brief
|
|
564
|
-
- keep the questioning bounded and execution-oriented rather than open-ended brainstorming
|
|
145
|
+
1. `fusion` when the frontier explicitly says `fusion`
|
|
146
|
+
2. `debug` when a strategically valuable candidate failed for a concrete and likely fixable reason
|
|
147
|
+
3. `rank` when several candidate briefs already exist and promotion is the main unresolved question
|
|
148
|
+
4. `brief` when the candidate-brief slate is too thin or too weak
|
|
149
|
+
5. `seed` when a durable line exists but there is no live implementation-candidate pool
|
|
150
|
+
6. `loop` when a live candidate pool or leading durable line already exists and the main need is bounded execution progress
|
|
565
151
|
|
|
566
|
-
|
|
152
|
+
## Frontier route meanings
|
|
567
153
|
|
|
568
|
-
|
|
569
|
-
- one `orthogonal-mechanism` direction when justified
|
|
570
|
-
- one `paradigm/objective/data-view shift` direction when justified
|
|
571
|
-
|
|
572
|
-
If all serious briefs belong to the same mechanism family, do one widening pass before ranking.
|
|
573
|
-
Do not treat a same-family slate as sufficient merely because the local scores look good.
|
|
574
|
-
|
|
575
|
-
For each serious brief, record at least:
|
|
576
|
-
|
|
577
|
-
- bottleneck
|
|
578
|
-
- why_current_line_is_limited
|
|
579
|
-
- mechanism
|
|
580
|
-
- why_now
|
|
581
|
-
- mechanism_family
|
|
582
|
-
- change_layer: `Tier1` / `Tier2` / `Tier3`
|
|
583
|
-
- source_lens
|
|
584
|
-
- keep_unchanged
|
|
585
|
-
- expected_gain
|
|
586
|
-
- implementation_surface
|
|
587
|
-
- main_risks
|
|
588
|
-
- promote_now: yes or no
|
|
589
|
-
|
|
590
|
-
InternAgent-style behavior to preserve here:
|
|
591
|
-
|
|
592
|
-
- generate candidate methods first
|
|
593
|
-
- critique them before promotion
|
|
594
|
-
- express them as method-layer objects rather than code patches
|
|
595
|
-
- defer branch creation until the candidate is actually chosen
|
|
596
|
-
- prefer one-question-at-a-time clarification when one missing assumption would otherwise contaminate the whole brief slate
|
|
597
|
-
|
|
598
|
-
Do not require a paper-style literature hard gate inside this submode unless the quest explicitly moved back toward paper work.
|
|
599
|
-
|
|
600
|
-
## Promotion protocol
|
|
601
|
-
|
|
602
|
-
Only promote a candidate brief into a durable line when at least one of the following is true:
|
|
603
|
-
|
|
604
|
-
- it clearly dominates the nearby alternatives
|
|
605
|
-
- it is top-ranked and sufficiently distinct
|
|
606
|
-
- the user explicitly asked to pursue it
|
|
607
|
-
- the current frontier indicates the line is the strongest next move
|
|
608
|
-
|
|
609
|
-
Promotion should use:
|
|
610
|
-
|
|
611
|
-
`artifact.submit_idea(mode='create', submission_mode='line', source_candidate_id=..., ...)`
|
|
612
|
-
|
|
613
|
-
When several candidate briefs are plausible, rank them explicitly before promotion.
|
|
614
|
-
Use the integrated `candidate ranking template` section for the minimum acceptable ranking record.
|
|
615
|
-
|
|
616
|
-
Default promotion rule:
|
|
617
|
-
|
|
618
|
-
- promote only `1-3` candidate briefs into durable lines
|
|
619
|
-
- if one candidate clearly dominates, promote only that one
|
|
620
|
-
- if the frontier is still structurally uncertain, promote at most two sufficiently distinct lines
|
|
621
|
-
|
|
622
|
-
When running the `rank` submode:
|
|
623
|
-
|
|
624
|
-
- compare the current serious briefs on one explicit shared surface
|
|
625
|
-
- score or rank them with written reasons
|
|
626
|
-
- state why the winner is better now
|
|
627
|
-
- state why the main alternatives are deferred rather than erased
|
|
628
|
-
- never treat "all seem promising" as a sufficient reason to promote them all
|
|
629
|
-
|
|
630
|
-
Use a distinct promotion policy:
|
|
631
|
-
|
|
632
|
-
- default rule: each mechanism family should contribute at most one promoted line
|
|
633
|
-
- do not let one familiar family fill the whole promoted slate
|
|
634
|
-
- only override that family cap when one candidate clearly dominates the whole field
|
|
635
|
-
|
|
636
|
-
When ranking, explicitly check:
|
|
637
|
-
|
|
638
|
-
- family diversity
|
|
639
|
-
- change-layer diversity
|
|
640
|
-
- whether the brief slate is collapsing into one familiar lens
|
|
641
|
-
|
|
642
|
-
If the top briefs are all same-family, either:
|
|
643
|
-
|
|
644
|
-
- keep only the strongest one
|
|
645
|
-
- or return to `brief` for a widening pass
|
|
646
|
-
|
|
647
|
-
The output of `rank` should be promotion-ready.
|
|
648
|
-
The output of `brief` should be candidate-ready.
|
|
649
|
-
|
|
650
|
-
## Frontier protocol
|
|
651
|
-
|
|
652
|
-
At meaningful route boundaries, inspect:
|
|
653
|
-
|
|
654
|
-
- best branch
|
|
655
|
-
- best recent run
|
|
656
|
-
- stagnant branches
|
|
657
|
-
- candidate backlog
|
|
658
|
-
- possible fusion opportunities
|
|
659
|
-
- recommended mode
|
|
660
|
-
|
|
661
|
-
Prefer these route meanings:
|
|
154
|
+
At meaningful route boundaries, choose exactly one dominant route meaning:
|
|
662
155
|
|
|
663
156
|
- `explore`: widen search with fresh candidate directions
|
|
664
157
|
- `exploit`: focus on the strongest current line
|
|
@@ -666,980 +159,96 @@ Prefer these route meanings:
|
|
|
666
159
|
- `debug`: rescue a candidate or line blocked by a concrete failure mode
|
|
667
160
|
- `stop`: the current frontier is saturated or the remaining routes are not justified
|
|
668
161
|
|
|
669
|
-
|
|
670
|
-
|
|
671
|
-
Interpret frontier state with these default heuristics:
|
|
672
|
-
|
|
673
|
-
- `explore`
|
|
674
|
-
- use when no line is clearly dominant
|
|
675
|
-
- use when current lines are too similar
|
|
676
|
-
- use when the search has not yet established a strong incumbent
|
|
677
|
-
|
|
678
|
-
- `exploit`
|
|
679
|
-
- use when one line clearly leads on evidence and comparability
|
|
680
|
-
- use when smoke results already narrowed the candidate pool
|
|
681
|
-
|
|
682
|
-
- `fusion`
|
|
683
|
-
- use when at least two lines have meaningful strengths
|
|
684
|
-
- use when one line is strong but another line contributes a complementary mechanism
|
|
685
|
-
- use when the current incumbent is stagnating but the broader frontier is still promising
|
|
686
|
-
|
|
687
|
-
- `debug`
|
|
688
|
-
- use when a candidate failed for a concrete and likely fixable reason
|
|
689
|
-
- use when the candidate is still strategically valuable after the failure
|
|
162
|
+
Default heuristics:
|
|
690
163
|
|
|
691
|
-
- `
|
|
692
|
-
|
|
693
|
-
|
|
164
|
+
- choose `explore` when no line is clearly dominant or the current lines are too similar
|
|
165
|
+
- choose `exploit` when one line clearly leads on evidence and comparability
|
|
166
|
+
- choose `fusion` when at least two lines have meaningful complementary strengths
|
|
167
|
+
- choose `debug` when a strategically valuable candidate failed for a concrete and likely fixable reason
|
|
168
|
+
- choose `stop` when the frontier is saturated or the remaining routes are low-value relative to cost
|
|
694
169
|
|
|
695
|
-
|
|
696
|
-
When the frontier says `exploit`, the default optimize submode is `seed` or `loop`.
|
|
697
|
-
When the frontier says `fusion`, the default optimize submode is `fusion`.
|
|
698
|
-
When a candidate failure dominates the next move, the default optimize submode is `debug` even if the frontier does not yet say so explicitly.
|
|
699
|
-
|
|
700
|
-
## Seed protocol
|
|
701
|
-
|
|
702
|
-
Use `seed` after a durable line exists and before a broad execution loop begins.
|
|
703
|
-
|
|
704
|
-
The goal is not to launch a full run immediately.
|
|
705
|
-
The goal is to generate a small within-line candidate pool that can be smoke-tested and triaged.
|
|
706
|
-
|
|
707
|
-
When running `seed`:
|
|
708
|
-
|
|
709
|
-
- generate only `2-3` implementation-level candidates by default
|
|
710
|
-
- make each candidate meaningfully different in mechanism, implementation path, or risk profile
|
|
711
|
-
- prefer plan-first candidates over immediate large edits
|
|
712
|
-
- record each candidate as `report_type='optimization_candidate'`
|
|
713
|
-
- define which candidates enter smoke first
|
|
714
|
-
- for a newly promoted line, keep at least one `simple-first` candidate in the initial seed batch
|
|
715
|
-
- do not start a fresh line with ensemble stacking, broad HPO, or a heavy multi-stage pipeline unless durable evidence already proves the simple route is insufficient
|
|
716
|
-
|
|
717
|
-
For each seed candidate, record at least:
|
|
718
|
-
|
|
719
|
-
- candidate_id
|
|
720
|
-
- parent line
|
|
721
|
-
- strategy
|
|
722
|
-
- mechanism_family
|
|
723
|
-
- change_layer
|
|
724
|
-
- change_plan
|
|
725
|
-
- expected_gain
|
|
726
|
-
- keep_unchanged
|
|
727
|
-
- first validation step
|
|
728
|
-
- archive condition
|
|
729
|
-
|
|
730
|
-
MLEvolve-style behavior to preserve here:
|
|
731
|
-
|
|
732
|
-
- one durable line may produce multiple candidate attempts
|
|
733
|
-
- candidate generation is bounded
|
|
734
|
-
- smoke comes before full evaluation unless the task is explicitly `fast-check` and direct quick validation is cheaper and equally informative
|
|
735
|
-
|
|
736
|
-
Use a validation-cost-aware seed policy:
|
|
737
|
-
|
|
738
|
-
- `fast-check`: the first objective smoke signal is likely under about `20` minutes
|
|
739
|
-
- `slow-check`: the first objective smoke signal is likely over about `20` minutes or expensive enough that broad probing is wasteful
|
|
740
|
-
|
|
741
|
-
For `fast-check` seed work:
|
|
742
|
-
|
|
743
|
-
- widen a bit more aggressively inside the line
|
|
744
|
-
- a seed batch of `3-5` candidates can be justified when they are genuinely differentiated
|
|
745
|
-
- prefer multiple orthogonal quick tests over one over-discussed candidate
|
|
746
|
-
- a separate smoke stage is optional; direct submission into quick parallel validation is acceptable when the first check is already cheap
|
|
747
|
-
- only skip smoke when the parallel quick validations are expected to produce distinguishable conclusions rather than repeated near-duplicate outcomes
|
|
748
|
-
|
|
749
|
-
For `slow-check` seed work:
|
|
750
|
-
|
|
751
|
-
- keep the initial seed batch tighter, usually `1-2` candidates and rarely `3`
|
|
752
|
-
- insist on a stronger reason for every candidate entering smoke
|
|
753
|
-
- prefer one dominant hypothesis plus one hedge candidate over a broad exploratory pool
|
|
754
|
-
- do not spend long runs to discover that the brief itself was weak
|
|
755
|
-
|
|
756
|
-
Do not keep a live implementation pool dominated by the same mechanism family.
|
|
757
|
-
Default active-pool rule:
|
|
758
|
-
|
|
759
|
-
- at most `1-2` live candidates from the same family
|
|
760
|
-
- if one family already fills the live pool, new same-family candidates do not enter smoke by default
|
|
761
|
-
|
|
762
|
-
## Loop protocol
|
|
763
|
-
|
|
764
|
-
Use `loop` when a durable line and implementation-candidate pool already exist and the main need is bounded forward motion.
|
|
765
|
-
|
|
766
|
-
Before changing code in `loop`, inspect the same-line local attempt memory for the current line.
|
|
767
|
-
Treat recent sibling attempts on the same line as the first memory surface, ahead of broader quest memory.
|
|
768
|
-
|
|
769
|
-
When running `loop`, choose one primary action:
|
|
770
|
-
|
|
771
|
-
- `smoke`
|
|
772
|
-
- `promote_to_full_eval`
|
|
773
|
-
- `archive`
|
|
774
|
-
- `record_main_result`
|
|
775
|
-
- `switch_to_fusion`
|
|
776
|
-
- `switch_to_debug`
|
|
777
|
-
- `stop`
|
|
778
|
-
|
|
779
|
-
Every loop pass should end with:
|
|
780
|
-
|
|
781
|
-
- one updated candidate status
|
|
782
|
-
- one updated next action
|
|
783
|
-
- one frontier review trigger
|
|
784
|
-
|
|
785
|
-
Do not leave the line with several half-started directions and no dominant next move.
|
|
786
|
-
|
|
787
|
-
Default exploit rule: one atomic improvement per pass.
|
|
788
|
-
Do not bundle several unrelated changes into one exploit candidate unless:
|
|
789
|
-
|
|
790
|
-
- the changes are one tightly coupled design package
|
|
791
|
-
- or the pass is explicitly a fusion route
|
|
792
|
-
|
|
793
|
-
MLEvolve-style behavior to preserve here:
|
|
794
|
-
|
|
795
|
-
- bounded parallelism
|
|
796
|
-
- small live candidate pool
|
|
797
|
-
- explicit move from draft -> smoke -> full eval -> archive or result
|
|
798
|
-
- measured frontier review after real evidence
|
|
799
|
-
|
|
800
|
-
Use a validation-cost-aware loop policy:
|
|
801
|
-
|
|
802
|
-
- for `fast-check` tasks, it is acceptable to run more quick, different tests before converging
|
|
803
|
-
- for `fast-check` tasks, direct quick validation may replace a separate smoke stage if that saves time without losing decision quality
|
|
804
|
-
- for `slow-check` tasks, use fewer but sharper passes, and require objective gain before widening or evolving further
|
|
805
|
-
- if the validation loop is slow, do not keep paying for frontier uncertainty that could have been reduced in `brief`
|
|
806
|
-
- if the validation loop is fast, prefer resolving uncertainty with evidence instead of over-arguing in chat
|
|
807
|
-
|
|
808
|
-
Use a branch/family diversity cap during exploitation:
|
|
809
|
-
|
|
810
|
-
- do not keep selecting only the locally familiar family because it is easiest to elaborate
|
|
811
|
-
- when several strong candidates are close, prefer the one that preserves frontier diversity
|
|
812
|
-
- if one branch or family already dominates recent attempts, require stronger evidence before selecting another near-duplicate attempt
|
|
813
|
-
|
|
814
|
-
## Memory protocol
|
|
815
|
-
|
|
816
|
-
Before broad new search, run at least one `memory.search(...)` using:
|
|
817
|
-
|
|
818
|
-
- the current task name
|
|
819
|
-
- the active idea id
|
|
820
|
-
- a method keyword
|
|
821
|
-
- the most recent failure mode or successful mechanism
|
|
822
|
-
|
|
823
|
-
When the search appears too narrow, also retrieve one of:
|
|
824
|
-
|
|
825
|
-
- a similar failure pattern
|
|
826
|
-
- an orthogonal success pattern
|
|
827
|
-
- a deliberately dissimilar but high-value prior attempt
|
|
828
|
-
|
|
829
|
-
For `seed`, `loop`, and `debug`, also inspect the same-line local attempt memory from the current leading line before widening to broader quest memory.
|
|
830
|
-
|
|
831
|
-
Write at least one quest memory card when you learn something reusable, such as:
|
|
832
|
-
|
|
833
|
-
- a successful optimization pattern
|
|
834
|
-
- a repeated failure pattern
|
|
835
|
-
- a fusion lesson
|
|
836
|
-
- a reason a candidate should not be retried
|
|
837
|
-
|
|
838
|
-
Use the integrated `optimization memory template` section for the minimum acceptable memory-card shape.
|
|
839
|
-
|
|
840
|
-
Do not write generic "we tried some optimization" memory cards.
|
|
841
|
-
Each card should be retrieval-friendly and decision-relevant.
|
|
842
|
-
|
|
843
|
-
## Artifact protocol
|
|
170
|
+
## Non-negotiable rules
|
|
844
171
|
|
|
845
|
-
|
|
172
|
+
- Keep all major optimization successes and failures durable through artifacts and memory.
|
|
173
|
+
- Do not convert ranking uncertainty into premature branch creation.
|
|
174
|
+
- Do not treat an implementation-level candidate report as a new durable optimization line.
|
|
175
|
+
- Before broad new search, inspect recent optimization memory and the same-line local attempt memory when relevant.
|
|
176
|
+
- If the same line stalls repeatedly, switch route instead of pretending more of the same is new evidence.
|
|
177
|
+
- Plateau is a route signal, not a reason to keep issuing tiny tweaks.
|
|
178
|
+
|
|
179
|
+
## Operational guidance
|
|
180
|
+
|
|
181
|
+
The main skill keeps the control surface in front.
|
|
182
|
+
For the longer playbooks, templates, and protocol details, read the references:
|
|
183
|
+
|
|
184
|
+
- `references/operational-guidance.md`
|
|
185
|
+
- `references/brief-shaping-playbook.md`
|
|
186
|
+
- `references/candidate-ranking-template.md`
|
|
187
|
+
- `references/frontier-review-template.md`
|
|
188
|
+
- `references/method-brief-template.md`
|
|
189
|
+
- `references/codegen-route-playbook.md`
|
|
190
|
+
- `references/debug-response-template.md`
|
|
191
|
+
- `references/fusion-playbook.md`
|
|
192
|
+
- `references/optimization-memory-template.md`
|
|
193
|
+
- `references/optimize-checklist-template.md`
|
|
194
|
+
- `references/plateau-response-playbook.md`
|
|
195
|
+
- `references/prompt-patterns.md`
|
|
196
|
+
|
|
197
|
+
Use them when:
|
|
198
|
+
|
|
199
|
+
- the candidate brief is still fuzzy
|
|
200
|
+
- explicit ranking or promotion notes are needed
|
|
201
|
+
- the frontier route is unclear
|
|
202
|
+
- implementation-route choice, debug, fusion, or plateau handling needs the full playbook
|
|
203
|
+
- memory writing, checklist maintenance, or prompt shaping materially affect the route
|
|
846
204
|
|
|
847
|
-
|
|
848
|
-
- `artifact.submit_idea(..., submission_mode='line')` for durable promoted lines
|
|
849
|
-
- `artifact.record(payload={'kind': 'report', 'report_type': 'optimization_candidate', ...})` for within-line attempts
|
|
850
|
-
- `artifact.record(payload={'kind': 'decision', 'action': 'iterate'|'branch'|'continue'|'stop', ...})` for route changes
|
|
851
|
-
- `artifact.record_main_experiment(...)` for real measured line results
|
|
205
|
+
## Integrated reference appendix
|
|
852
206
|
|
|
853
|
-
|
|
207
|
+
Use these reference sections as needed without copying them into chat:
|
|
854
208
|
|
|
855
|
-
-
|
|
856
|
-
-
|
|
857
|
-
-
|
|
858
|
-
-
|
|
209
|
+
### optimize-checklist-template.md
|
|
210
|
+
### candidate-board-template.md
|
|
211
|
+
### method-brief-template.md
|
|
212
|
+
### brief-shaping-playbook.md
|
|
213
|
+
### candidate-ranking-template.md
|
|
214
|
+
### frontier-review-template.md
|
|
215
|
+
### optimization-memory-template.md
|
|
216
|
+
### fusion-playbook.md
|
|
217
|
+
### codegen-route-playbook.md
|
|
218
|
+
### debug-response-template.md
|
|
219
|
+
### prompt-patterns.md
|
|
220
|
+
### plateau-response-playbook.md
|
|
859
221
|
|
|
860
|
-
|
|
222
|
+
Codegen route choices should stay explicit: stepwise generation for incremental edits, diff / patch generation for contained changes, and full rewrite only when the old surface is genuinely the blocker.
|
|
223
|
+
Mandatory first-call sequence: refresh `artifact.get_optimization_frontier(...)`, recover quest state, then choose `brief`, `rank`, `seed`, `loop`, `fusion`, `debug`, or `stop`.
|
|
224
|
+
Use memory.search(...) for same-line local attempt memory before repeating a known failure or reopening stale frontier assumptions.
|
|
861
225
|
|
|
862
|
-
-
|
|
863
|
-
|
|
864
|
-
-
|
|
865
|
-
- `smoke_failed`
|
|
866
|
-
- `promoted`
|
|
867
|
-
- `full_eval_running`
|
|
868
|
-
- `succeeded`
|
|
869
|
-
- `failed`
|
|
870
|
-
- `archived`
|
|
226
|
+
Stall-recovery protocol: if a line stops improving, decide whether the issue is mechanism family, change-layer diversity, validation-cost-aware seed policy, validation-cost-aware loop policy, or execution noise.
|
|
227
|
+
Internal submode selection should preserve a coverage contract and a distinct promotion policy for each route.
|
|
228
|
+
InternAgent maps most naturally to codegen-route and execution-surface optimization; MLEvolve maps most naturally to search-loop, mutation, and validation orchestration.
|
|
871
229
|
|
|
872
|
-
|
|
230
|
+
Brief shaping should clarify the bottleneck, constraints, and comparability boundary first, then generate a small differentiated slate, usually `2-3` serious approaches.
|
|
231
|
+
Recommend one approach with explicit tradeoffs against the alternatives, and self-check the winning brief for ambiguity, overlap, and weak justification before submission.
|
|
232
|
+
recommend one approach with explicit tradeoffs against the alternatives
|
|
233
|
+
Candidate briefs should expose `why_now`.
|
|
873
234
|
|
|
874
|
-
|
|
235
|
+
For seed mode, use a validation-cost-aware seed policy: if checks are under about `20` minutes, a separate smoke stage is optional; direct submission into quick parallel validation is acceptable.
|
|
236
|
+
Only skip smoke when the parallel quick validations are expected to produce distinguishable conclusions.
|
|
237
|
+
only skip smoke when the parallel quick validations are expected to produce distinguishable conclusions
|
|
238
|
+
Use smoke test or direct quick validation according to uncertainty, and you may skip a separate smoke stage and submit several quick validations in parallel when the hypotheses are separable.
|
|
239
|
+
For loop mode, use a validation-cost-aware loop policy; if the validation loop is slow, do not keep paying for frontier uncertainty that could have been reduced in `brief`.
|
|
240
|
+
Gate evolution on clear objective signal rather than small local preference.
|
|
241
|
+
gate evolution on clear objective signal
|
|
875
242
|
|
|
876
|
-
-
|
|
877
|
-
-
|
|
878
|
-
- Do not keep rerunning the same unchanged candidate.
|
|
879
|
-
- If a candidate fails with a clear root cause, either debug it deliberately or archive it.
|
|
880
|
-
- If the same line stalls repeatedly, switch to exploit or fusion rather than pretending more of the same is new evidence.
|
|
243
|
+
Family-shift trigger: when repeated same-family edits stall, revisit the mechanism family.
|
|
244
|
+
Task-category primer: prefer simple-first changes, one atomic improvement per pass, and bugfix-only passes when the failure is localized.
|
|
881
245
|
|
|
882
|
-
|
|
246
|
+
## Exit criteria
|
|
883
247
|
|
|
884
|
-
|
|
885
|
-
2. implementation-level candidate generation
|
|
886
|
-
3. smoke test or direct quick validation
|
|
887
|
-
4. promotion to fuller evaluation when justified
|
|
888
|
-
5. durable result recording
|
|
889
|
-
6. frontier review
|
|
248
|
+
Exit `optimize` only when one of these is durably true:
|
|
890
249
|
|
|
891
|
-
|
|
892
|
-
|
|
893
|
-
-
|
|
894
|
-
- usually `2-3` live implementation candidates in smoke
|
|
895
|
-
- usually `1-2` full evaluations running at once unless the environment clearly supports more
|
|
896
|
-
|
|
897
|
-
Validation-cost-aware override:
|
|
898
|
-
|
|
899
|
-
- if first-pass validation is under about `20` minutes, it is reasonable to increase smoke breadth modestly and compare more alternatives early
|
|
900
|
-
- if first-pass validation is under about `20` minutes, you may skip a separate smoke stage and submit several quick validations in parallel
|
|
901
|
-
- only do that when the validations are likely to yield different conclusions such as clear win / tie / fail / instability, rather than redundant repeats
|
|
902
|
-
- if first-pass validation is slower than that, keep the active pool narrow and gate evolution on clear objective signal
|
|
903
|
-
- for slow validation, do not promote a candidate into heavier resource investment until smoke or pilot evidence shows a real performance improvement, stability improvement, or comparability-preserving advantage
|
|
904
|
-
|
|
905
|
-
## Code-generation route selection
|
|
906
|
-
|
|
907
|
-
Do not use the same code-generation route for every optimization step.
|
|
908
|
-
|
|
909
|
-
Prefer:
|
|
910
|
-
|
|
911
|
-
1. brief-first, no code yet
|
|
912
|
-
- when the direction is still unclear
|
|
913
|
-
- stay at candidate-brief level
|
|
914
|
-
|
|
915
|
-
2. stepwise generation
|
|
916
|
-
- for the first substantial implementation of a new durable line
|
|
917
|
-
- especially when the line touches multiple subsystems such as data processing, model design, and training/evaluation
|
|
918
|
-
|
|
919
|
-
3. diff / patch generation
|
|
920
|
-
- when a strong current implementation already exists
|
|
921
|
-
- for improve, exploit, debug, and most fusion work
|
|
922
|
-
|
|
923
|
-
4. full rewrite
|
|
924
|
-
- only when the current implementation is too broken or too structurally mismatched for diff patching to remain safe
|
|
925
|
-
|
|
926
|
-
Use the integrated `codegen route playbook` section before committing to a larger rewrite.
|
|
927
|
-
|
|
928
|
-
## Debug protocol
|
|
929
|
-
|
|
930
|
-
Use `debug` when a candidate failed but still looks strategically valuable.
|
|
931
|
-
|
|
932
|
-
`debug` is bugfix-only.
|
|
933
|
-
Do not use a debug pass to sneak in a new performance-improvement idea.
|
|
934
|
-
If the proposed change goes beyond the minimal fix and becomes a new mechanism, stop and route back to `brief` or `loop` instead.
|
|
935
|
-
|
|
936
|
-
When a candidate fails:
|
|
937
|
-
|
|
938
|
-
- classify whether the failure is structural, local, or environmental
|
|
939
|
-
- retrieve similar failure patterns from memory before changing code
|
|
940
|
-
- prefer targeted fixes over broad rewrites
|
|
941
|
-
- define the exact post-fix bounded check before editing
|
|
942
|
-
|
|
943
|
-
Good debug prompts should make these explicit:
|
|
944
|
-
|
|
945
|
-
- the concrete error
|
|
946
|
-
- the likely root cause
|
|
947
|
-
- the minimal fix
|
|
948
|
-
- what must remain unchanged
|
|
949
|
-
|
|
950
|
-
Use the integrated `debug response template` section for the minimum acceptable debug response shape.
|
|
951
|
-
|
|
952
|
-
Archive rather than debug when:
|
|
953
|
-
|
|
954
|
-
- the failure is mostly strategic rather than local
|
|
955
|
-
- the candidate no longer looks better than the nearby alternatives
|
|
956
|
-
- the fix would effectively turn it into a different candidate anyway
|
|
957
|
-
|
|
958
|
-
## Fusion protocol
|
|
959
|
-
|
|
960
|
-
Use `fusion` only when the frontier justifies cross-line combination.
|
|
961
|
-
|
|
962
|
-
Before opening a fusion candidate:
|
|
963
|
-
|
|
964
|
-
- identify the real strength of each source line
|
|
965
|
-
- identify the real weakness of each source line
|
|
966
|
-
- explain why the strengths are complementary rather than redundant
|
|
967
|
-
- define what remains unchanged for comparability
|
|
968
|
-
- define the bounded evidence that would prove the fusion was worthwhile
|
|
969
|
-
|
|
970
|
-
Use the integrated `fusion playbook` section before launching cross-line fusion.
|
|
971
|
-
|
|
972
|
-
Do not fuse:
|
|
973
|
-
|
|
974
|
-
- two lines with the same mechanism under different names
|
|
975
|
-
- two weak lines that lack a clear strength
|
|
976
|
-
- merely because multiple branches exist
|
|
977
|
-
|
|
978
|
-
If the fusion hypothesis is still underspecified, return to `brief` instead of pretending fusion is ready.
|
|
979
|
-
|
|
980
|
-
## Prompt patterns worth preserving
|
|
981
|
-
|
|
982
|
-
For candidate-brief, improve, fusion, and debug prompts, preserve these recurring structures:
|
|
983
|
-
|
|
984
|
-
- Introduction
|
|
985
|
-
- Task description
|
|
986
|
-
- Memory
|
|
987
|
-
- Previous solution or previous line
|
|
988
|
-
- Instructions
|
|
989
|
-
- assistant_prefix when a stable response lead-in reduces drift
|
|
990
|
-
- explicit response format
|
|
991
|
-
|
|
992
|
-
And preserve these recurring reasoning contracts:
|
|
993
|
-
|
|
994
|
-
- root cause first
|
|
995
|
-
- WHAT / WHY / HOW
|
|
996
|
-
- KEEP UNCHANGED
|
|
997
|
-
- explicit next action
|
|
998
|
-
|
|
999
|
-
Use the integrated `prompt patterns` section as the canonical optimization prompt crib sheet.
|
|
1000
|
-
|
|
1001
|
-
## Plateau and fusion protocol
|
|
1002
|
-
|
|
1003
|
-
Treat repeated local edits without evidence gain as a search failure mode.
|
|
1004
|
-
|
|
1005
|
-
If one line shows repeated non-improving results:
|
|
1006
|
-
|
|
1007
|
-
- stop issuing near-duplicate attempts
|
|
1008
|
-
- record the stagnation explicitly
|
|
1009
|
-
- either widen the search or fuse with another line
|
|
1010
|
-
|
|
1011
|
-
Use the integrated `fusion playbook` section before launching cross-line fusion.
|
|
1012
|
-
Use the integrated `plateau response playbook` section when deciding how to respond to repeated non-improving results.
|
|
1013
|
-
|
|
1014
|
-
Good fusion candidates usually satisfy both:
|
|
1015
|
-
|
|
1016
|
-
- each source line has at least one real strength
|
|
1017
|
-
- the strengths are complementary rather than redundant
|
|
1018
|
-
|
|
1019
|
-
Do not fuse merely because two lines both exist.
|
|
1020
|
-
|
|
1021
|
-
When a line plateaus:
|
|
1022
|
-
|
|
1023
|
-
- stop issuing near-duplicate low-information attempts
|
|
1024
|
-
- say explicitly that the line is plateauing
|
|
1025
|
-
- force one larger route change:
|
|
1026
|
-
- widen the brief slate
|
|
1027
|
-
- promote a stronger alternative
|
|
1028
|
-
- fuse
|
|
1029
|
-
- debug one blocked but valuable candidate
|
|
1030
|
-
- stop
|
|
1031
|
-
|
|
1032
|
-
Do not hide plateau under a sequence of tiny "one more tweak" loops.
|
|
1033
|
-
|
|
1034
|
-
Family-shift trigger:
|
|
1035
|
-
|
|
1036
|
-
- if recent attempts stay inside one mechanism family and there is no meaningful improvement
|
|
1037
|
-
- or if `success_patience >= 2`
|
|
1038
|
-
- or if `total_patience >= 5`
|
|
1039
|
-
- the next pass must not be another same-family Tier1 tweak
|
|
1040
|
-
- instead choose one of:
|
|
1041
|
-
- orthogonal family
|
|
1042
|
-
- Tier2 or Tier3 shift
|
|
1043
|
-
- fusion
|
|
1044
|
-
- stop
|
|
1045
|
-
|
|
1046
|
-
This is the default anti-collapse rule for optimize.
|
|
1047
|
-
|
|
1048
|
-
## Task-category primer
|
|
1049
|
-
|
|
1050
|
-
Before widening a stale frontier, classify the task briefly into one or more dominant structures:
|
|
1051
|
-
|
|
1052
|
-
- tabular
|
|
1053
|
-
- vision / spatial
|
|
1054
|
-
- sequence / language
|
|
1055
|
-
- graph / topology
|
|
1056
|
-
- systems / optimization
|
|
1057
|
-
- mixed
|
|
1058
|
-
|
|
1059
|
-
Then ask whether the current brief slate overfits one familiar method family for that task.
|
|
1060
|
-
If it does, require at least one serious candidate from a different plausible family or lens before promotion.
|
|
1061
|
-
|
|
1062
|
-
## Stall-recovery protocol
|
|
1063
|
-
|
|
1064
|
-
If the optimize stage appears to stall, diagnose the stall explicitly instead of idling.
|
|
1065
|
-
|
|
1066
|
-
Common stall classes:
|
|
1067
|
-
|
|
1068
|
-
- no frontier information
|
|
1069
|
-
- no candidate clearly worth promotion
|
|
1070
|
-
- candidate pool is too similar
|
|
1071
|
-
- repeated failures on one line
|
|
1072
|
-
- no active runs and no next action recorded
|
|
1073
|
-
|
|
1074
|
-
Preferred recovery order:
|
|
1075
|
-
|
|
1076
|
-
1. refresh the frontier
|
|
1077
|
-
2. inspect the current candidate board
|
|
1078
|
-
3. inspect recent optimization memory
|
|
1079
|
-
4. record one explicit route decision
|
|
1080
|
-
5. continue with exactly one concrete next action
|
|
1081
|
-
|
|
1082
|
-
Do not leave the stage parked without a recorded reason and a concrete reopen condition.
|
|
1083
|
-
|
|
1084
|
-
## Stage-end requirement
|
|
1085
|
-
|
|
1086
|
-
Stage-end requirement:
|
|
1087
|
-
|
|
1088
|
-
- write at least one `memory.write(...)` when the pass produced a reusable success pattern, repeated failure pattern, fusion lesson, or explicit non-retry rule
|
|
1089
|
-
- update `OPTIMIZE_CHECKLIST.md`
|
|
1090
|
-
- update `CANDIDATE_BOARD.md` when the candidate pool changed
|
|
1091
|
-
- leave one durable next action or stop condition
|
|
1092
|
-
|
|
1093
|
-
If nothing reusable was learned, record why this pass was still necessary instead of writing a fake memory card.
|
|
1094
|
-
|
|
1095
|
-
## Completion rule
|
|
1096
|
-
|
|
1097
|
-
This stage is complete only when one of these is durably true:
|
|
1098
|
-
|
|
1099
|
-
- a stronger line was promoted and the next anchor is clear
|
|
1100
|
-
- the current line produced a real measured result and the next route is recorded
|
|
1101
|
-
- the optimization frontier says stop and that stop decision is durably recorded
|
|
250
|
+
- a stronger line was promoted and the next anchor is clear
|
|
251
|
+
- the current line produced a real measured result and the next route is recorded
|
|
252
|
+
- the optimization frontier says stop and that stop decision is durably recorded
|
|
1102
253
|
|
|
1103
254
|
Do not treat one candidate creation or one smoke pass as stage completion.
|
|
1104
|
-
|
|
1105
|
-
## Integrated reference appendix
|
|
1106
|
-
|
|
1107
|
-
This appendix inlines the former `optimize/references/*.md` material so the skill remains self-contained.
|
|
1108
|
-
|
|
1109
|
-
### brief-shaping-playbook.md
|
|
1110
|
-
|
|
1111
|
-
# Brief Shaping Playbook
|
|
1112
|
-
|
|
1113
|
-
Use this reference when a candidate direction is still fuzzy and needs to become a structured, ranking-ready brief.
|
|
1114
|
-
|
|
1115
|
-
This playbook borrows the useful part of product-style brainstorming without importing a full software-spec workflow.
|
|
1116
|
-
The goal is not a long design document.
|
|
1117
|
-
The goal is a compact candidate brief that is clear enough to compare, rank, and either submit as `submission_mode='candidate'` or reject.
|
|
1118
|
-
|
|
1119
|
-
## 1. Clarify before widening
|
|
1120
|
-
|
|
1121
|
-
Before generating more variants, resolve the minimum ambiguity around:
|
|
1122
|
-
|
|
1123
|
-
- the concrete bottleneck
|
|
1124
|
-
- the evaluation or comparability boundary
|
|
1125
|
-
- the main hard constraint: data, metric, compute, latency, memory, interface, or training budget
|
|
1126
|
-
- the current incumbent or baseline that this brief must beat or complement
|
|
1127
|
-
|
|
1128
|
-
If one unknown would materially change every candidate, clarify it first instead of generating a noisy slate.
|
|
1129
|
-
Prefer one question at a time when clarification is genuinely needed.
|
|
1130
|
-
If the answer is already available from durable state, use that instead of asking.
|
|
1131
|
-
|
|
1132
|
-
## 2. Generate a small differentiated slate
|
|
1133
|
-
|
|
1134
|
-
Default target: `2-3` serious approaches.
|
|
1135
|
-
|
|
1136
|
-
The slate should usually include:
|
|
1137
|
-
|
|
1138
|
-
- one incumbent-deepening refinement
|
|
1139
|
-
- one orthogonal mechanism
|
|
1140
|
-
- one broader shift candidate when justified
|
|
1141
|
-
|
|
1142
|
-
Do not produce several renamed variants of the same mechanism family.
|
|
1143
|
-
If two variants differ only by parameter choice or patch detail, keep only the sharper one.
|
|
1144
|
-
|
|
1145
|
-
For each candidate, write:
|
|
1146
|
-
|
|
1147
|
-
- bottleneck
|
|
1148
|
-
- why_current_line_is_limited
|
|
1149
|
-
- mechanism
|
|
1150
|
-
- why_now
|
|
1151
|
-
- keep_unchanged
|
|
1152
|
-
- expected_gain
|
|
1153
|
-
- main_risks
|
|
1154
|
-
|
|
1155
|
-
## 3. Compare on one shared surface
|
|
1156
|
-
|
|
1157
|
-
Before recommending a winner, compare the serious candidates on the same dimensions:
|
|
1158
|
-
|
|
1159
|
-
- expected upside
|
|
1160
|
-
- comparability safety
|
|
1161
|
-
- implementation surface
|
|
1162
|
-
- mechanism distinctness
|
|
1163
|
-
- failure risk
|
|
1164
|
-
- reason this route is better now than the nearby alternatives
|
|
1165
|
-
|
|
1166
|
-
Do not let each candidate justify itself with a different scoring story.
|
|
1167
|
-
Use one comparison surface so ranking is auditable.
|
|
1168
|
-
|
|
1169
|
-
## 4. Recommend exactly one lead brief
|
|
1170
|
-
|
|
1171
|
-
After comparison, recommend one lead brief and explain:
|
|
1172
|
-
|
|
1173
|
-
- why it is the best next move now
|
|
1174
|
-
- why the main alternatives are deferred instead of promoted
|
|
1175
|
-
- what evidence would quickly disconfirm the lead brief
|
|
1176
|
-
|
|
1177
|
-
Do not say "all are promising" and promote everything.
|
|
1178
|
-
If the slate is still too close to call, return to widening once or narrow the slate further.
|
|
1179
|
-
|
|
1180
|
-
## 5. Self-check before submission
|
|
1181
|
-
|
|
1182
|
-
Before calling `artifact.submit_idea(..., submission_mode='candidate', ...)`, check:
|
|
1183
|
-
|
|
1184
|
-
- Is the bottleneck concrete rather than generic?
|
|
1185
|
-
- Does `why_current_line_is_limited` explain a real gap instead of restating the mechanism?
|
|
1186
|
-
- Does `why_now` explain what changed in evidence, failure pattern, or frontier state?
|
|
1187
|
-
- Is the comparability boundary explicit?
|
|
1188
|
-
- Is the recommendation based on tradeoffs rather than implementation convenience?
|
|
1189
|
-
- Would the brief still make sense if handed to another agent with no chat context?
|
|
1190
|
-
|
|
1191
|
-
If any answer is no, refine the brief before submission.
|
|
1192
|
-
|
|
1193
|
-
## 6. Output shape
|
|
1194
|
-
|
|
1195
|
-
A good final brief package is short and structured:
|
|
1196
|
-
|
|
1197
|
-
1. brief title
|
|
1198
|
-
2. one-paragraph bottleneck and constraint summary
|
|
1199
|
-
3. a `2-3` candidate comparison table or bullet slate
|
|
1200
|
-
4. recommended brief with tradeoff summary
|
|
1201
|
-
5. self-check outcome
|
|
1202
|
-
6. fields ready for the integrated `method-brief-template.md` section
|
|
1203
|
-
|
|
1204
|
-
Keep it compact.
|
|
1205
|
-
This is a shaping pass for optimization candidates, not a paper draft or engineering spec.
|
|
1206
|
-
|
|
1207
|
-
### candidate-board-template.md
|
|
1208
|
-
|
|
1209
|
-
# CANDIDATE_BOARD.md
|
|
1210
|
-
|
|
1211
|
-
| Candidate ID | Level | Parent | Strategy | Status | Expected Gain | Observed Result | Promote / Archive |
|
|
1212
|
-
| --- | --- | --- | --- | --- | --- | --- | --- |
|
|
1213
|
-
| cand-001 | brief | current-head | explore | proposed | Better tail accuracy | n/a | pending |
|
|
1214
|
-
| cand-002 | impl | cand-001 | exploit | smoke_passed | Faster convergence | smoke ok | consider promote |
|
|
1215
|
-
|
|
1216
|
-
Notes:
|
|
1217
|
-
|
|
1218
|
-
- `Level` should be `brief` or `implementation`
|
|
1219
|
-
- `Parent` may be a branch, idea id, run id, or candidate id
|
|
1220
|
-
- `Strategy` should usually be one of `explore`, `exploit`, `fusion`, `debug`
|
|
1221
|
-
- `Promote / Archive` should be a clear recommendation, not an empty placeholder
|
|
1222
|
-
|
|
1223
|
-
### candidate-ranking-template.md
|
|
1224
|
-
|
|
1225
|
-
# Candidate Ranking Template
|
|
1226
|
-
|
|
1227
|
-
## Candidate Set
|
|
1228
|
-
|
|
1229
|
-
- Candidate IDs:
|
|
1230
|
-
- Ranking scope:
|
|
1231
|
-
- Comparison surface:
|
|
1232
|
-
|
|
1233
|
-
## Criteria
|
|
1234
|
-
|
|
1235
|
-
- expected information gain
|
|
1236
|
-
- feasibility in current repo
|
|
1237
|
-
- comparability against baseline
|
|
1238
|
-
- implementation surface
|
|
1239
|
-
- likely novelty or distinctiveness
|
|
1240
|
-
- risk of redundant overlap
|
|
1241
|
-
- incumbent-improvement potential
|
|
1242
|
-
- distinctness from other candidates
|
|
1243
|
-
- mechanism-family diversity
|
|
1244
|
-
- change-layer diversity
|
|
1245
|
-
|
|
1246
|
-
## Ranked Candidates
|
|
1247
|
-
|
|
1248
|
-
1. `candidate_id`
|
|
1249
|
-
Score summary:
|
|
1250
|
-
Why it ranks here:
|
|
1251
|
-
Promote / hold / reject:
|
|
1252
|
-
|
|
1253
|
-
2. `candidate_id`
|
|
1254
|
-
Score summary:
|
|
1255
|
-
Why it ranks here:
|
|
1256
|
-
Promote / hold / reject:
|
|
1257
|
-
|
|
1258
|
-
3. `candidate_id`
|
|
1259
|
-
Score summary:
|
|
1260
|
-
Why it ranks here:
|
|
1261
|
-
Promote / hold / reject:
|
|
1262
|
-
|
|
1263
|
-
## Winner Justification
|
|
1264
|
-
|
|
1265
|
-
Why the selected candidate should become a durable line now.
|
|
1266
|
-
|
|
1267
|
-
## Non-Winner Notes
|
|
1268
|
-
|
|
1269
|
-
Why the other candidates were deferred, fused, or rejected.
|
|
1270
|
-
|
|
1271
|
-
## Promotion Cap
|
|
1272
|
-
|
|
1273
|
-
- how many candidates should be promoted now:
|
|
1274
|
-
- why more promotion would dilute the frontier:
|
|
1275
|
-
- same-family cap override justification:
|
|
1276
|
-
|
|
1277
|
-
### codegen-route-playbook.md
|
|
1278
|
-
|
|
1279
|
-
# Codegen Route Playbook
|
|
1280
|
-
|
|
1281
|
-
Choose the code-generation route deliberately.
|
|
1282
|
-
|
|
1283
|
-
## Use brief-only
|
|
1284
|
-
|
|
1285
|
-
Use no-code candidate briefs when:
|
|
1286
|
-
|
|
1287
|
-
- the direction is still underspecified
|
|
1288
|
-
- multiple distinct directions still need ranking
|
|
1289
|
-
- a new line should not be promoted yet
|
|
1290
|
-
|
|
1291
|
-
## Use stepwise generation
|
|
1292
|
-
|
|
1293
|
-
Prefer stepwise generation when:
|
|
1294
|
-
|
|
1295
|
-
- a new durable line is being implemented for the first time
|
|
1296
|
-
- the change spans data processing, model design, and training/evaluation
|
|
1297
|
-
- a modular decomposition will reduce large integrated errors
|
|
1298
|
-
- a plan -> refine -> implement sequence is safer than one monolithic edit
|
|
1299
|
-
|
|
1300
|
-
## Use diff / patch generation
|
|
1301
|
-
|
|
1302
|
-
Prefer diff / patch generation when:
|
|
1303
|
-
|
|
1304
|
-
- a strong current implementation already exists
|
|
1305
|
-
- the current change is local enough to preserve most of the line
|
|
1306
|
-
- the task is improve, exploit, debug, or most fusion work
|
|
1307
|
-
- the desired change can be described as a bounded delta from the current solution
|
|
1308
|
-
|
|
1309
|
-
## Use full rewrite
|
|
1310
|
-
|
|
1311
|
-
Use a full rewrite only when:
|
|
1312
|
-
|
|
1313
|
-
- the existing implementation is structurally broken
|
|
1314
|
-
- the desired architecture no longer matches the current codebase shape
|
|
1315
|
-
- diff patching would be more fragile than replacement
|
|
1316
|
-
|
|
1317
|
-
Do not jump to a rewrite merely because one local patch failed.
|
|
1318
|
-
|
|
1319
|
-
## Response shape
|
|
1320
|
-
|
|
1321
|
-
For non-trivial codegen work, prefer this shape:
|
|
1322
|
-
|
|
1323
|
-
1. short plan
|
|
1324
|
-
2. bounded implementation surface
|
|
1325
|
-
3. keep-unchanged contract
|
|
1326
|
-
4. validation step
|
|
1327
|
-
|
|
1328
|
-
Do not go from a vague idea directly into a large patch with no intermediate plan.
|
|
1329
|
-
|
|
1330
|
-
### debug-response-template.md
|
|
1331
|
-
|
|
1332
|
-
# Debug Response Template
|
|
1333
|
-
|
|
1334
|
-
## Error
|
|
1335
|
-
|
|
1336
|
-
What concrete error or failure occurred?
|
|
1337
|
-
|
|
1338
|
-
## Retrieved Memory
|
|
1339
|
-
|
|
1340
|
-
What similar failure pattern or repair lesson should be reused before changing code?
|
|
1341
|
-
|
|
1342
|
-
## Root Cause
|
|
1343
|
-
|
|
1344
|
-
What is the most likely underlying cause?
|
|
1345
|
-
|
|
1346
|
-
## Minimal Fix
|
|
1347
|
-
|
|
1348
|
-
What is the smallest plausible fix?
|
|
1349
|
-
|
|
1350
|
-
## Keep Unchanged
|
|
1351
|
-
|
|
1352
|
-
What parts of the line must remain unchanged for comparability and stability?
|
|
1353
|
-
|
|
1354
|
-
## Next Check
|
|
1355
|
-
|
|
1356
|
-
What bounded smoke or validation check should confirm the fix?
|
|
1357
|
-
|
|
1358
|
-
## Archive Threshold
|
|
1359
|
-
|
|
1360
|
-
What outcome would prove this candidate should be archived instead of debugged again?
|
|
1361
|
-
|
|
1362
|
-
### frontier-review-template.md
|
|
1363
|
-
|
|
1364
|
-
# Frontier Review Template
|
|
1365
|
-
|
|
1366
|
-
## Current Frontier
|
|
1367
|
-
|
|
1368
|
-
- mode:
|
|
1369
|
-
- best branch:
|
|
1370
|
-
- best run:
|
|
1371
|
-
- stagnant branches:
|
|
1372
|
-
- candidate backlog:
|
|
1373
|
-
- fusion candidates:
|
|
1374
|
-
|
|
1375
|
-
## Evidence Summary
|
|
1376
|
-
|
|
1377
|
-
- strongest support:
|
|
1378
|
-
- strongest contradiction:
|
|
1379
|
-
- biggest unresolved risk:
|
|
1380
|
-
|
|
1381
|
-
## Route Choice
|
|
1382
|
-
|
|
1383
|
-
- explore / exploit / fusion / debug / stop:
|
|
1384
|
-
- why this is the best next move:
|
|
1385
|
-
|
|
1386
|
-
## Active Optimize Submode
|
|
1387
|
-
|
|
1388
|
-
- brief / rank / seed / loop / fusion / debug:
|
|
1389
|
-
- why this submode is dominant now:
|
|
1390
|
-
|
|
1391
|
-
## Immediate Next Action
|
|
1392
|
-
|
|
1393
|
-
- exact next step:
|
|
1394
|
-
- what result will trigger another frontier review:
|
|
1395
|
-
- what result would force a different mode:
|
|
1396
|
-
|
|
1397
|
-
### fusion-playbook.md
|
|
1398
|
-
|
|
1399
|
-
# Fusion Playbook
|
|
1400
|
-
|
|
1401
|
-
Use fusion only when:
|
|
1402
|
-
|
|
1403
|
-
- at least two lines have real strengths
|
|
1404
|
-
- the strengths are complementary
|
|
1405
|
-
- one line alone is no longer improving fast enough
|
|
1406
|
-
|
|
1407
|
-
Before fusion, write down:
|
|
1408
|
-
|
|
1409
|
-
- source line A:
|
|
1410
|
-
strongest mechanism:
|
|
1411
|
-
strongest evidence:
|
|
1412
|
-
main weakness:
|
|
1413
|
-
what must survive the fusion:
|
|
1414
|
-
|
|
1415
|
-
- source line B:
|
|
1416
|
-
strongest mechanism:
|
|
1417
|
-
strongest evidence:
|
|
1418
|
-
main weakness:
|
|
1419
|
-
what must survive the fusion:
|
|
1420
|
-
|
|
1421
|
-
Then answer:
|
|
1422
|
-
|
|
1423
|
-
- what exactly is being fused?
|
|
1424
|
-
- why does this combination address a real bottleneck?
|
|
1425
|
-
- why are the source strengths complementary rather than redundant?
|
|
1426
|
-
- what remains unchanged for comparability?
|
|
1427
|
-
- what evidence would prove the fusion was worth it?
|
|
1428
|
-
- what bounded first validation step should run before any broad rollout?
|
|
1429
|
-
|
|
1430
|
-
Do not fuse:
|
|
1431
|
-
|
|
1432
|
-
- two lines with the same mechanism under different names
|
|
1433
|
-
- two weak lines with no clear strengths
|
|
1434
|
-
- merely because multiple branches exist
|
|
1435
|
-
|
|
1436
|
-
### method-brief-template.md
|
|
1437
|
-
|
|
1438
|
-
# Method Brief Template
|
|
1439
|
-
|
|
1440
|
-
## Title
|
|
1441
|
-
|
|
1442
|
-
One short line naming the candidate direction.
|
|
1443
|
-
|
|
1444
|
-
## Bottleneck
|
|
1445
|
-
|
|
1446
|
-
What concrete bottleneck or limitation does this target?
|
|
1447
|
-
|
|
1448
|
-
## Why Current Line Is Limited
|
|
1449
|
-
|
|
1450
|
-
Why is the current best line or baseline not already solving this?
|
|
1451
|
-
|
|
1452
|
-
## Mechanism
|
|
1453
|
-
|
|
1454
|
-
What specific intervention or design change is proposed?
|
|
1455
|
-
|
|
1456
|
-
## Mechanism Family
|
|
1457
|
-
|
|
1458
|
-
Name the family explicitly, for example `adapter`, `loss`, `architecture`, `augmentation`, `ensemble`, `retrieval`, `objective-shift`.
|
|
1459
|
-
|
|
1460
|
-
## Change Layer
|
|
1461
|
-
|
|
1462
|
-
One of:
|
|
1463
|
-
|
|
1464
|
-
- `Tier1`: local optimization / training detail
|
|
1465
|
-
- `Tier2`: representation or component change
|
|
1466
|
-
- `Tier3`: paradigm or system-level shift
|
|
1467
|
-
|
|
1468
|
-
## Source Lens
|
|
1469
|
-
|
|
1470
|
-
Where did this candidate come from?
|
|
1471
|
-
|
|
1472
|
-
- baseline_refinement
|
|
1473
|
-
- orthogonal_mechanism
|
|
1474
|
-
- failure_repair
|
|
1475
|
-
- cross_domain_transfer
|
|
1476
|
-
- objective_shift
|
|
1477
|
-
- search_widening
|
|
1478
|
-
|
|
1479
|
-
## Keep Unchanged
|
|
1480
|
-
|
|
1481
|
-
What must remain stable for comparability?
|
|
1482
|
-
|
|
1483
|
-
## Expected Gain
|
|
1484
|
-
|
|
1485
|
-
What evidence should improve if this works?
|
|
1486
|
-
|
|
1487
|
-
## Implementation Surface
|
|
1488
|
-
|
|
1489
|
-
- main files or modules likely involved:
|
|
1490
|
-
- likely change scope: local / moderate / broad
|
|
1491
|
-
|
|
1492
|
-
## Risks
|
|
1493
|
-
|
|
1494
|
-
- Main failure mode
|
|
1495
|
-
- Comparability risk
|
|
1496
|
-
- Implementation risk
|
|
1497
|
-
|
|
1498
|
-
## Foundation
|
|
1499
|
-
|
|
1500
|
-
- Source branch / run / baseline:
|
|
1501
|
-
- Why this foundation is the right starting point:
|
|
1502
|
-
|
|
1503
|
-
## Promote Now
|
|
1504
|
-
|
|
1505
|
-
- yes / no
|
|
1506
|
-
- why:
|
|
1507
|
-
|
|
1508
|
-
## Next Target
|
|
1509
|
-
|
|
1510
|
-
Usually `optimize` or `experiment`.
|
|
1511
|
-
|
|
1512
|
-
### optimization-memory-template.md
|
|
1513
|
-
|
|
1514
|
-
# Optimization Memory Template
|
|
1515
|
-
|
|
1516
|
-
## Type
|
|
1517
|
-
|
|
1518
|
-
- success pattern / failure pattern / fusion lesson
|
|
1519
|
-
|
|
1520
|
-
## Context
|
|
1521
|
-
|
|
1522
|
-
- task:
|
|
1523
|
-
- branch or idea:
|
|
1524
|
-
- candidate id:
|
|
1525
|
-
- strategy:
|
|
1526
|
-
|
|
1527
|
-
## Observation
|
|
1528
|
-
|
|
1529
|
-
What actually happened?
|
|
1530
|
-
|
|
1531
|
-
## Why It Matters
|
|
1532
|
-
|
|
1533
|
-
Why should a later optimization pass retrieve this?
|
|
1534
|
-
|
|
1535
|
-
## Retrieval Hint
|
|
1536
|
-
|
|
1537
|
-
- query keywords:
|
|
1538
|
-
- closest line or mechanism family:
|
|
1539
|
-
- when this should be recalled first:
|
|
1540
|
-
|
|
1541
|
-
## Reuse Hint
|
|
1542
|
-
|
|
1543
|
-
When should this lesson be reused, and when should it be avoided?
|
|
1544
|
-
|
|
1545
|
-
### optimize-checklist-template.md
|
|
1546
|
-
|
|
1547
|
-
# OPTIMIZE_CHECKLIST.md
|
|
1548
|
-
|
|
1549
|
-
- [ ] Read `artifact.get_optimization_frontier(...)` or equivalent durable frontier summary
|
|
1550
|
-
- [ ] Select the primary optimize submode: `brief`, `rank`, `seed`, `loop`, `fusion`, or `debug`
|
|
1551
|
-
- [ ] Confirm whether the current pass is `explore`, `exploit`, `fusion`, `debug`, or `stop`
|
|
1552
|
-
- [ ] Review recent optimization memory before generating new candidates
|
|
1553
|
-
- [ ] Check whether the current brief slate covers more than one mechanism family
|
|
1554
|
-
- [ ] Candidate briefs updated or confirmed
|
|
1555
|
-
- [ ] Candidate ranking updated
|
|
1556
|
-
- [ ] Promote only the strongest brief(s) into durable line(s) if justified
|
|
1557
|
-
- [ ] Current implementation candidate pool recorded
|
|
1558
|
-
- [ ] Smoke queue defined
|
|
1559
|
-
- [ ] Full-eval queue defined
|
|
1560
|
-
- [ ] Recent failures classified and either debugged or archived
|
|
1561
|
-
- [ ] Stagnation check performed
|
|
1562
|
-
- [ ] Family-shift trigger checked
|
|
1563
|
-
- [ ] Fusion eligibility checked
|
|
1564
|
-
- [ ] Next concrete action written
|
|
1565
|
-
|
|
1566
|
-
### plateau-response-playbook.md
|
|
1567
|
-
|
|
1568
|
-
# Plateau Response Playbook
|
|
1569
|
-
|
|
1570
|
-
Use this when one line keeps producing non-improving results.
|
|
1571
|
-
|
|
1572
|
-
## Plateau indicators
|
|
1573
|
-
|
|
1574
|
-
- repeated non-improving results on the same line
|
|
1575
|
-
- repeated "small tweak" proposals with no structural change
|
|
1576
|
-
- candidate queue filled with near-duplicate mechanisms
|
|
1577
|
-
|
|
1578
|
-
## Required response
|
|
1579
|
-
|
|
1580
|
-
1. state that the line is plateauing
|
|
1581
|
-
2. identify the most likely root cause of the plateau
|
|
1582
|
-
3. choose one of:
|
|
1583
|
-
- widen search
|
|
1584
|
-
- promote a stronger alternative
|
|
1585
|
-
- fuse with another line
|
|
1586
|
-
- debug a strategically valuable blocked candidate
|
|
1587
|
-
- stop the line
|
|
1588
|
-
4. record one explicit non-repeat rule so the next pass does not retry the same low-information move
|
|
1589
|
-
|
|
1590
|
-
## Do not do
|
|
1591
|
-
|
|
1592
|
-
- keep proposing near-identical local tweaks
|
|
1593
|
-
- rerun the same unchanged candidate
|
|
1594
|
-
- fuse without a clear complementary mechanism
|
|
1595
|
-
- hide a plateau under a sequence of tiny "one more tweak" edits
|
|
1596
|
-
|
|
1597
|
-
### prompt-patterns.md
|
|
1598
|
-
|
|
1599
|
-
# Optimization Prompt Patterns
|
|
1600
|
-
|
|
1601
|
-
These prompt structures are worth preserving across optimize subroutines.
|
|
1602
|
-
|
|
1603
|
-
## Common skeleton
|
|
1604
|
-
|
|
1605
|
-
- Introduction
|
|
1606
|
-
- Task description
|
|
1607
|
-
- Memory
|
|
1608
|
-
- Previous solution or previous line
|
|
1609
|
-
- Instructions
|
|
1610
|
-
- assistant_prefix when a stable response lead-in reduces drift
|
|
1611
|
-
- Explicit response format
|
|
1612
|
-
|
|
1613
|
-
## Common reasoning contract
|
|
1614
|
-
|
|
1615
|
-
- WHAT is changing?
|
|
1616
|
-
- WHY is the current line limited?
|
|
1617
|
-
- HOW should the change address the limitation?
|
|
1618
|
-
- KEEP UNCHANGED: what must remain stable for comparability?
|
|
1619
|
-
- NEXT ACTION: what concrete step follows this prompt?
|
|
1620
|
-
|
|
1621
|
-
## Plateau pattern
|
|
1622
|
-
|
|
1623
|
-
When the line is stagnating:
|
|
1624
|
-
|
|
1625
|
-
- explicitly state that the current approach has plateaued
|
|
1626
|
-
- forbid trivial hyperparameter-only tweaks when a deeper change is needed
|
|
1627
|
-
- require a larger representational or architectural shift
|
|
1628
|
-
|
|
1629
|
-
## Fusion pattern
|
|
1630
|
-
|
|
1631
|
-
When combining lines:
|
|
1632
|
-
|
|
1633
|
-
- identify the real strength of each source line
|
|
1634
|
-
- explain why those strengths are complementary
|
|
1635
|
-
- avoid combining everything
|
|
1636
|
-
- preserve the comparison surface
|
|
1637
|
-
|
|
1638
|
-
## Debug pattern
|
|
1639
|
-
|
|
1640
|
-
For debugging:
|
|
1641
|
-
|
|
1642
|
-
- restate the concrete error
|
|
1643
|
-
- state the likely root cause
|
|
1644
|
-
- require the minimal targeted fix
|
|
1645
|
-
- preserve the original solution intent unless the bug proves the design invalid
|