@researai/deepscientist 1.5.17 → 1.6.0
This diff shows the changes between two publicly released versions of this package as they appear in their public registry. It is provided for informational purposes only.
- package/AGENTS.md +309 -130
- package/AISB/catalog/aisb.b1.agentic_coding.yaml +244 -0
- package/AISB/catalog/aisb.b10.climate_earth.yaml +235 -0
- package/AISB/catalog/aisb.b11.model_efficiency.yaml +231 -0
- package/AISB/catalog/aisb.b12.embodied_ai.yaml +238 -0
- package/AISB/catalog/aisb.b2.agent_systems.yaml +229 -0
- package/AISB/catalog/aisb.b3.self_evolving_rl.yaml +237 -0
- package/AISB/catalog/aisb.b4.lm_reasoning.yaml +240 -0
- package/AISB/catalog/aisb.b5.math_proof.yaml +235 -0
- package/AISB/catalog/aisb.b6.research_process.yaml +243 -0
- package/AISB/catalog/aisb.b7.multimodal_fusion.yaml +232 -0
- package/AISB/catalog/aisb.b8.lifesci_drug.yaml +275 -0
- package/AISB/catalog/aisb.b9.material_science.yaml +237 -0
- package/AISB/catalog/aisb.t3.001_savvy.yaml +159 -0
- package/AISB/catalog/aisb.t3.001_savvy.zh.yaml +121 -0
- package/AISB/catalog/aisb.t3.002_pinet.yaml +189 -0
- package/AISB/catalog/aisb.t3.002_pinet.zh.yaml +130 -0
- package/AISB/catalog/aisb.t3.004_decentralattn.yaml +184 -0
- package/AISB/catalog/aisb.t3.004_decentralattn.zh.yaml +153 -0
- package/AISB/catalog/aisb.t3.005_tsae.yaml +193 -0
- package/AISB/catalog/aisb.t3.005_tsae.zh.yaml +139 -0
- package/AISB/catalog/aisb.t3.006_physense.yaml +194 -0
- package/AISB/catalog/aisb.t3.006_physense.zh.yaml +118 -0
- package/AISB/catalog/aisb.t3.007_reasoningiqa.yaml +169 -0
- package/AISB/catalog/aisb.t3.007_reasoningiqa.zh.yaml +133 -0
- package/AISB/catalog/aisb.t3.008_meanflows.yaml +188 -0
- package/AISB/catalog/aisb.t3.008_meanflows.zh.yaml +140 -0
- package/AISB/catalog/aisb.t3.009_scoremissing.yaml +179 -0
- package/AISB/catalog/aisb.t3.009_scoremissing.zh.yaml +119 -0
- package/AISB/catalog/aisb.t3.010_suitabilityfilter.yaml +221 -0
- package/AISB/catalog/aisb.t3.010_suitabilityfilter.zh.yaml +141 -0
- package/AISB/catalog/aisb.t3.011_osd.yaml +206 -0
- package/AISB/catalog/aisb.t3.011_osd.zh.yaml +163 -0
- package/AISB/catalog/aisb.t3.012_efficientqat.yaml +206 -0
- package/AISB/catalog/aisb.t3.012_efficientqat.zh.yaml +159 -0
- package/AISB/catalog/aisb.t3.013_appl.yaml +152 -0
- package/AISB/catalog/aisb.t3.013_appl.zh.yaml +126 -0
- package/AISB/catalog/aisb.t3.014_piguard.yaml +207 -0
- package/AISB/catalog/aisb.t3.014_piguard.zh.yaml +164 -0
- package/AISB/catalog/aisb.t3.015_frspec.yaml +209 -0
- package/AISB/catalog/aisb.t3.015_frspec.zh.yaml +163 -0
- package/AISB/catalog/aisb.t3.016_mathfusion.yaml +166 -0
- package/AISB/catalog/aisb.t3.016_mathfusion.zh.yaml +145 -0
- package/AISB/catalog/aisb.t3.017_multimodalglp.yaml +171 -0
- package/AISB/catalog/aisb.t3.017_multimodalglp.zh.yaml +122 -0
- package/AISB/catalog/aisb.t3.018_cotsynth.yaml +206 -0
- package/AISB/catalog/aisb.t3.018_cotsynth.zh.yaml +162 -0
- package/AISB/catalog/aisb.t3.019_dyscaleut.yaml +211 -0
- package/AISB/catalog/aisb.t3.019_dyscaleut.zh.yaml +148 -0
- package/AISB/catalog/aisb.t3.020_aristotle.yaml +173 -0
- package/AISB/catalog/aisb.t3.020_aristotle.zh.yaml +119 -0
- package/AISB/catalog/aisb.t3.021_tokenrecycling.yaml +160 -0
- package/AISB/catalog/aisb.t3.021_tokenrecycling.zh.yaml +129 -0
- package/AISB/catalog/aisb.t3.022_chainofreasoning.yaml +204 -0
- package/AISB/catalog/aisb.t3.022_chainofreasoning.zh.yaml +161 -0
- package/AISB/catalog/aisb.t3.023_guidedembed.yaml +211 -0
- package/AISB/catalog/aisb.t3.023_guidedembed.zh.yaml +189 -0
- package/AISB/catalog/aisb.t3.024_outputcentric.yaml +148 -0
- package/AISB/catalog/aisb.t3.024_outputcentric.zh.yaml +131 -0
- package/AISB/catalog/aisb.t3.025_deeper.yaml +143 -0
- package/AISB/catalog/aisb.t3.025_deeper.zh.yaml +116 -0
- package/AISB/catalog/aisb.t3.026_gartkg.yaml +195 -0
- package/AISB/catalog/aisb.t3.026_gartkg.zh.yaml +127 -0
- package/AISB/catalog/aisb.t3.027_citeeval.yaml +182 -0
- package/AISB/catalog/aisb.t3.027_citeeval.zh.yaml +135 -0
- package/AISB/catalog/aisb.t3.028_sbam.yaml +206 -0
- package/AISB/catalog/aisb.t3.028_sbam.zh.yaml +166 -0
- package/AISB/catalog/aisb.t3.029_cdqgeoembed.yaml +224 -0
- package/AISB/catalog/aisb.t3.029_cdqgeoembed.zh.yaml +142 -0
- package/AISB/catalog/aisb.t3.030_processrm.yaml +211 -0
- package/AISB/catalog/aisb.t3.030_processrm.zh.yaml +166 -0
- package/AISB/catalog/aisb.t3.031_circuitstability.yaml +172 -0
- package/AISB/catalog/aisb.t3.031_circuitstability.zh.yaml +134 -0
- package/AISB/catalog/aisb.t3.032_ptsolver.yaml +169 -0
- package/AISB/catalog/aisb.t3.032_ptsolver.zh.yaml +135 -0
- package/AISB/catalog/aisb.t3.033_gcse.yaml +144 -0
- package/AISB/catalog/aisb.t3.033_gcse.zh.yaml +126 -0
- package/AISB/catalog/aisb.t3.034_ensemblewm.yaml +183 -0
- package/AISB/catalog/aisb.t3.034_ensemblewm.zh.yaml +146 -0
- package/AISB/catalog/aisb.t3.035_moralvalueswa.yaml +207 -0
- package/AISB/catalog/aisb.t3.035_moralvalueswa.zh.yaml +165 -0
- package/AISB/catalog/aisb.t3.036_weakstrongpref.yaml +210 -0
- package/AISB/catalog/aisb.t3.036_weakstrongpref.zh.yaml +194 -0
- package/AISB/catalog/aisb.t3.037_dementiamask.yaml +172 -0
- package/AISB/catalog/aisb.t3.037_dementiamask.zh.yaml +132 -0
- package/AISB/catalog/aisb.t3.038_tinysam.yaml +284 -0
- package/AISB/catalog/aisb.t3.038_tinysam.zh.yaml +240 -0
- package/AISB/catalog/aisb.t3.039_calf.yaml +224 -0
- package/AISB/catalog/aisb.t3.039_calf.zh.yaml +194 -0
- package/AISB/catalog/aisb.t3.040_graniteguardian.yaml +199 -0
- package/AISB/catalog/aisb.t3.040_graniteguardian.zh.yaml +174 -0
- package/AISB/catalog/aisb.t3.041_amdm.yaml +149 -0
- package/AISB/catalog/aisb.t3.041_amdm.zh.yaml +137 -0
- package/AISB/catalog/aisb.t3.042_xpatch.yaml +216 -0
- package/AISB/catalog/aisb.t3.042_xpatch.zh.yaml +182 -0
- package/AISB/catalog/aisb.t3.043_vhm.yaml +268 -0
- package/AISB/catalog/aisb.t3.043_vhm.zh.yaml +193 -0
- package/AISB/catalog/aisb.t3.044_rgvi.yaml +224 -0
- package/AISB/catalog/aisb.t3.044_rgvi.zh.yaml +176 -0
- package/AISB/catalog/aisb.t3.045_pslstm.yaml +203 -0
- package/AISB/catalog/aisb.t3.045_pslstm.zh.yaml +179 -0
- package/AISB/catalog/aisb.t3.046_nonstatts.yaml +208 -0
- package/AISB/catalog/aisb.t3.046_nonstatts.zh.yaml +194 -0
- package/AISB/catalog/aisb.t3.047_timepfn.yaml +156 -0
- package/AISB/catalog/aisb.t3.047_timepfn.zh.yaml +124 -0
- package/AISB/catalog/aisb.t3.048_proxyspex.yaml +148 -0
- package/AISB/catalog/aisb.t3.048_proxyspex.zh.yaml +125 -0
- package/AISB/catalog/aisb.t3.049_hogwildinference.yaml +183 -0
- package/AISB/catalog/aisb.t3.049_hogwildinference.zh.yaml +138 -0
- package/AISB/catalog/aisb.t3.050_causalpfn.yaml +214 -0
- package/AISB/catalog/aisb.t3.050_causalpfn.zh.yaml +190 -0
- package/AISB/catalog/aisb.t3.051_flashtp.yaml +169 -0
- package/AISB/catalog/aisb.t3.051_flashtp.zh.yaml +124 -0
- package/AISB/catalog/aisb.t3.052_nsdiff.yaml +155 -0
- package/AISB/catalog/aisb.t3.052_nsdiff.zh.yaml +138 -0
- package/AISB/catalog/aisb.t3.053_k2vae.yaml +158 -0
- package/AISB/catalog/aisb.t3.053_k2vae.zh.yaml +132 -0
- package/AISB/catalog/aisb.t3.054_timebase.yaml +178 -0
- package/AISB/catalog/aisb.t3.054_timebase.zh.yaml +158 -0
- package/AISB/catalog/aisb.t3.055_csbrain.yaml +238 -0
- package/AISB/catalog/aisb.t3.055_csbrain.zh.yaml +184 -0
- package/AISB/catalog/aisb.t3.056_infosam.yaml +224 -0
- package/AISB/catalog/aisb.t3.056_infosam.zh.yaml +189 -0
- package/AISB/catalog/aisb.t3.057_mdreid.yaml +129 -0
- package/AISB/catalog/aisb.t3.057_mdreid.zh.yaml +117 -0
- package/AISB/catalog/aisb.t3.058_mindglitch.yaml +171 -0
- package/AISB/catalog/aisb.t3.058_mindglitch.zh.yaml +145 -0
- package/AISB/catalog/aisb.t3.059_selfsupervised.yaml +154 -0
- package/AISB/catalog/aisb.t3.059_selfsupervised.zh.yaml +125 -0
- package/AISB/catalog/aisb.t3.060_iaggad.yaml +121 -0
- package/AISB/catalog/aisb.t3.060_iaggad.zh.yaml +100 -0
- package/AISB/catalog/aisb.t3.061_hsgkn.yaml +136 -0
- package/AISB/catalog/aisb.t3.061_hsgkn.zh.yaml +113 -0
- package/AISB/catalog/aisb.t3.062_visionts.yaml +237 -0
- package/AISB/catalog/aisb.t3.062_visionts.zh.yaml +216 -0
- package/AISB/catalog/aisb.t3.063_tsrag.yaml +162 -0
- package/AISB/catalog/aisb.t3.063_tsrag.zh.yaml +138 -0
- package/AISB/catalog/aisb.t3.064_pir.yaml +221 -0
- package/AISB/catalog/aisb.t3.064_pir.zh.yaml +197 -0
- package/AISB/catalog/aisb.t3.065_proteinbinding.yaml +234 -0
- package/AISB/catalog/aisb.t3.065_proteinbinding.zh.yaml +167 -0
- package/AISB/catalog/aisb.t3.066_tropicalattention.yaml +267 -0
- package/AISB/catalog/aisb.t3.066_tropicalattention.zh.yaml +229 -0
- package/AISB/catalog/aisb.t3.067_kanad.yaml +193 -0
- package/AISB/catalog/aisb.t3.067_kanad.zh.yaml +167 -0
- package/AISB/catalog/aisb.t3.068_sempo.yaml +187 -0
- package/AISB/catalog/aisb.t3.068_sempo.zh.yaml +148 -0
- package/AISB/catalog/aisb.t3.069_treehfd.yaml +129 -0
- package/AISB/catalog/aisb.t3.069_treehfd.zh.yaml +111 -0
- package/AISB/catalog/aisb.t3.070_certifiedunlearning.yaml +224 -0
- package/AISB/catalog/aisb.t3.070_certifiedunlearning.zh.yaml +171 -0
- package/AISB/catalog/aisb.t3.071_neuralmjd.yaml +142 -0
- package/AISB/catalog/aisb.t3.071_neuralmjd.zh.yaml +120 -0
- package/AISB/catalog/aisb.t3.072_fedgmt.yaml +181 -0
- package/AISB/catalog/aisb.t3.072_fedgmt.zh.yaml +158 -0
- package/AISB/catalog/aisb.t3.073_rld.yaml +161 -0
- package/AISB/catalog/aisb.t3.073_rld.zh.yaml +129 -0
- package/AISB/catalog/aisb.t3.074_lsvi.yaml +163 -0
- package/AISB/catalog/aisb.t3.074_lsvi.zh.yaml +129 -0
- package/AISB/catalog/aisb.t3.075_treeslicedentropy.yaml +201 -0
- package/AISB/catalog/aisb.t3.075_treeslicedentropy.zh.yaml +148 -0
- package/AISB/catalog/aisb.t3.076_aanet.yaml +169 -0
- package/AISB/catalog/aisb.t3.076_aanet.zh.yaml +129 -0
- package/AISB/catalog/aisb.t3.077_cmnn.yaml +199 -0
- package/AISB/catalog/aisb.t3.077_cmnn.zh.yaml +165 -0
- package/AISB/catalog/aisb.t3.078_conformalanomaly.yaml +146 -0
- package/AISB/catalog/aisb.t3.078_conformalanomaly.zh.yaml +117 -0
- package/AISB/catalog/aisb.t3.079_dpfkmeans.yaml +131 -0
- package/AISB/catalog/aisb.t3.079_dpfkmeans.zh.yaml +104 -0
- package/AISB/catalog/aisb.t3.080_latentscorereweight.yaml +169 -0
- package/AISB/catalog/aisb.t3.080_latentscorereweight.zh.yaml +123 -0
- package/AISB/catalog/aisb.t3.081_qmamba.yaml +150 -0
- package/AISB/catalog/aisb.t3.081_qmamba.zh.yaml +117 -0
- package/AISB/catalog/aisb.t3.082_onlinellmrouting.yaml +160 -0
- package/AISB/catalog/aisb.t3.082_onlinellmrouting.zh.yaml +133 -0
- package/AISB/catalog/aisb.t3.083_starformer.yaml +178 -0
- package/AISB/catalog/aisb.t3.083_starformer.zh.yaml +140 -0
- package/AISB/catalog/aisb.t3.084_ift.yaml +139 -0
- package/AISB/catalog/aisb.t3.084_ift.zh.yaml +111 -0
- package/AISB/catalog/aisb.t3.085_neuralsurv.yaml +183 -0
- package/AISB/catalog/aisb.t3.085_neuralsurv.zh.yaml +143 -0
- package/AISB/catalog/aisb.t3.086_stella.yaml +197 -0
- package/AISB/catalog/aisb.t3.086_stella.zh.yaml +142 -0
- package/AISB/catalog/aisb.t3.087_moses.yaml +167 -0
- package/AISB/catalog/aisb.t3.087_moses.zh.yaml +132 -0
- package/AISB/catalog/aisb.t3.088_channelnorm.yaml +140 -0
- package/AISB/catalog/aisb.t3.088_channelnorm.zh.yaml +109 -0
- package/AISB/catalog/aisb.t3.089_causalvelocity.yaml +730 -0
- package/AISB/catalog/aisb.t3.089_causalvelocity.zh.yaml +668 -0
- package/AISB/catalog/aisb.t3.090_rstib.yaml +144 -0
- package/AISB/catalog/aisb.t3.090_rstib.zh.yaml +109 -0
- package/AISB/catalog/aisb.t3.091_timeawarecausal.yaml +132 -0
- package/AISB/catalog/aisb.t3.091_timeawarecausal.zh.yaml +107 -0
- package/AISB/catalog/aisb.t3.092_kmeanslocalopt.yaml +138 -0
- package/AISB/catalog/aisb.t3.092_kmeanslocalopt.zh.yaml +110 -0
- package/AISB/catalog/aisb.t3.093_fedwmsam.yaml +134 -0
- package/AISB/catalog/aisb.t3.093_fedwmsam.zh.yaml +106 -0
- package/AISB/catalog/aisb.t3.094_boundre.yaml +147 -0
- package/AISB/catalog/aisb.t3.094_boundre.zh.yaml +114 -0
- package/AISB/catalog/aisb.t3.095_fastfeaturecp.yaml +153 -0
- package/AISB/catalog/aisb.t3.095_fastfeaturecp.zh.yaml +118 -0
- package/AISB/catalog/aisb.t3.096_m3svm.yaml +189 -0
- package/AISB/catalog/aisb.t3.096_m3svm.zh.yaml +149 -0
- package/AISB/catalog/aisb.t3.097_wassersteintl.yaml +212 -0
- package/AISB/catalog/aisb.t3.097_wassersteintl.zh.yaml +169 -0
- package/AISB/catalog/aisb.t3.098_xmahalanobis.yaml +171 -0
- package/AISB/catalog/aisb.t3.098_xmahalanobis.zh.yaml +127 -0
- package/AISB/catalog/aisb.t3.099_ollalanding.yaml +248 -0
- package/AISB/catalog/aisb.t3.099_ollalanding.zh.yaml +182 -0
- package/AISB/catalog/aisb.t3.100_invmissingdata.yaml +179 -0
- package/AISB/catalog/aisb.t3.100_invmissingdata.zh.yaml +150 -0
- package/AISB/catalog/aisb.t3.101_acia.yaml +164 -0
- package/AISB/catalog/aisb.t3.101_acia.zh.yaml +109 -0
- package/AISB/catalog/aisb.t3.102_stochasticff.yaml +178 -0
- package/AISB/catalog/aisb.t3.102_stochasticff.zh.yaml +130 -0
- package/AISB/catalog/aisb.t3.103_qdcp.yaml +150 -0
- package/AISB/catalog/aisb.t3.103_qdcp.zh.yaml +116 -0
- package/AISB/catalog/aisb.t3.104_balancedactiveinf.yaml +137 -0
- package/AISB/catalog/aisb.t3.104_balancedactiveinf.zh.yaml +104 -0
- package/AISB/catalog/aisb.t3.105_binaryclasseval.yaml +161 -0
- package/AISB/catalog/aisb.t3.105_binaryclasseval.zh.yaml +130 -0
- package/AISB/image/001_aisb.t3.001_savvy.jpg +0 -0
- package/AISB/image/002_aisb.t3.002_pinet.jpg +0 -0
- package/AISB/image/003_aisb.t3.003_dmsqd.jpg +0 -0
- package/AISB/image/004_aisb.t3.004_decentralattn.jpg +0 -0
- package/AISB/image/005_aisb.t3.005_tsae.jpg +0 -0
- package/AISB/image/006_aisb.t3.006_physense.jpg +0 -0
- package/AISB/image/007_aisb.t3.007_reasoningiqa.jpg +0 -0
- package/AISB/image/008_aisb.t3.008_meanflows.jpg +0 -0
- package/AISB/image/009_aisb.t3.009_scoremissing.jpg +0 -0
- package/AISB/image/010_aisb.t3.010_suitabilityfilter.jpg +0 -0
- package/AISB/image/011_aisb.t3.011_osd.jpg +0 -0
- package/AISB/image/012_aisb.t3.012_efficientqat.jpg +0 -0
- package/AISB/image/013_aisb.t3.013_appl.jpg +0 -0
- package/AISB/image/014_aisb.t3.014_piguard.jpg +0 -0
- package/AISB/image/015_aisb.t3.015_frspec.jpg +0 -0
- package/AISB/image/016_aisb.t3.016_mathfusion.jpg +0 -0
- package/AISB/image/017_aisb.t3.017_multimodalglp.jpg +0 -0
- package/AISB/image/018_aisb.t3.018_cotsynth.jpg +0 -0
- package/AISB/image/019_aisb.t3.019_dyscaleut.jpg +0 -0
- package/AISB/image/020_aisb.t3.020_aristotle.jpg +0 -0
- package/AISB/image/021_aisb.t3.021_tokenrecycling.jpg +0 -0
- package/AISB/image/022_aisb.t3.022_chainofreasoning.jpg +0 -0
- package/AISB/image/023_aisb.t3.023_guidedembed.jpg +0 -0
- package/AISB/image/024_aisb.t3.024_outputcentric.jpg +0 -0
- package/AISB/image/025_aisb.t3.025_deeper.jpg +0 -0
- package/AISB/image/026_aisb.t3.026_gartkg.jpg +0 -0
- package/AISB/image/027_aisb.t3.027_citeeval.jpg +0 -0
- package/AISB/image/028_aisb.t3.028_sbam.jpg +0 -0
- package/AISB/image/029_aisb.t3.029_cdqgeoembed.jpg +0 -0
- package/AISB/image/030_aisb.t3.030_processrm.jpg +0 -0
- package/AISB/image/031_aisb.t3.031_circuitstability.jpg +0 -0
- package/AISB/image/032_aisb.t3.032_ptsolver.jpg +0 -0
- package/AISB/image/033_aisb.t3.033_gcse.jpg +0 -0
- package/AISB/image/034_aisb.t3.034_ensemblewm.jpg +0 -0
- package/AISB/image/035_aisb.t3.035_moralvalueswa.jpg +0 -0
- package/AISB/image/036_aisb.t3.036_weakstrongpref.jpg +0 -0
- package/AISB/image/037_aisb.t3.037_dementiamask.jpg +0 -0
- package/AISB/image/038_aisb.t3.038_tinysam.jpg +0 -0
- package/AISB/image/039_aisb.t3.039_calf.jpg +0 -0
- package/AISB/image/040_aisb.t3.040_graniteguardian.jpg +0 -0
- package/AISB/image/041_aisb.t3.041_amdm.jpg +0 -0
- package/AISB/image/042_aisb.t3.042_xpatch.jpg +0 -0
- package/AISB/image/043_aisb.t3.043_vhm.jpg +0 -0
- package/AISB/image/044_aisb.t3.044_rgvi.jpg +0 -0
- package/AISB/image/045_aisb.t3.045_pslstm.jpg +0 -0
- package/AISB/image/046_aisb.t3.046_nonstatts.jpg +0 -0
- package/AISB/image/047_aisb.t3.047_timepfn.jpg +0 -0
- package/AISB/image/048_aisb.t3.048_proxyspex.jpg +0 -0
- package/AISB/image/049_aisb.t3.049_hogwildinference.jpg +0 -0
- package/AISB/image/050_aisb.t3.050_causalpfn.jpg +0 -0
- package/AISB/image/051_aisb.t3.051_flashtp.jpg +0 -0
- package/AISB/image/052_aisb.t3.052_nsdiff.jpg +0 -0
- package/AISB/image/053_aisb.t3.053_k2vae.jpg +0 -0
- package/AISB/image/054_aisb.t3.054_timebase.jpg +0 -0
- package/AISB/image/055_aisb.t3.055_csbrain.jpg +0 -0
- package/AISB/image/056_aisb.t3.056_infosam.jpg +0 -0
- package/AISB/image/057_aisb.t3.057_mdreid.jpg +0 -0
- package/AISB/image/058_aisb.t3.058_mindglitch.jpg +0 -0
- package/AISB/image/059_aisb.t3.059_selfsupervised.jpg +0 -0
- package/AISB/image/060_aisb.t3.060_iaggad.jpg +0 -0
- package/AISB/image/061_aisb.t3.061_hsgkn.jpg +0 -0
- package/AISB/image/062_aisb.t3.062_visionts.jpg +0 -0
- package/AISB/image/063_aisb.t3.063_tsrag.jpg +0 -0
- package/AISB/image/064_aisb.t3.064_pir.jpg +0 -0
- package/AISB/image/065_aisb.t3.065_proteinbinding.jpg +0 -0
- package/AISB/image/066_aisb.t3.066_tropicalattention.jpg +0 -0
- package/AISB/image/067_aisb.t3.067_kanad.jpg +0 -0
- package/AISB/image/068_aisb.t3.068_sempo.jpg +0 -0
- package/AISB/image/069_aisb.t3.069_treehfd.jpg +0 -0
- package/AISB/image/070_aisb.t3.070_certifiedunlearning.jpg +0 -0
- package/AISB/image/071_aisb.t3.071_neuralmjd.jpg +0 -0
- package/AISB/image/072_aisb.t3.072_fedgmt.jpg +0 -0
- package/AISB/image/073_aisb.t3.073_rld.jpg +0 -0
- package/AISB/image/074_aisb.t3.074_lsvi.jpg +0 -0
- package/AISB/image/075_aisb.t3.075_treeslicedentropy.jpg +0 -0
- package/AISB/image/076_aisb.t3.076_aanet.jpg +0 -0
- package/AISB/image/077_aisb.t3.077_cmnn.jpg +0 -0
- package/AISB/image/078_aisb.t3.078_conformalanomaly.jpg +0 -0
- package/AISB/image/079_aisb.t3.079_dpfkmeans.jpg +0 -0
- package/AISB/image/080_aisb.t3.080_latentscorereweight.jpg +0 -0
- package/AISB/image/081_aisb.t3.081_qmamba.jpg +0 -0
- package/AISB/image/082_aisb.t3.082_onlinellmrouting.jpg +0 -0
- package/AISB/image/083_aisb.t3.083_starformer.jpg +0 -0
- package/AISB/image/084_aisb.t3.084_ift.jpg +0 -0
- package/AISB/image/085_aisb.t3.085_neuralsurv.jpg +0 -0
- package/AISB/image/086_aisb.t3.086_stella.jpg +0 -0
- package/AISB/image/087_aisb.t3.087_moses.jpg +0 -0
- package/AISB/image/088_aisb.t3.088_channelnorm.jpg +0 -0
- package/AISB/image/089_aisb.t3.089_causalvelocity.jpg +0 -0
- package/AISB/image/090_aisb.t3.090_rstib.jpg +0 -0
- package/AISB/image/091_aisb.t3.091_timeawarecausal.jpg +0 -0
- package/AISB/image/092_aisb.t3.092_kmeanslocalopt.jpg +0 -0
- package/AISB/image/093_aisb.t3.093_fedwmsam.jpg +0 -0
- package/AISB/image/094_aisb.t3.094_boundre.jpg +0 -0
- package/AISB/image/095_aisb.t3.095_fastfeaturecp.jpg +0 -0
- package/AISB/image/096_aisb.t3.096_m3svm.jpg +0 -0
- package/AISB/image/097_aisb.t3.097_wassersteintl.jpg +0 -0
- package/AISB/image/098_aisb.t3.098_xmahalanobis.jpg +0 -0
- package/AISB/image/099_aisb.t3.099_ollalanding.jpg +0 -0
- package/AISB/image/100_aisb.t3.100_invmissingdata.jpg +0 -0
- package/AISB/image/101_aisb.t3.101_acia.jpg +0 -0
- package/AISB/image/102_aisb.t3.102_stochasticff.jpg +0 -0
- package/AISB/image/103_aisb.t3.103_qdcp.jpg +0 -0
- package/AISB/image/104_aisb.t3.104_balancedactiveinf.jpg +0 -0
- package/AISB/image/105_aisb.t3.105_binaryclasseval.jpg +0 -0
- package/AISB/image/106_aisb.t1.reasoning_lite.jpg +0 -0
- package/AISB/image/107_aisb.t2.paper_audit.jpg +0 -0
- package/AISB/image/108_aisb.t3.multi_gpu_search.jpg +0 -0
- package/AISB/image/109_aisb.t3.tdc_admet.jpg +0 -0
- package/AISB/image/aisb.b1.agentic_coding.svg +16 -0
- package/AISB/image/aisb.b10.climate_earth.svg +16 -0
- package/AISB/image/aisb.b11.model_efficiency.svg +16 -0
- package/AISB/image/aisb.b12.embodied_ai.svg +16 -0
- package/AISB/image/aisb.b2.agent_systems.svg +16 -0
- package/AISB/image/aisb.b3.self_evolving_rl.svg +16 -0
- package/AISB/image/aisb.b4.lm_reasoning.svg +16 -0
- package/AISB/image/aisb.b5.math_proof.svg +16 -0
- package/AISB/image/aisb.b6.research_process.svg +16 -0
- package/AISB/image/aisb.b7.multimodal_fusion.svg +16 -0
- package/AISB/image/aisb.b8.lifesci_drug.svg +16 -0
- package/AISB/image/aisb.b9.material_science.svg +16 -0
- package/README.md +132 -11
- package/bin/ds.js +376 -49
- package/docs/en/00_QUICK_START.md +135 -18
- package/docs/en/01_SETTINGS_REFERENCE.md +468 -96
- package/docs/en/02_START_RESEARCH_GUIDE.md +26 -5
- package/docs/en/03_QQ_CONNECTOR_GUIDE.md +14 -3
- package/docs/en/04_LINGZHU_CONNECTOR_GUIDE.md +2 -0
- package/docs/en/05_TUI_GUIDE.md +171 -2
- package/docs/en/07_MEMORY_AND_MCP.md +38 -2
- package/docs/en/09_DOCTOR.md +64 -4
- package/docs/en/10_WEIXIN_CONNECTOR_GUIDE.md +38 -1
- package/docs/en/11_LICENSE_AND_RISK.md +4 -0
- package/docs/en/12_GUIDED_WORKFLOW_TOUR.md +15 -0
- package/docs/en/14_PROMPT_SKILLS_AND_MCP_GUIDE.md +9 -0
- package/docs/en/15_CODEX_PROVIDER_SETUP.md +622 -187
- package/docs/en/16_TELEGRAM_CONNECTOR_GUIDE.md +14 -0
- package/docs/en/17_WHATSAPP_CONNECTOR_GUIDE.md +14 -0
- package/docs/en/18_FEISHU_CONNECTOR_GUIDE.md +14 -0
- package/docs/en/21_LOCAL_MODEL_BACKENDS_GUIDE.md +105 -2
- package/docs/en/22_BENCHSTORE_YAML_REFERENCE.md +469 -0
- package/docs/en/23_BENCHSTORE_GITHUB_RELEASES_SPEC.md +316 -0
- package/docs/en/24_CLAUDE_CODE_PROVIDER_SETUP.md +469 -0
- package/docs/en/25_OPENCODE_PROVIDER_SETUP.md +653 -0
- package/docs/en/26_CITATION_AND_ATTRIBUTION.md +119 -0
- package/docs/en/27_KIMI_CODE_PROVIDER_SETUP.md +180 -0
- package/docs/en/28_DISCORD_CONNECTOR_GUIDE.md +61 -0
- package/docs/en/29_SLACK_CONNECTOR_GUIDE.md +60 -0
- package/docs/en/30_SETTINGS_CONTROL_CENTER_GUIDE.md +371 -0
- package/docs/en/{19_LOCAL_BROWSER_AUTH.md → 31_LOCAL_BROWSER_AUTH.md} +1 -1
- package/docs/en/32_WINDOWS_WSL2_DEPLOYMENT_GUIDE.md +273 -0
- package/docs/en/33_WORKSPACE_EXPLORER_QA.md +121 -0
- package/docs/en/91_DEVELOPMENT.md +29 -0
- package/docs/en/99_ACKNOWLEDGEMENTS.md +24 -19
- package/docs/en/README.md +44 -7
- package/docs/images/admin/admin-connectors-health-en.png +0 -0
- package/docs/images/admin/admin-controllers-en.png +0 -0
- package/docs/images/admin/admin-diagnostics-en.png +0 -0
- package/docs/images/admin/admin-errors-en.png +0 -0
- package/docs/images/admin/admin-issues-en.png +0 -0
- package/docs/images/admin/admin-logs-en.png +0 -0
- package/docs/images/admin/admin-quest-detail-en.png +0 -0
- package/docs/images/admin/admin-quests-en.png +0 -0
- package/docs/images/admin/admin-repairs-en.png +0 -0
- package/docs/images/admin/admin-runtime-en.png +0 -0
- package/docs/images/admin/admin-search-en.png +0 -0
- package/docs/images/admin/admin-stats-en.png +0 -0
- package/docs/images/admin/admin-summary-en.png +0 -0
- package/docs/images/connectors/connector-discord-en.png +0 -0
- package/docs/images/connectors/connector-feishu-en.png +0 -0
- package/docs/images/connectors/connector-lingzhu-en.png +0 -0
- package/docs/images/connectors/connector-qq-en.png +0 -0
- package/docs/images/connectors/connector-slack-en.png +0 -0
- package/docs/images/connectors/connector-telegram-en.png +0 -0
- package/docs/images/connectors/connector-weixin-en.png +0 -0
- package/docs/images/connectors/connector-whatsapp-en.png +0 -0
- package/docs/images/settings/settings-baselines-en.png +0 -0
- package/docs/images/settings/settings-config-en.png +0 -0
- package/docs/images/settings/settings-connectors-overview-en.png +0 -0
- package/docs/images/settings/settings-deepxiv-en.png +0 -0
- package/docs/images/settings/settings-mcp-servers-en.png +0 -0
- package/docs/images/settings/settings-plugins-en.png +0 -0
- package/docs/images/settings/settings-runners-en.png +0 -0
- package/docs/zh/00_QUICK_START.md +92 -17
- package/docs/zh/01_SETTINGS_REFERENCE.md +219 -98
- package/docs/zh/02_START_RESEARCH_GUIDE.md +26 -5
- package/docs/zh/05_TUI_GUIDE.md +171 -2
- package/docs/zh/07_MEMORY_AND_MCP.md +29 -2
- package/docs/zh/09_DOCTOR.md +39 -4
- package/docs/zh/10_WEIXIN_CONNECTOR_GUIDE.md +24 -1
- package/docs/zh/11_LICENSE_AND_RISK.md +4 -0
- package/docs/zh/12_GUIDED_WORKFLOW_TOUR.md +15 -0
- package/docs/zh/14_PROMPT_SKILLS_AND_MCP_GUIDE.md +9 -0
- package/docs/zh/15_CODEX_PROVIDER_SETUP.md +550 -188
- package/docs/zh/21_LOCAL_MODEL_BACKENDS_GUIDE.md +105 -2
- package/docs/zh/22_BENCHSTORE_YAML_REFERENCE.md +459 -0
- package/docs/zh/23_BENCHSTORE_GITHUB_RELEASES_SPEC.md +287 -0
- package/docs/zh/23_CLAUDE_RUNNER_GUIDE.md +103 -0
- package/docs/zh/24_CLAUDE_CODE_PROVIDER_SETUP.md +460 -0
- package/docs/zh/25_OPENCODE_PROVIDER_SETUP.md +660 -0
- package/docs/zh/26_CITATION_AND_ATTRIBUTION.md +102 -0
- package/docs/zh/27_KIMI_CODE_PROVIDER_SETUP.md +51 -0
- package/docs/zh/{19_LOCAL_BROWSER_AUTH.md → 31_LOCAL_BROWSER_AUTH.md} +1 -1
- package/docs/zh/32_WINDOWS_WSL2_DEPLOYMENT_GUIDE.md +264 -0
- package/docs/zh/33_WORKSPACE_EXPLORER_QA.md +127 -0
- package/docs/zh/99_ACKNOWLEDGEMENTS.md +23 -19
- package/docs/zh/README.md +29 -7
- package/install.sh +122 -16
- package/package.json +4 -1
- package/pyproject.toml +2 -1
- package/src/deepscientist/__init__.py +1 -1
- package/src/deepscientist/acp/envelope.py +13 -0
- package/src/deepscientist/admin/__init__.py +3 -0
- package/src/deepscientist/admin/charts.py +681 -0
- package/src/deepscientist/admin/logs.py +119 -0
- package/src/deepscientist/admin/repairs.py +217 -0
- package/src/deepscientist/admin/service.py +1310 -0
- package/src/deepscientist/admin/system_info.py +700 -0
- package/src/deepscientist/admin/tasks.py +465 -0
- package/src/deepscientist/admin/tool_metrics.py +600 -0
- package/src/deepscientist/artifact/guidance.py +8 -4
- package/src/deepscientist/artifact/schemas.py +115 -0
- package/src/deepscientist/artifact/service.py +4268 -260
- package/src/deepscientist/bash_exec/monitor.py +30 -3
- package/src/deepscientist/bash_exec/service.py +134 -1
- package/src/deepscientist/benchstore/__init__.py +4 -0
- package/src/deepscientist/benchstore/prompt_builder.py +224 -0
- package/src/deepscientist/benchstore/service.py +1716 -0
- package/src/deepscientist/channels/weixin_ilink.py +8 -1
- package/src/deepscientist/cli.py +92 -17
- package/src/deepscientist/codex_cli_compat.py +2 -2
- package/src/deepscientist/config/models.py +82 -11
- package/src/deepscientist/config/service.py +927 -91
- package/src/deepscientist/connector/weixin_support.py +48 -17
- package/src/deepscientist/daemon/api/handlers.py +697 -210
- package/src/deepscientist/daemon/api/router.py +76 -1
- package/src/deepscientist/daemon/app.py +1054 -51
- package/src/deepscientist/diagnostics/runner_failures.py +147 -0
- package/src/deepscientist/doctor.py +212 -65
- package/src/deepscientist/evidence_packets.py +590 -0
- package/src/deepscientist/home.py +52 -4
- package/src/deepscientist/kimi_cli_compat.py +50 -0
- package/src/deepscientist/latex_runtime.py +2 -2
- package/src/deepscientist/mcp/context.py +2 -0
- package/src/deepscientist/mcp/schemas.py +114 -0
- package/src/deepscientist/mcp/server.py +1566 -126
- package/src/deepscientist/memory/service.py +203 -16
- package/src/deepscientist/process_control.py +8 -1
- package/src/deepscientist/prompts/builder.py +836 -92
- package/src/deepscientist/quest/__init__.py +2 -2
- package/src/deepscientist/quest/layout.py +12 -1
- package/src/deepscientist/quest/node_traces.py +10 -0
- package/src/deepscientist/quest/service.py +1430 -139
- package/src/deepscientist/quest/stage_views.py +1 -1
- package/src/deepscientist/runners/__init__.py +18 -0
- package/src/deepscientist/runners/base.py +89 -1
- package/src/deepscientist/runners/builtins.py +13 -1
- package/src/deepscientist/runners/claude.py +391 -0
- package/src/deepscientist/runners/codex.py +421 -21
- package/src/deepscientist/runners/codex_telemetry.py +127 -0
- package/src/deepscientist/runners/kimi.py +334 -0
- package/src/deepscientist/runners/metadata.py +68 -0
- package/src/deepscientist/runners/opencode.py +414 -0
- package/src/deepscientist/runners/runtime_overrides.py +100 -0
- package/src/deepscientist/runners/simple_cli.py +538 -0
- package/src/deepscientist/runtime_storage.py +303 -0
- package/src/deepscientist/shared.py +61 -16
- package/src/deepscientist/skills/installer.py +37 -0
- package/src/deepscientist/skills/registry.py +2 -0
- package/src/deepscientist/tinytex.py +2 -2
- package/src/deepscientist/tui.py +10 -3
- package/src/prompts/benchstore/system.md +77 -0
- package/src/prompts/connectors/qq.md +33 -2
- package/src/prompts/connectors/weixin.md +208 -23
- package/src/prompts/contracts/admin_ops.md +74 -0
- package/src/prompts/contracts/admin_ops_knowledge.md +138 -0
- package/src/prompts/contracts/shared_interaction.md +5 -11
- package/src/prompts/start_setup/system.md +422 -0
- package/src/prompts/system.md +409 -315
- package/src/prompts/system_copilot.md +88 -12
- package/src/skills/analysis-campaign/SKILL.md +239 -578
- package/src/skills/analysis-campaign/references/artifact-flow-examples.md +102 -0
- package/src/skills/analysis-campaign/references/boundary-cases.md +98 -0
- package/src/skills/analysis-campaign/references/campaign-checklist-template.md +39 -24
- package/src/skills/analysis-campaign/references/campaign-design.md +26 -10
- package/src/skills/analysis-campaign/references/campaign-plan-template.md +53 -54
- package/src/skills/analysis-campaign/references/operational-guidance.md +97 -0
- package/src/skills/analysis-campaign/references/writing-facing-slice-examples.md +10 -20
- package/src/skills/baseline/SKILL.md +183 -461
- package/src/skills/baseline/references/artifact-flow-examples.md +106 -0
- package/src/skills/baseline/references/artifact-payload-examples.md +1 -1
- package/src/skills/baseline/references/baseline-checklist-template.md +27 -35
- package/src/skills/baseline/references/baseline-plan-template.md +37 -76
- package/src/skills/baseline/references/boundary-cases.md +86 -0
- package/src/skills/baseline/references/codebase-audit-checklist.md +2 -6
- package/src/skills/baseline/references/comparability-contract.md +7 -12
- package/src/skills/baseline/references/operational-guidance.md +56 -0
- package/src/skills/baseline/references/route-selection.md +5 -25
- package/src/skills/decision/SKILL.md +113 -306
- package/src/skills/decision/references/checkpoint-memory-template.md +47 -0
- package/src/skills/decision/references/operational-guidance.md +94 -0
- package/src/skills/decision/references/research-route-criteria.md +7 -8
- package/src/skills/decision/references/strategic-decision-template.md +13 -26
- package/src/skills/experiment/SKILL.md +132 -670
- package/src/skills/experiment/references/execution-playbook.md +374 -0
- package/src/skills/experiment/references/main-experiment-checklist-template.md +26 -2
- package/src/skills/experiment/references/main-experiment-plan-template.md +28 -17
- package/src/skills/experiment/references/operational-guidance.md +108 -0
- package/src/skills/finalize/SKILL.md +62 -0
- package/src/skills/finalize/references/checkpoint-memory-template.md +49 -0
- package/src/skills/finalize/references/resume-packet-template.md +7 -0
- package/src/skills/idea/SKILL.md +228 -15
- package/src/skills/idea/references/controlled-brainstorming-playbook.md +78 -0
- package/src/skills/idea/references/current-board-packet-template.md +61 -0
- package/src/skills/idea/references/high-value-idea-sourcing.md +119 -0
- package/src/skills/idea/references/idea-generation-playbook.md +21 -0
- package/src/skills/idea/references/idea-thinking-flow.md +6 -0
- package/src/skills/idea/references/literature-survey-template.md +3 -0
- package/src/skills/idea/references/objective-contract-template.md +54 -0
- package/src/skills/idea/references/outline-seeding-example.md +56 -0
- package/src/skills/idea/references/pre-idea-draft-template.md +105 -0
- package/src/skills/idea/references/related-work-playbook.md +75 -2
- package/src/skills/idea/references/research-history-playbook.md +114 -0
- package/src/skills/idea/references/selection-gate.md +58 -6
- package/src/skills/intake-audit/SKILL.md +43 -2
- package/src/skills/intake-audit/references/state-audit-template.md +10 -0
- package/src/skills/nature-data/SKILL.md +128 -0
- package/src/skills/nature-data/UPSTREAM_LICENSE.txt +21 -0
- package/src/skills/nature-data/agents/openai.yaml +4 -0
- package/src/skills/nature-data/references/chinese-author-alignment.md +84 -0
- package/src/skills/nature-data/references/fair-metadata-checklist.md +105 -0
- package/src/skills/nature-data/references/policy-principles.md +103 -0
- package/src/skills/nature-data/references/repository-and-identifiers.md +96 -0
- package/src/skills/nature-data/references/source-basis.md +54 -0
- package/src/skills/nature-data/references/statement-patterns.md +153 -0
- package/src/skills/nature-figure/SKILL.md +197 -0
- package/src/skills/nature-figure/UPSTREAM_LICENSE.txt +21 -0
- package/src/skills/nature-figure/agents/openai.yaml +4 -0
- package/src/skills/nature-figure/evals/evals.json +37 -0
- package/src/skills/nature-figure/references/api.md +428 -0
- package/src/skills/nature-figure/references/backend-selection.md +100 -0
- package/src/skills/nature-figure/references/chart-types.md +281 -0
- package/src/skills/nature-figure/references/common-patterns.md +349 -0
- package/src/skills/nature-figure/references/design-theory.md +436 -0
- package/src/skills/nature-figure/references/figure-contract.md +93 -0
- package/src/skills/nature-figure/references/nature-2026-observations.md +112 -0
- package/src/skills/nature-figure/references/qa-contract.md +119 -0
- package/src/skills/nature-figure/references/r-template-index.md +66 -0
- package/src/skills/nature-figure/references/r-workflow.md +161 -0
- package/src/skills/nature-figure/references/tutorials.md +250 -0
- package/src/skills/nature-paper2ppt/SKILL.md +507 -0
- package/src/skills/nature-paper2ppt/UPSTREAM_LICENSE.txt +21 -0
- package/src/skills/nature-paper2ppt/agents/openai.yaml +4 -0
- package/src/skills/nature-polishing/SKILL.md +385 -0
- package/src/skills/nature-polishing/UPSTREAM_LICENSE.txt +21 -0
- package/src/skills/nature-polishing/agents/openai.yaml +4 -0
- package/src/skills/nature-polishing/references/phrasebank-playbook.md +162 -0
- package/src/skills/nature-polishing/references/section-moves.md +240 -0
- package/src/skills/nature-polishing/references/style-guardrails.md +94 -0
- package/src/skills/nature-polishing/references/writing-strategy.md +148 -0
- package/src/skills/optimize/SKILL.md +177 -1568
- package/src/skills/optimize/references/brief-shaping-playbook.md +95 -0
- package/src/skills/optimize/references/candidate-board-template.md +13 -0
- package/src/skills/optimize/references/candidate-ranking-template.md +51 -0
- package/src/skills/optimize/references/codegen-route-playbook.md +50 -0
- package/src/skills/optimize/references/debug-response-template.md +29 -0
- package/src/skills/optimize/references/frontier-review-template.md +32 -0
- package/src/skills/optimize/references/fusion-playbook.md +36 -0
- package/src/skills/optimize/references/method-brief-template.md +73 -0
- package/src/skills/optimize/references/operational-guidance.md +621 -0
- package/src/skills/optimize/references/optimization-memory-template.md +30 -0
- package/src/skills/optimize/references/optimize-checklist-template.md +18 -0
- package/src/skills/optimize/references/plateau-response-playbook.md +28 -0
- package/src/skills/optimize/references/prompt-patterns.md +49 -0
- package/src/skills/paper-outline/SKILL.md +227 -0
- package/src/skills/paper-outline/references/outline-patterns.md +87 -0
- package/src/skills/paper-plot/SKILL.md +79 -0
- package/src/skills/paper-plot/agents/openai.yaml +4 -0
- package/src/skills/paper-plot/references/bar_grouped_hatch.md +96 -0
- package/src/skills/paper-plot/references/bar_paired_delta.md +72 -0
- package/src/skills/paper-plot/references/line_confidence_band.md +75 -0
- package/src/skills/paper-plot/references/line_loss_with_inset.md +65 -0
- package/src/skills/paper-plot/references/line_training_curve.md +44 -0
- package/src/skills/paper-plot/references/radar_dual_series.md +59 -0
- package/src/skills/paper-plot/references/scatter_broken_axis.md +59 -0
- package/src/skills/paper-plot/references/scatter_tsne_cluster.md +72 -0
- package/src/skills/paper-plot/scripts/bar_memevolve.py +109 -0
- package/src/skills/paper-plot/scripts/bar_spice.py +166 -0
- package/src/skills/paper-plot/scripts/line_aime.py +94 -0
- package/src/skills/paper-plot/scripts/line_loss_inset.py +157 -0
- package/src/skills/paper-plot/scripts/line_selfdistill.py +168 -0
- package/src/skills/paper-plot/scripts/radar_dora.py +151 -0
- package/src/skills/paper-plot/scripts/scatter_break.py +169 -0
- package/src/skills/paper-plot/scripts/scatter_tsne.py +133 -0
- package/src/skills/rebuttal/SKILL.md +9 -0
- package/src/skills/references/tool-usage-by-stage.md +438 -0
- package/src/skills/review/SKILL.md +105 -7
- package/src/skills/science/PROVENANCE.md +44 -0
- package/src/skills/science/SKILL.md +137 -0
- package/src/skills/science/references/artifact-science-tool.md +110 -0
- package/src/skills/science/references/claim-type-discipline.md +56 -0
- package/src/skills/science/references/domain-index.md +422 -0
- package/src/skills/science/references/hpc-via-bash-exec.md +42 -0
- package/src/skills/science/references/package-check-playbook.md +64 -0
- package/src/skills/science/references/package-index.min.json +3616 -0
- package/src/skills/science/references/packages/abinit.md +80 -0
- package/src/skills/science/references/packages/acts.md +73 -0
- package/src/skills/science/references/packages/aiida-core.md +80 -0
- package/src/skills/science/references/packages/alamode.md +80 -0
- package/src/skills/science/references/packages/amuse.md +88 -0
- package/src/skills/science/references/packages/anndata.md +88 -0
- package/src/skills/science/references/packages/arbor.md +80 -0
- package/src/skills/science/references/packages/arc.md +73 -0
- package/src/skills/science/references/packages/astropy.md +88 -0
- package/src/skills/science/references/packages/astroquery.md +88 -0
- package/src/skills/science/references/packages/atomate2.md +80 -0
- package/src/skills/science/references/packages/atomsmltr.md +73 -0
- package/src/skills/science/references/packages/awkward.md +73 -0
- package/src/skills/science/references/packages/batman.md +88 -0
- package/src/skills/science/references/packages/biopython.md +88 -0
- package/src/skills/science/references/packages/bloqade.md +73 -0
- package/src/skills/science/references/packages/brian2.md +73 -0
- package/src/skills/science/references/packages/bullet3.md +73 -0
- package/src/skills/science/references/packages/calculix.md +80 -0
- package/src/skills/science/references/packages/cantera.md +73 -0
- package/src/skills/science/references/packages/cavity-md-ipi.md +80 -0
- package/src/skills/science/references/packages/ccdproc.md +88 -0
- package/src/skills/science/references/packages/celerite2.md +88 -0
- package/src/skills/science/references/packages/cellrank.md +73 -0
- package/src/skills/science/references/packages/cesm.md +80 -0
- package/src/skills/science/references/packages/chemicals.md +73 -0
- package/src/skills/science/references/packages/chempy.md +73 -0
- package/src/skills/science/references/packages/cirq.md +73 -0
- package/src/skills/science/references/packages/coffea.md +73 -0
- package/src/skills/science/references/packages/cp2k.md +88 -0
- package/src/skills/science/references/packages/custodian.md +80 -0
- package/src/skills/science/references/packages/dart.md +73 -0
- package/src/skills/science/references/packages/datamol.md +88 -0
- package/src/skills/science/references/packages/dd4hep.md +73 -0
- package/src/skills/science/references/packages/dealii.md +80 -0
- package/src/skills/science/references/packages/deepchem.md +88 -0
- package/src/skills/science/references/packages/delphes.md +73 -0
- package/src/skills/science/references/packages/devito.md +80 -0
- package/src/skills/science/references/packages/dftb.md +88 -0
- package/src/skills/science/references/packages/dftd4.md +88 -0
- package/src/skills/science/references/packages/dftk-jl.md +80 -0
- package/src/skills/science/references/packages/dolfinx.md +80 -0
- package/src/skills/science/references/packages/drake.md +73 -0
- package/src/skills/science/references/packages/dumux.md +73 -0
- package/src/skills/science/references/packages/elk.md +80 -0
- package/src/skills/science/references/packages/elmerfem.md +80 -0
- package/src/skills/science/references/packages/enzo-e.md +88 -0
- package/src/skills/science/references/packages/espresso.md +80 -0
- package/src/skills/science/references/packages/exoplanet.md +88 -0
- package/src/skills/science/references/packages/fairroot.md +73 -0
- package/src/skills/science/references/packages/fbpic.md +80 -0
- package/src/skills/science/references/packages/fdtdbath-meep.md +80 -0
- package/src/skills/science/references/packages/geant4.md +73 -0
- package/src/skills/science/references/packages/geosx.md +80 -0
- package/src/skills/science/references/packages/gprmax.md +80 -0
- package/src/skills/science/references/packages/gromacs.md +80 -0
- package/src/skills/science/references/packages/gwaslab.md +73 -0
- package/src/skills/science/references/packages/gz-sim.md +73 -0
- package/src/skills/science/references/packages/hail.md +88 -0
- package/src/skills/science/references/packages/hiphive.md +80 -0
- package/src/skills/science/references/packages/hoomd-blue.md +80 -0
- package/src/skills/science/references/packages/itensor.md +73 -0
- package/src/skills/science/references/packages/itensors-jl.md +73 -0
- package/src/skills/science/references/packages/jdftx.md +73 -0
- package/src/skills/science/references/packages/jobflow.md +80 -0
- package/src/skills/science/references/packages/kadanoffbaym-jl.md +73 -0
- package/src/skills/science/references/packages/kite.md +80 -0
- package/src/skills/science/references/packages/kratos.md +80 -0
- package/src/skills/science/references/packages/kwant.md +73 -0
- package/src/skills/science/references/packages/lammps.md +80 -0
- package/src/skills/science/references/packages/lightkurve.md +88 -0
- package/src/skills/science/references/packages/limix.md +73 -0
- package/src/skills/science/references/packages/maxwelllink.md +80 -0
- package/src/skills/science/references/packages/mcdc.md +73 -0
- package/src/skills/science/references/packages/meep.md +80 -0
- package/src/skills/science/references/packages/mfem.md +80 -0
- package/src/skills/science/references/packages/mitgcm.md +73 -0
- package/src/skills/science/references/packages/modflow6.md +73 -0
- package/src/skills/science/references/packages/molecool.md +73 -0
- package/src/skills/science/references/packages/mom6.md +73 -0
- package/src/skills/science/references/packages/moose.md +80 -0
- package/src/skills/science/references/packages/mpas-model.md +73 -0
- package/src/skills/science/references/packages/mujoco.md +73 -0
- package/src/skills/science/references/packages/mumax3.md +73 -0
- package/src/skills/science/references/packages/nekrs.md +80 -0
- package/src/skills/science/references/packages/nessi.md +73 -0
- package/src/skills/science/references/packages/nest-simulator.md +73 -0
- package/src/skills/science/references/packages/netket.md +73 -0
- package/src/skills/science/references/packages/neuron.md +73 -0
- package/src/skills/science/references/packages/nextflow.md +88 -0
- package/src/skills/science/references/packages/nwchem.md +88 -0
- package/src/skills/science/references/packages/openbabel.md +88 -0
- package/src/skills/science/references/packages/openems.md +80 -0
- package/src/skills/science/references/packages/openff-toolkit.md +88 -0
- package/src/skills/science/references/packages/openfoam-dev.md +80 -0
- package/src/skills/science/references/packages/openmc.md +73 -0
- package/src/skills/science/references/packages/openmm.md +80 -0
- package/src/skills/science/references/packages/openmoc.md +73 -0
- package/src/skills/science/references/packages/openmx.md +80 -0
- package/src/skills/science/references/packages/opensees.md +80 -0
- package/src/skills/science/references/packages/opensn.md +80 -0
- package/src/skills/science/references/packages/opm-simulators.md +73 -0
- package/src/skills/science/references/packages/oqupy.md +73 -0
- package/src/skills/science/references/packages/packmol.md +80 -0
- package/src/skills/science/references/packages/palabos.md +80 -0
- package/src/skills/science/references/packages/parflow.md +80 -0
- package/src/skills/science/references/packages/pennylane.md +88 -0
- package/src/skills/science/references/packages/perceval.md +73 -0
- package/src/skills/science/references/packages/phono3py.md +73 -0
- package/src/skills/science/references/packages/phonopy.md +73 -0
- package/src/skills/science/references/packages/photutils.md +88 -0
- package/src/skills/science/references/packages/picongpu.md +80 -0
- package/src/skills/science/references/packages/plink-ng.md +88 -0
- package/src/skills/science/references/packages/precice.md +73 -0
- package/src/skills/science/references/packages/psc.md +80 -0
- package/src/skills/science/references/packages/psi4.md +88 -0
- package/src/skills/science/references/packages/pybinding.md +73 -0
- package/src/skills/science/references/packages/pyfr.md +80 -0
- package/src/skills/science/references/packages/pyhf.md +73 -0
- package/src/skills/science/references/packages/pyiron_base.md +80 -0
- package/src/skills/science/references/packages/pylcp.md +73 -0
- package/src/skills/science/references/packages/pylith.md +80 -0
- package/src/skills/science/references/packages/pynbody.md +88 -0
- package/src/skills/science/references/packages/pysam.md +88 -0
- package/src/skills/science/references/packages/pyscf.md +88 -0
- package/src/skills/science/references/packages/q-e.md +73 -0
- package/src/skills/science/references/packages/qibo.md +73 -0
- package/src/skills/science/references/packages/qiskit.md +73 -0
- package/src/skills/science/references/packages/quantica-jl.md +73 -0
- package/src/skills/science/references/packages/quantumoptics-jl.md +73 -0
- package/src/skills/science/references/packages/quimb.md +73 -0
- package/src/skills/science/references/packages/qulacs.md +73 -0
- package/src/skills/science/references/packages/qutip.md +73 -0
- package/src/skills/science/references/packages/rdkit.md +88 -0
- package/src/skills/science/references/packages/rmg-py.md +73 -0
- package/src/skills/science/references/packages/root.md +73 -0
- package/src/skills/science/references/packages/scanpy.md +88 -0
- package/src/skills/science/references/packages/scikit-allel.md +88 -0
- package/src/skills/science/references/packages/scikit-bio.md +88 -0
- package/src/skills/science/references/packages/scqubits.md +73 -0
- package/src/skills/science/references/packages/scuff-em.md +80 -0
- package/src/skills/science/references/packages/scvi-tools.md +73 -0
- package/src/skills/science/references/packages/seissol.md +73 -0
- package/src/skills/science/references/packages/sfepy.md +80 -0
- package/src/skills/science/references/packages/sisl.md +73 -0
- package/src/skills/science/references/packages/smilei.md +80 -0
- package/src/skills/science/references/packages/snakemake.md +88 -0
- package/src/skills/science/references/packages/specfem3d-globe.md +80 -0
- package/src/skills/science/references/packages/specutils.md +88 -0
- package/src/skills/science/references/packages/spglib.md +80 -0
- package/src/skills/science/references/packages/squidpy.md +88 -0
- package/src/skills/science/references/packages/starry.md +88 -0
- package/src/skills/science/references/packages/strawberryfields.md +73 -0
- package/src/skills/science/references/packages/su2.md +80 -0
- package/src/skills/science/references/packages/sunny-jl.md +73 -0
- package/src/skills/science/references/packages/sw4.md +73 -0
- package/src/skills/science/references/packages/swift.md +88 -0
- package/src/skills/science/references/packages/tdnegf.md +73 -0
- package/src/skills/science/references/packages/tenpy.md +73 -0
- package/src/skills/science/references/packages/thermo.md +73 -0
- package/src/skills/science/references/packages/tkwant.md +73 -0
- package/src/skills/science/references/packages/tvb-root.md +73 -0
- package/src/skills/science/references/packages/uproot5.md +73 -0
- package/src/skills/science/references/packages/vampire.md +80 -0
- package/src/skills/science/references/packages/wannier_tools.md +73 -0
- package/src/skills/science/references/packages/warpx.md +80 -0
- package/src/skills/science/references/packages/wrf.md +73 -0
- package/src/skills/science/references/packages/xtb.md +88 -0
- package/src/skills/science/references/packages/yt.md +73 -0
- package/src/skills/science/references/science-task-brief-template.md +71 -0
- package/src/skills/scout/SKILL.md +83 -425
- package/src/skills/scout/references/literature-scout-template.md +5 -24
- package/src/skills/scout/references/operational-guidance.md +191 -0
- package/src/skills/scout/references/paper-triage-playbook.md +11 -35
- package/src/skills/write/SKILL.md +744 -1246
- package/src/skills/write/references/experiments_analysis_patterns.md +129 -0
- package/src/skills/write/references/oral_package_patterns.md +252 -0
- package/src/skills/write/references/oral_writing_principles.md +291 -0
- package/src/skills/write/references/section_rewrite_checklist.md +234 -0
- package/src/tui/dist/app/AppContainer.js +1314 -27
- package/src/tui/dist/components/Composer.js +26 -1
- package/src/tui/dist/components/ConfigScreen.js +2 -1
- package/src/tui/dist/components/InputPrompt.js +25 -9
- package/src/tui/dist/components/MainContent.js +18 -3
- package/src/tui/dist/components/QuestScreen.js +3 -2
- package/src/tui/dist/components/UtilityScreen.js +37 -0
- package/src/tui/dist/hooks/useSafeInput.js +10 -0
- package/src/tui/dist/index.js +13 -1
- package/src/tui/dist/layouts/DefaultAppLayout.js +11 -8
- package/src/tui/dist/lib/api.js +89 -1
- package/src/tui/package.json +1 -1
- package/src/ui/dist/assets/{AnalysisPlugin-BCKAfjba.js → AnalysisPlugin-CA94NGmI.js} +1 -1
- package/src/ui/dist/assets/CliPlugin-DHBzphZU.js +79 -0
- package/src/ui/dist/assets/CodeEditorPlugin-BOFwD2rn.js +2 -0
- package/src/ui/dist/assets/{CodeViewerPlugin-CbaFRrUU.js → CodeViewerPlugin-CqDpgjik.js} +4 -4
- package/src/ui/dist/assets/{DocViewerPlugin-DAjLVeQD.js → DocViewerPlugin-UDBgt8-4.js} +3 -3
- package/src/ui/dist/assets/GitCommitViewerPlugin-BmHtZ0bZ.js +6 -0
- package/src/ui/dist/assets/{GitDiffViewerPlugin-CQACjoAA.js → GitDiffViewerPlugin-CAxjNorQ.js} +2 -2
- package/src/ui/dist/assets/{GitSnapshotViewer-0r4nLPke.js → GitSnapshotViewer-CweA6VON.js} +2 -2
- package/src/ui/dist/assets/{ImageViewerPlugin-nBOmI2v_.js → ImageViewerPlugin-C8wHGvGN.js} +5 -5
- package/src/ui/dist/assets/LabPlugin-COyyLUol.js +32 -0
- package/src/ui/dist/assets/{LatexPlugin-ZwtV8pIp.js → LatexPlugin-BQjAaA5J.js} +4 -4
- package/src/ui/dist/assets/{MarkdownViewerPlugin-DKqVfKyW.js → MarkdownViewerPlugin-Dy1NE2dI.js} +3 -3
- package/src/ui/dist/assets/{MarketplacePlugin-BwxStZ9D.js → MarketplacePlugin-DMIZtEJ2.js} +2 -2
- package/src/ui/dist/assets/NotebookEditor-CFHMq_Qt.js +91 -0
- package/src/ui/dist/assets/{NotebookEditor-DB9N_T9q.js → NotebookEditor-WFyd8Ybt.js} +3 -3
- package/src/ui/dist/assets/{PdfLoader-eWBONbQP.js → PdfLoader-CLE5u5TS.js} +3 -3
- package/src/ui/dist/assets/{PdfMarkdownPlugin-D22YOZL3.js → PdfMarkdownPlugin-_iNK_H83.js} +1 -1
- package/src/ui/dist/assets/PdfViewerPlugin-DgWsbInT.js +22 -0
- package/src/ui/dist/assets/SearchPlugin-DrZmn5iw.js +11 -0
- package/src/ui/dist/assets/{TextViewerPlugin-C5xqeeUH.js → TextViewerPlugin-D1-T3aC7.js} +4 -4
- package/src/ui/dist/assets/branding/runner-claude.svg +107 -0
- package/src/ui/dist/assets/branding/runner-codex.svg +10 -0
- package/src/ui/dist/assets/branding/runner-kimi.svg +14 -0
- package/src/ui/dist/assets/branding/runner-opencode.svg +7 -0
- package/src/ui/dist/assets/cli-store-CoZ-x5Ip.js +1 -0
- package/src/ui/dist/assets/{code-WlFHE7z_.js → code-DbsmSd3Y.js} +1 -1
- package/src/ui/dist/assets/file-diff-panel-DsvyRz47.js +1 -0
- package/src/ui/dist/assets/{wrap-text-BC-Hltpd.js → file-jump-queue-DeQBikaw.js} +3 -3
- package/src/ui/dist/assets/{file-socket-CfQPKQKj.js → file-socket-DA5XIx88.js} +1 -1
- package/src/ui/dist/assets/fonts/ds-fonts.css +50 -4
- package/src/ui/dist/assets/images/deepxiv/register-guide.png +0 -0
- package/src/ui/dist/assets/index-39vY9LmZ.js +1 -0
- package/src/ui/dist/assets/{index-CwNu1aH4.js → index-BsO46tJA.js} +1 -1
- package/src/ui/dist/assets/index-CHzJ2xtB.js +3530 -0
- package/src/ui/dist/assets/index-DH-zxoZ3.css +33 -0
- package/src/ui/dist/assets/{plugin-notebook-HbW2K-1c.js → plugin-notebook-JRhysCqj.js} +2 -2
- package/src/ui/dist/assets/{project-sync-C9IdzdZW.js → project-sync-DPmWKmKD.js} +1 -1
- package/src/ui/dist/assets/{zoom-out-E_gaeAxL.js → zoom-out-DAukFWen.js} +3 -3
- package/src/ui/dist/index.html +3 -3
- package/src/skills/analysis-campaign/references/artifact-orchestration.md +0 -58
- package/src/skills/baseline/references/memory-playbook.md +0 -40
- package/src/skills/baseline/references/publishable-baseline-package.md +0 -30
- package/src/skills/write/references/outline-evidence-contract-example.md +0 -107
- package/src/skills/write/references/paper-experiment-matrix-template.md +0 -131
- package/src/skills/write/references/paper-section-playbook.md +0 -64
- package/src/skills/write/references/reviewer-first-writing.md +0 -64
- package/src/skills/write/references/revision-checklist.md +0 -70
- package/src/skills/write/references/section-contracts.md +0 -82
- package/src/skills/write/references/sentence-level-proofing.md +0 -49
- package/src/ui/dist/assets/AiManusChatView-Bv-Z8YpU.js +0 -204
- package/src/ui/dist/assets/CliPlugin-BCKcpc35.js +0 -109
- package/src/ui/dist/assets/CodeEditorPlugin-DbOfSJ8K.js +0 -2
- package/src/ui/dist/assets/GitCommitViewerPlugin-CIUqbUDO.js +0 -1
- package/src/ui/dist/assets/LabCopilotPanel-BHxOxF4z.js +0 -14
- package/src/ui/dist/assets/LabPlugin-BKoZGs95.js +0 -22
- package/src/ui/dist/assets/NotebookEditor-BEQhaQbt.js +0 -81
- package/src/ui/dist/assets/PdfViewerPlugin-c-RK9DLM.js +0 -17
- package/src/ui/dist/assets/SearchPlugin-CxF9ytAx.js +0 -16
- package/src/ui/dist/assets/VNCViewer-BoLGLnHz.js +0 -11
- package/src/ui/dist/assets/bot-DREQOxzP.js +0 -6
- package/src/ui/dist/assets/chevron-up-C9Qpx4DE.js +0 -6
- package/src/ui/dist/assets/file-content-BZMz3RYp.js +0 -1
- package/src/ui/dist/assets/file-diff-panel-CQhw0jS2.js +0 -1
- package/src/ui/dist/assets/file-jump-queue-DA-SdG__.js +0 -1
- package/src/ui/dist/assets/git-commit-horizontal-DxZ8DCZh.js +0 -6
- package/src/ui/dist/assets/image-Bgl4VIyx.js +0 -6
- package/src/ui/dist/assets/index-BpV6lusQ.css +0 -33
- package/src/ui/dist/assets/index-CBNVuWcP.js +0 -2496
- package/src/ui/dist/assets/index-DrUnlf6K.js +0 -1
- package/src/ui/dist/assets/index-NW-h8VzN.js +0 -1
- package/src/ui/dist/assets/pdf-effect-queue-J8OnM0jE.js +0 -6
- package/src/ui/dist/assets/popover-CLc0pPP8.js +0 -1
- package/src/ui/dist/assets/select-Cs2PmzwL.js +0 -11
- package/src/ui/dist/assets/sigma-ClKcHAXm.js +0 -6
- package/src/ui/dist/assets/trash-DwpbFr3w.js +0 -11
- package/src/ui/dist/assets/useCliAccess-NQ8m0Let.js +0 -1
- package/src/ui/dist/assets/useFileDiffOverlay-FuhcnKiw.js +0 -1
|
@@ -0,0 +1,210 @@
|
|
|
1
|
+
id: aisb.t3.036_weakstrongpref
|
|
2
|
+
name: COWEST
|
|
3
|
+
version: 0.1.0
|
|
4
|
+
one_line: Weak-strong model collaboration via DPO, SFT, and preference-aligned candidate
|
|
5
|
+
aggregation to improve response quality on specialized QA tasks.
|
|
6
|
+
task_description: 'This benchmark implements COWEST (Collaboration method between
|
|
7
|
+
Weak and Strong models), a framework that pairs a specialized weak model with a
|
|
8
|
+
general strong model to tackle reasoning tasks requiring domain-specific knowledge.
|
|
9
|
+
During inference, the weak model generates initial drafts and background information
|
|
10
|
+
while the strong model refines them using advanced reasoning. During training, the
|
|
11
|
+
framework quantifies the weak model''s contribution influence and constructs preference
|
|
12
|
+
pairs to guide DPO fine-tuning of the weak model. The pipeline supports both individual
|
|
13
|
+
model baselines and the full collaborative workflow.
|
|
14
|
+
|
|
15
|
+
'
|
|
16
|
+
task_mode: experiment_driven
|
|
17
|
+
requires_execution: true
|
|
18
|
+
requires_paper: true
|
|
19
|
+
integrity_level: cas_plus_canary
|
|
20
|
+
snapshot_status: external_eval_required
|
|
21
|
+
support_level: advanced
|
|
22
|
+
time_band: 1d+
|
|
23
|
+
cost_band: high
|
|
24
|
+
difficulty: hard
|
|
25
|
+
data_access: public
|
|
26
|
+
primary_outputs:
|
|
27
|
+
- f1_ifqa
|
|
28
|
+
- em_ifqa
|
|
29
|
+
- preference_checkpoint
|
|
30
|
+
- aligned_candidates
|
|
31
|
+
launch_profiles:
|
|
32
|
+
- id: quick_eval
|
|
33
|
+
label: Quick Eval
|
|
34
|
+
description: Run one packaged evaluation route on the weak-strong preference stack
|
|
35
|
+
using pretrained checkpoints.
|
|
36
|
+
- id: full_alignment
|
|
37
|
+
label: Full Alignment
|
|
38
|
+
description: Run the complete SFT, DPO, and preference-aligned aggregation workflow
|
|
39
|
+
from scratch.
|
|
40
|
+
- id: inference_only
|
|
41
|
+
label: Inference Only
|
|
42
|
+
description: Run inference pipelines for weak-only, strong-only, and weak-strong
|
|
43
|
+
collaborative modes without training.
|
|
44
|
+
dataset_download:
|
|
45
|
+
primary_method: mixed
|
|
46
|
+
sources:
|
|
47
|
+
- type: url
|
|
48
|
+
url: https://deepscientist.cc/AISB/036_weakstrongpref
|
|
49
|
+
format: zip
|
|
50
|
+
- type: huggingface
|
|
51
|
+
note: Some datasets may require Hugging Face authentication for IFQA
|
|
52
|
+
notes:
|
|
53
|
+
- Archive extracts to paper-36-WeakStrongPref
|
|
54
|
+
- IFQA dataset components included in bundle
|
|
55
|
+
credential_requirements:
|
|
56
|
+
mode: partial
|
|
57
|
+
items:
|
|
58
|
+
- HuggingFace token for Llama-3-8B-Instruct access
|
|
59
|
+
- Azure OpenAI API key for GPT-35-turbo (used as strong model in inference)
|
|
60
|
+
notes:
|
|
61
|
+
- Strong model can be swapped for open-weight alternatives to avoid proprietary
|
|
62
|
+
API requirements
|
|
63
|
+
environment:
|
|
64
|
+
python: '3.10'
|
|
65
|
+
cuda: '11.8'
|
|
66
|
+
pytorch: 2.0.1
|
|
67
|
+
flash_attn: null
|
|
68
|
+
key_packages:
|
|
69
|
+
- bitsandbytes==0.41.1
|
|
70
|
+
- peft
|
|
71
|
+
- transformers
|
|
72
|
+
- datasets
|
|
73
|
+
- sklearn
|
|
74
|
+
- torch>=2.0.0
|
|
75
|
+
notes:
|
|
76
|
+
- Bundled requirements pin torch to <=2.0.1 and rely on a custom Transformers fork
|
|
77
|
+
for left-padding behavior
|
|
78
|
+
- See bundled README/requirements.txt for the full dependency set
|
|
79
|
+
- Requires multi-GPU setup for full training pipeline
|
|
80
|
+
resources:
|
|
81
|
+
minimum:
|
|
82
|
+
cpu_cores: 16
|
|
83
|
+
ram_gb: 64
|
|
84
|
+
disk_gb: 150
|
|
85
|
+
gpu_count: 1
|
|
86
|
+
gpu_vram_gb: 24
|
|
87
|
+
notes: Single GPU feasible only for inference; training requires multi-GPU for
|
|
88
|
+
LoRA with 8B models
|
|
89
|
+
recommended:
|
|
90
|
+
cpu_cores: 32
|
|
91
|
+
ram_gb: 128
|
|
92
|
+
disk_gb: 300
|
|
93
|
+
gpu_count: 4
|
|
94
|
+
gpu_vram_gb: 24
|
|
95
|
+
notes: SFT script configured for 4x24GB GPUs (WORLD_SIZE=4); DPO uses 3-GPU setup
|
|
96
|
+
risk_flags:
|
|
97
|
+
- proprietary_model_dependency
|
|
98
|
+
- multi_gpu_training_complexity
|
|
99
|
+
- dataset_size
|
|
100
|
+
risk_notes:
|
|
101
|
+
- Full pipeline requires access to Llama-3-8B-Instruct via HuggingFace
|
|
102
|
+
- DPO training uses RMSProp optimizer (non-standard choice)
|
|
103
|
+
- Azure OpenAI dependency for strong model inference can introduce API latency and
|
|
104
|
+
cost
|
|
105
|
+
recommended_when: 'Use this benchmark when you want to study weak-strong model collaboration
|
|
106
|
+
with alignment objectives, explore preference data construction from collaborative
|
|
107
|
+
feedback, or evaluate alignment methods on knowledge-intensive QA tasks requiring
|
|
108
|
+
specialized domain knowledge.
|
|
109
|
+
|
|
110
|
+
'
|
|
111
|
+
not_recommended_when: 'Do not use this if you cannot support LLaMA-class 8B model
|
|
112
|
+
training with LoRA, or if you need a lightweight benchmark without DPO-style optimization.
|
|
113
|
+
Not recommended when Azure API access is unavailable and no alternative strong model
|
|
114
|
+
is configured.
|
|
115
|
+
|
|
116
|
+
'
|
|
117
|
+
paper:
|
|
118
|
+
title: Synergistic Weak-Strong Collaboration by Aligning Preferences
|
|
119
|
+
venue: ACL 2025
|
|
120
|
+
year: 2025
|
|
121
|
+
url: https://arxiv.org/abs/2504.15188
|
|
122
|
+
authors:
|
|
123
|
+
- Yizhu Jiao
|
|
124
|
+
- Xuchao Zhang
|
|
125
|
+
- Zhaoyang Wang
|
|
126
|
+
- Yubo Ma
|
|
127
|
+
- Zhun Deng
|
|
128
|
+
- Rujia Wang
|
|
129
|
+
- Chetan Bansal
|
|
130
|
+
- Saravan Rajmohan
|
|
131
|
+
- Jiawei Han
|
|
132
|
+
- Huaxiu Yao
|
|
133
|
+
institutions:
|
|
134
|
+
- Microsoft Research
|
|
135
|
+
- University of Illinois Urbana-Champaign
|
|
136
|
+
- University of North Carolina at Chapel Hill
|
|
137
|
+
- Nanyang Technological University
|
|
138
|
+
display:
|
|
139
|
+
palette_seed: coral-ink-alignment
|
|
140
|
+
art_style: preference-dashboard
|
|
141
|
+
accent_priority: high
|
|
142
|
+
image_path: ../image/036_aisb.t3.036_weakstrongpref.jpg
|
|
143
|
+
capability_tags:
|
|
144
|
+
- research_code_optimization
|
|
145
|
+
- alignment
|
|
146
|
+
- preference_optimization
|
|
147
|
+
- large_language_models
|
|
148
|
+
- evaluation
|
|
149
|
+
- weak_to_strong_generalization
|
|
150
|
+
- collaborative_inference
|
|
151
|
+
aisb_direction: T3
|
|
152
|
+
track_fit:
|
|
153
|
+
- paper_track
|
|
154
|
+
- benchmark_track
|
|
155
|
+
execution_notes:
|
|
156
|
+
stages:
|
|
157
|
+
- id: sft
+    script: sft.sh
+    model: meta-llama/Meta-Llama-3-8B-Instruct
+    dataset: ifqa_sft
+    epochs: 3
+    batch_per_device: 1
+    gradient_accumulation: 8
+    lora_r: 16
+    lora_alpha: 16
+    lr: 1.41e-05
+    output: ./lora_weights/llama3-8b-sft-ifqa
+  - id: dpo
+    script: dpo.sh
+    model: meta-llama/Meta-Llama-3-8B-Instruct
+    dataset: ifqa_dpo
+    epochs: 1
+    batch_per_device: 1
+    gradient_accumulation: 16
+    lora_r: 16
+    lora_alpha: 16
+    lr: 1.41e-05
+    optimizer: rmsprop
+    output: ./lora_weights/llama3-8b-dpo-ifqa
+  - id: inference
+    script: inference_weak_strong_models.py
+    weak_model: local/lora_adapters
+    strong_model: gpt-35-turbo (or open-weight alternative)
+    modes:
+    - weak_only
+    - strong_only
+    - weak_strong_collaborative
+  metric_protocol:
+    f1_ifqa:
+      origin: code-backed
+      source: evaluation.py
+      description: F1 score computed over question-level answers for IFQA dataset
+    em_ifqa:
+      origin: code-backed
+      source: evaluation.py
+      description: Exact match metric for IFQA question answering
+  caveats:
+  - No benchmark execution performed in packaging pass; metric values require trusted
+    runtime output
+  - Static audit confirms executable anchors exist for all staged metrics
+  - README absent in current snapshot; follow AGENTS.md code anchors
+download:
+  provider: github_release
+  repo: ResearAI/DeepScientist
+  tag: aisb-v0.0.1
+  asset_name: aisb.t3.036_weakstrongpref.zip
+  url: https://github.com/ResearAI/DeepScientist/releases/download/aisb-v0.0.1/aisb.t3.036_weakstrongpref.zip
+  archive_type: zip
+  sha256: e98abaf4d388b77c120e970e947bd56f2be0276171180bf6c9ee3d5e144a77e7
+  size_bytes: 55245
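The `download` block of this catalog entry records a `sha256` digest and a `size_bytes` value for the release asset. A minimal sketch of how a consumer might check a downloaded archive against those two fields, assuming the file has already been saved locally. The function name `verify_archive` and the local file path are hypothetical and not part of the package.

```python
import hashlib
from pathlib import Path

# Values copied from the `download` block of aisb.t3.036_weakstrongpref.
EXPECTED_SHA256 = "e98abaf4d388b77c120e970e947bd56f2be0276171180bf6c9ee3d5e144a77e7"
EXPECTED_SIZE = 55245


def verify_archive(path, expected_sha256, expected_size):
    """Return True only if the file's size and SHA-256 digest both match.

    Hypothetical helper for illustration; not an API of the package.
    """
    path = Path(path)
    # Cheap check first: a size mismatch means the hash cannot match either.
    if path.stat().st_size != expected_size:
        return False
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in 1 MiB chunks so large archives need not fit in memory.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256.lower()
```

A call such as `verify_archive("aisb.t3.036_weakstrongpref.zip", EXPECTED_SHA256, EXPECTED_SIZE)` would then gate extraction of the archive.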
@@ -0,0 +1,194 @@
+id: aisb.t3.036_weakstrongpref
+name: COWEST
+version: 0.1.0
+one_line: Weak-strong model collaboration via DPO, SFT, and preference-aligned candidate aggregation to improve response quality on specialized question-answering tasks.
+task_description: >
+  This benchmark implements COWEST (a weak-strong model collaboration method), a framework that pairs a specialized weak model with a general-purpose strong model
+  to handle reasoning tasks that require domain-specific knowledge. At inference time, the weak model generates an initial draft and background information,
+  and the strong model refines them using its advanced reasoning ability. At training time, the framework quantifies the influence of the weak model's contribution
+  and builds preference pairs to guide DPO fine-tuning of the weak model. The pipeline supports standalone model baselines and the full collaborative workflow.
+
+task_mode: experiment_driven
+requires_execution: true
+requires_paper: true
+integrity_level: cas_plus_canary
+snapshot_status: external_eval_required
+support_level: advanced
+time_band: 1d+
+cost_band: high
+difficulty: hard
+data_access: public
+primary_outputs:
+- f1_ifqa
+- em_ifqa
+- preference_checkpoint
+- aligned_candidates
+launch_profiles:
+- id: quick_eval
+  label: Quick Eval
+  description: Run one packaged evaluation pass on the weak-strong preference stack using pretrained checkpoints.
+- id: full_alignment
+  label: Full Alignment
+  description: Run the full SFT, DPO, and preference-aligned aggregation workflow from scratch.
+- id: inference_only
+  label: Inference Only
+  description: Run the inference pipeline in weak-only, strong-only, and weak-strong collaborative modes, with no training required.
+dataset_download:
+  primary_method: mixed
+  sources:
+  - type: url
+    url: https://deepscientist.cc/AISB/036_weakstrongpref
+    format: zip
+  - type: huggingface
+    note: Some datasets may require Hugging Face authentication to access IFQA
+  notes:
+  - The archive extracts to paper-36-WeakStrongPref
+  - IFQA dataset components are included in the package
+credential_requirements:
+  mode: partial
+  items:
+  - HuggingFace token for access to Llama-3-8B-Instruct
+  - Azure OpenAI API key for GPT-35-turbo (used as the strong model at inference)
+  notes:
+  - The strong model can be replaced with an open-weight alternative to avoid proprietary APIs
+environment:
+  python: '3.10'
+  cuda: '11.8'
+  pytorch: 2.0.1
+  flash_attn: null
+  key_packages:
+  - bitsandbytes==0.41.1
+  - peft
+  - transformers
+  - datasets
+  - sklearn
+  - torch>=2.0.0
+  notes:
+  - The bundled dependencies pin torch to <=2.0.1 and rely on a custom Transformers fork for left-padding behavior
+  - See the bundled README/requirements.txt for the full dependency set
+  - The full training pipeline requires a multi-GPU setup
+resources:
+  minimum:
+    cpu_cores: 16
+    ram_gb: 64
+    disk_gb: 150
+    gpu_count: 1
+    gpu_vram_gb: 24
+    notes: A single GPU is sufficient for inference only; training needs multiple GPUs to support LoRA on the 8B model
+  recommended:
+    cpu_cores: 32
+    ram_gb: 128
+    disk_gb: 300
+    gpu_count: 4
+    gpu_vram_gb: 24
+    notes: The SFT script is configured for four 24GB GPUs (WORLD_SIZE=4); DPO uses a 3-GPU configuration
+risk_flags:
+- proprietary_model_dependency
+- multi_gpu_training_complexity
+- dataset_size
+risk_notes:
+- The full pipeline requires access to Llama-3-8B-Instruct through HuggingFace
+- DPO training uses the RMSProp optimizer (a non-standard choice)
+- Strong-model inference depends on Azure OpenAI, which may add API latency and cost
+recommended_when: >
+  Use this benchmark when you want to study weak-strong model collaboration with an alignment objective, explore building preference data from collaborative feedback,
+  or evaluate alignment methods on knowledge-intensive question-answering tasks that require specialized domain knowledge.
+
+not_recommended_when: >
+  Do not use it if you cannot support LoRA training of LLaMA-class 8B models, or if you need a lightweight benchmark without DPO-style optimization.
+  It is also not recommended when the Azure API is unavailable and no alternative strong model is configured.
+
+paper:
+  title: Synergistic Weak-Strong Collaboration by Aligning Preferences
+  venue: ACL 2025
+  year: 2025
+  url: https://arxiv.org/abs/2504.15188
+  authors:
+  - Yizhu Jiao
+  - Xuchao Zhang
+  - Zhaoyang Wang
+  - Yubo Ma
+  - Zhun Deng
+  - Rujia Wang
+  - Chetan Bansal
+  - Saravan Rajmohan
+  - Jiawei Han
+  - Huaxiu Yao
+  institutions:
+  - Microsoft Research
+  - University of Illinois Urbana-Champaign
+  - University of North Carolina at Chapel Hill
+  - Nanyang Technological University
+display:
+  palette_seed: coral-ink-alignment
+  art_style: preference-dashboard
+  accent_priority: high
+  image_path: ../image/036_aisb.t3.036_weakstrongpref.jpg
+capability_tags:
+- research_code_optimization
+- alignment
+- preference_optimization
+- large_language_models
+- evaluation
+- weak_to_strong_generalization
+- collaborative_inference
+aisb_direction: T3
+track_fit:
+- paper_track
+- benchmark_track
+execution_notes:
+  stages:
+  - id: sft
+    script: sft.sh
+    model: meta-llama/Meta-Llama-3-8B-Instruct
+    dataset: ifqa_sft
+    epochs: 3
+    batch_per_device: 1
+    gradient_accumulation: 8
+    lora_r: 16
+    lora_alpha: 16
+    lr: 1.41e-05
+    output: ./lora_weights/llama3-8b-sft-ifqa
+  - id: dpo
+    script: dpo.sh
+    model: meta-llama/Meta-Llama-3-8B-Instruct
+    dataset: ifqa_dpo
+    epochs: 1
+    batch_per_device: 1
+    gradient_accumulation: 16
+    lora_r: 16
+    lora_alpha: 16
+    lr: 1.41e-05
+    optimizer: rmsprop
+    output: ./lora_weights/llama3-8b-dpo-ifqa
+  - id: inference
+    script: inference_weak_strong_models.py
+    weak_model: local/lora_adapters
+    strong_model: gpt-35-turbo (or open-weight alternative)
+    modes:
+    - weak_only
+    - strong_only
+    - weak_strong_collaborative
+  metric_protocol:
+    f1_ifqa:
+      origin: code-backed
+      source: evaluation.py
+      description: F1 score computed over question-level answers for IFQA dataset
+    em_ifqa:
+      origin: code-backed
+      source: evaluation.py
+      description: Exact match metric for IFQA question answering
+  caveats:
+  - No benchmark execution performed in packaging pass; metric values require trusted
+    runtime output
+  - Static audit confirms executable anchors exist for all staged metrics
+  - README absent in current snapshot; follow AGENTS.md code anchors
+download:
+  provider: github_release
+  repo: ResearAI/DeepScientist
+  tag: aisb-v0.0.1
+  asset_name: aisb.t3.036_weakstrongpref.zip
+  url: https://github.com/ResearAI/DeepScientist/releases/download/aisb-v0.0.1/aisb.t3.036_weakstrongpref.zip
+  archive_type: zip
+  sha256: e98abaf4d388b77c120e970e947bd56f2be0276171180bf6c9ee3d5e144a77e7
+  size_bytes: 55245
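The `sft` and `dpo` stages of this entry each set `batch_per_device` and `gradient_accumulation`, and the resource notes state that the SFT script is configured for four GPUs (WORLD_SIZE=4) while DPO uses three. As a rough worked example, the effective batch size per optimizer step can be estimated as the product of the three numbers. This assumes plain data parallelism in which every GPU processes `batch_per_device` samples per micro-step; the training scripts themselves are not shown here, so that is an assumption, and `effective_batch_size` is a hypothetical helper.

```python
def effective_batch_size(batch_per_device, gradient_accumulation, num_gpus):
    """Samples contributing to one optimizer step under data parallelism.

    Assumption: each GPU sees `batch_per_device` samples per micro-step and
    gradients are accumulated for `gradient_accumulation` micro-steps.
    """
    return batch_per_device * gradient_accumulation * num_gpus


# Stage values from the catalog entry; GPU counts from its resource notes.
sft_batch = effective_batch_size(batch_per_device=1, gradient_accumulation=8, num_gpus=4)
dpo_batch = effective_batch_size(batch_per_device=1, gradient_accumulation=16, num_gpus=3)

print(sft_batch, dpo_batch)  # 32 48
```

Running on fewer GPUs than the scripts expect would shrink these numbers unless `gradient_accumulation` is raised to compensate.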
@@ -0,0 +1,172 @@
+id: aisb.t3.037_dementiamask
+name: DementiaMask
+version: 0.1.0
+one_line: Debiasing benchmark for speech-based dementia detection using weight masking
+  to isolate gender-related confounding in transformer models.
+task_description: 'This packaged benchmark covers gender-confound mitigation in speech-based
+  dementia detection using two weight masking methods: Extended Confounding Filter
+  (ECF) and Dual Filter (DF). The workflow trains a BERT-based classifier on speech-derived
+  transcripts, then applies gradient-based weight tracking to identify and ablate
+  parameters encoding confounding information. Primary evaluation measures AUPRC and
+  delta-FPR (false positive rate gap between demographic groups) to assess both diagnostic
+  performance and fairness under distribution shift. The benchmark generates train/eval/test
+  splits with controlled confounder distributions via cf_train_test_split, supporting
+  configurable confounder ratios and test set imbalances.
+
+  '
+task_mode: experiment_driven
+requires_execution: true
+requires_paper: true
+integrity_level: cas_plus_canary
+snapshot_status: runnable
+support_level: advanced
+time_band: 2-6h
+cost_band: medium
+difficulty: medium
+data_access: restricted
+primary_outputs:
+- diagnostic_scores
+- debiasing_report
+- delta_fpr
+launch_profiles:
+- id: quick_eval
+  label: Quick Eval
+  description: Run a single DualFilter evaluation path with preconfigured hyperparameters
+    (alpha_train=0.2, mask_ratio=0.10, 5 repetitions).
+- id: full_train_eval
+  label: Full Train + Eval
+  description: Run the complete masking-based debiasing workflow with configurable
+    confounder ratios via run_df.sh, including downstream fairness evaluation.
+- id: cf_evaluation
+  label: Confounding Filter Eval
+  description: Run the original Confounding Filter baseline via run_cf.sh for comparison.
+- id: ecf_evaluation
+  label: Extended Confounding Filter Eval
+  description: Run ECF with sequential layer unfreezing via run_ecf.sh.
+dataset_download:
+  primary_method: mixed
+  sources:
+  - deepscientist_url
+  notes:
+  - Archive contains pitts speech dataset with controlled confounder splits
+  - Data split parameters: train_pos_z0, train_pos_z1, alpha_test control confounder
+    distributions
+credential_requirements:
+  mode: external_access
+  items:
+  - deepscientist_download_token
+  notes:
+  - Requires ACL 2025 artifact access
+resources:
+  minimum:
+    cpu_cores: 8
+    ram_gb: 16
+    disk_gb: 30
+    gpu_count: 1
+    gpu_vram_gb: 8
+  recommended:
+    cpu_cores: 16
+    ram_gb: 32
+    disk_gb: 80
+    gpu_count: 1
+    gpu_vram_gb: 16
+environment:
+  python: '3.9'
+  cuda: '11.8'
+  pytorch: 2.1.0
+  flash_attn: null
+  key_packages:
+  - transformers
+  - torch
+  - numpy
+  - pandas
+  notes:
+  - Model weights (bert-base-uncased) must be available locally or downloaded
+  - See bundled requirements.txt for full dependency set
+risk_flags:
+- distribution_shift
+- demographic_disparity
+risk_notes:
+- Fairness-accuracy trade-off observed: debiasing may slightly reduce dementia detection
+  AUPRC
+- Distribution shift between train and test confounder distributions can affect delta_FPR
+- Gender confounder mitigation effectiveness varies with mask_ratio hyperparameter
+recommended_when: 'Use this benchmark when evaluating bias mitigation techniques in
+  clinical NLP, studying gender confounding in dementia detection, or comparing weight
+  masking approaches (ECF vs DF vs naive baselines) on speech-derived medical text
+  classification.
+
+  '
+not_recommended_when: 'Do not use when access to speech or clinical datasets is unavailable,
+  or when pure diagnostic accuracy without fairness considerations is the sole objective.
+
+  '
+paper:
+  title: Mitigating Confounding in Speech-Based Dementia Detection through Weight
+    Masking
+  authors: Sheng, Ding, Hur, Li, Cohen, Pakhomov
+  venue: ACL 2025 Main
+  year: 2025
+  url: https://arxiv.org/abs/2506.05610
+  pages: 8
+  abstract: 'Deep transformer models detect linguistic anomalies in patient transcripts
+    for early Alzheimer''s disease screening. This work addresses gender confounding
+    via Extended Confounding Filter (ECF) and Dual Filter (DF), which isolate and
+    ablate gender-associated weights. Results show transformer models tend to overfit
+    training distributions; disrupting gender-related weights yields a deconfounded
+    classifier with trade-off of slightly reduced dementia detection performance.
+
+    '
+download:
+  url: https://github.com/ResearAI/DeepScientist/releases/download/aisb-v0.0.1/aisb.t3.037_dementiamask.zip
+  archive_type: zip
+  local_dir_name: paper-37-DementiaMask
+  provider: github_release
+  repo: ResearAI/DeepScientist
+  tag: aisb-v0.0.1
+  asset_name: aisb.t3.037_dementiamask.zip
+  sha256: d759476e5d109aae5e6177cd9e294126e5d9fd1e1e21af1b40d16f855455ba5c
+  size_bytes: 98039
+display:
+  palette_seed: wine-silver-clinic
+  art_style: clinical-notebook
+  accent_priority: medium
+  image_path: ../image/037_aisb.t3.037_dementiamask.jpg
+metric_evidence:
+- metric: auprc
+  origin: run_experiment.py
+  status: code_backed
+  protocol: 'Average Precision Score (APS) computed per group (full, confounder_0,
+    confounder_1) after DualFilter training and ablation. Reported per-group and aggregated.
+
+    '
+- metric: delta_fpr
+  origin: run_experiment.py
+  status: code_backed
+  protocol: 'Delta FPR = FPR_confounder_1 - FPR_confounder_0, where FPR = 1 - recall_neg.
+    Measures fairness gap between demographic groups after debiasing.
+
+    '
+execution_anchors:
+  primary: run_experiment.py
+  shell_scripts:
+  - run_df.sh
+  - run_cf.sh
+  - run_ecf.sh
+  model_modules:
+  - src/model/weights_filter.py
+  preprocessing: src/preprocessing/data_generator.py
+hardware_notes: 'Single GPU execution sufficient. bert-base-uncased requires ~1.1GB
+  GPU memory for inference. Training with 5 repetitions and 30 epochs per rep on full
+  dataset may require 8+ GB VRAM with gradient accumulation. CPU-only execution not
+  recommended for training.
+
+  '
+dataset_notes: 'pitts speech dataset contains first-person narrative transcripts from
+  dementia patients and healthy controls. cf_train_test_split generates controlled
+  confounder distributions with configurable P(Y=1|Z) for train and test sets. Key
+  parameters: train_pos_z0 (baseline dementia rate for Z=0), train_pos_z1 (baseline
+  dementia rate for Z=1), alpha_test (test set imbalance ratio). Test set fixed at
+  150 samples, validation at 120 samples.
+
+  '
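The `delta_fpr` entry under `metric_evidence` in this catalog file defines the fairness metric as Delta FPR = FPR_confounder_1 - FPR_confounder_0, with FPR = 1 - recall_neg. A minimal sketch of that formula on binary predictions follows. It is not the code in `run_experiment.py`, which is not shown here; the function names and the toy data are made up for illustration.

```python
def false_positive_rate(y_true, y_pred):
    """FPR = 1 - recall of the negative class, i.e. FP / (FP + TN)."""
    negative_preds = [p for t, p in zip(y_true, y_pred) if t == 0]
    if not negative_preds:
        raise ValueError("no negative examples in this group")
    true_negatives = sum(1 for p in negative_preds if p == 0)
    recall_neg = true_negatives / len(negative_preds)
    return 1.0 - recall_neg


def delta_fpr(y_true, y_pred, z):
    """FPR of confounder group z == 1 minus FPR of confounder group z == 0."""
    fpr = {}
    for group in (0, 1):
        idx = [i for i, zi in enumerate(z) if zi == group]
        fpr[group] = false_positive_rate(
            [y_true[i] for i in idx], [y_pred[i] for i in idx]
        )
    return fpr[1] - fpr[0]


# Toy data: each confounder group has four negatives and one positive.
y_true = [0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0, 0, 1]
z = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

print(delta_fpr(y_true, y_pred, z))  # 0.25 (group 1 FPR 0.5, group 0 FPR 0.25)
```

A value near zero indicates similar false positive rates in both groups; the sign shows which group is over-flagged.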
@@ -0,0 +1,132 @@
+id: aisb.t3.037_dementiamask
+name: DementiaMask
+version: 0.1.0
+one_line: Weight-masking debiasing benchmark for speech-based dementia detection, used to isolate gender-related confounding in Transformer models.
+task_description: |
+  This packaged benchmark covers gender-confound mitigation in speech-based dementia detection using two weight masking methods: Extended Confounding Filter (ECF) and Dual Filter (DF). The workflow trains a BERT-based classifier on speech-derived transcripts, then applies gradient-based weight tracking to identify and ablate parameters that encode confounding information. The primary evaluation metrics are AUPRC and delta-FPR (the false positive rate gap between demographic groups), which assess diagnostic performance and fairness under distribution shift. The benchmark generates train/eval/test splits via cf_train_test_split with controlled confounder distributions, supporting configurable confounder ratios and test set imbalance.
+task_mode: experiment_driven
+requires_execution: true
+requires_paper: true
+integrity_level: cas_plus_canary
+snapshot_status: runnable
+support_level: advanced
+time_band: 2-6h
+cost_band: medium
+difficulty: medium
+data_access: restricted
+primary_outputs:
+- diagnostic_scores
+- debiasing_report
+- delta_fpr
+launch_profiles:
+- id: quick_eval
+  label: Quick Eval
+  description: Run a single DualFilter evaluation path with preconfigured hyperparameters (alpha_train=0.2, mask_ratio=0.10, 5 repetitions).
+- id: full_train_eval
+  label: Full Train + Eval
+  description: Run the complete masking-based debiasing workflow via run_df.sh, with configurable confounder ratios, including downstream fairness evaluation.
+- id: cf_evaluation
+  label: Confounding Filter Eval
+  description: Run the original Confounding Filter baseline via run_cf.sh for comparison.
+- id: ecf_evaluation
+  label: Extended Confounding Filter Eval
+  description: Run ECF via run_ecf.sh, using a layer-by-layer unfreezing strategy.
+dataset_download:
+  primary_method: mixed
+  sources:
+  - deepscientist_url
+  notes:
+  - The archive contains the pitts speech dataset with controlled confounder splits
+  - The data split parameters train_pos_z0, train_pos_z1, and alpha_test control the confounder distributions
+credential_requirements:
+  mode: external_access
+  items:
+  - deepscientist_download_token
+  notes:
+  - Requires access to the ACL 2025 paper artifacts
+resources:
+  minimum:
+    cpu_cores: 8
+    ram_gb: 16
+    disk_gb: 30
+    gpu_count: 1
+    gpu_vram_gb: 8
+  recommended:
+    cpu_cores: 16
+    ram_gb: 32
+    disk_gb: 80
+    gpu_count: 1
+    gpu_vram_gb: 16
+environment:
+  python: '3.9'
+  cuda: '11.8'
+  pytorch: 2.1.0
+  flash_attn: null
+  key_packages:
+  - transformers
+  - torch
+  - numpy
+  - pandas
+  notes:
+  - Model weights (bert-base-uncased) must be available locally or already downloaded
+  - See the bundled requirements.txt for the full dependency list
+risk_flags:
+- distribution_shift
+- demographic_disparity
+risk_notes:
+- A fairness-accuracy trade-off is observed; debiasing may slightly reduce dementia detection AUPRC
+- Distribution shift between the train and test confounder distributions can affect delta_FPR
+- The effectiveness of gender confounder mitigation varies with the mask_ratio hyperparameter
+recommended_when: |
+  Use this benchmark when evaluating bias mitigation techniques in clinical NLP, studying gender confounding in dementia detection, or comparing weight masking approaches (ECF vs DF vs naive baselines) on speech-derived medical text classification.
+not_recommended_when: |
+  Not recommended when access to speech or clinical datasets is unavailable, or when diagnostic accuracy alone is the goal and fairness is not a consideration.
+paper:
+  title: Mitigating Confounding in Speech-Based Dementia Detection through Weight
+    Masking
+  authors: Sheng, Ding, Hur, Li, Cohen, Pakhomov
+  venue: ACL 2025 Main
+  year: 2025
+  url: https://arxiv.org/abs/2506.05610
+  pages: 8
+  abstract: |
+    Deep Transformer models screen for early Alzheimer's disease by detecting linguistic anomalies in patient transcripts. This work addresses gender confounding with the Extended Confounding Filter (ECF) and the Dual Filter (DF), two methods that isolate and ablate gender-associated weights. The results show that Transformer models tend to overfit the training distribution; disrupting gender-related weights yields a deconfounded classifier at the cost of slightly reduced dementia detection performance.
+download:
+  url: https://github.com/ResearAI/DeepScientist/releases/download/aisb-v0.0.1/aisb.t3.037_dementiamask.zip
+  archive_type: zip
+  local_dir_name: paper-37-DementiaMask
+  provider: github_release
+  repo: ResearAI/DeepScientist
+  tag: aisb-v0.0.1
+  asset_name: aisb.t3.037_dementiamask.zip
+  sha256: d759476e5d109aae5e6177cd9e294126e5d9fd1e1e21af1b40d16f855455ba5c
+  size_bytes: 98039
+display:
+  palette_seed: wine-silver-clinic
+  art_style: clinical-notebook
+  accent_priority: medium
+  image_path: ../image/037_aisb.t3.037_dementiamask.jpg
+metric_evidence:
+- metric: auprc
+  origin: run_experiment.py
+  status: code_backed
+  protocol: |
+    Average Precision Score (APS) computed per group (full, confounder_0, confounder_1) after DualFilter training and ablation. Reported per group and in aggregate.
+- metric: delta_fpr
+  origin: run_experiment.py
+  status: code_backed
+  protocol: |
+    Delta FPR = FPR_confounder_1 - FPR_confounder_0, where FPR = 1 - recall_neg. Measures the fairness gap between demographic groups after debiasing.
+execution_anchors:
+  primary: run_experiment.py
+  shell_scripts:
+  - run_df.sh
+  - run_cf.sh
+  - run_ecf.sh
+  model_modules:
+  - src/model/weights_filter.py
+  preprocessing: src/preprocessing/data_generator.py
+hardware_notes: |
+  Single-GPU execution is sufficient. bert-base-uncased needs about 1.1GB of GPU memory for inference. Training on the full dataset with 5 repetitions and 30 epochs each, with gradient accumulation, may require 8+ GB of VRAM. CPU-only execution is not recommended for training.
+dataset_notes: |
+  The pitts speech dataset contains first-person narrative transcripts from dementia patients and healthy controls. cf_train_test_split generates controlled confounder distributions with configurable P(Y=1|Z) for the train and test sets. Key parameters: train_pos_z0 (baseline dementia rate for Z=0), train_pos_z1 (baseline dementia rate for Z=1), alpha_test (test set imbalance ratio). The test set is fixed at 150 samples and the validation set at 120 samples.
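The `auprc` entry under `metric_evidence` in this catalog file describes an average precision score computed per group (full, confounder_0, confounder_1). A rough sketch of that per-group breakdown is given below in pure Python. It is not the code in `run_experiment.py`; the helper names and toy scores are hypothetical, and the ranking-based formula assumes there are no tied scores, whereas library implementations group ties into a single threshold.

```python
def average_precision(y_true, scores):
    """AP as the mean of precision values at the rank of each positive.

    Simplifying assumption: all scores are distinct (no tie handling).
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positives")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


def per_group_ap(y_true, scores, z):
    """Average precision on the full set and on each confounder group."""
    groups = {
        "full": list(range(len(z))),
        "confounder_0": [i for i, zi in enumerate(z) if zi == 0],
        "confounder_1": [i for i, zi in enumerate(z) if zi == 1],
    }
    return {
        name: average_precision([y_true[i] for i in idx], [scores[i] for i in idx])
        for name, idx in groups.items()
    }


# Toy data, made up for illustration only.
y_true = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
z = [0, 0, 0, 1, 1, 1]
report = per_group_ap(y_true, scores, z)
```

Comparing `report["confounder_0"]` with `report["confounder_1"]` shows whether diagnostic quality differs between the two groups, which complements the delta-FPR fairness gap.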